Instructor-Led Hadoop Development Training

01 May 10:30 PM - 11:30 PM
WeekDay Course Hadoop
(Sat-Sun)
6 weeks - 30 hrs
USD 175

Hadoop Development Videos with Support

(33 modules, 140 hrs, 10 live projects)

Hadoop Development Course Curriculum

JAVA

Module - 1

Duration 53 mins

Java Essentials
  • What is Java?
  • What are the JRE and JDK?
  • What is the JVM?
  • How does Java work?
  • Installing Java and the Eclipse IDE
  • Data types in Java
Module - 2

Duration 2 hrs 40 mins

OOP Concepts
  • Classes and Objects
  • Static and non-static members
  • Modifiers
  • do-while loop
  • While loop
  • For loop
Module - 3

Duration 1 hr 30 mins

OOP Concepts Part 2
  • Inheritance
  • Concept of Polymorphism
  • Abstract class
  • Interface
  • StringTokenizer
  • BufferedReader
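A minimal sketch of how BufferedReader and StringTokenizer are typically combined: read text one line at a time, then split each line into whitespace-separated tokens. The class and method names (TokenizeLines, tokens) are illustrative, and a StringReader stands in for a real file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizeLines {

    // Reads the text one line at a time with BufferedReader, then splits each
    // line on whitespace with StringTokenizer (default delimiters: " \t\n\r\f").
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    result.add(tokenizer.nextToken());
                }
            }
        } catch (IOException e) {
            // StringReader does not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("hello hadoop\nhello  java")); // [hello, hadoop, hello, java]
    }
}
```

Repeated spaces produce no empty tokens, which is one reason StringTokenizer is convenient for simple line parsing.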
Module - 4

Duration 55 mins

Collections
  • List
  • Set
  • Map
  • ArrayList
  • HashMap
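As a quick illustration of the collection types listed above, the sketch below keeps words in an ArrayList (order and duplicates preserved), copies them into a HashSet (duplicates dropped) and counts them in a HashMap. The class name and sample words are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CollectionsTour {

    // Map: associates each word with the number of times it occurs.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // Set: keeps only one copy of each word.
    static Set<String> distinct(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        // List (ArrayList): keeps insertion order and duplicates.
        List<String> words = new ArrayList<>(List.of("pig", "hive", "pig", "hbase"));
        System.out.println(words.size());                 // 4
        System.out.println(distinct(words).size());       // 3
        System.out.println(countWords(words).get("pig")); // 2
    }
}
```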

MEET HADOOP

Module - 5

Duration 1 hr 20 mins

Meet Hadoop
  • What is Data? (25 mins)
  • Real-world example of Big Data analytics
  • Importance of Big Data analytics
  • What is a distributed file system and why do we need it?
  • What is Hadoop?
  • Brief History of Hadoop
Module - 6

Duration 47 mins

Hadoop Architecture Part 1
  • Architecture of Hadoop (20 mins)
  • Anatomy of File Write
  • Anatomy of File Read
  • Communication between nodes
  • Detailed description of various nodes
  • HDFS architecture
Module - 7

Duration 37 mins

Hadoop Architecture Part 2
  • Processing Data with Hadoop (21 mins)
  • What is YARN and what does it do?
  • Active and passive (standby) NameNodes and their interaction
  • Features of HDFS 2
  • HDFS Federation
  • Limitations of Hadoop 1.0 Architecture
Module - 8

Duration 23 mins

Environment Setup
  • Operational modes of Hadoop (23 mins)
  • Description of the different configuration files
  • Installation of the Cloudera VM
Module - 9

Duration 1 hr

HDFS
  • Browsing the HDFS (20 mins)

MAPREDUCE PROGRAMMING

Module - 10

Duration 1 hr 10 mins

MapReduce Programming Part 1
  • Introduction to MapReduce (27 mins)
  • Writing a Word Count Program (21 mins)
  • Hands-on session (32 mins)
  • Relationship between Input Split and HDFS blocks
  • What are input splits?
  • Input and Output of MapReduce Job
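To make the map, shuffle and reduce steps of the Word Count program concrete, here is a plain-Java simulation of the data flow. It deliberately does not use Hadoop's Mapper and Reducer classes; the class and method names are hypothetical, and a real job would run these phases in parallel across input splits.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {

    // Map phase: emit (word, 1) for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // split can yield "" for leading whitespace
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group emitted values by key; TreeMap keeps keys sorted,
    // mirroring the sort Hadoop performs before the reduce phase.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the 1s emitted for a single word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hello Hadoop", "hello MapReduce")));
        // {hadoop=1, hello=2, mapreduce=1}
    }
}
```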
Module - 11

Duration 1 hr 35 mins

MapReduce Programming Part 2
  • What is a Reduce Task? (34 mins)
  • Detailed explanation of the Reducer
  • What is Writable?
  • Providing data to the Mapper
  • How data is read by the Mapper
Module - 12

Duration 1 hr 40 mins

MapReduce Programming Part 3
Module - 13

Duration 1 hr

MapReduce Programming Part 4
  • Use case of a Partitioner: Segregating patient data based on a condition (29 mins)
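A sketch of how a partitioner decides which reducer receives a key. hashPartition mirrors the formula used by Hadoop's default HashPartitioner; agePartition is a hypothetical stand-in for the patient-data rule, assuming patients are segregated by age band (the bands are an assumption, not the course's actual condition). In a real job this logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner.

```java
final class PatientPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: mask the sign bit so
    // the result is never negative, then take it modulo the reducer count.
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: route patients by age band
    // (0: under 18, 1: 18-59, 2: 60 and over). The bands are illustrative only.
    static int agePartition(int age, int numReduceTasks) {
        int band = age < 18 ? 0 : (age < 60 ? 1 : 2);
        return band % numReduceTasks; // never exceed the available reducers
    }

    public static void main(String[] args) {
        System.out.println(agePartition(12, 3)); // 0
        System.out.println(agePartition(45, 3)); // 1
        System.out.println(agePartition(71, 3)); // 2
    }
}
```

With three reducers, each age band ends up in its own output file, which is the kind of segregation the use case describes.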
Module - 14

Duration 1 hr 15 mins

Advanced MapReduce Part 1
  • What are custom data types? (29 mins)
  • Common rules for creating custom data types (42 mins)
  • Use case: Processing online music data using custom data types (5 mins)
Module - 15

Duration 1 hr 30 mins

Advanced MapReduce Part 2
  • What are Counters? (25 mins)
  • Built-in counters (31 mins)
  • Types of counters and their description (11 mins)
  • User-defined counters
  • Use case 1: Processing weblog entries using counters
  • Use case 2: Processing customer complaint data using counters
Module - 16

Duration 50 mins

Advanced MapReduce Part 3
  • What is Distributed Cache?
  • Setting up the cache for a job
  • Use case: Processing news data using the distributed cache
Module - 17

Duration 1 hr

Advanced MapReduce Part 4
  • Joining data in MapReduce (28 mins)
  • What are map-side joins? (34 mins)
  • How data is joined on the map side
  • Use case: Joining enterprise datasets
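A sketch of the map-side join idea in plain Java: the smaller dataset is held in memory as a HashMap and each record of the larger dataset is joined by a key lookup, so no reduce phase is needed. The department/employee data and the class name are hypothetical; in Hadoop the small table is commonly distributed to every mapper through the distributed cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MapSideJoinSketch {

    // Inner join: for each large-table record, look up its key (record[0])
    // in the in-memory small table and emit "key,smallValue,largeValue".
    static List<String> join(Map<String, String> smallTable, List<String[]> largeTable) {
        List<String> joined = new ArrayList<>();
        for (String[] record : largeTable) {
            String match = smallTable.get(record[0]);
            if (match != null) { // records without a match are dropped
                joined.add(record[0] + "," + match + "," + record[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: small "departments" table, larger "employees" table.
        Map<String, String> departments = new HashMap<>(Map.of("d1", "Sales", "d2", "Support"));
        List<String[]> employees = List.of(
                new String[] {"d1", "alice"},
                new String[] {"d3", "bob"},
                new String[] {"d2", "carol"});
        System.out.println(join(departments, employees)); // [d1,Sales,alice, d2,Support,carol]
    }
}
```

The trade-off is memory: this only works when the smaller dataset fits in each mapper's heap; otherwise a reduce-side join is the usual fallback.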
Module - 18

Duration 43 mins

Advanced MapReduce Part 5
  • Use case: Joining customer transaction datasets
  • How data is joined on the reduce side
  • What are reduce-side joins?
Module - 19

Duration 2 hrs 10 mins

Advanced MapReduce Part 6
  • What are Sequence files?
  • Formats of Sequence files
  • Structure of Sequence files with and without record compression
  • Structure of Sequence files with and without block compression
  • Sequence file header
  • Writing a sequence file

PIG

Module - 20

Duration 2 hrs

Pig Part 1
  • Why do we need Pig? (37 mins)
  • Sorting data using Pig
  • Combining and splitting data
  • File loaders in Pig Latin
  • FILTER operator in Pig Latin
  • FOREACH operator
Module - 21

Duration 41 mins

Pig Part 2
  • Use case: Processing weblogs using Pig
Module - 22

Duration 1 hr 20 mins

Pig Part 3
  • Functions in Pig (31 mins)
  • User-defined functions in Pig
  • SUM
  • TOKENIZE
  • SIZE
  • MIN

HIVE

Module - 23

Duration 45 mins

Hive Part 1
  • What is Hive? (45 mins)
  • File formats in Hive
  • Collection data types
  • Hive data types
  • Remote metastore
  • Local metastore
Module - 24

Duration 2 hrs 30 mins

Hive Part 2
  • HiveQL: Hive Query Language (29 mins)
  • GUI: Hue
  • Serialization formats
  • Joins
  • Bucketing in Hive
  • Partitions in Hive
Module - 25

Duration 40 mins

Hive Part 3
  • Use case: Processing stock data to calculate covariance
Module - 26

Duration 32 mins

Hive Part 4
  • Hive UDFs
  • Use case: Analyzing news data using user-defined functions (32 mins)

HBASE

Module - 27

Duration 1 hr 17 mins

HBase Part 1
  • What are NoSQL databases? (29 mins)
  • Table management commands
  • HBase shell commands
  • HBase shell
  • Storage in HBase
  • Logical architecture
Module - 28

Duration 3 hrs

HBase Part 2
  • HBase data coordinates (21 mins)
  • Bulk loading into HBase
  • Filters
  • Creating/saving data in HBase
  • Deleting a column family
  • Altering tables using the Java API

SQOOP

Module - 29

Duration 1 hr 30 mins

Sqoop Part 1
  • Why Sqoop? (29 mins)
  • Inserts and updates in Sqoop
  • Incremental imports
  • Staging table (auxiliary table)
  • Direct mode of importing data
  • Controlling Parallelism
Module - 30

Duration 2 hrs

Sqoop Part 2
  • Importing data into HBase (22 mins)
  • sqoop-import-all-tables (36 mins)
  • sqoop-export (36 mins)
  • sqoop-job
  • Saved jobs and incremental imports
  • sqoop-eval

FLUME

Module - 31

Duration 2 hrs 50 mins

Flume
  • What is Flume? (33 mins)
  • Sequence generator source example
  • Different sources, sinks and channels in Flume
  • Flume configuration file
  • Working with Flume
  • Multiplexing

PROJECT

Module - 32

Duration 1 hr 50 mins

Project 1: Processing Web Log Data

  • Generating dummy web log data using a script (27 mins)
  • Loading the data into Hive (28 mins)
  • Analyzing DDoS attacks
  • Plotting the refined data in Power View (30 mins)

Module - 33

Duration 1 hr 50 mins

Project 2: Sentiment Analysis

  • Assigning the polarity to each tweet
  • Loading the data into Hive (34 mins)
  • Getting live data from Twitter about the launch of a movie (30 mins)

Total modules: 33
Total duration: 140 hrs
Total assignments: 12