COURSE CONTENT

  • Architecture of Hadoop (20 mins)
  • Detailed description of the Hadoop ecosystem (27 mins)
  • Classification of the Hadoop ecosystem
  • Introduction to the different components of Hadoop: Hive, Pig, Sqoop, HBase, Flume, etc.
  • Hadoop core components: HDFS and MapReduce
  • Detailed description of HDFS
  • Processing data with Hadoop (21 mins)
  • Introduction to MapReduce (16 mins)
  • Languages used in MapReduce
  • MapReduce daemons
  • Introduction to the JobTracker
  • Introduction to the TaskTracker
  • Operational modes of Hadoop (23 mins)
  • Description of standalone mode
  • Description of pseudo-distributed mode
  • Description of fully distributed mode
  • Environment setup of Hadoop
  • Installation of the Cloudera VM
  • Browsing HDFS (20 mins)
  • HDFS commands and operations (22 mins)
  • Listing all Hadoop file system commands (18 mins)
  • Checking the version of Hadoop
  • How to run a JAR file in Hadoop
  • Making a directory in HDFS
  • Introduction to MapReduce (27 mins)
  • High-level view of MapReduce processing (30 mins)
  • Detailed description of the steps involved in MapReduce (22 mins)
  • MapReduce version 1 (MRv1)
  • Issues in MRv1: the need for MRv2
  • MRv2: YARN
  • Use case of partitioners: Segregating patient data based on a condition using partitioners (29 mins)
  • What are custom data types? (29 mins)
  • Common rules for creating custom data types (42 mins)
  • Use case: Processing online music data using custom data types (5 mins)
  • What are counters? (25 mins)
  • Built-in counters (31 mins)
  • Types of counters and their descriptions (11 mins)
  • User-defined counters
  • Use case 1: Processing weblog entries using counters
  • Use case 2: Processing customer complaint data using counters
  • What is the distributed cache?
  • Setting up the cache for a job
  • Use case: Processing news data using the distributed cache
  • Joining data in MapReduce (28 mins)
  • What are map-side joins? (34 mins)
  • How data is joined on the map side
  • Use case: Joining enterprise datasets
  • What are sequence files?
  • Formats of sequence files
  • Structure of sequence files with and without record compression
  • Structure of sequence files with and without block compression
  • Sequence file header
  • Writing a sequence file
  • Why do we need Pig? (37 mins)
  • Why should we use Pig when we have MapReduce? (19 mins)
  • What is Pig? (13 mins)
  • Where to use Pig? (13 mins)
  • Anatomy of Pig (15 mins)
  • Pig on Hadoop
  • Use case: Processing weblogs using Pig
  • Functions in Pig (31 mins)
  • AVG (19 mins)
  • CONCAT (30 mins)
  • COUNT
  • MAX
  • MIN
  • What is Hive? (45 mins)
  • Hive Query Language
  • Why Hive when we already have Pig?
  • Hive vs. Pig
  • Brief history of Hive
  • Features of Hive
  • Hive Query Language (29 mins)
  • Database commands (27 mins)
  • Creating a database (30 mins)
  • Listing all databases (30 mins)
  • Using a specific database (26 mins)
  • Tables: managed and external tables
  • Use case: Processing stock data to calculate covariance
  • Hive UDFs
  • Use case: Analyzing news data using user-defined functions (32 mins)
  • What are NoSQL databases? (29 mins)
  • Types of NoSQL databases (15 mins)
  • NoSQL technology landscape (33 mins)
  • Limitations of Hadoop
  • What is HBase?
  • Brief history of HBase
  • HBase data coordinates (21 mins)
  • Multi-map structure (29 mins)
  • Java client APIs (27 mins)
  • Creating a table using the Java API (27 mins)
  • Listing tables using the Java API (35 mins)
  • Disabling a table using the Java API (27 mins)
  • Why Sqoop? (29 mins)
  • What is Sqoop? (22 mins)
  • How Sqoop works (26 mins)
  • Sqoop import and Sqoop export (16 mins)
  • Controlling parallelism
  • Direct mode of importing data
  • Importing data into HBase (22 mins)
  • Sqoop-import-all-tables (36 mins)
  • Sqoop-export (36 mins)
  • Sqoop-job
  • Saved jobs and incremental imports
  • Sqoop-eval
  • What is Flume? (33 mins)
  • Why Flume? (32 mins)
  • Advantages of Flume (13 mins)
  • Architecture of Flume (25 mins)
  • Flume events (23 mins)
  • Flume agents (23 mins)
  • Generating dummy web log data using a script (27 mins)
  • Loading the data into Hive (28 mins)
  • Analyzing DDoS attacks
  • Plotting the refined data in Power View (30 mins)
  • Popup handling
  • Managing different windows
  • Close and Quit: the difference
  • Concept of web tables
  • Dynamic web table handling
  • Extracting data from a web table

PROJECT INFO

Businesses thrive by making informed decisions that target the needs of their customers and users, and to make such strategic decisions they rely on data. Hive is a tool of choice for many data scientists because it lets them use SQL, a familiar syntax, to derive insights from data stored in Hadoop, giving businesses the information they need to plan effectively.

This course shows how to use Hive to process data. Instructor Yogesh starts by showing you how to structure and optimize your data. Next, he explains how to set up Hue, the Hadoop user interface, to run HiveQL queries when analyzing data. With Hue configured, he then demonstrates how to load data, create aggregate tables for fast query access, and run advanced analytics. He also walks you through managing tables and using functions. The course is designed to help you find new ways to work with datasets so you can answer the tough data science questions that come your way.

PRICING/BATCHES

INSTRUCTOR-LED HADOOP DEVELOPMENT TRAINING

  • 18 Jul, 10:30 PM - 11:30 PM EST