Hadoop Administrator and Developer

  • Price $490
  • Course Type Online

Course Overview

This 65-hour, instructor-led Hadoop training course delivers the key concepts and expertise needed to build robust data processing applications using Apache Hadoop. Through lectures and interactive hands-on exercises, attendees learn Hadoop and its ecosystem components.
After completing the course, attendees should be prepared to take the Hadoop developer and Hadoop administrator certification exams from Cloudera or Hortonworks. Certification is a strong differentiator: it helps establish individuals as leaders in their field and gives customers tangible evidence of skills and expertise.

Curriculum

1. Introduction

Topics - What is Hadoop?, The Hadoop Distributed File System, How Hadoop MapReduce Works, Anatomy of a Hadoop Cluster, Master Daemons, NameNode, JobTracker, Secondary NameNode, Slave Daemons, DataNode, TaskTracker.


2. HDFS (Hadoop Distributed File System)

Topics - Blocks and Splits, Input Splits, HDFS Splits, Data Replication, Hadoop Rack Awareness, Data High Availability, Data Integrity, Cluster Architecture and Block Placement, Accessing HDFS, Java Approach, CLI Approach, Programming Practices, Developing MapReduce Programs in Local Mode (running without HDFS and MapReduce), Pseudo-distributed Mode (running all daemons on a single node) and Fully Distributed Mode (running daemons on dedicated nodes).


3. Setting Up a Hadoop Cluster with Apache, Cloudera and Hortonworks

Topics - Make a fully distributed Hadoop cluster on a single laptop/desktop, NameNode in Safe Mode, Metadata Backup, Integrating Kerberos security in Hadoop.


4. Writing a MapReduce Program

Topics - Examining a Sample MapReduce Program (with several examples), Basic API Concepts, The Driver Code, The Mapper, The Reducer, Hadoop's Streaming API.


5. Performing Several Hadoop Jobs

Topics - The configure and close Methods, Sequence Files, RecordReader, RecordWriter, Role of Reporter, OutputCollector, Processing XML files, Counters, Directly Accessing HDFS, ToolRunner, Using the Distributed Cache.


6. Common MapReduce Algorithms

Topics - Sorting and Searching, Indexing, Classification/Machine Learning, Term Frequency - Inverse Document Frequency, Word Co-Occurrence, Hands-On Exercise: Creating an Inverted Index, Identity Mapper, Identity Reducer, Exploring well-known problems using MapReduce applications.


7. Debugging MapReduce Programs

Topics - Testing with MRUnit, Logging, Other Debugging Strategies.


8. Advanced MapReduce Programming

Topics - A Recap of the MapReduce Flow, The Secondary Sort, Customized Input Formats and Output Formats.


9. Monitoring and Debugging on a Production Cluster

Topics - Counters, Skipping Bad Records, Rerunning failed tasks with IsolationRunner.


10. Tuning for Performance in MapReduce

Topics - Reducing network traffic with a combiner, Partitioners, Using Compression, Reusing the JVM, Running with speculative execution, Refactoring code and rewriting algorithms, Parameters affecting Performance, Other Performance Aspects.


11. HBase

Topics - HBase concepts, HBase architecture, Region server architecture, File storage architecture, HBase basics, Column access, Scans, HBase use cases, Install and configure HBase on a multi-node cluster, Create database, Develop and run sample applications, Access data stored in HBase using clients like Java, Python and Perl, HBase and Hive Integration, HBase admin tasks, Defining Schema and basic operations.


12. Hive

Topics - Hive concepts, Hive architecture, Install and configure Hive on a cluster, Create a database and access it from a Java client, Buckets, Partitions, Joins in Hive, Inner Joins, Outer Joins, Hive UDF, Hive UDAF, Hive UDTF, Develop and run sample applications in Java/Python to access Hive.


13. Pig

Topics - Pig basics, Install and configure Pig on a cluster, Pig vs. MapReduce and SQL, Pig vs. Hive, Write sample Pig Latin scripts, Modes of running Pig, Running in Grunt shell, Programming in Eclipse, Running as Java program, Pig UDFs, Pig Macros.


14. Flume, Chukwa, Avro, Scribe, Thrift

Topics - Flume and Chukwa concepts, Use cases of Thrift, Avro and Scribe, Install and configure Flume on a cluster, Create a sample application to capture logs from Apache using Flume.


15. CDH4 Enhancements

Topics - NameNode High Availability, NameNode Federation, Fencing, YARN.


16. Hadoop Challenges

Topics - Hadoop disaster recovery, Suitable use cases for Hadoop.


17. Exercises

Topics - Documents or tests, Hadoop Project - a real-time project where students can practice.


Trainer Details

Rithisha Information Systems Pvt. Ltd. is committed to providing high-quality, needs-based training interventions to its clients, both locally and internationally. Rithisha is renowned for training programs delivered by a team of qualified, expert and highly experienced trainers in the area of Information Technology. It provides organizations and individuals with a comprehensive suite of training offerings, including online and classroom training.


Rating
by Kumar
The course is a great opportunity to get some quick practical experience with Hadoop and its related subprojects. I am impressed with the methodology used. The course met my expectations. The training was interactive and I got answers to all my queries. Thank you.
by Praveen Kumar
The Hadoop Administrator and Developer sessions were really informative. I gained a lot of in-depth knowledge of Hadoop architecture and basic Hadoop programming concepts. The training imparts good knowledge of Hadoop technologies and use cases. The trainer was good and was able to explain the theoretical concepts of the Hadoop framework well.

Register For Demo

Goals & Objectives

  • Understand Big Data and Hadoop basics.
  • Understand Hadoop architecture and the Hadoop cluster.
  • Understand the HDFS architecture.
  • Learn the MapReduce framework.
  • Learn how to install Hadoop on a single node and on multiple nodes.
  • Be able to clear Cloudera or Hortonworks certification.

About Us

VIBLOO is a platform for teaching and learning on the web. Trainers can offer Video, Online & Classroom courses on Vibloo.