
Believe in yourself and you can do unbelievable things.

  • OVERVIEW:

     

    In this course you will gain an in-depth understanding of the following:
    - MapReduce
    - HDFS
    - Hadoop I/O
    - Developing MapReduce Applications
    - Setting Up a Hadoop Cluster
    - Pig
    - Hive
    - HBase
    - ZooKeeper
    - Sqoop
    - Flume

     

    DURATION:

     

    40 hours

     

    OBJECTIVES:

     

    To gain an in-depth understanding of, and hands-on experience with, the tools required to succeed as a "Hadoop Developer"

     

    TARGET AUDIENCE:

     

    Hadoop is quickly becoming a must-know technology for the following professionals:

    • Software Developers and Architects
    • Analytics Professionals
    • Senior IT professionals
    • Testing and Mainframe professionals
    • Data Management Professionals
    • Business Intelligence Professionals
    • Project Managers
    • Aspiring Data Scientists
    • Graduates looking to build a career in Big Data Analytics


     

    ELIGIBILITY:

     

    • Knowledge of Java is necessary for this course
    • Basic knowledge of Linux and shell scripting is recommended
  • COURSE STRUCTURE:

     

    LESSON 1:  Meet Hadoop                                                                                                                                    

                                                                                                                                 

    • Data
    • Data Storage and Analysis
    • Comparison with Other Systems
    • RDBMS
    • Grid Computing
    • Volunteer Computing
    • A Brief History of Hadoop
    • Apache Hadoop and the Hadoop Ecosystem
    • Hadoop Releases
    LESSON 2: MapReduce                                                                                                                  

     

    • A Weather Dataset
    • Data Format
    • Analyzing the Data with Unix Tools
    • Analyzing the Data with Hadoop
    • Map and Reduce
    • Java MapReduce
    • Scaling Out
    • Data Flow
    • Combiner Functions
    • Running a Distributed MapReduce Job
    • Hadoop Streaming
    • Compiling and Running
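The Hadoop Streaming topic above lets any language act as mapper and reducer by reading and writing tab-separated lines on standard streams. A minimal word-count sketch in Python — the function names and the locally simulated shuffle are illustrative, not a full streaming job:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Map step: emit one (word, 1) pair per word as tab-separated text,
    # exactly as a streaming mapper would write to stdout.
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    # Reduce step: input arrives sorted by key, so consecutive lines with
    # the same word can be summed with groupby.
    rows = (p.split("\t") for p in pairs)
    for word, group in groupby(rows, key=itemgetter(0)):
        yield f"{word}\t{sum(int(n) for _, n in group)}"

def word_count(lines):
    # Local stand-in for the framework's shuffle-and-sort between phases.
    return list(reducer(sorted(mapper(lines))))
```

On a real cluster the two functions would live in separate scripts launched through the Hadoop Streaming jar, with the framework performing the sort between them.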

     

    LESSON 3: The Hadoop Distributed File System (HDFS)

     

    • The Design of HDFS
    • HDFS Concepts
    • Blocks
    • Namenodes and Datanodes
    • HDFS Federation
    • HDFS High-Availability
    • The Command-Line Interface
    • Basic Filesystem Operations
    • Hadoop Filesystems
    • Interfaces
    • The Java Interface
    • Reading Data from a Hadoop URL
    • Reading Data Using the FileSystem API
    • Writing Data
    • Directories
    • Querying the Filesystem
    • Deleting Data
    • Data Flow
    • Anatomy of a File Read
    • Anatomy of a File Write
    • Coherency Model
    • Parallel Copying with distcp
    • Keeping an HDFS Cluster Balanced
    • Hadoop Archives
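The block concepts covered above come down to a simple rule: HDFS stores a file as a sequence of fixed-size blocks (128 MB by default in recent releases), and only the final block may be smaller. A hypothetical helper illustrating that layout:

```python
def plan_blocks(file_size, block_size=128 * 1024 * 1024):
    # Return (offset, length) for each HDFS block of a file.
    # Only the last block may be smaller than the configured block size.
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks
```

Each of these blocks is then replicated (three copies by default) across datanodes, which is what makes parallel reads and failure tolerance possible.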

     

    LESSON 4: Hadoop I/O                                                                                               

                                                               

    • Data Integrity
    • Data Integrity in HDFS
    • LocalFileSystem
    • ChecksumFileSystem
    • Compression
    • Codecs
    • Compression and Input Splits
    • Using Compression in MapReduce
    • Serialization
    • The Writable Interface
    • Writable Classes
    • File-Based Data Structures
    • SequenceFile
    • MapFile
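Data integrity in HDFS, listed above, rests on per-chunk checksums: a checksum is stored for every io.bytes.per.checksum bytes (512 by default) on write and re-verified on read. An illustrative Python sketch of the idea, with plain CRC-32 standing in for Hadoop's checksum algorithm:

```python
import zlib

CHUNK = 512  # mirrors Hadoop's io.bytes.per.checksum default

def checksums(data, chunk=CHUNK):
    # One CRC-32 per chunk, computed when the data is written.
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def verify(data, sums, chunk=CHUNK):
    # On read, recompute and compare; any mismatch signals corruption.
    return checksums(data, chunk) == sums
```

A 512-byte chunk means the storage overhead is tiny (a few bytes per chunk) while corruption can still be localized to a small region of the file.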

     

    LESSON 5: Developing a MapReduce Application                                                                                                           

                                                                                            

    • The Configuration API
    • Combining Resources
    • Variable Expansion
    • Configuring the Development Environment
    • Managing Configuration
    • GenericOptionsParser, Tool, and ToolRunner
    • Writing a Unit Test
    • Mapper
    • Reducer
    • Running Locally on Test Data
    • Running a Job in a Local Job Runner
    • Testing the Driver
    • Running on a Cluster
    • Packaging
    • Launching a Job
    • The MapReduce Web UI
    • Retrieving the Results
    • Debugging a Job
    • Hadoop Logs
    • Tuning a Job
    • Profiling Tasks
    • MapReduce Workflows
    • Decomposing a Problem into MapReduce Jobs
    • JobControl

     

    LESSON 6: How MapReduce Works                                                                   

     

    • Anatomy of a MapReduce Job Run
    • Classic MapReduce (MapReduce 1)
    • Failures
    • Failures in Classic MapReduce
    • Failures in YARN
    • Job Scheduling
    • The Capacity Scheduler
    • Shuffle and Sort
    • The Map Side
    • The Reduce Side
    • Configuration Tuning
    • Task Execution
    • The Task Execution Environment
    • Speculative Execution
    • Output Committers
    • Task JVM Reuse
    • Skipping Bad Records
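The shuffle-and-sort phase listed above can be pictured as hash partitioning on the map side followed by a per-reducer merge sort. A small Python simulation of the idea behind Hadoop's default HashPartitioner — the helper names and data shapes here are illustrative:

```python
from collections import defaultdict

def hash_partition(key, num_reducers):
    # A key's hash, modulo the number of reduce tasks, fixes which reducer
    # receives it, so every occurrence of a key lands in the same place.
    return hash(key) % num_reducers

def shuffle(pairs, num_reducers):
    # Route each (key, value) pair to its reducer's bucket, then sort each
    # bucket by key, mirroring the sorted merge on the reduce side.
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return {r: sorted(b) for r, b in buckets.items()}
```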

     

     

    LESSON 7: MapReduce Types and Formats                                                                  

     

    • MapReduce Types
    • The Default MapReduce Job
    • Input Formats
    • Input Splits and Records
    • Text Input
    • Binary Input
    • Multiple Inputs
    • Database Input (and Output)
    • Output Formats
    • Text Output
    • Binary Output
    • Multiple Outputs
    • Lazy Output
    • Database Output
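For the text input topic above: the default input format hands the mapper one record per line, with the line's byte offset in the file as the key and the line's text (without the terminator) as the value. A Python sketch of that record shape:

```python
def text_records(data: bytes):
    # Mimic how a split of a text file becomes (offset, line) records,
    # the key/value shape TextInputFormat presents to the mapper.
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n").decode("utf-8")
        offset += len(line)
```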

     

    LESSON 8: MapReduce Features                                                                   

     

    • Counters
    • Built-in Counters
    • User-Defined Java Counters
    • User-Defined Streaming Counters
    • Sorting
    • Preparation
    • Partial Sort
    • Total Sort
    • Secondary Sort
    • Joins
    • Map-Side Joins
    • Reduce-Side Joins
    • Side Data Distribution
    • Using the Job Configuration
    • Distributed Cache
    • MapReduce Library Classes
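Secondary sort, listed above, orders each reducer's values by sorting on a composite (key, value) pair while still grouping on the natural key alone. In Hadoop it is implemented with a custom partitioner, sort comparator, and grouping comparator; a compact Python illustration of the same pattern:

```python
from itertools import groupby
from operator import itemgetter

def secondary_sort(records):
    # Sort on the composite (natural key, value) so values arrive in order,
    # then group on the natural key alone.
    ordered = sorted(records, key=lambda kv: (kv[0], kv[1]))
    return [(k, [v for _, v in grp])
            for k, grp in groupby(ordered, key=itemgetter(0))]
```

With this pattern each key's values reach the reduce step already sorted, so a "max temperature per year" style job can simply take the last value instead of scanning them all.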

     

     

    LESSON 9: Setting Up a Hadoop Cluster                                                                   

     

    • Cluster Specification
    • Network Topology
    • Cluster Setup and Installation
    • Installing Java
    • Creating a Hadoop User
    • Installing Hadoop
    • Testing the Installation
    • SSH Configuration
    • Hadoop Configuration
    • Configuration Management
    • Environment Settings
    • Important Hadoop Daemon Properties
    • Hadoop Daemon Addresses and Ports
    • Other Hadoop Properties
    • User Account Creation
    • YARN Configuration
    • Important YARN Daemon Properties
    • YARN Daemon Addresses and Ports
    • Security
    • Kerberos and Hadoop
    • Delegation Tokens
    • Other Security Enhancements
    • Benchmarking a Hadoop Cluster
    • Hadoop Benchmarks
    • User Jobs
    • Hadoop in the Cloud
    • Hadoop on Amazon EC2

     

    LESSON 10: Administering Hadoop                                                                    

                                                                      

    • HDFS
    • Persistent Data Structures
    • Safe Mode
    • Audit Logging
    • Tools
    • Monitoring
    • Logging
    • Metrics
    • Java Management Extensions
    • Routine Administration Procedures
    • Commissioning and Decommissioning Nodes
    • Upgrades

     

    LESSON 11: Pig                                                                   

                                                                                              

    • Installing and Running Pig
    • Execution Types
    • Running Pig Programs
    • Grunt
    • Pig Latin Editors
    • An Example
    • Generating Examples
    • Comparison with Databases
    • Pig Latin
    • Structure
    • Statements
    • Expressions
    • Types
    • Schemas
    • Functions
    • Macros
    • User-Defined Functions
    • A Filter UDF
    • An Eval UDF
    • A Load UDF
    • Data Processing Operators
    • Loading and Storing Data
    • Filtering Data
    • Grouping and Joining Data
    • Sorting Data
    • Combining and Splitting Data
    • Pig in Practice
    • Parallelism
    • Parameter Substitution

     

    LESSON 12: Hive                                                                   

     

    • Installing Hive
    • The Hive Shell
    • An Example
    • Running Hive
    • Configuring Hive
    • Hive Services
    • Comparison with Traditional Databases
    • Schema on Read Versus Schema on Write
    • Updates, Transactions, and Indexes
    • HiveQL
    • Data Types
    • Operators and Functions
    • Tables
    • Managed Tables and External Tables
    • Partitions and Buckets
    • Storage Formats
    • Importing Data
    • Altering Tables
    • Dropping Tables
    • Querying Data
    • Sorting and Aggregating
    • MapReduce Scripts
    • Joins
    • Subqueries
    • Views
    • User-Defined Functions
    • Writing a UDF
    • Writing a UDAF

     

    LESSON 13: HBase

     

    • Backdrop
    • Concepts
    • Whirlwind Tour of the Data Model
    • Implementation
    • Installation
    • Test Drive
    • Clients
    • Java
    • Avro, REST, and Thrift
    • Schemas
    • Loading Data
    • Web Queries
    • HBase Versus RDBMS
    • Successful Service
    • HBase

     

    LESSON 14: ZooKeeper

                                                                                                     

    • Installing and Running ZooKeeper
    • Group Membership in ZooKeeper
    • Creating the Group
    • Joining a Group
    • Listing Members in a Group
    • Deleting a Group
    • The ZooKeeper Service
    • Data Model
    • Operations
    • Implementation
    • Consistency
    • Sessions
    • States

     

    LESSON 15: Sqoop                                                                    

     

    • Getting Sqoop
    • A Sample Import
    • Generated Code
    • Additional Serialization Systems
    • Database Imports: A Deeper Look
    • Controlling the Import
    • Imports and Consistency
    • Direct-mode Imports
    • Working with Imported Data
    • Imported Data and Hive
    • Importing Large Objects

     

    LESSON 16: Flume                                                                   

     

    • Introduction
      • Overview
      • Architecture
    • Data flow model
    • Reliability
    • Building Flume
      • Getting the source
      • Compile/test Flume
    • Developing custom components
      • Client
        • Client SDK
        • RPC client interface
        • RPC clients - Avro and Thrift
        • Failover Client
        • Load Balancing RPC client
      • Embedded agent
      • Transaction interface
      • Sink
      • Source
      • Channel
  • The following training methodologies will be adopted during training:

     

    • Pre- and post-assessment
    • Objective and fair assessment of trainees after training
    • Clear demarcation between theory and in-class practicals
    • Be as interactive as possible
    • Encourage participants to share their "burning questions" about the topic
    • Launch poll questions for participants to answer
    • Explain concepts using well-known models and studies
    • Incorporate case-based learning (case videos/movies) and storytelling, if applicable
    • Integrate soft skills into the training, if applicable
    • Motivate learners to apply concepts from domain subjects in real-world situations
    • Practical homework
    • Provision of skill certification information to trainees, wherever applicable and required
    • Encourage regular trainee interaction during, in between, and after the course
    • Aim to design modules that engage participants every 4 minutes
    • Encourage participants to use platform tools such as Poll, Chat, Raise Hand, and Screen Sharing
    • Provision of learning/reference material to the trainees
    • Regular post-session responses to the trainees
    • Record of participation and Certificate of Achievement issued by EduSmart Skills

  • EDUSMART SKILLS CERTIFICATION

     

    The entire training course content is aligned with the respective certification program and helps you clear the requisite exams with ease and land the best jobs in top MNCs.

    As part of this training you will work on assignments with direct relevance to real-world industry scenarios, helping you fast-track your career.

    During the program there will be assessments that reflect the type of questions asked in the exams and help you score better.

    An EduSmart Skills Course Completion certificate will be awarded on completion of the course.

  • Student Reviews for Hadoop Developer