Hadoop and Spark Skills you will learn

  • Realtime data processing
  • Functional programming
  • Spark applications
  • Parallel processing
  • Spark RDD optimization techniques
  • Spark SQL

Who should learn Hadoop and Spark

  • IT Professionals
  • BI Professionals
  • Analytics Professionals
  • Software Developers
  • Senior IT Professionals
  • Project Managers
  • Aspiring Data Scientists

What you will learn in Hadoop and Spark Basics Program

  • Big Data Hadoop and Spark Developer Training

    • Lesson 01: Course Introduction

      10:24
      • 1.01 Course Introduction
        10:24
    • Lesson 02: Introduction to Big Data and Hadoop

      38:20
      • 2.01 Learning Objectives
        00:38
      • 2.02 Big Data Overview
        05:19
      • 2.03 Big Data Analytics
        03:01
      • 2.04 Case Study Big Data Using Nvidia Jetson Camera
        01:44
      • 2.05 What Is Big Data
        03:49
      • 2.06 Five Vs of Big Data
        03:51
      • 2.07 Case Study Royal Bank of Scotland
        00:40
      • 2.08 Challenges of Traditional System
        01:40
      • 2.09 Case Study Big Data in Netflix
        01:41
      • 2.10 Distributed Systems
        01:13
      • 2.11 Introduction to Hadoop
        03:58
      • 2.12 Components of Hadoop Ecosystem
        08:59
      • 2.13 Commercial Hadoop Distributions
        01:07
      • 2.14 Key Takeaways
        00:40
    • Lesson 03: HDFS: The Storage Layer

      32:35
      • 3.01 Learning Objectives
        00:52
      • 3.02 Hadoop Distributed File System (HDFS)
        07:25
      • 3.03 HDFS Architecture and Components
        16:32
      • 3.04 Case Study Analyzing Uber Datasets using Hadoop Framework
        01:18
      • 3.05 Assisted Practice
        05:45
      • 3.06 Key Takeaways
        00:43
    • Lesson 04: Distributed Processing MapReduce Framework

      36:48
      • 4.01 Distributed Processing MapReduce Framework
        00:43
      • 4.02 Distributed Processing in MapReduce
        03:38
      • 4.03 Case Study Flipkart Dodged WannaCry Ransomware
        01:47
      • 4.04 MapReduce Terminologies
        05:37
      • 4.05 Map Execution Phases
        02:35
      • 4.06 MapReduce Jobs
        05:58
      • 4.07 Building a MapReduce Program
        03:39
      • 4.08 Creating a New Project
        06:38
      • 4.09 Assisted Practice
        05:40
      • 4.10 Key Takeaways
        00:33
      • Lesson End Project: Count the number of Words using MapReduce
    • Lesson 05: MapReduce Advanced Concepts

      27:07
      • 5.01 Learning Objectives
        00:46
      • 5.02 Data Types in Hadoop
        02:36
      • 5.03 Custom Data Type using WritableComparable Interface
        03:36
      • 5.04 InputSplit
        03:28
      • 5.05 Custom Partitioner
        01:59
      • 5.06 Distributed Cache and Job Chaining
        04:16
      • 5.07 Hadoop Scheduler and its Types
        05:32
      • 5.08 Assisted Practice Execution of MapReduce Job Using Custom Partitioner
        04:26
      • 5.09 Key Takeaways
        00:28
      • Lesson End Project: Flipkart Analysis
    • Lesson 06: Apache Hive

      49:53
      • 6.01 Learning Objective
        00:41
      • 6.02 Hive SQL Over Hadoop MapReduce
        02:35
      • 6.03 Hive Case Study
        01:19
      • 6.04 Hive Architecture
        03:59
      • 6.05 Hive Meta Store
        04:30
      • 6.06 Hive DDL and DML
        02:23
      • 6.07 Hive Data Types
        04:19
      • 6.08 File Format Types
        02:47
      • 6.09 Hive Data Serialization
        03:21
      • 6.10 Hive Optimization Partitioning Bucketing Skewing
        10:35
      • 6.11 Hive Analytics UDF and UDAF
        08:11
      • 6.12 Assisted Practice Working with Hive Query Editor
        00:35
      • 6.13 Assisted Practice Working with Hive Query Editor using Meta Data
        03:52
      • 6.14 Key Takeaways
        00:46
      • Lesson End Project: Post Office Data Analysis using Hive
    • Lesson 07: Apache Pig

      12:25
      • 7.01 Learning Objectives
        00:42
      • 7.02 Introduction to Pig
        02:59
      • 7.03 Components of Pig
        07:41
      • 7.04 Key Takeaways
        01:03
    • Lesson 08: NoSQL Databases - HBase

      32:32
      • 8.01 Learning Objectives
        00:53
      • 8.02 NoSQL Introduction
        05:10
      • 8.03 HBase Overview
        06:26
      • 8.04 HBase Architecture
        05:45
      • 8.05 HBase Data Model
        06:15
      • 8.06 Connecting to HBase
        03:36
      • 8.07 Assisted Practice Data Upload from HDFS to HBase
        03:45
      • 8.08 Key Takeaways
        00:42
      • Lesson End Project: Uploading Data from HDFS to HBase
    • Lesson 09: Data Ingestion into Big Data Systems and ETL

      33:19
      • 9.01 Learning Objectives
        00:48
      • 9.02 Data Ingestion Overview
        04:19
      • 9.03 Apache Kafka
        04:57
      • 9.04 Kafka Data Model
        04:38
      • 9.05 Apache Kafka Architecture
        07:55
      • 9.06 Apache Flume
        01:35
      • 9.07 Apache Flume Model
        03:20
      • 9.08 Components in Flume’s Architecture
        04:56
      • 9.09 Key Takeaways
        00:51
      • Lesson End Project: Twitter Data Ingestion with Flume
    • Lesson 10: YARN Introduction

      27:55
      • 10.01 Learning Objective
        00:51
      • 10.02 YARN Yet Another Resource Negotiator
        06:12
      • 10.03 Use Case YARN
        01:28
      • 10.04 YARN Infrastructure
        00:51
      • 10.05 YARN Architecture
        12:19
      • 10.06 Tools for YARN Developers
        02:15
      • 10.07 Assisted Practice YARN
        03:14
      • 10.08 Key Takeaways
        00:45
      • Lesson End Project: Working with YARN
    • Lesson 11: Introduction to Python for Apache Spark

      48:12
      • 11.01 Learning Objectives
        00:45
      • 11.02 Introduction to Python
        03:12
      • 11.03 Modes of Python
        03:08
      • 11.04 Applications of Python
        02:34
      • 11.05 Variables in Python
        02:30
      • 11.06 Operators in Python
        05:02
      • 11.07 Control Statements in Python
        03:50
      • 11.08 Loop Statements in Python
        02:48
      • 11.09 Assisted Practice List Operations
        10:23
      • 11.10 Assisted Practice Swap Two Strings
        06:23
      • 11.11 Assisted Practice Merge Two Dictionaries
        07:04
      • 11.12 Key Takeaways
        00:33
    • Lesson 12: Functions

      01:05:27
      • 12.01 Learning Objectives
        00:49
      • 12.02 Python Functions
        10:32
      • 12.03 Object-Oriented Programming in Python
        02:48
      • 12.04 Access Modifiers
        06:10
      • 12.05 Object-Oriented Programming Concepts
        38:48
      • 12.06 Modules in Python
        05:51
      • 12.07 Key Takeaways
        00:29
      • Lesson End Project: Banking Data Standardization in Python
    • Lesson 13: Big Data and the Need for Spark

      14:41
      • 13.01 Learning Objectives
        00:57
      • 13.02 Types of Big Data
        01:17
      • 13.03 Challenges in Traditional Data Solutions
        02:33
      • 13.04 Data Processing in Big Data
        02:24
      • 13.05 Distributed Computing and Its Challenges
        00:45
      • 13.06 MapReduce
        02:23
      • 13.07 Apache Storm and Its Limitations
        01:54
      • 13.08 General Purpose Solution Apache Spark
        02:03
      • 13.09 Key Takeaways
        00:25
    • Lesson 14: Deep Dive into Apache Spark Framework

      24:16
      • 14.01 Learning Objectives
        00:36
      • 14.02 Spark Components
        05:44
      • 14.03 Spark Architecture
        02:14
      • 14.04 Spark Cluster in Real World
        04:16
      • 14.05 Introduction to PySpark Shell
        01:07
      • 14.06 Submitting PySpark Job
        03:02
      • 14.07 Spark Web UI
        02:14
      • 14.08 Assisted Practice Deployment of PySpark Job
        04:36
      • 14.09 Key Takeaways
        00:27
    • Lesson 15: Working with Spark RDDs

      39:37
      • 15.01 Learning Objectives
        01:02
      • 15.02 Challenges in Existing Computing Methods
        01:51
      • 15.03 Resilient Distributed Dataset
        04:14
      • 15.04 RDD Operations
        00:11
      • 15.05 RDD Transformation
        01:38
      • 15.06 RDD Transformation Examples
        08:23
      • 15.07 RDD Action
        01:02
      • 15.08 RDD Action Examples
        03:01
      • 15.09 Loading and Saving Data into an RDD
        01:34
      • 15.10 Pair RDDs
        01:26
      • 15.11 Double RDD and its Functions
        01:38
      • 15.12 DAG and RDD Lineage
        01:51
      • 15.13 RDD Persistence and Its Storage Levels
        05:50
      • 15.14 Word Count Program
        01:29
      • 15.15 RDD Partitioning
        01:46
      • 15.16 Passing Function to Spark
        01:01
      • 15.17 Assisted Practice Create an RDD in Spark
        00:46
      • 15.18 Key Takeaways
        00:54
      • Lesson End Project: Telecom Log Parsing
    • Lesson 16: Spark SQL and Data Frames

      36:42
      • 16.01 Learning Objective
        00:33
      • 16.02 Spark SQL Introduction
        02:40
      • 16.03 Spark SQL Architecture
        01:58
      • 16.04 SparkContext
        05:04
      • 16.05 User-Defined Functions
        01:15
      • 16.06 User-Defined Aggregate Functions
        01:07
      • 16.07 Apache Spark DataFrames
        02:10
      • 16.08 Spark DataFrames – Catalyst Optimizer
        01:11
      • 16.09 Interoperating with RDDs
        01:28
      • 16.10 PySpark DataFrames
        02:20
      • 16.11 Spark-Hive Integration
        01:14
      • 16.12 Assisted Practice Create DataFrame Using PySpark to Process Records
        06:03
      • 16.13 Assisted Practice UDF with DataFrame
        09:05
      • 16.14 Key Takeaways
        00:34
      • Lesson End Project: Retail Business Analytics
    • Lesson 17: Machine Learning using Spark ML

      42:54
      • 17.01 Learning Objectives
        00:47
      • 17.02 Analytics in Spark
        03:13
      • 17.03 Introduction to Machine Learning
        02:51
      • 17.04 Machine Learning Implementation
        04:53
      • 17.05 Applications of Machine Learning
        01:51
      • 17.06 Machine Learning Types
        00:16
      • 17.07 Supervised Learning
        02:25
      • 17.08 Unsupervised Learning
        02:59
      • 17.09 Semi-Supervised Learning
        01:24
      • 17.10 Reinforcement Learning
        02:59
      • 17.11 Machine Learning Use Case Face Detection
        01:21
      • 17.12 Introduction to Spark ML
        01:23
      • 17.13 ML Pipeline
        05:21
      • 17.14 Machine Learning Examples
        05:06
      • 17.15 Assisted Practice Data Exploration
        04:49
      • 17.16 Key Takeaways
        01:16
      • Lesson End Project: Linear Regression with Real-world Dataset
    • Lesson 18: Stream Processing Frameworks and Spark Streaming

      38:01
      • 18.01 Learning Objectives
        00:58
      • 18.02 Traditional Computing Methods and Its Drawbacks
        01:32
      • 18.03 Spark Streaming Introduction
        03:54
      • 18.04 Real Time Processing of Big Data
        02:23
      • 18.05 Data Processing Architectures
        07:23
      • 18.06 Spark Streaming
        05:29
      • 18.07 Introduction to DStreams
        05:35
      • 18.08 Checkpointing
        01:49
      • 18.09 State Operations
        01:19
      • 18.10 Windowing Operation
        01:16
      • 18.11 Spark Streaming Source
        01:36
      • 18.12 Assisted Practice Apache Spark Streaming
        04:15
      • 18.13 Key Takeaways
        00:32
      • Lesson End Project: Retail Business Analysis Using Spark Streaming
    • Lesson 19: Spark Structured Streaming

      32:43
      • 19.01 Learning Objectives
        00:44
      • 19.02 Introduction to Spark Structured Streaming
        03:01
      • 19.03 Batch vs Streaming
        04:16
      • 19.04 Structured Streaming Architecture
        06:22
      • 19.05 Use Case Banking Transactions
        00:31
      • 19.06 Structured Streaming APIs
        07:11
      • 19.07 Use Case Spark Structured Streaming
        01:07
      • 19.08 Assisted Practice Working with Spark Structured Application
        09:00
      • 19.09 Key Takeaways
        00:31
      • Lesson End Project: Retail Business Analysis Using Structured Streaming
    • Lesson 20: Spark GraphX

      46:00
      • 20.01 Learning Objectives
        00:37
      • 20.02 Introduction to Graphs
        01:23
      • 20.03 Use Cases of GraphX
        02:00
      • 20.04 Introduction to Spark GraphX
        08:55
      • 20.05 GraphX Operators
        10:05
      • 20.06 Graph Parallel System
        00:55
      • 20.07 Algorithms in Spark
        05:07
      • 20.08 Pregel API
        04:29
      • 20.09 Graph Frames
        05:49
      • 20.10 Assisted Practice 20.2 GraphX
        06:08
      • 20.11 Key Takeaways
        00:32

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Learn the Basics of Hadoop and Spark

Course Advisors

  • Ronald van Loon

    CEO, Principal Analyst Intelligent World, Top 10 AI-Data-IoT Influencer

    Named by Onalytica as one of the three most influential people in Big Data, Ronald is also an author for several leading Big Data and Data Science websites, including Datafloq, Data Science Central, and The Guardian. He also speaks regularly at renowned events.

Why you should learn Hadoop and Spark

$84.6 Billion by 2021

Global Hadoop market as per Allied Market Research

1.9 million jobs in the US

For Hadoop data analysts by 2021

Career Opportunities

FAQs

  • What are the prerequisites to learn the Hadoop and Spark basics program?

    Professionals should be familiar with Core Java and SQL before taking this Hadoop and Spark basics program. 

  • How do beginners learn Spark and Hadoop basics?

    Beginners today usually look for online resources to learn Spark and Hadoop. While text-based material is easy to find online, this free Hadoop and Spark course offers comprehensive video modules to further enrich your learning experience.

  • How long does it take to learn Hadoop and Spark?

    This Hadoop basics course offers 11 hours of in-depth and high-quality video lessons that you can follow at your preferred learning speed. You can access the course 24/7 and return to any previous lesson for better understanding.

  • What should I learn first in the Hadoop and Spark basics program?

    In this Spark basics program, it is recommended to start with an overview of big data, the five Vs of big data, and the reasons why distributed systems were introduced.

  • Is the Hadoop and Spark basics program easy to learn?

    The instructors of this Hadoop basics program have explained all the concepts from scratch with an easily understandable approach. Anyone having a technical background can easily follow the lessons.

  • What are the basics in a Hadoop and Spark training program?

    Learners enrolling in this Hadoop and Spark fundamentals program are guided through basics such as an introduction to big data analytics, the components of the Hadoop ecosystem, and the Hadoop architecture.

  • What is Hadoop?

    Hadoop is an open-source software framework for storing and processing large datasets across clusters of commodity hardware. It gained popularity over the years for its high-capacity storage, processing power, and ability to run a very large number of jobs concurrently.

  • What is Spark?

    Originally developed at UC Berkeley, Apache Spark is an extremely powerful and fast analytics engine for big data and machine learning. It is used for processing enormous amounts of data via in-memory caching and optimized query execution.

  • What are Hadoop and Spark used for?

    Hadoop is an open-source framework that allows organizations to store and process big data in a parallel and distributed environment. It scales from a single server to thousands of machines, each offering low-cost storage and local computation. Spark is an open-source engine for large-scale data processing; its components support SQL queries, machine learning, stream processing, and graph processing in big data projects.

  • Why learn Hadoop and Spark together?

    It is recommended to learn Hadoop and Spark together because the two are complementary and are often deployed side by side. Hadoop stores files in HDFS and reads and writes them on disk, while Spark processes data in memory using Resilient Distributed Datasets (RDDs). Spark can run on its own or on a Hadoop cluster, using HDFS as its data source. From a skills point of view, hiring managers and companies are particularly interested in professionals who are well-versed in both Hadoop and Spark.

  • What are my next best learning options after completing this Hadoop and Spark basics program?

    After completing this Hadoop and Spark basics training program, you can move on to other courses such as the Big Data Engineer Master’s Program or the Professional Certificate Program in Data Engineering.

  • Will I get a certificate after completing the Hadoop and Spark basics program?

    Yes, you will receive a Course Completion Certificate from SkillUp upon completing the Hadoop and Spark basics program. You can unlock it by logging in to your SkillUp account. As soon as the certificate is unlocked, you will receive an email at your registered email address with a link to your SkillUp learning dashboard. Click the link to view and download your certificate. You can also add the certificate to your resume and share it on social media platforms.

  • What are the career opportunities after learning Hadoop and Spark?

    Professionals who learn Hadoop and Spark open the door to many job opportunities in the field of big data analytics. Business analyst, big data engineer, analytics manager, and data architect are some of the popular job roles in this field that you can target after learning Hadoop basics.

  • Who can learn Hadoop and Spark?

    You can learn Hadoop and Spark if you are one of the following:

    • IT professional
    • BI professional
    • Analytics professional
    • Software developer
    • Senior IT professional
    • Project manager
    • Aspiring data scientist
       

  • Can I complete this Hadoop and Spark basics online course in 90 days?

    Yes, the video modules of this Hadoop and Spark basics program are easy to grasp and you can complete them within 90 days.

Learner Review

  • Jojo Jacob

    My curiosity to learn more about Big data and how it is differentiated from "data" led me to this Skill Up session by Simplilearn. Now, I am able to understand the fundamentals in Hadoop, Kafka and various other tools.

  • Gaayathri P

    I really enjoyed the course. It was well-planned and easy for me to follow. The workload was just right and I could finish everything in time without feeling rushed.

  • Seun Daniel

    Great to see that this amazing course is free! The learning structure is so compact and easy and the instruction was detailed. The examples and practice sessions were insightful and I am happy to complete the course.

  • Geo Oik

    I really enjoyed practising with the projects in every lesson! I think that applying the course's theory through real-life examples is the perfect way to understand better.
