Hadoop and Spark skills you will learn

  • Real-time data processing
  • Functional programming
  • Spark applications
  • Parallel processing
  • Spark RDD optimization techniques
  • Spark SQL

Who should learn Hadoop and Spark

  • IT Professionals
  • BI Professionals
  • Analytics Professionals
  • Software Developers
  • Senior IT Professionals
  • Project Managers
  • Aspiring Data Scientists

What you will learn in the Hadoop and Spark Basics Program

  • Big Data Hadoop and Spark Developer

    • Lesson 1 Course Introduction

      08:51
      • 1.1 Course Introduction
        05:52
      • 1.2 Accessing Practice Lab
        02:59
    • Lesson 2 Introduction to Big Data and Hadoop

      43:59
      • 1.1 Introduction to Big Data and Hadoop
        00:31
      • 1.2 Introduction to Big Data
        01:02
      • 1.3 Big Data Analytics
        04:24
      • 1.4 What is Big Data
        02:54
      • 1.5 Four Vs of Big Data
        02:13
      • 1.6 Case Study: Royal Bank of Scotland
        01:31
      • 1.7 Challenges of Traditional Systems
        03:38
      • 1.8 Distributed Systems
        01:55
      • 1.9 Introduction to Hadoop
        05:28
      • 1.10 Components of Hadoop Ecosystem: Part One
        02:17
      • 1.11 Components of Hadoop Ecosystem: Part Two
        02:53
      • 1.12 Components of Hadoop Ecosystem: Part Three
        03:48
      • 1.13 Commercial Hadoop Distributions
        04:19
      • 1.14 Demo: Walkthrough of Simplilearn Cloudlab
        06:51
      • 1.15 Key Takeaways
        00:15
      • Knowledge Check
    • Lesson 3 Hadoop Architecture, Distributed Storage (HDFS), and YARN

      57:50
      • 2.1 Hadoop Architecture, Distributed Storage (HDFS), and YARN
        00:50
      • 2.2 What Is HDFS
        00:54
      • 2.3 Need for HDFS
        01:52
      • 2.4 Regular File System vs HDFS
        01:27
      • 2.5 Characteristics of HDFS
        03:24
      • 2.6 HDFS Architecture and Components
        02:30
      • 2.7 High Availability Cluster Implementations
        04:47
      • 2.8 HDFS Component File System Namespace
        02:40
      • 2.9 Data Block Split
        02:32
      • 2.10 Data Replication Topology
        01:16
      • 2.11 HDFS Command Line
        02:14
      • 2.12 Demo: Common HDFS Commands
        04:39
      • HDFS Command Line
      • 2.13 YARN Introduction
        01:32
      • 2.14 YARN Use Case
        02:21
      • 2.15 YARN and Its Architecture
        02:09
      • 2.16 Resource Manager
        02:14
      • 2.17 How Resource Manager Operates
        02:28
      • 2.18 Application Master
        03:29
      • 2.19 How YARN Runs an Application
        04:39
      • 2.20 Tools for YARN Developers
        01:38
      • 2.21 Demo: Walkthrough of Cluster Part One
        03:06
      • 2.22 Demo: Walkthrough of Cluster Part Two
        04:35
      • 2.23 Key Takeaways
        00:34
      • Knowledge Check
      • Hadoop Architecture, Distributed Storage (HDFS), and YARN
    • Lesson 4 Data Ingestion into Big Data Systems and ETL

      01:04:02
      • 3.1 Data Ingestion into Big Data Systems and ETL
        00:42
      • 3.2 Data Ingestion Overview Part One
        01:51
      • 3.3 Data Ingestion
        01:41
      • 3.4 Apache Sqoop
        02:04
      • 3.5 Sqoop and Its Uses
        03:02
      • 3.6 Sqoop Processing
        02:11
      • 3.7 Sqoop Import Process
        02:24
      • Assisted Practice: Import into Sqoop
      • 3.8 Sqoop Connectors
        04:22
      • 3.9 Demo: Importing and Exporting Data from MySQL to HDFS
        05:07
      • Apache Sqoop
      • 3.10 Apache Flume
        02:42
      • 3.11 Flume Model
        01:56
      • 3.12 Scalability in Flume
        01:33
      • 3.13 Components in Flume’s Architecture
        02:40
      • 3.14 Configuring Flume Components
        01:58
      • 3.15 Demo: Ingest Twitter Data
        04:43
      • 3.16 Apache Kafka
        01:54
      • 3.17 Aggregating User Activity Using Kafka
        01:34
      • 3.18 Kafka Data Model
        02:56
      • 3.19 Partitions
        02:04
      • 3.20 Apache Kafka Architecture
        03:02
      • 3.21 Producer Side API Example
        02:30
      • 3.22 Consumer Side API
        00:43
      • 3.23 Demo: Setup Kafka Cluster
        03:52
      • 3.24 Consumer Side API Example
        02:36
      • 3.25 Kafka Connect
        01:14
      • 3.26 Demo: Creating Sample Kafka Data Pipeline using Producer and Consumer
        02:16
      • 3.27 Key Takeaways
        00:25
      • Knowledge Check
      • Data Ingestion into Big Data Systems and ETL
    • Lesson 5 Distributed Processing - MapReduce Framework and Pig

      01:01:09
      • 4.1 Distributed Processing MapReduce Framework and Pig
        00:44
      • 4.2 Distributed Processing in MapReduce
        03:01
      • 4.3 Word Count Example
        02:09
      • 4.4 Map Execution Phases
        01:48
      • 4.5 Map Execution in Distributed Two-Node Environment
        02:10
      • 4.6 MapReduce Jobs
        01:55
      • 4.7 Hadoop MapReduce Job Work Interaction
        02:24
      • 4.8 Setting Up the Environment for MapReduce Development
        02:57
      • 4.9 Set of Classes
        02:09
      • 4.10 Creating a New Project
        02:25
      • 4.11 Advanced MapReduce
        01:30
      • 4.12 Data Types in Hadoop
        02:22
      • 4.13 OutputFormats in MapReduce
        02:25
      • 4.14 Using Distributed Cache
        01:51
      • 4.15 Joins in MapReduce
        03:07
      • 4.16 Replicated Join
        02:37
      • 4.17 Introduction to Pig
        02:03
      • 4.18 Components of Pig
        02:08
      • 4.19 Pig Data Model
        02:23
      • 4.20 Pig Interactive Modes
        03:18
      • 4.21 Pig Operations
        01:19
      • 4.22 Various Relations Performed by Developers
        03:06
      • 4.23 Demo: Analyzing Web Log Data Using MapReduce
        05:43
      • 4.24 Demo: Analyzing Sales Data and Solving KPIs Using Pig
        02:46
      • Apache Pig
      • 4.25 Demo: Wordcount
        02:21
      • 4.26 Key Takeaways
        00:28
      • Knowledge Check
      • Distributed Processing - MapReduce Framework and Pig
    • Lesson 6 Apache Hive

      57:45
      • 5.1 Apache Hive
        00:37
      • 5.2 Hive SQL over Hadoop MapReduce
        01:38
      • 5.3 Hive Architecture
        02:41
      • 5.4 Interfaces to Run Hive Queries
        01:47
      • 5.5 Running Beeline from Command Line
        01:51
      • 5.6 Hive Metastore
        02:58
      • 5.7 Hive DDL and DML
        02:00
      • 5.8 Creating New Table
        03:15
      • 5.9 Data Types
        01:37
      • 5.10 Validation of Data
        02:41
      • 5.11 File Format Types
        02:40
      • 5.12 Data Serialization
        02:35
      • 5.13 Hive Table and Avro Schema
        02:38
      • 5.14 Hive Optimization: Partitioning, Bucketing, and Sampling
        01:28
      • 5.15 Non Partitioned Table
        01:58
      • 5.16 Data Insertion
        02:22
      • 5.17 Dynamic Partitioning in Hive
        02:43
      • 5.18 Bucketing
        01:44
      • 5.19 What Do Buckets Do
        02:04
      • 5.20 Hive Analytics UDF and UDAF
        03:11
      • Assisted Practice: Synchronization
      • 5.21 Other Functions of Hive
        03:17
      • 5.22 Demo: Real-Time Analysis and Data Filtering
        03:18
      • 5.23 Demo: Real-World Problem
        04:30
      • 5.24 Demo: Data Representation and Import using Hive
        01:50
      • 5.25 Key Takeaways
        00:22
      • Knowledge Check
      • Apache Hive
    • Lesson 7 NoSQL Databases - HBase

      21:41
      • 6.1 NoSQL Databases HBase
        00:33
      • 6.2 NoSQL Introduction
        04:42
      • Demo: YARN Tuning
        03:28
      • 6.3 HBase Overview
        02:53
      • 6.4 HBase Architecture
        04:43
      • 6.5 Data Model
        03:11
      • 6.6 Connecting to HBase
        01:56
      • HBase Shell
      • 6.7 Key Takeaways
        00:15
      • Knowledge Check
      • NoSQL Databases - HBase
    • Lesson 8 Basics of Functional Programming and Scala

      44:59
      • 7.1 Basics of Functional Programming and Scala
        00:39
      • 7.2 Introduction to Scala
        02:59
      • Demo: Scala Installation
        02:54
      • 7.3 Functional Programming
        03:08
      • 7.4 Programming with Scala
        04:01
      • Demo: Basic Literals and Arithmetic Operators
        02:57
      • Demo: Logical Operators
        01:21
      • 7.5 Type Inference, Classes, Objects, and Functions in Scala
        04:45
      • Demo: Type Inference, Functions, Anonymous Function, and Class
        02:03
      • 7.6 Collections
        01:33
      • 7.7 Types of Collections
        05:37
      • Demo: Five Types of Collections
        03:42
      • Demo: Operations on List
        03:16
      • 7.8 Scala REPL
        02:27
      • Assisted Practice: Scala REPL
      • Demo: Features of Scala REPL
        03:17
      • 7.9 Key Takeaways
        00:20
      • Knowledge Check
      • Basics of Functional Programming and Scala
    • Lesson 9 Apache Spark Next Generation Big Data Framework

      36:54
      • 8.1 Apache Spark Next Generation Big Data Framework
        00:43
      • 8.2 History of Spark
        01:58
      • 8.3 Limitations of MapReduce in Hadoop
        02:48
      • 8.4 Introduction to Apache Spark
        01:11
      • 8.5 Components of Spark
        03:10
      • 8.6 Application of In-Memory Processing
        02:54
      • 8.7 Hadoop Ecosystem vs Spark
        01:30
      • 8.8 Advantages of Spark
        03:22
      • 8.9 Spark Architecture
        03:42
      • 8.10 Spark Cluster in Real World
        02:52
      • 8.11 Demo: Running a Scala Program in Spark Shell
        03:45
      • 8.12 Demo: Setting Up Execution Environment in IDE
        04:18
      • 8.13 Demo: Spark Web UI
        04:14
      • 8.14 Key Takeaways
        00:27
      • Knowledge Check
      • Apache Spark Next Generation Big Data Framework
    • Lesson 10 Spark Core Processing RDD

      01:16:31
      • 9.1 Processing RDD
        00:37
      • 9.2 Introduction to Spark RDD
        02:35
      • 9.3 RDD in Spark
        02:18
      • 9.4 Creating Spark RDD
        05:48
      • 9.5 Pair RDD
        01:53
      • 9.6 RDD Operations
        03:20
      • 9.7 Demo: Spark Transformation Detailed Exploration Using Scala Examples
        03:13
      • 9.8 Demo: Spark Action Detailed Exploration Using Scala
        03:32
      • 9.9 Caching and Persistence
        02:41
      • 9.10 Storage Levels
        03:31
      • 9.11 Lineage and DAG
        02:11
      • 9.12 Need for DAG
        02:51
      • 9.13 Debugging in Spark
        01:11
      • 9.14 Partitioning in Spark
        04:05
      • 9.15 Scheduling in Spark
        03:28
      • 9.16 Shuffling in Spark
        02:41
      • 9.17 Sort Shuffle
        03:18
      • 9.18 Aggregating Data with Pair RDD
        01:33
      • 9.19 Demo: Spark Application with Data Written Back to HDFS and Spark UI
        09:08
      • 9.20 Demo: Changing Spark Application Parameters
        06:27
      • 9.21 Demo: Handling Different File Formats
        02:51
      • 9.22 Demo: Spark RDD with Real-World Application
        04:03
      • 9.23 Demo: Optimizing Spark Jobs
        02:56
      • Assisted Practice: Changing Spark Application Params
      • 9.24 Key Takeaways
        00:20
      • Knowledge Check
      • Spark Core Processing RDD
    • Lesson 11 Spark SQL - Processing DataFrames

      26:50
      • 10.1 Spark SQL Processing DataFrames
        00:32
      • 10.2 Spark SQL Introduction
        02:13
      • 10.3 Spark SQL Architecture
        01:25
      • 10.4 DataFrames
        05:21
      • 10.5 Demo: Handling Various Data Formats
        02:05
      • 10.6 Demo: Implement Various DataFrame Operations
        02:18
      • 10.7 Demo: UDF and UDAF
        02:50
      • 10.8 Interoperating with RDDs
        04:45
      • 10.9 Demo: Process DataFrame Using SQL Query
        02:30
      • 10.10 RDD vs DataFrame vs Dataset
        02:34
      • Processing DataFrames
      • 10.11 Key Takeaways
        00:17
      • Knowledge Check
      • Spark SQL - Processing DataFrames
    • Lesson 12 Spark MLlib - Modeling Big Data with Spark

      32:54
      • 11.1 Spark MLlib Modeling Big Data with Spark
        00:38
      • 11.2 Role of Data Scientist and Data Analyst in Big Data
        02:12
      • 11.3 Analytics in Spark
        03:37
      • 11.4 Machine Learning
        03:27
      • 11.5 Supervised Learning
        02:19
      • 11.6 Demo: Classification of Linear SVM
        02:37
      • 11.7 Demo: Linear Regression with Real World Case Studies
        03:41
      • 11.8 Unsupervised Learning
        01:16
      • 11.9 Demo: Unsupervised Clustering K-Means
        02:45
      • Assisted Practice: Unsupervised Clustering K-means
      • 11.10 Reinforcement Learning
        02:02
      • 11.11 Semi-Supervised Learning
        01:17
      • 11.12 Overview of MLlib
        02:59
      • 11.13 MLlib Pipelines
        03:42
      • 11.14 Key Takeaways
        00:22
      • Knowledge Check
      • Spark MLlib - Modeling Big Data with Spark
    • Lesson 13 Stream Processing Frameworks and Spark Streaming

      01:13:16
      • 12.1 Stream Processing Frameworks and Spark Streaming
        00:34
      • 12.2 Streaming Overview
        01:41
      • 12.3 Real-Time Processing of Big Data
        02:45
      • 12.4 Data Processing Architectures
        04:12
      • 12.5 Demo: Real-Time Data Processing
        02:28
      • 12.6 Spark Streaming
        04:21
      • 12.7 Demo: Writing Spark Streaming Application
        03:15
      • 12.8 Introduction to DStreams
        01:52
      • 12.9 Transformations on DStreams
        03:44
      • 12.10 Design Patterns for Using ForeachRDD
        03:25
      • 12.11 State Operations
        00:46
      • 12.12 Windowing Operations
        03:16
      • 12.13 Join Operations: Stream-Dataset Join
        02:13
      • 12.14 Demo: Windowing of Real-Time Data Processing
        02:32
      • 12.15 Streaming Sources
        01:56
      • 12.16 Demo: Processing Twitter Streaming Data
        03:56
      • 12.17 Structured Spark Streaming
        03:54
      • 12.18 Use Case: Banking Transactions
        02:29
      • 12.19 Structured Streaming Architecture Model and Its Components
        04:01
      • 12.20 Output Sinks
        00:49
      • 12.21 Structured Streaming APIs
        03:36
      • 12.22 Constructing Columns in Structured Streaming
        03:07
      • 12.23 Windowed Operations on Event-Time
        03:36
      • 12.24 Use Cases
        01:24
      • 12.25 Demo: Streaming Pipeline
        07:07
      • Spark Streaming
      • 12.26 Key Takeaways
        00:17
      • Knowledge Check
      • Stream Processing Frameworks and Spark Streaming
    • Lesson 14 Spark GraphX

      28:43
      • 13.1 Spark GraphX
        00:35
      • 13.2 Introduction to Graph
        02:38
      • 13.3 GraphX in Spark
        02:41
      • 13.4 Graph Operators
        03:29
      • 13.5 Join Operators
        03:18
      • 13.6 Graph Parallel System
        01:33
      • 13.7 Algorithms in Spark
        03:26
      • 13.8 Pregel API
        02:31
      • 13.9 Use Case of GraphX
        01:02
      • 13.10 Demo: GraphX Vertex Predicate
        02:23
      • 13.11 Demo: PageRank Algorithm
        02:33
      • 13.12 Key Takeaways
        00:17
      • Knowledge Check
      • Spark GraphX
      • 13.14 Project Assistance
        02:17
    • Practice Projects

      • Car Insurance Analysis
      • Transactional Data Analysis
      • K-Means Clustering for the Telecommunication Domain

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.


Course Advisors

  • Ronald van Loon

    Top 10 Big Data and Data Science Influencer, Director - Adversitement

    Named by Onalytica as one of the three most influential people in Big Data, Ronald is also an author for a number of leading Big Data and Data Science websites, including Datafloq, Data Science Central, and The Guardian. He also speaks regularly at renowned events.


Why you should learn Hadoop and Spark

$84.6 Billion by 2021

Projected global Hadoop market size, according to Allied Market Research

1.9 million jobs in the US

Projected jobs for Hadoop data analysts by 2021

Career Opportunities

FAQs

  • What are the prerequisites to learn the Hadoop and Spark basics program?

    Professionals should be familiar with Core Java and SQL before taking this Hadoop and Spark basics program. 

  • How do beginners learn Spark and Hadoop basics?

    Beginners today usually look for online resources to learn Spark and Hadoop. While textual content is easily available on the internet, this free Hadoop and Spark course offers comprehensive video modules to further enrich your learning experience.

  • How long does it take to learn Hadoop and Spark?

    This Hadoop basics course offers 11 hours of in-depth and high-quality video lessons that you can follow at your preferred learning speed. You can access the course 24/7 and return to any previous lesson for better understanding.

  • What should I learn first in the Hadoop and Spark basics program?

    In this Spark basics program, it is recommended to start with an overview of big data, the four Vs of big data, and the reasons distributed systems were introduced.

  • Is the Hadoop and Spark basics program easy to learn?

    The instructors of this Hadoop basics program explain all the concepts from scratch in an easily understandable way. Anyone with a technical background can follow the lessons with ease.

  • What are the basics in a Hadoop and Spark training program?

    Learners enrolling in this Hadoop and Spark fundamentals program are guided through basics such as an introduction to big data analytics, the components of the Hadoop ecosystem, and the Hadoop architecture.

  • What is Hadoop?

    Hadoop is an open-source software framework for distributed data storage and processing that enables applications to run on commodity hardware. It gained popularity over the years for its high-capacity storage, its processing power, and its ability to run a virtually unlimited number of concurrent jobs.
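
    To make this concrete, here is a minimal Scala sketch that lists a directory in HDFS through Hadoop's FileSystem API. It is illustrative only: it assumes the Hadoop client libraries are on the classpath and that fs.defaultFS points at your cluster (for example, via core-site.xml), and the /user/hadoop path is made up.

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsListing {
      def main(args: Array[String]): Unit = {
        // Picks up fs.defaultFS and related settings from core-site.xml on the classpath
        val conf = new Configuration()
        val fs   = FileSystem.get(conf)

        // List the contents of an HDFS directory (the path is purely illustrative)
        fs.listStatus(new Path("/user/hadoop")).foreach { status =>
          println(s"${status.getPath}\t${status.getLen} bytes")
        }

        fs.close()
      }
    }
    ```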
     

  • What is Spark?

    Originally developed at UC Berkeley, Apache Spark is an extremely powerful and fast analytics engine for big data and machine learning. It is used for processing enormous amounts of data via in-memory caching and optimized query execution.
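
    As a taste of that processing model, here is a minimal word-count sketch in Scala. It is a sketch under stated assumptions, not course material: Spark must be on the classpath, and the local[*] master and input.txt path are placeholders for illustration.

    ```scala
    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WordCount")
          .master("local[*]") // run locally for illustration
          .getOrCreate()

        // Split lines into words and count occurrences in parallel
        val counts = spark.sparkContext
          .textFile("input.txt") // placeholder input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.cache() // keep the result in memory for repeated use
        counts.take(10).foreach(println)

        spark.stop()
      }
    }
    ```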

  • What are Hadoop and Spark used for?

    Hadoop is an open-source framework that allows organizations to store and process big data in a parallel and distributed environment. It is used to store and combine data, and it scales from a single server to thousands of machines, each offering low-cost storage and local computation. Spark is an open-source analytics engine that runs on top of such storage and provides libraries for SQL queries, stream processing, machine learning, and graph processing in big data projects.
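
    To illustrate the "store and combine data" part, the following Spark SQL sketch joins two tiny in-memory datasets; in practice the inputs would usually be files stored in HDFS. All names and values here are invented for the example.

    ```scala
    import org.apache.spark.sql.SparkSession

    object CombineData {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CombineData")
          .master("local[*]") // local mode for illustration
          .getOrCreate()
        import spark.implicits._

        // Two small stand-in datasets; real jobs would read these from storage
        val customers = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
        val orders    = Seq((1, 250.0), (1, 80.0), (2, 120.0)).toDF("customer_id", "amount")

        // Combine the datasets with a join, then aggregate per customer
        customers
          .join(orders, customers("id") === orders("customer_id"))
          .groupBy($"name")
          .sum("amount")
          .show()

        spark.stop()
      }
    }
    ```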

  • Why learn Hadoop and Spark together?

    It is recommended to learn Hadoop and Spark together because the two frameworks complement each other in multiple ways. Hadoop reads and writes files to HDFS, providing durable distributed storage, while Spark takes over the data processing in RAM using the Resilient Distributed Dataset (RDD) abstraction. Spark can run independently or alongside a Hadoop cluster that serves as its data source. From a skills point of view, hiring managers and companies are particularly interested in professionals who are well-versed in both Hadoop and Spark.
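
    Here is a minimal sketch of that division of labor, assuming a Spark installation that can reach an HDFS cluster; the hdfs:///data/events.log path is hypothetical.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object HdfsWithSpark {
      def main(args: Array[String]): Unit = {
        // Master URL is supplied externally, e.g. via spark-submit
        val sc = new SparkContext(new SparkConf().setAppName("HdfsWithSpark"))

        // Hadoop (HDFS) provides the durable, distributed storage...
        val events = sc.textFile("hdfs:///data/events.log") // hypothetical path

        // ...while Spark processes the data in memory as an RDD
        val errorCount = events.filter(_.contains("ERROR")).count()
        println(s"Error lines: $errorCount")

        sc.stop()
      }
    }
    ```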
     

  • What are my next best learning options after completing this Hadoop and Spark basics program?

    After completing this Hadoop and Spark basics training program, you can continue with courses such as the Big Data Engineer Master’s Program or the Post Graduate Program in Data Engineering.

  • Will I get a certificate after completing the Hadoop and Spark basics program?

    Yes, you will receive a Course Completion Certificate from SkillUp upon completing the Hadoop and Spark basics program. You can unlock it by logging in to your SkillUp account. As soon as the certificate is unlocked, you will receive an email with a link to your SkillUp learning dashboard at your registered email address. Click the link to view and download your certificate. You can also add the certificate to your resume and share it on social media platforms.

  • What are the career opportunities after learning Hadoop and Spark?

    Professionals who learn Hadoop and Spark open the door to many job opportunities in the field of big data analytics. Business analyst, big data engineer, analytics manager, and data architect are some of the popular job roles in this field that you can target after learning the Hadoop and Spark basics.

  • Who can learn Hadoop and Spark?

    You can learn Hadoop and Spark if you are one of the following:

    • IT professional
    • BI professional
    • Analytics professional
    • Software developer
    • Senior IT professional
    • Project manager
    • Aspiring data scientist
       

  • Can I complete this Hadoop and Spark basics online course in 90 days?

    Yes, the video modules of this Hadoop and Spark basics program are easy to grasp, and you can complete them within 90 days.
