PySpark Course Overview

This PySpark course gives you an overview of Apache Spark and shows how to use it from Python through the PySpark interface. The training teaches you how to build and implement data-intensive applications, including machine learning applications, using Spark RDD, Spark SQL, Spark MLlib, Spark Streaming, HDFS, Flume, Spark GraphX, and Kafka.

Skills Covered

  • Spark 2.0 architecture
  • Spark SQL
  • Spark MLlib
  • Sqoop
  • Kafka
  • Flume
  • Spark Streaming
  • Spark DataFrames
  • Schemas for RDDs, lazy execution, and transformations
  • Aggregate, transform, filter, and sort data with DataFrames

Training Options

Self-Paced Learning

$ 899

  • num_of_days days of access to high-quality, self-paced learning content designed by industry experts

PySpark Course Curriculum

Eligibility

The global market for Big Data analytics is booming, opening up exciting opportunities for IT professionals. This PySpark course is ideal for freshers who want to start a career in Big Data; developers and architects; BI/ETL/DW professionals; mainframe professionals; Big Data architects, engineers, and developers; and data scientists and analytics professionals.

Pre-requisites

There are no prerequisites for this PySpark training course, although prior knowledge of Python programming and SQL is helpful.

Course Content

  • PySpark Training

    • Lesson 1 A Brief Primer on PySpark

      14:52
      • 1.1 A Brief Primer on PySpark
        05:52
      • 1.2 Brief Introduction to Spark
        02:04
      • 1.3 Apache Spark Stack
        01:38
      • 1.4 Spark Execution Process
        01:26
      • 1.5 Newest Capabilities of PySpark
        01:56
      • 1.6 Cloning GitHub Repository
        01:56
    • Lesson 2 Resilient Distributed Datasets

      38:44
      • 2.1 Resilient Distributed Datasets
        01:49
      • 2.2 Creating RDDs
        04:38
      • 2.3 Schema of an RDD
        02:17
      • 2.4 Understanding Lazy Execution
        02:11
      • 2.5 Introducing Transformations – .map(…)
        03:57
      • 2.6 Introducing Transformations – .filter(…)
        02:23
      • 2.7 Introducing Transformations – .flatMap(…)
        06:14
      • 2.8 Introducing Transformations – .distinct(…)
        03:27
      • 2.9 Introducing Transformations – .sample(…)
        03:15
      • 2.10 Introducing Transformations – .join(…)
        04:17
      • 2.11 Introducing Transformations – .repartition(…)
        04:16
    • Lesson 3 Resilient Distributed Datasets and Actions

      35:27
      • 3.1 Resilient Distributed Datasets and Actions
        05:43
      • 3.2 Introducing Actions – .collect(…)
        02:15
      • 3.3 Introducing Actions – .reduce(…) and .reduceByKey(…)
        02:59
      • 3.4 Introducing Actions – .count()
        02:36
      • 3.5 Introducing Actions – .foreach(…)
        01:51
      • 3.6 Introducing Actions – .aggregate(…) and .aggregateByKey(…)
        04:55
      • 3.7 Introducing Actions – .coalesce(…)
        02:05
      • 3.8 Introducing Actions – .combineByKey(…)
        03:11
      • 3.9 Introducing Actions – .histogram(…)
        01:50
      • 3.10 Introducing Actions – .sortBy(…)
        02:38
      • 3.11 Introducing Actions – Saving Data
        03:10
      • 3.12 Introducing Actions – Descriptive Statistics
        02:14
    • Lesson 4 DataFrames and Transformations

      32:33
      • 4.1 DataFrames and Transformations
        01:35
      • 4.2 Creating DataFrames
        04:16
      • 4.3 Specifying Schema of a DataFrame
        06:00
      • 4.4 Interacting with DataFrames
        01:36
      • 4.5 The .agg(…) Transformation
        03:19
      • 4.6 The .sql(…) Transformation
        03:57
      • 4.7 Creating Temporary Tables
        02:31
      • 4.8 Joining Two DataFrames
        03:54
      • 4.9 Performing Statistical Transformations
        03:55
      • 4.10 The .distinct(…) Transformation
        01:30
    • Lesson 5 Data Processing with Spark DataFrames

      27:16
      • 5.1 Data Processing with Spark DataFrames
        06:29
      • 5.2 Filtering Data
        01:31
      • 5.3 Aggregating Data
        02:34
      • 5.4 Selecting Data
        02:24
      • 5.5 Transforming Data
        01:40
      • 5.6 Presenting Data
        01:34
      • 5.7 Sorting DataFrames
        01:00
      • 5.8 Saving DataFrames
        04:28
      • 5.9 Pitfalls of UDFs
        03:38
      • 5.10 Repartitioning Data
        01:58
  • Free Course
  • Python for Data Science

    • Lesson 1 - Welcome

      02:28
      • Welcome
        02:28
      • Learning Objectives
    • Lesson 2 - Python Basics

      11:55
      • 2.1 Learning Objectives
      • 2.2 Your first program
        01:15
      • 2.3 Types
        02:57
      • 2.4 Expressions and Variables
        03:50
      • 2.5 Write your First Python Code
      • 2.6 String Operations
        03:53
      • 2.7 String Operations
    • Lesson 3 - Python Data Structures

      16:22
      • 3.1 Learning Objectives
      • 3.2 Lists and Tuples
        08:46
      • 3.3 Lists and Tuples
      • 3.4 Sets
        05:12
      • 3.5 Sets
      • 3.6 Dictionaries
        02:24
      • 3.7 Dictionaries
    • Lesson 4 - Python Programming Fundamentals

      41:08
      • 4.1 Learning Objectives
      • 4.2 Conditions and Branching
        10:13
      • 4.3 Conditions and Branching
      • 4.4 Loops
        06:40
      • 4.5 Loops
      • 4.6 Functions
        13:28
      • 4.7 Functions
      • 4.8 Objects and Classes
        10:47
      • 4.9 Objects and Classes
    • Lesson 5 - Working with Data in Python

      12:35
      • 5.1 Learning Objectives
      • 5.2 Reading files with open
        03:38
      • 5.3 Reading Files
      • 5.4 Writing files with open
        02:49
      • 5.5 Writing Files
      • 5.6 Loading data with Pandas
        04:07
      • 5.7 Working with and Saving data with Pandas
        02:01
      • 5.8 Loading Data and Viewing Data
    • Lesson 6 - Working with Numpy Arrays

      18:26
      • 6.1 Learning Objectives
      • 6.2 Numpy One-Dimensional Arrays
        11:18
      • 6.3 Working with One-Dimensional Numpy Arrays
      • 6.4 Numpy Two-Dimensional Arrays
        07:08
      • 6.5 Working with Two-Dimensional Numpy Arrays
    • Lesson 7 - Course Summary

      01:13
      • Course Summary
        01:13
      • Unlocking IBM Certificate

PySpark Exam & Certification

PySpark Certificate
  • Who provides the certification and how long is it valid for?

    Upon successful completion of the PySpark certification training, Simplilearn will provide you with an industry-recognized course completion certificate which has lifelong validity.

  • What do I need to do to unlock my Simplilearn certificate?

    To obtain the PySpark course certification, you must complete the online self-learning training.

PySpark FAQs

  • What is PySpark?

    Apache Spark is an open-source, distributed cluster-computing framework used for large-scale batch and real-time (streaming) analytics. Python is an open-source programming language with a rich set of libraries for diverse applications. PySpark is the Python API for Spark and is widely used for Big Data analytics: it lets programmers combine the simplicity of Python with the power of Apache Spark.
     

  • What is RDD in PySpark?

    RDD stands for Resilient Distributed Dataset, the primary building block of Apache Spark. An RDD is an immutable, distributed collection of objects. Each RDD is divided into logical partitions that may be computed on different nodes of the cluster.
     

  • Is PySpark a programming language?

    PySpark is not a programming language. It is the Python API for Apache Spark, which Python developers can use to build in-memory, distributed data processing applications.

  • Who are the instructors and how are they selected?

    All of our highly qualified PySpark trainers are Big Data industry experts with years of relevant industry experience. Each of them has gone through a rigorous selection process that includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating remain on our faculty.

  • How do I enroll in this PySpark certification training?

    You can enroll in this PySpark certification training on our website and make an online payment using any of the following options:

    • Visa Credit or Debit Card
    • MasterCard
    • American Express
    • Diners Club
    • PayPal

    Once payment is received, you will automatically receive a payment receipt and access information via email.

  • How can I learn more about this PySpark course?

    Contact us using the form on the right of any page on the Simplilearn website, or select the Live Chat link. Our customer service representatives will be able to give you more details.

  • What is Global Teaching Assistance?

    Our teaching assistants are a dedicated team of subject matter experts here to help you get certified on your first attempt. They engage students proactively to ensure the course path is being followed and help you enrich your learning experience, from class onboarding to project mentoring and job assistance.

  • Can I cancel my enrollment? Will I get a refund?

    Yes, you can cancel your enrollment if necessary. We will refund the course price after deducting an administration fee. To learn more, you can view our refund policy.

  • What is covered under the 24/7 Support promise?

    We offer 24/7 support through email, chat, and calls. We also have a dedicated team that provides on-demand assistance through our community forum. What’s more, you will have lifetime access to the community forum, even after completion of your course with us.
