Accessing Data with Spark: An Introduction to Spark

Overview/Description

Explore the basics of Apache Spark, an analytics engine used for big data processing. It's an open-source cluster computing framework that can run on top of Hadoop. Discover how it allows operations on data with both its own library methods and SQL, while delivering great performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark Session, and Master and Worker nodes. Install PySpark. Then, initialize a Spark Context and a Spark DataFrame from the contents of an RDD. Configure a DataFrame with a map function. Retrieve and transform data. Finally, convert Spark DataFrames to Pandas DataFrames and vice versa.

Expected Duration (hours)
1.1

Lesson Objectives

discover the key concepts covered in this course

recognize where Spark fits in with Hadoop and its components

describe Spark RDDs and their characteristics, including what makes them resilient and distributed

identify the types of operations which are permitted on an RDD and describe how RDD transformations are lazily evaluated

distinguish between RDDs and DataFrames and describe the relationship between the two

list the crucial components of Spark and the relationships between them and recognize the functions of the Spark Session, Master and Worker nodes

install PySpark and initialize a Spark Context

create and load data into an RDD

initialize a Spark DataFrame from the contents of an RDD

work with Spark DataFrames containing both primitive and structured data types

define the contents of a DataFrame using the SQLContext

apply the map() function on an RDD to configure a DataFrame with column headers

retrieve required data from within a DataFrame and define and apply transformations on a DataFrame

convert Spark DataFrames to Pandas DataFrames and vice versa

describe basic Spark concepts
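
As a rough illustration of the lazy evaluation objective above, the following pure-Python sketch models the idea without using Spark itself. ToyRDD and its methods are hypothetical names invented for this example and are not part of the PySpark API; real RDDs are also partitioned across worker nodes, which this toy ignores. Transformations only record what should happen, and the work runs when an action such as collect() is called.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD (not the PySpark API)."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # recorded transformations, not yet applied

    def map(self, fn):
        # Transformation: return a new ToyRDD and do no work yet.
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def collect(self):
        # Action: replay every recorded transformation over the source data.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []  # tracks when the mapped function actually runs


def double(x):
    calls.append(x)
    return x * 2


rdd = ToyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before = list(calls)  # still empty: nothing has been evaluated
result = rdd.collect()      # the action triggers the recorded work
```

Because each ToyRDD keeps its source data and its list of recorded steps, calling collect() again simply recomputes the result from scratch. That loosely mirrors how a lineage of transformations lets lost data be rebuilt, which is one reading of "resilient" in the objectives above.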

Course Number
it_dsadskdj_01_enus

Expertise Level
Beginner