Programming and Deploying Apache Spark Applications
Overview/Description
Apache Spark is a cluster computing framework for fast, large-scale data processing, including data stored in Hadoop. Spark applications can be written in Scala, Java, or Python. In this course, you will learn how to develop Spark applications in any of these languages. You will also learn how to test and deploy applications to a cluster, monitor clusters and applications, and schedule resources for clusters and for individual applications.
Target Audience
Developers familiar with Scala, Python, or Java who want to learn how to program and deploy Spark applications
Prerequisites
None
Expected Duration (hours)
3.0
Lesson Objectives
describe Apache Spark and the main components of a Spark application
download and install Apache Spark on Windows 8.1 Pro N
download and install Apache Spark on Mac OS X Yosemite
download and install Java Development Kit or JDK 8 and build Apache Spark using Simple Build Tool or SBT on Mac OS X Yosemite
use the Spark shell for analyzing data interactively
link an application to Spark
create a SparkContext to initialize Apache Spark
describe Resilient Distributed Datasets or RDDs and create an RDD from a parallelized collection
load external datasets to create Resilient Distributed Datasets or RDDs
distinguish between transformations and actions, describe some of the transformations Spark supports, and apply transformations
describe some of the actions Spark supports and apply actions
pass functions to Spark using anonymous function syntax and static methods in a global singleton object
work with key-value pairs
persist Spark RDDs
use broadcast variables in a Spark operation
use accumulators in Spark operations
use different formats for loading and saving Spark data
use basic Spark SQL for data queries in a Spark application
use basic Spark GraphX to work with graphs in a Spark application
describe how Spark applications run in a cluster
deploy a Spark application to a cluster
unit test a Spark application
describe how to monitor a Spark application or cluster with Web UIs
describe options for scheduling resources across applications in a Spark cluster
describe how to enable a fair scheduler for fair sharing within an application in a Spark cluster
configure fair scheduler pool properties for a Spark context within a cluster
practice programming and deploying a Spark application to a cluster
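For the deployment objective, a packaged application is typically submitted to a cluster with the spark-submit script. The class name, master URL, and JAR path below are placeholders, not values from this course:

```
# Submit a packaged Spark application to a standalone cluster.
# --class:           the application's entry point (hypothetical here)
# --master:          the cluster manager's URL
# --deploy-mode:     run the driver on the cluster rather than locally
# --executor-memory: memory per executor
spark-submit \
  --class com.example.WordCount \
  --master spark://host:7077 \
  --deploy-mode cluster \
  --executor-memory 2G \
  target/wordcount.jar
```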
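The fair-scheduler objectives involve an allocations file plus a per-context property. As a sketch of the shape this configuration takes, pools are declared in an XML file (pool names and values below are examples, not course-prescribed settings):

```
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

The application then points Spark at the file via the `spark.scheduler.allocation.file` configuration property, and a job is assigned to a pool at runtime with `sc.setLocalProperty("spark.scheduler.pool", "production")`.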
Course Number: df_apsf_a01_it_enus
Expertise Level
Beginner