Programming and Deploying Apache Spark Applications


Overview/Description
Target Audience
Prerequisites
Expected Duration
Lesson Objectives
Course Number
Expertise Level



Overview/Description
Apache Spark is a cluster computing framework for fast, large-scale data processing, including data stored in Hadoop. Spark applications can be written in Scala, Java, or Python. In this course, you will learn how to develop Spark applications and how to test and deploy them to a cluster, monitor clusters and applications, and schedule resources for clusters and individual applications.

Target Audience
Developers familiar with Scala, Python, or Java who want to learn how to program and deploy Spark applications

Prerequisites
None

Expected Duration (hours)
3.0

Lesson Objectives

Programming and Deploying Apache Spark Applications

  • start the course
  • describe Apache Spark and the main components of a Spark application
  • download and install Apache Spark on Windows 8.1 Pro N
  • download and install Apache Spark on Mac OS X Yosemite
  • download and install the Java Development Kit or JDK 8, and build Apache Spark with the Simple Build Tool or SBT, on Mac OS X Yosemite
  • use the Spark shell for analyzing data interactively
  • link an application to Spark
  • create a SparkContext to initialize Apache Spark
  • describe Resilient Distributed Datasets or RDDs and create an RDD from a parallelized collection
  • load external datasets to create RDDs (both creation methods are sketched below)
  • distinguish between transformations and actions, describe some of the transformations supported by Spark, and use transformations
  • describe some of the actions supported by Spark and use them
  • use anonymous function syntax and static methods in a global singleton to pass functions to Spark
  • work with key-value pairs (see the word-count sketch below)
  • persist Spark RDDs
  • use broadcast variables in a Spark operation
  • use accumulators in Spark operations (broadcast variables and accumulators are sketched below)
  • use different formats for loading and saving Spark data
  • use basic Spark SQL for data queries in a Spark application (sketched below)
  • use basic Spark GraphX to work with graphs in a Spark application (sketched below)
  • describe how Spark applications run in a cluster
  • deploy a Spark application to a cluster (a spark-submit example appears below)
  • unit test a Spark application (sketched below)
  • describe how to monitor a Spark application or cluster with Web UIs
  • describe options for scheduling resources across applications in a Spark cluster
  • describe how to enable a fair scheduler for fair sharing within an application in a Spark cluster
  • configure fair scheduler pool properties for a Spark context within a cluster (sketched below)
  • practice programming and deploying a Spark application to a cluster
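
The sketches that follow illustrate several of the objectives above. They are minimal Scala examples written for this description, not excerpts from the course. First, initializing Spark with a SparkContext and creating RDDs both ways; the application name, master URL, and input path are invented:

    import org.apache.spark.{SparkConf, SparkContext}

    // In the Spark shell, `sc` is predefined; a standalone application
    // creates its own SparkContext
    val conf = new SparkConf().setAppName("RddBasics").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // RDD from a parallelized in-memory collection
    val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))
    println(numbers.count())   // action: prints 5

    // RDD from an external dataset; loading is lazy, so the
    // (hypothetical) file is not read until an action runs
    val lines = sc.textFile("data/input.txt")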
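
A word-count sketch covering transformations, actions, key-value pairs, RDD persistence, and the two ways of passing functions to Spark (anonymous function syntax and a static method in a global singleton); it assumes the sc and lines values from the previous sketch:

    import org.apache.spark.storage.StorageLevel

    object WordFuncs {
      // a static method in a global singleton, an alternative to
      // anonymous function syntax for passing behavior to Spark
      def toPair(word: String): (String, Int) = (word, 1)
    }

    val words  = lines.flatMap(line => line.split("\\s+"))  // transformation, anonymous function
    val pairs  = words.map(WordFuncs.toPair)                // key-value pairs via the singleton method
    val counts = pairs.reduceByKey(_ + _)                   // transformation on a pair RDD
    counts.persist(StorageLevel.MEMORY_ONLY)                // keep the result cached across actions

    counts.take(10).foreach(println)                        // action: triggers the computation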
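
Broadcast variables and accumulators, again assuming sc from the first sketch; the longAccumulator call is the Spark 2.x API, while releases contemporary with this course used sc.accumulator:

    val stopWords  = sc.broadcast(Set("a", "an", "the"))  // read-only value shipped to executors
    val emptyLines = sc.longAccumulator("emptyLines")     // written by tasks, read on the driver

    val cleaned = sc.parallelize(Seq("the quick fox", "", "a lazy dog"))
      .filter { line =>
        if (line.isEmpty) { emptyLines.add(1); false } else true
      }
      .flatMap(_.split(" "))
      .filter(word => !stopWords.value.contains(word))

    println(cleaned.collect().mkString(", "))           // quick, fox, lazy, dog
    println(s"empty lines seen: ${emptyLines.value}")   // 1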
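
A basic Spark SQL query, shown with the Spark 2.x SparkSession entry point (Spark 1.x used SQLContext); the table and rows are invented:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("SqlSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // a tiny DataFrame from an in-memory sequence
    val people = Seq(("Alice", 34), ("Bob", 28)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    spark.sql("SELECT name FROM people WHERE age > 30").show()
    spark.stop()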
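
A basic GraphX sketch, assuming sc from the first sketch; vertex IDs, labels, and edges are invented:

    import org.apache.spark.graphx.{Edge, Graph}

    val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph    = Graph(vertices, edges)

    println(s"edges in graph: ${graph.numEdges}")
    graph.inDegrees.collect().foreach { case (id, deg) =>
      println(s"vertex $id has $deg incoming edges")
    }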
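
Deployment to a cluster typically goes through the spark-submit script; the class name, master URL, and jar path below are invented:

    spark-submit \
      --class com.example.RddBasics \
      --master spark://master-host:7077 \
      target/scala-2.11/rdd-basics.jar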
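
A unit-test sketch using ScalaTest (AnyFunSuite is the ScalaTest 3.1+ style; the choice of test framework is an assumption, not mandated by the course); a local-mode SparkContext keeps the test self-contained:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.funsuite.AnyFunSuite

    class WordCountSuite extends AnyFunSuite {
      test("reduceByKey sums per-word counts") {
        // local[2] runs Spark inside the test JVM with two threads
        val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local[2]"))
        try {
          val counts = sc.parallelize(Seq("a", "b", "a"))
            .map(word => (word, 1))
            .reduceByKey(_ + _)
            .collectAsMap()
          assert(counts("a") == 2)
          assert(counts("b") == 1)
        } finally {
          sc.stop()   // release the context so later tests can create one
        }
      }
    }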
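
Enabling the fair scheduler and assigning a pool for a Spark context; spark.scheduler.mode, spark.scheduler.allocation.file, and the spark.scheduler.pool local property are standard Spark settings, while the pool name and file path are invented:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("FairSchedulingDemo")
      .setMaster("local[*]")
      .set("spark.scheduler.mode", "FAIR")   // FIFO is the default
      .set("spark.scheduler.allocation.file", "conf/fairscheduler.xml")
    val sc = new SparkContext(conf)

    // jobs submitted from this thread now run in the named pool
    sc.setLocalProperty("spark.scheduler.pool", "production")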

Course Number
df_apsf_a01_it_enus

Expertise Level
Beginner