Operationalize and Design with Spark


Overview/Description
In this course, you will learn to operationalize and design batch solutions with Spark. It is one of a series of courses that prepare learners for exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight.

Target Audience
IT professionals who implement and work with big data analytics and engineering workflows using open-source technologies, and IT professionals preparing for Microsoft exam 70-775

Prerequisites
None

Expected Duration (hours)
1.2

Lesson Objectives

Operationalize and Design with Spark

  • start the course
  • use YARN to share resources between Spark applications
  • describe how to optimize Spark performance
  • describe how to tune Spark performance using executors, partitioning, and bucketing
  • connect to external Spark data sources
  • describe Spark dataset programs and how to add custom Python and Scala code
  • identify bottlenecks using Spark SQL query graphs
  • describe Azure Data Factory (ADF)
  • create a cluster using ADF
  • connect a storage account to a cluster using ADF
  • create an on-demand Hadoop cluster in HDInsight
  • use Apache Oozie in HDInsight
  • describe how to share metastores and storage accounts between clusters, such as Hive and Spark clusters
  • compare different storage types for data pipelines
  • connect to external data sources

Course Number
df_mahd_a05_it_enus

Expertise Level
Intermediate