Operationalize and Design with Spark

Overview/Description
In this course, you will learn to operationalize and design batch solutions with Spark. It is one of a series of courses that prepare learners for exam 70-775: Perform Data Engineering on Microsoft Azure HDInsight.

Target Audience
IT professionals who implement and work with big data analytics and engineering workflows using open-source technologies; IT professionals preparing for Microsoft exam 70-775

Prerequisites
None

Expected Duration (hours)
1.2

Lesson Objectives

Operationalize and Design with Spark

start the course

use YARN to share resources between Spark applications

describe how to optimize Spark performance

describe how to tune Spark performance using executors, partitioning, and bucketing

connect to external Spark data sources

describe Spark dataset programs and how to add custom Python and Scala code

identify bottlenecks using Spark SQL query graphs

describe Azure Data Factory (ADF)

create a cluster using Azure Data Factory (ADF)

connect a storage account to a cluster using Azure Data Factory (ADF)

create an on-demand Hadoop cluster in HDInsight

use Apache Oozie in HDInsight

describe how to share metastores and storage accounts between clusters, such as Hive and Spark clusters

compare different storage types for data pipelines

connect to external data sources

Course Number
df_mahd_a05_it_enus

Expertise Level
Intermediate