Data Warehousing with Hadoop: Spark, HDInsight and Cluster Management
Overview/Description
Discover how to work with Spark and its in-memory data management capabilities. This course also covers how to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI.
Expected Duration (hours)
0.9
Lesson Objectives
specify the essential capabilities of Spark and its main architectural components
list the data structures used in Spark, along with the RDD and lineage concepts
set up Spark clusters using PowerShell and an Azure Resource Manager template
describe the relationship between Spark SQL and Hive
specify the essential concepts of Spark SQL and DataFrame
demonstrate how to customize HDInsight clusters using bootstrap
install Hadoop applications on Azure HDInsight
illustrate the use of Ambari as a tool to manage clusters
manage Hadoop clusters in HDInsight using Azure CLI
specify the approach to troubleshooting and tuning HDInsight clusters
monitor Hadoop clusters in HDInsight to collect metrics for analysis
set up Spark clusters and manage them using the Ambari GUI
Course Number: it_dfdwha_04_enus
Expertise Level
Intermediate