Data Warehousing with Hadoop: Spark, HDInsight and Cluster Management


Overview/Description

Discover how to work with Spark and its in-memory data management capabilities. This course also covers how to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI.



Expected Duration (hours)
0.9

Lesson Objectives

  • specify the essential capabilities of Spark and its core architectural components
  • list the data structures used in Spark, along with the RDD and lineage concepts
  • set up Spark clusters using PowerShell and an Azure Resource Manager template
  • describe the relationship between Spark SQL and Hive
  • specify the essential concepts of Spark SQL and DataFrame
  • demonstrate how to customize HDInsight clusters using bootstrap
  • install Hadoop applications on Azure HDInsight
  • illustrate the use of Ambari as a tool for managing clusters
  • manage Hadoop clusters in HDInsight using Azure CLI
  • specify approaches for troubleshooting and tuning HDInsight clusters
  • monitor Hadoop clusters in HDInsight to collect metrics for analysis
  • set up Spark clusters and manage the clusters using the Ambari GUI

Course Number
it_dfdwha_04_enus

Expertise Level
Intermediate