Continued Dataproc Operations


Overview/Description
Dataproc implementations for big data can be executed in a variety of ways. This course continues the study of Dataproc implementations with Spark and Hadoop using Cloud Shell, and introduces BigQuery and the PySpark REPL.

Target Audience
Data professionals who are responsible for provisioning and optimizing big data solutions, and data enthusiasts getting started with Google Cloud Platform

Prerequisites
None

Expected Duration (hours)
1.0

Lesson Objectives

Continued Dataproc Operations

  • start the course
  • describe the various Spark and Hadoop processes that can be performed with Dataproc
  • recognize the benefits of separating storage and compute services using Cloud Dataproc
  • recall the process of monitoring and logging Dataproc jobs
  • demonstrate the process of using an SSH tunnel to connect to the master and worker nodes in a cluster
  • define the Spark REPL package and describe how it is used in Linux
  • describe the compute and storage processes and the benefits of their separation and the virtualized distribution of Hadoop
  • define BigQuery and its benefits for large-scale analytics
  • describe the MapReduce programming model
  • demonstrate the process of submitting multiple jobs with Dataproc
  • recognize the various Dataproc and Cloud Shell job operations and implementations
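For reference, the Cloud Shell operations covered by the objectives above typically take the form of gcloud commands like the following. This is a sketch only: the cluster name, region, zone, and script filename are placeholders, not values taken from the course.

```shell
# Submit a PySpark job to an existing Dataproc cluster from Cloud Shell.
# (my-cluster, us-central1, and word_count.py are placeholder values.)
gcloud dataproc jobs submit pyspark word_count.py \
    --cluster=my-cluster \
    --region=us-central1

# List jobs on the cluster, e.g. when monitoring multiple submitted jobs.
gcloud dataproc jobs list \
    --region=us-central1 \
    --cluster=my-cluster

# Open an SSH tunnel (SOCKS proxy on local port 1080) to the cluster's
# master node -- Dataproc names it <cluster>-m -- to reach web UIs such
# as the YARN ResourceManager.
gcloud compute ssh my-cluster-m \
    --zone=us-central1-a \
    -- -D 1080 -N
```

These commands require an authenticated gcloud environment and an existing cluster, so they are shown here for orientation rather than as a runnable script.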
Course Number
cl_gcde_a07_it_enus

Expertise Level
Intermediate