Using Apache Spark for AI Development

Overview/Description

Spark is a leading open-source cluster-computing framework used for distributed data processing and machine learning. Although not designed primarily for AI, Spark lets you take advantage of data parallelism and the large distributed systems used in AI development.

AI practitioners should recognize when to use Spark for a particular application. In this course, you'll explore advanced techniques for working with Apache Spark and identify the key advantages of using Spark over other platforms. You'll define resilient distributed datasets (RDDs) and explore several workflows related to them.

You'll then learn how to work with a Spark DataFrame, identifying its features and use cases. Finally, you'll learn how to create a machine learning pipeline using Spark ML Pipelines.

Expected Duration (hours)
0.6

Lesson Objectives

Using Apache Spark for AI Development

discover the key concepts covered in this course

identify cases in which it is advantageous to use Spark over other platforms

define a resilient distributed dataset and identify typical sources of data

specify the unique features of a resilient distributed dataset

describe how to create a resilient distributed dataset

list possible operations with resilient distributed datasets and define their roles

list potential sources of data for a Spark DataFrame and outline how to import these into Spark

name the features of a Spark DataFrame and some useful operations with which to use it

outline how to create a Spark DataFrame

specify how Spark ML Pipelines can be used for creating and tuning ML models

describe fundamental concepts of Spark ML Pipelines

create an ML pipeline using Spark ML Pipelines

summarize the key concepts covered in this course

Course Number
it_aiexspdj_01_enus

Expertise Level
Intermediate