Final Exam: Data Wrangler


Overview/Description
Expected Duration
Lesson Objectives
Course Number
Expertise Level



Overview/Description

Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.



Expected Duration (hours)
0.0

Lesson Objectives

Final Exam: Data Wrangler

  • apply a group by transformation to aggregate with a conditional value
  • apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset (see the pandas sketch after this list)
  • build and run the application and confirm the output using HDFS from both the command line and the web application
  • change column values by applying functions
  • change date formats to the ISO 8601 standard
  • code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
  • compare managed and external tables in Hive and how they relate to the underlying data
  • configure and test PyMongo in a Python program
  • configure the Reducer and the Driver for the inverted index application
  • create and analyze categories of data in a dataset using windows
  • create and configure Pandas DataFrame objects
  • create and configure Pandas Series objects
  • create and instantiate a directed acyclic graph in Airflow
  • create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame (see the PySpark sketch after this list)
  • create the driver program for the MapReduce application
  • define and run a join query involving two related tables
  • define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • define the Mapper for a MapReduce application to build an inverted index from a set of text files
  • define what a window is in the context of Spark DataFrames and when windows can be used
  • demonstrate how to ingest data using Sqoop
  • describe data ingestion approaches and compare Avro and Parquet file format benefits
  • describe the benefits that can be achieved using serverless and lambda architectures
  • describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data in data lakes
  • describe the different primitive and complex data types available in Hive
  • extract subsets of data using filtering
  • flatten multi-dimensional data structures by chaining lateral views
  • handle common errors encountered when reading CSV data
  • identify and troubleshoot missing data
  • identify and work with time-series data
  • identify kinds of masking operations
  • implement a multi-stage aggregation pipeline
  • implement data lakes using AWS
  • implement deep learning using Keras
  • install MongoDB and implement data partitioning using MongoDB
  • list the prominent distributed data models along with their associated implementation benefits
  • list the various frameworks that can be used to process data from data lakes
  • load a few rows of data into a table and query it with simple select statements
  • load multiple sheets from an Excel document
  • perform create, read, update, and delete operations on a MongoDB document (see the PyMongo sketch after this list)
  • perform statistical operations on DataFrames
  • plot pie charts, box plots, and scatter plots using Pandas
  • recall the prominent data pattern implementations in microservices
  • recognize the capabilities of Microsoft machine learning tools
  • recognize the machine learning tools provided by AWS for data analysis
  • recognize the read and write optimizations in MongoDB
  • set up and install Apache Airflow
  • split columns based on a pattern
  • test Airflow tasks using the airflow command line utility
  • trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
  • use a regular expression to extract data into a new column
  • use a Spark accumulator as a counter
  • use createIndex to build an index on a collection
  • use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
  • use the alter table statement to change the definition of a Hive table
  • use the find operation to select documents from a collection
  • use the mongoexport tool to export data from MongoDB to JSON and CSV
  • use the mongoimport tool to import from JSON and CSV
  • use the UNION and UNION ALL operations on table data and distinguish between the two
  • work with data in the form of key-value pairs (map data structures) in Hive
  • work with scikit-learn to implement machine learning
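
The pandas grouping and aggregation objectives above can be illustrated with a minimal sketch. The example below builds a small, hypothetical sales DataFrame (the column names and the threshold of 100 are assumptions made for illustration, not values from the course), runs a plain group-by aggregation, and then aggregates with a conditional value.

    import pandas as pd

    # Small, hypothetical dataset used only for illustration.
    df = pd.DataFrame({
        "region":  ["East", "East", "West", "West", "West"],
        "product": ["A", "B", "A", "B", "B"],
        "revenue": [120.0, 80.0, 200.0, 50.0, 75.0],
    })

    # Plain group-by aggregation: total and average revenue per region.
    summary = df.groupby("region")["revenue"].agg(["sum", "mean"])
    print(summary)

    # Group by with a conditional value: count the rows in each region
    # whose revenue exceeds an assumed threshold of 100.
    high_value = (df.assign(over_100=df["revenue"] > 100)
                    .groupby("region")["over_100"]
                    .sum())
    print(high_value)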
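
For the Spark DataFrame objectives, the sketch below shows one way to load a CSV file into a DataFrame and apply a couple of simple transformations with PySpark. It assumes a local Spark installation and a file named auto_prices.csv with a numeric price column; the file and column names are assumptions chosen to echo the auto prices dataset mentioned above.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("csv-demo").getOrCreate()

    # Read the CSV file, inferring column types from the data.
    df = spark.read.csv("auto_prices.csv", header=True, inferSchema=True)

    # Simple transformations: filter rows and derive a new column.
    expensive = (df.filter(F.col("price") > 20000)
                   .withColumn("price_thousands", F.col("price") / 1000))

    expensive.show(5)
    spark.stop()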
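
For the MongoDB objectives, the sketch below walks through basic create, read, update, and delete operations and index creation using PyMongo. It assumes a MongoDB server on localhost; the database, collection, field names, and values are illustrative only. Note that the shell's createIndex corresponds to create_index in PyMongo.

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    cars = client["demo"]["cars"]

    # Create: insert a document.
    cars.insert_one({"make": "Toyota", "model": "Corolla", "price": 21000})

    # Read: find documents matching a filter.
    for doc in cars.find({"price": {"$lt": 25000}}):
        print(doc)

    # Update: modify the first matching document.
    cars.update_one({"model": "Corolla"}, {"$set": {"price": 20500}})

    # Delete: remove the first matching document.
    cars.delete_one({"model": "Corolla"})

    # Index: build an ascending index on the "make" field.
    cars.create_index([("make", ASCENDING)])

    client.close()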

Course Number
it_fedads_02_enus

Expertise Level
Intermediate