Final Exam: Data Wrangler
Overview/Description
Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.
Expected Duration (hours)
0.0
Lesson Objectives
Final Exam: Data Wrangler
apply a group by transformation to aggregate with a conditional value
apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset (see the pandas grouping sketch after this list)
build and run the application and confirm the output using HDFS from both the command line and the web application
change column values by applying functions
change date formats to the ISO 8601 standard
code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
compare managed and external tables in Hive and how they relate to the underlying data
configure and test PyMongo in a Python program
configure the Reducer and the Driver for the inverted index application
create and analyze categories of data in a dataset using windows (see the Spark window sketch after this list)
create and configure pandas DataFrame objects
create and configure pandas Series objects
create and instantiate a directed acyclic graph in Airflow (see the DAG sketch after this list)
create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame (see the CSV sketch after this list)
create the driver program for the MapReduce application
define and run a join query involving two related tables
define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
define the Mapper for a MapReduce application to build an inverted index from a set of text files
define what a window is in the context of Spark DataFrames and when windows can be used
demonstrate how to ingest data using Sqoop
describe data ingestion approaches and compare Avro and Parquet file format benefits
describe the beneficial features of serverless and Lambda architectures
describe the data processing strategies provided by MapReduce v2, Hive, Pig, and YARN for processing data in data lakes
describe the different primitive and complex data types available in Hive
extract subsets of data using filtering
flatten multi-dimensional data structures by chaining lateral views
handle common errors encountered when reading CSV data
identify and troubleshoot missing data
identify and work with time-series data
identify kinds of masking operations
implement a multi-stage aggregation pipeline (see the MongoDB pipeline sketch after this list)
implement data lakes using AWS
implement deep learning using Keras
install MongoDB and implement data partitioning using MongoDB
list the prominent distributed data models along with their associated implementation benefits
list the various frameworks that can be used to process data from data lakes
load a few rows of data into a table and query it with simple select statements
load multiple sheets from an Excel document
perform create, read, update, and delete operations on a MongoDB document (see the PyMongo sketch after this list)
perform statistical operations on DataFrames
plot pie charts, box plots, and scatter plots using pandas
recall the prominent data patterns implemented in microservices
recognize the capabilities of Microsoft machine learning tools
recognize the machine learning tools provided by AWS for data analysis
recognize the read and write optimizations in MongoDB
set up and install Apache Airflow
split columns based on a pattern
test Airflow tasks using the airflow command line utility
trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
use a regular expression to extract data into a new column (see the string-handling sketch after this list)
use a Spark accumulator as a counter (see the accumulator sketch after this list)
use createIndex to build an index on a collection
use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
use the alter table statement to change the definition of a Hive table
use the find operation to select documents from a collection
use the mongoexport tool to export data from MongoDB to JSON and CSV
use the mongoimport tool to import from JSON and CSV
use the UNION and UNION ALL operations on table data and distinguish between the two
work with data in the form of key-value pairs using map data structures in Hive
work with scikit-learn to implement machine learning (see the scikit-learn sketch after this list)
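
The short sketches below illustrate several of the techniques named in these objectives. They are minimal, hedged examples, not course materials: every dataset, file path, column, and identifier in them is an assumption made for illustration. First, a pandas sketch of grouping and aggregation, including an aggregation over a conditional value.

import pandas as pd

# Invented sales-style data for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "revenue": [100, 250, 80, 300],
})

# Total and mean revenue per region.
summary = df.groupby("region")["revenue"].agg(["sum", "mean"])

# Aggregate with a conditional value: count rows where revenue exceeds 100.
high_value = df.groupby("region")["revenue"].agg(lambda s: (s > 100).sum())
print(summary)
print(high_value)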
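A pandas sketch of the string-handling objectives: splitting a column on a pattern, extracting data into a new column with a regular expression, and changing dates to the ISO 8601 standard. The column names and date format are assumptions.

import pandas as pd

df = pd.DataFrame({"raw": ["audi-13950", "bmw-16430"],
                   "sold": ["03/15/2021", "07/01/2021"]})

# Split one column into two based on a delimiter pattern.
df[["make", "price"]] = df["raw"].str.split("-", expand=True)

# Use a regular expression to extract the digits into a new column.
df["price_num"] = df["raw"].str.extract(r"(\d+)", expand=False).astype(int)

# Change the date format to the ISO 8601 standard (YYYY-MM-DD).
df["sold_iso"] = pd.to_datetime(df["sold"], format="%m/%d/%Y").dt.strftime("%Y-%m-%d")
print(df)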
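A PySpark sketch of reading a CSV into a Spark DataFrame, trimming and cleaning it, and creating a view as a precursor to running SQL queries. The file name and columns are assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("csv-demo").getOrCreate()
df = spark.read.csv("autos.csv", header=True, inferSchema=True)

# Trim a string column and drop rows with missing prices before
# exposing the DataFrame to SQL via a temporary view.
cleaned = (df.withColumn("make", F.trim(F.col("make")))
             .dropna(subset=["price"]))
cleaned.createOrReplaceTempView("autos")
spark.sql("SELECT make, AVG(price) AS avg_price FROM autos GROUP BY make").show()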
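A PySpark sketch of a window over a Spark DataFrame: rows are ranked within each category, which is the kind of per-category analysis windows enable. The data is invented.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window-demo").getOrCreate()
df = spark.createDataFrame(
    [("east", 100), ("east", 250), ("west", 80), ("west", 300)],
    ["region", "revenue"],
)

# Partition by region, order by revenue, and rank each row within its partition.
w = Window.partitionBy("region").orderBy(F.desc("revenue"))
df.withColumn("rank", F.rank().over(w)).show()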
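A PySpark sketch using an accumulator as a counter, here counting malformed records encountered while mapping over an RDD. The records are invented.

from pyspark.sql import SparkSession

sc = SparkSession.builder.appName("acc-demo").getOrCreate().sparkContext
bad_rows = sc.accumulator(0)

def parse(line):
    try:
        return int(line)
    except ValueError:
        bad_rows.add(1)  # counter update; read on the driver after an action runs
        return 0

total = sc.parallelize(["1", "2", "oops", "4"]).map(parse).sum()
print("total:", total, "bad rows:", bad_rows.value)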
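An Airflow sketch that creates and instantiates a directed acyclic graph with two dependent tasks. The DAG id, schedule, and shell commands are assumptions; the imports follow Airflow 2.x.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load  # extract must finish before load runs

In Airflow 2.x an individual task can then be exercised from the command line with, for example, airflow tasks test example_pipeline extract 2023-01-01.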
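A PyMongo sketch of create, read, update, and delete operations on a MongoDB document, along with find and index creation. It assumes a MongoDB server on localhost and an invented database and collection; note that PyMongo spells the shell's createIndex as create_index.

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
coll = client["demo"]["autos"]

coll.insert_one({"make": "audi", "price": 13950})               # create
doc = coll.find_one({"make": "audi"})                           # read
coll.update_one({"make": "audi"}, {"$set": {"price": 14000}})   # update
coll.delete_one({"make": "audi"})                               # delete

# Build an index on the price field to speed up range queries.
coll.create_index([("price", ASCENDING)])

# find() returns a cursor over all matching documents.
for d in coll.find({"price": {"$gt": 10000}}):
    print(d)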
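A sketch of a multi-stage MongoDB aggregation pipeline run through PyMongo, chaining $match, $group, and $sort stages. The collection and field names are the same assumptions as above.

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["demo"]["autos"]

pipeline = [
    {"$match": {"price": {"$gt": 5000}}},                           # stage 1: filter
    {"$group": {"_id": "$make", "avg_price": {"$avg": "$price"}}},  # stage 2: aggregate
    {"$sort": {"avg_price": -1}},                                   # stage 3: order
]
for row in coll.aggregate(pipeline):
    print(row)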
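A scikit-learn sketch of a basic machine learning workflow: split a bundled dataset, fit a classifier, and score it on held-out data. The choice of model and dataset is illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))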
Course Number
it_fedads_02_enus
Expertise Level
Intermediate