Final Exam: Data Scientist will test your knowledge and application of the topics presented throughout the Data Scientist track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.
add extensions to your dashboard using the Tableau Extensions API
build and customize graphs using ggplot2 in R
build backup and restore mechanisms in the cloud
build heat maps and scatter plots using R
describe how data science can be leveraged to extract value from big data
combine the use of oversampling and PCA in building a classification model (sketched below)
compare descriptive and inferential statistical analysis
compare the different types of Recommendation Engines and how they can be used to solve different recommendation problems
create an HTTP server using hapi.js
create an R function that finds similar users and identifies products they liked that would be good to recommend to the user
create histograms, scatter plots, and box plots using Python libraries (sketched below)
define a port
define the concept of storyboarding along with the prominent storyboarding templates we can use to implement it
demonstrate how to craft visual data using Tableau
demonstrate how to create a stacked bar plot
demonstrate how to implement data exploration using R
demonstrate how to implement different types of bar charts using Power BI
demonstrate how we can ingest data using Wavefront
demonstrate the steps involved in ingesting data from databases to Hadoop clusters using Sqoop
describe blockchain
describe how regression works by finding the best fit straight line to model the relationships in your data
describe the aspects of data quality
describe the concept of serverless computing and its benefits
describe the Gestalt principles of visual perception
describe the process involved in learning a relationship between input and output during the training phase of machine learning
describe the various essential distributed data management frameworks used to handle big data
describe what truncated data is and how to remove it using Azure Automation
describe how the four Vs should be balanced in order to implement a successful big data strategy
identify different cloud data sources available
identify libraries that can be used in Python to implement data visualization
identify the process and approaches involved in storytelling with data
implement correlograms and build area charts using R
implement Dask arrays to scale NumPy-style computations (sketched below)
implement data exploration using plots in R
handle missing values and outliers using Python (sketched below)
implement point and interval estimation using R
implement Python's Luigi to set up data pipelines (sketched below)
install and prepare R for data exploration
integrate Spark and Tableau to manage data pipelines
implement linear regression to model relationships in data
list and compare the various essential data ingestion tools
list Dask task scheduling and big data collection features
list libraries that can be used in Python to implement data visualization
load data from databases using R
organize your dashboard by adding objects and adjusting the layout
use Pandas ML to explore a dataset where the samples are not evenly distributed across the target classes
recall cloud migration models from the perspective of architectural preferences
recall the various essential decluttering steps and approaches that we can implement to eliminate clutter
recognize how to enable data-driven decision making
recognize the data pipeline building capabilities provided by Kafka, Spark, and PySpark (sketched below)
recognize the impact of implementing containerization on cloud hosting environments
recognize the impact of implementing Kubernetes and Docker in the cloud
recognize the problems associated with a model that is overfitted to training data and how to mitigate the issue
share your dashboard with others
specify volume in big data analytics and its role in the principle of the four Vs
use modules in your Node.js API
use Pandas and Seaborn to visualize the correlated fields in a dataset (sketched below)
use R to import, filter, and massage data into data sets
use the scikit-learn library to build and train a LinearSVC classification model and then evaluate its performance using the available model evaluation functions (sketched below)
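The sketches below illustrate a few of the Python-oriented objectives above. First, combining oversampling with PCA in a classification model: a minimal sketch assuming scikit-learn and a synthetic imbalanced dataset; the class weights, component count, and choice of logistic regression as the classifier are illustrative assumptions, not the course's material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Build an imbalanced toy dataset (roughly a 90/10 class split).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Oversample the minority class in the training split only.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
extra = resample(minority, n_samples=len(majority) - len(minority),
                 random_state=0)
X_bal = np.vstack([X_train, extra])
y_bal = np.concatenate([y_train, np.ones(len(extra), dtype=int)])

# Reduce dimensionality with PCA, then fit the classifier on the
# oversampled, projected training data.
pca = PCA(n_components=5).fit(X_bal)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_bal), y_bal)
print("test accuracy:", clf.score(pca.transform(X_test), y_test))
```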
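For the histogram, scatter plot, and box plot objective: a minimal sketch using matplotlib and pandas, with random data standing in for any real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200), "y": rng.normal(size=200)})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(df["x"], bins=20)            # histogram: distribution of x
axes[1].scatter(df["x"], df["y"], s=10)   # scatter plot: x against y
axes[2].boxplot([df["x"], df["y"]])       # box plots: spread and outliers
plt.tight_layout()
plt.show()
```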
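For the Dask arrays objective: a minimal sketch of the NumPy-style API over chunked, lazily evaluated blocks; the array and chunk sizes are arbitrary assumptions.

```python
import dask.array as da

# A 10,000 x 10,000 array split into 1,000 x 1,000 chunks.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# NumPy-style expressions only build a task graph; nothing runs yet.
result = (x + x.T).mean(axis=0)

# compute() hands the graph to the scheduler and returns a NumPy array.
print(result.compute()[:5])
```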
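For handling missing values and outliers in Python: a minimal sketch with pandas, assuming median imputation and the common 1.5 × IQR rule; the column name and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0, np.nan, 11.0, 250.0, 9.5]})

# Impute the missing value with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Flag outliers with the 1.5 * IQR rule and keep only in-range rows.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df[mask])
```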
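For the Luigi data pipeline objective: a minimal two-task sketch; the file names and the doubling transformation are illustrative assumptions.

```python
import luigi

class Extract(luigi.Task):
    """Write raw numbers to a local file."""
    def output(self):
        return luigi.LocalTarget("raw.txt")  # illustrative file name

    def run(self):
        with self.output().open("w") as f:
            f.write("1\n2\n3\n")

class Transform(luigi.Task):
    """Depend on Extract and double each value."""
    def requires(self):
        return Extract()

    def output(self):
        return luigi.LocalTarget("doubled.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            for line in src:
                dst.write(str(int(line) * 2) + "\n")

if __name__ == "__main__":
    # Run the pipeline locally; Luigi resolves the dependency order.
    luigi.build([Transform()], local_scheduler=True)
```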
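For the Kafka/Spark/PySpark pipeline objective: a minimal PySpark sketch of a batch read from a Kafka topic. The broker address and topic name are placeholder assumptions, and the spark-sql-kafka connector package must be available to the Spark session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-pipeline").getOrCreate()

# Batch read of the topic; use readStream instead for a continuous pipeline.
df = (spark.read.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "events")                        # placeholder topic
      .load())

# Kafka values arrive as bytes; cast to strings before downstream steps.
messages = df.selectExpr("CAST(value AS STRING) AS message")
messages.show(5)
```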
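For visualizing correlated fields with Pandas and Seaborn: a minimal sketch; the synthetic DataFrame, with one pair of deliberately correlated columns, stands in for any real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(size=300)
df = pd.DataFrame({"a": a,
                   "b": a * 0.8 + rng.normal(size=300) * 0.2,  # correlated with a
                   "c": rng.normal(size=300)})                 # independent

# Pandas computes the pairwise correlations; Seaborn renders the heat map.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```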
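For the LinearSVC objective: a minimal sketch of building, training, and evaluating the model with scikit-learn's evaluation functions; the iris dataset is an illustrative stand-in for the course's data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the linear support vector classifier.
model = LinearSVC(max_iter=5000).fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluate with the standard model evaluation functions.
print("accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred))
```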