Data Science Statistics: Applied Inferential Statistics





Overview/Description

Explore how to perform different t-tests for hypothesis testing with the SciPy library in this 10-video course, which continues your exploration of data science. This beginner-level course assumes prior experience with Python programming, along with an understanding of terms such as skewness and kurtosis and of concepts from inferential statistics, such as t-tests and regression. Begin by learning how to perform three different t-tests (the one-sample t-test, the independent or two-sample t-test, and the paired t-test) on various samples of data using the SciPy library. Next, explore how to interpret the results to reject or fail to reject a hypothesis. The course also covers, as an example, how to fit a regression model on the returns on an individual stock and on the S&P 500 Index by using the scikit-learn library. Finally, watch demonstrations of measuring skewness and kurtosis in a data set. The closing exercise asks you to list three different types of t-tests, identify the values that t-tests return, and write code to calculate percentage returns from time series data using Pandas.
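
As a rough illustration of the three t-tests named above, the following sketch runs each one with SciPy. The data is synthetic and every variable name and parameter value (the means, the sample size, the population mean of 50) is an assumption made for this example, not material taken from the course.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=5.0, size=40)         # one group
other = rng.normal(loc=50.0, scale=5.0, size=40)          # an unrelated group
after = sample + rng.normal(loc=1.0, scale=2.0, size=40)  # same subjects, measured again

# One-sample t-test: is the mean of `sample` different from a population mean of 50?
one_sample = stats.ttest_1samp(sample, popmean=50.0)

# Independent (two-sample) t-test: do two unrelated groups share the same mean?
independent = stats.ttest_ind(sample, other)

# Paired t-test: did the same subjects change between two measurements?
paired = stats.ttest_rel(sample, after)

# Each result carries a t statistic and a p-value.
for name, result in [("one-sample", one_sample),
                     ("independent", independent),
                     ("paired", paired)]:
    print(f"{name}: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A small p-value (commonly below 0.05, a threshold chosen by the analyst) is read as evidence against the null hypothesis; a large one means the test fails to reject it.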



Expected Duration (hours)
1.3

Lesson Objectives

Data Science Statistics: Applied Inferential Statistics

  • Course Overview
  • test a hypothesis about a sample by comparing it to the general population using the one-sample t-test available in the SciPy library
  • compare a sample with another independent sample using the independent t-test, and with a related sample using the paired t-test, both available in the SciPy library
  • apply independent t-tests on a real dataset to test a hypothesis that managers at a firm have higher salaries than non-managerial employees
  • work with Pandas and Matplotlib to analyze the stock price of Volkswagen in 2008, which was affected by some extreme events
  • compute the skewness and kurtosis of the returns on Volkswagen stock in 2008 and recognize how a few days of extreme behavior increased those values
  • perform pre-processing operations on a dataset containing close prices for stocks and indices to analyze it using linear regression
  • use the scikit-learn library to fit a linear regression model on the returns on a stock and the returns on the S&P 500 index
  • use two explanatory variables - the returns on the S&P 500 index and on an index tracking the strength of the US Dollar - to perform a regression on the returns on individual stocks
  • recall different types of t-tests and identify the values they return, calculate percentage returns from time series data using Pandas, and measure the skew and kurtosis values for a series
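
Several of the objectives above (computing returns with Pandas, measuring skewness and kurtosis, and regressing stock returns on index returns with scikit-learn) can be sketched together as follows. The price series are simulated, and the column names "market" and "stock" and all numeric parameters are assumptions for this example rather than the data sets used in the course.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simulated daily close prices (assumed for illustration only).
rng = np.random.default_rng(1)
n_days = 250
market_moves = rng.normal(0.0005, 0.01, n_days)
stock_moves = 1.2 * market_moves + rng.normal(0.0, 0.005, n_days)
prices = pd.DataFrame({
    "market": 100 * np.cumprod(1 + market_moves),
    "stock": 50 * np.cumprod(1 + stock_moves),
})

# pct_change gives the fractional change from the previous row;
# the first row has no predecessor, so it is dropped.
returns = prices.pct_change().dropna()
percent_returns = returns * 100  # the same values expressed in percent

# Sample skewness and excess kurtosis of the stock returns.
stock_skew = returns["stock"].skew()
stock_kurt = returns["stock"].kurt()

# Regress stock returns on market returns; the slope is often called beta.
model = LinearRegression().fit(returns[["market"]], returns["stock"])
beta = model.coef_[0]

print(f"skew = {stock_skew:.3f}, excess kurtosis = {stock_kurt:.3f}")
print(f"beta = {beta:.3f}, intercept = {model.intercept_:.5f}")
```

Adding a second explanatory variable, such as returns on a US Dollar index, would only require passing two columns instead of one as the first argument to `fit`.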

Course Number
it_dssds2dj_02_enus

Expertise Level
Intermediate