Data Filtering



Overview/Description
Once data is gathered for data science, it is often in an unstructured or raw format and must be filtered for content and validity. In this course, you'll explore examples of practical tools and techniques for data filtering.

Target Audience
Individuals with some programming and math experience who are working toward implementing data science in their everyday work

Prerequisites
None

Expected Duration (hours)
1.0

Lesson Objectives

Data Filtering

  • start the course
  • identify common filtering techniques and tools
  • extract date elements from common date formats
  • parse content types in HTTP headers
  • use csvcut to filter CSV data
  • use sed to replace values in a text data stream
  • drop duplicate records from data
  • extract headers from a JPEG image
  • use pdfgrep to extract data from searchable PDF files
  • detect invalid or impossible data combinations
  • parse robots.txt from a website to decide what should and shouldn't be crawled or indexed
  • drop records from a CSV file based on date range
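
As a rough illustration of three of the objectives above (dropping duplicate records, detecting impossible data, and dropping CSV records by date range), here is a minimal Python sketch. It is not taken from the course materials: the function name filter_rows, the column name "date", the ISO date format, and the sample data are all assumptions made for the example.

```python
import csv
import io
from datetime import date


def filter_rows(csv_text, start, end, date_column="date"):
    """Keep unique rows whose date is valid and falls within [start, end].

    Hypothetical helper: assumes a header row and ISO dates (YYYY-MM-DD)
    in the column named by date_column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    kept = []
    for row in reader:
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        try:
            when = date.fromisoformat(row[date_column])
        except (ValueError, TypeError):
            continue  # missing or impossible date, e.g. 2024-02-30
        if start <= when <= end:
            kept.append(row)
    return kept


# Made-up sample data: one duplicate, one impossible date, one out-of-range date.
SAMPLE = """id,date,value
1,2024-01-15,10
1,2024-01-15,10
2,2024-02-30,7
3,2024-03-01,4
4,2023-12-31,9
"""

rows = filter_rows(SAMPLE, date(2024, 1, 1), date(2024, 12, 31))
print([r["id"] for r in rows])  # prints ['1', '3']
```

The sketch treats only exact row matches as duplicates; deciding which columns define a duplicate is a choice that depends on the data set.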

Course Number
df_dses_a03_it_enus

Expertise Level
Beginner