Preparing Data for Machine Learning

Janani Ravi

3:23:57

01 - Course Overview

01 - Course Overview.mp4

01:48

02 - Understanding the Need for Data Preparation

02 - Module Overview.mp4

01:22

03 - Prerequisites and Course Outline.mp4

01:42

04 - The Need for Data Preparation.mp4

04:19

05 - Insufficient Data.mp4

06:23

06 - Too Much Data.mp4

03:56

07 - Non-representative Data, Missing Values, Outliers, Duplicates.mp4

02:15

08 - Dealing with Missing Data.mp4

05:16

09 - Dealing with Outliers.mp4

05:36

10 - Oversampling and Undersampling to Balance Datasets.mp4

04:11

11 - Overfitting and Underfitting.mp4

03:08

12 - Module Summary.mp4

01:34

03 - Implementing Data Cleaning and Transformation

13 - Module Overview.mp4

01:13

14 - Handling Missing Values.mp4

06:49

15 - Cleaning Data.mp4

07:38

16 - Visualizing Relationships.mp4

04:04

17 - Building a Regression Model.mp4

07:39

18 - Univariate Feature Imputation Using the Simple Imputer.mp4

06:50

19 - Multivariate Feature Imputation Using the Iterative Imputer.mp4

06:00

20 - Missing Value Indicator.mp4

02:11

21 - Feature Imputation as a Part of a Machine Learning Pipeline.mp4

04:01

22 - Module Summary.mp4

01:26

04 - Transforming Continuous and Categorical Data

23 - Module Overview.mp4

02:26

24 - Numeric Data.mp4

05:33

25 - Scaling and Standardizing Features.mp4

04:21

26 - Normalizing and Binarizing Features.mp4

06:09

27 - Categorical Data.mp4

03:23

28 - Numeric Encoding of Categorical Data.mp4

04:43

29 - Label Encoding and One-hot Encoding.mp4

07:42

30 - Discretization of Continuous Values Using Pandas Cut.mp4

03:27

31 - Discretization of Continuous Values Using the KBins Discretizer.mp4

03:53

32 - Building a Regression Model with Discretized Data.mp4

03:28

33 - Module Summary.mp4

01:18

05 - Understanding Feature Selection

34 - Module Overview.mp4

01:19

35 - The Curse of Dimensionality.mp4

04:49

36 - Reducing Complexity in Data.mp4

03:22

37 - Feature Selection to Reduce Dimensions.mp4

03:40

38 - Filter Methods.mp4

04:22

39 - Embedded Methods.mp4

05:02

40 - Module Summary.mp4

01:36

06 - Implementing Feature Selection

41 - Module Overview.mp4

01:14

42 - Feature Correlations.mp4

07:54

43 - Using the Correlation Matrix to Detect Multi-collinearity.mp4

04:56

44 - Using Variance Inflation Factor to Detect Multi-collinearity.mp4

03:20

45 - Feature Selection Using Missing Values Threshold and Variance Threshold.mp4

06:26

46 - Univariate Feature Selection Using Chi2 and ANOVA.mp4

07:16

47 - Feature Selection Using Wrapper Methods.mp4

07:30

48 - Feature Selection Using Embedded Methods.mp4

04:04

49 - Module Summary.mp4

01:23

Description

This course covers important techniques in data preparation, data cleaning and feature selection that are needed to set your machine learning model up for success. You will also learn how to use imputation to deal with missing data and strategies for identifying and coping with outliers.

What You'll Learn

As Machine Learning explodes in popularity, it is becoming ever more important to know precisely how to prepare the data going into the model in a manner appropriate to the problem we are trying to solve.

In this course, Preparing Data for Machine Learning, you will gain the ability to explore, clean, and structure your data in ways that get the best out of your machine learning model.

First, you will learn why data cleaning and data preparation are so important, and how missing data, outliers, and other data-related problems can be solved. Next, you will discover how models that read too much into data suffer from a problem called overfitting, in which models perform well under test conditions but struggle in live deployments. You will also understand how models that are trained with insufficient or unrepresentative data suffer from a different set of problems, and how these problems can be mitigated.

Finally, you will round out your knowledge by applying different methods for feature selection, dealing with missing data using imputation, and building your models using the most relevant features.
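As a rough illustration of two of the ideas named above (flagging outliers and imputing missing values), here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names, the sample `ages` list, and the choice of the 1.5 × IQR rule with median imputation are all assumptions made for this example.

```python
from statistics import median, quantiles


def iqr_outliers(values, k=1.5):
    """Return observed values outside [Q1 - k*IQR, Q3 + k*IQR].

    Missing entries (None) are ignored. The 1.5 * IQR rule is one common
    convention, chosen here as an assumption.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in observed if v < low or v > high]


def impute_median(values):
    """Replace None entries with the median of the observed values.

    The median is used instead of the mean so that a single extreme
    value does not distort the fill value.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample: ages with two missing entries and one extreme value.
ages = [22, 25, None, 24, 23, 95, None, 26]

outliers = iqr_outliers(ages)
filled = impute_median(ages)

print("outliers:", outliers)
print("filled:  ", filled)
```

Checking for outliers before imputing keeps the fill value from being influenced by the extreme entry. In practice, libraries such as scikit-learn provide ready-made imputers that fit into a modeling pipeline.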

When you’re finished with this course, you will have the skills and knowledge to identify the right data procedures for data cleaning and data preparation to set your model up for success.

Machine Learning

Data Science

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.

Pluralsight

Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.