Data Cleansing Master Class in Python

Focused View

Mike West

3:32:19

147 View

01.01-course introduction.mp4

01:37

01.02-course structure.mp4

02:08

01.03-is this course right for you.mp4

00:56

02.01-introducing data preparation.mp4

01:43

02.02-the machine learning process.mp4

03:07

02.03-data preparation defined.mp4

02:19

02.04-choosing a data preparation technique.mp4

01:37

02.05-what is data in machine learning.mp4

02:49

02.06-raw data.mp4

04:44

02.07-machine learning is mostly data preparation.mp4

02:28

02.08-common data preparation tasks-data cleansing.mp4

02:21

02.09-common data preparation tasks-feature selection.mp4

02:09

02.10-common data preparation tasks-data transforms.mp4

02:26

02.11-common data preparation tasks-feature engineering.mp4

01:14

02.12-common data preparation tasks-dimensionality reduction.mp4

01:49

02.13-data leakage.mp4

00:42

02.14-problem with naive data preparation.mp4

03:10

02.15-case study data leakage train test split naive approach.mp4

02:19

02.16-case study data leakage train test split correct approach.mp4

01:26

02.17-case study data leakage k-fold naive approach.mp4

02:37

02.18-case study data leakage k-fold correct approach.mp4

01:58

03.01-data cleansing overview.mp4

01:17

03.02-identify columns that contain a single value.mp4

01:56

03.03-identify columns with few values.mp4

02:26

03.04-remove columns with low variance.mp4

02:06

03.05-identify and remove rows that contain duplicate data.mp4

02:16

03.06-defining outliers.mp4

01:30

03.07-remove outliers-the standard deviation approach.mp4

03:13

03.08-remove outliers-the iqr approach.mp4

02:33

03.09-automatic outlier detection.mp4

03:04

03.10-mark missing values.mp4

03:59

03.11-remove rows with missing values.mp4

01:29

03.12-statistical imputation.mp4

01:12

03.13-mean value imputation.mp4

02:47

03.14-simple imputer with model evaluation.mp4

01:12

03.15-compare different statistical imputation strategies.mp4

01:30

03.16-k-nearest neighbors imputation.mp4

03:00

03.17-knnimputer and model evaluation.mp4

01:58

03.18-iterative imputation.mp4

02:24

03.19-iterativeimputer and model evaluation.mp4

00:55

03.20-iterativeimputer and different imputation order.mp4

01:17

04.01-feature selection introduction.mp4

01:27

04.02-feature selection defined.mp4

02:31

04.03-statistics for feature selection.mp4

01:47

04.04-loading a categorical dataset.mp4

01:52

04.05-encode the dataset for modelling.mp4

01:45

04.06-chi-squared.mp4

01:48

04.07-mutual information.mp4

01:22

04.08-modeling with selected categorical features.mp4

02:19

04.09-feature selection with anova on numerical input.mp4

03:44

04.10-feature selection with mutual information.mp4

01:37

04.11-modeling with selected numerical features.mp4

01:27

04.12-tuning a number of selected features.mp4

02:28

04.13-select features for numerical output.mp4

01:56

04.14-linear correlation with correlation statistics.mp4

01:55

04.15-linear correlation with mutual information.mp4

01:46

04.16-baseline and model built using correlation.mp4

01:51

04.17-model built using mutual information features.mp4

00:38

04.18-tuning number of selected features.mp4

03:00

04.19-recursive feature elimination.mp4

02:19

04.20-rfe for classification.mp4

02:40

04.21-rfe for regression.mp4

01:28

04.22-rfe hyperparameters.mp4

02:02

04.23-feature ranking for rfe.mp4

01:46

04.24-feature importance scores defined.mp4

02:12

04.25-feature importance scores linear regression.mp4

02:22

04.26-feature importance scores logistic regression and cart.mp4

02:27

04.27-feature importance scores random forests.mp4

01:07

04.28-permutation feature importance.mp4

01:49

04.29-feature selection with importance.mp4

02:18

05.01-scale numerical data.mp4

01:44

05.02-diabetes dataset for scaling.mp4

01:23

05.03-minmaxscaler transform.mp4

01:23

05.04-standardscaler transform.mp4

01:32

05.05-robust scaling data.mp4

03:10

05.06-robust scaler applied to dataset.mp4

01:15

05.07-explore robust scaler range.mp4

01:04

05.08-nominal and ordinal variables.mp4

02:30

05.09-ordinal encoding.mp4

02:00

05.10-one-hot encoding defined.mp4

00:56

05.11-one-hot encoding.mp4

01:46

05.12-dummy variable encoding.mp4

01:44

05.13-ordinal encoder transform on breast cancer dataset.mp4

02:50

05.14-make distributions more gaussian.mp4

01:46

05.15-power transform on contrived dataset.mp4

02:05

05.16-power transform on sonar dataset.mp4

01:41

05.17-box-cox on sonar dataset.mp4

01:45

05.18-yeo-johnson on sonar dataset.mp4

01:28

05.19-polynomial features.mp4

03:02

05.20-effect of polynomial degrees.mp4

01:35

06.01-transforming different data types.mp4

01:50

06.02-the columntransformer.mp4

01:50

06.03-the columntransformer on abalone dataset.mp4

02:09

06.04-manually transform target variable.mp4

01:59

06.05-automatically transform target variable.mp4

03:10

06.06-challenge of preparing new data for a model.mp4

02:50

06.07-save model and data scaler.mp4

02:11

06.08-load and apply saved scalers.mp4

01:08

07.01-curse of dimensionality.mp4

01:31

07.02-techniques for dimensionality reduction.mp4

02:53

07.03-linear discriminant analysis.mp4

01:44

07.04-linear discriminant analysis demonstrated.mp4

03:03

07.05-principal component analysis.mp4

03:56

9781803239040 Code.zip

Description

Data preparation may be the most important part of a machine learning project. It is the most time-consuming part, although it is the least discussed topic. Data preparation, sometimes referred to as data preprocessing, is the act of transforming raw data into a form that is appropriate for modeling.

Machine learning algorithms require input data to be numbered, and most algorithm implementations maintain this expectation. Therefore, if your data contains data types and values that are not numbers, such as labels, you will need to change the data into numbers. Further, specific machine learning algorithms have expectations regarding the data types, scale, probability distribution, and relationships between input variables, and you may need to change the data to meet these expectations.

In this course, you will learn data imputation and advanced data cleansing techniques, how to apply real-world data cleansing techniques to your data, advanced data cleansing techniques. Also, learn how to prepare data in a way that avoids data leakage, and in turn, incorrect model evaluation.

By the end of this course, you will perform data preprocessing and master data cleaning skills.

The complete code bundle for this course is available at https://github.com/PacktPublishing/Data-Cleansing-Master-Class-in-Python

More details

User Reviews

Rating

average 0

Total votes0

Focused display

Python

Data Analysis

Data Mining

Mike West

Instructor's Courses

Mike has Bachelor of Science degrees in Business and Psychology. He started his career as a middle school psychologist prior to moving into the information technology space. His love of computers resulted in him spending many additional hours working on computers while studying for his master's degree in Statistics. His current areas of interests include Machine Learning, Data Engineering and SQL Server. When not working, Mike enjoys spending time with his family and traveling.

PacktPub

View courses PacktPub

Packt is a publishing company founded in 2003 headquartered in Birmingham, UK, with offices in Mumbai, India. Packt primarily publishes print and electronic books and videos relating to information technology, including programming, web design, data analysis and hardware.