
Data Cleansing Master Class in Python


Mike West

3:32:19

  • 01.01-course introduction.mp4
    01:37
  • 01.02-course structure.mp4
    02:08
  • 01.03-is this course right for you.mp4
    00:56
  • 02.01-introducing data preparation.mp4
    01:43
  • 02.02-the machine learning process.mp4
    03:07
  • 02.03-data preparation defined.mp4
    02:19
  • 02.04-choosing a data preparation technique.mp4
    01:37
  • 02.05-what is data in machine learning.mp4
    02:49
  • 02.06-raw data.mp4
    04:44
  • 02.07-machine learning is mostly data preparation.mp4
    02:28
  • 02.08-common data preparation tasks-data cleansing.mp4
    02:21
  • 02.09-common data preparation tasks-feature selection.mp4
    02:09
  • 02.10-common data preparation tasks-data transforms.mp4
    02:26
  • 02.11-common data preparation tasks-feature engineering.mp4
    01:14
  • 02.12-common data preparation tasks-dimensionality reduction.mp4
    01:49
  • 02.13-data leakage.mp4
    00:42
  • 02.14-problem with naive data preparation.mp4
    03:10
  • 02.15-case study data leakage train test split naive approach.mp4
    02:19
  • 02.16-case study data leakage train test split correct approach.mp4
    01:26
  • 02.17-case study data leakage k-fold naive approach.mp4
    02:37
  • 02.18-case study data leakage k-fold correct approach.mp4
    01:58
  • 03.01-data cleansing overview.mp4
    01:17
  • 03.02-identify columns that contain a single value.mp4
    01:56
  • 03.03-identify columns with few values.mp4
    02:26
  • 03.04-remove columns with low variance.mp4
    02:06
  • 03.05-identify and remove rows that contain duplicate data.mp4
    02:16
  • 03.06-defining outliers.mp4
    01:30
  • 03.07-remove outliers-the standard deviation approach.mp4
    03:13
  • 03.08-remove outliers-the iqr approach.mp4
    02:33
  • 03.09-automatic outlier detection.mp4
    03:04
  • 03.10-mark missing values.mp4
    03:59
  • 03.11-remove rows with missing values.mp4
    01:29
  • 03.12-statistical imputation.mp4
    01:12
  • 03.13-mean value imputation.mp4
    02:47
  • 03.14-simple imputer with model evaluation.mp4
    01:12
  • 03.15-compare different statistical imputation strategies.mp4
    01:30
  • 03.16-k-nearest neighbors imputation.mp4
    03:00
  • 03.17-knnimputer and model evaluation.mp4
    01:58
  • 03.18-iterative imputation.mp4
    02:24
  • 03.19-iterativeimputer and model evaluation.mp4
    00:55
  • 03.20-iterativeimputer and different imputation order.mp4
    01:17
  • 04.01-feature selection introduction.mp4
    01:27
  • 04.02-feature selection defined.mp4
    02:31
  • 04.03-statistics for feature selection.mp4
    01:47
  • 04.04-loading a categorical dataset.mp4
    01:52
  • 04.05-encode the dataset for modelling.mp4
    01:45
  • 04.06-chi-squared.mp4
    01:48
  • 04.07-mutual information.mp4
    01:22
  • 04.08-modeling with selected categorical features.mp4
    02:19
  • 04.09-feature selection with anova on numerical input.mp4
    03:44
  • 04.10-feature selection with mutual information.mp4
    01:37
  • 04.11-modeling with selected numerical features.mp4
    01:27
  • 04.12-tuning a number of selected features.mp4
    02:28
  • 04.13-select features for numerical output.mp4
    01:56
  • 04.14-linear correlation with correlation statistics.mp4
    01:55
  • 04.15-linear correlation with mutual information.mp4
    01:46
  • 04.16-baseline and model built using correlation.mp4
    01:51
  • 04.17-model built using mutual information features.mp4
    00:38
  • 04.18-tuning number of selected features.mp4
    03:00
  • 04.19-recursive feature elimination.mp4
    02:19
  • 04.20-rfe for classification.mp4
    02:40
  • 04.21-rfe for regression.mp4
    01:28
  • 04.22-rfe hyperparameters.mp4
    02:02
  • 04.23-feature ranking for rfe.mp4
    01:46
  • 04.24-feature importance scores defined.mp4
    02:12
  • 04.25-feature importance scores linear regression.mp4
    02:22
  • 04.26-feature importance scores logistic regression and cart.mp4
    02:27
  • 04.27-feature importance scores random forests.mp4
    01:07
  • 04.28-permutation feature importance.mp4
    01:49
  • 04.29-feature selection with importance.mp4
    02:18
  • 05.01-scale numerical data.mp4
    01:44
  • 05.02-diabetes dataset for scaling.mp4
    01:23
  • 05.03-minmaxscaler transform.mp4
    01:23
  • 05.04-standardscaler transform.mp4
    01:32
  • 05.05-robust scaling data.mp4
    03:10
  • 05.06-robust scaler applied to dataset.mp4
    01:15
  • 05.07-explore robust scaler range.mp4
    01:04
  • 05.08-nominal and ordinal variables.mp4
    02:30
  • 05.09-ordinal encoding.mp4
    02:00
  • 05.10-one-hot encoding defined.mp4
    00:56
  • 05.11-one-hot encoding.mp4
    01:46
  • 05.12-dummy variable encoding.mp4
    01:44
  • 05.13-ordinal encoder transform on breast cancer dataset.mp4
    02:50
  • 05.14-make distributions more gaussian.mp4
    01:46
  • 05.15-power transform on contrived dataset.mp4
    02:05
  • 05.16-power transform on sonar dataset.mp4
    01:41
  • 05.17-box-cox on sonar dataset.mp4
    01:45
  • 05.18-yeo-johnson on sonar dataset.mp4
    01:28
  • 05.19-polynomial features.mp4
    03:02
  • 05.20-effect of polynomial degrees.mp4
    01:35
  • 06.01-transforming different data types.mp4
    01:50
  • 06.02-the columntransformer.mp4
    01:50
  • 06.03-the columntransformer on abalone dataset.mp4
    02:09
  • 06.04-manually transform target variable.mp4
    01:59
  • 06.05-automatically transform target variable.mp4
    03:10
  • 06.06-challenge of preparing new data for a model.mp4
    02:50
  • 06.07-save model and data scaler.mp4
    02:11
  • 06.08-load and apply saved scalers.mp4
    01:08
  • 07.01-curse of dimensionality.mp4
    01:31
  • 07.02-techniques for dimensionality reduction.mp4
    02:53
  • 07.03-linear discriminant analysis.mp4
    01:44
  • 07.04-linear discriminant analysis demonstrated.mp4
    03:03
  • 07.05-principal component analysis.mp4
    03:56
  • 9781803239040 Code.zip
  • Description


    Data preparation may be the most important part of a machine learning project. It is also the most time-consuming part, yet it is the least discussed. Data preparation, sometimes referred to as data preprocessing, is the act of transforming raw data into a form that is appropriate for modeling.

    Machine learning algorithms require input data to be numeric, and most algorithm implementations enforce this expectation. Therefore, if your data contains data types and values that are not numbers, such as labels, you will need to convert them into numbers. Further, specific machine learning algorithms have expectations regarding the data types, scale, probability distribution, and relationships between input variables, and you may need to change the data to meet these expectations.
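    As a rough illustration of turning labels into numbers, the sketch below encodes a made-up column of color labels in two ways: as integer codes (ordinal encoding) and as 0/1 indicator vectors (one-hot encoding). The data and the helper names `ordinal_encode` and `one_hot_encode` are hypothetical and written in plain Python for clarity; they are not taken from the course code. In practice, scikit-learn's `OrdinalEncoder` and `OneHotEncoder` cover the same ground.

```python
# Hypothetical toy column of string labels (not one of the course datasets).
colors = ["red", "green", "blue", "green", "red"]


def ordinal_encode(values):
    """Map each distinct label to an integer, using sorted label order."""
    categories = sorted(set(values))
    index = {label: i for i, label in enumerate(categories)}
    return categories, [index[v] for v in values]


def one_hot_encode(values):
    """Map each label to a 0/1 vector with one position per category."""
    categories, codes = ordinal_encode(values)
    width = len(categories)
    vectors = [[1 if i == code else 0 for i in range(width)] for code in codes]
    return categories, vectors


categories, codes = ordinal_encode(colors)
_, one_hot = one_hot_encode(colors)

print(categories)  # category order used for both encodings
print(codes)       # one integer per input label
print(one_hot)     # one indicator vector per input label
```

    Integer codes imply an order between categories, so they suit ordinal variables; one-hot vectors avoid implying an order and are the usual choice for nominal variables.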

    In this course, you will learn data imputation and advanced data cleansing techniques, and how to apply them to real-world data. You will also learn how to prepare data in a way that avoids data leakage and, in turn, incorrect model evaluation.
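    To make the data leakage point concrete, here is a minimal sketch using min-max scaling on a made-up list of numbers. The naive version learns the scaling range from all rows, so information from the test rows leaks into the training data; the leakage-free version learns the range from the training rows only and reuses it for the test rows. The values and the helper names `fit_min_max` and `apply_min_max` are assumptions for illustration, not the course's own code. With scikit-learn, the same idea is usually expressed by fitting a scaler on the training split only, or by wrapping the scaler and model in a `Pipeline`.

```python
def fit_min_max(column):
    """Learn the scaling parameters (minimum and maximum) from a column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Scale values using previously learned parameters."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


# Hypothetical data: the first four rows are for training, the rest for testing.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 50.0]
train, test = values[:4], values[4:]

# Naive approach: parameters are learned from ALL rows, so the test rows
# (including the extreme value 50.0) influence how the training data looks.
lo_all, hi_all = fit_min_max(values)
naive_train = apply_min_max(train, lo_all, hi_all)

# Leakage-free approach: parameters come from the training rows only and
# are then applied unchanged to the test rows.
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)

print(naive_train)
print(train_scaled)
print(test_scaled)
```

    In the leakage-free version, test values can fall outside the 0 to 1 range. That is expected: it reflects that the model never saw those rows when the scaling was learned.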

    By the end of this course, you will be able to perform data preprocessing and will have mastered core data cleaning skills.

    The complete code bundle for this course is available at https://github.com/PacktPublishing/Data-Cleansing-Master-Class-in-Python

    Mike has Bachelor of Science degrees in Business and Psychology. He started his career as a middle school psychologist before moving into information technology. His love of computers led him to spend many additional hours working on them while studying for his master's degree in Statistics. His current areas of interest include Machine Learning, Data Engineering, and SQL Server. When not working, Mike enjoys spending time with his family and traveling.
    Packt is a publishing company founded in 2003 and headquartered in Birmingham, UK, with offices in Mumbai, India. Packt primarily publishes print and electronic books and videos relating to information technology, including programming, web design, data analysis, and hardware.
    • Language: English
    • Training sessions: 103
    • Duration: 3:32:19
    • Release date: 2023/02/14