Preparing Data for Machine Learning

Janani Ravi

3:23:57

  • 01 - Course Overview.mp4
    01:48
  • 02 - Module Overview.mp4
    01:22
  • 03 - Prerequisites and Course Outline.mp4
    01:42
  • 04 - The Need for Data Preparation.mp4
    04:19
  • 05 - Insufficient Data.mp4
    06:23
  • 06 - Too Much Data.mp4
    03:56
  • 07 - Non-representative Data, Missing Values, Outliers, Duplicates.mp4
    02:15
  • 08 - Dealing with Missing Data.mp4
    05:16
  • 09 - Dealing with Outliers.mp4
    05:36
  • 10 - Oversampling and Undersampling to Balance Datasets.mp4
    04:11
  • 11 - Overfitting and Underfitting.mp4
    03:08
  • 12 - Module Summary.mp4
    01:34
  • 13 - Module Overview.mp4
    01:13
  • 14 - Handling Missing Values.mp4
    06:49
  • 15 - Cleaning Data.mp4
    07:38
  • 16 - Visualizing Relationships.mp4
    04:04
  • 17 - Building a Regression Model.mp4
    07:39
  • 18 - Univariate Feature Imputation Using the Simple Imputer.mp4
    06:50
  • 19 - Multivariate Feature Imputation Using the Iterative Imputer.mp4
    06:00
  • 20 - Missing Value Indicator.mp4
    02:11
  • 21 - Feature Imputation as a Part of a Machine Learning Pipeline.mp4
    04:01
  • 22 - Module Summary.mp4
    01:26
  • 23 - Module Overview.mp4
    02:26
  • 24 - Numeric Data.mp4
    05:33
  • 25 - Scaling and Standardizing Features.mp4
    04:21
  • 26 - Normalizing and Binarizing Features.mp4
    06:09
  • 27 - Categorical Data.mp4
    03:23
  • 28 - Numeric Encoding of Categorical Data.mp4
    04:43
  • 29 - Label Encoding and One-hot Encoding.mp4
    07:42
  • 30 - Discretization of Continuous Values Using Pandas Cut.mp4
    03:27
  • 31 - Discretization of Continuous Values Using the KBins Discretizer.mp4
    03:53
  • 32 - Building a Regression Model with Discretized Data.mp4
    03:28
  • 33 - Module Summary.mp4
    01:18
  • 34 - Module Overview.mp4
    01:19
  • 35 - The Curse of Dimensionality.mp4
    04:49
  • 36 - Reducing Complexity in Data.mp4
    03:22
  • 37 - Feature Selection to Reduce Dimensions.mp4
    03:40
  • 38 - Filter Methods.mp4
    04:22
  • 39 - Embedded Methods.mp4
    05:02
  • 40 - Module Summary.mp4
    01:36
  • 41 - Module Overview.mp4
    01:14
  • 42 - Feature Correlations.mp4
    07:54
  • 43 - Using the Correlation Matrix to Detect Multi-collinearity.mp4
    04:56
  • 44 - Using Variance Inflation Factor to Detect Multi-collinearity.mp4
    03:20
  • 45 - Features Selection Using Missing Values Threshold and Variance Threshold.mp4
    06:26
  • 46 - Univariate Feature Selection Using Chi2 and ANOVA.mp4
    07:16
  • 47 - Feature Selection Using Wrapper Methods.mp4
    07:30
  • 48 - Feature Selection Using Embedded Methods.mp4
    04:04
  • 49 - Module Summary.mp4
    01:23
  • Description


    This course covers important techniques in data preparation, data cleaning and feature selection that are needed to set your machine learning model up for success. You will also learn how to use imputation to deal with missing data and strategies for identifying and coping with outliers.

    What You'll Learn


      As Machine Learning explodes in popularity, it is becoming ever more important to know precisely how to prepare the data going into the model in a manner appropriate to the problem we are trying to solve.

      In this course, Preparing Data for Machine Learning, you will gain the ability to explore, clean, and structure your data in ways that get the best out of your machine learning model.

      First, you will learn why data cleaning and data preparation are so important, and how missing data, outliers, and other data-related problems can be solved. Next, you will discover how models that read too much into data suffer from a problem called overfitting, in which models perform well on the data they were trained on but struggle with unseen data in live deployments. You will also understand how models that are trained with insufficient or unrepresentative data suffer from a different set of problems, and how these problems can be mitigated.

      Finally, you will round out your knowledge by applying different methods for feature selection, dealing with missing data using imputation, and building your models using the most relevant features.

      When you’re finished with this course, you will have the skills and knowledge to identify the right data procedures for data cleaning and data preparation to set your model up for success.

    Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.
    Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.
    • Language: English
    • Training sessions: 49
    • Duration: 3:23:57
    • Level: Preliminary
    • Release date: 2023/12/08