Problem Solving using PySpark - Regression & Classification

Sathish Jayaraman

1:48:52

  • 1. Introduction.html
  • 2. Problem Solving with PySpark Regression and Classification.mp4
    02:49
  • 1.1 Data for the segment on Descriptive Statistics.html
  • 1.2 metropt+3+dataset.zip
  • 1.3 spark 3 descp stat 1.zip
  • 1. Setting up PySpark Environment in Google Colab.mp4
    03:54
  • 2. Understanding Descriptive Statistics in PySpark.mp4
    05:35
  • 3. Understanding Data Filtering and Slicing in PySpark.mp4
    03:23
  • 4. Summary of Descriptive Statistics in PySpark and Quiz.mp4
    01:41
  • 1.1 Data cleaning.zip
  • 1. Introduction to Data Cleaning with PySpark.mp4
    01:06
  • 2. Setting up PySpark Environment for Data Cleaning on Google Colab.mp4
    02:31
  • 3. Understanding the Dataset Explanatory Analysis and Data Cleaning with PySpark.mp4
    04:18
  • 4. PySpark Data Cleaning Assessment of Null Values and Outliers.mp4
    04:50
  • 5. Data Cleaning with PySpark Imputation Strategy Quiz.mp4
    03:51
  • 6. Introduction to Pivot Tables in PySpark.mp4
    01:30
  • 1.1 Electric grid.csv
  • 1. Introduction to Regression and Classification Problems in PySpark.mp4
    04:17
  • 2. Understanding the Data Set through Explanatory Analysis.mp4
    04:07
  • 3. Correlation Analysis and Data Preparation.mp4
    02:56
  • 4. Modeling the data using Gradient Boosted Trees Regression.mp4
    03:25
  • 5. Understanding Feature Importance.mp4
    02:48
  • 6. Gradient Boosted Trees Regression - Quiz.mp4
    02:25
  • 1.1 Stabf class.csv
  • 1. Classification Problem Statement Supervised Machine Learning.mp4
    03:39
  • 2. Data Cleaning and Preparation for XGBoost Classification Model.mp4
    02:59
  • 3. XGBoost Classification Model Pipeline using PySpark.mp4
    02:52
  • 4. Summary of the segment on Spark XGBoost Classifier.mp4
    00:39
  • 1.1 Text data cleaning .zip
  • 1. Classification Model for Text Data.mp4
    02:31
  • 2. Understanding the Data for Text Classification.mp4
    03:48
  • 3. Word Cloud Text Analytics Quiz.mp4
    02:03
  • 4. Spark NLP Pipeline Classification Model.mp4
    04:52
  • 1.1 train forecast.zip
  • 1. Introduction to Time Series Analysis Setting up the Google Colab Notebook.mp4
    02:33
  • 2. Explanatory Analysis and Data Cleaning.mp4
    03:06
  • 3. Analysis of time series components using advanced visualization techniques.mp4
    05:38
  • 4. Use of Prophet Model for Time Series Forecasting.mp4
    02:43
  • 5. Time Series Forecasting - Quiz.mp4
    01:34
  • 1.1 Spark SQL.zip
  • 1. Introduction to Spark SQL Querying.mp4
    02:42
  • 2. Comparison of PySpark statements and Spark SQL Query.mp4
    04:06
  • 3. Join in Spark SQL.mp4
    05:34
  • 4. Join in Spark SQL - Quiz.mp4
    02:07


    Gradient Boosted Trees, XGBoost, Spark NLP, Time Series, Prophet, Data Cleaning, Descriptive Statistics, Spark SQL

    What You'll Learn?


    • Data analysis and descriptive statistics with PySpark: computing essential descriptive statistics to understand and summarize data
    • Data Cleaning with PySpark
    • Predictive modeling with PySpark using Regression
    • Applying Classification techniques to a real world problem in PySpark
    • Text analytics using PySpark and Spark NLP
    • Time-Series modeling with PySpark and Prophet
    • Introduction to Spark SQL for data querying

    Who is this for?


    • This course suits anyone interested in analytics using PySpark. It is particularly useful for analysts and engineers interested in Big Data, and for anyone with a basic knowledge of data science and ML principles.

    What You Need to Know?


    • Basic knowledge of data science and ML principles will be helpful
    • Familiarity with Python to work with PySpark
    • A computer with internet access for the course material


    Description

    This course works through real-world problems in PySpark, covering data cleaning, descriptive statistics, and classification and regression modeling.

    The first segment introduces descriptive statistics in PySpark: computing fundamental measures such as the mean and standard deviation, and generating an extended statistical summary.

    The second segment covers data cleaning in PySpark: working with null values and redundant data, and imputing the null values.

    The third segment covers predictive modeling with PySpark using Gradient Boosted Trees regression.

    The fourth and fifth segments apply classification techniques in PySpark. The fourth segment introduces the Spark XGBoost classifier for a classification problem, and the fifth segment uses a deep learning model for text sentiment classification.

    The sixth segment covers time series analytics and modeling using PySpark and Prophet.

    The seventh segment introduces Spark SQL for data querying and analysis.

    These segments also include advanced visualization techniques using the Seaborn and Plotly libraries: box plots to understand the distribution of the data and assess outliers; count plots to check how balanced the proportions in the data are; a bar chart to represent feature importance in the Gradient Boosted Trees regression model; a word cloud for text analytics; and analysis of time series data to extract seasonality and trend components.

    Each segment includes a Google Colab notebook aligned with the lecture.

    Sathish Jayaraman
    Sathish Jayaraman's interests are in Data Science, Data Analytics, Machine Learning, ML pipelines and Artificial Intelligence. He is passionate about solving real-world problems in Data Science, ML and AI. He has three degrees in engineering, including a Bachelor's degree from Anna University and an MS degree from the University of Minnesota, Minneapolis.
    Students take courses primarily to improve job-related skills. Some courses generate credit toward technical certification. Udemy has made a special effort to attract corporate trainers seeking to create coursework for employees of their company.
    • Language: English
    • Training sessions: 34
    • Duration: 1:48:52
    • Release date: 2024/03/12