Mastering Big Data Analytics with PySpark

Danny Meijer

8:07:06

  • 01-Course Overview.mp4
    06:50
  • 02-Python versus Spark.mp4
    10:35
  • 03-Preparing for the Course.mp4
    06:23
  • 04-Connecting Jupyter to Spark.mp4
    14:58
  • 05-Getting to Know Spark.mp4
    07:27
  • 06-The Power of Spark.mp4
    06:38
  • 07-The Power of Spark MLlib.mp4
    06:35
  • 08-Spark DataFrames.mp4
    10:45
  • 09-Spark Data Operations.mp4
    11:33
  • 10-Loading Data from CSV Files.mp4
    11:12
  • 11-Fixing Issues in Our Data Part One.mp4
    10:51
  • 12-Fixing Issues in Our Data Part Two.mp4
    10:44
  • 13-Grouping, Joining, and Aggregating Part One.mp4
    16:08
  • 14-Grouping, Joining, and Aggregating Part Two.mp4
    09:02
  • 15-Machine Learning with Spark.mp4
    09:51
  • 16-Building a Recommendation System with Spark MLlib Part One.mp4
    11:11
  • 17-Building a Recommendation System with Spark MLlib Part Two.mp4
    11:22
  • 18-Building a Recommendation System with Spark MLlib Part Three.mp4
    16:18
  • 19-Finalizing our Recommendation System.mp4
    15:37
  • 20-What We Have Learned So Far.mp4
    10:13
  • 21-Machine Learning with Spark.mp4
    21:15
  • 22-Machine Learning Pipelines.mp4
    11:25
  • 23-Running a Logistic Regression Pipeline.mp4
    11:42
  • 24-Parameters, Features, and Persistence.mp4
    15:28
  • 25-Frequent Pattern Mining and Statistics.mp4
    21:59
  • 26-Natural Language Processing with Spark.mp4
    12:17
  • 27-Identifying Our Data.mp4
    11:38
  • 28-Data Preparation and Exploration.mp4
    11:38
  • 29-Creating Our Raw Training Data.mp4
    10:13
  • 30-Data Preparation and Regular Expressions.mp4
    15:28
  • 31-Data Cleaning and Transformation.mp4
    19:01
  • 32-Training a Sentiment Analysis Model Part One.mp4
    15:33
  • 33-Training a Sentiment Analysis Model Part Two.mp4
    09:32
  • 34-Fetching Data from Twitter.mp4
    06:24
  • 35-Spark Structured Streaming.mp4
    11:23
  • 36-Managing and Converting Streams.mp4
    12:49
  • 37-Assembling Our Streaming ML Solution.mp4
    17:05
  • 38-A Structured Approach to ML Streaming.mp4
    02:19
  • 39-Running Spark in Production.mp4
    10:43
  • 40-Running Spark at Scale.mp4
    10:02
  • 41-Tips, Tricks, and Take-Aways.mp4
    14:59
  • Description


    PySpark helps you perform data analysis at scale; it enables you to build more scalable analyses and pipelines. This course starts by introducing you to PySpark's potential for performing effective analyses of large datasets. You'll learn how to interact with Spark from Python and connect Jupyter to Spark to provide rich data visualizations. After that, you'll delve into Spark's components and architecture. You'll learn to work with Apache Spark and perform ML tasks more smoothly than before. You'll gather and query data using Spark SQL and overcome the challenges involved in reading it. You'll use the DataFrame API to operate with Spark MLlib and learn about the Pipeline API. Finally, the course provides tips and tricks for deploying your code and tuning performance. By the end of this course, you will be able to perform efficient data analytics and use PySpark to analyze large datasets at scale in your organization. All related code files are available in a GitHub repository at: https://github.com/PacktPublishing/Mastering-Big-Data-Analytics-with-PySpark

    Danny Meijer
    Danny Meijer works as the Lead Data Engineer in the Netherlands for the Data and Analytics department of a leading sporting goods retailer. He is a Business Process Expert, big data scientist, and data engineer, which gives him a unique mix of skills, the foremost of which is his business-first approach to data science and data engineering. He has over 13 years of IT experience across various domains, with skills ranging from (big) data modeling, architecture, design, and development to project and process management; he also has extensive experience with process mining, data engineering on big data, and process improvement. As a certified data scientist and big data professional, he knows his way around data and analytics and is proficient in various programming languages. He has extensive experience with various big data technologies and is fluent in NoSQL, Hadoop, Python, and of course Spark. Danny is a driven person, motivated by everything data and big data. He loves math, machine learning, and tackling difficult problems.
    Packt is a publishing company founded in 2003 and headquartered in Birmingham, UK, with offices in Mumbai, India. Packt primarily publishes print and electronic books and videos relating to information technology, including programming, web design, data analysis, and hardware.
    • Language: English
    • Training sessions: 41
    • Duration: 8:07:06
    • Release date: 2024/03/15