Handling Batch Data with Apache Spark on Databricks

Janani Ravi

2:21:44

  • 01. Course Overview.mp4
    02:01
  • 02. Prerequisites and Course Outline.mp4
    02:04
  • 03. Apache Spark on Databricks.mp4
    02:55
  • 04. RDDs and Data Frames.mp4
    05:31
  • 05. Narrow and Wide Transformations.mp4
    05:24
  • 06. Demo-Configuring Workspace and Cluster.mp4
    04:39
  • 07. Demo-Operations with Shuffled Writes to Disk.mp4
    05:45
  • 08. Demo-Basic Transformations.mp4
    05:33
  • 09. Demo-Aggregation Transformations.mp4
    08:10
  • 10. The Catalyst Optimizer.mp4
    06:20
  • 11. Demo-Creating Global Table.mp4
    03:13
  • 12. Demo-Running SQL Queries in Spark.mp4
    06:27
  • 13. Demo-Replacing Table Contents and Partitioning Tables.mp4
    05:35
  • 14. Demo-Running Interactive Queries on a Notebook on an All-purpose Cluster.mp4
    04:11
  • 15. Demo-Running a Notebook as a Job on a Job Cluster.mp4
    05:41
  • 16. User-defined Functions UDFs.mp4
    02:12
  • 17. Vectorized UDFs.mp4
    02:55
  • 18. Demo-Loading Data into Azure Cosmos DB.mp4
    03:51
  • 19. Demo-Reading Data from Cosmos DB in Spark.mp4
    03:52
  • 20. Demo-User-defined Functions UDFs.mp4
    05:08
  • 21. Demo-Vectorized UDFs - Series to Series.mp4
    05:08
  • 22. Demo-Vectorized UDFs - Iterator of Series to Iterator of Series.mp4
    02:12
  • 23. Demo-Vectorized UDFs - Iterator of Multiple Series to Iterator of Series.mp4
    02:39
  • 24. Demo-Vectorized UDFs - Series to Scalar.mp4
    03:08
  • 25. Partitioning.mp4
    02:15
  • 26. Demo-Working with Data Partitions.mp4
    05:42
  • 27. Demo-Repartitioning and Coalescing Data.mp4
    03:56
  • 28. Demo-Performing Union Operations.mp4
    02:15
  • 29. Demo-Performing Join Operations.mp4
    07:27
  • 30. Window Functions.mp4
    03:01
  • 31. Row Frames and Range Frames.mp4
    05:03
  • 32. Demo-Applying Window Functions.mp4
    06:05
  • 33. Summary and Further Study.mp4
    01:26
    Description


    This course teaches you how to transform and aggregate batch data with Apache Spark on the Azure Databricks platform. It covers selection, filter, and aggregation queries, built-in and user-defined functions, and windowing and join operations on batch data.

    What You'll Learn


      Azure Databricks lets you process and query big data using the Apache Spark unified analytics engine. It works with a variety of batch sources and makes it seamless to analyze, visualize, and process data on the Azure Cloud Platform.

      In this course, Handling Batch Data with Apache Spark on Databricks, you will learn how to perform transformations and aggregations on batch data with selection, filtering, grouping, and ordering queries that use the DataFrame API. You will understand the difference between narrow and wide transformations in Spark, which will help you figure out why certain transformations are more efficient than others. You will also see how to perform these same transformations by running SQL queries on your data.

      Next, you will learn how to implement your own custom user-defined functions (UDFs) to process your data. You will write code in Azure Databricks notebooks to define and register your UDFs and use them to transform your data. You will also learn how to define and use different flavors of vectorized UDFs, and why vectorized UDFs are often more efficient than regular UDFs. Along the way, you will see how to read from Azure Cosmos DB as a source for your batch data.

      Finally, you will see how to repartition your data in memory to improve processing performance, use window functions to compute statistics on your data, and combine data frames using union and join operations.

      When you’re finished with this course, you will have the skills to perform advanced transformations and aggregations on batch data, including defining and using user-defined functions for processing.

    Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.
    Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.
    • Language: English
    • Training sessions: 33
    • Duration: 2:21:44
    • Level: preliminary
    • Release date: 2023/12/15