Big Data Analytics with Hadoop and Apache Spark

Kumaran Ponnambalam

51:55

01 - Introduction

01 - The combined power of Spark and Hadoop Distributed File System (HDFS).mp4

00:42

02 - 1. Introduction and Setup

01 - Apache Hadoop overview.mp4

01:50

02 - Apache Spark overview.mp4

00:45

03 - Integrating Spark and Hadoop.mp4

01:19

04 - Using exercise files.mp4

03:36

03 - 2. HDFS Data Modeling for Analytics

01 - Storage formats.mp4

02:20

02 - Compression.mp4

02:05

03 - Partitioning.mp4

02:02

04 - Bucketing.mp4

01:17

05 - Best practices for data storage.mp4

01:19

04 - 3. Data Ingestion with Spark

01 - Reading external files into Spark.mp4

01:46

02 - Writing to HDFS.mp4

01:26

03 - Parallel writes with partitioning.mp4

01:12

04 - Parallel writes with bucketing.mp4

01:17

05 - Best practices for ingestion.mp4

00:55

05 - 4. Data Extraction with Spark

01 - How Spark works.mp4

02:59

02 - Reading HDFS files with schema.mp4

01:18

03 - Reading partitioned data.mp4

01:25

04 - Reading bucketed data.mp4

00:55

05 - Best practices for data extraction.mp4

01:08

06 - 5. Optimizing Spark Processing

01 - Pushing down projections.mp4

01:45

02 - Pushing down filters.mp4

01:52

03 - Managing partitions.mp4

02:32

04 - Improving joins.mp4

01:59

05 - Storing intermediate results.mp4

02:00

06 - Best practices for data processing.mp4

02:39

07 - 6. Use Case Project

01 - Problem definition.mp4

01:57

02 - Data loading.mp4

01:38

03 - Total score analytics.mp4

01:02

04 - Average score analytics.mp4

00:59

05 - Top student analytics.mp4

01:12

08 - Conclusion

01 - Continuing on with big data analytics.mp4

00:44

Description

Apache Hadoop was a pioneer in the world of big data technologies, and it continues to lead in enterprise big data storage. Apache Spark is the top big data processing engine and provides an impressive array of features and capabilities. When used together, the Hadoop Distributed File System (HDFS) and Spark can provide a truly scalable setup for big data analytics. In this course, data analytics expert Kumaran Ponnambalam shows you how to leverage these two technologies to build scalable and optimized data analytics pipelines. Explore ways to optimize data modeling and storage on HDFS; discuss scalable data ingestion and extraction using Spark; and review actionable tips for optimizing data processing in Spark. Plus, complete a use case project that allows you to practice your new techniques.

Kumaran Ponnambalam

A seasoned veteran in everything data, with a reputation for delivering high-performance database and SaaS applications, currently specializing in leading Big Data Science and Engineering efforts.

LinkedIn Learning

LinkedIn Learning is an American online learning provider and a subsidiary of LinkedIn. It offers video courses taught by industry experts in software, creative, and business skills. Its courses fall into four categories: Business, Creative, Technology, and Certifications. The company was founded in 1995 by Lynda Weinman as Lynda.com and was acquired by LinkedIn in 2015. Microsoft acquired LinkedIn in December 2016.