About Apache Spark

Apache Spark is an open-source unified analytics engine for large-scale data processing, handling both batch jobs and near-real-time streaming data. Spark offers easy-to-use APIs and also ships with higher-level libraries for machine learning (MLlib), SQL queries (Spark SQL), and stream processing (Structured Streaming). In a business landscape that depends on big data, Apache Spark is an invaluable tool.

Courses

Optimizing Apache Spark on Databricks

Janani Ravi

2:00:05

12/11/2023

Handling Batch Data with Apache Spark on Databricks

Janani Ravi

2:21:44

12/10/2023

Getting Started with Apache Spark on Databricks

Janani Ravi

1:52:09

12/10/2023

Azure Spark Databricks Essential Training

LinkedIn Learning

Lynn Langit

2:52:18

English subtitles

12/05/2023

Handling Streaming Data with Azure Databricks Using Spark Structured Streaming

Mohit Batra

2:27:47

12/02/2023

Modeling Streaming Data for Processing with Apache Spark Structured Streaming

Eugene Meidinger

1:19:13

11/28/2023

Apache Spark Foundation

Udemy

4:57:21

11/18/2023

Introduction to Spark SQL and DataFrames

LinkedIn Learning

Dan Sullivan

1:53:25

English subtitles

11/13/2023

Manning Building An End-To-End Batch Data Pipeline With Apache Spark

O'Reilly

43:45

10/31/2023

Manning - Processing Covid-19 Data with Apache Spark

O'Reilly

1:11:33

English subtitles

10/31/2023

Spark in Action, Second Edition

Manning Publications

15:43:43

10/23/2023

Data Engineering Foundations Part 1 Using Spark, Hive, and Hadoop Scalable Tools

LiveLessons

4:26:08

10/22/2023

Big Data with Apache Spark and AWS

Skillbox, LLC

2:17:20

English subtitles

09/28/2023

Spark Project on Cloudera Hadoop (CDH) and GCP for Beginners

PARI MARGU

10:51:25

08/21/2023

Master Big Data - Apache Spark/Hadoop/Sqoop/Hive/Flume/Mongo

Navdeep Kaur

10:57:25

English subtitles

08/21/2023

Databricks Certified Associate Developer for Apache Spark 3

Aviral Bhardwaj

4:15:26

08/20/2023

Books

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Hien Luu

Big Data Processing Using Spark in Cloud (Studies in Big Data, 43)

Valentina Emilia Balas

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Butch Quinto

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications

Scott Haines

Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark

Mahmoud Parsian

Software Development

Developing Spark Applications with Python

Xavier Morera

Hands-On Deep Learning with Apache Spark: Build and deploy distributed deep learning applications on Apache Spark

Guglielmo Iozzia

Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Dr. Saket S.R. Mengle

Big Data Analysis with Python: Combine Spark and Python to unlock the powers of parallel computing and machine learning

Ankit Shukla

Frequently asked questions about Apache Spark

What is Apache Spark?

Apache Spark is a distributed data-processing framework. It was built for big data and is fast at processing very large data sets, in part because it can keep intermediate data in memory instead of writing it to disk between steps. With Spark, you can split a single data-processing job across many machines in a cluster, either running Spark on its own or alongside other big data tools such as Hadoop. Spark is an important tool in big data, machine learning, and artificial intelligence, fields that need a lot of computing power to process massive amounts of data. It takes much of the burden off programmers by abstracting away the manual work of distributed computing, such as splitting data into partitions, scheduling work across machines, and combining the results. Programmers can work with Spark in Java, Python, Scala, and R, and Spark also supports SQL queries and streaming data.
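The core idea of splitting one job across many machines can be sketched in plain Python. This is not Spark's API: it is a minimal, single-machine illustration, assuming a thread pool stands in for a cluster's executors and a list of strings stands in for a distributed data set. The data is cut into partitions, the same function runs on each partition (the map step), and the partial results are merged into one answer (the reduce step). The names `partition`, `count_words`, and `word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_partitions):
    """Split records round-robin into num_partitions chunks, loosely mirroring Spark partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """Map step: count words within a single partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_workers=4):
    """Run count_words on every partition in parallel, then merge the partial counts."""
    chunks = partition(lines, num_workers)
    # Threads stand in for cluster executors here; they show the structure of the
    # job, not a real speed-up for CPU-bound work in CPython.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reduce step: combine the per-partition Counters into a single total.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


lines = ["Spark splits work", "work runs in parallel", "Spark merges results"]
print(word_count(lines, num_workers=2).most_common(2))
```

In a real Spark job, the engine performs the partitioning, scheduling, and merging itself, and the threads would be replaced by executor processes running on different machines.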

What careers use Apache Spark?

You will find Apache Spark developers wherever big data, machine learning, and artificial intelligence are used. In financial services, Spark helps create recommendations for new financial products, among other uses. Investment banks use it to crunch market data and forecast stock trends, and FinTech companies rely on it heavily. In healthcare, developers use Spark to analyze patient records alongside past clinical data to assess future health risks. Manufacturers use it to analyze large data sets. In retail, programmers use it to marshal customer data, build personalized services, and suggest related products at checkout. Machine learning engineers, data scientists, and big data developers also use Spark in the travel, e-commerce, media, and entertainment industries.

What should I learn before Apache Spark?

Apache Spark is a flexible data-processing framework, and a few technologies are worth knowing before you learn it. First, you need to know how to interact with data stores, since Spark can read from and write to many of them. It also helps to know Hadoop, a popular distributed data infrastructure that is often used alongside Spark for big data tasks. Knowing SQL lets you query and retrieve data from databases if you plan to use them as a data source for Spark, and Spark SQL lets you run the same kind of queries on data inside Spark itself. Understanding the basics of a distributed database such as HBase or Cassandra is also useful. Finally, you need to know a programming language that Spark supports: Java, Python, Scala, or R.
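To give a sense of the SQL that carries over, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in data store; the `sales` table, its columns, and the function name `total_sales_by_region` are hypothetical. The GROUP BY aggregation is the kind of query you would also write in Spark, where a query of the same shape can be run against a registered table or view with `SparkSession.sql`.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load (region, amount) rows into an in-memory table and total them per region."""
    conn = sqlite3.connect(":memory:")  # throwaway database that lives only in memory
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Aggregate per region and list the largest totals first.
        cur = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()


rows = [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 50.0)]
print(total_sales_by_region(rows))
```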