
Spark Project on Cloudera Hadoop (CDH) and GCP for Beginners


PARI MARGU

10:51:25

  • 1 - Course Introduction.mp4
    04:13
  • 2 - Introduction to Big Data.mp4
    09:12
  • 3 - Introduction to Apache Hadoop.mp4
    17:11
  • 4 - Understanding Hadoop Distributed File System (HDFS) and MapReduce.mp4
    30:01
  • 5 - Introduction to Apache Spark.mp4
    32:31
  • 6 - Spark Architecture.mp4
    12:00
  • 7 - Workaround for setting up Cloudera CDH on GCP.html
  • 8 - Environment Setup Overview.mp4
    01:16
  • 9 - Create Free Trial Account in Google Cloud Platform (GCP).mp4
    09:52
  • 10 - Create VM instance using Compute Engine in GCP.mp4
    21:35
  • 11 - Setting Up Single Node Cloudera Hadoop CDH 6.3 Cluster in GCP.mp4
    32:45
  • 11 - Setting-Up-Single-Node-Cloudera-Hadoop-CDH-6.3-Cluster-in-GCP-v1.0.txt
  • 12 - Install Apache NiFi on Single Node CDH 6.3 Cluster.mp4
    06:29
  • 13 - Install Apache Kafka on Single Node CDH 6.3 Cluster.mp4
    15:33
  • 13 - Install-Kafka.txt
  • 14 - Install Apache Cassandra on Single Node CDH 6.3 Cluster.mp4
    15:50
  • 14 - Install-Apache-Cassandra-on-CDH-6.3-Cluster-in-GCP.txt
  • 15 - Install MongoDB on Single Node CDH 6.3 Cluster.mp4
    07:20
  • 15 - Install-MongoDB-on-CDH-6.3-in-GCP.txt
  • 16 - Install and Configure PyCharm Community Edition for PySpark Application.mp4
    17:09
  • 16 - datamaking-pyspark-demo.zip
  • 16 - sample-data.zip
  • 17 - Install and Configure IntelliJ Community Edition for Spark with Scala Application.mp4
    22:01
  • 17 - apachespark101.zip
  • 18 - Resilient Distributed Datasets (RDD) Transformation Operations.mp4
    34:02
  • 18 - data.zip
  • 18 - datamaking-pyspark-demo.zip
  • 18 - datamaking-spark-demo.zip
  • 19 - Resilient Distributed Datasets (RDD) Action Operations.mp4
    10:00
  • 19 - data.zip
  • 19 - datamaking-pyspark-demo.zip
  • 19 - datamaking-spark-demo.zip
  • 20 - Spark DataFrame Operations.mp4
    34:02
  • 20 - data.zip
  • 20 - datamaking-pyspark-demo.zip
  • 20 - datamaking-spark-demo.zip
  • 21 - Spark SQL Concepts with Hands-On.mp4
    13:33
  • 21 - data.zip
  • 21 - datamaking-pyspark-demo.zip
  • 21 - datamaking-spark-demo.zip
  • 22 - Introduction to Apache NiFi.mp4
    08:15
  • 23 - Apache NiFi Core Terminologies.mp4
    05:54
  • 24 - Apache NiFi Concepts with Hands-On Part 1.mp4
    09:45
  • 25 - Apache NiFi Concepts with Hands-On Part 2.mp4
    10:04
  • 26 - Introduction to Apache Kafka.mp4
    11:32
  • 27 - Key Concepts in Apache Kafka.mp4
    13:39
  • 28 - Apache Kafka Architecture.mp4
    09:12
  • 29 - Kafka Producer with HandsOn.mp4
    08:16
  • 29 - Python-Kafka-Producer.txt
  • 29 - datamaking-kafka-demo.zip
  • 30 - Kafka Consumer with HandsOn.mp4
    08:04
  • 30 - Python-Kafka-Consumer.txt
  • 30 - datamaking-kafka-demo.zip
  • 31 - Introduction to Apache Hive.mp4
    08:07
  • 32 - Hive Table Concepts with Hands-On.html
  • 32 - customers.txt
  • 32 - orders.txt
  • 33 - Hive Joins Concepts with Hands-On.html
  • 34 - Partitioning and Bucketing Concepts in Hive with Hands-On.html
  • 35 - Project Architecture: Building Data Processing Pipeline.mp4
    09:00
  • 36 - Generate Retail Data using Apache NiFi Data Pipeline (eCommerce Data Simulator).mp4
    14:04
  • 36 - api-apache-nifi-retail-simulator-kafka.zip
  • 36 - jolt-json-spec.txt
  • 37 - Spark Structured Streaming and Apache Kafka Integration.mp4
    14:03
  • 37 - pyspark.zip
  • 37 - spark-scala.zip
  • 38 - Building Data Processing Pipeline with Spark Structured Streaming and Cassandra.mp4
    47:10
  • 38 - cassandra-table.txt
  • 38 - pyspark.zip
  • 38 - spark-scala.zip
  • 39 - Building Data Processing Pipeline with Spark Structured Streaming and MongoDB.mp4
    08:49
  • 39 - mongodb-collections.txt
  • 39 - pyspark.zip
  • 39 - spark-with-scala.zip
  • 40 - Building Data Visualization using Python.mp4
    14:38
  • 40 - datamaking-real-time-dashboard.zip
  • 40 - install-dash-on-centos.txt
  • 41 - Project Demo.mp4
    22:02
  • 41 - spark-submit-command.txt
  • 42 - How to Install Apache Zeppelin in CDH 6.3 Cluster.mp4
    14:04
  • 42 - Install-Apache-Zeppelin.txt
  • 43 - Data Analysis using Spark SQL in Apache Zeppelin.mp4
    09:03
  • 43 - spark-sql.txt
  • 44 - Introduction to Docker.mp4
    11:11
  • 45 - Install Docker on Ubuntu Operating System.mp4
    10:16
  • 46 - Install Docker on Windows Operating System.mp4
    08:22
  • 47 - Docker Practical Tutorial.mp4
    29:20


    Building Data Processing Pipeline Using Apache NiFi, Apache Kafka, Apache Spark, Cassandra, MongoDB, Hive and Zeppelin

    What You'll Learn?


    • Complete Spark Project Development on a Cloudera Hadoop and Spark Cluster
    • Fundamentals of Google Cloud Platform (GCP)
    • Setting up a Cloudera Hadoop and Spark Cluster (CDH 6.3) on GCP
    • Features of Spark Structured Streaming using Spark with Scala
    • Features of Spark Structured Streaming using Spark with Python (PySpark)
    • Fundamentals of Apache NiFi
    • Fundamentals of Apache Kafka
    • How to use NoSQL databases like MongoDB and Cassandra with Spark Structured Streaming
    • How to build Data Visualization using Python
    • Fundamentals of Apache Hive and how to integrate it with Apache Spark
    • Features of Apache Zeppelin
    • Fundamentals of Docker and Containerization

    Who is this for?


  • Beginners who want to learn the Apache Spark/Big Data project development process and architecture
  • Entry/intermediate-level Data Engineers and Data Scientists
  • Data Engineering and Data Science aspirants
  • Data enthusiasts who want to learn how to develop and run Spark applications on a CDH cluster
  • Anyone who is genuinely willing to become a Big Data/Spark Developer

    What You Need to Know?


  • Basic understanding of a programming language
  • Basic understanding of Apache Hadoop
  • Basic understanding of Apache Spark
  • No worries if you are an absolute beginner: the Apache Hadoop and Apache Spark basics are covered in the course
  • Most important of all, a willingness to learn


    Description
    • In the retail business, retail stores and eCommerce websites generate large amounts of data in real time.

    • There is always a need to process this data in real time and generate insights that business people can use to make decisions, increase sales in the retail market, and provide a better customer experience.

    • Since the data is huge and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

    • Hence we build a Data Processing Pipeline using Apache NiFi, Apache Kafka, Apache Spark, Apache Cassandra, MongoDB, Apache Hive and Apache Zeppelin to generate insights from this data.

    • The Spark project is built using Apache Spark with Scala and PySpark on a Cloudera Hadoop (CDH 6.3) cluster running on top of Google Cloud Platform (GCP); a minimal sketch of the pipeline's core step follows.
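
      As a taste of the project's core step, here is a minimal PySpark Structured Streaming sketch that reads the simulated retail events from Kafka. The broker address, topic name, and event schema are placeholder assumptions, not the course's exact code, and the console sink stands in for the Cassandra and MongoDB sinks built later.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, from_json
        from pyspark.sql.types import DoubleType, StringType, StructField, StructType

        # The spark-sql-kafka connector package must be supplied to
        # spark-submit (e.g. via --packages) on the cluster.
        spark = SparkSession.builder.appName("retail-stream-demo").getOrCreate()

        # Hypothetical schema for the simulated retail events.
        schema = StructType([
            StructField("order_id", StringType()),
            StructField("product", StringType()),
            StructField("amount", DoubleType()),
        ])

        # Read the Kafka topic as an unbounded streaming DataFrame;
        # broker and topic are placeholders.
        events = (spark.readStream
                  .format("kafka")
                  .option("kafka.bootstrap.servers", "localhost:9092")
                  .option("subscribe", "retail-events")
                  .load()
                  .select(from_json(col("value").cast("string"), schema).alias("e"))
                  .select("e.*"))

        # Print micro-batches to the console; the project swaps this
        # sink for the Cassandra/MongoDB writers.
        query = events.writeStream.outputMode("append").format("console").start()
        query.awaitTermination()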

      Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.
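
      Sections 18 and 19 cover the RDD API; below is a minimal sketch of the lazy transformation/action model, assuming only a local PySpark installation (all names here are illustrative):

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
        sc = spark.sparkContext

        # parallelize distributes the data; map and filter are lazy
        # transformations, and collect is the action that actually
        # triggers execution.
        nums = sc.parallelize([1, 2, 3, 4, 5])
        squares = nums.map(lambda x: x * x).filter(lambda x: x > 5)
        print(squares.collect())  # [9, 16, 25]
        spark.stop()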

      Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
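
      A minimal producer sketch using the kafka-python client (an assumption; the course ships its own producer code in Python-Kafka-Producer.txt). The broker address and topic name are placeholders for the single-node CDH setup:

        import json
        from kafka import KafkaProducer  # pip install kafka-python

        producer = KafkaProducer(
            bootstrap_servers="localhost:9092",
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        # Publish one simulated retail event to the topic.
        producer.send("retail-events", {"order_id": "1001", "product": "shoes", "amount": 49.99})
        producer.flush()  # block until the broker has the message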

      Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
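
      The MapReduce model itself needs no cluster to illustrate. Here is a plain-Python word count showing the map and reduce phases that Hadoop distributes across nodes (an illustration of the programming model only, not Hadoop code):

        from collections import defaultdict

        lines = ["spark on hadoop", "hadoop on gcp", "spark on gcp"]

        # Map phase: emit a (word, 1) pair for every word in every line.
        mapped = [(word, 1) for line in lines for word in line.split()]

        # Shuffle + reduce phase: group the pairs by key and sum the counts.
        counts = defaultdict(int)
        for word, n in mapped:
            counts[word] += n

        print(dict(counts))  # {'spark': 2, 'on': 3, 'hadoop': 2, 'gcp': 2}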

      A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
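
      For example, MongoDB stores records as schema-flexible documents rather than rows. A minimal sketch using the pymongo client; the connection string, database, and field names are placeholders, not the course's exact code:

        from pymongo import MongoClient  # pip install pymongo

        client = MongoClient("mongodb://localhost:27017")
        orders = client["retail"]["orders"]

        # Documents need no fixed schema; nested fields and arrays
        # replace the join tables a relational design would use.
        orders.insert_one({
            "order_id": "1001",
            "customer": {"name": "Asha", "city": "Chennai"},
            "items": [{"product": "shoes", "qty": 1, "price": 49.99}],
        })
        print(orders.find_one({"order_id": "1001"}))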


    Data Engineer (Big Data/Hadoop, Apache Spark, Python) and freelance consultant, YouTube creator. 12+ years of experience implementing solutions for enterprise clients, with strong framework skills for building complex business solutions. Has worked on Web, Windows, Mobile, and Hadoop/Big Data and Apache Spark applications, with 6+ years of experience in the Hadoop/Big Data and Apache Spark frameworks, on Hadoop distributions such as Cloudera CDH and Apache Hadoop.
    • Language: English
    • Training sessions: 43
    • Duration: 10:51:25
    • Release date: 2023/08/25
