
Spark Project on Cloudera Hadoop (CDH) and GCP for Beginners


PARI MARGU

10:51:25

  • 1 - Course Introduction.mp4
    04:13
  • 2 - Introduction to Big Data.mp4
    09:12
  • 3 - Introduction to Apache Hadoop.mp4
    17:11
  • 4 - Understanding Hadoop Distributed File System (HDFS) and MapReduce.mp4
    30:01
  • 5 - Introduction to Apache Spark.mp4
    32:31
  • 6 - Spark Architecture.mp4
    12:00
  • 7 - Workaround for setting up Cloudera CDH on GCP.html
  • 8 - Environment Setup Overview.mp4
    01:16
  • 9 - Create Free Trial Account in Google Cloud Platform (GCP).mp4
    09:52
  • 10 - Create VM instance using Compute Engine in GCP.mp4
    21:35
  • 11 - Setting Up Single Node Cloudera Hadoop CDH 6.3 Cluster in GCP.mp4
    32:45
  • 11 - Setting-Up-Single-Node-Cloudera-Hadoop-CDH-6.3-Cluster-in-GCP-v1.0.txt
  • 12 - Install Apache NiFi on Single Node CDH 6.3 Cluster.mp4
    06:29
  • 13 - Install Apache Kafka on Single Node CDH 6.3 Cluster.mp4
    15:33
  • 13 - Install-Kafka.txt
  • 14 - Install Apache Cassandra on Single Node CDH 6.3 Cluster.mp4
    15:50
  • 14 - Install-Apache-Cassandra-on-CDH-6.3-Cluster-in-GCP.txt
  • 15 - Install MongoDB on Single Node CDH 6.3 Cluster.mp4
    07:20
  • 15 - Install-MongoDB-on-CDH-6.3-in-GCP.txt
  • 16 - Install and Configure PyCharm Community Edition for PySpark Application.mp4
    17:09
  • 16 - datamaking-pyspark-demo.zip
  • 16 - sample-data.zip
  • 17 - Install and Configure IntelliJ Community Edition for Spark with Scala Application.mp4
    22:01
  • 17 - apachespark101.zip
  • 18 - Resilient Distributed Datasets (RDD) Transformation Operations.mp4
    34:02
  • 18 - data.zip
  • 18 - datamaking-pyspark-demo.zip
  • 18 - datamaking-spark-demo.zip
  • 19 - Resilient Distributed Datasets (RDD) Action Operations.mp4
    10:00
  • 19 - data.zip
  • 19 - datamaking-pyspark-demo.zip
  • 19 - datamaking-spark-demo.zip
  • 20 - Spark DataFrame Operations.mp4
    34:02
  • 20 - data.zip
  • 20 - datamaking-pyspark-demo.zip
  • 20 - datamaking-spark-demo.zip
  • 21 - Spark SQL Concepts with Hands-On.mp4
    13:33
  • 21 - data.zip
  • 21 - datamaking-pyspark-demo.zip
  • 21 - datamaking-spark-demo.zip
  • 22 - Introduction to Apache NiFi.mp4
    08:15
  • 23 - Apache NiFi Core Terminologies.mp4
    05:54
  • 24 - Apache NiFi Concepts with Hands-On Part 1.mp4
    09:45
  • 25 - Apache NiFi Concepts with Hands-On Part 2.mp4
    10:04
  • 26 - Introduction to Apache Kafka.mp4
    11:32
  • 27 - Key Concepts in Apache Kafka.mp4
    13:39
  • 28 - Apache Kafka Architecture.mp4
    09:12
  • 29 - Kafka Producer with HandsOn.mp4
    08:16
  • 29 - Python-Kafka-Producer.txt
  • 29 - datamaking-kafka-demo.zip
  • 30 - Kafka Consumer with HandsOn.mp4
    08:04
  • 30 - Python-Kafka-Consumer.txt
  • 30 - datamaking-kafka-demo.zip
  • 31 - Introduction to Apache Hive.mp4
    08:07
  • 32 - Hive Table Concepts with Hands-On.html
  • 32 - customers.txt
  • 32 - orders.txt
  • 33 - Hive Joins Concepts with Hands-On.html
  • 34 - Partitioning and Bucketing Concepts in Hive with Hands-On.html
  • 35 - Project Architecture: Building Data Processing Pipeline.mp4
    09:00
  • 36 - Generate Retail Data using Apache NiFi Data Pipeline (eCommerce Data Simulator).mp4
    14:04
  • 36 - api-apache-nifi-retail-simulator-kafka.zip
  • 36 - jolt-json-spec.txt
  • 37 - Spark Structured Streaming and Apache Kafka Integration.mp4
    14:03
  • 37 - pyspark.zip
  • 37 - spark-scala.zip
  • 38 - Building Data Processing Pipeline with Spark Structured Streaming and Cassandra.mp4
    47:10
  • 38 - cassandra-table.txt
  • 38 - pyspark.zip
  • 38 - spark-scala.zip
  • 39 - Building Data Processing Pipeline with Spark Structured Streaming and MongoDB.mp4
    08:49
  • 39 - mongodb-collections.txt
  • 39 - pyspark.zip
  • 39 - spark-with-scala.zip
  • 40 - Building Data Visualization using Python.mp4
    14:38
  • 40 - datamaking-real-time-dashboard.zip
  • 40 - install-dash-on-centos.txt
  • 41 - Project Demo.mp4
    22:02
  • 41 - spark-submit-command.txt
  • 42 - How to Install Apache Zeppelin in CDH 6.3 Cluster.mp4
    14:04
  • 42 - Install-Apache-Zeppelin.txt
  • 43 - Data Analysis using Spark SQL in Apache Zeppelin.mp4
    09:03
  • 43 - spark-sql.txt
  • 44 - Introduction to Docker.mp4
    11:11
  • 45 - Install Docker on Ubuntu Operating System.mp4
    10:16
  • 46 - Install Docker on Windows Operating System.mp4
    08:22
  • 47 - Docker Practical Tutorial.mp4
    29:20


    Building Data Processing Pipeline Using Apache NiFi, Apache Kafka, Apache Spark, Cassandra, MongoDB, Hive and Zeppelin

    What You'll Learn?


    • Complete Spark Project Development on a Cloudera Hadoop and Spark Cluster
    • Fundamentals of Google Cloud Platform (GCP)
    • Setting up a Cloudera Hadoop and Spark Cluster (CDH 6.3) on GCP
    • Features of Spark Structured Streaming using Spark with Scala
    • Features of Spark Structured Streaming using Spark with Python (PySpark)
    • Fundamentals of Apache NiFi
    • Fundamentals of Apache Kafka
    • How to use NoSQL databases like MongoDB and Cassandra with Spark Structured Streaming
    • How to build Data Visualization using Python
    • Fundamentals of Apache Hive and how to integrate it with Apache Spark
    • Features of Apache Zeppelin
    • Fundamentals of Docker and Containerization

    Who is this for?


  • Beginners who want to learn the Apache Spark/Big Data project development process and architecture
  • Entry/intermediate-level Data Engineers and Data Scientists
  • Data Engineering and Data Science aspirants
  • Data enthusiasts who want to learn how to develop and run Spark applications on a CDH cluster
  • Anyone who is genuinely willing to become a Big Data/Spark Developer

    What You Need to Know?


  • Basic understanding of a programming language
  • Basic understanding of Apache Hadoop
  • Basic understanding of Apache Spark
  • No worries if you are an absolute beginner: the Apache Hadoop and Apache Spark basics are covered in the course
  • Most important of all, a willingness to learn


    Description
    • In the retail business, retail stores and eCommerce websites generate large amounts of data in real time.

    • There is always a need to process this data in real time and generate insights that business people can use to make decisions, increase sales in the retail market, and provide a better customer experience.

    • Since the data is huge and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

    • Hence we build a Data Processing Pipeline using Apache NiFi, Apache Kafka, Apache Spark, Apache Cassandra, MongoDB, Apache Hive and Apache Zeppelin to generate insights from this data.

    • The Spark project is built using Apache Spark with Scala and PySpark on a Cloudera Hadoop (CDH 6.3) cluster running on top of Google Cloud Platform (GCP); a minimal sketch of the pipeline's core step follows.
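
      As a taste of the project's core step, here is a minimal PySpark Structured Streaming sketch that reads the simulated retail events from Kafka. The broker address, topic name, and event schema are placeholder assumptions, not the course's exact code, and the console sink stands in for the Cassandra and MongoDB sinks built later.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, from_json
        from pyspark.sql.types import DoubleType, StringType, StructField, StructType

        # The spark-sql-kafka connector package must be supplied to
        # spark-submit (e.g. via --packages) on the cluster.
        spark = SparkSession.builder.appName("retail-stream-demo").getOrCreate()

        # Hypothetical schema for the simulated retail events.
        schema = StructType([
            StructField("order_id", StringType()),
            StructField("product", StringType()),
            StructField("amount", DoubleType()),
        ])

        # Read the Kafka topic as an unbounded streaming DataFrame;
        # broker and topic are placeholders.
        events = (spark.readStream
                  .format("kafka")
                  .option("kafka.bootstrap.servers", "localhost:9092")
                  .option("subscribe", "retail-events")
                  .load()
                  .select(from_json(col("value").cast("string"), schema).alias("e"))
                  .select("e.*"))

        # Print micro-batches to the console; the project swaps this
        # sink for the Cassandra/MongoDB writers.
        query = events.writeStream.outputMode("append").format("console").start()
        query.awaitTermination()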

      Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.
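
      Sections 18 and 19 cover the RDD API; below is a minimal sketch of the lazy transformation/action model, assuming only a local PySpark installation (all names here are illustrative):

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
        sc = spark.sparkContext

        # parallelize distributes the data; map and filter are lazy
        # transformations, and collect is the action that actually
        # triggers execution.
        nums = sc.parallelize([1, 2, 3, 4, 5])
        squares = nums.map(lambda x: x * x).filter(lambda x: x > 5)
        print(squares.collect())  # [9, 16, 25]
        spark.stop()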

      Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
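
      A minimal producer sketch using the kafka-python client (an assumption; the course ships its own producer code in Python-Kafka-Producer.txt). The broker address and topic name are placeholders for the single-node CDH setup:

        import json
        from kafka import KafkaProducer  # pip install kafka-python

        producer = KafkaProducer(
            bootstrap_servers="localhost:9092",
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        # Publish one simulated retail event to the topic.
        producer.send("retail-events", {"order_id": "1001", "product": "shoes", "amount": 49.99})
        producer.flush()  # block until the broker has the message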

      Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
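
      The MapReduce model itself needs no cluster to illustrate. Here is a plain-Python word count showing the map and reduce phases that Hadoop distributes across nodes (an illustration of the programming model only, not Hadoop code):

        from collections import defaultdict

        lines = ["spark on hadoop", "hadoop on gcp", "spark on gcp"]

        # Map phase: emit a (word, 1) pair for every word in every line.
        mapped = [(word, 1) for line in lines for word in line.split()]

        # Shuffle + reduce phase: group the pairs by key and sum the counts.
        counts = defaultdict(int)
        for word, n in mapped:
            counts[word] += n

        print(dict(counts))  # {'spark': 2, 'on': 3, 'hadoop': 2, 'gcp': 2}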

      A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
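
      For example, MongoDB stores records as schema-flexible documents rather than rows. A minimal sketch using the pymongo client; the connection string, database, and field names are placeholders, not the course's exact code:

        from pymongo import MongoClient  # pip install pymongo

        client = MongoClient("mongodb://localhost:27017")
        orders = client["retail"]["orders"]

        # Documents need no fixed schema; nested fields and arrays
        # replace the join tables a relational design would use.
        orders.insert_one({
            "order_id": "1001",
            "customer": {"name": "Asha", "city": "Chennai"},
            "items": [{"product": "shoes", "qty": 1, "price": 49.99}],
        })
        print(orders.find_one({"order_id": "1001"}))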


    Data Engineer (Big Data/Hadoop, Apache Spark, Python) and freelance consultant, YouTube creator. 12+ years of experience implementing solutions for enterprise clients, with strong framework skills for building complex business solutions. Has worked on Web, Windows, Mobile, and Hadoop/Big Data and Apache Spark applications, with 6+ years of experience in the Hadoop/Big Data and Apache Spark frameworks, on Hadoop distributions such as Cloudera CDH and Apache Hadoop.
    • Language: English
    • Training sessions: 43
    • Duration: 10:51:25
    • Release date: 2023/08/25
