Architecting Big Data Solutions Using Google Dataproc

Janani Ravi

2:16:51

  • 0101.Course Overview.mp4
    02:09
  • 0201.Module Overview.mp4
    01:46
  • 0202.Prerequisites, Course Outline, and Spikey Sales Scenarios.mp4
    04:03
  • 0203.Distributed Processing.mp4
    02:57
  • 0204.Storage in Traditional Hadoop.mp4
    03:03
  • 0205.Compute in Traditional Hadoop.mp4
    04:19
  • 0206.Separating Storage and Compute with Dataproc.mp4
    06:22
  • 0207.Hadoop vs. Dataproc.mp4
    03:34
  • 0208.Using the Cloud Shell, Enabling the Dataproc API.mp4
    03:48
  • 0209.Dataproc Features.mp4
    03:56
  • 0210.Migrating to Dataproc.mp4
    05:50
  • 0211.Dataproc Pricing.mp4
    03:00
  • 0301.Module Overview.mp4
    01:04
  • 0302.Creating a Dataproc Cluster Using the Web Console.mp4
    06:58
  • 0303.Using SSH to Connect to the Master Node.mp4
    04:14
  • 0304.Creating a Firewall Rule to Enable Access to Dataproc.mp4
    04:30
  • 0305.Accessing the Resource Manager and Name Node UI.mp4
    02:12
  • 0306.Upload Data and MapReduce Code to Cloud Storage.mp4
    04:06
  • 0307.Running MapReduce on Dataproc.mp4
    03:58
  • 0308.Running MapReduce Using the gcloud Command Line Utility.mp4
    04:17
  • 0309.Creating a Cluster with Preemptible Instances Using gcloud.mp4
    03:01
  • 0310.Monitoring Clusters Using Stackdriver.mp4
    04:50
  • 0311.Stackdriver Monitoring Groups and Alerting Policies.mp4
    05:26
  • 0312.Configuring Initialization Actions for Dataproc.mp4
    04:22
  • 0401.Module Overview.mp4
    01:10
  • 0402.Spark for Distributed Processing.mp4
    04:18
  • 0403.Running a Spark Scala Job Using the Web Console.mp4
    03:22
  • 0404.Executing a Spark Application Using gcloud.mp4
    02:33
  • 0405.Creating a BigQuery Table.mp4
    03:32
  • 0406.PySpark Application Using BigQuery and Cloud Storage Connectors.mp4
    04:10
  • 0407.Executing a Spark Application to Get Results in BigQuery.mp4
    03:09
  • 0408.Monitoring Spark Jobs on Dataproc.mp4
    02:39
  • 0501.Module Overview.mp4
    01:04
  • 0502.Pig for Extract Transform Load.mp4
    03:34
  • 0503.Running Pig Scripts on Dataproc.mp4
    02:58
  • 0504.Storing Pig Output to Cloud Storage.mp4
    02:31
  • 0505.Hive to Query Big Data.mp4
    02:52
  • 0506.Executing Hive Queries on Dataproc.mp4
    03:03
  • 0507.Summary and Further Study.mp4
    02:11
  • Description


    Dataproc is Google’s managed Hadoop offering on the cloud. This course teaches you how separating storage from compute lets you use clusters more efficiently, purely for processing data rather than for storing it.

    What You'll Learn


      For organizations planning a move to the Google Cloud Platform, Dataproc offers the same features as Hadoop, along with powerful additional paradigms such as the separation of compute and storage. Dataproc allows you to lift and shift your Hadoop processing jobs to the cloud and store your data separately in Cloud Storage buckets, which removes the need to keep your clusters running at all times. In this course, Architecting Big Data Solutions Using Google Dataproc, you’ll learn to work with managed Hadoop on the Google Cloud and the best practices to follow when migrating your on-premises jobs to Dataproc clusters. First, you'll create a Dataproc cluster and configure firewall rules so that you can access the cluster manager UI from your local machine. Next, you'll discover how to use the Spark distributed analytics engine on your Dataproc cluster. Then, you'll explore how to write code that integrates your Spark jobs with BigQuery and Cloud Storage buckets using connectors. Finally, you'll learn how to use your Dataproc cluster to perform extract, transform, and load operations using Pig as a scripting language, and how to work with Hive tables. By the end of this course, you'll have the knowledge you need to work with Google’s managed Hadoop offering and a sound idea of how to migrate jobs and data from your on-premises Hadoop cluster to the Google Cloud.
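
Because the data lives in Cloud Storage rather than on the cluster, a cluster only needs to exist while a job is running. The following is a minimal sketch of that create, run, delete lifecycle, not the course's own code. It only builds the `gcloud dataproc` command lines and prints them; it does not run them. The cluster name, region, bucket, and script path are hypothetical placeholders, and the helper function `ephemeral_cluster_commands` is invented for illustration.

```python
import shlex


def ephemeral_cluster_commands(cluster, region, bucket, script, workers=2):
    """Build gcloud commands for a create -> run -> delete cluster lifecycle.

    All argument values are supplied by the caller; nothing here is executed.
    """
    # The job script is read from Cloud Storage, not from the cluster's disks.
    job_uri = f"gs://{bucket}/{script}"

    create = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}", f"--num-workers={workers}",
    ]
    submit = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", job_uri,
        f"--cluster={cluster}", f"--region={region}",
    ]
    # --quiet skips the interactive confirmation prompt on delete.
    delete = [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}", "--quiet",
    ]
    return [create, submit, delete]


if __name__ == "__main__":
    # Hypothetical names, chosen only for this example.
    commands = ephemeral_cluster_commands(
        cluster="example-cluster",
        region="us-central1",
        bucket="example-bucket",
        script="jobs/sales_report.py",
    )
    for command in commands:
        print(shlex.join(command))
```

Running the script prints three commands that could be pasted into Cloud Shell in order. Deleting the cluster at the end does not affect the input or output data, provided the job reads from and writes to Cloud Storage.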

    Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.
    Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.
    • Language: English
    • Training sessions: 39
    • Duration: 2:16:51
    • Level: Preliminary
    • Release date: 2023/10/11