Data Engineering with Google Datafusion and Big Query (CDAP)

Cassio Alessandro de Bolba

3:08:15

1 - Introduction

1 - 11 Get to Know the Teacher.mp4

02:07

2 - 12 Get to Know the Course.mp4

03:52

3 - 13 Introduction to Google Datafusion.mp4

08:44

4 - 14 Architecture and Components.mp4

07:26

5 - 15 Creating a Datafusion Instance.mp4

05:07

6 - 16 Instance Types and Pricing.mp4

07:13

7 - 17 Understanding a Datafusion Instance.mp4

07:35

2 - Developing Data Pipelines

8 - 21 GCS Object Storage.mp4

06:04

9 - 22 Big Query as Datalake.mp4

06:27

10 - 23 Working with Semi Structured Data.mp4

04:38

11 - 24 Pipeline Studio and Wrangler.mp4

13:40

12 - 25 Preview and Debug.mp4

07:06

13 - 26 Sinking data on Big Query.mp4

10:04

14 - ERROR Importing json pipeline from other Datafusion Instance.mp4

05:59

15 - 27 Branching the Pipeline.mp4

09:01

16 - 28 Move files.mp4

08:55

17 - 29 Big Query as Source.mp4

05:02

18 - 210 Transforming Data with Wrangler 1.mp4

10:28

19 - 211 Transforming Data with Wrangler 2.mp4

07:45

20 - 212 Transforming Data with Big Query.mp4

04:50

21 - 213 Execute Query in Datafusion.mp4

05:49

22 - 214 Data Partitioning in Big Query.mp4

07:33

23 - 215 MERGE statement.mp4

07:28

24 - 216 Delete temp Tables.mp4

08:35

25 - 217 Scheduling and Pipeline Dependencies.mp4

05:42

26 - 218 ERRO Quota DISKSTOTALGB Exceed.mp4

05:31

27 - 219 Challenge.mp4

05:34

Description

Your first steps in Data Engineering with Google Datafusion, a low-code tool with an open-source version (CDAP)

What You'll Learn?

Understand a bit more about Google Cloud resources
Use Google Datafusion as an ETL tool
Low-code data engineering
ETL
Create data pipelines and DAGs
Read and write data in Google BigQuery
Read and write data in Google Cloud Storage
Transform data with low code and queries
Some advanced SQL commands

Who is this for?

Data Engineers

Data Analysts

Data Scientists

Analytics Engineers

Low Code Developers

Python Developers looking to reduce coding overhead

Open Source Fans

What You Need to Know?

GCP account

Previous exposure to SQL

More details

This is an INTRODUCTORY course on Datafusion, Google Cloud's low-code ingestion tool. Google Data Fusion is a fully managed data integration platform that lets data engineers create, deploy, and manage data pipelines.
One of the main reasons to use Google Data Fusion is its ease of use. With an intuitive and visual interface, data engineers can create complex data pipelines without the need for extensive coding. The drag-and-drop interface simplifies the process of data transformation and cleansing, allowing professionals to focus on business logic rather than worrying about detailed coding.
Another significant benefit of Google Data Fusion is its scalability. The platform runs on Google Cloud, so it can process large volumes of data in parallel. Data engineers can scale processing capacity vertically or horizontally to match project needs.
Furthermore, Google Data Fusion seamlessly integrates with other services and products in the Google Cloud ecosystem. Data engineers can easily connect and integrate data pipelines with services such as BigQuery, Cloud Storage, Pub/Sub, and many others. This enables a cohesive and unified data architecture, facilitating data ingestion, storage, and analysis across multiple platforms.
In this course, you will learn:
How Datafusion works internally.
What its benefits are.
How to create a Datafusion instance.
How to use Google Cloud Storage as a data input.
How to use BigQuery as a Data Lake (Bronze and Silver layers).
Advanced BigQuery features: partitioned tables and the MERGE command.
How to ingest data from different sources.
How to transform data with Wrangler (low code) and queries.
How to create DAGs for data ETL (Extract, Transform, Load) and dependencies.
How to schedule pipelines and set up inter-DAG dependencies.
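
To give a feel for the MERGE topic listed above, here is a minimal Python sketch of upsert semantics: rows from a staging (Bronze) table update matching rows in a curated (Silver) table, and unmatched rows are inserted. This is not BigQuery's API or the course's own code. The function name `merge_upsert`, the `id` key, and the sample rows are hypothetical, and the sketch assumes each key appears at most once in the source.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Return a new table: matched rows are updated, unmatched source rows are inserted.

    Hypothetical helper that mimics the idea behind a SQL MERGE; it does not
    call BigQuery and assumes `key` is unique within `source`.
    """
    # Index the target by key, copying rows so the input list is not mutated.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)  # like WHEN MATCHED THEN UPDATE
        else:
            merged[row[key]] = dict(row)  # like WHEN NOT MATCHED THEN INSERT
    return list(merged.values())


# Hypothetical sample data: a Silver table and a fresh Bronze load.
silver = [{"id": 1, "city": "Porto Alegre"}, {"id": 2, "city": "Dublin"}]
bronze = [{"id": 2, "city": "Cork"}, {"id": 3, "city": "Lisbon"}]

result = merge_upsert(silver, bronze, key="id")
for row in result:
    print(row)
```

In this example the row with `id` 2 is updated, the row with `id` 3 is inserted, and the row with `id` 1 is left unchanged.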

I'm a self-taught Senior Data Engineer and content creator. In my 30s I moved from working as a machine operator into the data industry. I can help early-career professionals find their path into data, and offer advice to those who want to live abroad and obtain a sponsored visa.
My current stack:
Data Integration / Processing -> Databricks | Dataflow | AWS Lambdas | Datafusion | DataFactory
Automation -> Power Platform | Power Automate | Power Apps
Databases -> Snowflake | Big Query | SQL Server
Data Transformation -> DBT
Versioning / Repository -> Git | Azure DevOps
Programming -> SQL | Python | PySpark
Cloud Providers -> Azure | GCP | AWS
Task / Data Orchestration -> Airflow
BI -> Power BI | Qlik Sense
CI / CD -> Git Lab CI
Containers -> Docker

Udemy

Students take courses primarily to improve job-related skills. Some courses generate credit toward technical certification. Udemy has made a special effort to attract corporate trainers seeking to create coursework for employees of their company.