
Simplifying Data Engineering and Analytics with Delta: Create analytics-ready data that fuels artificial intelligence and business intelligence
Publication: Packt Publishing
Delta helps you generate reliable insights at scale and simplifies the architecture around data pipelines, freeing you to focus on refining your use cases rather than on plumbing. This is especially important when you consider that existing architecture is frequently reused for new use cases.
In this book, you'll learn about the principles of distributed computing, data modeling techniques, and big data design patterns and templates that help solve end-to-end data flow problems for common scenarios and are reusable across use cases and industry verticals. You'll also learn how to recover from errors, along with best practices for handling structured, semi-structured, and unstructured data using Delta. After that, you'll get to grips with features such as ACID transactions on big data, disciplined schema evolution, time travel to rewind a dataset to an earlier point in time or version, and unified batch and streaming capabilities that will help you build agile and robust data products.
By the end of this Delta book, you'll be able to use Delta as the foundational block for creating analytics-ready data that fuels all AI/BI use cases.
Reviews
"Technology investments require financial justification. This book speaks succinctly about very complex technical topics, makes them easy to understand at any level, and connects the technology to why it matters to the business. The opportunity to apply powerful technology such as Delta and deliver impact all the way to the boardroom of your employer is real and required for success in today's market."
--Doug May, VP, Global Value Acceleration, Databricks, Inc.
"Anindita has written the definitive book on data engineering and the data lakehouse. The path to a viable technology platform for global-scale enterprise digital transformation starts with open-core software such as Delta Lake. If you are looking for a single guide to help you understand the crucial big data and advanced analytics work needed to fully achieve the goals of digital transformation, this is it."
--Brad Nicholas, Director, Digital Platforms, IT Emerging Technology at Corning Incorporated
"If you are a data engineer, or you aspire to be one, you need to read this book to understand how to leverage the most popular lakehouse format, Delta Lake. Anindita has written the definitive book on how best to leverage Delta Lake. Delta Lake has established itself as the leading open source lakehouse layer, and Anindita does an excellent job of explaining why that is and how it works, as well as giving patterns for implementation. She is careful to document the trade-offs when designing various aspects of your architecture and gives recommendations on how best to future-proof it. She bridges theory and implementation by providing excellent prescriptive examples (with links to GitHub). She explains how Delta Lake can be used to replace a traditional data warehouse, but also how it can excel with machine learning, particularly as the storage layer for a feature store. If you have an interest in Delta Lake, or even just lakehouse layers in general, this is one book you need on your shelf!"
--Jason Pohl, Director of Data Management at Databricks
About the Author
Anindita Mahapatra is a Solutions Architect at Databricks in the data and AI space, helping clients across all industry verticals reap value from their data infrastructure investments.
She teaches a data engineering and analytics course at Harvard University as part of its extension school program.
She has extensive big data and Hadoop consulting experience from Think Big/Teradata; prior to that, she managed the development of algorithmic app discovery and promotion for both the Nokia and Microsoft app stores.
She holds a Master's degree in Liberal Arts and Management from Harvard Extension School, a Master's in Computer Science from Boston University, and a Bachelor's in Computer Science from BITS Pilani, India.
What you will learn
- Explore the key challenges of traditional data lakes
- Appreciate the unique features that Delta provides out of the box
- Address reliability, performance, and governance concerns using Delta
- Analyze the open data format for an extensible and pluggable architecture
- Handle multiple use cases to support BI, AI, streaming, and data discovery
- Discover how common data and machine learning design patterns are executed on Delta
- Build and deploy data and machine learning pipelines at scale using Delta
Who this book is for
Data engineers, data scientists, ML practitioners, BI analysts, and anyone in the data domain working with big data will be able to put their knowledge to work with this practical guide to executing pipelines and supporting diverse use cases using the Delta protocol. Basic knowledge of SQL, Python programming, and Spark is required to get the most out of this book.
Table of Contents
- An Introduction to Data Engineering
- Data Modeling and ETL
- Delta: The Foundation Block for Big Data
- Unifying Batch and Streaming with Delta
- Data Consolidation in Delta Lake
- Solving Common Data Pattern Scenarios with Delta
- Delta for Data Warehouse Use Cases
- Handling Atypical Data Scenarios with Delta
- Delta for Reproducible Machine Learning Pipelines
- Delta for Data Products and Services
- Operationalizing Data and ML Pipelines
- Optimizing Cost and Performance with Delta
- Managing Your Data Journey