Data Lakes with Spark

The course aims to equip participants with the knowledge and skills to build and optimize lakehouses and to perform ETL processes using Spark.

Difficulty: Intermediate

Hours per week: 2.5

Live Sessions: 8

Next Date: Jan 15, 2024

Objective

Requirements

Syllabus

1
Introduction to Data Lakes and Lakehouses

Discover the concepts and benefits of data lakes and lakehouses, explore data lake architecture, and understand key considerations and challenges in implementing data lakes and lakehouses.

2
Apache Spark and Databricks Overview

Gain insights into Apache Spark's role in data engineering, explore the features of the powerful Databricks platform, and learn how to set up and configure Spark and Databricks environments.

3
Data Ingestion

Master the art of extracting data from diverse sources, including databases, files, and streaming data. Learn how to ingest and store data in data lakes using Spark and Databricks, handling structured, semi-structured, and unstructured data.

4
Data Transformation and Processing with Spark

Understand Spark RDDs and DataFrames for efficient data transformations. Apply Spark transformations and actions for data processing and become proficient in using Spark SQL for querying and manipulating data.

5
Schema Evolution and Management in Data Lakes

Navigate schema evolution and versioning challenges in data lakes. Implement schema enforcement strategies, and learn about metadata and data catalog management within data lake environments.

6
Data Engineering Workflows with Databricks

Leverage Databricks notebooks for seamless data engineering tasks, foster collaborative development, and implement version control in Databricks. Master the creation of data pipelines and job scheduling.

7
Data Quality and Governance in Databricks Lakehouse

Ensure data quality and governance with Unity Catalog in data lake and lakehouse environments using Databricks. Stay compliant with data regulations and discover best practices for effective data governance.

8
Data Lakes and Lakehouses Best Practices and Case Studies

Acquire best practices for designing and managing data lakes and lakehouses effectively. Explore real-world case studies demonstrating data engineering success with Spark and Databricks. Stay updated on the latest trends and advancements in data lakes and lakehouses.

Mentor

Mentor to be defined.
Our alumni work in:

Learn all you can. No extra fees, no commissions, no surprises.

We’re a hybrid learning platform with live cohorts. Learn everything you want by acquiring a membership.