Medallion (Pull) vs Data Product (Push) architectures

This article summarizes the core arguments of Modern Data 101’s piece, Data Products: A Case Against Medallion Architecture. The original contrasts the traditional Medallion Architecture (Bronze, Silver, Gold) for data lakes with a Data Product approach. It argues that Medallion, while intended to simplify data management, often leads to bottlenecks and quality issues. The authors advocate for Data Products and a “Lakehouse with usable data instead of ALL Data.” Here’s a streamlined breakdown:

Visual Comparison: Pull vs Push Mechanisms

Image illustrating the key differences in data flow between the Medallion (Pull) and Data Product (Push) architectures.

Key Takeaways

Medallion Architecture (Key Issues):
- Enforces a strict, often unnecessary, pipeline structure.
- Data Engineers pull ALL source data without specific context.
- Results in bottlenecks and increased storage/compute costs.

Data Product Architecture (Key Benefits):
- Emphasizes a “push” mechanism, driving business context upstream.
- Enables a lean-pull system, moving only the data needed for specific purposes.
- Productized data is high-quality and governed from the start.

Key Arguments Against Medallion:
- Shifts work and quality responsibility to data consumers.
- Incurs unnecessary data movement and costs.
- Lacks business context until late stages.

Recommendations:
- Focus on a Model-Driven Data Product approach based on business needs.
- Prioritize a “Lakehouse with usable data,” not ALL data.
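
The difference between pulling ALL source data and a lean pull can be made concrete with a small sketch. This is an illustration only, not code from the original article: the `DataProductSpec` class, the `pull_all` and `lean_pull` functions, and the sample order rows are all hypothetical names and values chosen for the example. The idea is that the consumer's business need is declared first (the "push" of context upstream), and only the matching rows and columns are then moved.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical source rows; every name and value below is illustrative.
SOURCE_ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "status": "paid", "debug_blob": "a"},
    {"order_id": 2, "region": "US", "amount": 75.5, "status": "cancelled", "debug_blob": "b"},
    {"order_id": 3, "region": "EU", "amount": 42.0, "status": "paid", "debug_blob": "c"},
]


@dataclass(frozen=True)
class DataProductSpec:
    """Business context declared on the consumer side and pushed upstream."""
    name: str
    columns: tuple[str, ...]
    row_filter: Callable[[dict], bool]


def pull_all(source: list[dict]) -> list[dict]:
    """Medallion-style landing: copy every row and every column as-is."""
    return [dict(row) for row in source]


def lean_pull(source: list[dict], spec: DataProductSpec) -> list[dict]:
    """Move only the rows and columns the data product declares it needs."""
    return [
        {col: row[col] for col in spec.columns}
        for row in source
        if spec.row_filter(row)
    ]


# Assumed use case: revenue from paid EU orders.
eu_revenue = DataProductSpec(
    name="eu_paid_revenue",
    columns=("order_id", "amount"),
    row_filter=lambda r: r["region"] == "EU" and r["status"] == "paid",
)

bronze = pull_all(SOURCE_ORDERS)
product_rows = lean_pull(SOURCE_ORDERS, eu_revenue)

print(len(bronze), "rows x", len(bronze[0]), "columns copied by the full pull")
print(len(product_rows), "rows x", len(product_rows[0]), "columns moved by the lean pull")
```

In a real platform the spec would more likely drive a query or pipeline definition than an in-memory filter; the sketch only shows where the business context enters the flow.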

Medallion Architecture vs. Data Products

Feature	Medallion Architecture	Data Product Architecture
Data Flow	Pull Mechanism: Data is pulled through predefined layers (Bronze, Silver, Gold).	Push Mechanism: Business context is pushed to upstream layers, guiding data integration.
Data Focus	Transformation Stage: Data is categorized based on its refinement level.	Business Need/Use Case: Data is shaped according to specific analytical or operational requirements.
Data Quality	Incremental Refinement: Quality improvements are applied progressively at each layer.	Embedded Quality: Quality controls and governance are enforced from the outset (Shift-Left).
Context	Limited upstream context; business understanding is primarily in the Gold layer.	Business context is embedded as early as possible.
Data Movement	High data movement between layers (ETL processes), leading to redundant copies.	Minimized data movement; purpose-driven data flows based on specific business needs.
Flexibility	Limited flexibility; predefined pathways and batch-based delivery.	High flexibility; diverse consumption patterns (batch, streaming, APIs). Self-service data consumption.
Bottlenecks	Creates bottlenecks at each layer due to dependencies and predefined processes.	Eliminates bottlenecks by shifting responsibility leftward and providing consumer-driven data.
Cost	Higher operational costs due to unnecessary data movement and storage of multiple copies.	Lower operational costs due to reduced data movement, storage, and processing.
Outcome	Data consumers often have to engineer their own transformations and handle complex quality issues.	Consumers interact with high-quality, reliable data that meets their specific needs.
Data Governance	Often complicated, lineage tied to pipeline stages rather than business meaning.	Enforces well-defined SLAs, contracts, and ownership of data from the source.
Key Principle	“Lakehouse with ALL data”	“Lakehouse with usable data” – reducing unnecessary layering with purpose-driven storage and processing
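
The "Embedded Quality" and "Data Governance" rows describe contracts and ownership enforced at the source rather than quality added layer by layer. A minimal sketch of that shift-left idea follows, assuming a very simple contract format. `FieldRule`, `DataContract`, `violations`, and `publish` are hypothetical names invented for this example, and the owner and field names are placeholders; real data contract tooling is considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """One field of the contract: its name, expected type, and nullability."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Illustrative contract: which product, who owns it, what shape it has."""
    product: str
    owner: str
    fields: tuple[FieldRule, ...]


def violations(record: dict, contract: DataContract) -> list[str]:
    """Return the contract violations for one record (empty list if none)."""
    problems = []
    for rule in contract.fields:
        value = record.get(rule.name)
        if value is None:
            if not rule.nullable:
                problems.append(f"{rule.name}: missing or null")
        elif not isinstance(value, rule.dtype):
            problems.append(
                f"{rule.name}: expected {rule.dtype.__name__}, got {type(value).__name__}"
            )
    return problems


def publish(records: list[dict], contract: DataContract):
    """Check records at the source boundary, before anything moves downstream."""
    accepted, rejected = [], []
    for record in records:
        problems = violations(record, contract)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


# Placeholder contract for an assumed revenue data product.
contract = DataContract(
    product="eu_paid_revenue",
    owner="sales-domain-team",
    fields=(
        FieldRule("order_id", int),
        FieldRule("amount", float),
        FieldRule("coupon", str, nullable=True),
    ),
)

incoming = [
    {"order_id": 1, "amount": 120.0, "coupon": None},
    {"order_id": "2", "amount": 75.5},
    {"order_id": 3, "amount": None},
]

accepted, rejected = publish(incoming, contract)
for record, problems in rejected:
    print("rejected", record, "because", problems)
```

Rejecting non-conformant records at the boundary is one possible design choice; quarantining them for the owning team to fix would fit the same principle.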

Taken together, these points show why the source article favors a Data Product approach over the traditional Medallion Architecture.

Source: Modern Data 101 – Data Products: A Case Against Medallion Architecture
