Medallion (Pull) vs Data Product (Push) architectures

This article summarizes the core arguments from Modern Data 101’s piece: Data Products: A Case Against Medallion Architecture. The original article contrasts the traditional Medallion Architecture (Bronze, Silver, Gold) for data lakes with a Data Product approach. It argues that Medallion, while intended to simplify data management, often leads to bottlenecks and quality issues. The authors advocate for Data Products and a “Lakehouse with usable data instead of ALL Data.” Here’s a streamlined breakdown:

Visual Comparison: Pull vs Push Mechanisms

Figure: Medallion vs Data Products: Pull vs Push. The image illustrates the key differences in data flow between the Medallion (Pull) and Data Product (Push) architectures.

Key Takeaways

  • Medallion Architecture (Key Issues):
    • Enforces a strict, often unnecessary, pipeline structure.
    • Data Engineers pull ALL source data without specific context.
    • Results in bottlenecks and increased storage/compute costs.
  • Data Product Architecture (Key Benefits):
    • Emphasizes a “push” mechanism, driving business context upstream.
    • Enables a lean-pull system, moving only the data needed for specific purposes.
    • Productized data is high-quality and governed from the start.
  • Key Arguments Against Medallion:
    • Shifts work and quality responsibility to data consumers.
    • Incurs unnecessary data movement and costs.
    • Lacks business context until late stages.
  • Recommendations:
    • Focus on a Model-Driven Data Product approach based on business needs.
    • Prioritize a “Lakehouse with usable data” rather than ALL data.
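The contrast in the takeaways above, pulling ALL source data versus a lean pull driven by business context, can be illustrated with a toy example. This is a minimal, hypothetical sketch, not code from the original article: the order rows, `medallion_pull`, `ProductSpec`, and `data_product_pull` are invented names, and "cells materialized" is only a rough stand-in for storage and compute cost. The Medallion-style function copies every source column into Bronze and Silver and applies business rules only at Gold. The data product function applies a consumer-declared spec up front and materializes a single copy.

```python
from dataclasses import dataclass

# Hypothetical source table: every column the operational system exposes.
SOURCE_ROWS = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0, "currency": "EUR",
     "internal_notes": "call back", "legacy_flag": None},
    {"order_id": 2, "customer_id": "c2", "amount": -5.0, "currency": "EUR",
     "internal_notes": "", "legacy_flag": "x"},
]


def medallion_pull(rows):
    """Pull-style flow: copy ALL columns through Bronze and Silver, shape at Gold.

    Business context (which columns matter, what counts as valid) is applied
    only at the Gold step, so every upstream layer stores everything.
    Returns the Gold rows and the total number of cells materialized.
    """
    bronze = [dict(r) for r in rows]      # raw copy of all source columns
    silver = [dict(r) for r in bronze]    # "cleaned" copy, still every column
    gold = [
        {"customer_id": r["customer_id"], "amount": r["amount"]}
        for r in silver
        if r["amount"] >= 0               # the business rule appears only here
    ]
    cells = sum(len(r) for layer in (bronze, silver, gold) for r in layer)
    return gold, cells


@dataclass(frozen=True)
class ProductSpec:
    """Hypothetical consumer-driven spec, pushed upstream to the producer."""
    columns: tuple
    min_amount: float = 0.0


def data_product_pull(rows, spec):
    """Lean pull: only the rows and columns named by the spec are materialized.

    Returns the product rows and the number of cells materialized (one copy).
    """
    product = [
        {c: r[c] for c in spec.columns}
        for r in rows
        if r["amount"] >= spec.min_amount
    ]
    cells = sum(len(r) for r in product)
    return product, cells


spec = ProductSpec(columns=("customer_id", "amount"))
gold, medallion_cells = medallion_pull(SOURCE_ROWS)
product, product_cells = data_product_pull(SOURCE_ROWS, spec)
print(gold == product, medallion_cells, product_cells)  # True 26 2
```

Both paths deliver the same consumer-facing result, but in this toy case the Medallion-style flow materializes 26 cells across three layers while the data product materializes 2. Real systems differ in many ways (columnar storage, compression, incremental loads), so treat the numbers as an illustration of the argument, not a benchmark.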

Medallion Architecture vs. Data Products

| Feature | Medallion Architecture | Data Product Architecture |
| --- | --- | --- |
| Data Flow | Pull Mechanism: Data is pulled through predefined layers (Bronze, Silver, Gold). | Push Mechanism: Business context is pushed to upstream layers, guiding data integration. |
| Data Focus | Transformation Stage: Data is categorized by its refinement level. | Business Need/Use Case: Data is shaped according to specific analytical or operational requirements. |
| Data Quality | Incremental Refinement: Quality improvements are applied progressively at each layer. | Embedded Quality: Quality controls and governance are enforced from the outset (Shift-Left). |
| Context | Limited upstream context; business understanding lives primarily in the Gold layer. | Business context is embedded as early as possible. |
| Data Movement | High data movement between layers (ETL processes), leading to redundant copies. | Minimized data movement; purpose-driven data flows based on specific business needs. |
| Flexibility | Limited flexibility; predefined pathways and batch-based delivery. | High flexibility; diverse consumption patterns (batch, streaming, APIs) and self-service data consumption. |
| Bottlenecks | Creates bottlenecks at each layer due to dependencies and predefined processes. | Eliminates bottlenecks by shifting responsibility left and delivering consumer-driven data. |
| Cost | Higher operational costs due to unnecessary data movement and storage of multiple copies. | Lower operational costs due to reduced data movement, storage, and processing. |
| Outcome | Data consumers often have to engineer their own transformations and handle complex quality issues. | Consumers work with high-quality, reliable data that meets their specific needs. |
| Data Governance | Often complicated; lineage is tied to pipeline stages rather than business meaning. | Well-defined SLAs, contracts, and data ownership are enforced from the source. |
| Key Principle | “Lakehouse with ALL data” | “Lakehouse with usable data”: less unnecessary layering, with purpose-driven storage and processing. |
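The Data Quality and Data Governance rows describe quality controls, SLAs, contracts, and ownership enforced at the source. As a hedged illustration, not the article's implementation and not any particular data contract standard, the sketch below models a contract as a plain Python dataclass. `DataContract`, `publish`, `ContractViolation`, `non_negative_amount`, the `orders_daily` product, and all field names are hypothetical. The producer validates schema, per-record quality rules, and a freshness SLA before publishing, so bad data is rejected upstream instead of being cleaned later by consumers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


class ContractViolation(Exception):
    """Raised when a producer tries to publish data that breaks its contract."""


@dataclass
class DataContract:
    """Hypothetical contract a producer publishes alongside a data product.

    The fields are illustrative only, not a standard contract schema.
    """
    name: str
    owner: str                   # accountable team, known at the source
    schema: dict                 # column name -> expected Python type
    freshness_sla: timedelta     # max allowed age of the newest record
    timestamp_column: str        # column used for the freshness check
    checks: list = field(default_factory=list)  # per-record quality rules


def publish(records, contract, now):
    """Validate at the producer (shift-left) before consumers ever see the data."""
    if not records:
        raise ContractViolation(f"{contract.name}: nothing to publish")
    for i, rec in enumerate(records):
        if set(rec) != set(contract.schema):
            raise ContractViolation(
                f"{contract.name}: record {i} has columns {sorted(rec)}")
        for col, typ in contract.schema.items():
            if not isinstance(rec[col], typ):
                raise ContractViolation(
                    f"{contract.name}: {col!r} in record {i} is not {typ.__name__}")
        for check in contract.checks:
            if not check(rec):
                raise ContractViolation(
                    f"{contract.name}: record {i} failed {check.__name__}")
    newest = max(rec[contract.timestamp_column] for rec in records)
    if now - newest > contract.freshness_sla:
        raise ContractViolation(f"{contract.name}: newest record breaks the freshness SLA")
    return records


def non_negative_amount(rec):
    """Example business rule owned by the producer."""
    return rec["amount"] >= 0


orders = DataContract(
    name="orders_daily",
    owner="sales-data-team",
    schema={"order_id": int, "amount": float, "updated_at": datetime},
    freshness_sla=timedelta(hours=24),
    timestamp_column="updated_at",
    checks=[non_negative_amount],
)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
good = [{"order_id": 1, "amount": 10.0,
         "updated_at": datetime(2024, 1, 1, 12, tzinfo=timezone.utc)}]
print(publish(good, orders, now) == good)  # True
```

A record with a negative `amount`, an unexpected or missing column, or a newest timestamp older than 24 hours raises `ContractViolation` at publish time. Keeping these checks next to the producer is the design choice being illustrated: the owner named in the contract is the one who fixes the data, rather than each downstream consumer.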

This summary provides a clear comparison of the two architectures, highlighting the benefits of adopting a Data Product approach over the traditional Medallion Architecture.


Discover more from Data Engineer Journey

Subscribe to get the latest posts sent to your email.
