Data Glossary


As you progress in your data journey, whether you're moving from data analyst to data engineer or simply building your data skills, online certifications such as IBM Data Warehouse Engineer or Azure Data Engineer Associate will expose you to a wide range of data topics. You may not need to master every detail, but staying current on these subjects will strengthen your expertise.

Note: I’ll be adding more relevant topics as I learn along the way.

Topics:

Topics | Link | Comments | Tags
Schema on read/write | Read/Write Schema | Schema on write: traditional RDBMS (structure is enforced when data is loaded). Schema on read: data lake (structure is applied when data is queried). | Data
GDPR | General Data Protection Regulation | Regulation on information privacy in the European Union and the European Economic Area. | Privacy, Security, Governance
HIPAA | Health Insurance Portability and Accountability Act | US law protecting health information. The Privacy Act 1988 is largely the Australian counterpart to HIPAA. | Privacy, Security, Governance
PCI DSS | Payment Card Industry Data Security Standard (PCI DSS) | Global standard mandated by the leading card schemes, including Visa and Mastercard, to reduce the risk of card data breaches. | Privacy, Security, Governance
Performance Tuning in Talend | Talend Performance Tuning | Identify and eliminate bottlenecks. | Talend, ETL
Slowly Changing Dimension (SCD) | Slowly Changing Dimensions (SCDs) | Type 1: overwrite the changes. Type 2: history is added as a new row. Type 3: history is added as a new column. Note: Type 1 overwrites are idempotent by nature; a Type 2 load is idempotent only if it adds a row solely when the tracked value has actually changed. | Data
Change Data Capture (CDC) | Change Data Capture (CDC) | Identifying changes (inserts, updates, deletes) in a source system and delivering them downstream in real time or near real time. | Data
Idempotent Data Pipelines | Idempotent Data Pipelines | The ability to execute the same operation multiple times and get the same result as executing it once. Not idempotent: INSERT INTO without a preceding TRUNCATE, which may store the same data twice. | Data
Analytics Engineering | Analytics Engineering | | Data
Serverless Computing | Serverless vs Containers | The cloud provider manages the infrastructure and automatically allocates computing resources as needed to run applications. Examples: Azure Functions, AWS Lambda. | Computing
Container | Containers vs VMs | A container is a portable instance of software, packaged with its dependencies, that runs on a physical or virtual machine. | Computing
Virtual Machines | Containers vs VMs | A virtual machine provides an abstracted version of the entire hardware of a physical machine, including the CPU, memory, and storage. | Computing
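
The idempotency entry in the table can be made concrete with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names, column names, and sample batch are hypothetical and chosen only for illustration. SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it here.

```python
import sqlite3

# Hypothetical source batch; names and values are illustrative only.
BATCH = [(1, "alice"), (2, "bob")]


def load_append(conn, rows):
    """Not idempotent: every run appends the batch again."""
    conn.executemany("INSERT INTO customers_append (id, name) VALUES (?, ?)", rows)
    conn.commit()


def load_replace(conn, rows):
    """Idempotent: clear the target, then insert, inside one transaction."""
    with conn:  # commits on success, rolls back on error
        # SQLite has no TRUNCATE; an unqualified DELETE plays that role.
        conn.execute("DELETE FROM customers_replace")
        conn.executemany("INSERT INTO customers_replace (id, name) VALUES (?, ?)", rows)


def row_count(conn, table):
    """Return the number of rows in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_append (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE customers_replace (id INTEGER, name TEXT)")

# Simulate the same pipeline run executing twice, for example after a retry.
for _ in range(2):
    load_append(conn, BATCH)
    load_replace(conn, BATCH)

append_count = row_count(conn, "customers_append")    # grows with every run
replace_count = row_count(conn, "customers_replace")  # stays at the batch size
print(append_count, replace_count)
```

After two runs the append-only table holds four rows while the clear-then-insert table still holds two, which is the duplication risk the table describes. Other designs, such as upserts keyed on a primary key, can reach the same goal; this sketch shows only one option.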

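The three SCD types listed in the table can be sketched in plain Python. This is a minimal illustration rather than a production pattern: the column names (customer_id, city, valid_from, valid_to, previous_city) and the function names are assumptions made for the example, and a real warehouse would usually do this in SQL or an ETL tool.

```python
from datetime import date


def apply_type1(row, new_city):
    """Type 1: overwrite the value; no history is kept."""
    return {**row, "city": new_city}


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new one.

    Nothing happens when the current value already equals new_city,
    so running the same change twice does not add a duplicate row.
    """
    result = []
    changed = False
    for row in rows:
        is_current = row["customer_id"] == customer_id and row["valid_to"] is None
        if is_current and row["city"] != new_city:
            result.append({**row, "valid_to": change_date})  # close the old version
            changed = True
        else:
            result.append(row)
    if changed:
        result.append({"customer_id": customer_id, "city": new_city,
                       "valid_from": change_date, "valid_to": None})
    return result


def apply_type3(row, new_city):
    """Type 3: keep the previous value in an extra column."""
    if row["city"] == new_city:
        return row
    return {**row, "previous_city": row["city"], "city": new_city}


# Hypothetical dimension row for one customer.
current = {"customer_id": 1, "city": "Sydney",
           "valid_from": date(2020, 1, 1), "valid_to": None}

type1 = apply_type1({"customer_id": 1, "city": "Sydney"}, "Melbourne")
history = apply_type2([current], 1, "Melbourne", date(2024, 6, 1))
history_again = apply_type2(history, 1, "Melbourne", date(2024, 6, 1))
type3 = apply_type3({"customer_id": 1, "city": "Sydney"}, "Melbourne")

print(len(history), history_again == history)
```

The Type 2 function only adds a row when the tracked value differs from the current one, which is what keeps a repeated run from duplicating history. A version that appended a row unconditionally would not have that property.
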
Tools

Tools:

Tools | Link | Comments
Apache Parquet | Apache Parquet | Column-oriented data file format.
Docker | Docker | Tool for containers. Remember: the difference between containers and virtual machines; similarly, Docker vs VMware. Kubernetes: container orchestration system for automating software deployment, scaling, and management.
MariaDB | MariaDB | Fork of MySQL with advanced features.
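
To illustrate what "column-oriented" means for a format like Parquet, here is a conceptual sketch in plain Python. It is not Parquet's actual file layout or API (real Parquet files add row groups, encodings, and compression, and are normally read and written through a library); the records and the to_columns helper are hypothetical.

```python
# Hypothetical records in row-oriented form: one dict per row.
rows = [
    {"id": 1, "country": "AU", "amount": 10.0},
    {"id": 2, "country": "NZ", "amount": 25.5},
    {"id": 3, "country": "AU", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only needs that column's values,
# not every field of every row.
total_amount = sum(columns["amount"])

print(columns["country"])
print(total_amount)
```

Storing values column by column is what lets analytical queries read only the fields they need, and values of the same type sitting together tend to compress well.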
