As you progress in your data journey, whether you’re transitioning from data analyst to data engineer or simply strengthening your data skills, online certifications like IBM Data Warehouse Engineer or Azure Data Engineer Associate will expose you to a wide array of data topics. You don’t need to master every detail, but a working familiarity with these subjects makes it much easier to follow design discussions and pick the right tool for a job.
Note: I’ll be adding more relevant topics as I learn along the way.
Topics:
Topics | Link | Comments | Tags |
---|---|---|---|
Schema on read/write | Read/Write Schema | Schema on write: traditional RDBMS (structure is enforced when data is loaded). Schema on read: data lake (structure is applied when data is queried). | Data |
GDPR | General Data Protection Regulation | Regulation on information privacy in the European Union and the European Economic Area. | Privacy, Security, Governance |
HIPAA | Health Insurance Portability and Accountability Act | US law protecting the privacy and security of health information. The Privacy Act 1988 is largely the Australian counterpart to HIPAA. | Privacy, Security, Governance |
PCI DSS | Payment Card Industry Data Security Standard (PCI DSS) | Global standard mandated by the leading card schemes, including Visa and Mastercard, to reduce the risk of card data breaches. | Privacy, Security, Governance |
Performance Tuning in Talend | Talend Performance Tuning | Identify and eliminate bottlenecks | Talend, ETL |
Slowly Changing Dimension (SCD) | Slowly Changing Dimensions (SCDs) | Type 1: overwrite the old value (no history kept). Type 2: history is added as a new row. Type 3: history is added as a new column. Note: Type 1 overwrites are naturally idempotent; a Type 2 load is idempotent only if it adds a new row when the value has actually changed. | Data |
Change Data Capture (CDC) | Change Data Capture (CDC) | Identifies and captures changes (inserts, updates, deletes) in a source system so they can be delivered downstream in real time or near real time | Data |
Idempotent Data Pipelines | Idempotent Data Pipelines | The ability to execute the same operation multiple times without changing the result. Not idempotent: INSERT INTO without a preceding TRUNCATE, which may store the same data twice. | Data |
Analytics Engineering | Analytics Engineering | | Data |
Serverless Computing | Serverless vs Containers | Cloud provider manages the infrastructure and automatically allocates computing resources as needed to run applications. Examples: Azure Functions, AWS Lambda | Computing |
Container | Containers vs VMs | Containers are portable instances of software, packaged with their dependencies, that run on a physical or virtual machine. | Computing |
Virtual Machines | Containers vs VMs | Virtual machines provide an abstracted version of the entire hardware of a physical machine, including the CPU, memory, and storage. | Computing |
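
To make the idempotency row above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`sales_naive`, `sales`) and the sample batch are made up for illustration. SQLite has no TRUNCATE statement, so `DELETE FROM` stands in for it here; in another database you would likely use TRUNCATE or an upsert instead.

```python
import sqlite3


def load_naive(conn, rows):
    """Plain INSERT INTO: NOT idempotent, every rerun appends the batch again."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO sales_naive (order_id, amount) VALUES (?, ?)", rows
        )


def load_idempotent(conn, rows):
    """Clear the target, then insert, inside one transaction: safe to rerun."""
    with conn:
        conn.execute("DELETE FROM sales")  # stand-in for TRUNCATE
        conn.executemany(
            "INSERT INTO sales (order_id, amount) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_naive (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")

batch = [(1, 10.0), (2, 25.5)]  # hypothetical sample data

# Simulate the same pipeline run being executed twice (e.g. after a retry).
for _ in range(2):
    load_naive(conn, batch)
    load_idempotent(conn, batch)

naive_count = conn.execute("SELECT COUNT(*) FROM sales_naive").fetchone()[0]
idempotent_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(naive_count, idempotent_count)  # 4 2
```

After two runs the naive table holds the batch twice, while the idempotent table still holds it once. A full clear-and-reload only suits small tables; for larger ones the same idea is usually applied per partition or per key.
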
Tools:
Tools | Link | Comments |
---|---|---|
Apache Parquet | Apache Parquet | Column-oriented data file format |
Docker | Docker | Tool for building and running containers. Remember the difference between containers and virtual machines; Docker vs VMware is the analogous comparison. Kubernetes: container orchestration system for automating software deployment, scaling, and management. |
MariaDB | MariaDB | Open-source, community-developed fork of MySQL that stays largely compatible with it while adding its own features |
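
As a rough illustration of what "column-oriented" means in the Parquet row above, the sketch below pivots a few row records into per-column lists using plain Python. This only shows the layout idea; it is not how Parquet encodes data on disk (the real format adds row groups, encodings, and compression), and the sample records and the `to_columns` helper are made up for this example.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "AU", "amount": 10.0},
    {"id": 2, "country": "NZ", "amount": 25.5},
    {"id": 3, "country": "AU", "amount": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field sit next to each other.
columns = to_columns(rows)

# An aggregate over one field only needs to touch that one column.
total = sum(columns["amount"])
print(columns["country"])  # ['AU', 'NZ', 'AU']
print(total)  # 42.75
```

Because analytical queries typically read a few columns across many rows, keeping each column's values together means less data to scan, and similar values stored side by side tend to compress well.
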