Top 6 Techniques for Identifying and Resolving Database Differences (Postgres)

In the complex world of database management, ensuring consistency between databases, schemas, and tables is crucial for seamless data operations. This article explores six techniques for identifying and resolving differences within Postgres databases, addressing schema variations, field inconsistencies, and other discrepancies.

1. Checking Schema differences using Schema diff tool

You can use PgAdmin’s in built tool to check schema difference.

Navigation: PgAdmin>Tools>Schema diff

But in SQL SERVER, you will have SQL Server Data Tools (SSDT)

Note: To compare using this method, both source and destination SERVER should have same version of Postgres installed.

Example:

Let’s check the schema difference between two modules; i. e. Production and public, which is given below:

2. Comparing fields between two tables in any schema:

Underlying Problems that might exist in your workspace:

In my experience during ETL development and data warehousing, I came across several typos or different name conventions for the same fields for some tables which should be same actually across the database.
PgAdmin does not support/displays multiple queries output like SSMS does, so its hard to compare fields of two tables in a single view.

Benefits:

Can help to compare fields of two tables in a single view.
It raises the notice if the fields are consistent or not.
Reduces the risk of transformation errors by confirming that the fields expected by data integration tool processes are consistently named in both source and target tables.
Ensures the accuracy of ETL transformations including typos in field names.

How to use it for other tables?

Just rename table schemas and table names you want to compare and run the query from following example, and check the raised output. You can also visualize and compare fields in output.

Example and Query:

Checking whether the fields are consistent or not among following two tables:

expenserecords
paymentrecords

The structure of these tables are defined as follows:

Query to compare field differences:

-- Compare field names in Production schema
DO $$
DECLARE
    feed_facts_fields text[];
    output_facts_fields text[];
BEGIN
    -- Get field names for expenserecords
    SELECT array_agg(column_name)
    INTO feed_facts_fields
    FROM information_schema.columns
    WHERE table_schema = 'Production'
      AND table_name = 'expenserecords';

    -- Get field names for paymentrecords
    SELECT array_agg(column_name)
    INTO output_facts_fields
    FROM information_schema.columns
    WHERE table_schema = 'Production'
      AND table_name = 'paymentrecords';

    -- Compare field names
    RAISE NOTICE 'Field names in expenserecords: %', feed_facts_fields;
    RAISE NOTICE 'Field names in paymentrecords: %', output_facts_fields;

    IF feed_facts_fields = output_facts_fields THEN
        RAISE NOTICE 'Field names are consistent between the two tables.';
    ELSE
        RAISE NOTICE 'Field names are inconsistent between the two tables.';
    END IF;

END $$;

Query Output in PgAdmin:

Here, we noticed that fields are inconsistent where first table has vendor_name and second table has payee_name. If they are same and you want to make them consistent across the system, you can then modify your ETL job accordingly.

3. Comparing Table Row Counts between tables

SELECT 'your_table_name' AS table_name, COUNT(*) AS row_count
FROM your_schema.your_table
UNION ALL
SELECT 'other_table_name' AS table_name, COUNT(*) AS row_count
FROM other_schema.other_table;
//add UNION ALL and other tables if necessary

4. Compare Views

This query compares views between two databases.

SELECT table_name
FROM information_schema.views
WHERE table_schema = 'your_schema'
EXCEPT
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'other_schema';

5. Checking field difference using SQL joins or using EXCEPT/INTERSECT:

a) Using Traditional Technique (i.e. IN/NOT IN):

Find New Records:

SELECT *
FROM "Database2"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
    SELECT "PrimaryKeyColumn"
    FROM "Database1"."YourSchema"."YourTable"
);

-- SELECT *
-- FROM "postgres"."Production"."employees"
-- WHERE "employee_id" NOT IN (
--     SELECT "employee_id"
--     FROM "postgres"."public"."employees"
-- );

Find Deleted Records:

-- Find Deleted Records:
SELECT *
FROM "Database1"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
    SELECT "PrimaryKeyColumn"
    FROM "Database2"."YourSchema"."YourTable"
);

b) Using EXCEPT and INTERSECT

In PostgreSQL, you can leverage the EXCEPT and INTERSECT operators to compare two sets of data effectively. These operators are valuable tools for identifying differences and commonalities between two result sets.

Find New Records:

-- Find New Records:
SELECT * FROM "Database2"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database1"."YourSchema"."YourTable";

Find Deleted Records:

-- Find Deleted Records:
SELECT * FROM "Database1"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database2"."YourSchema"."YourTable";

6. Validating documented Data Dictionary with current status of Data Warehouse using ChatGPT in Turbo Mode

Simply paste tables and fields from the data dictionary and ask ChatGPT to produce a single query whether those fields exists or not in the data warehouse based on the schema and table name, then you will get long query results, which you can run in PgAdmin and see the surprise.

ChatGPT Prompt Sample:

Conclusion:

In conclusion, comparing two databases in Postgres is a routine yet crucial task for database administrators and developers. This article has covered fundamental techniques for identifying changes between databases, allowing users to pinpoint newly added entries, removed records, and other variations. It’s essential to choose the method that best suits your specific needs and the complexity of your data comparison assignment.

References:

https://www.pgadmin.org/docs/pgadmin4/development/schema_diff.html

Stored Procedure with Parameters

Here’s how to write a stored procedure that accepts schema name, table 1 name, and table 2 name as parameters and compares their field names:

CREATE OR REPLACE PROCEDURE compare_table_fields(
    schema_name TEXT,
    table1_name TEXT,
    table2_name TEXT
)
LANGUAGE plpgsql
AS $$
DECLARE
    table1_fields TEXT[];
    table2_fields TEXT[];
BEGIN
    -- Get fields for first table using dynamic SQL
    EXECUTE format(
        'SELECT array_agg(column_name ORDER BY ordinal_position)
         FROM information_schema.columns
         WHERE table_schema = %L AND table_name = %L',
         schema_name, table1_name)
    INTO table1_fields;

    -- Get fields for second table
    EXECUTE format(
        'SELECT array_agg(column_name ORDER BY ordinal_position)
         FROM information_schema.columns
         WHERE table_schema = %L AND table_name = %L',
         schema_name, table2_name)
    INTO table2_fields;

    -- Print and compare
    RAISE NOTICE 'Fields in %.%: %', schema_name, table1_name, table1_fields;
    RAISE NOTICE 'Fields in %.%: %', schema_name, table2_name, table2_fields;

    IF table1_fields = table2_fields THEN
        RAISE NOTICE '✅ Field names are consistent.';
    ELSE
        RAISE NOTICE '❌ Field names are NOT consistent.';
    END IF;
END;
$$;

✅ How to Call It

CALL compare_table_fields('Production', 'expenserecords', 'paymentrecords');

Discover more from Data Engineer Journey

Subscribe to get the latest posts sent to your email.

Top 6 Techniques for Identifying and Resolving Database Differences (Postgres)

1. Checking Schema differences using Schema diff tool

2. Comparing fields between two tables in any schema:

3. Comparing Table Row Counts between tables

4. Compare Views

5. Checking field difference using SQL joins or using EXCEPT/INTERSECT:

a) Using Traditional Technique (i.e. IN/NOT IN):

b) Using EXCEPT and INTERSECT

6. Validating documented Data Dictionary with current status of Data Warehouse using ChatGPT in Turbo Mode

Conclusion:

References:

Stored Procedure with Parameters

✅ How to Call It

Like this:

Related

Discover more from Data Engineer Journey

Leave a Comment Cancel Reply

1. Checking Schema differences using Schema diff tool

2. Comparing fields between two tables in any schema:

3. Comparing Table Row Counts between tables

4. Compare Views

5. Checking field difference using SQL joins or using EXCEPT/INTERSECT:

a) Using Traditional Technique (i.e. IN/NOT IN):

b) Using EXCEPT and INTERSECT

6. Validating documented Data Dictionary with current status of Data Warehouse using ChatGPT in Turbo Mode

Conclusion:

References:

Stored Procedure with Parameters

✅ How to Call It

Share this:

Like this:

Related

Discover more from Data Engineer Journey

Leave a Comment Cancel Reply

Discover more from Data Engineer Journey