In the complex world of database management, ensuring consistency between databases, schemas, and tables is crucial for seamless data operations. This article explores six techniques for identifying and resolving differences within Postgres databases, addressing schema variations, field inconsistencies, and other discrepancies.
1. Checking Schema differences using Schema diff tool
You can use PgAdmin’s in built tool to check schema difference.
Navigation: PgAdmin>Tools>Schema diff
But in SQL SERVER, you will have SQL Server Data Tools (SSDT)
Note: To compare using this method, both source and destination SERVER should have same version of Postgres installed.
Example:
Let’s check the schema difference between two modules; i. e. Production and public, which is given below:
2. Comparing fields between two tables in any schema:
Underlying Problems that might exist in your workspace:
- In my experience during ETL development and data warehousing, I came across several typos or different name conventions for the same fields for some tables which should be same actually across the database.
- PgAdmin does not support/displays multiple queries output like SSMS does, so its hard to compare fields of two tables in a single view.
Benefits:
- Can help to compare fields of two tables in a single view.
- It raises the notice if the fields are consistent or not.
- Reduces the risk of transformation errors by confirming that the fields expected by data integration tool processes are consistently named in both source and target tables.
- Ensures the accuracy of ETL transformations including typos in field names.
How to use it for other tables?
Just rename table schemas and table names you want to compare and run the query from following example, and check the raised output. You can also visualize and compare fields in output.
Example and Query:
Checking whether the fields are consistent or not among following two tables:
- expenserecords
- paymentrecords
The structure of these tables are defined as follows:
Query to compare field differences:
-- Compare field names in Production schema
DO $$
DECLARE
feed_facts_fields text[];
output_facts_fields text[];
BEGIN
-- Get field names for expenserecords
SELECT array_agg(column_name)
INTO feed_facts_fields
FROM information_schema.columns
WHERE table_schema = 'Production'
AND table_name = 'expenserecords';
-- Get field names for paymentrecords
SELECT array_agg(column_name)
INTO output_facts_fields
FROM information_schema.columns
WHERE table_schema = 'Production'
AND table_name = 'paymentrecords';
-- Compare field names
RAISE NOTICE 'Field names in expenserecords: %', feed_facts_fields;
RAISE NOTICE 'Field names in paymentrecords: %', output_facts_fields;
IF feed_facts_fields = output_facts_fields THEN
RAISE NOTICE 'Field names are consistent between the two tables.';
ELSE
RAISE NOTICE 'Field names are inconsistent between the two tables.';
END IF;
END $$;
Query Output in PgAdmin:
Here, we noticed that fields are inconsistent where first table has vendor_name and second table has payee_name. If they are same and you want to make them consistent across the system, you can then modify your ETL job accordingly.
3. Comparing Table Row Counts between tables
SELECT 'your_table_name' AS table_name, COUNT(*) AS row_count
FROM your_schema.your_table
UNION ALL
SELECT 'other_table_name' AS table_name, COUNT(*) AS row_count
FROM other_schema.other_table;
//add UNION ALL and other tables if necessary
4. Compare Views
This query compares views between two databases.
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'your_schema'
EXCEPT
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'other_schema';
5. Checking field difference using SQL joins or using EXCEPT/INTERSECT:
a) Using Traditional Technique (i.e. IN/NOT IN):
Find New Records:
SELECT *
FROM "Database2"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
SELECT "PrimaryKeyColumn"
FROM "Database1"."YourSchema"."YourTable"
);
-- SELECT *
-- FROM "postgres"."Production"."employees"
-- WHERE "employee_id" NOT IN (
-- SELECT "employee_id"
-- FROM "postgres"."public"."employees"
-- );
Find Deleted Records:
-- Find Deleted Records:
SELECT *
FROM "Database1"."YourSchema"."YourTable"
WHERE "PrimaryKeyColumn" NOT IN (
SELECT "PrimaryKeyColumn"
FROM "Database2"."YourSchema"."YourTable"
);
b) Using EXCEPT and INTERSECT
In PostgreSQL, you can leverage the EXCEPT and INTERSECT operators to compare two sets of data effectively. These operators are valuable tools for identifying differences and commonalities between two result sets.
Find New Records:
-- Find New Records:
SELECT * FROM "Database2"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database1"."YourSchema"."YourTable";
Find Deleted Records:
-- Find Deleted Records:
SELECT * FROM "Database1"."YourSchema"."YourTable"
EXCEPT
SELECT * FROM "Database2"."YourSchema"."YourTable";
6. Validating documented Data Dictionary with current status of Data Warehouse using ChatGPT in Turbo Mode
Simply paste tables and fields from the data dictionary and ask ChatGPT to produce a single query whether those fields exists or not in the data warehouse based on the schema and table name, then you will get long query results, which you can run in PgAdmin and see the surprise.
ChatGPT Prompt Sample:
Conclusion:
In conclusion, comparing two databases in Postgres is a routine yet crucial task for database administrators and developers. This article has covered fundamental techniques for identifying changes between databases, allowing users to pinpoint newly added entries, removed records, and other variations. It’s essential to choose the method that best suits your specific needs and the complexity of your data comparison assignment.
References:
https://www.pgadmin.org/docs/pgadmin4/development/schema_diff.html
Discover more from Data Engineer Journey
Subscribe to get the latest posts sent to your email.