Last week I found an interesting SQL problem posted by Ankit Bansal on LinkedIn, and my solution to the problem using PSQL is as follows:
Q) You are given the travel data for each customer in no particular order. You need to find the start location and end location of the customer.
Input
Sample Output:
Create and Insert Statement:
CREATE TABLE travel_data (
customer VARCHAR(10),
start_loc VARCHAR(50),
end_loc VARCHAR(50)
);
INSERT INTO travel_data (customer, start_loc, end_loc)
VALUES
('c1', 'New York', 'Lima'),
('c1', 'London', 'New York'),
('c1', 'Lima', 'Sao Paulo'),
('c1', 'Sao Paulo', 'New Delhi'),
('c2', 'Mumbai', 'Hyderabad'),
('c2', 'Surat', 'Pune'),
('c2', 'Hyderabad', 'Surat'),
('c3', 'Kochi', 'Kurnool'),
('c3', 'Lucknow', 'Agra'),
('c3', 'Agra', 'Jaipur'),
('c3', 'Jaipur', 'Kochi');
Solution
with t1 as(select
customer,
start_loc
from travel_data
where start_loc not in (select end_loc from travel_data)),
t2 as (select
customer,
end_loc
from travel_data
where end_loc not in (select start_loc from travel_data)),
t3 as (select customer, start_loc from travel_data union
select customer, end_loc from travel_data )
select t1.customer, t1.start_loc, t2.end_loc, count(t3.start_loc)
from t3
inner join t1 on t3.customer=t1.customer
inner join t2 on t3.customer=t2.customer
group by 1,2,3
order by customer
Testing on PgAdmin
References:
https://www.linkedin.com/posts/ankitbansal6_sql-analytics-activity-7162648046789406720-5vs1/
Discover more from Data Engineer Journey
Subscribe to get the latest posts sent to your email.