I love keeping track of my learning journey. Earning my fourth certification, Badge 4: Data Lake Workshop, has been exciting. I’ve often heard about Parquet files, and now this course finally covers them. This isn’t just another certification—it’s an adventure into the world of data.
Parquet File
To get started, you’ll be working with the cherry_creek_trail.parquet file.
Download file here:
Creating a Stage on Snowflake to Load Parquet File
You can create a stage using the interface:
or by running these commands:
create database mels_smoothie_challenge_db;
drop schema public;
create schema trails;
create stage trails_parquet;
Creating a File Format
Next, define the file format for your Parquet files using query or interface:
create file format FF_PARQUET
type = PARQUET;
Viewing the Parquet File
To see what’s inside your Parquet file, use this query:
select * from @trails_parquet
(file_format => ff_parquet);
Parsing Data into Columns
Organize your trail data with this query:
select
$1:sequence_1 as point_id,
$1:trail_name::varchar as trail_name,
$1:latitude::number(11,8) as lng,
$1:longitude::number(11,8) as lat
from @trails_parquet
(file_format => ff_parquet)
order by point_id;
Creating a View
Finally, create a view to manage your data:
create view cherry_creek_trial as
select
$1:sequence_1 as point_id,
$1:trail_name::varchar as trail_name,
$1:latitude::number(11,8) as lng,
$1:longitude::number(11,8) as lat
from @trails_parquet
(file_format => ff_parquet)
order by point_id;
Resources:
Course URL: https://learn.snowflake.com/en/courses/uni-ess-dlkw/
Map Plotting Tools
Here are some useful tools for plotting your data on a map:
- OpenStreetMap: https://www.openstreetmap.org/#map=4/-28.15/133.28
- WKT Playground: https://clydedacruz.github.io/openstreetmap-wkt-playground/#
- GeoJSON.io : https://geojson.io/#map=12.3/39.7055/-104.84319
Discover more from Data Engineer Journey
Subscribe to get the latest posts sent to your email.