Download Parquet Data
Download ski resort datasets in Parquet format and learn how we create them.
How We Create the Data
Built from OpenStreetMap using a multi-step pipeline
Our datasets are built from OpenStreetMap (OSM) using a multi-step pipeline. We start from regional PBF extracts from Geofabrik (continental or country-level OSM data), then process each region through the pipeline described below.
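Fetching an extract is the simplest part. Here is a minimal sketch, assuming the standard Geofabrik URL layout; the region slug and local path are illustrative choices, not values from our config:

```python
# Sketch: download a regional PBF extract from Geofabrik.
# The region slug ("europe") and destination path are illustrative.
import urllib.request

region = "europe"  # e.g. "europe" or "north-america"
url = f"https://download.geofabrik.de/{region}-latest.osm.pbf"
dest = f"{region}-latest.osm.pbf"

print(f"Downloading {url} ...")
urllib.request.urlretrieve(url, dest)  # continental extracts run to tens of GB
print(f"Saved to {dest}")
```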
The Pipeline Steps
From OSM extract to GeoParquet output
- winter_sports – Extract ski areas and winter-sport facilities from OSM (a minimal sketch of this step follows the list)
- osm_nearby – Extract OSM features within ~2 km of each ski area
- lifts and pistes – Extract lift lines and piste (trail) geometries
- enrich – Add boundaries, administrative data, and enrich attributes
- analyze – Compute statistics (trail counts, elevation, area, etc.)
- parquet – Export to GeoParquet format for compact storage and fast reads
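The step implementations live in the repo rather than on this page. As a rough flavor of the winter_sports step, here is a sketch using the pyosmium library to scan a PBF for ways tagged landuse=winter_sports (the usual OSM tagging for ski areas); the handler class and file name are illustrative, not our actual code:

```python
# Sketch of the winter_sports extraction step using pyosmium
# (pip install osmium). Not the repo's actual implementation.
import osmium

class WinterSportsHandler(osmium.SimpleHandler):
    """Collect ways tagged as winter-sports areas (landuse=winter_sports)."""

    def __init__(self):
        super().__init__()
        self.areas = []

    def way(self, w):
        # Keep only ways carrying the winter-sports landuse tag.
        if w.tags.get("landuse") == "winter_sports":
            self.areas.append({
                "osm_id": w.id,
                "name": w.tags.get("name", ""),
            })

handler = WinterSportsHandler()
handler.apply_file("europe-latest.osm.pbf")  # path is illustrative
print(f"Found {len(handler.areas)} winter-sports areas")
```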
Regions & Deployment
Scale by region, merge globally
Regions are defined in config/regions.yaml. Large areas (Europe, North America, Asia) are split into countries, states, or sub-regions so each run stays manageable. After processing, we combine regional outputs into a single global dataset using our combine_regions script.
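The exact schema of config/regions.yaml isn't reproduced here; the sketch below assumes a plausible shape (a top-level regions mapping with a pbf field per entry) purely for illustration:

```python
# Sketch: iterate regions from config/regions.yaml.
# The schema assumed here (regions -> {name: {pbf: ...}}) is
# illustrative, not taken from the repo.
import yaml  # pip install pyyaml

with open("config/regions.yaml") as f:
    config = yaml.safe_load(f)

for name, region in config.get("regions", {}).items():
    pbf = region.get("pbf")  # e.g. a Geofabrik extract path or URL
    print(f"Region {name}: PBF = {pbf}")
```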
The pipeline runs either locally with Docker or on AWS ECS Fargate for continent-wide batch jobs. Full Europe or North America runs take roughly 5–8 hours each.
View globalskiatlas_data on GitHub
Datasets
GeoParquet format — use with Pandas, DuckDB, GeoPandas
Each file has an embedded geometry column per the GeoParquet spec, so it loads directly in the tools above. Download the files below:
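Once you have a file locally, it can be read directly. A minimal sketch, assuming a downloaded file named ski_areas.parquet (actual file names may differ):

```python
# Sketch: two ways to read a downloaded GeoParquet file. The name
# ski_areas.parquet is a placeholder for whichever dataset you grab.
import duckdb
import geopandas as gpd

# GeoPandas decodes the embedded GeoParquet geometry column natively.
gdf = gpd.read_parquet("ski_areas.parquet")
print(gdf.head())

# DuckDB can query the file without loading it all into memory.
con = duckdb.connect()
n = con.sql("SELECT count(*) FROM read_parquet('ski_areas.parquet')").fetchone()[0]
print(f"{n} rows")

# For geometry functions (ST_Area, ST_Intersects, ...) load the
# spatial extension first:
con.install_extension("spatial")
con.load_extension("spatial")
```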
Further Reading
Pipeline docs in the globalskiatlas_data repo
- LOCAL_WORKFLOW.md – Run the pipeline locally with Docker
- RUN_BY_REGION.md – Region layout, PBF sizes, avoiding out-of-memory (OOM) failures
- WORLD_SCALE.md – Roadmap for world-scale data and serving
- AWS_ECS_DEPLOYMENT.md – Deploy to AWS ECS Fargate and S3