Download Parquet Data
Download ski resort datasets in Parquet format and learn how we create them.
How We Create the Data
Built from OpenStreetMap using a multi-step pipeline
Our datasets are built from OpenStreetMap (OSM) using a multi-step pipeline. We start from regional PBF extracts from Geofabrik (continental or country-level OSM data), then process each region through the pipeline described below.
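Fetching an extract is the simplest part. Here is a minimal sketch, assuming the standard Geofabrik URL layout; the region slug and local path are illustrative choices, not values from our config:

```python
# Sketch: download a regional PBF extract from Geofabrik.
# The region slug ("europe") and destination path are illustrative.
import urllib.request

region = "europe"  # e.g. "europe" or "north-america"
url = f"https://download.geofabrik.de/{region}-latest.osm.pbf"
dest = f"{region}-latest.osm.pbf"

print(f"Downloading {url} ...")
urllib.request.urlretrieve(url, dest)  # continental extracts run to tens of GB
print(f"Saved to {dest}")
```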
The Pipeline Steps
From OSM extract to GeoParquet output
- winter_sports – Extract ski areas and winter-sport facilities from OSM (a minimal sketch of this step follows the list)
- osm_nearby – Extract OSM features within ~2 km of each ski area
- lifts and pistes – Extract lift lines and piste (trail) geometries
- enrich – Add boundaries, administrative data, and enrich attributes
- analyze – Compute statistics (trail counts, elevation, area, etc.)
- parquet – Export to GeoParquet format for compact storage and fast reads
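The step implementations live in the repo rather than on this page. As a rough flavor of the winter_sports step, here is a sketch using the pyosmium library to scan a PBF for ways tagged landuse=winter_sports (the usual OSM tagging for ski areas); the handler class and file name are illustrative, not our actual code:

```python
# Sketch of the winter_sports extraction step using pyosmium
# (pip install osmium). Not the repo's actual implementation.
import osmium

class WinterSportsHandler(osmium.SimpleHandler):
    """Collect ways tagged as winter-sports areas (landuse=winter_sports)."""

    def __init__(self):
        super().__init__()
        self.areas = []

    def way(self, w):
        # Keep only ways carrying the winter-sports landuse tag.
        if w.tags.get("landuse") == "winter_sports":
            self.areas.append({
                "osm_id": w.id,
                "name": w.tags.get("name", ""),
            })

handler = WinterSportsHandler()
handler.apply_file("europe-latest.osm.pbf")  # path is illustrative
print(f"Found {len(handler.areas)} winter-sports areas")
```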
Regions & Deployment
Scale by region, merge globally
Regions are defined in config/regions.yaml. Large areas (Europe, North America, Asia) are split into countries, states, or sub-regions so each run stays manageable. After processing, we combine regional outputs into a single global dataset using our combine_regions script.
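The exact schema of config/regions.yaml isn't reproduced here; the sketch below assumes a plausible shape (a top-level regions mapping with a pbf field per entry) purely for illustration:

```python
# Sketch: iterate regions from config/regions.yaml.
# The schema assumed here (regions -> {name: {pbf: ...}}) is
# illustrative, not taken from the repo.
import yaml  # pip install pyyaml

with open("config/regions.yaml") as f:
    config = yaml.safe_load(f)

for name, region in config.get("regions", {}).items():
    pbf = region.get("pbf")  # e.g. a Geofabrik extract path or URL
    print(f"Region {name}: PBF = {pbf}")
```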
The pipeline runs either locally with Docker or on AWS ECS Fargate for continent-wide batch jobs. Full Europe or North America runs take roughly 5–8 hours each.
View globalskiatlas_data on GitHub
Datasets
GeoParquet format — use with Pandas, DuckDB, GeoPandas
Each file has an embedded geometry column per the GeoParquet spec, so it loads directly in the tools above. Download the files below:
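Once you have a file locally, it can be read directly. A minimal sketch, assuming a downloaded file named ski_areas.parquet (actual file names may differ):

```python
# Sketch: two ways to read a downloaded GeoParquet file. The name
# ski_areas.parquet is a placeholder for whichever dataset you grab.
import duckdb
import geopandas as gpd

# GeoPandas decodes the embedded GeoParquet geometry column natively.
gdf = gpd.read_parquet("ski_areas.parquet")
print(gdf.head())

# DuckDB can query the file without loading it all into memory.
con = duckdb.connect()
n = con.sql("SELECT count(*) FROM read_parquet('ski_areas.parquet')").fetchone()[0]
print(f"{n} rows")

# For geometry functions (ST_Area, ST_Intersects, ...) load the
# spatial extension first:
con.install_extension("spatial")
con.load_extension("spatial")
```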
Further Reading
Pipeline docs in the globalskiatlas_data repo
- LOCAL_WORKFLOW.md – Run the pipeline locally with Docker
- RUN_BY_REGION.md – Region layout, PBF sizes, avoiding out-of-memory (OOM) failures
- WORLD_SCALE.md – Roadmap for world-scale data and serving
- AWS_ECS_DEPLOYMENT.md – Deploy to AWS ECS Fargate and S3