Add Generation Script for londonTubeLines.json Dataset #667

Open
dsmedia opened this issue Jan 18, 2025 · 0 comments

@dsmedia
Collaborator

dsmedia commented Jan 18, 2025

Add Generation Script for londonTubeLines.json Dataset

The londonTubeLines.json dataset, showcased in this example:

  1. has a complex lineage
  2. lacks a generation script

Given the significant community interest in geospatial visualization, maintaining reproducible geographic datasets seems to be a worthwhile priority. Adding a script to the repo that generates (or updates) londonTubeLines.json from its original source, which is believed to be OpenStreetMap, would help secure the dataset's long-term viability. Input from those with geospatial data expertise would be welcome.

Background and Current Status

As I understand it, the londonTubeLines.json dataset is a TopoJSON file representing selected London Underground rail lines. It appears to have been added to the repository in this commit. The dataset's description, sources, and license are currently being expanded in pull request #663.

The commit history and related documentation suggest the following lineage:

  1. Original Source (Likely OpenStreetMap): The data was likely originally sourced from OpenStreetMap, although a direct link could not be found.
  2. Intermediate Source 1 (oobrien/vis): User @oobrien appears to have processed the data into a simplified GeoJSON format, tfl_lines.json. The file can be found in this commit of the oobrien/vis repository, which cites OpenStreetMap. This file represents a simplified view of London transport lines from the original source.
  3. Intermediate Source 2 (gicentre/litvis): @jwoLondon documented the process of converting tfl_lines.json to a TopoJSON file (similar to londonTubeLines.json) in this tutorial. This involved filtering specific lines and mapping properties using ndjson-cli and topojson. When I attempted to follow the instructions (quoted below, with code), I wasn't quite able to match this repo's version. This code also still relies on an intermediate source rather than the original source.

topoJSON files are not limited to areal units. Here, for example, we can import a file containing the geographical routes of selected London Underground tube lines. The conversion of tfl_lines.json follows a similar pattern to the conversion of the borough boundary files, but with some minor differences:

  • The file is already in unprojected geoJSON format so does not need reprojecting or conversion from a shapefile.
  • ndjson-cat converts the original geoJSON file to a single line necessary for further processing.
  • the file contains details of more rail lines than we need to map so ndjson-filter is used with a regular expression to select data for tube and DLR lines only.
  • the property we will use for the id (the tube line name) is inside the first element of an array so we reference it with [0] (where there is more than one element in the array it indicates more than one named tube line shares the same physical line).
ndjson-cat < tfl_lines.json \
  | ndjson-split 'd.features' \
  | ndjson-filter 'd.properties.lines[0].name.match("Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*")' \
  | ndjson-map 'd.id = d.properties.lines[0].name,delete d.properties,d' \
  | geo2topo -n -q 1e4 line="-" \
  > londonTubeLines.json
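
To make the filter and map steps above concrete, here is a minimal JavaScript sketch of what the two expressions do to each feature. The sample features are hypothetical and only mimic the `properties.lines[...].name` shape used in the commands; the real tfl_lines.json may contain different names and fields.

```javascript
// Hypothetical sample features (not taken from tfl_lines.json).
const features = [
  { type: "Feature", properties: { lines: [{ name: "Circle" }, { name: "District" }] }, geometry: null },
  { type: "Feature", properties: { lines: [{ name: "London Overground" }, { name: "Bakerloo" }] }, geometry: null },
];

// Same pattern string as in the ndjson-filter step. String.prototype.match
// turns a string argument into an unanchored, case-sensitive RegExp.
const pattern = "Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*";

// ndjson-filter keeps a feature when the expression is truthy.
// Only the FIRST entry of properties.lines is inspected.
const kept = features.filter((d) => d.properties.lines[0].name.match(pattern));

// ndjson-map: the comma expression assigns the id, deletes the properties,
// and evaluates to the modified feature d.
const mapped = kept.map((d) => (d.id = d.properties.lines[0].name, delete d.properties, d));

console.log(mapped.map((d) => d.id)); // [ 'Circle' ]
```

In this sketch the second feature is dropped even though Bakerloo shares the segment, because only `lines[0]` is tested. Whether such segments occur in the real file would need to be checked against the data.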

An initial attempt was made to create a generation script using @oobrien's tfl_lines.json as a starting point. The script used ndjson-cli, topojson, and d3-geo-centroid, but its output did not exactly match the existing londonTubeLines.json in vega-datasets.
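
One possible way to narrow down where a regenerated file diverges from the existing one is to compare geometry counts and ids. The sketch below is an assumption-laden helper, not part of any existing tooling: `geometryIds` and `diffIds` are hypothetical names, the object name `line` is assumed from the `line=` argument passed to geo2topo, and the two small topologies are hand-made stand-ins rather than real data.

```javascript
// Collect the geometry ids from one named object of a parsed TopoJSON topology.
function geometryIds(topology, objectName) {
  const object = topology.objects[objectName];
  if (!object) throw new Error(`no object named "${objectName}"`);
  const geometries = object.type === "GeometryCollection" ? object.geometries : [object];
  return geometries.map((g) => g.id);
}

// Report geometry counts and the ids that appear in only one of the two topologies.
function diffIds(a, b, objectName = "line") {
  const idsA = geometryIds(a, objectName);
  const idsB = geometryIds(b, objectName);
  const setA = new Set(idsA);
  const setB = new Set(idsB);
  return {
    countA: idsA.length,
    countB: idsB.length,
    onlyInA: [...setA].filter((id) => !setB.has(id)),
    onlyInB: [...setB].filter((id) => !setA.has(id)),
  };
}

// Hand-made topologies (hypothetical) that only illustrate the output shape.
const existing = {
  type: "Topology", arcs: [],
  objects: { line: { type: "GeometryCollection", geometries: [
    { type: "LineString", arcs: [0], id: "Bakerloo" },
    { type: "LineString", arcs: [1], id: "Central" },
  ] } },
};
const regenerated = {
  type: "Topology", arcs: [],
  objects: { line: { type: "GeometryCollection", geometries: [
    { type: "LineString", arcs: [0], id: "Bakerloo" },
    { type: "LineString", arcs: [1], id: "Circle" },
    { type: "LineString", arcs: [2], id: "Circle" },
  ] } },
};

console.log(diffIds(existing, regenerated));
```

To run this on real files, each topology could be loaded with `JSON.parse` on the file contents. Matching ids would not prove the geometries are identical, so this is only a first check.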

1. Setup Commands
npm install -g shapefile ndjson-cli topojson d3-geo-centroid
apt-get install gdal-bin

wget https://raw.githubusercontent.com/oobrien/vis/master/tubecreature/data/tfl_lines.json

ndjson-cat tfl_lines.json \
  | ndjson-split 'd.features' \
  | ndjson-filter 'd.properties.lines.some((l) => l.name == "DLR" || l.name == "Bakerloo" || l.name == "District" || l.name == "Piccadilly" || l.name == "Northern" || l.name == "Hammersmith & City" || l.name == "Jubilee" || l.name == "Circle" || l.name == "Waterloo & City" || l.name == "Victoria" || l.name == "Metropolitan" || l.name == "Central") && !d.properties.lines.some((l) => l.name == "London Overground")' \
  | ndjson-map 'd.id = d.properties.lines[0].name + (d.id ? "_" + d.id : ""), d' \
  > tfl_lines_filtered.ndjson

geo2topo -n -q 1e4 line=tfl_lines_filtered.ndjson > londonTubeLines.json
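
The two filters in this issue are not equivalent, which may account for part of the mismatch. The sketch below restates both predicates in plain JavaScript and applies them to hypothetical features. The sample data and the name "Tramlink" are illustrative assumptions, and the exact-name check is restructured into a `Set` lookup rather than copied verbatim.

```javascript
// Pattern from the tutorial's ndjson-filter step (tests only lines[0].name).
const pattern = "Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*";
const byFirstNameRegex = (d) => d.properties.lines[0].name.match(pattern) !== null;

// Names from the attempted script's filter (tests every entry in lines).
const tubeNames = new Set([
  "DLR", "Bakerloo", "District", "Piccadilly", "Northern", "Hammersmith & City",
  "Jubilee", "Circle", "Waterloo & City", "Victoria", "Metropolitan", "Central",
]);
const byAnyExactName = (d) =>
  d.properties.lines.some((l) => tubeNames.has(l.name)) &&
  !d.properties.lines.some((l) => l.name === "London Overground");

// Hypothetical features chosen to exercise each case.
const samples = [
  { id: "a", properties: { lines: [{ name: "District" }] } },
  { id: "b", properties: { lines: [{ name: "Tramlink" }, { name: "District" }] } },
  { id: "c", properties: { lines: [{ name: "Bakerloo" }, { name: "London Overground" }] } },
  { id: "d", properties: { lines: [{ name: "London Overground" }] } },
];

// Features on which the two filters disagree.
const disagreements = samples
  .filter((d) => byFirstNameRegex(d) !== byAnyExactName(d))
  .map((d) => d.id);

console.log(disagreements); // [ 'b', 'c' ]
```

The map steps differ as well: the tutorial sets the id to the line name and deletes `properties`, while the attempt above appends any existing `d.id` to the name and keeps `properties`, so the resulting TopoJSON would differ even if the same features were selected.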