My last post went into a lot of detail on setting things up to solve edge covering problems. Despite the torrent of words I threw at the topic, I want to highlight one step in more detail: extracting map data from OpenStreetMap for a city. As with many of my posts, this one is motivated by a question on the OR-Tools mailing list. The original poster asked about generating walking routes to conduct a door-to-door survey of residents of Port-au-Prince, Haiti.
Finding OSM data
The first step to using OpenStreetMap (OSM) is to figure out how to
find the data in the first place. The root source of course is
OSM itself, which allows one to download the entire world map. As
that is a very large file, it is better to access an area snapshot of
the map from an alternative download server. One of my favorites is
the Geofabrik download server, as it is
regularly updated and well maintained. Using Port-au-Prince as an
example, I first clicked on the Central America link, and
then on the Haiti and Dominican Republic link. From there I downloaded the
OSM snapshot file, haiti-and-domrep-latest.osm.pbf. However, I’m
interested in Port-au-Prince, not the entire island, and Geofabrik
does not have handy outlines of cities.
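For those who prefer the command line to clicking through links, here is a minimal sketch of the same download. The helper names `geofabrik_url` and `fetch_extract` are made up for illustration, and the URL layout (a region directory followed by `<name>-latest.osm.pbf`) is an assumption about how the Geofabrik server is organized, so check the download page if the request fails.

```shell
#!/bin/sh
# Sketch: build a Geofabrik download URL and fetch the extract.
# ASSUMPTION: the URL layout below (region directory, then
# "<name>-latest.osm.pbf") matches the Geofabrik download server.

# Print the download URL for a regional extract.
geofabrik_url() {
    region=$1   # e.g. central-america
    name=$2     # e.g. haiti-and-domrep
    printf 'https://download.geofabrik.de/%s/%s-latest.osm.pbf\n' "$region" "$name"
}

# Download the extract into the current directory, keeping its file name.
fetch_extract() {
    url=$(geofabrik_url "$1" "$2")
    wget -O "$2-latest.osm.pbf" "$url"
}
```

Calling `fetch_extract central-america haiti-and-domrep` should then leave a haiti-and-domrep-latest.osm.pbf file in the current directory.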
Unbelievable as it may seem, between preparing for the original Glendale proposal and in writing up the related blog post, I’d forgotten where I found the polygon for the city! (I swear all my tweets and blog posts are really just a way for my future self to remember things my past self already knows.) After a little bit of grousing and hair-pulling, I finally remembered.
The OSM API allows direct access to nodes, ways, and relations. Typically, a city is defined as a relation that associates ways and nodes with the city itself. One can find the relation by using the search box on the OSM map interface.
For example, to see the city of Port-au-Prince, you can type “Port-au-Prince” into the search bar. This is shown in the screenshot to the right. Read down the suggestions and look for one that says “boundary”. In this case, it is the second search result, the one that is highlighted in yellow. Clicking on that link will take you to the OSM page showing the boundary relation for the city, as shown in the screenshot below.
The Port-au-Prince boundary relation page will zoom to the city, showing the outline of the city, as well as the location of the administrative seat of the city. Most cities in OSM have this boundary relation, but as OSM is purely the work of volunteers, it may be the case that some smaller cities do not have a bounding polygon defined.
The boundary relation contains information on the ways and nodes that are members of the relation. OpenStreetMap is made up of nodes and ways, with relations being used to group together “related” nodes and ways. If you scroll down the left-hand data column, you can see the members of the relation, as shown in the third screenshot. What we need to do is download the boundary relation as well as its constituent nodes and ways.
Relations are identified by unique numbers. Those numbers can change,
however, so it is best to look up the city or area by name first, and
then copy the relation number. In this case, the second screenshot
shows that the relation number identifying the Port-au-Prince boundary
is 387318. The page also identifies the changeset that last
touched the relation. In this case, it was three years ago.
In order to download the full relation, we need to change from the OSM
map URL to the URL for the data API. The OSM API is defined and
documented on the OSM API wiki page. Specifically, we want
to use the 0.6 API version GET call to download the relation, as
defined there.
However, there is a little trick that I found somewhere at some point in time but often forget. Specifically, if you follow the API instructions, the URL for the Port-au-Prince relation should be:

    https://www.openstreetmap.org/api/0.6/relation/387318

However, if you download that URL, you won’t get what you want!
    $ wget -O port-au-prince-poly.osm https://www.openstreetmap.org/api/0.6/relation/387318
    --2019-11-19 12:29:17-- https://www.openstreetmap.org/api/0.6/relation/387318
    Resolving www.openstreetmap.org ...
    Connecting to www.openstreetmap.org ...
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/xml]
    Saving to: ‘port-au-prince-poly.osm’

    port-au-prince-poly.osm [ <=> ] 6.31K --.-KB/s in 0.03s

    2019-11-19 12:29:19 (214 KB/s) - ‘port-au-prince-poly.osm’ saved

    $ cat port-au-prince-poly.osm
    <?xml version="1.0" encoding="UTF-8"?>
    <osm version="0.6" generator="CGImap 0.7.5 (28928 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
     <relation id="387318" visible="true" version="42" changeset="43835965" timestamp="2016-11-21T02:03:47Z" user="nyuriks" uid="339581">
      <member type="way" ref="48436422" role="outer"/>
      <member type="way" ref="48800769" role="outer"/>
      <member type="way" ref="48800768" role="outer"/>
      ...
All you get is the list of things included in the relation, but not the things themselves. The trick is that the word “full” must be appended to the URL in order to download the actual contents of the relation. Specifically, the URL should look like:

    https://www.openstreetmap.org/api/0.6/relation/387318/full

This file can then be used in OSM-aware software tools as the boundary
file defining Port-au-Prince, Haiti. You can download it using a web
browser (save to a file), or else use wget, as in

    wget -O port-au-prince-poly.osm https://www.openstreetmap.org/api/0.6/relation/387318/full
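Since I keep forgetting the “full” trick, here is a minimal sketch that wraps it up, along with a quick check that a downloaded file really contains geometry rather than just member references. Both function names are invented for this example, and the check assumes the API writes one XML element per line, as in the output shown in this post.

```shell
#!/bin/sh
# Sketch: build OSM API 0.6 relation URLs and sanity-check a download.
# The function names are illustrative, not part of any OSM tool.

# Print the URL for a relation. Pass "full" as the second argument to
# request the member ways and nodes along with the relation itself.
osm_relation_url() {
    id=$1
    mode=${2:-}
    url="https://www.openstreetmap.org/api/0.6/relation/${id}"
    if [ "$mode" = "full" ]; then
        url="${url}/full"
    fi
    printf '%s\n' "$url"
}

# Count the lines containing a <node> element in an OSM XML file.
# ASSUMPTION: one element per line. A relation fetched without /full
# holds only <member> references, so the count is 0.
count_osm_nodes() {
    grep -c '<node ' "$1" || true
}
```

With these defined, `wget -O port-au-prince-poly.osm "$(osm_relation_url 387318 full)"` fetches the boundary, and `count_osm_nodes port-au-prince-poly.osm` should print something larger than zero if the nodes came along.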
So now I have both the area data file (for Haiti and Dominican Republic) and the outline definition for the city I am interested in (Port-au-Prince). The next step is to use the outline file to extract just the city’s data.
Extracting just a city’s data
The best tool to use these days for manipulating OSM data is Osmium.
Because I want to load the city’s map data into PostgreSQL using
pgRouting, I want the output file to be in OSM format, not in PBF
format (which osm2pgrouting cannot decipher). The command is quite
simple, using the
-p option to specify the Port-au-Prince poly as
the area to extract from the larger OSM file.
    osmium extract -p port-au-prince-poly.osm \
        -o port-au-prince-latest.osm \
        haiti-and-domrep-latest.osm.pbf
There are many configurable alternatives for extracting data, specifying whether a way should overlap the poly, be completely contained within the poly, and so on. I prefer using the defaults, which should include all shapes that overlap the area of interest, and then I can use PostGIS to further refine the included shapes as needed. Consult the Osmium documentation for all of the possible options.
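As a rough illustration of those alternatives, the sketch below wraps the extract command so the strategy is spelled out. `extract_city` and the `DRY_RUN` switch are hypothetical conveniences of mine; `-s` is osmium's strategy option, which as far as I know accepts `simple`, `complete_ways` (the default), and `smart`.

```shell
#!/bin/sh
# Sketch: wrap "osmium extract" so the extraction strategy is explicit.
# extract_city and DRY_RUN are hypothetical; -p, -s and -o are osmium
# options. Set DRY_RUN=1 to print the command instead of running it.

extract_city() {
    poly=$1     # boundary file (OSM, poly or GeoJSON)
    out=$2      # output file; the extension selects the format
    src=$3      # larger OSM file to cut from
    strategy=${4:-complete_ways}   # osmium's documented default
    if [ -n "${DRY_RUN:-}" ]; then
        echo "osmium extract -p $poly -s $strategy -o $out $src"
    else
        osmium extract -p "$poly" -s "$strategy" -o "$out" "$src"
    fi
}
```

For example, `extract_city port-au-prince-poly.osm port-au-prince-latest.osm haiti-and-domrep-latest.osm.pbf smart` would run the same extraction as above with the `smart` strategy instead of the default.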
Loading into PostgreSQL
The next step is to load the data into PostgreSQL using osm2pgrouting. Because osm2pgrouting is pgRouting’s own loading tool, it automatically processes the streets and converts the roads into a proper network. The best source for more information is the osm2pgrouting documentation.
    osm2pgrouting --f data/port-au-prince-latest.osm \
        --conf data/map_config_streets.xml \
        --dbname portauprince \
        --prefix 'portauprince_' \
        --username dbuser \
        --clean
The one interesting feature here is the map configuration file,
map_config_streets.xml. The default configuration will load all
roads into the database. Alternate standard configurations exist for
loading transit and walking type networks. In this case, I altered
the default road loading configuration to exclude all highways.
Originally I used it for work in Glendale, and only set it up after
examining the types of roads that existed in that city. I wanted to
exclude highways because the city did not have the responsibility of
cleaning highway links. By omitting “motorway”, “motorway_link”, and
“motorway_junction”, all of the freeway links in Glendale would be
omitted. I also wanted to set a common “maxspeed” for
all streets to 50 km/hr, but I ended up not using it.
My xml configuration file looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <!-- note maxspeed="50" means kph, not mph -->
      <tag_name name="highway" id="1">
        <!-- <tag_value name="motorway" id="101" priority="1.0" maxspeed="130" /> -->
        <!-- <tag_value name="motorway_link" id="102" priority="1.0" maxspeed="130" /> -->
        <!-- <tag_value name="motorway_junction" id="103" priority="1.0" maxspeed="130" /> -->
        <tag_value name="trunk" id="104" priority="1.05" maxspeed="50" />
        <tag_value name="trunk_link" id="105" priority="1.05" maxspeed="50" />
        <tag_value name="primary" id="106" priority="1.15" maxspeed="50" />
        <tag_value name="primary_link" id="107" priority="1.15" maxspeed="50" />
        <tag_value name="secondary" id="108" priority="1.5" maxspeed="50" />
        <tag_value name="secondary_link" id="109" priority="1.5" maxspeed="50"/>
        <tag_value name="tertiary" id="110" priority="1.75" maxspeed="50" />
        <tag_value name="tertiary_link" id="111" priority="1.75" maxspeed="50" />
        <tag_value name="residential" id="112" priority="2.5" maxspeed="50" />
        <tag_value name="living_street" id="113" priority="3" maxspeed="50" />
        <!-- <tag_value name="service" id="114" priority="2.5" maxspeed="50" /> -->
        <tag_value name="unclassified" id="117" priority="3" maxspeed="50"/>
        <tag_value name="road" id="100" priority="5" maxspeed="50" />
      </tag_name>
    </configuration>
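Before loading, it can be handy to double check which road types a configuration file will actually let through. The sketch below is one way to do that with grep and sed. `active_tag_values` is a name I made up, and it assumes each `tag_value` entry sits on its own line with any comment markers on that same line.

```shell
#!/bin/sh
# Sketch: list the highway types a map configuration will actually load.
# ASSUMPTION: every <tag_value .../> entry is on its own line, and a
# commented-out entry has its "<!--" on that same line.

active_tag_values() {
    grep '<tag_value ' "$1" \
        | grep -v '<!--' \
        | sed -n 's/.*<tag_value name="\([^"]*\)".*/\1/p'
}
```

Running `active_tag_values data/map_config_streets.xml` prints one road type per line, and none of the motorway entries should appear in the list if they are commented out.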
Running the above command processes the data and loads it into the database. The larger the data set, the longer it will take to run. On my laptop running under Docker, the Port-au-Prince data file takes about 25 seconds to load. The output is as follows:
    Execution starts at: Mon Dec 2 12:49:51 2019

    ***************************************************
    COMMAND LINE CONFIGURATION *
    ***************************************************
    Filename = data/port-au-prince-latest.osm
    Configuration file = data/map_config_streets.xml
    host = localhost
    port = 5432
    dbname = glendale
    username = slash
    schema=
    prefix = portauprince_
    suffix =
    Drop tables
    Don't create indexes
    Don't add OSM nodes
    ***************************************************
    Testing database connection: glendale
    database connection successful: glendale
    Connecting to the database
    connection success

    Dropping tables...
    TABLE: portauprince_ways dropped ... OK.
    TABLE: portauprince_ways_vertices_pgr dropped ... OK.
    TABLE: portauprince_pointsofinterest dropped ... OK.
    TABLE: configuration dropped ... OK.
    TABLE: osm_nodes dropped ... OK.
    TABLE: osm_ways dropped ... OK.
    TABLE: osm_relations dropped ... OK.

    Creating tables...
    TABLE: portauprince_ways_vertices_pgr created ... OK.
    TABLE: portauprince_ways created ... OK.
    TABLE: portauprince_pointsofinterest created ... OK.
    TABLE: configuration created ... OK.
    Opening configuration file: data/map_config_streets.xml
    Parsing configuration

    Exporting configuration ...
    - Done
    Counting lines ...
    - Done
    Opening data file: data/port-au-prince-latest.osm total lines: 2725680
    Parsing data

    Finish Parsing data

    Adding auxiliary tables to database...

    Export Ways ...
    Processing 169321 ways:
    [**| ] (11%) Total processed: 20000 Vertices inserted: 15566 Split ways inserted 15581
    [****| ] (23%) Total processed: 40000 Vertices inserted: 2861 Split ways inserted 4235
    [******| ] (35%) Total processed: 60000 Vertices inserted: 2040 Split ways inserted 3743
    [********| ] (47%) Total processed: 80000 Vertices inserted: 122 Split ways inserted 274
    [**********| ] (59%) Total processed: 100000 Vertices inserted: 342 Split ways inserted 324
    [************| ] (70%) Total processed: 120000 Vertices inserted: 29 Split ways inserted 41
    [**************| ] (82%) Total processed: 140000 Vertices inserted: 10 Split ways inserted 21
    [****************| ] (94%) Total processed: 160000 Vertices inserted: 185 Split ways inserted 340
    [******************|] (100%) Total processed: 169321 Vertices inserted: 296 Split ways inserted 495

    Creating indexes ...

    Processing Points of Interest ...
    #########################
    size of streets: 169321
    Execution started at: Mon Dec 2 12:49:51 2019
    Execution ended at: Mon Dec 2 12:50:16 2019
    Elapsed time: 25.153 Seconds.
    User CPU time: -> 19.4725 seconds
    #########################
And with that, the data for Port-au-Prince is loaded into PostgreSQL.
The figure below is rendered using QGIS, dumping the
portauprince_ways table straight to the screen, layered on top of
standard OSM tiles. The line segments have been colored using the
values of “priority” specified in the configuration XML file, above.
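A quick way to confirm the load without opening QGIS is to count the loaded segments by priority. The sketch below only builds the SQL. `ways_by_priority_sql` is a hypothetical helper, and it assumes the ways table is named with the prefix passed to osm2pgrouting followed by `ways` and has a `priority` column, which matches what is described above.

```shell
#!/bin/sh
# Sketch: build a query that summarizes loaded ways by "priority".
# ways_by_priority_sql is a hypothetical helper.
# ASSUMPTION: the table is "<prefix>ways" and has a priority column.

ways_by_priority_sql() {
    prefix=$1   # the --prefix given to osm2pgrouting, e.g. portauprince_
    printf 'SELECT priority, count(*) FROM %sways GROUP BY priority ORDER BY priority;\n' "$prefix"
}
```

The query can then be handed to psql, for example `psql -d portauprince -U dbuser -c "$(ways_by_priority_sql portauprince_)"`, using whatever database and user names were given when loading.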