We examine algorithms for the Prize Collecting Steiner Tree (PCST) problem and consider the facets that determine their effectiveness. Specifically, we measure a number of solution approaches and compare them on a common set of metrics. To understand a solution approach we must first assess why it is useful. Our goal is to determine the effectiveness of Mixed Integer Programming (MIP) and heuristic methods. Using freely available street and address data, we create a base graph representation and compute on it, seeking a tree that connects every address using the minimum total length of edges from the street network. This is the basis of many approaches used to solve infrastructure problems, including telecommunications network design and costing. The analysis is conducted on methods developed by Hegde et al. (2015), Ljubić et al. (2006), and Teitz and Bart (1968). We present a data processing architecture, a concise set of results, and a framework for assessing the facets and trade-offs of a given approach. In this case the heuristic approaches are shown to have advantages in the simple case but fail when more complex requirements are added. This is where the MIP approach is able to capitalize, although its strictness and specificity in modelling limit flexibility.
Euro30 2019 - Benchmarking tree approaches on street data
1. Benchmarking tree approaches on street data
Fabion Kauker - fkauker@3-gis.com
EURO30 Dublin, Ireland 2019
2. Problem definition & Questions
• We want to find topologies on data that relates to possible routes for
the creation of infrastructure.
• Many methods exist, but which are most advantageous and for what
applications?
• Can we use street data as a starting point?
• What would the software look like and how could it be used?
3. Data and representations
● Many formats
○ SHP, TAB, csv, geoJSON …
● Many tools
○ ESRI, QGIS, MapInfo …
● Many sources
○ OSM, open address, municipal, commercial
Key Geometry
Point → (lat, lon)
LineString → [(lat_1, lon_1),..., (lat_n,lon_n)]
Polygon → [(lat_1, lon_1),..., (lat_1,lon_1)]
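A minimal sketch of the three key geometry types as plain Python values, keeping the (lat, lon) order used on this slide. The helper name `is_closed_ring` and the sample coordinates are hypothetical. Note that GeoJSON files store coordinates as (lon, lat), so the order may need swapping on load.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (lat, lon)
LineString = List[Point]      # ordered coordinates along a street
Polygon = List[Point]         # closed ring: first coordinate == last coordinate


def is_closed_ring(coords: Polygon) -> bool:
    """A polygon ring needs at least 4 coordinates and must end where it starts."""
    return len(coords) >= 4 and coords[0] == coords[-1]


# Hypothetical sample data for illustration only.
address: Point = (53.3498, -6.2603)
street: LineString = [(53.3498, -6.2603), (53.3501, -6.2610), (53.3505, -6.2621)]
block: Polygon = [(53.0, -6.0), (53.0, -6.1), (53.1, -6.1), (53.0, -6.0)]
```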
4. Geometry to Graph
Karduni, A., Kermanshah, A., & Derrible, S. (2016). A protocol to convert spatial polyline data to network
formats and applications to world urban road networks
1. https://github.com/fhk/test_data
2. https://observablehq.com/@d3/force-directed-graph
(1) (2)
5. Baking a graph
Given that we have some geometry we
want to derive the graph structure.
Where:
vertex/node == Point or LineString endpoint
and
edge/arc == LineString endpoint to LineString endpoint
Directional vs. Undirected
degree 2 nodes = degree 1 nodes
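The node and edge definitions above can be sketched as follows. This is an illustrative, undirected version in plain Python, not the code used in the talk; `bake_graph` and the sample streets are hypothetical names and data. Using a set per node drops parallel edges, which a real network may need to keep.

```python
from collections import defaultdict


def bake_graph(linestrings):
    """Build an undirected adjacency map from LineString endpoints.

    Each LineString contributes one edge between its first and last
    coordinate; interior coordinates only describe the edge's shape.
    """
    adjacency = defaultdict(set)
    for coords in linestrings:
        start, end = coords[0], coords[-1]
        adjacency[start].add(end)
        adjacency[end].add(start)  # undirected: store both directions
    return adjacency


# Three hypothetical streets meeting at the junction (0.0, 1.0).
streets = [
    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
    [(0.0, 1.0), (1.0, 1.0)],
    [(0.0, 1.0), (0.0, 2.0)],
]
graph = bake_graph(streets)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
```

For a directed graph, only the `start -> end` entry would be stored.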
6. Why Homebaked v. Open Source?
Python
networkx
graph-tool
Baked in algos
Edge and node access
Connected components
Used for path computation
Representations:
adjacency list or matrix
Compute origin destination
matrix/lookup
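To make the listed features concrete, here is a dependency-free sketch of an adjacency-list representation, connected components, and an origin-destination lookup built with Dijkstra's algorithm. Libraries such as networkx or graph-tool provide these out of the box; the function names and the toy graph below are assumptions for illustration.

```python
import heapq


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def shortest_lengths(adj, source):
    """Dijkstra: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in adj[node].items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


# Adjacency list: node -> {neighbour: edge length}. "d" is isolated.
adj = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0},
    "d": {},
}
od = {origin: shortest_lengths(adj, origin) for origin in adj}
```

Unreachable pairs are simply absent from the lookup, which is one way to spot disconnected street data before modelling.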
Simple assembly
Let’s just round to a level of accuracy
Currently using 6 decimal places on
lat lon coordinates
Stringify the coordinates
Assign unique ids
Should probably convert to
euclidean, or use a projection ...
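The assembly steps above (round, stringify, assign unique ids) might look like the sketch below. The names `node_key` and `assign_ids` and the sample coordinates are hypothetical; only the 6 decimal place precision comes from the slide.

```python
def node_key(lat, lon, places=6):
    """Round a coordinate pair and stringify it so nearby endpoints share a key."""
    return f"{lat:.{places}f},{lon:.{places}f}"


def assign_ids(linestrings, places=6):
    """Map each distinct endpoint key to an integer id and list edges by id."""
    ids = {}
    edges = []
    for coords in linestrings:
        endpoints = []
        for lat, lon in (coords[0], coords[-1]):
            key = node_key(lat, lon, places)
            # Reuse the id if this key was seen before, otherwise take the next one.
            endpoints.append(ids.setdefault(key, len(ids)))
        edges.append(tuple(endpoints))
    return ids, edges


# Two streets whose shared endpoint differs only beyond the 6th decimal place.
streets = [
    [(53.34980012, -6.2603), (53.3501, -6.2610)],
    [(53.34980049, -6.2603), (53.3505, -6.2621)],
]
ids, edges = assign_ids(streets)
```

Rounding snaps the two nearly identical endpoints to one node, so the streets connect in the resulting graph.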
7. Given a graph we want to utilize modelling
approaches...
Pre-compute intermediate inputs
Create formulation or data input
Run model
Parse solution
1) http://people.csail.mit.edu/ludwigs/papers/icml15_graphsparsity.pdf
2) http://people.csail.mit.edu/ludwigs/papers/dimacs14_fastpcst.pdf
3) https://github.com/fraenkel-lab/pcst_fast
4) https://cran.r-project.org/web/packages/tbart/index.html
5) https://pubsonline.informs.org/doi/abs/10.1287/opre.16.5.955
6) Goldman AJ (1971) Optimal center location in simple networks. Transp Sci 5:212–221
7) https://pdfs.semanticscholar.org/2094/882d425fe9b3f668eaafbcf8ac0bba478b5f.pdf
8) Ljubić et al. (2005). Solving the Prize-Collecting Steiner Tree Problem to Optimality
Network flow MIP
(adapted from Ljubić et al)
Assignment MIP
Teitz-Bart p-median
Hegde et al. Prize Collecting
Steiner Tree (PCST)
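Of the methods listed, the Teitz-Bart heuristic is the simplest to illustrate. Below is a plain-Python sketch of the vertex substitution idea: start from any p facilities and keep swapping in a candidate whenever the swap lowers total assignment distance. It is not the tbart package's implementation nor the code benchmarked here, and the function names and toy distance matrix are assumptions.

```python
def p_median_cost(dist, facilities):
    """Total distance when every demand uses its nearest open facility."""
    return sum(min(row[f] for f in facilities) for row in dist)


def teitz_bart(dist, p):
    """Vertex substitution heuristic for the p-median problem.

    dist[i][j] is the distance from demand i to candidate site j.
    """
    n = len(dist[0])
    facilities = list(range(p))  # arbitrary starting solution
    best = p_median_cost(dist, facilities)
    improved = True
    while improved:
        improved = False
        for candidate in range(n):
            if candidate in facilities:
                continue
            # Find the facility whose replacement by `candidate` helps most.
            best_swap, best_cost = None, best
            for i in range(p):
                trial = facilities[:i] + [candidate] + facilities[i + 1:]
                cost = p_median_cost(dist, trial)
                if cost < best_cost:
                    best_swap, best_cost = i, cost
            if best_swap is not None:
                facilities[best_swap] = candidate
                best = best_cost
                improved = True
    return sorted(facilities), best


# Six sites on a line forming two clusters; each site is also a demand.
xs = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in xs] for a in xs]
medians, cost = teitz_bart(dist, p=2)
```

On this toy instance the heuristic settles on the middle site of each cluster. In general it returns a local optimum only.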
8. Graph to MIP
Flow
source
sink
Flow conservation
constraints
Arc and node
based variables
Assignment
Path and
candidate/demand
variables
candidate demand
Note: be sure to try out f-strings in Python 3.6+
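As an illustration of how f-strings help when writing a flow model as text, the sketch below generates flow conservation and arc linking rows for a generic single-commodity flow model. It is an assumption for illustration, not the exact formulation adapted from Ljubić et al.; the variable names (`f_u_v` for arc flow, `x_u_v` for arc selection), the row names, and the toy arcs are all hypothetical.

```python
def flow_rows(arcs, demand_nodes, root):
    """Build constraint rows as strings.

    For every node: inflow - outflow = 1 at a demand node, minus the number
    of demands at the root, and 0 elsewhere. Flow on an arc is only allowed
    when that arc is selected.
    """
    nodes = sorted({n for arc in arcs for n in arc})
    big_m = len(demand_nodes)  # no arc ever carries more than the total demand
    rows = []
    for v in nodes:
        terms = [f"+ f_{u}_{w}" for u, w in arcs if w == v]   # inflow
        terms += [f"- f_{u}_{w}" for u, w in arcs if u == v]  # outflow
        if v == root:
            rhs = -len(demand_nodes)
        elif v in demand_nodes:
            rhs = 1
        else:
            rhs = 0
        rows.append(f"flow_{v}: {' '.join(terms)} = {rhs}")
    for u, w in arcs:
        rows.append(f"link_{u}_{w}: f_{u}_{w} - {big_m} x_{u}_{w} <= 0")
    return rows


# Root "r" serves one demand "b" through the intermediate node "a".
arcs = [("r", "a"), ("a", "r"), ("a", "b"), ("b", "a")]
rows = flow_rows(arcs, demand_nodes={"b"}, root="r")
```

The resulting strings could be joined into the constraints section of a model file or translated into solver API calls.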
9. Why compare apples and oranges?
Create shared and accessible
benchmarks
Create an extensible framework for
development
Create a web deployable library
Explore combinations and search
techniques
Document caveats
Create extensibility
Explore User Experience (UX)
Develop understanding in industry
10. How was benchmarking done?
https://github.com/fhk/test_data
1. Source data
2. Process data
3. Wrap “link_src” lib
4. Solve and time
5. Process time
6. Process results
7. Create visualization
Solver: CPLEX
Time limit 1800 seconds
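The "solve and time" step can be sketched as a small harness that runs each method on the same instance and records wall-clock time. The `benchmark` function, the stand-in solvers, and the result fields are hypothetical. In this sketch the 1800 second limit is only checked after the fact, whereas a MIP solver such as CPLEX would enforce it through its own time limit setting.

```python
import time

TIME_LIMIT_SECONDS = 1800  # limit stated on the slide


def benchmark(solvers, instance):
    """Run each solver on the same instance and record time and objective."""
    results = []
    for name, solve in solvers.items():
        start = time.perf_counter()
        solution = solve(instance)
        elapsed = time.perf_counter() - start
        results.append({
            "method": name,
            "seconds": elapsed,
            "objective": solution["objective"],
            "within_limit": elapsed <= TIME_LIMIT_SECONDS,
        })
    return results


# Stand-in solvers: each returns a total edge length for a toy instance.
def take_all_edges(instance):
    return {"objective": sum(instance["edge_lengths"])}


def drop_longest_edge(instance):
    lengths = sorted(instance["edge_lengths"])
    return {"objective": sum(lengths[:-1])}


instance = {"edge_lengths": [3.0, 1.0, 2.0]}
results = benchmark(
    {"all_edges": take_all_edges, "drop_longest": drop_longest_edge},
    instance,
)
```

Each result row can then be written out for the later processing and visualization steps.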