This document discusses adding 1 billion points to a PostGIS Topology database. It describes:
1. Successfully adding 1 billion points to PostGIS Topology in 15-16 hours and being able to efficiently edit the topology layer.
2. Using PostGIS Topology to store and manage a high resolution Norwegian land resource map with 1 billion points representing 8 million polygons.
3. A process for filling the PostGIS Topology layer with data that uses a content balanced grid to parallelize insertion and minimize processing time.
What happens when you add 1 billion points to PostGIS Topology
1. What happens when you put 1 billion
points into Postgis Topology?
Foss4g 2015, Como 16/07/2015
2. Norwegian Institute of Bioeconomy Research
WWW.NIBIO.NO
(From 1 July 2015, Skogoglandskap was merged into
NIBIO together with 2 other institutes.)
Lars Aksel Opsahl (Lars.Opsahl@nibio.no) , developer.
3. Is this possible ?
7/18/15 1 billion points in Postgis Topology
Move 1 billion points into postgis/topology
The answer is YES!
4. How long does it take to add 1 billion? 15-16 hours.
Is it possible to edit this topo layer? Yes.
Does an edit take a long time? 1 second or more.
The rest of the slides go into detail about how
we solved this and why Topology is a good alternative
for our case.
5. In this presentation we will focus on
WHAT type of data we test on.
WHY we use Postgis Topology for this layer.
HOW we use Postgis Topology.
HOW we fill this Postgis Topology layer with data.
HOW we plan to update this Topology layer.
6. AR5 is a high resolution land resource map
that covers all of Norway.
● The map describes land resources based on land type, site index, tree species and ground conditions.
● As simple features it is 8 million polygons with a total of 1 billion points.
7. AR5 used in gardskart.nibio.no
8. AR5 used in kilden.nibio.no
10. View map changes
What you see
What's the history of the map
Added by aeb
10/01/2011
Added by lop
16/06/2015
11. Rollback a user map update
User adds a new line
and surface attribute
Moderator deletes
the new line
The new
map
Initial map
12. No overlaps or gaps when the map is edited
User adds a new line
and surface attribute The new map
Initial map
This new line will not cause any overlap or gap with the existing surface
Old lines will keep their history and original points
(2 new points)
13. How to illustrate
A good picture may say more than any text, but for some people
a SQL fragment may say more than any text or picture.
When you see SQL fragments, I will explain the meaning. You
can actually think of this as a picture.
15. Database structure for border (lines/edges)
-- table that holds the topo object for lines
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje(
id serial PRIMARY KEY not null );
SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_linje', 'geo',
'LINESTRING') As new_layer_id;
-- create a new table for linestring attributes (holds attributes for edges)
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje_attr(
id serial PRIMARY KEY not null,
-- could be a foreign key to topo_ar5_sysdata.edge_data, but since that table is updated
-- outside our control we cannot use a foreign key here
edge_id int not null,
objtype_kode smallint not null CONSTRAINT objtype_kode_1_2_m1
CHECK (objtype_kode in (1,2,-1)),
aravgrtype smallint not null,
-- contains felles egenskaper (common properties) from ar5
felles_egenskaper topo_ar5.sosi_felles_egenskaper,
-- temporary data, will be deleted after the data is added
sl_sdeid int
);
16. Why store attributes in separate table for lines ?
● We want to be sure that any edge can have only one attribute value.
● After a discussion with Sandro Santilli we will look at other ways to do this: my update code becomes complicated, and many of the same tests are already done in the Topology package by Sandro Santilli. The way I have solved this now needs to be redesigned.
17. Database structure surface
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_flate(
id serial PRIMARY KEY not null,
artype int4 CONSTRAINT artype_between_0_100 CHECK (artype > 0 and artype < 100),
arskogbon int4 CONSTRAINT arskogbon_between_0_100 CHECK (arskogbon > 0 and arskogbon < 100),
artreslag int4 CONSTRAINT artreslag_between_0_100 CHECK (artreslag > 0 and artreslag < 100),
argrunnf int4 CONSTRAINT argrunnf_between_0_100 CHECK (argrunnf > 0 and argrunnf < 100),
-- contains felles egenskaper (common properties) from ar5
felles_egenskaper topo_ar5.sosi_felles_egenskaper,
-- simple feature copy of the surface, used for performance
simple_geo geometry(MultiPolygon,4258) NULL
);
-- add a topogeometry column that references the polygon surface
SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_flate', 'geo',
'POLYGON') As new_layer_id;
18. HOW we fill this Postgis Topology layer with data.
● Content balanced grid.
● Parallelize with GNU parallel and the grid cells.
● All code is wrapped in PL/pgSQL functions.
● We use simple feature lines and surface representation points when we create Postgis Topology.
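The idea of parallelizing over grid cells can be sketched as follows: the cell ids are split into half-open ranges, one range per job, so that no two jobs work on the same cell. This is only an illustration in Python. The names `make_batches`, `process_batch` and `run_parallel` are hypothetical, and `process_batch` is a placeholder for the PL/pgSQL call that the real setup runs through GNU parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(first_cell_id, last_cell_id, batch_size):
    """Split [first_cell_id, last_cell_id] into half-open (start, stop) ranges."""
    batches = []
    start = first_cell_id
    while start <= last_cell_id:
        stop = min(start + batch_size, last_cell_id + 1)
        batches.append((start, stop))
        start = stop
    return batches


def process_batch(batch):
    """Placeholder for the database call that handles cells start <= id < stop."""
    start, stop = batch
    return stop - start  # here we only report how many cells were handled


def run_parallel(first_cell_id, last_cell_id, batch_size, workers):
    """Process all batches with a fixed number of workers; return cells handled."""
    batches = make_batches(first_cell_id, last_cell_id, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_batch, batches))
```

Because the ranges never overlap, each cell is handled by exactly one job, which matches the half-open `cell.id >= cell_min_in and cell.id < stop_cell_id` filter used in the SQL on the later slides.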
19. Create content balanced grid for AR5 in Norway
-- Core create grid code; we use the && operator to increase index use
sql := 'SELECT count(*) FROM ' || table_name || ' WHERE ' || geo_column_name || ' && ' ||
'ST_MakeEnvelope(' || x_min || ',' || y_min || ',' || x_max || ',' || y_max || ',' || source_srid || ')';
EXECUTE sql INTO num_rows_table_tmp ;
-- (assumption: num_rows_table_tmp is added to num_rows_table here; that step is not shown on the slide)
IF num_rows_table < max_rows
THEN
-- Below limit, ok to use
sectors[0] := grid_geom;
ELSE
-- Too big, split in 4
x_delta := (x_max - x_min)/2; y_delta := (y_max - y_min)/2;
x_center := x_min + x_delta; y_center := y_min + y_delta;
sectors[0] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
ST_MakeEnvelope(x_min,y_min,x_center,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
sectors[1] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
ST_MakeEnvelope(x_center,y_min,x_max,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
sectors[2] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
ST_MakeEnvelope(x_min,y_center,x_center,y_max, ST_SRID(grid_geom)), min_distance, max_rows);
sectors[3] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
ST_MakeEnvelope(x_center,y_center,x_max,y_max, ST_SRID(grid_geom)), min_distance, max_rows);
END IF;
-- Create a grid with around max 4000 lines in each cell
SL_make_content_based_balanced_grid01(ARRAY['org_ar5.ar5_linje geo'],4000))
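The recursive split above can be tried outside the database with a small Python sketch. It is a simplification under stated assumptions: it counts plain points rather than line bounding boxes matched with `&&`, and the function name `make_content_balanced_grid` is hypothetical. The `min_distance` argument is assumed here to be a lower bound on cell size that stops the recursion.

```python
def make_content_balanced_grid(points, bbox, max_rows, min_distance=1e-9):
    """Return (x_min, y_min, x_max, y_max) cells, split until each is below max_rows."""
    x_min, y_min, x_max, y_max = bbox
    # Half-open test so that every point belongs to exactly one cell
    inside = [(x, y) for (x, y) in points
              if x_min <= x < x_max and y_min <= y < y_max]
    too_small = (x_max - x_min) <= min_distance or (y_max - y_min) <= min_distance
    if len(inside) < max_rows or too_small:
        return [bbox]  # below the limit (or cannot split further): use as is
    # Too big: split into 4 quadrants and recurse into each one
    x_center = x_min + (x_max - x_min) / 2
    y_center = y_min + (y_max - y_min) / 2
    quadrants = [
        (x_min, y_min, x_center, y_center),
        (x_center, y_min, x_max, y_center),
        (x_min, y_center, x_center, y_max),
        (x_center, y_center, x_max, y_max),
    ]
    cells = []
    for quadrant in quadrants:
        cells.extend(make_content_balanced_grid(inside, quadrant, max_rows, min_distance))
    return cells
```

Dense areas end up with many small cells and sparse areas with a few large ones, so each cell represents a similar amount of work.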
20. Content balanced grid for AR5 in Norway
21. Content balanced grid for AR5 in Norway
22. Linestring and surface distribution for the grid used.
● Covered by a single cell (does not touch any cell border lines)
  ● Single cell edges: 18988984
  ● Single cell surfaces: 7093814
● Crosses/touches cell border lines
  ● Multi cell edges: 635048
  ● Multi cell surfaces: 534455
23. 4 different operation types
● A: Process lines covered by single cells.
● B: Merge cells to include lines that cross cell borders (then do the same as in A for the lines found).
● C: Process surfaces covered by single cells.
● D: Merge cells to include surfaces that cross cell borders (then do the same as in C for the surfaces found).
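The split between single cell work (A, C) and multi cell work (B, D) can be illustrated with a small Python sketch. It is a simplified stand-in for the `ST_Contains` tests used in the SQL: lines are lists of (x, y) vertices, cells are axis-aligned boxes, and a line counts as single cell only if it lies strictly inside one box. The names `classify_line` and `split_work` are hypothetical.

```python
def classify_line(line, cells):
    """Return the index of the cell that strictly contains the line, or None."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    for index, (x_min, y_min, x_max, y_max) in enumerate(cells):
        if x_min < min(xs) and max(xs) < x_max and y_min < min(ys) and max(ys) < y_max:
            return index
    return None


def split_work(lines, cells):
    """Group lines per containing cell; collect the rest for the merged-cell pass."""
    single = {}  # cell index -> lines that can be processed in parallel per cell
    multi = []   # lines that cross or touch a cell border, processed after merging
    for line in lines:
        index = classify_line(line, cells)
        if index is None:
            multi.append(line)
        else:
            single.setdefault(index, []).append(line)
    return single, multi
```

Everything in `single` can be handled cell by cell in parallel, while `multi` has to wait until neighbouring cells are merged.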
24. A: Only process data covered by each cell
WAIT TO PROCESS: LINE NOT COVERED BY SINGLE CELL
START TO PROCESS: LINE COVERED BY SINGLE CELL
25. B: Merge cells to include lines that cross cell borders.
OK TO PROCESS NOW: LINE COVERED BY SET OF MERGED CELLS
DON'T PROCESS: DON'T TOUCH ANY ORIGINAL BORDERS
26. Process lines covered by single cells : 1. create topo.
SELECT
topology.toTopoGeom(geo, 'topo_ar5_sysdata', 1, 0.0000000001) as geo,
sl_sdeid
FROM (
select arl.sl_sdeid, arl.geo from org_ar5.ar5_linje arl
where
cell_geo_in && arl.geo and
ST_Contains(cell_geo_in, arl.geo) and
arl.objType not in ('KantUtsnitt') and
NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where
arl.sl_sdeid=f.sl_sdeid)
) AS a
Create the topo object. Extreme
performance. Snap to value
Use to find attributes
27. Merge cells and collect cell borders
-- merge cells
( SELECT
ST_union(cell.geo) as cell_union
FROM topo_ar5.cell_ad as cell
WHERE cell.id >= cell_min_in and
cell.id < (stop_cell_id)
) AS r2
-- get cell borders
FROM (
SELECT
(ST_Dump(grid_lines)).geom AS grid_line
FROM (
SELECT
ST_Collect(ST_ExteriorRing(cell.geo)) as grid_lines
FROM topo_ar5.cell_ad as cell
WHERE cell.id >= cell_min_in and
cell.id < (stop_cell_id)
) AS r
) AS r,
28. Use merged cells and cell borders to find new lines
....
WHERE ST_intersects(r.grid_line, arl.geo) AND
NOT EXISTS ( select edge_id from topo_ar5_sysdata.edge_data where
ST_Intersects(geom, arl.geo) and ST_Intersects(geom, r.grid_line) ) AND
arl.objType not in ('KantUtsnitt') AND
NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid=f.sl_sdeid)
...
WHERE ST_Contains(r2.cell_union, arl.geo) AND
NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid=f.sl_sdeid)
Covered by
merged cell
29. Process lines covered by single cells : 2. add attributes
SELECT
distinct ON (edge_id) edge_id,
topo_ar5.ar5_omkod_objtype_2_kode(b.objtype) as objtype_kode,
aravgrtype,b.datafangstdato,
ARRAY[b.informasjon] as informasjon,
(b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as
kvalitet ,b.opphav,b.verifiseringsdato,
(b.registreringsversjon,4.5)::topo_ar5.sosi_registreringsversjon as
registreringsversjon,
b.sl_sdeid
FROM (
select r.element_id as edge_id , arl.*
FROM relation_ids_added ra, topo_ar5_sysdata.relation r ,
org_ar5.ar5_linje arl
WHERE
ra.topogeo_id = r.topogeo_id and ra.layer_id = r.layer_id and
arl.sl_sdeid = ra.sl_sdeid
) AS b
Map by id.
Add attributes using user defined types.
30. Process surfaces covered by single cells: 1 add topo
INSERT INTO topo_ar5.ar5_topo_flate (geo)
SELECT topology.CreateTopoGeom('topo_ar5_sysdata',3,2,topoelementarray ) as geo
from
( select distinct ST_GetFaceGeometry('topo_ar5_sysdata',l.face_id) as geo,
topology.TopoElementArray_Agg(ARRAY[l.face_id,3]) as topoelementarray,
ST_union(l.mbr) as union_face
From topo_ar5_sysdata.face as l, topo_ar5.cell_ad cell
where cell.id = cell_nr_in and
ST_Contains(cell.geo,l.mbr) and
NOT EXISTS (select re.element_id from topo_ar5_sysdata.relation re
where re.layer_id = 2 and re.element_id = l.face_id )
group by l.face_id
) as r1,
topo_ar5.cell_ad cell
where cell.id = cell_nr_in and
ST_Contains(cell.geo, ST_Boundary(r1.union_face));
Build surface created
Find surfaces inside
Current cell
Create surface
Topo geo
31. Process surfaces covered by single cells: 2 update simple geo
update topo_ar5.ar5_topo_flate AS f
set
simple_geo = geo::geometry
from arf_id as ft
where f.id = ft.id_temp;
Just cast from topo geometry
32. Process surfaces covered by single cells : 2. update attributes
-- update the rest of the attributes
update topo_ar5.ar5_topo_flate as f SET (artype, arskogbon,
artreslag,argrunnf,felles_egenskaper) =
(c.artype,c.arskogbon,c.artreslag,c.argrunnf,
(datafangstdato,informasjon,null, kvalitet,null,opphav,null,
registreringsversjon,verifiseringsdato)::topo_ar5.sosi_felles_egenskaper )
FROM ( SELECT
b.artype ,b.arskogbon,b.artreslag,b.argrunnf,
b.id_temp,b.datafangstdato, ARRAY[b.informasjon] as informasjon,
(b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as kvalitet ,
b.opphav, b.verifiseringsdato,
(b.registreringsversjon,'4.5')::topo_ar5.sosi_registreringsversjon as registreringsversjon
FROM
( select p.*, ft.id_temp from org_ar5.ar5_punkt as p,arf_id as ft,
topo_ar5.ar5_topo_flate as f2
where f2.id = ft.id_temp and ST_Covers(f2.simple_geo,p.geo)
) as b
) AS c where f.id = c.id_temp;
Find data by using
Representation point
33. Test performance for the migration process
(16 dual core CPUs and SSD disks)
1 parallel thread
function_create_topo_ar5.sh vroom2 1 13000 200
15 parallel threads
function_create_topo_ar5.sh vroom2 15 13000 200
20 parallel threads
function_create_topo_ar5.sh vroom2 20 13000 200
34. Decreasing processing time
when increasing number of parallel threads
Number of threads    Total runtime in hours
1                    108
15                   16
20                   18
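The runtime figures above translate directly into speedup and parallel efficiency. A short Python calculation makes this explicit; the function name `speedup_table` is only for illustration, and the numbers are the ones reported on this slide.

```python
# threads -> total runtime in hours, as reported on the slide
runtimes_hours = {1: 108, 15: 16, 20: 18}


def speedup_table(runtimes):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    baseline = runtimes[1]
    rows = []
    for threads in sorted(runtimes):
        speedup = baseline / runtimes[threads]
        efficiency = speedup / threads
        rows.append((threads, speedup, efficiency))
    return rows


for threads, speedup, efficiency in speedup_table(runtimes_hours):
    print(f"{threads:>2} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

With 15 threads the run is 6.75 times faster than single threaded (about 45% efficiency). Going to 20 threads gives only a 6.00 times speedup, so the extra threads did not help on this machine.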
35. Average operations per second for the 4 different operation types with different numbers of threads.
Number of threads    A: Single cell linestrings    B: Multi cell linestrings    C: Single cell surfaces    D: Multi cell surfaces
1                    91                            9                            305                        5
15                   1043                          48                           972                        21
20                   814                           48                           934                        27
36. Average operations per second at every hour when
running single threaded.
[Chart: operations per second (y-axis, 0 to 500) for each hour of the run, grouped by operation type A, B, C and D (x-axis: hours and operation type).]
37. Average operations per second at every hour when
running 15 parallel threads.
[Chart: operations per second (y-axis, 0 to 1800) for each hour of the run, grouped by operation type A, B, C and D (x-axis: hours and operation type).]
38. Summary: converting AR5 to Postgis Topology
● Content balanced grid and parallel threads.
● Two parallel threads cannot work in the same area.
● Function based index topo_ar5.get_relation_id(geo TopoGeometry) and indexes on the relation table.
● Heavy use of the && operator.
● 16 hours of processing time is OK since this is a one-time operation.
● ValidateTopology('topo_ar5_sysdata') shows no errors.
39. HOW to update the Postgis Topology layer.
● Draw a line and set attribute values
● Use stored procedures
● Use one single transaction
● Rollback if any errors
● Java backend with JSON API
● Simple test client using this API
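The "one single transaction, rollback if any errors" rule can be illustrated with a small self-contained Python sketch. It uses the standard library `sqlite3` module purely as a stand-in for PostgreSQL. The tables `line` and `line_attr` and the function `apply_edit` are hypothetical and much simpler than the real stored procedures; the check on `artype` mirrors the range constraint shown on the surface table slide.

```python
import sqlite3


def apply_edit(conn, line_id, artype):
    """Insert a line and its attribute in one transaction; undo both on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO line (id) VALUES (?)", (line_id,))
            conn.execute(
                "INSERT INTO line_attr (line_id, artype) VALUES (?, ?)",
                (line_id, artype),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE line_attr (line_id INTEGER NOT NULL, "
    "artype INTEGER NOT NULL CHECK (artype > 0 AND artype < 100))"
)
```

If the attribute insert fails, the line inserted just before it is rolled back as well, so the map never ends up with a line that has no valid attributes.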
40. Two comments about update
1) Jostein, head of AR5: "Don't delete old lines, it's nice to
know the history behind changes".
2) Ingvild, my boss: "Why do I have to move old lines around
with many hundreds of points, why can't I just give you a new
simple line that just shows the difference?"
41. Edit Topology data with surface data
Draw a polygon
Split a polygon
Update surface attributes
Extend a polygon
42. Edit Topology : Split a polygon
- Original map
43. Edit Topology : Split a polygon
- Input : point, line, attribute values
44. Edit Topology : What happens when you have a
split surface operation.
Java backend calls : apply_line_on_topo_flate( geo_in geometry,
p_in geometry, artype_in int, arskogbon_in int,
artreslag_in int, argrunnf_in int)
And the following happens:
- Adjust the input line to the current data, taking into account that equal surfaces should stay equal
- Compute the area to be updated
- Take a copy of the unchanged data
- Take a copy of the data that may change
- Clear data from the line attribute table
- Clear data from the topo surface layer and delete the rows to be changed
- Add the adjusted line with topology.toTopoGeom
- Update the line attribute table
- Create new surfaces with the new attribute values
- Recreate old surfaces with the old values
- Check that the unchanged area is still the same
45. Edit Topology : Timing issues when you have a
split surface operation.
Java backend calls this function
topo_ar5.apply_line_on_topo_flate( geo_in geometry, p_in geometry,
artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)
Small operations that include few changes take about 1000 ms, but bigger operations may take minutes
http://trac.osgeo.org/postgis/ticket/2083
46. Edit Topology : Split a polygon
- New map
47. Edit Topology : Extend a polygon
48. Edit Topology : Extend a polygon
49. Edit Topology : Extend a polygon.
Java backend calls this function:
apply_line_on_topo_flate( geo_in geometry, p_in geometry,
artype_in int, arskogbon_in int, artreslag_in int,
argrunnf_in int)
Where p_in (0.0) means not set.
50. Edit Topology : Extend a polygon
51. Edit Topology : Draw a new polygon
52. Edit Topology : Draw a new polygon
53. Edit Topology : Draw a new polygon.
Java backend calls this function: apply_polygon_on_topo_flate(
geo_in geometry, artype_in int, arskogbon_in int,
artreslag_in int, argrunnf_in int
)
54. Edit Topology : Draw a new polygon
55. Further plans this year
● Add many new layers to Postgis Topology this fall and adjust the Topology model to new requirements.
● Create a client that uses the JSON API for updating topology layers.
● Extend the update API with more functionality.
● We have to work more on performance, topology usage and the update client for AR5.
56. Postgis Topology is a great tool: you can
add one billion points and it's possible to
update it afterwards.
Thanks to everybody that has contributed to
Postgis Topology and other open source tools.
Questions ?