The document describes a Hadoop-enabled ship tracking application developed for the Port of Rotterdam to analyze large volumes of ship position data. The application stores over 5 terabytes of ship position data, captured every 10 seconds, in Hadoop. Custom tools were developed to analyze the data and visualize it in ESRI's ArcGIS system. The application gives port managers faster access to information, improving operations, safety, and environmental management of this busy port.
For the Port of Rotterdam it's of course important to know where all the ships are!
One of the primary roles of the Port of Rotterdam Authority is to guide ships safely to their destinations. Each ship has an AIS transponder that continuously transmits the ship's current location. AIS stands for Automatic Identification System: not only the location is broadcast, but also the ID of the ship (among other parameters).
That's one way the Port of Rotterdam "knows" where the ships are.
Throughout the port, radar stations continuously scan for ships. This information, together with the AIS information, is passed to the vessel traffic service (VTS) operators. The VTS operators are located in several control stations throughout the port and are responsible for managing shipping in real time.
As ships become bigger and bigger, it becomes a challenge to transport the goods onward. This container ship can carry up to 18,000 containers, the equivalent of 125 million pairs of shoes.
Therefore the CEO of the Port of Rotterdam has said that the Port should become smarter, faster and more sustainable. The way to do that is to innovate, and this project contributes to that.
This animated slide gives an offline demonstration.
How does our presentation stand out?
First, our presentation deals with a geospatial data set: data where location is important. Although location is omnipresent, very few Hadoop applications deal with this kind of data; as far as I can tell, this might be the only such presentation at the summit.
Second, we deal with sensor data, meaning observations and measurements, whereas most Hadoop applications deal with social and/or transaction data. As with any measurement, errors can and will occur; special care is needed to take this into account in the analysis.
Third, we've created an interface that allows end users easy access to the information obtained from the big data. I'll demonstrate that during the presentation.
Here are some facts about the Port of Rotterdam. Its area is about three times that of the city of Brussels, or 80% of the Brussels region.
You may have heard of the Port of Rotterdam. It's one of the biggest ports in the world. In terms of size, it's big: it stretches for more than 40 km, and it can take up to 4 hours to sail from one end to the other.
We store all this data for three main customers.
The Harbour Master's main interest is safety. They use the tool to investigate incidents. For example, when there is a collision, they'd like to know what happened. They would of course like to prevent such incidents, so they want to see how the harbour is used and identify possible safety concerns.
The second group, capacity management, wants to ensure quick and easy passage of goods through the harbour. They're interested in identifying bottlenecks by looking at traffic patterns. Furthermore, they're interested in how current traffic patterns may change if certain modifications are made, such as the widening of channels. This enables better decision making.
The third group, environmental management, is interested in the pollution caused by shipping. They also evaluate the speed measures that have been put in place to reduce this pollution.
The big data work described here is part of the Portmaps project, in which the Port of Rotterdam has implemented a new geographical information system.
Uniform source of data.
All this data about ships and their location: is it big data? And does it make sense to use Hadoop for it?
Let us look at the big data score card:
For big data, three key characteristics are important: volume, velocity and variety. The data has a reasonable volume. It comes in at quite a high velocity, at over 1,000 records per second. It has only a single data format, so it doesn't meet the variety characteristic. However, it meets the other two characteristics, so it is big data and it does make sense to use Hadoop for it.
Volume = 18 billion records since 2009; that is about three times the number of people in the world.
Velocity = during this presentation, 250,000 records have been added.
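As a rough sanity check of these numbers (the 10-second reporting interval comes from the data set; treating it as uniform across all targets is an assumption for illustration):

```java
public class Throughput {
    // Records per day at a sustained ingest rate (86,400 seconds per day).
    public static long recordsPerDay(long recordsPerSecond) {
        return recordsPerSecond * 86_400L;
    }

    // Implied number of simultaneously tracked targets, assuming each
    // target reports once per reportIntervalSeconds (10 s for this feed).
    public static long impliedTargets(long recordsPerSecond, int reportIntervalSeconds) {
        return (long) recordsPerSecond * reportIntervalSeconds;
    }
}
```

At 1,000 records per second this works out to roughly 86 million records per day, produced by on the order of 10,000 tracked targets.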
One part of the Portmaps data set is ship position data. As we've seen, this data arrives every 10 seconds.
Several options were considered. The most promising one is storing the data in a geospatial database. However, such a database is expensive and may require custom partitioning. It also requires custom queries and code to perform analyses.
And then there is of course Hadoop.
The external radar/AIS system places a file in a spool directory every 10 seconds. Flume picks up this file, serialises it, and sinks it into Hadoop.
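A minimal Flume agent for this kind of pipeline could look like the sketch below. The agent name, directory paths, and channel sizing are illustrative assumptions, not the Port's actual configuration; only the spooling-directory source and HDFS sink pattern is what the slide describes.

```
ais.sources  = spool
ais.channels = mem
ais.sinks    = store

# Watch the spool directory the radar/AIS system writes into (path is hypothetical)
ais.sources.spool.type     = spooldir
ais.sources.spool.spoolDir = /var/spool/ais
ais.sources.spool.channels = mem

ais.channels.mem.type     = memory
ais.channels.mem.capacity = 10000

# Sink the events into HDFS, bucketed by hour
ais.sinks.store.type          = hdfs
ais.sinks.store.channel       = mem
ais.sinks.store.hdfs.path     = /data/ais/%Y/%m/%d/%H
ais.sinks.store.hdfs.fileType = DataStream
ais.sinks.store.hdfs.useLocalTimeStamp = true
```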
To be able to access the data, a custom toolbox has been created that accesses the Hadoop cluster. It can read and write data from HDFS and can submit jobs.
The clients, ArcMap and WebMap, make use of the geoprocessing services provided by the custom Java toolbox.
The data set is just a CSV line for each observed ship, every 10 seconds. Here is one example line. The fields are separated by the bar character. The following information is extracted from this line:
Track number – a number assigned by the radar/AIS system
MMSI – a unique identification number for the ship; based on this we know which ship it is.
X – The X-coordinate of the ship
Y – The Y-coordinate of the ship
Navigational status – whether the ship is moored, anchored or moving. In this case it is anchored.
Length – the length of the ship. In principle this property could be looked up via the MMSI; however, for combinations such as a push boat with barges the overall length varies, so it is included in the record.
Breadth – the width of the ship; the same considerations as for the length apply.
Time – the time at which the position was recorded.
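The record layout above can be sketched as a small parser in the toolbox's language, Java. Note that the field order and the example values below are assumptions for illustration; the actual feed has more fields, in an order defined by the radar/AIS system.

```java
// Sketch of parsing one bar-separated position record.
// Assumed field order for illustration: track|MMSI|X|Y|status|...
public class AisRecord {
    public final long trackNumber;   // assigned by the radar/AIS system
    public final long mmsi;          // unique ship identifier
    public final double x;           // X-coordinate
    public final double y;           // Y-coordinate
    public final String navStatus;   // moored, anchored or moving

    public AisRecord(long trackNumber, long mmsi, double x, double y, String navStatus) {
        this.trackNumber = trackNumber;
        this.mmsi = mmsi;
        this.x = x;
        this.y = y;
        this.navStatus = navStatus;
    }

    public static AisRecord parse(String line) {
        String[] f = line.split("\\|");  // fields are separated by the bar character
        return new AisRecord(
                Long.parseLong(f[0]),
                Long.parseLong(f[1]),
                Double.parseDouble(f[2]),
                Double.parseDouble(f[3]),
                f[4]);
    }
}
```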
The ship position data set is stored in Hadoop. Two considerations are important. One is that Hadoop prefers big files; in fact, it can split big files and send the pieces to different mappers if needed. The second is that users often want ranges of data to be considered.
We have chosen to partition the data at the hourly level. For each hour we store about 80 MB, so each file can be processed by one mapper. If we consider a day, 24 mappers can work in parallel.
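The hourly partitioning can be sketched as a simple mapping from an observation timestamp to an HDFS path. The base directory and the choice of UTC bucketing below are assumptions for illustration:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HourlyPartitioner {
    // Hypothetical layout: one ~80 MB file per hour, e.g. /data/ais/2013/06/26/14
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

    public static String partitionPath(long epochSeconds) {
        return "/data/ais/" + HOUR.format(Instant.ofEpochSecond(epochSeconds));
    }
}
```

A query over a day range then simply enumerates the 24 hourly files, which is what lets up to 24 mappers work on one day in parallel.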
To facilitate easy deployment, the Port of Rotterdam has chosen the Hadoop as a Service solution provided by KPN, one of the main IT service providers for the Port.
The cluster is configured as stated.
Although the cluster is virtual, each node has exclusive access to its three disks.
This animated slide gives an offline demonstration.
To make it even easier for end users to obtain information, we've also created a webmap application. The end user just needs to go to the right website and gets a map of the area.