
A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam



  1. A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam, Hadoop Summit Brussels, 15 April 2015 © Copyright - Port of Rotterdam. Frank Cremer (Geomatik), Mansour Raad (ESRI)
  2. Rue des Bouchers
  3. Where are the ships? AIS = Automatic Identification System
  4. Radar and control station. VTS = Vessel Traffic Service
  5. Port Challenges: “The Port should become smarter, faster and more sustainable” (Allard Castelein, CEO Port of Rotterdam)
  6. Access information in three clicks
  7. Presentation Highlights • Geospatial data set: only geospatial presentation this summit • Sensor data: structured, but not flawless • Easy access to Hadoop functionality: retrieving information in three clicks; users can be agnostic about Hadoop
  8. About the speakers • Mansour Raad: Big Data advocate; Senior Software Architect; ESRI, the world’s largest GIS company (GIS = Geographical Information System) • Frank Cremer: independent geospatial and big data consultant; consulting for the Port since 2008; Geomatik
  9. Port of Rotterdam: the facts • 8th largest port in the world • Largest port of Europe • Total area: 12,600 ha • Depth: 24 meters • Quay length: 70.5 km • Maasvlakte 2
  10. Port of Rotterdam in figures (1 year) • 35,000 ship visits with 400 million tons of cargo • 80,000 barge visits • 7,500,000 trucks (25,000 per day) • Chart labels: 28%, 48%, 7%, 17%; Road, Barge, Railway, Pipelines • Over 40 kilometers
  11. Usage of ship position data • Harbour master: incident analysis; safety checks • Capacity management: identifying bottlenecks; planning decision support • Environmental management: pollution (NOx) calculations; speed measures to reduce pollution
  12. PortMaps project • New geographical information system • Deployed in partnership with ESRI • Key characteristics: one uniform source of data; easy access • Ship position data
  13. Is ship position data Big Data? • Volume: 5 terabytes (since 2009), 1 terabyte per year • Velocity: >1,000 records every 10 sec • Variety: single data format (CSV)
  14. PortMaps: Ship position data • Challenges: receiving data every 10 s; 10 million records per day • Considered options: • Geospatial database: possible, but expensive; custom partitioning required; analyses could be a challenge • Hadoop: commodity hardware; built for huge data sets (petabytes); framework for analyses
  15. Simplified architecture
  16. How it all got started… ESRI.NL ad hoc cluster • ESRI.NL hardware: 4 x Dell R200; 2 x 1 TB hard disks; 1 x 4 cores; 16 GB RAM • CentOS • Installed CDH4 • Handed two 1 TB USB drives • 2 days to bulk load 2.5 TB of data
  17. Dataset • Sample record: N|3270|N|550|N|-14927|441077|1|N|1|N|0|N|194|N|N||||||||||||||2231|01-03-2015 01:00:04| J|N|N|N|N|||||5155.929,N|00254.968,E|||01-03-2015 00:00:04|A|R|N|ORE SALVADOR| N|D5DO9|N|179|N|5|N|0|N|-5|N|188|N|-3|N|209|N|NL RTM|N|23-02-2015 17:45:00|N| Voor anker|N|9607045|N|636015935|N|Klasse A|N|Vracht|N|5155.995,N|N|00254.979,E| N|-14911|N|441197|N|3270|N|550|N||N|| • Fields extracted: track number; MMSI; X; Y; navigational status (anchored); length (x 0.1 m); width (x 0.1 m); time (UTC)
  18. Dataset storage • Considerations: Hadoop prefers large files (64 MB or more); user selection by date/time • Implementation: partitioned by year, month, day and hour; separate directory for each partition, e.g. /…/year=2015/month=4/day=15/hour=14
  19. Production Hadoop cluster • Using Hadoop as a service • Based on Hortonworks Data Platform 2.1 • Provided by KPN • 4 data nodes: 12 CPUs; 96 GB memory; 3 x 4 TB disks • Running as virtual machines • No shared disks!
  20. Flume configuration • Extract time stamp: select the 45th field • Input: ^(?:[^|]*\|){44}([^|]*) • Output: dd-MM-yyyy HH:mm:ss • Custom serializer: implemented in Java; outputs only selected fields • Configuration: sink1.serializer = com.esri.serializer.AramisSerializer$Builder
  21. Developed tools: PolyTrackTool, LineTool & LineStatTool, DensityTool, SpeedTool
  22. Custom ArcGIS Java Toolbox • ArcGIS is Java-based • Limited ArcObjects Java API examples • ArcGIS server toolbox: almost the same as the client toolbox; certain functionality not available (getmap) • FeatureSet: allows drawing geographical inputs • Unit test: test functionality before deploying
  23. Access from browser (WebMap)
  24. The challenge of counting
  25. Results: Passages (LineTool) • Large job (LineTool): passages of 55 lines; full year of data • Results: takes 1 hour on the cluster versus 1 week on a PC, with 6 times more data!
  26. Density Tool • Number of observations per grid cell • Output: centre point & population
  27. Implementation challenges • Performance. Challenge: performance YARN – MR v1. Solution: # containers per node = # cores; more reducers for bigger jobs • Connectivity. Challenge: no access through firewall to application master. Solution: submit job and poll resource manager • Flume. Challenge: too slow when too many files in CIFS spool directory. Solution: spoon feed Flume by limiting max number of files
  28. Future work • Using Spark instead of MapReduce jobs: faster and potentially real time; easier development • Using Python for interfacing with ArcMap: easier development; better supported / documented • Having a web service at the Hadoop cluster: easier connectivity; Spring framework for easy development
  29. Technical conclusions • Geospatial application for Hadoop in production • Integrated within the GIS system • Easy to use for end users
  30. Benefits • User testimonials: René Kronieger: “I can now obtain faster access to the information I need”; Eric van Andel: “I can provide results more often”; Bob van Hell: “I get my result with existing GIS tools” • Our Hadoop solution: provides better insight into Port usage (smarter); provides results more often (faster); enables pollution calculation (more sustainable) • Allard Castelein, CEO Port of Rotterdam
  31. Questions? • How geospatial is your data set? • Can you use our approach? • For further information and questions please contact: Frank Cremer (f.cremer@portofrotterdam.com or frank@geomatik.nl), Mansour Raad (mraad@esri.com)
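
The timestamp extraction on the Flume slide and the directory layout on the dataset storage slide can be tied together in a short sketch. This is not the project's Flume configuration or its Java serializer; it only illustrates the idea in Python. The pattern below escapes the pipe and captures the whole field, which is an assumption about what the slide intended, and the base path `/data/ais` and both function names are hypothetical.

```python
import re
from datetime import datetime

# Skip 44 pipe-terminated fields, then capture the 45th.
# The escaped pipe and the "*" quantifier are assumptions, not the slide's exact text.
FIELD_45 = re.compile(r"^(?:[^|]*\|){44}([^|]*)")

# Python equivalent of the Java-style pattern dd-MM-yyyy HH:mm:ss.
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M:%S"


def extract_timestamp(record: str) -> datetime:
    """Return the timestamp held in the 45th pipe-separated field."""
    match = FIELD_45.match(record)
    if match is None:
        raise ValueError("record has fewer than 45 fields")
    return datetime.strptime(match.group(1).strip(), TIMESTAMP_FORMAT)


def partition_path(base: str, ts: datetime) -> str:
    """Build a year=/month=/day=/hour= directory under a hypothetical base path."""
    return f"{base}/year={ts.year}/month={ts.month}/day={ts.day}/hour={ts.hour}"


# Synthetic record: 44 placeholder fields followed by a timestamp field.
sample = "|".join(["x"] * 44 + ["15-04-2015 14:03:10", "A"])
ts = extract_timestamp(sample)
print(partition_path("/data/ais", ts))
```

Deriving the partition from the record's own timestamp, rather than from the arrival time, keeps late or bulk-loaded files in the hour they describe.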

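The Density Tool slide describes counting observations per grid cell and returning each cell's centre point and population. A minimal single-machine sketch of that idea follows; the production tool runs as a job on the cluster, and the function name, the square cell size and the sample coordinates here are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def density(points: Iterable[Tuple[float, float]],
            cell_size: float) -> List[Tuple[float, float, int]]:
    """Count observations per square grid cell.

    Returns (centre_x, centre_y, population) for every non-empty cell.
    """
    counts: Counter = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return sorted(
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size, n)
        for (col, row), n in counts.items()
    )


# Hypothetical observations in a projected coordinate system (metres).
observations = [(10, 10), (20, 90), (150, 10), (-10, 5)]
for centre_x, centre_y, population in density(observations, cell_size=100):
    print(centre_x, centre_y, population)
```

In a MapReduce setting the cell index would be the map output key and the count the reduce step, which is why this kind of aggregation distributes well.
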
Editor’s notes

  • For the Port of Rotterdam it’s of course important to know where all the ships are!

    One of the primary roles of the Port of Rotterdam Authorities is to guide ships safely to their destinations. Each ship has an AIS transponder that continuously transmits the current location of the ship. AIS stands for Automatic Identification System, so not only the location is broadcast but also the ID of the ship (amongst other parameters).

    That’s one way the Port of Rotterdam “knows” where the ships are.

  • Throughout the Port, radar stations continuously scan for ships. This information, together with the AIS information, is passed to the Vessel Traffic Service (VTS) operators. The VTS operators are located in several control stations throughout the port and are responsible for managing the shipping in real time.
  • As ships become bigger and bigger, transporting the goods onward becomes a challenge. This container ship can handle up to 18,000 containers, transporting the equivalent of 125 million pairs of shoes.

    Therefore the CEO of the Port of Rotterdam has said that the Port should become smarter, faster and more sustainable. The way to do that is to innovate, and this project contributes to that.
  • This animated slide gives an offline demonstration.
  • Where does our presentation stand out?

    First, our presentation deals with a geospatial data set, which means data where location is important. Despite location being omnipresent, very few Hadoop applications deal with this kind of data; as far as I can tell this might be the only such presentation at the summit.
    Second, we deal with sensor data, meaning observations and measurements. We’re obviously not the only ones doing that, although most Hadoop applications deal with social and/or transaction data. As with any measurement, errors can and will occur. Special care is needed to take that into account in the analysis.
    Third, we’ve created an interface that allows end users easy access to the information obtained from big data. I’ll demonstrate that in the presentation.
  • Here are some facts about the Port of Rotterdam. Its area is about three times the city of Brussels or 80% of the Brussels region.
  • You may have heard about the Port of Rotterdam. It’s one of the biggest in the world. In terms of size, it’s big; it stretches for more than 40 km. It takes up to 4 hours to sail from one end to the other.
  • We store all this data for three main customers.
    The Harbour Master’s main interest is safety. They use the tool to analyse incidents. For example, when there is a collision, they’d like to know what happened. They would of course like to prevent this from happening, so they’d like to see how the harbour is used and identify possible safety concerns.
    The second group, capacity management, wants to ensure quick and easy passage of goods through the harbour. They’re interested in identifying bottlenecks by looking at traffic patterns. Furthermore, they’re interested in how current traffic patterns may alter if certain changes are made, such as the widening of channels. This enables better decision making.
    The third group, environmental management, is interested in the pollution effect of the shipping. They are also evaluating speed measures that are put in place to reduce the pollution.
  • The big data work here is part of the Portmaps project. In this project the Port of Rotterdam has implemented a new geographical information system.

    Uniform source of data.
  • All this data about ships and their location: is it big data? And does it make sense to use Hadoop for it?

    Let us look at the big data score card:
    For big data three key characteristics are important: volume, velocity and variety. The data has a reasonable volume. It comes in at quite a high velocity, at over 1,000 records every 10 seconds. It has only a single data format, so it doesn’t meet the variety characteristic. However, it meets the other two characteristics, so it is big data and it does make sense to use Hadoop for it.

    Volume = 18 billion records since 2009; that is about three times the number of people in the world.
    Velocity = during this presentation 250,000 records have been added
  • One part of the Portmaps data set is ship position data. As we’ve seen, this data arrives every 10 seconds.

    Several options were considered. The most promising one is storing it in a geospatial database. However, it is expensive and may require custom partitioning. It also requires custom queries and code to perform analyses.

    And then there is of course Hadoop.
  • The external radar/AIS system places a file in the spool directory every 10 seconds. Flume picks up this file, serialises it and sinks it into Hadoop.

    To be able to access the data, a custom toolbox has been created that accesses the Hadoop cluster. It can read data from and write data to HDFS, and it can submit jobs.
    The clients, ArcMap and WebMap, make use of the geoprocessing services that are provided by the custom Java toolbox.
  • The data set is just a CSV line for each observed ship every 10 seconds. Here is one example line. Fields are separated by the bar character. The following information is extracted from this line:
    Track number – a number assigned by the radar/AIS system
    MMSI – a unique identification number of the ship; based on that we know which ship it is.
    X – the X-coordinate of the ship
    Y – the Y-coordinate of the ship
    Navigational status – whether the ship is moored, anchored or moving. In this case it is anchored.
    Length – the length of the ship. This property can in principle be looked up from the MMSI, but it is not fixed: for combinations such as a push boat with car floats, the length varies.
    Breadth – or width; the same applies as for the length.
    Time – th

  • The ship positions data set is stored in Hadoop. Two considerations are important. One is that Hadoop prefers big files; in fact, it can split big files and send the pieces to different mappers if needed. The second is that users often want to select ranges of data.

    We have chosen to partition the data at the hourly level. For each hour we store about 80 MB. Each file can therefore be processed by one mapper. If we consider a day, 24 mappers can work in parallel.
  • To facilitate easy deployment, the Port of Rotterdam has chosen the Hadoop as a Service solution provided by KPN. KPN is one of the main IT service providers for the Port.

    The cluster is configured as stated.
    Although the cluster is virtual, each node has exclusive access to its three disks.
  • This animated slide gives an offline demonstration.

    To make it even easier for end users to obtain information, we’ve also created a WebMap application. The end user just needs to go to the right website to get a map of the area.
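
The notes on hourly partitioning imply that a user's date/time selection maps directly onto a set of directories, each of which can be handled by one mapper. The sketch below enumerates those directories for a time range. It assumes the `year=/month=/day=/hour=` layout from the slides; the base path `/data/ais` and the function name are hypothetical, and the 80 MB per hour figure is the approximate value quoted in the notes.

```python
from datetime import datetime, timedelta
from typing import Iterator

APPROX_MB_PER_HOUR = 80  # approximate figure quoted in the notes


def hourly_partitions(base: str, start: datetime, end: datetime) -> Iterator[str]:
    """Yield one partition directory per hour that overlaps [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield (f"{base}/year={current.year}/month={current.month}"
               f"/day={current.day}/hour={current.hour}")
        current += timedelta(hours=1)


# One full day of data: one directory, and so roughly one mapper, per hour.
day = list(hourly_partitions("/data/ais", datetime(2015, 4, 15), datetime(2015, 4, 16)))
print(len(day), "partitions, roughly", len(day) * APPROX_MB_PER_HOUR, "MB")
```

Because the selection is resolved to paths before the job starts, a query over one afternoon never reads the rest of the multi-terabyte archive.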
