Large Scale Image Processing for Disaster Relief Using Open Cloud Consortium Resources
1. Project Matsu: Large Scale On-Demand
Image Processing for Disaster Relief
Collin Bennett, Robert Grossman,
Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
www.opencloudconsortium.org
2. Project Matsu Goals
• Provide persistent data resources and elastic
computing to assist in disasters:
– Make imagery available for disaster relief workers
– Elastic computing for large scale image processing
– Change detection for temporally different and
geospatially identical image sets
• Provide a resource for standards testing and
interoperability studies of large data clouds
4. • 501(c)(3) not-for-profit corporation
• Supports the development of standards,
interoperability frameworks, and reference
implementations.
• Manages testbeds: Open Cloud Testbed and
Intercloud Testbed.
• Manages cloud computing infrastructure to support
scientific research: Open Science Data Cloud.
• Develops benchmarks.
5. OCC Members
• Companies: Aerospace, Booz Allen Hamilton,
Cisco, InfoBlox, Open Data Group, Raytheon,
Yahoo
• Universities: CalIT2, Johns Hopkins,
Northwestern Univ., University of Illinois at
Chicago, University of Chicago
• Government agencies: NASA
• Open Source Projects: Sector Project
6. Operates Clouds
• 500 nodes
• 3,000 cores
• 1.5+ PB of storage
• Four data centers
• 10 Gbps network
• Target: refresh 1/3 of the hardware each year
• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Project Matsu: Cloud-based Disaster Relief Services
7. Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief
8. Focus of OCC Large Data Cloud Working Group
[Diagram: a layered stack of cloud services — applications run on top of Cloud Compute Services (MapReduce, UDFs, and other programming frameworks), Table-based Data Services, and Relational-like Data Services, all of which sit on Cloud Storage Services]
• Developing APIs for this framework.
9. Tools and Standards
• Apache Hadoop/MapReduce
• Sector/Sphere large data cloud
• Open Geospatial Consortium
– Web Map Service (WMS)
• OCC tools are open source (matsu-project)
– http://code.google.com/p/matsu-project/
10. Part 2: Technical Approach
• Hadoop – Lead Andrew Levine
• Hadoop with Python Streams – Lead Collin
Bennett
• Sector/Sphere – Lead Yunhong Gu
13. Image Processing in the Cloud - Reducer
• Step 1: Input to the reducer
– Key: a bounding box, e.g.
(minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
– Value: the set of images that fall within that bounding box
• Step 2: Process the difference in the reducer
– Assemble the images based on their timestamps and compare them;
the result is a delta of the two images
• Step 3: Reducer output
– All images go to different map layers (sets of images for display
in WMS): the Timestamp 1 set, the Timestamp 2 set, and the Delta set
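Step 2 above can be sketched in Python. This is a minimal illustration, not the project's actual code; `image_delta` and the flat-list image representation are assumptions.

```python
# Hypothetical sketch of the reducer's differencing step (Step 2):
# given two images of the same bounding box taken at different
# timestamps, compute a per-pixel delta. Images are represented here
# as flat lists of integer pixel values.

def image_delta(pixels_t1, pixels_t2):
    """Return the absolute per-pixel difference of two equal-size images."""
    if len(pixels_t1) != len(pixels_t2):
        raise ValueError("images must have the same dimensions")
    return [abs(a - b) for a, b in zip(pixels_t1, pixels_t2)]

# Two tiny 2x2 "images" from different timestamps
delta = image_delta([10, 20, 30, 40], [10, 25, 30, 35])
print(delta)  # [0, 5, 0, 5]
```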
15. Preprocessing Step
• All images (in a batch to be processed) are
combined into a single file.
• Each line contains the image’s byte array
transformed to pixels (raw bytes don’t work well
with the one-line-at-a-time Hadoop streaming paradigm).
• Record format:
geolocation <tab> timestamp | tuple size ; image width ;
image height ; comma-separated list of pixels
The fields after the ‘|’ (shown in red on the original slide) are the
metadata needed to process the image in the reducer.
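The record format above can be illustrated with a short Python sketch; `make_record` and its parameter names are illustrative, not part of the matsu-project code.

```python
# Sketch of the preprocessing serialization described above: one image
# becomes one text line so it fits Hadoop streaming's line-at-a-time
# model. Field layout follows the slide:
#   geolocation <tab> timestamp | tuple size ; width ; height ; pixels

def make_record(geolocation, timestamp, pixels, width, height, tuple_size=1):
    pixel_csv = ",".join(str(p) for p in pixels)
    return f"{geolocation}\t{timestamp}|{tuple_size};{width};{height};{pixel_csv}"

# A tiny 2x2 grayscale "image" as one record
record = make_record("minx=-45.0,miny=-2.8125", "t1",
                     [0, 128, 255, 64], width=2, height=2)
print(record)
```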
16. Map and Shuffle
• We can use the identity mapper; all of the
mapping work was done in the preprocessing step.
• The map/shuffle key is the geolocation.
• In the reducer, the timestamp will be the
first field of each record when splitting on ‘|’.
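A minimal Hadoop-streaming-style reducer matching this description might look like the following sketch. All names are illustrative, and in a real job the lines would arrive on stdin, already sorted by key by the shuffle.

```python
from itertools import groupby

def parse_line(line):
    # Key and value are tab-separated; the timestamp precedes the '|'
    key, value = line.rstrip("\n").split("\t", 1)
    timestamp, payload = value.split("|", 1)
    return key, timestamp, payload

def reduce_stream(lines):
    # groupby relies on the shuffle having sorted records by key
    parsed = (parse_line(line) for line in lines)
    for geo, group in groupby(parsed, key=lambda rec: rec[0]):
        # Collect this location's images by timestamp for later differencing
        by_time = {ts: payload for _, ts, payload in group}
        yield geo, sorted(by_time)

# Sample records standing in for sys.stdin
sample = ["bboxA\tT1|d1", "bboxA\tT2|d2", "bboxB\tT1|d3"]
for geo, timestamps in reduce_stream(sample):
    print(geo, timestamps)
```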
18. Sector Distributed File System
• Sector aggregates hard disk storage across
commodity computers
– with a single namespace, file-system-level
reliability (using replication), and high availability
• Sector does not split files
– A single image will not be split, so when it
is being processed the application does not need
to read data from other nodes over the network
– Optionally, a directory can also be kept
together on a single node
19. Sphere UDF
• Sphere allows a user-defined function (UDF) to be
applied to each file (whether it contains a single
image or multiple images)
• Existing applications can be wrapped in a
Sphere UDF
• The Sphere streaming utility accepts a data
directory and an application binary as inputs, e.g.:
• ./stream -i haiti -c ossim_foo -o results