As new geospatial data sources come online, the variety and velocity of the data make it increasingly difficult to find the answers to intelligence problems manually.
3. Agenda
Geospatial AI at Booz Allen
Applying Spark at scale to tackle
In-Depth: Land Classification with Black Marble
Exploring NASA’s night lights data for a new take on land use classification.
5. Mission Applications of Geospatial AI
▪ Spatial Analysis
▪ Anomaly detection
▪ Colocation
▪ Density analysis
▪ Computer Vision
▪ Processing overhead imagery
▪ Synthetic data
▪ Natural Language Processing
▪ Clustering of conflict events reported by ACLED
▪ Scalable ETL
▪ Datetime normalization
▪ Geospatial normalization
6. Challenges
▪ Scale of Data
▪ Geospatial datasets can get immensely large
▪ High-resolution imagery, the Internet of Things, and people carrying GPS in their pockets have enabled the creation of datasets with petabytes of data
▪ It quickly becomes untenable to solve geospatial problems on a single CPU, and some problems are untenable without GPU technology
▪ Geospatial Optimization
▪ Many geospatial analytics require highly intensive geospatial joins. Without geospatial indices, these joins become time- and cost-prohibitive
▪ Unique Data Formats
▪ Vector data formats, including shapefiles, GeoJSON, and KML, are standard for geospatial applications
▪ Raster data such as imagery has more than just pixel data; these file formats also contain unique geospatial metadata
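The point about geospatial indices can be made concrete with a small sketch. The example below is a minimal, pure-Python illustration and not the API of any particular geospatial library: it buckets points into a uniform grid so that a distance join only inspects neighboring cells instead of comparing every pair. The function names, the planar (x, y) coordinates, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor, hypot


def build_grid_index(points, cell_size):
    """Bucket (id, x, y) points into square cells of width cell_size."""
    index = defaultdict(list)
    for point_id, x, y in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        index[cell].append((point_id, x, y))
    return index


def join_within_distance(left, right, radius):
    """Return sorted (left_id, right_id) pairs no farther apart than radius.

    With cell_size == radius, any match for a left point must sit in its
    own cell or one of the 8 surrounding cells, so only those are scanned.
    """
    index = build_grid_index(right, cell_size=radius)
    pairs = []
    for left_id, x, y in left:
        cx, cy = floor(x / radius), floor(y / radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for right_id, rx, ry in index.get((cx + dx, cy + dy), []):
                    if hypot(x - rx, y - ry) <= radius:
                        pairs.append((left_id, right_id))
    return sorted(pairs)


# Hypothetical planar coordinates (for example, meters in a projected CRS).
sensors = [("s1", 0.0, 0.0), ("s2", 50.0, 50.0)]
events = [("e1", 3.0, 4.0), ("e2", 48.0, 52.0), ("e3", 500.0, 500.0)]
matches = join_within_distance(sensors, events, radius=10.0)
```

A brute-force join compares every left point with every right point; the grid limits each lookup to nine cells, which is the basic idea that production spatial indices refine with structures such as R-trees or space-filling curves.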
7. Spark Powered Solutions
▪ Specialized Libraries
▪ Over the past five years, Apache Spark has made massive strides in enabling geospatial workflows through packages like GeoMesa, GeoTrellis, GeoPandas, and GeoSpark
▪ These libraries enable spatial joins and geospatial vector analysis, and they handle specialized geospatial formats out of the box
▪ Built-in Scale
▪ Apache Spark, coupled with the libraries above, simplifies the process of executing geospatial analytics in parallel
▪ Acceleration with Databricks
▪ Databricks provides built-in quality-of-life improvements for training geospatial AI models at scale
▪ The Databricks ML runtime enables GPU acceleration for training deep learning models out of the box, reducing the time to set up the Spark development environment
▪ Delta Lake coupled with MLflow enables model tracking, data versioning, and accelerated analytics, making it possible to track AI models as they are trained
9. Black Marble
▪ Human Light Around the World
▪ Corrected to subtract natural reflection including moonlight, albedo, and backscatter
▪ Factors in seasonal effects such as vegetation and snow
▪ Daily Collection
▪ Captures variation in seasonality
▪ Available back to January 2012
▪ Variety of Use Cases
▪ Disaster impact
▪ Environmental monitoring
▪ Economic analysis
10. Modeling
▪ Features
▪ How much output did we see from areas initially?
▪ How did output change over time?
▪ Regression
▪ Outputs slope and intercept
▪ Resistant to noisy input values
▪ Performed over time series for each pixel
▪ Clustering
▪ Cluster pixels based on initial output and slope of change over time
[Figure: cluster time series plotted as radiance by day; axis annotations run from steeper decline to steeper incline and toward higher initial radiance]
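The modeling steps above can be sketched in a few lines. The slides do not name the regression estimator or the clustering algorithm, so the choices here are assumptions: a Theil-Sen fit (median of pairwise slopes) stands in for a regression that is "resistant to noisy input values", the fitted intercept stands in for initial output, and a plain k-means with hand-picked starting centroids stands in for the clustering step. All data and names are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(days, radiance):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (radiance[j] - radiance[i]) / (days[j] - days[i])
        for i, j in combinations(range(len(days)), 2)
        if days[j] != days[i]
    ]
    slope = median(slopes)
    intercept = median(r - slope * d for d, r in zip(days, radiance))
    return slope, intercept


def nearest(point, centroids):
    """Index of the centroid with the smallest squared distance to point."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means from caller-supplied starting centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical per-pixel radiance series over ten days.
days = list(range(10))
series = {
    "declining_noisy": [100 - 5 * d for d in days],
    "declining": [100 - 4 * d for d in days],
    "rising": [20 + 1 * d for d in days],
    "rising_fast": [20 + 2 * d for d in days],
}
series["declining_noisy"][4] = 300  # one spurious spike the fit should ignore

# One (initial output, slope) feature vector per pixel, then cluster them.
features = []
for name, values in series.items():
    slope, intercept = theil_sen(days, values)
    features.append((intercept, slope))
labels, centroids = kmeans(features, centroids=[features[0], features[2]])
```

In this toy data the spike on day 4 does not move the fitted slope, which is the property the slide asks of the regression. On real data the two features have very different scales, so standardizing them before clustering would likely matter.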
11. Making it Parallel
▪ Rare Use Case…
▪ Many small models required for analysis
▪ Not currently well optimized in SparkML
▪ …but still benefits from parallelism
▪ Low-level RDDs and UDFs distribute the work across nodes
▪ A small 3-node cluster resulted in a 6x speed increase
▪ Benefits scale with application
▪ Scaling across geographic areas or larger time windows could be infeasible without parallelizing.
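The "many small models" pattern can be sketched without a cluster. The example below is a local stand-in, not the presenters' Spark code: each pixel gets its own independent fit, and a worker pool maps the fitting function over (pixel, series) pairs the way an RDD map would distribute them across nodes. The ordinary least-squares fit, the function names, and the sample data are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_pixel(item):
    """Fit one small model: a least-squares line for a single pixel.

    Expects (pixel_id, series) with at least two observations, one per day.
    """
    pixel_id, series = item
    n = len(series)
    days = range(n)
    mean_d = sum(days) / n
    mean_r = sum(series) / n
    sxx = sum((d - mean_d) ** 2 for d in days)
    sxy = sum((d - mean_d) * (r - mean_r) for d, r in zip(days, series))
    slope = sxy / sxx
    return pixel_id, (slope, mean_r - slope * mean_d)


def fit_all(pixels, workers=4):
    """Map fit_pixel over every (pixel_id, series) pair using a worker pool.

    Threads only illustrate the structure; CPU-bound pure Python needs
    separate processes or a cluster to see a real speedup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_pixel, pixels.items()))


# Hypothetical radiance series keyed by (row, col) pixel coordinates.
pixels = {
    (0, 0): [10.0, 12.0, 14.0, 16.0],
    (0, 1): [30.0, 27.0, 24.0, 21.0],
    (1, 0): [5.0, 5.0, 5.0, 5.0],
}
models = fit_all(pixels)
```

Because `fit_pixel` takes one key-value pair and returns one, the same function could in principle be handed to a PySpark RDD, for example `sc.parallelize(list(pixels.items())).map(fit_pixel).collectAsMap()`, where `sc` is an existing SparkContext; that line is an untested assumption about how the approach would translate.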
12. Identifying Areas of Interest
[Figure: pixels located in target cluster; labeled sites: Paulsboro Refinery, Tropicana Resort, Steel Pier, Borgata Casino]
[Figure: Borgata Casino, March 20, 2020]
14. Data Science at Booz Allen
▪ Strategic and Technical Advisory
▪ Understand AI readiness and develop a roadmap to implement responsible and ethical AI solutions
▪ Design and Implementation
▪ AI solution development plus change management and organizational design to create sustainable solutions
▪ MLOps
▪ Formal ML engineering process for the end-to-end lifecycle of production-grade ML
▪ Training
▪ Partnership with NVIDIA's Deep Learning Institute to deliver both technical and non-technical AI training
15. Donald Polaski
Chief Data Scientist
Booz Allen Hamilton
Polaski_Donald@bah.com
https://www.linkedin.com/in/dpolaski/
Michael Gasvoda
Lead Data Scientist
Booz Allen Hamilton
Gasvoda_Michael@bah.com
https://www.linkedin.com/in/michaelgasvoda/