Maurice Bouwhuis (SARA/Vancis) - How to understand big data by visualizing it

Big Data: Visualization

Dr. Maurice Bouwhuis
SARA National High Performance Computing Services

Big Data Analytics - Almere, 14-06-2012

About SARA: Our Mission Is to Support Innovation
The SARA national HPC center has about 170 FTEs in 2 locations (Amsterdam and Almere).
Offices in Amsterdam
The mission of SARA is two-fold:
1. Supporting research in the Netherlands
[SARA BV for Science & Innovation]

2. Offering commercial high-end ICT services
[Vancis BV for adVANCed Ict Services]
Data Center in Amsterdam

SARA works closely with SURF

Data Center in Almere

Almere Big Data Capital


Big Data: The Deluge


What is Big Data?
“I cannot define it, but I know it when I see it”
My Big Bear now

My Big Bear then


Wikipedia: Defining Big Data Beyond Commonly Used ICT

“Data sets whose size is beyond the ability of commonly used tools to capture, manage, and process the data within a tolerable elapsed time.”

Big Data as defined by IDC (2011)
“Bringing together vast amounts of data from public and private sources, combined with the intuition of business and thought leaders and the speed and affordability of today's computers.”
(IDC, October 2011)


Defining Big Data: The 3 V’s
Volume
- Large amounts
- Massive historical archives
- Valuable for data mining
Velocity
- Arriving at very high rates (sensors, streams, social media, …)
- Valuable in its “fresh” state
Variety
- Structured, semi-structured and unstructured
- Variety also in value


2 new V’s: Viscosity & Virality


Big Data Drivers
- “Internet of Things”
- Commoditization of HPC
- Human dynamics can be easily stored and queried with Apache Hadoop:
  - HDFS (storage): Hadoop Distributed File System
  - MapReduce (processing): high-performance parallel data processing
  - Scalable & self-healing
So, big data is driven by large-scale data collection, storage and (information) processing.
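The map, shuffle and reduce steps behind MapReduce can be illustrated with a single-process word count. This is a minimal sketch in plain Python, assuming lines of text as input records; it does not use Hadoop's own (Java) API, and the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical. On a Hadoop cluster the same three steps run in parallel on many nodes, with the input blocks read from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data deluge"]))
    # {'big': 2, 'data': 2, 'insight': 1, 'deluge': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be spread over many machines; only the shuffle needs to move data between them.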


Data Deluge @SARA
It has always been there…
Scientific data deluge:
- Observations (e.g. LOFAR, LifeWatch)
- Large-scale simulation (e.g. astrophysics, climate modeling)
- Experiments (e.g. Large Hadron Collider, DNA sequencers)

- Multi-petabytes of data growth at SARA each year
- Single datasets of 10-100 terabytes and larger
- Multidisciplinary use of data
- Science needs insight, not only data

Logos: Low Frequency Array (LOFAR); e-Science and Technology Infrastructure for Biodiversity Data and Ecosystem Research (LifeWatch); Biobanking and Biomolecular Resources Research Infrastructure


Big Data Ultimate Challenge: How to Get Insight?
As the volume, variety and velocity of data increase, visualization becomes imperative for gaining insight in an increasingly data-driven future.


Some Applications of Big Data
Astronomy
Astron: 1 exabyte of raw data per day, 2 times the WWW traffic per day
SKA: 300-1500 petabytes of storage per year! 20 times the LHC
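To put the astronomy figures in perspective, the daily and yearly volumes can be converted into sustained data rates. A small sketch, assuming decimal SI prefixes (1 EB = 10^18 bytes, 1 PB = 10^15 bytes) and a 365-day year; the helper name `rate_bytes_per_s` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a 365-day year

# Decimal SI prefixes (assumption: the slide uses SI, not binary, units)
TB = 10**12
PB = 10**15
EB = 10**18


def rate_bytes_per_s(total_bytes: float, seconds: float) -> float:
    """Average rate needed to handle `total_bytes` within `seconds`."""
    return total_bytes / seconds


astron = rate_bytes_per_s(1 * EB, SECONDS_PER_DAY)        # 1 EB raw data per day
ska_low = rate_bytes_per_s(300 * PB, SECONDS_PER_YEAR)    # 300 PB stored per year
ska_high = rate_bytes_per_s(1500 * PB, SECONDS_PER_YEAR)  # 1500 PB stored per year

print(f"Astron raw data: {astron / TB:.1f} TB/s")                         # 11.6 TB/s
print(f"SKA storage: {ska_low / 10**9:.1f}-{ska_high / 10**9:.1f} GB/s")  # 9.5-47.6 GB/s
```

Even the lower SKA estimate means writing close to 10 GB to storage every second, around the clock.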


Healthcare
Erasmus MC
Diagnostics: 4% of costs, 72% of decisions

Opportunities for disease management:
1) New classification of patients for better diagnostics & combined therapy
2) Assessing and managing risks


Water Management
Total length of primary flood defenses in the Netherlands: 2875 km, spread over 90 dike “rings”…
Decision Support System: integration of sensor data + AI, simulation results, maps, weather, ships, roadwork, traffic, Twitter, GSM, location of emergency services, ...


Infrawatch, Hollandse Brug

145 sensors x 100 Hz x 60 seconds x 60 minutes x 24 hours x 365 days = big data
(Arno Knobbe, LIACS, 2011, http://infrawatch.liacs.nl)
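The product on the slide can be evaluated directly. A minimal sketch; the 4-byte sample size used for the storage estimate is an assumption (e.g. one 32-bit value per reading), not a figure from the Infrawatch project.

```python
SENSORS = 145                          # sensors on the Hollandse Brug
SAMPLE_RATE_HZ = 100                   # readings per second per sensor
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 365-day year

BYTES_PER_SAMPLE = 4                   # assumption: one 32-bit value per reading

samples_per_year = SENSORS * SAMPLE_RATE_HZ * SECONDS_PER_YEAR
bytes_per_year = samples_per_year * BYTES_PER_SAMPLE

print(f"{samples_per_year:,} samples per year")      # 457,272,000,000 samples per year
print(f"{bytes_per_year / 10**12:.2f} TB per year")  # 1.83 TB per year
```

That is 14,500 readings every second, without interruption, from a single bridge.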


Ecology
- Citizen science: >20,000 users, >50M observations.
- Bird radars: streaming data, many terabytes.
- GPS tracking: streaming data, world-wide projects.
Massive amounts of complementary, multi-scale information that cannot be “seen” in the field.


eScience is also (big) data mining

- Cognition: image analysis and data exchange
- Climate Research: Regional Sea-Level
- Food Specific Ontologies for Food Focused Text Mining
- Chemical Metabolomics Data Analysis
- Biography Portal: interconnections, trends, geographical maps and time lines
- Data-Intensive Modeling of the Global Water Cycle
(by SURF & NWO)

CosmoGrid Case: The Need for Integrated e-Infrastructure Services
- A cosmological N-body simulation with 8,589,934,592 particles, modeling the formation of large structures of dark matter
- Dutch Computing Challenge Project & DEISA Extreme Computing Initiative: DCCP 2008-2009 / DECI 2009
- Runs 1 + 2: 4.25 M core hours of computing, 110 TB of data
- Huygens Amsterdam + Cray XT4 Tokyo, coupled via light path; and Amsterdam + Tokyo + Helsinki + Edinburgh
- High-resolution remote visualization of the data on a tiled panel display
- Advanced support in porting and optimization, visualization, data storage, networking and project management
All infrastructure elements and their integration are crucial.
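Two of the CosmoGrid figures can be sanity-checked with simple arithmetic: the particle count is exactly 2048^3 (= 2^33), and 4.25 million core hours correspond to roughly 485 core-years. Reading the particle count as a 2048 x 2048 x 2048 grid is an interpretation, not something stated on the slide.

```python
PARTICLES = 8_589_934_592
CORE_HOURS = 4.25e6
HOURS_PER_YEAR = 24 * 365  # 365-day year

# Float cube root, rounded, then verified exactly with integer arithmetic.
side = round(PARTICLES ** (1 / 3))
is_cube = side ** 3 == PARTICLES

core_years = CORE_HOURS / HOURS_PER_YEAR

print(side, is_cube, PARTICLES == 2**33)  # 2048 True True
print(f"{core_years:.0f} core-years")     # 485 core-years
```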

Visit SURF 7-6-2012

Visualization @ SARA: more than 20 years of experience and support
- Scientific visualization support
- Scientific & industrial visualization support
- High-resolution visualization support
- Rendering
- Virtual Reality
- Animations and slides
- Remote visualization & streaming service
- Collaboration support


How Are We Coping with Big Data?
HPC centers, universities and, in recent years, Internet companies like Yahoo!, Facebook and, of course, Google are pioneers (with lots of knowledge exchange, by the way).
We collect big data, store it, and have the knowledge to interpret it.
What tools do we have to pull this out?


‘Collaboratorium’: New visualization and collaboration facility @SARA
- Visualization of big shared data and trends
- Also for improving business and science models and for computational debugging
- PowerPoint, video conferencing, telepresence, 3D (stereo) projection
- Based on proven technology from SARA and partners EVL and Calit2 (San Diego)
[Diagram: display wall showing videoconferences, laptops 1-3, a website in a browser, a remote camera, and data from workstations 1 and 2]


Visualization @ SARA: Remote Visualization
Remote visualization service:
- Provide dedicated visualization resources in the SARA data center: render cluster and visualization software (e.g. ParaView, VisIt, VTK, VMD, Blender, ...)
- Embedded in the national e-Infrastructure
- Visualization resources have direct access to storage at SARA
- Avoid large data transfers over the network (esp. the Internet) by running visualization applications remotely
- Pixel output/remote desktop is transferred to the user instead of files
- Application support for parallel rendering
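The argument for shipping pixels instead of files can be made concrete with a back-of-the-envelope comparison. A sketch under stated assumptions: a fully utilised 10 Gbit/s network path, a single 1920 x 1080 display at 30 frames per second with uncompressed 24-bit RGB frames, and the 110 TB produced by the CosmoGrid runs as the example dataset. Remote-visualization tools normally compress the frame stream, so the pixel figure is an upper bound.

```python
DATASET_BYTES = 110 * 10**12    # 110 TB, the CosmoGrid output volume
LINK_BITS_PER_S = 10 * 10**9    # assumption: 10 Gbit/s path, fully utilised

FRAME_W, FRAME_H = 1920, 1080   # assumption: one full-HD display
BYTES_PER_PIXEL = 3             # uncompressed 24-bit RGB
FPS = 30                        # assumption: interactive frame rate

# Option 1: copy the whole dataset to the user's site first
copy_hours = DATASET_BYTES * 8 / LINK_BITS_PER_S / 3600

# Option 2: render next to the storage and stream only the frames
stream_bits_per_s = FRAME_W * FRAME_H * BYTES_PER_PIXEL * 8 * FPS

print(f"Copy dataset: {copy_hours:.1f} h")                      # 24.4 h
print(f"Frame stream: {stream_bits_per_s / 10**9:.2f} Gbit/s")  # 1.49 Gbit/s
```

The frame stream depends only on screen size and frame rate, so it stays the same whether the dataset is 1 TB or 1 PB, while the copy time grows linearly with the dataset.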


Big Data Requires Big Computing

What benefits could exascale computing bring?
It will enable discovery in many areas of science: "Aerospace engineering, astrophysics, biology, climate modeling and national security all have applications with extreme computing requirements."


Compute Ecosystem @SARA
1. Low-latency, high-bandwidth capability computing (Huygens)
2. Capacity compute clusters (LISA)
3. Loosely coupled compute grids (Big Grid)
4. Sector, private and public clouds (including our HPC Cloud) and Beehub storage
5. Special-purpose (GPU) clusters
6. Big Data Apache Hadoop systems (since 2009)


Big Data Ecosystem @SARA
[Diagram: DevOps; programming algorithms; domain knowledge]


To Summarize: Big Data Is Rapidly Changing Our Lives
Big data is changing science, medicine, business, and technology.
A whole new way of doing science: correlation supersedes causation, coherent models or unified theories…
The biggest challenge for science & business is not storing or processing data but making sense of it without affecting our privacy.

Big Data… Big Enough?
Thank You

