SlideShare una empresa de Scribd logo
1 de 33
Notre Dame CCL Meeting October 11 2013
NIST Big Data Public Working Group NBD-PWG
Geoffrey Fox
Based on September 30, 2013 Presentations
at one day workshop at NIST
Leaders of activity
Wo Chang, NIST
Robert Marcus, ET-Strategies
Chaitanya Baru, UC San Diego
Note web site http://bigdatawg.nist.gov/ is shut down
Notre Dame CCL Meeting October 11 20139/29/13
NBD-PWG Charter
Launch Date: June 26, 2013; Public Meeting with interim deliverables:
September, 30, 2013; Edit and send out for comment Nov-Dec 2013
• The focus of the (NBD-PWG) is to form a community of interest
from industry, academia, and government, with the goal of
developing a consensus definitions, taxonomies, secure reference
architectures, and technology roadmap. The aim is to create
vendor-neutral, technology and infrastructure agnostic deliverables
to enable big data stakeholders to pick-and-choose best analytics
tools for their processing and visualization requirements on the
most suitable computing platforms and clusters while allowing
value-added from big data service providers and flow of data
between the stakeholders in a cohesive and secure manner.
• Note identify common/best practice; includes but not limited to
discussing standards (S in NIST)
2
Notre Dame CCL Meeting October 11 20139/29/13
NBD-PWG Subgroups & Co-Chairs
3
• Requirements and Use Cases SG
– Geoffrey Fox, Indiana U.; Joe Paiva, VA; Tsegereda Beyene, Cisco
• Definitions and Taxonomies SG
– Nancy Grady, SAIC; Natasha Balac, SDSC; Eugene Luster, R2AD
• Reference Architecture SG
– Orit Levin, Microsoft; James Ketner, AT&T; Don Krapohl, Augmented
Intelligence
• Security and Privacy SG
– Arnab Roy, CSA/Fujitsu Nancy Landreville, U. MD Akhil Manchanda, GE
• Technology Roadmap SG
– Carl Buffington, Vistronix; Dan McClary, Oracle; David Boyd, Data Tactic
Notre Dame CCL Meeting October 11 20139/29/13
Requirements and Use Case Subgroup
4
The focus is to form a community of interest from industry, academia, and
government, with the goal of developing a consensus list of Big Data
requirements across all stakeholders. This includes gathering and
understanding various use cases from diversified application domains.
Tasks
• Gather use case input from all stakeholders
• Derive Big Data requirements from each use case.
• Analyze/prioritize a list of challenging general requirements that may
delay or prevent adoption of Big Data deployment
• Work with Reference Architecture to validate requirements and reference
architecture
• Develop a set of general patterns capturing the “essence” of use cases
(to do)
Notre Dame CCL Meeting October 11 20139/29/13
Use Case Template
• 26 fields completed for 51
areas
• Government Operation: 4
• Commercial: 8
• Defense: 3
• Healthcare and Life Sciences:
10
• Deep Learning and Social
Media: 6
• The Ecosystem for Research:
4
• Astronomy and Physics: 5
• Earth, Environmental and
Polar Science: 10
• Energy: 1
Notre Dame CCL Meeting October 11 20139/29/13
51 Detailed Use Cases: Many TB’s to Many PB’s
• Government Operation: National Archives and Records Administration, Census Bureau
• Commercial: Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital
Materials, Cargo shipping (as in UPS)
• Defense: Sensors, Image surveillance, Situation Assessment
• Healthcare and Life Sciences: Medical records, Graph and Probabilistic analysis, Pathology,
Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity
• Deep Learning and Social Media: Driving Car, Geolocate images/cameras, Twitter, Crowd Sourcing,
Network Science, NIST benchmark datasets
• The Ecosystem for Research: Metadata, Collaboration, Language Translation, Light source experiments
• Astronomy and Physics: Sky Surveys compared to simulation, Large Hadron Collider at CERN, Belle
Accelerator II in Japan
• Earth, Environmental and Polar Science: Radar Scattering in Atmosphere, Earthquake, Ocean, Earth
Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation datasets,
Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to watersheds),
AmeriFlux and FLUXNET gas sensors
• Energy: Smart grid
• Next step involves matching extracted requirements and reference architecture
• Alternatively develop a set of general patterns capturing the “essence” of use cases
Notre Dame CCL Meeting October 11 20139/29/13
Some Trends
• Also from IEEE Big Data conference in Santa Clara
• Practitioners consider themselves Data Scientists
• Images are a major source of Big Data
– Radar
– Light Synchrotrons
– Phones
– Bioimaging
7
• Hadoop and HDFS dominant
• Business – main emphasis at
NIST – interested in analytics
and assume HDFS
• Academia also extremely
interested in data management
• Clouds v. Grids
Notre Dame CCL Meeting October 11 20139/29/13
Example Use Case I: Summary of Genomics
• Application: NIST/Genome in a Bottle Consortium integrates data from
multiple sequencing technologies and methods to develop highly
confident characterization of whole human genomes as reference
materials, and develop methods to use these Reference Materials to
assess performance of any genome sequencing run.
• Current Approach: The storage of ~40TB NFS at NIST is full; there are
also PBs of genomics data at NIH/NCBI. Use Open-source sequencing
bioinformatics software from academic groups (UNIX-based) on a 72 core
cluster at NIST supplemented by larger systems at collaborators.
• Futures: DNA sequencers can generate ~300GB compressed data/day
which volume has increased much faster than Moore’s Law. Future data
could include other ‘omics’ measurements, which will be even larger than
DNA sequencing. Clouds have been explored.
8
Notre Dame CCL Meeting October 11 20139/29/13
Example Use Case II: Census Bureau Statistical Survey
Response Improvement (Adaptive Design)
• Application: Survey costs are increasing as survey response declines. The goal of this work
is to use advanced “recommendation system techniques” that are open and scientifically
objective, using data mashed up from several sources and historical survey para-data
(administrative data about the survey) to drive operational processes in an effort to
increase quality and reduce the cost of field surveys.
• Current Approach: About a petabyte of data coming from surveys and other government
administrative sources. Data can be streamed with approximately 150 million records
transmitted as field data streamed continuously, during the decennial census. All data
must be both confidential and secure. All processes must be auditable for security and
confidentiality as required by various legal statutes. Data quality should be high and
statistically checked for accuracy and reliability throughout the collection process. Use
Hadoop, Spark, Hive, R, SAS, Mahout, Allegrograph, MySQL, Oracle, Storm, BigMemory,
Cassandra, Pig software.
• Futures: Analytics needs to be developed which give statistical estimations that provide
more detail, on a more near real time basis for less cost. The reliability of estimated
statistics from such “mashed up” sources still must be evaluated.
9
Notre Dame CCL Meeting October 11 20139/29/13
Example Use Case III: Large-scale Deep Learning
• Application: Large models (e.g., neural networks with more neurons and connections)
combined with large datasets are increasingly the top performers in benchmark tasks for
vision, speech, and Natural Language Processing. One needs to train a deep neural network
from a large (>>1TB) corpus of data (typically imagery, video, audio, or text). Such training
procedures often require customization of the neural network architecture, learning criteria,
and dataset pre-processing. In addition to the computational expense demanded by the
learning algorithms, the need for rapid prototyping and ease of development is extremely high.
• Current Approach: The largest applications so far are to image recognition and scientific studies
of unsupervised learning with 10 million images and up to 11 billion parameters on a 64 GPU
HPC Infiniband cluster. Both supervised (using existing classified images) and unsupervised
applications investigated.
• Futures: Large datasets of 100TB or more may be necessary in order to exploit the
representational power of the larger models. Training a self-driving car could take 100 million
images at megapixel resolution. Deep Learning shares many characteristics with the broader
field of machine learning. The paramount requirements are high computational throughput for
mostly dense linear algebra operations, and extremely high productivity for researcher
exploration. One needs integration of high performance libraries with high level (python)
prototyping environments.
10
Notre Dame CCL Meeting October 11 20139/29/13
Example Use Case IV:
EISCAT 3D incoherent scatter radar system
• Application: EISCAT, the European Incoherent Scatter Scientific Association, conducts research on the
lower, middle and upper atmosphere and ionosphere using the incoherent scatter radar technique.
This technique is the most powerful ground-based tool for these research applications. EISCAT studies
instabilities in the ionosphere, as well as investigating the structure and dynamics of the middle
atmosphere. It is also a diagnostic instrument in ionospheric modification experiments with addition
of a separate Heating facility. Currently EISCAT operates 3 of the 10 major incoherent radar scattering
instruments worldwide with its facilities in in the Scandinavian sector, north of the Arctic Circle.
• Current Approach: The current running old EISCAT radar generates terabytes per year rates and does
not present special challenges.
• Futures: The design of the next generation radar, EISCAT_3D, will consist of a core site with a
transmitting and receiving radar arrays and four sites with receiving antenna arrays at some 100 km
from the core. The fully operational 5-site system will generate several thousand times data of current
EISCAT system with 40 PB/year in 2022 and is expected to operate for 30 years. EISCAT 3D data e-
Infrastructure plans to use the high performance computers for central site data processing and high
throughput computers for mirror sites data processing. Downloading the full data is not time critical,
but operations require real-time information about certain pre-defined events to be sent from the
sites to the operation center and a real-time link from the operation center to the sites to set the
mode of radar operation on with immediate action.
11
Notre Dame CCL Meeting October 11 20139/29/13
Example Use Case V: Consumption forecasting in Smart Grids
• Application: Predict energy consumption for customers, transformers, sub-stations
and the electrical grid service area using smart meters providing measurements
every 15-mins at the granularity of individual consumers within the service area of
smart power utilities. Combine Head-end of smart meters (distributed), Utility
databases (Customer Information, Network topology; centralized), US Census data
(distributed), NOAA weather data (distributed), Micro-grid building information system
(centralized), Micro-grid sensor network (distributed). This generalizes to real-time
data-driven analytics for time series from cyber physical systems
• Current Approach: GIS based visualization. Data is around 4 TB a year for a city with
1.4M sensors in Los Angeles. Uses R/Matlab, Weka, Hadoop software. Significant
privacy issues requiring anonymization by aggregation. Combine real time and
historic data with machine learning for predicting consumption.
• Futures: Wide spread deployment of Smart Grids with new analytics integrating
diverse data and supporting curtailment requests. Mobile applications for client
interactions.
12
Notre Dame CCL Meeting October 11 20139/29/13 13
Part of Property Summary Table
Notre Dame CCL Meeting October 11 20139/29/13
Definitions and Taxonomies Subgroup
14
• The focus is to gain a better understanding of the principles of Big Data.
It is important to develop a consensus-based common language and
vocabulary terms used in Big Data across stakeholders from industry,
academia, and government. In addition, it is also critical to identify
essential actors with roles and responsibility, and subdivide them into
components and sub-components on how they interact/ relate with each
other according to their similarities and differences.
Tasks
• For Definitions: Compile terms used from all stakeholders regarding the
meaning of Big Data from various standard bodies, domain applications,
and diversified operational environments.
• For Taxonomies: Identify key actors with their roles and responsibilities
from all stakeholders, categorize them into components and
subcomponents based on their similarities and differences
Notre Dame CCL Meeting October 11 20139/29/13
Data Science Definition (Big Data less consensus)
• Data Science is the extraction of
actionable knowledge directly from
data through a process of discovery,
hypothesis, and analytical
hypothesis analysis.
• A Data Scientist is a practitioner who
has sufficient knowledge of the
overlapping regimes of expertise in
business needs, domain knowledge,
analytical skills and programming
expertise to manage the end-to-end
scientific method process through
each stage in the big data lifecycle.
15
Big Data refers to digital data volume,
velocity and/or variety whose
management requires scalability across
coupled horizontal resources
Notre Dame CCL Meeting October 11 20139/29/13
Data Science (MOOC) Degree at IU
16
• Program in Data Science hosted by School of
Informatics and Computing (37 faculty)
• Initial activity is online Masters offered as MOOC
• Still on approval cycle
• First 4 courses (Certificate) which hopefully will
be available next Semester
– Big Data Applications and Analytics (~complete
trial at https://x-informatics.appspot.com/preview)
– Clouds
– Information Visualization
– Life Sciences Informatics
Notre Dame CCL Meeting October 11 20139/29/13
Reference Architecture Subgroup
17
The focus is to form a community of interest from industry, academia, and
government, with the goal of developing a consensus-based approach to orchestrate
vendor-neutral, technology and infrastructure agnostic for analytics tools and
computing environments. The goal is to enable Big Data stakeholders to pick-and-
choose technology-agnostic analytics tools for processing and visualization in any
computing platform and cluster while allowing value-added from Big Data service
providers and the flow of the data between the stakeholders in a cohesive and
secure manner.
Tasks
• Gather and study available Big Data architectures representing various
stakeholders, different data types,’ use cases, and document the architectures
using the Big Data taxonomies model based upon the identified actors with their
roles and responsibilities.
• Ensure that the developed Big Data reference architecture and the Security and
Privacy Reference Architecture correspond and complement each other.
Notre Dame CCL Meeting October 11 20139/29/13
List Of Surveyed Architectures
• Vendor-neutral and technology-agnostic proposals
– Bob Marcus ET-Strategies
– Orit Levin Microsoft
– Gary Mazzaferro AlloyCloud
– Yuri Demchenko University of Amsterdam
• Vendors’ Architectures
– IBM
– Oracle
– Booz Allen Hamilton
– EMC
– SAP
– 9sight
– LexusNexis
18
Notre Dame CCL Meeting October 11 20139/29/13
Vendor-neutral and Technology-agnostic Proposals
19
Data Processing Flow
M0039
Data Transformation Flow
M0017
IT Stack
M0047
Notre Dame CCL Meeting October 11 20139/29/13
Vendor-neutral and Technology-agnostic Proposals
20
Data Processing Flow
M0039
Data Transformation Flow
M0017
IT Stack
M0047
Notre Dame CCL Meeting October 11 20139/29/13
Vendor-neutral and Technology-agnostic Proposals
21
Data Processing Flow
M0039
IT Stack
M0047
Data Transformation Flow
M0017
Notre Dame CCL Meeting October 11 20139/29/13
Vendor-neutral and Technology-agnostic Proposals
22
Data Transformation Flow
M0017
IT Stack
M0047
Data Processing Flow
M0039
Notre Dame CCL Meeting October 11 20139/29/13
Draft Agreement / Rough Consensus
• Transformation includes
– Processing functions
– Analytic functions
– Visualization functions
• Data Infrastructure includes
– Data stores
– In-memory DBs
– Analytic DBs
23
Sources
Transformation
Usage
DataInfrastructure
Security
Management
CloudComputing
Network
Notre Dame CCL Meeting October 11 20139/29/13
Main Functional Blocks
24
Big Data Application Provider
System Orchestrator
DataConsumer
DataProvider
Big Data Framework Provider
• Application Specific
• Identity Management & Authorization
• Etc.
• Discovery of data
• Description of data
• Access to data
• Code execution on data
• Etc.
• Discovery of services
• Description of data
• Visualization of data
• Rendering of data
• Reporting of data
• Code execution on data
• Etc.
• Analytic processing of data
• Machine learning
• Code execution on data et situ
• Storage, retrieval, search, etc. of data
• Providing computing infrastructure
• Providing networking infrastructure
• Etc.
Notre Dame CCL Meeting October 11 20139/29/13 25
Management
Security&Privacy
Big Data Application Provider
Visualization AccessAnalyticsCurationCollection
System Orchestrator
DATA
SW
DATA
SW
I N F O R M AT I O N VA L U E C H A I N
ITVALUECHAIN
DataConsumer
DataProvider
Horizontally Scalable (VM clusters)
Vertically Scalable
Horizontally Scalable
Vertically Scalable
Horizontally Scalable
Vertically Scalable
Big Data Framework Provider
Processing Frameworks (analytic tools, etc.)
Platforms (databases, etc.)
Infrastructures
Physical and Virtual Resources (networking, computing, etc.)
DATA
SW
K E Y :
DATA
SW
Service Use
Big Data
Information Flow
SW Tools and
Algorithms Transfer
Notre Dame CCL Meeting October 11 20139/29/13
Security and Privacy Subgroup
26
The focus is to form a community of interest from industry, academia, and
government, with the goal of developing a consensus secure reference
architecture to handle security and privacy issues across all stakeholders.
This includes gaining an understanding of what standards are available or
under development, as well as identifies which key organizations are
working on these standards.
Tasks
• Gather input from all stakeholders regarding security and privacy
concerns in Big Data processing, storage, and services.
• Analyze/prioritize a list of challenging security and privacy requirements
from ~10 special use cases that may delay or prevent adoption of Big
Data deployment
• Develop a Security and Privacy Reference Architecture that supplements
the general Big Data Reference Architecture
Notre Dame CCL Meeting October 11 20139/29/13 27
Public/Private/Hybrid Cloud
5, 7, 8, 9
1, 3, 5, 6, 7, 8, 9, 10
4, 8, 9
4, 10
10
2, 3, 5, 8, 9
Data Storage
1) Secure computations in distributed
programming frameworks
2) Security best practices for non-
relational datastores
3) Secure data storage and transactions
logs
4) End-point input validation/filtering
5) Real time security monitoring
6) Scalable and composable privacy-
preserving data mining and analytics
7) Cryptographically enforced access
control and secure communication
8) Granular access control
9) Granular audits
10) Data provenance
CSA (Cloud Security Alliance) BDWG: Top Ten Big
Data Security and Privacy Challenges
Notre Dame CCL Meeting October 11 20139/29/13
Top 10 S&P Challenges: Classification
28
Infrastructure
security
Secure
Computations in
Distributed
Programming
Frameworks
Security Best
Practices for Non-
Relational Data
Stores
Data Privacy
Privacy Preserving
Data Mining and
Analytics
Cryptographically
Enforced Data
Centric Security
Granular Access
Control
Data
Management
Secure Data
Storage and
Transaction Logs
Granular Audits
Data Provenance
Integrity and
Reactive
Security
End-point validation
and filtering
Real time Security
Monitoring
Notre Dame CCL Meeting October 11 20139/29/13
Use Cases
29
• Retail/Marketing
– Modern Day Consumerism
– Nielsen Homescan
– Web Traffic Analysis
• Healthcare
– Health Information Exchange
– Genetic Privacy
– Pharma Clinical Trial Data
Sharing
• Cyber-security
• Government
– Military
– Education
Notre Dame CCL Meeting October 11 20139/29/13
Big Data Security Reference Architecture
Notre Dame CCL Meeting October 11 20139/29/13
Technology Roadmap Subgroup
31
The focus is to form a community of interest from industry, academia, and
government, with the goal of developing a consensus vision with recommendations
on how Big Data should move forward by performing a good gap analysis through
the materials gathered from all other NBD subgroups. This includes setting
standardization and adoption priorities through an understanding of what standards
are available or under development as part of the recommendations.
Tasks
• Gather input from NBD subgroups and study the taxonomies for the actors’ roles
and responsibility, use cases and requirements, and secure reference
architecture.
• Gain understanding of what standards are available or under development for Big
Data
• Perform a thorough gap analysis and document the findings
• Identify what possible barriers may delay or prevent adoption of Big Data
• Document vision and recommendations
Notre Dame CCL Meeting October 11 20139/29/13
Some Identified Features
32
Technology Roadmap09.30.2013
Feature Roles Readiness Ref Architecture Mapping
Storage Framework TBD TBD Capabilities
Processing Framework TBD TBD Capabilities
Resource Managers Framework TBD TBD Capabilities
Infrastructure Framework TBD TBD Capabilities
Information Framework TBD TBD Data Services
Standards Integration
Framework
TBD TBD Data Services
Applications Framework TBD TBD Capabilities
Business Operations TBD TBD Vertical Orchestrator
Notre Dame CCL Meeting October 11 20139/29/13
Interaction Between Subgroups
33
Technology
Roadmap
Requirements
& Use Cases
Definitions &
Taxonomies

Reference
Architecture 


Security &
Privacy


Due to time constraints, activities were carried out in parallel.

Más contenido relacionado

La actualidad más candente

Human Networking: a University, High School & Industry Partnership
Human Networking: a University, High School & Industry PartnershipHuman Networking: a University, High School & Industry Partnership
Human Networking: a University, High School & Industry PartnershipKenneth Ronkowitz
 
Technology and Student Affairs
Technology and Student AffairsTechnology and Student Affairs
Technology and Student AffairsLeslie Dare
 
Web Mining Research Issues and Future Directions – A Survey
Web Mining Research Issues and Future Directions – A SurveyWeb Mining Research Issues and Future Directions – A Survey
Web Mining Research Issues and Future Directions – A SurveyIOSR Journals
 
Image retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsImage retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsunyil96
 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...MOVING Project
 
The Promise of Grid Computing Technologies for E-Learning Systems in Kenya
The Promise of Grid Computing Technologies for E-Learning Systems in KenyaThe Promise of Grid Computing Technologies for E-Learning Systems in Kenya
The Promise of Grid Computing Technologies for E-Learning Systems in KenyaUmma Khatuna Jannat
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data SciencePhilip Bourne
 
Resume It Industry Strath Address
Resume It Industry Strath AddressResume It Industry Strath Address
Resume It Industry Strath Addressdroussinov
 
Data Education project briefing for Royal Society
Data Education project briefing for Royal SocietyData Education project briefing for Royal Society
Data Education project briefing for Royal SocietyKate Farrell
 
Information technology research trends: The future vision
Information technology research trends: The future vision Information technology research trends: The future vision
Information technology research trends: The future vision Aboul Ella Hassanien
 
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...8th DisCo conference 2013
 
A Review of Big Data Analytics in Sector of Higher Education
A Review of Big Data Analytics in Sector of Higher EducationA Review of Big Data Analytics in Sector of Higher Education
A Review of Big Data Analytics in Sector of Higher EducationIJERA Editor
 
Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Micah Altman
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs UsageIRJET Journal
 
Development of Southern Luzon State University Digital Library of Theses and ...
Development of Southern Luzon State University Digital Library of Theses and ...Development of Southern Luzon State University Digital Library of Theses and ...
Development of Southern Luzon State University Digital Library of Theses and ...IRJET Journal
 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudMOVING Project
 
AIT Initiative in KM Feb07
AIT Initiative in KM Feb07AIT Initiative in KM Feb07
AIT Initiative in KM Feb07kmtkproject
 
Chapter123final
Chapter123finalChapter123final
Chapter123finalDelapisa18
 

La actualidad más candente (20)

Human Networking: a University, High School & Industry Partnership
Human Networking: a University, High School & Industry PartnershipHuman Networking: a University, High School & Industry Partnership
Human Networking: a University, High School & Industry Partnership
 
Technology and Student Affairs
Technology and Student AffairsTechnology and Student Affairs
Technology and Student Affairs
 
Web Mining Research Issues and Future Directions – A Survey
Web Mining Research Issues and Future Directions – A SurveyWeb Mining Research Issues and Future Directions – A Survey
Web Mining Research Issues and Future Directions – A Survey
 
Image retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsImage retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systems
 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
 
The Promise of Grid Computing Technologies for E-Learning Systems in Kenya
The Promise of Grid Computing Technologies for E-Learning Systems in KenyaThe Promise of Grid Computing Technologies for E-Learning Systems in Kenya
The Promise of Grid Computing Technologies for E-Learning Systems in Kenya
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
 
Resume It Industry Strath Address
Resume It Industry Strath AddressResume It Industry Strath Address
Resume It Industry Strath Address
 
Data Education project briefing for Royal Society
Data Education project briefing for Royal SocietyData Education project briefing for Royal Society
Data Education project briefing for Royal Society
 
Information technology research trends: The future vision
Information technology research trends: The future vision Information technology research trends: The future vision
Information technology research trends: The future vision
 
Ijciet 10 02_032
Ijciet 10 02_032Ijciet 10 02_032
Ijciet 10 02_032
 
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...
DisCo 2013: Rohliková and Vejvodová and Zounek - Modern Technology and Univer...
 
A Review of Big Data Analytics in Sector of Higher Education
A Review of Big Data Analytics in Sector of Higher EducationA Review of Big Data Analytics in Sector of Higher Education
A Review of Big Data Analytics in Sector of Higher Education
 
Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse
 
Teaching Service Science in the iSchool at the University of Toronto
Teaching Service Science in the iSchool at the University of TorontoTeaching Service Science in the iSchool at the University of Toronto
Teaching Service Science in the iSchool at the University of Toronto
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs Usage
 
Development of Southern Luzon State University Digital Library of Theses and ...
Development of Southern Luzon State University Digital Library of Theses and ...Development of Southern Luzon State University Digital Library of Theses and ...
Development of Southern Luzon State University Digital Library of Theses and ...
 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloud
 
AIT Initiative in KM Feb07
AIT Initiative in KM Feb07AIT Initiative in KM Feb07
AIT Initiative in KM Feb07
 
Chapter123final
Chapter123finalChapter123final
Chapter123final
 

Destacado

Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
NIST SP 800 1500-7 Working group on Big Data -- Interoperability Standards
NIST SP 800 1500-7 Working group on Big Data -- Interoperability StandardsNIST SP 800 1500-7 Working group on Big Data -- Interoperability Standards
NIST SP 800 1500-7 Working group on Big Data -- Interoperability StandardsDavid Sweigert
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC ConvergenceGeoffrey Fox
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
Arnab-resume-new
Arnab-resume-newArnab-resume-new
Arnab-resume-newArnab Roy
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 

Destacado (9)

Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
NIST SP 800 1500-7 Working group on Big Data -- Interoperability Standards
NIST SP 800 1500-7 Working group on Big Data -- Interoperability StandardsNIST SP 800 1500-7 Working group on Big Data -- Interoperability Standards
NIST SP 800 1500-7 Working group on Big Data -- Interoperability Standards
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC Convergence
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
Arnab-resume-new
Arnab-resume-newArnab-resume-new
Arnab-resume-new
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 

Similar a NIST Big Data Public Working Group NBD-PWG

big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.pptvishal choudhary
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
Australia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityAustralia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityTERN Australia
 
EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube
 
Kerry Taylor - Semantics & sensors
Kerry Taylor - Semantics & sensorsKerry Taylor - Semantics & sensors
Kerry Taylor - Semantics & sensorsWeb Directions
 
Azure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big dataAzure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big dataMicrosoft Technet France
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereAlex Hardisty
 
Big data: Challenges, Practices and Technologies
Big data: Challenges, Practices and TechnologiesBig data: Challenges, Practices and Technologies
Big data: Challenges, Practices and TechnologiesNavneet Randhawa
 
Cloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationCloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationAlan Sill
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Lizzy_Rolando
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Panel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An OverviewPanel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An OverviewLarry Smarr
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research PlatformLarry Smarr
 

Similar a NIST Big Data Public Working Group NBD-PWG (20)

big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.ppt
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Australia's Environmental Predictive Capability
Australia's Environmental Predictive CapabilityAustralia's Environmental Predictive Capability
Australia's Environmental Predictive Capability
 
EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013
 
Big Data
Big Data Big Data
Big Data
 
Kerry Taylor - Semantics & sensors
Kerry Taylor - Semantics & sensorsKerry Taylor - Semantics & sensors
Kerry Taylor - Semantics & sensors
 
Azure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big dataAzure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big data
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphere
 
Big data: Challenges, Practices and Technologies
Big data: Challenges, Practices and TechnologiesBig data: Challenges, Practices and Technologies
Big data: Challenges, Practices and Technologies
 
Cloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationCloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and Innovation
 
DGterzo
DGterzoDGterzo
DGterzo
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Panel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An OverviewPanel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An Overview
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 

Más de Geoffrey Fox

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...Geoffrey Fox
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Geoffrey Fox
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Geoffrey Fox
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming DataGeoffrey Fox
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Geoffrey Fox
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...Geoffrey Fox
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyGeoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationGeoffrey Fox
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Geoffrey Fox
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC Geoffrey Fox
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a ServiceGeoffrey Fox
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Geoffrey Fox
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data StackGeoffrey Fox
 
Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore Geoffrey Fox
 
CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2Geoffrey Fox
 

Más de Geoffrey Fox (20)

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC Technology
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and Education
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
Remarks on MOOC's
Remarks on MOOC'sRemarks on MOOC's
Remarks on MOOC's
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 
Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore
 
CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2
 

Último

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 

Último (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

NIST Big Data Public Working Group NBD-PWG

  • 1. Notre Dame CCL Meeting October 11 2013 NIST Big Data Public Working Group NBD-PWG Geoffrey Fox Based on September 30, 2013 Presentations at one day workshop at NIST Leaders of activity Wo Chang, NIST Robert Marcus, ET-Strategies Chaitanya Baru, UC San Diego Note web site http://bigdatawg.nist.gov/ is shut down
  • 2. Notre Dame CCL Meeting October 11 20139/29/13 NBD-PWG Charter Launch Date: June 26, 2013; Public Meeting with interim deliverables: September, 30, 2013; Edit and send out for comment Nov-Dec 2013 • The focus of the (NBD-PWG) is to form a community of interest from industry, academia, and government, with the goal of developing a consensus definitions, taxonomies, secure reference architectures, and technology roadmap. The aim is to create vendor-neutral, technology and infrastructure agnostic deliverables to enable big data stakeholders to pick-and-choose best analytics tools for their processing and visualization requirements on the most suitable computing platforms and clusters while allowing value-added from big data service providers and flow of data between the stakeholders in a cohesive and secure manner. • Note identify common/best practice; includes but not limited to discussing standards (S in NIST) 2
  • 3. Notre Dame CCL Meeting October 11 20139/29/13 NBD-PWG Subgroups & Co-Chairs 3 • Requirements and Use Cases SG – Geoffrey Fox, Indiana U.; Joe Paiva, VA; Tsegereda Beyene, Cisco • Definitions and Taxonomies SG – Nancy Grady, SAIC; Natasha Balac, SDSC; Eugene Luster, R2AD • Reference Architecture SG – Orit Levin, Microsoft; James Ketner, AT&T; Don Krapohl, Augmented Intelligence • Security and Privacy SG – Arnab Roy, CSA/Fujitsu Nancy Landreville, U. MD Akhil Manchanda, GE • Technology Roadmap SG – Carl Buffington, Vistronix; Dan McClary, Oracle; David Boyd, Data Tactic
  • 4. Notre Dame CCL Meeting October 11 20139/29/13 Requirements and Use Case Subgroup 4 The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains. Tasks • Gather use case input from all stakeholders • Derive Big Data requirements from each use case. • Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment • Work with Reference Architecture to validate requirements and reference architecture • Develop a set of general patterns capturing the “essence” of use cases (to do)
  • 5. Notre Dame CCL Meeting October 11 20139/29/13 Use Case Template • 26 fields completed for 51 areas • Government Operation: 4 • Commercial: 8 • Defense: 3 • Healthcare and Life Sciences: 10 • Deep Learning and Social Media: 6 • The Ecosystem for Research: 4 • Astronomy and Physics: 5 • Earth, Environmental and Polar Science: 10 • Energy: 1
  • 6. Notre Dame CCL Meeting October 11 20139/29/13 51 Detailed Use Cases: Many TB’s to Many PB’s • Government Operation: National Archives and Records Administration, Census Bureau • Commercial: Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo shipping (as in UPS) • Defense: Sensors, Image surveillance, Situation Assessment • Healthcare and Life Sciences: Medical records, Graph and Probabilistic analysis, Pathology, Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity • Deep Learning and Social Media: Driving Car, Geolocate images/cameras, Twitter, Crowd Sourcing, Network Science, NIST benchmark datasets • The Ecosystem for Research: Metadata, Collaboration, Language Translation, Light source experiments • Astronomy and Physics: Sky Surveys compared to simulation, Large Hadron Collider at CERN, Belle Accelerator II in Japan • Earth, Environmental and Polar Science: Radar Scattering in Atmosphere, Earthquake, Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to watersheds), AmeriFlux and FLUXNET gas sensors • Energy: Smart grid • Next step involves matching extracted requirements and reference architecture • Alternatively develop a set of general patterns capturing the “essence” of use cases
  • 7. Notre Dame CCL Meeting October 11 20139/29/13 Some Trends • Also from IEEE Big Data conference in Santa Clara • Practitioners consider themselves Data Scientists • Images are a major source of Big Data – Radar – Light Synchrotrons – Phones – Bioimaging 7 • Hadoop and HDFS dominant • Business – main emphasis at NIST – interested in analytics and assume HDFS • Academia also extremely interested in data management • Clouds v. Grids
  • 8. Notre Dame CCL Meeting October 11 20139/29/13 Example Use Case I: Summary of Genomics • Application: NIST/Genome in a Bottle Consortium integrates data from multiple sequencing technologies and methods to develop highly confident characterization of whole human genomes as reference materials, and develop methods to use these Reference Materials to assess performance of any genome sequencing run. • Current Approach: The storage of ~40TB NFS at NIST is full; there are also PBs of genomics data at NIH/NCBI. Use Open-source sequencing bioinformatics software from academic groups (UNIX-based) on a 72 core cluster at NIST supplemented by larger systems at collaborators. • Futures: DNA sequencers can generate ~300GB compressed data/day which volume has increased much faster than Moore’s Law. Future data could include other ‘omics’ measurements, which will be even larger than DNA sequencing. Clouds have been explored. 8
  • 9. Notre Dame CCL Meeting October 11 20139/29/13 Example Use Case II: Census Bureau Statistical Survey Response Improvement (Adaptive Design) • Application: Survey costs are increasing as survey response declines. The goal of this work is to use advanced “recommendation system techniques” that are open and scientifically objective, using data mashed up from several sources and historical survey para-data (administrative data about the survey) to drive operational processes in an effort to increase quality and reduce the cost of field surveys. • Current Approach: About a petabyte of data coming from surveys and other government administrative sources. Data can be streamed with approximately 150 million records transmitted as field data streamed continuously, during the decennial census. All data must be both confidential and secure. All processes must be auditable for security and confidentiality as required by various legal statutes. Data quality should be high and statistically checked for accuracy and reliability throughout the collection process. Use Hadoop, Spark, Hive, R, SAS, Mahout, Allegrograph, MySQL, Oracle, Storm, BigMemory, Cassandra, Pig software. • Futures: Analytics needs to be developed which give statistical estimations that provide more detail, on a more near real time basis for less cost. The reliability of estimated statistics from such “mashed up” sources still must be evaluated. 9
  • 10. Notre Dame CCL Meeting October 11 20139/29/13 Example Use Case III: Large-scale Deep Learning • Application: Large models (e.g., neural networks with more neurons and connections) combined with large datasets are increasingly the top performers in benchmark tasks for vision, speech, and Natural Language Processing. One needs to train a deep neural network from a large (>>1TB) corpus of data (typically imagery, video, audio, or text). Such training procedures often require customization of the neural network architecture, learning criteria, and dataset pre-processing. In addition to the computational expense demanded by the learning algorithms, the need for rapid prototyping and ease of development is extremely high. • Current Approach: The largest applications so far are to image recognition and scientific studies of unsupervised learning with 10 million images and up to 11 billion parameters on a 64 GPU HPC Infiniband cluster. Both supervised (using existing classified images) and unsupervised applications investigated. • Futures: Large datasets of 100TB or more may be necessary in order to exploit the representational power of the larger models. Training a self-driving car could take 100 million images at megapixel resolution. Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration. One needs integration of high performance libraries with high level (python) prototyping environments. 10
  • 11. Notre Dame CCL Meeting October 11 20139/29/13 Example Use Case IV: EISCAT 3D incoherent scatter radar system • Application: EISCAT, the European Incoherent Scatter Scientific Association, conducts research on the lower, middle and upper atmosphere and ionosphere using the incoherent scatter radar technique. This technique is the most powerful ground-based tool for these research applications. EISCAT studies instabilities in the ionosphere, as well as investigating the structure and dynamics of the middle atmosphere. It is also a diagnostic instrument in ionospheric modification experiments with addition of a separate Heating facility. Currently EISCAT operates 3 of the 10 major incoherent radar scattering instruments worldwide with its facilities in in the Scandinavian sector, north of the Arctic Circle. • Current Approach: The current running old EISCAT radar generates terabytes per year rates and does not present special challenges. • Futures: The design of the next generation radar, EISCAT_3D, will consist of a core site with a transmitting and receiving radar arrays and four sites with receiving antenna arrays at some 100 km from the core. The fully operational 5-site system will generate several thousand times data of current EISCAT system with 40 PB/year in 2022 and is expected to operate for 30 years. EISCAT 3D data e- Infrastructure plans to use the high performance computers for central site data processing and high throughput computers for mirror sites data processing. Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center and a real-time link from the operation center to the sites to set the mode of radar operation on with immediate action. 11
  • 12. Notre Dame CCL Meeting October 11 20139/29/13 Example Use Case V: Consumption forecasting in Smart Grids • Application: Predict energy consumption for customers, transformers, sub-stations and the electrical grid service area using smart meters providing measurements every 15-mins at the granularity of individual consumers within the service area of smart power utilities. Combine Head-end of smart meters (distributed), Utility databases (Customer Information, Network topology; centralized), US Census data (distributed), NOAA weather data (distributed), Micro-grid building information system (centralized), Micro-grid sensor network (distributed). This generalizes to real-time data-driven analytics for time series from cyber physical systems • Current Approach: GIS based visualization. Data is around 4 TB a year for a city with 1.4M sensors in Los Angeles. Uses R/Matlab, Weka, Hadoop software. Significant privacy issues requiring anonymization by aggregation. Combine real time and historic data with machine learning for predicting consumption. • Futures: Wide spread deployment of Smart Grids with new analytics integrating diverse data and supporting curtailment requests. Mobile applications for client interactions. 12
  • 13. Notre Dame CCL Meeting October 11 20139/29/13 13 Part of Property Summary Table
  • 14. Notre Dame CCL Meeting October 11 20139/29/13 Definitions and Taxonomies Subgroup 14 • The focus is to gain a better understanding of the principles of Big Data. It is important to develop a consensus-based common language and vocabulary terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is also critical to identify essential actors with roles and responsibility, and subdivide them into components and sub-components on how they interact/ relate with each other according to their similarities and differences. Tasks • For Definitions: Compile terms used from all stakeholders regarding the meaning of Big Data from various standard bodies, domain applications, and diversified operational environments. • For Taxonomies: Identify key actors with their roles and responsibilities from all stakeholders, categorize them into components and subcomponents based on their similarities and differences
  • 15. Notre Dame CCL Meeting October 11 20139/29/13 Data Science Definition (Big Data less consensus) • Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and analytical hypothesis analysis. • A Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle. 15 Big Data refers to digital data volume, velocity and/or variety whose management requires scalability across coupled horizontal resources
  • 16. Notre Dame CCL Meeting October 11 20139/29/13 Data Science (MOOC) Degree at IU 16 • Program in Data Science hosted by School of Informatics and Computing (37 faculty) • Initial activity is online Masters offered as MOOC • Still on approval cycle • First 4 courses (Certificate) which hopefully will be available next Semester – Big Data Applications and Analytics (~complete trial at https://x-informatics.appspot.com/preview) – Clouds – Information Visualization – Life Sciences Informatics
  • 17. Notre Dame CCL Meeting October 11 20139/29/13 Reference Architecture Subgroup 17 The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus-based approach to orchestrate vendor-neutral, technology and infrastructure agnostic for analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick-and- choose technology-agnostic analytics tools for processing and visualization in any computing platform and cluster while allowing value-added from Big Data service providers and the flow of the data between the stakeholders in a cohesive and secure manner. Tasks • Gather and study available Big Data architectures representing various stakeholders, different data types,’ use cases, and document the architectures using the Big Data taxonomies model based upon the identified actors with their roles and responsibilities. • Ensure that the developed Big Data reference architecture and the Security and Privacy Reference Architecture correspond and complement each other.
  • 18. Notre Dame CCL Meeting October 11 20139/29/13 List Of Surveyed Architectures • Vendor-neutral and technology-agnostic proposals – Bob Marcus ET-Strategies – Orit Levin Microsoft – Gary Mazzaferro AlloyCloud – Yuri Demchenko University of Amsterdam • Vendors’ Architectures – IBM – Oracle – Booz Allen Hamilton – EMC – SAP – 9sight – LexusNexis 18
  • 19. Notre Dame CCL Meeting October 11 20139/29/13 Vendor-neutral and Technology-agnostic Proposals 19 Data Processing Flow M0039 Data Transformation Flow M0017 IT Stack M0047
  • 20. Notre Dame CCL Meeting October 11 20139/29/13 Vendor-neutral and Technology-agnostic Proposals 20 Data Processing Flow M0039 Data Transformation Flow M0017 IT Stack M0047
  • 21. Notre Dame CCL Meeting October 11 20139/29/13 Vendor-neutral and Technology-agnostic Proposals 21 Data Processing Flow M0039 IT Stack M0047 Data Transformation Flow M0017
  • 22. Notre Dame CCL Meeting October 11 20139/29/13 Vendor-neutral and Technology-agnostic Proposals 22 Data Transformation Flow M0017 IT Stack M0047 Data Processing Flow M0039
  • 23. Notre Dame CCL Meeting October 11 20139/29/13 Draft Agreement / Rough Consensus • Transformation includes – Processing functions – Analytic functions – Visualization functions • Data Infrastructure includes – Data stores – In-memory DBs – Analytic DBs 23 Sources Transformation Usage DataInfrastructure Security Management CloudComputing Network
  • 24. Notre Dame CCL Meeting October 11 20139/29/13 Main Functional Blocks 24 Big Data Application Provider System Orchestrator DataConsumer DataProvider Big Data Framework Provider • Application Specific • Identity Management & Authorization • Etc. • Discovery of data • Description of data • Access to data • Code execution on data • Etc. • Discovery of services • Description of data • Visualization of data • Rendering of data • Reporting of data • Code execution on data • Etc. • Analytic processing of data • Machine learning • Code execution on data et situ • Storage, retrieval, search, etc. of data • Providing computing infrastructure • Providing networking infrastructure • Etc.
  • 25. Notre Dame CCL Meeting October 11 20139/29/13 25 Management Security&Privacy Big Data Application Provider Visualization AccessAnalyticsCurationCollection System Orchestrator DATA SW DATA SW I N F O R M AT I O N VA L U E C H A I N ITVALUECHAIN DataConsumer DataProvider Horizontally Scalable (VM clusters) Vertically Scalable Horizontally Scalable Vertically Scalable Horizontally Scalable Vertically Scalable Big Data Framework Provider Processing Frameworks (analytic tools, etc.) Platforms (databases, etc.) Infrastructures Physical and Virtual Resources (networking, computing, etc.) DATA SW K E Y : DATA SW Service Use Big Data Information Flow SW Tools and Algorithms Transfer
  • 26. Notre Dame CCL Meeting October 11 20139/29/13 Security and Privacy Subgroup 26 The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifies which key organizations are working on these standards. Tasks • Gather input from all stakeholders regarding security and privacy concerns in Big Data processing, storage, and services. • Analyze/prioritize a list of challenging security and privacy requirements from ~10 special use cases that may delay or prevent adoption of Big Data deployment • Develop a Security and Privacy Reference Architecture that supplements the general Big Data Reference Architecture
  • 27. Notre Dame CCL Meeting October 11 20139/29/13 27 Public/Private/Hybrid Cloud 5, 7, 8, 9 1, 3, 5, 6, 7, 8, 9, 10 4, 8, 9 4, 10 10 2, 3, 5, 8, 9 Data Storage 1) Secure computations in distributed programming frameworks 2) Security best practices for non- relational datastores 3) Secure data storage and transactions logs 4) End-point input validation/filtering 5) Real time security monitoring 6) Scalable and composable privacy- preserving data mining and analytics 7) Cryptographically enforced access control and secure communication 8) Granular access control 9) Granular audits 10) Data provenance CSA (Cloud Security Alliance) BDWG: Top Ten Big Data Security and Privacy Challenges
  • 28. Notre Dame CCL Meeting October 11 20139/29/13 Top 10 S&P Challenges: Classification 28 Infrastructure security Secure Computations in Distributed Programming Frameworks Security Best Practices for Non- Relational Data Stores Data Privacy Privacy Preserving Data Mining and Analytics Cryptographically Enforced Data Centric Security Granular Access Control Data Management Secure Data Storage and Transaction Logs Granular Audits Data Provenance Integrity and Reactive Security End-point validation and filtering Real time Security Monitoring
  • 29. Notre Dame CCL Meeting October 11 20139/29/13 Use Cases 29 • Retail/Marketing – Modern Day Consumerism – Nielsen Homescan – Web Traffic Analysis • Healthcare – Health Information Exchange – Genetic Privacy – Pharma Clinical Trial Data Sharing • Cyber-security • Government – Military – Education
  • 30. Notre Dame CCL Meeting October 11 20139/29/13 Big Data Security Reference Architecture
  • 31. Notre Dame CCL Meeting October 11 20139/29/13 Technology Roadmap Subgroup 31 The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward by performing a good gap analysis through the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities through an understanding of what standards are available or under development as part of the recommendations. Tasks • Gather input from NBD subgroups and study the taxonomies for the actors’ roles and responsibility, use cases and requirements, and secure reference architecture. • Gain understanding of what standards are available or under development for Big Data • Perform a thorough gap analysis and document the findings • Identify what possible barriers may delay or prevent adoption of Big Data • Document vision and recommendations
  • 32. Notre Dame CCL Meeting October 11 20139/29/13 Some Identified Features 32 Technology Roadmap09.30.2013 Feature Roles Readiness Ref Architecture Mapping Storage Framework TBD TBD Capabilities Processing Framework TBD TBD Capabilities Resource Managers Framework TBD TBD Capabilities Infrastructure Framework TBD TBD Capabilities Information Framework TBD TBD Data Services Standards Integration Framework TBD TBD Data Services Applications Framework TBD TBD Capabilities Business Operations TBD TBD Vertical Orchestrator
  • 33. Notre Dame CCL Meeting October 11 20139/29/13 Interaction Between Subgroups 33 Technology Roadmap Requirements & Use Cases Definitions & Taxonomies  Reference Architecture    Security & Privacy   Due to time constraints, activities were carried out in parallel.