AI-Driven Science and Engineering with
the Global AI and Modeling
Supercomputer GAIMSC
Workshop on Clusters, Clouds, and Data for Scientific Computing CCDSC 2018
Châteauform’, La Maison des Contes, 427 Chemin de Chanzé, (near Lyon) France
September 4-7 2018
http://www.netlib.org/utk/people/JackDongarra/CCDSC-2018/
Geoffrey Fox, September 5, 2018
Digital Science Center
Department of Intelligent Systems Engineering
Indiana University Bloomington
gcf@indiana.edu, http://www.dsc.soic.indiana.edu/, http://spidal.org/
1
Let’s learn from Microsoft Research what
they think are hot areas
• Industry’s role in research much larger today than 20-40 years ago
• Microsoft Research has about 1000 researchers and has 800 interns per year
• One of the largest computer science research organizations (INRIA larger)
• They just held a faculty summit August 2018 largely focused on systems for AI
• https://www.microsoft.com/en-us/research/event/faculty-summit-2018/
• With an interesting overview at the end positioning their work as building,
designing, and using the "Global AI Supercomputer" concept linking the
Intelligent Cloud to the Intelligent Edge
https://www.youtube.com/watch?v=jsv7EWhCqIQ&feature=youtu.be
2
Adding Modeling?
• Microsoft is very optimistic and excited
• I added “Modeling” to get the
Global AI and Modeling Supercomputer GAIMSC
• Modeling was meant to include classic simulation-oriented supercomputers
• Even just in Big Data, one needs to build a model for the machine learning
to use
• Note many talks discussed autotuning, i.e. using GAIMSC to optimize GAIMSC
5
Possible Slogans for Research in
Global AI and Modeling Supercomputer arena
• AI First Science and Engineering
• Global AI and Modeling Supercomputer (Grid)
• Linking Intelligent Cloud to Intelligent Edge
• Linking Digital Twins to Deep Learning
• High-Performance Big-Data Computing
• Big Data and Extreme-scale Computing (BDEC)
• Common Digital Continuum Platform for Big Data and Extreme Scale
Computing (BDEC2)
• Using High Performance Computing ideas/technologies to give higher
functionality and performance “cloud” and “edge” systems
• Software 2.0 – replace Python by Training Data
• Industry 4.0 – Software defined machines or Industrial Internet of Things
6
Big Data and Extreme-scale Computing
http://www.exascale.org/bdec/
• BDEC Pathways to Convergence Report
• New series BDEC2 “Common Digital Continuum Platform for Big Data and
Extreme Scale Computing” with first meeting November 28-30, 2018
Bloomington Indiana USA.
• First day is evening reception with meeting focus “Defining application requirements for a
Common Digital Continuum Platform for Big Data and Extreme Scale Computing”
• Next meetings: February 19-21 Kobe, Japan (National infrastructure visions)
followed by two in Europe, one in USA and one in China.
http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/bdec2017pathways.pdf
7
AI First Research for AI-Driven Science and Engineering
• Artificial Intelligence is a dominant disruptive technology affecting all our
activities including business, education, research, and society.
• Further, several companies have proposed AI first strategies.
• They used to be mobile first
• The AI disruption is typically associated with big data coming from edge,
repositories or sophisticated scientific instruments such as telescopes, light
sources and gene sequencers.
• AI First requires mammoth computing resources such as clouds,
supercomputers, hyperscale systems and their distributed integration.
• AI First strategy is using the Global AI and Modeling Supercomputer GAIMSC
• Indiana University is examining a Masters in AI-Driven Engineering
8
AI First Publicity: 2017 Headlines
• The Race For AI: Google, Twitter, Intel, Apple In A Rush To Grab Artificial Intelligence
Startups
• Google, Facebook, And Microsoft Are Remaking Themselves Around AI
• Google: The Full Stack AI Company
• Bezos Says Artificial Intelligence to Fuel Amazon's Success
• Microsoft CEO says artificial intelligence is the 'ultimate breakthrough'
• Tesla’s New AI Guru Could Help Its Cars Teach Themselves
• Netflix Is Using AI to Conquer the World... and Bandwidth Issues
• How Google Is Remaking Itself As A “Machine Learning First” Company
• If You Love Machine Learning, You Should Check Out General Electric
9
Requirements for Global AI (and Modeling) Supercomputer
• Application Requirements: The structure of the application clearly impacts the
hardware and software needed
• Pleasingly parallel
• Workflow
• Global Machine Learning
• Data model: SQL, NoSQL; File Systems, Object store; Lustre, HDFS
• Distributed data from distributed sensors and instruments (Internet of Things)
requires Edge computing model
• Device – Fog – Cloud model and streaming data software and algorithms
• Hardware: node (accelerators such as GPU or KNL for deep learning) and
multi-node architecture configured as AI First HPC Cloud
• Disk speed and location
• Software requirements: Programming model for GAIMSC
• Analytics
• Data management
• Streaming or Repository access or both
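The first and third application structures listed above can be contrasted in a short Python sketch. It is only an illustration under stated assumptions: the function names (`simulate`, `run_sweep`, `partial_sums`, `kmeans`), the toy task, and the choice of 1-D k-means as the "Global Machine Learning" example are hypothetical and not taken from the talk. A pleasingly parallel job needs no communication between tasks, while global machine learning ends every iteration with a reduction over all data partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Pleasingly parallel task (toy stand-in): uses only its own input."""
    return (seed * 37 + 11) % 101


def run_sweep(seeds):
    """Run independent tasks; no task ever waits on another."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(simulate, seeds))


def partial_sums(points, centroids):
    """Local k-means step on one partition: per-centroid sums and counts."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        sums[nearest] += p
        counts[nearest] += 1
    return sums, counts


def kmeans(partitions, centroids, iterations=10):
    """Global machine learning: each iteration ends in a global reduction."""
    centroids = list(centroids)
    with ThreadPoolExecutor() as pool:
        for _ in range(iterations):
            # Map: every partition works independently with the current model.
            results = list(pool.map(lambda part: partial_sums(part, centroids), partitions))
            # Reduce: combine all partitions before the model can be updated.
            for k in range(len(centroids)):
                total = sum(r[0][k] for r in results)
                count = sum(r[1][k] for r in results)
                if count:
                    centroids[k] = total / count
    return centroids
```

The reduction inside `kmeans` is the synchronization point that makes interconnect performance matter; `run_sweep` has no such point and would run well on loosely coupled resources.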
10
Distinctive Features of Applications
• Ratio of data to model sizes: vertical axis on next slide
• Importance of Synchronization – ratio of inter-node communication
to node computing: horizontal axis on next slide
• Sparsity of Data or Model; impacts value of GPUs or vector
computing
• Irregularity of Data or Model
• Geographic distribution of Data as in edge computing; use of
streaming (dynamic data) versus batch paradigms
• Dynamic model structure as in some iterative algorithms
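The first two features above are ratios, so they can be written down directly. The sketch below is a hypothetical illustration: the `AppProfile` class, its field names, and the example numbers are assumptions made here, not measurements or definitions from the talk.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical per-application measurements (all names are assumptions)."""
    data_bytes: float       # size of the input data
    model_bytes: float      # size of the model being fitted or simulated
    comm_seconds: float     # time spent in inter-node communication
    compute_seconds: float  # time spent computing on the nodes

    @property
    def data_to_model_ratio(self):
        """Ratio of data size to model size."""
        return self.data_bytes / self.model_bytes

    @property
    def sync_ratio(self):
        """Ratio of inter-node communication to node computing."""
        return self.comm_seconds / self.compute_seconds


# Invented example values, used only to show how the ratios are read.
example = AppProfile(data_bytes=1e12, model_bytes=1e9,
                     comm_seconds=2.0, compute_seconds=8.0)
```

A large `data_to_model_ratio` with a small `sync_ratio` would point toward the pleasingly parallel end of the spectrum; a large `sync_ratio` indicates that synchronization dominates.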
11
Big Data and Simulation Difficulty in Parallelism
Figure: application classes placed by two problem characteristics, with axes labeled "Size of Synchronization constraints" and "Size of Disk I/O"
• Coupling labels: Pleasingly Parallel (often independent events); Loosely Coupled; Tightly Coupled
• Application labels: MapReduce as in scalable databases; Current major Big Data category; Parameter sweep simulations; Global Machine Learning, e.g. parallel clustering; Deep Learning; LDA; Graph Analytics, e.g. subgraph mining; Linear Algebra at core (often not sparse); Structured Adaptive Sparse; Unstructured Adaptive Sparse; Largest scale simulations
• Platform labels: Commodity Clouds; HPC Clouds: Accelerators, High Performance Interconnect; HPC Clouds/Supercomputers (memory access also critical); Exascale Supercomputers
• These are just two problem characteristics; there is also data/compute distribution seen in grid/edge computing
12
Remembering Grid Computing: IoT and Distributed Center I
• Hyperscale data centers will grow from 338 in number at the end of 2016 to 628 by 2021.
They will represent 53 percent of all installed data center servers by 2021.
• They form a distributed Compute (on data) grid with some 50 million servers
• 94 percent of workloads and compute instances will be processed by cloud data centers by
2021; only six percent will be processed by traditional data centers.
• Analysis from Cisco: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.html
13
Figure labels: Number of instances per server; Number of Cloud Data Centers; Number of Public or Private Cloud Data Center Instances
Remembering Grid Computing: IoT and Distributed Center II
• By 2021, Cisco expects IoT connections to reach 13.7 billion, up from 5.8
billion in 2016, according to its Global Cloud Index.
• Globally, the data stored in data centers will nearly quintuple to
reach 1.3ZB by 2021, up 4.6-fold (a CAGR of 36 percent) from 286 exabytes
(EB) in 2016.
• Big data will reach 403 EB by 2021, up almost eight-fold from 25EB in 2016.
Big data will represent 30 percent of data stored in data centers by 2021,
up from 18 percent in 2016.
• The amount of data stored on devices will be 4.5-times higher than data
stored in data centers, at 5.9ZB by 2021.
• Driven largely by IoT, the total amount of data created (and not necessarily
stored) by any device will reach 847ZB per year by 2021, up from 218ZB per
year in 2016.
• The Intelligent Edge or IoT is a distributed Data Grid
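The growth figures quoted above relate a fold increase to a compound annual growth rate (CAGR) over the five years from 2016 to 2021. A minimal sketch of that arithmetic follows; the helper names `fold` and `cagr` are introduced here for illustration, and only numbers already quoted on this slide are used as inputs.

```python
def fold(start, end):
    """Fold increase between a start value and an end value."""
    return end / start


def cagr(fold_increase, years):
    """Compound annual growth rate implied by a fold increase over `years`."""
    return fold_increase ** (1.0 / years) - 1.0


years = 2021 - 2016

# Stored data: the quoted 4.6-fold growth should imply the quoted 36 percent CAGR.
storage_cagr = cagr(4.6, years)

# IoT connections: 5.8 billion (2016) to 13.7 billion (2021).
iot_fold = fold(5.8, 13.7)
iot_cagr = cagr(iot_fold, years)

# Data created per year: 218ZB (2016) to 847ZB (2021).
created_fold = fold(218, 847)

print(f"stored data CAGR: {storage_cagr:.1%}")
print(f"IoT connections: {iot_fold:.2f}-fold, CAGR {iot_cagr:.1%}")
print(f"data created: {created_fold:.2f}-fold")
```

Running the sketch is a quick consistency check on the quoted numbers: a 4.6-fold increase over five years does round to a 36 percent CAGR.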
14
15
Mary Meeker identifies even more digital data
Overall Global AI and Modeling Supercomputer
GAIMSC Architecture
• There is only a cloud at the logical center but it’s physically distributed and
owned by a few major players
• There is a very distributed set of devices surrounded by local Fog computing;
this forms the logically and physically distributed edge
• The edge is structured and largely data
• These are two differences from the Grid of the past
• e.g. a self-driving car will have its own fog and will not share fog with the truck that it is about
to collide with
• The cloud and edge will both be very heterogeneous with varying accelerators,
memory size and disk structure.
16
It is challenging these days to compete with Berkeley, Stanford, Google, Facebook, Amazon,
Microsoft, IBM.
Zaharia from Stanford (earlier Berkeley and Spark)
17
Collaborating on the
Global AI and Modeling Supercomputer GAIMSC
• Microsoft says:
• We can only “play together” and link functionalities from Google, Amazon, Facebook,
Microsoft, Academia if we have open APIs and open code to customize
• We must collaborate
• Open source Apache software
• Academia needs to use and define their own Apache projects
• We want to use AI supercomputer for AI-Driven science studying the early universe and
the Higgs boson and not just producing annoying advertisements (goal of most elite CS
researchers)
18
19
ML Code
NIPS 2015 http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Should we train data scientists or data engineers?
Gartner says that there are 3 times as many jobs for data engineers as for data scientists.
There is more to life (jobs) than Machine Learning!
Gartner on Data Engineering
• Gartner says that job numbers in data science teams are
• 10% - Data Scientists
• 20% - Citizen Data Scientists ("decision makers")
• 30% - Data Engineers
• 20% - Business experts
• 15% - Software engineers
• 5% - Quant geeks
• ~0% - Unicorns (very few exist!)
20
Ways of adding High Performance to
Global AI (and Modeling) Supercomputer
• Fix performance issues in Spark, Heron, Hadoop, Flink etc.
• Messy, as some features of these big data systems are intrinsically slow in some (not all)
cases
• All these systems are “monolithic” and difficult to deal with individual components
• Execute HPBDC from classic big data system with custom communication
environment – approach of Harp for the relatively simple Hadoop
environment
• Provide a native Mesos/Yarn/Kubernetes/HDFS high performance execution
environment with all capabilities of Spark, Hadoop and Heron – goal of
Twister2
• Execute with MPI in classic (Slurm, Lustre) HPC environment
• Add modules to existing frameworks like Scikit-Learn or Tensorflow either as
new capability or as a higher performance version of existing module.
21
Working with Industry?
• Many academic areas today have been turned upside down by the increased role of
industry as there is so much overlap between major University research issues and
problems where the large technology companies are battling it out for commercial
leadership.
• Correspondingly we have seen, especially with top-ranked departments, increasing
numbers and styles of industry-university collaboration. Probably most departments
must join this trend and increase their Industry links if they are to thrive.
• Sometimes faculty are 20% University and 80% Industry
• These links can provide the student internship opportunities needed for ABET and
traditional in the field.
• However, this is a double-edged sword as the increased access to internships for our best Ph.D.
students is one reason for the decrease in University research contribution.
• We should try to make such internships part of joint University-Industry research and not just an
Industry only activity
• The relationship can take the form of jointly run centers such as NSF I/UCRC and Industry-oriented
University centers such as Intel Parallel Computing Centers
22
GAIMSC Global AI & Modeling Supercomputer Questions
• What do we gain from the concept? e.g. Ability to work with Big Data community
• What do we lose from the concept? e.g. everything runs as slow as Spark
• Is GAIMSC useful for BDEC2 initiative? For NSF? For DoE?
For Universities? For Industry? For users?
• Does adding modeling to concept add value?
• What are the research issues for GAIMSC? e.g. how to program?
• What can we do with GAIMSC that we couldn’t do with classic Big Data
technologies?
• What can we do with GAIMSC that we couldn’t do with classic HPC technologies?
• Are there deep or important issues associated with the “Global” in GAIMSC?
• Is the concept of an auto-tuned Global AI Supercomputer scary?
23

Más contenido relacionado

La actualidad más candente

SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17Mark Goldstein
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing Seungyun Lee
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Green Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeGreen Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeNarayanan Subramaniam
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Geoffrey Fox
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Edward Curry
 
Vertex perspectives artificial intelligence
Vertex perspectives   artificial intelligenceVertex perspectives   artificial intelligence
Vertex perspectives artificial intelligenceYanai Oron
 
Department of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsDepartment of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsBrand Niemann
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information ManagementEdward Curry
 
Cloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataCloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataDavid Parsons
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
Latest Trends in Computer Science
Latest Trends in Computer ScienceLatest Trends in Computer Science
Latest Trends in Computer ScienceTechsparks
 
Gc vit sttp cc december 2013
Gc vit sttp cc december 2013Gc vit sttp cc december 2013
Gc vit sttp cc december 2013Seema Shah
 
Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Gregg Barrett
 
Hadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersHadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersEdureka!
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...DataWorks Summit
 

La actualidad más candente (20)

SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Green Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeGreen Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in Scope
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Vertex perspectives artificial intelligence
Vertex perspectives   artificial intelligenceVertex perspectives   artificial intelligence
Vertex perspectives artificial intelligence
 
Department of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsDepartment of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data Dashboards
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Cloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big dataCloud Analytics - Using cloud based services to analyse big data
Cloud Analytics - Using cloud based services to analyse big data
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
Latest Trends in Computer Science
Latest Trends in Computer ScienceLatest Trends in Computer Science
Latest Trends in Computer Science
 
Gc vit sttp cc december 2013
Gc vit sttp cc december 2013Gc vit sttp cc december 2013
Gc vit sttp cc december 2013
 
Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?Data Science Introduction - Data Science: What Art Thou?
Data Science Introduction - Data Science: What Art Thou?
 
Hadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three MusketeersHadoop, Iot and Analytics- The Three Musketeers
Hadoop, Iot and Analytics- The Three Musketeers
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
Big Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique BruxellesBig Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique Bruxelles
 

Similar a AI-Driven Science and Engineering with the Global AI and Modeling Supercomputer GAIMSC

DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptVipin Singhal
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptgeminass1
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01nayanbhatia2
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptxkalai75
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial IntelligenceSuman Srinivasan
 
Internet of Things Presentation to Los Angeles CTO Forum
Internet of Things Presentation to Los Angeles CTO ForumInternet of Things Presentation to Los Angeles CTO Forum
Internet of Things Presentation to Los Angeles CTO ForumFred Thiel
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxdickonsondorris
 
AWS res 2024 key points for better research.ppt
AWS res 2024 key points for better research.pptAWS res 2024 key points for better research.ppt
AWS res 2024 key points for better research.pptfodod37142
 
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...Comit Projects Ltd
 
Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Economic Strategy Institute
 

Similar a AI-Driven Science and Engineering with the Global AI and Modeling Supercomputer GAIMSC (20)

Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
 
Internet of Things Presentation to Los Angeles CTO Forum
Internet of Things Presentation to Los Angeles CTO ForumInternet of Things Presentation to Los Angeles CTO Forum
Internet of Things Presentation to Los Angeles CTO Forum
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
Gcc notes unit 1
Gcc notes unit 1Gcc notes unit 1
Gcc notes unit 1
 
AWS res 2024 key points for better research.ppt
AWS res 2024 key points for better research.pptAWS res 2024 key points for better research.ppt
AWS res 2024 key points for better research.ppt
 
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...
Wizardry - Creating Magical Changes in the full lifecycle of Infrastructure #...
 
Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618
 

Más de Geoffrey Fox

Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Geoffrey Fox
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Geoffrey Fox
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC ConvergenceGeoffrey Fox
 
Data Science and Online Education
Data Science and Online EducationData Science and Online Education
Data Science and Online EducationGeoffrey Fox
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming DataGeoffrey Fox
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Geoffrey Fox
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
 
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Geoffrey Fox
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...Geoffrey Fox
 
Data Science Curriculum at Indiana University
AI-Driven Science and Engineering with the Global AI and Modeling Supercomputer GAIMSC

  • 1. AI-Driven Science and Engineering with the Global AI and Modeling Supercomputer GAIMSC Workshop on Clusters, Clouds, and Data for Scientific Computing CCDSC 2018 Châteauform’, La Maison des Contes, 427 Chemin de Chanzé, (near Lyon) France September 4-7 2018 http://www.netlib.org/utk/people/JackDongarra/CCDSC-2018/ Geoffrey Fox, September 5, 2018 Digital Science Center Department of Intelligent Systems Engineering Indiana University Bloomington gcf@indiana.edu, http://www.dsc.soic.indiana.edu/, http://spidal.org/ 1
  • 2. Let’s learn from Microsoft Research what they think are hot areas • Industry’s role in research much larger today than 20-40 years ago • Microsoft Research has about 1000 researchers and has 800 interns per year • One of the largest computer science research organizations (INRIA larger) • They just held a faculty summit August 2018 largely focused on systems for AI • https://www.microsoft.com/en-us/research/event/faculty-summit-2018/ • With an interesting overview at the end positioning their work as building designing and using the "Global AI Supercomputer" concept linking the Intelligent Cloud to the Intelligent Edge https://www.youtube.com/watch?v=jsv7EWhCqIQ&feature=youtu.be 2
  • 5. Adding Modeling? • Microsoft is very optimistic and excited • I added “Modeling” to get the Global AI and Modeling Supercomputer GAIMSC • Modeling was meant to • Include classic simulation oriented supercomputers • Even just in Big Data, one needs to build a model for the machine learning to use • Note many talks discussed autotuning i.e. using GAIMSC to optimize GAIMSC 5
  • 6. Possible Slogans for Research in Global AI and Modeling Supercomputer arena • AI First Science and Engineering • Global AI and Modeling Supercomputer (Grid) • Linking Intelligent Cloud to Intelligent Edge • Linking Digital Twins to Deep Learning • High-Performance Big-Data Computing • Big Data and Extreme-scale Computing (BDEC) • Common Digital Continuum Platform for Big Data and Extreme Scale Computing (BDEC2) • Using High Performance Computing ideas/technologies to give higher functionality and performance “cloud” and “edge” systems • Software 2.0 – replace Python by Training Data • Industry 4.0 – Software defined machines or Industrial Internet of Things 6
  • 7. Big Data and Extreme-scale Computing http://www.exascale.org/bdec/ • BDEC Pathways to Convergence Report • New series BDEC2 “Common Digital Continuum Platform for Big Data and Extreme Scale Computing” with first meeting November 28-30, 2018 Bloomington Indiana USA. • First day is evening reception with meeting focus “Defining application requirements for a Common Digital Continuum Platform for Big Data and Extreme Scale Computing” • Next meetings: February 19-21 Kobe, Japan (National infrastructure visions) followed by two in Europe, one in USA and one in China. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/bdec2017pathways.pdf 7
  • 8. AI First Research for AI-Driven Science and Engineering • Artificial Intelligence is a dominant disruptive technology affecting all our activities including business, education, research, and society. • Further, several companies have proposed AI first strategies. • They used to be mobile first • The AI disruption is typically associated with big data coming from edge, repositories or sophisticated scientific instruments such as telescopes, light sources and gene sequencers. • AI First requires mammoth computing resources such as clouds, supercomputers, hyperscale systems and their distributed integration. • AI First strategy is using the Global AI and Modeling Supercomputer GAIMSC • Indiana University is examining a Masters in AI-Driven Engineering 8
  • 9. AI First Publicity: 2017 Headlines • The Race For AI: Google, Twitter, Intel, Apple In A Rush To Grab Artificial Intelligence Startups • Google, Facebook, And Microsoft Are Remaking Themselves Around AI • Google: The Full Stack AI Company • Bezos Says Artificial Intelligence to Fuel Amazon's Success • Microsoft CEO says artificial intelligence is the 'ultimate breakthrough' • Tesla’s New AI Guru Could Help Its Cars Teach Themselves • Netflix Is Using AI to Conquer the World... and Bandwidth Issues • How Google Is Remaking Itself As A “Machine Learning First” Company • If You Love Machine Learning, You Should Check Out General Electric
  • 10. Requirements for Global AI (and Modeling) Supercomputer • Application Requirements: The structure of the application clearly impacts needed hardware and software • Pleasingly parallel • Workflow • Global Machine Learning • Data model: SQL, NoSQL; File Systems, Object store; Lustre, HDFS • Distributed data from distributed sensors and instruments (Internet of Things) requires Edge computing model • Device – Fog – Cloud model and streaming data software and algorithms • Hardware: node (accelerators such as GPU or KNL for deep learning) and multi-node architecture configured as AI First HPC Cloud; • Disk speed and location • Software requirements: Programming model for GAIMSC • Analytics • Data management • Streaming or Repository access or both 10
  • 11. Distinctive Features of Applications • Ratio of data to model sizes: vertical axis on next slide • Importance of Synchronization – ratio of inter-node communication to node computing: horizontal axis on next slide • Sparsity of Data or Model; impacts value of GPU’s or vector computing • Irregularity of Data or Model • Geographic distribution of Data as in edge computing; use of streaming (dynamic data) versus batch paradigms • Dynamic model structure as in some iterative algorithms 11
  • 12. Big Data and Simulation Difficulty in Parallelism [Figure. Axes: Size of Synchronization constraints (Loosely Coupled to Tightly Coupled); Size of Disk I/O. Problem classes shown: Pleasingly Parallel (often independent events); Parameter sweep simulations; MapReduce as in scalable databases; Current major Big Data category; Global Machine Learning e.g. parallel clustering; Deep Learning; LDA; Graph Analytics e.g. subgraph mining; Structured Adaptive Sparse; Unstructured Adaptive Sparse; Linear Algebra at core (often not sparse); Largest scale simulations. Systems shown: Commodity Clouds; HPC Clouds: Accelerators, High Performance Interconnect; HPC Clouds/Supercomputers (memory access also critical); Exascale Supercomputers.] • Just two problem characteristics • There is also data/compute distribution seen in grid/edge computing 12
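The contrast between pleasingly parallel work and "Global Machine Learning e.g. parallel clustering" can be illustrated with a small sketch. This is not code from the talk, Harp, or Twister2; it is a minimal pure-Python k-means on 1-D points, where the hypothetical `partitions` list stands in for data held on separate nodes and `reduce` stands in for the global combine (an allreduce in an MPI-style setting). The point it tries to show is that each iteration has an independent per-partition step followed by a synchronization step that every partition must wait for.

```python
from functools import reduce

def local_stats(points, centroids):
    """Map step: assign each local point to its nearest centroid and
    return per-centroid (sum, count) for this partition only."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        k = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        sums[k] += p
        counts[k] += 1
    return sums, counts

def combine(a, b):
    """Reduce step: element-wise addition of two partial results.
    In a distributed run this is where inter-node communication happens."""
    return ([x + y for x, y in zip(a[0], b[0])],
            [x + y for x, y in zip(a[1], b[1])])

def kmeans_1d(partitions, centroids, iterations=10):
    for _ in range(iterations):
        # Independent per-partition work: no communication needed here.
        partials = [local_stats(part, centroids) for part in partitions]
        # Global synchronization: the new centroids need every partition's result.
        sums, counts = reduce(combine, partials)
        centroids = [s / c if c else old
                     for s, c, old in zip(sums, counts, centroids)]
    return centroids

# Hypothetical toy data: three partitions, two well-separated clusters.
partitions = [[1.0, 1.2, 0.8], [9.0, 9.5], [1.1, 10.0, 8.5]]
result = kmeans_1d(partitions, centroids=[0.0, 5.0])
print(result)
```

A parameter sweep, by comparison, would only need the first step: each partition's result is final and no combine is required between iterations.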
  • 13. Remembering Grid Computing: IoT and Distributed Center I • Hyperscale data centers will grow from 338 in number at the end of 2016 to 628 by 2021. They will represent 53 percent of all installed data center servers by 2021. • They form a distributed Compute (on data) grid with some 50 million servers • 94 percent of workloads and compute instances will be processed by cloud data centers by 2021; only six percent will be processed by traditional data centers. • Analysis from Cisco https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.html [Figure panels: Number of instances per server; Number of Cloud Data Centers; Number of Public or Private Cloud Data Center Instances] 13
  • 14. Remembering Grid Computing: IoT and Distributed Center II • By 2021, Cisco expects IoT connections to reach 13.7 billion, up from 5.8 billion in 2016, according to its Global Cloud Index. • Globally, the data stored in data centers will nearly quintuple to reach 1.3ZB by 2021, up 4.6-fold (a CAGR of 36 percent) from 286 exabytes (EB) in 2016. • Big data will reach 403 EB by 2021, up almost eight-fold from 25EB in 2016. Big data will represent 30 percent of data stored in data centers by 2021, up from 18 percent in 2016. • The amount of data stored on devices will be 4.5-times higher than data stored in data centers, at 5.9ZB by 2021. • Driven largely by IoT, the total amount of data created (and not necessarily stored) by any device will reach 847ZB per year by 2021, up from 218ZB per year in 2016. • The Intelligent Edge or IoT is a distributed Data Grid 14
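The growth figures quoted for stored data can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below uses only the rounded endpoints given on the slide (286 EB in 2016, 1.3 ZB in 2021) and assumes 1 ZB = 1000 EB; the function and variable names are illustrative. With these rounded inputs the result comes out at roughly 35 percent per year, close to the quoted 36 percent, and the small gap is presumably due to rounding in the quoted endpoints.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0

# Endpoints as quoted on the slide (Cisco Global Cloud Index).
stored_2016_eb = 286.0
stored_2021_eb = 1300.0  # 1.3 ZB expressed in EB, assuming 1 ZB = 1000 EB
years = 2021 - 2016

growth_factor = stored_2021_eb / stored_2016_eb
rate = cagr(stored_2016_eb, stored_2021_eb, years)
print(f"growth factor: {growth_factor:.2f}x, CAGR: {rate:.1%}")
```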
  • 15. 15 Mary Meeker identifies even more digital data
  • 16. Overall Global AI and Modeling Supercomputer GAIMSC Architecture • There is only a cloud at the logical center but it’s physically distributed and owned by a few major players • There is a very distributed set of devices surrounded by local Fog computing; this forms the logically and physically distributed edge • The edge is structured and largely data • These are two differences from the Grid of the past • e.g. self driving car will have its own fog and will not share fog with truck that it is about to collide with • The cloud and edge will both be very heterogeneous with varying accelerators, memory size and disk structure. 16
  • 17. It is challenging these days to compete with Berkeley, Stanford, Google, Facebook, Amazon, Microsoft, IBM. Zaharia from Stanford (earlier Berkeley and Spark) 17
  • 18. Collaborating on the Global AI and Modeling Supercomputer GAIMSC • Microsoft says: • We can only “play together” and link functionalities from Google, Amazon, Facebook, Microsoft, Academia if we have open APIs and open code to customize • We must collaborate • Open source Apache software • Academia needs to use and define its own Apache projects • We want to use AI supercomputer for AI-Driven science studying the early universe and the Higgs boson and not just producing annoying advertisements (goal of most elite CS researchers) 18
  • 19. 19 ML Code NIPS 2015 http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Should we train data scientists or data engineers? Gartner says that there are 3 times as many jobs for data engineers as for data scientists. There is more to life (jobs) than Machine Learning!
  • 20. Gartner on Data Engineering • Gartner says that job numbers in data science teams are • 10% - Data Scientists • 20% - Citizen Data Scientists ("decision makers") • 30% - Data Engineers • 20% - Business experts • 15% - Software engineers • 5% - Quant geeks • ~0% - Unicorns (very few exist!) 20
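As a quick consistency check on the breakdown above, the listed shares can be summed and the engineer-to-scientist ratio computed. This sketch only restates the percentages from the slide; the dictionary name is illustrative and "~0%" for Unicorns is entered as 0.

```python
# Shares of roles in a data science team as listed on the slide (percent).
team_shares = {
    "Data Scientists": 10,
    "Citizen Data Scientists": 20,
    "Data Engineers": 30,
    "Business experts": 20,
    "Software engineers": 15,
    "Quant geeks": 5,
    "Unicorns": 0,  # listed as ~0% on the slide
}

total = sum(team_shares.values())
ratio = team_shares["Data Engineers"] / team_shares["Data Scientists"]
print(f"total: {total}%, data engineers per data scientist: {ratio:.0f}")
```

The shares add up to 100 percent, and the 30 to 10 split matches the "3 times as many jobs for data engineers" remark.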
  • 21. Ways of adding High Performance to Global AI (and Modeling) Supercomputer • Fix performance issues in Spark, Heron, Hadoop, Flink etc. • Messy as some features of these big data systems are intrinsically slow in some (not all) cases • All these systems are “monolithic” and it is difficult to deal with individual components • Execute HPBDC from classic big data system with custom communication environment – approach of Harp for the relatively simple Hadoop environment • Provide a native Mesos/Yarn/Kubernetes/HDFS high performance execution environment with all capabilities of Spark, Hadoop and Heron – goal of Twister2 • Execute with MPI in classic (Slurm, Lustre) HPC environment • Add modules to existing frameworks like Scikit-Learn or Tensorflow either as new capability or as a higher performance version of existing module. 21
  • 22. Working with Industry? • Many academic areas today have been turned upside down by the increased role of industry as there is so much overlap between major University research issues and problems where the large technology companies are battling it out for commercial leadership. • Correspondingly we have seen, especially with top-ranked departments, increasing numbers and styles of industry-university collaboration. Probably most departments must join this trend and increase their Industry links if they are to thrive. • Sometimes faculty are 20% University and 80% Industry • These links can provide the student internship opportunities needed for ABET and traditional in the field. • However, this is a double-edged sword as the increased access to internships for our best Ph.D. students is one reason for the decrease in University research contribution. • We should try to make such internships part of joint University-Industry research and not just an Industry only activity • Relationship can be jointly run centers such as NSF I/UCRC and Industry oriented University centers such as Intel Parallel Computing Centers 22
  • 23. GAIMSC Global AI & Modeling Supercomputer Questions • What do we gain from the concept? e.g. Ability to work with Big Data community • What do we lose from the concept? e.g. everything runs as slow as Spark • Is GAIMSC useful for BDEC2 initiative? For NSF? For DoE? For Universities? For Industry? For users? • Does adding modeling to concept add value? • What are the research issues for GAIMSC? e.g. how to program? • What can we do with GAIMSC that we couldn’t do with classic Big Data technologies? • What can we do with GAIMSC that we couldn’t do with classic HPC technologies? • Are there deep or important issues associated with the “Global” in GAIMSC? • Is the concept of an auto-tuned Global AI Supercomputer scary? 23

Editor's notes

  1. until ISE came along