Big Data and Analytics on AWS
Chicago AWS user group event Nov 12, 2019
"Big Data in Higher Education" - Rebecca Schmidt and Alana Alfeche // @rebeccaschmidtm and @alanaalfeche
2. Presentation Breakdown
1. Share pain points in our field
2. Our field with cloud technology
3. Q & A
Disclaimer: None of the material presented here reflects the work we do in our professional roles.
It is knowledge we gained from our graduate programs.
4. Whole Genome Sequencing
1995 First free-living organism to have its entire genome
sequenced (Haemophilus influenzae Rd.)
2003 Human Genome Project completed with a price tag
of $2.7 billion
2015 The cost to generate a whole-exome sequence is
estimated to be below $1500
Moore’s Law states that computing power doubles every
two years. Technologies that ‘keep up’ with Moore’s Law
are widely regarded as doing well.
NIH, 2019
5. Information Explosion
Data Volume
- By 2020, 40% of IoT devices will be related to
health and medicine
- By 2025, biomedical data will exceed the growth
of other big data domains such as astronomy,
physics, and social media
Data Velocity
- Next-generation sequencing (NGS) delivers on the
order of 30 GB of data in real time
Data Variety
- Biological data are heterogeneous
- No standard annotation
- Each database has its own data format
NCBI, October 2019
Rossi, 2018
9. CV Through the Years
● Data mining now utilizes machine learning
algorithms as tools to extract potentially valuable
patterns held within datasets
○ Informs image recognition
● Advancements in the study of Computer Vision are
influencing almost every industry
○ Automotive
○ Healthcare
○ Retail
○ Agriculture
○ Banking
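The pattern-extraction idea above can be sketched with a toy example. The code below is illustrative only (the "images" and the 1-nearest-neighbour rule are our assumptions, not anything from the slides): it classifies tiny 3x3 binary grids as vertical or horizontal bars by finding the closest pattern in a labeled dataset.

```python
import numpy as np

# Toy "images": 3x3 binary grids, flattened row by row.
# Label 1 = vertical bar, label 0 = horizontal bar.
train_X = np.array([
    [0, 1, 0,  0, 1, 0,  0, 1, 0],   # vertical bar (middle column)
    [1, 0, 0,  1, 0, 0,  1, 0, 0],   # vertical bar (left column)
    [0, 0, 0,  1, 1, 1,  0, 0, 0],   # horizontal bar (middle row)
    [1, 1, 1,  0, 0, 0,  0, 0, 0],   # horizontal bar (top row)
])
train_y = np.array([1, 1, 0, 0])

def predict(x):
    # 1-nearest-neighbour: return the label of the closest
    # training pattern under Manhattan (L1) distance.
    dists = np.abs(train_X - x).sum(axis=1)
    return train_y[np.argmin(dists)]

# A noisy vertical bar (one extra pixel in the bottom row)
noisy = np.array([0, 1, 0,  0, 1, 0,  0, 1, 1])
print(predict(noisy))  # -> 1 (recognized as a vertical bar)
```

Real image-recognition pipelines operate the same way in spirit, just with far larger datasets and learned features instead of raw pixels, which is exactly where the big-data challenges on the next slide come in.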
10. Challenges with Big Data in CV
Availability of Public Data
● Companies like Waymo are moving toward making their data publicly
available, but not necessarily in a common/centralized way
● Difficult to monitor the effectiveness of data integration
Quantity
● ML algorithms are not necessarily designed to handle big data
● Adapting through new processing paradigms such as MapReduce (parallel
execution across multiple nodes) and distributed processing frameworks
(Hadoop)
● Computational Complexity and Processing Performance
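The MapReduce paradigm mentioned above can be sketched in a few lines. This is not Hadoop itself, just a minimal illustration of the idea using Python's standard-library `multiprocessing` pool: independent map tasks produce partial results in parallel, and a reduce step merges them.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk):
    # Map: each worker counts words in its own chunk, independently
    return Counter(chunk.split())

def reduce_phase(a, b):
    # Reduce: merge two partial counts into one
    return a + b

if __name__ == "__main__":
    chunks = ["big data big", "data velocity data"]
    with Pool(2) as pool:              # map tasks run in parallel
        partials = pool.map(map_phase, chunks)
    total = reduce(reduce_phase, partials)
    print(total["data"])  # -> 3
```

Because each map task touches only its own chunk, the same pattern scales from two local processes to thousands of cluster nodes, which is what frameworks like Hadoop automate.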
Non-Linearity of Data
● Difficult to observe relationships
Variance and Bias
● As the volume of data increases, the learner can fit too closely to the
training set and fail to generalize adequately to new data
● Regularization is used to mitigate this, but it requires more computation time