{GraphConnect NYC}
Hadoop and Graph Databases
(Neo4j): Winning Combination for
Bioinformatics
Jonathan Freeman
@freethejazz
{Open Software Integrators} { www.osintegrators.com} {@osintegrators}
Hadoop + Neo4j = Bioanalytics Win

Open Software Integrators

● Founded January 2008 by Andrew C. Oliver
  ○ Durham, NC
● Revenue and staff have at least doubled every year since 2009.
● New office (2012) in Chicago, IL
  ○ We're hiring associate to senior level as well as UI Developers (jQuery, JavaScript, HTML, CSS)
  ○ Up to 50% travel (probably less), salary + bonus, 401k, health, etc.
  ○ Preferred: Java, Tomcat, JBoss, Hibernate, Spring, RDBMS, jQuery
  ○ Nice to have: Hadoop, Neo4j, MongoDB, Ruby and/or at least one cloud platform

Questions to answer

● Uhh, bioinformatics?
● What is Hadoop? Why is it a good fit?
● And Neo4j? Why the combination?
● I want this now! How do I do it?!?!

Bioinformatics

“dynamic information processing system”

Life
http://www.labtimes.org/labtimes/issues/lt2011/lt07/lt_2011_07_26_29.pdf

● Storing/Retrieving Biological Data
● Organizing Biological Data
● Analyzing Biological Data

Biological Data
● amino acid sequences
● nucleotide sequences
● protein structures

● Genetic sequence analysis
● Tracing biological evolution
● Analysis of gene expression
● Studying mutations in cancer
● Predicting protein structure and function
● Molecular interaction

Full Human Genome Sequencing Then

13 Years

$2,700,000,000

Full Human Genome Sequencing Now

1 Day

$5,000

http://www.genome.gov/images/content/cost_per_genome_apr.jpg

So what are we
waiting for?

well, the thing
about that…

...ATTCCAGGAGTATTGACACCAT...

AGGATTACCAGGA
CAAAGGATT
TTACCAGGATACCAG
TGACAA
AAGGATTAC
GATACCAGTA
CAAGGATT
GTGACAA

Hadoop

Infrastructure for distributed computing

HDFS: A distributed file system.

MapReduce: An implementation of a programming model for processing very large data sets.
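
To make the MapReduce model a little more concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API. It counts k-mers (length-k substrings) across the short reads shown on the earlier slide. The function names, the choice of k = 3, and k-mer counting as the task are all illustrative assumptions, not something taken from the talk.

```python
from collections import defaultdict

# Short reads copied from the slides.
READS = [
    "AGGATTACCAGGA", "CAAAGGATT", "TTACCAGGATACCAG", "TGACAA",
    "AAGGATTAC", "GATACCAGTA", "CAAGGATT", "GTGACAA",
]

def map_read(read, k=3):
    """Map phase: emit a (k-mer, 1) pair for every length-k window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one k-mer."""
    return key, sum(values)

def kmer_counts(reads, k=3):
    """Run map, shuffle, and reduce in sequence and return {k-mer: count}."""
    pairs = (pair for read in reads for pair in map_read(read, k))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())

if __name__ == "__main__":
    counts = kmer_counts(READS)
    # Print the five most frequent k-mers, ties broken alphabetically.
    for kmer, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:5]:
        print(kmer, n)
```

On a real cluster the map calls would run in parallel on different blocks of the input, and the framework would handle the grouping step between map and reduce.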

AGGATTACCAGGA
CAAAGGATT
TTACCAGGATACCAG
TGACAA
AAGGATTAC
GATACCAGTA
CAAGGATT
GTGACAA

...ATTCCAGGAGTATTGACACCAT...

1000 CPU hours

3 hours
$85
OSS
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
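
A rough back-of-envelope reading of these figures, assuming the 1000 CPU hours and the 3-hour, $85 run describe the same workload (the slides imply this but do not state it), and ignoring any overhead from distributing the work:

```python
cpu_hours = 1000   # serial compute figure from the slides
wall_hours = 3     # reported run time on the cluster
cost_usd = 85      # reported cost of that run

# Average number of cores that would have to be busy to fit the work
# into the reported run time (assumption: perfect parallel scaling).
implied_cores = cpu_hours / wall_hours

# Effective price of one CPU hour under the same assumption.
usd_per_cpu_hour = cost_usd / cpu_hours

print(f"about {implied_cores:.0f} cores busy on average")
print(f"about ${usd_per_cpu_hour:.3f} per CPU hour")
```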

And Neo4j?

MATCH (snp)<-[:INFLUENCED_BY]-(conditions)
WHERE snp.id = "rs1234"
RETURN conditions;

MATCH (p)-[:GENOME_CONTAINS]->(snp),
      (snp)<-[:INFLUENCED_BY]-(conditions)
WHERE p.name = "Jonathan Freeman"
RETURN conditions;

MATCH (p)-[:GENOME_CONTAINS]->(snp),
      (snp)<-[:INFLUENCED_BY]-(conditions)
WHERE conditions.name = "Parkinsons"
RETURN p;
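
For readers without a Neo4j instance at hand, the two-hop pattern in these queries can be mimicked with plain Python sets. This is only a sketch of what the traversal returns, not how Neo4j executes it. Apart from the placeholder id rs1234, every person, SNP, and condition name below is made-up sample data.

```python
# Edges stored as (start, end) pairs, one set per relationship type.
# All sample data is hypothetical; only the id "rs1234" comes from the slides.
GENOME_CONTAINS = {          # (person)-[:GENOME_CONTAINS]->(snp)
    ("Person A", "rs1234"),
    ("Person B", "rs5678"),
}
INFLUENCED_BY = {            # (condition)-[:INFLUENCED_BY]->(snp)
    ("Condition X", "rs1234"),
    ("Condition Y", "rs5678"),
}

def conditions_for_snp(snp_id):
    """Conditions with an INFLUENCED_BY edge pointing at the given SNP."""
    return {cond for cond, snp in INFLUENCED_BY if snp == snp_id}

def conditions_for_person(name):
    """Person -> SNPs in their genome -> conditions influenced by those SNPs."""
    snps = {snp for person, snp in GENOME_CONTAINS if person == name}
    return {cond for cond, snp in INFLUENCED_BY if snp in snps}

def people_for_condition(condition):
    """Condition -> SNPs that influence it -> people whose genome contains them."""
    snps = {snp for cond, snp in INFLUENCED_BY if cond == condition}
    return {person for person, snp in GENOME_CONTAINS if snp in snps}

if __name__ == "__main__":
    print(conditions_for_snp("rs1234"))
    print(conditions_for_person("Person A"))
    print(people_for_condition("Condition X"))
```

The third function walks the same two relationships as the second, only starting from the other end, which is the point the last two queries make.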

How can I haz?!?!?!1

Step 1: Get local copies
● Hadoop: http://hadoop.apache.org/releases.html#Download
● Neo4j: http://www.neo4j.org/download
● Batch Importer: https://github.com/jexp/batch-import

Step 2: Familiarize yourself with the languages
● MapReduce: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html
● Pig: http://pig.apache.org/docs/r0.12.0/start.html
● Hive: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Step 3: Find a dataset
● Typical starter data: http://www.gutenberg.org/
● Amazon’s public data sets: http://aws.amazon.com/publicdatasets/
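
A common first job on a plain-text corpus such as a Project Gutenberg book is a word count. The sketch below follows the Hadoop Streaming convention of tab-separated key/value lines, with the reducer receiving its input sorted by key. Here both phases run locally in one process. The function names, the sample sentences, and the tokenizing regex are assumptions for illustration.

```python
import re
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, lowercased."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts for runs of identical keys; input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def word_count(lines):
    """Run mapper and reducer locally and return the output lines."""
    # sorted() stands in for the shuffle/sort step between map and reduce.
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    sample = ["It was the best of times,", "it was the worst of times"]
    for line in word_count(sample):
        print(line)
```

Splitting the same two functions into separate scripts that read standard input and write standard output is the usual shape for a Hadoop Streaming job.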

Step 4: Start Playing!!!

Step 5: Take Hadoop to the cloud
● http://aws.amazon.com/elasticmapreduce/

Doing this in production?
http://blog.xebia.com/2012/11/13/combining-neo4j-and-hadoop-part-i/
http://blog.xebia.com/2013/01/17/combining-neo4j-and-hadoop-part-ii/

Thank You
@freethejazz

Image Attribution:
Sand Timer: http://bit.ly/HyCAgy
Money: http://bit.ly/1e4lhS6
Scraggly DNA drawings: Jonathan Freeman :)

{Open Software Integrators} { www.osintegrators.com} {@osintegrators}

Más contenido relacionado

Similar a Hadoop and Neo4j: A Winning Combination for Bioinformatics

Creating Open Data with Open Source (beta2)
Creating Open Data with Open Source (beta2)Creating Open Data with Open Source (beta2)
Creating Open Data with Open Source (beta2)Sammy Fung
 
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphTigerGraph
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Austin Ogilvie
 
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Neo4j
 
Building a Distributed Build System at Google Scale
Building a Distributed Build System at Google ScaleBuilding a Distributed Build System at Google Scale
Building a Distributed Build System at Google ScaleAysylu Greenberg
 
JSON and Oracle Database: A Brave New World
 JSON and Oracle Database: A Brave New World JSON and Oracle Database: A Brave New World
JSON and Oracle Database: A Brave New WorldDaniel McGhan
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioFred Moyer
 
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUG
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUGCreando microservicios con Java, Microprofile y TomEE - Baranquilla JUG
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUGCésar Hernández
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 
Developing in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionDeveloping in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionRobin van Emden
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkEamonn Maguire
 
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural LanguagesData Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural LanguagesIan Huston
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Fully Tested: From Design to MVP In 3 Weeks
Fully Tested: From Design to MVP In 3 WeeksFully Tested: From Design to MVP In 3 Weeks
Fully Tested: From Design to MVP In 3 WeeksSmartBear
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
OPA APIs and Use Case Survey
OPA APIs and Use Case SurveyOPA APIs and Use Case Survey
OPA APIs and Use Case SurveyTorin Sandall
 
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...PyData
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Tugdual Grall
 

Similar a Hadoop and Neo4j: A Winning Combination for Bioinformatics (20)

Creating Open Data with Open Source (beta2)
Creating Open Data with Open Source (beta2)Creating Open Data with Open Source (beta2)
Creating Open Data with Open Source (beta2)
 
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
 
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
 
Building a Distributed Build System at Google Scale
Building a Distributed Build System at Google ScaleBuilding a Distributed Build System at Google Scale
Building a Distributed Build System at Google Scale
 
JSON and Oracle Database: A Brave New World
 JSON and Oracle Database: A Brave New World JSON and Oracle Database: A Brave New World
JSON and Oracle Database: A Brave New World
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
 
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUG
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUGCreando microservicios con Java, Microprofile y TomEE - Baranquilla JUG
Creando microservicios con Java, Microprofile y TomEE - Baranquilla JUG
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 
Developing in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionDeveloping in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit edition
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 Talk
 
Logs & Visualizations at Twitter
Logs & Visualizations at TwitterLogs & Visualizations at Twitter
Logs & Visualizations at Twitter
 
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural LanguagesData Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Fully Tested: From Design to MVP In 3 Weeks
Fully Tested: From Design to MVP In 3 WeeksFully Tested: From Design to MVP In 3 Weeks
Fully Tested: From Design to MVP In 3 Weeks
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
OPA APIs and Use Case Survey
OPA APIs and Use Case SurveyOPA APIs and Use Case Survey
OPA APIs and Use Case Survey
 
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?
 

Último

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Último (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Hadoop and Neo4j: A Winning Combination for Bioinformatics

  • 1. {GraphConnect NYC} Hadoop and Graph Databases (Neo4j): Winning Combination for Bioinformatics Jonathan Freeman @freethejazz {Open Software Integrators} { www.osintegrators.com} {@osintegrators}
  • 2. Hadoop + Neo4j = Bioanalytics Win Open Software Integrators ● Jonathan Freeman @freethejazz Founded January 2008 by Andrew C. Oliver ○ Durham, NC Revenue and staff has at least doubled every year since 2009. ● New office (2012) in Chicago, IL ○ We're hiring associate to senior level as well as UI Developers (JQuery, Javascript, HTML, CSS) ○ Up to 50% travel (probably less), salary + bonus, 401k, health, etc etc ○ Preferred: Java, Tomcat, JBoss, Hibernate, Spring, RDBMS, JQuery ○ Nice to have: Hadoop, Neo4j, MongoDB, Ruby a/o at least one Cloud platform {Open Software Integrators} { www.osintegrators.com} {@osintegrators}
  • 3. Hadoop + Neo4j = Bioinformatics Win Questions to answer ● ● ● ● uhh, bioinformatics? What is Hadoop? Why is it a good fit? And Neo4j? Why the combination? I want this now! How do I do it?!?! {Open Software Integrators} { www.osintegrators.com} {@osintegrators} Jonathan Freeman @freethejazz
  • 4. Bioinformatics
  • 5. “dynamic information processing system”
  • 6. Life http://www.labtimes.org/labtimes/issues/lt2011/lt07/lt_2011_07_26_29.pdf
  • 7. ● Storing/Retrieving Biological Data ● Organizing Biological Data ● Analyzing Biological Data
  • 8. Biological Data ● amino acid sequences ● nucleotide sequences ● protein structures
  • 9. ● Genetic sequence analysis ● Tracing biological evolution ● Analysis of gene expression ● Studying mutations in cancer ● Predicting protein structure and function ● Molecular Interaction
  • 10. ● Genetic sequence analysis ● Tracing biological evolution ● Analysis of gene expression ● Studying mutations in cancer ● Predicting protein structure and function ● Molecular Interaction
  • 11. Full Human Genome Sequencing. Then: 13 Years, $2,700,000,000
  • 12. Full Human Genome Sequencing. Now: 1 Day, $5,000
  • 13. http://www.genome.gov/images/content/cost_per_genome_apr.jpg
  • 14. So what are we waiting for?
  • 16. well, the thing about that…
  • 19. ... ATTCCAGGAGTATTGACACCAT...
  • 23. AGGATTACCAGGA CAAAGGATT TTACCAGGATACCAG TGACAA AAGGATTAC GATACCAGTA CAAGGATT GTGACAA
  • 24. Hadoop
  • 25. Infrastructure for distributed computing ● HDFS: A distributed file system. ● MapReduce: An implementation of a programming model for processing very large data sets.
  • 26. …
  • 29. Infrastructure for distributed computing ● HDFS: A distributed file system. ● MapReduce: An implementation of a programming model for processing very large data sets.
  • 30. AGGATTACCAGGA CAAAGGATT TTACCAGGATACCAG TGACAA AAGGATTAC GATACCAGTA CAAGGATT GTGACAA
  • 31. ... ATTCCAGGAGTATTGACACCAT...
  • 32. 1000 CPU hours
  • 33. 3 hours ● $85 ● OSS: http://bowtie-bio.sourceforge.net/crossbow/index.shtml
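The Hadoop slides above pair short sequence reads with a reference sequence and describe MapReduce as a programming model for very large data sets. A minimal sketch of how those pieces could fit together is shown here in plain Python, with no Hadoop involved. The reference string, the reads, and all function names are hypothetical, and exact substring matching stands in for a real aligner; this is not how Crossbow itself is implemented.

```python
from collections import defaultdict

# Toy data, made up for illustration (not the reads shown on the slides).
REFERENCE = "ATTCCAGGAGTATTGACACCAT"
READS = ["CCAGGAG", "GGAGTAT", "TATTGAC", "GACACCA", "TTTTTTT"]


def map_read(read, reference=REFERENCE):
    """Map: for every exact match of the read, emit (position, 1) per covered base."""
    start = reference.find(read)
    while start != -1:
        for position in range(start, start + len(read)):
            yield position, 1
        start = reference.find(read, start + 1)


def shuffle(pairs):
    """Shuffle: group emitted values by key, the step a framework does for you."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_position(position, counts):
    """Reduce: sum the counts for one reference position to get its coverage."""
    return position, sum(counts)


def run_job(reads):
    """Run map, shuffle, and reduce in-process and return {position: coverage}."""
    emitted = (pair for read in reads for pair in map_read(read))
    grouped = shuffle(emitted)
    return dict(reduce_position(pos, counts) for pos, counts in sorted(grouped.items()))


if __name__ == "__main__":
    for position, depth in run_job(READS).items():
        print(position, REFERENCE[position], depth)
```

Each read is handled independently in the map step, which is why this kind of workload spreads well across many machines: only the grouping by position needs data to move between them.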
  • 34. And Neo4j?
  • 36. MATCH (snp)<-[:INFLUENCED_BY]-(conditions) WHERE snp.id = "rs1234" RETURN conditions;
  • 37. MATCH (p)-[:GENOME_CONTAINS]->(snp), (snp)<-[:INFLUENCED_BY]-(conditions) WHERE p.name = "Jonathan Freeman" RETURN conditions;
  • 38. MATCH (p)-[:GENOME_CONTAINS]->(snp), (snp)<-[:INFLUENCED_BY]-(conditions) WHERE conditions.name = "Parkinsons" RETURN p;
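To make the traversals in the three Cypher queries above concrete, here is a small sketch that answers the same questions over an in-memory edge list in Python. It only mirrors the shape of the patterns and is not a substitute for Neo4j; the people, SNP ids, condition names, and function names are all made up for illustration.

```python
# Hypothetical toy graph; every name and id below is illustrative only.
GENOME_CONTAINS = [  # (person)-[:GENOME_CONTAINS]->(snp)
    ("Alice", "rs1234"),
    ("Alice", "rs5678"),
    ("Bob", "rs5678"),
]
INFLUENCED_BY = [  # (condition)-[:INFLUENCED_BY]->(snp)
    ("Condition A", "rs1234"),
    ("Condition B", "rs5678"),
    ("Condition C", "rs5678"),
]


def conditions_for_snp(snp_id):
    """Slide 36 pattern: which conditions are influenced by one SNP?"""
    return sorted({cond for cond, snp in INFLUENCED_BY if snp == snp_id})


def conditions_for_person(name):
    """Slide 37 pattern: person -> SNPs in their genome -> conditions."""
    snps = {snp for person, snp in GENOME_CONTAINS if person == name}
    return sorted({cond for cond, snp in INFLUENCED_BY if snp in snps})


def people_for_condition(condition):
    """Slide 38 pattern: condition -> SNPs -> people whose genome contains them."""
    snps = {snp for cond, snp in INFLUENCED_BY if cond == condition}
    return sorted({person for person, snp in GENOME_CONTAINS if snp in snps})


if __name__ == "__main__":
    print(conditions_for_snp("rs1234"))
    print(conditions_for_person("Alice"))
    print(people_for_condition("Condition B"))
```

The second and third functions walk the same two relationships in opposite directions, which is the point of the slides: in a graph database the reverse question needs no new join table or index, only a different starting node.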
  • 39. How can I haz?!?!?!1
  • 40. Step 1: Get local copies ● Hadoop: http://hadoop.apache.org/releases.html#Download ● Neo4j: http://www.neo4j.org/download ● Batch Importer: https://github.com/jexp/batch-import
  • 41. Step 2: Familiarize yourself with the languages ● MapReduce: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html ● Pig: http://pig.apache.org/docs/r0.12.0/start.html ● Hive: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
  • 42. Step 3: Find a dataset ● Typical starter data: http://www.gutenberg.org/ ● Amazon’s public data sets: http://aws.amazon.com/publicdatasets/
  • 43. Step 4: Start Playing!!!
  • 44. Step 5: Take Hadoop to the cloud ● http://aws.amazon.com/elasticmapreduce/
  • 45. Doing this in production? ● http://blog.xebia.com/2012/11/13/combining-neo4j-and-hadoop-part-i/ ● http://blog.xebia.com/2013/01/17/combining-neo4j-and-hadoop-part-ii/
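One way to picture the hand-off between the two tools in the steps above: a Hadoop job writes tab-separated result lines, and a small script turns each line into a parameterized Cypher statement for loading into Neo4j. The sketch below is an assumption-laden illustration, not the approach from the linked posts or the batch importer. The `person<TAB>snp` line layout, the `Person` and `Snp` labels, and the function names are hypothetical; only the `GENOME_CONTAINS` relationship type comes from the slides, and the `$param` syntax assumes a current Neo4j version.

```python
# Parameterized Cypher that creates (or reuses) both nodes and the relationship.
# Labels Person and Snp are hypothetical; GENOME_CONTAINS is from the slides.
CYPHER = (
    "MERGE (p:Person {name: $name}) "
    "MERGE (s:Snp {id: $snp_id}) "
    "MERGE (p)-[:GENOME_CONTAINS]->(s)"
)


def parse_line(line):
    """Split one hypothetical 'person<TAB>snp' output line into query parameters."""
    name, snp_id = line.rstrip("\n").split("\t")
    return {"name": name, "snp_id": snp_id}


def to_statements(lines):
    """Pair the Cypher template with parameters for every non-blank line."""
    return [(CYPHER, parse_line(line)) for line in lines if line.strip()]


if __name__ == "__main__":
    sample_output = ["Alice\trs1234\n", "Bob\trs5678\n"]
    for statement, params in to_statements(sample_output):
        print(statement, params)
```

Each `(statement, params)` pair could then be passed to whichever Neo4j client is in use. For very large loads, a bulk tool such as the batch importer from Step 1 is likely a better fit than one statement per line.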
  • 46. Thank You @freethejazz
  • 47. Image Attribution: ● Sand Timer: http://bit.ly/HyCAgy ● Money: http://bit.ly/1e4lhS6 ● Scraggly DNA drawings: Jonathan Freeman :)