Big Data Analyses in Pharma
An Overview
Josef Scheiber, PhD
Managing Director
July 2015
Geography
Startup Center in Waldsassen
Main site
Data Analyses and Software
Development
Westpark Center
Garmischer Str. in Munich
Scientific Activities
Since Jan 1, 2015
Basel/Switzerland
Data Curation and customer-related activities
Prague
150 km
Munich
200 km
200 km
Berlin
300 km
Frankfurt
250 km
BioVariance at a Glance –
Get the most out of your complex data
Curate.Integrate
Analyze.Model
Visualize.Explore
DECIDE
Overview
• Background
• Strategy
• Examples
Background
Courtesy: M. Zeinab, slideshare
What do we need out of Big Data?
1. What are the inhibitors of kinase X and the five most similar
kinases with IC50 < 1 μM and with MW < 500 from all internal and
external data sources?
2. What assay technologies have been used against my kinase?
Which cell lines?
3. What other proteins are in the same kinase branch as target X,
where there were validated chemical hits from external or
internal sources?
4. If I hit a particular kinase, what would the potential side-effect
profile look like? Which known inhibitor of this kinase has the
best safety profile and the fewest known IC50s?
5. Have I identified other compounds with a bioactivity profile
similar to compound X and with the same core substructure?
6. Can we create a phylochemical tree of kinases and for a new
kinase target place it into the tree on the basis of activity against a
reference panel of compounds?
7. Have I identified all kinases with an x-ray structure (in-house or
external) that are in pathway X?
Bridging Chemical and Biological Data: Implications for Pharmaceutical Drug Discovery
JL Jenkins, J Scheiber, D Mikhailov, A Bender, A Schuffenhauer, B Cornett, V Chan, J Kondracki, B Rohde, JW Davies (2012). In: Computational Approaches in Cheminformatics and Bioinformatics. Edited by: A Bender, R Guha. 25-56. John Wiley & Sons, Inc.
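To make question 1 concrete, here is a minimal sketch of the filter it implies once internal and external records sit in one list. The `Activity` record, its field names, and the toy data are assumptions for illustration only; they do not describe any actual schema. The "five most similar kinases" are assumed to have been chosen elsewhere and are simply passed in as a list of targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One measured compound/target activity (hypothetical record layout)."""
    compound: str
    target: str
    ic50_um: float     # IC50 in micromolar
    mol_weight: float  # molecular weight in g/mol
    source: str        # "internal" or "external"


def potent_small_inhibitors(records, targets, max_ic50_um=1.0, max_mw=500.0):
    """Keep records against `targets` with IC50 < max_ic50_um and MW < max_mw."""
    wanted = set(targets)
    hits = [
        r for r in records
        if r.target in wanted
        and r.ic50_um < max_ic50_um
        and r.mol_weight < max_mw
    ]
    # Most potent compounds first.
    return sorted(hits, key=lambda r: r.ic50_um)


# Toy data, invented for this example.
records = [
    Activity("cpd-1", "KINASE_X", 0.05, 420.0, "internal"),
    Activity("cpd-2", "KINASE_X", 2.50, 390.0, "external"),  # IC50 too high
    Activity("cpd-3", "KINASE_Y", 0.80, 610.0, "external"),  # too heavy
    Activity("cpd-4", "KINASE_Y", 0.30, 455.0, "external"),
    Activity("cpd-5", "KINASE_Z", 0.01, 300.0, "internal"),  # target not requested
]

hits = potent_small_inhibitors(records, ["KINASE_X", "KINASE_Y"])
print([h.compound for h in hits])
```

The filter itself is trivial. The hard part the slides point at is getting all sources into one consistent record format first.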
ANSWERS
Context matters!
metabolites
drugs
targets pathways
diseases (phenotypes)
Context matters
RNA DNA
It's not that simple …
Descriptive:
What happened?
Diagnostic:
Why did it happen?
Predictive:
What will happen?
Prescriptive:
How can we make it
happen?
Better data for better analytics
Hindsight Insight Foresight
Need for interpretation
[Figure: chart with a 0-100% axis showing the relative share of Data Analysis, Experiment, and Experimental Design in each era: Before molecular biology, Molecular biology golden age, Genomics age, Deep sequencing age, Very soon]
Big Data?
Volume
Genome Sequencing
Slide adapted from George Church
Cost Reduction - Example
458 Ferrari Spider - $398,000 in 2006 –
40 cents now!
→ Much more data for way less money
Challenges for Informatics? –
1 genome is roughly 500 GB of data
2011 – several 100 exomes
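As a rough back-of-the-envelope sketch, the two figures quoted on these slides (about 500 GB per genome, and a price drop from $398,000 to 40 cents in the car analogy) can be turned into numbers as follows. The cohort size of 1,000 genomes is an assumption chosen only for illustration.

```python
GB_PER_GENOME = 500  # rough per-genome figure quoted on the slide


def storage_tb(n_genomes, gb_per_genome=GB_PER_GENOME):
    """Raw storage in decimal terabytes for n_genomes."""
    return n_genomes * gb_per_genome / 1000


def fold_reduction(old_price_cents, new_price_cents):
    """How many times cheaper the new price is (prices in cents to stay exact)."""
    return old_price_cents / new_price_cents


# Hypothetical cohort of 1,000 genomes.
print(storage_tb(1000), "TB")

# Car analogy from the slide: $398,000 down to 40 cents.
print(fold_reduction(398_000 * 100, 40), "fold cheaper")
```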
Drug Discovery Pipeline
Target
finding
Lead Finding
Lead
Optimization
… Phase 1 … Market
Drug candidates Patients
Velocity
• Mutations in tumor
• Resistance mechanisms in patients
• Long-term/short-term adverse events (AE)
• Compliance
• Nutrition and microbiome
• Data from wearables relevant for drugs
For each patient
Variety
• Bioinformatics
• Clinical
• Social network
• E-health
• Also text/patents
A simplified overview –
Molecules in Man
Adapted from Gohlke JM, Portier CJ.
Environ. Health Perspect. 115:1261-1263 (2007)
A question of complexity – They all interact …
Biology
Chemistry
Physics
Dealing with a very complex environment –
i.e. many opportunities
• DNA
• RNA
• Protein
• Interactions
• Clinical parameters
• Treatment History
• Tissue anatomy
• Surgical History
• Epigenetic Profiles
from many patients at different timepoints
• Target
• Off-targets
• Metabolites
• Additional indications
• Unspecific effects
• Similar drugs
Adapted from: J. Scheiber; How can we enable drug discovery informatics for personalized healthcare?
Expert Opinion on Drug Discovery, 1-6; 2/2011
… individual polypharmacology
Sequences Expression Proteomics Biological networks
(but also: Cells, Tissues, Organs)
POPULATION
Veracity
• Chemogenomics data
• Gene expression data
→ Imputation?
Veracity - Chemogenomics
Adapted from Tanrikulu et al., Missing Value Estimation for Compound-Target Activity Data, J. Mol. Inf
Veracity - Interactomics
A Proteome-Scale Map of the Human Interactome Network
Rolland, Thomas et al.
Cell, Volume 159, Issue 5, 1212-1226
Veracity – Social Media
Strategy
Biological/Pharmacological
Understanding
drugs
targets pathways
diseases (phenotypes)
Data integration strategy
a) A central vocabulary/pointer server (information
stored are preferred names and synonyms plus
pointers to data servers, where to find what)
b) A semantic integration layer with domain-specific
terminology and referential data
c) A database for each datatype collected, storing only
preferred names along with raw measurements
d) Clearly defined APIs for further integration with
public data sources and to enable large-scale
analyses
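Point a) can be pictured with a small sketch: a lookup that stores preferred names and synonyms plus pointers to the data servers holding facts about each entity. The class, its method names, and the server names are hypothetical and only illustrate the idea. The Adalimumab synonyms are the ones listed later in this deck.

```python
class VocabularyServer:
    """Toy vocabulary/pointer server: synonyms -> preferred name -> data servers."""

    def __init__(self):
        self._preferred = {}  # lower-cased term -> preferred name
        self._locations = {}  # preferred name -> set of data-server names

    def register(self, preferred, synonyms=(), servers=()):
        """Store a preferred name, its synonyms, and where its data lives."""
        for term in (preferred, *synonyms):
            self._preferred[term.lower()] = preferred
        self._locations.setdefault(preferred, set()).update(servers)

    def translate(self, term):
        """Map any known synonym to its preferred name (None if unknown)."""
        return self._preferred.get(term.lower())

    def locate(self, term):
        """List the data servers that hold facts about this term."""
        preferred = self.translate(term)
        if preferred is None:
            return []
        return sorted(self._locations[preferred])


vocab = VocabularyServer()
vocab.register(
    "Adalimumab",
    synonyms=["Humira", "Exemptia", "331731-18-1", "L04AB04"],
    servers=["bioactivity-db", "clinical-db"],  # hypothetical server names
)

print(vocab.translate("humira"))
print(vocab.locate("Exemptia"))
```

In this picture the per-datatype databases of point c) only ever store the preferred name, so synonym handling lives in one place.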
Vocabularies needed
• Genes, Drugs, Proteins
• Diseases
• Organisms
• Microbiome species & genes
• Localization & source
• Phenotype
• Metabolite common names
Answering workflow
Vocabulary
Vocabulary server acts as
translator, aggregator and
locator, i.e. knows where
the respective facts can be
found
Firmicutes produce alpha-Linolein and thereby cause gut irritation
species
metabolite
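The example sentence above can be annotated with a simple dictionary lookup, sketched below. The mini-vocabulary is hypothetical. The slide only labels the species and the metabolite; tagging "gut irritation" as a phenotype is an added assumption based on the vocabulary list.

```python
import re

# Hypothetical mini-vocabulary: lower-cased term -> (preferred name, entity type).
VOCAB = {
    "firmicutes": ("Firmicutes", "species"),
    "alpha-linolein": ("alpha-Linolein", "metabolite"),
    "gut irritation": ("gut irritation", "phenotype"),  # assumption, not on the slide
}


def annotate(sentence, vocab=VOCAB):
    """Return (preferred name, entity type) pairs in order of first appearance."""
    lowered = sentence.lower()
    found = []
    for term, (preferred, entity_type) in vocab.items():
        # Match whole terms only, not fragments of longer words.
        match = re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered)
        if match:
            found.append((match.start(), preferred, entity_type))
    return [(preferred, entity_type) for _, preferred, entity_type in sorted(found)]


sentence = "Firmicutes produce alpha-Linolein and thereby cause gut irritation"
for name, entity_type in annotate(sentence):
    print(f"{name}: {entity_type}")
```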
Further
Data of each type is
stored in a specific
database to
enhance
performance of
large-scale analyses
Expert tools talk to
data directly or via
webservices
API
API
API
API
End user interface and visualization
Examples
Genome data at scale
Workflow
Identify drug targets
(primary and off-targets,
from DrugBank)
Call variants on a per-individual basis
Workflow
Analyse mutation rates in
the targets and in
particular drug binding
pockets
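The last workflow step could look roughly like the sketch below. It assumes that variants have already been called and mapped to protein residue positions, and it defines "mutation rate" as the fraction of residue positions with at least one observed variant across all individuals. That definition, the function name, and the toy data are all assumptions, not the method used in the talk.

```python
from collections import defaultdict


def mutation_rates(variants, protein_lengths, pocket_residues):
    """Fraction of mutated positions per target, overall and within the pocket.

    variants: iterable of (individual_id, target, residue_position)
    protein_lengths: {target: number of residues}
    pocket_residues: {target: set of binding-pocket residue positions}
    """
    mutated_positions = defaultdict(set)
    for _individual, target, position in variants:
        mutated_positions[target].add(position)

    rates = {}
    for target, length in protein_lengths.items():
        mutated = mutated_positions.get(target, set())
        pocket = pocket_residues.get(target, set())
        rates[target] = {
            "overall": len(mutated) / length,
            "pocket": len(mutated & pocket) / len(pocket) if pocket else 0.0,
        }
    return rates


# Toy data, invented for this example.
variants = [
    ("p1", "TARGET_A", 10),
    ("p2", "TARGET_A", 10),  # same position seen in a second individual
    ("p2", "TARGET_A", 50),
]
rates = mutation_rates(
    variants,
    protein_lengths={"TARGET_A": 100},
    pocket_residues={"TARGET_A": {10, 11, 12, 13}},
)
print(rates["TARGET_A"])
```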
Example: Donepezil /
Acetylcholinesterase
• PDB 4EY7
Image extracted from Cheung et al.,
2012 [2]
Example: Acetylcholinesterase
Integrative Genomics Viewer
Not very successful
Alignment of the 3D
structures of mutant
number 52 (yellow) and
PDB 4EY7 AChE protein
(green). The only changed
residue is the Y150
(magenta) to H150 (red).
The white surface
represents the molecular
surface of donepezil.
Why is this a bad example?
AChE is a key enzyme in human biology → such enzymes are
among the most highly conserved, even across species
→ Learning: Check conservation before investing
time
Generating
Vocabularies
Vocabulary generation
Extensive mapping of terms from various sources
Vocabulary generation
397211
preferred
names
598532
synonyms
102086
identifiers
The chevron diagram shows the number of samples annotated
with names. The numbers alone show that
mapping everything is non-trivial.
A Big Data exercise in itself …
Tweet mining
Mining Twitter for side effects
Needed Drug Name
and synonyms:
Adalimumab
Humira
Exemptia
331731-18-1
L04AB04
MedDRA vocabulary
Many birds tweet lots of noise …
BUT …
[1] "Lipitor headache 0"
[1] "Lipitor rash 1"
[1] "Lipitor pain 27"
[1] "Lipitor bleeding 0"
[1] "Lipitor cough 0"
[1] "Lisinopril headache 0"
[1] "Lisinopril rash 0"
[1] "Lisinopril pain 8"
[1] "Lisinopril bleeding 0"
[1] "Lisinopril cough 7"
[1] "Simvastatin headache 0"
[1] "Simvastatin rash 0"
[1] "Simvastatin pain 0"
[1] "Simvastatin bleeding 0"
[1] "Simvastatin cough 0"
[1] "Plavix headache 0"
[1] "Plavix rash 0"
[1] "Plavix pain 0"
[1] "Plavix bleeding 1"
[1] "Plavix cough 0"
[1] "Crestor headache 0"
[1] "Crestor rash 0"
[1] "Crestor pain 0"
[1] "Crestor bleeding 0"
[1] "Crestor cough 0"
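Counts like the ones above can be produced by a simple co-mention tally, sketched here in Python. This is not the original script (the `[1] "..."` lines look like R console output). The function name, the naive substring matching, and the toy tweets are assumptions. Atorvastatin is the generic name of Lipitor and stands in for the synonym lists mentioned earlier.

```python
from itertools import product


def count_comentions(tweets, drug_synonyms, effect_terms):
    """Count tweets mentioning a drug (by any synonym) together with an effect term."""
    counts = {(drug, effect): 0 for drug, effect in product(drug_synonyms, effect_terms)}
    for tweet in tweets:
        text = tweet.lower()
        for drug, names in drug_synonyms.items():
            if not any(name.lower() in text for name in names):
                continue
            for effect in effect_terms:
                # Naive substring match; a real pipeline would tokenize first.
                if effect.lower() in text:
                    counts[(drug, effect)] += 1
    return counts


# Toy tweets, invented for this example.
tweets = [
    "Started Lipitor last week, leg pain ever since",
    "Atorvastatin gives me muscle PAIN",
    "Lisinopril and this dry cough again...",
    "Cheap Lipitor at our pharmacy!",  # advertising noise, no effect term
]
drug_synonyms = {
    "Lipitor": ["Lipitor", "atorvastatin"],
    "Lisinopril": ["Lisinopril"],
}
effect_terms = ["headache", "pain", "cough"]

counts = count_comentions(tweets, drug_synonyms, effect_terms)
for (drug, effect), n in counts.items():
    print(f"{drug} {effect} {n}")
```

Substring matching over-counts (for example "pain" inside "painting"), which is one source of the noise mentioned above.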
Top 200 drugs
- Cutoff is at 1500 tweets, which a few drugs easily surpass (although these are mostly pharmacies advertising)
- Others are not mentioned once (probably a synonym issue, as I restricted the search to English)
- Top drugs are tweeted more often, but e.g. Tarceva (in 2006), at the very bottom, also reaches the top number of tweets (109 on list).
089 – 189 6582 – 80
Garmischer Str. 4/V
80339 München
josef.scheiber@biovariance.com
09632 – 9248 325
Konnersreuther Str. 6g
95652 Waldsassen
Questions?

More Related Content

What's hot

Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health CareLeo Barella
 
Big Data in healthcare - opportunities and issues
Big Data in healthcare - opportunities and issuesBig Data in healthcare - opportunities and issues
Big Data in healthcare - opportunities and issuesJaco van Duivenboden
 
Digital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipDigital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipKumaraguru Veerasamy
 
Introduction to Healthcare Analytics
Introduction to Healthcare Analytics Introduction to Healthcare Analytics
Introduction to Healthcare Analytics Experfy
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewHamdaoui Younes
 
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...Health Catalyst
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsMMS Holdings
 
Artificial Intelligence in Healthcare Report
Artificial Intelligence in Healthcare Report Artificial Intelligence in Healthcare Report
Artificial Intelligence in Healthcare Report Mohit Sharma (GAICD)
 
Artificial Intelligence in Medicine and Healthcare
Artificial Intelligence in Medicine and HealthcareArtificial Intelligence in Medicine and Healthcare
Artificial Intelligence in Medicine and HealthcareAgnieszka Maria Walorska
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingHealth Catalyst
 
演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪Beckett Hsieh
 
Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcareDeZyre
 
Artificial intelligence during covid 19 april 2021
Artificial intelligence during covid  19 april 2021Artificial intelligence during covid  19 april 2021
Artificial intelligence during covid 19 april 2021Shazia Iqbal
 
Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides SlideTeam
 

What's hot (20)

Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health Care
 
Big Data in healthcare - opportunities and issues
Big Data in healthcare - opportunities and issuesBig Data in healthcare - opportunities and issues
Big Data in healthcare - opportunities and issues
 
Big Data
Big DataBig Data
Big Data
 
Digital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care RelationshipDigital Healthcare Trends: Transformation Towards Better Care Relationship
Digital Healthcare Trends: Transformation Towards Better Care Relationship
 
Big Data
Big DataBig Data
Big Data
 
Introduction to Healthcare Analytics
Introduction to Healthcare Analytics Introduction to Healthcare Analytics
Introduction to Healthcare Analytics
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ Overview
 
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...
2023 Healthcare Trends: What Leaders Need to Know about the Latest Emerging M...
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health Records
 
Artificial Intelligence in Healthcare Report
Artificial Intelligence in Healthcare Report Artificial Intelligence in Healthcare Report
Artificial Intelligence in Healthcare Report
 
Big Data
Big DataBig Data
Big Data
 
Artificial Intelligence in Medicine and Healthcare
Artificial Intelligence in Medicine and HealthcareArtificial Intelligence in Medicine and Healthcare
Artificial Intelligence in Medicine and Healthcare
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
 
演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪
 
Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcare
 
Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcare
 
Artificial intelligence during covid 19 april 2021
Artificial intelligence during covid  19 april 2021Artificial intelligence during covid  19 april 2021
Artificial intelligence during covid 19 april 2021
 
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE Dr.T.V.Rao MD
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE  Dr.T.V.Rao MDARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE  Dr.T.V.Rao MD
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE Dr.T.V.Rao MD
 
Digital Health Care Technology
Digital Health Care TechnologyDigital Health Care Technology
Digital Health Care Technology
 
Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides Big Data Ppt PowerPoint Presentation Slides
Big Data Ppt PowerPoint Presentation Slides
 

Viewers also liked

Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsPaul Grant
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industrylurdhu agnes
 
New Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionNew Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionDr. Sandeep Juneja
 
Application of BI in pharmaceutical industry
Application of BI in pharmaceutical industryApplication of BI in pharmaceutical industry
Application of BI in pharmaceutical industryBiBoard.Org
 
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftBio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftJosef Scheiber
 
BioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionBioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionJosef Scheiber
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutJosef Scheiber
 
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...Josef Scheiber
 
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryBioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryJosef Scheiber
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneJosef Scheiber
 
Digital Asset Management in Pharma
Digital Asset Management in PharmaDigital Asset Management in Pharma
Digital Asset Management in Pharmaphillycaferacer
 
Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010phillycaferacer
 
Big Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineBig Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineSAP Technology
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive AnalyticsRonald.Ramos
 
20160512 predictive and adaptive approach
20160512   predictive and adaptive approach20160512   predictive and adaptive approach
20160512 predictive and adaptive approachSilvia Fragola
 
Agile 2013 presentation, tom grant
Agile 2013 presentation, tom grantAgile 2013 presentation, tom grant
Agile 2013 presentation, tom grantTom Grant
 
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...Society of Women Engineers
 

Viewers also liked (19)

Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
Analytics in Pharmaceutical Industry
Analytics in Pharmaceutical IndustryAnalytics in Pharmaceutical Industry
Analytics in Pharmaceutical Industry
 
Improving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutionsImproving pharmaceutical marketing using big data solutions
Improving pharmaceutical marketing using big data solutions
 
Data mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industryData mining (DM) in the pharmaceutical industry
Data mining (DM) in the pharmaceutical industry
 
New Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the SolutionNew Pharma Market Reality - Predictive Analytics is the Solution
New Pharma Market Reality - Predictive Analytics is the Solution
 
Application of BI in pharmaceutical industry
Application of BI in pharmaceutical industryApplication of BI in pharmaceutical industry
Application of BI in pharmaceutical industry
 
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draftBio variance j_scheiber_bioit_repurposingworkshop2013_draft
Bio variance j_scheiber_bioit_repurposingworkshop2013_draft
 
BioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile PredictionBioVariance Research Services - Target Profile Prediction
BioVariance Research Services - Target Profile Prediction
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in Noordwijkerhout
 
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
BioVariance Research Services - Mapping Pharmaceutical patents to Biological ...
 
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug DiscoveryBioVariance - Pediatric Pharmacogenomics in Drug Discovery
BioVariance - Pediatric Pharmacogenomics in Drug Discovery
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
 
Digital Asset Management in Pharma
Digital Asset Management in PharmaDigital Asset Management in Pharma
Digital Asset Management in Pharma
 
Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010Legal Content Management on SharePoint 2010
Legal Content Management on SharePoint 2010
 
Big Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineBig Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized Medicine
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive Analytics
 
20160512 predictive and adaptive approach
20160512   predictive and adaptive approach20160512   predictive and adaptive approach
20160512 predictive and adaptive approach
 
Agile 2013 presentation, tom grant
Agile 2013 presentation, tom grantAgile 2013 presentation, tom grant
Agile 2013 presentation, tom grant
 
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
WE Europe 2015: Innovating in disruptive ecosystems: lessons from the life sc...
 

Similar to Big Data in Pharma - Overview and Use Cases

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.Elena Sügis
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsJTADrexel
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for DiscoveryDayOne
 
01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptxHussainTaqi1
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discoverySean Ekins
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challengesinside-BigData.com
 
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...DATAVERSITY
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataChirag Patel
 
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Pistoia Alliance
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)jmoore89
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 
2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdfYue Cathy Chang
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBigData_Europe
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!adcobb
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08Russ Altman
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Sage Base
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 

Similar to Big Data in Pharma - Overview and Use Cases (20)

Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for Discovery
 
01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx
 
acs talk open source drug discovery
acs talk open source drug discoveryacs talk open source drug discovery
acs talk open source drug discovery
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and ChallengesSingle-Cell Sequencing for Drug Discovery: Applications and Challenges
Single-Cell Sequencing for Drug Discovery: Applications and Challenges
 
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
WEBINAR: The Yosemite Project PART 6 -- Data-Driven Biomedical Research with ...
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big data
 
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
Sequence analysis in the regulated domain - A Pistoia Alliance Debates webina...
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf2019-06-21 YC Preso V5.pdf
2019-06-21 YC Preso V5.pdf
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health Domain
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 

Big Data in Pharma - Overview and Use Cases

  • 1. Big Data Analyses in Pharma: An Overview. Josef Scheiber, PhD, Managing Director. July 2015
  • 2. Geography. Startup Center in Waldsassen: main site; data analyses and software development. Westpark Center, Garmischer Str. in Munich: scientific activities. Since Jan 1, 2015, Basel/Switzerland: data curation and customer-related activities. Distances: Prague 150 km, Munich 200 km, Berlin 300 km, Frankfurt 250 km
  • 3. BioVariance at a Glance – Get the most out of your complex data. Curate. Integrate. Analyze. Model. Visualize. Explore. DECIDE
  • 6. Courtesy: M. Zeinab, slideshare
  • 7. What do we need out of Big Data? 1. What are the inhibitors of kinase X and the five most similar kinases with IC50 < 1 μM and with MW < 500 from all internal and external data sources? 2. What assay technologies have been used against my kinase? Which cell lines? 3. What other proteins are in the same kinase branch as target X, where there were validated chemical hits from external or internal sources? 4. If I hit a particular kinase, what would the potential side-effect profile look like? Which known inhibitor of this kinase has the best safety profile and the fewest known IC50s? 5. Have I identified other compounds with a bioactivity profile similar to compound X and with the same core substructure? 6. Can we create a phylochemical tree of kinases and for a new kinase target place it into the tree on the basis of activity against a reference panel of compounds? 7. Have I identified all kinases with an x-ray structure (in-house or external) that are in pathway X? Source: Bridging Chemical and Biological Data: Implications for Pharmaceutical Drug Discovery. JL Jenkins, J Scheiber, D Mikhailov, A Bender, A Schuffenhauer, B Cornett, V Chan, J Kondracki, B Rohde, JW Davies (2012). In: Computational Approaches in Cheminformatics and Bioinformatics. Edited by: A Bender, R Guha. 25-56. John Wiley & Sons, Inc. ANSWERS
  • 10. Descriptive: What happened? Diagnostic: Why did it happen? Predictive: What will happen? Prescriptive: How can we make it happen? Better data for better analytics Hindsight Insight Foresight
  • 11. Need for interpretation [Stacked bar chart, 0% to 100%: share of Data Analysis, Experiment, and Experimental Design in each era: Before molecular biology, Molecular biology golden age, Genomics age, Deep sequencing age, Very soon]
  • 14. Genome Sequencing Slide adapted from George Church
  • 15. Genome Sequencing Slide adapted from George Church
  • 16. Cost Reduction - Example 458 Ferrari Spider - $398,000 in 2006 – 40 cents now!
  • 17. Much more data for way less money
  • 18. Challenges for Informatics? – 1 genome is roughly 500 GB of data – 2011: several hundred exomes
  • 19. Drug Discovery Pipeline Target finding Lead Finding Lead Optimization … Phase 1 … Market Drug candidates Patients
  • 21. Velocity • Mutations in tumor • Resistance mechanisms in patients • long term/short term AE • compliance • Nutrition and microbiome • Data from wearables relevant for drugs
  • 25. Variety • Bioinformatics • Clinical • Social network • E-health • Also text/patents
  • 26. A simplified overview – Molecules in Man Adapted from Gohlke JM, Portier CJ. Environ. Health Perspect. 115:1261-1263 (2007)
  • 27. A question of complexity –They all interact … Biology Chemistry Physics
  • 28. Dealing with a very complex environment – i.e. many opportunities. Patient side: DNA; RNA; Protein; Interactions; Clinical parameters; Treatment history; Tissue anatomy; Surgical history; Epigenetic profiles from many patients at different time points. Drug side: Target; Off-targets; Metabolites; Additional indications; Unspecific effects; Similar drugs. Adapted from: J. Scheiber; How can we enable drug discovery informatics for personalized healthcare? Expert Opinion on Drug Discovery, 1-6; 2/2011
  • 30. Sequences Expression Proteomics Biological networks (but also: Cells, Tissues, Organs) POPULATION
  • 32. Veracity • Chemogenomics data • Gene expression data  Imputation?
  • 33. Veracity - Chemogenomics Adapted from Tanrikulu et al. Missing Value Estimation for Compound-Target Activity Data, J. Mol. Inf
  • 34. Veracity - Interactomics A Proteome-Scale Map of the Human Interactome Network. Rolland, Thomas et al. Cell, Volume 159, Issue 5, 1212-1226
  • 39. Data integration strategy a) A central vocabulary/pointer server (it stores preferred names and synonyms plus pointers to data servers, i.e. where to find what) b) A semantic integration layer with domain-specific terminology and referential data c) A database for each data type collected, storing only preferred names along with raw measurements d) Clearly defined APIs for further integration with public data sources and to enable large-scale analyses
  • 40. Vocabularies needed • Genes, Drugs, Proteins • Diseases • Organisms • Microbiome species & genes • Localization & source • Phenotype • Metabolite common names
  • 41. Answering workflow. Vocabulary: the vocabulary server acts as translator, aggregator and locator, i.e. it knows where the respective facts can be found. Example: "Firmicutes produce alpha-Linolein and thereby cause gut irritation" (Firmicutes: species; alpha-Linolein: metabolite). Further: data of each type is stored in a specific database to enhance the performance of large-scale analyses. Expert tools talk to the data directly or via web services (APIs). End-user interface and visualization
  • 43. Genome data at scale
  • 44. Workflow: Identify drug targets (primary and off-targets, from DrugBank). Call variations on a per-individual basis
  • 45. Workflow Analyse mutation rates in the targets and in particular drug binding pockets
  • 46. Example: Donepezil / Acetylcholinesterase • PDB 4EY7 Image extracted from Cheung et al., 2012 [2]
  • 49. Not very successful. Alignment of the 3D structures of mutant number 52 (yellow) and PDB 4EY7 AChE protein (green). The only changed residue is Y150 (magenta) to H150 (red). The white surface represents the molecular surface of donepezil.
  • 50. Why is this a bad example? AChE is a key enzyme in human biology; such enzymes are the most highly conserved, even between species. Learning: look at conservation before investing time
  • 52. Vocabulary generation Extensive mapping of terms from various sources
  • 53. Vocabulary generation: 397211 preferred names, 598532 synonyms, 102086 identifiers. The chevron diagram shows the number of samples annotated with names. Already by looking at the numbers you can see that mapping everything is non-trivial. A Big Data exercise in itself …
  • 55. Mining Twitter for side effects. Needed: drug name and synonyms (Adalimumab, Humira, Exemptia, 331731-18-1, L04AB04); MedDRA vocabulary
  • 56. Many birds tweet lots of noise … BUT … Counts per drug and side effect (headache / rash / pain / bleeding / cough): Lipitor 0 / 1 / 27 / 0 / 0; Lisinopril 0 / 0 / 8 / 0 / 7; Simvastatin 0 / 0 / 0 / 0 / 0; Plavix 0 / 0 / 0 / 1 / 0; Crestor 0 / 0 / 0 / 0 / 0
  • 57. Top 200 drugs: The cutoff is at 1500 tweets, which a few drugs easily surpass (although it is mostly pharmacies advertising). Others are not mentioned once (probably a synonym issue, as I restricted the language to English). Top drugs are tweeted more often, but e.g. Tarceva (in 2006), at the very bottom, also reaches the top number of tweets (109 on list).
  • 58. Questions? Munich: Garmischer Str. 4/V, 80339 München, 089 – 189 6582 – 80. Waldsassen: Konnersreuther Str. 6g, 95652 Waldsassen, 09632 – 9248 325. josef.scheiber@biovariance.com
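
The vocabulary/pointer server described on slides 39 and 41 (translator and locator) could be sketched roughly as follows. This is a minimal in-memory illustration, not the implementation used in the talk: the class name `VocabularyServer`, its methods, and the data-server names `drug_db` and `tweet_db` are all hypothetical. The synonyms are the ones listed on slide 55.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyServer:
    """Hypothetical sketch: synonyms -> preferred name -> pointers to data servers."""
    synonyms: dict = field(default_factory=dict)  # lower-cased term -> preferred name
    pointers: dict = field(default_factory=dict)  # preferred name -> set of server names

    def register(self, preferred, synonyms=(), servers=()):
        # The preferred name is also stored as its own synonym.
        for term in (preferred, *synonyms):
            self.synonyms[term.lower()] = preferred
        self.pointers.setdefault(preferred, set()).update(servers)

    def translate(self, term):
        # Translator role: map any known synonym to the preferred name.
        return self.synonyms.get(term.lower())

    def locate(self, term):
        # Locator role: say which data servers hold facts about the term.
        preferred = self.translate(term)
        if preferred is None:
            return set()
        return set(self.pointers[preferred])


vocab = VocabularyServer()
vocab.register(
    "Adalimumab",
    synonyms=["Humira", "Exemptia", "331731-18-1", "L04AB04"],
    servers=["drug_db", "tweet_db"],  # assumed server names, for illustration only
)
```

With this sketch, `vocab.translate("humira")` resolves to the preferred name and `vocab.locate("L04AB04")` returns the servers to query, so the per-data-type databases only need to store preferred names.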
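
The counts on slide 56 pair a drug with a side-effect term. One simple way to obtain such counts, assuming tweets are already available as plain strings, is to count tweets that mention any synonym of a drug together with a side-effect term. The sketch below is an assumption about the approach, not the script behind the slides (which appear to have been produced in R); the function name `count_comentions` and the example tweets are invented for illustration.

```python
import re
from collections import Counter


def count_comentions(tweets, drug_synonyms, side_effects):
    """Count tweets mentioning a drug (by any synonym) together with a side effect.

    Only single-word terms are matched; multi-word vocabulary terms would
    need phrase matching instead of token lookup.
    """
    counts = Counter()
    for text in tweets:
        tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
        for drug, names in drug_synonyms.items():
            if not tokens & {name.lower() for name in names}:
                continue  # this tweet does not mention the drug
            for effect in side_effects:
                if effect.lower() in tokens:
                    counts[(drug, effect)] += 1  # at most once per tweet
    return counts


# Invented example tweets, for illustration only.
example_tweets = [
    "Humira gave me a rash",
    "my humira injection: pain, pain, pain",
    "Adalimumab news",
    "headache today",
]
example_counts = count_comentions(
    example_tweets,
    {"Adalimumab": ["Adalimumab", "Humira", "Exemptia"]},
    ["headache", "rash", "pain"],
)
```

Matching on synonyms rather than on a single name addresses the synonym issue noted on slide 57; it does not filter out advertising tweets, which would need a separate step.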