The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
1. The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
Andrew Su, Ph.D.
@andrewsu
[[User:Andrew Su]]
http://sulab.org
August 23, 2017
WMF Research
Slides: slideshare.net/andrewsu
3. The biomedical literature is massive…
[Figure: Number of new PubMed-indexed articles per year, 1985 to 2015; y-axis from 0 to 1,400,000]
11. Biomedical databases
Applications
1. Text mining biological annotations
2. Editor engagement via peer-review dual publication model
3. Embedding structured data using Wikipedia templates
https://www.ncbi.nlm.nih.gov/pubmed/22434829
The expression of the protein has been found to be significantly lower in [[schizophrenia]] and psychotic...
The expression of the protein has been found to be significantly lower in {{SWL|type=decreased expression|target=schizophrenia}} and psychotic...
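The SWL template above carries a relation type and a target inside ordinary wikitext, so it can be read back out as structured data. The following is a minimal sketch of that idea, not the Gene Wiki's own tooling: the function name `extract_swl`, the regular expression, and the subject label `GENE_X` are assumptions made for illustration, and nested templates are not handled.

```python
import re

# Matches a flat {{SWL|...}} template (assumption: no nested templates inside it).
SWL_PATTERN = re.compile(r"\{\{SWL\|([^{}]*)\}\}")


def extract_swl(wikitext, subject):
    """Return (subject, type, target) triples for each SWL template in wikitext."""
    triples = []
    for match in SWL_PATTERN.finditer(wikitext):
        params = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:
                # Collapse line breaks and repeated spaces inside a value.
                params[key.strip()] = " ".join(value.split())
        if "type" in params and "target" in params:
            triples.append((subject, params["type"], params["target"]))
    return triples


text = (
    "The expression of the protein has been found to be significantly lower in "
    "{{SWL|type=decreased expression|target=schizophrenia}} and psychotic..."
)
# GENE_X is a hypothetical subject; the slide does not name the gene.
print(extract_swl(text, "GENE_X"))
# [('GENE_X', 'decreased expression', 'schizophrenia')]
```

The page's gene would normally supply the subject of each triple; here it is passed in explicitly to keep the sketch self-contained.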
12. is to data
is to text
biomedical
Provide a database of the world’s knowledge that anyone can edit
- Denny Vrandečić
21. Leveraging the Disease Ontology structure
“Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane”
31 genes / 8 diseases
diseaseGALabel | gene_counts | geneList
asthma | 15 | SMAD3, RAP1GAP2, IL18R1, HPSE2, SLC30A8, SLC22A5, PSAP, ERBB4, HLA-DQA1, IGSF3, IL2RB, IL6R, NOTCH4, PDE4D, RAD50
chronic obstructive pulmonary disease | 5 | HLA-C, SFTPD, ANXA5, ANXA11, ATP2C2
lung cancer | 3 | TGM5, VTI1A, PHACTR2
interstitial lung disease | 2 | DSP, ATP11A
non-small-cell lung carcinoma | 2 | NALCN, DLST
nasopharynx carcinoma | 2 | ITGA9, TNFRSF19
adenocarcinoma of the lung | 1 | BTNL2
pulmonary emphysema | 1 | BICD1
http://bit.ly/bosc2017_wikidata
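A query of this shape can be sketched as SPARQL against the Wikidata Query Service. The version below is an illustration under stated assumptions, not the query behind the slide: the property mapping (P2293 genetic association, P688 encodes, P681 cell component, P279 subclass of) is one plausible reading of the question, the helper names `build_query` and `run_query` are hypothetical, and the two root item IDs are left as parameters because they are not given in the slides and need to be looked up on Wikidata.

```python
import json
import urllib.parse
import urllib.request


def build_query(disease_root, component_root):
    """Build a SPARQL query: genes associated with any subclass of disease_root
    whose encoded product is localized to a subclass of component_root."""
    return f"""
SELECT ?diseaseGALabel (COUNT(DISTINCT ?gene) AS ?gene_counts)
       (GROUP_CONCAT(DISTINCT ?geneLabel; separator=", ") AS ?geneList)
WHERE {{
  ?gene wdt:P2293 ?diseaseGA .                      # genetic association
  ?diseaseGA wdt:P279* wd:{disease_root} .          # any subclass of the disease root
  ?gene wdt:P688 ?protein .                         # gene encodes protein
  ?protein wdt:P681 ?component .                    # cell component
  ?component wdt:P279* wd:{component_root} .        # any subclass of the component root
  ?gene rdfs:label ?geneLabel . FILTER(LANG(?geneLabel) = "en")
  ?diseaseGA rdfs:label ?diseaseGALabel . FILTER(LANG(?diseaseGALabel) = "en")
}}
GROUP BY ?diseaseGALabel
ORDER BY DESC(?gene_counts)
"""


def run_query(query, endpoint="https://query.wikidata.org/sparql"):
    """Send the query to a SPARQL endpoint and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The User-Agent string is a placeholder; use one that identifies your tool.
    request = urllib.request.Request(url, headers={"User-Agent": "gene-wiki-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Placeholders: replace with the Wikidata item IDs for the respiratory disease
# class and the cell membrane component before calling run_query.
query = build_query("Q_RESPIRATORY_DISEASE", "Q_CELL_MEMBRANE")
```

The `wdt:P279*` path is what uses the Disease Ontology structure: it matches the root class and every class below it, so a gene linked to asthma is returned for a question about respiratory disease in general.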
22. Opportunistic integration
“Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane and show causative chemical hazards”
4 diseases / 6 chemical hazards
diseaseGALabel | exposureLabel
lung cancer | arsenic pentoxide exposure
lung cancer | HN1 exposure
lung cancer | mechlorethamine exposure
lung cancer | HN3 exposure
asthma | Phenacyl chloride exposure
pulmonary emphysema | phosgene exposure
http://bit.ly/bosc2017_wikidata
23. Biomedical databases
Applications
1. Text mining biological annotations
2. Editor engagement via peer-review dual publication model
3. Embedding structured data using Wikipedia templates
Applications
1. Demonstrating integrative biomedical queries
2. Building domain-specific web applications
25. Chlambase.org for the Chlamydia research community
Community-specific structured knowledge
Genetic mutants, gene expression, host-pathogen interactions, orthologs, …
27. Thoughts for the future (TFTF)
https://www.pexels.com/photo/telescope-view-binoculars-viewpoint-4754/
28. TFTF #1: Need incentives for data owners → data contributors
Direct measures of usage
• SPARQL query logs
• Network interconnectedness
• Other ideas?
29. TFTF #2: Need functional integration of WP and WD edit histories
1. Statement-level filtering
2. Across all sourced WD items (arbitrary access)
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine/Archive_92
30. TFTF #3: Need more expressive data modeling and reporting
• Defining data models and constraints (ShEx?)
• Visualizing and disseminating models
• Reporting violations and auto-suggesting fixes
https://github.com/SuLab/Genewiki-ShEx
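The idea of defining a data model and reporting violations can be illustrated with a much simpler stand-in than ShEx. The sketch below is not ShEx and is not taken from the Genewiki-ShEx repository: the schema format, the function name `find_violations`, and the example item are all assumptions. It only checks how many values an item has for each property, using P351 (Entrez Gene ID) and P703 (found in taxon) as example properties for a gene item.

```python
def find_violations(item, schema):
    """Check an item against a cardinality schema.

    item:   dict mapping property ID -> list of values
    schema: dict mapping property ID -> (min_count, max_count),
            where max_count may be None for "no upper bound"
    Returns a list of human-readable violation messages.
    """
    violations = []
    for prop, (min_count, max_count) in schema.items():
        count = len(item.get(prop, []))
        if count < min_count:
            violations.append(
                f"{prop}: expected at least {min_count} value(s), found {count}"
            )
        elif max_count is not None and count > max_count:
            violations.append(
                f"{prop}: expected at most {max_count} value(s), found {count}"
            )
    return violations


# Hypothetical model for a gene item: exactly one Entrez Gene ID and one taxon.
gene_schema = {"P351": (1, 1), "P703": (1, 1)}

# Hypothetical item that is missing its taxon statement.
example_item = {"P351": ["1017"], "P703": []}

for message in find_violations(example_item, gene_schema):
    print(message)
# P703: expected at least 1 value(s), found 0
```

A real shape language also constrains value types and linked items, which is where auto-suggested fixes become possible; the cardinality check here only shows the reporting step.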