Semantic Technology empowering Real World outcomes in Biomedical Research and Clinical Practices
1. Semantic technology empowering real world outcomes in biomedical research and clinical practices
Talk presented at Case Western Reserve University on Nov 26, 2012
Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio
http://knoesis.org
http://knoesis.org/amit/hcls
Special thanks: Sujan Perera
6. Alan Smith, Vinh Nguyen, Hemant Purohit, Sujan Perera, Wenbo Wang, Cory Henson, Pramod Koneru, Amit Sheth, Kalpa Gunaratna, Maryam Panahiazar, Ashutosh Jadhav, Sanjaya Wijeratne, Pramod Anantharam, Prateek Jain, Pavan Kapanipathi, Lu Chen, Delroy Cameron, Sarasi Lalithsena, Ajith Ranabahu
7. Semantic Web
Objective
• Improve Insight from Biomedical Data
• Improve Clinical Decision Making
Challenges
• Vastness/Volume
• Velocity
• Variety/Heterogeneity
• Vagueness, Uncertainty, Inconsistency, Deceit
Approach
• Improve the machine understandability and processing of data of all types through:
• Modeling and Background Knowledge
• Annotation
• Complex Querying/Analysis, Reasoning
8. Semantic Web technology stack (top to bottom)
• User interface and applications
• Trust
• Proof
• Unifying logic
• Querying: SPARQL | Ontologies: OWL | Rules: RIF/SWRL
• Taxonomies: RDFS
• Data interchange: RDF
• Syntax: XML
• Identifiers: URI | Character set: UNICODE
• Cryptography (vertical layer alongside the stack)
Highlighted groupings: Knowledge Representation; Data/Knowledge Representation; Querying
12. Applications
Biomedical
• Semantic Search and Browsing (Doozer++, SCOONER, iExplore)
• Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE)
Epidemiology
• PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE)
Healthcare
• Active Semantic Electronic Medical Record (ASEMR)
• Mining and Analysis of EMR (ezFIND, ezMeasure)
• kHealth
13. Some of the semantic tools: Doozer++, SCOONER, iExplore
• Knowledge Exploration
• Insights
• Hypothesis Generation
• Intuitive Browsing
• Personalization
• Better Understanding
14. Knowledge Acquisition – Doozer++
• Building ontologies is costly
• A large volume of knowledge is available in semi-structured/unstructured formats
• There is no assurance of the credibility of such knowledge
19. Doozer++ Demo
Knowledge Acquisition from Community-Generated Content
Continuous Semantics to Analyze Real-Time Data, IEEE Internet Computing (Volume 14)
20. Beyond Hierarchy
• Identify Relationships
• Textual pattern-based extraction for known relationships
• Facts available in background knowledge
• Find evidence for such facts
• Combined evidence from many different patterns increases the certainty of a relationship between the entities
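The last bullet says that evidence from several patterns raises certainty. One common way to model this, assumed here and not confirmed as Doozer++'s actual scoring, is a noisy-OR: if each matching pattern independently supports the relationship with confidence c_i, the combined certainty is 1 - Π(1 - c_i). The pattern strings and confidence values below are hypothetical.

```python
from math import prod


def combined_certainty(pattern_confidences: list[float]) -> float:
    """Noisy-OR combination (an assumed model): probability that at least one
    independent piece of pattern evidence for the relationship is correct."""
    for c in pattern_confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
    return 1.0 - prod(1.0 - c for c in pattern_confidences)


# Hypothetical textual patterns that matched a candidate fact (X, treats, Y),
# each with an assumed reliability.
evidence = {
    "X is a treatment for Y": 0.6,
    "Y is treated with X": 0.5,
    "X, used to treat Y": 0.4,
}

single = combined_certainty([evidence["X is a treatment for Y"]])
combined = combined_certainty(list(evidence.values()))
print(f"one pattern: {single:.2f}, all patterns: {combined:.2f}")
```

Adding a pattern can only increase the combined score, which matches the slide's claim; the independence assumption is what a real system would need to check.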
21. Validating Knowledge
• Evaluating acquired knowledge
• Explicit
• Users can vote for facts
• Facts are presented based on user interests
• Implicit
• A user’s browsing history is used as an indication of which propositions are correct and interesting
• Validated knowledge is then added back to the community
22. Building Human Performance &
Cognition Ontology (HPCO)
• Initial KB creation: Keywords → HPC Base Hierarchy from Wikipedia, combined with SenseLab Neuroscience Ontologies and focused pattern-based extraction → Meta Knowledgebase
• Triples from PubMed Abstracts: Kno.e.sis NLP-based triples and NLM rule-based BKR triples
• Merge → Enriched Knowledgebase
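The speaker notes state that in the Merge phase only triples whose subject and object can be mapped to the initial KB are added to the enriched KB. A minimal sketch of that filter, assuming a case-insensitive exact-match mapping (the real mapping step is not described); the concepts, triples and the `has_part` predicate are hypothetical.

```python
Triple = tuple[str, str, str]


def normalize(term: str) -> str:
    # Assumed mapping step: case-insensitive exact match on the concept label.
    return term.strip().lower()


def merge_triples(initial_kb_concepts: set[str],
                  candidate_triples: list[Triple]) -> list[Triple]:
    """Keep only triples whose subject AND object map to a concept that is
    already present in the initial knowledge base."""
    known = {normalize(c) for c in initial_kb_concepts}
    return [
        (s, p, o) for (s, p, o) in candidate_triples
        if normalize(s) in known and normalize(o) in known
    ]


initial_kb = {"S100B", "synaptic plasticity", "olfactory bulb", "mitral cell"}
candidates = [
    ("S100B", "modulates", "synaptic plasticity"),
    ("olfactory bulb", "has_part", "Mitral cell"),
    ("S100B", "binds", "calcium"),  # 'calcium' is not in the initial KB
]
enriched = merge_triples(initial_kb, candidates)
print(enriched)
```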
23. Use Case for HPCO
• Number of Entities: 2 million
• Number of non-trivial facts: 3 million
• NLP-based*: calcium-binding protein S100B modulates long-term synaptic plasticity
• Pattern-based**: Olfactory Bulb has physical part of anatomic structure Mitral cell
* Joint Extraction of Compound Entities and Relationships from Biomedical Literature, Web Intel. 2008
* A Framework for Schema-Driven Relationship Discovery from Unstructured Text, ISWC 2006
** On Demand Creation of Focused Domain Models using Top-down and Bottom-up Information Extraction, Technical Report
25. SCOONER Demo
An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains, IHI 2012 - 2nd ACM SIGHIT International Health Informatics Symposium
31. Active Semantic Electronic Medical
Record - ASEMR
• New drugs
• Add interactions with current drugs
• Change possible procedures to treat an illness
• Insurance coverage changes
• Will pay for drug X, but not Y
• May need certain diagnosis before expensive tests
• Physicians are required to keep track of an ever-changing landscape
32. ASEMR – Active Semantic Document
• A Document
• With semantic annotations
• entities linked to ontology
• terms linked to specialized lexicon
• With actionable information
• rules over semantic annotations
• rule violation indicated with alerts
Example text: “Atrial fibrillation with prior stroke, currently on Pradaxa, doing well. Mild glucose intolerance and hyperlipidemia, being treated by primary care.”
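The slide describes an active document as semantic annotations (terms linked to ontology entities) plus rules over those annotations that raise alerts. A minimal sketch of that idea, assuming a dictionary lexicon and a single drug-interaction rule; the lexicon, the concept ids, "Drug X" and the interaction fact are all hypothetical and not clinical knowledge.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: surface term -> (ontology class, concept id).
LEXICON = {
    "atrial fibrillation": ("Disease", "AtrialFibrillation"),
    "stroke": ("Disease", "Stroke"),
    "pradaxa": ("Drug", "Pradaxa"),
    "drug x": ("Drug", "DrugX"),
}

# Hypothetical fact from a drug ontology, used by the rule below.
INTERACTS_WITH = {frozenset({"Pradaxa", "DrugX"})}


@dataclass(frozen=True)
class Annotation:
    text: str
    cls: str
    concept: str


def annotate(document: str) -> list[Annotation]:
    """Link every lexicon term found in the document to its ontology concept."""
    lowered = document.lower()
    return [
        Annotation(term, cls, concept)
        for term, (cls, concept) in LEXICON.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def interaction_alerts(annotations: list[Annotation]) -> list[str]:
    """Rule over annotations: two annotated drugs that interact raise an alert."""
    drugs = sorted({a.concept for a in annotations if a.cls == "Drug"})
    alerts = []
    for i, d1 in enumerate(drugs):
        for d2 in drugs[i + 1:]:
            if frozenset({d1, d2}) in INTERACTS_WITH:
                alerts.append(f"ALERT: {d1} interacts with {d2}")
    return alerts


# The slide's example text, extended with a hypothetical new order.
note = ("Atrial fibrillation with prior stroke, currently on Pradaxa, "
        "doing well. Start Drug X.")
anns = annotate(note)
print([a.concept for a in anns])
print(interaction_alerts(anns))
```

Keeping the rules separate from the lexicon mirrors the slide's split between annotations and actionable information: when drug or insurance knowledge changes, only the ontology facts need updating.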
33. ASEMR – Active Semantic Patient Record
• Type of ASD
• Three ontologies
• Practice: information about the practice, such as patient/physician data
• Drug: information about drugs, interactions, formularies, etc.
• ICD/CPT: describes the relationships between CPT and ICD codes
34. ASEMR – Practice Ontology Hierarchy
Classes shown under owl:thing: facility, ancillary, ambulatory_episode, encounter, event, person, patient, practitioner, insurance, insurance_carrier, insurance_plan, insurance_policy
37. Charts before ASEMR: charts per month, Same Day vs. Back Log, Jan 04 to Jul 05 (Month/Year on the x-axis; charts from 0 to 600 on the y-axis)
38. After ASEMR: charts per month, Same Day vs. Back Log, Sept 05 to Mar 06 (Month/Year on the x-axis; charts from 0 to 700 on the y-axis)
39. ASEMR - Benefits
• Error Prevention
• Patient care
• Insurance
• Decision Support
• Patient satisfaction
• Reimbursement
• Efficiency/Time
• Real-time chart completion
• “Semantic” and automated linking with billing
41. Semantics and Services enabled Problem Solving Environment for T.cruzi - SPSE
• The majority of experimental data reside in labs
• Integration of lab data facilitates new insights
• Formulating queries against such data requires deep technical knowledge
A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi, 2012
43. SPSE
• Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
• Developed a semantic provenance framework and influenced the W3C community
• SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
• Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
• Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
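The first example query combines an expression condition with a pathway-count condition. SPSE answers such questions over its integrated semantic data; the sketch below only illustrates the logic of the query over in-memory records, and the protein ids, expression calls and pathway memberships are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    protein_id: str                      # hypothetical identifiers
    regulation: dict[str, str]           # life-cycle stage -> "up" / "down"
    pathways: set[str] = field(default_factory=set)


def downregulated_in_single_pathway(proteins: list[Protein], stage: str) -> list[str]:
    """Proteins downregulated at `stage` that belong to exactly one pathway."""
    return [
        p.protein_id for p in proteins
        if p.regulation.get(stage) == "down" and len(p.pathways) == 1
    ]


proteins = [
    Protein("P1", {"epimastigote": "down"}, {"glycolysis"}),
    Protein("P2", {"epimastigote": "down"}, {"glycolysis", "pentose phosphate"}),
    Protein("P3", {"epimastigote": "up"}, {"glycolysis"}),
    Protein("P4", {"amastigote": "down"}, {"TCA cycle"}),
]
print(downregulated_in_single_pathway(proteins, "epimastigote"))  # ['P1']
```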
44. SPSE
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
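Inference rules make implicit knowledge explicit by deriving new facts from existing ones until nothing new appears. A minimal forward-chaining sketch over triples, assuming two illustrative rules (orthology is symmetric; participation propagates from a pathway to the larger pathway it is part of); the rules, gene ids and pathway names are hypothetical, not SPSE's actual rule set.

```python
Triple = tuple[str, str, str]


def symmetric_orthology(facts: set[Triple]) -> set[Triple]:
    # Rule 1 (assumed): (a ortholog_of b) -> (b ortholog_of a)
    return {(o, p, s) for (s, p, o) in facts if p == "ortholog_of"}


def pathway_propagation(facts: set[Triple]) -> set[Triple]:
    # Rule 2 (assumed): (g participates_in pw) and (pw part_of sup)
    #                   -> (g participates_in sup)
    part_of = {(s, o) for (s, p, o) in facts if p == "part_of"}
    return {
        (g, "participates_in", sup)
        for (g, p, pw) in facts if p == "participates_in"
        for (sub, sup) in part_of if sub == pw
    }


RULES = [symmetric_orthology, pathway_propagation]


def forward_chain(facts: set[Triple]) -> set[Triple]:
    """Apply every rule repeatedly until no new triple is derived (a fixpoint)."""
    closure = set(facts)
    while True:
        derived = set().union(*(rule(closure) for rule in RULES)) - closure
        if not derived:
            return closure
        closure |= derived


facts = {
    ("TcGene1", "ortholog_of", "LmGene1"),
    ("TcGene1", "participates_in", "glycolysis"),
    ("glycolysis", "part_of", "carbohydrate metabolism"),
}
print(sorted(forward_chain(facts) - facts))
```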
45. Knowledge Enrichment from Data
• So many ontologies
• Rich in number of concepts
• Mostly concentrated on taxonomical relationships
• Applications require domain relationships
• A is_symptom_of B
• C is_treated_with D
47. Knowledge Enrichment from Data
Background knowledge + EMR → IntellegO → Modified background knowledge
An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ontology 2011
Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare, BIBM 2012
48. Knowledge Enrichment from Data
From EMR, diseases: atrial fibrillation, hypertension, diabetes
From EMR, symptoms: chest pain, weight gain, discomfort in chest, rash skin, cough, weight loss, headache, edema, shortness of breath
From KB, symptoms of these diseases: fatigue, syncope, weight loss, chest pain, discomfort in chest, dizzy, shortness of breath, nausea, vomiting, headache, cough, weight gain
Questions generated for edema:
• Is edema a symptom of atrial fibrillation?
• Is edema a symptom of hypertension?
• Is edema a symptom of diabetes?
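The speaker notes describe the procedure: extract the diseases in a document, generate all symptoms the knowledge base links to them, extract the symptoms in the document, and turn every symptom the knowledge base cannot explain into a question for a domain expert. A minimal sketch of those steps, assuming the extraction has already been done; the knowledge-base entries below are a hypothetical subset.

```python
def enrichment_questions(doc_diseases: set[str],
                         doc_symptoms: set[str],
                         kb_symptoms_of: dict[str, set[str]]) -> list[str]:
    """Ask about symptoms in the EMR that no extracted disease explains."""
    # All symptoms the KB associates with the diseases found in the document.
    explained = set().union(*(kb_symptoms_of.get(d, set()) for d in doc_diseases))
    unexplained = sorted(doc_symptoms - explained)
    return [
        f"Is {s} a symptom of {d}?"
        for s in unexplained
        for d in sorted(doc_diseases)
    ]


# Hypothetical KB subset: disease -> known symptoms.
kb = {
    "atrial fibrillation": {"syncope", "fatigue", "shortness of breath"},
    "hypertension": {"headache", "chest pain"},
    "diabetes": {"weight loss", "fatigue"},
}
questions = enrichment_questions(
    {"atrial fibrillation", "hypertension", "diabetes"},
    {"chest pain", "headache", "edema", "weight loss"},
    kb,
)
for q in questions:
    print(q)
```

Questions are generated only for unexplained symptoms, rather than for every disease-symptom pair (5,000 pairs for 50 diseases and 100 symptoms, per the speaker notes), which is where the reduction in expert workload comes from.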
49. Knowledge Enrichment from Data
Built with the above method + UMLS, healthline.com, druglib.com
Domains: Cardiology, Orthopedics, Oncology, Neurology, etc.
Number of concepts: 1,008,161
• Problems (diseases, symptoms): 125,778
• Procedures: 262,360
• Medicines: 298,993
• Medical Devices: 33,124
Relationships: 77,261
• is treated with (disease → medication): 41,182
• is relevant procedure (procedure → disease): 3,352
• is symptom of (symptom → disease): 8,299
• contraindicated drug (medication → disease): 24,428
50. Healthcare Challenge
• 80% of healthcare data is unstructured
• This poses challenges in
• Searching
• Understanding
• Mining
• Knowledge discovery
• Decision support
• Evidence-based medicine
• Federal policies promote meaningful use
51. Healthcare Challenge
Coding complexity: ICD-9 vs. ICD-10
• Diagnostic codes: 14,000 (ICD-9) vs. 69,000 (ICD-10)
• Procedure codes: 3,800 (ICD-9) vs. 72,000 (ICD-10)
ICD-9 (current) → ICD-10 (1st Oct, 2014) conversion: clinical documentation & coding-billing challenges
Example: 821.01 is the ICD-9 code for a “closed” fractured femur, or thigh bone. It translates to 36 codes in ICD-10 with details regarding the precise nature of the fracture, which thigh was fractured, whether a delay in healing occurred, etc.
52. Healthcare Challenge
Need to Do Better
• Traditional methods don’t work
• Understanding the context is crucial
53. Healthcare Challenge – The Solution
Search, NLP, Mining + Semantics → Decision Support, Knowledge Discovery, Evidence-based Medicine
55. ezHealth - Benefits
• Advanced search
• All hypertension patients with ejection fraction <40%
• All MI patients who are taking either beta-blockers or ACE inhibitors
• Patients diagnosed with Atrial Fibrillation on Coumadin or Lovenox
• Support core-measure initiative
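A query such as "MI patients taking either beta-blockers or ACE inhibitors" is hard to write when records list only drug names; with a knowledge base, the drug class is inferred. A minimal sketch, assuming a small drug-to-class map (the class memberships are standard; the patient records and ids are hypothetical).

```python
# Hypothetical fragment of a drug ontology: drug -> drug class.
DRUG_CLASS = {
    "metoprolol": "beta-blocker",
    "atenolol": "beta-blocker",
    "lisinopril": "ACE inhibitor",
    "enalapril": "ACE inhibitor",
    "warfarin": "anticoagulant",
}


def patients_on_classes(patients: dict[str, dict[str, set[str]]],
                        diagnosis: str,
                        drug_classes: set[str]) -> list[str]:
    """Patients with `diagnosis` taking any drug whose class is in `drug_classes`.
    Records list drug names only; the class is inferred from DRUG_CLASS."""
    return sorted(
        pid for pid, record in patients.items()
        if diagnosis in record["diagnoses"]
        and any(DRUG_CLASS.get(drug) in drug_classes for drug in record["drugs"])
    )


patients = {
    "pt-01": {"diagnoses": {"MI"}, "drugs": {"metoprolol"}},
    "pt-02": {"diagnoses": {"MI"}, "drugs": {"warfarin"}},
    "pt-03": {"diagnoses": {"MI", "hypertension"}, "drugs": {"lisinopril"}},
    "pt-04": {"diagnoses": {"hypertension"}, "drugs": {"atenolol"}},
}
print(patients_on_classes(patients, "MI", {"beta-blocker", "ACE inhibitor"}))
# ['pt-01', 'pt-03']
```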
56. Error Detection
EMR:
1. “Sepsis due to urinary tract infection….”
2. “Her prognosis is poor both short term and long term, however, we will do everything possible to keep her alive and battle this infection.”
Without IntellegO
• Problem: SNM:40733004_infection
• A syntax-based NLP extractor (such as MedLEE) can extract this term and annotate it as SNM:40733004_infection.
With IntellegO
• Problem: SNM:68566005_infection_urinary_tract
• By utilizing IntellegO and cardiology background knowledge, we can more accurately annotate the term as SNM:68566005_infection_urinary_tract.
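Per the speaker notes, the generic "infection" annotation was flagged because the knowledge base found no evidence for it in the document, and it was then re-annotated as urinary tract infection. A minimal sketch of that check, assuming a hypothetical evidence table and subclass link (not IntellegO's actual knowledge or algorithm): an annotation with no supporting findings is flagged, and a more specific concept annotated elsewhere in the same document is suggested.

```python
# Hypothetical background knowledge: concept -> findings that would support it.
SUPPORTING_EVIDENCE = {
    "infection": {"fever", "elevated white cell count"},
    "infection_urinary_tract": {"sepsis", "dysuria"},
}
# Hypothetical subclass link from a SNOMED-like hierarchy (child -> parent).
PARENT = {"infection_urinary_tract": "infection"}


def review_annotations(annotations: list[str], doc_findings: set[str]) -> dict[str, str]:
    """Flag annotations with no supporting evidence in the document and suggest a
    more specific concept that is already annotated in the same document."""
    suggestions = {}
    for concept in annotations:
        if SUPPORTING_EVIDENCE.get(concept, set()) & doc_findings:
            continue  # supported by the document: keep as-is
        specific = [c for c in annotations if PARENT.get(c) == concept]
        suggestions[concept] = specific[0] if specific else "needs review"
    return suggestions


print(review_annotations(["infection_urinary_tract", "infection"], {"sepsis"}))
# {'infection': 'infection_urinary_tract'}
```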
57. Error Detection
EMR: “The patient is to receive 2 fluid boluses.”
Without IntellegO
• Problem: SNM:32457005_body_fluid
• A syntax-based NLP extractor (such as MedLEE) can extract this term and annotate it as SNM:32457005_body_fluid.
With IntellegO
• Treatment: fluid is part of the bolus treatment, not a problem
• By utilizing IntellegO and cardiology background knowledge, we can determine that this is an incorrect annotation.
58. Resolve Inconsistency
• “The balance of evidence would suggest that his episode of atrial fibrillation seems to be an isolated event” → NLP: Patient has atrial fibrillation
• “He has had no documented atrial fibrillation since that time” → NLP: Patient does not have atrial fibrillation
Background knowledge:
• Syncope is_symptom_of Atrial Fibrillation
• Warfarin, Atenolol, Aspirin is_medication_for Atrial Fibrillation
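The speaker notes say the knowledge base lets us conclude that the patient actually has atrial fibrillation despite the conflicting NLP output. A minimal sketch of one possible resolution policy, assumed here rather than taken from the source: if the NLP assertions conflict, accept the condition when the document mentions any symptom or medication the knowledge base links to it. The knowledge entries come from the slide; the document mentions are hypothetical.

```python
# Background knowledge from the slide, keyed by condition.
KB = {
    "atrial fibrillation": {
        "symptoms": {"syncope"},                                # is_symptom_of
        "medications": {"warfarin", "atenolol", "aspirin"},     # is_medication_for
    },
}


def resolve(condition: str, nlp_assertions: list[bool], doc_mentions: set[str]) -> bool:
    """Resolve conflicting NLP assertions about `condition` (assumed policy)."""
    if len(set(nlp_assertions)) == 1:
        return nlp_assertions[0]  # assertions agree: nothing to resolve
    # Conflict: look for KB-backed evidence (symptoms or medications) in the text.
    evidence = set().union(*KB.get(condition, {}).values()) & doc_mentions
    return bool(evidence)


print(resolve("atrial fibrillation", [True, False], {"warfarin", "syncope"}))  # True
```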
59. Resolve Inconsistency
• “She denies any chest pain but is not really function due to leg stiffness, swelling and shortness of breath” → NLP: Patient does not have shortness of breath
• “Regarding the shortness of breath, we will send for a dobutamine stress echocardiogram” → NLP: Patient has shortness of breath
Background knowledge:
• Shortness of Breath is_symptom_of Obesity, Hypertension, Obstructive Sleep Apnea
60. PREscription Drug abuse Online Surveillance and Epidemiology - PREDOSE
• Non-Medical Use of Prescription Drugs
• Fastest growing drug issue in the US
• Escalating accidental overdose deaths
• Epidemiological Data Systems
• Data collection practices
• Data analysis limitations
61. PREDOSE
• Poor Scalability
• Limited Reusability
• Interoperability is challenging
• Small sample size
Example forum post: “Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now.”
http://wiki.knoesis.org/index.php/PREDOSE
62. PREDOSE
• Describe drug users’ knowledge, attitudes, and behaviors related to illicit use of Prescription Drugs (Information Extraction)
• Describe temporal patterns of non-medical use of Prescription Drugs (Trend Detection)
63. PREDOSE
Stage 1. Data Collection
• Web Forums → Web Crawler → Informal Text → Data Cleaning → Data Store
Stage 2. Automatic Coding
• Ontology (Schema, e.g. Opioid, Pain Pills; Instances, e.g. Suboxone, Subutex) + Natural Language Processing Techniques = Semantics-based Information Extraction Module (Entity, Relationship, Sentiment and Triple Extraction)
• Extracted Triples/RDF → Semantic Web Database
Stage 3. Data Analysis and Interpretation
• Semantic Web Tools (SCOONER, Cuebee): Qualitative and Quantitative Analysis of Drug User Knowledge, Attitudes and Behaviors
• Temporal Analysis for Trend Detection
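Stage 2 combines an ontology (schema classes and instances) with NLP to turn informal forum text into triples. A minimal sketch of the entity and triple extraction part, assuming a plain dictionary lookup plus a regular expression for dosages; the slang entry, predicate names and post id are hypothetical, and sentiment extraction is left out.

```python
import re

# Ontology fragment from the slide's examples: instance -> schema class.
ONTOLOGY = {"suboxone": "Opioid", "subutex": "Opioid"}
# Hypothetical slang lexicon mapping informal terms to ontology instances.
SLANG = {"subs": "suboxone"}
DOSAGE = re.compile(r"(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)


def extract_triples(post_id: str, text: str) -> list[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples for drug and dosage mentions."""
    triples = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        instance = SLANG.get(token, token)
        if instance in ONTOLOGY:
            triples.add((post_id, "mentions", instance))
            triples.add((instance, "rdf:type", ONTOLOGY[instance]))
    for amount in DOSAGE.findall(text):
        triples.add((post_id, "mentions_dosage", f"{amount} mg"))
    return sorted(triples)  # the set removes repeated mentions


for t in extract_triples("post-1", "Took 2 mg of Suboxone, then another 1 mg. Subs work."):
    print(t)
```

A dictionary lookup keeps the example small; informal forum language is exactly why the slide pairs the ontology with NLP techniques instead.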
64. PREDOSE
Annotated post from Forum Y. Legend: Entity (pre), Entity (confirmed), +ve Sentiment, -ve Sentiment
67. kHealth
Health information is now available from multiple sources
• medical records
• background knowledge
• social networks
• personal observations
• sensors
• etc.
68. kHealth
• Fitbit Community allows the automated collection and sharing of health-related data, goals, and achievements.
• Foursquare is an online application which integrates a person’s physical location and social network.
• A community of enthusiasts that share experiences of self-tracking and measurement.
69. kHealth
Sensors, actuators, and mobile computing are playing an increasingly important role in providing data for early phases of the health-care life-cycle.
This represents a fundamental shift:
• people are now empowered to monitor and manage their own health;
• and doctors are given access to more data about their patients.
72. kHealth
Personal Health Dashboard: (1) Continuous Monitoring → (2) Personal Assessment → (3) Medical Service
Auxiliary Information: background knowledge, social/community support, personal context, personal medical history
79. kHealth - Technology
Explanation: the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
Discrimination: the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
80. kHealth - Technology
Explanation
Explanatory Feature: a feature that explains the set of observed properties
$\mathit{ExplanatoryFeature} \equiv \exists\,\mathit{ssn{:}isPropertyOf}^{-}.\{p_1\} \sqcap \dots \sqcap \exists\,\mathit{ssn{:}isPropertyOf}^{-}.\{p_n\}$
Observed properties: elevated blood pressure, clammy skin, palpitations
Explanatory features: Hypertension, Hyperthyroidism, Pulmonary Edema
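Read as sets, the definition says a feature is explanatory when it has every observed property p_1 … p_n (the inverse of ssn:isPropertyOf links a feature to its properties). A minimal sketch of that set reading, assuming an illustrative feature-to-property table; the links are hypothetical and not clinical knowledge, and a real system would use an OWL reasoner over the SSN-based ontology instead.

```python
# Illustrative links only (not clinical knowledge): feature -> properties it has.
KB = {
    "Hypertension": {"elevated blood pressure", "palpitations", "headache"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin",
                        "palpitations", "weight loss"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin",
                        "palpitations", "shortness of breath"},
}


def explanatory_features(observed: set[str]) -> set[str]:
    """Features related to EVERY observed property p_i, i.e. the intersection
    of the classes 'exists isPropertyOf^-.{p_i}'."""
    return {f for f, props in KB.items() if observed <= props}


print(sorted(explanatory_features({"elevated blood pressure", "clammy skin"})))
# ['Hyperthyroidism', 'Pulmonary Edema']
```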
81. kHealth - Technology
Discrimination
Expected Property: would be explained by every explanatory feature
$\mathit{ExpectedProperty} \equiv \exists\,\mathit{ssn{:}isPropertyOf}.\{f_1\} \sqcap \dots \sqcap \exists\,\mathit{ssn{:}isPropertyOf}.\{f_n\}$
Expected properties: elevated blood pressure, clammy skin, palpitations
Explanatory features: Hypertension, Hyperthyroidism, Pulmonary Edema
82. kHealth - Technology
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
$\mathit{NotApplicableProperty} \equiv \neg\exists\,\mathit{ssn{:}isPropertyOf}.\{f_1\} \sqcap \dots \sqcap \neg\exists\,\mathit{ssn{:}isPropertyOf}.\{f_n\}$
Not applicable properties: elevated blood pressure, clammy skin, palpitations
Explanatory features: Hypertension, Hyperthyroidism, Pulmonary Edema
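Given a set of explanatory features f_1 … f_n, the two definitions split the known properties: expected properties belong to every feature, not-applicable properties to none. The remaining properties, held by some features but not others, are the ones worth observing next to discriminate between the candidates. A minimal set-based sketch, assuming the same kind of illustrative feature-to-property table (hypothetical links, not clinical knowledge).

```python
# Illustrative links only (not clinical knowledge): feature -> properties it has.
KB = {
    "Hypertension": {"elevated blood pressure", "palpitations", "headache"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin",
                        "palpitations", "weight loss"},
    "Pulmonary Edema": {"elevated blood pressure", "clammy skin",
                        "palpitations", "shortness of breath"},
}


def expected_properties(features: set[str]) -> set[str]:
    """Properties of EVERY explanatory feature (intersection)."""
    return set.intersection(*(KB[f] for f in features)) if features else set()


def not_applicable_properties(features: set[str]) -> set[str]:
    """Known properties that NONE of the explanatory features has."""
    all_props = set().union(*KB.values())
    return all_props - set().union(*(KB[f] for f in features))


def discriminating_properties(features: set[str]) -> set[str]:
    """Properties of some but not all features: observing them narrows the choice."""
    covered = set().union(*(KB[f] for f in features))
    return covered - expected_properties(features)


candidates = {"Hyperthyroidism", "Pulmonary Edema"}
print(sorted(expected_properties(candidates)))
print(sorted(not_applicable_properties(candidates)))
print(sorted(discriminating_properties(candidates)))
```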
88. Thank You
Visit Us @
www.knoesis.org
with additional background at http://knoesis.org/amit/hcls
Speaker notes
Heterogeneity of data to be integrated (Variety)
Quality: How do you fix it? Measure it? How do you decide
Consumers have changed: clinicians + drug makers + insurance companies; technology-savvy users + gadgets
We have a lot of data and are trying to use it meaningfully, but customers (users) are still not satisfied. So we need computers to understand the data.
What is the Semantic Web? http://en.wikipedia.org/wiki/Semantic_Web Vast: huge data. Vague: how to define ‘young’ or ‘tall’. Uncertainty: a patient might present a set of symptoms which correspond to a number of distinct diagnoses, each with a different probability. Deceit: intentionally misleading data.
The technology stack and usage of most popular technologies
Knowledge + data representation
Knowledge representation
Querying
Kno.e.sis products
This slide is intended to justify the development of the tools Doozer, SCOONER and iExplore. A huge amount of knowledge exists in different formats and people are overloaded with knowledge/information, so we need mechanisms for better exploration of knowledge that help people find what they require (SCOONER, iExplore) and derive new knowledge.
Why Doozer? Knowledge is available in various formats, but it is hardly helpful if it is not in a structured format, and building a structured knowledge base from the available formats is a challenge.
Human knowledge cycle: Doozer is one tool that supports this.
Forms of open knowledge: Wikipedia, LOD, formal models
Knowledge acquisition through Model creation
Hierarchy creation from Wikipedia
Big picture
Doozer’s way of identifying relationships
Last two steps of knowledge cycle
Big picture. Kno.e.sis NLP-based triples: Cartic Ramakrishnan's and Pablo's work on open information extraction from biomedical text. Sentences in MEDLINE abstracts are parsed and split into subject, predicate and object. In the Merge phase, only those triples whose subject and object can be mapped to the initial KB are added to the enriched KB. The difference with the BKR triples is that they were probably verified by NLM before being published, whereas the Kno.e.sis triples went into the KB unverified, apart from having to match initial KB concepts.
Last two steps of knowledge cycle
Why SCOONER
demo
Knowledge and data are separated. There is no way to validate whether my data adheres to the knowledge and vice versa.
Architecture
Generate novel hypotheses
The challenge: Why ASEMR?
How ASEMR?
How ASEMR?
The architecture
Why SPSE? Integration of data gives more insights, but the heterogeneity of the data stands in the way of integration.
How SPSE
Benefits. Get Vinh’s help to reduce text
Why
EMR documents contain not only data/information but also knowledge, but the scattered nature of that knowledge makes it difficult to discover.
The big picture: the built knowledge base should be able to explain real-world data. We used this claim in reverse: real-world data can be used to enhance the knowledge base when it fails to explain the data. Scenario: extract all diseases from the document; generate all possible symptoms for these diseases using the knowledge base; extract all symptoms from the document. If the document contains symptoms that are not in the generated set, this indicates that we might be missing some relationships between diseases and symptoms. We use this indication to generate questions that can be answered by a domain expert, which allows us to enrich the knowledge base.
What we found is that edema is a symptom of hypertension. This method reduces the workload of the domain expert. Imagine we have 50 diseases and 100 symptoms: there are 5,000 possibilities, and the domain expert would have to go through and validate each one. With this method, we only ask a question when we find evidence.
What did we achieve? (Not sure whether this slide is required.) We used many existing knowledge bases to build this knowledge base.
Unstructured data poses challenges in every field; here is our attempt to overcome the challenge in healthcare. Traditional methods: IR, data mining, traditional NLP.
People are waiting to harness the unstructured healthcare data for all these applications.
Architecture. To-do: may need to use logos of ezFIND and ezMeasure. Data cleaning: adding section headers, modifying malformed section headers, de-identification. CAC: Computer Assisted Coding. CDI: Clinical Document Improvement.
Emphasize the capability of inferencing (possible only because we have a knowledge base) and point out how difficult it is to formulate such queries if a knowledge base is not available.
The EMR document has these two sentences. ‘Urinary tract infection’ (first sentence) is correctly annotated, but ‘infection’ in the second one is not: the second ‘infection’ actually refers to the ‘urinary tract infection’ in the first sentence, but the NLP engine does not understand this. We could find this because, according to our knowledge base, there is no evidence in the document to suggest a generic ‘infection’. After detecting this issue, we could annotate the second ‘infection’ as urinary tract infection (this annotation was done manually); the detection is done with IntellegO. One could argue that annotating the second ‘infection’ as just infection does no harm, because a urinary tract infection is also an infection, but detecting such cases helps improve the annotation.
The NLP engine annotates ‘fluid’ as ‘body_fluid’, which is a symptom. But here the term ‘fluid’ does not refer to a symptom; it is the form of the medication (‘boluses’). We could find this issue because there was no disease in the document to suggest ‘body fluid’.
In this case, NLP does not detect that the second statement is talking about history. But with the knowledge base we have, we can say the patient actually has AF, so we resolve the inconsistency. Example from document 673.
NLP does not understand the first sentence: it attaches ‘not’ to shortness of breath, which is wrong according to the semantics of the sentence. We can resolve this issue by using the knowledge base. Example from document 595.
Recent advancements in observation mechanisms and data sharing
Sensors play key role
But still we are here
We need to get here
Kno.e.sis kHealth idea. Ongoing work: simulating the first two phases. Our product is MobileMD. The demo is at the end of the slides.
The challenge: we have sensors to measure movement, heart rate, sleep, galvanic skin response, etc., but we don’t know how to aggregate the measurements.
Key ingredients that will help us understand the healthcare data (measurements)
Numbers -> abstractions -> knowledge integration (static knowledge about the domain, personal background) -> prediction. Advantages: early detection and alert generation.