Automating the formalization of clinical
guidelines using information extraction:
an overview of recent lexical approaches

05 August 2011

Phil Gooch
Centre for Health Informatics
City University, London UK

Clinical guidelines

• Contain recommendations for best practice based on systematic
reviews of clinical evidence, consensus statements and expert opinion.
• Goal is to reduce variation in medical care by promoting the most
effective treatments, and to provide a means of quality control in clinical
practice via audit
• Produced by a variety of organizations (e.g. NICE, RCP, SIGN) in a
variety of document formats usually not conducive to use at the point of
care.

Clinical decision support (CDS)

• Aims to provide diagnostic and treatment recommendations and
advice at the point of care, i.e. information tailored for the specific
patient under consideration by the clinician during a consultation
• CDS systems require a knowledge base (KB), usually derived from
guidelines, consisting of declarative knowledge (penicillin is-a
antibiotic) and procedural (if…then) rules, and some sort of electronic
patient record system (EPR)

Computer-interpretable guidelines

• Early systems ‘computerized’ guidelines by making them available ‘on
the computer’, e.g. as HTML or PDF
• Did not lead to improved guideline compliance or use!
• To standardize the format of the knowledge-base, ease development
of CDS, and to improve guideline use at the point of care, a number of
formalisms for representing guidelines have been developed

Computer-interpretable guidelines (CIGs)

Rule-based: ‘if ... then’, e.g. Arden Syntax for individual clinical decisions
LET Last_HgA1C BE READ LATEST {"HgA1C Value"};
LET Diabetic_Patient BE READ LATEST {"Problem: Diabetes"};
if Diabetic_Patient and Last_HgA1C Occurred not within past 6 months
and Last_HgA1C is less than or equal 7
then conclude true;

Document-based, e.g. GEM, for complete guideline documents in XML

OO expression query languages, e.g. GELLO:
observation.code == 'SBP' AND observation.value > 140 AND assessment.code == 'LVF'

Task-network models (TNM), e.g. GLIF, Asbru, PROforma, for workflow-like
modelling of decisions over time

Formalization of guidelines into a CIG model

• Declarative: Mapping clinical concepts in the guideline to terms within a
controlled vocabulary (e.g. UMLS) or ‘virtual medical record’
• Procedural: Identification and extraction of eligibility criteria, clinical
actions (tests, treatment regimes, referrals), temporal constraints and
if…then decision rules
• Translation to a formal model, e.g. PROforma, GLIF, Asbru
• Time-consuming, iterative, manual process as the guideline text tends to
assume background knowledge, is incomplete or contains ambiguity and
vague terms

Example CIG fragment (Asbru)

<plan name="Doxycycline : 100 mg orally twice a day for 7 days"
plan_id="plan52769441">
<cyclical_plan plan_id="plan5675512">
<frequency value="12" unit="hour"/>
</cyclical_plan>
<duration>
<min value="7" unit="day"/>
<max value="7" unit="day"/>
</duration>
</plan>
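
The plan name in the fragment above already carries most of the structured content (interval and duration). As a rough illustration of text-to-model translation, the Python sketch below pulls those fields out of the free-text regimen with a regular expression. The names `FREQUENCY_HOURS`, `DOSE_PATTERN` and `parse_regimen` are hypothetical, and the pattern covers only this one phrasing; it is an assumption-laden sketch, not part of Asbru or of any tool described in these slides.

```python
import re

# Hypothetical lookup from frequency phrases to a dosing interval in hours.
FREQUENCY_HOURS = {"once a day": 24, "twice a day": 12}

# Assumed phrasing: "<drug> : <dose> <unit> <route> <frequency> for <n> <unit>s"
DOSE_PATTERN = re.compile(
    r"(?P<drug>\w+)\s*:\s*(?P<dose>\d+)\s*(?P<dose_unit>mg|g)\s+(?P<route>\w+)\s+"
    r"(?P<frequency>once a day|twice a day)\s+for\s+"
    r"(?P<duration>\d+)\s+(?P<duration_unit>day|week)s?"
)

def parse_regimen(text):
    """Return a dict of regimen fields, or None if the text does not match."""
    m = DOSE_PATTERN.search(text)
    if m is None:
        return None
    return {
        "drug": m.group("drug"),
        "dose": (int(m.group("dose")), m.group("dose_unit")),
        "route": m.group("route"),
        "interval_hours": FREQUENCY_HOURS[m.group("frequency")],
        "duration": (int(m.group("duration")), m.group("duration_unit")),
    }

regimen = parse_regimen("Doxycycline : 100 mg orally twice a day for 7 days")
print(regimen)
```

The extracted interval (12 hours) and duration (7 days) correspond to the `frequency` and `duration` elements of the fragment.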

Examples of vague guideline statements

Underspecification:
• Avoid the use of highly intensive management strategies to achieve
an HbA1c level less than 6.5% (48 mmol/mol)

• Monitor HbA1c every 2–6 months (according to individual need) until it
is stable on unchanging treatment

Qualitative terms requiring mapping to numeric values or ranges:
• The moderate use of alcohol may increase HDL-cholesterol

• If blood pressure remains uncontrolled on adequate doses of three
drugs, consider adding a fourth and/or seeking expert advice

Information extraction for guideline formalization

• Helpful to automate:
- Knowledge base construction: text to formal model translation
- Identification of opportunities for decision support: mapping
guideline concepts and rules to concepts in the EPR
- Measurement of guideline compliance

Information extraction approaches

• Bottom-up: identification of individual clinical terms, temporal
expressions, units of measure
- Look-up lists, regular expressions
- Shallow parsing to identify noun phrases
- Terminology services: UMLS, MetaMap
- Co-reference resolution: WordNet

• Top-down: identification of guideline structure: preamble, eligibility,
recommendations, ‘action’ sentences and rules
- Shallow parsing to identify verb phrases
- Ontologies for semantic relations, e.g. UMLS Semantic Network
- Use of linguistic guideline patterns (see later)
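
The bottom-up step can be sketched with regular expressions alone. The Python example below extracts temporal expressions and measurements from two of the guideline statements quoted earlier. The pattern names, the cue words (`every`, `within`, `for`) and the unit list are illustrative assumptions, not a complete grammar.

```python
import re

# Assumed cue words and units; a real system would use much larger look-up lists.
TEMPORAL = re.compile(
    r"\b(?P<cue>every|within|for)\s+(?P<low>\d+)(?:\s*[\u2013-]\s*(?P<high>\d+))?"
    r"\s+(?P<unit>day|week|month|year)s?\b"
)
MEASUREMENT = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|mmol/mol|mmHg|mg)")

def extract_temporal(text):
    """Return (cue, low, high, unit) tuples; high is None for a single value."""
    return [
        (m["cue"], int(m["low"]), int(m["high"]) if m["high"] else None, m["unit"])
        for m in TEMPORAL.finditer(text)
    ]

def extract_measurements(text):
    """Return (value, unit) tuples for numeric measurements."""
    return [(float(m["value"]), m["unit"]) for m in MEASUREMENT.finditer(text)]

monitoring = "Monitor HbA1c every 2\u20136 months (according to individual need)"
target = "achieve an HbA1c level less than 6.5% (48 mmol/mol)"

temporal_hits = extract_temporal(monitoring)
measurement_hits = extract_measurements(target)
print(temporal_hits)
print(measurement_hits)
```

Patterns like these find the surface expressions only; deciding which clinical concept a value belongs to still needs the phrase-level and top-down steps.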

Mapping text to UMLS concepts - problems

• Identification of clinical terms is dependent on context:
- family history of congestive heart failure
- probable diagnosis of congestive heart failure
- no evidence of congestive heart failure
- patient does not have established cardiovascular disease

• Clearly just identifying the raw concepts congestive heart failure and
cardiovascular disease and mapping them to UMLS terms is
inadequate.
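
One simple way to capture this context is a trigger-phrase lookup over the text preceding the concept. The sketch below is a minimal Python illustration using the four examples above. The trigger list, the labels and the function name `concept_context` are assumptions made for the example; real context detection needs scope rules and a far larger trigger set.

```python
# Hypothetical trigger phrases mapped to context labels.
CONTEXT_TRIGGERS = [
    ("family history of", "family_history"),
    ("probable", "uncertain"),
    ("no evidence of", "negated"),
    ("does not have", "negated"),
]

def concept_context(sentence, concept):
    """Label the context of `concept` from the text that precedes it."""
    text = sentence.lower()
    pos = text.find(concept.lower())
    if pos == -1:
        return None  # concept not mentioned at all
    prefix = text[:pos]
    for trigger, label in CONTEXT_TRIGGERS:
        if trigger in prefix:
            return label
    return "affirmed"

examples = [
    ("family history of congestive heart failure", "congestive heart failure"),
    ("probable diagnosis of congestive heart failure", "congestive heart failure"),
    ("no evidence of congestive heart failure", "congestive heart failure"),
    ("patient does not have established cardiovascular disease", "cardiovascular disease"),
]
labels = [concept_context(sentence, concept) for sentence, concept in examples]
print(labels)
```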

Mapping guideline text to UMLS concepts - problems

• Guideline documents are typically large (100 pages), in PDF or XML
format
• Requires guideline text to be segmented to enable efficient processing
- How should the text be segmented to maximize identification of clinical
concepts in context?

Solutions: Text segmentation
• Customised phrase chunker to identify candidate terms:
- Noun phrases (NP), prepositional phrases (PP), verb phrases (VP)
- Neoclassical combining-form phrases (token groups containing
Latin/Greek prefixes, roots, suffixes)
- Past-participle and gerund NPs, e.g. 'results in increased blood
pressure', 'fasting blood glucose'
- List expansion, e.g.:
'mild, moderate and severe hypertension' → 'mild hypertension,
moderate hypertension and severe hypertension'
'lowering of heart rate and blood pressure' → 'lowering of heart
rate and lowering of blood pressure'
- Abbreviation expansion: 'waist circumference (WC)'
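
List expansion and abbreviation expansion can be approximated with simple string heuristics. The Python sketch below handles the modifier-list case and the parenthesised-abbreviation case from the examples above. Both functions are hypothetical illustrations, not the customised chunker itself, and the shared-prefix case ('lowering of heart rate and blood pressure') is deliberately left unexpanded because it needs syntactic information.

```python
import re

def expand_modifier_list(phrase):
    """Expand 'a, b and c head' into ['a head', 'b head', 'c head'].

    Heuristic: only applies when every earlier item is a single word.
    """
    parts = [p for p in re.split(r",\s*|\s+and\s+", phrase.strip()) if p]
    if not parts:
        return [phrase]
    *modifiers, last = parts
    last_tokens = last.split()
    if not modifiers or len(last_tokens) < 2 or any(" " in m for m in modifiers):
        return [phrase]  # not a simple modifier list; leave unchanged
    head = " ".join(last_tokens[1:])
    return [f"{m} {head}" for m in modifiers] + [last]

def find_abbreviation(text):
    """Return (abbreviation, long form) when the initials match, else None."""
    m = re.search(r"((?:\w+\s+)+)\(([A-Z]{2,})\)", text)
    if m is None:
        return None
    words, abbr = m.group(1).split(), m.group(2)
    candidate = words[-len(abbr):]
    if len(candidate) == len(abbr) and all(
        w[0].upper() == c for w, c in zip(candidate, abbr)
    ):
        return abbr, " ".join(candidate)
    return None

expanded = expand_modifier_list("mild, moderate and severe hypertension")
abbreviation = find_abbreviation("waist circumference (WC)")
print(expanded)
print(abbreviation)
```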

Solutions: GATE-MetaMap Server integration plugin

- Extracts clinical concepts, in context, from large guideline texts in
multiple formats and encodings (PDF, XML, RTF, ASCII, UTF-8)
- Exchanges data/annotations with a MetaMap server
- Implements Unicode Normalization Forms for UTF-8 → ASCII
- Provides flexible text chunking options
- Optimises input data to MetaMap for mapping to UMLS concepts
- Integrates with other information extraction pipelines
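
The UTF-8 → ASCII step can be illustrated with Python's standard `unicodedata` module. This is a sketch of the general technique (compatibility decomposition followed by dropping non-ASCII code points), not the plugin's actual implementation. The `PUNCTUATION_MAP` table and the function name `to_ascii` are assumptions added here, because dashes, curly quotes and arrows have no ASCII decomposition and would otherwise be silently lost.

```python
import unicodedata

# Hypothetical replacements for characters with no ASCII decomposition.
PUNCTUATION_MAP = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u2192": "->",  # rightwards arrow
})

def to_ascii(text):
    """Fold text to ASCII using NFKD decomposition."""
    replaced = text.translate(PUNCTUATION_MAP)
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Combining marks and any remaining non-ASCII characters are dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Identi\ufb01cation"))        # ligature is decomposed to "fi"
print(to_ascii("every 2\u20136 months"))     # en dash is mapped to a hyphen
```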

GATE-MetaMap integration module

Guideline patterns

Serban et al. (2007), examples:

(med_context, target_group, recommendation_operator, med_action)

In the event of [pregnancy]med_context, [patients with diabetes]target_group
[should]recommendation_operator be [prescribed calcium channel blocker]med_action

(target_group, med_context, med_goal)

For [diabetic patients]target_group with [kidney damage]med_context the [blood
pressure target is 130/80]med_goal
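
To make the idea concrete, the two example patterns can be approximated as regular expressions with named groups. The Python sketch below is an assumption-heavy illustration: the regexes, the operator list (`should`, `must`, `may`) and the function name `match_guideline_pattern` are hypothetical and are not the patterns defined by Serban et al.

```python
import re

# Hypothetical surface patterns for the two example sentences above.
PATTERNS = {
    "context_group_operator_action": re.compile(
        r"^In the event of (?P<med_context>.+?), (?P<target_group>.+?) "
        r"(?P<recommendation_operator>should|must|may) be (?P<med_action>.+?)\.?$",
        re.IGNORECASE,
    ),
    "group_context_goal": re.compile(
        r"^For (?P<target_group>.+?) with (?P<med_context>.+?),? "
        r"the (?P<med_goal>.+?)\.?$",
        re.IGNORECASE,
    ),
}

def match_guideline_pattern(sentence):
    """Return (pattern name, slot dict) for the first matching pattern, else None."""
    for name, pattern in PATTERNS.items():
        m = pattern.match(sentence.strip())
        if m:
            return name, m.groupdict()
    return None

recommendation = match_guideline_pattern(
    "In the event of pregnancy, patients with diabetes should be "
    "prescribed calcium channel blocker"
)
goal = match_guideline_pattern(
    "For diabetic patients with kidney damage the blood pressure target is 130/80"
)
print(recommendation)
print(goal)
```

Surface regexes like these are brittle; the slot boundaries in real guideline text would normally come from the phrase chunks and concept annotations produced in the earlier steps.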

Extracting guideline recommendations

… and rules from guideline text

Information extraction from patient data

Patient data: automatic spelling correction

Patient data: WordNet mappings for coreferencing
