Mental Health Records (MHRs) contain free-text documentation about patients’ suicide and suicidality. In this paper, we address the problem of determining whether grammatical variants (inflections) of the word “suicide” are affirmed or negated. To achieve this, we populate and annotate a dataset with over 6,000 sentences originating from a large repository of MHRs. The resulting dataset has high Inter-Annotator Agreement (κ = 0.93). Furthermore, we develop and propose a negation detection method that leverages syntactic features of text. Using parse trees, we build a set of basic rules that rely on minimal domain knowledge and render the problem as binary classification (affirmed vs. negated). Since the overall goal is to identify patients who are expected to be at high risk of suicide, we focus on the evaluation of positive (affirmed) cases as determined by our classifier. Our negation detection approach yields a recall (sensitivity) of 94.6% for the positive cases and an overall accuracy of 91.9%. We believe that our approach can be integrated with other clinical Natural Language Processing tools to further advance information extraction capabilities.
1. Don’t Let Notes Be Misunderstood: A Negation Detection Method for Assessing Risk of Suicide in Mental Health Records
George Gkotsis, Sumithra Velupillai, Anika Oellrich
Harry Dean, Maria Liakata and Rina Dutta
Biomedical Research Centre Nucleus – King’s College London
2. e-HOST-IT
Electronic health records to predict
HOspitalised Suicide attempts:
Targeting Information Technology solutions
Aim
To determine whether structured and free-text data in Electronic Health Records (EHRs) can be used to quantify changes in symptoms, behaviour patterns and health service utilisation, and to predict serious suicide attempts
4. The mysterious case of health records (1/2)
• Health records contain many structured fields, such as:
• Personal information/contact
• Diagnosis
• Prescription
• Interventions
• Scans & measurements
• Most of the structured fields are left blank
Max Weber, Theory of Formal Rationality
5. The mysterious case of health records (2/2)
• Free text contains a lot of information
• Traditional information access technology returns many false positives
Example
1. Patient is suicidal
2. Patient is not suicidal
• Meaning can be expressed in multiple ways
Example
1. He has suicidal thoughts
2. He wants to end his life
3. She wants to kill herself
7. CRIS database
• 226,000 patients
• 18.6 million documents (Event)
• Suicide-related data
• 783,000 documents contain the word suicid*
• 111,000 patients
Anonymous Reviewer:
“Overall, I think the paper is well thought out and written, and I am envious of their access to such a large patient dataset”
9. Negation detection – definition (1/2)
“The determination of whether a finding or disease mentioned within narrative medical reports is present or absent”*
*NegEx, Chapman et al., Journal of Biomedical Informatics, 2001
11. Towards negation detection resolution
• Fundamental NLP task
• Reduced to identifying the scope of negation
Examples:
(+) No issues other than her indicating that she might commit
(-) He continues to deny any suicidal thoughts and is happy to come to the XXX for medical review tomorrow
12. State-of-the-art: Negex (1/2)
• Lexical-based approach
• Collection of negation cues/expressions
• Pseudo-negation expressions
• Termination cues for scope
• Search scope of 6 words surrounding the target keyword
• pyConTextNLP
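The window-based idea above can be sketched in a few lines of Python. This is a minimal illustration with an invented cue list, not the published NegEx lexicon or pyConTextNLP:

```python
import re

# Toy cue lists (assumptions for illustration; NegEx ships a much larger lexicon).
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}
TERMINATION_CUES = {"but", "however", "although"}

def is_negated(sentence: str, target: str, window: int = 6) -> bool:
    """Flag `target` as negated if a negation cue appears within `window`
    tokens before it, unless a termination cue cuts off the scope first."""
    tokens = re.findall(r"\w+", sentence.lower())
    if target not in tokens:
        return False
    idx = tokens.index(target)  # first occurrence only, for brevity
    # Scan leftwards from the target, nearest token first.
    for tok in reversed(tokens[max(0, idx - window):idx]):
        if tok in TERMINATION_CUES:
            return False  # scope terminated before a cue was found
        if tok in NEGATION_CUES:
            return True
    return False

print(is_negated("Patient denies suicidal ideation", "suicidal"))  # True
print(is_negated("Patient is suicidal", "suicidal"))               # False
```

The fixed window is what makes the approach fast but brittle: cues outside the window, or negations whose syntactic scope does not match surface distance, are missed.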
13. State-of-the-art: DEEPEN (2/2)
• Wrapper over NegEx
• Applied over the (predicted to be) negated sentences
• Uses a dependency parse tree
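The dependency filtering DEEPEN layers on top of NegEx can be caricatured as follows. The edges are hand-encoded (head, dependent) pairs standing in for real parser output, and the attachment rule is a deliberate simplification, not the published algorithm:

```python
def cue_attached(edges, cue, target):
    """Keep a lexically-flagged negation only if the cue attaches to the
    target in the dependency parse: either the cue directly modifies the
    target, or cue and target hang off the same head word."""
    head_of = {dep: head for head, dep in edges}
    return head_of.get(cue) == target or head_of.get(cue) == head_of.get(target)

# "He is not suicidal" -- "not" attaches directly to the target.
edges_neg = [("suicidal", "He"), ("suicidal", "is"), ("suicidal", "not")]
print(cue_attached(edges_neg, "not", "suicidal"))  # True

# "Not all patients are suicidal" -- "Not" attaches to "all", not to the
# target, so the dependency check filters out this lexical hit.
edges_fp = [("suicidal", "patients"), ("patients", "all"), ("all", "Not"),
            ("suicidal", "are")]
print(cue_attached(edges_fp, "Not", "suicidal"))   # False
```

The point of the second example is that a purely lexical window would fire on both sentences; the parse is what separates them.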
14. “Negation’s Not Solved”*
• Optimizable, but not generalizable
• Annotation guidelines are different
• Spans considered can be nouns or whole phrases
• Amount of overlap allowed (or not)
*Wu et al., PLOS ONE, 2014
17. Dataset and annotation
• Class distribution: 2,941 positive vs. 3,125 negative sentences
• Generation
• Random sampling from SLAM Events of 6k sentences containing the word “suicid*”
• Annotation
• One expert annotated the complete corpus
• Another expert repeated the annotations for 25% of the sentences
κ = 0.93 (IAA = 97.9%)
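For context, Cohen’s κ compares the observed agreement between the two annotators with the agreement expected by chance. A minimal sketch on invented toy labels (not the paper’s annotations):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    # Raw (observed) agreement: fraction of items labelled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "pos", "neg", "neg", "pos", "pos"]
print(cohens_kappa(ann1, ann2))  # ≈ 0.667
```

Note how κ (0.667 here) is lower than raw agreement (5/6 ≈ 83%), because chance agreement is discounted; the paper’s κ = 0.93 alongside IAA = 97.9% follows the same pattern.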
Limitations
• Linguistic focus
• Patient-agnostic
28. Discussion
• Corpus of 6k sentences from Mental Health Records
• Annotation of high quality
• Evaluation: focus on positive cases
• Parse trees
+ Require a minimal number of negation keywords
+ Further potential
• Statement extraction (subject-predicate-object)
• Temporal characteristics
• Degree of suicidality
- Expensive
- Error-prone for long sentences
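A toy version of a parse-tree rule of this flavour (illustrative only; the authors’ actual rule set is richer): call the target negated when a negation word occurs under the same verb phrase (VP) node that dominates it. Trees are hand-built nested tuples standing in for real constituency-parser output:

```python
def leaves(tree):
    """Collect the terminal words of a (label, child, ...) tuple tree."""
    if isinstance(tree, str):
        return [tree]
    _, *children = tree
    return [leaf for c in children for leaf in leaves(c)]

def negated_in_vp(tree, target, cues={"not", "no", "denies", "denied"}):
    """True if some VP node dominates both the target and a negation cue."""
    if isinstance(tree, str):
        return False
    label, *children = tree
    if label == "VP":
        words = leaves(tree)
        if target in words and cues & set(words):
            return True
    return any(negated_in_vp(c, target, cues) for c in children)

# "He denies suicidal thoughts" -- cue and target share a VP.
tree = ("S", ("NP", "He"),
             ("VP", "denies", ("NP", ("JJ", "suicidal"), ("NNS", "thoughts"))))
print(negated_in_vp(tree, "suicidal"))   # True

# "He is suicidal" -- no cue anywhere in the VP.
tree2 = ("S", ("NP", "He"), ("VP", "is", ("ADJP", "suicidal")))
print(negated_in_vp(tree2, "suicidal"))  # False
```

The “expensive” and “error-prone for long sentences” caveats above apply to obtaining the tree in the first place: parsing is slower than a lexical scan, and parser accuracy degrades on long clinical sentences.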
29. Future work (1/2)
Expand on expressions of suicidality
• Study how negation detection can be used to strengthen the predictive power of mental health records
• Ongoing cohort study (pupils with ASD)
• Large-scale study based on hospitalisation events
30. Future work (2/2)
• Evaluate our tool against other datasets/domains
• Consider a syntactic dependency parser (instead of a constituency-based one)
• spaCy
• SyntaxNet