SlideShare una empresa de Scribd logo
1 de 36
mining unstructured healthcare data



                        deep dhillon

chief data scientist | ddhillon@alliancehealth.com | twitter.com/zang0
alliance health networks


                   HCP/advanced patients
         medify.com – 10,000% user growth past year

                           patients
diabeticconnect.com - # 1 online diabetes site @ ~1.4M uniques
    *connect.com – content sites + guided social networks

                     health care industry
         patient surveying, matchmaking and analysis
why mine healthcare text?
anatomy of what patients currently use: webmd (e.g. drugs.com, yahoo health, etc.)

q/a           topic pages            news + media                  experts             discussions

                                                                             •   original content
                                                                             •   addresses emotional needs
                                                                             •   simple to understand
                                                                             •   provides answers



                                                                 • original content
                                                                 • moderately authoritative
                                                                 • simple to understand
                                         •   original content    • good coverage in the head
                                         •   fresh               • provides answers
                                         •   simple to understand
                                         •   good coverage in the head
                •   original content
                •   typically aligned well with google searches,
                    i.e. treatments for X, symptoms for Y            original content=important
                •   good coverage in the head                        providing answers=important
•     original content                                               head=developed
•     aligns well with google searches                               fresh=important
•     provides answers                                               authority=important
why mine healthcare text?
Patients and HCPs need long tail, statistically meaningful, consumer friendly,
authoritative and fresh health content.

q/a           topic pages             news + media                experts                 discussions

                                                                               •   manually written, not authoritative
                                                                               •   not thorough, i.e. not long tail
                                                                               •   not consistently credible, i.e. minimal
                                                                               •   accredation
                                                                               •   not statistically meaningful
                                                                               •   sparse

                                                                  •   manually written, moderately authoritative
                                                                  •   not thorough, i.e. not long tail
                                         •   manually written, not authoritative
                                         •   not consistently credible, i.e. minimal accredation
                                         •   not thorough, i.e. not long tail

                •   evergreen, rarely change
                •   manually written, not authoritative
                •   not consistently credible, i.e. minimal accredation
                •   not thorough, i.e. not long tail                               manual=expensive
                                                                                   manual=head focused
 •    manually written, not authoritative
 •    not consistently credible, i.e. minimal accredation                          manual=not authoritative
 •    not statistically meaningful                                                 manual=old and dated
automated content generation

•   cost effective
•   structured content
•   statistically based
•   scales to millions of patients
•   scales to long tail treatments + conditions
•   authoritative / citation driven
•   fresh
types of text mining
medify demo
how do we mine text?
                 assignments


curators

                  examples




  knowledge        parser               Index




external repos                  Page
  like UMLS                             Search
                               Module
annotate in the product:
•   Easier for people to give explicit GT feedback
•   Channels high visibility error angst productively
•   Channels more GT toward areas most seen by users
•   Override high visibility system mistakes with human data




gt demo:
• http://www.diabeticconnect.com/discussions/4790
• https://www.medify.com/internal/annotate/abstract?abstractId=8181575
Detailed Signals Preliminary Mining From Discussion Threads


5,000


4,500


4,000


3,500


3,000


2,500


2,000


1,500


1,000


 500


   0
Distribution of Signals Mined From Diabetic Connect Discussion
                            Threads




         Medical History - Newly                                              Lifestyle Adjustment
               Diagnosed                         Medical History - Symptom              6%
                   0%                                        3%




                                                                                    Medical History - Condition
                                                                                               10%



                                    Helper/Influencer
                                          33%




                                                                             Personal Account
                                                                                   12%




                                                                 Treatment Experience
                                                                         15%

                                   Medical History
                                        21%
API


 Research (Outcome)          Discussion (Gromin)
                                                                    curator
   treatment/symptom/condition/demographic

                                                   •   identity (i.e. UMLS id)
                      UMLS
                                                   •   taxonomical relations
                                                         • hierarchical is_a
                                                         •   condition/arthritis/Rhematoid Arthritis
                                                   •   synonym relations
                                                         • RA=Rhematoid Arthritis
                                                         • metformim
                                                   •   polarity
knowledge base                                           • effective=positive
                                                         • ineffective=negative
                                                   •   parsing clues (ambiguity)
                                                   •   anaphor clues
abstract parser work flow
document    sentence selection         1
                                                        1.        sentences w/ high conclusion presence selected
                                                        2.        shallow entity tagging applied based on umls
                                       2                3.        sentence text is modified to retain entities, optimize
             shallow tagging                                      for performance, and eliminate unnecessary filler.
                                                        4.        deep typed dependency parsing of modified text
                                                        5.        3 tier text classification applied on BOW + entities
                                       3
                sentence                                6.        SVO triples extracted from dependencies

               modification                             7.        rule based conclusions generated from triples
                                                        8.        confidence models applied to rule and classifier
                                                                  based conclusions
9                                                       9.        knowledge base used to represent domain and
             dependency tree                                      discourse style.
knowledge         parse           4                     10.       errors measured against curated data applied for
                                                                  improvements


                                                              5

             triple extraction    6
                                           3 tier classification                                                10



                concluder         7

                                                              8

                        confidence assignment                                        annotations
discussion parser work flow
    message     sentence split          1             1.       sentences split from discussion thread messages
                                                      2.       shallow entity tagging applied based on umls
                                                      3.       sentence text is modified to retain entities, optimize
                                        2                      for performance, and eliminate unnecessary filler.
               shallow tagging                        4.       deep typed dependency parsing of modified text
                                                               for select conclusion types
                                                      5.       select conclusion type based text classification
                                        3                      applied on BOW + entities after dimensionality
                  sentence                                     reduction

                 modification                         6.       SVO triples extracted from dependencies
                                                      7.       rule based conclusions generated from triples

9                                                     8.       rule based KB driven anaphora resolution applied.
                                                               Classification based conclusions added.
               dependency tree                        9.       knowledge base used to represent domain and
knowledge           parse           4                          discourse style.
                                                      10.      errors measured against curated data applied for
                                                               improvements
                                                           5

               triple extraction    6
                                            classification                                                  10



                  concluder         7

                                                           8

              anaphora resolution                                                conclusions
feature engineering

• entities, i.e. normalize synonyms, id new
  entity types, like social relations
• entity types, i.e. metformin >
  treatment_medication
• phrase driven cues, i.e. [have] [you]
  [considered] > suggestion_indicator
anaphora resolution

• relation structure, i.e. [person]>takes>it
  it refers to treatment (i.e. not condition/symptom), and specifically: medication but not device

• statistically driven, manually curated cues
  i.e. drug > treatment/medication
• filter
   – non matching antecedent candidates
   – singular/plural agreement
• score candidates:
   – antecedent occurrence frequency
   – distance (#sents) from antecedent to anaphor
   – co-occurrence of anaphor w/ antecedent
technology

•   Lang: Java + Python + Ruby
•   DB: Solr 4, Mongo DB, S3
•   Work: Map Reduce
•   Dependencies: Malt Parser, Stanford Parser
• Misc: Tomcat, Spring, Mallet, Reverb, Minor Third
• Tagging: Peregrine + home grown
Data Pipeline




                21
Solr 1              Solr 2               Solr N




                                 Load Balancer – API




                Cache
                          Portal 1            Portal 2   Portal N




                              Load Balancer – www.medify.com




request transaction flow
                                                         browser
questions?
Advair: Experts vs. Patients
                  • “medicalese” vs. patients words
                  • more granularity
                  • a story like perspective w/ words
                    of inspiration



                    My Pulmonologist today said that he had just come from
                    the hospital bedside of a patient with my exact symptoms
                    who is on oxygen and not doing well. If I hadn't been so
                    diligent in following my asthma plan that could easily have
                    been me. His telling me that really hit home.




                    By the way, my Pulmonologist surprised me with his
                    view on many of his patients. He actually got excited
                    and thanked me for being knowledgeable about
                    asthma, understanding my own body and health needs
                    and for following my treatment plan. He said he doesn't
                    get many patients who can actually talk about their
                    disease, symptoms, time
                    frames, treatments/medications, and non-medical
                    measures they are taking. He also doesn't get many
                    people who ask questions. My response to this: What
                    are people thinking? They need to take control of their
                    own health or noone else will.

Más contenido relacionado

Similar a Mining Unstructured Healthcare Data

Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm   Dominique C Brack, SBCIWhats The Role Of Social Media In Bcm   Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm Dominique C Brack, SBCIReputelligence
 
Reactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining UsersReactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining Usersgrebstock
 
Seek #CincySM Conversation Starter
Seek #CincySM Conversation StarterSeek #CincySM Conversation Starter
Seek #CincySM Conversation StarterSEEK Company
 
Analyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ ResearchAnalyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ ResearchNirmalPoudel4
 
Seek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSeek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSEEK Company
 
HxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareHxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareAmy Cueva
 
MRECO Conversation Starter
MRECO Conversation StarterMRECO Conversation Starter
MRECO Conversation StarterSEEK Company
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)Durham_Library_DTP
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)Jamie Bisset
 
Corporate communication workshop
Corporate communication workshopCorporate communication workshop
Corporate communication workshopOanaStefancu
 
4 12 college board conference
4 12 college board conference4 12 college board conference
4 12 college board conferenceJaneTors
 
Effects revised
Effects revisedEffects revised
Effects revisedmaduncan
 
SPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasSPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasVal Bello
 
Putting Social Media to Work
Putting Social Media to WorkPutting Social Media to Work
Putting Social Media to WorkMarcus Tewksbury
 

Similar a Mining Unstructured Healthcare Data (16)

Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm   Dominique C Brack, SBCIWhats The Role Of Social Media In Bcm   Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
 
Reactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining UsersReactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining Users
 
Seek #CincySM Conversation Starter
Seek #CincySM Conversation StarterSeek #CincySM Conversation Starter
Seek #CincySM Conversation Starter
 
Analyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ ResearchAnalyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ Research
 
Seek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSeek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starter
 
HxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareHxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in Healthcare
 
MRECO Conversation Starter
MRECO Conversation StarterMRECO Conversation Starter
MRECO Conversation Starter
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)
 
Corporate communication workshop
Corporate communication workshopCorporate communication workshop
Corporate communication workshop
 
Speech
SpeechSpeech
Speech
 
4 12 college board conference
4 12 college board conference4 12 college board conference
4 12 college board conference
 
Press release
Press releasePress release
Press release
 
Effects revised
Effects revisedEffects revised
Effects revised
 
SPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasSPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting Ideas
 
Putting Social Media to Work
Putting Social Media to WorkPutting Social Media to Work
Putting Social Media to Work
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Mining Unstructured Healthcare Data

  • 1. mining unstructured healthcare data deep dhillon chief data scientist | ddhillon@alliancehealth.com | twitter.com/zang0
  • 2. alliance health networks HCP/advanced patients medify.com – 10,000% user growth past year patients diabeticconnect.com - # 1 online diabetes site @ ~1.4M uniques *connect.com – content sites + guided social networks health care industry patient surveying, matchmaking and analysis
  • 3. why mine healthcare text? anatomy of what patients currently use: webmd (e.g. drugs.com, yahoo health, etc.) q/a topic pages news + media experts discussions • original content • addresses emotional needs • simple to understand • provides answers • original content • moderately authoritative • simple to understand • original content • good coverage in the head • fresh • provides answers • simple to understand • good coverage in the head • original content • typically aligned well with google searches, i.e. treatments for X, symptoms for Y original content=important • good coverage in the head providing answers=important • original content head=developed • aligns well with google searches fresh=important • provides answers authority=important
  • 4. why mine healthcare text? Patients and HCPs need long tail, statistically meaningful, consumer friendly, authoritative and fresh health content. q/a topic pages news + media experts discussions • manually written, not authoritative • not thorough, i.e. not long tail • not consistently credible, i.e. minimal • accredation • not statistically meaningful • sparse • manually written, moderately authoritative • not thorough, i.e. not long tail • manually written, not authoritative • not consistently credible, i.e. minimal accredation • not thorough, i.e. not long tail • evergreen, rarely change • manually written, not authoritative • not consistently credible, i.e. minimal accredation • not thorough, i.e. not long tail manual=expensive manual=head focused • manually written, not authoritative • not consistently credible, i.e. minimal accredation manual=not authoritative • not statistically meaningful manual=old and dated
  • 5. automated content generation • cost effective • structured content • statistically based • scales to millions of patients • scales to long tail treatments + conditions • authoritative / citation driven • fresh
  • 6. types of text mining
  • 8. how do we mine text? assignments curators examples knowledge parser Index external repos Page like UMLS Search Module
  • 9. annotate in the product: • Easier for people to give explicit GT feedback • Channels high visibility error angst productively • Channels more GT toward areas most seen by users • Override high visibility system mistakes with human data gt demo: • http://www.diabeticconnect.com/discussions/4790 • https://www.medify.com/internal/annotate/abstract?abstractId=8181575
  • 10.
  • 11.
  • 12.
  • 13. Detailed Signals Preliminary Mining From Discussion Threads 5,000 4,500 4,000 3,500 3,000 2,500 2,000 1,500 1,000 500 0
  • 14. Distribution of Signals Mined From Diabetic Connect Discussion Threads Medical History - Newly Lifestyle Adjustment Diagnosed Medical History - Symptom 6% 0% 3% Medical History - Condition 10% Helper/Influencer 33% Personal Account 12% Treatment Experience 15% Medical History 21%
  • 15. API Research (Outcome) Discussion (Gromin) curator treatment/symptom/condition/demographic • identity (i.e. UMLS id) UMLS • taxonomical relations • hierarchical is_a • condition/arthritis/Rhematoid Arthritis • synonym relations • RA=Rhematoid Arthritis • metformim • polarity knowledge base • effective=positive • ineffective=negative • parsing clues (ambiguity) • anaphor clues
  • 16. abstract parser work flow document sentence selection 1 1. sentences w/ high conclusion presence selected 2. shallow entity tagging applied based on umls 2 3. sentence text is modified to retain entities, optimize shallow tagging for performance, and eliminate unnecessary filler. 4. deep typed dependency parsing of modified text 5. 3 tier text classification applied on BOW + entities 3 sentence 6. SVO triples extracted from dependencies modification 7. rule based conclusions generated from triples 8. confidence models applied to rule and classifier based conclusions 9 9. knowledge base used to represent domain and dependency tree discourse style. knowledge parse 4 10. errors measured against curated data applied for improvements 5 triple extraction 6 3 tier classification 10 concluder 7 8 confidence assignment annotations
  • 17. discussion parser work flow message sentence split 1 1. sentences split from discussion thread messages 2. shallow entity tagging applied based on umls 3. sentence text is modified to retain entities, optimize 2 for performance, and eliminate unnecessary filler. shallow tagging 4. deep typed dependency parsing of modified text for select conclusion types 5. select conclusion type based text classification 3 applied on BOW + entities after dimensionality sentence reduction modification 6. SVO triples extracted from dependencies 7. rule based conclusions generated from triples 9 8. rule based KB driven anaphora resolution applied. Classification based conclusions added. dependency tree 9. knowledge base used to represent domain and knowledge parse 4 discourse style. 10. errors measured against curated data applied for improvements 5 triple extraction 6 classification 10 concluder 7 8 anaphora resolution conclusions
  • 18. feature engineering • entities, i.e. normalize synonyms, id new entity types, like social relations • entity types, i.e. metformin > treatment_medication • phrase driven cues, i.e. [have] [you] [considered] > suggestion_indicator
  • 19. anaphora resolution • relation structure, i.e. [person]>takes>it it refers to treatment (i.e. not condition/symptom), and specifically: medication but not device • statistically driven, manually curated cues i.e. drug > treatment/medication • filter – non matching antecedent candidates – singular/plural agreement • score candidates: – antecedent occurrence frequency – distance (#sents) from antecedent to anaphor – co-occurrence of anaphor w/ antecedent
  • 20. technology • Lang: Java + Python + Ruby • DB: Solr 4, Mongo DB, S3 • Work: Map Reduce • Dependencies: Malt Parser, Stanford Parser • Misc: Tomcat, Spring, Mallet, Reverb, Minor Third • Tagging: Peregrine + home grown
  • 22. Solr 1 Solr 2 Solr N Load Balancer – API Cache Portal 1 Portal 2 Portal N Load Balancer – www.medify.com request transaction flow browser
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Advair: Experts vs. Patients • “medicalese” vs. patients words • more granularity • a story like perspective w/ words of inspiration My Pulmonologist today said that he had just come from the hospital bedside of a patient with my exact symptoms who is on oxygen and not doing well. If I hadn't been so diligent in following my asthma plan that could easily have been me. His telling me that really hit home. By the way, my Pulmonologist surprised me with his view on many of his patients. He actually got excited and thanked me for being knowledgeable about asthma, understanding my own body and health needs and for following my treatment plan. He said he doesn't get many patients who can actually talk about their disease, symptoms, time frames, treatments/medications, and non-medical measures they are taking. He also doesn't get many people who ask questions. My response to this: What are people thinking? They need to take control of their own health or noone else will.