Platform for Patient Centric Collaborative
                      Research
           Dadong Wan, Sophia Cao, Karthik Gomadam,
             Accenture Technology Labs, San Jose, CA.
    { dadong.wan, sophia.cao, karthik.gomadam }@accenture.com


1     Abstract
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. However, finding relevant
patients to engage with medical providers is an important challenge. In this
paper, we describe a solution to this problem that leverages patient
data available in online health communities and matches the patients in
these communities to relevant research projects. Our solution can be applied to data
from any patient community and patients can engage with researchers from
within the communities they are already a part of. We believe that this approach
will help researchers find highly relevant patients and will enable patient centric,
dynamic and responsive research.


2     Introduction
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence-based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. For example, researchers
who want to study the effectiveness of levetiracetam, lamotrigine, or
oxcarbazepine in pediatric epilepsy patients should engage with the patients and
their caregivers. Measuring effectiveness in this way would allow them to validate
both their treatment plans and the care plan process that helps patients manage
their conditions. These validations will help providers analyze and optimize
their performance in the pay-for-performance age. However, finding relevant
patients to engage with medical providers is a non-trivial problem. In the above
example, providers will need to recruit patients who are children, have epilepsy,
and are prescribed levetiracetam, lamotrigine, or oxcarbazepine.
   In this paper, we propose a solution to address this problem using the data
from online health communities. Our past experience developing
applications to match patients and clinical trial investigators showed us
that patients will not flock to recruitment platforms and that any meaningful
solution should “fish where the fish are”. We realized that patient communities
such as PatientsLikeMe and Medhelp have millions of patients who are sharing
information about their medical conditions, medications, and their experience
in managing their conditions. We developed a solution that takes advantage of
this patient data, allowing researchers to find patients from these communities.
We apply semantic and text mining algorithms to analyze patient conversations
in these communities to build rich patient profiles that capture their medical
conditions, medications, and demographic information. We build similar profiles
for research projects (listed at PCORI.org). We then match and rank the project
and the patient profiles to find the most relevant patients.
    One challenge in matching projects with patients based on patient conver-
sations is the difference in the ways in which different participants (researchers,
patients, caregivers) describe the same thing. For example, a researcher will use
“diabetes mellitus” while a patient might say “type 2”. Using semantic Web
technologies (UMLS ontologies, the OpenCalais entity extractor, semantic type
matching) allows us to overcome this problem.
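
The vocabulary gap described above can be illustrated with a minimal sketch. It assumes a small hand-written lookup table and a one-link hierarchy in place of the UMLS ontologies; the concept identifiers, the table contents, and the function names are all hypothetical and are not taken from the PCCR implementation.

```python
# Hypothetical concept identifiers; these are NOT real UMLS CUIs.
SURFACE_TO_CONCEPT = {
    "diabetes mellitus": "C_DIABETES",
    "diabetes": "C_DIABETES",
    "type 2": "C_DIABETES_T2",
    "type 2 diabetes": "C_DIABETES_T2",
}

# Hypothetical is-a links (child -> parent), standing in for an ontology hierarchy.
PARENT = {"C_DIABETES_T2": "C_DIABETES"}


def to_concept(phrase):
    """Map a surface phrase to a concept id, or None if it is unknown."""
    return SURFACE_TO_CONCEPT.get(phrase.strip().lower())


def is_same_or_narrower(concept, target):
    """True if `concept` equals `target` or is a descendant of it."""
    while concept is not None:
        if concept == target:
            return True
        concept = PARENT.get(concept)
    return False


def terms_match(project_term, patient_term):
    """True if the patient's wording denotes the project's concept or a narrower one."""
    project_concept = to_concept(project_term)
    patient_concept = to_concept(patient_term)
    if project_concept is None or patient_concept is None:
        return False
    return is_same_or_narrower(patient_concept, project_concept)
```

With this table, `terms_match("diabetes mellitus", "type 2")` holds because the patient's phrase maps to a narrower concept, while the reverse direction does not. A real system would draw both the synonyms and the hierarchy from the ontology rather than from hand-written dictionaries.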
    We have built a prototype (available at: http://bit.ly/pccr_acn)
that demonstrates the effectiveness of our approach in finding patients. Due
to privacy concerns, we were not able to integrate with existing online
communities. Instead, we developed a sample online community, MeMed (available at:
http://bit.ly/me_med_), and created posts similar to those found in existing
communities. Our prototype allows users to add PCORI projects and find
matching patients in MeMed.


3     Overview of the PCCR Platform
In this section we briefly describe the PCCR platform. We begin by describing
the main models in the system. These are illustrated in Figure 1.
    1. Investigators: This model captures information about the investigators who are
       seeking participants for their projects. We model the institution and the
       areas of interest for an investigator. The areas of interest of an investigator
       are automatically created by analyzing their projects.
    2. Projects: Each investigator can have multiple projects. Each project has
       a title, description, goals, project type that captures the nature of the
       project, the medical conditions and medications of interest described in
       the project, and the expected outcomes. Our matching algorithm matches
       participants across these different dimensions and calculates a match score.
       The patients are ranked based on this match score.
    3. Patients: We extract patient profiles based on their conversations / partic-
       ipation in existing online health communities. We identify and use their
       demographic, socio-economic, and medical information in creating their
       profile.
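
The three models listed above could be sketched as plain data classes. This is an illustration only: the field names follow the text and Figure 1 where possible, but the class layout, the types, and the rule that derives an investigator's areas of interest (the union of conditions and medications across projects) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    title: str
    description: str
    goals: List[str] = field(default_factory=list)
    project_type: str = ""  # e.g. "therapeutic"
    medical_conditions: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    expected_outcomes: List[str] = field(default_factory=list)


@dataclass
class Investigator:
    name: str
    institution: str
    projects: List[Project] = field(default_factory=list)

    @property
    def areas_of_interest(self):
        """Derived from the investigator's projects (assumed derivation rule)."""
        areas = set()
        for project in self.projects:
            areas.update(project.medical_conditions)
            areas.update(project.medications)
        return sorted(areas)


@dataclass
class Patient:
    name: str
    age: Optional[int] = None
    gender: Optional[str] = None
    location: Optional[str] = None
    economic_status: Optional[str] = None
    medical_conditions: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
```

Making `areas_of_interest` a computed property mirrors the statement that these areas are created automatically from the projects rather than entered by hand.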


[Figure 1, diagram text recovered from the original:
 Investigators: name, organization, areas of interest, project history.
 Patients: name, age, gender, location, economic status, race, areas of
 interest, medical conditions / stage, activity.
 Each investigator has Project 1 to Project k. Each project has a project
 definition (statement, goals), a type (preventive, diagnostic, therapeutic,
 palliative, health delivery), medical conditions (medical condition,
 condition stage, medication), demographics (age, gender, economic, region,
 race), and an outcome (trial, tests, studies, surveys).
 PR = patient response, UC = user comment.]




             Figure 1: PCCR Matching Platform - Data Definitions


[Figure 2, diagram text recovered from the original:
 Project Title & Description -> Big Data & Multidimensional Semantic Analysis
 -> Rich Project Profile.
 Patient Communities -> Big Data & Multidimensional Semantic Analysis
 -> Rich Patient Profile.
 Both profiles -> Multidimensional Semantic Match Engine -> Matched
 Participants Across Online Patient Communities.]




                        Figure 2: PCCR Matching Platform - Data Flow

    At the heart of the PCCR platform is our matching engine, which uses the
researcher and patient profiles to identify relevant patients for a project.
Figure 2 illustrates the data flow of the matching engine.
The two main components of the matching engine are the researcher profile
generator and the patient profile generator.

                 Figure 3: Example output of semantic analysis

    The researcher profile generator takes as input the textual description of a
research project. For the purposes of this challenge, we use the descriptions of
funded PCORI projects. The description is passed through a semantic analyzer,
which is built using concepts in RxNorm and SNOMED and identifies medical
terms and concepts in the description, along with their semantic types. The
description is also sent to the OpenCalais Web API for entity identification. A
final list of entities and types is created by combining the output of the
semantic analyzer and OpenCalais. The demographic analyzer module extracts
demographic information (such as the age group of the target population, gender,
and location). We use textual cues to identify expected outcomes. Figure 3
illustrates the entities identified from the description of a PCORI project on
epilepsy.
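
The step that combines the two extractor outputs into one final list might look like the sketch below. It assumes each extractor returns (entity text, semantic type) pairs and that duplicates are detected case-insensitively; the function name, the pair format, and the sample values are illustrative assumptions, not actual analyzer or OpenCalais output.

```python
def merge_entities(*entity_lists):
    """Merge (text, semantic_type) pairs, dropping case-insensitive duplicates.

    The first spelling seen for each pair is kept, and input order is preserved.
    """
    seen = set()
    merged = []
    for entities in entity_lists:
        for text, semantic_type in entities:
            key = (text.strip().lower(), semantic_type.strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append((text.strip(), semantic_type.strip()))
    return merged


# Made-up outputs for illustration only; not real extractor results.
from_analyzer = [("epilepsy", "Disease"), ("levetiracetam", "Drug")]
from_opencalais = [("Epilepsy", "disease"), ("children", "Population")]

final_entities = merge_entities(from_analyzer, from_opencalais)
```

Here the second mention of epilepsy is dropped because it differs from the first only in letter case. In practice the two sources may use different type vocabularies, so a type-mapping step would likely be needed before merging.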
    The patient profile generator uses the semantic analyzer and the
demographic analyzer described above. However, given the volume of patient data
and the cost of semantic analysis, we needed a more scalable approach. We use
a MapReduce-based solution consisting of a series of map and reduce jobs.
The first map job takes user profiles as input and uses the semantic analyzer
to identify entities and types. In parallel, another map job
extracts entities and types using OpenCalais. The respective reduce jobs
combine all the identified entities for a patient. We merge these lists to create a
semantic signature of the patient consisting of a collection of entities and their
types. Demographic and socio-economic information is identified in a similar way.
Combining all of the above information yields a rich patient profile. We store
the profile as a structured object in MongoDB.
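
The map and reduce jobs described above can be approximated in a few lines of plain Python. This sketch runs sequentially in one process, whereas the jobs in the text run in parallel on a MapReduce framework. The two keyword lexicons stand in for the semantic analyzer and OpenCalais, and the posts, lexicon entries, and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical keyword lexicons standing in for the two extractors.
ANALYZER_LEXICON = {"epilepsy": "condition", "keppra": "medication"}
SECOND_LEXICON = {"seizure": "symptom", "epilepsy": "condition"}


def make_mapper(lexicon):
    """Build a map function that emits (patient_id, (entity, type)) pairs."""
    def mapper(post):
        patient_id, text = post
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word in lexicon:
                yield patient_id, (word, lexicon[word])
    return mapper


def run_map_reduce(posts, mapper):
    """Map every post, group by patient id, and reduce each group to a set."""
    grouped = defaultdict(set)
    for post in posts:
        for patient_id, entity in mapper(post):
            grouped[patient_id].add(entity)
    return dict(grouped)


def merge_signatures(*results):
    """Union the per-patient entity sets produced by each job."""
    merged = defaultdict(set)
    for result in results:
        for patient_id, entities in result.items():
            merged[patient_id] |= entities
    return dict(merged)


# Invented posts, keyed by a made-up patient id.
posts = [
    ("p1", "My son has epilepsy."),
    ("p1", "He had a seizure after starting Keppra"),
    ("p2", "Looking for recipes"),
]

signatures = merge_signatures(
    run_map_reduce(posts, make_mapper(ANALYZER_LEXICON)),
    run_map_reduce(posts, make_mapper(SECOND_LEXICON)),
)
```

Keying every emitted pair by patient id is what lets the reduce step collect entities from many posts into a single signature per patient.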
    The matching algorithm takes as input a rich project profile. For each of
the facets (medical condition, medication, and demographics), the matching
algorithm first finds the relevant patients using set containment operators.
We also use MongoDB's geo querying to filter users by location if the
project description mentions such a restriction. Further, we apply a semantic
similarity measure (based on Ted Pedersen's UMLS Similarity project, available at
http://umls-similarity.sourceforge.net/) to compute the semantic
similarity of a patient profile to that of a project. All of these are then combined
to create a match score that is used in selecting and ranking patients.
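
One way the facet checks and the similarity value could be combined into a single score is sketched below. The text does not give the combination formula, so the weighted sum, the weights, and the dictionary layout of the profiles are assumptions. The semantic similarity is passed in as a precomputed number in [0, 1] instead of being computed here, and set containment contributes to the score rather than acting as a hard filter.

```python
def facet_matches(required, available):
    """Set containment: every required value must appear in the patient's set."""
    return set(required) <= set(available)


def match_score(project, patient, similarity, weights=None):
    """Combine facet containment checks and a similarity value into one score.

    `similarity` is a number in [0, 1] supplied by the caller; the default
    weights are illustrative assumptions, not values from the paper.
    """
    weights = weights or {"condition": 0.4, "medication": 0.2,
                          "demographics": 0.2, "similarity": 0.2}
    score = 0.0
    for facet in ("condition", "medication", "demographics"):
        if facet_matches(project.get(facet, ()), patient.get(facet, ())):
            score += weights[facet]
    score += weights["similarity"] * similarity
    return score


def rank_patients(project, patients, similarities):
    """Return (patient_id, score) pairs, best match first."""
    scored = [
        (pid, match_score(project, profile, similarities.get(pid, 0.0)))
        for pid, profile in patients.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented profiles for the pediatric epilepsy example from the introduction.
project = {"condition": {"epilepsy"}, "medication": {"levetiracetam"},
           "demographics": {"child"}}
patients = {
    "p1": {"condition": {"epilepsy"}, "medication": {"levetiracetam"},
           "demographics": {"child", "female"}},
    "p2": {"condition": {"epilepsy"}, "medication": set(),
           "demographics": {"adult"}},
}
ranking = rank_patients(project, patients, {"p1": 0.9, "p2": 0.5})
```

In this example the first patient satisfies all three facets and ranks above the second, who matches only on the medical condition.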


4    Related Work
The techniques we have used in this paper build upon prior research in the
areas of the semantic Web, hierarchical object matching, and entity extraction. In
the context of patient matching for healthcare, the TrialX system [4] is very
relevant to our work. We also use our prior work in the area of faceted matching
and searching of unstructured documents [3] for facet extraction. We model our
similarity measurement technique on their approach. We also applied
the principles of hierarchical object matching discussed by Ganesan et al. [2]
and Doan et al. [1]. We also use the OpenCalais Web service [5] to semantically
enrich patient conversations and project descriptions and to extract relevant
entities.


5    Conclusions
In this paper, we describe our solution to the PCORI Healthcare 2.0 chal-
lenge. Our solution leverages existing patient data available in online health
communities and creates a rich semantic profile of the patients. We have also
developed techniques for creating multi-dimensional project profiles from their
textual descriptions. We have developed a semantic matching algorithm that
finds matching patients for research projects. The PCCR platform we have de-
veloped works for any patient community. Due to privacy concerns, we have
not used any online community data in our development or demonstration. In-
stead, we use data from a patient community that we prototyped and seeded
with posts. We evaluated our system and found that our approach has over
90% accuracy in finding patients who have the same or similar medical conditions.
The match rate when using demographics goes down to about 80%. We are
currently improving our demographic profiling and extraction technique. Our
approach builds on current ways patients share and interact on the Web today
and we believe that it can help researchers find very relevant patients leading
to more meaningful and productive engagements and outcomes.


References
[1] Anhai Doan, Pedro Domingos, and Alon Halevy. Learning to match the
    schemas of data sources: A multistrategy approach. Machine Learning,
    50(3):279–301, 2003.
[2] Prasanna Ganesan, Hector Garcia-Molina, and Jennifer Widom. Exploiting
    hierarchical domain structure to compute similarity. ACM Transactions on
    Information Systems (TOIS), 21(1):64–93, 2003.
[3] Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P. Sheth,
    and Kunal Verma. A faceted classification based approach to search and rank
    Web APIs. In IEEE International Conference on Web Services (ICWS'08),
    pages 177–184. IEEE, 2008.
[4] Chintan Patel, Sharib Khan, and Karthik Gomadam. TrialX: Using semantic
    technologies to match patients to relevant clinical trials based on their
    personal health records. Proc. of the International Semantic Web Conference
    (ISWC), 2009.
[5] Thomson Reuters. OpenCalais, 2009.


                                        5

Más contenido relacionado

Similar a Pcori2013 (23)

Leverage machine learning and new technologies to enhance rwe generation and ...
Leverage machine learning and new technologies to enhance rwe generation and ...Leverage machine learning and new technologies to enhance rwe generation and ...
Leverage machine learning and new technologies to enhance rwe generation and ...Athula Herath
 
SAS M2006 Presentation
SAS M2006 PresentationSAS M2006 Presentation
SAS M2006 PresentationGregPotts
 
Standards & Coding Systems in Biomedical and Health Informatics
Standards & Coding Systems in Biomedical and Health InformaticsStandards & Coding Systems in Biomedical and Health Informatics
Standards & Coding Systems in Biomedical and Health InformaticsNawanan Theera-Ampornpunt
 
AI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and OverviewAI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and OverviewCarlo Carandang
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Sage Base
 
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Sage Base
 
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...journal ijrtem
 
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...IJRTEMJOURNAL
 
Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Fondazione Giannino Bassetti
 
Application of Data Analytics to Improve Patient Care: A Systematic Review
Application of Data Analytics to Improve Patient Care: A Systematic ReviewApplication of Data Analytics to Improve Patient Care: A Systematic Review
Application of Data Analytics to Improve Patient Care: A Systematic ReviewIRJET Journal
 
Architectuurcongres 20110623
Architectuurcongres 20110623Architectuurcongres 20110623
Architectuurcongres 20110623Hazelzet
 
Visual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
Visual Analytics for Healthcare - Panel at AMIA 2012 in ChicagoVisual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
Visual Analytics for Healthcare - Panel at AMIA 2012 in ChicagoAdam Perer
 
A location comparison of three health care centers in Sfax-city
A location comparison of three health care centers in Sfax-cityA location comparison of three health care centers in Sfax-city
A location comparison of three health care centers in Sfax-cityIJERD Editor
 
Big data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesBig data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesssuserc491ef2
 
Big data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesBig data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesssuserc491ef2
 
Integration of HEOR into Global Publication Plans
Integration of HEOR into Global Publication PlansIntegration of HEOR into Global Publication Plans
Integration of HEOR into Global Publication PlansDr. Kavita Lamror
 
Biomedical Informatics
Biomedical InformaticsBiomedical Informatics
Biomedical Informaticsimprovemed
 

Similar a Pcori2013 (23) (20)

Leverage machine learning and new technologies to enhance rwe generation and ...
Leverage machine learning and new technologies to enhance rwe generation and ...Leverage machine learning and new technologies to enhance rwe generation and ...
Leverage machine learning and new technologies to enhance rwe generation and ...
 
Artificial Intelligence in Medicine
Artificial Intelligence in MedicineArtificial Intelligence in Medicine
Artificial Intelligence in Medicine
 
SAS M2006 Presentation
SAS M2006 PresentationSAS M2006 Presentation
SAS M2006 Presentation
 
Standards & Coding Systems in Biomedical and Health Informatics
Standards & Coding Systems in Biomedical and Health InformaticsStandards & Coding Systems in Biomedical and Health Informatics
Standards & Coding Systems in Biomedical and Health Informatics
 
AI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and OverviewAI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and Overview
 
Secondary Use of Healthcare Data for Translational Research
Secondary Use of Healthcare Data for Translational ResearchSecondary Use of Healthcare Data for Translational Research
Secondary Use of Healthcare Data for Translational Research
 
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24Stephen Friend Dana Farber Cancer Institute 2011-10-24
Stephen Friend Dana Farber Cancer Institute 2011-10-24
 
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
 
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
 
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
 
Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...
 
Application of Data Analytics to Improve Patient Care: A Systematic Review
Application of Data Analytics to Improve Patient Care: A Systematic ReviewApplication of Data Analytics to Improve Patient Care: A Systematic Review
Application of Data Analytics to Improve Patient Care: A Systematic Review
 
Architectuurcongres 20110623
Architectuurcongres 20110623Architectuurcongres 20110623
Architectuurcongres 20110623
 
Visual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
Visual Analytics for Healthcare - Panel at AMIA 2012 in ChicagoVisual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
Visual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
 
A location comparison of three health care centers in Sfax-city
A location comparison of three health care centers in Sfax-cityA location comparison of three health care centers in Sfax-city
A location comparison of three health care centers in Sfax-city
 
Big data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesBig data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniques
 
Big data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniquesBig data analytics in health care by data mining and classification techniques
Big data analytics in health care by data mining and classification techniques
 
Integration of HEOR into Global Publication Plans
Integration of HEOR into Global Publication PlansIntegration of HEOR into Global Publication Plans
Integration of HEOR into Global Publication Plans
 
Duke University Medical Center
Duke University Medical CenterDuke University Medical Center
Duke University Medical Center
 
Biomedical Informatics
Biomedical InformaticsBiomedical Informatics
Biomedical Informatics
 

Pcori2013 (23)

  • 1. Platform for Patient Centric Collaborative Research Dadong Wan, Sophia Cao, Karthik Gomadam, Accenture Technology Labs, San Jose, CA. { dadong.wan, sophia.cao, karthik.gomadam }@accenture.com 1 Abstract The Affordable Care Act is perhaps the most significant “face-lift” in the U.S. healthcare system since the introduction of Medicare and Medicaid. Key focus areas of ACA include evidence based care and pay for performance. Patient engagement is at the heart of both of these focus areas. However, finding relevant patients to engage with medical providers is an important challenge. In this paper, we describe our solution to alleviate this problem that leverages patient data avaialble in online health communities and seeks to match the patients in these communities for relevant projects. Our solution can be applied to data from any patient community and patients can engage with researchers from within the communities they are already a part of. We believe that this approach will help researchers find highly relevant patients and will enable patient centric, dynamic and responsive research. 2 Introduction The Affordable Care Act is perhaps the most significant “face-lift” in the U.S. healthcare system since the introduction of Medicare and Medicaid. Key focus areas of ACA include evidence based care and pay for performance. Patient en- gagement is at the heart of both of these focus areas. For example, researchers who want to study the effectiveness of levetiracetam, lamotrigine, or oxcar- bazepine on pediatric epilepsy patients should engage with the patients and their caregivers. Measuring this would allow them to validate their care plan process for helping patients manage their conditions as well as that of their treat- ment plans. Having these validations will help providers analyze and optimize their performance in the pay for performance age. However, finding relevant patients to engage with medical providers is a non-trivial problem. 
In the above example, providers will need to recruit patients who are children, have epilepsy, and are prescribed levetiracetam, lamotrigine, or oxcarbazepine. In this paper, we propose a solution to address this problem using the data from online health communities. Our experience in the past when we developed 1
  • 2. applications to match patients and clinical trial investigators had proved to us that patients will not flock to recruitment platforms and any meaningful solution should /emphfish where the fishes are. We realized that patient communities such as PatientsLikeMe and Medhelp have millions of patients who are sharing information about their medical conditions, medications, and their experience in managing their conditions. We developed a solution that takes advantage of this patient data, allowing researchers to find patients from these communities. We apply semantic and text mining algorithms to analyze patient conversations in these communities to build rich patient profies that captures their medical conditions, medications, and demographic information. We build similar profiles for research projects (listed at PCORI.org). We then match and rank the project and the patient profiles to find the most relevant patients. One challenge in matching projects with patients based on patient conver- sations is the difference in the ways in which different participants (researchers, patients, caregivers) describe the same thing. For example, a researcher will use diabetes mellitus while a patient might say type 2. Using semantic Web tech- nologies (UMLS ontologies, OpenCalais entity extractor, semantic type match- ing) allows us to overcome this problem. We have prototyped our approach (available at: http://bit.ly/pccr_acn) that demonstrates the effectiveness of our approach in finding patients. Due to privacy concerns, we were not able to integrate with existing online commu- nities. We have developed a sample online community, MeMed (available at: http://bit.ly/me_med_), and created posts similar to those found in existing communities. Our prototype allows users to add PCORI projects and finding matching patients in MeMed. 3 Overview of the PCCR Platform In this section we briefly describe the PCCR platform. We begin by describing the main models in the system. 
These are illustated in figure 1. 1. Investigators: captures the information about the investigators who are seeking participants for their projects. We model the institution and the areas of interest for an investigator. The areas of interest of an investigator are automatically created by analyzing their projects. 2. Projects: Each investigator can have multiple projects. Each project has a title, description, goals, project type that captures the nature of the project, the medical conditions and medications of interest described in the project, and the expected outcomes. Our matching algorithm matches participants across these different dimensions and calculates a match score. The patients are ranked based on this match score. 3. Patients: We extract patient profiles based on their conversations / partic- ipation in existing online health communities. We identify and use their demographic, socio-economic, and medical information in creating their profile. 2
  • 3. Inves-gators( Pa-ents( Name( Name( Organiza-on( Age( Areas(of(interest( Gender( Project(History( Loca-on( Economic(status( Race( Areas(of(interest( Medical(Condi-ons(/(stage( Ac-vity( Project(1( Project(2( Project(k( Project(defini-on( Medical( Statement( Goals( Type( Condi-ons( Demographics( Outcome( Preven-ve( Medical( Age( Trial( Diagnos-c( condi-on( Gender( Tests( PR( PR( Therapeu-c( Condi-on(stage( Economic( Studies( Pallia-ve( Medica-on( Region( Surveys( UC( UC( Health(Delivery( Race( UC( UC( PR( PR( *PR(–(Pa-ent(response,(*UCN(User(comment( Figure 1: PCCR Matching Platform - Data Definitions Project Title & Description Patient Communities Big Data & Multidimensional Big Data & Multidimensional Semantic Analysis Semantic Analysis Rich Project Profile Rich Patient Profile Multidimensional Semantic Match Engine Matched Participants Across Online Patient Communities Figure 2: PCCR Matching Platform - Data Flow The researcher and patients profiles are used by our matching engine to identify relevant patients for a project. At the heart of the PCCR platform is our matching engine. Figure 2 illustrates the data flow of our matching engine. The two main components of the matching engine are the researcher profile generator and the patient profile generator. The researcher profile generator takes as input the textual description of a research project. For the purposes of this challenge, we use the descriptions of funded PCORI projects. This profile is passed through a semantic analyzer. The semantic analyzer is built using concepts in RXNorm and SNOMED and 3
  • 4. Figure 3: Example output of semantic analysis identifies medical terminologies and concepts in the description, along with their semantic types. In addition to the semantic analyzer, the description is also sent to OpenCalais Web API for entity identification. A final list of entites and types is created by combining the output of the semantic analyzer and OpenCalais. The demographic analyzer module extracts demographic information (such as age group of target population, gender, and location information). We use textual cues to identify expected outcomes. Figure 3 illustrates the entities identified from the description of a PCORI project on Epilepsy. The patient profile generator uses the semantic analyzer and the demo- graphic analyzer. However, given the volume of patient data, we needed to adopt a more scalable approach as semantic analysis can be expensive. We use a Map-Reduce based solution, where we have a series of map and reduce jobs. The first map job takes user profiles as input and uses the semantic analyzer to identify entities and types. In parallel, we have another map job that uses extracts entities and types using OpenCalais. The respective reduce jobs com- bine all the identified entities for a patient. We merge these lists to create a semantic signature of the patient consisting of a collection of entities and their types. Similarly, the demographic and socio-economic information is identified. Combining all of the above information yields a rich patient profile. We store the profile as a structured object in Mongo. The matching algorithm takes as input a rich project profile. For each of the facets in medical condition, medication, and demographics, the match- ing algorithm first finds the relevant patients using set containment opera- tors. We also use Mongo’s geo querying to filter users by location, if the project description mentions such as a restriction. 
Further, we apply a semantic similarity measure (based on Ted Pedersen's UMLS Similarity project, available at http://umls-similarity.sourceforge.net/) to compute the semantic similarity of a patient profile to that of a project. All of these are then combined to create a match score that is used in selecting and ranking patients.


4     Related Work
The techniques we have used in this paper build upon prior research in the areas of the Semantic Web, hierarchical object matching, and entity extraction. In the context of patient matching for healthcare, the TrialX system [4] is very relevant to our work. We also use our prior work in the area of faceted matching and searching of unstructured documents [3] for facet extraction, and we model our similarity measurement technique on that approach. We also applied the principles of hierarchical object matching discussed by Ganesan et al. in [2] and Doan et al. in [1]. Finally, we use the OpenCalais Web service [5] to semantically enrich patient conversations and project descriptions and to extract relevant entities.


5     Conclusions
In this paper, we describe our solution to the PCORI Healthcare 2.0 challenge. Our solution leverages existing patient data available in online health communities and creates a rich semantic profile of the patients. We have also developed techniques for creating multi-dimensional project profiles from their textual descriptions, and a semantic matching algorithm that finds matching patients for research projects. The PCCR platform we have developed works for any patient community. Due to privacy concerns, we have not used any online community data in our development or demonstration. Instead, we use data from a patient community that we prototyped and seeded with posts. We evaluated our system and found that our approach has over 90% accuracy in finding patients who have the same or similar medical conditions. The match rate when using demographics goes down to about 80%. We are currently improving our demographic profiling and extraction technique. Our approach builds on the ways patients share and interact on the Web today, and we believe that it can help researchers find very relevant patients, leading to more meaningful and productive engagements and outcomes.


References
[1] AnHai Doan, Pedro Domingos, and Alon Halevy. Learning to match the schemas of data sources: A multistrategy approach. Machine Learning, 50(3):279–301, 2003.
[2] Prasanna Ganesan, Hector Garcia-Molina, and Jennifer Widom. Exploiting hierarchical domain structure to compute similarity. ACM Transactions on Information Systems (TOIS), 21(1):64–93, 2003.
[3] Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P. Sheth, and Kunal Verma. A faceted classification based approach to search and rank Web APIs. In IEEE International Conference on Web Services (ICWS '08), pages 177–184. IEEE, 2008.
[4] Chintan Patel, Sharib Khan, and Karthik Gomadam. TrialX: Using semantic technologies to match patients to relevant clinical trials based on their personal health records. Proc. of the International Semantic Web Conference (ISWC), 2009.
[5] Thomson Reuters. OpenCalais, 2009.