
Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Graphs are used to map relations in unstructured data. Most company data comes from databases and is mined with traditional data mining approaches. However, modeling relational data as a graph can reveal useful insights and uncover relations in the data that traditional data mining techniques ignore. In this work we used graphs to map physician relations, using claim data as a proxy, and this approach revealed interesting insights for a health insurance company.

Published in: Technology

  1. Ana Paula Appel, Data Scientist & Master Inventor. Discovering the hidden treasure of data using graph analytic
  2. © 2015 IBM Corporation
  3. IBM Research – Brazil (view from the Rio de Janeiro Lab). Mission: To be known for our science and technology and vital to IBM, Brazil, our clients in the region and worldwide
  4. Healthcare Data
     • Medical attention transactional data
     • Large healthcare insurance company in Brazil
     • Nationwide
     • Spanning 1.5 years (2013-2014)
     • 0.6 TB (compressed)
  5. Healthcare Data: Stakeholders
     • Physicians
     • Patients
     • Healthcare providers
     • Health Services
     • Claims
     • Health Insurance Company
  6. Healthcare Data: Claims
     • Paid claims
       • Total: 109M
       • Doctors: 220k (almost half of all doctors in Brazil)
       • Patients: 2.2M
       • Unique doctor-patient pairs: 11.6M
     • Other support data
       • Company
       • Providers
       • Authorizations: ~3M
       • Claim denials: ~13M
       • Geolocation
       • ...
     • Over 40 tables, hundreds of fields
     • CLAIM record: Physician ID, Patient ID, Timestamp, Service code, Disease (ICD9), (80+ extra rows)
  7. A Complex Network Perspective
  8. A Network Approach
     PhysID   ICD9  PatientID  DATE
     SP45962  -     1001       09/04/13
     SP45962  Z017  1001       26/04/13
     SP47108  Z017  1001       06/12/13
     SP47108  Z017  1001       16/12/13
     SP45962  -     1002       11/07/13
     SP45962  Z017  1002       12/07/13
     SP45962  -     1002       19/08/13
     SP59938  Z000  1002       24/10/13
     …        …     …          …
     [figure labels: Bipartite graph, Weighted graph, Directed graph]
     • Bipartite network of doctors and patients
     • |V| = 2.4M, |E| = 11.6M
     • Keep only the largest connected component (92%-99% of all links)
     • Remove multiple edges and map them to weights
  9. Patient-Sharing Networks (links represent a shared patient)
     Network            Nodes  Links
     Phys - Patient     402    403
     Patient - Patient  377    5488
     Phys - Phys        25     30
  10. Physician and Patient Degree Distributions
      • Patient histogram: one patient with 123 different physicians; 409k patients with only 1 physician
      • Physician histogram: 26 physicians with more than 5k different patients, 1 with 30k (possibly spurious)
  11. Network-Derived Metrics
      • Aim: extend the description of each doctor with relevant metrics
      • Metrics which, in combination with other data, make it possible to:
        • classify
        • filter
        • reduce
      [figure: example rows of per-doctor metrics (35, 0.1, 3.2, 0, 4%, 7%, ... and 17, 0.2, 5.1, 1, 9%, 1%, ...), with the labels "Compliant doctors" and "Not-compliant doctors"]
  12. Case: Building Metrics to Describe Physicians Using Complex Networks: Mutual Reference, Centrality, Loyalty
  13. Health Insurance: Similarity between Complex Networks [figure panels: Friendship, Physician Network]
  14. Mutual Reference
  15. Mutual Reference
      • The same patient visits two doctors
      • This happens in both directions (reciprocal link)
      • [diagram: on a timeline, patient 1 visits doctor a and then doctor b (Δt = 7 days); patient 2 visits doctor b and then doctor a (Δt = 2 days); edge labels w(ab) = 17 and w(ba) = 8]
      • Goal: identify strong connections between each pair of physicians, in particular the outliers.
  16. Mutual Reference [figure: Top 20 and Top 50 networks for BA, DF, SP, PE and RJ, each annotated with its density; extracted density values: 0.809, 0.447, 0.805, 0.845, 0.913, 0.963, 0.834, 0.568, 0.802, 0.576]
  17. Mutual Reference [figure panels: Allergy, Ophthalmology]
  18. Mutual Reference: Conclusions and Insights
      • Claim data is rich enough to identify connections among physicians and how a partnership is formed.
      • Mutual Reference is an indicator of a relationship between physicians and can potentially feed other analyses, especially on a large volume of data.
      • The proposed metric makes it possible to analyze that relationship computationally on a frequent basis.
      • Specialties that appear most often:
        • Ophthalmology to ophthalmology
        • Gynecology and obstetrics to gynecology and obstetrics
      • DF has most of the consultations with irregular intervals
        • MDF010 and MDF009: 267 consultations, with an average interval of 0 days
      • Top pair:
        • 205 from MMS028 to MMS027
        • 196 from MMS027 to MMS028
      Physician A  Physician B  rm    Rank
      MMS028       MMS027       1     1
      MSP145       MSP144       0.31  10
  19. Patient Loyalty
  20. Patient Loyalty
      Goal: identify (and quantify) doctors that have recurring patients in a systematic way, suggesting ‘loyalty’
      1. Consider patients with many visits to doctors
      2. Compute the relative weight for each doctor visited
      3. Count the relative number of ‘loyal’ patients for that doctor
      [figure axes: Time, Consultations]
  21. Patient Loyalty
      • Weight wij: the number of visits of patient i to doctor j
      • Strength s: the sum of the weights attached to the links of a node (i.e., all visits from i)
      • Relative weight rw(ij): the fraction of weight wij over the total strength s
      [figure: São Paulo; labels: Strength s, Degree k, High rw, Low rw; value shown: 1.00]
  22. Patient Loyalty
      • The more patients with high rw and high s a doctor has, the more likely the doctor is a candidate to have ‘loyalty’ capacity
      • Stability: many doctors maintain sustained values of the metric across time.
        • A given doctor is in rank 1 or 2 during all 5 quarters.
        • 20% mean turnover across quarters
      • Top 5 specialties among physicians with higher loyalty (mf > 0.5)
        • Orthopedics and traumatology (5 in top 10)
        • Ophthalmology (3)
        • Gynecology and obstetrics (2)
        • Pediatrics (1)
      [figure: relative weight versus strength, two panels labeled Cardio]
      Loyalty
      Physician  mf    Rank
      MSP 139    1.54  175
      MSP 261    1.18  432
  23. Centrality
  24. Physician Centrality
      Goal: identify each physician's role in the network using their relative importance over other physicians.
      • We applied several centrality measures:
        • Eigenvector
        • Degree
        • Betweenness
        • Closeness
      • Do the values of these metrics change over time?
      • Is it seasonal?
      Centrality, 2Q 2014
      Physician  Eigen  Rank  Degree
      MSP 153    1      1     253
      MSP 139    0.55   8     335
      Conclusions and insights
      • Centrality indicates which physicians are important in the physician community
      • There is a set of physicians with high scores
      • This set of physicians has a higher number of patients in common, forming a block
      • The relative centrality has a positive correlation among close physicians
      • This group of physicians with high scores is stable over time, with few changes in each quarter.
  25. Summary & Take Home Messages
      • Networks are all about relationships, as most data is.
      • Network-derived insights are usually not reachable from other analyses.
      • Complex Networks methods are very valuable to data science.
      • Large healthcare claim database from a Brazilian insurance company.
      • Applied complex network methods to find how physicians build their network.
      • Examples: temporality, reciprocity and ‘loyalty’.
  26. Where to find more information: Introduction, Basic, Advanced
  27. Graph Analytics: Database, APIs, Visualization
  28. Thanks! apappel@br.ibm.com
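
The graph construction on slides 8 and 9 (collapse repeated claims into weighted doctor-patient edges, then link physicians who share a patient) can be sketched in a few lines of standard-library Python. This is only an illustration, not the pipeline used in the talk: the rows are the eight sample rows from the slide 8 table, the function names are hypothetical, and weighting a physician-physician link by the number of shared patients is an assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations

# (physician ID, patient ID) pairs copied from the sample table on slide 8.
claims = [
    ("SP45962", "1001"),
    ("SP45962", "1001"),
    ("SP47108", "1001"),
    ("SP47108", "1001"),
    ("SP45962", "1002"),
    ("SP45962", "1002"),
    ("SP45962", "1002"),
    ("SP59938", "1002"),
]


def weighted_bipartite_edges(rows):
    """Collapse repeated (physician, patient) claims into one weighted edge."""
    return Counter(rows)


def physician_projection(edge_weights):
    """Link two physicians once for every patient they share (assumed weighting)."""
    doctors_by_patient = defaultdict(set)
    for physician, patient in edge_weights:
        doctors_by_patient[patient].add(physician)
    shared = Counter()
    for doctors in doctors_by_patient.values():
        # Sort so that each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(doctors), 2):
            shared[(a, b)] += 1
    return shared


edges = weighted_bipartite_edges(claims)
phys_links = physician_projection(edges)
```

On the sample rows, `edges` holds four weighted doctor-patient links and `phys_links` holds two physician-physician links, one per shared patient.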

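The Mutual Reference idea on slide 15 (the same patient visits two doctors, and this happens in both directions) can be illustrated as follows. The slides do not give the exact definition of w(ab), Δt or the rm score, so this sketch makes its own assumptions: a reference a -> b is counted when a patient's next visit after doctor a is to doctor b, and Δt is reported as the mean gap in days. The visit data and function names are hypothetical, and rm is not computed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visits (patient, physician, date); not taken from the claim data.
visits = [
    ("p1", "a", date(2013, 4, 1)),
    ("p1", "b", date(2013, 4, 8)),
    ("p2", "b", date(2013, 5, 1)),
    ("p2", "a", date(2013, 5, 3)),
    ("p3", "a", date(2013, 6, 1)),
    ("p3", "c", date(2013, 6, 11)),
]


def directed_references(rows):
    """Map (a, b) to the gaps in days for each patient who saw a, then b next."""
    by_patient = defaultdict(list)
    for patient, physician, day in rows:
        by_patient[patient].append((day, physician))
    gaps = defaultdict(list)
    for history in by_patient.values():
        history.sort()  # chronological order per patient
        for (d1, doc1), (d2, doc2) in zip(history, history[1:]):
            if doc1 != doc2:
                gaps[(doc1, doc2)].append((d2 - d1).days)
    return gaps


def reciprocal_pairs(gaps):
    """Keep only physician pairs that are linked in both directions."""
    result = {}
    for (a, b), days in gaps.items():
        if a < b and (b, a) in gaps:
            back = gaps[(b, a)]
            result[(a, b)] = {
                "w_ab": len(days), "mean_dt_ab": sum(days) / len(days),
                "w_ba": len(back), "mean_dt_ba": sum(back) / len(back),
            }
    return result


gaps = directed_references(visits)
mutual = reciprocal_pairs(gaps)
```

Here only the pair (a, b) survives as a reciprocal link; the one-directional a -> c reference is dropped.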
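
The Patient Loyalty quantities on slides 20 and 21 follow directly from the definitions given there: wij is the number of visits of patient i to doctor j, the strength s of patient i is the sum of that patient's weights, and rw(ij) = wij / s. The sketch below computes them and then the share of each doctor's patients who look 'loyal'. The visit counts, the function names and both thresholds (`min_strength`, `min_rw`) are assumptions for illustration; the slides do not define the mf score, so it is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical visit counts w[(patient, doctor)]; not taken from the claim data.
w = Counter({
    ("p1", "d1"): 8, ("p1", "d2"): 2,
    ("p2", "d1"): 1, ("p2", "d2"): 1,
    ("p3", "d1"): 5,
    ("p4", "d2"): 3, ("p4", "d3"): 3,
})


def relative_weights(weights):
    """Return rw(ij) = w_ij / s_i and each patient's strength s_i."""
    strength = defaultdict(int)
    for (patient, _), count in weights.items():
        strength[patient] += count
    rw = {pair: count / strength[pair[0]] for pair, count in weights.items()}
    return rw, dict(strength)


def loyal_share(weights, min_strength=5, min_rw=0.7):
    """Share of each doctor's patients with many visits, mostly to that doctor."""
    rw, strength = relative_weights(weights)
    patients = Counter()
    loyal = Counter()
    for (patient, doctor), share in rw.items():
        patients[doctor] += 1
        # Assumed rule: a patient is 'loyal' to a doctor when both thresholds hold.
        if strength[patient] >= min_strength and share >= min_rw:
            loyal[doctor] += 1
    return {doctor: loyal[doctor] / patients[doctor] for doctor in patients}


rw, strength = relative_weights(w)
share = loyal_share(w)
```

With these numbers, two of the three patients of d1 pass both thresholds, while d2 and d3 have no loyal patients.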