Data Pipeline using: Spark, MapR Event Store for Kafka, and MapR Database
2 © 2018 MapR Technologies, Inc
Agenda
•  Overview of the Kafka API
•  Use Spark Structured Streaming to continuously:
    •  Read from a Kafka topic
    •  Transform
    •  Write to the MapR document database
•  Analyze using Spark SQL
3 © 2018 MapR Technologies, Inc
Use Case: Open Payment Dataset
•  Payments that drug and device companies make to physicians and teaching hospitals for travel, research, gifts, speaking fees, and meals
•  https://www.cms.gov/openpayments/
4 © 2018 MapR Technologies, Inc
InPatient Payment Dataset (IPPS)
•  Medicare payment data for most common inpatient diagnoses
•  Data shows dramatically different charges and payments
•  https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/medicare-provider-charge-data/inpatient.html
| Provider ID | Provider State | DRG | Total Discharges | Average Charges | Average Total Payments | Avg Medicare Payments |
| 1261770 | CA | Heart Transplant | 13 | 1,172,866 | 251,876 | 244,457 |
What	is	Streaming	Data?
6 © 2018 MapR Technologies, Inc
What	is	a	Stream	?	
•  A stream is a continuous sequence of events
•  Events are key-value pairs
7 © 2018 MapR Technologies, Inc
What	is	Streaming	Data?	
Fraud detection Smart Machinery Smart Meters Home Automation
Networks Manufacturing Security Systems Patient Monitoring
8 © 2018 MapR Technologies, Inc
Monitoring devices combined with ML can provide alerts for sepsis, which is one of the leading causes of death in hospitals
– http://www.computerweekly.com/news/450422258/Putting-sepsis-algorithms-into-electronic-patient-records
Examples	of	Streaming	Data
9 © 2018 MapR Technologies, Inc
A Stanford team has shown that a machine-learning model can identify arrhythmias from an EKG better than an expert
•  https://www.technologyreview.com/s/608234/the-machines-are-getting-ready-to-play-doctor/
Example	of	Streaming	Data	combined	with	Machine	Learning
10 © 2018 MapR Technologies, Inc
The ECG app on the Apple Watch can check heart rhythms and send a notification if an irregular heart rhythm is identified.
•  https://www.apple.com/newsroom/2018/12/ecg-app-and-irregular-heart-rhythm-notification-available-today-on-apple-watch/
Example	of	Streaming	Data	combined	with	Machine	Learning
11 © 2018 MapR Technologies, Inc
https://mapr.com/blog/ml-iot-connected-medical-devices/	
Applying	Machine	Learning	to	Live	Patient	Data
Kafka	API	and	Streaming	Data
13 © 2018 MapR Technologies, Inc
Intro	to	the	Kafka	API
15 © 2018 MapR Technologies, Inc
Organize Data into Topics with the MapR Event Store for Kafka
Topics:
•  Logical collection of events
•  Organize events into categories
[Diagram: consumers use the Kafka API to access the topics Pressure, Temperature, and Warnings in a MapR cluster]
16 © 2018 MapR Technologies, Inc
Scalable Messaging with MapR Event Streams
Topics are partitioned for throughput and scalability
Server 1
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Server 2
Partition2: Topic - Pressure
Partition2: Topic - Temperature
Partition2: Topic - Warning
Server 3
Partition3: Topic - Pressure
Partition3: Topic - Temperature
Partition3: Topic - Warning
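The slide shows each topic spread over three partitions on three servers but does not say how an event is routed to a partition. A minimal sketch, assuming key-hash routing (Apache Kafka's default partitioner hashes the key of a keyed record); `Partitioner`, `partitionFor`, and the use of `hashCode` are illustrative stand-ins, not the MapR or Kafka implementation:

```scala
object Partitioner {
  // Map an event key to one of `numPartitions` partitions.
  // floorMod keeps the result non-negative even when hashCode is negative.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)
}
```

Because the mapping is deterministic, all events with the same key land in the same partition, which is what preserves per-key ordering while the partitions are processed in parallel.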
17 © 2018 MapR Technologies, Inc
Partition is like an Event Log
New messages are added to the end
[Diagram: partition holding messages 6 5 4 3 2 1, from the newest message (6) to the oldest message (1)]
18 © 2018 MapR Technologies, Inc
Messages	are	delivered	in	the	order	they	are	received	
Partition	is	like	a	Queue
19 © 2018 MapR Technologies, Inc
Unlike a queue, Events are not deleted when Read
Events remain on the partition, available to other consumers
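The log and queue slides can be illustrated with a small in-memory model. This is a sketch only: `PartitionLog` and `LogConsumer` are hypothetical names, not part of the Kafka or MapR API. It shows new events appended at the end, reads that leave the log intact, and consumers that each track their own offset:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory stand-in for one topic partition.
final class PartitionLog[K, V] {
  private val events = ArrayBuffer.empty[(K, V)]

  // Append at the end of the log; the returned offset is the event's position.
  def append(key: K, value: V): Long = {
    events += ((key, value))
    events.size - 1L
  }

  // Read up to `max` events starting at `offset`; nothing is removed from the log.
  def read(offset: Long, max: Int): Seq[(K, V)] =
    events.slice(offset.toInt, offset.toInt + max).toSeq
}

// Each consumer keeps its own offset, so consumers do not affect each other.
final class LogConsumer[K, V](log: PartitionLog[K, V]) {
  private var offset = 0L

  def poll(max: Int): Seq[(K, V)] = {
    val batch = log.read(offset, max)
    offset += batch.size
    batch
  }
}
```

Two consumers created over the same log each see every event in append order, which is how the same message can be processed for different purposes.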
20 © 2018 MapR Technologies, Inc
When Are Messages Deleted?
•  Messages can be persisted forever, or
•  Older messages can be deleted automatically based on time to live
•  Not deleting messages minimizes disk reads/writes
[Diagram: partition 1 in a MapR cluster holding messages 6 5 4 3 2 1, with 1 the oldest message]
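The two retention choices on this slide (keep forever, or expire by time to live) can be sketched as a pure function. `StoredEvent`, `Retention`, and the millisecond fields are assumptions made for illustration; the real platform applies retention internally:

```scala
// Hypothetical event record; `timestampMs` is the time the event was appended.
final case class StoredEvent(offset: Long, timestampMs: Long, value: String)

object Retention {
  // Keep only events no older than the time to live.
  // ttlMs = None models "persist forever" and returns the log unchanged.
  def retain(log: Vector[StoredEvent], nowMs: Long, ttlMs: Option[Long]): Vector[StoredEvent] =
    ttlMs match {
      case None      => log
      case Some(ttl) => log.filter(e => nowMs - e.timestampMs <= ttl)
    }
}
```

Surviving events keep their original offsets, so a consumer's saved position remains valid after older events expire.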
21 © 2018 MapR Technologies, Inc
High Performance at Scale:
•  Partitioning for parallel operations
•  Not deleting messages minimizes disk reads/writes
22 © 2018 MapR Technologies, Inc
Processing	Same	Message	for	Different	Purposes
23 © 2018 MapR Technologies, Inc
Georgia	Health	Connect	
ALLOY Health:
Exchange State HIE
Clinical Data Viewer
Reporting and Analytics
Clinical Data
Financial Data
Provider
Organizations
What are the outcomes in the entire
state on diabetes?
Are there doctors that are doing this
better than others?
Georgia Health Connect
24 © 2018 MapR Technologies, Inc
Use	Case:	Streaming	System	of	Record	for	Healthcare
Spark	Structured	Streaming
26 © 2018 MapR Technologies, Inc
Spark Distributed Datasets
Dataset[T]
•  Distributed collection of objects
•  Partitioned across a cluster
•  Operated on in parallel
•  Can be cached in memory
27 © 2018 MapR Technologies, Inc
DataFrame = Dataset[Row] can use Spark SQL
A DataFrame is like a table
[Diagram: Dataset[Row] shown as a table of rows and columns]
28 © 2018 MapR Technologies, Inc
Dataset[Typed Object] can use Spark SQL and Functions
A Dataset is a distributed collection of objects
[Diagram: Dataset[objects] shown as partitioned objects with columns]
29 © 2018 MapR Technologies, Inc
Process	the	Data	with	Spark	Structured	Streaming
30 © 2018 MapR Technologies, Inc
Datasets Read from Stream
[Diagram: a driver and three tasks; each task reads offsets from one stream partition, then processes and caches the data]
Data is cached for aggregations and windowed functions
31 © 2018 MapR Technologies, Inc
Treat Stream as Unbounded Tables
Data stream as an unbounded table: new data in the data stream = new rows appended to an unbounded table
32 © 2018 MapR Technologies, Inc
The	Stream	is	continuously	processed
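One way to picture continuous processing over an unbounded table is a running aggregate that is updated as each new batch of rows arrives. The sketch below uses plain Scala collections rather than Spark; `UnboundedTable` and `update` are hypothetical names, and the `(specialty, amount)` row shape is an assumption borrowed from the payment data used later in the deck:

```scala
object UnboundedTable {
  // Fold one batch of newly appended (specialty, amount) rows into the running totals.
  // The previous totals play the role of the state kept between batches.
  def update(totals: Map[String, Double], batch: Seq[(String, Double)]): Map[String, Double] =
    batch.foldLeft(totals) { case (acc, (key, amount)) =>
      acc.updated(key, acc.getOrElse(key, 0.0) + amount)
    }
}
```

Calling `update` once per batch gives the same totals as re-running the aggregation over all rows seen so far, without re-reading the earlier rows.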
33 © 2018 MapR Technologies, Inc
Spark	automatically		streamifies	SQL	plans	
Image	reference	Databricks
34 © 2018 MapR Technologies, Inc
Stream	Processing
35 © 2018 MapR Technologies, Inc
Use	Case:	Payment	Data	
"NEW","Covered Recipient Physician",,,,"132655","GREGG","D","ALZATE",,"8745
AERO DRIVE","STE 200","SAN DIEGO","CA","92123","United States",,,"Medical
Doctor","Allopathic & Osteopathic Physicians|Radiology|Diagnostic
Radiology","CA",,,,,"DFINE, Inc","100000000326","DFINE, Inc","CA","United States",
90.87,"02/12/2016","1","In-kind items and services","Food and Beverage",,,,"No","No
Third Party
Payment",,,,,"No","346039438","No","Yes","Covered","Device","Radiology","StabiliT",
,"Covered","Device","Radiology","STAR Tumor Ablation
System",,,,,,,,,,,,,,,,,"2016","06/30/2017"
{
"_id":"317150_08/26/2016_346122858",
"physician_id":"317150",
"date_payment":"08/26/2016",
"record_id":"346122858",
"payer":"Mission Pharmacal Company",
"amount":9.23,
"Physician_Specialty":"Obstetrics & Gynecology",
"Nature_of_payment":"Food and Beverage"
}
36 © 2018 MapR Technologies, Inc
Scenario: Payment Data
| Provider ID | Date | Payer | Payer State | Provider Specialty | Provider State | Amount | Payment Nature |
| 1261770 | 01/11/2016 | Southern Anesthesia & Surgical, Inc | CO | Oral and Maxillofacial Surgery | CA | 117.5 | Food and Beverage |
37 © 2018 MapR Technologies, Inc
Streaming pipeline: Kafka data source
// Read the payments topic as a streaming DataFrame
val df1 = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "maprdemo:9092")
  .option("subscribe", "/apps/stream:payments")
  .option("startingOffsets", "earliest")
  .option("failOnDataLoss", false)
  .option("maxOffsetsPerTrigger", 1000)
  .load()
38 © 2018 MapR Technologies, Inc
Kafka DataFrame schema
df1.printSchema()
root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)
39 © 2018 MapR Technologies, Inc
Function to Parse CSV data to Payment Object
case class Payment(_id: String, physician_id: String,
  date_payment: String, payer: String, payer_state: String,
  amount: Double, physician_type: String, physician_specialty: String,
  physician_state: String, nature_of_payment: String)

def parsePayment(str: String): Payment = {
  // Split on commas that are outside double-quoted fields.
  // A triple-quoted string lets the regex contain unescaped double quotes.
  val td = str.split(""",(?=([^"]*"[^"]*")*[^"]*$)""")
  val physician_id = td(5)
  val amount = td(30).toDouble
  // . . . (extraction of the remaining fields is elided on the slide)
  Payment(id, physician_id, date_payment, payer, payer_state, amount,
    physician_type, focus, physician_state, nature_of_payment)
}
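The split pattern used in `parsePayment` is easier to trust once it is seen on a small line. A minimal sketch in plain Scala: `CsvSplit` and `fields` are hypothetical helpers, and unlike the slide this version passes `-1` to `split` so trailing empty fields are kept, and strips the surrounding quotes:

```scala
object CsvSplit {
  // Match a comma only when an even number of double quotes follows it,
  // i.e. when the comma is outside a quoted field.
  private val Delimiter = """,(?=([^"]*"[^"]*")*[^"]*$)"""

  // Split a CSV line into fields; the -1 limit keeps trailing empty fields.
  def fields(line: String): Array[String] =
    line.split(Delimiter, -1).map(_.trim.stripPrefix("\"").stripSuffix("\""))
}
```

For a line such as `"NEW","DFINE, Inc",90.87,,"CA"`, the comma inside `"DFINE, Inc"` is followed by an odd number of quotes and is therefore not treated as a delimiter, so the payer name stays in one field.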
40 © 2018 MapR Technologies, Inc
Parse message text to Payment Object
// Register a user-defined function (UDF) to deserialize the message
spark.udf.register("deserialize",
  (message: String) => parsePayment(message))
// Use the UDF in a select expression
val df2 = df1
  .selectExpr("deserialize(CAST(value AS STRING)) AS message")
  .select($"message".as[Payment])
41 © 2018 MapR Technologies, Inc
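Writing to a Memory Sink
Write results to memory, then start running the query
// The memory sink registers an in-memory table named after the query,
// so the name matches the payments table queried on the following slides.
val query = cdf
  .writeStream
  .queryName("payments")
  .format("memory")
  .outputMode("append")

query.start().awaitTermination()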
42 © 2018 MapR Technologies, Inc
Streaming Application
%sql select * from payments limit 4
+--------------------+------------+------------+--------------------+-----------+--------+-------------------+--------------------+---------------+------------------+
| _id|physician_id|date_payment| payer|payer_state| amount| physician_type| physician_specialty|physician_state| nature_of_payment|
+--------------------+------------+------------+--------------------+-----------+--------+-------------------+--------------------+---------------+------------------+
|CA_Diagnostic Rad...| 132655| 02/12/2016| DFINE, Inc| CA| 90.87| Medical Doctor|Diagnostic Radiology| CA| Food and Beverage|
|AZ_Vascular & Int...| 1006832| 01/26/2016| DFINE, Inc| CA| 25.0| Medical Doctor|Vascular & Interv...| AZ|Travel and Lodging|
|AZ_Vascular & Int...| 1006832| 02/13/2016| DFINE, Inc| CA| 32.0| Medical Doctor|Vascular & Interv...| AZ|Travel and Lodging|
|AZ_Vascular & Int...| 1006832| 02/19/2016| DFINE, Inc| CA| 27.27| Medical Doctor|Vascular & Interv...| AZ|Travel and Lodging|
43 © 2018 MapR Technologies, Inc
Streaming Application
%sql select * from payments limit 3
44 © 2018 MapR Technologies, Inc
Streaming Application
%sql select physician_specialty, count(*) as cnt from payment2
     group by physician_specialty order by cnt desc
Spark		&	MapR-DB
46 © 2018 MapR Technologies, Inc
MapR-DB Connector for Apache Spark 	
Spark	Streaming	writing	to	MapR-DB	JSON
47 © 2018 MapR Technologies, Inc
Spark	MapR-DB	Connector
48 © 2018 MapR Technologies, Inc
Designed	for	Partitioning	and	Scaling
49 © 2018 MapR Technologies, Inc
MapR-DB	JSON	Document	Store	
Data is automatically partitioned
and sorted by _id row key!
{
"_id":"AK_08/26/2016_346122858",
"physician_id":"317150",
"date_payment":"08/26/2016",
"payer":"Mission Pharmacal Company",
"payer_state":"CO",
"amount":9.23,
"physician_specialty":"Gynecology",
"physician_state":"AK",
"nature_of_payment":"Food and Beverage"
}
50 © 2018 MapR Technologies, Inc
MapR	Database	shell	
maprdb mapr:> find /user/mapr/paytable --limit 2
{"_id":"AK_01/03/2016_1309545_391702408", "amount":94.21,
"date_payment":"01/03/2016", "nature_of_payment":"Food and Beverage",
"payer":"Stryker Corporation", "payer_state":"MI","physician_id":"1309545",
"physician_specialty":"Specialist", "physician_state":"AK",
"physician_type":"Medical Doctor"}
{"_id":"AK_01/04/2016_100801_396460394","amount":11.59,
"date_payment":"01/04/2016", "nature_of_payment":"Food and Beverage",
"payer":"AbbVie, Inc.", "payer_state":"IL", "physician_id":"100801",
"physician_specialty":"Family Medicine", "physician_state":"AK",
"physician_type":"Medical Doctor"}
2 document(s) found.	
Data is automatically partitioned
and sorted by _id row key!
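The `_id` values in the shell output appear to be composed as physician state, payment date, physician id, and record id joined with underscores. Assuming that layout, the sketch below shows why a sorted row key matters: documents that share a key prefix are stored next to each other, so a prefix lookup becomes a range scan. `RowKey`, `paymentId`, and `prefixScan` are hypothetical helpers operating on an in-memory sorted list, not the MapR Database API:

```scala
object RowKey {
  // Compose the row key in the order seen in the sample documents:
  // physician state, payment date, physician id, record id.
  def paymentId(state: String, date: String, physicianId: String, recordId: String): String =
    Seq(state, date, physicianId, recordId).mkString("_")

  // On a sorted key list, keys sharing a prefix are contiguous:
  // skip everything that sorts before the prefix, then take the matching run.
  def prefixScan(sortedIds: Vector[String], prefix: String): Vector[String] =
    sortedIds.dropWhile(_ < prefix).takeWhile(_.startsWith(prefix))
}
```

Leading the key with the state means a query such as "all payments to physicians in AK" only touches the contiguous `AK_` range rather than every document.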
51 © 2018 MapR Technologies, Inc
Writing to a MapR-DB Sink
Write streaming DataFrame query results to MapR-DB, then start running the query
val query = cdf.writeStream
  .format(MapRDBSourceConfig.Format)
  .option(MapRDBSourceConfig.TablePathOption, tableName)
  .option(MapRDBSourceConfig.IdFieldPathOption, "_id")
  .option(MapRDBSourceConfig.CreateTableOption, false)
  .option("checkpointLocation", "/user/mapr/check")
  .option(MapRDBSourceConfig.BulkModeOption, true)
  .option(MapRDBSourceConfig.SampleSizeOption, 1000)

query.start().awaitTermination()
52 © 2018 MapR Technologies, Inc
Streaming	Dataset	Read	from	Topic	partitions,	Transformed,	
Written	to	MapR	Database	partitions
53 © 2018 MapR Technologies, Inc
Store	in	MapR	Database
Explore	the	Data	With	Spark	SQL
55 © 2018 MapR Technologies, Inc
•  Spark SQL queries and updates to MapR-DB
•  With projection and filter pushdown, custom partitioning, and data locality
	
Spark	SQL	Querying	MapR-DB	JSON
56 © 2018 MapR Technologies, Inc
Spark Distributed Datasets read from MapR-DB Partitions
val pdf: Dataset[Payment] = spark
  .loadFromMapRDB[Payment](tableName, schema)
  .as[Payment]
[Diagram: a driver sends tasks to workers; each task processes and caches the data it reads from a MapR-DB partition (Cache 1, Cache 2, Cache 3)]
57 © 2018 MapR Technologies, Inc
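Load the data into a DataFrame
// Register the Dataset as a temporary view so it can be queried with SQL;
// the name matches the payments table used in the %sql queries.
pdf.createOrReplaceTempView("payments")
pdf.select("_id", "payer", "amount").show
Data is automatically partitioned and sorted by _id row key!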
58 © 2018 MapR Technologies, Inc
Now We Can Ask Questions Like:
•  Which physician specialties, or physician states, are getting the most payments?
    •  Count, total amount $, average amount $, standard deviation
•  Which companies, or company states, are making the most payments?
    •  Count, total amount $, average amount $, standard deviation
•  What are the Top Nature of Payments by count, amount?
•  What are the payments over $2,000,000?
| Physician ID | Date | Payer | Payer State | Physician Specialty | Physician State | Amount | Payment Nature |
| 1261770 | 01/11/2016 | Southern Anesthesia & Surgical, Inc | CO | Oral and Maxillofacial Surgery | CA | 117.5 | Food and Beverage |
59 © 2018 MapR Technologies, Inc
Language Integrated Queries
| Language Integrated Query | Description |
| agg(expr, exprs) | Aggregates on entire DataFrame |
| distinct | Returns new DataFrame with unique rows |
| except(other) | Returns new DataFrame with rows from this DataFrame not in other DataFrame |
| filter(expr); where(condition) | Filter based on the SQL expression or condition |
| groupBy(cols: Columns) | Groups DataFrame using specified columns |
| join(DataFrame, joinExpr) | Joins with another DataFrame using given join expression |
| sort(sortcol) | Returns new DataFrame sorted by specified column |
| select(col) | Selects set of columns |
60 © 2018 MapR Technologies, Inc
MapR	Data	Platform
61 © 2018 MapR Technologies, Inc
Top	5	Nature	of	Payment	by	count	
// desc comes from org.apache.spark.sql.functions
pdf.groupBy("Nature_of_Payment")
  .count()
  .orderBy(desc("count"))
  .show(5)
62 © 2018 MapR Technologies, Inc
Top		5	Nature	of	Payment	by	amount	of	payment	
%sql select Nature_of_payment,
sum(amount) as total from payments
group by Nature_of_payment order by total desc limit 5
63 © 2018 MapR Technologies, Inc
Top	5	Physician	Specialties	by	total	Amount	
%sql select physician_specialty, sum(amount) as total
from payments where physician_specialty IS NOT NULL
group by physician_specialty order by total desc limit 5
64 © 2018 MapR Technologies, Inc
Average	Payment	by	Specialty
65 © 2018 MapR Technologies, Inc
What	are	the	Nature	of	Payments	with	payments	>	$1000	?	
pdf.filter($"amount" > 1000)
.groupBy("Nature_of_payment")
.count().orderBy(desc("count")).show()
66 © 2018 MapR Technologies, Inc
Top	Payers	by	Total	Amount	with	count	
%sql select payer, payer_state, count(*) as cnt,
sum(amount) as total from payments
group by payer, payer_state order by total desc limit 10
67 © 2018 MapR Technologies, Inc
Scenario: InPatient Payment Data
| Provider ID | Provider State | DRG | Total Discharges | Average Charges | Average Total Payments | Avg Medicare Payments |
| 1261770 | CA | Heart Transplant | 13 | 1,172,866 | 251,876 | 244,457 |
68 © 2018 MapR Technologies, Inc
MapR	Database	shell	
maprdb mapr:> find /user/mapr/ippstable --limit 2
{"_id":"001_2014_010033", "avg_covered_charges":1172866.385, "avg_medicare_payments":
244457.9231, "avg_total_payments":251876.3077, "drg_code":"001", "drg_definition":"HEART
TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC", "provider_address":"619
SOUTH 19TH STREET", "provider_city":"BIRMINGHAM", "provider_id":"010033",
"provider_name":"UNIVERSITY OF ALABAMA HOSPITAL", "provider_region":"AL -
Birmingham", "provider_state":"AL", "provider_zip":"35233", "total_discharges":13,
"year":"2014"}
{"_id":"001_2014_030103", "avg_covered_charges":437531.3, "avg_medicare_payments":
133509.55, "avg_total_payments":240422.8, "drg_code":"001", "drg_definition":"HEART
TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC",
"provider_address":"5777 EAST MAYO BOULEVARD", "provider_city":"PHOENIX",
"provider_id":"030103", "provider_name":"MAYO CLINIC HOSPITAL", "provider_region":"AZ -
Phoenix", "provider_state":"AZ", "provider_zip":"85054", "total_discharges":20, "year":"2014"}
69 © 2018 MapR Technologies, Inc
Now We Can Ask Questions Like:
•  What are the top charges and payments by hospital?
    •  By hospital state?
•  What are the top charges and payments by diagnosis?
•  What is the most common diagnosis?
•  What are the statistics for payments for the most common diagnosis (sepsis)?
•  What are the most expensive states for the most common diagnosis (sepsis)?
| Provider ID | Provider State | DRG | Total Discharges | Average Charges | Average Total Payments | Avg Medicare Payments |
| 1261770 | CA | Heart Transplant | 13 | 1,172,866 | 251,876 | 244,457 |
70 © 2018 MapR Technologies, Inc
InPatient	Prospective	Payment	System	payments
71 © 2018 MapR Technologies, Inc
To download the code: Streaming data pipeline blogs
•  https://mapr.com/blog/etl-pipeline-healthcare-dataset-with-spark-json-mapr-db/
•  https://mapr.com/blog/streaming-data-pipeline-transform-store-explore-healthcare-dataset-mapr-db/
72 © 2018 MapR Technologies, Inc
http://mapr.com/ebooks/	
New	Spark	Ebook
73 © 2018 MapR Technologies, Inc
Read	about	Streaming	Architectures	
mapr.com/ebooks
74 © 2018 MapR Technologies, Inc
Read	about	Machine	Learning	Logistics	
mapr.com/ebooks
75 © 2018 MapR Technologies, Inc
MapR	Free	ODT			http://learn.mapr.com/	
To	Learn	More:	New	Spark	2.0	training
76 © 2018 MapR Technologies, Inc
https://mapr.com/blog/	
MapR	Blog
77 © 2018 MapR Technologies, Inc
To	Learn	More:	
•  https://mapr.com/blog/ml-iot-connected-medical-devices/
78 © 2018 MapR Technologies, Inc
To Learn More:
•  https://mapr.com/blog/how-stream-first-architecture-patterns-are-revolutionizing-healthcare-platforms/
79 © 2018 MapR Technologies, Inc
Applying Machine Learning to Live Patient Data
•  https://www.slideshare.net/caroljmcdonald/applying-machine-learning-to-live-patient-data

Más contenido relacionado

La actualidad más candente

Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Carol McDonald
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUsCarol McDonald
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningCarol McDonald
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital TransformationMapR Technologies
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR Technologies
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationMapR Technologies
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes StrategicMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 

La actualidad más candente (19)

Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital Transformation
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Enterprise Data Lakes
Enterprise Data LakesEnterprise Data Lakes
Enterprise Data Lakes
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital Transformation
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 

Similar a Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with MapR Database

How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016Nitin Kumar
 
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...VMware Tanzu
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware AccelerationHTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware AccelerationEDB
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowaleCapgemini
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
Powering the "As it Happens" Business
Powering the "As it Happens" BusinessPowering the "As it Happens" Business
Powering the "As it Happens" BusinessMapR Technologies
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
In-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConfIn-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConfNazarii Cherkas
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonSynerzip
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR Technologies
 

Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with MapR Database

  • 2. © 2018 MapR Technologies, Inc. Agenda
      • Overview of the Kafka API
      • Use Spark Structured Streaming to continuously:
          • Read from a Kafka topic
          • Transform
          • Write to the MapR document database
      • Analyze using Spark SQL
  • 3. Use Case: Open Payment Dataset
      • Payments that drug and device companies make to physicians and teaching hospitals for travel, research, gifts, speaking fees, and meals
      • https://www.cms.gov/openpayments/
  • 4. InPatient Payment Dataset (IPPS)
      • Medicare payment data for the most common inpatient diagnoses
      • The data shows dramatically different charges and payments
      • https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/medicare-provider-charge-data/inpatient.html
      Provider ID | Provider State | DRG | Total Discharges | Average Charges | Average Total Payments | Avg Medicare Payments
      1261770 | CA | Heart Transplant | 13 | 1,172,866 | 251,876 | 244,457
  • 6. What is a Stream?
      • A stream is a continuous sequence of events
      • Events are key-value pairs
  • 7. What is Streaming Data? Fraud detection, smart machinery, smart meters, home automation, networks, manufacturing, security systems, patient monitoring
  • 8. Examples of Streaming Data: Monitoring devices combined with ML can provide alerts for sepsis, which is one of the leading causes of death in hospitals
      • http://www.computerweekly.com/news/450422258/Putting-sepsis-algorithms-into-electronic-patient-records
  • 9. Example of Streaming Data combined with Machine Learning: A Stanford team has shown that a machine-learning model can identify arrhythmias from an EKG better than an expert
      • https://www.technologyreview.com/s/608234/the-machines-are-getting-ready-to-play-doctor/
  • 10. Example of Streaming Data combined with Machine Learning: The ECG app on the Apple Watch can check heart rhythms and send a notification if an irregular heart rhythm is identified
      • https://www.apple.com/newsroom/2018/12/ecg-app-and-irregular-heart-rhythm-notification-available-today-on-apple-watch/
  • 11. Applying Machine Learning to Live Patient Data
      • https://mapr.com/blog/ml-iot-connected-medical-devices/
  • 13. Intro to the Kafka API
  • 14. Organize Data into Topics with the MapR Event Store for Kafka
      • Topics: logical collections of events
      • Topics organize events into categories
  • 15. Organize Data into Topics with the MapR Event Store for Kafka
      • Topics: logical collections of events
      • Topics organize events into categories
      Diagram labels: MapR Cluster; Topic: Pressure; Topic: Temperature; Topic: Warnings; Consumers; Kafka API
  • 16. Scalable Messaging with MapR Event Streams: topics are partitioned for throughput and scalability
      Server 1: Partition 1 of topics Pressure, Temperature, and Warning
      Server 2: Partition 2 of topics Pressure, Temperature, and Warning
      Server 3: Partition 3 of topics Pressure, Temperature, and Warning
  • 17. Partition is like an Event Log: new messages are added to the end
      Diagram: messages numbered 6 (new message) down to 1 (old message)
  • 18. Partition is like a Queue: messages are delivered in the order they are received
  • 19. Unlike a queue, events are not deleted when read: events remain on the partition, available to other consumers
  • 20. When Are Messages Deleted?
      • Messages can be persisted forever, or older messages can be deleted automatically based on time to live
      • Not deleting messages when they are read minimizes disk reads and writes
      Diagram: MapR Cluster, Partition 1 holding messages 6 down to 1 (older message)
  • 21. High Performance at Scale
      • Partitioning for parallel operations
      • Not deleting messages when they are read minimizes disk reads and writes
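
The partition behaviour described in slides 17 to 20 can be modelled in a few lines of plain Scala. This is only a conceptual sketch, not the Kafka or MapR Event Store API: `Event`, `PartitionLog`, and `LogConsumer` are hypothetical names, and the sensor values are made up. It shows that appends go to the end of the log, that reading does not remove events, and that each consumer keeps its own offset.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory model of one topic partition (not the Kafka API).
final case class Event(key: String, value: String)

final class PartitionLog {
  private val events = ArrayBuffer.empty[Event]

  // New messages are always appended to the end; the offset is the position in the log.
  def append(e: Event): Long = {
    events += e
    events.size - 1L
  }

  // Reading never deletes: return up to `max` events starting at `offset`.
  def read(offset: Long, max: Int): Seq[Event] =
    events.slice(offset.toInt, offset.toInt + max).toSeq

  def size: Int = events.size
}

// Each consumer tracks its own offset, so several consumers can read the same events.
final class LogConsumer(log: PartitionLog) {
  private var offset = 0L

  def poll(max: Int): Seq[Event] = {
    val batch = log.read(offset, max)
    offset += batch.size
    batch
  }
}

val pressure = new PartitionLog
pressure.append(Event("sensor-1", "101.3"))
pressure.append(Event("sensor-2", "99.8"))

val alerting = new LogConsumer(pressure)
val reporting = new LogConsumer(pressure)

val seenByAlerting = alerting.poll(10)   // both events, in arrival order
val seenByReporting = reporting.poll(10) // the same two events again
```

Because the log itself is untouched by `poll`, a second consumer added later would still start from offset 0 and see every retained event.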
  • 22. Processing Same Message for Different Purposes
  • 23. Georgia Health Connect
      Diagram labels: ALLOY Health: Exchange; State HIE; Clinical Data Viewer; Reporting and Analytics; Clinical Data; Financial Data; Provider Organizations
      • What are the outcomes in the entire state on diabetes?
      • Are there doctors that are doing this better than others?
  • 24. Use Case: Streaming System of Record for Healthcare
  • 26. Spark Distributed Datasets
      • Distributed collection of objects: Dataset[T]
      • Partitioned across a cluster
      • Operated on in parallel
      • Can be cached in memory
  • 27. DataFrame = Dataset[Row]
      • A DataFrame is like a table with rows and columns
      • Can use Spark SQL
  • 28. Dataset[Typed Object]
      • A Dataset is a distributed, partitioned collection of objects
      • Can use Spark SQL and functions
  • 29. Process the Data with Spark Structured Streaming
  • 30. Datasets Read from Stream
      Diagram labels: Driver; Stream partition (three, with offsets); Task (three); Cache; Process & Cache Data
      • Data is cached for aggregations and windowed functions
  • 31. Treat Stream as Unbounded Tables
      • Data stream as an unbounded table
      • New data in the data stream = new rows appended to an unbounded table
  • 32. The Stream is continuously processed
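
The "unbounded table" idea can be illustrated without Spark. The following is a Spark-free sketch under stated assumptions: `PaymentRow` and `updateTotals` are hypothetical names and the amounts are made up. Each batch of newly arrived rows updates a running result, which is conceptually what a continuously processed aggregation does as rows are appended.

```scala
// Hypothetical, Spark-free model of a query over an unbounded table.
final case class PaymentRow(specialty: String, amount: Double)

// The result (total amount per specialty) is updated from each batch of new rows,
// without rescanning rows that arrived earlier.
def updateTotals(totals: Map[String, Double], newRows: Seq[PaymentRow]): Map[String, Double] =
  newRows.foldLeft(totals) { (acc, row) =>
    acc.updated(row.specialty, acc.getOrElse(row.specialty, 0.0) + row.amount)
  }

val batch1 = Seq(PaymentRow("Radiology", 90.0), PaymentRow("Gynecology", 9.5))
val batch2 = Seq(PaymentRow("Radiology", 25.0))

val afterBatch1 = updateTotals(Map.empty, batch1) // result after the first rows arrive
val afterBatch2 = updateTotals(afterBatch1, batch2) // result after more rows are appended
```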
  • 33. Spark automatically streamifies SQL plans (image reference: Databricks)
  • 34. Stream Processing
  • 35. Use Case: Payment Data
      CSV record:
      "NEW","Covered Recipient Physician",,,,"132655","GREGG","D","ALZATE",,"8745 AERO DRIVE","STE 200","SAN DIEGO","CA","92123","United States",,,"Medical Doctor","Allopathic & Osteopathic Physicians|Radiology|Diagnostic Radiology","CA",,,,,"DFINE, Inc","100000000326","DFINE, Inc","CA","United States",90.87,"02/12/2016","1","In-kind items and services","Food and Beverage",,,,"No","No Third Party Payment",,,,,"No","346039438","No","Yes","Covered","Device","Radiology","StabiliT",,"Covered","Device","Radiology","STAR Tumor Ablation System",,,,,,,,,,,,,,,,,"2016","06/30/2017"
      JSON document:
      { "_id":"317150_08/26/2016_346122858", "physician_id":"317150", "date_payment":"08/26/2016", "record_id":"346122858", "payer":"Mission Pharmacal Company", "amount":9.23, "Physician_Specialty":"Obstetrics & Gynecology", "Nature_of_payment":"Food and Beverage" }
  • 36. Scenario: Payment Data
      Provider ID | Date | Payer | Payer State | Provider Specialty | Provider State | Amount | Payment Nature
      1261770 | 01/11/2016 | Southern Anesthesia & Surgical, Inc | CO | Oral and Maxillofacial Surgery | CA | 117.5 | Food and Beverage
  • 37. Streaming pipeline: Kafka data source
      val df1 = spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "maprdemo:9092")
        .option("subscribe", "/apps/stream:payments")
        .option("startingOffsets", "earliest")
        .option("failOnDataLoss", false)
        .option("maxOffsetsPerTrigger", 1000)
        .load()
  • 38. Kafka DataFrame schema
      df1.printSchema()
      root
       |-- key: binary (nullable = true)
       |-- value: binary (nullable = true)
       |-- topic: string (nullable = true)
       |-- partition: integer (nullable = true)
       |-- offset: long (nullable = true)
       |-- timestamp: timestamp (nullable = true)
       |-- timestampType: integer (nullable = true)
  • 39. Function to Parse CSV data to Payment Object
      case class Payment(_id: String, physician_id: String, date_payment: String, payer: String,
        payer_state: String, amount: Double, physician_type: String, physician_specialty: String,
        physician_state: String, nature_of_payment: String)

      def parsePayment(str: String): Payment = {
        // split on commas that are outside double-quoted fields
        val td = str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")
        val physician_id = td(5)
        val amount = td(30).toDouble
        . . .
        Payment(id, physician_id, date_payment, payer, payer_state, amount, physician_type,
          focus, physician_state, nature_of_payment)
      }
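
The regular expression in `parsePayment` splits only on commas that sit outside quoted fields, which matters because payer names such as "DFINE, Inc" contain commas. A minimal sketch of that behaviour follows, assuming a shortened illustrative row rather than a full Open Payments record; `csvSplit`, `splitCsvLine`, and `sample` are hypothetical names introduced here.

```scala
// Split on commas that are followed by an even number of double quotes,
// i.e. commas that are outside quoted fields.
val csvSplit = """,(?=([^"]*"[^"]*")*[^"]*$)"""

def splitCsvLine(line: String): Array[String] =
  line
    .split(csvSplit, -1) // -1 keeps trailing empty fields
    .map(_.trim.stripPrefix("\"").stripSuffix("\""))

// Shortened, illustrative row in the style of the Open Payments CSV.
val sample = "\"NEW\",\"DFINE, Inc\",90.87,,\"Food and Beverage\","
val fields = splitCsvLine(sample)
// The comma inside "DFINE, Inc" does not start a new field.
```

The `-1` limit is a design choice for this sketch: without it, `String.split` silently drops trailing empty fields, which would shift nothing in the positions used on the slide (5 and 30) but would change the array length for rows that end in empty columns.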
  • 40. Parse message text to Payment Object
      // register a user-defined function (UDF) to deserialize the message
      spark.udf.register("deserialize", (message: String) => parsePayment(message))
      // use the UDF in a select expression
      val df2 = df1.selectExpr("""deserialize(CAST(value as STRING)) AS message""")
        .select($"message".as[Payment])
  • 41. Writing to a Memory Sink: write results to memory and start running the query
      val query = cdf.writeStream
        .queryName("payments") // the query name becomes the in-memory table name used by the SQL queries
        .format("memory")
        .outputMode("append")
      query.start().awaitTermination()
  • 42. Streaming Application: %sql select * from payments limit 4
      _id | physician_id | date_payment | payer | payer_state | amount | physician_type | physician_specialty | physician_state | nature_of_payment
      CA_Diagnostic Rad... | 132655 | 02/12/2016 | DFINE, Inc | CA | 90.87 | Medical Doctor | Diagnostic Radiology | CA | Food and Beverage
      AZ_Vascular & Int... | 1006832 | 01/26/2016 | DFINE, Inc | CA | 25.0 | Medical Doctor | Vascular & Interv... | AZ | Travel and Lodging
      AZ_Vascular & Int... | 1006832 | 02/13/2016 | DFINE, Inc | CA | 32.0 | Medical Doctor | Vascular & Interv... | AZ | Travel and Lodging
      AZ_Vascular & Int... | 1006832 | 02/19/2016 | DFINE, Inc | CA | 27.27 | Medical Doctor | Vascular & Interv... | AZ | Travel and Lodging
  • 43. Streaming Application: %sql select * from payments limit 3
  • 44. Streaming Application: %sql select physician_specialty, count(*) as cnt from payment2 group by physician_specialty order by cnt desc
  • 46. Spark Streaming writing to MapR-DB JSON: MapR-DB Connector for Apache Spark
  • 47. Spark MapR-DB Connector
  • 48. Designed for Partitioning and Scaling
  • 49. MapR-DB JSON Document Store: data is automatically partitioned and sorted by the _id row key
      { "_id":"AK_08/26/2016_346122858", "physician_id":"317150", "date_payment":"08/26/2016", "payer":"Mission Pharmacal Company", "payer_state":"CO", "amount":9.23, "physician_specialty":"Gynecology", "physician_state":"AK", "nature_of_payment":"Food and Beverage" }
  • 50. MapR Database shell: data is automatically partitioned and sorted by the _id row key
      maprdb mapr:> find /user/mapr/paytable --limit 2
      {"_id":"AK_01/03/2016_1309545_391702408", "amount":94.21, "date_payment":"01/03/2016", "nature_of_payment":"Food and Beverage", "payer":"Stryker Corporation", "payer_state":"MI", "physician_id":"1309545", "physician_specialty":"Specialist", "physician_state":"AK", "physician_type":"Medical Doctor"}
      {"_id":"AK_01/04/2016_100801_396460394", "amount":11.59, "date_payment":"01/04/2016", "nature_of_payment":"Food and Beverage", "payer":"AbbVie, Inc.", "payer_state":"IL", "physician_id":"100801", "physician_specialty":"Family Medicine", "physician_state":"AK", "physician_type":"Medical Doctor"}
      2 document(s) found.
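
The _id values in the shell output appear to follow a state_date_physician_record layout. As a rough sketch of why that layout matters when documents are sorted by _id, the following plain Scala builds such keys and sorts them. `PaymentKey` is a hypothetical helper, not part of the MapR-DB connector, and the in-memory sort only imitates the ordering the database maintains.

```scala
// Hypothetical helper: builds an _id in the state_date_physician_record layout
// seen in the shell output.
final case class PaymentKey(state: String, date: String, physicianId: String, recordId: String) {
  def id: String = s"${state}_${date}_${physicianId}_${recordId}"
}

val ids = Seq(
  PaymentKey("CA", "02/12/2016", "132655", "346039438"),
  PaymentKey("AK", "01/04/2016", "100801", "396460394"),
  PaymentKey("AK", "01/03/2016", "1309545", "391702408")
).map(_.id)

// Sorting by _id keeps each state's documents next to each other,
// so a prefix such as "AK_" selects one contiguous range.
val sorted = ids.sorted
val alaska = sorted.filter(_.startsWith("AK_"))
```

One caveat with this key design: dates written as MM/DD/YYYY sort chronologically only within a single year, so a key meant to span several years would need the year first.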
  • 51. Writing to a MapR-DB Sink: write streaming DataFrame query results to MapR-DB and start running the query
      val query = cdf.writeStream
        .format(MapRDBSourceConfig.Format)
        .option(MapRDBSourceConfig.TablePathOption, tableName)
        .option(MapRDBSourceConfig.IdFieldPathOption, "_id")
        .option(MapRDBSourceConfig.CreateTableOption, false)
        .option("checkpointLocation", "/user/mapr/check")
        .option(MapRDBSourceConfig.BulkModeOption, true)
        .option(MapRDBSourceConfig.SampleSizeOption, 1000)
      query.start().awaitTermination()
  • 52. Streaming Dataset: read from topic partitions, transformed, written to MapR Database partitions
  • 53. Store in MapR Database
  • 55. Spark SQL Querying MapR-DB JSON
      • Spark SQL queries and updates to MapR-DB
      • With projection and filter pushdown, custom partitioning, and data locality
  • 56. Spark Distributed Datasets read from MapR-DB Partitions
      val pdf: Dataset[Payment] = spark
        .loadFromMapRDB[Payment](tableName, schema)
        .as[Payment]
      Diagram labels: Driver; Worker; Task; Cache 1, Cache 2, Cache 3; Process & Cache Data
  • 57. Load the data into a DataFrame (data is automatically partitioned and sorted by the _id row key)
      pdf.createOrReplaceTempView("payment")
      pdf.select("_id", "payer", "amount").show
  • 58. Now We Can Ask Questions Like:
      • Which physician specialties, or physician states, are getting the most payments? (count, total amount $, average amount $, standard deviation)
      • Which companies, or company states, are making the most payments? (count, total amount $, average amount $, standard deviation)
      • What are the top natures of payment by count and by amount?
      • What are the payments over $2,000,000?
      Physician ID | Date | Payer | Payer State | Physician Specialty | Physician State | Amount | Payment Nature
      1261770 | 01/11/2016 | Southern Anesthesia & Surgical, Inc | CO | Oral and Maxillofacial Surgery | CA | 117.5 | Food and Beverage
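
The first question (count, total, average, and standard deviation per physician specialty) could be answered with the standard Spark SQL aggregate functions. This is a sketch under assumptions, not the deck's own code: a local SparkSession and three made-up rows stand in for the MapR Database table, and the column names follow the `Payment` case class.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, desc, stddev, sum}

// Assumption: a local SparkSession with made-up rows stands in for the MapR Database table.
val spark = SparkSession.builder().master("local[*]").appName("payments-sketch").getOrCreate()
import spark.implicits._

val payments = Seq(
  ("Diagnostic Radiology", "CA", 90.0),
  ("Diagnostic Radiology", "AZ", 30.0),
  ("Family Medicine", "AK", 12.0)
).toDF("physician_specialty", "physician_state", "amount")

// Count, total, average, and standard deviation of payments per specialty,
// ordered so the specialty with the largest total comes first.
val bySpecialty = payments
  .groupBy("physician_specialty")
  .agg(
    count("amount").as("cnt"),
    sum("amount").as("total"),
    avg("amount").as("average"),
    stddev("amount").as("std_dev")
  )
  .orderBy(desc("total"))

bySpecialty.show()
```

Grouping by `physician_state`, or by `payer` and `payer_state`, follows the same pattern for the other questions on the slide.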
  • 59. Language Integrated Queries
      Query | Description
      agg(expr, exprs) | Aggregates on entire DataFrame
      distinct | Returns new DataFrame with unique rows
      except(other) | Returns new DataFrame with rows from this DataFrame not in other DataFrame
      filter(expr); where(condition) | Filter based on the SQL expression or condition
      groupBy(cols: Columns) | Groups DataFrame using specified columns
      join(DataFrame, joinExpr) | Joins with another DataFrame using given join expression
      sort(sortcol) | Returns new DataFrame sorted by specified column
      select(col) | Selects set of columns
  • 60. 60 © 2018 MapR Technologies, Inc MapR Data Platform
  • 61. 61 © 2018 MapR Technologies, Inc Top 5 Nature of Payment by count pdf.groupBy("Nature_of_Payment") .count() .orderBy(desc("count")) .show(5)
  • 62. 62 © 2018 MapR Technologies, Inc Top 5 Nature of Payment by amount of payment %sql select Nature_of_payment, sum(amount) as total from payments group by Nature_of_payment order by total desc limit 5
  • 63. 63 © 2018 MapR Technologies, Inc Top 5 Physician Specialties by total Amount %sql select physician_specialty, sum(amount) as total from payments where physician_specialty IS NOT NULL group by physician_specialty order by total desc limit 5
  • 64. 64 © 2018 MapR Technologies, Inc Average Payment by Specialty
  • 65. 65 © 2018 MapR Technologies, Inc What are the Nature of Payments with payments > $1000 ? pdf.filter($"amount" > 1000) .groupBy("Nature_of_payment") .count().orderBy(desc("count")).show()
  • 66. 66 © 2018 MapR Technologies, Inc Top Payers by Total Amount with count %sql select payer, payer_state, count(*) as cnt, sum(amount) as total from payments group by payer, payer_state order by total desc limit 10
  • 67. 67 © 2018 MapR Technologies, Inc Scenario: InPatient Payment Data Sample record: Provider ID: 1261770; Provider State: CA; DRG: Heart Transplant; Total Discharges: 13; Average Charges: 1,172,866; Average Total Payments: 251,876; Avg Medicare Payments: 244,457
  • 68. 68 © 2018 MapR Technologies, Inc MapR Database shell maprdb mapr:> find /user/mapr/ippstable --limit 2 {"_id":"001_2014_010033", "avg_covered_charges":1172866.385, "avg_medicare_payments": 244457.9231, "avg_total_payments":251876.3077, "drg_code":"001", "drg_definition":"HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC", "provider_address":"619 SOUTH 19TH STREET", "provider_city":"BIRMINGHAM", "provider_id":"010033", "provider_name":"UNIVERSITY OF ALABAMA HOSPITAL", "provider_region":"AL - Birmingham", "provider_state":"AL", "provider_zip":"35233", "total_discharges":13, "year":"2014"} {"_id":"001_2014_030103", "avg_covered_charges":437531.3, "avg_medicare_payments": 133509.55, "avg_total_payments":240422.8, "drg_code":"001", "drg_definition":"HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC", "provider_address":"5777 EAST MAYO BOULEVARD", "provider_city":"PHOENIX", "provider_id":"030103", "provider_name":"MAYO CLINIC HOSPITAL", "provider_region":"AZ - Phoenix", "provider_state":"AZ", "provider_zip":"85054", "total_discharges":20, "year":"2014"}
  • 69. 69 © 2018 MapR Technologies, Inc Now We Can Ask Questions Like: •  What are the top charges and payments by hospital? •  By hospital state? •  What are the top charges and payments by diagnosis? •  What is the most common diagnosis? •  What are the payment statistics for the most common diagnosis (sepsis)? •  Which states are the most expensive for the most common diagnosis, sepsis? Sample record: Provider ID: 1261770; Provider State: CA; DRG: Heart Transplant; Total Discharges: 13; Average Charges: 1,172,866; Average Total Payments: 251,876; Avg Medicare Payments: 244,457
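Two of these questions, the most common diagnosis and the most expensive states for sepsis, can be sketched with Spark SQL over a temporary view. The field names follow the MapR Database shell output shown earlier, and the two heart transplant rows reuse its values. The sepsis rows, the `Ipps` case class, the `ipps` view name, and the `IppsQuestions` object are assumptions made for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical subset of the IPPS document fields.
case class Ipps(drg_definition: String, provider_state: String,
                total_discharges: Long, avg_covered_charges: Double,
                avg_total_payments: Double)

object IppsQuestions {
  private val HeartTransplant = "HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC"
  private val Sepsis = "SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC"

  // Registers sample rows as the temporary view "ipps".
  def registerSample(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      // Values taken from the shell output in the slides.
      Ipps(HeartTransplant, "AL", 13, 1172866.385, 251876.3077),
      Ipps(HeartTransplant, "AZ", 20, 437531.3, 240422.8),
      // Made-up sepsis rows, for illustration only.
      Ipps(Sepsis, "CA", 300, 90000.0, 15000.0),
      Ipps(Sepsis, "AL", 250, 40000.0, 11000.0)
    ).toDS().createOrReplaceTempView("ipps")
  }

  // Most common diagnoses, ranked by total discharges.
  def mostCommonDiagnoses(spark: SparkSession): DataFrame = spark.sql(
    """select drg_definition, sum(total_discharges) as discharges
      |from ipps
      |group by drg_definition
      |order by discharges desc
      |limit 5""".stripMargin)

  // States ranked by average total payment for sepsis diagnoses.
  def sepsisPaymentsByState(spark: SparkSession): DataFrame = spark.sql(
    """select provider_state, avg(avg_total_payments) as avg_payment
      |from ipps
      |where drg_definition like '%SEPSIS%'
      |group by provider_state
      |order by avg_payment desc""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ipps-questions").master("local[*]").getOrCreate()
    registerSample(spark)
    mostCommonDiagnoses(spark).show(false)
    sepsisPaymentsByState(spark).show(false)
    spark.stop()
  }
}
```

Swapping `avg_total_payments` for `avg_covered_charges` in the second query ranks states by charges rather than payments.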
  • 70. 70 © 2018 MapR Technologies, Inc InPatient Prospective Payment System payments
  • 71. 71 © 2018 MapR Technologies, Inc To download the code: Streaming data pipeline blogs •  https://mapr.com/blog/etl-pipeline-healthcare-dataset-with-spark-json-mapr-db/ •  https://mapr.com/blog/streaming-data-pipeline-transform-store-explore-healthcare-dataset-mapr-db/
  • 72. 72 © 2018 MapR Technologies, Inc http://mapr.com/ebooks/ New Spark Ebook
  • 73. 73 © 2018 MapR Technologies, Inc Read about Streaming Architectures mapr.com/ebooks
  • 74. 74 © 2018 MapR Technologies, Inc Read about Machine Learning Logistics mapr.com/ebooks
  • 75. 75 © 2018 MapR Technologies, Inc MapR Free ODT http://learn.mapr.com/ To Learn More: New Spark 2.0 training
  • 76. 76 © 2018 MapR Technologies, Inc https://mapr.com/blog/ MapR Blog
  • 77. 77 © 2018 MapR Technologies, Inc To Learn More: •  https://mapr.com/blog/ml-iot-connected-medical-devices/
  • 78. 78 © 2018 MapR Technologies, Inc To Learn More: •  https://mapr.com/blog/how-stream-first-architecture-patterns-are-revolutionizing-healthcare-platforms/
  • 79. 79 © 2018 MapR Technologies, Inc Applying Machine Learning to Live Patient Data •  https://www.slideshare.net/caroljmcdonald/applying-machine-learning-to-live-patient-data