APACHE: BIG DATA EUROPE 2015
Budapest, September 28-30, 2015
tech talk @ ferret
Andrii Gakhov
SELECTED TALKS
Photos © Apache Big Data
BEING READY FOR APACHE KAFKA
by Michael G. Noll, Confluent Inc.
http://www.slideshare.net/miguno/being-ready-for-apache-kafka-apache-big-data-europe-2015
Apache Kafka is publish-subscribe messaging
rethought as a distributed commit log.
[Figure: producers and consumers connected to a Kafka cluster of brokers coordinated by ZooKeeper; a topic drawn as a log running from oldest to newest, appended to by a producer and read by consumers]
ABOUT KAFKA FROM JAY KREPS
• A consumer just maintains an “offset,” which is the log entry number
for the last record it has processed on each of these partitions. So,
changing the consumer’s position to go back and reprocess data is as
simple as restarting the job with a different offset. Adding a second
consumer for the same data is just another reader pointing to a
different position in the log.
• Kafka supports replication and fault-tolerance, runs on cheap,
commodity hardware, and is glad to store many TBs of data per
machine.
• LinkedIn keeps more than a petabyte of Kafka storage online,
and a number of applications make good use of this long retention
pattern for exactly this purpose.
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
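The offset idea in the quote above can be illustrated with a toy in-memory log. This is only a sketch: `Log` and `Consumer` are hypothetical names made up for illustration and are not part of any Kafka client API, and the toy consumer stores the offset of the next record to read rather than the last one processed.

```python
class Log:
    """Toy append-only log standing in for one Kafka partition (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset):
        # Return every record from the given offset to the newest one.
        return self._records[offset:]


class Consumer:
    """Toy consumer whose only state is the offset of the next record to read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

first = Consumer(log)
seen = first.poll()                # reads everything from offset 0

first.offset = 1                   # "rewind": reprocess from an earlier offset
replayed = first.poll()

second = Consumer(log, offset=2)   # a second reader at a different position
tail = second.poll()
```

Nothing is removed from the log when it is read, which is why rewinding and adding readers cost nothing beyond keeping another integer.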
USING KAFKA
• DEB and RPM packages are available via Confluent Platform
(http://www.confluent.io/developer)
• Recommended Python client: kafka-python
(https://github.com/mumrah/kafka-python)
• Confluent Kafka-REST is available via Confluent
Platform
• Monitoring is important: host metrics (CPU, memory,
disk I/O and usage, network I/O), Kafka metrics
(consumer lag, replication stats, message latency, GC),
ZooKeeper metrics (request latency, number of outstanding
requests)
NEW IN KAFKA 0.9.0
• Copycat is a new framework for loading structured data into and
out of Kafka
• Kafka Streams is a library that supports basic operations
(join/filter/map/…), windowing, schema and proper time modelling
(event time vs. processing time)
• New unified consumer Java API
• ZooKeeper dependency is removed from clients
$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
[Figure: the Unix pipeline above annotated with the labels Copycat, Kafka, and Kafka Streams]
KAPPA ARCHITECTURE
OUR EXPERIENCE
by Juantomás García, ASPgems
http://events.linuxfoundation.org/sites/events/files/slides/ASPgems%20-%20Kappa%20Architecture.pdf
LAMBDA ARCHITECTURE
https://www.mapr.com/developercentral/lambda-architecture
LAMBDA ARCHITECTURE
• Batch layer that provides the following functionality:
• managing the master dataset, an immutable, append-only
set of raw data.
• pre-computing arbitrary query functions, called batch views.
• Serving layer (NoSQL such as HBase, Apache Druid, etc.)
• This layer indexes the batch views so that they can be
queried ad hoc with low latency.
• Speed layer (Apache Storm, Spark Streaming, etc.)
• This layer accommodates all requests that are subject to
low latency requirements. Using fast and incremental
algorithms, the speed layer deals with recent data only.
LAMBDA ARCHITECTURE
Pros
• Retains the input data unchanged
• Takes into account the problem of reprocessing data (the code changes, and you need to reprocess)
Cons
• Maintaining code that needs to produce the same result in two complex distributed systems is painful
• Different and diverging programming paradigms
KAPPA ARCHITECTURE
• On July 2, 2014, Jay Kreps of LinkedIn coined the term Kappa
Architecture.
• Kreps's proposal is simple:
• Use Kafka (or another system) that lets you retain the full log
of the data you need to reprocess.
• When you want to do the reprocessing, start a second instance
of your stream processing job that starts processing from the
beginning of the retained data, but direct this output data to a
new output table.
• When the second job has caught up, switch the application to
read from the new table.
• Stop the old version of the job, and delete the old output table.
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
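The reprocessing steps above can be sketched in a few lines. This is a toy under stated assumptions, not how a real stream processor is wired: the list `input_log` stands in for the retained topic, the dictionaries stand in for output tables, and `job_v1`, `job_v2`, and `run_job` are hypothetical names.

```python
def job_v1(record):
    return record.upper()            # hypothetical "old" business logic


def job_v2(record):
    return record.upper() + "!"      # hypothetical "new" business logic


def run_job(job, input_log, start_offset=0):
    """Process the retained log from start_offset into a fresh output table."""
    return {
        offset: job(record)
        for offset, record in enumerate(input_log)
        if offset >= start_offset
    }


input_log = ["login", "purchase", "logout"]   # stands in for the retained topic

tables = {"output_n": run_job(job_v1, input_log)}
serving = "output_n"                          # table the application reads from

# Reprocess: start a second job instance from the beginning of the log,
# writing to a new output table while the old one keeps serving.
tables["output_n_plus_1"] = run_job(job_v2, input_log)

# Once the new job has caught up, switch the application and drop the old table.
if len(tables["output_n_plus_1"]) == len(input_log):
    serving = "output_n_plus_1"
    del tables["output_n"]
```

Until the switch happens the old table is untouched, so the application can keep reading from it if the new version misbehaves.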
KAPPA ARCHITECTURE
[Figure: input topic (Kafka cluster) feeds job version n and job version n+1 (stream processing), which write output table n and output table n+1 (serving DB), read by the app]
LAMBDA ARCHITECTURE
[Figure: input topic (Kafka cluster) feeds one processing job in stream processing and one in batch processing, which write the speed table and the batch table (serving DB), read by the app]
• You need to reprocess only when you change the code.
• Check whether the new version works correctly and, if not,
revert to the old output table.
• You can mirror a Kafka topic to HDFS so you are
not limited by the Kafka retention configuration.
• You have only one codebase to maintain, with a single
framework.
• The real advantage is allowing your team to
develop, test, debug and operate their systems on
top of a single processing framework.
KAPPA ARCHITECTURE
USE CASES: IOT - OBD II
• One of the clients installs on-board devices in the cars of
its customers.
• ASPgems implemented an API to get all the
information in real time and inject it into
Kafka.
• The business rules are implemented in a CEP
(complex event processing) engine running on Apache
Spark Streaming.
• For MPP (massively parallel processing) they use
Elasticsearch.
CATCH THEM IN THE ACT
FRAUD DETECTION IN REAL-TIME
by Seshika Fernando, WSO2
http://events.linuxfoundation.org/sites/events/files/slides/Fraud%20Detection%20in%20Real-time%20-%20Seshika%20Fernando.pdf
FRAUD: A TRILLION DOLLAR PROBLEM
• Survey results
• $3.5–4 trillion in global losses per year (5% of global GDP)
• Payment Fraud Only
• Merchants are losing around $250B globally
• Cost of Fraud is around 0.68% of Revenue for Retailers (2014)
• Steep rise in Fraud in eCommerce (0.85% of Revenue) and
mCommerce (1.36% of Revenue) with a movement of
payments to newer channels
[Figure: Fraud Detection Toolkit on the Data Analytics Server: domain knowledge plus batch, real-time, predictive, and interactive analytics]
FRAUD SCORING
• Use combinations of rules
• Give weights to each rule
• Derive a single number that reflects many fraud indicators
• Use a threshold to reject transactions
• Example: Score = 0.001 * itemPrice + 0.1 * itemQuantity + 2.5 * isFreeEmail + 5 * riskyCountry + 8 * suspiciousIPRange + 5 * suspiciousUsername + 3 * highTransactionVelocity
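The weighted-rule score can be written directly as code. A minimal sketch: the weights are the ones from the example formula (with identifier spelling normalized), while `REJECT_THRESHOLD`, the function names, and the sample order are assumptions made up for illustration.

```python
# Weights taken from the example formula on the slide.
WEIGHTS = {
    "itemPrice": 0.001,
    "itemQuantity": 0.1,
    "isFreeEmail": 2.5,
    "riskyCountry": 5,
    "suspiciousIPRange": 8,
    "suspiciousUsername": 5,
    "highTransactionVelocity": 3,
}

REJECT_THRESHOLD = 10.0  # hypothetical value, not from the talk


def fraud_score(transaction):
    """Weighted sum of the indicators; missing indicators count as 0."""
    return sum(weight * transaction.get(name, 0) for name, weight in WEIGHTS.items())


def should_reject(transaction, threshold=REJECT_THRESHOLD):
    return fraud_score(transaction) > threshold


# Made-up order: boolean indicators are encoded as 0 or 1.
order = {"itemPrice": 1000, "itemQuantity": 2, "isFreeEmail": 1, "suspiciousIPRange": 1}
score = fraud_score(order)
rejected = should_reject(order)
```

Keeping the weights in one table makes it easy to tune a single rule without touching the scoring logic.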
LEARN FROM DATA
• Utilize machine learning techniques to identify
‘unknown’ point anomalies (e.g. k-means clustering)
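One common way to use k-means for point anomalies is to cluster normal transactions and flag any new point that lies far from every centroid. The talk only names k-means, so the distance rule, the feature choice (amount, item count), the initial centroids, and the `max_distance` value below are all assumptions for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def is_anomaly(point, centroids, max_distance):
    """Flag a point whose nearest centroid is farther away than max_distance."""
    return min(math.dist(point, c) for c in centroids) > max_distance


# Made-up "normal" transactions: (amount, item count).
normal = [(10, 1), (12, 1), (11, 2), (200, 5), (210, 4), (190, 6)]
centroids = kmeans(normal, centroids=[(0, 0), (250, 5)])

flag_large = is_anomaly((5000, 40), centroids, max_distance=50)
flag_typical = is_anomaly((11, 1), centroids, max_distance=50)
```

In practice the features would be scaled before clustering so that one large-valued field does not dominate the distance.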
MARKOV MODELS FOR FRAUD DETECTION
• Markov Models are stochastic models used to model
randomly changing systems
[Figure: events are classified and used to update the probability matrix; incoming sequences are compared against the probability matrix to raise alerts]
MARKOV MODEL: CLASSIFICATION
Example: Each transaction is classified under
the following three qualities and expressed
as a 3 letter token, e.g., HNN
• Amount spent: Low, Normal and High
• Whether the transaction includes high price item:
Normal and High
• Time elapsed since the last transaction: Large,
Normal and Small
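The classification step can be sketched as a function that maps one transaction to its three-letter token. The numeric cut-offs and the parameter names are hypothetical; the talk only gives the categories.

```python
def classify(amount, has_high_price_item, seconds_since_last):
    """Return a 3-letter token such as 'HNN' for one transaction."""
    # Amount spent: Low, Normal or High (cut-offs are made up for illustration).
    if amount < 20:
        amount_code = "L"
    elif amount <= 500:
        amount_code = "N"
    else:
        amount_code = "H"

    # Whether the transaction includes a high price item: Normal or High.
    item_code = "H" if has_high_price_item else "N"

    # Time elapsed since the last transaction: Large, Normal or Small.
    if seconds_since_last > 86400:
        time_code = "L"
    elif seconds_since_last >= 60:
        time_code = "N"
    else:
        time_code = "S"

    return amount_code + item_code + time_code


token = classify(amount=900, has_high_price_item=False, seconds_since_last=3600)
```

With these cut-offs the sample call yields the token `HNN`: a high amount, no high price item, and a normal gap since the previous transaction.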
MARKOV MODEL: PROBABILITY
       LNL    LNH    LNS    LHL    HHL    …
LNL    0.97   0.54   0.2    0.09   0.07
LNH    0.8    0.6    0.18   0.65   0.11
LNS    0.07   0.83   0.95   0.15   0.12
…
• Compare the probabilities of incoming transaction
sequences with thresholds and flag fraud as appropriate
• Can use direct probabilities or more complex metrics (Miss
Rate Metric, Miss Probability Metric, Entropy Reduction
Metric, …)
• Update Markov Probability table with incoming transactions
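A minimal sketch of the "direct probabilities" variant: transition counts are updated from observed token sequences, and an incoming sequence is flagged when its probability falls below a threshold. `TransitionModel`, the training sequence, and `THRESHOLD` are hypothetical; the more complex metrics named above are not implemented here.

```python
from collections import defaultdict


class TransitionModel:
    """First-order Markov model over transaction tokens (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, sequence):
        # Count each observed transition prev -> next.
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += 1

    def probability(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

    def sequence_probability(self, sequence):
        p = 1.0
        for prev, nxt in zip(sequence, sequence[1:]):
            p *= self.probability(prev, nxt)
        return p


model = TransitionModel()
model.update(["LNL", "LNL", "LNH", "LNL", "LNL"])   # made-up history for one customer

THRESHOLD = 0.05  # hypothetical cut-off
usual = model.sequence_probability(["LNL", "LNL"])
unusual = model.sequence_probability(["LNH", "HHL"])
is_suspicious = unusual < THRESHOLD
```

A transition that was never observed gets probability 0 here; a production model would normally smooth the counts so that new but harmless behaviour is not flagged outright.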
DIG DEEPER
• Access historical data using
• expressive querying
• easy filtering
• useful visualisations
• to isolate incidents and unearth connections
NLP STRUCTURED DATA INVESTIGATION
ON NON-TEXTUAL DATA WITH MLLIB
by Casey Stella, Hortonworks
http://events.linuxfoundation.org/sites/events/files/slides/NLP_on_non_textual_data.pdf
WORD2VEC
• Word2Vec is a vectorization model created by Google that attempts
to learn relationships between words automatically given a large
corpus of sentences.
• Gives us a way to find similar words by finding near neighbors in
the vector space with cosine similarity.
• Uses a neural network to learn vector representations.
• Recent work by Pennington, Socher, and Manning shows that the
word2vec model is equivalent to weighting a word co-occurrence
matrix based on window distance and lowering the
dimension by matrix factorization.
• Read more: http://radimrehurek.com/2014/12/making-sense-of-word2vec/
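Finding near neighbors by cosine similarity can be shown without any library. The vectors below are made up and two-dimensional purely for illustration; a trained model would supply vectors with many more dimensions. `nearest` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=1):
    """Return the k most similar other words as (similarity, word) pairs."""
    others = [
        (cosine_similarity(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(others, reverse=True)[:k]


vectors = {  # made-up 2-D vectors
    "king": [0.9, 0.1],
    "queen": [0.8, 0.2],
    "apple": [0.1, 0.9],
}
best = nearest("king", vectors)
```

Because cosine similarity ignores vector length, two words count as close when their vectors point in the same direction, regardless of magnitude.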
CLINICAL DATA AS SENTENCES
• Clinical encounters form a sort of sentence over time. For a
given encounter:
• Vitals are measured (e.g. height, weight, BMI).
• Labs are performed and results are recorded (e.g. blood tests).
• Procedures are performed.
• Diagnoses are made (e.g. Diabetes).
• Drugs are prescribed.
• Each of these can be considered clinical “words” and the
encounter forms a clinical “sentence”.
• Idea: We can use word2vec to investigate connections between
these clinical concepts.
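One way to picture a clinical "sentence" is as a list of prefixed tokens, in the `dx::` / `rx::` style of the demo sentence shown later in the deck. The record layout (`diagnoses`, `drugs`) and the function name are hypothetical; the talk does not show the preprocessing code.

```python
def encounter_to_sentence(encounter):
    """Flatten one encounter record into a list of prefixed clinical 'words'."""
    tokens = []
    for code in encounter.get("diagnoses", []):
        tokens.append("dx::" + code)
    for drug in encounter.get("drugs", []):
        # Normalize drug names into single tokens without spaces.
        tokens.append("rx::" + drug.lower().replace(" ", "_"))
    return tokens


# Made-up encounter record for illustration.
encounter = {
    "diagnoses": ["042"],
    "drugs": ["Benzoyl peroxide topical", "Morphine"],
}
sentence = encounter_to_sentence(encounter)
```

The prefixes keep token types apart, so a diagnosis code and a drug name can never collide and neighbors can later be filtered by type.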
DEMO FOR KAGGLE COMPETITION
• Practice Fusion Diabetes Classification
(https://www.kaggle.com/c/pf2012-diabetes)
• Given a de-identified data set of patient electronic
health records, build a model to determine who
has a diabetes diagnosis, as defined by ICD9 codes
• There are a total of 9,948 patients in the training
set and 4,979 patients in the test set.
• Ingested and preprocessed these records
into 197,340 clinical “sentences”
SYNONYMS
• Sentence:
• dx::042 rx::benzoyl_peroxide_topical rx::morphine
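from pyspark.mllib.feature import Word2Vec

# `sentences` is assumed to be an RDD of token lists, one list per clinical "sentence".
word2vec = Word2Vec()
word2vec.setSeed(0)
word2vec.setVectorSize(100)
model = word2vec.fit(sentences)

def print_synonyms_filt(clinical_concept, model, prefix):
    # Fetch up to 10000 nearest neighbours of the concept.
    synonyms = model.findSynonyms(clinical_concept, 10000)
    # findSynonyms yields (word, cosine similarity) pairs.
    for word, cosine_similarity in synonyms:
        # Optionally keep only one token type, e.g. prefix "dx::".
        if prefix is None or word.startswith(prefix):
            print("{}: {}".format(cosine_similarity, word))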
RESULTS EXAMPLE:
ATHEROSCLEROSIS OF THE AORTA
• Hearing Loss
• From an article from the Journal of Atherosclerosis in 2012:
• Sensorineural hearing loss seemed to be associated with vascular endothelial
dysfunction and an increased cardiovascular risk
• Knee Joint Replacements
• These procedures are common among those with osteoarthritis and there has
been a solid correlation between osteoarthritis and atherosclerosis in the literature.
print_synonyms_filt('dx::440.0', model, None)
0.930721402168: dx: v12.71 -- Personal history of peptic ulcer disease
0.926115810871: dx: 533.40 -- Chronic or unspecified peptic ulcer of unspecified site with hemorrhage, without mention of obstruction
0.91034334898: dx: 153.6 -- Malignant neoplasm of ascending colon
0.90947073698: dx: 238.75 -- Myelodysplastic syndrome, unspecified
0.907130658627: dx: 389.10 -- Sensorineural hearing loss, unspecified
0.90490090847: dx: 428.30 -- Diastolic heart failure, unspecified
0.902494549751: dx: v43.65 -- Knee joint replacement
THANK YOU

Más contenido relacionado

La actualidad más candente

Architecting &Building Scalable Secure Web API
Architecting &Building Scalable Secure Web APIArchitecting &Building Scalable Secure Web API
Architecting &Building Scalable Secure Web APISHAKIL AKHTAR
 
Microservices in GO lang
Microservices in GO langMicroservices in GO lang
Microservices in GO langSHAKIL AKHTAR
 
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...HostedbyConfluent
 
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Lohika_Odessa_TechTalks
 
Building Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsBuilding Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsSrdjan Strbanovic
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Kai Wähner
 
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...ServerlessConf
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...CodeOps Technologies LLP
 
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...confluent
 
Reactive messaging Quarkus and Kafka
Reactive messaging Quarkus and KafkaReactive messaging Quarkus and Kafka
Reactive messaging Quarkus and KafkaBruno Horta
 
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...Orkhan Gasimov
 
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...confluent
 
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentMaking Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentHostedbyConfluent
 
Spring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudSpring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudOrkhan Gasimov
 
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...Rich Lee
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecturetechmaddy
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Worksconfluent
 

La actualidad más candente (20)

Architecting &Building Scalable Secure Web API
Architecting &Building Scalable Secure Web APIArchitecting &Building Scalable Secure Web API
Architecting &Building Scalable Secure Web API
 
Microservices in GO lang
Microservices in GO langMicroservices in GO lang
Microservices in GO lang
 
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
 
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...
 
Building Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJsBuilding Killer RESTful APIs with NodeJs
Building Killer RESTful APIs with NodeJs
 
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
 
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
Rob Gruhl and Erik Erikson - What We Learned in 18 Serverless Months at Nords...
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
 
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...
 
Reactive messaging Quarkus and Kafka
Reactive messaging Quarkus and KafkaReactive messaging Quarkus and Kafka
Reactive messaging Quarkus and Kafka
 
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...
Cloud Native Spring - The role of Spring Cloud after Kubernetes became a main...
 
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...
Kafka Pluggable Authorization for Enterprise Security (Anna Kepler, Viasat) K...
 
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentMaking Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
 
Spring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudSpring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloud
 
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecture
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 

Destacado

Вероятностные структуры данных
Вероятностные структуры данныхВероятностные структуры данных
Вероятностные структуры данныхAndrii Gakhov
 
Bloom filter
Bloom filterBloom filter
Bloom filterfeng lee
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyAndrii Gakhov
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Implementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and LuaImplementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and LuaAndrii Gakhov
 
Probabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityProbabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityAndrii Gakhov
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
22 of the best marketing quotes
22 of the best marketing quotes22 of the best marketing quotes
22 of the best marketing quotessherinshaju
 
AQA Biology-Physical factors affecting organisms
AQA Biology-Physical factors affecting organismsAQA Biology-Physical factors affecting organisms
AQA Biology-Physical factors affecting organismssherinshaju
 
Частотный преобразователь
Частотный преобразовательЧастотный преобразователь
Частотный преобразовательkulibin
 
Mobile for SharePoint with Windows Phone
Mobile for SharePoint with Windows PhoneMobile for SharePoint with Windows Phone
Mobile for SharePoint with Windows PhoneEdgewater
 
Content Marketing: How to Attract Talent using Sponsored Updates
Content Marketing: How to Attract Talent using Sponsored UpdatesContent Marketing: How to Attract Talent using Sponsored Updates
Content Marketing: How to Attract Talent using Sponsored UpdatesRebecca Feldman
 
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelona
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelonaJournalism, Networks, Ontology: Pat kane presentation at Media140 barcelona
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelonawww.patkane.global
 

Destacado (19)

14 Skip Lists
14 Skip Lists14 Skip Lists
14 Skip Lists
 
Вероятностные структуры данных
Вероятностные структуры данныхВероятностные структуры данных
Вероятностные структуры данных
 
Bloom filter
Bloom filterBloom filter
Bloom filter
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. Frequency
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Implementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and LuaImplementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and Lua
 
Probabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityProbabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. Similarity
 
skip list
skip listskip list
skip list
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
22 of the best marketing quotes
22 of the best marketing quotes22 of the best marketing quotes
22 of the best marketing quotes
 
Tech
TechTech
Tech
 
Diana maria morales hernandez actividad1 mapa_c
Diana maria morales hernandez actividad1 mapa_cDiana maria morales hernandez actividad1 mapa_c
Diana maria morales hernandez actividad1 mapa_c
 
AQA Biology-Physical factors affecting organisms
AQA Biology-Physical factors affecting organismsAQA Biology-Physical factors affecting organisms
AQA Biology-Physical factors affecting organisms
 
BIOSTER Technology Research Institute
BIOSTER Technology Research InstituteBIOSTER Technology Research Institute
BIOSTER Technology Research Institute
 
Частотный преобразователь
Частотный преобразовательЧастотный преобразователь
Частотный преобразователь
 
Mobile for SharePoint with Windows Phone
Mobile for SharePoint with Windows PhoneMobile for SharePoint with Windows Phone
Mobile for SharePoint with Windows Phone
 
Cómo hacer presentaciones exitosas
Cómo hacer presentaciones exitosasCómo hacer presentaciones exitosas
Cómo hacer presentaciones exitosas
 
Content Marketing: How to Attract Talent using Sponsored Updates
Content Marketing: How to Attract Talent using Sponsored UpdatesContent Marketing: How to Attract Talent using Sponsored Updates
Content Marketing: How to Attract Talent using Sponsored Updates
 
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelona
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelonaJournalism, Networks, Ontology: Pat kane presentation at Media140 barcelona
Journalism, Networks, Ontology: Pat kane presentation at Media140 barcelona
 

Similar a Apache Big Data Europe 2015: Selected Talks

Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaMax Alexejev
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Kafka streams decoupling with stores
Kafka streams decoupling with storesKafka streams decoupling with stores
Kafka streams decoupling with storesYoni Farin
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationKnoldus Inc.
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopDataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopYu-Jhe Li
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...confluent
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Event Driven Architectures with Apache Kafka
Event Driven Architectures with Apache KafkaEvent Driven Architectures with Apache Kafka
Event Driven Architectures with Apache KafkaMatt Masuda
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingApache Apex
 

Similar a Apache Big Data Europe 2015: Selected Talks (20)

Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala




Apache Big Data Europe 2015: Selected Talks

  • 1. APACHE: BIG DATA EUROPE 2015 Budapest, September 28-30, 2015 tech talk @ ferret Andrii Gakhov SELECTED TALKS
  • 2. Photos © Apache Big Data
  • 3. BEING READY FOR APACHE KAFKA by Michael G. Noll, Confluent Inc. http://www.slideshare.net/miguno/being-ready-for-apache-kafka-apache-big-data-europe-2015
  • 4. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. [Diagram: producers and consumers connected to a Kafka cluster of brokers coordinated by ZooKeeper; a topic drawn as a log ordered from oldest to newest, written by a producer and read by consumers]
  • 5. ABOUT KAFKA FROM JAY KREPS • A consumer just maintains an “offset,” which is the log entry number for the last record it has processed on each of these partitions. So, changing the consumer’s position to go back and reprocess data is as simple as restarting the job with a different offset. Adding a second consumer for the same data is just another reader pointing to a different position in the log. • Kafka supports replication and fault-tolerance, runs on cheap, commodity hardware, and is glad to store many TBs of data per machine. • LinkedIn keeps more than a petabyte of Kafka storage online, and a number of applications make good use of this long retention pattern for exactly this purpose. http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
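The offset idea in the quote above can be illustrated without a real broker. The following is a minimal in-memory sketch, not Kafka's API: the `Log` and `Consumer` classes and their method names are hypothetical, and the consumer here stores the position of the next record to read rather than the last record processed.

```python
class Log:
    """Append-only list of records, standing in for one topic partition."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """A reader whose only state is its position in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset  # position of the next record to read

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Reprocessing is just moving the position back.
        self.offset = offset


log = Log()
for event in ["a", "b", "c", "d"]:
    log.append(event)

first = Consumer(log)
first_pass = first.poll()         # reads everything once
first.seek(1)                     # go back and reprocess from offset 1
replayed = first.poll()

second = Consumer(log, offset=2)  # a second, independent reader
second_pass = second.poll()
```

Because the log itself is never modified by reading, any number of consumers can hold different positions over the same data.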
  • 6. USING KAFKA • DEB and RPM packages are available via Confluent Platform (http://www.confluent.io/developer) • Recommended Python client: kafka-python (https://github.com/mumrah/kafka-python) • Confluent Kafka-REST is available via Confluent Platform • Monitoring is important: Host metrics (CPU, memory, disk I/O and usage, network I/O), Kafka metrics (consumer lag, replication stats, message latency, GC), ZooKeeper metrics (request latency, number of outstanding requests)
  • 7. NEW IN KAFKA 0.9.0 • Copycat is a new framework for loading structured data into and out of Kafka • Kafka Streams is a library that supports basic operations (join/filter/map/…), windowing, schema and proper time modelling (event time vs. processing time) • New unified consumer Java API • ZooKeeper dependency is removed from clients
  • 8. $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt [Diagram labels along the pipeline: Copycat, Copycat, Kafka, Kafka, Kafka Streams, Kafka Streams]
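The shell pipeline on slide 8 can be mirrored with Python generators to show the shape of the analogy. This is only a sketch: it assumes the slide's labels map Copycat to the input and output ends, Kafka to the pipes, and Kafka Streams to the `grep` and `tr` stages; the function names and sample lines are hypothetical and none of this is Kafka code.

```python
def source(lines):
    """Input end of the pipeline (the role the slide appears to give Copycat)."""
    for line in lines:
        yield line


def grep(pattern, stream):
    """Filter stage: keep only lines containing the pattern."""
    return (line for line in stream if pattern in line)


def upper(stream):
    """Map stage: uppercase every line, like `tr a-z A-Z`."""
    return (line.upper() for line in stream)


in_lines = ["apache kafka", "something else", "the apache way"]

# Each stage consumes the previous one lazily, record by record.
out_lines = list(upper(grep("apache", source(in_lines))))
```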
  • 9. KAPPA ARCHITECTURE OUR EXPERIENCE by Juantomás García, ASPgems http://events.linuxfoundation.org/sites/events/files/slides/ASPgems%20-%20Kappa%20Architecture.pdf
  • 11. LAMBDA ARCHITECTURE • Batch layer that provides the following functionality: • managing the master dataset, an immutable, append-only set of raw data. • pre-computing arbitrary query functions, called batch views. • Serving layer (NoSQL such as HBase, Apache Druid, etc.) • This layer indexes the batch views so that they can be queried ad hoc with low latency. • Speed layer (Apache Storm, Spark Streaming, etc.) • This layer accommodates all requests that are subject to low latency requirements. Using fast and incremental algorithms, the speed layer deals with recent data only.
  • 12. LAMBDA ARCHITECTURE • Pros: • Retains the input data unchanged • Takes into account the problem of reprocessing data (the code changes, and you need to reprocess) • Cons: • Maintaining code that must produce the same result in two complex distributed systems is painful • Different and diverging programming paradigms
  • 13. KAPPA ARCHITECTURE • On July 2, 2014, Jay Kreps of LinkedIn coined the term Kappa Architecture • Jay Kreps's proposal is simple: • Use Kafka (or another system) that lets you retain the full log of the data you need to reprocess. • When you want to do the reprocessing, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct this output data to a new output table. • When the second job has caught up, switch the application to read from the new table. • Stop the old version of the job, and delete the old output table. http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
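The four reprocessing steps on slide 13 can be walked through with plain Python data structures. This is a toy sketch under stated assumptions: the retained log is a list, output tables are dictionaries, and `job_v1`, `job_v2`, `run_job`, and the table names are hypothetical stand-ins for real stream processing jobs and a serving database.

```python
def job_v1(event):
    """Old logic: count events per user."""
    return event["user"], 1


def job_v2(event):
    """New logic after a code change: sum amounts per user."""
    return event["user"], event["amount"]


def run_job(job, log, start_offset=0):
    """Process the retained log from start_offset into a fresh output table."""
    table = {}
    for event in log[start_offset:]:
        key, value = job(event)
        table[key] = table.get(key, 0) + value
    return table


# Step 1: the full input log is retained.
input_log = [
    {"user": "ann", "amount": 5},
    {"user": "bob", "amount": 7},
    {"user": "ann", "amount": 2},
]

tables = {"output_v1": run_job(job_v1, input_log)}
serving = "output_v1"  # table the application currently reads

# Step 2: start the new job version from the beginning, writing to a new table.
tables["output_v2"] = run_job(job_v2, input_log, start_offset=0)

# Step 3: once the new job has caught up, switch the application over.
serving = "output_v2"

# Step 4: stop the old job and delete its output table.
del tables["output_v1"]
```

Until step 4 runs, the old table is still available, which is what makes falling back to the previous version cheap.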
  • 14. KAPPA ARCHITECTURE [Diagram: input topic in the Kafka cluster; stream processing with job version n and job version n+1; serving DB with output table n and output table n+1; app] LAMBDA ARCHITECTURE [Diagram: input topic in the Kafka cluster; a stream processing job and a batch processing job; serving DB with a speed table and a batch table; app]
  • 15. KAPPA ARCHITECTURE • You need to reprocess only when you change the code. • Check whether the new version works correctly and, if not, revert to the old output table. • You can mirror a Kafka topic to HDFS, so you are not limited by the Kafka retention configuration. • You have only one codebase to maintain, with a single framework. • The real advantage is allowing your team to develop, test, debug and operate their systems on top of a single processing framework.
  • 16. USE CASES: IOT - OBD II • One of ASPgems' clients installs on-board devices in its customers' cars. • ASPgems implemented an API to get all the information in real time and inject it into Kafka. • The business rules are implemented in a CEP (complex event processing) engine running on Apache Spark Streaming. • For MPP (massively parallel processing) they use Elasticsearch.
  • 17. CATCH THEM IN THE ACT: FRAUD DETECTION IN REAL-TIME by Seshika Fernando, WSO2 http://events.linuxfoundation.org/sites/events/files/slides/Fraud%20Detection%20in%20Real-time%20-%20Seshika%20Fernando.pdf
  • 18. FRAUD: A TRILLION DOLLAR PROBLEM • Survey results • $3.5–4 trillion in global losses per year (5% of global GDP) • Payment fraud only • Merchants are losing around $250B globally • Cost of fraud is around 0.68% of revenue for retailers (2014) • Steep rise in fraud in eCommerce (0.85% of revenue) and mCommerce (1.36% of revenue) with a movement of payments to newer channels
  • 20. FRAUD SCORING • Use combinations of rules • Give weights to each rule • Derive a single number that reflects many fraud indicators • Use a threshold to reject transactions • Example: Score = 0.001 * itemPrice + 0.1 * itemQuantity + 2.5 * isFreeEmail + 5 * riskyCountry + 8 * suspicousIPRange + 5 * suspicousUsername + 3 * highTransactionVelocity
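The weighted-rule score on slide 20 translates directly into a small function. In this sketch the weights come from the slide's example, while the snake_case feature names, the `THRESHOLD` value of 10.0, and the sample transaction are assumptions made for illustration.

```python
# Weights taken from the slide's example formula; feature names are
# snake_case adaptations of the slide's identifiers.
WEIGHTS = {
    "item_price": 0.001,
    "item_quantity": 0.1,
    "is_free_email": 2.5,
    "risky_country": 5,
    "suspicious_ip_range": 8,
    "suspicious_username": 5,
    "high_transaction_velocity": 3,
}

THRESHOLD = 10.0  # hypothetical cut-off; the slide does not give one


def fraud_score(tx):
    """Weighted sum of indicators; booleans count as 0 or 1, missing ones as 0."""
    return sum(weight * float(tx.get(name, 0)) for name, weight in WEIGHTS.items())


def is_rejected(tx, threshold=THRESHOLD):
    """Reject the transaction when its score reaches the threshold."""
    return fraud_score(tx) >= threshold


suspicious_tx = {
    "item_price": 2000,
    "item_quantity": 3,
    "is_free_email": True,
    "suspicious_ip_range": True,
}
ordinary_tx = {"item_price": 20, "item_quantity": 1}

score = fraud_score(suspicious_tx)  # 2.0 + 0.3 + 2.5 + 8.0
```

Keeping the weights in one table makes it easy to tune a single rule without touching the scoring logic.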
  • 21. LEARN FROM DATA • Utilize machine learning techniques to identify ‘unknown’ point anomalies (e.g. k-means clustering)
  • 22. MARKOV MODELS FOR FRAUD DETECTION • Markov Models are stochastic models used to model randomly changing systems [Diagram: incoming events pass through Classify Events, Update Probability Matrix, and Compare Incoming Sequences, which share a Probability Matrix and emit alerts]
  • 23. MARKOV MODEL: CLASSIFICATION Example: Each transaction is classified under the following three qualities and expressed as a three-letter token, e.g., HNN • Amount spent: Low, Normal and High • Whether the transaction includes a high-price item: Normal and High • Time elapsed since the last transaction: Large, Normal and Small
  • 24. MARKOV MODEL: PROBABILITY
           LNL    LNH    LNS    LHL    HHL    …
      LNL  0.97   0.54   0.2    0.09   0.07
      LNH  0.8    0.6    0.18   0.65   0.11
      LNS  0.07   0.83   0.95   0.15   0.12
      …
    • Compare the probabilities of incoming transaction sequences with thresholds and flag fraud as appropriate • Can use direct probabilities or more complex metrics (Miss Rate Metric, Miss Probability Metric, Entropy Reduction Metric, …) • Update Markov Probability table with incoming transactions
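The classify, update, and compare steps from slides 22 to 24 can be sketched as follows. The numeric cut-offs in `classify`, the `MarkovModel` class, and the alert threshold are all hypothetical; only the three-letter token scheme and the "direct probabilities" comparison come from the slides. Probabilities here are estimated from transition counts, which is one simple way to maintain such a table.

```python
from collections import defaultdict


def classify(amount, has_high_price_item, seconds_since_last):
    """Encode a transaction as a three-letter token such as 'HNN'.

    The numeric cut-offs are placeholders chosen for illustration.
    """
    amount_level = "L" if amount < 20 else "H" if amount > 500 else "N"
    item_level = "H" if has_high_price_item else "N"
    gap_level = "S" if seconds_since_last < 60 else "L" if seconds_since_last > 86400 else "N"
    return amount_level + item_level + gap_level


class MarkovModel:
    """First-order Markov model over transaction tokens, built from counts."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, tokens):
        # Count each observed transition prev -> next.
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def transition_probability(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

    def sequence_probability(self, tokens):
        # Product of the transition probabilities along the sequence.
        probability = 1.0
        for prev, nxt in zip(tokens, tokens[1:]):
            probability *= self.transition_probability(prev, nxt)
        return probability


model = MarkovModel()
model.update(["NNN", "NNN", "LNN", "NNN", "NNN", "LNN", "NNN"])

ALERT_THRESHOLD = 0.1  # hypothetical
usual = model.sequence_probability(["NNN", "LNN", "NNN"])
unusual = model.sequence_probability(["NNN", classify(700, True, 30)])
alert = unusual < ALERT_THRESHOLD
```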
  • 25. DIG DEEPER • Access historical data using • expressive querying • easy filtering • useful visualisations • to isolate incidents and unearth connections
  • 26. NLP STRUCTURED DATA INVESTIGATION ON NON-TEXTUAL DATA WITH MLLIB by Casey Stella, Hortonworks http://events.linuxfoundation.org/sites/events/files/slides/NLP_on_non_textual_data.pdf
  • 27. WORD2VEC • Word2Vec is a vectorization model created by Google that attempts to learn relationships between words automatically given a large corpus of sentences. • Gives us a way to find similar words by finding near neighbors in the vector space with cosine similarity. • Uses a neural network to learn vector representations. • Recent work by Pennington, Socher, and Manning shows that the word2vec model is equivalent to weighting a word co-occurrence matrix based on window distance and lowering the dimension by matrix factorization. • Read more: http://radimrehurek.com/2014/12/making-sense-of-word2vec/
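Finding near neighbors with cosine similarity, as slide 27 describes, needs only a few lines once the vectors exist. The sketch below uses made-up two-dimensional vectors in place of learned embeddings; the `nearest` helper and the toy vocabulary are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=2):
    """Return the k most similar other words as (similarity, word) pairs."""
    query = vectors[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in vectors.items()
        if other != word
    ]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for learned embeddings.
vectors = {
    "kafka": [1.0, 0.0],
    "storm": [0.9, 0.1],
    "diabetes": [0.0, 1.0],
}

closest = nearest("kafka", vectors, k=1)
```

Cosine similarity ignores vector length, so two words count as close when their vectors point in the same direction regardless of magnitude.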
  • 28. CLINICAL DATA AS SENTENCES • Clinical encounters form a sort of sentence over time. For a given encounter: • Vitals are measured (e.g. height, weight, BMI). • Labs are performed and results are recorded (e.g. blood tests). • Procedures are performed. • Diagnoses are made (e.g. Diabetes). • Drugs are prescribed. • Each of these can be considered clinical “words” and the encounter forms a clinical “sentence”. • Idea: We can use word2vec to investigate connections between these clinical concepts.
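Turning an encounter into a clinical "sentence" could look like the sketch below. The `dx::` and `rx::` prefixes follow the example sentence shown later in the deck; the encounter dictionary layout and the `encounter_to_sentence` helper are assumptions, and the talk's actual preprocessing may differ.

```python
def encounter_to_sentence(encounter):
    """Flatten one encounter into a list of prefixed clinical 'words'."""
    words = []
    for code in encounter.get("diagnoses", []):
        words.append("dx::" + code)
    for drug in encounter.get("drugs", []):
        # Normalise drug names into single tokens.
        words.append("rx::" + drug.lower().replace(" ", "_"))
    return words


encounter = {
    "diagnoses": ["042"],
    "drugs": ["Benzoyl peroxide topical", "Morphine"],
}
sentence = encounter_to_sentence(encounter)
```

A list of such token lists is the kind of input a word2vec implementation expects as its corpus of sentences.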
  • 29. DEMO FOR KAGGLE COMPETITION • Practice Fusion Diabetes Classification (https://www.kaggle.com/c/pf2012-diabetes) • Given a de-identified data set of patient electronic health records, build a model to determine who has a diabetes diagnosis, as defined by ICD9 codes • There are a total of 9,948 patients in the training set and 4,979 patients in the test set. • Ingested and preprocessed these records into 197,340 clinical “sentences”
  • 30. SYNONYMS • Sentence: dx::042 rx::benzoyl_peroxide_topical rx::morphine
      from pyspark.mllib.feature import Word2Vec

      # `sentences` is an RDD of token lists, one list per clinical "sentence".
      word2vec = Word2Vec()
      word2vec.setSeed(0)
      word2vec.setVectorSize(100)
      model = word2vec.fit(sentences)

      def print_synonyms_filt(clinical_concept, model, prefix):
          # findSynonyms yields (word, cosine similarity) pairs, most similar first.
          synonyms = model.findSynonyms(clinical_concept, 10000)
          for word, similarity in synonyms:
              # Keep only words with the given prefix (e.g. "dx::"), or all if None.
              if prefix is None or word.startswith(prefix):
                  print("{}: {}".format(similarity, word))
  • 31. RESULTS EXAMPLE: ATHEROSCLEROSIS OF THE AORTA
      print_synonyms_filt('dx::440.0', model, None)
      0.930721402168: dx: v12.71 -- Personal history of peptic ulcer disease
      0.926115810871: dx: 533.40 -- Chronic or unspecified peptic ulcer of unspecified site with hemorrhage, without mention of obstruction
      0.91034334898: dx: 153.6 -- Malignant neoplasm of ascending colon
      0.90947073698: dx: 238.75 -- Myelodysplastic syndrome, unspecified
      0.907130658627: dx: 389.10 -- Sensorineural hearing loss, unspecified
      0.90490090847: dx: 428.30 -- Diastolic heart failure, unspecified
      0.902494549751: dx: v43.65 -- Knee joint replacement
    • Hearing Loss • From an article from the Journal of Atherosclerosis in 2012: • Sensorineural hearing loss seemed to be associated with vascular endothelial dysfunction and an increased cardiovascular risk • Knee Joint Replacements • These procedures are common among those with osteoarthritis and there has been a solid correlation between osteoarthritis and atherosclerosis in the literature.