Improving Mobile Payments with Real-Time Spark

● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com

Agenda
● Mobile as a driver for big data
● Our customer solution
● Existing data solution
● Improved solution
● Technical details
● Future enhancements
● Q & A

Mobile as a big data driver
● Mobile has changed the way we interact with the
world
● Most buying and selling happens on mobile today
○ Myntra went fully mobile
○ Flipkart and Amazon say 50% of their purchases happen on
mobile
○ Quikr and OLX are mobile-based selling platforms
○ Ola, etc.

Challenges in Mobile
● Customers expect the service to be available 24/7
● Tiny screens make typical software flows very
challenging
● Flaky connectivity of mobile networks makes it tougher
● Users are constantly on the move, which results in dropped interactions
● No more downtime
● Everything has to be done in real time

Mobile payments
● Almost every app mentioned earlier needs some kind of
payment
● Getting payments right on mobile is very hard
● Globally, 21% of online shoppers abandon their basket
due to payment failures or delays
● Some companies are building SDKs to help app
developers
● Our customer is one of them

Our customer solution
● Mobile SDK for applications that simplifies payments
● SDK provides a better user interface, such as big buttons to
generate an OTP, and other flows
● SDK also helps in filling the different kinds of forms presented
by different banks through a consistent UI
● Better user experience across applications
● Application sends anonymous payment details across
apps to our customer's servers

Some numbers
● 40+ customers
● Over 1 million transactions per month as of March
● Around 55% success rate (5% above average)
● Supports major banks, payment gateways and wallet
providers
● Will soon be available outside the mobile payment
space

Why does data matter?
● As the number of transactions increases, things will go
wrong
● There are many different combinations that can go wrong
● Examples
○ Airtel OTP failing with State Bank netbanking
○ Customers stuck on the password page
○ Not able to read OTP from some specific
● Understanding customer pain and reacting to it is
paramount
● Every customer we help results in a payment

Initial BI solution
[Architecture diagram: Events; Hourly Push; JSON Data; S3FS; Session-Wise Aggregations]
● Phone SDK pushes events such as transaction initiation and
payment completion to logging servers
● Logging servers roll the log every hour and push it to
S3
● A single-node Spark machine aggregates data by
session and pushes it to MySQL
● Google BigQuery is used for ad hoc querying
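
The session-wise aggregation step can be illustrated with a minimal plain-Python sketch. This is not the customer's Spark job; the field names `session_id`, `type` and `ts`, and the event type `payment_complete`, are assumptions made up for the example.

```python
import json
from collections import defaultdict


def aggregate_sessions(json_lines):
    """Group raw JSON event lines by session and summarise each session."""
    events_by_session = defaultdict(list)
    for line in json_lines:
        event = json.loads(line)
        # "session_id" is a hypothetical field name for the session key.
        events_by_session[event["session_id"]].append(event)

    summaries = {}
    for session_id, events in events_by_session.items():
        timestamps = [e["ts"] for e in events]  # "ts": epoch seconds (assumed)
        summaries[session_id] = {
            "events": len(events),
            "duration_s": max(timestamps) - min(timestamps),
            # A session counts as completed if any event reports completion.
            "completed": any(e["type"] == "payment_complete" for e in events),
        }
    return summaries
```

In the batch setup described above, a function of this shape would run once per hourly log file, with the summaries then written to the reporting store.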

Challenges with BI solution
● Batch processing
● Geared more towards a report-generation flow
● Very minimal use of Spark APIs, as the team was not well
aware of their potential
● No integration with the mobile SDK for a feedback loop

Requirements for consulting
● Bring the same reporting calculations to real time
● Understand user behaviour and track each user's
flow over a session
● Close the loop by providing automatic alerts based on
the metric calculations
● Some new, specific business cases such as loyalty
management
● Improve the team's expertise in Spark
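
As an illustration of alerts driven by metric calculations, here is a minimal sketch that flags operator/bank combinations whose success rate falls below a threshold. The field names `operator`, `bank` and `success`, and the thresholds `min_rate` and `min_count`, are assumptions chosen for the example, not values from the actual system.

```python
from collections import Counter


def failing_combinations(transactions, min_rate=0.5, min_count=3):
    """Return (combination, success_rate) pairs that should raise an alert."""
    totals, successes = Counter(), Counter()
    for t in transactions:
        key = (t["operator"], t["bank"])  # hypothetical field names
        totals[key] += 1
        if t["success"]:
            successes[key] += 1

    alerts = []
    for key, total in totals.items():
        rate = successes[key] / total
        # Ignore combinations with too few transactions to be meaningful.
        if total >= min_count and rate < min_rate:
            alerts.append((key, rate))
    return sorted(alerts)
```

A combination such as one telecom operator's OTP failing against one bank's netbanking page would surface here as a single alert entry.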

Choosing Spark streaming
● The company was already invested in Spark, so Spark
Streaming was a no-brainer
● Porting Spark batch code to streaming was mostly
straightforward, as both share the same API
● The company used Python as its Spark API language, which
was also supported by streaming
● So we didn't consider Storm and went ahead with Spark
Streaming

First version
[Architecture diagram: Events; Five-Minute Push; JSON Data; FileStream; Session-Wise Aggregations]
● We used the fileStream API of Spark Streaming, which
allowed us to poll an S3 bucket every few minutes
● A new rolling appender was added to the log servers to
push logs to S3 every 5 minutes
● The exact same batch code was used for calculations, which
made the transition very easy
● All downstream applications remained the same

Second version
[Architecture diagram: Events; JSON Data; Hourly Push; Realtime; Session-Wise Aggregations]

Amazon Kinesis
● A Kafka-like distributed message queue by Amazon
● Used as a managed, Kafka-like source on AWS
● Highly scalable, with low latency
● Persistence with fault tolerance across multiple
availability zones
● Great integration with Spark

Second version
● Amazon Kinesis was added as the real-time stream source
● Logging servers push logs to Kinesis as they arrive
● The streaming application pulls data from Kinesis
every few minutes
● Support for multiple partitions was added for parallel streams
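
To show how partitioning enables parallel streams, the sketch below mimics how Kinesis assigns a record to a shard: the partition key is MD5-hashed into a 128-bit integer, and each shard owns a range of that hash space. The even split of the hash space and the use of a session ID as the partition key are assumptions for illustration; real shard ranges depend on how the stream was created and resharded.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, num_shards); equal-width ranges are an assumption.
    return hash_value * num_shards // HASH_SPACE
```

Keying by session would keep all events of one session on the same shard, so a single consumer sees them in order.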

Challenges with Python
● The Spark Streaming API for Python was introduced in 1.2,
whereas Spark Streaming for Scala/Java has been available
since 0.8
● No AWS Kinesis connector was available as of March
● The team had to write its own
● No support for Python in Spark Job Server

Challenges from batch to streaming
● Sessions typically last 1-10 minutes. Sessionizing is easy in batch,
where most sessions fall entirely within one hour of data, but
challenging on real-time data
● Designing state for sessions
● Designing checkpointing and deciding on the interval
● Weird checkpointing issues with S3 due to its eventual
consistency
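
Session state in a streaming job can be sketched as a per-key update function that folds each micro-batch of events into a running state. The function below is a plain-Python illustration, not the team's code; the field names `ts` and `type`, the state layout, and the 600-second timeout are assumptions. Spark Streaming's `updateStateByKey` expects a two-argument function `(new_values, last_state)`, so the extra `now` argument would have to be bound (for example with a lambda) before passing it in.

```python
def update_session(new_events, state, now, timeout_s=600):
    """Fold one micro-batch of events for a single session into its state.

    Returns the updated state, or None to signal that the session
    should be dropped (no data yet, or idle past the timeout).
    """
    if state is None:
        if not new_events:
            return None
        state = {"events": 0, "first_ts": None, "last_ts": None,
                 "completed": False}
    else:
        state = dict(state)  # copy so the previous state is not mutated

    for e in new_events:
        ts = e["ts"]  # epoch seconds (assumed)
        state["events"] += 1
        state["first_ts"] = ts if state["first_ts"] is None else min(state["first_ts"], ts)
        state["last_ts"] = ts if state["last_ts"] is None else max(state["last_ts"], ts)
        state["completed"] = state["completed"] or e["type"] == "payment_complete"

    # Expire sessions that have been idle longer than the timeout.
    if now - state["last_ts"] > timeout_s:
        return None
    return state
```

Unlike the hourly batch, a session here may span several micro-batches, which is why the state has to be carried forward and checkpointed.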

Improvements to batch code
● Most of the code was written in the RDD paradigm, as it was
the only one known to the team
● The team was trained on Spark SQL and Spark Streaming
● The majority of the code was ported to a Spark SQL based solution to
improve readability and maintainability
● Recently moved to DataFrame-based code

Third version
[Architecture diagram: Events; JSON Data; Hourly Push; Realtime; Session-Wise Aggregations]

Choosing Mesos
● Mesos is a great cluster manager for Spark-only
workloads
● Has a coarse-grained mode that is well suited to
real-time systems
● Minimal overhead compared to YARN
● Easy to set up on EC2

Fourth version
[Architecture diagram: Events; JSON Data; Hourly Push; Realtime; Session-Wise Aggregations]

Grafana
● Added Grafana for visualization and dashboards
● Grafana dashboards sit on top of Graphite and InfluxDB
● Moved away from MySQL to the time series database
InfluxDB
● Scales much better than MySQL
● Data scientists or product managers can monitor
customers using these dashboards
● Integrates with the mobile SDK
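
For reference, a metric point can be handed to InfluxDB in its line protocol, which has the shape `measurement,tag=value field=value timestamp`. The helper below is a minimal sketch of that formatting; the measurement, tag and field names are invented for the example, and it assumes names and tag values contain no spaces, commas or equals signs (so no escaping is done).

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one metric point as an InfluxDB line protocol string."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    field_items = []
    for key, value in sorted(fields.items()):
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integers carry an "i" suffix
        else:
            rendered = repr(float(value))
        field_items.append(f"{key}={rendered}")

    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {','.join(field_items)} {timestamp_ns}"
```

Tags such as operator and bank are indexed, so a dashboard can filter or group the success rate by them.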
