Borys Pratsiuk "How to be NVidia partner"
Big Data, Analytics & Artificial Intelligence
Borys Pratsiuk, Ph.D., Head of R&D
Inspire brilliant minds to innovate and create.
Who am I?
Borys Pratsiuk, Ph.D.
Career milestones (the timeline marks 2004, 2006, 2007, 2009, 2012, 2013 and 2015 to present):
• First project: C, embedded
• Engineer, R&D Lab, Tescom, South Korea
• Android Developer
• Senior Android / Team Lead / Android Architect
• Ph.D., Solid-State Electronics
• Assistant Professor, Kiev Polytechnic Institute
• Head of R&D Engineering
b_pratsiuk
bopr@ciklum.com
Agenda
1. Data, Big Data
2. Analytics, Data Science, Machine Learning
3. Artificial Intelligence
4. Q&A
Skills | Knowledge | Collaboration
Key facts
Every day we create 2,500,000,000,000,000,000 bytes (2.5 exabytes, or 2,500 petabytes) of data.
90% of the data in the world has been created in the last two years alone.
Every minute:
• 150 million emails sent
• 2.4 million search queries
• 20.8 million messages
• 701,389 Facebook logins
• 2.78 million video views
• $203,596 in sales
Within five years there will be over 50 billion smart connected devices in the world, all developed to collect, analyze and share data.
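The slide's unit conversion can be checked in a few lines. This sketch assumes decimal SI prefixes (1 PB = 10^15 bytes, 1 EB = 10^18 bytes); with binary prefixes (1 PiB = 2^50 bytes) the same figure would be roughly 2,220 PiB.

```python
# Decimal (SI) byte units, as assumed for the "2,500 petabytes" figure.
UNITS = {"PB": 10**15, "EB": 10**18}

DAILY_BYTES = 2_500_000_000_000_000_000  # 2.5 x 10^18 bytes, the figure quoted on the slide


def to_unit(n_bytes: int, unit: str) -> float:
    """Convert a byte count to the given decimal unit."""
    return n_bytes / UNITS[unit]


print(to_unit(DAILY_BYTES, "PB"))  # 2500.0
print(to_unit(DAILY_BYTES, "EB"))  # 2.5
```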
An Integrated Digital Business demands...
Result: a Data-Driven Business!
The Industry’s View: Data Initiatives and Success Rate
Data & Analytics Ecosystem
Layers: Data Engineering, Data Analytics, Data Presentation
● Distributed file stores
● NoSQL databases
● Hadoop-optimized data warehousing
● Data integration
● Data aggregation & representation
● Data extraction, cleaning
● Analytic development platforms
● Advanced analytics applications
● Data modeling & prediction
● Data visualization
● Interactive visual dashboards
● Business intelligence applications
Data Engineering accelerator!
Data Processing • BI/Visualization • Data Mining/ML • Data Architecture/DW • Storage DBs
Big Data Reference Architecture
● Data sources: proprietary, open source, APIs; unstructured/structured
● Multitenant distributed storage: AWS S3, HDFS; Cassandra, HBase, MongoDB, Hive, Neo4j
● Indexing engines: Elasticsearch, Solr, Splunk
● Data processing: data collecting; ETL, search and data aggregation; data mining, machine learning
● Visualization
● Interfaces, APIs
● Physical or virtual distributed environment (local cluster, AWS, OVH...)
Data processing flow
Skills needed
● Database Development
● ETL
● Data Warehouse
● Data Analysis
● Data visualization/Reporting
Technologies
● T-SQL
● PL/SQL
● C#
● Java
Reporting system development for the ad market
CLIENT
Software provider offering a range of advertising tools for campaign management,
workflow, campaign monitoring and reporting, and analytics.
SOLUTION DELIVERED
● ETL solution (C# modules and packages)
● Data Warehouse DBs
● Data Mart DBs
● OLAP cubes
● Reporting server with published reports
● Dashboards
IMPACT
● enables analysis and control of data from order systems and ad servers
● provides tools for monitoring the effectiveness of ad placements
BUSINESS NEEDS
● collect data from various structured/ unstructured sources
● filter, transform and aggregate data
● store data in the DWH
● build Data Mart based on DWH
● create dashboards and reports to visualize the data
Sources:
Salesforce
DFP
Smart
AdTech
AppNexus
Videoplaza
Adition
Freewheel
SmartX
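The business needs above (collect, filter/transform, store in a DWH, build a data mart, report) follow a classic ETL flow. The sketch below uses Python's built-in sqlite3 as a stand-in; the delivered project used C# modules, data warehouse databases and OLAP cubes, and the record fields, table names and view name here are hypothetical.

```python
import sqlite3

# Hypothetical raw records from two ad-serving sources; field names are illustrative only.
RAW_EVENTS = [
    {"source": "DFP", "campaign": "spring_sale", "impressions": "1200", "clicks": "36"},
    {"source": "AppNexus", "campaign": "spring_sale", "impressions": "800", "clicks": "20"},
    {"source": "DFP", "campaign": "brand", "impressions": "", "clicks": "5"},  # malformed row
    {"source": "AppNexus", "campaign": "brand", "impressions": "500", "clicks": "10"},
]


def transform(rows):
    """Filter out rows with missing values and cast string counters to integers."""
    for row in rows:
        if not row["impressions"] or not row["clicks"]:
            continue
        yield (row["source"], row["campaign"], int(row["impressions"]), int(row["clicks"]))


def build_warehouse(rows):
    """Load cleaned rows into a fact table (DWH layer) and expose a per-campaign data mart view."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE fact_ad_events "
        "(source TEXT, campaign TEXT, impressions INTEGER, clicks INTEGER)"
    )
    con.executemany("INSERT INTO fact_ad_events VALUES (?, ?, ?, ?)", transform(rows))
    con.execute(
        """CREATE VIEW mart_campaign AS
           SELECT campaign,
                  SUM(impressions) AS impressions,
                  SUM(clicks) AS clicks,
                  ROUND(100.0 * SUM(clicks) / SUM(impressions), 2) AS ctr_pct
           FROM fact_ad_events
           GROUP BY campaign"""
    )
    return con


con = build_warehouse(RAW_EVENTS)
for row in con.execute("SELECT * FROM mart_campaign ORDER BY campaign"):
    print(row)  # ('brand', 500, 10, 2.0) then ('spring_sale', 2000, 56, 2.8)
```

Keeping the mart as a view over the fact table means reports always reflect the latest load; a production warehouse would more likely materialize the mart on a schedule.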
Investment portfolio monitoring
Challenge
To develop a data collection and analysis platform and automate the sourcing process for B2B leads (discovery, research, and follow-on analysis).
Solution
Data platform, “Ranking” algorithm, Data Access Application
Results
● The platform gives the VP instant access to consistent, validated data within days, and in some cases within hours, enabling them to quickly spot opportunities, quantify potential and follow changes.
● The ML algorithm automates the research process and lets analysts look only at high-potential opportunities.
● The Chrome extension helps analysts stay focused, increases efficiency and saves up to 1 hour per day.
● The analytical platform identifies significant changes at every company and brings them to the analyst's attention.
● Monitors 6 million companies simultaneously.
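The slide does not say how "significant changes" are detected. One minimal way to sketch the idea, assuming each company is tracked by a numeric time series (weekly headcount is a hypothetical example here), is a z-score rule against the company's own history; the production ranking algorithm is not described and may differ.

```python
from statistics import mean, stdev


def significant_changes(history, threshold=3.0):
    """Flag companies whose latest value lies more than `threshold` standard
    deviations from the mean of their own earlier values (a z-score rule)."""
    flagged = {}
    for company, values in history.items():
        past, latest = values[:-1], values[-1]
        if len(past) < 2:
            continue  # stdev needs at least two points
        sd = stdev(past)
        if sd == 0:
            continue  # flat history: z-score undefined
        z = (latest - mean(past)) / sd
        if abs(z) > threshold:
            flagged[company] = round(z, 2)
    return flagged


# Hypothetical weekly headcount series per company; the last value is the newest.
history = {
    "acme": [100, 102, 98, 101, 99, 140],
    "globex": [50, 52, 49, 51, 50, 51],
}
print(significant_changes(history))  # {'acme': 25.3}
```

Because each company is scored independently, the rule parallelizes trivially across millions of series.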
Data preparation for Scientists
Challenge
At the time of the project, Axonix had more than 2.5 PB of raw real-time bidding (RTB) logs containing anonymised information about user behaviour. To minimize processing costs, the proposed data modelling pipeline required an incremental approach: models are updated every week without rerunning the pipeline on historical data.
Solution
A data processing pipeline efficiently aggregates RTB records and applies an anomaly detection algorithm; the modelling pipeline provides model tuning, feature selection, model evaluation and reporting capabilities.
Result
Axonix achieved very significant savings by replacing its previous solution with this in-house system designed by Ciklum. The payback time for the project investment was measured in months rather than years.
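The key requirement here is the incremental update: each week only the new records are processed and merged into stored aggregates. A minimal sketch of that idea keeps mergeable per-key statistics (count, sum, sum of squares); the keys, values and choice of statistics are illustrative assumptions, not the Axonix pipeline.

```python
from collections import defaultdict


class RunningStats:
    """Mergeable count / sum / sum-of-squares for one key (e.g. an ad placement)."""

    __slots__ = ("n", "total", "total_sq")

    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        # Population variance recovered from the stored moments.
        return self.total_sq / self.n - self.mean ** 2


def update(state, weekly_batch):
    """Fold one week of (key, value) records into the persisted state in place."""
    for key, value in weekly_batch:
        state[key].add(value)
    return state


state = defaultdict(RunningStats)
week1 = [("site_a", 0.5), ("site_b", 1.0), ("site_a", 1.5)]
week2 = [("site_a", 2.0), ("site_b", 3.0)]
update(state, week1)
update(state, week2)  # only the new week is processed; week1 is not re-read
print(state["site_a"].n, state["site_b"].mean)  # 3 2.0
```

At very large volumes a numerically stabler merge (for example a Welford-style or pairwise variance update) would be preferable to raw sums of squares.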
Data consulting / processing automation
1,000 employees × 100 invoices per person × 20 days = 2M invoices per month
2M invoices × 2-20 min per document ≈ 0.3M hours per month
2M invoices × 50 parameters = 100M parameters per month
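The monthly totals on this slide follow from simple multiplication. The sketch below assumes the 100 invoices are per person per working day (which is what makes the 2M total work out) and shows that ~0.3M hours corresponds to an average of about 9 minutes per document, within the stated 2-20 minute range.

```python
EMPLOYEES = 1_000
INVOICES_PER_PERSON_PER_DAY = 100  # assumption: "per person" means per working day
WORKING_DAYS = 20
PARAMS_PER_INVOICE = 50
MIN_PER_DOC = (2, 20)  # handling-time range from the slide, in minutes

invoices = EMPLOYEES * INVOICES_PER_PERSON_PER_DAY * WORKING_DAYS
params = invoices * PARAMS_PER_INVOICE
hours_low, hours_high = (invoices * m / 60 for m in MIN_PER_DOC)
# Average handling time implied by the slide's ~0.3M hours/month figure.
implied_minutes = 300_000 * 60 / invoices

print(f"{invoices:,} invoices/month")  # 2,000,000 invoices/month
print(f"{params:,} parameters/month")  # 100,000,000 parameters/month
print(f"{hours_low:,.0f}-{hours_high:,.0f} hours/month")  # 66,667-666,667 hours/month
print(f"implied average: {implied_minutes} min/document")  # implied average: 9.0 min/document
```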
Multi-site EEG data analysis for treatment control
Challenge
Develop an algorithm that measures brain and heart activity and defines the parameters of personalized repetitive transcranial magnetic stimulation in many affiliate clinics with a centralized control center.
Solution
A system for EEG and heart rate analysis, integrated with recording and stimulating devices, the hospital’s internal infrastructure and the Electronic Health Records infrastructure.
Results
- Reduced the processing time for one patient from 10 minutes to 1 minute (10× boost)
- Enabled the clinic to scale up its practice to 15 more locations
Advanced retail analytics: customer behaviour prediction
Customer
US multi-channel apparel retailer
Results
• Identified the most probable 1st and 2nd purchase sequence (CTR +3.4%)
• Clustered clients into segments, identified churn reasons and produced recommendations for audience targeting (2nd purchase probability +4.3%)
• Identified future most valuable customers at an early stage of their history
• Optimized online advertising spend
Demographics Prediction
Customer
A company operating in the video analytics sector. Video creators were highly interested in the distribution of users who watch their content, and the company also wanted to use this data for more precise, intelligent ad targeting.
Solution
Two solution methods were proposed, each with its own strengths and disadvantages but with similar accuracy:
1. Supervised learning on the available dataset: a black-box tree-based model that required extensive hyperparameter tuning but gave slightly higher accuracy.
2. Look-alike modeling, which in a nutshell uses user similarity: a user is assigned to an age group with a probability that depends on the relative distance to the users of each age group. It is quick to train, and user clusters can be saved locally.
Results
7 different models achieved ROC AUC scores of 0.85-0.94. The solution makes it possible to predict age and gender for 60 million users.
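The look-alike idea can be sketched as a distance-to-probability mapping. This assumes Euclidean distance to each age group's centroid and a softmax over negative distances; the features, group labels and temperature are hypothetical, and the project's actual distance and weighting are not stated on the slide.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dims)]


def lookalike_probs(user, groups, temperature=1.0):
    """Assign `user` a probability per age group from its distance to each group's
    centroid; closer groups get higher probability (softmax over negative distances)."""
    dists = {g: math.dist(user, centroid(members)) for g, members in groups.items()}
    scores = {g: math.exp(-d / temperature) for g, d in dists.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


# Hypothetical 2-D behaviour features for users with known age groups.
groups = {
    "18-24": [[0.9, 0.1], [0.8, 0.2]],
    "25-34": [[0.5, 0.5], [0.6, 0.4]],
    "35+": [[0.1, 0.9], [0.2, 0.8]],
}
probs = lookalike_probs([0.85, 0.15], groups)
print(max(probs, key=probs.get))  # 18-24
```

Only the group centroids need to be stored to score new users, which is consistent with the slide's point that the model is quick to train and can be kept locally.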
Credit default risk prediction
Reduced cost to decision
CHALLENGE:
Targeting the underbanked can often increase default risk. Deciding on default risk often requires expensive external data and judgements about individuals with incomplete data. Serving this customer base requires accurate decisions that don’t incur large costs.
SOLUTION:
Using data such as telephone records, transactional information and other behavioural data sets, a new classification model was developed to give instant credit decisions, improving accuracy while reducing the cost to assess a customer.
BUSINESS VALUE:
• Every 1% decrease in the default rate improved the profitability of the loan portfolio. As the cost to decision falls, reach into the long tail of finance consumers increases.
What is scientific research?
Scientific research is the fundamental basis for testing any revolutionary business idea.
Zhang, W., Yu, Q., Siddiquie, B., Divakaran, A., & Sawhney,
H. (2015). "Snap-n-Eat": food recognition and nutrition
estimation on a smartphone. Journal of Diabetes Science and
Technology, 9(3), 525-533. doi: 10.1177/1932296815582222
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224860/
Deep Learning today
Healthcare, Finance, Robotics, Telecom, Travel, Automotive
Salmon parasite detection
CHALLENGE:
There are around 4,500 salmon farms in Norway. Every week each farm must submit an ecology report to the government or pay a high penalty. Sometimes this is not possible because of the weather and biologists' limited access to the cages.
SOLUTION:
• Underwater capsule with 2 cameras for 24/7 monitoring, capturing 6K-resolution images.
• Image processing algorithm for fish detection and early-stage parasite detection.
• Infrastructure for centralized processing and analytics.
BUSINESS VALUE:
• 10% cost reduction for report generation thanks to optimised data collection.
• Up to 25% reduction in pesticide use thanks to early-stage parasite detection.
Fruit detection
CHALLENGE:
The client needed to quantify the volume of ripe fruit in the garden from automatically collected images, and required a scalable solution showing the applicability of a DL-inspired approach: accurately locate fruit in images taken at different scales, perspectives, angles, etc., and provide a robust, scalable solution.
SOLUTION:
• Developed an object detection algorithm based on the Faster R-CNN architecture.
• Built an AI system that identifies fruit in an image with nearly human-level performance.
BUSINESS VALUE:
• More accurate counting: 83% versus 75% (human performance).
• An AI framework able to automate counting on 1.6 million acres of orange fields.
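A detector such as Faster R-CNN outputs candidate boxes with confidence scores; turning them into a fruit count typically involves a score threshold and non-maximum suppression (NMS) so that overlapping boxes on the same fruit are counted once. This is a minimal pure-Python sketch of that post-processing step, assuming (x1, y1, x2, y2) boxes and illustrative thresholds; it is not the project's code, the detector itself is omitted, and many Faster R-CNN implementations already apply NMS internally.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_fruit(detections, score_thr=0.5, iou_thr=0.5):
    """Count objects: drop low-confidence boxes, then apply greedy NMS."""
    confident = [d for d in detections if d[1] >= score_thr]
    confident.sort(key=lambda d: d[1], reverse=True)  # highest score first
    kept = []
    for box, _score in confident:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return len(kept)


# Hypothetical detector output: ((x1, y1, x2, y2), score)
detections = [
    ((10, 10, 50, 50), 0.95),
    ((12, 12, 52, 52), 0.90),  # near-duplicate of the first box
    ((100, 100, 140, 140), 0.80),
    ((200, 200, 240, 240), 0.30),  # below the score threshold
]
print(count_fruit(detections))  # 2
```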
Car claim cost evaluation
CHALLENGE:
The client needed to evaluate car repair costs automatically using a DL-inspired approach and eliminate manual evaluation errors. Each claim contained images with different types of damage, perspectives and scales, varying amounts of dirt, sun glare, etc.
SOLUTION:
The developed system classifies damaged car parts, segments the damage in the images and automatically estimates the cost of a car claim.
BUSINESS VALUE:
Up to 30% reduction in the costs of sending a claims adjuster to car accidents with insignificant damage.
Online ID validation
CHALLENGE:
Validate whether an ID submitted by a user is an original photo or scan, with no Photoshop edits or other modifications.
SOLUTION:
Optimized convolutional neural network architectures for image classification.
BUSINESS VALUE:
• Automated processing of 35,000 photos per month.
• 90-95% accuracy in identifying real vs. fake IDs.
Photo processing automation
CHALLENGE:
Develop an algorithm that automatically generates masks for people in images. The algorithm has to predict whether each pixel of the picture belongs to the human class or to the background.
SOLUTION:
• Fully convolutional deep neural network with a weighted and penalized loss
BUSINESS VALUE:
• Approx. 2,500 CVs/month processed by the algorithm
• Saves around 1,000 person-hours/month
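A weighted loss for per-pixel human/background masks can be sketched as class-weighted binary cross-entropy, where human pixels (label 1) get a larger weight to counter class imbalance. The weight value and list-based masks are illustrative assumptions, and the slide's additional "penalized" term is not specified, so it is omitted here.

```python
import math


def weighted_bce(pred, target, pos_weight=2.0, eps=1e-7):
    """Mean pixel-wise binary cross-entropy with human pixels (target 1) up-weighted.

    pred, target: 2-D lists of the same shape; pred holds probabilities in [0, 1].
    """
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            w = pos_weight if t == 1 else 1.0
            total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n


target = [[0, 1], [1, 0]]
good = [[0.1, 0.9], [0.8, 0.2]]
bad = [[0.6, 0.4], [0.3, 0.7]]
print(weighted_bce(good, target) < weighted_bce(bad, target))  # True
```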
Case-NN Recruitment ChatBot
CHALLENGE:
Communication between the employer and candidates needed to change. Job seekers need to stay engaged, while a large part of the qualifying and scheduling routine should be automated. ~80% of candidates are comfortable interacting with chatbots.
SOLUTION:
NLP-powered chatbot assistant for job seekers that:
• collects information from candidates and proposes vacancies
• asks about candidates’ skills, knowledge and experience
• ranks candidates
• answers FAQs about the job and the recruitment process
• schedules an interview with a human recruiter
BUSINESS VALUE:
• Simultaneous real-time interaction with hundreds of candidates
• No ignored applications
• Up to 80% of routine recruitment work automated
• $3,500 saved per recruited candidate
Case-XX Customer Support ChatBot
CHALLENGE:
Today’s consumers are active online 24 hours a day, seven days a week. Customers want support when they need it, but around-the-clock online use means high staffing overhead for call centres. Messengers are a more suitable option.
SOLUTION:
Developed and released a chatbot for telecom tech support. The chatbot integrates NLP to understand customer requests and provide relevant help, with multi-language support.
BUSINESS VALUE:
• Streamlined user experience with 24/7 support
• 50% of call-centre enquiries (~600 calls) covered by the chatbot, saving $2,400/day
• Approx. $864,000/year saved
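The savings figures can be cross-checked with simple arithmetic. The daily figures imply about $4 saved per call handled by the chatbot, and the annual figure matches 360 days of operation (an assumption inferred from the numbers; a 365-day year would give $876,000).

```python
calls_per_day = 600      # "~600 calls" handled by the chatbot (50% of enquiries)
saved_per_day = 2_400    # USD per day, from the slide
days_per_year = 360      # assumption that reproduces the slide's annual figure

cost_per_call = saved_per_day / calls_per_day
annual_saving = saved_per_day * days_per_year
print(cost_per_call)   # 4.0
print(annual_saving)   # 864000
```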
Bus schedule chatbot
BusBot is available via Facebook.
CHALLENGE:
A large business center provides shuttle buses to multiple locations. The schedule may change due to traffic conditions, weather, the day of the week and operator issues. Tenants need instant online access to the current schedule.
SOLUTION:
A chatbot for the business center that uses NLP to provide the nearest bus to a location, pictures of bus stops, the schedule, traffic alerts, etc., integrated with scheduling and GPS services.
BUSINESS VALUE:
• Streamlined user experience with 24/7 support
• 98% of users are satisfied
• Lower shuttle bus scheduling support costs
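The core lookup behind a question such as "when is the next bus from here?" can be sketched as a search over the current timetable. The stop names, times and dictionary structure are hypothetical, and the NLP intent parsing and GPS/scheduling integrations are out of scope for this sketch.

```python
from datetime import datetime, time

# Hypothetical timetable: stop name -> departure times (updated for traffic, weather, etc.).
SCHEDULE = {
    "Main entrance": [time(8, 0), time(8, 30), time(9, 15), time(18, 0)],
    "Metro station": [time(7, 45), time(8, 45), time(17, 30)],
}


def next_departure(stop, now):
    """Return the first departure at or after `now` for `stop`, or None if none remain today."""
    upcoming = [t for t in sorted(SCHEDULE.get(stop, [])) if t >= now.time()]
    return upcoming[0] if upcoming else None


print(next_departure("Main entrance", datetime(2024, 5, 6, 8, 10)))  # 08:30:00
```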
Case-4 Virtual assistant in Retail
CHALLENGE:
The client asked for an innovation project to find a way to leverage conversational commerce in retail.
SOLUTION:
A smart IoT shell with a directional microphone and a directional speaker covering a selected zone. We designed a smart assistant that converts speech to text and advises customers on their needs. An additional camera helps identify products and find similar ones in a database.
BUSINESS VALUE:
• Client satisfaction in the pilot shop increased by 7%
• Enables a buying experience for people with disabilities: blind people can be guided by sound, with their movements tracked by a camera
• First step in a strategy to compete with Amazon
DEEP LEARNING
AS A SERVICE
We work with:
Process Automation • Risk Analysis • Assistants & Bots • Security
• Credit / loans
• Insurance
• Yield / harvesting
• Health diagnosis
• Visual object tracking
• Document processing
• Invoice validation
• Fraud detection
• Video surveillance
• GDPR compliance
• 24/7 support bots
• Conversational commerce
• New sales channels
• New search experience
How we deliver deep learning solutions
Identification of the need → Data collection → Data preparation → Train models → Deploy → Improve, grow, scale → Support
It’s a unique approach that allows us to help you identify business needs and apply a modern scientific approach to designing a digital transformation roadmap for you.
As an NVIDIA preferred deep learning partner, we have a library of the best state-of-the-art DNN architectures and frameworks that allows us to start model training as soon as we get data.
Our partnerships with AWS, Microsoft, IBM and Google Cloud Platform allow us to easily set up infrastructure and collect all your data in the right DWH.
24/7 support service
What is your first step with us?
Ciklum will offer an architecture design and project roadmap based on our initial research.
Set up a team:
● Data Engineer
● Data Scientist
● Project Manager
Create all needed infrastructure.
Business Analysts & Data Scientists will audit your existing solution.
We integrate with your data sources and establish continuous data collection.
With all data consolidated, we do labeling or data cleaning.
An iterative process of model training and scientific research will result in a solution with the expected accuracy.
The ready model will be integrated into your product or made accessible over an API.
Post-project support will help to collect feedback, plan the next iteration of innovation and measure business impact.
Initiation
INITIATION PHASE IS REQUIRED TO UNDERSTAND:
• from the Client side:
- Expectations on the role of the Deep Learning solution in business transformation
- Efforts, risks and benefits of implementing the Deep Learning solution
• from the Ciklum side:
- Data review (validity and applicability to solve the business problem)
- Opportunities in the project implementation plan
DELIVERABLES:
1. Understanding of solution implementation using Deep Learning
2. Solution architecture vision
3. Infrastructure recommendations
4. Implementation recommendations
5. Data recommendations
DURATION: 3-5 days on/off site
Consulting in Deep Learning
From the Client side:
- Expectations on the role of the Deep Learning solution in business transformation
- Disclosure of the available data and infrastructure, and requirements (security, throughput, etc.)
- Availability of the Client’s core team for discussion and consultations
From the Ciklum side:
- Data review (validity and applicability to solve the business problem)
- Opportunities in project implementation
- Possible risks and mitigation plan
- Expectation management
DELIVERABLES:
1. Statement of Work
- Business objectives
- High-level and detailed requirements
2. Solution Architecture
- Architecture and infrastructure diagrams documented with a Service Level Agreement (if needed)
- Hardware recommendations
3. Solution implementation roadmap
- Review of scientific papers on possible solutions (if required)
- Project roadmap
- Team composition
4. Dataset report (volume, balance, labeling) and recommendations
- Recommendations on data labeling, data collection and the approximate volume of datasets needed
DURATION: 1-4 weeks
PoC, Exploration & Estimation
EXPLORATION & ESTIMATION PHASE:
• from the Client side:
- Business needs and expectations
- Available data
• from the Ciklum side:
- Effort estimate (team, infrastructure, duration)
DELIVERABLES:
1. First results of the Deep Learning models implementation
2. Report and recommendations based on the data exploration
3. Project plan and budget estimate
DURATION: 2-4 weeks
Deep Learning in production
• from the Client side:
- Business needs and expectations
- Available data
• from the Ciklum side:
- Effort estimate
- Team:
• Deep Learning Research Engineers
• Data Engineers, DevOps
• Software Engineers
• Design
• QA, etc.
DELIVERABLES:
1. Deep Learning model deployed
2. Model performance report
3. Integration/support
4. Further development plan
DURATION: 2-6 months
21 million people get to their holidays with Thomas Cook, using a web platform developed by Ciklum.
19 million people enjoy restaurant meals with Just Eat’s service, maintained by Ciklum.
45,000 newborns in Uganda receive a chance to survive with the Neopenda neonatal monitor developed by the Ciklum R&D Team.
200,000 daily Flixbus connections rely on a Ciklum DB solution to transfer passengers to over 1,200 destinations.
4 million customers use Payoneer services relying on a payment platform supported by Ciklum.
168 million active eBay users benefit from DFS tech platforms supported by Ciklum.
620,000 active Betsson players in Q3 2017 had a chance to win with web and mobile platforms supported by Ciklum.
€1.7 Bn of L’Oreal e-commerce sales achieved with the Kantar Retail VR solution developed by Ciklum.
14 million Tele2 clients rely on Ciklum e-commerce solutions and mobile applications.
50,000 Jabra users can hear better with the Resound Relief hearing aids app supported by Ciklum.
156 million people in TimeOut’s global monthly audience rely on Ciklum while discovering the world.
10,000 MasterCard employees enjoy the WalkMe Digital Adoption Platform supported by Ciklum.