UFSM
SACT 01.04.2019
❖ Head of R&D, Dafiti
❖ 17+ years IT stuff
❖ 1981 - 2011 in Germany
❖ 2011 Rocket Internet (locondo, lamoda, dafiti)
❖ Since 2011 in Brazil
➢ and in some way or another w/ Dafiti
❖ married, 2 sons
❖ Skype: georg.buske
❖ Drop me an email: georg.buske@dafiti.com.br
whoami; Georg Buske
Lots of stuff, won’t stop at each slide for long
Should be interactive - please ask anything during the presentation
Disclaimer
Industry view
Get to know Dafiti
Lots of examples and showcases, about successes and failures
Answer your questions
Today’s Objective
About us / History
• Founded in 2011;
• Offices in 4 countries;
• 2,900 employees;
• 5 warehouses in LATAM;
• 50MM monthly users;
• > R$ 1.4 bi gross revenue
• Belongs to Global Fashion Group since 2014.
• Today ~120 people in IT (Brazil)
• R&D area created in 01/2018
• DFTech: Dafiti’s tech brand
Our history
Dafiti & GFG
- Global Fashion Group (GFG)
- founded in 2014
- HQ London / Singapore / tech hub Vietnam
- operates in 27 countries
- joint initiatives
IT Organizational timeline
● 2010 - 2011
○ project CTO / Rocket Internet
○ local dev team
○ Berlin dev team
● 2011 - 2012 (incl. first jira generation)
○ IT support
○ project teams
○ sprint team (each 1 Manager + coordinators)
○ backoffice team
○ infrastructure team
○ dedicated QA
○ outsourced developers
● 2012 - 2013 (incl. new jira generation)
○ as before with architecture team
● 2013 - 2014
○ as before with module owners inside sprint and project teams (technical ownership)
○ NOC
● 2014 - 2015 (incl. new jira generation)
○ agile cells and committee of technical leaders (with POs and SMs) instead of project and sprint
○ lots of SAP consultants, backoffice team is now more part of global IT
IT Organizational timeline
● 2015 - 2016
○ renamed architecture team to labs
○ dedicated UX / frontend team
○ removed NOC
○ removed outsourced developers
● 2016 - 2017 (incl. new jira generation)
○ squads and [explicit] cross functional teams instead of agile cells
○ PO = PM (product manager)
○ removed scrum master
○ renamed labs to devtools
○ added maintenance team
● 2017 - 2019++
○ PMs, POs, squads, pillars, SREs, prioritization committee and product funnel, R&D (from 2018)
Learning 1: Whatever the next approach will be, we should fix the structure last (processes first).
Learning 2: The approaches since 2014 are not that different from each other and could have worked with the right focus and methodologies. Thus another approach might
“Culture eats strategy for breakfast”
-- Peter Drucker
Organizational Design and communication
structures
Conway’s Law
Organizations which design systems are constrained to produce designs which are copies of the
communication structures of these organizations.
Brooks’s Law
Adding human resources to a late software project makes it later
Jeff’s 2 Pizza rule
If a team couldn't be fed with two pizzas, it was too big
Project timeline
2011 Relaunch (Magento -> Alice and Bob)
2012 - 2013 lots of minor projects
2014 SAP
2015 Marketplace
2016 TriKan Integration (also deployment change because of audit problem)
...
Learning: There will never be the right time for a [technical and/or cultural] shift - and there will always be #blackfriday on the last Friday in November!
Legacy systems
“Today’s implementation is tomorrow’s legacy”
Dafiti’s system is an 8-year-old set of monolithic applications
THE CONCEPT
#DfTechJourney - Concept
● Core Customer
● 21st century mindset (VUCA)
● Technology Heavy User
● Contributors Culture – Empowerment
● Space for experimentation
● Execution speed & Fail Fast (MVP’s)
● BT - Business Technology
● Never leave Day 1: agile “start-up” mindset (Jeff Bezos, Amazon)
What are the main points of Exponential structure & BT?
OUR GOAL
Transform Dafiti's E-commerce to an
Exponential Platform for our Customers
Improving the user experience, applying the best technologies,
through a learning culture and continuous improvement.
OUR MANTRAS
THE JOURNEY
Wave 1 - Rice and Beans
6 months
Wave 2 - The place to be
1 Year
Wave 3 - F*cking Awesome
6 months
THEMES
Infrastructure
Corporate IT & DC
Information Security
People & Culture
Products
SRE
Governance
Innovation & Intelligence
Tech Stack
D&A
Platform
Backoffice
It was a busy year
People
HIERARCHY CHART
CTO: Cristiano Hyppolito
● Head of Eng LATAM: Rafael Morelo
● Head of R&D LATAM: Georg Buske
● Manager of Eng LATAM: Pablo Maronna
● Head of Gov LATAM: Leandro Lemes
● Head of InfoSec LATAM: Luis Gonçalvez
● Head of BackOffice LATAM: Adriana Ramos
● Head of Infra LATAM: Fabio Jacometto
● Coord Helpdesk CH & CO: TBD
● Head of AGILE: TBD
Country labels on the chart: Argentina, Chile, Colombia
Org structure: Classical Organigram, but in practice super flat
During 2019: 300 Astronauts in Brazil + Argentina + Chile + Colômbia
#Dafiti
Our purpose is to revolutionize the fashion
ecosystem with intelligence.
Our principles:
- we put the customer at the center of everything
- we never stop learning
- we act with intelligence
- we build the best teams
- we trust and support each other
- we work together for the common good
Lots of achievements:
Our purpose, our journey, our blackfriday!
4 x orders of a normal day
324 orders / minute
● Lots of new colleagues (third parties and full time hires)
● company wide agile rollout
● Ghostbusters (internal hackathon)
● intercontinental teams (AR + BR)
● lots of fun and beer (in fact, at least every Friday - cheers)
● consulting for agile, platform and more
● new platform to come
● new dashboards via live
and many more...
#DFTechJourney
R&D and Innovation Recap & Outlook
Training for all
Safari as learning platform
Other departments want training on technical topics:
Python, SQL, HTML, Big Data, Angular/React, database architecture/ETL,
R (programming)
DFTAcademy rollout
Trying new ways for talent
acquisition in tech: hackerX
and stackoverflow talent
Workshops & Guilds
● Machine Learning 101: regression (home prices prediction)
○ https://docs.google.com/presentation/d/1JAg382c9LMrdUm1lSvOfiTGTpWj9iEDKU1Saz9NAEPk
● Machine Learning 101: Image Understanding (Fashion-MNIST)
○ https://docs.google.com/presentation/d/122Pl6ej1x4JZVI1aN-Lawb6LlQ7gEOKT3C5x11L0EkA
● Machine Learning 101: Natural Language Processing (Rating and Reviews)
○ https://docs.google.com/presentation/d/1mC01GXDTByoRNtrPUdpxe1rWqlZ9u5EaxbJsM9Yl0rw
● Machine Learning 101: clustering (Dafiti brands)
○ https://drive.google.com/drive/folders/1XeHMBgh2Lx9LwJpX6Hunb2I0RgX5WgdQ?ogsrc=32
● Machine Learning 101: Recommendation engine (Dafiti products)
○ https://drive.google.com/drive/folders/1hgf4NzOEE0ExRb0EFQ7XUpT8MqFivrfM?ogsrc=32
● Python 101:
○ https://drive.google.com/drive/folders/1OHbNu8DBh3WecpY3jmJVQdd_tpyyaACs
Internal workshops
Workshops delivered through DFT
Academy and HR support 11/2018
and 12/2018
ML Workshops
Machine learning guild’s main
objective in 2018:
● create internal workshops
objective 2019:
● papers we love / journal club
ML Guild
● 25.07.2018 - definition and goals
● 29.08.2018 - DWH training Redshift
https://docs.google.com/presentation/d/1muxuxnlBgG0GAF9RP9vFtYcNfEw5JWCydqaoEYT8VUY
● 19.09.2018 - Data catalog and internal system Hulk
https://docs.google.com/presentation/d/1CscU8TcI-2YsJGJCJxiewEXS9o1qxCykOzJZ95SZd4w/edit?ts=5ba293fb&pli=1#slide=id.g3f4ca1ae3c_1_0
● 10.10.2018 - internal system Nick
● 02.01.2019 - data security (TBD)
Summaries: https://docs.google.com/document/d/1d9Edegl2iiLlH4Qa7PkROwb_5FyYd-e3GaYAVQpufgU/edit#
Data Guild
Events
● Agile trends (H1)
● Sponsoring papis.io (H1)
● Hosting pydata meetup
● Semacomp
● II Congresso Latino-Americano de IA
● Mediaeval
● Hosting deep learning meetup
papis.io sponsoring
papis.io is a major conference
about machine learning
Follow: https://twitter.com/dafiti_tech or https://www.linkedin.com/company/dafiti/
Tweet: I <3 ML #papis #dafiti #ufsm
to take part in the raffle for a papis LATAM 2019 ticket
Hosting pyData meetup
II Congresso IA LatAm
mediaEval, France
And more
Agile trends 04/2018, Semacomp 10/2018, deep
learning meetup in 12/2018 (TBD) o/
Methodology
OKRs (a.k.a. objectives and
key results)
Shared OKRs
1
12
5.001
● company wide
● guarantees alignment and focus
● Strategic Objectives valid for 1 year
● KRs reviewed every 3 months
● regular team check-ins
● confidence index
Still learning - MVP
● Physical Kanban board with
backlog
● Started with sprints and Jira board
○ continue to improve in 2019
with participation by agile
masters
○ timebox: at least weeks
○ caution: extensive planning
● 2 - 2 - 2
○ 2 days kick - off
○ 2 weeks demo
○ 2 months verifiable user
facing prototype
Methodology
CRISP-DM
Various breakdowns on Kanban board
[Figure: Kanban board with 16 initiative cards]
Cone of Uncertainty
[Figure: cone of uncertainty. Axis labels: Predictability, Uncertainty. Estimate multipliers: 4x, 2x, 1.25x, 0.8x, 0.5x, 0.25x]
At this stage the team will be able to give a delivery forecast based on its history.
Questions: When? What? How? For what? Which? Why?
Stakeholders: C-Level, Product Manager, Engineering Manager, Product Owner
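The multipliers on the cone can be turned into estimate ranges with simple arithmetic. A minimal sketch, assuming the six values on the slide pair up symmetrically (0.25x with 4x, 0.5x with 2x, 0.8x with 1.25x); the function name and the 10-week example are illustrative only.

```python
# Multiplier pairs (low, high) read off the slide, widest band first.
# The pairing is an assumption; the slide only lists the six values.
CONE_BANDS = [(0.25, 4.0), (0.5, 2.0), (0.8, 1.25)]


def estimate_ranges(nominal):
    """Return the (optimistic, pessimistic) range for each band of the cone."""
    return [(nominal * low, nominal * high) for low, high in CONE_BANDS]


if __name__ == "__main__":
    # A nominal estimate of 10 weeks, narrowing as the team learns more.
    for low, high in estimate_ranges(10):
        print(f"{low:.1f} to {high:.1f} weeks")
```

The point of the picture: the same nominal estimate carries a much wider range early on, and only delivery history narrows it.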
Product development workflow
[Figure: engineering organization by pillar. Pillars: Discovery; Payment & Order + MKTplace; Post Sales. Each pillar has a Platform Team, a Feature Team and an SRE Team. Labels per pillar, in slide order: Platform 01/01, Feature 01/01, Infra 00/01; Platform 01/01, Feature 01/01, Infra 01/01; Platform 01/01, Feature 01/01, Infra 00/01]
Product team split (pillars & squads)
- each pillar has a PM, an engineering manager and various squads responsible for specific features, consisting of engineers and product owners
- and supported by (cross): Agile coaches, UX, Data engineering, AI and infrastructure
Architecture 2019
A macro view of the technologies we will use...
Dafiti Maturity Model
...AND DATA FUELED
by our awesome D&A team :)
Data Lake DWH
Reporting
Sharing
Data
Load/ Export
Orchestration
Data Quality
Scheduling
Monitoring
ETL
Data Streaming
Datamarts
Feeds
Named Queries
Data Security
D&A
Tech Services
accengage
adjust
admotion
appannie
b2b
b2w
bingads
bob
campaign
carmen
criteo
cubiscan
dynad
exacttarget
exchange
external
fabric
facebook
financial
fit
freight
google
gotcha
Dafiti Data Lake
homer
ino
internal
itunes
king
madruga
marketing
markovian
netsuite
osticket
parallel
price
reception
responsys
sap
seller
solr
supplier
taboola
tms
wms
yahoo
zanox
zendesk
> 50 different sources
> 160 database schemas
8 TB distributed in 800k ORC / Parquet files
7.5 TB in 6k tables
http://172.18.10.70:8080/nick/home
Huge files
When the files aren't so big and we need to apply filters
For more demanded data
D&A
Data Architecture
D&A Governance
D&A
Data Sharing Map
Transactional Systems / External
Sources
BI Tools
Operational Reports
(based on 1 system)
Data Feeds / Data Interfaces
Operational Reports - “Heavy”, “Hard to run” reports (based on 2 or more systems)
Tactical Reports / Dashboards
External platforms
Historical Data
Data Mining / AI
GFG BI
Global Pricing
Live
Visenze
Marketing Apps
D&A
Data Hub
Pricing
Commercial Planning
Supply Chain
Logistics
Transportation
Data Mining / AI
Other Platforms
GFG BI
Global Pricing
Google / Facebook
Financial Processes
Customer Service
Now R&D and Innovation…
Executive Summary
D&A / DWH
R&D / D&A
Ops / D&A
R&D / Eng.
Team
R&D Team
Will Marcio
Ricardo
hiring
Georg
Drop us an email: research-and-development@dafiti.com.br
Rafael Albert
Partnerships & Startups
● Visual conception
○ Visenze
○ Streamoid
○ Flashwall
○ Markable.ai
○ Syte
○ Flixstock
Third party product integration (PoCs 2018)
Understand needs, create assessment framework, search more
possible third parties (benchmark or integration)
use the startup ecosystem to create value for Dafiti!
there are many startups pushing into chatbots
and fashion (image similarity and catalog
enrichment) but nobody is trying the hard stuff
as a product (e.g. marketing budget allocation)
;-)
● academic research group
● current status: paperwork
● works on AR and image
understanding
● Internships @ Dafiti
○ Still working on the contractual part but we are making this
happen
○ feel free to send me an email if you have interest:
georg.buske@dafiti.com.br
Dafiti <3 UFSM
Innovation hubs
● Starting in 2019 create hubs in Brazil
● Work more closely with GFG (first calls with lamoda
R&D, get back to innovation topics within GFG)
● Until 2020 assess possibilities in China and USA
Use the startup ecosystem as multiplier - this is
what an exponential platform means...
If you take part in the UFSM incubator and/or are creating a startup
disrupting fashion and/or commerce, we want to hear from you :)
Vision &
key takeaways
R&D Vision
Purpose: Lead the revolution of fashion and shopping
with AI and technological innovation.
Mission: Give Dafiti the capacity to use state-of-the-art AI.
Innovation and research
needs alignment too!
Innovation and
Intelligence Committee
Innovate our fashion e-commerce and help transform it into THE fashion platform in LATAM, using innovative approaches such as machine learning and artificial intelligence in general. E.g.:
● Build algorithms that help us with anticipatory shipping and purchasing forecasts and protect us against system failure.
● Use image recognition to give our users the highest possible convenience and the coolest features.
● Use state-of-the-art game engines to build virtual reality into our customer experience.
● Optimize product search and build data consistency monitoring.
● Help build a large-scale architecture together with the entire IT team.
● Create a machine learning framework / standard stack and rules (e.g. SageMaker, CI/CD, multi-cloud, experiment tracking, etc.)
The outcome will be nothing less than transforming the way e-commerce works and providing sustainable solutions.
Mostly the team won’t work directly on user-facing products; it assesses ways to create impact and works together with other areas to make them happen.
Key takeaways
● Events and techbranding are important to attract talent (team goal achieved - open positions filled,
BIG THANKS TO OUR HRBPs <3)
○ OTOH we’ll reduce the number of indicators which make up the brand index [the old index is not
in these slides, please refer to the R&D strategy docs for more information]
● Techbranding and internal workshops not only help foster DFTech as a brand and teach our internal
workforce but also create insights and identify problems and opportunities
● The plan is to start 2019 with 100 % alignment and a mixed model of internal workforce, consulting
partnerships and third party providers
○ regular update and alignment meetings will be held in form of an intelligence & innovation
committee
○ using more rigid agile methods such as timeboxed sprints (incl. planning / review) to create
more visibility and better alignment on results
● we will invest more into our ML standards and stack (as already started)
1
Innovation and research needs alignment,
too!
=> Innovation and Intelligence Committee
Key takeaways
● To become a name in research we must invest more and thus will start with 20 % time for this and
will partner more with academic institutions
● We’ll reduce the number of area KPIs monitored to budget, people, PoCs realized, models launched
in production, innovations launched (ideation will be part of this metric), third parties assessed,
internal workshops given and techbrand initiatives (papers, articles, events, etc.) for now - KPI
review is not in this presentation (please see strategy docs for old area metrics list)
● Pricing optimization and marketing allocation projects didn’t bring the expected results yet
○ eventually we will invest into more research
○ also there are many startups pushing into chatbots and fashion (image similarity and
catalog enrichment) but nobody trying the hard stuff as a product ;-)
● Investment in search, recommendations (looks, emails, onsite), catalog enrichment and image
recognition might be the most important in 2019
2
Balance explore (PoCs) VS.
exploit (production)!
R&D Framework: How
Ideation (Dafiti)
HOW
● Area or Product wishlist
● ML guilds or 20 % research
● Design thinking workshops per area
OUTPUT
● Wishlist backlog (now: google docs, future: open innovation portal)
● ideas, hypotheses
Prioritization (Committee)
HOW
● Committee
OUTPUT
● prioritized shortlist
● team composition (third party, R&D, interdisciplinary team, etc.)
Detailing (R&D / areas)
HOW
● Workshop per area together with R&D (regular schedule)
OUTPUT
● ML Canvas (ML 101 workshop)
○ definition of success criteria and metrics
● Business canvas
● 6 pager
PoC (R&D, area, third party)
HOW
● Data curation
● Paper research
● Third party benchmark
● EDA
OUTPUT
● Baseline model
● insights
● validated hypothesis
● GO/NOGO
Implement
HOW
● Development
● Test (AB test)
● Refine until satisfied or aborted (validation with user)
OUTPUT
● Success -> deployment, ops
○ API
○ end-to-end
● Failure -> fail wall
Finalization (Committee)
HOW
● Retrospective (Committee)
● Operation (if success)
OUTPUT
● Lessons learned / Insights
Other labels on the slides: Identify collaboration / work type; KR: 6; KR: 2; Agile; squads; TBD; R&D, area, third party
R&D Framework
RULES
● Two weeks ahead of committee meetings, the requirements of possible projects and their
definitions (success metrics, ML canvas) need to be done / aligned with R&D
● no ideation during committee, only backlog discussion (exception: today)
● the area / product person responsible [optionally together with R&D] will present the detailed
ideation item to the committee
How
R&D Framework
COLLABORATION / WORK TYPES
● PoC internal: The PoC execution is fully owned by R&D.
● Implementation: The implementation of the deployable live product, which is fully owned
by R&D (either end-to-end or as API).
● Coach: The implementation or PoC execution is owned by the area and an R&D member
supports the initiative as a coach.
● Workshop (not part of framework): ML concepts are taught to DFT employees (rather than a
concrete business problem being solved), either as a TechTalk-like workshop through
DFTAcademy or as deeper classroom training (certification).
How
Tools
Design Thinking workshops
[Figure: design thinking diagram. Modes: convergent, divergent. Phases: Understand, Define, Generate ideas, Decide. Dimensions: Needs (People), Viability (Business), Possibility (Technology), Opportunities (Innovation)]
● with and by our awesome UX team
Machine Learning Canvas
● the canvas helps to understand the maturity of the project (in terms of data sources, value proposition,
etc.)
● not everyone needs to understand every part but having the canvas created and validated shows it is
ready to work on
● value proposition
○ if there is an overall business model canvas the proportional value can be used for the ML task at
hand
○ there must be a success metric
● For the ones eager to learn more:
○ New book draft (I will send later on)
○ ML 101 workshop where we’ll discuss ML canvas
○ more to come :-)
Machine Learning Canvas
Example: Customer service
● Develop intelligence / integration for services
we already have: chatbot on Facebook
Messenger / e-mail form on the site / FAQ
● Develop intelligence / integration for services
that we would like to have: bot on Facebook,
Instagram and Twitter timelines / shop chatbot /
WhatsApp (online and offline), IVR
(voice response, URA in Portuguese)
● In addition, work done at B2W that brought
large gains in service speed, quality and
standardization of contacts was the
development of a virtual attendant that,
besides passing on information, can execute
actions (sending a duplicate boleto, a duplicate
invoice (nota fiscal), changing registration
data, sending a password reset)
Initial ideas
BMC
(optional)
MLC
Ideation / Detailing
Innovation and product
launches
Image Similarity
Return rate prediction
Approach
Given a purchased item, predict whether it will be returned or not.
Pipeline: DW -> Feature Extraction -> Model 1 + Model 2 -> Aggregator Model -> Return probability
Features
Around 100 features derived from product, customer and transaction were used
to train the models. Among them:
- Delivery postal code (CEP)
- Supplier
- Brand
- Age of the user account
- Time between order and delivery
- Time since the customer's last purchase
- Net Total Value
- Number of orders and returns observed for the customer to date
- etc...
Aggregated model
- Scores below 0.01:
  - 90% of non-returned items
  - Return rate of 0.02%
- Scores above 0.5:
  - 53% of returned items
  - Return rate of 19%
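A rough sketch of the two-models-plus-aggregator idea. The slides do not say which models or which aggregator were used, so the logistic blend, its weights and the function names below are assumptions for illustration; only the 0.01 and 0.5 score bands come from the slide.

```python
import math


def aggregate(score_1, score_2, w1=4.0, w2=4.0, bias=-6.0):
    """Blend two base-model scores into one return probability.

    Hypothetical logistic aggregator; in practice the weights would be
    learned from held-out predictions of the two base models.
    """
    z = w1 * score_1 + w2 * score_2 + bias
    return 1.0 / (1.0 + math.exp(-z))


def score_band(prob):
    """Map a probability to the score bands quoted on the slide."""
    if prob < 0.01:
        return "low"      # slide: scores below 0.01
    if prob > 0.5:
        return "high"     # slide: scores above 0.5
    return "middle"


if __name__ == "__main__":
    for s1, s2 in [(0.0, 0.0), (0.5, 0.5), (0.9, 0.9)]:
        p = aggregate(s1, s2)
        print(s1, s2, round(p, 3), score_band(p))
```

The bands matter more than the exact probability: the low band can be treated as "safe" items, while the high band is where interventions are worth testing.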
● a successful model for return rate prediction was created
● deployed via AWS SageMaker (part of ML standard)
● could be easily adapted for cancellation rate
Insights
● During the return rate project we noted that many of our business
concerns involve survival analysis.
● Survival analysis models situations in which there are discrete
events that take some time to occur.
● Most of our problems fall into a less standard type of survival
model called cure models.
● We are currently developing the capability to apply cure
models to complex datasets for both insights and predictive
modelling.
● This will allow us to attack return rates, cancellation rates,
second purchase behavior, time-to-delivery, time-to-stock-replenishment
and all sorts of time-to-X problems.
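To make "cure model" concrete: in a mixture cure model a fraction of the population never experiences the event (e.g. items that are never returned), so the survival curve flattens above zero instead of dropping to zero. A minimal sketch, assuming an exponential time-to-event for the susceptible part; the parameter values are made up and this is not the team's implementation.

```python
import math


def population_survival(t_days, cured_fraction, rate_per_day):
    """P(event has not happened by time t) under a mixture cure model.

    S(t) = cured_fraction + (1 - cured_fraction) * exp(-rate * t)
    The exponential latency is an assumption chosen for simplicity.
    """
    susceptible_survival = math.exp(-rate_per_day * t_days)
    return cured_fraction + (1.0 - cured_fraction) * susceptible_survival


if __name__ == "__main__":
    # Hypothetical numbers: 80% of items are never returned.
    for t in (0, 7, 30, 365):
        print(t, round(population_survival(t, 0.8, 0.1), 4))
```

A standard survival model would force the curve towards zero; the plateau at `cured_fraction` is what makes the cure variant a better fit for time-to-X questions where many cases never convert.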
A few return rate insights
PoCs (proof of concepts)
● Search the look (H1)
● Search - S4 (H1)
● Categorization (catalog automatization)
● Causal impact and marketing budget allocation
● Size filters [external partner: bmind]
● Ratings and reviews
● Brand clustering
● Sales forecasting Blackfriday
PoCs (proof of concepts)
Search S4
Search PoC (s4) as
fallback for datajet
(because of earlier
outages) with advanced
learning to rank and
search optimization
● Strategy 2019: work together on search as a global product (datajet)
● learnings and advanced concepts from s4 will be applied to datajet
Sales forecasting Blackfriday
Blackfriday throughout the years
Looking at sales from Thursday 00:00 to Sunday 23:59 in the years 2013 to 2018 there is a pattern that repeats every year:
Simulating revenue for 2018 based on 2017
Given the distribution of gross revenue per hour that was generated in 2017 during the Blackfriday, we could generate a
revenue projection for 2018. The values expected for each hour were derived based on the total revenue estimated by the Live
Sales, which is a system used at Dafiti that implements a moving average type of calculation.
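The projection described here boils down to reusing last year's hourly shape and scaling it to this year's estimated total. A minimal sketch with made-up numbers; how Live Sales produces the total estimate is not shown on the slides, so it is simply passed in as a parameter.

```python
def project_hourly_revenue(last_year_hourly, estimated_total):
    """Spread an estimated total over hours using last year's hourly shares."""
    last_year_total = sum(last_year_hourly)
    if last_year_total <= 0:
        raise ValueError("last year's revenue must be positive")
    return [estimated_total * value / last_year_total for value in last_year_hourly]


if __name__ == "__main__":
    # Hypothetical 3-hour window from last year and a hypothetical total estimate.
    print(project_hourly_revenue([10, 30, 60], 200))
```

Comparing the projected curve with actuals hour by hour is what turns it into a live monitoring tool during the event.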
● Success during Blackfriday
● Knowledge and models obtained being applied to “General Sales
Forecasting”:
○ awareness of cyclic sales behaviour in specific time windows
○ lag features
○ extraction and usage of Dafiti’s full sales history
○ how to deal with the data granularity
○ benchmarking GBM vs Neural Network
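For readers new to lag features: each training row gets the target value plus its own past values at fixed offsets, which lets a GBM or neural network see recent history and cyclic patterns. A small sketch in plain Python; the lags chosen and the function name are illustrative, not the ones used in the project.

```python
def add_lag_features(series, lags=(1, 7)):
    """Return one row per time step with the target and its lagged values."""
    rows = []
    max_lag = max(lags)
    # Start at max_lag so every requested lag exists for every row.
    for i in range(max_lag, len(series)):
        row = {"y": series[i]}
        for lag in lags:
            row[f"lag_{lag}"] = series[i - lag]
        rows.append(row)
    return rows


if __name__ == "__main__":
    daily_sales = [100, 120, 90, 130, 150, 170, 160, 110]  # made-up data
    for row in add_lag_features(daily_sales):
        print(row)
```

With daily data, a lag of 7 captures the same weekday one week earlier; with hourly data, lags of 24 and 168 play the same role.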
While starting to work on pricing optimization
we realized we need sophisticated
forecasting first
Categorization (catalog automatization)
● Its goal is to automate object identification from SKU images alone.
● ImageNet* has existed since 2010, and this task is considered mastered in
computer science.
● Deep learning models are the current state of the art for this task.
● We have enough data for big learning models: over 3 million images.
● We have the data (needs some work) and we have the model!
● The data needs some adjustments, as catalog “mistakes are easy to find”.
● Also, the catalog trees in use contain duplicates and attributes treated as
categories; examples from name_tree3:
○ "Other", "Outras Roupas", "Outros".
○ "Pijamas", "Pijamas e Camisetas".
○ "Polo Manga Curta", "Polo Manga Longa", "Polos".
Catalog automatization
● The trained model achieves these results for these catalog trees:
Catalog       model errors   total SKUs   accuracy
name_tree1          72,683      681,244    89.33 %
name_tree2         136,793      681,244    79.92 %
name_tree3         158,898      681,244    76.67 %
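The accuracy column follows directly from the error counts: accuracy = 1 - errors / total. A quick check in Python using the numbers from the table; note that the slide values match when truncated (not rounded) to two decimals.

```python
import math

TOTAL_SKUS = 681_244
MODEL_ERRORS = {
    "name_tree1": 72_683,
    "name_tree2": 136_793,
    "name_tree3": 158_898,
}


def accuracy_percent(errors, total):
    """Share of correctly classified SKUs, in percent."""
    return 100.0 * (1.0 - errors / total)


def truncate_2(value):
    """Cut off after two decimals, which is how the slide values appear."""
    return math.floor(value * 100) / 100


if __name__ == "__main__":
    for tree, errors in MODEL_ERRORS.items():
        print(tree, truncate_2(accuracy_percent(errors, TOTAL_SKUS)))
```

Accuracy drops as the tree gets deeper, which is expected: the finer the categories, the more the duplicated and overlapping labels hurt.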
Suggested steps towards automation:
1. Data cleansing with model’s insights and/or enhanced categorization tree and attributes.
2. Train and validate new model’s predictions.
3. Repeat 1 and 2 until satisfied.
4. Connect this API into the sku registration steps.
Next steps catalog automatization and conclusion:
● high potential for catalog curation
● learnings from 2018 will be applied in catalog cleanup 2019
Ratings & Reviews
● The goal is to automate approval of reviews.
● Started with preparation for slides for a congress -> made part of the
hackathon -> was incorporated into ML 101 workshops -> results aligned with
business
● We have the data (also needs some work) and we have the model!
● The data needs some adjustments:
○ Is there a defined policy for approval/rejection of reviews?
○ Is historical data accurate enough for what the company wants for the future?
○ Does the company want more insights from reviews?*
Ratings & Reviews
Ratings & Reviews Historical data
Historical data:
reviews_approved.csv   519,463
reviews_rejected.csv    81,598
Test data (15%) results:
                             total reviews   model’s errors   accuracy   f1-score
manually evaluated reviews         601,061           57,704    90.39 %       88 %
[Chart legend: approved, rejected]
In store: REJECTED, model’s prediction: APPROVED
model’s confidence   text
0.916   The quality is not that good. For the price I expected more
0.968   Very good, beautiful.
0.663   I can't complete the purchase
0.589   The trousers are small, I am 1.63 and they ended halfway down my legs, hated them. Please refund the amount paid.
0.773   It peeled on the first day of use. Disappointed...
0.878   I received the sneakers a week ago; the first time my son wore them and I went to clean them, they faded. No quality
0.869   Lola lp.k
0.917   Product
In store: APPROVED, model’s prediction: REJECTED
model’s confidence   text
0.731   I would like to know when size 34 will be available?
0.973   very satisfied with the product. Great finish and very good value for the money. Fits my shoe-size perfectly
0.549   I WOULD LIKE TO KNOW IF YOU HAVE THIS SHOE IN NAVY BLUE!! THANK YOU MARIA LUCIA
0.700   hiiiiiiiiii I wanted this beautiful sandal, pls come give it to me, byee
0.527   I want to know if I can exchange the size if it doesn't fit ..
0.521   It is very pretty, but I live in Mozambique and would like you to open a store here in Maputo, the capital of Mozambique.
0.691   I died, please tell me something, please
Ratings & Reviews Pending
reviews_pending.csv   219,414
                   approved            rejected
pending reviews    194,173 (88.5 %)    25,241 (11.5 %)
Most confident cases:
prediction   confidence   text
APPROVED     0.999        Disappointment. Very thin and rough knit, feels like sandpaper.
REJECTED     0.999        I really liked the sandal, super comfortable, but I am already sending it back because the inside peeled completely after two uses. I have already sent photos and will send it back to dafiti tomorrow.
Suggested steps towards automation:
1. Data cleansing with model’s insights.
2. Train and validate new model’s predictions.
3. Repeat 1 and 2 until satisfied.
4. Connect this API into the rating and reviews validation steps.
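One possible shape for step 4, sketched under assumptions: the model's decision is applied automatically only above a confidence threshold and everything else goes to a human moderator. The threshold value and the function name are hypothetical; the slides do not describe how the integration would work.

```python
def route_review(predicted_label, confidence, threshold=0.9):
    """Apply the model's decision only when it is confident enough.

    predicted_label is "APPROVED" or "REJECTED"; anything below the
    (hypothetical) threshold is sent to manual moderation instead.
    """
    if confidence >= threshold:
        return predicted_label
    return "MANUAL_REVIEW"


if __name__ == "__main__":
    print(route_review("APPROVED", 0.999))
    print(route_review("REJECTED", 0.527))
```

A threshold alone is not a guarantee: the historical examples include high-confidence disagreements with the stored label, which is why steps 1 to 3 (cleaning the data and retraining) come first.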
New project: extract insights and information directly from user reviews,
possibilities to explore:
a. brand and products alarms on user problems (quality, fitting,...)
b. detect reviews that are customer support related
c. sentiment analysis
Owner? Decision?
=> committee
Causal Impact and budget allocation
Hold-out tests
● Processing of structured time series
● Production test on the “google non-brand SEM” channel
● Statistical confirmation of the channel's representative value
● Algorithms built in Python
Hackathon - Marketing Budget Allocation:
● Time series and non-linear optimization
● Minimization of “CIR” (1 / ROI)
● Algorithm makes resource allocation suggestions to optimize CIR
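A toy illustration of the allocation idea, not the hackathon code: with a fixed budget, minimizing CIR (spend / revenue) is the same as maximizing revenue. The square-root response curves, the channel names and the greedy step-wise allocation are all assumptions made for the sketch.

```python
import math


def revenue(scale, spend):
    """Hypothetical diminishing-returns response curve for one channel."""
    return scale * math.sqrt(spend)


def marginal_gain(scale, current_spend, step):
    """Extra revenue from adding one more step of budget to a channel."""
    return revenue(scale, current_spend + step) - revenue(scale, current_spend)


def allocate(budget, scales, step=1.0):
    """Greedily give each budget step to the channel with the best marginal gain."""
    spend = {channel: 0.0 for channel in scales}
    remaining = budget
    while remaining >= step:
        best = max(scales, key=lambda c: marginal_gain(scales[c], spend[c], step))
        spend[best] += step
        remaining -= step
    return spend


def cir(spend, scales):
    """Cost-income ratio (1 / ROI): total spend divided by total revenue."""
    total_revenue = sum(revenue(scales[c], s) for c, s in spend.items())
    return sum(spend.values()) / total_revenue


if __name__ == "__main__":
    scales = {"channel_a": 3.0, "channel_b": 1.0}  # made-up channel strengths
    plan = allocate(100, scales)
    print(plan, round(cir(plan, scales), 4))
```

In practice the response curves would have to be estimated from time series per channel, which is exactly where incomplete GA data becomes a problem.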
Results:
● Open-sourced a Python port of the R causal impact package
● A Hackathon can create good insights and kick off BUT might create a false
sense of success
● Understood GA data is not complete
● Optimization TBD
Brand clustering
Brand Clustering Analysis
● The goal is to bring marketing insights on how users interact with brands, and
to reduce the brand dimension
● We used Google Analytics (GA) actions for 2 days on the Dafiti website: 640,235
cookie sessions interacting with 276,923 SKUs of 4,825 brands.
● Top 4 brands by interactions:
○ [('Colcci', 54948), ('Vizzano', 42715), ('Santa Lolla', 41401), ('Moleca', 34054)]
● Top 4 GA scores:
○ [('Beautiful Lingerie', 13.7532), ('Philco', 12.5799), ('#Euqfiz', 11.2241), ('Kmc', 9.1530)]
How are Dafiti brands related to others?
● 9 brands cluster -> ['Armadillo', 'DAFITI UNIQUE', 'Ki-fofo', 'Lua Luá', 'Mania De Moça',
'Meketrefe', 'Penguin', 'Red Life', 'Styll Baby']
● 6 brands cluster -> ['Cavage', 'DAFITI JOY', 'La Beauté Cosmétiques', 'Miu Miu', 'Montain Boot',
'Refuse']
● 10 brands cluster -> ['DAFITI ACCESSORIES', 'Enox', 'Khatto', 'Paul Ryan', 'Prorider', 'Secret',
'Sunnies', 'THOMASTON', 'Terra e Agua', 'Tilit']
● 582 brands cluster -> ['...Lost', '100% Marca Própria', '3 Sprouts', ..., 'DAFITI I.D.', 'DC
Original', 'DGK', 'DKNY', ...,'Sex and the City Cosmetics', ...,'Shoes Shoes', ...,'You Rock',
'Zebu', 'Zenit', 'Ziva']
● 71 brands cluster -> ['Alta Villa Shoes', 'Asics', 'Ausländer', 'Beautiful Lingerie', 'Botswana',
'Bracciale Acessórios', 'Bull Motors', 'CZ Brand', 'Calcifran', 'Cisco', 'Columbia', 'Crocs',
'DAFITI EDGE', 'Dangelis Moda Íntima', ...,'Won Sports', 'Yardley', 'adidas', 'adidas Originals',
'adidas Performance', 'test', 'zeus']
● 534 brands cluster -> ['24 Horas Calçados', ..., 'Bvlgari', ...,'Café Brasil', ...,'Cravo &
Canela', ..., 'DAFITI', 'DAFITI SHOES', ...,'GUESS Kids', ...,'Harley-Davidson Footwear',
...,'Moleca', ...,'Santa Lolla', ...,'Tiffany & Co.', ...,'VIA UNO', ...,'Vizzano',...]
The resulting clustering does not help much for marketing insights directly.
Some changes are needed to provide a direct business value:
1. Consider the problem as a recommendation task.
2. Implement changes to the “Marreco” system, to provide an analysis over brand interactions.
Brand clustering
The 3 most similar brands to each Dafiti brand and their similarity scores (cosine):
● DAFITI I.D. - [('D-Tox', 0.3293), ('Monte Carlo Polo Club', 0.1646), ('Drop Life', 0.0856)]
● DAFITI SHOES - [('Moleca', 0.2111), ('Ana Cristina', 0.1580), ('Vizzano', 0.1565)]
● DAFITI EDGE - [('FKN', 0.0718), ('Lemon Grove', 0.0665), ('Yachtsman', 0.0366)]
● DAFITI - [('Ride Skateboard', 0.0666), ('Santa Maria', 0.0591), ('Snoopy', 0.0524)]
● DAFITI ACCESSORIES - [('Vila Flor', 0.0455), ('Prorider', 0.0435), ('Flyca Girls', 0.0334)]
● DAFITI UNIQUE - [('Shoulder', 0.0246), ('Energia', 0.0231), ('DAFITI ONTREND', 0.0166)]
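The scores above are cosine similarities. As a minimal sketch of how such a top-3 list can be computed, assume each brand is represented by a numeric vector (here, hypothetical purchase counts per customer). The brand names, vectors and function names are illustrative assumptions, not Dafiti data or code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_similar(target, vectors, k=3):
    """Return the k brands most similar to `target` as (brand, score) pairs."""
    scores = [
        (brand, round(cosine_similarity(vectors[target], vec), 4))
        for brand, vec in vectors.items()
        if brand != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical purchase counts: one row per brand, one column per customer.
brand_vectors = {
    "Brand A": [3, 0, 1, 0],
    "Brand B": [2, 0, 1, 0],
    "Brand C": [0, 4, 0, 1],
    "Brand D": [1, 1, 1, 1],
}

print(top_similar("Brand A", brand_vectors))
```

A score near 1 means two brands are bought by largely the same customers in similar proportions, while a score near 0, like most of the values listed above, means there is little overlap.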
R&D and Innovation Recap & Outlook
Filter cleanup analysis (sizes)
Children’s clothing sizes
Internal workshops
Conclusion:
● we leveraged third-party knowledge (consulting) to do the analysis
● a few [marketplace] products are creating a very bad user experience
● there are some potential quick wins
● we need to align on the best approach in terms of architecture (the first idea,
a DB update, might not be ideal); product development support?
● What can we already fix in the registration process?
Wishlist / Backlog
AI awareness
Sales forecasting (train a new model with learnings from Black Friday forecasting)
Price optimization
ng and Buying
Marketing allocation
Cancellation rate
Email click prediction (recipient selection, Markovian)
Customer segmentation / user profiles
Online recommendations
Search
Reinforcement learning
Survival analysis
Email recommendations (jetlore competition)
Image similarity
Delivery prediction
Delivery visualization
Image segmentation
Anticipatory shipping
Intelligent Sizing
NLP for chatbots and sentiment analysis
Looks, Image understanding and shoppable videos
VR
personalized discounts
Thank You!

Dafiti R&D, Semana Acadêmica do Centro de Tecnologia (SACT), UFSM 2019

  • 2. ❖ Head of R&D, Dafiti ❖ 17+ years IT stuff ❖ 1981 - 2011 in Germany ❖ 2011 Rocket Internet (locondo, lamoda, dafiti) ❖ Since 2011 in Brazil ➢ and in some way or another w/ Dafiti ❖ married, 2 sons ❖ Skype: georg.buske ❖ Drop me an email: georg.buske@dafiti.com.br whoami; Georg Buske
  • 3. Lots of stuff, won’t stop at each slide for long Should be interactive - please ask anything during the presentation Disclaimer
  • 4. Industry view Get to know Dafiti Lots of examples and showcases, about successes and failures Answer your questions Today’s Objective
  • 5. About us / History
  • 6. • Founded in 2011; • Offices in 4 countries; • 2.900 employees; • 5 warehouses in LATAM; • 50MM monthly users; • > R$ 1.4 bi gross revenue • Belongs to Global Fashion Group since 2014. • Today ~120 people in IT (Brasil) • R&D area created in 01/2018 • DFTech: Dafiti’s tech brand Our history Dafiti & GFG
  • 7. Dafiti & GFG - Global fashion group (GFG) - founded in 2014 - HQ London / Singapore / tech hub Vietnam - operates in 27 counrites - joint initiatives
  • 8. IT Organizational timeline ● 2010 - 2011 ○ project CTO / Rocket Internet ○ local dev team ○ Berlin dev team ● 2011 - 2012 (incl. first jira generation) ○ IT support ○ project teams ○ sprint team (each 1 Manager + coordinators) ○ backoffice team ○ infrastructure team ○ dedicated QA ○ outsourced developers ● 2012 - 2013 (incl. new jira generation) ○ as before with architecture team ● 2013 - 2014 ○ as before with module owners inside sprint and project teams (technical ownership) ○ NOC ● 2014 - 2015 (incl. new jira generation) ○ agile cells and committee of technical leaders (with POs and SMs) instead of project and sprint ○ lots of SAP consultants, backoffice team is now more part of global IT
  • 9. IT Organizational timeline ● 2015 - 2016 ○ renamed architecture team to labs ○ dedicated UX / frontend team ○ removed NOC ○ removed outsourced developers ● 2016 - 2017 (incl. new jira generation) ○ squads and [explicit] cross functional teams instead of agile cells ○ PO = PM (product manager) ○ removed scrum master ○ renamed labs to devtools ○ added maintenance team ● 2017 - 2019++ ○ PMs, POs, squads, pillars, SREs, prioritization committee and product funnel, R&D (from 2018) Learning 2: All approaches since 2014 are all not that different and could have worked with the right focus and methodologies. Thus another approach might Learning 1: Whatever the next approach will be, we should fix the structure last (processes first).
  • 10. “Culture eats strategy to breakfast” -- Peter Drucker
  • 11. Organizational Design and communication structures Conway’s Law Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. Brook’s Law Adding human resources to a late software project makes it late Jeff’s 2 Pizza rule If a team couldn't be fed with two pizzas, it was too big
  • 12. Project timeline 2011 Relaunch (Magento -> Alice and Bob) 2012 - 2013 lots of minor projects 2014 SAP 2015 Marketplace 2016 TriKan Integration (also deployment change because of audit problem) ... Learning: There will never be the right time for [technical and/or cultural] shift - and there will be always be #blackfriday at the last friday in November!
  • 13. Legacy systems “Today’s implementation is tomorrow’s legacy” Dafiti’s system is a 8 year old set of monolithic applications
  • 14. the
  • 16. #DfTechJourney - Concept ● Core Customer ● 21th century mindset (VUCA) ● Technology Heavy User ● Contributors Culture – Empowerment ● Space for experimentation ● Execution speed & Fail Fast (MVP’s) ● BT - Business Technology ● Never left the day 1 – Agile “Start-UP” (by Jeff Bezos -Amazon) What are the main points of Exponential structure & BT?
  • 18. Transform Dafiti's E-commerce to an Exponential Platform for our Customers Improving the user experience, applying the best technologies, through a learning culture and continuous improvement.
  • 20.
  • 22. Wave 1 - Rice and Beans 6 months
  • 23. Wave 2 - The place to be 1 Year
  • 24. Wave 3 - F*cking Awesome 6 months
  • 26. Infrastructure Corporate IT & DC Information Security People & Culture Products SRE Governance Innovation & Intelligence Tech Stack D&A Platform Backoffice
  • 27. It was a busy year People
  • 28. HIERARCHY CHART CTO Cristiano Hyppolito Head of Eng LATAM Rafael Morelo Head of R&D LATAM Georg Buske Manager of Eng LATAM Pablo Maronna Head of Gov LATAM Leandro Lemes Head of InfoSec LATAM Luis Gonçalvez Head of BackOffice LATAM Adriana Ramos Head of Infra LATAM Fabio Jacometto Argentina Chile Colômbia Coord Helpdesk CH & CO TBD Colombia Chile Argentina Chile Head of AGILE TBD Org structure: Classical Organigram, but in practice super flat During 2019: 300 Astronauts in Brazil + Argentina + Chile + Colômbia
  • 29. #Dafiti Our purpose is to revolutionize the fashion ecosystem with intelligence. Our principles: - we put the customer at the center of everything - we never stop learning - we act with intelligence - we build the best teams - we trust and support each other - we work together for the common good Lots of achievements: Our purpose, our journey, our blackfriday! 4 x orders of a normal day 324 orders / minute
  • 30. ● Lots of new collegues (third parties and full time hires) ● company wide agile rollout ● Ghostbusters (internal hackathon) ● intercontinental teams (AR + BR) ● lots of fun and beer (in fact, at least every friday - cheers) ● consulting for agile, platform and more ● new platform to come ● new dashboards via live and many more... #DFTechJourney
  • 31. R&D and Innovation Recap & Outlook Training for all
  • 32. Safari as learning platform R&D and Innovation Recap & Outlook
  • 33. There are technical topics in other departments which want to get taught: Python. SQL. HTML, Big Data, Angular/ React, Arquitetura de Banco de Dados/ ETL, R (programming) DFTAcademy rollout R&D and Innovation Recap & Outlook
  • 34. #DfTechJourney Trying new ways for talent acquisition in tech: hackerX and stackoverflow talent
  • 37. ● Machine Learning 101: regression (home prices prediction) ○ https://docs.google.com/presentation/d/1JAg382c9LMrdUm1lSvOfiTGTpWj9iEDKU1Saz9NAEPk ● Machine Learning 101: Image Understanding (Fashion-MNIST) ○ https://docs.google.com/presentation/d/122Pl6ej1x4JZVI1aN-Lawb6LlQ7gEOKT3C5x11L0EkA ● Machine Learning 101: Natural Language Processing (Rating and Reviews) ○ https://docs.google.com/presentation/d/1mC01GXDTByoRNtrPUdpxe1rWqlZ9u5EaxbJsM9Yl0rw ● Machine Learning 101: clustering (Dafiti brands) ○ https://drive.google.com/drive/folders/1XeHMBgh2Lx9LwJpX6Hunb2I0RgX5WgdQ?ogsrc=32 ● Machine Learning 101: Recommendation engine (Dafiti products) ○ https://drive.google.com/drive/folders/1hgf4NzOEE0ExRb0EFQ7XUpT8MqFivrfM?ogsrc=32 ● Python 101: ○ https://drive.google.com/drive/folders/1OHbNu8DBh3WecpY3jmJVQdd_tpyyaACs Internal workshops R&D and Innovation Recap & Outlook
  • 38. Workshops delivered through DFT Academy and HR support 11/2018 and 12/2018 #DfTechJourney ML Workshops
  • 39. Machine learning guild’s main objective in 2018: ● create internal workshops objective 2019: ● papers we love / journal club #DfTechJourney ML Guild
  • 40. ● 25.07.2018 - definition and goals ● 29.08.2018 - DWH training Redshift https://docs.google.com/presentation/d/1muxuxnlBgG0GAF9RP9vFtYcNfEw5JWCydqaoEYT8VUY ● 19.09.2018 - Data catalog and internal system Hulk https://docs.google.com/presentation/d/1CscU8TcI- 2YsJGJCJxiewEXS9o1qxCykOzJZ95SZd4w/edit?ts=5ba293fb&pli=1#slide=id.g3f4ca1ae3c_1_0 ● 10.10.2018 - internal system Nick ● 02.01.2019 - data security (TBD) Summaries: https://docs.google.com/document/d/1d9Edegl2iiLlH4Qa7PkROwb_5FyYd- e3GaYAVQpufgU/edit# R&D and Innovation Recap & Outlook Data Guild
  • 42. ● Agile trends (H1) ● Sponsoring papis.io (H1) ● Hosting pydata meetup ● Semacomp ● II Congresso Latino-Americano de IA ● Mediaeval ● Hosting deep learning meetup Events #DfTechJourney
  • 43. papis.io sponsoring papis.io is a maior conference about machine learning #DfTechJourney
  • 44. Follow: https://twitter.com/dafiti_tech or https://www.linkedin.com/company/dafiti/ Tweet: I <3 ML #papis #dafiti #ufsm To make part of the raffle to win a papis LATAM 2019 ticket
  • 46. II Congresso IA LatAm R&D and Innovation Recap & Outlook
  • 48. And more Agile trends 04/2018, Semacomp 10/2018, deep learning meetup in 12/2018 (TBD) o/ R&D and Innovation Recap & Outlook
  • 50. OKRs (a.k.a. objectives and key results)
  • 51. R&D and Innovation Recap & Outlook Shared OKRs 1 12 5.001 ● company wide ● guarantees alignment and focus ● Strategic Objectives valid for 1 year ● KRs reviewed every 3 months ● regular team check-ins ● confidence index
  • 53. #DfTechJourney ● Physical Kanban board with backlog ● Started with sprints and Jira board ○ continue to improve in 2019 with participation by agile masters ○ timebox: at least weeks ○ caution: extensive planning ● 2 - 2 - 2 ○ 2 days kick - off ○ 2 weeks demo ○ 2 months verifiable user facing prototype Methodology CRISP-DM Various breakdowns on Kanban board
  • 54. Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Initiative Predictability Uncertainty Cone of Uncertainty Nessa etapa o time conseguirá dar uma previsibilidade de entrega baseado no histórico. When? What? How? For what? Which? Why? Stakeholders C-Level Product Manager Engineering Manager Product Owner Engineering Manager Engineering Manager 4x 2x 1,25 x 0,8 x 0,5 x 0,25 x Product development workflow
  • 56. Platform Team Discovery Payment & Order + MKTplace Post Sales Platform Team Platform Team Feature Team Feature Team Feature Team SRE Team SRE Team SRE Team Platform 01/01 Feature 01/01 Infra 00 / 01 Platform 01/01 Feature 01/01 Infra 01 / 01 Platform 01/01 Feature 01/01 Infra 00 / 01 Product team split (pillars & squads) - each pillar has a PM, an engineering manager and various quads responsible for specific features consisting of: Engineers and product owners - and supported by (cross): Agile coaches, UX, Data engineering, AI and infrastructure
  • 58. A macro view of the technologies we will use...
  • 59.
  • 61. ...AND DATA FUELED by our awesome D&A team :)
  • 62. Data Lake DWH Reporting Sharing Data Load/ Export Orchestration Data Quality Scheduling Monitoring ETL Data Streaming Datamarts Feeds Named Queries Data Security D&A Tech Services
  • 63. accengage adjust admotion appannie b2b b2w bingads bob campaign carmen criteo cubiscan dynad exacttarget exchange external fabric facebook financial fit freight google gotcha Dafiti Data Lake homer ino internal itunes king madruga marketing markovian netsuite osticket parallel price reception responsys sap seller solr supplier taboola tms wms yahoo zanox zendesk > 50 Different Sources > 160 Database schemas 8Tb distributed in 800k ORC / Parquet Files 7.5Tb in 6k Tables http://172.18.10.70:8080/nick/home Huge Files When the files aren´t so big and we need to apply filters For more demanded data D&A Data Architecture
  • 64. D&A Governance D&A Data Sharing Map Transactional Systems / External Sources BI Tools Operational Reports (based on 1 system) Data Feeds / Data Interfaces Operational Reports - “Heavy”, “Hard do run” reports (based on 2 or more systems) Tactical Reports / Dashboards External platforms Historical Data Data Mining / AI GFG BI Global Pricing Live Visenze Marketing Apps
  • 65. D&A Data Hub Pricing Commercial Planning Supply Chain Logistics Transportation Data Mining / AI Other Platforms GFG BI Global Pricing Google / Facebook Financial Processes Customer Service
  • 66. the Now R&D and Innovation… Executive Summary
  • 67. D&A / DWH R&D / D&A Ops / D&A R&D / Eng.
  • 68. Team
  • 69. R&D Team Will Marcio Ricardo contratando Georg Drop us an email: research-and-development@dafiti.com.br Rafael Albert
  • 71. ● Visual conception ○ Visenze ○ Streamoid ○ Flashwall ○ Markable.ai ○ Syte ○ Flixstock R&D and Innovation Recap & Outlook Third party product integration (PoCs 2018) Understand needs, create assessment framework, search more possible third parties (benchmark or integration) use the startup ecosystem to create value for Dafiti! there are many startups pushing into chatbots and fashion (image similarity and catalog enrichment) but nobody is trying the hard stuff as a product (e.g. marketing budget allocation) ;-)
  • 72. ● academic research group ● current status: paper work ● works on AR and image understanding R&D and Innovation Recap & Outlook
  • 73. R&D and Innovation Recap & Outlook ● Internships @ Dafiti ○ Still working on the contractual part but we are making this happen ○ feel free to send me an email if you have interest: georg.buske@dafiti.com.br Dafiti <3 UFSM
  • 74. R&D and Innovation Recap & Outlook Innovation hubs ● Starting in 2019 create hubs in Brazil ● Work more closely with GFG (first calls with lamoda R&D, get back to innovation topics within GFG) ● Until 2020 assess possibilities in China and USA Use the startup ecosystem as a multiplier - this is what an exponential platform means... If you participate in the UFSM incubator and/or are creating a startup disrupting fashion and/or commerce, we want to hear from you :)
  • 76. R&D Vision Purpose: Lead the revolution of fashion and shopping with AI and technological innovation. Mission: Give Dafiti the capacity to use state-of-the- art AI. Innovation and research needs alignment too! Innovation and Intelligence Committee
  • 77. Innovating our fashion eCommerce and helping with the transformation into THE fashion platform in LATAM through machine learning and artificial intelligence in general. E.g.: ● Building algorithms that help us with anticipatory shipping and purchasing forecasts and protect us against system failure. ● Using image recognition to give our users the highest possible convenience and coolest features. ● Using state-of-the-art game engines to build virtual reality into our customer experience. ● Optimizing product search and building data consistency monitoring. ● Helping build a large-scale architecture together with the entire IT team. ● Creating a machine learning framework / standard stack and rules (e.g. SageMaker, CI/CD, multi-cloud, experiment tracking, etc.) The outcome will be nothing less than transforming the way e-commerce works and providing sustainable solutions. Mostly the team won’t work directly on user-facing products but assesses ways to create impact and works together with other areas to make them happen. R&D and Innovation Recap & Outlook
  • 78. R&D and Innovation Recap & Outlook Key takeaways ● Events and techbranding are important to attract talent (team goal achieved - open positions filled, BIG THANKS TO OUR HRBPs <3) ○ OTOH we’ll reduce the number of indicators which make up the brand index [the old index is not in these slides, please refer to the R&D strategy docs for more information] ● Techbranding and internal workshops not only help foster DFTech as a brand and teach our internal workforce but also create insights and identify problems and opportunities ● The plan is to start 2019 with 100 % alignment and a mixed model of internal workforce, consulting partnerships and third party providers ○ regular update and alignment meetings will be held in the form of an intelligence & innovation committee ○ using more rigid agile methods such as timeboxed sprints (incl. planning / review) to create more visibility and better alignment on results ● we will invest more into our ML standards and stack (as already started) 1 Innovation and research need alignment, too! => Innovation and Intelligence Committee
  • 79. R&D and Innovation Recap & Outlook Key takeaways ● To become a name in research we must invest more and thus will start with 20 % time for this and will partner more with academic institutions ● We’ll reduce the number of area KPIs monitored to budget, people, PoCs realized, models launched in production, innovations launched (ideation will be part of this metric), third parties assessed, internal workshops given and techbrand initiatives (papers, articles, events, etc.) for now - KPI review is not in this presentation (please see strategy docs for old area metrics list) ● Pricing optimization and marketing allocation projects didn’t bring the expected results yet ○ eventually we will invest into more research ○ also there are many startups pushing into chatbots and fashion (image similarity and catalog enrichment) but nobody trying the hard stuff as a product ;-) ● Investment in search, recommendations (looks, emails, onsite), catalog enrichment and image recognition might be the most important in 2019 2 Balance explore (PoCs) VS. exploit (production)!
  • 81. R&D Framework: How
      Ideation
        HOW ● Area or Product wishlist ● ML guilds or 20 % research ● Design thinking workshops per area
        OUTPUT ● Wishlist backlog (now: google docs, future: open innovation portal) ● ideas, hypothesis
      Prioritization (Committee)
        HOW ● Committee
        OUTPUT ● prioritized shortlist ● team composition (third party, R&D, interdisciplinary team, etc.)
      Detailing (R&D / areas)
        HOW ● Workshop per area together with R&D (regular schedule)
        OUTPUT ● ML Canvas (ML 101 workshop) ○ definition of success criteria and metrics ● Business canvas ● 6 pager
      Diagram labels: Dafiti; Identify collaboration / work type
  • 82. R&D Framework: How
      PoC
        HOW ● Data curation ● Paper research ● Third party benchmark ● EDA
        OUTPUT ● Baseline model ● insights ● validated hypothesis ● GO/NOGO
      Implement
        HOW ● Development ● Test (AB test) ● Refine until satisfied or aborted (validation with user)
        OUTPUT ● Success -> deployment, ops ○ API ○ end-to-end ● Failure -> fail wall
      Finalization (Committee)
        HOW ● Retrospective (Committee) ● Operation (if success)
        OUTPUT ● Lessons learned / Insights
      Diagram labels: R&D, area, third party; KR: 6; KR: 2; Agile; squads; TBD; R&D, area, third party
  • 83. R&D Framework RULES ● 2 weeks ahead of committee meetings, the requirements of possible projects and their definitions (success metrics, ML canvas) need to be done / aligned with R&D ● no ideation during committee, only backlog discussion (exception: today) ● the responsible area / product person [optionally together with R&D] will present the detailed ideation item to the committee How
  • 84. R&D Framework COLLABORATION / WORK TYPES ● PoC internal: The PoC execution is fully owned by R&D. ● Implementation: The implementation of the deployable live product, which is fully owned by R&D (either end-to-end or as API). ● Coach: The implementation or PoC execution is owned by the area and an R&D member supports the initiative as a coach. ● Workshop (not part of framework): ML concepts are taught to DFT employees (rather than a concrete business problem being solved), either as a TechTalk-like workshop through DFTAcademy or as deeper classroom trainings (certification). How
  • 85. Tools
  • 86. Design Thinking workshops CONVERGENT DIVERGENT Understand Define Generate Ideas Decide Needs (People) Viability (Business) Feasibility (Technology) Opportunities (Innovation) DESIGN THINKING ● with and by our awesome UX team
  • 87. Machine Learning Canvas ● the canvas helps to understand the maturity of the project (in terms of data sources, value proposition, etc.) ● not everyone needs to understand every part, but having the canvas created and validated shows the project is ready to work on ● value proposition ○ if there is an overall business model canvas, the proportional value can be used for the ML task at hand ○ there must be a success metric ● For the ones eager to learn more: ○ New book draft (I will send later on) ○ ML 101 workshop where we’ll discuss ML canvas ○ more to come :-)
  • 89. Example: Customer service ● Develop intelligence / integration for services we already have: Chat BOT Facebook Messenger / E-mail Form Site / FAQ ● Develop intelligence / integration for channels that we would like to have: BOT Time Line Facebook, Instagram and Twitter / Chat BOT Shop / Whatsapp (Online and Offline), IVR (URA, voice response) ● In addition, a project developed at B2W that generated many gains in speed of service, quality and standardization of contacts was a Virtual Attendant which, in addition to passing on information, can execute actions (sending a duplicate boleto (2ª Via Boleto), sending a duplicate invoice (2ª Via de Nota Fiscal), changing registration data, sending a password reset). Initial ideas BMC (optional) MLC Ideation / Detailing
  • 91. R&D and Innovation Recap & Outlook Image Similarity
  • 92. R&D and Innovation Recap & Outlook
  • 93. R&D and Innovation Recap & Outlook Return rate prediction
  • 94. Approach Given a purchased item, predict whether the item will be returned or not. DW Feature Extraction Model 1 Model 2 Aggregator Model Return Prob.
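The pipeline on this slide (two models feeding an aggregator that outputs a return probability) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the production model: both base models, the feature names (`past_returns`, `past_orders`, `days_to_delivery`) and every weight are hypothetical, and the aggregator is a plain weighted combination of logits.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def model_1(features):
    # Hypothetical base model: the customer's historical return rate,
    # clamped so that the logit stays finite.
    rate = features["past_returns"] / max(features["past_orders"], 1)
    return min(max(rate, 0.01), 0.99)


def model_2(features):
    # Hypothetical base model: longer delivery times raise the return risk.
    return sigmoid(0.2 * features["days_to_delivery"] - 2.0)


def aggregate(p1, p2, w1=0.5, w2=0.5, bias=0.0):
    # Aggregator: weighted sum of the base models' logits, squashed to (0, 1).
    return sigmoid(w1 * logit(p1) + w2 * logit(p2) + bias)


def return_probability(features):
    return aggregate(model_1(features), model_2(features))
```

In a real setup the aggregator weights would be fitted on held-out predictions of the base models rather than fixed by hand.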
  • 95. Features Around 100 features derived from the product, customer and transaction were used to train the models. Among the features are: - Delivery postal code (CEP) - Supplier - Brand - Age of the user's account - Time between order and delivery - Time since the customer's last purchase - Net Total Value - Number of orders and returns observed for the customer up to that date - etc...
  • 96. Aggregated Model - Scores below 0.01: - 90% of non-returned items - Return rate of 0.02% - Scores above 0.5: - 53% of returned items - Return rate of 19%
  • 97. R&D and Innovation Recap & Outlook
  • 98. R&D and Innovation Recap & Outlook ● a successful model for return rate prediction was created ● deployed via AWS SageMaker (part of ML standard) ● could be easily adapted for cancellation rate
  • 100. ● During the return rate project we noted that many of our business concerns involve Survival Analysis. ● Survival Analysis models situations in which there are discrete events that take some time to occur. ● Most of our problems fall into a less standard type of survival model called Cure Models ● We are currently developing the capability of applying cure models to complex datasets for both insights and predictive modelling. ● This will allow us to attack return rates, cancellation rates, second purchase behavior, time-to-delivery, time-to-stock-replenishment and all sorts of time-to-X problems.
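As a rough illustration of what a cure model expresses, the sketch below uses the standard mixture cure form S(t) = π + (1 − π)·S_u(t), where π is the fraction of cases that never experience the event (for example, items that are never returned). The exponential choice for S_u and all parameter values are assumptions for illustration only; the slides do not say which cure model variant is used.

```python
import math


def cure_survival(t, cured_fraction, rate):
    """Probability that the event has NOT happened by time t.

    cured_fraction: share of cases that never experience the event.
    rate: hazard rate of the (assumed) exponential time-to-event
          for the remaining, susceptible cases.
    """
    susceptible_survival = math.exp(-rate * t)
    return cured_fraction + (1.0 - cured_fraction) * susceptible_survival


def event_probability_by(t, cured_fraction, rate):
    # Probability that the event (e.g. a return) has happened by time t.
    return 1.0 - cure_survival(t, cured_fraction, rate)
```

Unlike an ordinary survival curve, this one levels off at `cured_fraction` instead of falling to zero, which is what makes the form a natural fit for events such as returns that most items never experience.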
  • 101. A few return rate insights
  • 102. A few return rate insights
  • 103. A few return rate insights
  • 104. A few return rate insights
  • 105. A few return rate insights
  • 106. PoCs (proof of concepts)
  • 107. ● Search the look (H1) ● Search - S4 (H1) ● Categorization (catalog automatization) ● Causal impact and marketing budget allocation ● Size filters [external partner: bmind] ● Ratings and reviews ● Brand clustering ● Sales forecasting Blackfriday PoCs (proof of concepts) R&D and Innovation Recap & Outlook
  • 108. R&D and Innovation Recap & Outlook Search S4
  • 109. R&D and Innovation Recap & Outlook Search PoC (s4) as fallback for datajet (because of previous outages) with advanced learning to rank and search optimization
  • 110. R&D and Innovation Recap & Outlook
  • 111. R&D and Innovation Recap & Outlook ● Strategy 2019 work together on Search as a global product (datajet) ● learnings and advanced concepts from s4 will be applied to datajet
  • 112. R&D and Innovation Recap & Outlook Sales forecasting Blackfriday
  • 113. Blackfriday throughout the years Looking at sales from Thursday 00:00 to Sunday 23:59 in the years 2013 to 2018 there is a pattern that repeats every year:
  • 114. Simulating revenue for 2018 based on 2017 Given the distribution of gross revenue per hour that was generated in 2017 during the Blackfriday, we could generate a revenue projection for 2018. The values expected for each hour were derived based on the total revenue estimated by the Live Sales, which is a system used at Dafiti that implements a moving average type of calculation.
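The projection described on this slide can be sketched as scaling last year's hourly revenue shares by this year's estimated total. The function name and the toy numbers are hypothetical; in the talk the total estimate comes from the Live Sales system, which is not modelled here.

```python
def project_hourly_revenue(last_year_hourly, total_estimate):
    """Spread an estimated total over hours using last year's distribution."""
    last_year_total = sum(last_year_hourly)
    if last_year_total <= 0:
        raise ValueError("last year's revenue must be positive")
    return [total_estimate * hour / last_year_total for hour in last_year_hourly]


# Toy example: three "hours" with 10 %, 30 % and 60 % of last year's revenue.
projection = project_hourly_revenue([10, 30, 60], 200.0)
```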
  • 115. R&D and Innovation Recap & Outlook ● Success during Blackfriday ● Knowledge and models obtained being applied to “General Sales Forecasting”: ○ awareness of cyclic sales behaviour in specific time windows ○ lag features ○ extraction and usage of Dafiti’s full sales history ○ how to deal with the data granularity ○ benchmarking GBM vs Neural Network While starting to work on pricing optimization we realized we need a sophisticated forecasting first
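The lag features mentioned above are past values of the series used as model inputs. A minimal sketch, assuming a plain list of sales values ordered in time (the function name and lag choices are illustrative, not taken from the project):

```python
def make_lag_features(series, lags=(1, 2)):
    """Build one training row per time step, with past values as features."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"y": series[t]}  # value to predict
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]  # value observed `lag` steps earlier
        rows.append(row)
    return rows


rows = make_lag_features([10, 20, 30, 40], lags=(1, 2))
```

For hourly sales, lags such as 24 (same hour yesterday) or 168 (same hour last week) would be one way to capture the cyclic behaviour the slide refers to.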
  • 116. R&D and Innovation Recap & Outlook Categorization (catalog automatization)
  • 117. ● Its goal is to automate object identification from sku images alone. ● ImageNet* has existed since 2010, and this task is considered well mastered in computer science. ● Deep Learning models are the current state of the art for this task. ● We have enough data for big learning models, over 3 million images. ● We have the data (needs some work) and we have the model! ● The data needs some adjustments, as catalog “mistakes are easy to find”. ● Also, the catalog trees used have duplicates, and attributes are treated as categories; examples from name_tree3: ○ "Other", "Outras Roupas", "Outros". ○ "Pijamas", "Pijamas e Camisetas". ○ "Polo Manga Curta", "Polo Manga Longa", "Polos". Catalog automatization
  • 118. ● The trained model achieves these results for these catalog trees: Catalog automatization
      Catalog       model errors   total sku   accuracy
      name_tree1    72.683         681.244     89,33 %
      name_tree2    136.793        681.244     79,92 %
      name_tree3    158.898        681.244     76,67 %
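The accuracy column follows from the error counts as accuracy = 1 − errors / total. The short check below recomputes it from the numbers on the slide (written as plain integers, since the slide uses dots as thousands separators); the results match the slide's percentages up to rounding in the last digit.

```python
def accuracy(errors, total):
    # Share of skus for which the model's category matched the catalog.
    return 1.0 - errors / total


TOTAL_SKUS = 681244
errors_per_tree = {
    "name_tree1": 72683,
    "name_tree2": 136793,
    "name_tree3": 158898,
}
accuracies = {tree: accuracy(err, TOTAL_SKUS) for tree, err in errors_per_tree.items()}
```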
  • 124. Catalog automatization What is suggested to fulfill an automatization: 1. Data cleansing with model’s insights and/or enhanced categorization tree and attributes. 2. Train and validate new model’s predictions. 3. Repeat 1 and 2 until satisfied. 4. Connect this API into the sku registration steps. Next steps catalog automatization and conclusion: ● high potential for catalog curation ● learnings from 2018 will be applied in catalog cleanup 2019
  • 125. R&D and Innovation Recap & Outlook Ratings & Reviews
  • 126. ● The goal is to automate approval of reviews. ● Started as preparation of slides for a congress -> became part of the hackathon -> was incorporated into ML 101 workshops -> results aligned with business ● We have the data (also needs some work) and we have the model! ● The data needs some adjustments: ○ Is there a defined policy for approval/rejection of reviews? ○ Is historical data accurate enough for what the company wants for the future? ○ Does the company want more insights from reviews?* Ratings & Reviews
  • 127. Ratings & Reviews: Historical data
      Historical data:
        reviews_approved.csv   519.463
        reviews_rejected.csv    81.598
      Test data (15%) results:
        dataset                      total reviews   model’s errors   accuracy   f1-score
        manually evaluated reviews   601.061         57.704           90,39 %    88 %
      Chart legend: approved, rejected
  • 128. Ratings & Reviews: Historical data. In store: REJECTED, model’s prediction: APPROVED
      model’s confidence   text
      0.916   A qualidade não é tão boa. Pelo preço esperava ms
      0.968   Muito boa,linda.
      0.663   Não consigo fechar a compra
      0.589   A calça e pequena tenho 1.63 ela ficou no meio das pernas odiei.por favor me reponha o valor pago.
      0.773   Descascou no primeiro dia de uso. Decepcionada...
      0.878   Recebi o tênis tem uma semana, a primeira vez que meu filho usou e fui limpar, o tênis desbotou. Não tem qualidade
      0.869   Lola lp.k
      0.917   Produto
  • 129. Ratings & Reviews: Historical data. In store: APPROVED, model’s prediction: REJECTED
      model’s confidence   text
      0.731   Gostaria de saber quando estará disponível o nº 34?
      0.973   very satisfied with the product. Great finish and very good value for the money. Fits my shoe-size perfectly
      0.549   gOSTARIA DE SABER SE VCS TEM ESSE SAPATO EM AZUL MARINHO!! OBRIGADO MARIA LUCIA
      0.700   oieeeeeeeeee eu queria essa linda sandalia pfv venha me dar xhauu
      0.527   Quero saber se posso trocar o número se não der ..
      0.521   è muito bonito mas eu vivo em moçambique e gostava que abrisem uma loja ca em maputo na capital de moçambique.
      0.691   morri porfavor digam-me alguma coisa porfavor
  • 130. Ratings & Reviews: Pending
      reviews_pending.csv: 219414 pending reviews
        approved   194.173 (88,5 %)
        rejected    25.241 (11,5 %)
      Most confident cases:
        prediction   confidence value   text
        APPROVED     0.999              Decepção. Malha Muito fina e áspera, parece uma lixa.
        REJECTED     0.999              Gostei muito da sandália, super confortável mas já estou mandando de volta pois ela esfolou inteirinha na parte interna em dois usos. Já enviei fotos, estarei enviando de volta amanhã pra dafiti.
  • 131. Ratings & Reviews Pending
  • 132. What is suggested to fulfill an automatization: 1. Data cleansing with model’s insights. 2. Train and validate new model’s predictions. 3. Repeat 1 and 2 until satisfied. 4. Connect this API into the rating and reviews validation steps. New project: extract insights and information directly from users reviews, possibilities to explore: a. brand and products alarms on user problems (quality, fitting,...) b. detect reviews that are customer support related c. sentiment analysis Ratings & Reviews Pending Owner? Decision? => committee
  • 133. R&D and Innovation Recap & Outlook Causal Impact and budget allocation
  • 134. R&D and Innovation Recap & Outlook Hold-Out Tests ● Processing of structured time series ● Production test on the “google non-brand SEM” channel ● Statistical confirmation of the channel's representative value ● Creation of algorithms in Python
  • 135. R&D and Innovation Recap & Outlook Hackathon - Marketing Budget Allocation: ● Time series and non-linear optimization ● Minimization of “CIR” (1 / ROI) ● Algorithm makes resource allocation suggestions to optimize CIR
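To make the CIR objective concrete: with CIR = cost / revenue (that is, 1 / ROI) and a fixed total budget, minimizing CIR amounts to maximizing revenue. The sketch below is a simplified stand-in for the non-linear optimization named on the slide, not the hackathon's algorithm. It assumes each channel has a hypothetical diminishing-returns response curve revenue = coef * sqrt(budget) and allocates the budget greedily in small steps; channel names and coefficients are invented.

```python
import math


def revenue(coef, budget):
    # Assumed response curve with diminishing returns (illustrative only).
    return coef * math.sqrt(budget)


def allocate(coefs, total_budget, step=1.0):
    """Greedy allocation: each step goes to the channel with the best marginal revenue."""
    alloc = {channel: 0.0 for channel in coefs}
    for _ in range(int(total_budget // step)):
        best = max(
            coefs,
            key=lambda ch: revenue(coefs[ch], alloc[ch] + step) - revenue(coefs[ch], alloc[ch]),
        )
        alloc[best] += step
    return alloc


def cir(coefs, alloc):
    # Cost-income ratio: total spend divided by total revenue.
    cost = sum(alloc.values())
    income = sum(revenue(coefs[ch], budget) for ch, budget in alloc.items())
    return cost / income


coefs = {"channel_a": 2.0, "channel_b": 1.0}  # hypothetical channels
suggested = allocate(coefs, total_budget=100.0)
```

With concave response curves like these, the step-wise greedy choice is a reasonable baseline; real channel curves would have to be estimated from the time series first.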
  • 136. R&D and Innovation Recap & Outlook
  • 137. R&D and Innovation Recap & Outlook Results: ● Open-sourced a Python port of the R causal impact package ● A hackathon can create good insights and a kick-off BUT might create a false sense of success ● Understood that GA data is not complete ● Optimization TBD
  • 138. R&D and Innovation Recap & Outlook Brand clustering
  • 139. Brand Clustering Analysis ● The goal is to bring marketing insights on how users interact with brands, and to reduce the brand dimension ● We used Google Analytics (GA) actions for 2 days on the Dafiti website: 640.235 cookie sessions interacting with 276.923 skus of 4.825 brands. ● Top 4 most-interacted brands: ○ [('Colcci', 54948), ('Vizzano', 42715), ('Santa Lolla', 41401), ('Moleca', 34054)] ● Top 4 GA scores: ○ [('Beautiful Lingerie', 13.7532), ('Philco', 12.5799), ('#Euqfiz', 11.2241), ('Kmc', 9.1530)]
  • 141. Brand Clustering Analysis How are Dafiti brands related to others? ● 9 brands cluster -> ['Armadillo', 'DAFITI UNIQUE', 'Ki-fofo', 'Lua Luá', 'Mania De Moça', 'Meketrefe', 'Penguin', 'Red Life', 'Styll Baby'] ● 6 brands cluster -> ['Cavage', 'DAFITI JOY', 'La Beauté Cosmétiques', 'Miu Miu', 'Montain Boot', 'Refuse'] ● 10 brands cluster -> ['DAFITI ACCESSORIES', 'Enox', 'Khatto', 'Paul Ryan', 'Prorider', 'Secret', 'Sunnies', 'THOMASTON', 'Terra e Agua', 'Tilit'] ● 582 brands cluster -> ['...Lost', '100% Marca Própria', '3 Sprouts', ..., 'DAFITI I.D.', 'DC Original', 'DGK', 'DKNY', ...,'Sex and the City Cosmetics', ...,'Shoes Shoes', ...,'You Rock', 'Zebu', 'Zenit', 'Ziva'] ● 71 brands cluster -> ['Alta Villa Shoes', 'Asics', 'Ausländer', 'Beautiful Lingerie', 'Botswana', 'Bracciale Acessórios', 'Bull Motors', 'CZ Brand', 'Calcifran', 'Cisco', 'Columbia', 'Crocs', 'DAFITI EDGE', 'Dangelis Moda Íntima', ...,'Won Sports', 'Yardley', 'adidas', 'adidas Originals', 'adidas Performance', 'test', 'zeus'] ● 534 brands cluster -> ['24 Horas Calçados', ..., 'Bvlgari', ...,'Café Brasil', ...,'Cravo & Canela', ..., 'DAFITI', 'DAFITI SHOES', ...,'GUESS Kids', ...,'Harley-Davidson Footwear', ...,'Moleca', ...,'Santa Lolla', ...,'Tiffany & Co.', ...,'VIA UNO', ...,'Vizzano',...]
  • 142. The resulting clustering does not directly help much with marketing insights. Some changes are needed to provide direct business value: 1. Consider the problem as a recommendation task. 2. Implement changes to the “Marreco” system to provide an analysis of brand interactions. Brand clustering The 3 most similar brands to each Dafiti brand and their similarity scores (cosine): ● DAFITI I.D. - [('D-Tox', 0.3293), ('Monte Carlo Polo Club', 0.1646), ('Drop Life', 0.0856)] ● DAFITI SHOES - [('Moleca', 0.2111), ('Ana Cristina', 0.1580), ('Vizzano', 0.1565)] ● DAFITI EDGE - [('FKN', 0.0718), ('Lemon Grove', 0.0665), ('Yachtsman', 0.0366)] ● DAFITI - [('Ride Skateboard', 0.0666), ('Santa Maria', 0.0591), ('Snoopy', 0.0524)] ● DAFITI ACCESSORIES - [('Vila Flor', 0.0455), ('Prorider', 0.0435), ('Flyca Girls', 0.0334)] ● DAFITI UNIQUE - [('Shoulder', 0.0246), ('Energia', 0.0231), ('DAFITI ONTREND', 0.0166)]
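The cosine scores listed above compare brands by how similar their interaction patterns are. A minimal sketch, assuming each brand is represented as a sparse vector of interaction counts per session (the representation, brand names and session ids below are hypothetical; the slides do not describe the exact features used):

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(brand, vectors, top_n=3):
    """Return the top_n other brands ranked by cosine similarity to `brand`."""
    scores = [
        (other, cosine(vectors[brand], vec))
        for other, vec in vectors.items()
        if other != brand
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical interaction counts: brand -> {session id: interactions}
vectors = {
    "brand_a": {"s1": 1, "s2": 1},
    "brand_b": {"s1": 1, "s2": 1},
    "brand_c": {"s3": 1},
}
ranking = most_similar("brand_a", vectors)
```

Scores near 0, like most of those on the slide, mean the brands share few sessions, which fits the conclusion that the clustering gives little direct insight.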
  • 143. R&D and Innovation Recap & Outlook Filter cleanup analysis (sizes)
  • 144. R&D and Innovation Recap & Outlook Sizes children’s clothes
  • 145. Internal workshops R&D and Innovation Recap & Outlook
  • 146. R&D and Innovation Recap & Outlook
  • 147. R&D and Innovation Recap & Outlook Conclusion: ● we leveraged third party knowledge (consulting) to do the analysis ● few [marketplace] products are creating a very bad user experience ● has some potential quickwins ● we need to align the best form in terms of architecture (first idea of DB update might not be ideal) - product development support? ● What can we fix in registration process already?
  • 149. AI awareness Sales forecasting (train new model with learnings from blackfriday forecasting) Price optimization ng and Buying Marketing allocation Cancellation rate Email click prediction (recipient selection, Markovien) Customer segmentation / user profiles Online recommendations Search Reinforcement learning Survival analysis Email recommendations (jetlore competition) Image similarity Delivery prediction Delivery visualization Image segmentation Anticipatory shipping Intelligent Sizing NLP for chatbots and sentiment analysis Looks, Image understanding and shoppable videos VR personalized discounts R&D and Innovation Recap & Outlook