Meetup duchess 20160119 - Leboncoin de la data

An experience report on the transition from a BI architecture to a big data architecture.
These slides describe what leboncoin is currently working on.

Published in: Data & Analytics

  1. LEBONCOIN DE LA DATA Stéphanie Baltus – Head of Data Engineering – @steph_baltus, Meetup Duchess France @ TheFamily – 01/19/2016
  2. PLAN ■ About leboncoin ■ Data, data everywhere! ■ To infinity and beyond…
  3. ABOUT LEBONCOIN
  4. LEBONCOIN... AND FRIENDS
  5. (no text on this slide)
  6. IN A FEW WORDS ■ A Schibsted Media Group company ■ Since 2006 ■ 320+ people ■ Located in Paris, Montceau-Les-Mines, Reims ■ 2014 revenue: 150+ M€
  7. NOT JUST A WEBSITE
  8. NOT JUST A CLASSIFIED ADS COMPANY ■ Classified ads: ■ Professional ■ Personal ■ Premium offer: ■ Highlight products ■ Ad import tools ■ Ad display
  9. DATA, DATA EVERYWHERE
  10. A BIT OF HISTORY ■ Building a team ■ Provide a daily batch DWH ■ Website traffic (sort of) ■ Ad activity & validation ■ Sales & Coin usage ■ User information ■ Support ■ Try near-real-time processing
  11. SO, WE DID SOME BI STUFF (2012-2015)
  12. IT LOOKS LIKE THIS
  13. IT WORKS! BUT… ■ A lot of uncovered scope ■ Incremental load only ■ Inability to load historical data; stuck from 2013 to today ■ A business team unable to query the database ■ A lot of "no!" when asked for changes ■ Vertical scalability only ■ No possible data sharing with the product (website, app, CRM, …)
  14. TO INFINITY AND BEYOND!
  15. THE FUTURE ■ Share data services with the website and apps ■ Build a single source of truth ■ Provide raw data to our analysts ■ Provide real-time data ■ Cover the whole data scope of leboncoin
  16. FUNCTIONAL ARCHITECTURE
  17. DATA ARCHITECTURE: DUMBO STYLE
  18. ONE STACK TO RULE THEM ALL
  19. SOME IMPLEMENTATIONS ■ Centralized data cleaning / streamlining ■ Extended analytics apps ■ Ad and customer indexes ■ Ad import web service ■ Datalake indexing through Bloom filters ■ Anomaly detection
  20. CATCH ’EM ALL! ■ Goal: help the SysAdmin team catch bots crawling our website and apps to steal our ads or people’s phone numbers => anomaly detection ■ How: ■ Use HTTP logs (150 GB per day) ■ Build KPIs and vectors ■ Apply a logistic regression to identify suspicious sessions ■ Next steps: ■ Test the K-Means algorithm
  21. DIVE INTO DATA SHARING ■ Unified data view ■ Home-built data extractor + Spark MDM jobs ■ Build a next-generation BI app ■ Spark ETL + Redshift ■ Share built information with other apps ■ Spark ETL + ES + Kafka
  22. NOW IT LOOKS LIKE THIS
  23. WHAT’S NEXT? ■ Being production-ready ■ New apps, new services ■ More machine-learning-oriented apps ■ Feeding the website ■ Recruiting
  24. QUESTIONS?
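
Slide 20 outlines the bot-detection approach: build per-session vectors from HTTP logs, then apply a logistic regression to flag suspicious sessions. The slides do not show the features or the code, so the following is only a minimal sketch of that idea in plain Python. The three features, the toy training data, and the names `train` and `predict_proba` are all assumptions made for illustration; features are assumed to be rescaled to the range 0 to 1.

```python
import math

# Hypothetical per-session features, rescaled to [0, 1] (not from the slides):
# [request rate, share of ad-detail pages, share of requests without referrer]
TRAIN_X = [
    [0.05, 0.20, 0.10],  # human-like sessions
    [0.10, 0.30, 0.00],
    [0.08, 0.25, 0.20],
    [0.90, 0.95, 0.90],  # bot-like sessions
    [0.80, 0.90, 1.00],
    [0.95, 0.85, 0.80],
]
TRAIN_Y = [0, 0, 0, 1, 1, 1]  # 1 = suspicious session (made-up labels)


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Probability that session vector x is suspicious
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the mean log-loss
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train(TRAIN_X, TRAIN_Y)
for session in ([0.85, 0.90, 0.90], [0.07, 0.20, 0.10]):
    print(session, round(predict_proba(weights, bias, session), 3))
```

At the log volume quoted on the slide (150 GB per day), a distributed library would be the more realistic choice; the hand-written training loop here only keeps the example self-contained.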

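Slide 19 mentions "datalake indexing through Bloom filters" without further detail. As a rough illustration of the data structure only, here is a small Bloom filter in plain Python. How leboncoin actually uses it is not described in the slides; the class name `BloomFilter`, its parameters, and the idea of keeping one filter per datalake partition to skip partitions that cannot contain a given ad ID are assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present"
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical usage: one filter per partition, keyed by ad ID
partition_index = BloomFilter()
for ad_id in ("ad-1001", "ad-1002", "ad-1003"):
    partition_index.add(ad_id)

print(partition_index.might_contain("ad-1002"))
print(partition_index.might_contain("ad-9999"))
```

A `False` answer lets a reader skip the partition entirely, while a `True` answer still requires reading it, since false positives are possible.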