About VisualDNA Architecture @ Rubyslava 2014
Michal Hariš : michal.haris@visualdna.com
- Technical Architect, joined VisualDNA in 2012
Where were we 3 years ago
● 10 people working around one MySQL table holding 50M+ user profiles
● LAMP architecture
SCALABILITY ISSUES
DECISION TO GO BIG (DATA) !
Where were we 18 months ago
● 30-strong team, of which a single tech team of roughly 15 people
● Basically a batch architecture
  ● just not MySQL but Cassandra + Hadoop at the back
  ● HTTP + PHP trackers with a piped custom log batch process
  ● S3 upload every 5 min
  ● daily HDFS distcp
  ● POC = daily Hadoop inference -> 6-node Cassandra -> batch integrations
  ● POC was a daily batch job which on bad days took 30 hours
● One of the first commercial Cassandra clusters in the world
  ● very unstable
Where are we today
● Stack
  ● Java
  ● Scala
  ● Hadoop
  ● Cassandra
  ● Kafka
  ● Redis
  ● R
  ● AngularJS for the front-end
Where are we today
● Auto-scaling geo-located Tracker Clusters - well, almost auto-scaling
● Robust Streaming Infrastructure - aggregation of all data streams in central infrastructure
  ● bringing in 8.5k events/second at peak
  ● These are primary events which get multiplied by various speed-layer …
● Real-time end-user products, scoring services, integrations with third parties where possible, pre-computation infrastructure that scales more predictably
● ETL Pipeline - offloading data streams and pre-computing materialised views onto HDFS > 30TB of primary data
  ● some data we keep only for the last 60 or 90 days, other data we keep forever
● Decision Analytics Pipeline (or RD Pipe) > 100TB+ of secondary data
  ● Using feature-extraction machine learning methods
Where are we today
● Still one Cassandra ring, just bigger and more stable: 16 nodes, 250M+ active user profiles
● Lambda Architecture for real-time products like WHY Analytics
  ● RD Pipe is the "batch" layer (daily) that generates active profiles in Cassandra ("view layer")
  ● The Enrichment service enriches primary events with the user profiles produced daily ("speed layer")
  ● A combination of probabilistic counters and Redis cubes calculates the current audience profiles for subscribed websites ("speed layer")
  ● An API on top of the Redis cubes serves the current audience profiles to the front-end suite of real-time analytics products ("serving layer")
  ● The Audience Analytics product suite is the good-looking bit - http://www.visualdna.com/why/
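The slides do not say which probabilistic counter is used for the audience profiles. As an illustration only, the sketch below assumes linear counting, one of the simplest estimators of the number of distinct users seen: each user id is hashed to one bit of a fixed-size bitmap, and the count is estimated from the fraction of bits still unset. The class name, the id format and the bitmap size are hypothetical, and the Redis cube side is not modelled.

```java
import java.util.BitSet;

// Linear counting: approximate distinct count in a fixed amount of memory.
// An assumption for illustration, not the counter the talk refers to.
final class LinearCounter {
    private final BitSet bits;
    private final int size; // number of bits; must be a power of two

    LinearCounter(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Record one sighting of a user; repeated sightings set the same bit.
    void add(String userId) {
        bits.set(mix(userId.hashCode()) & (size - 1));
    }

    // Estimate n = -m * ln(V), where V is the fraction of bits still zero.
    long estimate() {
        int zeros = size - bits.cardinality();
        if (zeros == 0) {
            throw new IllegalStateException("bitmap saturated; use a larger size");
        }
        return Math.round(-size * Math.log((double) zeros / size));
    }

    // 32-bit finalizer from MurmurHash3, used to spread String.hashCode values.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 16;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        LinearCounter visitors = new LinearCounter(1 << 16);
        for (int round = 0; round < 2; round++) {
            for (int i = 0; i < 1000; i++) {
                visitors.add("user-" + i); // hypothetical id format
            }
        }
        // 2000 sightings of 1000 distinct users; the estimate should be close to 1000.
        System.out.println(visitors.estimate());
    }
}
```

The appeal for a speed layer is that memory stays fixed (8 KB here) regardless of traffic, at the cost of a small estimation error that grows as the bitmap fills up.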
Where are we today
● 120-strong team, of which tech is roughly 60:
  ● Sysadmin Team
  ● Architecture Tech Team
  ● Decision Analytics Tech Team
  ● Consumer Tech Team
  ● WHY Analytics Team
What have we learned
● Architecture:
  ● Updating JSON blobs in Cassandra columns is a trap
    ● Logging is better http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  ● Metrics are crucial in large distributed systems
    ● yammer metrics + graphite + icinga works well for infrastructure
    ● but complex event/anomaly detection and pattern analysis gives the edge
  ● Real-time processing of data streams is not only cool, but scales well ... until you find a bottleneck in a single component which will limit the entire system
  ● Batch still matters
    ● but it could be much faster than with Hadoop, which suffers from too much redundant I/O and requires a coordinated ETL pipeline
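To make the "JSON blob vs. log" point above concrete, here is a minimal in-memory sketch of the log-based alternative: instead of reading a profile blob, modifying it and writing it back, each change is appended as an immutable event, and the profile is derived by replaying the events. This is an illustration under assumptions, not VisualDNA's implementation; the class, the field names and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Append-only profile log: writers only add events, readers fold them into a view.
final class ProfileLog {
    // One immutable fact about a user (hypothetical shape).
    static final class Event {
        final String userId;
        final String attribute;
        final String value;

        Event(String userId, String attribute, String value) {
            this.userId = userId;
            this.attribute = attribute;
            this.value = value;
        }
    }

    private final List<Event> log = new ArrayList<>();

    // Appending never reads or rewrites existing data, so there is no
    // read-modify-write cycle that concurrent writers could interleave.
    void append(Event event) {
        log.add(event);
    }

    // Rebuild the current profile by replaying events in order; later events win.
    Map<String, String> profileOf(String userId) {
        Map<String, String> profile = new TreeMap<>();
        for (Event e : log) {
            if (e.userId.equals(userId)) {
                profile.put(e.attribute, e.value);
            }
        }
        return profile;
    }

    public static void main(String[] args) {
        ProfileLog log = new ProfileLog();
        log.append(new Event("u1", "interest", "sport"));
        log.append(new Event("u1", "age", "25-34"));
        log.append(new Event("u1", "interest", "travel")); // supersedes "sport"
        System.out.println(log.profileOf("u1")); // {age=25-34, interest=travel}
    }
}
```

In a real system the list would be a durable, partitioned log and the replay would be incremental, but the design choice is the same: the log is the source of truth and any stored profile is a derived view that can be rebuilt.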
What have we learned
● Engineering:
  ● the Unix philosophy of building short, simple, clear, modular, and extensible code applies to the design of distributed systems too, not just to an OS
  ● bad tests are better than no tests, but they are still bad, and most tests only test the positive outcome
    ● the story of Math.abs() -> it can actually return a negative number -> none of the unit tests anticipated this -> which is why metrics and systems with feedback control are crucial
● Process:
  ● It is possible to co-operate remotely even on complex and not well-defined systems - at the moment some of the architecture team works remotely on a permanent basis
  ● QA is intrinsic to Architecture and local to products
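The Math.abs() story above refers to a documented corner case in Java: Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE, which is negative. The slides do not say where it bit the team; the sketch below assumes a common scenario, mapping a hash to a bucket index, and the class and method names are hypothetical.

```java
// Demonstrates how Math.abs can return a negative number, and one safe alternative.
final class SafeBucketing {
    private SafeBucketing() {}

    // Naive: for hash == Integer.MIN_VALUE, Math.abs overflows and stays
    // negative, so the resulting bucket index is negative as well.
    static int naiveBucket(int hash, int buckets) {
        return Math.abs(hash) % buckets;
    }

    // Safe: Math.floorMod returns a value in [0, buckets) for any int hash
    // when buckets is positive.
    static int safeBucket(int hash, int buckets) {
        return Math.floorMod(hash, buckets);
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(Integer.MIN_VALUE));        // -2147483648
        System.out.println(naiveBucket(Integer.MIN_VALUE, 10)); // -8
        System.out.println(safeBucket(Integer.MIN_VALUE, 10));  // 2
    }
}
```

Only one input in about four billion triggers the problem, which is why example-based unit tests tend to miss it while production metrics on a high-volume stream eventually surface it.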
Interesting issues we’re facing
1. SLAs vs. start-up dynamics - separate process (and to some degree architecture) for different levels of service guarantee
2. Globally distributed, highly available API for random access to our profiles - enabling on-demand decisions based on VDNA profiles
3. Our Lambda has a bottleneck at the enrichment point
   - although if we solve (2.) we will be half-way through
4. Complex data pooling attribution model
5. Cassandra still gives us some pain - it's the drivers! - interesting reading about consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra/
6. Preserving start-up dynamics and culture in a company of 200+ with offices in several cities
We’re hiring for the Bratislava office!
● We’re looking for engineers, analysts and more to be based in Bratislava
careers-cee@visualdna.com
