When building a data team from scratch or inheriting an existing one, there are plenty of questions to ask about how to deliver on our mission to the company. Should data engineering be part of the data organization, or does it sit better with the engineering team? "Data scientist" is a job title that means different things at different companies; what does it mean to us? Are we aligned around platforms or functions? What's our strategy for data governance and compliance? And those are just a few.
This talk will present some insights from prior experience on structuring data teams, both at startups and larger legacy organizations, covering examples that have been both successful and not so successful, and lessons learned in each case.
2. Why do I like talking about building data teams?
I’ve done it a lot!
I’m currently on round 3
● Round 1: Timeline Labs - Seed-stage startup
● Round 2: Tribune Publishing / tronc - Large legacy organization
● Round 3: Los Angeles Times - Slightly smaller legacy organization
Every organization is different. What works at one company may need to be modified for another.
3. What makes any organization successful?
● Performance: Deliver projects on time that meet the needs of the business
● Agility: Easily adjust team goals and work tasks to meet changing business needs
● Communication: Intra- and inter-team communication is critical to any complex project's success
● Collaboration: Successful teams don't work in a bubble; they work hand in hand with many different parts of the business
4. Data orgs often face unique challenges compared to other technical teams
● Analytics groups can easily get stuck in a service request mode rather than being product focused
● Machine learning groups often work from business requirements, but the exact solution is generally not understood until some research and prototyping is done
● Frameworks for deploying data and ML solutions haven't standardized; solutions are often one-off even inside a single company or team, and certainly vary across companies
5. What might hinder us from reaching our goals?
There are always restrictions placed on us that limit what we can do to achieve our goals
● Budgets and Headcount
● Proving ROI (particularly when it comes to new ML groups)
● Existing structures and processes
6. How we are building our data org at the LA Times
7. Why have data engineering and data science in the same org?
Benefits of bringing together DE and DS
● Improve data scientists' developer chops
● Increase alignment between engineering and data science
● Reduce challenges in getting machine learning models into production
● Increase ownership of engineering guidance to data science
Case Study: Failures in delivering meaningful ML models at Timeline Labs (TLL)
Challenges
● Timeline to develop models too long compared to product needs
● Platform Engineering and Data Science in different orgs
8. Consumer Research?
Case Study: New approach at Los Angeles Times
What does a standard big data approach tell us from this chart?
9. Consumer Research
Case Study: New approach at Los Angeles Times
Consumer research - what is going on in the bottom left?
10. Putting a more rigorous process around data science
● Descriptive Analytics: Explore your data. Find some ideas. Gain inspiration. CANNOT BE USED FOR BUSINESS DECISIONS
● Decision Science: Take the ideas found above and validate that they are actually true
● Machine Learning: Automate the ideas that were validated to drive true business value
Cassie Kozyrkov - Chief Decision Scientist at Google
11. Not all data scientists are the same
Data Science is a pretty vague term
Is the "Type A - Analysis" and "Type B - Building" view of the world* enough?
*Michael Hochster (Stitch Fix) and Robert Chang (Airbnb)
● Machine Learning Engineer
● Decision Scientist
● Descriptive Analyst
12. How do we take action on different types of data scientists?
Machine Learning Engineer
Descriptive Analyst Descriptive Analyst
13. How do we take action on different types of data scientists?
We don’t have any Decision Scientists
● Overall org isn’t big enough
● Work is not full-time
Tribune Publishing - Made Data Science team the official testing team
● Majority of A/B testing was done by marketing group
● Found many key issues with more complicated scenarios
○ Missing clear hypotheses
○ Test structure sometimes incorrect (too many changing variables)
● Created a test plan document; data science and marketing had to agree on each plan before it got the data science stamp of approval
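The two issues listed above (missing hypotheses, too many changing variables) can be checked mechanically before a plan is signed off. The sketch below is only an illustration, not the actual Tribune Publishing document: the `ExperimentPlan` fields and the `review_plan` helper are hypothetical names, and the one-variable-per-test rule is an assumption that fits simple A/B tests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentPlan:
    """Hypothetical test plan record; the field names are illustrative only."""
    name: str
    hypothesis: str  # what we expect to change, and why
    changed_variables: List[str] = field(default_factory=list)


def review_plan(plan: ExperimentPlan) -> List[str]:
    """Return the issues that block sign-off; an empty list means approved."""
    issues = []
    # Issue 1 from the slide: the plan has no clear hypothesis.
    if not plan.hypothesis.strip():
        issues.append("missing a clear hypothesis")
    # Issue 2 from the slide: more than one variable changes at once.
    # Assumption: a simple A/B test should vary exactly one thing.
    if not plan.changed_variables:
        issues.append("no changed variable listed")
    elif len(plan.changed_variables) > 1:
        issues.append(
            "too many changing variables: " + ", ".join(plan.changed_variables)
        )
    return issues


if __name__ == "__main__":
    # Made-up example plan, for illustration only.
    draft = ExperimentPlan(
        name="example-test",
        hypothesis="",
        changed_variables=["price", "headline"],
    )
    for issue in review_plan(draft):
        print(issue)
```

A plan that returns an empty list would be the point at which data science gives its stamp of approval; anything else goes back to the requesting team for revision.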
Changes to hiring practices
● Distinct JDs for each role
● Different skill sets for Talent Acquisition to look for
14. What do these candidates look like at LAT?
Machine Learning Engineer
● Generally STEM background
● Experience with ML in production systems
● Stronger coding skills
● Has to know Python (or whatever DE is using)
● Desire to learn more developer skills
Descriptive Analyst
● Broader backgrounds (MBA, econ, DS training programs)
● R or Python
● Skill set emphasis
○ Data interpretation
○ Communication and presentation
○ Data visualization
15. Data Governance and Compliance
VPPA, GDPR, CCPA
Where should this role sit? Legal? Compliance? Security? Data?
Why I like the data org:
● Focus on data documentation (lineage, dictionaries, etc.)
● Increases awareness
○ For the rest of the org: reinforces that they need to think about this in their daily jobs
○ For Compliance / Governance: faster discovery of what work is being done, and earlier injection of suggestions into the development process