Data Science &
Culture
(Or how to stop worrying and love data driven culture)
Ícaro Medeiros
Data Science Forum
São Paulo, Jun 2017
Inspired by
(not limited to)
refs
Big Data
http://www.kdnuggets.com/2017/02/origins-big-data.html
✦ Building blocks: advances in CS, e.g.
distributed systems, databases, massive AI, etc.

✦ Fuzzy concept, ill-defined

✦ Popularized by Gartner

(hype-fueled consulting firm)
✦ Big Data no longer considered an emerging
technology (pervasive in industry)

✦ Entered Trough of Disillusionment in 2013
https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
http://www.mikelnino.com/2016/03/chronology-big-data.html
Chronology of antecedents
Data science
✦ Statistics (late 19th century)

✦ Computer Science (1950s)

✦ Machine Learning (1950s)

✦ Data Mining (1990s)

✦ Data Science (2010s)
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
yet another hyped term
Beware: controversy
✦ Data science is not all-science
✴ It’s getting more and more engineering-like, a practice

✴ Data storytelling is a creative endeavor
✦ Hyper-inflated expectations, misunderstood
concepts and hurry to get value: a dangerous
recipe
A new hope
machine learning
big data
https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
or hype
Hype: not that bad
✦ Haters gonna hate i.e. don’t fully hate the hype

✴ more practitioners = faster tech and processes evolution
✴ Highly skilled professionals and innovation

✦ Academics sometimes look for difficult unwanted
problems

✴ Industry is more pragmatic, especially in tech
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
What we need…
✦ Forget about Big Data pokémons

✴ OH so in Big Data we don’t need people to think about schemas?

✦ Forget about misunderstood business expectations

✴ OH in deep learning we don’t need people to train models?

✦ You need PEOPLE

✴ Collaborating with shared values

✴ Awesome in tech but more importantly: CREATIVE
Shared values
and practices
Culture
Good people
✦ People are more important than ideas

✴ A mediocre team will screw up a good idea

✴ Give a mediocre idea to a great team: they will fix it or rethink it

✦ A good lab: different kinds of autonomous thinkers

✴ Why hire smart people if they can't fix what’s broken?

✦ Prefer a heterogeneous and complementary team
instead of looking for unicorns
The mythical 10x professional
https://twitter.com/icaromedeiros/status/838968884023668737
Good communication
✦ Honesty, excellence, originality and self-criticism (values)

✦ Communication structure should not mirror the organizational structure

✦ Be ready to hear the truth

✴ Sincerity is only valuable if people are open and willing to give
up on ideas that will not work

✦ Braintrust: Leave ego and Jobs outside the door
Power to the people!
✦ Product quality is everyone’s responsibility
✴ Don’t ask permission to take responsibility

✦ Passion and excellence versus autonomy

✦ Good things might shadow the bad

✴ People struggle to explore bad things to avoid being called
“complainers”
Rebels
http://qaspire.com/2017/05/19/sketchnote-what-rebels-want-from-their-boss/
Destroy data silos!
✦ Without information about data there is no science

✦ Software and data should be a collective property
within the company

✦ Knowledge management matters

✦ Communication between areas must be enforced
Data portals
✦ Self-service platforms to publish datasets

✴ Descriptions, schemas, samples, relations between datasets,
etc

✦ Open Data initiatives, mostly governments

✦ OSS platforms: CKAN, Airbnb’s Dataportal

✦ Examples: data.gov.uk, dados.gov.br, etc
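As a rough illustration of what a portal entry could hold, here is a minimal sketch. The `DatasetEntry` class, its fields and the `page_views` example are hypothetical and are not taken from CKAN or Airbnb's Dataportal; they only mirror the items listed above (description, schema, samples, relations).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetEntry:
    """One catalog record in a hypothetical self-service data portal."""
    name: str
    description: str
    schema: Dict[str, str]                            # column name -> type
    sample: List[dict] = field(default_factory=list)  # a few example rows
    related: List[str] = field(default_factory=list)  # names of linked datasets

    def columns_without_sample(self) -> List[str]:
        """Return schema columns that never appear in the sample rows."""
        seen = {key for row in self.sample for key in row}
        return [col for col in self.schema if col not in seen]


entry = DatasetEntry(
    name="page_views",
    description="One row per page view, partitioned by day.",
    schema={"user_id": "string", "url": "string", "viewed_at": "timestamp"},
    sample=[{"user_id": "u1", "url": "/home"}],
    related=["users"],
)
print(entry.columns_without_sample())
```

A check like `columns_without_sample` is one cheap way to spot datasets that were published without enough context for others to reuse them.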
“When it comes to creative
inspiration, job titles and
hierarchy are meaningless”
Data storytelling
✦ Explain what the numbers say in clear, layman's terms

✦ Make hidden premises clear

✴ Outside data insights

✦ Convince others about actions

✴ Decreases insights-to-value interval
✦ From data to knowledge
https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
What is creativity
✦ Unexpected connections of concepts and ideas

✦ It's a marathon, it needs rhythm

✦ Creativity must start somewhere and there’s power
in healthy feedback in an iterative process
Visual communication
✦ Clean straightforward graphs > visually appealing

✴ Choose dataviz libs wisely

✦ “Don’t make me think”

✦ The right graph for the right audience

✴ Prefer a language everyone understands
Visual communication 101
Stats are not enough
https://www.autodeskresearch.com/publications/samestats
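The same point can be checked in a few lines. The linked work uses its own generated datasets; the sketch below swaps in the classic Anscombe's quartet (datasets I and II) instead, as an assumption made for brevity. Both series have nearly the same mean, variance and correlation with x, yet one is roughly linear and the other is a curve.

```python
from statistics import mean, variance

# Anscombe's quartet, datasets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Mean of y, sample variance of y, and correlation between x and y."""
    return mean(ys), variance(ys), pearson(xs, ys)


print(summary(x, y1))
print(summary(x, y2))
```

The summaries agree to two decimal places, so only a plot reveals that the two datasets behave differently.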
Strategy
Avoid egotrip data science
✦ “OH my cluster has 10 Petabytes, I’m awesome”

✦ Fancy ML algorithms are not the goal

✦ The most important V in Big Data is value
https://twitter.com/amyhoy/status/847097034536554497
KPI versus HiPPO
✦ Tech adoption per se is meaningless

✴ Slide-driven Big Data

✴ KPIs should grow from Big Data and data insights initiatives

✦ Poorly defined goals -> bad decisions

✦ Define viable but ambitious goals

✦ Data beats opinion
Set goal, plan and GO!
✦ Business questions can't be like “OH we want to
detect things related to millennials”

✦ Clear goals must be set, with actionable metrics

✦ Balance perfect models versus time-to-market

✦ Brad Bird: “Sometimes, as a director, you’re
guiding. Sometimes you’re letting the car drive”
https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
The process
✦ The process is not the goal

✴ It has no agenda or taste, it’s just a tool

✦ Quality is the best business plan

✦ Agile is a mindset: not only kanbans or scrum

✦ If the model will become operational, mix scientists
and engineers from the start
Build vs Buy
✦ If you buy and your core business is not techie, you can be
illiterate in tech
✴ Benchmark before buying

✴ Accelerate results and boost internal knowledge

✦ If you build and have a good-enough techie culture, you’re
more or less good to go

✴ Assess pros and cons consciously

✦ If you surf the tech hype AND build good systems you’re
awesome
https://twitter.com/Doug_Laney/status/847452219641356288
When data goes to vendors…
http://www.louisdorard.com/machine-learning-canvas/
DATA
ENGINEERING
Big Data vs Great Data
✦ If your logical models do not make sense

✦ Most performed queries are slow

✦ If you have string-only databases

✦ If you have unused expensive data

✦ Maybe your data lake is a swamp
“The data is a mess”
✦ First step: accelerate human understanding of data

✴ Metadata, context, hidden assumptions

✦ Datasets might serve multiple purposes

✴ Define rationale and context

✴ Data portals and understandable datasets > Dashboards
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
Data lost in translation
✦ Heterogeneous and siloed databases (and people)

✦ Rethink ESB (microservices network)

✦ State-of-the-art: data workflow

✴ Luigi, Airflow (open source), almost every big tech vendor

✴ Transparency, reusability, reproducibility, traceability

✴ Automation and monitoring all the way!
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
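To make the workflow idea concrete without tying it to Luigi or Airflow, here is a library-free sketch that uses only Python's standard `graphlib` module (Python 3.9+). The task names and the `run_pipeline` helper are hypothetical; real workflow tools add scheduling, retries and monitoring on top of this dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_events": set(),
    "extract_users": set(),
    "join_sessions": {"extract_events", "extract_users"},
    "train_model": {"join_sessions"},
    "publish_report": {"join_sessions", "train_model"},
}


def run_pipeline(deps, actions):
    """Run every task after its dependencies and return the execution log."""
    log = []
    for task in TopologicalSorter(deps).static_order():
        actions.get(task, lambda: None)()  # no-op when no action is registered
        log.append(task)                   # the log records what ran, in order
    return log


executed = run_pipeline(dependencies, actions={})
print(executed)
```

Declaring dependencies as data, rather than burying them in scripts, is what makes the pipeline transparent and reproducible.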
Beyond relational models
✦ Not all data problems fit well in traditional SQL or
DW models

✴ Key-value, columnar, graph-based, inverted index, etc

✦ Models are a framework for problem-solving
✴ Not the ultimate answer

✴ There’s no one-size-fits-all model
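As one example of a non-relational model, an inverted index maps each word to the documents that contain it, which makes text lookups cheap. This is a toy sketch: the `build_inverted_index` function and the sample documents are made up for illustration, and real search engines add tokenization, ranking and much more.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = {1: "big data hype", 2: "great data culture", 3: "culture beats hype"}
index = build_inverted_index(docs)

print(sorted(index["data"]))                     # documents mentioning "data"
print(sorted(index["culture"] & index["hype"]))  # documents mentioning both words
```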
Do not forget fluency
✦ Check the company lingua franca

✦ Make it easy for critical decision-makers

✴ Ad hoc SQL queries?

✴ Dashboards?

✴ Reports?
EXPERIMENTATION
Experiments
✦ Missions to discover facts towards understanding

✴ They don’t fail, any result produces new information

✴ If the initial theory was wrong: good

✴ With new facts you can reformulate the question

✦ Get more modeling questions asked more often

✦ Iterative data science
Product experimentation (A/B)
✦ Product experimentation should be hypothesis-driven
(not feature-driven)

✦ Define the proper exposed population
✴ No new users, no heavy users only, no early adopters

✦ Understanding effect is essential
https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
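A hypothesis-driven experiment needs a way to judge whether the observed effect could be noise. One common choice, assumed here and not prescribed by the slides, is a two-proportion z-test on conversion rates. The function name and the numbers below are illustrative only.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)           # rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))     # standard normal CDF at |z|
    return z, 2 * (1 - normal_cdf)


# Hypothetical experiment: control converts 100 of 1000, variant 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

The test only answers "is there an effect?"; understanding why the effect happens still requires looking at who was exposed and how they behaved.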
5 stages of A/B tests
https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
Some other quick tips
✦ Focus on outcomes (not algorithms or methods)

✦ Design the right metric and evaluation
✦ Good experiments don't produce obvious insights

✦ Mix of data and intuition
https://twitter.com/mrdatascience/status/869957499662860288
Being data driven
✦ Be BAYESIAN - uncertainty is everywhere

✦ Be CURIOUS - keep learning
✦ Be AGILE - Fail fast, not too fast: evidence comes first
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
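A small sketch of what "being Bayesian" can look like in practice, assuming a Beta prior on a success rate and binomial data (a modeling choice made for this example, not something the slides specify). The same 30% observed rate yields very different uncertainty depending on how much evidence backs it.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var ** 0.5


prior = (1, 1)  # uniform prior: no opinion about the rate yet
small = update_beta(*prior, successes=3, failures=7)      # 10 observations
large = update_beta(*prior, successes=300, failures=700)  # 1000 observations

print(beta_mean_std(*small))
print(beta_mean_std(*large))
```

Both posteriors center near 0.3, but the standard deviation shrinks as evidence accumulates, which is the quantity a decision-maker should see alongside the point estimate.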
Being data driven
✦ Be TRUTHFUL - don’t torture data to please opinions

✦ Be HELPFUL - work across silos, support democracy
✦ Be WISE - know when to be analytical or intuitive
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
With the right people,
Democracy,
Creativity,
Strategy,
Big Great Data™
and Experiments
there's a good chance to do great
SCIENCE
Take-away message
Ícaro Medeiros
Data Scientist
icaromedeiros

Más contenido relacionado

La actualidad más candente

The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data ScienceEMC
 
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...ux singapore
 
Lessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsLessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsGregory Kamradt
 
Science in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureScience in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureBenjamin Laken
 
Less is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OLess is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OMichael Roytman
 
Data and Algorithmic Bias in the Web
Data and Algorithmic Bias in the WebData and Algorithmic Bias in the Web
Data and Algorithmic Bias in the WebWebVisions
 
Mental Health Informatics - What we can learn from the past and where we can be
Mental Health Informatics - What we can learn from the past and where we can beMental Health Informatics - What we can learn from the past and where we can be
Mental Health Informatics - What we can learn from the past and where we can beHimanshu Tyagi
 
The data science handbook pre release (1)
The data science handbook   pre release (1)The data science handbook   pre release (1)
The data science handbook pre release (1)Lakshmi Prasanna
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science processMathieu d'Aquin
 
Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Tim O'Reilly
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-surveyAdam Rabinovitch
 
Trusting a Distributed Data Pipeline | Masters of Conversion
Trusting a Distributed Data Pipeline | Masters of ConversionTrusting a Distributed Data Pipeline | Masters of Conversion
Trusting a Distributed Data Pipeline | Masters of ConversionVWO
 

La actualidad más candente (17)

The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data Science
 
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...
Lightning Talk #12:7 cognitive biases we shouldn’t ignore in research by Ruth...
 
Lessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsLessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science Interviews
 
Science in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureScience in the context of journals, Open, and the future
Science in the context of journals, Open, and the future
 
Less is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/OLess is More: Behind the Data at Risk I/O
Less is More: Behind the Data at Risk I/O
 
Data and Algorithmic Bias in the Web
Data and Algorithmic Bias in the WebData and Algorithmic Bias in the Web
Data and Algorithmic Bias in the Web
 
Mental Health Informatics - What we can learn from the past and where we can be
Mental Health Informatics - What we can learn from the past and where we can beMental Health Informatics - What we can learn from the past and where we can be
Mental Health Informatics - What we can learn from the past and where we can be
 
The data science handbook pre release (1)
The data science handbook   pre release (1)The data science handbook   pre release (1)
The data science handbook pre release (1)
 
A data view of the data science process
A data view of the data science processA data view of the data science process
A data view of the data science process
 
Big data to big understanding
Big data to big understandingBig data to big understanding
Big data to big understanding
 
Designing Data for Dignity StrataRx
Designing Data for Dignity StrataRxDesigning Data for Dignity StrataRx
Designing Data for Dignity StrataRx
 
Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)Solving the Wanamaker Problem for Healthcare (keynote file)
Solving the Wanamaker Problem for Healthcare (keynote file)
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-survey
 
Connect, communicate, collaborate
Connect, communicate, collaborateConnect, communicate, collaborate
Connect, communicate, collaborate
 
How Change Happens
How Change HappensHow Change Happens
How Change Happens
 
Small data big impact
Small data big impactSmall data big impact
Small data big impact
 
Trusting a Distributed Data Pipeline | Masters of Conversion
Trusting a Distributed Data Pipeline | Masters of ConversionTrusting a Distributed Data Pipeline | Masters of Conversion
Trusting a Distributed Data Pipeline | Masters of Conversion
 

Similar a Data Science and Culture

How AI is revolutionizing the world
How AI is revolutionizing the worldHow AI is revolutionizing the world
How AI is revolutionizing the worldSK Reddy
 
Minne analytics presentation 2018 12 03 final compressed
Minne analytics presentation 2018 12 03 final   compressedMinne analytics presentation 2018 12 03 final   compressed
Minne analytics presentation 2018 12 03 final compressedBonnie Holub
 
Minne analytics presentation 2018 12 03 final compressed
Minne analytics presentation 2018 12 03 final   compressedMinne analytics presentation 2018 12 03 final   compressed
Minne analytics presentation 2018 12 03 final compressedBonnie Holub
 
Big Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its powerBig Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its powerLucas Group
 
DevSecOps Through Blunt Force Trauma, I'm the Trauma
DevSecOps Through Blunt Force Trauma, I'm the TraumaDevSecOps Through Blunt Force Trauma, I'm the Trauma
DevSecOps Through Blunt Force Trauma, I'm the TraumaDevOpsDays DFW
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...DATAVERSITY
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey? Aarthi Srinivasan
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesDATAVERSITY
 
15 Ways to Stand Out on Your IT Team
15 Ways to Stand Out on Your IT Team15 Ways to Stand Out on Your IT Team
15 Ways to Stand Out on Your IT TeamAll Things Open
 
Maximizing Business Connections Through Social Media
Maximizing Business Connections Through Social MediaMaximizing Business Connections Through Social Media
Maximizing Business Connections Through Social Mediadrewblue
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsJohn Blossom
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a professionJose Quesada
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...Kai Wähner
 
Why Your Data Management Strategy Isn't Working (and How to Fix It)
Why Your Data Management Strategy Isn't Working (and How to Fix It)Why Your Data Management Strategy Isn't Working (and How to Fix It)
Why Your Data Management Strategy Isn't Working (and How to Fix It)DATAVERSITY
 
DataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDATAVERSITY
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Chris Dagdigian
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunitiesJose Quesada
 
Data science market insights usa
Data science market insights usaData science market insights usa
Data science market insights usaKaitlin McAndrews
 

Similar a Data Science and Culture (20)

How AI is revolutionizing the world
How AI is revolutionizing the worldHow AI is revolutionizing the world
How AI is revolutionizing the world
 
Minne analytics presentation 2018 12 03 final compressed
Minne analytics presentation 2018 12 03 final   compressedMinne analytics presentation 2018 12 03 final   compressed
Minne analytics presentation 2018 12 03 final compressed
 
Minne analytics presentation 2018 12 03 final compressed
Minne analytics presentation 2018 12 03 final   compressedMinne analytics presentation 2018 12 03 final   compressed
Minne analytics presentation 2018 12 03 final compressed
 
Big Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its powerBig Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its power
 
DevSecOps Through Blunt Force Trauma, I'm the Trauma
DevSecOps Through Blunt Force Trauma, I'm the TraumaDevSecOps Through Blunt Force Trauma, I'm the Trauma
DevSecOps Through Blunt Force Trauma, I'm the Trauma
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey?
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph Databases
 
Technical Communication, Marketing , Truth
Technical Communication, Marketing , TruthTechnical Communication, Marketing , Truth
Technical Communication, Marketing , Truth
 
15 Ways to Stand Out on Your IT Team
15 Ways to Stand Out on Your IT Team15 Ways to Stand Out on Your IT Team
15 Ways to Stand Out on Your IT Team
 
Maximizing Business Connections Through Social Media
Maximizing Business Connections Through Social MediaMaximizing Business Connections Through Social Media
Maximizing Business Connections Through Social Media
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a profession
 
Tf wdvds
Tf wdvdsTf wdvds
Tf wdvds
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
 
Why Your Data Management Strategy Isn't Working (and How to Fix It)
Why Your Data Management Strategy Isn't Working (and How to Fix It)Why Your Data Management Strategy Isn't Working (and How to Fix It)
Why Your Data Management Strategy Isn't Working (and How to Fix It)
 
DataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data Sins
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunities
 
Data science market insights usa
Data science market insights usaData science market insights usa
Data science market insights usa
 

Más de Ícaro Medeiros

Why Python is better for Data Science
Why Python is better for Data ScienceWhy Python is better for Data Science
Why Python is better for Data ScienceÍcaro Medeiros
 
Statistics: the grammar of Data Science
Statistics: the grammar of Data ScienceStatistics: the grammar of Data Science
Statistics: the grammar of Data ScienceÍcaro Medeiros
 
Linked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comLinked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comÍcaro Medeiros
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...Ícaro Medeiros
 
Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Ícaro Medeiros
 
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Ícaro Medeiros
 
Engenharia de ontologias
Engenharia de ontologiasEngenharia de ontologias
Engenharia de ontologiasÍcaro Medeiros
 
Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Ícaro Medeiros
 
R2R Framework: Ontology Mapping
R2R Framework: Ontology MappingR2R Framework: Ontology Mapping
R2R Framework: Ontology MappingÍcaro Medeiros
 
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...Ícaro Medeiros
 
Tag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeTag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeÍcaro Medeiros
 
Expressões regulares no Linux
Expressões regulares no LinuxExpressões regulares no Linux
Expressões regulares no LinuxÍcaro Medeiros
 

Más de Ícaro Medeiros (15)

Why Python is better for Data Science
Why Python is better for Data ScienceWhy Python is better for Data Science
Why Python is better for Data Science
 
Statistics: the grammar of Data Science
Statistics: the grammar of Data ScienceStatistics: the grammar of Data Science
Statistics: the grammar of Data Science
 
Linked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comLinked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.com
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
 
Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)
 
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
 
Engenharia de ontologias
Engenharia de ontologiasEngenharia de ontologias
Engenharia de ontologias
 
Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012
 
Ontology matching
Ontology matchingOntology matching
Ontology matching
 
R2R Framework: Ontology Mapping
R2R Framework: Ontology MappingR2R Framework: Ontology Mapping
R2R Framework: Ontology Mapping
 
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
 
Tag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeTag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of Knowledge
 
Expressões regulares no Linux
Expressões regulares no LinuxExpressões regulares no Linux
Expressões regulares no Linux
 
Ontology Learning
Ontology LearningOntology Learning
Ontology Learning
 
Tag Suggestion
Tag SuggestionTag Suggestion
Tag Suggestion
 

Último

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Data Science and Culture

  • 1. Data Science & Culture (Or how to stop worrying and love data driven culture) Ícaro Medeiros Data Science Forum São Paulo, Jun 2017
  • 3. Big Data http://www.kdnuggets.com/2017/02/origins-big-data.html ✦ Fundamental blocks: advances in CS, e.g. distributed systems, databases, massive AI, etc ✦ Fuzzy concept, ill-defined ✦ Popularized by Gartner (hype-fueled consulting firm)
  • 4. ✦ Big Data no longer considered an emerging technology (pervasive in industry) ✦ Entered Trough of Disillusionment in 2013 https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
  • 6. Data science ✦ Statistics (late 19th century) ✦ Computer Science (1950s) ✦ Machine Learning (1950s) ✦ Data Mining (1990s) ✦ Data Science (2010s) https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century yet another hyped term
  • 7. Beware: controversy ✦ Data science is not all-science ✴ It’s getting more and more engineering-like, a practice ✴ Data storytelling is a creative endeavor ✦ Hyper-inflated expectations, misunderstood concepts and hurry to get value: a dangerous recipe
  • 8. A new hope machine learning big data https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data or hype
  • 9. Hype: not that bad ✦ Haters gonna hate i.e. don’t fully hate the hype ✴ more practitioners = faster tech and process evolution ✴ Highly skilled professionals and innovation ✦ Academics sometimes look for difficult unwanted problems ✴ industry is more pragmatic, especially in tech https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
  • 10. What we need… ✦ Forget about Big Data pokémons ✴ OH so in Big Data we don’t need people to think schemas? ✦ Forget about misunderstood business expectations ✴ OH in deep learning we don’t need people to train models? ✦ You need PEOPLE ✴ Collaborating with shared values ✴ Awesome in tech but more importantly: CREATIVE
  • 12.
  • 13. Good people ✦ People are more important than ideas ✴ A mediocre team will screw up a good idea ✴ Mediocre idea to great team: they will fix it or rethink it ✦ A good lab: different kinds of autonomous thinkers ✴ Why hire smart people if they can't fix what’s broken? ✦ Prefer a heterogeneous and complementary team instead of looking for unicorns
  • 14. The mythical 10x professional https://twitter.com/icaromedeiros/status/838968884023668737
  • 15. Good communication ✦ Honesty, excellence, originality and self-criticism (values) ✦ Communication structure ≠ organizational structure ✦ Be ready to hear the truth ✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work ✦ Braintrust: Leave ego and Jobs outside the door
  • 16. Power to the people! ✦ Product quality is everyone’s responsibility ✴ Don’t ask permission to take responsibility ✦ Passion and excellence versus autonomy ✦ Good things might shadow the bad ✴ People struggle to explore bad things to avoid being called “complainers”
  • 18. Destroy data silos! ✦ Without information about data there is no science ✦ Software and data should be a collective property within the company ✦ Knowledge management matters ✦ Communication between areas must be enforced
  • 19. Data portals ✦ Self-service platforms to publish datasets ✴ Descriptions, schemas, samples, relations between datasets, etc ✦ Open Data initiatives, mostly governments ✦ OSS platforms: CKAN, Airbnb’s Dataportal ✦ Examples: data.gov.uk, dados.gov.br, etc
  • 20. “When it comes to creative inspiration, job titles and hierarchy are meaningless”
  • 21.
  • 22. Data storytelling ✦ Explain what the numbers tell in clear, layman's terms ✦ Make hidden premises clear ✴ Outside data insights ✦ Convince others about actions ✴ Decreases insights-to-value interval ✦ From data to knowledge https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
  • 23. What is creativity ✦ Unexpected connections of concepts and ideas ✦ It's a marathon, it needs rhythm ✦ Creativity must start somewhere and there’s power in healthy feedback in an iterative process
  • 24. Visual communication ✦ Clean straightforward graphs > visually appealing ✴ Choose dataviz libs wisely ✦ “Don’t make me think” ✦ The right graph for the right audience ✴ Prefer a language everyone understands
  • 26. Stats are not enough https://www.autodeskresearch.com/publications/samestats
  • 27. Stats are not enough https://www.autodeskresearch.com/publications/samestats
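The point of the two "Stats are not enough" slides can be checked numerically. The sketch below does not use the Autodesk datasets linked on the slides; as a stand-in it uses two of the four series from Anscombe's quartet, a classic example of the same effect. The two series share nearly identical means and correlations with `x`, yet one is roughly linear and the other is a smooth curve, which only a plot reveals.

```python
from math import sqrt

# Two series from Anscombe's quartet (stand-in data, not the slide's datasets).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# Summary statistics barely differ even though the point clouds look different.
for name, y in (("y1", y1), ("y2", y2)):
    print(name, "mean:", round(mean(y), 3), "corr with x:", round(pearson(x, y), 3))
```

Plotting both series against `x` is the step that exposes the difference, which is why summary tables alone should not drive a decision.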
  • 29. Avoid egotrip data science ✦ “OH my cluster has 10 Petabytes, I’m awesome” ✦ Fancy ML algorithms are not the goal ✦ The most important V in Big Data is value https://twitter.com/amyhoy/status/847097034536554497
  • 30. KPI versus HiPPO ✦ Tech adoption per se is meaningless ✴ Slide-driven Big Data ✴ KPIs should grow from Big Data and data insights initiatives ✦ Poorly defined goals -> bad decisions ✦ Define viable but ambitious goals ✦ Data beats opinion
  • 31. Set goal, plan and GO! ✦ Business questions can't be like “OH we want to detect things related to millennials” ✦ Clear goals must be set, with actionable metrics ✦ Balance perfect models versus time-to-market ✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive” https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
  • 32. The process ✦ The process is not the goal ✴ It has no agenda or taste, it’s just a tool ✦ Quality is the best business plan ✦ Agile is a mindset: not only kanbans or scrum ✦ If the model will become operational, mix scientists and engineers from start
  • 33. Build vs Buy ✦ If you buy and your core business is not techie, you can be illiterate in tech ✴ Benchmark before buying ✴ Accelerate results and boost internal knowledge ✦ If you build and have a good-enough techie culture, you’re more or less good to go ✴ Assess pros and cons consciously ✦ If you surf the tech hype AND build good systems you’re awesome
  • 37. Big Data vs Great Data ✦ If your logical models do not make sense ✦ Most performed queries are slow ✦ If you have string-only databases ✦ If you have unused expensive data ✦ Maybe your data lake is a swamp
  • 38. “The data is a mess” ✦ First step: accelerate human understanding of data ✴ Metadata, context, hidden assumptions ✦ Datasets might serve multiple purposes ✴ Define rationale and context ✴ Data portals and understandable datasets > Dashboards https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
  • 39. Data lost in translation ✦ Heterogeneous and siloed databases (and people) ✦ Rethink ESB (microservices network) ✦ State-of-the-art: data workflow ✴ Luigi, Airflow (open source), almost every big tech vendor ✴ Transparency, reusability, reproducibility, traceability ✴ Automation and monitoring all the way! https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
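The core idea behind the data workflow tools named on slide 39 is that a pipeline is a graph of tasks with explicit dependencies, which is what makes runs traceable and reproducible. The sketch below is not Luigi or Airflow code; it only illustrates that idea with Python's standard `graphlib` module, and every task name is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "publish_report": {"join_orders_customers"},
}


def run_order(dependencies):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(pipeline)
for step, task in enumerate(order, start=1):
    print(step, task)
```

Real workflow engines add scheduling, retries and monitoring on top of this dependency graph, but declaring dependencies explicitly is the part that replaces ad hoc, siloed scripts.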
  • 40. Beyond relational models ✦ Not all data problems fit well in traditional SQL or DW models ✴ Key-value, columnar, graph-based, inverted index, etc ✦ Models are a framework for problem-solving ✴ Not the ultimate answer ✴ There’s no one-size-fits-all model
  • 41. Do not forget fluency ✦ Check the company lingua franca ✦ Make it easy for critical decision-makers ✴ Ad hoc SQL queries? ✴ Dashboards? ✴ Reports?
  • 43. Experiments ✦ Missions to discover facts towards understanding ✴ They don’t fail, any result produces new information ✴ If the initial theory was wrong: good ✴ With new facts you can reformulate the question ✦ Get more modeling questions asked more often ✦ Iterative data science
  • 44. Product experimentation (A/B) ✦ Product experimentation should be hypothesis-driven (not feature-driven) ✦ Define the proper exposed population ✴ No new users, no heavy users only, no early adopters ✦ Understanding effect is essential https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
  • 45. 5 stages of A/B tests https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
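A hypothesis-driven A/B test, as described on slide 44, ends with a check of whether the observed difference could plausibly be noise. The slides do not name a statistical test; the sketch below assumes a pooled two-proportion z-test as one common choice, and the conversion counts are invented for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Invented numbers: control converts 200 of 2000 users, variant 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The test only answers "is there an effect"; the slide's reminder that understanding the effect is essential still requires looking at who was exposed and why the metric moved.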
  • 46. Some other quick tips ✦ Focus on outcomes (not algorithms or methods) ✦ Design the right metric and evaluation ✦ Good experiments don't produce obvious insights ✦ Mix of data and intuition https://twitter.com/mrdatascience/status/869957499662860288
  • 47. Being data driven ✦ Be BAYESIAN - uncertainty is everywhere ✦ Be CURIOUS - keep learning ✦ Be AGILE - Fail fast, not too fast: evidence comes first https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  • 48. Being data driven ✦ Be TRUTHFUL - don’t torture data to please opinions ✦ Be HELPFUL - work across silos, support democracy ✦ Be WISE - know when to be analytical or intuitive https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
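"Be BAYESIAN - uncertainty is everywhere" can be made concrete with a small worked example. The sketch below assumes a Beta-Binomial model with a uniform Beta(1, 1) prior, which is a modeling choice made here for illustration and not something the slides prescribe; the success and failure counts are invented. It shows how the same observed rate carries very different uncertainty depending on how much evidence supports it.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta / (total ** 2 * (total + 1))) ** 0.5


prior = (1, 1)  # Uniform prior: every rate between 0 and 1 is equally plausible.

# Same 30% observed rate, with 10 observations versus 1000 observations.
after_small = update_beta(*prior, successes=3, failures=7)
after_large = update_beta(*prior, successes=300, failures=700)

for label, params in (("10 observations", after_small), ("1000 observations", after_large)):
    print(label, "mean:", round(beta_mean(*params), 3), "std:", round(beta_std(*params), 3))
```

Reporting the spread alongside the estimate is one practical way to keep evidence ahead of opinion when the sample is small.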
  • 49. Take-away message: With the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments there's a good chance to do great SCIENCE