A Hybrid Approach to Data Science Project Management
Elaine Lee
elee@civisanalytics.com
@elaineklee
Open Data Science Conference, Boston 2015
@opendatasci
Building a Data-Driven World™
A Common Problem With Many Faces
Organizations want to be data-driven but many obstacles stand in their way:
• Communication not trickling up to executives and key decision makers
• Silos between departments, making it difficult to share and collaborate on analysis
• Data ingestion (ETL or Extract-Transform-Load) is difficult and time-consuming
• Lack of meaningful, yet customizable visual reporting
• Inability to flexibly scale up or down technological needs at a reasonable cost
• Inadequate or overwhelming learning resources about data science
Mapping the Uninsured in America
Where should Enroll America direct its insurance signup efforts?
Civis Analytics | Proprietary and Confidential
We ran the first individualized presidential campaign
As a company, Civis traces its origins to the 2012 Obama for America analytics team.
We built a scientific understanding of each voter.
Our data science influenced every strategy and tactic: voter targeting, messaging, media buys, and fundraising.
This meant the campaign could allocate resources where impact would be greatest.
Today, we leverage data science to help our clients in politics, non-profits, and the corporate world.
Introducing Civis
An easy-to-use, end-to-end, incredibly extendable data science platform in the cloud for teams who want to make great data-driven decisions to drive their organizations forward.
The Civis Approach
Product, Consulting, R&D
Applied Data Science
• Tackles the toughest data science problems we can find
Data Science R&D
• Generalizes and automates the solution for many scenarios
Software Engineering
• Integrates solutions into user-empowering software
• Highly collaborative departments
• All departments contribute to both our services arm and product development
The Civis Approach
Our unique team structure allows us to solve your biggest problems with custom solutions and the technology to scale them.
Data Science R&D
Strategies and philosophies
• Teams based on Civis’s product and consulting needs:
• “Built around code”
• Semi-annual departmental day-long off-sites to plan upcoming R&D initiatives
• Academia-influenced: evidence-based approaches to finding and reporting best solutions
• Software development-influenced: standups, code review
• Favorite tools:
R&D teams: Modeling Methodology, Unstructured Data, Engineering
Data Science R&D
Tools
• Share and discuss data science news
• Receive feedback from colleagues using our tools
• Discuss implementation
• Lower communication costs compared to email
Data Science R&D
Tools
• Prototype new workflows
• Used like a log book to record and present results
• Share preliminary results with members of other departments
Data Science R&D
Tools
• Department heads set milestones, check progress, and make project staffing decisions
• Collaboratively plan development on new functionality or organizational processes (e.g. recruiting)
R&D <-> ADS
Tools
Strategies
• Designate “tag team” on R&D as default R&D resources for client engagements
• This is the Modeling Methodology team
• Other R&D teams’ members may be staffed on engagements depending on expertise required
• R&D team member always serves as the Consulted in the RACI model
• Transparency about challenges is paramount
R&D <-> ADS: A Case Study
Mapping the Uninsured in America
1. Assemble a project team of R&D data scientists and Applied Data Scientists
2. Work with Enroll America to refine requirements and come up with a plan of analysis, ultimately resulting in the design and execution of a phone survey on a sample of individuals, followed by building a predictive model for the rest of the country.
3. The Applied Data Science Manager has weekly calls with Enroll America and status meetings with the project team.
4. The project team delivers the predictions and analysis to Enroll America.
The project team completes a postmortem and determines these activities could be automated: model building
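The survey-then-model idea in step 2 can be sketched roughly as follows. This is only an illustration under loud assumptions, not the method Civis used: the population, the two features (scaled age and income), the simulated survey answers, and the plain logistic regression are all invented here to show the shape of the workflow (survey a sample, fit a model on it, score everyone else).

```python
import math
import random


def sigmoid(z):
    """Logistic function, clamped so math.exp cannot overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Predicted probability of being uninsured; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], features)))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            err = predict(weights, features) - label
            grad[0] += err
            for j, value in enumerate(features):
                grad[j + 1] += err * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)

# Hypothetical population: each person is [scaled_age, scaled_income] in [0, 1].
population = [[rng.random(), rng.random()] for _ in range(2000)]


def simulated_survey_answer(person):
    """Made-up ground truth: younger, lower-income people are more often uninsured."""
    p_uninsured = sigmoid(2.0 - 2.0 * person[0] - 3.0 * person[1])
    return 1 if rng.random() < p_uninsured else 0


# "Phone survey": only a sample of individuals is actually asked.
surveyed = population[:400]
answers = [simulated_survey_answer(person) for person in surveyed]

# Fit on the surveyed sample, then score everyone who was not surveyed.
weights = fit_logistic(surveyed, answers)
unsurveyed = population[400:]
scores = [predict(weights, person) for person in unsurveyed]
```

Sorting `unsurveyed` by `scores` would give a ranked outreach list. Because the fit and score steps take only data as input, they are also the kind of activity a postmortem could flag as a candidate for automation.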
R&D <-> Tech
Tools
Strategies
• Designate teams at the interface to triage issues and plan new development:
• R&D: “Engineering” team
• Tech: “Modeling” team
• Use module or project-specific chatrooms to get answers to ad-hoc questions quickly
• Identify opportunities to form cross-functional teams, e.g.:
• Developing apps using the Platform’s API
• Knowledge sharing on best practices
R&D <-> Tech: A Case Study
Mapping the Uninsured in America
1. After the postmortem for the Enroll America engagement, R&D begins prototyping automated modeling functionality and discussing its implementation with the Tech department.
2. R&D’s Engineering team finishes the prototype and works with Tech’s Modeling team to integrate it as a new feature in the Platform.
3. During integration, ad hoc discussions occur on GitHub and HipChat to address usability questions, e.g. resource usage and input/output specifications.
The integration team successfully builds and integrates the Build Model module in the Platform.
Conclusion
Our approach to data science consulting and product development is enriched by valuable perspectives of our employees, who come from a wide array of backgrounds, making our project management strategies a hybrid of more conventional techniques.

Editor's notes

  1. Hi everyone, it’s great to be here. My name is Elaine Lee. I am a Data Scientist in the R&D department at Civis Analytics. Civis is a Chicago-based data science consulting and software startup, and I’m excited to tell you a little bit about our company and the work that we do. In particular, I’ll be talking about how the R&D department juggles concurrent development of both our consulting services and our cloud-based data science platform. I’ll be emphasizing approaches borrowed from other, more established industries as they pertain to department projects as well as interdepartmental collaborations.
  2. Many of you are already familiar with data science and the potential it has to change the way things are done. However, data science has a high barrier to entry for some teams, from both a technical and an organizational standpoint. It can be difficult to wrap your head around the technical needs and quantitative concepts that go into data science. In addition, it can be hard to assemble the right team to do data science and to keep the work organized. Picture a team of data scientists working on the same project. Some of them have written R or Python scripts to process the data, do feature engineering, and build models on it. Some of them have taken the results of the models and produced charts and visualizations in Excel, Tableau, or D3. All the work is being kept in a few different places: Dropbox, Google Drive, GitHub, MySQL, … It is difficult for this hypothetical team to figure out what exactly has been done and, even worse, what efforts have been duplicated. It is also incredibly difficult to validate the analysis. Does this sound familiar to anyone? Many of us at Civis Analytics faced these challenges in our previous work, but fortunately we’ve made them a thing of the past! It didn’t happen overnight, but we were constantly coming up with new ideas to improve the data science workflow by, well, working on a variety of consulting projects and researching new methods. Today I will talk about what some of these ideas are. In addition, I will tell the story of how one client engagement gave us a valuable exercise in collaboration and in data science best practices that we have since internalized.
  3. Throughout my talk today, I will be using our project with Enroll America to illustrate a lot of my concepts. Enroll America was one of our first clients in 2013. They wanted our help identifying Americans without health insurance so they knew where to direct their outreach. This was a challenging problem because of its large scope (they wanted to do outreach throughout the country!) and because it wasn’t obvious what is predictive of being uninsured. Why did Enroll America specifically seek us out to solve this problem?
  4. Let’s talk a little about the expertise Civis has for tackling problems like Enroll America’s. The founding members of Civis Analytics were part of the Obama for America analytics team during the 2012 re-election campaign. There, we developed the beginnings of a framework for doing person-level analytics (which is highly relevant for Enroll America). With scientific levels of rigor, we built models to understand all sorts of vote-related behaviors in order to better identify and persuade supporters, which translated into optimizing how the campaign’s resources were used. The campaign spanned many months, and during that time lots of models were being built and refined; their results were constantly being sent to those in the field to act upon. Developing an organized and repeatable workflow was crucial for minimizing costs, time spent (the staff was small), and inadvertent human error, particularly with models being built at such a large scale.
  5. After the campaign ended in 2012, we re-examined the strategies we employed and the problems they solved. We realized that if we generalized them, we could solve similar problems for clients in the political, non-profit, and corporate worlds, which is exactly what Civis did. What you see here is a sample of clients, in addition to Enroll America, that we have helped to better target their advertising dollars, identify potential customers for greener sources of electricity, and determine public awareness of and sentiment toward their brand or cause. In the past year, we took it a step further and formed a partnership with Discovery Communications to inform more sophisticated approaches to audience targeting, ratings forecasting, and marketing spend. We anticipate forming more partnerships like this in the future. The examples I gave are all problems with a similar flavor to what Civis successfully solved in 2012: identifying and reaching the people you care about most.
  6. Our diverse client portfolio, innovative approaches, and proven track record have made Civis Analytics’ consulting services highly sought after in the predictive analytics space. However, we’re equally passionate about removing obstacles to doing data science. Our steady client pipeline enables us to formalize our approach in the form of a cloud-based data science application. Our software, Civis, or “the Platform”, supports the entire workflow of a typical data science project, from data warehousing to data processing to predictive modeling to reporting. This enables organizations to easily take control of their own data and unlock their insights.
  7. This is how we turn our client work experiences into software. We select novel problems brought forth by our clients and work with them to deliver a solution. This work is primarily handled by our Applied Data Science department. At the same time, we conduct research and experiment with different methods to solve the problem, with one eye towards determining how to generalize the solution. This is primarily done by the Data Science R&D department. Finally, solutions are integrated into our software platform by the Software Engineering, or Tech, department. Users of our software platform, both clients and our Applied Data Scientists, provide us with valuable feedback, which is continuously incorporated. This cycle enables us to deliver high-quality results to our customers.
  8. In our day-to-day work, all departments pitch in on both lines of business, ensuring fluency in all the company’s offerings and thus better decision making. We also collaborate across departments on all projects, big or small. Today I will be focusing on how my department, the Data Science R&D department, manages its workload and how it works with the Applied Data Science and Tech departments.
  9. The R&D department is the only department that is intimately aligned with both lines of business. We’re split into 3 different teams. Modeling Methodology focuses on developing new modeling workflows. Unstructured Data specializes in data that can’t neatly be summarized by a flat file, like text data. Engineering is responsible for managing our production codebases of new features for our software product. Our department is “built around code”: “We're trying to build up knowledge and best practices, and being built around code lowers our communication costs, errors, redundancy, and facilitates us making software.” To roadmap what we build, based on what we’ve learned from recent client engagements, we hold day-long, semi-annual department off-sites. When developing new methodologies, we use an academic-influenced approach: empirical and thorough, so that our recommended solution covers all the edge cases. When building out workflows, we follow guidelines common to most software development projects, including some ideas from the Agile methodology: we have daily standups to make sure everyone’s on the same page about the status of the codebases, and we do code reviews before any changes are shipped. Our standups are on a per-repository basis, so they don’t waste anyone’s time. These are our favorite tools for doing our work. Let’s take a look at how we use them.
  10. HipChat and GitHub form the backbone of our communications. For those not familiar with these tools, HipChat is an instant messaging tool for organizations. GitHub is a web interface, built on top of the version control system Git, for teams to collaborate on a codebase. These tools are crucial to our philosophy of being built around code. They enable members across the company to participate by asking questions and generally weighing in. Department members use them to discuss implementation. These tools are much faster than email because they make it easier to ask questions and get answers: anyone who knows the answer can see the request and respond.
  11. When developing new methods, we like to use Jupyter and Google Drive. We use Jupyter for its IPython Notebook capabilities. It allows us to run Python code, especially modules from our codebase, interactively, so we can chain components together to make new workflows. Jupyter also has presentation functionality, so we use it as a log book to record and present results in internal meetings. Sometimes we also use Google Drive to record and share results with members of other departments, such as Applied Data Scientists, who have a vested interest in the project but don’t require all the details.
  12. Finally, to take the pulse of the R&D department as a whole, department heads use Google Drive and Asana for big-picture planning. Asana is a project management tool that gives department heads a bird’s-eye view of what each team member is working on and how each project is progressing. Google Drive tools are used to collaborate on planning documents, be it planning new functionality to build or revising organizational processes, such as rewriting our hiring exam.
  13. That was how we, the R&D department, work together. How do we work with the Applied Data Scientists, the data scientists in our consulting arm? To make project staffing seamless, we designate a tag team to serve as the first point of contact for client engagements. This is the Modeling Methodology team. However, other R&D data scientists may be staffed on a project depending on the expertise required. The R&D data scientist always serves as the Consulted in the RACI model. The RACI model is a popular project management model used in consulting. It emphasizes explicit roles for each team member to ensure accountability. R is for Responsible, a role held by the Applied Data Scientists. A is for Accountable; this is the Applied Data Science Manager or project manager. C is for Consulted. And I is for Informed (the client). Lastly, we are open with Applied Data Scientists about R&D challenges in order to avoid schedule slips on the client engagement. The project plan is often tracked in Trello, a popular bulletin board app, with bulletin boards for each milestone’s requirements.
  14. Let’s revisit our client story, Mapping the Uninsured in America, to illustrate concretely how we work together. After Enroll America shared their problem with us, we assembled a project team of R&D data scientists and Applied Data Scientists to solve it. We worked with Enroll to refine the problem statement into a set of requirements, ultimately resulting in the design and execution of a phone survey on a sample of individuals, followed by building a model to capture the rest of the country. The project got under way. Throughout the project, the Applied Data Science Manager had weekly status calls with Enroll and with the project team to make sure we were on schedule. Occasionally, when there was risk of a schedule slip, we staffed a couple of extra data scientists on the project to make sure we delivered results on time. For example, we brought in an extra data scientist towards the end of the project to help produce graphs and visualizations of the results. Finally, we finished our analysis and presented our predictions to Enroll America. Afterwards, we did a post-mortem and realized that automated model building would’ve made us more efficient. This is because we conducted our experiment in waves and built similar models as the results came in, with the only difference being the input data. Also, the analysts were each working on individual components of the analysis, writing their own R scripts that had a lot of overlap (such as the data processing steps), which meant a lot of time was wasted.
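  The post-mortem observation above (similar models rebuilt per survey wave, with only the input data changing, and duplicated data processing steps) can be sketched roughly as follows. This is only an illustration in Python, not the code used on the Enroll America project or in the Platform: the function names, the field names "region" and "uninsured", and the toy records are all made up, and the "model" is just a per-region rate standing in for a real predictive model.

```python
from collections import defaultdict


def preprocess(rows):
    """Shared cleaning step (hypothetical): drop incomplete rows, normalize keys."""
    cleaned = []
    for row in rows:
        if row.get("region") is None or row.get("uninsured") is None:
            continue  # skip records with missing fields
        cleaned.append({
            "region": row["region"].strip().lower(),
            "uninsured": int(row["uninsured"]),
        })
    return cleaned


def build_model(rows):
    """Toy stand-in for a model: the uninsured rate per region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [uninsured count, total count]
    for row in preprocess(rows):
        counts[row["region"]][0] += row["uninsured"]
        counts[row["region"]][1] += 1
    return {region: hits / total for region, (hits, total) in counts.items()}


def build_models_for_waves(waves):
    """Run the same workflow on every wave; only the input data differs."""
    return {name: build_model(rows) for name, rows in waves.items()}


# Made-up survey waves, purely for illustration.
waves = {
    "wave_1": [{"region": "North ", "uninsured": 1},
               {"region": "north", "uninsured": 0}],
    "wave_2": [{"region": "South", "uninsured": 1},
               {"region": None, "uninsured": 1}],
}
models = build_models_for_waves(waves)
```

  The point of the sketch is the shape of the workflow: the processing step is written once and shared, so a new wave of results means one more entry in the input rather than another hand-edited script.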
  15. So that’s how we work with the Applied Data Scientists on consulting projects. How do we work with the Tech department? Much like how we work with the Applied Data Science department, we’ve designated a team to interface with the Tech department, and they have done the same. That would be the Engineering team on our side and the Modeling team on their side. The Engineering team in Data Science are data scientists who speak software development, and the Modeling team in the Tech department are software engineers who speak data science. Most of our communication happens in module- or project-specific chatrooms and GitHub issue tickets, which gets answers quickly. To promote really inspired product development, we identify opportunities to form cross-functional teams, such as using the Platform’s API to develop new apps and teaching each other best practices for software development via brownbag sessions.
  16. Let’s revisit the Enroll America project for an example of how the R&D data scientists work with the software engineers. After the post-mortem for the Enroll engagement, we began prototyping automated modeling functionality, communicating to the Tech department the motivation for it and including them in discussions about implementation and feasibility. Once we finished the prototype and ensured that it passed all the tests and code review, the Engineering team in R&D worked with the Modeling team in Tech to integrate it as a new feature in the Platform. We used GitHub and HipChat to discuss questions that came up, such as resource usage, input/output specifications, and the data visualizations we wanted to provide to the end user. Together, the R&D department and the Tech department successfully built and integrated the Build Model module that exists today in the Platform.
  17. In summary, a lot of our approaches have a common theme, which is minimizing communication costs within the R&D department and with other departments. This is evidenced by our embrace of free or open-source tools for collaboration and our general belief in transparency about challenges. We also emphasize collaborative opportunities between departments to strengthen our cohesiveness as a team, be it working on client engagements together or learning best practices in a seminar format. A lot of our ideas come from the valuable perspectives of our employees, who come from a wide array of backgrounds. Thus, our project management strategies are a hybrid of techniques seen in more established fields such as software engineering, consulting, and academia. I hope the tips presented in my talk today have made doing data science more manageable for your team. Thank you for your time.