Data Science & Data Products at Neue Zürcher Zeitung (NZZ)
René Pfitzner, Lead Data Scientist
19th Swiss Big Data User Group Meeting
Zürich, 23rd January 2017
Outline
I. Introduction
NZZ, media challenges, trends
II. Data Science @NZZ
Goals, principles, approaches
III. Data Products @NZZ
Our “stack” & insights & demo
IV. NZZ Companion
Individual news fueled by data science
I. Introduction
Myself, NZZ, media challenges, trends
Self-Intro
● Lead Data Scientist at NZZ
● media innovation
● algorithmic approaches for news media
● background in statistical physics
● Python, Scala, Spark, R
www.renepfitzner.net
@RenePfitznerZH
Newspaper Revenue: the reality
US newspaper advertising revenue, corrected for inflation
Data: Newspaper Association of America
Graphics: https://commons.wikimedia.org/wiki/File:Naa_newspaper_ad_revenue.svg
Newspaper Decline
Number of newspapers in the United States
-25%
Data: www.census.gov
Graphics: https://commons.wikimedia.org/wiki/File:Number_of_newspaper_firms.png
Is this something to worry about?
Well, it should be!
Media = Fourth Estate!
Wikipedia: Decline of Newspapers
https://en.wikipedia.org/wiki/Decline_of_newspapers
II. Data Science @NZZ
Goals, principles, approaches
Data Science: Goals
Data Science at NZZ
● Decision Making
● Data Products
● Marketing Optimization
Data Science: Data Products
Attempt at a definition:
A data product is a digital product that provides
some benefit to a downstream consuming
application, incorporating data and data-based
methods (e.g. ML).
Data Science: Data Products
What good is Data Science, if you cannot put
it into production?
Data Science: Data Products
Provision & Integration: ?
https://blog.treasuredata.com/blog/2016/03/15/self-study-list-for-data-engineers-and-aspiring-data-architects/
Data Science: Data Products
Provision & Integration? → Data Product
III. Data Products @NZZ
Our “stack” & insights & demo
Data Products: Our stack
REST APIs
Data Products: What is Spark?
● “General engine for fast big data processing”
● it’s more: parallel computing framework
● “Hadoop on steroids” → in-memory!
Data Products: How and where?
REST APIs
- on-premise / hosted
- gcloud -- dataproc
- gcloud
- in parts dockerized
- kubernetes
- gcloud & hosted
- dockerized; kubernetes
- microservice approach
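As a rough illustration of the microservice approach with REST APIs listed above, the following sketch serves precomputed recommendations over HTTP using only the Python standard library. The endpoint path `/recommendations/<article_id>`, the in-memory `RECOMMENDATIONS` table, the function names, and the port are all assumptions made for illustration; the slides do not describe the actual NZZ endpoints or frameworks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical precomputed output of a batch job (e.g. a recommender run);
# the article ids are made up for illustration.
RECOMMENDATIONS = {
    "a1": ["a7", "a3"],
    "a2": ["a1"],
}


def build_response(path):
    """Map a request path to (HTTP status code, JSON-serializable body)."""
    prefix = "/recommendations/"
    if not path.startswith(prefix):
        return 404, {"error": "unknown endpoint"}
    article_id = path[len(prefix):]
    if article_id not in RECOMMENDATIONS:
        return 404, {"error": "unknown article"}
    return 200, {"article": article_id, "recommended": RECOMMENDATIONS[article_id]}


class RecommendationHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the JSON produced by build_response."""

    def do_GET(self):
        status, body = build_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the service until interrupted (blocking call)."""
    HTTPServer(("", port), RecommendationHandler).serve_forever()
```

Calling `serve()` and requesting `/recommendations/a1` would return the JSON body for `a1`. Keeping the routing logic in a plain function separate from the HTTP handler makes such a small service easy to test and to package in a container.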
Data Products: Article Recommendations
- recommendations based on current article
- mixed with advertisement
- article click rate x3
- ad conversion rate x3
Data Products: Article Recommendations
Network-based
- weighted co-reading network
Trending articles
- clicks
- click trend
Topic detection
- word2vec
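To make the network-based approach concrete, here is a minimal sketch of a weighted co-reading network: two articles are linked when they are read in the same session, and the edge weight counts how often that happens. The session data, article ids, and function names are invented for illustration, and the slides do not say how NZZ defines or weights co-reading, so the counting scheme should be read as an assumption.

```python
from collections import defaultdict
from itertools import combinations


def build_coreading_network(sessions):
    """Return {article: {other_article: weight}}, where weight is the
    number of sessions in which both articles were read."""
    network = defaultdict(lambda: defaultdict(int))
    for articles in sessions:
        # Each unordered pair of distinct articles in a session adds 1 to the edge.
        for a, b in combinations(sorted(set(articles)), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def recommend(network, current_article, k=3):
    """Top-k neighbours of the current article by edge weight, ties broken by id."""
    neighbours = network.get(current_article, {})
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:k]]


# Toy reading sessions (made-up article ids).
sessions = [
    ["a1", "a2", "a3"],
    ["a1", "a2"],
    ["a2", "a3"],
    ["a1", "a4"],
]
network = build_coreading_network(sessions)
print(recommend(network, "a1", k=2))  # prints ['a2', 'a3']
```

In this toy data `a1` and `a2` share two sessions, so `a2` ranks first. A production version would presumably also need time decay and filtering of already-read articles, neither of which is covered by the slides.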
Data Products: Learnings?
● Spark is great for general purpose
● … easily maintained
● … go fast from dev to prod
● Scala forces you to think more & structure better
● cons: development notebooks
● more technical? Talk to me later ...
IV. NZZ Companion
Individual news fueled by data science
NZZ News Companion: Facts
● changing news consumption behavior
● vast majority of article clicks come from the start page
● highly volatile start page
data-enhanced content delivery
NZZ News Companion: Prototype
NZZ News Companion: DNI
https://www.digitalnewsinitiative.com/
Be a news-innovation beta tester!
companion@nzz.ch
rene.pfitzner@nzz.ch
@RenePfitznerZH
www.renepfitzner.net

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Data Science & Data Products at Neue Zürcher Zeitung

  • 1. Data Science & Data Products at NZZ. René Pfitzner, Lead Data Scientist. 19th Swiss Big Data User Group Meeting, Zürich, 23rd January 2017
  • 2. Outline: I. Introduction (NZZ, media challenges, trends); II. Data Science @NZZ (goals, principles, approaches); III. Data Products @NZZ (our “stack” & insights & demo); IV. NZZ Companion (individual news fueled by data science)
  • 3. I. Introduction Myself, NZZ, media challenges, trends
  • 4. Self-Intro ● Lead Data Scientist at NZZ ● media innovation ● algorithmic approaches for news media ● background in StatPhys ● python, scala, spark, R ● www.renepfitzner.net ● @RenePfitznerZH
  • 5.
  • 6. Newspaper Revenue: the reality. US newspaper advertising revenue, corrected for inflation. Data: Newspaper Association of America. Graphics: https://commons.wikimedia.org/wiki/File:Naa_newspaper_ad_revenue.svg
  • 7. Newspaper Decline. Number of newspapers in the United States: -25%. Data: www.census.gov. Graphics: https://commons.wikimedia.org/wiki/File:Number_of_newspaper_firms.png
  • 8. Is this something to worry about? Well, it should be! Media = Fourth Estate!
  • 10. II. Data Science @NZZ Goals, principles, approaches
  • 11. Data Science: Goals. Data Science at NZZ: Decision Making, Data Products, Marketing Optimization
  • 12. Data Science: Data Products. Attempt at a definition: A data product is a digital product that provides some benefit to a downstream consuming application, incorporating data and data-based methods (e.g. ML).
  • 13. Data Science: Data Products What good is Data Science, if you cannot put it into production?
  • 14. Data Science: Data Products. Provision & Integration? https://blog.treasuredata.com/blog/2016/03/15/self-study-list-for-data-engineers-and-aspiring-data-architects/
  • 15. Data Science: Data Products ?Provision & Integration Data Product
  • 16. III. Data Products @NZZ Our “stack” & insights & demo
  • 17. Data Products: Our stack. REST APIs
  • 18. Data Products: What is Spark? ● “General engine for fast big data processing” ● it’s more: a parallel computing framework ● “Hadoop on steroids” → in-memory!
  • 19. Data Products: How and where? REST API’s - on-premise / hosted - gcloud -- dataproc - gcloud - in parts dockerized - kubernetes - gcloud & hosted - dockerized; kubernetes - microservice approach
  • 20. Data Products: Article Recomm - recommendations based on current article - mixed with advertisement - article click rate x3 - ad conversion rate x3
  • 21. Data Products: Article Recomm Network-based - weighted co-reading net Trending articles - clicks - click trend Topic detection - word2vec
  • 22. Data Products: Learnings? ● Spark is great for general purpose ● … easily maintained ● … go fast from dev to prod ● Scala forces you to think more & structure better ● cons: development notebooks ● more technical? Talk to me later ...
  • 23. IV. NZZ Companion Individual news fueled by data science
  • 24. NZZ News Companion: Facts ● changing news consumption behavior ● the vast majority of article clicks originate from the startpage ● highly volatile startpage ● data-enhanced content delivery
  • 25. NZZ News Companion: Prototype
  • 26. NZZ News Companion: DNI https://www.digitalnewsinitiative.com/
  • 27. Be a news-innovation beta tester! companion@nzz.ch
  • 28. Be a news-innovation beta tester! companion@nzz.ch rene.pfitzner@nzz.ch @RenePfitznerZH www.renepfitzner.net
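
Slide 21 names a "weighted co-reading net" as the network-based part of the article recommender, without giving details. The following is a minimal sketch of one way such a network could be built and queried, not the NZZ implementation. It assumes that an edge weight is the number of reading sessions containing both articles, and that recommendations for the current article are its strongest neighbours. The function names, the session format, and the sample article IDs are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coreading_net(sessions):
    """Build a weighted, undirected co-reading network.

    Assumption: `sessions` is an iterable of lists of article IDs, one list
    per reading session. The weight of edge (a, b) is the number of sessions
    in which both a and b were read.
    """
    net = defaultdict(Counter)
    for articles in sessions:
        # De-duplicate so a re-read article counts once per session.
        for a, b in combinations(sorted(set(articles)), 2):
            net[a][b] += 1
            net[b][a] += 1
    return net


def recommend(net, current_article, k=3):
    """Return up to k neighbours of current_article, strongest edge first."""
    neighbours = net.get(current_article, {})
    # Sort by descending weight; break ties by article ID for determinism.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:k]]


# Hypothetical sample data: four reading sessions.
sessions = [
    ["a1", "a2", "a3"],
    ["a1", "a2"],
    ["a2", "a4"],
    ["a1", "a3"],
]
net = build_coreading_net(sessions)
print(recommend(net, "a1", k=2))
```

In a production setting the pair counting would more plausibly run as a Spark job over click logs, and the ranked neighbours could be blended with the trending and topic signals that the same slide lists.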