From the Lab to the Production Line: How to Work with Data Scientists
"That Won't Do" Data Scientists (安捏母湯資料科學家)
Source: http://codewithmax.com/2018/03/06/basic-example-of-a-neural-network-with-tensorflow-and-keras/
What I thought a data scientist was
What a data scientist actually is
Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
When I want to do a "data science" project
Data cleaning, data analysis, data validation, data splitting, model training, model validation
When I want to build a "data science" product
Source: https://udn.com/news/story/11320/3222213
Data cleaning, data analysis, data validation, data splitting,
model training, model validation, training at scale, model updates,
model deployment, model monitoring, model logging, model optimization
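Data science workflow
• Consistent orchestration and environments
• Scalable team collaboration on modeling
• Continuously meeting requirements
• Improved iteration cycles, automated deployment
• Reproducible results
• Monitoring of quality and performance, test monitoring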
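Development + Operations
• Development + Operations = DevOps
• Users, developers, QA, and operations staff work together to solve software delivery problems.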
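Data + Operations
• Data + Operations = DataOps
• All data practitioners (including data analysts, data scientists, data engineers, IT staff, and so on) work together to continuously deliver quality data to applications and business processes.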
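One way to make "continuously delivering quality data" concrete is an automated check that every batch must pass before it reaches downstream consumers. The following is a minimal sketch under stated assumptions: the column names, the rules, and the `validate_rows` helper are all hypothetical and do not come from the slides.

```python
def validate_rows(rows, rules):
    """Return (row_index, column, message) for every rule violation."""
    violations = []
    for i, row in enumerate(rows):
        for column, (check, message) in rules.items():
            # A missing column counts as a violation of that column's rule.
            if column not in row or not check(row[column]):
                violations.append((i, column, message))
    return violations


# Hypothetical rules for a hypothetical used-car data set.
RULES = {
    "mileage": (
        lambda v: isinstance(v, (int, float)) and v >= 0,
        "must be a non-negative number",
    ),
    "year_of_make": (
        lambda v: isinstance(v, int) and 1950 <= v <= 2030,
        "must be a plausible year",
    ),
}

batch = [
    {"mileage": 42000, "year_of_make": 2015},
    {"mileage": -5, "year_of_make": 2018},   # negative mileage
    {"mileage": 61000},                      # year_of_make missing
]

violations = validate_rows(batch, RULES)
for index, column, message in violations:
    print(f"row {index}: {column} {message}")
```

In a pipeline, a non-empty `violations` list could be used to fail the run so that bad data never reaches an application or a training job.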
Data + Operations
Source: https://medium.com/data-ops/dataops-is-not-just-devops-for-data-6e03083157b7
Putting DataOps into practice
Source: https://www.kubeflow.org/
Source: KubeCon Europe 2018
Source: https://blog.paperspace.com/ci-cd-for-machine-learning-ai/
https://www.infuseai.io
Machine Learning Process
Data sources: SQL DB, Cosmos DB, Data warehouse, Data lake, Blob storage, …
Stages: Prepare Data → Build & Train → Deploy
Machine Learning Problem Example
How much is this car worth?
Model Creation Is Typically Time-Consuming
Which features? Mileage, Condition, Car brand, Year of make, Regulations, …
Which algorithm? Gradient Boosted, Nearest Neighbors, SVM, Bayesian Regression, LGBM, …
Which parameters? Parameter 1, Parameter 2, Parameter 3, Parameter 4, …
• Gradient Boosted: Criterion, Loss, Min Samples Split, Min Samples Leaf, Others
• Nearest Neighbors: N Neighbors, Weights, Metric, P, Others
Iterate: each choice of features, algorithm, and parameters produces a model, for example Gradient Boosted on Mileage, Car brand, and Year of make, then Nearest Neighbors on Car brand, Year of make, and Condition.
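The "which features, which algorithm, which parameters" loop can be sketched as an exhaustive search. This is only an illustration under stated assumptions: the data is synthetic, the features are pre-scaled to the range 0 to 1, a hand-written nearest-neighbors regressor stands in for the algorithms named on the slide, and every function name is hypothetical.

```python
import itertools
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training rows closest to x."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_abs_error(train_X, train_y, test_X, test_y, k):
    """Mean absolute error of the k-NN regressor on held-out rows."""
    errors = [
        abs(knn_predict(train_X, train_y, x, k) - y)
        for x, y in zip(test_X, test_y)
    ]
    return sum(errors) / len(errors)


def search(rows, targets, feature_names, ks, n_train):
    """Try every non-empty feature subset with every k; keep the best."""
    best = None
    for size in range(1, len(feature_names) + 1):
        for subset in itertools.combinations(range(len(feature_names)), size):
            X = [[row[j] for j in subset] for row in rows]
            for k in ks:
                mae = mean_abs_error(
                    X[:n_train], targets[:n_train],
                    X[n_train:], targets[n_train:], k,
                )
                if best is None or mae < best[0]:
                    best = (mae, [feature_names[j] for j in subset], k)
    return best


# Synthetic, pre-scaled data; the price formula is invented for the example.
random.seed(0)
FEATURES = ["mileage", "condition", "year_of_make"]
rows, prices = [], []
for _ in range(120):
    mileage, condition, year = (random.uniform(0, 1) for _ in range(3))
    rows.append([mileage, condition, year])
    prices.append(
        20000 - 8000 * mileage + 5000 * condition + 3000 * year
        + random.gauss(0, 300)
    )

best_mae, best_features, best_k = search(
    rows, prices, FEATURES, ks=[1, 3, 5, 9], n_train=90
)
print(best_features, best_k, round(best_mae))
```

Even this toy search evaluates 7 feature subsets times 4 values of k. With real algorithms and more parameters the number of combinations grows quickly, which is the point the slide makes.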
Machine Learning Complexity
Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Model Selection & Hyperparameter Tuning
Dataset → Model Training Infrastructure
• Training Algorithm 1: Hyperparameter Values config 1 → Model 1; config 2 → Model 2; config 3 → Model 3
• Training Algorithm 2: Hyperparameter Values config 4 → Model 4
Introducing Automated Machine Learning
Inputs: Dataset, Optimization Metric, Constraints (Time/Cost) → Automated ML → ML Model
Accessible & Faster
Automated ML Accelerates Model Development
Input (enter data, define goals, apply constraints) → Intelligently test multiple models in parallel → Output (optimized model)
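The input, search, and output flow above can be sketched as a budgeted search. This is a minimal sketch, not the product's implementation: plain random search stands in for whatever strategy the service uses, trials run sequentially rather than in parallel, the three toy algorithms and all function names are hypothetical, and the data is synthetic.

```python
import random
import statistics
import time


def fit_constant(xs, ys, stat):
    """Always predict the mean or median of the training targets."""
    value = statistics.mean(ys) if stat == "mean" else statistics.median(ys)
    return lambda x: value


def fit_linear(xs, ys):
    """Least-squares line through one feature."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k):
    """Average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


FITTERS = {"constant": fit_constant, "linear": fit_linear, "knn": fit_knn}


def sample_config(rng):
    """Draw one algorithm and one hyperparameter setting at random."""
    algo = rng.choice(["constant", "linear", "knn"])
    if algo == "constant":
        return algo, {"stat": rng.choice(["mean", "median"])}
    if algo == "knn":
        return algo, {"k": rng.randint(1, 15)}
    return algo, {}


def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def automated_search(train, valid, metric, max_trials, time_budget_s, seed=0):
    """Minimize `metric` on the validation split within the given budget."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best, history = None, []
    for _ in range(max_trials):
        if time.monotonic() > deadline:
            break  # time constraint reached
        algo, params = sample_config(rng)
        model = FITTERS[algo](train[0], train[1], **params)
        score = metric(valid[1], [model(x) for x in valid[0]])
        history.append((score, algo, params))
        if best is None or score < best[0]:
            best = (score, algo, params)
    return best, history


# Synthetic data set: one feature, roughly linear target.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(80)]
ys = [3 * x + 5 + data_rng.gauss(0, 1) for x in xs]
train, valid = (xs[:60], ys[:60]), (xs[60:], ys[60:])

best, history = automated_search(
    train, valid, metric=rmse, max_trials=25, time_budget_s=5.0
)
print(best)
```

The three arguments `metric`, `max_trials`, and `time_budget_s` play the roles of the optimization metric and the time/cost constraints on the slide. The returned `history` is a small stand-in for a run history.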
Automated ML Customer Testimonials
• Press coverage from public preview: CNET, VentureBeat, PRNewswire
"I quite like your AutoML function. It gives me good results compared to other libraries I tested before (TPOT and auto-sklearn) that I believe was only looking at scores and often gave me models that over-trained my data. And of course the model from your suggested code is better."
- Big oil company
"I will start with AutoML and use the algorithm that AutoML recommends to further tune the model"
- Data Scientist
"I actually enjoy being able to use AutoML in a Jupyter notebook. The DataRobot interface was nice for non-experts, but for someone like me, it felt a bit basic."
- Data Scientist
Automated ML Capabilities
• Based on Microsoft Research
• Brain trained with several million experiments
• Collaborative filtering and Bayesian optimization
• Privacy preserving: No need to "see" the data
• ML Scenarios: Classification & Regression, Forecasting
• Integration: Azure Machine Learning, Azure Notebooks, Jupyter Notebooks
• Data Type: Numeric, Text
• Languages: Python SDK for deployment and hosting for inference
• Training Compute: Local Machine, Remote Azure DSVM (Linux), Azure Batch AI, Databricks
• Transparency: View run history, model metrics
• Scale: Faster model training using multiple cores and parallel experiments
Feature Engineering
• Dropping high cardinality or no variance features
• Missing value imputation
• Generating additional features
• Transformations and encodings
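The first two feature engineering steps can be illustrated in a few lines. This is a rough sketch under stated assumptions: the cardinality threshold, the mean and most-frequent imputation rules, the helper names, and the sample table are all hypothetical, and the slides do not say how the service itself performs these steps.

```python
from collections import Counter


def drop_uninformative(columns, max_cardinality_ratio=0.9):
    """Drop columns with no variance, and text columns that are almost unique per row."""
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        distinct = set(present)
        if len(distinct) <= 1:
            continue  # no variance (or no data at all)
        is_text = all(isinstance(v, str) for v in present)
        if is_text and len(distinct) / len(present) > max_cardinality_ratio:
            continue  # high cardinality, for example an ID column
        kept[name] = values
    return kept


def impute_missing(columns):
    """Replace None with the mean (numeric) or the most frequent value (text)."""
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            filled[name] = list(values)  # nothing to impute from
            continue
        if all(isinstance(v, (int, float)) for v in present):
            fill = sum(present) / len(present)
        else:
            fill = Counter(present).most_common(1)[0][0]
        filled[name] = [fill if v is None else v for v in values]
    return filled


# Hypothetical table stored column by column.
cars = {
    "vin": ["A1", "B2", "C3", "D4", "E5"],        # unique per row
    "currency": ["USD"] * 5,                      # no variance
    "mileage": [42000, None, 88000, 15000, 61000],
    "brand": ["Audi", "BMW", None, "BMW", "BMW"],
}

cleaned = impute_missing(drop_uninformative(cars))
print(cleaned)
```

Here `vin` and `currency` are dropped, the missing mileage becomes the column mean, and the missing brand becomes the most frequent brand.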
Model Explain-ability
• Feature importance as part of training
• Local feature importance for a given sample
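Global feature importance can be approximated in several ways. The sketch below uses permutation importance, a generic technique chosen here for illustration and not necessarily what the product computes. The stand-in model `predict_price`, its coefficients, and the feature names are hypothetical.

```python
import random


def mean_abs_error(predict, rows, targets):
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Error increase when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_abs_error(predict, shuffled, targets) - baseline)
    return importances


def predict_price(row):
    """Hypothetical stand-in for a trained model; it ignores paint_code."""
    mileage, condition, paint_code = row
    return 20000 - 8000 * mileage + 5000 * condition


data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [predict_price(r) for r in rows]

scores = permutation_importance(predict_price, rows, targets, n_features=3)
for name, score in zip(["mileage", "condition", "paint_code"], scores):
    print(f"{name}: {score:.1f}")
```

A feature the model never uses gets an importance of zero, and features with larger effects on the prediction get larger scores. Local importance for a single sample needs a different method and is not covered by this sketch.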
Q & A
