Idea Engineering
tim@menzies.us
PROMISE’13
Oct’13
0. algorithm mining (yesterday)
1. landscape mining (today)
2. decision mining (tomorrow)
3. discussion mining (future)
The Premises of PROMISE
(2005)
– Wanted: predictions
• Nope. Users want decisions, or engagement
– Data mining will reveal “the truth” about SE
• [Dejaeger: TSE’11], [Hall: TSE’12], [Shepperd:COW’13]
• Not(Better learners = better conclusions)
– Sooner or later: enough data for general conclusions
• Found more differences than generalities
• Special issues: [IST’13], [ESEj’13]
• Best papers, ASE’11, MSR’12
• Menzies, Zimmermann et al [TSE’13]
• Lots of local models
Landscape mining: look before you leap
• Report what is true about the data
– Not trivia on how algorithms walk that data
• Map the landscape
– Reason on each part of map
• E.g. landscape mining
– Unsupervised iterative dichotomization
– Cluster, prune
– Then generate rules
• Different to “leap before you look”
– i.e. skew learning by class variable
– then study the results
• E.g. C4.5, CART, Fayyad-Irani, etc.
– Supervised iterative dichotomization
• E.g. 61% of 300+ effort estimation papers
– Algorithm tinkering, without end
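A minimal sketch of unsupervised iterative dichotomization, assuming numeric rows: the data is halved recursively without ever consulting a class variable. The function name `dichotomize`, the `min_size` stopping rule, and the widest-range splitter are illustrative assumptions, not the method used in the talk.

```python
def dichotomize(rows, min_size=4):
    """Recursively halve rows without looking at any class variable.

    Returns a list of leaf clusters (each a list of rows).
    Assumption: rows are equal-length tuples of numbers.
    """
    if len(rows) < 2 * min_size:
        return [rows]
    dims = range(len(rows[0]))
    # Stand-in splitter (an assumption): the dimension with the widest range.
    widest = max(dims, key=lambda i: max(r[i] for r in rows) - min(r[i] for r in rows))
    ordered = sorted(rows, key=lambda r: r[widest])
    mid = len(ordered) // 2
    return dichotomize(ordered[:mid], min_size) + dichotomize(ordered[mid:], min_size)
```

Rules would then be generated per leaf, after the map exists; a supervised learner such as C4.5 instead picks each split by its effect on the class variable.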
IDEA Engineering = <landscape, decisions, discussion>
Find landscape = cluster data, assign “heights”
Find decisions = report deltas from highs to lows
Monitor discussions = watch and help communities explore deltas
Spectral Landscape Mining
• Spectrum = a condition that is not limited to a specific set of values but varies along a continuum.
• Groups together a broad range of conditions or behaviors under a single title
• In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.
• Nyström algorithms: approximations to eigenvalues
– FASTMAP: linear time
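A sketch of one FastMap-style projection, to show where the linear time comes from: two distant pivot rows are found with two passes over the data, and every row is then placed on the line between them using the cosine rule. The names `fastmap_1d`, `east`, and `west` and the fixed `seed` are assumptions for illustration.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fastmap_1d(rows, seed=1):
    """Project rows onto the line between two distant pivot rows.

    Pivot heuristic: pick any row, find the row furthest from it (east),
    then the row furthest from east (west). Each step is one linear pass.
    """
    rng = random.Random(seed)
    anyone = rng.choice(rows)
    east = max(rows, key=lambda r: dist(r, anyone))
    west = max(rows, key=lambda r: dist(r, east))
    c = dist(east, west)
    if c == 0:
        return [0.0 for _ in rows]
    # Cosine rule: x = (a^2 + c^2 - b^2) / (2c),
    # where a = dist(row, east) and b = dist(row, west).
    return [(dist(r, east) ** 2 + c ** 2 - dist(r, west) ** 2) / (2 * c) for r in rows]
```

The pivots are a heuristic approximation of the direction of greatest spread, so the result only approximates the first principal component.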
Project data onto the first 2 principal components; grid that data
e.g. Nasa93dem
1) project 23 dimensions into 2
2a) cluster
2b) replace clusters with centroids.
MOEA: score = effort + defects + months
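Steps 2a and 2b can be sketched as follows, assuming the rows have already been projected to 2-D points. The function `grid_centroids` and its `bins` parameter are hypothetical names; an equal-width grid is one simple way to cluster, not necessarily the one used here.

```python
from collections import defaultdict

def grid_centroids(points, bins=4):
    """Bucket 2-D points into a bins x bins grid.

    Returns {cell: centroid}, one centroid per non-empty cell.
    Assumption: equal-width cells spanning the observed min..max range.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, y_lo = min(xs), min(ys)
    x_span = (max(xs) - x_lo) or 1.0  # avoid dividing by zero on flat data
    y_span = (max(ys) - y_lo) or 1.0
    cells = defaultdict(list)
    for x, y in points:
        # The maximum value would land in cell index `bins`; clamp it.
        i = min(int((x - x_lo) / x_span * bins), bins - 1)
        j = min(int((y - y_lo) / y_span * bins), bins - 1)
        cells[(i, j)].append((x, y))
    return {
        cell: (sum(p[0] for p in members) / len(members),
               sum(p[1] for p in members) / len(members))
        for cell, members in cells.items()
    }
```

Later reasoning then runs over the handful of centroids rather than over every row.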
Sanity check: what information loss?
• E.g. POI-3
– 400+ examples
– 20 centroids
• Prediction via:
– Extrapolation between two nearest centroids
• Works as well as
– Random forest, Naïve Bayes
• For defect prediction (10 data sets)
– Linear regression, M5’
• For effort estimation (10 data sets)
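One possible reading of prediction by extrapolation between the two nearest centroids, as a sketch: the test row is placed along the line joining those two centroids, and their target values are interpolated (or extrapolated) at that position. The function `predict` and the `(features, target)` pair layout are assumptions.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(row, centroids):
    """Estimate a target for row from its two nearest centroids.

    centroids: list of (features, target) pairs; at least two are needed.
    """
    (f1, t1), (f2, t2) = sorted(centroids, key=lambda c: dist(row, c[0]))[:2]
    c = dist(f1, f2)
    if c == 0:
        return (t1 + t2) / 2
    a, b = dist(row, f1), dist(row, f2)
    # Position of row along the f1 -> f2 line as a fraction (cosine rule).
    # Values below 0 or above 1 extrapolate beyond the two centroids.
    frac = (a ** 2 + c ** 2 - b ** 2) / (2 * c * c)
    return t1 + frac * (t2 - t1)
```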
Planning = Inter-cluster contrast sets
• Find deltas between neighbors that go from worse to better
• Very small rules, found in log-linear time
• Menzies et al. [TSE’13]
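A sketch of planning by contrast sets, assuming each cluster is summarized by discrete attributes plus a score where lower is better. For a given cluster, it finds the nearest better-scoring cluster and reports only the attributes that differ. The names `contrast`, `plan`, and `hamming`, and the attribute names in the usage example, are all hypothetical.

```python
def hamming(a, b):
    """Number of attributes on which two clusters disagree (same keys assumed)."""
    return sum(1 for k in a if a[k] != b[k])

def contrast(worse, better):
    """Attributes whose values differ, as {name: (from_value, to_value)}."""
    return {k: (worse[k], better[k]) for k in worse if worse[k] != better[k]}

def plan(clusters, name, distance=hamming):
    """Suggest the smallest change that moves a cluster to a better neighbor.

    clusters: {name: (attributes, score)}, lower score is better.
    Returns (neighbor_name, delta), or None if no cluster scores better.
    """
    attrs, score = clusters[name]
    better = [(n, a) for n, (a, s) in clusters.items() if s < score]
    if not better:
        return None
    neighbor, neighbor_attrs = min(better, key=lambda na: distance(attrs, na[1]))
    return neighbor, contrast(attrs, neighbor_attrs)
```

With made-up clusters:

```python
clusters = {
    "c1": ({"team": "large", "tools": "low"}, 90),
    "c2": ({"team": "large", "tools": "high"}, 60),
    "c3": ({"team": "small", "tools": "high"}, 30),
}
print(plan(clusters, "c1"))  # ('c2', {'tools': ('low', 'high')})
```

The delta is small because only the differences between adjacent clusters are reported.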
Applications
• Prediction
• Planning
• Monitoring
• Multi-objective optimization
– Cluster first on N objectives
• Anomaly detection
• Incremental theory revision
• Compression
• Privacy
• etc
Idea Engineering
0. algorithm mining (yesterday)
1. landscape mining (today)
2. decision mining (tomorrow)
3. discussion mining (future)
Beyond Data Mining, T. Menzies, IEEE Software, 2013, to appear
Q: Why call it mining?
• A1: because all the primitives for the above are in the data mining literature
• So we know how to get from here to there
• A2: because data mining scales
