SlideShare una empresa de Scribd logo
1 de 55
Spreadsheet Engineering
Jácome Cunha
jacome@di.uminho.pt
www.di.uminho.pt/~jacome
Researcher @

HASLab / INESC TEC & Universidade do Minho, Portugal
Oregon State University (office 2077)
OSU - EECS Colloquium - 02/24/14
Agenda
I.

Motivation

II.

Spreadsheets Meet Models

III.

Models for Spreadsheets – ClassSheets

IV.

Inferring ClassSheets

V.

Embedding ClassSheets

VI.

Evolution!

VII.

Model-Driven Spreadsheets

VIII.

Summary
2
I. Motivation

3
4
Why do Spreadsheets matter?

Financial intelligence firm CODA reports that
95% of all U.S. firms use spreadsheets for
financial reporting.

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008

5
Why do Spreadsheets matter?

They are the programming language of choice by nonprofessional programmers, a.k.a. end users.
In the U.S. alone, the number of end-user
programmers is conservatively estimated at 11 million,
compared to only 2.75 million other, professional
programmers.
Estimating the numbers of end users and end-user programmers,
Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005
6
7
Why do Spreadsheets matter?

In 2004, RevenueRecognition.com (now
Softtrax) had the International Data Corporation
interview 118 business leaders.
IDC found that 85% were using spreadsheets in
financial reporting and forecasting.

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008

8
In fact, spreadsheets lack:
●

Abstraction
●

●

Encapsulation

Type system
●

Testing
●

●

IDE

...
9
And the consequences may be...
Around 200 people who thought their only
experience of the London 2012
Olympic Games would be minor heats of
synchronised swimming have
received an unexpected upgrade to the men’s
100m final following an
embarrassing ticketing mistake.
...
Locog said the error occurred in the summer,
between the first and
second round of ticket sales, when a member of
staff made a single
keystroke mistake and entered ‘20,000’ into a
spreadsheet rather than
the correct figure of 10,000 remaining tickets.
The Telegraph, 04 January 2012

10
And the consequences may be...
In a 2010 paper* Carmen Reinhart, now a professor at
Harvard Kennedy School, and Kenneth Rogoff, an
economist at Harvard University...argued that GDP
growth slows to a snail’s pace once government-debt
levels exceed 90% of GDP. The 90% figure quickly
became ammunition in political arguments over
austerity...This week a new piece of research poured fuel
on the fire by calling the 90% finding into question..

Economy losses of $10 billion/year!

The Economist, 17 April 2013

Harvard University economists Carmen Reinhart
and Kenneth Rogoff have acknowledged making
a spreadsheet calculation mistake in a 2010
research paper, “Growth in a Time of Debt”,
which has been widely cited to justify budgetcutting.
Business Week, 18 April 2013

11
II. Spreadsheets Meet Models
[PEPM'09, VL/HCC'10]

12
Why Models?

13
Spreadsheet Example

14
Functional Dependency?

→

↛
15
Functional Dependencies

●

●

●

We compute the business logic from the data, by
inferring FDs
They are the building blocks inferring models for
(legacy) spreadsheets
The better the FDs we infer, the better the model we
compute!
16
Too Many??
["A"] -> ["B","C","D","E","F"]
["C"] -> ["A","B","D","E","F"]
["D"] -> ["A","B","C","E","F"]
["E"] -> ["A","B","C","D","F"]
["F"] -> ["A","B","C","D","E"]
["G"] -> ["H","I","J"]
["H"] -> ["G","I","J"]
["I"] -> ["G","H","J"]
["J"] -> ["G","H","I"]
["K"] -> ["L","M"]
["L"] -> ["K","M"]
["M"] -> ["K","L"]
["B","K"] -> ["A","C","D","E","F"]
["B","L"] -> ["A","C","D","E","F"]
["B","M"] -> ["A","C","D","E","F"]

17
Accidents happen

●

●

We use a data mining algorithm which produces to
many accidental FDs!

We introduce some spreadsheet specific heuristics to
filter out “accidental” FDs

18
Organize them
●

●

Label semantics: often keys are labeled “code” or “id”
Label arrangement: we prefer FDs respecting the
order of columns

●

Antecedent size: small keys are preferable

●

Ratio: small ratio between keys and non-keys

●

Single value columns: columns always with the
same value appear in too many FDs

19
Final set

Pilot-Id → Pilot-Name, Phone
N-Number → Model, Plane-Name
Pilot-Id, N-Number, Depart, Destination, Date, Hours → {}
20
The first model: a relational model
Having computed the FDs, we can now use the FUN
algorithm to produce a relational model for the
spreadsheet:

Pilots (Pilot-Id, Pilot-Name, Phone)
Planes (N-Number, Model, Plane-Name)
<Flights> (#Pilot-Id,# N-Number, Depart, Destination, Date Hours)
21
III. Models for Spreadsheet – ClassSheets
Engels and Erwig ASE'05

22
ClassSheets - Models for Spreadsheets
ClassSheets are a high-level, object-oriented formalism to
specify spreadsheets

23
ClassSheets - Models for Spreadsheets

24
ClassSheets - Models for Spreadsheets

25
IV. Inferring ClassSheets
[VL/HCC'10]

26
ClassSheet Inference

27
V. Embedding ClassSheets
[VL/HCC'11]

28
Why the Embedding?

29
Embedding...
●

Embedding a language into another language is
a recurring strategy (e.g. for DSLs)
–

–

Users are used to the host language and do not
need to learn a (complete) new language :-)

–

Implementation effort is much reduced :-)

–

●

Embedded language inherit all the power of the
host language :-)

It may have some restrictions :-(

We embedded ClassSheets in traditional
spreadsheet systems

30
Vertically Expandable Tables

31
Horizontally Expandable Tables

32
Relationship Tables

33
34
VI. Evolution!
[FASE'11, ICMT'12]

35
Why do Spreadsheet Models Need Evolution?
●

●

Suppose now you need to add new information
to the spreadsheet
For instance, the number of passengers of
each flight

●

It would require to do several error-prone tasks

●

Add columns, labels, update formulas, etc.

●

We can do it automatically!

36
Why do Spreadsheet Instances Need Evolution?

●

●

●

Some evolution steps are easier to perform on
the instance
For instance, to add a column to one of the
repetition blocks
People felt the need to evolve the data
37
38
Bidirectional Transformation System

39
(Data) Operations on Instances

40
(Model) Operations on ClassSheets

41
Bidirectional Transformation Functions

42
Compositional
Example: Add a Column and a Class

43
VII. MDSheet – Model-Driven Spreadsheets
[ICSE'12]

44
MDSheet Tool

http://youtu.be/6LNdTdCpV2U

45
●

Available at http://ssaapp.di.uminho.pt

●

Built out of 7886 LOC:
–

3181 in Haskell, for the inference and evolution

–

980 in Basic, for the embedding

–

2884 in C++, for gluing all components

–

340 in Perl, for compilation and setup

–

722, for makefiles
46
VIII. Summary

47
Acknowledgments

This work has been done in collaboration with
many people:
Martin Erwig, João Paulo Fernandes, Jorge
Mendes, Hugo Pacheco, Rui Pereira, João
Saraiva, Joost Visser

48
Thanks!
Questions?

49
More?
●

More at http://ssaapp.di.uminho.pt

●

Querying model-driven spreadsheet

●

Visually querying model-driven spreadsheets

●

Detections of bad smells

●

Edit assistance

●

Empirical validations

●

Variational spreadsheets (@ OSU)

●

...

50
Does It Work?

51
Empirical Study Settings

●

17 student from a MSc course

●

2 different spreadsheets
–

Microsoft budget

–

Local company responsible for water supply of
Braga, Portugal - agere
52
Study Setting

●

Hypotheses:
(1) In order to perform a given set of tasks,
users spend less time when using modeldriven spreadsheets instead of plain ones.
(2) Spreadsheets developed in the modeldriven environment hold less errors than plain
ones.
53
Main Results
Number of tasks performed on the MS spreadsheet

54
Main Results
Error rate in the budget spreadsheet

55

Más contenido relacionado

Similar a Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14

Spreadsheet Engineering
Spreadsheet EngineeringSpreadsheet Engineering
Spreadsheet EngineeringJácome Cunha
 
Summer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringSummer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringJácome Cunha
 
Varied encounters with data science (slide share)
Varied encounters with data science (slide share)Varied encounters with data science (slide share)
Varied encounters with data science (slide share)gilbert.peffer
 
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangBest Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangFrank Vogelezang
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentJácome Cunha
 
The Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningThe Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningIRJET Journal
 
Computer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringComputer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringuniversity of sust.
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...DataBench
 
51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-projectArup Bose, PMP
 
Eng Cal Reduced1
Eng Cal Reduced1Eng Cal Reduced1
Eng Cal Reduced1rtmote
 
Model-driven Spreadsheets
Model-driven SpreadsheetsModel-driven Spreadsheets
Model-driven SpreadsheetsJácome Cunha
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...DataBench
 
New Developments in New-Product Development
New Developments in New-Product DevelopmentNew Developments in New-Product Development
New Developments in New-Product DevelopmentPTC
 
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014inventionjournals
 
MGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fMGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fDioneWang844
 
Dutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanDutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanSalesforce_Benelux
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class QuantUniversity
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 

Similar a Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14 (20)

Spreadsheet Engineering
Spreadsheet EngineeringSpreadsheet Engineering
Spreadsheet Engineering
 
Summer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringSummer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet Engineering
 
Varied encounters with data science (slide share)
Varied encounters with data science (slide share)Varied encounters with data science (slide share)
Varied encounters with data science (slide share)
 
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangBest Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet Development
 
The Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningThe Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine Learning
 
Computer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringComputer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineering
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project
 
Eng Cal Reduced1
Eng Cal Reduced1Eng Cal Reduced1
Eng Cal Reduced1
 
Model-driven Spreadsheets
Model-driven SpreadsheetsModel-driven Spreadsheets
Model-driven Spreadsheets
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
New Developments in New-Product Development
New Developments in New-Product DevelopmentNew Developments in New-Product Development
New Developments in New-Product Development
 
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
 
MGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fMGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the f
 
Dutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanDutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan Baan
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 

Más de Jácome Cunha

Energy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesEnergy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesJácome Cunha
 
Explaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsExplaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsJácome Cunha
 
Automatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsAutomatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsJácome Cunha
 
On Understanding Data Scientists
On Understanding  Data ScientistsOn Understanding  Data Scientists
On Understanding Data ScientistsJácome Cunha
 
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Jácome Cunha
 
jStanley: Placing a Green Thumb on Java Collections
jStanley: Placing a Green Thumb on  Java CollectionsjStanley: Placing a Green Thumb on  Java Collections
jStanley: Placing a Green Thumb on Java CollectionsJácome Cunha
 
Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesJácome Cunha
 
MDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsMDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsJácome Cunha
 

Más de Jácome Cunha (12)

Energy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesEnergy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming Languages
 
LMCC - 30 Anos
LMCC - 30 AnosLMCC - 30 Anos
LMCC - 30 Anos
 
Explaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsExplaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with Spreadsheets
 
Automatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsAutomatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from Spreadsheets
 
On Understanding Data Scientists
On Understanding  Data ScientistsOn Understanding  Data Scientists
On Understanding Data Scientists
 
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
 
jStanley: Placing a Green Thumb on Java Collections
jStanley: Placing a Green Thumb on  Java CollectionsjStanley: Placing a Green Thumb on  Java Collections
jStanley: Placing a Green Thumb on Java Collections
 
Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web Services
 
MDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsMDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven Spreadsheets
 
Talk
TalkTalk
Talk
 
Talk at IS-EUD '11
Talk at IS-EUD '11Talk at IS-EUD '11
Talk at IS-EUD '11
 
Talk at VL/HCC '11
Talk at VL/HCC '11Talk at VL/HCC '11
Talk at VL/HCC '11
 

Último

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14

  • 1. Spreadsheet Engineering Jácome Cunha jacome@di.uminho.pt www.di.uminho.pt/~jacome Researcher @ HASLab / INESC TEC & Universidade do Minho, Portugal Oregon State University (office 2077) OSU - EECS Colloquium - 02/24/14
  • 2. Agenda I. Motivation II. Spreadsheets Meet Models III. Models for Spreadsheets – ClassSheets IV. Inferring ClassSheets V. Embedding ClassSheets VI. Evolution! VII. Model-Driven Spreadsheets VIII. Summary 2
  • 4. 4
  • 5. Why do Spreadsheets matter? Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 5
  • 6. Why do Spreadsheets matter? They are the programming language of choice by nonprofessional programmers, a.k.a. end users. In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers. Estimating the numbers of end users and end-user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005 6
  • 7. 7
  • 8. Why do Spreadsheets matter? In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 8
  • 9. In fact, spreadsheets lack: ● Abstraction ● ● Encapsulation Type system ● Testing ● ● IDE ... 9
  • 10. And the consequences may be... Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men’s 100m final following an embarrassing ticketing mistake. ... Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered ‘20,000’ into a spreadsheet rather than the correct figure of 10,000 remaining tickets. The Telegraph, 04 January 2012 10
  • 11. And the consequences may be... In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail’s pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question.. Economy losses of $10 billion/year! The Economist, 17 April 2013 Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, “Growth in a Time of Debt”, which has been widely cited to justify budgetcutting. Business Week, 18 April 2013 11
  • 12. II. Spreadsheets Meet Models [PEPM'09, VL/HCC'10] 12
  • 16. Functional Dependencies ● ● ● We compute the business logic from the data, by inferring FDs They are the building blocks inferring models for (legacy) spreadsheets The better the FDs we infer, the better the model we compute! 16
  • 17. Too Many?? ["A"] -> ["B","C","D","E","F"] ["C"] -> ["A","B","D","E","F"] ["D"] -> ["A","B","C","E","F"] ["E"] -> ["A","B","C","D","F"] ["F"] -> ["A","B","C","D","E"] ["G"] -> ["H","I","J"] ["H"] -> ["G","I","J"] ["I"] -> ["G","H","J"] ["J"] -> ["G","H","I"] ["K"] -> ["L","M"] ["L"] -> ["K","M"] ["M"] -> ["K","L"] ["B","K"] -> ["A","C","D","E","F"] ["B","L"] -> ["A","C","D","E","F"] ["B","M"] -> ["A","C","D","E","F"] 17
  • 18. Accidents happen ● ● We use a data mining algorithm which produces to many accidental FDs! We introduce some spreadsheet specific heuristics to filter out “accidental” FDs 18
  • 19. Organize them ● ● Label semantics: often keys are labeled “code” or “id” Label arrangement: we prefer FDs respecting the order of columns ● Antecedent size: small keys are preferable ● Ratio: small ratio between keys and non-keys ● Single value columns: columns always with the same value appear in too many FDs 19
  • 20. Final set Pilot-Id → Pilot-Name, Phone N-Number → Model, Plane-Name Pilot-Id, N-Number, Depart, Destination, Date, Hours → {} 20
  • 21. The first model: a relational model Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet: Pilots (Pilot-Id, Pilot-Name, Phone) Planes (N-Number, Model, Plane-Name) <Flights> (#Pilot-Id,# N-Number, Depart, Destination, Date Hours) 21
  • 22. III. Models for Spreadsheet – ClassSheets Engels and Erwig ASE'05 22
  • 23. ClassSheets - Models for Spreadsheets ClassSheets are a high-level, object-oriented formalism to specify spreadsheets 23
  • 24. ClassSheets - Models for Spreadsheets 24
  • 25. ClassSheets - Models for Spreadsheets 25
  • 30. Embedding... ● Embedding a language into another language is a recurring strategy (e.g. for DSLs) – – Users are used to the host language and do not need to learn a (complete) new language :-) – Implementation effort is much reduced :-) – ● Embedded language inherit all the power of the host language :-) It may have some restrictions :-( We embedded ClassSheets in traditional spreadsheet systems 30
  • 34. 34
  • 36. Why do Spreadsheet Models Need Evolution? ● ● Suppose now you need to add new information to the spreadsheet For instance, the number of passengers of each flight ● It would require to do several error-prone tasks ● Add columns, labels, update formulas, etc. ● We can do it automatically! 36
  • 37. Why do Spreadsheet Instances Need Evolution? ● ● ● Some evolution steps are easier to perform on the instance For instance, to add a column to one of the repetition blocks People felt the need to evolve the data 37
  • 38. 38
  • 40. (Data) Operations on Instances 40
  • 41. (Model) Operations on ClassSheets 41
  • 43. Compositional Example: Add a Column and a Class 43
  • 44. VII. MDSheet – Model-Driven Spreadsheets [ICSE'12] 44
  • 46. ● Available at http://ssaapp.di.uminho.pt ● Built out of 7886 LOC: – 3181 in Haskell, for the inference and evolution – 980 in Basic, for the embedding – 2884 in C++, for gluing all components – 340 in Perl, for compilation and setup – 722, for makefiles 46
  • 48. Acknowledgments This work has been done in collaboration with many people: Martin Erwig, João Paulo Fernandes, Jorge Mendes, Hugo Pacheco, Rui Pereira, João Saraiva, Joost Visser 48
  • 50. More? ● More at http://ssaapp.di.uminho.pt ● Querying model-driven spreadsheet ● Visually querying model-driven spreadsheets ● Detections of bad smells ● Edit assistance ● Empirical validations ● Variational spreadsheets (@ OSU) ● ... 50
  • 52. Empirical Study Settings ● 17 student from a MSc course ● 2 different spreadsheets – Microsoft budget – Local company responsible for water supply of Braga, Portugal - agere 52
  • 53. Study Setting ● Hypotheses: (1) In order to perform a given set of tasks, users spend less time when using modeldriven spreadsheets instead of plain ones. (2) Spreadsheets developed in the modeldriven environment hold less errors than plain ones. 53
  • 54. Main Results Number of tasks performed on the MS spreadsheet 54
  • 55. Main Results Error rate in the budget spreadsheet 55

Notas del editor

  1. Spreadsheet are widely used By everyone By many companies
  2. They are grate Lots of features
  3. Intended to be used by trained person Professional on Spreadsheet models/ClassSheets Not by instance/data end users
  4. This generated spreadsheet guides users in introducing correct data The spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update
  5. Baseado em haskell Integrado no OO Clicanca-se em botoes Espetacular Basic