Analyzing & visualizing spreadsheets
Felienne Hermans (@felienne)
In this slide deck I present an
overview of my PhD research. I
recently defended my dissertation
titled ‘Analyzing and visualizing
Spreadsheets’
 This one!
Bridging the gap
Funny story: I wasn’t hired to
research spreadsheets at all. When
I started my PhD project, I was
supposed to research the gap
between business users and
programmers.
Users
Programmers
To research this gap, I started by
studying business in practice
What surprised me is that this gap
wasn’t that big; it was more like a
small creek than a huge cliff.
Some programmers were heavily
involved in business, and even more
interesting: some business guys were
doing serious programming.
In Excel!
So I looked into some previous work
on the impact of spreadsheets on
business.
95% of all U.S. firms use spreadsheets for
financial reporting
90% of all analysts in industry perform
calculations in spreadsheets
50% of spreadsheets form the basis for
decisions
Importance can grow over time
When studying the impact of
spreadsheets, we found that they
do not become important
overnight. As processes change,
spreadsheets can become key
company assets over time.
Nobody sets out to create a mission
critical spreadsheet, they “just
happen”
This is a simple spreadsheet for many
users
Furthermore, spreadsheets can
become surprisingly complex.
And spreadsheets exist
‘under the radar’
Another interesting property of
spreadsheets is that they often live
‘under the radar’:
There is no list of spreadsheets, no
one keeps track of what sheets are
needed for what report and some
spreadsheets do not have a clear
owner.
Only 33% of spreadsheets have
a manual
Finally, spreadsheets lack
documentation. In only one third of
spreadsheets did we find
‘documentation’ (i.e. some sort of
explanation of how to use the
spreadsheet). Technical
documentation, explaining why a
spreadsheet was designed as it is,
was hardly ever found.
Complex spreadsheets without
documentation can lead to serious errors
You can imagine that the combination
of all the above facts:
• Spreadsheets are important
• They are complex
• They lack documentation
is a potential recipe for disaster.
And indeed, those errors happen
The European Spreadsheet Risk Interest
Group (Eusprig.org) collects horror stories
Estimated loss: 10 billion dollars a year
We interviewed spreadsheet
professionals
Once I had studied related
spreadsheet work and the horror
stories from Eusprig, I wanted to
gain a deeper understanding of
spreadsheet problems in practice.
So I interviewed 27 spreadsheet
professionals at the Dutch Robeco
bank.
I asked only two questions (a semi-
structured interview) to obtain an
overall view of spreadsheet
problems:
What annoys you?
And what makes you happy?
Financial professionals spend 2 days a
week working with Excel
From the interviews, we learned the
following facts
Spreadsheets can have a long life,
5 years on average
Average sheet is used by 12 different
people
There is a gap! Between importance and
treatment.
Then I concluded that there is an
interesting gap that needs
bridging:
the gap between how important
spreadsheets are and how well
they are treated.
So how could this gap be bridged?
It looks like software in the 70s!
Let’s summarize the problems
around spreadsheets again:
• They lack documentation
• They contain errors
• They stay alive for several years
and are used by several people
• They are complex
Does this remind you of
something?
It reminded me of the problems in
the early days of software
Hence, we tried to bridge this gap with
methods from software engineering.
Spreadsheet users lack great tool
support
If you compare the tooling of
spreadsheet developers with that
of software developers, the
difference is clear.
Modern IDEs (like Visual Studio)
have all kinds of built-in tools to
help you build software in a
responsible way: debugging,
testing, analyzing and visualizing
are accessible at the click of a
button.
Compare this to a spreadsheet
environment, like Excel. Lots of
support to create a spreadsheet,
with fonts and colors and borders,
but none of the helpful tools to
build a maintainable spreadsheet.
We did not start coding immediately
However tempting, we did not start
to build a spreadsheet IDE
immediately. Instead, we looked
at the results of the interviews, to
find the most pressing information
need that spreadsheet users had.
Most important problem: support for
understanding spreadsheets was missing
To address this information need
specifically, we developed our
tool Breviz.
This tool visualizes the
dependencies among worksheets,
depicted as rectangles with arrows
drawn between them. The thicker
the arrow, the more connections
there are.
Example: In worksheet ‘POA
Project’ formulas are placed that
refer to cells in ‘ProjectTeam’
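The idea behind such a diagram can be sketched in a few lines of Python. This is a minimal illustration, not the Breviz implementation: the `workbook` dictionary, the regular expression and the example formulas are assumptions made for the sketch. It counts how often formulas on one worksheet refer to another worksheet, and that count could serve as the arrow thickness.

```python
import re
from collections import Counter

# Matches a sheet-qualified reference such as ProjectTeam!B2 or 'POA Project'!A1.
# A range like ProjectTeam!D2:D10 is counted as one reference.
SHEET_REF = re.compile(r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))!\$?[A-Z]{1,3}\$?\d+")

def worksheet_edges(workbook):
    """Count references from each worksheet to every other worksheet.

    `workbook` maps worksheet name -> {cell address -> formula string}.
    Returns a Counter keyed by (source sheet, target sheet).
    """
    edges = Counter()
    for sheet, cells in workbook.items():
        for formula in cells.values():
            for quoted, bare in SHEET_REF.findall(formula):
                target = quoted or bare
                if target != sheet:  # ignore references within the same sheet
                    edges[(sheet, target)] += 1
    return edges

# Hypothetical formulas; only the worksheet names come from the example above.
workbook = {
    "POA Project": {
        "B2": "=ProjectTeam!B2*ProjectTeam!C2",
        "B3": "=SUM(ProjectTeam!D2:D10)",
        "B4": "=B2+B3",
    },
    "ProjectTeam": {"B2": "=40", "C2": "=75"},
}

edges = worksheet_edges(workbook)
for (source, target), weight in edges.items():
    print(f"{source} -> {target} (weight {weight})")
```

Each key of `edges` is one arrow in the diagram, and its count is the weight of that arrow.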
We went back to practice
With our tool, we went back to
practice, to see whether it really
supported spreadsheet users.
Turned out, it did. Some of the
responses of users:
“This diagram
reminds me of
what I had in mind
when building”
This remark is interesting:
apparently, this spreadsheet user
did do some modeling before
building a spreadsheet.
A clear sign that we were on the
right track!
“This makes my job
10 times easier”
This work was published
at ICSE 2011
However, unexpected things also
happened. Not all spreadsheets
looked as well structured as this
one.
Let’s look at some of them:
Here, pink blocks represent
worksheets outside of the
spreadsheet. So this spreadsheet
gathers information from over 20
other worksheets and combines
this information.
Users diagnosed with the diagrams
We found that, due to the diversity
of the diagrams, users started to
judge spreadsheets based on their
dataflow diagrams.
We therefore formalized this
feeling users had into ‘smells’ at
the design level.
These spreadsheet smells turned
out to be very similar to code
smells as defined by Fowler.
Consider for instance the ‘feature
envy’ smell. This occurs when a
method from class B refers to
many fields outside its own class.
This method envies all the cool
fields that A has, hence the name.
It is easy to see how this smell could
be defined on spreadsheets,
where a formula in worksheet B
could be overly interested in cells
on worksheet A.
We added support in Breviz for
detecting and visualizing these
inter-worksheet code smells.
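A rough sketch of how feature envy could be flagged on formulas is given below. It is an illustration only and not the detection used in Breviz: the `threshold` value, the `workbook` layout, the sheet names and the formulas are all assumptions. A formula is reported when it refers to cells on one other worksheet at least `threshold` times.

```python
import re
from collections import Counter

# Optional sheet qualifier followed by a cell address,
# e.g. A1, Data!B3 or 'My Sheet'!C4.
REF = re.compile(
    r"(?:(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))!)?\$?([A-Z]{1,3})\$?(\d+)"
)

def feature_envy(workbook, threshold=3):
    """Return (sheet, cell, envied sheet, reference count) for smelly formulas.

    `workbook` maps worksheet name -> {cell address -> formula string}.
    `threshold` is an assumed cut-off, not a value taken from the research.
    """
    smells = []
    for sheet, cells in workbook.items():
        for address, formula in cells.items():
            foreign = Counter()
            for quoted, bare, _col, _row in REF.findall(formula):
                target = quoted or bare or sheet  # unqualified means own sheet
                if target != sheet:
                    foreign[target] += 1
            for target, count in foreign.items():
                if count >= threshold:
                    smells.append((sheet, address, target, count))
    return smells

# Hypothetical workbook: Summary!B1 leans heavily on the Data worksheet.
workbook = {
    "Summary": {
        "B1": "=Data!A1+Data!A2+Data!A3+Data!A4",
        "B2": "=B1*2+Data!B1",
    },
    "Data": {"A1": "=1"},
}

print(feature_envy(workbook))
```

A possible refactoring for a flagged formula is to move it to the worksheet it envies, just as a method with feature envy is moved to the class whose fields it uses.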
We went back to practice
Next, of course, we went back to
practice, to see how users felt
about the detected smells.
“That
should be
improved”
Results showed that users
understood why certain
constructions were qualified as
smelly.
“This must be
confusing for others”
Published at ICSE 2012
However, new problems were to be
discovered. We found that, once
the structure of the spreadsheets
had been understood and
validated, complex formulas still
got in the way of understanding
spreadsheets.
This led us to the idea of formula smells
Again, we took our inspiration from
the smells that Fowler defines in his
canonical book on refactoring.
Published at ICSM 2012
In a recent extension of the paper,
we also suggest refactorings
corresponding to smells.
This formula, for instance, contains
the same subformula twice.
Extracting this subformula into a
separate cell will improve
readability.
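Spotting a duplicated subformula can be sketched as follows. This is a simplified, text-based illustration and not the analysis from the paper: the example formula is hypothetical, string literals containing parentheses are not handled, and two subformulas only count as equal when they are written identically.

```python
from collections import Counter

def subformulas(formula):
    """Return every balanced NAME(...) or (...) substring in the formula."""
    found = []
    stack = []  # start index of each subformula that is still open
    for i, ch in enumerate(formula):
        if ch == "(":
            start = i
            # Walk back over the function name in front of the parenthesis.
            while start > 0 and (formula[start - 1].isalnum()
                                 or formula[start - 1] in "._"):
                start -= 1
            stack.append(start)
        elif ch == ")" and stack:
            start = stack.pop()
            found.append(formula[start:i + 1])
    return found

def duplicated_subformulas(formula):
    """Return the subformulas that occur more than once."""
    counts = Counter(subformulas(formula))
    return [sub for sub, n in counts.items() if n > 1]

# Hypothetical formula that computes the same sum twice.
formula = "=IF(SUM(A1:A10)>100,SUM(A1:A10)*0.9,0)"
print(duplicated_subformulas(formula))
```

Here the refactoring would place `=SUM(A1:A10)` in its own cell and let the `IF` formula refer to that cell twice.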
We went back to practice
And again... A look in practice
We found that cloning (i.e.
copy-pasting) in spreadsheets was a
problem. If data is copy-pasted,
updates will not be propagated to
the copies and that might lead to
errors.
Based on existing work in clone
detection in source code, we
developed an algorithm to detect
clones.
Clone visualization was added to
our visualization, indicated with a
dashed arrow. After all, when data
is copy-pasted between
worksheets, there is a dependency
between those worksheets (albeit a
different one than a formula link)
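To make the notion of a clone concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper: the function name, the `min_match` cut-off and the sample numbers are assumptions. It compares a block of values computed by formulas with a block of plain values found elsewhere, and labels the pair an exact clone or a near-miss clone depending on how many cells agree.

```python
def classify_clone(formula_values, pasted_values, min_match=0.8):
    """Compare a block of formula results with a block of plain values.

    Returns "clone" if every cell matches, "near-miss" if at least
    `min_match` of the cells match, and None otherwise.
    `min_match` is an assumed threshold chosen for illustration.
    """
    if len(formula_values) != len(pasted_values) or not formula_values:
        return None
    matches = sum(1 for a, b in zip(formula_values, pasted_values) if a == b)
    ratio = matches / len(formula_values)
    if ratio == 1.0:
        return "clone"
    if ratio >= min_match:
        return "near-miss"
    return None

# Hypothetical data: kilos per product, computed on one worksheet ...
computed = [120, 95, 210, 180, 60]
# ... and pasted as plain values on another worksheet.
pasted_fresh = [120, 95, 210, 180, 60]
pasted_stale = [120, 95, 200, 180, 60]  # one value was not updated

print(classify_clone(computed, pasted_fresh))
print(classify_clone(computed, pasted_stale))
```

Near-miss clones are the interesting ones to inspect: the few cells that differ may be copies that were not updated after the original changed.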
To validate our algorithm, we
performed a case study at the
distribution centre of the South
Dutch food bank. There, they
process 100,000 kilos of food per
month, and keep track of that with
spreadsheets.
We were able to detect 61 near-
miss clones, of which 25 were
actual errors.
Because of our analysis, this
distribution centre is now running
error-free spreadsheets!
To be published at ICSE 2013
And this paper concluded my PhD
thesis.
I will continue to work on
spreadsheet analysis for at least
five more years at Delft University of
Technology, so in the remaining
few slides, I’ll outline what I will be
working on in the future.
Remember spreadsheets stay in
business for 5 years and are used
by 12 people during their life span?
This makes it interesting to consider
‘spreadsheet evolution’ and study
how spreadsheets are created.
Visual Basic Analysis
In our current visualization and
analysis technique, we only
consider formulas.
However, spreadsheets also allow
for code to interact with data and
formulas (VBA code in Excel).
By analyzing this, we could make
our analysis more complete and
interesting.
Spreadsheet testing
Finally, we want to research how
spreadsheet users test. One might
think that spreadsheet users do not
test, but this is not true.
In our previous studies, we often
saw formulas like this one. Here,
nothing is really calculated.
Instead, some sort of validation is
performed: if ‘find zone’!W3 is
smaller than 0, we are not
interested in the value.
If we could extract these types
of formulas, we could use them to
test the spreadsheet.
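Extracting such validation formulas could start with something as simple as the sketch below. It is only an illustration of the idea: the regular expression, the `extract_check` helper and the example formula are assumptions, and only an `IF` whose condition compares a single cell with a number is recognised.

```python
import re

# An IF whose condition compares one (possibly sheet-qualified) cell to a number.
CHECK = re.compile(
    r"^=IF\(\s*"
    r"(?:(?P<sheet>'[^']+'|[A-Za-z_][A-Za-z0-9_.]*)!)?"
    r"(?P<cell>\$?[A-Z]{1,3}\$?\d+)\s*"
    r"(?P<op><=|>=|<>|<|>|=)\s*"
    r"(?P<bound>-?\d+(?:\.\d+)?)\s*,",
    re.IGNORECASE,
)

def extract_check(formula):
    """Return the condition of a validation-style IF formula, or None."""
    m = CHECK.match(formula)
    if m is None:
        return None
    sheet = m.group("sheet")
    return {
        "sheet": sheet.strip("'") if sheet else None,
        "cell": m.group("cell"),
        "op": m.group("op"),
        "bound": float(m.group("bound")),
    }

# Hypothetical formula in the spirit of the example: negative values are rejected.
formula = "=IF('find zone'!W3<0,0,'find zone'!W3)"
print(extract_check(formula))
print(extract_check("=SUM(A1:A10)"))
```

Every extracted condition states an expectation about a cell, so it could later be turned into an explicit test case for the spreadsheet.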
Analyzing and visualizing spreadsheets
Felienne Hermans
Thanks for reading about the
research adventure I have enjoyed
over the past 4 years!
If you want to know more, have a
look at my blog: www.felienne.com
If you are interested in collaborating,
please send me an
email: f.f.j.hermans@tudelft.nl
or a tweet @felienne

Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

An overview of my PhD research

  • 1. Analyzing & visualizing spreadsheets Felienne Hermans (@felienne)
  • 2. Analyzing & visualizing spreadsheets Felienne Hermans (@felienne) In this slide deck I present an overview of my PhD research. I recently defended my dissertation titled ‘Analyzing and Visualizing Spreadsheets’.
  • 3. In this slide deck I present an overview of my PhD research. I recently defended my dissertation titled ‘Analyzing and Visualizing Spreadsheets’. This one!
  • 4. Bridging the gap Funny story: I wasn’t hired to research spreadsheets at all. When I started my PhD project, I was supposed to research the gap between business users and programmers. Users Programmers
  • 5. To research this gap, I started by studying business in practice
  • 6. What surprised me was that this gap wasn’t that big; it was more like a small creek than a huge cliff. Some programmers were heavily involved in business and, even more interesting, some business guys were doing serious programming. Programmers Users
  • 7. What surprised me was that this gap wasn’t that big; it was more like a small creek than a huge cliff. Some programmers were heavily involved in business and, even more interesting, some business guys were doing serious programming. In Excel! Programmers Users
  • 8. What surprised me was that this gap wasn’t that big; it was more like a small creek than a huge cliff. Some programmers were heavily involved in business and, even more interesting, some business guys were doing serious programming. In Excel! So I looked into some previous work on the impact of spreadsheets on business. Programmers Users
  • 9. 95% of all U.S. firms use spreadsheets for financial reporting
  • 10. 90% of all analysts in industry perform calculations in spreadsheets
  • 11. 50% of spreadsheets form the basis for decisions
  • 12. Importance can grow over time. When studying the impact of spreadsheets, we found that they do not become important overnight. As processes change, spreadsheets can become key company assets over time. Nobody sets out to create a mission-critical spreadsheet; they “just happen”.
  • 13. This is a simple spreadsheet for many users. Furthermore, spreadsheets can become surprisingly complex.
  • 14. And spreadsheets exist ‘under the radar’. Another interesting property of spreadsheets is that they often live ‘under the radar’: there is no list of spreadsheets, no one keeps track of which sheets are needed for which report, and some spreadsheets do not have a clear owner.
  • 15. Only 33% of spreadsheets have a manual. Finally, spreadsheets lack documentation. In only one third of spreadsheets did we find ‘documentation’ (i.e., some sort of explanation of how to use the spreadsheet). Technical documentation, explaining why a spreadsheet was designed as it is, was hardly ever found.
  • 16. Complex spreadsheets without documentation can lead to serious errors. You can imagine that the combination of all the above facts (spreadsheets are important, they are complex, and they lack documentation) is a potential recipe for disaster. And indeed, those errors happen.
  • 17. The European Spreadsheet Risks Interest Group (Eusprig.org) collects horror stories.
  • 27. Estimated loss: 10 billion dollars a year
  • 28. We interviewed spreadsheet professionals Once I had studied related spreadsheet work and the horror stories from Eusprig, I wanted to gain a deeper understanding of spreadsheet problems in practice. So I interviewed 27 spreadsheet professionals at the Dutch Robeco bank.
  • 29. We interviewed spreadsheet professionals. Once I had studied related spreadsheet work and the horror stories from Eusprig, I wanted to gain a deeper understanding of spreadsheet problems in practice. So I interviewed 27 spreadsheet professionals at the Dutch Robeco bank. I asked only two questions (a semi-structured interview) to obtain an overall view of spreadsheet problems:
  • 31. And what makes you happy?
  • 32. From the interviews, we learned the following facts. Financial professionals spend 2 days a week working with Excel.
  • 33. Spreadsheets can have a long life: 5 years on average.
  • 34. The average sheet is used by 12 different people.
  • 35. There is a gap! Between importance and treatment. Then I concluded that there is an interesting gap that needs bridging: the gap between how important spreadsheets are and how well they are treated. So how could this gap be bridged?
  • 36. It looks like software in the 70s! Let’s summarize the problems around spreadsheets again: they lack documentation; they contain errors; they stay alive for several years and are used by several people; and they are complex. Does this remind you of something? It reminded me of the problems in the early days of software.
  • 37. Hence, we tried to bridge this gap with methods from software engineering.
  • 38. Spreadsheet users lack great tool support If you compare the tooling of spreadsheet developers with that of software developers, the difference is clear.
  • 39. Modern IDEs (like Visual Studio) have all kinds of built-in tools to help you build software in a responsible way: debugging, testing, analyzing and visualizing are accessible at the click of a button.
  • 40. Compare this to a spreadsheet environment, like Excel. Lots of support to create a spreadsheet, with fonts and colors and borders, but none of the helpful tools to build a maintainable spreadsheet.
  • 41. We did not start coding immediately. However tempting it was, we did not start to build a spreadsheet IDE right away. Instead, we looked at the results of the interviews to find the most pressing information need that spreadsheet users had.
  • 42. Most important problem: support for understanding spreadsheets was missing
  • 43. To address this information need specifically, we developed our tool Breviz. This tool visualizes the dependencies among worksheets, depicted as rectangles with arrows drawn between them. The thicker the arrow, the more connections there are. Example: worksheet ‘POA Project’ contains formulas that refer to cells in ‘ProjectTeam’.
  • 44. We went back to practice. With our tool, we went back to practice to see whether it really supported spreadsheet users.
  • 45. Turned out, it did. Some of the responses of users: “This diagram reminds me of what I had in mind when building”
  • 46. Turned out, it did. Some of the responses of users: “This diagram reminds me of what I had in mind when building” This remark is interesting: apparently, this spreadsheet user did do some modeling before building a spreadsheet.
  • 47. Turned out, it did. Some of the responses of users: “This makes my job 10 times easier” A clear sign that we were on the right track!
  • 48. This work was published at ICSE 2011
  • 49. However, unexpected things also happened. Not all spreadsheets looked as well structured as this one. Let’s look at some of them:
  • 52. Here, pink blocks represent worksheets outside of the spreadsheet. So this spreadsheet gathers information from over 20 other worksheets and combines this information.
  • 53. Users diagnosed with the diagrams. We found that, because the diagrams were so diverse, users started to judge spreadsheets based on their dataflow diagrams. We therefore formalized this intuition into ‘smells’ at the design level. These spreadsheet smells turned out to be very similar to code smells as defined by Fowler.
  • 54. Consider for instance the ‘feature envy’ smell. This occurs when a method from class B refers to many fields outside its own class, for example fields of another class A. This method envies all the cool fields that A has, hence the name.
  • 55. Consider for instance the ‘feature envy’ smell. This occurs when a method from class B refers to many fields outside its own class, for example fields of another class A. This method envies all the cool fields that A has, hence the name. It is easy to see how this smell could be defined on spreadsheets, where a formula in worksheet B could be overly interested in cells on worksheet A.
  • 56. We added support in Breviz for detecting and visualizing these inter-worksheet code smells.
  • 57. We went back to practice. Next, of course, we went back to practice to see how users felt about the detected smells.
  • 58. “That should be improved” Results showed that users understood why certain constructions were classified as smelly.
  • 59. “That should be improved” Results showed that users understood why certain constructions were classified as smelly. “This must be confusing for others”
  • 61. However, new problems were to be discovered. We found that, once the structure of the spreadsheets had been understood and validated, complex formulas still got in the way of understanding spreadsheets.
  • 62. This led us to the idea of formula smells
  • 63. Again, we took our inspiration from the smells that Fowler defines in his canonical book on refactoring.
  • 65. In a recent extension of the paper, we also suggest refactorings corresponding to smells. This formula, for instance, contains the same subformula twice. Extracting this subformula into a separate cell will improve readability.
  • 66. We went back to practice. And again... a look in practice.
  • 67. We found that cloning (i.e., copy-pasting) in spreadsheets was a problem. If data is copy-pasted, updates will not be propagated to the copies, and that might lead to errors. Based on existing work on clone detection in source code, we developed an algorithm to detect clones.
  • 68. Clones were added to our visualization, indicated with a dashed arrow. After all, when data is copy-pasted between worksheets, there is a dependency between those worksheets (albeit a different one than a formula link).
  • 69. To validate our algorithm, we performed a case study at the distribution centre of the South Dutch food bank. There, they process 100,000 kilos of food per month, and keep track of that with spreadsheets. We were able to detect 61 near-miss clones, of which 25 were actual errors. Because of our analysis, this distribution centre is now running error-free spreadsheets!
  • 70. To be published at ICSE 2013
  • 71. And this paper concluded my PhD thesis. I will continue to work on spreadsheet analysis for at least five more years at Delft University of Technology, so in the remaining few slides, I’ll outline what I will be working on in the future.
  • 72. Remember spreadsheets stay in business for 5 years and are used by 12 people during their life span? This makes it interesting to consider ‘spreadsheet evolution’ and study how spreadsheets are created.
  • 73. Visual Basic analysis. In our current visualization and analysis technique, we only consider formulas. However, spreadsheets also allow for code to interact with data and formulas (VBA code in Excel). By analyzing this, we could make our analysis more complete and interesting.
  • 74. Spreadsheet testing. Finally, we want to research how spreadsheet users test. One might think that spreadsheet users do not test, but this is not true.
  • 75. In our previous studies, we often saw formulas like this one. Here, nothing is really calculated. Instead, some sort of validation is performed: if ‘find zone’!W3 is smaller than 0, we are not interested in the value. If we could extract these types of formulas, we could use them to test the spreadsheet.
  • 76. Analyzing and visualizing spreadsheets Felienne Hermans Thanks for reading about the research adventure I have enjoyed over the past 4 years! If you want to know more, have a look at my blog: www.felienne.com If you are interested in collaborating, please send me an email at f.f.j.hermans@tudelft.nl or a tweet @felienne
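
The worksheet dependency view from slide 43 and the inter-worksheet ‘feature envy’ idea from slide 55 can be illustrated with a small Python sketch. This is not how Breviz is implemented: the regular expression, the function names, the toy `workbook` dictionary and the `threshold` value are all assumptions made for illustration, and only simple sheet-qualified references such as `ProjectTeam!B2` or `'POA Project'!B4` are recognized.

```python
import re
from collections import Counter

# Assumed, simplified pattern for a sheet-qualified reference such as
# ProjectTeam!B2 or 'POA Project'!$C$3. External workbooks, named ranges
# and escaped quotes are deliberately not handled.
SHEET_REF = re.compile(
    r"(?:'([^']+)'|([A-Za-z_][A-Za-z0-9_.]*))!\$?[A-Za-z]{1,3}\$?\d+"
)


def referenced_sheets(formula):
    """Return the worksheet names that a formula refers to explicitly."""
    return [quoted or bare for quoted, bare in SHEET_REF.findall(formula)]


def worksheet_edges(workbook):
    """Count references between worksheets.

    Keys are (referring_sheet, referenced_sheet); the count plays the role
    of the arrow thickness in the diagram.
    """
    edges = Counter()
    for sheet, cells in workbook.items():
        for formula in cells.values():
            for target in referenced_sheets(formula):
                if target != sheet:
                    edges[(sheet, target)] += 1
    return edges


def envious_cells(workbook, threshold=2):
    """Flag cells whose formula has at least `threshold` references to
    other worksheets (a hypothetical stand-in for a feature envy metric)."""
    flagged = []
    for sheet, cells in workbook.items():
        for address, formula in cells.items():
            outside = [t for t in referenced_sheets(formula) if t != sheet]
            if len(outside) >= threshold:
                flagged.append((sheet, address))
    return flagged


# Hypothetical toy workbook: sheet name -> {cell address -> formula}.
workbook = {
    "POA Project": {
        "B2": "=ProjectTeam!B2*ProjectTeam!C2",
        "B3": "=SUM(ProjectTeam!D2:D9)+A1",
        "B4": "=B2+B3",
    },
    "ProjectTeam": {
        "B2": "=40",
        "E2": "='POA Project'!B4",
    },
}

for (source, target), weight in worksheet_edges(workbook).items():
    print(f"{source} -> {target}: {weight} reference(s)")
print("Possibly envious cells:", envious_cells(workbook))
```

A range such as `ProjectTeam!D2:D9` counts as a single reference here; a real tool would have to decide whether a range should weigh more than a single cell, and would read the formulas from an actual spreadsheet file rather than from a dictionary.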