Overview of the paper "What's in Unison? A Formal Specification and Reference Implementation of a File Synchronizer" by Benjamin C. Pierce and Jérôme Vouillon, presented at Oregon State University for the "Domain Specific Languages" class on May 20th, 2014. Presentation time: 20 min
CS589 paper presentation - What is in unison? A formal specification and reference implementation of a file synchronizer
1. WHAT’S IN UNISON?
A FORMAL SPECIFICATION AND
REFERENCE IMPLEMENTATION
OF A FILE SYNCHRONIZER
Presentation type: paper presentation
Class: CS 589 – Domain Specific Languages
Presenter: Sergii Shmarkatiuk
Date: 5/20/2014
3. COMMON USE CASE: CLOUD STORAGE AND
SYNCHRONIZATION BETWEEN DEVICES
4. COMMON USE CASE: DEPLOYMENT OF
WEB-APPLICATION
Development instance: http://localhost/MyApp (/var/www/html/MyApp)
Production instance: http://myapp.com (ftp://@myapp.com/var/www/html/)
1. edit files
2. test
3. upload
4. test
5. COMMON USE CASE: DEPLOYMENT OF
WEB-APPLICATION
Development instance: http://localhost/MyApp (/var/www/html/MyApp)
Production instance: http://myapp.com (ftp://@myapp.com/var/www/html/)
1. edit files
2. edit files
3. sync
4. test
5. test
8. UNISON
File synchronization tool
Command line interface
Implemented in OCaml (DSL) and C (tool)
Available for all major platforms (UNIX, Win, Mac)
Unlike rsync, Unison is not included in basic UNIX distributions
9. PAPER CONTRIBUTIONS
The paper presents a mathematical model, a DSL, and the mechanics and challenges of file synchronization
The authors proved some properties of file synchronization operations using Coq
The authors described the gap between the idealized representation and the actual tool implementation
10. QUESTIONS
Keying Xu, Chao Peng:
What is the semantic domain of Unison? Is it a deeply or shallowly embedded DSL?
Unison is a deeply embedded DSL
11. UNISON: SYNTAX AND SEMANTIC DOMAIN
Syntax: current states of the replicas (2 file trees, archive (last synchronized state))
Semantic domain: synchronized replicas (2 file trees, archive (last synchronized state))
sem: (A ~ B) -> (A = B)
13. UNISON: BASIC DATA STRUCTURES
FILESYSTEM
type name = string
type contents = string
type properties = string
type fs = Dir of properties * dContents
| File of properties * contents
| Symlink of contents
| Bot
and dContents = (name * fs) list
OCaml
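As an illustration of how a replica can be written down with these types, here is a minimal sketch. The types are repeated from the slide so the block is self-contained; `sampleReplica` and `lookup` are hypothetical names introduced only for this example and do not come from the paper.

```ocaml
(* Types repeated from the slide so that the sketch is self-contained. *)
type name = string
type contents = string
type properties = string
type fs = Dir of properties * dContents
        | File of properties * contents
        | Symlink of contents
        | Bot
and dContents = (name * fs) list

(* Hypothetical sample replica: a directory with one file and one symlink. *)
let sampleReplica : fs =
  Dir ("rwx", [ ("index.html", File ("rw-", "<h1>MyApp</h1>"));
                ("latest", Symlink "index.html") ])

(* Hypothetical helper: follow a path of names from the root.
   A missing entry is reported as Bot, the "nothing here" filesystem. *)
let rec lookup (path : name list) (f : fs) : fs =
  match path, f with
  | [], _ -> f
  | n :: rest, Dir (_, children) ->
      (match List.assoc_opt n children with
       | Some child -> lookup rest child
       | None -> Bot)
  | _ :: _, _ -> Bot
```

Using `Bot` for a missing path means that absent entries and existing entries can be compared with the same type, which is convenient for update detection.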
14. UNISON: BASIC DATA STRUCTURES
UPDATE DETECTION
type prevState = DIR
| FILE
| SYMLINK
| ABSENT
type 'a leafUpdate = LeafSame
| LeafUpdated of 'a * 'a option
OCaml
15. UNISON: BASIC DATA STRUCTURES
UPDATE DETECTION
type updateItem = Same
| Updated of updateContent * prevState
| Error
and updateContent =
UCDir of properties leafUpdate * updateChildren
| UCFile of properties leafUpdate * contents leafUpdate
| UCSymlink of contents leafUpdate
| UCAbsent
and updateChildren = (name * updateItem) list
OCaml
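To show how these update types fit together, here is a rough sketch of update detection that compares an archived tree with the current tree. It is not the paper's buildUpdates: `detect`, `children`, `leaf` and `prevStateOf` are hypothetical names, and the sketch assumes that `LeafUpdated` carries the new value plus the old value when it is known.

```ocaml
type name = string
type contents = string
type properties = string
type fs = Dir of properties * dContents
        | File of properties * contents
        | Symlink of contents
        | Bot
and dContents = (name * fs) list

type prevState = DIR | FILE | SYMLINK | ABSENT
type 'a leafUpdate = LeafSame | LeafUpdated of 'a * 'a option

type updateItem = Same
                | Updated of updateContent * prevState
                | Error
and updateContent =
    UCDir of properties leafUpdate * updateChildren
  | UCFile of properties leafUpdate * contents leafUpdate
  | UCSymlink of contents leafUpdate
  | UCAbsent
and updateChildren = (name * updateItem) list

(* Kind of object that the archive recorded at this path. *)
let prevStateOf = function
  | Dir _ -> DIR | File _ -> FILE | Symlink _ -> SYMLINK | Bot -> ABSENT

(* Assumption: LeafUpdated holds the new value and, if known, the old one. *)
let leaf oldV newV =
  if oldV = newV then LeafSame else LeafUpdated (newV, Some oldV)

(* Compare the archived tree with the current tree, path by path. *)
let rec detect (archive : fs) (current : fs) : updateItem =
  if archive = current then Same
  else
    let content =
      match archive, current with
      | _, Bot -> UCAbsent
      | File (p0, c0), File (p1, c1) -> UCFile (leaf p0 p1, leaf c0 c1)
      | _, File (p1, c1) ->
          UCFile (LeafUpdated (p1, None), LeafUpdated (c1, None))
      | Symlink t0, Symlink t1 -> UCSymlink (leaf t0 t1)
      | _, Symlink t1 -> UCSymlink (LeafUpdated (t1, None))
      | Dir (p0, old), Dir (p1, cur) -> UCDir (leaf p0 p1, children old cur)
      | _, Dir (p1, cur) -> UCDir (LeafUpdated (p1, None), children [] cur)
    in
    Updated (content, prevStateOf archive)

(* Visit every name present in either directory; missing entries are Bot. *)
and children old cur =
  let names = List.sort_uniq compare (List.map fst old @ List.map fst cur) in
  let find n l = Option.value (List.assoc_opt n l) ~default:Bot in
  List.map (fun n -> (n, detect (find n old) (find n cur))) names
```

The sketch never produces `Error`, because it works on in-memory values; a real tool would need that case for unreadable files and similar failures.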
16. UNISON: BASIC DATA STRUCTURES
RECONCILIATION
type direction = Conflict
| LeftToRight
| RightToLeft
| Equal
type transportInstr =
Instr of updateItem * updateItem * direction
| NoInstr
| Problem
type transportInstrTree = Node of transportInstr * transportInstrList
and transportInstrList = (name * transportInstrTree) list
OCaml
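The following sketch shows one way a transport instruction could be chosen for a single path from the two update items. It is a simplification and not the paper's reconcile: `decide` is a hypothetical name, and the sketch does not descend into directories that were updated on both sides.

```ocaml
type name = string
type contents = string
type properties = string
type prevState = DIR | FILE | SYMLINK | ABSENT
type 'a leafUpdate = LeafSame | LeafUpdated of 'a * 'a option

type updateItem = Same
                | Updated of updateContent * prevState
                | Error
and updateContent =
    UCDir of properties leafUpdate * updateChildren
  | UCFile of properties leafUpdate * contents leafUpdate
  | UCSymlink of contents leafUpdate
  | UCAbsent
and updateChildren = (name * updateItem) list

type direction = Conflict | LeftToRight | RightToLeft | Equal
type transportInstr =
    Instr of updateItem * updateItem * direction
  | NoInstr
  | Problem

(* Hypothetical decision rule for one path:
   - an error on either side is reported as a problem,
   - a change on one side only is propagated to the other side,
   - identical changes on both sides need no transfer (Equal),
   - different changes on both sides are a conflict. *)
let decide (left : updateItem) (right : updateItem) : transportInstr =
  match left, right with
  | Error, _ | _, Error -> Problem
  | Same, Same -> NoInstr
  | Updated _, Same -> Instr (left, right, LeftToRight)
  | Same, Updated _ -> Instr (left, right, RightToLeft)
  | Updated (l, _), Updated (r, _) ->
      if l = r then Instr (left, right, Equal)
      else Instr (left, right, Conflict)
```

A fuller version would apply the same rule recursively to the children of two `UCDir` updates and collect the results in a `transportInstrTree`.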
17. UNISON: BASIC OPERATIONS
Update detection (buildUpdates)
• Comparison of two file trees
• Description of detected differences
Reconciliation (reconcile)
• Building a set of transport instructions
Propagation (propagate)
• Performing transport instructions
• Giving the user the opportunity to verify changes
22. QUESTIONS
Keying Xu:
How do the authors deal with the modeling gap between the reference implementation and the specification?
The authors describe the "modeling gap" limitations in their paper
23. UNISON: THE “MODELING GAP”
Reference implementation (DSL) | Real implementation (software tool)
Functional program (OCaml) | Imperative program (C)
Returns new replicas without changing content | Modifies real filesystems in-place
Written as if it "owns" filesystems | Runs with live filesystems
Regards filesystems as simple, mathematical tree structures | Operates on real implementations of filesystems (POSIX, NTFS, …)
Assumes that all operations can be implemented | Deals with operations that might be impossible to implement
Treats archive as full-blown filesystem | Stores just a fingerprint of each file's contents
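The last row of the comparison can be made concrete with a small sketch: instead of keeping the old contents, an archive entry keeps only a digest of them. This is an illustration, not Unison's actual archive format. The record `archiveEntry` and the functions `fingerprintOf`, `remember` and `changedSince` are hypothetical, and the digest is the MD5 digest provided by the `Digest` module of OCaml's standard library.

```ocaml
(* Hypothetical archive entry: a path paired with the hexadecimal digest
   of the contents last seen at that path. *)
type archiveEntry = { path : string; fingerprint : string }

(* MD5 digest of the contents, via the standard library's Digest module. *)
let fingerprintOf (contents : string) : string =
  Digest.to_hex (Digest.string contents)

(* Record the state of a file at synchronization time. *)
let remember (path : string) (contents : string) : archiveEntry =
  { path; fingerprint = fingerprintOf contents }

(* A file counts as updated when its current digest differs from the
   archived one. The old contents themselves are not available. *)
let changedSince (entry : archiveEntry) (currentContents : string) : bool =
  fingerprintOf currentContents <> entry.fingerprint
```

The trade-off is visible in the types: the archive can say that a file changed, but it cannot say what the file used to contain.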
24. QUESTIONS
Brent Carmer:
What is the connection between Unison and your DSL?
Panini Patapanchala:
Relation with the version control and what aspects you
can take from this paper.
• SCMF-DSL also uses concept of replication
• SCMF-DSL also operates with file trees
• SCMF-DSL also detects file changes to perform such
automatic actions as version numbering
26. UNISON VS SCMF-DSL
Unison | SCMF-DSL
Takes into account only latest synchronized state | Saves information about all synchronized states
Allows synchronization only between latest states | Allows rolling back to previous states
Treats both file trees as equal sources of changes (everything is writable) | Treats file trees as primary and secondary (can be writable or read-only)
There is no defined direction for replication | Operates with a certain direction for replication: from primary replica to secondary replica
Operates with file trees | Operates with version trees
Replicates contents of filesystem | Replicates contents of version control system
Might generate incompatible changes (conflicts) | Does not generate incompatible changes
27. UNISON VS SCMF-DSL
[Figure: sem maps replicas A ~ B to synchronized replicas A = B. Unison: file trees. SCMF-DSL: version trees (diagram of numbered version nodes).]
28. UNISON VS SCMF-DSL
[Figure: sem maps replicas A ~ B to synchronized replicas A = B. Unison: file trees, 2 platforms (A, B). SCMF-DSL: version trees, N platforms (P1, P2, … PN) (diagram of numbered version nodes).]
29. QUESTIONS
Brent Carmer:
Does the user ever construct things using the types listed in the reference implementation?
No. Users work with the real implementation (tool) rather than the reference implementation (DSL)
30. QUESTIONS
Amin Alipour:
How can they make sure that the function synch is run atomically?
Brent Carmer:
How do they use Coq to verify their reference implementation?
Panini Patapanchala:
Maximal runs are unique: can you justify the theorem with an example?
31. QUESTIONS
The authors use Coq to prove the following properties of their DSL:
• Laziness is safe (replication with itself is safe)
• Mirroring is a special case (replication with previously synchronized state o and replica a gives replica a)
• Maximal runs are unique (it is impossible to generate two different synchronizations on the same two replicas a and b)
• Success in the absence of conflicts (if replication does not generate conflicts the first time, it won't generate conflicts the next time either)
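The properties above can be illustrated on a toy model in which each replica is a single value rather than a file tree. This is only a sketch for intuition and has nothing to do with the authors' Coq development. The function `sync` and its argument order (archive `o`, replicas `a` and `b`) are assumptions made for this example.

```ocaml
(* Toy synchronizer: o is the last synchronized state, a and b are the
   current replicas. Returns the new pair of replicas. *)
let sync (o : 'a) (a : 'a) (b : 'a) : 'a * 'a =
  if a = b then (a, b)        (* replicas already agree: nothing to do *)
  else if a = o then (b, b)   (* only b changed: copy b over a *)
  else if b = o then (a, a)   (* only a changed: copy a over b *)
  else (a, b)                 (* both changed differently: conflict, keep both *)

(* Replicas that already agree are left untouched. *)
let lazyCase = sync "old" "new" "new"

(* Mirroring: if b is still the archived state, both replicas become a. *)
let mirrorCase = sync "old" "new" "old"

(* Conflict: incompatible changes are not propagated in either direction. *)
let conflictCase = sync "old" "left" "right"
```

Because `sync` is a pure function, running it twice on the same inputs always gives the same result, which is the toy counterpart of the uniqueness property.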
32. QUESTIONS
Amin Alipour:
The paper assumes that there are only two replicas of the filesystem. Is that right? If so, how can synchronization scale to more than two replicas?
Rui Qin:
How about more than two replicas? Does it also work?
Panini Patapanchala:
I feel the paper explained the base cases for reconciliation, and that the more important problems now are the ones presented as future work, such as multi-replica synchronization for a larger number of replicas.
No. Unison works only with pairs of replicas
33. QUESTIONS
Amin Alipour:
What is the relation between a conflict as described in the paper and a merge in git?
Chao Peng:
Is Unison easily extensible? Can you conclude some low-level or high-level aspects of Unison?
34. QUESTIONS
Panini Patapanchala:
The buildUpdates function of the implementation reads more like an imperative implementation than a functional one.
This is partly because of the mixed nature of the OCaml language: it incorporates functional, imperative, and object-oriented paradigms
Editor's notes
Hello everybody. Today I will be presenting a paper about a file synchronization tool called Unison, created by researchers at the University of Pennsylvania. I know that all of you have read the paper, so you probably already know a little about file synchronization. Nevertheless, let me do a quick recap of the target domain of the DSL described in the paper.
The target domain is file synchronization. There are also related activities, such as … Unison can help with those activities because, in one way or another, they all employ the concept of file synchronization. Basically, they are all synonyms with a few minor differences. The core principle is the same: file synchronization.
If you say that you have never used a file synchronization tool, I won't believe you. The most common use case for file synchronization is cloud storage. Thanks to cloud storage tools such as Dropbox and Google Drive, today not only software engineers but also end users rely on file synchronization to keep their documents up to date on different devices.
Another common use case is deployment of a web application. How many of you have a personal web page? How do you update its content? Do you edit it directly on the server, or do you edit it locally and then upload it to the server? Personally, I prefer editing it locally and uploading it to the server after I have tested it on my machine.
But sometimes I need to make changes directly on the server. In that case I need to synchronize the files after editing them in different locations. Tools such as Unison can help with that.
There are many similar tools for file synchronization, both manual and automatic.
There are also tools used specifically for deployment of applications and infrastructure. They provide high-level abstractions for file synchronization, hiding its technical details from users.
What is Unison and how is it different from all other tools?
Why do we need it then if it is the same as rsync? As researchers, we benefit from learning more about the file synchronization DSL and the mechanics of file synchronization, from seeing how properties of the DSL are proved, and from learning about the approach of turning a DSL into a real-world tool that can be used by real users.
Unison is a deeply embedded DSL because basic types and semantic functions are defined separately.
Unison takes the current states of the replicas as input and produces synchronized replicas as output.
Let me provide an overview of the file sync workflow used by Unison. There are a few basic operations introduced by Unison, such as …
There are certain DSL helper functions used to help with filesystem operations, …
… with update detection, …
… with reconciliation …
… and with propagation …
The authors simply describe the "modeling gap" limitations in their paper.
They make several important points about the conceptual differences between the DSL implementation and the tool implementation. I consider this to be one of the most important contributions of the paper, because I believe that, in one way or another, all DSL developers face the problem of implementing a real tool, reconciling the initial idealized specification of the DSL with the real world, and coping with implementation problems. Some operations cannot always be implemented. One example is atomic rename, which is problematic on different operating systems.
Unison is the closest match to my DSL among the available DSLs I reviewed. It is similar in some ways, but also different.
Learning about system as a whole
Today's training has two goals. First: ? Second: ?
Marco D'Ambros and Michele Lanza describe an approach to software evolution reconstruction. Its goal is to help with software history visualization.
What I liked: Discrete time module. What I didn't like: CVS module view, CVS revision view; not enough emphasis on evolution. Ambivalent: Fractal view.