Sciunits: Reusable Research Objects
1. IEEE eScience 2017
sciunits: Reusable Research Objects
School of Computing, College of Computing and Digital Media
Dai Hai Ton That, Gabe Fils, Zhihao Yuan, Tanu Malik
Presented by Gabe Fils
2. Problem Space
No easily creatable, readily reusable, efficiently versioned, discrete unit of computation exists
[Figure: Virtualization, Distributed Version Control, and Research Object, labeled with the properties portable, self-contained, repeatable, collaborative, versioned, documented]
3. Introducing: The sciunit
Captures application executions
Repeats executions
Reproduces executions, changing input args
Versioned executions stored as one sciunit
Uses provenance for self-documentation
A reusable research object.
4. Sample Applications: FIE, VIC
FIE Application Workflow
City of Chicago Food Inspections Evaluation Model
Four applications
Two languages
130 files
1580 dependencies
908 MB
https://github.com/chicago/food-inspections-evaluation
Variable Infiltration Capacity
Four applications
Five languages
7 GB
https://vic.readthedocs.io
6. Client package Command
Initialize or retrieve a sciunit:
Run FIE, capture into package:
7. Packaging Details
Package In Storage (133 MB):
2nd Version Of Package (133 MB):
FIE pkg-root Dir
1) Attach to process
2) Intercept system calls
3) Copy files / executables
4) Log system calls
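The copy and log steps above can be sketched as follows. This is only an illustration under stated assumptions: it starts from an already collected list of accessed paths rather than intercepting system calls, and the names `package_path` and `capture` are hypothetical, not part of sciunit.

```python
import os
import shutil
import tempfile


def package_path(pkg_root, path):
    """Map an absolute path to its mirrored location under the package root."""
    return os.path.join(pkg_root, os.path.abspath(path).lstrip(os.sep))


def capture(accessed_paths, pkg_root):
    """Copy each accessed file into pkg_root; return a log of (path, copy)."""
    log = []
    for path in accessed_paths:
        if not os.path.isfile(path):
            continue  # skip directories, devices, and missing files
        dest = package_path(pkg_root, path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)          # step 3: copy files / executables
        log.append((path, dest))          # step 4: log what was captured
    return log


# Usage on a throwaway file (assumes a POSIX-style file system).
src_dir, pkg_root = tempfile.mkdtemp(), tempfile.mkdtemp()
src = os.path.join(src_dir, "input.csv")
with open(src, "w") as f:
    f.write("a,b\n")
log = capture([src, os.path.join(src_dir, "missing")], pkg_root)
```

Mirroring the full absolute path under the package root keeps the original directory layout, which is what makes a later path redirection straightforward.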
8. Client repeat Command
Repeat a package:
1) Attach to process
2) Replace system call args
original call: 61021 exec “/usr/bin/python”
replaced call: 61021 exec “/home/user1/pkgroot/usr/bin/python”
9. Versioning Solution
80G File, Fixed 4K Chunks:
Same File, 1 Byte Inserted At Start:
When to store?
During packaging
After packaging
How to store?
Line-based diffs
Fixed-size chunks
Content-defined chunks
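The weakness of fixed-size chunks illustrated at the top of this slide can be reproduced on a small scale. The sketch below uses a 256 KB random buffer as a stand-in for the 80G file (an assumption made to keep the example fast) and counts how many 4K chunks survive a 1-byte insertion at the start.

```python
import hashlib
import os


def fixed_chunks(data, size=4096):
    """Split data into fixed-size chunks and return the set of chunk digests."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


original = os.urandom(64 * 4096)   # small stand-in for the 80G file
shifted = b"\x00" + original       # same content, 1 byte inserted at start

shared = fixed_chunks(original) & fixed_chunks(shifted)
print(len(fixed_chunks(original)), "chunks,", len(shared), "shared after insert")
```

Because every chunk boundary moves by one byte, no chunk of the shifted file matches a chunk of the original, so fixed-size deduplication stores the whole file again.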
10. Rabin Hash
Hash of subset of file bytes ($RH(B_1, B_2, \ldots, B_n)$)
Fixed-size sliding window $n$
Hash at any position $i$ ($RH(X_{(i,n)})$)
Deduplicate chunk
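The sliding-window idea can be sketched with a polynomial rolling hash. Note the assumptions: this is a Rabin-Karp style hash used as a stand-in for a true Rabin fingerprint, and the window size, base, modulus, and boundary mask are illustrative values, not those used by sciunit.

```python
import os


def window_hash(window, base=257, mod=(1 << 61) - 1):
    """Polynomial hash of a byte window, computed from scratch."""
    h = 0
    for b in window:
        h = (h * base + b) % mod
    return h


def rolling_hashes(data, n, base=257, mod=(1 << 61) - 1):
    """Yield (i, hash of data[i:i+n]) for every window position i."""
    if len(data) < n:
        return
    top = pow(base, n - 1, mod)          # weight of the byte leaving the window
    h = window_hash(data[:n], base, mod)
    yield 0, h
    for i in range(1, len(data) - n + 1):
        h = (h - data[i - 1] * top) % mod        # drop the oldest byte
        h = (h * base + data[i + n - 1]) % mod   # append the newest byte
        yield i, h


def cdc_chunks(data, n=16, mask=0x3F):
    """Cut a chunk after every window whose hash has its low bits all zero."""
    chunks, start = [], 0
    for i, h in rolling_hashes(data, n):
        if h & mask == 0:
            chunks.append(data[start:i + n])
            start = i + n
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# Usage: a 1-byte insertion at the start changes at most one original chunk.
data = os.urandom(1 << 16)
a = cdc_chunks(data)
b = cdc_chunks(b"\x00" + data)
changed = len(set(a) - set(b))
```

Because each boundary depends only on the bytes inside the window, boundaries move with the content, and every chunk after the first reappears unchanged in the shifted data.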
11. Storage And Retrieval
Deduplicated Container Storage
Store package:
1) Archive package-root
2) CDC on archive
3) Store manifest
Retrieve package:
1) Retrieve manifest
2) Concatenate chunks
3) Extract archive
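The store and retrieve steps above can be sketched with an in-memory chunk store. Everything here is an assumption made for illustration: the dictionary-based store, the SHA-256 chunk keys, the manifest as a list of digests, and the toy `chunk` function (a windowed byte sum with a minimum chunk size) that stands in for Rabin-based CDC.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def chunk(data, n=16, mask=0x3F, min_size=64):
    """Toy content-defined chunker: cut where a windowed byte sum matches."""
    chunks, start = [], 0
    for end in range(n, len(data) + 1):
        if end - start >= min_size and sum(data[end - n:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(pkg_root, chunk_store):
    """Archive pkg_root, chunk the archive, and return the manifest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:   # 1) archive package-root
        tar.add(pkg_root, arcname="pkgroot")
    manifest = []
    for piece in chunk(buf.getvalue()):                # 2) CDC on archive
        digest = hashlib.sha256(piece).hexdigest()
        chunk_store.setdefault(digest, piece)          # keep one copy per chunk
        manifest.append(digest)
    return manifest                                    # 3) manifest to store


def retrieve(manifest, chunk_store, dest):
    """Rebuild the archive from the manifest and extract it into dest."""
    archive = b"".join(chunk_store[d] for d in manifest)   # 2) concatenate
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        tar.extractall(dest)                               # 3) extract archive


# Usage: store two versions of a small package, then restore the second.
root, dest, chunks = tempfile.mkdtemp(), tempfile.mkdtemp(), {}
with open(os.path.join(root, "data.txt"), "w") as f:
    f.write("version 1\n" * 500)
v1 = store(root, chunks)
with open(os.path.join(root, "notes.txt"), "w") as f:
    f.write("added in version 2\n")
v2 = store(root, chunks)
retrieve(v2, chunks, dest)
```

Each version is represented only by its manifest, so chunks shared between versions are stored once in the chunk store.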
12. Detailed Visualization
Part Of A Normal (Verbose) Provenance Log
Small Section Of Graph Built From Normal Provenance Log
13. Summarization: Group By Similarity
Group vertices by type / connections
Effect: group subprocesses, group files in directory
Similarity Rule
Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v)
[Figure panels: Full Graph, Similarity Applied]
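The similarity rule can be sketched as a grouping over a small provenance graph. The sketch assumes that Input(u) is the set of vertices with an edge into u and Output(u) is the set of vertices u has an edge to; the slide does not define these, and `group_by_similarity` is a hypothetical name.

```python
from collections import defaultdict


def group_by_similarity(types, edges):
    """Group vertices that share a type, a set of inputs, and a set of outputs.

    types: dict mapping vertex -> "file" or "process"
    edges: iterable of (source, target) pairs
    """
    inputs, outputs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        outputs[src].add(dst)
        inputs[dst].add(src)
    groups = defaultdict(list)
    for v in types:
        key = (types[v], frozenset(inputs[v]), frozenset(outputs[v]))
        groups[key].append(v)
    return sorted(sorted(g) for g in groups.values())


# Usage: two input files read by the same process collapse into one group.
types = {"p": "process", "a.csv": "file", "b.csv": "file", "out.txt": "file"}
edges = [("a.csv", "p"), ("b.csv", "p"), ("p", "out.txt")]
groups = group_by_similarity(types, edges)
```

Here `a.csv` and `b.csv` have the same type, no inputs, and the same single output, so they form one group, which mirrors the "group files in directory" effect.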
14. Summarization: Pack
Find min-connected nodes, pack into hubs
Packability Rules
1) $\mathrm{Type}(u) = \text{file},\ \{\, \exists! e \mid e \in E \wedge ( e=(u,v) \vee e=(v,u) ) \,\}$
2) $\mathrm{Type}(u) = \text{process},\ \{\, \exists! e \mid e \in E \wedge e=(u,v) \,\}$
3) $\mathrm{Type}(u) = \text{file},\ \{\, \exists! (e_1,e_2) \mid ( \exists x \in V,\ v \neq x ) \wedge ( e_1=(u,v) \in E,\ e_2=(x,u) \in E ) \,\}$
[Figure panels: Similarity Applied, Packability Applied]
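One literal reading of the three packability rules can be sketched as a classifier over vertices. This is an interpretation, not the authors' implementation: it assumes edges are directed `(source, target)` pairs with no self-loops, reads "exactly one" as a count of matching edges, and uses the hypothetical name `packable`.

```python
def packable(types, edges):
    """Return {vertex: rule number} for vertices matching a packability rule."""
    edges = set(edges)
    result = {}
    for u, kind in types.items():
        out_edges = [e for e in edges if e[0] == u]
        in_edges = [e for e in edges if e[1] == u]
        if kind == "file" and len(out_edges) + len(in_edges) == 1:
            result[u] = 1      # rule 1: file with exactly one incident edge
        elif kind == "process" and len(out_edges) == 1:
            result[u] = 2      # rule 2: process with exactly one outgoing edge
        elif kind == "file":
            pairs = [(e1, e2) for e1 in out_edges for e2 in in_edges
                     if e1[1] != e2[0]]
            if len(pairs) == 1:
                result[u] = 3  # rule 3: exactly one (out, in) pair, distinct ends
    return result


# Usage on a small pipeline: in.csv -> p -> tmp.dat -> q -> {a.out, b.out}
types = {"in.csv": "file", "p": "process", "tmp.dat": "file",
         "q": "process", "a.out": "file", "b.out": "file"}
edges = [("in.csv", "p"), ("p", "tmp.dat"), ("tmp.dat", "q"),
         ("q", "a.out"), ("q", "b.out")]
rules = packable(types, edges)
```

In this example only `q`, which has two outgoing edges, matches no rule; it is the kind of vertex that would remain as a hub while its minimally connected neighbors are packed into it.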
15. Summarization: Annotate
Higher precedence to process nodes
File with n > 1 edges → n annotations
[Figure panels: Packability Applied, Annotation Applied]
16. Package / Repeat Performance
Added ptrace system calls
I/O-intensive apps: VIC
Non-I/O-intensive apps: FIE
Package/Repeat Runtimes
1) Run app normally
2) Run with package
3) Run with repeat
17. Versioning Performance
Commit/Reconstruct Times
Storage Sizes
Package/Repeat Runtimes
1) Size of several versions
2) Size after deduplication
3) CDC / concatenation time
18. Results From Provenance Summarization
[Figure panels: Reduction Of Edges, Reduction Of File Nodes, Reduction Of Process Nodes]
1) Full FIE graph
2) All techniques applied
3) Dynamic expansion
19. Conclusion And Current Work
Graph summarization testing
Database applications
Exact partial repeatability
Apps with network operations
Parallel HPC applications
Emerging reusable object formats
sciunit is a portable, self-contained, and inherently understandable versioned unit of computation.
20. Links And Acknowledgements
sciunit: https://sciunit.run
sciunit paper: https://arxiv.org (search for “sciunit”)
National Science Foundation grants ICER-1639759, ICER-1661918, ICER-1440327, ICER-1343816