Pithos is an object storage service similar to Amazon S3 that is written in Python. It has clients for various platforms that communicate with its REST API based on the OpenStack Object Storage API. The Pithos client is a .NET application that synchronizes local folders to Pithos accounts. It handles file events asynchronously using multiple agents to avoid hanging, while efficiently hashing and uploading/downloading large files over unstable networks.
4. Client API
REST API based on OpenStack Object Storage API
Accounts, Containers without Folders
GET for data, object info
PUT, POST for uploads and data updates
7. API Characteristics
No folders!
Placeholder directory objects hold metadata
Block updates ONLY
Merkle hashing to detect modified blocks
Hash using SHA256
8. Merkle Hashing
Top Hash
Hash of #1-2 Hashes | Hash of #2-3 Hashes
Hash of Block #1 | Hash of Block #2 | Hash of Block #3 | Hash of Block #4
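The tree above can be sketched in a few lines of Python. This is only an illustration of the idea: the pairwise combination and the duplication of the last hash on odd levels are assumptions, and the exact rules Pithos uses for padding and combining hashes may differ. The names `block_hashes` and `top_hash` are hypothetical.

```python
import hashlib

def block_hashes(data: bytes, block_size: int) -> list[bytes]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def top_hash(hashes: list[bytes]) -> bytes:
    """Combine hashes pairwise, level by level, until a single hash remains."""
    if not hashes:
        return hashlib.sha256(b"").digest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: an odd level is padded by repeating its last hash.
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Changing a single byte changes exactly one block hash and therefore the top hash, which is what lets a client find modified blocks without comparing file contents.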
9. Download Process
Get hashmap from server
Calculate local hashmap
Find different blocks
Download blocks
Patch local file with blocks
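A minimal sketch of the download steps on this slide, working on in-memory bytes. `fetch_block` is a hypothetical callback standing in for the network call that downloads one block; it is not part of the Pithos API, and the function names are made up for illustration.

```python
import hashlib

def hashmap(data: bytes, block_size: int) -> list[str]:
    """List of SHA-256 hex digests, one per block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def different_blocks(local_map: list[str], server_map: list[str]) -> list[int]:
    """Indices of server blocks that are missing or different locally."""
    return [i for i, h in enumerate(server_map)
            if i >= len(local_map) or local_map[i] != h]

def patch(local: bytes, server_map: list[str], block_size: int,
          fetch_block, total_size: int) -> bytes:
    """Download only the differing blocks and write them over the local copy."""
    # Start from the local content, resized to the server's file size.
    out = bytearray(local[:total_size].ljust(total_size, b"\0"))
    for i in different_blocks(hashmap(local, block_size), server_map):
        block = fetch_block(i)  # hypothetical network call
        out[i * block_size:i * block_size + len(block)] = block
    return bytes(out)
```

With a 10-byte file and 4-byte blocks, a local copy that differs only in its second block triggers a single `fetch_block(1)` call.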
10. Upload Process
#1 Calculate local hashmap
#2 PUT hashmap to server
#3 Server responds with missing block hashes
#4 PUT missing blocks at container level
#5 Repeat from #2
#6 Server responds 201
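The upload loop on this slide can be sketched against an in-memory stand-in for the server. `FakeServer`, its methods, and the 409 status used for "blocks missing" are assumptions made for this sketch; only the 201 response comes from the slide.

```python
import hashlib

class FakeServer:
    """In-memory stand-in for the server; not the real Pithos API."""
    def __init__(self):
        self.blocks = {}   # block hash -> block bytes
        self.objects = {}  # object name -> list of block hashes

    def put_hashmap(self, name, hashes):
        # Report each unknown hash once, in order of first appearance.
        missing = list(dict.fromkeys(h for h in hashes if h not in self.blocks))
        if missing:
            return 409, missing  # assumed status code for "blocks missing"
        self.objects[name] = list(hashes)
        return 201, []

    def put_block(self, block):
        self.blocks[hashlib.sha256(block).hexdigest()] = block

def upload(server, name, data, block_size, max_rounds=5):
    """Send the hashmap, upload whatever the server lacks, repeat until 201."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
    by_hash = dict(zip(hashes, blocks))
    sent = 0
    for _ in range(max_rounds):
        status, missing = server.put_hashmap(name, hashes)
        if status == 201:
            return sent  # number of blocks that actually had to be uploaded
        for h in missing:
            server.put_block(by_hash[h])
            sent += 1
    raise RuntimeError("server kept reporting missing blocks")
```

Blocks the server already holds are never sent again, so re-uploading a file with one changed block costs one block upload plus two hashmap requests.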
11. Pithos Client
Multiple accounts per machine
Synchronize local folder to Pithos account
Detect local changes and upload
Detect server changes and download
Calculate Merkle Hash for each file
13. Technologies
.NET 4, due to the Windows XP support requirement
Visual Studio 2012 + Async Targeting Pack
UI - Caliburn.Micro
Concurrency - TPL, Parallel, Dataflow
Network – HttpClient
Hashing - OpenSSL – Faster than native provider for hashing
Storage - NHibernate, SQLite/SQL Server Compact
Logging - log4net
14. The challenges
Handle hundreds of file events
Hashing of many large files
Multiple slow connections to the server
Unstable network
Yet it shouldn’t hang
Minimal UI with enough info for the user
15. Event Handling
Agents: File Agent, Poll Agent, Uploader/Downloader, Network Agent
• File Agent: Listen, Wait for Idle
• Poll Agent: Get Server hashes, Compare hashes, Identify changes
• Uploader/Downloader and Network Agent: Queue requests, Process each file, Network ops for files
16. Event Handling (2)
Use producer/consumer
Store events in ConcurrentQueue
Process ONLY after an idle timeout
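A minimal sketch of "process only after an idle timeout", using Python's thread-safe `queue.Queue` in place of .NET's ConcurrentQueue. The function name `next_batch` is hypothetical. Producers (for example a file watcher) call `events.put(...)`; the consumer calls `next_batch` in a loop.

```python
import queue

def next_batch(events: queue.Queue, idle_timeout: float) -> list:
    """Block until an event arrives, then keep collecting until the
    queue has been quiet for idle_timeout seconds."""
    batch = [events.get()]  # wait for the first event
    while True:
        try:
            batch.append(events.get(timeout=idle_timeout))
        except queue.Empty:
            return batch  # no new events during the idle window
```

Waiting for quiet before processing means a burst of hundreds of file events (a large copy, an unzip) is handled once, as a batch, instead of once per event.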
17. Merkle Hashing
Why I hate Game of Thrones
Asynchronous reading of blocks
Block hashing in parallel
Use OpenSSL to gain SSE2 etc
Concurrency throttling
Watch the memory consumption!
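A sketch of throttled parallel block hashing, assuming a thread pool plus a semaphore that caps how many blocks are held in memory at once. The function name and the default limits are illustrative only, not values from the Pithos client. Python's `hashlib` releases the interpreter lock while hashing large buffers, so the worker threads can run in parallel.

```python
import hashlib
import io
from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

def hash_stream_blocks(stream, block_size, max_workers=4, max_in_flight=8):
    """Hash blocks in parallel while bounding the number of pending blocks."""
    slots = BoundedSemaphore(max_in_flight)

    def work(block):
        try:
            return hashlib.sha256(block).hexdigest()
        finally:
            slots.release()  # free a slot as soon as the block is hashed

    futures = []
    with ThreadPoolExecutor(max_workers) as pool:
        while True:
            slots.acquire()  # throttle: wait if too many blocks are pending
            block = stream.read(block_size)
            if not block:
                slots.release()
                break
            futures.append(pool.submit(work, block))
        # Results are collected in submission order, so hashes stay in block order.
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(len(hash_stream_blocks(io.BytesIO(b"x" * 10_000), block_size=4096)))
```

Without the semaphore the reader can run far ahead of the hashers and queue up the whole file in memory, which is the consumption problem the slide warns about.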
18. Memory Leaks in a Managed Environment!
4MB Blocks? Large Memory but …
Quickly reading 2GB in 64KB blocks?
Downloading 600MB in x KB blocks?
Huge number of small objects awaiting collection during CPU/IO intensive processing
Poor Garbage Collector can’t keep up!
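One common remedy for this kind of allocation pressure is to reuse buffers instead of allocating a new one per read. The sketch below shows the idea with a single preallocated buffer; it is an illustration in Python, whose memory management differs from the .NET garbage collector, and the function name is hypothetical.

```python
import hashlib
import io

def hash_blocks_reusing_buffer(stream, block_size):
    """Hash a stream block by block without allocating a buffer per read."""
    buf = bytearray(block_size)  # allocated once, reused for every block
    view = memoryview(buf)       # slicing a memoryview does not copy
    hashes = []
    while True:
        # Buffered and in-memory streams fill the buffer up to end of data;
        # a raw, unbuffered stream could return short reads mid-file.
        n = stream.readinto(buf)
        if not n:
            break
        hashes.append(hashlib.sha256(view[:n]).hexdigest())
    return hashes

if __name__ == "__main__":
    print(len(hash_blocks_reusing_buffer(io.BytesIO(b"x" * 10_000), 4096)))
```

Reading 2GB in 64KB blocks this way touches one 64KB buffer rather than creating tens of thousands of short-lived ones.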
19. Hashing 100% CPU?
Multicore is nice but …
Blocks the system when processing large files!
Throttle parallel block hash ops
Improvements:
Dynamically throttle «large» files
«Throttling» of File Read Ops
20. Multiple slow network calls
Every call a Task
Concurrent REST calls per account and shared folder
Task.WhenAll to process results at end of poll
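The same pattern in Python's asyncio, where `asyncio.gather` plays roughly the role of Task.WhenAll: start every call first, then process all results together at the end of the poll. `list_changes` is a hypothetical coroutine standing in for one REST call; unlike Task.WhenAll, this sketch asks `gather` to return exceptions as values so one slow or failed account does not discard the others.

```python
import asyncio

async def poll_all(accounts, list_changes):
    """Run one call per account concurrently; split successes from failures."""
    tasks = [asyncio.create_task(list_changes(a)) for a in accounts]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok, failed = {}, {}
    for account, result in zip(accounts, results):
        # gather preserves input order, so zip pairs each result correctly.
        (failed if isinstance(result, Exception) else ok)[account] = result
    return ok, failed
```

Usage: `ok, failed = asyncio.run(poll_all(["alice", "bob"], list_changes))`. The total wait is close to the slowest single call rather than the sum of all calls.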
21. Unstable network
Use System.Net.Http.HttpClient
Store downloaded blocks in .pithos.cache folder
Check and reuse orphans
Asynchronous Retry of calls
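A sketch of an asynchronous retry helper with exponential backoff. The name `retry`, the default attempt count, the delays, and the set of retried exception types are all assumptions for illustration, not the client's actual policy.

```python
import asyncio

async def retry(call, attempts=3, base_delay=0.5,
                retry_on=(ConnectionError, TimeoutError)):
    """Await call(); on a transient error wait and try again, doubling the delay."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Sleeping with await keeps the event loop free for other work.
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Usage: `await retry(lambda: download_block(7))`, where `download_block` is a hypothetical coroutine. Combined with a block cache on disk, a retried download only has to fetch the blocks that are still missing.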
22. Resistance to crashes
Use Transactional NTFS where available
Thanks MS for killing it!
Otherwise, modify a copy and swap it in with File.Replace
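The "modify a copy, then replace" fallback can be sketched with Python's `os.replace`, which is used here as a rough analogue of .NET's File.Replace (the two differ in details such as backup files). The function name is hypothetical.

```python
import os
import tempfile

def replace_atomically(path, new_bytes):
    """Write new content to a temporary copy, then swap it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the copy in the same directory so the final rename stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_bytes)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the swap
        os.replace(tmp, path)     # a crash leaves either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

If the process dies while writing, the original file is untouched and only the temporary copy is left behind.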
23. Should not hang
Use independent agents
Asynchronous operations wherever possible
Use async/await for more readable code
Must always .ConfigureAwait(false)!
BE CAREFUL with async void
24. Minimal UI
Use WPF, MVVM
Use Progress to update the UI
Part of .NET 4.5, backported to 4
The Icon is the Shell!
Lack of good WPF Notification Icon
Problematic Data Binding in menus
25. SQLite or Compact CE?
Initially SQLite -> Staleness problems (DUH !)
Write-ahead logging means you can see stale data
Switch to SQL Compact to allow concurrent updates (duh ?)
Really needed better caching?
Akavache?
A Document DB is better suited
26. Next Steps
File Manager UI
General Cleanup (DUH!)
Bring back Unit Tests (Duh ?)
Mock Server
WebAPI? scriptcs? Yumm!
Create a separate Pithos library
Windows RT, Windows Phone clients
AFTER the cleanup
27. Links for Pithos
Pithos trial
http://pithos.okeanos.io
Synnefo Documentation
http://www.synnefo.org/docs/synnefo/latest/index.html
Pithos API Documentation
http://www.synnefo.org/docs/pithos/latest/index.html
Pithos Windows Client
https://code.grnet.gr/projects/pithos-ms-client
28. Useful Links
Parallel FX Team blog
http://blogs.msdn.com/b/pfxteam
Caliburn.Micro
http://caliburnmicro.codeplex.com/
Ayende’s BufferPool
http://ayende.com/blog/4827/answer-stopping-the-leaks
29. Useful Books
C# 5 in a Nutshell, O’Reilly
Parallel Programming with .NET, Microsoft
Pro Parallel Programming with C#, Wiley
Concurrent Programming on Windows, Pearson
The Art of Concurrency, O’Reilly