www.twosigma.com
Archival storage at Two Sigma
September 13, 2018
Josh Leners
For illustration purposes only. Not an offer to buy or sell securities. Two Sigma may modify its investment approach and portfolio parameters in the future in any manner that it believes is consistent with its fiduciary duty to its clients. There is no
guarantee that Two Sigma or its products will be successful in achieving any or all of their investment objectives. Moreover, all investments involve some degree of risk, not all of which will be successfully mitigated. Please see the last page of this
presentation for important disclosure information.
What is Two Sigma?
• Technology company applying a data science platform to investment management
• Follow the scientific method for finding investment strategies
• Over 2/3 technical staff; 72% non-financial
• 10,000 data sources
• 35 PB of data
• 95,000 CPUs; 1.7 PB memory
Data at Two Sigma
[Diagram: data sources feed modeling/research, which produces trading tactics.]
• Analysis/news: Bloomberg, Thomson Reuters
• Market data: prices, order books, trades
• Other data: “We look beyond the obvious. So we can find connections that lead to the next great investment idea”
• Modeling/research: “If x, then y and z correlate”
• Trading tactic: “when x, buy y and sell z”
• Result: $$$
This talk
• Celfs: evolution of an archival file store
• Jaks: a next generation backend
• What an academic has learned in industry
Celfs: the architecture
Celfs stores filesystem snapshots, or views.
Root servers name and locate views.
Data servers locate and store files.
[Diagram: a metadata server, root servers, and data servers, with sites labeled NYC and CHI.]
Celfs: the data model
Client's /home/dir/ by day:
  9/21: File 1, File A
  9/22: File 1, File 2, File A, File B
  9/23: File 3, File C
Cels:
  Cel 1: File 1, File A
  Cel 2: File 2, File B
  Cel 3: File 3, File C
Views:
  2017-09-21: File 1, File A
  2017-09-22: File 1, File A; File 2, File B
  2017-09-23: File 1, File A; File 2, File B; File 3, File C
  LATEST: File 1, File A; File 2, File B; File 3, File C
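A minimal sketch of this data model, assuming a cel is an immutable batch of files and a view is an ordered list of cels. The `Cel` class and `files_in_view` function are illustrative names, not the Celfs API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cel:
    """An immutable batch of files (hypothetical representation)."""
    name: str
    files: tuple


def files_in_view(view):
    """A view is modeled as an ordered list of cels; it exposes all their files."""
    return [f for cel in view for f in cel.files]


cel1 = Cel("Cel 1", ("File 1", "File A"))
cel2 = Cel("Cel 2", ("File 2", "File B"))
cel3 = Cel("Cel 3", ("File 3", "File C"))

# Each dated view adds one cel on top of the previous day's view.
views = {
    "2017-09-21": [cel1],
    "2017-09-22": [cel1, cel2],
    "2017-09-23": [cel1, cel2, cel3],
}
views["LATEST"] = views["2017-09-23"]
```

Under this assumption, publishing a new day only adds one cel and one short list; older views stay valid because the cels they point to never change.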
Celfs: the teleology
• Archival storage: root servers and data servers are multi-datacenter
• CDN: publish information in one datacenter to another with strong consistency guarantees
• High-bandwidth data source: because cels are randomly distributed, a large view can often use the whole cluster’s bandwidth
Celfs drawback: storage TCO
Single unit of scaling: data server
Lots of data center real estate, power, cooling, etc.
Data has three total copies (vulnerable to a small number of disk failures)
Celfs drawback: performance isolation and scalability
[Diagram: large-scale computations reading from a row of data servers.]
Fairness is based on per-user limits, so a single user can’t utilize the whole system.
Cluster-level isolation makes scaling trade-offs worse!
This talk
• Celfs: evolution of an archival file store
• Jaks: a next generation backend
• What an academic learned in industry
JAKS: Just another keystone for storage
Most simply:
put(Object) -> id
get(id) -> Object
delete(id) -> ok
Under the hood:
• Tiered storage
• End-to-end encryption
• Quality of service
…
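The three-call interface can be sketched as an in-memory store. This is only an illustration of the put/get/delete shape; the class name, the use of random IDs, and the `bytes` payload type are assumptions, and none of the tiering, encryption, or QoS machinery is modeled.

```python
import uuid


class InMemoryObjectStore:
    """Toy stand-in for the put/get/delete interface (not the Jaks client)."""

    def __init__(self):
        self._objects = {}

    def put(self, obj: bytes) -> str:
        # The store, not the caller, chooses the id.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = obj
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def delete(self, object_id: str) -> str:
        del self._objects[object_id]
        return "ok"
```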
Storage tiers: where data lives
[Diagram: storage tiers ordered by bandwidth/speed and by cost per GB.]
• RAM
• SSD
• Erasure encoded disk arrays
• Offline storage (Glacier/Coldline/Tape)
Bandwidth labels on the diagram: 100s Gbps, 1000s Mbps, 10s Mbps
JAKS: implementing storage tiers
[Diagram: a client, a metadata gateway, metadata servers, and data gateways, backed by a consistent metadata store and a backing store, with links to other sites.]
JAKS: implementing storage tiers
• Clients only talk to gateways in their site
• Freedom to change backing store and metadata store
• Data gateways are unit of scaling for bandwidth; their RAM/SSDs scale cache
• Clients load-balance across gateways to make full use of cluster
  • Random for metadata
  • Consistent hash for data
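The two load-balancing rules can be sketched as follows. The slides do not describe the actual hashing strategy, so the ring with virtual nodes, the SHA-256 hash, and the names `GatewayRing` and `metadata_gateway` are all assumptions chosen for illustration.

```python
import bisect
import hashlib
import random


def _point(key: str) -> int:
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class GatewayRing:
    """Consistent-hash ring over data gateways (illustrative only)."""

    def __init__(self, gateways, replicas=64):
        # Each gateway owns several points so load spreads evenly.
        self._ring = sorted(
            (_point(f"{g}#{i}"), g) for g in gateways for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def data_gateway(self, object_id: str) -> str:
        # The first ring point after the object's hash owns the object.
        i = bisect.bisect(self._points, _point(object_id)) % len(self._ring)
        return self._ring[i][1]


def metadata_gateway(gateways, rng=random):
    """Metadata requests pick any gateway at random."""
    return rng.choice(gateways)
```

Hashing data requests means repeated reads of one object land on the same gateway, so its RAM/SSD cache is reused; with a ring, removing a gateway only remaps the objects that gateway owned.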
Caching in Jaks
• Data in Jaks can be cached with three policies
  • Pinned: data guaranteed to not be evicted (regardless of use) until some future point in time
  • Long cycle: data is not evicted until it hasn’t been used for a few weeks
  • Short cycle: data is not evicted until it hasn’t been used for a few days
Measuring access time in Jaks
• Use two times
  • mtime (when a file was created)
  • atime (when a file was accessed)
• Can’t use filesystem “atime” because of SSD wear
• Use off-disk Bloom filters measuring daily access
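One way to track daily access without touching the disk is one Bloom filter per day, kept in memory. The sketch below assumes that reading of "off-disk"; the filter size, hash count, and the `AccessLog` class are illustrative choices rather than details from the talk.

```python
import hashlib
from datetime import date, timedelta


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, bits=1 << 16, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class AccessLog:
    """One filter per calendar day, held in memory."""

    def __init__(self):
        self.daily = {}

    def record(self, object_id, day):
        self.daily.setdefault(day, BloomFilter()).add(object_id)

    def used_within(self, object_id, today, days):
        # Check today's filter and the filters for the preceding days.
        return any(
            object_id in self.daily.get(today - timedelta(d), ())
            for d in range(days)
        )
```

A false positive here only means an object looks more recently used than it was, so the error keeps data cached longer instead of evicting it early.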
Cache eviction in detail (today is Oct 13)
[Diagram: cached entries labeled Dec 5, Oct 12, Oct 7, Oct 1, Oct 13, and Oct 9, arranged in a long-cycle list and a short-cycle list.]
Periodically:
1. Evict aged out entries
2. Check space, evict random if full
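The two periodic steps can be sketched as below. The cycle lengths (three weeks, three days), the entry layout, and the choice to exempt pinned entries from random eviction are assumptions; the slides only say "a few weeks", "a few days", and "evict random if full".

```python
import random
from datetime import date, timedelta

LONG_CYCLE = timedelta(weeks=3)   # "a few weeks": assumed value
SHORT_CYCLE = timedelta(days=3)   # "a few days": assumed value


def evict(entries, today, capacity, rng=random):
    """Run one eviction pass over `entries` (id -> dict) and return evicted ids.

    Each entry has "policy" ("pinned", "long", or "short") and "size";
    pinned entries carry "pinned_until", the others carry "last_used".
    """
    evicted = []
    # Step 1: evict aged-out entries.
    for oid, e in list(entries.items()):
        if e["policy"] == "pinned":
            aged = today > e["pinned_until"]
        else:
            limit = LONG_CYCLE if e["policy"] == "long" else SHORT_CYCLE
            aged = today - e["last_used"] > limit
        if aged:
            evicted.append(oid)
            del entries[oid]
    # Step 2: if still over capacity, evict random unpinned entries.
    while sum(e["size"] for e in entries.values()) > capacity:
        candidates = [o for o, e in entries.items() if e["policy"] != "pinned"]
        if not candidates:
            break
        victim = rng.choice(candidates)
        evicted.append(victim)
        del entries[victim]
    return evicted
```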
End-to-end encryption
[Diagram: client, metadata gateway and servers, data gateways, consistent metadata store, and backing store. Message labels: get(27), secret, PUT hash(data), 200 OK.]
End-to-end encryption details
• Use authenticated encryption scheme (AES-OCB)
• Derive backing store names from object’s secret
• End-to-end check is powerful!!
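Deriving a backing store name from a per-object secret might look like the sketch below. The talk does not say how the derivation is done, so HMAC-SHA256, the label `b"backing-store-name"`, and both function names are assumptions. The AES-OCB encryption step itself is not shown because it needs a third-party library.

```python
import hashlib
import hmac
import secrets


def new_object_secret() -> bytes:
    """Generate a fresh random secret for one object (assumed 32 bytes)."""
    return secrets.token_bytes(32)


def backing_store_name(secret: bytes) -> str:
    """Derive a stable, opaque name from the secret with a keyed hash."""
    return hmac.new(secret, b"backing-store-name", hashlib.sha256).hexdigest()
```

With a keyed derivation like this, the backing store sees only names that reveal nothing about the object, and a name cannot be recomputed without the secret.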
Performance isolation and bursty workflows
[Diagram: large-scale computations reading from data servers.]
Requirements:
• Allow user to take advantage of whole system if idle
• Prevent oversubscription from degrading service below SLA
Quality of service: Admission controllers
• Need to limit bandwidth resources
  • Inbound/outbound traffic per network interface
  • Inbound/outbound traffic per backend
• Need to limit fixed resources
  • Database connections (in Metadata servers)
  • Staging space (for uncached writes/reads)
Quality of Service: queuing and allocation
[Diagram: three request queues (background work, research workflow, trading daemons) holding requests from Rachel, Barry, Tom, Tina, Beth, Ralph, and Randy.]
Priority   Guarantee   Share of excess
Highest    30%         10%
Medium     60%         40%
Lowest     10%         50%
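One reading of "guarantee plus share of excess" is sketched below: each class first gets its guaranteed share (capped by what it asks for), then unused capacity is split among classes that still want more, weighted by their excess share. The function name and the redistribution loop are assumptions; the percentages are the ones listed for the three priority classes.

```python
def allocate(capacity, demand, guarantee, excess_share):
    """Split `capacity` across classes; all dicts are keyed by class name."""
    # Step 1: guaranteed share, never more than the class actually demands.
    grant = {c: min(demand[c], guarantee[c] * capacity) for c in demand}
    spare = capacity - sum(grant.values())
    # Step 2: hand spare capacity to classes with unmet demand, by weight.
    hungry = {c for c in demand if demand[c] > grant[c]}
    while spare > 1e-9 and hungry:
        total_weight = sum(excess_share[c] for c in hungry)
        if total_weight == 0:
            break
        given = 0.0
        for c in list(hungry):
            offer = spare * excess_share[c] / total_weight
            take = min(offer, demand[c] - grant[c])
            grant[c] += take
            given += take
            if demand[c] - grant[c] <= 1e-9:
                hungry.discard(c)
        spare -= given
    return grant


GUARANTEE = {"highest": 0.30, "medium": 0.60, "lowest": 0.10}
EXCESS = {"highest": 0.10, "medium": 0.40, "lowest": 0.50}
```

With these numbers, a lone low-priority workload on an otherwise idle system can grow from its 10% guarantee to the full capacity, while a busy system still honors every guarantee.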
Quality of Service: flow control
• How to allocate resources like network bandwidth?
  • Undersubscribe the OS -> sub-optimal utilization
  • Oversubscribe the OS -> less control over allocation
• Need performance feedback to determine how much flow to allocate
• How can we measure TCP performance from the user level?
Measuring TCP performance from user space
Server:
Send 54 KB
Wait 27 us
Send 54 KB
…
Case 1: client can receive at maximum allowed rate.
- Send buffer never fills up
Case 2: client can’t receive at maximum allowed rate.
- Send buffer fills up
Gotchas:
- This feedback only works when RTT is low
- Feedback only effective if transfers are long
- Still need to account for duty cycle on backend
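A user-space sketch of this feedback loop: send fixed-size chunks at a fixed pace on a non-blocking socket and count how often the kernel send buffer refuses or truncates a write. The chunk size and wait come from the slide; treating `BlockingIOError` and short writes as "send buffer fills up" is an assumption about how the signal could be observed, and `paced_send` is a hypothetical name.

```python
import socket
import time

CHUNK = 54 * 1024   # bytes per send, from the slide
INTERVAL = 27e-6    # seconds between sends, from the slide


def paced_send(sock, n_chunks, chunk=CHUNK, interval=INTERVAL):
    """Send at a fixed pace; return how many sends hit a full send buffer."""
    sock.setblocking(False)
    payload = b"\0" * chunk
    stalls = 0
    for _ in range(n_chunks):
        try:
            sent = sock.send(payload)
            if sent < len(payload):
                stalls += 1   # buffer accepted only part of the chunk
        except BlockingIOError:
            stalls += 1       # buffer was already full
        time.sleep(interval)
    return stalls
```

A stall count of zero corresponds to case 1 (the client keeps up); a growing count corresponds to case 2 and suggests the allocated rate is higher than the client can absorb. Dropped or truncated chunks are not retried here, which a real sender would have to handle.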
Quality of Service: backpressure
[Diagram: a client and a data gateway (queue entry: Ralph); the reply is labeled “Response, backlog info”.]
Backlog at server is communicated on every response.
Clients use backlog to rate limit.
Rejections (queue too full) lead to exponential backoff.
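The client side of this scheme can be sketched as a single pacing function. The constants (10 ms base, 1 ms of delay per backlogged request, 5 s cap) and the name `next_delay` are assumptions for illustration; the talk does not give the actual formula.

```python
def next_delay(prev_delay, rejected, backlog,
               base=0.01, per_item=0.001, cap=5.0):
    """Return seconds to wait before the next request.

    rejected: True if the gateway refused the last request (queue too full).
    backlog:  queue length reported on the gateway's last response.
    """
    if rejected:
        # Exponential backoff: double the previous delay, within [base, cap].
        return min(cap, max(base, prev_delay * 2))
    # Otherwise pace requests in proportion to the reported backlog.
    return min(cap, backlog * per_item)
```

For example, two consecutive rejections starting from no delay give waits of 0.01 s and then 0.02 s, while an accepted request that reports an empty backlog lets the client send again immediately.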
JAKS: Just another keystone for storage
Most simply:
put(Object) -> id
get(id) -> Object
delete(id) -> ok
Under the hood:
• End-to-end encryption
• Tiered storage (cached, normal, cold)
• Quality of service
…
Not covered:
- slow clients
- high-availability restarts
- fault-tolerance
- consistent hashing strategy
- geographic replication
This talk
• Celfs: evolution of an archival file store
• Jaks: a next generation backend
• What an academic learned in industry
What an academic learned: measurement
Grad school: building measurement framework
• Need to test hypotheses
• Need to get graphs into the paper!
Industry: building measurement framework
• Need to validate changes and measure impact (aka “test hypotheses”)
• Need to understand performance
• Need to detect and anticipate problems
What an academic learned: hedging risk
Celfs is stable, important, and highly integrated
• can’t expect people to jump ship voluntarily
Need extensive exposure to find bugs and gain confidence
• Jaks development starts January 2016; End-to-end deployment in March 2016
• Finally made GA this month (still have a Celfs safety net)
What an academic learned: compatibility
Academic: thick clients allow more sophisticated fault-tolerance and scaling
Industry: thick clients allow more sophisticated bugs to persist
What an academic learned: build vs. buy decisions
Celfs — it’s 2006 and Hadoop is just being born from Apache Nutch
Jaks
• We want to avoid lock-in
• Geo-redundancy not a common ask for vendors
• We need performance isolation
Ultimately, we took a hybrid approach: building gateways
What an academic learned: unexpected failures
Jaks is designed to tolerate faults in gateways, backend stores, and other sites
• Failure handling is most important part of integration testing
Hard to predict all failure scenarios (Byzantine Fault Tolerance won’t help!)
• Firewall configuration creates partition to certain hosts
• MTU settings disable Kerberos negotiation
• Misuse of Kerberos library causes authentication failures
• Stale network info misdirects clients
Gateway caching performance as a function of clients reading 100 MB
[Chart: throughput in MBps (axis 0 to 14,000) vs. number of clients (1, 10, 20, 40, 80), with one series each for 0%, 50%, 75%, 90%, and 100% hot data.]
Small read performance (64 KB)
[Chart: IOPS (axis 0 to 50,000) vs. number of clients (0 to 160), with one series each for 0%, 50%, 75%, 90%, and 100% hot data.]

Más contenido relacionado

La actualidad más candente

Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScyllaDB
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyVMware Tanzu
 
HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]
HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]
HotSpot Synchronization, A Peek Under the Hood [JavaOne 2015 CON7570]David Buck
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
Archival Storage at Two Sigma - Josh Leners

  • 1. www.twosigma.com Archival storage at Two Sigma September 13, 2018 Josh Leners
  • 2. What is Two Sigma?
      • Technology company applying data science platform to investment management
      • Follow the scientific method for finding investment strategies
      • Over 2/3 technical staff; 72% non-financial
      • 10,000 data sources
      • 35 PB of data
      • 95,000 CPUs; 1.7 PB memory
  • 3. Data at Two Sigma
      • Analysis/news: Bloomberg, Thomson Reuters
      • Market data: prices, order books, trades
      • Other data: “We look beyond the obvious. So we can find connections that lead to the next great investment idea”
      • Modeling/Research: “If x, then y and z correlate”
      • Trading tactic: “when x, buy y and sell z”
      • $$$
  • 4. This talk
      • Celfs: evolution of an archival file store
      • Jaks: a next generation backend
      • What an academic has learned in industry
  • 5. Celfs: the architecture
      • Celfs stores filesystem snapshots, or views.
      • Root servers name and locate views.
      • Data servers locate and store files.
      [Diagram: boxes labeled Metadata Server, Root server (x2), and Data server (x10), with the site labels NYC and CHI]
  • 6. Celfs: the data model. Client /home/dir/ contents: 9/21: File 1, File A; 9/22: File 1, File 2, File A, File B; 9/23: File 3, File C. Cels: Cel 1: File 1, File A; Cel 2: File 2, File B; Cel 3: File 3, File C. Views: 2017-09-21: File 1, File A; 2017-09-22: File 1, File A, File 2, File B; 2017-09-23: File 1, File A, File 2, File B, File 3, File C; LATEST: File 1, File A, File 2, File B, File 3, File C
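The cel/view relationship on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory model: the dictionary layout and the `list_view` helper are assumptions, not Celfs's actual representation.

```python
from typing import Dict, List

# Hypothetical model: a cel is a named batch of files, and a view is an
# ordered list of the cels it includes.
cels: Dict[str, List[str]] = {
    "cel1": ["File 1", "File A"],
    "cel2": ["File 2", "File B"],
    "cel3": ["File 3", "File C"],
}
views: Dict[str, List[str]] = {
    "2017-09-21": ["cel1"],
    "2017-09-22": ["cel1", "cel2"],
    "2017-09-23": ["cel1", "cel2", "cel3"],
}
# LATEST names the same set of cels as the newest dated view.
views["LATEST"] = views["2017-09-23"]


def list_view(name: str) -> List[str]:
    """Return the files visible in a view, in cel order."""
    return [f for cel in views[name] for f in cels[cel]]
```

Under this model a new snapshot only adds one cel and one view entry; files already stored in earlier cels are referenced rather than copied.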
  • 7. Celfs: the teleology • Archival storage: root servers and data servers are multi-datacenter • CDN: publish information in one datacenter to another with strong consistency guarantees • High-bandwidth data source: because cels are randomly distributed, a large view can often use the whole cluster’s bandwidth
  • 8. Celfs drawback: storage TCO. Single unit of scaling (the data server): lots of data center real estate, power, cooling, etc. Data has three total copies (vulnerable to a small number of disk failures).
  • 9. Celfs drawback: performance isolation and scalability. Fairness is based on per-user limits, so a single user can’t utilize the whole system. Cluster-level isolation makes scaling trade-offs worse! [Diagram: data servers; large-scale computations.]
  • 10. This talk • Celfs: evolution of an archival file store • Jaks: a next generation backend • What an academic learned in industry
  • 11. JAKS: Just another keystone for storage. Most simply: put(Object) -> id; get(id) -> Object; delete(id) -> ok. Under the hood: • Tiered storage • End-to-end encryption • Quality of service …
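A minimal sketch of the three-call interface, assuming objects are byte strings and ids are opaque strings. The class name `InMemoryObjectStore` and the use of random UUIDs for ids are illustrative assumptions; the slide does not say how Jaks generates ids.

```python
import uuid
from typing import Dict


class InMemoryObjectStore:
    """Toy stand-in for the put/get/delete interface on the slide."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, obj: bytes) -> str:
        # Assumption: ids are random; the real id scheme is not described.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = obj
        return object_id

    def get(self, object_id: str) -> bytes:
        # Raises KeyError for unknown or deleted ids.
        return self._objects[object_id]

    def delete(self, object_id: str) -> str:
        del self._objects[object_id]
        return "ok"
```

Because `put` returns the id rather than accepting one, callers never choose names, which leaves the store free to change how objects are placed behind the interface.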
  • 12. Storage tiers: where data lives. Axes: bandwidth/speed and cost/GB. Tiers: RAM; SSD; erasure-encoded disk arrays; offline storage (Glacier/Coldline/Tape). Bandwidth labels: 100s Gbps; 1000s Mbps; 10s Mbps
  • 13. JAKS: implementing storage tiers [Diagram components: client; metadata gateway; six data gateways; two metadata servers; consistent metadata store; backing store; other sites.]
  • 14. JAKS: implementing storage tiers • Clients only talk to gateways in their site • Freedom to change backing store and metadata store • Data gateways are the unit of scaling for bandwidth; their RAM/SSDs scale cache • Clients load-balance across gateways to make full use of the cluster: random for metadata, consistent hash for data
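The two client-side load-balancing rules can be sketched as follows. The talk does not cover its consistent hashing strategy, so the virtual-node ring below is a common textbook variant chosen for illustration; `GatewayRing`, `metadata_gateway`, and the gateway names are all hypothetical.

```python
import bisect
import hashlib
import random
from typing import List, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class GatewayRing:
    """Consistent-hash ring with virtual nodes (an assumed strategy)."""

    def __init__(self, gateways: List[str], vnodes: int = 64) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{g}#{i}"), g) for g in gateways for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def data_gateway(self, object_id: str) -> str:
        # First ring point clockwise from the object's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(object_id)) % len(self._ring)
        return self._ring[idx][1]


def metadata_gateway(gateways: List[str]) -> str:
    """Metadata requests simply pick a gateway at random."""
    return random.choice(gateways)
```

Hashing data requests keeps each object on one gateway, so that gateway's RAM/SSD cache stays warm for it, and removing a gateway only remaps the objects that gateway owned.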
  • 15. Caching in Jaks • Data in Jaks can be cached with three policies • Pinned: data is guaranteed not to be evicted (regardless of use) until some future point in time • Long cycle: data is not evicted until it hasn’t been used for a few weeks • Short cycle: data is not evicted until it hasn’t been used for a few days
  • 16. Measuring access time in Jaks • Use two times: mtime (when a file was created) and atime (when a file was accessed) • Can’t use filesystem “atime” because of SSD wear • Use off-disk Bloom filters measuring daily access
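One way to read "off-disk Bloom filters measuring daily access" is one in-memory filter per day, so that recording a read never writes to the SSD. The sketch below assumes that reading; the class names, filter size, and hash construction are illustrative choices, not details from the talk.

```python
import hashlib
from typing import Dict, Iterator


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class DailyAccessLog:
    """Keeps one filter per day, keyed by an ISO date string (hypothetical)."""

    def __init__(self) -> None:
        self._days: Dict[str, BloomFilter] = {}

    def record(self, day: str, object_id: str) -> None:
        self._days.setdefault(day, BloomFilter()).add(object_id)

    def maybe_accessed(self, day: str, object_id: str) -> bool:
        bloom = self._days.get(day)
        return bloom is not None and bloom.might_contain(object_id)
```

A false positive here only makes an object look more recently used than it was, which delays an eviction rather than causing a wrong one.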
  • 17. Cache eviction in detail (today is Oct 13). Periodically: 1. Evict aged-out entries 2. Check space, evict random if full. [Diagram: long-cycle and short-cycle caches with entries dated Dec 5, Oct 12, Oct 7, Oct 1, Oct 13, Oct 9.]
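The two periodic steps can be sketched as below. The idle limits (three weeks and three days) are assumed stand-ins for "a few weeks" and "a few days", and the `Entry` fields, the treatment of expired pins, and the choice to spare pinned entries from random eviction are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

# Assumed idle limits for the two cycle policies.
MAX_IDLE = {"long": timedelta(weeks=3), "short": timedelta(days=3)}


@dataclass
class Entry:
    policy: str  # "pinned", "long", or "short"
    size: int
    last_access: date
    pinned_until: Optional[date] = None


def aged_out(entry: Entry, today: date) -> bool:
    if entry.policy == "pinned":
        # Assumption: a pin simply expires once its date has passed.
        return entry.pinned_until is not None and today > entry.pinned_until
    return today - entry.last_access > MAX_IDLE[entry.policy]


def evict(cache: Dict[str, Entry], capacity: int, today: date, rng: random.Random) -> None:
    # Step 1: evict aged-out entries.
    for key in [k for k, e in cache.items() if aged_out(e, today)]:
        del cache[key]
    # Step 2: if still over capacity, evict random unpinned entries.
    while sum(e.size for e in cache.values()) > capacity:
        candidates = [k for k, e in cache.items() if e.policy != "pinned"]
        if not candidates:
            break
        del cache[rng.choice(candidates)]
```

Random eviction needs no per-entry ordering, which fits an access log that only records whether an object was touched on a given day.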
  • 18. End-to-end encryption [Diagram components: client; metadata gateway; six data gateways; two metadata servers; consistent metadata store; backing store. Message labels: get(27); secret; secret; PUT hash(data); 200 OK.]
  • 19. End-to-end encryption details • Use authenticated encryption scheme (AES-OCB) • Derive backing store names from object’s secret • End-to-end check is powerful!!
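The name-derivation and end-to-end-check ideas can be illustrated with the standard library alone. This sketch does not perform AES-OCB encryption; it assumes HMAC-SHA256 for deriving a backing-store name and SHA-256 for the content check, neither of which is stated in the slides, and every function name is hypothetical.

```python
import hashlib
import hmac
import secrets


def new_secret() -> bytes:
    """Generate a per-object secret (32 random bytes, an assumed size)."""
    return secrets.token_bytes(32)


def backing_store_name(secret: bytes) -> str:
    # Assumed derivation: HMAC over a fixed label, keyed by the secret, so
    # the name reveals nothing about the secret or the object's contents.
    return hmac.new(secret, b"backing-store-name", hashlib.sha256).hexdigest()


def content_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """End-to-end check: the reader recomputes the digest and compares."""
    return hmac.compare_digest(content_digest(data), expected_digest)
```

A check performed by the client catches corruption introduced anywhere along the path (gateway, cache, or backing store), which is one plausible reading of why the slide calls the end-to-end check powerful.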
  • 20. Performance isolation and bursty workflows. Requirements: • Allow a user to take advantage of the whole system if idle • Prevent oversubscription from degrading service below SLA [Diagram: data servers; large-scale computations.]
  • 21. Quality of service: Admission controllers • Need to limit bandwidth resources: inbound/outbound traffic per network interface; inbound/outbound traffic per backend • Need to limit fixed resources: database connections (in metadata servers); staging space (for uncached writes/reads)
  • 22. Quality of Service: queuing and allocation. Queues: background work; research workflow; trading daemons (users shown: Rachel, Barry, Tom, Tina, Beth, Ralph, Randy). Allocation by priority: highest priority: guarantee 30%, gets 10% of excess; medium priority: guarantee 60%, gets 40% of excess; lowest priority: guarantee 10%, gets 50% of excess
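The guarantee-plus-excess arithmetic can be worked through in code. This is a single-pass sketch under assumed semantics: each priority first receives its demand up to its guaranteed share, and the leftover capacity is then split by the excess percentages. The function name, the demand inputs, and the single-pass rule are assumptions.

```python
from typing import Dict, Tuple

# (guaranteed percent of capacity, percent of excess), from the slide.
PRIORITIES: Dict[str, Tuple[int, int]] = {
    "highest": (30, 10),
    "medium": (60, 40),
    "lowest": (10, 50),
}


def allocate(capacity: float, demand: Dict[str, float]) -> Dict[str, float]:
    """Split capacity among priorities given each one's current demand."""
    base = {
        name: min(demand.get(name, 0.0), capacity * guarantee / 100)
        for name, (guarantee, _) in PRIORITIES.items()
    }
    excess = capacity - sum(base.values())
    return {
        name: base[name] + min(demand.get(name, 0.0) - base[name], excess * share / 100)
        for name, (_, share) in PRIORITIES.items()
    }
```

For example, with 100 units of capacity and demands of 10 (highest), 100 (medium), and 100 (lowest), the guarantees hand out 10, 60, and 10, leaving 20 units of excess, of which medium takes 8 and lowest takes 10.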
  • 23. Quality of Service: flow control • How to allocate resources like network bandwidth? • Undersubscribe the OS → sub-optimal utilization • Oversubscribe the OS → less control over allocation • Need performance feedback to determine how much flow to allocate • How can we measure TCP performance from the user level?
  • 24. Measuring TCP performance from user space. Server: send 54 KB, wait 27 us, send 54 KB, … Case 1: client can receive at the maximum allowed rate, so the send buffer never fills up. Case 2: client can’t receive at the maximum allowed rate, so the send buffer fills up. Gotchas: • This feedback only works when RTT is low • Feedback is only effective if transfers are long • Still need to account for duty cycle on backend
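The pacing numbers on this slide can be checked with a short calculation. Assuming 1 KB = 1000 bytes, sending 54 KB every 27 us works out to 16 Gbps. The second function is a simple fluid model, not real socket code, of when a fixed-size send buffer would fill; both function names and the buffer size used in the example are assumptions.

```python
def paced_rate_bps(chunk_bytes: int, wait_seconds: float) -> float:
    """Target send rate in bits per second for a send-then-wait loop."""
    return chunk_bytes * 8 / wait_seconds


def buffer_fills(
    send_rate_bps: float,
    drain_rate_bps: float,
    buffer_bytes: int,
    duration_s: float,
) -> bool:
    """Fluid model: backlog grows at (send - drain); does it hit capacity?"""
    growth_bytes_per_s = (send_rate_bps - drain_rate_bps) / 8
    return growth_bytes_per_s > 0 and growth_bytes_per_s * duration_s >= buffer_bytes
```

With a 4 MB buffer, a client draining at half the 16 Gbps pace fills the buffer within a 10 ms transfer but not within a 1 ms one, which matches the gotcha that the feedback is only effective for long transfers.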
  • 25. Quality of Service: backpressure. Backlog at the server is communicated on every response. Clients use the backlog to rate limit. Rejections (queue too full) lead to exponential backoff. [Diagram: client Ralph and a data gateway; the gateway returns a response with backlog info.]
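A client-side sketch of the two behaviors described here: pacing by the reported backlog, and exponential backoff after rejections. The class name, the delay constants, and the linear backlog-to-delay rule are assumptions chosen for illustration.

```python
class BackpressureClient:
    """Decides how long to wait before the next request (hypothetical)."""

    def __init__(
        self,
        base_delay: float = 0.01,
        max_delay: float = 5.0,
        per_item_delay: float = 0.001,
    ) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.per_item_delay = per_item_delay
        self._rejections = 0
        self._backlog = 0

    def on_response(self, backlog: int) -> None:
        # Every successful response carries the server's current backlog.
        self._rejections = 0
        self._backlog = backlog

    def on_rejection(self) -> None:
        # The server's queue was too full to accept the request.
        self._rejections += 1

    def next_delay(self) -> float:
        if self._rejections:
            # Exponential backoff: the delay doubles per consecutive rejection.
            return min(self.max_delay, self.base_delay * 2 ** (self._rejections - 1))
        # Otherwise wait in proportion to the reported backlog.
        return min(self.max_delay, self._backlog * self.per_item_delay)
```

Carrying the backlog on every response lets clients slow down before the queue overflows, so outright rejections and backoff become the fallback rather than the main control signal.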
  • 26. JAKS: Just another keystone for storage. Most simply: put(Object) -> id; get(id) -> Object; delete(id) -> ok. Under the hood: • End-to-end encryption • Tiered storage (cached, normal, cold) • Quality of service … Not covered: slow clients; high-availability restarts; fault-tolerance; consistent hashing strategy; geographic replication
  • 27. This talk • Celfs: evolution of an archival file store • Jaks: a next generation backend • What an academic learned in industry
  • 28. What an academic learned: measurement. Grad school, building a measurement framework: • Need to test hypotheses • Need to get graphs into the paper! Industry, building a measurement framework: • Need to validate changes and measure impact (aka “test hypotheses”) • Need to understand performance • Need to detect and anticipate problems
  • 29. What an academic learned: hedging risk. Celfs is stable, important, and highly integrated • Can’t expect people to jump ship voluntarily. Need extensive exposure to find bugs and gain confidence • Jaks development started in January 2016; end-to-end deployment in March 2016 • Finally made GA this month (still have a Celfs safety net)
  • 30. What an academic learned: compatibility. Academic: thick clients allow more sophisticated fault-tolerance and scaling. Industry: thick clients allow more sophisticated bugs to persist.
  • 31. What an academic learned: build vs. buy decisions. Celfs: it’s 2006 and Hadoop is just being born from Apache Nutch. Jaks: • We want to avoid lock-in • Geo-redundancy is not a common ask for vendors • We need performance isolation. Ultimately, we took a hybrid approach: building gateways.
  • 32. What an academic learned: unexpected failures. Jaks is designed to tolerate faults in gateways, backend stores, and other sites • Failure handling is the most important part of integration testing. Hard to predict all failure scenarios (Byzantine Fault Tolerance won’t help!) • Firewall configuration creates partition to certain hosts • MTU settings disable Kerberos negotiation • Misuse of Kerberos library causes authentication failures • Stale network info misdirects clients
  • 33. Placeholder before backup slides
  • 34. Gateway caching performance as a function of clients reading 100 MB [Plot: throughput in MBps (axis 0 to 14,000) vs. number of clients (1, 10, 20, 40, 80); series: 0% hot, 50% hot, 75% hot, 90% hot, 100% hot.]
  • 35. Small read performance (64 KB) [Plot: IOPS (axis 0 to 50,000) vs. number of clients (axis 0 to 160); series: 0% hot, 50% hot, 75% hot, 90% hot, 100% hot.]