An Integrated Hardware-Software Approach to Flexible Transactional Memory
Arrvindh Shriraman, Michael F. Spear,
Hemayet Hossain, Virendra J. Marathe,
Sandhya Dwarkadas, and Michael L. Scott
www.cs.rochester.edu/research/synchronization
01/29/15
An Integrated Hardware-Software Approach to Flexible
2
Transactional Memory Implementation
• Hardware Transactional Memory (HTM)
e.g., TCC, UTM, LogTM, VTM, PTM, BulkTM
+ library compatible, fast if no pathologies
- rigid policy, virtualization support expensive, no migration path
• Software Transactional Memory (STM)
e.g., RSTM, DSTM, McRT, TL2, SXM
+ flexible policy (conflict, escape actions), hardware compatibility
- slow (always?), library compatibility hard
• Best-effort TMs
e.g., HyTM, Intel Hybrid TM
+ simplifies future hardware, runs on current hardware
- rigid policy, hardware inflexible, performance cliffs
Our Approach
Hardware-Software Transactions
– hardware to accelerate STMs and support your favorite policy
– hardware that supports flexible software implementation
– software routines to support uncommon events
(i.e., overflows, context switches, paging)
+ flexible policy, supports today’s hardware,
accelerates STMs, multiple uses for acceleration hardware
- slower than HTMs, library compatibility (compiler support?)
e.g., RTM (this talk), AOU_N (yesterday at SPAA 2007)
Data Structures in TM
[Figure: an HTM cache entry (TAG, Data, R/W bits) beside an STM organization (Data plus Metadata), each covering conflict resolution and version management. Flexible Transactional Memory pairs Alert-On-Update (an A bit on the cache entry) for conflict detection with Programmable-Data-Isolation (software metadata plus R/W bits on the cache entry) for data versioning.]
Why?
• Decoupled conflict detection and version
management for flexible policy and usage
• Conflict detection
– Eager, at the first read/write to shared data
– Lazy, prior to commit of speculative updates
– Mixed, eager write-write and lazy read-write
– and more
• Flexible software contention managers
– arbitrate among conflicting transactions
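The difference between the eager, lazy, and mixed policies listed above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `Access` record, the `detection_points` function, and the `TRACE` format are hypothetical names invented for illustration and are not part of RTM. The model only reports where a conflict would be noticed; it does not abort anyone, which in a real system would be the contention manager's decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    tx: str        # transaction name (hypothetical trace format)
    op: str        # "R" (read), "W" (write), or "C" (commit)
    loc: str = ""  # shared location; unused for commits


def detection_points(trace, policy):
    """Return (trace index, tx, other tx, location) tuples marking where
    each conflict is noticed under 'eager', 'lazy', or 'mixed'."""
    reads, writes = {}, {}   # tx -> locations touched, active transactions only
    found = []
    for i, a in enumerate(trace):
        reads.setdefault(a.tx, set())
        writes.setdefault(a.tx, set())
        if a.op == "C":
            # Commit time: compare the committer's writes with other active txs.
            for other in reads:
                if other == a.tx:
                    continue
                for loc in sorted(writes[a.tx]):
                    ww = loc in writes[other]
                    rw = loc in reads[other]
                    if policy == "lazy" and (ww or rw):
                        found.append((i, a.tx, other, loc))
                    elif policy == "mixed" and rw and not ww:
                        found.append((i, a.tx, other, loc))
            del reads[a.tx], writes[a.tx]   # committed, so no longer active
            continue
        # Access time: compare this access with other active transactions.
        for other in reads:
            if other == a.tx:
                continue
            ww = a.op == "W" and a.loc in writes[other]
            rw = (a.op == "W" and a.loc in reads[other]) or \
                 (a.op == "R" and a.loc in writes[other])
            if policy == "eager" and (ww or rw):
                found.append((i, a.tx, other, a.loc))
            elif policy == "mixed" and ww:
                found.append((i, a.tx, other, a.loc))
        (writes if a.op == "W" else reads)[a.tx].add(a.loc)
    return found


# Example trace: T1 reads x, T2 writes x, both write y, then T2 and T1 commit.
TRACE = [
    Access("T1", "R", "x"), Access("T2", "W", "x"),
    Access("T1", "W", "y"), Access("T2", "W", "y"),
    Access("T2", "C"), Access("T1", "C"),
]
```

On `TRACE`, the eager policy reports both conflicts at the accesses that cause them (indices 1 and 3), the lazy policy reports both at T2's commit (index 4), and the mixed policy reports the write-write conflict at index 3 and the read-write conflict at index 4.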
STM Overheads
[Figure: stacked bars of normalized execution time (0 to 1) for RSTM [TRANSACT ’06] on Hash, RBTree, RBTree-Large, LinkedList-Release, LFUCache, and RandomGraph. Each bar is broken down into Abort, Copy, Validation, CM, Bookkeeping, MM, App Non-Tx, and App Tx. The runtime software overheads targeted are annotated as 21%, 43%, 42%, 34%, and 79%.]
Copying: buffering of speculative modifications to ensure isolation
Validation: verifying consistency of accessed locations
For workload description, please see the paper
Flexible Transactional Memory
• Leave policy decisions in software
– multiple-writer coherence for data isolation at software’s behest
– HW provides conflict detection, SW specifies resolution policy
• Minimize the validation overhead
– Alert-on-update provides fast event-based communication of
remote memory operations
• Eliminate copying overhead
– Programmable data isolation allows software to employ private
caches as thread-local buffers
• Use software mechanisms to accommodate virtualization
(i.e., cache overflows, paging, thread switches)
Alert-On-Update (AOU)
• ISA includes an instruction, ALoad, that loads an
address and marks the cache line
• When an A-tagged line is invalidated, the processor
– jumps to a software handler
– masks further alerts until exit from alert handler
• Alerts can be due to
– capacity, cache cannot track update events on evicted line
– coherence, remote processor has acquired exclusive access
Caveat: AOU support cannot extend across events that exhaust space and time
Advantages: general, lightweight, simple, and fine-grained
[Figure: cache entry with an A bit alongside the tag and data.]
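The alert-on-update behavior described above can be mimicked in software to make the event flow concrete. The following is a toy model, not the hardware design: the `AlertCache` class, its method names, the oldest-first eviction order, and the choice to record masked alerts in a list are all assumptions made for illustration.

```python
class AlertCache:
    """Toy model of one processor's cache with alert-on-update marks."""

    def __init__(self, capacity, handler):
        self.capacity = capacity
        self.handler = handler     # software alert handler: handler(addr, cause)
        self.lines = {}            # addr -> True if the line is A-tagged
        self.in_handler = False    # alerts are masked while the handler runs
        self.masked = []           # assumption: masked alerts are just recorded

    def aload(self, addr):
        """ALoad: bring the line into the cache and mark it."""
        self._fill(addr)
        self.lines[addr] = True

    def load(self, addr):
        """Ordinary load: bring the line in without marking it."""
        self._fill(addr)

    def remote_write(self, addr):
        """A remote processor acquired exclusive access: invalidate the line."""
        if self.lines.pop(addr, False):
            self._alert(addr, "coherence")

    def _fill(self, addr):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))   # evict the oldest line (assumed)
            if self.lines.pop(victim):
                # The cache can no longer track updates to a marked line.
                self._alert(victim, "capacity")
        self.lines.setdefault(addr, False)

    def _alert(self, addr, cause):
        if self.in_handler:
            self.masked.append((addr, cause))
            return
        self.in_handler = True
        try:
            self.handler(addr, cause)          # "jump" to the software handler
        finally:
            self.in_handler = False
```

A runtime would typically ALoad the words whose change invalidates its assumptions (for example an object header) and let the handler decide whether to abort or revalidate.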
Programmable Data Isolation (PDI)
• ISA provides TStore and TLoad to isolate data in cache line
• TMI buffers/isolates TStores
– supports concurrent speculative writers; BusTRdX ignored
– supports concurrent readers; BusRd threatened and data response suppressed
• TI isolates concurrent readers from speculative writers
– values written by other TStores are isolated
– a threatened read results in dropping to TI
Programmable Data Isolation (PDI)
For details on coherence protocol and tag encoding, please see TR 910
• TI lines isolate concurrent readers from speculative
writers
– are dropped without alerting processor
– allow caching; drop to I on revert or commit
• TStored (TMI) lines buffer speculative stores
– must remain in cache or HW alerts active thread
– drop to M on commit, I on revert
• Support R-W and W-W concurrent sharers (if SW wants)
• No global consensus in HW required for committing
– commit is entirely local; SW responsible for correctness
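The commit and revert rules for TMI and TI lines can be sketched as a small state model. This is a simplification under stated assumptions: `PdiCache` and its methods are hypothetical names, only the states I, S, M, TMI, and TI are modeled, and a committed line is published to the shared dictionary immediately, which stands in for a modified line answering later requests. The full protocol is in TR 910.

```python
class PdiCache:
    """Toy model of one private cache with programmable data isolation."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for L2 and DRAM
        self.lines = {}        # addr -> [state, value]

    def tstore(self, addr, value):
        """TStore: buffer a speculative value locally in state TMI."""
        self.lines[addr] = ["TMI", value]

    def tload(self, addr, threatened=False):
        """TLoad: a threatened read installs the committed value in state TI."""
        if addr in self.lines and self.lines[addr][0] != "I":
            return self.lines[addr][1]
        state = "TI" if threatened else "S"
        self.lines[addr] = [state, self.memory[addr]]
        return self.memory[addr]

    def commit(self):
        """Local commit: TMI lines drop to M, TI lines drop to I."""
        for addr, line in self.lines.items():
            if line[0] == "TMI":
                line[0] = "M"
                self.memory[addr] = line[1]   # toy shortcut: publish at once
            elif line[0] == "TI":
                line[0] = "I"

    def revert(self):
        """Revert: both TMI and TI lines drop to I; speculative data is lost."""
        for line in self.lines.values():
            if line[0] in ("TMI", "TI"):
                line[0] = "I"

    def state(self, addr):
        return self.lines.get(addr, ["I"])[0]
```

Note that nothing in `commit` consults other caches, matching the point that commit is local and software is responsible for deciding whether committing is correct.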
Putting things together
• Decoupled hardware for
– version management (PDI) and conflict detection (AOU)
– accelerating common TM operations
• Many feasible software libraries to
– implement and export transaction constructs
– handle time and space exhaustion
– control runtime policy
• RTM is an object-level, indirection-based TM.
RTM Data Structure
Runtime SW associates a metadata header with every object.
An Object can denote a semantic entity or a group of memory locations.
[Figure: metadata per object, with fields Owner, Serial #, Current Data (committed; present if versioning in SW), New Data (uncommitted), and Overflow Readers (reader bitmap to track transactions not using HW support). Owner refers to a Transaction Descriptor holding Status and Serial #. The header fields are grouped under conflict detection and data versioning; the object data spans N cache lines.]
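The per-object metadata described above can be written down as plain records to make the fields easier to follow. This is a sketch, not RTM's actual layout: the class and field names are assumptions, the status strings are illustrative, and the `logical_value` rule (readers see the new version only once the owner has committed) is an assumption in the style of indirection-based STMs. The slides do not say what the serial number is used for, so the sketch only stores it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TxDescriptor:
    status: str = "ACTIVE"   # illustrative values: ACTIVE, COMMITTED, ABORTED
    serial: int = 0          # stored only; its use is not described here


@dataclass
class ObjectHeader:
    owner: Optional[TxDescriptor] = None   # last transaction to acquire the object
    serial: int = 0
    current: Any = None                    # committed version
    new: Any = None                        # uncommitted version, if any
    overflow_readers: int = 0              # bitmap of readers not using HW support


def add_overflow_reader(header, tx_index):
    """Set the bit for a software-path reader (tx_index is hypothetical)."""
    header.overflow_readers |= 1 << tx_index


def logical_value(header):
    """Assumed rule: the new version counts only after its owner commits."""
    owner = header.owner
    if owner is not None and owner.status == "COMMITTED" and header.new is not None:
        return header.new
    return header.current
```

With this layout a single status change in the descriptor switches the logical contents of every object the transaction owns.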
FastPath Transactions
(Validation + Copying)
Program:
  Begin_hw_t abort_pc
  ALD TxD_2
  ALD OH(A)
  TLD A
  TST A
  CAS OH(A)
  CAS-Commit TxD_2
[Figure: data side of the example. Object header OH(A) with fields Owner, #S, and Overflow Readers; transaction descriptors TxD_1 (COMMIT) and TxD_2 (ACTIVE, then COMMIT); CAS arrows on OH(A) and TxD_2; object A (current). Labels: AOU, PDI, In Cache.]
• Do not overflow time or space resources
• ALoad descriptor to detect concurrent active transactions
• ALoad object header to detect ownership changes
• TStore updates are isolated in private cache
Overflow Transactions
Program:
  Begin_sw_t abort_pc
  ALD TxD_2
  LD OH(A)
  ...........
  ST A’
  CAS OH(A)
  CAS-Commit TxD_2
[Figure: data side of the example. Object header OH(A) with fields Owner, #S, and Overflow Readers; transaction descriptors TxD_1 (COMMIT) and TxD_2 (ACTIVE, then COMMIT); CAS arrows on OH(A) and TxD_2; object A (current) and clone A’ (new version). Labels: AOU, In Cache.]
• ALoad descriptor to detect concurrent active transactions
• To Read, update overflow-reader list to notify future requestors
• To Write, copy current version and buffer speculative updates
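The write path of an overflow transaction (clone, CAS on the object header, CAS-Commit on the descriptor) can be sketched in software as follows. Everything here is an illustrative assumption: `cas` emulates a hardware compare-and-swap with a lock, `Descriptor`, `Header`, and `overflow_write` are hypothetical names, a conflict simply returns `False` where a contention manager would normally arbitrate, and the overflow-reader list is left out.

```python
import copy
import threading

_cas_lock = threading.Lock()


def cas(obj, attr, expected, new):
    """Software stand-in for a hardware compare-and-swap on obj.attr."""
    with _cas_lock:
        if getattr(obj, attr) == expected:
            setattr(obj, attr, new)
            return True
        return False


class Descriptor:
    def __init__(self):
        self.status = "ACTIVE"


class Header:
    def __init__(self, data):
        self.owner = None      # descriptor of the last acquirer
        self.current = data    # committed version
        self.new = None        # speculative clone


def overflow_write(tx, header, update):
    """Clone the valid version, update the clone, then acquire the header."""
    seen_owner = header.owner
    if seen_owner is not None and seen_owner.status == "ACTIVE":
        return False           # conflict: left to a contention manager
    if seen_owner is not None and seen_owner.status == "COMMITTED":
        base = header.new      # the previous owner's clone became current
    else:
        base = header.current
    clone = copy.deepcopy(base)
    update(clone)              # speculative updates go to the clone only
    if not cas(header, "owner", seen_owner, tx):
        return False           # someone else acquired the object first
    # Toy shortcut: a real header would switch versions atomically.
    header.current, header.new = base, clone
    return True


def commit(tx):
    """CAS-Commit: succeeds only if nobody aborted the transaction."""
    return cas(tx, "status", "ACTIVE", "COMMITTED")
```

Because the clone becomes visible only through the descriptor's status, a rival that wins a `cas(tx, "status", "ACTIVE", "ABORTED")` first makes the later `commit(tx)` fail and the clone is simply discarded.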
TMESI Prototype
[Figure: 16 processors (1P to 16P), each with private I$ and D$, connected by a snoopy interconnect to a shared L2$ and memory.]
• Processors: SPARC v9, 1.2GHz
• L1 caches: 64KB I&D, 4-way, 2-cycle access, 32 entry VB
• Snoopy interconnect: 4-ary ordered tree, 1-cycle link delay, 64 bytes/cycle
• Shared L2: 8MB, 8-way, 4 banks, 20-cycle bank delay
• Memory: 100-cycle DRAM access
• MESI coherence protocol
The simulation infrastructure is based on the SIMICS + Multifacet GEMS framework
Our thanks to the Wisconsin Multifacet group for distributing the GEMS toolset
Runtime Systems
• CGL (Coarse Grain Lock)
• RTM-F(astpath) - Validation, Copying
• RTM-O(verflow) - Validation, Copying
• RTM-Lite* - Validation, Copying
• RSTM (Invisible + Eager) [TRANSACT ’06]
* For a detailed description of Lite transactions, please see the paper
Benchmarks
33% lookup, 33% insert, 33% delete operations on
HashTable (256 buckets), RBTree,
RBTree-Large (256-byte entry), LinkedList-Rel,
LFUCache (255 queue + 2048 array), RandomGraph
RTM-F Scales
• RTM-F improves performance and provides good scalability
- at 2 threads it is 50% slower than CGL1, but at 16 threads it is 1.8X faster
• RTM-O’s performance is as good as RSTM on a CMP (Avg: 6% variation)
[Figure: RBTree-Large. Normalized throughput (CGL, 1 thread = 1) versus threads (1, 2, 4, 8, 16) for CGL, RTM-F, RTM-Lite, RTM-O, and RSTM. Chart annotations: 1.9X, 2X, 2X.]
Hardware accelerates Software
[Figure: normalized throughput at 16 threads (CGL, 1 thread = 1) for RTM-F, RTM-Lite, RTM-O, and RSTM on Hash, RBTree, RBTree-Large, and LinkedList-Rel, with LFUCache in a separate panel on a smaller scale. Chart annotations: 1.5X, 1.7X, 1.8X, 1.7X, 1.6X.]
• RTM-F’s speedup over RTM-Lite is proportional to copying overhead
- HashTable (5%), LFUCache (14%), RBTree-Large (45%)
• RTM-Lite presents an attractive HW cost/performance tradeoff
- 45% slower than RTM-F on our most copy-heavy benchmark
Conflict Policy Important!
[Figure: two panels of normalized throughput versus threads (1, 2, 4, 8, 16), comparing Eager and Lazy conflict detection. The RandomGraph panel (0 to 1) carries a Livelock annotation; the Hash panel ranges from 0 to 6.]
Conflict Policy Important!
• In applications with a low degree of sharing
– Eager is as good as lazy
– Lazy imposes higher bookkeeping overheads
• In applications with a high degree of sharing
– Lazy eliminates livelock anomalies
– Lazy exploits R-W and W-W sharing
– Lazy narrows the conflict window to attain more commits
HashTable (Eager is 21% faster) and RBTree (Eager is 10% slower)
LFUCache (Lazy is 28% faster) and RandomGraph (Lazy eliminates livelocks)
To Take Home
• Decouple hardware for versioning and conflict
detection to enable
– flexible software TM policy and
– non-TM uses
• Flexible conflict detection and management to
eliminate performance anomalies
• Use software to handle the uncommon cases
Questions
Download RSTM version 3.0 at
http://www.cs.rochester.edu/research/synchronization/
[Photos: Arrvindh, Mike, Sandhya, Virendra, Hemayet, Michael]
Backup
Future Work
• How to enable flexible usage of hardware?
– semantics, concurrent use, programmer interface
• Simplify metadata organization
• Extend to scalable protocols and compare with
pure HTM system
• Strong Isolation and Privatization
RTM Interface
Z = X + Y ≡
    BEGIN_TX (handler_ptr, mode [H/S])
    const integer* rd_X = X->open_RO()
    const integer* rd_Y = Y->open_RO()
    integer* wr_Z = Z->open_RW()
    *wr_Z = (*rd_X) + (*rd_Y)
    END_TX
1. Start transaction in (Fastpath/Overflow) mode and save abort-handler PC
2. Open object metadata before reading/writing object data
3. Read and speculatively update objects
4. Acquire ownership of written objects in their metadata at either
   - open (i.e., eager)
     + reduces wasted work
     - possible livelock, reduced concurrency (not even R-W sharing)
   - end_tx (i.e., lazy)
     + increased concurrency, livelock freedom
     - more wasted work, requires lazy versioning
5. If Active, switch status to committed.
Protocol Animation
Cache line size objects: A, B. Object metadata: OH(A), OH(B)
[Figure: animation steps 1 to 5 on three processors (P0, P1, P2, running threads T0, T1, T2) with private L1 caches and a shared L2. Operations shown: TLoad A, TStore B, TStore A, TLoad A, TLoad B, with a TGetX request. Cache states shown include AE and AS for OH(A) and OH(B), TEE and TMI for A, TMI for B, and TII for A and B.]
Protocol Animation
Cache line size objects: A, B. Object metadata: OH(A), OH(B)
[Figure: continuation of the animation, steps 1 to 7, on the same three processors (P0, P1, P2, running threads T0, T1, T2) with private L1 caches and a shared L2. In addition to the earlier TLoad and TStore operations, it shows Acquire OH(A) with a GetX request, two CAS-Commits, one Abort, and two Commits. The states shown progress from AS, TMI, and TII to S, I, and M, including M: A and M: OH(A).]
Lite Transaction
(Validation)
• To read
– ALoad object header to detect object ownership
acquisition
• To write
– ALoad descriptor to detect concurrent transactions
stealing ownership
– Clone object and buffer modifications
– Acquire ownership and pointers to perform logical
update
• What is the serial number for?
• How do A-tags differ from Intel-HASTM?
• Privatization
• 2X is not enough, why are you slow?
• What about strong isolation?
• What about 2 modified lines?
01/29/15
An Integrated Hardware-Software Approach to Flexible
31

Más contenido relacionado

La actualidad más candente

Real Time Operating System Concepts
Real Time Operating System ConceptsReal Time Operating System Concepts
Real Time Operating System ConceptsSanjiv Malik
 
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...peknap
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
Operating System-Ch8 memory management
Operating System-Ch8 memory managementOperating System-Ch8 memory management
Operating System-Ch8 memory managementSyaiful Ahdan
 
Dev Conf 2017 - Meeting nfv networking requirements
Dev Conf 2017 - Meeting nfv networking requirementsDev Conf 2017 - Meeting nfv networking requirements
Dev Conf 2017 - Meeting nfv networking requirementsFlavio Leitner
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time SystemsDeepak John
 
Operating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - SchedulingOperating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - SchedulingPeter Tröger
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded SystemsPeter Tröger
 
Galvin-operating System(Ch1)
Galvin-operating System(Ch1)Galvin-operating System(Ch1)
Galvin-operating System(Ch1)dsuyal1
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals哲豪 康哲豪
 

La actualidad más candente (19)

Rtos slides
Rtos slidesRtos slides
Rtos slides
 
RT linux
RT linuxRT linux
RT linux
 
Real Time Operating System Concepts
Real Time Operating System ConceptsReal Time Operating System Concepts
Real Time Operating System Concepts
 
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche...
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
ucOS
ucOSucOS
ucOS
 
Ch6 cpu scheduling
Ch6   cpu schedulingCh6   cpu scheduling
Ch6 cpu scheduling
 
Operating System-Ch8 memory management
Operating System-Ch8 memory managementOperating System-Ch8 memory management
Operating System-Ch8 memory management
 
Dev Conf 2017 - Meeting nfv networking requirements
Dev Conf 2017 - Meeting nfv networking requirementsDev Conf 2017 - Meeting nfv networking requirements
Dev Conf 2017 - Meeting nfv networking requirements
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
Operating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - SchedulingOperating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - Scheduling
 
Rtos by shibu
Rtos by shibuRtos by shibu
Rtos by shibu
 
Process Scheduling
Process SchedulingProcess Scheduling
Process Scheduling
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 
Cpu scheduling
Cpu schedulingCpu scheduling
Cpu scheduling
 
Galvin-operating System(Ch1)
Galvin-operating System(Ch1)Galvin-operating System(Ch1)
Galvin-operating System(Ch1)
 
Vx works RTOS
Vx works RTOSVx works RTOS
Vx works RTOS
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals
 
Rtai
RtaiRtai
Rtai
 

Destacado

CORTES DE PELO | ¡¡Vídeo aniversario!!
CORTES DE PELO | ¡¡Vídeo aniversario!!CORTES DE PELO | ¡¡Vídeo aniversario!!
CORTES DE PELO | ¡¡Vídeo aniversario!!Aitor BV
 
Dynamic processors w4_3_imp
Dynamic processors w4_3_impDynamic processors w4_3_imp
Dynamic processors w4_3_impJan Zurcher
 
TheCrowdCafe: Global Crowdinvesting Industry Presentation
TheCrowdCafe: Global Crowdinvesting Industry Presentation  TheCrowdCafe: Global Crowdinvesting Industry Presentation
TheCrowdCafe: Global Crowdinvesting Industry Presentation Jonathan Sandlund
 
雲南東川紅土地
雲南東川紅土地雲南東川紅土地
雲南東川紅土地Jaing Lai
 
A Synthesis Review of Key Lessons in Programs Relating to Oceans and Fisheries
A Synthesis Review of Key Lessons in Programs Relating to Oceans and FisheriesA Synthesis Review of Key Lessons in Programs Relating to Oceans and Fisheries
A Synthesis Review of Key Lessons in Programs Relating to Oceans and FisheriesThe Rockefeller Foundation
 
The White Rose of Athens
The White Rose of AthensThe White Rose of Athens
The White Rose of AthensJohn *
 
Museums Give Teachers Control to Create
Museums Give Teachers  Control to CreateMuseums Give Teachers  Control to Create
Museums Give Teachers Control to CreateDarren Milligan
 
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...Darren Milligan
 
SME2: Social Media Excellence x Social Media Expertise
SME2: Social Media Excellence x Social Media ExpertiseSME2: Social Media Excellence x Social Media Expertise
SME2: Social Media Excellence x Social Media ExpertiseRichard Binhammer
 
Drude Lorentz circuit Gonano Zich
Drude Lorentz circuit Gonano ZichDrude Lorentz circuit Gonano Zich
Drude Lorentz circuit Gonano ZichCarlo Andrea Gonano
 
Photo of the Day from Outdoor Photographer Magazine
Photo of the Day from Outdoor Photographer MagazinePhoto of the Day from Outdoor Photographer Magazine
Photo of the Day from Outdoor Photographer Magazinemaditabalnco
 
CORTES DE PELO CORTO 2016 | Tendencias y Ventajas de Cortar tu Cabello a la ...
CORTES DE PELO CORTO 2016 |  Tendencias y Ventajas de Cortar tu Cabello a la ...CORTES DE PELO CORTO 2016 |  Tendencias y Ventajas de Cortar tu Cabello a la ...
CORTES DE PELO CORTO 2016 | Tendencias y Ventajas de Cortar tu Cabello a la ...Miriam TM
 
1 Imprezy i usługi turystyczne
1 Imprezy i usługi turystyczne1 Imprezy i usługi turystyczne
1 Imprezy i usługi turystyczneKasia Stachura
 

Destacado (20)

CORTES DE PELO | ¡¡Vídeo aniversario!!
CORTES DE PELO | ¡¡Vídeo aniversario!!CORTES DE PELO | ¡¡Vídeo aniversario!!
CORTES DE PELO | ¡¡Vídeo aniversario!!
 
2
22
2
 
Dynamic processors w4_3_imp
Dynamic processors w4_3_impDynamic processors w4_3_imp
Dynamic processors w4_3_imp
 
完美的管理
完美的管理完美的管理
完美的管理
 
TLC Services
TLC ServicesTLC Services
TLC Services
 
Zaragoza Turismo 44
Zaragoza Turismo 44Zaragoza Turismo 44
Zaragoza Turismo 44
 
TheCrowdCafe: Global Crowdinvesting Industry Presentation
TheCrowdCafe: Global Crowdinvesting Industry Presentation  TheCrowdCafe: Global Crowdinvesting Industry Presentation
TheCrowdCafe: Global Crowdinvesting Industry Presentation
 
Zaragoza turismo 233
Zaragoza turismo 233Zaragoza turismo 233
Zaragoza turismo 233
 
雲南東川紅土地
雲南東川紅土地雲南東川紅土地
雲南東川紅土地
 
A Synthesis Review of Key Lessons in Programs Relating to Oceans and Fisheries
A Synthesis Review of Key Lessons in Programs Relating to Oceans and FisheriesA Synthesis Review of Key Lessons in Programs Relating to Oceans and Fisheries
A Synthesis Review of Key Lessons in Programs Relating to Oceans and Fisheries
 
The White Rose of Athens
The White Rose of AthensThe White Rose of Athens
The White Rose of Athens
 
Museums Give Teachers Control to Create
Museums Give Teachers  Control to CreateMuseums Give Teachers  Control to Create
Museums Give Teachers Control to Create
 
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...
Growing Your Audience: Reaching Kids Online with Digital Museum Educational R...
 
SME2: Social Media Excellence x Social Media Expertise
SME2: Social Media Excellence x Social Media ExpertiseSME2: Social Media Excellence x Social Media Expertise
SME2: Social Media Excellence x Social Media Expertise
 
Workshop creativiteit
Workshop creativiteit Workshop creativiteit
Workshop creativiteit
 
Drude Lorentz circuit Gonano Zich
Drude Lorentz circuit Gonano ZichDrude Lorentz circuit Gonano Zich
Drude Lorentz circuit Gonano Zich
 
Photo of the Day from Outdoor Photographer Magazine
Photo of the Day from Outdoor Photographer MagazinePhoto of the Day from Outdoor Photographer Magazine
Photo of the Day from Outdoor Photographer Magazine
 
Zaragoza Turismo 39
Zaragoza Turismo 39Zaragoza Turismo 39
Zaragoza Turismo 39
 
CORTES DE PELO CORTO 2016 | Tendencias y Ventajas de Cortar tu Cabello a la ...
CORTES DE PELO CORTO 2016 |  Tendencias y Ventajas de Cortar tu Cabello a la ...CORTES DE PELO CORTO 2016 |  Tendencias y Ventajas de Cortar tu Cabello a la ...
CORTES DE PELO CORTO 2016 | Tendencias y Ventajas de Cortar tu Cabello a la ...
 
1 Imprezy i usługi turystyczne
1 Imprezy i usługi turystyczne1 Imprezy i usługi turystyczne
1 Imprezy i usługi turystyczne
 

Similar a An Integrated Hardware-Software Approach to Flexible Transactional Memory

rtosbyshibu-131026100746-phpapp01.pdf
rtosbyshibu-131026100746-phpapp01.pdfrtosbyshibu-131026100746-phpapp01.pdf
rtosbyshibu-131026100746-phpapp01.pdfreemasajin1
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slidessmpant
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance mentoresd
 
SDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkSDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkTim4PreStartup
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormDataWorks Summit
 
What is RTOS Step by Step Guide?
What is RTOS Step by Step Guide?What is RTOS Step by Step Guide?
What is RTOS Step by Step Guide?IntervalZero
 
Real Time OS For Embedded Systems
Real Time OS For Embedded SystemsReal Time OS For Embedded Systems
Real Time OS For Embedded SystemsHimanshu Ghetia
 
Real time Operating System
Real time Operating SystemReal time Operating System
Real time Operating SystemTech_MX
 
Real-Time Operating Systems Real-Time Operating Systems RTOS .ppt
Real-Time Operating Systems Real-Time Operating Systems RTOS .pptReal-Time Operating Systems Real-Time Operating Systems RTOS .ppt
Real-Time Operating Systems Real-Time Operating Systems RTOS .pptlematadese670
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.pptrahul km
 
Lecture10_RealTimeOperatingSystems.pptx
Lecture10_RealTimeOperatingSystems.pptxLecture10_RealTimeOperatingSystems.pptx
Lecture10_RealTimeOperatingSystems.pptxSekharSankuri1
 
4029569 ltsp-practice-guide
4029569 ltsp-practice-guide4029569 ltsp-practice-guide
4029569 ltsp-practice-guidekishorconnectix
 
Rts assighment final
Rts assighment finalRts assighment final
Rts assighment finalsayanpandit
 
200923 01en
200923 01en200923 01en
200923 01enopenrtm
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 

Similar a An Integrated Hardware-Software Approach to Flexible Transactional Memory (20)

rtosbyshibu-131026100746-phpapp01.pdf
rtosbyshibu-131026100746-phpapp01.pdfrtosbyshibu-131026100746-phpapp01.pdf
rtosbyshibu-131026100746-phpapp01.pdf
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
 
Intro to Embedded OS, RTOS and Communication Protocols
Intro to Embedded OS, RTOS and Communication ProtocolsIntro to Embedded OS, RTOS and Communication Protocols
Intro to Embedded OS, RTOS and Communication Protocols
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance
 
SDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkSDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual Network
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache Storm
 
What is RTOS Step by Step Guide?
What is RTOS Step by Step Guide?What is RTOS Step by Step Guide?
What is RTOS Step by Step Guide?
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Real Time OS For Embedded Systems
Real Time OS For Embedded SystemsReal Time OS For Embedded Systems
Real Time OS For Embedded Systems
 
Embedded os
Embedded osEmbedded os
Embedded os
 
Real time Operating System
Real time Operating SystemReal time Operating System
Real time Operating System
 
Real-Time Operating Systems Real-Time Operating Systems RTOS .ppt
Real-Time Operating Systems Real-Time Operating Systems RTOS .pptReal-Time Operating Systems Real-Time Operating Systems RTOS .ppt
Real-Time Operating Systems Real-Time Operating Systems RTOS .ppt
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.ppt
 
Lecture10_RealTimeOperatingSystems.pptx
Lecture10_RealTimeOperatingSystems.pptxLecture10_RealTimeOperatingSystems.pptx
Lecture10_RealTimeOperatingSystems.pptx
 
4029569 ltsp-practice-guide
4029569 ltsp-practice-guide4029569 ltsp-practice-guide
4029569 ltsp-practice-guide
 
Rts assighment final
Rts assighment finalRts assighment final
Rts assighment final
 
200923 01en
200923 01en200923 01en
200923 01en
 
Rtos 2
Rtos 2Rtos 2
Rtos 2
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Presentation
PresentationPresentation
Presentation
 

An Integrated Hardware-Software Approach to Flexible Transactional Memory

  • 1. An Integrated Hardware-SoftwareAn Integrated Hardware-Software Approach to Flexible TransactionalApproach to Flexible Transactional MemoryMemory Arrvindh Shriraman, Michael F. Spear, Hemayet Hossain, Virendra J. Marathe, Sandhya Dwarkadas, and Michael L. Scott www.cs.rochester.edu/research/synchronization
  • 2. 01/29/15 An Integrated Hardware-Software Approach to Flexible 2 Transactional Memory Implementation • Hardware Transactional Memory (HTM) + library compatible, fast if no pathologies - rigid policy, virtualization support expensive, no migration path • Software Transactional Memory (STM) + flexible policy (conflict ,escape actions), hardware compatibility - slow (always ?), library compatibility hard • Best-effort TMs + simplifies future hardware, runs on current hardware - rigid policy, hardware inflexible, performance cliffs e.g., TCC, UTM, LogTM, VTM, PTM, BulkTM e.g., RSTM, DSTM, McRT, TL2, SXM e.g., HyTM, Intel Hybrid TM
  • 3. 01/29/15 An Integrated Hardware-Software Approach to Flexible 3 Our Approach Hardware-Software Transactions – hardware to accelerate STMs and support your favorite policy – hardware that supports flexible software implementation – software routines to support uncommon events (i.e., overflows, context switches, paging) + flexible policy, supports today’s hardware, accelerates STMs, multiple uses for acceleration hardware - slower than HTMs, library compatibility (compiler support?) e.g., RTM (this talk), AOU_N (yesterday at SPAA 2007)
  • 4. 01/29/15 An Integrated Hardware-Software Approach to Flexible 4 TAG Data Data Structures in TM R W HTM cache entry STM organization Data Meta Data Conflict resolution Version management DataA TAG Alert-On-Update for conflict detection Meta Data TAGR W Programmable-Data-Isolation for data versioning Flexible Transactional Memory Conflict resolution Version management &
  • 5. 01/29/15 An Integrated Hardware-Software Approach to Flexible 5 Why ? • Decoupled conflict detection and version management for flexible policy and usage • Conflict detection – Eager, at first read/write to a shared data – Lazy, prior to commit of speculative updates – Mixed, eager write-write and lazy read-write – and more..... • Flexible software contention managers – arbitrate among conflicting transactions
  • 6. 01/29/15 An Integrated Hardware-Software Approach to Flexible 6 For workload description, please see the paper 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Hash RBTree-Large LinkedList- Release LFUCache RandomGraph NormalizedExecutionTime Abort Copy Validation CM Bookkeeping MM App Non-Tx App Tx STM Overheads 21% 43% 42% 34% Overheads targeted Runtime SW RBTree RSTM [TRANSACT ’06] Copying : Buffering of speculative modifications to ensure isolation Validation: Verifying consistency of accessed locations 79%
  • 7. 01/29/15 An Integrated Hardware-Software Approach to Flexible 7 Flexible Transactional Memory • Leave policy decisions in software – multiple-writer coherence for data isolation at software’s behest – HW provides conflict detection, SW specifies resolution policy • Minimize the validation overhead – Alert-on-update provides fast event based communication of remote memory operations • Eliminate copying overhead – Programmable data isolation allows software to employ private caches as thread local buffers • Use software mechanisms to accommodate virtualization (i.e., cache overflows, paging, thread switches)
  • 8. 01/29/15 An Integrated Hardware-Software Approach to Flexible 8 Alert-On-Update (AOU) • ISA includes an instruction, ALoad, that loads an address and marks the cache line • A-tagged line on invalidation – jumps to a software handler – masks further alerts until exit from alert handler • Alerts can be due to – capacity, cache cannot track update events on evicted line – coherence, remote processor has acquired exclusive access Caveat: AOU support cannot extend across events that exhaust space and timeAdvantages: general, lightweight, simple, and fine-grained DataA TAG Cache Entry
  • 9. 01/29/15 An Integrated Hardware-Software Approach to Flexible 9 • ISA provides TStore and TLoad to isolate data in cache line • TMI buffers/isolates TStores – supports concurrent speculative writers; BusTRdX ignored – supports concurrent readers; BusRd threatened and data response suppressed • TI isolates concurrent readers from speculative writers – values written by other TStores are isolated; – a threatened read results in dropping to TI Programmable Data Isolation (PDI)
  • 10. 01/29/15 An Integrated Hardware-Software Approach to Flexible 10 For details on coherence protocol and tag encoding, please see TR 910 Programmable Data Isolation (PDI) • TI lines isolate concurrent readers from speculative writers – are dropped without alerting processor – allow caching; drop to I on revert or commit • TStored (TMI) lines buffer speculative stores – must remain in cache or HW alerts active thread – drop to M on commit, I on revert • Support R-W and W-W concurrent sharers (if SW wants) • no global consensus in HW required for committing – commit is entirely local; SW responsible for correctness
  • 11. Putting things together
      • Decoupled hardware for
        – version management (PDI) and conflict detection (AOU)
        – accelerating common TM operations
      • Many feasible software libraries to
        – implement and export transaction constructs
        – handle time and space exhaustion
        – control runtime policy
      • RTM is an object-level, indirection-based TM.
  • 12. RTM Data Structure
      • Runtime SW associates a metadata header with every object. An object can denote a semantic entity or a group of memory locations.
      [Figure: metadata per object. Labels: Owner; Status (Transaction Descriptor); Serial #; Current Data (committed; if versioning in SW); New Data (uncommitted); Overflow Readers (reader bitmap to track transactions not using HW support); Conflict detection; Data Versioning; N cache lines]
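The per-object metadata can be sketched as two small records. This is an illustration only: the field names follow the figure labels, but the status strings, the `logical_value` helper, and the rule that a committed owner makes the new data the valid version are assumptions modeled on indirection-based STMs, not a description of the RTM source.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TxDescriptor:
    # Assumed status names: "ACTIVE", "COMMITTED", "ABORTED".
    status: str = "ACTIVE"


@dataclass
class ObjectHeader:
    owner: Optional[TxDescriptor] = None  # transaction that acquired the object
    serial: int = 0                       # serial number from the figure
    current: Any = None                   # committed version
    new: Any = None                       # owner's uncommitted version
    overflow_readers: int = 0             # bitmap of readers not using HW support


def logical_value(header):
    """Return the version a reader should treat as valid (assumed rule)."""
    if header.owner is not None and header.owner.status == "COMMITTED":
        return header.new
    return header.current
```

With this layout a single status change in the descriptor flips every object the transaction owns from its old version to its new one.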
  • 13. FastPath Transactions (Validation + Copying)
      • Do not overflow time or space resources
      • ALoad descriptor to detect concurrent active transactions
      • ALoad object header to detect ownership changes
      • TStore updates are isolated in private cache
      Program: Begin_hw_t abort_pc; ALD TxD_2; ALD OH(A); TLD A; TST A; CAS OH(A); CAS-Commit TxD_2
      [Figure: program and data. Labels: OH(A) with Owner, #S, Overflow Readers; TxD_1 (COMMIT); TxD_2 (ACTIVE, then COMMIT via CAS); A (current); AOU; PDI; In Cache]
  • 14. Overflow Transactions
      • ALoad descriptor to detect concurrent active transactions
      • To Read, update overflow-reader list to notify future requestors
      • To Write, copy current version and buffer speculative updates
      Program: Begin_sw_t abort_pc; ALD TxD_2; LD OH(A); ...........; ST A’; CAS OH(A); CAS-Commit TxD_2
      [Figure: program and data. Labels: OH(A) with Owner, #S, Overflow Readers; TxD_1 (COMMIT); TxD_2 (ACTIVE, then COMMIT via CAS); A current; A’ new version; AOU; In Cache]
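The overflow-reader bookkeeping can be pictured as bit operations on the header's reader bitmap. This is a hedged sketch: the function names and the idea of one bit per transaction slot are assumptions for illustration, and a real runtime would update the bitmap atomically.

```python
def add_reader(bitmap, tx_slot):
    """Mark transaction slot tx_slot as a visible reader of the object."""
    return bitmap | (1 << tx_slot)


def remove_reader(bitmap, tx_slot):
    """Clear the slot when the reader commits or aborts."""
    return bitmap & ~(1 << tx_slot)


def readers(bitmap):
    """List the slots a future writer would have to notify."""
    return [slot for slot in range(bitmap.bit_length()) if (bitmap >> slot) & 1]
```

A writer that later acquires the object can walk `readers(bitmap)` to find the overflow transactions it conflicts with, since those readers have no hardware alert to warn them.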
  • 15. TMESI Prototype
      • Processors: 16 (P1 to P16), SPARC v9, 1.2GHz
      • Private I$ and D$: 64KB I&D, 4-way, 2-cycle access, 32 entry VB
      • Shared L2$: 8MB, 8-way, 4 banks, 20-cycle bank delay
      • Snoopy interconnect: 4-ary ordered tree, 1-cycle link delay, 64 bytes/cycle
      • Memory: 100-cycle DRAM access
      • MESI coherence protocol
      The simulation infrastructure is based on the SIMICS + Multifacet GEMS framework.
      Our thanks to the Wisconsin Multifacet group for distributing the GEMS toolset.
  • 16. Runtime Systems
      • CGL (Coarse Grain Lock)
      • RTM-F(astpath) - Validation, Copying
      • RTM-O(verflow) - Validation, Copying
      • RTM-Lite* - Validation, Copying
      • RSTM (Invisible + Eager) [Transact’06]
      Benchmarks
      • 33% lookup, 33% insert, 33% delete operations on HashTable (256 buckets), RBTree, RBTree-Large (256-byte entry), LinkedList-Rel, LFUCache (255 queue + 2048 array), RandomGraph
      * For a detailed description of Lite transactions, please see the paper
  • 17. RTM-F Scales
      • RTM-F improves performance and provides good scalability
        - at 2 threads it is 50% slower than CGL1 but at 16 threads it is 1.8X faster
      • RTM-O’s performance is as good as RSTM on a CMP (Avg: 6% variation)
      [Figure: RBTree-Large. Normalized throughput (CGL, 1 thread = 1) vs. threads (1, 2, 4, 8, 16) for CGL, RTM-F, RTM-Lite, RTM-O, RSTM; annotations: 1.9X, 2X, 2X]
  • 18. Hardware accelerates Software
      • RTM-F’s speedup over RTM-Lite is proportional to copying overhead
        - HashTable (5%), LFUCache (14%), RBTree-Large (45%)
      • RTM-Lite presents an attractive HW cost/performance tradeoff
        - 45% slower than RTM-F on our most copy-heavy benchmark
      [Figure: normalized throughput at 16 threads (CGL, 1 thread = 1) for RTM-F, RTM-Lite, RTM-O, RSTM on Hash, RBTree, RBTree-Large, LinkedList-Rel, and LFUCache; annotations: 1.5X, 1.7X, 1.8X, 1.7X, 1.6X]
  • 19. Conflict Policy Important!
      [Figure: normalized throughput vs. threads (1, 2, 4, 8, 16) for Eager and Lazy; panels: RandomGraph (annotated "Livelock") and Hash]
  • 20. Conflict Policy Important!
      • In applications with low degree of sharing
        – Eager as good as lazy
        – Lazy imposes higher bookkeeping overheads
        e.g., HashTable (Eager is 21% faster) and RBTree (Eager is 10% slower)
      • In applications with high degree of sharing
        – Lazy eliminates livelock anomalies
        – Lazy exploits R-W and W-W sharing
        – Lazy narrows conflict window to attain more commits
        e.g., LFUCache (Lazy is 28% faster) and RandomGraph (Lazy eliminates livelocks)
  • 21. To Take Home
      • Decouple hardware for versioning and conflict detection to enable
        – flexible software TM policy and
        – non-TM uses
      • Flexible conflict detection and management to eliminate performance anomalies
      • Use software to handle the uncommon cases
  • 22. Questions
      Download RSTM version 3.0 at http://www.cs.rochester.edu/research/synchronization/
      Arrvindh, Mike, Sandhya, Virendra, Hemayet, Michael
  • 23. Backup
  • 24. Future Work
      • How to enable flexible usage of hardware?
        – semantics, concurrent use, programmer interface
      • Simplify metadata organization
      • Extend to scalable protocols and compare with pure HTM system
      • Strong Isolation and Privatization
  • 25. RTM Interface
      Z = X + Y ≡
        BEGIN_TX (handler_ptr, mode [H/S])
        const integer* rd_X = X->open_RO()
        const integer* rd_Y = Y->open_RO()
        integer* wr_Z = Z->open_RW()
        *wr_Z = (*rd_X) + (*rd_Y)
        END_TX
      1. Start transaction in (Fastpath/Overflow) mode and save abort-handler PC
      2. Open object metadata before reading/writing object data
      3. Read and speculatively update objects
      4. Acquire ownership of written objects in their metadata at either
         - open (i.e., eager): + reduces wasted work; - possible livelock, reduced concurrency (not even R-W sharing)
         - end_tx (i.e., lazy): + increased concurrency, livelock freedom; - more wasted work, requires lazy versioning
      5. If Active, switch status to committed.
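Step 5 is a compare-and-swap on the transaction's status word: the transaction commits only if nobody aborted it first. The sketch below models that race in software. `TxDescriptor`, `cas_status`, `try_commit`, and `try_abort` are hypothetical names, the "ABORTED" state is an assumption, and the lock merely stands in for the atomicity a hardware CAS instruction would provide.

```python
import threading


class TxDescriptor:
    """Transaction descriptor whose status changes only through CAS."""

    def __init__(self):
        self.status = "ACTIVE"
        self._lock = threading.Lock()  # stand-in for hardware CAS atomicity

    def cas_status(self, expected, new):
        """Set status to new only if it currently equals expected."""
        with self._lock:
            if self.status != expected:
                return False
            self.status = new
            return True


def try_commit(tx):
    """CAS-Commit: succeeds only while the transaction is still active."""
    return tx.cas_status("ACTIVE", "COMMITTED")


def try_abort(tx):
    """A conflicting transaction aborts tx, but only if it has not committed."""
    return tx.cas_status("ACTIVE", "ABORTED")
```

Whichever of `try_commit` and `try_abort` runs first wins and the other returns `False`, so a transaction is never both committed and aborted regardless of whether ownership was acquired eagerly or lazily.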
  • 26. Protocol Animation
      Cache-line-size objects: A, B. Object metadata: OH(A), OH(B)
      [Figure: P0, P1, P2 with private L1s over a shared L2, running T0, T1, T2; numbered steps 1 to 5 with operations TLoad A, TStore B, TStore A, TLoad A, TLoad B and a TGetX request; line states shown: AE: OH(A), TEE: A, AE: OH(B), TMI: B, AS: OH(A), TMI: A, TII: A, AS: OH(B), TII: B]
  • 27. Protocol Animation
      Cache-line-size objects: A, B. Object metadata: OH(A), OH(B)
      [Figure: P0, P1, P2 with private L1s over a shared L2, running T0, T1, T2; numbered steps 1 to 7 with operations TLoad A, TStore B, TStore A, TLoad A, TLoad B, Acquire OH(A), CAS-Commit, CAS-Commit, a GetX request, Abort, and Commit; line states shown include AS: OH(A), AS: OH(B), TMI: A, TMI: B, TII: A, TII: B, then S: OH(A), S: OH(B), I: A, I: B, I: OH(A), M: A, M: OH(A)]
  • 28. Lite Transaction (Validation)
      • To read
        – ALoad object header to detect object ownership acquisition
      • To write
        – ALoad descriptor to detect concurrent transactions stealing ownership
        – Clone object and buffer modifications
        – Acquire ownership and pointers to perform logical update
  • 30.
      • What is the serial number for?
      • How do A-tags differ from Intel HASTM?
      • Privatization
      • 2X is not enough; why are you slow?
      • What about strong isolation?
      • What about 2 modified lines?

Editor's notes

  1. kmp
  2. Why keep policy in SW
  3. What is invisible