3. Disksim
Disksim: an open-source disk simulator originally developed at
the University of Michigan and enhanced at CMU.
4. Disksim features
Various device models, including: disk, simpledisk,
memsmodel
Controller models: simple, smart (with cache)
Trace synthesis and support for different trace file formats
DIXtrac: automatic disk characterization
5. ssdmodel
Developed by Microsoft.
NOT for any specific SSD device
For an idealized SSD that is parameterized by the
properties of NAND flash chips
Cache is NOT natively supported
6. Source Dir
src/        Disksim core source (disksim_*.c/h)
ssdmodel/   SSD extension source (ssd_*.c/h)
diskmodel/  disk layout and mechanical model
memsmodel/  MEMS device model
libparam/   parameter-processing library
...
8. Disksim source: src/
disksim_main*        main entry point    main()
disksim_iodriver*    I/O driver          iodriver_send_event_down_path()
disksim_bus*         bus                 bus_deliver_event()
disksim_controller*  controller          controller_event_arrive()
disksim_diskctlr*    disk controller     disk_event_arrive()
...
9. Disksim Control Path
Event-based system:
various types of events: I/O, interrupt, timer, ...
all events are stored in a global queue in time order
addtointq() and removefromintq() are used to access the
global queue
Equivalent code:
while ((curr = getnextevent())) {
    switch (curr->type) {
    case IO_REQUEST_ARRIVE:
        iodriver_request(curr);
        break;
    }
}
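The loop above can be fleshed out as a minimal, compilable sketch. The queue here is a time-sorted linked list; the names ev_insert() and ev_next() are simplified stand-ins for DiskSim's addtointq() and the dequeue side of its internal queue, not the actual source.

```c
/* Minimal sketch of a DiskSim-style time-ordered event queue.
 * Simplified names; not DiskSim's actual data structures. */
#include <assert.h>
#include <stddef.h>

enum { IO_REQUEST_ARRIVE, TIMER_EXPIRED };

typedef struct event {
    double time;          /* simulated time at which the event fires */
    int type;
    struct event *next;
} event;

static event *intq = NULL;   /* global queue, kept sorted by time */

/* Insert, keeping the queue sorted by time (cf. addtointq()). */
static void ev_insert(event *e) {
    event **p = &intq;
    while (*p && (*p)->time <= e->time)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

/* Pop the earliest pending event (cf. removefromintq()). */
static event *ev_next(void) {
    event *e = intq;
    if (e)
        intq = e->next;
    return e;
}
```

Events inserted out of order come back in timestamp order, which is exactly the property the main simulation loop relies on.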
10. Example
src/disksim_iosim.c io_internal_event()
case IO_ACCESS_ARRIVE:
iodriver_schedule(0, curr);
break;
src/disksim_iodriver.c iodriver_schedule()
iodriver_send_event_down_path(curr);
src/disksim_iodriver.c iodriver_send_event_down_path()
bus_deliver_event(busno.byte[0], slotno.byte[0], curr);
11. Example cont.
src/disksim_bus.c bus_deliver_event()
case CONTROLLER:
controller_event_arrive(devno, curr);
break;
case DEVICE:
ASSERT(devno == curr->devno);
device_event_arrive(curr);
break;
This control flow simulates the delivery of a single event down the I/O path.
12. Disksim & Device Interface
INLINE void device_event_arrive (ioreq_event *curr)
{
    ASSERT1 ((curr->devno >= 0) && (curr->devno < numdevices),
             "curr->devno", curr->devno);
    return disksim->deviceinfo->devices[curr->devno]->event_arrive(curr);
}
Function pointer! By dynamic tracing with gdb, we found that:
For a disk, it jumps to disk_event_arrive()
For an ssd, it jumps to ssd_event_arrive()
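The dispatch mechanism can be reduced to a small self-contained sketch: each registered device carries its own event_arrive handler, so one call site reaches either the disk or the ssd code. Structure layouts and the last_handler flag are illustrative, not DiskSim's exact definitions.

```c
/* Sketch of function-pointer dispatch as in device_event_arrive().
 * Illustrative structures; not DiskSim's exact layout. */
#include <assert.h>

typedef struct ioreq_event { int devno; int type; } ioreq_event;

typedef struct device {
    void (*event_arrive)(ioreq_event *curr);   /* per-device handler */
} device;

static int last_handler;  /* 1 = disk, 2 = ssd (for demonstration only) */

static void disk_event_arrive(ioreq_event *curr) { (void)curr; last_handler = 1; }
static void ssd_event_arrive(ioreq_event *curr)  { (void)curr; last_handler = 2; }

static device devices[2] = {
    { disk_event_arrive },   /* devno 0: a disk */
    { ssd_event_arrive },    /* devno 1: an SSD */
};

/* Mirrors device_event_arrive(): bounds-check, then dispatch on devno. */
static void device_event_arrive(ioreq_event *curr) {
    assert(curr->devno >= 0 && curr->devno < 2);
    devices[curr->devno].event_arrive(curr);
}
```

This is why the same bus_deliver_event() code path can drive disks and SSDs without knowing which is which.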
13. event_arrive: disk vs. ssd
disk_event_arrive():
    case IO_ACCESS_ARRIVE:                 disk_request_arrive(curr);
    case DEVICE_OVERHEAD_COMPLETE:         disk_request_arrive(curr);
    case DEVICE_BUFFER_SEEKDONE:           disk_buffer_seekdone(currdisk, curr);
    case DEVICE_BUFFER_SECTOR_DONE:        disk_buffer_sector_done(currdisk, curr);
    case DEVICE_GOTO_REMAPPED_SECTOR:      disk_goto_remapped_sector(currdisk, curr);
    case DEVICE_GOT_REMAPPED_SECTOR:       disk_got_remapped_sector(currdisk, curr);
    case DEVICE_PREPARE_FOR_DATA_TRANSFER: disk_prepare_for_data_transfer(curr);
    case DEVICE_DATA_TRANSFER_COMPLETE:    disk_reconnection_or_transfer_complete(curr);
    case IO_INTERRUPT_COMPLETE:            disk_interrupt_complete(curr);
ssd_event_arrive():
    case DEVICE_OVERHEAD_COMPLETE:         ssd_request_arrive(curr);
    case DEVICE_ACCESS_COMPLETE:           ssd_access_complete(curr);
    case DEVICE_DATA_TRANSFER_COMPLETE:    ssd_bustransfer_complete(curr);
    case IO_INTERRUPT_COMPLETE:            ssd_interrupt_complete(curr);
    case SSD_CLEAN_GANG:                   ssd_clean_gang_complete(curr);
    case SSD_CLEAN_ELEMENT:                ssd_clean_element_complete(curr);
"buffer" events are cache-related; "clean" covers garbage collection and wear-leveling.
"remapped sector" seems to relate to data layout (not sure); "gang" and "element" name the
allocation and reclaim units.
15. ssdmodel features
Adds an auxiliary level of parallel elements, each with a
closed queue, to represent flash elements or gangs
Adds logic to serialize request completions from these
parallel elements
For each element, maintains data structures representing the
SSD logical block map, cleaning state, and wear-leveling
state
Delays are introduced as each request is processed
Parameters include background cleaning, gang size, gang
organization, interleaving, and overprovisioning
17. Flash Chip Performance
1. Latency 4. Bandwidth and Interleave
bus<->data reg 100us
media->reg: read 25us src plane -> dest plane 4 page copying
(100us per page)
reg->media: write 200us
erease 1.5ms
2. Two-plane commands
can be executed on their
plane pairs 0&1 or 2&3
3. Support background copy
on the same plane
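For intuition, the timings above combine into a tiny latency model: a page read must move data media -> reg and then reg -> bus, a write the reverse. This is a back-of-the-envelope sketch using the slide's numbers, not ssdmodel's actual timing code.

```c
/* Back-of-the-envelope flash latency model; values (in microseconds)
 * taken from the slide. A sketch for intuition only. */
#include <assert.h>

#define T_BUS    100.0   /* bus <-> data register transfer, per page */
#define T_READ    25.0   /* media -> register (page read sense)      */
#define T_WRITE  200.0   /* register -> media (page program)         */
#define T_ERASE 1500.0   /* block erase                              */
#define T_COPY   100.0   /* per-page plane-to-plane copy             */

/* End-to-end page read: sense into the register, then ship over the bus. */
static double page_read_us(void)  { return T_READ + T_BUS; }

/* End-to-end page write: bus transfer into the register, then program. */
static double page_write_us(void) { return T_BUS + T_WRITE; }

/* Internal multi-page copy, no bus involvement. */
static double copy_us(int pages)  { return pages * T_COPY; }
```

The asymmetry (125us read vs. 300us write, plus 1.5ms erases) is what makes cleaning policy and write buffering matter so much for SSD performance.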
18. SSD Simulation
Logical Block Map
allocation pool
Cleaning
greedy or wear-leveling aware
Parallelism and Interconnect Density
ganging, interleaving, background cleaning
Persistence
saving mapping information per block in DRAM
19. Interconnection - Ganging
A gang of flash packages
can be utilized in synchrony
to optimized a multi-page
request.
Allow multiple packages to
be used in parallel while
sharing one request queue
A request queue can be
associated to each gang or
to each element (full
interconnection mode)
20. Logical Block Map
Use allocation pool to think about how an SSD allocates
flash blocks to service write requests
An allocation pool an be a flash package or a gang
Static: a portion of each LBA constitutes a fixed mapping to
a specific allocation pool
Dynamic: the non-static portion of a LBA is the lookup key
for a mapping within a pool
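The static/dynamic split above can be sketched as a two-level lookup: low bits of the LBA statically select a pool, the remaining bits key the pool's dynamic logical-to-physical map. The pool count, sizes, and flat-array map here are illustrative assumptions, not ssdmodel's data structures.

```c
/* Sketch of a two-level LBA lookup: static pool selection plus a
 * dynamic per-pool map. Sizes and layout are illustrative assumptions. */
#include <assert.h>

#define NPOOLS          4      /* e.g. one pool per flash package */
#define PAGES_PER_POOL  1024

static int l2p[NPOOLS][PAGES_PER_POOL];  /* dynamic logical-to-physical map */

/* Static portion: which allocation pool serves this LBA. */
static int pool_of(unsigned lba) { return (int)(lba % NPOOLS); }

/* Dynamic portion: the lookup key within the pool's map. */
static unsigned pool_key(unsigned lba) { return lba / NPOOLS; }

/* On a write, the pool records wherever it chose to place the data. */
static void map_write(unsigned lba, int phys_page) {
    l2p[pool_of(lba)][pool_key(lba)] = phys_page;
}

/* On a read, the same two-level lookup finds the current location. */
static int map_lookup(unsigned lba) {
    return l2p[pool_of(lba)][pool_key(lba)];
}
```

Because the static part is a pure function of the LBA, requests spread across pools deterministically, while each pool stays free to remap its own pages on every write.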
21. Garbage Collection (Cleaning)
active block: block available to holding incoming writes in a
pool
superseded page: out-of-date page
cleaning efficiency: (superseded / total pages) in a block
a pure greedy approach: choosing blocks to clean based on
potential cleaning efficiency
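The greedy policy above reduces to picking the block with the most superseded pages (with a fixed block size, that maximizes superseded/total). A minimal sketch, with an illustrative block record rather than ssdmodel's actual one:

```c
/* Greedy cleaning sketch: select the block with the highest cleaning
 * efficiency. Block record is illustrative, not ssdmodel's. */
#include <assert.h>

#define PAGES_PER_BLOCK 64

typedef struct blk { int superseded; } blk;   /* out-of-date page count */

/* Return the index of the best block to clean, or -1 if no block has
 * any superseded pages (nothing would be reclaimed by cleaning). */
static int pick_greedy(const blk *blocks, int n) {
    int best = -1, best_sup = 0;
    for (int i = 0; i < n; i++) {
        /* efficiency = superseded / PAGES_PER_BLOCK; the denominator is
         * constant, so comparing raw counts gives the same winner. */
        if (blocks[i].superseded > best_sup) {
            best = i;
            best_sup = blocks[i].superseded;
        }
    }
    return best;
}
```

High cleaning efficiency means few live pages to copy out before the erase, so each erase reclaims the most space for the least write amplification.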
22. Wear-Leveling
average remaining lifetime(ARL) of a block
age variance (say 20%) of the ARL
retirement age (say 85%) of the ARL
Wear-aware garbage collection:
1. If ARL < retirement, migrate cold data into this block from a
migration-candidate queue, and recycle the head block of
the queue. Populate the queue with new blocks with cold
data.
Otherwise, if ARL<age variance, then restrict recycling of
the block with a probability that increases linearly as the
remaining lifetime drops to 0. (80% of average ~ Prob of
recycle = 1; 0% of average ~ 0)
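The linear throttle in step 2 can be written down directly. A sketch under the slide's numbers: at or above 80% of the ARL a block is always eligible for recycling, at 0% never, with a linear ramp in between.

```c
/* Sketch of the linear recycle-probability throttle from the slide:
 * eligibility ramps from 0 at 0% of the average remaining lifetime
 * up to 1 at 80% of it. The 0.8 threshold follows the slide's numbers. */
#include <assert.h>

static double recycle_prob(double remaining, double average) {
    double frac = remaining / average;  /* block's remaining lifetime as a
                                           fraction of the pool average */
    if (frac >= 0.8) return 1.0;        /* young enough: never throttled */
    if (frac <= 0.0) return 0.0;        /* worn out: never recycled      */
    return frac / 0.8;                  /* linear ramp in between        */
}
```

The effect is that worn blocks are recycled (and thus erased) less and less often, pulling every block's wear back toward the pool average.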
23. Source: ssdmodel/
ssdmodel is very simple; all C files are listed below:
ssd.c         main                                         ssd_event_arrive()
ssd_clean.c   garbage collection and wear-leveling         ssd_clean_blocks_greedy()
ssd_gang.c    several flash packages organized as a gang   ssd_activate_gang()
ssd_timing.c  timing model                                 ssd_compute_access_time()
ssd_utils.c   utilities
ssd_init.c    initialization
24. Example
event sequence for one request:
ssd_request_arrive -> ssd_interrupt_complete (reconnect) -> ssd_bustransfer_complete ->
ssd_access_complete -> ssd_interrupt_complete (completion)
ssd_bustransfer_complete() -> ssd_media_access_request();
ssdmodel/ssd.c: ssd_media_access_request ()
case SSD_ALLOC_POOL_PLANE:
case SSD_ALLOC_POOL_CHIP:
ssd_media_access_request_element(curr);
break;
case SSD_ALLOC_POOL_GANG:
#if SYNC_GANG
ssd_media_access_request_gang_sync(curr);
#else
ssd_media_access_request_gang(curr);
#endif
break;
25. Example cont.
ssd_media_access_request_element()
-> ssd_activate_element()
    -> ssd_invoke_element_cleaning()
    -> ssd_compute_access_time(currdisk, elem_num,
       read_reqs, read_total);
        -> add completion event into the global event queue
    -> ssd_compute_access_time(currdisk, elem_num,
       write_reqs, write_total);
        -> add completion event into the global event queue
Parallel processing with sequential completion is achieved by handling a batch of
requests in parallel while generating their ACCESS_COMPLETE events sequentially.
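That "parallel access, serial completion" pattern can be sketched as follows: each element finishes its media access at its own time, but the completion events are scheduled one after another starting from the slowest finisher. The fixed gap parameter and flat arrays are simplifying assumptions, not ssd_compute_access_time()'s actual bookkeeping.

```c
/* Sketch of "process in parallel, complete sequentially": completions
 * are serialized after the slowest element. Simplified assumptions;
 * not ssdmodel's actual timing bookkeeping. */
#include <assert.h>

/* access[i]  : time at which element i finishes its media access
 * complete[i]: scheduled time of element i's ACCESS_COMPLETE event
 * gap        : assumed fixed spacing between successive completions */
static void serialize_completions(const double *access, double *complete,
                                  int n, double gap) {
    double latest = 0.0;
    for (int i = 0; i < n; i++)          /* find the slowest element */
        if (access[i] > latest)
            latest = access[i];
    for (int i = 0; i < n; i++)          /* emit completions one at a time */
        complete[i] = latest + i * gap;
}
```

The media work overlaps fully across elements, yet the simulator's upper layers only ever see one completion event at a time.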
26. References
Disksim: http://www.pdl.cmu.edu/DiskSim/
Disksim manual: http://www.pdl.cmu.edu/PDL-FTP/DriveChar/CMU-PDL-08-101.pdf
Disksim implementation doc: src/doc/Outline.txt
SSD Extension: http://research.microsoft.com/en-us/downloads/b41019e2-1d2b-44d8-b512-ba35ab814cd4/
SSD Extension paper: "Design Tradeoffs for SSD Performance", N. Agrawal et al., 2008
Cache over SSD project: Group 6 on http://www-users.cselabs.umn.edu/classes/Spring-2009/csci8980-ass/