ioremap & mmap in Linux
Taichien Chang
Outline
• How to access Physical Address?
• Why ioremap? & ioremap func.
• Flow of I/O Memory Map Access
• Why MMAP?
• MMAP syscall & MMAP func.
• MMAP flags: MAP_SHARED, MAP_PRIVATE, MAP_LOCKED
• Flow of implementing mmap
• remap_pfn_range func.
• Implementing the mmap file operation
How to access Physical Address?
1. Drivers use virtual addresses.
2. Hardware uses physical addresses (registers, RAM).
3. Virtual memory doesn't store anything; it simply maps a program's address space onto the underlying physical memory: "Virtual memory, NOT physical RAM".
[Figure: 32-bit address space. User space occupies 0 to 3G (0xc0000000) and kernel space 3G to 4G; the MMU translates virtual addresses to physical RAM and I/O memory. In the direct mapping area, phys_to_virt() maps physical 0x10200000 to virtual 0xd0200000, and __pa() performs the reverse translation.]
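Address Translation func.
In the direct mapping area, translation is a fixed offset: virtual address = physical address + PAGE_OFFSET.
PAGE_OFFSET = 0xC0000000 (32-bit x86)
PAGE_OFFSET = 0x80000000 (MIPS cached address, KSEG0)
PAGE_OFFSET = 0xA0000000 (MIPS uncached address, KSEG1)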
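The fixed-offset translation can be modelled in user space with plain 32-bit arithmetic. The sketch below is only an illustration: the names demo_phys_to_virt(), demo_virt_to_phys(), and DEMO_PAGE_OFFSET are hypothetical stand-ins for the kernel's phys_to_virt(), virt_to_phys(), and PAGE_OFFSET, and it assumes the 32-bit x86 value of 0xC0000000.

```c
#include <stdint.h>

/* Assumed 32-bit x86 split: the kernel direct mapping starts at 3 GiB. */
#define DEMO_PAGE_OFFSET 0xC0000000u

/* Models phys_to_virt() / __va(): physical -> kernel virtual address. */
static inline uint32_t demo_phys_to_virt(uint32_t phys)
{
    return phys + DEMO_PAGE_OFFSET;   /* wraps modulo 2^32 */
}

/* Models virt_to_phys() / __pa(): kernel virtual -> physical address. */
static inline uint32_t demo_virt_to_phys(uint32_t virt)
{
    return virt - DEMO_PAGE_OFFSET;
}
```

With these helpers, physical 0x10200000 becomes virtual 0xd0200000. A physical address of 0x40200000 wraps around to 0x00200000, which is not a kernel address at all; addresses like that cannot be reached through the direct mapping.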
Why ioremap?
1. Physical memory or I/O addresses may lie beyond what the direct mapping can reach inside the 32-bit virtual address space (up to 0xffffffff).
2. How do we access these extra physical addresses? The fixed-offset translation wraps around: phys_to_virt(0x40200000) = 0x00200000 ????
3. Use __pa(high_memory)? It marks the top of the directly mapped physical memory: 0x377fe000 ≒ 896MB. Anything above it needs ioremap().
[Figure: "Using I/O Memory Mapping". ioremap() maps I/O memory at physical 0x40200000 to kernel virtual 0xf8044000, inside the region reserved for MMIO (128MB on x86) near the top of the 4G address space, below 0xffffffff.]
ioremap func.
#include <linux/io.h>   /* older kernels: <asm/io.h> */
void __iomem *ioremap(phys_addr_t phys_addr, size_t size);
void __iomem *ioremap_nocache(phys_addr_t phys_addr, size_t size);   /* removed in recent kernels (5.6 and later) */
void iounmap(volatile void __iomem *virt_addr);
(The exact parameter types vary with kernel version and architecture.)
Do not dereference the address returned by ioremap() as if it were an ordinary pointer to virtual memory. Why?
• The accessor functions below are the portable way to reach H/W registers, and they "guarantee read/write ordering":
readb(addr)
readw(addr)
readl(addr)
writeb(val, addr)
writew(val, addr)
writel(val, addr)
memcpy_fromio(buffer, addr, len);
memcpy_toio(addr, buffer, len);
memset_io(addr, val, len);
Flow of I/O Memory Map Access
#include <linux/ioport.h>
Use
request_mem_region(unsigned long start, unsigned long len, const char *name);
to reserve the region [start, start+len-1] in "iomem_resource" and keep other drivers from using it. It returns NULL if the region is already taken.
• All I/O memory allocations are listed in /proc/iomem.
Driver open:
    request_mem_region(phy_addr, len, "NAME")
    virt_addr = ioremap(phy_addr, len)
Access:
    readb/readw/readl(virt_addr)
    writeb/writew/writel(val, virt_addr)
Driver release:
    iounmap(virt_addr)
    release_mem_region(phy_addr, len)
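Memory Mapping between kernel & User space
Q: How can an AP (application program) directly access a physical address (RAM or registers)?
A: The kernel provides a system call: "mmap".
[Figure: mmap() maps a user-space virtual range (below 3G, 0xc0000000) through the MMU onto physical RAM, e.g. 0x10200000. The RAM can be (1) reserved memory or (2) dynamic memory created with kmalloc(), marked with SetPageReserved(), and converted to a physical address with virt_to_phys().]
Note: calling virt_to_phys() on a kernel virtual address outside the direct mapping (for example one returned by vmalloc() or ioremap()) is meaningless.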
Read File from Disk (1) – Using "read()"
1. The AP allocates an 8KB buffer in user space and executes the read() file operation.
2. The kernel finds and allocates 2 free pages and initiates I/O requests for 8KB.
3. The driver sends a SCSI command to read 16 sectors (8KB) and copies the data into the allocated pages.
4. The kernel copies the requested 8KB from the page cache to the user buffer.
[Figure: fd = open("file"); read(8192 bytes). The kernel finds 2 free pages in RAM, fills them from the hard disk at the file offset (512 bytes x 16), keeps them in the page cache, and copies 2 pages = 8192 bytes into the user-space buffer.]
Read File from Disk (2) – Using "mmap()"
1. The AP calls the mmap() syscall to map the file with length = 8KB.
2. The kernel finds and allocates 2 free pages and initiates I/O requests for 8KB.
3. The driver sends a SCSI command to read 16 sectors (8KB) and copies the data into the allocated pages.
4. The AP accesses the file directly through the page-cache pages, without allocating a second buffer.
[Figure: fd = open("file"); mmap(2 pages = 8192 bytes). The kernel finds 2 free pages in RAM, fills them from the hard disk at the file offset (512 bytes x 16), and maps those page-cache pages straight into user space.]
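The two paths above can be compared from user space. The following is a minimal sketch, assuming a Linux/POSIX system with a writable /tmp; demo_read_vs_mmap() is a hypothetical helper, not part of the original slides. It fills a temporary 8KB file, fetches it once with pread() into a user buffer and once through a mapping, and reports whether both views hold the same bytes.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if the read() view and the mmap() view
 * of the same 8 KB file are identical, 0 if they differ, -1 on error.
 */
int demo_read_vs_mmap(void)
{
    enum { LEN = 8192 };
    char path[] = "/tmp/mmap_demo_XXXXXX";
    char pattern[LEN], buf[LEN];
    char *map;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file vanishes once fd is closed */

    for (int i = 0; i < LEN; i++)
        pattern[i] = (char)('A' + i % 26);
    if (write(fd, pattern, LEN) != LEN)
        goto out;

    /* Path 1: the kernel copies from the page cache into our buffer. */
    if (pread(fd, buf, LEN, 0) != LEN)
        goto out;

    /* Path 2: the page-cache pages are mapped; no second buffer needed. */
    map = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    result = (memcmp(buf, map, LEN) == 0);
    munmap(map, LEN);
out:
    close(fd);
    return result;
}
```

Both paths return the same data; the difference is that the read() path needs the extra 8KB user buffer and one additional copy.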
Why MMAP?
• Reduced memory usage: one memory copy is saved.
• Performance gain: read/write file operations and the ioctl syscall rely on copy_from_user()/copy_to_user(), which is costly when copying large amounts of data between kernel space and user space.
• mmap can yield significant performance improvements (about 30%).
MMAP func.
#include <sys/mman.h>
void *mmap(void *start_addr, size_t len, int prot, int flags, int fd, off_t offset);
Returns the starting virtual address of the mapping if OK, MAP_FAILED on error.
start_addr → If NULL, the kernel chooses an available address at which to create the mapping.
len → Length of the mapping in bytes.
prot → Memory protection:
    PROT_EXEC    Pages may be executed.
    PROT_READ    Pages may be read.
    PROT_WRITE   Pages may be written.
    PROT_NONE    Pages may not be accessed.
flags → MAP_SHARED, MAP_PRIVATE, ...
fd → Should be a valid file descriptor.
offset → Should be a multiple of the page size.
[Figure: len bytes of the file referenced by fd, starting at offset, appear in the user virtual address space starting at the return value of mmap() (start_addr if one was given). Example regions are marked PROT_NONE, PROT_READ, and PROT_WRITE.]
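Because offset must be a multiple of the page size, a caller that wants bytes at an arbitrary position has to round the offset down and compensate in the returned pointer. The sketch below shows one way to compute the three values involved; the struct and function names are hypothetical, and page_size is assumed to be a power of two such as the value of sysconf(_SC_PAGESIZE).

```c
#include <stddef.h>
#include <stdint.h>

/* Result of splitting an arbitrary file offset for use with mmap(). */
struct demo_map_window {
    uint64_t aligned_offset; /* pass as mmap()'s offset argument        */
    size_t   delta;          /* add to the pointer returned by mmap()   */
    size_t   map_len;        /* pass as mmap()'s len argument           */
};

/* Hypothetical helper: page_size must be a power of two. */
static inline struct demo_map_window
demo_align_for_mmap(uint64_t offset, size_t len, size_t page_size)
{
    struct demo_map_window w;

    w.aligned_offset = offset & ~((uint64_t)page_size - 1); /* round down */
    w.delta = (size_t)(offset - w.aligned_offset);
    w.map_len = w.delta + len;   /* also cover the bytes before the target */
    return w;
}
```

For example, with 4096-byte pages an offset of 0x10200123 is mapped from 0x10200000, and the data of interest starts 0x123 bytes into the mapping.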
MMAP with MAP_SHARED flag (Shared Mapping)
1. Thanks to virtual memory management, different processes can have mapped pages in common.
2. The mapping is shared with all other processes that map this object.
3. Storing to the region is equivalent to writing to the file → changes are shared.
Ex:
virt_addr2 = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_SHARED, fd, offset);
• msync(virt_addr2, size, MS_SYNC); forces the changes to be flushed to the file. virt_addr2 must be page aligned.
[Figure: fd = open("file"). Process 1 (1) writes 8192 bytes through its mapping; the data lands in the 2 shared page-cache pages in RAM and is written back to the hard disk at the file offset. Process 2 (2) reads the same 2 pages = 8192 bytes through its own mapping. The two mappings are labelled virt_addr1 and virt_addr2.]
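The claim that a store into a shared mapping is a write to the file can be checked with a small user-space sketch. It assumes a Linux/POSIX system with a writable /tmp, and demo_map_shared() is a hypothetical helper name. It stores a string through a MAP_SHARED mapping, calls msync(), and then reads the file back with pread().

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if bytes stored through a MAP_SHARED
 * mapping are visible in the file itself, 0 if not, -1 on error.
 */
int demo_map_shared(void)
{
    char path[] = "/tmp/mmap_shared_XXXXXX";
    const char msg[] = "hello";
    char buf[sizeof msg] = {0};
    long page = sysconf(_SC_PAGESIZE);
    char *map;
    int result = -1;

    int fd = mkstemp(path);       /* opened read/write */
    if (fd < 0)
        return -1;
    unlink(path);

    /* The file must be at least as long as the bytes we touch. */
    if (page <= 0 || ftruncate(fd, page) != 0)
        goto out;

    /* PROT_WRITE with MAP_SHARED needs a descriptor open for writing. */
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    memcpy(map, msg, sizeof msg);             /* store into the mapping */
    if (msync(map, (size_t)page, MS_SYNC) == 0 &&
        pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf)
        result = (memcmp(buf, msg, sizeof msg) == 0);

    munmap(map, (size_t)page);
out:
    close(fd);
    return result;
}
```

The address passed to msync() here is the one returned by mmap(), so it is page aligned by construction.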
MMAP with MAP_PRIVATE flag (Private Mapping)
1. Modifications to the data are not written back to the file.
2. Modifications are not visible to other processes mapping the same file → changes are private.
3. A real-life example: shared libraries (*.so) are loaded by glibc's dynamic linker using private mappings.
Ex:
virt_addr2 = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_PRIVATE, fd, offset);
[Figure: fd = open("file"). One page (4096 bytes) of the file is read from the hard disk into the page cache and mapped by both processes. When Process 1 writes 2048 bytes (half a page), "copy-on-write" first gives it a private copy of the page and the write goes to that copy; the page cache, the file on disk, and Process 2 still see the original data.]
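Copy-on-write behaviour can be observed with a short user-space sketch, again assuming a Linux/POSIX system with a writable /tmp. The helper demo_map_private() is hypothetical. It modifies a byte through a MAP_PRIVATE mapping and then checks that the file still holds the original byte.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp(), pread(), pwrite() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if a store through a MAP_PRIVATE mapping
 * changed the mapping but left the file untouched, 0 otherwise, -1 on error.
 */
int demo_map_private(void)
{
    char path[] = "/tmp/mmap_private_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    char on_disk = 0;
    char *map;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* One page long, first byte 'A'. */
    if (page <= 0 || ftruncate(fd, page) != 0 || pwrite(fd, "A", 1, 0) != 1)
        goto out;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[0] = 'Z';                 /* the write lands in a private page copy */

    if (pread(fd, &on_disk, 1, 0) == 1)
        result = (map[0] == 'Z' && on_disk == 'A');

    munmap(map, (size_t)page);
out:
    close(fd);
    return result;
}
```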
MMAP with MAP_LOCKED flag
• Locks the pages of the mapped region into physical memory (they are not swapped out).
• Available since kernel version 2.5.37.
• Sets the VM_LOCKED flag on the VMA.
• Works in the same manner as mlock():
#include <sys/mman.h>
int mlock(const void *virt_addr, size_t len);
int munlock(const void *virt_addr, size_t len);
Ex:
virt_addr = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_LOCKED, fd, offset);
[Figure: fd = open("file"). To reduce the size of the page cache, the kernel can drop clean pages and write dirty pages out to the hard disk or to swap. Pages in the range virt_addr to virt_addr + len of a VMA mapped with MAP_LOCKED stay resident in RAM.]
The Usual Rules of mmap()
• The requested memory protection (prot, flags) must be compatible with the file descriptor permissions (O_RDONLY, etc.).
  Ex: If PROT_WRITE and MAP_SHARED are given, the file must be open for writing.
• Usually, an entire mapping is unmapped, e.g.:
    if ((virt_addr = mmap(NULL, length, /* . . . */)) == MAP_FAILED)
        perror("mmap error");
    /* access memory mapped region via virt_addr */
    if (munmap(virt_addr, length) < 0)
        perror("munmap error");
• Accessing it after a successful munmap will (very likely) result in a segmentation fault.
Mmap - Example
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int main(int argc, char **argv) {
    int fd;
    int filesize = getpagesize();   /* or sysconf(_SC_PAGESIZE) */
    void *virt_addr;
    /* O_RDWR: writing through a MAP_SHARED mapping needs a writable fd */
    if ((fd = open("test.bin", O_RDWR)) < 0) {
        perror("open error");
        return 1;
    }
    /* test.bin must already be at least one page long */
    virt_addr = mmap(0, filesize, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
    if (virt_addr == MAP_FAILED) {
        perror("mmap error");
        close(fd);
        return 1;
    }
    *(unsigned long *)virt_addr = 0x12345678;
    msync(virt_addr, filesize, MS_SYNC);
    munmap(virt_addr, filesize);
    close(fd);
    return 0;
}
mmap - Direct Mapping to RAM
• If we want to map directly to RAM and access physical addresses, we need to build a custom driver that implements the mmap file operation.
Ex: We create a device file "mmapx", backed by our custom driver "mmapx.ko", to take the place of a normal file.
[Figure: with fd = open("file"), the mmap() offset is a position in a file on the hard disk. With fd = open("/dev/mmapx"), the mmapx driver treats the offset as the physical address, so mmap() maps the user-space range straight onto RAM.]
Flow of Direct Mapping via mmap syscall
Kernel space (mmapx driver):
    module_init: create the device file /dev/mmapx
    mmap file operation: use remap_pfn_range() to do the real memory mapping
User space (AP), in time order:
    1. Open the device file: fd = open("/dev/mmapx")
    2. Call the mmap syscall: virt_addr = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, fd, phyaddr);
    3. Call the munmap syscall: munmap(virt_addr, size);
    4. Close the device file: close(fd);
What does remap_pfn_range() do, and what happens before it?
1. The kernel allocates a VMA. (The kernel manages user-space addresses with vm_area_struct.)
2. The driver gets the pages (physical address) of physical RAM via vma->vm_pgoff.
3. The driver calls remap_pfn_range() to build new page table entries that map a range of physical addresses.
[Figure: the process descriptor links a list of vm_area_struct entries. For fd = open("/dev/mmapx"), the new VMA spans vma->vm_start to vma->vm_end in process virtual memory, and vma->vm_pgoff is derived from the mmap() offset (the physical address). remap_pfn_range() links the VMA to a new page table so that the MMU resolves it to the matching pages in RAM.]
Using remap_pfn_range
int remap_pfn_range(struct vm_area_struct *vma, unsigned long virt_addr,
                    unsigned long pfn, unsigned long size, pgprot_t prot);
• Only for "reserved pages" (outside normal memory management) and "physical addresses".
★ The kernel helps us fill in these arguments:
vma → The virtual memory area into which the page range is being mapped.
virt_addr → The user virtual address where the mapping should begin (vma->vm_start).
pfn → Page frame number corresponding to the physical address, i.e. the physical address >> PAGE_SHIFT.
      For most uses, vma->vm_pgoff already contains the value you need.
      vma->vm_pgoff << PAGE_SHIFT recovers the physical address (the byte offset passed to mmap()).
size → The size of the area being remapped, in bytes (vma->vm_end - vma->vm_start).
prot → Protection for the pages in this VMA. Use vma->vm_page_prot.
       If you don't want the mapped area cached by the CPU:
       vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
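The relationship between a physical address, its page frame number, and vm_pgoff is a pair of shifts. The sketch below models it in user space; the names are hypothetical and DEMO_PAGE_SHIFT assumes 4 KiB pages (the kernel's PAGE_SHIFT depends on the architecture and configuration).

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/* Physical address -> page frame number (what remap_pfn_range() expects). */
static inline uint64_t demo_phys_to_pfn(uint64_t phys)
{
    return phys >> DEMO_PAGE_SHIFT;
}

/* Page frame number -> physical address of the start of that page. */
static inline uint64_t demo_pfn_to_phys(uint64_t pfn)
{
    return pfn << DEMO_PAGE_SHIFT;
}
```

If the AP passes a physical address of 0x10200000 as the mmap() offset, the driver sees a vm_pgoff of 0x10200, which it can hand to remap_pfn_range() unchanged.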
Implementing the mmap file operation
#include <linux/mm.h>
int sample_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;   /* physical address */

    /* VM_IO: this VMA must be MMIO/VRAM-backed memory, not system RAM;
       it also prevents the region from being core dumped. */
    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    /* VM_RESERVED: out of memory management, never swapped out.
       (Removed in Linux 3.7; newer kernels use VM_DONTEXPAND | VM_DONTDUMP.) */
    vma->vm_flags |= VM_RESERVED;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    /* vma->vm_pgoff is already a page frame number */
    if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                        vma->vm_end - vma->vm_start, vma->vm_page_prot))
        return -EAGAIN;
    vma->vm_ops = &sample_vm_ops;   /* defined elsewhere in the driver */
    return 0;
}
LDD3 example: http://www.cs.fsu.edu/~baker/devices/lxr/http/source/ldd-examples/simple/simple.c
Flow of custom mmapx driver
Kernel space (mmapx driver):
    module_init: create the device file /dev/mmapx
    ioctl file operation, case GET_MEMORY:
        buf = kmalloc(size)
        phyaddr = virt_to_phys(buf)
    mmap file operation:
        vma->vm_flags |= VM_RESERVED
        use remap_pfn_range() to do the real memory mapping
    module_exit: kfree(buf);
User space (AP), in time order:
    1. Open the device file: fd = open("/dev/mmapx")
    2. Call the ioctl syscall: phyaddr = ioctl(fd, GET_MEMORY, size)
    3. Call the mmap syscall: virt_addr = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, fd, phyaddr);
    4. Call the munmap syscall: munmap(virt_addr, size);
    5. Close the device file: close(fd);
mmap summary
• The device driver is loaded. It defines an mmap file operation.
• A user space process calls the mmap system call.
• The process gets a starting address to read from and write to (depending on permissions).
• The MMU automatically takes care of converting the process virtual addresses into physical ones.
Direct access to the hardware! No expensive read or write system calls!
More mmap:
1. "Operation not permitted" for "/dev/mem":
    fd = open("/dev/mem", O_RDWR | O_SYNC);
    virt_addr = mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phyaddr);
   This is not supported by default on Linux kernel 2.6.25↑ unless CONFIG_STRICT_DEVMEM is disabled when building the kernel.
2. We need to mark pages as reserved before doing the real mapping (remap_pfn_range).
   Linux 2.4 ↓ → Use mem_map_reserve() to set each page as PG_reserved.
   Linux 2.6.0~2.6.18 → Use SetPageReserved() to set each page as PG_reserved.
   Linux 2.6.25 ↑ → Set VM_RESERVED in vm_flags to avoid swapping out.
3. The AP does not need msync() to force flush changes when it uses our custom mmapx driver, because the driver implements no page cache. msync() would call the fsync file operation, so we do not implement fsync either.
4. A buffer pinned with get_user_pages() does not need mlock().
THANK YOU

Automate your Kamailio Test Calls - Kamailio World 2024
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 

Linux MMAP & Ioremap introduction

  • 1. ioremap & mmap in Linux. Taichien Chang
  • 2. Outline: How to access a physical address? Why ioremap, and the ioremap functions. Flow of I/O memory-mapped access. Why mmap? The mmap syscall and the mmap function. mmap flags: MAP_SHARED, MAP_PRIVATE, MAP_LOCKED. Flow of implementing mmap. The remap_pfn_range function. Implementing the mmap file operation.
  • 3. How to access a physical address? (1) Drivers use virtual addresses. (2) Hardware uses physical addresses (registers, RAM). (3) Virtual memory doesn't store anything; it simply maps a program's address space onto the underlying physical memory: "virtual memory is NOT physical RAM". [Diagram: through the MMU, user space lies below 0xc0000000 (3G) and kernel space between 3G and 4G. In the direct mapping area, phys_to_virt() (or __va()) turns physical address 0x10200000 into virtual address 0xd0200000, and virt_to_phys() (or __pa()) goes the other way. Other labels: I/O Mem, RAM, 0x200000.]
  • 4. Address translation functions: PAGE_OFFSET = 0xC0000000 (x86); PAGE_OFFSET = 0x80000000 (MIPS cached addresses); PAGE_OFFSET = 0xA0000000 (MIPS uncached addresses).
  • 5. Why ioremap? (1) Physical memory or I/O addresses can exceed what the virtual address space (up to 0xffffffff) can cover. (2) How do we access these extra physical addresses? phys_to_virt(0x40200000) wraps around to 0x00200000, which is not a usable kernel address. (3) Use __pa(high_memory)? It is 0x377fe000 ≒ 896MB, so the direct mapping stops there. Answer: "use I/O memory mapping". [Diagram: ioremap() maps physical address 0x40200000 (I/O Mem) to virtual address 0xf8044000, inside the area near the top of kernel space (up to 0xffffffff) that is reserved for MMIO, 128MB on x86.]
  • 6. ioremap functions: #include <asm/io.h>; void __iomem *virt_addr = ioremap(unsigned long phys_addr, unsigned long size); void __iomem *virt_addr = ioremap_nocache(unsigned long phys_addr, unsigned long size); void iounmap(void __iomem *virt_addr); You should not directly dereference the addresses returned by ioremap as if they were ordinary pointers to virtual memory. Why? The kernel provides these functions to access hardware registers, and they guarantee read/write ordering: readb(addr), readw(addr), readl(addr), writeb(val, addr), writew(val, addr), writel(val, addr), memcpy_fromio(buffer, addr, len), memcpy_toio(addr, buffer, len), memset_io(addr, val, len).
  • 7. Flow of I/O memory-mapped access: #include <linux/ioport.h>. Use request_mem_region(unsigned long start, unsigned long len, char *name) to reserve the region [start, start+len-1] in "iomem_resource" so that no other driver uses it. All I/O memory allocations are listed in /proc/iomem. Flow, on driver open: request_mem_region(phy_addr, len, "NAME"), then virt_addr = ioremap(phy_addr, len); while the device is in use: readb/readw/readl(virt_addr) and writeb/writew/writel(val, virt_addr); on driver release: iounmap(virt_addr), then release_mem_region(phy_addr, len).
  • 8. Memory mapping between kernel and user space: Q: How can an application (AP) directly access a physical address (RAM or registers)? A: The kernel provides a system call, "mmap". The target can be (1) reserved memory or (2) dynamic memory: use kmalloc() to create the dynamic memory space, SetPageReserved() to mark its pages reserved, and virt_to_phys() to get its physical address. Calling virt_to_phys on a kernel virtual address is also meaningless. [Diagram: mmap() maps a user-space virtual address (below 0xc0000000, 3G) through the MMU to RAM at physical address 0x10200000.]
  • 9. Read file from disk (1), using read(): (1) The AP allocates an 8KB buffer in user space and calls the read() file operation. (2) The kernel finds and allocates 2 pages and initiates I/O requests for 8KB. (3) The driver sends a SCSI command to read 16 sectors (8KB) and copies the data into the allocated pages. (4) The kernel copies the requested 8KB from the page cache to the user buffer. [Diagram: fd = open("file"), then read(8192 bytes) at the file offset on the hard disk; the kernel finds 2 free pages in RAM and reads 512 bytes x 16 into the page cache; the 2 pages (8192 bytes) are then copied into the user-space buffer.]
  • 10. Read file from disk (2), using mmap(): (1) The AP calls the mmap() syscall to map the file with length = 8KB. (2) The kernel finds and allocates 2 pages and initiates I/O requests for 8KB. (3) The driver sends a SCSI command to read 16 sectors (8KB) and copies the data into the allocated pages. (4) The AP accesses the file directly through the page-cache pages, without allocating a second buffer. [Diagram: fd = open("file"), then mmap(2 pages) = 8192 bytes at the file offset on the hard disk; the kernel finds 2 free pages in RAM and reads 512 bytes x 16 into the page cache, which is mapped straight into user space.]
  • 11. Why mmap? Reduced memory usage: only 1 memory copy. Performance gain: read/write file operations and the ioctl syscall go through copy_from_user/copy_to_user, which costs too much when copying large amounts of data between kernel space and user space. mmap can yield significant performance improvements: 30%.
  • 12. mmap function: #include <sys/mman.h>; virt_addr = mmap(start_addr, len, int prot, int flag, int fd, offset); Returns the starting virtual address of the mapping if OK, MAP_FAILED on error. start_addr: if NULL, the kernel chooses an available address at which to create the mapping. prot: memory protection, either PROT_NONE (pages may not be accessed) or a bitwise OR of PROT_EXEC (pages may be executed), PROT_READ (pages may be read) and PROT_WRITE (pages may be written). flag: MAP_SHARED, MAP_PRIVATE, ... fd: should be a valid file descriptor. offset: should be a multiple of the page size. [Diagram: len bytes starting at offset in the file referenced by fd are mapped into the user virtual address space at the return value of mmap (start_addr); regions are labeled PROT_NONE, PROT_NONE, PROT_READ, PROT_WRITE.]
  • 13. mmap with the MAP_SHARED flag (shared mapping): (1) Thanks to virtual memory management, different processes can have mapped pages in common. (2) The mapping is shared with all other processes that map the same object. (3) Storing to the region is equivalent to writing to the file, so changes are shared. Ex: virt_addr2 = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_SHARED, fd, offset); Use msync(virt_addr2, size, MS_SYNC) to force changes to be flushed; virt_addr2 must be page aligned. [Diagram: Process 1 writes 8192 bytes (2 pages) through virt_addr1 and Process 2 then reads them through virt_addr2; both map the same page-cache pages in RAM for fd = open("file"), and the written data goes back to the file offset on the hard disk.]
  • 14. mmap with the MAP_PRIVATE flag (private mapping): (1) Modifications to the data are not reflected in the file. (2) Modifications are not visible to other processes mapping the same file, so changes are private. (3) A real-life example: with glibc, dynamically linked libraries (*.so) are loaded using private mappings. Ex: virt_addr2 = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_PRIVATE, fd, offset); [Diagram: Process 1 and Process 2 both map pages 1 to 3 of fd = open("file") from the page cache; when Process 1 writes 2048 bytes (0.5 page) to page 2, copy-on-write first gives it a private copy of that page, and the write lands in the copy; Process 2 reading 1 page (4096 bytes) still sees the original page 2.]
  • 15. mmap with the MAP_LOCKED flag: locks the pages of the mapped region into physical memory (avoids swapping out); requires kernel version > 2.5.37; sets the VM_LOCKED flag on the VMA; works in the same manner as mlock(): #include <sys/mman.h>; int mlock(const void *virt_addr, size_t len); int munlock(const void *virt_addr, size_t len); Ex: virt_addr = (char *)mmap(0, size, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_LOCKED, fd, offset); [Diagram: the VMA of length len at virt_addr created by mmap() covers clean and dirty page-cache pages in RAM for fd = open("file"); when the page cache has to shrink, dirty pages are written to the hard disk or swapped out to SWAP.]
  • 16. The usual rules of mmap(): The requested memory protection (prot, flags) must be compatible with the file descriptor permissions (O_RDONLY, etc.). Ex: if PROT_WRITE and MAP_SHARED are given, the file must be open for writing. Usually an entire mapping is unmapped, e.g.: if ((virt_addr = mmap(NULL, length, /* ... */)) == MAP_FAILED) perror("mmap error"); /* access the memory-mapped region via virt_addr */ if (munmap(virt_addr, length) < 0) perror("munmap error"); Accessing the region after a successful munmap will (very likely) result in a segmentation fault.
  • 17. mmap example:
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <unistd.h>
      int main(int argc, char **argv)
      {
          int fd;
          int filesize = getpagesize();   /* or sysconf(_SC_PAGESIZE) */
          void *virt_addr;
          /* The mapping is written to below, so the file must be opened read-write. */
          if ((fd = open("test.bin", O_RDWR)) < 0) {
              perror("open error");
              return 1;
          }
          virt_addr = mmap(0, filesize, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
          if (virt_addr == MAP_FAILED) {
              perror("mmap error");
              return 1;
          }
          *(unsigned long *)virt_addr = 0x12345678;
          msync(virt_addr, filesize, MS_SYNC);   /* flush the change to the file */
          munmap(virt_addr, filesize);
          close(fd);
          return 0;
      }
  • 18. mmap: direct mapping to RAM. If we want to map directly to RAM and access physical addresses, we need to build a custom driver that implements the mmap file operation. Ex: we create a device file "mmapx", served by our custom driver "mmapx.ko", to take the place of a normal file. [Diagram: with fd = open("file") the mmap offset is a position in a file on the hard disk; with fd = open("/dev/mmapx") the offset is the physical address in RAM that mmap() maps into user space.]
  • 19. Flow of direct mapping via the mmap syscall. Kernel space (mmapx driver): module_init creates the device file /dev/mmapx; the mmap file operation uses remap_pfn_range to do the real memory mapping. User space (AP), in time order: open the device file: fd = open("/dev/mmapx"); call the mmap syscall: virt_addr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, phyaddr); call the munmap syscall: munmap(virt_addr, size); close the device file: close(fd).
  • 20. What does remap_pfn_range do, and what happens before it? (1) The kernel allocates a VMA (the kernel manages user-space addresses using vm_area_struct). (2) The driver gets the pages (physical address) of physical RAM via vma->vm_pgoff. (3) The driver calls remap_pfn_range() to build a new page table that maps a range of physical addresses. [Diagram: the process descriptor links a list of vm_area_struct; for fd = open("/dev/mmapx"), remap_pfn_range() links the VMA spanning vma->vm_start to vma->vm_end in process virtual memory to a new page table, which maps through the MMU onto the pages in RAM at the physical address given by vma->vm_pgoff (the mmap offset).]
  • 21. Using remap_pfn_range: int remap_pfn_range(struct vm_area_struct *vma, unsigned long virt_addr, unsigned long pfn, unsigned long size, pgprot_t prot); Only for "reserved pages" (outside memory management) and physical addresses. The kernel helps us fill in these arguments. vma: the virtual memory area into which the page range is being mapped. virt_addr: the user virtual address where the mapping should begin (vma->vm_start). pfn: the page frame number corresponding to the physical address, i.e. the physical address right-shifted by PAGE_SHIFT; for most uses vma->vm_pgoff already contains exactly this value, and vma->vm_pgoff << PAGE_SHIFT is the physical address. size: the size of the area being remapped, in bytes (vma->vm_end - vma->vm_start). prot: the protection for pages in this VMA; use vma->vm_page_prot. If you don't want the mapped area to be cached by the CPU, set vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
  • 22. Implementing the mmap file operation:
      #include <linux/mm.h>
      int sample_mmap(struct file *filp, struct vm_area_struct *vma)
      {
          unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
          /* VM_IO: this VMA must be backed by MMIO/VRAM, not system RAM;
             it also prevents the region from being core dumped. */
          if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
              vma->vm_flags |= VM_IO;
          /* VM_RESERVED: outside memory management, never swapped out. */
          vma->vm_flags |= VM_RESERVED;
          vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
          if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                              vma->vm_end - vma->vm_start, vma->vm_page_prot))
              return -EAGAIN;
          vma->vm_ops = &sample_vm_ops;
          return 0;
      }
      LDD3 example: http://www.cs.fsu.edu/~baker/devices/lxr/http/source/ldd-examples/simple/simple.c
  • 23. Flow of the custom mmapx driver. Kernel space (mmapx driver): module_init creates the device file /dev/mmapx; the ioctl file operation handles case GET_MEMORY: buf = kmalloc(size), phyaddr = virt_to_phys(buf), vma->vm_flags |= VM_RESERVED; the mmap file operation uses remap_pfn_range to do the real memory mapping; module_exit calls kfree(buf). User space (AP), in time order: open the device file: fd = open("/dev/mmapx"); call the ioctl syscall: phyaddr = ioctl(fd, GET_MEMORY, size); call the mmap syscall: virt_addr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, phyaddr); call the munmap syscall: munmap(virt_addr, size); close the device file: close(fd).
  • 24. mmap summary: The device driver is loaded; it defines an mmap file operation. A user-space process calls the mmap system call. The process gets a starting address to read from and write to (depending on permissions). The MMU automatically takes care of converting the process's virtual addresses into physical ones. Direct access to the hardware! No expensive read or write system calls!
  • 25. More on mmap: (1) "Operation not permitted" for /dev/mem: fd = open("/dev/mem", O_RDWR | O_SYNC); virtaddr = mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phyaddr); is not supported by default on Linux kernel 2.6.25 and later, unless CONFIG_STRICT_DEVMEM is disabled when the kernel is built. (2) Pages must be marked reserved before the real mapping (remap_pfn_range) is done. Linux 2.4 and earlier: use mem_map_reserve() to mark each page PG_reserved. Linux 2.6.0 to 2.6.18: use SetPageReserved() to mark each page PG_reserved. Linux 2.6.25 and later: set VM_RESERVED in vm_flags to avoid swapping out. (3) The AP does not need msync() to force changes to be flushed when it goes through the custom mmapx driver, because the driver implements no page cache. msync would call the fsync file operation, so fsync is not implemented either. (4) A buffer obtained through get_user_pages() does not need mlock().
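
The shared-mapping rules from slides 13, 16 and 17 can be condensed into a small user-space sketch. The helper names `write_word_shared` and `read_word` and the file path are hypothetical, chosen only for illustration. The sketch assumes a POSIX system, opens the file read-write because the mapping is written to, and leaves out MAP_LOCKED, since locking may fail when the locked-memory limit is low.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store `value` at offset 0 of `path` through a
 * MAP_SHARED mapping. Returns 0 on success, -1 on failure. */
int write_word_shared(const char *path, uint32_t value)
{
    assert(path != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* PROT_WRITE with MAP_SHARED requires a descriptor open for writing. */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Make the file at least one page long so the mapping is fully backed. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {          /* mmap reports errors via MAP_FAILED */
        close(fd);
        return -1;
    }

    memcpy(addr, &value, sizeof value);    /* a store to the mapping is a write to the file */

    int rc = msync(addr, len, MS_SYNC);    /* force the change out to the file */
    if (munmap(addr, len) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: read the first word back with plain read(). */
int read_word(const char *path, uint32_t *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, sizeof *out);
    close(fd);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}
```

Calling `write_word_shared("/tmp/test.bin", 0x12345678)` and then `read_word` on the same path should hand back the stored value, which is the "storing to the region is equivalent to writing to the file" behavior the slides describe.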

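The copy-on-write behavior of MAP_PRIVATE described on slide 14 can be observed from user space. The following is a minimal sketch for a POSIX system; the helper name `private_write_demo` and its parameters are hypothetical. It creates a one-byte file, modifies that byte through a private mapping, and then reports what the mapping and the file each contain.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` holding the single byte 'A', overwrite
 * that byte with 'B' through a MAP_PRIVATE mapping, then report the byte seen
 * through the mapping and the byte still stored in the file.
 * Returns 0 on success, -1 on failure. */
int private_write_demo(const char *path, char *seen_in_mapping, char *seen_in_file)
{
    assert(path != NULL && seen_in_mapping != NULL && seen_in_file != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    addr[0] = 'B';                 /* triggers copy-on-write: only this process sees 'B' */
    *seen_in_mapping = addr[0];

    /* Re-read the file itself; the private change was never written back. */
    int rc = -1;
    if (lseek(fd, 0, SEEK_SET) == 0 && read(fd, seen_in_file, 1) == 1)
        rc = 0;

    if (munmap(addr, len) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

On success the mapping holds the modified byte while the file keeps the original one, so there is nothing for msync() to flush in this case.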
Editor's Notes

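  1. Note: Calling request_mem_region() is not mandatory, but it is recommended. The function checks whether the requested resource is available; if it is, the request succeeds and the resource is marked as in use, so that any other driver requesting the same resource will fail.
  2. Calling virt_to_phys on a kernel virtual address is also meaningless.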
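  3. struct file_operations {
      struct module *owner; A pointer to the module that owns this structure. It prevents the module from being unloaded while it is in use. Normally initialized to THIS_MODULE.
      loff_t (*llseek) (struct file *, loff_t, int); Changes the current read/write position in the file; the new position is returned as a (positive) return value.
      ssize_t (*read) (struct file *, char *, size_t, loff_t *); Retrieves data from the device. A null pointer makes the read system call return -EINVAL ("Invalid argument"). A non-negative return value is the number of bytes successfully read.
      ssize_t (*write) (struct file *, const char *, size_t, loff_t *); Sends data to the device. A null pointer makes the write system call return -EINVAL. A non-negative return value is the number of bytes successfully written.
      unsigned int (*poll) (struct file *, struct poll_table_struct *); The back end of three system calls: poll, epoll, and select. All of them query whether a read or write on one or more file descriptors would block. The poll method should return a bit mask indicating whether non-blocking reads or writes are possible. If a driver's poll method is NULL, the device is assumed to be readable and writable without blocking.
      int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); Provides a way to issue device-specific commands. Note: a few ioctl commands are recognized by the kernel without calling this method.
      int (*mmap) (struct file *, struct vm_area_struct *); Requests that device memory be mapped into the process's address space. If this method is NULL, the system call returns -ENODEV.
      int (*open) (struct inode *, struct file *); Opens a device file. If this entry is NULL, opening the device always succeeds.
      int (*release) (struct inode *, struct file *); Invoked when the file structure is being released, that is, when the last file descriptor open on the device file is closed (not on every close).
      int (*fsync) (struct file *, struct dentry *, int datasync); The back end of the fsync system call, which a user calls to flush any pending data. If this pointer is NULL, the system call returns -EINVAL.
      int (*fasync) (int, struct file *, int); Notifies the device of a change in its FASYNC flag (asynchronous notification). This member may be NULL if the driver does not support asynchronous notification.
      ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); A single read operation spanning multiple memory areas; if NULL, the read method is called instead (possibly more than once).
      ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); A single write operation spanning multiple memory areas; if NULL, the write method is called instead (possibly more than once).
     }
  4. Builds the page table for the range vma->vm_start to vma->vm_end.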
  5. VM_LOCKED: if set, the pages will not be swapped out; set by mlock(). VM_IO: signals that the area is an mmapped region for I/O to a device; it also prevents the region from being core dumped. VM_RESERVED: do not swap out this region; used by device drivers.
  6. CONFIG_STRICT_DEVMEM
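
The direct-mapping arithmetic behind slides 3 to 5 can be modeled in plain user-space C. This is only a sketch of the address calculation, not kernel code: the `sketch_` names are hypothetical, PAGE_OFFSET is the 32-bit x86 value from slide 4, and the low-memory limit is the `__pa(high_memory)` value quoted on slide 5, which is specific to that machine.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x86 value from slide 4 (assumption: 3G/1G split). */
#define SKETCH_PAGE_OFFSET  UINT32_C(0xC0000000)
/* __pa(high_memory) as quoted on slide 5; machine-specific. */
#define SKETCH_LOWMEM_LIMIT UINT32_C(0x377fe000)

/* Model of phys_to_virt()/__va(): add the fixed offset.
 * The addition wraps modulo 2^32, exactly as a 32-bit address would. */
uint32_t sketch_phys_to_virt(uint32_t phys)
{
    return phys + SKETCH_PAGE_OFFSET;
}

/* Model of virt_to_phys()/__pa(): only meaningful for direct-mapped
 * kernel virtual addresses, i.e. those at or above PAGE_OFFSET. */
uint32_t sketch_virt_to_phys(uint32_t virt)
{
    assert(virt >= SKETCH_PAGE_OFFSET);
    return virt - SKETCH_PAGE_OFFSET;
}

/* A physical address is reachable through the direct mapping only if it
 * lies below the low-memory limit; anything above needs ioremap(). */
int sketch_is_direct_mapped(uint32_t phys)
{
    return phys < SKETCH_LOWMEM_LIMIT;
}
```

With these definitions, physical address 0x10200000 translates to 0xd0200000 as on slide 3, while 0x40200000 lies above the limit and its naive translation wraps to 0x00200000, which is why such addresses have to go through ioremap() instead.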