SlideShare una empresa de Scribd logo
1 de 11
Descargar para leer sin conexión
IOCP vs EPOLL
Performance Comparison
Seungmo Koo
@sm9kr
kr.linkedin.com/in/sm9kr
Test Configuration
Dummy Clients Test Server
Random data packets
Relay to client (echo)
Client-side measurement:
Server throughput
(Send/Receive Mbps)
Server-side measurement:
CPU usage
(overall % and per-core %)
Gbe link
Test Environment - Server
• Intel i7-3770k, 16GB RAM, Realtek PCIe Gigabit Ethernet
• Disabled CPU-frequency scaling
• Performance Test Program
– Simple packet relay (echo) server using Boost.Asio 1.53
• Boost.Asio uses IOCP on Windows while it uses EPOLL on Linux
– I/O threads: 8
– Client sessions: 10000
– Buffer size per session: read 4096, write 4096
• Performance Check Program
– Linux: htop & sar
– Windows: perfmon
• Operating System
– Linux: Ubuntu Linux Server 13.04 64bit, kernel 3.8.0-23
+ max socket tuning
– Windows: Windows Server 2012 64bit
Test Environment - Client
• Mac mini server 2012 late
– Intel i7 quad-core, 16GB RAM, Gigabit Ethernet
• Dummy Client Program
– Simple packet generator using Boost.Asio 1.53
– # of Clients (session): 10000
– I/O threads: 8
– Buffer size per session: read 4096, write 4096
Performance Test
• Two Cases
– NAGLE: Nagle’s algorithm ON
– NODELAY: Nagle’s algorithm OFF
• Dummy Client Program
– Measuring server-throughput
– Sending random data to the Server and receiving those from
the server for 600 seconds
• Test Server
– Measuring server CPU usage for 600 seconds
• 3 Times Measurement
– Uses the median result
• As a result, every test was practically the same.
Performance Evaluation
• No Session Drop
– Both EPOLL and IOCP kept 10000 sessions alive during a test
• Normalized Throughput
– They were pretty much same in throughput
0
10
20
30
40
50
60
70
80
90
100
NODELAY NAGLE
Normalized Throughput
EPOLL
IOCP
Performance Evaluation
• CPU Utilization
– Average of 8-core usage
– Consists of Most kernel-time and Slight user-time
– IOCP defeated EPOLL
0%
2%
4%
6%
8%
10%
12%
14%
NODELAY NAGLE
Average CPU usage
EPOLL
IOCP
Performance Evaluation
• Average CPU Utilization Per Core (NODELAY mode)
– Similar to results in case of NAGLE and NODELAY
– EPOLL compared with IOCP
• One of the CPU cores is consistently having high CPU utilization
• While the other cores are close to the average utilization
0
10
20
30
40
50
60
70
EPOLL IOCP
Average CPU usage per core (%)
CORE 0
CORE 1
CORE 2
CORE 3
CORE 4
CORE 5
CORE 6
CORE 7
Don’t care.
It is Hyper
Threading
Effect
NIC Receive Processing
on only one core
See “RSS queue”
Update: New Experiment with RSS option
• Average CPU Utilization Per Core (NAGLE mode)
– Using RSS queue (a.k.a. NIC multi-queue)
– Server HW: Mac-mini 2012 server (Broadcom BCM57766 NIC)
– Server OS: Windows Server 2012 and Ubuntu Server 13.04
– Performance
• Throughput: EPOLL’s is approximately equal to IOCP’s
• Average CPU usage: virtually the same (EPOLL 7.38%, IOCP 6.8%)
0
5
10
15
20
EPOLL IOCP
Average CPU Usage per Core (%)
with RSS (NIC multi-queue)
CORE 0
CORE 1
CORE 2
CORE 3
CORE 4
CORE 5
CORE 6
CORE 7
Summary
• Throughput
– There was little difference between IOCP and EPOLL
• CPU usage
– Without RSS (Multi-queue)
• IOCP was more efficient than EPOLL in CPU utilization
• EPOLL had consistently high CPU utilization compared with IOCP
– With RSS mode
• IOCP and EPOLL are about the same in CPU usage
When making a high performance server for Linux,
you should use RSS (multi-queue) supported NIC
Reference: RSS Queue
Linux: NIC Multi-queue Support
Windows: NIC Receive Side Scaling
http://msdn.microsoft.com/en-us/library/windows/hardware/ff556942(v=vs.85).aspx

Más contenido relacionado

La actualidad más candente

이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
devCAT Studio, NEXON
 
임태현, 게임 서버 디자인 가이드, NDC2013
임태현, 게임 서버 디자인 가이드, NDC2013임태현, 게임 서버 디자인 가이드, NDC2013
임태현, 게임 서버 디자인 가이드, NDC2013
devCAT Studio, NEXON
 
Modern C++ 프로그래머를 위한 CPP11/14 핵심
Modern C++ 프로그래머를 위한 CPP11/14 핵심Modern C++ 프로그래머를 위한 CPP11/14 핵심
Modern C++ 프로그래머를 위한 CPP11/14 핵심
흥배 최
 

La actualidad más candente (20)

게임서버프로그래밍 #8 - 성능 평가
게임서버프로그래밍 #8 - 성능 평가게임서버프로그래밍 #8 - 성능 평가
게임서버프로그래밍 #8 - 성능 평가
 
게임서버프로그래밍 #1 - IOCP
게임서버프로그래밍 #1 - IOCP게임서버프로그래밍 #1 - IOCP
게임서버프로그래밍 #1 - IOCP
 
[NDC2016] TERA 서버의 Modern C++ 활용기
[NDC2016] TERA 서버의 Modern C++ 활용기[NDC2016] TERA 서버의 Modern C++ 활용기
[NDC2016] TERA 서버의 Modern C++ 활용기
 
게임서버프로그래밍 #7 - 패킷핸들링 및 암호화
게임서버프로그래밍 #7 - 패킷핸들링 및 암호화게임서버프로그래밍 #7 - 패킷핸들링 및 암호화
게임서버프로그래밍 #7 - 패킷핸들링 및 암호화
 
오딘: 발할라 라이징 MMORPG의 성능 최적화 사례 공유 [카카오게임즈 - 레벨 300] - 발표자: 김문권, 팀장, 라이온하트 스튜디오...
오딘: 발할라 라이징 MMORPG의 성능 최적화 사례 공유 [카카오게임즈 - 레벨 300] - 발표자: 김문권, 팀장, 라이온하트 스튜디오...오딘: 발할라 라이징 MMORPG의 성능 최적화 사례 공유 [카카오게임즈 - 레벨 300] - 발표자: 김문권, 팀장, 라이온하트 스튜디오...
오딘: 발할라 라이징 MMORPG의 성능 최적화 사례 공유 [카카오게임즈 - 레벨 300] - 발표자: 김문권, 팀장, 라이온하트 스튜디오...
 
잘 알려지지 않은 숨은 진주, Winsock API - WSAPoll, Fast Loopback
잘 알려지지 않은 숨은 진주, Winsock API - WSAPoll, Fast Loopback잘 알려지지 않은 숨은 진주, Winsock API - WSAPoll, Fast Loopback
잘 알려지지 않은 숨은 진주, Winsock API - WSAPoll, Fast Loopback
 
게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP Adv게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP Adv
 
NDC 2017 하재승 NEXON ZERO (넥슨 제로) 점검없이 실시간으로 코드 수정 및 게임 정보 수집하기
NDC 2017 하재승 NEXON ZERO (넥슨 제로) 점검없이 실시간으로 코드 수정 및 게임 정보 수집하기NDC 2017 하재승 NEXON ZERO (넥슨 제로) 점검없이 실시간으로 코드 수정 및 게임 정보 수집하기
NDC 2017 하재승 NEXON ZERO (넥슨 제로) 점검없이 실시간으로 코드 수정 및 게임 정보 수집하기
 
Multiplayer Game Sync Techniques through CAP theorem
Multiplayer Game Sync Techniques through CAP theoremMultiplayer Game Sync Techniques through CAP theorem
Multiplayer Game Sync Techniques through CAP theorem
 
Visual Studio를 이용한 어셈블리어 학습 part 1
Visual Studio를 이용한 어셈블리어 학습 part 1Visual Studio를 이용한 어셈블리어 학습 part 1
Visual Studio를 이용한 어셈블리어 학습 part 1
 
이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
이승재, 실버바인 서버엔진 2 설계 리뷰, NDC2018
 
Akka.NET 으로 만드는 온라인 게임 서버 (NDC2016)
Akka.NET 으로 만드는 온라인 게임 서버 (NDC2016)Akka.NET 으로 만드는 온라인 게임 서버 (NDC2016)
Akka.NET 으로 만드는 온라인 게임 서버 (NDC2016)
 
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
 
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
 
실시간 게임 서버 최적화 전략
실시간 게임 서버 최적화 전략실시간 게임 서버 최적화 전략
실시간 게임 서버 최적화 전략
 
임태현, 게임 서버 디자인 가이드, NDC2013
임태현, 게임 서버 디자인 가이드, NDC2013임태현, 게임 서버 디자인 가이드, NDC2013
임태현, 게임 서버 디자인 가이드, NDC2013
 
GCGC- CGCII 서버 엔진에 적용된 기술 (5) - Executor with Exception
GCGC- CGCII 서버 엔진에 적용된 기술 (5) - Executor with ExceptionGCGC- CGCII 서버 엔진에 적용된 기술 (5) - Executor with Exception
GCGC- CGCII 서버 엔진에 적용된 기술 (5) - Executor with Exception
 
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
 
Modern C++ 프로그래머를 위한 CPP11/14 핵심
Modern C++ 프로그래머를 위한 CPP11/14 핵심Modern C++ 프로그래머를 위한 CPP11/14 핵심
Modern C++ 프로그래머를 위한 CPP11/14 핵심
 
Next-generation MMORPG service architecture
Next-generation MMORPG service architectureNext-generation MMORPG service architecture
Next-generation MMORPG service architecture
 

Destacado

Destacado (6)

NHN NEXT 2014년도 게임트랙 소개
NHN NEXT 2014년도 게임트랙 소개 NHN NEXT 2014년도 게임트랙 소개
NHN NEXT 2014년도 게임트랙 소개
 
게임제작개론 : #0 과목소개
게임제작개론 : #0 과목소개게임제작개론 : #0 과목소개
게임제작개론 : #0 과목소개
 
게임서버프로그래밍 #6 - 예외처리 및 로깅
게임서버프로그래밍 #6 - 예외처리 및 로깅게임서버프로그래밍 #6 - 예외처리 및 로깅
게임서버프로그래밍 #6 - 예외처리 및 로깅
 
게임서버프로그래밍 #5 - 데이터베이스 핸들링
게임서버프로그래밍 #5 - 데이터베이스 핸들링게임서버프로그래밍 #5 - 데이터베이스 핸들링
게임서버프로그래밍 #5 - 데이터베이스 핸들링
 
게임서버프로그래밍 #4 - 멀티스레드 프로그래밍
게임서버프로그래밍 #4 - 멀티스레드 프로그래밍게임서버프로그래밍 #4 - 멀티스레드 프로그래밍
게임서버프로그래밍 #4 - 멀티스레드 프로그래밍
 
사설 서버를 막는 방법들 (프리섭, 더이상은 Naver)
사설 서버를 막는 방법들 (프리섭, 더이상은 Naver)사설 서버를 막는 방법들 (프리섭, 더이상은 Naver)
사설 서버를 막는 방법들 (프리섭, 더이상은 Naver)
 

Similar a Windows IOCP vs Linux EPOLL Performance Comparison

Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Tommy Lee
 

Similar a Windows IOCP vs Linux EPOLL Performance Comparison (20)

Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Ceph on rdma
Ceph on rdmaCeph on rdma
Ceph on rdma
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Performance Tuning Oracle Weblogic Server 12c
Performance Tuning Oracle Weblogic Server 12cPerformance Tuning Oracle Weblogic Server 12c
Performance Tuning Oracle Weblogic Server 12c
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
IDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking SessionIDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking Session
 
MySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats newMySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats new
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
SOC Processors Used in SOC
SOC Processors Used in SOCSOC Processors Used in SOC
SOC Processors Used in SOC
 
Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
 
Inter connect2016 yps-2749_02232016_aspresented
Inter connect2016 yps-2749_02232016_aspresentedInter connect2016 yps-2749_02232016_aspresented
Inter connect2016 yps-2749_02232016_aspresented
 

Más de Seungmo Koo

Más de Seungmo Koo (13)

Understanding Tech Debt
Understanding Tech Debt Understanding Tech Debt
Understanding Tech Debt
 
게임서버프로그래밍 #3 - 메모리 및 오브젝트 풀링
게임서버프로그래밍 #3 - 메모리 및 오브젝트 풀링게임서버프로그래밍 #3 - 메모리 및 오브젝트 풀링
게임서버프로그래밍 #3 - 메모리 및 오브젝트 풀링
 
게임제작개론 : #9 라이브 서비스
게임제작개론 : #9 라이브 서비스게임제작개론 : #9 라이브 서비스
게임제작개론 : #9 라이브 서비스
 
게임제작개론 : #8 게임 제작 프로세스
게임제작개론 : #8 게임 제작 프로세스게임제작개론 : #8 게임 제작 프로세스
게임제작개론 : #8 게임 제작 프로세스
 
게임제작개론 : #7 팀 역할과 게임 리소스에 대한 이해
게임제작개론 : #7 팀 역할과 게임 리소스에 대한 이해게임제작개론 : #7 팀 역할과 게임 리소스에 대한 이해
게임제작개론 : #7 팀 역할과 게임 리소스에 대한 이해
 
게임제작개론 : #6 게임 시스템 구조에 대한 이해
게임제작개론 : #6 게임 시스템 구조에 대한 이해게임제작개론 : #6 게임 시스템 구조에 대한 이해
게임제작개론 : #6 게임 시스템 구조에 대한 이해
 
게임제작개론 : #5 플레이어에 대한 이해
게임제작개론 : #5 플레이어에 대한 이해게임제작개론 : #5 플레이어에 대한 이해
게임제작개론 : #5 플레이어에 대한 이해
 
게임제작개론 : #4 게임 밸런싱
게임제작개론 : #4 게임 밸런싱게임제작개론 : #4 게임 밸런싱
게임제작개론 : #4 게임 밸런싱
 
게임제작개론: #3 간접통제와 게임 커뮤니티
게임제작개론: #3 간접통제와 게임 커뮤니티게임제작개론: #3 간접통제와 게임 커뮤니티
게임제작개론: #3 간접통제와 게임 커뮤니티
 
게임제작개론: #2 세부 디자인 요소
게임제작개론: #2 세부 디자인 요소게임제작개론: #2 세부 디자인 요소
게임제작개론: #2 세부 디자인 요소
 
게임제작개론: #1 게임 구성 요소의 이해
게임제작개론: #1 게임 구성 요소의 이해게임제작개론: #1 게임 구성 요소의 이해
게임제작개론: #1 게임 구성 요소의 이해
 
NHN NEXT 게임 전공 소개
NHN NEXT 게임 전공 소개NHN NEXT 게임 전공 소개
NHN NEXT 게임 전공 소개
 
Game Developer Magazine, May 2012, Supplemental Info
Game Developer Magazine, May 2012, Supplemental InfoGame Developer Magazine, May 2012, Supplemental Info
Game Developer Magazine, May 2012, Supplemental Info
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Windows IOCP vs Linux EPOLL Performance Comparison

  • 1. IOCP vs EPOLL Performance Comparison Seungmo Koo @sm9kr kr.linkedin.com/in/sm9kr
  • 2. Test Configuration Dummy Clients Test Server Random data packets Relay to client (echo) Client-side measurement: Server throughput (Send/Receive Mbps) Server-side measurement: CPU usage (overall % and per-core %) Gbe link
  • 3. Test Environment - Server • Intel i7-3770k, 16GB RAM, Realtek PCIe Gigabit Ethernet • Disabled CPU-frequency scaling • Performance Test Program – Simple packet relay (echo) server using Boost.Asio 1.53 • Boost.Asio uses IOCP on Windows while it uses EPOLL on Linux – I/O threads: 8 – Client sessions: 10000 – Buffer size per session: read 4096, write 4096 • Performance Check Program – Linux: htop & sar – Windows: perfmon • Operating System – Linux: Ubuntu Linux Server 13.04 64bit, kernel 3.8.0-23 + max socket tuning – Windows: Windows Server 2012 64bit
  • 4. Test Environment - Client • Mac mini server 2012 late – Intel i7 quad-core, 16GB RAM, Gigabit Ethernet • Dummy Client Program – Simple packet generator using Boost.Asio 1.53 – # of Clients (session): 10000 – I/O threads: 8 – Buffer size per session: read 4096, write 4096
  • 5. Performance Test • Two Cases – NAGLE: Nagle’s algorithm ON – NODELAY: Nagle’s algorithm OFF • Dummy Client Program – Measuring server-throughput – Sending random data to the Server and receiving those from the server for 600 seconds • Test Server – Measuring server CPU usage for 600 seconds • 3 Times Measurement – Uses the median result • As a result, every test was practically the same.
  • 6. Performance Evaluation • No Session Drop – Both EPOLL and IOCP kept 10000 sessions alive during a test • Normalized Throughput – They were pretty much same in throughput 0 10 20 30 40 50 60 70 80 90 100 NODELAY NAGLE Normalized Throughput EPOLL IOCP
  • 7. Performance Evaluation • CPU Utilization – Average of 8-core usage – Consists of Most kernel-time and Slight user-time – IOCP defeated EPOLL 0% 2% 4% 6% 8% 10% 12% 14% NODELAY NAGLE Average CPU usage EPOLL IOCP
  • 8. Performance Evaluation • Average CPU Utilization Per Core (NODELAY mode) – Similar to results in case of NAGLE and NODELAY – EPOLL compared with IOCP • One of the CPU cores is consistently having high CPU utilization • While the other cores are close to the average utilization 0 10 20 30 40 50 60 70 EPOLL IOCP Average CPU usage per core (%) CORE 0 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7 Don’t care. It is Hyper Threading Effect NIC Receive Processing on only one core See “RSS queue”
  • 9. Update: New Experiment with RSS option • Average CPU Utilization Per Core (NAGLE mode) – Using RSS queue (a.k.a. NIC multi-queue) – Server HW: Mac-mini 2012 server (Broadcom BCM57766 NIC) – Server OS: Windows Server 2012 and Ubuntu Server 13.04 – Performance • Throughput: EPOLL’s is approximately equal to IOCP’s • Average CPU usage: virtually the same (EPOLL 7.38%, IOCP 6.8%) 0 5 10 15 20 EPOLL IOCP Average CPU Usage per Core (%) with RSS (NIC multi-queue) CORE 0 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7
  • 10. Summary • Throughput – There was little difference between IOCP and EPOLL • CPU usage – Without RSS (Multi-queue) • IOCP was more efficient than EPOLL in CPU utilization • EPOLL had consistently high CPU utilization compared with IOCP – With RSS mode • IOCP and EPOLL are about the same in CPU usage When making a high performance server for Linux, you should use RSS (multi-queue) supported NIC
  • 11. Reference: RSS Queue Linux: NIC Multi-queue Support Windows: NIC Receive Side Scaling http://msdn.microsoft.com/en-us/library/windows/hardware/ff556942(v=vs.85).aspx