SlideShare una empresa de Scribd logo
1 de 111
Descargar para leer sin conexión
HPC
             TCP/IP




  2009   5   29   SACSIS2009
(2)



            HPC      TCP/IP



•
    •                    1 Gbps
        RTT 100

        •                 240 Mbps ☹

        •         940 Mbps ☺



                                         2
(2)



                                                HPC                                                        TCP/IP


                          • Ethernet                                    PC


                               •              TCP/IP
                                          2                       10MB
                   1000                                                                                 1000

                    800                                x3.5                          Bandwidth (Mbps)    800
Bandwidth (Mbps)




                              160
                    600                                                                                  600

                    400                                                                                  400

                    200                                                                                  200

                      0                                                                                    0
                          0         100       200       300       400    500   600                             0   100   200       300       400   500   600
                                                    Time (msec)                                                                Time (msec)
                                                                                                                                                           3
(3)




•                      TCP/IP


    •
    • MPI

• TCP/IP    TCP

 • TCP      Reno [RFC2851]

 •          TCP   CUBIC   Compound TCP



                                           4
(4)




• TCP/IP
• TCP/IP
• TCP
• TCP/IP MPI
•
• TCP/IP
•

                 5
(5)




TCP/IP
(6)



TCP (Transport Control Protocol)

 •


 •



                                     7
(7)



        TCP

•             •
    •             •
    •             •
    • SACK            •
                      •
                      • ACK


                                8
(8)



        TCP
        •
            •
            •
                ACK        ACK

        •                                  ACK

                           #5      #4    #3     #2      #1
                 #5   #1


                           #2       #3    #4     #5     #6


※                                                            ACK
        1
    1
                                RTT (Round Trip Time)
                                                                     9
(9)




•                    RTO

    •         RTT α           ACK

•
    • ACK            3

    • 1 RTT                    RTO            +α

                #5       #4    #3    #2      #1



                 #2      #3          #3      #3

                                                   ACK
                                                  ACK
                     RTT (Round Trip Time)
                                                           10
(10)



TCP
•
• ACK           1 RTT


                                        RTT


               #8   #7    #6     #5
     #8   #1
#9                                            #4   #1
               #2    #3    #4    #5


                                        ACK

                RTT (Round Trip Time)
                                                     11
(11)



TCP
•
    •
•
    •




          12
(13)




•
➡               rwin

 •        ACK

 •                            rwin

                #8     #7   #6    #5
     #8    #1                                  rwin
#9                                                    #4#3
                #2     #3   #4    #5


                                         ACK

                 RTT (Round Trip Time)
                                                         13
(-)




     •

     •
     •
     ➔

                   B                               B/2
#5       #4   #3       #2   #1   #7 ... #4   #3   #2      #1




                             T                           2T
                                                                 14
(14)




•
➡               cwnd

  •
  •             cwnd


                #8     #7   #6     #5
      #8   #1                                   rwin
#9                                                     #4#3
                       cwnd
                 #2    #3   #4     #5


                                          ACK

                  RTT (Round Trip Time)
                                                          15
(15)




      •
      •        cwnd

          •2
 cwnd
 w



w/2
                      ssthresh


                           time
                                    16
(16)




      •              cwnd


      •       cwnd

          •                 →
 cwnd
 w



w/2
                                ssthresh


                                     time
                                              17
(12)




•                 rwin        ACK TCP

•                 cwnd

‣

     =min(cwnd, rwin)
                  #8     #7    #6    #5
     #8      #1                                   rwin
#9                                                       #4#3
                   #2    #3   #4     #5
                          cwnd
                                            ACK

                    RTT (Round Trip Time)
                                                            18
(17)



         ACK
 •
          RTT

     •


ACK




ACK




                  19
(17)



        ACK
•
           RTT

    •
• ACK
 •
(1) data



                 (2) ACK

                             20
(34)




TCP/IP
(35)



                                          TCP
• RTT   100


                    240 Mbps



                         : 1 Gbps
                    RTT: 100

              S                     R

              GbE                   GbE



                                                  22
(36)




•              ×

    •              1Gbps                     50
            RTT 100                     6.25MB


                                 : 50
        :          1 Gbps x 50          = 50 Mbit
1 Gbps                                  = 6.25 MB



               2
                                                      23
(37)




•                         x RTT

    • 1 Gbps × 50    x 2 = 12.5 MB

    •
•                                                 MB

                       6.25MB                          6.25MB

        12.5MB



                                                          out-
                    RTT (Round Trip Time)   of-order
                                                             24
(-)



        TCP

•
    ‣ sysctl
•
    ‣          API




                       25
(39)



sysctl                                                    net.*
                  sysctl                                                              default
    net.core.rmem_default                                                             109568
    net.core.rmem_max                                                                 131071
    net.core.wmem_default                                                             109568
    net.core.wmem_max                                                                 131071
    net.core.netdev_max_backlog                                                        1000
    net.ipv4.tcp_rmem                 TCP                [min, default, max]   4096, 87380, 4194304
    net.ipv4.tcp_wmem                 TCP                [min, default, max]   4096, 16384, 4194304
    net.ipv4.tcp_mem                                [low,pressure, high]       98304, 131072, 196608
    net.ipv4.tcp_no_metrics_save                                                        0
    net.ipv4.tcp_sack                             SACK                                  1
    net.ipv4.tcp_congestion_control         TCP                                        cubic


※                                                                               1GB
                                                                                                         26
(38)



              sysctl
•
       $ sysctl net.core.wmem_max
       net.core.wmem_max = 131071

       # sysctl -w net.core.wmem_max=1048576
       net.core.wmem_max=1048576


•   Linux procfs
       $ cat /proc/sys/net/core/wmem_max
       131071

       # echo 1048576 > /proc/sys/net/core/wmem_max


•
       $ sysctl -p   #              /etc/sysctl.conf
       $ sysctl -a | grep tcp #
                                                         27
(41)




   •       Linux


   •                            tcp_wmem
       tcp_rmem

       •               cwnd       4MB

TCP

       min              default         cwnd          max

net.ipv4.tcp_wmem[0]   tcp_wmem[1]                 tcp_wmem[2]
        (4KB)             (16KB)                      (4MB)

                                               ※
                                                                   28
(40)



                                           API
•
• setsockopt(2)                  getsockopt(2)
  int ssiz; /* send buffer size */
  cc = setsockopt(so, SOL_SOCKET, SO_SNDBUF, &ssiz, sizeof(ssiz));

  socklen_t optlen = sizeof(csiz);
  cc = getsockopt(so, SOL_SOCKET, SO_SNDBUF, &csiz, &optlen);



     SOL_SOCKET          SO_SNDBUF
     SOL_SOCKET          SO_RCVBUF

• SO_SNDBUF
• SO_SNDBUF                      net.core.wmem_max
                                                                       29
(42)



    setsockopt

• listen connect
• Linux                     2

 • man tcp(7)                          TCP




                  Iperf
  $ iperf -w 1m -c 192.168.0.2
  ------------------------------------------------------------
  Client connecting to 192.168.0.2, TCP port 5001
  TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte)
  ------------------------------------------------------------

                                                                   30
(43)




      •
      •
                                 1000

                                  900

                                  800

                                  700
              Bandwidth (Mbps)




                                  600

                                  500

                                  400

                                  300
   : 1 Gbps
                                  200
RTT: 100
                                  100
TCP: CUBIC                          0
                                                                sndbuf=12.5MB
                                        0   10   20         30      40      50   60
                                                      Time (Second)
                                                                                        31
(44)




•                                                      QoS
    NIC

•
•                       1000

                         900

                         800

                         700
     Bandwidth (Mbps)




                         600

                         500

                         400

                         300

                         200

                         100            sndbuf=12.5MB txqueuelen=10000
                                                         sndbuf=12.5MB
                                                                                           NIC
                           0
                               0   10     20         30      40      50   60
                                               Time (Second)                   NIC: Network Interface Card
                                                                                                         32
(45)




•                                                      tc
    $ tc -s qdisc show dev eth0
    qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
     Sent 41825726768 bytes 1383455 pkt (dropped 72, overlimits 0 requeues 25164)
     rate 0bit 0pps backlog 0b 0p requeues 25164


• ifconfig
    $ ifconfig eth0
    eth0    Link encap:Ethernet HWaddr 00:22:64:10:cc:9d
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::222:64ff:fe10:cc9d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:795673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:163559 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:72212519 (72.2 MB) TX bytes:29126498 (29.1 MB)
          Interrupt:17
    $ ifconfig eth0 txqueuelen 10000



                                                                                      33
(71)



                                              netstat -s
            •
            •                         Linux                 net-tools

$ netstat -s                                    TcpExt:
[snip]                                             55 resets received for embryonic SYN_RECV sockets
Tcp:                                               108 TCP sockets finished time wait in fast timer
   49 active connections openings                  10969 delayed acks sent
   4925 passive connection openings                Quick ack mode was activated 67 times
   56 failed connection attempts                   7961 packets directly queued to recvmsg prequeue.
   12 connection resets received                   43 bytes directly received in process context from prequeue
   1 connections established                       69443 packet headers predicted
   226842 segments received                        11162 acknowledgments not containing data payload received
   161944 segments send out                        110524 predicted acknowledgments
   210 segments retransmited                       4 times recovered from packet loss by selective acknowledgements
   1 bad segments received.                        37 congestion windows recovered without slow start after partial ack
   575 resets sent                                 4 TCP data loss events
[snip]                                             6 timeouts after SACK recovery
                                                   1 timeouts in loss state
                                                   5 fast retransmits
                                                   5 retransmits in slow start
                                                   167 other TCP timeouts
                                                [snip]


                                                                                                                            34
(46)




TCP
(47)




•       TCP                               cwnd


    •              cwnd

    •
                 1Gbps    RTT 200ms MTU 1500B
cwnd
         16000

                                                 8000


                         8000 RTT ≒ 27

                                                   Time
                                                            36
(48)




•
    • PSockets   GridFTP   (BitTrrent Swarming)

•
    •
        •                  TCP           TCP Friendly

        •
    •
        •
                                                          37
(50)




•
•

•                        Inter-protocol fairness

    • TCP Friendliness      TCP

•                        Intra-protocol fairness

    • RTT         RTT

     • Reno   RTT

                                                     38
(49)



  TCP
  NewReno                   RFC 2852
HighSpeed TCP               RFC 3649
 Scalable TCP
     BIC
   CUBIC                Linux
   H-TCP
    Vegas
 Westwood+
    FAST
    Illinois
    YeAH
Compound TCP        Windows Vista

                                       ≒
                ≒
                        RTT2 > RTT1



                                             39
(51)



                       CUBIC
•
            cwnd     Wmax


• RTT                    cubic
                                     cwnd

cwnd

           Steady state behavior              Probing
    Wmax
                                                           3
                                                3   Wmax β
                             Wcubic = C   T−                      + Wmax
                                                     C
    Wmin
                                                                     Time
                                                                              40
(52)



             Compound TCP

• Windows Vista                loss-based (cwnd)

•                                            Reno

                                           +
                               delay-based (dwnd)
    •
    •            Reno
                                               RTT



•



                                         =
                               cwnd + dwnd


    • dwnd   0          Reno
                                      loss free period

                                                           41
(53)



                        TCP
•                 TCP

    •
    •
        •
    •
        •
•
•           HPC           TCP

http://code.google.com/p/pspacer/wiki/DonkanTcp
                                               42
(54)




    • RTT
    •
    •
        →
                             RTT: 200
        RTT: 200
        BW: 500 Mbps


S                      R
                                        (   )
GbE                    GbE

 500Mbps
 1Gbps
                                                43
(55)




• Limited slow start [RFC 3742]
 •   max_ssthresh  cwnd = ssthresh     RTT
     cwnd      max_ssthresh/2

• Swift Start
 •   Packet-pair probing
     ACK

• HyStart (Hybrid slow start)
 •                                 RTT


 •     Linux 2.6.29 CUBIC
                                                 44
(18)




TCP/IP MPI
(19)



MPI                      TCP

 • MPI                  TCP
          e.g.

  •
  •
      •



                 Time          Time
                                        46
(20)



                                  PC
8 PCs                                                              8 PCs
                            WAN                                    Node8
Node0
 Host 0                                                             Host 0
  Host 0    Catalyst 3750    GtrcNET-1                               Host 0




                                                   Catalyst 3750
   Host 0                                                             Host 0




                                                                     ………
  ………




Node7                                                              Node15



                                   • CPU: Pentium4/2.4GHz
                                   • Memory: DDR400 512MB
                                   • NIC: Intel PRO/1000 (82547EI)
                                   • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                   • Socket buffer: 20MB
                                                                                 47
(-)




    8 PCs       2                     10MB                                   8 PCs
                                     WAN                                     Node8
    Node0
     Host 0                                                                   Host 0
      Host 0        Catalyst 3750     GtrcNET-1                                Host 0




                                                             Catalyst 3750
       Host 0                                                                   Host 0




                                                                               ………
      ………




    Node7                           RTT: 20 ms                               Node15
                                        : 500 Mbps

                                             • CPU: Pentium4/2.4GHz
                                             • Memory: DDR400 512MB
                                             • NIC: Intel PRO/1000 (82547EI)
1                                            • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                             • Socket buffer: 20MB
                                                                                           48
(21)




                    1000

                     800                                                                2         10MB
 Bandwidth (Mbps)




                     600

                     400

                     200

                       0
                           0           2         4                6         8      10


                                                                                        •
                                                     Time (sec)
                                                                                                   160   10MB
                    1000                                                                    500Mbps

                                                                                        •
                     800
                                                       x3.5
Bandwidth (Mbps)




                               160
                     600

                     400

                     200

                       0
                           0         100   200       300              400   500   600
                                                 Time (msec)
                                                                                                                  49
(22)




       •
           •   Linux RTO                 cwnd        [RFC 2861]

       • MPI
        • HTTP/1.1
       • kernel 2.6.18
                    net.ipv4. tcp_slow_start_after_idle=0


                                                      ssthresh
cwnd




                                                            time
                                                                     50
(23)




•
    •
    • cwnd                                    ACK
                                        →

             cwnd   cwnd




                                              ACK
                      RTT (Round Trip Time)
                                                      51
(24)




• ACK

 •   ACK                        RTT

 •                RTT cwnd

           cwnd           RTT cwnd




                                                ACK
                        RTT (Round Trip Time)
                                                        52
(25)




                   1000

                    800
Bandwidth (Mbps)




                    600
                                                                          2   10MB
                    400

                    200

                      0
                          0   100   200       300       400   500   600
                                          Time (msec)




                   1000
                                                                          •
                    800
                                                                                cwnd
Bandwidth (Mbps)




                    600

                    400
                                                                          •
                    200

                      0
                          0   100   200       300       400   500   600
                                          Time (msec)

                                                                                         53
(-)




8 PCs                                                              8 PCs
                             WAN                                   Node8
Node0
 Host 0                                                             Host 0
  Host 0    Catalyst 3750     GtrcNET-1                              Host 0




                                                   Catalyst 3750
   Host 0                                                             Host 0




                                                                     ………
  ………




Node7                       RTT: 0 ms                              Node15
                                : 1 Gbps

                                   • CPU: Pentium4/2.4GHz
                                   • Memory: DDR400 512MB
16                                 • NIC: Intel PRO/1000 (82547EI)
                                   • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                   • Socket buffer: 20MB
                                                                                 54
(29)



                                         All-to-all
      •
      • MPI
      •
                        data

          A   A0       A1      A2       A3       A0       B0   C0   D0
process




          B   B0       B1      B2       B3       A1       B1   C1   D1
          C   C0       C1      C2       C3       A2       B2   C2   D2
          D   D0       D1      D2       D3       A3       B3   C3   D3



          A        B                A        B        A        B



          C        D                C        D        C        D
                   1                         2                 3
                                                                           55
(28)




                                   1
                            250


                            200                        230Mbps                     •
         Bandwidth (Mbps)




                            150
                                                                                   •
                                                                                   •
                            100


                             50


                              0
                                   0
                                   32 KB200     400        600
                                                Message size (KB)
                                                                     800   1000




                            1000
                                                32KB
                                                                                   •
                             800
Bandwidth (Mbps)




                             600       200
                             400                                                   •   200

                                                                                   •
                             200

                               0
                                   0      500        1000           1500    2000
                                                  Time (msec)
                                                                                               56
(30)



                        RTO
    •           2                             RTO




    • TCP
            1                     2            3

        A           B         A       B   A         B
                        RTO                             A C
1

        C           D         C       D   C         D

        A           B         A       B   A         B
2

        C           D         C       D   C         D
                                                                57
(31)



           RTO
• RTO = RTT + α
 •α →
 •α →
                                   }
• RTT 0                  RTO            200
      RTO = SRTT + max(G, 4 x RTTVAR)
      SRTT = (1 - β) x SRTT + β x RTT
      RTTVAR = (1 - γ) x RTTVAR + γ x |SRTT - RTT|
      β = 1/8, γ = 1/4, G = 200 msec
                                          SRTT: Smoothed Round Trip Time

 RFC2988     G 1     RTT                       500
                                                                             58
(32)



                                                   RTO
                                1
                         250
                                                                                            RTO = SRTT + max(G, 4 x RTTVAR)
                         200


                                                                                               •                           G = 200
      Bandwidth (Mbps)




                         150


                         100
                                                                                               •                           G=0

                                                                                                               • (kernel 2.6.23
                          50
                                    32 KB                             w/o RTO fix
                                                                      w/ RTO fix
                                                                                                                                               ) ip route
                           0
                                0     200         400        600           800      1000
                                                  Message size (KB)


                                       RTO                                                                                RTO
                         1000                                                                                  1000

                          800                                                                                   800
Bandwidth (Mbps)




                                                                                            Bandwidth (Mbps)


                          600                                                                                   600

                          400                                                                                   400

                          200                                                                                   200

                            0                                                                                     0
                                0           500        1000              1500        2000                             0    500      1000         1500   2000
                                                    Time (msec)                                                                  Time (msec)
                                                                                                                                                              59
(-)




                         1
                   250
                                                                              •
                   200
                                                                                  •
Bandwidth (Mbps)




                   150




                                                                                  ‣
                   100


                    50
                             32 KB                       w/o RTO fix
                                                         w/ RTO fix


                                                                              •
                     0
                         0     200   400        600           800      1000
                                     Message size (KB)



                                                                                  • RTO
                                                                                  ‣ RTO
                                                                              •
                                                                                  •
                                                                                          RTO
                                                                                                  60
(33)



      MPI   TCP/IP



            ACK
RTO



            cwnd

            MPI      TCP            ,
                           ACS11   , 2005
                                          61
(65)
(57)




  63
(58)


    PSPacer


•

• Linux                          OS


•                GPL


    http://www.gridmpi.org/pspacer.jsp


      ,                                  ACS14   , 2006
                                                            64
(63)




•             500Mbps


•




    PSPacer             PSPacer   500Mbps
                                              65
(64)



                      TCP
      Sender                                     Receiver

 A
                                                            C
                               1 Gbps
                              RTT 200ms

 B                                                          D

            w/o PSPacer                   w/ PSPacer
                           535 Mbps            980 Mbps

                                               490 Mbps
            394 Mbps
                                               490 Mbps

                141 Mbps
                               TCP

※Scalable TCP
                                                                  66
(62)



                  PSPacer 10GbE
                          100Mbps       Iperf 5
         GtrcNET-10
         10
(Gbps)




          8


                                                    PSPacer 10GbE
          6




          4
                                                           HTB: Hierarchical Token Bucket
                                                            Linux

          2

                                         PSPacer+
                                              HTB
          0                                                CPU: Quad-core Xeon (E5430) x 2
              0       2         4   6          8      10   NIC: Myricom Myri-10G (PCIe x 8)
                                                                MTU: 9000 byte
                                                           Memory: 4GB DDR2-667
                                        (Gbps)
                                                                                                67
(66)




•

    •
• ip
        $ ip route show cached to 192.168.0.2
        192.168.0.2 from 192.168.0.1 dev eth1
           cache mtu 9000 rtt 187ms rttvar 175ms ssthresh 1024 cwnd 637
        advmss 8960 hoplimit 64


•
        # sysctl -w net.ipv4.tcp_no_metrics_save=1
        net.ipv4.tcp_no_metrics_save=1


                                                                            68
(67)




•
    •
        GbE      MTU 1500B → 949.3 Mbps

        (Gb      MTU 9000B → 991.4 Mbps
•                               “ping -M do” IP                             “Don’t
    fragment” bit on
          $ ping -M do -s 9000 192.168.0.2                                  NG
          PING 192.168.0.2 (192.168.0.2) 9000(9028) bytes of data.
          From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
          From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
          From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)

          $ ping -M do -s 8972 192.168.0.2
          PING 192.168.0.2 (192.168.0.2) 8972(9000) bytes of data.          OK
          8980 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=3.76 ms
          8980 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.127 ms
          8980 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.133 ms


                                                                                       69
(68)




•   Infiniband      Myrinet

        •

•                                        IEEE 802.3x

    •                                                  PAUSE


    •
            $ sudo ethtool -A eth0 tx on rx on

            $ sudo ethtool -a eth0
            Pause parameters for eth1:
            Autonegotiate:	

off
            RX:	

 	

 off
            TX:	

 	

 off
                                                   Lossless Ethernet
                                                                         70
(69)




TCP/IP
(70)



Rough consensus, running code
 •                               RFC

     •        Reno [RFC 2851]

 •
     • net/ipv4
             180

     •             regression


         •    BIC     init_ssthresh ABC

 •
     •
                                            72
(72)



                    Web100
•        Pittsburgh Supercomputing Center (PSC)


•                                       procfs


    • netstat -s                          [RFC 4898]

•
•

                                  http://www.web100.org/
                                                           73
(72)



                 TCP Probe
• KProbes            TCP



    •
            procfs

•
•



                               74
(73)



TCP Probe
                              cwnd ssthresh




   http://www.linuxfoundation.org/en/Net:TCP_testing
                                                       75
(76)




1.                                 ×RTT
                         4MB

2. setsockopt   listen   connect

3.


4.    TCP


5.




                                            76
(77)




6.
            tcp_slow_start_after_idle

7.


8.                RTO
      RTO

9.


10.


                                          77
TCP
              ARPANET                                  NCP
              TCP IP         (1970           )

                                                       TCP
                             RFC793
                            (standard)       TCP                 IPv4
                   (1986)
    →                                Tahoe
                RFC2851
               (standard)      Reno
        RFC2852
     (experimental)     NewReno                          Vegas

              HSTCP
                                                         FAST
ScalableTCP                       HTCP
                      BIC
          CUBIC                          CompoundTCP         TCP
                                                                        79
RFC
                   Spec.                                  sysctl             default
                TCP (RFC 793)                                -                  -
          Window scaling (RFC 1323)                tcp_window_scaling          1
        TImestamps option (RFC 1323)                 tcp_timestamps            1
TIME-WAIT Assassination Hazards (RFC 1337)             tcp_rfc1337             0
            SACK (RFC 2018, 3517)                        tcp_sack              1
       Control block sharing (RFC 2140)            tcp_no_metrics_save         0
       MD5 signature option (RFC 2385)                       -                  -
         Initial window size (RFC 2414)                      -                  -
        Congestion control (RFC 2581)                        -                  -
             NewReno (RFC 3782)                              -                  -
          Cwnd validation (RFC 2861)             tcp_slow_start_after_idle     1
             D-SACK (RFC 2883)                          tcp_dsack              1
         Limited transmit (RFC 3042)                         -                  -
   Explicit congestion notification (RFC 3168)            tcp_ecn               0
     Appropriate Byte Counting (RFC 3465)                tcp_abc               0
         Limited slow start (RFC 3742)              tcp_max_ssthresh           0
               FRTO (RFC 4138)                           tcp_frto              0
                 Forward ACK                             tcp_fack              1
              Path MTU discovery                     tcp_mtu_probing           0
                                                                                       80
URL
    Project                               URL
  BIC/CUBIC        http://netsrv.csc.ncsu.edu/twiki/bin/view/Main/BIC
     FAST                   http://netlab.caltech.edu/FAST/
TCP Westwood+          http://c3lab.poliba.it/index.php/Westwood
  Scalable TCP         http://www.deneholme.net/tom/scalable/
 HighSpeed TCP            http://www.icir.org/floyd/hstcp.html
    H-TCP             http://www.hamilton.ie/net/htcp/index.htm
  TCP-Illinois       http://www.ews.uiuc.edu/~shaoliu/tcpillinois/
Compound TCP       http://research.microsoft.com/en-us/projects/ctcp/
   TCP Vegas         http://www.cs.arizona.edu/projects/protocols/
TCP Westwood             http://www.cs.ucla.edu/NRL/hpi/tcpw/
  Circuit TCP            http://www.ece.virginia.edu/cheetah/
   TCP Hybla                   https://hybla.deis.unibo.it/
TCP Low Priority      http://www.ece.rice.edu/networks/TCP-LP/
     UDT                     http://www.cs.uic.edu/~ygu1/
     XCP                 http://www.ana.lcs.mit.edu/dina/XCP/
    Web100                     http://www.web100.org/
  TCP Probe        http://www.linuxfoundation.org/en/Net:TcpProbe
    iproute2        http://www.linuxfoundation.org/en/Net:Iproute2
                                                                        81
1000

                    800
Bandwidth (Mbps)




                    600                  cwnd=600


                                                               • cwnd
                    400

                    200        cwnd=50
                                                                             50→600
                      0

                                                               •
                          -1      0        1      2    3   4
                                          Time (sec)




                                                                   BIC TCP

                                                                                      83
•
                   1000

                    800
Bandwidth (Mbps)




                    600
                                                                                 cwnd=49
                    400




                                                                             •
                    200

                      0

                                                                                 cwnd=582
                          0   0.5   1   1.5       2      2.5   3   3.5   4
                                              Time (sec)




                   1000
                                                                                 •
                    800
                                                                                            ssthresh
Bandwidth (Mbps)




                    600

                    400

                    200
                                                                                 •
                      0
                          0   0.5   1   1.5       2      2.5   3   3.5   4
                                              Time (sec)


                                                                                                       84
•
    • tcpdump
    • Wireshark (a.k.a. Etherreal)
•
    • netstat
    • Web100
    • TCP Probe
•
    • netem
                                     85
•                                 OS
    HPC

    •   1996-2001            OS

    •   2002-2004

    •   2003-

    •   2003-       TCP/IP

    •   2008-


                                       86
•
•

    •   TCP

    •   RED          AQM       ECN




              RED: Random Early Detection
              AQM: Active Queue Management
              ECN: Explicit Congestion Notification
                                                  87
End-to-end
•
    •
    •
•

•
    • cf. ECN (Explicit Congestion Notification)

                                                   88
SACK


1   2   3   4   5   6   7   9     10   11   13   15   16




                                             8




1   2   3   4   5   6   7   9    10    11   13   15   16




                                TCP
                                                           4
                                                               89
ACK

       10
       11
                     •      ACK   ACK   3
       12   X
RTT1




       13
       14       9
       15       9
                9
                     •   1 RTT
                9
                9

                     •
       10
       16
       17
RTT2




                15
                16
                17




                                            90
•
          •
          •                       RTO



              data         data

RTT
                              X
              ACK    RTO




                                        RTO: Retransmission
                                              TimeOut
                                        RTT: Round Trip Time
                                                           91
AIMD (Additive Increase
        Multiplicative Decrease)
•   cwnd

    •   α:          β:

             ACK
                        α
        cwnd ← cwnd +           cwnd ← (1 − β)cwnd
                      cwnd
                         cwnd          RTT
                         w
Reno: α=1, β=0.5                                w/2




                                                time
                                                       92
TCP cwnd
   Example of Growth Functions




                                                              HTCP
                                                      STCP      HSTCP BIC
                                        Window Size


                                                                            CUBIC

                                                                            Time


H.Cai, et al., “Stochastic Ordering for Internet Congestion Control,” PFLDnet2007   93
#1   #2   #3   #4   #5   #6   #7   #8



#1   #2   #3   #4   #5   #6   #7   #8



#1   #2   #3   #4   #5   #6   #7   #8




                                        94
•
•
            8                 80
                        80

(1) data
                  100Kbps
    1Mbps                     1Mbps
      80                     80



     (3) data                (2) ACK
            ACK
                  100Kbps
    1Mbps                     1Mbps
                                       95
CLOSED

            passive open            appl: listen()
                                                                     appl: connect()
                                                                     send: SYN
                 recv: SYN
                 send: SYN, ACK                 LISTEN                         active open

     SYN_RECV                             recv: SYN                            SYN_SENT
                                          send: SYN, ACK
                   recv: ACK                                        recv: SYN, ACK
                                             ESTABLISHED            send: ACK

      active close
                                                              recv: FIN        CLOSE_WAIT
                                                              send: ACK
                     recv: FIN
      FIN_WAIT_1                    CLOSING                                                appl: close()
                     send: ACK                                passive close                send: FIN
                   recv: FIN, ACK
recv: ACK                                recv: ACK
                   send: ACK                                                    LAST_ACK
                                                                                                 recv: ACK
                     recv: FIN      TIME_WAIT
      FIN_WAIT_2
                     send: ACK
                                                     2MSL timeout                      ※
                                                                                                             96
lfd=socket(...);
              fd=socket(...);      (1) SYN (SeqNo=1000)            bind(lfd,...);
              connect(fd,...);
                                                                   listen(lfd,...);




                                                                                         LISTEN
SYN_SENT




                                   (2) SYN, ACK (SeqNo=200,
                                         AckNo=1001)




                                                                                         SYN_RCVD
                                             (3) ACK (AckNo=201)
ESTABLISHED




                                                                   fd=accept(lfd,...);




                                                                                         LISHED
                  ※              SYN         ACK
                                                                                              97
close(fd);
                                               (1) FIN (SeqNo=2000)
TIME_WAIT FIN_WAIT_2 FIN_WAIT_1




                                                                                 EOF
                                                   (2) ACK (AckNo=2001)




                                                                                                          CLOSE_
                                                                                                           WAIT
                                                           (3) FIN (SeqNo=700)   close(fd);




                                                                                                            CLOSED LAST_ACK
                                                    (4) ACK (AckNo=701)

                                    2 MSL



                                               ※FIN-ACK
                                                                          MSL: Maximum Segment Lifetime
                                                                                                                        98
TIME_WAIT
•                        FIN     2MSL     Linux
      60

    • FIN
•
    • bind    EADDRINUSE           Linux listen


    • setsockopt(SO_REUSEADDR)
    •

                                                  99
•
                                         MTU

•   OS

          Guest OS
               eth0

    MTU 1500          Host OS        MTU 9000
           veth0                  eth0
                        br0
                                MTU 1500
                                MTU 9000       ※         eth0   IP
                                                   br0
                                                                     100
Linux

• kernel 2.6.29          13

  •                   CUBIC

•
     $ sysctl net.ipv4.tcp_congestion_control
     cubic



•
     $ sysctl -w net.ipv4.tcp_congestion_control=htcp
     net.ipv4.tcp_congestion_control = htcp




                                                        101
Ethernet
Source MAC     Destination MAC Type
                                                    Payload (46 -- 1500)             FCS (4)
 address (6)     address (6)    (2)



                              0800            IPv4 datagram (46 -- 1500)



                                      VLAN
                              8100     tag   0800            IPv4 datagram (46 -- 1500)



    •                                                          8
                     IFG 12

    •   Taged-VLAN                                                 4
                                                                   MAC: Media Access Control
                                                                   IFG: Inter Frame Gap
                                                                   FCS: Frame Check Sequence
                                                                                               102
IPv4
0                                                                                           31
    Version     Header length          TOS                                Length
                        Identifier                      0   DF MF          Fragment offset
              TTL                    Protocol                         Header checksum
                                             Source IP address
                                           Destination IP address

                                          Options (variable length)



                                      Payload (variable length)




     Flags:
          DF                    Don’t Fragment               MF             More Fragment

                                                                                                 103
IPv6
0                                                                                31
    Version     Traffic class                            Flow label
               Payload length                     Next header        Hop limit


                                Source IP address (128 bit)


                          Destination IP address (128 bit)


                        Extension header (variable length)


                                 Payload (variable length)




                                                                                      104
TCP
0                                                    31




                    U   A   P   R   S   F




                                              40




    Flags:
     ACK(A)                                 FIN(F)
     PSH(P)                                 RST(R)
     SYN(S)                                 URG(U)

                                                          105
TCP
kind      length        meaning
  0          -
  1          -              NOP
  2          4                     MSS
  3          3
  4          2                SACK
  5    2+8N (1≦N≦4)             SACK
  8        10
 19        18         MD5
 27         8



                                         106
TCP
      MSS SACK
         Window Scale                        SACK
kind=2    len=4              MSS            NOP     NOP      kind=8   len=10
kind=4    len=2     kind=8         len=10             Time Stamp
             Time Stamp
                                            NOP     NOP      kind=5   len=18
NOP       kind=3    len=3          WScale           SACK Block [0]


                                                    SACK Block [1]


                                                    SACK Block [2]




                                                    NOP
                                                                               107
TCP/IP    Vol.1
             W.


              2000/12




             2001/4

High Performance TCP/IP Networking: Concepts, Issues, and Solutions
            Raj Jain Mahbub Hassan
             Pearson Prentice Hall
             2003/10

Linux




              2006/11

                                                                      108
•
    •
                               1/2

    •
        • 1Gbps, MTU 1500 B   12

        • OS              1   10
                                   12 us 12 us

                                          1500B
                                                  Time




                                                         109
PSPacer

 •
      GbE, 1       =8
 •
 •
                          12 us 12 us

                          1500B   1500B
          9000   6000   3000              0   (byte)




                                                       110
• PAUSE   IEEE 802.3x

  •
    •
      •
  •

   PC




                        111

Más contenido relacionado

La actualidad más candente

Interrupt Affinityについて
Interrupt AffinityについてInterrupt Affinityについて
Interrupt AffinityについてTakuya ASADA
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理Takuya ASADA
 
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)NTT DATA Technology & Innovation
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...Tomoya Hibi
 
TiDBのトランザクション
TiDBのトランザクションTiDBのトランザクション
TiDBのトランザクションAkio Mitobe
 
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)NTT DATA Technology & Innovation
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)NTT DATA Technology & Innovation
 
単なるキャッシュじゃないよ!?infinispanの紹介
単なるキャッシュじゃないよ!?infinispanの紹介単なるキャッシュじゃないよ!?infinispanの紹介
単なるキャッシュじゃないよ!?infinispanの紹介AdvancedTechNight
 
コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線Motonori Shindo
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールMITSUNARI Shigeo
 
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)今こそ知りたいSpring Batch(Spring Fest 2020講演資料)
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)NTT DATA Technology & Innovation
 
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTT DATA Technology & Innovation
 
StackStormを活用した運用自動化の実践
StackStormを活用した運用自動化の実践StackStormを活用した運用自動化の実践
StackStormを活用した運用自動化の実践Shu Sugimoto
 
GNU AGPLv3について(On GNU AGPLv3)
GNU AGPLv3について(On GNU AGPLv3)GNU AGPLv3について(On GNU AGPLv3)
GNU AGPLv3について(On GNU AGPLv3)真行 八田
 
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークSeastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークTakuya ASADA
 
本当は恐ろしい分散システムの話
本当は恐ろしい分散システムの話本当は恐ろしい分散システムの話
本当は恐ろしい分散システムの話Kumazaki Hiroki
 
ネットワークでなぜ遅延が生じるのか
ネットワークでなぜ遅延が生じるのかネットワークでなぜ遅延が生じるのか
ネットワークでなぜ遅延が生じるのかJun Kato
 

La actualidad más candente (20)

Interrupt Affinityについて
Interrupt AffinityについてInterrupt Affinityについて
Interrupt Affinityについて
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理
 
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...
[D20] 高速Software Switch/Router 開発から得られた高性能ソフトウェアルータ・スイッチ活用の知見 (July Tech Fest...
 
TiDBのトランザクション
TiDBのトランザクションTiDBのトランザクション
TiDBのトランザクション
 
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
 
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
Kubernetesでの性能解析 ~なんとなく遅いからの脱却~(Kubernetes Meetup Tokyo #33 発表資料)
 
単なるキャッシュじゃないよ!?infinispanの紹介
単なるキャッシュじゃないよ!?infinispanの紹介単なるキャッシュじゃないよ!?infinispanの紹介
単なるキャッシュじゃないよ!?infinispanの紹介
 
コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線コンテナネットワーキング(CNI)最前線
コンテナネットワーキング(CNI)最前線
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツール
 
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)
オンライン物理バックアップの排他モードと非排他モードについて(第15回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)今こそ知りたいSpring Batch(Spring Fest 2020講演資料)
今こそ知りたいSpring Batch(Spring Fest 2020講演資料)
 
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)
スケールアウトするPostgreSQLを目指して!その第一歩!(NTTデータ テクノロジーカンファレンス 2020 発表資料)
 
ヤフー発のメッセージキュー「Pulsar」のご紹介
ヤフー発のメッセージキュー「Pulsar」のご紹介ヤフー発のメッセージキュー「Pulsar」のご紹介
ヤフー発のメッセージキュー「Pulsar」のご紹介
 
StackStormを活用した運用自動化の実践
StackStormを活用した運用自動化の実践StackStormを活用した運用自動化の実践
StackStormを活用した運用自動化の実践
 
GNU AGPLv3について(On GNU AGPLv3)
GNU AGPLv3について(On GNU AGPLv3)GNU AGPLv3について(On GNU AGPLv3)
GNU AGPLv3について(On GNU AGPLv3)
 
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークSeastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
 
本当は恐ろしい分散システムの話
本当は恐ろしい分散システムの話本当は恐ろしい分散システムの話
本当は恐ろしい分散システムの話
 
ネットワークでなぜ遅延が生じるのか
ネットワークでなぜ遅延が生じるのかネットワークでなぜ遅延が生じるのか
ネットワークでなぜ遅延が生じるのか
 

Destacado

Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user spaceSusant Sahani
 
AWS re:Inventに行くために今日からやるべき3つのこと
AWS re:Inventに行くために今日からやるべき3つのことAWS re:Inventに行くために今日からやるべき3つのこと
AWS re:Inventに行くために今日からやるべき3つのこと真吾 吉田
 
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考える
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考えるサバ缶のない世界でスカイアーチはこの先生きのこれるのか考える
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考える真吾 吉田
 
研究法(Claimとは)
研究法(Claimとは)研究法(Claimとは)
研究法(Claimとは)Jun Rekimoto
 
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsug
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsugJAWS目黒 EC2チューニングTips #jawsmeguro #jawsug
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsugYasuhiro Matsuo
 
優れた研究論文の書き方―7つの提案
優れた研究論文の書き方―7つの提案優れた研究論文の書き方―7つの提案
優れた研究論文の書き方―7つの提案Masanori Kado
 
iSCSI Protocol and Functionality
iSCSI Protocol and FunctionalityiSCSI Protocol and Functionality
iSCSI Protocol and FunctionalityLexumo
 

Destacado (8)

Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user space
 
AWS re:Inventに行くために今日からやるべき3つのこと
AWS re:Inventに行くために今日からやるべき3つのことAWS re:Inventに行くために今日からやるべき3つのこと
AWS re:Inventに行くために今日からやるべき3つのこと
 
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考える
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考えるサバ缶のない世界でスカイアーチはこの先生きのこれるのか考える
サバ缶のない世界でスカイアーチはこの先生きのこれるのか考える
 
profile
profileprofile
profile
 
研究法(Claimとは)
研究法(Claimとは)研究法(Claimとは)
研究法(Claimとは)
 
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsug
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsugJAWS目黒 EC2チューニングTips #jawsmeguro #jawsug
JAWS目黒 EC2チューニングTips #jawsmeguro #jawsug
 
優れた研究論文の書き方―7つの提案
優れた研究論文の書き方―7つの提案優れた研究論文の書き方―7つの提案
優れた研究論文の書き方―7つの提案
 
iSCSI Protocol and Functionality
iSCSI Protocol and FunctionalityiSCSI Protocol and Functionality
iSCSI Protocol and Functionality
 

Similar a HPCユーザが知っておきたいTCP/IPの話 ~クラスタ・グリッド環境の落とし穴~

SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfHiroshi Ono
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfHiroshi Ono
 
Jitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinJitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinBoris Grozev
 
IPv6 Fundamentals & Securities
IPv6 Fundamentals & SecuritiesIPv6 Fundamentals & Securities
IPv6 Fundamentals & SecuritiesDon Anto
 
ARM LPC2300/LPC2400 TCP/IP Stack Porting
ARM LPC2300/LPC2400 TCP/IP Stack PortingARM LPC2300/LPC2400 TCP/IP Stack Porting
ARM LPC2300/LPC2400 TCP/IP Stack PortingMathivanan Elangovan
 
MAP-E as IPv4 over IPv6 Technology - with some operational experiences
MAP-E as IPv4 over IPv6 Technology - with some operational experiencesMAP-E as IPv4 over IPv6 Technology - with some operational experiences
MAP-E as IPv4 over IPv6 Technology - with some operational experiencesAPNIC
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)dynamis
 
SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]Takuya ASADA
 
MAP-E as IPv4 over IPv6 Technology
MAP-E as IPv4 over IPv6 TechnologyMAP-E as IPv4 over IPv6 Technology
MAP-E as IPv4 over IPv6 TechnologyAkira Nakagawa
 
Part 9 : Congestion control and IPv6
Part 9 : Congestion control and IPv6Part 9 : Congestion control and IPv6
Part 9 : Congestion control and IPv6Olivier Bonaventure
 
Innovation is back in the transport and network layers
Innovation is back in the transport and network layersInnovation is back in the transport and network layers
Innovation is back in the transport and network layersOlivier Bonaventure
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
Improving Distributed TCP Caching for Wireless Sensor Networks
Improving Distributed TCP Caching for Wireless Sensor NetworksImproving Distributed TCP Caching for Wireless Sensor Networks
Improving Distributed TCP Caching for Wireless Sensor NetworksAhmed Ayadi
 
Collaborate nfs kyle_final
Collaborate nfs kyle_finalCollaborate nfs kyle_final
Collaborate nfs kyle_finalKyle Hailey
 

Similar a HPCユーザが知っておきたいTCP/IPの話 ~クラスタ・グリッド環境の落とし穴~ (20)

SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdf
 
SACSIS2009_TCP.pdf
SACSIS2009_TCP.pdfSACSIS2009_TCP.pdf
SACSIS2009_TCP.pdf
 
Jitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and KotlinJitsi Videobridge, Octopodes, and Kotlin
Jitsi Videobridge, Octopodes, and Kotlin
 
IPv6 Fundamentals & Securities
IPv6 Fundamentals & SecuritiesIPv6 Fundamentals & Securities
IPv6 Fundamentals & Securities
 
ARM LPC2300/LPC2400 TCP/IP Stack Porting
ARM LPC2300/LPC2400 TCP/IP Stack PortingARM LPC2300/LPC2400 TCP/IP Stack Porting
ARM LPC2300/LPC2400 TCP/IP Stack Porting
 
SCTP Tutorial
SCTP TutorialSCTP Tutorial
SCTP Tutorial
 
MAP-E as IPv4 over IPv6 Technology - with some operational experiences
MAP-E as IPv4 over IPv6 Technology - with some operational experiencesMAP-E as IPv4 over IPv6 Technology - with some operational experiences
MAP-E as IPv4 over IPv6 Technology - with some operational experiences
 
Sctp tutorial
Sctp tutorialSctp tutorial
Sctp tutorial
 
CN Jntu PPT
CN Jntu PPTCN Jntu PPT
CN Jntu PPT
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)
 
Transportlayer tanenbaum
Transportlayer tanenbaumTransportlayer tanenbaum
Transportlayer tanenbaum
 
SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]
 
MAP-E as IPv4 over IPv6 Technology
MAP-E as IPv4 over IPv6 TechnologyMAP-E as IPv4 over IPv6 Technology
MAP-E as IPv4 over IPv6 Technology
 
Part 9 : Congestion control and IPv6
Part 9 : Congestion control and IPv6Part 9 : Congestion control and IPv6
Part 9 : Congestion control and IPv6
 
Innovation is back in the transport and network layers
Innovation is back in the transport and network layersInnovation is back in the transport and network layers
Innovation is back in the transport and network layers
 
Tcp ip
Tcp ipTcp ip
Tcp ip
 
Services
ServicesServices
Services
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
Improving Distributed TCP Caching for Wireless Sensor Networks
Improving Distributed TCP Caching for Wireless Sensor NetworksImproving Distributed TCP Caching for Wireless Sensor Networks
Improving Distributed TCP Caching for Wireless Sensor Networks
 
Collaborate nfs kyle_final
Collaborate nfs kyle_finalCollaborate nfs kyle_final
Collaborate nfs kyle_final
 

Más de Ryousei Takano

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive ComputingRyousei Takano
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentRyousei Takano
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価Ryousei Takano
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)Ryousei Takano
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network ProcessingRyousei Takano
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksRyousei Takano
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術Ryousei Takano
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告Ryousei Takano
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchRyousei Takano
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何かRyousei Takano
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...Ryousei Takano
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~Ryousei Takano
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green CloudRyousei Takano
 
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterIris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterRyousei Takano
 

Más de Ryousei Takano (20)

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive Computing
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
 
ABCI Data Center
ABCI Data CenterABCI Data Center
ABCI Data Center
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center Networks
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
 
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterIris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
 

Último

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Último (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

HPCユーザが知っておきたいTCP/IPの話 ~クラスタ・グリッド環境の落とし穴~

  • 1. HPC TCP/IP 2009 5 29 SACSIS2009
  • 2. (2) HPC TCP/IP • • 1 Gbps RTT 100 • 240 Mbps ☹ • 940 Mbps ☺ 2
  • 3. (2) HPC TCP/IP • Ethernet PC • TCP/IP 2 10MB 1000 1000 800 x3.5 Bandwidth (Mbps) 800 Bandwidth (Mbps) 160 600 600 400 400 200 200 0 0 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Time (msec) Time (msec) 3
  • 4. (3) • TCP/IP • • MPI • TCP/IP TCP • TCP Reno [RFC2851] • TCP CUBIC Compound TCP 4
  • 5. (4) • TCP/IP • TCP/IP • TCP • TCP/IP MPI • • TCP/IP • 5
  • 7. (6) TCP (Transport Control Protocol) • • 7
  • 8. (7) TCP • • • • • • • SACK • • • ACK 8
  • 9. (8) TCP • • • ACK ACK • ACK #5 #4 #3 #2 #1 #5 #1 #2 #3 #4 #5 #6 ※ ACK 1 1 RTT (Round Trip Time) 9
  • 10. (9) • RTO • RTT α ACK • • ACK 3 • 1 RTT RTO +α #5 #4 #3 #2 #1 #2 #3 #3 #3 ACK ACK RTT (Round Trip Time) 10
  • 11. (10) TCP • • ACK 1 RTT RTT #8 #7 #6 #5 #8 #1 #9 #4 #1 #2 #3 #4 #5 ACK RTT (Round Trip Time) 11
  • 12. (11) TCP • • • • 12
  • 13. (13) • ➡ rwin • ACK • rwin #8 #7 #6 #5 #8 #1 rwin #9 #4#3 #2 #3 #4 #5 ACK RTT (Round Trip Time) 13
  • 14. (-) • • • ➔ B B/2 #5 #4 #3 #2 #1 #7 ... #4 #3 #2 #1 T 2T 14
  • 15. (14) • ➡ cwnd • • cwnd #8 #7 #6 #5 #8 #1 rwin #9 #4#3 cwnd #2 #3 #4 #5 ACK RTT (Round Trip Time) 15
  • 16. (15) • • cwnd •2 cwnd w w/2 ssthresh time 16
  • 17. (16) • cwnd • cwnd • → cwnd w w/2 ssthresh time 17
  • 18. (12) • rwin ACK TCP • cwnd ‣ =min(cwnd, rwin) #8 #7 #6 #5 #8 #1 rwin #9 #4#3 #2 #3 #4 #5 cwnd ACK RTT (Round Trip Time) 18
  • 19. (17) ACK • RTT • ACK ACK 19
  • 20. (17) ACK • RTT • • ACK • (1) data (2) ACK 20
  • 22. (35) TCP • RTT 100 240 Mbps : 1 Gbps RTT: 100 S R GbE GbE 22
  • 23. (36) • × • 1Gbps 50 RTT 100 6.25MB : 50 : 1 Gbps x 50 = 50 Mbit 1 Gbps = 6.25 MB 2 23
  • 24. (37) • x RTT • 1 Gbps × 50 x 2 = 12.5 MB • • MB 6.25MB 6.25MB 12.5MB out- RTT (Round Trip Time) of-order 24
  • 25. (-) TCP • ‣ sysctl • ‣ API 25
  • 26. (39) sysctl net.* sysctl default net.core.rmem_default 109568 net.core.rmem_max 131071 net.core.wmem_default 109568 net.core.wmem_max 131071 net.core.netdev_max_backlog 1000 net.ipv4.tcp_rmem TCP [min, default, max] 4096, 87380, 4194304 net.ipv4.tcp_wmem TCP [min, default, max] 4096, 16384, 4194304 net.ipv4.tcp_mem [low,pressure, high] 98304, 131072, 196608 net.ipv4.tcp_no_metrics_save 0 net.ipv4.tcp_sack SACK 1 net.ipv4.tcp_congestion_control TCP cubic ※ 1GB 26
  • 27. (38) sysctl • $ sysctl net.core.wmem_max net.core.wmem_max = 131071 # sysctl -w net.core.wmem_max=1048576 net.core.wmem_max=1048576 • Linux procfs $ cat /proc/sys/net/core/wmem_max 131071 # echo 1048576 > /proc/sys/net/core/wmem_max • $ sysctl -p # /etc/sysctl.conf $ sysctl -a | grep tcp # 27
  • 28. (41) • Linux • tcp_wmem tcp_rmem • cwnd 4MB TCP min default cwnd max net.ipv4.tcp_wmem[0] tcp_wmem[1] tcp_wmem[2] (4KB) (16KB) (4MB) ※ 28
  • 29. (40) API • • setsockopt(2) getsockopt(2) int ssiz; /* send buffer size */ cc = setsockopt(so, SOL_SOCKET, SO_SNDBUF, &ssiz, sizeof(ssiz)); socklen_t optlen = sizeof(csiz); cc = getsockopt(so, SOL_SOCKET, SO_SNDBUF, &csiz, &optlen); SOL_SOCKET SO_SNDBUF SOL_SOCKET SO_RCVBUF • SO_SNDBUF • SO_SNDBUF net.core.wmem_max 29
  • 30. (42) setsockopt • listen connect • Linux 2 • man tcp(7) TCP Iperf $ iperf -w 1m -c 192.168.0.2 ------------------------------------------------------------ Client connecting to 192.168.0.2, TCP port 5001 TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte) ------------------------------------------------------------ 30
  • 31. (43) • • 1000 900 800 700 Bandwidth (Mbps) 600 500 400 300 : 1 Gbps 200 RTT: 100 100 TCP: CUBIC 0 sndbuf=12.5MB 0 10 20 30 40 50 60 Time (Second) 31
  • 32. (44) • QoS NIC • • 1000 900 800 700 Bandwidth (Mbps) 600 500 400 300 200 100 sndbuf=12.5MB txqueuelen=10000 sndbuf=12.5MB NIC 0 0 10 20 30 40 50 60 Time (Second) NIC: Network Interface Card 32
  • 33. (45) • tc $ tc -s qdisc show dev eth0 qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 41825726768 bytes 1383455 pkt (dropped 72, overlimits 0 requeues 25164) rate 0bit 0pps backlog 0b 0p requeues 25164 • ifconfig $ ifconfig eth0 eth0 Link encap:Ethernet HWaddr 00:22:64:10:cc:9d inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: fe80::222:64ff:fe10:cc9d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:795673 errors:0 dropped:0 overruns:0 frame:0 TX packets:163559 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:72212519 (72.2 MB) TX bytes:29126498 (29.1 MB) Interrupt:17 $ ifconfig eth0 txqueuelen 10000 33
  • 34. (71) netstat -s • • Linux net-tools $ netstat -s TcpExt: [snip] 55 resets received for embryonic SYN_RECV sockets Tcp: 108 TCP sockets finished time wait in fast timer 49 active connections openings 10969 delayed acks sent 4925 passive connection openings Quick ack mode was activated 67 times 56 failed connection attempts 7961 packets directly queued to recvmsg prequeue. 12 connection resets received 43 bytes directly received in process context from prequeue 1 connections established 69443 packet headers predicted 226842 segments received 11162 acknowledgments not containing data payload received 161944 segments send out 110524 predicted acknowledgments 210 segments retransmited 4 times recovered from packet loss by selective acknowledgements 1 bad segments received. 37 congestion windows recovered without slow start after partial ack 575 resets sent 4 TCP data loss events [snip] 6 timeouts after SACK recovery 1 timeouts in loss state 5 fast retransmits 5 retransmits in slow start 167 other TCP timeouts [snip] 34
  • 36. (47) • TCP cwnd • cwnd • 1Gbps RTT 200ms MTU 1500B cwnd 16000 8000 8000 RTT ≒ 27 Time 36
  • 37. (48) • • PSockets GridFTP (BitTrrent Swarming) • • • TCP TCP Friendly • • • 37
  • 38. (50) • • • Inter-protocol fairness • TCP Friendliness TCP • Intra-protocol fairness • RTT RTT • Reno RTT 38
  • 39. (49) TCP NewReno RFC 2852 HighSpeed TCP RFC 3649 Scalable TCP BIC CUBIC Linux H-TCP Vegas Westwood+ FAST Illinois YeAH Compound TCP Windows Vista ≒ ≒ RTT2 > RTT1 39
  • 40. (51) CUBIC • cwnd Wmax • RTT cubic cwnd cwnd Steady state behavior Probing Wmax 3 3 Wmax β Wcubic = C T− + Wmax C Wmin Time 40
  • 41. (52) Compound TCP • Windows Vista loss-based (cwnd) • Reno + delay-based (dwnd) • • Reno RTT • = cwnd + dwnd • dwnd 0 Reno loss free period 41
  • 42. (53) TCP • TCP • • • • • • • HPC TCP http://code.google.com/p/pspacer/wiki/DonkanTcp 42
  • 43. (54) • RTT • • → RTT: 200 RTT: 200 BW: 500 Mbps S R ( ) GbE GbE 500Mbps 1Gbps 43
  • 44. (55) • Limited slow start [RFC 3742] • max_ssthresh cwnd = ssthresh RTT cwnd max_ssthresh/2 • Swift Start • Packet-pair probing ACK • HyStart (Hybrid slow start) • RTT • Linux 2.6.29 CUBIC 44
  • 46. (19) MPI TCP • MPI TCP e.g. • • • Time Time 46
  • 47. (20) PC 8 PCs 8 PCs WAN Node8 Node0 Host 0 Host 0 Host 0 Catalyst 3750 GtrcNET-1 Host 0 Catalyst 3750 Host 0 Host 0 ……… ……… Node7 Node15 • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB • NIC: Intel PRO/1000 (82547EI) • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB 47
  • 48. (-) 8 PCs 2 10MB 8 PCs WAN Node8 Node0 Host 0 Host 0 Host 0 Catalyst 3750 GtrcNET-1 Host 0 Catalyst 3750 Host 0 Host 0 ……… ……… Node7 RTT: 20 ms Node15 : 500 Mbps • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB • NIC: Intel PRO/1000 (82547EI) 1 • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB 48
  • 49. (21) 1000 800 2 10MB Bandwidth (Mbps) 600 400 200 0 0 2 4 6 8 10 • Time (sec) 160 10MB 1000 500Mbps • 800 x3.5 Bandwidth (Mbps) 160 600 400 200 0 0 100 200 300 400 500 600 Time (msec) 49
  • 50. (22) • • Linux RTO cwnd [RFC 2861] • MPI • HTTP/1.1 • kernel 2.6.18 net.ipv4. tcp_slow_start_after_idle=0 ssthresh cwnd time 50
  • 51. (23) • • • cwnd ACK → cwnd cwnd ACK RTT (Round Trip Time) 51
  • 52. (24) • ACK • ACK RTT • RTT cwnd cwnd RTT cwnd ACK RTT (Round Trip Time) 52
  • 53. (25) 1000 800 Bandwidth (Mbps) 600 2 10MB 400 200 0 0 100 200 300 400 500 600 Time (msec) 1000 • 800 cwnd Bandwidth (Mbps) 600 400 • 200 0 0 100 200 300 400 500 600 Time (msec) 53
  • 54. (-) 8 PCs 8 PCs WAN Node8 Node0 Host 0 Host 0 Host 0 Catalyst 3750 GtrcNET-1 Host 0 Catalyst 3750 Host 0 Host 0 ……… ……… Node7 RTT: 0 ms Node15 : 1 Gbps • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB 16 • NIC: Intel PRO/1000 (82547EI) • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB 54
  • 55. (29) All-to-all • • MPI • data A A0 A1 A2 A3 A0 B0 C0 D0 process B B0 B1 B2 B3 A1 B1 C1 D1 C C0 C1 C2 C3 A2 B2 C2 D2 D D0 D1 D2 D3 A3 B3 C3 D3 A B A B A B C D C D C D 1 2 3 55
  • 56. (28) 1 250 200 230Mbps • Bandwidth (Mbps) 150 • • 100 50 0 0 32 KB200 400 600 Message size (KB) 800 1000 1000 32KB • 800 Bandwidth (Mbps) 600 200 400 • 200 • 200 0 0 500 1000 1500 2000 Time (msec) 56
  • 57. (30) RTO • 2 RTO • TCP 1 2 3 A B A B A B RTO A C 1 C D C D C D A B A B A B 2 C D C D C D 57
  • 58. (31) RTO • RTO = RTT + α •α → •α → } • RTT 0 RTO 200 RTO = SRTT + max(G, 4 x RTTVAR) SRTT = (1 - β) x SRTT + β x RTT RTTVAR = (1 - γ) x RTTVAR + γ x |SRTT - RTT| β = 1/8, γ = 1/4, G = 200 msec SRTT: Smoothed Round Trip Time RFC2988 G 1 RTT 500 58
  • 59. (32) RTO 1 250 RTO = SRTT + max(G, 4 x RTTVAR) 200 • G = 200 Bandwidth (Mbps) 150 100 • G=0 • (kernel 2.6.23 50 32 KB w/o RTO fix w/ RTO fix ) ip route 0 0 200 400 600 800 1000 Message size (KB) RTO RTO 1000 1000 800 800 Bandwidth (Mbps) Bandwidth (Mbps) 600 600 400 400 200 200 0 0 0 500 1000 1500 2000 0 500 1000 1500 2000 Time (msec) Time (msec) 59
  • 60. (-) 1 250 • 200 • Bandwidth (Mbps) 150 ‣ 100 50 32 KB w/o RTO fix w/ RTO fix • 0 0 200 400 600 800 1000 Message size (KB) • RTO ‣ RTO • • RTO 60
  • 61. (33) MPI TCP/IP ACK RTO cwnd MPI TCP , ACS11 , 2005 61
  • 62. (65)
  • 64. (58) PSPacer • • Linux OS • GPL http://www.gridmpi.org/pspacer.jsp , ACS14 , 2006 64
  • 65. (63) • 500Mbps • PSPacer PSPacer 500Mbps 65
  • 66. (64) TCP Sender Receiver A C 1 Gbps RTT 200ms B D w/o PSPacer w/ PSPacer 535 Mbps 980 Mbps 490 Mbps 394 Mbps 490 Mbps 141 Mbps TCP ※Scalable TCP 66
  • 67. (62) PSPacer 10GbE 100Mbps Iperf 5 GtrcNET-10 10 (Gbps) 8 PSPacer 10GbE 6 4 HTB: Hierarchical Token Bucket Linux 2 PSPacer+ HTB 0 CPU: Quad-core Xeon (E5430) x 2 0 2 4 6 8 10 NIC: Myricom Myri-10G (PCIe x 8) MTU: 9000 byte Memory: 4GB DDR2-667 (Gbps) 67
  • 68. (66) • • • ip $ ip route show cached to 192.168.0.2 192.168.0.2 from 192.168.0.1 dev eth1 cache mtu 9000 rtt 187ms rttvar 175ms ssthresh 1024 cwnd 637 advmss 8960 hoplimit 64 • # sysctl -w net.ipv4.tcp_no_metrics_save=1 net.ipv4.tcp_no_metrics_save=1 68
  • 69. (67) • • GbE MTU 1500B → 949.3 Mbps (Gb MTU 9000B → 991.4 Mbps • “ping -M do” IP “Don’t fragment” bit on $ ping -M do -s 9000 192.168.0.2 NG PING 192.168.0.2 (192.168.0.2) 9000(9028) bytes of data. From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000) From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000) From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000) $ ping -M do -s 8972 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8972(9000) bytes of data. OK 8980 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=3.76 ms 8980 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.127 ms 8980 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.133 ms 69
  • 70. (68) • Infiniband Myrinet • • IEEE 802.3x • PAUSE • $ sudo ethtool -A eth0 tx on rx on $ sudo ethtool -a eth0 Pause parameters for eth1: Autonegotiate: off RX: off TX: off Lossless Ethernet 70
  • 72. (70) Rough consensus, running code • RFC • Reno [RFC 2851] • • net/ipv4 180 • regression • BIC init_ssthresh ABC • • 72
  • 73. (72) Web100 • Pittsburgh Supercomputing Center (PSC) • procfs • netstat -s [RFC 4898] • • http://www.web100.org/ 73
  • 74. (72) TCP Probe • KProbes TCP • procfs • • 74
  • 75. (73) TCP Probe cwnd ssthresh http://www.linuxfoundation.org/en/Net:TCP_testing 75
  • 76. (76) 1. ×RTT 4MB 2. setsockopt listen connect 3. 4. TCP 5. 76
  • 77. (77) 6. tcp_slow_start_after_idle 7. 8. RTO RTO 9. 10. 77
  • 78.
  • 79. TCP ARPANET NCP TCP IP (1970 ) TCP RFC793 (standard) TCP IPv4 (1986) → Tahoe RFC2851 (standard) Reno RFC2852 (experimental) NewReno Vegas HSTCP FAST ScalableTCP HTCP BIC CUBIC CompoundTCP TCP 79
  • 80. RFC Spec. sysctl default TCP (RFC 793) - - Window scaling (RFC 1323) tcp_window_scaling 1 TImestamps option (RFC 1323) tcp_timestamps 1 TIME-WAIT Assassination Hazards (RFC 1337) tcp_rfc1337 0 SACK (RFC 2018, 3517) tcp_sack 1 Control block sharing (RFC 2140) tcp_no_metrics_save 0 MD5 signature option (RFC 2385) - - Initial window size (RFC 2414) - - Congestion control (RFC 2581) - - NewReno (RFC 3782) - - Cwnd validation (RFC 2861) tcp_slow_start_after_idle 1 D-SACK (RFC 2883) tcp_dsack 1 Limited transmit (RFC 3042) - - Explicit congestion notification (RFC 3168) tcp_ecn 0 Appropriate Byte Counting (RFC 3465) tcp_abc 0 Limited slow start (RFC 3742) tcp_max_ssthresh 0 FRTO (RFC 4138) tcp_frto 0 Forward ACK tcp_fack 1 Path MTU discovery tcp_mtu_probing 0 80
  • 81. URL Project URL BIC/CUBIC http://netsrv.csc.ncsu.edu/twiki/bin/view/Main/BIC FAST http://netlab.caltech.edu/FAST/ TCP Westwood+ http://c3lab.poliba.it/index.php/Westwood Scalable TCP http://www.deneholme.net/tom/scalable/ HighSpeed TCP http://www.icir.org/floyd/hstcp.html H-TCP http://www.hamilton.ie/net/htcp/index.htm TCP-Illinois http://www.ews.uiuc.edu/~shaoliu/tcpillinois/ Compound TCP http://research.microsoft.com/en-us/projects/ctcp/ TCP Vegas http://www.cs.arizona.edu/projects/protocols/ TCP Westwood http://www.cs.ucla.edu/NRL/hpi/tcpw/ Circuit TCP http://www.ece.virginia.edu/cheetah/ TCP Hybla https://hybla.deis.unibo.it/ TCP Low Priority http://www.ece.rice.edu/networks/TCP-LP/ UDT http://www.cs.uic.edu/~ygu1/ XCP http://www.ana.lcs.mit.edu/dina/XCP/ Web100 http://www.web100.org/ TCP Probe http://www.linuxfoundation.org/en/Net:TcpProbe iproute2 http://www.linuxfoundation.org/en/Net:Iproute2 81
  • 82.
  • 83. 1000 800 Bandwidth (Mbps) 600 cwnd=600 • cwnd 400 200 cwnd=50 50→600 0 • -1 0 1 2 3 4 Time (sec) BIC TCP 83
  • 84. 1000 800 Bandwidth (Mbps) 600 cwnd=49 400 • 200 0 cwnd=582 0 0.5 1 1.5 2 2.5 3 3.5 4 Time (sec) 1000 • 800 ssthresh Bandwidth (Mbps) 600 400 200 • 0 0 0.5 1 1.5 2 2.5 3 3.5 4 Time (sec) 84
  • 85. • tcpdump • Wireshark (a.k.a. Etherreal) • • netstat • Web100 • TCP Probe • • netem 85
  • 86. OS HPC • 1996-2001 OS • 2002-2004 • 2003- • 2003- TCP/IP • 2008- 86
  • 87. • • • TCP • RED AQM ECN RED: Random Early Detection AQM: Active Queue Management ECN: Explicit Congestion Notification 87
  • 88. End-to-end • • • • • • cf. ECN (Explicit Congestion Notification) 88
  • 89. SACK 1 2 3 4 5 6 7 9 10 11 13 15 16 8 1 2 3 4 5 6 7 9 10 11 13 15 16 TCP 4 89
  • 90. ACK 10 11 • ACK ACK 3 12 X RTT1 13 14 9 15 9 9 • 1 RTT 9 9 • 10 16 17 RTT2 15 16 17 90
  • 91. • • RTO data data RTT X ACK RTO RTO: Retransmission TimeOut RTT: Round Trip Time 91
  • 92. AIMD (Additive Increase Multiplicative Decrease) • cwnd • α: β: ACK α cwnd ← cwnd + cwnd ← (1 − β)cwnd cwnd cwnd RTT w Reno: α=1, β=0.5 w/2 time 92
  • 93. TCP cwnd Example of Growth Functions HTCP STCP HSTCP BIC Window Size CUBIC Time H.Cai, et al., “Stochastic Ordering for Internet Congestion Control,” PFLDnet2007 93
  • 94. #1 #2 #3 #4 #5 #6 #7 #8 #1 #2 #3 #4 #5 #6 #7 #8 #1 #2 #3 #4 #5 #6 #7 #8 94
  • 95. • • 8 80 80 (1) data 100Kbps 1Mbps 1Mbps 80 80 (3) data (2) ACK ACK 100Kbps 1Mbps 1Mbps 95
  • 96. CLOSED passive open appl: listen() appl: connect() send: SYN recv: SYN send: SYN, ACK LISTEN active open SYN_RECV recv: SYN SYN_SENT send: SYN, ACK recv: ACK recv: SYN, ACK ESTABLISHED send: ACK active close recv: FIN CLOSE_WAIT send: ACK recv: FIN FIN_WAIT_1 CLOSING appl: close() send: ACK passive close send: FIN recv: FIN, ACK recv: ACK recv: ACK send: ACK LAST_ACK recv: ACK recv: FIN TIME_WAIT FIN_WAIT_2 send: ACK 2MSL timeout ※ 96
  • 97. lfd=socket(...); fd=socket(...); (1) SYN (SeqNo=1000) bind(lfd,...); connect(fd,...); listen(lfd,...); LISTEN SYN_SENT (2) SYN, ACK (SeqNo=200, AckNo=1001) SYN_RCVD (3) ACK (AckNo=201) ESTABLISHED fd=accept(lfd,...); LISHED ※ SYN ACK 97
  • 98. close(fd); (1) FIN (SeqNo=2000) TIME_WAIT FIN_WAIT_2 FIN_WAIT_1 EOF (2) ACK (AckNo=2001) CLOSE_ WAIT (3) FIN (SeqNo=700) close(fd); CLOSED LAST_ACK (4) ACK (AckNo=701) 2 MSL ※FIN-ACK MSL: Maximum Segment Lifetime 98
  • 99. TIME_WAIT • FIN 2MSL Linux 60 • FIN • • bind EADDRINUSE Linux listen • setsockopt(SO_REUSEADDR) • 99
  • 100. MTU • OS Guest OS eth0 MTU 1500 Host OS MTU 9000 veth0 eth0 br0 MTU 1500 MTU 9000 ※ eth0 IP br0 100
  • 101. Linux • kernel 2.6.29 13 • CUBIC • $ sysctl net.ipv4.tcp_congestion_control cubic • $ sysctl -w net.ipv4.tcp_congestion_control=htcp net.ipv4.tcp_congestion_control = htcp 101
  • 102. Ethernet Source MAC Destination MAC Type Payload (46 -- 1500) FCS (4) address (6) address (6) (2) 0800 IPv4 datagram (46 -- 1500) VLAN 8100 tag 0800 IPv4 datagram (46 -- 1500) • 8 IFG 12 • Taged-VLAN 4 MAC: Media Access Control IFG: Inter Frame Gap FCS: Frame Check Sequence 102
  • 103. IPv4 0 31 Version Header length TOS Length Identifier 0 DF MF Fragment offset TTL Protocol Header checksum Source IP address Destination IP address Options (variable length) Payload (variable length) Flags: DF Don’t Fragment MF More Fragment 103
  • 104. IPv6 0 31 Version Traffic class Flow label Payload length Next header Hop limit Source IP address (128 bit) Destination IP address (128 bit) Extension header (variable length) Payload (variable length) 104
  • 105. TCP 0 31 U A P R S F 40 Flags: ACK(A) FIN(F) PSH(P) RST(R) SYN(S) URG(U) 105
  • 106. TCP kind length meaning 0 - 1 - NOP 2 4 MSS 3 3 4 2 SACK 5 2+8N (1≦N≦4) SACK 8 10 19 18 MD5 27 8 106
  • 107. TCP MSS SACK Window Scale SACK kind=2 len=4 MSS NOP NOP kind=8 len=10 kind=4 len=2 kind=8 len=10 Time Stamp Time Stamp NOP NOP kind=5 len=18 NOP kind=3 len=3 WScale SACK Block [0] SACK Block [1] SACK Block [2] NOP 107
  • 108. TCP/IP Vol.1 W. 2000/12 2001/4 High Performance TCP/IP Networking: Concepts, Issues, and Solutions Raj Jain Mahbub Hassan Pearson Prentice Hall 2003/10 Linux 2006/11 108
  • 109. • 1/2 • • 1Gbps, MTU 1500 B 12 • OS 1 10 12 us 12 us 1500B Time 109
  • 110. PSPacer • GbE, 1 =8 • • 12 us 12 us 1500B 1500B 9000 6000 3000 0 (byte) 110
  • 111. • PAUSE IEEE 802.3x • • • • PC 111