- ### [GPU Network Design and Operations Fundamentals Study Session: Lossless Ethernet, PFC/ECN Edition](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_0.jpg)

  Masayuki Kobayashi

- ### [Lossless Ethernet: Why GPU Networks Need to Be Lossless](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_2.jpg)

  Reference: https://speakerdeck.com/markunet/kurautotetasentanetutowakunoimatokorekara

- ### [Packet Priority and Classification: NIC-Side Configuration (Most Important)](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_13.jpg)

      # use L3 PFC, default=pcp (L2 PFC)
      sudo mlnx_qos -i $IF_NAME --trust dscp
      # enable PFC on PFC Priority 3
      sudo mlnx_qos -i $IF_NAME --pfc 0,0,0,1,0,0,0,0
      # clear Traffic Class (TC) settings
      echo "tclass=-1" | sudo tee /sys/class/infiniband/$DEV_NAME/tc/1/traffic_class
      # set default ToS for RoCE traffic (106 = DSCP 26 * 4 + ECN bits 0b10)
      echo 106 | sudo tee /sys/class/infiniband/$DEV_NAME/tc/1/traffic_class
      # set default ToS for RoCE traffic
      sudo cma_roce_tos -d $DEV_NAME -t 106

  Figure: the 8-bit ToS field consists of DSCP (6 bits) followed by ECN (2 bits). Caption: PFC and DSCP configuration.

- ### [Packet Priority and Classification: NIC-Side Configuration (Most Important)](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_14.jpg)

  Checking the PFC configuration:

      [markunet@hgx200]$ sudo mlnx_qos -i enp14s0np0 --trust dscp
      DCBX mode: OS controlled
      Priority trust state: dscp
      dscp2prio mapping:
              prio:0 dscp:00,
      default priority:
      Receive buffer size (bytes): 19872,220896,0,0,0,0,0,0,max_buffer_size=4151520
      Cable len: 7
      PFC configuration:
              priority    0   1   2   3   4   5   6   7
              enabled     0   0   0   1   0   0   0   0
              buffer      0   0   0   1   0   0   0   0
      tc: 0 ratelimit: unlimited, tsa: vendor
               priority:  1
      tc: 1 ratelimit: unlimited, tsa: vendor
               priority:  0
      tc: 2 ratelimit: unlimited, tsa: vendor
               priority:  2
      tc: 3 ratelimit: unlimited, tsa: vendor
               priority:  3
      tc: 4 ratelimit: unlimited, tsa: vendor
               priority:  4
      tc: 5 ratelimit: unlimited, tsa: vendor
               priority:  5
      tc: 6 ratelimit: unlimited, tsa: vendor
               priority:  6
      tc: 7 ratelimit: unlimited, tsa: vendor
               priority:  7

- ### [Packet Priority and Classification: NIC-Side Configuration (Most Important)](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_15.jpg)

  Reference: "Enable L3 PFC DCQCN for RoCE on Mellanox ConnectX NICs" (blog.mylab.cc)

- ### [Packet Priority and Classification: DSCP to TC Mapping Configuration (Switch Side)](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_16.jpg)

  NVIDIA Cumulus default values (no configuration required):

      RoCE PCP/DSCP->SP mapping configurations
      ===========================================
             pcp  dscp                     switch-prio
          -  ---  -----------------------  -----------
          0  0    0,1,2,3,4,5,6,7          0
          1  1    8,9,10,11,12,13,14,15    1
          2  2    16,17,18,19,20,21,22,23  2
          3  3    24,25,26,27,28,29,30,31  3
          4  4    32,33,34,35,36,37,38,39  4
          5  5    40,41,42,43,44,45,46,47  5
          6  6    48,49,50,51,52,53,54,55  6
          7  7    56,57,58,59,60,61,62,63  7

  Example of configuring Arista to use the same mapping as NVIDIA:

      qos map DSCP 0 1 2 3 4 5 6 7 to traffic-class 0
      qos map DSCP 8 9 10 11 12 13 14 15 to traffic-class 1
      qos map DSCP 16 17 18 19 20 21 22 23 to traffic-class 2
      qos map DSCP 24 25 26 27 28 29 30 31 to traffic-class 3
      qos map DSCP 32 33 34 35 36 37 38 39 to traffic-class 4
      qos map DSCP 40 41 42 43 44 45 46 47 to traffic-class 5
      qos map DSCP 48 49 50 51 52 53 54 55 to traffic-class 6
      qos map DSCP 56 57 58 59 60 61 62 63 to traffic-class 7

      Dscp-tc map:
       d1 :  d2 0  1  2  3  4  5  6  7  8  9
       --------------------------------------
        0 :     0  0  0  0  0  0  0  0  1  1
        1 :     1  1  1  1  1  1  2  2  2  2
        2 :     2  2  2  2  3  3  3  3  3  3
        3 :     3  3  4  4  4  4  4  4  4  4
        4 :     5  5  5  5  5  5  5  5  6  6
        5 :     6  6  6  6  6  6  7  7  7  7
        6 :     7  7  7  7

- ### [Packet Priority and Classification: Configuration Example (Arista EOS)](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_18.jpg)

      leaf_rail-4#show qos interfaces ethernet 1/1
      Ethernet1/1:
         Trust Mode: DSCP
         Default COS: 0
         Default DSCP: 0
         Port shaping rate: disabled

         Tx      Bandwidth     Bandwidth             Shape Rate      Priority   ECN/WRED
         Queue                 Guaranteed (units)    (units)
         ------------------------------------------------------------------------------------------
         7       - / -         - / - ( - )           - / - ( - )     SP / SP    D
         6       - / -         - / - ( - )           - / - ( - )     SP / SP    D
         5       - / -         - / - ( - )           - / - ( - )     SP / SP    D
         4       - / -         - / - ( - )           - / - ( - )     SP / SP    D
         3       95% / 95%     - / - ( - )           - / - ( - )     RR / RR    L
         2       - / -         - / - ( - )           - / - ( - )     RR / SP    D
         1       5% / 5%       - / - ( - )           - / - ( - )     RR / RR    D
         0       - / -         - / - ( - )           - / - ( - )     RR / SP    D

         Note: Values are displayed as Operational/Configured

         Legend:
         RR -> Round Robin
         SP -> Strict Priority
         -  -> Not Applicable / Not Configured
         %  -> Percentage of reference
         ECN/WRED: L -> Queue Length ECN Enabled   W -> WRED Enabled   D -> Disabled

- ### [Headroom Buffer: The Most Important Part of a PFC Implementation](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_23.jpg)

  The headroom buffer is a safety region of buffer space that absorbs the packets still in flight on the link when PFC is triggered.

  Reference: IEEE public document on adaptive PFC headroom (file name `new-lv-adaptive-pfc-headroom`, PDF)

- ### [Headroom Buffer: Per-Cluster Configuration Example from a Real Cluster](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_25.jpg)

  A design with differing cable lengths makes the headroom calculation tedious, so an EoR rack design becomes impractical. To keep this calculation simple, the rack design is left-right symmetric.

  (Content not public.)

- ### [ECN / CNP: Operating Mechanism](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_27.jpg)

  Two types of congestion notification are used:

  - ECN (Explicit Congestion Notification)
  - CNP (Congestion Notification Packet)

  ECN field values:

  - 00b: Not-ECT (Not ECN-Capable Transport)
  - 01b: ECT (ECN-Capable Transport)
  - 10b: ECT (ECN-Capable Transport)
  - 11b: CE (Congestion Experienced)

  Figure: Server (Sender), Switch, Server (Receiver). Labels: Congestion, Congestion Marking, Congested Traffic, Congestion Notification, High Priority Marking. Each packet is drawn as Ethernet / IP Header (carrying the ECN bits) / UDP / IB.

  1. The sender sets the ECT bits.
  2. The congested switch changes them to CE.
  3. On receiving CE, the receiver sends a CNP.
  4. On receiving the CNP, the sender adjusts its sending rate.

- ### [DCQCN: All-in-One Configuration Script](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_33.jpg)

  Reference: the `doroce` Linux script repository under github.com/NVIDIA

- ### [PFC / ECN / CNP: Hardware Counters](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_35.jpg)

      markunet@leaf_rail-1:mgmt:~$ ethtool -S swp3 | egrep "Q3|Q6|Ecn|Pfc3"
      HwIfInPfc3Pkt: 31848
      HwIfOutPfc3Pkt: 0
      HwIfOutQ3WredDrops: 0
      HwIfOutQ6WredDrops: 0
      HwIfOutQ3BuffDiscards: 0
      HwIfOutQ6BuffDiscards: 0
      HwIfOutQ3Pkts: 318357735060
      HwIfOutQ3Octets: 326513898313396
      HwIfOutEcnMarkedPkts: 11375061
      HwIfOutQ6Pkts: 8245964
      HwIfOutQ6Octets: 643185192
      HwIfInPfc3Duration: 635412
      HwIfOutPfc3Duration: 0
      HwIfInQ3Pkts: 0
      HwIfInQ6Pkts: 0
      HwIfInQ3BuffDiscards: 0
      HwIfInQ6BuffDiscards: 0
      HwIfInQ3SharedBuffDiscards: 0
      HwIfInQ6SharedBuffDiscards: 0

  Annotations on the slide:

  - `Q3`: RoCEv2 packet counters
  - `Ecn`: counter of ECN-marked packets
  - `Q6`: CNP packet counters
  - `Pfc3`: PFC packet counters

- ### [PFC / ECN / CNP: Hardware Counters](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_36.jpg)

      leaf_rail-4#show interfaces ethernet 1/1 counters queue detail
      Port     TxQ   Counter/pkts   Counter/bytes   Drop/pkts   Drop/bytes
      -------  ----  ------------   ------------    ----------  ------------
      Et1/1    UC0   5              638             0           0
      Et1/1    UC1   0              0               0           0
      Et1/1    UC2   0              0               0           0
      Et1/1    UC3   7242261179     477989237814    0           0
      Et1/1    UC4   0              0               0           0
      Et1/1    UC5   0              0               0           0
      Et1/1    UC6   128963         12930230        0           0
      Et1/1    UC7   0              0               0           0
      Et1/1    UC8   1976           539448          0           0

      leaf_rail-4#show qos interfaces ethernet 1/1 ecn counters queue
      Ethernet1/1:
      Tx-Queue    Marked Packets
      ----------  -----------------------
      0           -
      1           -
      2           -
      3           0
      4           -
      5           -
      6           -
      7           -

  Always verify that traffic is using the intended queue. Some queues may be reserved for special system purposes.

- ### [RoCEv2 Configuration Example (Arista EOS): Priority Mapping and MMU Profile](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_37.jpg)

      qos map DSCP 0 1 2 3 4 5 6 7 to traffic-class 0
      qos map DSCP 8 9 10 11 12 13 14 15 to traffic-class 1
      qos map DSCP 16 17 18 19 20 21 22 23 to traffic-class 2
      qos map DSCP 24 25 26 27 28 29 30 31 to traffic-class 3
      qos map DSCP 32 33 34 35 36 37 38 39 to traffic-class 4
      qos map DSCP 40 41 42 43 44 45 46 47 to traffic-class 5
      qos map DSCP 48 49 50 51 52 53 54 55 to traffic-class 6
      qos map DSCP 56 57 58 59 60 61 62 63 to traffic-class 7
      platform trident mmu queue profile RoCE_MMU_Profile
         ingress threshold 1/16
         egress unicast queue 3 threshold 8
      !

  - MMU (Memory Management Unit): here, the profile that configures buffer reservation.
  - Change the queue thresholds to match your requirements.
  - If the MMU settings are left unchanged, PFC fires continuously and performance degrades.
  - This example is for StrataXGS chips. DNX differs, so take care.

- ### [RoCEv2 Configuration Example (Arista EOS): Creating a RoCEv2 Profile and Attaching It to an Interface](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_38.jpg)

      qos profile RoCEv2
         priority-flow-control on
         priority-flow-control priority 3 no-drop
         !
         tx-queue 1
            no priority
            bandwidth percent 5
         !
         tx-queue 3
            no priority
            bandwidth percent 95
            random-detect ecn minimum-threshold 512 kbytes maximum-threshold 768 kbytes max-mark-probability 100
      !
      interface Ethernet1/1
         description DOWNLINK
         mtu 9216
         speed forced 100gfull
         no switchport
         ipv6 enable
         service-profile RoCEv2
         !
         tx-queue 3
            random-detect ecn count
         platform trident mmu queue interface-profile RoCE_MMU_Profile
      !

  - `priority-flow-control priority 3 no-drop` enables PFC on TC 3 (`no-drop` = lossless).
  - `random-detect ecn ...` is the ECN configuration. Validate with the default ECN thresholds first and change them only if necessary.

- ### [RoCEv2 Configuration Example (Scheduled Fabric): VoQ-Based Configuration](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_40.jpg)

  (Content not public.)

- ### [RoCEv2 Configuration Example (SONiC): Pure SONiC Does Not Support Some Features](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_41.jpg)

  (Content not public.)

- ### [RoCEv2 Configuration Example (Juniper Junos): Reference Information](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_42.jpg)

  (Content not public.)

- ### [RoCEv2 Configuration Example (Cisco Nexus): Reference Information](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_43.jpg)

  (Content not public.)

- ### [Lossless Ethernet](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_44.jpg)

  Congestion control should act in order, starting from the upper layers. PFC is a momentary last resort.

  Image quoted from "Network Best Practices for Artificial Intelligence Data Centre", Nemanja Kamenica (Technical Marketing Engineer), Cisco Live session BRKDCN.

- ### [EoF](https://files.speakerdeck.com/presentations/6015ec3f4bb84379b474e69524b623ba/slide_46.jpg)
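
The NIC-side configuration sets a default ToS of 106 for RoCE traffic and enables PFC on priority 3. As a minimal sketch of the arithmetic behind those two numbers, the snippet below packs and unpacks the ToS byte and applies the eight-DSCP-values-per-priority mapping shown in the switch-side tables. The helper names `tos_from`, `split_tos`, and `switch_prio` are hypothetical and not part of any NVIDIA or Arista tool.

```python
def tos_from(dscp: int, ecn: int = 0) -> int:
    """Pack a 6-bit DSCP and a 2-bit ECN field into the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def split_tos(tos: int) -> tuple[int, int]:
    """Split an 8-bit ToS byte into (DSCP, ECN)."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS must fit in 8 bits")
    return tos >> 2, tos & 0b11


def switch_prio(dscp: int) -> int:
    """Default-style mapping from the tables: DSCP 0-7 -> 0, 8-15 -> 1, and so on."""
    return dscp // 8


dscp, ecn = split_tos(106)
print(dscp, ecn, switch_prio(dscp))  # 26 2 3
```

ToS 106 therefore carries DSCP 26 with ECN bits `10b`, and DSCP 26 falls in the 24 to 31 block that maps to priority 3, the priority on which PFC is enabled.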
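
The headroom slides note that the calculation depends on cable length. The following is a rough sketch of why, using a simplified model that is an assumption here and not the presenter's or any vendor's formula: headroom is taken as the bytes in flight during one cable round trip plus a responder delay, plus two maximum-size frames that may already be partly transmitted. The function name `headroom_bytes`, the 5 ns/m propagation figure, and the `response_ns` parameter are all hypothetical.

```python
import math


def headroom_bytes(cable_m: float, rate_gbps: float, mtu_bytes: int,
                   response_ns: float = 0.0, ns_per_m: float = 5.0) -> int:
    """Estimate PFC headroom in bytes under a simplified, assumed model.

    cable_m      cable length in metres
    rate_gbps    link rate in Gb/s
    mtu_bytes    largest frame size on the link
    response_ns  assumed time the peer needs to react to a pause frame
    ns_per_m     assumed propagation delay per metre of cable
    """
    # The pause frame travels one way, and data already sent keeps arriving
    # for the other way, so the cable delay counts twice.
    round_trip_ns = 2 * cable_m * ns_per_m + response_ns
    # Gb/s multiplied by ns gives bits; divide by 8 for bytes.
    in_flight = round_trip_ns * rate_gbps / 8
    # Assume one frame in transmission at each end when the pause is issued.
    return math.ceil(in_flight + 2 * mtu_bytes)


for length in (7, 30, 100):
    print(length, headroom_bytes(length, rate_gbps=100, mtu_bytes=9216))
```

Under this model the cable-dependent term grows linearly with length and link rate, which illustrates why racks with uniform cable lengths allow a single headroom value to be reused.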
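
The Arista profile configures ECN on tx-queue 3 with a minimum threshold of 512 kbytes, a maximum threshold of 768 kbytes, and a maximum mark probability of 100. A sketch of how such thresholds are commonly interpreted follows, assuming the usual RED-style linear ramp: no marking below the minimum, a linear rise up to the maximum probability, and marking of every packet above the maximum. This is an assumption about the general model and not a statement of how EOS or the switch ASIC implements it. The function name `ecn_mark_probability` is hypothetical.

```python
def ecn_mark_probability(queue_kbytes: float,
                         min_kbytes: float = 512,
                         max_kbytes: float = 768,
                         max_prob_pct: float = 100) -> float:
    """Return the assumed probability (0.0 to 1.0) that a packet is ECN-marked."""
    if queue_kbytes <= min_kbytes:
        return 0.0          # queue is short: never mark
    if queue_kbytes >= max_kbytes:
        return 1.0          # queue is past the maximum: mark everything
    # Linear ramp between the two thresholds, scaled by the maximum probability.
    fraction = (queue_kbytes - min_kbytes) / (max_kbytes - min_kbytes)
    return fraction * max_prob_pct / 100


for depth in (256, 512, 640, 768, 1024):
    print(depth, ecn_mark_probability(depth))
```

With the values from the profile, a queue depth halfway between the thresholds (640 kbytes) gives a marking probability of 0.5 in this model.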
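
The hardware counter slides show `ethtool -S` output filtered for PFC, ECN, and queue counters. As a small sketch of how that output could be checked automatically, the snippet below parses `Name: value` lines, lists any drop or discard counters that are nonzero, and computes a rough ratio of ECN-marked packets to queue 3 packets. The sample text is a subset of the values on the slide. The function names are hypothetical, and treating `HwIfOutEcnMarkedPkts` as comparable to `HwIfOutQ3Pkts` is an assumption, since the former is not labelled per queue.

```python
SAMPLE = """\
HwIfInPfc3Pkt: 31848
HwIfOutQ3WredDrops: 0
HwIfOutQ3BuffDiscards: 0
HwIfOutQ3Pkts: 318357735060
HwIfOutEcnMarkedPkts: 11375061
HwIfOutQ6Pkts: 8245964
"""


def parse_counters(text: str) -> dict[str, int]:
    """Parse 'Name: value' lines into a dict, skipping anything non-numeric."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def nonzero_drops(counters: dict[str, int]) -> list[str]:
    """Names of drop/discard counters with a value above zero."""
    return sorted(name for name, value in counters.items()
                  if name.endswith(("Drops", "Discards")) and value > 0)


def ecn_marked_ratio(counters: dict[str, int]) -> float:
    """ECN-marked packets divided by queue 3 packets (0.0 if queue 3 is empty)."""
    q3 = counters.get("HwIfOutQ3Pkts", 0)
    return counters.get("HwIfOutEcnMarkedPkts", 0) / q3 if q3 else 0.0


counters = parse_counters(SAMPLE)
print("drops:", nonzero_drops(counters))
print("PFC frames received:", counters["HwIfInPfc3Pkt"])
print(f"ECN-marked ratio: {ecn_marked_ratio(counters):.6%}")
```

An empty drop list alongside nonzero ECN and PFC counters is the pattern the slide shows: congestion is being signalled, and the lossless queue is not discarding packets.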