DPDK receive performance
The Atom C2758 and Xeon E5-2609 cores run at the same clock speed, 2.4 GHz.
Their receive performance still differs, because the Atom incurs far more cache misses.
Xeon (E5-2609 @ 2.4 GHz) | 13.66 Mpps (9179 Mbps) | 7.46 Mpps (4777 Mbps) | 13.57 Mpps (9122 Mbps)
Atom (C2758 @ 2.41 GHz) | 9.78 Mpps (6576 Mbps) | | 9.63 Mpps (6476 Mbps) * SR-IOV w/o IOMMU (private patch)
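To put the packet rates in per-packet terms, the sketch below converts the first-column rates for each CPU into an average cycle budget per packet. It assumes a single core at the nominal 2.4 GHz clock does all of the receive work, which the notes do not state explicitly. The names `CLOCK_HZ`, `rates_mpps` and `cycles_per_packet` are illustrative only.

```python
# Nominal core clock quoted in the notes (assumed identical for both CPUs).
CLOCK_HZ = 2.4e9

# First-column receive rates from the table, in millions of packets per second.
rates_mpps = {
    "Xeon E5-2609": 13.66,
    "Atom C2758": 9.78,
}


def cycles_per_packet(mpps, clock_hz=CLOCK_HZ):
    """Average core cycles available per packet at the given rate."""
    return clock_hz / (mpps * 1e6)


# Assumption: one core handles the whole receive path.
budget = {name: cycles_per_packet(rate) for name, rate in rates_mpps.items()}

for name, cycles in budget.items():
    print(f"{name}: {cycles:.1f} cycles/packet")
```

Under these assumptions the Xeon spends roughly 176 cycles per packet and the Atom roughly 245, so the gap amounts to about 70 extra cycles per packet on the Atom.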
C2758 Atom Core
root@cnode24-m:~# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 4891 sleep 10
Performance counter stats for thread id '4891':
677353724 cache-references [33.31%]
120428677 cache-misses # 17.779 % of all cache refs [33.35%]
24121523267 cycles [33.37%]
24619412547 instructions # 1.02 insns per cycle [50.03%]
3744803946 branches [50.00%]
54565668 branch-misses # 1.46% of all branches [49.97%]
10.001082048 seconds time elapsed
on virtio DPDK
root@server:~/suprem/linux-stable/tools/perf# ./perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 2165 sleep 10
Performance counter stats for thread id '2165':
89,454,697 cache-references [33.30%]
3,212,875 cache-misses # 3.592 % of all cache refs [33.38%]
6,635,235,594 cycles [33.35%]
1,744,194,176 instructions # 0.26 insns per cycle [50.03%]
371,715,356 branches [50.00%]
6,856,483 branch-misses # 1.84% of all branches [49.97%]
10.001054322 seconds time elapsed
C2758 cache size
*-cache:0
description: L1 cache
physical id: 25
slot: L1-Cache
size: 448KiB
capacity: 448KiB
capabilities: synchronous internal write-back instruction
*-cache:1
description: L2 cache
physical id: 26
slot: L2-Cache
size: 4MiB
capacity: 4MiB
capabilities: synchronous internal write-back unified
The Atom reports no last-level iTLB entries:
Last level iTLB entries: 4KB 0
Last level dTLB entries: 4KB 128
E5-2609 Xeon Core
root@cos05-m:~# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 2810 sleep 10
Performance counter stats for thread id '2810':
186538457 cache-references [100.00%]
500 cache-misses # 0.000 % of all cache refs [100.00%]
23988762917 cycles [100.00%]
43773193744 instructions # 1.82 insns per cycle [100.00%]
6520118366 branches [100.00%]
25966179 branch-misses # 0.40% of all branches
10.001159796 seconds time elapsed
E5-2609 cache size
configuration: cores=4 enabledcores=4 threads=4
*-cache:0
description: L1 cache
physical id: 700
size: 128KiB
capacity: 128KiB
capabilities: internal write-through data
*-cache:1
description: L2 cache
physical id: 701
size: 1MiB
capacity: 1MiB
capabilities: internal write-through unified
*-cache:2
description: L3 cache
physical id: 702
size: 10MiB
capacity: 10MiB
capabilities: internal write-back unified
The LLC (last-level cache) is the last level in the memory hierarchy before main memory. Any memory requests missing here must be serviced by local or remote DRAM, with significant latency. The LLC Miss metric shows a ratio of cycles with outstanding LLC misses to all cycles.
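The perf stat runs above do not report that cycle-based LLC Miss metric. They report cache-misses as a share of cache-references, plus instructions per cycle. As a rough cross-check, the sketch below recomputes both from the raw counters copied from the three runs. The dictionary layout and the `derive` helper are illustrative only, and the Atom and virtio counters were multiplexed (the bracketed percentages in the output), so perf has scaled them and they are estimates.

```python
# Raw counters copied from the perf stat output in these notes (10 s samples).
runs = {
    "Atom C2758": dict(refs=677353724, misses=120428677,
                       cycles=24121523267, insns=24619412547),
    "virtio": dict(refs=89454697, misses=3212875,
                   cycles=6635235594, insns=1744194176),
    "Xeon E5-2609": dict(refs=186538457, misses=500,
                         cycles=23988762917, insns=43773193744),
}


def derive(counters):
    """Return the cache miss percentage and instructions per cycle."""
    return {
        "miss_pct": 100.0 * counters["misses"] / counters["refs"],
        "ipc": counters["insns"] / counters["cycles"],
    }


derived = {name: derive(c) for name, c in runs.items()}

for name, d in derived.items():
    print(f"{name}: {d['miss_pct']:.3f}% cache misses, {d['ipc']:.2f} insns/cycle")
```

The recomputed values agree with the ratios perf printed: about 17.8% misses and 1.02 instructions per cycle on the Atom, against effectively zero misses and 1.82 instructions per cycle on the Xeon.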