C2758 vs. E5-2609 performance

sunshout 2014. 10. 13. 17:07

DPDK receive performance

The Atom C2758 and Xeon E5-2609 cores run at the same clock speed, 2.4 GHz.

However, their receive performance differs, because the Atom incurs far more cache misses.

L2 Forwarding Performance

Configurations: Baremetal, VM(IOMMU), VM(No IOMMU), Vhost

Xeon (E5-2609 @ 2.4 GHz): 13.66 Mpps (9179 Mbps), 7.46 Mpps (4777 Mbps), 13.57 Mpps (9122 Mbps)

Atom (C2758 @ 2.41 GHz): 9.78 Mpps (6576 Mbps), 9.63 Mpps (6476 Mbps)

* SR-IOV w/o IOMMU (private patch)
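
Most of the Mpps/Mbps pairs above are consistent with a wire-rate conversion for minimum-size Ethernet frames. The sketch below shows that conversion. The 64-byte frame size and the 20 bytes of per-frame wire overhead are assumptions (the post does not state the packet size), and the function names are only illustrative.

```python
# Assumption (not stated in the post): 64-byte frames, plus 20 bytes of
# per-frame wire overhead (7 B preamble + 1 B SFD + 12 B inter-frame gap).
FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def mpps_to_mbps(mpps, frame_bytes=FRAME_BYTES, overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Convert a packet rate in Mpps to wire throughput in Mbps."""
    return mpps * (frame_bytes + overhead_bytes) * 8


def line_rate_mpps(link_mbps, frame_bytes=FRAME_BYTES, overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Maximum packet rate in Mpps that a link of link_mbps can carry."""
    return link_mbps / ((frame_bytes + overhead_bytes) * 8)


if __name__ == "__main__":
    # Xeon rate of 13.66 Mpps from the table above
    print(f"{mpps_to_mbps(13.66):.2f} Mbps")    # 9179.52 Mbps
    # Theoretical ceiling of a 10 Gbps link under the same assumption
    print(f"{line_rate_mpps(10_000):.2f} Mpps")  # 14.88 Mpps
```

Under these assumptions, 13.66 Mpps is about 92% of the 14.88 Mpps a 10 Gbps link can carry.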

C2758 Atom Core

root@cnode24-m:~# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 4891 sleep 10

Performance counter stats for thread id '4891':

677353724 cache-references [33.31%]

120428677 cache-misses # 17.779 % of all cache refs [33.35%]

24121523267 cycles [33.37%]

24619412547 instructions # 1.02 insns per cycle [50.03%]

3744803946 branches [50.00%]

54565668 branch-misses # 1.46% of all branches [49.97%]

10.001082048 seconds time elapsed

DPDK on virtio

root@server:~/suprem/linux-stable/tools/perf# ./perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 2165 sleep 10

Performance counter stats for thread id '2165':

89,454,697 cache-references [33.30%]

3,212,875 cache-misses # 3.592 % of all cache refs [33.38%]

6,635,235,594 cycles [33.35%]

1,744,194,176 instructions # 0.26 insns per cycle [50.03%]

371,715,356 branches [50.00%]

6,856,483 branch-misses # 1.84% of all branches [49.97%]

10.001054322 seconds time elapsed
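
The perf stat outputs above print raw counts, some with thousands separators and some without. A minimal sketch of how the counts could be parsed and the cache-miss ratio and instructions per cycle recomputed is shown below. The helper names `parse_perf_stat` and `derived` are hypothetical, and the regular expression only targets the line layout seen in this post.

```python
import re

# Matches a count (optionally with commas) followed by an event name,
# e.g. "120428677 cache-misses # ..." or "3,212,875 cache-misses # ...".
COUNTER_RE = re.compile(r"^\s*([\d,]+)\s+([a-z][a-z-]*)")


def parse_perf_stat(text):
    """Return {event_name: count} for every counter line in perf stat output."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1).replace(",", ""))
    return counters


def derived(c):
    """Compute the ratios that perf prints after the '#' sign."""
    return {
        "cache_miss_pct": 100.0 * c["cache-misses"] / c["cache-references"],
        "ipc": c["instructions"] / c["cycles"],
        "branch_miss_pct": 100.0 * c["branch-misses"] / c["branches"],
    }


# Counter lines copied from the two outputs above.
ATOM = """
677353724 cache-references [33.31%]
120428677 cache-misses # 17.779 % of all cache refs [33.35%]
24121523267 cycles [33.37%]
24619412547 instructions # 1.02 insns per cycle [50.03%]
3744803946 branches [50.00%]
54565668 branch-misses # 1.46% of all branches [49.97%]
10.001082048 seconds time elapsed
"""

VIRTIO = """
89,454,697 cache-references [33.30%]
3,212,875 cache-misses # 3.592 % of all cache refs [33.38%]
6,635,235,594 cycles [33.35%]
1,744,194,176 instructions # 0.26 insns per cycle [50.03%]
371,715,356 branches [50.00%]
6,856,483 branch-misses # 1.84% of all branches [49.97%]
10.001054322 seconds time elapsed
"""

if __name__ == "__main__":
    for name, text in (("C2758", ATOM), ("virtio", VIRTIO)):
        d = derived(parse_perf_stat(text))
        print(f"{name}: miss {d['cache_miss_pct']:.3f}%  IPC {d['ipc']:.2f}")
```

The "seconds time elapsed" line is skipped because its number is followed by a decimal point rather than whitespace.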

C2758 cache size

*-cache:0

description: L1 cache

physical id: 25

slot: L1-Cache

size: 448KiB

capacity: 448KiB

capabilities: synchronous internal write-back instruction

*-cache:1

description: L2 cache

physical id: 26

slot: L2-Cache

size: 4MiB

capacity: 4MiB

capabilities: synchronous internal write-back unified

The kernel reports no last-level iTLB entries for the Atom:

Last level iTLB entries: 4KB 0

Last level dTLB entries: 4KB 128

E5-2609 Xeon Core

root@cos05-m:~# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses -t 2810 sleep 10

Performance counter stats for thread id '2810':

186538457 cache-references [100.00%]

500 cache-misses # 0.000 % of all cache refs [100.00%]

23988762917 cycles [100.00%]

43773193744 instructions # 1.82 insns per cycle [100.00%]

6520118366 branches [100.00%]

25966179 branch-misses # 0.40% of all branches

10.001159796 seconds time elapsed
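
The cycle counts above also give the effective clock of each core, and from that a rough per-packet cycle budget. The sketch below is only an estimate: it assumes that a single core handles all packets and that it runs at the baremetal rates from the table (9.78 Mpps on the Atom, 13.66 Mpps on the Xeon), neither of which the post states for these perf runs.

```python
def effective_ghz(cycles, elapsed_s):
    """Average core clock in GHz over the measurement window."""
    return cycles / elapsed_s / 1e9


def cycles_per_packet(cycles, elapsed_s, mpps):
    """Cycle budget per packet, assuming one core handles every packet."""
    return (cycles / elapsed_s) / (mpps * 1e6)


# Cycle counts and elapsed times copied from the perf stat outputs above.
ATOM_CYCLES, ATOM_ELAPSED = 24121523267, 10.001082048
XEON_CYCLES, XEON_ELAPSED = 23988762917, 10.001159796

# Assumption: the sampled threads forward at the baremetal table rates.
ATOM_MPPS, XEON_MPPS = 9.78, 13.66

if __name__ == "__main__":
    print(f"Atom: {effective_ghz(ATOM_CYCLES, ATOM_ELAPSED):.3f} GHz, "
          f"{cycles_per_packet(ATOM_CYCLES, ATOM_ELAPSED, ATOM_MPPS):.0f} cycles/packet")
    print(f"Xeon: {effective_ghz(XEON_CYCLES, XEON_ELAPSED):.3f} GHz, "
          f"{cycles_per_packet(XEON_CYCLES, XEON_ELAPSED, XEON_MPPS):.0f} cycles/packet")
```

Both cores come out at roughly 2.4 GHz, which matches the clock speeds quoted in the post. With the clocks equal, the difference in packet rate has to come from how many cycles each packet costs, which is where the cache-miss ratio matters.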

E5-2609 cache size

configuration: cores=4 enabledcores=4 threads=4

*-cache:0

description: L1 cache

physical id: 700

size: 128KiB

capacity: 128KiB

capabilities: internal write-through data

*-cache:1

description: L2 cache

physical id: 701

size: 1MiB

capacity: 1MiB

capabilities: internal write-through unified

*-cache:2

description: L3 cache

physical id: 702

size: 10MiB

capacity: 10MiB

capabilities: internal write-back unified

The LLC (last-level cache) is the last level in the memory hierarchy before main memory. Any memory requests missing here must be serviced by local or remote DRAM, with significant latency. The LLC Miss metric shows a ratio of cycles with outstanding LLC misses to all cycles.
