Linux :: Tuning the _spin_lock Contention Caused by High Soft IRQ CPU Usage with numactl

When testing the performance of OneProxy, we found very high soft IRQ CPU usage in the “top” output.

top - 00:15:30 up 33 days,  4:42,  1 user,  load average: 0.17, 0.06, 0.02
Tasks: 498 total,   2 running, 496 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.9%us, 18.8%sy,  0.0%ni, 52.8%id,  0.0%wa,  0.0%hi,  7.5%si,  0.0%st
Mem:  32772236k total,  1290176k used, 31482060k free,   178964k buffers
Swap: 16457724k total,        0k used, 16457724k free,   490088k cached

We also saw heavy “_spin_lock” and “_spin_lock_irqsave” contention in the “perf top” output.

Samples: 397K of event 'cycles', Event count (approx.): 152724262750
   1.96%  [kernel]            [k] _spin_lock
   1.91%  oneproxy            [.] g_string_append_len
   1.73%  oneproxy            [.] ifree
   1.53%  oneproxy            [.] sql_tokenizer_internal
   1.50%  oneproxy            [.] g_private_get
   1.49%  [kernel]            [k] _spin_lock_irqsave
   1.34%  oneproxy            [.] malloc
   1.33%  oneproxy            [.] proxy_read_query
   1.31%  [kernel]            [k] sys_epoll_ctl
   1.19%  oneproxy            [.] __intel_ssse3_rep_memcpy

You can find out which CPU cores are busy handling soft IRQs with the “mpstat -P ALL 5” command. There are two network adapters on my test machine, and each has 4 IRQ numbers, so a total of 8 cores are busy with soft IRQs. Since the numactl utility can bind a process to specific physical CPU cores, I took the following steps for performance tuning.
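
Before applying any binding, it helps to confirm which IRQ numbers belong to the network adapters and which cores are servicing them. A minimal check, assuming the interfaces are named eth0 and eth1 (adjust the names to your own hardware):

# list the IRQ numbers assigned to the network adapters
grep -E 'eth0|eth1' /proc/interrupts
# watch per-core utilization; the %soft column shows which cores handle soft IRQs
mpstat -P ALL 5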

Bind the IRQ handling to the first 4 cores (0-3) by adding the following line to “/etc/sysconfig/irqbalance” and restarting the irqbalance service (run “service irqbalance restart” as the root user).

IRQBALANCE_BANNED_CPUS="FFFFFFF0"
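
The value is a hexadecimal CPU bitmask: a set bit bans that core from receiving balanced IRQs, so FFFFFFF0 (bits 4-31 set) keeps IRQ delivery on cores 0-3. After restarting the service, you can verify the affinity of one of the NIC IRQs; the IRQ number 98 below is only a placeholder for a real number taken from /proc/interrupts:

# should show a mask covering only cores 0-3 (e.g. 0000000f)
cat /proc/irq/98/smp_affinity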

Then restart OneProxy with the numactl utility as follows (we have a total of 24 cores on the target machine):

numactl --physcpubind=4-23 \
  ${ONEPROXY_HOME}/oneproxy --keepalive --proxy-address=:3307
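
To confirm that the binding took effect, you can check the CPU affinity of the running process; the pgrep pattern below assumes the process is simply named oneproxy:

# should report an affinity list of 4-23
taskset -cp $(pgrep -o oneproxy)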

Start the mydbtest program again and check the load with the “top” command.

top - 00:30:27 up 33 days,  4:57,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 498 total,   1 running, 497 sleeping,   0 stopped,   0 zombie
Cpu(s): 22.8%us, 21.3%sy,  0.0%ni, 55.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32772236k total,  1292776k used, 31479460k free,   178964k buffers
Swap: 16457724k total,        0k used, 16457724k free,   490104k cached

Let’s check the “perf top” result again.

Samples: 345K of event 'cycles', Event count (approx.): 132376202794
   2.26%  oneproxy            [.] g_string_append_len
   1.97%  oneproxy            [.] ifree
   1.75%  oneproxy            [.] sql_tokenizer_internal
   1.65%  oneproxy            [.] g_private_get
   1.57%  oneproxy            [.] malloc
   1.51%  [kernel]            [k] sys_epoll_ctl
   1.48%  oneproxy            [.] proxy_read_query
   1.36%  [kernel]            [k] _spin_lock
   1.33%  oneproxy            [.] __intel_ssse3_rep_memcpy
   1.33%  [kernel]            [k] _spin_lock_irqsave

With the same test data and the same number of concurrent threads, the soft IRQ CPU usage disappears. Of course, we also get a better overall QPS.

[root@localhost data]# ./mydbtest_linux64.bin query=test1.sql degree=280 
CSummary: SQL01 exec=11130858, rows=11130858=100/e, avg=437 us
Summary: exec=585834/s, qtps=585834/s
/* tuning the irq balance and restarting oneproxy */
[root@localhost data]# ./mydbtest_linux64.bin query=test1.sql degree=280
CSummary: SQL01 exec=159157157, rows=159157157=100/e, avg=437 us
Summary: exec=636628/s, qtps=636628/s

Lower CPU usage and higher QPS through careful CPU binding.