Linux :: Tuning the _spin_lock Contention Caused by High Soft IRQ CPU Usage with numactl

When testing the performance of OneProxy, we found very high soft IRQ CPU usage in the "top" output (note the 7.5% in the "si" column below).

top - 00:15:30 up 33 days,  4:42,  1 user,  load average: 0.17, 0.06, 0.02
Tasks: 498 total,   2 running, 496 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.9%us, 18.8%sy,  0.0%ni, 52.8%id,  0.0%wa,  0.0%hi,  7.5%si,  0.0%st
Mem:  32772236k total,  1290176k used, 31482060k free,   178964k buffers
Swap: 16457724k total,        0k used, 16457724k free,   490088k cached

And very high "_spin_lock" and "_spin_lock_irqsave" contention in the "perf top" output.

Samples: 397K of event 'cycles', Event count (approx.): 152724262750
   1.96%  [kernel]            [k] _spin_lock
   1.91%  oneproxy            [.] g_string_append_len
   1.73%  oneproxy            [.] ifree
   1.53%  oneproxy            [.] sql_tokenizer_internal
   1.50%  oneproxy            [.] g_private_get
   1.49%  [kernel]            [k] _spin_lock_irqsave
   1.34%  oneproxy            [.] malloc
   1.33%  oneproxy            [.] proxy_read_query
   1.31%  [kernel]            [k] sys_epoll_ctl
   1.19%  oneproxy            [.] __intel_ssse3_rep_memcpy

You can find out which CPU cores are busy handling soft IRQs with the "mpstat -P ALL 5" command. There are two network adapters on my test machine, each with 4 IRQ numbers, so 8 cores in total are busy handling soft IRQs. Since the numactl utility can bind a process to specific physical CPU cores, I took the following steps for performance tuning.
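As a sketch of how those 8 busy cores show up, the per-queue IRQs can be counted from /proc/interrupts. The excerpt below is hypothetical (the names eth0/eth1 and the counts are made up), mirroring two NICs with 4 queues each:

```shell
# Hypothetical /proc/interrupts excerpt for two NICs with 4 queues each;
# on a live box, run: grep eth /proc/interrupts
sample=' 50:  123  0  IR-PCI-MSI  eth0-TxRx-0
 51:  456  0  IR-PCI-MSI  eth0-TxRx-1
 52:  789  0  IR-PCI-MSI  eth0-TxRx-2
 53:  321  0  IR-PCI-MSI  eth0-TxRx-3
 54:  111  0  IR-PCI-MSI  eth1-TxRx-0
 55:  222  0  IR-PCI-MSI  eth1-TxRx-1
 56:  333  0  IR-PCI-MSI  eth1-TxRx-2
 57:  444  0  IR-PCI-MSI  eth1-TxRx-3'
# Each IRQ is serviced by one core at a time, so 8 IRQ lines -> 8 busy cores.
echo "$sample" | grep -c eth
# prints 8
```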

Bind the IRQ handling to the first 4 cores (0-3) by adding the following line to "/etc/sysconfig/irqbalance" and restarting the irqbalance service (run "service irqbalance restart" as the root user).
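The config line itself is missing from the post. On RHEL-style systems the usual knob in "/etc/sysconfig/irqbalance" is IRQBALANCE_BANNED_CPUS, a hex bitmask of cores irqbalance must avoid; treating that as the intended setting is my assumption. Deriving the mask that bans cores 4-23 on the 24-core machine:

```shell
# Ban cores 4-23 (bits 4..23 set) so irqbalance keeps IRQs on cores 0-3.
TOTAL=24   # cores on the test machine
KEEP=4     # cores 0-3 reserved for IRQ handling
MASK=$(( ((1 << TOTAL) - 1) & ~((1 << KEEP) - 1) ))
printf 'IRQBALANCE_BANNED_CPUS=%x\n' "$MASK"
# prints IRQBALANCE_BANNED_CPUS=fffff0
```

The printed line is what would go into "/etc/sysconfig/irqbalance" before restarting the service.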


Then restart OneProxy with the numactl utility as follows (the target machine has 24 cores in total):

numactl --physcpubind=4-23 \
  ${ONEPROXY_HOME}/oneproxy --keepalive --proxy-address=:3307 \
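To confirm the binding took effect, taskset can report a process's allowed-CPU list. The snippet below is my addition, not from the original post; it uses a throwaway sleep as a stand-in for the oneproxy pid:

```shell
# Bind a throwaway process to core 0, then query its affinity list;
# for the real check, pass oneproxy's pid (pgrep oneproxy) instead.
taskset -c 0 sleep 5 &
pid=$!
taskset -cp "$pid"   # expect "...current affinity list: 0"
kill "$pid"
```

For the OneProxy started above, the reported affinity list should be 4-23.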

Start the mydbtest program again, and check the load with the "top" command again.

top - 00:30:27 up 33 days,  4:57,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 498 total,   1 running, 497 sleeping,   0 stopped,   0 zombie
Cpu(s): 22.8%us, 21.3%sy,  0.0%ni, 55.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32772236k total,  1292776k used, 31479460k free,   178964k buffers
Swap: 16457724k total,        0k used, 16457724k free,   490104k cached

Let’s check the “perf top” result again.

Samples: 345K of event 'cycles', Event count (approx.): 132376202794
   2.26%  oneproxy            [.] g_string_append_len
   1.97%  oneproxy            [.] ifree
   1.75%  oneproxy            [.] sql_tokenizer_internal
   1.65%  oneproxy            [.] g_private_get
   1.57%  oneproxy            [.] malloc
   1.51%  [kernel]            [k] sys_epoll_ctl
   1.48%  oneproxy            [.] proxy_read_query
   1.36%  [kernel]            [k] _spin_lock
   1.33%  oneproxy            [.] __intel_ssse3_rep_memcpy
   1.33%  [kernel]            [k] _spin_lock_irqsave

With the same test data and the same number of concurrent threads, the soft IRQ CPU usage disappears. Of course, we also get better overall QPS.

[root@localhost data]# ./mydbtest_linux64.bin query=test1.sql degree=280 
CSummary: SQL01 exec=11130858, rows=11130858=100/e, avg=437 us
Summary: exec=585834/s, qtps=585834/s
/* after tuning the irq balance and restarting oneproxy */
[root@localhost data]# ./mydbtest_linux64.bin query=test1.sql degree=280
CSummary: SQL01 exec=159157157, rows=159157157=100/e, avg=437 us
Summary: exec=636628/s, qtps=636628/s
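The throughput gain from the two runs above works out as:

```shell
# QPS improvement, using the exec/s figures reported by mydbtest above.
awk 'BEGIN { printf "%.1f%%\n", (636628 - 585834) * 100 / 585834 }'
# prints 8.7%
```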

Lower CPU usage and roughly 8.7% more QPS (585,834 to 636,628 exec/s) through careful CPU binding.