Kernel netfilter processing question (with an introduction to packet reception and NAPI)

2020-05-28 00:00:00 function, execution, variable, service, interrupt

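I believe that while a datagram is being processed by netfilter it stays in softirq context the whole time, so it cannot be interrupted by another softirq.

If a netfilter action, i.e. a target, directly calls dev_queue_xmit() to send a datagram it has constructed itself, will that cause a memory leak?

The closest comparable code in the kernel is ipt_REJECT, but it sends its own data through an NF hook rather than calling dev_queue_xmit() directly.

The source of the function is as follows (kernel/softirq.c):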
asmlinkage void do_softirq()
{
    int cpu = smp_processor_id();
    __u32 active, mask;

    if (in_interrupt())
        return;

    local_bh_disable();

    local_irq_disable();
    mask = softirq_mask(cpu);
    active = softirq_active(cpu) & mask;

    if (active) {
        struct softirq_action *h;

restart:
        /* Reset active bitmask before enabling irqs */
        softirq_active(cpu) &= ~active;

        local_irq_enable();

        h = softirq_vec;
        mask &= ~active;

        do {
            if (active & 1)
                h->action(h);
            h++;
            active >>= 1;
        } while (active);

        local_irq_disable();

        active = softirq_active(cpu);
        if ((active &= mask) != 0)
            goto retry;
    }

    local_bh_enable();

    /* Leave with locally disabled hard irqs. It is critical to close
     * window for infinite recursion, while we help local bh count,
     * it protected us. Now we are defenceless.
     */
    return;

retry:
    goto restart;
}
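From the source above, softirq servicing proceeds as follows:
(1) The in_interrupt() macro checks whether the current CPU is already servicing an interrupt; if it is, do_softirq() returns immediately. The macro is defined in hardirq.h; see Section 5.7.
(2) The local_bh_disable() macro increments the __local_bh_count member of the current CPU's interrupt statistics structure, marking the CPU as being in softirq service.
(3) The code is about to read and write the __softirq_active and __softirq_mask variables in the current CPU's interrupt statistics structure, so to keep that sequence atomic it first disables interrupts on the current CPU with the local_irq_disable() macro (in effect the cli instruction).
(4) It then reads the current CPU's __softirq_active and __softirq_mask values. When a softirq vector is raised (its bit in __softirq_active is set to 1), its service function may run only if the corresponding bit in __softirq_mask is also 1. That is why the two variables are ANDed together.
(5) If active is nonzero, softirq service functions need to run. Therefore:
    ① The corresponding bits in the current CPU's __softirq_active are cleared, and then interrupts are re-enabled with the local_irq_enable() macro (in effect the sti instruction).
    ② The corresponding bits in the local variable mask are cleared. The purpose is that this invocation of do_softirq() will not service a repeated request on the same softirq vector; the request is left for the next invocation of do_softirq(), which keeps do_softirq() from getting stuck servicing softirqs forever.
    ③ A do{}while loop runs the service functions selected by the bits of active.
    ④ Because __softirq_active is about to be examined again, local_irq_disable() is called once more to disable interrupts on the current CPU.
    ⑤ The current CPU's __softirq_active is read and ANDed with the local variable mask to see whether any other softirq has been raised in the meantime (such as the case mentioned earlier). If so, execution jumps to the retry label (which in turn jumps to restart) and services softirqs again. If not, this round of softirq servicing is over.
(6) Finally, the local_bh_enable() macro decrements the current CPU's __local_bh_count, indicating that the CPU has left softirq service. The local_bh_enable() macro is also defined in the include/asm-i386/softirq.h header.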

My analysis of softirq is not as detailed or accurate as xiaozhaoz's, but I will post it here anyway.

Softirq.c

/*
* We restart softirq processing MAX_SOFTIRQ_RESTART times,
* and we fall back to softirqd after that.
*
* This number has been established via experimentation.
* The two things to balance is latency against fairness -
* we want to handle softirqs as soon as possible, but they
* should not be able to lock up the box.
*/
#define MAX_SOFTIRQ_RESTART 10

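asmlinkage void __do_softirq(void)
{
    struct softirq_action *h;
    __u32 pending;
    int max_restart = MAX_SOFTIRQ_RESTART;
    int cpu;

    pending = local_softirq_pending(); // save the current softirq state, i.e. which softirqs need handling

    local_bh_disable();
    cpu = smp_processor_id();
restart:
    /* Reset the pending bitmask before enabling irqs */
    local_softirq_pending() = 0;

    local_irq_enable();

    h = softirq_vec;

    do {
        if (pending & 1) {
            h->action(h);
            rcu_bh_qsctr_inc(cpu);
        }
        h++;
        pending >>= 1;
    } while (pending); // handle the softirqs one by one until none are left
    /*
     * 1. Interrupts are enabled while the softirq actions run, so a hard
     *    interrupt can interrupt this loop, but another softirq cannot,
     *    because do_softirq() checks in_interrupt() on entry. A softirq
     *    raised in the meantime may be picked up when the check below
     *    jumps back to restart.
     * 2. Softirq processing runs at most 10 passes per call, as defined by
     *    MAX_SOFTIRQ_RESTART. Each pass can handle up to 32 softirqs,
     *    although only a handful of softirq types are actually defined.
     */
    local_irq_disable();

    pending = local_softirq_pending();
    // If new softirqs were raised while the ones above were being handled and
    // the restart limit has not been reached, go back and handle them too.
    if (pending && --max_restart)
        goto restart;
    // If the limit has been reached but softirqs are still pending, wake
    // ksoftirqd and let the system decide when they get handled.
    if (pending)
        wakeup_softirqd();

    __local_bh_enable();
}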
#ifndef __ARCH_HAS_DO_SOFTIRQ

asmlinkage void do_softirq(void)
{
    __u32 pending;
    unsigned long flags;

    if (in_interrupt()) // true in both softirq and hardirq context
        return;

    local_irq_save(flags); // save the interrupt state and disable interrupts

    pending = local_softirq_pending(); // are there any softirqs to handle?

    if (pending)
        __do_softirq(); // run the softirqs

    local_irq_restore(flags); // restore the saved interrupt state
}

EXPORT_SYMBOL(do_softirq);

#endif

/* SoftIRQ primitives.  */
#define local_bh_disable() \
do { add_preempt_count(SOFTIRQ_OFFSET); barrier(); } while (0)
#define __local_bh_enable() \
do { barrier(); sub_preempt_count(SOFTIRQ_OFFSET); } while (0)

#ifdef CONFIG_DEBUG_PREEMPT
  extern void fastcall add_preempt_count(int val);
  extern void fastcall sub_preempt_count(int val);
#else
# define add_preempt_count(val) do { preempt_count() += (val); } while (0)
# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0)
#endif

#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT) //0x0100
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT) //0x010000

#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)  // 8
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)  // 16

#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8

/*
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_MASK: 0x0fff0000
*/

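I do not fully understand the assembly below, but judging from the names it obtains the preempt_count of the thread currently running on this CPU. So local_bh_disable simply adds SOFTIRQ_OFFSET (i.e. 0x0100) to the current thread's preempt_count, and barrier() presumably keeps the operations from being reordered (I am not sure).
Question: why is that enough to disable bottom halves?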

#define preempt_count() (current_thread_info()->preempt_count)
/* how to get the thread information struct from C */
static inline struct thread_info *current_thread_info(void)
{
struct thread_info *ti;
__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~(THREAD_SIZE - 1)));
return ti;
}

kernel 2.6.13

Source: CU community: Kernel netfilter processing question (with an introduction to packet reception and NAPI)
