Linux Interrupt Mechanism (2) -- Softirqs

Liang Bing (梁冰), Peking University Network and Distributed Systems Laboratory, 2004.11.15

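1. Introduction to softirqs
• Recall from last time: what is the last step of hardware interrupt handling in do_IRQ()? A call to do_softirq()!
• Why softirqs?
  • A hardware interrupt must be acknowledged and handled quickly; spending too long on one interrupt event risks losing further interrupts of the same type
  • Some of the work an interrupt requires is not urgent
• A hardware interrupt interrupts the CPU; a softirq interrupts the kernel's normal flow of execution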

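2. Kinds of deferrable functions
• The Linux 2.4.* series has three kinds of deferrable kernel functions:
  • softirqs
  • tasklets
  • bottom halves
• How they relate: tasklets are implemented on top of softirqs, and bottom halves are implemented on top of tasklets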

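3 Softirqs, tasklets and bottom halves
• Softirqs
  • Statically allocated: defined when the kernel is compiled
  • Softirqs of the same type can run concurrently on several CPUs
• Tasklets
  • Can be allocated dynamically at run time, for example when a module is loaded
  • Tasklets of different types can run concurrently on several CPUs; tasklets of the same type cannot
• Bottom halves
  • Statically allocated: defined when the kernel is compiled
  • Bottom halves cannot run concurrently on several CPUs, not even bottom halves of different types
• Note
  • A deferrable function is never interleaved with another deferrable function on the same CPU: each one runs to completion before the next one starts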

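3.1 General operations on deferrable functions
• Initialization
  • Define a new deferrable function
• Activation
  • Mark a deferrable function as pending
  • Activation may happen at any time, even inside an interrupt handler
• Masking
  • Selectively disable a deferrable function so that the kernel does not execute it even when it has been activated
• Execution
  • Execute a pending deferrable function together with the other pending deferrable functions
  • A deferrable function activated on a given CPU normally runs on that same CPU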

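4 Deferrable functions: softirqs in detail
• The 2.4.* kernel defines 4 softirqs
• A lower index means a higher priority: softirq handlers are executed starting from index 0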
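The priority rule can be illustrated with a small user-space sketch. The enum copies the four 2.4 softirq indices; the helper softirq_service_order is hypothetical and only reproduces the bit scan that do_softirq() performs over softirq_vec, lowest bit first.

```c
#include <assert.h>
#include <stdint.h>

/* Softirq indices as defined in Linux 2.4 <linux/interrupt.h>;
 * a lower index means a higher priority. */
enum {
    HI_SOFTIRQ = 0,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    TASKLET_SOFTIRQ
};

/* Hypothetical helper: store the indices of the set bits of `pending`
 * in `order`, lowest bit first (the order in which do_softirq() walks
 * softirq_vec). Returns how many indices were stored. */
static int softirq_service_order(uint32_t pending, int order[32])
{
    int n = 0;

    for (int nr = 0; pending; nr++, pending >>= 1)
        if (pending & 1)
            order[n++] = nr;
    assert(n <= 32);
    return n;
}
```

For a pending mask with TASKLET_SOFTIRQ, HI_SOFTIRQ and NET_RX_SOFTIRQ set, the helper reports HI_SOFTIRQ, NET_RX_SOFTIRQ, TASKLET_SOFTIRQ, regardless of the order in which the bits were raised.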

4.1 Main softirq data structures
• softirq_vec array
  • An array of 32 softirq_action structures; in the default kernel only the first 4 entries are used.
    struct softirq_action
    {
        void (*action)(struct softirq_action *);
        void *data;
    };

4.1 Main softirq data structures (cont.)
• irq_stat
  • An array of irq_cpustat_t, one element per CPU. Several of its fields are already used by the hardware interrupt code.
    typedef struct {
        unsigned int __softirq_pending;
        unsigned int __local_irq_count;
        unsigned int __local_bh_count;
        unsigned int __syscall_count;
        struct task_struct * __ksoftirqd_task; /* waitqueue is too large */
        unsigned int __nmi_count;              /* arch dependent */
    } ____cacheline_aligned irq_cpustat_t;

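4.1 Main softirq data structures (cont.)
• irq_cpustat_t
  • The __softirq_pending field holds a set of flags, one bit per pending softirq
  • The __local_bh_count field disables softirq execution: softirqs are enabled when it is 0 and disabled when it is positive (local_bh_disable() increments it, so disables can nest)
  • The __ksoftirqd_task field holds the process descriptor of the ksoftirqd_CPUn kernel thread, a thread dedicated to running deferrable functions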

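4.2 Softirq-related calls
• Initialization
  • open_softirq() initializes a softirq. Its three parameters are the softirq index, the softirq function to execute, and a data pointer stored in the softirq_vec entry
• Activation
  • The __cpu_raise_softirq macro activates a softirq by setting its bit in __softirq_pending
  • The cpu_raise_softirq function activates a softirq and, when called outside interrupt and bottom-half context, also wakes up the ksoftirqd_CPUn kernel thread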
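Initialization and activation can be sketched in user space, assuming a single CPU. softirq_action, softirq_vec and open_softirq reuse the kernel names but are stand-ins here; raise_softirq_bit plays the role of the __cpu_raise_softirq macro, and count_handler is a hypothetical handler for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 32

struct softirq_action {
    void (*action)(struct softirq_action *);
    void *data;
};

/* One vector shared by all CPUs, as in 2.4 (softirq_vec[32]). */
static struct softirq_action softirq_vec[NR_SOFTIRQS];

/* Pending bitmask of the single modeled CPU (assumption). */
static uint32_t softirq_pending_mask;

/* Model of open_softirq(nr, action, data): fill in slot nr. */
static void open_softirq(int nr, void (*action)(struct softirq_action *), void *data)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
    softirq_vec[nr].data = data;
}

/* Model of __cpu_raise_softirq(cpu, nr): only sets bit nr. */
static void raise_softirq_bit(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_pending_mask |= 1u << nr;
}

/* Hypothetical handler: increments the int its data field points to. */
static void count_handler(struct softirq_action *h)
{
    int *counter = h->data;

    (*counter)++;
}
```

Registering count_handler at index 3 with a pointer to a counter and then raising bits 3 and 0 leaves the pending mask at 0x9; calling softirq_vec[3].action(&softirq_vec[3]), as do_softirq() does, increments the counter.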

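4.2 Softirq-related calls (cont.)
• Masking
  • Softirqs are enabled when __local_bh_count is 0 and disabled when it is positive
• Execution
  • Pending softirqs are checked at several points in the kernel code; the number and location of these check points change between kernel versions. For the 2.4.* kernels they are:
    • when the local_bh_enable macro re-enables softirqs
    • when do_IRQ finishes handling an I/O interrupt
    • when smp_apic_timer_interrupt finishes handling a local timer interrupt
    • when one of the ksoftirqd_CPUn kernel threads is woken up
    • when a packet is received on a network interface card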
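The masking rule can be modeled with two counters. struct cpu_counts and the helpers below are hypothetical; they only mirror the 2.4 semantics that local_bh_disable() increments __local_bh_count, that softirqs are enabled only when it is 0, and that in_interrupt() is true whenever the hardware-interrupt count or the bh count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU counters, modeled after the 2.4
 * __local_irq_count and __local_bh_count fields. */
struct cpu_counts {
    unsigned int local_irq_count; /* nesting depth of hardware interrupt handlers */
    unsigned int local_bh_count;  /* > 0 means softirqs are disabled */
};

static void bh_disable(struct cpu_counts *c)
{
    c->local_bh_count++;
}

static void bh_enable(struct cpu_counts *c)
{
    assert(c->local_bh_count > 0); /* an unbalanced enable is a bug */
    c->local_bh_count--;
}

/* Same test as 2.4's in_interrupt(): true inside a hardware interrupt
 * handler or while softirqs are disabled; do_softirq() returns at once then. */
static bool in_interrupt_model(const struct cpu_counts *c)
{
    return (c->local_irq_count + c->local_bh_count) != 0;
}
```

Two nested bh_disable calls need two bh_enable calls before in_interrupt_model reports false again.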

4.2 Softirq-related calls (cont.)
• Key code:
    asmlinkage void do_softirq()
    {
        int cpu = smp_processor_id();
        __u32 pending;
        unsigned long flags;
        __u32 mask;
        if (in_interrupt())
            return;

4.2 Softirq-related calls (cont.)
        local_irq_save(flags);
        pending = softirq_pending(cpu);
        if (pending) {
            struct softirq_action *h;
            mask = ~pending;
            local_bh_disable();

4.2 Softirq-related calls (cont.)
    restart:
            /* Reset the pending bitmask before enabling irqs */
            softirq_pending(cpu) = 0;
            local_irq_enable();
            h = softirq_vec;
            do {
                if (pending & 1)
                    h->action(h);
                h++;
                pending >>= 1;
            } while (pending);

4.2 Softirq-related calls (cont.)
            local_irq_disable();
            pending = softirq_pending(cpu);
            if (pending & mask) {
                mask &= ~pending;
                goto restart;
            }
            __local_bh_enable();
            if (pending)
                wakeup_softirqd(cpu);
        }
        local_irq_restore(flags);
    }

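4.2 Softirq-related calls (cont.)
• Notes and explanation:
  • Deferrable functions (softirqs) must run outside interrupt context, otherwise they would defeat their purpose; do_softirq() therefore returns immediately when in_interrupt() is true
  • If deferrable functions are disabled (__local_bh_count > 0), they cannot run; in_interrupt() covers this case as well
  • The pending bitmask is modified with local interrupts disabled; the softirq handlers themselves run with softirqs disabled (local_bh_disable) but with interrupts enabled
  • mask records the softirq types that have not yet run in this invocation. If such a type is raised while the handlers run (for example by an interrupt, since interrupts are enabled), do_softirq() restarts and removes that type from mask, so the number of restarts is bounded
  • Finally, if softirqs are still pending, the ksoftirqd_CPUn kernel thread is woken up to handle them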
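The restart logic of do_softirq() is easier to follow in a single-CPU simulation. struct softirq_sim and do_softirq_model are hypothetical; raise_on_run[i] stands for whatever becomes pending while handler i runs (a self-reactivating handler, or an interrupt arriving with interrupts enabled). Interrupt masking and locking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model of the pending/mask bookkeeping. */
struct softirq_sim {
    uint32_t pending;           /* stands in for softirq_pending(cpu) */
    uint32_t raise_on_run[32];  /* bits raised while handler i runs */
    int runs[32];               /* how many times handler i ran */
    bool ksoftirqd_woken;       /* stands in for wakeup_softirqd(cpu) */
};

/* Mirrors the control flow of 2.4 do_softirq(). */
static void do_softirq_model(struct softirq_sim *s)
{
    uint32_t pending = s->pending;
    uint32_t mask;

    if (!pending)
        return;
    mask = ~pending;            /* types that have not run yet */
restart:
    s->pending = 0;             /* reset before "enabling IRQs" */
    for (int nr = 0; pending; nr++, pending >>= 1) {
        if (pending & 1) {
            s->runs[nr]++;
            s->pending |= s->raise_on_run[nr];
        }
    }
    pending = s->pending;
    if (pending & mask) {       /* a not-yet-run type became pending */
        mask &= ~pending;
        goto restart;
    }
    if (pending)
        s->ksoftirqd_woken = true;  /* leftovers go to ksoftirqd */
}
```

With handler 3 re-raising itself, it runs once and ksoftirqd is woken for the leftover. If handler 0 raises type 2, the call restarts once, both run, and nothing is left over. A restart re-runs everything pending, though: if handler 0 raises both 0 and 1, handler 0 runs twice in one call, so what is bounded is the number of restarts rather than strictly one run per type.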

4.3 The softirq kernel thread
• Each CPU has its own ksoftirqd_CPUn kernel thread (n is the CPU's logical number). Each ksoftirqd_CPUn thread runs the ksoftirqd() function:
    ksoftirqd_task(cpu) = current;
    for (;;) {
        if (!softirq_pending(cpu))
            schedule();
        __set_current_state(TASK_RUNNING);
        while (softirq_pending(cpu)) {
            do_softirq();
            if (current->need_resched)
                schedule();
        }
        __set_current_state(TASK_INTERRUPTIBLE);
    }

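4.3 The softirq kernel thread (cont.)
• Why the softirq kernel thread is needed
  • A softirq function can re-activate itself (both the network softirqs and the tasklet softirq can), and heavy traffic on a network card can raise a large number of softirqs
  • Without a softirq kernel thread there are two extreme strategies:
    • Ignore softirqs raised while do_softirq is running: determine which softirqs are pending when the function starts, run them all once, and never re-check for new ones; newly raised softirqs wait until the next check point, such as the next timer interrupt
    • Keep re-checking for and running softirqs inside do_softirq: this copes with heavy softirq load, but user processes can be starved for a long time
• ksoftirqd provides a compromise between these two extremes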

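5 Deferrable functions: tasklets
• Overview
  • Tasklets are the preferred way to implement deferrable functions in I/O drivers
  • Tasklets are built on two softirqs named HI_SOFTIRQ and TASKLET_SOFTIRQ; they are essentially an application of softirqs. The two softirqs differ little, except that HI_SOFTIRQ runs first and TASKLET_SOFTIRQ afterwards
  • Several tasklets can be associated with the same softirq; that is, one softirq can run many tasklets one after another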

5.1 Tasklet data structures
• tasklet_vec and tasklet_hi_vec
  • These hold the tasklets run by the two softirqs, respectively. Each is an array of NR_CPUS elements of type struct tasklet_head (one per CPU), and each tasklet_head holds a pointer to the first element of a linked list of tasklet_struct descriptors
    struct tasklet_struct
    {
        struct tasklet_struct *next;
        unsigned long state;    /* TASKLET_STATE_SCHED or TASKLET_STATE_RUN */
        atomic_t count;
        void (*func)(unsigned long);
        unsigned long data;
    };
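The layout of tasklet_vec can be sketched in user space as follows. NR_CPUS = 4 and the tasklet_list_length helper are assumptions for illustration, and count is a plain int in place of atomic_t.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* assumption for the sketch */

struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;
    int count;                      /* stands in for atomic_t */
    void (*func)(unsigned long);
    unsigned long data;
};

/* Each array element is a struct (not a pointer) holding the head of a
 * singly linked list of tasklet_struct, as in 2.4. */
struct tasklet_head {
    struct tasklet_struct *list;
};

static struct tasklet_head tasklet_vec[NR_CPUS];

/* Hypothetical helper: count the tasklets queued on one CPU's list. */
static int tasklet_list_length(int cpu)
{
    int n = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    for (struct tasklet_struct *t = tasklet_vec[cpu].list; t; t = t->next)
        n++;
    return n;
}
```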

5.2 Tasklet usage steps and internals
• First allocate a new tasklet_struct and initialize it with tasklet_init
• tasklet_disable_nosync or tasklet_disable can selectively disable a tasklet
• tasklet_schedule or tasklet_hi_schedule schedules it:
    static inline void tasklet_schedule(struct tasklet_struct *t)
    {
        if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
            __tasklet_schedule(t);
    }

5.2 Tasklet usage steps and internals (cont.)
    void __tasklet_schedule(struct tasklet_struct *t)
    {
        int cpu = smp_processor_id();
        unsigned long flags;
        local_irq_save(flags);
        t->next = tasklet_vec[cpu].list;
        tasklet_vec[cpu].list = t;
        cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
        local_irq_restore(flags);
    }
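A single-threaded sketch of why tasklet_schedule() never queues a tasklet twice, and of the push-to-head insertion in __tasklet_schedule(). struct tasklet_queue, test_and_set_bit_model and tasklet_schedule_model are hypothetical; the boolean softirq_raised stands in for cpu_raise_softirq(cpu, TASKLET_SOFTIRQ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASKLET_STATE_SCHED 0   /* bit number, as in 2.4 */

struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;
    void (*func)(unsigned long);
    unsigned long data;
};

/* Hypothetical single-CPU stand-in for tasklet_vec[cpu]. */
struct tasklet_queue {
    struct tasklet_struct *list;
    bool softirq_raised;        /* TASKLET_SOFTIRQ pending bit */
};

/* Non-atomic stand-in for test_and_set_bit(); fine for one thread. */
static bool test_and_set_bit_model(int nr, unsigned long *addr)
{
    bool old = (*addr >> nr) & 1UL;

    *addr |= 1UL << nr;
    return old;
}

/* tasklet_schedule() + __tasklet_schedule(), without IRQ masking. */
static void tasklet_schedule_model(struct tasklet_queue *q, struct tasklet_struct *t)
{
    if (test_and_set_bit_model(TASKLET_STATE_SCHED, &t->state))
        return;                 /* already scheduled: do not queue twice */
    t->next = q->list;          /* push onto the head of the list */
    q->list = t;
    q->softirq_raised = true;
}
```

Scheduling a, then b, then a again leaves the list as b followed by a: the newest tasklet is first, and the second request for a is ignored because TASKLET_STATE_SCHED was still set.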

5.2 Tasklet usage steps and internals (cont.)
• Summary of the steps:
  • Allocate a new tasklet_struct and initialize it by calling tasklet_init
  • Call tasklet_disable_nosync or tasklet_disable to selectively disable the tasklet
  • Call tasklet_schedule or tasklet_hi_schedule to activate it
  • tasklet_action and tasklet_hi_action, invoked from do_softirq, run the actual tasklet functions


5.2 Tasklet usage steps and internals (cont.)
    inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
    {
        __cpu_raise_softirq(cpu, nr);
        /*
         * If we're in an interrupt or bh, we're done
         * (this also catches bh-disabled code). We will
         * actually run the softirq once we return from
         * the irq or bh.
         *
         * Otherwise we wake up ksoftirqd to make sure we
         * schedule the softirq soon.
         */
        if (!(local_irq_count(cpu) | local_bh_count(cpu)))
            wakeup_softirqd(cpu);
    }
    #define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
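The decision made by cpu_raise_softirq() can be isolated in a small sketch. struct cpu_state and its wakeup counter are hypothetical stand-ins for the per-CPU fields and for wakeup_softirqd().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the sketch. */
struct cpu_state {
    unsigned int softirq_pending;
    unsigned int local_irq_count;
    unsigned int local_bh_count;
    int ksoftirqd_wakeups;      /* counts calls to the wakeup stand-in */
};

static void cpu_raise_softirq_model(struct cpu_state *c, unsigned int nr)
{
    assert(nr < 32);
    c->softirq_pending |= 1u << nr;     /* __cpu_raise_softirq */
    /* In IRQ or bh context the softirq runs on the way out, so
     * ksoftirqd is only woken from ordinary process context. */
    if (!(c->local_irq_count | c->local_bh_count))
        c->ksoftirqd_wakeups++;
}
```

Raising a softirq from inside an interrupt handler only sets the bit; raising one from process context also counts a wakeup.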

5.2 Tasklet usage steps and internals (cont.)
    static void tasklet_action(struct softirq_action *a)
    {
        int cpu = smp_processor_id();
        struct tasklet_struct *list;
        local_irq_disable();
        list = tasklet_vec[cpu].list;
        tasklet_vec[cpu].list = NULL;
        local_irq_enable();
        while (list) {
            struct tasklet_struct *t = list;
            list = list->next;
            if (tasklet_trylock(t)) {
                if (!atomic_read(&t->count)) {

5.2 Tasklet usage steps and internals (cont.)
                    if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                        BUG();
                    t->func(t->data);
                    tasklet_unlock(t);
                    continue;
                }
                tasklet_unlock(t);
            }
            local_irq_disable();
            t->next = tasklet_vec[cpu].list;
            tasklet_vec[cpu].list = t;
            __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
            local_irq_enable();
        }
    }
• Note: unless a tasklet function re-activates itself, each activation of a tasklet triggers at most one execution of its function
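The outcomes of tasklet_action() (run the tasklet, or put it back because it is disabled or already running) can be reproduced in a single-threaded sketch. All names ending in _model, struct tasklet_queue, and the sample record_call function are hypothetical; the RUN bit is tested and set non-atomically, and BUG() becomes assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASKLET_STATE_SCHED 0
#define TASKLET_STATE_RUN   1

struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;
    int count;                  /* > 0: disabled (atomic_t in the kernel) */
    void (*func)(unsigned long);
    unsigned long data;
};

struct tasklet_queue {          /* hypothetical stand-in for tasklet_vec[cpu] */
    struct tasklet_struct *list;
    bool softirq_raised;
};

/* Stand-in for tasklet_trylock(): set RUN only if it was clear. */
static bool tasklet_trylock_model(struct tasklet_struct *t)
{
    if (t->state & (1UL << TASKLET_STATE_RUN))
        return false;
    t->state |= 1UL << TASKLET_STATE_RUN;
    return true;
}

static void tasklet_unlock_model(struct tasklet_struct *t)
{
    t->state &= ~(1UL << TASKLET_STATE_RUN);
}

/* Follows the structure of 2.4 tasklet_action(), without IRQ masking. */
static void tasklet_action_model(struct tasklet_queue *q)
{
    struct tasklet_struct *list = q->list;

    q->list = NULL;             /* detach the whole list */
    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;
        if (tasklet_trylock_model(t)) {
            if (t->count == 0) {
                assert(t->state & (1UL << TASKLET_STATE_SCHED)); /* BUG() */
                t->state &= ~(1UL << TASKLET_STATE_SCHED);
                t->func(t->data);
                tasklet_unlock_model(t);
                continue;
            }
            tasklet_unlock_model(t);
        }
        /* Disabled or running elsewhere: requeue and re-raise. */
        t->next = q->list;
        q->list = t;
        q->softirq_raised = true;
    }
}

/* Sample tasklet function used in the example below. */
static int record_calls;
static unsigned long record_last_data;
static void record_call(unsigned long data)
{
    record_calls++;
    record_last_data = data;
}
```

With an enabled tasklet followed by a disabled one (count = 1), the enabled one runs once with its data and loses TASKLET_STATE_SCHED, while the disabled one ends up back on the list with the softirq re-raised.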

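6 Deferred execution: bottom halves
• A bottom half is essentially a high-priority tasklet that cannot run concurrently with any other bottom half, even one of a different type running on another CPU. The global_bh_lock spinlock ensures that at most one bottom half runs at a time
• Bottom halves were widely used in kernels before 2.4. Because of their poor concurrency they are gradually being replaced by tasklets from 2.4 on; for compatibility with existing drivers, 2.4 still supports bottom halves, implemented on top of tasklets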

6.1 Bottom-half data structures
    static void (*bh_base[32])(void);
    struct tasklet_struct bh_task_vec[32];
• Most entries in bh_base are hardware-specific handlers, but some, such as TIMER_BH, CONSOLE_BH, TQUEUE_BH, SERIAL_BH and IMMEDIATE_BH, are widely used


6.2 Running bottom halves
• Initialization is done in init_bh:
    void init_bh(int nr, void (*routine)(void))
    {
        bh_base[nr] = routine;
        mb();
    }
• Removal is done in remove_bh:
    void remove_bh(int nr)
    {
        tasklet_kill(bh_task_vec+nr);
        bh_base[nr] = NULL;
    }

6.2 Running bottom halves (cont.)
• Activation is done in mark_bh:
    static inline void mark_bh(int nr)
    {
        tasklet_hi_schedule(bh_task_vec+nr);
    }
• All of the setup happens in softirq_init:
    void __init softirq_init()
    {
        int i;
        for (i = 0; i < 32; i++)
            tasklet_init(bh_task_vec+i, bh_action, i);
        open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
        open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
    }
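How init_bh, mark_bh and softirq_init fit together can be sketched with a bitmask standing in for the 32 bh_task_vec tasklets. The _model functions and sample_bh are hypothetical; the order of tasklets within the HI list is not modeled (lowest index runs first here).

```c
#include <assert.h>
#include <stddef.h>

#define NR_BH 32

/* bh_base[] mirrors the kernel array; bit nr of the mask stands in for
 * "bh_task_vec[nr] is scheduled on the HI tasklet list". */
static void (*bh_base[NR_BH])(void);
static unsigned int bh_tasklet_scheduled;

static void init_bh_model(int nr, void (*routine)(void))
{
    assert(nr >= 0 && nr < NR_BH);
    bh_base[nr] = routine;
}

/* mark_bh(nr) is tasklet_hi_schedule(bh_task_vec + nr); marking an
 * already-scheduled bottom half again has no extra effect. */
static void mark_bh_model(int nr)
{
    assert(nr >= 0 && nr < NR_BH);
    bh_tasklet_scheduled |= 1u << nr;
}

/* Stand-in for HI_SOFTIRQ running the bottom-half tasklets: each
 * tasklet's data is its index, so bh_action(nr) calls bh_base[nr](). */
static void run_bh_tasklets_model(void)
{
    unsigned int pending = bh_tasklet_scheduled;

    bh_tasklet_scheduled = 0;
    for (int nr = 0; pending; nr++, pending >>= 1)
        if ((pending & 1) && bh_base[nr])
            bh_base[nr]();
}

/* Sample bottom-half routine used in the example below. */
static int sample_bh_calls;
static void sample_bh(void)
{
    sample_bh_calls++;
}
```

Marking the same bottom half twice before it runs still produces a single call to its routine.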

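6.2 Running bottom halves (cont.)
• A bottom half is run by bh_action, the tasklet function shared by all bottom halves. It takes the bottom half's index as its argument and performs these steps:
  • Get the logical number of the CPU executing the tasklet
  • Try to take the global spinlock global_bh_lock; if another CPU holds it (it is running a bottom half), re-mark this bottom half with mark_bh and return, which keeps bottom halves globally serialized
  • With the global spinlock held, use hardirq_trylock to check whether the CPU is in hardware interrupt context or the global interrupt lock is held; if either is true, this is not the right moment to run the bottom half, so release the global spinlock and re-mark the bottom half
  • Otherwise run the bottom half's handler immediately
  • Release the global spinlock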

6.2 Running bottom halves (cont.)
    static void bh_action(unsigned long nr)
    {
        int cpu = smp_processor_id();
        if (!spin_trylock(&global_bh_lock))
            goto resched;
        if (!hardirq_trylock(cpu))
            goto resched_unlock;
        if (bh_base[nr])
            bh_base[nr]();
        hardirq_endlock(cpu);
        spin_unlock(&global_bh_lock);
        return;
    resched_unlock:
        spin_unlock(&global_bh_lock);
    resched:
        mark_bh(nr);
    }
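The control flow of bh_action() can be sketched with two booleans standing in for spin_trylock(&global_bh_lock) and hardirq_trylock(cpu). struct bh_state, bh_action_model and sample_bh are hypothetical, and the remarked bitmask only records calls to mark_bh() instead of rescheduling anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BH 32

/* Hypothetical single-threaded model of the 2.4 bottom-half state. */
struct bh_state {
    void (*bh_base[NR_BH])(void);
    bool global_bh_lock_held;   /* spin_trylock() fails while true */
    bool hardirq_busy;          /* hardirq_trylock(cpu) fails while true */
    unsigned int remarked;      /* bitmask of BHs re-marked via mark_bh() */
};

static void bh_action_model(struct bh_state *s, unsigned long nr)
{
    assert(nr < NR_BH);
    if (s->global_bh_lock_held)
        goto resched;           /* another CPU is running a bottom half */
    s->global_bh_lock_held = true;
    if (s->hardirq_busy)
        goto resched_unlock;    /* not a safe moment to run it */
    if (s->bh_base[nr])
        s->bh_base[nr]();
    s->global_bh_lock_held = false;
    return;
resched_unlock:
    s->global_bh_lock_held = false;
resched:
    s->remarked |= 1u << nr;    /* mark_bh(nr): try again later */
}

/* Sample bottom-half routine used in the example below. */
static int sample_bh_runs;
static void sample_bh(void)
{
    sample_bh_runs++;
}
```

When the lock is free and no hardware interrupt is in progress, the routine runs and nothing is re-marked; in either failure case the routine is skipped, the bottom half is re-marked, and the lock is not left held.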

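6.3 Extending bottom halves: task queues
• Deferrable functions were introduced so that a limited number of interrupt-related functions could run in a deferred manner. This idea can be extended in two directions:
  • Allow an ordinary kernel function, not just one that services an interrupt, to execute as a bottom half
  • Allow several kernel functions, rather than a single one, to be associated with one bottom half
• Both are achieved with task queues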

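6.3.1 Task queue data structures
• The tq_struct structure:
    struct tq_struct {
        struct list_head list;      /* linked list of active bh's */
        unsigned long sync;         /* must be initialized to zero */
        void (*routine)(void *);    /* function to call */
        void *data;                 /* argument to function */
    };
• The sync field keeps the same tq_struct from being inserted into a task queue more than once: queue_task sets it atomically, and __run_task_queue clears it before calling the routine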

6.3.2 Running task queues
• DECLARE_TASK_QUEUE declares a new task queue
• queue_task inserts a function into a task queue:
    static inline int queue_task(struct tq_struct *bh_pointer, task_queue *bh_list)
    {
        int ret = 0;
        if (!test_and_set_bit(0, &bh_pointer->sync)) {
            unsigned long flags;
            spin_lock_irqsave(&tqueue_lock, flags);
            list_add_tail(&bh_pointer->list, bh_list);
            spin_unlock_irqrestore(&tqueue_lock, flags);
            ret = 1;
        }
        return ret;
    }
• run_task_queue executes all the functions in a given task queue

6.3.2 Running task queues (cont.)
    static inline void run_task_queue(task_queue *list)
    {
        if (TQ_ACTIVE(*list))
            __run_task_queue(list);
    }
    void __run_task_queue(task_queue *list)
    {
        struct list_head head, *next;
        unsigned long flags;
        spin_lock_irqsave(&tqueue_lock, flags);
        list_add(&head, list);
        list_del_init(list);
        spin_unlock_irqrestore(&tqueue_lock, flags);
        next = head.next;

6.3.2 Running task queues (cont.)
        while (next != &head) {
            void (*f) (void *);
            struct tq_struct *p;
            void *data;
            p = list_entry(next, struct tq_struct, list);
            next = next->next;
            f = p->routine;
            data = p->data;
            wmb();
            p->sync = 0;
            if (f)
                f(data);
        }
    }
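The queue_task/run_task_queue protocol can be sketched in user space. The kernel uses list_head and tqueue_lock; this hypothetical model uses a singly linked list without locking, keeps only the sync handshake, and detaches the whole queue before running it, as __run_task_queue does. The bump routine is a sample for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked stand-in for the list_head-based queue. */
struct tq_struct {
    struct tq_struct *next;
    unsigned long sync;         /* non-zero while queued */
    void (*routine)(void *);
    void *data;
};

struct task_queue {
    struct tq_struct *head, *tail;
};

/* Like queue_task(): returns 1 if queued, 0 if it was already queued. */
static int queue_task_model(struct tq_struct *t, struct task_queue *q)
{
    if (t->sync)
        return 0;
    t->sync = 1;
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;      /* append at the tail, like list_add_tail */
    else
        q->head = t;
    q->tail = t;
    return 1;
}

/* Like __run_task_queue(): detach the whole list, then for each entry
 * read routine/data and clear sync before calling the routine. */
static void run_task_queue_model(struct task_queue *q)
{
    struct tq_struct *next = q->head;

    q->head = q->tail = NULL;
    while (next) {
        struct tq_struct *p = next;
        void (*f)(void *) = p->routine;
        void *data = p->data;

        next = p->next;
        p->sync = 0;
        if (f)
            f(data);
    }
}

/* Sample routine: increments the int that data points to. */
static void bump(void *data)
{
    ++*(int *)data;
}
```

Because sync is cleared before the routine is called, a routine may re-queue its own tq_struct while the queue is being run; it lands on the freshly emptied queue rather than the detached list.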

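6.3.3 Three special task queues and their roles
• tq_immediate, run by the IMMEDIATE_BH bottom half, holds kernel functions that run together with the standard bottom halves. Whenever a function is added to tq_immediate, mark_bh is called to activate IMMEDIATE_BH, so the queue runs as soon as do_softirq is invoked
• tq_timer, run by the TQUEUE_BH bottom half; TQUEUE_BH is activated on every timer interrupt
• tq_context has little to do with bottom halves; it is run by the keventd kernel thread. schedule_task adds a function to this queue, deferring its execution until the scheduler picks keventd as the next process to run
  • Compared with the task queues run by deferrable functions, the main advantage of tq_context is that its functions may block
  • Kernel developers cannot know which process will be current when a deferrable function runs, so in this sense softirqs (and tasklets and bottom halves) are like interrupt handlers: they cannot access files, acquire semaphores, or sleep on wait queues
  • The price: once a function in tq_context has been scheduled, it may wait a considerable time until the scheduler selects keventd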

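7 Thank you!
• Questions?
• The 2.6 kernel additionally provides work queues, similar to the tq_context queue, and drops bottom halves completely
• Next time: an overview of wireless security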
