[Linux Kernel Design] Part Four: Process Management (I)

Processes, Process Descriptors, and the Task Structure

What Is a Process? What Is a Process Descriptor?

A process is a program in the midst of execution. A process is not limited to a section of executable code (the text section); it also includes other resources, such as open files, pending signals, internal kernel data, processor state, an address space, and one or more threads of execution. In other words, a process is the sum of an executing program and the resources it owns. Note that several distinct processes may be executing the same program, and concurrent processes may share resources such as open files. In the Linux kernel, a process is also commonly called a task.

A thread of execution, or simply thread, is the object of activity within a process. Each thread has its own program counter, process stack, and set of processor registers. The kernel schedules threads, not processes; the process is the basic unit of resource allocation. A process usually contains one or more threads, but Linux does not draw a special distinction between the two: to Linux, a thread is just a special kind of process.
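
Because a thread is just a process that shares certain resources with its creator, creating a thread on Linux amounts to calling clone() with flags that request the sharing. Below is a minimal user-space sketch under stated assumptions: the flag combination (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND) is the classic thread-style sharing set, SIGCHLD is added only so the parent can wait on the child, and the stack size is illustrative.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int thread_fn(void *arg)
{
    /* runs in a new task that shares the parent's address space */
    printf("child task: sharing the parent's address space\n");
    return 0;
}

int main(void)
{
    const size_t stack_size = 64 * 1024;   /* illustrative stack size */
    char *stack = malloc(stack_size);
    if (!stack)
        return 1;

    /* the stack grows down on x86, so pass the high end of the buffer */
    pid_t pid = clone(thread_fn, stack + stack_size,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                      NULL);
    if (pid < 0)
        return 1;
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}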

Processes provide two virtualizations: a virtualized processor and virtual memory.

Virtualized processor: many processes in fact share a single processor, but the virtual processor gives each process the illusion that it has the processor all to itself.

Virtual memory: when acquiring and using memory, a process is allowed to behave as if it owned all of the system's memory.

Threads within the same process can share the virtual memory, but each receives its own virtualized processor.

A process begins its life when it is created, which in Linux usually means as the result of a fork() system call. fork() creates a new process (the child) by duplicating an existing one (the caller, i.e., the parent). When the call completes, the parent resumes execution and the child begins execution at the same place: where the call to fork() returns. A distinctive property of fork() is that it is called once but returns twice from the kernel: once in the parent, at the point of the call, and once in the newly created child.
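
The "one call, two returns" behavior can be observed directly from user space. A minimal sketch:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");   /* creation failed */
        return 1;
    } else if (pid == 0) {
        /* second return: we are in the child, fork() returned 0 */
        printf("child: pid=%d\n", getpid());
    } else {
        /* first return: we are in the parent, fork() returned the child's PID */
        printf("parent: pid=%d, child=%d\n", getpid(), pid);
    }
    return 0;
}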

Usually we create a new process in order to execute a new program, yet as described above, fork() merely duplicates an existing process. (Internally, fork() is implemented via the clone() system call.) To run a new program in the freshly created process, the child calls one of the exec() family of functions, which creates a new address space and loads the new program into it.

Finally, a process exits via the exit() system call, which terminates it and frees its resources. A parent can inquire about the status of a terminated child via the wait4() system call.
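
Putting the three steps together: fork a child, have it exec a new program, and have the parent wait for it. In this sketch the program being run, /bin/ls, is just an example, and waitpid() is used as the usual C-library wrapper over the underlying wait4() system call.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* child: replace the duplicated address space with a new program */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);   /* only reached if exec failed */
    } else if (pid > 0) {
        int status;
        /* parent: block until the child terminates */
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))
            printf("child exited with %d\n", WEXITSTATUS(status));
    }
    return 0;
}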

The Linux kernel stores its processes in a circular doubly linked list called the task list. Each element in the list is a structure of type task_struct, known as the process descriptor, defined in include/linux/sched.h. The process descriptor contains all the information about a specific process, so the structure is very large:

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /*
     * For reasons of header soup (see current_thread_info()), this
     * must be the first element of task_struct.
     */
    struct thread_info thread_info;
#endif
    unsigned int __state;

#ifdef CONFIG_PREEMPT_RT
    /* saved state for "spinlock sleepers" */
    unsigned int saved_state;
#endif

    /*
     * This begins the randomizable portion of task_struct. Only
     * scheduling-critical items should be added above here.
     */
    randomized_struct_fields_start

    void *stack;
    refcount_t usage;
    /* Per task flags (PF_*), defined further below: */
    unsigned int flags;
    unsigned int ptrace;

#ifdef CONFIG_SMP
    int on_cpu;
    struct __call_single_node wake_entry;
    unsigned int wakee_flips;
    unsigned long wakee_flip_decay_ts;
    struct task_struct *last_wakee;

    /*
     * recent_used_cpu is initially set as the last CPU used by a task
     * that wakes affine another task. Waker/wakee relationships can
     * push tasks around a CPU where each wakeup moves to the next one.
     * Tracking a recently used CPU allows a quick search for a recently
     * used CPU that may be idle.
     */
    int recent_used_cpu;
    int wake_cpu;
#endif
    int on_rq;

    int prio;
    int static_prio;
    int normal_prio;
    unsigned int rt_priority;

    struct sched_entity se;
    struct sched_rt_entity rt;
    struct sched_dl_entity dl;
    const struct sched_class *sched_class;

#ifdef CONFIG_SCHED_CORE
    struct rb_node core_node;
    unsigned long core_cookie;
    unsigned int core_occupation;
#endif

#ifdef CONFIG_CGROUP_SCHED
    struct task_group *sched_task_group;
#endif

#ifdef CONFIG_UCLAMP_TASK
    /*
     * Clamp values requested for a scheduling entity.
     * Must be updated with task_rq_lock() held.
     */
    struct uclamp_se uclamp_req[UCLAMP_CNT];
    /*
     * Effective clamp values used for a scheduling entity.
     * Must be updated with task_rq_lock() held.
     */
    struct uclamp_se uclamp[UCLAMP_CNT];
#endif

    struct sched_statistics stats;

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* List of struct preempt_notifier: */
    struct hlist_head preempt_notifiers;
#endif

#ifdef CONFIG_BLK_DEV_IO_TRACE
    unsigned int btrace_seq;
#endif

    unsigned int policy;
    int nr_cpus_allowed;
    const cpumask_t *cpus_ptr;
    cpumask_t *user_cpus_ptr;
    cpumask_t cpus_mask;
    void *migration_pending;
#ifdef CONFIG_SMP
    unsigned short migration_disabled;
#endif
    unsigned short migration_flags;

#ifdef CONFIG_PREEMPT_RCU
    int rcu_read_lock_nesting;
    union rcu_special rcu_read_unlock_special;
    struct list_head rcu_node_entry;
    struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_PREEMPT_RCU */

#ifdef CONFIG_TASKS_RCU
    unsigned long rcu_tasks_nvcsw;
    u8 rcu_tasks_holdout;
    u8 rcu_tasks_idx;
    int rcu_tasks_idle_cpu;
    struct list_head rcu_tasks_holdout_list;
#endif /* #ifdef CONFIG_TASKS_RCU */

#ifdef CONFIG_TASKS_TRACE_RCU
    int trc_reader_nesting;
    int trc_ipi_to_cpu;
    union rcu_special trc_reader_special;
    struct list_head trc_holdout_list;
    struct list_head trc_blkd_node;
    int trc_blkd_cpu;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */

    struct sched_info sched_info;

    struct list_head tasks;
#ifdef CONFIG_SMP
    struct plist_node pushable_tasks;
    struct rb_node pushable_dl_tasks;
#endif

    struct mm_struct *mm;
    struct mm_struct *active_mm;

    int exit_state;
    int exit_code;
    int exit_signal;
    /* The signal sent when the parent dies: */
    int pdeath_signal;
    /* JOBCTL_*, siglock protected: */
    unsigned long jobctl;

    /* Used for emulating ABI behavior of previous Linux versions: */
    unsigned int personality;

    /* Scheduler bits, serialized by scheduler locks: */
    unsigned sched_reset_on_fork:1;
    unsigned sched_contributes_to_load:1;
    unsigned sched_migrated:1;

    /* Force alignment to the next boundary: */
    unsigned :0;

    /* Unserialized, strictly 'current' */

    /*
     * This field must not be in the scheduler word above due to wakelist
     * queueing no longer being serialized by p->on_cpu. However:
     *
     * p->XXX = X;                    ttwu()
     * schedule()                       if (p->on_rq && ..) // false
     *   smp_mb__after_spinlock();      if (smp_load_acquire(&p->on_cpu) && //true
     *   deactivate_task()                  ttwu_queue_wakelist())
     *     p->on_rq = 0;              p->sched_remote_wakeup = Y;
     *
     * guarantees all stores of 'current' are visible before
     * ->sched_remote_wakeup gets used, so it can be in this word.
     */
    unsigned sched_remote_wakeup:1;

    /* Bit to tell LSMs we're in execve(): */
    unsigned in_execve:1;
    unsigned in_iowait:1;
#ifndef TIF_RESTORE_SIGMASK
    unsigned restore_sigmask:1;
#endif
#ifdef CONFIG_MEMCG
    unsigned in_user_fault:1;
#endif
#ifdef CONFIG_LRU_GEN
    /* whether the LRU algorithm may apply to this access */
    unsigned in_lru_fault:1;
#endif
#ifdef CONFIG_COMPAT_BRK
    unsigned brk_randomized:1;
#endif
#ifdef CONFIG_CGROUPS
    /* disallow userland-initiated cgroup migration */
    unsigned no_cgroup_migration:1;
    /* task is frozen/stopped (used by the cgroup freezer) */
    unsigned frozen:1;
#endif
#ifdef CONFIG_BLK_CGROUP
    unsigned use_memdelay:1;
#endif
#ifdef CONFIG_PSI
    /* Stalled due to lack of memory */
    unsigned in_memstall:1;
#endif
#ifdef CONFIG_PAGE_OWNER
    /* Used by page_owner=on to detect recursion in page tracking. */
    unsigned in_page_owner:1;
#endif
#ifdef CONFIG_EVENTFD
    /* Recursion prevention for eventfd_signal() */
    unsigned in_eventfd:1;
#endif
#ifdef CONFIG_IOMMU_SVA
    unsigned pasid_activated:1;
#endif
#ifdef CONFIG_CPU_SUP_INTEL
    unsigned reported_split_lock:1;
#endif
#ifdef CONFIG_TASK_DELAY_ACCT
    /* delay due to memory thrashing */
    unsigned in_thrashing:1;
#endif

    unsigned long atomic_flags; /* Flags requiring atomic access. */

    struct restart_block restart_block;

    pid_t pid;
    pid_t tgid;

#ifdef CONFIG_STACKPROTECTOR
    /* Canary value for the -fstack-protector GCC feature: */
    unsigned long stack_canary;
#endif
    /*
     * Pointers to the (original) parent process, youngest child, younger sibling,
     * older sibling, respectively.  (p->father can be replaced with
     * p->real_parent->pid)
     */

    /* Real parent process: */
    struct task_struct __rcu *real_parent;

    /* Recipient of SIGCHLD, wait4() reports: */
    struct task_struct __rcu *parent;

    /*
     * Children/sibling form the list of natural children:
     */
    struct list_head children;
    struct list_head sibling;
    struct task_struct *group_leader;

    /*
     * 'ptraced' is the list of tasks this task is using ptrace() on.
     *
     * This includes both natural children and PTRACE_ATTACH targets.
     * 'ptrace_entry' is this task's link on the p->parent->ptraced list.
     */
    struct list_head ptraced;
    struct list_head ptrace_entry;

    /* PID/PID hash table linkage. */
    struct pid *thread_pid;
    struct hlist_node pid_links[PIDTYPE_MAX];
    struct list_head thread_group;
    struct list_head thread_node;

    struct completion *vfork_done;

    /* CLONE_CHILD_SETTID: */
    int __user *set_child_tid;

    /* CLONE_CHILD_CLEARTID: */
    int __user *clear_child_tid;

    /* PF_KTHREAD | PF_IO_WORKER */
    void *worker_private;

    u64 utime;
    u64 stime;
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
    u64 utimescaled;
    u64 stimescaled;
#endif
    u64 gtime;
    struct prev_cputime prev_cputime;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
    struct vtime vtime;
#endif

#ifdef CONFIG_NO_HZ_FULL
    atomic_t tick_dep_mask;
#endif
    /* Context switch counts: */
    unsigned long nvcsw;
    unsigned long nivcsw;

    /* Monotonic time in nsecs: */
    u64 start_time;

    /* Boot based time in nsecs: */
    u64 start_boottime;

    /* MM fault and swap info: this can arguably be seen as either mm-specific or thread-specific: */
    unsigned long min_flt;
    unsigned long maj_flt;

    /* Empty if CONFIG_POSIX_CPUTIMERS=n */
    struct posix_cputimers posix_cputimers;

#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
    struct posix_cputimers_work posix_cputimers_work;
#endif

    /* Process credentials: */

    /* Tracer's credentials at attach: */
    const struct cred __rcu *ptracer_cred;

    /* Objective and real subjective task credentials (COW): */
    const struct cred __rcu *real_cred;

    /* Effective (overridable) subjective task credentials (COW): */
    const struct cred __rcu *cred;

#ifdef CONFIG_KEYS
    /* Cached requested key. */
    struct key *cached_requested_key;
#endif

    /*
     * executable name, excluding path.
     *
     * - normally initialized setup_new_exec()
     * - access it with [gs]et_task_comm()
     * - lock it with task_lock()
     */
    char comm[TASK_COMM_LEN];

    struct nameidata *nameidata;

#ifdef CONFIG_SYSVIPC
    struct sysv_sem sysvsem;
    struct sysv_shm sysvshm;
#endif
#ifdef CONFIG_DETECT_HUNG_TASK
    unsigned long last_switch_count;
    unsigned long last_switch_time;
#endif
    /* Filesystem information: */
    struct fs_struct *fs;

    /* Open file information: */
    struct files_struct *files;

#ifdef CONFIG_IO_URING
    struct io_uring_task *io_uring;
#endif

    /* Namespaces: */
    struct nsproxy *nsproxy;

    /* Signal handlers: */
    struct signal_struct *signal;
    struct sighand_struct __rcu *sighand;
    sigset_t blocked;
    sigset_t real_blocked;
    /* Restored if set_restore_sigmask() was used: */
    sigset_t saved_sigmask;
    struct sigpending pending;
    unsigned long sas_ss_sp;
    size_t sas_ss_size;
    unsigned int sas_ss_flags;

    struct callback_head *task_works;

#ifdef CONFIG_AUDIT
#ifdef CONFIG_AUDITSYSCALL
    struct audit_context *audit_context;
#endif
    kuid_t loginuid;
    unsigned int sessionid;
#endif
    struct seccomp seccomp;
    struct syscall_user_dispatch syscall_dispatch;

    /* Thread group tracking: */
    u64 parent_exec_id;
    u64 self_exec_id;

    /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */
    spinlock_t alloc_lock;

    /* Protection of the PI data structures: */
    raw_spinlock_t pi_lock;

    struct wake_q_node wake_q;

#ifdef CONFIG_RT_MUTEXES
    /* PI waiters blocked on a rt_mutex held by this task: */
    struct rb_root_cached pi_waiters;
    /* Updated under owner's pi_lock and rq lock */
    struct task_struct *pi_top_task;
    /* Deadlock detection and priority inheritance handling: */
    struct rt_mutex_waiter *pi_blocked_on;
#endif

#ifdef CONFIG_DEBUG_MUTEXES
    /* Mutex deadlock detection: */
    struct mutex_waiter *blocked_on;
#endif

#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
    int non_block_count;
#endif

#ifdef CONFIG_TRACE_IRQFLAGS
    struct irqtrace_events irqtrace;
    unsigned int hardirq_threaded;
    u64 hardirq_chain_key;
    int softirqs_enabled;
    int softirq_context;
    int irq_config;
#endif
#ifdef CONFIG_PREEMPT_RT
    int softirq_disable_cnt;
#endif

#ifdef CONFIG_LOCKDEP
# define MAX_LOCK_DEPTH 48UL
    u64 curr_chain_key;
    int lockdep_depth;
    unsigned int lockdep_recursion;
    struct held_lock held_locks[MAX_LOCK_DEPTH];
#endif

#if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP)
    unsigned int in_ubsan;
#endif

    /* Journalling filesystem info: */
    void *journal_info;

    /* Stacked block device info: */
    struct bio_list *bio_list;

    /* Stack plugging: */
    struct blk_plug *plug;

    /* VM state: */
    struct reclaim_state *reclaim_state;

    struct backing_dev_info *backing_dev_info;

    struct io_context *io_context;

#ifdef CONFIG_COMPACTION
    struct capture_control *capture_control;
#endif
    /* Ptrace state: */
    unsigned long ptrace_message;
    kernel_siginfo_t *last_siginfo;

    struct task_io_accounting ioac;
#ifdef CONFIG_PSI
    /* Pressure stall state */
    unsigned int psi_flags;
#endif
#ifdef CONFIG_TASK_XACCT
    /* Accumulated RSS usage: */
    u64 acct_rss_mem1;
    /* Accumulated virtual memory usage: */
    u64 acct_vm_mem1;
    /* stime + utime since last update: */
    u64 acct_timexpd;
#endif
#ifdef CONFIG_CPUSETS
    /* Protected by ->alloc_lock: */
    nodemask_t mems_allowed;
    /* Sequence number to catch updates: */
    seqcount_spinlock_t mems_allowed_seq;
    int cpuset_mem_spread_rotor;
    int cpuset_slab_spread_rotor;
#endif
#ifdef CONFIG_CGROUPS
    /* Control Group info protected by css_set_lock: */
    struct css_set __rcu *cgroups;
    /* cg_list protected by css_set_lock and tsk->alloc_lock: */
    struct list_head cg_list;
#endif
#ifdef CONFIG_X86_CPU_RESCTRL
    u32 closid;
    u32 rmid;
#endif
#ifdef CONFIG_FUTEX
    struct robust_list_head __user *robust_list;
#ifdef CONFIG_COMPAT
    struct compat_robust_list_head __user *compat_robust_list;
#endif
    struct list_head pi_state_list;
    struct futex_pi_state *pi_state_cache;
    struct mutex futex_exit_mutex;
    unsigned int futex_state;
#endif
#ifdef CONFIG_PERF_EVENTS
    struct perf_event_context *perf_event_ctxp;
    struct mutex perf_event_mutex;
    struct list_head perf_event_list;
#endif
#ifdef CONFIG_DEBUG_PREEMPT
    unsigned long preempt_disable_ip;
#endif
#ifdef CONFIG_NUMA
    /* Protected by alloc_lock: */
    struct mempolicy *mempolicy;
    short il_prev;
    short pref_node_fork;
#endif
#ifdef CONFIG_NUMA_BALANCING
    int numa_scan_seq;
    unsigned int numa_scan_period;
    unsigned int numa_scan_period_max;
    int numa_preferred_nid;
    unsigned long numa_migrate_retry;
    /* Migration stamp: */
    u64 node_stamp;
    u64 last_task_numa_placement;
    u64 last_sum_exec_runtime;
    struct callback_head numa_work;

    /*
     * This pointer is only modified for current in syscall and
     * pagefault context (and for tasks being destroyed), so it can be read
     * from any of the following contexts:
     *  - RCU read-side critical section
     *  - current->numa_group from everywhere
     *  - task's runqueue locked, task not running
     */
    struct numa_group __rcu *numa_group;

    /*
     * numa_faults is an array split into four regions:
     * faults_memory, faults_cpu, faults_memory_buffer, faults_cpu_buffer
     * in this precise order.
     *
     * faults_memory: Exponential decaying average of faults on a per-node
     * basis. Scheduling placement decisions are made based on these
     * counts. The values remain static for the duration of a PTE scan.
     * faults_cpu: Track the nodes the process was running on when a NUMA
     * hinting fault was incurred.
     * faults_memory_buffer and faults_cpu_buffer: Record faults per node
     * during the current scan window. When the scan completes, the counts
     * in faults_memory and faults_cpu decay and these values are copied.
     */
    unsigned long *numa_faults;
    unsigned long total_numa_faults;

    /*
     * numa_faults_locality tracks if faults recorded during the last
     * scan window were remote/local or failed to migrate. The task scan
     * period is adapted based on the locality of the faults with different
     * weights depending on whether they were shared or private faults
     */
    unsigned long numa_faults_locality[3];

    unsigned long numa_pages_migrated;
#endif /* CONFIG_NUMA_BALANCING */

#ifdef CONFIG_RSEQ
    struct rseq __user *rseq;
    u32 rseq_sig;
    /*
     * RmW on rseq_event_mask must be performed atomically
     * with respect to preemption.
     */
    unsigned long rseq_event_mask;
#endif

    struct tlbflush_unmap_batch tlb_ubc;

    union {
        refcount_t rcu_users;
        struct rcu_head rcu;
    };

    /* Cache last used pipe for splice(): */
    struct pipe_inode_info *splice_pipe;

    struct page_frag task_frag;

#ifdef CONFIG_TASK_DELAY_ACCT
    struct task_delay_info *delays;
#endif

#ifdef CONFIG_FAULT_INJECTION
    int make_it_fail;
    unsigned int fail_nth;
#endif
    /*
     * When (nr_dirtied >= nr_dirtied_pause), it's time to call
     * balance_dirty_pages() for a dirty throttling pause:
     */
    int nr_dirtied;
    int nr_dirtied_pause;
    /* Start of a write-and-pause period: */
    unsigned long dirty_paused_when;

#ifdef CONFIG_LATENCYTOP
    int latency_record_count;
    struct latency_record latency_record[LT_SAVECOUNT];
#endif
    /*
     * Time slack values; these are used to round up poll() and
     * select() etc timeout values. These are in nanoseconds.
     */
    u64 timer_slack_ns;
    u64 default_timer_slack_ns;

#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
    unsigned int kasan_depth;
#endif

#ifdef CONFIG_KCSAN
    struct kcsan_ctx kcsan_ctx;
#ifdef CONFIG_TRACE_IRQFLAGS
    struct irqtrace_events kcsan_save_irqtrace;
#endif
#ifdef CONFIG_KCSAN_WEAK_MEMORY
    int kcsan_stack_depth;
#endif
#endif

#ifdef CONFIG_KMSAN
    struct kmsan_ctx kmsan_ctx;
#endif

#if IS_ENABLED(CONFIG_KUNIT)
    struct kunit *kunit_test;
#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
    /* Index of current stored address in ret_stack: */
    int curr_ret_stack;
    int curr_ret_depth;

    /* Stack of return addresses for return function tracing: */
    struct ftrace_ret_stack *ret_stack;

    /* Timestamp for last schedule: */
    unsigned long long ftrace_timestamp;

    /*
     * Number of functions that haven't been traced
     * because of depth overrun:
     */
    atomic_t trace_overrun;

    /* Pause tracing: */
    atomic_t tracing_graph_pause;
#endif

#ifdef CONFIG_TRACING
    /* Bitmask and counter of trace recursion: */
    unsigned long trace_recursion;
#endif /* CONFIG_TRACING */

#ifdef CONFIG_KCOV
    /* See kernel/kcov.c for more details. */

    /* Coverage collection mode enabled for this task (0 if disabled): */
    unsigned int kcov_mode;

    /* Size of the kcov_area: */
    unsigned int kcov_size;

    /* Buffer for coverage collection: */
    void *kcov_area;

    /* KCOV descriptor wired with this task or NULL: */
    struct kcov *kcov;

    /* KCOV common handle for remote coverage collection: */
    u64 kcov_handle;

    /* KCOV sequence number: */
    int kcov_sequence;

    /* Collect coverage from softirq context: */
    unsigned int kcov_softirq;
#endif

#ifdef CONFIG_MEMCG
    struct mem_cgroup *memcg_in_oom;
    gfp_t memcg_oom_gfp_mask;
    int memcg_oom_order;

    /* Number of pages to reclaim on returning to userland: */
    unsigned int memcg_nr_pages_over_high;

    /* Used by memcontrol for targeted memcg charge: */
    struct mem_cgroup *active_memcg;
#endif

#ifdef CONFIG_BLK_CGROUP
    struct request_queue *throttle_queue;
#endif

#ifdef CONFIG_UPROBES
    struct uprobe_task *utask;
#endif
#if defined(CONFIG_BCACHE) || defined(CONFIG_BCACHE_MODULE)
    unsigned int sequential_io;
    unsigned int sequential_io_avg;
#endif
    struct kmap_ctrl kmap_ctrl;
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
    unsigned long task_state_change;
# ifdef CONFIG_PREEMPT_RT
    unsigned long saved_state_change;
# endif
#endif
    int pagefault_disabled;
#ifdef CONFIG_MMU
    struct task_struct *oom_reaper_list;
    struct timer_list oom_reaper_timer;
#endif
#ifdef CONFIG_VMAP_STACK
    struct vm_struct *stack_vm_area;
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /* A live task holds one reference: */
    refcount_t stack_refcount;
#endif
#ifdef CONFIG_LIVEPATCH
    int patch_state;
#endif
#ifdef CONFIG_SECURITY
    /* Used by LSM modules for access restriction: */
    void *security;
#endif
#ifdef CONFIG_BPF_SYSCALL
    /* Used by BPF task local storage */
    struct bpf_local_storage __rcu *bpf_storage;
    /* Used for BPF run context */
    struct bpf_run_ctx *bpf_ctx;
#endif

#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
    unsigned long lowest_stack;
    unsigned long prev_lowest_stack;
#endif

#ifdef CONFIG_X86_MCE
    void __user *mce_vaddr;
    __u64 mce_kflags;
    u64 mce_addr;
    __u64 mce_ripv : 1,
          mce_whole_page : 1,
          __mce_reserved : 62;
    struct callback_head mce_kill_me;
    int mce_count;
#endif

#ifdef CONFIG_KRETPROBES
    struct llist_head kretprobe_instances;
#endif
#ifdef CONFIG_RETHOOK
    struct llist_head rethooks;
#endif

#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
    /*
     * If L1D flush is supported on mm context switch
     * then we use this callback head to queue kill work
     * to kill tasks that are not running on SMT disabled
     * cores
     */
    struct callback_head l1d_flush_kill;
#endif

#ifdef CONFIG_RV
    /*
     * Per-task RV monitor. Nowadays fixed in RV_PER_TASK_MONITORS.
     * If we find justification for more monitors, we can think
     * about adding more or developing a dynamic method. So far,
     * none of these are justified.
     */
    union rv_task_monitor rv[RV_PER_TASK_MONITORS];
#endif

    /*
     * New fields for task_struct should be added above here, so that
     * they are included in the randomized portion of task_struct.
     */
    randomized_struct_fields_end

    /* CPU-specific state of this task: */
    struct thread_struct thread;

    /*
     * WARNING: on x86, 'thread_struct' contains a variable-sized
     * structure.  It *MUST* be at the end of 'task_struct'.
     *
     * Do not put anything below here!
     */
};

This rather large structure is the process descriptor. It fully describes an executing program: the files it has open, its address space, pending signals, process state, and more.

This is how Linux links all its processes together in a doubly linked list.

Allocating the Process Descriptor

Linux allocates the task_struct structure through the slab allocator; by preallocating and reusing task_struct objects, it avoids the cost of repeated dynamic allocation and deallocation. To make finding the current process's task_struct fast and convenient, the kernel keeps a small structure for each process at the end of its kernel stack, from which the task_struct can be located using nothing but the stack pointer. This avoids dedicating an extra register to the job, saving a register and speeding up the lookup.
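
To see what allocation through the slab allocator looks like in general, here is a hedged sketch of the kmem_cache API that kernel code uses for fixed-size objects. The object type and cache name (my_obj, "my_obj_cache") are made up for illustration; the kernel sets up its own cache for task_struct internally during fork-infrastructure initialization.

#include <linux/init.h>
#include <linux/slab.h>

/* illustrative object type; task_struct's cache is created by the kernel itself */
struct my_obj {
    int value;
};

static struct kmem_cache *my_cache;

static int __init my_cache_init(void)
{
    /* create a cache of preallocated, reusable objects of one size */
    my_cache = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
                                 0, SLAB_PANIC, NULL);

    struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (obj)
        kmem_cache_free(my_cache, obj);   /* return the object to the cache */
    return 0;
}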

With task_struct itself created dynamically by the slab allocator, all that is needed on the stack is a new structure, struct thread_info, placed at the bottom of the stack (for stacks that grow down) or at the top (for stacks that grow up). Computing the address of this structure from the stack pointer then becomes trivial.
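
The offset computation is simple: with a fixed-size, aligned kernel stack (8KB on x86 at the time, which is exactly what the ~8191UL mask in the assembly further below encodes), masking the low bits off any address within the stack yields the address of the structure at its end. A sketch of the idea; the helper name is hypothetical and THREAD_SIZE is the illustrative 8KB value:

#define THREAD_SIZE 8192   /* 8KB kernel stack, as on x86 with 4KB pages */

struct thread_info;

/* mask off the low 13 bits of any address inside the stack to find
 * the thread_info placed at the stack's low end */
static inline struct thread_info *stack_to_thread_info(unsigned long sp)
{
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}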

On x86, the thread_info structure is defined in arch/x86/include/asm/thread_info.h:

/* Linux 6.2 */
struct thread_info {
    unsigned long flags;          /* low level flags */
    unsigned long syscall_work;   /* SYSCALL_WORK_ flags */
    u32 status;                   /* thread synchronous flags */
#ifdef CONFIG_SMP
    u32 cpu;                      /* current CPU */
#endif
};

/* Linux 2.6 */
struct thread_info {
    struct task_struct    *task;   /* pointer to this task's actual task_struct */
    struct exec_domain    *exec_domain;
    unsigned long         flags;
    unsigned long         status;
    __u32                 cpu;
    __u32                 preempt_count;
    mm_segment_t          addr_limit;
    struct restart_block  restart_block;
    unsigned long         previous_esp;
    __u8                  supervisor_stack[0];
};

Each task's thread_info is allocated at the end of its kernel stack. The code for obtaining the currently running process's task_struct lives in current.h:

#include <linux/thread_info.h>

/* thread_info-based form: thread_info holds a pointer back to the task */
#define get_current() (current_thread_info()->task)
#define current get_current()

/* older x86 form of the same idea: mask the stack pointer to find the
 * structure placed at the end of the 8KB kernel stack
 * (~8191UL clears the low 13 bits) */
struct task_struct;

static inline struct task_struct *get_current(void)
{
    struct task_struct *current;
    __asm__("andl %%esp,%0; " : "=r" (current) : "0" (~8191UL));
    return current;
}

The kernel identifies each process by a unique process identification value, or PID. The PID is a number of type pid_t, which is in practice an int. The default maximum PID is 32768 (the range of a short int), although it can be raised as high as the type allows; the upper bound is adjustable via /proc/sys/kernel/pid_max. This maximum is effectively the ceiling on the number of processes allowed to exist on the system at once. The kernel stores each process's PID in its process descriptor.
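
The current limit is easy to inspect from user space (and, as root, to change by writing to the same file). A small example:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    int pid_max;

    /* prints 32768 by default; writing a larger value to this file
     * as root raises the limit, up to what the type allows */
    if (f && fscanf(f, "%d", &pid_max) == 1)
        printf("pid_max = %d\n", pid_max);
    if (f)
        fclose(f);
    return 0;
}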

Inside the kernel, most code that deals with processes works directly with the task_struct, and the current macro (in current.h) finds the process descriptor of the currently running process. The macro's implementation is architecture-dependent: some architectures have registers to spare and can dedicate one to holding a pointer to the current process's task_struct, but register-poor architectures like x86 instead create the thread_info structure at the end of the kernel stack and find the task_struct indirectly, by computing the offset. That is exactly what the current_thread_info()->task expression above does.

Process State

The process descriptor contains a state field (shown as __state in the listing above) that records the process's current state. There are five states, and every process is necessarily in exactly one of them:

TASK_RUNNING: Runnable. The process is either currently executing or sitting on a run queue waiting to execute. This is the state of a process executing in kernel space, and the only state in which a process can execute in user space.

TASK_INTERRUPTIBLE: Interruptible sleep. The process is sleeping (blocked), waiting for some condition to be met. When the condition holds, the kernel promptly sets the process back to the runnable state; the process may also be woken prematurely by receiving a signal.

TASK_UNINTERRUPTIBLE: Uninterruptible sleep. Identical to TASK_INTERRUPTIBLE, except that the process is not woken up and made runnable by receiving a signal.

TASK_ZOMBIE: Zombie. The process has terminated, but its parent has not yet called wait4(), so its resources have not been reclaimed. The child's process descriptor is retained so the parent can still obtain information about it; once the parent calls wait4(), the descriptor is freed.

TASK_STOPPED: Stopped. The process is neither running nor eligible to run. This typically happens when the process receives a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal, and, while being debugged, receipt of any signal puts the process into this state.

The transitions among these five states were illustrated by a state diagram here. (Figure: five-state transition diagram; not reproduced.)

The kernel can change a process's state with the following function:

set_task_state(task, state);

Here task is the process in question and state is the state to set.
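
For the common case of a task putting itself to sleep, kernel code uses the same state-setting idea on current via set_current_state(). A sketch of the canonical wait loop; event_happened() is a hypothetical stand-in for whatever condition the code is waiting on:

#include <linux/sched.h>

static void wait_for_event(void)
{
    /* mark ourselves not-runnable *before* checking the condition,
     * so a wakeup arriving in between is not lost */
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (event_happened())   /* hypothetical condition helper */
            break;
        schedule();             /* yield the processor until woken */
    }
    set_current_state(TASK_RUNNING);
}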

Process Context and the Process Family Tree

Executable program code is an important part of a process; the code is loaded from an executable file into the process's address space and executed. Normally a program executes in user space; when it makes a system call or triggers an exception, it enters kernel space. At that point the kernel is said to be "executing on behalf of the process" and is in process context. In process context, the current macro is valid. Unless a higher-priority process needs to run and the scheduler intervenes, the process resumes execution in user space when the kernel exits.

System calls and exception handlers are well-defined interfaces into the kernel; a process can enter kernel execution only through them. In other words, all access to the kernel must go through these interfaces.

On a Linux system there is a clear inheritance hierarchy among processes: every process is a descendant of the process with PID 1. The kernel starts the init process in the last stage of boot; init reads the system's initscripts and executes additional programs, ultimately completing the boot process.

Every process on a Linux system has exactly one parent, and every process likewise has zero or more children. Processes that share the same parent are called siblings. These relationships are stored in the process descriptor: each task_struct contains a pointer, parent, to its parent's task_struct, and a list of children named children.

For the current process, we can obtain the parent's process descriptor, or iterate over all children, as follows:

/* obtain the parent's process descriptor */
struct task_struct *my_parent = current->parent;

/* iterate over the children */
struct task_struct *task;
struct list_head *list;

list_for_each(list, &current->children) {
    task = list_entry(list, struct task_struct, sibling);
    /* task now points to one of the children */
}

/* The init process's descriptor is statically allocated as init_task.
 * Walking up the parent chain can be expressed as follows: */
struct task_struct *task;

for (task = current; task != &init_task; task = task->parent)
    ;
/* task now points to init */

Thanks to this hierarchy, one can start from any process in the system and reach any other, because the task list itself is a circular doubly linked list. Given a process, the next and previous processes in the list are obtained like this:

list_entry(task->tasks.next, struct task_struct, tasks); /* the next_task(task) macro */

list_entry(task->tasks.prev, struct task_struct, tasks); /* the prev_task(task) macro */

One can also walk the entire task list with the for_each_process(task) macro, which on each iteration advances the task pointer to the next element of the list:

struct task_struct *task;

for_each_process(task) {
    /* print the name and PID of each task */
    printk("%s[%d]\n", task->comm, task->pid);
}

 
