Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: add documentation

Linus Torvalds
2 parents 2c35cd019f c54fce6eff
Documentation/workqueue.txt
include/linux/workqueue.h
kernel/workqueue.c
+
+Concurrency Managed Workqueue (cmwq)
+
+September, 2010		Tejun Heo <tj@kernel.org>
+			Florian Mickler <florian@mickler.org>
+
+CONTENTS
+
+1. Introduction
+2. Why cmwq?
+3. The Design
+4. Application Programming Interface (API)
+5. Example Execution Scenarios
+6. Guidelines
+
+
+1. Introduction
+
+There are many cases where an asynchronous process execution context
+is needed and the workqueue (wq) API is the most commonly used
+mechanism for such cases.
+
+When such an asynchronous execution context is needed, a work item
+describing which function to execute is put on a queue.  An
+independent thread serves as the asynchronous execution context.  The
+queue is called a workqueue and the thread is called a worker.
+
+While there are work items on the workqueue the worker executes the
+functions associated with the work items one after the other.  When
+there is no work item left on the workqueue the worker becomes idle.
+When a new work item gets queued, the worker begins executing again.
+
+
+2. Why cmwq?
+
+In the original wq implementation, a multi-threaded (MT) wq had one
+worker thread per CPU and a single-threaded (ST) wq had one worker
+thread system-wide.  A single MT wq needed to keep around the same
+number of workers as the number of CPUs.  The kernel grew a lot of MT
+wq users over the years and with the number of CPU cores continuously
+rising, some systems saturated the default 32k PID space just booting
+up.
+
+Although MT wq wasted a lot of resources, the level of concurrency
+provided was unsatisfactory.  The limitation was common to both ST and
+MT wq, albeit less severe on MT.  Each wq maintained its own separate
+worker pool.  An MT wq could provide only one execution context per
+CPU while an ST wq provided one for the whole system.  Work items had
+to compete for those very limited execution contexts, leading to
+various problems including proneness to deadlocks around the single
+execution context.
+
+The tension between the provided level of concurrency and resource
+usage also forced users to make unnecessary tradeoffs, like libata
+choosing to use ST wq for polling PIOs and accepting an unnecessary
+limitation that no two polling PIOs can progress at the same time.  As
+MT wq didn't provide much better concurrency, users which required a
+higher level of concurrency, like async or fscache, had to implement
+their own thread pools.
+
+Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
+focus on the following goals.
+
+* Maintain compatibility with the original workqueue API.
+
+* Use per-CPU unified worker pools shared by all wq to provide a
+  flexible level of concurrency on demand without wasting a lot of
+  resources.
+
+* Automatically regulate worker pool and level of concurrency so that
+  the API users don't need to worry about such details.
+
+
+3. The Design
+
+In order to ease the asynchronous execution of functions a new
+abstraction, the work item, is introduced.
+
+A work item is a simple struct that holds a pointer to the function
+that is to be executed asynchronously.  Whenever a driver or subsystem
+wants a function to be executed asynchronously it has to set up a work
+item pointing to that function and queue that work item on a
+workqueue.
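The relationship described above (a work item is a struct carrying a
function pointer, and a worker runs queued items one after the other)
can be sketched as a small user-space C model.  This is only an
illustration under stated assumptions: work_item_model, wq_model,
queue_item(), run_pending(), counter_work and count_hit() are
hypothetical names invented for this sketch, not the kernel API, and
the model has no threads or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's data structures. */
struct work_item_model {
	void (*func)(struct work_item_model *work); /* function to run */
	struct work_item_model *next;               /* FIFO link */
};

struct wq_model {
	struct work_item_model *head, *tail;
};

/* Append a work item to the tail of the queue. */
static void queue_item(struct wq_model *wq, struct work_item_model *work)
{
	assert(work->func != NULL);
	work->next = NULL;
	if (wq->tail)
		wq->tail->next = work;
	else
		wq->head = work;
	wq->tail = work;
}

/* What a single worker does: run queued items one after the other. */
static int run_pending(struct wq_model *wq)
{
	int executed = 0;

	while (wq->head) {
		struct work_item_model *work = wq->head;

		wq->head = work->next;
		if (!wq->head)
			wq->tail = NULL;
		work->func(work);
		executed++;
	}
	return executed; /* 0 means the worker would go idle */
}

/* Example user: embed the work item in a larger struct. */
struct counter_work {
	struct work_item_model work; /* must stay the first member */
	int hits;
};

static void count_hit(struct work_item_model *work)
{
	/* Valid because 'work' is the first member of counter_work. */
	struct counter_work *cw = (struct counter_work *)work;

	cw->hits++;
}
```

Embedding the work item in a larger struct lets the callback reach
its own data from the pointer it is handed, which is why the function
takes the work item itself as its argument in this model.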
+
+Special purpose threads, called worker threads, execute the functions
+off of the queue, one after the other.  If no work is queued, the
+worker threads become idle.  These worker threads are managed in
+so-called thread pools.
+
+The cmwq design differentiates between the user-facing workqueues that
+subsystems and drivers queue work items on and the backend mechanism
+which manages the thread pools and processes the queued work items.
+
+The backend is called gcwq.  There is one gcwq for each possible CPU
+and one gcwq to serve work items queued on unbound workqueues.
+
+Subsystems and drivers can create and queue work items through special
+workqueue API functions as they see fit. They can influence some
+aspects of the way the work items are executed by setting flags on the
+workqueue they are putting the work item on. These flags include
+things like CPU locality, reentrancy, concurrency limits and more. To
+get a detailed overview refer to the API description of
+alloc_workqueue() below.
+
+When a work item is queued to a workqueue, the target gcwq is
+determined according to the queue parameters and workqueue attributes,
+and the work item is appended to the shared worklist of that gcwq.
+For example, unless specifically overridden, a work item of a bound
+workqueue will be queued on the worklist of the gcwq that is
+associated with the CPU the issuer is running on.
+
+For any worker pool implementation, managing the concurrency level
+(how many execution contexts are active) is an important issue.  cmwq
+tries to keep the concurrency at a minimal but sufficient level:
+minimal to save resources, and sufficient so that the system is used
+at its full capacity.
+
+Each gcwq bound to an actual CPU implements concurrency management by
+hooking into the scheduler.  The gcwq is notified whenever an active
+worker wakes up or sleeps and keeps track of the number of the
+currently runnable workers.  Generally, work items are not expected to
+hog a CPU and consume many cycles.  That means maintaining just enough
+concurrency to prevent work processing from stalling should be
+optimal.  As long as there are one or more runnable workers on the
+CPU, the gcwq doesn't start executing a new work item, but, when the
+last running worker goes to sleep, it immediately schedules a new
+worker so that the CPU doesn't sit idle while there are pending work
+items.  This allows using a minimal number of workers without losing
+execution bandwidth.
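The rule in the paragraph above can be written down as a tiny
decision function.  This is a minimal sketch, assuming the gcwq only
needs two counters; gcwq_model, worker_sleeping(), worker_waking_up()
and should_start_worker() are hypothetical names for illustration and
are not taken from kernel/workqueue.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU bookkeeping; names are illustrative only. */
struct gcwq_model {
	int nr_running; /* workers currently runnable on this CPU */
	int nr_pending; /* work items waiting on the shared worklist */
};

/* Scheduler hook: an active worker went to sleep. */
static void worker_sleeping(struct gcwq_model *g)
{
	assert(g->nr_running > 0);
	g->nr_running--;
}

/* Scheduler hook: a sleeping worker became runnable again. */
static void worker_waking_up(struct gcwq_model *g)
{
	g->nr_running++;
}

/* Start another worker only if work is pending and nobody is runnable. */
static bool should_start_worker(const struct gcwq_model *g)
{
	return g->nr_pending > 0 && g->nr_running == 0;
}
```

With one runnable worker and two pending items the function returns
false; once that worker sleeps it returns true, which is the moment
the text describes as scheduling a new worker.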
+
+Keeping idle workers around costs nothing other than the memory space
+for kthreads, so cmwq holds onto idle ones for a while before killing
+them.
+
+For an unbound wq, the above concurrency management doesn't apply and
+the gcwq for the pseudo unbound CPU tries to start executing all work
+items as soon as possible.  The responsibility of regulating
+concurrency level is on the users.  There is also a flag to mark a
+bound wq to ignore the concurrency management.  Please refer to the
+API section for details.
+
+The forward progress guarantee relies on workers being created when
+more execution contexts are necessary, which in turn is guaranteed
+through the use of rescue workers.  All work items which might be used
+on code paths that handle memory reclaim are required to be queued on
+wq's that have a rescue-worker reserved for execution under memory
+pressure.  Otherwise it is possible that the thread-pool deadlocks
+waiting for execution contexts to free up.
+
+
+4. Application Programming Interface (API)
+
+alloc_workqueue() allocates a wq.  The original create_*workqueue()
+functions are deprecated and scheduled for removal.  alloc_workqueue()
+takes three arguments - @name, @flags and @max_active.  @name is the
+name of the wq and is also used as the name of the rescuer thread if
+there is one.
+
+A wq no longer manages execution resources but serves as a domain for
+forward progress guarantee, flush and work item attributes.  @flags
+and @max_active control how work items are assigned execution
+resources, scheduled and executed.
+
+@flags:
+
+  WQ_NON_REENTRANT
+
+	By default, a wq guarantees non-reentrance only on the same
+	CPU.  A work item may not be executed concurrently on the same
+	CPU by multiple workers but is allowed to be executed
+	concurrently on multiple CPUs.  This flag makes sure
+	non-reentrance is enforced across all CPUs.  Work items queued
+	to a non-reentrant wq are guaranteed to be executed by at most
+	one worker system-wide at any given time.
+
+  WQ_UNBOUND
+
+	Work items queued to an unbound wq are served by a special
+	gcwq which hosts workers which are not bound to any specific
+	CPU.  This makes the wq behave as a simple execution context
+	provider without concurrency management.  The unbound gcwq
+	tries to start execution of work items as soon as possible.
+	Unbound wq sacrifices locality but is useful for the following
+	cases.
+
+	* Wide fluctuation in the concurrency level requirement is
+	  expected and using bound wq may end up creating a large
+	  number of mostly unused workers across different CPUs as the
+	  issuer hops through different CPUs.
+
+	* Long running CPU intensive workloads which can be better
+	  managed by the system scheduler.
+
+  WQ_FREEZEABLE
+
+	A freezeable wq participates in the freeze phase of the system
+	suspend operations.  Work items on the wq are drained and no
+	new work item starts execution until thawed.
+
+  WQ_RESCUER
+
+	All wq which might be used in the memory reclaim paths _MUST_
+	have this flag set.  This reserves one worker exclusively for
+	the execution of this wq under memory pressure.
+
+  WQ_HIGHPRI
+
+	Work items of a highpri wq are queued at the head of the
+	worklist of the target gcwq and start execution regardless of
+	the current concurrency level.  In other words, highpri work
+	items will always start execution as soon as execution
+	resource is available.
+
+	Ordering among highpri work items is preserved - a highpri
+	work item queued after another highpri work item will start
+	execution after the earlier highpri work item starts.
+
+	Although highpri work items are not held back by other
+	runnable work items, they still contribute to the concurrency
+	level.  Highpri work items in runnable state will prevent
+	non-highpri work items from starting execution.
+
+	This flag is meaningless for unbound wq.
+
+  WQ_CPU_INTENSIVE
+
+	Work items of a CPU intensive wq do not contribute to the
+	concurrency level.  In other words, runnable CPU intensive
+	work items will not prevent other work items from starting
+	execution.  This is useful for bound work items which are
+	expected to hog CPU cycles so that their execution is
+	regulated by the system scheduler.
+
+	Although CPU intensive work items don't contribute to the
+	concurrency level, start of their executions is still
+	regulated by the concurrency management and runnable
+	non-CPU-intensive work items can delay execution of CPU
+	intensive work items.
+
+	This flag is meaningless for unbound wq.
+
+  WQ_HIGHPRI | WQ_CPU_INTENSIVE
+
+	This combination makes the wq avoid interaction with
+	concurrency management completely and behave as a simple
+	per-CPU execution context provider.  Work items queued on a
+	highpri CPU-intensive wq start execution as soon as resources
+	are available and don't affect execution of other work items.
+
+@max_active:
+
+@max_active determines the maximum number of execution contexts per
+CPU which can be assigned to the work items of a wq.  For example,
+with @max_active of 16, at most 16 work items of the wq can be
+executing at the same time per CPU.
+
+Currently, for a bound wq, the maximum limit for @max_active is 512
+and the default value used when 0 is specified is 256.  For an unbound
+wq, the limit is the higher of 512 and 4 * num_possible_cpus().  These
+values are chosen sufficiently high such that they are not the
+limiting factor while providing protection in runaway cases.
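The limits above amount to a small piece of arithmetic, sketched here
as a user-space C function.  The numbers (512, 256, 4 per possible
CPU) come from the text; the identifiers are made up for this sketch,
and two behaviors are assumptions not stated in the text: that the
default of 256 also applies to an unbound wq, and that out-of-range
requests are clamped rather than rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the text; the identifiers are made up for this sketch. */
enum {
	SKETCH_MAX_ACTIVE = 512,     /* limit for a bound wq */
	SKETCH_DFL_ACTIVE = 256,     /* used when 0 is specified */
	SKETCH_UNBOUND_PER_CPU = 4,  /* unbound limit scales with CPUs */
};

static int effective_max_active(int max_active, bool unbound,
				int possible_cpus)
{
	int limit = SKETCH_MAX_ACTIVE;

	assert(possible_cpus > 0);

	/* Unbound wq: the higher of 512 and 4 * number of possible CPUs. */
	if (unbound && SKETCH_UNBOUND_PER_CPU * possible_cpus > limit)
		limit = SKETCH_UNBOUND_PER_CPU * possible_cpus;

	/* 0 selects the default. */
	if (max_active == 0)
		return SKETCH_DFL_ACTIVE;

	/* Assumption: out-of-range requests are clamped into [1, limit]. */
	if (max_active < 1)
		return 1;
	if (max_active > limit)
		return limit;
	return max_active;
}
```

For example, on a machine with 256 possible CPUs an unbound wq gets a
limit of 1024 in this model, while on an 8-CPU machine the limit stays
at 512 for both bound and unbound wq.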
+
+The number of active work items of a wq is usually regulated by the
+users of the wq, more specifically, by how many work items the users
+may queue at the same time.  Unless there is a specific need for
+throttling the number of active work items, specifying '0' is
+recommended.
+
+Some users depend on the strict execution ordering of ST wq.  The
+combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
+behavior.  Work items on such wq are always queued to the unbound gcwq
+and only one work item can be active at any given time thus achieving
+the same ordering property as ST wq.
+
+
+5. Example Execution Scenarios
+
+The following example execution scenarios try to illustrate how cmwq
+behaves under different configurations.
+
+ Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
+ w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
+ again before finishing.  w1 and w2 burn CPU for 5ms then sleep for
+ 10ms.
+
+Ignoring all other tasks, works and processing overhead, and assuming
+simple FIFO scheduling, the following is one highly simplified version
+of possible sequences of events with the original wq.
+
+ TIME IN MSECS	EVENT
+ 0		w0 starts and burns CPU
+ 5		w0 sleeps
+ 15		w0 wakes up and burns CPU
+ 20		w0 finishes
+ 20		w1 starts and burns CPU
+ 25		w1 sleeps
+ 35		w1 wakes up and finishes
+ 35		w2 starts and burns CPU
+ 40		w2 sleeps
+ 50		w2 wakes up and finishes
+
+And with cmwq with @max_active >= 3,
+
+ TIME IN MSECS	EVENT
+ 0		w0 starts and burns CPU
+ 5		w0 sleeps
+ 5		w1 starts and burns CPU
+ 10		w1 sleeps
+ 10		w2 starts and burns CPU
+ 15		w2 sleeps
+ 15		w0 wakes up and burns CPU
+ 20		w0 finishes
+ 20		w1 wakes up and finishes
+ 25		w2 wakes up and finishes
+
+If @max_active == 2,
+
+ TIME IN MSECS	EVENT
+ 0		w0 starts and burns CPU
+ 5		w0 sleeps
+ 5		w1 starts and burns CPU
+ 10		w1 sleeps
+ 15		w0 wakes up and burns CPU
+ 20		w0 finishes
+ 20		w1 wakes up and finishes
+ 20		w2 starts and burns CPU
+ 25		w2 sleeps
+ 35		w2 wakes up and finishes
+
+Now, let's assume w1 and w2 are queued to a different wq q1 which has
+WQ_HIGHPRI set,
+
+ TIME IN MSECS	EVENT
+ 0		w1 and w2 start and burn CPU
+ 5		w1 sleeps
+ 10		w2 sleeps
+ 10		w0 starts and burns CPU
+ 15		w0 sleeps
+ 15		w1 wakes up and finishes
+ 20		w2 wakes up and finishes
+ 25		w0 wakes up and burns CPU
+ 30		w0 finishes
+
+If q1 has WQ_CPU_INTENSIVE set,
+
+ TIME IN MSECS	EVENT
+ 0		w0 starts and burns CPU
+ 5		w0 sleeps
+ 5		w1 and w2 start and burn CPU
+ 10		w1 sleeps
+ 15		w2 sleeps
+ 15		w0 wakes up and burns CPU
+ 20		w0 finishes
+ 20		w1 wakes up and finishes
+ 25		w2 wakes up and finishes
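The first three timelines above (original wq, @max_active >= 3 and
@max_active == 2) can be reproduced with a small user-space
simulation.  This is a highly simplified sketch under the same
assumptions as the tables (one CPU, no overhead), stepping time in
1 ms units.  sim_work and simulate() are hypothetical names; picking
the first runnable item by index is an assumption of this model; and
WQ_HIGHPRI and WQ_CPU_INTENSIVE are not modeled.

```c
#include <assert.h>

enum work_state { PENDING, BURN1, SLEEPING, BURN2, DONE };

/* One work item: burn CPU, sleep, then optionally burn CPU again. */
struct sim_work {
	int burn1_ms, sleep_ms, burn2_ms; /* inputs */
	int finish_at;                    /* output, in ms */
	enum work_state state;
	int left_ms;                      /* remaining CPU burn */
	int wake_at;                      /* when the sleep ends */
};

/* Simulate one CPU in 1 ms steps; fills in finish_at for each item. */
static void simulate(struct sim_work *w, int n, int max_active)
{
	int done = 0, i, t;

	assert(max_active >= 1);
	for (i = 0; i < n; i++) {
		assert(w[i].burn1_ms > 0);
		w[i].state = PENDING;
	}

	for (t = 0; done < n; t++) {
		int active = 0, runner = -1;

		/* Wake up sleepers whose sleep ends now. */
		for (i = 0; i < n; i++) {
			if (w[i].state != SLEEPING || w[i].wake_at > t)
				continue;
			if (w[i].burn2_ms > 0) {
				w[i].state = BURN2;
				w[i].left_ms = w[i].burn2_ms;
			} else {
				w[i].state = DONE;
				w[i].finish_at = t;
				done++;
			}
		}

		/* Count active items and pick the first runnable one. */
		for (i = 0; i < n; i++) {
			if (w[i].state == PENDING || w[i].state == DONE)
				continue;
			active++;
			if (w[i].state != SLEEPING && runner < 0)
				runner = i;
		}

		/* Nothing runnable: start the next pending item if allowed. */
		for (i = 0; runner < 0 && active < max_active && i < n; i++) {
			if (w[i].state == PENDING) {
				w[i].state = BURN1;
				w[i].left_ms = w[i].burn1_ms;
				runner = i;
			}
		}

		if (runner < 0)
			continue; /* CPU idle for this millisecond */

		/* Burn one millisecond of CPU. */
		if (--w[runner].left_ms > 0)
			continue;
		if (w[runner].state == BURN1) {
			w[runner].state = SLEEPING;
			w[runner].wake_at = t + 1 + w[runner].sleep_ms;
		} else {
			w[runner].state = DONE;
			w[runner].finish_at = t + 1;
			done++;
		}
	}
}
```

With w0 = {5, 10, 5} and w1 = w2 = {5, 10, 0}, the model finishes the
items at 20, 20 and 25 ms for max_active of 3, and at 20, 20 and 35 ms
for max_active of 2, matching the tables.  Setting max_active to 1
gives 20, 35 and 50 ms, the same as the original wq timeline, because
only one item can be active at a time.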
+
+
+6. Guidelines
+
+* Do not forget to use WQ_RESCUER if a wq may process work items which
+  are used during memory reclaim.  Each wq with WQ_RESCUER set has one
+  rescuer thread reserved for it.  If there is a dependency among
+  multiple work items used during memory reclaim, they should be
+  queued to separate wq, each with WQ_RESCUER.
+
+* Unless strict ordering is required, there is no need to use ST wq.
+
+* Unless there is a specific need, using 0 for @max_active is
+  recommended.  In most use cases, the concurrency level stays well
+  under the default limit.
+
+* A wq serves as a domain for forward progress guarantee (WQ_RESCUER),
+  flush and work item attributes.  Work items which are not involved
+  in memory reclaim and don't need to be flushed as a part of a group
+  of work items, and don't require any special attribute, can use one
+  of the system wq.  There is no difference in execution
+  characteristics between using a dedicated wq and a system wq.
+
+* Unless work items are expected to consume a huge amount of CPU
+  cycles, using a bound wq is usually beneficial due to the increased
+  level of locality in wq operations and work item execution.
@@ -235,6 +235,10 @@
 #define work_clear_pending(work) \
 	clear_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
  
+/*
+ * Workqueue flags and constants.  For details, please refer to
+ * Documentation/workqueue.txt.
+ */
 enum {
 	WQ_NON_REENTRANT	= 1 << 0, /* guarantee non-reentrance */
 	WQ_UNBOUND		= 1 << 1, /* not bound to any cpu */
 /*
- * linux/kernel/workqueue.c
+ * kernel/workqueue.c - generic async execution with shared worker pool
  *
- * Generic mechanism for defining kernel helper threads for running
- * arbitrary tasks in process context.
+ * Copyright (C) 2002		Ingo Molnar
  *
- * Started by Ingo Molnar, Copyright (C) 2002
+ *   Derived from the taskqueue/keventd code by:
+ *     David Woodhouse <dwmw2@infradead.org>
+ *     Andrew Morton
+ *     Kai Petzke <wpp@marie.physik.tu-berlin.de>
+ *     Theodore Ts'o <tytso@mit.edu>
  *
- * Derived from the taskqueue/keventd code by:
+ * Made to use alloc_percpu by Christoph Lameter.
  *
- *   David Woodhouse <dwmw2@infradead.org>
- *   Andrew Morton
- *   Kai Petzke <wpp@marie.physik.tu-berlin.de>
- *   Theodore Ts'o <tytso@mit.edu>
+ * Copyright (C) 2010		SUSE Linux Products GmbH
+ * Copyright (C) 2010		Tejun Heo <tj@kernel.org>
  *
- * Made to use alloc_percpu by Christoph Lameter.
+ * This is the generic async execution mechanism.  Work items are
+ * executed in process context.  The worker pool is shared and
+ * automatically managed.  There is one worker pool for each CPU and
+ * one extra for works which are better served by workers which are
+ * not bound to any specific CPU.
+ *
+ * Please read Documentation/workqueue.txt for details.
  */
  
 #include <linux/module.h>