Commit 91b745016c12d440386c40fb76ab69c8e08cbc06

Authored by Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove in_workqueue_context()
  workqueue: Clarify that schedule_on_each_cpu is synchronous
  memory_hotplug: drop spurious calls to flush_scheduled_work()
  shpchp: update workqueue usage
  pciehp: update workqueue usage
  isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr()
  workqueue: add and use WQ_MEM_RECLAIM flag
  workqueue: fix HIGHPRI handling in keep_working()
  workqueue: add queue_work and activate_work trace points
  workqueue: prepare for more tracepoints
  workqueue: implement flush[_delayed]_work_sync()
  workqueue: factor out start_flush_work()
  workqueue: cleanup flush/cancel functions
  workqueue: implement alloc_ordered_workqueue()

Fix up trivial conflict in fs/gfs2/main.c as per Tejun

Showing 17 changed files

Documentation/workqueue.txt

Concurrency Managed Workqueue (cmwq)

September, 2010		Tejun Heo <tj@kernel.org>
			Florian Mickler <florian@mickler.org>

CONTENTS

1. Introduction
2. Why cmwq?
3. The Design
4. Application Programming Interface (API)
5. Example Execution Scenarios
6. Guidelines


1. Introduction

There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.

When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue. An
independent thread serves as the asynchronous execution context. The
queue is called workqueue and the thread is called worker.

While there are work items on the workqueue the worker executes the
functions associated with the work items one after the other. When
there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.


2. Why cmwq?

In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
thread system-wide. A single MT wq needed to keep around the same
number of workers as the number of CPUs. The kernel grew a lot of MT
wq users over the years and with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.

Although MT wq wasted a lot of resource, the level of concurrency
provided was unsatisfactory. The limitation was common to both ST and
MT wq albeit less severe on MT. Each wq maintained its own separate
worker pool. A MT wq could provide only one execution context per CPU
while a ST wq one for the whole system. Work items had to compete for
those very limited execution contexts leading to various problems
including proneness to deadlocks around the single execution context.

The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time. As
MT wq don't provide much better concurrency, users which require
higher level of concurrency, like async or fscache, had to implement
their own thread pool.

Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
focus on the following goals.

* Maintain compatibility with the original workqueue API.

* Use per-CPU unified worker pools shared by all wq to provide
  flexible level of concurrency on demand without wasting a lot of
  resource.

* Automatically regulate worker pool and level of concurrency so that
  the API users don't need to worry about such details.


3. The Design

In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.

A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously. Whenever a driver or subsystem
wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.

Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other. If no work is queued, the
worker threads become idle. These worker threads are managed in so
called thread-pools.

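A minimal sketch of what setting up and queueing such a work item
looks like in practice; the my_driver/my_work_fn names are made up
for illustration and only long-standing helpers (INIT_WORK(),
schedule_work(), container_of()) are used:

#include <linux/workqueue.h>

/* hypothetical driver context embedding a work item */
struct my_driver {
	struct work_struct work;
	int pending_events;
};

/* executed asynchronously by a worker thread */
static void my_work_fn(struct work_struct *work)
{
	struct my_driver *drv = container_of(work, struct my_driver, work);

	/* process and clear the pending events */
	drv->pending_events = 0;
}

static void my_driver_init(struct my_driver *drv)
{
	/* bind the work item to the function, done once at init time */
	INIT_WORK(&drv->work, my_work_fn);
}

static void my_driver_event(struct my_driver *drv)
{
	drv->pending_events++;
	/* queue on the system workqueue; a worker will call my_work_fn() */
	schedule_work(&drv->work);
}
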
The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages thread-pool and processes the queued work items.

The backend is called gcwq. There is one gcwq for each possible CPU
and one gcwq to serve work items queued on unbound workqueues.

Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit. They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on. These flags include
things like CPU locality, reentrancy, concurrency limits and more. To
get a detailed overview refer to the API description of
alloc_workqueue() below.

When a work item is queued to a workqueue, the target gcwq is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the gcwq. For example, unless
specifically overridden, a work item of a bound workqueue will be
queued on the worklist of exactly that gcwq that is associated to the
CPU the issuer is running on.

For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue. cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.

Each gcwq bound to an actual CPU implements concurrency management by
hooking into the scheduler. The gcwq is notified whenever an active
worker wakes up or sleeps and keeps track of the number of the
currently runnable workers. Generally, work items are not expected to
hog a CPU and consume many cycles. That means maintaining just enough
concurrency to prevent work processing from stalling should be
optimal. As long as there are one or more runnable workers on the
CPU, the gcwq doesn't start execution of a new work, but, when the
last running worker goes to sleep, it immediately schedules a new
worker so that the CPU doesn't sit idle while there are pending work
items. This allows using a minimal number of workers without losing
execution bandwidth.

Keeping idle workers around doesn't cost other than the memory space
for kthreads, so cmwq holds onto idle ones for a while before killing
them.

For an unbound wq, the above concurrency management doesn't apply and
the gcwq for the pseudo unbound CPU tries to start executing all work
items as soon as possible. The responsibility of regulating
concurrency level is on the users. There is also a flag to mark a
bound wq to ignore the concurrency management. Please refer to the
API section for details.

Forward progress guarantee relies on that workers can be created when
more execution contexts are necessary, which in turn is guaranteed
through the use of rescue workers. All work items which might be used
on code paths that handle memory reclaim are required to be queued on
wq's that have a rescue-worker reserved for execution under memory
pressure. Else it is possible that the thread-pool deadlocks waiting
for execution contexts to free up.


4. Application Programming Interface (API)

alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal. alloc_workqueue()
takes three arguments - @name, @flags and @max_active. @name is the
name of the wq and also used as the name of the rescuer thread if
there is one.

A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes. @flags
and @max_active control how work items are assigned execution
resources, scheduled and executed.

@flags:

  WQ_NON_REENTRANT

	By default, a wq guarantees non-reentrance only on the same
	CPU. A work item may not be executed concurrently on the same
	CPU by multiple workers but is allowed to be executed
	concurrently on multiple CPUs. This flag makes sure
	non-reentrance is enforced across all CPUs. Work items queued
	to a non-reentrant wq are guaranteed to be executed by at most
	one worker system-wide at any given time.

  WQ_UNBOUND

	Work items queued to an unbound wq are served by a special
	gcwq which hosts workers which are not bound to any specific
	CPU. This makes the wq behave as a simple execution context
	provider without concurrency management. The unbound gcwq
	tries to start execution of work items as soon as possible.
	Unbound wq sacrifices locality but is useful for the following
	cases.

	* Wide fluctuation in the concurrency level requirement is
	  expected and using bound wq may end up creating large number
	  of mostly unused workers across different CPUs as the issuer
	  hops through different CPUs.

	* Long running CPU intensive workloads which can be better
	  managed by the system scheduler.

  WQ_FREEZEABLE

	A freezeable wq participates in the freeze phase of the system
	suspend operations. Work items on the wq are drained and no
	new work item starts execution until thawed.

-  WQ_RESCUER
+  WQ_MEM_RECLAIM

	All wq which might be used in the memory reclaim paths _MUST_
-	have this flag set. This reserves one worker exclusively for
-	the execution of this wq under memory pressure.
+	have this flag set. The wq is guaranteed to have at least one
+	execution context regardless of memory pressure.

  WQ_HIGHPRI

	Work items of a highpri wq are queued at the head of the
	worklist of the target gcwq and start execution regardless of
	the current concurrency level. In other words, highpri work
	items will always start execution as soon as execution
	resource is available.

	Ordering among highpri work items is preserved - a highpri
	work item queued after another highpri work item will start
	execution after the earlier highpri work item starts.

	Although highpri work items are not held back by other
	runnable work items, they still contribute to the concurrency
	level. Highpri work items in runnable state will prevent
	non-highpri work items from starting execution.

	This flag is meaningless for unbound wq.

  WQ_CPU_INTENSIVE

	Work items of a CPU intensive wq do not contribute to the
	concurrency level. In other words, runnable CPU intensive
	work items will not prevent other work items from starting
	execution. This is useful for bound work items which are
	expected to hog CPU cycles so that their execution is
	regulated by the system scheduler.

	Although CPU intensive work items don't contribute to the
	concurrency level, start of their executions is still
	regulated by the concurrency management and runnable
	non-CPU-intensive work items can delay execution of CPU
	intensive work items.

	This flag is meaningless for unbound wq.

  WQ_HIGHPRI | WQ_CPU_INTENSIVE

	This combination makes the wq avoid interaction with
	concurrency management completely and behave as a simple
	per-CPU execution context provider. Work items queued on a
	highpri CPU-intensive wq start execution as soon as resources
	are available and don't affect execution of other work items.

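As a rough sketch of how the flags above are combined in practice, a
driver might allocate its workqueues along these lines; the names and
@max_active values here are purely illustrative, not taken from the
patch:

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_reclaim_wq;
static struct workqueue_struct *my_poll_wq;

static int my_create_workqueues(void)
{
	/* used on the writeback/reclaim path: must be able to make
	   forward progress under memory pressure */
	my_reclaim_wq = alloc_workqueue("my_reclaim", WQ_MEM_RECLAIM, 1);
	if (!my_reclaim_wq)
		return -ENOMEM;

	/* long running, CPU hogging polling work: let the system
	   scheduler regulate it and keep it from holding back other
	   work items */
	my_poll_wq = alloc_workqueue("my_poll",
				     WQ_HIGHPRI | WQ_CPU_INTENSIVE, 0);
	if (!my_poll_wq) {
		destroy_workqueue(my_reclaim_wq);
		return -ENOMEM;
	}
	return 0;
}
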
@max_active:

@max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU.

Currently, for a bound wq, the maximum limit for @max_active is 512
and the default value used when 0 is specified is 256. For an unbound
wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
values are chosen sufficiently high such that they are not the
limiting factor while providing protection in runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time. Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.

Some users depend on the strict execution ordering of ST wq. The
combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
behavior. Work items on such wq are always queued to the unbound gcwq
and only one work item can be active at any given time thus achieving
the same ordering property as ST wq.


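The series merged here also adds an alloc_ordered_workqueue() helper
(see the shortlog above) which packages exactly this WQ_UNBOUND plus
@max_active == 1 combination. A hedged sketch, assuming the helper
takes the wq name and extra WQ_* flags to OR in:

#include <linux/workqueue.h>

/* illustrative only: either form gives strict FIFO ordering */
static struct workqueue_struct *my_make_ordered_wq(void)
{
	/* explicit form: unbound wq limited to one active work item */
	return alloc_workqueue("my_ordered", WQ_UNBOUND, 1);
}

static struct workqueue_struct *my_make_ordered_wq_helper(void)
{
	/* convenience helper introduced by this series; the second
	   argument is additional WQ_* flags */
	return alloc_ordered_workqueue("my_ordered", 0);
}
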
5. Example Execution Scenarios

The following example execution scenarios try to illustrate how cmwq
behave under different configurations.

 Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
 w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
 again before finishing. w1 and w2 burn CPU for 5ms then sleep for
 10ms.

Ignoring all other tasks, works and processing overhead, and assuming
simple FIFO scheduling, the following is one highly simplified version
of possible sequences of events with the original wq.

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 starts and burns CPU
 25		w1 sleeps
 35		w1 wakes up and finishes
 35		w2 starts and burns CPU
 40		w2 sleeps
 50		w2 wakes up and finishes

And with cmwq with @max_active >= 3,

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 starts and burns CPU
 10		w1 sleeps
 10		w2 starts and burns CPU
 15		w2 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 25		w2 wakes up and finishes

If @max_active == 2,

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 starts and burns CPU
 10		w1 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 20		w2 starts and burns CPU
 25		w2 sleeps
 35		w2 wakes up and finishes

Now, let's assume w1 and w2 are queued to a different wq q1 which has
WQ_HIGHPRI set,

 TIME IN MSECS	EVENT
 0		w1 and w2 start and burn CPU
 5		w1 sleeps
 10		w2 sleeps
 10		w0 starts and burns CPU
 15		w0 sleeps
 15		w1 wakes up and finishes
 20		w2 wakes up and finishes
 25		w0 wakes up and burns CPU
 30		w0 finishes

If q1 has WQ_CPU_INTENSIVE set,

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 and w2 start and burn CPU
 10		w1 sleeps
 15		w2 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 25		w2 wakes up and finishes


6. Guidelines

-* Do not forget to use WQ_RESCUER if a wq may process work items which
-  are used during memory reclaim. Each wq with WQ_RESCUER set has one
-  rescuer thread reserved for it. If there is dependency among
-  multiple work items used during memory reclaim, they should be
-  queued to separate wq each with WQ_RESCUER.
+* Do not forget to use WQ_MEM_RECLAIM if a wq may process work items
+  which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM
+  set has an execution context reserved for it. If there is
+  dependency among multiple work items used during memory reclaim,
+  they should be queued to separate wq each with WQ_MEM_RECLAIM.

* Unless strict ordering is required, there is no need to use ST wq.

* Unless there is a specific need, using 0 for @max_active is
  recommended. In most use cases, concurrency level usually stays
  well under the default limit.

-* A wq serves as a domain for forward progress guarantee (WQ_RESCUER),
-  flush and work item attributes. Work items which are not involved
-  in memory reclaim and don't need to be flushed as a part of a group
-  of work items, and don't require any special attribute, can use one
-  of the system wq. There is no difference in execution
-  characteristics between using a dedicated wq and a system wq.
+* A wq serves as a domain for forward progress guarantee
+  (WQ_MEM_RECLAIM, flush and work item attributes. Work items which
+  are not involved in memory reclaim and don't need to be flushed as a
+  part of a group of work items, and don't require any special
+  attribute, can use one of the system wq. There is no difference in
+  execution characteristics between using a dedicated wq and a system
+  wq.

* Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
  level of locality in wq operations and work item execution.

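To make the first guideline concrete, a subsystem whose reclaim-time
work items depend on each other might give each of them its own
WQ_MEM_RECLAIM wq so that each gets its own reserved execution
context; the names below are hypothetical:

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_io_wq;	/* runs the I/O work */
static struct workqueue_struct *my_done_wq;	/* runs the completion work */

static int my_init_reclaim_wqs(void)
{
	/* the I/O work may wait on the completion work; putting both
	   on one WQ_MEM_RECLAIM wq could leave them contending for a
	   single reserved execution context, so use two wq's */
	my_io_wq = alloc_workqueue("my_io", WQ_MEM_RECLAIM, 0);
	if (!my_io_wq)
		return -ENOMEM;

	my_done_wq = alloc_workqueue("my_done", WQ_MEM_RECLAIM, 0);
	if (!my_done_wq) {
		destroy_workqueue(my_io_wq);
		return -ENOMEM;
	}
	return 0;
}
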
drivers/ata/libata-sff.c
1 /* 1 /*
2 * libata-sff.c - helper library for PCI IDE BMDMA 2 * libata-sff.c - helper library for PCI IDE BMDMA
3 * 3 *
4 * Maintained by: Jeff Garzik <jgarzik@pobox.com> 4 * Maintained by: Jeff Garzik <jgarzik@pobox.com>
5 * Please ALWAYS copy linux-ide@vger.kernel.org 5 * Please ALWAYS copy linux-ide@vger.kernel.org
6 * on emails. 6 * on emails.
7 * 7 *
8 * Copyright 2003-2006 Red Hat, Inc. All rights reserved. 8 * Copyright 2003-2006 Red Hat, Inc. All rights reserved.
9 * Copyright 2003-2006 Jeff Garzik 9 * Copyright 2003-2006 Jeff Garzik
10 * 10 *
11 * 11 *
12 * This program is free software; you can redistribute it and/or modify 12 * This program is free software; you can redistribute it and/or modify
13 * it under the terms of the GNU General Public License as published by 13 * it under the terms of the GNU General Public License as published by
14 * the Free Software Foundation; either version 2, or (at your option) 14 * the Free Software Foundation; either version 2, or (at your option)
15 * any later version. 15 * any later version.
16 * 16 *
17 * This program is distributed in the hope that it will be useful, 17 * This program is distributed in the hope that it will be useful,
18 * but WITHOUT ANY WARRANTY; without even the implied warranty of 18 * but WITHOUT ANY WARRANTY; without even the implied warranty of
19 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 19 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20 * GNU General Public License for more details. 20 * GNU General Public License for more details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; see the file COPYING. If not, write to 23 * along with this program; see the file COPYING. If not, write to
24 * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. 24 * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/DocBook/libata.*
29 * 29 *
30 * Hardware documentation available from http://www.t13.org/ and 30 * Hardware documentation available from http://www.t13.org/ and
31 * http://www.sata-io.org/ 31 * http://www.sata-io.org/
32 * 32 *
33 */ 33 */
34 34
35 #include <linux/kernel.h> 35 #include <linux/kernel.h>
36 #include <linux/gfp.h> 36 #include <linux/gfp.h>
37 #include <linux/pci.h> 37 #include <linux/pci.h>
38 #include <linux/libata.h> 38 #include <linux/libata.h>
39 #include <linux/highmem.h> 39 #include <linux/highmem.h>
40 40
41 #include "libata.h" 41 #include "libata.h"
42 42
43 static struct workqueue_struct *ata_sff_wq; 43 static struct workqueue_struct *ata_sff_wq;
44 44
45 const struct ata_port_operations ata_sff_port_ops = { 45 const struct ata_port_operations ata_sff_port_ops = {
46 .inherits = &ata_base_port_ops, 46 .inherits = &ata_base_port_ops,
47 47
48 .qc_prep = ata_noop_qc_prep, 48 .qc_prep = ata_noop_qc_prep,
49 .qc_issue = ata_sff_qc_issue, 49 .qc_issue = ata_sff_qc_issue,
50 .qc_fill_rtf = ata_sff_qc_fill_rtf, 50 .qc_fill_rtf = ata_sff_qc_fill_rtf,
51 51
52 .freeze = ata_sff_freeze, 52 .freeze = ata_sff_freeze,
53 .thaw = ata_sff_thaw, 53 .thaw = ata_sff_thaw,
54 .prereset = ata_sff_prereset, 54 .prereset = ata_sff_prereset,
55 .softreset = ata_sff_softreset, 55 .softreset = ata_sff_softreset,
56 .hardreset = sata_sff_hardreset, 56 .hardreset = sata_sff_hardreset,
57 .postreset = ata_sff_postreset, 57 .postreset = ata_sff_postreset,
58 .error_handler = ata_sff_error_handler, 58 .error_handler = ata_sff_error_handler,
59 59
60 .sff_dev_select = ata_sff_dev_select, 60 .sff_dev_select = ata_sff_dev_select,
61 .sff_check_status = ata_sff_check_status, 61 .sff_check_status = ata_sff_check_status,
62 .sff_tf_load = ata_sff_tf_load, 62 .sff_tf_load = ata_sff_tf_load,
63 .sff_tf_read = ata_sff_tf_read, 63 .sff_tf_read = ata_sff_tf_read,
64 .sff_exec_command = ata_sff_exec_command, 64 .sff_exec_command = ata_sff_exec_command,
65 .sff_data_xfer = ata_sff_data_xfer, 65 .sff_data_xfer = ata_sff_data_xfer,
66 .sff_drain_fifo = ata_sff_drain_fifo, 66 .sff_drain_fifo = ata_sff_drain_fifo,
67 67
68 .lost_interrupt = ata_sff_lost_interrupt, 68 .lost_interrupt = ata_sff_lost_interrupt,
69 }; 69 };
70 EXPORT_SYMBOL_GPL(ata_sff_port_ops); 70 EXPORT_SYMBOL_GPL(ata_sff_port_ops);
71 71
72 /** 72 /**
73 * ata_sff_check_status - Read device status reg & clear interrupt 73 * ata_sff_check_status - Read device status reg & clear interrupt
74 * @ap: port where the device is 74 * @ap: port where the device is
75 * 75 *
76 * Reads ATA taskfile status register for currently-selected device 76 * Reads ATA taskfile status register for currently-selected device
77 * and return its value. This also clears pending interrupts 77 * and return its value. This also clears pending interrupts
78 * from this device 78 * from this device
79 * 79 *
80 * LOCKING: 80 * LOCKING:
81 * Inherited from caller. 81 * Inherited from caller.
82 */ 82 */
83 u8 ata_sff_check_status(struct ata_port *ap) 83 u8 ata_sff_check_status(struct ata_port *ap)
84 { 84 {
85 return ioread8(ap->ioaddr.status_addr); 85 return ioread8(ap->ioaddr.status_addr);
86 } 86 }
87 EXPORT_SYMBOL_GPL(ata_sff_check_status); 87 EXPORT_SYMBOL_GPL(ata_sff_check_status);
88 88
89 /** 89 /**
90 * ata_sff_altstatus - Read device alternate status reg 90 * ata_sff_altstatus - Read device alternate status reg
91 * @ap: port where the device is 91 * @ap: port where the device is
92 * 92 *
93 * Reads ATA taskfile alternate status register for 93 * Reads ATA taskfile alternate status register for
94 * currently-selected device and return its value. 94 * currently-selected device and return its value.
95 * 95 *
96 * Note: may NOT be used as the check_altstatus() entry in 96 * Note: may NOT be used as the check_altstatus() entry in
97 * ata_port_operations. 97 * ata_port_operations.
98 * 98 *
99 * LOCKING: 99 * LOCKING:
100 * Inherited from caller. 100 * Inherited from caller.
101 */ 101 */
102 static u8 ata_sff_altstatus(struct ata_port *ap) 102 static u8 ata_sff_altstatus(struct ata_port *ap)
103 { 103 {
104 if (ap->ops->sff_check_altstatus) 104 if (ap->ops->sff_check_altstatus)
105 return ap->ops->sff_check_altstatus(ap); 105 return ap->ops->sff_check_altstatus(ap);
106 106
107 return ioread8(ap->ioaddr.altstatus_addr); 107 return ioread8(ap->ioaddr.altstatus_addr);
108 } 108 }
109 109
110 /** 110 /**
111 * ata_sff_irq_status - Check if the device is busy 111 * ata_sff_irq_status - Check if the device is busy
112 * @ap: port where the device is 112 * @ap: port where the device is
113 * 113 *
114 * Determine if the port is currently busy. Uses altstatus 114 * Determine if the port is currently busy. Uses altstatus
115 * if available in order to avoid clearing shared IRQ status 115 * if available in order to avoid clearing shared IRQ status
116 * when finding an IRQ source. Non ctl capable devices don't 116 * when finding an IRQ source. Non ctl capable devices don't
117 * share interrupt lines fortunately for us. 117 * share interrupt lines fortunately for us.
118 * 118 *
119 * LOCKING: 119 * LOCKING:
120 * Inherited from caller. 120 * Inherited from caller.
121 */ 121 */
122 static u8 ata_sff_irq_status(struct ata_port *ap) 122 static u8 ata_sff_irq_status(struct ata_port *ap)
123 { 123 {
124 u8 status; 124 u8 status;
125 125
126 if (ap->ops->sff_check_altstatus || ap->ioaddr.altstatus_addr) { 126 if (ap->ops->sff_check_altstatus || ap->ioaddr.altstatus_addr) {
127 status = ata_sff_altstatus(ap); 127 status = ata_sff_altstatus(ap);
128 /* Not us: We are busy */ 128 /* Not us: We are busy */
129 if (status & ATA_BUSY) 129 if (status & ATA_BUSY)
130 return status; 130 return status;
131 } 131 }
132 /* Clear INTRQ latch */ 132 /* Clear INTRQ latch */
133 status = ap->ops->sff_check_status(ap); 133 status = ap->ops->sff_check_status(ap);
134 return status; 134 return status;
135 } 135 }
136 136
137 /** 137 /**
138 * ata_sff_sync - Flush writes 138 * ata_sff_sync - Flush writes
139 * @ap: Port to wait for. 139 * @ap: Port to wait for.
140 * 140 *
141 * CAUTION: 141 * CAUTION:
142 * If we have an mmio device with no ctl and no altstatus 142 * If we have an mmio device with no ctl and no altstatus
143 * method this will fail. No such devices are known to exist. 143 * method this will fail. No such devices are known to exist.
144 * 144 *
145 * LOCKING: 145 * LOCKING:
146 * Inherited from caller. 146 * Inherited from caller.
147 */ 147 */
148 148
149 static void ata_sff_sync(struct ata_port *ap) 149 static void ata_sff_sync(struct ata_port *ap)
150 { 150 {
151 if (ap->ops->sff_check_altstatus) 151 if (ap->ops->sff_check_altstatus)
152 ap->ops->sff_check_altstatus(ap); 152 ap->ops->sff_check_altstatus(ap);
153 else if (ap->ioaddr.altstatus_addr) 153 else if (ap->ioaddr.altstatus_addr)
154 ioread8(ap->ioaddr.altstatus_addr); 154 ioread8(ap->ioaddr.altstatus_addr);
155 } 155 }
156 156
157 /** 157 /**
158 * ata_sff_pause - Flush writes and wait 400nS 158 * ata_sff_pause - Flush writes and wait 400nS
159 * @ap: Port to pause for. 159 * @ap: Port to pause for.
160 * 160 *
161 * CAUTION: 161 * CAUTION:
162 * If we have an mmio device with no ctl and no altstatus 162 * If we have an mmio device with no ctl and no altstatus
163 * method this will fail. No such devices are known to exist. 163 * method this will fail. No such devices are known to exist.
164 * 164 *
165 * LOCKING: 165 * LOCKING:
166 * Inherited from caller. 166 * Inherited from caller.
167 */ 167 */
168 168
169 void ata_sff_pause(struct ata_port *ap) 169 void ata_sff_pause(struct ata_port *ap)
170 { 170 {
171 ata_sff_sync(ap); 171 ata_sff_sync(ap);
172 ndelay(400); 172 ndelay(400);
173 } 173 }
174 EXPORT_SYMBOL_GPL(ata_sff_pause); 174 EXPORT_SYMBOL_GPL(ata_sff_pause);
175 175
176 /** 176 /**
177 * ata_sff_dma_pause - Pause before commencing DMA 177 * ata_sff_dma_pause - Pause before commencing DMA
178 * @ap: Port to pause for. 178 * @ap: Port to pause for.
179 * 179 *
180 * Perform I/O fencing and ensure sufficient cycle delays occur 180 * Perform I/O fencing and ensure sufficient cycle delays occur
181 * for the HDMA1:0 transition 181 * for the HDMA1:0 transition
182 */ 182 */
183 183
184 void ata_sff_dma_pause(struct ata_port *ap) 184 void ata_sff_dma_pause(struct ata_port *ap)
185 { 185 {
186 if (ap->ops->sff_check_altstatus || ap->ioaddr.altstatus_addr) { 186 if (ap->ops->sff_check_altstatus || ap->ioaddr.altstatus_addr) {
187 /* An altstatus read will cause the needed delay without 187 /* An altstatus read will cause the needed delay without
188 messing up the IRQ status */ 188 messing up the IRQ status */
189 ata_sff_altstatus(ap); 189 ata_sff_altstatus(ap);
190 return; 190 return;
191 } 191 }
192 /* There are no DMA controllers without ctl. BUG here to ensure 192 /* There are no DMA controllers without ctl. BUG here to ensure
193 we never violate the HDMA1:0 transition timing and risk 193 we never violate the HDMA1:0 transition timing and risk
194 corruption. */ 194 corruption. */
195 BUG(); 195 BUG();
196 } 196 }
197 EXPORT_SYMBOL_GPL(ata_sff_dma_pause); 197 EXPORT_SYMBOL_GPL(ata_sff_dma_pause);
198 198
199 /** 199 /**
200 * ata_sff_busy_sleep - sleep until BSY clears, or timeout 200 * ata_sff_busy_sleep - sleep until BSY clears, or timeout
201 * @ap: port containing status register to be polled 201 * @ap: port containing status register to be polled
202 * @tmout_pat: impatience timeout in msecs 202 * @tmout_pat: impatience timeout in msecs
203 * @tmout: overall timeout in msecs 203 * @tmout: overall timeout in msecs
204 * 204 *
205 * Sleep until ATA Status register bit BSY clears, 205 * Sleep until ATA Status register bit BSY clears,
206 * or a timeout occurs. 206 * or a timeout occurs.
207 * 207 *
208 * LOCKING: 208 * LOCKING:
209 * Kernel thread context (may sleep). 209 * Kernel thread context (may sleep).
210 * 210 *
211 * RETURNS: 211 * RETURNS:
212 * 0 on success, -errno otherwise. 212 * 0 on success, -errno otherwise.
213 */ 213 */
214 int ata_sff_busy_sleep(struct ata_port *ap, 214 int ata_sff_busy_sleep(struct ata_port *ap,
215 unsigned long tmout_pat, unsigned long tmout) 215 unsigned long tmout_pat, unsigned long tmout)
216 { 216 {
217 unsigned long timer_start, timeout; 217 unsigned long timer_start, timeout;
218 u8 status; 218 u8 status;
219 219
220 status = ata_sff_busy_wait(ap, ATA_BUSY, 300); 220 status = ata_sff_busy_wait(ap, ATA_BUSY, 300);
221 timer_start = jiffies; 221 timer_start = jiffies;
222 timeout = ata_deadline(timer_start, tmout_pat); 222 timeout = ata_deadline(timer_start, tmout_pat);
223 while (status != 0xff && (status & ATA_BUSY) && 223 while (status != 0xff && (status & ATA_BUSY) &&
224 time_before(jiffies, timeout)) { 224 time_before(jiffies, timeout)) {
225 ata_msleep(ap, 50); 225 ata_msleep(ap, 50);
226 status = ata_sff_busy_wait(ap, ATA_BUSY, 3); 226 status = ata_sff_busy_wait(ap, ATA_BUSY, 3);
227 } 227 }
228 228
229 if (status != 0xff && (status & ATA_BUSY)) 229 if (status != 0xff && (status & ATA_BUSY))
230 ata_port_printk(ap, KERN_WARNING, 230 ata_port_printk(ap, KERN_WARNING,
231 "port is slow to respond, please be patient " 231 "port is slow to respond, please be patient "
232 "(Status 0x%x)\n", status); 232 "(Status 0x%x)\n", status);
233 233
234 timeout = ata_deadline(timer_start, tmout); 234 timeout = ata_deadline(timer_start, tmout);
235 while (status != 0xff && (status & ATA_BUSY) && 235 while (status != 0xff && (status & ATA_BUSY) &&
236 time_before(jiffies, timeout)) { 236 time_before(jiffies, timeout)) {
237 ata_msleep(ap, 50); 237 ata_msleep(ap, 50);
238 status = ap->ops->sff_check_status(ap); 238 status = ap->ops->sff_check_status(ap);
239 } 239 }
240 240
241 if (status == 0xff) 241 if (status == 0xff)
242 return -ENODEV; 242 return -ENODEV;
243 243
244 if (status & ATA_BUSY) { 244 if (status & ATA_BUSY) {
245 ata_port_printk(ap, KERN_ERR, "port failed to respond " 245 ata_port_printk(ap, KERN_ERR, "port failed to respond "
246 "(%lu secs, Status 0x%x)\n", 246 "(%lu secs, Status 0x%x)\n",
247 DIV_ROUND_UP(tmout, 1000), status); 247 DIV_ROUND_UP(tmout, 1000), status);
248 return -EBUSY; 248 return -EBUSY;
249 } 249 }
250 250
251 return 0; 251 return 0;
252 } 252 }
253 EXPORT_SYMBOL_GPL(ata_sff_busy_sleep); 253 EXPORT_SYMBOL_GPL(ata_sff_busy_sleep);
254 254
255 static int ata_sff_check_ready(struct ata_link *link) 255 static int ata_sff_check_ready(struct ata_link *link)
256 { 256 {
257 u8 status = link->ap->ops->sff_check_status(link->ap); 257 u8 status = link->ap->ops->sff_check_status(link->ap);
258 258
259 return ata_check_ready(status); 259 return ata_check_ready(status);
260 } 260 }
261 261
262 /** 262 /**
263 * ata_sff_wait_ready - sleep until BSY clears, or timeout 263 * ata_sff_wait_ready - sleep until BSY clears, or timeout
264 * @link: SFF link to wait ready status for 264 * @link: SFF link to wait ready status for
265 * @deadline: deadline jiffies for the operation 265 * @deadline: deadline jiffies for the operation
266 * 266 *
267 * Sleep until ATA Status register bit BSY clears, or timeout 267 * Sleep until ATA Status register bit BSY clears, or timeout
268 * occurs. 268 * occurs.
269 * 269 *
270 * LOCKING: 270 * LOCKING:
271 * Kernel thread context (may sleep). 271 * Kernel thread context (may sleep).
272 * 272 *
273 * RETURNS: 273 * RETURNS:
274 * 0 on success, -errno otherwise. 274 * 0 on success, -errno otherwise.
275 */ 275 */
276 int ata_sff_wait_ready(struct ata_link *link, unsigned long deadline) 276 int ata_sff_wait_ready(struct ata_link *link, unsigned long deadline)
277 { 277 {
278 return ata_wait_ready(link, deadline, ata_sff_check_ready); 278 return ata_wait_ready(link, deadline, ata_sff_check_ready);
279 } 279 }
280 EXPORT_SYMBOL_GPL(ata_sff_wait_ready); 280 EXPORT_SYMBOL_GPL(ata_sff_wait_ready);
281 281
282 /** 282 /**
283 * ata_sff_set_devctl - Write device control reg 283 * ata_sff_set_devctl - Write device control reg
284 * @ap: port where the device is 284 * @ap: port where the device is
285 * @ctl: value to write 285 * @ctl: value to write
286 * 286 *
287 * Writes ATA taskfile device control register. 287 * Writes ATA taskfile device control register.
288 * 288 *
289 * Note: may NOT be used as the sff_set_devctl() entry in 289 * Note: may NOT be used as the sff_set_devctl() entry in
290 * ata_port_operations. 290 * ata_port_operations.
291 * 291 *
292 * LOCKING: 292 * LOCKING:
293 * Inherited from caller. 293 * Inherited from caller.
294 */ 294 */
295 static void ata_sff_set_devctl(struct ata_port *ap, u8 ctl) 295 static void ata_sff_set_devctl(struct ata_port *ap, u8 ctl)
296 { 296 {
297 if (ap->ops->sff_set_devctl) 297 if (ap->ops->sff_set_devctl)
298 ap->ops->sff_set_devctl(ap, ctl); 298 ap->ops->sff_set_devctl(ap, ctl);
299 else 299 else
300 iowrite8(ctl, ap->ioaddr.ctl_addr); 300 iowrite8(ctl, ap->ioaddr.ctl_addr);
301 } 301 }
302 302
303 /** 303 /**
304 * ata_sff_dev_select - Select device 0/1 on ATA bus 304 * ata_sff_dev_select - Select device 0/1 on ATA bus
305 * @ap: ATA channel to manipulate 305 * @ap: ATA channel to manipulate
306 * @device: ATA device (numbered from zero) to select 306 * @device: ATA device (numbered from zero) to select
307 * 307 *
308 * Use the method defined in the ATA specification to 308 * Use the method defined in the ATA specification to
309 * make either device 0, or device 1, active on the 309 * make either device 0, or device 1, active on the
310 * ATA channel. Works with both PIO and MMIO. 310 * ATA channel. Works with both PIO and MMIO.
311 * 311 *
312 * May be used as the dev_select() entry in ata_port_operations. 312 * May be used as the dev_select() entry in ata_port_operations.
313 * 313 *
314 * LOCKING: 314 * LOCKING:
315 * caller. 315 * caller.
316 */ 316 */
317 void ata_sff_dev_select(struct ata_port *ap, unsigned int device) 317 void ata_sff_dev_select(struct ata_port *ap, unsigned int device)
318 { 318 {
319 u8 tmp; 319 u8 tmp;
320 320
321 if (device == 0) 321 if (device == 0)
322 tmp = ATA_DEVICE_OBS; 322 tmp = ATA_DEVICE_OBS;
323 else 323 else
324 tmp = ATA_DEVICE_OBS | ATA_DEV1; 324 tmp = ATA_DEVICE_OBS | ATA_DEV1;
325 325
326 iowrite8(tmp, ap->ioaddr.device_addr); 326 iowrite8(tmp, ap->ioaddr.device_addr);
327 ata_sff_pause(ap); /* needed; also flushes, for mmio */ 327 ata_sff_pause(ap); /* needed; also flushes, for mmio */
328 } 328 }
329 EXPORT_SYMBOL_GPL(ata_sff_dev_select); 329 EXPORT_SYMBOL_GPL(ata_sff_dev_select);
330 330
331 /** 331 /**
332 * ata_dev_select - Select device 0/1 on ATA bus 332 * ata_dev_select - Select device 0/1 on ATA bus
333 * @ap: ATA channel to manipulate 333 * @ap: ATA channel to manipulate
334 * @device: ATA device (numbered from zero) to select 334 * @device: ATA device (numbered from zero) to select
335 * @wait: non-zero to wait for Status register BSY bit to clear 335 * @wait: non-zero to wait for Status register BSY bit to clear
336 * @can_sleep: non-zero if context allows sleeping 336 * @can_sleep: non-zero if context allows sleeping
337 * 337 *
338 * Use the method defined in the ATA specification to 338 * Use the method defined in the ATA specification to
339 * make either device 0, or device 1, active on the 339 * make either device 0, or device 1, active on the
340 * ATA channel. 340 * ATA channel.
341 * 341 *
342 * This is a high-level version of ata_sff_dev_select(), which 342 * This is a high-level version of ata_sff_dev_select(), which
343 * additionally provides the services of inserting the proper 343 * additionally provides the services of inserting the proper
344 * pauses and status polling, where needed. 344 * pauses and status polling, where needed.
345 * 345 *
346 * LOCKING: 346 * LOCKING:
347 * caller. 347 * caller.
348 */ 348 */
349 static void ata_dev_select(struct ata_port *ap, unsigned int device, 349 static void ata_dev_select(struct ata_port *ap, unsigned int device,
350 unsigned int wait, unsigned int can_sleep) 350 unsigned int wait, unsigned int can_sleep)
351 { 351 {
352 if (ata_msg_probe(ap)) 352 if (ata_msg_probe(ap))
353 ata_port_printk(ap, KERN_INFO, "ata_dev_select: ENTER, " 353 ata_port_printk(ap, KERN_INFO, "ata_dev_select: ENTER, "
354 "device %u, wait %u\n", device, wait); 354 "device %u, wait %u\n", device, wait);
355 355
356 if (wait) 356 if (wait)
357 ata_wait_idle(ap); 357 ata_wait_idle(ap);
358 358
359 ap->ops->sff_dev_select(ap, device); 359 ap->ops->sff_dev_select(ap, device);
360 360
361 if (wait) { 361 if (wait) {
362 if (can_sleep && ap->link.device[device].class == ATA_DEV_ATAPI) 362 if (can_sleep && ap->link.device[device].class == ATA_DEV_ATAPI)
363 ata_msleep(ap, 150); 363 ata_msleep(ap, 150);
364 ata_wait_idle(ap); 364 ata_wait_idle(ap);
365 } 365 }
366 } 366 }
367 367
368 /** 368 /**
369 * ata_sff_irq_on - Enable interrupts on a port. 369 * ata_sff_irq_on - Enable interrupts on a port.
370 * @ap: Port on which interrupts are enabled. 370 * @ap: Port on which interrupts are enabled.
371 * 371 *
372 * Enable interrupts on a legacy IDE device using MMIO or PIO, 372 * Enable interrupts on a legacy IDE device using MMIO or PIO,
373 * wait for idle, clear any pending interrupts. 373 * wait for idle, clear any pending interrupts.
374 * 374 *
375 * Note: may NOT be used as the sff_irq_on() entry in 375 * Note: may NOT be used as the sff_irq_on() entry in
376 * ata_port_operations. 376 * ata_port_operations.
377 * 377 *
378 * LOCKING: 378 * LOCKING:
379 * Inherited from caller. 379 * Inherited from caller.
380 */ 380 */
381 void ata_sff_irq_on(struct ata_port *ap) 381 void ata_sff_irq_on(struct ata_port *ap)
382 { 382 {
383 struct ata_ioports *ioaddr = &ap->ioaddr; 383 struct ata_ioports *ioaddr = &ap->ioaddr;
384 384
385 if (ap->ops->sff_irq_on) { 385 if (ap->ops->sff_irq_on) {
386 ap->ops->sff_irq_on(ap); 386 ap->ops->sff_irq_on(ap);
387 return; 387 return;
388 } 388 }
389 389
390 ap->ctl &= ~ATA_NIEN; 390 ap->ctl &= ~ATA_NIEN;
391 ap->last_ctl = ap->ctl; 391 ap->last_ctl = ap->ctl;
392 392
393 if (ap->ops->sff_set_devctl || ioaddr->ctl_addr) 393 if (ap->ops->sff_set_devctl || ioaddr->ctl_addr)
394 ata_sff_set_devctl(ap, ap->ctl); 394 ata_sff_set_devctl(ap, ap->ctl);
395 ata_wait_idle(ap); 395 ata_wait_idle(ap);
396 396
397 if (ap->ops->sff_irq_clear) 397 if (ap->ops->sff_irq_clear)
398 ap->ops->sff_irq_clear(ap); 398 ap->ops->sff_irq_clear(ap);
399 } 399 }
400 EXPORT_SYMBOL_GPL(ata_sff_irq_on); 400 EXPORT_SYMBOL_GPL(ata_sff_irq_on);
401 401
402 /** 402 /**
403 * ata_sff_tf_load - send taskfile registers to host controller 403 * ata_sff_tf_load - send taskfile registers to host controller
404 * @ap: Port to which output is sent 404 * @ap: Port to which output is sent
405 * @tf: ATA taskfile register set 405 * @tf: ATA taskfile register set
406 * 406 *
407 * Outputs ATA taskfile to standard ATA host controller. 407 * Outputs ATA taskfile to standard ATA host controller.
408 * 408 *
409 * LOCKING: 409 * LOCKING:
410 * Inherited from caller. 410 * Inherited from caller.
411 */ 411 */
412 void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf) 412 void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf)
413 { 413 {
414 struct ata_ioports *ioaddr = &ap->ioaddr; 414 struct ata_ioports *ioaddr = &ap->ioaddr;
415 unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR; 415 unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR;
416 416
417 if (tf->ctl != ap->last_ctl) { 417 if (tf->ctl != ap->last_ctl) {
418 if (ioaddr->ctl_addr) 418 if (ioaddr->ctl_addr)
419 iowrite8(tf->ctl, ioaddr->ctl_addr); 419 iowrite8(tf->ctl, ioaddr->ctl_addr);
420 ap->last_ctl = tf->ctl; 420 ap->last_ctl = tf->ctl;
421 ata_wait_idle(ap); 421 ata_wait_idle(ap);
422 } 422 }
423 423
424 if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) { 424 if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) {
425 WARN_ON_ONCE(!ioaddr->ctl_addr); 425 WARN_ON_ONCE(!ioaddr->ctl_addr);
426 iowrite8(tf->hob_feature, ioaddr->feature_addr); 426 iowrite8(tf->hob_feature, ioaddr->feature_addr);
427 iowrite8(tf->hob_nsect, ioaddr->nsect_addr); 427 iowrite8(tf->hob_nsect, ioaddr->nsect_addr);
428 iowrite8(tf->hob_lbal, ioaddr->lbal_addr); 428 iowrite8(tf->hob_lbal, ioaddr->lbal_addr);
429 iowrite8(tf->hob_lbam, ioaddr->lbam_addr); 429 iowrite8(tf->hob_lbam, ioaddr->lbam_addr);
430 iowrite8(tf->hob_lbah, ioaddr->lbah_addr); 430 iowrite8(tf->hob_lbah, ioaddr->lbah_addr);
431 VPRINTK("hob: feat 0x%X nsect 0x%X, lba 0x%X 0x%X 0x%X\n", 431 VPRINTK("hob: feat 0x%X nsect 0x%X, lba 0x%X 0x%X 0x%X\n",
432 tf->hob_feature, 432 tf->hob_feature,
433 tf->hob_nsect, 433 tf->hob_nsect,
434 tf->hob_lbal, 434 tf->hob_lbal,
435 tf->hob_lbam, 435 tf->hob_lbam,
436 tf->hob_lbah); 436 tf->hob_lbah);
437 } 437 }
438 438
439 if (is_addr) { 439 if (is_addr) {
440 iowrite8(tf->feature, ioaddr->feature_addr); 440 iowrite8(tf->feature, ioaddr->feature_addr);
441 iowrite8(tf->nsect, ioaddr->nsect_addr); 441 iowrite8(tf->nsect, ioaddr->nsect_addr);
442 iowrite8(tf->lbal, ioaddr->lbal_addr); 442 iowrite8(tf->lbal, ioaddr->lbal_addr);
443 iowrite8(tf->lbam, ioaddr->lbam_addr); 443 iowrite8(tf->lbam, ioaddr->lbam_addr);
444 iowrite8(tf->lbah, ioaddr->lbah_addr); 444 iowrite8(tf->lbah, ioaddr->lbah_addr);
445 VPRINTK("feat 0x%X nsect 0x%X lba 0x%X 0x%X 0x%X\n", 445 VPRINTK("feat 0x%X nsect 0x%X lba 0x%X 0x%X 0x%X\n",
446 tf->feature, 446 tf->feature,
447 tf->nsect, 447 tf->nsect,
448 tf->lbal, 448 tf->lbal,
449 tf->lbam, 449 tf->lbam,
450 tf->lbah); 450 tf->lbah);
451 } 451 }
452 452
453 if (tf->flags & ATA_TFLAG_DEVICE) { 453 if (tf->flags & ATA_TFLAG_DEVICE) {
454 iowrite8(tf->device, ioaddr->device_addr); 454 iowrite8(tf->device, ioaddr->device_addr);
455 VPRINTK("device 0x%X\n", tf->device); 455 VPRINTK("device 0x%X\n", tf->device);
456 } 456 }
457 457
458 ata_wait_idle(ap); 458 ata_wait_idle(ap);
459 } 459 }
460 EXPORT_SYMBOL_GPL(ata_sff_tf_load); 460 EXPORT_SYMBOL_GPL(ata_sff_tf_load);
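To make the register ordering above concrete, a hypothetical helper that fills a taskfile for READ SECTOR(S) EXT shows which bytes end up in the hob_* slots that ata_sff_tf_load() writes first when ATA_TFLAG_LBA48 is set (a sketch only, not taken from this file):

#include <linux/ata.h>
#include <linux/libata.h>

static void fill_pio_read_ext_tf(struct ata_taskfile *tf, u64 lba, u16 nsect)
{
	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48;
	tf->command = ATA_CMD_PIO_READ_EXT;

	/* the "previous" byte of the sector count goes to the HOB slot */
	tf->hob_nsect = (nsect >> 8) & 0xff;
	tf->nsect = nsect & 0xff;

	/* LBA bytes 3..5 are the HOB values, bytes 0..2 the current ones */
	tf->hob_lbal = (lba >> 24) & 0xff;
	tf->hob_lbam = (lba >> 32) & 0xff;
	tf->hob_lbah = (lba >> 40) & 0xff;
	tf->lbal = lba & 0xff;
	tf->lbam = (lba >> 8) & 0xff;
	tf->lbah = (lba >> 16) & 0xff;

	tf->device = ATA_LBA;
}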
461 461
462 /** 462 /**
463 * ata_sff_tf_read - input device's ATA taskfile shadow registers 463 * ata_sff_tf_read - input device's ATA taskfile shadow registers
464 * @ap: Port from which input is read 464 * @ap: Port from which input is read
465 * @tf: ATA taskfile register set for storing input 465 * @tf: ATA taskfile register set for storing input
466 * 466 *
467 * Reads ATA taskfile registers for currently-selected device 467 * Reads ATA taskfile registers for currently-selected device
468 * into @tf. Assumes the device has a fully SFF compliant task file 468 * into @tf. Assumes the device has a fully SFF compliant task file
469 * layout and behaviour. If your device does not (e.g. has a different 469 * layout and behaviour. If your device does not (e.g. has a different
470 * status method) then you will need to provide a replacement tf_read. 470 * status method) then you will need to provide a replacement tf_read.
471 * 471 *
472 * LOCKING: 472 * LOCKING:
473 * Inherited from caller. 473 * Inherited from caller.
474 */ 474 */
475 void ata_sff_tf_read(struct ata_port *ap, struct ata_taskfile *tf) 475 void ata_sff_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
476 { 476 {
477 struct ata_ioports *ioaddr = &ap->ioaddr; 477 struct ata_ioports *ioaddr = &ap->ioaddr;
478 478
479 tf->command = ata_sff_check_status(ap); 479 tf->command = ata_sff_check_status(ap);
480 tf->feature = ioread8(ioaddr->error_addr); 480 tf->feature = ioread8(ioaddr->error_addr);
481 tf->nsect = ioread8(ioaddr->nsect_addr); 481 tf->nsect = ioread8(ioaddr->nsect_addr);
482 tf->lbal = ioread8(ioaddr->lbal_addr); 482 tf->lbal = ioread8(ioaddr->lbal_addr);
483 tf->lbam = ioread8(ioaddr->lbam_addr); 483 tf->lbam = ioread8(ioaddr->lbam_addr);
484 tf->lbah = ioread8(ioaddr->lbah_addr); 484 tf->lbah = ioread8(ioaddr->lbah_addr);
485 tf->device = ioread8(ioaddr->device_addr); 485 tf->device = ioread8(ioaddr->device_addr);
486 486
487 if (tf->flags & ATA_TFLAG_LBA48) { 487 if (tf->flags & ATA_TFLAG_LBA48) {
488 if (likely(ioaddr->ctl_addr)) { 488 if (likely(ioaddr->ctl_addr)) {
489 iowrite8(tf->ctl | ATA_HOB, ioaddr->ctl_addr); 489 iowrite8(tf->ctl | ATA_HOB, ioaddr->ctl_addr);
490 tf->hob_feature = ioread8(ioaddr->error_addr); 490 tf->hob_feature = ioread8(ioaddr->error_addr);
491 tf->hob_nsect = ioread8(ioaddr->nsect_addr); 491 tf->hob_nsect = ioread8(ioaddr->nsect_addr);
492 tf->hob_lbal = ioread8(ioaddr->lbal_addr); 492 tf->hob_lbal = ioread8(ioaddr->lbal_addr);
493 tf->hob_lbam = ioread8(ioaddr->lbam_addr); 493 tf->hob_lbam = ioread8(ioaddr->lbam_addr);
494 tf->hob_lbah = ioread8(ioaddr->lbah_addr); 494 tf->hob_lbah = ioread8(ioaddr->lbah_addr);
495 iowrite8(tf->ctl, ioaddr->ctl_addr); 495 iowrite8(tf->ctl, ioaddr->ctl_addr);
496 ap->last_ctl = tf->ctl; 496 ap->last_ctl = tf->ctl;
497 } else 497 } else
498 WARN_ON_ONCE(1); 498 WARN_ON_ONCE(1);
499 } 499 }
500 } 500 }
501 EXPORT_SYMBOL_GPL(ata_sff_tf_read); 501 EXPORT_SYMBOL_GPL(ata_sff_tf_read);
502 502
503 /** 503 /**
504 * ata_sff_exec_command - issue ATA command to host controller 504 * ata_sff_exec_command - issue ATA command to host controller
505 * @ap: port to which command is being issued 505 * @ap: port to which command is being issued
506 * @tf: ATA taskfile register set 506 * @tf: ATA taskfile register set
507 * 507 *
508 * Issues ATA command, with proper synchronization with interrupt 508 * Issues ATA command, with proper synchronization with interrupt
509 * handler / other threads. 509 * handler / other threads.
510 * 510 *
511 * LOCKING: 511 * LOCKING:
512 * spin_lock_irqsave(host lock) 512 * spin_lock_irqsave(host lock)
513 */ 513 */
514 void ata_sff_exec_command(struct ata_port *ap, const struct ata_taskfile *tf) 514 void ata_sff_exec_command(struct ata_port *ap, const struct ata_taskfile *tf)
515 { 515 {
516 DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command); 516 DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);
517 517
518 iowrite8(tf->command, ap->ioaddr.command_addr); 518 iowrite8(tf->command, ap->ioaddr.command_addr);
519 ata_sff_pause(ap); 519 ata_sff_pause(ap);
520 } 520 }
521 EXPORT_SYMBOL_GPL(ata_sff_exec_command); 521 EXPORT_SYMBOL_GPL(ata_sff_exec_command);
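Drivers whose chipsets need extra work around the command write usually wrap this helper rather than replace it; a hypothetical hook and its wiring (names invented, not part of this file) might look like:

#include <linux/libata.h>

static void my_exec_command(struct ata_port *ap,
			    const struct ata_taskfile *tf)
{
	/* hypothetical chipset-specific posting flush would go here ... */
	ata_sff_exec_command(ap, tf);
}

static struct ata_port_operations my_exec_ops = {
	.inherits		= &ata_sff_port_ops,
	.sff_exec_command	= my_exec_command,
};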
522 522
523 /** 523 /**
524 * ata_tf_to_host - issue ATA taskfile to host controller 524 * ata_tf_to_host - issue ATA taskfile to host controller
525 * @ap: port to which command is being issued 525 * @ap: port to which command is being issued
526 * @tf: ATA taskfile register set 526 * @tf: ATA taskfile register set
527 * 527 *
528 * Issues ATA taskfile register set to ATA host controller, 528 * Issues ATA taskfile register set to ATA host controller,
529 * with proper synchronization with interrupt handler and 529 * with proper synchronization with interrupt handler and
530 * other threads. 530 * other threads.
531 * 531 *
532 * LOCKING: 532 * LOCKING:
533 * spin_lock_irqsave(host lock) 533 * spin_lock_irqsave(host lock)
534 */ 534 */
535 static inline void ata_tf_to_host(struct ata_port *ap, 535 static inline void ata_tf_to_host(struct ata_port *ap,
536 const struct ata_taskfile *tf) 536 const struct ata_taskfile *tf)
537 { 537 {
538 ap->ops->sff_tf_load(ap, tf); 538 ap->ops->sff_tf_load(ap, tf);
539 ap->ops->sff_exec_command(ap, tf); 539 ap->ops->sff_exec_command(ap, tf);
540 } 540 }
541 541
542 /** 542 /**
543 * ata_sff_data_xfer - Transfer data by PIO 543 * ata_sff_data_xfer - Transfer data by PIO
544 * @dev: device to target 544 * @dev: device to target
545 * @buf: data buffer 545 * @buf: data buffer
546 * @buflen: buffer length 546 * @buflen: buffer length
547 * @rw: read/write 547 * @rw: read/write
548 * 548 *
549 * Transfer data from/to the device data register by PIO. 549 * Transfer data from/to the device data register by PIO.
550 * 550 *
551 * LOCKING: 551 * LOCKING:
552 * Inherited from caller. 552 * Inherited from caller.
553 * 553 *
554 * RETURNS: 554 * RETURNS:
555 * Bytes consumed. 555 * Bytes consumed.
556 */ 556 */
557 unsigned int ata_sff_data_xfer(struct ata_device *dev, unsigned char *buf, 557 unsigned int ata_sff_data_xfer(struct ata_device *dev, unsigned char *buf,
558 unsigned int buflen, int rw) 558 unsigned int buflen, int rw)
559 { 559 {
560 struct ata_port *ap = dev->link->ap; 560 struct ata_port *ap = dev->link->ap;
561 void __iomem *data_addr = ap->ioaddr.data_addr; 561 void __iomem *data_addr = ap->ioaddr.data_addr;
562 unsigned int words = buflen >> 1; 562 unsigned int words = buflen >> 1;
563 563
564 /* Transfer multiple of 2 bytes */ 564 /* Transfer multiple of 2 bytes */
565 if (rw == READ) 565 if (rw == READ)
566 ioread16_rep(data_addr, buf, words); 566 ioread16_rep(data_addr, buf, words);
567 else 567 else
568 iowrite16_rep(data_addr, buf, words); 568 iowrite16_rep(data_addr, buf, words);
569 569
570 /* Transfer trailing byte, if any. */ 570 /* Transfer trailing byte, if any. */
571 if (unlikely(buflen & 0x01)) { 571 if (unlikely(buflen & 0x01)) {
572 unsigned char pad[2]; 572 unsigned char pad[2];
573 573
574 /* Point buf to the tail of buffer */ 574 /* Point buf to the tail of buffer */
575 buf += buflen - 1; 575 buf += buflen - 1;
576 576
577 /* 577 /*
578 * Use io*16_rep() accessors here as well to avoid pointlessly 578 * Use io*16_rep() accessors here as well to avoid pointlessly
579 * swapping bytes to and from on the big endian machines... 579 * swapping bytes to and from on the big endian machines...
580 */ 580 */
581 if (rw == READ) { 581 if (rw == READ) {
582 ioread16_rep(data_addr, pad, 1); 582 ioread16_rep(data_addr, pad, 1);
583 *buf = pad[0]; 583 *buf = pad[0];
584 } else { 584 } else {
585 pad[0] = *buf; 585 pad[0] = *buf;
586 iowrite16_rep(data_addr, pad, 1); 586 iowrite16_rep(data_addr, pad, 1);
587 } 587 }
588 words++; 588 words++;
589 } 589 }
590 590
591 return words << 1; 591 return words << 1;
592 } 592 }
593 EXPORT_SYMBOL_GPL(ata_sff_data_xfer); 593 EXPORT_SYMBOL_GPL(ata_sff_data_xfer);
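A hypothetical wrapper makes the return-value rounding visible: the trailing odd byte still costs a full 16-bit transfer, so a 7-byte request reports 8 bytes consumed (sketch only, not part of this file):

#include <linux/libata.h>

static unsigned int my_pio_xfer(struct ata_queued_cmd *qc,
				unsigned char *buf, unsigned int len, int rw)
{
	struct ata_port *ap = qc->ap;

	/* odd lengths are rounded up to the next 16-bit word, e.g. 7 -> 8 */
	return ap->ops->sff_data_xfer(qc->dev, buf, len, rw);
}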
594 594
595 /** 595 /**
596 * ata_sff_data_xfer32 - Transfer data by PIO 596 * ata_sff_data_xfer32 - Transfer data by PIO
597 * @dev: device to target 597 * @dev: device to target
598 * @buf: data buffer 598 * @buf: data buffer
599 * @buflen: buffer length 599 * @buflen: buffer length
600 * @rw: read/write 600 * @rw: read/write
601 * 601 *
602 * Transfer data from/to the device data register by PIO using 32bit 602 * Transfer data from/to the device data register by PIO using 32bit
603 * I/O operations. 603 * I/O operations.
604 * 604 *
605 * LOCKING: 605 * LOCKING:
606 * Inherited from caller. 606 * Inherited from caller.
607 * 607 *
608 * RETURNS: 608 * RETURNS:
609 * Bytes consumed. 609 * Bytes consumed.
610 */ 610 */
611 611
612 unsigned int ata_sff_data_xfer32(struct ata_device *dev, unsigned char *buf, 612 unsigned int ata_sff_data_xfer32(struct ata_device *dev, unsigned char *buf,
613 unsigned int buflen, int rw) 613 unsigned int buflen, int rw)
614 { 614 {
615 struct ata_port *ap = dev->link->ap; 615 struct ata_port *ap = dev->link->ap;
616 void __iomem *data_addr = ap->ioaddr.data_addr; 616 void __iomem *data_addr = ap->ioaddr.data_addr;
617 unsigned int words = buflen >> 2; 617 unsigned int words = buflen >> 2;
618 int slop = buflen & 3; 618 int slop = buflen & 3;
619 619
620 if (!(ap->pflags & ATA_PFLAG_PIO32)) 620 if (!(ap->pflags & ATA_PFLAG_PIO32))
621 return ata_sff_data_xfer(dev, buf, buflen, rw); 621 return ata_sff_data_xfer(dev, buf, buflen, rw);
622 622
623 /* Transfer multiple of 4 bytes */ 623 /* Transfer multiple of 4 bytes */
624 if (rw == READ) 624 if (rw == READ)
625 ioread32_rep(data_addr, buf, words); 625 ioread32_rep(data_addr, buf, words);
626 else 626 else
627 iowrite32_rep(data_addr, buf, words); 627 iowrite32_rep(data_addr, buf, words);
628 628
629 /* Transfer trailing bytes, if any */ 629 /* Transfer trailing bytes, if any */
630 if (unlikely(slop)) { 630 if (unlikely(slop)) {
631 unsigned char pad[4]; 631 unsigned char pad[4];
632 632
633 /* Point buf to the tail of buffer */ 633 /* Point buf to the tail of buffer */
634 buf += buflen - slop; 634 buf += buflen - slop;
635 635
636 /* 636 /*
637 * Use io*_rep() accessors here as well to avoid pointlessly 637 * Use io*_rep() accessors here as well to avoid pointlessly
638 * swapping bytes to and from on the big endian machines... 638 * swapping bytes to and from on the big endian machines...
639 */ 639 */
640 if (rw == READ) { 640 if (rw == READ) {
641 if (slop < 3) 641 if (slop < 3)
642 ioread16_rep(data_addr, pad, 1); 642 ioread16_rep(data_addr, pad, 1);
643 else 643 else
644 ioread32_rep(data_addr, pad, 1); 644 ioread32_rep(data_addr, pad, 1);
645 memcpy(buf, pad, slop); 645 memcpy(buf, pad, slop);
646 } else { 646 } else {
647 memcpy(pad, buf, slop); 647 memcpy(pad, buf, slop);
648 if (slop < 3) 648 if (slop < 3)
649 iowrite16_rep(data_addr, pad, 1); 649 iowrite16_rep(data_addr, pad, 1);
650 else 650 else
651 iowrite32_rep(data_addr, pad, 1); 651 iowrite32_rep(data_addr, pad, 1);
652 } 652 }
653 } 653 }
654 return (buflen + 1) & ~1; 654 return (buflen + 1) & ~1;
655 } 655 }
656 EXPORT_SYMBOL_GPL(ata_sff_data_xfer32); 656 EXPORT_SYMBOL_GPL(ata_sff_data_xfer32);
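A controller capable of 32-bit PIO would typically just point its port operations at this helper; ports that do not have ATA_PFLAG_PIO32 set fall back to the 16-bit routine inside the helper itself. A hypothetical hookup (not part of this file):

#include <linux/libata.h>

static struct ata_port_operations my_pio32_ops = {
	.inherits	= &ata_sff_port_ops,	/* exported SFF defaults */
	.sff_data_xfer	= ata_sff_data_xfer32,
};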
657 657
658 /** 658 /**
659 * ata_sff_data_xfer_noirq - Transfer data by PIO 659 * ata_sff_data_xfer_noirq - Transfer data by PIO
660 * @dev: device to target 660 * @dev: device to target
661 * @buf: data buffer 661 * @buf: data buffer
662 * @buflen: buffer length 662 * @buflen: buffer length
663 * @rw: read/write 663 * @rw: read/write
664 * 664 *
665 * Transfer data from/to the device data register by PIO. Do the 665 * Transfer data from/to the device data register by PIO. Do the
666 * transfer with interrupts disabled. 666 * transfer with interrupts disabled.
667 * 667 *
668 * LOCKING: 668 * LOCKING:
669 * Inherited from caller. 669 * Inherited from caller.
670 * 670 *
671 * RETURNS: 671 * RETURNS:
672 * Bytes consumed. 672 * Bytes consumed.
673 */ 673 */
674 unsigned int ata_sff_data_xfer_noirq(struct ata_device *dev, unsigned char *buf, 674 unsigned int ata_sff_data_xfer_noirq(struct ata_device *dev, unsigned char *buf,
675 unsigned int buflen, int rw) 675 unsigned int buflen, int rw)
676 { 676 {
677 unsigned long flags; 677 unsigned long flags;
678 unsigned int consumed; 678 unsigned int consumed;
679 679
680 local_irq_save(flags); 680 local_irq_save(flags);
681 consumed = ata_sff_data_xfer(dev, buf, buflen, rw); 681 consumed = ata_sff_data_xfer(dev, buf, buflen, rw);
682 local_irq_restore(flags); 682 local_irq_restore(flags);
683 683
684 return consumed; 684 return consumed;
685 } 685 }
686 EXPORT_SYMBOL_GPL(ata_sff_data_xfer_noirq); 686 EXPORT_SYMBOL_GPL(ata_sff_data_xfer_noirq);
687 687
688 /** 688 /**
689 * ata_pio_sector - Transfer a sector of data. 689 * ata_pio_sector - Transfer a sector of data.
690 * @qc: Command in progress 690 * @qc: Command in progress
691 * 691 *
692 * Transfer qc->sect_size bytes of data from/to the ATA device. 692 * Transfer qc->sect_size bytes of data from/to the ATA device.
693 * 693 *
694 * LOCKING: 694 * LOCKING:
695 * Inherited from caller. 695 * Inherited from caller.
696 */ 696 */
697 static void ata_pio_sector(struct ata_queued_cmd *qc) 697 static void ata_pio_sector(struct ata_queued_cmd *qc)
698 { 698 {
699 int do_write = (qc->tf.flags & ATA_TFLAG_WRITE); 699 int do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
700 struct ata_port *ap = qc->ap; 700 struct ata_port *ap = qc->ap;
701 struct page *page; 701 struct page *page;
702 unsigned int offset; 702 unsigned int offset;
703 unsigned char *buf; 703 unsigned char *buf;
704 704
705 if (qc->curbytes == qc->nbytes - qc->sect_size) 705 if (qc->curbytes == qc->nbytes - qc->sect_size)
706 ap->hsm_task_state = HSM_ST_LAST; 706 ap->hsm_task_state = HSM_ST_LAST;
707 707
708 page = sg_page(qc->cursg); 708 page = sg_page(qc->cursg);
709 offset = qc->cursg->offset + qc->cursg_ofs; 709 offset = qc->cursg->offset + qc->cursg_ofs;
710 710
711 /* get the current page and offset */ 711 /* get the current page and offset */
712 page = nth_page(page, (offset >> PAGE_SHIFT)); 712 page = nth_page(page, (offset >> PAGE_SHIFT));
713 offset %= PAGE_SIZE; 713 offset %= PAGE_SIZE;
714 714
715 DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); 715 DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
716 716
717 if (PageHighMem(page)) { 717 if (PageHighMem(page)) {
718 unsigned long flags; 718 unsigned long flags;
719 719
720 /* FIXME: use a bounce buffer */ 720 /* FIXME: use a bounce buffer */
721 local_irq_save(flags); 721 local_irq_save(flags);
722 buf = kmap_atomic(page, KM_IRQ0); 722 buf = kmap_atomic(page, KM_IRQ0);
723 723
724 /* do the actual data transfer */ 724 /* do the actual data transfer */
725 ap->ops->sff_data_xfer(qc->dev, buf + offset, qc->sect_size, 725 ap->ops->sff_data_xfer(qc->dev, buf + offset, qc->sect_size,
726 do_write); 726 do_write);
727 727
728 kunmap_atomic(buf, KM_IRQ0); 728 kunmap_atomic(buf, KM_IRQ0);
729 local_irq_restore(flags); 729 local_irq_restore(flags);
730 } else { 730 } else {
731 buf = page_address(page); 731 buf = page_address(page);
732 ap->ops->sff_data_xfer(qc->dev, buf + offset, qc->sect_size, 732 ap->ops->sff_data_xfer(qc->dev, buf + offset, qc->sect_size,
733 do_write); 733 do_write);
734 } 734 }
735 735
736 if (!do_write && !PageSlab(page)) 736 if (!do_write && !PageSlab(page))
737 flush_dcache_page(page); 737 flush_dcache_page(page);
738 738
739 qc->curbytes += qc->sect_size; 739 qc->curbytes += qc->sect_size;
740 qc->cursg_ofs += qc->sect_size; 740 qc->cursg_ofs += qc->sect_size;
741 741
742 if (qc->cursg_ofs == qc->cursg->length) { 742 if (qc->cursg_ofs == qc->cursg->length) {
743 qc->cursg = sg_next(qc->cursg); 743 qc->cursg = sg_next(qc->cursg);
744 qc->cursg_ofs = 0; 744 qc->cursg_ofs = 0;
745 } 745 }
746 } 746 }
747 747
748 /** 748 /**
749 * ata_pio_sectors - Transfer one or many sectors. 749 * ata_pio_sectors - Transfer one or many sectors.
750 * @qc: Command in progress 750 * @qc: Command in progress
751 * 751 *
752 * Transfer one or many sectors of data from/to the 752 * Transfer one or many sectors of data from/to the
753 * ATA device for the DRQ request. 753 * ATA device for the DRQ request.
754 * 754 *
755 * LOCKING: 755 * LOCKING:
756 * Inherited from caller. 756 * Inherited from caller.
757 */ 757 */
758 static void ata_pio_sectors(struct ata_queued_cmd *qc) 758 static void ata_pio_sectors(struct ata_queued_cmd *qc)
759 { 759 {
760 if (is_multi_taskfile(&qc->tf)) { 760 if (is_multi_taskfile(&qc->tf)) {
761 /* READ/WRITE MULTIPLE */ 761 /* READ/WRITE MULTIPLE */
762 unsigned int nsect; 762 unsigned int nsect;
763 763
764 WARN_ON_ONCE(qc->dev->multi_count == 0); 764 WARN_ON_ONCE(qc->dev->multi_count == 0);
765 765
766 nsect = min((qc->nbytes - qc->curbytes) / qc->sect_size, 766 nsect = min((qc->nbytes - qc->curbytes) / qc->sect_size,
767 qc->dev->multi_count); 767 qc->dev->multi_count);
768 while (nsect--) 768 while (nsect--)
769 ata_pio_sector(qc); 769 ata_pio_sector(qc);
770 } else 770 } else
771 ata_pio_sector(qc); 771 ata_pio_sector(qc);
772 772
773 ata_sff_sync(qc->ap); /* flush */ 773 ata_sff_sync(qc->ap); /* flush */
774 } 774 }
775 775
776 /** 776 /**
777 * atapi_send_cdb - Write CDB bytes to hardware 777 * atapi_send_cdb - Write CDB bytes to hardware
778 * @ap: Port to which ATAPI device is attached. 778 * @ap: Port to which ATAPI device is attached.
779 * @qc: Taskfile currently active 779 * @qc: Taskfile currently active
780 * 780 *
781 * When the device has indicated its readiness to accept 781 * When the device has indicated its readiness to accept
782 * a CDB, this function is called. Send the CDB. 782 * a CDB, this function is called. Send the CDB.
783 * 783 *
784 * LOCKING: 784 * LOCKING:
785 * caller. 785 * caller.
786 */ 786 */
787 static void atapi_send_cdb(struct ata_port *ap, struct ata_queued_cmd *qc) 787 static void atapi_send_cdb(struct ata_port *ap, struct ata_queued_cmd *qc)
788 { 788 {
789 /* send SCSI cdb */ 789 /* send SCSI cdb */
790 DPRINTK("send cdb\n"); 790 DPRINTK("send cdb\n");
791 WARN_ON_ONCE(qc->dev->cdb_len < 12); 791 WARN_ON_ONCE(qc->dev->cdb_len < 12);
792 792
793 ap->ops->sff_data_xfer(qc->dev, qc->cdb, qc->dev->cdb_len, 1); 793 ap->ops->sff_data_xfer(qc->dev, qc->cdb, qc->dev->cdb_len, 1);
794 ata_sff_sync(ap); 794 ata_sff_sync(ap);
795 /* FIXME: If the CDB is for DMA do we need to do the transition delay 795 /* FIXME: If the CDB is for DMA do we need to do the transition delay
796 or is bmdma_start guaranteed to do it? */ 796 or is bmdma_start guaranteed to do it? */
797 switch (qc->tf.protocol) { 797 switch (qc->tf.protocol) {
798 case ATAPI_PROT_PIO: 798 case ATAPI_PROT_PIO:
799 ap->hsm_task_state = HSM_ST; 799 ap->hsm_task_state = HSM_ST;
800 break; 800 break;
801 case ATAPI_PROT_NODATA: 801 case ATAPI_PROT_NODATA:
802 ap->hsm_task_state = HSM_ST_LAST; 802 ap->hsm_task_state = HSM_ST_LAST;
803 break; 803 break;
804 #ifdef CONFIG_ATA_BMDMA 804 #ifdef CONFIG_ATA_BMDMA
805 case ATAPI_PROT_DMA: 805 case ATAPI_PROT_DMA:
806 ap->hsm_task_state = HSM_ST_LAST; 806 ap->hsm_task_state = HSM_ST_LAST;
807 /* initiate bmdma */ 807 /* initiate bmdma */
808 ap->ops->bmdma_start(qc); 808 ap->ops->bmdma_start(qc);
809 break; 809 break;
810 #endif /* CONFIG_ATA_BMDMA */ 810 #endif /* CONFIG_ATA_BMDMA */
811 default: 811 default:
812 BUG(); 812 BUG();
813 } 813 }
814 } 814 }
815 815
816 /** 816 /**
817 * __atapi_pio_bytes - Transfer data from/to the ATAPI device. 817 * __atapi_pio_bytes - Transfer data from/to the ATAPI device.
818 * @qc: Command in progress 818 * @qc: Command in progress
819 * @bytes: number of bytes 819 * @bytes: number of bytes
820 * 820 *
821 * Transfer data from/to the ATAPI device. 821 * Transfer data from/to the ATAPI device.
822 * 822 *
823 * LOCKING: 823 * LOCKING:
824 * Inherited from caller. 824 * Inherited from caller.
825 * 825 *
826 */ 826 */
827 static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) 827 static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
828 { 828 {
829 int rw = (qc->tf.flags & ATA_TFLAG_WRITE) ? WRITE : READ; 829 int rw = (qc->tf.flags & ATA_TFLAG_WRITE) ? WRITE : READ;
830 struct ata_port *ap = qc->ap; 830 struct ata_port *ap = qc->ap;
831 struct ata_device *dev = qc->dev; 831 struct ata_device *dev = qc->dev;
832 struct ata_eh_info *ehi = &dev->link->eh_info; 832 struct ata_eh_info *ehi = &dev->link->eh_info;
833 struct scatterlist *sg; 833 struct scatterlist *sg;
834 struct page *page; 834 struct page *page;
835 unsigned char *buf; 835 unsigned char *buf;
836 unsigned int offset, count, consumed; 836 unsigned int offset, count, consumed;
837 837
838 next_sg: 838 next_sg:
839 sg = qc->cursg; 839 sg = qc->cursg;
840 if (unlikely(!sg)) { 840 if (unlikely(!sg)) {
841 ata_ehi_push_desc(ehi, "unexpected or too much trailing data " 841 ata_ehi_push_desc(ehi, "unexpected or too much trailing data "
842 "buf=%u cur=%u bytes=%u", 842 "buf=%u cur=%u bytes=%u",
843 qc->nbytes, qc->curbytes, bytes); 843 qc->nbytes, qc->curbytes, bytes);
844 return -1; 844 return -1;
845 } 845 }
846 846
847 page = sg_page(sg); 847 page = sg_page(sg);
848 offset = sg->offset + qc->cursg_ofs; 848 offset = sg->offset + qc->cursg_ofs;
849 849
850 /* get the current page and offset */ 850 /* get the current page and offset */
851 page = nth_page(page, (offset >> PAGE_SHIFT)); 851 page = nth_page(page, (offset >> PAGE_SHIFT));
852 offset %= PAGE_SIZE; 852 offset %= PAGE_SIZE;
853 853
854 /* don't overrun current sg */ 854 /* don't overrun current sg */
855 count = min(sg->length - qc->cursg_ofs, bytes); 855 count = min(sg->length - qc->cursg_ofs, bytes);
856 856
857 /* don't cross page boundaries */ 857 /* don't cross page boundaries */
858 count = min(count, (unsigned int)PAGE_SIZE - offset); 858 count = min(count, (unsigned int)PAGE_SIZE - offset);
859 859
860 DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); 860 DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
861 861
862 if (PageHighMem(page)) { 862 if (PageHighMem(page)) {
863 unsigned long flags; 863 unsigned long flags;
864 864
865 /* FIXME: use bounce buffer */ 865 /* FIXME: use bounce buffer */
866 local_irq_save(flags); 866 local_irq_save(flags);
867 buf = kmap_atomic(page, KM_IRQ0); 867 buf = kmap_atomic(page, KM_IRQ0);
868 868
869 /* do the actual data transfer */ 869 /* do the actual data transfer */
870 consumed = ap->ops->sff_data_xfer(dev, buf + offset, 870 consumed = ap->ops->sff_data_xfer(dev, buf + offset,
871 count, rw); 871 count, rw);
872 872
873 kunmap_atomic(buf, KM_IRQ0); 873 kunmap_atomic(buf, KM_IRQ0);
874 local_irq_restore(flags); 874 local_irq_restore(flags);
875 } else { 875 } else {
876 buf = page_address(page); 876 buf = page_address(page);
877 consumed = ap->ops->sff_data_xfer(dev, buf + offset, 877 consumed = ap->ops->sff_data_xfer(dev, buf + offset,
878 count, rw); 878 count, rw);
879 } 879 }
880 880
881 bytes -= min(bytes, consumed); 881 bytes -= min(bytes, consumed);
882 qc->curbytes += count; 882 qc->curbytes += count;
883 qc->cursg_ofs += count; 883 qc->cursg_ofs += count;
884 884
885 if (qc->cursg_ofs == sg->length) { 885 if (qc->cursg_ofs == sg->length) {
886 qc->cursg = sg_next(qc->cursg); 886 qc->cursg = sg_next(qc->cursg);
887 qc->cursg_ofs = 0; 887 qc->cursg_ofs = 0;
888 } 888 }
889 889
890 /* 890 /*
891 * There used to be a WARN_ON_ONCE(qc->cursg && count != consumed); 891 * There used to be a WARN_ON_ONCE(qc->cursg && count != consumed);
892 * Unfortunately __atapi_pio_bytes doesn't know enough to do the WARN 892 * Unfortunately __atapi_pio_bytes doesn't know enough to do the WARN
893 * check correctly as it doesn't know if it is the last request being 893 * check correctly as it doesn't know if it is the last request being
894 * made. Somebody should implement a proper sanity check. 894 * made. Somebody should implement a proper sanity check.
895 */ 895 */
896 if (bytes) 896 if (bytes)
897 goto next_sg; 897 goto next_sg;
898 return 0; 898 return 0;
899 } 899 }
900 900
901 /** 901 /**
902 * atapi_pio_bytes - Transfer data from/to the ATAPI device. 902 * atapi_pio_bytes - Transfer data from/to the ATAPI device.
903 * @qc: Command in progress 903 * @qc: Command in progress
904 * 904 *
905 * Transfer data from/to the ATAPI device. 905 * Transfer data from/to the ATAPI device.
906 * 906 *
907 * LOCKING: 907 * LOCKING:
908 * Inherited from caller. 908 * Inherited from caller.
909 */ 909 */
910 static void atapi_pio_bytes(struct ata_queued_cmd *qc) 910 static void atapi_pio_bytes(struct ata_queued_cmd *qc)
911 { 911 {
912 struct ata_port *ap = qc->ap; 912 struct ata_port *ap = qc->ap;
913 struct ata_device *dev = qc->dev; 913 struct ata_device *dev = qc->dev;
914 struct ata_eh_info *ehi = &dev->link->eh_info; 914 struct ata_eh_info *ehi = &dev->link->eh_info;
915 unsigned int ireason, bc_lo, bc_hi, bytes; 915 unsigned int ireason, bc_lo, bc_hi, bytes;
916 int i_write, do_write = (qc->tf.flags & ATA_TFLAG_WRITE) ? 1 : 0; 916 int i_write, do_write = (qc->tf.flags & ATA_TFLAG_WRITE) ? 1 : 0;
917 917
918 /* Abuse qc->result_tf for temp storage of intermediate TF 918 /* Abuse qc->result_tf for temp storage of intermediate TF
919 * here to save some kernel stack usage. 919 * here to save some kernel stack usage.
920 * For normal completion, qc->result_tf is not relevant. For 920 * For normal completion, qc->result_tf is not relevant. For
921 * error, qc->result_tf is later overwritten by ata_qc_complete(). 921 * error, qc->result_tf is later overwritten by ata_qc_complete().
922 * So, the correctness of qc->result_tf is not affected. 922 * So, the correctness of qc->result_tf is not affected.
923 */ 923 */
924 ap->ops->sff_tf_read(ap, &qc->result_tf); 924 ap->ops->sff_tf_read(ap, &qc->result_tf);
925 ireason = qc->result_tf.nsect; 925 ireason = qc->result_tf.nsect;
926 bc_lo = qc->result_tf.lbam; 926 bc_lo = qc->result_tf.lbam;
927 bc_hi = qc->result_tf.lbah; 927 bc_hi = qc->result_tf.lbah;
928 bytes = (bc_hi << 8) | bc_lo; 928 bytes = (bc_hi << 8) | bc_lo;
929 929
930 /* shall be cleared to zero, indicating xfer of data */ 930 /* shall be cleared to zero, indicating xfer of data */
931 if (unlikely(ireason & (1 << 0))) 931 if (unlikely(ireason & (1 << 0)))
932 goto atapi_check; 932 goto atapi_check;
933 933
934 /* make sure transfer direction matches expected */ 934 /* make sure transfer direction matches expected */
935 i_write = ((ireason & (1 << 1)) == 0) ? 1 : 0; 935 i_write = ((ireason & (1 << 1)) == 0) ? 1 : 0;
936 if (unlikely(do_write != i_write)) 936 if (unlikely(do_write != i_write))
937 goto atapi_check; 937 goto atapi_check;
938 938
939 if (unlikely(!bytes)) 939 if (unlikely(!bytes))
940 goto atapi_check; 940 goto atapi_check;
941 941
942 VPRINTK("ata%u: xfering %d bytes\n", ap->print_id, bytes); 942 VPRINTK("ata%u: xfering %d bytes\n", ap->print_id, bytes);
943 943
944 if (unlikely(__atapi_pio_bytes(qc, bytes))) 944 if (unlikely(__atapi_pio_bytes(qc, bytes)))
945 goto err_out; 945 goto err_out;
946 ata_sff_sync(ap); /* flush */ 946 ata_sff_sync(ap); /* flush */
947 947
948 return; 948 return;
949 949
950 atapi_check: 950 atapi_check:
951 ata_ehi_push_desc(ehi, "ATAPI check failed (ireason=0x%x bytes=%u)", 951 ata_ehi_push_desc(ehi, "ATAPI check failed (ireason=0x%x bytes=%u)",
952 ireason, bytes); 952 ireason, bytes);
953 err_out: 953 err_out:
954 qc->err_mask |= AC_ERR_HSM; 954 qc->err_mask |= AC_ERR_HSM;
955 ap->hsm_task_state = HSM_ST_ERR; 955 ap->hsm_task_state = HSM_ST_ERR;
956 } 956 }
957 957
958 /** 958 /**
959 * ata_hsm_ok_in_wq - Check if the qc can be handled in the workqueue. 959 * ata_hsm_ok_in_wq - Check if the qc can be handled in the workqueue.
960 * @ap: the target ata_port 960 * @ap: the target ata_port
961 * @qc: qc in progress 961 * @qc: qc in progress
962 * 962 *
963 * RETURNS: 963 * RETURNS:
964 * 1 if ok in workqueue, 0 otherwise. 964 * 1 if ok in workqueue, 0 otherwise.
965 */ 965 */
966 static inline int ata_hsm_ok_in_wq(struct ata_port *ap, 966 static inline int ata_hsm_ok_in_wq(struct ata_port *ap,
967 struct ata_queued_cmd *qc) 967 struct ata_queued_cmd *qc)
968 { 968 {
969 if (qc->tf.flags & ATA_TFLAG_POLLING) 969 if (qc->tf.flags & ATA_TFLAG_POLLING)
970 return 1; 970 return 1;
971 971
972 if (ap->hsm_task_state == HSM_ST_FIRST) { 972 if (ap->hsm_task_state == HSM_ST_FIRST) {
973 if (qc->tf.protocol == ATA_PROT_PIO && 973 if (qc->tf.protocol == ATA_PROT_PIO &&
974 (qc->tf.flags & ATA_TFLAG_WRITE)) 974 (qc->tf.flags & ATA_TFLAG_WRITE))
975 return 1; 975 return 1;
976 976
977 if (ata_is_atapi(qc->tf.protocol) && 977 if (ata_is_atapi(qc->tf.protocol) &&
978 !(qc->dev->flags & ATA_DFLAG_CDB_INTR)) 978 !(qc->dev->flags & ATA_DFLAG_CDB_INTR))
979 return 1; 979 return 1;
980 } 980 }
981 981
982 return 0; 982 return 0;
983 } 983 }
984 984
985 /** 985 /**
986 * ata_hsm_qc_complete - finish a qc running on standard HSM 986 * ata_hsm_qc_complete - finish a qc running on standard HSM
987 * @qc: Command to complete 987 * @qc: Command to complete
988 * @in_wq: 1 if called from workqueue, 0 otherwise 988 * @in_wq: 1 if called from workqueue, 0 otherwise
989 * 989 *
990 * Finish @qc which is running on standard HSM. 990 * Finish @qc which is running on standard HSM.
991 * 991 *
992 * LOCKING: 992 * LOCKING:
993 * If @in_wq is zero, spin_lock_irqsave(host lock). 993 * If @in_wq is zero, spin_lock_irqsave(host lock).
994 * Otherwise, none on entry and grabs host lock. 994 * Otherwise, none on entry and grabs host lock.
995 */ 995 */
996 static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) 996 static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq)
997 { 997 {
998 struct ata_port *ap = qc->ap; 998 struct ata_port *ap = qc->ap;
999 unsigned long flags; 999 unsigned long flags;
1000 1000
1001 if (ap->ops->error_handler) { 1001 if (ap->ops->error_handler) {
1002 if (in_wq) { 1002 if (in_wq) {
1003 spin_lock_irqsave(ap->lock, flags); 1003 spin_lock_irqsave(ap->lock, flags);
1004 1004
1005 /* EH might have kicked in while host lock is 1005 /* EH might have kicked in while host lock is
1006 * released. 1006 * released.
1007 */ 1007 */
1008 qc = ata_qc_from_tag(ap, qc->tag); 1008 qc = ata_qc_from_tag(ap, qc->tag);
1009 if (qc) { 1009 if (qc) {
1010 if (likely(!(qc->err_mask & AC_ERR_HSM))) { 1010 if (likely(!(qc->err_mask & AC_ERR_HSM))) {
1011 ata_sff_irq_on(ap); 1011 ata_sff_irq_on(ap);
1012 ata_qc_complete(qc); 1012 ata_qc_complete(qc);
1013 } else 1013 } else
1014 ata_port_freeze(ap); 1014 ata_port_freeze(ap);
1015 } 1015 }
1016 1016
1017 spin_unlock_irqrestore(ap->lock, flags); 1017 spin_unlock_irqrestore(ap->lock, flags);
1018 } else { 1018 } else {
1019 if (likely(!(qc->err_mask & AC_ERR_HSM))) 1019 if (likely(!(qc->err_mask & AC_ERR_HSM)))
1020 ata_qc_complete(qc); 1020 ata_qc_complete(qc);
1021 else 1021 else
1022 ata_port_freeze(ap); 1022 ata_port_freeze(ap);
1023 } 1023 }
1024 } else { 1024 } else {
1025 if (in_wq) { 1025 if (in_wq) {
1026 spin_lock_irqsave(ap->lock, flags); 1026 spin_lock_irqsave(ap->lock, flags);
1027 ata_sff_irq_on(ap); 1027 ata_sff_irq_on(ap);
1028 ata_qc_complete(qc); 1028 ata_qc_complete(qc);
1029 spin_unlock_irqrestore(ap->lock, flags); 1029 spin_unlock_irqrestore(ap->lock, flags);
1030 } else 1030 } else
1031 ata_qc_complete(qc); 1031 ata_qc_complete(qc);
1032 } 1032 }
1033 } 1033 }
1034 1034
1035 /** 1035 /**
1036 * ata_sff_hsm_move - move the HSM to the next state. 1036 * ata_sff_hsm_move - move the HSM to the next state.
1037 * @ap: the target ata_port 1037 * @ap: the target ata_port
1038 * @qc: qc in progress 1038 * @qc: qc in progress
1039 * @status: current device status 1039 * @status: current device status
1040 * @in_wq: 1 if called from workqueue, 0 otherwise 1040 * @in_wq: 1 if called from workqueue, 0 otherwise
1041 * 1041 *
1042 * RETURNS: 1042 * RETURNS:
1043 * 1 when poll next status needed, 0 otherwise. 1043 * 1 when poll next status needed, 0 otherwise.
1044 */ 1044 */
1045 int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, 1045 int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc,
1046 u8 status, int in_wq) 1046 u8 status, int in_wq)
1047 { 1047 {
1048 struct ata_link *link = qc->dev->link; 1048 struct ata_link *link = qc->dev->link;
1049 struct ata_eh_info *ehi = &link->eh_info; 1049 struct ata_eh_info *ehi = &link->eh_info;
1050 unsigned long flags = 0; 1050 unsigned long flags = 0;
1051 int poll_next; 1051 int poll_next;
1052 1052
1053 WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0); 1053 WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0);
1054 1054
1055 /* Make sure ata_sff_qc_issue() does not throw things 1055 /* Make sure ata_sff_qc_issue() does not throw things
1056 * like DMA polling into the workqueue. Notice that 1056 * like DMA polling into the workqueue. Notice that
1057 * in_wq is not equivalent to (qc->tf.flags & ATA_TFLAG_POLLING). 1057 * in_wq is not equivalent to (qc->tf.flags & ATA_TFLAG_POLLING).
1058 */ 1058 */
1059 WARN_ON_ONCE(in_wq != ata_hsm_ok_in_wq(ap, qc)); 1059 WARN_ON_ONCE(in_wq != ata_hsm_ok_in_wq(ap, qc));
1060 1060
1061 fsm_start: 1061 fsm_start:
1062 DPRINTK("ata%u: protocol %d task_state %d (dev_stat 0x%X)\n", 1062 DPRINTK("ata%u: protocol %d task_state %d (dev_stat 0x%X)\n",
1063 ap->print_id, qc->tf.protocol, ap->hsm_task_state, status); 1063 ap->print_id, qc->tf.protocol, ap->hsm_task_state, status);
1064 1064
1065 switch (ap->hsm_task_state) { 1065 switch (ap->hsm_task_state) {
1066 case HSM_ST_FIRST: 1066 case HSM_ST_FIRST:
1067 /* Send first data block or PACKET CDB */ 1067 /* Send first data block or PACKET CDB */
1068 1068
1069 /* If polling, we will stay in the work queue after 1069 /* If polling, we will stay in the work queue after
1070 * sending the data. Otherwise, interrupt handler 1070 * sending the data. Otherwise, interrupt handler
1071 * takes over after sending the data. 1071 * takes over after sending the data.
1072 */ 1072 */
1073 poll_next = (qc->tf.flags & ATA_TFLAG_POLLING); 1073 poll_next = (qc->tf.flags & ATA_TFLAG_POLLING);
1074 1074
1075 /* check device status */ 1075 /* check device status */
1076 if (unlikely((status & ATA_DRQ) == 0)) { 1076 if (unlikely((status & ATA_DRQ) == 0)) {
1077 /* handle BSY=0, DRQ=0 as error */ 1077 /* handle BSY=0, DRQ=0 as error */
1078 if (likely(status & (ATA_ERR | ATA_DF))) 1078 if (likely(status & (ATA_ERR | ATA_DF)))
1079 /* device stops HSM for abort/error */ 1079 /* device stops HSM for abort/error */
1080 qc->err_mask |= AC_ERR_DEV; 1080 qc->err_mask |= AC_ERR_DEV;
1081 else { 1081 else {
1082 /* HSM violation. Let EH handle this */ 1082 /* HSM violation. Let EH handle this */
1083 ata_ehi_push_desc(ehi, 1083 ata_ehi_push_desc(ehi,
1084 "ST_FIRST: !(DRQ|ERR|DF)"); 1084 "ST_FIRST: !(DRQ|ERR|DF)");
1085 qc->err_mask |= AC_ERR_HSM; 1085 qc->err_mask |= AC_ERR_HSM;
1086 } 1086 }
1087 1087
1088 ap->hsm_task_state = HSM_ST_ERR; 1088 ap->hsm_task_state = HSM_ST_ERR;
1089 goto fsm_start; 1089 goto fsm_start;
1090 } 1090 }
1091 1091
1092 /* Device should not ask for data transfer (DRQ=1) 1092 /* Device should not ask for data transfer (DRQ=1)
1093 * when it finds something wrong. 1093 * when it finds something wrong.
1094 * We ignore DRQ here and stop the HSM by 1094 * We ignore DRQ here and stop the HSM by
1095 * changing hsm_task_state to HSM_ST_ERR and 1095 * changing hsm_task_state to HSM_ST_ERR and
1096 * let the EH abort the command or reset the device. 1096 * let the EH abort the command or reset the device.
1097 */ 1097 */
1098 if (unlikely(status & (ATA_ERR | ATA_DF))) { 1098 if (unlikely(status & (ATA_ERR | ATA_DF))) {
1099 /* Some ATAPI tape drives forget to clear the ERR bit 1099 /* Some ATAPI tape drives forget to clear the ERR bit
1100 * when doing the next command (mostly request sense). 1100 * when doing the next command (mostly request sense).
1101 * We ignore ERR here to work around this and proceed sending 1101 * We ignore ERR here to work around this and proceed sending
1102 * the CDB. 1102 * the CDB.
1103 */ 1103 */
1104 if (!(qc->dev->horkage & ATA_HORKAGE_STUCK_ERR)) { 1104 if (!(qc->dev->horkage & ATA_HORKAGE_STUCK_ERR)) {
1105 ata_ehi_push_desc(ehi, "ST_FIRST: " 1105 ata_ehi_push_desc(ehi, "ST_FIRST: "
1106 "DRQ=1 with device error, " 1106 "DRQ=1 with device error, "
1107 "dev_stat 0x%X", status); 1107 "dev_stat 0x%X", status);
1108 qc->err_mask |= AC_ERR_HSM; 1108 qc->err_mask |= AC_ERR_HSM;
1109 ap->hsm_task_state = HSM_ST_ERR; 1109 ap->hsm_task_state = HSM_ST_ERR;
1110 goto fsm_start; 1110 goto fsm_start;
1111 } 1111 }
1112 } 1112 }
1113 1113
1114 /* Send the CDB (atapi) or the first data block (ata pio out). 1114 /* Send the CDB (atapi) or the first data block (ata pio out).
1115 * During the state transition, interrupt handler shouldn't 1115 * During the state transition, interrupt handler shouldn't
1116 * be invoked before the data transfer is complete and 1116 * be invoked before the data transfer is complete and
1117 * hsm_task_state is changed. Hence, the following locking. 1117 * hsm_task_state is changed. Hence, the following locking.
1118 */ 1118 */
1119 if (in_wq) 1119 if (in_wq)
1120 spin_lock_irqsave(ap->lock, flags); 1120 spin_lock_irqsave(ap->lock, flags);
1121 1121
1122 if (qc->tf.protocol == ATA_PROT_PIO) { 1122 if (qc->tf.protocol == ATA_PROT_PIO) {
1123 /* PIO data out protocol. 1123 /* PIO data out protocol.
1124 * send first data block. 1124 * send first data block.
1125 */ 1125 */
1126 1126
1127 /* ata_pio_sectors() might change the state 1127 /* ata_pio_sectors() might change the state
1128 * to HSM_ST_LAST. so, the state is changed here 1128 * to HSM_ST_LAST. so, the state is changed here
1129 * before ata_pio_sectors(). 1129 * before ata_pio_sectors().
1130 */ 1130 */
1131 ap->hsm_task_state = HSM_ST; 1131 ap->hsm_task_state = HSM_ST;
1132 ata_pio_sectors(qc); 1132 ata_pio_sectors(qc);
1133 } else 1133 } else
1134 /* send CDB */ 1134 /* send CDB */
1135 atapi_send_cdb(ap, qc); 1135 atapi_send_cdb(ap, qc);
1136 1136
1137 if (in_wq) 1137 if (in_wq)
1138 spin_unlock_irqrestore(ap->lock, flags); 1138 spin_unlock_irqrestore(ap->lock, flags);
1139 1139
1140 /* if polling, ata_sff_pio_task() handles the rest. 1140 /* if polling, ata_sff_pio_task() handles the rest.
1141 * otherwise, interrupt handler takes over from here. 1141 * otherwise, interrupt handler takes over from here.
1142 */ 1142 */
1143 break; 1143 break;
1144 1144
1145 case HSM_ST: 1145 case HSM_ST:
1146 /* complete command or read/write the data register */ 1146 /* complete command or read/write the data register */
1147 if (qc->tf.protocol == ATAPI_PROT_PIO) { 1147 if (qc->tf.protocol == ATAPI_PROT_PIO) {
1148 /* ATAPI PIO protocol */ 1148 /* ATAPI PIO protocol */
1149 if ((status & ATA_DRQ) == 0) { 1149 if ((status & ATA_DRQ) == 0) {
1150 /* No more data to transfer or device error. 1150 /* No more data to transfer or device error.
1151 * Device error will be tagged in HSM_ST_LAST. 1151 * Device error will be tagged in HSM_ST_LAST.
1152 */ 1152 */
1153 ap->hsm_task_state = HSM_ST_LAST; 1153 ap->hsm_task_state = HSM_ST_LAST;
1154 goto fsm_start; 1154 goto fsm_start;
1155 } 1155 }
1156 1156
1157 /* Device should not ask for data transfer (DRQ=1) 1157 /* Device should not ask for data transfer (DRQ=1)
1158 * when it finds something wrong. 1158 * when it finds something wrong.
1159 * We ignore DRQ here and stop the HSM by 1159 * We ignore DRQ here and stop the HSM by
1160 * changing hsm_task_state to HSM_ST_ERR and 1160 * changing hsm_task_state to HSM_ST_ERR and
1161 * let the EH abort the command or reset the device. 1161 * let the EH abort the command or reset the device.
1162 */ 1162 */
1163 if (unlikely(status & (ATA_ERR | ATA_DF))) { 1163 if (unlikely(status & (ATA_ERR | ATA_DF))) {
1164 ata_ehi_push_desc(ehi, "ST-ATAPI: " 1164 ata_ehi_push_desc(ehi, "ST-ATAPI: "
1165 "DRQ=1 with device error, " 1165 "DRQ=1 with device error, "
1166 "dev_stat 0x%X", status); 1166 "dev_stat 0x%X", status);
1167 qc->err_mask |= AC_ERR_HSM; 1167 qc->err_mask |= AC_ERR_HSM;
1168 ap->hsm_task_state = HSM_ST_ERR; 1168 ap->hsm_task_state = HSM_ST_ERR;
1169 goto fsm_start; 1169 goto fsm_start;
1170 } 1170 }
1171 1171
1172 atapi_pio_bytes(qc); 1172 atapi_pio_bytes(qc);
1173 1173
1174 if (unlikely(ap->hsm_task_state == HSM_ST_ERR)) 1174 if (unlikely(ap->hsm_task_state == HSM_ST_ERR))
1175 /* bad ireason reported by device */ 1175 /* bad ireason reported by device */
1176 goto fsm_start; 1176 goto fsm_start;
1177 1177
1178 } else { 1178 } else {
1179 /* ATA PIO protocol */ 1179 /* ATA PIO protocol */
1180 if (unlikely((status & ATA_DRQ) == 0)) { 1180 if (unlikely((status & ATA_DRQ) == 0)) {
1181 /* handle BSY=0, DRQ=0 as error */ 1181 /* handle BSY=0, DRQ=0 as error */
1182 if (likely(status & (ATA_ERR | ATA_DF))) { 1182 if (likely(status & (ATA_ERR | ATA_DF))) {
1183 /* device stops HSM for abort/error */ 1183 /* device stops HSM for abort/error */
1184 qc->err_mask |= AC_ERR_DEV; 1184 qc->err_mask |= AC_ERR_DEV;
1185 1185
1186 /* If diagnostic failed and this is 1186 /* If diagnostic failed and this is
1187 * IDENTIFY, it's likely a phantom 1187 * IDENTIFY, it's likely a phantom
1188 * device. Mark hint. 1188 * device. Mark hint.
1189 */ 1189 */
1190 if (qc->dev->horkage & 1190 if (qc->dev->horkage &
1191 ATA_HORKAGE_DIAGNOSTIC) 1191 ATA_HORKAGE_DIAGNOSTIC)
1192 qc->err_mask |= 1192 qc->err_mask |=
1193 AC_ERR_NODEV_HINT; 1193 AC_ERR_NODEV_HINT;
1194 } else { 1194 } else {
1195 /* HSM violation. Let EH handle this. 1195 /* HSM violation. Let EH handle this.
1196 * Phantom devices also trigger this 1196 * Phantom devices also trigger this
1197 * condition. Mark hint. 1197 * condition. Mark hint.
1198 */ 1198 */
1199 ata_ehi_push_desc(ehi, "ST-ATA: " 1199 ata_ehi_push_desc(ehi, "ST-ATA: "
1200 "DRQ=0 without device error, " 1200 "DRQ=0 without device error, "
1201 "dev_stat 0x%X", status); 1201 "dev_stat 0x%X", status);
1202 qc->err_mask |= AC_ERR_HSM | 1202 qc->err_mask |= AC_ERR_HSM |
1203 AC_ERR_NODEV_HINT; 1203 AC_ERR_NODEV_HINT;
1204 } 1204 }
1205 1205
1206 ap->hsm_task_state = HSM_ST_ERR; 1206 ap->hsm_task_state = HSM_ST_ERR;
1207 goto fsm_start; 1207 goto fsm_start;
1208 } 1208 }
1209 1209
1210 /* For PIO reads, some devices may ask for 1210 /* For PIO reads, some devices may ask for
1211 * data transfer (DRQ=1) along with ERR=1. 1211 * data transfer (DRQ=1) along with ERR=1.
1212 * We respect DRQ here and transfer one 1212 * We respect DRQ here and transfer one
1213 * block of junk data before changing the 1213 * block of junk data before changing the
1214 * hsm_task_state to HSM_ST_ERR. 1214 * hsm_task_state to HSM_ST_ERR.
1215 * 1215 *
1216 * For PIO writes, ERR=1 DRQ=1 doesn't make 1216 * For PIO writes, ERR=1 DRQ=1 doesn't make
1217 * sense since the data block has been 1217 * sense since the data block has been
1218 * transferred to the device. 1218 * transferred to the device.
1219 */ 1219 */
1220 if (unlikely(status & (ATA_ERR | ATA_DF))) { 1220 if (unlikely(status & (ATA_ERR | ATA_DF))) {
1221 /* data might be corrupted */ 1221 /* data might be corrupted */
1222 qc->err_mask |= AC_ERR_DEV; 1222 qc->err_mask |= AC_ERR_DEV;
1223 1223
1224 if (!(qc->tf.flags & ATA_TFLAG_WRITE)) { 1224 if (!(qc->tf.flags & ATA_TFLAG_WRITE)) {
1225 ata_pio_sectors(qc); 1225 ata_pio_sectors(qc);
1226 status = ata_wait_idle(ap); 1226 status = ata_wait_idle(ap);
1227 } 1227 }
1228 1228
1229 if (status & (ATA_BUSY | ATA_DRQ)) { 1229 if (status & (ATA_BUSY | ATA_DRQ)) {
1230 ata_ehi_push_desc(ehi, "ST-ATA: " 1230 ata_ehi_push_desc(ehi, "ST-ATA: "
1231 "BUSY|DRQ persists on ERR|DF, " 1231 "BUSY|DRQ persists on ERR|DF, "
1232 "dev_stat 0x%X", status); 1232 "dev_stat 0x%X", status);
1233 qc->err_mask |= AC_ERR_HSM; 1233 qc->err_mask |= AC_ERR_HSM;
1234 } 1234 }
1235 1235
1236 /* There are oddball controllers with 1236 /* There are oddball controllers with
1237 * status register stuck at 0x7f and 1237 * status register stuck at 0x7f and
1238 * lbal/m/h at zero which makes it 1238 * lbal/m/h at zero which makes it
1239 * pass all other presence detection 1239 * pass all other presence detection
1240 * mechanisms we have. Set NODEV_HINT 1240 * mechanisms we have. Set NODEV_HINT
1241 * for it. Kernel bz#7241. 1241 * for it. Kernel bz#7241.
1242 */ 1242 */
1243 if (status == 0x7f) 1243 if (status == 0x7f)
1244 qc->err_mask |= AC_ERR_NODEV_HINT; 1244 qc->err_mask |= AC_ERR_NODEV_HINT;
1245 1245
1246 /* ata_pio_sectors() might change the 1246 /* ata_pio_sectors() might change the
1247 * state to HSM_ST_LAST. so, the state 1247 * state to HSM_ST_LAST. so, the state
1248 * is changed after ata_pio_sectors(). 1248 * is changed after ata_pio_sectors().
1249 */ 1249 */
1250 ap->hsm_task_state = HSM_ST_ERR; 1250 ap->hsm_task_state = HSM_ST_ERR;
1251 goto fsm_start; 1251 goto fsm_start;
1252 } 1252 }
1253 1253
1254 ata_pio_sectors(qc); 1254 ata_pio_sectors(qc);
1255 1255
1256 if (ap->hsm_task_state == HSM_ST_LAST && 1256 if (ap->hsm_task_state == HSM_ST_LAST &&
1257 (!(qc->tf.flags & ATA_TFLAG_WRITE))) { 1257 (!(qc->tf.flags & ATA_TFLAG_WRITE))) {
1258 /* all data read */ 1258 /* all data read */
1259 status = ata_wait_idle(ap); 1259 status = ata_wait_idle(ap);
1260 goto fsm_start; 1260 goto fsm_start;
1261 } 1261 }
1262 } 1262 }
1263 1263
1264 poll_next = 1; 1264 poll_next = 1;
1265 break; 1265 break;
1266 1266
1267 case HSM_ST_LAST: 1267 case HSM_ST_LAST:
1268 if (unlikely(!ata_ok(status))) { 1268 if (unlikely(!ata_ok(status))) {
1269 qc->err_mask |= __ac_err_mask(status); 1269 qc->err_mask |= __ac_err_mask(status);
1270 ap->hsm_task_state = HSM_ST_ERR; 1270 ap->hsm_task_state = HSM_ST_ERR;
1271 goto fsm_start; 1271 goto fsm_start;
1272 } 1272 }
1273 1273
1274 /* no more data to transfer */ 1274 /* no more data to transfer */
1275 DPRINTK("ata%u: dev %u command complete, drv_stat 0x%x\n", 1275 DPRINTK("ata%u: dev %u command complete, drv_stat 0x%x\n",
1276 ap->print_id, qc->dev->devno, status); 1276 ap->print_id, qc->dev->devno, status);
1277 1277
1278 WARN_ON_ONCE(qc->err_mask & (AC_ERR_DEV | AC_ERR_HSM)); 1278 WARN_ON_ONCE(qc->err_mask & (AC_ERR_DEV | AC_ERR_HSM));
1279 1279
1280 ap->hsm_task_state = HSM_ST_IDLE; 1280 ap->hsm_task_state = HSM_ST_IDLE;
1281 1281
1282 /* complete taskfile transaction */ 1282 /* complete taskfile transaction */
1283 ata_hsm_qc_complete(qc, in_wq); 1283 ata_hsm_qc_complete(qc, in_wq);
1284 1284
1285 poll_next = 0; 1285 poll_next = 0;
1286 break; 1286 break;
1287 1287
1288 case HSM_ST_ERR: 1288 case HSM_ST_ERR:
1289 ap->hsm_task_state = HSM_ST_IDLE; 1289 ap->hsm_task_state = HSM_ST_IDLE;
1290 1290
1291 /* complete taskfile transaction */ 1291 /* complete taskfile transaction */
1292 ata_hsm_qc_complete(qc, in_wq); 1292 ata_hsm_qc_complete(qc, in_wq);
1293 1293
1294 poll_next = 0; 1294 poll_next = 0;
1295 break; 1295 break;
1296 default: 1296 default:
1297 poll_next = 0; 1297 poll_next = 0;
1298 BUG(); 1298 BUG();
1299 } 1299 }
1300 1300
1301 return poll_next; 1301 return poll_next;
1302 } 1302 }
1303 EXPORT_SYMBOL_GPL(ata_sff_hsm_move); 1303 EXPORT_SYMBOL_GPL(ata_sff_hsm_move);
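Callers feed the current device status into this state machine and keep polling for as long as it asks; stripped of the BSY waits, locking and re-queueing that ata_sff_pio_task() below takes care of, a polled caller reduces to roughly the following (hypothetical sketch, valid only for a polling command so that in_wq == 1 matches ata_hsm_ok_in_wq()):

#include <linux/libata.h>

static void my_poll_hsm(struct ata_port *ap, struct ata_queued_cmd *qc)
{
	u8 status;

	do {
		status = ap->ops->sff_check_status(ap);
	} while (ata_sff_hsm_move(ap, qc, status, 1));
}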
1304 1304
1305 void ata_sff_queue_pio_task(struct ata_link *link, unsigned long delay) 1305 void ata_sff_queue_pio_task(struct ata_link *link, unsigned long delay)
1306 { 1306 {
1307 struct ata_port *ap = link->ap; 1307 struct ata_port *ap = link->ap;
1308 1308
1309 WARN_ON((ap->sff_pio_task_link != NULL) && 1309 WARN_ON((ap->sff_pio_task_link != NULL) &&
1310 (ap->sff_pio_task_link != link)); 1310 (ap->sff_pio_task_link != link));
1311 ap->sff_pio_task_link = link; 1311 ap->sff_pio_task_link = link;
1312 1312
1313 /* may fail if ata_sff_flush_pio_task() in progress */ 1313 /* may fail if ata_sff_flush_pio_task() in progress */
1314 queue_delayed_work(ata_sff_wq, &ap->sff_pio_task, 1314 queue_delayed_work(ata_sff_wq, &ap->sff_pio_task,
1315 msecs_to_jiffies(delay)); 1315 msecs_to_jiffies(delay));
1316 } 1316 }
1317 EXPORT_SYMBOL_GPL(ata_sff_queue_pio_task); 1317 EXPORT_SYMBOL_GPL(ata_sff_queue_pio_task);
1318 1318
1319 void ata_sff_flush_pio_task(struct ata_port *ap) 1319 void ata_sff_flush_pio_task(struct ata_port *ap)
1320 { 1320 {
1321 DPRINTK("ENTER\n"); 1321 DPRINTK("ENTER\n");
1322 1322
1323 cancel_rearming_delayed_work(&ap->sff_pio_task); 1323 cancel_rearming_delayed_work(&ap->sff_pio_task);
1324 ap->hsm_task_state = HSM_ST_IDLE; 1324 ap->hsm_task_state = HSM_ST_IDLE;
1325 1325
1326 if (ata_msg_ctl(ap)) 1326 if (ata_msg_ctl(ap))
1327 ata_port_printk(ap, KERN_DEBUG, "%s: EXIT\n", __func__); 1327 ata_port_printk(ap, KERN_DEBUG, "%s: EXIT\n", __func__);
1328 } 1328 }
1329 1329
1330 static void ata_sff_pio_task(struct work_struct *work) 1330 static void ata_sff_pio_task(struct work_struct *work)
1331 { 1331 {
1332 struct ata_port *ap = 1332 struct ata_port *ap =
1333 container_of(work, struct ata_port, sff_pio_task.work); 1333 container_of(work, struct ata_port, sff_pio_task.work);
1334 struct ata_link *link = ap->sff_pio_task_link; 1334 struct ata_link *link = ap->sff_pio_task_link;
1335 struct ata_queued_cmd *qc; 1335 struct ata_queued_cmd *qc;
1336 u8 status; 1336 u8 status;
1337 int poll_next; 1337 int poll_next;
1338 1338
1339 BUG_ON(ap->sff_pio_task_link == NULL); 1339 BUG_ON(ap->sff_pio_task_link == NULL);
1340 /* qc can be NULL if timeout occurred */ 1340 /* qc can be NULL if timeout occurred */
1341 qc = ata_qc_from_tag(ap, link->active_tag); 1341 qc = ata_qc_from_tag(ap, link->active_tag);
1342 if (!qc) { 1342 if (!qc) {
1343 ap->sff_pio_task_link = NULL; 1343 ap->sff_pio_task_link = NULL;
1344 return; 1344 return;
1345 } 1345 }
1346 1346
1347 fsm_start: 1347 fsm_start:
1348 WARN_ON_ONCE(ap->hsm_task_state == HSM_ST_IDLE); 1348 WARN_ON_ONCE(ap->hsm_task_state == HSM_ST_IDLE);
1349 1349
1350 /* 1350 /*
1351 * This is purely heuristic. This is a fast path. 1351 * This is purely heuristic. This is a fast path.
1352 * Sometimes when we enter, BSY will be cleared in 1352 * Sometimes when we enter, BSY will be cleared in
1353 * a chk-status or two. If not, the drive is probably seeking 1353 * a chk-status or two. If not, the drive is probably seeking
1354 * or something. Snooze for a couple msecs, then 1354 * or something. Snooze for a couple msecs, then
1355 * chk-status again. If still busy, queue delayed work. 1355 * chk-status again. If still busy, queue delayed work.
1356 */ 1356 */
1357 status = ata_sff_busy_wait(ap, ATA_BUSY, 5); 1357 status = ata_sff_busy_wait(ap, ATA_BUSY, 5);
1358 if (status & ATA_BUSY) { 1358 if (status & ATA_BUSY) {
1359 ata_msleep(ap, 2); 1359 ata_msleep(ap, 2);
1360 status = ata_sff_busy_wait(ap, ATA_BUSY, 10); 1360 status = ata_sff_busy_wait(ap, ATA_BUSY, 10);
1361 if (status & ATA_BUSY) { 1361 if (status & ATA_BUSY) {
1362 ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); 1362 ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE);
1363 return; 1363 return;
1364 } 1364 }
1365 } 1365 }
1366 1366
1367 /* 1367 /*
1368 * hsm_move() may trigger another command to be processed. 1368 * hsm_move() may trigger another command to be processed.
1369 * clean the link beforehand. 1369 * clean the link beforehand.
1370 */ 1370 */
1371 ap->sff_pio_task_link = NULL; 1371 ap->sff_pio_task_link = NULL;
1372 /* move the HSM */ 1372 /* move the HSM */
1373 poll_next = ata_sff_hsm_move(ap, qc, status, 1); 1373 poll_next = ata_sff_hsm_move(ap, qc, status, 1);
1374 1374
1375 /* another command or interrupt handler 1375 /* another command or interrupt handler
1376 * may be running at this point. 1376 * may be running at this point.
1377 */ 1377 */
1378 if (poll_next) 1378 if (poll_next)
1379 goto fsm_start; 1379 goto fsm_start;
1380 } 1380 }
1381 1381
1382 /** 1382 /**
1383 * ata_sff_qc_issue - issue taskfile to a SFF controller 1383 * ata_sff_qc_issue - issue taskfile to a SFF controller
1384 * @qc: command to issue to device 1384 * @qc: command to issue to device
1385 * 1385 *
1386 * This function issues a PIO or NODATA command to a SFF 1386 * This function issues a PIO or NODATA command to a SFF
1387 * controller. 1387 * controller.
1388 * 1388 *
1389 * LOCKING: 1389 * LOCKING:
1390 * spin_lock_irqsave(host lock) 1390 * spin_lock_irqsave(host lock)
1391 * 1391 *
1392 * RETURNS: 1392 * RETURNS:
1393 * Zero on success, AC_ERR_* mask on failure 1393 * Zero on success, AC_ERR_* mask on failure
1394 */ 1394 */
1395 unsigned int ata_sff_qc_issue(struct ata_queued_cmd *qc) 1395 unsigned int ata_sff_qc_issue(struct ata_queued_cmd *qc)
1396 { 1396 {
1397 struct ata_port *ap = qc->ap; 1397 struct ata_port *ap = qc->ap;
1398 struct ata_link *link = qc->dev->link; 1398 struct ata_link *link = qc->dev->link;
1399 1399
1400 /* Use polling pio if the LLD doesn't handle 1400 /* Use polling pio if the LLD doesn't handle
1401 * interrupt driven pio and atapi CDB interrupt. 1401 * interrupt driven pio and atapi CDB interrupt.
1402 */ 1402 */
1403 if (ap->flags & ATA_FLAG_PIO_POLLING) 1403 if (ap->flags & ATA_FLAG_PIO_POLLING)
1404 qc->tf.flags |= ATA_TFLAG_POLLING; 1404 qc->tf.flags |= ATA_TFLAG_POLLING;
1405 1405
1406 /* select the device */ 1406 /* select the device */
1407 ata_dev_select(ap, qc->dev->devno, 1, 0); 1407 ata_dev_select(ap, qc->dev->devno, 1, 0);
1408 1408
1409 /* start the command */ 1409 /* start the command */
1410 switch (qc->tf.protocol) { 1410 switch (qc->tf.protocol) {
1411 case ATA_PROT_NODATA: 1411 case ATA_PROT_NODATA:
1412 if (qc->tf.flags & ATA_TFLAG_POLLING) 1412 if (qc->tf.flags & ATA_TFLAG_POLLING)
1413 ata_qc_set_polling(qc); 1413 ata_qc_set_polling(qc);
1414 1414
1415 ata_tf_to_host(ap, &qc->tf); 1415 ata_tf_to_host(ap, &qc->tf);
1416 ap->hsm_task_state = HSM_ST_LAST; 1416 ap->hsm_task_state = HSM_ST_LAST;
1417 1417
1418 if (qc->tf.flags & ATA_TFLAG_POLLING) 1418 if (qc->tf.flags & ATA_TFLAG_POLLING)
1419 ata_sff_queue_pio_task(link, 0); 1419 ata_sff_queue_pio_task(link, 0);
1420 1420
1421 break; 1421 break;
1422 1422
1423 case ATA_PROT_PIO: 1423 case ATA_PROT_PIO:
1424 if (qc->tf.flags & ATA_TFLAG_POLLING) 1424 if (qc->tf.flags & ATA_TFLAG_POLLING)
1425 ata_qc_set_polling(qc); 1425 ata_qc_set_polling(qc);
1426 1426
1427 ata_tf_to_host(ap, &qc->tf); 1427 ata_tf_to_host(ap, &qc->tf);
1428 1428
1429 if (qc->tf.flags & ATA_TFLAG_WRITE) { 1429 if (qc->tf.flags & ATA_TFLAG_WRITE) {
1430 /* PIO data out protocol */ 1430 /* PIO data out protocol */
1431 ap->hsm_task_state = HSM_ST_FIRST; 1431 ap->hsm_task_state = HSM_ST_FIRST;
1432 ata_sff_queue_pio_task(link, 0); 1432 ata_sff_queue_pio_task(link, 0);
1433 1433
1434 /* always send first data block using the 1434 /* always send first data block using the
1435 * ata_sff_pio_task() codepath. 1435 * ata_sff_pio_task() codepath.
1436 */ 1436 */
1437 } else { 1437 } else {
1438 /* PIO data in protocol */ 1438 /* PIO data in protocol */
1439 ap->hsm_task_state = HSM_ST; 1439 ap->hsm_task_state = HSM_ST;
1440 1440
1441 if (qc->tf.flags & ATA_TFLAG_POLLING) 1441 if (qc->tf.flags & ATA_TFLAG_POLLING)
1442 ata_sff_queue_pio_task(link, 0); 1442 ata_sff_queue_pio_task(link, 0);
1443 1443
1444 /* if polling, ata_sff_pio_task() handles the 1444 /* if polling, ata_sff_pio_task() handles the
1445 * rest. otherwise, interrupt handler takes 1445 * rest. otherwise, interrupt handler takes
1446 * over from here. 1446 * over from here.
1447 */ 1447 */
1448 } 1448 }
1449 1449
1450 break; 1450 break;
1451 1451
1452 case ATAPI_PROT_PIO: 1452 case ATAPI_PROT_PIO:
1453 case ATAPI_PROT_NODATA: 1453 case ATAPI_PROT_NODATA:
1454 if (qc->tf.flags & ATA_TFLAG_POLLING) 1454 if (qc->tf.flags & ATA_TFLAG_POLLING)
1455 ata_qc_set_polling(qc); 1455 ata_qc_set_polling(qc);
1456 1456
1457 ata_tf_to_host(ap, &qc->tf); 1457 ata_tf_to_host(ap, &qc->tf);
1458 1458
1459 ap->hsm_task_state = HSM_ST_FIRST; 1459 ap->hsm_task_state = HSM_ST_FIRST;
1460 1460
1461 /* send cdb by polling if no cdb interrupt */ 1461 /* send cdb by polling if no cdb interrupt */
1462 if ((!(qc->dev->flags & ATA_DFLAG_CDB_INTR)) || 1462 if ((!(qc->dev->flags & ATA_DFLAG_CDB_INTR)) ||
1463 (qc->tf.flags & ATA_TFLAG_POLLING)) 1463 (qc->tf.flags & ATA_TFLAG_POLLING))
1464 ata_sff_queue_pio_task(link, 0); 1464 ata_sff_queue_pio_task(link, 0);
1465 break; 1465 break;
1466 1466
1467 default: 1467 default:
1468 WARN_ON_ONCE(1); 1468 WARN_ON_ONCE(1);
1469 return AC_ERR_SYSTEM; 1469 return AC_ERR_SYSTEM;
1470 } 1470 }
1471 1471
1472 return 0; 1472 return 0;
1473 } 1473 }
1474 EXPORT_SYMBOL_GPL(ata_sff_qc_issue); 1474 EXPORT_SYMBOL_GPL(ata_sff_qc_issue);
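For context, ata_sff_qc_issue() is normally reached through a driver's ata_port_operations. A minimal sketch of how a hypothetical polling-only driver might wire it up follows; the foo_* names and the mode mask are assumptions and are not part of this diff:

	/* Illustrative only -- not part of this commit */
	static struct ata_port_operations foo_port_ops = {
		.inherits	= &ata_sff_port_ops,
		.qc_issue	= ata_sff_qc_issue,	/* stock SFF issue path */
	};

	static const struct ata_port_info foo_port_info = {
		/* ATA_FLAG_PIO_POLLING makes ata_sff_qc_issue() force polled PIO */
		.flags		= ATA_FLAG_SLAVE_POSS | ATA_FLAG_PIO_POLLING,
		.pio_mask	= ATA_PIO4,
		.port_ops	= &foo_port_ops,
	};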
1475 1475
1476 /** 1476 /**
1477 * ata_sff_qc_fill_rtf - fill result TF using ->sff_tf_read 1477 * ata_sff_qc_fill_rtf - fill result TF using ->sff_tf_read
1478 * @qc: qc to fill result TF for 1478 * @qc: qc to fill result TF for
1479 * 1479 *
1480 * @qc is finished and result TF needs to be filled. Fill it 1480 * @qc is finished and result TF needs to be filled. Fill it
1481 * using ->sff_tf_read. 1481 * using ->sff_tf_read.
1482 * 1482 *
1483 * LOCKING: 1483 * LOCKING:
1484 * spin_lock_irqsave(host lock) 1484 * spin_lock_irqsave(host lock)
1485 * 1485 *
1486 * RETURNS: 1486 * RETURNS:
1487 * true indicating that result TF is successfully filled. 1487 * true indicating that result TF is successfully filled.
1488 */ 1488 */
1489 bool ata_sff_qc_fill_rtf(struct ata_queued_cmd *qc) 1489 bool ata_sff_qc_fill_rtf(struct ata_queued_cmd *qc)
1490 { 1490 {
1491 qc->ap->ops->sff_tf_read(qc->ap, &qc->result_tf); 1491 qc->ap->ops->sff_tf_read(qc->ap, &qc->result_tf);
1492 return true; 1492 return true;
1493 } 1493 }
1494 EXPORT_SYMBOL_GPL(ata_sff_qc_fill_rtf); 1494 EXPORT_SYMBOL_GPL(ata_sff_qc_fill_rtf);
1495 1495
1496 static unsigned int ata_sff_idle_irq(struct ata_port *ap) 1496 static unsigned int ata_sff_idle_irq(struct ata_port *ap)
1497 { 1497 {
1498 ap->stats.idle_irq++; 1498 ap->stats.idle_irq++;
1499 1499
1500 #ifdef ATA_IRQ_TRAP 1500 #ifdef ATA_IRQ_TRAP
1501 if ((ap->stats.idle_irq % 1000) == 0) { 1501 if ((ap->stats.idle_irq % 1000) == 0) {
1502 ap->ops->sff_check_status(ap); 1502 ap->ops->sff_check_status(ap);
1503 if (ap->ops->sff_irq_clear) 1503 if (ap->ops->sff_irq_clear)
1504 ap->ops->sff_irq_clear(ap); 1504 ap->ops->sff_irq_clear(ap);
1505 ata_port_printk(ap, KERN_WARNING, "irq trap\n"); 1505 ata_port_printk(ap, KERN_WARNING, "irq trap\n");
1506 return 1; 1506 return 1;
1507 } 1507 }
1508 #endif 1508 #endif
1509 return 0; /* irq not handled */ 1509 return 0; /* irq not handled */
1510 } 1510 }
1511 1511
1512 static unsigned int __ata_sff_port_intr(struct ata_port *ap, 1512 static unsigned int __ata_sff_port_intr(struct ata_port *ap,
1513 struct ata_queued_cmd *qc, 1513 struct ata_queued_cmd *qc,
1514 bool hsmv_on_idle) 1514 bool hsmv_on_idle)
1515 { 1515 {
1516 u8 status; 1516 u8 status;
1517 1517
1518 VPRINTK("ata%u: protocol %d task_state %d\n", 1518 VPRINTK("ata%u: protocol %d task_state %d\n",
1519 ap->print_id, qc->tf.protocol, ap->hsm_task_state); 1519 ap->print_id, qc->tf.protocol, ap->hsm_task_state);
1520 1520
1521 /* Check whether we are expecting interrupt in this state */ 1521 /* Check whether we are expecting interrupt in this state */
1522 switch (ap->hsm_task_state) { 1522 switch (ap->hsm_task_state) {
1523 case HSM_ST_FIRST: 1523 case HSM_ST_FIRST:
1524 /* Some pre-ATAPI-4 devices assert INTRQ 1524 /* Some pre-ATAPI-4 devices assert INTRQ
1525 * in this state when ready to receive the CDB. 1525 * in this state when ready to receive the CDB.
1526 */ 1526 */
1527 1527
1528 /* Checking the ATA_DFLAG_CDB_INTR flag is enough here. 1528 /* Checking the ATA_DFLAG_CDB_INTR flag is enough here.
1529 * The flag was turned on only for atapi devices. No 1529 * The flag was turned on only for atapi devices. No
1530 * need to check ata_is_atapi(qc->tf.protocol) again. 1530 * need to check ata_is_atapi(qc->tf.protocol) again.
1531 */ 1531 */
1532 if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR)) 1532 if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR))
1533 return ata_sff_idle_irq(ap); 1533 return ata_sff_idle_irq(ap);
1534 break; 1534 break;
1535 case HSM_ST: 1535 case HSM_ST:
1536 case HSM_ST_LAST: 1536 case HSM_ST_LAST:
1537 break; 1537 break;
1538 default: 1538 default:
1539 return ata_sff_idle_irq(ap); 1539 return ata_sff_idle_irq(ap);
1540 } 1540 }
1541 1541
1542 /* check main status, clearing INTRQ if needed */ 1542 /* check main status, clearing INTRQ if needed */
1543 status = ata_sff_irq_status(ap); 1543 status = ata_sff_irq_status(ap);
1544 if (status & ATA_BUSY) { 1544 if (status & ATA_BUSY) {
1545 if (hsmv_on_idle) { 1545 if (hsmv_on_idle) {
1546 /* BMDMA engine is already stopped, we're screwed */ 1546 /* BMDMA engine is already stopped, we're screwed */
1547 qc->err_mask |= AC_ERR_HSM; 1547 qc->err_mask |= AC_ERR_HSM;
1548 ap->hsm_task_state = HSM_ST_ERR; 1548 ap->hsm_task_state = HSM_ST_ERR;
1549 } else 1549 } else
1550 return ata_sff_idle_irq(ap); 1550 return ata_sff_idle_irq(ap);
1551 } 1551 }
1552 1552
1553 /* clear irq events */ 1553 /* clear irq events */
1554 if (ap->ops->sff_irq_clear) 1554 if (ap->ops->sff_irq_clear)
1555 ap->ops->sff_irq_clear(ap); 1555 ap->ops->sff_irq_clear(ap);
1556 1556
1557 ata_sff_hsm_move(ap, qc, status, 0); 1557 ata_sff_hsm_move(ap, qc, status, 0);
1558 1558
1559 return 1; /* irq handled */ 1559 return 1; /* irq handled */
1560 } 1560 }
1561 1561
1562 /** 1562 /**
1563 * ata_sff_port_intr - Handle SFF port interrupt 1563 * ata_sff_port_intr - Handle SFF port interrupt
1564 * @ap: Port on which interrupt arrived (possibly...) 1564 * @ap: Port on which interrupt arrived (possibly...)
1565 * @qc: Taskfile currently active in engine 1565 * @qc: Taskfile currently active in engine
1566 * 1566 *
1567 * Handle port interrupt for given queued command. 1567 * Handle port interrupt for given queued command.
1568 * 1568 *
1569 * LOCKING: 1569 * LOCKING:
1570 * spin_lock_irqsave(host lock) 1570 * spin_lock_irqsave(host lock)
1571 * 1571 *
1572 * RETURNS: 1572 * RETURNS:
1573 * One if interrupt was handled, zero if not (shared irq). 1573 * One if interrupt was handled, zero if not (shared irq).
1574 */ 1574 */
1575 unsigned int ata_sff_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc) 1575 unsigned int ata_sff_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
1576 { 1576 {
1577 return __ata_sff_port_intr(ap, qc, false); 1577 return __ata_sff_port_intr(ap, qc, false);
1578 } 1578 }
1579 EXPORT_SYMBOL_GPL(ata_sff_port_intr); 1579 EXPORT_SYMBOL_GPL(ata_sff_port_intr);
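A controller with its own top-level interrupt source can reuse ata_sff_port_intr() per port while keeping the locking pattern of __ata_sff_interrupt() above. A minimal single-port sketch; foo_interrupt() is hypothetical and not part of this diff:

	static irqreturn_t foo_interrupt(int irq, void *dev_instance)
	{
		struct ata_host *host = dev_instance;
		struct ata_port *ap = host->ports[0];
		struct ata_queued_cmd *qc;
		unsigned int handled = 0;
		unsigned long flags;

		spin_lock_irqsave(&host->lock, flags);
		qc = ata_qc_from_tag(ap, ap->link.active_tag);
		if (qc && !(qc->tf.flags & ATA_TFLAG_POLLING))
			handled = ata_sff_port_intr(ap, qc);
		spin_unlock_irqrestore(&host->lock, flags);

		return IRQ_RETVAL(handled);
	}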
1580 1580
1581 static inline irqreturn_t __ata_sff_interrupt(int irq, void *dev_instance, 1581 static inline irqreturn_t __ata_sff_interrupt(int irq, void *dev_instance,
1582 unsigned int (*port_intr)(struct ata_port *, struct ata_queued_cmd *)) 1582 unsigned int (*port_intr)(struct ata_port *, struct ata_queued_cmd *))
1583 { 1583 {
1584 struct ata_host *host = dev_instance; 1584 struct ata_host *host = dev_instance;
1585 bool retried = false; 1585 bool retried = false;
1586 unsigned int i; 1586 unsigned int i;
1587 unsigned int handled, idle, polling; 1587 unsigned int handled, idle, polling;
1588 unsigned long flags; 1588 unsigned long flags;
1589 1589
1590 /* TODO: make _irqsave conditional on x86 PCI IDE legacy mode */ 1590 /* TODO: make _irqsave conditional on x86 PCI IDE legacy mode */
1591 spin_lock_irqsave(&host->lock, flags); 1591 spin_lock_irqsave(&host->lock, flags);
1592 1592
1593 retry: 1593 retry:
1594 handled = idle = polling = 0; 1594 handled = idle = polling = 0;
1595 for (i = 0; i < host->n_ports; i++) { 1595 for (i = 0; i < host->n_ports; i++) {
1596 struct ata_port *ap = host->ports[i]; 1596 struct ata_port *ap = host->ports[i];
1597 struct ata_queued_cmd *qc; 1597 struct ata_queued_cmd *qc;
1598 1598
1599 qc = ata_qc_from_tag(ap, ap->link.active_tag); 1599 qc = ata_qc_from_tag(ap, ap->link.active_tag);
1600 if (qc) { 1600 if (qc) {
1601 if (!(qc->tf.flags & ATA_TFLAG_POLLING)) 1601 if (!(qc->tf.flags & ATA_TFLAG_POLLING))
1602 handled |= port_intr(ap, qc); 1602 handled |= port_intr(ap, qc);
1603 else 1603 else
1604 polling |= 1 << i; 1604 polling |= 1 << i;
1605 } else 1605 } else
1606 idle |= 1 << i; 1606 idle |= 1 << i;
1607 } 1607 }
1608 1608
1609 /* 1609 /*
1610 * If no port was expecting an IRQ but the controller is actually 1610 * If no port was expecting an IRQ but the controller is actually
1611 * asserting the IRQ line, a "nobody cared" failure will ensue. Check the 1611 * asserting the IRQ line, a "nobody cared" failure will ensue. Check the
1612 * IRQ pending status if available and clear any spurious IRQ. 1612 * IRQ pending status if available and clear any spurious IRQ.
1613 */ 1613 */
1614 if (!handled && !retried) { 1614 if (!handled && !retried) {
1615 bool retry = false; 1615 bool retry = false;
1616 1616
1617 for (i = 0; i < host->n_ports; i++) { 1617 for (i = 0; i < host->n_ports; i++) {
1618 struct ata_port *ap = host->ports[i]; 1618 struct ata_port *ap = host->ports[i];
1619 1619
1620 if (polling & (1 << i)) 1620 if (polling & (1 << i))
1621 continue; 1621 continue;
1622 1622
1623 if (!ap->ops->sff_irq_check || 1623 if (!ap->ops->sff_irq_check ||
1624 !ap->ops->sff_irq_check(ap)) 1624 !ap->ops->sff_irq_check(ap))
1625 continue; 1625 continue;
1626 1626
1627 if (idle & (1 << i)) { 1627 if (idle & (1 << i)) {
1628 ap->ops->sff_check_status(ap); 1628 ap->ops->sff_check_status(ap);
1629 if (ap->ops->sff_irq_clear) 1629 if (ap->ops->sff_irq_clear)
1630 ap->ops->sff_irq_clear(ap); 1630 ap->ops->sff_irq_clear(ap);
1631 } else { 1631 } else {
1632 /* clear INTRQ and check if BUSY cleared */ 1632 /* clear INTRQ and check if BUSY cleared */
1633 if (!(ap->ops->sff_check_status(ap) & ATA_BUSY)) 1633 if (!(ap->ops->sff_check_status(ap) & ATA_BUSY))
1634 retry |= true; 1634 retry |= true;
1635 /* 1635 /*
1636 * With command in flight, we can't do 1636 * With command in flight, we can't do
1637 * sff_irq_clear() w/o racing with completion. 1637 * sff_irq_clear() w/o racing with completion.
1638 */ 1638 */
1639 } 1639 }
1640 } 1640 }
1641 1641
1642 if (retry) { 1642 if (retry) {
1643 retried = true; 1643 retried = true;
1644 goto retry; 1644 goto retry;
1645 } 1645 }
1646 } 1646 }
1647 1647
1648 spin_unlock_irqrestore(&host->lock, flags); 1648 spin_unlock_irqrestore(&host->lock, flags);
1649 1649
1650 return IRQ_RETVAL(handled); 1650 return IRQ_RETVAL(handled);
1651 } 1651 }
1652 1652
1653 /** 1653 /**
1654 * ata_sff_interrupt - Default SFF ATA host interrupt handler 1654 * ata_sff_interrupt - Default SFF ATA host interrupt handler
1655 * @irq: irq line (unused) 1655 * @irq: irq line (unused)
1656 * @dev_instance: pointer to our ata_host information structure 1656 * @dev_instance: pointer to our ata_host information structure
1657 * 1657 *
1658 * Default interrupt handler for PCI IDE devices. Calls 1658 * Default interrupt handler for PCI IDE devices. Calls
1659 * ata_sff_port_intr() for each port that is not disabled. 1659 * ata_sff_port_intr() for each port that is not disabled.
1660 * 1660 *
1661 * LOCKING: 1661 * LOCKING:
1662 * Obtains host lock during operation. 1662 * Obtains host lock during operation.
1663 * 1663 *
1664 * RETURNS: 1664 * RETURNS:
1665 * IRQ_NONE or IRQ_HANDLED. 1665 * IRQ_NONE or IRQ_HANDLED.
1666 */ 1666 */
1667 irqreturn_t ata_sff_interrupt(int irq, void *dev_instance) 1667 irqreturn_t ata_sff_interrupt(int irq, void *dev_instance)
1668 { 1668 {
1669 return __ata_sff_interrupt(irq, dev_instance, ata_sff_port_intr); 1669 return __ata_sff_interrupt(irq, dev_instance, ata_sff_port_intr);
1670 } 1670 }
1671 EXPORT_SYMBOL_GPL(ata_sff_interrupt); 1671 EXPORT_SYMBOL_GPL(ata_sff_interrupt);
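When no extra per-controller handling is needed, the stock handler can be installed directly. A sketch of a native-mode probe tail, assuming a driver-defined foo_sht scsi_host_template and DRV_NAME (neither is part of this diff):

	static int foo_activate(struct ata_host *host, struct pci_dev *pdev)
	{
		int rc;

		rc = ata_host_start(host);
		if (rc)
			return rc;

		/* hand the stock SFF handler to the IRQ core */
		rc = devm_request_irq(host->dev, pdev->irq, ata_sff_interrupt,
				      IRQF_SHARED, DRV_NAME, host);
		if (rc)
			return rc;

		return ata_host_register(host, &foo_sht);
	}

For PCI SFF hosts, ata_pci_sff_activate_host() further down performs this sequence for you, including the legacy-mode case where two IRQs are used.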
1672 1672
1673 /** 1673 /**
1674 * ata_sff_lost_interrupt - Check for an apparent lost interrupt 1674 * ata_sff_lost_interrupt - Check for an apparent lost interrupt
1675 * @ap: port that appears to have timed out 1675 * @ap: port that appears to have timed out
1676 * 1676 *
1677 * Called from the libata error handlers when the core code suspects 1677 * Called from the libata error handlers when the core code suspects
1678 * an interrupt has been lost. If it has, complete anything we can and 1678 * an interrupt has been lost. If it has, complete anything we can and
1679 * then return. The interface must support altstatus for this faster 1679 * then return. The interface must support altstatus for this faster
1680 * recovery to occur. 1680 * recovery to occur.
1681 * 1681 *
1682 * Locking: 1682 * Locking:
1683 * Caller holds host lock 1683 * Caller holds host lock
1684 */ 1684 */
1685 1685
1686 void ata_sff_lost_interrupt(struct ata_port *ap) 1686 void ata_sff_lost_interrupt(struct ata_port *ap)
1687 { 1687 {
1688 u8 status; 1688 u8 status;
1689 struct ata_queued_cmd *qc; 1689 struct ata_queued_cmd *qc;
1690 1690
1691 /* Only one outstanding command per SFF channel */ 1691 /* Only one outstanding command per SFF channel */
1692 qc = ata_qc_from_tag(ap, ap->link.active_tag); 1692 qc = ata_qc_from_tag(ap, ap->link.active_tag);
1693 /* We cannot lose an interrupt on a non-existent or polled command */ 1693 /* We cannot lose an interrupt on a non-existent or polled command */
1694 if (!qc || qc->tf.flags & ATA_TFLAG_POLLING) 1694 if (!qc || qc->tf.flags & ATA_TFLAG_POLLING)
1695 return; 1695 return;
1696 /* See if the controller thinks it is still busy - if so, the IRQ 1696 /* See if the controller thinks it is still busy - if so, the IRQ
1697 was not lost and the command is still in progress */ 1697 was not lost and the command is still in progress */
1698 status = ata_sff_altstatus(ap); 1698 status = ata_sff_altstatus(ap);
1699 if (status & ATA_BUSY) 1699 if (status & ATA_BUSY)
1700 return; 1700 return;
1701 1701
1702 /* There was a command running, we are no longer busy and we have 1702 /* There was a command running, we are no longer busy and we have
1703 no interrupt. */ 1703 no interrupt. */
1704 ata_port_printk(ap, KERN_WARNING, "lost interrupt (Status 0x%x)\n", 1704 ata_port_printk(ap, KERN_WARNING, "lost interrupt (Status 0x%x)\n",
1705 status); 1705 status);
1706 /* Run the host interrupt logic as if the interrupt had not been 1706 /* Run the host interrupt logic as if the interrupt had not been
1707 lost */ 1707 lost */
1708 ata_sff_port_intr(ap, qc); 1708 ata_sff_port_intr(ap, qc);
1709 } 1709 }
1710 EXPORT_SYMBOL_GPL(ata_sff_lost_interrupt); 1710 EXPORT_SYMBOL_GPL(ata_sff_lost_interrupt);
1711 1711
1712 /** 1712 /**
1713 * ata_sff_freeze - Freeze SFF controller port 1713 * ata_sff_freeze - Freeze SFF controller port
1714 * @ap: port to freeze 1714 * @ap: port to freeze
1715 * 1715 *
1716 * Freeze SFF controller port. 1716 * Freeze SFF controller port.
1717 * 1717 *
1718 * LOCKING: 1718 * LOCKING:
1719 * Inherited from caller. 1719 * Inherited from caller.
1720 */ 1720 */
1721 void ata_sff_freeze(struct ata_port *ap) 1721 void ata_sff_freeze(struct ata_port *ap)
1722 { 1722 {
1723 ap->ctl |= ATA_NIEN; 1723 ap->ctl |= ATA_NIEN;
1724 ap->last_ctl = ap->ctl; 1724 ap->last_ctl = ap->ctl;
1725 1725
1726 if (ap->ops->sff_set_devctl || ap->ioaddr.ctl_addr) 1726 if (ap->ops->sff_set_devctl || ap->ioaddr.ctl_addr)
1727 ata_sff_set_devctl(ap, ap->ctl); 1727 ata_sff_set_devctl(ap, ap->ctl);
1728 1728
1729 /* Under certain circumstances, some controllers raise IRQ on 1729 /* Under certain circumstances, some controllers raise IRQ on
1730 * ATA_NIEN manipulation. Also, many controllers fail to mask 1730 * ATA_NIEN manipulation. Also, many controllers fail to mask
1731 * previously pending IRQ on ATA_NIEN assertion. Clear it. 1731 * previously pending IRQ on ATA_NIEN assertion. Clear it.
1732 */ 1732 */
1733 ap->ops->sff_check_status(ap); 1733 ap->ops->sff_check_status(ap);
1734 1734
1735 if (ap->ops->sff_irq_clear) 1735 if (ap->ops->sff_irq_clear)
1736 ap->ops->sff_irq_clear(ap); 1736 ap->ops->sff_irq_clear(ap);
1737 } 1737 }
1738 EXPORT_SYMBOL_GPL(ata_sff_freeze); 1738 EXPORT_SYMBOL_GPL(ata_sff_freeze);
1739 1739
1740 /** 1740 /**
1741 * ata_sff_thaw - Thaw SFF controller port 1741 * ata_sff_thaw - Thaw SFF controller port
1742 * @ap: port to thaw 1742 * @ap: port to thaw
1743 * 1743 *
1744 * Thaw SFF controller port. 1744 * Thaw SFF controller port.
1745 * 1745 *
1746 * LOCKING: 1746 * LOCKING:
1747 * Inherited from caller. 1747 * Inherited from caller.
1748 */ 1748 */
1749 void ata_sff_thaw(struct ata_port *ap) 1749 void ata_sff_thaw(struct ata_port *ap)
1750 { 1750 {
1751 /* clear & re-enable interrupts */ 1751 /* clear & re-enable interrupts */
1752 ap->ops->sff_check_status(ap); 1752 ap->ops->sff_check_status(ap);
1753 if (ap->ops->sff_irq_clear) 1753 if (ap->ops->sff_irq_clear)
1754 ap->ops->sff_irq_clear(ap); 1754 ap->ops->sff_irq_clear(ap);
1755 ata_sff_irq_on(ap); 1755 ata_sff_irq_on(ap);
1756 } 1756 }
1757 EXPORT_SYMBOL_GPL(ata_sff_thaw); 1757 EXPORT_SYMBOL_GPL(ata_sff_thaw);
1758 1758
1759 /** 1759 /**
1760 * ata_sff_prereset - prepare SFF link for reset 1760 * ata_sff_prereset - prepare SFF link for reset
1761 * @link: SFF link to be reset 1761 * @link: SFF link to be reset
1762 * @deadline: deadline jiffies for the operation 1762 * @deadline: deadline jiffies for the operation
1763 * 1763 *
1764 * SFF link @link is about to be reset. Initialize it. It first 1764 * SFF link @link is about to be reset. Initialize it. It first
1765 * calls ata_std_prereset() and waits for !BSY if the port is 1765 * calls ata_std_prereset() and waits for !BSY if the port is
1766 * being softreset. 1766 * being softreset.
1767 * 1767 *
1768 * LOCKING: 1768 * LOCKING:
1769 * Kernel thread context (may sleep) 1769 * Kernel thread context (may sleep)
1770 * 1770 *
1771 * RETURNS: 1771 * RETURNS:
1772 * 0 on success, -errno otherwise. 1772 * 0 on success, -errno otherwise.
1773 */ 1773 */
1774 int ata_sff_prereset(struct ata_link *link, unsigned long deadline) 1774 int ata_sff_prereset(struct ata_link *link, unsigned long deadline)
1775 { 1775 {
1776 struct ata_eh_context *ehc = &link->eh_context; 1776 struct ata_eh_context *ehc = &link->eh_context;
1777 int rc; 1777 int rc;
1778 1778
1779 rc = ata_std_prereset(link, deadline); 1779 rc = ata_std_prereset(link, deadline);
1780 if (rc) 1780 if (rc)
1781 return rc; 1781 return rc;
1782 1782
1783 /* if we're about to do hardreset, nothing more to do */ 1783 /* if we're about to do hardreset, nothing more to do */
1784 if (ehc->i.action & ATA_EH_HARDRESET) 1784 if (ehc->i.action & ATA_EH_HARDRESET)
1785 return 0; 1785 return 0;
1786 1786
1787 /* wait for !BSY if we don't know that no device is attached */ 1787 /* wait for !BSY if we don't know that no device is attached */
1788 if (!ata_link_offline(link)) { 1788 if (!ata_link_offline(link)) {
1789 rc = ata_sff_wait_ready(link, deadline); 1789 rc = ata_sff_wait_ready(link, deadline);
1790 if (rc && rc != -ENODEV) { 1790 if (rc && rc != -ENODEV) {
1791 ata_link_printk(link, KERN_WARNING, "device not ready " 1791 ata_link_printk(link, KERN_WARNING, "device not ready "
1792 "(errno=%d), forcing hardreset\n", rc); 1792 "(errno=%d), forcing hardreset\n", rc);
1793 ehc->i.action |= ATA_EH_HARDRESET; 1793 ehc->i.action |= ATA_EH_HARDRESET;
1794 } 1794 }
1795 } 1795 }
1796 1796
1797 return 0; 1797 return 0;
1798 } 1798 }
1799 EXPORT_SYMBOL_GPL(ata_sff_prereset); 1799 EXPORT_SYMBOL_GPL(ata_sff_prereset);
1800 1800
1801 /** 1801 /**
1802 * ata_devchk - PATA device presence detection 1802 * ata_devchk - PATA device presence detection
1803 * @ap: ATA channel to examine 1803 * @ap: ATA channel to examine
1804 * @device: Device to examine (starting at zero) 1804 * @device: Device to examine (starting at zero)
1805 * 1805 *
1806 * This technique was originally described in 1806 * This technique was originally described in
1807 * Hale Landis's ATADRVR (www.ata-atapi.com), and 1807 * Hale Landis's ATADRVR (www.ata-atapi.com), and
1808 * later found its way into the ATA/ATAPI spec. 1808 * later found its way into the ATA/ATAPI spec.
1809 * 1809 *
1810 * Write a pattern to the ATA shadow registers, 1810 * Write a pattern to the ATA shadow registers,
1811 * and if a device is present, it will respond by 1811 * and if a device is present, it will respond by
1812 * correctly storing and echoing back the 1812 * correctly storing and echoing back the
1813 * ATA shadow register contents. 1813 * ATA shadow register contents.
1814 * 1814 *
1815 * LOCKING: 1815 * LOCKING:
1816 * caller. 1816 * caller.
1817 */ 1817 */
1818 static unsigned int ata_devchk(struct ata_port *ap, unsigned int device) 1818 static unsigned int ata_devchk(struct ata_port *ap, unsigned int device)
1819 { 1819 {
1820 struct ata_ioports *ioaddr = &ap->ioaddr; 1820 struct ata_ioports *ioaddr = &ap->ioaddr;
1821 u8 nsect, lbal; 1821 u8 nsect, lbal;
1822 1822
1823 ap->ops->sff_dev_select(ap, device); 1823 ap->ops->sff_dev_select(ap, device);
1824 1824
1825 iowrite8(0x55, ioaddr->nsect_addr); 1825 iowrite8(0x55, ioaddr->nsect_addr);
1826 iowrite8(0xaa, ioaddr->lbal_addr); 1826 iowrite8(0xaa, ioaddr->lbal_addr);
1827 1827
1828 iowrite8(0xaa, ioaddr->nsect_addr); 1828 iowrite8(0xaa, ioaddr->nsect_addr);
1829 iowrite8(0x55, ioaddr->lbal_addr); 1829 iowrite8(0x55, ioaddr->lbal_addr);
1830 1830
1831 iowrite8(0x55, ioaddr->nsect_addr); 1831 iowrite8(0x55, ioaddr->nsect_addr);
1832 iowrite8(0xaa, ioaddr->lbal_addr); 1832 iowrite8(0xaa, ioaddr->lbal_addr);
1833 1833
1834 nsect = ioread8(ioaddr->nsect_addr); 1834 nsect = ioread8(ioaddr->nsect_addr);
1835 lbal = ioread8(ioaddr->lbal_addr); 1835 lbal = ioread8(ioaddr->lbal_addr);
1836 1836
1837 if ((nsect == 0x55) && (lbal == 0xaa)) 1837 if ((nsect == 0x55) && (lbal == 0xaa))
1838 return 1; /* we found a device */ 1838 return 1; /* we found a device */
1839 1839
1840 return 0; /* nothing found */ 1840 return 0; /* nothing found */
1841 } 1841 }
1842 1842
1843 /** 1843 /**
1844 * ata_sff_dev_classify - Parse returned ATA device signature 1844 * ata_sff_dev_classify - Parse returned ATA device signature
1845 * @dev: ATA device to classify (starting at zero) 1845 * @dev: ATA device to classify (starting at zero)
1846 * @present: device seems present 1846 * @present: device seems present
1847 * @r_err: Value of error register on completion 1847 * @r_err: Value of error register on completion
1848 * 1848 *
1849 * After an event -- SRST, E.D.D., or SATA COMRESET -- occurs, 1849 * After an event -- SRST, E.D.D., or SATA COMRESET -- occurs,
1850 * an ATA/ATAPI-defined set of values is placed in the ATA 1850 * an ATA/ATAPI-defined set of values is placed in the ATA
1851 * shadow registers, indicating the results of device detection 1851 * shadow registers, indicating the results of device detection
1852 * and diagnostics. 1852 * and diagnostics.
1853 * 1853 *
1854 * Select the ATA device, and read the values from the ATA shadow 1854 * Select the ATA device, and read the values from the ATA shadow
1855 * registers. Then parse according to the Error register value, 1855 * registers. Then parse according to the Error register value,
1856 * and the spec-defined values examined by ata_dev_classify(). 1856 * and the spec-defined values examined by ata_dev_classify().
1857 * 1857 *
1858 * LOCKING: 1858 * LOCKING:
1859 * caller. 1859 * caller.
1860 * 1860 *
1861 * RETURNS: 1861 * RETURNS:
1862 * Device type - %ATA_DEV_ATA, %ATA_DEV_ATAPI or %ATA_DEV_NONE. 1862 * Device type - %ATA_DEV_ATA, %ATA_DEV_ATAPI or %ATA_DEV_NONE.
1863 */ 1863 */
1864 unsigned int ata_sff_dev_classify(struct ata_device *dev, int present, 1864 unsigned int ata_sff_dev_classify(struct ata_device *dev, int present,
1865 u8 *r_err) 1865 u8 *r_err)
1866 { 1866 {
1867 struct ata_port *ap = dev->link->ap; 1867 struct ata_port *ap = dev->link->ap;
1868 struct ata_taskfile tf; 1868 struct ata_taskfile tf;
1869 unsigned int class; 1869 unsigned int class;
1870 u8 err; 1870 u8 err;
1871 1871
1872 ap->ops->sff_dev_select(ap, dev->devno); 1872 ap->ops->sff_dev_select(ap, dev->devno);
1873 1873
1874 memset(&tf, 0, sizeof(tf)); 1874 memset(&tf, 0, sizeof(tf));
1875 1875
1876 ap->ops->sff_tf_read(ap, &tf); 1876 ap->ops->sff_tf_read(ap, &tf);
1877 err = tf.feature; 1877 err = tf.feature;
1878 if (r_err) 1878 if (r_err)
1879 *r_err = err; 1879 *r_err = err;
1880 1880
1881 /* see if device passed diags: continue and warn later */ 1881 /* see if device passed diags: continue and warn later */
1882 if (err == 0) 1882 if (err == 0)
1883 /* diagnostic fail : do nothing _YET_ */ 1883 /* diagnostic fail : do nothing _YET_ */
1884 dev->horkage |= ATA_HORKAGE_DIAGNOSTIC; 1884 dev->horkage |= ATA_HORKAGE_DIAGNOSTIC;
1885 else if (err == 1) 1885 else if (err == 1)
1886 /* do nothing */ ; 1886 /* do nothing */ ;
1887 else if ((dev->devno == 0) && (err == 0x81)) 1887 else if ((dev->devno == 0) && (err == 0x81))
1888 /* do nothing */ ; 1888 /* do nothing */ ;
1889 else 1889 else
1890 return ATA_DEV_NONE; 1890 return ATA_DEV_NONE;
1891 1891
1892 /* determine if device is ATA or ATAPI */ 1892 /* determine if device is ATA or ATAPI */
1893 class = ata_dev_classify(&tf); 1893 class = ata_dev_classify(&tf);
1894 1894
1895 if (class == ATA_DEV_UNKNOWN) { 1895 if (class == ATA_DEV_UNKNOWN) {
1896 /* If the device failed diagnostic, it's likely to 1896 /* If the device failed diagnostic, it's likely to
1897 * have reported incorrect device signature too. 1897 * have reported incorrect device signature too.
1898 * Assume ATA device if the device seems present but 1898 * Assume ATA device if the device seems present but
1899 * device signature is invalid with diagnostic 1899 * device signature is invalid with diagnostic
1900 * failure. 1900 * failure.
1901 */ 1901 */
1902 if (present && (dev->horkage & ATA_HORKAGE_DIAGNOSTIC)) 1902 if (present && (dev->horkage & ATA_HORKAGE_DIAGNOSTIC))
1903 class = ATA_DEV_ATA; 1903 class = ATA_DEV_ATA;
1904 else 1904 else
1905 class = ATA_DEV_NONE; 1905 class = ATA_DEV_NONE;
1906 } else if ((class == ATA_DEV_ATA) && 1906 } else if ((class == ATA_DEV_ATA) &&
1907 (ap->ops->sff_check_status(ap) == 0)) 1907 (ap->ops->sff_check_status(ap) == 0))
1908 class = ATA_DEV_NONE; 1908 class = ATA_DEV_NONE;
1909 1909
1910 return class; 1910 return class;
1911 } 1911 }
1912 EXPORT_SYMBOL_GPL(ata_sff_dev_classify); 1912 EXPORT_SYMBOL_GPL(ata_sff_dev_classify);
1913 1913
1914 /** 1914 /**
1915 * ata_sff_wait_after_reset - wait for devices to become ready after reset 1915 * ata_sff_wait_after_reset - wait for devices to become ready after reset
1916 * @link: SFF link which is just reset 1916 * @link: SFF link which is just reset
1917 * @devmask: mask of present devices 1917 * @devmask: mask of present devices
1918 * @deadline: deadline jiffies for the operation 1918 * @deadline: deadline jiffies for the operation
1919 * 1919 *
1920 * Wait for devices attached to SFF @link to become ready after 1920 * Wait for devices attached to SFF @link to become ready after
1921 * reset. It includes a preceding 150ms wait to avoid accessing the TF 1921 * reset. It includes a preceding 150ms wait to avoid accessing the TF
1922 * status register too early. 1922 * status register too early.
1923 * 1923 *
1924 * LOCKING: 1924 * LOCKING:
1925 * Kernel thread context (may sleep). 1925 * Kernel thread context (may sleep).
1926 * 1926 *
1927 * RETURNS: 1927 * RETURNS:
1928 * 0 on success, -ENODEV if some or all of devices in @devmask 1928 * 0 on success, -ENODEV if some or all of devices in @devmask
1929 * don't seem to exist. -errno on other errors. 1929 * don't seem to exist. -errno on other errors.
1930 */ 1930 */
1931 int ata_sff_wait_after_reset(struct ata_link *link, unsigned int devmask, 1931 int ata_sff_wait_after_reset(struct ata_link *link, unsigned int devmask,
1932 unsigned long deadline) 1932 unsigned long deadline)
1933 { 1933 {
1934 struct ata_port *ap = link->ap; 1934 struct ata_port *ap = link->ap;
1935 struct ata_ioports *ioaddr = &ap->ioaddr; 1935 struct ata_ioports *ioaddr = &ap->ioaddr;
1936 unsigned int dev0 = devmask & (1 << 0); 1936 unsigned int dev0 = devmask & (1 << 0);
1937 unsigned int dev1 = devmask & (1 << 1); 1937 unsigned int dev1 = devmask & (1 << 1);
1938 int rc, ret = 0; 1938 int rc, ret = 0;
1939 1939
1940 ata_msleep(ap, ATA_WAIT_AFTER_RESET); 1940 ata_msleep(ap, ATA_WAIT_AFTER_RESET);
1941 1941
1942 /* always check readiness of the master device */ 1942 /* always check readiness of the master device */
1943 rc = ata_sff_wait_ready(link, deadline); 1943 rc = ata_sff_wait_ready(link, deadline);
1944 /* -ENODEV means the odd clown forgot the D7 pulldown resistor 1944 /* -ENODEV means the odd clown forgot the D7 pulldown resistor
1945 * and TF status is 0xff, bail out on it too. 1945 * and TF status is 0xff, bail out on it too.
1946 */ 1946 */
1947 if (rc) 1947 if (rc)
1948 return rc; 1948 return rc;
1949 1949
1950 /* if device 1 was found in ata_devchk, wait for register 1950 /* if device 1 was found in ata_devchk, wait for register
1951 * access briefly, then wait for BSY to clear. 1951 * access briefly, then wait for BSY to clear.
1952 */ 1952 */
1953 if (dev1) { 1953 if (dev1) {
1954 int i; 1954 int i;
1955 1955
1956 ap->ops->sff_dev_select(ap, 1); 1956 ap->ops->sff_dev_select(ap, 1);
1957 1957
1958 /* Wait for register access. Some ATAPI devices fail 1958 /* Wait for register access. Some ATAPI devices fail
1959 * to set nsect/lbal after reset, so don't waste too 1959 * to set nsect/lbal after reset, so don't waste too
1960 * much time on it. We're gonna wait for !BSY anyway. 1960 * much time on it. We're gonna wait for !BSY anyway.
1961 */ 1961 */
1962 for (i = 0; i < 2; i++) { 1962 for (i = 0; i < 2; i++) {
1963 u8 nsect, lbal; 1963 u8 nsect, lbal;
1964 1964
1965 nsect = ioread8(ioaddr->nsect_addr); 1965 nsect = ioread8(ioaddr->nsect_addr);
1966 lbal = ioread8(ioaddr->lbal_addr); 1966 lbal = ioread8(ioaddr->lbal_addr);
1967 if ((nsect == 1) && (lbal == 1)) 1967 if ((nsect == 1) && (lbal == 1))
1968 break; 1968 break;
1969 ata_msleep(ap, 50); /* give drive a breather */ 1969 ata_msleep(ap, 50); /* give drive a breather */
1970 } 1970 }
1971 1971
1972 rc = ata_sff_wait_ready(link, deadline); 1972 rc = ata_sff_wait_ready(link, deadline);
1973 if (rc) { 1973 if (rc) {
1974 if (rc != -ENODEV) 1974 if (rc != -ENODEV)
1975 return rc; 1975 return rc;
1976 ret = rc; 1976 ret = rc;
1977 } 1977 }
1978 } 1978 }
1979 1979
1980 /* is all this really necessary? */ 1980 /* is all this really necessary? */
1981 ap->ops->sff_dev_select(ap, 0); 1981 ap->ops->sff_dev_select(ap, 0);
1982 if (dev1) 1982 if (dev1)
1983 ap->ops->sff_dev_select(ap, 1); 1983 ap->ops->sff_dev_select(ap, 1);
1984 if (dev0) 1984 if (dev0)
1985 ap->ops->sff_dev_select(ap, 0); 1985 ap->ops->sff_dev_select(ap, 0);
1986 1986
1987 return ret; 1987 return ret;
1988 } 1988 }
1989 EXPORT_SYMBOL_GPL(ata_sff_wait_after_reset); 1989 EXPORT_SYMBOL_GPL(ata_sff_wait_after_reset);
1990 1990
1991 static int ata_bus_softreset(struct ata_port *ap, unsigned int devmask, 1991 static int ata_bus_softreset(struct ata_port *ap, unsigned int devmask,
1992 unsigned long deadline) 1992 unsigned long deadline)
1993 { 1993 {
1994 struct ata_ioports *ioaddr = &ap->ioaddr; 1994 struct ata_ioports *ioaddr = &ap->ioaddr;
1995 1995
1996 DPRINTK("ata%u: bus reset via SRST\n", ap->print_id); 1996 DPRINTK("ata%u: bus reset via SRST\n", ap->print_id);
1997 1997
1998 /* software reset. causes dev0 to be selected */ 1998 /* software reset. causes dev0 to be selected */
1999 iowrite8(ap->ctl, ioaddr->ctl_addr); 1999 iowrite8(ap->ctl, ioaddr->ctl_addr);
2000 udelay(20); /* FIXME: flush */ 2000 udelay(20); /* FIXME: flush */
2001 iowrite8(ap->ctl | ATA_SRST, ioaddr->ctl_addr); 2001 iowrite8(ap->ctl | ATA_SRST, ioaddr->ctl_addr);
2002 udelay(20); /* FIXME: flush */ 2002 udelay(20); /* FIXME: flush */
2003 iowrite8(ap->ctl, ioaddr->ctl_addr); 2003 iowrite8(ap->ctl, ioaddr->ctl_addr);
2004 ap->last_ctl = ap->ctl; 2004 ap->last_ctl = ap->ctl;
2005 2005
2006 /* wait for the port to become ready */ 2006 /* wait for the port to become ready */
2007 return ata_sff_wait_after_reset(&ap->link, devmask, deadline); 2007 return ata_sff_wait_after_reset(&ap->link, devmask, deadline);
2008 } 2008 }
2009 2009
2010 /** 2010 /**
2011 * ata_sff_softreset - reset host port via ATA SRST 2011 * ata_sff_softreset - reset host port via ATA SRST
2012 * @link: ATA link to reset 2012 * @link: ATA link to reset
2013 * @classes: resulting classes of attached devices 2013 * @classes: resulting classes of attached devices
2014 * @deadline: deadline jiffies for the operation 2014 * @deadline: deadline jiffies for the operation
2015 * 2015 *
2016 * Reset host port using ATA SRST. 2016 * Reset host port using ATA SRST.
2017 * 2017 *
2018 * LOCKING: 2018 * LOCKING:
2019 * Kernel thread context (may sleep) 2019 * Kernel thread context (may sleep)
2020 * 2020 *
2021 * RETURNS: 2021 * RETURNS:
2022 * 0 on success, -errno otherwise. 2022 * 0 on success, -errno otherwise.
2023 */ 2023 */
2024 int ata_sff_softreset(struct ata_link *link, unsigned int *classes, 2024 int ata_sff_softreset(struct ata_link *link, unsigned int *classes,
2025 unsigned long deadline) 2025 unsigned long deadline)
2026 { 2026 {
2027 struct ata_port *ap = link->ap; 2027 struct ata_port *ap = link->ap;
2028 unsigned int slave_possible = ap->flags & ATA_FLAG_SLAVE_POSS; 2028 unsigned int slave_possible = ap->flags & ATA_FLAG_SLAVE_POSS;
2029 unsigned int devmask = 0; 2029 unsigned int devmask = 0;
2030 int rc; 2030 int rc;
2031 u8 err; 2031 u8 err;
2032 2032
2033 DPRINTK("ENTER\n"); 2033 DPRINTK("ENTER\n");
2034 2034
2035 /* determine if device 0/1 are present */ 2035 /* determine if device 0/1 are present */
2036 if (ata_devchk(ap, 0)) 2036 if (ata_devchk(ap, 0))
2037 devmask |= (1 << 0); 2037 devmask |= (1 << 0);
2038 if (slave_possible && ata_devchk(ap, 1)) 2038 if (slave_possible && ata_devchk(ap, 1))
2039 devmask |= (1 << 1); 2039 devmask |= (1 << 1);
2040 2040
2041 /* select device 0 again */ 2041 /* select device 0 again */
2042 ap->ops->sff_dev_select(ap, 0); 2042 ap->ops->sff_dev_select(ap, 0);
2043 2043
2044 /* issue bus reset */ 2044 /* issue bus reset */
2045 DPRINTK("about to softreset, devmask=%x\n", devmask); 2045 DPRINTK("about to softreset, devmask=%x\n", devmask);
2046 rc = ata_bus_softreset(ap, devmask, deadline); 2046 rc = ata_bus_softreset(ap, devmask, deadline);
2047 /* if link is occupied, -ENODEV too is an error */ 2047 /* if link is occupied, -ENODEV too is an error */
2048 if (rc && (rc != -ENODEV || sata_scr_valid(link))) { 2048 if (rc && (rc != -ENODEV || sata_scr_valid(link))) {
2049 ata_link_printk(link, KERN_ERR, "SRST failed (errno=%d)\n", rc); 2049 ata_link_printk(link, KERN_ERR, "SRST failed (errno=%d)\n", rc);
2050 return rc; 2050 return rc;
2051 } 2051 }
2052 2052
2053 /* determine by signature whether we have ATA or ATAPI devices */ 2053 /* determine by signature whether we have ATA or ATAPI devices */
2054 classes[0] = ata_sff_dev_classify(&link->device[0], 2054 classes[0] = ata_sff_dev_classify(&link->device[0],
2055 devmask & (1 << 0), &err); 2055 devmask & (1 << 0), &err);
2056 if (slave_possible && err != 0x81) 2056 if (slave_possible && err != 0x81)
2057 classes[1] = ata_sff_dev_classify(&link->device[1], 2057 classes[1] = ata_sff_dev_classify(&link->device[1],
2058 devmask & (1 << 1), &err); 2058 devmask & (1 << 1), &err);
2059 2059
2060 DPRINTK("EXIT, classes[0]=%u [1]=%u\n", classes[0], classes[1]); 2060 DPRINTK("EXIT, classes[0]=%u [1]=%u\n", classes[0], classes[1]);
2061 return 0; 2061 return 0;
2062 } 2062 }
2063 EXPORT_SYMBOL_GPL(ata_sff_softreset); 2063 EXPORT_SYMBOL_GPL(ata_sff_softreset);
2064 2064
2065 /** 2065 /**
2066 * sata_sff_hardreset - reset host port via SATA phy reset 2066 * sata_sff_hardreset - reset host port via SATA phy reset
2067 * @link: link to reset 2067 * @link: link to reset
2068 * @class: resulting class of attached device 2068 * @class: resulting class of attached device
2069 * @deadline: deadline jiffies for the operation 2069 * @deadline: deadline jiffies for the operation
2070 * 2070 *
2071 * SATA phy-reset host port using DET bits of SControl register, 2071 * SATA phy-reset host port using DET bits of SControl register,
2072 * wait for !BSY and classify the attached device. 2072 * wait for !BSY and classify the attached device.
2073 * 2073 *
2074 * LOCKING: 2074 * LOCKING:
2075 * Kernel thread context (may sleep) 2075 * Kernel thread context (may sleep)
2076 * 2076 *
2077 * RETURNS: 2077 * RETURNS:
2078 * 0 on success, -errno otherwise. 2078 * 0 on success, -errno otherwise.
2079 */ 2079 */
2080 int sata_sff_hardreset(struct ata_link *link, unsigned int *class, 2080 int sata_sff_hardreset(struct ata_link *link, unsigned int *class,
2081 unsigned long deadline) 2081 unsigned long deadline)
2082 { 2082 {
2083 struct ata_eh_context *ehc = &link->eh_context; 2083 struct ata_eh_context *ehc = &link->eh_context;
2084 const unsigned long *timing = sata_ehc_deb_timing(ehc); 2084 const unsigned long *timing = sata_ehc_deb_timing(ehc);
2085 bool online; 2085 bool online;
2086 int rc; 2086 int rc;
2087 2087
2088 rc = sata_link_hardreset(link, timing, deadline, &online, 2088 rc = sata_link_hardreset(link, timing, deadline, &online,
2089 ata_sff_check_ready); 2089 ata_sff_check_ready);
2090 if (online) 2090 if (online)
2091 *class = ata_sff_dev_classify(link->device, 1, NULL); 2091 *class = ata_sff_dev_classify(link->device, 1, NULL);
2092 2092
2093 DPRINTK("EXIT, class=%u\n", *class); 2093 DPRINTK("EXIT, class=%u\n", *class);
2094 return rc; 2094 return rc;
2095 } 2095 }
2096 EXPORT_SYMBOL_GPL(sata_sff_hardreset); 2096 EXPORT_SYMBOL_GPL(sata_sff_hardreset);
2097 2097
2098 /** 2098 /**
2099 * ata_sff_postreset - SFF postreset callback 2099 * ata_sff_postreset - SFF postreset callback
2100 * @link: the target SFF ata_link 2100 * @link: the target SFF ata_link
2101 * @classes: classes of attached devices 2101 * @classes: classes of attached devices
2102 * 2102 *
2103 * This function is invoked after a successful reset. It first 2103 * This function is invoked after a successful reset. It first
2104 * calls ata_std_postreset() and performs SFF specific postreset 2104 * calls ata_std_postreset() and performs SFF specific postreset
2105 * processing. 2105 * processing.
2106 * 2106 *
2107 * LOCKING: 2107 * LOCKING:
2108 * Kernel thread context (may sleep) 2108 * Kernel thread context (may sleep)
2109 */ 2109 */
2110 void ata_sff_postreset(struct ata_link *link, unsigned int *classes) 2110 void ata_sff_postreset(struct ata_link *link, unsigned int *classes)
2111 { 2111 {
2112 struct ata_port *ap = link->ap; 2112 struct ata_port *ap = link->ap;
2113 2113
2114 ata_std_postreset(link, classes); 2114 ata_std_postreset(link, classes);
2115 2115
2116 /* is double-select really necessary? */ 2116 /* is double-select really necessary? */
2117 if (classes[0] != ATA_DEV_NONE) 2117 if (classes[0] != ATA_DEV_NONE)
2118 ap->ops->sff_dev_select(ap, 1); 2118 ap->ops->sff_dev_select(ap, 1);
2119 if (classes[1] != ATA_DEV_NONE) 2119 if (classes[1] != ATA_DEV_NONE)
2120 ap->ops->sff_dev_select(ap, 0); 2120 ap->ops->sff_dev_select(ap, 0);
2121 2121
2122 /* bail out if no device is present */ 2122 /* bail out if no device is present */
2123 if (classes[0] == ATA_DEV_NONE && classes[1] == ATA_DEV_NONE) { 2123 if (classes[0] == ATA_DEV_NONE && classes[1] == ATA_DEV_NONE) {
2124 DPRINTK("EXIT, no device\n"); 2124 DPRINTK("EXIT, no device\n");
2125 return; 2125 return;
2126 } 2126 }
2127 2127
2128 /* set up device control */ 2128 /* set up device control */
2129 if (ap->ops->sff_set_devctl || ap->ioaddr.ctl_addr) { 2129 if (ap->ops->sff_set_devctl || ap->ioaddr.ctl_addr) {
2130 ata_sff_set_devctl(ap, ap->ctl); 2130 ata_sff_set_devctl(ap, ap->ctl);
2131 ap->last_ctl = ap->ctl; 2131 ap->last_ctl = ap->ctl;
2132 } 2132 }
2133 } 2133 }
2134 EXPORT_SYMBOL_GPL(ata_sff_postreset); 2134 EXPORT_SYMBOL_GPL(ata_sff_postreset);
2135 2135
2136 /** 2136 /**
2137 * ata_sff_drain_fifo - Stock FIFO drain logic for SFF controllers 2137 * ata_sff_drain_fifo - Stock FIFO drain logic for SFF controllers
2138 * @qc: command 2138 * @qc: command
2139 * 2139 *
2140 * Drain the FIFO and device of any stuck data following a command 2140 * Drain the FIFO and device of any stuck data following a command
2141 * failing to complete. In some cases this is necessary before a 2141 * failing to complete. In some cases this is necessary before a
2142 * reset will recover the device. 2142 * reset will recover the device.
2143 * 2143 *
2144 */ 2144 */
2145 2145
2146 void ata_sff_drain_fifo(struct ata_queued_cmd *qc) 2146 void ata_sff_drain_fifo(struct ata_queued_cmd *qc)
2147 { 2147 {
2148 int count; 2148 int count;
2149 struct ata_port *ap; 2149 struct ata_port *ap;
2150 2150
2151 /* We only need to flush incoming data when a command was running */ 2151 /* We only need to flush incoming data when a command was running */
2152 if (qc == NULL || qc->dma_dir == DMA_TO_DEVICE) 2152 if (qc == NULL || qc->dma_dir == DMA_TO_DEVICE)
2153 return; 2153 return;
2154 2154
2155 ap = qc->ap; 2155 ap = qc->ap;
2156 /* Drain up to 64K of data before we give up this recovery method */ 2156 /* Drain up to 64K of data before we give up this recovery method */
2157 for (count = 0; (ap->ops->sff_check_status(ap) & ATA_DRQ) 2157 for (count = 0; (ap->ops->sff_check_status(ap) & ATA_DRQ)
2158 && count < 65536; count += 2) 2158 && count < 65536; count += 2)
2159 ioread16(ap->ioaddr.data_addr); 2159 ioread16(ap->ioaddr.data_addr);
2160 2160
2161 /* Can become DEBUG later */ 2161 /* Can become DEBUG later */
2162 if (count) 2162 if (count)
2163 ata_port_printk(ap, KERN_DEBUG, 2163 ata_port_printk(ap, KERN_DEBUG,
2164 "drained %d bytes to clear DRQ.\n", count); 2164 "drained %d bytes to clear DRQ.\n", count);
2165 2165
2166 } 2166 }
2167 EXPORT_SYMBOL_GPL(ata_sff_drain_fifo); 2167 EXPORT_SYMBOL_GPL(ata_sff_drain_fifo);
2168 2168
2169 /** 2169 /**
2170 * ata_sff_error_handler - Stock error handler for SFF controller 2170 * ata_sff_error_handler - Stock error handler for SFF controller
2171 * @ap: port to handle error for 2171 * @ap: port to handle error for
2172 * 2172 *
2173 * Stock error handler for SFF controller. It can handle both 2173 * Stock error handler for SFF controller. It can handle both
2174 * PATA and SATA controllers. Many controllers should be able to 2174 * PATA and SATA controllers. Many controllers should be able to
2175 * use this EH as-is or with some added handling before and 2175 * use this EH as-is or with some added handling before and
2176 * after. 2176 * after.
2177 * 2177 *
2178 * LOCKING: 2178 * LOCKING:
2179 * Kernel thread context (may sleep) 2179 * Kernel thread context (may sleep)
2180 */ 2180 */
2181 void ata_sff_error_handler(struct ata_port *ap) 2181 void ata_sff_error_handler(struct ata_port *ap)
2182 { 2182 {
2183 ata_reset_fn_t softreset = ap->ops->softreset; 2183 ata_reset_fn_t softreset = ap->ops->softreset;
2184 ata_reset_fn_t hardreset = ap->ops->hardreset; 2184 ata_reset_fn_t hardreset = ap->ops->hardreset;
2185 struct ata_queued_cmd *qc; 2185 struct ata_queued_cmd *qc;
2186 unsigned long flags; 2186 unsigned long flags;
2187 2187
2188 qc = __ata_qc_from_tag(ap, ap->link.active_tag); 2188 qc = __ata_qc_from_tag(ap, ap->link.active_tag);
2189 if (qc && !(qc->flags & ATA_QCFLAG_FAILED)) 2189 if (qc && !(qc->flags & ATA_QCFLAG_FAILED))
2190 qc = NULL; 2190 qc = NULL;
2191 2191
2192 spin_lock_irqsave(ap->lock, flags); 2192 spin_lock_irqsave(ap->lock, flags);
2193 2193
2194 /* 2194 /*
2195 * We *MUST* do FIFO draining before we issue a reset as 2195 * We *MUST* do FIFO draining before we issue a reset as
2196 * several devices helpfully clear their internal state and 2196 * several devices helpfully clear their internal state and
2197 * will lock solid if we touch the data port post reset. Pass 2197 * will lock solid if we touch the data port post reset. Pass
2198 * qc in case anyone wants to do different PIO/DMA recovery or 2198 * qc in case anyone wants to do different PIO/DMA recovery or
2199 * has per command fixups 2199 * has per command fixups
2200 */ 2200 */
2201 if (ap->ops->sff_drain_fifo) 2201 if (ap->ops->sff_drain_fifo)
2202 ap->ops->sff_drain_fifo(qc); 2202 ap->ops->sff_drain_fifo(qc);
2203 2203
2204 spin_unlock_irqrestore(ap->lock, flags); 2204 spin_unlock_irqrestore(ap->lock, flags);
2205 2205
2206 /* ignore ata_sff_softreset if ctl isn't accessible */ 2206 /* ignore ata_sff_softreset if ctl isn't accessible */
2207 if (softreset == ata_sff_softreset && !ap->ioaddr.ctl_addr) 2207 if (softreset == ata_sff_softreset && !ap->ioaddr.ctl_addr)
2208 softreset = NULL; 2208 softreset = NULL;
2209 2209
2210 /* ignore built-in hardresets if SCR access is not available */ 2210 /* ignore built-in hardresets if SCR access is not available */
2211 if ((hardreset == sata_std_hardreset || 2211 if ((hardreset == sata_std_hardreset ||
2212 hardreset == sata_sff_hardreset) && !sata_scr_valid(&ap->link)) 2212 hardreset == sata_sff_hardreset) && !sata_scr_valid(&ap->link))
2213 hardreset = NULL; 2213 hardreset = NULL;
2214 2214
2215 ata_do_eh(ap, ap->ops->prereset, softreset, hardreset, 2215 ata_do_eh(ap, ap->ops->prereset, softreset, hardreset,
2216 ap->ops->postreset); 2216 ap->ops->postreset);
2217 } 2217 }
2218 EXPORT_SYMBOL_GPL(ata_sff_error_handler); 2218 EXPORT_SYMBOL_GPL(ata_sff_error_handler);
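Because ata_sff_error_handler() is a plain function, a driver that needs controller-specific fixups can wrap it and install the wrapper as ->error_handler in its ata_port_operations. A sketch, with foo_quiesce_dma() as a purely hypothetical pre-EH step:

	static void foo_error_handler(struct ata_port *ap)
	{
		foo_quiesce_dma(ap);		/* hypothetical controller fixup */
		ata_sff_error_handler(ap);	/* then run the stock SFF EH */
	}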
2219 2219
2220 /** 2220 /**
2221 * ata_sff_std_ports - initialize ioaddr with standard port offsets. 2221 * ata_sff_std_ports - initialize ioaddr with standard port offsets.
2222 * @ioaddr: IO address structure to be initialized 2222 * @ioaddr: IO address structure to be initialized
2223 * 2223 *
2224 * Utility function which initializes data_addr, error_addr, 2224 * Utility function which initializes data_addr, error_addr,
2225 * feature_addr, nsect_addr, lbal_addr, lbam_addr, lbah_addr, 2225 * feature_addr, nsect_addr, lbal_addr, lbam_addr, lbah_addr,
2226 * device_addr, status_addr, and command_addr to standard offsets 2226 * device_addr, status_addr, and command_addr to standard offsets
2227 * relative to cmd_addr. 2227 * relative to cmd_addr.
2228 * 2228 *
2229 * Does not set ctl_addr, altstatus_addr, bmdma_addr, or scr_addr. 2229 * Does not set ctl_addr, altstatus_addr, bmdma_addr, or scr_addr.
2230 */ 2230 */
2231 void ata_sff_std_ports(struct ata_ioports *ioaddr) 2231 void ata_sff_std_ports(struct ata_ioports *ioaddr)
2232 { 2232 {
2233 ioaddr->data_addr = ioaddr->cmd_addr + ATA_REG_DATA; 2233 ioaddr->data_addr = ioaddr->cmd_addr + ATA_REG_DATA;
2234 ioaddr->error_addr = ioaddr->cmd_addr + ATA_REG_ERR; 2234 ioaddr->error_addr = ioaddr->cmd_addr + ATA_REG_ERR;
2235 ioaddr->feature_addr = ioaddr->cmd_addr + ATA_REG_FEATURE; 2235 ioaddr->feature_addr = ioaddr->cmd_addr + ATA_REG_FEATURE;
2236 ioaddr->nsect_addr = ioaddr->cmd_addr + ATA_REG_NSECT; 2236 ioaddr->nsect_addr = ioaddr->cmd_addr + ATA_REG_NSECT;
2237 ioaddr->lbal_addr = ioaddr->cmd_addr + ATA_REG_LBAL; 2237 ioaddr->lbal_addr = ioaddr->cmd_addr + ATA_REG_LBAL;
2238 ioaddr->lbam_addr = ioaddr->cmd_addr + ATA_REG_LBAM; 2238 ioaddr->lbam_addr = ioaddr->cmd_addr + ATA_REG_LBAM;
2239 ioaddr->lbah_addr = ioaddr->cmd_addr + ATA_REG_LBAH; 2239 ioaddr->lbah_addr = ioaddr->cmd_addr + ATA_REG_LBAH;
2240 ioaddr->device_addr = ioaddr->cmd_addr + ATA_REG_DEVICE; 2240 ioaddr->device_addr = ioaddr->cmd_addr + ATA_REG_DEVICE;
2241 ioaddr->status_addr = ioaddr->cmd_addr + ATA_REG_STATUS; 2241 ioaddr->status_addr = ioaddr->cmd_addr + ATA_REG_STATUS;
2242 ioaddr->command_addr = ioaddr->cmd_addr + ATA_REG_CMD; 2242 ioaddr->command_addr = ioaddr->cmd_addr + ATA_REG_CMD;
2243 } 2243 }
2244 EXPORT_SYMBOL_GPL(ata_sff_std_ports); 2244 EXPORT_SYMBOL_GPL(ata_sff_std_ports);
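Callers are expected to fill in cmd_addr, and ctl_addr/altstatus_addr if the device control register is mapped, before calling the helper, exactly as ata_pci_sff_init_host() does below. A sketch for a hypothetical driver whose registers are already ioremapped (foo_setup_port() is not part of this diff):

	static void foo_setup_port(struct ata_port *ap, void __iomem *cmd,
				   void __iomem *ctl)
	{
		struct ata_ioports *ioaddr = &ap->ioaddr;

		ioaddr->cmd_addr = cmd;
		/* ctl/altstatus are deliberately not touched by the helper */
		ioaddr->altstatus_addr = ioaddr->ctl_addr = ctl;
		ata_sff_std_ports(ioaddr);
	}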
2245 2245
2246 #ifdef CONFIG_PCI 2246 #ifdef CONFIG_PCI
2247 2247
2248 static int ata_resources_present(struct pci_dev *pdev, int port) 2248 static int ata_resources_present(struct pci_dev *pdev, int port)
2249 { 2249 {
2250 int i; 2250 int i;
2251 2251
2252 /* Check the PCI resources for this channel are enabled */ 2252 /* Check the PCI resources for this channel are enabled */
2253 port = port * 2; 2253 port = port * 2;
2254 for (i = 0; i < 2; i++) { 2254 for (i = 0; i < 2; i++) {
2255 if (pci_resource_start(pdev, port + i) == 0 || 2255 if (pci_resource_start(pdev, port + i) == 0 ||
2256 pci_resource_len(pdev, port + i) == 0) 2256 pci_resource_len(pdev, port + i) == 0)
2257 return 0; 2257 return 0;
2258 } 2258 }
2259 return 1; 2259 return 1;
2260 } 2260 }
2261 2261
2262 /** 2262 /**
2263 * ata_pci_sff_init_host - acquire native PCI ATA resources and init host 2263 * ata_pci_sff_init_host - acquire native PCI ATA resources and init host
2264 * @host: target ATA host 2264 * @host: target ATA host
2265 * 2265 *
2266 * Acquire native PCI ATA resources for @host and initialize the 2266 * Acquire native PCI ATA resources for @host and initialize the
2267 * first two ports of @host accordingly. Ports marked dummy are 2267 * first two ports of @host accordingly. Ports marked dummy are
2268 * skipped and allocation failure makes the port dummy. 2268 * skipped and allocation failure makes the port dummy.
2269 * 2269 *
2270 * Note that native PCI resources are valid even for legacy hosts 2270 * Note that native PCI resources are valid even for legacy hosts
2271 * as we fix up pdev resources array early in boot, so this 2271 * as we fix up pdev resources array early in boot, so this
2272 * function can be used for both native and legacy SFF hosts. 2272 * function can be used for both native and legacy SFF hosts.
2273 * 2273 *
2274 * LOCKING: 2274 * LOCKING:
2275 * Inherited from calling layer (may sleep). 2275 * Inherited from calling layer (may sleep).
2276 * 2276 *
2277 * RETURNS: 2277 * RETURNS:
2278 * 0 if at least one port is initialized, -ENODEV if no port is 2278 * 0 if at least one port is initialized, -ENODEV if no port is
2279 * available. 2279 * available.
2280 */ 2280 */
2281 int ata_pci_sff_init_host(struct ata_host *host) 2281 int ata_pci_sff_init_host(struct ata_host *host)
2282 { 2282 {
2283 struct device *gdev = host->dev; 2283 struct device *gdev = host->dev;
2284 struct pci_dev *pdev = to_pci_dev(gdev); 2284 struct pci_dev *pdev = to_pci_dev(gdev);
2285 unsigned int mask = 0; 2285 unsigned int mask = 0;
2286 int i, rc; 2286 int i, rc;
2287 2287
2288 /* request, iomap BARs and init port addresses accordingly */ 2288 /* request, iomap BARs and init port addresses accordingly */
2289 for (i = 0; i < 2; i++) { 2289 for (i = 0; i < 2; i++) {
2290 struct ata_port *ap = host->ports[i]; 2290 struct ata_port *ap = host->ports[i];
2291 int base = i * 2; 2291 int base = i * 2;
2292 void __iomem * const *iomap; 2292 void __iomem * const *iomap;
2293 2293
2294 if (ata_port_is_dummy(ap)) 2294 if (ata_port_is_dummy(ap))
2295 continue; 2295 continue;
2296 2296
2297 /* Discard disabled ports. Some controllers show 2297 /* Discard disabled ports. Some controllers show
2298 * their unused channels this way. Disabled ports are 2298 * their unused channels this way. Disabled ports are
2299 * made dummy. 2299 * made dummy.
2300 */ 2300 */
2301 if (!ata_resources_present(pdev, i)) { 2301 if (!ata_resources_present(pdev, i)) {
2302 ap->ops = &ata_dummy_port_ops; 2302 ap->ops = &ata_dummy_port_ops;
2303 continue; 2303 continue;
2304 } 2304 }
2305 2305
2306 rc = pcim_iomap_regions(pdev, 0x3 << base, 2306 rc = pcim_iomap_regions(pdev, 0x3 << base,
2307 dev_driver_string(gdev)); 2307 dev_driver_string(gdev));
2308 if (rc) { 2308 if (rc) {
2309 dev_printk(KERN_WARNING, gdev, 2309 dev_printk(KERN_WARNING, gdev,
2310 "failed to request/iomap BARs for port %d " 2310 "failed to request/iomap BARs for port %d "
2311 "(errno=%d)\n", i, rc); 2311 "(errno=%d)\n", i, rc);
2312 if (rc == -EBUSY) 2312 if (rc == -EBUSY)
2313 pcim_pin_device(pdev); 2313 pcim_pin_device(pdev);
2314 ap->ops = &ata_dummy_port_ops; 2314 ap->ops = &ata_dummy_port_ops;
2315 continue; 2315 continue;
2316 } 2316 }
2317 host->iomap = iomap = pcim_iomap_table(pdev); 2317 host->iomap = iomap = pcim_iomap_table(pdev);
2318 2318
2319 ap->ioaddr.cmd_addr = iomap[base]; 2319 ap->ioaddr.cmd_addr = iomap[base];
2320 ap->ioaddr.altstatus_addr = 2320 ap->ioaddr.altstatus_addr =
2321 ap->ioaddr.ctl_addr = (void __iomem *) 2321 ap->ioaddr.ctl_addr = (void __iomem *)
2322 ((unsigned long)iomap[base + 1] | ATA_PCI_CTL_OFS); 2322 ((unsigned long)iomap[base + 1] | ATA_PCI_CTL_OFS);
2323 ata_sff_std_ports(&ap->ioaddr); 2323 ata_sff_std_ports(&ap->ioaddr);
2324 2324
2325 ata_port_desc(ap, "cmd 0x%llx ctl 0x%llx", 2325 ata_port_desc(ap, "cmd 0x%llx ctl 0x%llx",
2326 (unsigned long long)pci_resource_start(pdev, base), 2326 (unsigned long long)pci_resource_start(pdev, base),
2327 (unsigned long long)pci_resource_start(pdev, base + 1)); 2327 (unsigned long long)pci_resource_start(pdev, base + 1));
2328 2328
2329 mask |= 1 << i; 2329 mask |= 1 << i;
2330 } 2330 }
2331 2331
2332 if (!mask) { 2332 if (!mask) {
2333 dev_printk(KERN_ERR, gdev, "no available native port\n"); 2333 dev_printk(KERN_ERR, gdev, "no available native port\n");
2334 return -ENODEV; 2334 return -ENODEV;
2335 } 2335 }
2336 2336
2337 return 0; 2337 return 0;
2338 } 2338 }
2339 EXPORT_SYMBOL_GPL(ata_pci_sff_init_host); 2339 EXPORT_SYMBOL_GPL(ata_pci_sff_init_host);
2340 2340
2341 /** 2341 /**
2342 * ata_pci_sff_prepare_host - helper to prepare PCI PIO-only SFF ATA host 2342 * ata_pci_sff_prepare_host - helper to prepare PCI PIO-only SFF ATA host
2343 * @pdev: target PCI device 2343 * @pdev: target PCI device
2344 * @ppi: array of port_info, must be enough for two ports 2344 * @ppi: array of port_info, must be enough for two ports
2345 * @r_host: out argument for the initialized ATA host 2345 * @r_host: out argument for the initialized ATA host
2346 * 2346 *
2347 * Helper to allocate PIO-only SFF ATA host for @pdev, acquire 2347 * Helper to allocate PIO-only SFF ATA host for @pdev, acquire
2348 * all PCI resources and initialize it accordingly in one go. 2348 * all PCI resources and initialize it accordingly in one go.
2349 * 2349 *
2350 * LOCKING: 2350 * LOCKING:
2351 * Inherited from calling layer (may sleep). 2351 * Inherited from calling layer (may sleep).
2352 * 2352 *
2353 * RETURNS: 2353 * RETURNS:
2354 * 0 on success, -errno otherwise. 2354 * 0 on success, -errno otherwise.
2355 */ 2355 */
2356 int ata_pci_sff_prepare_host(struct pci_dev *pdev, 2356 int ata_pci_sff_prepare_host(struct pci_dev *pdev,
2357 const struct ata_port_info * const *ppi, 2357 const struct ata_port_info * const *ppi,
2358 struct ata_host **r_host) 2358 struct ata_host **r_host)
2359 { 2359 {
2360 struct ata_host *host; 2360 struct ata_host *host;
2361 int rc; 2361 int rc;
2362 2362
2363 if (!devres_open_group(&pdev->dev, NULL, GFP_KERNEL)) 2363 if (!devres_open_group(&pdev->dev, NULL, GFP_KERNEL))
2364 return -ENOMEM; 2364 return -ENOMEM;
2365 2365
2366 host = ata_host_alloc_pinfo(&pdev->dev, ppi, 2); 2366 host = ata_host_alloc_pinfo(&pdev->dev, ppi, 2);
2367 if (!host) { 2367 if (!host) {
2368 dev_printk(KERN_ERR, &pdev->dev, 2368 dev_printk(KERN_ERR, &pdev->dev,
2369 "failed to allocate ATA host\n"); 2369 "failed to allocate ATA host\n");
2370 rc = -ENOMEM; 2370 rc = -ENOMEM;
2371 goto err_out; 2371 goto err_out;
2372 } 2372 }
2373 2373
2374 rc = ata_pci_sff_init_host(host); 2374 rc = ata_pci_sff_init_host(host);
2375 if (rc) 2375 if (rc)
2376 goto err_out; 2376 goto err_out;
2377 2377
2378 devres_remove_group(&pdev->dev, NULL); 2378 devres_remove_group(&pdev->dev, NULL);
2379 *r_host = host; 2379 *r_host = host;
2380 return 0; 2380 return 0;
2381 2381
2382 err_out: 2382 err_out:
2383 devres_release_group(&pdev->dev, NULL); 2383 devres_release_group(&pdev->dev, NULL);
2384 return rc; 2384 return rc;
2385 } 2385 }
2386 EXPORT_SYMBOL_GPL(ata_pci_sff_prepare_host); 2386 EXPORT_SYMBOL_GPL(ata_pci_sff_prepare_host);
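A typical PIO-only PCI probe builds on this helper together with ata_pci_sff_activate_host(), documented just below. A sketch, assuming a driver-defined foo_port_info and foo_sht (neither is part of this diff):

	static int foo_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		const struct ata_port_info *ppi[] = { &foo_port_info, NULL };
		struct ata_host *host;
		int rc;

		rc = pcim_enable_device(pdev);
		if (rc)
			return rc;

		/* allocate host, request/iomap BARs, init port addresses */
		rc = ata_pci_sff_prepare_host(pdev, ppi, &host);
		if (rc)
			return rc;

		/* start host, request IRQ(s) and register with SCSI layer */
		return ata_pci_sff_activate_host(host, ata_sff_interrupt, &foo_sht);
	}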
2387 2387
2388 /** 2388 /**
2389 * ata_pci_sff_activate_host - start SFF host, request IRQ and register it 2389 * ata_pci_sff_activate_host - start SFF host, request IRQ and register it
2390 * @host: target SFF ATA host 2390 * @host: target SFF ATA host
2391 * @irq_handler: irq_handler used when requesting IRQ(s) 2391 * @irq_handler: irq_handler used when requesting IRQ(s)
2392 * @sht: scsi_host_template to use when registering the host 2392 * @sht: scsi_host_template to use when registering the host
2393 * 2393 *
2394 * This is the counterpart of ata_host_activate() for SFF ATA 2394 * This is the counterpart of ata_host_activate() for SFF ATA
2395 * hosts. This separate helper is necessary because SFF hosts 2395 * hosts. This separate helper is necessary because SFF hosts
2396 * use two separate interrupts in legacy mode. 2396 * use two separate interrupts in legacy mode.
2397 * 2397 *
2398 * LOCKING: 2398 * LOCKING:
2399 * Inherited from calling layer (may sleep). 2399 * Inherited from calling layer (may sleep).
2400 * 2400 *
2401 * RETURNS: 2401 * RETURNS:
2402 * 0 on success, -errno otherwise. 2402 * 0 on success, -errno otherwise.
2403 */ 2403 */
2404 int ata_pci_sff_activate_host(struct ata_host *host, 2404 int ata_pci_sff_activate_host(struct ata_host *host,
2405 irq_handler_t irq_handler, 2405 irq_handler_t irq_handler,
2406 struct scsi_host_template *sht) 2406 struct scsi_host_template *sht)
2407 { 2407 {
2408 struct device *dev = host->dev; 2408 struct device *dev = host->dev;
2409 struct pci_dev *pdev = to_pci_dev(dev); 2409 struct pci_dev *pdev = to_pci_dev(dev);
2410 const char *drv_name = dev_driver_string(host->dev); 2410 const char *drv_name = dev_driver_string(host->dev);
2411 int legacy_mode = 0, rc; 2411 int legacy_mode = 0, rc;
2412 2412
2413 rc = ata_host_start(host); 2413 rc = ata_host_start(host);
2414 if (rc) 2414 if (rc)
2415 return rc; 2415 return rc;
2416 2416
2417 if ((pdev->class >> 8) == PCI_CLASS_STORAGE_IDE) { 2417 if ((pdev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
2418 u8 tmp8, mask; 2418 u8 tmp8, mask;
2419 2419
2420 /* TODO: What if one channel is in native mode ... */ 2420 /* TODO: What if one channel is in native mode ... */
2421 pci_read_config_byte(pdev, PCI_CLASS_PROG, &tmp8); 2421 pci_read_config_byte(pdev, PCI_CLASS_PROG, &tmp8);
2422 mask = (1 << 2) | (1 << 0); 2422 mask = (1 << 2) | (1 << 0);
2423 if ((tmp8 & mask) != mask) 2423 if ((tmp8 & mask) != mask)
2424 legacy_mode = 1; 2424 legacy_mode = 1;
2425 #if defined(CONFIG_NO_ATA_LEGACY) 2425 #if defined(CONFIG_NO_ATA_LEGACY)
2426 /* Some platforms with PCI limits cannot address compat 2426 /* Some platforms with PCI limits cannot address compat
2427 port space. In that case we punt if their firmware has 2427 port space. In that case we punt if their firmware has
2428 left a device in compatibility mode */ 2428 left a device in compatibility mode */
2429 if (legacy_mode) { 2429 if (legacy_mode) {
2430 printk(KERN_ERR "ata: Compatibility mode ATA is not supported on this platform, skipping.\n"); 2430 printk(KERN_ERR "ata: Compatibility mode ATA is not supported on this platform, skipping.\n");
2431 return -EOPNOTSUPP; 2431 return -EOPNOTSUPP;
2432 } 2432 }
2433 #endif 2433 #endif
2434 } 2434 }
2435 2435
2436 if (!devres_open_group(dev, NULL, GFP_KERNEL)) 2436 if (!devres_open_group(dev, NULL, GFP_KERNEL))
2437 return -ENOMEM; 2437 return -ENOMEM;
2438 2438
2439 if (!legacy_mode && pdev->irq) { 2439 if (!legacy_mode && pdev->irq) {
2440 rc = devm_request_irq(dev, pdev->irq, irq_handler, 2440 rc = devm_request_irq(dev, pdev->irq, irq_handler,
2441 IRQF_SHARED, drv_name, host); 2441 IRQF_SHARED, drv_name, host);
2442 if (rc) 2442 if (rc)
2443 goto out; 2443 goto out;
2444 2444
2445 ata_port_desc(host->ports[0], "irq %d", pdev->irq); 2445 ata_port_desc(host->ports[0], "irq %d", pdev->irq);
2446 ata_port_desc(host->ports[1], "irq %d", pdev->irq); 2446 ata_port_desc(host->ports[1], "irq %d", pdev->irq);
2447 } else if (legacy_mode) { 2447 } else if (legacy_mode) {
2448 if (!ata_port_is_dummy(host->ports[0])) { 2448 if (!ata_port_is_dummy(host->ports[0])) {
2449 rc = devm_request_irq(dev, ATA_PRIMARY_IRQ(pdev), 2449 rc = devm_request_irq(dev, ATA_PRIMARY_IRQ(pdev),
2450 irq_handler, IRQF_SHARED, 2450 irq_handler, IRQF_SHARED,
2451 drv_name, host); 2451 drv_name, host);
2452 if (rc) 2452 if (rc)
2453 goto out; 2453 goto out;
2454 2454
2455 ata_port_desc(host->ports[0], "irq %d", 2455 ata_port_desc(host->ports[0], "irq %d",
2456 ATA_PRIMARY_IRQ(pdev)); 2456 ATA_PRIMARY_IRQ(pdev));
2457 } 2457 }
2458 2458
2459 if (!ata_port_is_dummy(host->ports[1])) { 2459 if (!ata_port_is_dummy(host->ports[1])) {
2460 rc = devm_request_irq(dev, ATA_SECONDARY_IRQ(pdev), 2460 rc = devm_request_irq(dev, ATA_SECONDARY_IRQ(pdev),
2461 irq_handler, IRQF_SHARED, 2461 irq_handler, IRQF_SHARED,
2462 drv_name, host); 2462 drv_name, host);
2463 if (rc) 2463 if (rc)
2464 goto out; 2464 goto out;
2465 2465
2466 ata_port_desc(host->ports[1], "irq %d", 2466 ata_port_desc(host->ports[1], "irq %d",
2467 ATA_SECONDARY_IRQ(pdev)); 2467 ATA_SECONDARY_IRQ(pdev));
2468 } 2468 }
2469 } 2469 }
2470 2470
2471 rc = ata_host_register(host, sht); 2471 rc = ata_host_register(host, sht);
2472 out: 2472 out:
2473 if (rc == 0) 2473 if (rc == 0)
2474 devres_remove_group(dev, NULL); 2474 devres_remove_group(dev, NULL);
2475 else 2475 else
2476 devres_release_group(dev, NULL); 2476 devres_release_group(dev, NULL);
2477 2477
2478 return rc; 2478 return rc;
2479 } 2479 }
2480 EXPORT_SYMBOL_GPL(ata_pci_sff_activate_host); 2480 EXPORT_SYMBOL_GPL(ata_pci_sff_activate_host);
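Editorial note: the two helpers above are meant to be used as a pair when a driver has to touch the host between allocation and registration. A minimal, hypothetical sketch of that pattern follows; the driver name, the port_info values and pata_example_sht (a driver-defined scsi_host_template) are illustrative assumptions, not taken from this diff.

        static int pata_example_probe(struct pci_dev *pdev,
                                      const struct pci_device_id *id)
        {
                static const struct ata_port_info info = {
                        .flags = ATA_FLAG_SLAVE_POSS,
                        .pio_mask = ATA_PIO4,
                        .port_ops = &ata_sff_port_ops,
                };
                const struct ata_port_info *ppi[] = { &info, NULL };
                struct ata_host *host;
                int rc;

                rc = pcim_enable_device(pdev);
                if (rc)
                        return rc;

                /* allocate the host and acquire PCI resources */
                rc = ata_pci_sff_prepare_host(pdev, ppi, &host);
                if (rc)
                        return rc;

                /* chipset-specific setup would go here, before registration */

                /* start the host, request the IRQ and register it */
                return ata_pci_sff_activate_host(host, ata_sff_interrupt,
                                                 &pata_example_sht);
        }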
2481 2481
2482 static const struct ata_port_info *ata_sff_find_valid_pi( 2482 static const struct ata_port_info *ata_sff_find_valid_pi(
2483 const struct ata_port_info * const *ppi) 2483 const struct ata_port_info * const *ppi)
2484 { 2484 {
2485 int i; 2485 int i;
2486 2486
2487 /* look up the first valid port_info */ 2487 /* look up the first valid port_info */
2488 for (i = 0; i < 2 && ppi[i]; i++) 2488 for (i = 0; i < 2 && ppi[i]; i++)
2489 if (ppi[i]->port_ops != &ata_dummy_port_ops) 2489 if (ppi[i]->port_ops != &ata_dummy_port_ops)
2490 return ppi[i]; 2490 return ppi[i];
2491 2491
2492 return NULL; 2492 return NULL;
2493 } 2493 }
2494 2494
2495 /** 2495 /**
2496 * ata_pci_sff_init_one - Initialize/register PIO-only PCI IDE controller 2496 * ata_pci_sff_init_one - Initialize/register PIO-only PCI IDE controller
2497 * @pdev: Controller to be initialized 2497 * @pdev: Controller to be initialized
2498 * @ppi: array of port_info, must be enough for two ports 2498 * @ppi: array of port_info, must be enough for two ports
2499 * @sht: scsi_host_template to use when registering the host 2499 * @sht: scsi_host_template to use when registering the host
2500 * @host_priv: host private_data 2500 * @host_priv: host private_data
2501 * @hflag: host flags 2501 * @hflag: host flags
2502 * 2502 *
2503 * This is a helper function which can be called from a driver's 2503 * This is a helper function which can be called from a driver's
2504 * xxx_init_one() probe function if the hardware uses traditional 2504 * xxx_init_one() probe function if the hardware uses traditional
2505 * IDE taskfile registers and is PIO only. 2505 * IDE taskfile registers and is PIO only.
2506 * 2506 *
2507 * ASSUMPTION: 2507 * ASSUMPTION:
2508 * Nobody makes a single channel controller that appears solely as 2508 * Nobody makes a single channel controller that appears solely as
2509 * the secondary legacy port on PCI. 2509 * the secondary legacy port on PCI.
2510 * 2510 *
2511 * LOCKING: 2511 * LOCKING:
2512 * Inherited from PCI layer (may sleep). 2512 * Inherited from PCI layer (may sleep).
2513 * 2513 *
2514 * RETURNS: 2514 * RETURNS:
2515 * Zero on success, negative errno-based value on error. 2515 * Zero on success, negative errno-based value on error.
2516 */ 2516 */
2517 int ata_pci_sff_init_one(struct pci_dev *pdev, 2517 int ata_pci_sff_init_one(struct pci_dev *pdev,
2518 const struct ata_port_info * const *ppi, 2518 const struct ata_port_info * const *ppi,
2519 struct scsi_host_template *sht, void *host_priv, int hflag) 2519 struct scsi_host_template *sht, void *host_priv, int hflag)
2520 { 2520 {
2521 struct device *dev = &pdev->dev; 2521 struct device *dev = &pdev->dev;
2522 const struct ata_port_info *pi; 2522 const struct ata_port_info *pi;
2523 struct ata_host *host = NULL; 2523 struct ata_host *host = NULL;
2524 int rc; 2524 int rc;
2525 2525
2526 DPRINTK("ENTER\n"); 2526 DPRINTK("ENTER\n");
2527 2527
2528 pi = ata_sff_find_valid_pi(ppi); 2528 pi = ata_sff_find_valid_pi(ppi);
2529 if (!pi) { 2529 if (!pi) {
2530 dev_printk(KERN_ERR, &pdev->dev, 2530 dev_printk(KERN_ERR, &pdev->dev,
2531 "no valid port_info specified\n"); 2531 "no valid port_info specified\n");
2532 return -EINVAL; 2532 return -EINVAL;
2533 } 2533 }
2534 2534
2535 if (!devres_open_group(dev, NULL, GFP_KERNEL)) 2535 if (!devres_open_group(dev, NULL, GFP_KERNEL))
2536 return -ENOMEM; 2536 return -ENOMEM;
2537 2537
2538 rc = pcim_enable_device(pdev); 2538 rc = pcim_enable_device(pdev);
2539 if (rc) 2539 if (rc)
2540 goto out; 2540 goto out;
2541 2541
2542 /* prepare and activate SFF host */ 2542 /* prepare and activate SFF host */
2543 rc = ata_pci_sff_prepare_host(pdev, ppi, &host); 2543 rc = ata_pci_sff_prepare_host(pdev, ppi, &host);
2544 if (rc) 2544 if (rc)
2545 goto out; 2545 goto out;
2546 host->private_data = host_priv; 2546 host->private_data = host_priv;
2547 host->flags |= hflag; 2547 host->flags |= hflag;
2548 2548
2549 rc = ata_pci_sff_activate_host(host, ata_sff_interrupt, sht); 2549 rc = ata_pci_sff_activate_host(host, ata_sff_interrupt, sht);
2550 out: 2550 out:
2551 if (rc == 0) 2551 if (rc == 0)
2552 devres_remove_group(&pdev->dev, NULL); 2552 devres_remove_group(&pdev->dev, NULL);
2553 else 2553 else
2554 devres_release_group(&pdev->dev, NULL); 2554 devres_release_group(&pdev->dev, NULL);
2555 2555
2556 return rc; 2556 return rc;
2557 } 2557 }
2558 EXPORT_SYMBOL_GPL(ata_pci_sff_init_one); 2558 EXPORT_SYMBOL_GPL(ata_pci_sff_init_one);
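By contrast, when nothing needs to happen between preparation and activation, a hypothetical PIO-only driver can route its probe straight through ata_pci_sff_init_one(), as sketched below. All pata_example identifiers are illustrative; ATA_PIO_SHT is the stock PIO scsi_host_template initializer from <linux/libata.h>.

        static struct scsi_host_template pata_example_sht = {
                ATA_PIO_SHT("pata_example"),
        };

        static const struct ata_port_info pata_example_port_info = {
                .flags = ATA_FLAG_SLAVE_POSS,
                .pio_mask = ATA_PIO4,
                .port_ops = &ata_sff_port_ops,
        };

        static int pata_example_init_one(struct pci_dev *pdev,
                                         const struct pci_device_id *id)
        {
                const struct ata_port_info *ppi[] = { &pata_example_port_info, NULL };

                /* enables the device, prepares and activates the SFF host in one go */
                return ata_pci_sff_init_one(pdev, ppi, &pata_example_sht, NULL, 0);
        }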
2559 2559
2560 #endif /* CONFIG_PCI */ 2560 #endif /* CONFIG_PCI */
2561 2561
2562 /* 2562 /*
2563 * BMDMA support 2563 * BMDMA support
2564 */ 2564 */
2565 2565
2566 #ifdef CONFIG_ATA_BMDMA 2566 #ifdef CONFIG_ATA_BMDMA
2567 2567
2568 const struct ata_port_operations ata_bmdma_port_ops = { 2568 const struct ata_port_operations ata_bmdma_port_ops = {
2569 .inherits = &ata_sff_port_ops, 2569 .inherits = &ata_sff_port_ops,
2570 2570
2571 .error_handler = ata_bmdma_error_handler, 2571 .error_handler = ata_bmdma_error_handler,
2572 .post_internal_cmd = ata_bmdma_post_internal_cmd, 2572 .post_internal_cmd = ata_bmdma_post_internal_cmd,
2573 2573
2574 .qc_prep = ata_bmdma_qc_prep, 2574 .qc_prep = ata_bmdma_qc_prep,
2575 .qc_issue = ata_bmdma_qc_issue, 2575 .qc_issue = ata_bmdma_qc_issue,
2576 2576
2577 .sff_irq_clear = ata_bmdma_irq_clear, 2577 .sff_irq_clear = ata_bmdma_irq_clear,
2578 .bmdma_setup = ata_bmdma_setup, 2578 .bmdma_setup = ata_bmdma_setup,
2579 .bmdma_start = ata_bmdma_start, 2579 .bmdma_start = ata_bmdma_start,
2580 .bmdma_stop = ata_bmdma_stop, 2580 .bmdma_stop = ata_bmdma_stop,
2581 .bmdma_status = ata_bmdma_status, 2581 .bmdma_status = ata_bmdma_status,
2582 2582
2583 .port_start = ata_bmdma_port_start, 2583 .port_start = ata_bmdma_port_start,
2584 }; 2584 };
2585 EXPORT_SYMBOL_GPL(ata_bmdma_port_ops); 2585 EXPORT_SYMBOL_GPL(ata_bmdma_port_ops);
2586 2586
2587 const struct ata_port_operations ata_bmdma32_port_ops = { 2587 const struct ata_port_operations ata_bmdma32_port_ops = {
2588 .inherits = &ata_bmdma_port_ops, 2588 .inherits = &ata_bmdma_port_ops,
2589 2589
2590 .sff_data_xfer = ata_sff_data_xfer32, 2590 .sff_data_xfer = ata_sff_data_xfer32,
2591 .port_start = ata_bmdma_port_start32, 2591 .port_start = ata_bmdma_port_start32,
2592 }; 2592 };
2593 EXPORT_SYMBOL_GPL(ata_bmdma32_port_ops); 2593 EXPORT_SYMBOL_GPL(ata_bmdma32_port_ops);
2594 2594
2595 /** 2595 /**
2596 * ata_bmdma_fill_sg - Fill PCI IDE PRD table 2596 * ata_bmdma_fill_sg - Fill PCI IDE PRD table
2597 * @qc: Metadata associated with taskfile to be transferred 2597 * @qc: Metadata associated with taskfile to be transferred
2598 * 2598 *
2599 * Fill PCI IDE PRD (scatter-gather) table with segments 2599 * Fill PCI IDE PRD (scatter-gather) table with segments
2600 * associated with the current disk command. 2600 * associated with the current disk command.
2601 * 2601 *
2602 * LOCKING: 2602 * LOCKING:
2603 * spin_lock_irqsave(host lock) 2603 * spin_lock_irqsave(host lock)
2604 * 2604 *
2605 */ 2605 */
2606 static void ata_bmdma_fill_sg(struct ata_queued_cmd *qc) 2606 static void ata_bmdma_fill_sg(struct ata_queued_cmd *qc)
2607 { 2607 {
2608 struct ata_port *ap = qc->ap; 2608 struct ata_port *ap = qc->ap;
2609 struct ata_bmdma_prd *prd = ap->bmdma_prd; 2609 struct ata_bmdma_prd *prd = ap->bmdma_prd;
2610 struct scatterlist *sg; 2610 struct scatterlist *sg;
2611 unsigned int si, pi; 2611 unsigned int si, pi;
2612 2612
2613 pi = 0; 2613 pi = 0;
2614 for_each_sg(qc->sg, sg, qc->n_elem, si) { 2614 for_each_sg(qc->sg, sg, qc->n_elem, si) {
2615 u32 addr, offset; 2615 u32 addr, offset;
2616 u32 sg_len, len; 2616 u32 sg_len, len;
2617 2617
2618 /* determine if physical DMA addr spans 64K boundary. 2618 /* determine if physical DMA addr spans 64K boundary.
2619 * Note h/w doesn't support 64-bit, so we unconditionally 2619 * Note h/w doesn't support 64-bit, so we unconditionally
2620 * truncate dma_addr_t to u32. 2620 * truncate dma_addr_t to u32.
2621 */ 2621 */
2622 addr = (u32) sg_dma_address(sg); 2622 addr = (u32) sg_dma_address(sg);
2623 sg_len = sg_dma_len(sg); 2623 sg_len = sg_dma_len(sg);
2624 2624
2625 while (sg_len) { 2625 while (sg_len) {
2626 offset = addr & 0xffff; 2626 offset = addr & 0xffff;
2627 len = sg_len; 2627 len = sg_len;
2628 if ((offset + sg_len) > 0x10000) 2628 if ((offset + sg_len) > 0x10000)
2629 len = 0x10000 - offset; 2629 len = 0x10000 - offset;
2630 2630
2631 prd[pi].addr = cpu_to_le32(addr); 2631 prd[pi].addr = cpu_to_le32(addr);
2632 prd[pi].flags_len = cpu_to_le32(len & 0xffff); 2632 prd[pi].flags_len = cpu_to_le32(len & 0xffff);
2633 VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", pi, addr, len); 2633 VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", pi, addr, len);
2634 2634
2635 pi++; 2635 pi++;
2636 sg_len -= len; 2636 sg_len -= len;
2637 addr += len; 2637 addr += len;
2638 } 2638 }
2639 } 2639 }
2640 2640
2641 prd[pi - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT); 2641 prd[pi - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT);
2642 } 2642 }
2643 2643
2644 /** 2644 /**
2645 * ata_bmdma_fill_sg_dumb - Fill PCI IDE PRD table 2645 * ata_bmdma_fill_sg_dumb - Fill PCI IDE PRD table
2646 * @qc: Metadata associated with taskfile to be transferred 2646 * @qc: Metadata associated with taskfile to be transferred
2647 * 2647 *
2648 * Fill PCI IDE PRD (scatter-gather) table with segments 2648 * Fill PCI IDE PRD (scatter-gather) table with segments
2649 * associated with the current disk command. Perform the fill 2649 * associated with the current disk command. Perform the fill
2650 * so that we avoid writing any 64K-length records for 2650 * so that we avoid writing any 64K-length records for
2651 * controllers that don't follow the spec. 2651 * controllers that don't follow the spec.
2652 * 2652 *
2653 * LOCKING: 2653 * LOCKING:
2654 * spin_lock_irqsave(host lock) 2654 * spin_lock_irqsave(host lock)
2655 * 2655 *
2656 */ 2656 */
2657 static void ata_bmdma_fill_sg_dumb(struct ata_queued_cmd *qc) 2657 static void ata_bmdma_fill_sg_dumb(struct ata_queued_cmd *qc)
2658 { 2658 {
2659 struct ata_port *ap = qc->ap; 2659 struct ata_port *ap = qc->ap;
2660 struct ata_bmdma_prd *prd = ap->bmdma_prd; 2660 struct ata_bmdma_prd *prd = ap->bmdma_prd;
2661 struct scatterlist *sg; 2661 struct scatterlist *sg;
2662 unsigned int si, pi; 2662 unsigned int si, pi;
2663 2663
2664 pi = 0; 2664 pi = 0;
2665 for_each_sg(qc->sg, sg, qc->n_elem, si) { 2665 for_each_sg(qc->sg, sg, qc->n_elem, si) {
2666 u32 addr, offset; 2666 u32 addr, offset;
2667 u32 sg_len, len, blen; 2667 u32 sg_len, len, blen;
2668 2668
2669 /* determine if physical DMA addr spans 64K boundary. 2669 /* determine if physical DMA addr spans 64K boundary.
2670 * Note h/w doesn't support 64-bit, so we unconditionally 2670 * Note h/w doesn't support 64-bit, so we unconditionally
2671 * truncate dma_addr_t to u32. 2671 * truncate dma_addr_t to u32.
2672 */ 2672 */
2673 addr = (u32) sg_dma_address(sg); 2673 addr = (u32) sg_dma_address(sg);
2674 sg_len = sg_dma_len(sg); 2674 sg_len = sg_dma_len(sg);
2675 2675
2676 while (sg_len) { 2676 while (sg_len) {
2677 offset = addr & 0xffff; 2677 offset = addr & 0xffff;
2678 len = sg_len; 2678 len = sg_len;
2679 if ((offset + sg_len) > 0x10000) 2679 if ((offset + sg_len) > 0x10000)
2680 len = 0x10000 - offset; 2680 len = 0x10000 - offset;
2681 2681
2682 blen = len & 0xffff; 2682 blen = len & 0xffff;
2683 prd[pi].addr = cpu_to_le32(addr); 2683 prd[pi].addr = cpu_to_le32(addr);
2684 if (blen == 0) { 2684 if (blen == 0) {
2685 /* Some PATA chipsets like the CS5530 can't 2685 /* Some PATA chipsets like the CS5530 can't
2686 cope with 0x0000 meaning 64K as the spec 2686 cope with 0x0000 meaning 64K as the spec
2687 says */ 2687 says */
2688 prd[pi].flags_len = cpu_to_le32(0x8000); 2688 prd[pi].flags_len = cpu_to_le32(0x8000);
2689 blen = 0x8000; 2689 blen = 0x8000;
2690 prd[++pi].addr = cpu_to_le32(addr + 0x8000); 2690 prd[++pi].addr = cpu_to_le32(addr + 0x8000);
2691 } 2691 }
2692 prd[pi].flags_len = cpu_to_le32(blen); 2692 prd[pi].flags_len = cpu_to_le32(blen);
2693 VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", pi, addr, len); 2693 VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", pi, addr, len);
2694 2694
2695 pi++; 2695 pi++;
2696 sg_len -= len; 2696 sg_len -= len;
2697 addr += len; 2697 addr += len;
2698 } 2698 }
2699 } 2699 }
2700 2700
2701 prd[pi - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT); 2701 prd[pi - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT);
2702 } 2702 }
2703 2703
2704 /** 2704 /**
2705 * ata_bmdma_qc_prep - Prepare taskfile for submission 2705 * ata_bmdma_qc_prep - Prepare taskfile for submission
2706 * @qc: Metadata associated with taskfile to be prepared 2706 * @qc: Metadata associated with taskfile to be prepared
2707 * 2707 *
2708 * Prepare ATA taskfile for submission. 2708 * Prepare ATA taskfile for submission.
2709 * 2709 *
2710 * LOCKING: 2710 * LOCKING:
2711 * spin_lock_irqsave(host lock) 2711 * spin_lock_irqsave(host lock)
2712 */ 2712 */
2713 void ata_bmdma_qc_prep(struct ata_queued_cmd *qc) 2713 void ata_bmdma_qc_prep(struct ata_queued_cmd *qc)
2714 { 2714 {
2715 if (!(qc->flags & ATA_QCFLAG_DMAMAP)) 2715 if (!(qc->flags & ATA_QCFLAG_DMAMAP))
2716 return; 2716 return;
2717 2717
2718 ata_bmdma_fill_sg(qc); 2718 ata_bmdma_fill_sg(qc);
2719 } 2719 }
2720 EXPORT_SYMBOL_GPL(ata_bmdma_qc_prep); 2720 EXPORT_SYMBOL_GPL(ata_bmdma_qc_prep);
2721 2721
2722 /** 2722 /**
2723 * ata_bmdma_dumb_qc_prep - Prepare taskfile for submission 2723 * ata_bmdma_dumb_qc_prep - Prepare taskfile for submission
2724 * @qc: Metadata associated with taskfile to be prepared 2724 * @qc: Metadata associated with taskfile to be prepared
2725 * 2725 *
2726 * Prepare ATA taskfile for submission. 2726 * Prepare ATA taskfile for submission.
2727 * 2727 *
2728 * LOCKING: 2728 * LOCKING:
2729 * spin_lock_irqsave(host lock) 2729 * spin_lock_irqsave(host lock)
2730 */ 2730 */
2731 void ata_bmdma_dumb_qc_prep(struct ata_queued_cmd *qc) 2731 void ata_bmdma_dumb_qc_prep(struct ata_queued_cmd *qc)
2732 { 2732 {
2733 if (!(qc->flags & ATA_QCFLAG_DMAMAP)) 2733 if (!(qc->flags & ATA_QCFLAG_DMAMAP))
2734 return; 2734 return;
2735 2735
2736 ata_bmdma_fill_sg_dumb(qc); 2736 ata_bmdma_fill_sg_dumb(qc);
2737 } 2737 }
2738 EXPORT_SYMBOL_GPL(ata_bmdma_dumb_qc_prep); 2738 EXPORT_SYMBOL_GPL(ata_bmdma_dumb_qc_prep);
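Usage note (a sketch, not part of this diff): a driver for a chipset that cannot handle full 64K PRD entries keeps the stock BMDMA operations and only swaps in the dumb fill by overriding .qc_prep in its port operations:

        static struct ata_port_operations pata_example_port_ops = {
                .inherits = &ata_bmdma_port_ops,
                /* chipset mishandles 0x0000 (64K) PRD lengths, use the safe fill */
                .qc_prep = ata_bmdma_dumb_qc_prep,
        };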
2739 2739
2740 /** 2740 /**
2741 * ata_bmdma_qc_issue - issue taskfile to a BMDMA controller 2741 * ata_bmdma_qc_issue - issue taskfile to a BMDMA controller
2742 * @qc: command to issue to device 2742 * @qc: command to issue to device
2743 * 2743 *
2744 * This function issues a PIO, NODATA or DMA command to an 2744 * This function issues a PIO, NODATA or DMA command to an
2745 * SFF/BMDMA controller. PIO and NODATA are handled by 2745 * SFF/BMDMA controller. PIO and NODATA are handled by
2746 * ata_sff_qc_issue(). 2746 * ata_sff_qc_issue().
2747 * 2747 *
2748 * LOCKING: 2748 * LOCKING:
2749 * spin_lock_irqsave(host lock) 2749 * spin_lock_irqsave(host lock)
2750 * 2750 *
2751 * RETURNS: 2751 * RETURNS:
2752 * Zero on success, AC_ERR_* mask on failure 2752 * Zero on success, AC_ERR_* mask on failure
2753 */ 2753 */
2754 unsigned int ata_bmdma_qc_issue(struct ata_queued_cmd *qc) 2754 unsigned int ata_bmdma_qc_issue(struct ata_queued_cmd *qc)
2755 { 2755 {
2756 struct ata_port *ap = qc->ap; 2756 struct ata_port *ap = qc->ap;
2757 struct ata_link *link = qc->dev->link; 2757 struct ata_link *link = qc->dev->link;
2758 2758
2759 /* defer PIO handling to sff_qc_issue */ 2759 /* defer PIO handling to sff_qc_issue */
2760 if (!ata_is_dma(qc->tf.protocol)) 2760 if (!ata_is_dma(qc->tf.protocol))
2761 return ata_sff_qc_issue(qc); 2761 return ata_sff_qc_issue(qc);
2762 2762
2763 /* select the device */ 2763 /* select the device */
2764 ata_dev_select(ap, qc->dev->devno, 1, 0); 2764 ata_dev_select(ap, qc->dev->devno, 1, 0);
2765 2765
2766 /* start the command */ 2766 /* start the command */
2767 switch (qc->tf.protocol) { 2767 switch (qc->tf.protocol) {
2768 case ATA_PROT_DMA: 2768 case ATA_PROT_DMA:
2769 WARN_ON_ONCE(qc->tf.flags & ATA_TFLAG_POLLING); 2769 WARN_ON_ONCE(qc->tf.flags & ATA_TFLAG_POLLING);
2770 2770
2771 ap->ops->sff_tf_load(ap, &qc->tf); /* load tf registers */ 2771 ap->ops->sff_tf_load(ap, &qc->tf); /* load tf registers */
2772 ap->ops->bmdma_setup(qc); /* set up bmdma */ 2772 ap->ops->bmdma_setup(qc); /* set up bmdma */
2773 ap->ops->bmdma_start(qc); /* initiate bmdma */ 2773 ap->ops->bmdma_start(qc); /* initiate bmdma */
2774 ap->hsm_task_state = HSM_ST_LAST; 2774 ap->hsm_task_state = HSM_ST_LAST;
2775 break; 2775 break;
2776 2776
2777 case ATAPI_PROT_DMA: 2777 case ATAPI_PROT_DMA:
2778 WARN_ON_ONCE(qc->tf.flags & ATA_TFLAG_POLLING); 2778 WARN_ON_ONCE(qc->tf.flags & ATA_TFLAG_POLLING);
2779 2779
2780 ap->ops->sff_tf_load(ap, &qc->tf); /* load tf registers */ 2780 ap->ops->sff_tf_load(ap, &qc->tf); /* load tf registers */
2781 ap->ops->bmdma_setup(qc); /* set up bmdma */ 2781 ap->ops->bmdma_setup(qc); /* set up bmdma */
2782 ap->hsm_task_state = HSM_ST_FIRST; 2782 ap->hsm_task_state = HSM_ST_FIRST;
2783 2783
2784 /* send cdb by polling if no cdb interrupt */ 2784 /* send cdb by polling if no cdb interrupt */
2785 if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR)) 2785 if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR))
2786 ata_sff_queue_pio_task(link, 0); 2786 ata_sff_queue_pio_task(link, 0);
2787 break; 2787 break;
2788 2788
2789 default: 2789 default:
2790 WARN_ON(1); 2790 WARN_ON(1);
2791 return AC_ERR_SYSTEM; 2791 return AC_ERR_SYSTEM;
2792 } 2792 }
2793 2793
2794 return 0; 2794 return 0;
2795 } 2795 }
2796 EXPORT_SYMBOL_GPL(ata_bmdma_qc_issue); 2796 EXPORT_SYMBOL_GPL(ata_bmdma_qc_issue);
2797 2797
2798 /** 2798 /**
2799 * ata_bmdma_port_intr - Handle BMDMA port interrupt 2799 * ata_bmdma_port_intr - Handle BMDMA port interrupt
2800 * @ap: Port on which interrupt arrived (possibly...) 2800 * @ap: Port on which interrupt arrived (possibly...)
2801 * @qc: Taskfile currently active in engine 2801 * @qc: Taskfile currently active in engine
2802 * 2802 *
2803 * Handle port interrupt for given queued command. 2803 * Handle port interrupt for given queued command.
2804 * 2804 *
2805 * LOCKING: 2805 * LOCKING:
2806 * spin_lock_irqsave(host lock) 2806 * spin_lock_irqsave(host lock)
2807 * 2807 *
2808 * RETURNS: 2808 * RETURNS:
2809 * One if interrupt was handled, zero if not (shared irq). 2809 * One if interrupt was handled, zero if not (shared irq).
2810 */ 2810 */
2811 unsigned int ata_bmdma_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc) 2811 unsigned int ata_bmdma_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc)
2812 { 2812 {
2813 struct ata_eh_info *ehi = &ap->link.eh_info; 2813 struct ata_eh_info *ehi = &ap->link.eh_info;
2814 u8 host_stat = 0; 2814 u8 host_stat = 0;
2815 bool bmdma_stopped = false; 2815 bool bmdma_stopped = false;
2816 unsigned int handled; 2816 unsigned int handled;
2817 2817
2818 if (ap->hsm_task_state == HSM_ST_LAST && ata_is_dma(qc->tf.protocol)) { 2818 if (ap->hsm_task_state == HSM_ST_LAST && ata_is_dma(qc->tf.protocol)) {
2819 /* check status of DMA engine */ 2819 /* check status of DMA engine */
2820 host_stat = ap->ops->bmdma_status(ap); 2820 host_stat = ap->ops->bmdma_status(ap);
2821 VPRINTK("ata%u: host_stat 0x%X\n", ap->print_id, host_stat); 2821 VPRINTK("ata%u: host_stat 0x%X\n", ap->print_id, host_stat);
2822 2822
2823 /* if it's not our irq... */ 2823 /* if it's not our irq... */
2824 if (!(host_stat & ATA_DMA_INTR)) 2824 if (!(host_stat & ATA_DMA_INTR))
2825 return ata_sff_idle_irq(ap); 2825 return ata_sff_idle_irq(ap);
2826 2826
2827 /* before we do anything else, clear DMA-Start bit */ 2827 /* before we do anything else, clear DMA-Start bit */
2828 ap->ops->bmdma_stop(qc); 2828 ap->ops->bmdma_stop(qc);
2829 bmdma_stopped = true; 2829 bmdma_stopped = true;
2830 2830
2831 if (unlikely(host_stat & ATA_DMA_ERR)) { 2831 if (unlikely(host_stat & ATA_DMA_ERR)) {
2832 /* error when transferring data to/from memory */ 2832 /* error when transferring data to/from memory */
2833 qc->err_mask |= AC_ERR_HOST_BUS; 2833 qc->err_mask |= AC_ERR_HOST_BUS;
2834 ap->hsm_task_state = HSM_ST_ERR; 2834 ap->hsm_task_state = HSM_ST_ERR;
2835 } 2835 }
2836 } 2836 }
2837 2837
2838 handled = __ata_sff_port_intr(ap, qc, bmdma_stopped); 2838 handled = __ata_sff_port_intr(ap, qc, bmdma_stopped);
2839 2839
2840 if (unlikely(qc->err_mask) && ata_is_dma(qc->tf.protocol)) 2840 if (unlikely(qc->err_mask) && ata_is_dma(qc->tf.protocol))
2841 ata_ehi_push_desc(ehi, "BMDMA stat 0x%x", host_stat); 2841 ata_ehi_push_desc(ehi, "BMDMA stat 0x%x", host_stat);
2842 2842
2843 return handled; 2843 return handled;
2844 } 2844 }
2845 EXPORT_SYMBOL_GPL(ata_bmdma_port_intr); 2845 EXPORT_SYMBOL_GPL(ata_bmdma_port_intr);
2846 2846
2847 /** 2847 /**
2848 * ata_bmdma_interrupt - Default BMDMA ATA host interrupt handler 2848 * ata_bmdma_interrupt - Default BMDMA ATA host interrupt handler
2849 * @irq: irq line (unused) 2849 * @irq: irq line (unused)
2850 * @dev_instance: pointer to our ata_host information structure 2850 * @dev_instance: pointer to our ata_host information structure
2851 * 2851 *
2852 * Default interrupt handler for PCI IDE devices. Calls 2852 * Default interrupt handler for PCI IDE devices. Calls
2853 * ata_bmdma_port_intr() for each port that is not disabled. 2853 * ata_bmdma_port_intr() for each port that is not disabled.
2854 * 2854 *
2855 * LOCKING: 2855 * LOCKING:
2856 * Obtains host lock during operation. 2856 * Obtains host lock during operation.
2857 * 2857 *
2858 * RETURNS: 2858 * RETURNS:
2859 * IRQ_NONE or IRQ_HANDLED. 2859 * IRQ_NONE or IRQ_HANDLED.
2860 */ 2860 */
2861 irqreturn_t ata_bmdma_interrupt(int irq, void *dev_instance) 2861 irqreturn_t ata_bmdma_interrupt(int irq, void *dev_instance)
2862 { 2862 {
2863 return __ata_sff_interrupt(irq, dev_instance, ata_bmdma_port_intr); 2863 return __ata_sff_interrupt(irq, dev_instance, ata_bmdma_port_intr);
2864 } 2864 }
2865 EXPORT_SYMBOL_GPL(ata_bmdma_interrupt); 2865 EXPORT_SYMBOL_GPL(ata_bmdma_interrupt);
2866 2866
2867 /** 2867 /**
2868 * ata_bmdma_error_handler - Stock error handler for BMDMA controller 2868 * ata_bmdma_error_handler - Stock error handler for BMDMA controller
2869 * @ap: port to handle error for 2869 * @ap: port to handle error for
2870 * 2870 *
2871 * Stock error handler for BMDMA controller. It can handle both 2871 * Stock error handler for BMDMA controller. It can handle both
2872 * PATA and SATA controllers. Most BMDMA controllers should be 2872 * PATA and SATA controllers. Most BMDMA controllers should be
2873 * able to use this EH as-is or with some added handling before 2873 * able to use this EH as-is or with some added handling before
2874 * and after. 2874 * and after.
2875 * 2875 *
2876 * LOCKING: 2876 * LOCKING:
2877 * Kernel thread context (may sleep) 2877 * Kernel thread context (may sleep)
2878 */ 2878 */
2879 void ata_bmdma_error_handler(struct ata_port *ap) 2879 void ata_bmdma_error_handler(struct ata_port *ap)
2880 { 2880 {
2881 struct ata_queued_cmd *qc; 2881 struct ata_queued_cmd *qc;
2882 unsigned long flags; 2882 unsigned long flags;
2883 bool thaw = false; 2883 bool thaw = false;
2884 2884
2885 qc = __ata_qc_from_tag(ap, ap->link.active_tag); 2885 qc = __ata_qc_from_tag(ap, ap->link.active_tag);
2886 if (qc && !(qc->flags & ATA_QCFLAG_FAILED)) 2886 if (qc && !(qc->flags & ATA_QCFLAG_FAILED))
2887 qc = NULL; 2887 qc = NULL;
2888 2888
2889 /* reset PIO HSM and stop DMA engine */ 2889 /* reset PIO HSM and stop DMA engine */
2890 spin_lock_irqsave(ap->lock, flags); 2890 spin_lock_irqsave(ap->lock, flags);
2891 2891
2892 if (qc && ata_is_dma(qc->tf.protocol)) { 2892 if (qc && ata_is_dma(qc->tf.protocol)) {
2893 u8 host_stat; 2893 u8 host_stat;
2894 2894
2895 host_stat = ap->ops->bmdma_status(ap); 2895 host_stat = ap->ops->bmdma_status(ap);
2896 2896
2897 /* BMDMA controllers indicate host bus error by 2897 /* BMDMA controllers indicate host bus error by
2898 * setting DMA_ERR bit and timing out. As it wasn't 2898 * setting DMA_ERR bit and timing out. As it wasn't
2899 * really a timeout event, adjust error mask and 2899 * really a timeout event, adjust error mask and
2900 * cancel frozen state. 2900 * cancel frozen state.
2901 */ 2901 */
2902 if (qc->err_mask == AC_ERR_TIMEOUT && (host_stat & ATA_DMA_ERR)) { 2902 if (qc->err_mask == AC_ERR_TIMEOUT && (host_stat & ATA_DMA_ERR)) {
2903 qc->err_mask = AC_ERR_HOST_BUS; 2903 qc->err_mask = AC_ERR_HOST_BUS;
2904 thaw = true; 2904 thaw = true;
2905 } 2905 }
2906 2906
2907 ap->ops->bmdma_stop(qc); 2907 ap->ops->bmdma_stop(qc);
2908 2908
2909 /* if we're gonna thaw, make sure IRQ is clear */ 2909 /* if we're gonna thaw, make sure IRQ is clear */
2910 if (thaw) { 2910 if (thaw) {
2911 ap->ops->sff_check_status(ap); 2911 ap->ops->sff_check_status(ap);
2912 if (ap->ops->sff_irq_clear) 2912 if (ap->ops->sff_irq_clear)
2913 ap->ops->sff_irq_clear(ap); 2913 ap->ops->sff_irq_clear(ap);
2914 } 2914 }
2915 } 2915 }
2916 2916
2917 spin_unlock_irqrestore(ap->lock, flags); 2917 spin_unlock_irqrestore(ap->lock, flags);
2918 2918
2919 if (thaw) 2919 if (thaw)
2920 ata_eh_thaw_port(ap); 2920 ata_eh_thaw_port(ap);
2921 2921
2922 ata_sff_error_handler(ap); 2922 ata_sff_error_handler(ap);
2923 } 2923 }
2924 EXPORT_SYMBOL_GPL(ata_bmdma_error_handler); 2924 EXPORT_SYMBOL_GPL(ata_bmdma_error_handler);
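As the comment above says, this EH can be used as-is or wrapped with extra handling. A hypothetical wrapper might look like the sketch below; pata_example_quiesce_chip() is an invented placeholder for whatever chipset-specific cleanup a particular driver needs.

        static void pata_example_error_handler(struct ata_port *ap)
        {
                /* hypothetical chipset-specific quiesce before the stock EH runs */
                pata_example_quiesce_chip(ap);

                ata_bmdma_error_handler(ap);
        }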
2925 2925
2926 /** 2926 /**
2927 * ata_bmdma_post_internal_cmd - Stock post_internal_cmd for BMDMA 2927 * ata_bmdma_post_internal_cmd - Stock post_internal_cmd for BMDMA
2928 * @qc: internal command to clean up 2928 * @qc: internal command to clean up
2929 * 2929 *
2930 * LOCKING: 2930 * LOCKING:
2931 * Kernel thread context (may sleep) 2931 * Kernel thread context (may sleep)
2932 */ 2932 */
2933 void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc) 2933 void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
2934 { 2934 {
2935 struct ata_port *ap = qc->ap; 2935 struct ata_port *ap = qc->ap;
2936 unsigned long flags; 2936 unsigned long flags;
2937 2937
2938 if (ata_is_dma(qc->tf.protocol)) { 2938 if (ata_is_dma(qc->tf.protocol)) {
2939 spin_lock_irqsave(ap->lock, flags); 2939 spin_lock_irqsave(ap->lock, flags);
2940 ap->ops->bmdma_stop(qc); 2940 ap->ops->bmdma_stop(qc);
2941 spin_unlock_irqrestore(ap->lock, flags); 2941 spin_unlock_irqrestore(ap->lock, flags);
2942 } 2942 }
2943 } 2943 }
2944 EXPORT_SYMBOL_GPL(ata_bmdma_post_internal_cmd); 2944 EXPORT_SYMBOL_GPL(ata_bmdma_post_internal_cmd);
2945 2945
2946 /** 2946 /**
2947 * ata_bmdma_irq_clear - Clear PCI IDE BMDMA interrupt. 2947 * ata_bmdma_irq_clear - Clear PCI IDE BMDMA interrupt.
2948 * @ap: Port associated with this ATA transaction. 2948 * @ap: Port associated with this ATA transaction.
2949 * 2949 *
2950 * Clear interrupt and error flags in DMA status register. 2950 * Clear interrupt and error flags in DMA status register.
2951 * 2951 *
2952 * May be used as the irq_clear() entry in ata_port_operations. 2952 * May be used as the irq_clear() entry in ata_port_operations.
2953 * 2953 *
2954 * LOCKING: 2954 * LOCKING:
2955 * spin_lock_irqsave(host lock) 2955 * spin_lock_irqsave(host lock)
2956 */ 2956 */
2957 void ata_bmdma_irq_clear(struct ata_port *ap) 2957 void ata_bmdma_irq_clear(struct ata_port *ap)
2958 { 2958 {
2959 void __iomem *mmio = ap->ioaddr.bmdma_addr; 2959 void __iomem *mmio = ap->ioaddr.bmdma_addr;
2960 2960
2961 if (!mmio) 2961 if (!mmio)
2962 return; 2962 return;
2963 2963
2964 iowrite8(ioread8(mmio + ATA_DMA_STATUS), mmio + ATA_DMA_STATUS); 2964 iowrite8(ioread8(mmio + ATA_DMA_STATUS), mmio + ATA_DMA_STATUS);
2965 } 2965 }
2966 EXPORT_SYMBOL_GPL(ata_bmdma_irq_clear); 2966 EXPORT_SYMBOL_GPL(ata_bmdma_irq_clear);
2967 2967
2968 /** 2968 /**
2969 * ata_bmdma_setup - Set up PCI IDE BMDMA transaction 2969 * ata_bmdma_setup - Set up PCI IDE BMDMA transaction
2970 * @qc: Info associated with this ATA transaction. 2970 * @qc: Info associated with this ATA transaction.
2971 * 2971 *
2972 * LOCKING: 2972 * LOCKING:
2973 * spin_lock_irqsave(host lock) 2973 * spin_lock_irqsave(host lock)
2974 */ 2974 */
2975 void ata_bmdma_setup(struct ata_queued_cmd *qc) 2975 void ata_bmdma_setup(struct ata_queued_cmd *qc)
2976 { 2976 {
2977 struct ata_port *ap = qc->ap; 2977 struct ata_port *ap = qc->ap;
2978 unsigned int rw = (qc->tf.flags & ATA_TFLAG_WRITE); 2978 unsigned int rw = (qc->tf.flags & ATA_TFLAG_WRITE);
2979 u8 dmactl; 2979 u8 dmactl;
2980 2980
2981 /* load PRD table addr. */ 2981 /* load PRD table addr. */
2982 mb(); /* make sure PRD table writes are visible to controller */ 2982 mb(); /* make sure PRD table writes are visible to controller */
2983 iowrite32(ap->bmdma_prd_dma, ap->ioaddr.bmdma_addr + ATA_DMA_TABLE_OFS); 2983 iowrite32(ap->bmdma_prd_dma, ap->ioaddr.bmdma_addr + ATA_DMA_TABLE_OFS);
2984 2984
2985 /* specify data direction, triple-check start bit is clear */ 2985 /* specify data direction, triple-check start bit is clear */
2986 dmactl = ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_CMD); 2986 dmactl = ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
2987 dmactl &= ~(ATA_DMA_WR | ATA_DMA_START); 2987 dmactl &= ~(ATA_DMA_WR | ATA_DMA_START);
2988 if (!rw) 2988 if (!rw)
2989 dmactl |= ATA_DMA_WR; 2989 dmactl |= ATA_DMA_WR;
2990 iowrite8(dmactl, ap->ioaddr.bmdma_addr + ATA_DMA_CMD); 2990 iowrite8(dmactl, ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
2991 2991
2992 /* issue r/w command */ 2992 /* issue r/w command */
2993 ap->ops->sff_exec_command(ap, &qc->tf); 2993 ap->ops->sff_exec_command(ap, &qc->tf);
2994 } 2994 }
2995 EXPORT_SYMBOL_GPL(ata_bmdma_setup); 2995 EXPORT_SYMBOL_GPL(ata_bmdma_setup);
2996 2996
2997 /** 2997 /**
2998 * ata_bmdma_start - Start a PCI IDE BMDMA transaction 2998 * ata_bmdma_start - Start a PCI IDE BMDMA transaction
2999 * @qc: Info associated with this ATA transaction. 2999 * @qc: Info associated with this ATA transaction.
3000 * 3000 *
3001 * LOCKING: 3001 * LOCKING:
3002 * spin_lock_irqsave(host lock) 3002 * spin_lock_irqsave(host lock)
3003 */ 3003 */
3004 void ata_bmdma_start(struct ata_queued_cmd *qc) 3004 void ata_bmdma_start(struct ata_queued_cmd *qc)
3005 { 3005 {
3006 struct ata_port *ap = qc->ap; 3006 struct ata_port *ap = qc->ap;
3007 u8 dmactl; 3007 u8 dmactl;
3008 3008
3009 /* start host DMA transaction */ 3009 /* start host DMA transaction */
3010 dmactl = ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_CMD); 3010 dmactl = ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
3011 iowrite8(dmactl | ATA_DMA_START, ap->ioaddr.bmdma_addr + ATA_DMA_CMD); 3011 iowrite8(dmactl | ATA_DMA_START, ap->ioaddr.bmdma_addr + ATA_DMA_CMD);
3012 3012
3013 /* Strictly, one may wish to issue an ioread8() here, to 3013 /* Strictly, one may wish to issue an ioread8() here, to
3014 * flush the mmio write. However, control also passes 3014 * flush the mmio write. However, control also passes
3015 * to the hardware at this point, and it will interrupt 3015 * to the hardware at this point, and it will interrupt
3016 * us when we are to resume control. So, in effect, 3016 * us when we are to resume control. So, in effect,
3017 * we don't care when the mmio write flushes. 3017 * we don't care when the mmio write flushes.
3018 * Further, a read of the DMA status register _immediately_ 3018 * Further, a read of the DMA status register _immediately_
3019 * following the write may not be what certain flaky hardware 3019 * following the write may not be what certain flaky hardware
3020 * is expecting, so I think it is best to not add a readb() 3020 * is expecting, so I think it is best to not add a readb()
3021 * without first checking all the MMIO ATA cards/mobos. 3021 * without first checking all the MMIO ATA cards/mobos.
3022 * Or maybe I'm just being paranoid. 3022 * Or maybe I'm just being paranoid.
3023 * 3023 *
3024 * FIXME: The posting of this write means I/O starts are 3024 * FIXME: The posting of this write means I/O starts are
3025 * unnecessarily delayed for MMIO 3025 * unnecessarily delayed for MMIO
3026 */ 3026 */
3027 } 3027 }
3028 EXPORT_SYMBOL_GPL(ata_bmdma_start); 3028 EXPORT_SYMBOL_GPL(ata_bmdma_start);
3029 3029
3030 /** 3030 /**
3031 * ata_bmdma_stop - Stop PCI IDE BMDMA transfer 3031 * ata_bmdma_stop - Stop PCI IDE BMDMA transfer
3032 * @qc: Command we are ending DMA for 3032 * @qc: Command we are ending DMA for
3033 * 3033 *
3034 * Clears the ATA_DMA_START flag in the dma control register 3034 * Clears the ATA_DMA_START flag in the dma control register
3035 * 3035 *
3036 * May be used as the bmdma_stop() entry in ata_port_operations. 3036 * May be used as the bmdma_stop() entry in ata_port_operations.
3037 * 3037 *
3038 * LOCKING: 3038 * LOCKING:
3039 * spin_lock_irqsave(host lock) 3039 * spin_lock_irqsave(host lock)
3040 */ 3040 */
3041 void ata_bmdma_stop(struct ata_queued_cmd *qc) 3041 void ata_bmdma_stop(struct ata_queued_cmd *qc)
3042 { 3042 {
3043 struct ata_port *ap = qc->ap; 3043 struct ata_port *ap = qc->ap;
3044 void __iomem *mmio = ap->ioaddr.bmdma_addr; 3044 void __iomem *mmio = ap->ioaddr.bmdma_addr;
3045 3045
3046 /* clear start/stop bit */ 3046 /* clear start/stop bit */
3047 iowrite8(ioread8(mmio + ATA_DMA_CMD) & ~ATA_DMA_START, 3047 iowrite8(ioread8(mmio + ATA_DMA_CMD) & ~ATA_DMA_START,
3048 mmio + ATA_DMA_CMD); 3048 mmio + ATA_DMA_CMD);
3049 3049
3050 /* one-PIO-cycle guaranteed wait, per spec, for HDMA1:0 transition */ 3050 /* one-PIO-cycle guaranteed wait, per spec, for HDMA1:0 transition */
3051 ata_sff_dma_pause(ap); 3051 ata_sff_dma_pause(ap);
3052 } 3052 }
3053 EXPORT_SYMBOL_GPL(ata_bmdma_stop); 3053 EXPORT_SYMBOL_GPL(ata_bmdma_stop);
3054 3054
3055 /** 3055 /**
3056 * ata_bmdma_status - Read PCI IDE BMDMA status 3056 * ata_bmdma_status - Read PCI IDE BMDMA status
3057 * @ap: Port associated with this ATA transaction. 3057 * @ap: Port associated with this ATA transaction.
3058 * 3058 *
3059 * Read and return BMDMA status register. 3059 * Read and return BMDMA status register.
3060 * 3060 *
3061 * May be used as the bmdma_status() entry in ata_port_operations. 3061 * May be used as the bmdma_status() entry in ata_port_operations.
3062 * 3062 *
3063 * LOCKING: 3063 * LOCKING:
3064 * spin_lock_irqsave(host lock) 3064 * spin_lock_irqsave(host lock)
3065 */ 3065 */
3066 u8 ata_bmdma_status(struct ata_port *ap) 3066 u8 ata_bmdma_status(struct ata_port *ap)
3067 { 3067 {
3068 return ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_STATUS); 3068 return ioread8(ap->ioaddr.bmdma_addr + ATA_DMA_STATUS);
3069 } 3069 }
3070 EXPORT_SYMBOL_GPL(ata_bmdma_status); 3070 EXPORT_SYMBOL_GPL(ata_bmdma_status);
3071 3071
3072 3072
3073 /** 3073 /**
3074 * ata_bmdma_port_start - Set port up for bmdma. 3074 * ata_bmdma_port_start - Set port up for bmdma.
3075 * @ap: Port to initialize 3075 * @ap: Port to initialize
3076 * 3076 *
3077 * Called just after data structures for each port are 3077 * Called just after data structures for each port are
3078 * initialized. Allocates space for PRD table. 3078 * initialized. Allocates space for PRD table.
3079 * 3079 *
3080 * May be used as the port_start() entry in ata_port_operations. 3080 * May be used as the port_start() entry in ata_port_operations.
3081 * 3081 *
3082 * LOCKING: 3082 * LOCKING:
3083 * Inherited from caller. 3083 * Inherited from caller.
3084 */ 3084 */
3085 int ata_bmdma_port_start(struct ata_port *ap) 3085 int ata_bmdma_port_start(struct ata_port *ap)
3086 { 3086 {
3087 if (ap->mwdma_mask || ap->udma_mask) { 3087 if (ap->mwdma_mask || ap->udma_mask) {
3088 ap->bmdma_prd = 3088 ap->bmdma_prd =
3089 dmam_alloc_coherent(ap->host->dev, ATA_PRD_TBL_SZ, 3089 dmam_alloc_coherent(ap->host->dev, ATA_PRD_TBL_SZ,
3090 &ap->bmdma_prd_dma, GFP_KERNEL); 3090 &ap->bmdma_prd_dma, GFP_KERNEL);
3091 if (!ap->bmdma_prd) 3091 if (!ap->bmdma_prd)
3092 return -ENOMEM; 3092 return -ENOMEM;
3093 } 3093 }
3094 3094
3095 return 0; 3095 return 0;
3096 } 3096 }
3097 EXPORT_SYMBOL_GPL(ata_bmdma_port_start); 3097 EXPORT_SYMBOL_GPL(ata_bmdma_port_start);
3098 3098
3099 /** 3099 /**
3100 * ata_bmdma_port_start32 - Set port up for dma. 3100 * ata_bmdma_port_start32 - Set port up for dma.
3101 * @ap: Port to initialize 3101 * @ap: Port to initialize
3102 * 3102 *
3103 * Called just after data structures for each port are 3103 * Called just after data structures for each port are
3104 * initialized. Enables 32bit PIO and allocates space for PRD 3104 * initialized. Enables 32bit PIO and allocates space for PRD
3105 * table. 3105 * table.
3106 * 3106 *
3107 * May be used as the port_start() entry in ata_port_operations for 3107 * May be used as the port_start() entry in ata_port_operations for
3108 * devices that are capable of 32bit PIO. 3108 * devices that are capable of 32bit PIO.
3109 * 3109 *
3110 * LOCKING: 3110 * LOCKING:
3111 * Inherited from caller. 3111 * Inherited from caller.
3112 */ 3112 */
3113 int ata_bmdma_port_start32(struct ata_port *ap) 3113 int ata_bmdma_port_start32(struct ata_port *ap)
3114 { 3114 {
3115 ap->pflags |= ATA_PFLAG_PIO32 | ATA_PFLAG_PIO32CHANGE; 3115 ap->pflags |= ATA_PFLAG_PIO32 | ATA_PFLAG_PIO32CHANGE;
3116 return ata_bmdma_port_start(ap); 3116 return ata_bmdma_port_start(ap);
3117 } 3117 }
3118 EXPORT_SYMBOL_GPL(ata_bmdma_port_start32); 3118 EXPORT_SYMBOL_GPL(ata_bmdma_port_start32);
3119 3119
3120 #ifdef CONFIG_PCI 3120 #ifdef CONFIG_PCI
3121 3121
3122 /** 3122 /**
3123 * ata_pci_bmdma_clear_simplex - attempt to kick device out of simplex 3123 * ata_pci_bmdma_clear_simplex - attempt to kick device out of simplex
3124 * @pdev: PCI device 3124 * @pdev: PCI device
3125 * 3125 *
3126 * Some PCI ATA devices report simplex mode but in fact can be told to 3126 * Some PCI ATA devices report simplex mode but in fact can be told to
3127 * enter non-simplex mode. This implements the necessary logic to 3127 * enter non-simplex mode. This implements the necessary logic to
3128 * perform the task on such devices. Calling it on other devices will 3128 * perform the task on such devices. Calling it on other devices will
3129 * have -undefined- behaviour. 3129 * have -undefined- behaviour.
3130 */ 3130 */
3131 int ata_pci_bmdma_clear_simplex(struct pci_dev *pdev) 3131 int ata_pci_bmdma_clear_simplex(struct pci_dev *pdev)
3132 { 3132 {
3133 unsigned long bmdma = pci_resource_start(pdev, 4); 3133 unsigned long bmdma = pci_resource_start(pdev, 4);
3134 u8 simplex; 3134 u8 simplex;
3135 3135
3136 if (bmdma == 0) 3136 if (bmdma == 0)
3137 return -ENOENT; 3137 return -ENOENT;
3138 3138
3139 simplex = inb(bmdma + 0x02); 3139 simplex = inb(bmdma + 0x02);
3140 outb(simplex & 0x60, bmdma + 0x02); 3140 outb(simplex & 0x60, bmdma + 0x02);
3141 simplex = inb(bmdma + 0x02); 3141 simplex = inb(bmdma + 0x02);
3142 if (simplex & 0x80) 3142 if (simplex & 0x80)
3143 return -EOPNOTSUPP; 3143 return -EOPNOTSUPP;
3144 return 0; 3144 return 0;
3145 } 3145 }
3146 EXPORT_SYMBOL_GPL(ata_pci_bmdma_clear_simplex); 3146 EXPORT_SYMBOL_GPL(ata_pci_bmdma_clear_simplex);
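A hedged sketch of how a probe path might use this helper: if the controller can be kicked out of simplex mode, the host no longer needs the ATA_HOST_SIMPLEX restriction (whether doing so is safe is strictly device-specific, per the warning above). The function name is illustrative.

        static void pata_example_fixup_simplex(struct pci_dev *pdev,
                                               struct ata_host *host)
        {
                /* try to leave simplex mode; on success both ports may DMA */
                if (ata_pci_bmdma_clear_simplex(pdev) == 0)
                        host->flags &= ~ATA_HOST_SIMPLEX;
        }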
3147 3147
3148 static void ata_bmdma_nodma(struct ata_host *host, const char *reason) 3148 static void ata_bmdma_nodma(struct ata_host *host, const char *reason)
3149 { 3149 {
3150 int i; 3150 int i;
3151 3151
3152 dev_printk(KERN_ERR, host->dev, "BMDMA: %s, falling back to PIO\n", 3152 dev_printk(KERN_ERR, host->dev, "BMDMA: %s, falling back to PIO\n",
3153 reason); 3153 reason);
3154 3154
3155 for (i = 0; i < 2; i++) { 3155 for (i = 0; i < 2; i++) {
3156 host->ports[i]->mwdma_mask = 0; 3156 host->ports[i]->mwdma_mask = 0;
3157 host->ports[i]->udma_mask = 0; 3157 host->ports[i]->udma_mask = 0;
3158 } 3158 }
3159 } 3159 }
3160 3160
3161 /** 3161 /**
3162 * ata_pci_bmdma_init - acquire PCI BMDMA resources and init ATA host 3162 * ata_pci_bmdma_init - acquire PCI BMDMA resources and init ATA host
3163 * @host: target ATA host 3163 * @host: target ATA host
3164 * 3164 *
3165 * Acquire PCI BMDMA resources and initialize @host accordingly. 3165 * Acquire PCI BMDMA resources and initialize @host accordingly.
3166 * 3166 *
3167 * LOCKING: 3167 * LOCKING:
3168 * Inherited from calling layer (may sleep). 3168 * Inherited from calling layer (may sleep).
3169 */ 3169 */
3170 void ata_pci_bmdma_init(struct ata_host *host) 3170 void ata_pci_bmdma_init(struct ata_host *host)
3171 { 3171 {
3172 struct device *gdev = host->dev; 3172 struct device *gdev = host->dev;
3173 struct pci_dev *pdev = to_pci_dev(gdev); 3173 struct pci_dev *pdev = to_pci_dev(gdev);
3174 int i, rc; 3174 int i, rc;
3175 3175
3176 /* No BAR4 allocation: No DMA */ 3176 /* No BAR4 allocation: No DMA */
3177 if (pci_resource_start(pdev, 4) == 0) { 3177 if (pci_resource_start(pdev, 4) == 0) {
3178 ata_bmdma_nodma(host, "BAR4 is zero"); 3178 ata_bmdma_nodma(host, "BAR4 is zero");
3179 return; 3179 return;
3180 } 3180 }
3181 3181
3182 /* 3182 /*
3183 * Some controllers require the BMDMA region to be initialized 3183 * Some controllers require the BMDMA region to be initialized
3184 * even if DMA is not in use, to clear IRQ status via 3184 * even if DMA is not in use, to clear IRQ status via
3185 * the ->sff_irq_clear method. Try to initialize bmdma_addr 3185 * the ->sff_irq_clear method. Try to initialize bmdma_addr
3186 * regardless of dma masks. 3186 * regardless of dma masks.
3187 */ 3187 */
3188 rc = pci_set_dma_mask(pdev, ATA_DMA_MASK); 3188 rc = pci_set_dma_mask(pdev, ATA_DMA_MASK);
3189 if (rc) 3189 if (rc)
3190 ata_bmdma_nodma(host, "failed to set dma mask"); 3190 ata_bmdma_nodma(host, "failed to set dma mask");
3191 if (!rc) { 3191 if (!rc) {
3192 rc = pci_set_consistent_dma_mask(pdev, ATA_DMA_MASK); 3192 rc = pci_set_consistent_dma_mask(pdev, ATA_DMA_MASK);
3193 if (rc) 3193 if (rc)
3194 ata_bmdma_nodma(host, 3194 ata_bmdma_nodma(host,
3195 "failed to set consistent dma mask"); 3195 "failed to set consistent dma mask");
3196 } 3196 }
3197 3197
3198 /* request and iomap DMA region */ 3198 /* request and iomap DMA region */
3199 rc = pcim_iomap_regions(pdev, 1 << 4, dev_driver_string(gdev)); 3199 rc = pcim_iomap_regions(pdev, 1 << 4, dev_driver_string(gdev));
3200 if (rc) { 3200 if (rc) {
3201 ata_bmdma_nodma(host, "failed to request/iomap BAR4"); 3201 ata_bmdma_nodma(host, "failed to request/iomap BAR4");
3202 return; 3202 return;
3203 } 3203 }
3204 host->iomap = pcim_iomap_table(pdev); 3204 host->iomap = pcim_iomap_table(pdev);
3205 3205
3206 for (i = 0; i < 2; i++) { 3206 for (i = 0; i < 2; i++) {
3207 struct ata_port *ap = host->ports[i]; 3207 struct ata_port *ap = host->ports[i];
3208 void __iomem *bmdma = host->iomap[4] + 8 * i; 3208 void __iomem *bmdma = host->iomap[4] + 8 * i;
3209 3209
3210 if (ata_port_is_dummy(ap)) 3210 if (ata_port_is_dummy(ap))
3211 continue; 3211 continue;
3212 3212
3213 ap->ioaddr.bmdma_addr = bmdma; 3213 ap->ioaddr.bmdma_addr = bmdma;
3214 if ((!(ap->flags & ATA_FLAG_IGN_SIMPLEX)) && 3214 if ((!(ap->flags & ATA_FLAG_IGN_SIMPLEX)) &&
3215 (ioread8(bmdma + 2) & 0x80)) 3215 (ioread8(bmdma + 2) & 0x80))
3216 host->flags |= ATA_HOST_SIMPLEX; 3216 host->flags |= ATA_HOST_SIMPLEX;
3217 3217
3218 ata_port_desc(ap, "bmdma 0x%llx", 3218 ata_port_desc(ap, "bmdma 0x%llx",
3219 (unsigned long long)pci_resource_start(pdev, 4) + 8 * i); 3219 (unsigned long long)pci_resource_start(pdev, 4) + 8 * i);
3220 } 3220 }
3221 } 3221 }
3222 EXPORT_SYMBOL_GPL(ata_pci_bmdma_init); 3222 EXPORT_SYMBOL_GPL(ata_pci_bmdma_init);
3223 3223
3224 /** 3224 /**
3225 * ata_pci_bmdma_prepare_host - helper to prepare PCI BMDMA ATA host 3225 * ata_pci_bmdma_prepare_host - helper to prepare PCI BMDMA ATA host
3226 * @pdev: target PCI device 3226 * @pdev: target PCI device
3227 * @ppi: array of port_info, must be enough for two ports 3227 * @ppi: array of port_info, must be enough for two ports
3228 * @r_host: out argument for the initialized ATA host 3228 * @r_host: out argument for the initialized ATA host
3229 * 3229 *
3230 * Helper to allocate BMDMA ATA host for @pdev, acquire all PCI 3230 * Helper to allocate BMDMA ATA host for @pdev, acquire all PCI
3231 * resources and initialize it accordingly in one go. 3231 * resources and initialize it accordingly in one go.
3232 * 3232 *
3233 * LOCKING: 3233 * LOCKING:
3234 * Inherited from calling layer (may sleep). 3234 * Inherited from calling layer (may sleep).
3235 * 3235 *
3236 * RETURNS: 3236 * RETURNS:
3237 * 0 on success, -errno otherwise. 3237 * 0 on success, -errno otherwise.
3238 */ 3238 */
3239 int ata_pci_bmdma_prepare_host(struct pci_dev *pdev, 3239 int ata_pci_bmdma_prepare_host(struct pci_dev *pdev,
3240 const struct ata_port_info * const * ppi, 3240 const struct ata_port_info * const * ppi,
3241 struct ata_host **r_host) 3241 struct ata_host **r_host)
3242 { 3242 {
3243 int rc; 3243 int rc;
3244 3244
3245 rc = ata_pci_sff_prepare_host(pdev, ppi, r_host); 3245 rc = ata_pci_sff_prepare_host(pdev, ppi, r_host);
3246 if (rc) 3246 if (rc)
3247 return rc; 3247 return rc;
3248 3248
3249 ata_pci_bmdma_init(*r_host); 3249 ata_pci_bmdma_init(*r_host);
3250 return 0; 3250 return 0;
3251 } 3251 }
3252 EXPORT_SYMBOL_GPL(ata_pci_bmdma_prepare_host); 3252 EXPORT_SYMBOL_GPL(ata_pci_bmdma_prepare_host);
3253 3253
3254 /** 3254 /**
3255 * ata_pci_bmdma_init_one - Initialize/register BMDMA PCI IDE controller 3255 * ata_pci_bmdma_init_one - Initialize/register BMDMA PCI IDE controller
3256 * @pdev: Controller to be initialized 3256 * @pdev: Controller to be initialized
3257 * @ppi: array of port_info, must be enough for two ports 3257 * @ppi: array of port_info, must be enough for two ports
3258 * @sht: scsi_host_template to use when registering the host 3258 * @sht: scsi_host_template to use when registering the host
3259 * @host_priv: host private_data 3259 * @host_priv: host private_data
3260 * @hflags: host flags 3260 * @hflags: host flags
3261 * 3261 *
3262 * This function is similar to ata_pci_sff_init_one() but also 3262 * This function is similar to ata_pci_sff_init_one() but also
3263 * takes care of BMDMA initialization. 3263 * takes care of BMDMA initialization.
3264 * 3264 *
3265 * LOCKING: 3265 * LOCKING:
3266 * Inherited from PCI layer (may sleep). 3266 * Inherited from PCI layer (may sleep).
3267 * 3267 *
3268 * RETURNS: 3268 * RETURNS:
3269 * Zero on success, negative errno-based value on error. 3269 * Zero on success, negative errno-based value on error.
3270 */ 3270 */
3271 int ata_pci_bmdma_init_one(struct pci_dev *pdev, 3271 int ata_pci_bmdma_init_one(struct pci_dev *pdev,
3272 const struct ata_port_info * const * ppi, 3272 const struct ata_port_info * const * ppi,
3273 struct scsi_host_template *sht, void *host_priv, 3273 struct scsi_host_template *sht, void *host_priv,
3274 int hflags) 3274 int hflags)
3275 { 3275 {
3276 struct device *dev = &pdev->dev; 3276 struct device *dev = &pdev->dev;
3277 const struct ata_port_info *pi; 3277 const struct ata_port_info *pi;
3278 struct ata_host *host = NULL; 3278 struct ata_host *host = NULL;
3279 int rc; 3279 int rc;
3280 3280
3281 DPRINTK("ENTER\n"); 3281 DPRINTK("ENTER\n");
3282 3282
3283 pi = ata_sff_find_valid_pi(ppi); 3283 pi = ata_sff_find_valid_pi(ppi);
3284 if (!pi) { 3284 if (!pi) {
3285 dev_printk(KERN_ERR, &pdev->dev, 3285 dev_printk(KERN_ERR, &pdev->dev,
3286 "no valid port_info specified\n"); 3286 "no valid port_info specified\n");
3287 return -EINVAL; 3287 return -EINVAL;
3288 } 3288 }
3289 3289
3290 if (!devres_open_group(dev, NULL, GFP_KERNEL)) 3290 if (!devres_open_group(dev, NULL, GFP_KERNEL))
3291 return -ENOMEM; 3291 return -ENOMEM;
3292 3292
3293 rc = pcim_enable_device(pdev); 3293 rc = pcim_enable_device(pdev);
3294 if (rc) 3294 if (rc)
3295 goto out; 3295 goto out;
3296 3296
3297 /* prepare and activate BMDMA host */ 3297 /* prepare and activate BMDMA host */
3298 rc = ata_pci_bmdma_prepare_host(pdev, ppi, &host); 3298 rc = ata_pci_bmdma_prepare_host(pdev, ppi, &host);
3299 if (rc) 3299 if (rc)
3300 goto out; 3300 goto out;
3301 host->private_data = host_priv; 3301 host->private_data = host_priv;
3302 host->flags |= hflags; 3302 host->flags |= hflags;
3303 3303
3304 pci_set_master(pdev); 3304 pci_set_master(pdev);
3305 rc = ata_pci_sff_activate_host(host, ata_bmdma_interrupt, sht); 3305 rc = ata_pci_sff_activate_host(host, ata_bmdma_interrupt, sht);
3306 out: 3306 out:
3307 if (rc == 0) 3307 if (rc == 0)
3308 devres_remove_group(&pdev->dev, NULL); 3308 devres_remove_group(&pdev->dev, NULL);
3309 else 3309 else
3310 devres_release_group(&pdev->dev, NULL); 3310 devres_release_group(&pdev->dev, NULL);
3311 3311
3312 return rc; 3312 return rc;
3313 } 3313 }
3314 EXPORT_SYMBOL_GPL(ata_pci_bmdma_init_one); 3314 EXPORT_SYMBOL_GPL(ata_pci_bmdma_init_one);
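The BMDMA counterpart of the earlier PIO-only sketch: a hypothetical driver whose hardware needs no extra setup can defer its whole probe to ata_pci_bmdma_init_one(). The identifiers and the timing masks below are illustrative, not taken from this diff.

        static struct scsi_host_template pata_example_bmdma_sht = {
                ATA_BMDMA_SHT("pata_example"),
        };

        static const struct ata_port_info pata_example_bmdma_info = {
                .flags = ATA_FLAG_SLAVE_POSS,
                .pio_mask = ATA_PIO4,
                .mwdma_mask = ATA_MWDMA2,
                .udma_mask = ATA_UDMA5,
                .port_ops = &ata_bmdma_port_ops,
        };

        static int pata_example_bmdma_init_one(struct pci_dev *pdev,
                                               const struct pci_device_id *id)
        {
                const struct ata_port_info *ppi[] = { &pata_example_bmdma_info, NULL };

                /* enables the device, sets up BMDMA resources and registers the host */
                return ata_pci_bmdma_init_one(pdev, ppi, &pata_example_bmdma_sht,
                                              NULL, 0);
        }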
3315 3315
3316 #endif /* CONFIG_PCI */ 3316 #endif /* CONFIG_PCI */
3317 #endif /* CONFIG_ATA_BMDMA */ 3317 #endif /* CONFIG_ATA_BMDMA */
3318 3318
3319 /** 3319 /**
3320 * ata_sff_port_init - Initialize SFF/BMDMA ATA port 3320 * ata_sff_port_init - Initialize SFF/BMDMA ATA port
3321 * @ap: Port to initialize 3321 * @ap: Port to initialize
3322 * 3322 *
3323 * Called on port allocation to initialize SFF/BMDMA specific 3323 * Called on port allocation to initialize SFF/BMDMA specific
3324 * fields. 3324 * fields.
3325 * 3325 *
3326 * LOCKING: 3326 * LOCKING:
3327 * None. 3327 * None.
3328 */ 3328 */
3329 void ata_sff_port_init(struct ata_port *ap) 3329 void ata_sff_port_init(struct ata_port *ap)
3330 { 3330 {
3331 INIT_DELAYED_WORK(&ap->sff_pio_task, ata_sff_pio_task); 3331 INIT_DELAYED_WORK(&ap->sff_pio_task, ata_sff_pio_task);
3332 ap->ctl = ATA_DEVCTL_OBS; 3332 ap->ctl = ATA_DEVCTL_OBS;
3333 ap->last_ctl = 0xFF; 3333 ap->last_ctl = 0xFF;
3334 } 3334 }
3335 3335
3336 int __init ata_sff_init(void) 3336 int __init ata_sff_init(void)
3337 { 3337 {
3338 ata_sff_wq = alloc_workqueue("ata_sff", WQ_RESCUER, WQ_MAX_ACTIVE); 3338 ata_sff_wq = alloc_workqueue("ata_sff", WQ_MEM_RECLAIM, WQ_MAX_ACTIVE);
3339 if (!ata_sff_wq) 3339 if (!ata_sff_wq)
3340 return -ENOMEM; 3340 return -ENOMEM;
3341 3341
3342 return 0; 3342 return 0;
3343 } 3343 }
3344 3344
3345 void ata_sff_exit(void) 3345 void ata_sff_exit(void)
3346 { 3346 {
3347 destroy_workqueue(ata_sff_wq); 3347 destroy_workqueue(ata_sff_wq);
3348 } 3348 }
3349 3349
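The only functional change in the hunk above is WQ_RESCUER being replaced by WQ_MEM_RECLAIM in ata_sff_init(), matching the "workqueue: add and use WQ_MEM_RECLAIM flag" patch in this merge. The general pattern for a workqueue that must keep making forward progress under memory pressure is sketched below with illustrative names:

        static struct workqueue_struct *example_wq;

        static int __init example_init(void)
        {
                /* WQ_MEM_RECLAIM reserves a rescuer so work queued here can
                 * still run while the system is reclaiming memory */
                example_wq = alloc_workqueue("example", WQ_MEM_RECLAIM,
                                             WQ_MAX_ACTIVE);
                if (!example_wq)
                        return -ENOMEM;
                return 0;
        }

        static void example_exit(void)
        {
                destroy_workqueue(example_wq);
        }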
drivers/isdn/hardware/eicon/divasmain.c
1 /* $Id: divasmain.c,v 1.55.4.6 2005/02/09 19:28:20 armin Exp $ 1 /* $Id: divasmain.c,v 1.55.4.6 2005/02/09 19:28:20 armin Exp $
2 * 2 *
3 * Low level driver for Eicon DIVA Server ISDN cards. 3 * Low level driver for Eicon DIVA Server ISDN cards.
4 * 4 *
5 * Copyright 2000-2003 by Armin Schindler (mac@melware.de) 5 * Copyright 2000-2003 by Armin Schindler (mac@melware.de)
6 * Copyright 2000-2003 Cytronics & Melware (info@melware.de) 6 * Copyright 2000-2003 Cytronics & Melware (info@melware.de)
7 * 7 *
8 * This software may be used and distributed according to the terms 8 * This software may be used and distributed according to the terms
9 * of the GNU General Public License, incorporated herein by reference. 9 * of the GNU General Public License, incorporated herein by reference.
10 */ 10 */
11 11
12 #include <linux/module.h> 12 #include <linux/module.h>
13 #include <linux/init.h> 13 #include <linux/init.h>
14 #include <linux/kernel.h> 14 #include <linux/kernel.h>
15 #include <asm/uaccess.h> 15 #include <asm/uaccess.h>
16 #include <asm/io.h> 16 #include <asm/io.h>
17 #include <linux/ioport.h> 17 #include <linux/ioport.h>
18 #include <linux/workqueue.h>
19 #include <linux/pci.h> 18 #include <linux/pci.h>
20 #include <linux/interrupt.h> 19 #include <linux/interrupt.h>
21 #include <linux/list.h> 20 #include <linux/list.h>
22 #include <linux/poll.h> 21 #include <linux/poll.h>
23 #include <linux/kmod.h> 22 #include <linux/kmod.h>
24 23
25 #include "platform.h" 24 #include "platform.h"
26 #undef ID_MASK 25 #undef ID_MASK
27 #undef N_DATA 26 #undef N_DATA
28 #include "pc.h" 27 #include "pc.h"
29 #include "di_defs.h" 28 #include "di_defs.h"
30 #include "divasync.h" 29 #include "divasync.h"
31 #include "diva.h" 30 #include "diva.h"
32 #include "di.h" 31 #include "di.h"
33 #include "io.h" 32 #include "io.h"
34 #include "xdi_msg.h" 33 #include "xdi_msg.h"
35 #include "xdi_adapter.h" 34 #include "xdi_adapter.h"
36 #include "xdi_vers.h" 35 #include "xdi_vers.h"
37 #include "diva_dma.h" 36 #include "diva_dma.h"
38 #include "diva_pci.h" 37 #include "diva_pci.h"
39 38
40 static char *main_revision = "$Revision: 1.55.4.6 $"; 39 static char *main_revision = "$Revision: 1.55.4.6 $";
41 40
42 static int major; 41 static int major;
43 42
44 static int dbgmask; 43 static int dbgmask;
45 44
46 MODULE_DESCRIPTION("Kernel driver for Eicon DIVA Server cards"); 45 MODULE_DESCRIPTION("Kernel driver for Eicon DIVA Server cards");
47 MODULE_AUTHOR("Cytronics & Melware, Eicon Networks"); 46 MODULE_AUTHOR("Cytronics & Melware, Eicon Networks");
48 MODULE_LICENSE("GPL"); 47 MODULE_LICENSE("GPL");
49 48
50 module_param(dbgmask, int, 0); 49 module_param(dbgmask, int, 0);
51 MODULE_PARM_DESC(dbgmask, "initial debug mask"); 50 MODULE_PARM_DESC(dbgmask, "initial debug mask");
52 51
53 static char *DRIVERNAME = 52 static char *DRIVERNAME =
54 "Eicon DIVA Server driver (http://www.melware.net)"; 53 "Eicon DIVA Server driver (http://www.melware.net)";
55 static char *DRIVERLNAME = "divas"; 54 static char *DRIVERLNAME = "divas";
56 static char *DEVNAME = "Divas"; 55 static char *DEVNAME = "Divas";
57 char *DRIVERRELEASE_DIVAS = "2.0"; 56 char *DRIVERRELEASE_DIVAS = "2.0";
58 57
59 extern irqreturn_t diva_os_irq_wrapper(int irq, void *context); 58 extern irqreturn_t diva_os_irq_wrapper(int irq, void *context);
60 extern int create_divas_proc(void); 59 extern int create_divas_proc(void);
61 extern void remove_divas_proc(void); 60 extern void remove_divas_proc(void);
62 extern void diva_get_vserial_number(PISDN_ADAPTER IoAdapter, char *buf); 61 extern void diva_get_vserial_number(PISDN_ADAPTER IoAdapter, char *buf);
63 extern int divasfunc_init(int dbgmask); 62 extern int divasfunc_init(int dbgmask);
64 extern void divasfunc_exit(void); 63 extern void divasfunc_exit(void);
65 64
66 typedef struct _diva_os_thread_dpc { 65 typedef struct _diva_os_thread_dpc {
67 struct tasklet_struct divas_task; 66 struct tasklet_struct divas_task;
68 diva_os_soft_isr_t *psoft_isr; 67 diva_os_soft_isr_t *psoft_isr;
69 } diva_os_thread_dpc_t; 68 } diva_os_thread_dpc_t;
70 69
71 /* -------------------------------------------------------------------------- 70 /* --------------------------------------------------------------------------
72 PCI driver interface section 71 PCI driver interface section
73 -------------------------------------------------------------------------- */ 72 -------------------------------------------------------------------------- */
74 /* 73 /*
75 vendor, device Vendor and device ID to match (or PCI_ANY_ID) 74 vendor, device Vendor and device ID to match (or PCI_ANY_ID)
76 subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID) 75 subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID)
77 subdevice 76 subdevice
78 class, Device class to match. The class_mask tells which bits 77 class, Device class to match. The class_mask tells which bits
79 class_mask of the class are honored during the comparison. 78 class_mask of the class are honored during the comparison.
80 driver_data Data private to the driver. 79 driver_data Data private to the driver.
81 */ 80 */
82 81
83 #if !defined(PCI_DEVICE_ID_EICON_MAESTRAP_2) 82 #if !defined(PCI_DEVICE_ID_EICON_MAESTRAP_2)
84 #define PCI_DEVICE_ID_EICON_MAESTRAP_2 0xE015 83 #define PCI_DEVICE_ID_EICON_MAESTRAP_2 0xE015
85 #endif 84 #endif
86 85
87 #if !defined(PCI_DEVICE_ID_EICON_4BRI_VOIP) 86 #if !defined(PCI_DEVICE_ID_EICON_4BRI_VOIP)
88 #define PCI_DEVICE_ID_EICON_4BRI_VOIP 0xE016 87 #define PCI_DEVICE_ID_EICON_4BRI_VOIP 0xE016
89 #endif 88 #endif
90 89
91 #if !defined(PCI_DEVICE_ID_EICON_4BRI_2_VOIP) 90 #if !defined(PCI_DEVICE_ID_EICON_4BRI_2_VOIP)
92 #define PCI_DEVICE_ID_EICON_4BRI_2_VOIP 0xE017 91 #define PCI_DEVICE_ID_EICON_4BRI_2_VOIP 0xE017
93 #endif 92 #endif
94 93
95 #if !defined(PCI_DEVICE_ID_EICON_BRI2M_2) 94 #if !defined(PCI_DEVICE_ID_EICON_BRI2M_2)
96 #define PCI_DEVICE_ID_EICON_BRI2M_2 0xE018 95 #define PCI_DEVICE_ID_EICON_BRI2M_2 0xE018
97 #endif 96 #endif
98 97
99 #if !defined(PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP) 98 #if !defined(PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP)
100 #define PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP 0xE019 99 #define PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP 0xE019
101 #endif 100 #endif
102 101
103 #if !defined(PCI_DEVICE_ID_EICON_2F) 102 #if !defined(PCI_DEVICE_ID_EICON_2F)
104 #define PCI_DEVICE_ID_EICON_2F 0xE01A 103 #define PCI_DEVICE_ID_EICON_2F 0xE01A
105 #endif 104 #endif
106 105
107 #if !defined(PCI_DEVICE_ID_EICON_BRI2M_2_VOIP) 106 #if !defined(PCI_DEVICE_ID_EICON_BRI2M_2_VOIP)
108 #define PCI_DEVICE_ID_EICON_BRI2M_2_VOIP 0xE01B 107 #define PCI_DEVICE_ID_EICON_BRI2M_2_VOIP 0xE01B
109 #endif 108 #endif
110 109
111 /* 110 /*
112 This table should be sorted by PCI device ID 111 This table should be sorted by PCI device ID
113 */ 112 */
114 static struct pci_device_id divas_pci_tbl[] = { 113 static struct pci_device_id divas_pci_tbl[] = {
115 /* Diva Server BRI-2M PCI 0xE010 */ 114 /* Diva Server BRI-2M PCI 0xE010 */
116 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRA), 115 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRA),
117 CARDTYPE_MAESTRA_PCI }, 116 CARDTYPE_MAESTRA_PCI },
118 /* Diva Server 4BRI-8M PCI 0xE012 */ 117 /* Diva Server 4BRI-8M PCI 0xE012 */
119 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAQ), 118 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAQ),
120 CARDTYPE_DIVASRV_Q_8M_PCI }, 119 CARDTYPE_DIVASRV_Q_8M_PCI },
121 /* Diva Server 4BRI-8M 2.0 PCI 0xE013 */ 120 /* Diva Server 4BRI-8M 2.0 PCI 0xE013 */
122 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAQ_U), 121 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAQ_U),
123 CARDTYPE_DIVASRV_Q_8M_V2_PCI }, 122 CARDTYPE_DIVASRV_Q_8M_V2_PCI },
124 /* Diva Server PRI-30M PCI 0xE014 */ 123 /* Diva Server PRI-30M PCI 0xE014 */
125 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP), 124 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP),
126 CARDTYPE_DIVASRV_P_30M_PCI }, 125 CARDTYPE_DIVASRV_P_30M_PCI },
127 /* Diva Server PRI 2.0 adapter 0xE015 */ 126 /* Diva Server PRI 2.0 adapter 0xE015 */
128 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP_2), 127 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP_2),
129 CARDTYPE_DIVASRV_P_30M_V2_PCI }, 128 CARDTYPE_DIVASRV_P_30M_V2_PCI },
130 /* Diva Server Voice 4BRI-8M PCI 0xE016 */ 129 /* Diva Server Voice 4BRI-8M PCI 0xE016 */
131 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_4BRI_VOIP), 130 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_4BRI_VOIP),
132 CARDTYPE_DIVASRV_VOICE_Q_8M_PCI }, 131 CARDTYPE_DIVASRV_VOICE_Q_8M_PCI },
133 /* Diva Server Voice 4BRI-8M 2.0 PCI 0xE017 */ 132 /* Diva Server Voice 4BRI-8M 2.0 PCI 0xE017 */
134 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_4BRI_2_VOIP), 133 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_4BRI_2_VOIP),
135 CARDTYPE_DIVASRV_VOICE_Q_8M_V2_PCI }, 134 CARDTYPE_DIVASRV_VOICE_Q_8M_V2_PCI },
136 /* Diva Server BRI-2M 2.0 PCI 0xE018 */ 135 /* Diva Server BRI-2M 2.0 PCI 0xE018 */
137 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_BRI2M_2), 136 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_BRI2M_2),
138 CARDTYPE_DIVASRV_B_2M_V2_PCI }, 137 CARDTYPE_DIVASRV_B_2M_V2_PCI },
139 /* Diva Server Voice PRI 2.0 PCI 0xE019 */ 138 /* Diva Server Voice PRI 2.0 PCI 0xE019 */
140 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP), 139 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_MAESTRAP_2_VOIP),
141 CARDTYPE_DIVASRV_VOICE_P_30M_V2_PCI }, 140 CARDTYPE_DIVASRV_VOICE_P_30M_V2_PCI },
142 /* Diva Server 2FX 0xE01A */ 141 /* Diva Server 2FX 0xE01A */
143 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_2F), 142 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_2F),
144 CARDTYPE_DIVASRV_B_2F_PCI }, 143 CARDTYPE_DIVASRV_B_2F_PCI },
145 /* Diva Server Voice BRI-2M 2.0 PCI 0xE01B */ 144 /* Diva Server Voice BRI-2M 2.0 PCI 0xE01B */
146 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_BRI2M_2_VOIP), 145 { PCI_VDEVICE(EICON, PCI_DEVICE_ID_EICON_BRI2M_2_VOIP),
147 CARDTYPE_DIVASRV_VOICE_B_2M_V2_PCI }, 146 CARDTYPE_DIVASRV_VOICE_B_2M_V2_PCI },
148 { 0, } /* 0 terminated list. */ 147 { 0, } /* 0 terminated list. */
149 }; 148 };
150 MODULE_DEVICE_TABLE(pci, divas_pci_tbl); 149 MODULE_DEVICE_TABLE(pci, divas_pci_tbl);
151 150
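For reference, each PCI_VDEVICE(EICON, ...) entry in the table above is shorthand that fills the fields described in the comment before the table: vendor and device explicitly, subvendor/subdevice as PCI_ANY_ID, and class/class_mask as 0 (class ignored), leaving only driver_data to spell out. Spelled out long-hand, the first entry looks roughly like this (illustrative only; the real macro uses positional initializers, and CARDTYPE_MAESTRA_PCI comes from the driver's own headers):

#include <linux/pci.h>

static const struct pci_device_id example_tbl[] = {
	{ .vendor      = PCI_VENDOR_ID_EICON,
	  .device      = PCI_DEVICE_ID_EICON_MAESTRA,
	  .subvendor   = PCI_ANY_ID,		/* match any subsystem vendor */
	  .subdevice   = PCI_ANY_ID,		/* match any subsystem device */
	  .class       = 0,
	  .class_mask  = 0,			/* class code not compared */
	  .driver_data = CARDTYPE_MAESTRA_PCI },	/* driver-private card type */
	{ 0, }					/* zero-terminated list */
};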
152 static int divas_init_one(struct pci_dev *pdev, 151 static int divas_init_one(struct pci_dev *pdev,
153 const struct pci_device_id *ent); 152 const struct pci_device_id *ent);
154 static void __devexit divas_remove_one(struct pci_dev *pdev); 153 static void __devexit divas_remove_one(struct pci_dev *pdev);
155 154
156 static struct pci_driver diva_pci_driver = { 155 static struct pci_driver diva_pci_driver = {
157 .name = "divas", 156 .name = "divas",
158 .probe = divas_init_one, 157 .probe = divas_init_one,
159 .remove = __devexit_p(divas_remove_one), 158 .remove = __devexit_p(divas_remove_one),
160 .id_table = divas_pci_tbl, 159 .id_table = divas_pci_tbl,
161 }; 160 };
162 161
163 /********************************************************* 162 /*********************************************************
164 ** little helper functions 163 ** little helper functions
165 *********************************************************/ 164 *********************************************************/
166 static char *getrev(const char *revision) 165 static char *getrev(const char *revision)
167 { 166 {
168 char *rev; 167 char *rev;
169 char *p; 168 char *p;
170 if ((p = strchr(revision, ':'))) { 169 if ((p = strchr(revision, ':'))) {
171 rev = p + 2; 170 rev = p + 2;
172 p = strchr(rev, '$'); 171 p = strchr(rev, '$');
173 *--p = 0; 172 *--p = 0;
174 } else 173 } else
175 rev = "1.0"; 174 rev = "1.0";
176 return rev; 175 return rev;
177 } 176 }
178 177
179 void diva_log_info(unsigned char *format, ...) 178 void diva_log_info(unsigned char *format, ...)
180 { 179 {
181 va_list args; 180 va_list args;
182 unsigned char line[160]; 181 unsigned char line[160];
183 182
184 va_start(args, format); 183 va_start(args, format);
185 vsnprintf(line, sizeof(line), format, args); 184 vsnprintf(line, sizeof(line), format, args);
186 va_end(args); 185 va_end(args);
187 186
188 printk(KERN_INFO "%s: %s\n", DRIVERLNAME, line); 187 printk(KERN_INFO "%s: %s\n", DRIVERLNAME, line);
189 } 188 }
190 189
191 void divas_get_version(char *p) 190 void divas_get_version(char *p)
192 { 191 {
193 char tmprev[32]; 192 char tmprev[32];
194 193
195 strcpy(tmprev, main_revision); 194 strcpy(tmprev, main_revision);
196 sprintf(p, "%s: %s(%s) %s(%s) major=%d\n", DRIVERLNAME, DRIVERRELEASE_DIVAS, 195 sprintf(p, "%s: %s(%s) %s(%s) major=%d\n", DRIVERLNAME, DRIVERRELEASE_DIVAS,
197 getrev(tmprev), diva_xdi_common_code_build, DIVA_BUILD, major); 196 getrev(tmprev), diva_xdi_common_code_build, DIVA_BUILD, major);
198 } 197 }
199 198
200 /* -------------------------------------------------------------------------- 199 /* --------------------------------------------------------------------------
201 PCI Bus services 200 PCI Bus services
202 -------------------------------------------------------------------------- */ 201 -------------------------------------------------------------------------- */
203 byte diva_os_get_pci_bus(void *pci_dev_handle) 202 byte diva_os_get_pci_bus(void *pci_dev_handle)
204 { 203 {
205 struct pci_dev *pdev = (struct pci_dev *) pci_dev_handle; 204 struct pci_dev *pdev = (struct pci_dev *) pci_dev_handle;
206 return ((byte) pdev->bus->number); 205 return ((byte) pdev->bus->number);
207 } 206 }
208 207
209 byte diva_os_get_pci_func(void *pci_dev_handle) 208 byte diva_os_get_pci_func(void *pci_dev_handle)
210 { 209 {
211 struct pci_dev *pdev = (struct pci_dev *) pci_dev_handle; 210 struct pci_dev *pdev = (struct pci_dev *) pci_dev_handle;
212 return ((byte) pdev->devfn); 211 return ((byte) pdev->devfn);
213 } 212 }
214 213
215 unsigned long divasa_get_pci_irq(unsigned char bus, unsigned char func, 214 unsigned long divasa_get_pci_irq(unsigned char bus, unsigned char func,
216 void *pci_dev_handle) 215 void *pci_dev_handle)
217 { 216 {
218 unsigned char irq = 0; 217 unsigned char irq = 0;
219 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle; 218 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle;
220 219
221 irq = dev->irq; 220 irq = dev->irq;
222 221
223 return ((unsigned long) irq); 222 return ((unsigned long) irq);
224 } 223 }
225 224
226 unsigned long divasa_get_pci_bar(unsigned char bus, unsigned char func, 225 unsigned long divasa_get_pci_bar(unsigned char bus, unsigned char func,
227 int bar, void *pci_dev_handle) 226 int bar, void *pci_dev_handle)
228 { 227 {
229 unsigned long ret = 0; 228 unsigned long ret = 0;
230 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle; 229 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle;
231 230
232 if (bar < 6) { 231 if (bar < 6) {
233 ret = dev->resource[bar].start; 232 ret = dev->resource[bar].start;
234 } 233 }
235 234
236 DBG_TRC(("GOT BAR[%d]=%08x", bar, ret)); 235 DBG_TRC(("GOT BAR[%d]=%08x", bar, ret));
237 236
238 { 237 {
239 unsigned long type = (ret & 0x00000001); 238 unsigned long type = (ret & 0x00000001);
240 if (type & PCI_BASE_ADDRESS_SPACE_IO) { 239 if (type & PCI_BASE_ADDRESS_SPACE_IO) {
241 DBG_TRC((" I/O")); 240 DBG_TRC((" I/O"));
242 ret &= PCI_BASE_ADDRESS_IO_MASK; 241 ret &= PCI_BASE_ADDRESS_IO_MASK;
243 } else { 242 } else {
244 DBG_TRC((" memory")); 243 DBG_TRC((" memory"));
245 ret &= PCI_BASE_ADDRESS_MEM_MASK; 244 ret &= PCI_BASE_ADDRESS_MEM_MASK;
246 } 245 }
247 DBG_TRC((" final=%08x", ret)); 246 DBG_TRC((" final=%08x", ret));
248 } 247 }
249 248
250 return (ret); 249 return (ret);
251 } 250 }
252 251
253 void PCIwrite(byte bus, byte func, int offset, void *data, int length, 252 void PCIwrite(byte bus, byte func, int offset, void *data, int length,
254 void *pci_dev_handle) 253 void *pci_dev_handle)
255 { 254 {
256 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle; 255 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle;
257 256
258 switch (length) { 257 switch (length) {
259 case 1: /* byte */ 258 case 1: /* byte */
260 pci_write_config_byte(dev, offset, 259 pci_write_config_byte(dev, offset,
261 *(unsigned char *) data); 260 *(unsigned char *) data);
262 break; 261 break;
263 case 2: /* word */ 262 case 2: /* word */
264 pci_write_config_word(dev, offset, 263 pci_write_config_word(dev, offset,
265 *(unsigned short *) data); 264 *(unsigned short *) data);
266 break; 265 break;
267 case 4: /* dword */ 266 case 4: /* dword */
268 pci_write_config_dword(dev, offset, 267 pci_write_config_dword(dev, offset,
269 *(unsigned int *) data); 268 *(unsigned int *) data);
270 break; 269 break;
271 270
272 default: /* buffer */ 271 default: /* buffer */
273 if (!(length % 4) && !(length & 0x03)) { /* Copy as dword */ 272 if (!(length % 4) && !(length & 0x03)) { /* Copy as dword */
274 dword *p = (dword *) data; 273 dword *p = (dword *) data;
275 length /= 4; 274 length /= 4;
276 275
277 while (length--) { 276 while (length--) {
278 pci_write_config_dword(dev, offset, 277 pci_write_config_dword(dev, offset,
279 *(unsigned int *) 278 *(unsigned int *)
280 p++); 279 p++);
281 } 280 }
282 } else { /* copy as byte stream */ 281 } else { /* copy as byte stream */
283 byte *p = (byte *) data; 282 byte *p = (byte *) data;
284 283
285 while (length--) { 284 while (length--) {
286 pci_write_config_byte(dev, offset, 285 pci_write_config_byte(dev, offset,
287 *(unsigned char *) 286 *(unsigned char *)
288 p++); 287 p++);
289 } 288 }
290 } 289 }
291 } 290 }
292 } 291 }
293 292
294 void PCIread(byte bus, byte func, int offset, void *data, int length, 293 void PCIread(byte bus, byte func, int offset, void *data, int length,
295 void *pci_dev_handle) 294 void *pci_dev_handle)
296 { 295 {
297 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle; 296 struct pci_dev *dev = (struct pci_dev *) pci_dev_handle;
298 297
299 switch (length) { 298 switch (length) {
300 case 1: /* byte */ 299 case 1: /* byte */
301 pci_read_config_byte(dev, offset, (unsigned char *) data); 300 pci_read_config_byte(dev, offset, (unsigned char *) data);
302 break; 301 break;
303 case 2: /* word */ 302 case 2: /* word */
304 pci_read_config_word(dev, offset, (unsigned short *) data); 303 pci_read_config_word(dev, offset, (unsigned short *) data);
305 break; 304 break;
306 case 4: /* dword */ 305 case 4: /* dword */
307 pci_read_config_dword(dev, offset, (unsigned int *) data); 306 pci_read_config_dword(dev, offset, (unsigned int *) data);
308 break; 307 break;
309 308
310 default: /* buffer */ 309 default: /* buffer */
311 if (!(length % 4) && !(length & 0x03)) { /* Copy as dword */ 310 if (!(length % 4) && !(length & 0x03)) { /* Copy as dword */
312 dword *p = (dword *) data; 311 dword *p = (dword *) data;
313 length /= 4; 312 length /= 4;
314 313
315 while (length--) { 314 while (length--) {
316 pci_read_config_dword(dev, offset, 315 pci_read_config_dword(dev, offset,
317 (unsigned int *) 316 (unsigned int *)
318 p++); 317 p++);
319 } 318 }
320 } else { /* copy as byte stream */ 319 } else { /* copy as byte stream */
321 byte *p = (byte *) data; 320 byte *p = (byte *) data;
322 321
323 while (length--) { 322 while (length--) {
324 pci_read_config_byte(dev, offset, 323 pci_read_config_byte(dev, offset,
325 (unsigned char *) 324 (unsigned char *)
326 p++); 325 p++);
327 } 326 }
328 } 327 }
329 } 328 }
330 } 329 }
331 330
332 /* 331 /*
333 Init the map with DMA pages. It is not a problem if some allocations fail - 332 Init the map with DMA pages. It is not a problem if some allocations fail -
334 the channels that do not get a DMA page will fall back to the standard 333 the channels that do not get a DMA page will fall back to the standard
335 PIO interface. 334 PIO interface.
336 */ 335 */
337 static void *diva_pci_alloc_consistent(struct pci_dev *hwdev, 336 static void *diva_pci_alloc_consistent(struct pci_dev *hwdev,
338 size_t size, 337 size_t size,
339 dma_addr_t * dma_handle, 338 dma_addr_t * dma_handle,
340 void **addr_handle) 339 void **addr_handle)
341 { 340 {
342 void *addr = pci_alloc_consistent(hwdev, size, dma_handle); 341 void *addr = pci_alloc_consistent(hwdev, size, dma_handle);
343 342
344 *addr_handle = addr; 343 *addr_handle = addr;
345 344
346 return (addr); 345 return (addr);
347 } 346 }
348 347
349 void diva_init_dma_map(void *hdev, 348 void diva_init_dma_map(void *hdev,
350 struct _diva_dma_map_entry **ppmap, int nentries) 349 struct _diva_dma_map_entry **ppmap, int nentries)
351 { 350 {
352 struct pci_dev *pdev = (struct pci_dev *) hdev; 351 struct pci_dev *pdev = (struct pci_dev *) hdev;
353 struct _diva_dma_map_entry *pmap = 352 struct _diva_dma_map_entry *pmap =
354 diva_alloc_dma_map(hdev, nentries); 353 diva_alloc_dma_map(hdev, nentries);
355 354
356 if (pmap) { 355 if (pmap) {
357 int i; 356 int i;
358 dma_addr_t dma_handle; 357 dma_addr_t dma_handle;
359 void *cpu_addr; 358 void *cpu_addr;
360 void *addr_handle; 359 void *addr_handle;
361 360
362 for (i = 0; i < nentries; i++) { 361 for (i = 0; i < nentries; i++) {
363 if (!(cpu_addr = diva_pci_alloc_consistent(pdev, 362 if (!(cpu_addr = diva_pci_alloc_consistent(pdev,
364 PAGE_SIZE, 363 PAGE_SIZE,
365 &dma_handle, 364 &dma_handle,
366 &addr_handle))) 365 &addr_handle)))
367 { 366 {
368 break; 367 break;
369 } 368 }
370 diva_init_dma_map_entry(pmap, i, cpu_addr, 369 diva_init_dma_map_entry(pmap, i, cpu_addr,
371 (dword) dma_handle, 370 (dword) dma_handle,
372 addr_handle); 371 addr_handle);
373 DBG_TRC(("dma map alloc [%d]=(%08lx:%08x:%08lx)", 372 DBG_TRC(("dma map alloc [%d]=(%08lx:%08x:%08lx)",
374 i, (unsigned long) cpu_addr, 373 i, (unsigned long) cpu_addr,
375 (dword) dma_handle, 374 (dword) dma_handle,
376 (unsigned long) addr_handle))} 375 (unsigned long) addr_handle))}
377 } 376 }
378 377
379 *ppmap = pmap; 378 *ppmap = pmap;
380 } 379 }
381 380
382 /* 381 /*
383 Free everything contained in the map entries and the memory used by the map. 382 Free everything contained in the map entries and the memory used by the map.
384 Should always be called after the adapter is removed from the DIDD array. 383 Should always be called after the adapter is removed from the DIDD array.
385 */ 384 */
386 void diva_free_dma_map(void *hdev, struct _diva_dma_map_entry *pmap) 385 void diva_free_dma_map(void *hdev, struct _diva_dma_map_entry *pmap)
387 { 386 {
388 struct pci_dev *pdev = (struct pci_dev *) hdev; 387 struct pci_dev *pdev = (struct pci_dev *) hdev;
389 int i; 388 int i;
390 dword phys_addr; 389 dword phys_addr;
391 void *cpu_addr; 390 void *cpu_addr;
392 dma_addr_t dma_handle; 391 dma_addr_t dma_handle;
393 void *addr_handle; 392 void *addr_handle;
394 393
395 for (i = 0; (pmap != NULL); i++) { 394 for (i = 0; (pmap != NULL); i++) {
396 diva_get_dma_map_entry(pmap, i, &cpu_addr, &phys_addr); 395 diva_get_dma_map_entry(pmap, i, &cpu_addr, &phys_addr);
397 if (!cpu_addr) { 396 if (!cpu_addr) {
398 break; 397 break;
399 } 398 }
400 addr_handle = diva_get_entry_handle(pmap, i); 399 addr_handle = diva_get_entry_handle(pmap, i);
401 dma_handle = (dma_addr_t) phys_addr; 400 dma_handle = (dma_addr_t) phys_addr;
402 pci_free_consistent(pdev, PAGE_SIZE, addr_handle, 401 pci_free_consistent(pdev, PAGE_SIZE, addr_handle,
403 dma_handle); 402 dma_handle);
404 DBG_TRC(("dma map free [%d]=(%08lx:%08x:%08lx)", i, 403 DBG_TRC(("dma map free [%d]=(%08lx:%08x:%08lx)", i,
405 (unsigned long) cpu_addr, (dword) dma_handle, 404 (unsigned long) cpu_addr, (dword) dma_handle,
406 (unsigned long) addr_handle)) 405 (unsigned long) addr_handle))
407 } 406 }
408 407
409 diva_free_dma_mapping(pmap); 408 diva_free_dma_mapping(pmap);
410 } 409 }
411 410
412 411
413 /********************************************************* 412 /*********************************************************
414 ** I/O port utilities 413 ** I/O port utilities
415 *********************************************************/ 414 *********************************************************/
416 415
417 int 416 int
418 diva_os_register_io_port(void *adapter, int on, unsigned long port, 417 diva_os_register_io_port(void *adapter, int on, unsigned long port,
419 unsigned long length, const char *name, int id) 418 unsigned long length, const char *name, int id)
420 { 419 {
421 if (on) { 420 if (on) {
422 if (!request_region(port, length, name)) { 421 if (!request_region(port, length, name)) {
423 DBG_ERR(("A: I/O: can't register port=%08x", port)) 422 DBG_ERR(("A: I/O: can't register port=%08x", port))
424 return (-1); 423 return (-1);
425 } 424 }
426 } else { 425 } else {
427 release_region(port, length); 426 release_region(port, length);
428 } 427 }
429 return (0); 428 return (0);
430 } 429 }
431 430
432 void __iomem *divasa_remap_pci_bar(diva_os_xdi_adapter_t *a, int id, unsigned long bar, unsigned long area_length) 431 void __iomem *divasa_remap_pci_bar(diva_os_xdi_adapter_t *a, int id, unsigned long bar, unsigned long area_length)
433 { 432 {
434 void __iomem *ret = ioremap(bar, area_length); 433 void __iomem *ret = ioremap(bar, area_length);
435 DBG_TRC(("remap(%08x)->%p", bar, ret)); 434 DBG_TRC(("remap(%08x)->%p", bar, ret));
436 return (ret); 435 return (ret);
437 } 436 }
438 437
439 void divasa_unmap_pci_bar(void __iomem *bar) 438 void divasa_unmap_pci_bar(void __iomem *bar)
440 { 439 {
441 if (bar) { 440 if (bar) {
442 iounmap(bar); 441 iounmap(bar);
443 } 442 }
444 } 443 }
445 444
446 /********************************************************* 445 /*********************************************************
447 ** I/O port access 446 ** I/O port access
448 *********************************************************/ 447 *********************************************************/
449 byte __inline__ inpp(void __iomem *addr) 448 byte __inline__ inpp(void __iomem *addr)
450 { 449 {
451 return (inb((unsigned long) addr)); 450 return (inb((unsigned long) addr));
452 } 451 }
453 452
454 word __inline__ inppw(void __iomem *addr) 453 word __inline__ inppw(void __iomem *addr)
455 { 454 {
456 return (inw((unsigned long) addr)); 455 return (inw((unsigned long) addr));
457 } 456 }
458 457
459 void __inline__ inppw_buffer(void __iomem *addr, void *P, int length) 458 void __inline__ inppw_buffer(void __iomem *addr, void *P, int length)
460 { 459 {
461 insw((unsigned long) addr, (word *) P, length >> 1); 460 insw((unsigned long) addr, (word *) P, length >> 1);
462 } 461 }
463 462
464 void __inline__ outppw_buffer(void __iomem *addr, void *P, int length) 463 void __inline__ outppw_buffer(void __iomem *addr, void *P, int length)
465 { 464 {
466 outsw((unsigned long) addr, (word *) P, length >> 1); 465 outsw((unsigned long) addr, (word *) P, length >> 1);
467 } 466 }
468 467
469 void __inline__ outppw(void __iomem *addr, word w) 468 void __inline__ outppw(void __iomem *addr, word w)
470 { 469 {
471 outw(w, (unsigned long) addr); 470 outw(w, (unsigned long) addr);
472 } 471 }
473 472
474 void __inline__ outpp(void __iomem *addr, word p) 473 void __inline__ outpp(void __iomem *addr, word p)
475 { 474 {
476 outb(p, (unsigned long) addr); 475 outb(p, (unsigned long) addr);
477 } 476 }
478 477
479 /* -------------------------------------------------------------------------- 478 /* --------------------------------------------------------------------------
480 IRQ request / remove 479 IRQ request / remove
481 -------------------------------------------------------------------------- */ 480 -------------------------------------------------------------------------- */
482 int diva_os_register_irq(void *context, byte irq, const char *name) 481 int diva_os_register_irq(void *context, byte irq, const char *name)
483 { 482 {
484 int result = request_irq(irq, diva_os_irq_wrapper, 483 int result = request_irq(irq, diva_os_irq_wrapper,
485 IRQF_DISABLED | IRQF_SHARED, name, context); 484 IRQF_DISABLED | IRQF_SHARED, name, context);
486 return (result); 485 return (result);
487 } 486 }
488 487
489 void diva_os_remove_irq(void *context, byte irq) 488 void diva_os_remove_irq(void *context, byte irq)
490 { 489 {
491 free_irq(irq, context); 490 free_irq(irq, context);
492 } 491 }
493 492
494 /* -------------------------------------------------------------------------- 493 /* --------------------------------------------------------------------------
495 DPC framework implementation 494 DPC framework implementation
496 -------------------------------------------------------------------------- */ 495 -------------------------------------------------------------------------- */
497 static void diva_os_dpc_proc(unsigned long context) 496 static void diva_os_dpc_proc(unsigned long context)
498 { 497 {
499 diva_os_thread_dpc_t *psoft_isr = (diva_os_thread_dpc_t *) context; 498 diva_os_thread_dpc_t *psoft_isr = (diva_os_thread_dpc_t *) context;
500 diva_os_soft_isr_t *pisr = psoft_isr->psoft_isr; 499 diva_os_soft_isr_t *pisr = psoft_isr->psoft_isr;
501 500
502 (*(pisr->callback)) (pisr, pisr->callback_context); 501 (*(pisr->callback)) (pisr, pisr->callback_context);
503 } 502 }
504 503
505 int diva_os_initialize_soft_isr(diva_os_soft_isr_t * psoft_isr, 504 int diva_os_initialize_soft_isr(diva_os_soft_isr_t * psoft_isr,
506 diva_os_soft_isr_callback_t callback, 505 diva_os_soft_isr_callback_t callback,
507 void *callback_context) 506 void *callback_context)
508 { 507 {
509 diva_os_thread_dpc_t *pdpc; 508 diva_os_thread_dpc_t *pdpc;
510 509
511 pdpc = (diva_os_thread_dpc_t *) diva_os_malloc(0, sizeof(*pdpc)); 510 pdpc = (diva_os_thread_dpc_t *) diva_os_malloc(0, sizeof(*pdpc));
512 if (!(psoft_isr->object = pdpc)) { 511 if (!(psoft_isr->object = pdpc)) {
513 return (-1); 512 return (-1);
514 } 513 }
515 memset(pdpc, 0x00, sizeof(*pdpc)); 514 memset(pdpc, 0x00, sizeof(*pdpc));
516 psoft_isr->callback = callback; 515 psoft_isr->callback = callback;
517 psoft_isr->callback_context = callback_context; 516 psoft_isr->callback_context = callback_context;
518 pdpc->psoft_isr = psoft_isr; 517 pdpc->psoft_isr = psoft_isr;
519 tasklet_init(&pdpc->divas_task, diva_os_dpc_proc, (unsigned long)pdpc); 518 tasklet_init(&pdpc->divas_task, diva_os_dpc_proc, (unsigned long)pdpc);
520 519
521 return (0); 520 return (0);
522 } 521 }
523 522
524 int diva_os_schedule_soft_isr(diva_os_soft_isr_t * psoft_isr) 523 int diva_os_schedule_soft_isr(diva_os_soft_isr_t * psoft_isr)
525 { 524 {
526 if (psoft_isr && psoft_isr->object) { 525 if (psoft_isr && psoft_isr->object) {
527 diva_os_thread_dpc_t *pdpc = 526 diva_os_thread_dpc_t *pdpc =
528 (diva_os_thread_dpc_t *) psoft_isr->object; 527 (diva_os_thread_dpc_t *) psoft_isr->object;
529 528
530 tasklet_schedule(&pdpc->divas_task); 529 tasklet_schedule(&pdpc->divas_task);
531 } 530 }
532 531
533 return (1); 532 return (1);
534 } 533 }
535 534
536 int diva_os_cancel_soft_isr(diva_os_soft_isr_t * psoft_isr) 535 int diva_os_cancel_soft_isr(diva_os_soft_isr_t * psoft_isr)
537 { 536 {
538 return (0); 537 return (0);
539 } 538 }
540 539
541 void diva_os_remove_soft_isr(diva_os_soft_isr_t * psoft_isr) 540 void diva_os_remove_soft_isr(diva_os_soft_isr_t * psoft_isr)
542 { 541 {
543 if (psoft_isr && psoft_isr->object) { 542 if (psoft_isr && psoft_isr->object) {
544 diva_os_thread_dpc_t *pdpc = 543 diva_os_thread_dpc_t *pdpc =
545 (diva_os_thread_dpc_t *) psoft_isr->object; 544 (diva_os_thread_dpc_t *) psoft_isr->object;
546 void *mem; 545 void *mem;
547 546
548 tasklet_kill(&pdpc->divas_task); 547 tasklet_kill(&pdpc->divas_task);
549 flush_scheduled_work();
550 mem = psoft_isr->object; 548 mem = psoft_isr->object;
551 psoft_isr->object = NULL; 549 psoft_isr->object = NULL;
552 diva_os_free(0, mem); 550 diva_os_free(0, mem);
553 } 551 }
554 } 552 }
555 553
556 /* 554 /*
557 * kernel/user space copy functions 555 * kernel/user space copy functions
558 */ 556 */
559 static int 557 static int
560 xdi_copy_to_user(void *os_handle, void __user *dst, const void *src, int length) 558 xdi_copy_to_user(void *os_handle, void __user *dst, const void *src, int length)
561 { 559 {
562 if (copy_to_user(dst, src, length)) { 560 if (copy_to_user(dst, src, length)) {
563 return (-EFAULT); 561 return (-EFAULT);
564 } 562 }
565 return (length); 563 return (length);
566 } 564 }
567 565
568 static int 566 static int
569 xdi_copy_from_user(void *os_handle, void *dst, const void __user *src, int length) 567 xdi_copy_from_user(void *os_handle, void *dst, const void __user *src, int length)
570 { 568 {
571 if (copy_from_user(dst, src, length)) { 569 if (copy_from_user(dst, src, length)) {
572 return (-EFAULT); 570 return (-EFAULT);
573 } 571 }
574 return (length); 572 return (length);
575 } 573 }
576 574
577 /* 575 /*
578 * device node operations 576 * device node operations
579 */ 577 */
580 static int divas_open(struct inode *inode, struct file *file) 578 static int divas_open(struct inode *inode, struct file *file)
581 { 579 {
582 return (0); 580 return (0);
583 } 581 }
584 582
585 static int divas_release(struct inode *inode, struct file *file) 583 static int divas_release(struct inode *inode, struct file *file)
586 { 584 {
587 if (file->private_data) { 585 if (file->private_data) {
588 diva_xdi_close_adapter(file->private_data, file); 586 diva_xdi_close_adapter(file->private_data, file);
589 } 587 }
590 return (0); 588 return (0);
591 } 589 }
592 590
593 static ssize_t divas_write(struct file *file, const char __user *buf, 591 static ssize_t divas_write(struct file *file, const char __user *buf,
594 size_t count, loff_t * ppos) 592 size_t count, loff_t * ppos)
595 { 593 {
596 int ret = -EINVAL; 594 int ret = -EINVAL;
597 595
598 if (!file->private_data) { 596 if (!file->private_data) {
599 file->private_data = diva_xdi_open_adapter(file, buf, 597 file->private_data = diva_xdi_open_adapter(file, buf,
600 count, 598 count,
601 xdi_copy_from_user); 599 xdi_copy_from_user);
602 } 600 }
603 if (!file->private_data) { 601 if (!file->private_data) {
604 return (-ENODEV); 602 return (-ENODEV);
605 } 603 }
606 604
607 ret = diva_xdi_write(file->private_data, file, 605 ret = diva_xdi_write(file->private_data, file,
608 buf, count, xdi_copy_from_user); 606 buf, count, xdi_copy_from_user);
609 switch (ret) { 607 switch (ret) {
610 case -1: /* Message should be removed from rx mailbox first */ 608 case -1: /* Message should be removed from rx mailbox first */
611 ret = -EBUSY; 609 ret = -EBUSY;
612 break; 610 break;
613 case -2: /* invalid adapter was specified in this call */ 611 case -2: /* invalid adapter was specified in this call */
614 ret = -ENOMEM; 612 ret = -ENOMEM;
615 break; 613 break;
616 case -3: 614 case -3:
617 ret = -ENXIO; 615 ret = -ENXIO;
618 break; 616 break;
619 } 617 }
620 DBG_TRC(("write: ret %d", ret)); 618 DBG_TRC(("write: ret %d", ret));
621 return (ret); 619 return (ret);
622 } 620 }
623 621
624 static ssize_t divas_read(struct file *file, char __user *buf, 622 static ssize_t divas_read(struct file *file, char __user *buf,
625 size_t count, loff_t * ppos) 623 size_t count, loff_t * ppos)
626 { 624 {
627 int ret = -EINVAL; 625 int ret = -EINVAL;
628 626
629 if (!file->private_data) { 627 if (!file->private_data) {
630 file->private_data = diva_xdi_open_adapter(file, buf, 628 file->private_data = diva_xdi_open_adapter(file, buf,
631 count, 629 count,
632 xdi_copy_from_user); 630 xdi_copy_from_user);
633 } 631 }
634 if (!file->private_data) { 632 if (!file->private_data) {
635 return (-ENODEV); 633 return (-ENODEV);
636 } 634 }
637 635
638 ret = diva_xdi_read(file->private_data, file, 636 ret = diva_xdi_read(file->private_data, file,
639 buf, count, xdi_copy_to_user); 637 buf, count, xdi_copy_to_user);
640 switch (ret) { 638 switch (ret) {
641 case -1: /* RX mailbox is empty */ 639 case -1: /* RX mailbox is empty */
642 ret = -EAGAIN; 640 ret = -EAGAIN;
643 break; 641 break;
644 case -2: /* no memory, mailbox was cleared, last command failed */ 642 case -2: /* no memory, mailbox was cleared, last command failed */
645 ret = -ENOMEM; 643 ret = -ENOMEM;
646 break; 644 break;
647 case -3: /* can't copy to user, retry */ 645 case -3: /* can't copy to user, retry */
648 ret = -EFAULT; 646 ret = -EFAULT;
649 break; 647 break;
650 } 648 }
651 DBG_TRC(("read: ret %d", ret)); 649 DBG_TRC(("read: ret %d", ret));
652 return (ret); 650 return (ret);
653 } 651 }
654 652
655 static unsigned int divas_poll(struct file *file, poll_table * wait) 653 static unsigned int divas_poll(struct file *file, poll_table * wait)
656 { 654 {
657 if (!file->private_data) { 655 if (!file->private_data) {
658 return (POLLERR); 656 return (POLLERR);
659 } 657 }
660 return (POLLIN | POLLRDNORM); 658 return (POLLIN | POLLRDNORM);
661 } 659 }
662 660
663 static const struct file_operations divas_fops = { 661 static const struct file_operations divas_fops = {
664 .owner = THIS_MODULE, 662 .owner = THIS_MODULE,
665 .llseek = no_llseek, 663 .llseek = no_llseek,
666 .read = divas_read, 664 .read = divas_read,
667 .write = divas_write, 665 .write = divas_write,
668 .poll = divas_poll, 666 .poll = divas_poll,
669 .open = divas_open, 667 .open = divas_open,
670 .release = divas_release 668 .release = divas_release
671 }; 669 };
672 670
673 static void divas_unregister_chrdev(void) 671 static void divas_unregister_chrdev(void)
674 { 672 {
675 unregister_chrdev(major, DEVNAME); 673 unregister_chrdev(major, DEVNAME);
676 } 674 }
677 675
678 static int DIVA_INIT_FUNCTION divas_register_chrdev(void) 676 static int DIVA_INIT_FUNCTION divas_register_chrdev(void)
679 { 677 {
680 if ((major = register_chrdev(0, DEVNAME, &divas_fops)) < 0) 678 if ((major = register_chrdev(0, DEVNAME, &divas_fops)) < 0)
681 { 679 {
682 printk(KERN_ERR "%s: failed to create /dev entry.\n", 680 printk(KERN_ERR "%s: failed to create /dev entry.\n",
683 DRIVERLNAME); 681 DRIVERLNAME);
684 return (0); 682 return (0);
685 } 683 }
686 684
687 return (1); 685 return (1);
688 } 686 }
689 687
690 /* -------------------------------------------------------------------------- 688 /* --------------------------------------------------------------------------
691 PCI driver section 689 PCI driver section
692 -------------------------------------------------------------------------- */ 690 -------------------------------------------------------------------------- */
693 static int __devinit divas_init_one(struct pci_dev *pdev, 691 static int __devinit divas_init_one(struct pci_dev *pdev,
694 const struct pci_device_id *ent) 692 const struct pci_device_id *ent)
695 { 693 {
696 void *pdiva = NULL; 694 void *pdiva = NULL;
697 u8 pci_latency; 695 u8 pci_latency;
698 u8 new_latency = 32; 696 u8 new_latency = 32;
699 697
700 DBG_TRC(("%s bus: %08x fn: %08x insertion.\n", 698 DBG_TRC(("%s bus: %08x fn: %08x insertion.\n",
701 CardProperties[ent->driver_data].Name, 699 CardProperties[ent->driver_data].Name,
702 pdev->bus->number, pdev->devfn)) 700 pdev->bus->number, pdev->devfn))
703 printk(KERN_INFO "%s: %s bus: %08x fn: %08x insertion.\n", 701 printk(KERN_INFO "%s: %s bus: %08x fn: %08x insertion.\n",
704 DRIVERLNAME, CardProperties[ent->driver_data].Name, 702 DRIVERLNAME, CardProperties[ent->driver_data].Name,
705 pdev->bus->number, pdev->devfn); 703 pdev->bus->number, pdev->devfn);
706 704
707 if (pci_enable_device(pdev)) { 705 if (pci_enable_device(pdev)) {
708 DBG_TRC(("%s: %s bus: %08x fn: %08x device init failed.\n", 706 DBG_TRC(("%s: %s bus: %08x fn: %08x device init failed.\n",
709 DRIVERLNAME, 707 DRIVERLNAME,
710 CardProperties[ent->driver_data].Name, 708 CardProperties[ent->driver_data].Name,
711 pdev->bus->number, 709 pdev->bus->number,
712 pdev->devfn)) 710 pdev->devfn))
713 printk(KERN_ERR 711 printk(KERN_ERR
714 "%s: %s bus: %08x fn: %08x device init failed.\n", 712 "%s: %s bus: %08x fn: %08x device init failed.\n",
715 DRIVERLNAME, 713 DRIVERLNAME,
716 CardProperties[ent->driver_data]. 714 CardProperties[ent->driver_data].
717 Name, pdev->bus->number, 715 Name, pdev->bus->number,
718 pdev->devfn); 716 pdev->devfn);
719 return (-EIO); 717 return (-EIO);
720 } 718 }
721 719
722 pci_set_master(pdev); 720 pci_set_master(pdev);
723 721
724 pci_read_config_byte(pdev, PCI_LATENCY_TIMER, &pci_latency); 722 pci_read_config_byte(pdev, PCI_LATENCY_TIMER, &pci_latency);
725 if (!pci_latency) { 723 if (!pci_latency) {
726 DBG_TRC(("%s: bus: %08x fn: %08x fix latency.\n", 724 DBG_TRC(("%s: bus: %08x fn: %08x fix latency.\n",
727 DRIVERLNAME, pdev->bus->number, pdev->devfn)) 725 DRIVERLNAME, pdev->bus->number, pdev->devfn))
728 printk(KERN_INFO 726 printk(KERN_INFO
729 "%s: bus: %08x fn: %08x fix latency.\n", 727 "%s: bus: %08x fn: %08x fix latency.\n",
730 DRIVERLNAME, pdev->bus->number, pdev->devfn); 728 DRIVERLNAME, pdev->bus->number, pdev->devfn);
731 pci_write_config_byte(pdev, PCI_LATENCY_TIMER, new_latency); 729 pci_write_config_byte(pdev, PCI_LATENCY_TIMER, new_latency);
732 } 730 }
733 731
734 if (!(pdiva = diva_driver_add_card(pdev, ent->driver_data))) { 732 if (!(pdiva = diva_driver_add_card(pdev, ent->driver_data))) {
735 DBG_TRC(("%s: %s bus: %08x fn: %08x card init failed.\n", 733 DBG_TRC(("%s: %s bus: %08x fn: %08x card init failed.\n",
736 DRIVERLNAME, 734 DRIVERLNAME,
737 CardProperties[ent->driver_data].Name, 735 CardProperties[ent->driver_data].Name,
738 pdev->bus->number, 736 pdev->bus->number,
739 pdev->devfn)) 737 pdev->devfn))
740 printk(KERN_ERR 738 printk(KERN_ERR
741 "%s: %s bus: %08x fn: %08x card init failed.\n", 739 "%s: %s bus: %08x fn: %08x card init failed.\n",
742 DRIVERLNAME, 740 DRIVERLNAME,
743 CardProperties[ent->driver_data]. 741 CardProperties[ent->driver_data].
744 Name, pdev->bus->number, 742 Name, pdev->bus->number,
745 pdev->devfn); 743 pdev->devfn);
746 return (-EIO); 744 return (-EIO);
747 } 745 }
748 746
749 pci_set_drvdata(pdev, pdiva); 747 pci_set_drvdata(pdev, pdiva);
750 748
751 return (0); 749 return (0);
752 } 750 }
753 751
754 static void __devexit divas_remove_one(struct pci_dev *pdev) 752 static void __devexit divas_remove_one(struct pci_dev *pdev)
755 { 753 {
756 void *pdiva = pci_get_drvdata(pdev); 754 void *pdiva = pci_get_drvdata(pdev);
757 755
758 DBG_TRC(("bus: %08x fn: %08x removal.\n", 756 DBG_TRC(("bus: %08x fn: %08x removal.\n",
759 pdev->bus->number, pdev->devfn)) 757 pdev->bus->number, pdev->devfn))
760 printk(KERN_INFO "%s: bus: %08x fn: %08x removal.\n", 758 printk(KERN_INFO "%s: bus: %08x fn: %08x removal.\n",
761 DRIVERLNAME, pdev->bus->number, pdev->devfn); 759 DRIVERLNAME, pdev->bus->number, pdev->devfn);
762 760
763 if (pdiva) { 761 if (pdiva) {
764 diva_driver_remove_card(pdiva); 762 diva_driver_remove_card(pdiva);
765 } 763 }
766 764
767 } 765 }
768 766
769 /* -------------------------------------------------------------------------- 767 /* --------------------------------------------------------------------------
770 Driver Load / Startup 768 Driver Load / Startup
771 -------------------------------------------------------------------------- */ 769 -------------------------------------------------------------------------- */
772 static int DIVA_INIT_FUNCTION divas_init(void) 770 static int DIVA_INIT_FUNCTION divas_init(void)
773 { 771 {
774 char tmprev[50]; 772 char tmprev[50];
775 int ret = 0; 773 int ret = 0;
776 774
777 printk(KERN_INFO "%s\n", DRIVERNAME); 775 printk(KERN_INFO "%s\n", DRIVERNAME);
778 printk(KERN_INFO "%s: Rel:%s Rev:", DRIVERLNAME, DRIVERRELEASE_DIVAS); 776 printk(KERN_INFO "%s: Rel:%s Rev:", DRIVERLNAME, DRIVERRELEASE_DIVAS);
779 strcpy(tmprev, main_revision); 777 strcpy(tmprev, main_revision);
780 printk("%s Build: %s(%s)\n", getrev(tmprev), 778 printk("%s Build: %s(%s)\n", getrev(tmprev),
781 diva_xdi_common_code_build, DIVA_BUILD); 779 diva_xdi_common_code_build, DIVA_BUILD);
782 printk(KERN_INFO "%s: support for: ", DRIVERLNAME); 780 printk(KERN_INFO "%s: support for: ", DRIVERLNAME);
783 #ifdef CONFIG_ISDN_DIVAS_BRIPCI 781 #ifdef CONFIG_ISDN_DIVAS_BRIPCI
784 printk("BRI/PCI "); 782 printk("BRI/PCI ");
785 #endif 783 #endif
786 #ifdef CONFIG_ISDN_DIVAS_PRIPCI 784 #ifdef CONFIG_ISDN_DIVAS_PRIPCI
787 printk("PRI/PCI "); 785 printk("PRI/PCI ");
788 #endif 786 #endif
789 printk("adapters\n"); 787 printk("adapters\n");
790 788
791 if (!divasfunc_init(dbgmask)) { 789 if (!divasfunc_init(dbgmask)) {
792 printk(KERN_ERR "%s: failed to connect to DIDD.\n", 790 printk(KERN_ERR "%s: failed to connect to DIDD.\n",
793 DRIVERLNAME); 791 DRIVERLNAME);
794 ret = -EIO; 792 ret = -EIO;
795 goto out; 793 goto out;
796 } 794 }
797 795
798 if (!divas_register_chrdev()) { 796 if (!divas_register_chrdev()) {
799 #ifdef MODULE 797 #ifdef MODULE
800 divasfunc_exit(); 798 divasfunc_exit();
801 #endif 799 #endif
802 ret = -EIO; 800 ret = -EIO;
803 goto out; 801 goto out;
804 } 802 }
805 803
806 if (!create_divas_proc()) { 804 if (!create_divas_proc()) {
807 #ifdef MODULE 805 #ifdef MODULE
808 divas_unregister_chrdev(); 806 divas_unregister_chrdev();
809 divasfunc_exit(); 807 divasfunc_exit();
810 #endif 808 #endif
811 printk(KERN_ERR "%s: failed to create proc entry.\n", 809 printk(KERN_ERR "%s: failed to create proc entry.\n",
812 DRIVERLNAME); 810 DRIVERLNAME);
813 ret = -EIO; 811 ret = -EIO;
814 goto out; 812 goto out;
815 } 813 }
816 814
817 if ((ret = pci_register_driver(&diva_pci_driver))) { 815 if ((ret = pci_register_driver(&diva_pci_driver))) {
818 #ifdef MODULE 816 #ifdef MODULE
819 remove_divas_proc(); 817 remove_divas_proc();
820 divas_unregister_chrdev(); 818 divas_unregister_chrdev();
821 divasfunc_exit(); 819 divasfunc_exit();
822 #endif 820 #endif
823 printk(KERN_ERR "%s: failed to init pci driver.\n", 821 printk(KERN_ERR "%s: failed to init pci driver.\n",
824 DRIVERLNAME); 822 DRIVERLNAME);
825 goto out; 823 goto out;
826 } 824 }
827 printk(KERN_INFO "%s: started with major %d\n", DRIVERLNAME, major); 825 printk(KERN_INFO "%s: started with major %d\n", DRIVERLNAME, major);
828 826
829 out: 827 out:
830 return (ret); 828 return (ret);
831 } 829 }
832 830
833 /* -------------------------------------------------------------------------- 831 /* --------------------------------------------------------------------------
834 Driver Unload 832 Driver Unload
835 -------------------------------------------------------------------------- */ 833 -------------------------------------------------------------------------- */
836 static void DIVA_EXIT_FUNCTION divas_exit(void) 834 static void DIVA_EXIT_FUNCTION divas_exit(void)
837 { 835 {
838 pci_unregister_driver(&diva_pci_driver); 836 pci_unregister_driver(&diva_pci_driver);
839 remove_divas_proc(); 837 remove_divas_proc();
840 divas_unregister_chrdev(); 838 divas_unregister_chrdev();
841 divasfunc_exit(); 839 divasfunc_exit();
842 840
843 printk(KERN_INFO "%s: module unloaded.\n", DRIVERLNAME); 841 printk(KERN_INFO "%s: module unloaded.\n", DRIVERLNAME);
844 } 842 }
845 843
846 module_init(divas_init); 844 module_init(divas_init);
847 module_exit(divas_exit); 845 module_exit(divas_exit);
848 846
drivers/pci/hotplug/pciehp.h
1 /* 1 /*
2 * PCI Express Hot Plug Controller Driver 2 * PCI Express Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 #ifndef _PCIEHP_H 29 #ifndef _PCIEHP_H
30 #define _PCIEHP_H 30 #define _PCIEHP_H
31 31
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/pci.h> 33 #include <linux/pci.h>
34 #include <linux/pci_hotplug.h> 34 #include <linux/pci_hotplug.h>
35 #include <linux/delay.h> 35 #include <linux/delay.h>
36 #include <linux/sched.h> /* signal_pending() */ 36 #include <linux/sched.h> /* signal_pending() */
37 #include <linux/pcieport_if.h> 37 #include <linux/pcieport_if.h>
38 #include <linux/mutex.h> 38 #include <linux/mutex.h>
39 #include <linux/workqueue.h>
39 40
40 #define MY_NAME "pciehp" 41 #define MY_NAME "pciehp"
41 42
42 extern int pciehp_poll_mode; 43 extern int pciehp_poll_mode;
43 extern int pciehp_poll_time; 44 extern int pciehp_poll_time;
44 extern int pciehp_debug; 45 extern int pciehp_debug;
45 extern int pciehp_force; 46 extern int pciehp_force;
46 extern struct workqueue_struct *pciehp_wq; 47 extern struct workqueue_struct *pciehp_wq;
48 extern struct workqueue_struct *pciehp_ordered_wq;
47 49
48 #define dbg(format, arg...) \ 50 #define dbg(format, arg...) \
49 do { \ 51 do { \
50 if (pciehp_debug) \ 52 if (pciehp_debug) \
51 printk(KERN_DEBUG "%s: " format, MY_NAME , ## arg); \ 53 printk(KERN_DEBUG "%s: " format, MY_NAME , ## arg); \
52 } while (0) 54 } while (0)
53 #define err(format, arg...) \ 55 #define err(format, arg...) \
54 printk(KERN_ERR "%s: " format, MY_NAME , ## arg) 56 printk(KERN_ERR "%s: " format, MY_NAME , ## arg)
55 #define info(format, arg...) \ 57 #define info(format, arg...) \
56 printk(KERN_INFO "%s: " format, MY_NAME , ## arg) 58 printk(KERN_INFO "%s: " format, MY_NAME , ## arg)
57 #define warn(format, arg...) \ 59 #define warn(format, arg...) \
58 printk(KERN_WARNING "%s: " format, MY_NAME , ## arg) 60 printk(KERN_WARNING "%s: " format, MY_NAME , ## arg)
59 61
60 #define ctrl_dbg(ctrl, format, arg...) \ 62 #define ctrl_dbg(ctrl, format, arg...) \
61 do { \ 63 do { \
62 if (pciehp_debug) \ 64 if (pciehp_debug) \
63 dev_printk(KERN_DEBUG, &ctrl->pcie->device, \ 65 dev_printk(KERN_DEBUG, &ctrl->pcie->device, \
64 format, ## arg); \ 66 format, ## arg); \
65 } while (0) 67 } while (0)
66 #define ctrl_err(ctrl, format, arg...) \ 68 #define ctrl_err(ctrl, format, arg...) \
67 dev_err(&ctrl->pcie->device, format, ## arg) 69 dev_err(&ctrl->pcie->device, format, ## arg)
68 #define ctrl_info(ctrl, format, arg...) \ 70 #define ctrl_info(ctrl, format, arg...) \
69 dev_info(&ctrl->pcie->device, format, ## arg) 71 dev_info(&ctrl->pcie->device, format, ## arg)
70 #define ctrl_warn(ctrl, format, arg...) \ 72 #define ctrl_warn(ctrl, format, arg...) \
71 dev_warn(&ctrl->pcie->device, format, ## arg) 73 dev_warn(&ctrl->pcie->device, format, ## arg)
72 74
73 #define SLOT_NAME_SIZE 10 75 #define SLOT_NAME_SIZE 10
74 struct slot { 76 struct slot {
75 u8 state; 77 u8 state;
76 struct controller *ctrl; 78 struct controller *ctrl;
77 struct hotplug_slot *hotplug_slot; 79 struct hotplug_slot *hotplug_slot;
78 struct delayed_work work; /* work for button event */ 80 struct delayed_work work; /* work for button event */
79 struct mutex lock; 81 struct mutex lock;
80 }; 82 };
81 83
82 struct event_info { 84 struct event_info {
83 u32 event_type; 85 u32 event_type;
84 struct slot *p_slot; 86 struct slot *p_slot;
85 struct work_struct work; 87 struct work_struct work;
86 }; 88 };
87 89
88 struct controller { 90 struct controller {
89 struct mutex ctrl_lock; /* controller lock */ 91 struct mutex ctrl_lock; /* controller lock */
90 struct pcie_device *pcie; /* PCI Express port service */ 92 struct pcie_device *pcie; /* PCI Express port service */
91 struct slot *slot; 93 struct slot *slot;
92 wait_queue_head_t queue; /* sleep & wake process */ 94 wait_queue_head_t queue; /* sleep & wake process */
93 u32 slot_cap; 95 u32 slot_cap;
94 struct timer_list poll_timer; 96 struct timer_list poll_timer;
95 unsigned int cmd_busy:1; 97 unsigned int cmd_busy:1;
96 unsigned int no_cmd_complete:1; 98 unsigned int no_cmd_complete:1;
97 unsigned int link_active_reporting:1; 99 unsigned int link_active_reporting:1;
98 unsigned int notification_enabled:1; 100 unsigned int notification_enabled:1;
99 unsigned int power_fault_detected; 101 unsigned int power_fault_detected;
100 }; 102 };
101 103
102 #define INT_BUTTON_IGNORE 0 104 #define INT_BUTTON_IGNORE 0
103 #define INT_PRESENCE_ON 1 105 #define INT_PRESENCE_ON 1
104 #define INT_PRESENCE_OFF 2 106 #define INT_PRESENCE_OFF 2
105 #define INT_SWITCH_CLOSE 3 107 #define INT_SWITCH_CLOSE 3
106 #define INT_SWITCH_OPEN 4 108 #define INT_SWITCH_OPEN 4
107 #define INT_POWER_FAULT 5 109 #define INT_POWER_FAULT 5
108 #define INT_POWER_FAULT_CLEAR 6 110 #define INT_POWER_FAULT_CLEAR 6
109 #define INT_BUTTON_PRESS 7 111 #define INT_BUTTON_PRESS 7
110 #define INT_BUTTON_RELEASE 8 112 #define INT_BUTTON_RELEASE 8
111 #define INT_BUTTON_CANCEL 9 113 #define INT_BUTTON_CANCEL 9
112 114
113 #define STATIC_STATE 0 115 #define STATIC_STATE 0
114 #define BLINKINGON_STATE 1 116 #define BLINKINGON_STATE 1
115 #define BLINKINGOFF_STATE 2 117 #define BLINKINGOFF_STATE 2
116 #define POWERON_STATE 3 118 #define POWERON_STATE 3
117 #define POWEROFF_STATE 4 119 #define POWEROFF_STATE 4
118 120
119 #define ATTN_BUTTN(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_ABP) 121 #define ATTN_BUTTN(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_ABP)
120 #define POWER_CTRL(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP) 122 #define POWER_CTRL(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP)
121 #define MRL_SENS(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_MRLSP) 123 #define MRL_SENS(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_MRLSP)
122 #define ATTN_LED(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_AIP) 124 #define ATTN_LED(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_AIP)
123 #define PWR_LED(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_PIP) 125 #define PWR_LED(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_PIP)
124 #define HP_SUPR_RM(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_HPS) 126 #define HP_SUPR_RM(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_HPS)
125 #define EMI(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_EIP) 127 #define EMI(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_EIP)
126 #define NO_CMD_CMPL(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_NCCS) 128 #define NO_CMD_CMPL(ctrl) ((ctrl)->slot_cap & PCI_EXP_SLTCAP_NCCS)
127 #define PSN(ctrl) ((ctrl)->slot_cap >> 19) 129 #define PSN(ctrl) ((ctrl)->slot_cap >> 19)
128 130
129 extern int pciehp_sysfs_enable_slot(struct slot *slot); 131 extern int pciehp_sysfs_enable_slot(struct slot *slot);
130 extern int pciehp_sysfs_disable_slot(struct slot *slot); 132 extern int pciehp_sysfs_disable_slot(struct slot *slot);
131 extern u8 pciehp_handle_attention_button(struct slot *p_slot); 133 extern u8 pciehp_handle_attention_button(struct slot *p_slot);
132 extern u8 pciehp_handle_switch_change(struct slot *p_slot); 134 extern u8 pciehp_handle_switch_change(struct slot *p_slot);
133 extern u8 pciehp_handle_presence_change(struct slot *p_slot); 135 extern u8 pciehp_handle_presence_change(struct slot *p_slot);
134 extern u8 pciehp_handle_power_fault(struct slot *p_slot); 136 extern u8 pciehp_handle_power_fault(struct slot *p_slot);
135 extern int pciehp_configure_device(struct slot *p_slot); 137 extern int pciehp_configure_device(struct slot *p_slot);
136 extern int pciehp_unconfigure_device(struct slot *p_slot); 138 extern int pciehp_unconfigure_device(struct slot *p_slot);
137 extern void pciehp_queue_pushbutton_work(struct work_struct *work); 139 extern void pciehp_queue_pushbutton_work(struct work_struct *work);
138 struct controller *pcie_init(struct pcie_device *dev); 140 struct controller *pcie_init(struct pcie_device *dev);
139 int pcie_init_notification(struct controller *ctrl); 141 int pcie_init_notification(struct controller *ctrl);
140 int pciehp_enable_slot(struct slot *p_slot); 142 int pciehp_enable_slot(struct slot *p_slot);
141 int pciehp_disable_slot(struct slot *p_slot); 143 int pciehp_disable_slot(struct slot *p_slot);
142 int pcie_enable_notification(struct controller *ctrl); 144 int pcie_enable_notification(struct controller *ctrl);
143 int pciehp_power_on_slot(struct slot *slot); 145 int pciehp_power_on_slot(struct slot *slot);
144 int pciehp_power_off_slot(struct slot *slot); 146 int pciehp_power_off_slot(struct slot *slot);
145 int pciehp_get_power_status(struct slot *slot, u8 *status); 147 int pciehp_get_power_status(struct slot *slot, u8 *status);
146 int pciehp_get_attention_status(struct slot *slot, u8 *status); 148 int pciehp_get_attention_status(struct slot *slot, u8 *status);
147 149
148 int pciehp_set_attention_status(struct slot *slot, u8 status); 150 int pciehp_set_attention_status(struct slot *slot, u8 status);
149 int pciehp_get_latch_status(struct slot *slot, u8 *status); 151 int pciehp_get_latch_status(struct slot *slot, u8 *status);
150 int pciehp_get_adapter_status(struct slot *slot, u8 *status); 152 int pciehp_get_adapter_status(struct slot *slot, u8 *status);
151 int pciehp_get_max_link_speed(struct slot *slot, enum pci_bus_speed *speed); 153 int pciehp_get_max_link_speed(struct slot *slot, enum pci_bus_speed *speed);
152 int pciehp_get_max_link_width(struct slot *slot, enum pcie_link_width *val); 154 int pciehp_get_max_link_width(struct slot *slot, enum pcie_link_width *val);
153 int pciehp_get_cur_link_speed(struct slot *slot, enum pci_bus_speed *speed); 155 int pciehp_get_cur_link_speed(struct slot *slot, enum pci_bus_speed *speed);
154 int pciehp_get_cur_link_width(struct slot *slot, enum pcie_link_width *val); 156 int pciehp_get_cur_link_width(struct slot *slot, enum pcie_link_width *val);
155 int pciehp_query_power_fault(struct slot *slot); 157 int pciehp_query_power_fault(struct slot *slot);
156 void pciehp_green_led_on(struct slot *slot); 158 void pciehp_green_led_on(struct slot *slot);
157 void pciehp_green_led_off(struct slot *slot); 159 void pciehp_green_led_off(struct slot *slot);
158 void pciehp_green_led_blink(struct slot *slot); 160 void pciehp_green_led_blink(struct slot *slot);
159 int pciehp_check_link_status(struct controller *ctrl); 161 int pciehp_check_link_status(struct controller *ctrl);
160 void pciehp_release_ctrl(struct controller *ctrl); 162 void pciehp_release_ctrl(struct controller *ctrl);
161 163
162 static inline const char *slot_name(struct slot *slot) 164 static inline const char *slot_name(struct slot *slot)
163 { 165 {
164 return hotplug_slot_name(slot->hotplug_slot); 166 return hotplug_slot_name(slot->hotplug_slot);
165 } 167 }
166 168
167 #ifdef CONFIG_ACPI 169 #ifdef CONFIG_ACPI
168 #include <acpi/acpi.h> 170 #include <acpi/acpi.h>
169 #include <acpi/acpi_bus.h> 171 #include <acpi/acpi_bus.h>
170 #include <linux/pci-acpi.h> 172 #include <linux/pci-acpi.h>
171 173
172 extern void __init pciehp_acpi_slot_detection_init(void); 174 extern void __init pciehp_acpi_slot_detection_init(void);
173 extern int pciehp_acpi_slot_detection_check(struct pci_dev *dev); 175 extern int pciehp_acpi_slot_detection_check(struct pci_dev *dev);
174 176
175 static inline void pciehp_firmware_init(void) 177 static inline void pciehp_firmware_init(void)
176 { 178 {
177 pciehp_acpi_slot_detection_init(); 179 pciehp_acpi_slot_detection_init();
178 } 180 }
179 #else 181 #else
180 #define pciehp_firmware_init() do {} while (0) 182 #define pciehp_firmware_init() do {} while (0)
181 static inline int pciehp_acpi_slot_detection_check(struct pci_dev *dev) 183 static inline int pciehp_acpi_slot_detection_check(struct pci_dev *dev)
182 { 184 {
183 return 0; 185 return 0;
184 } 186 }
185 #endif /* CONFIG_ACPI */ 187 #endif /* CONFIG_ACPI */
186 #endif /* _PCIEHP_H */ 188 #endif /* _PCIEHP_H */
187 189
drivers/pci/hotplug/pciehp_core.c
1 /* 1 /*
2 * PCI Express Hot Plug Controller Driver 2 * PCI Express Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/module.h> 30 #include <linux/module.h>
31 #include <linux/moduleparam.h> 31 #include <linux/moduleparam.h>
32 #include <linux/kernel.h> 32 #include <linux/kernel.h>
33 #include <linux/slab.h> 33 #include <linux/slab.h>
34 #include <linux/types.h> 34 #include <linux/types.h>
35 #include <linux/pci.h> 35 #include <linux/pci.h>
36 #include "pciehp.h" 36 #include "pciehp.h"
37 #include <linux/interrupt.h> 37 #include <linux/interrupt.h>
38 #include <linux/time.h> 38 #include <linux/time.h>
39 39
40 /* Global variables */ 40 /* Global variables */
41 int pciehp_debug; 41 int pciehp_debug;
42 int pciehp_poll_mode; 42 int pciehp_poll_mode;
43 int pciehp_poll_time; 43 int pciehp_poll_time;
44 int pciehp_force; 44 int pciehp_force;
45 struct workqueue_struct *pciehp_wq; 45 struct workqueue_struct *pciehp_wq;
46 struct workqueue_struct *pciehp_ordered_wq;
46 47
47 #define DRIVER_VERSION "0.4" 48 #define DRIVER_VERSION "0.4"
48 #define DRIVER_AUTHOR "Dan Zink <dan.zink@compaq.com>, Greg Kroah-Hartman <greg@kroah.com>, Dely Sy <dely.l.sy@intel.com>" 49 #define DRIVER_AUTHOR "Dan Zink <dan.zink@compaq.com>, Greg Kroah-Hartman <greg@kroah.com>, Dely Sy <dely.l.sy@intel.com>"
49 #define DRIVER_DESC "PCI Express Hot Plug Controller Driver" 50 #define DRIVER_DESC "PCI Express Hot Plug Controller Driver"
50 51
51 MODULE_AUTHOR(DRIVER_AUTHOR); 52 MODULE_AUTHOR(DRIVER_AUTHOR);
52 MODULE_DESCRIPTION(DRIVER_DESC); 53 MODULE_DESCRIPTION(DRIVER_DESC);
53 MODULE_LICENSE("GPL"); 54 MODULE_LICENSE("GPL");
54 55
55 module_param(pciehp_debug, bool, 0644); 56 module_param(pciehp_debug, bool, 0644);
56 module_param(pciehp_poll_mode, bool, 0644); 57 module_param(pciehp_poll_mode, bool, 0644);
57 module_param(pciehp_poll_time, int, 0644); 58 module_param(pciehp_poll_time, int, 0644);
58 module_param(pciehp_force, bool, 0644); 59 module_param(pciehp_force, bool, 0644);
59 MODULE_PARM_DESC(pciehp_debug, "Debugging mode enabled or not"); 60 MODULE_PARM_DESC(pciehp_debug, "Debugging mode enabled or not");
60 MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not"); 61 MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not");
61 MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds"); 62 MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds");
62 MODULE_PARM_DESC(pciehp_force, "Force pciehp, even if OSHP is missing"); 63 MODULE_PARM_DESC(pciehp_force, "Force pciehp, even if OSHP is missing");
63 64
64 #define PCIE_MODULE_NAME "pciehp" 65 #define PCIE_MODULE_NAME "pciehp"
65 66
66 static int set_attention_status (struct hotplug_slot *slot, u8 value); 67 static int set_attention_status (struct hotplug_slot *slot, u8 value);
67 static int enable_slot (struct hotplug_slot *slot); 68 static int enable_slot (struct hotplug_slot *slot);
68 static int disable_slot (struct hotplug_slot *slot); 69 static int disable_slot (struct hotplug_slot *slot);
69 static int get_power_status (struct hotplug_slot *slot, u8 *value); 70 static int get_power_status (struct hotplug_slot *slot, u8 *value);
70 static int get_attention_status (struct hotplug_slot *slot, u8 *value); 71 static int get_attention_status (struct hotplug_slot *slot, u8 *value);
71 static int get_latch_status (struct hotplug_slot *slot, u8 *value); 72 static int get_latch_status (struct hotplug_slot *slot, u8 *value);
72 static int get_adapter_status (struct hotplug_slot *slot, u8 *value); 73 static int get_adapter_status (struct hotplug_slot *slot, u8 *value);
73 74
74 /** 75 /**
75 * release_slot - free up the memory used by a slot 76 * release_slot - free up the memory used by a slot
76 * @hotplug_slot: slot to free 77 * @hotplug_slot: slot to free
77 */ 78 */
78 static void release_slot(struct hotplug_slot *hotplug_slot) 79 static void release_slot(struct hotplug_slot *hotplug_slot)
79 { 80 {
80 struct slot *slot = hotplug_slot->private; 81 struct slot *slot = hotplug_slot->private;
81 82
82 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 83 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
83 __func__, hotplug_slot_name(hotplug_slot)); 84 __func__, hotplug_slot_name(hotplug_slot));
84 85
85 kfree(hotplug_slot->ops); 86 kfree(hotplug_slot->ops);
86 kfree(hotplug_slot->info); 87 kfree(hotplug_slot->info);
87 kfree(hotplug_slot); 88 kfree(hotplug_slot);
88 } 89 }
89 90
90 static int init_slot(struct controller *ctrl) 91 static int init_slot(struct controller *ctrl)
91 { 92 {
92 struct slot *slot = ctrl->slot; 93 struct slot *slot = ctrl->slot;
93 struct hotplug_slot *hotplug = NULL; 94 struct hotplug_slot *hotplug = NULL;
94 struct hotplug_slot_info *info = NULL; 95 struct hotplug_slot_info *info = NULL;
95 struct hotplug_slot_ops *ops = NULL; 96 struct hotplug_slot_ops *ops = NULL;
96 char name[SLOT_NAME_SIZE]; 97 char name[SLOT_NAME_SIZE];
97 int retval = -ENOMEM; 98 int retval = -ENOMEM;
98 99
99 hotplug = kzalloc(sizeof(*hotplug), GFP_KERNEL); 100 hotplug = kzalloc(sizeof(*hotplug), GFP_KERNEL);
100 if (!hotplug) 101 if (!hotplug)
101 goto out; 102 goto out;
102 103
103 info = kzalloc(sizeof(*info), GFP_KERNEL); 104 info = kzalloc(sizeof(*info), GFP_KERNEL);
104 if (!info) 105 if (!info)
105 goto out; 106 goto out;
106 107
107 /* Setup hotplug slot ops */ 108 /* Setup hotplug slot ops */
108 ops = kzalloc(sizeof(*ops), GFP_KERNEL); 109 ops = kzalloc(sizeof(*ops), GFP_KERNEL);
109 if (!ops) 110 if (!ops)
110 goto out; 111 goto out;
111 ops->enable_slot = enable_slot; 112 ops->enable_slot = enable_slot;
112 ops->disable_slot = disable_slot; 113 ops->disable_slot = disable_slot;
113 ops->get_power_status = get_power_status; 114 ops->get_power_status = get_power_status;
114 ops->get_adapter_status = get_adapter_status; 115 ops->get_adapter_status = get_adapter_status;
115 if (MRL_SENS(ctrl)) 116 if (MRL_SENS(ctrl))
116 ops->get_latch_status = get_latch_status; 117 ops->get_latch_status = get_latch_status;
117 if (ATTN_LED(ctrl)) { 118 if (ATTN_LED(ctrl)) {
118 ops->get_attention_status = get_attention_status; 119 ops->get_attention_status = get_attention_status;
119 ops->set_attention_status = set_attention_status; 120 ops->set_attention_status = set_attention_status;
120 } 121 }
121 122
122 /* register this slot with the hotplug pci core */ 123 /* register this slot with the hotplug pci core */
123 hotplug->info = info; 124 hotplug->info = info;
124 hotplug->private = slot; 125 hotplug->private = slot;
125 hotplug->release = &release_slot; 126 hotplug->release = &release_slot;
126 hotplug->ops = ops; 127 hotplug->ops = ops;
127 slot->hotplug_slot = hotplug; 128 slot->hotplug_slot = hotplug;
128 snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl)); 129 snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
129 130
130 ctrl_dbg(ctrl, "Registering domain:bus:dev=%04x:%02x:00 sun=%x\n", 131 ctrl_dbg(ctrl, "Registering domain:bus:dev=%04x:%02x:00 sun=%x\n",
131 pci_domain_nr(ctrl->pcie->port->subordinate), 132 pci_domain_nr(ctrl->pcie->port->subordinate),
132 ctrl->pcie->port->subordinate->number, PSN(ctrl)); 133 ctrl->pcie->port->subordinate->number, PSN(ctrl));
133 retval = pci_hp_register(hotplug, 134 retval = pci_hp_register(hotplug,
134 ctrl->pcie->port->subordinate, 0, name); 135 ctrl->pcie->port->subordinate, 0, name);
135 if (retval) 136 if (retval)
136 ctrl_err(ctrl, 137 ctrl_err(ctrl,
137 "pci_hp_register failed with error %d\n", retval); 138 "pci_hp_register failed with error %d\n", retval);
138 out: 139 out:
139 if (retval) { 140 if (retval) {
140 kfree(ops); 141 kfree(ops);
141 kfree(info); 142 kfree(info);
142 kfree(hotplug); 143 kfree(hotplug);
143 } 144 }
144 return retval; 145 return retval;
145 } 146 }
146 147
147 static void cleanup_slot(struct controller *ctrl) 148 static void cleanup_slot(struct controller *ctrl)
148 { 149 {
149 pci_hp_deregister(ctrl->slot->hotplug_slot); 150 pci_hp_deregister(ctrl->slot->hotplug_slot);
150 } 151 }
151 152
152 /* 153 /*
153 * set_attention_status - Turns the Amber LED for a slot on, off or blink 154 * set_attention_status - Turns the Amber LED for a slot on, off or blink
154 */ 155 */
155 static int set_attention_status(struct hotplug_slot *hotplug_slot, u8 status) 156 static int set_attention_status(struct hotplug_slot *hotplug_slot, u8 status)
156 { 157 {
157 struct slot *slot = hotplug_slot->private; 158 struct slot *slot = hotplug_slot->private;
158 159
159 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 160 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
160 __func__, slot_name(slot)); 161 __func__, slot_name(slot));
161 162
162 return pciehp_set_attention_status(slot, status); 163 return pciehp_set_attention_status(slot, status);
163 } 164 }
164 165
165 166
166 static int enable_slot(struct hotplug_slot *hotplug_slot) 167 static int enable_slot(struct hotplug_slot *hotplug_slot)
167 { 168 {
168 struct slot *slot = hotplug_slot->private; 169 struct slot *slot = hotplug_slot->private;
169 170
170 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 171 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
171 __func__, slot_name(slot)); 172 __func__, slot_name(slot));
172 173
173 return pciehp_sysfs_enable_slot(slot); 174 return pciehp_sysfs_enable_slot(slot);
174 } 175 }
175 176
176 177
177 static int disable_slot(struct hotplug_slot *hotplug_slot) 178 static int disable_slot(struct hotplug_slot *hotplug_slot)
178 { 179 {
179 struct slot *slot = hotplug_slot->private; 180 struct slot *slot = hotplug_slot->private;
180 181
181 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 182 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
182 __func__, slot_name(slot)); 183 __func__, slot_name(slot));
183 184
184 return pciehp_sysfs_disable_slot(slot); 185 return pciehp_sysfs_disable_slot(slot);
185 } 186 }
186 187
187 static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value) 188 static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value)
188 { 189 {
189 struct slot *slot = hotplug_slot->private; 190 struct slot *slot = hotplug_slot->private;
190 191
191 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 192 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
192 __func__, slot_name(slot)); 193 __func__, slot_name(slot));
193 194
194 return pciehp_get_power_status(slot, value); 195 return pciehp_get_power_status(slot, value);
195 } 196 }
196 197
197 static int get_attention_status(struct hotplug_slot *hotplug_slot, u8 *value) 198 static int get_attention_status(struct hotplug_slot *hotplug_slot, u8 *value)
198 { 199 {
199 struct slot *slot = hotplug_slot->private; 200 struct slot *slot = hotplug_slot->private;
200 201
201 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 202 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
202 __func__, slot_name(slot)); 203 __func__, slot_name(slot));
203 204
204 return pciehp_get_attention_status(slot, value); 205 return pciehp_get_attention_status(slot, value);
205 } 206 }
206 207
207 static int get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value) 208 static int get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value)
208 { 209 {
209 struct slot *slot = hotplug_slot->private; 210 struct slot *slot = hotplug_slot->private;
210 211
211 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 212 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
212 __func__, slot_name(slot)); 213 __func__, slot_name(slot));
213 214
214 return pciehp_get_latch_status(slot, value); 215 return pciehp_get_latch_status(slot, value);
215 } 216 }
216 217
217 static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value) 218 static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
218 { 219 {
219 struct slot *slot = hotplug_slot->private; 220 struct slot *slot = hotplug_slot->private;
220 221
221 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 222 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
222 __func__, slot_name(slot)); 223 __func__, slot_name(slot));
223 224
224 return pciehp_get_adapter_status(slot, value); 225 return pciehp_get_adapter_status(slot, value);
225 } 226 }
226 227
227 static int pciehp_probe(struct pcie_device *dev) 228 static int pciehp_probe(struct pcie_device *dev)
228 { 229 {
229 int rc; 230 int rc;
230 struct controller *ctrl; 231 struct controller *ctrl;
231 struct slot *slot; 232 struct slot *slot;
232 u8 occupied, poweron; 233 u8 occupied, poweron;
233 234
234 if (pciehp_force) 235 if (pciehp_force)
235 dev_info(&dev->device, 236 dev_info(&dev->device,
236 "Bypassing BIOS check for pciehp use on %s\n", 237 "Bypassing BIOS check for pciehp use on %s\n",
237 pci_name(dev->port)); 238 pci_name(dev->port));
238 else if (pciehp_acpi_slot_detection_check(dev->port)) 239 else if (pciehp_acpi_slot_detection_check(dev->port))
239 goto err_out_none; 240 goto err_out_none;
240 241
241 ctrl = pcie_init(dev); 242 ctrl = pcie_init(dev);
242 if (!ctrl) { 243 if (!ctrl) {
243 dev_err(&dev->device, "Controller initialization failed\n"); 244 dev_err(&dev->device, "Controller initialization failed\n");
244 goto err_out_none; 245 goto err_out_none;
245 } 246 }
246 set_service_data(dev, ctrl); 247 set_service_data(dev, ctrl);
247 248
248 /* Setup the slot information structures */ 249 /* Setup the slot information structures */
249 rc = init_slot(ctrl); 250 rc = init_slot(ctrl);
250 if (rc) { 251 if (rc) {
251 if (rc == -EBUSY) 252 if (rc == -EBUSY)
252 ctrl_warn(ctrl, "Slot already registered by another " 253 ctrl_warn(ctrl, "Slot already registered by another "
253 "hotplug driver\n"); 254 "hotplug driver\n");
254 else 255 else
255 ctrl_err(ctrl, "Slot initialization failed\n"); 256 ctrl_err(ctrl, "Slot initialization failed\n");
256 goto err_out_release_ctlr; 257 goto err_out_release_ctlr;
257 } 258 }
258 259
259 /* Enable events after we have setup the data structures */ 260 /* Enable events after we have setup the data structures */
260 rc = pcie_init_notification(ctrl); 261 rc = pcie_init_notification(ctrl);
261 if (rc) { 262 if (rc) {
262 ctrl_err(ctrl, "Notification initialization failed\n"); 263 ctrl_err(ctrl, "Notification initialization failed\n");
263 goto err_out_free_ctrl_slot; 264 goto err_out_free_ctrl_slot;
264 } 265 }
265 266
266 /* Check if slot is occupied */ 267 /* Check if slot is occupied */
267 slot = ctrl->slot; 268 slot = ctrl->slot;
268 pciehp_get_adapter_status(slot, &occupied); 269 pciehp_get_adapter_status(slot, &occupied);
269 pciehp_get_power_status(slot, &poweron); 270 pciehp_get_power_status(slot, &poweron);
270 if (occupied && pciehp_force) 271 if (occupied && pciehp_force)
271 pciehp_enable_slot(slot); 272 pciehp_enable_slot(slot);
272 /* If empty slot's power status is on, turn power off */ 273 /* If empty slot's power status is on, turn power off */
273 if (!occupied && poweron && POWER_CTRL(ctrl)) 274 if (!occupied && poweron && POWER_CTRL(ctrl))
274 pciehp_power_off_slot(slot); 275 pciehp_power_off_slot(slot);
275 276
276 return 0; 277 return 0;
277 278
278 err_out_free_ctrl_slot: 279 err_out_free_ctrl_slot:
279 cleanup_slot(ctrl); 280 cleanup_slot(ctrl);
280 err_out_release_ctlr: 281 err_out_release_ctlr:
281 pciehp_release_ctrl(ctrl); 282 pciehp_release_ctrl(ctrl);
282 err_out_none: 283 err_out_none:
283 return -ENODEV; 284 return -ENODEV;
284 } 285 }
285 286
286 static void pciehp_remove(struct pcie_device *dev) 287 static void pciehp_remove(struct pcie_device *dev)
287 { 288 {
288 struct controller *ctrl = get_service_data(dev); 289 struct controller *ctrl = get_service_data(dev);
289 290
290 cleanup_slot(ctrl); 291 cleanup_slot(ctrl);
291 pciehp_release_ctrl(ctrl); 292 pciehp_release_ctrl(ctrl);
292 } 293 }
293 294
294 #ifdef CONFIG_PM 295 #ifdef CONFIG_PM
295 static int pciehp_suspend (struct pcie_device *dev) 296 static int pciehp_suspend (struct pcie_device *dev)
296 { 297 {
297 dev_info(&dev->device, "%s ENTRY\n", __func__); 298 dev_info(&dev->device, "%s ENTRY\n", __func__);
298 return 0; 299 return 0;
299 } 300 }
300 301
301 static int pciehp_resume (struct pcie_device *dev) 302 static int pciehp_resume (struct pcie_device *dev)
302 { 303 {
303 dev_info(&dev->device, "%s ENTRY\n", __func__); 304 dev_info(&dev->device, "%s ENTRY\n", __func__);
304 if (pciehp_force) { 305 if (pciehp_force) {
305 struct controller *ctrl = get_service_data(dev); 306 struct controller *ctrl = get_service_data(dev);
306 struct slot *slot; 307 struct slot *slot;
307 u8 status; 308 u8 status;
308 309
309 /* reinitialize the chipset's event detection logic */ 310 /* reinitialize the chipset's event detection logic */
310 pcie_enable_notification(ctrl); 311 pcie_enable_notification(ctrl);
311 312
312 slot = ctrl->slot; 313 slot = ctrl->slot;
313 314
314 /* Check if slot is occupied */ 315 /* Check if slot is occupied */
315 pciehp_get_adapter_status(slot, &status); 316 pciehp_get_adapter_status(slot, &status);
316 if (status) 317 if (status)
317 pciehp_enable_slot(slot); 318 pciehp_enable_slot(slot);
318 else 319 else
319 pciehp_disable_slot(slot); 320 pciehp_disable_slot(slot);
320 } 321 }
321 return 0; 322 return 0;
322 } 323 }
323 #endif /* PM */ 324 #endif /* PM */
324 325
325 static struct pcie_port_service_driver hpdriver_portdrv = { 326 static struct pcie_port_service_driver hpdriver_portdrv = {
326 .name = PCIE_MODULE_NAME, 327 .name = PCIE_MODULE_NAME,
327 .port_type = PCIE_ANY_PORT, 328 .port_type = PCIE_ANY_PORT,
328 .service = PCIE_PORT_SERVICE_HP, 329 .service = PCIE_PORT_SERVICE_HP,
329 330
330 .probe = pciehp_probe, 331 .probe = pciehp_probe,
331 .remove = pciehp_remove, 332 .remove = pciehp_remove,
332 333
333 #ifdef CONFIG_PM 334 #ifdef CONFIG_PM
334 .suspend = pciehp_suspend, 335 .suspend = pciehp_suspend,
335 .resume = pciehp_resume, 336 .resume = pciehp_resume,
336 #endif /* PM */ 337 #endif /* PM */
337 }; 338 };
338 339
339 static int __init pcied_init(void) 340 static int __init pcied_init(void)
340 { 341 {
341 int retval = 0; 342 int retval = 0;
342 343
344 pciehp_wq = alloc_workqueue("pciehp", 0, 0);
345 if (!pciehp_wq)
346 return -ENOMEM;
347
348 pciehp_ordered_wq = alloc_ordered_workqueue("pciehp_ordered", 0);
349 if (!pciehp_ordered_wq) {
350 destroy_workqueue(pciehp_wq);
351 return -ENOMEM;
352 }
353
343 pciehp_firmware_init(); 354 pciehp_firmware_init();
344 retval = pcie_port_service_register(&hpdriver_portdrv); 355 retval = pcie_port_service_register(&hpdriver_portdrv);
345 dbg("pcie_port_service_register = %d\n", retval); 356 dbg("pcie_port_service_register = %d\n", retval);
346 info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); 357 info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
347 if (retval) 358 if (retval) {
359 destroy_workqueue(pciehp_ordered_wq);
360 destroy_workqueue(pciehp_wq);
348 dbg("Failure to register service\n"); 361 dbg("Failure to register service\n");
362 }
349 return retval; 363 return retval;
350 } 364 }
351 365
352 static void __exit pcied_cleanup(void) 366 static void __exit pcied_cleanup(void)
353 { 367 {
354 dbg("unload_pciehpd()\n"); 368 dbg("unload_pciehpd()\n");
369 destroy_workqueue(pciehp_ordered_wq);
370 destroy_workqueue(pciehp_wq);
355 pcie_port_service_unregister(&hpdriver_portdrv); 371 pcie_port_service_unregister(&hpdriver_portdrv);
356 info(DRIVER_DESC " version: " DRIVER_VERSION " unloaded\n"); 372 info(DRIVER_DESC " version: " DRIVER_VERSION " unloaded\n");
357 } 373 }
358 374
359 module_init(pcied_init); 375 module_init(pcied_init);
360 module_exit(pcied_cleanup); 376 module_exit(pcied_cleanup);
361 377
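[Editor's note] The pcied_init()/pcied_cleanup() hunks above are the heart of the pciehp conversion: instead of leaning on the shared system workqueue, the driver now allocates its own workqueues up front with alloc_workqueue() and alloc_ordered_workqueue(), unwinds them on the registration failure path, and destroys them on module exit. The minimal sketch below shows that allocate/queue/destroy lifecycle in isolation; the module and "example_*" names are invented for illustration, only the workqueue calls mirror what the driver does.

	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *example_wq;		/* regular, possibly concurrent work */
	static struct workqueue_struct *example_ordered_wq;	/* strictly serialized work */

	static void example_fn(struct work_struct *work)
	{
		pr_info("example work ran\n");
	}
	static DECLARE_WORK(example_work, example_fn);

	static int __init example_init(void)
	{
		example_wq = alloc_workqueue("example", 0, 0);
		if (!example_wq)
			return -ENOMEM;

		example_ordered_wq = alloc_ordered_workqueue("example_ordered", 0);
		if (!example_ordered_wq) {
			/* unwind on failure, as pcied_init() does */
			destroy_workqueue(example_wq);
			return -ENOMEM;
		}

		queue_work(example_wq, &example_work);
		return 0;
	}

	static void __exit example_exit(void)
	{
		/* drains pending work, then frees the queues */
		destroy_workqueue(example_ordered_wq);
		destroy_workqueue(example_wq);
	}

	module_init(example_init);
	module_exit(example_exit);
	MODULE_LICENSE("GPL");
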
drivers/pci/hotplug/pciehp_ctrl.c
1 /* 1 /*
2 * PCI Express Hot Plug Controller Driver 2 * PCI Express Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/module.h> 30 #include <linux/module.h>
31 #include <linux/kernel.h> 31 #include <linux/kernel.h>
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/slab.h> 33 #include <linux/slab.h>
34 #include <linux/pci.h> 34 #include <linux/pci.h>
35 #include <linux/workqueue.h>
36 #include "../pci.h" 35 #include "../pci.h"
37 #include "pciehp.h" 36 #include "pciehp.h"
38 37
39 static void interrupt_event_handler(struct work_struct *work); 38 static void interrupt_event_handler(struct work_struct *work);
40 39
41 static int queue_interrupt_event(struct slot *p_slot, u32 event_type) 40 static int queue_interrupt_event(struct slot *p_slot, u32 event_type)
42 { 41 {
43 struct event_info *info; 42 struct event_info *info;
44 43
45 info = kmalloc(sizeof(*info), GFP_ATOMIC); 44 info = kmalloc(sizeof(*info), GFP_ATOMIC);
46 if (!info) 45 if (!info)
47 return -ENOMEM; 46 return -ENOMEM;
48 47
49 info->event_type = event_type; 48 info->event_type = event_type;
50 info->p_slot = p_slot; 49 info->p_slot = p_slot;
51 INIT_WORK(&info->work, interrupt_event_handler); 50 INIT_WORK(&info->work, interrupt_event_handler);
52 51
53 schedule_work(&info->work); 52 queue_work(pciehp_wq, &info->work);
54 53
55 return 0; 54 return 0;
56 } 55 }
57 56
58 u8 pciehp_handle_attention_button(struct slot *p_slot) 57 u8 pciehp_handle_attention_button(struct slot *p_slot)
59 { 58 {
60 u32 event_type; 59 u32 event_type;
61 struct controller *ctrl = p_slot->ctrl; 60 struct controller *ctrl = p_slot->ctrl;
62 61
63 /* Attention Button Change */ 62 /* Attention Button Change */
64 ctrl_dbg(ctrl, "Attention button interrupt received\n"); 63 ctrl_dbg(ctrl, "Attention button interrupt received\n");
65 64
66 /* 65 /*
67 * Button pressed - See if need to TAKE ACTION!!! 66 * Button pressed - See if need to TAKE ACTION!!!
68 */ 67 */
69 ctrl_info(ctrl, "Button pressed on Slot(%s)\n", slot_name(p_slot)); 68 ctrl_info(ctrl, "Button pressed on Slot(%s)\n", slot_name(p_slot));
70 event_type = INT_BUTTON_PRESS; 69 event_type = INT_BUTTON_PRESS;
71 70
72 queue_interrupt_event(p_slot, event_type); 71 queue_interrupt_event(p_slot, event_type);
73 72
74 return 0; 73 return 0;
75 } 74 }
76 75
77 u8 pciehp_handle_switch_change(struct slot *p_slot) 76 u8 pciehp_handle_switch_change(struct slot *p_slot)
78 { 77 {
79 u8 getstatus; 78 u8 getstatus;
80 u32 event_type; 79 u32 event_type;
81 struct controller *ctrl = p_slot->ctrl; 80 struct controller *ctrl = p_slot->ctrl;
82 81
83 /* Switch Change */ 82 /* Switch Change */
84 ctrl_dbg(ctrl, "Switch interrupt received\n"); 83 ctrl_dbg(ctrl, "Switch interrupt received\n");
85 84
86 pciehp_get_latch_status(p_slot, &getstatus); 85 pciehp_get_latch_status(p_slot, &getstatus);
87 if (getstatus) { 86 if (getstatus) {
88 /* 87 /*
89 * Switch opened 88 * Switch opened
90 */ 89 */
91 ctrl_info(ctrl, "Latch open on Slot(%s)\n", slot_name(p_slot)); 90 ctrl_info(ctrl, "Latch open on Slot(%s)\n", slot_name(p_slot));
92 event_type = INT_SWITCH_OPEN; 91 event_type = INT_SWITCH_OPEN;
93 } else { 92 } else {
94 /* 93 /*
95 * Switch closed 94 * Switch closed
96 */ 95 */
97 ctrl_info(ctrl, "Latch close on Slot(%s)\n", slot_name(p_slot)); 96 ctrl_info(ctrl, "Latch close on Slot(%s)\n", slot_name(p_slot));
98 event_type = INT_SWITCH_CLOSE; 97 event_type = INT_SWITCH_CLOSE;
99 } 98 }
100 99
101 queue_interrupt_event(p_slot, event_type); 100 queue_interrupt_event(p_slot, event_type);
102 101
103 return 1; 102 return 1;
104 } 103 }
105 104
106 u8 pciehp_handle_presence_change(struct slot *p_slot) 105 u8 pciehp_handle_presence_change(struct slot *p_slot)
107 { 106 {
108 u32 event_type; 107 u32 event_type;
109 u8 presence_save; 108 u8 presence_save;
110 struct controller *ctrl = p_slot->ctrl; 109 struct controller *ctrl = p_slot->ctrl;
111 110
112 /* Presence Change */ 111 /* Presence Change */
113 ctrl_dbg(ctrl, "Presence/Notify input change\n"); 112 ctrl_dbg(ctrl, "Presence/Notify input change\n");
114 113
115 /* Switch is open, assume a presence change 114 /* Switch is open, assume a presence change
116 * Save the presence state 115 * Save the presence state
117 */ 116 */
118 pciehp_get_adapter_status(p_slot, &presence_save); 117 pciehp_get_adapter_status(p_slot, &presence_save);
119 if (presence_save) { 118 if (presence_save) {
120 /* 119 /*
121 * Card Present 120 * Card Present
122 */ 121 */
123 ctrl_info(ctrl, "Card present on Slot(%s)\n", slot_name(p_slot)); 122 ctrl_info(ctrl, "Card present on Slot(%s)\n", slot_name(p_slot));
124 event_type = INT_PRESENCE_ON; 123 event_type = INT_PRESENCE_ON;
125 } else { 124 } else {
126 /* 125 /*
127 * Not Present 126 * Not Present
128 */ 127 */
129 ctrl_info(ctrl, "Card not present on Slot(%s)\n", 128 ctrl_info(ctrl, "Card not present on Slot(%s)\n",
130 slot_name(p_slot)); 129 slot_name(p_slot));
131 event_type = INT_PRESENCE_OFF; 130 event_type = INT_PRESENCE_OFF;
132 } 131 }
133 132
134 queue_interrupt_event(p_slot, event_type); 133 queue_interrupt_event(p_slot, event_type);
135 134
136 return 1; 135 return 1;
137 } 136 }
138 137
139 u8 pciehp_handle_power_fault(struct slot *p_slot) 138 u8 pciehp_handle_power_fault(struct slot *p_slot)
140 { 139 {
141 u32 event_type; 140 u32 event_type;
142 struct controller *ctrl = p_slot->ctrl; 141 struct controller *ctrl = p_slot->ctrl;
143 142
144 /* power fault */ 143 /* power fault */
145 ctrl_dbg(ctrl, "Power fault interrupt received\n"); 144 ctrl_dbg(ctrl, "Power fault interrupt received\n");
146 ctrl_err(ctrl, "Power fault on slot %s\n", slot_name(p_slot)); 145 ctrl_err(ctrl, "Power fault on slot %s\n", slot_name(p_slot));
147 event_type = INT_POWER_FAULT; 146 event_type = INT_POWER_FAULT;
148 ctrl_info(ctrl, "Power fault bit %x set\n", 0); 147 ctrl_info(ctrl, "Power fault bit %x set\n", 0);
149 queue_interrupt_event(p_slot, event_type); 148 queue_interrupt_event(p_slot, event_type);
150 149
151 return 1; 150 return 1;
152 } 151 }
153 152
154 /* The following routines constitute the bulk of the 153 /* The following routines constitute the bulk of the
155 hotplug controller logic 154 hotplug controller logic
156 */ 155 */
157 156
158 static void set_slot_off(struct controller *ctrl, struct slot * pslot) 157 static void set_slot_off(struct controller *ctrl, struct slot * pslot)
159 { 158 {
160 /* turn off slot, turn on Amber LED, turn off Green LED if supported*/ 159 /* turn off slot, turn on Amber LED, turn off Green LED if supported*/
161 if (POWER_CTRL(ctrl)) { 160 if (POWER_CTRL(ctrl)) {
162 if (pciehp_power_off_slot(pslot)) { 161 if (pciehp_power_off_slot(pslot)) {
163 ctrl_err(ctrl, 162 ctrl_err(ctrl,
164 "Issue of Slot Power Off command failed\n"); 163 "Issue of Slot Power Off command failed\n");
165 return; 164 return;
166 } 165 }
167 /* 166 /*
168 * After turning power off, we must wait for at least 1 second 167 * After turning power off, we must wait for at least 1 second
169 * before taking any action that relies on power having been 168 * before taking any action that relies on power having been
170 * removed from the slot/adapter. 169 * removed from the slot/adapter.
171 */ 170 */
172 msleep(1000); 171 msleep(1000);
173 } 172 }
174 173
175 if (PWR_LED(ctrl)) 174 if (PWR_LED(ctrl))
176 pciehp_green_led_off(pslot); 175 pciehp_green_led_off(pslot);
177 176
178 if (ATTN_LED(ctrl)) { 177 if (ATTN_LED(ctrl)) {
179 if (pciehp_set_attention_status(pslot, 1)) { 178 if (pciehp_set_attention_status(pslot, 1)) {
180 ctrl_err(ctrl, 179 ctrl_err(ctrl,
181 "Issue of Set Attention Led command failed\n"); 180 "Issue of Set Attention Led command failed\n");
182 return; 181 return;
183 } 182 }
184 } 183 }
185 } 184 }
186 185
187 /** 186 /**
188 * board_added - Called after a board has been added to the system. 187 * board_added - Called after a board has been added to the system.
189 * @p_slot: &slot where board is added 188 * @p_slot: &slot where board is added
190 * 189 *
191 * Turns power on for the board. 190 * Turns power on for the board.
192 * Configures board. 191 * Configures board.
193 */ 192 */
194 static int board_added(struct slot *p_slot) 193 static int board_added(struct slot *p_slot)
195 { 194 {
196 int retval = 0; 195 int retval = 0;
197 struct controller *ctrl = p_slot->ctrl; 196 struct controller *ctrl = p_slot->ctrl;
198 struct pci_bus *parent = ctrl->pcie->port->subordinate; 197 struct pci_bus *parent = ctrl->pcie->port->subordinate;
199 198
200 if (POWER_CTRL(ctrl)) { 199 if (POWER_CTRL(ctrl)) {
201 /* Power on slot */ 200 /* Power on slot */
202 retval = pciehp_power_on_slot(p_slot); 201 retval = pciehp_power_on_slot(p_slot);
203 if (retval) 202 if (retval)
204 return retval; 203 return retval;
205 } 204 }
206 205
207 if (PWR_LED(ctrl)) 206 if (PWR_LED(ctrl))
208 pciehp_green_led_blink(p_slot); 207 pciehp_green_led_blink(p_slot);
209 208
210 /* Check link training status */ 209 /* Check link training status */
211 retval = pciehp_check_link_status(ctrl); 210 retval = pciehp_check_link_status(ctrl);
212 if (retval) { 211 if (retval) {
213 ctrl_err(ctrl, "Failed to check link status\n"); 212 ctrl_err(ctrl, "Failed to check link status\n");
214 goto err_exit; 213 goto err_exit;
215 } 214 }
216 215
217 /* Check for a power fault */ 216 /* Check for a power fault */
218 if (ctrl->power_fault_detected || pciehp_query_power_fault(p_slot)) { 217 if (ctrl->power_fault_detected || pciehp_query_power_fault(p_slot)) {
219 ctrl_err(ctrl, "Power fault on slot %s\n", slot_name(p_slot)); 218 ctrl_err(ctrl, "Power fault on slot %s\n", slot_name(p_slot));
220 retval = -EIO; 219 retval = -EIO;
221 goto err_exit; 220 goto err_exit;
222 } 221 }
223 222
224 retval = pciehp_configure_device(p_slot); 223 retval = pciehp_configure_device(p_slot);
225 if (retval) { 224 if (retval) {
226 ctrl_err(ctrl, "Cannot add device at %04x:%02x:00\n", 225 ctrl_err(ctrl, "Cannot add device at %04x:%02x:00\n",
227 pci_domain_nr(parent), parent->number); 226 pci_domain_nr(parent), parent->number);
228 goto err_exit; 227 goto err_exit;
229 } 228 }
230 229
231 if (PWR_LED(ctrl)) 230 if (PWR_LED(ctrl))
232 pciehp_green_led_on(p_slot); 231 pciehp_green_led_on(p_slot);
233 232
234 return 0; 233 return 0;
235 234
236 err_exit: 235 err_exit:
237 set_slot_off(ctrl, p_slot); 236 set_slot_off(ctrl, p_slot);
238 return retval; 237 return retval;
239 } 238 }
240 239
241 /** 240 /**
242 * remove_board - Turns off slot and LEDs 241 * remove_board - Turns off slot and LEDs
243 * @p_slot: slot where board is being removed 242 * @p_slot: slot where board is being removed
244 */ 243 */
245 static int remove_board(struct slot *p_slot) 244 static int remove_board(struct slot *p_slot)
246 { 245 {
247 int retval = 0; 246 int retval = 0;
248 struct controller *ctrl = p_slot->ctrl; 247 struct controller *ctrl = p_slot->ctrl;
249 248
250 retval = pciehp_unconfigure_device(p_slot); 249 retval = pciehp_unconfigure_device(p_slot);
251 if (retval) 250 if (retval)
252 return retval; 251 return retval;
253 252
254 if (POWER_CTRL(ctrl)) { 253 if (POWER_CTRL(ctrl)) {
255 /* power off slot */ 254 /* power off slot */
256 retval = pciehp_power_off_slot(p_slot); 255 retval = pciehp_power_off_slot(p_slot);
257 if (retval) { 256 if (retval) {
258 ctrl_err(ctrl, 257 ctrl_err(ctrl,
259 "Issue of Slot Disable command failed\n"); 258 "Issue of Slot Disable command failed\n");
260 return retval; 259 return retval;
261 } 260 }
262 /* 261 /*
263 * After turning power off, we must wait for at least 1 second 262 * After turning power off, we must wait for at least 1 second
264 * before taking any action that relies on power having been 263 * before taking any action that relies on power having been
265 * removed from the slot/adapter. 264 * removed from the slot/adapter.
266 */ 265 */
267 msleep(1000); 266 msleep(1000);
268 } 267 }
269 268
270 /* turn off Green LED */ 269 /* turn off Green LED */
271 if (PWR_LED(ctrl)) 270 if (PWR_LED(ctrl))
272 pciehp_green_led_off(p_slot); 271 pciehp_green_led_off(p_slot);
273 272
274 return 0; 273 return 0;
275 } 274 }
276 275
277 struct power_work_info { 276 struct power_work_info {
278 struct slot *p_slot; 277 struct slot *p_slot;
279 struct work_struct work; 278 struct work_struct work;
280 }; 279 };
281 280
282 /** 281 /**
283 * pciehp_power_thread - handle pushbutton events 282 * pciehp_power_thread - handle pushbutton events
284 * @work: &struct work_struct describing work to be done 283 * @work: &struct work_struct describing work to be done
285 * 284 *
286 * Scheduled procedure to handle blocking stuff for the pushbuttons. 285 * Scheduled procedure to handle blocking stuff for the pushbuttons.
287 * Handles all pending events and exits. 286 * Handles all pending events and exits.
288 */ 287 */
289 static void pciehp_power_thread(struct work_struct *work) 288 static void pciehp_power_thread(struct work_struct *work)
290 { 289 {
291 struct power_work_info *info = 290 struct power_work_info *info =
292 container_of(work, struct power_work_info, work); 291 container_of(work, struct power_work_info, work);
293 struct slot *p_slot = info->p_slot; 292 struct slot *p_slot = info->p_slot;
294 293
295 mutex_lock(&p_slot->lock); 294 mutex_lock(&p_slot->lock);
296 switch (p_slot->state) { 295 switch (p_slot->state) {
297 case POWEROFF_STATE: 296 case POWEROFF_STATE:
298 mutex_unlock(&p_slot->lock); 297 mutex_unlock(&p_slot->lock);
299 ctrl_dbg(p_slot->ctrl, 298 ctrl_dbg(p_slot->ctrl,
300 "Disabling domain:bus:device=%04x:%02x:00\n", 299 "Disabling domain:bus:device=%04x:%02x:00\n",
301 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate), 300 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate),
302 p_slot->ctrl->pcie->port->subordinate->number); 301 p_slot->ctrl->pcie->port->subordinate->number);
303 pciehp_disable_slot(p_slot); 302 pciehp_disable_slot(p_slot);
304 mutex_lock(&p_slot->lock); 303 mutex_lock(&p_slot->lock);
305 p_slot->state = STATIC_STATE; 304 p_slot->state = STATIC_STATE;
306 break; 305 break;
307 case POWERON_STATE: 306 case POWERON_STATE:
308 mutex_unlock(&p_slot->lock); 307 mutex_unlock(&p_slot->lock);
309 if (pciehp_enable_slot(p_slot) && PWR_LED(p_slot->ctrl)) 308 if (pciehp_enable_slot(p_slot) && PWR_LED(p_slot->ctrl))
310 pciehp_green_led_off(p_slot); 309 pciehp_green_led_off(p_slot);
311 mutex_lock(&p_slot->lock); 310 mutex_lock(&p_slot->lock);
312 p_slot->state = STATIC_STATE; 311 p_slot->state = STATIC_STATE;
313 break; 312 break;
314 default: 313 default:
315 break; 314 break;
316 } 315 }
317 mutex_unlock(&p_slot->lock); 316 mutex_unlock(&p_slot->lock);
318 317
319 kfree(info); 318 kfree(info);
320 } 319 }
321 320
322 void pciehp_queue_pushbutton_work(struct work_struct *work) 321 void pciehp_queue_pushbutton_work(struct work_struct *work)
323 { 322 {
324 struct slot *p_slot = container_of(work, struct slot, work.work); 323 struct slot *p_slot = container_of(work, struct slot, work.work);
325 struct power_work_info *info; 324 struct power_work_info *info;
326 325
327 info = kmalloc(sizeof(*info), GFP_KERNEL); 326 info = kmalloc(sizeof(*info), GFP_KERNEL);
328 if (!info) { 327 if (!info) {
329 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n", 328 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n",
330 __func__); 329 __func__);
331 return; 330 return;
332 } 331 }
333 info->p_slot = p_slot; 332 info->p_slot = p_slot;
334 INIT_WORK(&info->work, pciehp_power_thread); 333 INIT_WORK(&info->work, pciehp_power_thread);
335 334
336 mutex_lock(&p_slot->lock); 335 mutex_lock(&p_slot->lock);
337 switch (p_slot->state) { 336 switch (p_slot->state) {
338 case BLINKINGOFF_STATE: 337 case BLINKINGOFF_STATE:
339 p_slot->state = POWEROFF_STATE; 338 p_slot->state = POWEROFF_STATE;
340 break; 339 break;
341 case BLINKINGON_STATE: 340 case BLINKINGON_STATE:
342 p_slot->state = POWERON_STATE; 341 p_slot->state = POWERON_STATE;
343 break; 342 break;
344 default: 343 default:
345 kfree(info); 344 kfree(info);
346 goto out; 345 goto out;
347 } 346 }
348 queue_work(pciehp_wq, &info->work); 347 queue_work(pciehp_ordered_wq, &info->work);
349 out: 348 out:
350 mutex_unlock(&p_slot->lock); 349 mutex_unlock(&p_slot->lock);
351 } 350 }
352 351
353 /* 352 /*
354 * Note: This function must be called with slot->lock held 353 * Note: This function must be called with slot->lock held
355 */ 354 */
356 static void handle_button_press_event(struct slot *p_slot) 355 static void handle_button_press_event(struct slot *p_slot)
357 { 356 {
358 struct controller *ctrl = p_slot->ctrl; 357 struct controller *ctrl = p_slot->ctrl;
359 u8 getstatus; 358 u8 getstatus;
360 359
361 switch (p_slot->state) { 360 switch (p_slot->state) {
362 case STATIC_STATE: 361 case STATIC_STATE:
363 pciehp_get_power_status(p_slot, &getstatus); 362 pciehp_get_power_status(p_slot, &getstatus);
364 if (getstatus) { 363 if (getstatus) {
365 p_slot->state = BLINKINGOFF_STATE; 364 p_slot->state = BLINKINGOFF_STATE;
366 ctrl_info(ctrl, 365 ctrl_info(ctrl,
367 "PCI slot #%s - powering off due to button " 366 "PCI slot #%s - powering off due to button "
368 "press.\n", slot_name(p_slot)); 367 "press.\n", slot_name(p_slot));
369 } else { 368 } else {
370 p_slot->state = BLINKINGON_STATE; 369 p_slot->state = BLINKINGON_STATE;
371 ctrl_info(ctrl, 370 ctrl_info(ctrl,
372 "PCI slot #%s - powering on due to button " 371 "PCI slot #%s - powering on due to button "
373 "press.\n", slot_name(p_slot)); 372 "press.\n", slot_name(p_slot));
374 } 373 }
375 /* blink green LED and turn off amber */ 374 /* blink green LED and turn off amber */
376 if (PWR_LED(ctrl)) 375 if (PWR_LED(ctrl))
377 pciehp_green_led_blink(p_slot); 376 pciehp_green_led_blink(p_slot);
378 if (ATTN_LED(ctrl)) 377 if (ATTN_LED(ctrl))
379 pciehp_set_attention_status(p_slot, 0); 378 pciehp_set_attention_status(p_slot, 0);
380 379
381 schedule_delayed_work(&p_slot->work, 5*HZ); 380 queue_delayed_work(pciehp_wq, &p_slot->work, 5*HZ);
382 break; 381 break;
383 case BLINKINGOFF_STATE: 382 case BLINKINGOFF_STATE:
384 case BLINKINGON_STATE: 383 case BLINKINGON_STATE:
385 /* 384 /*
386 * Cancel if we are still blinking; this means that we 385 * Cancel if we are still blinking; this means that we
387 * press the attention again before the 5 sec. limit 386 * press the attention again before the 5 sec. limit
388 * expires to cancel hot-add or hot-remove 387 * expires to cancel hot-add or hot-remove
389 */ 388 */
390 ctrl_info(ctrl, "Button cancel on Slot(%s)\n", slot_name(p_slot)); 389 ctrl_info(ctrl, "Button cancel on Slot(%s)\n", slot_name(p_slot));
391 cancel_delayed_work(&p_slot->work); 390 cancel_delayed_work(&p_slot->work);
392 if (p_slot->state == BLINKINGOFF_STATE) { 391 if (p_slot->state == BLINKINGOFF_STATE) {
393 if (PWR_LED(ctrl)) 392 if (PWR_LED(ctrl))
394 pciehp_green_led_on(p_slot); 393 pciehp_green_led_on(p_slot);
395 } else { 394 } else {
396 if (PWR_LED(ctrl)) 395 if (PWR_LED(ctrl))
397 pciehp_green_led_off(p_slot); 396 pciehp_green_led_off(p_slot);
398 } 397 }
399 if (ATTN_LED(ctrl)) 398 if (ATTN_LED(ctrl))
400 pciehp_set_attention_status(p_slot, 0); 399 pciehp_set_attention_status(p_slot, 0);
401 ctrl_info(ctrl, "PCI slot #%s - action canceled " 400 ctrl_info(ctrl, "PCI slot #%s - action canceled "
402 "due to button press\n", slot_name(p_slot)); 401 "due to button press\n", slot_name(p_slot));
403 p_slot->state = STATIC_STATE; 402 p_slot->state = STATIC_STATE;
404 break; 403 break;
405 case POWEROFF_STATE: 404 case POWEROFF_STATE:
406 case POWERON_STATE: 405 case POWERON_STATE:
407 /* 406 /*
408 * Ignore if the slot is on power-on or power-off state; 407 * Ignore if the slot is on power-on or power-off state;
409 * this means that the previous attention button action 408 * this means that the previous attention button action
410 * to hot-add or hot-remove is undergoing 409 * to hot-add or hot-remove is undergoing
411 */ 410 */
412 ctrl_info(ctrl, "Button ignore on Slot(%s)\n", slot_name(p_slot)); 411 ctrl_info(ctrl, "Button ignore on Slot(%s)\n", slot_name(p_slot));
413 break; 412 break;
414 default: 413 default:
415 ctrl_warn(ctrl, "Not a valid state\n"); 414 ctrl_warn(ctrl, "Not a valid state\n");
416 break; 415 break;
417 } 416 }
418 } 417 }
419 418
420 /* 419 /*
421 * Note: This function must be called with slot->lock held 420 * Note: This function must be called with slot->lock held
422 */ 421 */
423 static void handle_surprise_event(struct slot *p_slot) 422 static void handle_surprise_event(struct slot *p_slot)
424 { 423 {
425 u8 getstatus; 424 u8 getstatus;
426 struct power_work_info *info; 425 struct power_work_info *info;
427 426
428 info = kmalloc(sizeof(*info), GFP_KERNEL); 427 info = kmalloc(sizeof(*info), GFP_KERNEL);
429 if (!info) { 428 if (!info) {
430 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n", 429 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n",
431 __func__); 430 __func__);
432 return; 431 return;
433 } 432 }
434 info->p_slot = p_slot; 433 info->p_slot = p_slot;
435 INIT_WORK(&info->work, pciehp_power_thread); 434 INIT_WORK(&info->work, pciehp_power_thread);
436 435
437 pciehp_get_adapter_status(p_slot, &getstatus); 436 pciehp_get_adapter_status(p_slot, &getstatus);
438 if (!getstatus) 437 if (!getstatus)
439 p_slot->state = POWEROFF_STATE; 438 p_slot->state = POWEROFF_STATE;
440 else 439 else
441 p_slot->state = POWERON_STATE; 440 p_slot->state = POWERON_STATE;
442 441
443 queue_work(pciehp_wq, &info->work); 442 queue_work(pciehp_ordered_wq, &info->work);
444 } 443 }
445 444
446 static void interrupt_event_handler(struct work_struct *work) 445 static void interrupt_event_handler(struct work_struct *work)
447 { 446 {
448 struct event_info *info = container_of(work, struct event_info, work); 447 struct event_info *info = container_of(work, struct event_info, work);
449 struct slot *p_slot = info->p_slot; 448 struct slot *p_slot = info->p_slot;
450 struct controller *ctrl = p_slot->ctrl; 449 struct controller *ctrl = p_slot->ctrl;
451 450
452 mutex_lock(&p_slot->lock); 451 mutex_lock(&p_slot->lock);
453 switch (info->event_type) { 452 switch (info->event_type) {
454 case INT_BUTTON_PRESS: 453 case INT_BUTTON_PRESS:
455 handle_button_press_event(p_slot); 454 handle_button_press_event(p_slot);
456 break; 455 break;
457 case INT_POWER_FAULT: 456 case INT_POWER_FAULT:
458 if (!POWER_CTRL(ctrl)) 457 if (!POWER_CTRL(ctrl))
459 break; 458 break;
460 if (ATTN_LED(ctrl)) 459 if (ATTN_LED(ctrl))
461 pciehp_set_attention_status(p_slot, 1); 460 pciehp_set_attention_status(p_slot, 1);
462 if (PWR_LED(ctrl)) 461 if (PWR_LED(ctrl))
463 pciehp_green_led_off(p_slot); 462 pciehp_green_led_off(p_slot);
464 break; 463 break;
465 case INT_PRESENCE_ON: 464 case INT_PRESENCE_ON:
466 case INT_PRESENCE_OFF: 465 case INT_PRESENCE_OFF:
467 if (!HP_SUPR_RM(ctrl)) 466 if (!HP_SUPR_RM(ctrl))
468 break; 467 break;
469 ctrl_dbg(ctrl, "Surprise Removal\n"); 468 ctrl_dbg(ctrl, "Surprise Removal\n");
470 handle_surprise_event(p_slot); 469 handle_surprise_event(p_slot);
471 break; 470 break;
472 default: 471 default:
473 break; 472 break;
474 } 473 }
475 mutex_unlock(&p_slot->lock); 474 mutex_unlock(&p_slot->lock);
476 475
477 kfree(info); 476 kfree(info);
478 } 477 }
479 478
480 int pciehp_enable_slot(struct slot *p_slot) 479 int pciehp_enable_slot(struct slot *p_slot)
481 { 480 {
482 u8 getstatus = 0; 481 u8 getstatus = 0;
483 int rc; 482 int rc;
484 struct controller *ctrl = p_slot->ctrl; 483 struct controller *ctrl = p_slot->ctrl;
485 484
486 rc = pciehp_get_adapter_status(p_slot, &getstatus); 485 rc = pciehp_get_adapter_status(p_slot, &getstatus);
487 if (rc || !getstatus) { 486 if (rc || !getstatus) {
488 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot)); 487 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot));
489 return -ENODEV; 488 return -ENODEV;
490 } 489 }
491 if (MRL_SENS(p_slot->ctrl)) { 490 if (MRL_SENS(p_slot->ctrl)) {
492 rc = pciehp_get_latch_status(p_slot, &getstatus); 491 rc = pciehp_get_latch_status(p_slot, &getstatus);
493 if (rc || getstatus) { 492 if (rc || getstatus) {
494 ctrl_info(ctrl, "Latch open on slot(%s)\n", 493 ctrl_info(ctrl, "Latch open on slot(%s)\n",
495 slot_name(p_slot)); 494 slot_name(p_slot));
496 return -ENODEV; 495 return -ENODEV;
497 } 496 }
498 } 497 }
499 498
500 if (POWER_CTRL(p_slot->ctrl)) { 499 if (POWER_CTRL(p_slot->ctrl)) {
501 rc = pciehp_get_power_status(p_slot, &getstatus); 500 rc = pciehp_get_power_status(p_slot, &getstatus);
502 if (rc || getstatus) { 501 if (rc || getstatus) {
503 ctrl_info(ctrl, "Already enabled on slot(%s)\n", 502 ctrl_info(ctrl, "Already enabled on slot(%s)\n",
504 slot_name(p_slot)); 503 slot_name(p_slot));
505 return -EINVAL; 504 return -EINVAL;
506 } 505 }
507 } 506 }
508 507
509 pciehp_get_latch_status(p_slot, &getstatus); 508 pciehp_get_latch_status(p_slot, &getstatus);
510 509
511 rc = board_added(p_slot); 510 rc = board_added(p_slot);
512 if (rc) { 511 if (rc) {
513 pciehp_get_latch_status(p_slot, &getstatus); 512 pciehp_get_latch_status(p_slot, &getstatus);
514 } 513 }
515 return rc; 514 return rc;
516 } 515 }
517 516
518 517
519 int pciehp_disable_slot(struct slot *p_slot) 518 int pciehp_disable_slot(struct slot *p_slot)
520 { 519 {
521 u8 getstatus = 0; 520 u8 getstatus = 0;
522 int ret = 0; 521 int ret = 0;
523 struct controller *ctrl = p_slot->ctrl; 522 struct controller *ctrl = p_slot->ctrl;
524 523
525 if (!p_slot->ctrl) 524 if (!p_slot->ctrl)
526 return 1; 525 return 1;
527 526
528 if (!HP_SUPR_RM(p_slot->ctrl)) { 527 if (!HP_SUPR_RM(p_slot->ctrl)) {
529 ret = pciehp_get_adapter_status(p_slot, &getstatus); 528 ret = pciehp_get_adapter_status(p_slot, &getstatus);
530 if (ret || !getstatus) { 529 if (ret || !getstatus) {
531 ctrl_info(ctrl, "No adapter on slot(%s)\n", 530 ctrl_info(ctrl, "No adapter on slot(%s)\n",
532 slot_name(p_slot)); 531 slot_name(p_slot));
533 return -ENODEV; 532 return -ENODEV;
534 } 533 }
535 } 534 }
536 535
537 if (MRL_SENS(p_slot->ctrl)) { 536 if (MRL_SENS(p_slot->ctrl)) {
538 ret = pciehp_get_latch_status(p_slot, &getstatus); 537 ret = pciehp_get_latch_status(p_slot, &getstatus);
539 if (ret || getstatus) { 538 if (ret || getstatus) {
540 ctrl_info(ctrl, "Latch open on slot(%s)\n", 539 ctrl_info(ctrl, "Latch open on slot(%s)\n",
541 slot_name(p_slot)); 540 slot_name(p_slot));
542 return -ENODEV; 541 return -ENODEV;
543 } 542 }
544 } 543 }
545 544
546 if (POWER_CTRL(p_slot->ctrl)) { 545 if (POWER_CTRL(p_slot->ctrl)) {
547 ret = pciehp_get_power_status(p_slot, &getstatus); 546 ret = pciehp_get_power_status(p_slot, &getstatus);
548 if (ret || !getstatus) { 547 if (ret || !getstatus) {
549 ctrl_info(ctrl, "Already disabled on slot(%s)\n", 548 ctrl_info(ctrl, "Already disabled on slot(%s)\n",
550 slot_name(p_slot)); 549 slot_name(p_slot));
551 return -EINVAL; 550 return -EINVAL;
552 } 551 }
553 } 552 }
554 553
555 return remove_board(p_slot); 554 return remove_board(p_slot);
556 } 555 }
557 556
558 int pciehp_sysfs_enable_slot(struct slot *p_slot) 557 int pciehp_sysfs_enable_slot(struct slot *p_slot)
559 { 558 {
560 int retval = -ENODEV; 559 int retval = -ENODEV;
561 struct controller *ctrl = p_slot->ctrl; 560 struct controller *ctrl = p_slot->ctrl;
562 561
563 mutex_lock(&p_slot->lock); 562 mutex_lock(&p_slot->lock);
564 switch (p_slot->state) { 563 switch (p_slot->state) {
565 case BLINKINGON_STATE: 564 case BLINKINGON_STATE:
566 cancel_delayed_work(&p_slot->work); 565 cancel_delayed_work(&p_slot->work);
567 case STATIC_STATE: 566 case STATIC_STATE:
568 p_slot->state = POWERON_STATE; 567 p_slot->state = POWERON_STATE;
569 mutex_unlock(&p_slot->lock); 568 mutex_unlock(&p_slot->lock);
570 retval = pciehp_enable_slot(p_slot); 569 retval = pciehp_enable_slot(p_slot);
571 mutex_lock(&p_slot->lock); 570 mutex_lock(&p_slot->lock);
572 p_slot->state = STATIC_STATE; 571 p_slot->state = STATIC_STATE;
573 break; 572 break;
574 case POWERON_STATE: 573 case POWERON_STATE:
575 ctrl_info(ctrl, "Slot %s is already in powering on state\n", 574 ctrl_info(ctrl, "Slot %s is already in powering on state\n",
576 slot_name(p_slot)); 575 slot_name(p_slot));
577 break; 576 break;
578 case BLINKINGOFF_STATE: 577 case BLINKINGOFF_STATE:
579 case POWEROFF_STATE: 578 case POWEROFF_STATE:
580 ctrl_info(ctrl, "Already enabled on slot %s\n", 579 ctrl_info(ctrl, "Already enabled on slot %s\n",
581 slot_name(p_slot)); 580 slot_name(p_slot));
582 break; 581 break;
583 default: 582 default:
584 ctrl_err(ctrl, "Not a valid state on slot %s\n", 583 ctrl_err(ctrl, "Not a valid state on slot %s\n",
585 slot_name(p_slot)); 584 slot_name(p_slot));
586 break; 585 break;
587 } 586 }
588 mutex_unlock(&p_slot->lock); 587 mutex_unlock(&p_slot->lock);
589 588
590 return retval; 589 return retval;
591 } 590 }
592 591
593 int pciehp_sysfs_disable_slot(struct slot *p_slot) 592 int pciehp_sysfs_disable_slot(struct slot *p_slot)
594 { 593 {
595 int retval = -ENODEV; 594 int retval = -ENODEV;
596 struct controller *ctrl = p_slot->ctrl; 595 struct controller *ctrl = p_slot->ctrl;
597 596
598 mutex_lock(&p_slot->lock); 597 mutex_lock(&p_slot->lock);
599 switch (p_slot->state) { 598 switch (p_slot->state) {
600 case BLINKINGOFF_STATE: 599 case BLINKINGOFF_STATE:
601 cancel_delayed_work(&p_slot->work); 600 cancel_delayed_work(&p_slot->work);
602 case STATIC_STATE: 601 case STATIC_STATE:
603 p_slot->state = POWEROFF_STATE; 602 p_slot->state = POWEROFF_STATE;
604 mutex_unlock(&p_slot->lock); 603 mutex_unlock(&p_slot->lock);
605 retval = pciehp_disable_slot(p_slot); 604 retval = pciehp_disable_slot(p_slot);
606 mutex_lock(&p_slot->lock); 605 mutex_lock(&p_slot->lock);
607 p_slot->state = STATIC_STATE; 606 p_slot->state = STATIC_STATE;
608 break; 607 break;
609 case POWEROFF_STATE: 608 case POWEROFF_STATE:
610 ctrl_info(ctrl, "Slot %s is already in powering off state\n", 609 ctrl_info(ctrl, "Slot %s is already in powering off state\n",
611 slot_name(p_slot)); 610 slot_name(p_slot));
612 break; 611 break;
613 case BLINKINGON_STATE: 612 case BLINKINGON_STATE:
614 case POWERON_STATE: 613 case POWERON_STATE:
615 ctrl_info(ctrl, "Already disabled on slot %s\n", 614 ctrl_info(ctrl, "Already disabled on slot %s\n",
616 slot_name(p_slot)); 615 slot_name(p_slot));
617 break; 616 break;
618 default: 617 default:
619 ctrl_err(ctrl, "Not a valid state on slot %s\n", 618 ctrl_err(ctrl, "Not a valid state on slot %s\n",
620 slot_name(p_slot)); 619 slot_name(p_slot));
621 break; 620 break;
622 } 621 }
623 mutex_unlock(&p_slot->lock); 622 mutex_unlock(&p_slot->lock);
624 623
625 return retval; 624 return retval;
626 } 625 }
627 626
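[Editor's note] In pciehp_ctrl.c above, the conversion replaces schedule_work() and schedule_delayed_work() with queue_work()/queue_delayed_work() on the driver's own pciehp_wq, and routes the pushbutton and surprise-removal power transitions through pciehp_ordered_wq so they run strictly one at a time, in submission order. A minimal sketch of that split follows; the struct and helper names are illustrative only and not part of the driver, while the workqueue calls are the ones used in the hunks above.

	#include <linux/workqueue.h>
	#include <linux/jiffies.h>

	struct event { struct work_struct work; /* ... */ };
	struct power_job { struct work_struct work; /* ... */ };

	static void handle_event(struct work_struct *work) { /* may run concurrently with other events */ }
	static void do_power_transition(struct work_struct *work) { /* runs one at a time, in order */ }

	static void submit_event(struct workqueue_struct *wq, struct event *ev)
	{
		INIT_WORK(&ev->work, handle_event);
		queue_work(wq, &ev->work);		/* normal workqueue, e.g. pciehp_wq */
	}

	static void submit_power_job(struct workqueue_struct *ordered_wq, struct power_job *job)
	{
		INIT_WORK(&job->work, do_power_transition);
		queue_work(ordered_wq, &job->work);	/* ordered workqueue, e.g. pciehp_ordered_wq */
	}

	static void arm_button_timeout(struct workqueue_struct *wq, struct delayed_work *dwork)
	{
		/* 5 s cancel window, as in handle_button_press_event() */
		queue_delayed_work(wq, dwork, 5 * HZ);
	}
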
drivers/pci/hotplug/pciehp_hpc.c
1 /* 1 /*
2 * PCI Express PCI Hot Plug Driver 2 * PCI Express PCI Hot Plug Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/kernel.h> 30 #include <linux/kernel.h>
31 #include <linux/module.h> 31 #include <linux/module.h>
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/signal.h> 33 #include <linux/signal.h>
34 #include <linux/jiffies.h> 34 #include <linux/jiffies.h>
35 #include <linux/timer.h> 35 #include <linux/timer.h>
36 #include <linux/pci.h> 36 #include <linux/pci.h>
37 #include <linux/interrupt.h> 37 #include <linux/interrupt.h>
38 #include <linux/time.h> 38 #include <linux/time.h>
39 #include <linux/slab.h> 39 #include <linux/slab.h>
40 40
41 #include "../pci.h" 41 #include "../pci.h"
42 #include "pciehp.h" 42 #include "pciehp.h"
43 43
44 static atomic_t pciehp_num_controllers = ATOMIC_INIT(0);
45
46 static inline int pciehp_readw(struct controller *ctrl, int reg, u16 *value) 44 static inline int pciehp_readw(struct controller *ctrl, int reg, u16 *value)
47 { 45 {
48 struct pci_dev *dev = ctrl->pcie->port; 46 struct pci_dev *dev = ctrl->pcie->port;
49 return pci_read_config_word(dev, pci_pcie_cap(dev) + reg, value); 47 return pci_read_config_word(dev, pci_pcie_cap(dev) + reg, value);
50 } 48 }
51 49
52 static inline int pciehp_readl(struct controller *ctrl, int reg, u32 *value) 50 static inline int pciehp_readl(struct controller *ctrl, int reg, u32 *value)
53 { 51 {
54 struct pci_dev *dev = ctrl->pcie->port; 52 struct pci_dev *dev = ctrl->pcie->port;
55 return pci_read_config_dword(dev, pci_pcie_cap(dev) + reg, value); 53 return pci_read_config_dword(dev, pci_pcie_cap(dev) + reg, value);
56 } 54 }
57 55
58 static inline int pciehp_writew(struct controller *ctrl, int reg, u16 value) 56 static inline int pciehp_writew(struct controller *ctrl, int reg, u16 value)
59 { 57 {
60 struct pci_dev *dev = ctrl->pcie->port; 58 struct pci_dev *dev = ctrl->pcie->port;
61 return pci_write_config_word(dev, pci_pcie_cap(dev) + reg, value); 59 return pci_write_config_word(dev, pci_pcie_cap(dev) + reg, value);
62 } 60 }
63 61
64 static inline int pciehp_writel(struct controller *ctrl, int reg, u32 value) 62 static inline int pciehp_writel(struct controller *ctrl, int reg, u32 value)
65 { 63 {
66 struct pci_dev *dev = ctrl->pcie->port; 64 struct pci_dev *dev = ctrl->pcie->port;
67 return pci_write_config_dword(dev, pci_pcie_cap(dev) + reg, value); 65 return pci_write_config_dword(dev, pci_pcie_cap(dev) + reg, value);
68 } 66 }
69 67
70 /* Power Control Command */ 68 /* Power Control Command */
71 #define POWER_ON 0 69 #define POWER_ON 0
72 #define POWER_OFF PCI_EXP_SLTCTL_PCC 70 #define POWER_OFF PCI_EXP_SLTCTL_PCC
73 71
74 static irqreturn_t pcie_isr(int irq, void *dev_id); 72 static irqreturn_t pcie_isr(int irq, void *dev_id);
75 static void start_int_poll_timer(struct controller *ctrl, int sec); 73 static void start_int_poll_timer(struct controller *ctrl, int sec);
76 74
77 /* This is the interrupt polling timeout function. */ 75 /* This is the interrupt polling timeout function. */
78 static void int_poll_timeout(unsigned long data) 76 static void int_poll_timeout(unsigned long data)
79 { 77 {
80 struct controller *ctrl = (struct controller *)data; 78 struct controller *ctrl = (struct controller *)data;
81 79
82 /* Poll for interrupt events. regs == NULL => polling */ 80 /* Poll for interrupt events. regs == NULL => polling */
83 pcie_isr(0, ctrl); 81 pcie_isr(0, ctrl);
84 82
85 init_timer(&ctrl->poll_timer); 83 init_timer(&ctrl->poll_timer);
86 if (!pciehp_poll_time) 84 if (!pciehp_poll_time)
87 pciehp_poll_time = 2; /* default polling interval is 2 sec */ 85 pciehp_poll_time = 2; /* default polling interval is 2 sec */
88 86
89 start_int_poll_timer(ctrl, pciehp_poll_time); 87 start_int_poll_timer(ctrl, pciehp_poll_time);
90 } 88 }
91 89
92 /* This function starts the interrupt polling timer. */ 90 /* This function starts the interrupt polling timer. */
93 static void start_int_poll_timer(struct controller *ctrl, int sec) 91 static void start_int_poll_timer(struct controller *ctrl, int sec)
94 { 92 {
95 /* Clamp to sane value */ 93 /* Clamp to sane value */
96 if ((sec <= 0) || (sec > 60)) 94 if ((sec <= 0) || (sec > 60))
97 sec = 2; 95 sec = 2;
98 96
99 ctrl->poll_timer.function = &int_poll_timeout; 97 ctrl->poll_timer.function = &int_poll_timeout;
100 ctrl->poll_timer.data = (unsigned long)ctrl; 98 ctrl->poll_timer.data = (unsigned long)ctrl;
101 ctrl->poll_timer.expires = jiffies + sec * HZ; 99 ctrl->poll_timer.expires = jiffies + sec * HZ;
102 add_timer(&ctrl->poll_timer); 100 add_timer(&ctrl->poll_timer);
103 } 101 }
104 102
105 static inline int pciehp_request_irq(struct controller *ctrl) 103 static inline int pciehp_request_irq(struct controller *ctrl)
106 { 104 {
107 int retval, irq = ctrl->pcie->irq; 105 int retval, irq = ctrl->pcie->irq;
108 106
109 /* Install interrupt polling timer. Start with 10 sec delay */ 107 /* Install interrupt polling timer. Start with 10 sec delay */
110 if (pciehp_poll_mode) { 108 if (pciehp_poll_mode) {
111 init_timer(&ctrl->poll_timer); 109 init_timer(&ctrl->poll_timer);
112 start_int_poll_timer(ctrl, 10); 110 start_int_poll_timer(ctrl, 10);
113 return 0; 111 return 0;
114 } 112 }
115 113
116 /* Installs the interrupt handler */ 114 /* Installs the interrupt handler */
117 retval = request_irq(irq, pcie_isr, IRQF_SHARED, MY_NAME, ctrl); 115 retval = request_irq(irq, pcie_isr, IRQF_SHARED, MY_NAME, ctrl);
118 if (retval) 116 if (retval)
119 ctrl_err(ctrl, "Cannot get irq %d for the hotplug controller\n", 117 ctrl_err(ctrl, "Cannot get irq %d for the hotplug controller\n",
120 irq); 118 irq);
121 return retval; 119 return retval;
122 } 120 }
123 121
124 static inline void pciehp_free_irq(struct controller *ctrl) 122 static inline void pciehp_free_irq(struct controller *ctrl)
125 { 123 {
126 if (pciehp_poll_mode) 124 if (pciehp_poll_mode)
127 del_timer_sync(&ctrl->poll_timer); 125 del_timer_sync(&ctrl->poll_timer);
128 else 126 else
129 free_irq(ctrl->pcie->irq, ctrl); 127 free_irq(ctrl->pcie->irq, ctrl);
130 } 128 }
131 129
132 static int pcie_poll_cmd(struct controller *ctrl) 130 static int pcie_poll_cmd(struct controller *ctrl)
133 { 131 {
134 u16 slot_status; 132 u16 slot_status;
135 int err, timeout = 1000; 133 int err, timeout = 1000;
136 134
137 err = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 135 err = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
138 if (!err && (slot_status & PCI_EXP_SLTSTA_CC)) { 136 if (!err && (slot_status & PCI_EXP_SLTSTA_CC)) {
139 pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC); 137 pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
140 return 1; 138 return 1;
141 } 139 }
142 while (timeout > 0) { 140 while (timeout > 0) {
143 msleep(10); 141 msleep(10);
144 timeout -= 10; 142 timeout -= 10;
145 err = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 143 err = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
146 if (!err && (slot_status & PCI_EXP_SLTSTA_CC)) { 144 if (!err && (slot_status & PCI_EXP_SLTSTA_CC)) {
147 pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC); 145 pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
148 return 1; 146 return 1;
149 } 147 }
150 } 148 }
151 return 0; /* timeout */ 149 return 0; /* timeout */
152 } 150 }
153 151
154 static void pcie_wait_cmd(struct controller *ctrl, int poll) 152 static void pcie_wait_cmd(struct controller *ctrl, int poll)
155 { 153 {
156 unsigned int msecs = pciehp_poll_mode ? 2500 : 1000; 154 unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
157 unsigned long timeout = msecs_to_jiffies(msecs); 155 unsigned long timeout = msecs_to_jiffies(msecs);
158 int rc; 156 int rc;
159 157
160 if (poll) 158 if (poll)
161 rc = pcie_poll_cmd(ctrl); 159 rc = pcie_poll_cmd(ctrl);
162 else 160 else
163 rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout); 161 rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
164 if (!rc) 162 if (!rc)
165 ctrl_dbg(ctrl, "Command not completed in 1000 msec\n"); 163 ctrl_dbg(ctrl, "Command not completed in 1000 msec\n");
166 } 164 }
167 165
168 /** 166 /**
169 * pcie_write_cmd - Issue controller command 167 * pcie_write_cmd - Issue controller command
170 * @ctrl: controller to which the command is issued 168 * @ctrl: controller to which the command is issued
171 * @cmd: command value written to slot control register 169 * @cmd: command value written to slot control register
172 * @mask: bitmask of slot control register to be modified 170 * @mask: bitmask of slot control register to be modified
173 */ 171 */
174 static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask) 172 static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
175 { 173 {
176 int retval = 0; 174 int retval = 0;
177 u16 slot_status; 175 u16 slot_status;
178 u16 slot_ctrl; 176 u16 slot_ctrl;
179 177
180 mutex_lock(&ctrl->ctrl_lock); 178 mutex_lock(&ctrl->ctrl_lock);
181 179
182 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 180 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
183 if (retval) { 181 if (retval) {
184 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n", 182 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n",
185 __func__); 183 __func__);
186 goto out; 184 goto out;
187 } 185 }
188 186
189 if (slot_status & PCI_EXP_SLTSTA_CC) { 187 if (slot_status & PCI_EXP_SLTSTA_CC) {
190 if (!ctrl->no_cmd_complete) { 188 if (!ctrl->no_cmd_complete) {
191 /* 189 /*
192 * After 1 sec and CMD_COMPLETED still not set, just 190 * After 1 sec and CMD_COMPLETED still not set, just
193 * proceed forward to issue the next command according 191 * proceed forward to issue the next command according
194 * to spec. Just print out the error message. 192 * to spec. Just print out the error message.
195 */ 193 */
196 ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n"); 194 ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
197 } else if (!NO_CMD_CMPL(ctrl)) { 195 } else if (!NO_CMD_CMPL(ctrl)) {
198 /* 196 /*
199 * This controller seems to notify of command completed 197 * This controller seems to notify of command completed
200 * event even though it supports none of power 198 * event even though it supports none of power
201 * controller, attention led, power led and EMI. 199 * controller, attention led, power led and EMI.
202 */ 200 */
203 ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Need to " 201 ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Need to "
204 "wait for command completed event.\n"); 202 "wait for command completed event.\n");
205 ctrl->no_cmd_complete = 0; 203 ctrl->no_cmd_complete = 0;
206 } else { 204 } else {
207 ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Maybe " 205 ctrl_dbg(ctrl, "Unexpected CMD_COMPLETED. Maybe "
208 "the controller is broken.\n"); 206 "the controller is broken.\n");
209 } 207 }
210 } 208 }
211 209
212 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl); 210 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl);
213 if (retval) { 211 if (retval) {
214 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__); 212 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__);
215 goto out; 213 goto out;
216 } 214 }
217 215
218 slot_ctrl &= ~mask; 216 slot_ctrl &= ~mask;
219 slot_ctrl |= (cmd & mask); 217 slot_ctrl |= (cmd & mask);
220 ctrl->cmd_busy = 1; 218 ctrl->cmd_busy = 1;
221 smp_mb(); 219 smp_mb();
222 retval = pciehp_writew(ctrl, PCI_EXP_SLTCTL, slot_ctrl); 220 retval = pciehp_writew(ctrl, PCI_EXP_SLTCTL, slot_ctrl);
223 if (retval) 221 if (retval)
224 ctrl_err(ctrl, "Cannot write to SLOTCTRL register\n"); 222 ctrl_err(ctrl, "Cannot write to SLOTCTRL register\n");
225 223
226 /* 224 /*
227 * Wait for command completion. 225 * Wait for command completion.
228 */ 226 */
229 if (!retval && !ctrl->no_cmd_complete) { 227 if (!retval && !ctrl->no_cmd_complete) {
230 int poll = 0; 228 int poll = 0;
231 /* 229 /*
232 * if hotplug interrupt is not enabled or command 230 * if hotplug interrupt is not enabled or command
233 * completed interrupt is not enabled, we need to poll 231 * completed interrupt is not enabled, we need to poll
234 * command completed event. 232 * command completed event.
235 */ 233 */
236 if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) || 234 if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
237 !(slot_ctrl & PCI_EXP_SLTCTL_CCIE)) 235 !(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
238 poll = 1; 236 poll = 1;
239 pcie_wait_cmd(ctrl, poll); 237 pcie_wait_cmd(ctrl, poll);
240 } 238 }
241 out: 239 out:
242 mutex_unlock(&ctrl->ctrl_lock); 240 mutex_unlock(&ctrl->ctrl_lock);
243 return retval; 241 return retval;
244 } 242 }
245 243
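pcie_write_cmd() above sets cmd_busy, writes the command, and then either polls or sleeps until the interrupt handler reports Command Completed. A hedged sketch of that wait/wake handshake follows; my_ctrl, my_issue_cmd and my_cmd_completed are hypothetical names, not part of this driver.

#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/jiffies.h>

struct my_ctrl {
	wait_queue_head_t queue;	/* command issuer sleeps here */
	int cmd_busy;			/* cleared by the interrupt handler */
};

static void my_ctrl_init(struct my_ctrl *ctrl)
{
	init_waitqueue_head(&ctrl->queue);
	ctrl->cmd_busy = 0;
}

static void my_issue_cmd(struct my_ctrl *ctrl)
{
	ctrl->cmd_busy = 1;
	smp_mb();		/* publish cmd_busy before issuing the command */
	/* ... write the command to the hardware register here ... */

	if (!wait_event_timeout(ctrl->queue, !ctrl->cmd_busy,
				msecs_to_jiffies(1000)))
		pr_debug("command not completed in 1000 msec\n");
}

/* called from the interrupt handler when Command Completed is seen */
static void my_cmd_completed(struct my_ctrl *ctrl)
{
	ctrl->cmd_busy = 0;
	smp_mb();		/* make the update visible before waking */
	wake_up(&ctrl->queue);
}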
246 static inline int check_link_active(struct controller *ctrl) 244 static inline int check_link_active(struct controller *ctrl)
247 { 245 {
248 u16 link_status; 246 u16 link_status;
249 247
250 if (pciehp_readw(ctrl, PCI_EXP_LNKSTA, &link_status)) 248 if (pciehp_readw(ctrl, PCI_EXP_LNKSTA, &link_status))
251 return 0; 249 return 0;
252 return !!(link_status & PCI_EXP_LNKSTA_DLLLA); 250 return !!(link_status & PCI_EXP_LNKSTA_DLLLA);
253 } 251 }
254 252
255 static void pcie_wait_link_active(struct controller *ctrl) 253 static void pcie_wait_link_active(struct controller *ctrl)
256 { 254 {
257 int timeout = 1000; 255 int timeout = 1000;
258 256
259 if (check_link_active(ctrl)) 257 if (check_link_active(ctrl))
260 return; 258 return;
261 while (timeout > 0) { 259 while (timeout > 0) {
262 msleep(10); 260 msleep(10);
263 timeout -= 10; 261 timeout -= 10;
264 if (check_link_active(ctrl)) 262 if (check_link_active(ctrl))
265 return; 263 return;
266 } 264 }
267 ctrl_dbg(ctrl, "Data Link Layer Link Active not set in 1000 msec\n"); 265 ctrl_dbg(ctrl, "Data Link Layer Link Active not set in 1000 msec\n");
268 } 266 }
269 267
270 int pciehp_check_link_status(struct controller *ctrl) 268 int pciehp_check_link_status(struct controller *ctrl)
271 { 269 {
272 u16 lnk_status; 270 u16 lnk_status;
273 int retval = 0; 271 int retval = 0;
274 272
275 /* 273 /*
276 * Data Link Layer Link Active Reporting must be supported by 274 * Data Link Layer Link Active Reporting must be supported by
277 * a hot-plug capable downstream port, but an old controller might 275 * a hot-plug capable downstream port, but an old controller might
278 * not implement it. In that case, we wait for 1000 ms. 276 * not implement it. In that case, we wait for 1000 ms.
279 */ 277 */
280 if (ctrl->link_active_reporting){ 278 if (ctrl->link_active_reporting){
281 /* Wait for Data Link Layer Link Active bit to be set */ 279 /* Wait for Data Link Layer Link Active bit to be set */
282 pcie_wait_link_active(ctrl); 280 pcie_wait_link_active(ctrl);
283 /* 281 /*
284 * We must wait for 100 ms after the Data Link Layer 282 * We must wait for 100 ms after the Data Link Layer
285 * Link Active bit reads 1b before initiating a 283 * Link Active bit reads 1b before initiating a
286 * configuration access to the hot added device. 284 * configuration access to the hot added device.
287 */ 285 */
288 msleep(100); 286 msleep(100);
289 } else 287 } else
290 msleep(1000); 288 msleep(1000);
291 289
292 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status); 290 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status);
293 if (retval) { 291 if (retval) {
294 ctrl_err(ctrl, "Cannot read LNKSTATUS register\n"); 292 ctrl_err(ctrl, "Cannot read LNKSTATUS register\n");
295 return retval; 293 return retval;
296 } 294 }
297 295
298 ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status); 296 ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
299 if ((lnk_status & PCI_EXP_LNKSTA_LT) || 297 if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
300 !(lnk_status & PCI_EXP_LNKSTA_NLW)) { 298 !(lnk_status & PCI_EXP_LNKSTA_NLW)) {
301 ctrl_err(ctrl, "Link Training Error occurs \n"); 299 ctrl_err(ctrl, "Link Training Error occurs \n");
302 retval = -1; 300 retval = -1;
303 return retval; 301 return retval;
304 } 302 }
305 303
306 return retval; 304 return retval;
307 } 305 }
308 306
309 int pciehp_get_attention_status(struct slot *slot, u8 *status) 307 int pciehp_get_attention_status(struct slot *slot, u8 *status)
310 { 308 {
311 struct controller *ctrl = slot->ctrl; 309 struct controller *ctrl = slot->ctrl;
312 u16 slot_ctrl; 310 u16 slot_ctrl;
313 u8 atten_led_state; 311 u8 atten_led_state;
314 int retval = 0; 312 int retval = 0;
315 313
316 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl); 314 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl);
317 if (retval) { 315 if (retval) {
318 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__); 316 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__);
319 return retval; 317 return retval;
320 } 318 }
321 319
322 ctrl_dbg(ctrl, "%s: SLOTCTRL %x, value read %x\n", __func__, 320 ctrl_dbg(ctrl, "%s: SLOTCTRL %x, value read %x\n", __func__,
323 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl); 321 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl);
324 322
325 atten_led_state = (slot_ctrl & PCI_EXP_SLTCTL_AIC) >> 6; 323 atten_led_state = (slot_ctrl & PCI_EXP_SLTCTL_AIC) >> 6;
326 324
327 switch (atten_led_state) { 325 switch (atten_led_state) {
328 case 0: 326 case 0:
329 *status = 0xFF; /* Reserved */ 327 *status = 0xFF; /* Reserved */
330 break; 328 break;
331 case 1: 329 case 1:
332 *status = 1; /* On */ 330 *status = 1; /* On */
333 break; 331 break;
334 case 2: 332 case 2:
335 *status = 2; /* Blink */ 333 *status = 2; /* Blink */
336 break; 334 break;
337 case 3: 335 case 3:
338 *status = 0; /* Off */ 336 *status = 0; /* Off */
339 break; 337 break;
340 default: 338 default:
341 *status = 0xFF; 339 *status = 0xFF;
342 break; 340 break;
343 } 341 }
344 342
345 return 0; 343 return 0;
346 } 344 }
347 345
348 int pciehp_get_power_status(struct slot *slot, u8 *status) 346 int pciehp_get_power_status(struct slot *slot, u8 *status)
349 { 347 {
350 struct controller *ctrl = slot->ctrl; 348 struct controller *ctrl = slot->ctrl;
351 u16 slot_ctrl; 349 u16 slot_ctrl;
352 u8 pwr_state; 350 u8 pwr_state;
353 int retval = 0; 351 int retval = 0;
354 352
355 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl); 353 retval = pciehp_readw(ctrl, PCI_EXP_SLTCTL, &slot_ctrl);
356 if (retval) { 354 if (retval) {
357 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__); 355 ctrl_err(ctrl, "%s: Cannot read SLOTCTRL register\n", __func__);
358 return retval; 356 return retval;
359 } 357 }
360 ctrl_dbg(ctrl, "%s: SLOTCTRL %x value read %x\n", __func__, 358 ctrl_dbg(ctrl, "%s: SLOTCTRL %x value read %x\n", __func__,
361 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl); 359 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl);
362 360
363 pwr_state = (slot_ctrl & PCI_EXP_SLTCTL_PCC) >> 10; 361 pwr_state = (slot_ctrl & PCI_EXP_SLTCTL_PCC) >> 10;
364 362
365 switch (pwr_state) { 363 switch (pwr_state) {
366 case 0: 364 case 0:
367 *status = 1; 365 *status = 1;
368 break; 366 break;
369 case 1: 367 case 1:
370 *status = 0; 368 *status = 0;
371 break; 369 break;
372 default: 370 default:
373 *status = 0xFF; 371 *status = 0xFF;
374 break; 372 break;
375 } 373 }
376 374
377 return retval; 375 return retval;
378 } 376 }
379 377
380 int pciehp_get_latch_status(struct slot *slot, u8 *status) 378 int pciehp_get_latch_status(struct slot *slot, u8 *status)
381 { 379 {
382 struct controller *ctrl = slot->ctrl; 380 struct controller *ctrl = slot->ctrl;
383 u16 slot_status; 381 u16 slot_status;
384 int retval; 382 int retval;
385 383
386 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 384 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
387 if (retval) { 385 if (retval) {
388 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n", 386 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n",
389 __func__); 387 __func__);
390 return retval; 388 return retval;
391 } 389 }
392 *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS); 390 *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
393 return 0; 391 return 0;
394 } 392 }
395 393
396 int pciehp_get_adapter_status(struct slot *slot, u8 *status) 394 int pciehp_get_adapter_status(struct slot *slot, u8 *status)
397 { 395 {
398 struct controller *ctrl = slot->ctrl; 396 struct controller *ctrl = slot->ctrl;
399 u16 slot_status; 397 u16 slot_status;
400 int retval; 398 int retval;
401 399
402 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 400 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
403 if (retval) { 401 if (retval) {
404 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n", 402 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n",
405 __func__); 403 __func__);
406 return retval; 404 return retval;
407 } 405 }
408 *status = !!(slot_status & PCI_EXP_SLTSTA_PDS); 406 *status = !!(slot_status & PCI_EXP_SLTSTA_PDS);
409 return 0; 407 return 0;
410 } 408 }
411 409
412 int pciehp_query_power_fault(struct slot *slot) 410 int pciehp_query_power_fault(struct slot *slot)
413 { 411 {
414 struct controller *ctrl = slot->ctrl; 412 struct controller *ctrl = slot->ctrl;
415 u16 slot_status; 413 u16 slot_status;
416 int retval; 414 int retval;
417 415
418 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 416 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
419 if (retval) { 417 if (retval) {
420 ctrl_err(ctrl, "Cannot check for power fault\n"); 418 ctrl_err(ctrl, "Cannot check for power fault\n");
421 return retval; 419 return retval;
422 } 420 }
423 return !!(slot_status & PCI_EXP_SLTSTA_PFD); 421 return !!(slot_status & PCI_EXP_SLTSTA_PFD);
424 } 422 }
425 423
426 int pciehp_set_attention_status(struct slot *slot, u8 value) 424 int pciehp_set_attention_status(struct slot *slot, u8 value)
427 { 425 {
428 struct controller *ctrl = slot->ctrl; 426 struct controller *ctrl = slot->ctrl;
429 u16 slot_cmd; 427 u16 slot_cmd;
430 u16 cmd_mask; 428 u16 cmd_mask;
431 429
432 cmd_mask = PCI_EXP_SLTCTL_AIC; 430 cmd_mask = PCI_EXP_SLTCTL_AIC;
433 switch (value) { 431 switch (value) {
434 case 0 : /* turn off */ 432 case 0 : /* turn off */
435 slot_cmd = 0x00C0; 433 slot_cmd = 0x00C0;
436 break; 434 break;
437 case 1: /* turn on */ 435 case 1: /* turn on */
438 slot_cmd = 0x0040; 436 slot_cmd = 0x0040;
439 break; 437 break;
440 case 2: /* turn blink */ 438 case 2: /* turn blink */
441 slot_cmd = 0x0080; 439 slot_cmd = 0x0080;
442 break; 440 break;
443 default: 441 default:
444 return -EINVAL; 442 return -EINVAL;
445 } 443 }
446 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 444 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
447 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 445 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
448 return pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 446 return pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
449 } 447 }
450 448
451 void pciehp_green_led_on(struct slot *slot) 449 void pciehp_green_led_on(struct slot *slot)
452 { 450 {
453 struct controller *ctrl = slot->ctrl; 451 struct controller *ctrl = slot->ctrl;
454 u16 slot_cmd; 452 u16 slot_cmd;
455 u16 cmd_mask; 453 u16 cmd_mask;
456 454
457 slot_cmd = 0x0100; 455 slot_cmd = 0x0100;
458 cmd_mask = PCI_EXP_SLTCTL_PIC; 456 cmd_mask = PCI_EXP_SLTCTL_PIC;
459 pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 457 pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
460 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 458 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
461 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 459 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
462 } 460 }
463 461
464 void pciehp_green_led_off(struct slot *slot) 462 void pciehp_green_led_off(struct slot *slot)
465 { 463 {
466 struct controller *ctrl = slot->ctrl; 464 struct controller *ctrl = slot->ctrl;
467 u16 slot_cmd; 465 u16 slot_cmd;
468 u16 cmd_mask; 466 u16 cmd_mask;
469 467
470 slot_cmd = 0x0300; 468 slot_cmd = 0x0300;
471 cmd_mask = PCI_EXP_SLTCTL_PIC; 469 cmd_mask = PCI_EXP_SLTCTL_PIC;
472 pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 470 pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
473 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 471 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
474 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 472 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
475 } 473 }
476 474
477 void pciehp_green_led_blink(struct slot *slot) 475 void pciehp_green_led_blink(struct slot *slot)
478 { 476 {
479 struct controller *ctrl = slot->ctrl; 477 struct controller *ctrl = slot->ctrl;
480 u16 slot_cmd; 478 u16 slot_cmd;
481 u16 cmd_mask; 479 u16 cmd_mask;
482 480
483 slot_cmd = 0x0200; 481 slot_cmd = 0x0200;
484 cmd_mask = PCI_EXP_SLTCTL_PIC; 482 cmd_mask = PCI_EXP_SLTCTL_PIC;
485 pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 483 pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
486 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 484 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
487 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 485 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
488 } 486 }
489 487
490 int pciehp_power_on_slot(struct slot * slot) 488 int pciehp_power_on_slot(struct slot * slot)
491 { 489 {
492 struct controller *ctrl = slot->ctrl; 490 struct controller *ctrl = slot->ctrl;
493 u16 slot_cmd; 491 u16 slot_cmd;
494 u16 cmd_mask; 492 u16 cmd_mask;
495 u16 slot_status; 493 u16 slot_status;
496 u16 lnk_status; 494 u16 lnk_status;
497 int retval = 0; 495 int retval = 0;
498 496
499 /* Clear sticky power-fault bit from previous power failures */ 497 /* Clear sticky power-fault bit from previous power failures */
500 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status); 498 retval = pciehp_readw(ctrl, PCI_EXP_SLTSTA, &slot_status);
501 if (retval) { 499 if (retval) {
502 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n", 500 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS register\n",
503 __func__); 501 __func__);
504 return retval; 502 return retval;
505 } 503 }
506 slot_status &= PCI_EXP_SLTSTA_PFD; 504 slot_status &= PCI_EXP_SLTSTA_PFD;
507 if (slot_status) { 505 if (slot_status) {
508 retval = pciehp_writew(ctrl, PCI_EXP_SLTSTA, slot_status); 506 retval = pciehp_writew(ctrl, PCI_EXP_SLTSTA, slot_status);
509 if (retval) { 507 if (retval) {
510 ctrl_err(ctrl, 508 ctrl_err(ctrl,
511 "%s: Cannot write to SLOTSTATUS register\n", 509 "%s: Cannot write to SLOTSTATUS register\n",
512 __func__); 510 __func__);
513 return retval; 511 return retval;
514 } 512 }
515 } 513 }
516 ctrl->power_fault_detected = 0; 514 ctrl->power_fault_detected = 0;
517 515
518 slot_cmd = POWER_ON; 516 slot_cmd = POWER_ON;
519 cmd_mask = PCI_EXP_SLTCTL_PCC; 517 cmd_mask = PCI_EXP_SLTCTL_PCC;
520 retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 518 retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
521 if (retval) { 519 if (retval) {
522 ctrl_err(ctrl, "Write %x command failed!\n", slot_cmd); 520 ctrl_err(ctrl, "Write %x command failed!\n", slot_cmd);
523 return retval; 521 return retval;
524 } 522 }
525 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 523 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
526 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 524 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
527 525
528 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status); 526 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status);
529 if (retval) { 527 if (retval) {
530 ctrl_err(ctrl, "%s: Cannot read LNKSTA register\n", 528 ctrl_err(ctrl, "%s: Cannot read LNKSTA register\n",
531 __func__); 529 __func__);
532 return retval; 530 return retval;
533 } 531 }
534 pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status); 532 pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status);
535 533
536 return retval; 534 return retval;
537 } 535 }
538 536
539 int pciehp_power_off_slot(struct slot * slot) 537 int pciehp_power_off_slot(struct slot * slot)
540 { 538 {
541 struct controller *ctrl = slot->ctrl; 539 struct controller *ctrl = slot->ctrl;
542 u16 slot_cmd; 540 u16 slot_cmd;
543 u16 cmd_mask; 541 u16 cmd_mask;
544 int retval; 542 int retval;
545 543
546 slot_cmd = POWER_OFF; 544 slot_cmd = POWER_OFF;
547 cmd_mask = PCI_EXP_SLTCTL_PCC; 545 cmd_mask = PCI_EXP_SLTCTL_PCC;
548 retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask); 546 retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
549 if (retval) { 547 if (retval) {
550 ctrl_err(ctrl, "Write command failed!\n"); 548 ctrl_err(ctrl, "Write command failed!\n");
551 return retval; 549 return retval;
552 } 550 }
553 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, 551 ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
554 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); 552 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd);
555 return 0; 553 return 0;
556 } 554 }
557 555
558 static irqreturn_t pcie_isr(int irq, void *dev_id) 556 static irqreturn_t pcie_isr(int irq, void *dev_id)
559 { 557 {
560 struct controller *ctrl = (struct controller *)dev_id; 558 struct controller *ctrl = (struct controller *)dev_id;
561 struct slot *slot = ctrl->slot; 559 struct slot *slot = ctrl->slot;
562 u16 detected, intr_loc; 560 u16 detected, intr_loc;
563 561
564 /* 562 /*
565 * In order to guarantee that all interrupt events are 563 * In order to guarantee that all interrupt events are
566 * serviced, we need to re-inspect Slot Status register after 564 * serviced, we need to re-inspect Slot Status register after
567 * clearing what is presumed to be the last pending interrupt. 565 * clearing what is presumed to be the last pending interrupt.
568 */ 566 */
569 intr_loc = 0; 567 intr_loc = 0;
570 do { 568 do {
571 if (pciehp_readw(ctrl, PCI_EXP_SLTSTA, &detected)) { 569 if (pciehp_readw(ctrl, PCI_EXP_SLTSTA, &detected)) {
572 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS\n", 570 ctrl_err(ctrl, "%s: Cannot read SLOTSTATUS\n",
573 __func__); 571 __func__);
574 return IRQ_NONE; 572 return IRQ_NONE;
575 } 573 }
576 574
577 detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | 575 detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
578 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC | 576 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
579 PCI_EXP_SLTSTA_CC); 577 PCI_EXP_SLTSTA_CC);
580 detected &= ~intr_loc; 578 detected &= ~intr_loc;
581 intr_loc |= detected; 579 intr_loc |= detected;
582 if (!intr_loc) 580 if (!intr_loc)
583 return IRQ_NONE; 581 return IRQ_NONE;
584 if (detected && pciehp_writew(ctrl, PCI_EXP_SLTSTA, intr_loc)) { 582 if (detected && pciehp_writew(ctrl, PCI_EXP_SLTSTA, intr_loc)) {
585 ctrl_err(ctrl, "%s: Cannot write to SLOTSTATUS\n", 583 ctrl_err(ctrl, "%s: Cannot write to SLOTSTATUS\n",
586 __func__); 584 __func__);
587 return IRQ_NONE; 585 return IRQ_NONE;
588 } 586 }
589 } while (detected); 587 } while (detected);
590 588
591 ctrl_dbg(ctrl, "%s: intr_loc %x\n", __func__, intr_loc); 589 ctrl_dbg(ctrl, "%s: intr_loc %x\n", __func__, intr_loc);
592 590
593 /* Check Command Complete Interrupt Pending */ 591 /* Check Command Complete Interrupt Pending */
594 if (intr_loc & PCI_EXP_SLTSTA_CC) { 592 if (intr_loc & PCI_EXP_SLTSTA_CC) {
595 ctrl->cmd_busy = 0; 593 ctrl->cmd_busy = 0;
596 smp_mb(); 594 smp_mb();
597 wake_up(&ctrl->queue); 595 wake_up(&ctrl->queue);
598 } 596 }
599 597
600 if (!(intr_loc & ~PCI_EXP_SLTSTA_CC)) 598 if (!(intr_loc & ~PCI_EXP_SLTSTA_CC))
601 return IRQ_HANDLED; 599 return IRQ_HANDLED;
602 600
603 /* Check MRL Sensor Changed */ 601 /* Check MRL Sensor Changed */
604 if (intr_loc & PCI_EXP_SLTSTA_MRLSC) 602 if (intr_loc & PCI_EXP_SLTSTA_MRLSC)
605 pciehp_handle_switch_change(slot); 603 pciehp_handle_switch_change(slot);
606 604
607 /* Check Attention Button Pressed */ 605 /* Check Attention Button Pressed */
608 if (intr_loc & PCI_EXP_SLTSTA_ABP) 606 if (intr_loc & PCI_EXP_SLTSTA_ABP)
609 pciehp_handle_attention_button(slot); 607 pciehp_handle_attention_button(slot);
610 608
611 /* Check Presence Detect Changed */ 609 /* Check Presence Detect Changed */
612 if (intr_loc & PCI_EXP_SLTSTA_PDC) 610 if (intr_loc & PCI_EXP_SLTSTA_PDC)
613 pciehp_handle_presence_change(slot); 611 pciehp_handle_presence_change(slot);
614 612
615 /* Check Power Fault Detected */ 613 /* Check Power Fault Detected */
616 if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) { 614 if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
617 ctrl->power_fault_detected = 1; 615 ctrl->power_fault_detected = 1;
618 pciehp_handle_power_fault(slot); 616 pciehp_handle_power_fault(slot);
619 } 617 }
620 return IRQ_HANDLED; 618 return IRQ_HANDLED;
621 } 619 }
622 620
623 int pciehp_get_max_lnk_width(struct slot *slot, 621 int pciehp_get_max_lnk_width(struct slot *slot,
624 enum pcie_link_width *value) 622 enum pcie_link_width *value)
625 { 623 {
626 struct controller *ctrl = slot->ctrl; 624 struct controller *ctrl = slot->ctrl;
627 enum pcie_link_width lnk_wdth; 625 enum pcie_link_width lnk_wdth;
628 u32 lnk_cap; 626 u32 lnk_cap;
629 int retval = 0; 627 int retval = 0;
630 628
631 retval = pciehp_readl(ctrl, PCI_EXP_LNKCAP, &lnk_cap); 629 retval = pciehp_readl(ctrl, PCI_EXP_LNKCAP, &lnk_cap);
632 if (retval) { 630 if (retval) {
633 ctrl_err(ctrl, "%s: Cannot read LNKCAP register\n", __func__); 631 ctrl_err(ctrl, "%s: Cannot read LNKCAP register\n", __func__);
634 return retval; 632 return retval;
635 } 633 }
636 634
637 switch ((lnk_cap & PCI_EXP_LNKSTA_NLW) >> 4){ 635 switch ((lnk_cap & PCI_EXP_LNKSTA_NLW) >> 4){
638 case 0: 636 case 0:
639 lnk_wdth = PCIE_LNK_WIDTH_RESRV; 637 lnk_wdth = PCIE_LNK_WIDTH_RESRV;
640 break; 638 break;
641 case 1: 639 case 1:
642 lnk_wdth = PCIE_LNK_X1; 640 lnk_wdth = PCIE_LNK_X1;
643 break; 641 break;
644 case 2: 642 case 2:
645 lnk_wdth = PCIE_LNK_X2; 643 lnk_wdth = PCIE_LNK_X2;
646 break; 644 break;
647 case 4: 645 case 4:
648 lnk_wdth = PCIE_LNK_X4; 646 lnk_wdth = PCIE_LNK_X4;
649 break; 647 break;
650 case 8: 648 case 8:
651 lnk_wdth = PCIE_LNK_X8; 649 lnk_wdth = PCIE_LNK_X8;
652 break; 650 break;
653 case 12: 651 case 12:
654 lnk_wdth = PCIE_LNK_X12; 652 lnk_wdth = PCIE_LNK_X12;
655 break; 653 break;
656 case 16: 654 case 16:
657 lnk_wdth = PCIE_LNK_X16; 655 lnk_wdth = PCIE_LNK_X16;
658 break; 656 break;
659 case 32: 657 case 32:
660 lnk_wdth = PCIE_LNK_X32; 658 lnk_wdth = PCIE_LNK_X32;
661 break; 659 break;
662 default: 660 default:
663 lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN; 661 lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN;
664 break; 662 break;
665 } 663 }
666 664
667 *value = lnk_wdth; 665 *value = lnk_wdth;
668 ctrl_dbg(ctrl, "Max link width = %d\n", lnk_wdth); 666 ctrl_dbg(ctrl, "Max link width = %d\n", lnk_wdth);
669 667
670 return retval; 668 return retval;
671 } 669 }
672 670
673 int pciehp_get_cur_lnk_width(struct slot *slot, 671 int pciehp_get_cur_lnk_width(struct slot *slot,
674 enum pcie_link_width *value) 672 enum pcie_link_width *value)
675 { 673 {
676 struct controller *ctrl = slot->ctrl; 674 struct controller *ctrl = slot->ctrl;
677 enum pcie_link_width lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN; 675 enum pcie_link_width lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN;
678 int retval = 0; 676 int retval = 0;
679 u16 lnk_status; 677 u16 lnk_status;
680 678
681 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status); 679 retval = pciehp_readw(ctrl, PCI_EXP_LNKSTA, &lnk_status);
682 if (retval) { 680 if (retval) {
683 ctrl_err(ctrl, "%s: Cannot read LNKSTATUS register\n", 681 ctrl_err(ctrl, "%s: Cannot read LNKSTATUS register\n",
684 __func__); 682 __func__);
685 return retval; 683 return retval;
686 } 684 }
687 685
688 switch ((lnk_status & PCI_EXP_LNKSTA_NLW) >> 4){ 686 switch ((lnk_status & PCI_EXP_LNKSTA_NLW) >> 4){
689 case 0: 687 case 0:
690 lnk_wdth = PCIE_LNK_WIDTH_RESRV; 688 lnk_wdth = PCIE_LNK_WIDTH_RESRV;
691 break; 689 break;
692 case 1: 690 case 1:
693 lnk_wdth = PCIE_LNK_X1; 691 lnk_wdth = PCIE_LNK_X1;
694 break; 692 break;
695 case 2: 693 case 2:
696 lnk_wdth = PCIE_LNK_X2; 694 lnk_wdth = PCIE_LNK_X2;
697 break; 695 break;
698 case 4: 696 case 4:
699 lnk_wdth = PCIE_LNK_X4; 697 lnk_wdth = PCIE_LNK_X4;
700 break; 698 break;
701 case 8: 699 case 8:
702 lnk_wdth = PCIE_LNK_X8; 700 lnk_wdth = PCIE_LNK_X8;
703 break; 701 break;
704 case 12: 702 case 12:
705 lnk_wdth = PCIE_LNK_X12; 703 lnk_wdth = PCIE_LNK_X12;
706 break; 704 break;
707 case 16: 705 case 16:
708 lnk_wdth = PCIE_LNK_X16; 706 lnk_wdth = PCIE_LNK_X16;
709 break; 707 break;
710 case 32: 708 case 32:
711 lnk_wdth = PCIE_LNK_X32; 709 lnk_wdth = PCIE_LNK_X32;
712 break; 710 break;
713 default: 711 default:
714 lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN; 712 lnk_wdth = PCIE_LNK_WIDTH_UNKNOWN;
715 break; 713 break;
716 } 714 }
717 715
718 *value = lnk_wdth; 716 *value = lnk_wdth;
719 ctrl_dbg(ctrl, "Current link width = %d\n", lnk_wdth); 717 ctrl_dbg(ctrl, "Current link width = %d\n", lnk_wdth);
720 718
721 return retval; 719 return retval;
722 } 720 }
723 721
724 int pcie_enable_notification(struct controller *ctrl) 722 int pcie_enable_notification(struct controller *ctrl)
725 { 723 {
726 u16 cmd, mask; 724 u16 cmd, mask;
727 725
728 /* 726 /*
729 * TBD: Power fault detected software notification support. 727 * TBD: Power fault detected software notification support.
730 * 728 *
731 * Power fault detected software notification is not enabled 729 * Power fault detected software notification is not enabled
732 * now, because it caused power fault detected interrupt storm 730 * now, because it caused power fault detected interrupt storm
733 * on some machines. On those machines, power fault detected 731 * on some machines. On those machines, power fault detected
734 * bit in the slot status register was set again immediately 732 * bit in the slot status register was set again immediately
735 * when it is cleared in the interrupt service routine, and 733 * when it is cleared in the interrupt service routine, and
736 * next power fault detected interrupt was notified again. 734 * next power fault detected interrupt was notified again.
737 */ 735 */
738 cmd = PCI_EXP_SLTCTL_PDCE; 736 cmd = PCI_EXP_SLTCTL_PDCE;
739 if (ATTN_BUTTN(ctrl)) 737 if (ATTN_BUTTN(ctrl))
740 cmd |= PCI_EXP_SLTCTL_ABPE; 738 cmd |= PCI_EXP_SLTCTL_ABPE;
741 if (MRL_SENS(ctrl)) 739 if (MRL_SENS(ctrl))
742 cmd |= PCI_EXP_SLTCTL_MRLSCE; 740 cmd |= PCI_EXP_SLTCTL_MRLSCE;
743 if (!pciehp_poll_mode) 741 if (!pciehp_poll_mode)
744 cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; 742 cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
745 743
746 mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE | 744 mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
747 PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE | 745 PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
748 PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE); 746 PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE);
749 747
750 if (pcie_write_cmd(ctrl, cmd, mask)) { 748 if (pcie_write_cmd(ctrl, cmd, mask)) {
751 ctrl_err(ctrl, "Cannot enable software notification\n"); 749 ctrl_err(ctrl, "Cannot enable software notification\n");
752 return -1; 750 return -1;
753 } 751 }
754 return 0; 752 return 0;
755 } 753 }
756 754
757 static void pcie_disable_notification(struct controller *ctrl) 755 static void pcie_disable_notification(struct controller *ctrl)
758 { 756 {
759 u16 mask; 757 u16 mask;
760 mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE | 758 mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
761 PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE | 759 PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
762 PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE | 760 PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
763 PCI_EXP_SLTCTL_DLLSCE); 761 PCI_EXP_SLTCTL_DLLSCE);
764 if (pcie_write_cmd(ctrl, 0, mask)) 762 if (pcie_write_cmd(ctrl, 0, mask))
765 ctrl_warn(ctrl, "Cannot disable software notification\n"); 763 ctrl_warn(ctrl, "Cannot disable software notification\n");
766 } 764 }
767 765
768 int pcie_init_notification(struct controller *ctrl) 766 int pcie_init_notification(struct controller *ctrl)
769 { 767 {
770 if (pciehp_request_irq(ctrl)) 768 if (pciehp_request_irq(ctrl))
771 return -1; 769 return -1;
772 if (pcie_enable_notification(ctrl)) { 770 if (pcie_enable_notification(ctrl)) {
773 pciehp_free_irq(ctrl); 771 pciehp_free_irq(ctrl);
774 return -1; 772 return -1;
775 } 773 }
776 ctrl->notification_enabled = 1; 774 ctrl->notification_enabled = 1;
777 return 0; 775 return 0;
778 } 776 }
779 777
780 static void pcie_shutdown_notification(struct controller *ctrl) 778 static void pcie_shutdown_notification(struct controller *ctrl)
781 { 779 {
782 if (ctrl->notification_enabled) { 780 if (ctrl->notification_enabled) {
783 pcie_disable_notification(ctrl); 781 pcie_disable_notification(ctrl);
784 pciehp_free_irq(ctrl); 782 pciehp_free_irq(ctrl);
785 ctrl->notification_enabled = 0; 783 ctrl->notification_enabled = 0;
786 } 784 }
787 } 785 }
788 786
789 static int pcie_init_slot(struct controller *ctrl) 787 static int pcie_init_slot(struct controller *ctrl)
790 { 788 {
791 struct slot *slot; 789 struct slot *slot;
792 790
793 slot = kzalloc(sizeof(*slot), GFP_KERNEL); 791 slot = kzalloc(sizeof(*slot), GFP_KERNEL);
794 if (!slot) 792 if (!slot)
795 return -ENOMEM; 793 return -ENOMEM;
796 794
797 slot->ctrl = ctrl; 795 slot->ctrl = ctrl;
798 mutex_init(&slot->lock); 796 mutex_init(&slot->lock);
799 INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work); 797 INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work);
800 ctrl->slot = slot; 798 ctrl->slot = slot;
801 return 0; 799 return 0;
802 } 800 }
803 801
804 static void pcie_cleanup_slot(struct controller *ctrl) 802 static void pcie_cleanup_slot(struct controller *ctrl)
805 { 803 {
806 struct slot *slot = ctrl->slot; 804 struct slot *slot = ctrl->slot;
807 cancel_delayed_work(&slot->work); 805 cancel_delayed_work(&slot->work);
808 flush_scheduled_work();
809 flush_workqueue(pciehp_wq); 806 flush_workqueue(pciehp_wq);
807 flush_workqueue(pciehp_ordered_wq);
810 kfree(slot); 808 kfree(slot);
811 } 809 }
812 810
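The hunk above is the heart of the workqueue-usage update: pcie_cleanup_slot() no longer calls flush_scheduled_work(), which drains the shared system workqueue, and instead flushes only the queues this driver owns. The sketch below shows one way a driver might create, drain and destroy such queues with the interfaces added in this series; my_wq, my_ordered_wq and the function names are hypothetical, not the actual pciehp setup code.

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;
static struct workqueue_struct *my_ordered_wq;

static int my_create_workqueues(void)
{
	/* WQ_MEM_RECLAIM guarantees forward progress under memory pressure */
	my_wq = alloc_workqueue("my_driver", WQ_MEM_RECLAIM, 0);
	if (!my_wq)
		return -ENOMEM;

	/* ordered queue: at most one item executes at a time, in queueing order */
	my_ordered_wq = alloc_ordered_workqueue("my_driver_ordered", 0);
	if (!my_ordered_wq) {
		destroy_workqueue(my_wq);
		return -ENOMEM;
	}
	return 0;
}

static void my_cleanup_slot(struct delayed_work *dwork)
{
	cancel_delayed_work(dwork);
	/* drain only the queues this driver owns, not the system workqueue */
	flush_workqueue(my_wq);
	flush_workqueue(my_ordered_wq);
}

static void my_destroy_workqueues(void)
{
	destroy_workqueue(my_ordered_wq);
	destroy_workqueue(my_wq);
}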
813 static inline void dbg_ctrl(struct controller *ctrl) 811 static inline void dbg_ctrl(struct controller *ctrl)
814 { 812 {
815 int i; 813 int i;
816 u16 reg16; 814 u16 reg16;
817 struct pci_dev *pdev = ctrl->pcie->port; 815 struct pci_dev *pdev = ctrl->pcie->port;
818 816
819 if (!pciehp_debug) 817 if (!pciehp_debug)
820 return; 818 return;
821 819
822 ctrl_info(ctrl, "Hotplug Controller:\n"); 820 ctrl_info(ctrl, "Hotplug Controller:\n");
823 ctrl_info(ctrl, " Seg/Bus/Dev/Func/IRQ : %s IRQ %d\n", 821 ctrl_info(ctrl, " Seg/Bus/Dev/Func/IRQ : %s IRQ %d\n",
824 pci_name(pdev), pdev->irq); 822 pci_name(pdev), pdev->irq);
825 ctrl_info(ctrl, " Vendor ID : 0x%04x\n", pdev->vendor); 823 ctrl_info(ctrl, " Vendor ID : 0x%04x\n", pdev->vendor);
826 ctrl_info(ctrl, " Device ID : 0x%04x\n", pdev->device); 824 ctrl_info(ctrl, " Device ID : 0x%04x\n", pdev->device);
827 ctrl_info(ctrl, " Subsystem ID : 0x%04x\n", 825 ctrl_info(ctrl, " Subsystem ID : 0x%04x\n",
828 pdev->subsystem_device); 826 pdev->subsystem_device);
829 ctrl_info(ctrl, " Subsystem Vendor ID : 0x%04x\n", 827 ctrl_info(ctrl, " Subsystem Vendor ID : 0x%04x\n",
830 pdev->subsystem_vendor); 828 pdev->subsystem_vendor);
831 ctrl_info(ctrl, " PCIe Cap offset : 0x%02x\n", 829 ctrl_info(ctrl, " PCIe Cap offset : 0x%02x\n",
832 pci_pcie_cap(pdev)); 830 pci_pcie_cap(pdev));
833 for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { 831 for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
834 if (!pci_resource_len(pdev, i)) 832 if (!pci_resource_len(pdev, i))
835 continue; 833 continue;
836 ctrl_info(ctrl, " PCI resource [%d] : %pR\n", 834 ctrl_info(ctrl, " PCI resource [%d] : %pR\n",
837 i, &pdev->resource[i]); 835 i, &pdev->resource[i]);
838 } 836 }
839 ctrl_info(ctrl, "Slot Capabilities : 0x%08x\n", ctrl->slot_cap); 837 ctrl_info(ctrl, "Slot Capabilities : 0x%08x\n", ctrl->slot_cap);
840 ctrl_info(ctrl, " Physical Slot Number : %d\n", PSN(ctrl)); 838 ctrl_info(ctrl, " Physical Slot Number : %d\n", PSN(ctrl));
841 ctrl_info(ctrl, " Attention Button : %3s\n", 839 ctrl_info(ctrl, " Attention Button : %3s\n",
842 ATTN_BUTTN(ctrl) ? "yes" : "no"); 840 ATTN_BUTTN(ctrl) ? "yes" : "no");
843 ctrl_info(ctrl, " Power Controller : %3s\n", 841 ctrl_info(ctrl, " Power Controller : %3s\n",
844 POWER_CTRL(ctrl) ? "yes" : "no"); 842 POWER_CTRL(ctrl) ? "yes" : "no");
845 ctrl_info(ctrl, " MRL Sensor : %3s\n", 843 ctrl_info(ctrl, " MRL Sensor : %3s\n",
846 MRL_SENS(ctrl) ? "yes" : "no"); 844 MRL_SENS(ctrl) ? "yes" : "no");
847 ctrl_info(ctrl, " Attention Indicator : %3s\n", 845 ctrl_info(ctrl, " Attention Indicator : %3s\n",
848 ATTN_LED(ctrl) ? "yes" : "no"); 846 ATTN_LED(ctrl) ? "yes" : "no");
849 ctrl_info(ctrl, " Power Indicator : %3s\n", 847 ctrl_info(ctrl, " Power Indicator : %3s\n",
850 PWR_LED(ctrl) ? "yes" : "no"); 848 PWR_LED(ctrl) ? "yes" : "no");
851 ctrl_info(ctrl, " Hot-Plug Surprise : %3s\n", 849 ctrl_info(ctrl, " Hot-Plug Surprise : %3s\n",
852 HP_SUPR_RM(ctrl) ? "yes" : "no"); 850 HP_SUPR_RM(ctrl) ? "yes" : "no");
853 ctrl_info(ctrl, " EMI Present : %3s\n", 851 ctrl_info(ctrl, " EMI Present : %3s\n",
854 EMI(ctrl) ? "yes" : "no"); 852 EMI(ctrl) ? "yes" : "no");
855 ctrl_info(ctrl, " Command Completed : %3s\n", 853 ctrl_info(ctrl, " Command Completed : %3s\n",
856 NO_CMD_CMPL(ctrl) ? "no" : "yes"); 854 NO_CMD_CMPL(ctrl) ? "no" : "yes");
857 pciehp_readw(ctrl, PCI_EXP_SLTSTA, &reg16); 855 pciehp_readw(ctrl, PCI_EXP_SLTSTA, &reg16);
858 ctrl_info(ctrl, "Slot Status : 0x%04x\n", reg16); 856 ctrl_info(ctrl, "Slot Status : 0x%04x\n", reg16);
859 pciehp_readw(ctrl, PCI_EXP_SLTCTL, &reg16); 857 pciehp_readw(ctrl, PCI_EXP_SLTCTL, &reg16);
860 ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16); 858 ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
861 } 859 }
862 860
863 struct controller *pcie_init(struct pcie_device *dev) 861 struct controller *pcie_init(struct pcie_device *dev)
864 { 862 {
865 struct controller *ctrl; 863 struct controller *ctrl;
866 u32 slot_cap, link_cap; 864 u32 slot_cap, link_cap;
867 struct pci_dev *pdev = dev->port; 865 struct pci_dev *pdev = dev->port;
868 866
869 ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL); 867 ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
870 if (!ctrl) { 868 if (!ctrl) {
871 dev_err(&dev->device, "%s: Out of memory\n", __func__); 869 dev_err(&dev->device, "%s: Out of memory\n", __func__);
872 goto abort; 870 goto abort;
873 } 871 }
874 ctrl->pcie = dev; 872 ctrl->pcie = dev;
875 if (!pci_pcie_cap(pdev)) { 873 if (!pci_pcie_cap(pdev)) {
876 ctrl_err(ctrl, "Cannot find PCI Express capability\n"); 874 ctrl_err(ctrl, "Cannot find PCI Express capability\n");
877 goto abort_ctrl; 875 goto abort_ctrl;
878 } 876 }
879 if (pciehp_readl(ctrl, PCI_EXP_SLTCAP, &slot_cap)) { 877 if (pciehp_readl(ctrl, PCI_EXP_SLTCAP, &slot_cap)) {
880 ctrl_err(ctrl, "Cannot read SLOTCAP register\n"); 878 ctrl_err(ctrl, "Cannot read SLOTCAP register\n");
881 goto abort_ctrl; 879 goto abort_ctrl;
882 } 880 }
883 881
884 ctrl->slot_cap = slot_cap; 882 ctrl->slot_cap = slot_cap;
885 mutex_init(&ctrl->ctrl_lock); 883 mutex_init(&ctrl->ctrl_lock);
886 init_waitqueue_head(&ctrl->queue); 884 init_waitqueue_head(&ctrl->queue);
887 dbg_ctrl(ctrl); 885 dbg_ctrl(ctrl);
888 /* 886 /*
889 * Controller doesn't notify of command completion if the "No 887 * Controller doesn't notify of command completion if the "No
890 * Command Completed Support" bit is set in Slot Capability 888 * Command Completed Support" bit is set in Slot Capability
891 * register or the controller supports none of power 889 * register or the controller supports none of power
892 * controller, attention led, power led and EMI. 890 * controller, attention led, power led and EMI.
893 */ 891 */
894 if (NO_CMD_CMPL(ctrl) || 892 if (NO_CMD_CMPL(ctrl) ||
895 !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) 893 !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
896 ctrl->no_cmd_complete = 1; 894 ctrl->no_cmd_complete = 1;
897 895
898 /* Check if Data Link Layer Link Active Reporting is implemented */ 896 /* Check if Data Link Layer Link Active Reporting is implemented */
899 if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) { 897 if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
900 ctrl_err(ctrl, "%s: Cannot read LNKCAP register\n", __func__); 898 ctrl_err(ctrl, "%s: Cannot read LNKCAP register\n", __func__);
901 goto abort_ctrl; 899 goto abort_ctrl;
902 } 900 }
903 if (link_cap & PCI_EXP_LNKCAP_DLLLARC) { 901 if (link_cap & PCI_EXP_LNKCAP_DLLLARC) {
904 ctrl_dbg(ctrl, "Link Active Reporting supported\n"); 902 ctrl_dbg(ctrl, "Link Active Reporting supported\n");
905 ctrl->link_active_reporting = 1; 903 ctrl->link_active_reporting = 1;
906 } 904 }
907 905
908 /* Clear all remaining event bits in Slot Status register */ 906 /* Clear all remaining event bits in Slot Status register */
909 if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f)) 907 if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f))
910 goto abort_ctrl; 908 goto abort_ctrl;
911 909
912 /* Disable software notification */ 910 /* Disable software notification */
913 pcie_disable_notification(ctrl); 911 pcie_disable_notification(ctrl);
914 912
915 /*
916 * If this is the first controller to be initialized,
917 * initialize the pciehp work queue
918 */
919 if (atomic_add_return(1, &pciehp_num_controllers) == 1) {
920 pciehp_wq = create_singlethread_workqueue("pciehpd");
921 if (!pciehp_wq)
922 goto abort_ctrl;
923 }
924
925 ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n", 913 ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n",
926 pdev->vendor, pdev->device, pdev->subsystem_vendor, 914 pdev->vendor, pdev->device, pdev->subsystem_vendor,
927 pdev->subsystem_device); 915 pdev->subsystem_device);
928 916
929 if (pcie_init_slot(ctrl)) 917 if (pcie_init_slot(ctrl))
930 goto abort_ctrl; 918 goto abort_ctrl;
931 919
932 return ctrl; 920 return ctrl;
933 921
934 abort_ctrl: 922 abort_ctrl:
935 kfree(ctrl); 923 kfree(ctrl);
936 abort: 924 abort:
937 return NULL; 925 return NULL;
938 } 926 }
939 927
940 void pciehp_release_ctrl(struct controller *ctrl) 928 void pciehp_release_ctrl(struct controller *ctrl)
941 { 929 {
942 pcie_shutdown_notification(ctrl); 930 pcie_shutdown_notification(ctrl);
943 pcie_cleanup_slot(ctrl); 931 pcie_cleanup_slot(ctrl);
944 /*
945 * If this is the last controller to be released, destroy the
946 * pciehp work queue
947 */
948 if (atomic_dec_and_test(&pciehp_num_controllers))
949 destroy_workqueue(pciehp_wq);
950 kfree(ctrl); 932 kfree(ctrl);
951 } 933 }
drivers/pci/hotplug/shpchp.h
1 /* 1 /*
2 * Standard Hot Plug Controller Driver 2 * Standard Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM 6 * Copyright (C) 2001 IBM
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 #ifndef _SHPCHP_H 29 #ifndef _SHPCHP_H
30 #define _SHPCHP_H 30 #define _SHPCHP_H
31 31
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/pci.h> 33 #include <linux/pci.h>
34 #include <linux/pci_hotplug.h> 34 #include <linux/pci_hotplug.h>
35 #include <linux/delay.h> 35 #include <linux/delay.h>
36 #include <linux/sched.h> /* signal_pending(), struct timer_list */ 36 #include <linux/sched.h> /* signal_pending(), struct timer_list */
37 #include <linux/mutex.h> 37 #include <linux/mutex.h>
38 #include <linux/workqueue.h>
38 39
39 #if !defined(MODULE) 40 #if !defined(MODULE)
40 #define MY_NAME "shpchp" 41 #define MY_NAME "shpchp"
41 #else 42 #else
42 #define MY_NAME THIS_MODULE->name 43 #define MY_NAME THIS_MODULE->name
43 #endif 44 #endif
44 45
45 extern int shpchp_poll_mode; 46 extern int shpchp_poll_mode;
46 extern int shpchp_poll_time; 47 extern int shpchp_poll_time;
47 extern int shpchp_debug; 48 extern int shpchp_debug;
48 extern struct workqueue_struct *shpchp_wq; 49 extern struct workqueue_struct *shpchp_wq;
50 extern struct workqueue_struct *shpchp_ordered_wq;
49 51
50 #define dbg(format, arg...) \ 52 #define dbg(format, arg...) \
51 do { \ 53 do { \
52 if (shpchp_debug) \ 54 if (shpchp_debug) \
53 printk(KERN_DEBUG "%s: " format, MY_NAME , ## arg); \ 55 printk(KERN_DEBUG "%s: " format, MY_NAME , ## arg); \
54 } while (0) 56 } while (0)
55 #define err(format, arg...) \ 57 #define err(format, arg...) \
56 printk(KERN_ERR "%s: " format, MY_NAME , ## arg) 58 printk(KERN_ERR "%s: " format, MY_NAME , ## arg)
57 #define info(format, arg...) \ 59 #define info(format, arg...) \
58 printk(KERN_INFO "%s: " format, MY_NAME , ## arg) 60 printk(KERN_INFO "%s: " format, MY_NAME , ## arg)
59 #define warn(format, arg...) \ 61 #define warn(format, arg...) \
60 printk(KERN_WARNING "%s: " format, MY_NAME , ## arg) 62 printk(KERN_WARNING "%s: " format, MY_NAME , ## arg)
61 63
62 #define ctrl_dbg(ctrl, format, arg...) \ 64 #define ctrl_dbg(ctrl, format, arg...) \
63 do { \ 65 do { \
64 if (shpchp_debug) \ 66 if (shpchp_debug) \
65 dev_printk(KERN_DEBUG, &ctrl->pci_dev->dev, \ 67 dev_printk(KERN_DEBUG, &ctrl->pci_dev->dev, \
66 format, ## arg); \ 68 format, ## arg); \
67 } while (0) 69 } while (0)
68 #define ctrl_err(ctrl, format, arg...) \ 70 #define ctrl_err(ctrl, format, arg...) \
69 dev_err(&ctrl->pci_dev->dev, format, ## arg) 71 dev_err(&ctrl->pci_dev->dev, format, ## arg)
70 #define ctrl_info(ctrl, format, arg...) \ 72 #define ctrl_info(ctrl, format, arg...) \
71 dev_info(&ctrl->pci_dev->dev, format, ## arg) 73 dev_info(&ctrl->pci_dev->dev, format, ## arg)
72 #define ctrl_warn(ctrl, format, arg...) \ 74 #define ctrl_warn(ctrl, format, arg...) \
73 dev_warn(&ctrl->pci_dev->dev, format, ## arg) 75 dev_warn(&ctrl->pci_dev->dev, format, ## arg)
74 76
75 77
76 #define SLOT_NAME_SIZE 10 78 #define SLOT_NAME_SIZE 10
77 struct slot { 79 struct slot {
78 u8 bus; 80 u8 bus;
79 u8 device; 81 u8 device;
80 u16 status; 82 u16 status;
81 u32 number; 83 u32 number;
82 u8 is_a_board; 84 u8 is_a_board;
83 u8 state; 85 u8 state;
84 u8 presence_save; 86 u8 presence_save;
85 u8 pwr_save; 87 u8 pwr_save;
86 struct controller *ctrl; 88 struct controller *ctrl;
87 struct hpc_ops *hpc_ops; 89 struct hpc_ops *hpc_ops;
88 struct hotplug_slot *hotplug_slot; 90 struct hotplug_slot *hotplug_slot;
89 struct list_head slot_list; 91 struct list_head slot_list;
90 struct delayed_work work; /* work for button event */ 92 struct delayed_work work; /* work for button event */
91 struct mutex lock; 93 struct mutex lock;
92 u8 hp_slot; 94 u8 hp_slot;
93 }; 95 };
94 96
95 struct event_info { 97 struct event_info {
96 u32 event_type; 98 u32 event_type;
97 struct slot *p_slot; 99 struct slot *p_slot;
98 struct work_struct work; 100 struct work_struct work;
99 }; 101 };
100 102
101 struct controller { 103 struct controller {
102 struct mutex crit_sect; /* critical section mutex */ 104 struct mutex crit_sect; /* critical section mutex */
103 struct mutex cmd_lock; /* command lock */ 105 struct mutex cmd_lock; /* command lock */
104 int num_slots; /* Number of slots on ctlr */ 106 int num_slots; /* Number of slots on ctlr */
105 int slot_num_inc; /* 1 or -1 */ 107 int slot_num_inc; /* 1 or -1 */
106 struct pci_dev *pci_dev; 108 struct pci_dev *pci_dev;
107 struct list_head slot_list; 109 struct list_head slot_list;
108 struct hpc_ops *hpc_ops; 110 struct hpc_ops *hpc_ops;
109 wait_queue_head_t queue; /* sleep & wake process */ 111 wait_queue_head_t queue; /* sleep & wake process */
110 u8 slot_device_offset; 112 u8 slot_device_offset;
111 u32 pcix_misc2_reg; /* for amd pogo errata */ 113 u32 pcix_misc2_reg; /* for amd pogo errata */
112 u32 first_slot; /* First physical slot number */ 114 u32 first_slot; /* First physical slot number */
113 u32 cap_offset; 115 u32 cap_offset;
114 unsigned long mmio_base; 116 unsigned long mmio_base;
115 unsigned long mmio_size; 117 unsigned long mmio_size;
116 void __iomem *creg; 118 void __iomem *creg;
117 struct timer_list poll_timer; 119 struct timer_list poll_timer;
118 }; 120 };
119 121
120 /* Define AMD SHPC ID */ 122 /* Define AMD SHPC ID */
121 #define PCI_DEVICE_ID_AMD_GOLAM_7450 0x7450 123 #define PCI_DEVICE_ID_AMD_GOLAM_7450 0x7450
122 #define PCI_DEVICE_ID_AMD_POGO_7458 0x7458 124 #define PCI_DEVICE_ID_AMD_POGO_7458 0x7458
123 125
124 /* AMD PCI-X bridge registers */ 126 /* AMD PCI-X bridge registers */
125 #define PCIX_MEM_BASE_LIMIT_OFFSET 0x1C 127 #define PCIX_MEM_BASE_LIMIT_OFFSET 0x1C
126 #define PCIX_MISCII_OFFSET 0x48 128 #define PCIX_MISCII_OFFSET 0x48
127 #define PCIX_MISC_BRIDGE_ERRORS_OFFSET 0x80 129 #define PCIX_MISC_BRIDGE_ERRORS_OFFSET 0x80
128 130
129 /* AMD PCIX_MISCII masks and offsets */ 131 /* AMD PCIX_MISCII masks and offsets */
130 #define PERRNONFATALENABLE_MASK 0x00040000 132 #define PERRNONFATALENABLE_MASK 0x00040000
131 #define PERRFATALENABLE_MASK 0x00080000 133 #define PERRFATALENABLE_MASK 0x00080000
132 #define PERRFLOODENABLE_MASK 0x00100000 134 #define PERRFLOODENABLE_MASK 0x00100000
133 #define SERRNONFATALENABLE_MASK 0x00200000 135 #define SERRNONFATALENABLE_MASK 0x00200000
134 #define SERRFATALENABLE_MASK 0x00400000 136 #define SERRFATALENABLE_MASK 0x00400000
135 137
136 /* AMD PCIX_MISC_BRIDGE_ERRORS masks and offsets */ 138 /* AMD PCIX_MISC_BRIDGE_ERRORS masks and offsets */
137 #define PERR_OBSERVED_MASK 0x00000001 139 #define PERR_OBSERVED_MASK 0x00000001
138 140
139 /* AMD PCIX_MEM_BASE_LIMIT masks */ 141 /* AMD PCIX_MEM_BASE_LIMIT masks */
140 #define RSE_MASK 0x40000000 142 #define RSE_MASK 0x40000000
141 143
142 #define INT_BUTTON_IGNORE 0 144 #define INT_BUTTON_IGNORE 0
143 #define INT_PRESENCE_ON 1 145 #define INT_PRESENCE_ON 1
144 #define INT_PRESENCE_OFF 2 146 #define INT_PRESENCE_OFF 2
145 #define INT_SWITCH_CLOSE 3 147 #define INT_SWITCH_CLOSE 3
146 #define INT_SWITCH_OPEN 4 148 #define INT_SWITCH_OPEN 4
147 #define INT_POWER_FAULT 5 149 #define INT_POWER_FAULT 5
148 #define INT_POWER_FAULT_CLEAR 6 150 #define INT_POWER_FAULT_CLEAR 6
149 #define INT_BUTTON_PRESS 7 151 #define INT_BUTTON_PRESS 7
150 #define INT_BUTTON_RELEASE 8 152 #define INT_BUTTON_RELEASE 8
151 #define INT_BUTTON_CANCEL 9 153 #define INT_BUTTON_CANCEL 9
152 154
153 #define STATIC_STATE 0 155 #define STATIC_STATE 0
154 #define BLINKINGON_STATE 1 156 #define BLINKINGON_STATE 1
155 #define BLINKINGOFF_STATE 2 157 #define BLINKINGOFF_STATE 2
156 #define POWERON_STATE 3 158 #define POWERON_STATE 3
157 #define POWEROFF_STATE 4 159 #define POWEROFF_STATE 4
158 160
159 /* Error messages */ 161 /* Error messages */
160 #define INTERLOCK_OPEN 0x00000002 162 #define INTERLOCK_OPEN 0x00000002
161 #define ADD_NOT_SUPPORTED 0x00000003 163 #define ADD_NOT_SUPPORTED 0x00000003
162 #define CARD_FUNCTIONING 0x00000005 164 #define CARD_FUNCTIONING 0x00000005
163 #define ADAPTER_NOT_SAME 0x00000006 165 #define ADAPTER_NOT_SAME 0x00000006
164 #define NO_ADAPTER_PRESENT 0x00000009 166 #define NO_ADAPTER_PRESENT 0x00000009
165 #define NOT_ENOUGH_RESOURCES 0x0000000B 167 #define NOT_ENOUGH_RESOURCES 0x0000000B
166 #define DEVICE_TYPE_NOT_SUPPORTED 0x0000000C 168 #define DEVICE_TYPE_NOT_SUPPORTED 0x0000000C
167 #define WRONG_BUS_FREQUENCY 0x0000000D 169 #define WRONG_BUS_FREQUENCY 0x0000000D
168 #define POWER_FAILURE 0x0000000E 170 #define POWER_FAILURE 0x0000000E
169 171
170 extern int __must_check shpchp_create_ctrl_files(struct controller *ctrl); 172 extern int __must_check shpchp_create_ctrl_files(struct controller *ctrl);
171 extern void shpchp_remove_ctrl_files(struct controller *ctrl); 173 extern void shpchp_remove_ctrl_files(struct controller *ctrl);
172 extern int shpchp_sysfs_enable_slot(struct slot *slot); 174 extern int shpchp_sysfs_enable_slot(struct slot *slot);
173 extern int shpchp_sysfs_disable_slot(struct slot *slot); 175 extern int shpchp_sysfs_disable_slot(struct slot *slot);
174 extern u8 shpchp_handle_attention_button(u8 hp_slot, struct controller *ctrl); 176 extern u8 shpchp_handle_attention_button(u8 hp_slot, struct controller *ctrl);
175 extern u8 shpchp_handle_switch_change(u8 hp_slot, struct controller *ctrl); 177 extern u8 shpchp_handle_switch_change(u8 hp_slot, struct controller *ctrl);
176 extern u8 shpchp_handle_presence_change(u8 hp_slot, struct controller *ctrl); 178 extern u8 shpchp_handle_presence_change(u8 hp_slot, struct controller *ctrl);
177 extern u8 shpchp_handle_power_fault(u8 hp_slot, struct controller *ctrl); 179 extern u8 shpchp_handle_power_fault(u8 hp_slot, struct controller *ctrl);
178 extern int shpchp_configure_device(struct slot *p_slot); 180 extern int shpchp_configure_device(struct slot *p_slot);
179 extern int shpchp_unconfigure_device(struct slot *p_slot); 181 extern int shpchp_unconfigure_device(struct slot *p_slot);
180 extern void cleanup_slots(struct controller *ctrl); 182 extern void cleanup_slots(struct controller *ctrl);
181 extern void shpchp_queue_pushbutton_work(struct work_struct *work); 183 extern void shpchp_queue_pushbutton_work(struct work_struct *work);
182 extern int shpc_init( struct controller *ctrl, struct pci_dev *pdev); 184 extern int shpc_init( struct controller *ctrl, struct pci_dev *pdev);
183 185
184 static inline const char *slot_name(struct slot *slot) 186 static inline const char *slot_name(struct slot *slot)
185 { 187 {
186 return hotplug_slot_name(slot->hotplug_slot); 188 return hotplug_slot_name(slot->hotplug_slot);
187 } 189 }
188 190
189 #ifdef CONFIG_ACPI 191 #ifdef CONFIG_ACPI
190 #include <linux/pci-acpi.h> 192 #include <linux/pci-acpi.h>
191 static inline int get_hp_hw_control_from_firmware(struct pci_dev *dev) 193 static inline int get_hp_hw_control_from_firmware(struct pci_dev *dev)
192 { 194 {
193 u32 flags = OSC_SHPC_NATIVE_HP_CONTROL; 195 u32 flags = OSC_SHPC_NATIVE_HP_CONTROL;
194 return acpi_get_hp_hw_control_from_firmware(dev, flags); 196 return acpi_get_hp_hw_control_from_firmware(dev, flags);
195 } 197 }
196 #else 198 #else
197 #define get_hp_hw_control_from_firmware(dev) (0) 199 #define get_hp_hw_control_from_firmware(dev) (0)
198 #endif 200 #endif
199 201
200 struct ctrl_reg { 202 struct ctrl_reg {
201 volatile u32 base_offset; 203 volatile u32 base_offset;
202 volatile u32 slot_avail1; 204 volatile u32 slot_avail1;
203 volatile u32 slot_avail2; 205 volatile u32 slot_avail2;
204 volatile u32 slot_config; 206 volatile u32 slot_config;
205 volatile u16 sec_bus_config; 207 volatile u16 sec_bus_config;
206 volatile u8 msi_ctrl; 208 volatile u8 msi_ctrl;
207 volatile u8 prog_interface; 209 volatile u8 prog_interface;
208 volatile u16 cmd; 210 volatile u16 cmd;
209 volatile u16 cmd_status; 211 volatile u16 cmd_status;
210 volatile u32 intr_loc; 212 volatile u32 intr_loc;
211 volatile u32 serr_loc; 213 volatile u32 serr_loc;
212 volatile u32 serr_intr_enable; 214 volatile u32 serr_intr_enable;
213 volatile u32 slot1; 215 volatile u32 slot1;
214 } __attribute__ ((packed)); 216 } __attribute__ ((packed));
215 217
216 /* offsets to the controller registers based on the above structure layout */ 218 /* offsets to the controller registers based on the above structure layout */
217 enum ctrl_offsets { 219 enum ctrl_offsets {
218 BASE_OFFSET = offsetof(struct ctrl_reg, base_offset), 220 BASE_OFFSET = offsetof(struct ctrl_reg, base_offset),
219 SLOT_AVAIL1 = offsetof(struct ctrl_reg, slot_avail1), 221 SLOT_AVAIL1 = offsetof(struct ctrl_reg, slot_avail1),
220 SLOT_AVAIL2 = offsetof(struct ctrl_reg, slot_avail2), 222 SLOT_AVAIL2 = offsetof(struct ctrl_reg, slot_avail2),
221 SLOT_CONFIG = offsetof(struct ctrl_reg, slot_config), 223 SLOT_CONFIG = offsetof(struct ctrl_reg, slot_config),
222 SEC_BUS_CONFIG = offsetof(struct ctrl_reg, sec_bus_config), 224 SEC_BUS_CONFIG = offsetof(struct ctrl_reg, sec_bus_config),
223 MSI_CTRL = offsetof(struct ctrl_reg, msi_ctrl), 225 MSI_CTRL = offsetof(struct ctrl_reg, msi_ctrl),
224 PROG_INTERFACE = offsetof(struct ctrl_reg, prog_interface), 226 PROG_INTERFACE = offsetof(struct ctrl_reg, prog_interface),
225 CMD = offsetof(struct ctrl_reg, cmd), 227 CMD = offsetof(struct ctrl_reg, cmd),
226 CMD_STATUS = offsetof(struct ctrl_reg, cmd_status), 228 CMD_STATUS = offsetof(struct ctrl_reg, cmd_status),
227 INTR_LOC = offsetof(struct ctrl_reg, intr_loc), 229 INTR_LOC = offsetof(struct ctrl_reg, intr_loc),
228 SERR_LOC = offsetof(struct ctrl_reg, serr_loc), 230 SERR_LOC = offsetof(struct ctrl_reg, serr_loc),
229 SERR_INTR_ENABLE = offsetof(struct ctrl_reg, serr_intr_enable), 231 SERR_INTR_ENABLE = offsetof(struct ctrl_reg, serr_intr_enable),
230 SLOT1 = offsetof(struct ctrl_reg, slot1), 232 SLOT1 = offsetof(struct ctrl_reg, slot1),
231 }; 233 };
232 234
233 static inline struct slot *get_slot(struct hotplug_slot *hotplug_slot) 235 static inline struct slot *get_slot(struct hotplug_slot *hotplug_slot)
234 { 236 {
235 return hotplug_slot->private; 237 return hotplug_slot->private;
236 } 238 }
237 239
238 static inline struct slot *shpchp_find_slot(struct controller *ctrl, u8 device) 240 static inline struct slot *shpchp_find_slot(struct controller *ctrl, u8 device)
239 { 241 {
240 struct slot *slot; 242 struct slot *slot;
241 243
242 list_for_each_entry(slot, &ctrl->slot_list, slot_list) { 244 list_for_each_entry(slot, &ctrl->slot_list, slot_list) {
243 if (slot->device == device) 245 if (slot->device == device)
244 return slot; 246 return slot;
245 } 247 }
246 248
247 ctrl_err(ctrl, "Slot (device=0x%02x) not found\n", device); 249 ctrl_err(ctrl, "Slot (device=0x%02x) not found\n", device);
248 return NULL; 250 return NULL;
249 } 251 }
250 252
251 static inline void amd_pogo_errata_save_misc_reg(struct slot *p_slot) 253 static inline void amd_pogo_errata_save_misc_reg(struct slot *p_slot)
252 { 254 {
253 u32 pcix_misc2_temp; 255 u32 pcix_misc2_temp;
254 256
255 /* save MiscII register */ 257 /* save MiscII register */
256 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, &pcix_misc2_temp); 258 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, &pcix_misc2_temp);
257 259
258 p_slot->ctrl->pcix_misc2_reg = pcix_misc2_temp; 260 p_slot->ctrl->pcix_misc2_reg = pcix_misc2_temp;
259 261
260 /* clear SERR/PERR enable bits */ 262 /* clear SERR/PERR enable bits */
261 pcix_misc2_temp &= ~SERRFATALENABLE_MASK; 263 pcix_misc2_temp &= ~SERRFATALENABLE_MASK;
262 pcix_misc2_temp &= ~SERRNONFATALENABLE_MASK; 264 pcix_misc2_temp &= ~SERRNONFATALENABLE_MASK;
263 pcix_misc2_temp &= ~PERRFLOODENABLE_MASK; 265 pcix_misc2_temp &= ~PERRFLOODENABLE_MASK;
264 pcix_misc2_temp &= ~PERRFATALENABLE_MASK; 266 pcix_misc2_temp &= ~PERRFATALENABLE_MASK;
265 pcix_misc2_temp &= ~PERRNONFATALENABLE_MASK; 267 pcix_misc2_temp &= ~PERRNONFATALENABLE_MASK;
266 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, pcix_misc2_temp); 268 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, pcix_misc2_temp);
267 } 269 }
268 270
269 static inline void amd_pogo_errata_restore_misc_reg(struct slot *p_slot) 271 static inline void amd_pogo_errata_restore_misc_reg(struct slot *p_slot)
270 { 272 {
271 u32 pcix_misc2_temp; 273 u32 pcix_misc2_temp;
272 u32 pcix_bridge_errors_reg; 274 u32 pcix_bridge_errors_reg;
273 u32 pcix_mem_base_reg; 275 u32 pcix_mem_base_reg;
274 u8 perr_set; 276 u8 perr_set;
275 u8 rse_set; 277 u8 rse_set;
276 278
277 /* write-one-to-clear Bridge_Errors[ PERR_OBSERVED ] */ 279 /* write-one-to-clear Bridge_Errors[ PERR_OBSERVED ] */
278 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MISC_BRIDGE_ERRORS_OFFSET, &pcix_bridge_errors_reg); 280 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MISC_BRIDGE_ERRORS_OFFSET, &pcix_bridge_errors_reg);
279 perr_set = pcix_bridge_errors_reg & PERR_OBSERVED_MASK; 281 perr_set = pcix_bridge_errors_reg & PERR_OBSERVED_MASK;
280 if (perr_set) { 282 if (perr_set) {
281 ctrl_dbg(p_slot->ctrl, 283 ctrl_dbg(p_slot->ctrl,
282 "Bridge_Errors[ PERR_OBSERVED = %08X] (W1C)\n", 284 "Bridge_Errors[ PERR_OBSERVED = %08X] (W1C)\n",
283 perr_set); 285 perr_set);
284 286
285 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISC_BRIDGE_ERRORS_OFFSET, perr_set); 287 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISC_BRIDGE_ERRORS_OFFSET, perr_set);
286 } 288 }
287 289
288 /* write-one-to-clear Memory_Base_Limit[ RSE ] */ 290 /* write-one-to-clear Memory_Base_Limit[ RSE ] */
289 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MEM_BASE_LIMIT_OFFSET, &pcix_mem_base_reg); 291 pci_read_config_dword(p_slot->ctrl->pci_dev, PCIX_MEM_BASE_LIMIT_OFFSET, &pcix_mem_base_reg);
290 rse_set = pcix_mem_base_reg & RSE_MASK; 292 rse_set = pcix_mem_base_reg & RSE_MASK;
291 if (rse_set) { 293 if (rse_set) {
292 ctrl_dbg(p_slot->ctrl, "Memory_Base_Limit[ RSE ] (W1C)\n"); 294 ctrl_dbg(p_slot->ctrl, "Memory_Base_Limit[ RSE ] (W1C)\n");
293 295
294 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MEM_BASE_LIMIT_OFFSET, rse_set); 296 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MEM_BASE_LIMIT_OFFSET, rse_set);
295 } 297 }
296 /* restore MiscII register */ 298 /* restore MiscII register */
297 pci_read_config_dword( p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, &pcix_misc2_temp ); 299 pci_read_config_dword( p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, &pcix_misc2_temp );
298 300
299 if (p_slot->ctrl->pcix_misc2_reg & SERRFATALENABLE_MASK) 301 if (p_slot->ctrl->pcix_misc2_reg & SERRFATALENABLE_MASK)
300 pcix_misc2_temp |= SERRFATALENABLE_MASK; 302 pcix_misc2_temp |= SERRFATALENABLE_MASK;
301 else 303 else
302 pcix_misc2_temp &= ~SERRFATALENABLE_MASK; 304 pcix_misc2_temp &= ~SERRFATALENABLE_MASK;
303 305
304 if (p_slot->ctrl->pcix_misc2_reg & SERRNONFATALENABLE_MASK) 306 if (p_slot->ctrl->pcix_misc2_reg & SERRNONFATALENABLE_MASK)
305 pcix_misc2_temp |= SERRNONFATALENABLE_MASK; 307 pcix_misc2_temp |= SERRNONFATALENABLE_MASK;
306 else 308 else
307 pcix_misc2_temp &= ~SERRNONFATALENABLE_MASK; 309 pcix_misc2_temp &= ~SERRNONFATALENABLE_MASK;
308 310
309 if (p_slot->ctrl->pcix_misc2_reg & PERRFLOODENABLE_MASK) 311 if (p_slot->ctrl->pcix_misc2_reg & PERRFLOODENABLE_MASK)
310 pcix_misc2_temp |= PERRFLOODENABLE_MASK; 312 pcix_misc2_temp |= PERRFLOODENABLE_MASK;
311 else 313 else
312 pcix_misc2_temp &= ~PERRFLOODENABLE_MASK; 314 pcix_misc2_temp &= ~PERRFLOODENABLE_MASK;
313 315
314 if (p_slot->ctrl->pcix_misc2_reg & PERRFATALENABLE_MASK) 316 if (p_slot->ctrl->pcix_misc2_reg & PERRFATALENABLE_MASK)
315 pcix_misc2_temp |= PERRFATALENABLE_MASK; 317 pcix_misc2_temp |= PERRFATALENABLE_MASK;
316 else 318 else
317 pcix_misc2_temp &= ~PERRFATALENABLE_MASK; 319 pcix_misc2_temp &= ~PERRFATALENABLE_MASK;
318 320
319 if (p_slot->ctrl->pcix_misc2_reg & PERRNONFATALENABLE_MASK) 321 if (p_slot->ctrl->pcix_misc2_reg & PERRNONFATALENABLE_MASK)
320 pcix_misc2_temp |= PERRNONFATALENABLE_MASK; 322 pcix_misc2_temp |= PERRNONFATALENABLE_MASK;
321 else 323 else
322 pcix_misc2_temp &= ~PERRNONFATALENABLE_MASK; 324 pcix_misc2_temp &= ~PERRNONFATALENABLE_MASK;
323 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, pcix_misc2_temp); 325 pci_write_config_dword(p_slot->ctrl->pci_dev, PCIX_MISCII_OFFSET, pcix_misc2_temp);
324 } 326 }
325 327
326 struct hpc_ops { 328 struct hpc_ops {
327 int (*power_on_slot)(struct slot *slot); 329 int (*power_on_slot)(struct slot *slot);
328 int (*slot_enable)(struct slot *slot); 330 int (*slot_enable)(struct slot *slot);
329 int (*slot_disable)(struct slot *slot); 331 int (*slot_disable)(struct slot *slot);
330 int (*set_bus_speed_mode)(struct slot *slot, enum pci_bus_speed speed); 332 int (*set_bus_speed_mode)(struct slot *slot, enum pci_bus_speed speed);
331 int (*get_power_status)(struct slot *slot, u8 *status); 333 int (*get_power_status)(struct slot *slot, u8 *status);
332 int (*get_attention_status)(struct slot *slot, u8 *status); 334 int (*get_attention_status)(struct slot *slot, u8 *status);
333 int (*set_attention_status)(struct slot *slot, u8 status); 335 int (*set_attention_status)(struct slot *slot, u8 status);
334 int (*get_latch_status)(struct slot *slot, u8 *status); 336 int (*get_latch_status)(struct slot *slot, u8 *status);
335 int (*get_adapter_status)(struct slot *slot, u8 *status); 337 int (*get_adapter_status)(struct slot *slot, u8 *status);
336 int (*get_adapter_speed)(struct slot *slot, enum pci_bus_speed *speed); 338 int (*get_adapter_speed)(struct slot *slot, enum pci_bus_speed *speed);
337 int (*get_mode1_ECC_cap)(struct slot *slot, u8 *mode); 339 int (*get_mode1_ECC_cap)(struct slot *slot, u8 *mode);
338 int (*get_prog_int)(struct slot *slot, u8 *prog_int); 340 int (*get_prog_int)(struct slot *slot, u8 *prog_int);
339 int (*query_power_fault)(struct slot *slot); 341 int (*query_power_fault)(struct slot *slot);
340 void (*green_led_on)(struct slot *slot); 342 void (*green_led_on)(struct slot *slot);
341 void (*green_led_off)(struct slot *slot); 343 void (*green_led_off)(struct slot *slot);
342 void (*green_led_blink)(struct slot *slot); 344 void (*green_led_blink)(struct slot *slot);
343 void (*release_ctlr)(struct controller *ctrl); 345 void (*release_ctlr)(struct controller *ctrl);
344 int (*check_cmd_status)(struct controller *ctrl); 346 int (*check_cmd_status)(struct controller *ctrl);
345 }; 347 };
346 348
347 #endif /* _SHPCHP_H */ 349 #endif /* _SHPCHP_H */
348 350
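The shpchp.h hunk above pulls in <linux/workqueue.h> directly and declares a second driver-wide queue, shpchp_ordered_wq, next to the existing shpchp_wq; the work items themselves stay embedded in the driver's structures (struct slot carries a delayed_work, struct event_info a work_struct). As background only, the usual embed-and-recover pattern for such work items looks roughly like the minimal sketch below; every name in it is illustrative and not taken from the driver.

	#include <linux/kernel.h>
	#include <linux/errno.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	/* Illustrative driver-private event carrying its own work item,
	 * in the spirit of struct event_info embedding a work_struct. */
	struct demo_event {
		int event_type;
		struct work_struct work;
	};

	static void demo_event_handler(struct work_struct *work)
	{
		/* Recover the containing object from the embedded work item. */
		struct demo_event *ev = container_of(work, struct demo_event, work);

		pr_info("handling event %d\n", ev->event_type);
		kfree(ev);
	}

	static int demo_queue_event(struct workqueue_struct *wq, int type)
	{
		struct demo_event *ev = kmalloc(sizeof(*ev), GFP_ATOMIC);

		if (!ev)
			return -ENOMEM;
		ev->event_type = type;
		INIT_WORK(&ev->work, demo_event_handler);
		/* Queue on the driver-owned workqueue, not the system one. */
		queue_work(wq, &ev->work);
		return 0;
	}
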
drivers/pci/hotplug/shpchp_core.c
1 /* 1 /*
2 * Standard Hot Plug Controller Driver 2 * Standard Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/module.h> 30 #include <linux/module.h>
31 #include <linux/moduleparam.h> 31 #include <linux/moduleparam.h>
32 #include <linux/kernel.h> 32 #include <linux/kernel.h>
33 #include <linux/types.h> 33 #include <linux/types.h>
34 #include <linux/slab.h> 34 #include <linux/slab.h>
35 #include <linux/pci.h> 35 #include <linux/pci.h>
36 #include <linux/workqueue.h>
37 #include "shpchp.h" 36 #include "shpchp.h"
38 37
39 /* Global variables */ 38 /* Global variables */
40 int shpchp_debug; 39 int shpchp_debug;
41 int shpchp_poll_mode; 40 int shpchp_poll_mode;
42 int shpchp_poll_time; 41 int shpchp_poll_time;
43 struct workqueue_struct *shpchp_wq; 42 struct workqueue_struct *shpchp_wq;
43 struct workqueue_struct *shpchp_ordered_wq;
44 44
45 #define DRIVER_VERSION "0.4" 45 #define DRIVER_VERSION "0.4"
46 #define DRIVER_AUTHOR "Dan Zink <dan.zink@compaq.com>, Greg Kroah-Hartman <greg@kroah.com>, Dely Sy <dely.l.sy@intel.com>" 46 #define DRIVER_AUTHOR "Dan Zink <dan.zink@compaq.com>, Greg Kroah-Hartman <greg@kroah.com>, Dely Sy <dely.l.sy@intel.com>"
47 #define DRIVER_DESC "Standard Hot Plug PCI Controller Driver" 47 #define DRIVER_DESC "Standard Hot Plug PCI Controller Driver"
48 48
49 MODULE_AUTHOR(DRIVER_AUTHOR); 49 MODULE_AUTHOR(DRIVER_AUTHOR);
50 MODULE_DESCRIPTION(DRIVER_DESC); 50 MODULE_DESCRIPTION(DRIVER_DESC);
51 MODULE_LICENSE("GPL"); 51 MODULE_LICENSE("GPL");
52 52
53 module_param(shpchp_debug, bool, 0644); 53 module_param(shpchp_debug, bool, 0644);
54 module_param(shpchp_poll_mode, bool, 0644); 54 module_param(shpchp_poll_mode, bool, 0644);
55 module_param(shpchp_poll_time, int, 0644); 55 module_param(shpchp_poll_time, int, 0644);
56 MODULE_PARM_DESC(shpchp_debug, "Debugging mode enabled or not"); 56 MODULE_PARM_DESC(shpchp_debug, "Debugging mode enabled or not");
57 MODULE_PARM_DESC(shpchp_poll_mode, "Using polling mechanism for hot-plug events or not"); 57 MODULE_PARM_DESC(shpchp_poll_mode, "Using polling mechanism for hot-plug events or not");
58 MODULE_PARM_DESC(shpchp_poll_time, "Polling mechanism frequency, in seconds"); 58 MODULE_PARM_DESC(shpchp_poll_time, "Polling mechanism frequency, in seconds");
59 59
60 #define SHPC_MODULE_NAME "shpchp" 60 #define SHPC_MODULE_NAME "shpchp"
61 61
62 static int set_attention_status (struct hotplug_slot *slot, u8 value); 62 static int set_attention_status (struct hotplug_slot *slot, u8 value);
63 static int enable_slot (struct hotplug_slot *slot); 63 static int enable_slot (struct hotplug_slot *slot);
64 static int disable_slot (struct hotplug_slot *slot); 64 static int disable_slot (struct hotplug_slot *slot);
65 static int get_power_status (struct hotplug_slot *slot, u8 *value); 65 static int get_power_status (struct hotplug_slot *slot, u8 *value);
66 static int get_attention_status (struct hotplug_slot *slot, u8 *value); 66 static int get_attention_status (struct hotplug_slot *slot, u8 *value);
67 static int get_latch_status (struct hotplug_slot *slot, u8 *value); 67 static int get_latch_status (struct hotplug_slot *slot, u8 *value);
68 static int get_adapter_status (struct hotplug_slot *slot, u8 *value); 68 static int get_adapter_status (struct hotplug_slot *slot, u8 *value);
69 69
70 static struct hotplug_slot_ops shpchp_hotplug_slot_ops = { 70 static struct hotplug_slot_ops shpchp_hotplug_slot_ops = {
71 .set_attention_status = set_attention_status, 71 .set_attention_status = set_attention_status,
72 .enable_slot = enable_slot, 72 .enable_slot = enable_slot,
73 .disable_slot = disable_slot, 73 .disable_slot = disable_slot,
74 .get_power_status = get_power_status, 74 .get_power_status = get_power_status,
75 .get_attention_status = get_attention_status, 75 .get_attention_status = get_attention_status,
76 .get_latch_status = get_latch_status, 76 .get_latch_status = get_latch_status,
77 .get_adapter_status = get_adapter_status, 77 .get_adapter_status = get_adapter_status,
78 }; 78 };
79 79
80 /** 80 /**
81 * release_slot - free up the memory used by a slot 81 * release_slot - free up the memory used by a slot
82 * @hotplug_slot: slot to free 82 * @hotplug_slot: slot to free
83 */ 83 */
84 static void release_slot(struct hotplug_slot *hotplug_slot) 84 static void release_slot(struct hotplug_slot *hotplug_slot)
85 { 85 {
86 struct slot *slot = hotplug_slot->private; 86 struct slot *slot = hotplug_slot->private;
87 87
88 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 88 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
89 __func__, slot_name(slot)); 89 __func__, slot_name(slot));
90 90
91 kfree(slot->hotplug_slot->info); 91 kfree(slot->hotplug_slot->info);
92 kfree(slot->hotplug_slot); 92 kfree(slot->hotplug_slot);
93 kfree(slot); 93 kfree(slot);
94 } 94 }
95 95
96 static int init_slots(struct controller *ctrl) 96 static int init_slots(struct controller *ctrl)
97 { 97 {
98 struct slot *slot; 98 struct slot *slot;
99 struct hotplug_slot *hotplug_slot; 99 struct hotplug_slot *hotplug_slot;
100 struct hotplug_slot_info *info; 100 struct hotplug_slot_info *info;
101 char name[SLOT_NAME_SIZE]; 101 char name[SLOT_NAME_SIZE];
102 int retval = -ENOMEM; 102 int retval = -ENOMEM;
103 int i; 103 int i;
104 104
105 for (i = 0; i < ctrl->num_slots; i++) { 105 for (i = 0; i < ctrl->num_slots; i++) {
106 slot = kzalloc(sizeof(*slot), GFP_KERNEL); 106 slot = kzalloc(sizeof(*slot), GFP_KERNEL);
107 if (!slot) 107 if (!slot)
108 goto error; 108 goto error;
109 109
110 hotplug_slot = kzalloc(sizeof(*hotplug_slot), GFP_KERNEL); 110 hotplug_slot = kzalloc(sizeof(*hotplug_slot), GFP_KERNEL);
111 if (!hotplug_slot) 111 if (!hotplug_slot)
112 goto error_slot; 112 goto error_slot;
113 slot->hotplug_slot = hotplug_slot; 113 slot->hotplug_slot = hotplug_slot;
114 114
115 info = kzalloc(sizeof(*info), GFP_KERNEL); 115 info = kzalloc(sizeof(*info), GFP_KERNEL);
116 if (!info) 116 if (!info)
117 goto error_hpslot; 117 goto error_hpslot;
118 hotplug_slot->info = info; 118 hotplug_slot->info = info;
119 119
120 slot->hp_slot = i; 120 slot->hp_slot = i;
121 slot->ctrl = ctrl; 121 slot->ctrl = ctrl;
122 slot->bus = ctrl->pci_dev->subordinate->number; 122 slot->bus = ctrl->pci_dev->subordinate->number;
123 slot->device = ctrl->slot_device_offset + i; 123 slot->device = ctrl->slot_device_offset + i;
124 slot->hpc_ops = ctrl->hpc_ops; 124 slot->hpc_ops = ctrl->hpc_ops;
125 slot->number = ctrl->first_slot + (ctrl->slot_num_inc * i); 125 slot->number = ctrl->first_slot + (ctrl->slot_num_inc * i);
126 mutex_init(&slot->lock); 126 mutex_init(&slot->lock);
127 INIT_DELAYED_WORK(&slot->work, shpchp_queue_pushbutton_work); 127 INIT_DELAYED_WORK(&slot->work, shpchp_queue_pushbutton_work);
128 128
129 /* register this slot with the hotplug pci core */ 129 /* register this slot with the hotplug pci core */
130 hotplug_slot->private = slot; 130 hotplug_slot->private = slot;
131 hotplug_slot->release = &release_slot; 131 hotplug_slot->release = &release_slot;
132 snprintf(name, SLOT_NAME_SIZE, "%d", slot->number); 132 snprintf(name, SLOT_NAME_SIZE, "%d", slot->number);
133 hotplug_slot->ops = &shpchp_hotplug_slot_ops; 133 hotplug_slot->ops = &shpchp_hotplug_slot_ops;
134 134
135 ctrl_dbg(ctrl, "Registering domain:bus:dev=%04x:%02x:%02x " 135 ctrl_dbg(ctrl, "Registering domain:bus:dev=%04x:%02x:%02x "
136 "hp_slot=%x sun=%x slot_device_offset=%x\n", 136 "hp_slot=%x sun=%x slot_device_offset=%x\n",
137 pci_domain_nr(ctrl->pci_dev->subordinate), 137 pci_domain_nr(ctrl->pci_dev->subordinate),
138 slot->bus, slot->device, slot->hp_slot, slot->number, 138 slot->bus, slot->device, slot->hp_slot, slot->number,
139 ctrl->slot_device_offset); 139 ctrl->slot_device_offset);
140 retval = pci_hp_register(slot->hotplug_slot, 140 retval = pci_hp_register(slot->hotplug_slot,
141 ctrl->pci_dev->subordinate, slot->device, name); 141 ctrl->pci_dev->subordinate, slot->device, name);
142 if (retval) { 142 if (retval) {
143 ctrl_err(ctrl, "pci_hp_register failed with error %d\n", 143 ctrl_err(ctrl, "pci_hp_register failed with error %d\n",
144 retval); 144 retval);
145 goto error_info; 145 goto error_info;
146 } 146 }
147 147
148 get_power_status(hotplug_slot, &info->power_status); 148 get_power_status(hotplug_slot, &info->power_status);
149 get_attention_status(hotplug_slot, &info->attention_status); 149 get_attention_status(hotplug_slot, &info->attention_status);
150 get_latch_status(hotplug_slot, &info->latch_status); 150 get_latch_status(hotplug_slot, &info->latch_status);
151 get_adapter_status(hotplug_slot, &info->adapter_status); 151 get_adapter_status(hotplug_slot, &info->adapter_status);
152 152
153 list_add(&slot->slot_list, &ctrl->slot_list); 153 list_add(&slot->slot_list, &ctrl->slot_list);
154 } 154 }
155 155
156 return 0; 156 return 0;
157 error_info: 157 error_info:
158 kfree(info); 158 kfree(info);
159 error_hpslot: 159 error_hpslot:
160 kfree(hotplug_slot); 160 kfree(hotplug_slot);
161 error_slot: 161 error_slot:
162 kfree(slot); 162 kfree(slot);
163 error: 163 error:
164 return retval; 164 return retval;
165 } 165 }
166 166
167 void cleanup_slots(struct controller *ctrl) 167 void cleanup_slots(struct controller *ctrl)
168 { 168 {
169 struct list_head *tmp; 169 struct list_head *tmp;
170 struct list_head *next; 170 struct list_head *next;
171 struct slot *slot; 171 struct slot *slot;
172 172
173 list_for_each_safe(tmp, next, &ctrl->slot_list) { 173 list_for_each_safe(tmp, next, &ctrl->slot_list) {
174 slot = list_entry(tmp, struct slot, slot_list); 174 slot = list_entry(tmp, struct slot, slot_list);
175 list_del(&slot->slot_list); 175 list_del(&slot->slot_list);
176 cancel_delayed_work(&slot->work); 176 cancel_delayed_work(&slot->work);
177 flush_scheduled_work();
178 flush_workqueue(shpchp_wq); 177 flush_workqueue(shpchp_wq);
178 flush_workqueue(shpchp_ordered_wq);
179 pci_hp_deregister(slot->hotplug_slot); 179 pci_hp_deregister(slot->hotplug_slot);
180 } 180 }
181 } 181 }
182 182
183 /* 183 /*
184 * set_attention_status - Turns the Amber LED for a slot on, off or blink 184 * set_attention_status - Turns the Amber LED for a slot on, off or blink
185 */ 185 */
186 static int set_attention_status (struct hotplug_slot *hotplug_slot, u8 status) 186 static int set_attention_status (struct hotplug_slot *hotplug_slot, u8 status)
187 { 187 {
188 struct slot *slot = get_slot(hotplug_slot); 188 struct slot *slot = get_slot(hotplug_slot);
189 189
190 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 190 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
191 __func__, slot_name(slot)); 191 __func__, slot_name(slot));
192 192
193 hotplug_slot->info->attention_status = status; 193 hotplug_slot->info->attention_status = status;
194 slot->hpc_ops->set_attention_status(slot, status); 194 slot->hpc_ops->set_attention_status(slot, status);
195 195
196 return 0; 196 return 0;
197 } 197 }
198 198
199 static int enable_slot (struct hotplug_slot *hotplug_slot) 199 static int enable_slot (struct hotplug_slot *hotplug_slot)
200 { 200 {
201 struct slot *slot = get_slot(hotplug_slot); 201 struct slot *slot = get_slot(hotplug_slot);
202 202
203 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 203 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
204 __func__, slot_name(slot)); 204 __func__, slot_name(slot));
205 205
206 return shpchp_sysfs_enable_slot(slot); 206 return shpchp_sysfs_enable_slot(slot);
207 } 207 }
208 208
209 static int disable_slot (struct hotplug_slot *hotplug_slot) 209 static int disable_slot (struct hotplug_slot *hotplug_slot)
210 { 210 {
211 struct slot *slot = get_slot(hotplug_slot); 211 struct slot *slot = get_slot(hotplug_slot);
212 212
213 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 213 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
214 __func__, slot_name(slot)); 214 __func__, slot_name(slot));
215 215
216 return shpchp_sysfs_disable_slot(slot); 216 return shpchp_sysfs_disable_slot(slot);
217 } 217 }
218 218
219 static int get_power_status (struct hotplug_slot *hotplug_slot, u8 *value) 219 static int get_power_status (struct hotplug_slot *hotplug_slot, u8 *value)
220 { 220 {
221 struct slot *slot = get_slot(hotplug_slot); 221 struct slot *slot = get_slot(hotplug_slot);
222 int retval; 222 int retval;
223 223
224 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 224 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
225 __func__, slot_name(slot)); 225 __func__, slot_name(slot));
226 226
227 retval = slot->hpc_ops->get_power_status(slot, value); 227 retval = slot->hpc_ops->get_power_status(slot, value);
228 if (retval < 0) 228 if (retval < 0)
229 *value = hotplug_slot->info->power_status; 229 *value = hotplug_slot->info->power_status;
230 230
231 return 0; 231 return 0;
232 } 232 }
233 233
234 static int get_attention_status (struct hotplug_slot *hotplug_slot, u8 *value) 234 static int get_attention_status (struct hotplug_slot *hotplug_slot, u8 *value)
235 { 235 {
236 struct slot *slot = get_slot(hotplug_slot); 236 struct slot *slot = get_slot(hotplug_slot);
237 int retval; 237 int retval;
238 238
239 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 239 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
240 __func__, slot_name(slot)); 240 __func__, slot_name(slot));
241 241
242 retval = slot->hpc_ops->get_attention_status(slot, value); 242 retval = slot->hpc_ops->get_attention_status(slot, value);
243 if (retval < 0) 243 if (retval < 0)
244 *value = hotplug_slot->info->attention_status; 244 *value = hotplug_slot->info->attention_status;
245 245
246 return 0; 246 return 0;
247 } 247 }
248 248
249 static int get_latch_status (struct hotplug_slot *hotplug_slot, u8 *value) 249 static int get_latch_status (struct hotplug_slot *hotplug_slot, u8 *value)
250 { 250 {
251 struct slot *slot = get_slot(hotplug_slot); 251 struct slot *slot = get_slot(hotplug_slot);
252 int retval; 252 int retval;
253 253
254 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 254 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
255 __func__, slot_name(slot)); 255 __func__, slot_name(slot));
256 256
257 retval = slot->hpc_ops->get_latch_status(slot, value); 257 retval = slot->hpc_ops->get_latch_status(slot, value);
258 if (retval < 0) 258 if (retval < 0)
259 *value = hotplug_slot->info->latch_status; 259 *value = hotplug_slot->info->latch_status;
260 260
261 return 0; 261 return 0;
262 } 262 }
263 263
264 static int get_adapter_status (struct hotplug_slot *hotplug_slot, u8 *value) 264 static int get_adapter_status (struct hotplug_slot *hotplug_slot, u8 *value)
265 { 265 {
266 struct slot *slot = get_slot(hotplug_slot); 266 struct slot *slot = get_slot(hotplug_slot);
267 int retval; 267 int retval;
268 268
269 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n", 269 ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
270 __func__, slot_name(slot)); 270 __func__, slot_name(slot));
271 271
272 retval = slot->hpc_ops->get_adapter_status(slot, value); 272 retval = slot->hpc_ops->get_adapter_status(slot, value);
273 if (retval < 0) 273 if (retval < 0)
274 *value = hotplug_slot->info->adapter_status; 274 *value = hotplug_slot->info->adapter_status;
275 275
276 return 0; 276 return 0;
277 } 277 }
278 278
279 static int is_shpc_capable(struct pci_dev *dev) 279 static int is_shpc_capable(struct pci_dev *dev)
280 { 280 {
281 if ((dev->vendor == PCI_VENDOR_ID_AMD) || (dev->device == 281 if ((dev->vendor == PCI_VENDOR_ID_AMD) || (dev->device ==
282 PCI_DEVICE_ID_AMD_GOLAM_7450)) 282 PCI_DEVICE_ID_AMD_GOLAM_7450))
283 return 1; 283 return 1;
284 if (!pci_find_capability(dev, PCI_CAP_ID_SHPC)) 284 if (!pci_find_capability(dev, PCI_CAP_ID_SHPC))
285 return 0; 285 return 0;
286 if (get_hp_hw_control_from_firmware(dev)) 286 if (get_hp_hw_control_from_firmware(dev))
287 return 0; 287 return 0;
288 return 1; 288 return 1;
289 } 289 }
290 290
291 static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent) 291 static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
292 { 292 {
293 int rc; 293 int rc;
294 struct controller *ctrl; 294 struct controller *ctrl;
295 295
296 if (!is_shpc_capable(pdev)) 296 if (!is_shpc_capable(pdev))
297 return -ENODEV; 297 return -ENODEV;
298 298
299 ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL); 299 ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
300 if (!ctrl) { 300 if (!ctrl) {
301 dev_err(&pdev->dev, "%s: Out of memory\n", __func__); 301 dev_err(&pdev->dev, "%s: Out of memory\n", __func__);
302 goto err_out_none; 302 goto err_out_none;
303 } 303 }
304 INIT_LIST_HEAD(&ctrl->slot_list); 304 INIT_LIST_HEAD(&ctrl->slot_list);
305 305
306 rc = shpc_init(ctrl, pdev); 306 rc = shpc_init(ctrl, pdev);
307 if (rc) { 307 if (rc) {
308 ctrl_dbg(ctrl, "Controller initialization failed\n"); 308 ctrl_dbg(ctrl, "Controller initialization failed\n");
309 goto err_out_free_ctrl; 309 goto err_out_free_ctrl;
310 } 310 }
311 311
312 pci_set_drvdata(pdev, ctrl); 312 pci_set_drvdata(pdev, ctrl);
313 313
314 /* Setup the slot information structures */ 314 /* Setup the slot information structures */
315 rc = init_slots(ctrl); 315 rc = init_slots(ctrl);
316 if (rc) { 316 if (rc) {
317 ctrl_err(ctrl, "Slot initialization failed\n"); 317 ctrl_err(ctrl, "Slot initialization failed\n");
318 goto err_out_release_ctlr; 318 goto err_out_release_ctlr;
319 } 319 }
320 320
321 rc = shpchp_create_ctrl_files(ctrl); 321 rc = shpchp_create_ctrl_files(ctrl);
322 if (rc) 322 if (rc)
323 goto err_cleanup_slots; 323 goto err_cleanup_slots;
324 324
325 return 0; 325 return 0;
326 326
327 err_cleanup_slots: 327 err_cleanup_slots:
328 cleanup_slots(ctrl); 328 cleanup_slots(ctrl);
329 err_out_release_ctlr: 329 err_out_release_ctlr:
330 ctrl->hpc_ops->release_ctlr(ctrl); 330 ctrl->hpc_ops->release_ctlr(ctrl);
331 err_out_free_ctrl: 331 err_out_free_ctrl:
332 kfree(ctrl); 332 kfree(ctrl);
333 err_out_none: 333 err_out_none:
334 return -ENODEV; 334 return -ENODEV;
335 } 335 }
336 336
337 static void shpc_remove(struct pci_dev *dev) 337 static void shpc_remove(struct pci_dev *dev)
338 { 338 {
339 struct controller *ctrl = pci_get_drvdata(dev); 339 struct controller *ctrl = pci_get_drvdata(dev);
340 340
341 shpchp_remove_ctrl_files(ctrl); 341 shpchp_remove_ctrl_files(ctrl);
342 ctrl->hpc_ops->release_ctlr(ctrl); 342 ctrl->hpc_ops->release_ctlr(ctrl);
343 kfree(ctrl); 343 kfree(ctrl);
344 } 344 }
345 345
346 static struct pci_device_id shpcd_pci_tbl[] = { 346 static struct pci_device_id shpcd_pci_tbl[] = {
347 {PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0)}, 347 {PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0)},
348 { /* end: all zeroes */ } 348 { /* end: all zeroes */ }
349 }; 349 };
350 MODULE_DEVICE_TABLE(pci, shpcd_pci_tbl); 350 MODULE_DEVICE_TABLE(pci, shpcd_pci_tbl);
351 351
352 static struct pci_driver shpc_driver = { 352 static struct pci_driver shpc_driver = {
353 .name = SHPC_MODULE_NAME, 353 .name = SHPC_MODULE_NAME,
354 .id_table = shpcd_pci_tbl, 354 .id_table = shpcd_pci_tbl,
355 .probe = shpc_probe, 355 .probe = shpc_probe,
356 .remove = shpc_remove, 356 .remove = shpc_remove,
357 }; 357 };
358 358
359 static int __init shpcd_init(void) 359 static int __init shpcd_init(void)
360 { 360 {
361 int retval = 0; 361 int retval = 0;
362 362
363 shpchp_wq = alloc_ordered_workqueue("shpchp", 0);
364 if (!shpchp_wq)
365 return -ENOMEM;
366
367 shpchp_ordered_wq = alloc_ordered_workqueue("shpchp_ordered", 0);
368 if (!shpchp_ordered_wq) {
369 destroy_workqueue(shpchp_wq);
370 return -ENOMEM;
371 }
372
363 retval = pci_register_driver(&shpc_driver); 373 retval = pci_register_driver(&shpc_driver);
364 dbg("%s: pci_register_driver = %d\n", __func__, retval); 374 dbg("%s: pci_register_driver = %d\n", __func__, retval);
365 info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); 375 info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
376 if (retval) {
377 destroy_workqueue(shpchp_ordered_wq);
378 destroy_workqueue(shpchp_wq);
379 }
366 return retval; 380 return retval;
367 } 381 }
368 382
369 static void __exit shpcd_cleanup(void) 383 static void __exit shpcd_cleanup(void)
370 { 384 {
371 dbg("unload_shpchpd()\n"); 385 dbg("unload_shpchpd()\n");
372 pci_unregister_driver(&shpc_driver); 386 pci_unregister_driver(&shpc_driver);
387 destroy_workqueue(shpchp_ordered_wq);
388 destroy_workqueue(shpchp_wq);
373 info(DRIVER_DESC " version: " DRIVER_VERSION " unloaded\n"); 389 info(DRIVER_DESC " version: " DRIVER_VERSION " unloaded\n");
374 } 390 }
375 391
376 module_init(shpcd_init); 392 module_init(shpcd_init);
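In the shpchp_core.c hunk above, cleanup_slots() stops calling flush_scheduled_work() and instead flushes the two driver-owned queues (the driver's items now live on its own queues, so flushing the system workqueue would not drain them), while shpcd_init() and shpcd_cleanup() take over allocating and destroying those queues around pci_register_driver()/pci_unregister_driver(). A minimal sketch of that lifetime ordering, with hypothetical names standing in for shpchp_wq and shpchp_ordered_wq, is:

	#include <linux/errno.h>
	#include <linux/init.h>
	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *demo_wq;
	static struct workqueue_struct *demo_ordered_wq;

	static int __init demo_init(void)
	{
		/* Create both queues before anything can queue work on them. */
		demo_wq = alloc_ordered_workqueue("demo", 0);
		if (!demo_wq)
			return -ENOMEM;

		demo_ordered_wq = alloc_ordered_workqueue("demo_ordered", 0);
		if (!demo_ordered_wq) {
			destroy_workqueue(demo_wq);
			return -ENOMEM;
		}

		/* A real driver would register itself here and, on failure,
		 * destroy both queues before returning the error. */
		return 0;
	}

	static void __exit demo_exit(void)
	{
		/* Unregister users first (not shown), then tear down the queues. */
		destroy_workqueue(demo_ordered_wq);
		destroy_workqueue(demo_wq);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");
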
drivers/pci/hotplug/shpchp_ctrl.c
1 /* 1 /*
2 * Standard Hot Plug Controller Driver 2 * Standard Hot Plug Controller Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/module.h> 30 #include <linux/module.h>
31 #include <linux/kernel.h> 31 #include <linux/kernel.h>
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/slab.h> 33 #include <linux/slab.h>
34 #include <linux/pci.h> 34 #include <linux/pci.h>
35 #include <linux/workqueue.h>
36 #include "../pci.h" 35 #include "../pci.h"
37 #include "shpchp.h" 36 #include "shpchp.h"
38 37
39 static void interrupt_event_handler(struct work_struct *work); 38 static void interrupt_event_handler(struct work_struct *work);
40 static int shpchp_enable_slot(struct slot *p_slot); 39 static int shpchp_enable_slot(struct slot *p_slot);
41 static int shpchp_disable_slot(struct slot *p_slot); 40 static int shpchp_disable_slot(struct slot *p_slot);
42 41
43 static int queue_interrupt_event(struct slot *p_slot, u32 event_type) 42 static int queue_interrupt_event(struct slot *p_slot, u32 event_type)
44 { 43 {
45 struct event_info *info; 44 struct event_info *info;
46 45
47 info = kmalloc(sizeof(*info), GFP_ATOMIC); 46 info = kmalloc(sizeof(*info), GFP_ATOMIC);
48 if (!info) 47 if (!info)
49 return -ENOMEM; 48 return -ENOMEM;
50 49
51 info->event_type = event_type; 50 info->event_type = event_type;
52 info->p_slot = p_slot; 51 info->p_slot = p_slot;
53 INIT_WORK(&info->work, interrupt_event_handler); 52 INIT_WORK(&info->work, interrupt_event_handler);
54 53
55 schedule_work(&info->work); 54 queue_work(shpchp_wq, &info->work);
56 55
57 return 0; 56 return 0;
58 } 57 }
59 58
60 u8 shpchp_handle_attention_button(u8 hp_slot, struct controller *ctrl) 59 u8 shpchp_handle_attention_button(u8 hp_slot, struct controller *ctrl)
61 { 60 {
62 struct slot *p_slot; 61 struct slot *p_slot;
63 u32 event_type; 62 u32 event_type;
64 63
65 /* Attention Button Change */ 64 /* Attention Button Change */
66 ctrl_dbg(ctrl, "Attention button interrupt received\n"); 65 ctrl_dbg(ctrl, "Attention button interrupt received\n");
67 66
68 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset); 67 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset);
69 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save)); 68 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save));
70 69
71 /* 70 /*
72 * Button pressed - See if need to TAKE ACTION!!! 71 * Button pressed - See if need to TAKE ACTION!!!
73 */ 72 */
74 ctrl_info(ctrl, "Button pressed on Slot(%s)\n", slot_name(p_slot)); 73 ctrl_info(ctrl, "Button pressed on Slot(%s)\n", slot_name(p_slot));
75 event_type = INT_BUTTON_PRESS; 74 event_type = INT_BUTTON_PRESS;
76 75
77 queue_interrupt_event(p_slot, event_type); 76 queue_interrupt_event(p_slot, event_type);
78 77
79 return 0; 78 return 0;
80 79
81 } 80 }
82 81
83 u8 shpchp_handle_switch_change(u8 hp_slot, struct controller *ctrl) 82 u8 shpchp_handle_switch_change(u8 hp_slot, struct controller *ctrl)
84 { 83 {
85 struct slot *p_slot; 84 struct slot *p_slot;
86 u8 getstatus; 85 u8 getstatus;
87 u32 event_type; 86 u32 event_type;
88 87
89 /* Switch Change */ 88 /* Switch Change */
90 ctrl_dbg(ctrl, "Switch interrupt received\n"); 89 ctrl_dbg(ctrl, "Switch interrupt received\n");
91 90
92 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset); 91 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset);
93 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save)); 92 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save));
94 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus); 93 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus);
95 ctrl_dbg(ctrl, "Card present %x Power status %x\n", 94 ctrl_dbg(ctrl, "Card present %x Power status %x\n",
96 p_slot->presence_save, p_slot->pwr_save); 95 p_slot->presence_save, p_slot->pwr_save);
97 96
98 if (getstatus) { 97 if (getstatus) {
99 /* 98 /*
100 * Switch opened 99 * Switch opened
101 */ 100 */
102 ctrl_info(ctrl, "Latch open on Slot(%s)\n", slot_name(p_slot)); 101 ctrl_info(ctrl, "Latch open on Slot(%s)\n", slot_name(p_slot));
103 event_type = INT_SWITCH_OPEN; 102 event_type = INT_SWITCH_OPEN;
104 if (p_slot->pwr_save && p_slot->presence_save) { 103 if (p_slot->pwr_save && p_slot->presence_save) {
105 event_type = INT_POWER_FAULT; 104 event_type = INT_POWER_FAULT;
106 ctrl_err(ctrl, "Surprise Removal of card\n"); 105 ctrl_err(ctrl, "Surprise Removal of card\n");
107 } 106 }
108 } else { 107 } else {
109 /* 108 /*
110 * Switch closed 109 * Switch closed
111 */ 110 */
112 ctrl_info(ctrl, "Latch close on Slot(%s)\n", slot_name(p_slot)); 111 ctrl_info(ctrl, "Latch close on Slot(%s)\n", slot_name(p_slot));
113 event_type = INT_SWITCH_CLOSE; 112 event_type = INT_SWITCH_CLOSE;
114 } 113 }
115 114
116 queue_interrupt_event(p_slot, event_type); 115 queue_interrupt_event(p_slot, event_type);
117 116
118 return 1; 117 return 1;
119 } 118 }
120 119
121 u8 shpchp_handle_presence_change(u8 hp_slot, struct controller *ctrl) 120 u8 shpchp_handle_presence_change(u8 hp_slot, struct controller *ctrl)
122 { 121 {
123 struct slot *p_slot; 122 struct slot *p_slot;
124 u32 event_type; 123 u32 event_type;
125 124
126 /* Presence Change */ 125 /* Presence Change */
127 ctrl_dbg(ctrl, "Presence/Notify input change\n"); 126 ctrl_dbg(ctrl, "Presence/Notify input change\n");
128 127
129 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset); 128 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset);
130 129
131 /* 130 /*
132 * Save the presence state 131 * Save the presence state
133 */ 132 */
134 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save)); 133 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save));
135 if (p_slot->presence_save) { 134 if (p_slot->presence_save) {
136 /* 135 /*
137 * Card Present 136 * Card Present
138 */ 137 */
139 ctrl_info(ctrl, "Card present on Slot(%s)\n", 138 ctrl_info(ctrl, "Card present on Slot(%s)\n",
140 slot_name(p_slot)); 139 slot_name(p_slot));
141 event_type = INT_PRESENCE_ON; 140 event_type = INT_PRESENCE_ON;
142 } else { 141 } else {
143 /* 142 /*
144 * Not Present 143 * Not Present
145 */ 144 */
146 ctrl_info(ctrl, "Card not present on Slot(%s)\n", 145 ctrl_info(ctrl, "Card not present on Slot(%s)\n",
147 slot_name(p_slot)); 146 slot_name(p_slot));
148 event_type = INT_PRESENCE_OFF; 147 event_type = INT_PRESENCE_OFF;
149 } 148 }
150 149
151 queue_interrupt_event(p_slot, event_type); 150 queue_interrupt_event(p_slot, event_type);
152 151
153 return 1; 152 return 1;
154 } 153 }
155 154
156 u8 shpchp_handle_power_fault(u8 hp_slot, struct controller *ctrl) 155 u8 shpchp_handle_power_fault(u8 hp_slot, struct controller *ctrl)
157 { 156 {
158 struct slot *p_slot; 157 struct slot *p_slot;
159 u32 event_type; 158 u32 event_type;
160 159
161 /* Power fault */ 160 /* Power fault */
162 ctrl_dbg(ctrl, "Power fault interrupt received\n"); 161 ctrl_dbg(ctrl, "Power fault interrupt received\n");
163 162
164 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset); 163 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset);
165 164
166 if ( !(p_slot->hpc_ops->query_power_fault(p_slot))) { 165 if ( !(p_slot->hpc_ops->query_power_fault(p_slot))) {
167 /* 166 /*
168 * Power fault Cleared 167 * Power fault Cleared
169 */ 168 */
170 ctrl_info(ctrl, "Power fault cleared on Slot(%s)\n", 169 ctrl_info(ctrl, "Power fault cleared on Slot(%s)\n",
171 slot_name(p_slot)); 170 slot_name(p_slot));
172 p_slot->status = 0x00; 171 p_slot->status = 0x00;
173 event_type = INT_POWER_FAULT_CLEAR; 172 event_type = INT_POWER_FAULT_CLEAR;
174 } else { 173 } else {
175 /* 174 /*
176 * Power fault 175 * Power fault
177 */ 176 */
178 ctrl_info(ctrl, "Power fault on Slot(%s)\n", slot_name(p_slot)); 177 ctrl_info(ctrl, "Power fault on Slot(%s)\n", slot_name(p_slot));
179 event_type = INT_POWER_FAULT; 178 event_type = INT_POWER_FAULT;
180 /* set power fault status for this board */ 179 /* set power fault status for this board */
181 p_slot->status = 0xFF; 180 p_slot->status = 0xFF;
182 ctrl_info(ctrl, "Power fault bit %x set\n", hp_slot); 181 ctrl_info(ctrl, "Power fault bit %x set\n", hp_slot);
183 } 182 }
184 183
185 queue_interrupt_event(p_slot, event_type); 184 queue_interrupt_event(p_slot, event_type);
186 185
187 return 1; 186 return 1;
188 } 187 }
189 188
190 /* The following routines constitute the bulk of the 189 /* The following routines constitute the bulk of the
191 hotplug controller logic 190 hotplug controller logic
192 */ 191 */
193 static int change_bus_speed(struct controller *ctrl, struct slot *p_slot, 192 static int change_bus_speed(struct controller *ctrl, struct slot *p_slot,
194 enum pci_bus_speed speed) 193 enum pci_bus_speed speed)
195 { 194 {
196 int rc = 0; 195 int rc = 0;
197 196
198 ctrl_dbg(ctrl, "Change speed to %d\n", speed); 197 ctrl_dbg(ctrl, "Change speed to %d\n", speed);
199 if ((rc = p_slot->hpc_ops->set_bus_speed_mode(p_slot, speed))) { 198 if ((rc = p_slot->hpc_ops->set_bus_speed_mode(p_slot, speed))) {
200 ctrl_err(ctrl, "%s: Issue of set bus speed mode command " 199 ctrl_err(ctrl, "%s: Issue of set bus speed mode command "
201 "failed\n", __func__); 200 "failed\n", __func__);
202 return WRONG_BUS_FREQUENCY; 201 return WRONG_BUS_FREQUENCY;
203 } 202 }
204 return rc; 203 return rc;
205 } 204 }
206 205
207 static int fix_bus_speed(struct controller *ctrl, struct slot *pslot, 206 static int fix_bus_speed(struct controller *ctrl, struct slot *pslot,
208 u8 flag, enum pci_bus_speed asp, enum pci_bus_speed bsp, 207 u8 flag, enum pci_bus_speed asp, enum pci_bus_speed bsp,
209 enum pci_bus_speed msp) 208 enum pci_bus_speed msp)
210 { 209 {
211 int rc = 0; 210 int rc = 0;
212 211
213 /* 212 /*
214 * If other slots on the same bus are occupied, we cannot 213 * If other slots on the same bus are occupied, we cannot
215 * change the bus speed. 214 * change the bus speed.
216 */ 215 */
217 if (flag) { 216 if (flag) {
218 if (asp < bsp) { 217 if (asp < bsp) {
219 ctrl_err(ctrl, "Speed of bus %x and adapter %x " 218 ctrl_err(ctrl, "Speed of bus %x and adapter %x "
220 "mismatch\n", bsp, asp); 219 "mismatch\n", bsp, asp);
221 rc = WRONG_BUS_FREQUENCY; 220 rc = WRONG_BUS_FREQUENCY;
222 } 221 }
223 return rc; 222 return rc;
224 } 223 }
225 224
226 if (asp < msp) { 225 if (asp < msp) {
227 if (bsp != asp) 226 if (bsp != asp)
228 rc = change_bus_speed(ctrl, pslot, asp); 227 rc = change_bus_speed(ctrl, pslot, asp);
229 } else { 228 } else {
230 if (bsp != msp) 229 if (bsp != msp)
231 rc = change_bus_speed(ctrl, pslot, msp); 230 rc = change_bus_speed(ctrl, pslot, msp);
232 } 231 }
233 return rc; 232 return rc;
234 } 233 }
235 234
236 /** 235 /**
237 * board_added - Called after a board has been added to the system. 236 * board_added - Called after a board has been added to the system.
238 * @p_slot: target &slot 237 * @p_slot: target &slot
239 * 238 *
240 * Turns power on for the board. 239 * Turns power on for the board.
241 * Configures board. 240 * Configures board.
242 */ 241 */
243 static int board_added(struct slot *p_slot) 242 static int board_added(struct slot *p_slot)
244 { 243 {
245 u8 hp_slot; 244 u8 hp_slot;
246 u8 slots_not_empty = 0; 245 u8 slots_not_empty = 0;
247 int rc = 0; 246 int rc = 0;
248 enum pci_bus_speed asp, bsp, msp; 247 enum pci_bus_speed asp, bsp, msp;
249 struct controller *ctrl = p_slot->ctrl; 248 struct controller *ctrl = p_slot->ctrl;
250 struct pci_bus *parent = ctrl->pci_dev->subordinate; 249 struct pci_bus *parent = ctrl->pci_dev->subordinate;
251 250
252 hp_slot = p_slot->device - ctrl->slot_device_offset; 251 hp_slot = p_slot->device - ctrl->slot_device_offset;
253 252
254 ctrl_dbg(ctrl, 253 ctrl_dbg(ctrl,
255 "%s: p_slot->device, slot_offset, hp_slot = %d, %d ,%d\n", 254 "%s: p_slot->device, slot_offset, hp_slot = %d, %d ,%d\n",
256 __func__, p_slot->device, ctrl->slot_device_offset, hp_slot); 255 __func__, p_slot->device, ctrl->slot_device_offset, hp_slot);
257 256
258 /* Power on slot without connecting to bus */ 257 /* Power on slot without connecting to bus */
259 rc = p_slot->hpc_ops->power_on_slot(p_slot); 258 rc = p_slot->hpc_ops->power_on_slot(p_slot);
260 if (rc) { 259 if (rc) {
261 ctrl_err(ctrl, "Failed to power on slot\n"); 260 ctrl_err(ctrl, "Failed to power on slot\n");
262 return -1; 261 return -1;
263 } 262 }
264 263
265 if ((ctrl->pci_dev->vendor == 0x8086) && (ctrl->pci_dev->device == 0x0332)) { 264 if ((ctrl->pci_dev->vendor == 0x8086) && (ctrl->pci_dev->device == 0x0332)) {
266 if (slots_not_empty) 265 if (slots_not_empty)
267 return WRONG_BUS_FREQUENCY; 266 return WRONG_BUS_FREQUENCY;
268 267
269 if ((rc = p_slot->hpc_ops->set_bus_speed_mode(p_slot, PCI_SPEED_33MHz))) { 268 if ((rc = p_slot->hpc_ops->set_bus_speed_mode(p_slot, PCI_SPEED_33MHz))) {
270 ctrl_err(ctrl, "%s: Issue of set bus speed mode command" 269 ctrl_err(ctrl, "%s: Issue of set bus speed mode command"
271 " failed\n", __func__); 270 " failed\n", __func__);
272 return WRONG_BUS_FREQUENCY; 271 return WRONG_BUS_FREQUENCY;
273 } 272 }
274 273
275 /* turn on board, blink green LED, turn off Amber LED */ 274 /* turn on board, blink green LED, turn off Amber LED */
276 if ((rc = p_slot->hpc_ops->slot_enable(p_slot))) { 275 if ((rc = p_slot->hpc_ops->slot_enable(p_slot))) {
277 ctrl_err(ctrl, "Issue of Slot Enable command failed\n"); 276 ctrl_err(ctrl, "Issue of Slot Enable command failed\n");
278 return rc; 277 return rc;
279 } 278 }
280 } 279 }
281 280
282 rc = p_slot->hpc_ops->get_adapter_speed(p_slot, &asp); 281 rc = p_slot->hpc_ops->get_adapter_speed(p_slot, &asp);
283 if (rc) { 282 if (rc) {
284 ctrl_err(ctrl, "Can't get adapter speed or " 283 ctrl_err(ctrl, "Can't get adapter speed or "
285 "bus mode mismatch\n"); 284 "bus mode mismatch\n");
286 return WRONG_BUS_FREQUENCY; 285 return WRONG_BUS_FREQUENCY;
287 } 286 }
288 287
289 bsp = ctrl->pci_dev->bus->cur_bus_speed; 288 bsp = ctrl->pci_dev->bus->cur_bus_speed;
290 msp = ctrl->pci_dev->bus->max_bus_speed; 289 msp = ctrl->pci_dev->bus->max_bus_speed;
291 290
292 /* Check if there are other slots or devices on the same bus */ 291 /* Check if there are other slots or devices on the same bus */
293 if (!list_empty(&ctrl->pci_dev->subordinate->devices)) 292 if (!list_empty(&ctrl->pci_dev->subordinate->devices))
294 slots_not_empty = 1; 293 slots_not_empty = 1;
295 294
296 ctrl_dbg(ctrl, "%s: slots_not_empty %d, adapter_speed %d, bus_speed %d," 295 ctrl_dbg(ctrl, "%s: slots_not_empty %d, adapter_speed %d, bus_speed %d,"
297 " max_bus_speed %d\n", __func__, slots_not_empty, asp, 296 " max_bus_speed %d\n", __func__, slots_not_empty, asp,
298 bsp, msp); 297 bsp, msp);
299 298
300 rc = fix_bus_speed(ctrl, p_slot, slots_not_empty, asp, bsp, msp); 299 rc = fix_bus_speed(ctrl, p_slot, slots_not_empty, asp, bsp, msp);
301 if (rc) 300 if (rc)
302 return rc; 301 return rc;
303 302
304 /* turn on board, blink green LED, turn off Amber LED */ 303 /* turn on board, blink green LED, turn off Amber LED */
305 if ((rc = p_slot->hpc_ops->slot_enable(p_slot))) { 304 if ((rc = p_slot->hpc_ops->slot_enable(p_slot))) {
306 ctrl_err(ctrl, "Issue of Slot Enable command failed\n"); 305 ctrl_err(ctrl, "Issue of Slot Enable command failed\n");
307 return rc; 306 return rc;
308 } 307 }
309 308
310 /* Wait for ~1 second */ 309 /* Wait for ~1 second */
311 msleep(1000); 310 msleep(1000);
312 311
313 ctrl_dbg(ctrl, "%s: slot status = %x\n", __func__, p_slot->status); 312 ctrl_dbg(ctrl, "%s: slot status = %x\n", __func__, p_slot->status);
314 /* Check for a power fault */ 313 /* Check for a power fault */
315 if (p_slot->status == 0xFF) { 314 if (p_slot->status == 0xFF) {
316 /* power fault occurred, but it was benign */ 315 /* power fault occurred, but it was benign */
317 ctrl_dbg(ctrl, "%s: Power fault\n", __func__); 316 ctrl_dbg(ctrl, "%s: Power fault\n", __func__);
318 rc = POWER_FAILURE; 317 rc = POWER_FAILURE;
319 p_slot->status = 0; 318 p_slot->status = 0;
320 goto err_exit; 319 goto err_exit;
321 } 320 }
322 321
323 if (shpchp_configure_device(p_slot)) { 322 if (shpchp_configure_device(p_slot)) {
324 ctrl_err(ctrl, "Cannot add device at %04x:%02x:%02x\n", 323 ctrl_err(ctrl, "Cannot add device at %04x:%02x:%02x\n",
325 pci_domain_nr(parent), p_slot->bus, p_slot->device); 324 pci_domain_nr(parent), p_slot->bus, p_slot->device);
326 goto err_exit; 325 goto err_exit;
327 } 326 }
328 327
329 p_slot->status = 0; 328 p_slot->status = 0;
330 p_slot->is_a_board = 0x01; 329 p_slot->is_a_board = 0x01;
331 p_slot->pwr_save = 1; 330 p_slot->pwr_save = 1;
332 331
333 p_slot->hpc_ops->green_led_on(p_slot); 332 p_slot->hpc_ops->green_led_on(p_slot);
334 333
335 return 0; 334 return 0;
336 335
337 err_exit: 336 err_exit:
338 /* turn off slot, turn on Amber LED, turn off Green LED */ 337 /* turn off slot, turn on Amber LED, turn off Green LED */
339 rc = p_slot->hpc_ops->slot_disable(p_slot); 338 rc = p_slot->hpc_ops->slot_disable(p_slot);
340 if (rc) { 339 if (rc) {
341 ctrl_err(ctrl, "%s: Issue of Slot Disable command failed\n", 340 ctrl_err(ctrl, "%s: Issue of Slot Disable command failed\n",
342 __func__); 341 __func__);
343 return rc; 342 return rc;
344 } 343 }
345 344
346 return(rc); 345 return(rc);
347 } 346 }
348 347
349 348
350 /** 349 /**
351 * remove_board - Turns off slot and LEDs 350 * remove_board - Turns off slot and LEDs
352 * @p_slot: target &slot 351 * @p_slot: target &slot
353 */ 352 */
354 static int remove_board(struct slot *p_slot) 353 static int remove_board(struct slot *p_slot)
355 { 354 {
356 struct controller *ctrl = p_slot->ctrl; 355 struct controller *ctrl = p_slot->ctrl;
357 u8 hp_slot; 356 u8 hp_slot;
358 int rc; 357 int rc;
359 358
360 if (shpchp_unconfigure_device(p_slot)) 359 if (shpchp_unconfigure_device(p_slot))
361 return(1); 360 return(1);
362 361
363 hp_slot = p_slot->device - ctrl->slot_device_offset; 362 hp_slot = p_slot->device - ctrl->slot_device_offset;
364 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset); 363 p_slot = shpchp_find_slot(ctrl, hp_slot + ctrl->slot_device_offset);
365 364
366 ctrl_dbg(ctrl, "%s: hp_slot = %d\n", __func__, hp_slot); 365 ctrl_dbg(ctrl, "%s: hp_slot = %d\n", __func__, hp_slot);
367 366
368 /* Change status to shutdown */ 367 /* Change status to shutdown */
369 if (p_slot->is_a_board) 368 if (p_slot->is_a_board)
370 p_slot->status = 0x01; 369 p_slot->status = 0x01;
371 370
372 /* turn off slot, turn on Amber LED, turn off Green LED */ 371 /* turn off slot, turn on Amber LED, turn off Green LED */
373 rc = p_slot->hpc_ops->slot_disable(p_slot); 372 rc = p_slot->hpc_ops->slot_disable(p_slot);
374 if (rc) { 373 if (rc) {
375 ctrl_err(ctrl, "%s: Issue of Slot Disable command failed\n", 374 ctrl_err(ctrl, "%s: Issue of Slot Disable command failed\n",
376 __func__); 375 __func__);
377 return rc; 376 return rc;
378 } 377 }
379 378
380 rc = p_slot->hpc_ops->set_attention_status(p_slot, 0); 379 rc = p_slot->hpc_ops->set_attention_status(p_slot, 0);
381 if (rc) { 380 if (rc) {
382 ctrl_err(ctrl, "Issue of Set Attention command failed\n"); 381 ctrl_err(ctrl, "Issue of Set Attention command failed\n");
383 return rc; 382 return rc;
384 } 383 }
385 384
386 p_slot->pwr_save = 0; 385 p_slot->pwr_save = 0;
387 p_slot->is_a_board = 0; 386 p_slot->is_a_board = 0;
388 387
389 return 0; 388 return 0;
390 } 389 }
391 390
392 391
393 struct pushbutton_work_info { 392 struct pushbutton_work_info {
394 struct slot *p_slot; 393 struct slot *p_slot;
395 struct work_struct work; 394 struct work_struct work;
396 }; 395 };
397 396
398 /** 397 /**
399 * shpchp_pushbutton_thread - handle pushbutton events 398 * shpchp_pushbutton_thread - handle pushbutton events
400 * @work: &struct work_struct to be handled 399 * @work: &struct work_struct to be handled
401 * 400 *
402 * Scheduled procedure to handle blocking stuff for the pushbuttons. 401 * Scheduled procedure to handle blocking stuff for the pushbuttons.
403 * Handles all pending events and exits. 402 * Handles all pending events and exits.
404 */ 403 */
405 static void shpchp_pushbutton_thread(struct work_struct *work) 404 static void shpchp_pushbutton_thread(struct work_struct *work)
406 { 405 {
407 struct pushbutton_work_info *info = 406 struct pushbutton_work_info *info =
408 container_of(work, struct pushbutton_work_info, work); 407 container_of(work, struct pushbutton_work_info, work);
409 struct slot *p_slot = info->p_slot; 408 struct slot *p_slot = info->p_slot;
410 409
411 mutex_lock(&p_slot->lock); 410 mutex_lock(&p_slot->lock);
412 switch (p_slot->state) { 411 switch (p_slot->state) {
413 case POWEROFF_STATE: 412 case POWEROFF_STATE:
414 mutex_unlock(&p_slot->lock); 413 mutex_unlock(&p_slot->lock);
415 shpchp_disable_slot(p_slot); 414 shpchp_disable_slot(p_slot);
416 mutex_lock(&p_slot->lock); 415 mutex_lock(&p_slot->lock);
417 p_slot->state = STATIC_STATE; 416 p_slot->state = STATIC_STATE;
418 break; 417 break;
419 case POWERON_STATE: 418 case POWERON_STATE:
420 mutex_unlock(&p_slot->lock); 419 mutex_unlock(&p_slot->lock);
421 if (shpchp_enable_slot(p_slot)) 420 if (shpchp_enable_slot(p_slot))
422 p_slot->hpc_ops->green_led_off(p_slot); 421 p_slot->hpc_ops->green_led_off(p_slot);
423 mutex_lock(&p_slot->lock); 422 mutex_lock(&p_slot->lock);
424 p_slot->state = STATIC_STATE; 423 p_slot->state = STATIC_STATE;
425 break; 424 break;
426 default: 425 default:
427 break; 426 break;
428 } 427 }
429 mutex_unlock(&p_slot->lock); 428 mutex_unlock(&p_slot->lock);
430 429
431 kfree(info); 430 kfree(info);
432 } 431 }
433 432
434 void shpchp_queue_pushbutton_work(struct work_struct *work) 433 void shpchp_queue_pushbutton_work(struct work_struct *work)
435 { 434 {
436 struct slot *p_slot = container_of(work, struct slot, work.work); 435 struct slot *p_slot = container_of(work, struct slot, work.work);
437 struct pushbutton_work_info *info; 436 struct pushbutton_work_info *info;
438 437
439 info = kmalloc(sizeof(*info), GFP_KERNEL); 438 info = kmalloc(sizeof(*info), GFP_KERNEL);
440 if (!info) { 439 if (!info) {
441 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n", 440 ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n",
442 __func__); 441 __func__);
443 return; 442 return;
444 } 443 }
445 info->p_slot = p_slot; 444 info->p_slot = p_slot;
446 INIT_WORK(&info->work, shpchp_pushbutton_thread); 445 INIT_WORK(&info->work, shpchp_pushbutton_thread);
447 446
448 mutex_lock(&p_slot->lock); 447 mutex_lock(&p_slot->lock);
449 switch (p_slot->state) { 448 switch (p_slot->state) {
450 case BLINKINGOFF_STATE: 449 case BLINKINGOFF_STATE:
451 p_slot->state = POWEROFF_STATE; 450 p_slot->state = POWEROFF_STATE;
452 break; 451 break;
453 case BLINKINGON_STATE: 452 case BLINKINGON_STATE:
454 p_slot->state = POWERON_STATE; 453 p_slot->state = POWERON_STATE;
455 break; 454 break;
456 default: 455 default:
457 kfree(info); 456 kfree(info);
458 goto out; 457 goto out;
459 } 458 }
460 queue_work(shpchp_wq, &info->work); 459 queue_work(shpchp_ordered_wq, &info->work);
461 out: 460 out:
462 mutex_unlock(&p_slot->lock); 461 mutex_unlock(&p_slot->lock);
463 } 462 }
464 463
465 static int update_slot_info (struct slot *slot) 464 static int update_slot_info (struct slot *slot)
466 { 465 {
467 struct hotplug_slot_info *info; 466 struct hotplug_slot_info *info;
468 int result; 467 int result;
469 468
470 info = kmalloc(sizeof(*info), GFP_KERNEL); 469 info = kmalloc(sizeof(*info), GFP_KERNEL);
471 if (!info) 470 if (!info)
472 return -ENOMEM; 471 return -ENOMEM;
473 472
474 slot->hpc_ops->get_power_status(slot, &(info->power_status)); 473 slot->hpc_ops->get_power_status(slot, &(info->power_status));
475 slot->hpc_ops->get_attention_status(slot, &(info->attention_status)); 474 slot->hpc_ops->get_attention_status(slot, &(info->attention_status));
476 slot->hpc_ops->get_latch_status(slot, &(info->latch_status)); 475 slot->hpc_ops->get_latch_status(slot, &(info->latch_status));
477 slot->hpc_ops->get_adapter_status(slot, &(info->adapter_status)); 476 slot->hpc_ops->get_adapter_status(slot, &(info->adapter_status));
478 477
479 result = pci_hp_change_slot_info(slot->hotplug_slot, info); 478 result = pci_hp_change_slot_info(slot->hotplug_slot, info);
480 kfree (info); 479 kfree (info);
481 return result; 480 return result;
482 } 481 }
483 482
484 /* 483 /*
485 * Note: This function must be called with slot->lock held 484 * Note: This function must be called with slot->lock held
486 */ 485 */
487 static void handle_button_press_event(struct slot *p_slot) 486 static void handle_button_press_event(struct slot *p_slot)
488 { 487 {
489 u8 getstatus; 488 u8 getstatus;
490 struct controller *ctrl = p_slot->ctrl; 489 struct controller *ctrl = p_slot->ctrl;
491 490
492 switch (p_slot->state) { 491 switch (p_slot->state) {
493 case STATIC_STATE: 492 case STATIC_STATE:
494 p_slot->hpc_ops->get_power_status(p_slot, &getstatus); 493 p_slot->hpc_ops->get_power_status(p_slot, &getstatus);
495 if (getstatus) { 494 if (getstatus) {
496 p_slot->state = BLINKINGOFF_STATE; 495 p_slot->state = BLINKINGOFF_STATE;
497 ctrl_info(ctrl, "PCI slot #%s - powering off due to " 496 ctrl_info(ctrl, "PCI slot #%s - powering off due to "
498 "button press.\n", slot_name(p_slot)); 497 "button press.\n", slot_name(p_slot));
499 } else { 498 } else {
500 p_slot->state = BLINKINGON_STATE; 499 p_slot->state = BLINKINGON_STATE;
501 ctrl_info(ctrl, "PCI slot #%s - powering on due to " 500 ctrl_info(ctrl, "PCI slot #%s - powering on due to "
502 "button press.\n", slot_name(p_slot)); 501 "button press.\n", slot_name(p_slot));
503 } 502 }
504 /* blink green LED and turn off amber */ 503 /* blink green LED and turn off amber */
505 p_slot->hpc_ops->green_led_blink(p_slot); 504 p_slot->hpc_ops->green_led_blink(p_slot);
506 p_slot->hpc_ops->set_attention_status(p_slot, 0); 505 p_slot->hpc_ops->set_attention_status(p_slot, 0);
507 506
508 schedule_delayed_work(&p_slot->work, 5*HZ); 507 queue_delayed_work(shpchp_wq, &p_slot->work, 5*HZ);
509 break; 508 break;
510 case BLINKINGOFF_STATE: 509 case BLINKINGOFF_STATE:
511 case BLINKINGON_STATE: 510 case BLINKINGON_STATE:
512 /* 511 /*
513 * Cancel if we are still blinking; this means that we 512 * Cancel if we are still blinking; this means that we
514 * press the attention button again before the 5 sec. limit 513 * press the attention button again before the 5 sec. limit
515 * expires to cancel hot-add or hot-remove 514 * expires to cancel hot-add or hot-remove
516 */ 515 */
517 ctrl_info(ctrl, "Button cancel on Slot(%s)\n", 516 ctrl_info(ctrl, "Button cancel on Slot(%s)\n",
518 slot_name(p_slot)); 517 slot_name(p_slot));
519 cancel_delayed_work(&p_slot->work); 518 cancel_delayed_work(&p_slot->work);
520 if (p_slot->state == BLINKINGOFF_STATE) 519 if (p_slot->state == BLINKINGOFF_STATE)
521 p_slot->hpc_ops->green_led_on(p_slot); 520 p_slot->hpc_ops->green_led_on(p_slot);
522 else 521 else
523 p_slot->hpc_ops->green_led_off(p_slot); 522 p_slot->hpc_ops->green_led_off(p_slot);
524 p_slot->hpc_ops->set_attention_status(p_slot, 0); 523 p_slot->hpc_ops->set_attention_status(p_slot, 0);
525 ctrl_info(ctrl, "PCI slot #%s - action canceled due to " 524 ctrl_info(ctrl, "PCI slot #%s - action canceled due to "
526 "button press\n", slot_name(p_slot)); 525 "button press\n", slot_name(p_slot));
527 p_slot->state = STATIC_STATE; 526 p_slot->state = STATIC_STATE;
528 break; 527 break;
529 case POWEROFF_STATE: 528 case POWEROFF_STATE:
530 case POWERON_STATE: 529 case POWERON_STATE:
531 /* 530 /*
532 * Ignore if the slot is in power-on or power-off state; 531 * Ignore if the slot is in power-on or power-off state;
533 * this means that the previous attention button action 532 * this means that the previous attention button action
534 * to hot-add or hot-remove is still in progress 533 * to hot-add or hot-remove is still in progress
535 */ 534 */
536 ctrl_info(ctrl, "Button ignore on Slot(%s)\n", 535 ctrl_info(ctrl, "Button ignore on Slot(%s)\n",
537 slot_name(p_slot)); 536 slot_name(p_slot));
538 update_slot_info(p_slot); 537 update_slot_info(p_slot);
539 break; 538 break;
540 default: 539 default:
541 ctrl_warn(ctrl, "Not a valid state\n"); 540 ctrl_warn(ctrl, "Not a valid state\n");
542 break; 541 break;
543 } 542 }
544 } 543 }
545 544
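
The state machine above defers the actual power change: the first button press only arms a delayed work item on the driver's workqueue with a 5*HZ grace period, and a second press inside that window cancels it with cancel_delayed_work(). A minimal sketch of that confirm-or-cancel pattern follows; all demo_* names are illustrative and not taken from shpchp.

	#include <linux/kernel.h>
	#include <linux/types.h>
	#include <linux/jiffies.h>
	#include <linux/workqueue.h>
	#include <linux/printk.h>

	struct demo_slot {
		struct workqueue_struct *wq;		/* driver-private workqueue */
		struct delayed_work confirm_work;	/* runs after the grace period */
		bool armed;
	};

	static void demo_confirm(struct work_struct *work)
	{
		struct demo_slot *slot =
			container_of(to_delayed_work(work), struct demo_slot, confirm_work);

		/* the real (possibly blocking) enable/disable would happen here */
		pr_info("demo: grace period expired for slot %p\n", slot);
		slot->armed = false;
	}

	static void demo_slot_init(struct demo_slot *slot, struct workqueue_struct *wq)
	{
		slot->wq = wq;
		slot->armed = false;
		INIT_DELAYED_WORK(&slot->confirm_work, demo_confirm);
	}

	static void demo_button_press(struct demo_slot *slot)
	{
		if (!slot->armed) {
			/* first press: arm the deferred action */
			queue_delayed_work(slot->wq, &slot->confirm_work, 5 * HZ);
			slot->armed = true;
		} else {
			/* second press within 5 seconds: cancel it */
			cancel_delayed_work(&slot->confirm_work);
			slot->armed = false;
		}
	}
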
546 static void interrupt_event_handler(struct work_struct *work) 545 static void interrupt_event_handler(struct work_struct *work)
547 { 546 {
548 struct event_info *info = container_of(work, struct event_info, work); 547 struct event_info *info = container_of(work, struct event_info, work);
549 struct slot *p_slot = info->p_slot; 548 struct slot *p_slot = info->p_slot;
550 549
551 mutex_lock(&p_slot->lock); 550 mutex_lock(&p_slot->lock);
552 switch (info->event_type) { 551 switch (info->event_type) {
553 case INT_BUTTON_PRESS: 552 case INT_BUTTON_PRESS:
554 handle_button_press_event(p_slot); 553 handle_button_press_event(p_slot);
555 break; 554 break;
556 case INT_POWER_FAULT: 555 case INT_POWER_FAULT:
557 ctrl_dbg(p_slot->ctrl, "%s: Power fault\n", __func__); 556 ctrl_dbg(p_slot->ctrl, "%s: Power fault\n", __func__);
558 p_slot->hpc_ops->set_attention_status(p_slot, 1); 557 p_slot->hpc_ops->set_attention_status(p_slot, 1);
559 p_slot->hpc_ops->green_led_off(p_slot); 558 p_slot->hpc_ops->green_led_off(p_slot);
560 break; 559 break;
561 default: 560 default:
562 update_slot_info(p_slot); 561 update_slot_info(p_slot);
563 break; 562 break;
564 } 563 }
565 mutex_unlock(&p_slot->lock); 564 mutex_unlock(&p_slot->lock);
566 565
567 kfree(info); 566 kfree(info);
568 } 567 }
569 568
570 569
571 static int shpchp_enable_slot (struct slot *p_slot) 570 static int shpchp_enable_slot (struct slot *p_slot)
572 { 571 {
573 u8 getstatus = 0; 572 u8 getstatus = 0;
574 int rc, retval = -ENODEV; 573 int rc, retval = -ENODEV;
575 struct controller *ctrl = p_slot->ctrl; 574 struct controller *ctrl = p_slot->ctrl;
576 575
577 /* Check to see if (latch closed, card present, power off) */ 576 /* Check to see if (latch closed, card present, power off) */
578 mutex_lock(&p_slot->ctrl->crit_sect); 577 mutex_lock(&p_slot->ctrl->crit_sect);
579 rc = p_slot->hpc_ops->get_adapter_status(p_slot, &getstatus); 578 rc = p_slot->hpc_ops->get_adapter_status(p_slot, &getstatus);
580 if (rc || !getstatus) { 579 if (rc || !getstatus) {
581 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot)); 580 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot));
582 goto out; 581 goto out;
583 } 582 }
584 rc = p_slot->hpc_ops->get_latch_status(p_slot, &getstatus); 583 rc = p_slot->hpc_ops->get_latch_status(p_slot, &getstatus);
585 if (rc || getstatus) { 584 if (rc || getstatus) {
586 ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot)); 585 ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot));
587 goto out; 586 goto out;
588 } 587 }
589 rc = p_slot->hpc_ops->get_power_status(p_slot, &getstatus); 588 rc = p_slot->hpc_ops->get_power_status(p_slot, &getstatus);
590 if (rc || getstatus) { 589 if (rc || getstatus) {
591 ctrl_info(ctrl, "Already enabled on slot(%s)\n", 590 ctrl_info(ctrl, "Already enabled on slot(%s)\n",
592 slot_name(p_slot)); 591 slot_name(p_slot));
593 goto out; 592 goto out;
594 } 593 }
595 594
596 p_slot->is_a_board = 1; 595 p_slot->is_a_board = 1;
597 596
598 /* We have to save the presence info for these slots */ 597 /* We have to save the presence info for these slots */
599 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save)); 598 p_slot->hpc_ops->get_adapter_status(p_slot, &(p_slot->presence_save));
600 p_slot->hpc_ops->get_power_status(p_slot, &(p_slot->pwr_save)); 599 p_slot->hpc_ops->get_power_status(p_slot, &(p_slot->pwr_save));
601 ctrl_dbg(ctrl, "%s: p_slot->pwr_save %x\n", __func__, p_slot->pwr_save); 600 ctrl_dbg(ctrl, "%s: p_slot->pwr_save %x\n", __func__, p_slot->pwr_save);
602 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus); 601 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus);
603 602
604 if(((p_slot->ctrl->pci_dev->vendor == PCI_VENDOR_ID_AMD) || 603 if(((p_slot->ctrl->pci_dev->vendor == PCI_VENDOR_ID_AMD) ||
605 (p_slot->ctrl->pci_dev->device == PCI_DEVICE_ID_AMD_POGO_7458)) 604 (p_slot->ctrl->pci_dev->device == PCI_DEVICE_ID_AMD_POGO_7458))
606 && p_slot->ctrl->num_slots == 1) { 605 && p_slot->ctrl->num_slots == 1) {
607 /* handle amd pogo errata; this must be done before enable */ 606 /* handle amd pogo errata; this must be done before enable */
608 amd_pogo_errata_save_misc_reg(p_slot); 607 amd_pogo_errata_save_misc_reg(p_slot);
609 retval = board_added(p_slot); 608 retval = board_added(p_slot);
610 /* handle amd pogo errata; this must be done after enable */ 609 /* handle amd pogo errata; this must be done after enable */
611 amd_pogo_errata_restore_misc_reg(p_slot); 610 amd_pogo_errata_restore_misc_reg(p_slot);
612 } else 611 } else
613 retval = board_added(p_slot); 612 retval = board_added(p_slot);
614 613
615 if (retval) { 614 if (retval) {
616 p_slot->hpc_ops->get_adapter_status(p_slot, 615 p_slot->hpc_ops->get_adapter_status(p_slot,
617 &(p_slot->presence_save)); 616 &(p_slot->presence_save));
618 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus); 617 p_slot->hpc_ops->get_latch_status(p_slot, &getstatus);
619 } 618 }
620 619
621 update_slot_info(p_slot); 620 update_slot_info(p_slot);
622 out: 621 out:
623 mutex_unlock(&p_slot->ctrl->crit_sect); 622 mutex_unlock(&p_slot->ctrl->crit_sect);
624 return retval; 623 return retval;
625 } 624 }
626 625
627 626
628 static int shpchp_disable_slot (struct slot *p_slot) 627 static int shpchp_disable_slot (struct slot *p_slot)
629 { 628 {
630 u8 getstatus = 0; 629 u8 getstatus = 0;
631 int rc, retval = -ENODEV; 630 int rc, retval = -ENODEV;
632 struct controller *ctrl = p_slot->ctrl; 631 struct controller *ctrl = p_slot->ctrl;
633 632
634 if (!p_slot->ctrl) 633 if (!p_slot->ctrl)
635 return -ENODEV; 634 return -ENODEV;
636 635
637 /* Check to see if (latch closed, card present, power on) */ 636 /* Check to see if (latch closed, card present, power on) */
638 mutex_lock(&p_slot->ctrl->crit_sect); 637 mutex_lock(&p_slot->ctrl->crit_sect);
639 638
640 rc = p_slot->hpc_ops->get_adapter_status(p_slot, &getstatus); 639 rc = p_slot->hpc_ops->get_adapter_status(p_slot, &getstatus);
641 if (rc || !getstatus) { 640 if (rc || !getstatus) {
642 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot)); 641 ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot));
643 goto out; 642 goto out;
644 } 643 }
645 rc = p_slot->hpc_ops->get_latch_status(p_slot, &getstatus); 644 rc = p_slot->hpc_ops->get_latch_status(p_slot, &getstatus);
646 if (rc || getstatus) { 645 if (rc || getstatus) {
647 ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot)); 646 ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot));
648 goto out; 647 goto out;
649 } 648 }
650 rc = p_slot->hpc_ops->get_power_status(p_slot, &getstatus); 649 rc = p_slot->hpc_ops->get_power_status(p_slot, &getstatus);
651 if (rc || !getstatus) { 650 if (rc || !getstatus) {
652 ctrl_info(ctrl, "Already disabled on slot(%s)\n", 651 ctrl_info(ctrl, "Already disabled on slot(%s)\n",
653 slot_name(p_slot)); 652 slot_name(p_slot));
654 goto out; 653 goto out;
655 } 654 }
656 655
657 retval = remove_board(p_slot); 656 retval = remove_board(p_slot);
658 update_slot_info(p_slot); 657 update_slot_info(p_slot);
659 out: 658 out:
660 mutex_unlock(&p_slot->ctrl->crit_sect); 659 mutex_unlock(&p_slot->ctrl->crit_sect);
661 return retval; 660 return retval;
662 } 661 }
663 662
664 int shpchp_sysfs_enable_slot(struct slot *p_slot) 663 int shpchp_sysfs_enable_slot(struct slot *p_slot)
665 { 664 {
666 int retval = -ENODEV; 665 int retval = -ENODEV;
667 struct controller *ctrl = p_slot->ctrl; 666 struct controller *ctrl = p_slot->ctrl;
668 667
669 mutex_lock(&p_slot->lock); 668 mutex_lock(&p_slot->lock);
670 switch (p_slot->state) { 669 switch (p_slot->state) {
671 case BLINKINGON_STATE: 670 case BLINKINGON_STATE:
672 cancel_delayed_work(&p_slot->work); 671 cancel_delayed_work(&p_slot->work);
673 case STATIC_STATE: 672 case STATIC_STATE:
674 p_slot->state = POWERON_STATE; 673 p_slot->state = POWERON_STATE;
675 mutex_unlock(&p_slot->lock); 674 mutex_unlock(&p_slot->lock);
676 retval = shpchp_enable_slot(p_slot); 675 retval = shpchp_enable_slot(p_slot);
677 mutex_lock(&p_slot->lock); 676 mutex_lock(&p_slot->lock);
678 p_slot->state = STATIC_STATE; 677 p_slot->state = STATIC_STATE;
679 break; 678 break;
680 case POWERON_STATE: 679 case POWERON_STATE:
681 ctrl_info(ctrl, "Slot %s is already in powering on state\n", 680 ctrl_info(ctrl, "Slot %s is already in powering on state\n",
682 slot_name(p_slot)); 681 slot_name(p_slot));
683 break; 682 break;
684 case BLINKINGOFF_STATE: 683 case BLINKINGOFF_STATE:
685 case POWEROFF_STATE: 684 case POWEROFF_STATE:
686 ctrl_info(ctrl, "Already enabled on slot %s\n", 685 ctrl_info(ctrl, "Already enabled on slot %s\n",
687 slot_name(p_slot)); 686 slot_name(p_slot));
688 break; 687 break;
689 default: 688 default:
690 ctrl_err(ctrl, "Not a valid state on slot %s\n", 689 ctrl_err(ctrl, "Not a valid state on slot %s\n",
691 slot_name(p_slot)); 690 slot_name(p_slot));
692 break; 691 break;
693 } 692 }
694 mutex_unlock(&p_slot->lock); 693 mutex_unlock(&p_slot->lock);
695 694
696 return retval; 695 return retval;
697 } 696 }
698 697
699 int shpchp_sysfs_disable_slot(struct slot *p_slot) 698 int shpchp_sysfs_disable_slot(struct slot *p_slot)
700 { 699 {
701 int retval = -ENODEV; 700 int retval = -ENODEV;
702 struct controller *ctrl = p_slot->ctrl; 701 struct controller *ctrl = p_slot->ctrl;
703 702
704 mutex_lock(&p_slot->lock); 703 mutex_lock(&p_slot->lock);
705 switch (p_slot->state) { 704 switch (p_slot->state) {
706 case BLINKINGOFF_STATE: 705 case BLINKINGOFF_STATE:
707 cancel_delayed_work(&p_slot->work); 706 cancel_delayed_work(&p_slot->work);
708 case STATIC_STATE: 707 case STATIC_STATE:
709 p_slot->state = POWEROFF_STATE; 708 p_slot->state = POWEROFF_STATE;
710 mutex_unlock(&p_slot->lock); 709 mutex_unlock(&p_slot->lock);
711 retval = shpchp_disable_slot(p_slot); 710 retval = shpchp_disable_slot(p_slot);
712 mutex_lock(&p_slot->lock); 711 mutex_lock(&p_slot->lock);
713 p_slot->state = STATIC_STATE; 712 p_slot->state = STATIC_STATE;
714 break; 713 break;
715 case POWEROFF_STATE: 714 case POWEROFF_STATE:
716 ctrl_info(ctrl, "Slot %s is already in powering off state\n", 715 ctrl_info(ctrl, "Slot %s is already in powering off state\n",
717 slot_name(p_slot)); 716 slot_name(p_slot));
718 break; 717 break;
719 case BLINKINGON_STATE: 718 case BLINKINGON_STATE:
720 case POWERON_STATE: 719 case POWERON_STATE:
721 ctrl_info(ctrl, "Already disabled on slot %s\n", 720 ctrl_info(ctrl, "Already disabled on slot %s\n",
722 slot_name(p_slot)); 721 slot_name(p_slot));
723 break; 722 break;
724 default: 723 default:
725 ctrl_err(ctrl, "Not a valid state on slot %s\n", 724 ctrl_err(ctrl, "Not a valid state on slot %s\n",
726 slot_name(p_slot)); 725 slot_name(p_slot));
727 break; 726 break;
728 } 727 }
729 mutex_unlock(&p_slot->lock); 728 mutex_unlock(&p_slot->lock);
730 729
731 return retval; 730 return retval;
732 } 731 }
733 732
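
Two hunks above move work off the system workqueue: the pushbutton work item is now queued with queue_work(shpchp_ordered_wq, ...) instead of queue_work(shpchp_wq, ...), and the delayed button work uses queue_delayed_work(shpchp_wq, ...) instead of schedule_delayed_work(). The creation of those queues is not part of this hunk; the following is only a hedged sketch of how such driver-private queues are typically allocated and torn down, with the names and flags being assumptions rather than shpchp's actual code.

	#include <linux/errno.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *demo_wq;		/* regular workqueue */
	static struct workqueue_struct *demo_ordered_wq;	/* executes items one at a time */

	static int demo_create_queues(void)
	{
		demo_wq = alloc_workqueue("demo", 0, 0);
		if (!demo_wq)
			return -ENOMEM;

		demo_ordered_wq = alloc_ordered_workqueue("demo_ordered", 0);
		if (!demo_ordered_wq) {
			destroy_workqueue(demo_wq);
			return -ENOMEM;
		}
		return 0;
	}

	static void demo_destroy_queues(void)
	{
		destroy_workqueue(demo_ordered_wq);
		destroy_workqueue(demo_wq);
	}
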
drivers/pci/hotplug/shpchp_hpc.c
1 /* 1 /*
2 * Standard PCI Hot Plug Driver 2 * Standard PCI Hot Plug Driver
3 * 3 *
4 * Copyright (C) 1995,2001 Compaq Computer Corporation 4 * Copyright (C) 1995,2001 Compaq Computer Corporation
5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com) 5 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
6 * Copyright (C) 2001 IBM Corp. 6 * Copyright (C) 2001 IBM Corp.
7 * Copyright (C) 2003-2004 Intel Corporation 7 * Copyright (C) 2003-2004 Intel Corporation
8 * 8 *
9 * All rights reserved. 9 * All rights reserved.
10 * 10 *
11 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
12 * it under the terms of the GNU General Public License as published by 12 * it under the terms of the GNU General Public License as published by
13 * the Free Software Foundation; either version 2 of the License, or (at 13 * the Free Software Foundation; either version 2 of the License, or (at
14 * your option) any later version. 14 * your option) any later version.
15 * 15 *
16 * This program is distributed in the hope that it will be useful, but 16 * This program is distributed in the hope that it will be useful, but
17 * WITHOUT ANY WARRANTY; without even the implied warranty of 17 * WITHOUT ANY WARRANTY; without even the implied warranty of
18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or 18 * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
19 * NON INFRINGEMENT. See the GNU General Public License for more 19 * NON INFRINGEMENT. See the GNU General Public License for more
20 * details. 20 * details.
21 * 21 *
22 * You should have received a copy of the GNU General Public License 22 * You should have received a copy of the GNU General Public License
23 * along with this program; if not, write to the Free Software 23 * along with this program; if not, write to the Free Software
24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 24 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
25 * 25 *
26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com> 26 * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com>
27 * 27 *
28 */ 28 */
29 29
30 #include <linux/kernel.h> 30 #include <linux/kernel.h>
31 #include <linux/module.h> 31 #include <linux/module.h>
32 #include <linux/types.h> 32 #include <linux/types.h>
33 #include <linux/pci.h> 33 #include <linux/pci.h>
34 #include <linux/interrupt.h> 34 #include <linux/interrupt.h>
35 35
36 #include "shpchp.h" 36 #include "shpchp.h"
37 37
38 /* Slot Available Register I field definition */ 38 /* Slot Available Register I field definition */
39 #define SLOT_33MHZ 0x0000001f 39 #define SLOT_33MHZ 0x0000001f
40 #define SLOT_66MHZ_PCIX 0x00001f00 40 #define SLOT_66MHZ_PCIX 0x00001f00
41 #define SLOT_100MHZ_PCIX 0x001f0000 41 #define SLOT_100MHZ_PCIX 0x001f0000
42 #define SLOT_133MHZ_PCIX 0x1f000000 42 #define SLOT_133MHZ_PCIX 0x1f000000
43 43
44 /* Slot Available Register II field definition */ 44 /* Slot Available Register II field definition */
45 #define SLOT_66MHZ 0x0000001f 45 #define SLOT_66MHZ 0x0000001f
46 #define SLOT_66MHZ_PCIX_266 0x00000f00 46 #define SLOT_66MHZ_PCIX_266 0x00000f00
47 #define SLOT_100MHZ_PCIX_266 0x0000f000 47 #define SLOT_100MHZ_PCIX_266 0x0000f000
48 #define SLOT_133MHZ_PCIX_266 0x000f0000 48 #define SLOT_133MHZ_PCIX_266 0x000f0000
49 #define SLOT_66MHZ_PCIX_533 0x00f00000 49 #define SLOT_66MHZ_PCIX_533 0x00f00000
50 #define SLOT_100MHZ_PCIX_533 0x0f000000 50 #define SLOT_100MHZ_PCIX_533 0x0f000000
51 #define SLOT_133MHZ_PCIX_533 0xf0000000 51 #define SLOT_133MHZ_PCIX_533 0xf0000000
52 52
53 /* Slot Configuration */ 53 /* Slot Configuration */
54 #define SLOT_NUM 0x0000001F 54 #define SLOT_NUM 0x0000001F
55 #define FIRST_DEV_NUM 0x00001F00 55 #define FIRST_DEV_NUM 0x00001F00
56 #define PSN 0x07FF0000 56 #define PSN 0x07FF0000
57 #define UPDOWN 0x20000000 57 #define UPDOWN 0x20000000
58 #define MRLSENSOR 0x40000000 58 #define MRLSENSOR 0x40000000
59 #define ATTN_BUTTON 0x80000000 59 #define ATTN_BUTTON 0x80000000
60 60
61 /* 61 /*
62 * Interrupt Locator Register definitions 62 * Interrupt Locator Register definitions
63 */ 63 */
64 #define CMD_INTR_PENDING (1 << 0) 64 #define CMD_INTR_PENDING (1 << 0)
65 #define SLOT_INTR_PENDING(i) (1 << (i + 1)) 65 #define SLOT_INTR_PENDING(i) (1 << (i + 1))
66 66
67 /* 67 /*
68 * Controller SERR-INT Register 68 * Controller SERR-INT Register
69 */ 69 */
70 #define GLOBAL_INTR_MASK (1 << 0) 70 #define GLOBAL_INTR_MASK (1 << 0)
71 #define GLOBAL_SERR_MASK (1 << 1) 71 #define GLOBAL_SERR_MASK (1 << 1)
72 #define COMMAND_INTR_MASK (1 << 2) 72 #define COMMAND_INTR_MASK (1 << 2)
73 #define ARBITER_SERR_MASK (1 << 3) 73 #define ARBITER_SERR_MASK (1 << 3)
74 #define COMMAND_DETECTED (1 << 16) 74 #define COMMAND_DETECTED (1 << 16)
75 #define ARBITER_DETECTED (1 << 17) 75 #define ARBITER_DETECTED (1 << 17)
76 #define SERR_INTR_RSVDZ_MASK 0xfffc0000 76 #define SERR_INTR_RSVDZ_MASK 0xfffc0000
77 77
78 /* 78 /*
79 * Logical Slot Register definitions 79 * Logical Slot Register definitions
80 */ 80 */
81 #define SLOT_REG(i) (SLOT1 + (4 * i)) 81 #define SLOT_REG(i) (SLOT1 + (4 * i))
82 82
83 #define SLOT_STATE_SHIFT (0) 83 #define SLOT_STATE_SHIFT (0)
84 #define SLOT_STATE_MASK (3 << 0) 84 #define SLOT_STATE_MASK (3 << 0)
85 #define SLOT_STATE_PWRONLY (1) 85 #define SLOT_STATE_PWRONLY (1)
86 #define SLOT_STATE_ENABLED (2) 86 #define SLOT_STATE_ENABLED (2)
87 #define SLOT_STATE_DISABLED (3) 87 #define SLOT_STATE_DISABLED (3)
88 #define PWR_LED_STATE_SHIFT (2) 88 #define PWR_LED_STATE_SHIFT (2)
89 #define PWR_LED_STATE_MASK (3 << 2) 89 #define PWR_LED_STATE_MASK (3 << 2)
90 #define ATN_LED_STATE_SHIFT (4) 90 #define ATN_LED_STATE_SHIFT (4)
91 #define ATN_LED_STATE_MASK (3 << 4) 91 #define ATN_LED_STATE_MASK (3 << 4)
92 #define ATN_LED_STATE_ON (1) 92 #define ATN_LED_STATE_ON (1)
93 #define ATN_LED_STATE_BLINK (2) 93 #define ATN_LED_STATE_BLINK (2)
94 #define ATN_LED_STATE_OFF (3) 94 #define ATN_LED_STATE_OFF (3)
95 #define POWER_FAULT (1 << 6) 95 #define POWER_FAULT (1 << 6)
96 #define ATN_BUTTON (1 << 7) 96 #define ATN_BUTTON (1 << 7)
97 #define MRL_SENSOR (1 << 8) 97 #define MRL_SENSOR (1 << 8)
98 #define MHZ66_CAP (1 << 9) 98 #define MHZ66_CAP (1 << 9)
99 #define PRSNT_SHIFT (10) 99 #define PRSNT_SHIFT (10)
100 #define PRSNT_MASK (3 << 10) 100 #define PRSNT_MASK (3 << 10)
101 #define PCIX_CAP_SHIFT (12) 101 #define PCIX_CAP_SHIFT (12)
102 #define PCIX_CAP_MASK_PI1 (3 << 12) 102 #define PCIX_CAP_MASK_PI1 (3 << 12)
103 #define PCIX_CAP_MASK_PI2 (7 << 12) 103 #define PCIX_CAP_MASK_PI2 (7 << 12)
104 #define PRSNT_CHANGE_DETECTED (1 << 16) 104 #define PRSNT_CHANGE_DETECTED (1 << 16)
105 #define ISO_PFAULT_DETECTED (1 << 17) 105 #define ISO_PFAULT_DETECTED (1 << 17)
106 #define BUTTON_PRESS_DETECTED (1 << 18) 106 #define BUTTON_PRESS_DETECTED (1 << 18)
107 #define MRL_CHANGE_DETECTED (1 << 19) 107 #define MRL_CHANGE_DETECTED (1 << 19)
108 #define CON_PFAULT_DETECTED (1 << 20) 108 #define CON_PFAULT_DETECTED (1 << 20)
109 #define PRSNT_CHANGE_INTR_MASK (1 << 24) 109 #define PRSNT_CHANGE_INTR_MASK (1 << 24)
110 #define ISO_PFAULT_INTR_MASK (1 << 25) 110 #define ISO_PFAULT_INTR_MASK (1 << 25)
111 #define BUTTON_PRESS_INTR_MASK (1 << 26) 111 #define BUTTON_PRESS_INTR_MASK (1 << 26)
112 #define MRL_CHANGE_INTR_MASK (1 << 27) 112 #define MRL_CHANGE_INTR_MASK (1 << 27)
113 #define CON_PFAULT_INTR_MASK (1 << 28) 113 #define CON_PFAULT_INTR_MASK (1 << 28)
114 #define MRL_CHANGE_SERR_MASK (1 << 29) 114 #define MRL_CHANGE_SERR_MASK (1 << 29)
115 #define CON_PFAULT_SERR_MASK (1 << 30) 115 #define CON_PFAULT_SERR_MASK (1 << 30)
116 #define SLOT_REG_RSVDZ_MASK ((1 << 15) | (7 << 21)) 116 #define SLOT_REG_RSVDZ_MASK ((1 << 15) | (7 << 21))
117 117
118 /* 118 /*
119 * SHPC Command Code definitions 119 * SHPC Command Code definitions
120 * 120 *
121 * Slot Operation 00h - 3Fh 121 * Slot Operation 00h - 3Fh
122 * Set Bus Segment Speed/Mode A 40h - 47h 122 * Set Bus Segment Speed/Mode A 40h - 47h
123 * Power-Only All Slots 48h 123 * Power-Only All Slots 48h
124 * Enable All Slots 49h 124 * Enable All Slots 49h
125 * Set Bus Segment Speed/Mode B (PI=2) 50h - 5Fh 125 * Set Bus Segment Speed/Mode B (PI=2) 50h - 5Fh
126 * Reserved Command Codes 60h - BFh 126 * Reserved Command Codes 60h - BFh
127 * Vendor Specific Commands C0h - FFh 127 * Vendor Specific Commands C0h - FFh
128 */ 128 */
129 #define SET_SLOT_PWR 0x01 /* Slot Operation */ 129 #define SET_SLOT_PWR 0x01 /* Slot Operation */
130 #define SET_SLOT_ENABLE 0x02 130 #define SET_SLOT_ENABLE 0x02
131 #define SET_SLOT_DISABLE 0x03 131 #define SET_SLOT_DISABLE 0x03
132 #define SET_PWR_ON 0x04 132 #define SET_PWR_ON 0x04
133 #define SET_PWR_BLINK 0x08 133 #define SET_PWR_BLINK 0x08
134 #define SET_PWR_OFF 0x0c 134 #define SET_PWR_OFF 0x0c
135 #define SET_ATTN_ON 0x10 135 #define SET_ATTN_ON 0x10
136 #define SET_ATTN_BLINK 0x20 136 #define SET_ATTN_BLINK 0x20
137 #define SET_ATTN_OFF 0x30 137 #define SET_ATTN_OFF 0x30
138 #define SETA_PCI_33MHZ 0x40 /* Set Bus Segment Speed/Mode A */ 138 #define SETA_PCI_33MHZ 0x40 /* Set Bus Segment Speed/Mode A */
139 #define SETA_PCI_66MHZ 0x41 139 #define SETA_PCI_66MHZ 0x41
140 #define SETA_PCIX_66MHZ 0x42 140 #define SETA_PCIX_66MHZ 0x42
141 #define SETA_PCIX_100MHZ 0x43 141 #define SETA_PCIX_100MHZ 0x43
142 #define SETA_PCIX_133MHZ 0x44 142 #define SETA_PCIX_133MHZ 0x44
143 #define SETA_RESERVED1 0x45 143 #define SETA_RESERVED1 0x45
144 #define SETA_RESERVED2 0x46 144 #define SETA_RESERVED2 0x46
145 #define SETA_RESERVED3 0x47 145 #define SETA_RESERVED3 0x47
146 #define SET_PWR_ONLY_ALL 0x48 /* Power-Only All Slots */ 146 #define SET_PWR_ONLY_ALL 0x48 /* Power-Only All Slots */
147 #define SET_ENABLE_ALL 0x49 /* Enable All Slots */ 147 #define SET_ENABLE_ALL 0x49 /* Enable All Slots */
148 #define SETB_PCI_33MHZ 0x50 /* Set Bus Segment Speed/Mode B */ 148 #define SETB_PCI_33MHZ 0x50 /* Set Bus Segment Speed/Mode B */
149 #define SETB_PCI_66MHZ 0x51 149 #define SETB_PCI_66MHZ 0x51
150 #define SETB_PCIX_66MHZ_PM 0x52 150 #define SETB_PCIX_66MHZ_PM 0x52
151 #define SETB_PCIX_100MHZ_PM 0x53 151 #define SETB_PCIX_100MHZ_PM 0x53
152 #define SETB_PCIX_133MHZ_PM 0x54 152 #define SETB_PCIX_133MHZ_PM 0x54
153 #define SETB_PCIX_66MHZ_EM 0x55 153 #define SETB_PCIX_66MHZ_EM 0x55
154 #define SETB_PCIX_100MHZ_EM 0x56 154 #define SETB_PCIX_100MHZ_EM 0x56
155 #define SETB_PCIX_133MHZ_EM 0x57 155 #define SETB_PCIX_133MHZ_EM 0x57
156 #define SETB_PCIX_66MHZ_266 0x58 156 #define SETB_PCIX_66MHZ_266 0x58
157 #define SETB_PCIX_100MHZ_266 0x59 157 #define SETB_PCIX_100MHZ_266 0x59
158 #define SETB_PCIX_133MHZ_266 0x5a 158 #define SETB_PCIX_133MHZ_266 0x5a
159 #define SETB_PCIX_66MHZ_533 0x5b 159 #define SETB_PCIX_66MHZ_533 0x5b
160 #define SETB_PCIX_100MHZ_533 0x5c 160 #define SETB_PCIX_100MHZ_533 0x5c
161 #define SETB_PCIX_133MHZ_533 0x5d 161 #define SETB_PCIX_133MHZ_533 0x5d
162 #define SETB_RESERVED1 0x5e 162 #define SETB_RESERVED1 0x5e
163 #define SETB_RESERVED2 0x5f 163 #define SETB_RESERVED2 0x5f
164 164
165 /* 165 /*
166 * SHPC controller command error code 166 * SHPC controller command error code
167 */ 167 */
168 #define SWITCH_OPEN 0x1 168 #define SWITCH_OPEN 0x1
169 #define INVALID_CMD 0x2 169 #define INVALID_CMD 0x2
170 #define INVALID_SPEED_MODE 0x4 170 #define INVALID_SPEED_MODE 0x4
171 171
172 /* 172 /*
173 * For accessing SHPC Working Register Set via PCI Configuration Space 173 * For accessing SHPC Working Register Set via PCI Configuration Space
174 */ 174 */
175 #define DWORD_SELECT 0x2 175 #define DWORD_SELECT 0x2
176 #define DWORD_DATA 0x4 176 #define DWORD_DATA 0x4
177 177
178 /* Field Offset in Logical Slot Register - byte boundary */ 178 /* Field Offset in Logical Slot Register - byte boundary */
179 #define SLOT_EVENT_LATCH 0x2 179 #define SLOT_EVENT_LATCH 0x2
180 #define SLOT_SERR_INT_MASK 0x3 180 #define SLOT_SERR_INT_MASK 0x3
181 181
182 static atomic_t shpchp_num_controllers = ATOMIC_INIT(0);
183
184 static irqreturn_t shpc_isr(int irq, void *dev_id); 182 static irqreturn_t shpc_isr(int irq, void *dev_id);
185 static void start_int_poll_timer(struct controller *ctrl, int sec); 183 static void start_int_poll_timer(struct controller *ctrl, int sec);
186 static int hpc_check_cmd_status(struct controller *ctrl); 184 static int hpc_check_cmd_status(struct controller *ctrl);
187 185
188 static inline u8 shpc_readb(struct controller *ctrl, int reg) 186 static inline u8 shpc_readb(struct controller *ctrl, int reg)
189 { 187 {
190 return readb(ctrl->creg + reg); 188 return readb(ctrl->creg + reg);
191 } 189 }
192 190
193 static inline void shpc_writeb(struct controller *ctrl, int reg, u8 val) 191 static inline void shpc_writeb(struct controller *ctrl, int reg, u8 val)
194 { 192 {
195 writeb(val, ctrl->creg + reg); 193 writeb(val, ctrl->creg + reg);
196 } 194 }
197 195
198 static inline u16 shpc_readw(struct controller *ctrl, int reg) 196 static inline u16 shpc_readw(struct controller *ctrl, int reg)
199 { 197 {
200 return readw(ctrl->creg + reg); 198 return readw(ctrl->creg + reg);
201 } 199 }
202 200
203 static inline void shpc_writew(struct controller *ctrl, int reg, u16 val) 201 static inline void shpc_writew(struct controller *ctrl, int reg, u16 val)
204 { 202 {
205 writew(val, ctrl->creg + reg); 203 writew(val, ctrl->creg + reg);
206 } 204 }
207 205
208 static inline u32 shpc_readl(struct controller *ctrl, int reg) 206 static inline u32 shpc_readl(struct controller *ctrl, int reg)
209 { 207 {
210 return readl(ctrl->creg + reg); 208 return readl(ctrl->creg + reg);
211 } 209 }
212 210
213 static inline void shpc_writel(struct controller *ctrl, int reg, u32 val) 211 static inline void shpc_writel(struct controller *ctrl, int reg, u32 val)
214 { 212 {
215 writel(val, ctrl->creg + reg); 213 writel(val, ctrl->creg + reg);
216 } 214 }
217 215
218 static inline int shpc_indirect_read(struct controller *ctrl, int index, 216 static inline int shpc_indirect_read(struct controller *ctrl, int index,
219 u32 *value) 217 u32 *value)
220 { 218 {
221 int rc; 219 int rc;
222 u32 cap_offset = ctrl->cap_offset; 220 u32 cap_offset = ctrl->cap_offset;
223 struct pci_dev *pdev = ctrl->pci_dev; 221 struct pci_dev *pdev = ctrl->pci_dev;
224 222
225 rc = pci_write_config_byte(pdev, cap_offset + DWORD_SELECT, index); 223 rc = pci_write_config_byte(pdev, cap_offset + DWORD_SELECT, index);
226 if (rc) 224 if (rc)
227 return rc; 225 return rc;
228 return pci_read_config_dword(pdev, cap_offset + DWORD_DATA, value); 226 return pci_read_config_dword(pdev, cap_offset + DWORD_DATA, value);
229 } 227 }
230 228
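
shpc_indirect_read() is a window-register access: the dword index is written to the DWORD_SELECT config register and the value is then read back through DWORD_DATA. A usage sketch follows; the helper name and the index 0 are illustrative only, and it assumes a valid struct controller as used throughout this file.

	/* Read the first dword of the working register set through the
	 * config-space window. */
	static int demo_read_first_dword(struct controller *ctrl, u32 *value)
	{
		int rc = shpc_indirect_read(ctrl, 0, value);

		if (rc)
			ctrl_err(ctrl, "%s: indirect read failed\n", __func__);
		return rc;
	}
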
231 /* 229 /*
232 * This is the interrupt polling timeout function. 230 * This is the interrupt polling timeout function.
233 */ 231 */
234 static void int_poll_timeout(unsigned long data) 232 static void int_poll_timeout(unsigned long data)
235 { 233 {
236 struct controller *ctrl = (struct controller *)data; 234 struct controller *ctrl = (struct controller *)data;
237 235
238 /* Poll for interrupt events. regs == NULL => polling */ 236 /* Poll for interrupt events. regs == NULL => polling */
239 shpc_isr(0, ctrl); 237 shpc_isr(0, ctrl);
240 238
241 init_timer(&ctrl->poll_timer); 239 init_timer(&ctrl->poll_timer);
242 if (!shpchp_poll_time) 240 if (!shpchp_poll_time)
243 shpchp_poll_time = 2; /* default polling interval is 2 sec */ 241 shpchp_poll_time = 2; /* default polling interval is 2 sec */
244 242
245 start_int_poll_timer(ctrl, shpchp_poll_time); 243 start_int_poll_timer(ctrl, shpchp_poll_time);
246 } 244 }
247 245
248 /* 246 /*
249 * This function starts the interrupt polling timer. 247 * This function starts the interrupt polling timer.
250 */ 248 */
251 static void start_int_poll_timer(struct controller *ctrl, int sec) 249 static void start_int_poll_timer(struct controller *ctrl, int sec)
252 { 250 {
253 /* Clamp to sane value */ 251 /* Clamp to sane value */
254 if ((sec <= 0) || (sec > 60)) 252 if ((sec <= 0) || (sec > 60))
255 sec = 2; 253 sec = 2;
256 254
257 ctrl->poll_timer.function = &int_poll_timeout; 255 ctrl->poll_timer.function = &int_poll_timeout;
258 ctrl->poll_timer.data = (unsigned long)ctrl; 256 ctrl->poll_timer.data = (unsigned long)ctrl;
259 ctrl->poll_timer.expires = jiffies + sec * HZ; 257 ctrl->poll_timer.expires = jiffies + sec * HZ;
260 add_timer(&ctrl->poll_timer); 258 add_timer(&ctrl->poll_timer);
261 } 259 }
262 260
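
int_poll_timeout() and start_int_poll_timer() form a self-rearming poll loop: the timer callback polls the hardware via shpc_isr() and then schedules itself again for the next interval. A minimal sketch of the same pattern with the legacy timer API used here; the demo_* names and the fixed 2 second interval are illustrative.

	#include <linux/timer.h>
	#include <linux/jiffies.h>

	static struct timer_list demo_poll_timer;

	static void demo_poll(unsigned long data)
	{
		/* ... poll the hardware here, e.g. call the ISR with irq == 0 ... */

		/* re-arm for the next interval */
		mod_timer(&demo_poll_timer, jiffies + 2 * HZ);
	}

	static void demo_poll_start(void)
	{
		init_timer(&demo_poll_timer);
		demo_poll_timer.function = demo_poll;
		demo_poll_timer.data = 0;
		demo_poll_timer.expires = jiffies + 2 * HZ;
		add_timer(&demo_poll_timer);
	}
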
263 static inline int is_ctrl_busy(struct controller *ctrl) 261 static inline int is_ctrl_busy(struct controller *ctrl)
264 { 262 {
265 u16 cmd_status = shpc_readw(ctrl, CMD_STATUS); 263 u16 cmd_status = shpc_readw(ctrl, CMD_STATUS);
266 return cmd_status & 0x1; 264 return cmd_status & 0x1;
267 } 265 }
268 266
269 /* 267 /*
270 * Returns 1 if SHPC finishes executing a command within 1 sec, 268 * Returns 1 if SHPC finishes executing a command within 1 sec,
271 * otherwise returns 0. 269 * otherwise returns 0.
272 */ 270 */
273 static inline int shpc_poll_ctrl_busy(struct controller *ctrl) 271 static inline int shpc_poll_ctrl_busy(struct controller *ctrl)
274 { 272 {
275 int i; 273 int i;
276 274
277 if (!is_ctrl_busy(ctrl)) 275 if (!is_ctrl_busy(ctrl))
278 return 1; 276 return 1;
279 277
280 /* Check every 0.1 sec for a total of 1 sec */ 278 /* Check every 0.1 sec for a total of 1 sec */
281 for (i = 0; i < 10; i++) { 279 for (i = 0; i < 10; i++) {
282 msleep(100); 280 msleep(100);
283 if (!is_ctrl_busy(ctrl)) 281 if (!is_ctrl_busy(ctrl))
284 return 1; 282 return 1;
285 } 283 }
286 284
287 return 0; 285 return 0;
288 } 286 }
289 287
290 static inline int shpc_wait_cmd(struct controller *ctrl) 288 static inline int shpc_wait_cmd(struct controller *ctrl)
291 { 289 {
292 int retval = 0; 290 int retval = 0;
293 unsigned long timeout = msecs_to_jiffies(1000); 291 unsigned long timeout = msecs_to_jiffies(1000);
294 int rc; 292 int rc;
295 293
296 if (shpchp_poll_mode) 294 if (shpchp_poll_mode)
297 rc = shpc_poll_ctrl_busy(ctrl); 295 rc = shpc_poll_ctrl_busy(ctrl);
298 else 296 else
299 rc = wait_event_interruptible_timeout(ctrl->queue, 297 rc = wait_event_interruptible_timeout(ctrl->queue,
300 !is_ctrl_busy(ctrl), timeout); 298 !is_ctrl_busy(ctrl), timeout);
301 if (!rc && is_ctrl_busy(ctrl)) { 299 if (!rc && is_ctrl_busy(ctrl)) {
302 retval = -EIO; 300 retval = -EIO;
303 ctrl_err(ctrl, "Command not completed in 1000 msec\n"); 301 ctrl_err(ctrl, "Command not completed in 1000 msec\n");
304 } else if (rc < 0) { 302 } else if (rc < 0) {
305 retval = -EINTR; 303 retval = -EINTR;
306 ctrl_info(ctrl, "Command was interrupted by a signal\n"); 304 ctrl_info(ctrl, "Command was interrupted by a signal\n");
307 } 305 }
308 306
309 return retval; 307 return retval;
310 } 308 }
311 309
312 static int shpc_write_cmd(struct slot *slot, u8 t_slot, u8 cmd) 310 static int shpc_write_cmd(struct slot *slot, u8 t_slot, u8 cmd)
313 { 311 {
314 struct controller *ctrl = slot->ctrl; 312 struct controller *ctrl = slot->ctrl;
315 u16 cmd_status; 313 u16 cmd_status;
316 int retval = 0; 314 int retval = 0;
317 u16 temp_word; 315 u16 temp_word;
318 316
319 mutex_lock(&slot->ctrl->cmd_lock); 317 mutex_lock(&slot->ctrl->cmd_lock);
320 318
321 if (!shpc_poll_ctrl_busy(ctrl)) { 319 if (!shpc_poll_ctrl_busy(ctrl)) {
322 /* After 1 sec the controller is still busy */ 320 /* After 1 sec the controller is still busy */
323 ctrl_err(ctrl, "Controller is still busy after 1 sec\n"); 321 ctrl_err(ctrl, "Controller is still busy after 1 sec\n");
324 retval = -EBUSY; 322 retval = -EBUSY;
325 goto out; 323 goto out;
326 } 324 }
327 325
328 ++t_slot; 326 ++t_slot;
329 temp_word = (t_slot << 8) | (cmd & 0xFF); 327 temp_word = (t_slot << 8) | (cmd & 0xFF);
330 ctrl_dbg(ctrl, "%s: t_slot %x cmd %x\n", __func__, t_slot, cmd); 328 ctrl_dbg(ctrl, "%s: t_slot %x cmd %x\n", __func__, t_slot, cmd);
331 329
332 /* To make sure the Controller Busy bit is 0 before we send out the 330 /* To make sure the Controller Busy bit is 0 before we send out the
333 * command. 331 * command.
334 */ 332 */
335 shpc_writew(ctrl, CMD, temp_word); 333 shpc_writew(ctrl, CMD, temp_word);
336 334
337 /* 335 /*
338 * Wait for command completion. 336 * Wait for command completion.
339 */ 337 */
340 retval = shpc_wait_cmd(slot->ctrl); 338 retval = shpc_wait_cmd(slot->ctrl);
341 if (retval) 339 if (retval)
342 goto out; 340 goto out;
343 341
344 cmd_status = hpc_check_cmd_status(slot->ctrl); 342 cmd_status = hpc_check_cmd_status(slot->ctrl);
345 if (cmd_status) { 343 if (cmd_status) {
346 ctrl_err(ctrl, 344 ctrl_err(ctrl,
347 "Failed to issued command 0x%x (error code = %d)\n", 345 "Failed to issued command 0x%x (error code = %d)\n",
348 cmd, cmd_status); 346 cmd, cmd_status);
349 retval = -EIO; 347 retval = -EIO;
350 } 348 }
351 out: 349 out:
352 mutex_unlock(&slot->ctrl->cmd_lock); 350 mutex_unlock(&slot->ctrl->cmd_lock);
353 return retval; 351 return retval;
354 } 352 }
355 353
356 static int hpc_check_cmd_status(struct controller *ctrl) 354 static int hpc_check_cmd_status(struct controller *ctrl)
357 { 355 {
358 int retval = 0; 356 int retval = 0;
359 u16 cmd_status = shpc_readw(ctrl, CMD_STATUS) & 0x000F; 357 u16 cmd_status = shpc_readw(ctrl, CMD_STATUS) & 0x000F;
360 358
361 switch (cmd_status >> 1) { 359 switch (cmd_status >> 1) {
362 case 0: 360 case 0:
363 retval = 0; 361 retval = 0;
364 break; 362 break;
365 case 1: 363 case 1:
366 retval = SWITCH_OPEN; 364 retval = SWITCH_OPEN;
367 ctrl_err(ctrl, "Switch opened!\n"); 365 ctrl_err(ctrl, "Switch opened!\n");
368 break; 366 break;
369 case 2: 367 case 2:
370 retval = INVALID_CMD; 368 retval = INVALID_CMD;
371 ctrl_err(ctrl, "Invalid HPC command!\n"); 369 ctrl_err(ctrl, "Invalid HPC command!\n");
372 break; 370 break;
373 case 4: 371 case 4:
374 retval = INVALID_SPEED_MODE; 372 retval = INVALID_SPEED_MODE;
375 ctrl_err(ctrl, "Invalid bus speed/mode!\n"); 373 ctrl_err(ctrl, "Invalid bus speed/mode!\n");
376 break; 374 break;
377 default: 375 default:
378 retval = cmd_status; 376 retval = cmd_status;
379 } 377 }
380 378
381 return retval; 379 return retval;
382 } 380 }
383 381
384 382
385 static int hpc_get_attention_status(struct slot *slot, u8 *status) 383 static int hpc_get_attention_status(struct slot *slot, u8 *status)
386 { 384 {
387 struct controller *ctrl = slot->ctrl; 385 struct controller *ctrl = slot->ctrl;
388 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 386 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
389 u8 state = (slot_reg & ATN_LED_STATE_MASK) >> ATN_LED_STATE_SHIFT; 387 u8 state = (slot_reg & ATN_LED_STATE_MASK) >> ATN_LED_STATE_SHIFT;
390 388
391 switch (state) { 389 switch (state) {
392 case ATN_LED_STATE_ON: 390 case ATN_LED_STATE_ON:
393 *status = 1; /* On */ 391 *status = 1; /* On */
394 break; 392 break;
395 case ATN_LED_STATE_BLINK: 393 case ATN_LED_STATE_BLINK:
396 *status = 2; /* Blink */ 394 *status = 2; /* Blink */
397 break; 395 break;
398 case ATN_LED_STATE_OFF: 396 case ATN_LED_STATE_OFF:
399 *status = 0; /* Off */ 397 *status = 0; /* Off */
400 break; 398 break;
401 default: 399 default:
402 *status = 0xFF; /* Reserved */ 400 *status = 0xFF; /* Reserved */
403 break; 401 break;
404 } 402 }
405 403
406 return 0; 404 return 0;
407 } 405 }
408 406
409 static int hpc_get_power_status(struct slot * slot, u8 *status) 407 static int hpc_get_power_status(struct slot * slot, u8 *status)
410 { 408 {
411 struct controller *ctrl = slot->ctrl; 409 struct controller *ctrl = slot->ctrl;
412 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 410 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
413 u8 state = (slot_reg & SLOT_STATE_MASK) >> SLOT_STATE_SHIFT; 411 u8 state = (slot_reg & SLOT_STATE_MASK) >> SLOT_STATE_SHIFT;
414 412
415 switch (state) { 413 switch (state) {
416 case SLOT_STATE_PWRONLY: 414 case SLOT_STATE_PWRONLY:
417 *status = 2; /* Powered only */ 415 *status = 2; /* Powered only */
418 break; 416 break;
419 case SLOT_STATE_ENABLED: 417 case SLOT_STATE_ENABLED:
420 *status = 1; /* Enabled */ 418 *status = 1; /* Enabled */
421 break; 419 break;
422 case SLOT_STATE_DISABLED: 420 case SLOT_STATE_DISABLED:
423 *status = 0; /* Disabled */ 421 *status = 0; /* Disabled */
424 break; 422 break;
425 default: 423 default:
426 *status = 0xFF; /* Reserved */ 424 *status = 0xFF; /* Reserved */
427 break; 425 break;
428 } 426 }
429 427
430 return 0; 428 return 0;
431 } 429 }
432 430
433 431
434 static int hpc_get_latch_status(struct slot *slot, u8 *status) 432 static int hpc_get_latch_status(struct slot *slot, u8 *status)
435 { 433 {
436 struct controller *ctrl = slot->ctrl; 434 struct controller *ctrl = slot->ctrl;
437 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 435 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
438 436
439 *status = !!(slot_reg & MRL_SENSOR); /* 0 -> close; 1 -> open */ 437 *status = !!(slot_reg & MRL_SENSOR); /* 0 -> close; 1 -> open */
440 438
441 return 0; 439 return 0;
442 } 440 }
443 441
444 static int hpc_get_adapter_status(struct slot *slot, u8 *status) 442 static int hpc_get_adapter_status(struct slot *slot, u8 *status)
445 { 443 {
446 struct controller *ctrl = slot->ctrl; 444 struct controller *ctrl = slot->ctrl;
447 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 445 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
448 u8 state = (slot_reg & PRSNT_MASK) >> PRSNT_SHIFT; 446 u8 state = (slot_reg & PRSNT_MASK) >> PRSNT_SHIFT;
449 447
450 *status = (state != 0x3) ? 1 : 0; 448 *status = (state != 0x3) ? 1 : 0;
451 449
452 return 0; 450 return 0;
453 } 451 }
454 452
455 static int hpc_get_prog_int(struct slot *slot, u8 *prog_int) 453 static int hpc_get_prog_int(struct slot *slot, u8 *prog_int)
456 { 454 {
457 struct controller *ctrl = slot->ctrl; 455 struct controller *ctrl = slot->ctrl;
458 456
459 *prog_int = shpc_readb(ctrl, PROG_INTERFACE); 457 *prog_int = shpc_readb(ctrl, PROG_INTERFACE);
460 458
461 return 0; 459 return 0;
462 } 460 }
463 461
464 static int hpc_get_adapter_speed(struct slot *slot, enum pci_bus_speed *value) 462 static int hpc_get_adapter_speed(struct slot *slot, enum pci_bus_speed *value)
465 { 463 {
466 int retval = 0; 464 int retval = 0;
467 struct controller *ctrl = slot->ctrl; 465 struct controller *ctrl = slot->ctrl;
468 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 466 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
469 u8 m66_cap = !!(slot_reg & MHZ66_CAP); 467 u8 m66_cap = !!(slot_reg & MHZ66_CAP);
470 u8 pi, pcix_cap; 468 u8 pi, pcix_cap;
471 469
472 if ((retval = hpc_get_prog_int(slot, &pi))) 470 if ((retval = hpc_get_prog_int(slot, &pi)))
473 return retval; 471 return retval;
474 472
475 switch (pi) { 473 switch (pi) {
476 case 1: 474 case 1:
477 pcix_cap = (slot_reg & PCIX_CAP_MASK_PI1) >> PCIX_CAP_SHIFT; 475 pcix_cap = (slot_reg & PCIX_CAP_MASK_PI1) >> PCIX_CAP_SHIFT;
478 break; 476 break;
479 case 2: 477 case 2:
480 pcix_cap = (slot_reg & PCIX_CAP_MASK_PI2) >> PCIX_CAP_SHIFT; 478 pcix_cap = (slot_reg & PCIX_CAP_MASK_PI2) >> PCIX_CAP_SHIFT;
481 break; 479 break;
482 default: 480 default:
483 return -ENODEV; 481 return -ENODEV;
484 } 482 }
485 483
486 ctrl_dbg(ctrl, "%s: slot_reg = %x, pcix_cap = %x, m66_cap = %x\n", 484 ctrl_dbg(ctrl, "%s: slot_reg = %x, pcix_cap = %x, m66_cap = %x\n",
487 __func__, slot_reg, pcix_cap, m66_cap); 485 __func__, slot_reg, pcix_cap, m66_cap);
488 486
489 switch (pcix_cap) { 487 switch (pcix_cap) {
490 case 0x0: 488 case 0x0:
491 *value = m66_cap ? PCI_SPEED_66MHz : PCI_SPEED_33MHz; 489 *value = m66_cap ? PCI_SPEED_66MHz : PCI_SPEED_33MHz;
492 break; 490 break;
493 case 0x1: 491 case 0x1:
494 *value = PCI_SPEED_66MHz_PCIX; 492 *value = PCI_SPEED_66MHz_PCIX;
495 break; 493 break;
496 case 0x3: 494 case 0x3:
497 *value = PCI_SPEED_133MHz_PCIX; 495 *value = PCI_SPEED_133MHz_PCIX;
498 break; 496 break;
499 case 0x4: 497 case 0x4:
500 *value = PCI_SPEED_133MHz_PCIX_266; 498 *value = PCI_SPEED_133MHz_PCIX_266;
501 break; 499 break;
502 case 0x5: 500 case 0x5:
503 *value = PCI_SPEED_133MHz_PCIX_533; 501 *value = PCI_SPEED_133MHz_PCIX_533;
504 break; 502 break;
505 case 0x2: 503 case 0x2:
506 default: 504 default:
507 *value = PCI_SPEED_UNKNOWN; 505 *value = PCI_SPEED_UNKNOWN;
508 retval = -ENODEV; 506 retval = -ENODEV;
509 break; 507 break;
510 } 508 }
511 509
512 ctrl_dbg(ctrl, "Adapter speed = %d\n", *value); 510 ctrl_dbg(ctrl, "Adapter speed = %d\n", *value);
513 return retval; 511 return retval;
514 } 512 }
515 513
516 static int hpc_get_mode1_ECC_cap(struct slot *slot, u8 *mode) 514 static int hpc_get_mode1_ECC_cap(struct slot *slot, u8 *mode)
517 { 515 {
518 int retval = 0; 516 int retval = 0;
519 struct controller *ctrl = slot->ctrl; 517 struct controller *ctrl = slot->ctrl;
520 u16 sec_bus_status = shpc_readw(ctrl, SEC_BUS_CONFIG); 518 u16 sec_bus_status = shpc_readw(ctrl, SEC_BUS_CONFIG);
521 u8 pi = shpc_readb(ctrl, PROG_INTERFACE); 519 u8 pi = shpc_readb(ctrl, PROG_INTERFACE);
522 520
523 if (pi == 2) { 521 if (pi == 2) {
524 *mode = (sec_bus_status & 0x0100) >> 8; 522 *mode = (sec_bus_status & 0x0100) >> 8;
525 } else { 523 } else {
526 retval = -1; 524 retval = -1;
527 } 525 }
528 526
529 ctrl_dbg(ctrl, "Mode 1 ECC cap = %d\n", *mode); 527 ctrl_dbg(ctrl, "Mode 1 ECC cap = %d\n", *mode);
530 return retval; 528 return retval;
531 } 529 }
532 530
533 static int hpc_query_power_fault(struct slot * slot) 531 static int hpc_query_power_fault(struct slot * slot)
534 { 532 {
535 struct controller *ctrl = slot->ctrl; 533 struct controller *ctrl = slot->ctrl;
536 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot)); 534 u32 slot_reg = shpc_readl(ctrl, SLOT_REG(slot->hp_slot));
537 535
538 /* Note: Logic 0 => fault */ 536 /* Note: Logic 0 => fault */
539 return !(slot_reg & POWER_FAULT); 537 return !(slot_reg & POWER_FAULT);
540 } 538 }
541 539
542 static int hpc_set_attention_status(struct slot *slot, u8 value) 540 static int hpc_set_attention_status(struct slot *slot, u8 value)
543 { 541 {
544 u8 slot_cmd = 0; 542 u8 slot_cmd = 0;
545 543
546 switch (value) { 544 switch (value) {
547 case 0 : 545 case 0 :
548 slot_cmd = SET_ATTN_OFF; /* OFF */ 546 slot_cmd = SET_ATTN_OFF; /* OFF */
549 break; 547 break;
550 case 1: 548 case 1:
551 slot_cmd = SET_ATTN_ON; /* ON */ 549 slot_cmd = SET_ATTN_ON; /* ON */
552 break; 550 break;
553 case 2: 551 case 2:
554 slot_cmd = SET_ATTN_BLINK; /* BLINK */ 552 slot_cmd = SET_ATTN_BLINK; /* BLINK */
555 break; 553 break;
556 default: 554 default:
557 return -1; 555 return -1;
558 } 556 }
559 557
560 return shpc_write_cmd(slot, slot->hp_slot, slot_cmd); 558 return shpc_write_cmd(slot, slot->hp_slot, slot_cmd);
561 } 559 }
562 560
563 561
564 static void hpc_set_green_led_on(struct slot *slot) 562 static void hpc_set_green_led_on(struct slot *slot)
565 { 563 {
566 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_ON); 564 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_ON);
567 } 565 }
568 566
569 static void hpc_set_green_led_off(struct slot *slot) 567 static void hpc_set_green_led_off(struct slot *slot)
570 { 568 {
571 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_OFF); 569 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_OFF);
572 } 570 }
573 571
574 static void hpc_set_green_led_blink(struct slot *slot) 572 static void hpc_set_green_led_blink(struct slot *slot)
575 { 573 {
576 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_BLINK); 574 shpc_write_cmd(slot, slot->hp_slot, SET_PWR_BLINK);
577 } 575 }
578 576
579 static void hpc_release_ctlr(struct controller *ctrl) 577 static void hpc_release_ctlr(struct controller *ctrl)
580 { 578 {
581 int i; 579 int i;
582 u32 slot_reg, serr_int; 580 u32 slot_reg, serr_int;
583 581
584 /* 582 /*
585 * Mask event interrupts and SERRs of all slots 583 * Mask event interrupts and SERRs of all slots
586 */ 584 */
587 for (i = 0; i < ctrl->num_slots; i++) { 585 for (i = 0; i < ctrl->num_slots; i++) {
588 slot_reg = shpc_readl(ctrl, SLOT_REG(i)); 586 slot_reg = shpc_readl(ctrl, SLOT_REG(i));
589 slot_reg |= (PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK | 587 slot_reg |= (PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK |
590 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK | 588 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK |
591 CON_PFAULT_INTR_MASK | MRL_CHANGE_SERR_MASK | 589 CON_PFAULT_INTR_MASK | MRL_CHANGE_SERR_MASK |
592 CON_PFAULT_SERR_MASK); 590 CON_PFAULT_SERR_MASK);
593 slot_reg &= ~SLOT_REG_RSVDZ_MASK; 591 slot_reg &= ~SLOT_REG_RSVDZ_MASK;
594 shpc_writel(ctrl, SLOT_REG(i), slot_reg); 592 shpc_writel(ctrl, SLOT_REG(i), slot_reg);
595 } 593 }
596 594
597 cleanup_slots(ctrl); 595 cleanup_slots(ctrl);
598 596
599 /* 597 /*
600 * Mask SERR and System Interrupt generation 598 * Mask SERR and System Interrupt generation
601 */ 599 */
602 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE); 600 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE);
603 serr_int |= (GLOBAL_INTR_MASK | GLOBAL_SERR_MASK | 601 serr_int |= (GLOBAL_INTR_MASK | GLOBAL_SERR_MASK |
604 COMMAND_INTR_MASK | ARBITER_SERR_MASK); 602 COMMAND_INTR_MASK | ARBITER_SERR_MASK);
605 serr_int &= ~SERR_INTR_RSVDZ_MASK; 603 serr_int &= ~SERR_INTR_RSVDZ_MASK;
606 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int); 604 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int);
607 605
608 if (shpchp_poll_mode) 606 if (shpchp_poll_mode)
609 del_timer(&ctrl->poll_timer); 607 del_timer(&ctrl->poll_timer);
610 else { 608 else {
611 free_irq(ctrl->pci_dev->irq, ctrl); 609 free_irq(ctrl->pci_dev->irq, ctrl);
612 pci_disable_msi(ctrl->pci_dev); 610 pci_disable_msi(ctrl->pci_dev);
613 } 611 }
614 612
615 iounmap(ctrl->creg); 613 iounmap(ctrl->creg);
616 release_mem_region(ctrl->mmio_base, ctrl->mmio_size); 614 release_mem_region(ctrl->mmio_base, ctrl->mmio_size);
617
618 /*
619 * If this is the last controller to be released, destroy the
620 * shpchpd work queue
621 */
622 if (atomic_dec_and_test(&shpchp_num_controllers))
623 destroy_workqueue(shpchp_wq);
624 } 615 }
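With the workqueue update in this commit, hpc_release_ctlr() no longer counts controllers down and destroys shpchp_wq when the last one is released; the queue's lifetime is presumably tied to driver module load/unload instead (that side of the change is outside this hunk; see the sketch after this file's diff).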
625 616
626 static int hpc_power_on_slot(struct slot * slot) 617 static int hpc_power_on_slot(struct slot * slot)
627 { 618 {
628 int retval; 619 int retval;
629 620
630 retval = shpc_write_cmd(slot, slot->hp_slot, SET_SLOT_PWR); 621 retval = shpc_write_cmd(slot, slot->hp_slot, SET_SLOT_PWR);
631 if (retval) 622 if (retval)
632 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__); 623 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__);
633 624
634 return retval; 625 return retval;
635 } 626 }
636 627
637 static int hpc_slot_enable(struct slot * slot) 628 static int hpc_slot_enable(struct slot * slot)
638 { 629 {
639 int retval; 630 int retval;
640 631
641 /* Slot - Enable, Power Indicator - Blink, Attention Indicator - Off */ 632 /* Slot - Enable, Power Indicator - Blink, Attention Indicator - Off */
642 retval = shpc_write_cmd(slot, slot->hp_slot, 633 retval = shpc_write_cmd(slot, slot->hp_slot,
643 SET_SLOT_ENABLE | SET_PWR_BLINK | SET_ATTN_OFF); 634 SET_SLOT_ENABLE | SET_PWR_BLINK | SET_ATTN_OFF);
644 if (retval) 635 if (retval)
645 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__); 636 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__);
646 637
647 return retval; 638 return retval;
648 } 639 }
649 640
650 static int hpc_slot_disable(struct slot * slot) 641 static int hpc_slot_disable(struct slot * slot)
651 { 642 {
652 int retval; 643 int retval;
653 644
654 /* Slot - Disable, Power Indicator - Off, Attention Indicator - On */ 645 /* Slot - Disable, Power Indicator - Off, Attention Indicator - On */
655 retval = shpc_write_cmd(slot, slot->hp_slot, 646 retval = shpc_write_cmd(slot, slot->hp_slot,
656 SET_SLOT_DISABLE | SET_PWR_OFF | SET_ATTN_ON); 647 SET_SLOT_DISABLE | SET_PWR_OFF | SET_ATTN_ON);
657 if (retval) 648 if (retval)
658 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__); 649 ctrl_err(slot->ctrl, "%s: Write command failed!\n", __func__);
659 650
660 return retval; 651 return retval;
661 } 652 }
662 653
663 static int shpc_get_cur_bus_speed(struct controller *ctrl) 654 static int shpc_get_cur_bus_speed(struct controller *ctrl)
664 { 655 {
665 int retval = 0; 656 int retval = 0;
666 struct pci_bus *bus = ctrl->pci_dev->subordinate; 657 struct pci_bus *bus = ctrl->pci_dev->subordinate;
667 enum pci_bus_speed bus_speed = PCI_SPEED_UNKNOWN; 658 enum pci_bus_speed bus_speed = PCI_SPEED_UNKNOWN;
668 u16 sec_bus_reg = shpc_readw(ctrl, SEC_BUS_CONFIG); 659 u16 sec_bus_reg = shpc_readw(ctrl, SEC_BUS_CONFIG);
669 u8 pi = shpc_readb(ctrl, PROG_INTERFACE); 660 u8 pi = shpc_readb(ctrl, PROG_INTERFACE);
670 u8 speed_mode = (pi == 2) ? (sec_bus_reg & 0xF) : (sec_bus_reg & 0x7); 661 u8 speed_mode = (pi == 2) ? (sec_bus_reg & 0xF) : (sec_bus_reg & 0x7);
671 662
672 if ((pi == 1) && (speed_mode > 4)) { 663 if ((pi == 1) && (speed_mode > 4)) {
673 retval = -ENODEV; 664 retval = -ENODEV;
674 goto out; 665 goto out;
675 } 666 }
676 667
677 switch (speed_mode) { 668 switch (speed_mode) {
678 case 0x0: 669 case 0x0:
679 bus_speed = PCI_SPEED_33MHz; 670 bus_speed = PCI_SPEED_33MHz;
680 break; 671 break;
681 case 0x1: 672 case 0x1:
682 bus_speed = PCI_SPEED_66MHz; 673 bus_speed = PCI_SPEED_66MHz;
683 break; 674 break;
684 case 0x2: 675 case 0x2:
685 bus_speed = PCI_SPEED_66MHz_PCIX; 676 bus_speed = PCI_SPEED_66MHz_PCIX;
686 break; 677 break;
687 case 0x3: 678 case 0x3:
688 bus_speed = PCI_SPEED_100MHz_PCIX; 679 bus_speed = PCI_SPEED_100MHz_PCIX;
689 break; 680 break;
690 case 0x4: 681 case 0x4:
691 bus_speed = PCI_SPEED_133MHz_PCIX; 682 bus_speed = PCI_SPEED_133MHz_PCIX;
692 break; 683 break;
693 case 0x5: 684 case 0x5:
694 bus_speed = PCI_SPEED_66MHz_PCIX_ECC; 685 bus_speed = PCI_SPEED_66MHz_PCIX_ECC;
695 break; 686 break;
696 case 0x6: 687 case 0x6:
697 bus_speed = PCI_SPEED_100MHz_PCIX_ECC; 688 bus_speed = PCI_SPEED_100MHz_PCIX_ECC;
698 break; 689 break;
699 case 0x7: 690 case 0x7:
700 bus_speed = PCI_SPEED_133MHz_PCIX_ECC; 691 bus_speed = PCI_SPEED_133MHz_PCIX_ECC;
701 break; 692 break;
702 case 0x8: 693 case 0x8:
703 bus_speed = PCI_SPEED_66MHz_PCIX_266; 694 bus_speed = PCI_SPEED_66MHz_PCIX_266;
704 break; 695 break;
705 case 0x9: 696 case 0x9:
706 bus_speed = PCI_SPEED_100MHz_PCIX_266; 697 bus_speed = PCI_SPEED_100MHz_PCIX_266;
707 break; 698 break;
708 case 0xa: 699 case 0xa:
709 bus_speed = PCI_SPEED_133MHz_PCIX_266; 700 bus_speed = PCI_SPEED_133MHz_PCIX_266;
710 break; 701 break;
711 case 0xb: 702 case 0xb:
712 bus_speed = PCI_SPEED_66MHz_PCIX_533; 703 bus_speed = PCI_SPEED_66MHz_PCIX_533;
713 break; 704 break;
714 case 0xc: 705 case 0xc:
715 bus_speed = PCI_SPEED_100MHz_PCIX_533; 706 bus_speed = PCI_SPEED_100MHz_PCIX_533;
716 break; 707 break;
717 case 0xd: 708 case 0xd:
718 bus_speed = PCI_SPEED_133MHz_PCIX_533; 709 bus_speed = PCI_SPEED_133MHz_PCIX_533;
719 break; 710 break;
720 default: 711 default:
721 retval = -ENODEV; 712 retval = -ENODEV;
722 break; 713 break;
723 } 714 }
724 715
725 out: 716 out:
726 bus->cur_bus_speed = bus_speed; 717 bus->cur_bus_speed = bus_speed;
727 dbg("Current bus speed = %d\n", bus_speed); 718 dbg("Current bus speed = %d\n", bus_speed);
728 return retval; 719 return retval;
729 } 720 }
730 721
731 722
732 static int hpc_set_bus_speed_mode(struct slot * slot, enum pci_bus_speed value) 723 static int hpc_set_bus_speed_mode(struct slot * slot, enum pci_bus_speed value)
733 { 724 {
734 int retval; 725 int retval;
735 struct controller *ctrl = slot->ctrl; 726 struct controller *ctrl = slot->ctrl;
736 u8 pi, cmd; 727 u8 pi, cmd;
737 728
738 pi = shpc_readb(ctrl, PROG_INTERFACE); 729 pi = shpc_readb(ctrl, PROG_INTERFACE);
739 if ((pi == 1) && (value > PCI_SPEED_133MHz_PCIX)) 730 if ((pi == 1) && (value > PCI_SPEED_133MHz_PCIX))
740 return -EINVAL; 731 return -EINVAL;
741 732
742 switch (value) { 733 switch (value) {
743 case PCI_SPEED_33MHz: 734 case PCI_SPEED_33MHz:
744 cmd = SETA_PCI_33MHZ; 735 cmd = SETA_PCI_33MHZ;
745 break; 736 break;
746 case PCI_SPEED_66MHz: 737 case PCI_SPEED_66MHz:
747 cmd = SETA_PCI_66MHZ; 738 cmd = SETA_PCI_66MHZ;
748 break; 739 break;
749 case PCI_SPEED_66MHz_PCIX: 740 case PCI_SPEED_66MHz_PCIX:
750 cmd = SETA_PCIX_66MHZ; 741 cmd = SETA_PCIX_66MHZ;
751 break; 742 break;
752 case PCI_SPEED_100MHz_PCIX: 743 case PCI_SPEED_100MHz_PCIX:
753 cmd = SETA_PCIX_100MHZ; 744 cmd = SETA_PCIX_100MHZ;
754 break; 745 break;
755 case PCI_SPEED_133MHz_PCIX: 746 case PCI_SPEED_133MHz_PCIX:
756 cmd = SETA_PCIX_133MHZ; 747 cmd = SETA_PCIX_133MHZ;
757 break; 748 break;
758 case PCI_SPEED_66MHz_PCIX_ECC: 749 case PCI_SPEED_66MHz_PCIX_ECC:
759 cmd = SETB_PCIX_66MHZ_EM; 750 cmd = SETB_PCIX_66MHZ_EM;
760 break; 751 break;
761 case PCI_SPEED_100MHz_PCIX_ECC: 752 case PCI_SPEED_100MHz_PCIX_ECC:
762 cmd = SETB_PCIX_100MHZ_EM; 753 cmd = SETB_PCIX_100MHZ_EM;
763 break; 754 break;
764 case PCI_SPEED_133MHz_PCIX_ECC: 755 case PCI_SPEED_133MHz_PCIX_ECC:
765 cmd = SETB_PCIX_133MHZ_EM; 756 cmd = SETB_PCIX_133MHZ_EM;
766 break; 757 break;
767 case PCI_SPEED_66MHz_PCIX_266: 758 case PCI_SPEED_66MHz_PCIX_266:
768 cmd = SETB_PCIX_66MHZ_266; 759 cmd = SETB_PCIX_66MHZ_266;
769 break; 760 break;
770 case PCI_SPEED_100MHz_PCIX_266: 761 case PCI_SPEED_100MHz_PCIX_266:
771 cmd = SETB_PCIX_100MHZ_266; 762 cmd = SETB_PCIX_100MHZ_266;
772 break; 763 break;
773 case PCI_SPEED_133MHz_PCIX_266: 764 case PCI_SPEED_133MHz_PCIX_266:
774 cmd = SETB_PCIX_133MHZ_266; 765 cmd = SETB_PCIX_133MHZ_266;
775 break; 766 break;
776 case PCI_SPEED_66MHz_PCIX_533: 767 case PCI_SPEED_66MHz_PCIX_533:
777 cmd = SETB_PCIX_66MHZ_533; 768 cmd = SETB_PCIX_66MHZ_533;
778 break; 769 break;
779 case PCI_SPEED_100MHz_PCIX_533: 770 case PCI_SPEED_100MHz_PCIX_533:
780 cmd = SETB_PCIX_100MHZ_533; 771 cmd = SETB_PCIX_100MHZ_533;
781 break; 772 break;
782 case PCI_SPEED_133MHz_PCIX_533: 773 case PCI_SPEED_133MHz_PCIX_533:
783 cmd = SETB_PCIX_133MHZ_533; 774 cmd = SETB_PCIX_133MHZ_533;
784 break; 775 break;
785 default: 776 default:
786 return -EINVAL; 777 return -EINVAL;
787 } 778 }
788 779
789 retval = shpc_write_cmd(slot, 0, cmd); 780 retval = shpc_write_cmd(slot, 0, cmd);
790 if (retval) 781 if (retval)
791 ctrl_err(ctrl, "%s: Write command failed!\n", __func__); 782 ctrl_err(ctrl, "%s: Write command failed!\n", __func__);
792 else 783 else
793 shpc_get_cur_bus_speed(ctrl); 784 shpc_get_cur_bus_speed(ctrl);
794 785
795 return retval; 786 return retval;
796 } 787 }
797 788
798 static irqreturn_t shpc_isr(int irq, void *dev_id) 789 static irqreturn_t shpc_isr(int irq, void *dev_id)
799 { 790 {
800 struct controller *ctrl = (struct controller *)dev_id; 791 struct controller *ctrl = (struct controller *)dev_id;
801 u32 serr_int, slot_reg, intr_loc, intr_loc2; 792 u32 serr_int, slot_reg, intr_loc, intr_loc2;
802 int hp_slot; 793 int hp_slot;
803 794
804 /* Check to see if it was our interrupt */ 795 /* Check to see if it was our interrupt */
805 intr_loc = shpc_readl(ctrl, INTR_LOC); 796 intr_loc = shpc_readl(ctrl, INTR_LOC);
806 if (!intr_loc) 797 if (!intr_loc)
807 return IRQ_NONE; 798 return IRQ_NONE;
808 799
809 ctrl_dbg(ctrl, "%s: intr_loc = %x\n", __func__, intr_loc); 800 ctrl_dbg(ctrl, "%s: intr_loc = %x\n", __func__, intr_loc);
810 801
811 if(!shpchp_poll_mode) { 802 if(!shpchp_poll_mode) {
812 /* 803 /*
813 * Mask Global Interrupt Mask - see implementation 804 * Mask Global Interrupt Mask - see implementation
814 * note on p. 139 of SHPC spec rev 1.0 805 * note on p. 139 of SHPC spec rev 1.0
815 */ 806 */
816 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE); 807 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE);
817 serr_int |= GLOBAL_INTR_MASK; 808 serr_int |= GLOBAL_INTR_MASK;
818 serr_int &= ~SERR_INTR_RSVDZ_MASK; 809 serr_int &= ~SERR_INTR_RSVDZ_MASK;
819 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int); 810 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int);
820 811
821 intr_loc2 = shpc_readl(ctrl, INTR_LOC); 812 intr_loc2 = shpc_readl(ctrl, INTR_LOC);
822 ctrl_dbg(ctrl, "%s: intr_loc2 = %x\n", __func__, intr_loc2); 813 ctrl_dbg(ctrl, "%s: intr_loc2 = %x\n", __func__, intr_loc2);
823 } 814 }
824 815
825 if (intr_loc & CMD_INTR_PENDING) { 816 if (intr_loc & CMD_INTR_PENDING) {
826 /* 817 /*
827 * Command Complete Interrupt Pending 818 * Command Complete Interrupt Pending
828 * RO only - clear by writing 1 to the Command Completion 819 * RO only - clear by writing 1 to the Command Completion
829 * Detect bit in Controller SERR-INT register 820 * Detect bit in Controller SERR-INT register
830 */ 821 */
831 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE); 822 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE);
832 serr_int &= ~SERR_INTR_RSVDZ_MASK; 823 serr_int &= ~SERR_INTR_RSVDZ_MASK;
833 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int); 824 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int);
834 825
835 wake_up_interruptible(&ctrl->queue); 826 wake_up_interruptible(&ctrl->queue);
836 } 827 }
837 828
838 if (!(intr_loc & ~CMD_INTR_PENDING)) 829 if (!(intr_loc & ~CMD_INTR_PENDING))
839 goto out; 830 goto out;
840 831
841 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) { 832 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) {
842 /* To find out which slot has interrupt pending */ 833 /* To find out which slot has interrupt pending */
843 if (!(intr_loc & SLOT_INTR_PENDING(hp_slot))) 834 if (!(intr_loc & SLOT_INTR_PENDING(hp_slot)))
844 continue; 835 continue;
845 836
846 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot)); 837 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot));
847 ctrl_dbg(ctrl, "Slot %x with intr, slot register = %x\n", 838 ctrl_dbg(ctrl, "Slot %x with intr, slot register = %x\n",
848 hp_slot, slot_reg); 839 hp_slot, slot_reg);
849 840
850 if (slot_reg & MRL_CHANGE_DETECTED) 841 if (slot_reg & MRL_CHANGE_DETECTED)
851 shpchp_handle_switch_change(hp_slot, ctrl); 842 shpchp_handle_switch_change(hp_slot, ctrl);
852 843
853 if (slot_reg & BUTTON_PRESS_DETECTED) 844 if (slot_reg & BUTTON_PRESS_DETECTED)
854 shpchp_handle_attention_button(hp_slot, ctrl); 845 shpchp_handle_attention_button(hp_slot, ctrl);
855 846
856 if (slot_reg & PRSNT_CHANGE_DETECTED) 847 if (slot_reg & PRSNT_CHANGE_DETECTED)
857 shpchp_handle_presence_change(hp_slot, ctrl); 848 shpchp_handle_presence_change(hp_slot, ctrl);
858 849
859 if (slot_reg & (ISO_PFAULT_DETECTED | CON_PFAULT_DETECTED)) 850 if (slot_reg & (ISO_PFAULT_DETECTED | CON_PFAULT_DETECTED))
860 shpchp_handle_power_fault(hp_slot, ctrl); 851 shpchp_handle_power_fault(hp_slot, ctrl);
861 852
862 /* Clear all slot events */ 853 /* Clear all slot events */
863 slot_reg &= ~SLOT_REG_RSVDZ_MASK; 854 slot_reg &= ~SLOT_REG_RSVDZ_MASK;
864 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg); 855 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg);
865 } 856 }
866 out: 857 out:
867 if (!shpchp_poll_mode) { 858 if (!shpchp_poll_mode) {
868 /* Unmask Global Interrupt Mask */ 859 /* Unmask Global Interrupt Mask */
869 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE); 860 serr_int = shpc_readl(ctrl, SERR_INTR_ENABLE);
870 serr_int &= ~(GLOBAL_INTR_MASK | SERR_INTR_RSVDZ_MASK); 861 serr_int &= ~(GLOBAL_INTR_MASK | SERR_INTR_RSVDZ_MASK);
871 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int); 862 shpc_writel(ctrl, SERR_INTR_ENABLE, serr_int);
872 } 863 }
873 864
874 return IRQ_HANDLED; 865 return IRQ_HANDLED;
875 } 866 }
876 867
877 static int shpc_get_max_bus_speed(struct controller *ctrl) 868 static int shpc_get_max_bus_speed(struct controller *ctrl)
878 { 869 {
879 int retval = 0; 870 int retval = 0;
880 struct pci_bus *bus = ctrl->pci_dev->subordinate; 871 struct pci_bus *bus = ctrl->pci_dev->subordinate;
881 enum pci_bus_speed bus_speed = PCI_SPEED_UNKNOWN; 872 enum pci_bus_speed bus_speed = PCI_SPEED_UNKNOWN;
882 u8 pi = shpc_readb(ctrl, PROG_INTERFACE); 873 u8 pi = shpc_readb(ctrl, PROG_INTERFACE);
883 u32 slot_avail1 = shpc_readl(ctrl, SLOT_AVAIL1); 874 u32 slot_avail1 = shpc_readl(ctrl, SLOT_AVAIL1);
884 u32 slot_avail2 = shpc_readl(ctrl, SLOT_AVAIL2); 875 u32 slot_avail2 = shpc_readl(ctrl, SLOT_AVAIL2);
885 876
886 if (pi == 2) { 877 if (pi == 2) {
887 if (slot_avail2 & SLOT_133MHZ_PCIX_533) 878 if (slot_avail2 & SLOT_133MHZ_PCIX_533)
888 bus_speed = PCI_SPEED_133MHz_PCIX_533; 879 bus_speed = PCI_SPEED_133MHz_PCIX_533;
889 else if (slot_avail2 & SLOT_100MHZ_PCIX_533) 880 else if (slot_avail2 & SLOT_100MHZ_PCIX_533)
890 bus_speed = PCI_SPEED_100MHz_PCIX_533; 881 bus_speed = PCI_SPEED_100MHz_PCIX_533;
891 else if (slot_avail2 & SLOT_66MHZ_PCIX_533) 882 else if (slot_avail2 & SLOT_66MHZ_PCIX_533)
892 bus_speed = PCI_SPEED_66MHz_PCIX_533; 883 bus_speed = PCI_SPEED_66MHz_PCIX_533;
893 else if (slot_avail2 & SLOT_133MHZ_PCIX_266) 884 else if (slot_avail2 & SLOT_133MHZ_PCIX_266)
894 bus_speed = PCI_SPEED_133MHz_PCIX_266; 885 bus_speed = PCI_SPEED_133MHz_PCIX_266;
895 else if (slot_avail2 & SLOT_100MHZ_PCIX_266) 886 else if (slot_avail2 & SLOT_100MHZ_PCIX_266)
896 bus_speed = PCI_SPEED_100MHz_PCIX_266; 887 bus_speed = PCI_SPEED_100MHz_PCIX_266;
897 else if (slot_avail2 & SLOT_66MHZ_PCIX_266) 888 else if (slot_avail2 & SLOT_66MHZ_PCIX_266)
898 bus_speed = PCI_SPEED_66MHz_PCIX_266; 889 bus_speed = PCI_SPEED_66MHz_PCIX_266;
899 } 890 }
900 891
901 if (bus_speed == PCI_SPEED_UNKNOWN) { 892 if (bus_speed == PCI_SPEED_UNKNOWN) {
902 if (slot_avail1 & SLOT_133MHZ_PCIX) 893 if (slot_avail1 & SLOT_133MHZ_PCIX)
903 bus_speed = PCI_SPEED_133MHz_PCIX; 894 bus_speed = PCI_SPEED_133MHz_PCIX;
904 else if (slot_avail1 & SLOT_100MHZ_PCIX) 895 else if (slot_avail1 & SLOT_100MHZ_PCIX)
905 bus_speed = PCI_SPEED_100MHz_PCIX; 896 bus_speed = PCI_SPEED_100MHz_PCIX;
906 else if (slot_avail1 & SLOT_66MHZ_PCIX) 897 else if (slot_avail1 & SLOT_66MHZ_PCIX)
907 bus_speed = PCI_SPEED_66MHz_PCIX; 898 bus_speed = PCI_SPEED_66MHz_PCIX;
908 else if (slot_avail2 & SLOT_66MHZ) 899 else if (slot_avail2 & SLOT_66MHZ)
909 bus_speed = PCI_SPEED_66MHz; 900 bus_speed = PCI_SPEED_66MHz;
910 else if (slot_avail1 & SLOT_33MHZ) 901 else if (slot_avail1 & SLOT_33MHZ)
911 bus_speed = PCI_SPEED_33MHz; 902 bus_speed = PCI_SPEED_33MHz;
912 else 903 else
913 retval = -ENODEV; 904 retval = -ENODEV;
914 } 905 }
915 906
916 bus->max_bus_speed = bus_speed; 907 bus->max_bus_speed = bus_speed;
917 ctrl_dbg(ctrl, "Max bus speed = %d\n", bus_speed); 908 ctrl_dbg(ctrl, "Max bus speed = %d\n", bus_speed);
918 909
919 return retval; 910 return retval;
920 } 911 }
921 912
922 static struct hpc_ops shpchp_hpc_ops = { 913 static struct hpc_ops shpchp_hpc_ops = {
923 .power_on_slot = hpc_power_on_slot, 914 .power_on_slot = hpc_power_on_slot,
924 .slot_enable = hpc_slot_enable, 915 .slot_enable = hpc_slot_enable,
925 .slot_disable = hpc_slot_disable, 916 .slot_disable = hpc_slot_disable,
926 .set_bus_speed_mode = hpc_set_bus_speed_mode, 917 .set_bus_speed_mode = hpc_set_bus_speed_mode,
927 .set_attention_status = hpc_set_attention_status, 918 .set_attention_status = hpc_set_attention_status,
928 .get_power_status = hpc_get_power_status, 919 .get_power_status = hpc_get_power_status,
929 .get_attention_status = hpc_get_attention_status, 920 .get_attention_status = hpc_get_attention_status,
930 .get_latch_status = hpc_get_latch_status, 921 .get_latch_status = hpc_get_latch_status,
931 .get_adapter_status = hpc_get_adapter_status, 922 .get_adapter_status = hpc_get_adapter_status,
932 923
933 .get_adapter_speed = hpc_get_adapter_speed, 924 .get_adapter_speed = hpc_get_adapter_speed,
934 .get_mode1_ECC_cap = hpc_get_mode1_ECC_cap, 925 .get_mode1_ECC_cap = hpc_get_mode1_ECC_cap,
935 .get_prog_int = hpc_get_prog_int, 926 .get_prog_int = hpc_get_prog_int,
936 927
937 .query_power_fault = hpc_query_power_fault, 928 .query_power_fault = hpc_query_power_fault,
938 .green_led_on = hpc_set_green_led_on, 929 .green_led_on = hpc_set_green_led_on,
939 .green_led_off = hpc_set_green_led_off, 930 .green_led_off = hpc_set_green_led_off,
940 .green_led_blink = hpc_set_green_led_blink, 931 .green_led_blink = hpc_set_green_led_blink,
941 932
942 .release_ctlr = hpc_release_ctlr, 933 .release_ctlr = hpc_release_ctlr,
943 }; 934 };
944 935
945 int shpc_init(struct controller *ctrl, struct pci_dev *pdev) 936 int shpc_init(struct controller *ctrl, struct pci_dev *pdev)
946 { 937 {
947 int rc = -1, num_slots = 0; 938 int rc = -1, num_slots = 0;
948 u8 hp_slot; 939 u8 hp_slot;
949 u32 shpc_base_offset; 940 u32 shpc_base_offset;
950 u32 tempdword, slot_reg, slot_config; 941 u32 tempdword, slot_reg, slot_config;
951 u8 i; 942 u8 i;
952 943
953 ctrl->pci_dev = pdev; /* pci_dev of the P2P bridge */ 944 ctrl->pci_dev = pdev; /* pci_dev of the P2P bridge */
954 ctrl_dbg(ctrl, "Hotplug Controller:\n"); 945 ctrl_dbg(ctrl, "Hotplug Controller:\n");
955 946
956 if ((pdev->vendor == PCI_VENDOR_ID_AMD) || (pdev->device == 947 if ((pdev->vendor == PCI_VENDOR_ID_AMD) || (pdev->device ==
957 PCI_DEVICE_ID_AMD_GOLAM_7450)) { 948 PCI_DEVICE_ID_AMD_GOLAM_7450)) {
958 /* amd shpc driver doesn't use Base Offset; assume 0 */ 949 /* amd shpc driver doesn't use Base Offset; assume 0 */
959 ctrl->mmio_base = pci_resource_start(pdev, 0); 950 ctrl->mmio_base = pci_resource_start(pdev, 0);
960 ctrl->mmio_size = pci_resource_len(pdev, 0); 951 ctrl->mmio_size = pci_resource_len(pdev, 0);
961 } else { 952 } else {
962 ctrl->cap_offset = pci_find_capability(pdev, PCI_CAP_ID_SHPC); 953 ctrl->cap_offset = pci_find_capability(pdev, PCI_CAP_ID_SHPC);
963 if (!ctrl->cap_offset) { 954 if (!ctrl->cap_offset) {
964 ctrl_err(ctrl, "Cannot find PCI capability\n"); 955 ctrl_err(ctrl, "Cannot find PCI capability\n");
965 goto abort; 956 goto abort;
966 } 957 }
967 ctrl_dbg(ctrl, " cap_offset = %x\n", ctrl->cap_offset); 958 ctrl_dbg(ctrl, " cap_offset = %x\n", ctrl->cap_offset);
968 959
969 rc = shpc_indirect_read(ctrl, 0, &shpc_base_offset); 960 rc = shpc_indirect_read(ctrl, 0, &shpc_base_offset);
970 if (rc) { 961 if (rc) {
971 ctrl_err(ctrl, "Cannot read base_offset\n"); 962 ctrl_err(ctrl, "Cannot read base_offset\n");
972 goto abort; 963 goto abort;
973 } 964 }
974 965
975 rc = shpc_indirect_read(ctrl, 3, &tempdword); 966 rc = shpc_indirect_read(ctrl, 3, &tempdword);
976 if (rc) { 967 if (rc) {
977 ctrl_err(ctrl, "Cannot read slot config\n"); 968 ctrl_err(ctrl, "Cannot read slot config\n");
978 goto abort; 969 goto abort;
979 } 970 }
980 num_slots = tempdword & SLOT_NUM; 971 num_slots = tempdword & SLOT_NUM;
981 ctrl_dbg(ctrl, " num_slots (indirect) %x\n", num_slots); 972 ctrl_dbg(ctrl, " num_slots (indirect) %x\n", num_slots);
982 973
983 for (i = 0; i < 9 + num_slots; i++) { 974 for (i = 0; i < 9 + num_slots; i++) {
984 rc = shpc_indirect_read(ctrl, i, &tempdword); 975 rc = shpc_indirect_read(ctrl, i, &tempdword);
985 if (rc) { 976 if (rc) {
986 ctrl_err(ctrl, 977 ctrl_err(ctrl,
987 "Cannot read creg (index = %d)\n", i); 978 "Cannot read creg (index = %d)\n", i);
988 goto abort; 979 goto abort;
989 } 980 }
990 ctrl_dbg(ctrl, " offset %d: value %x\n", i, tempdword); 981 ctrl_dbg(ctrl, " offset %d: value %x\n", i, tempdword);
991 } 982 }
992 983
993 ctrl->mmio_base = 984 ctrl->mmio_base =
994 pci_resource_start(pdev, 0) + shpc_base_offset; 985 pci_resource_start(pdev, 0) + shpc_base_offset;
995 ctrl->mmio_size = 0x24 + 0x4 * num_slots; 986 ctrl->mmio_size = 0x24 + 0x4 * num_slots;
996 } 987 }
997 988
998 ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n", 989 ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n",
999 pdev->vendor, pdev->device, pdev->subsystem_vendor, 990 pdev->vendor, pdev->device, pdev->subsystem_vendor,
1000 pdev->subsystem_device); 991 pdev->subsystem_device);
1001 992
1002 rc = pci_enable_device(pdev); 993 rc = pci_enable_device(pdev);
1003 if (rc) { 994 if (rc) {
1004 ctrl_err(ctrl, "pci_enable_device failed\n"); 995 ctrl_err(ctrl, "pci_enable_device failed\n");
1005 goto abort; 996 goto abort;
1006 } 997 }
1007 998
1008 if (!request_mem_region(ctrl->mmio_base, ctrl->mmio_size, MY_NAME)) { 999 if (!request_mem_region(ctrl->mmio_base, ctrl->mmio_size, MY_NAME)) {
1009 ctrl_err(ctrl, "Cannot reserve MMIO region\n"); 1000 ctrl_err(ctrl, "Cannot reserve MMIO region\n");
1010 rc = -1; 1001 rc = -1;
1011 goto abort; 1002 goto abort;
1012 } 1003 }
1013 1004
1014 ctrl->creg = ioremap(ctrl->mmio_base, ctrl->mmio_size); 1005 ctrl->creg = ioremap(ctrl->mmio_base, ctrl->mmio_size);
1015 if (!ctrl->creg) { 1006 if (!ctrl->creg) {
1016 ctrl_err(ctrl, "Cannot remap MMIO region %lx @ %lx\n", 1007 ctrl_err(ctrl, "Cannot remap MMIO region %lx @ %lx\n",
1017 ctrl->mmio_size, ctrl->mmio_base); 1008 ctrl->mmio_size, ctrl->mmio_base);
1018 release_mem_region(ctrl->mmio_base, ctrl->mmio_size); 1009 release_mem_region(ctrl->mmio_base, ctrl->mmio_size);
1019 rc = -1; 1010 rc = -1;
1020 goto abort; 1011 goto abort;
1021 } 1012 }
1022 ctrl_dbg(ctrl, "ctrl->creg %p\n", ctrl->creg); 1013 ctrl_dbg(ctrl, "ctrl->creg %p\n", ctrl->creg);
1023 1014
1024 mutex_init(&ctrl->crit_sect); 1015 mutex_init(&ctrl->crit_sect);
1025 mutex_init(&ctrl->cmd_lock); 1016 mutex_init(&ctrl->cmd_lock);
1026 1017
1027 /* Setup wait queue */ 1018 /* Setup wait queue */
1028 init_waitqueue_head(&ctrl->queue); 1019 init_waitqueue_head(&ctrl->queue);
1029 1020
1030 ctrl->hpc_ops = &shpchp_hpc_ops; 1021 ctrl->hpc_ops = &shpchp_hpc_ops;
1031 1022
1032 /* Return PCI Controller Info */ 1023 /* Return PCI Controller Info */
1033 slot_config = shpc_readl(ctrl, SLOT_CONFIG); 1024 slot_config = shpc_readl(ctrl, SLOT_CONFIG);
1034 ctrl->slot_device_offset = (slot_config & FIRST_DEV_NUM) >> 8; 1025 ctrl->slot_device_offset = (slot_config & FIRST_DEV_NUM) >> 8;
1035 ctrl->num_slots = slot_config & SLOT_NUM; 1026 ctrl->num_slots = slot_config & SLOT_NUM;
1036 ctrl->first_slot = (slot_config & PSN) >> 16; 1027 ctrl->first_slot = (slot_config & PSN) >> 16;
1037 ctrl->slot_num_inc = ((slot_config & UPDOWN) >> 29) ? 1 : -1; 1028 ctrl->slot_num_inc = ((slot_config & UPDOWN) >> 29) ? 1 : -1;
1038 1029
1039 /* Mask Global Interrupt Mask & Command Complete Interrupt Mask */ 1030 /* Mask Global Interrupt Mask & Command Complete Interrupt Mask */
1040 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE); 1031 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE);
1041 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword); 1032 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword);
1042 tempdword |= (GLOBAL_INTR_MASK | GLOBAL_SERR_MASK | 1033 tempdword |= (GLOBAL_INTR_MASK | GLOBAL_SERR_MASK |
1043 COMMAND_INTR_MASK | ARBITER_SERR_MASK); 1034 COMMAND_INTR_MASK | ARBITER_SERR_MASK);
1044 tempdword &= ~SERR_INTR_RSVDZ_MASK; 1035 tempdword &= ~SERR_INTR_RSVDZ_MASK;
1045 shpc_writel(ctrl, SERR_INTR_ENABLE, tempdword); 1036 shpc_writel(ctrl, SERR_INTR_ENABLE, tempdword);
1046 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE); 1037 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE);
1047 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword); 1038 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword);
1048 1039
1049 /* Mask the MRL sensor SERR Mask of individual slot in 1040 /* Mask the MRL sensor SERR Mask of individual slot in
1050 * Slot SERR-INT Mask & clear all the existing event if any 1041 * Slot SERR-INT Mask & clear all the existing event if any
1051 */ 1042 */
1052 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) { 1043 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) {
1053 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot)); 1044 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot));
1054 ctrl_dbg(ctrl, "Default Logical Slot Register %d value %x\n", 1045 ctrl_dbg(ctrl, "Default Logical Slot Register %d value %x\n",
1055 hp_slot, slot_reg); 1046 hp_slot, slot_reg);
1056 slot_reg |= (PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK | 1047 slot_reg |= (PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK |
1057 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK | 1048 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK |
1058 CON_PFAULT_INTR_MASK | MRL_CHANGE_SERR_MASK | 1049 CON_PFAULT_INTR_MASK | MRL_CHANGE_SERR_MASK |
1059 CON_PFAULT_SERR_MASK); 1050 CON_PFAULT_SERR_MASK);
1060 slot_reg &= ~SLOT_REG_RSVDZ_MASK; 1051 slot_reg &= ~SLOT_REG_RSVDZ_MASK;
1061 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg); 1052 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg);
1062 } 1053 }
1063 1054
1064 if (shpchp_poll_mode) { 1055 if (shpchp_poll_mode) {
1065 /* Install interrupt polling timer. Start with 10 sec delay */ 1056 /* Install interrupt polling timer. Start with 10 sec delay */
1066 init_timer(&ctrl->poll_timer); 1057 init_timer(&ctrl->poll_timer);
1067 start_int_poll_timer(ctrl, 10); 1058 start_int_poll_timer(ctrl, 10);
1068 } else { 1059 } else {
1069 /* Installs the interrupt handler */ 1060 /* Installs the interrupt handler */
1070 rc = pci_enable_msi(pdev); 1061 rc = pci_enable_msi(pdev);
1071 if (rc) { 1062 if (rc) {
1072 ctrl_info(ctrl, 1063 ctrl_info(ctrl,
1073 "Can't get msi for the hotplug controller\n"); 1064 "Can't get msi for the hotplug controller\n");
1074 ctrl_info(ctrl, 1065 ctrl_info(ctrl,
1075 "Use INTx for the hotplug controller\n"); 1066 "Use INTx for the hotplug controller\n");
1076 } 1067 }
1077 1068
1078 rc = request_irq(ctrl->pci_dev->irq, shpc_isr, IRQF_SHARED, 1069 rc = request_irq(ctrl->pci_dev->irq, shpc_isr, IRQF_SHARED,
1079 MY_NAME, (void *)ctrl); 1070 MY_NAME, (void *)ctrl);
1080 ctrl_dbg(ctrl, "request_irq %d for hpc%d (returns %d)\n", 1071 ctrl_dbg(ctrl, "request_irq %d (returns %d)\n",
1081 ctrl->pci_dev->irq, 1072 ctrl->pci_dev->irq, rc);
1082 atomic_read(&shpchp_num_controllers), rc);
1083 if (rc) { 1073 if (rc) {
1084 ctrl_err(ctrl, "Can't get irq %d for the hotplug " 1074 ctrl_err(ctrl, "Can't get irq %d for the hotplug "
1085 "controller\n", ctrl->pci_dev->irq); 1075 "controller\n", ctrl->pci_dev->irq);
1086 goto abort_iounmap; 1076 goto abort_iounmap;
1087 } 1077 }
1088 } 1078 }
1089 ctrl_dbg(ctrl, "HPC at %s irq=%x\n", pci_name(pdev), pdev->irq); 1079 ctrl_dbg(ctrl, "HPC at %s irq=%x\n", pci_name(pdev), pdev->irq);
1090 1080
1091 shpc_get_max_bus_speed(ctrl); 1081 shpc_get_max_bus_speed(ctrl);
1092 shpc_get_cur_bus_speed(ctrl); 1082 shpc_get_cur_bus_speed(ctrl);
1093
1094 /*
1095 * If this is the first controller to be initialized,
1096 * initialize the shpchpd work queue
1097 */
1098 if (atomic_add_return(1, &shpchp_num_controllers) == 1) {
1099 shpchp_wq = create_singlethread_workqueue("shpchpd");
1100 if (!shpchp_wq) {
1101 rc = -ENOMEM;
1102 goto abort_iounmap;
1103 }
1104 }
1105 1083
1106 /* 1084 /*
1107 * Unmask all event interrupts of all slots 1085 * Unmask all event interrupts of all slots
1108 */ 1086 */
1109 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) { 1087 for (hp_slot = 0; hp_slot < ctrl->num_slots; hp_slot++) {
1110 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot)); 1088 slot_reg = shpc_readl(ctrl, SLOT_REG(hp_slot));
1111 ctrl_dbg(ctrl, "Default Logical Slot Register %d value %x\n", 1089 ctrl_dbg(ctrl, "Default Logical Slot Register %d value %x\n",
1112 hp_slot, slot_reg); 1090 hp_slot, slot_reg);
1113 slot_reg &= ~(PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK | 1091 slot_reg &= ~(PRSNT_CHANGE_INTR_MASK | ISO_PFAULT_INTR_MASK |
1114 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK | 1092 BUTTON_PRESS_INTR_MASK | MRL_CHANGE_INTR_MASK |
1115 CON_PFAULT_INTR_MASK | SLOT_REG_RSVDZ_MASK); 1093 CON_PFAULT_INTR_MASK | SLOT_REG_RSVDZ_MASK);
1116 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg); 1094 shpc_writel(ctrl, SLOT_REG(hp_slot), slot_reg);
1117 } 1095 }
1118 if (!shpchp_poll_mode) { 1096 if (!shpchp_poll_mode) {
1119 /* Unmask all general input interrupts and SERR */ 1097 /* Unmask all general input interrupts and SERR */
1120 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE); 1098 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE);
1121 tempdword &= ~(GLOBAL_INTR_MASK | COMMAND_INTR_MASK | 1099 tempdword &= ~(GLOBAL_INTR_MASK | COMMAND_INTR_MASK |
1122 SERR_INTR_RSVDZ_MASK); 1100 SERR_INTR_RSVDZ_MASK);
1123 shpc_writel(ctrl, SERR_INTR_ENABLE, tempdword); 1101 shpc_writel(ctrl, SERR_INTR_ENABLE, tempdword);
1124 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE); 1102 tempdword = shpc_readl(ctrl, SERR_INTR_ENABLE);
1125 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword); 1103 ctrl_dbg(ctrl, "SERR_INTR_ENABLE = %x\n", tempdword);
1126 } 1104 }
1127 1105
1128 return 0; 1106 return 0;
1129 1107
1130 /* We end up here for the many possible ways to fail this API. */ 1108 /* We end up here for the many possible ways to fail this API. */
1131 abort_iounmap: 1109 abort_iounmap:
1132 iounmap(ctrl->creg); 1110 iounmap(ctrl->creg);
1133 abort: 1111 abort:
1134 return rc; 1112 return rc;
1135 } 1113 }
1136 1114
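Taken together, the two blocks deleted above (the destroy_workqueue() call in hpc_release_ctlr() and the create_singlethread_workqueue() block in shpc_init()) drop the atomic controller count that shpchp_hpc.c used to manage the lifetime of shpchp_wq. What follows is a minimal sketch, not the actual shpchp_core.c change (which is not shown here), of the pattern this series makes available: one ordered workqueue allocated at module init with alloc_ordered_workqueue() and destroyed at module exit. The function names and the "shpchpd" queue name are illustrative assumptions.

	#include <linux/init.h>
	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *shpchp_wq;	/* assumed module-level handle */

	static int __init shpchp_example_init(void)
	{
		/* One ordered queue for the whole driver, created once at load. */
		shpchp_wq = alloc_ordered_workqueue("shpchpd", 0);
		if (!shpchp_wq)
			return -ENOMEM;
		return 0;
	}

	static void __exit shpchp_example_exit(void)
	{
		/* destroy_workqueue() drains any queued work before freeing. */
		destroy_workqueue(shpchp_wq);
	}

	module_init(shpchp_example_init);
	module_exit(shpchp_example_exit);

Call sites that queue work on shpchp_wq would not need to change under this model; only the setup and teardown sites move.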
fs/gfs2/main.c
1 /* 1 /*
2 * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. 2 * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved.
3 * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. 3 * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved.
4 * 4 *
5 * This copyrighted material is made available to anyone wishing to use, 5 * This copyrighted material is made available to anyone wishing to use,
6 * modify, copy, or redistribute it subject to the terms and conditions 6 * modify, copy, or redistribute it subject to the terms and conditions
7 * of the GNU General Public License version 2. 7 * of the GNU General Public License version 2.
8 */ 8 */
9 9
10 #include <linux/slab.h> 10 #include <linux/slab.h>
11 #include <linux/spinlock.h> 11 #include <linux/spinlock.h>
12 #include <linux/completion.h> 12 #include <linux/completion.h>
13 #include <linux/buffer_head.h> 13 #include <linux/buffer_head.h>
14 #include <linux/module.h> 14 #include <linux/module.h>
15 #include <linux/init.h> 15 #include <linux/init.h>
16 #include <linux/gfs2_ondisk.h> 16 #include <linux/gfs2_ondisk.h>
17 #include <asm/atomic.h> 17 #include <asm/atomic.h>
18 18
19 #include "gfs2.h" 19 #include "gfs2.h"
20 #include "incore.h" 20 #include "incore.h"
21 #include "super.h" 21 #include "super.h"
22 #include "sys.h" 22 #include "sys.h"
23 #include "util.h" 23 #include "util.h"
24 #include "glock.h" 24 #include "glock.h"
25 #include "quota.h" 25 #include "quota.h"
26 #include "recovery.h" 26 #include "recovery.h"
27 #include "dir.h" 27 #include "dir.h"
28 28
29 static struct shrinker qd_shrinker = { 29 static struct shrinker qd_shrinker = {
30 .shrink = gfs2_shrink_qd_memory, 30 .shrink = gfs2_shrink_qd_memory,
31 .seeks = DEFAULT_SEEKS, 31 .seeks = DEFAULT_SEEKS,
32 }; 32 };
33 33
34 static void gfs2_init_inode_once(void *foo) 34 static void gfs2_init_inode_once(void *foo)
35 { 35 {
36 struct gfs2_inode *ip = foo; 36 struct gfs2_inode *ip = foo;
37 37
38 inode_init_once(&ip->i_inode); 38 inode_init_once(&ip->i_inode);
39 init_rwsem(&ip->i_rw_mutex); 39 init_rwsem(&ip->i_rw_mutex);
40 INIT_LIST_HEAD(&ip->i_trunc_list); 40 INIT_LIST_HEAD(&ip->i_trunc_list);
41 ip->i_alloc = NULL; 41 ip->i_alloc = NULL;
42 } 42 }
43 43
44 static void gfs2_init_glock_once(void *foo) 44 static void gfs2_init_glock_once(void *foo)
45 { 45 {
46 struct gfs2_glock *gl = foo; 46 struct gfs2_glock *gl = foo;
47 47
48 INIT_HLIST_NODE(&gl->gl_list); 48 INIT_HLIST_NODE(&gl->gl_list);
49 spin_lock_init(&gl->gl_spin); 49 spin_lock_init(&gl->gl_spin);
50 INIT_LIST_HEAD(&gl->gl_holders); 50 INIT_LIST_HEAD(&gl->gl_holders);
51 INIT_LIST_HEAD(&gl->gl_lru); 51 INIT_LIST_HEAD(&gl->gl_lru);
52 INIT_LIST_HEAD(&gl->gl_ail_list); 52 INIT_LIST_HEAD(&gl->gl_ail_list);
53 atomic_set(&gl->gl_ail_count, 0); 53 atomic_set(&gl->gl_ail_count, 0);
54 } 54 }
55 55
56 static void gfs2_init_gl_aspace_once(void *foo) 56 static void gfs2_init_gl_aspace_once(void *foo)
57 { 57 {
58 struct gfs2_glock *gl = foo; 58 struct gfs2_glock *gl = foo;
59 struct address_space *mapping = (struct address_space *)(gl + 1); 59 struct address_space *mapping = (struct address_space *)(gl + 1);
60 60
61 gfs2_init_glock_once(gl); 61 gfs2_init_glock_once(gl);
62 memset(mapping, 0, sizeof(*mapping)); 62 memset(mapping, 0, sizeof(*mapping));
63 INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC); 63 INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
64 spin_lock_init(&mapping->tree_lock); 64 spin_lock_init(&mapping->tree_lock);
65 spin_lock_init(&mapping->i_mmap_lock); 65 spin_lock_init(&mapping->i_mmap_lock);
66 INIT_LIST_HEAD(&mapping->private_list); 66 INIT_LIST_HEAD(&mapping->private_list);
67 spin_lock_init(&mapping->private_lock); 67 spin_lock_init(&mapping->private_lock);
68 INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap); 68 INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
69 INIT_LIST_HEAD(&mapping->i_mmap_nonlinear); 69 INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
70 } 70 }
71 71
72 /** 72 /**
73 * init_gfs2_fs - Register GFS2 as a filesystem 73 * init_gfs2_fs - Register GFS2 as a filesystem
74 * 74 *
75 * Returns: 0 on success, error code on failure 75 * Returns: 0 on success, error code on failure
76 */ 76 */
77 77
78 static int __init init_gfs2_fs(void) 78 static int __init init_gfs2_fs(void)
79 { 79 {
80 int error; 80 int error;
81 81
82 gfs2_str2qstr(&gfs2_qdot, "."); 82 gfs2_str2qstr(&gfs2_qdot, ".");
83 gfs2_str2qstr(&gfs2_qdotdot, ".."); 83 gfs2_str2qstr(&gfs2_qdotdot, "..");
84 84
85 error = gfs2_sys_init(); 85 error = gfs2_sys_init();
86 if (error) 86 if (error)
87 return error; 87 return error;
88 88
89 error = gfs2_glock_init(); 89 error = gfs2_glock_init();
90 if (error) 90 if (error)
91 goto fail; 91 goto fail;
92 92
93 error = -ENOMEM; 93 error = -ENOMEM;
94 gfs2_glock_cachep = kmem_cache_create("gfs2_glock", 94 gfs2_glock_cachep = kmem_cache_create("gfs2_glock",
95 sizeof(struct gfs2_glock), 95 sizeof(struct gfs2_glock),
96 0, 0, 96 0, 0,
97 gfs2_init_glock_once); 97 gfs2_init_glock_once);
98 if (!gfs2_glock_cachep) 98 if (!gfs2_glock_cachep)
99 goto fail; 99 goto fail;
100 100
101 gfs2_glock_aspace_cachep = kmem_cache_create("gfs2_glock(aspace)", 101 gfs2_glock_aspace_cachep = kmem_cache_create("gfs2_glock(aspace)",
102 sizeof(struct gfs2_glock) + 102 sizeof(struct gfs2_glock) +
103 sizeof(struct address_space), 103 sizeof(struct address_space),
104 0, 0, gfs2_init_gl_aspace_once); 104 0, 0, gfs2_init_gl_aspace_once);
105 105
106 if (!gfs2_glock_aspace_cachep) 106 if (!gfs2_glock_aspace_cachep)
107 goto fail; 107 goto fail;
108 108
109 gfs2_inode_cachep = kmem_cache_create("gfs2_inode", 109 gfs2_inode_cachep = kmem_cache_create("gfs2_inode",
110 sizeof(struct gfs2_inode), 110 sizeof(struct gfs2_inode),
111 0, SLAB_RECLAIM_ACCOUNT| 111 0, SLAB_RECLAIM_ACCOUNT|
112 SLAB_MEM_SPREAD, 112 SLAB_MEM_SPREAD,
113 gfs2_init_inode_once); 113 gfs2_init_inode_once);
114 if (!gfs2_inode_cachep) 114 if (!gfs2_inode_cachep)
115 goto fail; 115 goto fail;
116 116
117 gfs2_bufdata_cachep = kmem_cache_create("gfs2_bufdata", 117 gfs2_bufdata_cachep = kmem_cache_create("gfs2_bufdata",
118 sizeof(struct gfs2_bufdata), 118 sizeof(struct gfs2_bufdata),
119 0, 0, NULL); 119 0, 0, NULL);
120 if (!gfs2_bufdata_cachep) 120 if (!gfs2_bufdata_cachep)
121 goto fail; 121 goto fail;
122 122
123 gfs2_rgrpd_cachep = kmem_cache_create("gfs2_rgrpd", 123 gfs2_rgrpd_cachep = kmem_cache_create("gfs2_rgrpd",
124 sizeof(struct gfs2_rgrpd), 124 sizeof(struct gfs2_rgrpd),
125 0, 0, NULL); 125 0, 0, NULL);
126 if (!gfs2_rgrpd_cachep) 126 if (!gfs2_rgrpd_cachep)
127 goto fail; 127 goto fail;
128 128
129 gfs2_quotad_cachep = kmem_cache_create("gfs2_quotad", 129 gfs2_quotad_cachep = kmem_cache_create("gfs2_quotad",
130 sizeof(struct gfs2_quota_data), 130 sizeof(struct gfs2_quota_data),
131 0, 0, NULL); 131 0, 0, NULL);
132 if (!gfs2_quotad_cachep) 132 if (!gfs2_quotad_cachep)
133 goto fail; 133 goto fail;
134 134
135 register_shrinker(&qd_shrinker); 135 register_shrinker(&qd_shrinker);
136 136
137 error = register_filesystem(&gfs2_fs_type); 137 error = register_filesystem(&gfs2_fs_type);
138 if (error) 138 if (error)
139 goto fail; 139 goto fail;
140 140
141 error = register_filesystem(&gfs2meta_fs_type); 141 error = register_filesystem(&gfs2meta_fs_type);
142 if (error) 142 if (error)
143 goto fail_unregister; 143 goto fail_unregister;
144 144
145 error = -ENOMEM; 145 error = -ENOMEM;
146 gfs_recovery_wq = alloc_workqueue("gfs_recovery", 146 gfs_recovery_wq = alloc_workqueue("gfs_recovery",
147 WQ_RESCUER | WQ_FREEZEABLE, 0); 147 WQ_MEM_RECLAIM | WQ_FREEZEABLE, 0);
148 if (!gfs_recovery_wq) 148 if (!gfs_recovery_wq)
149 goto fail_wq; 149 goto fail_wq;
150 150
151 gfs2_register_debugfs(); 151 gfs2_register_debugfs();
152 152
153 printk("GFS2 (built %s %s) installed\n", __DATE__, __TIME__); 153 printk("GFS2 (built %s %s) installed\n", __DATE__, __TIME__);
154 154
155 return 0; 155 return 0;
156 156
157 fail_wq: 157 fail_wq:
158 unregister_filesystem(&gfs2meta_fs_type); 158 unregister_filesystem(&gfs2meta_fs_type);
159 fail_unregister: 159 fail_unregister:
160 unregister_filesystem(&gfs2_fs_type); 160 unregister_filesystem(&gfs2_fs_type);
161 fail: 161 fail:
162 unregister_shrinker(&qd_shrinker); 162 unregister_shrinker(&qd_shrinker);
163 gfs2_glock_exit(); 163 gfs2_glock_exit();
164 164
165 if (gfs2_quotad_cachep) 165 if (gfs2_quotad_cachep)
166 kmem_cache_destroy(gfs2_quotad_cachep); 166 kmem_cache_destroy(gfs2_quotad_cachep);
167 167
168 if (gfs2_rgrpd_cachep) 168 if (gfs2_rgrpd_cachep)
169 kmem_cache_destroy(gfs2_rgrpd_cachep); 169 kmem_cache_destroy(gfs2_rgrpd_cachep);
170 170
171 if (gfs2_bufdata_cachep) 171 if (gfs2_bufdata_cachep)
172 kmem_cache_destroy(gfs2_bufdata_cachep); 172 kmem_cache_destroy(gfs2_bufdata_cachep);
173 173
174 if (gfs2_inode_cachep) 174 if (gfs2_inode_cachep)
175 kmem_cache_destroy(gfs2_inode_cachep); 175 kmem_cache_destroy(gfs2_inode_cachep);
176 176
177 if (gfs2_glock_aspace_cachep) 177 if (gfs2_glock_aspace_cachep)
178 kmem_cache_destroy(gfs2_glock_aspace_cachep); 178 kmem_cache_destroy(gfs2_glock_aspace_cachep);
179 179
180 if (gfs2_glock_cachep) 180 if (gfs2_glock_cachep)
181 kmem_cache_destroy(gfs2_glock_cachep); 181 kmem_cache_destroy(gfs2_glock_cachep);
182 182
183 gfs2_sys_uninit(); 183 gfs2_sys_uninit();
184 return error; 184 return error;
185 } 185 }
186 186
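The alloc_workqueue() hunk in init_gfs2_fs() above is a flag rename only: WQ_RESCUER becomes WQ_MEM_RECLAIM, the name this series settles on for a workqueue that keeps a rescuer thread and may be relied on during memory reclaim. The WQ_FREEZEABLE behaviour and the max_active argument of 0 (meaning the default) are unchanged. Below is a minimal sketch of the same flag combination, with a hypothetical queue name and handle.

	#include <linux/workqueue.h>

	static struct workqueue_struct *example_recovery_wq;	/* hypothetical handle */

	static int example_setup(void)
	{
		/*
		 * WQ_MEM_RECLAIM guarantees a rescuer thread so queued work can
		 * still make progress under memory pressure; WQ_FREEZEABLE
		 * freezes the queue across suspend. A max_active of 0 selects
		 * the default concurrency limit.
		 */
		example_recovery_wq = alloc_workqueue("example_recovery",
						      WQ_MEM_RECLAIM | WQ_FREEZEABLE, 0);
		if (!example_recovery_wq)
			return -ENOMEM;
		return 0;
	}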
187 /** 187 /**
188 * exit_gfs2_fs - Unregister the file system 188 * exit_gfs2_fs - Unregister the file system
189 * 189 *
190 */ 190 */
191 191
192 static void __exit exit_gfs2_fs(void) 192 static void __exit exit_gfs2_fs(void)
193 { 193 {
194 unregister_shrinker(&qd_shrinker); 194 unregister_shrinker(&qd_shrinker);
195 gfs2_glock_exit(); 195 gfs2_glock_exit();
196 gfs2_unregister_debugfs(); 196 gfs2_unregister_debugfs();
197 unregister_filesystem(&gfs2_fs_type); 197 unregister_filesystem(&gfs2_fs_type);
198 unregister_filesystem(&gfs2meta_fs_type); 198 unregister_filesystem(&gfs2meta_fs_type);
199 destroy_workqueue(gfs_recovery_wq); 199 destroy_workqueue(gfs_recovery_wq);
200 200
201 kmem_cache_destroy(gfs2_quotad_cachep); 201 kmem_cache_destroy(gfs2_quotad_cachep);
202 kmem_cache_destroy(gfs2_rgrpd_cachep); 202 kmem_cache_destroy(gfs2_rgrpd_cachep);
203 kmem_cache_destroy(gfs2_bufdata_cachep); 203 kmem_cache_destroy(gfs2_bufdata_cachep);
204 kmem_cache_destroy(gfs2_inode_cachep); 204 kmem_cache_destroy(gfs2_inode_cachep);
205 kmem_cache_destroy(gfs2_glock_aspace_cachep); 205 kmem_cache_destroy(gfs2_glock_aspace_cachep);
206 kmem_cache_destroy(gfs2_glock_cachep); 206 kmem_cache_destroy(gfs2_glock_cachep);
207 207
208 gfs2_sys_uninit(); 208 gfs2_sys_uninit();
209 } 209 }
210 210
211 MODULE_DESCRIPTION("Global File System"); 211 MODULE_DESCRIPTION("Global File System");
212 MODULE_AUTHOR("Red Hat, Inc."); 212 MODULE_AUTHOR("Red Hat, Inc.");
213 MODULE_LICENSE("GPL"); 213 MODULE_LICENSE("GPL");
214 214
215 module_init(init_gfs2_fs); 215 module_init(init_gfs2_fs);
216 module_exit(exit_gfs2_fs); 216 module_exit(exit_gfs2_fs);
217 217
218 218
fs/xfs/linux-2.6/xfs_buf.c
1 /* 1 /*
2 * Copyright (c) 2000-2006 Silicon Graphics, Inc. 2 * Copyright (c) 2000-2006 Silicon Graphics, Inc.
3 * All Rights Reserved. 3 * All Rights Reserved.
4 * 4 *
5 * This program is free software; you can redistribute it and/or 5 * This program is free software; you can redistribute it and/or
6 * modify it under the terms of the GNU General Public License as 6 * modify it under the terms of the GNU General Public License as
7 * published by the Free Software Foundation. 7 * published by the Free Software Foundation.
8 * 8 *
9 * This program is distributed in the hope that it would be useful, 9 * This program is distributed in the hope that it would be useful,
10 * but WITHOUT ANY WARRANTY; without even the implied warranty of 10 * but WITHOUT ANY WARRANTY; without even the implied warranty of
11 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 * GNU General Public License for more details. 12 * GNU General Public License for more details.
13 * 13 *
14 * You should have received a copy of the GNU General Public License 14 * You should have received a copy of the GNU General Public License
15 * along with this program; if not, write the Free Software Foundation, 15 * along with this program; if not, write the Free Software Foundation,
16 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 16 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
17 */ 17 */
18 #include "xfs.h" 18 #include "xfs.h"
19 #include <linux/stddef.h> 19 #include <linux/stddef.h>
20 #include <linux/errno.h> 20 #include <linux/errno.h>
21 #include <linux/gfp.h> 21 #include <linux/gfp.h>
22 #include <linux/pagemap.h> 22 #include <linux/pagemap.h>
23 #include <linux/init.h> 23 #include <linux/init.h>
24 #include <linux/vmalloc.h> 24 #include <linux/vmalloc.h>
25 #include <linux/bio.h> 25 #include <linux/bio.h>
26 #include <linux/sysctl.h> 26 #include <linux/sysctl.h>
27 #include <linux/proc_fs.h> 27 #include <linux/proc_fs.h>
28 #include <linux/workqueue.h> 28 #include <linux/workqueue.h>
29 #include <linux/percpu.h> 29 #include <linux/percpu.h>
30 #include <linux/blkdev.h> 30 #include <linux/blkdev.h>
31 #include <linux/hash.h> 31 #include <linux/hash.h>
32 #include <linux/kthread.h> 32 #include <linux/kthread.h>
33 #include <linux/migrate.h> 33 #include <linux/migrate.h>
34 #include <linux/backing-dev.h> 34 #include <linux/backing-dev.h>
35 #include <linux/freezer.h> 35 #include <linux/freezer.h>
36 #include <linux/list_sort.h> 36 #include <linux/list_sort.h>
37 37
38 #include "xfs_sb.h" 38 #include "xfs_sb.h"
39 #include "xfs_inum.h" 39 #include "xfs_inum.h"
40 #include "xfs_log.h" 40 #include "xfs_log.h"
41 #include "xfs_ag.h" 41 #include "xfs_ag.h"
42 #include "xfs_mount.h" 42 #include "xfs_mount.h"
43 #include "xfs_trace.h" 43 #include "xfs_trace.h"
44 44
45 static kmem_zone_t *xfs_buf_zone; 45 static kmem_zone_t *xfs_buf_zone;
46 STATIC int xfsbufd(void *); 46 STATIC int xfsbufd(void *);
47 STATIC int xfsbufd_wakeup(struct shrinker *, int, gfp_t); 47 STATIC int xfsbufd_wakeup(struct shrinker *, int, gfp_t);
48 STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int); 48 STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
49 static struct shrinker xfs_buf_shake = { 49 static struct shrinker xfs_buf_shake = {
50 .shrink = xfsbufd_wakeup, 50 .shrink = xfsbufd_wakeup,
51 .seeks = DEFAULT_SEEKS, 51 .seeks = DEFAULT_SEEKS,
52 }; 52 };
53 53
54 static struct workqueue_struct *xfslogd_workqueue; 54 static struct workqueue_struct *xfslogd_workqueue;
55 struct workqueue_struct *xfsdatad_workqueue; 55 struct workqueue_struct *xfsdatad_workqueue;
56 struct workqueue_struct *xfsconvertd_workqueue; 56 struct workqueue_struct *xfsconvertd_workqueue;
57 57
58 #ifdef XFS_BUF_LOCK_TRACKING 58 #ifdef XFS_BUF_LOCK_TRACKING
59 # define XB_SET_OWNER(bp) ((bp)->b_last_holder = current->pid) 59 # define XB_SET_OWNER(bp) ((bp)->b_last_holder = current->pid)
60 # define XB_CLEAR_OWNER(bp) ((bp)->b_last_holder = -1) 60 # define XB_CLEAR_OWNER(bp) ((bp)->b_last_holder = -1)
61 # define XB_GET_OWNER(bp) ((bp)->b_last_holder) 61 # define XB_GET_OWNER(bp) ((bp)->b_last_holder)
62 #else 62 #else
63 # define XB_SET_OWNER(bp) do { } while (0) 63 # define XB_SET_OWNER(bp) do { } while (0)
64 # define XB_CLEAR_OWNER(bp) do { } while (0) 64 # define XB_CLEAR_OWNER(bp) do { } while (0)
65 # define XB_GET_OWNER(bp) do { } while (0) 65 # define XB_GET_OWNER(bp) do { } while (0)
66 #endif 66 #endif
67 67
68 #define xb_to_gfp(flags) \ 68 #define xb_to_gfp(flags) \
69 ((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : \ 69 ((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : \
70 ((flags) & XBF_DONT_BLOCK) ? GFP_NOFS : GFP_KERNEL) | __GFP_NOWARN) 70 ((flags) & XBF_DONT_BLOCK) ? GFP_NOFS : GFP_KERNEL) | __GFP_NOWARN)
71 71
72 #define xb_to_km(flags) \ 72 #define xb_to_km(flags) \
73 (((flags) & XBF_DONT_BLOCK) ? KM_NOFS : KM_SLEEP) 73 (((flags) & XBF_DONT_BLOCK) ? KM_NOFS : KM_SLEEP)
74 74
75 #define xfs_buf_allocate(flags) \ 75 #define xfs_buf_allocate(flags) \
76 kmem_zone_alloc(xfs_buf_zone, xb_to_km(flags)) 76 kmem_zone_alloc(xfs_buf_zone, xb_to_km(flags))
77 #define xfs_buf_deallocate(bp) \ 77 #define xfs_buf_deallocate(bp) \
78 kmem_zone_free(xfs_buf_zone, (bp)); 78 kmem_zone_free(xfs_buf_zone, (bp));
79 79
80 static inline int 80 static inline int
81 xfs_buf_is_vmapped( 81 xfs_buf_is_vmapped(
82 struct xfs_buf *bp) 82 struct xfs_buf *bp)
83 { 83 {
84 /* 84 /*
85 * Return true if the buffer is vmapped. 85 * Return true if the buffer is vmapped.
86 * 86 *
87 * The XBF_MAPPED flag is set if the buffer should be mapped, but the 87 * The XBF_MAPPED flag is set if the buffer should be mapped, but the
88 * code is clever enough to know it doesn't have to map a single page, 88 * code is clever enough to know it doesn't have to map a single page,
89 * so the check has to be both for XBF_MAPPED and bp->b_page_count > 1. 89 * so the check has to be both for XBF_MAPPED and bp->b_page_count > 1.
90 */ 90 */
91 return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1; 91 return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
92 } 92 }
93 93
94 static inline int 94 static inline int
95 xfs_buf_vmap_len( 95 xfs_buf_vmap_len(
96 struct xfs_buf *bp) 96 struct xfs_buf *bp)
97 { 97 {
98 return (bp->b_page_count * PAGE_SIZE) - bp->b_offset; 98 return (bp->b_page_count * PAGE_SIZE) - bp->b_offset;
99 } 99 }
100 100
101 /* 101 /*
102 * Page Region interfaces. 102 * Page Region interfaces.
103 * 103 *
104 * For pages in filesystems where the blocksize is smaller than the 104 * For pages in filesystems where the blocksize is smaller than the
105 * pagesize, we use the page->private field (long) to hold a bitmap 105 * pagesize, we use the page->private field (long) to hold a bitmap
106 * of uptodate regions within the page. 106 * of uptodate regions within the page.
107 * 107 *
108 * Each such region is "bytes per page / bits per long" bytes long. 108 * Each such region is "bytes per page / bits per long" bytes long.
109 * 109 *
110 * NBPPR == number-of-bytes-per-page-region 110 * NBPPR == number-of-bytes-per-page-region
111 * BTOPR == bytes-to-page-region (rounded up) 111 * BTOPR == bytes-to-page-region (rounded up)
112 * BTOPRT == bytes-to-page-region-truncated (rounded down) 112 * BTOPRT == bytes-to-page-region-truncated (rounded down)
113 */ 113 */
114 #if (BITS_PER_LONG == 32) 114 #if (BITS_PER_LONG == 32)
115 #define PRSHIFT (PAGE_CACHE_SHIFT - 5) /* (32 == 1<<5) */ 115 #define PRSHIFT (PAGE_CACHE_SHIFT - 5) /* (32 == 1<<5) */
116 #elif (BITS_PER_LONG == 64) 116 #elif (BITS_PER_LONG == 64)
117 #define PRSHIFT (PAGE_CACHE_SHIFT - 6) /* (64 == 1<<6) */ 117 #define PRSHIFT (PAGE_CACHE_SHIFT - 6) /* (64 == 1<<6) */
118 #else 118 #else
119 #error BITS_PER_LONG must be 32 or 64 119 #error BITS_PER_LONG must be 32 or 64
120 #endif 120 #endif
121 #define NBPPR (PAGE_CACHE_SIZE/BITS_PER_LONG) 121 #define NBPPR (PAGE_CACHE_SIZE/BITS_PER_LONG)
122 #define BTOPR(b) (((unsigned int)(b) + (NBPPR - 1)) >> PRSHIFT) 122 #define BTOPR(b) (((unsigned int)(b) + (NBPPR - 1)) >> PRSHIFT)
123 #define BTOPRT(b) (((unsigned int)(b) >> PRSHIFT)) 123 #define BTOPRT(b) (((unsigned int)(b) >> PRSHIFT))
124 124
125 STATIC unsigned long 125 STATIC unsigned long
126 page_region_mask( 126 page_region_mask(
127 size_t offset, 127 size_t offset,
128 size_t length) 128 size_t length)
129 { 129 {
130 unsigned long mask; 130 unsigned long mask;
131 int first, final; 131 int first, final;
132 132
133 first = BTOPR(offset); 133 first = BTOPR(offset);
134 final = BTOPRT(offset + length - 1); 134 final = BTOPRT(offset + length - 1);
135 first = min(first, final); 135 first = min(first, final);
136 136
137 mask = ~0UL; 137 mask = ~0UL;
138 mask <<= BITS_PER_LONG - (final - first); 138 mask <<= BITS_PER_LONG - (final - first);
139 mask >>= BITS_PER_LONG - (final); 139 mask >>= BITS_PER_LONG - (final);
140 140
141 ASSERT(offset + length <= PAGE_CACHE_SIZE); 141 ASSERT(offset + length <= PAGE_CACHE_SIZE);
142 ASSERT((final - first) < BITS_PER_LONG && (final - first) >= 0); 142 ASSERT((final - first) < BITS_PER_LONG && (final - first) >= 0);
143 143
144 return mask; 144 return mask;
145 } 145 }
146 146
147 STATIC void 147 STATIC void
148 set_page_region( 148 set_page_region(
149 struct page *page, 149 struct page *page,
150 size_t offset, 150 size_t offset,
151 size_t length) 151 size_t length)
152 { 152 {
153 set_page_private(page, 153 set_page_private(page,
154 page_private(page) | page_region_mask(offset, length)); 154 page_private(page) | page_region_mask(offset, length));
155 if (page_private(page) == ~0UL) 155 if (page_private(page) == ~0UL)
156 SetPageUptodate(page); 156 SetPageUptodate(page);
157 } 157 }
158 158
159 STATIC int 159 STATIC int
160 test_page_region( 160 test_page_region(
161 struct page *page, 161 struct page *page,
162 size_t offset, 162 size_t offset,
163 size_t length) 163 size_t length)
164 { 164 {
165 unsigned long mask = page_region_mask(offset, length); 165 unsigned long mask = page_region_mask(offset, length);
166 166
167 return (mask && (page_private(page) & mask) == mask); 167 return (mask && (page_private(page) & mask) == mask);
168 } 168 }
169 169
170 /* 170 /*
171 * Internal xfs_buf_t object manipulation 171 * Internal xfs_buf_t object manipulation
172 */ 172 */
173 173
174 STATIC void 174 STATIC void
175 _xfs_buf_initialize( 175 _xfs_buf_initialize(
176 xfs_buf_t *bp, 176 xfs_buf_t *bp,
177 xfs_buftarg_t *target, 177 xfs_buftarg_t *target,
178 xfs_off_t range_base, 178 xfs_off_t range_base,
179 size_t range_length, 179 size_t range_length,
180 xfs_buf_flags_t flags) 180 xfs_buf_flags_t flags)
181 { 181 {
182 /* 182 /*
183 * We don't want certain flags to appear in b_flags. 183 * We don't want certain flags to appear in b_flags.
184 */ 184 */
185 flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD); 185 flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
186 186
187 memset(bp, 0, sizeof(xfs_buf_t)); 187 memset(bp, 0, sizeof(xfs_buf_t));
188 atomic_set(&bp->b_hold, 1); 188 atomic_set(&bp->b_hold, 1);
189 init_completion(&bp->b_iowait); 189 init_completion(&bp->b_iowait);
190 INIT_LIST_HEAD(&bp->b_list); 190 INIT_LIST_HEAD(&bp->b_list);
191 INIT_LIST_HEAD(&bp->b_hash_list); 191 INIT_LIST_HEAD(&bp->b_hash_list);
192 init_MUTEX_LOCKED(&bp->b_sema); /* held, no waiters */ 192 init_MUTEX_LOCKED(&bp->b_sema); /* held, no waiters */
193 XB_SET_OWNER(bp); 193 XB_SET_OWNER(bp);
194 bp->b_target = target; 194 bp->b_target = target;
195 bp->b_file_offset = range_base; 195 bp->b_file_offset = range_base;
196 /* 196 /*
197 * Set buffer_length and count_desired to the same value initially. 197 * Set buffer_length and count_desired to the same value initially.
198 * I/O routines should use count_desired, which will be the same in 198 * I/O routines should use count_desired, which will be the same in
199 * most cases but may be reset (e.g. XFS recovery). 199 * most cases but may be reset (e.g. XFS recovery).
200 */ 200 */
201 bp->b_buffer_length = bp->b_count_desired = range_length; 201 bp->b_buffer_length = bp->b_count_desired = range_length;
202 bp->b_flags = flags; 202 bp->b_flags = flags;
203 bp->b_bn = XFS_BUF_DADDR_NULL; 203 bp->b_bn = XFS_BUF_DADDR_NULL;
204 atomic_set(&bp->b_pin_count, 0); 204 atomic_set(&bp->b_pin_count, 0);
205 init_waitqueue_head(&bp->b_waiters); 205 init_waitqueue_head(&bp->b_waiters);
206 206
207 XFS_STATS_INC(xb_create); 207 XFS_STATS_INC(xb_create);
208 208
209 trace_xfs_buf_init(bp, _RET_IP_); 209 trace_xfs_buf_init(bp, _RET_IP_);
210 } 210 }
211 211
212 /* 212 /*
213 * Allocate a page array capable of holding a specified number 213 * Allocate a page array capable of holding a specified number
214 * of pages, and point the page buf at it. 214 * of pages, and point the page buf at it.
215 */ 215 */
216 STATIC int 216 STATIC int
217 _xfs_buf_get_pages( 217 _xfs_buf_get_pages(
218 xfs_buf_t *bp, 218 xfs_buf_t *bp,
219 int page_count, 219 int page_count,
220 xfs_buf_flags_t flags) 220 xfs_buf_flags_t flags)
221 { 221 {
222 /* Make sure that we have a page list */ 222 /* Make sure that we have a page list */
223 if (bp->b_pages == NULL) { 223 if (bp->b_pages == NULL) {
224 bp->b_offset = xfs_buf_poff(bp->b_file_offset); 224 bp->b_offset = xfs_buf_poff(bp->b_file_offset);
225 bp->b_page_count = page_count; 225 bp->b_page_count = page_count;
226 if (page_count <= XB_PAGES) { 226 if (page_count <= XB_PAGES) {
227 bp->b_pages = bp->b_page_array; 227 bp->b_pages = bp->b_page_array;
228 } else { 228 } else {
229 bp->b_pages = kmem_alloc(sizeof(struct page *) * 229 bp->b_pages = kmem_alloc(sizeof(struct page *) *
230 page_count, xb_to_km(flags)); 230 page_count, xb_to_km(flags));
231 if (bp->b_pages == NULL) 231 if (bp->b_pages == NULL)
232 return -ENOMEM; 232 return -ENOMEM;
233 } 233 }
234 memset(bp->b_pages, 0, sizeof(struct page *) * page_count); 234 memset(bp->b_pages, 0, sizeof(struct page *) * page_count);
235 } 235 }
236 return 0; 236 return 0;
237 } 237 }
238 238
239 /* 239 /*
240 * Frees b_pages if it was allocated. 240 * Frees b_pages if it was allocated.
241 */ 241 */
242 STATIC void 242 STATIC void
243 _xfs_buf_free_pages( 243 _xfs_buf_free_pages(
244 xfs_buf_t *bp) 244 xfs_buf_t *bp)
245 { 245 {
246 if (bp->b_pages != bp->b_page_array) { 246 if (bp->b_pages != bp->b_page_array) {
247 kmem_free(bp->b_pages); 247 kmem_free(bp->b_pages);
248 bp->b_pages = NULL; 248 bp->b_pages = NULL;
249 } 249 }
250 } 250 }
251 251
252 /* 252 /*
253 * Releases the specified buffer. 253 * Releases the specified buffer.
254 * 254 *
255 * The modification state of any associated pages is left unchanged. 255 * The modification state of any associated pages is left unchanged.
256 * The buffer must not be on any hash - use xfs_buf_rele instead for 256 * The buffer must not be on any hash - use xfs_buf_rele instead for
257 * hashed and refcounted buffers 257 * hashed and refcounted buffers
258 */ 258 */
259 void 259 void
260 xfs_buf_free( 260 xfs_buf_free(
261 xfs_buf_t *bp) 261 xfs_buf_t *bp)
262 { 262 {
263 trace_xfs_buf_free(bp, _RET_IP_); 263 trace_xfs_buf_free(bp, _RET_IP_);
264 264
265 ASSERT(list_empty(&bp->b_hash_list)); 265 ASSERT(list_empty(&bp->b_hash_list));
266 266
267 if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) { 267 if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
268 uint i; 268 uint i;
269 269
270 if (xfs_buf_is_vmapped(bp)) 270 if (xfs_buf_is_vmapped(bp))
271 vm_unmap_ram(bp->b_addr - bp->b_offset, 271 vm_unmap_ram(bp->b_addr - bp->b_offset,
272 bp->b_page_count); 272 bp->b_page_count);
273 273
274 for (i = 0; i < bp->b_page_count; i++) { 274 for (i = 0; i < bp->b_page_count; i++) {
275 struct page *page = bp->b_pages[i]; 275 struct page *page = bp->b_pages[i];
276 276
277 if (bp->b_flags & _XBF_PAGE_CACHE) 277 if (bp->b_flags & _XBF_PAGE_CACHE)
278 ASSERT(!PagePrivate(page)); 278 ASSERT(!PagePrivate(page));
279 page_cache_release(page); 279 page_cache_release(page);
280 } 280 }
281 } 281 }
282 _xfs_buf_free_pages(bp); 282 _xfs_buf_free_pages(bp);
283 xfs_buf_deallocate(bp); 283 xfs_buf_deallocate(bp);
284 } 284 }
285 285
286 /* 286 /*
287 * Finds all pages for the buffer in question and builds its page list. 287 * Finds all pages for the buffer in question and builds its page list.
288 */ 288 */
289 STATIC int 289 STATIC int
290 _xfs_buf_lookup_pages( 290 _xfs_buf_lookup_pages(
291 xfs_buf_t *bp, 291 xfs_buf_t *bp,
292 uint flags) 292 uint flags)
293 { 293 {
294 struct address_space *mapping = bp->b_target->bt_mapping; 294 struct address_space *mapping = bp->b_target->bt_mapping;
295 size_t blocksize = bp->b_target->bt_bsize; 295 size_t blocksize = bp->b_target->bt_bsize;
296 size_t size = bp->b_count_desired; 296 size_t size = bp->b_count_desired;
297 size_t nbytes, offset; 297 size_t nbytes, offset;
298 gfp_t gfp_mask = xb_to_gfp(flags); 298 gfp_t gfp_mask = xb_to_gfp(flags);
299 unsigned short page_count, i; 299 unsigned short page_count, i;
300 pgoff_t first; 300 pgoff_t first;
301 xfs_off_t end; 301 xfs_off_t end;
302 int error; 302 int error;
303 303
304 end = bp->b_file_offset + bp->b_buffer_length; 304 end = bp->b_file_offset + bp->b_buffer_length;
305 page_count = xfs_buf_btoc(end) - xfs_buf_btoct(bp->b_file_offset); 305 page_count = xfs_buf_btoc(end) - xfs_buf_btoct(bp->b_file_offset);
306 306
307 error = _xfs_buf_get_pages(bp, page_count, flags); 307 error = _xfs_buf_get_pages(bp, page_count, flags);
308 if (unlikely(error)) 308 if (unlikely(error))
309 return error; 309 return error;
310 bp->b_flags |= _XBF_PAGE_CACHE; 310 bp->b_flags |= _XBF_PAGE_CACHE;
311 311
312 offset = bp->b_offset; 312 offset = bp->b_offset;
313 first = bp->b_file_offset >> PAGE_CACHE_SHIFT; 313 first = bp->b_file_offset >> PAGE_CACHE_SHIFT;
314 314
315 for (i = 0; i < bp->b_page_count; i++) { 315 for (i = 0; i < bp->b_page_count; i++) {
316 struct page *page; 316 struct page *page;
317 uint retries = 0; 317 uint retries = 0;
318 318
319 retry: 319 retry:
320 page = find_or_create_page(mapping, first + i, gfp_mask); 320 page = find_or_create_page(mapping, first + i, gfp_mask);
321 if (unlikely(page == NULL)) { 321 if (unlikely(page == NULL)) {
322 if (flags & XBF_READ_AHEAD) { 322 if (flags & XBF_READ_AHEAD) {
323 bp->b_page_count = i; 323 bp->b_page_count = i;
324 for (i = 0; i < bp->b_page_count; i++) 324 for (i = 0; i < bp->b_page_count; i++)
325 unlock_page(bp->b_pages[i]); 325 unlock_page(bp->b_pages[i]);
326 return -ENOMEM; 326 return -ENOMEM;
327 } 327 }
328 328
329 /* 329 /*
330 * This could deadlock. 330 * This could deadlock.
331 * 331 *
332 * But until all the XFS lowlevel code is revamped to 332 * But until all the XFS lowlevel code is revamped to
333 * handle buffer allocation failures we can't do much. 333 * handle buffer allocation failures we can't do much.
334 */ 334 */
335 if (!(++retries % 100)) 335 if (!(++retries % 100))
336 printk(KERN_ERR 336 printk(KERN_ERR
337 "XFS: possible memory allocation " 337 "XFS: possible memory allocation "
338 "deadlock in %s (mode:0x%x)\n", 338 "deadlock in %s (mode:0x%x)\n",
339 __func__, gfp_mask); 339 __func__, gfp_mask);
340 340
341 XFS_STATS_INC(xb_page_retries); 341 XFS_STATS_INC(xb_page_retries);
342 xfsbufd_wakeup(NULL, 0, gfp_mask); 342 xfsbufd_wakeup(NULL, 0, gfp_mask);
343 congestion_wait(BLK_RW_ASYNC, HZ/50); 343 congestion_wait(BLK_RW_ASYNC, HZ/50);
344 goto retry; 344 goto retry;
345 } 345 }
346 346
347 XFS_STATS_INC(xb_page_found); 347 XFS_STATS_INC(xb_page_found);
348 348
349 nbytes = min_t(size_t, size, PAGE_CACHE_SIZE - offset); 349 nbytes = min_t(size_t, size, PAGE_CACHE_SIZE - offset);
350 size -= nbytes; 350 size -= nbytes;
351 351
352 ASSERT(!PagePrivate(page)); 352 ASSERT(!PagePrivate(page));
353 if (!PageUptodate(page)) { 353 if (!PageUptodate(page)) {
354 page_count--; 354 page_count--;
355 if (blocksize >= PAGE_CACHE_SIZE) { 355 if (blocksize >= PAGE_CACHE_SIZE) {
356 if (flags & XBF_READ) 356 if (flags & XBF_READ)
357 bp->b_flags |= _XBF_PAGE_LOCKED; 357 bp->b_flags |= _XBF_PAGE_LOCKED;
358 } else if (!PagePrivate(page)) { 358 } else if (!PagePrivate(page)) {
359 if (test_page_region(page, offset, nbytes)) 359 if (test_page_region(page, offset, nbytes))
360 page_count++; 360 page_count++;
361 } 361 }
362 } 362 }
363 363
364 bp->b_pages[i] = page; 364 bp->b_pages[i] = page;
365 offset = 0; 365 offset = 0;
366 } 366 }
367 367
368 if (!(bp->b_flags & _XBF_PAGE_LOCKED)) { 368 if (!(bp->b_flags & _XBF_PAGE_LOCKED)) {
369 for (i = 0; i < bp->b_page_count; i++) 369 for (i = 0; i < bp->b_page_count; i++)
370 unlock_page(bp->b_pages[i]); 370 unlock_page(bp->b_pages[i]);
371 } 371 }
372 372
373 if (page_count == bp->b_page_count) 373 if (page_count == bp->b_page_count)
374 bp->b_flags |= XBF_DONE; 374 bp->b_flags |= XBF_DONE;
375 375
376 return error; 376 return error;
377 } 377 }
378 378
379 /* 379 /*
380 * Map buffer into kernel address-space if necessary. 380 * Map buffer into kernel address-space if necessary.
381 */ 381 */
382 STATIC int 382 STATIC int
383 _xfs_buf_map_pages( 383 _xfs_buf_map_pages(
384 xfs_buf_t *bp, 384 xfs_buf_t *bp,
385 uint flags) 385 uint flags)
386 { 386 {
387 /* A single page buffer is always mappable */ 387 /* A single page buffer is always mappable */
388 if (bp->b_page_count == 1) { 388 if (bp->b_page_count == 1) {
389 bp->b_addr = page_address(bp->b_pages[0]) + bp->b_offset; 389 bp->b_addr = page_address(bp->b_pages[0]) + bp->b_offset;
390 bp->b_flags |= XBF_MAPPED; 390 bp->b_flags |= XBF_MAPPED;
391 } else if (flags & XBF_MAPPED) { 391 } else if (flags & XBF_MAPPED) {
392 bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count, 392 bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
393 -1, PAGE_KERNEL); 393 -1, PAGE_KERNEL);
394 if (unlikely(bp->b_addr == NULL)) 394 if (unlikely(bp->b_addr == NULL))
395 return -ENOMEM; 395 return -ENOMEM;
396 bp->b_addr += bp->b_offset; 396 bp->b_addr += bp->b_offset;
397 bp->b_flags |= XBF_MAPPED; 397 bp->b_flags |= XBF_MAPPED;
398 } 398 }
399 399
400 return 0; 400 return 0;
401 } 401 }
402 402
403 /* 403 /*
404 * Finding and Reading Buffers 404 * Finding and Reading Buffers
405 */ 405 */
406 406
407 /* 407 /*
408 * Looks up, and creates if absent, a lockable buffer for 408 * Looks up, and creates if absent, a lockable buffer for
409 * a given range of an inode. The buffer is returned 409 * a given range of an inode. The buffer is returned
410 * locked. If other overlapping buffers exist, they are 410 * locked. If other overlapping buffers exist, they are
411 * released before the new buffer is created and locked, 411 * released before the new buffer is created and locked,
412 * which may imply that this call will block until those buffers 412 * which may imply that this call will block until those buffers
413 * are unlocked. No I/O is implied by this call. 413 * are unlocked. No I/O is implied by this call.
414 */ 414 */
415 xfs_buf_t * 415 xfs_buf_t *
416 _xfs_buf_find( 416 _xfs_buf_find(
417 xfs_buftarg_t *btp, /* block device target */ 417 xfs_buftarg_t *btp, /* block device target */
418 xfs_off_t ioff, /* starting offset of range */ 418 xfs_off_t ioff, /* starting offset of range */
419 size_t isize, /* length of range */ 419 size_t isize, /* length of range */
420 xfs_buf_flags_t flags, 420 xfs_buf_flags_t flags,
421 xfs_buf_t *new_bp) 421 xfs_buf_t *new_bp)
422 { 422 {
423 xfs_off_t range_base; 423 xfs_off_t range_base;
424 size_t range_length; 424 size_t range_length;
425 xfs_bufhash_t *hash; 425 xfs_bufhash_t *hash;
426 xfs_buf_t *bp, *n; 426 xfs_buf_t *bp, *n;
427 427
428 range_base = (ioff << BBSHIFT); 428 range_base = (ioff << BBSHIFT);
429 range_length = (isize << BBSHIFT); 429 range_length = (isize << BBSHIFT);
430 430
431 /* Check for IOs smaller than the sector size / not sector aligned */ 431 /* Check for IOs smaller than the sector size / not sector aligned */
432 ASSERT(!(range_length < (1 << btp->bt_sshift))); 432 ASSERT(!(range_length < (1 << btp->bt_sshift)));
433 ASSERT(!(range_base & (xfs_off_t)btp->bt_smask)); 433 ASSERT(!(range_base & (xfs_off_t)btp->bt_smask));
434 434
435 hash = &btp->bt_hash[hash_long((unsigned long)ioff, btp->bt_hashshift)]; 435 hash = &btp->bt_hash[hash_long((unsigned long)ioff, btp->bt_hashshift)];
436 436
437 spin_lock(&hash->bh_lock); 437 spin_lock(&hash->bh_lock);
438 438
439 list_for_each_entry_safe(bp, n, &hash->bh_list, b_hash_list) { 439 list_for_each_entry_safe(bp, n, &hash->bh_list, b_hash_list) {
440 ASSERT(btp == bp->b_target); 440 ASSERT(btp == bp->b_target);
441 if (bp->b_file_offset == range_base && 441 if (bp->b_file_offset == range_base &&
442 bp->b_buffer_length == range_length) { 442 bp->b_buffer_length == range_length) {
443 atomic_inc(&bp->b_hold); 443 atomic_inc(&bp->b_hold);
444 goto found; 444 goto found;
445 } 445 }
446 } 446 }
447 447
448 /* No match found */ 448 /* No match found */
449 if (new_bp) { 449 if (new_bp) {
450 _xfs_buf_initialize(new_bp, btp, range_base, 450 _xfs_buf_initialize(new_bp, btp, range_base,
451 range_length, flags); 451 range_length, flags);
452 new_bp->b_hash = hash; 452 new_bp->b_hash = hash;
453 list_add(&new_bp->b_hash_list, &hash->bh_list); 453 list_add(&new_bp->b_hash_list, &hash->bh_list);
454 } else { 454 } else {
455 XFS_STATS_INC(xb_miss_locked); 455 XFS_STATS_INC(xb_miss_locked);
456 } 456 }
457 457
458 spin_unlock(&hash->bh_lock); 458 spin_unlock(&hash->bh_lock);
459 return new_bp; 459 return new_bp;
460 460
461 found: 461 found:
462 spin_unlock(&hash->bh_lock); 462 spin_unlock(&hash->bh_lock);
463 463
464 /* Attempt to get the semaphore without sleeping. If that 464 /* Attempt to get the semaphore without sleeping. If that
465 * fails, block on the semaphore unless the caller asked for 465 * fails, block on the semaphore unless the caller asked for
466 * a trylock, in which case we drop our hold and bail out. 466 * a trylock, in which case we drop our hold and bail out.
467 */ 467 */
468 if (down_trylock(&bp->b_sema)) { 468 if (down_trylock(&bp->b_sema)) {
469 if (!(flags & XBF_TRYLOCK)) { 469 if (!(flags & XBF_TRYLOCK)) {
470 /* wait for buffer ownership */ 470 /* wait for buffer ownership */
471 xfs_buf_lock(bp); 471 xfs_buf_lock(bp);
472 XFS_STATS_INC(xb_get_locked_waited); 472 XFS_STATS_INC(xb_get_locked_waited);
473 } else { 473 } else {
474 /* We asked for a trylock and failed, no need 474 /* We asked for a trylock and failed, no need
475 * to look at file offset and length here, we 475 * to look at file offset and length here, we
476 * know that this buffer at least overlaps our 476 * know that this buffer at least overlaps our
477 * buffer and is locked, therefore our buffer 477 * buffer and is locked, therefore our buffer
478 * either does not exist, or is this buffer. 478 * either does not exist, or is this buffer.
479 */ 479 */
480 xfs_buf_rele(bp); 480 xfs_buf_rele(bp);
481 XFS_STATS_INC(xb_busy_locked); 481 XFS_STATS_INC(xb_busy_locked);
482 return NULL; 482 return NULL;
483 } 483 }
484 } else { 484 } else {
485 /* trylock worked */ 485 /* trylock worked */
486 XB_SET_OWNER(bp); 486 XB_SET_OWNER(bp);
487 } 487 }
488 488
489 if (bp->b_flags & XBF_STALE) { 489 if (bp->b_flags & XBF_STALE) {
490 ASSERT((bp->b_flags & _XBF_DELWRI_Q) == 0); 490 ASSERT((bp->b_flags & _XBF_DELWRI_Q) == 0);
491 bp->b_flags &= XBF_MAPPED; 491 bp->b_flags &= XBF_MAPPED;
492 } 492 }
493 493
494 trace_xfs_buf_find(bp, flags, _RET_IP_); 494 trace_xfs_buf_find(bp, flags, _RET_IP_);
495 XFS_STATS_INC(xb_get_locked); 495 XFS_STATS_INC(xb_get_locked);
496 return bp; 496 return bp;
497 } 497 }
498 498
499 /* 499 /*
500 * Assembles a buffer covering the specified range. 500 * Assembles a buffer covering the specified range.
501 * Storage in memory for all portions of the buffer will be allocated, 501 * Storage in memory for all portions of the buffer will be allocated,
502 * although backing storage may not be. 502 * although backing storage may not be.
503 */ 503 */
504 xfs_buf_t * 504 xfs_buf_t *
505 xfs_buf_get( 505 xfs_buf_get(
506 xfs_buftarg_t *target,/* target for buffer */ 506 xfs_buftarg_t *target,/* target for buffer */
507 xfs_off_t ioff, /* starting offset of range */ 507 xfs_off_t ioff, /* starting offset of range */
508 size_t isize, /* length of range */ 508 size_t isize, /* length of range */
509 xfs_buf_flags_t flags) 509 xfs_buf_flags_t flags)
510 { 510 {
511 xfs_buf_t *bp, *new_bp; 511 xfs_buf_t *bp, *new_bp;
512 int error = 0, i; 512 int error = 0, i;
513 513
514 new_bp = xfs_buf_allocate(flags); 514 new_bp = xfs_buf_allocate(flags);
515 if (unlikely(!new_bp)) 515 if (unlikely(!new_bp))
516 return NULL; 516 return NULL;
517 517
518 bp = _xfs_buf_find(target, ioff, isize, flags, new_bp); 518 bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
519 if (bp == new_bp) { 519 if (bp == new_bp) {
520 error = _xfs_buf_lookup_pages(bp, flags); 520 error = _xfs_buf_lookup_pages(bp, flags);
521 if (error) 521 if (error)
522 goto no_buffer; 522 goto no_buffer;
523 } else { 523 } else {
524 xfs_buf_deallocate(new_bp); 524 xfs_buf_deallocate(new_bp);
525 if (unlikely(bp == NULL)) 525 if (unlikely(bp == NULL))
526 return NULL; 526 return NULL;
527 } 527 }
528 528
529 for (i = 0; i < bp->b_page_count; i++) 529 for (i = 0; i < bp->b_page_count; i++)
530 mark_page_accessed(bp->b_pages[i]); 530 mark_page_accessed(bp->b_pages[i]);
531 531
532 if (!(bp->b_flags & XBF_MAPPED)) { 532 if (!(bp->b_flags & XBF_MAPPED)) {
533 error = _xfs_buf_map_pages(bp, flags); 533 error = _xfs_buf_map_pages(bp, flags);
534 if (unlikely(error)) { 534 if (unlikely(error)) {
535 printk(KERN_WARNING "%s: failed to map pages\n", 535 printk(KERN_WARNING "%s: failed to map pages\n",
536 __func__); 536 __func__);
537 goto no_buffer; 537 goto no_buffer;
538 } 538 }
539 } 539 }
540 540
541 XFS_STATS_INC(xb_get); 541 XFS_STATS_INC(xb_get);
542 542
543 /* 543 /*
544 * Always fill in the block number now, the mapped cases can do 544 * Always fill in the block number now, the mapped cases can do
545 * their own overlay of this later. 545 * their own overlay of this later.
546 */ 546 */
547 bp->b_bn = ioff; 547 bp->b_bn = ioff;
548 bp->b_count_desired = bp->b_buffer_length; 548 bp->b_count_desired = bp->b_buffer_length;
549 549
550 trace_xfs_buf_get(bp, flags, _RET_IP_); 550 trace_xfs_buf_get(bp, flags, _RET_IP_);
551 return bp; 551 return bp;
552 552
553 no_buffer: 553 no_buffer:
554 if (flags & (XBF_LOCK | XBF_TRYLOCK)) 554 if (flags & (XBF_LOCK | XBF_TRYLOCK))
555 xfs_buf_unlock(bp); 555 xfs_buf_unlock(bp);
556 xfs_buf_rele(bp); 556 xfs_buf_rele(bp);
557 return NULL; 557 return NULL;
558 } 558 }
559 559
560 STATIC int 560 STATIC int
561 _xfs_buf_read( 561 _xfs_buf_read(
562 xfs_buf_t *bp, 562 xfs_buf_t *bp,
563 xfs_buf_flags_t flags) 563 xfs_buf_flags_t flags)
564 { 564 {
565 int status; 565 int status;
566 566
567 ASSERT(!(flags & (XBF_DELWRI|XBF_WRITE))); 567 ASSERT(!(flags & (XBF_DELWRI|XBF_WRITE)));
568 ASSERT(bp->b_bn != XFS_BUF_DADDR_NULL); 568 ASSERT(bp->b_bn != XFS_BUF_DADDR_NULL);
569 569
570 bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | \ 570 bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_DELWRI | \
571 XBF_READ_AHEAD | _XBF_RUN_QUEUES); 571 XBF_READ_AHEAD | _XBF_RUN_QUEUES);
572 bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | \ 572 bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | \
573 XBF_READ_AHEAD | _XBF_RUN_QUEUES); 573 XBF_READ_AHEAD | _XBF_RUN_QUEUES);
574 574
575 status = xfs_buf_iorequest(bp); 575 status = xfs_buf_iorequest(bp);
576 if (status || XFS_BUF_ISERROR(bp) || (flags & XBF_ASYNC)) 576 if (status || XFS_BUF_ISERROR(bp) || (flags & XBF_ASYNC))
577 return status; 577 return status;
578 return xfs_buf_iowait(bp); 578 return xfs_buf_iowait(bp);
579 } 579 }
580 580
581 xfs_buf_t * 581 xfs_buf_t *
582 xfs_buf_read( 582 xfs_buf_read(
583 xfs_buftarg_t *target, 583 xfs_buftarg_t *target,
584 xfs_off_t ioff, 584 xfs_off_t ioff,
585 size_t isize, 585 size_t isize,
586 xfs_buf_flags_t flags) 586 xfs_buf_flags_t flags)
587 { 587 {
588 xfs_buf_t *bp; 588 xfs_buf_t *bp;
589 589
590 flags |= XBF_READ; 590 flags |= XBF_READ;
591 591
592 bp = xfs_buf_get(target, ioff, isize, flags); 592 bp = xfs_buf_get(target, ioff, isize, flags);
593 if (bp) { 593 if (bp) {
594 trace_xfs_buf_read(bp, flags, _RET_IP_); 594 trace_xfs_buf_read(bp, flags, _RET_IP_);
595 595
596 if (!XFS_BUF_ISDONE(bp)) { 596 if (!XFS_BUF_ISDONE(bp)) {
597 XFS_STATS_INC(xb_get_read); 597 XFS_STATS_INC(xb_get_read);
598 _xfs_buf_read(bp, flags); 598 _xfs_buf_read(bp, flags);
599 } else if (flags & XBF_ASYNC) { 599 } else if (flags & XBF_ASYNC) {
600 /* 600 /*
601 * Read ahead call which is already satisfied, 601 * Read ahead call which is already satisfied,
602 * drop the buffer 602 * drop the buffer
603 */ 603 */
604 goto no_buffer; 604 goto no_buffer;
605 } else { 605 } else {
606 /* We do not want read in the flags */ 606 /* We do not want read in the flags */
607 bp->b_flags &= ~XBF_READ; 607 bp->b_flags &= ~XBF_READ;
608 } 608 }
609 } 609 }
610 610
611 return bp; 611 return bp;
612 612
613 no_buffer: 613 no_buffer:
614 if (flags & (XBF_LOCK | XBF_TRYLOCK)) 614 if (flags & (XBF_LOCK | XBF_TRYLOCK))
615 xfs_buf_unlock(bp); 615 xfs_buf_unlock(bp);
616 xfs_buf_rele(bp); 616 xfs_buf_rele(bp);
617 return NULL; 617 return NULL;
618 } 618 }
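xfs_buf_get()/xfs_buf_read() are the interfaces callers use to obtain a locked, referenced buffer for a disk range. A hedged sketch of a synchronous metadata read through this interface follows; the function itself and the mp/blkno/nbblks names are illustrative only, not part of this diff:

	STATIC int
	example_read_block(
		struct xfs_mount	*mp,
		xfs_daddr_t		blkno,	/* offset in 512-byte basic blocks */
		size_t			nbblks)	/* length in basic blocks */
	{
		xfs_buf_t		*bp;

		bp = xfs_buf_read(mp->m_ddev_targp, blkno, nbblks,
				  XBF_LOCK | XBF_MAPPED);
		if (!bp)
			return ENOMEM;
		if (XFS_BUF_ISERROR(bp)) {
			int	error = XFS_BUF_GETERROR(bp);

			xfs_buf_relse(bp);
			return error;
		}

		/* buffer is locked and mapped; XFS_BUF_PTR(bp) points at the data */

		xfs_buf_relse(bp);	/* unlock and drop our reference */
		return 0;
	}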
619 619
620 /* 620 /*
621 * If we are not low on memory then do the readahead in a deadlock 621 * If we are not low on memory then do the readahead in a deadlock
622 * safe manner. 622 * safe manner.
623 */ 623 */
624 void 624 void
625 xfs_buf_readahead( 625 xfs_buf_readahead(
626 xfs_buftarg_t *target, 626 xfs_buftarg_t *target,
627 xfs_off_t ioff, 627 xfs_off_t ioff,
628 size_t isize, 628 size_t isize,
629 xfs_buf_flags_t flags) 629 xfs_buf_flags_t flags)
630 { 630 {
631 struct backing_dev_info *bdi; 631 struct backing_dev_info *bdi;
632 632
633 bdi = target->bt_mapping->backing_dev_info; 633 bdi = target->bt_mapping->backing_dev_info;
634 if (bdi_read_congested(bdi)) 634 if (bdi_read_congested(bdi))
635 return; 635 return;
636 636
637 flags |= (XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD); 637 flags |= (XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD);
638 xfs_buf_read(target, ioff, isize, flags); 638 xfs_buf_read(target, ioff, isize, flags);
639 } 639 }
640 640
641 xfs_buf_t * 641 xfs_buf_t *
642 xfs_buf_get_empty( 642 xfs_buf_get_empty(
643 size_t len, 643 size_t len,
644 xfs_buftarg_t *target) 644 xfs_buftarg_t *target)
645 { 645 {
646 xfs_buf_t *bp; 646 xfs_buf_t *bp;
647 647
648 bp = xfs_buf_allocate(0); 648 bp = xfs_buf_allocate(0);
649 if (bp) 649 if (bp)
650 _xfs_buf_initialize(bp, target, 0, len, 0); 650 _xfs_buf_initialize(bp, target, 0, len, 0);
651 return bp; 651 return bp;
652 } 652 }
653 653
654 static inline struct page * 654 static inline struct page *
655 mem_to_page( 655 mem_to_page(
656 void *addr) 656 void *addr)
657 { 657 {
658 if ((!is_vmalloc_addr(addr))) { 658 if ((!is_vmalloc_addr(addr))) {
659 return virt_to_page(addr); 659 return virt_to_page(addr);
660 } else { 660 } else {
661 return vmalloc_to_page(addr); 661 return vmalloc_to_page(addr);
662 } 662 }
663 } 663 }
664 664
665 int 665 int
666 xfs_buf_associate_memory( 666 xfs_buf_associate_memory(
667 xfs_buf_t *bp, 667 xfs_buf_t *bp,
668 void *mem, 668 void *mem,
669 size_t len) 669 size_t len)
670 { 670 {
671 int rval; 671 int rval;
672 int i = 0; 672 int i = 0;
673 unsigned long pageaddr; 673 unsigned long pageaddr;
674 unsigned long offset; 674 unsigned long offset;
675 size_t buflen; 675 size_t buflen;
676 int page_count; 676 int page_count;
677 677
678 pageaddr = (unsigned long)mem & PAGE_CACHE_MASK; 678 pageaddr = (unsigned long)mem & PAGE_CACHE_MASK;
679 offset = (unsigned long)mem - pageaddr; 679 offset = (unsigned long)mem - pageaddr;
680 buflen = PAGE_CACHE_ALIGN(len + offset); 680 buflen = PAGE_CACHE_ALIGN(len + offset);
681 page_count = buflen >> PAGE_CACHE_SHIFT; 681 page_count = buflen >> PAGE_CACHE_SHIFT;
682 682
683 /* Free any previous set of page pointers */ 683 /* Free any previous set of page pointers */
684 if (bp->b_pages) 684 if (bp->b_pages)
685 _xfs_buf_free_pages(bp); 685 _xfs_buf_free_pages(bp);
686 686
687 bp->b_pages = NULL; 687 bp->b_pages = NULL;
688 bp->b_addr = mem; 688 bp->b_addr = mem;
689 689
690 rval = _xfs_buf_get_pages(bp, page_count, XBF_DONT_BLOCK); 690 rval = _xfs_buf_get_pages(bp, page_count, XBF_DONT_BLOCK);
691 if (rval) 691 if (rval)
692 return rval; 692 return rval;
693 693
694 bp->b_offset = offset; 694 bp->b_offset = offset;
695 695
696 for (i = 0; i < bp->b_page_count; i++) { 696 for (i = 0; i < bp->b_page_count; i++) {
697 bp->b_pages[i] = mem_to_page((void *)pageaddr); 697 bp->b_pages[i] = mem_to_page((void *)pageaddr);
698 pageaddr += PAGE_CACHE_SIZE; 698 pageaddr += PAGE_CACHE_SIZE;
699 } 699 }
700 700
701 bp->b_count_desired = len; 701 bp->b_count_desired = len;
702 bp->b_buffer_length = buflen; 702 bp->b_buffer_length = buflen;
703 bp->b_flags |= XBF_MAPPED; 703 bp->b_flags |= XBF_MAPPED;
704 bp->b_flags &= ~_XBF_PAGE_LOCKED; 704 bp->b_flags &= ~_XBF_PAGE_LOCKED;
705 705
706 return 0; 706 return 0;
707 } 707 }
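The arithmetic at the top of xfs_buf_associate_memory() just rounds the caller's region out to whole pages before the page array is built. A small userspace sketch of the same calculation, assuming 4 KiB pages (the kernel code uses PAGE_CACHE_MASK/PAGE_CACHE_ALIGN for this; the constants below are illustrative):

	#include <stdio.h>

	#define PAGE_SZ		4096UL

	int main(void)
	{
		unsigned long	mem = 0x10000800UL;	/* 800 bytes into a page */
		size_t		len = 6000;

		unsigned long	pageaddr = mem & ~(PAGE_SZ - 1);	/* 0x10000000 */
		unsigned long	offset = mem - pageaddr;		/* 800 */
		size_t		buflen = (len + offset + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
		int		page_count = buflen / PAGE_SZ;

		/* prints: offset=800 buflen=8192 pages=2 */
		printf("offset=%lu buflen=%zu pages=%d\n", offset, buflen, page_count);
		return 0;
	}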
708 708
709 xfs_buf_t * 709 xfs_buf_t *
710 xfs_buf_get_noaddr( 710 xfs_buf_get_noaddr(
711 size_t len, 711 size_t len,
712 xfs_buftarg_t *target) 712 xfs_buftarg_t *target)
713 { 713 {
714 unsigned long page_count = PAGE_ALIGN(len) >> PAGE_SHIFT; 714 unsigned long page_count = PAGE_ALIGN(len) >> PAGE_SHIFT;
715 int error, i; 715 int error, i;
716 xfs_buf_t *bp; 716 xfs_buf_t *bp;
717 717
718 bp = xfs_buf_allocate(0); 718 bp = xfs_buf_allocate(0);
719 if (unlikely(bp == NULL)) 719 if (unlikely(bp == NULL))
720 goto fail; 720 goto fail;
721 _xfs_buf_initialize(bp, target, 0, len, 0); 721 _xfs_buf_initialize(bp, target, 0, len, 0);
722 722
723 error = _xfs_buf_get_pages(bp, page_count, 0); 723 error = _xfs_buf_get_pages(bp, page_count, 0);
724 if (error) 724 if (error)
725 goto fail_free_buf; 725 goto fail_free_buf;
726 726
727 for (i = 0; i < page_count; i++) { 727 for (i = 0; i < page_count; i++) {
728 bp->b_pages[i] = alloc_page(GFP_KERNEL); 728 bp->b_pages[i] = alloc_page(GFP_KERNEL);
729 if (!bp->b_pages[i]) 729 if (!bp->b_pages[i])
730 goto fail_free_mem; 730 goto fail_free_mem;
731 } 731 }
732 bp->b_flags |= _XBF_PAGES; 732 bp->b_flags |= _XBF_PAGES;
733 733
734 error = _xfs_buf_map_pages(bp, XBF_MAPPED); 734 error = _xfs_buf_map_pages(bp, XBF_MAPPED);
735 if (unlikely(error)) { 735 if (unlikely(error)) {
736 printk(KERN_WARNING "%s: failed to map pages\n", 736 printk(KERN_WARNING "%s: failed to map pages\n",
737 __func__); 737 __func__);
738 goto fail_free_mem; 738 goto fail_free_mem;
739 } 739 }
740 740
741 xfs_buf_unlock(bp); 741 xfs_buf_unlock(bp);
742 742
743 trace_xfs_buf_get_noaddr(bp, _RET_IP_); 743 trace_xfs_buf_get_noaddr(bp, _RET_IP_);
744 return bp; 744 return bp;
745 745
746 fail_free_mem: 746 fail_free_mem:
747 while (--i >= 0) 747 while (--i >= 0)
748 __free_page(bp->b_pages[i]); 748 __free_page(bp->b_pages[i]);
749 _xfs_buf_free_pages(bp); 749 _xfs_buf_free_pages(bp);
750 fail_free_buf: 750 fail_free_buf:
751 xfs_buf_deallocate(bp); 751 xfs_buf_deallocate(bp);
752 fail: 752 fail:
753 return NULL; 753 return NULL;
754 } 754 }
755 755
756 /* 756 /*
757 * Increment reference count on buffer, to hold the buffer concurrently 757 * Increment reference count on buffer, to hold the buffer concurrently
758 * with another thread which may release (free) the buffer asynchronously. 758 * with another thread which may release (free) the buffer asynchronously.
759 * Must hold the buffer already to call this function. 759 * Must hold the buffer already to call this function.
760 */ 760 */
761 void 761 void
762 xfs_buf_hold( 762 xfs_buf_hold(
763 xfs_buf_t *bp) 763 xfs_buf_t *bp)
764 { 764 {
765 trace_xfs_buf_hold(bp, _RET_IP_); 765 trace_xfs_buf_hold(bp, _RET_IP_);
766 atomic_inc(&bp->b_hold); 766 atomic_inc(&bp->b_hold);
767 } 767 }
768 768
769 /* 769 /*
770 * Releases a hold on the specified buffer. If the 770 * Releases a hold on the specified buffer. If the
771 * hold count is 1, calls xfs_buf_free. 771 * hold count is 1, calls xfs_buf_free.
772 */ 772 */
773 void 773 void
774 xfs_buf_rele( 774 xfs_buf_rele(
775 xfs_buf_t *bp) 775 xfs_buf_t *bp)
776 { 776 {
777 xfs_bufhash_t *hash = bp->b_hash; 777 xfs_bufhash_t *hash = bp->b_hash;
778 778
779 trace_xfs_buf_rele(bp, _RET_IP_); 779 trace_xfs_buf_rele(bp, _RET_IP_);
780 780
781 if (unlikely(!hash)) { 781 if (unlikely(!hash)) {
782 ASSERT(!bp->b_relse); 782 ASSERT(!bp->b_relse);
783 if (atomic_dec_and_test(&bp->b_hold)) 783 if (atomic_dec_and_test(&bp->b_hold))
784 xfs_buf_free(bp); 784 xfs_buf_free(bp);
785 return; 785 return;
786 } 786 }
787 787
788 ASSERT(atomic_read(&bp->b_hold) > 0); 788 ASSERT(atomic_read(&bp->b_hold) > 0);
789 if (atomic_dec_and_lock(&bp->b_hold, &hash->bh_lock)) { 789 if (atomic_dec_and_lock(&bp->b_hold, &hash->bh_lock)) {
790 if (bp->b_relse) { 790 if (bp->b_relse) {
791 atomic_inc(&bp->b_hold); 791 atomic_inc(&bp->b_hold);
792 spin_unlock(&hash->bh_lock); 792 spin_unlock(&hash->bh_lock);
793 (*(bp->b_relse)) (bp); 793 (*(bp->b_relse)) (bp);
794 } else if (bp->b_flags & XBF_FS_MANAGED) { 794 } else if (bp->b_flags & XBF_FS_MANAGED) {
795 spin_unlock(&hash->bh_lock); 795 spin_unlock(&hash->bh_lock);
796 } else { 796 } else {
797 ASSERT(!(bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q))); 797 ASSERT(!(bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q)));
798 list_del_init(&bp->b_hash_list); 798 list_del_init(&bp->b_hash_list);
799 spin_unlock(&hash->bh_lock); 799 spin_unlock(&hash->bh_lock);
800 xfs_buf_free(bp); 800 xfs_buf_free(bp);
801 } 801 }
802 } 802 }
803 } 803 }
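xfs_buf_rele() relies on atomic_dec_and_lock() so the hash spinlock is only taken when the final reference may be going away. The same shape in isolation, with illustrative obj/obj_list_lock/obj_free names that are not part of this code:

	static void obj_put(struct obj *o)
	{
		/*
		 * Fast path: if this is not the last reference, the count is
		 * dropped without touching the list lock.  Only when the count
		 * would hit zero is the lock taken, atomically with the final
		 * decrement, closing the race with concurrent lookups.
		 */
		if (atomic_dec_and_lock(&o->refcount, &obj_list_lock)) {
			list_del_init(&o->list);
			spin_unlock(&obj_list_lock);
			obj_free(o);
		}
	}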
804 804
805 805
806 /* 806 /*
807 * Mutual exclusion on buffers. Locking model: 807 * Mutual exclusion on buffers. Locking model:
808 * 808 *
809 * Buffers associated with inodes for which buffer locking 809 * Buffers associated with inodes for which buffer locking
810 * is not enabled are not protected by semaphores, and are 810 * is not enabled are not protected by semaphores, and are
811 * assumed to be exclusively owned by the caller. There is a 811 * assumed to be exclusively owned by the caller. There is a
812 * spinlock in the buffer, used by the caller when concurrent 812 * spinlock in the buffer, used by the caller when concurrent
813 * access is possible. 813 * access is possible.
814 */ 814 */
815 815
816 /* 816 /*
817 * Locks a buffer object, if it is not already locked. 817 * Locks a buffer object, if it is not already locked.
818 * Note that this in no way locks the underlying pages, so it is only 818 * Note that this in no way locks the underlying pages, so it is only
819 * useful for synchronizing concurrent use of buffer objects, not for 819 * useful for synchronizing concurrent use of buffer objects, not for
820 * synchronizing independent access to the underlying pages. 820 * synchronizing independent access to the underlying pages.
821 */ 821 */
822 int 822 int
823 xfs_buf_cond_lock( 823 xfs_buf_cond_lock(
824 xfs_buf_t *bp) 824 xfs_buf_t *bp)
825 { 825 {
826 int locked; 826 int locked;
827 827
828 locked = down_trylock(&bp->b_sema) == 0; 828 locked = down_trylock(&bp->b_sema) == 0;
829 if (locked) 829 if (locked)
830 XB_SET_OWNER(bp); 830 XB_SET_OWNER(bp);
831 831
832 trace_xfs_buf_cond_lock(bp, _RET_IP_); 832 trace_xfs_buf_cond_lock(bp, _RET_IP_);
833 return locked ? 0 : -EBUSY; 833 return locked ? 0 : -EBUSY;
834 } 834 }
835 835
836 int 836 int
837 xfs_buf_lock_value( 837 xfs_buf_lock_value(
838 xfs_buf_t *bp) 838 xfs_buf_t *bp)
839 { 839 {
840 return bp->b_sema.count; 840 return bp->b_sema.count;
841 } 841 }
842 842
843 /* 843 /*
844 * Locks a buffer object. 844 * Locks a buffer object.
845 * Note that this in no way locks the underlying pages, so it is only 845 * Note that this in no way locks the underlying pages, so it is only
846 * useful for synchronizing concurrent use of buffer objects, not for 846 * useful for synchronizing concurrent use of buffer objects, not for
847 * synchronizing independent access to the underlying pages. 847 * synchronizing independent access to the underlying pages.
848 * 848 *
849 * If we come across a stale, pinned, locked buffer, we know that we 849 * If we come across a stale, pinned, locked buffer, we know that we
850 * are being asked to lock a buffer that has been reallocated. Because 850 * are being asked to lock a buffer that has been reallocated. Because
851 * it is pinned, we know that the log has not been pushed to disk and 851 * it is pinned, we know that the log has not been pushed to disk and
852 * hence it will still be locked. Rather than sleeping until someone 852 * hence it will still be locked. Rather than sleeping until someone
853 * else pushes the log, push it ourselves before trying to get the lock. 853 * else pushes the log, push it ourselves before trying to get the lock.
854 */ 854 */
855 void 855 void
856 xfs_buf_lock( 856 xfs_buf_lock(
857 xfs_buf_t *bp) 857 xfs_buf_t *bp)
858 { 858 {
859 trace_xfs_buf_lock(bp, _RET_IP_); 859 trace_xfs_buf_lock(bp, _RET_IP_);
860 860
861 if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE)) 861 if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
862 xfs_log_force(bp->b_mount, 0); 862 xfs_log_force(bp->b_mount, 0);
863 if (atomic_read(&bp->b_io_remaining)) 863 if (atomic_read(&bp->b_io_remaining))
864 blk_run_address_space(bp->b_target->bt_mapping); 864 blk_run_address_space(bp->b_target->bt_mapping);
865 down(&bp->b_sema); 865 down(&bp->b_sema);
866 XB_SET_OWNER(bp); 866 XB_SET_OWNER(bp);
867 867
868 trace_xfs_buf_lock_done(bp, _RET_IP_); 868 trace_xfs_buf_lock_done(bp, _RET_IP_);
869 } 869 }
870 870
871 /* 871 /*
872 * Releases the lock on the buffer object. 872 * Releases the lock on the buffer object.
873 * If the buffer is marked delwri but is not queued, do so before we 873 * If the buffer is marked delwri but is not queued, do so before we
874 * unlock the buffer as we need to set flags correctly. We also need to 874 * unlock the buffer as we need to set flags correctly. We also need to
875 * take a reference for the delwri queue because the unlocker is going to 875 * take a reference for the delwri queue because the unlocker is going to
876 * drop theirs and they don't know we just queued it. 876 * drop theirs and they don't know we just queued it.
877 */ 877 */
878 void 878 void
879 xfs_buf_unlock( 879 xfs_buf_unlock(
880 xfs_buf_t *bp) 880 xfs_buf_t *bp)
881 { 881 {
882 if ((bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q)) == XBF_DELWRI) { 882 if ((bp->b_flags & (XBF_DELWRI|_XBF_DELWRI_Q)) == XBF_DELWRI) {
883 atomic_inc(&bp->b_hold); 883 atomic_inc(&bp->b_hold);
884 bp->b_flags |= XBF_ASYNC; 884 bp->b_flags |= XBF_ASYNC;
885 xfs_buf_delwri_queue(bp, 0); 885 xfs_buf_delwri_queue(bp, 0);
886 } 886 }
887 887
888 XB_CLEAR_OWNER(bp); 888 XB_CLEAR_OWNER(bp);
889 up(&bp->b_sema); 889 up(&bp->b_sema);
890 890
891 trace_xfs_buf_unlock(bp, _RET_IP_); 891 trace_xfs_buf_unlock(bp, _RET_IP_);
892 } 892 }
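Together, xfs_buf_cond_lock()/xfs_buf_lock()/xfs_buf_unlock() implement buffer ownership on top of the counting semaphore b_sema. A hedged caller-side sketch (the surrounding function is illustrative, not from this diff):

	static void example_modify_buffer(xfs_buf_t *bp)
	{
		if (xfs_buf_cond_lock(bp)) {
			/*
			 * Trylock failed (-EBUSY); fall back to a blocking
			 * lock, which may push the log for a stale, pinned
			 * buffer as described above.
			 */
			xfs_buf_lock(bp);
		}

		/* ... buffer is owned here; modify it ... */

		xfs_buf_unlock(bp);
	}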
893 893
894 STATIC void 894 STATIC void
895 xfs_buf_wait_unpin( 895 xfs_buf_wait_unpin(
896 xfs_buf_t *bp) 896 xfs_buf_t *bp)
897 { 897 {
898 DECLARE_WAITQUEUE (wait, current); 898 DECLARE_WAITQUEUE (wait, current);
899 899
900 if (atomic_read(&bp->b_pin_count) == 0) 900 if (atomic_read(&bp->b_pin_count) == 0)
901 return; 901 return;
902 902
903 add_wait_queue(&bp->b_waiters, &wait); 903 add_wait_queue(&bp->b_waiters, &wait);
904 for (;;) { 904 for (;;) {
905 set_current_state(TASK_UNINTERRUPTIBLE); 905 set_current_state(TASK_UNINTERRUPTIBLE);
906 if (atomic_read(&bp->b_pin_count) == 0) 906 if (atomic_read(&bp->b_pin_count) == 0)
907 break; 907 break;
908 if (atomic_read(&bp->b_io_remaining)) 908 if (atomic_read(&bp->b_io_remaining))
909 blk_run_address_space(bp->b_target->bt_mapping); 909 blk_run_address_space(bp->b_target->bt_mapping);
910 schedule(); 910 schedule();
911 } 911 }
912 remove_wait_queue(&bp->b_waiters, &wait); 912 remove_wait_queue(&bp->b_waiters, &wait);
913 set_current_state(TASK_RUNNING); 913 set_current_state(TASK_RUNNING);
914 } 914 }
915 915
916 /* 916 /*
917 * Buffer Utility Routines 917 * Buffer Utility Routines
918 */ 918 */
919 919
920 STATIC void 920 STATIC void
921 xfs_buf_iodone_work( 921 xfs_buf_iodone_work(
922 struct work_struct *work) 922 struct work_struct *work)
923 { 923 {
924 xfs_buf_t *bp = 924 xfs_buf_t *bp =
925 container_of(work, xfs_buf_t, b_iodone_work); 925 container_of(work, xfs_buf_t, b_iodone_work);
926 926
927 if (bp->b_iodone) 927 if (bp->b_iodone)
928 (*(bp->b_iodone))(bp); 928 (*(bp->b_iodone))(bp);
929 else if (bp->b_flags & XBF_ASYNC) 929 else if (bp->b_flags & XBF_ASYNC)
930 xfs_buf_relse(bp); 930 xfs_buf_relse(bp);
931 } 931 }
932 932
933 void 933 void
934 xfs_buf_ioend( 934 xfs_buf_ioend(
935 xfs_buf_t *bp, 935 xfs_buf_t *bp,
936 int schedule) 936 int schedule)
937 { 937 {
938 trace_xfs_buf_iodone(bp, _RET_IP_); 938 trace_xfs_buf_iodone(bp, _RET_IP_);
939 939
940 bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD); 940 bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
941 if (bp->b_error == 0) 941 if (bp->b_error == 0)
942 bp->b_flags |= XBF_DONE; 942 bp->b_flags |= XBF_DONE;
943 943
944 if ((bp->b_iodone) || (bp->b_flags & XBF_ASYNC)) { 944 if ((bp->b_iodone) || (bp->b_flags & XBF_ASYNC)) {
945 if (schedule) { 945 if (schedule) {
946 INIT_WORK(&bp->b_iodone_work, xfs_buf_iodone_work); 946 INIT_WORK(&bp->b_iodone_work, xfs_buf_iodone_work);
947 queue_work(xfslogd_workqueue, &bp->b_iodone_work); 947 queue_work(xfslogd_workqueue, &bp->b_iodone_work);
948 } else { 948 } else {
949 xfs_buf_iodone_work(&bp->b_iodone_work); 949 xfs_buf_iodone_work(&bp->b_iodone_work);
950 } 950 }
951 } else { 951 } else {
952 complete(&bp->b_iowait); 952 complete(&bp->b_iowait);
953 } 953 }
954 } 954 }
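xfs_buf_ioend() defers the completion callbacks to xfslogd_workqueue when asked to (the bio completion path passes schedule = 1 because it may run in interrupt context) and runs them directly otherwise. The same deferral pattern in a generic, hedged form; struct my_io, my_iodone() and done_work are illustrative names only:

	static void my_iodone_work(struct work_struct *work)
	{
		struct my_io *io = container_of(work, struct my_io, done_work);

		my_iodone(io);			/* now runs in process context */
	}

	static void my_io_complete(struct my_io *io, bool defer)
	{
		if (defer) {
			/* cannot sleep here (e.g. irq context): hand off to a worker */
			INIT_WORK(&io->done_work, my_iodone_work);
			schedule_work(&io->done_work);
		} else {
			my_iodone(io);
		}
	}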
955 955
956 void 956 void
957 xfs_buf_ioerror( 957 xfs_buf_ioerror(
958 xfs_buf_t *bp, 958 xfs_buf_t *bp,
959 int error) 959 int error)
960 { 960 {
961 ASSERT(error >= 0 && error <= 0xffff); 961 ASSERT(error >= 0 && error <= 0xffff);
962 bp->b_error = (unsigned short)error; 962 bp->b_error = (unsigned short)error;
963 trace_xfs_buf_ioerror(bp, error, _RET_IP_); 963 trace_xfs_buf_ioerror(bp, error, _RET_IP_);
964 } 964 }
965 965
966 int 966 int
967 xfs_bwrite( 967 xfs_bwrite(
968 struct xfs_mount *mp, 968 struct xfs_mount *mp,
969 struct xfs_buf *bp) 969 struct xfs_buf *bp)
970 { 970 {
971 int error; 971 int error;
972 972
973 bp->b_mount = mp; 973 bp->b_mount = mp;
974 bp->b_flags |= XBF_WRITE; 974 bp->b_flags |= XBF_WRITE;
975 bp->b_flags &= ~(XBF_ASYNC | XBF_READ); 975 bp->b_flags &= ~(XBF_ASYNC | XBF_READ);
976 976
977 xfs_buf_delwri_dequeue(bp); 977 xfs_buf_delwri_dequeue(bp);
978 xfs_bdstrat_cb(bp); 978 xfs_bdstrat_cb(bp);
979 979
980 error = xfs_buf_iowait(bp); 980 error = xfs_buf_iowait(bp);
981 if (error) 981 if (error)
982 xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR); 982 xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
983 xfs_buf_relse(bp); 983 xfs_buf_relse(bp);
984 return error; 984 return error;
985 } 985 }
986 986
987 void 987 void
988 xfs_bdwrite( 988 xfs_bdwrite(
989 void *mp, 989 void *mp,
990 struct xfs_buf *bp) 990 struct xfs_buf *bp)
991 { 991 {
992 trace_xfs_buf_bdwrite(bp, _RET_IP_); 992 trace_xfs_buf_bdwrite(bp, _RET_IP_);
993 993
994 bp->b_mount = mp; 994 bp->b_mount = mp;
995 995
996 bp->b_flags &= ~XBF_READ; 996 bp->b_flags &= ~XBF_READ;
997 bp->b_flags |= (XBF_DELWRI | XBF_ASYNC); 997 bp->b_flags |= (XBF_DELWRI | XBF_ASYNC);
998 998
999 xfs_buf_delwri_queue(bp, 1); 999 xfs_buf_delwri_queue(bp, 1);
1000 } 1000 }
1001 1001
1002 /* 1002 /*
1003 * Called when we want to stop a buffer from getting written or read. 1003 * Called when we want to stop a buffer from getting written or read.
1004 * We attach the EIO error, muck with its flags, and call biodone 1004 * We attach the EIO error, muck with its flags, and call biodone
1005 * so that the proper iodone callbacks get called. 1005 * so that the proper iodone callbacks get called.
1006 */ 1006 */
1007 STATIC int 1007 STATIC int
1008 xfs_bioerror( 1008 xfs_bioerror(
1009 xfs_buf_t *bp) 1009 xfs_buf_t *bp)
1010 { 1010 {
1011 #ifdef XFSERRORDEBUG 1011 #ifdef XFSERRORDEBUG
1012 ASSERT(XFS_BUF_ISREAD(bp) || bp->b_iodone); 1012 ASSERT(XFS_BUF_ISREAD(bp) || bp->b_iodone);
1013 #endif 1013 #endif
1014 1014
1015 /* 1015 /*
1016 * No need to wait until the buffer is unpinned, we aren't flushing it. 1016 * No need to wait until the buffer is unpinned, we aren't flushing it.
1017 */ 1017 */
1018 XFS_BUF_ERROR(bp, EIO); 1018 XFS_BUF_ERROR(bp, EIO);
1019 1019
1020 /* 1020 /*
1021 * We're calling biodone, so delete XBF_DONE flag. 1021 * We're calling biodone, so delete XBF_DONE flag.
1022 */ 1022 */
1023 XFS_BUF_UNREAD(bp); 1023 XFS_BUF_UNREAD(bp);
1024 XFS_BUF_UNDELAYWRITE(bp); 1024 XFS_BUF_UNDELAYWRITE(bp);
1025 XFS_BUF_UNDONE(bp); 1025 XFS_BUF_UNDONE(bp);
1026 XFS_BUF_STALE(bp); 1026 XFS_BUF_STALE(bp);
1027 1027
1028 xfs_biodone(bp); 1028 xfs_biodone(bp);
1029 1029
1030 return EIO; 1030 return EIO;
1031 } 1031 }
1032 1032
1033 /* 1033 /*
1034 * Same as xfs_bioerror, except that we are releasing the buffer 1034 * Same as xfs_bioerror, except that we are releasing the buffer
1035 * here ourselves, and avoiding the biodone call. 1035 * here ourselves, and avoiding the biodone call.
1036 * This is meant for userdata errors; metadata bufs come with 1036 * This is meant for userdata errors; metadata bufs come with
1037 * iodone functions attached, so that we can track down errors. 1037 * iodone functions attached, so that we can track down errors.
1038 */ 1038 */
1039 STATIC int 1039 STATIC int
1040 xfs_bioerror_relse( 1040 xfs_bioerror_relse(
1041 struct xfs_buf *bp) 1041 struct xfs_buf *bp)
1042 { 1042 {
1043 int64_t fl = XFS_BUF_BFLAGS(bp); 1043 int64_t fl = XFS_BUF_BFLAGS(bp);
1044 /* 1044 /*
1045 * No need to wait until the buffer is unpinned. 1045 * No need to wait until the buffer is unpinned.
1046 * We aren't flushing it. 1046 * We aren't flushing it.
1047 * 1047 *
1048 * chunkhold expects B_DONE to be set, whether 1048 * chunkhold expects B_DONE to be set, whether
1049 * we actually finish the I/O or not. We don't want to 1049 * we actually finish the I/O or not. We don't want to
1050 * change that interface. 1050 * change that interface.
1051 */ 1051 */
1052 XFS_BUF_UNREAD(bp); 1052 XFS_BUF_UNREAD(bp);
1053 XFS_BUF_UNDELAYWRITE(bp); 1053 XFS_BUF_UNDELAYWRITE(bp);
1054 XFS_BUF_DONE(bp); 1054 XFS_BUF_DONE(bp);
1055 XFS_BUF_STALE(bp); 1055 XFS_BUF_STALE(bp);
1056 XFS_BUF_CLR_IODONE_FUNC(bp); 1056 XFS_BUF_CLR_IODONE_FUNC(bp);
1057 if (!(fl & XBF_ASYNC)) { 1057 if (!(fl & XBF_ASYNC)) {
1058 /* 1058 /*
1059 * Mark b_error and B_ERROR _both_. 1059 * Mark b_error and B_ERROR _both_.
1060 * Lots of chunkcache code assumes that. 1060 * Lots of chunkcache code assumes that.
1061 * There's no reason to mark error for 1061 * There's no reason to mark error for
1062 * ASYNC buffers. 1062 * ASYNC buffers.
1063 */ 1063 */
1064 XFS_BUF_ERROR(bp, EIO); 1064 XFS_BUF_ERROR(bp, EIO);
1065 XFS_BUF_FINISH_IOWAIT(bp); 1065 XFS_BUF_FINISH_IOWAIT(bp);
1066 } else { 1066 } else {
1067 xfs_buf_relse(bp); 1067 xfs_buf_relse(bp);
1068 } 1068 }
1069 1069
1070 return EIO; 1070 return EIO;
1071 } 1071 }
1072 1072
1073 1073
1074 /* 1074 /*
1075 * All xfs metadata buffers except log state machine buffers 1075 * All xfs metadata buffers except log state machine buffers
1076 * get this attached as their b_bdstrat callback function. 1076 * get this attached as their b_bdstrat callback function.
1077 * This is so that we can catch a buffer 1077 * This is so that we can catch a buffer
1078 * after prematurely unpinning it to forcibly shutdown the filesystem. 1078 * after prematurely unpinning it to forcibly shutdown the filesystem.
1079 */ 1079 */
1080 int 1080 int
1081 xfs_bdstrat_cb( 1081 xfs_bdstrat_cb(
1082 struct xfs_buf *bp) 1082 struct xfs_buf *bp)
1083 { 1083 {
1084 if (XFS_FORCED_SHUTDOWN(bp->b_mount)) { 1084 if (XFS_FORCED_SHUTDOWN(bp->b_mount)) {
1085 trace_xfs_bdstrat_shut(bp, _RET_IP_); 1085 trace_xfs_bdstrat_shut(bp, _RET_IP_);
1086 /* 1086 /*
1087 * Metadata write that didn't get logged but 1087 * Metadata write that didn't get logged but
1088 * written delayed anyway. These aren't associated 1088 * written delayed anyway. These aren't associated
1089 * with a transaction, and can be ignored. 1089 * with a transaction, and can be ignored.
1090 */ 1090 */
1091 if (!bp->b_iodone && !XFS_BUF_ISREAD(bp)) 1091 if (!bp->b_iodone && !XFS_BUF_ISREAD(bp))
1092 return xfs_bioerror_relse(bp); 1092 return xfs_bioerror_relse(bp);
1093 else 1093 else
1094 return xfs_bioerror(bp); 1094 return xfs_bioerror(bp);
1095 } 1095 }
1096 1096
1097 xfs_buf_iorequest(bp); 1097 xfs_buf_iorequest(bp);
1098 return 0; 1098 return 0;
1099 } 1099 }
1100 1100
1101 /* 1101 /*
1102 * Wrapper around bdstrat so that we can stop data from going to disk in case 1102 * Wrapper around bdstrat so that we can stop data from going to disk in case
1103 * we are shutting down the filesystem. Typically user data goes through this 1103 * we are shutting down the filesystem. Typically user data goes through this
1104 * path; one of the exceptions is the superblock. 1104 * path; one of the exceptions is the superblock.
1105 */ 1105 */
1106 void 1106 void
1107 xfsbdstrat( 1107 xfsbdstrat(
1108 struct xfs_mount *mp, 1108 struct xfs_mount *mp,
1109 struct xfs_buf *bp) 1109 struct xfs_buf *bp)
1110 { 1110 {
1111 if (XFS_FORCED_SHUTDOWN(mp)) { 1111 if (XFS_FORCED_SHUTDOWN(mp)) {
1112 trace_xfs_bdstrat_shut(bp, _RET_IP_); 1112 trace_xfs_bdstrat_shut(bp, _RET_IP_);
1113 xfs_bioerror_relse(bp); 1113 xfs_bioerror_relse(bp);
1114 return; 1114 return;
1115 } 1115 }
1116 1116
1117 xfs_buf_iorequest(bp); 1117 xfs_buf_iorequest(bp);
1118 } 1118 }
1119 1119
1120 STATIC void 1120 STATIC void
1121 _xfs_buf_ioend( 1121 _xfs_buf_ioend(
1122 xfs_buf_t *bp, 1122 xfs_buf_t *bp,
1123 int schedule) 1123 int schedule)
1124 { 1124 {
1125 if (atomic_dec_and_test(&bp->b_io_remaining) == 1) { 1125 if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
1126 bp->b_flags &= ~_XBF_PAGE_LOCKED; 1126 bp->b_flags &= ~_XBF_PAGE_LOCKED;
1127 xfs_buf_ioend(bp, schedule); 1127 xfs_buf_ioend(bp, schedule);
1128 } 1128 }
1129 } 1129 }
1130 1130
1131 STATIC void 1131 STATIC void
1132 xfs_buf_bio_end_io( 1132 xfs_buf_bio_end_io(
1133 struct bio *bio, 1133 struct bio *bio,
1134 int error) 1134 int error)
1135 { 1135 {
1136 xfs_buf_t *bp = (xfs_buf_t *)bio->bi_private; 1136 xfs_buf_t *bp = (xfs_buf_t *)bio->bi_private;
1137 unsigned int blocksize = bp->b_target->bt_bsize; 1137 unsigned int blocksize = bp->b_target->bt_bsize;
1138 struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1; 1138 struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
1139 1139
1140 xfs_buf_ioerror(bp, -error); 1140 xfs_buf_ioerror(bp, -error);
1141 1141
1142 if (!error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ)) 1142 if (!error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
1143 invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp)); 1143 invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
1144 1144
1145 do { 1145 do {
1146 struct page *page = bvec->bv_page; 1146 struct page *page = bvec->bv_page;
1147 1147
1148 ASSERT(!PagePrivate(page)); 1148 ASSERT(!PagePrivate(page));
1149 if (unlikely(bp->b_error)) { 1149 if (unlikely(bp->b_error)) {
1150 if (bp->b_flags & XBF_READ) 1150 if (bp->b_flags & XBF_READ)
1151 ClearPageUptodate(page); 1151 ClearPageUptodate(page);
1152 } else if (blocksize >= PAGE_CACHE_SIZE) { 1152 } else if (blocksize >= PAGE_CACHE_SIZE) {
1153 SetPageUptodate(page); 1153 SetPageUptodate(page);
1154 } else if (!PagePrivate(page) && 1154 } else if (!PagePrivate(page) &&
1155 (bp->b_flags & _XBF_PAGE_CACHE)) { 1155 (bp->b_flags & _XBF_PAGE_CACHE)) {
1156 set_page_region(page, bvec->bv_offset, bvec->bv_len); 1156 set_page_region(page, bvec->bv_offset, bvec->bv_len);
1157 } 1157 }
1158 1158
1159 if (--bvec >= bio->bi_io_vec) 1159 if (--bvec >= bio->bi_io_vec)
1160 prefetchw(&bvec->bv_page->flags); 1160 prefetchw(&bvec->bv_page->flags);
1161 1161
1162 if (bp->b_flags & _XBF_PAGE_LOCKED) 1162 if (bp->b_flags & _XBF_PAGE_LOCKED)
1163 unlock_page(page); 1163 unlock_page(page);
1164 } while (bvec >= bio->bi_io_vec); 1164 } while (bvec >= bio->bi_io_vec);
1165 1165
1166 _xfs_buf_ioend(bp, 1); 1166 _xfs_buf_ioend(bp, 1);
1167 bio_put(bio); 1167 bio_put(bio);
1168 } 1168 }
1169 1169
1170 STATIC void 1170 STATIC void
1171 _xfs_buf_ioapply( 1171 _xfs_buf_ioapply(
1172 xfs_buf_t *bp) 1172 xfs_buf_t *bp)
1173 { 1173 {
1174 int rw, map_i, total_nr_pages, nr_pages; 1174 int rw, map_i, total_nr_pages, nr_pages;
1175 struct bio *bio; 1175 struct bio *bio;
1176 int offset = bp->b_offset; 1176 int offset = bp->b_offset;
1177 int size = bp->b_count_desired; 1177 int size = bp->b_count_desired;
1178 sector_t sector = bp->b_bn; 1178 sector_t sector = bp->b_bn;
1179 unsigned int blocksize = bp->b_target->bt_bsize; 1179 unsigned int blocksize = bp->b_target->bt_bsize;
1180 1180
1181 total_nr_pages = bp->b_page_count; 1181 total_nr_pages = bp->b_page_count;
1182 map_i = 0; 1182 map_i = 0;
1183 1183
1184 if (bp->b_flags & XBF_ORDERED) { 1184 if (bp->b_flags & XBF_ORDERED) {
1185 ASSERT(!(bp->b_flags & XBF_READ)); 1185 ASSERT(!(bp->b_flags & XBF_READ));
1186 rw = WRITE_FLUSH_FUA; 1186 rw = WRITE_FLUSH_FUA;
1187 } else if (bp->b_flags & XBF_LOG_BUFFER) { 1187 } else if (bp->b_flags & XBF_LOG_BUFFER) {
1188 ASSERT(!(bp->b_flags & XBF_READ_AHEAD)); 1188 ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
1189 bp->b_flags &= ~_XBF_RUN_QUEUES; 1189 bp->b_flags &= ~_XBF_RUN_QUEUES;
1190 rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC; 1190 rw = (bp->b_flags & XBF_WRITE) ? WRITE_SYNC : READ_SYNC;
1191 } else if (bp->b_flags & _XBF_RUN_QUEUES) { 1191 } else if (bp->b_flags & _XBF_RUN_QUEUES) {
1192 ASSERT(!(bp->b_flags & XBF_READ_AHEAD)); 1192 ASSERT(!(bp->b_flags & XBF_READ_AHEAD));
1193 bp->b_flags &= ~_XBF_RUN_QUEUES; 1193 bp->b_flags &= ~_XBF_RUN_QUEUES;
1194 rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META; 1194 rw = (bp->b_flags & XBF_WRITE) ? WRITE_META : READ_META;
1195 } else { 1195 } else {
1196 rw = (bp->b_flags & XBF_WRITE) ? WRITE : 1196 rw = (bp->b_flags & XBF_WRITE) ? WRITE :
1197 (bp->b_flags & XBF_READ_AHEAD) ? READA : READ; 1197 (bp->b_flags & XBF_READ_AHEAD) ? READA : READ;
1198 } 1198 }
1199 1199
1200 /* Special code path for reading a sub-page-size buffer in -- 1200 /* Special code path for reading a sub-page-size buffer in --
1201 * we populate the whole page, and hence the other metadata 1201 * we populate the whole page, and hence the other metadata
1202 * in the same page. This optimization is only valid when the 1202 * in the same page. This optimization is only valid when the
1203 * filesystem block size is not smaller than the page size. 1203 * filesystem block size is not smaller than the page size.
1204 */ 1204 */
1205 if ((bp->b_buffer_length < PAGE_CACHE_SIZE) && 1205 if ((bp->b_buffer_length < PAGE_CACHE_SIZE) &&
1206 ((bp->b_flags & (XBF_READ|_XBF_PAGE_LOCKED)) == 1206 ((bp->b_flags & (XBF_READ|_XBF_PAGE_LOCKED)) ==
1207 (XBF_READ|_XBF_PAGE_LOCKED)) && 1207 (XBF_READ|_XBF_PAGE_LOCKED)) &&
1208 (blocksize >= PAGE_CACHE_SIZE)) { 1208 (blocksize >= PAGE_CACHE_SIZE)) {
1209 bio = bio_alloc(GFP_NOIO, 1); 1209 bio = bio_alloc(GFP_NOIO, 1);
1210 1210
1211 bio->bi_bdev = bp->b_target->bt_bdev; 1211 bio->bi_bdev = bp->b_target->bt_bdev;
1212 bio->bi_sector = sector - (offset >> BBSHIFT); 1212 bio->bi_sector = sector - (offset >> BBSHIFT);
1213 bio->bi_end_io = xfs_buf_bio_end_io; 1213 bio->bi_end_io = xfs_buf_bio_end_io;
1214 bio->bi_private = bp; 1214 bio->bi_private = bp;
1215 1215
1216 bio_add_page(bio, bp->b_pages[0], PAGE_CACHE_SIZE, 0); 1216 bio_add_page(bio, bp->b_pages[0], PAGE_CACHE_SIZE, 0);
1217 size = 0; 1217 size = 0;
1218 1218
1219 atomic_inc(&bp->b_io_remaining); 1219 atomic_inc(&bp->b_io_remaining);
1220 1220
1221 goto submit_io; 1221 goto submit_io;
1222 } 1222 }
1223 1223
1224 next_chunk: 1224 next_chunk:
1225 atomic_inc(&bp->b_io_remaining); 1225 atomic_inc(&bp->b_io_remaining);
1226 nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT); 1226 nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - BBSHIFT);
1227 if (nr_pages > total_nr_pages) 1227 if (nr_pages > total_nr_pages)
1228 nr_pages = total_nr_pages; 1228 nr_pages = total_nr_pages;
1229 1229
1230 bio = bio_alloc(GFP_NOIO, nr_pages); 1230 bio = bio_alloc(GFP_NOIO, nr_pages);
1231 bio->bi_bdev = bp->b_target->bt_bdev; 1231 bio->bi_bdev = bp->b_target->bt_bdev;
1232 bio->bi_sector = sector; 1232 bio->bi_sector = sector;
1233 bio->bi_end_io = xfs_buf_bio_end_io; 1233 bio->bi_end_io = xfs_buf_bio_end_io;
1234 bio->bi_private = bp; 1234 bio->bi_private = bp;
1235 1235
1236 for (; size && nr_pages; nr_pages--, map_i++) { 1236 for (; size && nr_pages; nr_pages--, map_i++) {
1237 int rbytes, nbytes = PAGE_CACHE_SIZE - offset; 1237 int rbytes, nbytes = PAGE_CACHE_SIZE - offset;
1238 1238
1239 if (nbytes > size) 1239 if (nbytes > size)
1240 nbytes = size; 1240 nbytes = size;
1241 1241
1242 rbytes = bio_add_page(bio, bp->b_pages[map_i], nbytes, offset); 1242 rbytes = bio_add_page(bio, bp->b_pages[map_i], nbytes, offset);
1243 if (rbytes < nbytes) 1243 if (rbytes < nbytes)
1244 break; 1244 break;
1245 1245
1246 offset = 0; 1246 offset = 0;
1247 sector += nbytes >> BBSHIFT; 1247 sector += nbytes >> BBSHIFT;
1248 size -= nbytes; 1248 size -= nbytes;
1249 total_nr_pages--; 1249 total_nr_pages--;
1250 } 1250 }
1251 1251
1252 submit_io: 1252 submit_io:
1253 if (likely(bio->bi_size)) { 1253 if (likely(bio->bi_size)) {
1254 if (xfs_buf_is_vmapped(bp)) { 1254 if (xfs_buf_is_vmapped(bp)) {
1255 flush_kernel_vmap_range(bp->b_addr, 1255 flush_kernel_vmap_range(bp->b_addr,
1256 xfs_buf_vmap_len(bp)); 1256 xfs_buf_vmap_len(bp));
1257 } 1257 }
1258 submit_bio(rw, bio); 1258 submit_bio(rw, bio);
1259 if (size) 1259 if (size)
1260 goto next_chunk; 1260 goto next_chunk;
1261 } else { 1261 } else {
1262 /* 1262 /*
1263 * If we get here, no pages were added to the bio. However, 1263 * If we get here, no pages were added to the bio. However,
1264 * we can't just error out here - if the pages are locked then 1264 * we can't just error out here - if the pages are locked then
1265 * we have to unlock them, otherwise we can hang on a later 1265 * we have to unlock them, otherwise we can hang on a later
1266 * access to the page. 1266 * access to the page.
1267 */ 1267 */
1268 xfs_buf_ioerror(bp, EIO); 1268 xfs_buf_ioerror(bp, EIO);
1269 if (bp->b_flags & _XBF_PAGE_LOCKED) { 1269 if (bp->b_flags & _XBF_PAGE_LOCKED) {
1270 int i; 1270 int i;
1271 for (i = 0; i < bp->b_page_count; i++) 1271 for (i = 0; i < bp->b_page_count; i++)
1272 unlock_page(bp->b_pages[i]); 1272 unlock_page(bp->b_pages[i]);
1273 } 1273 }
1274 bio_put(bio); 1274 bio_put(bio);
1275 } 1275 }
1276 } 1276 }
1277 1277
1278 int 1278 int
1279 xfs_buf_iorequest( 1279 xfs_buf_iorequest(
1280 xfs_buf_t *bp) 1280 xfs_buf_t *bp)
1281 { 1281 {
1282 trace_xfs_buf_iorequest(bp, _RET_IP_); 1282 trace_xfs_buf_iorequest(bp, _RET_IP_);
1283 1283
1284 if (bp->b_flags & XBF_DELWRI) { 1284 if (bp->b_flags & XBF_DELWRI) {
1285 xfs_buf_delwri_queue(bp, 1); 1285 xfs_buf_delwri_queue(bp, 1);
1286 return 0; 1286 return 0;
1287 } 1287 }
1288 1288
1289 if (bp->b_flags & XBF_WRITE) { 1289 if (bp->b_flags & XBF_WRITE) {
1290 xfs_buf_wait_unpin(bp); 1290 xfs_buf_wait_unpin(bp);
1291 } 1291 }
1292 1292
1293 xfs_buf_hold(bp); 1293 xfs_buf_hold(bp);
1294 1294
1295 /* Set the count to 1 initially; this will stop an I/O 1295 /* Set the count to 1 initially; this will stop an I/O
1296 * completion callout which happens before we have started 1296 * completion callout which happens before we have started
1297 * all the I/O from calling xfs_buf_ioend too early. 1297 * all the I/O from calling xfs_buf_ioend too early.
1298 */ 1298 */
1299 atomic_set(&bp->b_io_remaining, 1); 1299 atomic_set(&bp->b_io_remaining, 1);
1300 _xfs_buf_ioapply(bp); 1300 _xfs_buf_ioapply(bp);
1301 _xfs_buf_ioend(bp, 0); 1301 _xfs_buf_ioend(bp, 0);
1302 1302
1303 xfs_buf_rele(bp); 1303 xfs_buf_rele(bp);
1304 return 0; 1304 return 0;
1305 } 1305 }
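The b_io_remaining handling above is the usual "submission bias" trick: the count starts at 1 so that per-bio completions can never drive it to zero while _xfs_buf_ioapply() is still building and submitting bios; the final _xfs_buf_ioend(bp, 0) drops that bias. A generic, hedged sketch of the same pattern (my_io, submit_chunk() and io_done() are illustrative names):

	static void submit_all(struct my_io *io)
	{
		int i;

		atomic_set(&io->pending, 1);		/* submission bias */

		for (i = 0; i < io->nr_chunks; i++) {
			atomic_inc(&io->pending);	/* one per in-flight chunk */
			submit_chunk(io, i);		/* its completion handler does
							 * atomic_dec_and_test() and
							 * calls io_done() on zero */
		}

		/* drop the bias; runs io_done() only if every chunk already finished */
		if (atomic_dec_and_test(&io->pending))
			io_done(io);
	}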
1306 1306
1307 /* 1307 /*
1308 * Waits for I/O to complete on the buffer supplied. 1308 * Waits for I/O to complete on the buffer supplied.
1309 * It returns immediately if no I/O is pending. 1309 * It returns immediately if no I/O is pending.
1310 * It returns the I/O error code, if any, or 0 if there was no error. 1310 * It returns the I/O error code, if any, or 0 if there was no error.
1311 */ 1311 */
1312 int 1312 int
1313 xfs_buf_iowait( 1313 xfs_buf_iowait(
1314 xfs_buf_t *bp) 1314 xfs_buf_t *bp)
1315 { 1315 {
1316 trace_xfs_buf_iowait(bp, _RET_IP_); 1316 trace_xfs_buf_iowait(bp, _RET_IP_);
1317 1317
1318 if (atomic_read(&bp->b_io_remaining)) 1318 if (atomic_read(&bp->b_io_remaining))
1319 blk_run_address_space(bp->b_target->bt_mapping); 1319 blk_run_address_space(bp->b_target->bt_mapping);
1320 wait_for_completion(&bp->b_iowait); 1320 wait_for_completion(&bp->b_iowait);
1321 1321
1322 trace_xfs_buf_iowait_done(bp, _RET_IP_); 1322 trace_xfs_buf_iowait_done(bp, _RET_IP_);
1323 return bp->b_error; 1323 return bp->b_error;
1324 } 1324 }
1325 1325
1326 xfs_caddr_t 1326 xfs_caddr_t
1327 xfs_buf_offset( 1327 xfs_buf_offset(
1328 xfs_buf_t *bp, 1328 xfs_buf_t *bp,
1329 size_t offset) 1329 size_t offset)
1330 { 1330 {
1331 struct page *page; 1331 struct page *page;
1332 1332
1333 if (bp->b_flags & XBF_MAPPED) 1333 if (bp->b_flags & XBF_MAPPED)
1334 return XFS_BUF_PTR(bp) + offset; 1334 return XFS_BUF_PTR(bp) + offset;
1335 1335
1336 offset += bp->b_offset; 1336 offset += bp->b_offset;
1337 page = bp->b_pages[offset >> PAGE_CACHE_SHIFT]; 1337 page = bp->b_pages[offset >> PAGE_CACHE_SHIFT];
1338 return (xfs_caddr_t)page_address(page) + (offset & (PAGE_CACHE_SIZE-1)); 1338 return (xfs_caddr_t)page_address(page) + (offset & (PAGE_CACHE_SIZE-1));
1339 } 1339 }
1340 1340
1341 /* 1341 /*
1342 * Move data into or out of a buffer. 1342 * Move data into or out of a buffer.
1343 */ 1343 */
1344 void 1344 void
1345 xfs_buf_iomove( 1345 xfs_buf_iomove(
1346 xfs_buf_t *bp, /* buffer to process */ 1346 xfs_buf_t *bp, /* buffer to process */
1347 size_t boff, /* starting buffer offset */ 1347 size_t boff, /* starting buffer offset */
1348 size_t bsize, /* length to copy */ 1348 size_t bsize, /* length to copy */
1349 void *data, /* data address */ 1349 void *data, /* data address */
1350 xfs_buf_rw_t mode) /* read/write/zero flag */ 1350 xfs_buf_rw_t mode) /* read/write/zero flag */
1351 { 1351 {
1352 size_t bend, cpoff, csize; 1352 size_t bend, cpoff, csize;
1353 struct page *page; 1353 struct page *page;
1354 1354
1355 bend = boff + bsize; 1355 bend = boff + bsize;
1356 while (boff < bend) { 1356 while (boff < bend) {
1357 page = bp->b_pages[xfs_buf_btoct(boff + bp->b_offset)]; 1357 page = bp->b_pages[xfs_buf_btoct(boff + bp->b_offset)];
1358 cpoff = xfs_buf_poff(boff + bp->b_offset); 1358 cpoff = xfs_buf_poff(boff + bp->b_offset);
1359 csize = min_t(size_t, 1359 csize = min_t(size_t,
1360 PAGE_CACHE_SIZE-cpoff, bp->b_count_desired-boff); 1360 PAGE_CACHE_SIZE-cpoff, bp->b_count_desired-boff);
1361 1361
1362 ASSERT(((csize + cpoff) <= PAGE_CACHE_SIZE)); 1362 ASSERT(((csize + cpoff) <= PAGE_CACHE_SIZE));
1363 1363
1364 switch (mode) { 1364 switch (mode) {
1365 case XBRW_ZERO: 1365 case XBRW_ZERO:
1366 memset(page_address(page) + cpoff, 0, csize); 1366 memset(page_address(page) + cpoff, 0, csize);
1367 break; 1367 break;
1368 case XBRW_READ: 1368 case XBRW_READ:
1369 memcpy(data, page_address(page) + cpoff, csize); 1369 memcpy(data, page_address(page) + cpoff, csize);
1370 break; 1370 break;
1371 case XBRW_WRITE: 1371 case XBRW_WRITE:
1372 memcpy(page_address(page) + cpoff, data, csize); 1372 memcpy(page_address(page) + cpoff, data, csize);
1373 } 1373 }
1374 1374
1375 boff += csize; 1375 boff += csize;
1376 data += csize; 1376 data += csize;
1377 } 1377 }
1378 } 1378 }
1379 1379
1380 /* 1380 /*
1381 * Handling of buffer targets (buftargs). 1381 * Handling of buffer targets (buftargs).
1382 */ 1382 */
1383 1383
1384 /* 1384 /*
1385 * Wait for any bufs with callbacks that have been submitted but 1385 * Wait for any bufs with callbacks that have been submitted but
1386 * have not yet returned... walk the hash list for the target. 1386 * have not yet returned... walk the hash list for the target.
1387 */ 1387 */
1388 void 1388 void
1389 xfs_wait_buftarg( 1389 xfs_wait_buftarg(
1390 xfs_buftarg_t *btp) 1390 xfs_buftarg_t *btp)
1391 { 1391 {
1392 xfs_buf_t *bp, *n; 1392 xfs_buf_t *bp, *n;
1393 xfs_bufhash_t *hash; 1393 xfs_bufhash_t *hash;
1394 uint i; 1394 uint i;
1395 1395
1396 for (i = 0; i < (1 << btp->bt_hashshift); i++) { 1396 for (i = 0; i < (1 << btp->bt_hashshift); i++) {
1397 hash = &btp->bt_hash[i]; 1397 hash = &btp->bt_hash[i];
1398 again: 1398 again:
1399 spin_lock(&hash->bh_lock); 1399 spin_lock(&hash->bh_lock);
1400 list_for_each_entry_safe(bp, n, &hash->bh_list, b_hash_list) { 1400 list_for_each_entry_safe(bp, n, &hash->bh_list, b_hash_list) {
1401 ASSERT(btp == bp->b_target); 1401 ASSERT(btp == bp->b_target);
1402 if (!(bp->b_flags & XBF_FS_MANAGED)) { 1402 if (!(bp->b_flags & XBF_FS_MANAGED)) {
1403 spin_unlock(&hash->bh_lock); 1403 spin_unlock(&hash->bh_lock);
1404 /* 1404 /*
1405 * Catch superblock reference count leaks 1405 * Catch superblock reference count leaks
1406 * immediately 1406 * immediately
1407 */ 1407 */
1408 BUG_ON(bp->b_bn == 0); 1408 BUG_ON(bp->b_bn == 0);
1409 delay(100); 1409 delay(100);
1410 goto again; 1410 goto again;
1411 } 1411 }
1412 } 1412 }
1413 spin_unlock(&hash->bh_lock); 1413 spin_unlock(&hash->bh_lock);
1414 } 1414 }
1415 } 1415 }
1416 1416
1417 /* 1417 /*
1418 * Allocate buffer hash table for a given target. 1418 * Allocate buffer hash table for a given target.
1419 * For devices containing metadata (i.e. not the log/realtime devices) 1419 * For devices containing metadata (i.e. not the log/realtime devices)
1420 * we need to allocate a much larger hash table. 1420 * we need to allocate a much larger hash table.
1421 */ 1421 */
1422 STATIC void 1422 STATIC void
1423 xfs_alloc_bufhash( 1423 xfs_alloc_bufhash(
1424 xfs_buftarg_t *btp, 1424 xfs_buftarg_t *btp,
1425 int external) 1425 int external)
1426 { 1426 {
1427 unsigned int i; 1427 unsigned int i;
1428 1428
1429 btp->bt_hashshift = external ? 3 : 12; /* 8 or 4096 buckets */ 1429 btp->bt_hashshift = external ? 3 : 12; /* 8 or 4096 buckets */
1430 btp->bt_hash = kmem_zalloc_large((1 << btp->bt_hashshift) * 1430 btp->bt_hash = kmem_zalloc_large((1 << btp->bt_hashshift) *
1431 sizeof(xfs_bufhash_t)); 1431 sizeof(xfs_bufhash_t));
1432 for (i = 0; i < (1 << btp->bt_hashshift); i++) { 1432 for (i = 0; i < (1 << btp->bt_hashshift); i++) {
1433 spin_lock_init(&btp->bt_hash[i].bh_lock); 1433 spin_lock_init(&btp->bt_hash[i].bh_lock);
1434 INIT_LIST_HEAD(&btp->bt_hash[i].bh_list); 1434 INIT_LIST_HEAD(&btp->bt_hash[i].bh_list);
1435 } 1435 }
1436 } 1436 }
1437 1437
1438 STATIC void 1438 STATIC void
1439 xfs_free_bufhash( 1439 xfs_free_bufhash(
1440 xfs_buftarg_t *btp) 1440 xfs_buftarg_t *btp)
1441 { 1441 {
1442 kmem_free_large(btp->bt_hash); 1442 kmem_free_large(btp->bt_hash);
1443 btp->bt_hash = NULL; 1443 btp->bt_hash = NULL;
1444 } 1444 }
1445 1445
1446 /* 1446 /*
1447 * buftarg list for delwrite queue processing 1447 * buftarg list for delwrite queue processing
1448 */ 1448 */
1449 static LIST_HEAD(xfs_buftarg_list); 1449 static LIST_HEAD(xfs_buftarg_list);
1450 static DEFINE_SPINLOCK(xfs_buftarg_lock); 1450 static DEFINE_SPINLOCK(xfs_buftarg_lock);
1451 1451
1452 STATIC void 1452 STATIC void
1453 xfs_register_buftarg( 1453 xfs_register_buftarg(
1454 xfs_buftarg_t *btp) 1454 xfs_buftarg_t *btp)
1455 { 1455 {
1456 spin_lock(&xfs_buftarg_lock); 1456 spin_lock(&xfs_buftarg_lock);
1457 list_add(&btp->bt_list, &xfs_buftarg_list); 1457 list_add(&btp->bt_list, &xfs_buftarg_list);
1458 spin_unlock(&xfs_buftarg_lock); 1458 spin_unlock(&xfs_buftarg_lock);
1459 } 1459 }
1460 1460
1461 STATIC void 1461 STATIC void
1462 xfs_unregister_buftarg( 1462 xfs_unregister_buftarg(
1463 xfs_buftarg_t *btp) 1463 xfs_buftarg_t *btp)
1464 { 1464 {
1465 spin_lock(&xfs_buftarg_lock); 1465 spin_lock(&xfs_buftarg_lock);
1466 list_del(&btp->bt_list); 1466 list_del(&btp->bt_list);
1467 spin_unlock(&xfs_buftarg_lock); 1467 spin_unlock(&xfs_buftarg_lock);
1468 } 1468 }
1469 1469
1470 void 1470 void
1471 xfs_free_buftarg( 1471 xfs_free_buftarg(
1472 struct xfs_mount *mp, 1472 struct xfs_mount *mp,
1473 struct xfs_buftarg *btp) 1473 struct xfs_buftarg *btp)
1474 { 1474 {
1475 xfs_flush_buftarg(btp, 1); 1475 xfs_flush_buftarg(btp, 1);
1476 if (mp->m_flags & XFS_MOUNT_BARRIER) 1476 if (mp->m_flags & XFS_MOUNT_BARRIER)
1477 xfs_blkdev_issue_flush(btp); 1477 xfs_blkdev_issue_flush(btp);
1478 xfs_free_bufhash(btp); 1478 xfs_free_bufhash(btp);
1479 iput(btp->bt_mapping->host); 1479 iput(btp->bt_mapping->host);
1480 1480
1481 /* Unregister the buftarg first so that we don't get a 1481 /* Unregister the buftarg first so that we don't get a
1482 * wakeup finding a non-existent task 1482 * wakeup finding a non-existent task
1483 */ 1483 */
1484 xfs_unregister_buftarg(btp); 1484 xfs_unregister_buftarg(btp);
1485 kthread_stop(btp->bt_task); 1485 kthread_stop(btp->bt_task);
1486 1486
1487 kmem_free(btp); 1487 kmem_free(btp);
1488 } 1488 }
1489 1489
1490 STATIC int 1490 STATIC int
1491 xfs_setsize_buftarg_flags( 1491 xfs_setsize_buftarg_flags(
1492 xfs_buftarg_t *btp, 1492 xfs_buftarg_t *btp,
1493 unsigned int blocksize, 1493 unsigned int blocksize,
1494 unsigned int sectorsize, 1494 unsigned int sectorsize,
1495 int verbose) 1495 int verbose)
1496 { 1496 {
1497 btp->bt_bsize = blocksize; 1497 btp->bt_bsize = blocksize;
1498 btp->bt_sshift = ffs(sectorsize) - 1; 1498 btp->bt_sshift = ffs(sectorsize) - 1;
1499 btp->bt_smask = sectorsize - 1; 1499 btp->bt_smask = sectorsize - 1;
1500 1500
1501 if (set_blocksize(btp->bt_bdev, sectorsize)) { 1501 if (set_blocksize(btp->bt_bdev, sectorsize)) {
1502 printk(KERN_WARNING 1502 printk(KERN_WARNING
1503 "XFS: Cannot set_blocksize to %u on device %s\n", 1503 "XFS: Cannot set_blocksize to %u on device %s\n",
1504 sectorsize, XFS_BUFTARG_NAME(btp)); 1504 sectorsize, XFS_BUFTARG_NAME(btp));
1505 return EINVAL; 1505 return EINVAL;
1506 } 1506 }
1507 1507
1508 if (verbose && 1508 if (verbose &&
1509 (PAGE_CACHE_SIZE / BITS_PER_LONG) > sectorsize) { 1509 (PAGE_CACHE_SIZE / BITS_PER_LONG) > sectorsize) {
1510 printk(KERN_WARNING 1510 printk(KERN_WARNING
1511 "XFS: %u byte sectors in use on device %s. " 1511 "XFS: %u byte sectors in use on device %s. "
1512 "This is suboptimal; %u or greater is ideal.\n", 1512 "This is suboptimal; %u or greater is ideal.\n",
1513 sectorsize, XFS_BUFTARG_NAME(btp), 1513 sectorsize, XFS_BUFTARG_NAME(btp),
1514 (unsigned int)PAGE_CACHE_SIZE / BITS_PER_LONG); 1514 (unsigned int)PAGE_CACHE_SIZE / BITS_PER_LONG);
1515 } 1515 }
1516 1516
1517 return 0; 1517 return 0;
1518 } 1518 }
1519 1519
1520 /* 1520 /*
1521 * When allocating the initial buffer target we have not yet 1521 * When allocating the initial buffer target we have not yet
1522 * read in the superblock, so we don't know what size sectors 1522 * read in the superblock, so we don't know what size sectors
1523 * are in use at this early stage. Play safe. 1523 * are in use at this early stage. Play safe.
1524 */ 1524 */
1525 STATIC int 1525 STATIC int
1526 xfs_setsize_buftarg_early( 1526 xfs_setsize_buftarg_early(
1527 xfs_buftarg_t *btp, 1527 xfs_buftarg_t *btp,
1528 struct block_device *bdev) 1528 struct block_device *bdev)
1529 { 1529 {
1530 return xfs_setsize_buftarg_flags(btp, 1530 return xfs_setsize_buftarg_flags(btp,
1531 PAGE_CACHE_SIZE, bdev_logical_block_size(bdev), 0); 1531 PAGE_CACHE_SIZE, bdev_logical_block_size(bdev), 0);
1532 } 1532 }
1533 1533
1534 int 1534 int
1535 xfs_setsize_buftarg( 1535 xfs_setsize_buftarg(
1536 xfs_buftarg_t *btp, 1536 xfs_buftarg_t *btp,
1537 unsigned int blocksize, 1537 unsigned int blocksize,
1538 unsigned int sectorsize) 1538 unsigned int sectorsize)
1539 { 1539 {
1540 return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1); 1540 return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
1541 } 1541 }
1542 1542
1543 STATIC int 1543 STATIC int
1544 xfs_mapping_buftarg( 1544 xfs_mapping_buftarg(
1545 xfs_buftarg_t *btp, 1545 xfs_buftarg_t *btp,
1546 struct block_device *bdev) 1546 struct block_device *bdev)
1547 { 1547 {
1548 struct backing_dev_info *bdi; 1548 struct backing_dev_info *bdi;
1549 struct inode *inode; 1549 struct inode *inode;
1550 struct address_space *mapping; 1550 struct address_space *mapping;
1551 static const struct address_space_operations mapping_aops = { 1551 static const struct address_space_operations mapping_aops = {
1552 .sync_page = block_sync_page, 1552 .sync_page = block_sync_page,
1553 .migratepage = fail_migrate_page, 1553 .migratepage = fail_migrate_page,
1554 }; 1554 };
1555 1555
1556 inode = new_inode(bdev->bd_inode->i_sb); 1556 inode = new_inode(bdev->bd_inode->i_sb);
1557 if (!inode) { 1557 if (!inode) {
1558 printk(KERN_WARNING 1558 printk(KERN_WARNING
1559 "XFS: Cannot allocate mapping inode for device %s\n", 1559 "XFS: Cannot allocate mapping inode for device %s\n",
1560 XFS_BUFTARG_NAME(btp)); 1560 XFS_BUFTARG_NAME(btp));
1561 return ENOMEM; 1561 return ENOMEM;
1562 } 1562 }
1563 inode->i_mode = S_IFBLK; 1563 inode->i_mode = S_IFBLK;
1564 inode->i_bdev = bdev; 1564 inode->i_bdev = bdev;
1565 inode->i_rdev = bdev->bd_dev; 1565 inode->i_rdev = bdev->bd_dev;
1566 bdi = blk_get_backing_dev_info(bdev); 1566 bdi = blk_get_backing_dev_info(bdev);
1567 if (!bdi) 1567 if (!bdi)
1568 bdi = &default_backing_dev_info; 1568 bdi = &default_backing_dev_info;
1569 mapping = &inode->i_data; 1569 mapping = &inode->i_data;
1570 mapping->a_ops = &mapping_aops; 1570 mapping->a_ops = &mapping_aops;
1571 mapping->backing_dev_info = bdi; 1571 mapping->backing_dev_info = bdi;
1572 mapping_set_gfp_mask(mapping, GFP_NOFS); 1572 mapping_set_gfp_mask(mapping, GFP_NOFS);
1573 btp->bt_mapping = mapping; 1573 btp->bt_mapping = mapping;
1574 return 0; 1574 return 0;
1575 } 1575 }
1576 1576
1577 STATIC int 1577 STATIC int
1578 xfs_alloc_delwrite_queue( 1578 xfs_alloc_delwrite_queue(
1579 xfs_buftarg_t *btp, 1579 xfs_buftarg_t *btp,
1580 const char *fsname) 1580 const char *fsname)
1581 { 1581 {
1582 int error = 0; 1582 int error = 0;
1583 1583
1584 INIT_LIST_HEAD(&btp->bt_list); 1584 INIT_LIST_HEAD(&btp->bt_list);
1585 INIT_LIST_HEAD(&btp->bt_delwrite_queue); 1585 INIT_LIST_HEAD(&btp->bt_delwrite_queue);
1586 spin_lock_init(&btp->bt_delwrite_lock); 1586 spin_lock_init(&btp->bt_delwrite_lock);
1587 btp->bt_flags = 0; 1587 btp->bt_flags = 0;
1588 btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname); 1588 btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
1589 if (IS_ERR(btp->bt_task)) { 1589 if (IS_ERR(btp->bt_task)) {
1590 error = PTR_ERR(btp->bt_task); 1590 error = PTR_ERR(btp->bt_task);
1591 goto out_error; 1591 goto out_error;
1592 } 1592 }
1593 xfs_register_buftarg(btp); 1593 xfs_register_buftarg(btp);
1594 out_error: 1594 out_error:
1595 return error; 1595 return error;
1596 } 1596 }
1597 1597
1598 xfs_buftarg_t * 1598 xfs_buftarg_t *
1599 xfs_alloc_buftarg( 1599 xfs_alloc_buftarg(
1600 struct block_device *bdev, 1600 struct block_device *bdev,
1601 int external, 1601 int external,
1602 const char *fsname) 1602 const char *fsname)
1603 { 1603 {
1604 xfs_buftarg_t *btp; 1604 xfs_buftarg_t *btp;
1605 1605
1606 btp = kmem_zalloc(sizeof(*btp), KM_SLEEP); 1606 btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
1607 1607
1608 btp->bt_dev = bdev->bd_dev; 1608 btp->bt_dev = bdev->bd_dev;
1609 btp->bt_bdev = bdev; 1609 btp->bt_bdev = bdev;
1610 if (xfs_setsize_buftarg_early(btp, bdev)) 1610 if (xfs_setsize_buftarg_early(btp, bdev))
1611 goto error; 1611 goto error;
1612 if (xfs_mapping_buftarg(btp, bdev)) 1612 if (xfs_mapping_buftarg(btp, bdev))
1613 goto error; 1613 goto error;
1614 if (xfs_alloc_delwrite_queue(btp, fsname)) 1614 if (xfs_alloc_delwrite_queue(btp, fsname))
1615 goto error; 1615 goto error;
1616 xfs_alloc_bufhash(btp, external); 1616 xfs_alloc_bufhash(btp, external);
1617 return btp; 1617 return btp;
1618 1618
1619 error: 1619 error:
1620 kmem_free(btp); 1620 kmem_free(btp);
1621 return NULL; 1621 return NULL;
1622 } 1622 }
1623 1623
1624 1624
1625 /* 1625 /*
1626 * Delayed write buffer handling 1626 * Delayed write buffer handling
1627 */ 1627 */
1628 STATIC void 1628 STATIC void
1629 xfs_buf_delwri_queue( 1629 xfs_buf_delwri_queue(
1630 xfs_buf_t *bp, 1630 xfs_buf_t *bp,
1631 int unlock) 1631 int unlock)
1632 { 1632 {
1633 struct list_head *dwq = &bp->b_target->bt_delwrite_queue; 1633 struct list_head *dwq = &bp->b_target->bt_delwrite_queue;
1634 spinlock_t *dwlk = &bp->b_target->bt_delwrite_lock; 1634 spinlock_t *dwlk = &bp->b_target->bt_delwrite_lock;
1635 1635
1636 trace_xfs_buf_delwri_queue(bp, _RET_IP_); 1636 trace_xfs_buf_delwri_queue(bp, _RET_IP_);
1637 1637
1638 ASSERT((bp->b_flags&(XBF_DELWRI|XBF_ASYNC)) == (XBF_DELWRI|XBF_ASYNC)); 1638 ASSERT((bp->b_flags&(XBF_DELWRI|XBF_ASYNC)) == (XBF_DELWRI|XBF_ASYNC));
1639 1639
1640 spin_lock(dwlk); 1640 spin_lock(dwlk);
1641 /* If already in the queue, dequeue and place at tail */ 1641 /* If already in the queue, dequeue and place at tail */
1642 if (!list_empty(&bp->b_list)) { 1642 if (!list_empty(&bp->b_list)) {
1643 ASSERT(bp->b_flags & _XBF_DELWRI_Q); 1643 ASSERT(bp->b_flags & _XBF_DELWRI_Q);
1644 if (unlock) 1644 if (unlock)
1645 atomic_dec(&bp->b_hold); 1645 atomic_dec(&bp->b_hold);
1646 list_del(&bp->b_list); 1646 list_del(&bp->b_list);
1647 } 1647 }
1648 1648
1649 if (list_empty(dwq)) { 1649 if (list_empty(dwq)) {
1650 /* start xfsbufd as it is about to have something to do */ 1650 /* start xfsbufd as it is about to have something to do */
1651 wake_up_process(bp->b_target->bt_task); 1651 wake_up_process(bp->b_target->bt_task);
1652 } 1652 }
1653 1653
1654 bp->b_flags |= _XBF_DELWRI_Q; 1654 bp->b_flags |= _XBF_DELWRI_Q;
1655 list_add_tail(&bp->b_list, dwq); 1655 list_add_tail(&bp->b_list, dwq);
1656 bp->b_queuetime = jiffies; 1656 bp->b_queuetime = jiffies;
1657 spin_unlock(dwlk); 1657 spin_unlock(dwlk);
1658 1658
1659 if (unlock) 1659 if (unlock)
1660 xfs_buf_unlock(bp); 1660 xfs_buf_unlock(bp);
1661 } 1661 }
1662 1662
1663 void 1663 void
1664 xfs_buf_delwri_dequeue( 1664 xfs_buf_delwri_dequeue(
1665 xfs_buf_t *bp) 1665 xfs_buf_t *bp)
1666 { 1666 {
1667 spinlock_t *dwlk = &bp->b_target->bt_delwrite_lock; 1667 spinlock_t *dwlk = &bp->b_target->bt_delwrite_lock;
1668 int dequeued = 0; 1668 int dequeued = 0;
1669 1669
1670 spin_lock(dwlk); 1670 spin_lock(dwlk);
1671 if ((bp->b_flags & XBF_DELWRI) && !list_empty(&bp->b_list)) { 1671 if ((bp->b_flags & XBF_DELWRI) && !list_empty(&bp->b_list)) {
1672 ASSERT(bp->b_flags & _XBF_DELWRI_Q); 1672 ASSERT(bp->b_flags & _XBF_DELWRI_Q);
1673 list_del_init(&bp->b_list); 1673 list_del_init(&bp->b_list);
1674 dequeued = 1; 1674 dequeued = 1;
1675 } 1675 }
1676 bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q); 1676 bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q);
1677 spin_unlock(dwlk); 1677 spin_unlock(dwlk);
1678 1678
1679 if (dequeued) 1679 if (dequeued)
1680 xfs_buf_rele(bp); 1680 xfs_buf_rele(bp);
1681 1681
1682 trace_xfs_buf_delwri_dequeue(bp, _RET_IP_); 1682 trace_xfs_buf_delwri_dequeue(bp, _RET_IP_);
1683 } 1683 }
1684 1684
1685 /* 1685 /*
1686 * If a delwri buffer needs to be pushed before it has aged out, then promote 1686 * If a delwri buffer needs to be pushed before it has aged out, then promote
1687 * it to the head of the delwri queue so that it will be flushed on the next 1687 * it to the head of the delwri queue so that it will be flushed on the next
1688 * xfsbufd run. We do this by resetting the queuetime of the buffer to be older 1688 * xfsbufd run. We do this by resetting the queuetime of the buffer to be older
1689 * than the age currently needed to flush the buffer. Hence the next time the 1689 * than the age currently needed to flush the buffer. Hence the next time the
1690 * xfsbufd sees it is guaranteed to be considered old enough to flush. 1690 * xfsbufd sees it is guaranteed to be considered old enough to flush.
1691 */ 1691 */
1692 void 1692 void
1693 xfs_buf_delwri_promote( 1693 xfs_buf_delwri_promote(
1694 struct xfs_buf *bp) 1694 struct xfs_buf *bp)
1695 { 1695 {
1696 struct xfs_buftarg *btp = bp->b_target; 1696 struct xfs_buftarg *btp = bp->b_target;
1697 long age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1; 1697 long age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1;
1698 1698
1699 ASSERT(bp->b_flags & XBF_DELWRI); 1699 ASSERT(bp->b_flags & XBF_DELWRI);
1700 ASSERT(bp->b_flags & _XBF_DELWRI_Q); 1700 ASSERT(bp->b_flags & _XBF_DELWRI_Q);
1701 1701
1702 /* 1702 /*
1703 * Check the buffer age before locking the delayed write queue as we 1703 * Check the buffer age before locking the delayed write queue as we
1704 * don't need to promote buffers that are already past the flush age. 1704 * don't need to promote buffers that are already past the flush age.
1705 */ 1705 */
1706 if (bp->b_queuetime < jiffies - age) 1706 if (bp->b_queuetime < jiffies - age)
1707 return; 1707 return;
1708 bp->b_queuetime = jiffies - age; 1708 bp->b_queuetime = jiffies - age;
1709 spin_lock(&btp->bt_delwrite_lock); 1709 spin_lock(&btp->bt_delwrite_lock);
1710 list_move(&bp->b_list, &btp->bt_delwrite_queue); 1710 list_move(&bp->b_list, &btp->bt_delwrite_queue);
1711 spin_unlock(&btp->bt_delwrite_lock); 1711 spin_unlock(&btp->bt_delwrite_lock);
1712 } 1712 }
1713 1713
1714 STATIC void 1714 STATIC void
1715 xfs_buf_runall_queues( 1715 xfs_buf_runall_queues(
1716 struct workqueue_struct *queue) 1716 struct workqueue_struct *queue)
1717 { 1717 {
1718 flush_workqueue(queue); 1718 flush_workqueue(queue);
1719 } 1719 }
1720 1720
1721 STATIC int 1721 STATIC int
1722 xfsbufd_wakeup( 1722 xfsbufd_wakeup(
1723 struct shrinker *shrink, 1723 struct shrinker *shrink,
1724 int priority, 1724 int priority,
1725 gfp_t mask) 1725 gfp_t mask)
1726 { 1726 {
1727 xfs_buftarg_t *btp; 1727 xfs_buftarg_t *btp;
1728 1728
1729 spin_lock(&xfs_buftarg_lock); 1729 spin_lock(&xfs_buftarg_lock);
1730 list_for_each_entry(btp, &xfs_buftarg_list, bt_list) { 1730 list_for_each_entry(btp, &xfs_buftarg_list, bt_list) {
1731 if (test_bit(XBT_FORCE_SLEEP, &btp->bt_flags)) 1731 if (test_bit(XBT_FORCE_SLEEP, &btp->bt_flags))
1732 continue; 1732 continue;
1733 if (list_empty(&btp->bt_delwrite_queue)) 1733 if (list_empty(&btp->bt_delwrite_queue))
1734 continue; 1734 continue;
1735 set_bit(XBT_FORCE_FLUSH, &btp->bt_flags); 1735 set_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
1736 wake_up_process(btp->bt_task); 1736 wake_up_process(btp->bt_task);
1737 } 1737 }
1738 spin_unlock(&xfs_buftarg_lock); 1738 spin_unlock(&xfs_buftarg_lock);
1739 return 0; 1739 return 0;
1740 } 1740 }
1741 1741
1742 /* 1742 /*
1743 * Move as many buffers as specified to the supplied list 1743 * Move as many buffers as specified to the supplied list
1744 * indicating if we skipped any buffers to prevent deadlocks. 1744 * indicating if we skipped any buffers to prevent deadlocks.
1745 */ 1745 */
1746 STATIC int 1746 STATIC int
1747 xfs_buf_delwri_split( 1747 xfs_buf_delwri_split(
1748 xfs_buftarg_t *target, 1748 xfs_buftarg_t *target,
1749 struct list_head *list, 1749 struct list_head *list,
1750 unsigned long age) 1750 unsigned long age)
1751 { 1751 {
1752 xfs_buf_t *bp, *n; 1752 xfs_buf_t *bp, *n;
1753 struct list_head *dwq = &target->bt_delwrite_queue; 1753 struct list_head *dwq = &target->bt_delwrite_queue;
1754 spinlock_t *dwlk = &target->bt_delwrite_lock; 1754 spinlock_t *dwlk = &target->bt_delwrite_lock;
1755 int skipped = 0; 1755 int skipped = 0;
1756 int force; 1756 int force;
1757 1757
1758 force = test_and_clear_bit(XBT_FORCE_FLUSH, &target->bt_flags); 1758 force = test_and_clear_bit(XBT_FORCE_FLUSH, &target->bt_flags);
1759 INIT_LIST_HEAD(list); 1759 INIT_LIST_HEAD(list);
1760 spin_lock(dwlk); 1760 spin_lock(dwlk);
1761 list_for_each_entry_safe(bp, n, dwq, b_list) { 1761 list_for_each_entry_safe(bp, n, dwq, b_list) {
1762 trace_xfs_buf_delwri_split(bp, _RET_IP_); 1762 trace_xfs_buf_delwri_split(bp, _RET_IP_);
1763 ASSERT(bp->b_flags & XBF_DELWRI); 1763 ASSERT(bp->b_flags & XBF_DELWRI);
1764 1764
1765 if (!XFS_BUF_ISPINNED(bp) && !xfs_buf_cond_lock(bp)) { 1765 if (!XFS_BUF_ISPINNED(bp) && !xfs_buf_cond_lock(bp)) {
1766 if (!force && 1766 if (!force &&
1767 time_before(jiffies, bp->b_queuetime + age)) { 1767 time_before(jiffies, bp->b_queuetime + age)) {
1768 xfs_buf_unlock(bp); 1768 xfs_buf_unlock(bp);
1769 break; 1769 break;
1770 } 1770 }
1771 1771
1772 bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q| 1772 bp->b_flags &= ~(XBF_DELWRI|_XBF_DELWRI_Q|
1773 _XBF_RUN_QUEUES); 1773 _XBF_RUN_QUEUES);
1774 bp->b_flags |= XBF_WRITE; 1774 bp->b_flags |= XBF_WRITE;
1775 list_move_tail(&bp->b_list, list); 1775 list_move_tail(&bp->b_list, list);
1776 } else 1776 } else
1777 skipped++; 1777 skipped++;
1778 } 1778 }
1779 spin_unlock(dwlk); 1779 spin_unlock(dwlk);
1780 1780
1781 return skipped; 1781 return skipped;
1782 1782
1783 } 1783 }
1784 1784
1785 /* 1785 /*
1786 * Compare function is more complex than it needs to be because 1786 * Compare function is more complex than it needs to be because
1787 * the return value is only 32 bits and we are doing comparisons 1787 * the return value is only 32 bits and we are doing comparisons
1788 * on 64 bit values 1788 * on 64 bit values
1789 */ 1789 */
1790 static int 1790 static int
1791 xfs_buf_cmp( 1791 xfs_buf_cmp(
1792 void *priv, 1792 void *priv,
1793 struct list_head *a, 1793 struct list_head *a,
1794 struct list_head *b) 1794 struct list_head *b)
1795 { 1795 {
1796 struct xfs_buf *ap = container_of(a, struct xfs_buf, b_list); 1796 struct xfs_buf *ap = container_of(a, struct xfs_buf, b_list);
1797 struct xfs_buf *bp = container_of(b, struct xfs_buf, b_list); 1797 struct xfs_buf *bp = container_of(b, struct xfs_buf, b_list);
1798 xfs_daddr_t diff; 1798 xfs_daddr_t diff;
1799 1799
1800 diff = ap->b_bn - bp->b_bn; 1800 diff = ap->b_bn - bp->b_bn;
1801 if (diff < 0) 1801 if (diff < 0)
1802 return -1; 1802 return -1;
1803 if (diff > 0) 1803 if (diff > 0)
1804 return 1; 1804 return 1;
1805 return 0; 1805 return 0;
1806 } 1806 }
1807 1807
1808 void 1808 void
1809 xfs_buf_delwri_sort( 1809 xfs_buf_delwri_sort(
1810 xfs_buftarg_t *target, 1810 xfs_buftarg_t *target,
1811 struct list_head *list) 1811 struct list_head *list)
1812 { 1812 {
1813 list_sort(NULL, list, xfs_buf_cmp); 1813 list_sort(NULL, list, xfs_buf_cmp);
1814 } 1814 }
1815 1815
1816 STATIC int 1816 STATIC int
1817 xfsbufd( 1817 xfsbufd(
1818 void *data) 1818 void *data)
1819 { 1819 {
1820 xfs_buftarg_t *target = (xfs_buftarg_t *)data; 1820 xfs_buftarg_t *target = (xfs_buftarg_t *)data;
1821 1821
1822 current->flags |= PF_MEMALLOC; 1822 current->flags |= PF_MEMALLOC;
1823 1823
1824 set_freezable(); 1824 set_freezable();
1825 1825
1826 do { 1826 do {
1827 long age = xfs_buf_age_centisecs * msecs_to_jiffies(10); 1827 long age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
1828 long tout = xfs_buf_timer_centisecs * msecs_to_jiffies(10); 1828 long tout = xfs_buf_timer_centisecs * msecs_to_jiffies(10);
1829 int count = 0; 1829 int count = 0;
1830 struct list_head tmp; 1830 struct list_head tmp;
1831 1831
1832 if (unlikely(freezing(current))) { 1832 if (unlikely(freezing(current))) {
1833 set_bit(XBT_FORCE_SLEEP, &target->bt_flags); 1833 set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
1834 refrigerator(); 1834 refrigerator();
1835 } else { 1835 } else {
1836 clear_bit(XBT_FORCE_SLEEP, &target->bt_flags); 1836 clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
1837 } 1837 }
1838 1838
1839 /* sleep for a long time if there is nothing to do. */ 1839 /* sleep for a long time if there is nothing to do. */
1840 if (list_empty(&target->bt_delwrite_queue)) 1840 if (list_empty(&target->bt_delwrite_queue))
1841 tout = MAX_SCHEDULE_TIMEOUT; 1841 tout = MAX_SCHEDULE_TIMEOUT;
1842 schedule_timeout_interruptible(tout); 1842 schedule_timeout_interruptible(tout);
1843 1843
1844 xfs_buf_delwri_split(target, &tmp, age); 1844 xfs_buf_delwri_split(target, &tmp, age);
1845 list_sort(NULL, &tmp, xfs_buf_cmp); 1845 list_sort(NULL, &tmp, xfs_buf_cmp);
1846 while (!list_empty(&tmp)) { 1846 while (!list_empty(&tmp)) {
1847 struct xfs_buf *bp; 1847 struct xfs_buf *bp;
1848 bp = list_first_entry(&tmp, struct xfs_buf, b_list); 1848 bp = list_first_entry(&tmp, struct xfs_buf, b_list);
1849 list_del_init(&bp->b_list); 1849 list_del_init(&bp->b_list);
1850 xfs_bdstrat_cb(bp); 1850 xfs_bdstrat_cb(bp);
1851 count++; 1851 count++;
1852 } 1852 }
1853 if (count) 1853 if (count)
1854 blk_run_address_space(target->bt_mapping); 1854 blk_run_address_space(target->bt_mapping);
1855 1855
1856 } while (!kthread_should_stop()); 1856 } while (!kthread_should_stop());
1857 1857
1858 return 0; 1858 return 0;
1859 } 1859 }
1860 1860
1861 /* 1861 /*
1862 * Go through all incore buffers, and release buffers if they belong to 1862 * Go through all incore buffers, and release buffers if they belong to
1863 * the given device. This is used in filesystem error handling to 1863 * the given device. This is used in filesystem error handling to
1864 * preserve the consistency of its metadata. 1864 * preserve the consistency of its metadata.
1865 */ 1865 */
1866 int 1866 int
1867 xfs_flush_buftarg( 1867 xfs_flush_buftarg(
1868 xfs_buftarg_t *target, 1868 xfs_buftarg_t *target,
1869 int wait) 1869 int wait)
1870 { 1870 {
1871 xfs_buf_t *bp; 1871 xfs_buf_t *bp;
1872 int pincount = 0; 1872 int pincount = 0;
1873 LIST_HEAD(tmp_list); 1873 LIST_HEAD(tmp_list);
1874 LIST_HEAD(wait_list); 1874 LIST_HEAD(wait_list);
1875 1875
1876 xfs_buf_runall_queues(xfsconvertd_workqueue); 1876 xfs_buf_runall_queues(xfsconvertd_workqueue);
1877 xfs_buf_runall_queues(xfsdatad_workqueue); 1877 xfs_buf_runall_queues(xfsdatad_workqueue);
1878 xfs_buf_runall_queues(xfslogd_workqueue); 1878 xfs_buf_runall_queues(xfslogd_workqueue);
1879 1879
1880 set_bit(XBT_FORCE_FLUSH, &target->bt_flags); 1880 set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
1881 pincount = xfs_buf_delwri_split(target, &tmp_list, 0); 1881 pincount = xfs_buf_delwri_split(target, &tmp_list, 0);
1882 1882
1883 /* 1883 /*
1884 * Dropped the delayed write list lock, now walk the temporary list. 1884 * Dropped the delayed write list lock, now walk the temporary list.
1885 * All I/O is issued async and then if we need to wait for completion 1885 * All I/O is issued async and then if we need to wait for completion
1886 * we do that after issuing all the IO. 1886 * we do that after issuing all the IO.
1887 */ 1887 */
1888 list_sort(NULL, &tmp_list, xfs_buf_cmp); 1888 list_sort(NULL, &tmp_list, xfs_buf_cmp);
1889 while (!list_empty(&tmp_list)) { 1889 while (!list_empty(&tmp_list)) {
1890 bp = list_first_entry(&tmp_list, struct xfs_buf, b_list); 1890 bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
1891 ASSERT(target == bp->b_target); 1891 ASSERT(target == bp->b_target);
1892 list_del_init(&bp->b_list); 1892 list_del_init(&bp->b_list);
1893 if (wait) { 1893 if (wait) {
1894 bp->b_flags &= ~XBF_ASYNC; 1894 bp->b_flags &= ~XBF_ASYNC;
1895 list_add(&bp->b_list, &wait_list); 1895 list_add(&bp->b_list, &wait_list);
1896 } 1896 }
1897 xfs_bdstrat_cb(bp); 1897 xfs_bdstrat_cb(bp);
1898 } 1898 }
1899 1899
1900 if (wait) { 1900 if (wait) {
1901 /* Expedite and wait for IO to complete. */ 1901 /* Expedite and wait for IO to complete. */
1902 blk_run_address_space(target->bt_mapping); 1902 blk_run_address_space(target->bt_mapping);
1903 while (!list_empty(&wait_list)) { 1903 while (!list_empty(&wait_list)) {
1904 bp = list_first_entry(&wait_list, struct xfs_buf, b_list); 1904 bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
1905 1905
1906 list_del_init(&bp->b_list); 1906 list_del_init(&bp->b_list);
1907 xfs_iowait(bp); 1907 xfs_iowait(bp);
1908 xfs_buf_relse(bp); 1908 xfs_buf_relse(bp);
1909 } 1909 }
1910 } 1910 }
1911 1911
1912 return pincount; 1912 return pincount;
1913 } 1913 }
1914 1914
1915 int __init 1915 int __init
1916 xfs_buf_init(void) 1916 xfs_buf_init(void)
1917 { 1917 {
1918 xfs_buf_zone = kmem_zone_init_flags(sizeof(xfs_buf_t), "xfs_buf", 1918 xfs_buf_zone = kmem_zone_init_flags(sizeof(xfs_buf_t), "xfs_buf",
1919 KM_ZONE_HWALIGN, NULL); 1919 KM_ZONE_HWALIGN, NULL);
1920 if (!xfs_buf_zone) 1920 if (!xfs_buf_zone)
1921 goto out; 1921 goto out;
1922 1922
1923 xfslogd_workqueue = alloc_workqueue("xfslogd", 1923 xfslogd_workqueue = alloc_workqueue("xfslogd",
1924 WQ_RESCUER | WQ_HIGHPRI, 1); 1924 WQ_MEM_RECLAIM | WQ_HIGHPRI, 1);
1925 if (!xfslogd_workqueue) 1925 if (!xfslogd_workqueue)
1926 goto out_free_buf_zone; 1926 goto out_free_buf_zone;
1927 1927
1928 xfsdatad_workqueue = create_workqueue("xfsdatad"); 1928 xfsdatad_workqueue = create_workqueue("xfsdatad");
1929 if (!xfsdatad_workqueue) 1929 if (!xfsdatad_workqueue)
1930 goto out_destroy_xfslogd_workqueue; 1930 goto out_destroy_xfslogd_workqueue;
1931 1931
1932 xfsconvertd_workqueue = create_workqueue("xfsconvertd"); 1932 xfsconvertd_workqueue = create_workqueue("xfsconvertd");
1933 if (!xfsconvertd_workqueue) 1933 if (!xfsconvertd_workqueue)
1934 goto out_destroy_xfsdatad_workqueue; 1934 goto out_destroy_xfsdatad_workqueue;
1935 1935
1936 register_shrinker(&xfs_buf_shake); 1936 register_shrinker(&xfs_buf_shake);
1937 return 0; 1937 return 0;
1938 1938
1939 out_destroy_xfsdatad_workqueue: 1939 out_destroy_xfsdatad_workqueue:
1940 destroy_workqueue(xfsdatad_workqueue); 1940 destroy_workqueue(xfsdatad_workqueue);
1941 out_destroy_xfslogd_workqueue: 1941 out_destroy_xfslogd_workqueue:
1942 destroy_workqueue(xfslogd_workqueue); 1942 destroy_workqueue(xfslogd_workqueue);
1943 out_free_buf_zone: 1943 out_free_buf_zone:
1944 kmem_zone_destroy(xfs_buf_zone); 1944 kmem_zone_destroy(xfs_buf_zone);
1945 out: 1945 out:
1946 return -ENOMEM; 1946 return -ENOMEM;
1947 } 1947 }
1948 1948
1949 void 1949 void
1950 xfs_buf_terminate(void) 1950 xfs_buf_terminate(void)
1951 { 1951 {
1952 unregister_shrinker(&xfs_buf_shake); 1952 unregister_shrinker(&xfs_buf_shake);
1953 destroy_workqueue(xfsconvertd_workqueue); 1953 destroy_workqueue(xfsconvertd_workqueue);
1954 destroy_workqueue(xfsdatad_workqueue); 1954 destroy_workqueue(xfsdatad_workqueue);
1955 destroy_workqueue(xfslogd_workqueue); 1955 destroy_workqueue(xfslogd_workqueue);
1956 kmem_zone_destroy(xfs_buf_zone); 1956 kmem_zone_destroy(xfs_buf_zone);
1957 } 1957 }
1958 1958
1959 #ifdef CONFIG_KDB_MODULES 1959 #ifdef CONFIG_KDB_MODULES
1960 struct list_head * 1960 struct list_head *
1961 xfs_get_buftarg_list(void) 1961 xfs_get_buftarg_list(void)
1962 { 1962 {
1963 return &xfs_buftarg_list; 1963 return &xfs_buftarg_list;
1964 } 1964 }
1965 #endif 1965 #endif
1966 1966
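The xfs_buf_init() hunk above is representative of how callers are converted by this series: the old WQ_RESCUER flag becomes WQ_MEM_RECLAIM for any workqueue that may have to make progress during memory reclaim. A minimal, hypothetical sketch of that allocation pattern follows; the names my_logd_wq and my_driver_init are illustrative and not part of the patch.

	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *my_logd_wq;

	static int __init my_driver_init(void)
	{
		/*
		 * WQ_MEM_RECLAIM guarantees a rescuer thread so queued work
		 * can still make forward progress under memory pressure;
		 * WQ_HIGHPRI places the work items at the head of the
		 * worker pool's queue.
		 */
		my_logd_wq = alloc_workqueue("my_logd",
					     WQ_MEM_RECLAIM | WQ_HIGHPRI, 1);
		if (!my_logd_wq)
			return -ENOMEM;
		return 0;
	}

	static void __exit my_driver_exit(void)
	{
		destroy_workqueue(my_logd_wq);
	}

	module_init(my_driver_init);
	module_exit(my_driver_exit);
	MODULE_LICENSE("GPL");

Note that create_workqueue() and friends keep their old behaviour: as the workqueue.h diff below shows, they now expand to alloc_workqueue() with WQ_MEM_RECLAIM set.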
include/linux/workqueue.h
1 /* 1 /*
2 * workqueue.h --- work queue handling for Linux. 2 * workqueue.h --- work queue handling for Linux.
3 */ 3 */
4 4
5 #ifndef _LINUX_WORKQUEUE_H 5 #ifndef _LINUX_WORKQUEUE_H
6 #define _LINUX_WORKQUEUE_H 6 #define _LINUX_WORKQUEUE_H
7 7
8 #include <linux/timer.h> 8 #include <linux/timer.h>
9 #include <linux/linkage.h> 9 #include <linux/linkage.h>
10 #include <linux/bitops.h> 10 #include <linux/bitops.h>
11 #include <linux/lockdep.h> 11 #include <linux/lockdep.h>
12 #include <linux/threads.h> 12 #include <linux/threads.h>
13 #include <asm/atomic.h> 13 #include <asm/atomic.h>
14 14
15 struct workqueue_struct; 15 struct workqueue_struct;
16 16
17 struct work_struct; 17 struct work_struct;
18 typedef void (*work_func_t)(struct work_struct *work); 18 typedef void (*work_func_t)(struct work_struct *work);
19 19
20 /* 20 /*
21 * The first word is the work queue pointer and the flags rolled into 21 * The first word is the work queue pointer and the flags rolled into
22 * one 22 * one
23 */ 23 */
24 #define work_data_bits(work) ((unsigned long *)(&(work)->data)) 24 #define work_data_bits(work) ((unsigned long *)(&(work)->data))
25 25
26 enum { 26 enum {
27 WORK_STRUCT_PENDING_BIT = 0, /* work item is pending execution */ 27 WORK_STRUCT_PENDING_BIT = 0, /* work item is pending execution */
28 WORK_STRUCT_DELAYED_BIT = 1, /* work item is delayed */ 28 WORK_STRUCT_DELAYED_BIT = 1, /* work item is delayed */
29 WORK_STRUCT_CWQ_BIT = 2, /* data points to cwq */ 29 WORK_STRUCT_CWQ_BIT = 2, /* data points to cwq */
30 WORK_STRUCT_LINKED_BIT = 3, /* next work is linked to this one */ 30 WORK_STRUCT_LINKED_BIT = 3, /* next work is linked to this one */
31 #ifdef CONFIG_DEBUG_OBJECTS_WORK 31 #ifdef CONFIG_DEBUG_OBJECTS_WORK
32 WORK_STRUCT_STATIC_BIT = 4, /* static initializer (debugobjects) */ 32 WORK_STRUCT_STATIC_BIT = 4, /* static initializer (debugobjects) */
33 WORK_STRUCT_COLOR_SHIFT = 5, /* color for workqueue flushing */ 33 WORK_STRUCT_COLOR_SHIFT = 5, /* color for workqueue flushing */
34 #else 34 #else
35 WORK_STRUCT_COLOR_SHIFT = 4, /* color for workqueue flushing */ 35 WORK_STRUCT_COLOR_SHIFT = 4, /* color for workqueue flushing */
36 #endif 36 #endif
37 37
38 WORK_STRUCT_COLOR_BITS = 4, 38 WORK_STRUCT_COLOR_BITS = 4,
39 39
40 WORK_STRUCT_PENDING = 1 << WORK_STRUCT_PENDING_BIT, 40 WORK_STRUCT_PENDING = 1 << WORK_STRUCT_PENDING_BIT,
41 WORK_STRUCT_DELAYED = 1 << WORK_STRUCT_DELAYED_BIT, 41 WORK_STRUCT_DELAYED = 1 << WORK_STRUCT_DELAYED_BIT,
42 WORK_STRUCT_CWQ = 1 << WORK_STRUCT_CWQ_BIT, 42 WORK_STRUCT_CWQ = 1 << WORK_STRUCT_CWQ_BIT,
43 WORK_STRUCT_LINKED = 1 << WORK_STRUCT_LINKED_BIT, 43 WORK_STRUCT_LINKED = 1 << WORK_STRUCT_LINKED_BIT,
44 #ifdef CONFIG_DEBUG_OBJECTS_WORK 44 #ifdef CONFIG_DEBUG_OBJECTS_WORK
45 WORK_STRUCT_STATIC = 1 << WORK_STRUCT_STATIC_BIT, 45 WORK_STRUCT_STATIC = 1 << WORK_STRUCT_STATIC_BIT,
46 #else 46 #else
47 WORK_STRUCT_STATIC = 0, 47 WORK_STRUCT_STATIC = 0,
48 #endif 48 #endif
49 49
50 /* 50 /*
51 * The last color is "no color", used for works which don't 51 * The last color is "no color", used for works which don't
52 * participate in workqueue flushing. 52 * participate in workqueue flushing.
53 */ 53 */
54 WORK_NR_COLORS = (1 << WORK_STRUCT_COLOR_BITS) - 1, 54 WORK_NR_COLORS = (1 << WORK_STRUCT_COLOR_BITS) - 1,
55 WORK_NO_COLOR = WORK_NR_COLORS, 55 WORK_NO_COLOR = WORK_NR_COLORS,
56 56
57 /* special cpu IDs */ 57 /* special cpu IDs */
58 WORK_CPU_UNBOUND = NR_CPUS, 58 WORK_CPU_UNBOUND = NR_CPUS,
59 WORK_CPU_NONE = NR_CPUS + 1, 59 WORK_CPU_NONE = NR_CPUS + 1,
60 WORK_CPU_LAST = WORK_CPU_NONE, 60 WORK_CPU_LAST = WORK_CPU_NONE,
61 61
62 /* 62 /*
63 * Reserve 7 bits off of cwq pointer w/ debugobjects turned 63 * Reserve 7 bits off of cwq pointer w/ debugobjects turned
64 * off. This makes cwqs aligned to 256 bytes and allows 15 64 * off. This makes cwqs aligned to 256 bytes and allows 15
65 * workqueue flush colors. 65 * workqueue flush colors.
66 */ 66 */
67 WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT + 67 WORK_STRUCT_FLAG_BITS = WORK_STRUCT_COLOR_SHIFT +
68 WORK_STRUCT_COLOR_BITS, 68 WORK_STRUCT_COLOR_BITS,
69 69
70 WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1, 70 WORK_STRUCT_FLAG_MASK = (1UL << WORK_STRUCT_FLAG_BITS) - 1,
71 WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK, 71 WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
72 WORK_STRUCT_NO_CPU = WORK_CPU_NONE << WORK_STRUCT_FLAG_BITS, 72 WORK_STRUCT_NO_CPU = WORK_CPU_NONE << WORK_STRUCT_FLAG_BITS,
73 73
74 /* bit mask for work_busy() return values */ 74 /* bit mask for work_busy() return values */
75 WORK_BUSY_PENDING = 1 << 0, 75 WORK_BUSY_PENDING = 1 << 0,
76 WORK_BUSY_RUNNING = 1 << 1, 76 WORK_BUSY_RUNNING = 1 << 1,
77 }; 77 };
78 78
79 struct work_struct { 79 struct work_struct {
80 atomic_long_t data; 80 atomic_long_t data;
81 struct list_head entry; 81 struct list_head entry;
82 work_func_t func; 82 work_func_t func;
83 #ifdef CONFIG_LOCKDEP 83 #ifdef CONFIG_LOCKDEP
84 struct lockdep_map lockdep_map; 84 struct lockdep_map lockdep_map;
85 #endif 85 #endif
86 }; 86 };
87 87
88 #define WORK_DATA_INIT() ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU) 88 #define WORK_DATA_INIT() ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU)
89 #define WORK_DATA_STATIC_INIT() \ 89 #define WORK_DATA_STATIC_INIT() \
90 ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU | WORK_STRUCT_STATIC) 90 ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU | WORK_STRUCT_STATIC)
91 91
92 struct delayed_work { 92 struct delayed_work {
93 struct work_struct work; 93 struct work_struct work;
94 struct timer_list timer; 94 struct timer_list timer;
95 }; 95 };
96 96
97 static inline struct delayed_work *to_delayed_work(struct work_struct *work) 97 static inline struct delayed_work *to_delayed_work(struct work_struct *work)
98 { 98 {
99 return container_of(work, struct delayed_work, work); 99 return container_of(work, struct delayed_work, work);
100 } 100 }
101 101
102 struct execute_work { 102 struct execute_work {
103 struct work_struct work; 103 struct work_struct work;
104 }; 104 };
105 105
106 #ifdef CONFIG_LOCKDEP 106 #ifdef CONFIG_LOCKDEP
107 /* 107 /*
108 * NB: because we have to copy the lockdep_map, setting _key 108 * NB: because we have to copy the lockdep_map, setting _key
109 * here is required, otherwise it could get initialised to the 109 * here is required, otherwise it could get initialised to the
110 * copy of the lockdep_map! 110 * copy of the lockdep_map!
111 */ 111 */
112 #define __WORK_INIT_LOCKDEP_MAP(n, k) \ 112 #define __WORK_INIT_LOCKDEP_MAP(n, k) \
113 .lockdep_map = STATIC_LOCKDEP_MAP_INIT(n, k), 113 .lockdep_map = STATIC_LOCKDEP_MAP_INIT(n, k),
114 #else 114 #else
115 #define __WORK_INIT_LOCKDEP_MAP(n, k) 115 #define __WORK_INIT_LOCKDEP_MAP(n, k)
116 #endif 116 #endif
117 117
118 #define __WORK_INITIALIZER(n, f) { \ 118 #define __WORK_INITIALIZER(n, f) { \
119 .data = WORK_DATA_STATIC_INIT(), \ 119 .data = WORK_DATA_STATIC_INIT(), \
120 .entry = { &(n).entry, &(n).entry }, \ 120 .entry = { &(n).entry, &(n).entry }, \
121 .func = (f), \ 121 .func = (f), \
122 __WORK_INIT_LOCKDEP_MAP(#n, &(n)) \ 122 __WORK_INIT_LOCKDEP_MAP(#n, &(n)) \
123 } 123 }
124 124
125 #define __DELAYED_WORK_INITIALIZER(n, f) { \ 125 #define __DELAYED_WORK_INITIALIZER(n, f) { \
126 .work = __WORK_INITIALIZER((n).work, (f)), \ 126 .work = __WORK_INITIALIZER((n).work, (f)), \
127 .timer = TIMER_INITIALIZER(NULL, 0, 0), \ 127 .timer = TIMER_INITIALIZER(NULL, 0, 0), \
128 } 128 }
129 129
130 #define DECLARE_WORK(n, f) \ 130 #define DECLARE_WORK(n, f) \
131 struct work_struct n = __WORK_INITIALIZER(n, f) 131 struct work_struct n = __WORK_INITIALIZER(n, f)
132 132
133 #define DECLARE_DELAYED_WORK(n, f) \ 133 #define DECLARE_DELAYED_WORK(n, f) \
134 struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f) 134 struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f)
135 135
136 /* 136 /*
137 * initialize a work item's function pointer 137 * initialize a work item's function pointer
138 */ 138 */
139 #define PREPARE_WORK(_work, _func) \ 139 #define PREPARE_WORK(_work, _func) \
140 do { \ 140 do { \
141 (_work)->func = (_func); \ 141 (_work)->func = (_func); \
142 } while (0) 142 } while (0)
143 143
144 #define PREPARE_DELAYED_WORK(_work, _func) \ 144 #define PREPARE_DELAYED_WORK(_work, _func) \
145 PREPARE_WORK(&(_work)->work, (_func)) 145 PREPARE_WORK(&(_work)->work, (_func))
146 146
147 #ifdef CONFIG_DEBUG_OBJECTS_WORK 147 #ifdef CONFIG_DEBUG_OBJECTS_WORK
148 extern void __init_work(struct work_struct *work, int onstack); 148 extern void __init_work(struct work_struct *work, int onstack);
149 extern void destroy_work_on_stack(struct work_struct *work); 149 extern void destroy_work_on_stack(struct work_struct *work);
150 static inline unsigned int work_static(struct work_struct *work) 150 static inline unsigned int work_static(struct work_struct *work)
151 { 151 {
152 return *work_data_bits(work) & WORK_STRUCT_STATIC; 152 return *work_data_bits(work) & WORK_STRUCT_STATIC;
153 } 153 }
154 #else 154 #else
155 static inline void __init_work(struct work_struct *work, int onstack) { } 155 static inline void __init_work(struct work_struct *work, int onstack) { }
156 static inline void destroy_work_on_stack(struct work_struct *work) { } 156 static inline void destroy_work_on_stack(struct work_struct *work) { }
157 static inline unsigned int work_static(struct work_struct *work) { return 0; } 157 static inline unsigned int work_static(struct work_struct *work) { return 0; }
158 #endif 158 #endif
159 159
160 /* 160 /*
161 * initialize all of a work item in one go 161 * initialize all of a work item in one go
162 * 162 *
163 * NOTE! No point in using "atomic_long_set()": using a direct 163 * NOTE! No point in using "atomic_long_set()": using a direct
164 * assignment of the work data initializer allows the compiler 164 * assignment of the work data initializer allows the compiler
165 * to generate better code. 165 * to generate better code.
166 */ 166 */
167 #ifdef CONFIG_LOCKDEP 167 #ifdef CONFIG_LOCKDEP
168 #define __INIT_WORK(_work, _func, _onstack) \ 168 #define __INIT_WORK(_work, _func, _onstack) \
169 do { \ 169 do { \
170 static struct lock_class_key __key; \ 170 static struct lock_class_key __key; \
171 \ 171 \
172 __init_work((_work), _onstack); \ 172 __init_work((_work), _onstack); \
173 (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \ 173 (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
174 lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0);\ 174 lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0);\
175 INIT_LIST_HEAD(&(_work)->entry); \ 175 INIT_LIST_HEAD(&(_work)->entry); \
176 PREPARE_WORK((_work), (_func)); \ 176 PREPARE_WORK((_work), (_func)); \
177 } while (0) 177 } while (0)
178 #else 178 #else
179 #define __INIT_WORK(_work, _func, _onstack) \ 179 #define __INIT_WORK(_work, _func, _onstack) \
180 do { \ 180 do { \
181 __init_work((_work), _onstack); \ 181 __init_work((_work), _onstack); \
182 (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \ 182 (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
183 INIT_LIST_HEAD(&(_work)->entry); \ 183 INIT_LIST_HEAD(&(_work)->entry); \
184 PREPARE_WORK((_work), (_func)); \ 184 PREPARE_WORK((_work), (_func)); \
185 } while (0) 185 } while (0)
186 #endif 186 #endif
187 187
188 #define INIT_WORK(_work, _func) \ 188 #define INIT_WORK(_work, _func) \
189 do { \ 189 do { \
190 __INIT_WORK((_work), (_func), 0); \ 190 __INIT_WORK((_work), (_func), 0); \
191 } while (0) 191 } while (0)
192 192
193 #define INIT_WORK_ON_STACK(_work, _func) \ 193 #define INIT_WORK_ON_STACK(_work, _func) \
194 do { \ 194 do { \
195 __INIT_WORK((_work), (_func), 1); \ 195 __INIT_WORK((_work), (_func), 1); \
196 } while (0) 196 } while (0)
197 197
198 #define INIT_DELAYED_WORK(_work, _func) \ 198 #define INIT_DELAYED_WORK(_work, _func) \
199 do { \ 199 do { \
200 INIT_WORK(&(_work)->work, (_func)); \ 200 INIT_WORK(&(_work)->work, (_func)); \
201 init_timer(&(_work)->timer); \ 201 init_timer(&(_work)->timer); \
202 } while (0) 202 } while (0)
203 203
204 #define INIT_DELAYED_WORK_ON_STACK(_work, _func) \ 204 #define INIT_DELAYED_WORK_ON_STACK(_work, _func) \
205 do { \ 205 do { \
206 INIT_WORK_ON_STACK(&(_work)->work, (_func)); \ 206 INIT_WORK_ON_STACK(&(_work)->work, (_func)); \
207 init_timer_on_stack(&(_work)->timer); \ 207 init_timer_on_stack(&(_work)->timer); \
208 } while (0) 208 } while (0)
209 209
210 #define INIT_DELAYED_WORK_DEFERRABLE(_work, _func) \ 210 #define INIT_DELAYED_WORK_DEFERRABLE(_work, _func) \
211 do { \ 211 do { \
212 INIT_WORK(&(_work)->work, (_func)); \ 212 INIT_WORK(&(_work)->work, (_func)); \
213 init_timer_deferrable(&(_work)->timer); \ 213 init_timer_deferrable(&(_work)->timer); \
214 } while (0) 214 } while (0)
215 215
216 /** 216 /**
217 * work_pending - Find out whether a work item is currently pending 217 * work_pending - Find out whether a work item is currently pending
218 * @work: The work item in question 218 * @work: The work item in question
219 */ 219 */
220 #define work_pending(work) \ 220 #define work_pending(work) \
221 test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) 221 test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
222 222
223 /** 223 /**
224 * delayed_work_pending - Find out whether a delayable work item is currently 224 * delayed_work_pending - Find out whether a delayable work item is currently
225 * pending 225 * pending
226 * @work: The work item in question 226 * @work: The work item in question
227 */ 227 */
228 #define delayed_work_pending(w) \ 228 #define delayed_work_pending(w) \
229 work_pending(&(w)->work) 229 work_pending(&(w)->work)
230 230
231 /** 231 /**
232 * work_clear_pending - for internal use only, mark a work item as not pending 232 * work_clear_pending - for internal use only, mark a work item as not pending
233 * @work: The work item in question 233 * @work: The work item in question
234 */ 234 */
235 #define work_clear_pending(work) \ 235 #define work_clear_pending(work) \
236 clear_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) 236 clear_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
237 237
238 /* 238 /*
239 * Workqueue flags and constants. For details, please refer to 239 * Workqueue flags and constants. For details, please refer to
240 * Documentation/workqueue.txt. 240 * Documentation/workqueue.txt.
241 */ 241 */
242 enum { 242 enum {
243 WQ_NON_REENTRANT = 1 << 0, /* guarantee non-reentrance */ 243 WQ_NON_REENTRANT = 1 << 0, /* guarantee non-reentrance */
244 WQ_UNBOUND = 1 << 1, /* not bound to any cpu */ 244 WQ_UNBOUND = 1 << 1, /* not bound to any cpu */
245 WQ_FREEZEABLE = 1 << 2, /* freeze during suspend */ 245 WQ_FREEZEABLE = 1 << 2, /* freeze during suspend */
246 WQ_RESCUER = 1 << 3, /* has an rescue worker */ 246 WQ_MEM_RECLAIM = 1 << 3, /* may be used for memory reclaim */
247 WQ_HIGHPRI = 1 << 4, /* high priority */ 247 WQ_HIGHPRI = 1 << 4, /* high priority */
248 WQ_CPU_INTENSIVE = 1 << 5, /* cpu intensive workqueue */ 248 WQ_CPU_INTENSIVE = 1 << 5, /* cpu intensive workqueue */
249 249
250 WQ_DYING = 1 << 6, /* internal: workqueue is dying */ 250 WQ_DYING = 1 << 6, /* internal: workqueue is dying */
251 WQ_RESCUER = 1 << 7, /* internal: workqueue has rescuer */
251 252
252 WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */ 253 WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */
253 WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */ 254 WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */
254 WQ_DFL_ACTIVE = WQ_MAX_ACTIVE / 2, 255 WQ_DFL_ACTIVE = WQ_MAX_ACTIVE / 2,
255 }; 256 };
256 257
257 /* unbound wq's aren't per-cpu, scale max_active according to #cpus */ 258 /* unbound wq's aren't per-cpu, scale max_active according to #cpus */
258 #define WQ_UNBOUND_MAX_ACTIVE \ 259 #define WQ_UNBOUND_MAX_ACTIVE \
259 max_t(int, WQ_MAX_ACTIVE, num_possible_cpus() * WQ_MAX_UNBOUND_PER_CPU) 260 max_t(int, WQ_MAX_ACTIVE, num_possible_cpus() * WQ_MAX_UNBOUND_PER_CPU)
260 261
261 /* 262 /*
262 * System-wide workqueues which are always present. 263 * System-wide workqueues which are always present.
263 * 264 *
264 * system_wq is the one used by schedule[_delayed]_work[_on](). 265 * system_wq is the one used by schedule[_delayed]_work[_on]().
265 * Multi-CPU multi-threaded. There are users which expect relatively 266 * Multi-CPU multi-threaded. There are users which expect relatively
266 * short queue flush time. Don't queue works which can run for too 267 * short queue flush time. Don't queue works which can run for too
267 * long. 268 * long.
268 * 269 *
269 * system_long_wq is similar to system_wq but may host long running 270 * system_long_wq is similar to system_wq but may host long running
270 * works. Queue flushing might take relatively long. 271 * works. Queue flushing might take relatively long.
271 * 272 *
272 * system_nrt_wq is non-reentrant and guarantees that any given work 273 * system_nrt_wq is non-reentrant and guarantees that any given work
273 * item is never executed in parallel by multiple CPUs. Queue 274 * item is never executed in parallel by multiple CPUs. Queue
274 * flushing might take relatively long. 275 * flushing might take relatively long.
275 * 276 *
276 * system_unbound_wq is unbound workqueue. Workers are not bound to 277 * system_unbound_wq is unbound workqueue. Workers are not bound to
277 * any specific CPU, not concurrency managed, and all queued works are 278 * any specific CPU, not concurrency managed, and all queued works are
278 * executed immediately as long as max_active limit is not reached and 279 * executed immediately as long as max_active limit is not reached and
279 * resources are available. 280 * resources are available.
280 */ 281 */
281 extern struct workqueue_struct *system_wq; 282 extern struct workqueue_struct *system_wq;
282 extern struct workqueue_struct *system_long_wq; 283 extern struct workqueue_struct *system_long_wq;
283 extern struct workqueue_struct *system_nrt_wq; 284 extern struct workqueue_struct *system_nrt_wq;
284 extern struct workqueue_struct *system_unbound_wq; 285 extern struct workqueue_struct *system_unbound_wq;
285 286
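The comment block above describes the always-present system workqueues. As a hedged, one-line illustration of picking one (my_work is a made-up work_struct, not from the patch): work that must never run concurrently with itself on another CPU goes to system_nrt_wq,

	queue_work(system_nrt_wq, &my_work);	/* my_work: hypothetical work_struct */

while short, well-behaved items keep using schedule_work(&my_work), which targets system_wq.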
286 extern struct workqueue_struct * 287 extern struct workqueue_struct *
287 __alloc_workqueue_key(const char *name, unsigned int flags, int max_active, 288 __alloc_workqueue_key(const char *name, unsigned int flags, int max_active,
288 struct lock_class_key *key, const char *lock_name); 289 struct lock_class_key *key, const char *lock_name);
289 290
290 #ifdef CONFIG_LOCKDEP 291 #ifdef CONFIG_LOCKDEP
291 #define alloc_workqueue(name, flags, max_active) \ 292 #define alloc_workqueue(name, flags, max_active) \
292 ({ \ 293 ({ \
293 static struct lock_class_key __key; \ 294 static struct lock_class_key __key; \
294 const char *__lock_name; \ 295 const char *__lock_name; \
295 \ 296 \
296 if (__builtin_constant_p(name)) \ 297 if (__builtin_constant_p(name)) \
297 __lock_name = (name); \ 298 __lock_name = (name); \
298 else \ 299 else \
299 __lock_name = #name; \ 300 __lock_name = #name; \
300 \ 301 \
301 __alloc_workqueue_key((name), (flags), (max_active), \ 302 __alloc_workqueue_key((name), (flags), (max_active), \
302 &__key, __lock_name); \ 303 &__key, __lock_name); \
303 }) 304 })
304 #else 305 #else
305 #define alloc_workqueue(name, flags, max_active) \ 306 #define alloc_workqueue(name, flags, max_active) \
306 __alloc_workqueue_key((name), (flags), (max_active), NULL, NULL) 307 __alloc_workqueue_key((name), (flags), (max_active), NULL, NULL)
307 #endif 308 #endif
308 309
310 /**
311 * alloc_ordered_workqueue - allocate an ordered workqueue
312 * @name: name of the workqueue
313 * @flags: WQ_* flags (only WQ_FREEZEABLE and WQ_MEM_RECLAIM are meaningful)
314 *
315 * Allocate an ordered workqueue. An ordered workqueue executes at
316 * most one work item at any given time in the queued order. They are
317 * implemented as unbound workqueues with @max_active of one.
318 *
319 * RETURNS:
320 * Pointer to the allocated workqueue on success, %NULL on failure.
321 */
322 static inline struct workqueue_struct *
323 alloc_ordered_workqueue(const char *name, unsigned int flags)
324 {
325 return alloc_workqueue(name, WQ_UNBOUND | flags, 1);
326 }
327
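A hedged usage sketch for the new helper above; the queue name "my_ordered" and the function my_work_fn are made-up identifiers. alloc_ordered_workqueue() gives strict one-at-a-time execution in queueing order, which is what many create_singlethread_workqueue() users actually depend on.

	static struct workqueue_struct *ordered_wq;

	static void my_work_fn(struct work_struct *work)
	{
		/* items queued on ordered_wq run here strictly one at a
		 * time, in the order they were queued */
	}

	static DECLARE_WORK(my_work, my_work_fn);

	static int my_setup(void)
	{
		ordered_wq = alloc_ordered_workqueue("my_ordered", WQ_MEM_RECLAIM);
		if (!ordered_wq)
			return -ENOMEM;
		queue_work(ordered_wq, &my_work);
		return 0;
	}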
309 #define create_workqueue(name) \ 328 #define create_workqueue(name) \
310 alloc_workqueue((name), WQ_RESCUER, 1) 329 alloc_workqueue((name), WQ_MEM_RECLAIM, 1)
311 #define create_freezeable_workqueue(name) \ 330 #define create_freezeable_workqueue(name) \
312 alloc_workqueue((name), WQ_FREEZEABLE | WQ_UNBOUND | WQ_RESCUER, 1) 331 alloc_workqueue((name), WQ_FREEZEABLE | WQ_UNBOUND | WQ_MEM_RECLAIM, 1)
313 #define create_singlethread_workqueue(name) \ 332 #define create_singlethread_workqueue(name) \
314 alloc_workqueue((name), WQ_UNBOUND | WQ_RESCUER, 1) 333 alloc_workqueue((name), WQ_UNBOUND | WQ_MEM_RECLAIM, 1)
315 334
316 extern void destroy_workqueue(struct workqueue_struct *wq); 335 extern void destroy_workqueue(struct workqueue_struct *wq);
317 336
318 extern int queue_work(struct workqueue_struct *wq, struct work_struct *work); 337 extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
319 extern int queue_work_on(int cpu, struct workqueue_struct *wq, 338 extern int queue_work_on(int cpu, struct workqueue_struct *wq,
320 struct work_struct *work); 339 struct work_struct *work);
321 extern int queue_delayed_work(struct workqueue_struct *wq, 340 extern int queue_delayed_work(struct workqueue_struct *wq,
322 struct delayed_work *work, unsigned long delay); 341 struct delayed_work *work, unsigned long delay);
323 extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq, 342 extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
324 struct delayed_work *work, unsigned long delay); 343 struct delayed_work *work, unsigned long delay);
325 344
326 extern void flush_workqueue(struct workqueue_struct *wq); 345 extern void flush_workqueue(struct workqueue_struct *wq);
327 extern void flush_scheduled_work(void); 346 extern void flush_scheduled_work(void);
328 extern void flush_delayed_work(struct delayed_work *work);
329 347
330 extern int schedule_work(struct work_struct *work); 348 extern int schedule_work(struct work_struct *work);
331 extern int schedule_work_on(int cpu, struct work_struct *work); 349 extern int schedule_work_on(int cpu, struct work_struct *work);
332 extern int schedule_delayed_work(struct delayed_work *work, unsigned long delay); 350 extern int schedule_delayed_work(struct delayed_work *work, unsigned long delay);
333 extern int schedule_delayed_work_on(int cpu, struct delayed_work *work, 351 extern int schedule_delayed_work_on(int cpu, struct delayed_work *work,
334 unsigned long delay); 352 unsigned long delay);
335 extern int schedule_on_each_cpu(work_func_t func); 353 extern int schedule_on_each_cpu(work_func_t func);
336 extern int keventd_up(void); 354 extern int keventd_up(void);
337 355
338 int execute_in_process_context(work_func_t fn, struct execute_work *); 356 int execute_in_process_context(work_func_t fn, struct execute_work *);
339 357
340 extern int flush_work(struct work_struct *work); 358 extern bool flush_work(struct work_struct *work);
341 extern int cancel_work_sync(struct work_struct *work); 359 extern bool flush_work_sync(struct work_struct *work);
360 extern bool cancel_work_sync(struct work_struct *work);
342 361
362 extern bool flush_delayed_work(struct delayed_work *dwork);
363 extern bool flush_delayed_work_sync(struct delayed_work *work);
364 extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
365
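A hedged teardown sketch for the flush/cancel variants declared above (struct and function names are hypothetical): the *_sync() flavours wait until the work item has finished executing before returning, which is usually what a driver wants on shutdown.

	/* illustrative only */
	#include <linux/workqueue.h>

	struct frob_dev {
		struct work_struct	irq_work;
		struct delayed_work	poll_work;
	};

	static void frob_shutdown(struct frob_dev *dev)
	{
		/* wait until any in-flight execution of irq_work completes */
		flush_work_sync(&dev->irq_work);

		/* drop a pending poll_work (timer included) and wait until a
		 * running instance, if any, has finished */
		cancel_delayed_work_sync(&dev->poll_work);
	}
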
343 extern void workqueue_set_max_active(struct workqueue_struct *wq, 366 extern void workqueue_set_max_active(struct workqueue_struct *wq,
344 int max_active); 367 int max_active);
345 extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq); 368 extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
346 extern unsigned int work_cpu(struct work_struct *work); 369 extern unsigned int work_cpu(struct work_struct *work);
347 extern unsigned int work_busy(struct work_struct *work); 370 extern unsigned int work_busy(struct work_struct *work);
348 371
349 /* 372 /*
350 * Kill off a pending schedule_delayed_work(). Note that the work callback 373 * Kill off a pending schedule_delayed_work(). Note that the work callback
351 * function may still be running on return from cancel_delayed_work(), unless 374 * function may still be running on return from cancel_delayed_work(), unless
352 * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or 375 * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
353 * cancel_work_sync() to wait on it. 376 * cancel_work_sync() to wait on it.
354 */ 377 */
355 static inline int cancel_delayed_work(struct delayed_work *work) 378 static inline bool cancel_delayed_work(struct delayed_work *work)
356 { 379 {
357 int ret; 380 bool ret;
358 381
359 ret = del_timer_sync(&work->timer); 382 ret = del_timer_sync(&work->timer);
360 if (ret) 383 if (ret)
361 work_clear_pending(&work->work); 384 work_clear_pending(&work->work);
362 return ret; 385 return ret;
363 } 386 }
364 387
365 /* 388 /*
366 * Like above, but uses del_timer() instead of del_timer_sync(). This means, 389 * Like above, but uses del_timer() instead of del_timer_sync(). This means,
367 * if it returns 0 the timer function may be running and the queueing is in 390 * if it returns 0 the timer function may be running and the queueing is in
368 * progress. 391 * progress.
369 */ 392 */
370 static inline int __cancel_delayed_work(struct delayed_work *work) 393 static inline bool __cancel_delayed_work(struct delayed_work *work)
371 { 394 {
372 int ret; 395 bool ret;
373 396
374 ret = del_timer(&work->timer); 397 ret = del_timer(&work->timer);
375 if (ret) 398 if (ret)
376 work_clear_pending(&work->work); 399 work_clear_pending(&work->work);
377 return ret; 400 return ret;
378 } 401 }
379 402
380 extern int cancel_delayed_work_sync(struct delayed_work *work);
381
382 /* Obsolete. use cancel_delayed_work_sync() */ 403 /* Obsolete. use cancel_delayed_work_sync() */
383 static inline 404 static inline
384 void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq, 405 void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq,
385 struct delayed_work *work) 406 struct delayed_work *work)
386 { 407 {
387 cancel_delayed_work_sync(work); 408 cancel_delayed_work_sync(work);
388 } 409 }
389 410
390 /* Obsolete. use cancel_delayed_work_sync() */ 411 /* Obsolete. use cancel_delayed_work_sync() */
391 static inline 412 static inline
392 void cancel_rearming_delayed_work(struct delayed_work *work) 413 void cancel_rearming_delayed_work(struct delayed_work *work)
393 { 414 {
394 cancel_delayed_work_sync(work); 415 cancel_delayed_work_sync(work);
395 } 416 }
396 417
397 #ifndef CONFIG_SMP 418 #ifndef CONFIG_SMP
398 static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg) 419 static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
399 { 420 {
400 return fn(arg); 421 return fn(arg);
401 } 422 }
402 #else 423 #else
403 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg); 424 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
404 #endif /* CONFIG_SMP */ 425 #endif /* CONFIG_SMP */
405 426
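A sketch of work_on_cpu() usage (hypothetical names): the call sleeps until the function has run on the requested CPU and returns its long result; on !SMP builds it simply calls the function directly, as shown above.

	/* illustrative only -- must be called from process context */
	static long frob_probe_one(void *arg)
	{
		/* executes on the CPU handed to work_on_cpu() */
		return 0;
	}

	static long frob_probe_cpu(unsigned int cpu)
	{
		return work_on_cpu(cpu, frob_probe_one, NULL);
	}
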
406 #ifdef CONFIG_FREEZER 427 #ifdef CONFIG_FREEZER
407 extern void freeze_workqueues_begin(void); 428 extern void freeze_workqueues_begin(void);
408 extern bool freeze_workqueues_busy(void); 429 extern bool freeze_workqueues_busy(void);
409 extern void thaw_workqueues(void); 430 extern void thaw_workqueues(void);
410 #endif /* CONFIG_FREEZER */ 431 #endif /* CONFIG_FREEZER */
include/trace/events/workqueue.h
1 #undef TRACE_SYSTEM 1 #undef TRACE_SYSTEM
2 #define TRACE_SYSTEM workqueue 2 #define TRACE_SYSTEM workqueue
3 3
4 #if !defined(_TRACE_WORKQUEUE_H) || defined(TRACE_HEADER_MULTI_READ) 4 #if !defined(_TRACE_WORKQUEUE_H) || defined(TRACE_HEADER_MULTI_READ)
5 #define _TRACE_WORKQUEUE_H 5 #define _TRACE_WORKQUEUE_H
6 6
7 #include <linux/tracepoint.h> 7 #include <linux/tracepoint.h>
8 #include <linux/workqueue.h> 8 #include <linux/workqueue.h>
9 9
10 DECLARE_EVENT_CLASS(workqueue_work,
11
12 TP_PROTO(struct work_struct *work),
13
14 TP_ARGS(work),
15
16 TP_STRUCT__entry(
17 __field( void *, work )
18 ),
19
20 TP_fast_assign(
21 __entry->work = work;
22 ),
23
24 TP_printk("work struct %p", __entry->work)
25 );
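The DECLARE_EVENT_CLASS() above factors the common work-pointer field out of the individual events; each DEFINE_EVENT() further down reuses it. A hypothetical additional event on the same class would look like this (sketch only, not part of this commit):

	/* hypothetical event name, reusing the workqueue_work class */
	DEFINE_EVENT(workqueue_work, workqueue_frob_work,

		TP_PROTO(struct work_struct *work),

		TP_ARGS(work)
	);
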
26
10 /** 27 /**
11 * workqueue_execute_start - called immediately before the workqueue callback 28 * workqueue_queue_work - called when a work gets queued
29 * @req_cpu: the requested cpu
30 * @cwq: pointer to struct cpu_workqueue_struct
12 * @work: pointer to struct work_struct 31 * @work: pointer to struct work_struct
13 * 32 *
14 * Allows to track workqueue execution. 33 * This event occurs when a work is queued immediately or once a
34 * delayed work is actually queued on a workqueue (ie: once the delay
35 * has been reached).
15 */ 36 */
16 TRACE_EVENT(workqueue_execute_start, 37 TRACE_EVENT(workqueue_queue_work,
17 38
18 TP_PROTO(struct work_struct *work), 39 TP_PROTO(unsigned int req_cpu, struct cpu_workqueue_struct *cwq,
40 struct work_struct *work),
19 41
20 TP_ARGS(work), 42 TP_ARGS(req_cpu, cwq, work),
21 43
22 TP_STRUCT__entry( 44 TP_STRUCT__entry(
23 __field( void *, work ) 45 __field( void *, work )
24 __field( void *, function) 46 __field( void *, function)
47 __field( void *, workqueue)
48 __field( unsigned int, req_cpu )
49 __field( unsigned int, cpu )
25 ), 50 ),
26 51
27 TP_fast_assign( 52 TP_fast_assign(
28 __entry->work = work; 53 __entry->work = work;
29 __entry->function = work->func; 54 __entry->function = work->func;
55 __entry->workqueue = cwq->wq;
56 __entry->req_cpu = req_cpu;
57 __entry->cpu = cwq->gcwq->cpu;
30 ), 58 ),
31 59
32 TP_printk("work struct %p: function %pf", __entry->work, __entry->function) 60 TP_printk("work struct=%p function=%pf workqueue=%p req_cpu=%u cpu=%u",
61 __entry->work, __entry->function, __entry->workqueue,
62 __entry->req_cpu, __entry->cpu)
33 ); 63 );
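For reference, the TP_printk() format above renders a single queueing event roughly as follows in the trace buffer (the pointer values and the function name are illustrative only):

	workqueue_queue_work: work struct=ffff88003781e020 function=vmstat_update workqueue=ffff880037818600 req_cpu=1 cpu=1
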
34 64
35 /** 65 /**
36 * workqueue_execute_end - called immediately before the workqueue callback 66 * workqueue_activate_work - called when a work gets activated
37 * @work: pointer to struct work_struct 67 * @work: pointer to struct work_struct
38 * 68 *
69 * This event occurs when a queued work is put on the active queue,
70 * which happens immediately after queueing unless @max_active limit
71 * is reached.
72 */
73 DEFINE_EVENT(workqueue_work, workqueue_activate_work,
74
75 TP_PROTO(struct work_struct *work),
76
77 TP_ARGS(work)
78 );
79
80 /**
81 * workqueue_execute_start - called immediately before the workqueue callback
82 * @work: pointer to struct work_struct
83 *
39 * Allows to track workqueue execution. 84 * Allows to track workqueue execution.
40 */ 85 */
41 TRACE_EVENT(workqueue_execute_end, 86 TRACE_EVENT(workqueue_execute_start,
42 87
43 TP_PROTO(struct work_struct *work), 88 TP_PROTO(struct work_struct *work),
44 89
45 TP_ARGS(work), 90 TP_ARGS(work),
46 91
47 TP_STRUCT__entry( 92 TP_STRUCT__entry(
48 __field( void *, work ) 93 __field( void *, work )
94 __field( void *, function)
49 ), 95 ),
50 96
51 TP_fast_assign( 97 TP_fast_assign(
52 __entry->work = work; 98 __entry->work = work;
99 __entry->function = work->func;
53 ), 100 ),
54 101
55 TP_printk("work struct %p", __entry->work) 102 TP_printk("work struct %p: function %pf", __entry->work, __entry->function)
56 ); 103 );
57 104
105 /**
 106 * workqueue_execute_end - called immediately after the workqueue callback

107 * @work: pointer to struct work_struct
108 *
109 * Allows to track workqueue execution.
110 */
111 DEFINE_EVENT(workqueue_work, workqueue_execute_end,
112
113 TP_PROTO(struct work_struct *work),
114
115 TP_ARGS(work)
116 );
58 117
59 #endif /* _TRACE_WORKQUEUE_H */ 118 #endif /* _TRACE_WORKQUEUE_H */
60 119
61 /* This part must be outside protection */ 120 /* This part must be outside protection */
62 #include <trace/define_trace.h> 121 #include <trace/define_trace.h>
63 122
kernel/workqueue.c
1 /* 1 /*
2 * kernel/workqueue.c - generic async execution with shared worker pool 2 * kernel/workqueue.c - generic async execution with shared worker pool
3 * 3 *
4 * Copyright (C) 2002 Ingo Molnar 4 * Copyright (C) 2002 Ingo Molnar
5 * 5 *
6 * Derived from the taskqueue/keventd code by: 6 * Derived from the taskqueue/keventd code by:
7 * David Woodhouse <dwmw2@infradead.org> 7 * David Woodhouse <dwmw2@infradead.org>
8 * Andrew Morton 8 * Andrew Morton
9 * Kai Petzke <wpp@marie.physik.tu-berlin.de> 9 * Kai Petzke <wpp@marie.physik.tu-berlin.de>
10 * Theodore Ts'o <tytso@mit.edu> 10 * Theodore Ts'o <tytso@mit.edu>
11 * 11 *
12 * Made to use alloc_percpu by Christoph Lameter. 12 * Made to use alloc_percpu by Christoph Lameter.
13 * 13 *
14 * Copyright (C) 2010 SUSE Linux Products GmbH 14 * Copyright (C) 2010 SUSE Linux Products GmbH
15 * Copyright (C) 2010 Tejun Heo <tj@kernel.org> 15 * Copyright (C) 2010 Tejun Heo <tj@kernel.org>
16 * 16 *
17 * This is the generic async execution mechanism. Work items are 17 * This is the generic async execution mechanism. Work items are
18 * executed in process context. The worker pool is shared and 18 * executed in process context. The worker pool is shared and
19 * automatically managed. There is one worker pool for each CPU and 19 * automatically managed. There is one worker pool for each CPU and
20 * one extra for works which are better served by workers which are 20 * one extra for works which are better served by workers which are
21 * not bound to any specific CPU. 21 * not bound to any specific CPU.
22 * 22 *
23 * Please read Documentation/workqueue.txt for details. 23 * Please read Documentation/workqueue.txt for details.
24 */ 24 */
25 25
26 #include <linux/module.h> 26 #include <linux/module.h>
27 #include <linux/kernel.h> 27 #include <linux/kernel.h>
28 #include <linux/sched.h> 28 #include <linux/sched.h>
29 #include <linux/init.h> 29 #include <linux/init.h>
30 #include <linux/signal.h> 30 #include <linux/signal.h>
31 #include <linux/completion.h> 31 #include <linux/completion.h>
32 #include <linux/workqueue.h> 32 #include <linux/workqueue.h>
33 #include <linux/slab.h> 33 #include <linux/slab.h>
34 #include <linux/cpu.h> 34 #include <linux/cpu.h>
35 #include <linux/notifier.h> 35 #include <linux/notifier.h>
36 #include <linux/kthread.h> 36 #include <linux/kthread.h>
37 #include <linux/hardirq.h> 37 #include <linux/hardirq.h>
38 #include <linux/mempolicy.h> 38 #include <linux/mempolicy.h>
39 #include <linux/freezer.h> 39 #include <linux/freezer.h>
40 #include <linux/kallsyms.h> 40 #include <linux/kallsyms.h>
41 #include <linux/debug_locks.h> 41 #include <linux/debug_locks.h>
42 #include <linux/lockdep.h> 42 #include <linux/lockdep.h>
43 #include <linux/idr.h> 43 #include <linux/idr.h>
44 44
45 #define CREATE_TRACE_POINTS
46 #include <trace/events/workqueue.h>
47
48 #include "workqueue_sched.h" 45 #include "workqueue_sched.h"
49 46
50 enum { 47 enum {
51 /* global_cwq flags */ 48 /* global_cwq flags */
52 GCWQ_MANAGE_WORKERS = 1 << 0, /* need to manage workers */ 49 GCWQ_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
53 GCWQ_MANAGING_WORKERS = 1 << 1, /* managing workers */ 50 GCWQ_MANAGING_WORKERS = 1 << 1, /* managing workers */
54 GCWQ_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */ 51 GCWQ_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */
55 GCWQ_FREEZING = 1 << 3, /* freeze in progress */ 52 GCWQ_FREEZING = 1 << 3, /* freeze in progress */
56 GCWQ_HIGHPRI_PENDING = 1 << 4, /* highpri works on queue */ 53 GCWQ_HIGHPRI_PENDING = 1 << 4, /* highpri works on queue */
57 54
58 /* worker flags */ 55 /* worker flags */
59 WORKER_STARTED = 1 << 0, /* started */ 56 WORKER_STARTED = 1 << 0, /* started */
60 WORKER_DIE = 1 << 1, /* die die die */ 57 WORKER_DIE = 1 << 1, /* die die die */
61 WORKER_IDLE = 1 << 2, /* is idle */ 58 WORKER_IDLE = 1 << 2, /* is idle */
62 WORKER_PREP = 1 << 3, /* preparing to run works */ 59 WORKER_PREP = 1 << 3, /* preparing to run works */
63 WORKER_ROGUE = 1 << 4, /* not bound to any cpu */ 60 WORKER_ROGUE = 1 << 4, /* not bound to any cpu */
64 WORKER_REBIND = 1 << 5, /* mom is home, come back */ 61 WORKER_REBIND = 1 << 5, /* mom is home, come back */
65 WORKER_CPU_INTENSIVE = 1 << 6, /* cpu intensive */ 62 WORKER_CPU_INTENSIVE = 1 << 6, /* cpu intensive */
66 WORKER_UNBOUND = 1 << 7, /* worker is unbound */ 63 WORKER_UNBOUND = 1 << 7, /* worker is unbound */
67 64
68 WORKER_NOT_RUNNING = WORKER_PREP | WORKER_ROGUE | WORKER_REBIND | 65 WORKER_NOT_RUNNING = WORKER_PREP | WORKER_ROGUE | WORKER_REBIND |
69 WORKER_CPU_INTENSIVE | WORKER_UNBOUND, 66 WORKER_CPU_INTENSIVE | WORKER_UNBOUND,
70 67
71 /* gcwq->trustee_state */ 68 /* gcwq->trustee_state */
72 TRUSTEE_START = 0, /* start */ 69 TRUSTEE_START = 0, /* start */
73 TRUSTEE_IN_CHARGE = 1, /* trustee in charge of gcwq */ 70 TRUSTEE_IN_CHARGE = 1, /* trustee in charge of gcwq */
74 TRUSTEE_BUTCHER = 2, /* butcher workers */ 71 TRUSTEE_BUTCHER = 2, /* butcher workers */
75 TRUSTEE_RELEASE = 3, /* release workers */ 72 TRUSTEE_RELEASE = 3, /* release workers */
76 TRUSTEE_DONE = 4, /* trustee is done */ 73 TRUSTEE_DONE = 4, /* trustee is done */
77 74
78 BUSY_WORKER_HASH_ORDER = 6, /* 64 pointers */ 75 BUSY_WORKER_HASH_ORDER = 6, /* 64 pointers */
79 BUSY_WORKER_HASH_SIZE = 1 << BUSY_WORKER_HASH_ORDER, 76 BUSY_WORKER_HASH_SIZE = 1 << BUSY_WORKER_HASH_ORDER,
80 BUSY_WORKER_HASH_MASK = BUSY_WORKER_HASH_SIZE - 1, 77 BUSY_WORKER_HASH_MASK = BUSY_WORKER_HASH_SIZE - 1,
81 78
82 MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */ 79 MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */
83 IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */ 80 IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */
84 81
85 MAYDAY_INITIAL_TIMEOUT = HZ / 100, /* call for help after 10ms */ 82 MAYDAY_INITIAL_TIMEOUT = HZ / 100, /* call for help after 10ms */
86 MAYDAY_INTERVAL = HZ / 10, /* and then every 100ms */ 83 MAYDAY_INTERVAL = HZ / 10, /* and then every 100ms */
87 CREATE_COOLDOWN = HZ, /* time to breathe after fail */ 84 CREATE_COOLDOWN = HZ, /* time to breathe after fail */
88 TRUSTEE_COOLDOWN = HZ / 10, /* for trustee draining */ 85 TRUSTEE_COOLDOWN = HZ / 10, /* for trustee draining */
89 86
90 /* 87 /*
91 * Rescue workers are used only on emergencies and shared by 88 * Rescue workers are used only on emergencies and shared by
92 * all cpus. Give -20. 89 * all cpus. Give -20.
93 */ 90 */
94 RESCUER_NICE_LEVEL = -20, 91 RESCUER_NICE_LEVEL = -20,
95 }; 92 };
96 93
97 /* 94 /*
98 * Structure fields follow one of the following exclusion rules. 95 * Structure fields follow one of the following exclusion rules.
99 * 96 *
100 * I: Modifiable by initialization/destruction paths and read-only for 97 * I: Modifiable by initialization/destruction paths and read-only for
101 * everyone else. 98 * everyone else.
102 * 99 *
103 * P: Preemption protected. Disabling preemption is enough and should 100 * P: Preemption protected. Disabling preemption is enough and should
104 * only be modified and accessed from the local cpu. 101 * only be modified and accessed from the local cpu.
105 * 102 *
106 * L: gcwq->lock protected. Access with gcwq->lock held. 103 * L: gcwq->lock protected. Access with gcwq->lock held.
107 * 104 *
108 * X: During normal operation, modification requires gcwq->lock and 105 * X: During normal operation, modification requires gcwq->lock and
109 * should be done only from local cpu. Either disabling preemption 106 * should be done only from local cpu. Either disabling preemption
110 * on local cpu or grabbing gcwq->lock is enough for read access. 107 * on local cpu or grabbing gcwq->lock is enough for read access.
111 * If GCWQ_DISASSOCIATED is set, it's identical to L. 108 * If GCWQ_DISASSOCIATED is set, it's identical to L.
112 * 109 *
113 * F: wq->flush_mutex protected. 110 * F: wq->flush_mutex protected.
114 * 111 *
115 * W: workqueue_lock protected. 112 * W: workqueue_lock protected.
116 */ 113 */
117 114
118 struct global_cwq; 115 struct global_cwq;
119 116
120 /* 117 /*
121 * The poor guys doing the actual heavy lifting. All on-duty workers 118 * The poor guys doing the actual heavy lifting. All on-duty workers
122 * are either serving the manager role, on idle list or on busy hash. 119 * are either serving the manager role, on idle list or on busy hash.
123 */ 120 */
124 struct worker { 121 struct worker {
125 /* on idle list while idle, on busy hash table while busy */ 122 /* on idle list while idle, on busy hash table while busy */
126 union { 123 union {
127 struct list_head entry; /* L: while idle */ 124 struct list_head entry; /* L: while idle */
128 struct hlist_node hentry; /* L: while busy */ 125 struct hlist_node hentry; /* L: while busy */
129 }; 126 };
130 127
131 struct work_struct *current_work; /* L: work being processed */ 128 struct work_struct *current_work; /* L: work being processed */
132 struct cpu_workqueue_struct *current_cwq; /* L: current_work's cwq */ 129 struct cpu_workqueue_struct *current_cwq; /* L: current_work's cwq */
133 struct list_head scheduled; /* L: scheduled works */ 130 struct list_head scheduled; /* L: scheduled works */
134 struct task_struct *task; /* I: worker task */ 131 struct task_struct *task; /* I: worker task */
135 struct global_cwq *gcwq; /* I: the associated gcwq */ 132 struct global_cwq *gcwq; /* I: the associated gcwq */
136 /* 64 bytes boundary on 64bit, 32 on 32bit */ 133 /* 64 bytes boundary on 64bit, 32 on 32bit */
137 unsigned long last_active; /* L: last active timestamp */ 134 unsigned long last_active; /* L: last active timestamp */
138 unsigned int flags; /* X: flags */ 135 unsigned int flags; /* X: flags */
139 int id; /* I: worker id */ 136 int id; /* I: worker id */
140 struct work_struct rebind_work; /* L: rebind worker to cpu */ 137 struct work_struct rebind_work; /* L: rebind worker to cpu */
141 }; 138 };
142 139
143 /* 140 /*
144 * Global per-cpu workqueue. There's one and only one for each cpu 141 * Global per-cpu workqueue. There's one and only one for each cpu
145 * and all works are queued and processed here regardless of their 142 * and all works are queued and processed here regardless of their
146 * target workqueues. 143 * target workqueues.
147 */ 144 */
148 struct global_cwq { 145 struct global_cwq {
149 spinlock_t lock; /* the gcwq lock */ 146 spinlock_t lock; /* the gcwq lock */
150 struct list_head worklist; /* L: list of pending works */ 147 struct list_head worklist; /* L: list of pending works */
151 unsigned int cpu; /* I: the associated cpu */ 148 unsigned int cpu; /* I: the associated cpu */
152 unsigned int flags; /* L: GCWQ_* flags */ 149 unsigned int flags; /* L: GCWQ_* flags */
153 150
154 int nr_workers; /* L: total number of workers */ 151 int nr_workers; /* L: total number of workers */
155 int nr_idle; /* L: currently idle ones */ 152 int nr_idle; /* L: currently idle ones */
156 153
157 /* workers are chained either in the idle_list or busy_hash */ 154 /* workers are chained either in the idle_list or busy_hash */
158 struct list_head idle_list; /* X: list of idle workers */ 155 struct list_head idle_list; /* X: list of idle workers */
159 struct hlist_head busy_hash[BUSY_WORKER_HASH_SIZE]; 156 struct hlist_head busy_hash[BUSY_WORKER_HASH_SIZE];
160 /* L: hash of busy workers */ 157 /* L: hash of busy workers */
161 158
162 struct timer_list idle_timer; /* L: worker idle timeout */ 159 struct timer_list idle_timer; /* L: worker idle timeout */
163 struct timer_list mayday_timer; /* L: SOS timer for dworkers */ 160 struct timer_list mayday_timer; /* L: SOS timer for dworkers */
164 161
165 struct ida worker_ida; /* L: for worker IDs */ 162 struct ida worker_ida; /* L: for worker IDs */
166 163
167 struct task_struct *trustee; /* L: for gcwq shutdown */ 164 struct task_struct *trustee; /* L: for gcwq shutdown */
168 unsigned int trustee_state; /* L: trustee state */ 165 unsigned int trustee_state; /* L: trustee state */
169 wait_queue_head_t trustee_wait; /* trustee wait */ 166 wait_queue_head_t trustee_wait; /* trustee wait */
170 struct worker *first_idle; /* L: first idle worker */ 167 struct worker *first_idle; /* L: first idle worker */
171 } ____cacheline_aligned_in_smp; 168 } ____cacheline_aligned_in_smp;
172 169
173 /* 170 /*
174 * The per-CPU workqueue. The lower WORK_STRUCT_FLAG_BITS of 171 * The per-CPU workqueue. The lower WORK_STRUCT_FLAG_BITS of
175 * work_struct->data are used for flags and thus cwqs need to be 172 * work_struct->data are used for flags and thus cwqs need to be
176 * aligned at two's power of the number of flag bits. 173 * aligned at two's power of the number of flag bits.
177 */ 174 */
178 struct cpu_workqueue_struct { 175 struct cpu_workqueue_struct {
179 struct global_cwq *gcwq; /* I: the associated gcwq */ 176 struct global_cwq *gcwq; /* I: the associated gcwq */
180 struct workqueue_struct *wq; /* I: the owning workqueue */ 177 struct workqueue_struct *wq; /* I: the owning workqueue */
181 int work_color; /* L: current color */ 178 int work_color; /* L: current color */
182 int flush_color; /* L: flushing color */ 179 int flush_color; /* L: flushing color */
183 int nr_in_flight[WORK_NR_COLORS]; 180 int nr_in_flight[WORK_NR_COLORS];
184 /* L: nr of in_flight works */ 181 /* L: nr of in_flight works */
185 int nr_active; /* L: nr of active works */ 182 int nr_active; /* L: nr of active works */
186 int max_active; /* L: max active works */ 183 int max_active; /* L: max active works */
187 struct list_head delayed_works; /* L: delayed works */ 184 struct list_head delayed_works; /* L: delayed works */
188 }; 185 };
189 186
190 /* 187 /*
191 * Structure used to wait for workqueue flush. 188 * Structure used to wait for workqueue flush.
192 */ 189 */
193 struct wq_flusher { 190 struct wq_flusher {
194 struct list_head list; /* F: list of flushers */ 191 struct list_head list; /* F: list of flushers */
195 int flush_color; /* F: flush color waiting for */ 192 int flush_color; /* F: flush color waiting for */
196 struct completion done; /* flush completion */ 193 struct completion done; /* flush completion */
197 }; 194 };
198 195
199 /* 196 /*
200 * All cpumasks are assumed to be always set on UP and thus can't be 197 * All cpumasks are assumed to be always set on UP and thus can't be
201 * used to determine whether there's something to be done. 198 * used to determine whether there's something to be done.
202 */ 199 */
203 #ifdef CONFIG_SMP 200 #ifdef CONFIG_SMP
204 typedef cpumask_var_t mayday_mask_t; 201 typedef cpumask_var_t mayday_mask_t;
205 #define mayday_test_and_set_cpu(cpu, mask) \ 202 #define mayday_test_and_set_cpu(cpu, mask) \
206 cpumask_test_and_set_cpu((cpu), (mask)) 203 cpumask_test_and_set_cpu((cpu), (mask))
207 #define mayday_clear_cpu(cpu, mask) cpumask_clear_cpu((cpu), (mask)) 204 #define mayday_clear_cpu(cpu, mask) cpumask_clear_cpu((cpu), (mask))
208 #define for_each_mayday_cpu(cpu, mask) for_each_cpu((cpu), (mask)) 205 #define for_each_mayday_cpu(cpu, mask) for_each_cpu((cpu), (mask))
209 #define alloc_mayday_mask(maskp, gfp) zalloc_cpumask_var((maskp), (gfp)) 206 #define alloc_mayday_mask(maskp, gfp) zalloc_cpumask_var((maskp), (gfp))
210 #define free_mayday_mask(mask) free_cpumask_var((mask)) 207 #define free_mayday_mask(mask) free_cpumask_var((mask))
211 #else 208 #else
212 typedef unsigned long mayday_mask_t; 209 typedef unsigned long mayday_mask_t;
213 #define mayday_test_and_set_cpu(cpu, mask) test_and_set_bit(0, &(mask)) 210 #define mayday_test_and_set_cpu(cpu, mask) test_and_set_bit(0, &(mask))
214 #define mayday_clear_cpu(cpu, mask) clear_bit(0, &(mask)) 211 #define mayday_clear_cpu(cpu, mask) clear_bit(0, &(mask))
215 #define for_each_mayday_cpu(cpu, mask) if ((cpu) = 0, (mask)) 212 #define for_each_mayday_cpu(cpu, mask) if ((cpu) = 0, (mask))
216 #define alloc_mayday_mask(maskp, gfp) true 213 #define alloc_mayday_mask(maskp, gfp) true
217 #define free_mayday_mask(mask) do { } while (0) 214 #define free_mayday_mask(mask) do { } while (0)
218 #endif 215 #endif
219 216
220 /* 217 /*
221 * The externally visible workqueue abstraction is an array of 218 * The externally visible workqueue abstraction is an array of
222 * per-CPU workqueues: 219 * per-CPU workqueues:
223 */ 220 */
224 struct workqueue_struct { 221 struct workqueue_struct {
225 unsigned int flags; /* I: WQ_* flags */ 222 unsigned int flags; /* I: WQ_* flags */
226 union { 223 union {
227 struct cpu_workqueue_struct __percpu *pcpu; 224 struct cpu_workqueue_struct __percpu *pcpu;
228 struct cpu_workqueue_struct *single; 225 struct cpu_workqueue_struct *single;
229 unsigned long v; 226 unsigned long v;
230 } cpu_wq; /* I: cwq's */ 227 } cpu_wq; /* I: cwq's */
231 struct list_head list; /* W: list of all workqueues */ 228 struct list_head list; /* W: list of all workqueues */
232 229
233 struct mutex flush_mutex; /* protects wq flushing */ 230 struct mutex flush_mutex; /* protects wq flushing */
234 int work_color; /* F: current work color */ 231 int work_color; /* F: current work color */
235 int flush_color; /* F: current flush color */ 232 int flush_color; /* F: current flush color */
236 atomic_t nr_cwqs_to_flush; /* flush in progress */ 233 atomic_t nr_cwqs_to_flush; /* flush in progress */
237 struct wq_flusher *first_flusher; /* F: first flusher */ 234 struct wq_flusher *first_flusher; /* F: first flusher */
238 struct list_head flusher_queue; /* F: flush waiters */ 235 struct list_head flusher_queue; /* F: flush waiters */
239 struct list_head flusher_overflow; /* F: flush overflow list */ 236 struct list_head flusher_overflow; /* F: flush overflow list */
240 237
241 mayday_mask_t mayday_mask; /* cpus requesting rescue */ 238 mayday_mask_t mayday_mask; /* cpus requesting rescue */
242 struct worker *rescuer; /* I: rescue worker */ 239 struct worker *rescuer; /* I: rescue worker */
243 240
244 int saved_max_active; /* W: saved cwq max_active */ 241 int saved_max_active; /* W: saved cwq max_active */
245 const char *name; /* I: workqueue name */ 242 const char *name; /* I: workqueue name */
246 #ifdef CONFIG_LOCKDEP 243 #ifdef CONFIG_LOCKDEP
247 struct lockdep_map lockdep_map; 244 struct lockdep_map lockdep_map;
248 #endif 245 #endif
249 }; 246 };
250 247
251 struct workqueue_struct *system_wq __read_mostly; 248 struct workqueue_struct *system_wq __read_mostly;
252 struct workqueue_struct *system_long_wq __read_mostly; 249 struct workqueue_struct *system_long_wq __read_mostly;
253 struct workqueue_struct *system_nrt_wq __read_mostly; 250 struct workqueue_struct *system_nrt_wq __read_mostly;
254 struct workqueue_struct *system_unbound_wq __read_mostly; 251 struct workqueue_struct *system_unbound_wq __read_mostly;
255 EXPORT_SYMBOL_GPL(system_wq); 252 EXPORT_SYMBOL_GPL(system_wq);
256 EXPORT_SYMBOL_GPL(system_long_wq); 253 EXPORT_SYMBOL_GPL(system_long_wq);
257 EXPORT_SYMBOL_GPL(system_nrt_wq); 254 EXPORT_SYMBOL_GPL(system_nrt_wq);
258 EXPORT_SYMBOL_GPL(system_unbound_wq); 255 EXPORT_SYMBOL_GPL(system_unbound_wq);
259 256
257 #define CREATE_TRACE_POINTS
258 #include <trace/events/workqueue.h>
259
260 #define for_each_busy_worker(worker, i, pos, gcwq) \ 260 #define for_each_busy_worker(worker, i, pos, gcwq) \
261 for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++) \ 261 for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++) \
262 hlist_for_each_entry(worker, pos, &gcwq->busy_hash[i], hentry) 262 hlist_for_each_entry(worker, pos, &gcwq->busy_hash[i], hentry)
263 263
264 static inline int __next_gcwq_cpu(int cpu, const struct cpumask *mask, 264 static inline int __next_gcwq_cpu(int cpu, const struct cpumask *mask,
265 unsigned int sw) 265 unsigned int sw)
266 { 266 {
267 if (cpu < nr_cpu_ids) { 267 if (cpu < nr_cpu_ids) {
268 if (sw & 1) { 268 if (sw & 1) {
269 cpu = cpumask_next(cpu, mask); 269 cpu = cpumask_next(cpu, mask);
270 if (cpu < nr_cpu_ids) 270 if (cpu < nr_cpu_ids)
271 return cpu; 271 return cpu;
272 } 272 }
273 if (sw & 2) 273 if (sw & 2)
274 return WORK_CPU_UNBOUND; 274 return WORK_CPU_UNBOUND;
275 } 275 }
276 return WORK_CPU_NONE; 276 return WORK_CPU_NONE;
277 } 277 }
278 278
279 static inline int __next_wq_cpu(int cpu, const struct cpumask *mask, 279 static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
280 struct workqueue_struct *wq) 280 struct workqueue_struct *wq)
281 { 281 {
282 return __next_gcwq_cpu(cpu, mask, !(wq->flags & WQ_UNBOUND) ? 1 : 2); 282 return __next_gcwq_cpu(cpu, mask, !(wq->flags & WQ_UNBOUND) ? 1 : 2);
283 } 283 }
284 284
285 /* 285 /*
286 * CPU iterators 286 * CPU iterators
287 * 287 *
288 * An extra gcwq is defined for an invalid cpu number 288 * An extra gcwq is defined for an invalid cpu number
289 * (WORK_CPU_UNBOUND) to host workqueues which are not bound to any 289 * (WORK_CPU_UNBOUND) to host workqueues which are not bound to any
290 * specific CPU. The following iterators are similar to 290 * specific CPU. The following iterators are similar to
291 * for_each_*_cpu() iterators but also considers the unbound gcwq. 291 * for_each_*_cpu() iterators but also considers the unbound gcwq.
292 * 292 *
293 * for_each_gcwq_cpu() : possible CPUs + WORK_CPU_UNBOUND 293 * for_each_gcwq_cpu() : possible CPUs + WORK_CPU_UNBOUND
294 * for_each_online_gcwq_cpu() : online CPUs + WORK_CPU_UNBOUND 294 * for_each_online_gcwq_cpu() : online CPUs + WORK_CPU_UNBOUND
295 * for_each_cwq_cpu() : possible CPUs for bound workqueues, 295 * for_each_cwq_cpu() : possible CPUs for bound workqueues,
296 * WORK_CPU_UNBOUND for unbound workqueues 296 * WORK_CPU_UNBOUND for unbound workqueues
297 */ 297 */
298 #define for_each_gcwq_cpu(cpu) \ 298 #define for_each_gcwq_cpu(cpu) \
299 for ((cpu) = __next_gcwq_cpu(-1, cpu_possible_mask, 3); \ 299 for ((cpu) = __next_gcwq_cpu(-1, cpu_possible_mask, 3); \
300 (cpu) < WORK_CPU_NONE; \ 300 (cpu) < WORK_CPU_NONE; \
301 (cpu) = __next_gcwq_cpu((cpu), cpu_possible_mask, 3)) 301 (cpu) = __next_gcwq_cpu((cpu), cpu_possible_mask, 3))
302 302
303 #define for_each_online_gcwq_cpu(cpu) \ 303 #define for_each_online_gcwq_cpu(cpu) \
304 for ((cpu) = __next_gcwq_cpu(-1, cpu_online_mask, 3); \ 304 for ((cpu) = __next_gcwq_cpu(-1, cpu_online_mask, 3); \
305 (cpu) < WORK_CPU_NONE; \ 305 (cpu) < WORK_CPU_NONE; \
306 (cpu) = __next_gcwq_cpu((cpu), cpu_online_mask, 3)) 306 (cpu) = __next_gcwq_cpu((cpu), cpu_online_mask, 3))
307 307
308 #define for_each_cwq_cpu(cpu, wq) \ 308 #define for_each_cwq_cpu(cpu, wq) \
309 for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, (wq)); \ 309 for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, (wq)); \
310 (cpu) < WORK_CPU_NONE; \ 310 (cpu) < WORK_CPU_NONE; \
311 (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, (wq))) 311 (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, (wq)))
312 312
313 #ifdef CONFIG_LOCKDEP
314 /**
315 * in_workqueue_context() - in context of specified workqueue?
316 * @wq: the workqueue of interest
317 *
318 * Checks lockdep state to see if the current task is executing from
319 * within a workqueue item. This function exists only if lockdep is
320 * enabled.
321 */
322 int in_workqueue_context(struct workqueue_struct *wq)
323 {
324 return lock_is_held(&wq->lockdep_map);
325 }
326 #endif
327
328 #ifdef CONFIG_DEBUG_OBJECTS_WORK 313 #ifdef CONFIG_DEBUG_OBJECTS_WORK
329 314
330 static struct debug_obj_descr work_debug_descr; 315 static struct debug_obj_descr work_debug_descr;
331 316
332 /* 317 /*
333 * fixup_init is called when: 318 * fixup_init is called when:
334 * - an active object is initialized 319 * - an active object is initialized
335 */ 320 */
336 static int work_fixup_init(void *addr, enum debug_obj_state state) 321 static int work_fixup_init(void *addr, enum debug_obj_state state)
337 { 322 {
338 struct work_struct *work = addr; 323 struct work_struct *work = addr;
339 324
340 switch (state) { 325 switch (state) {
341 case ODEBUG_STATE_ACTIVE: 326 case ODEBUG_STATE_ACTIVE:
342 cancel_work_sync(work); 327 cancel_work_sync(work);
343 debug_object_init(work, &work_debug_descr); 328 debug_object_init(work, &work_debug_descr);
344 return 1; 329 return 1;
345 default: 330 default:
346 return 0; 331 return 0;
347 } 332 }
348 } 333 }
349 334
350 /* 335 /*
351 * fixup_activate is called when: 336 * fixup_activate is called when:
352 * - an active object is activated 337 * - an active object is activated
353 * - an unknown object is activated (might be a statically initialized object) 338 * - an unknown object is activated (might be a statically initialized object)
354 */ 339 */
355 static int work_fixup_activate(void *addr, enum debug_obj_state state) 340 static int work_fixup_activate(void *addr, enum debug_obj_state state)
356 { 341 {
357 struct work_struct *work = addr; 342 struct work_struct *work = addr;
358 343
359 switch (state) { 344 switch (state) {
360 345
361 case ODEBUG_STATE_NOTAVAILABLE: 346 case ODEBUG_STATE_NOTAVAILABLE:
362 /* 347 /*
363 * This is not really a fixup. The work struct was 348 * This is not really a fixup. The work struct was
364 * statically initialized. We just make sure that it 349 * statically initialized. We just make sure that it
365 * is tracked in the object tracker. 350 * is tracked in the object tracker.
366 */ 351 */
367 if (test_bit(WORK_STRUCT_STATIC_BIT, work_data_bits(work))) { 352 if (test_bit(WORK_STRUCT_STATIC_BIT, work_data_bits(work))) {
368 debug_object_init(work, &work_debug_descr); 353 debug_object_init(work, &work_debug_descr);
369 debug_object_activate(work, &work_debug_descr); 354 debug_object_activate(work, &work_debug_descr);
370 return 0; 355 return 0;
371 } 356 }
372 WARN_ON_ONCE(1); 357 WARN_ON_ONCE(1);
373 return 0; 358 return 0;
374 359
375 case ODEBUG_STATE_ACTIVE: 360 case ODEBUG_STATE_ACTIVE:
376 WARN_ON(1); 361 WARN_ON(1);
377 362
378 default: 363 default:
379 return 0; 364 return 0;
380 } 365 }
381 } 366 }
382 367
383 /* 368 /*
384 * fixup_free is called when: 369 * fixup_free is called when:
385 * - an active object is freed 370 * - an active object is freed
386 */ 371 */
387 static int work_fixup_free(void *addr, enum debug_obj_state state) 372 static int work_fixup_free(void *addr, enum debug_obj_state state)
388 { 373 {
389 struct work_struct *work = addr; 374 struct work_struct *work = addr;
390 375
391 switch (state) { 376 switch (state) {
392 case ODEBUG_STATE_ACTIVE: 377 case ODEBUG_STATE_ACTIVE:
393 cancel_work_sync(work); 378 cancel_work_sync(work);
394 debug_object_free(work, &work_debug_descr); 379 debug_object_free(work, &work_debug_descr);
395 return 1; 380 return 1;
396 default: 381 default:
397 return 0; 382 return 0;
398 } 383 }
399 } 384 }
400 385
401 static struct debug_obj_descr work_debug_descr = { 386 static struct debug_obj_descr work_debug_descr = {
402 .name = "work_struct", 387 .name = "work_struct",
403 .fixup_init = work_fixup_init, 388 .fixup_init = work_fixup_init,
404 .fixup_activate = work_fixup_activate, 389 .fixup_activate = work_fixup_activate,
405 .fixup_free = work_fixup_free, 390 .fixup_free = work_fixup_free,
406 }; 391 };
407 392
408 static inline void debug_work_activate(struct work_struct *work) 393 static inline void debug_work_activate(struct work_struct *work)
409 { 394 {
410 debug_object_activate(work, &work_debug_descr); 395 debug_object_activate(work, &work_debug_descr);
411 } 396 }
412 397
413 static inline void debug_work_deactivate(struct work_struct *work) 398 static inline void debug_work_deactivate(struct work_struct *work)
414 { 399 {
415 debug_object_deactivate(work, &work_debug_descr); 400 debug_object_deactivate(work, &work_debug_descr);
416 } 401 }
417 402
418 void __init_work(struct work_struct *work, int onstack) 403 void __init_work(struct work_struct *work, int onstack)
419 { 404 {
420 if (onstack) 405 if (onstack)
421 debug_object_init_on_stack(work, &work_debug_descr); 406 debug_object_init_on_stack(work, &work_debug_descr);
422 else 407 else
423 debug_object_init(work, &work_debug_descr); 408 debug_object_init(work, &work_debug_descr);
424 } 409 }
425 EXPORT_SYMBOL_GPL(__init_work); 410 EXPORT_SYMBOL_GPL(__init_work);
426 411
427 void destroy_work_on_stack(struct work_struct *work) 412 void destroy_work_on_stack(struct work_struct *work)
428 { 413 {
429 debug_object_free(work, &work_debug_descr); 414 debug_object_free(work, &work_debug_descr);
430 } 415 }
431 EXPORT_SYMBOL_GPL(destroy_work_on_stack); 416 EXPORT_SYMBOL_GPL(destroy_work_on_stack);
432 417
433 #else 418 #else
434 static inline void debug_work_activate(struct work_struct *work) { } 419 static inline void debug_work_activate(struct work_struct *work) { }
435 static inline void debug_work_deactivate(struct work_struct *work) { } 420 static inline void debug_work_deactivate(struct work_struct *work) { }
436 #endif 421 #endif
437 422
438 /* Serializes the accesses to the list of workqueues. */ 423 /* Serializes the accesses to the list of workqueues. */
439 static DEFINE_SPINLOCK(workqueue_lock); 424 static DEFINE_SPINLOCK(workqueue_lock);
440 static LIST_HEAD(workqueues); 425 static LIST_HEAD(workqueues);
441 static bool workqueue_freezing; /* W: have wqs started freezing? */ 426 static bool workqueue_freezing; /* W: have wqs started freezing? */
442 427
443 /* 428 /*
444 * The almighty global cpu workqueues. nr_running is the only field 429 * The almighty global cpu workqueues. nr_running is the only field
445 * which is expected to be used frequently by other cpus via 430 * which is expected to be used frequently by other cpus via
446 * try_to_wake_up(). Put it in a separate cacheline. 431 * try_to_wake_up(). Put it in a separate cacheline.
447 */ 432 */
448 static DEFINE_PER_CPU(struct global_cwq, global_cwq); 433 static DEFINE_PER_CPU(struct global_cwq, global_cwq);
449 static DEFINE_PER_CPU_SHARED_ALIGNED(atomic_t, gcwq_nr_running); 434 static DEFINE_PER_CPU_SHARED_ALIGNED(atomic_t, gcwq_nr_running);
450 435
451 /* 436 /*
452 * Global cpu workqueue and nr_running counter for unbound gcwq. The 437 * Global cpu workqueue and nr_running counter for unbound gcwq. The
453 * gcwq is always online, has GCWQ_DISASSOCIATED set, and all its 438 * gcwq is always online, has GCWQ_DISASSOCIATED set, and all its
454 * workers have WORKER_UNBOUND set. 439 * workers have WORKER_UNBOUND set.
455 */ 440 */
456 static struct global_cwq unbound_global_cwq; 441 static struct global_cwq unbound_global_cwq;
457 static atomic_t unbound_gcwq_nr_running = ATOMIC_INIT(0); /* always 0 */ 442 static atomic_t unbound_gcwq_nr_running = ATOMIC_INIT(0); /* always 0 */
458 443
459 static int worker_thread(void *__worker); 444 static int worker_thread(void *__worker);
460 445
461 static struct global_cwq *get_gcwq(unsigned int cpu) 446 static struct global_cwq *get_gcwq(unsigned int cpu)
462 { 447 {
463 if (cpu != WORK_CPU_UNBOUND) 448 if (cpu != WORK_CPU_UNBOUND)
464 return &per_cpu(global_cwq, cpu); 449 return &per_cpu(global_cwq, cpu);
465 else 450 else
466 return &unbound_global_cwq; 451 return &unbound_global_cwq;
467 } 452 }
468 453
469 static atomic_t *get_gcwq_nr_running(unsigned int cpu) 454 static atomic_t *get_gcwq_nr_running(unsigned int cpu)
470 { 455 {
471 if (cpu != WORK_CPU_UNBOUND) 456 if (cpu != WORK_CPU_UNBOUND)
472 return &per_cpu(gcwq_nr_running, cpu); 457 return &per_cpu(gcwq_nr_running, cpu);
473 else 458 else
474 return &unbound_gcwq_nr_running; 459 return &unbound_gcwq_nr_running;
475 } 460 }
476 461
477 static struct cpu_workqueue_struct *get_cwq(unsigned int cpu, 462 static struct cpu_workqueue_struct *get_cwq(unsigned int cpu,
478 struct workqueue_struct *wq) 463 struct workqueue_struct *wq)
479 { 464 {
480 if (!(wq->flags & WQ_UNBOUND)) { 465 if (!(wq->flags & WQ_UNBOUND)) {
481 if (likely(cpu < nr_cpu_ids)) { 466 if (likely(cpu < nr_cpu_ids)) {
482 #ifdef CONFIG_SMP 467 #ifdef CONFIG_SMP
483 return per_cpu_ptr(wq->cpu_wq.pcpu, cpu); 468 return per_cpu_ptr(wq->cpu_wq.pcpu, cpu);
484 #else 469 #else
485 return wq->cpu_wq.single; 470 return wq->cpu_wq.single;
486 #endif 471 #endif
487 } 472 }
488 } else if (likely(cpu == WORK_CPU_UNBOUND)) 473 } else if (likely(cpu == WORK_CPU_UNBOUND))
489 return wq->cpu_wq.single; 474 return wq->cpu_wq.single;
490 return NULL; 475 return NULL;
491 } 476 }
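To illustrate how the lookup above plays out (a sketch; frob_show_cwq() is a made-up debugging helper and takes no locks): a bound workqueue has one cpu_workqueue_struct per possible CPU, while an unbound one has a single cwq hosted on the pseudo WORK_CPU_UNBOUND gcwq, so the iterator below visits either all possible CPUs or just that one pseudo CPU.

	/* illustrative only -- no locking, debug sketch */
	static void frob_show_cwq(struct workqueue_struct *wq)
	{
		unsigned int cpu;

		for_each_cwq_cpu(cpu, wq) {
			struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);

			pr_info("%s: cpu %u nr_active %d max_active %d\n",
				wq->name, cpu, cwq->nr_active, cwq->max_active);
		}
	}
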
492 477
493 static unsigned int work_color_to_flags(int color) 478 static unsigned int work_color_to_flags(int color)
494 { 479 {
495 return color << WORK_STRUCT_COLOR_SHIFT; 480 return color << WORK_STRUCT_COLOR_SHIFT;
496 } 481 }
497 482
498 static int get_work_color(struct work_struct *work) 483 static int get_work_color(struct work_struct *work)
499 { 484 {
500 return (*work_data_bits(work) >> WORK_STRUCT_COLOR_SHIFT) & 485 return (*work_data_bits(work) >> WORK_STRUCT_COLOR_SHIFT) &
501 ((1 << WORK_STRUCT_COLOR_BITS) - 1); 486 ((1 << WORK_STRUCT_COLOR_BITS) - 1);
502 } 487 }
503 488
504 static int work_next_color(int color) 489 static int work_next_color(int color)
505 { 490 {
506 return (color + 1) % WORK_NR_COLORS; 491 return (color + 1) % WORK_NR_COLORS;
507 } 492 }
508 493
509 /* 494 /*
510 * A work's data points to the cwq with WORK_STRUCT_CWQ set while the 495 * A work's data points to the cwq with WORK_STRUCT_CWQ set while the
511 * work is on queue. Once execution starts, WORK_STRUCT_CWQ is 496 * work is on queue. Once execution starts, WORK_STRUCT_CWQ is
512 * cleared and the work data contains the cpu number it was last on. 497 * cleared and the work data contains the cpu number it was last on.
513 * 498 *
514 * set_work_{cwq|cpu}() and clear_work_data() can be used to set the 499 * set_work_{cwq|cpu}() and clear_work_data() can be used to set the
515 * cwq, cpu or clear work->data. These functions should only be 500 * cwq, cpu or clear work->data. These functions should only be
516 * called while the work is owned - ie. while the PENDING bit is set. 501 * called while the work is owned - ie. while the PENDING bit is set.
517 * 502 *
518 * get_work_[g]cwq() can be used to obtain the gcwq or cwq 503 * get_work_[g]cwq() can be used to obtain the gcwq or cwq
519 * corresponding to a work. gcwq is available once the work has been 504 * corresponding to a work. gcwq is available once the work has been
520 * queued anywhere after initialization. cwq is available only from 505 * queued anywhere after initialization. cwq is available only from
521 * queueing until execution starts. 506 * queueing until execution starts.
522 */ 507 */
523 static inline void set_work_data(struct work_struct *work, unsigned long data, 508 static inline void set_work_data(struct work_struct *work, unsigned long data,
524 unsigned long flags) 509 unsigned long flags)
525 { 510 {
526 BUG_ON(!work_pending(work)); 511 BUG_ON(!work_pending(work));
527 atomic_long_set(&work->data, data | flags | work_static(work)); 512 atomic_long_set(&work->data, data | flags | work_static(work));
528 } 513 }
529 514
530 static void set_work_cwq(struct work_struct *work, 515 static void set_work_cwq(struct work_struct *work,
531 struct cpu_workqueue_struct *cwq, 516 struct cpu_workqueue_struct *cwq,
532 unsigned long extra_flags) 517 unsigned long extra_flags)
533 { 518 {
534 set_work_data(work, (unsigned long)cwq, 519 set_work_data(work, (unsigned long)cwq,
535 WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags); 520 WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
536 } 521 }
537 522
538 static void set_work_cpu(struct work_struct *work, unsigned int cpu) 523 static void set_work_cpu(struct work_struct *work, unsigned int cpu)
539 { 524 {
540 set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, WORK_STRUCT_PENDING); 525 set_work_data(work, cpu << WORK_STRUCT_FLAG_BITS, WORK_STRUCT_PENDING);
541 } 526 }
542 527
543 static void clear_work_data(struct work_struct *work) 528 static void clear_work_data(struct work_struct *work)
544 { 529 {
545 set_work_data(work, WORK_STRUCT_NO_CPU, 0); 530 set_work_data(work, WORK_STRUCT_NO_CPU, 0);
546 } 531 }
547 532
548 static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work) 533 static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
549 { 534 {
550 unsigned long data = atomic_long_read(&work->data); 535 unsigned long data = atomic_long_read(&work->data);
551 536
552 if (data & WORK_STRUCT_CWQ) 537 if (data & WORK_STRUCT_CWQ)
553 return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); 538 return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
554 else 539 else
555 return NULL; 540 return NULL;
556 } 541 }
557 542
558 static struct global_cwq *get_work_gcwq(struct work_struct *work) 543 static struct global_cwq *get_work_gcwq(struct work_struct *work)
559 { 544 {
560 unsigned long data = atomic_long_read(&work->data); 545 unsigned long data = atomic_long_read(&work->data);
561 unsigned int cpu; 546 unsigned int cpu;
562 547
563 if (data & WORK_STRUCT_CWQ) 548 if (data & WORK_STRUCT_CWQ)
564 return ((struct cpu_workqueue_struct *) 549 return ((struct cpu_workqueue_struct *)
565 (data & WORK_STRUCT_WQ_DATA_MASK))->gcwq; 550 (data & WORK_STRUCT_WQ_DATA_MASK))->gcwq;
566 551
567 cpu = data >> WORK_STRUCT_FLAG_BITS; 552 cpu = data >> WORK_STRUCT_FLAG_BITS;
568 if (cpu == WORK_CPU_NONE) 553 if (cpu == WORK_CPU_NONE)
569 return NULL; 554 return NULL;
570 555
571 BUG_ON(cpu >= nr_cpu_ids && cpu != WORK_CPU_UNBOUND); 556 BUG_ON(cpu >= nr_cpu_ids && cpu != WORK_CPU_UNBOUND);
572 return get_gcwq(cpu); 557 return get_gcwq(cpu);
573 } 558 }
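A rough illustration of the two work->data encodings the helpers above implement (a sketch; the bit layout is simplified, and work_static() plus the colour flags are folded into the trailing "...

	/*
	 * while queued (set_work_cwq):
	 *	data = (unsigned long)cwq | WORK_STRUCT_CWQ | WORK_STRUCT_PENDING | ...
	 *	get_work_cwq()  -> cwq  (flag bits masked off)
	 *	get_work_gcwq() -> cwq->gcwq
	 *
	 * once execution has started (set_work_cpu):
	 *	data = cpu << WORK_STRUCT_FLAG_BITS | WORK_STRUCT_PENDING
	 *	get_work_cwq()  -> NULL
	 *	get_work_gcwq() -> get_gcwq(data >> WORK_STRUCT_FLAG_BITS)
	 */
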
574 559
575 /* 560 /*
576 * Policy functions. These define the policies on how the global 561 * Policy functions. These define the policies on how the global
577 * worker pool is managed. Unless noted otherwise, these functions 562 * worker pool is managed. Unless noted otherwise, these functions
578 * assume that they're being called with gcwq->lock held. 563 * assume that they're being called with gcwq->lock held.
579 */ 564 */
580 565
581 static bool __need_more_worker(struct global_cwq *gcwq) 566 static bool __need_more_worker(struct global_cwq *gcwq)
582 { 567 {
583 return !atomic_read(get_gcwq_nr_running(gcwq->cpu)) || 568 return !atomic_read(get_gcwq_nr_running(gcwq->cpu)) ||
584 gcwq->flags & GCWQ_HIGHPRI_PENDING; 569 gcwq->flags & GCWQ_HIGHPRI_PENDING;
585 } 570 }
586 571
587 /* 572 /*
588 * Need to wake up a worker? Called from anything but currently 573 * Need to wake up a worker? Called from anything but currently
589 * running workers. 574 * running workers.
590 */ 575 */
591 static bool need_more_worker(struct global_cwq *gcwq) 576 static bool need_more_worker(struct global_cwq *gcwq)
592 { 577 {
593 return !list_empty(&gcwq->worklist) && __need_more_worker(gcwq); 578 return !list_empty(&gcwq->worklist) && __need_more_worker(gcwq);
594 } 579 }
595 580
596 /* Can I start working? Called from busy but !running workers. */ 581 /* Can I start working? Called from busy but !running workers. */
597 static bool may_start_working(struct global_cwq *gcwq) 582 static bool may_start_working(struct global_cwq *gcwq)
598 { 583 {
599 return gcwq->nr_idle; 584 return gcwq->nr_idle;
600 } 585 }
601 586
602 /* Do I need to keep working? Called from currently running workers. */ 587 /* Do I need to keep working? Called from currently running workers. */
603 static bool keep_working(struct global_cwq *gcwq) 588 static bool keep_working(struct global_cwq *gcwq)
604 { 589 {
605 atomic_t *nr_running = get_gcwq_nr_running(gcwq->cpu); 590 atomic_t *nr_running = get_gcwq_nr_running(gcwq->cpu);
606 591
607 return !list_empty(&gcwq->worklist) && atomic_read(nr_running) <= 1; 592 return !list_empty(&gcwq->worklist) &&
593 (atomic_read(nr_running) <= 1 ||
594 gcwq->flags & GCWQ_HIGHPRI_PENDING);
608 } 595 }
609 596
610 /* Do we need a new worker? Called from manager. */ 597 /* Do we need a new worker? Called from manager. */
611 static bool need_to_create_worker(struct global_cwq *gcwq) 598 static bool need_to_create_worker(struct global_cwq *gcwq)
612 { 599 {
613 return need_more_worker(gcwq) && !may_start_working(gcwq); 600 return need_more_worker(gcwq) && !may_start_working(gcwq);
614 } 601 }
615 602
616 /* Do I need to be the manager? */ 603 /* Do I need to be the manager? */
617 static bool need_to_manage_workers(struct global_cwq *gcwq) 604 static bool need_to_manage_workers(struct global_cwq *gcwq)
618 { 605 {
619 return need_to_create_worker(gcwq) || gcwq->flags & GCWQ_MANAGE_WORKERS; 606 return need_to_create_worker(gcwq) || gcwq->flags & GCWQ_MANAGE_WORKERS;
620 } 607 }
621 608
622 /* Do we have too many workers and should some go away? */ 609 /* Do we have too many workers and should some go away? */
623 static bool too_many_workers(struct global_cwq *gcwq) 610 static bool too_many_workers(struct global_cwq *gcwq)
624 { 611 {
625 bool managing = gcwq->flags & GCWQ_MANAGING_WORKERS; 612 bool managing = gcwq->flags & GCWQ_MANAGING_WORKERS;
626 int nr_idle = gcwq->nr_idle + managing; /* manager is considered idle */ 613 int nr_idle = gcwq->nr_idle + managing; /* manager is considered idle */
627 int nr_busy = gcwq->nr_workers - nr_idle; 614 int nr_busy = gcwq->nr_workers - nr_idle;
628 615
629 return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy; 616 return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
630 } 617 }
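As a worked example of the check above (using the MAX_IDLE_WORKERS_RATIO of 4 defined earlier): with 10 idle workers the gcwq counts as over-staffed as long as (10 - 2) * 4 = 32 >= nr_busy, i.e. until more than 32 workers are actually busy; with 2 or fewer idle workers it is never considered over-staffed, so the pool always keeps a small idle reserve.
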
631 618
632 /* 619 /*
633 * Wake up functions. 620 * Wake up functions.
634 */ 621 */
635 622
636 /* Return the first worker. Safe with preemption disabled */ 623 /* Return the first worker. Safe with preemption disabled */
637 static struct worker *first_worker(struct global_cwq *gcwq) 624 static struct worker *first_worker(struct global_cwq *gcwq)
638 { 625 {
639 if (unlikely(list_empty(&gcwq->idle_list))) 626 if (unlikely(list_empty(&gcwq->idle_list)))
640 return NULL; 627 return NULL;
641 628
642 return list_first_entry(&gcwq->idle_list, struct worker, entry); 629 return list_first_entry(&gcwq->idle_list, struct worker, entry);
643 } 630 }
644 631
645 /** 632 /**
646 * wake_up_worker - wake up an idle worker 633 * wake_up_worker - wake up an idle worker
647 * @gcwq: gcwq to wake worker for 634 * @gcwq: gcwq to wake worker for
648 * 635 *
649 * Wake up the first idle worker of @gcwq. 636 * Wake up the first idle worker of @gcwq.
650 * 637 *
651 * CONTEXT: 638 * CONTEXT:
652 * spin_lock_irq(gcwq->lock). 639 * spin_lock_irq(gcwq->lock).
653 */ 640 */
654 static void wake_up_worker(struct global_cwq *gcwq) 641 static void wake_up_worker(struct global_cwq *gcwq)
655 { 642 {
656 struct worker *worker = first_worker(gcwq); 643 struct worker *worker = first_worker(gcwq);
657 644
658 if (likely(worker)) 645 if (likely(worker))
659 wake_up_process(worker->task); 646 wake_up_process(worker->task);
660 } 647 }
661 648
662 /** 649 /**
663 * wq_worker_waking_up - a worker is waking up 650 * wq_worker_waking_up - a worker is waking up
664 * @task: task waking up 651 * @task: task waking up
665 * @cpu: CPU @task is waking up to 652 * @cpu: CPU @task is waking up to
666 * 653 *
667 * This function is called during try_to_wake_up() when a worker is 654 * This function is called during try_to_wake_up() when a worker is
668 * being awoken. 655 * being awoken.
669 * 656 *
670 * CONTEXT: 657 * CONTEXT:
671 * spin_lock_irq(rq->lock) 658 * spin_lock_irq(rq->lock)
672 */ 659 */
673 void wq_worker_waking_up(struct task_struct *task, unsigned int cpu) 660 void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
674 { 661 {
675 struct worker *worker = kthread_data(task); 662 struct worker *worker = kthread_data(task);
676 663
677 if (likely(!(worker->flags & WORKER_NOT_RUNNING))) 664 if (likely(!(worker->flags & WORKER_NOT_RUNNING)))
678 atomic_inc(get_gcwq_nr_running(cpu)); 665 atomic_inc(get_gcwq_nr_running(cpu));
679 } 666 }
680 667
681 /** 668 /**
682 * wq_worker_sleeping - a worker is going to sleep 669 * wq_worker_sleeping - a worker is going to sleep
683 * @task: task going to sleep 670 * @task: task going to sleep
684 * @cpu: CPU in question, must be the current CPU number 671 * @cpu: CPU in question, must be the current CPU number
685 * 672 *
686 * This function is called during schedule() when a busy worker is 673 * This function is called during schedule() when a busy worker is
687 * going to sleep. Worker on the same cpu can be woken up by 674 * going to sleep. Worker on the same cpu can be woken up by
688 * returning pointer to its task. 675 * returning pointer to its task.
689 * 676 *
690 * CONTEXT: 677 * CONTEXT:
691 * spin_lock_irq(rq->lock) 678 * spin_lock_irq(rq->lock)
692 * 679 *
693 * RETURNS: 680 * RETURNS:
694 * Worker task on @cpu to wake up, %NULL if none. 681 * Worker task on @cpu to wake up, %NULL if none.
695 */ 682 */
696 struct task_struct *wq_worker_sleeping(struct task_struct *task, 683 struct task_struct *wq_worker_sleeping(struct task_struct *task,
697 unsigned int cpu) 684 unsigned int cpu)
698 { 685 {
699 struct worker *worker = kthread_data(task), *to_wakeup = NULL; 686 struct worker *worker = kthread_data(task), *to_wakeup = NULL;
700 struct global_cwq *gcwq = get_gcwq(cpu); 687 struct global_cwq *gcwq = get_gcwq(cpu);
701 atomic_t *nr_running = get_gcwq_nr_running(cpu); 688 atomic_t *nr_running = get_gcwq_nr_running(cpu);
702 689
703 if (unlikely(worker->flags & WORKER_NOT_RUNNING)) 690 if (unlikely(worker->flags & WORKER_NOT_RUNNING))
704 return NULL; 691 return NULL;
705 692
706 /* this can only happen on the local cpu */ 693 /* this can only happen on the local cpu */
707 BUG_ON(cpu != raw_smp_processor_id()); 694 BUG_ON(cpu != raw_smp_processor_id());
708 695
709 /* 696 /*
710 * The counterpart of the following dec_and_test, implied mb, 697 * The counterpart of the following dec_and_test, implied mb,
711 * worklist not empty test sequence is in insert_work(). 698 * worklist not empty test sequence is in insert_work().
712 * Please read comment there. 699 * Please read comment there.
713 * 700 *
714 * NOT_RUNNING is clear. This means that trustee is not in 701 * NOT_RUNNING is clear. This means that trustee is not in
715 * charge and we're running on the local cpu w/ rq lock held 702 * charge and we're running on the local cpu w/ rq lock held
716 * and preemption disabled, which in turn means that no one else 703 * and preemption disabled, which in turn means that no one else
717 * could be manipulating idle_list, so dereferencing idle_list 704 * could be manipulating idle_list, so dereferencing idle_list
718 * without gcwq lock is safe. 705 * without gcwq lock is safe.
719 */ 706 */
720 if (atomic_dec_and_test(nr_running) && !list_empty(&gcwq->worklist)) 707 if (atomic_dec_and_test(nr_running) && !list_empty(&gcwq->worklist))
721 to_wakeup = first_worker(gcwq); 708 to_wakeup = first_worker(gcwq);
722 return to_wakeup ? to_wakeup->task : NULL; 709 return to_wakeup ? to_wakeup->task : NULL;
723 } 710 }
724 711
725 /** 712 /**
726 * worker_set_flags - set worker flags and adjust nr_running accordingly 713 * worker_set_flags - set worker flags and adjust nr_running accordingly
727 * @worker: self 714 * @worker: self
728 * @flags: flags to set 715 * @flags: flags to set
729 * @wakeup: wakeup an idle worker if necessary 716 * @wakeup: wakeup an idle worker if necessary
730 * 717 *
731 * Set @flags in @worker->flags and adjust nr_running accordingly. If 718 * Set @flags in @worker->flags and adjust nr_running accordingly. If
732 * nr_running becomes zero and @wakeup is %true, an idle worker is 719 * nr_running becomes zero and @wakeup is %true, an idle worker is
733 * woken up. 720 * woken up.
734 * 721 *
735 * CONTEXT: 722 * CONTEXT:
736 * spin_lock_irq(gcwq->lock) 723 * spin_lock_irq(gcwq->lock)
737 */ 724 */
738 static inline void worker_set_flags(struct worker *worker, unsigned int flags, 725 static inline void worker_set_flags(struct worker *worker, unsigned int flags,
739 bool wakeup) 726 bool wakeup)
740 { 727 {
741 struct global_cwq *gcwq = worker->gcwq; 728 struct global_cwq *gcwq = worker->gcwq;
742 729
743 WARN_ON_ONCE(worker->task != current); 730 WARN_ON_ONCE(worker->task != current);
744 731
745 /* 732 /*
746 * If transitioning into NOT_RUNNING, adjust nr_running and 733 * If transitioning into NOT_RUNNING, adjust nr_running and
747 * wake up an idle worker as necessary if requested by 734 * wake up an idle worker as necessary if requested by
748 * @wakeup. 735 * @wakeup.
749 */ 736 */
750 if ((flags & WORKER_NOT_RUNNING) && 737 if ((flags & WORKER_NOT_RUNNING) &&
751 !(worker->flags & WORKER_NOT_RUNNING)) { 738 !(worker->flags & WORKER_NOT_RUNNING)) {
752 atomic_t *nr_running = get_gcwq_nr_running(gcwq->cpu); 739 atomic_t *nr_running = get_gcwq_nr_running(gcwq->cpu);
753 740
754 if (wakeup) { 741 if (wakeup) {
755 if (atomic_dec_and_test(nr_running) && 742 if (atomic_dec_and_test(nr_running) &&
756 !list_empty(&gcwq->worklist)) 743 !list_empty(&gcwq->worklist))
757 wake_up_worker(gcwq); 744 wake_up_worker(gcwq);
758 } else 745 } else
759 atomic_dec(nr_running); 746 atomic_dec(nr_running);
760 } 747 }
761 748
762 worker->flags |= flags; 749 worker->flags |= flags;
763 } 750 }
764 751
765 /** 752 /**
766 * worker_clr_flags - clear worker flags and adjust nr_running accordingly 753 * worker_clr_flags - clear worker flags and adjust nr_running accordingly
767 * @worker: self 754 * @worker: self
768 * @flags: flags to clear 755 * @flags: flags to clear
769 * 756 *
770 * Clear @flags in @worker->flags and adjust nr_running accordingly. 757 * Clear @flags in @worker->flags and adjust nr_running accordingly.
771 * 758 *
772 * CONTEXT: 759 * CONTEXT:
773 * spin_lock_irq(gcwq->lock) 760 * spin_lock_irq(gcwq->lock)
774 */ 761 */
775 static inline void worker_clr_flags(struct worker *worker, unsigned int flags) 762 static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
776 { 763 {
777 struct global_cwq *gcwq = worker->gcwq; 764 struct global_cwq *gcwq = worker->gcwq;
778 unsigned int oflags = worker->flags; 765 unsigned int oflags = worker->flags;
779 766
780 WARN_ON_ONCE(worker->task != current); 767 WARN_ON_ONCE(worker->task != current);
781 768
782 worker->flags &= ~flags; 769 worker->flags &= ~flags;
783 770
784 /* if transitioning out of NOT_RUNNING, increment nr_running */ 771 /* if transitioning out of NOT_RUNNING, increment nr_running */
785 if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING)) 772 if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
786 if (!(worker->flags & WORKER_NOT_RUNNING)) 773 if (!(worker->flags & WORKER_NOT_RUNNING))
787 atomic_inc(get_gcwq_nr_running(gcwq->cpu)); 774 atomic_inc(get_gcwq_nr_running(gcwq->cpu));
788 } 775 }
789 776
790 /** 777 /**
791 * busy_worker_head - return the busy hash head for a work 778 * busy_worker_head - return the busy hash head for a work
792 * @gcwq: gcwq of interest 779 * @gcwq: gcwq of interest
793 * @work: work to be hashed 780 * @work: work to be hashed
794 * 781 *
795 * Return hash head of @gcwq for @work. 782 * Return hash head of @gcwq for @work.
796 * 783 *
797 * CONTEXT: 784 * CONTEXT:
798 * spin_lock_irq(gcwq->lock). 785 * spin_lock_irq(gcwq->lock).
799 * 786 *
800 * RETURNS: 787 * RETURNS:
801 * Pointer to the hash head. 788 * Pointer to the hash head.
802 */ 789 */
803 static struct hlist_head *busy_worker_head(struct global_cwq *gcwq, 790 static struct hlist_head *busy_worker_head(struct global_cwq *gcwq,
804 struct work_struct *work) 791 struct work_struct *work)
805 { 792 {
806 const int base_shift = ilog2(sizeof(struct work_struct)); 793 const int base_shift = ilog2(sizeof(struct work_struct));
807 unsigned long v = (unsigned long)work; 794 unsigned long v = (unsigned long)work;
808 795
809 /* simple shift and fold hash, do we need something better? */ 796 /* simple shift and fold hash, do we need something better? */
810 v >>= base_shift; 797 v >>= base_shift;
811 v += v >> BUSY_WORKER_HASH_ORDER; 798 v += v >> BUSY_WORKER_HASH_ORDER;
812 v &= BUSY_WORKER_HASH_MASK; 799 v &= BUSY_WORKER_HASH_MASK;
813 800
814 return &gcwq->busy_hash[v]; 801 return &gcwq->busy_hash[v];
815 } 802 }
816 803
817 /** 804 /**
818 * __find_worker_executing_work - find worker which is executing a work 805 * __find_worker_executing_work - find worker which is executing a work
819 * @gcwq: gcwq of interest 806 * @gcwq: gcwq of interest
820 * @bwh: hash head as returned by busy_worker_head() 807 * @bwh: hash head as returned by busy_worker_head()
821 * @work: work to find worker for 808 * @work: work to find worker for
822 * 809 *
823 * Find a worker which is executing @work on @gcwq. @bwh should be 810 * Find a worker which is executing @work on @gcwq. @bwh should be
824 * the hash head obtained by calling busy_worker_head() with the same 811 * the hash head obtained by calling busy_worker_head() with the same
825 * work. 812 * work.
826 * 813 *
827 * CONTEXT: 814 * CONTEXT:
828 * spin_lock_irq(gcwq->lock). 815 * spin_lock_irq(gcwq->lock).
829 * 816 *
830 * RETURNS: 817 * RETURNS:
831 * Pointer to worker which is executing @work if found, NULL 818 * Pointer to worker which is executing @work if found, NULL
832 * otherwise. 819 * otherwise.
833 */ 820 */
834 static struct worker *__find_worker_executing_work(struct global_cwq *gcwq, 821 static struct worker *__find_worker_executing_work(struct global_cwq *gcwq,
835 struct hlist_head *bwh, 822 struct hlist_head *bwh,
836 struct work_struct *work) 823 struct work_struct *work)
837 { 824 {
838 struct worker *worker; 825 struct worker *worker;
839 struct hlist_node *tmp; 826 struct hlist_node *tmp;
840 827
841 hlist_for_each_entry(worker, tmp, bwh, hentry) 828 hlist_for_each_entry(worker, tmp, bwh, hentry)
842 if (worker->current_work == work) 829 if (worker->current_work == work)
843 return worker; 830 return worker;
844 return NULL; 831 return NULL;
845 } 832 }
846 833
847 /** 834 /**
848 * find_worker_executing_work - find worker which is executing a work 835 * find_worker_executing_work - find worker which is executing a work
849 * @gcwq: gcwq of interest 836 * @gcwq: gcwq of interest
850 * @work: work to find worker for 837 * @work: work to find worker for
851 * 838 *
852 * Find a worker which is executing @work on @gcwq. This function is 839 * Find a worker which is executing @work on @gcwq. This function is
853 * identical to __find_worker_executing_work() except that this 840 * identical to __find_worker_executing_work() except that this
854 * function calculates @bwh itself. 841 * function calculates @bwh itself.
855 * 842 *
856 * CONTEXT: 843 * CONTEXT:
857 * spin_lock_irq(gcwq->lock). 844 * spin_lock_irq(gcwq->lock).
858 * 845 *
859 * RETURNS: 846 * RETURNS:
860 * Pointer to worker which is executing @work if found, NULL 847 * Pointer to worker which is executing @work if found, NULL
861 * otherwise. 848 * otherwise.
862 */ 849 */
863 static struct worker *find_worker_executing_work(struct global_cwq *gcwq, 850 static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
864 struct work_struct *work) 851 struct work_struct *work)
865 { 852 {
866 return __find_worker_executing_work(gcwq, busy_worker_head(gcwq, work), 853 return __find_worker_executing_work(gcwq, busy_worker_head(gcwq, work),
867 work); 854 work);
868 } 855 }
869 856
870 /** 857 /**
871 * gcwq_determine_ins_pos - find insertion position 858 * gcwq_determine_ins_pos - find insertion position
872 * @gcwq: gcwq of interest 859 * @gcwq: gcwq of interest
873 * @cwq: cwq a work is being queued for 860 * @cwq: cwq a work is being queued for
874 * 861 *
875 * A work for @cwq is about to be queued on @gcwq, determine insertion 862 * A work for @cwq is about to be queued on @gcwq, determine insertion
876 * position for the work. If @cwq is for HIGHPRI wq, the work is 863 * position for the work. If @cwq is for HIGHPRI wq, the work is
877 * queued at the head of the queue but in FIFO order with respect to 864 * queued at the head of the queue but in FIFO order with respect to
878 * other HIGHPRI works; otherwise, at the end of the queue. This 865 * other HIGHPRI works; otherwise, at the end of the queue. This
879 * function also sets GCWQ_HIGHPRI_PENDING flag to hint @gcwq that 866 * function also sets GCWQ_HIGHPRI_PENDING flag to hint @gcwq that
880 * there are HIGHPRI works pending. 867 * there are HIGHPRI works pending.
881 * 868 *
882 * CONTEXT: 869 * CONTEXT:
883 * spin_lock_irq(gcwq->lock). 870 * spin_lock_irq(gcwq->lock).
884 * 871 *
885 * RETURNS: 872 * RETURNS:
886 * Pointer to insertion position. 873 * Pointer to insertion position.
887 */ 874 */
888 static inline struct list_head *gcwq_determine_ins_pos(struct global_cwq *gcwq, 875 static inline struct list_head *gcwq_determine_ins_pos(struct global_cwq *gcwq,
889 struct cpu_workqueue_struct *cwq) 876 struct cpu_workqueue_struct *cwq)
890 { 877 {
891 struct work_struct *twork; 878 struct work_struct *twork;
892 879
893 if (likely(!(cwq->wq->flags & WQ_HIGHPRI))) 880 if (likely(!(cwq->wq->flags & WQ_HIGHPRI)))
894 return &gcwq->worklist; 881 return &gcwq->worklist;
895 882
896 list_for_each_entry(twork, &gcwq->worklist, entry) { 883 list_for_each_entry(twork, &gcwq->worklist, entry) {
897 struct cpu_workqueue_struct *tcwq = get_work_cwq(twork); 884 struct cpu_workqueue_struct *tcwq = get_work_cwq(twork);
898 885
899 if (!(tcwq->wq->flags & WQ_HIGHPRI)) 886 if (!(tcwq->wq->flags & WQ_HIGHPRI))
900 break; 887 break;
901 } 888 }
902 889
903 gcwq->flags |= GCWQ_HIGHPRI_PENDING; 890 gcwq->flags |= GCWQ_HIGHPRI_PENDING;
904 return &twork->entry; 891 return &twork->entry;
905 } 892 }
906 893
907 /** 894 /**
908 * insert_work - insert a work into gcwq 895 * insert_work - insert a work into gcwq
909 * @cwq: cwq @work belongs to 896 * @cwq: cwq @work belongs to
910 * @work: work to insert 897 * @work: work to insert
911 * @head: insertion point 898 * @head: insertion point
912 * @extra_flags: extra WORK_STRUCT_* flags to set 899 * @extra_flags: extra WORK_STRUCT_* flags to set
913 * 900 *
914 * Insert @work which belongs to @cwq into @gcwq after @head. 901 * Insert @work which belongs to @cwq into @gcwq after @head.
915 * @extra_flags is or'd to work_struct flags. 902 * @extra_flags is or'd to work_struct flags.
916 * 903 *
917 * CONTEXT: 904 * CONTEXT:
918 * spin_lock_irq(gcwq->lock). 905 * spin_lock_irq(gcwq->lock).
919 */ 906 */
920 static void insert_work(struct cpu_workqueue_struct *cwq, 907 static void insert_work(struct cpu_workqueue_struct *cwq,
921 struct work_struct *work, struct list_head *head, 908 struct work_struct *work, struct list_head *head,
922 unsigned int extra_flags) 909 unsigned int extra_flags)
923 { 910 {
924 struct global_cwq *gcwq = cwq->gcwq; 911 struct global_cwq *gcwq = cwq->gcwq;
925 912
926 /* we own @work, set data and link */ 913 /* we own @work, set data and link */
927 set_work_cwq(work, cwq, extra_flags); 914 set_work_cwq(work, cwq, extra_flags);
928 915
929 /* 916 /*
930 * Ensure that we get the right work->data if we see the 917 * Ensure that we get the right work->data if we see the
931 * result of list_add() below, see try_to_grab_pending(). 918 * result of list_add() below, see try_to_grab_pending().
932 */ 919 */
933 smp_wmb(); 920 smp_wmb();
934 921
935 list_add_tail(&work->entry, head); 922 list_add_tail(&work->entry, head);
936 923
937 /* 924 /*
938 * Ensure either worker_sched_deactivated() sees the above 925 * Ensure either worker_sched_deactivated() sees the above
939 * list_add_tail() or we see zero nr_running to avoid workers 926 * list_add_tail() or we see zero nr_running to avoid workers
940 * lying around lazily while there are works to be processed. 927 * lying around lazily while there are works to be processed.
941 */ 928 */
942 smp_mb(); 929 smp_mb();
943 930
944 if (__need_more_worker(gcwq)) 931 if (__need_more_worker(gcwq))
945 wake_up_worker(gcwq); 932 wake_up_worker(gcwq);
946 } 933 }
947 934
948 static void __queue_work(unsigned int cpu, struct workqueue_struct *wq, 935 static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
949 struct work_struct *work) 936 struct work_struct *work)
950 { 937 {
951 struct global_cwq *gcwq; 938 struct global_cwq *gcwq;
952 struct cpu_workqueue_struct *cwq; 939 struct cpu_workqueue_struct *cwq;
953 struct list_head *worklist; 940 struct list_head *worklist;
954 unsigned int work_flags; 941 unsigned int work_flags;
955 unsigned long flags; 942 unsigned long flags;
956 943
957 debug_work_activate(work); 944 debug_work_activate(work);
958 945
959 if (WARN_ON_ONCE(wq->flags & WQ_DYING)) 946 if (WARN_ON_ONCE(wq->flags & WQ_DYING))
960 return; 947 return;
961 948
962 /* determine gcwq to use */ 949 /* determine gcwq to use */
963 if (!(wq->flags & WQ_UNBOUND)) { 950 if (!(wq->flags & WQ_UNBOUND)) {
964 struct global_cwq *last_gcwq; 951 struct global_cwq *last_gcwq;
965 952
966 if (unlikely(cpu == WORK_CPU_UNBOUND)) 953 if (unlikely(cpu == WORK_CPU_UNBOUND))
967 cpu = raw_smp_processor_id(); 954 cpu = raw_smp_processor_id();
968 955
969 /* 956 /*
970 * It's multi cpu. If @wq is non-reentrant and @work 957 * It's multi cpu. If @wq is non-reentrant and @work
971 * was previously on a different cpu, it might still 958 * was previously on a different cpu, it might still
972 * be running there, in which case the work needs to 959 * be running there, in which case the work needs to
973 * be queued on that cpu to guarantee non-reentrance. 960 * be queued on that cpu to guarantee non-reentrance.
974 */ 961 */
975 gcwq = get_gcwq(cpu); 962 gcwq = get_gcwq(cpu);
976 if (wq->flags & WQ_NON_REENTRANT && 963 if (wq->flags & WQ_NON_REENTRANT &&
977 (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) { 964 (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {
978 struct worker *worker; 965 struct worker *worker;
979 966
980 spin_lock_irqsave(&last_gcwq->lock, flags); 967 spin_lock_irqsave(&last_gcwq->lock, flags);
981 968
982 worker = find_worker_executing_work(last_gcwq, work); 969 worker = find_worker_executing_work(last_gcwq, work);
983 970
984 if (worker && worker->current_cwq->wq == wq) 971 if (worker && worker->current_cwq->wq == wq)
985 gcwq = last_gcwq; 972 gcwq = last_gcwq;
986 else { 973 else {
987 /* meh... not running there, queue here */ 974 /* meh... not running there, queue here */
988 spin_unlock_irqrestore(&last_gcwq->lock, flags); 975 spin_unlock_irqrestore(&last_gcwq->lock, flags);
989 spin_lock_irqsave(&gcwq->lock, flags); 976 spin_lock_irqsave(&gcwq->lock, flags);
990 } 977 }
991 } else 978 } else
992 spin_lock_irqsave(&gcwq->lock, flags); 979 spin_lock_irqsave(&gcwq->lock, flags);
993 } else { 980 } else {
994 gcwq = get_gcwq(WORK_CPU_UNBOUND); 981 gcwq = get_gcwq(WORK_CPU_UNBOUND);
995 spin_lock_irqsave(&gcwq->lock, flags); 982 spin_lock_irqsave(&gcwq->lock, flags);
996 } 983 }
997 984
998 /* gcwq determined, get cwq and queue */ 985 /* gcwq determined, get cwq and queue */
999 cwq = get_cwq(gcwq->cpu, wq); 986 cwq = get_cwq(gcwq->cpu, wq);
987 trace_workqueue_queue_work(cpu, cwq, work);
1000 988
1001 BUG_ON(!list_empty(&work->entry)); 989 BUG_ON(!list_empty(&work->entry));
1002 990
1003 cwq->nr_in_flight[cwq->work_color]++; 991 cwq->nr_in_flight[cwq->work_color]++;
1004 work_flags = work_color_to_flags(cwq->work_color); 992 work_flags = work_color_to_flags(cwq->work_color);
1005 993
1006 if (likely(cwq->nr_active < cwq->max_active)) { 994 if (likely(cwq->nr_active < cwq->max_active)) {
995 trace_workqueue_activate_work(work);
1007 cwq->nr_active++; 996 cwq->nr_active++;
1008 worklist = gcwq_determine_ins_pos(gcwq, cwq); 997 worklist = gcwq_determine_ins_pos(gcwq, cwq);
1009 } else { 998 } else {
1010 work_flags |= WORK_STRUCT_DELAYED; 999 work_flags |= WORK_STRUCT_DELAYED;
1011 worklist = &cwq->delayed_works; 1000 worklist = &cwq->delayed_works;
1012 } 1001 }
1013 1002
1014 insert_work(cwq, work, worklist, work_flags); 1003 insert_work(cwq, work, worklist, work_flags);
1015 1004
1016 spin_unlock_irqrestore(&gcwq->lock, flags); 1005 spin_unlock_irqrestore(&gcwq->lock, flags);
1017 } 1006 }
1018 1007
1019 /** 1008 /**
1020 * queue_work - queue work on a workqueue 1009 * queue_work - queue work on a workqueue
1021 * @wq: workqueue to use 1010 * @wq: workqueue to use
1022 * @work: work to queue 1011 * @work: work to queue
1023 * 1012 *
1024 * Returns 0 if @work was already on a queue, non-zero otherwise. 1013 * Returns 0 if @work was already on a queue, non-zero otherwise.
1025 * 1014 *
1026 * We queue the work to the CPU on which it was submitted, but if the CPU dies 1015 * We queue the work to the CPU on which it was submitted, but if the CPU dies
1027 * it can be processed by another CPU. 1016 * it can be processed by another CPU.
1028 */ 1017 */
1029 int queue_work(struct workqueue_struct *wq, struct work_struct *work) 1018 int queue_work(struct workqueue_struct *wq, struct work_struct *work)
1030 { 1019 {
1031 int ret; 1020 int ret;
1032 1021
1033 ret = queue_work_on(get_cpu(), wq, work); 1022 ret = queue_work_on(get_cpu(), wq, work);
1034 put_cpu(); 1023 put_cpu();
1035 1024
1036 return ret; 1025 return ret;
1037 } 1026 }
1038 EXPORT_SYMBOL_GPL(queue_work); 1027 EXPORT_SYMBOL_GPL(queue_work);
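
Usage sketch (illustrative only, not part of this diff): a caller typically embeds a work_struct in its own structure, initializes it once with INIT_WORK(), and submits it with queue_work(). The my_dev/my_irq_work_fn/my_dev_init names and the specific workqueue flags below are assumptions for the example, not taken from this commit.

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/errno.h>
#include <linux/printk.h>

/* illustrative sketch -- assumed driver-private structure */
struct my_dev {
        struct workqueue_struct *wq;
        struct work_struct irq_work;
};

static void my_irq_work_fn(struct work_struct *work)
{
        struct my_dev *dev = container_of(work, struct my_dev, irq_work);

        /* runs later in process context on a worker; may sleep */
        pr_debug("my_dev %p: doing deferred work\n", dev);
}

static int my_dev_init(struct my_dev *dev)
{
        /* WQ_MEM_RECLAIM (added in this series) guarantees a rescuer */
        dev->wq = alloc_workqueue("my_dev", WQ_MEM_RECLAIM, 0);
        if (!dev->wq)
                return -ENOMEM;
        INIT_WORK(&dev->irq_work, my_irq_work_fn);
        return 0;
}

static void my_dev_kick(struct my_dev *dev)
{
        /* queue_work() returns 0 if the work was already pending */
        if (!queue_work(dev->wq, &dev->irq_work))
                pr_debug("my_dev: work already pending\n");
}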
1039 1028
1040 /** 1029 /**
1041 * queue_work_on - queue work on specific cpu 1030 * queue_work_on - queue work on specific cpu
1042 * @cpu: CPU number to execute work on 1031 * @cpu: CPU number to execute work on
1043 * @wq: workqueue to use 1032 * @wq: workqueue to use
1044 * @work: work to queue 1033 * @work: work to queue
1045 * 1034 *
1046 * Returns 0 if @work was already on a queue, non-zero otherwise. 1035 * Returns 0 if @work was already on a queue, non-zero otherwise.
1047 * 1036 *
1048 * We queue the work to a specific CPU, the caller must ensure it 1037 * We queue the work to a specific CPU, the caller must ensure it
1049 * can't go away. 1038 * can't go away.
1050 */ 1039 */
1051 int 1040 int
1052 queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work) 1041 queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
1053 { 1042 {
1054 int ret = 0; 1043 int ret = 0;
1055 1044
1056 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { 1045 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
1057 __queue_work(cpu, wq, work); 1046 __queue_work(cpu, wq, work);
1058 ret = 1; 1047 ret = 1;
1059 } 1048 }
1060 return ret; 1049 return ret;
1061 } 1050 }
1062 EXPORT_SYMBOL_GPL(queue_work_on); 1051 EXPORT_SYMBOL_GPL(queue_work_on);
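
A brief sketch of how queue_work_on() differs from queue_work(): the caller picks the CPU and must keep it from going away, for example by holding the CPU hotplug read lock. The my_dev fields come from the previous sketch; target_cpu and my_dev_kick_on are likewise assumed names.

#include <linux/cpu.h>

/* illustrative only: pin the work to a caller-chosen CPU */
static void my_dev_kick_on(struct my_dev *dev, int target_cpu)
{
        /* hold the hotplug read lock so target_cpu can't vanish under us */
        get_online_cpus();
        if (cpu_online(target_cpu))
                queue_work_on(target_cpu, dev->wq, &dev->irq_work);
        put_online_cpus();
}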
1063 1052
1064 static void delayed_work_timer_fn(unsigned long __data) 1053 static void delayed_work_timer_fn(unsigned long __data)
1065 { 1054 {
1066 struct delayed_work *dwork = (struct delayed_work *)__data; 1055 struct delayed_work *dwork = (struct delayed_work *)__data;
1067 struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work); 1056 struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
1068 1057
1069 __queue_work(smp_processor_id(), cwq->wq, &dwork->work); 1058 __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
1070 } 1059 }
1071 1060
1072 /** 1061 /**
1073 * queue_delayed_work - queue work on a workqueue after delay 1062 * queue_delayed_work - queue work on a workqueue after delay
1074 * @wq: workqueue to use 1063 * @wq: workqueue to use
1075 * @dwork: delayable work to queue 1064 * @dwork: delayable work to queue
1076 * @delay: number of jiffies to wait before queueing 1065 * @delay: number of jiffies to wait before queueing
1077 * 1066 *
1078 * Returns 0 if @work was already on a queue, non-zero otherwise. 1067 * Returns 0 if @work was already on a queue, non-zero otherwise.
1079 */ 1068 */
1080 int queue_delayed_work(struct workqueue_struct *wq, 1069 int queue_delayed_work(struct workqueue_struct *wq,
1081 struct delayed_work *dwork, unsigned long delay) 1070 struct delayed_work *dwork, unsigned long delay)
1082 { 1071 {
1083 if (delay == 0) 1072 if (delay == 0)
1084 return queue_work(wq, &dwork->work); 1073 return queue_work(wq, &dwork->work);
1085 1074
1086 return queue_delayed_work_on(-1, wq, dwork, delay); 1075 return queue_delayed_work_on(-1, wq, dwork, delay);
1087 } 1076 }
1088 EXPORT_SYMBOL_GPL(queue_delayed_work); 1077 EXPORT_SYMBOL_GPL(queue_delayed_work);
1089 1078
1090 /** 1079 /**
1091 * queue_delayed_work_on - queue work on specific CPU after delay 1080 * queue_delayed_work_on - queue work on specific CPU after delay
1092 * @cpu: CPU number to execute work on 1081 * @cpu: CPU number to execute work on
1093 * @wq: workqueue to use 1082 * @wq: workqueue to use
1094 * @dwork: work to queue 1083 * @dwork: work to queue
1095 * @delay: number of jiffies to wait before queueing 1084 * @delay: number of jiffies to wait before queueing
1096 * 1085 *
1097 * Returns 0 if @work was already on a queue, non-zero otherwise. 1086 * Returns 0 if @work was already on a queue, non-zero otherwise.
1098 */ 1087 */
1099 int queue_delayed_work_on(int cpu, struct workqueue_struct *wq, 1088 int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
1100 struct delayed_work *dwork, unsigned long delay) 1089 struct delayed_work *dwork, unsigned long delay)
1101 { 1090 {
1102 int ret = 0; 1091 int ret = 0;
1103 struct timer_list *timer = &dwork->timer; 1092 struct timer_list *timer = &dwork->timer;
1104 struct work_struct *work = &dwork->work; 1093 struct work_struct *work = &dwork->work;
1105 1094
1106 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) { 1095 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
1107 unsigned int lcpu; 1096 unsigned int lcpu;
1108 1097
1109 BUG_ON(timer_pending(timer)); 1098 BUG_ON(timer_pending(timer));
1110 BUG_ON(!list_empty(&work->entry)); 1099 BUG_ON(!list_empty(&work->entry));
1111 1100
1112 timer_stats_timer_set_start_info(&dwork->timer); 1101 timer_stats_timer_set_start_info(&dwork->timer);
1113 1102
1114 /* 1103 /*
1115 * This stores cwq for the moment, for the timer_fn. 1104 * This stores cwq for the moment, for the timer_fn.
1116 * Note that the work's gcwq is preserved to allow 1105 * Note that the work's gcwq is preserved to allow
1117 * reentrance detection for delayed works. 1106 * reentrance detection for delayed works.
1118 */ 1107 */
1119 if (!(wq->flags & WQ_UNBOUND)) { 1108 if (!(wq->flags & WQ_UNBOUND)) {
1120 struct global_cwq *gcwq = get_work_gcwq(work); 1109 struct global_cwq *gcwq = get_work_gcwq(work);
1121 1110
1122 if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND) 1111 if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
1123 lcpu = gcwq->cpu; 1112 lcpu = gcwq->cpu;
1124 else 1113 else
1125 lcpu = raw_smp_processor_id(); 1114 lcpu = raw_smp_processor_id();
1126 } else 1115 } else
1127 lcpu = WORK_CPU_UNBOUND; 1116 lcpu = WORK_CPU_UNBOUND;
1128 1117
1129 set_work_cwq(work, get_cwq(lcpu, wq), 0); 1118 set_work_cwq(work, get_cwq(lcpu, wq), 0);
1130 1119
1131 timer->expires = jiffies + delay; 1120 timer->expires = jiffies + delay;
1132 timer->data = (unsigned long)dwork; 1121 timer->data = (unsigned long)dwork;
1133 timer->function = delayed_work_timer_fn; 1122 timer->function = delayed_work_timer_fn;
1134 1123
1135 if (unlikely(cpu >= 0)) 1124 if (unlikely(cpu >= 0))
1136 add_timer_on(timer, cpu); 1125 add_timer_on(timer, cpu);
1137 else 1126 else
1138 add_timer(timer); 1127 add_timer(timer);
1139 ret = 1; 1128 ret = 1;
1140 } 1129 }
1141 return ret; 1130 return ret;
1142 } 1131 }
1143 EXPORT_SYMBOL_GPL(queue_delayed_work_on); 1132 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
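
For delayed work the pattern is the same except the item is a struct delayed_work and the delay is given in jiffies. A sketch with assumed names (my_wq is taken to have been created earlier with alloc_workqueue(); my_poll_fn has the usual work handler signature):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static struct workqueue_struct *my_wq;  /* assumed: created elsewhere */
static struct delayed_work poll_work;

static void my_poll_fn(struct work_struct *work)
{
        /* do the periodic work, then re-arm roughly 100ms out */
        queue_delayed_work(my_wq, &poll_work, msecs_to_jiffies(100));
}

static void my_poll_start(void)
{
        INIT_DELAYED_WORK(&poll_work, my_poll_fn);

        /* a delay of 0 queues immediately via queue_work(), see above */
        queue_delayed_work(my_wq, &poll_work, msecs_to_jiffies(100));
}

static void my_poll_start_on(int target_cpu)
{
        /* alternative: caller must guarantee target_cpu can't go away */
        INIT_DELAYED_WORK(&poll_work, my_poll_fn);
        queue_delayed_work_on(target_cpu, my_wq, &poll_work,
                              msecs_to_jiffies(100));
}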
1144 1133
1145 /** 1134 /**
1146 * worker_enter_idle - enter idle state 1135 * worker_enter_idle - enter idle state
1147 * @worker: worker which is entering idle state 1136 * @worker: worker which is entering idle state
1148 * 1137 *
1149 * @worker is entering idle state. Update stats and idle timer if 1138 * @worker is entering idle state. Update stats and idle timer if
1150 * necessary. 1139 * necessary.
1151 * 1140 *
1152 * LOCKING: 1141 * LOCKING:
1153 * spin_lock_irq(gcwq->lock). 1142 * spin_lock_irq(gcwq->lock).
1154 */ 1143 */
1155 static void worker_enter_idle(struct worker *worker) 1144 static void worker_enter_idle(struct worker *worker)
1156 { 1145 {
1157 struct global_cwq *gcwq = worker->gcwq; 1146 struct global_cwq *gcwq = worker->gcwq;
1158 1147
1159 BUG_ON(worker->flags & WORKER_IDLE); 1148 BUG_ON(worker->flags & WORKER_IDLE);
1160 BUG_ON(!list_empty(&worker->entry) && 1149 BUG_ON(!list_empty(&worker->entry) &&
1161 (worker->hentry.next || worker->hentry.pprev)); 1150 (worker->hentry.next || worker->hentry.pprev));
1162 1151
1163 /* can't use worker_set_flags(), also called from start_worker() */ 1152 /* can't use worker_set_flags(), also called from start_worker() */
1164 worker->flags |= WORKER_IDLE; 1153 worker->flags |= WORKER_IDLE;
1165 gcwq->nr_idle++; 1154 gcwq->nr_idle++;
1166 worker->last_active = jiffies; 1155 worker->last_active = jiffies;
1167 1156
1168 /* idle_list is LIFO */ 1157 /* idle_list is LIFO */
1169 list_add(&worker->entry, &gcwq->idle_list); 1158 list_add(&worker->entry, &gcwq->idle_list);
1170 1159
1171 if (likely(!(worker->flags & WORKER_ROGUE))) { 1160 if (likely(!(worker->flags & WORKER_ROGUE))) {
1172 if (too_many_workers(gcwq) && !timer_pending(&gcwq->idle_timer)) 1161 if (too_many_workers(gcwq) && !timer_pending(&gcwq->idle_timer))
1173 mod_timer(&gcwq->idle_timer, 1162 mod_timer(&gcwq->idle_timer,
1174 jiffies + IDLE_WORKER_TIMEOUT); 1163 jiffies + IDLE_WORKER_TIMEOUT);
1175 } else 1164 } else
1176 wake_up_all(&gcwq->trustee_wait); 1165 wake_up_all(&gcwq->trustee_wait);
1177 1166
1178 /* sanity check nr_running */ 1167 /* sanity check nr_running */
1179 WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle && 1168 WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle &&
1180 atomic_read(get_gcwq_nr_running(gcwq->cpu))); 1169 atomic_read(get_gcwq_nr_running(gcwq->cpu)));
1181 } 1170 }
1182 1171
1183 /** 1172 /**
1184 * worker_leave_idle - leave idle state 1173 * worker_leave_idle - leave idle state
1185 * @worker: worker which is leaving idle state 1174 * @worker: worker which is leaving idle state
1186 * 1175 *
1187 * @worker is leaving idle state. Update stats. 1176 * @worker is leaving idle state. Update stats.
1188 * 1177 *
1189 * LOCKING: 1178 * LOCKING:
1190 * spin_lock_irq(gcwq->lock). 1179 * spin_lock_irq(gcwq->lock).
1191 */ 1180 */
1192 static void worker_leave_idle(struct worker *worker) 1181 static void worker_leave_idle(struct worker *worker)
1193 { 1182 {
1194 struct global_cwq *gcwq = worker->gcwq; 1183 struct global_cwq *gcwq = worker->gcwq;
1195 1184
1196 BUG_ON(!(worker->flags & WORKER_IDLE)); 1185 BUG_ON(!(worker->flags & WORKER_IDLE));
1197 worker_clr_flags(worker, WORKER_IDLE); 1186 worker_clr_flags(worker, WORKER_IDLE);
1198 gcwq->nr_idle--; 1187 gcwq->nr_idle--;
1199 list_del_init(&worker->entry); 1188 list_del_init(&worker->entry);
1200 } 1189 }
1201 1190
1202 /** 1191 /**
1203 * worker_maybe_bind_and_lock - bind worker to its cpu if possible and lock gcwq 1192 * worker_maybe_bind_and_lock - bind worker to its cpu if possible and lock gcwq
1204 * @worker: self 1193 * @worker: self
1205 * 1194 *
1206 * Works which are scheduled while the cpu is online must at least be 1195 * Works which are scheduled while the cpu is online must at least be
1207 * scheduled to a worker which is bound to the cpu so that if they are 1196 * scheduled to a worker which is bound to the cpu so that if they are
1208 * flushed from cpu callbacks while cpu is going down, they are 1197 * flushed from cpu callbacks while cpu is going down, they are
1209 * guaranteed to execute on the cpu. 1198 * guaranteed to execute on the cpu.
1210 * 1199 *
1211 * This function is to be used by rogue workers and rescuers to bind 1200 * This function is to be used by rogue workers and rescuers to bind
1212 * themselves to the target cpu and may race with cpu going down or 1201 * themselves to the target cpu and may race with cpu going down or
1213 * coming online. kthread_bind() can't be used because it may put the 1202 * coming online. kthread_bind() can't be used because it may put the
1214 * worker to an already dead cpu and set_cpus_allowed_ptr() can't be used 1203 * worker to an already dead cpu and set_cpus_allowed_ptr() can't be used
1215 * verbatim as it's best effort and blocking and gcwq may be 1204 * verbatim as it's best effort and blocking and gcwq may be
1216 * [dis]associated in the meantime. 1205 * [dis]associated in the meantime.
1217 * 1206 *
1218 * This function tries set_cpus_allowed() and locks gcwq and verifies 1207 * This function tries set_cpus_allowed() and locks gcwq and verifies
1219 * the binding against GCWQ_DISASSOCIATED which is set during 1208 * the binding against GCWQ_DISASSOCIATED which is set during
1220 * CPU_DYING and cleared during CPU_ONLINE, so if the worker enters 1209 * CPU_DYING and cleared during CPU_ONLINE, so if the worker enters
1221 * idle state or fetches works without dropping lock, it can guarantee 1210 * idle state or fetches works without dropping lock, it can guarantee
1222 * the scheduling requirement described in the first paragraph. 1211 * the scheduling requirement described in the first paragraph.
1223 * 1212 *
1224 * CONTEXT: 1213 * CONTEXT:
1225 * Might sleep. Called without any lock but returns with gcwq->lock 1214 * Might sleep. Called without any lock but returns with gcwq->lock
1226 * held. 1215 * held.
1227 * 1216 *
1228 * RETURNS: 1217 * RETURNS:
1229 * %true if the associated gcwq is online (@worker is successfully 1218 * %true if the associated gcwq is online (@worker is successfully
1230 * bound), %false if offline. 1219 * bound), %false if offline.
1231 */ 1220 */
1232 static bool worker_maybe_bind_and_lock(struct worker *worker) 1221 static bool worker_maybe_bind_and_lock(struct worker *worker)
1233 __acquires(&gcwq->lock) 1222 __acquires(&gcwq->lock)
1234 { 1223 {
1235 struct global_cwq *gcwq = worker->gcwq; 1224 struct global_cwq *gcwq = worker->gcwq;
1236 struct task_struct *task = worker->task; 1225 struct task_struct *task = worker->task;
1237 1226
1238 while (true) { 1227 while (true) {
1239 /* 1228 /*
1240 * The following call may fail, succeed or succeed 1229 * The following call may fail, succeed or succeed
1241 * without actually migrating the task to the cpu if 1230 * without actually migrating the task to the cpu if
1242 * it races with cpu hotunplug operation. Verify 1231 * it races with cpu hotunplug operation. Verify
1243 * against GCWQ_DISASSOCIATED. 1232 * against GCWQ_DISASSOCIATED.
1244 */ 1233 */
1245 if (!(gcwq->flags & GCWQ_DISASSOCIATED)) 1234 if (!(gcwq->flags & GCWQ_DISASSOCIATED))
1246 set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu)); 1235 set_cpus_allowed_ptr(task, get_cpu_mask(gcwq->cpu));
1247 1236
1248 spin_lock_irq(&gcwq->lock); 1237 spin_lock_irq(&gcwq->lock);
1249 if (gcwq->flags & GCWQ_DISASSOCIATED) 1238 if (gcwq->flags & GCWQ_DISASSOCIATED)
1250 return false; 1239 return false;
1251 if (task_cpu(task) == gcwq->cpu && 1240 if (task_cpu(task) == gcwq->cpu &&
1252 cpumask_equal(&current->cpus_allowed, 1241 cpumask_equal(&current->cpus_allowed,
1253 get_cpu_mask(gcwq->cpu))) 1242 get_cpu_mask(gcwq->cpu)))
1254 return true; 1243 return true;
1255 spin_unlock_irq(&gcwq->lock); 1244 spin_unlock_irq(&gcwq->lock);
1256 1245
1257 /* CPU has come up in between, retry migration */ 1246 /* CPU has come up in between, retry migration */
1258 cpu_relax(); 1247 cpu_relax();
1259 } 1248 }
1260 } 1249 }
1261 1250
1262 /* 1251 /*
1263 * Function for worker->rebind_work used to rebind rogue busy workers 1252 * Function for worker->rebind_work used to rebind rogue busy workers
1264 * to the associated cpu which is coming back online. This is 1253 * to the associated cpu which is coming back online. This is
1265 * scheduled by cpu up but can race with other cpu hotplug operations 1254 * scheduled by cpu up but can race with other cpu hotplug operations
1266 * and may be executed twice without intervening cpu down. 1255 * and may be executed twice without intervening cpu down.
1267 */ 1256 */
1268 static void worker_rebind_fn(struct work_struct *work) 1257 static void worker_rebind_fn(struct work_struct *work)
1269 { 1258 {
1270 struct worker *worker = container_of(work, struct worker, rebind_work); 1259 struct worker *worker = container_of(work, struct worker, rebind_work);
1271 struct global_cwq *gcwq = worker->gcwq; 1260 struct global_cwq *gcwq = worker->gcwq;
1272 1261
1273 if (worker_maybe_bind_and_lock(worker)) 1262 if (worker_maybe_bind_and_lock(worker))
1274 worker_clr_flags(worker, WORKER_REBIND); 1263 worker_clr_flags(worker, WORKER_REBIND);
1275 1264
1276 spin_unlock_irq(&gcwq->lock); 1265 spin_unlock_irq(&gcwq->lock);
1277 } 1266 }
1278 1267
1279 static struct worker *alloc_worker(void) 1268 static struct worker *alloc_worker(void)
1280 { 1269 {
1281 struct worker *worker; 1270 struct worker *worker;
1282 1271
1283 worker = kzalloc(sizeof(*worker), GFP_KERNEL); 1272 worker = kzalloc(sizeof(*worker), GFP_KERNEL);
1284 if (worker) { 1273 if (worker) {
1285 INIT_LIST_HEAD(&worker->entry); 1274 INIT_LIST_HEAD(&worker->entry);
1286 INIT_LIST_HEAD(&worker->scheduled); 1275 INIT_LIST_HEAD(&worker->scheduled);
1287 INIT_WORK(&worker->rebind_work, worker_rebind_fn); 1276 INIT_WORK(&worker->rebind_work, worker_rebind_fn);
1288 /* on creation a worker is in !idle && prep state */ 1277 /* on creation a worker is in !idle && prep state */
1289 worker->flags = WORKER_PREP; 1278 worker->flags = WORKER_PREP;
1290 } 1279 }
1291 return worker; 1280 return worker;
1292 } 1281 }
1293 1282
1294 /** 1283 /**
1295 * create_worker - create a new workqueue worker 1284 * create_worker - create a new workqueue worker
1296 * @gcwq: gcwq the new worker will belong to 1285 * @gcwq: gcwq the new worker will belong to
1297 * @bind: whether to set affinity to @cpu or not 1286 * @bind: whether to set affinity to @cpu or not
1298 * 1287 *
1299 * Create a new worker which is bound to @gcwq. The returned worker 1288 * Create a new worker which is bound to @gcwq. The returned worker
1300 * can be started by calling start_worker() or destroyed using 1289 * can be started by calling start_worker() or destroyed using
1301 * destroy_worker(). 1290 * destroy_worker().
1302 * 1291 *
1303 * CONTEXT: 1292 * CONTEXT:
1304 * Might sleep. Does GFP_KERNEL allocations. 1293 * Might sleep. Does GFP_KERNEL allocations.
1305 * 1294 *
1306 * RETURNS: 1295 * RETURNS:
1307 * Pointer to the newly created worker. 1296 * Pointer to the newly created worker.
1308 */ 1297 */
1309 static struct worker *create_worker(struct global_cwq *gcwq, bool bind) 1298 static struct worker *create_worker(struct global_cwq *gcwq, bool bind)
1310 { 1299 {
1311 bool on_unbound_cpu = gcwq->cpu == WORK_CPU_UNBOUND; 1300 bool on_unbound_cpu = gcwq->cpu == WORK_CPU_UNBOUND;
1312 struct worker *worker = NULL; 1301 struct worker *worker = NULL;
1313 int id = -1; 1302 int id = -1;
1314 1303
1315 spin_lock_irq(&gcwq->lock); 1304 spin_lock_irq(&gcwq->lock);
1316 while (ida_get_new(&gcwq->worker_ida, &id)) { 1305 while (ida_get_new(&gcwq->worker_ida, &id)) {
1317 spin_unlock_irq(&gcwq->lock); 1306 spin_unlock_irq(&gcwq->lock);
1318 if (!ida_pre_get(&gcwq->worker_ida, GFP_KERNEL)) 1307 if (!ida_pre_get(&gcwq->worker_ida, GFP_KERNEL))
1319 goto fail; 1308 goto fail;
1320 spin_lock_irq(&gcwq->lock); 1309 spin_lock_irq(&gcwq->lock);
1321 } 1310 }
1322 spin_unlock_irq(&gcwq->lock); 1311 spin_unlock_irq(&gcwq->lock);
1323 1312
1324 worker = alloc_worker(); 1313 worker = alloc_worker();
1325 if (!worker) 1314 if (!worker)
1326 goto fail; 1315 goto fail;
1327 1316
1328 worker->gcwq = gcwq; 1317 worker->gcwq = gcwq;
1329 worker->id = id; 1318 worker->id = id;
1330 1319
1331 if (!on_unbound_cpu) 1320 if (!on_unbound_cpu)
1332 worker->task = kthread_create(worker_thread, worker, 1321 worker->task = kthread_create(worker_thread, worker,
1333 "kworker/%u:%d", gcwq->cpu, id); 1322 "kworker/%u:%d", gcwq->cpu, id);
1334 else 1323 else
1335 worker->task = kthread_create(worker_thread, worker, 1324 worker->task = kthread_create(worker_thread, worker,
1336 "kworker/u:%d", id); 1325 "kworker/u:%d", id);
1337 if (IS_ERR(worker->task)) 1326 if (IS_ERR(worker->task))
1338 goto fail; 1327 goto fail;
1339 1328
1340 /* 1329 /*
1341 * A rogue worker will become a regular one if CPU comes 1330 * A rogue worker will become a regular one if CPU comes
1342 * online later on. Make sure every worker has 1331 * online later on. Make sure every worker has
1343 * PF_THREAD_BOUND set. 1332 * PF_THREAD_BOUND set.
1344 */ 1333 */
1345 if (bind && !on_unbound_cpu) 1334 if (bind && !on_unbound_cpu)
1346 kthread_bind(worker->task, gcwq->cpu); 1335 kthread_bind(worker->task, gcwq->cpu);
1347 else { 1336 else {
1348 worker->task->flags |= PF_THREAD_BOUND; 1337 worker->task->flags |= PF_THREAD_BOUND;
1349 if (on_unbound_cpu) 1338 if (on_unbound_cpu)
1350 worker->flags |= WORKER_UNBOUND; 1339 worker->flags |= WORKER_UNBOUND;
1351 } 1340 }
1352 1341
1353 return worker; 1342 return worker;
1354 fail: 1343 fail:
1355 if (id >= 0) { 1344 if (id >= 0) {
1356 spin_lock_irq(&gcwq->lock); 1345 spin_lock_irq(&gcwq->lock);
1357 ida_remove(&gcwq->worker_ida, id); 1346 ida_remove(&gcwq->worker_ida, id);
1358 spin_unlock_irq(&gcwq->lock); 1347 spin_unlock_irq(&gcwq->lock);
1359 } 1348 }
1360 kfree(worker); 1349 kfree(worker);
1361 return NULL; 1350 return NULL;
1362 } 1351 }
1363 1352
1364 /** 1353 /**
1365 * start_worker - start a newly created worker 1354 * start_worker - start a newly created worker
1366 * @worker: worker to start 1355 * @worker: worker to start
1367 * 1356 *
1368 * Make the gcwq aware of @worker and start it. 1357 * Make the gcwq aware of @worker and start it.
1369 * 1358 *
1370 * CONTEXT: 1359 * CONTEXT:
1371 * spin_lock_irq(gcwq->lock). 1360 * spin_lock_irq(gcwq->lock).
1372 */ 1361 */
1373 static void start_worker(struct worker *worker) 1362 static void start_worker(struct worker *worker)
1374 { 1363 {
1375 worker->flags |= WORKER_STARTED; 1364 worker->flags |= WORKER_STARTED;
1376 worker->gcwq->nr_workers++; 1365 worker->gcwq->nr_workers++;
1377 worker_enter_idle(worker); 1366 worker_enter_idle(worker);
1378 wake_up_process(worker->task); 1367 wake_up_process(worker->task);
1379 } 1368 }
1380 1369
1381 /** 1370 /**
1382 * destroy_worker - destroy a workqueue worker 1371 * destroy_worker - destroy a workqueue worker
1383 * @worker: worker to be destroyed 1372 * @worker: worker to be destroyed
1384 * 1373 *
1385 * Destroy @worker and adjust @gcwq stats accordingly. 1374 * Destroy @worker and adjust @gcwq stats accordingly.
1386 * 1375 *
1387 * CONTEXT: 1376 * CONTEXT:
1388 * spin_lock_irq(gcwq->lock) which is released and regrabbed. 1377 * spin_lock_irq(gcwq->lock) which is released and regrabbed.
1389 */ 1378 */
1390 static void destroy_worker(struct worker *worker) 1379 static void destroy_worker(struct worker *worker)
1391 { 1380 {
1392 struct global_cwq *gcwq = worker->gcwq; 1381 struct global_cwq *gcwq = worker->gcwq;
1393 int id = worker->id; 1382 int id = worker->id;
1394 1383
1395 /* sanity check frenzy */ 1384 /* sanity check frenzy */
1396 BUG_ON(worker->current_work); 1385 BUG_ON(worker->current_work);
1397 BUG_ON(!list_empty(&worker->scheduled)); 1386 BUG_ON(!list_empty(&worker->scheduled));
1398 1387
1399 if (worker->flags & WORKER_STARTED) 1388 if (worker->flags & WORKER_STARTED)
1400 gcwq->nr_workers--; 1389 gcwq->nr_workers--;
1401 if (worker->flags & WORKER_IDLE) 1390 if (worker->flags & WORKER_IDLE)
1402 gcwq->nr_idle--; 1391 gcwq->nr_idle--;
1403 1392
1404 list_del_init(&worker->entry); 1393 list_del_init(&worker->entry);
1405 worker->flags |= WORKER_DIE; 1394 worker->flags |= WORKER_DIE;
1406 1395
1407 spin_unlock_irq(&gcwq->lock); 1396 spin_unlock_irq(&gcwq->lock);
1408 1397
1409 kthread_stop(worker->task); 1398 kthread_stop(worker->task);
1410 kfree(worker); 1399 kfree(worker);
1411 1400
1412 spin_lock_irq(&gcwq->lock); 1401 spin_lock_irq(&gcwq->lock);
1413 ida_remove(&gcwq->worker_ida, id); 1402 ida_remove(&gcwq->worker_ida, id);
1414 } 1403 }
1415 1404
1416 static void idle_worker_timeout(unsigned long __gcwq) 1405 static void idle_worker_timeout(unsigned long __gcwq)
1417 { 1406 {
1418 struct global_cwq *gcwq = (void *)__gcwq; 1407 struct global_cwq *gcwq = (void *)__gcwq;
1419 1408
1420 spin_lock_irq(&gcwq->lock); 1409 spin_lock_irq(&gcwq->lock);
1421 1410
1422 if (too_many_workers(gcwq)) { 1411 if (too_many_workers(gcwq)) {
1423 struct worker *worker; 1412 struct worker *worker;
1424 unsigned long expires; 1413 unsigned long expires;
1425 1414
1426 /* idle_list is kept in LIFO order, check the last one */ 1415 /* idle_list is kept in LIFO order, check the last one */
1427 worker = list_entry(gcwq->idle_list.prev, struct worker, entry); 1416 worker = list_entry(gcwq->idle_list.prev, struct worker, entry);
1428 expires = worker->last_active + IDLE_WORKER_TIMEOUT; 1417 expires = worker->last_active + IDLE_WORKER_TIMEOUT;
1429 1418
1430 if (time_before(jiffies, expires)) 1419 if (time_before(jiffies, expires))
1431 mod_timer(&gcwq->idle_timer, expires); 1420 mod_timer(&gcwq->idle_timer, expires);
1432 else { 1421 else {
1433 /* it's been idle for too long, wake up manager */ 1422 /* it's been idle for too long, wake up manager */
1434 gcwq->flags |= GCWQ_MANAGE_WORKERS; 1423 gcwq->flags |= GCWQ_MANAGE_WORKERS;
1435 wake_up_worker(gcwq); 1424 wake_up_worker(gcwq);
1436 } 1425 }
1437 } 1426 }
1438 1427
1439 spin_unlock_irq(&gcwq->lock); 1428 spin_unlock_irq(&gcwq->lock);
1440 } 1429 }
1441 1430
1442 static bool send_mayday(struct work_struct *work) 1431 static bool send_mayday(struct work_struct *work)
1443 { 1432 {
1444 struct cpu_workqueue_struct *cwq = get_work_cwq(work); 1433 struct cpu_workqueue_struct *cwq = get_work_cwq(work);
1445 struct workqueue_struct *wq = cwq->wq; 1434 struct workqueue_struct *wq = cwq->wq;
1446 unsigned int cpu; 1435 unsigned int cpu;
1447 1436
1448 if (!(wq->flags & WQ_RESCUER)) 1437 if (!(wq->flags & WQ_RESCUER))
1449 return false; 1438 return false;
1450 1439
1451 /* mayday mayday mayday */ 1440 /* mayday mayday mayday */
1452 cpu = cwq->gcwq->cpu; 1441 cpu = cwq->gcwq->cpu;
1453 /* WORK_CPU_UNBOUND can't be set in cpumask, use cpu 0 instead */ 1442 /* WORK_CPU_UNBOUND can't be set in cpumask, use cpu 0 instead */
1454 if (cpu == WORK_CPU_UNBOUND) 1443 if (cpu == WORK_CPU_UNBOUND)
1455 cpu = 0; 1444 cpu = 0;
1456 if (!mayday_test_and_set_cpu(cpu, wq->mayday_mask)) 1445 if (!mayday_test_and_set_cpu(cpu, wq->mayday_mask))
1457 wake_up_process(wq->rescuer->task); 1446 wake_up_process(wq->rescuer->task);
1458 return true; 1447 return true;
1459 } 1448 }
1460 1449
1461 static void gcwq_mayday_timeout(unsigned long __gcwq) 1450 static void gcwq_mayday_timeout(unsigned long __gcwq)
1462 { 1451 {
1463 struct global_cwq *gcwq = (void *)__gcwq; 1452 struct global_cwq *gcwq = (void *)__gcwq;
1464 struct work_struct *work; 1453 struct work_struct *work;
1465 1454
1466 spin_lock_irq(&gcwq->lock); 1455 spin_lock_irq(&gcwq->lock);
1467 1456
1468 if (need_to_create_worker(gcwq)) { 1457 if (need_to_create_worker(gcwq)) {
1469 /* 1458 /*
1470 * We've been trying to create a new worker but 1459 * We've been trying to create a new worker but
1471 * haven't been successful. We might be hitting an 1460 * haven't been successful. We might be hitting an
1472 * allocation deadlock. Send distress signals to 1461 * allocation deadlock. Send distress signals to
1473 * rescuers. 1462 * rescuers.
1474 */ 1463 */
1475 list_for_each_entry(work, &gcwq->worklist, entry) 1464 list_for_each_entry(work, &gcwq->worklist, entry)
1476 send_mayday(work); 1465 send_mayday(work);
1477 } 1466 }
1478 1467
1479 spin_unlock_irq(&gcwq->lock); 1468 spin_unlock_irq(&gcwq->lock);
1480 1469
1481 mod_timer(&gcwq->mayday_timer, jiffies + MAYDAY_INTERVAL); 1470 mod_timer(&gcwq->mayday_timer, jiffies + MAYDAY_INTERVAL);
1482 } 1471 }
1483 1472
1484 /** 1473 /**
1485 * maybe_create_worker - create a new worker if necessary 1474 * maybe_create_worker - create a new worker if necessary
1486 * @gcwq: gcwq to create a new worker for 1475 * @gcwq: gcwq to create a new worker for
1487 * 1476 *
1488 * Create a new worker for @gcwq if necessary. @gcwq is guaranteed to 1477 * Create a new worker for @gcwq if necessary. @gcwq is guaranteed to
1489 * have at least one idle worker on return from this function. If 1478 * have at least one idle worker on return from this function. If
1490 * creating a new worker takes longer than MAYDAY_INTERVAL, mayday is 1479 * creating a new worker takes longer than MAYDAY_INTERVAL, mayday is
1491 * sent to all rescuers with works scheduled on @gcwq to resolve 1480 * sent to all rescuers with works scheduled on @gcwq to resolve
1492 * possible allocation deadlock. 1481 * possible allocation deadlock.
1493 * 1482 *
1494 * On return, need_to_create_worker() is guaranteed to be false and 1483 * On return, need_to_create_worker() is guaranteed to be false and
1495 * may_start_working() true. 1484 * may_start_working() true.
1496 * 1485 *
1497 * LOCKING: 1486 * LOCKING:
1498 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 1487 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
1499 * multiple times. Does GFP_KERNEL allocations. Called only from 1488 * multiple times. Does GFP_KERNEL allocations. Called only from
1500 * manager. 1489 * manager.
1501 * 1490 *
1502 * RETURNS: 1491 * RETURNS:
1503 * false if no action was taken and gcwq->lock stayed locked, true 1492 * false if no action was taken and gcwq->lock stayed locked, true
1504 * otherwise. 1493 * otherwise.
1505 */ 1494 */
1506 static bool maybe_create_worker(struct global_cwq *gcwq) 1495 static bool maybe_create_worker(struct global_cwq *gcwq)
1507 __releases(&gcwq->lock) 1496 __releases(&gcwq->lock)
1508 __acquires(&gcwq->lock) 1497 __acquires(&gcwq->lock)
1509 { 1498 {
1510 if (!need_to_create_worker(gcwq)) 1499 if (!need_to_create_worker(gcwq))
1511 return false; 1500 return false;
1512 restart: 1501 restart:
1513 spin_unlock_irq(&gcwq->lock); 1502 spin_unlock_irq(&gcwq->lock);
1514 1503
1515 /* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */ 1504 /* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
1516 mod_timer(&gcwq->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT); 1505 mod_timer(&gcwq->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
1517 1506
1518 while (true) { 1507 while (true) {
1519 struct worker *worker; 1508 struct worker *worker;
1520 1509
1521 worker = create_worker(gcwq, true); 1510 worker = create_worker(gcwq, true);
1522 if (worker) { 1511 if (worker) {
1523 del_timer_sync(&gcwq->mayday_timer); 1512 del_timer_sync(&gcwq->mayday_timer);
1524 spin_lock_irq(&gcwq->lock); 1513 spin_lock_irq(&gcwq->lock);
1525 start_worker(worker); 1514 start_worker(worker);
1526 BUG_ON(need_to_create_worker(gcwq)); 1515 BUG_ON(need_to_create_worker(gcwq));
1527 return true; 1516 return true;
1528 } 1517 }
1529 1518
1530 if (!need_to_create_worker(gcwq)) 1519 if (!need_to_create_worker(gcwq))
1531 break; 1520 break;
1532 1521
1533 __set_current_state(TASK_INTERRUPTIBLE); 1522 __set_current_state(TASK_INTERRUPTIBLE);
1534 schedule_timeout(CREATE_COOLDOWN); 1523 schedule_timeout(CREATE_COOLDOWN);
1535 1524
1536 if (!need_to_create_worker(gcwq)) 1525 if (!need_to_create_worker(gcwq))
1537 break; 1526 break;
1538 } 1527 }
1539 1528
1540 del_timer_sync(&gcwq->mayday_timer); 1529 del_timer_sync(&gcwq->mayday_timer);
1541 spin_lock_irq(&gcwq->lock); 1530 spin_lock_irq(&gcwq->lock);
1542 if (need_to_create_worker(gcwq)) 1531 if (need_to_create_worker(gcwq))
1543 goto restart; 1532 goto restart;
1544 return true; 1533 return true;
1545 } 1534 }
1546 1535
1547 /** 1536 /**
1548 * maybe_destroy_workers - destroy workers which have been idle for a while 1537 * maybe_destroy_workers - destroy workers which have been idle for a while
1549 * @gcwq: gcwq to destroy workers for 1538 * @gcwq: gcwq to destroy workers for
1550 * 1539 *
1551 * Destroy @gcwq workers which have been idle for longer than 1540 * Destroy @gcwq workers which have been idle for longer than
1552 * IDLE_WORKER_TIMEOUT. 1541 * IDLE_WORKER_TIMEOUT.
1553 * 1542 *
1554 * LOCKING: 1543 * LOCKING:
1555 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 1544 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
1556 * multiple times. Called only from manager. 1545 * multiple times. Called only from manager.
1557 * 1546 *
1558 * RETURNS: 1547 * RETURNS:
1559 * false if no action was taken and gcwq->lock stayed locked, true 1548 * false if no action was taken and gcwq->lock stayed locked, true
1560 * otherwise. 1549 * otherwise.
1561 */ 1550 */
1562 static bool maybe_destroy_workers(struct global_cwq *gcwq) 1551 static bool maybe_destroy_workers(struct global_cwq *gcwq)
1563 { 1552 {
1564 bool ret = false; 1553 bool ret = false;
1565 1554
1566 while (too_many_workers(gcwq)) { 1555 while (too_many_workers(gcwq)) {
1567 struct worker *worker; 1556 struct worker *worker;
1568 unsigned long expires; 1557 unsigned long expires;
1569 1558
1570 worker = list_entry(gcwq->idle_list.prev, struct worker, entry); 1559 worker = list_entry(gcwq->idle_list.prev, struct worker, entry);
1571 expires = worker->last_active + IDLE_WORKER_TIMEOUT; 1560 expires = worker->last_active + IDLE_WORKER_TIMEOUT;
1572 1561
1573 if (time_before(jiffies, expires)) { 1562 if (time_before(jiffies, expires)) {
1574 mod_timer(&gcwq->idle_timer, expires); 1563 mod_timer(&gcwq->idle_timer, expires);
1575 break; 1564 break;
1576 } 1565 }
1577 1566
1578 destroy_worker(worker); 1567 destroy_worker(worker);
1579 ret = true; 1568 ret = true;
1580 } 1569 }
1581 1570
1582 return ret; 1571 return ret;
1583 } 1572 }
1584 1573
1585 /** 1574 /**
1586 * manage_workers - manage worker pool 1575 * manage_workers - manage worker pool
1587 * @worker: self 1576 * @worker: self
1588 * 1577 *
1589 * Assume the manager role and manage gcwq worker pool @worker belongs 1578 * Assume the manager role and manage gcwq worker pool @worker belongs
1590 * to. At any given time, there can be only zero or one manager per 1579 * to. At any given time, there can be only zero or one manager per
1591 * gcwq. The exclusion is handled automatically by this function. 1580 * gcwq. The exclusion is handled automatically by this function.
1592 * 1581 *
1593 * The caller can safely start processing works on false return. On 1582 * The caller can safely start processing works on false return. On
1594 * true return, it's guaranteed that need_to_create_worker() is false 1583 * true return, it's guaranteed that need_to_create_worker() is false
1595 * and may_start_working() is true. 1584 * and may_start_working() is true.
1596 * 1585 *
1597 * CONTEXT: 1586 * CONTEXT:
1598 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 1587 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
1599 * multiple times. Does GFP_KERNEL allocations. 1588 * multiple times. Does GFP_KERNEL allocations.
1600 * 1589 *
1601 * RETURNS: 1590 * RETURNS:
1602 * false if no action was taken and gcwq->lock stayed locked, true if 1591 * false if no action was taken and gcwq->lock stayed locked, true if
1603 * some action was taken. 1592 * some action was taken.
1604 */ 1593 */
1605 static bool manage_workers(struct worker *worker) 1594 static bool manage_workers(struct worker *worker)
1606 { 1595 {
1607 struct global_cwq *gcwq = worker->gcwq; 1596 struct global_cwq *gcwq = worker->gcwq;
1608 bool ret = false; 1597 bool ret = false;
1609 1598
1610 if (gcwq->flags & GCWQ_MANAGING_WORKERS) 1599 if (gcwq->flags & GCWQ_MANAGING_WORKERS)
1611 return ret; 1600 return ret;
1612 1601
1613 gcwq->flags &= ~GCWQ_MANAGE_WORKERS; 1602 gcwq->flags &= ~GCWQ_MANAGE_WORKERS;
1614 gcwq->flags |= GCWQ_MANAGING_WORKERS; 1603 gcwq->flags |= GCWQ_MANAGING_WORKERS;
1615 1604
1616 /* 1605 /*
1617 * Destroy and then create so that may_start_working() is true 1606 * Destroy and then create so that may_start_working() is true
1618 * on return. 1607 * on return.
1619 */ 1608 */
1620 ret |= maybe_destroy_workers(gcwq); 1609 ret |= maybe_destroy_workers(gcwq);
1621 ret |= maybe_create_worker(gcwq); 1610 ret |= maybe_create_worker(gcwq);
1622 1611
1623 gcwq->flags &= ~GCWQ_MANAGING_WORKERS; 1612 gcwq->flags &= ~GCWQ_MANAGING_WORKERS;
1624 1613
1625 /* 1614 /*
1626 * The trustee might be waiting to take over the manager 1615 * The trustee might be waiting to take over the manager
1627 * position, tell it we're done. 1616 * position, tell it we're done.
1628 */ 1617 */
1629 if (unlikely(gcwq->trustee)) 1618 if (unlikely(gcwq->trustee))
1630 wake_up_all(&gcwq->trustee_wait); 1619 wake_up_all(&gcwq->trustee_wait);
1631 1620
1632 return ret; 1621 return ret;
1633 } 1622 }
1634 1623
1635 /** 1624 /**
1636 * move_linked_works - move linked works to a list 1625 * move_linked_works - move linked works to a list
1637 * @work: start of series of works to be scheduled 1626 * @work: start of series of works to be scheduled
1638 * @head: target list to append @work to 1627 * @head: target list to append @work to
1639 * @nextp: out parameter for nested worklist walking 1628 * @nextp: out parameter for nested worklist walking
1640 * 1629 *
1641 * Schedule linked works starting from @work to @head. Work series to 1630 * Schedule linked works starting from @work to @head. Work series to
1642 * be scheduled starts at @work and includes any consecutive work with 1631 * be scheduled starts at @work and includes any consecutive work with
1643 * WORK_STRUCT_LINKED set in its predecessor. 1632 * WORK_STRUCT_LINKED set in its predecessor.
1644 * 1633 *
1645 * If @nextp is not NULL, it's updated to point to the next work of 1634 * If @nextp is not NULL, it's updated to point to the next work of
1646 * the last scheduled work. This allows move_linked_works() to be 1635 * the last scheduled work. This allows move_linked_works() to be
1647 * nested inside outer list_for_each_entry_safe(). 1636 * nested inside outer list_for_each_entry_safe().
1648 * 1637 *
1649 * CONTEXT: 1638 * CONTEXT:
1650 * spin_lock_irq(gcwq->lock). 1639 * spin_lock_irq(gcwq->lock).
1651 */ 1640 */
1652 static void move_linked_works(struct work_struct *work, struct list_head *head, 1641 static void move_linked_works(struct work_struct *work, struct list_head *head,
1653 struct work_struct **nextp) 1642 struct work_struct **nextp)
1654 { 1643 {
1655 struct work_struct *n; 1644 struct work_struct *n;
1656 1645
1657 /* 1646 /*
1658 * Linked worklist will always end before the end of the list, 1647 * Linked worklist will always end before the end of the list,
1659 * use NULL for list head. 1648 * use NULL for list head.
1660 */ 1649 */
1661 list_for_each_entry_safe_from(work, n, NULL, entry) { 1650 list_for_each_entry_safe_from(work, n, NULL, entry) {
1662 list_move_tail(&work->entry, head); 1651 list_move_tail(&work->entry, head);
1663 if (!(*work_data_bits(work) & WORK_STRUCT_LINKED)) 1652 if (!(*work_data_bits(work) & WORK_STRUCT_LINKED))
1664 break; 1653 break;
1665 } 1654 }
1666 1655
1667 /* 1656 /*
1668 * If we're already inside safe list traversal and have moved 1657 * If we're already inside safe list traversal and have moved
1669 * multiple works to the scheduled queue, the next position 1658 * multiple works to the scheduled queue, the next position
1670 * needs to be updated. 1659 * needs to be updated.
1671 */ 1660 */
1672 if (nextp) 1661 if (nextp)
1673 *nextp = n; 1662 *nextp = n;
1674 } 1663 }
1675 1664
1676 static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq) 1665 static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
1677 { 1666 {
1678 struct work_struct *work = list_first_entry(&cwq->delayed_works, 1667 struct work_struct *work = list_first_entry(&cwq->delayed_works,
1679 struct work_struct, entry); 1668 struct work_struct, entry);
1680 struct list_head *pos = gcwq_determine_ins_pos(cwq->gcwq, cwq); 1669 struct list_head *pos = gcwq_determine_ins_pos(cwq->gcwq, cwq);
1681 1670
1671 trace_workqueue_activate_work(work);
1682 move_linked_works(work, pos, NULL); 1672 move_linked_works(work, pos, NULL);
1683 __clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work)); 1673 __clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
1684 cwq->nr_active++; 1674 cwq->nr_active++;
1685 } 1675 }
1686 1676
1687 /** 1677 /**
1688 * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight 1678 * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight
1689 * @cwq: cwq of interest 1679 * @cwq: cwq of interest
1690 * @color: color of work which left the queue 1680 * @color: color of work which left the queue
1691 * @delayed: for a delayed work 1681 * @delayed: for a delayed work
1692 * 1682 *
1693 * A work either has completed or is removed from pending queue, 1683 * A work either has completed or is removed from pending queue,
1694 * decrement nr_in_flight of its cwq and handle workqueue flushing. 1684 * decrement nr_in_flight of its cwq and handle workqueue flushing.
1695 * 1685 *
1696 * CONTEXT: 1686 * CONTEXT:
1697 * spin_lock_irq(gcwq->lock). 1687 * spin_lock_irq(gcwq->lock).
1698 */ 1688 */
1699 static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color, 1689 static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
1700 bool delayed) 1690 bool delayed)
1701 { 1691 {
1702 /* ignore uncolored works */ 1692 /* ignore uncolored works */
1703 if (color == WORK_NO_COLOR) 1693 if (color == WORK_NO_COLOR)
1704 return; 1694 return;
1705 1695
1706 cwq->nr_in_flight[color]--; 1696 cwq->nr_in_flight[color]--;
1707 1697
1708 if (!delayed) { 1698 if (!delayed) {
1709 cwq->nr_active--; 1699 cwq->nr_active--;
1710 if (!list_empty(&cwq->delayed_works)) { 1700 if (!list_empty(&cwq->delayed_works)) {
1711 /* one down, submit a delayed one */ 1701 /* one down, submit a delayed one */
1712 if (cwq->nr_active < cwq->max_active) 1702 if (cwq->nr_active < cwq->max_active)
1713 cwq_activate_first_delayed(cwq); 1703 cwq_activate_first_delayed(cwq);
1714 } 1704 }
1715 } 1705 }
1716 1706
1717 /* is flush in progress and are we at the flushing tip? */ 1707 /* is flush in progress and are we at the flushing tip? */
1718 if (likely(cwq->flush_color != color)) 1708 if (likely(cwq->flush_color != color))
1719 return; 1709 return;
1720 1710
1721 /* are there still in-flight works? */ 1711 /* are there still in-flight works? */
1722 if (cwq->nr_in_flight[color]) 1712 if (cwq->nr_in_flight[color])
1723 return; 1713 return;
1724 1714
1725 /* this cwq is done, clear flush_color */ 1715 /* this cwq is done, clear flush_color */
1726 cwq->flush_color = -1; 1716 cwq->flush_color = -1;
1727 1717
1728 /* 1718 /*
1729 * If this was the last cwq, wake up the first flusher. It 1719 * If this was the last cwq, wake up the first flusher. It
1730 * will handle the rest. 1720 * will handle the rest.
1731 */ 1721 */
1732 if (atomic_dec_and_test(&cwq->wq->nr_cwqs_to_flush)) 1722 if (atomic_dec_and_test(&cwq->wq->nr_cwqs_to_flush))
1733 complete(&cwq->wq->first_flusher->done); 1723 complete(&cwq->wq->first_flusher->done);
1734 } 1724 }
1735 1725
1736 /** 1726 /**
1737 * process_one_work - process single work 1727 * process_one_work - process single work
1738 * @worker: self 1728 * @worker: self
1739 * @work: work to process 1729 * @work: work to process
1740 * 1730 *
1741 * Process @work. This function contains all the logic necessary to 1731 * Process @work. This function contains all the logic necessary to
1742 * process a single work including synchronization against and 1732 * process a single work including synchronization against and
1743 * interaction with other workers on the same cpu, queueing and 1733 * interaction with other workers on the same cpu, queueing and
1744 * flushing. As long as context requirement is met, any worker can 1734 * flushing. As long as context requirement is met, any worker can
1745 * call this function to process a work. 1735 * call this function to process a work.
1746 * 1736 *
1747 * CONTEXT: 1737 * CONTEXT:
1748 * spin_lock_irq(gcwq->lock) which is released and regrabbed. 1738 * spin_lock_irq(gcwq->lock) which is released and regrabbed.
1749 */ 1739 */
1750 static void process_one_work(struct worker *worker, struct work_struct *work) 1740 static void process_one_work(struct worker *worker, struct work_struct *work)
1751 __releases(&gcwq->lock) 1741 __releases(&gcwq->lock)
1752 __acquires(&gcwq->lock) 1742 __acquires(&gcwq->lock)
1753 { 1743 {
1754 struct cpu_workqueue_struct *cwq = get_work_cwq(work); 1744 struct cpu_workqueue_struct *cwq = get_work_cwq(work);
1755 struct global_cwq *gcwq = cwq->gcwq; 1745 struct global_cwq *gcwq = cwq->gcwq;
1756 struct hlist_head *bwh = busy_worker_head(gcwq, work); 1746 struct hlist_head *bwh = busy_worker_head(gcwq, work);
1757 bool cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE; 1747 bool cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE;
1758 work_func_t f = work->func; 1748 work_func_t f = work->func;
1759 int work_color; 1749 int work_color;
1760 struct worker *collision; 1750 struct worker *collision;
1761 #ifdef CONFIG_LOCKDEP 1751 #ifdef CONFIG_LOCKDEP
1762 /* 1752 /*
1763 * It is permissible to free the struct work_struct from 1753 * It is permissible to free the struct work_struct from
1764 * inside the function that is called from it; we need to take 1754 * inside the function that is called from it; we need to take
1765 * this into account for lockdep too. To avoid bogus "held 1755 * this into account for lockdep too. To avoid bogus "held
1766 * lock freed" warnings as well as problems when looking into 1756 * lock freed" warnings as well as problems when looking into
1767 * work->lockdep_map, make a copy and use that here. 1757 * work->lockdep_map, make a copy and use that here.
1768 */ 1758 */
1769 struct lockdep_map lockdep_map = work->lockdep_map; 1759 struct lockdep_map lockdep_map = work->lockdep_map;
1770 #endif 1760 #endif
1771 /* 1761 /*
1772 * A single work shouldn't be executed concurrently by 1762 * A single work shouldn't be executed concurrently by
1773 * multiple workers on a single cpu. Check whether anyone is 1763 * multiple workers on a single cpu. Check whether anyone is
1774 * already processing the work. If so, defer the work to the 1764 * already processing the work. If so, defer the work to the
1775 * currently executing one. 1765 * currently executing one.
1776 */ 1766 */
1777 collision = __find_worker_executing_work(gcwq, bwh, work); 1767 collision = __find_worker_executing_work(gcwq, bwh, work);
1778 if (unlikely(collision)) { 1768 if (unlikely(collision)) {
1779 move_linked_works(work, &collision->scheduled, NULL); 1769 move_linked_works(work, &collision->scheduled, NULL);
1780 return; 1770 return;
1781 } 1771 }
1782 1772
1783 /* claim and process */ 1773 /* claim and process */
1784 debug_work_deactivate(work); 1774 debug_work_deactivate(work);
1785 hlist_add_head(&worker->hentry, bwh); 1775 hlist_add_head(&worker->hentry, bwh);
1786 worker->current_work = work; 1776 worker->current_work = work;
1787 worker->current_cwq = cwq; 1777 worker->current_cwq = cwq;
1788 work_color = get_work_color(work); 1778 work_color = get_work_color(work);
1789 1779
1790 /* record the current cpu number in the work data and dequeue */ 1780 /* record the current cpu number in the work data and dequeue */
1791 set_work_cpu(work, gcwq->cpu); 1781 set_work_cpu(work, gcwq->cpu);
1792 list_del_init(&work->entry); 1782 list_del_init(&work->entry);
1793 1783
1794 /* 1784 /*
1795 * If HIGHPRI_PENDING, check the next work, and, if HIGHPRI, 1785 * If HIGHPRI_PENDING, check the next work, and, if HIGHPRI,
1796 * wake up another worker; otherwise, clear HIGHPRI_PENDING. 1786 * wake up another worker; otherwise, clear HIGHPRI_PENDING.
1797 */ 1787 */
1798 if (unlikely(gcwq->flags & GCWQ_HIGHPRI_PENDING)) { 1788 if (unlikely(gcwq->flags & GCWQ_HIGHPRI_PENDING)) {
1799 struct work_struct *nwork = list_first_entry(&gcwq->worklist, 1789 struct work_struct *nwork = list_first_entry(&gcwq->worklist,
1800 struct work_struct, entry); 1790 struct work_struct, entry);
1801 1791
1802 if (!list_empty(&gcwq->worklist) && 1792 if (!list_empty(&gcwq->worklist) &&
1803 get_work_cwq(nwork)->wq->flags & WQ_HIGHPRI) 1793 get_work_cwq(nwork)->wq->flags & WQ_HIGHPRI)
1804 wake_up_worker(gcwq); 1794 wake_up_worker(gcwq);
1805 else 1795 else
1806 gcwq->flags &= ~GCWQ_HIGHPRI_PENDING; 1796 gcwq->flags &= ~GCWQ_HIGHPRI_PENDING;
1807 } 1797 }
1808 1798
1809 /* 1799 /*
1810 * CPU intensive works don't participate in concurrency 1800 * CPU intensive works don't participate in concurrency
1811 * management. They're the scheduler's responsibility. 1801 * management. They're the scheduler's responsibility.
1812 */ 1802 */
1813 if (unlikely(cpu_intensive)) 1803 if (unlikely(cpu_intensive))
1814 worker_set_flags(worker, WORKER_CPU_INTENSIVE, true); 1804 worker_set_flags(worker, WORKER_CPU_INTENSIVE, true);
1815 1805
1816 spin_unlock_irq(&gcwq->lock); 1806 spin_unlock_irq(&gcwq->lock);
1817 1807
1818 work_clear_pending(work); 1808 work_clear_pending(work);
1819 lock_map_acquire(&cwq->wq->lockdep_map); 1809 lock_map_acquire(&cwq->wq->lockdep_map);
1820 lock_map_acquire(&lockdep_map); 1810 lock_map_acquire(&lockdep_map);
1821 trace_workqueue_execute_start(work); 1811 trace_workqueue_execute_start(work);
1822 f(work); 1812 f(work);
1823 /* 1813 /*
1824 * While we must be careful to not use "work" after this, the trace 1814 * While we must be careful to not use "work" after this, the trace
1825 * point will only record its address. 1815 * point will only record its address.
1826 */ 1816 */
1827 trace_workqueue_execute_end(work); 1817 trace_workqueue_execute_end(work);
1828 lock_map_release(&lockdep_map); 1818 lock_map_release(&lockdep_map);
1829 lock_map_release(&cwq->wq->lockdep_map); 1819 lock_map_release(&cwq->wq->lockdep_map);
1830 1820
1831 if (unlikely(in_atomic() || lockdep_depth(current) > 0)) { 1821 if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
1832 printk(KERN_ERR "BUG: workqueue leaked lock or atomic: " 1822 printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
1833 "%s/0x%08x/%d\n", 1823 "%s/0x%08x/%d\n",
1834 current->comm, preempt_count(), task_pid_nr(current)); 1824 current->comm, preempt_count(), task_pid_nr(current));
1835 printk(KERN_ERR " last function: "); 1825 printk(KERN_ERR " last function: ");
1836 print_symbol("%s\n", (unsigned long)f); 1826 print_symbol("%s\n", (unsigned long)f);
1837 debug_show_held_locks(current); 1827 debug_show_held_locks(current);
1838 dump_stack(); 1828 dump_stack();
1839 } 1829 }
1840 1830
1841 spin_lock_irq(&gcwq->lock); 1831 spin_lock_irq(&gcwq->lock);
1842 1832
1843 /* clear cpu intensive status */ 1833 /* clear cpu intensive status */
1844 if (unlikely(cpu_intensive)) 1834 if (unlikely(cpu_intensive))
1845 worker_clr_flags(worker, WORKER_CPU_INTENSIVE); 1835 worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
1846 1836
1847 /* we're done with it, release */ 1837 /* we're done with it, release */
1848 hlist_del_init(&worker->hentry); 1838 hlist_del_init(&worker->hentry);
1849 worker->current_work = NULL; 1839 worker->current_work = NULL;
1850 worker->current_cwq = NULL; 1840 worker->current_cwq = NULL;
1851 cwq_dec_nr_in_flight(cwq, work_color, false); 1841 cwq_dec_nr_in_flight(cwq, work_color, false);
1852 } 1842 }
1853 1843
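The WQ_CPU_INTENSIVE handling above is driven by a flag chosen when the workqueue is created. A minimal, hypothetical sketch (the crunch_* names are invented) of marking a long-running, CPU-bound job so its worker is left out of concurrency management:

#include <linux/workqueue.h>
#include <linux/errno.h>

/* Hypothetical CPU-bound job; the crunch_* names are invented. */
static void crunch_fn(struct work_struct *work)
{
        /* long-running, CPU-bound computation goes here */
}

static DECLARE_WORK(crunch_work, crunch_fn);
static struct workqueue_struct *crunch_wq;

static int crunch_start(void)
{
        /*
         * WQ_CPU_INTENSIVE: the worker running crunch_fn is not counted
         * by concurrency management, so other per-cpu work items keep
         * getting execution contexts while it burns CPU.
         */
        crunch_wq = alloc_workqueue("crunch", WQ_CPU_INTENSIVE, 0);
        if (!crunch_wq)
                return -ENOMEM;
        queue_work(crunch_wq, &crunch_work);
        return 0;
}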
1854 /** 1844 /**
1855 * process_scheduled_works - process scheduled works 1845 * process_scheduled_works - process scheduled works
1856 * @worker: self 1846 * @worker: self
1857 * 1847 *
1858 * Process all scheduled works. Please note that the scheduled list 1848 * Process all scheduled works. Please note that the scheduled list
1859 * may change while processing a work, so this function repeatedly 1849 * may change while processing a work, so this function repeatedly
1860 * fetches a work from the top and executes it. 1850 * fetches a work from the top and executes it.
1861 * 1851 *
1862 * CONTEXT: 1852 * CONTEXT:
1863 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 1853 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
1864 * multiple times. 1854 * multiple times.
1865 */ 1855 */
1866 static void process_scheduled_works(struct worker *worker) 1856 static void process_scheduled_works(struct worker *worker)
1867 { 1857 {
1868 while (!list_empty(&worker->scheduled)) { 1858 while (!list_empty(&worker->scheduled)) {
1869 struct work_struct *work = list_first_entry(&worker->scheduled, 1859 struct work_struct *work = list_first_entry(&worker->scheduled,
1870 struct work_struct, entry); 1860 struct work_struct, entry);
1871 process_one_work(worker, work); 1861 process_one_work(worker, work);
1872 } 1862 }
1873 } 1863 }
1874 1864
1875 /** 1865 /**
1876 * worker_thread - the worker thread function 1866 * worker_thread - the worker thread function
1877 * @__worker: self 1867 * @__worker: self
1878 * 1868 *
1879 * The gcwq worker thread function. There's a single dynamic pool of 1869 * The gcwq worker thread function. There's a single dynamic pool of
1880 * these per each cpu. These workers process all works regardless of 1870 * these per each cpu. These workers process all works regardless of
1881 * their specific target workqueue. The only exception is works which 1871 * their specific target workqueue. The only exception is works which
1882 * belong to workqueues with a rescuer which will be explained in 1872 * belong to workqueues with a rescuer which will be explained in
1883 * rescuer_thread(). 1873 * rescuer_thread().
1884 */ 1874 */
1885 static int worker_thread(void *__worker) 1875 static int worker_thread(void *__worker)
1886 { 1876 {
1887 struct worker *worker = __worker; 1877 struct worker *worker = __worker;
1888 struct global_cwq *gcwq = worker->gcwq; 1878 struct global_cwq *gcwq = worker->gcwq;
1889 1879
1890 /* tell the scheduler that this is a workqueue worker */ 1880 /* tell the scheduler that this is a workqueue worker */
1891 worker->task->flags |= PF_WQ_WORKER; 1881 worker->task->flags |= PF_WQ_WORKER;
1892 woke_up: 1882 woke_up:
1893 spin_lock_irq(&gcwq->lock); 1883 spin_lock_irq(&gcwq->lock);
1894 1884
1895 /* DIE can be set only while we're idle, checking here is enough */ 1885 /* DIE can be set only while we're idle, checking here is enough */
1896 if (worker->flags & WORKER_DIE) { 1886 if (worker->flags & WORKER_DIE) {
1897 spin_unlock_irq(&gcwq->lock); 1887 spin_unlock_irq(&gcwq->lock);
1898 worker->task->flags &= ~PF_WQ_WORKER; 1888 worker->task->flags &= ~PF_WQ_WORKER;
1899 return 0; 1889 return 0;
1900 } 1890 }
1901 1891
1902 worker_leave_idle(worker); 1892 worker_leave_idle(worker);
1903 recheck: 1893 recheck:
1904 /* no more worker necessary? */ 1894 /* no more worker necessary? */
1905 if (!need_more_worker(gcwq)) 1895 if (!need_more_worker(gcwq))
1906 goto sleep; 1896 goto sleep;
1907 1897
1908 /* do we need to manage? */ 1898 /* do we need to manage? */
1909 if (unlikely(!may_start_working(gcwq)) && manage_workers(worker)) 1899 if (unlikely(!may_start_working(gcwq)) && manage_workers(worker))
1910 goto recheck; 1900 goto recheck;
1911 1901
1912 /* 1902 /*
1913 * ->scheduled list can only be filled while a worker is 1903 * ->scheduled list can only be filled while a worker is
1914 * preparing to process a work or actually processing it. 1904 * preparing to process a work or actually processing it.
1915 * Make sure nobody diddled with it while I was sleeping. 1905 * Make sure nobody diddled with it while I was sleeping.
1916 */ 1906 */
1917 BUG_ON(!list_empty(&worker->scheduled)); 1907 BUG_ON(!list_empty(&worker->scheduled));
1918 1908
1919 /* 1909 /*
1920 * When control reaches this point, we're guaranteed to have 1910 * When control reaches this point, we're guaranteed to have
1921 * at least one idle worker or that someone else has already 1911 * at least one idle worker or that someone else has already
1922 * assumed the manager role. 1912 * assumed the manager role.
1923 */ 1913 */
1924 worker_clr_flags(worker, WORKER_PREP); 1914 worker_clr_flags(worker, WORKER_PREP);
1925 1915
1926 do { 1916 do {
1927 struct work_struct *work = 1917 struct work_struct *work =
1928 list_first_entry(&gcwq->worklist, 1918 list_first_entry(&gcwq->worklist,
1929 struct work_struct, entry); 1919 struct work_struct, entry);
1930 1920
1931 if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) { 1921 if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) {
1932 /* optimization path, not strictly necessary */ 1922 /* optimization path, not strictly necessary */
1933 process_one_work(worker, work); 1923 process_one_work(worker, work);
1934 if (unlikely(!list_empty(&worker->scheduled))) 1924 if (unlikely(!list_empty(&worker->scheduled)))
1935 process_scheduled_works(worker); 1925 process_scheduled_works(worker);
1936 } else { 1926 } else {
1937 move_linked_works(work, &worker->scheduled, NULL); 1927 move_linked_works(work, &worker->scheduled, NULL);
1938 process_scheduled_works(worker); 1928 process_scheduled_works(worker);
1939 } 1929 }
1940 } while (keep_working(gcwq)); 1930 } while (keep_working(gcwq));
1941 1931
1942 worker_set_flags(worker, WORKER_PREP, false); 1932 worker_set_flags(worker, WORKER_PREP, false);
1943 sleep: 1933 sleep:
1944 if (unlikely(need_to_manage_workers(gcwq)) && manage_workers(worker)) 1934 if (unlikely(need_to_manage_workers(gcwq)) && manage_workers(worker))
1945 goto recheck; 1935 goto recheck;
1946 1936
1947 /* 1937 /*
1948 * gcwq->lock is held and there's no work to process and no 1938 * gcwq->lock is held and there's no work to process and no
1949 * need to manage, sleep. Workers are woken up only while 1939 * need to manage, sleep. Workers are woken up only while
1950 * holding gcwq->lock or from local cpu, so setting the 1940 * holding gcwq->lock or from local cpu, so setting the
1951 * current state before releasing gcwq->lock is enough to 1941 * current state before releasing gcwq->lock is enough to
1952 * prevent losing any event. 1942 * prevent losing any event.
1953 */ 1943 */
1954 worker_enter_idle(worker); 1944 worker_enter_idle(worker);
1955 __set_current_state(TASK_INTERRUPTIBLE); 1945 __set_current_state(TASK_INTERRUPTIBLE);
1956 spin_unlock_irq(&gcwq->lock); 1946 spin_unlock_irq(&gcwq->lock);
1957 schedule(); 1947 schedule();
1958 goto woke_up; 1948 goto woke_up;
1959 } 1949 }
1960 1950
1961 /** 1951 /**
1962 * rescuer_thread - the rescuer thread function 1952 * rescuer_thread - the rescuer thread function
1963 * @__wq: the associated workqueue 1953 * @__wq: the associated workqueue
1964 * 1954 *
1965 * Workqueue rescuer thread function. There's one rescuer for each 1955 * Workqueue rescuer thread function. There's one rescuer for each
1966 * workqueue which has WQ_RESCUER set. 1956 * workqueue which has WQ_RESCUER set.
1967 * 1957 *
1968 * Regular work processing on a gcwq may block trying to create a new 1958 * Regular work processing on a gcwq may block trying to create a new
1969 * worker which uses GFP_KERNEL allocation which has a slight chance of 1959 * worker which uses GFP_KERNEL allocation which has a slight chance of
1970 * developing into deadlock if some works currently on the same queue 1960 * developing into deadlock if some works currently on the same queue
1971 * need to be processed to satisfy the GFP_KERNEL allocation. This is 1961 * need to be processed to satisfy the GFP_KERNEL allocation. This is
1972 * the problem rescuer solves. 1962 * the problem rescuer solves.
1973 * 1963 *
1974 * When such condition is possible, the gcwq summons rescuers of all 1964 * When such condition is possible, the gcwq summons rescuers of all
1975 * workqueues which have works queued on the gcwq and lets them process 1965 * workqueues which have works queued on the gcwq and lets them process
1976 * those works so that forward progress can be guaranteed. 1966 * those works so that forward progress can be guaranteed.
1977 * 1967 *
1978 * This should happen rarely. 1968 * This should happen rarely.
1979 */ 1969 */
1980 static int rescuer_thread(void *__wq) 1970 static int rescuer_thread(void *__wq)
1981 { 1971 {
1982 struct workqueue_struct *wq = __wq; 1972 struct workqueue_struct *wq = __wq;
1983 struct worker *rescuer = wq->rescuer; 1973 struct worker *rescuer = wq->rescuer;
1984 struct list_head *scheduled = &rescuer->scheduled; 1974 struct list_head *scheduled = &rescuer->scheduled;
1985 bool is_unbound = wq->flags & WQ_UNBOUND; 1975 bool is_unbound = wq->flags & WQ_UNBOUND;
1986 unsigned int cpu; 1976 unsigned int cpu;
1987 1977
1988 set_user_nice(current, RESCUER_NICE_LEVEL); 1978 set_user_nice(current, RESCUER_NICE_LEVEL);
1989 repeat: 1979 repeat:
1990 set_current_state(TASK_INTERRUPTIBLE); 1980 set_current_state(TASK_INTERRUPTIBLE);
1991 1981
1992 if (kthread_should_stop()) 1982 if (kthread_should_stop())
1993 return 0; 1983 return 0;
1994 1984
1995 /* 1985 /*
1996 * See whether any cpu is asking for help. Unbound 1986 * See whether any cpu is asking for help. Unbound
1997 * workqueues use cpu 0 in mayday_mask for CPU_UNBOUND. 1987 * workqueues use cpu 0 in mayday_mask for CPU_UNBOUND.
1998 */ 1988 */
1999 for_each_mayday_cpu(cpu, wq->mayday_mask) { 1989 for_each_mayday_cpu(cpu, wq->mayday_mask) {
2000 unsigned int tcpu = is_unbound ? WORK_CPU_UNBOUND : cpu; 1990 unsigned int tcpu = is_unbound ? WORK_CPU_UNBOUND : cpu;
2001 struct cpu_workqueue_struct *cwq = get_cwq(tcpu, wq); 1991 struct cpu_workqueue_struct *cwq = get_cwq(tcpu, wq);
2002 struct global_cwq *gcwq = cwq->gcwq; 1992 struct global_cwq *gcwq = cwq->gcwq;
2003 struct work_struct *work, *n; 1993 struct work_struct *work, *n;
2004 1994
2005 __set_current_state(TASK_RUNNING); 1995 __set_current_state(TASK_RUNNING);
2006 mayday_clear_cpu(cpu, wq->mayday_mask); 1996 mayday_clear_cpu(cpu, wq->mayday_mask);
2007 1997
2008 /* migrate to the target cpu if possible */ 1998 /* migrate to the target cpu if possible */
2009 rescuer->gcwq = gcwq; 1999 rescuer->gcwq = gcwq;
2010 worker_maybe_bind_and_lock(rescuer); 2000 worker_maybe_bind_and_lock(rescuer);
2011 2001
2012 /* 2002 /*
2013 * Slurp in all works issued via this workqueue and 2003 * Slurp in all works issued via this workqueue and
2014 * process'em. 2004 * process'em.
2015 */ 2005 */
2016 BUG_ON(!list_empty(&rescuer->scheduled)); 2006 BUG_ON(!list_empty(&rescuer->scheduled));
2017 list_for_each_entry_safe(work, n, &gcwq->worklist, entry) 2007 list_for_each_entry_safe(work, n, &gcwq->worklist, entry)
2018 if (get_work_cwq(work) == cwq) 2008 if (get_work_cwq(work) == cwq)
2019 move_linked_works(work, scheduled, &n); 2009 move_linked_works(work, scheduled, &n);
2020 2010
2021 process_scheduled_works(rescuer); 2011 process_scheduled_works(rescuer);
2022 spin_unlock_irq(&gcwq->lock); 2012 spin_unlock_irq(&gcwq->lock);
2023 } 2013 }
2024 2014
2025 schedule(); 2015 schedule();
2026 goto repeat; 2016 goto repeat;
2027 } 2017 }
2028 2018
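A rescuer only exists for workqueues that ask for one; with this series that is requested through WQ_MEM_RECLAIM. A hedged sketch (myfs_wq is an invented name) of creating a workqueue that must keep making forward progress under memory pressure:

#include <linux/workqueue.h>
#include <linux/errno.h>

static struct workqueue_struct *myfs_wq;        /* hypothetical writeback helper */

static int myfs_create_wq(void)
{
        /*
         * WQ_MEM_RECLAIM guarantees at least one execution context (the
         * rescuer) even when new worker threads cannot be created, which
         * is what anything sitting in the memory reclaim path needs.
         * A max_active of 0 selects the default limit.
         */
        myfs_wq = alloc_workqueue("myfs", WQ_MEM_RECLAIM, 0);
        if (!myfs_wq)
                return -ENOMEM;
        return 0;
}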
2029 struct wq_barrier { 2019 struct wq_barrier {
2030 struct work_struct work; 2020 struct work_struct work;
2031 struct completion done; 2021 struct completion done;
2032 }; 2022 };
2033 2023
2034 static void wq_barrier_func(struct work_struct *work) 2024 static void wq_barrier_func(struct work_struct *work)
2035 { 2025 {
2036 struct wq_barrier *barr = container_of(work, struct wq_barrier, work); 2026 struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
2037 complete(&barr->done); 2027 complete(&barr->done);
2038 } 2028 }
2039 2029
2040 /** 2030 /**
2041 * insert_wq_barrier - insert a barrier work 2031 * insert_wq_barrier - insert a barrier work
2042 * @cwq: cwq to insert barrier into 2032 * @cwq: cwq to insert barrier into
2043 * @barr: wq_barrier to insert 2033 * @barr: wq_barrier to insert
2044 * @target: target work to attach @barr to 2034 * @target: target work to attach @barr to
2045 * @worker: worker currently executing @target, NULL if @target is not executing 2035 * @worker: worker currently executing @target, NULL if @target is not executing
2046 * 2036 *
2047 * @barr is linked to @target such that @barr is completed only after 2037 * @barr is linked to @target such that @barr is completed only after
2048 * @target finishes execution. Please note that the ordering 2038 * @target finishes execution. Please note that the ordering
2049 * guarantee is observed only with respect to @target and on the local 2039 * guarantee is observed only with respect to @target and on the local
2050 * cpu. 2040 * cpu.
2051 * 2041 *
2052 * Currently, a queued barrier can't be canceled. This is because 2042 * Currently, a queued barrier can't be canceled. This is because
2053 * try_to_grab_pending() can't determine whether the work to be 2043 * try_to_grab_pending() can't determine whether the work to be
2054 * grabbed is at the head of the queue and thus can't clear LINKED 2044 * grabbed is at the head of the queue and thus can't clear LINKED
2055 * flag of the previous work while there must be a valid next work 2045 * flag of the previous work while there must be a valid next work
2056 * after a work with LINKED flag set. 2046 * after a work with LINKED flag set.
2057 * 2047 *
2058 * Note that when @worker is non-NULL, @target may be modified 2048 * Note that when @worker is non-NULL, @target may be modified
2059 * underneath us, so we can't reliably determine cwq from @target. 2049 * underneath us, so we can't reliably determine cwq from @target.
2060 * 2050 *
2061 * CONTEXT: 2051 * CONTEXT:
2062 * spin_lock_irq(gcwq->lock). 2052 * spin_lock_irq(gcwq->lock).
2063 */ 2053 */
2064 static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, 2054 static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
2065 struct wq_barrier *barr, 2055 struct wq_barrier *barr,
2066 struct work_struct *target, struct worker *worker) 2056 struct work_struct *target, struct worker *worker)
2067 { 2057 {
2068 struct list_head *head; 2058 struct list_head *head;
2069 unsigned int linked = 0; 2059 unsigned int linked = 0;
2070 2060
2071 /* 2061 /*
2072 * debugobject calls are safe here even with gcwq->lock locked 2062 * debugobject calls are safe here even with gcwq->lock locked
2073 * as we know for sure that this will not trigger any of the 2063 * as we know for sure that this will not trigger any of the
2074 * checks and call back into the fixup functions where we 2064 * checks and call back into the fixup functions where we
2075 * might deadlock. 2065 * might deadlock.
2076 */ 2066 */
2077 INIT_WORK_ON_STACK(&barr->work, wq_barrier_func); 2067 INIT_WORK_ON_STACK(&barr->work, wq_barrier_func);
2078 __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work)); 2068 __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
2079 init_completion(&barr->done); 2069 init_completion(&barr->done);
2080 2070
2081 /* 2071 /*
2082 * If @target is currently being executed, schedule the 2072 * If @target is currently being executed, schedule the
2083 * barrier to the worker; otherwise, put it after @target. 2073 * barrier to the worker; otherwise, put it after @target.
2084 */ 2074 */
2085 if (worker) 2075 if (worker)
2086 head = worker->scheduled.next; 2076 head = worker->scheduled.next;
2087 else { 2077 else {
2088 unsigned long *bits = work_data_bits(target); 2078 unsigned long *bits = work_data_bits(target);
2089 2079
2090 head = target->entry.next; 2080 head = target->entry.next;
2091 /* there can already be other linked works, inherit and set */ 2081 /* there can already be other linked works, inherit and set */
2092 linked = *bits & WORK_STRUCT_LINKED; 2082 linked = *bits & WORK_STRUCT_LINKED;
2093 __set_bit(WORK_STRUCT_LINKED_BIT, bits); 2083 __set_bit(WORK_STRUCT_LINKED_BIT, bits);
2094 } 2084 }
2095 2085
2096 debug_work_activate(&barr->work); 2086 debug_work_activate(&barr->work);
2097 insert_work(cwq, &barr->work, head, 2087 insert_work(cwq, &barr->work, head,
2098 work_color_to_flags(WORK_NO_COLOR) | linked); 2088 work_color_to_flags(WORK_NO_COLOR) | linked);
2099 } 2089 }
2100 2090
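insert_wq_barrier() above boils down to queueing a dummy work whose only job is to signal a completion once its turn comes. The same on-stack work plus completion pattern is useful on its own when a submitter wants to wait for one specific deferred job; a hypothetical sketch (struct one_shot and run_one_shot are invented, not kernel APIs):

#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/kernel.h>
#include <linux/errno.h>

struct one_shot {
        struct work_struct work;
        struct completion done;
        int result;
};

static void one_shot_fn(struct work_struct *work)
{
        struct one_shot *os = container_of(work, struct one_shot, work);

        os->result = 0;         /* the actual deferred job would run here */
        complete(&os->done);    /* wake the submitter sleeping below */
}

/* Queue one job on @wq and sleep until that particular job has run. */
static int run_one_shot(struct workqueue_struct *wq)
{
        struct one_shot os;

        INIT_WORK_ON_STACK(&os.work, one_shot_fn);
        init_completion(&os.done);
        os.result = -EAGAIN;

        queue_work(wq, &os.work);
        wait_for_completion(&os.done);
        destroy_work_on_stack(&os.work);
        return os.result;
}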
2101 /** 2091 /**
2102 * flush_workqueue_prep_cwqs - prepare cwqs for workqueue flushing 2092 * flush_workqueue_prep_cwqs - prepare cwqs for workqueue flushing
2103 * @wq: workqueue being flushed 2093 * @wq: workqueue being flushed
2104 * @flush_color: new flush color, < 0 for no-op 2094 * @flush_color: new flush color, < 0 for no-op
2105 * @work_color: new work color, < 0 for no-op 2095 * @work_color: new work color, < 0 for no-op
2106 * 2096 *
2107 * Prepare cwqs for workqueue flushing. 2097 * Prepare cwqs for workqueue flushing.
2108 * 2098 *
2109 * If @flush_color is non-negative, flush_color on all cwqs should be 2099 * If @flush_color is non-negative, flush_color on all cwqs should be
2110 * -1. If no cwq has in-flight commands at the specified color, all 2100 * -1. If no cwq has in-flight commands at the specified color, all
2111 * cwq->flush_color's stay at -1 and %false is returned. If any cwq 2101 * cwq->flush_color's stay at -1 and %false is returned. If any cwq
2112 * has in flight commands, its cwq->flush_color is set to 2102 * has in flight commands, its cwq->flush_color is set to
2113 * @flush_color, @wq->nr_cwqs_to_flush is updated accordingly, cwq 2103 * @flush_color, @wq->nr_cwqs_to_flush is updated accordingly, cwq
2114 * wakeup logic is armed and %true is returned. 2104 * wakeup logic is armed and %true is returned.
2115 * 2105 *
2116 * The caller should have initialized @wq->first_flusher prior to 2106 * The caller should have initialized @wq->first_flusher prior to
2117 * calling this function with non-negative @flush_color. If 2107 * calling this function with non-negative @flush_color. If
2118 * @flush_color is negative, no flush color update is done and %false 2108 * @flush_color is negative, no flush color update is done and %false
2119 * is returned. 2109 * is returned.
2120 * 2110 *
2121 * If @work_color is non-negative, all cwqs should have the same 2111 * If @work_color is non-negative, all cwqs should have the same
2122 * work_color which is previous to @work_color and all will be 2112 * work_color which is previous to @work_color and all will be
2123 * advanced to @work_color. 2113 * advanced to @work_color.
2124 * 2114 *
2125 * CONTEXT: 2115 * CONTEXT:
2126 * mutex_lock(wq->flush_mutex). 2116 * mutex_lock(wq->flush_mutex).
2127 * 2117 *
2128 * RETURNS: 2118 * RETURNS:
2129 * %true if @flush_color >= 0 and there's something to flush. %false 2119 * %true if @flush_color >= 0 and there's something to flush. %false
2130 * otherwise. 2120 * otherwise.
2131 */ 2121 */
2132 static bool flush_workqueue_prep_cwqs(struct workqueue_struct *wq, 2122 static bool flush_workqueue_prep_cwqs(struct workqueue_struct *wq,
2133 int flush_color, int work_color) 2123 int flush_color, int work_color)
2134 { 2124 {
2135 bool wait = false; 2125 bool wait = false;
2136 unsigned int cpu; 2126 unsigned int cpu;
2137 2127
2138 if (flush_color >= 0) { 2128 if (flush_color >= 0) {
2139 BUG_ON(atomic_read(&wq->nr_cwqs_to_flush)); 2129 BUG_ON(atomic_read(&wq->nr_cwqs_to_flush));
2140 atomic_set(&wq->nr_cwqs_to_flush, 1); 2130 atomic_set(&wq->nr_cwqs_to_flush, 1);
2141 } 2131 }
2142 2132
2143 for_each_cwq_cpu(cpu, wq) { 2133 for_each_cwq_cpu(cpu, wq) {
2144 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 2134 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
2145 struct global_cwq *gcwq = cwq->gcwq; 2135 struct global_cwq *gcwq = cwq->gcwq;
2146 2136
2147 spin_lock_irq(&gcwq->lock); 2137 spin_lock_irq(&gcwq->lock);
2148 2138
2149 if (flush_color >= 0) { 2139 if (flush_color >= 0) {
2150 BUG_ON(cwq->flush_color != -1); 2140 BUG_ON(cwq->flush_color != -1);
2151 2141
2152 if (cwq->nr_in_flight[flush_color]) { 2142 if (cwq->nr_in_flight[flush_color]) {
2153 cwq->flush_color = flush_color; 2143 cwq->flush_color = flush_color;
2154 atomic_inc(&wq->nr_cwqs_to_flush); 2144 atomic_inc(&wq->nr_cwqs_to_flush);
2155 wait = true; 2145 wait = true;
2156 } 2146 }
2157 } 2147 }
2158 2148
2159 if (work_color >= 0) { 2149 if (work_color >= 0) {
2160 BUG_ON(work_color != work_next_color(cwq->work_color)); 2150 BUG_ON(work_color != work_next_color(cwq->work_color));
2161 cwq->work_color = work_color; 2151 cwq->work_color = work_color;
2162 } 2152 }
2163 2153
2164 spin_unlock_irq(&gcwq->lock); 2154 spin_unlock_irq(&gcwq->lock);
2165 } 2155 }
2166 2156
2167 if (flush_color >= 0 && atomic_dec_and_test(&wq->nr_cwqs_to_flush)) 2157 if (flush_color >= 0 && atomic_dec_and_test(&wq->nr_cwqs_to_flush))
2168 complete(&wq->first_flusher->done); 2158 complete(&wq->first_flusher->done);
2169 2159
2170 return wait; 2160 return wait;
2171 } 2161 }
2172 2162
2173 /** 2163 /**
2174 * flush_workqueue - ensure that any scheduled work has run to completion. 2164 * flush_workqueue - ensure that any scheduled work has run to completion.
2175 * @wq: workqueue to flush 2165 * @wq: workqueue to flush
2176 * 2166 *
2177 * Forces execution of the workqueue and blocks until its completion. 2167 * Forces execution of the workqueue and blocks until its completion.
2178 * This is typically used in driver shutdown handlers. 2168 * This is typically used in driver shutdown handlers.
2179 * 2169 *
2180 * We sleep until all works which were queued on entry have been handled, 2170 * We sleep until all works which were queued on entry have been handled,
2181 * but we are not livelocked by new incoming ones. 2171 * but we are not livelocked by new incoming ones.
2182 */ 2172 */
2183 void flush_workqueue(struct workqueue_struct *wq) 2173 void flush_workqueue(struct workqueue_struct *wq)
2184 { 2174 {
2185 struct wq_flusher this_flusher = { 2175 struct wq_flusher this_flusher = {
2186 .list = LIST_HEAD_INIT(this_flusher.list), 2176 .list = LIST_HEAD_INIT(this_flusher.list),
2187 .flush_color = -1, 2177 .flush_color = -1,
2188 .done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done), 2178 .done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done),
2189 }; 2179 };
2190 int next_color; 2180 int next_color;
2191 2181
2192 lock_map_acquire(&wq->lockdep_map); 2182 lock_map_acquire(&wq->lockdep_map);
2193 lock_map_release(&wq->lockdep_map); 2183 lock_map_release(&wq->lockdep_map);
2194 2184
2195 mutex_lock(&wq->flush_mutex); 2185 mutex_lock(&wq->flush_mutex);
2196 2186
2197 /* 2187 /*
2198 * Start-to-wait phase 2188 * Start-to-wait phase
2199 */ 2189 */
2200 next_color = work_next_color(wq->work_color); 2190 next_color = work_next_color(wq->work_color);
2201 2191
2202 if (next_color != wq->flush_color) { 2192 if (next_color != wq->flush_color) {
2203 /* 2193 /*
2204 * Color space is not full. The current work_color 2194 * Color space is not full. The current work_color
2205 * becomes our flush_color and work_color is advanced 2195 * becomes our flush_color and work_color is advanced
2206 * by one. 2196 * by one.
2207 */ 2197 */
2208 BUG_ON(!list_empty(&wq->flusher_overflow)); 2198 BUG_ON(!list_empty(&wq->flusher_overflow));
2209 this_flusher.flush_color = wq->work_color; 2199 this_flusher.flush_color = wq->work_color;
2210 wq->work_color = next_color; 2200 wq->work_color = next_color;
2211 2201
2212 if (!wq->first_flusher) { 2202 if (!wq->first_flusher) {
2213 /* no flush in progress, become the first flusher */ 2203 /* no flush in progress, become the first flusher */
2214 BUG_ON(wq->flush_color != this_flusher.flush_color); 2204 BUG_ON(wq->flush_color != this_flusher.flush_color);
2215 2205
2216 wq->first_flusher = &this_flusher; 2206 wq->first_flusher = &this_flusher;
2217 2207
2218 if (!flush_workqueue_prep_cwqs(wq, wq->flush_color, 2208 if (!flush_workqueue_prep_cwqs(wq, wq->flush_color,
2219 wq->work_color)) { 2209 wq->work_color)) {
2220 /* nothing to flush, done */ 2210 /* nothing to flush, done */
2221 wq->flush_color = next_color; 2211 wq->flush_color = next_color;
2222 wq->first_flusher = NULL; 2212 wq->first_flusher = NULL;
2223 goto out_unlock; 2213 goto out_unlock;
2224 } 2214 }
2225 } else { 2215 } else {
2226 /* wait in queue */ 2216 /* wait in queue */
2227 BUG_ON(wq->flush_color == this_flusher.flush_color); 2217 BUG_ON(wq->flush_color == this_flusher.flush_color);
2228 list_add_tail(&this_flusher.list, &wq->flusher_queue); 2218 list_add_tail(&this_flusher.list, &wq->flusher_queue);
2229 flush_workqueue_prep_cwqs(wq, -1, wq->work_color); 2219 flush_workqueue_prep_cwqs(wq, -1, wq->work_color);
2230 } 2220 }
2231 } else { 2221 } else {
2232 /* 2222 /*
2233 * Oops, color space is full, wait on overflow queue. 2223 * Oops, color space is full, wait on overflow queue.
2234 * The next flush completion will assign us 2224 * The next flush completion will assign us
2235 * flush_color and transfer to flusher_queue. 2225 * flush_color and transfer to flusher_queue.
2236 */ 2226 */
2237 list_add_tail(&this_flusher.list, &wq->flusher_overflow); 2227 list_add_tail(&this_flusher.list, &wq->flusher_overflow);
2238 } 2228 }
2239 2229
2240 mutex_unlock(&wq->flush_mutex); 2230 mutex_unlock(&wq->flush_mutex);
2241 2231
2242 wait_for_completion(&this_flusher.done); 2232 wait_for_completion(&this_flusher.done);
2243 2233
2244 /* 2234 /*
2245 * Wake-up-and-cascade phase 2235 * Wake-up-and-cascade phase
2246 * 2236 *
2247 * First flushers are responsible for cascading flushes and 2237 * First flushers are responsible for cascading flushes and
2248 * handling overflow. Non-first flushers can simply return. 2238 * handling overflow. Non-first flushers can simply return.
2249 */ 2239 */
2250 if (wq->first_flusher != &this_flusher) 2240 if (wq->first_flusher != &this_flusher)
2251 return; 2241 return;
2252 2242
2253 mutex_lock(&wq->flush_mutex); 2243 mutex_lock(&wq->flush_mutex);
2254 2244
2255 /* we might have raced, check again with mutex held */ 2245 /* we might have raced, check again with mutex held */
2256 if (wq->first_flusher != &this_flusher) 2246 if (wq->first_flusher != &this_flusher)
2257 goto out_unlock; 2247 goto out_unlock;
2258 2248
2259 wq->first_flusher = NULL; 2249 wq->first_flusher = NULL;
2260 2250
2261 BUG_ON(!list_empty(&this_flusher.list)); 2251 BUG_ON(!list_empty(&this_flusher.list));
2262 BUG_ON(wq->flush_color != this_flusher.flush_color); 2252 BUG_ON(wq->flush_color != this_flusher.flush_color);
2263 2253
2264 while (true) { 2254 while (true) {
2265 struct wq_flusher *next, *tmp; 2255 struct wq_flusher *next, *tmp;
2266 2256
2267 /* complete all the flushers sharing the current flush color */ 2257 /* complete all the flushers sharing the current flush color */
2268 list_for_each_entry_safe(next, tmp, &wq->flusher_queue, list) { 2258 list_for_each_entry_safe(next, tmp, &wq->flusher_queue, list) {
2269 if (next->flush_color != wq->flush_color) 2259 if (next->flush_color != wq->flush_color)
2270 break; 2260 break;
2271 list_del_init(&next->list); 2261 list_del_init(&next->list);
2272 complete(&next->done); 2262 complete(&next->done);
2273 } 2263 }
2274 2264
2275 BUG_ON(!list_empty(&wq->flusher_overflow) && 2265 BUG_ON(!list_empty(&wq->flusher_overflow) &&
2276 wq->flush_color != work_next_color(wq->work_color)); 2266 wq->flush_color != work_next_color(wq->work_color));
2277 2267
2278 /* this flush_color is finished, advance by one */ 2268 /* this flush_color is finished, advance by one */
2279 wq->flush_color = work_next_color(wq->flush_color); 2269 wq->flush_color = work_next_color(wq->flush_color);
2280 2270
2281 /* one color has been freed, handle overflow queue */ 2271 /* one color has been freed, handle overflow queue */
2282 if (!list_empty(&wq->flusher_overflow)) { 2272 if (!list_empty(&wq->flusher_overflow)) {
2283 /* 2273 /*
2284 * Assign the same color to all overflowed 2274 * Assign the same color to all overflowed
2285 * flushers, advance work_color and append to 2275 * flushers, advance work_color and append to
2286 * flusher_queue. This is the start-to-wait 2276 * flusher_queue. This is the start-to-wait
2287 * phase for these overflowed flushers. 2277 * phase for these overflowed flushers.
2288 */ 2278 */
2289 list_for_each_entry(tmp, &wq->flusher_overflow, list) 2279 list_for_each_entry(tmp, &wq->flusher_overflow, list)
2290 tmp->flush_color = wq->work_color; 2280 tmp->flush_color = wq->work_color;
2291 2281
2292 wq->work_color = work_next_color(wq->work_color); 2282 wq->work_color = work_next_color(wq->work_color);
2293 2283
2294 list_splice_tail_init(&wq->flusher_overflow, 2284 list_splice_tail_init(&wq->flusher_overflow,
2295 &wq->flusher_queue); 2285 &wq->flusher_queue);
2296 flush_workqueue_prep_cwqs(wq, -1, wq->work_color); 2286 flush_workqueue_prep_cwqs(wq, -1, wq->work_color);
2297 } 2287 }
2298 2288
2299 if (list_empty(&wq->flusher_queue)) { 2289 if (list_empty(&wq->flusher_queue)) {
2300 BUG_ON(wq->flush_color != wq->work_color); 2290 BUG_ON(wq->flush_color != wq->work_color);
2301 break; 2291 break;
2302 } 2292 }
2303 2293
2304 /* 2294 /*
2305 * Need to flush more colors. Make the next flusher 2295 * Need to flush more colors. Make the next flusher
2306 * the new first flusher and arm cwqs. 2296 * the new first flusher and arm cwqs.
2307 */ 2297 */
2308 BUG_ON(wq->flush_color == wq->work_color); 2298 BUG_ON(wq->flush_color == wq->work_color);
2309 BUG_ON(wq->flush_color != next->flush_color); 2299 BUG_ON(wq->flush_color != next->flush_color);
2310 2300
2311 list_del_init(&next->list); 2301 list_del_init(&next->list);
2312 wq->first_flusher = next; 2302 wq->first_flusher = next;
2313 2303
2314 if (flush_workqueue_prep_cwqs(wq, wq->flush_color, -1)) 2304 if (flush_workqueue_prep_cwqs(wq, wq->flush_color, -1))
2315 break; 2305 break;
2316 2306
2317 /* 2307 /*
2318 * Meh... this color is already done, clear first 2308 * Meh... this color is already done, clear first
2319 * flusher and repeat cascading. 2309 * flusher and repeat cascading.
2320 */ 2310 */
2321 wq->first_flusher = NULL; 2311 wq->first_flusher = NULL;
2322 } 2312 }
2323 2313
2324 out_unlock: 2314 out_unlock:
2325 mutex_unlock(&wq->flush_mutex); 2315 mutex_unlock(&wq->flush_mutex);
2326 } 2316 }
2327 EXPORT_SYMBOL_GPL(flush_workqueue); 2317 EXPORT_SYMBOL_GPL(flush_workqueue);
2328 2318
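flush_workqueue() waits only for work that is already queued when it is called, so a teardown path has to stop new submissions first. A hedged shutdown sketch for a hypothetical driver-owned workqueue (mydrv_wq is invented):

#include <linux/workqueue.h>

static struct workqueue_struct *mydrv_wq;       /* hypothetical, created at probe */

static void mydrv_shutdown(void)
{
        /*
         * The caller has already stopped the sources of new work
         * (interrupts, timers, sysfs hooks, ...).  Wait for everything
         * that was queued before that point, then tear the wq down.
         */
        flush_workqueue(mydrv_wq);
        destroy_workqueue(mydrv_wq);
}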
2329 /** 2319 static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
2330 * flush_work - block until a work_struct's callback has terminated 2320 bool wait_executing)
2331 * @work: the work which is to be flushed
2332 *
2333 * Returns false if @work has already terminated.
2334 *
2335 * It is expected that, prior to calling flush_work(), the caller has
2336 * arranged for the work to not be requeued, otherwise it doesn't make
2337 * sense to use this function.
2338 */
2339 int flush_work(struct work_struct *work)
2340 { 2321 {
2341 struct worker *worker = NULL; 2322 struct worker *worker = NULL;
2342 struct global_cwq *gcwq; 2323 struct global_cwq *gcwq;
2343 struct cpu_workqueue_struct *cwq; 2324 struct cpu_workqueue_struct *cwq;
2344 struct wq_barrier barr;
2345 2325
2346 might_sleep(); 2326 might_sleep();
2347 gcwq = get_work_gcwq(work); 2327 gcwq = get_work_gcwq(work);
2348 if (!gcwq) 2328 if (!gcwq)
2349 return 0; 2329 return false;
2350 2330
2351 spin_lock_irq(&gcwq->lock); 2331 spin_lock_irq(&gcwq->lock);
2352 if (!list_empty(&work->entry)) { 2332 if (!list_empty(&work->entry)) {
2353 /* 2333 /*
2354 * See the comment near try_to_grab_pending()->smp_rmb(). 2334 * See the comment near try_to_grab_pending()->smp_rmb().
2355 * If it was re-queued to a different gcwq under us, we 2335 * If it was re-queued to a different gcwq under us, we
2356 * are not going to wait. 2336 * are not going to wait.
2357 */ 2337 */
2358 smp_rmb(); 2338 smp_rmb();
2359 cwq = get_work_cwq(work); 2339 cwq = get_work_cwq(work);
2360 if (unlikely(!cwq || gcwq != cwq->gcwq)) 2340 if (unlikely(!cwq || gcwq != cwq->gcwq))
2361 goto already_gone; 2341 goto already_gone;
2362 } else { 2342 } else if (wait_executing) {
2363 worker = find_worker_executing_work(gcwq, work); 2343 worker = find_worker_executing_work(gcwq, work);
2364 if (!worker) 2344 if (!worker)
2365 goto already_gone; 2345 goto already_gone;
2366 cwq = worker->current_cwq; 2346 cwq = worker->current_cwq;
2367 } 2347 } else
2348 goto already_gone;
2368 2349
2369 insert_wq_barrier(cwq, &barr, work, worker); 2350 insert_wq_barrier(cwq, barr, work, worker);
2370 spin_unlock_irq(&gcwq->lock); 2351 spin_unlock_irq(&gcwq->lock);
2371 2352
2372 lock_map_acquire(&cwq->wq->lockdep_map); 2353 lock_map_acquire(&cwq->wq->lockdep_map);
2373 lock_map_release(&cwq->wq->lockdep_map); 2354 lock_map_release(&cwq->wq->lockdep_map);
2374 2355 return true;
2375 wait_for_completion(&barr.done);
2376 destroy_work_on_stack(&barr.work);
2377 return 1;
2378 already_gone: 2356 already_gone:
2379 spin_unlock_irq(&gcwq->lock); 2357 spin_unlock_irq(&gcwq->lock);
2380 return 0; 2358 return false;
2381 } 2359 }
2360
2361 /**
2362 * flush_work - wait for a work to finish executing the last queueing instance
2363 * @work: the work to flush
2364 *
2365 * Wait until @work has finished execution. This function considers
2366 * only the last queueing instance of @work. If @work has been
2367 * enqueued across different CPUs on a non-reentrant workqueue or on
2368 * multiple workqueues, @work might still be executing on return on
2369 * some of the CPUs from earlier queueing.
2370 *
2371 * If @work was queued only on a non-reentrant, ordered or unbound
2372 * workqueue, @work is guaranteed to be idle on return if it hasn't
2373 * been requeued since flush started.
2374 *
2375 * RETURNS:
2376 * %true if flush_work() waited for the work to finish execution,
2377 * %false if it was already idle.
2378 */
2379 bool flush_work(struct work_struct *work)
2380 {
2381 struct wq_barrier barr;
2382
2383 if (start_flush_work(work, &barr, true)) {
2384 wait_for_completion(&barr.done);
2385 destroy_work_on_stack(&barr.work);
2386 return true;
2387 } else
2388 return false;
2389 }
2382 EXPORT_SYMBOL_GPL(flush_work); 2390 EXPORT_SYMBOL_GPL(flush_work);
2383 2391
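A hedged sketch of the typical flush_work() call site, where re-queueing has already been prevented and the caller only needs the last queued instance to finish before touching shared state (struct mydev and its field are invented):

#include <linux/workqueue.h>
#include <linux/kernel.h>

struct mydev {                          /* hypothetical device */
        struct work_struct reset_work;
};

static void mydev_stop(struct mydev *dev)
{
        /*
         * Interrupts are already disabled, so reset_work cannot be
         * re-queued; waiting for the last queueing instance is enough.
         */
        if (flush_work(&dev->reset_work))
                pr_debug("reset_work was still pending or running\n");
}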
2392 static bool wait_on_cpu_work(struct global_cwq *gcwq, struct work_struct *work)
2393 {
2394 struct wq_barrier barr;
2395 struct worker *worker;
2396
2397 spin_lock_irq(&gcwq->lock);
2398
2399 worker = find_worker_executing_work(gcwq, work);
2400 if (unlikely(worker))
2401 insert_wq_barrier(worker->current_cwq, &barr, work, worker);
2402
2403 spin_unlock_irq(&gcwq->lock);
2404
2405 if (unlikely(worker)) {
2406 wait_for_completion(&barr.done);
2407 destroy_work_on_stack(&barr.work);
2408 return true;
2409 } else
2410 return false;
2411 }
2412
2413 static bool wait_on_work(struct work_struct *work)
2414 {
2415 bool ret = false;
2416 int cpu;
2417
2418 might_sleep();
2419
2420 lock_map_acquire(&work->lockdep_map);
2421 lock_map_release(&work->lockdep_map);
2422
2423 for_each_gcwq_cpu(cpu)
2424 ret |= wait_on_cpu_work(get_gcwq(cpu), work);
2425 return ret;
2426 }
2427
2428 /**
2429 * flush_work_sync - wait until a work has finished execution
2430 * @work: the work to flush
2431 *
2432 * Wait until @work has finished execution. On return, it's
2433 * guaranteed that all queueing instances of @work which happened
2434 * before this function is called are finished. In other words, if
2435 * @work hasn't been requeued since this function was called, @work is
2436 * guaranteed to be idle on return.
2437 *
2438 * RETURNS:
2439 * %true if flush_work_sync() waited for the work to finish execution,
2440 * %false if it was already idle.
2441 */
2442 bool flush_work_sync(struct work_struct *work)
2443 {
2444 struct wq_barrier barr;
2445 bool pending, waited;
2446
2447 /* we'll wait for executions separately, queue barr only if pending */
2448 pending = start_flush_work(work, &barr, false);
2449
2450 /* wait for executions to finish */
2451 waited = wait_on_work(work);
2452
2453 /* wait for the pending one */
2454 if (pending) {
2455 wait_for_completion(&barr.done);
2456 destroy_work_on_stack(&barr.work);
2457 }
2458
2459 return pending || waited;
2460 }
2461 EXPORT_SYMBOL_GPL(flush_work_sync);
2462
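The difference from flush_work() is easiest to see with a work item that may be queued from several CPUs onto a reentrant workqueue; a hedged sketch (the refresh_* names are invented):

#include <linux/workqueue.h>

static void refresh_fn(struct work_struct *work)
{
        /* recompute some cached state; may be queued from any CPU */
}

static DECLARE_WORK(refresh_work, refresh_fn);

static void wait_for_all_refreshes(void)
{
        /*
         * flush_work() would only wait for the last queueing instance;
         * flush_work_sync() also waits for instances started by earlier
         * queueings that may still be running on other CPUs.
         */
        flush_work_sync(&refresh_work);
}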
2384 /* 2463 /*
2385 * Upon a successful return (>= 0), the caller "owns" WORK_STRUCT_PENDING bit, 2464 * Upon a successful return (>= 0), the caller "owns" WORK_STRUCT_PENDING bit,
2386 * so this work can't be re-armed in any way. 2465 * so this work can't be re-armed in any way.
2387 */ 2466 */
2388 static int try_to_grab_pending(struct work_struct *work) 2467 static int try_to_grab_pending(struct work_struct *work)
2389 { 2468 {
2390 struct global_cwq *gcwq; 2469 struct global_cwq *gcwq;
2391 int ret = -1; 2470 int ret = -1;
2392 2471
2393 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) 2472 if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
2394 return 0; 2473 return 0;
2395 2474
2396 /* 2475 /*
2397 * The queueing is in progress, or it is already queued. Try to 2476 * The queueing is in progress, or it is already queued. Try to
2398 * steal it from ->worklist without clearing WORK_STRUCT_PENDING. 2477 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
2399 */ 2478 */
2400 gcwq = get_work_gcwq(work); 2479 gcwq = get_work_gcwq(work);
2401 if (!gcwq) 2480 if (!gcwq)
2402 return ret; 2481 return ret;
2403 2482
2404 spin_lock_irq(&gcwq->lock); 2483 spin_lock_irq(&gcwq->lock);
2405 if (!list_empty(&work->entry)) { 2484 if (!list_empty(&work->entry)) {
2406 /* 2485 /*
2407 * This work is queued, but perhaps we locked the wrong gcwq. 2486 * This work is queued, but perhaps we locked the wrong gcwq.
2408 * In that case we must see the new value after rmb(), see 2487 * In that case we must see the new value after rmb(), see
2409 * insert_work()->wmb(). 2488 * insert_work()->wmb().
2410 */ 2489 */
2411 smp_rmb(); 2490 smp_rmb();
2412 if (gcwq == get_work_gcwq(work)) { 2491 if (gcwq == get_work_gcwq(work)) {
2413 debug_work_deactivate(work); 2492 debug_work_deactivate(work);
2414 list_del_init(&work->entry); 2493 list_del_init(&work->entry);
2415 cwq_dec_nr_in_flight(get_work_cwq(work), 2494 cwq_dec_nr_in_flight(get_work_cwq(work),
2416 get_work_color(work), 2495 get_work_color(work),
2417 *work_data_bits(work) & WORK_STRUCT_DELAYED); 2496 *work_data_bits(work) & WORK_STRUCT_DELAYED);
2418 ret = 1; 2497 ret = 1;
2419 } 2498 }
2420 } 2499 }
2421 spin_unlock_irq(&gcwq->lock); 2500 spin_unlock_irq(&gcwq->lock);
2422 2501
2423 return ret; 2502 return ret;
2424 } 2503 }
2425 2504
2426 static void wait_on_cpu_work(struct global_cwq *gcwq, struct work_struct *work) 2505 static bool __cancel_work_timer(struct work_struct *work,
2427 {
2428 struct wq_barrier barr;
2429 struct worker *worker;
2430
2431 spin_lock_irq(&gcwq->lock);
2432
2433 worker = find_worker_executing_work(gcwq, work);
2434 if (unlikely(worker))
2435 insert_wq_barrier(worker->current_cwq, &barr, work, worker);
2436
2437 spin_unlock_irq(&gcwq->lock);
2438
2439 if (unlikely(worker)) {
2440 wait_for_completion(&barr.done);
2441 destroy_work_on_stack(&barr.work);
2442 }
2443 }
2444
2445 static void wait_on_work(struct work_struct *work)
2446 {
2447 int cpu;
2448
2449 might_sleep();
2450
2451 lock_map_acquire(&work->lockdep_map);
2452 lock_map_release(&work->lockdep_map);
2453
2454 for_each_gcwq_cpu(cpu)
2455 wait_on_cpu_work(get_gcwq(cpu), work);
2456 }
2457
2458 static int __cancel_work_timer(struct work_struct *work,
2459 struct timer_list* timer) 2506 struct timer_list* timer)
2460 { 2507 {
2461 int ret; 2508 int ret;
2462 2509
2463 do { 2510 do {
2464 ret = (timer && likely(del_timer(timer))); 2511 ret = (timer && likely(del_timer(timer)));
2465 if (!ret) 2512 if (!ret)
2466 ret = try_to_grab_pending(work); 2513 ret = try_to_grab_pending(work);
2467 wait_on_work(work); 2514 wait_on_work(work);
2468 } while (unlikely(ret < 0)); 2515 } while (unlikely(ret < 0));
2469 2516
2470 clear_work_data(work); 2517 clear_work_data(work);
2471 return ret; 2518 return ret;
2472 } 2519 }
2473 2520
2474 /** 2521 /**
2475 * cancel_work_sync - block until a work_struct's callback has terminated 2522 * cancel_work_sync - cancel a work and wait for it to finish
2476 * @work: the work which is to be flushed 2523 * @work: the work to cancel
2477 * 2524 *
2478 * Returns true if @work was pending. 2525 * Cancel @work and wait for its execution to finish. This function
2526 * can be used even if the work re-queues itself or migrates to
2527 * another workqueue. On return from this function, @work is
2528 * guaranteed to be not pending or executing on any CPU.
2479 * 2529 *
2480 * cancel_work_sync() will cancel the work if it is queued. If the work's 2530 * cancel_work_sync(&delayed_work->work) must not be used for
2481 * callback appears to be running, cancel_work_sync() will block until it 2531 * delayed_work's. Use cancel_delayed_work_sync() instead.
2482 * has completed.
2483 * 2532 *
2484 * It is possible to use this function if the work re-queues itself. It can 2533 * The caller must ensure that the workqueue on which @work was last
2485 * cancel the work even if it migrates to another workqueue, however in that
2486 * case it only guarantees that work->func() has completed on the last queued
2487 * workqueue.
2488 *
2489 * cancel_work_sync(&delayed_work->work) should be used only if ->timer is not
2490 * pending, otherwise it goes into a busy-wait loop until the timer expires.
2491 *
2492 * The caller must ensure that workqueue_struct on which this work was last
2493 * queued can't be destroyed before this function returns. 2534 * queued can't be destroyed before this function returns.
2535 *
2536 * RETURNS:
2537 * %true if @work was pending, %false otherwise.
2494 */ 2538 */
2495 int cancel_work_sync(struct work_struct *work) 2539 bool cancel_work_sync(struct work_struct *work)
2496 { 2540 {
2497 return __cancel_work_timer(work, NULL); 2541 return __cancel_work_timer(work, NULL);
2498 } 2542 }
2499 EXPORT_SYMBOL_GPL(cancel_work_sync); 2543 EXPORT_SYMBOL_GPL(cancel_work_sync);
2500 2544
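A hedged sketch of the case the comment above calls out, a work item that re-queues itself (the poll_* names are invented); cancel_work_sync() remains safe to call from the stop path:

#include <linux/workqueue.h>

static void poll_fn(struct work_struct *work);
static DECLARE_WORK(poll_work, poll_fn);

static void poll_fn(struct work_struct *work)
{
        /* ... poll the hardware ..., then keep ourselves going */
        schedule_work(&poll_work);
}

static void poll_stop(void)
{
        /*
         * Per the guarantee above: on return poll_work is neither
         * pending nor executing on any CPU, despite the self-requeue.
         */
        cancel_work_sync(&poll_work);
}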
2501 /** 2545 /**
2502 * cancel_delayed_work_sync - reliably kill off a delayed work. 2546 * flush_delayed_work - wait for a dwork to finish executing the last queueing
2503 * @dwork: the delayed work struct 2547 * @dwork: the delayed work to flush
2504 * 2548 *
2505 * Returns true if @dwork was pending. 2549 * Delayed timer is cancelled and the pending work is queued for
2550 * immediate execution. Like flush_work(), this function only
2551 * considers the last queueing instance of @dwork.
2506 * 2552 *
2507 * It is possible to use this function if @dwork rearms itself via queue_work() 2553 * RETURNS:
2508 * or queue_delayed_work(). See also the comment for cancel_work_sync(). 2554 * %true if flush_work() waited for the work to finish execution,
2555 * %false if it was already idle.
2509 */ 2556 */
2510 int cancel_delayed_work_sync(struct delayed_work *dwork) 2557 bool flush_delayed_work(struct delayed_work *dwork)
2511 { 2558 {
2559 if (del_timer_sync(&dwork->timer))
2560 __queue_work(raw_smp_processor_id(),
2561 get_work_cwq(&dwork->work)->wq, &dwork->work);
2562 return flush_work(&dwork->work);
2563 }
2564 EXPORT_SYMBOL(flush_delayed_work);
2565
2566 /**
2567 * flush_delayed_work_sync - wait for a dwork to finish
2568 * @dwork: the delayed work to flush
2569 *
2570 * Delayed timer is cancelled and the pending work is queued for
2571 * execution immediately. Other than timer handling, its behavior
2572 * is identical to flush_work_sync().
2573 *
2574 * RETURNS:
2575 * %true if flush_work_sync() waited for the work to finish execution,
2576 * %false if it was already idle.
2577 */
2578 bool flush_delayed_work_sync(struct delayed_work *dwork)
2579 {
2580 if (del_timer_sync(&dwork->timer))
2581 __queue_work(raw_smp_processor_id(),
2582 get_work_cwq(&dwork->work)->wq, &dwork->work);
2583 return flush_work_sync(&dwork->work);
2584 }
2585 EXPORT_SYMBOL(flush_delayed_work_sync);
2586
2587 /**
2588 * cancel_delayed_work_sync - cancel a delayed work and wait for it to finish
2589 * @dwork: the delayed work to cancel
2590 *
2591 * This is cancel_work_sync() for delayed works.
2592 *
2593 * RETURNS:
2594 * %true if @dwork was pending, %false otherwise.
2595 */
2596 bool cancel_delayed_work_sync(struct delayed_work *dwork)
2597 {
2512 return __cancel_work_timer(&dwork->work, &dwork->timer); 2598 return __cancel_work_timer(&dwork->work, &dwork->timer);
2513 } 2599 }
2514 EXPORT_SYMBOL(cancel_delayed_work_sync); 2600 EXPORT_SYMBOL(cancel_delayed_work_sync);
2515 2601
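And the delayed-work counterpart, stopping a periodic, self-rearming timer-driven work (the hb_* names are invented):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void hb_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(hb_work, hb_fn);

static void hb_fn(struct work_struct *work)
{
        /* ... send a heartbeat ..., then rearm roughly one second out */
        schedule_delayed_work(&hb_work, msecs_to_jiffies(1000));
}

static void hb_stop(void)
{
        /* kills both the timer and the work, even though hb_fn rearms */
        cancel_delayed_work_sync(&hb_work);
}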
2516 /** 2602 /**
2517 * schedule_work - put work task in global workqueue 2603 * schedule_work - put work task in global workqueue
2518 * @work: job to be done 2604 * @work: job to be done
2519 * 2605 *
2520 * Returns zero if @work was already on the kernel-global workqueue and 2606 * Returns zero if @work was already on the kernel-global workqueue and
2521 * non-zero otherwise. 2607 * non-zero otherwise.
2522 * 2608 *
2523 * This puts a job in the kernel-global workqueue if it was not already 2609 * This puts a job in the kernel-global workqueue if it was not already
2524 * queued and leaves it in the same position on the kernel-global 2610 * queued and leaves it in the same position on the kernel-global
2525 * workqueue otherwise. 2611 * workqueue otherwise.
2526 */ 2612 */
2527 int schedule_work(struct work_struct *work) 2613 int schedule_work(struct work_struct *work)
2528 { 2614 {
2529 return queue_work(system_wq, work); 2615 return queue_work(system_wq, work);
2530 } 2616 }
2531 EXPORT_SYMBOL(schedule_work); 2617 EXPORT_SYMBOL(schedule_work);
2532 2618
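The classic schedule_work() use is deferring the sleepable half of an interrupt handler onto the global workqueue; a hedged sketch (fast_half/slow_half are invented names):

#include <linux/workqueue.h>
#include <linux/interrupt.h>

static void slow_half(struct work_struct *work)
{
        /* may sleep: talk to the device, allocate memory, etc. */
}

static DECLARE_WORK(slow_work, slow_half);

static irqreturn_t fast_half(int irq, void *dev_id)
{
        /* ack the hardware here, then hand the sleepy part off */
        schedule_work(&slow_work);
        return IRQ_HANDLED;
}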
2533 /** 2619 /**
2534 * schedule_work_on - put work task on a specific cpu 2620 * schedule_work_on - put work task on a specific cpu
2535 * @cpu: cpu to put the work task on 2621 * @cpu: cpu to put the work task on
2536 * @work: job to be done 2622 * @work: job to be done
2537 * 2623 *
2538 * This puts a job on a specific cpu 2624 * This puts a job on a specific cpu
2539 */ 2625 */
2540 int schedule_work_on(int cpu, struct work_struct *work) 2626 int schedule_work_on(int cpu, struct work_struct *work)
2541 { 2627 {
2542 return queue_work_on(cpu, system_wq, work); 2628 return queue_work_on(cpu, system_wq, work);
2543 } 2629 }
2544 EXPORT_SYMBOL(schedule_work_on); 2630 EXPORT_SYMBOL(schedule_work_on);
2545 2631
2546 /** 2632 /**
2547 * schedule_delayed_work - put work task in global workqueue after delay 2633 * schedule_delayed_work - put work task in global workqueue after delay
2548 * @dwork: job to be done 2634 * @dwork: job to be done
2549 * @delay: number of jiffies to wait or 0 for immediate execution 2635 * @delay: number of jiffies to wait or 0 for immediate execution
2550 * 2636 *
2551 * After waiting for a given time this puts a job in the kernel-global 2637 * After waiting for a given time this puts a job in the kernel-global
2552 * workqueue. 2638 * workqueue.
2553 */ 2639 */
2554 int schedule_delayed_work(struct delayed_work *dwork, 2640 int schedule_delayed_work(struct delayed_work *dwork,
2555 unsigned long delay) 2641 unsigned long delay)
2556 { 2642 {
2557 return queue_delayed_work(system_wq, dwork, delay); 2643 return queue_delayed_work(system_wq, dwork, delay);
2558 } 2644 }
2559 EXPORT_SYMBOL(schedule_delayed_work); 2645 EXPORT_SYMBOL(schedule_delayed_work);
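A self-rearming polling sketch built on schedule_delayed_work() (illustrative; my_poll_fn, my_poll_work and the 500 ms period are hypothetical):

#include <linux/jiffies.h>
#include <linux/workqueue.h>

#define MY_POLL_PERIOD  msecs_to_jiffies(500)   /* hypothetical period */

static void my_poll_fn(struct work_struct *work)
{
        struct delayed_work *dwork = to_delayed_work(work);

        /* ... poll the hardware ... */

        /* re-arm for the next period */
        schedule_delayed_work(dwork, MY_POLL_PERIOD);
}
static DECLARE_DELAYED_WORK(my_poll_work, my_poll_fn);

static void my_poll_start(void)
{
        schedule_delayed_work(&my_poll_work, MY_POLL_PERIOD);
}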
2560 2646
2561 /** 2647 /**
2562 * flush_delayed_work - block until a delayed_work's callback has terminated
2563 * @dwork: the delayed work which is to be flushed
2564 *
2565 * Any timeout is cancelled, and any pending work is run immediately.
2566 */
2567 void flush_delayed_work(struct delayed_work *dwork)
2568 {
2569 if (del_timer_sync(&dwork->timer)) {
2570 __queue_work(get_cpu(), get_work_cwq(&dwork->work)->wq,
2571 &dwork->work);
2572 put_cpu();
2573 }
2574 flush_work(&dwork->work);
2575 }
2576 EXPORT_SYMBOL(flush_delayed_work);
2577
2578 /**
2579 * schedule_delayed_work_on - queue work in global workqueue on CPU after delay 2648 * schedule_delayed_work_on - queue work in global workqueue on CPU after delay
2580 * @cpu: cpu to use 2649 * @cpu: cpu to use
2581 * @dwork: job to be done 2650 * @dwork: job to be done
2582 * @delay: number of jiffies to wait 2651 * @delay: number of jiffies to wait
2583 * 2652 *
2584 * After waiting for a given time this puts a job in the kernel-global 2653 * After waiting for a given time this puts a job in the kernel-global
2585 * workqueue on the specified CPU. 2654 * workqueue on the specified CPU.
2586 */ 2655 */
2587 int schedule_delayed_work_on(int cpu, 2656 int schedule_delayed_work_on(int cpu,
2588 struct delayed_work *dwork, unsigned long delay) 2657 struct delayed_work *dwork, unsigned long delay)
2589 { 2658 {
2590 return queue_delayed_work_on(cpu, system_wq, dwork, delay); 2659 return queue_delayed_work_on(cpu, system_wq, dwork, delay);
2591 } 2660 }
2592 EXPORT_SYMBOL(schedule_delayed_work_on); 2661 EXPORT_SYMBOL(schedule_delayed_work_on);
2593 2662
2594 /** 2663 /**
2595 * schedule_on_each_cpu - call a function on each online CPU from keventd 2664 * schedule_on_each_cpu - execute a function synchronously on each online CPU
2596 * @func: the function to call 2665 * @func: the function to call
2597 * 2666 *
2598 * Returns zero on success. 2667 * schedule_on_each_cpu() executes @func on each online CPU using the
2599 * Returns -ve errno on failure. 2668 * system workqueue and blocks until all CPUs have completed.
2600 *
2601 * schedule_on_each_cpu() is very slow. 2669 * schedule_on_each_cpu() is very slow.
2670 *
2671 * RETURNS:
2672 * 0 on success, -errno on failure.
2602 */ 2673 */
2603 int schedule_on_each_cpu(work_func_t func) 2674 int schedule_on_each_cpu(work_func_t func)
2604 { 2675 {
2605 int cpu; 2676 int cpu;
2606 struct work_struct __percpu *works; 2677 struct work_struct __percpu *works;
2607 2678
2608 works = alloc_percpu(struct work_struct); 2679 works = alloc_percpu(struct work_struct);
2609 if (!works) 2680 if (!works)
2610 return -ENOMEM; 2681 return -ENOMEM;
2611 2682
2612 get_online_cpus(); 2683 get_online_cpus();
2613 2684
2614 for_each_online_cpu(cpu) { 2685 for_each_online_cpu(cpu) {
2615 struct work_struct *work = per_cpu_ptr(works, cpu); 2686 struct work_struct *work = per_cpu_ptr(works, cpu);
2616 2687
2617 INIT_WORK(work, func); 2688 INIT_WORK(work, func);
2618 schedule_work_on(cpu, work); 2689 schedule_work_on(cpu, work);
2619 } 2690 }
2620 2691
2621 for_each_online_cpu(cpu) 2692 for_each_online_cpu(cpu)
2622 flush_work(per_cpu_ptr(works, cpu)); 2693 flush_work(per_cpu_ptr(works, cpu));
2623 2694
2624 put_online_cpus(); 2695 put_online_cpus();
2625 free_percpu(works); 2696 free_percpu(works);
2626 return 0; 2697 return 0;
2627 } 2698 }
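A sketch of a (rare, slow-path) caller of schedule_on_each_cpu(); per the updated comment it blocks until every online CPU has run the function. my_counter and my_reset_fn are hypothetical:

#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(unsigned long, my_counter);       /* hypothetical stat */

static void my_reset_fn(struct work_struct *work)
{
        /* executes on the CPU the work item was queued on */
        __get_cpu_var(my_counter) = 0;
}

static int my_reset_all_counters(void)
{
        /* synchronous and slow: use only from infrequent paths */
        return schedule_on_each_cpu(my_reset_fn);
}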
2628 2699
2629 /** 2700 /**
2630 * flush_scheduled_work - ensure that any scheduled work has run to completion. 2701 * flush_scheduled_work - ensure that any scheduled work has run to completion.
2631 * 2702 *
2632 * Forces execution of the kernel-global workqueue and blocks until its 2703 * Forces execution of the kernel-global workqueue and blocks until its
2633 * completion. 2704 * completion.
2634 * 2705 *
2635 * Think twice before calling this function! It's very easy to get into 2706 * Think twice before calling this function! It's very easy to get into
2636 * trouble if you don't take great care. Either of the following situations 2707 * trouble if you don't take great care. Either of the following situations
2637 * will lead to deadlock: 2708 * will lead to deadlock:
2638 * 2709 *
2639 * One of the work items currently on the workqueue needs to acquire 2710 * One of the work items currently on the workqueue needs to acquire
2640 * a lock held by your code or its caller. 2711 * a lock held by your code or its caller.
2641 * 2712 *
2642 * Your code is running in the context of a work routine. 2713 * Your code is running in the context of a work routine.
2643 * 2714 *
2644 * They will be detected by lockdep when they occur, but the first might not 2715 * They will be detected by lockdep when they occur, but the first might not
2645 * occur very often. It depends on what work items are on the workqueue and 2716 * occur very often. It depends on what work items are on the workqueue and
2646 * what locks they need, which you have no control over. 2717 * what locks they need, which you have no control over.
2647 * 2718 *
2648 * In most situations flushing the entire workqueue is overkill; you merely 2719 * In most situations flushing the entire workqueue is overkill; you merely
2649 * need to know that a particular work item isn't queued and isn't running. 2720 * need to know that a particular work item isn't queued and isn't running.
2650 * In such cases you should use cancel_delayed_work_sync() or 2721 * In such cases you should use cancel_delayed_work_sync() or
2651 * cancel_work_sync() instead. 2722 * cancel_work_sync() instead.
2652 */ 2723 */
2653 void flush_scheduled_work(void) 2724 void flush_scheduled_work(void)
2654 { 2725 {
2655 flush_workqueue(system_wq); 2726 flush_workqueue(system_wq);
2656 } 2727 }
2657 EXPORT_SYMBOL(flush_scheduled_work); 2728 EXPORT_SYMBOL(flush_scheduled_work);
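A sketch of the targeted alternative the comment above recommends: wait only for the items you own instead of flushing the whole kernel-global workqueue. my_fn, my_work and my_dwork are hypothetical:

#include <linux/workqueue.h>

static void my_fn(struct work_struct *work)
{
        /* ... */
}
static DECLARE_WORK(my_work, my_fn);
static DECLARE_DELAYED_WORK(my_dwork, my_fn);

static void my_shutdown(void)
{
        cancel_work_sync(&my_work);             /* neither queued nor running */
        cancel_delayed_work_sync(&my_dwork);    /* ditto, pending timer killed */
}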
2658 2729
2659 /** 2730 /**
2660 * execute_in_process_context - reliably execute the routine with user context 2731 * execute_in_process_context - reliably execute the routine with user context
2661 * @fn: the function to execute 2732 * @fn: the function to execute
2662 * @ew: guaranteed storage for the execute work structure (must 2733 * @ew: guaranteed storage for the execute work structure (must
2663 * be available when the work executes) 2734 * be available when the work executes)
2664 * 2735 *
2665 * Executes the function immediately if process context is available, 2736 * Executes the function immediately if process context is available,
2666 * otherwise schedules the function for delayed execution. 2737 * otherwise schedules the function for delayed execution.
2667 * 2738 *
2668 * Returns: 0 - function was executed 2739 * Returns: 0 - function was executed
2669 * 1 - function was scheduled for execution 2740 * 1 - function was scheduled for execution
2670 */ 2741 */
2671 int execute_in_process_context(work_func_t fn, struct execute_work *ew) 2742 int execute_in_process_context(work_func_t fn, struct execute_work *ew)
2672 { 2743 {
2673 if (!in_interrupt()) { 2744 if (!in_interrupt()) {
2674 fn(&ew->work); 2745 fn(&ew->work);
2675 return 0; 2746 return 0;
2676 } 2747 }
2677 2748
2678 INIT_WORK(&ew->work, fn); 2749 INIT_WORK(&ew->work, fn);
2679 schedule_work(&ew->work); 2750 schedule_work(&ew->work);
2680 2751
2681 return 1; 2752 return 1;
2682 } 2753 }
2683 EXPORT_SYMBOL_GPL(execute_in_process_context); 2754 EXPORT_SYMBOL_GPL(execute_in_process_context);
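A usage sketch for execute_in_process_context() (illustrative; my_release_fn and my_release are hypothetical). The @ew storage must stay valid until the deferred case has run, hence the static variable here:

#include <linux/workqueue.h>

static struct execute_work my_release_ew;       /* must outlive the call */

static void my_release_fn(struct work_struct *work)
{
        /* release resources that need process context */
}

static void my_release(void)
{
        /*
         * Runs my_release_fn() immediately if we already have process
         * context (returns 0), otherwise defers it to the kernel-global
         * workqueue (returns 1).
         */
        execute_in_process_context(my_release_fn, &my_release_ew);
}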
2684 2755
2685 int keventd_up(void) 2756 int keventd_up(void)
2686 { 2757 {
2687 return system_wq != NULL; 2758 return system_wq != NULL;
2688 } 2759 }
2689 2760
2690 static int alloc_cwqs(struct workqueue_struct *wq) 2761 static int alloc_cwqs(struct workqueue_struct *wq)
2691 { 2762 {
2692 /* 2763 /*
2693 * cwqs are forced aligned according to WORK_STRUCT_FLAG_BITS. 2764 * cwqs are forced aligned according to WORK_STRUCT_FLAG_BITS.
2694 * Make sure that the alignment isn't lower than that of 2765 * Make sure that the alignment isn't lower than that of
2695 * unsigned long long. 2766 * unsigned long long.
2696 */ 2767 */
2697 const size_t size = sizeof(struct cpu_workqueue_struct); 2768 const size_t size = sizeof(struct cpu_workqueue_struct);
2698 const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS, 2769 const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
2699 __alignof__(unsigned long long)); 2770 __alignof__(unsigned long long));
2700 #ifdef CONFIG_SMP 2771 #ifdef CONFIG_SMP
2701 bool percpu = !(wq->flags & WQ_UNBOUND); 2772 bool percpu = !(wq->flags & WQ_UNBOUND);
2702 #else 2773 #else
2703 bool percpu = false; 2774 bool percpu = false;
2704 #endif 2775 #endif
2705 2776
2706 if (percpu) 2777 if (percpu)
2707 wq->cpu_wq.pcpu = __alloc_percpu(size, align); 2778 wq->cpu_wq.pcpu = __alloc_percpu(size, align);
2708 else { 2779 else {
2709 void *ptr; 2780 void *ptr;
2710 2781
2711 /* 2782 /*
2712 * Allocate enough room to align cwq and put an extra 2783 * Allocate enough room to align cwq and put an extra
2713 * pointer at the end pointing back to the originally 2784 * pointer at the end pointing back to the originally
2714 * allocated pointer which will be used for free. 2785 * allocated pointer which will be used for free.
2715 */ 2786 */
2716 ptr = kzalloc(size + align + sizeof(void *), GFP_KERNEL); 2787 ptr = kzalloc(size + align + sizeof(void *), GFP_KERNEL);
2717 if (ptr) { 2788 if (ptr) {
2718 wq->cpu_wq.single = PTR_ALIGN(ptr, align); 2789 wq->cpu_wq.single = PTR_ALIGN(ptr, align);
2719 *(void **)(wq->cpu_wq.single + 1) = ptr; 2790 *(void **)(wq->cpu_wq.single + 1) = ptr;
2720 } 2791 }
2721 } 2792 }
2722 2793
2723 /* just in case, make sure it's actually aligned */ 2794 /* just in case, make sure it's actually aligned */
2724 BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align)); 2795 BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align));
2725 return wq->cpu_wq.v ? 0 : -ENOMEM; 2796 return wq->cpu_wq.v ? 0 : -ENOMEM;
2726 } 2797 }
2727 2798
2728 static void free_cwqs(struct workqueue_struct *wq) 2799 static void free_cwqs(struct workqueue_struct *wq)
2729 { 2800 {
2730 #ifdef CONFIG_SMP 2801 #ifdef CONFIG_SMP
2731 bool percpu = !(wq->flags & WQ_UNBOUND); 2802 bool percpu = !(wq->flags & WQ_UNBOUND);
2732 #else 2803 #else
2733 bool percpu = false; 2804 bool percpu = false;
2734 #endif 2805 #endif
2735 2806
2736 if (percpu) 2807 if (percpu)
2737 free_percpu(wq->cpu_wq.pcpu); 2808 free_percpu(wq->cpu_wq.pcpu);
2738 else if (wq->cpu_wq.single) { 2809 else if (wq->cpu_wq.single) {
2739 /* the pointer to free is stored right after the cwq */ 2810 /* the pointer to free is stored right after the cwq */
2740 kfree(*(void **)(wq->cpu_wq.single + 1)); 2811 kfree(*(void **)(wq->cpu_wq.single + 1));
2741 } 2812 }
2742 } 2813 }
2743 2814
2744 static int wq_clamp_max_active(int max_active, unsigned int flags, 2815 static int wq_clamp_max_active(int max_active, unsigned int flags,
2745 const char *name) 2816 const char *name)
2746 { 2817 {
2747 int lim = flags & WQ_UNBOUND ? WQ_UNBOUND_MAX_ACTIVE : WQ_MAX_ACTIVE; 2818 int lim = flags & WQ_UNBOUND ? WQ_UNBOUND_MAX_ACTIVE : WQ_MAX_ACTIVE;
2748 2819
2749 if (max_active < 1 || max_active > lim) 2820 if (max_active < 1 || max_active > lim)
2750 printk(KERN_WARNING "workqueue: max_active %d requested for %s " 2821 printk(KERN_WARNING "workqueue: max_active %d requested for %s "
2751 "is out of range, clamping between %d and %d\n", 2822 "is out of range, clamping between %d and %d\n",
2752 max_active, name, 1, lim); 2823 max_active, name, 1, lim);
2753 2824
2754 return clamp_val(max_active, 1, lim); 2825 return clamp_val(max_active, 1, lim);
2755 } 2826 }
2756 2827
2757 struct workqueue_struct *__alloc_workqueue_key(const char *name, 2828 struct workqueue_struct *__alloc_workqueue_key(const char *name,
2758 unsigned int flags, 2829 unsigned int flags,
2759 int max_active, 2830 int max_active,
2760 struct lock_class_key *key, 2831 struct lock_class_key *key,
2761 const char *lock_name) 2832 const char *lock_name)
2762 { 2833 {
2763 struct workqueue_struct *wq; 2834 struct workqueue_struct *wq;
2764 unsigned int cpu; 2835 unsigned int cpu;
2836
2837 /*
2838 * Workqueues which may be used during memory reclaim should
2839 * have a rescuer to guarantee forward progress.
2840 */
2841 if (flags & WQ_MEM_RECLAIM)
2842 flags |= WQ_RESCUER;
2765 2843
2766 /* 2844 /*
2767 * Unbound workqueues aren't concurrency managed and should be 2845 * Unbound workqueues aren't concurrency managed and should be
2768 * dispatched to workers immediately. 2846 * dispatched to workers immediately.
2769 */ 2847 */
2770 if (flags & WQ_UNBOUND) 2848 if (flags & WQ_UNBOUND)
2771 flags |= WQ_HIGHPRI; 2849 flags |= WQ_HIGHPRI;
2772 2850
2773 max_active = max_active ?: WQ_DFL_ACTIVE; 2851 max_active = max_active ?: WQ_DFL_ACTIVE;
2774 max_active = wq_clamp_max_active(max_active, flags, name); 2852 max_active = wq_clamp_max_active(max_active, flags, name);
2775 2853
2776 wq = kzalloc(sizeof(*wq), GFP_KERNEL); 2854 wq = kzalloc(sizeof(*wq), GFP_KERNEL);
2777 if (!wq) 2855 if (!wq)
2778 goto err; 2856 goto err;
2779 2857
2780 wq->flags = flags; 2858 wq->flags = flags;
2781 wq->saved_max_active = max_active; 2859 wq->saved_max_active = max_active;
2782 mutex_init(&wq->flush_mutex); 2860 mutex_init(&wq->flush_mutex);
2783 atomic_set(&wq->nr_cwqs_to_flush, 0); 2861 atomic_set(&wq->nr_cwqs_to_flush, 0);
2784 INIT_LIST_HEAD(&wq->flusher_queue); 2862 INIT_LIST_HEAD(&wq->flusher_queue);
2785 INIT_LIST_HEAD(&wq->flusher_overflow); 2863 INIT_LIST_HEAD(&wq->flusher_overflow);
2786 2864
2787 wq->name = name; 2865 wq->name = name;
2788 lockdep_init_map(&wq->lockdep_map, lock_name, key, 0); 2866 lockdep_init_map(&wq->lockdep_map, lock_name, key, 0);
2789 INIT_LIST_HEAD(&wq->list); 2867 INIT_LIST_HEAD(&wq->list);
2790 2868
2791 if (alloc_cwqs(wq) < 0) 2869 if (alloc_cwqs(wq) < 0)
2792 goto err; 2870 goto err;
2793 2871
2794 for_each_cwq_cpu(cpu, wq) { 2872 for_each_cwq_cpu(cpu, wq) {
2795 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 2873 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
2796 struct global_cwq *gcwq = get_gcwq(cpu); 2874 struct global_cwq *gcwq = get_gcwq(cpu);
2797 2875
2798 BUG_ON((unsigned long)cwq & WORK_STRUCT_FLAG_MASK); 2876 BUG_ON((unsigned long)cwq & WORK_STRUCT_FLAG_MASK);
2799 cwq->gcwq = gcwq; 2877 cwq->gcwq = gcwq;
2800 cwq->wq = wq; 2878 cwq->wq = wq;
2801 cwq->flush_color = -1; 2879 cwq->flush_color = -1;
2802 cwq->max_active = max_active; 2880 cwq->max_active = max_active;
2803 INIT_LIST_HEAD(&cwq->delayed_works); 2881 INIT_LIST_HEAD(&cwq->delayed_works);
2804 } 2882 }
2805 2883
2806 if (flags & WQ_RESCUER) { 2884 if (flags & WQ_RESCUER) {
2807 struct worker *rescuer; 2885 struct worker *rescuer;
2808 2886
2809 if (!alloc_mayday_mask(&wq->mayday_mask, GFP_KERNEL)) 2887 if (!alloc_mayday_mask(&wq->mayday_mask, GFP_KERNEL))
2810 goto err; 2888 goto err;
2811 2889
2812 wq->rescuer = rescuer = alloc_worker(); 2890 wq->rescuer = rescuer = alloc_worker();
2813 if (!rescuer) 2891 if (!rescuer)
2814 goto err; 2892 goto err;
2815 2893
2816 rescuer->task = kthread_create(rescuer_thread, wq, "%s", name); 2894 rescuer->task = kthread_create(rescuer_thread, wq, "%s", name);
2817 if (IS_ERR(rescuer->task)) 2895 if (IS_ERR(rescuer->task))
2818 goto err; 2896 goto err;
2819 2897
2820 rescuer->task->flags |= PF_THREAD_BOUND; 2898 rescuer->task->flags |= PF_THREAD_BOUND;
2821 wake_up_process(rescuer->task); 2899 wake_up_process(rescuer->task);
2822 } 2900 }
2823 2901
2824 /* 2902 /*
2825 * workqueue_lock protects global freeze state and workqueues 2903 * workqueue_lock protects global freeze state and workqueues
2826 * list. Grab it, set max_active accordingly and add the new 2904 * list. Grab it, set max_active accordingly and add the new
2827 * workqueue to workqueues list. 2905 * workqueue to workqueues list.
2828 */ 2906 */
2829 spin_lock(&workqueue_lock); 2907 spin_lock(&workqueue_lock);
2830 2908
2831 if (workqueue_freezing && wq->flags & WQ_FREEZEABLE) 2909 if (workqueue_freezing && wq->flags & WQ_FREEZEABLE)
2832 for_each_cwq_cpu(cpu, wq) 2910 for_each_cwq_cpu(cpu, wq)
2833 get_cwq(cpu, wq)->max_active = 0; 2911 get_cwq(cpu, wq)->max_active = 0;
2834 2912
2835 list_add(&wq->list, &workqueues); 2913 list_add(&wq->list, &workqueues);
2836 2914
2837 spin_unlock(&workqueue_lock); 2915 spin_unlock(&workqueue_lock);
2838 2916
2839 return wq; 2917 return wq;
2840 err: 2918 err:
2841 if (wq) { 2919 if (wq) {
2842 free_cwqs(wq); 2920 free_cwqs(wq);
2843 free_mayday_mask(wq->mayday_mask); 2921 free_mayday_mask(wq->mayday_mask);
2844 kfree(wq->rescuer); 2922 kfree(wq->rescuer);
2845 kfree(wq); 2923 kfree(wq);
2846 } 2924 }
2847 return NULL; 2925 return NULL;
2848 } 2926 }
2849 EXPORT_SYMBOL_GPL(__alloc_workqueue_key); 2927 EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
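__alloc_workqueue_key() is normally reached through the alloc_workqueue() macro. Below is a sketch of a caller that wants the WQ_MEM_RECLAIM behaviour added above, i.e. guaranteed forward progress via a rescuer; my_io_wq and the "my_io" name are hypothetical:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_io_wq;

static int __init my_init(void)
{
        /*
         * This queue sits in a writeback path and may be needed to make
         * forward progress during memory reclaim, so ask for a rescuer
         * with WQ_MEM_RECLAIM (which sets WQ_RESCUER internally above).
         */
        my_io_wq = alloc_workqueue("my_io", WQ_MEM_RECLAIM, 1);
        if (!my_io_wq)
                return -ENOMEM;
        return 0;
}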
2850 2928
2851 /** 2929 /**
2852 * destroy_workqueue - safely terminate a workqueue 2930 * destroy_workqueue - safely terminate a workqueue
2853 * @wq: target workqueue 2931 * @wq: target workqueue
2854 * 2932 *
2855 * Safely destroy a workqueue. All work currently pending will be done first. 2933 * Safely destroy a workqueue. All work currently pending will be done first.
2856 */ 2934 */
2857 void destroy_workqueue(struct workqueue_struct *wq) 2935 void destroy_workqueue(struct workqueue_struct *wq)
2858 { 2936 {
2859 unsigned int cpu; 2937 unsigned int cpu;
2860 2938
2861 wq->flags |= WQ_DYING; 2939 wq->flags |= WQ_DYING;
2862 flush_workqueue(wq); 2940 flush_workqueue(wq);
2863 2941
2864 /* 2942 /*
2865 * wq list is used to freeze wq, remove from list after 2943 * wq list is used to freeze wq, remove from list after
2866 * flushing is complete in case freeze races us. 2944 * flushing is complete in case freeze races us.
2867 */ 2945 */
2868 spin_lock(&workqueue_lock); 2946 spin_lock(&workqueue_lock);
2869 list_del(&wq->list); 2947 list_del(&wq->list);
2870 spin_unlock(&workqueue_lock); 2948 spin_unlock(&workqueue_lock);
2871 2949
2872 /* sanity check */ 2950 /* sanity check */
2873 for_each_cwq_cpu(cpu, wq) { 2951 for_each_cwq_cpu(cpu, wq) {
2874 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 2952 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
2875 int i; 2953 int i;
2876 2954
2877 for (i = 0; i < WORK_NR_COLORS; i++) 2955 for (i = 0; i < WORK_NR_COLORS; i++)
2878 BUG_ON(cwq->nr_in_flight[i]); 2956 BUG_ON(cwq->nr_in_flight[i]);
2879 BUG_ON(cwq->nr_active); 2957 BUG_ON(cwq->nr_active);
2880 BUG_ON(!list_empty(&cwq->delayed_works)); 2958 BUG_ON(!list_empty(&cwq->delayed_works));
2881 } 2959 }
2882 2960
2883 if (wq->flags & WQ_RESCUER) { 2961 if (wq->flags & WQ_RESCUER) {
2884 kthread_stop(wq->rescuer->task); 2962 kthread_stop(wq->rescuer->task);
2885 free_mayday_mask(wq->mayday_mask); 2963 free_mayday_mask(wq->mayday_mask);
2886 kfree(wq->rescuer); 2964 kfree(wq->rescuer);
2887 } 2965 }
2888 2966
2889 free_cwqs(wq); 2967 free_cwqs(wq);
2890 kfree(wq); 2968 kfree(wq);
2891 } 2969 }
2892 EXPORT_SYMBOL_GPL(destroy_workqueue); 2970 EXPORT_SYMBOL_GPL(destroy_workqueue);
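Continuing the hypothetical my_io_wq sketch: teardown is a single destroy_workqueue() call once the driver has stopped queueing new work, since pending items are flushed before the workqueue is freed:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_io_wq;       /* from the sketch above */

static void __exit my_exit(void)
{
        /* all work still pending on my_io_wq is executed before freeing */
        destroy_workqueue(my_io_wq);
}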
2893 2971
2894 /** 2972 /**
2895 * workqueue_set_max_active - adjust max_active of a workqueue 2973 * workqueue_set_max_active - adjust max_active of a workqueue
2896 * @wq: target workqueue 2974 * @wq: target workqueue
2897 * @max_active: new max_active value. 2975 * @max_active: new max_active value.
2898 * 2976 *
2899 * Set max_active of @wq to @max_active. 2977 * Set max_active of @wq to @max_active.
2900 * 2978 *
2901 * CONTEXT: 2979 * CONTEXT:
2902 * Don't call from IRQ context. 2980 * Don't call from IRQ context.
2903 */ 2981 */
2904 void workqueue_set_max_active(struct workqueue_struct *wq, int max_active) 2982 void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)
2905 { 2983 {
2906 unsigned int cpu; 2984 unsigned int cpu;
2907 2985
2908 max_active = wq_clamp_max_active(max_active, wq->flags, wq->name); 2986 max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);
2909 2987
2910 spin_lock(&workqueue_lock); 2988 spin_lock(&workqueue_lock);
2911 2989
2912 wq->saved_max_active = max_active; 2990 wq->saved_max_active = max_active;
2913 2991
2914 for_each_cwq_cpu(cpu, wq) { 2992 for_each_cwq_cpu(cpu, wq) {
2915 struct global_cwq *gcwq = get_gcwq(cpu); 2993 struct global_cwq *gcwq = get_gcwq(cpu);
2916 2994
2917 spin_lock_irq(&gcwq->lock); 2995 spin_lock_irq(&gcwq->lock);
2918 2996
2919 if (!(wq->flags & WQ_FREEZEABLE) || 2997 if (!(wq->flags & WQ_FREEZEABLE) ||
2920 !(gcwq->flags & GCWQ_FREEZING)) 2998 !(gcwq->flags & GCWQ_FREEZING))
2921 get_cwq(gcwq->cpu, wq)->max_active = max_active; 2999 get_cwq(gcwq->cpu, wq)->max_active = max_active;
2922 3000
2923 spin_unlock_irq(&gcwq->lock); 3001 spin_unlock_irq(&gcwq->lock);
2924 } 3002 }
2925 3003
2926 spin_unlock(&workqueue_lock); 3004 spin_unlock(&workqueue_lock);
2927 } 3005 }
2928 EXPORT_SYMBOL_GPL(workqueue_set_max_active); 3006 EXPORT_SYMBOL_GPL(workqueue_set_max_active);
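A small sketch of a runtime caller of workqueue_set_max_active(); my_set_parallelism is a hypothetical helper:

#include <linux/workqueue.h>

static void my_set_parallelism(struct workqueue_struct *wq, int nr_channels)
{
        /*
         * Clamped to [1, WQ_MAX_ACTIVE] as shown above; for frozen
         * freezeable workqueues the new value takes effect at thaw time.
         */
        workqueue_set_max_active(wq, nr_channels);
}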
2929 3007
2930 /** 3008 /**
2931 * workqueue_congested - test whether a workqueue is congested 3009 * workqueue_congested - test whether a workqueue is congested
2932 * @cpu: CPU in question 3010 * @cpu: CPU in question
2933 * @wq: target workqueue 3011 * @wq: target workqueue
2934 * 3012 *
2935 * Test whether @wq's cpu workqueue for @cpu is congested. There is 3013 * Test whether @wq's cpu workqueue for @cpu is congested. There is
2936 * no synchronization around this function and the test result is 3014 * no synchronization around this function and the test result is
2937 * unreliable and only useful as advisory hints or for debugging. 3015 * unreliable and only useful as advisory hints or for debugging.
2938 * 3016 *
2939 * RETURNS: 3017 * RETURNS:
2940 * %true if congested, %false otherwise. 3018 * %true if congested, %false otherwise.
2941 */ 3019 */
2942 bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq) 3020 bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq)
2943 { 3021 {
2944 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 3022 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
2945 3023
2946 return !list_empty(&cwq->delayed_works); 3024 return !list_empty(&cwq->delayed_works);
2947 } 3025 }
2948 EXPORT_SYMBOL_GPL(workqueue_congested); 3026 EXPORT_SYMBOL_GPL(workqueue_congested);
2949 3027
2950 /** 3028 /**
2951 * work_cpu - return the last known associated cpu for @work 3029 * work_cpu - return the last known associated cpu for @work
2952 * @work: the work of interest 3030 * @work: the work of interest
2953 * 3031 *
2954 * RETURNS: 3032 * RETURNS:
2955 * CPU number if @work was ever queued. WORK_CPU_NONE otherwise. 3033 * CPU number if @work was ever queued. WORK_CPU_NONE otherwise.
2956 */ 3034 */
2957 unsigned int work_cpu(struct work_struct *work) 3035 unsigned int work_cpu(struct work_struct *work)
2958 { 3036 {
2959 struct global_cwq *gcwq = get_work_gcwq(work); 3037 struct global_cwq *gcwq = get_work_gcwq(work);
2960 3038
2961 return gcwq ? gcwq->cpu : WORK_CPU_NONE; 3039 return gcwq ? gcwq->cpu : WORK_CPU_NONE;
2962 } 3040 }
2963 EXPORT_SYMBOL_GPL(work_cpu); 3041 EXPORT_SYMBOL_GPL(work_cpu);
2964 3042
2965 /** 3043 /**
2966 * work_busy - test whether a work is currently pending or running 3044 * work_busy - test whether a work is currently pending or running
2967 * @work: the work to be tested 3045 * @work: the work to be tested
2968 * 3046 *
2969 * Test whether @work is currently pending or running. There is no 3047 * Test whether @work is currently pending or running. There is no
2970 * synchronization around this function and the test result is 3048 * synchronization around this function and the test result is
2971 * unreliable and only useful as advisory hints or for debugging. 3049 * unreliable and only useful as advisory hints or for debugging.
2972 * Especially for reentrant wqs, the pending state might hide the 3050 * Especially for reentrant wqs, the pending state might hide the
2973 * running state. 3051 * running state.
2974 * 3052 *
2975 * RETURNS: 3053 * RETURNS:
2976 * OR'd bitmask of WORK_BUSY_* bits. 3054 * OR'd bitmask of WORK_BUSY_* bits.
2977 */ 3055 */
2978 unsigned int work_busy(struct work_struct *work) 3056 unsigned int work_busy(struct work_struct *work)
2979 { 3057 {
2980 struct global_cwq *gcwq = get_work_gcwq(work); 3058 struct global_cwq *gcwq = get_work_gcwq(work);
2981 unsigned long flags; 3059 unsigned long flags;
2982 unsigned int ret = 0; 3060 unsigned int ret = 0;
2983 3061
2984 if (!gcwq) 3062 if (!gcwq)
2985 return false; 3063 return false;
2986 3064
2987 spin_lock_irqsave(&gcwq->lock, flags); 3065 spin_lock_irqsave(&gcwq->lock, flags);
2988 3066
2989 if (work_pending(work)) 3067 if (work_pending(work))
2990 ret |= WORK_BUSY_PENDING; 3068 ret |= WORK_BUSY_PENDING;
2991 if (find_worker_executing_work(gcwq, work)) 3069 if (find_worker_executing_work(gcwq, work))
2992 ret |= WORK_BUSY_RUNNING; 3070 ret |= WORK_BUSY_RUNNING;
2993 3071
2994 spin_unlock_irqrestore(&gcwq->lock, flags); 3072 spin_unlock_irqrestore(&gcwq->lock, flags);
2995 3073
2996 return ret; 3074 return ret;
2997 } 3075 }
2998 EXPORT_SYMBOL_GPL(work_busy); 3076 EXPORT_SYMBOL_GPL(work_busy);
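A debugging sketch around work_busy(); as the comment says, the result is an advisory snapshot only. my_debug_dump is hypothetical:

#include <linux/kernel.h>
#include <linux/workqueue.h>

static void my_debug_dump(struct work_struct *work)
{
        unsigned int busy = work_busy(work);    /* unsynchronized snapshot */

        pr_debug("work %p: pending=%d running=%d\n", work,
                 !!(busy & WORK_BUSY_PENDING),
                 !!(busy & WORK_BUSY_RUNNING));
}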
2999 3077
3000 /* 3078 /*
3001 * CPU hotplug. 3079 * CPU hotplug.
3002 * 3080 *
3003 * There are two challenges in supporting CPU hotplug. Firstly, there 3081 * There are two challenges in supporting CPU hotplug. Firstly, there
3004 * are a lot of assumptions on strong associations among work, cwq and 3082 * are a lot of assumptions on strong associations among work, cwq and
3005 * gcwq which make migrating pending and scheduled works very 3083 * gcwq which make migrating pending and scheduled works very
3006 * difficult to implement without impacting hot paths. Secondly, 3084 * difficult to implement without impacting hot paths. Secondly,
3007 * gcwqs serve a mix of short, long and very long running works, making 3085 * gcwqs serve a mix of short, long and very long running works, making
3008 * blocked draining impractical. 3086 * blocked draining impractical.
3009 * 3087 *
3010 * This is solved by allowing a gcwq to be detached from CPU, running 3088 * This is solved by allowing a gcwq to be detached from CPU, running
3011 * it with unbound (rogue) workers and allowing it to be reattached 3089 * it with unbound (rogue) workers and allowing it to be reattached
3012 * later if the cpu comes back online. A separate thread is created 3090 * later if the cpu comes back online. A separate thread is created
3013 * to govern a gcwq in such state and is called the trustee of the 3091 * to govern a gcwq in such state and is called the trustee of the
3014 * gcwq. 3092 * gcwq.
3015 * 3093 *
3016 * Trustee states and their descriptions. 3094 * Trustee states and their descriptions.
3017 * 3095 *
3018 * START Command state used on startup. On CPU_DOWN_PREPARE, a 3096 * START Command state used on startup. On CPU_DOWN_PREPARE, a
3019 * new trustee is started with this state. 3097 * new trustee is started with this state.
3020 * 3098 *
3021 * IN_CHARGE Once started, trustee will enter this state after 3099 * IN_CHARGE Once started, trustee will enter this state after
3022 * assuming the manager role and making all existing 3100 * assuming the manager role and making all existing
3023 * workers rogue. DOWN_PREPARE waits for trustee to 3101 * workers rogue. DOWN_PREPARE waits for trustee to
3024 * enter this state. After reaching IN_CHARGE, trustee 3102 * enter this state. After reaching IN_CHARGE, trustee
3025 * tries to execute the pending worklist until it's empty 3103 * tries to execute the pending worklist until it's empty
3026 * and the state is set to BUTCHER, or the state is set 3104 * and the state is set to BUTCHER, or the state is set
3027 * to RELEASE. 3105 * to RELEASE.
3028 * 3106 *
3029 * BUTCHER Command state which is set by the cpu callback after 3107 * BUTCHER Command state which is set by the cpu callback after
3030 * the cpu has gone down. Once this state is set, the trustee 3108 * the cpu has gone down. Once this state is set, the trustee
3031 * knows that there will be no new works on the worklist 3109 * knows that there will be no new works on the worklist
3032 * and once the worklist is empty it can proceed to 3110 * and once the worklist is empty it can proceed to
3033 * killing idle workers. 3111 * killing idle workers.
3034 * 3112 *
3035 * RELEASE Command state which is set by the cpu callback if the 3113 * RELEASE Command state which is set by the cpu callback if the
3036 * cpu down has been canceled or it has come online 3114 * cpu down has been canceled or it has come online
3037 * again. After recognizing this state, trustee stops 3115 * again. After recognizing this state, trustee stops
3038 * trying to drain or butcher and clears ROGUE, rebinds 3116 * trying to drain or butcher and clears ROGUE, rebinds
3039 * all remaining workers back to the cpu and releases 3117 * all remaining workers back to the cpu and releases
3040 * manager role. 3118 * manager role.
3041 * 3119 *
3042 * DONE Trustee will enter this state after BUTCHER or RELEASE 3120 * DONE Trustee will enter this state after BUTCHER or RELEASE
3043 * is complete. 3121 * is complete.
3044 * 3122 *
3045 * trustee CPU draining 3123 * trustee CPU draining
3046 * took over down complete 3124 * took over down complete
3047 * START -----------> IN_CHARGE -----------> BUTCHER -----------> DONE 3125 * START -----------> IN_CHARGE -----------> BUTCHER -----------> DONE
3048 * | | ^ 3126 * | | ^
3049 * | CPU is back online v return workers | 3127 * | CPU is back online v return workers |
3050 * ----------------> RELEASE -------------- 3128 * ----------------> RELEASE --------------
3051 */ 3129 */
3052 3130
3053 /** 3131 /**
3054 * trustee_wait_event_timeout - timed event wait for trustee 3132 * trustee_wait_event_timeout - timed event wait for trustee
3055 * @cond: condition to wait for 3133 * @cond: condition to wait for
3056 * @timeout: timeout in jiffies 3134 * @timeout: timeout in jiffies
3057 * 3135 *
3058 * wait_event_timeout() for trustee to use. Handles locking and 3136 * wait_event_timeout() for trustee to use. Handles locking and
3059 * checks for RELEASE request. 3137 * checks for RELEASE request.
3060 * 3138 *
3061 * CONTEXT: 3139 * CONTEXT:
3062 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 3140 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
3063 * multiple times. To be used by trustee. 3141 * multiple times. To be used by trustee.
3064 * 3142 *
3065 * RETURNS: 3143 * RETURNS:
3066 * Positive indicating left time if @cond is satisfied, 0 if timed 3144 * Positive indicating left time if @cond is satisfied, 0 if timed
3067 * out, -1 if canceled. 3145 * out, -1 if canceled.
3068 */ 3146 */
3069 #define trustee_wait_event_timeout(cond, timeout) ({ \ 3147 #define trustee_wait_event_timeout(cond, timeout) ({ \
3070 long __ret = (timeout); \ 3148 long __ret = (timeout); \
3071 while (!((cond) || (gcwq->trustee_state == TRUSTEE_RELEASE)) && \ 3149 while (!((cond) || (gcwq->trustee_state == TRUSTEE_RELEASE)) && \
3072 __ret) { \ 3150 __ret) { \
3073 spin_unlock_irq(&gcwq->lock); \ 3151 spin_unlock_irq(&gcwq->lock); \
3074 __wait_event_timeout(gcwq->trustee_wait, (cond) || \ 3152 __wait_event_timeout(gcwq->trustee_wait, (cond) || \
3075 (gcwq->trustee_state == TRUSTEE_RELEASE), \ 3153 (gcwq->trustee_state == TRUSTEE_RELEASE), \
3076 __ret); \ 3154 __ret); \
3077 spin_lock_irq(&gcwq->lock); \ 3155 spin_lock_irq(&gcwq->lock); \
3078 } \ 3156 } \
3079 gcwq->trustee_state == TRUSTEE_RELEASE ? -1 : (__ret); \ 3157 gcwq->trustee_state == TRUSTEE_RELEASE ? -1 : (__ret); \
3080 }) 3158 })
3081 3159
3082 /** 3160 /**
3083 * trustee_wait_event - event wait for trustee 3161 * trustee_wait_event - event wait for trustee
3084 * @cond: condition to wait for 3162 * @cond: condition to wait for
3085 * 3163 *
3086 * wait_event() for trustee to use. Automatically handles locking and 3164 * wait_event() for trustee to use. Automatically handles locking and
3087 * checks for CANCEL request. 3165 * checks for CANCEL request.
3088 * 3166 *
3089 * CONTEXT: 3167 * CONTEXT:
3090 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 3168 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
3091 * multiple times. To be used by trustee. 3169 * multiple times. To be used by trustee.
3092 * 3170 *
3093 * RETURNS: 3171 * RETURNS:
3094 * 0 if @cond is satisfied, -1 if canceled. 3172 * 0 if @cond is satisfied, -1 if canceled.
3095 */ 3173 */
3096 #define trustee_wait_event(cond) ({ \ 3174 #define trustee_wait_event(cond) ({ \
3097 long __ret1; \ 3175 long __ret1; \
3098 __ret1 = trustee_wait_event_timeout(cond, MAX_SCHEDULE_TIMEOUT);\ 3176 __ret1 = trustee_wait_event_timeout(cond, MAX_SCHEDULE_TIMEOUT);\
3099 __ret1 < 0 ? -1 : 0; \ 3177 __ret1 < 0 ? -1 : 0; \
3100 }) 3178 })
3101 3179
3102 static int __cpuinit trustee_thread(void *__gcwq) 3180 static int __cpuinit trustee_thread(void *__gcwq)
3103 { 3181 {
3104 struct global_cwq *gcwq = __gcwq; 3182 struct global_cwq *gcwq = __gcwq;
3105 struct worker *worker; 3183 struct worker *worker;
3106 struct work_struct *work; 3184 struct work_struct *work;
3107 struct hlist_node *pos; 3185 struct hlist_node *pos;
3108 long rc; 3186 long rc;
3109 int i; 3187 int i;
3110 3188
3111 BUG_ON(gcwq->cpu != smp_processor_id()); 3189 BUG_ON(gcwq->cpu != smp_processor_id());
3112 3190
3113 spin_lock_irq(&gcwq->lock); 3191 spin_lock_irq(&gcwq->lock);
3114 /* 3192 /*
3115 * Claim the manager position and make all workers rogue. 3193 * Claim the manager position and make all workers rogue.
3116 * Trustee must be bound to the target cpu and can't be 3194 * Trustee must be bound to the target cpu and can't be
3117 * cancelled. 3195 * cancelled.
3118 */ 3196 */
3119 BUG_ON(gcwq->cpu != smp_processor_id()); 3197 BUG_ON(gcwq->cpu != smp_processor_id());
3120 rc = trustee_wait_event(!(gcwq->flags & GCWQ_MANAGING_WORKERS)); 3198 rc = trustee_wait_event(!(gcwq->flags & GCWQ_MANAGING_WORKERS));
3121 BUG_ON(rc < 0); 3199 BUG_ON(rc < 0);
3122 3200
3123 gcwq->flags |= GCWQ_MANAGING_WORKERS; 3201 gcwq->flags |= GCWQ_MANAGING_WORKERS;
3124 3202
3125 list_for_each_entry(worker, &gcwq->idle_list, entry) 3203 list_for_each_entry(worker, &gcwq->idle_list, entry)
3126 worker->flags |= WORKER_ROGUE; 3204 worker->flags |= WORKER_ROGUE;
3127 3205
3128 for_each_busy_worker(worker, i, pos, gcwq) 3206 for_each_busy_worker(worker, i, pos, gcwq)
3129 worker->flags |= WORKER_ROGUE; 3207 worker->flags |= WORKER_ROGUE;
3130 3208
3131 /* 3209 /*
3132 * Call schedule() so that we cross rq->lock and thus can 3210 * Call schedule() so that we cross rq->lock and thus can
3133 * guarantee sched callbacks see the rogue flag. This is 3211 * guarantee sched callbacks see the rogue flag. This is
3134 * necessary as scheduler callbacks may be invoked from other 3212 * necessary as scheduler callbacks may be invoked from other
3135 * cpus. 3213 * cpus.
3136 */ 3214 */
3137 spin_unlock_irq(&gcwq->lock); 3215 spin_unlock_irq(&gcwq->lock);
3138 schedule(); 3216 schedule();
3139 spin_lock_irq(&gcwq->lock); 3217 spin_lock_irq(&gcwq->lock);
3140 3218
3141 /* 3219 /*
3142 * Sched callbacks are disabled now. Zap nr_running. After 3220 * Sched callbacks are disabled now. Zap nr_running. After
3143 * this, nr_running stays zero and need_more_worker() and 3221 * this, nr_running stays zero and need_more_worker() and
3144 * keep_working() are always true as long as the worklist is 3222 * keep_working() are always true as long as the worklist is
3145 * not empty. 3223 * not empty.
3146 */ 3224 */
3147 atomic_set(get_gcwq_nr_running(gcwq->cpu), 0); 3225 atomic_set(get_gcwq_nr_running(gcwq->cpu), 0);
3148 3226
3149 spin_unlock_irq(&gcwq->lock); 3227 spin_unlock_irq(&gcwq->lock);
3150 del_timer_sync(&gcwq->idle_timer); 3228 del_timer_sync(&gcwq->idle_timer);
3151 spin_lock_irq(&gcwq->lock); 3229 spin_lock_irq(&gcwq->lock);
3152 3230
3153 /* 3231 /*
3154 * We're now in charge. Notify and proceed to drain. We need 3232 * We're now in charge. Notify and proceed to drain. We need
3155 * to keep the gcwq running during the whole CPU down 3233 * to keep the gcwq running during the whole CPU down
3156 * procedure as other cpu hotunplug callbacks may need to 3234 * procedure as other cpu hotunplug callbacks may need to
3157 * flush currently running tasks. 3235 * flush currently running tasks.
3158 */ 3236 */
3159 gcwq->trustee_state = TRUSTEE_IN_CHARGE; 3237 gcwq->trustee_state = TRUSTEE_IN_CHARGE;
3160 wake_up_all(&gcwq->trustee_wait); 3238 wake_up_all(&gcwq->trustee_wait);
3161 3239
3162 /* 3240 /*
3163 * The original cpu is in the process of dying and may go away 3241 * The original cpu is in the process of dying and may go away
3164 * anytime now. When that happens, we and all workers would 3242 * anytime now. When that happens, we and all workers would
3165 * be migrated to other cpus. Try draining any left work. We 3243 * be migrated to other cpus. Try draining any left work. We
3166 * want to get it over with ASAP - spam rescuers, wake up as 3244 * want to get it over with ASAP - spam rescuers, wake up as
3167 * many idlers as necessary and create new ones till the 3245 * many idlers as necessary and create new ones till the
3168 * worklist is empty. Note that if the gcwq is frozen, there 3246 * worklist is empty. Note that if the gcwq is frozen, there
3169 * may be frozen works in freezeable cwqs. Don't declare 3247 * may be frozen works in freezeable cwqs. Don't declare
3170 * completion while frozen. 3248 * completion while frozen.
3171 */ 3249 */
3172 while (gcwq->nr_workers != gcwq->nr_idle || 3250 while (gcwq->nr_workers != gcwq->nr_idle ||
3173 gcwq->flags & GCWQ_FREEZING || 3251 gcwq->flags & GCWQ_FREEZING ||
3174 gcwq->trustee_state == TRUSTEE_IN_CHARGE) { 3252 gcwq->trustee_state == TRUSTEE_IN_CHARGE) {
3175 int nr_works = 0; 3253 int nr_works = 0;
3176 3254
3177 list_for_each_entry(work, &gcwq->worklist, entry) { 3255 list_for_each_entry(work, &gcwq->worklist, entry) {
3178 send_mayday(work); 3256 send_mayday(work);
3179 nr_works++; 3257 nr_works++;
3180 } 3258 }
3181 3259
3182 list_for_each_entry(worker, &gcwq->idle_list, entry) { 3260 list_for_each_entry(worker, &gcwq->idle_list, entry) {
3183 if (!nr_works--) 3261 if (!nr_works--)
3184 break; 3262 break;
3185 wake_up_process(worker->task); 3263 wake_up_process(worker->task);
3186 } 3264 }
3187 3265
3188 if (need_to_create_worker(gcwq)) { 3266 if (need_to_create_worker(gcwq)) {
3189 spin_unlock_irq(&gcwq->lock); 3267 spin_unlock_irq(&gcwq->lock);
3190 worker = create_worker(gcwq, false); 3268 worker = create_worker(gcwq, false);
3191 spin_lock_irq(&gcwq->lock); 3269 spin_lock_irq(&gcwq->lock);
3192 if (worker) { 3270 if (worker) {
3193 worker->flags |= WORKER_ROGUE; 3271 worker->flags |= WORKER_ROGUE;
3194 start_worker(worker); 3272 start_worker(worker);
3195 } 3273 }
3196 } 3274 }
3197 3275
3198 /* give a breather */ 3276 /* give a breather */
3199 if (trustee_wait_event_timeout(false, TRUSTEE_COOLDOWN) < 0) 3277 if (trustee_wait_event_timeout(false, TRUSTEE_COOLDOWN) < 0)
3200 break; 3278 break;
3201 } 3279 }
3202 3280
3203 /* 3281 /*
3204 * Either all works have been scheduled and cpu is down, or 3282 * Either all works have been scheduled and cpu is down, or
3205 * cpu down has already been canceled. Wait for and butcher 3283 * cpu down has already been canceled. Wait for and butcher
3206 * all workers till we're canceled. 3284 * all workers till we're canceled.
3207 */ 3285 */
3208 do { 3286 do {
3209 rc = trustee_wait_event(!list_empty(&gcwq->idle_list)); 3287 rc = trustee_wait_event(!list_empty(&gcwq->idle_list));
3210 while (!list_empty(&gcwq->idle_list)) 3288 while (!list_empty(&gcwq->idle_list))
3211 destroy_worker(list_first_entry(&gcwq->idle_list, 3289 destroy_worker(list_first_entry(&gcwq->idle_list,
3212 struct worker, entry)); 3290 struct worker, entry));
3213 } while (gcwq->nr_workers && rc >= 0); 3291 } while (gcwq->nr_workers && rc >= 0);
3214 3292
3215 /* 3293 /*
3216 * At this point, either draining has completed and no worker 3294 * At this point, either draining has completed and no worker
3217 * is left, or cpu down has been canceled or the cpu is being 3295 * is left, or cpu down has been canceled or the cpu is being
3218 * brought back up. There shouldn't be any idle one left. 3296 * brought back up. There shouldn't be any idle one left.
3219 * Tell the remaining busy ones to rebind once it finishes the 3297 * Tell the remaining busy ones to rebind once it finishes the
3220 * currently scheduled works by scheduling the rebind_work. 3298 * currently scheduled works by scheduling the rebind_work.
3221 */ 3299 */
3222 WARN_ON(!list_empty(&gcwq->idle_list)); 3300 WARN_ON(!list_empty(&gcwq->idle_list));
3223 3301
3224 for_each_busy_worker(worker, i, pos, gcwq) { 3302 for_each_busy_worker(worker, i, pos, gcwq) {
3225 struct work_struct *rebind_work = &worker->rebind_work; 3303 struct work_struct *rebind_work = &worker->rebind_work;
3226 3304
3227 /* 3305 /*
3228 * Rebind_work may race with future cpu hotplug 3306 * Rebind_work may race with future cpu hotplug
3229 * operations. Use a separate flag to mark that 3307 * operations. Use a separate flag to mark that
3230 * rebinding is scheduled. 3308 * rebinding is scheduled.
3231 */ 3309 */
3232 worker->flags |= WORKER_REBIND; 3310 worker->flags |= WORKER_REBIND;
3233 worker->flags &= ~WORKER_ROGUE; 3311 worker->flags &= ~WORKER_ROGUE;
3234 3312
3235 /* queue rebind_work, wq doesn't matter, use the default one */ 3313 /* queue rebind_work, wq doesn't matter, use the default one */
3236 if (test_and_set_bit(WORK_STRUCT_PENDING_BIT, 3314 if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
3237 work_data_bits(rebind_work))) 3315 work_data_bits(rebind_work)))
3238 continue; 3316 continue;
3239 3317
3240 debug_work_activate(rebind_work); 3318 debug_work_activate(rebind_work);
3241 insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work, 3319 insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
3242 worker->scheduled.next, 3320 worker->scheduled.next,
3243 work_color_to_flags(WORK_NO_COLOR)); 3321 work_color_to_flags(WORK_NO_COLOR));
3244 } 3322 }
3245 3323
3246 /* relinquish manager role */ 3324 /* relinquish manager role */
3247 gcwq->flags &= ~GCWQ_MANAGING_WORKERS; 3325 gcwq->flags &= ~GCWQ_MANAGING_WORKERS;
3248 3326
3249 /* notify completion */ 3327 /* notify completion */
3250 gcwq->trustee = NULL; 3328 gcwq->trustee = NULL;
3251 gcwq->trustee_state = TRUSTEE_DONE; 3329 gcwq->trustee_state = TRUSTEE_DONE;
3252 wake_up_all(&gcwq->trustee_wait); 3330 wake_up_all(&gcwq->trustee_wait);
3253 spin_unlock_irq(&gcwq->lock); 3331 spin_unlock_irq(&gcwq->lock);
3254 return 0; 3332 return 0;
3255 } 3333 }
3256 3334
3257 /** 3335 /**
3258 * wait_trustee_state - wait for trustee to enter the specified state 3336 * wait_trustee_state - wait for trustee to enter the specified state
3259 * @gcwq: gcwq the trustee of interest belongs to 3337 * @gcwq: gcwq the trustee of interest belongs to
3260 * @state: target state to wait for 3338 * @state: target state to wait for
3261 * 3339 *
3262 * Wait for the trustee to reach @state. DONE is already matched. 3340 * Wait for the trustee to reach @state. DONE is already matched.
3263 * 3341 *
3264 * CONTEXT: 3342 * CONTEXT:
3265 * spin_lock_irq(gcwq->lock) which may be released and regrabbed 3343 * spin_lock_irq(gcwq->lock) which may be released and regrabbed
3266 * multiple times. To be used by cpu_callback. 3344 * multiple times. To be used by cpu_callback.
3267 */ 3345 */
3268 static void __cpuinit wait_trustee_state(struct global_cwq *gcwq, int state) 3346 static void __cpuinit wait_trustee_state(struct global_cwq *gcwq, int state)
3269 __releases(&gcwq->lock) 3347 __releases(&gcwq->lock)
3270 __acquires(&gcwq->lock) 3348 __acquires(&gcwq->lock)
3271 { 3349 {
3272 if (!(gcwq->trustee_state == state || 3350 if (!(gcwq->trustee_state == state ||
3273 gcwq->trustee_state == TRUSTEE_DONE)) { 3351 gcwq->trustee_state == TRUSTEE_DONE)) {
3274 spin_unlock_irq(&gcwq->lock); 3352 spin_unlock_irq(&gcwq->lock);
3275 __wait_event(gcwq->trustee_wait, 3353 __wait_event(gcwq->trustee_wait,
3276 gcwq->trustee_state == state || 3354 gcwq->trustee_state == state ||
3277 gcwq->trustee_state == TRUSTEE_DONE); 3355 gcwq->trustee_state == TRUSTEE_DONE);
3278 spin_lock_irq(&gcwq->lock); 3356 spin_lock_irq(&gcwq->lock);
3279 } 3357 }
3280 } 3358 }
3281 3359
3282 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb, 3360 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
3283 unsigned long action, 3361 unsigned long action,
3284 void *hcpu) 3362 void *hcpu)
3285 { 3363 {
3286 unsigned int cpu = (unsigned long)hcpu; 3364 unsigned int cpu = (unsigned long)hcpu;
3287 struct global_cwq *gcwq = get_gcwq(cpu); 3365 struct global_cwq *gcwq = get_gcwq(cpu);
3288 struct task_struct *new_trustee = NULL; 3366 struct task_struct *new_trustee = NULL;
3289 struct worker *uninitialized_var(new_worker); 3367 struct worker *uninitialized_var(new_worker);
3290 unsigned long flags; 3368 unsigned long flags;
3291 3369
3292 action &= ~CPU_TASKS_FROZEN; 3370 action &= ~CPU_TASKS_FROZEN;
3293 3371
3294 switch (action) { 3372 switch (action) {
3295 case CPU_DOWN_PREPARE: 3373 case CPU_DOWN_PREPARE:
3296 new_trustee = kthread_create(trustee_thread, gcwq, 3374 new_trustee = kthread_create(trustee_thread, gcwq,
3297 "workqueue_trustee/%d\n", cpu); 3375 "workqueue_trustee/%d\n", cpu);
3298 if (IS_ERR(new_trustee)) 3376 if (IS_ERR(new_trustee))
3299 return notifier_from_errno(PTR_ERR(new_trustee)); 3377 return notifier_from_errno(PTR_ERR(new_trustee));
3300 kthread_bind(new_trustee, cpu); 3378 kthread_bind(new_trustee, cpu);
3301 /* fall through */ 3379 /* fall through */
3302 case CPU_UP_PREPARE: 3380 case CPU_UP_PREPARE:
3303 BUG_ON(gcwq->first_idle); 3381 BUG_ON(gcwq->first_idle);
3304 new_worker = create_worker(gcwq, false); 3382 new_worker = create_worker(gcwq, false);
3305 if (!new_worker) { 3383 if (!new_worker) {
3306 if (new_trustee) 3384 if (new_trustee)
3307 kthread_stop(new_trustee); 3385 kthread_stop(new_trustee);
3308 return NOTIFY_BAD; 3386 return NOTIFY_BAD;
3309 } 3387 }
3310 } 3388 }
3311 3389
3312 /* some are called w/ irq disabled, don't disturb irq status */ 3390 /* some are called w/ irq disabled, don't disturb irq status */
3313 spin_lock_irqsave(&gcwq->lock, flags); 3391 spin_lock_irqsave(&gcwq->lock, flags);
3314 3392
3315 switch (action) { 3393 switch (action) {
3316 case CPU_DOWN_PREPARE: 3394 case CPU_DOWN_PREPARE:
3317 /* initialize trustee and tell it to acquire the gcwq */ 3395 /* initialize trustee and tell it to acquire the gcwq */
3318 BUG_ON(gcwq->trustee || gcwq->trustee_state != TRUSTEE_DONE); 3396 BUG_ON(gcwq->trustee || gcwq->trustee_state != TRUSTEE_DONE);
3319 gcwq->trustee = new_trustee; 3397 gcwq->trustee = new_trustee;
3320 gcwq->trustee_state = TRUSTEE_START; 3398 gcwq->trustee_state = TRUSTEE_START;
3321 wake_up_process(gcwq->trustee); 3399 wake_up_process(gcwq->trustee);
3322 wait_trustee_state(gcwq, TRUSTEE_IN_CHARGE); 3400 wait_trustee_state(gcwq, TRUSTEE_IN_CHARGE);
3323 /* fall through */ 3401 /* fall through */
3324 case CPU_UP_PREPARE: 3402 case CPU_UP_PREPARE:
3325 BUG_ON(gcwq->first_idle); 3403 BUG_ON(gcwq->first_idle);
3326 gcwq->first_idle = new_worker; 3404 gcwq->first_idle = new_worker;
3327 break; 3405 break;
3328 3406
3329 case CPU_DYING: 3407 case CPU_DYING:
3330 /* 3408 /*
3331 * Before this, the trustee and all workers except for 3409 * Before this, the trustee and all workers except for
3332 * the ones which are still executing works from 3410 * the ones which are still executing works from
3333 * before the last CPU down must be on the cpu. After 3411 * before the last CPU down must be on the cpu. After
3334 * this, they'll all have been migrated to other cpus. 3412 * this, they'll all have been migrated to other cpus.
3335 */ 3413 */
3336 gcwq->flags |= GCWQ_DISASSOCIATED; 3414 gcwq->flags |= GCWQ_DISASSOCIATED;
3337 break; 3415 break;
3338 3416
3339 case CPU_POST_DEAD: 3417 case CPU_POST_DEAD:
3340 gcwq->trustee_state = TRUSTEE_BUTCHER; 3418 gcwq->trustee_state = TRUSTEE_BUTCHER;
3341 /* fall through */ 3419 /* fall through */
3342 case CPU_UP_CANCELED: 3420 case CPU_UP_CANCELED:
3343 destroy_worker(gcwq->first_idle); 3421 destroy_worker(gcwq->first_idle);
3344 gcwq->first_idle = NULL; 3422 gcwq->first_idle = NULL;
3345 break; 3423 break;
3346 3424
3347 case CPU_DOWN_FAILED: 3425 case CPU_DOWN_FAILED:
3348 case CPU_ONLINE: 3426 case CPU_ONLINE:
3349 gcwq->flags &= ~GCWQ_DISASSOCIATED; 3427 gcwq->flags &= ~GCWQ_DISASSOCIATED;
3350 if (gcwq->trustee_state != TRUSTEE_DONE) { 3428 if (gcwq->trustee_state != TRUSTEE_DONE) {
3351 gcwq->trustee_state = TRUSTEE_RELEASE; 3429 gcwq->trustee_state = TRUSTEE_RELEASE;
3352 wake_up_process(gcwq->trustee); 3430 wake_up_process(gcwq->trustee);
3353 wait_trustee_state(gcwq, TRUSTEE_DONE); 3431 wait_trustee_state(gcwq, TRUSTEE_DONE);
3354 } 3432 }
3355 3433
3356 /* 3434 /*
3357 * Trustee is done and there might be no worker left. 3435 * Trustee is done and there might be no worker left.
3358 * Put the first_idle in and request a real manager to 3436 * Put the first_idle in and request a real manager to
3359 * take a look. 3437 * take a look.
3360 */ 3438 */
3361 spin_unlock_irq(&gcwq->lock); 3439 spin_unlock_irq(&gcwq->lock);
3362 kthread_bind(gcwq->first_idle->task, cpu); 3440 kthread_bind(gcwq->first_idle->task, cpu);
3363 spin_lock_irq(&gcwq->lock); 3441 spin_lock_irq(&gcwq->lock);
3364 gcwq->flags |= GCWQ_MANAGE_WORKERS; 3442 gcwq->flags |= GCWQ_MANAGE_WORKERS;
3365 start_worker(gcwq->first_idle); 3443 start_worker(gcwq->first_idle);
3366 gcwq->first_idle = NULL; 3444 gcwq->first_idle = NULL;
3367 break; 3445 break;
3368 } 3446 }
3369 3447
3370 spin_unlock_irqrestore(&gcwq->lock, flags); 3448 spin_unlock_irqrestore(&gcwq->lock, flags);
3371 3449
3372 return notifier_from_errno(0); 3450 return notifier_from_errno(0);
3373 } 3451 }
3374 3452
3375 #ifdef CONFIG_SMP 3453 #ifdef CONFIG_SMP
3376 3454
3377 struct work_for_cpu { 3455 struct work_for_cpu {
3378 struct completion completion; 3456 struct completion completion;
3379 long (*fn)(void *); 3457 long (*fn)(void *);
3380 void *arg; 3458 void *arg;
3381 long ret; 3459 long ret;
3382 }; 3460 };
3383 3461
3384 static int do_work_for_cpu(void *_wfc) 3462 static int do_work_for_cpu(void *_wfc)
3385 { 3463 {
3386 struct work_for_cpu *wfc = _wfc; 3464 struct work_for_cpu *wfc = _wfc;
3387 wfc->ret = wfc->fn(wfc->arg); 3465 wfc->ret = wfc->fn(wfc->arg);
3388 complete(&wfc->completion); 3466 complete(&wfc->completion);
3389 return 0; 3467 return 0;
3390 } 3468 }
3391 3469
3392 /** 3470 /**
3393 * work_on_cpu - run a function in user context on a particular cpu 3471 * work_on_cpu - run a function in user context on a particular cpu
3394 * @cpu: the cpu to run on 3472 * @cpu: the cpu to run on
3395 * @fn: the function to run 3473 * @fn: the function to run
3396 * @arg: the function arg 3474 * @arg: the function arg
3397 * 3475 *
3398 * This will return the value @fn returns. 3476 * This will return the value @fn returns.
3399 * It is up to the caller to ensure that the cpu doesn't go offline. 3477 * It is up to the caller to ensure that the cpu doesn't go offline.
3400 * The caller must not hold any locks which would prevent @fn from completing. 3478 * The caller must not hold any locks which would prevent @fn from completing.
3401 */ 3479 */
3402 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg) 3480 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
3403 { 3481 {
3404 struct task_struct *sub_thread; 3482 struct task_struct *sub_thread;
3405 struct work_for_cpu wfc = { 3483 struct work_for_cpu wfc = {
3406 .completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion), 3484 .completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion),
3407 .fn = fn, 3485 .fn = fn,
3408 .arg = arg, 3486 .arg = arg,
3409 }; 3487 };
3410 3488
3411 sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu"); 3489 sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu");
3412 if (IS_ERR(sub_thread)) 3490 if (IS_ERR(sub_thread))
3413 return PTR_ERR(sub_thread); 3491 return PTR_ERR(sub_thread);
3414 kthread_bind(sub_thread, cpu); 3492 kthread_bind(sub_thread, cpu);
3415 wake_up_process(sub_thread); 3493 wake_up_process(sub_thread);
3416 wait_for_completion(&wfc.completion); 3494 wait_for_completion(&wfc.completion);
3417 return wfc.ret; 3495 return wfc.ret;
3418 } 3496 }
3419 EXPORT_SYMBOL_GPL(work_on_cpu); 3497 EXPORT_SYMBOL_GPL(work_on_cpu);
3420 #endif /* CONFIG_SMP */ 3498 #endif /* CONFIG_SMP */
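A sketch of a work_on_cpu() caller (my_read_reg and my_query_cpu are hypothetical): the caller is responsible for keeping @cpu online and for not holding locks the function might need:

#include <linux/workqueue.h>

static long my_read_reg(void *arg)
{
        /* runs in a kthread bound to the target CPU */
        return 0;               /* value read from a CPU-local register */
}

static long my_query_cpu(unsigned int cpu)
{
        /* e.g. bracket with get_online_cpus()/put_online_cpus() */
        return work_on_cpu(cpu, my_read_reg, NULL);
}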
3421 3499
3422 #ifdef CONFIG_FREEZER 3500 #ifdef CONFIG_FREEZER
3423 3501
3424 /** 3502 /**
3425 * freeze_workqueues_begin - begin freezing workqueues 3503 * freeze_workqueues_begin - begin freezing workqueues
3426 * 3504 *
3427 * Start freezing workqueues. After this function returns, all 3505 * Start freezing workqueues. After this function returns, all
3428 * freezeable workqueues will queue new works to their frozen_works 3506 * freezeable workqueues will queue new works to their frozen_works
3429 * list instead of gcwq->worklist. 3507 * list instead of gcwq->worklist.
3430 * 3508 *
3431 * CONTEXT: 3509 * CONTEXT:
3432 * Grabs and releases workqueue_lock and gcwq->lock's. 3510 * Grabs and releases workqueue_lock and gcwq->lock's.
3433 */ 3511 */
3434 void freeze_workqueues_begin(void) 3512 void freeze_workqueues_begin(void)
3435 { 3513 {
3436 unsigned int cpu; 3514 unsigned int cpu;
3437 3515
3438 spin_lock(&workqueue_lock); 3516 spin_lock(&workqueue_lock);
3439 3517
3440 BUG_ON(workqueue_freezing); 3518 BUG_ON(workqueue_freezing);
3441 workqueue_freezing = true; 3519 workqueue_freezing = true;
3442 3520
3443 for_each_gcwq_cpu(cpu) { 3521 for_each_gcwq_cpu(cpu) {
3444 struct global_cwq *gcwq = get_gcwq(cpu); 3522 struct global_cwq *gcwq = get_gcwq(cpu);
3445 struct workqueue_struct *wq; 3523 struct workqueue_struct *wq;
3446 3524
3447 spin_lock_irq(&gcwq->lock); 3525 spin_lock_irq(&gcwq->lock);
3448 3526
3449 BUG_ON(gcwq->flags & GCWQ_FREEZING); 3527 BUG_ON(gcwq->flags & GCWQ_FREEZING);
3450 gcwq->flags |= GCWQ_FREEZING; 3528 gcwq->flags |= GCWQ_FREEZING;
3451 3529
3452 list_for_each_entry(wq, &workqueues, list) { 3530 list_for_each_entry(wq, &workqueues, list) {
3453 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 3531 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
3454 3532
3455 if (cwq && wq->flags & WQ_FREEZEABLE) 3533 if (cwq && wq->flags & WQ_FREEZEABLE)
3456 cwq->max_active = 0; 3534 cwq->max_active = 0;
3457 } 3535 }
3458 3536
3459 spin_unlock_irq(&gcwq->lock); 3537 spin_unlock_irq(&gcwq->lock);
3460 } 3538 }
3461 3539
3462 spin_unlock(&workqueue_lock); 3540 spin_unlock(&workqueue_lock);
3463 } 3541 }
3464 3542
3465 /** 3543 /**
3466 * freeze_workqueues_busy - are freezeable workqueues still busy? 3544 * freeze_workqueues_busy - are freezeable workqueues still busy?
3467 * 3545 *
3468 * Check whether freezing is complete. This function must be called 3546 * Check whether freezing is complete. This function must be called
3469 * between freeze_workqueues_begin() and thaw_workqueues(). 3547 * between freeze_workqueues_begin() and thaw_workqueues().
3470 * 3548 *
3471 * CONTEXT: 3549 * CONTEXT:
3472 * Grabs and releases workqueue_lock. 3550 * Grabs and releases workqueue_lock.
3473 * 3551 *
3474 * RETURNS: 3552 * RETURNS:
3475 * %true if some freezeable workqueues are still busy. %false if 3553 * %true if some freezeable workqueues are still busy. %false if
3476 * freezing is complete. 3554 * freezing is complete.
3477 */ 3555 */
3478 bool freeze_workqueues_busy(void) 3556 bool freeze_workqueues_busy(void)
3479 { 3557 {
3480 unsigned int cpu; 3558 unsigned int cpu;
3481 bool busy = false; 3559 bool busy = false;
3482 3560
3483 spin_lock(&workqueue_lock); 3561 spin_lock(&workqueue_lock);
3484 3562
3485 BUG_ON(!workqueue_freezing); 3563 BUG_ON(!workqueue_freezing);
3486 3564
3487 for_each_gcwq_cpu(cpu) { 3565 for_each_gcwq_cpu(cpu) {
3488 struct workqueue_struct *wq; 3566 struct workqueue_struct *wq;
3489 /* 3567 /*
3490 * nr_active is monotonically decreasing. It's safe 3568 * nr_active is monotonically decreasing. It's safe
3491 * to peek without lock. 3569 * to peek without lock.
3492 */ 3570 */
3493 list_for_each_entry(wq, &workqueues, list) { 3571 list_for_each_entry(wq, &workqueues, list) {
3494 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq); 3572 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
3495 3573
3496 if (!cwq || !(wq->flags & WQ_FREEZEABLE)) 3574 if (!cwq || !(wq->flags & WQ_FREEZEABLE))
3497 continue; 3575 continue;
3498 3576
3499 BUG_ON(cwq->nr_active < 0); 3577 BUG_ON(cwq->nr_active < 0);
3500 if (cwq->nr_active) { 3578 if (cwq->nr_active) {
3501 busy = true; 3579 busy = true;
3502 goto out_unlock; 3580 goto out_unlock;
3503 } 3581 }
3504 } 3582 }
3505 } 3583 }
3506 out_unlock: 3584 out_unlock:
3507 spin_unlock(&workqueue_lock); 3585 spin_unlock(&workqueue_lock);
3508 return busy; 3586 return busy;
3509 } 3587 }
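The RETURNS contract above implies a simple polling loop on the caller's side. A minimal sketch of how the suspend path is expected to drive this freeze/thaw API (illustrative only; the real caller in kernel/power/ also freezes tasks and handles wakeups, and the timeout value here is an assumption):

static int freeze_workqueues_example(void)
{
	unsigned long deadline = jiffies + 10 * HZ;	/* arbitrary budget */

	freeze_workqueues_begin();		/* new work now lands on frozen lists */

	while (freeze_workqueues_busy()) {	/* in-flight work still running */
		if (time_after(jiffies, deadline)) {
			thaw_workqueues();	/* roll back on timeout */
			return -EBUSY;
		}
		msleep(10);
	}
	return 0;	/* all freezeable workqueues quiesced; thaw_workqueues() on resume */
}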
3510 3588
3511 /** 3589 /**
3512 * thaw_workqueues - thaw workqueues 3590 * thaw_workqueues - thaw workqueues
3513 * 3591 *
3514 * Thaw workqueues. Normal queueing is restored and all collected 3592 * Thaw workqueues. Normal queueing is restored and all collected
3515 * frozen works are transferred to their respective gcwq worklists. 3593 * frozen works are transferred to their respective gcwq worklists.
3516 * 3594 *
3517 * CONTEXT: 3595 * CONTEXT:
3518 * Grabs and releases workqueue_lock and gcwq->lock's. 3596 * Grabs and releases workqueue_lock and gcwq->lock's.
3519 */ 3597 */
3520 void thaw_workqueues(void) 3598 void thaw_workqueues(void)
3521 { 3599 {
3522 unsigned int cpu; 3600 unsigned int cpu;
3523 3601
3524 spin_lock(&workqueue_lock); 3602 spin_lock(&workqueue_lock);
3525 3603
3526 if (!workqueue_freezing) 3604 if (!workqueue_freezing)
3527 goto out_unlock; 3605 goto out_unlock;
3528 3606
3529 for_each_gcwq_cpu(cpu) { 3607 for_each_gcwq_cpu(cpu) {
mm/memory_hotplug.c
1 /* 1 /*
2 * linux/mm/memory_hotplug.c 2 * linux/mm/memory_hotplug.c
3 * 3 *
4 * Copyright (C) 4 * Copyright (C)
5 */ 5 */
6 6
7 #include <linux/stddef.h> 7 #include <linux/stddef.h>
8 #include <linux/mm.h> 8 #include <linux/mm.h>
9 #include <linux/swap.h> 9 #include <linux/swap.h>
10 #include <linux/interrupt.h> 10 #include <linux/interrupt.h>
11 #include <linux/pagemap.h> 11 #include <linux/pagemap.h>
12 #include <linux/bootmem.h> 12 #include <linux/bootmem.h>
13 #include <linux/compiler.h> 13 #include <linux/compiler.h>
14 #include <linux/module.h> 14 #include <linux/module.h>
15 #include <linux/pagevec.h> 15 #include <linux/pagevec.h>
16 #include <linux/writeback.h> 16 #include <linux/writeback.h>
17 #include <linux/slab.h> 17 #include <linux/slab.h>
18 #include <linux/sysctl.h> 18 #include <linux/sysctl.h>
19 #include <linux/cpu.h> 19 #include <linux/cpu.h>
20 #include <linux/memory.h> 20 #include <linux/memory.h>
21 #include <linux/memory_hotplug.h> 21 #include <linux/memory_hotplug.h>
22 #include <linux/highmem.h> 22 #include <linux/highmem.h>
23 #include <linux/vmalloc.h> 23 #include <linux/vmalloc.h>
24 #include <linux/ioport.h> 24 #include <linux/ioport.h>
25 #include <linux/delay.h> 25 #include <linux/delay.h>
26 #include <linux/migrate.h> 26 #include <linux/migrate.h>
27 #include <linux/page-isolation.h> 27 #include <linux/page-isolation.h>
28 #include <linux/pfn.h> 28 #include <linux/pfn.h>
29 #include <linux/suspend.h> 29 #include <linux/suspend.h>
30 #include <linux/mm_inline.h> 30 #include <linux/mm_inline.h>
31 #include <linux/firmware-map.h> 31 #include <linux/firmware-map.h>
32 32
33 #include <asm/tlbflush.h> 33 #include <asm/tlbflush.h>
34 34
35 #include "internal.h" 35 #include "internal.h"
36 36
37 /* add this memory to iomem resource */ 37 /* add this memory to iomem resource */
38 static struct resource *register_memory_resource(u64 start, u64 size) 38 static struct resource *register_memory_resource(u64 start, u64 size)
39 { 39 {
40 struct resource *res; 40 struct resource *res;
41 res = kzalloc(sizeof(struct resource), GFP_KERNEL); 41 res = kzalloc(sizeof(struct resource), GFP_KERNEL);
42 BUG_ON(!res); 42 BUG_ON(!res);
43 43
44 res->name = "System RAM"; 44 res->name = "System RAM";
45 res->start = start; 45 res->start = start;
46 res->end = start + size - 1; 46 res->end = start + size - 1;
47 res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; 47 res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
48 if (request_resource(&iomem_resource, res) < 0) { 48 if (request_resource(&iomem_resource, res) < 0) {
49 printk("System RAM resource %llx - %llx cannot be added\n", 49 printk("System RAM resource %llx - %llx cannot be added\n",
50 (unsigned long long)res->start, (unsigned long long)res->end); 50 (unsigned long long)res->start, (unsigned long long)res->end);
51 kfree(res); 51 kfree(res);
52 res = NULL; 52 res = NULL;
53 } 53 }
54 return res; 54 return res;
55 } 55 }
56 56
57 static void release_memory_resource(struct resource *res) 57 static void release_memory_resource(struct resource *res)
58 { 58 {
59 if (!res) 59 if (!res)
60 return; 60 return;
61 release_resource(res); 61 release_resource(res);
62 kfree(res); 62 kfree(res);
63 return; 63 return;
64 } 64 }
65 65
66 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE 66 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
67 #ifndef CONFIG_SPARSEMEM_VMEMMAP 67 #ifndef CONFIG_SPARSEMEM_VMEMMAP
68 static void get_page_bootmem(unsigned long info, struct page *page, int type) 68 static void get_page_bootmem(unsigned long info, struct page *page, int type)
69 { 69 {
70 atomic_set(&page->_mapcount, type); 70 atomic_set(&page->_mapcount, type);
71 SetPagePrivate(page); 71 SetPagePrivate(page);
72 set_page_private(page, info); 72 set_page_private(page, info);
73 atomic_inc(&page->_count); 73 atomic_inc(&page->_count);
74 } 74 }
75 75
76 /* reference to __meminit __free_pages_bootmem is valid 76 /* reference to __meminit __free_pages_bootmem is valid
77 * so use __ref to tell modpost not to generate a warning */ 77 * so use __ref to tell modpost not to generate a warning */
78 void __ref put_page_bootmem(struct page *page) 78 void __ref put_page_bootmem(struct page *page)
79 { 79 {
80 int type; 80 int type;
81 81
82 type = atomic_read(&page->_mapcount); 82 type = atomic_read(&page->_mapcount);
83 BUG_ON(type >= -1); 83 BUG_ON(type >= -1);
84 84
85 if (atomic_dec_return(&page->_count) == 1) { 85 if (atomic_dec_return(&page->_count) == 1) {
86 ClearPagePrivate(page); 86 ClearPagePrivate(page);
87 set_page_private(page, 0); 87 set_page_private(page, 0);
88 reset_page_mapcount(page); 88 reset_page_mapcount(page);
89 __free_pages_bootmem(page, 0); 89 __free_pages_bootmem(page, 0);
90 } 90 }
91 91
92 } 92 }
93 93
94 static void register_page_bootmem_info_section(unsigned long start_pfn) 94 static void register_page_bootmem_info_section(unsigned long start_pfn)
95 { 95 {
96 unsigned long *usemap, mapsize, section_nr, i; 96 unsigned long *usemap, mapsize, section_nr, i;
97 struct mem_section *ms; 97 struct mem_section *ms;
98 struct page *page, *memmap; 98 struct page *page, *memmap;
99 99
100 if (!pfn_valid(start_pfn)) 100 if (!pfn_valid(start_pfn))
101 return; 101 return;
102 102
103 section_nr = pfn_to_section_nr(start_pfn); 103 section_nr = pfn_to_section_nr(start_pfn);
104 ms = __nr_to_section(section_nr); 104 ms = __nr_to_section(section_nr);
105 105
106 /* Get section's memmap address */ 106 /* Get section's memmap address */
107 memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); 107 memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
108 108
109 /* 109 /*
110 * Get page for the memmap's phys address 110 * Get page for the memmap's phys address
111 * XXX: need more consideration for sparse_vmemmap... 111 * XXX: need more consideration for sparse_vmemmap...
112 */ 112 */
113 page = virt_to_page(memmap); 113 page = virt_to_page(memmap);
114 mapsize = sizeof(struct page) * PAGES_PER_SECTION; 114 mapsize = sizeof(struct page) * PAGES_PER_SECTION;
115 mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT; 115 mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
116 116
117 /* remember memmap's page */ 117 /* remember memmap's page */
118 for (i = 0; i < mapsize; i++, page++) 118 for (i = 0; i < mapsize; i++, page++)
119 get_page_bootmem(section_nr, page, SECTION_INFO); 119 get_page_bootmem(section_nr, page, SECTION_INFO);
120 120
121 usemap = __nr_to_section(section_nr)->pageblock_flags; 121 usemap = __nr_to_section(section_nr)->pageblock_flags;
122 page = virt_to_page(usemap); 122 page = virt_to_page(usemap);
123 123
124 mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; 124 mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
125 125
126 for (i = 0; i < mapsize; i++, page++) 126 for (i = 0; i < mapsize; i++, page++)
127 get_page_bootmem(section_nr, page, MIX_SECTION_INFO); 127 get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
128 128
129 } 129 }
130 130
131 void register_page_bootmem_info_node(struct pglist_data *pgdat) 131 void register_page_bootmem_info_node(struct pglist_data *pgdat)
132 { 132 {
133 unsigned long i, pfn, end_pfn, nr_pages; 133 unsigned long i, pfn, end_pfn, nr_pages;
134 int node = pgdat->node_id; 134 int node = pgdat->node_id;
135 struct page *page; 135 struct page *page;
136 struct zone *zone; 136 struct zone *zone;
137 137
138 nr_pages = PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT; 138 nr_pages = PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT;
139 page = virt_to_page(pgdat); 139 page = virt_to_page(pgdat);
140 140
141 for (i = 0; i < nr_pages; i++, page++) 141 for (i = 0; i < nr_pages; i++, page++)
142 get_page_bootmem(node, page, NODE_INFO); 142 get_page_bootmem(node, page, NODE_INFO);
143 143
144 zone = &pgdat->node_zones[0]; 144 zone = &pgdat->node_zones[0];
145 for (; zone < pgdat->node_zones + MAX_NR_ZONES - 1; zone++) { 145 for (; zone < pgdat->node_zones + MAX_NR_ZONES - 1; zone++) {
146 if (zone->wait_table) { 146 if (zone->wait_table) {
147 nr_pages = zone->wait_table_hash_nr_entries 147 nr_pages = zone->wait_table_hash_nr_entries
148 * sizeof(wait_queue_head_t); 148 * sizeof(wait_queue_head_t);
149 nr_pages = PAGE_ALIGN(nr_pages) >> PAGE_SHIFT; 149 nr_pages = PAGE_ALIGN(nr_pages) >> PAGE_SHIFT;
150 page = virt_to_page(zone->wait_table); 150 page = virt_to_page(zone->wait_table);
151 151
152 for (i = 0; i < nr_pages; i++, page++) 152 for (i = 0; i < nr_pages; i++, page++)
153 get_page_bootmem(node, page, NODE_INFO); 153 get_page_bootmem(node, page, NODE_INFO);
154 } 154 }
155 } 155 }
156 156
157 pfn = pgdat->node_start_pfn; 157 pfn = pgdat->node_start_pfn;
158 end_pfn = pfn + pgdat->node_spanned_pages; 158 end_pfn = pfn + pgdat->node_spanned_pages;
159 159
160 /* register_section info */ 160 /* register_section info */
161 for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) 161 for (; pfn < end_pfn; pfn += PAGES_PER_SECTION)
162 register_page_bootmem_info_section(pfn); 162 register_page_bootmem_info_section(pfn);
163 163
164 } 164 }
165 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */ 165 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
166 166
167 static void grow_zone_span(struct zone *zone, unsigned long start_pfn, 167 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
168 unsigned long end_pfn) 168 unsigned long end_pfn)
169 { 169 {
170 unsigned long old_zone_end_pfn; 170 unsigned long old_zone_end_pfn;
171 171
172 zone_span_writelock(zone); 172 zone_span_writelock(zone);
173 173
174 old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages; 174 old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
175 if (start_pfn < zone->zone_start_pfn) 175 if (start_pfn < zone->zone_start_pfn)
176 zone->zone_start_pfn = start_pfn; 176 zone->zone_start_pfn = start_pfn;
177 177
178 zone->spanned_pages = max(old_zone_end_pfn, end_pfn) - 178 zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
179 zone->zone_start_pfn; 179 zone->zone_start_pfn;
180 180
181 zone_span_writeunlock(zone); 181 zone_span_writeunlock(zone);
182 } 182 }
183 183
184 static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn, 184 static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
185 unsigned long end_pfn) 185 unsigned long end_pfn)
186 { 186 {
187 unsigned long old_pgdat_end_pfn = 187 unsigned long old_pgdat_end_pfn =
188 pgdat->node_start_pfn + pgdat->node_spanned_pages; 188 pgdat->node_start_pfn + pgdat->node_spanned_pages;
189 189
190 if (start_pfn < pgdat->node_start_pfn) 190 if (start_pfn < pgdat->node_start_pfn)
191 pgdat->node_start_pfn = start_pfn; 191 pgdat->node_start_pfn = start_pfn;
192 192
193 pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) - 193 pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
194 pgdat->node_start_pfn; 194 pgdat->node_start_pfn;
195 } 195 }
196 196
197 static int __meminit __add_zone(struct zone *zone, unsigned long phys_start_pfn) 197 static int __meminit __add_zone(struct zone *zone, unsigned long phys_start_pfn)
198 { 198 {
199 struct pglist_data *pgdat = zone->zone_pgdat; 199 struct pglist_data *pgdat = zone->zone_pgdat;
200 int nr_pages = PAGES_PER_SECTION; 200 int nr_pages = PAGES_PER_SECTION;
201 int nid = pgdat->node_id; 201 int nid = pgdat->node_id;
202 int zone_type; 202 int zone_type;
203 unsigned long flags; 203 unsigned long flags;
204 204
205 zone_type = zone - pgdat->node_zones; 205 zone_type = zone - pgdat->node_zones;
206 if (!zone->wait_table) { 206 if (!zone->wait_table) {
207 int ret; 207 int ret;
208 208
209 ret = init_currently_empty_zone(zone, phys_start_pfn, 209 ret = init_currently_empty_zone(zone, phys_start_pfn,
210 nr_pages, MEMMAP_HOTPLUG); 210 nr_pages, MEMMAP_HOTPLUG);
211 if (ret) 211 if (ret)
212 return ret; 212 return ret;
213 } 213 }
214 pgdat_resize_lock(zone->zone_pgdat, &flags); 214 pgdat_resize_lock(zone->zone_pgdat, &flags);
215 grow_zone_span(zone, phys_start_pfn, phys_start_pfn + nr_pages); 215 grow_zone_span(zone, phys_start_pfn, phys_start_pfn + nr_pages);
216 grow_pgdat_span(zone->zone_pgdat, phys_start_pfn, 216 grow_pgdat_span(zone->zone_pgdat, phys_start_pfn,
217 phys_start_pfn + nr_pages); 217 phys_start_pfn + nr_pages);
218 pgdat_resize_unlock(zone->zone_pgdat, &flags); 218 pgdat_resize_unlock(zone->zone_pgdat, &flags);
219 memmap_init_zone(nr_pages, nid, zone_type, 219 memmap_init_zone(nr_pages, nid, zone_type,
220 phys_start_pfn, MEMMAP_HOTPLUG); 220 phys_start_pfn, MEMMAP_HOTPLUG);
221 return 0; 221 return 0;
222 } 222 }
223 223
224 static int __meminit __add_section(int nid, struct zone *zone, 224 static int __meminit __add_section(int nid, struct zone *zone,
225 unsigned long phys_start_pfn) 225 unsigned long phys_start_pfn)
226 { 226 {
227 int nr_pages = PAGES_PER_SECTION; 227 int nr_pages = PAGES_PER_SECTION;
228 int ret; 228 int ret;
229 229
230 if (pfn_valid(phys_start_pfn)) 230 if (pfn_valid(phys_start_pfn))
231 return -EEXIST; 231 return -EEXIST;
232 232
233 ret = sparse_add_one_section(zone, phys_start_pfn, nr_pages); 233 ret = sparse_add_one_section(zone, phys_start_pfn, nr_pages);
234 234
235 if (ret < 0) 235 if (ret < 0)
236 return ret; 236 return ret;
237 237
238 ret = __add_zone(zone, phys_start_pfn); 238 ret = __add_zone(zone, phys_start_pfn);
239 239
240 if (ret < 0) 240 if (ret < 0)
241 return ret; 241 return ret;
242 242
243 return register_new_memory(nid, __pfn_to_section(phys_start_pfn)); 243 return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
244 } 244 }
245 245
246 #ifdef CONFIG_SPARSEMEM_VMEMMAP 246 #ifdef CONFIG_SPARSEMEM_VMEMMAP
247 static int __remove_section(struct zone *zone, struct mem_section *ms) 247 static int __remove_section(struct zone *zone, struct mem_section *ms)
248 { 248 {
249 /* 249 /*
250 * XXX: Freeing memmap with vmemmap is not implemented yet. 250 * XXX: Freeing memmap with vmemmap is not implemented yet.
251 * This should be removed later. 251 * This should be removed later.
252 */ 252 */
253 return -EBUSY; 253 return -EBUSY;
254 } 254 }
255 #else 255 #else
256 static int __remove_section(struct zone *zone, struct mem_section *ms) 256 static int __remove_section(struct zone *zone, struct mem_section *ms)
257 { 257 {
258 unsigned long flags; 258 unsigned long flags;
259 struct pglist_data *pgdat = zone->zone_pgdat; 259 struct pglist_data *pgdat = zone->zone_pgdat;
260 int ret = -EINVAL; 260 int ret = -EINVAL;
261 261
262 if (!valid_section(ms)) 262 if (!valid_section(ms))
263 return ret; 263 return ret;
264 264
265 ret = unregister_memory_section(ms); 265 ret = unregister_memory_section(ms);
266 if (ret) 266 if (ret)
267 return ret; 267 return ret;
268 268
269 pgdat_resize_lock(pgdat, &flags); 269 pgdat_resize_lock(pgdat, &flags);
270 sparse_remove_one_section(zone, ms); 270 sparse_remove_one_section(zone, ms);
271 pgdat_resize_unlock(pgdat, &flags); 271 pgdat_resize_unlock(pgdat, &flags);
272 return 0; 272 return 0;
273 } 273 }
274 #endif 274 #endif
275 275
276 /* 276 /*
277 * Reasonably generic function for adding memory. It is 277 * Reasonably generic function for adding memory. It is
278 * expected that archs that support memory hotplug will 278 * expected that archs that support memory hotplug will
279 * call this function after deciding the zone to which to 279 * call this function after deciding the zone to which to
280 * add the new pages. 280 * add the new pages.
281 */ 281 */
282 int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn, 282 int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
283 unsigned long nr_pages) 283 unsigned long nr_pages)
284 { 284 {
285 unsigned long i; 285 unsigned long i;
286 int err = 0; 286 int err = 0;
287 int start_sec, end_sec; 287 int start_sec, end_sec;
288 /* while initializing the mem_map, align the hot-added range to sections */ 288 /* while initializing the mem_map, align the hot-added range to sections */
289 start_sec = pfn_to_section_nr(phys_start_pfn); 289 start_sec = pfn_to_section_nr(phys_start_pfn);
290 end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1); 290 end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
291 291
292 for (i = start_sec; i <= end_sec; i++) { 292 for (i = start_sec; i <= end_sec; i++) {
293 err = __add_section(nid, zone, i << PFN_SECTION_SHIFT); 293 err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);
294 294
295 /* 295 /*
296 * EEXIST is finally dealt with by ioresource collision 296 * EEXIST is finally dealt with by ioresource collision
297 * check. see add_memory() => register_memory_resource() 297 * check. see add_memory() => register_memory_resource()
298 * A warning will be printed if there is a collision. 298 * A warning will be printed if there is a collision.
299 */ 299 */
300 if (err && (err != -EEXIST)) 300 if (err && (err != -EEXIST))
301 break; 301 break;
302 err = 0; 302 err = 0;
303 } 303 }
304 304
305 return err; 305 return err;
306 } 306 }
307 EXPORT_SYMBOL_GPL(__add_pages); 307 EXPORT_SYMBOL_GPL(__add_pages);
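As the comment above says, __add_pages() is meant to be called from an architecture's memory hot-add hook once the target zone has been chosen. A sketch of such a caller follows; the ZONE_NORMAL choice and the bare structure are assumptions for illustration, since real architectures pick the zone from the address range and set up the direct mapping first:

int arch_add_memory(int nid, u64 start, u64 size)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	struct zone *zone = pgdat->node_zones + ZONE_NORMAL;	/* illustrative */
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;

	/* arch-specific page table / direct mapping setup would go here */
	return __add_pages(nid, zone, start_pfn, nr_pages);
}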
308 308
309 /** 309 /**
310 * __remove_pages() - remove sections of pages from a zone 310 * __remove_pages() - remove sections of pages from a zone
311 * @zone: zone from which pages need to be removed 311 * @zone: zone from which pages need to be removed
312 * @phys_start_pfn: starting pageframe (must be aligned to start of a section) 312 * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
313 * @nr_pages: number of pages to remove (must be multiple of section size) 313 * @nr_pages: number of pages to remove (must be multiple of section size)
314 * 314 *
315 * Generic helper function to remove section mappings and sysfs entries 315 * Generic helper function to remove section mappings and sysfs entries
316 * for the section of the memory we are removing. Caller needs to make 316 * for the section of the memory we are removing. Caller needs to make
317 * sure that pages are marked reserved and zones are adjusted properly by 317 * sure that pages are marked reserved and zones are adjusted properly by
318 * calling offline_pages(). 318 * calling offline_pages().
319 */ 319 */
320 int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, 320 int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
321 unsigned long nr_pages) 321 unsigned long nr_pages)
322 { 322 {
323 unsigned long i, ret = 0; 323 unsigned long i, ret = 0;
324 int sections_to_remove; 324 int sections_to_remove;
325 325
326 /* 326 /*
327 * We can only remove entire sections 327 * We can only remove entire sections
328 */ 328 */
329 BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK); 329 BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
330 BUG_ON(nr_pages % PAGES_PER_SECTION); 330 BUG_ON(nr_pages % PAGES_PER_SECTION);
331 331
332 sections_to_remove = nr_pages / PAGES_PER_SECTION; 332 sections_to_remove = nr_pages / PAGES_PER_SECTION;
333 for (i = 0; i < sections_to_remove; i++) { 333 for (i = 0; i < sections_to_remove; i++) {
334 unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION; 334 unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
335 release_mem_region(pfn << PAGE_SHIFT, 335 release_mem_region(pfn << PAGE_SHIFT,
336 PAGES_PER_SECTION << PAGE_SHIFT); 336 PAGES_PER_SECTION << PAGE_SHIFT);
337 ret = __remove_section(zone, __pfn_to_section(pfn)); 337 ret = __remove_section(zone, __pfn_to_section(pfn));
338 if (ret) 338 if (ret)
339 break; 339 break;
340 } 340 }
341 return ret; 341 return ret;
342 } 342 }
343 EXPORT_SYMBOL_GPL(__remove_pages); 343 EXPORT_SYMBOL_GPL(__remove_pages);
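The kerneldoc above requires section alignment for both arguments, and the BUG_ON()s enforce it. A tiny illustration of a conforming call (the helper name is made up for the example):

static int remove_one_section_example(struct zone *zone, unsigned long section_nr)
{
	/* section_nr_to_pfn() yields a section-aligned pfn and the count is a
	 * whole number of sections, so the alignment checks above hold */
	return __remove_pages(zone, section_nr_to_pfn(section_nr),
			      PAGES_PER_SECTION);
}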
344 344
345 void online_page(struct page *page) 345 void online_page(struct page *page)
346 { 346 {
347 unsigned long pfn = page_to_pfn(page); 347 unsigned long pfn = page_to_pfn(page);
348 348
349 totalram_pages++; 349 totalram_pages++;
350 if (pfn >= num_physpages) 350 if (pfn >= num_physpages)
351 num_physpages = pfn + 1; 351 num_physpages = pfn + 1;
352 352
353 #ifdef CONFIG_HIGHMEM 353 #ifdef CONFIG_HIGHMEM
354 if (PageHighMem(page)) 354 if (PageHighMem(page))
355 totalhigh_pages++; 355 totalhigh_pages++;
356 #endif 356 #endif
357 357
358 #ifdef CONFIG_FLATMEM 358 #ifdef CONFIG_FLATMEM
359 max_mapnr = max(page_to_pfn(page), max_mapnr); 359 max_mapnr = max(page_to_pfn(page), max_mapnr);
360 #endif 360 #endif
361 361
362 ClearPageReserved(page); 362 ClearPageReserved(page);
363 init_page_count(page); 363 init_page_count(page);
364 __free_page(page); 364 __free_page(page);
365 } 365 }
366 366
367 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages, 367 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
368 void *arg) 368 void *arg)
369 { 369 {
370 unsigned long i; 370 unsigned long i;
371 unsigned long onlined_pages = *(unsigned long *)arg; 371 unsigned long onlined_pages = *(unsigned long *)arg;
372 struct page *page; 372 struct page *page;
373 if (PageReserved(pfn_to_page(start_pfn))) 373 if (PageReserved(pfn_to_page(start_pfn)))
374 for (i = 0; i < nr_pages; i++) { 374 for (i = 0; i < nr_pages; i++) {
375 page = pfn_to_page(start_pfn + i); 375 page = pfn_to_page(start_pfn + i);
376 online_page(page); 376 online_page(page);
377 onlined_pages++; 377 onlined_pages++;
378 } 378 }
379 *(unsigned long *)arg = onlined_pages; 379 *(unsigned long *)arg = onlined_pages;
380 return 0; 380 return 0;
381 } 381 }
382 382
383 383
384 int online_pages(unsigned long pfn, unsigned long nr_pages) 384 int online_pages(unsigned long pfn, unsigned long nr_pages)
385 { 385 {
386 unsigned long onlined_pages = 0; 386 unsigned long onlined_pages = 0;
387 struct zone *zone; 387 struct zone *zone;
388 int need_zonelists_rebuild = 0; 388 int need_zonelists_rebuild = 0;
389 int nid; 389 int nid;
390 int ret; 390 int ret;
391 struct memory_notify arg; 391 struct memory_notify arg;
392 392
393 arg.start_pfn = pfn; 393 arg.start_pfn = pfn;
394 arg.nr_pages = nr_pages; 394 arg.nr_pages = nr_pages;
395 arg.status_change_nid = -1; 395 arg.status_change_nid = -1;
396 396
397 nid = page_to_nid(pfn_to_page(pfn)); 397 nid = page_to_nid(pfn_to_page(pfn));
398 if (node_present_pages(nid) == 0) 398 if (node_present_pages(nid) == 0)
399 arg.status_change_nid = nid; 399 arg.status_change_nid = nid;
400 400
401 ret = memory_notify(MEM_GOING_ONLINE, &arg); 401 ret = memory_notify(MEM_GOING_ONLINE, &arg);
402 ret = notifier_to_errno(ret); 402 ret = notifier_to_errno(ret);
403 if (ret) { 403 if (ret) {
404 memory_notify(MEM_CANCEL_ONLINE, &arg); 404 memory_notify(MEM_CANCEL_ONLINE, &arg);
405 return ret; 405 return ret;
406 } 406 }
407 /* 407 /*
408 * This doesn't need a lock to do pfn_to_page(). 408 * This doesn't need a lock to do pfn_to_page().
409 * The section can't be removed here because of the 409 * The section can't be removed here because of the
410 * memory_block->state_mutex. 410 * memory_block->state_mutex.
411 */ 411 */
412 zone = page_zone(pfn_to_page(pfn)); 412 zone = page_zone(pfn_to_page(pfn));
413 /* 413 /*
414 * If this zone is not populated, then it is not in zonelist. 414 * If this zone is not populated, then it is not in zonelist.
415 * This means the page allocator ignores this zone. 415 * This means the page allocator ignores this zone.
416 * So, zonelist must be updated after online. 416 * So, zonelist must be updated after online.
417 */ 417 */
418 mutex_lock(&zonelists_mutex); 418 mutex_lock(&zonelists_mutex);
419 if (!populated_zone(zone)) 419 if (!populated_zone(zone))
420 need_zonelists_rebuild = 1; 420 need_zonelists_rebuild = 1;
421 421
422 ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages, 422 ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
423 online_pages_range); 423 online_pages_range);
424 if (ret) { 424 if (ret) {
425 mutex_unlock(&zonelists_mutex); 425 mutex_unlock(&zonelists_mutex);
426 printk(KERN_DEBUG "online_pages %lx at %lx failed\n", 426 printk(KERN_DEBUG "online_pages %lx at %lx failed\n",
427 nr_pages, pfn); 427 nr_pages, pfn);
428 memory_notify(MEM_CANCEL_ONLINE, &arg); 428 memory_notify(MEM_CANCEL_ONLINE, &arg);
429 return ret; 429 return ret;
430 } 430 }
431 431
432 zone->present_pages += onlined_pages; 432 zone->present_pages += onlined_pages;
433 zone->zone_pgdat->node_present_pages += onlined_pages; 433 zone->zone_pgdat->node_present_pages += onlined_pages;
434 if (need_zonelists_rebuild) 434 if (need_zonelists_rebuild)
435 build_all_zonelists(zone); 435 build_all_zonelists(zone);
436 else 436 else
437 zone_pcp_update(zone); 437 zone_pcp_update(zone);
438 438
439 mutex_unlock(&zonelists_mutex); 439 mutex_unlock(&zonelists_mutex);
440 setup_per_zone_wmarks(); 440 setup_per_zone_wmarks();
441 calculate_zone_inactive_ratio(zone); 441 calculate_zone_inactive_ratio(zone);
442 if (onlined_pages) { 442 if (onlined_pages) {
443 kswapd_run(zone_to_nid(zone)); 443 kswapd_run(zone_to_nid(zone));
444 node_set_state(zone_to_nid(zone), N_HIGH_MEMORY); 444 node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
445 } 445 }
446 446
447 vm_total_pages = nr_free_pagecache_pages(); 447 vm_total_pages = nr_free_pagecache_pages();
448 448
449 writeback_set_ratelimit(); 449 writeback_set_ratelimit();
450 450
451 if (onlined_pages) 451 if (onlined_pages)
452 memory_notify(MEM_ONLINE, &arg); 452 memory_notify(MEM_ONLINE, &arg);
453 453
454 return 0; 454 return 0;
455 } 455 }
456 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ 456 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
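online_pages() above brackets the operation with MEM_GOING_ONLINE / MEM_CANCEL_ONLINE / MEM_ONLINE notifications. A sketch of a client listening on the memory-hotplug notifier chain, assuming the standard register_memory_notifier() interface; the callback name and the empty cases are illustrative:

static int example_memory_callback(struct notifier_block *self,
				   unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;

	switch (action) {
	case MEM_GOING_ONLINE:
		/* a subsystem may veto here with notifier_from_errno(-EBUSY) */
		break;
	case MEM_ONLINE:
		printk(KERN_INFO "onlined %lu pages from pfn %lu\n",
		       mn->nr_pages, mn->start_pfn);
		break;
	case MEM_CANCEL_ONLINE:
	case MEM_OFFLINE:
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block example_memory_nb = {
	.notifier_call = example_memory_callback,
};

/* in an init path: register_memory_notifier(&example_memory_nb); */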
457 457
458 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ 458 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
459 static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) 459 static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
460 { 460 {
461 struct pglist_data *pgdat; 461 struct pglist_data *pgdat;
462 unsigned long zones_size[MAX_NR_ZONES] = {0}; 462 unsigned long zones_size[MAX_NR_ZONES] = {0};
463 unsigned long zholes_size[MAX_NR_ZONES] = {0}; 463 unsigned long zholes_size[MAX_NR_ZONES] = {0};
464 unsigned long start_pfn = start >> PAGE_SHIFT; 464 unsigned long start_pfn = start >> PAGE_SHIFT;
465 465
466 pgdat = arch_alloc_nodedata(nid); 466 pgdat = arch_alloc_nodedata(nid);
467 if (!pgdat) 467 if (!pgdat)
468 return NULL; 468 return NULL;
469 469
470 arch_refresh_nodedata(nid, pgdat); 470 arch_refresh_nodedata(nid, pgdat);
471 471
472 /* we can use NODE_DATA(nid) from here */ 472 /* we can use NODE_DATA(nid) from here */
473 473
474 /* init node's zones as empty zones, we don't have any present pages.*/ 474 /* init node's zones as empty zones, we don't have any present pages.*/
475 free_area_init_node(nid, zones_size, start_pfn, zholes_size); 475 free_area_init_node(nid, zones_size, start_pfn, zholes_size);
476 476
477 return pgdat; 477 return pgdat;
478 } 478 }
479 479
480 static void rollback_node_hotadd(int nid, pg_data_t *pgdat) 480 static void rollback_node_hotadd(int nid, pg_data_t *pgdat)
481 { 481 {
482 arch_refresh_nodedata(nid, NULL); 482 arch_refresh_nodedata(nid, NULL);
483 arch_free_nodedata(pgdat); 483 arch_free_nodedata(pgdat);
484 return; 484 return;
485 } 485 }
486 486
487 487
488 /* 488 /*
489 * called by cpu_up() to online a node without onlined memory. 489 * called by cpu_up() to online a node without onlined memory.
490 */ 490 */
491 int mem_online_node(int nid) 491 int mem_online_node(int nid)
492 { 492 {
493 pg_data_t *pgdat; 493 pg_data_t *pgdat;
494 int ret; 494 int ret;
495 495
496 lock_system_sleep(); 496 lock_system_sleep();
497 pgdat = hotadd_new_pgdat(nid, 0); 497 pgdat = hotadd_new_pgdat(nid, 0);
498 if (pgdat) { 498 if (pgdat) {
499 ret = -ENOMEM; 499 ret = -ENOMEM;
500 goto out; 500 goto out;
501 } 501 }
502 node_set_online(nid); 502 node_set_online(nid);
503 ret = register_one_node(nid); 503 ret = register_one_node(nid);
504 BUG_ON(ret); 504 BUG_ON(ret);
505 505
506 out: 506 out:
507 unlock_system_sleep(); 507 unlock_system_sleep();
508 return ret; 508 return ret;
509 } 509 }
510 510
511 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ 511 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
512 int __ref add_memory(int nid, u64 start, u64 size) 512 int __ref add_memory(int nid, u64 start, u64 size)
513 { 513 {
514 pg_data_t *pgdat = NULL; 514 pg_data_t *pgdat = NULL;
515 int new_pgdat = 0; 515 int new_pgdat = 0;
516 struct resource *res; 516 struct resource *res;
517 int ret; 517 int ret;
518 518
519 lock_system_sleep(); 519 lock_system_sleep();
520 520
521 res = register_memory_resource(start, size); 521 res = register_memory_resource(start, size);
522 ret = -EEXIST; 522 ret = -EEXIST;
523 if (!res) 523 if (!res)
524 goto out; 524 goto out;
525 525
526 if (!node_online(nid)) { 526 if (!node_online(nid)) {
527 pgdat = hotadd_new_pgdat(nid, start); 527 pgdat = hotadd_new_pgdat(nid, start);
528 ret = -ENOMEM; 528 ret = -ENOMEM;
529 if (!pgdat) 529 if (!pgdat)
530 goto out; 530 goto out;
531 new_pgdat = 1; 531 new_pgdat = 1;
532 } 532 }
533 533
534 /* call arch's memory hotadd */ 534 /* call arch's memory hotadd */
535 ret = arch_add_memory(nid, start, size); 535 ret = arch_add_memory(nid, start, size);
536 536
537 if (ret < 0) 537 if (ret < 0)
538 goto error; 538 goto error;
539 539
540 /* we online node here. we can't roll back from here. */ 540 /* we online node here. we can't roll back from here. */
541 node_set_online(nid); 541 node_set_online(nid);
542 542
543 if (new_pgdat) { 543 if (new_pgdat) {
544 ret = register_one_node(nid); 544 ret = register_one_node(nid);
545 /* 545 /*
546 * If the sysfs file of the new node can't be created, CPUs on the 546 * If the sysfs file of the new node can't be created, CPUs on the
547 * node can't be hot-added. There is no way to roll back now. 547 * node can't be hot-added. There is no way to roll back now.
548 * So, check it with BUG_ON() to catch it reluctantly. 548 * So, check it with BUG_ON() to catch it reluctantly.
549 */ 549 */
550 BUG_ON(ret); 550 BUG_ON(ret);
551 } 551 }
552 552
553 /* create new memmap entry */ 553 /* create new memmap entry */
554 firmware_map_add_hotplug(start, start + size, "System RAM"); 554 firmware_map_add_hotplug(start, start + size, "System RAM");
555 555
556 goto out; 556 goto out;
557 557
558 error: 558 error:
559 /* rollback pgdat allocation and others */ 559 /* rollback pgdat allocation and others */
560 if (new_pgdat) 560 if (new_pgdat)
561 rollback_node_hotadd(nid, pgdat); 561 rollback_node_hotadd(nid, pgdat);
562 if (res) 562 if (res)
563 release_memory_resource(res); 563 release_memory_resource(res);
564 564
565 out: 565 out:
566 unlock_system_sleep(); 566 unlock_system_sleep();
567 return ret; 567 return ret;
568 } 568 }
569 EXPORT_SYMBOL_GPL(add_memory); 569 EXPORT_SYMBOL_GPL(add_memory);
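add_memory() is the entry point a platform driver (for example the ACPI memory-hotplug driver) calls once firmware reports a new physical range. A hedged sketch of such a caller; the node lookup via memory_add_physaddr_to_nid() is one option, and the function name is invented for the example:

static int example_probe_memory(u64 start, u64 size)
{
	int nid = memory_add_physaddr_to_nid(start);	/* arch helper, may default to node 0 */
	int ret;

	ret = add_memory(nid, start, size);
	if (ret && ret != -EEXIST)
		printk(KERN_ERR "add_memory(%llx, %llx) failed: %d\n",
		       (unsigned long long)start, (unsigned long long)size, ret);
	return ret;	/* -EEXIST means the range was already registered */
}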
570 570
571 #ifdef CONFIG_MEMORY_HOTREMOVE 571 #ifdef CONFIG_MEMORY_HOTREMOVE
572 /* 572 /*
573 * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy 573 * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
574 * set and the size of the free page is given by page_order(). Using this, 574 * set and the size of the free page is given by page_order(). Using this,
575 * the function determines if the pageblock contains only free pages. 575 * the function determines if the pageblock contains only free pages.
576 * Due to buddy constraints, a free page at least the size of a pageblock will 576 * Due to buddy constraints, a free page at least the size of a pageblock will
577 * be located at the start of the pageblock 577 * be located at the start of the pageblock
578 */ 578 */
579 static inline int pageblock_free(struct page *page) 579 static inline int pageblock_free(struct page *page)
580 { 580 {
581 return PageBuddy(page) && page_order(page) >= pageblock_order; 581 return PageBuddy(page) && page_order(page) >= pageblock_order;
582 } 582 }
583 583
584 /* Return the start of the next active pageblock after a given page */ 584 /* Return the start of the next active pageblock after a given page */
585 static struct page *next_active_pageblock(struct page *page) 585 static struct page *next_active_pageblock(struct page *page)
586 { 586 {
587 /* Ensure the starting page is pageblock-aligned */ 587 /* Ensure the starting page is pageblock-aligned */
588 BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); 588 BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
589 589
590 /* If the entire pageblock is free, move to the end of free page */ 590 /* If the entire pageblock is free, move to the end of free page */
591 if (pageblock_free(page)) { 591 if (pageblock_free(page)) {
592 int order; 592 int order;
593 /* be careful. we don't have locks, page_order can be changed.*/ 593 /* be careful. we don't have locks, page_order can be changed.*/
594 order = page_order(page); 594 order = page_order(page);
595 if ((order < MAX_ORDER) && (order >= pageblock_order)) 595 if ((order < MAX_ORDER) && (order >= pageblock_order))
596 return page + (1 << order); 596 return page + (1 << order);
597 } 597 }
598 598
599 return page + pageblock_nr_pages; 599 return page + pageblock_nr_pages;
600 } 600 }
601 601
602 /* Checks if this range of memory is likely to be hot-removable. */ 602 /* Checks if this range of memory is likely to be hot-removable. */
603 int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) 603 int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
604 { 604 {
605 int type; 605 int type;
606 struct page *page = pfn_to_page(start_pfn); 606 struct page *page = pfn_to_page(start_pfn);
607 struct page *end_page = page + nr_pages; 607 struct page *end_page = page + nr_pages;
608 608
609 /* Check the starting page of each pageblock within the range */ 609 /* Check the starting page of each pageblock within the range */
610 for (; page < end_page; page = next_active_pageblock(page)) { 610 for (; page < end_page; page = next_active_pageblock(page)) {
611 type = get_pageblock_migratetype(page); 611 type = get_pageblock_migratetype(page);
612 612
613 /* 613 /*
614 * A pageblock containing MOVABLE or free pages is considered 614 * A pageblock containing MOVABLE or free pages is considered
615 * removable 615 * removable
616 */ 616 */
617 if (type != MIGRATE_MOVABLE && !pageblock_free(page)) 617 if (type != MIGRATE_MOVABLE && !pageblock_free(page))
618 return 0; 618 return 0;
619 619
620 /* 620 /*
621 * A pageblock starting with a PageReserved page is not 621 * A pageblock starting with a PageReserved page is not
622 * considered removable. 622 * considered removable.
623 */ 623 */
624 if (PageReserved(page)) 624 if (PageReserved(page))
625 return 0; 625 return 0;
626 } 626 }
627 627
628 /* All pageblocks in the memory block are likely to be hot-removable */ 628 /* All pageblocks in the memory block are likely to be hot-removable */
629 return 1; 629 return 1;
630 } 630 }
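This predicate is what backs per-memory-block "removable" queries in sysfs. A sketch of how such a reader might walk its sections (names and structure are illustrative; the real code lives in drivers/base/memory.c):

static int example_block_removable(unsigned long start_pfn, int nr_sections)
{
	int i;

	for (i = 0; i < nr_sections; i++) {
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;

		/* one non-removable section makes the whole block non-removable */
		if (!is_mem_section_removable(pfn, PAGES_PER_SECTION))
			return 0;
	}
	return 1;
}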
631 631
632 /* 632 /*
633 * Confirm that all pages in the range [start, end) belong to the same zone. 633 * Confirm that all pages in the range [start, end) belong to the same zone.
634 */ 634 */
635 static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) 635 static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
636 { 636 {
637 unsigned long pfn; 637 unsigned long pfn;
638 struct zone *zone = NULL; 638 struct zone *zone = NULL;
639 struct page *page; 639 struct page *page;
640 int i; 640 int i;
641 for (pfn = start_pfn; 641 for (pfn = start_pfn;
642 pfn < end_pfn; 642 pfn < end_pfn;
643 pfn += MAX_ORDER_NR_PAGES) { 643 pfn += MAX_ORDER_NR_PAGES) {
644 i = 0; 644 i = 0;
645 /* This is just a CONFIG_HOLES_IN_ZONE check.*/ 645 /* This is just a CONFIG_HOLES_IN_ZONE check.*/
646 while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) 646 while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
647 i++; 647 i++;
648 if (i == MAX_ORDER_NR_PAGES) 648 if (i == MAX_ORDER_NR_PAGES)
649 continue; 649 continue;
650 page = pfn_to_page(pfn + i); 650 page = pfn_to_page(pfn + i);
651 if (zone && page_zone(page) != zone) 651 if (zone && page_zone(page) != zone)
652 return 0; 652 return 0;
653 zone = page_zone(page); 653 zone = page_zone(page);
654 } 654 }
655 return 1; 655 return 1;
656 } 656 }
657 657
658 /* 658 /*
659 * Scanning pfns is much easier than scanning the lru list. 659 * Scanning pfns is much easier than scanning the lru list.
660 * Scan pfns from start to end and find the first LRU page. 660 * Scan pfns from start to end and find the first LRU page.
661 */ 661 */
662 int scan_lru_pages(unsigned long start, unsigned long end) 662 int scan_lru_pages(unsigned long start, unsigned long end)
663 { 663 {
664 unsigned long pfn; 664 unsigned long pfn;
665 struct page *page; 665 struct page *page;
666 for (pfn = start; pfn < end; pfn++) { 666 for (pfn = start; pfn < end; pfn++) {
667 if (pfn_valid(pfn)) { 667 if (pfn_valid(pfn)) {
668 page = pfn_to_page(pfn); 668 page = pfn_to_page(pfn);
669 if (PageLRU(page)) 669 if (PageLRU(page))
670 return pfn; 670 return pfn;
671 } 671 }
672 } 672 }
673 return 0; 673 return 0;
674 } 674 }
675 675
676 static struct page * 676 static struct page *
677 hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) 677 hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
678 { 678 {
679 /* This should be improooooved!! */ 679 /* This should be improooooved!! */
680 return alloc_page(GFP_HIGHUSER_MOVABLE); 680 return alloc_page(GFP_HIGHUSER_MOVABLE);
681 } 681 }
682 682
683 #define NR_OFFLINE_AT_ONCE_PAGES (256) 683 #define NR_OFFLINE_AT_ONCE_PAGES (256)
684 static int 684 static int
685 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) 685 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
686 { 686 {
687 unsigned long pfn; 687 unsigned long pfn;
688 struct page *page; 688 struct page *page;
689 int move_pages = NR_OFFLINE_AT_ONCE_PAGES; 689 int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
690 int not_managed = 0; 690 int not_managed = 0;
691 int ret = 0; 691 int ret = 0;
692 LIST_HEAD(source); 692 LIST_HEAD(source);
693 693
694 for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) { 694 for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
695 if (!pfn_valid(pfn)) 695 if (!pfn_valid(pfn))
696 continue; 696 continue;
697 page = pfn_to_page(pfn); 697 page = pfn_to_page(pfn);
698 if (!page_count(page)) 698 if (!page_count(page))
699 continue; 699 continue;
700 /* 700 /*
701 * We can skip free pages. And we can only deal with pages on 701 * We can skip free pages. And we can only deal with pages on
702 * LRU. 702 * LRU.
703 */ 703 */
704 ret = isolate_lru_page(page); 704 ret = isolate_lru_page(page);
705 if (!ret) { /* Success */ 705 if (!ret) { /* Success */
706 list_add_tail(&page->lru, &source); 706 list_add_tail(&page->lru, &source);
707 move_pages--; 707 move_pages--;
708 inc_zone_page_state(page, NR_ISOLATED_ANON + 708 inc_zone_page_state(page, NR_ISOLATED_ANON +
709 page_is_file_cache(page)); 709 page_is_file_cache(page));
710 710
711 } else { 711 } else {
712 /* Because we don't have the big zone->lock, we should 712 /* Because we don't have the big zone->lock, we should
713 check this again here. */ 713 check this again here. */
714 if (page_count(page)) 714 if (page_count(page))
715 not_managed++; 715 not_managed++;
716 #ifdef CONFIG_DEBUG_VM 716 #ifdef CONFIG_DEBUG_VM
717 printk(KERN_ALERT "removing pfn %lx from LRU failed\n", 717 printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
718 pfn); 718 pfn);
719 dump_page(page); 719 dump_page(page);
720 #endif 720 #endif
721 } 721 }
722 } 722 }
723 ret = -EBUSY; 723 ret = -EBUSY;
724 if (not_managed) { 724 if (not_managed) {
725 if (!list_empty(&source)) 725 if (!list_empty(&source))
726 putback_lru_pages(&source); 726 putback_lru_pages(&source);
727 goto out; 727 goto out;
728 } 728 }
729 ret = 0; 729 ret = 0;
730 if (list_empty(&source)) 730 if (list_empty(&source))
731 goto out; 731 goto out;
732 /* this function returns # of failed pages */ 732 /* this function returns # of failed pages */
733 ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1); 733 ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
734 734
735 out: 735 out:
736 return ret; 736 return ret;
737 } 737 }
738 738
739 /* 739 /*
740 * remove from free_area[] and mark all as Reserved. 740 * remove from free_area[] and mark all as Reserved.
741 */ 741 */
742 static int 742 static int
743 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages, 743 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
744 void *data) 744 void *data)
745 { 745 {
746 __offline_isolated_pages(start, start + nr_pages); 746 __offline_isolated_pages(start, start + nr_pages);
747 return 0; 747 return 0;
748 } 748 }
749 749
750 static void 750 static void
751 offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) 751 offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
752 { 752 {
753 walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL, 753 walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
754 offline_isolated_pages_cb); 754 offline_isolated_pages_cb);
755 } 755 }
756 756
757 /* 757 /*
758 * Check that all pages in the range, recorded as a memory resource, are isolated. 758 * Check that all pages in the range, recorded as a memory resource, are isolated.
759 */ 759 */
760 static int 760 static int
761 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages, 761 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
762 void *data) 762 void *data)
763 { 763 {
764 int ret; 764 int ret;
765 long offlined = *(long *)data; 765 long offlined = *(long *)data;
766 ret = test_pages_isolated(start_pfn, start_pfn + nr_pages); 766 ret = test_pages_isolated(start_pfn, start_pfn + nr_pages);
767 offlined = nr_pages; 767 offlined = nr_pages;
768 if (!ret) 768 if (!ret)
769 *(long *)data += offlined; 769 *(long *)data += offlined;
770 return ret; 770 return ret;
771 } 771 }
772 772
773 static long 773 static long
774 check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) 774 check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
775 { 775 {
776 long offlined = 0; 776 long offlined = 0;
777 int ret; 777 int ret;
778 778
779 ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined, 779 ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined,
780 check_pages_isolated_cb); 780 check_pages_isolated_cb);
781 if (ret < 0) 781 if (ret < 0)
782 offlined = (long)ret; 782 offlined = (long)ret;
783 return offlined; 783 return offlined;
784 } 784 }
785 785
786 static int offline_pages(unsigned long start_pfn, 786 static int offline_pages(unsigned long start_pfn,
787 unsigned long end_pfn, unsigned long timeout) 787 unsigned long end_pfn, unsigned long timeout)
788 { 788 {
789 unsigned long pfn, nr_pages, expire; 789 unsigned long pfn, nr_pages, expire;
790 long offlined_pages; 790 long offlined_pages;
791 int ret, drain, retry_max, node; 791 int ret, drain, retry_max, node;
792 struct zone *zone; 792 struct zone *zone;
793 struct memory_notify arg; 793 struct memory_notify arg;
794 794
795 BUG_ON(start_pfn >= end_pfn); 795 BUG_ON(start_pfn >= end_pfn);
796 /* at least, alignment against pageblock is necessary */ 796 /* at least, alignment against pageblock is necessary */
797 if (!IS_ALIGNED(start_pfn, pageblock_nr_pages)) 797 if (!IS_ALIGNED(start_pfn, pageblock_nr_pages))
798 return -EINVAL; 798 return -EINVAL;
799 if (!IS_ALIGNED(end_pfn, pageblock_nr_pages)) 799 if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
800 return -EINVAL; 800 return -EINVAL;
801 /* This makes hotplug much easier... and readable. 801 /* This makes hotplug much easier... and readable.
802 We assume this for now. */ 802 We assume this for now. */
803 if (!test_pages_in_a_zone(start_pfn, end_pfn)) 803 if (!test_pages_in_a_zone(start_pfn, end_pfn))
804 return -EINVAL; 804 return -EINVAL;
805 805
806 lock_system_sleep(); 806 lock_system_sleep();
807 807
808 zone = page_zone(pfn_to_page(start_pfn)); 808 zone = page_zone(pfn_to_page(start_pfn));
809 node = zone_to_nid(zone); 809 node = zone_to_nid(zone);
810 nr_pages = end_pfn - start_pfn; 810 nr_pages = end_pfn - start_pfn;
811 811
812 /* set above range as isolated */ 812 /* set above range as isolated */
813 ret = start_isolate_page_range(start_pfn, end_pfn); 813 ret = start_isolate_page_range(start_pfn, end_pfn);
814 if (ret) 814 if (ret)
815 goto out; 815 goto out;
816 816
817 arg.start_pfn = start_pfn; 817 arg.start_pfn = start_pfn;
818 arg.nr_pages = nr_pages; 818 arg.nr_pages = nr_pages;
819 arg.status_change_nid = -1; 819 arg.status_change_nid = -1;
820 if (nr_pages >= node_present_pages(node)) 820 if (nr_pages >= node_present_pages(node))
821 arg.status_change_nid = node; 821 arg.status_change_nid = node;
822 822
823 ret = memory_notify(MEM_GOING_OFFLINE, &arg); 823 ret = memory_notify(MEM_GOING_OFFLINE, &arg);
824 ret = notifier_to_errno(ret); 824 ret = notifier_to_errno(ret);
825 if (ret) 825 if (ret)
826 goto failed_removal; 826 goto failed_removal;
827 827
828 pfn = start_pfn; 828 pfn = start_pfn;
829 expire = jiffies + timeout; 829 expire = jiffies + timeout;
830 drain = 0; 830 drain = 0;
831 retry_max = 5; 831 retry_max = 5;
832 repeat: 832 repeat:
833 /* start memory hot removal */ 833 /* start memory hot removal */
834 ret = -EAGAIN; 834 ret = -EAGAIN;
835 if (time_after(jiffies, expire)) 835 if (time_after(jiffies, expire))
836 goto failed_removal; 836 goto failed_removal;
837 ret = -EINTR; 837 ret = -EINTR;
838 if (signal_pending(current)) 838 if (signal_pending(current))
839 goto failed_removal; 839 goto failed_removal;
840 ret = 0; 840 ret = 0;
841 if (drain) { 841 if (drain) {
842 lru_add_drain_all(); 842 lru_add_drain_all();
843 flush_scheduled_work();
844 cond_resched(); 843 cond_resched();
845 drain_all_pages(); 844 drain_all_pages();
846 } 845 }
847 846
848 pfn = scan_lru_pages(start_pfn, end_pfn); 847 pfn = scan_lru_pages(start_pfn, end_pfn);
849 if (pfn) { /* We have page on LRU */ 848 if (pfn) { /* We have page on LRU */
850 ret = do_migrate_range(pfn, end_pfn); 849 ret = do_migrate_range(pfn, end_pfn);
851 if (!ret) { 850 if (!ret) {
852 drain = 1; 851 drain = 1;
853 goto repeat; 852 goto repeat;
854 } else { 853 } else {
855 if (ret < 0) 854 if (ret < 0)
856 if (--retry_max == 0) 855 if (--retry_max == 0)
857 goto failed_removal; 856 goto failed_removal;
858 yield(); 857 yield();
859 drain = 1; 858 drain = 1;
860 goto repeat; 859 goto repeat;
861 } 860 }
862 } 861 }
863 /* drain all zones' lru pagevecs, this is asynchronous... */ 862 /* drain all zones' lru pagevecs, this is asynchronous... */
864 lru_add_drain_all(); 863 lru_add_drain_all();
865 flush_scheduled_work();
866 yield(); 864 yield();
867 /* drain pcp pages, this is synchronous. */ 865 /* drain pcp pages, this is synchronous. */
868 drain_all_pages(); 866 drain_all_pages();
869 /* check again */ 867 /* check again */
870 offlined_pages = check_pages_isolated(start_pfn, end_pfn); 868 offlined_pages = check_pages_isolated(start_pfn, end_pfn);
871 if (offlined_pages < 0) { 869 if (offlined_pages < 0) {
872 ret = -EBUSY; 870 ret = -EBUSY;
873 goto failed_removal; 871 goto failed_removal;
874 } 872 }
875 printk(KERN_INFO "Offlined Pages %ld\n", offlined_pages); 873 printk(KERN_INFO "Offlined Pages %ld\n", offlined_pages);
876 /* OK, all of our target is isolated. 874 /* OK, all of our target is isolated.
877 We cannot roll back at this point. */ 875 We cannot roll back at this point. */
878 offline_isolated_pages(start_pfn, end_pfn); 876 offline_isolated_pages(start_pfn, end_pfn);
879 /* reset pagetype flags and make the migrate type MOVABLE */ 877 /* reset pagetype flags and make the migrate type MOVABLE */
880 undo_isolate_page_range(start_pfn, end_pfn); 878 undo_isolate_page_range(start_pfn, end_pfn);
881 /* removal success */ 879 /* removal success */
882 zone->present_pages -= offlined_pages; 880 zone->present_pages -= offlined_pages;
883 zone->zone_pgdat->node_present_pages -= offlined_pages; 881 zone->zone_pgdat->node_present_pages -= offlined_pages;
884 totalram_pages -= offlined_pages; 882 totalram_pages -= offlined_pages;
885 883
886 setup_per_zone_wmarks(); 884 setup_per_zone_wmarks();
887 calculate_zone_inactive_ratio(zone); 885 calculate_zone_inactive_ratio(zone);
888 if (!node_present_pages(node)) { 886 if (!node_present_pages(node)) {
889 node_clear_state(node, N_HIGH_MEMORY); 887 node_clear_state(node, N_HIGH_MEMORY);
890 kswapd_stop(node); 888 kswapd_stop(node);
891 } 889 }
892 890
893 vm_total_pages = nr_free_pagecache_pages(); 891 vm_total_pages = nr_free_pagecache_pages();
894 writeback_set_ratelimit(); 892 writeback_set_ratelimit();
895 893
896 memory_notify(MEM_OFFLINE, &arg); 894 memory_notify(MEM_OFFLINE, &arg);
897 unlock_system_sleep(); 895 unlock_system_sleep();
898 return 0; 896 return 0;
899 897
900 failed_removal: 898 failed_removal:
901 printk(KERN_INFO "memory offlining %lx to %lx failed\n", 899 printk(KERN_INFO "memory offlining %lx to %lx failed\n",
902 start_pfn, end_pfn); 900 start_pfn, end_pfn);
903 memory_notify(MEM_CANCEL_OFFLINE, &arg); 901 memory_notify(MEM_CANCEL_OFFLINE, &arg);
904 /* pushback to free area */ 902 /* pushback to free area */
905 undo_isolate_page_range(start_pfn, end_pfn); 903 undo_isolate_page_range(start_pfn, end_pfn);
906 904
907 out: 905 out:
908 unlock_system_sleep(); 906 unlock_system_sleep();
909 return ret; 907 return ret;
910 } 908 }
911 909
912 int remove_memory(u64 start, u64 size) 910 int remove_memory(u64 start, u64 size)
913 { 911 {
914 unsigned long start_pfn, end_pfn; 912 unsigned long start_pfn, end_pfn;
915 913
916 start_pfn = PFN_DOWN(start); 914 start_pfn = PFN_DOWN(start);
917 end_pfn = start_pfn + PFN_DOWN(size); 915 end_pfn = start_pfn + PFN_DOWN(size);
918 return offline_pages(start_pfn, end_pfn, 120 * HZ); 916 return offline_pages(start_pfn, end_pfn, 120 * HZ);
919 } 917 }
920 #else 918 #else
921 int remove_memory(u64 start, u64 size) 919 int remove_memory(u64 start, u64 size)
922 { 920 {
923 return -EINVAL; 921 return -EINVAL;
924 } 922 }
925 #endif /* CONFIG_MEMORY_HOTREMOVE */ 923 #endif /* CONFIG_MEMORY_HOTREMOVE */
926 EXPORT_SYMBOL_GPL(remove_memory); 924 EXPORT_SYMBOL_GPL(remove_memory);
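remove_memory() simply wraps offline_pages() with a fixed 120 second timeout, and returns -EINVAL when CONFIG_MEMORY_HOTREMOVE is not set. A sketch of a caller offlining one block, roughly what a sysfs "offline" handler ends up doing; the helper name and error reporting are illustrative:

static int example_offline_block(unsigned long start_pfn, unsigned long nr_pages)
{
	u64 start = (u64)start_pfn << PAGE_SHIFT;
	u64 size = (u64)nr_pages << PAGE_SHIFT;
	int ret;

	ret = remove_memory(start, size);	/* offlines the range, or -EINVAL without HOTREMOVE */
	if (ret)
		printk(KERN_WARNING "offlining pfns %lx-%lx failed: %d\n",
		       start_pfn, start_pfn + nr_pages, ret);
	return ret;
}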
927 925