14 Aug, 2009

1 commit

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in the core and arch percpu code are mostly from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161, which replaced many uses of
    num_possible_cpus() with nr_cpu_ids. As the for-next branch has moved
    all the first chunk allocators into mm/percpu.c, the changes are moved
    from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

11 Jul, 2009

1 commit

  • In case memory is scarce, we now default to oom_cfqq. Once memory is
    available again, we should allocate a new cfqq and stop using oom_cfqq for
    a particular io context.

    Once a new request comes in, check if we are using oom_cfqq, and if yes,
    try to allocate a new cfqq.

    Tested the patch by forcing the use of oom_cfqq; upon the next
    request, the thread realized it was using oom_cfqq and allocated a new
    cfqq.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
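
    A minimal sketch of the retry described above, under the assumption of
    helper names modeled on cfq-iosched.c (cic_to_cfqq(), cic_set_cfqq(),
    cfq_get_queue()); the wrapper function itself is hypothetical, not the
    actual patch.

        /* Hypothetical helper: if an earlier allocation failure left this
         * io context on the shared oom_cfqq, retry a real allocation now
         * that a new request has come in and memory may be available. */
        static struct cfq_queue *
        cfqq_refresh(struct cfq_data *cfqd, struct cfq_io_context *cic,
                     bool is_sync, struct io_context *ioc, gfp_t gfp_mask)
        {
                struct cfq_queue *cfqq = cic_to_cfqq(cic, is_sync);

                if (!cfqq || cfqq == &cfqd->oom_cfqq) {
                        cfqq = cfq_get_queue(cfqd, is_sync, ioc, gfp_mask);
                        cic_set_cfqq(cic, cfqq, is_sync);
                }
                return cfqq;
        }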
     

04 Jul, 2009

1 commit

  • Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
    changes. As alpha in the percpu tree uses the 'weak' attribute instead
    of inline assembly, there's no need for the __used attribute.

    Conflicts:
    arch/alpha/include/asm/percpu.h
    arch/mn10300/kernel/vmlinux.lds.S
    include/linux/percpu-defs.h

    Tejun Heo
     

01 Jul, 2009

3 commits


24 Jun, 2009

1 commit

  • Percpu variable definitions are about to be updated such that all
    percpu symbols, including static ones, must be unique. Update percpu
    variable definitions accordingly.

    * as,cfq: rename ioc_count uniquely

    * cpufreq: rename cpu_dbs_info uniquely

    * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

    * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
    rename it

    * ipv4,6: rename cookie_scratch uniquely

    * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
    pmc_irq_entry and nmi_entry to pmc_nmi_entry

    * perf_counter: rename disable_count to perf_disable_count

    * ftrace: rename test_event_disable to ftrace_test_event_disable

    * kmemleak: rename test_pointer to kmemleak_test_pointer

    * mce: rename next_interval to mce_next_interval

    [ Impact: percpu usage cleanups, no duplicate static percpu var names ]

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Cc: Ivan Kokshaysky
    Cc: Jens Axboe
    Cc: Dave Jones
    Cc: Jeremy Fitzhardinge
    Cc: linux-mm
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Catalin Marinas
    Cc: Andi Kleen

    Tejun Heo
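
    As a concrete illustration of the rename pattern, using the
    perf_counter item above (the int type is shown for illustration only;
    the surrounding file context is omitted):

        /* Before: a static percpu symbol with a generic name. */
        static DEFINE_PER_CPU(int, disable_count);

        /* After: prefixed so that every percpu symbol, including static
         * ones, is unique across the kernel. */
        static DEFINE_PER_CPU(int, perf_disable_count);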
     

16 Jun, 2009

2 commits


11 Jun, 2009

1 commit

  • Currently io_context has an atomic_t (32-bit) as its refcount. In the
    case of cfq, a reference to the io_context is taken for each device
    against which a task does I/O. And when multiple processes share an
    io_context (CLONE_IO), each of them also holds a reference to the same
    io_context.

    Theoretically the possible maximum number of processes sharing the same
    io_context + the number of disks/cfq_data referring to the same io_context
    can overflow the 32-bit counter on a very high-end machine.

    Even though it is an improbable case, let us make it atomic_long_t.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
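
    A hedged sketch of the type change (the real struct io_context has
    many more fields, and the helper name below is hypothetical; only the
    refcount and the corresponding increment are shown):

        struct io_context {
                atomic_long_t refcount;        /* was: atomic_t refcount; */
                /* other fields unchanged */
        };

        static void get_io_context_ref(struct io_context *ioc)
        {
                /* Taking a reference now uses the long atomic ops. */
                atomic_long_inc(&ioc->refcount);
        }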
     

11 May, 2009

3 commits

  • struct request has had a few different ways to represent some
    properties of a request. The ->hard_* fields represent the block
    layer's view of request progress (the completion cursor), and the ones
    without the prefix represent the issue cursor and are allowed to be
    updated as necessary by the low level drivers. The thing is that, as
    the block layer supports partial completion, the two cursors really
    aren't necessary and only cause confusion. In addition, manual
    management of request details from low level drivers is cumbersome and
    error-prone at the very least.

    Another interesting set of duplicate fields is rq->[hard_]nr_sectors
    and rq->{hard_cur|current}_nr_sectors versus rq->data_len and
    rq->bio->bi_size. This is more convoluted than the hard_ case.

    rq->[hard_]nr_sectors are initialized for requests with a bio, but
    blk_rq_bytes() uses them only for !pc requests. rq->data_len is
    initialized for all requests, but blk_rq_bytes() uses it only for pc
    requests. This causes a good amount of confusion throughout the block
    layer and its drivers, and determining the request length has been a
    bit of black magic which may or may not work depending on circumstances
    and what the specific LLD is actually doing.

    rq->{hard_cur|current}_nr_sectors represent the number of sectors in
    the contiguous data area at the front. This is mainly used by drivers
    which transfer data by walking the request segment by segment. This
    value always equals rq->bio->bi_size >> 9. However, the data length
    for pc requests may not be a multiple of 512 bytes, and using this
    field becomes a bit confusing.

    In general, having multiple fields to represent the same property
    leads only to confusion and subtle bugs. With recent block low level
    driver cleanups, no driver is accessing or manipulating these
    duplicate fields directly. Drop all the duplicates. Now rq->sector
    means the current sector, rq->data_len the current total length and
    rq->bio->bi_size the current segment length. Everything else is
    defined in terms of these three and available only through accessors.

    * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
    now handles pc and fs requests equally, other than the rq->sector
    update. This means that pc requests can now use partial completion
    too (no in-kernel user yet, though).

    * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
    now uses byte count as the primary data length.

    * blk_rq_pos() is now guaranteed to always be correct. In-block users
    converted.

    * blk_rq_bytes() is now guaranteed to be always valid as is
    blk_rq_sectors(). In-block users converted.

    * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
    Whichever is more convenient is used.

    * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
    pointer to request.

    [ Impact: API cleanup, single way to represent one property of a request ]

    Signed-off-by: Tejun Heo
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • With recent cleanups, there is no place where a low level driver
    directly manipulates request fields. This means that the 'hard'
    request fields always equal the !hard fields. Convert all
    rq->sector, nr_sectors and current_nr_sectors references to
    accessors.

    While at it, drop the superfluous blk_rq_pos() < 0 test in swim.c.

    [ Impact: use pos and nr_sectors accessors ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Tested-by: Adrian McMenamin
    Acked-by: Adrian McMenamin
    Acked-by: Mike Miller
    Cc: James Bottomley
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Eric Moore
    Cc: Alan Stern
    Cc: FUJITA Tomonori
    Cc: Pete Zaitcev
    Cc: Stephen Rothwell
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Dario Ballabio
    Cc: David S. Miller
    Cc: Rusty Russell
    Cc: unsik Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Implement accessors - blk_rq_pos(), blk_rq_sectors() and
    blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
    and rq->hard_cur_sectors respectively and convert direct references of
    the said fields to the accessors.

    This is in preparation for the request data length handling cleanup.

    Geert : suggested adding const to struct request * parameter to accessors
    Sergei : spotted error in patch description

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Acked-by: Stephen Rothwell
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Acked-by: Sergei Shtylyov
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
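
    A hedged sketch of the three accessors described above, taking a const
    struct request * as per Geert's suggestion (the return types are
    illustrative). Note that, per the first commit of this day,
    blk_rq_sectors(rq) ends up guaranteed to equal blk_rq_bytes(rq) >> 9.

        static inline sector_t blk_rq_pos(const struct request *rq)
        {
                return rq->hard_sector;
        }

        static inline unsigned int blk_rq_sectors(const struct request *rq)
        {
                return rq->hard_nr_sectors;
        }

        static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
        {
                return rq->hard_cur_sectors;
        }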
     

28 Apr, 2009

1 commit

  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
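
    An illustrative conversion, assuming a caller that already holds the
    queue lock (for unlocked call sites, blk_run_queue() is the variant
    that takes the lock itself):

        /* Before (interface removed by this commit): */
        blk_start_queueing(q);

        /* After, with q->queue_lock already held: */
        __blk_run_queue(q);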
     

24 Apr, 2009

3 commits

  • Currently we look up the prio_tree root from ->ioprio, but ->ioprio
    can change if either the process gets its IO priority changed
    explicitly, or if cfq decides to temporarily boost it. So if we are
    unlucky, we can end up attempting to remove a node from a different
    rbtree root than the one it was added to.

    Fix this by using ->org_ioprio as the prio_tree index, since that
    will only change for explicit IO priority settings (not for a boost).
    Additionally, cache the rbtree root inside the cfqq, so we don't have
    to add code to reinsert the cfqq in the prio_tree if the IO priority
    changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • cfq_prio_tree_lookup() should return the direct match, yet it always
    returns NULL. Fix that (an illustrative lookup follows this list).

    cfq_prio_tree_add() assumes that we don't get a direct match, while it
    is very possible that we do. Using O_DIRECT, you can have different
    cfqqs with matching requests, since you don't have the page cache to
    serialize things for you. Fix this bug by only adding the cfqq if
    there isn't an existing match.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Not strictly needed, but we should make it clear that we init the
    rbtree roots here.

    Signed-off-by: Jens Axboe

    Jens Axboe
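
    An illustrative rbtree lookup showing the contract fixed above; this
    is not cfq_prio_tree_lookup() verbatim, and the p_node / next_rq field
    names are assumptions modeled on cfq-iosched.c. The point is that an
    exact sector match is returned to the caller instead of falling
    through to NULL, so cfq_prio_tree_add() can skip the insert when a
    match already exists.

        static struct cfq_queue *
        prio_tree_lookup(struct rb_root *root, sector_t sector)
        {
                struct rb_node *n = root->rb_node;

                while (n) {
                        struct cfq_queue *cfqq =
                                rb_entry(n, struct cfq_queue, p_node);
                        sector_t pos = cfqq->next_rq->sector;

                        if (sector < pos)
                                n = n->rb_left;
                        else if (sector > pos)
                                n = n->rb_right;
                        else
                                return cfqq;    /* direct match */
                }
                return NULL;
        }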
     

22 Apr, 2009

2 commits


15 Apr, 2009

7 commits

  • If we have processes that are working in close proximity to each
    other on disk, we don't want to idle wait. Instead allow the close
    process to issue a request, getting better aggregate bandwidth.
    The anticipatory scheduler has similar checks; noop and deadline do
    not need them since they don't care about process io mappings.

    The code for CFQ is a little more involved though, since we split
    request queues into per-process contexts.

    This fixes a performance problem with e.g. dump(8), since it uses
    several processes in some silly attempt to speed IO up. Even if
    dump(8) isn't really a valid case (it should be fixed by using
    CLONE_IO), there are other cases where we see close processes
    and where idling ends up hurting performance.

    Credit goes to Jeff Moyer for writing the
    initial implementation.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Makes it easier to read the traces.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We only kick the dispatch for an idling queue if we think the request
    is (somewhat) fully merged. Also allow a kick if we have other
    busy queues in the system, since we don't want to risk waiting for
    a potential merge in that case. It's better to get some work done and
    proceed.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's called from the workqueue handlers from process context, so
    we always have irqs enabled when entered.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • "Zhang, Yanmin" reports that commit
    b029195dda0129b427c6e579a3bb3ae752da3a93 introduced a regression
    of about 50% with sequential threaded read workloads. The test
    case is:

    tiotest -k0 -k1 -k3 -f 80 -t 32

    which starts 32 threads, each reading an 80MB file. Twiddle the kick
    queue logic so that we do start IO immediately if the request appears
    to be fully merged. We can't really detect that, so just check whether
    the request is bigger than a page (see the sketch after this list).
    The assumption is that
    since single bio issues will first queue a single request with just
    one page attached and then later do merges on that, if we already
    have more than a page worth of data in the request, then the request
    is most likely good to go.

    Verified that this doesn't cause a regression with the test case that
    commit b029195dda0129b427c6e579a3bb3ae752da3a93 was fixing. It does not,
    we still see maximum sized requests for the queue-then-merge cases.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We can just use the block layer BLK_RW_SYNC/ASYNC defines now.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
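
    A minimal sketch of the size check from the tiotest regression fix
    above; the helper name is hypothetical, and the real logic sits inside
    CFQ's queue-kick path rather than standing alone.

        /* A request already carrying more than a page of data has most
         * likely been merged into, so it is worth dispatching immediately
         * instead of idling in the hope of further merges. */
        static bool rq_looks_fully_merged(struct request *rq)
        {
                return blk_rq_bytes(rq) > PAGE_SIZE;
        }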
     

07 Apr, 2009

3 commits

  • When CFQ is waiting for a new request from a process, currently it'll
    immediately restart queuing when it sees such a request. This doesn't
    work very well with streamed IO, since we then end up splitting IO
    that would otherwise have been merged nicely. For a simple dd test,
    this causes 10x as many requests to be issued as necessary.
    Normally this goes unnoticed due to the low overhead of requests
    at the device side, but some hardware is very sensitive to request
    sizes, and there it can cause big slowdowns.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We only manipulate the must_dispatch and queue_new flags; they are not
    tested anymore. So get rid of them.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The IO scheduler core calls into the IO scheduler dispatch_request hook
    to move requests from the IO scheduler and into the driver dispatch
    list. It only does so when the dispatch list is empty. CFQ moves several
    requests to the dispatch list, which can cause higher latencies if we
    suddenly have to switch to some important sync IO. Change the logic to
    move one request at a time instead.

    This should almost be functionally equivalent to what we did before,
    except that we now honor 'quantum' as the maximum queue depth at the
    device side from any single cfqq. If there's just a single active
    cfqq, we allow up to 4 times the normal quantum.

    Signed-off-by: Jens Axboe

    Jens Axboe
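
    A hedged sketch of the resulting dispatch-depth policy (field names
    such as cfq_quantum, busy_queues and dispatched are modeled on
    cfq-iosched.c; the helper itself is hypothetical):

        /* May this cfqq send another request to the dispatch list now? */
        static bool cfqq_may_dispatch(struct cfq_data *cfqd,
                                      struct cfq_queue *cfqq)
        {
                unsigned int max_dispatch = cfqd->cfq_quantum;

                /* A lone active queue may drive a deeper pipeline. */
                if (cfqd->busy_queues == 1)
                        max_dispatch *= 4;

                return cfqq->dispatched < max_dispatch;
        }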
     

06 Apr, 2009

1 commit

  • By default, CFQ will anticipate more IO from a given io context if the
    previously completed IO was sync. This used to be fine, since the only
    sync IO was reads and O_DIRECT writes. But with more "normal" sync writes
    being used now, we don't want to anticipate for those.

    Add a bio/request flag that informs the IO scheduler that this is a sync
    request that we should not idle for. Introduce WRITE_ODIRECT specifically
    for O_DIRECT writes, and make sure that the other sync writes set this
    flag.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

30 Jan, 2009

1 commit

  • This patch adds the ability to preempt an ongoing BE timeslice when an
    RT request is waiting for the current timeslice to complete. This
    reduces the wait time to disk for RT requests from an upper bound of 4
    disk requests (the current value of cfq_quantum) to 1.

    Applied Jens' suggested changes to avoid the rb lookup and use
    !cfq_class_rt(), and retested.

    Latency (secs) for the RT task when doing sequential reads from a 10G file:

                             | only RT | RT + BE | RT + BE + this patch
      small (512 byte) reads |   143   |   163   |         145
      large (1MB) reads      |   142   |   158   |         146

    Signed-off-by: Divyesh Shah
    Signed-off-by: Jens Axboe

    Divyesh Shah
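
    A minimal sketch of the class check, using cfq_class_rt() as referenced
    in the commit text; the surrounding preemption routine is illustrative
    only.

        /* An RT queue may preempt an ongoing best-effort (BE) slice, so an
         * RT request waits for at most one in-flight disk request instead
         * of a full BE timeslice. */
        static bool rt_should_preempt(struct cfq_queue *active_cfqq,
                                      struct cfq_queue *new_cfqq)
        {
                return cfq_class_rt(new_cfqq) && !cfq_class_rt(active_cfqq);
        }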
     

29 Dec, 2008

4 commits

  • Original patch from Nikanth Karthikesan

    When a queue exits, the queue lock is taken and cfq_exit_queue() frees
    all the cics associated with the queue.

    But when a task exits, cfq_exit_io_context() gets the cics one by one
    and then locks the associated queue to call
    __cfq_exit_single_io_context(). Between getting a cic from the ioc and
    locking the queue, the queue might have exited on another CPU.

    Fix this by rechecking the cfq_io_context queue key inside the queue
    lock, and not calling into __cfq_exit_single_io_context() if somebody
    beat us to it (a sketch of the recheck follows this list).

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This basically limits the hardware queue depth to 4*quantum at any
    point in time, which is 16 with the default settings. As CFQ uses
    other means to shrink the hardware queue when necessary in the first
    place, there's really no need for this extra heuristic. Additionally,
    it ends up hurting performance in some cases.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Just use struct elevator_queue everywhere instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • After many improvements to kblockd_flush_work, it is now identical to
    cancel_work_sync, so a direct call to cancel_work_sync is suggested.

    The only difference is that cancel_work_sync is a GPL-only symbol, so
    it is no longer available to non-GPL modules.

    Signed-off-by: Cheng Renquan
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Cheng Renquan
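
    Referring back to the cic/queue-exit race in the first commit above, a
    hedged sketch of the recheck under the queue lock (the wrapper name is
    hypothetical, and cic->key holding the cfq_data pointer is an
    assumption modeled on cfq-iosched.c):

        static void exit_cic_checked(struct cfq_data *cfqd,
                                     struct cfq_io_context *cic)
        {
                struct request_queue *q = cfqd->queue;
                unsigned long flags;

                spin_lock_irqsave(q->queue_lock, flags);
                /* The queue may have exited on another CPU between fetching
                 * the cic from the ioc and getting here; only proceed if
                 * the key still points at this queue. */
                if (cic->key == cfqd)
                        __cfq_exit_single_io_context(cfqd, cic);
                spin_unlock_irqrestore(q->queue_lock, flags);
        }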
     

09 Oct, 2008

4 commits

  • We really need to know about the hardware tagging support as well,
    since if the SSD does not do tagging then we still want to idle.
    Otherwise we have the same dependent sync IO vs flooding async IO
    problem as on rotational media.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We don't want to idle in AS/CFQ if the device doesn't have a seek
    penalty. So add a QUEUE_FLAG_NONROT flag to indicate a non-rotational
    device; low level drivers should set this flag upon discovery of an
    SSD or similar device type (an example call follows this list).

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • CFQ's detection of queueing devices assumes a non-queuing device and detects
    if the queue depth reaches a certain threshold. Under some workloads (e.g.
    synchronous reads), CFQ effectively forces a unit queue depth, thus defeating
    the detection logic. This leads to poor performance on queuing hardware,
    since the idle window remains enabled.

    This patch inverts the sense of the logic: assume a queuing-capable device,
    and detect if the depth does not exceed the threshold.

    Signed-off-by: Aaron Carroll
    Signed-off-by: Jens Axboe

    Aaron Carroll
     
  • Preparatory patch for checking queuing affinity.

    Signed-off-by: Jens Axboe

    Jens Axboe
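
    Referring back to the QUEUE_FLAG_NONROT commit above, a hedged example
    of how a low level driver might mark its queue; the setup function is
    hypothetical, and the flag helper follows the queue-flag API of that
    era.

        static void mydrv_setup_queue(struct request_queue *q)
        {
                /* No seek penalty (e.g. an SSD): AS/CFQ can skip idling,
                 * subject to the hardware tagging check described above. */
                queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
        }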
     

03 Jul, 2008

1 commit