30 Sep, 2006
2 commits
-
Signed-off-by: Alexey Dobriyan
Acked-by: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Check and handle init errors.
Signed-off-by: Randy Dunlap
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Sep, 2006
1 commit
-
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Judith Lebzelter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
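As a rough illustration of the cleanup described above, the following userspace sketch shows a single void pointer field taking the place of a union. Only the field name i_private comes from the commit message; struct demo_inode, struct demo_dev_info and the helper functions are hypothetical names invented for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VFS inode; only the field under discussion. */
struct demo_inode {
    unsigned long i_ino;
    void *i_private;            /* fs or device private pointer */
};

/* Hypothetical per-device data that a driver might hang off the inode. */
struct demo_dev_info {
    int minor;
};

/* Store the owner's private data in the inode. */
static void demo_attach(struct demo_inode *inode, struct demo_dev_info *info)
{
    inode->i_private = info;
}

/* Fetch it back; C converts void * to the owner's type implicitly. */
static struct demo_dev_info *demo_info(const struct demo_inode *inode)
{
    return inode->i_private;
}
```

A caller would use demo_attach() once when setting up the inode and demo_info() wherever the private data is needed; no union member has to be named.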
24 Sep, 2006
1 commit
-
Conflicts:
include/linux/blkdev.h
Trivial merge to incorporate tag prototypes.
23 Sep, 2006
1 commit
-
Signed-off-by: Trond Myklebust
31 Aug, 2006
1 commit
-
The current block queue implementation already contains most of the
machinery for shared tag maps. The only remaining pieces are a way to
allocate and destroy a tag map independently of the queues (so that
the maps can be managed on the life cycle of the overseeing entity).
Acked-by: Jens Axboe
Signed-off-by: James Bottomley
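The idea of a tag map whose lifetime is independent of any one queue can be sketched in userspace as a reference-counted object. This is only an illustration under stated assumptions: every name below (demo_tag_map, demo_queue and the demo_* functions) is hypothetical, the code is single-threaded with no locking, and it does not reproduce the block layer's actual interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared tag map, owned by the overseeing entity. */
struct demo_tag_map {
    int depth;
    int refcnt;
    unsigned char *in_use;      /* one byte per tag, 0 means free */
};

/* Hypothetical queue that borrows a tag map. */
struct demo_queue {
    struct demo_tag_map *tags;
};

/* Allocate a map independently of any queue; the caller holds one reference. */
static struct demo_tag_map *demo_tag_map_alloc(int depth)
{
    struct demo_tag_map *map = malloc(sizeof(*map));

    if (!map)
        return NULL;
    map->in_use = calloc((size_t)depth, 1);
    if (!map->in_use) {
        free(map);
        return NULL;
    }
    map->depth = depth;
    map->refcnt = 1;
    return map;
}

/* Drop one reference; returns 1 if the map was freed. */
static int demo_tag_map_put(struct demo_tag_map *map)
{
    if (--map->refcnt > 0)
        return 0;
    free(map->in_use);
    free(map);
    return 1;
}

/* A queue takes its own reference while it uses the map. */
static void demo_queue_attach(struct demo_queue *q, struct demo_tag_map *map)
{
    map->refcnt++;
    q->tags = map;
}

static void demo_queue_detach(struct demo_queue *q)
{
    demo_tag_map_put(q->tags);
    q->tags = NULL;
}

/* Hand out the lowest free tag, or -1 if the shared map is exhausted. */
static int demo_tag_get(struct demo_queue *q)
{
    for (int i = 0; i < q->tags->depth; i++) {
        if (!q->tags->in_use[i]) {
            q->tags->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}
```

Because the owner keeps its own reference, the map survives queues coming and going and is only freed when the owner calls demo_tag_map_put() last.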
23 Aug, 2006
1 commit
-
An exiting task or process which hasn't done any I/O yet has no io context,
so elv_unregister() should check that it is not NULL.
Signed-off-by: Oleg Nesterov
Acked-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
21 Aug, 2006
2 commits
-
Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context?
The dead key is from __cic, so drop that.
Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe
-
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe.
current_io_context() initializes "struct io_context", then sets ->io_context.
set_task_ioprio() running on another cpu may see the changes out of order, so
->set_ioprio(ioc) may use an io_context which was not initialized properly.
Signed-off-by: Oleg Nesterov
Signed-off-by: Jens Axboe
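The ordering hazard described above (initialize a structure, then publish a pointer to it) can be sketched in userspace with C11 atomics. This is an analogy only: the kernel uses its own barrier primitives, and struct demo_io_context, struct demo_task and both functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task io context. */
struct demo_io_context {
    int ioprio;
    int nr_batch;
};

/* Hypothetical task that publishes a pointer to its context. */
struct demo_task {
    _Atomic(struct demo_io_context *) io_context;
};

/*
 * Writer: fill in every field first, then publish the pointer with
 * release semantics so the field stores cannot be reordered after it.
 */
static void demo_publish(struct demo_task *t, struct demo_io_context *ioc)
{
    ioc->ioprio = 4;
    ioc->nr_batch = 32;
    atomic_store_explicit(&t->io_context, ioc, memory_order_release);
}

/*
 * Reader (possibly on another cpu): the acquire load pairs with the
 * release store, so a non-NULL pointer implies initialized fields.
 * Returns -1 when no context has been published yet.
 */
static int demo_read_ioprio(struct demo_task *t)
{
    struct demo_io_context *ioc =
        atomic_load_explicit(&t->io_context, memory_order_acquire);

    return ioc ? ioc->ioprio : -1;
}
```

Without the release/acquire pairing, a reader could in principle observe the pointer before the stores to the fields, which is the out-of-order view the message worries about.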
25 Jul, 2006
2 commits
-
The CIC_SEEKY() test really wants to use the minimum of either:
- 2 msecs (not jiffies)
- or, the pending slice time
So code it like that.
Signed-off-by: Jens Axboe
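The rule stated above, taking the smaller of 2 msecs and the pending slice time, might look like this once the units are made explicit. The tick rate parameter hz, the round-up conversion and both function names are assumptions made for this sketch; it is not the code of the CIC_SEEKY() test itself.

```c
#include <assert.h>

/* Convert milliseconds to timer ticks, rounding up (hz = ticks per second). */
static unsigned long demo_msecs_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Minimum of 2 msecs (expressed in ticks) and the pending slice time. */
static unsigned long demo_idle_limit(unsigned long slice_ticks,
                                     unsigned long hz)
{
    unsigned long two_ms = demo_msecs_to_ticks(2, hz);

    return slice_ticks < two_ms ? slice_ticks : two_ms;
}
```

The point of converting first is that a bare "2" means 2 msecs only when the tick rate is 1000 per second; at 250 ticks per second it would mean 8 msecs.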
-
It should be toggling the same bit on and off, fix it up.
Signed-off-by: Jens Axboe
15 Jul, 2006
1 commit
-
The delete partition IOCTL takes the bd_mutex for both the disk and the
partition; these have an obvious hierarchical relationship and this patch
annotates this relationship for lockdep.
Signed-off-by: Arjan van de Ven
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jul, 2006
2 commits
-
Not three, as assumed. This causes the barrier bit to be needlessly set
for some IO.
Signed-off-by: Jens Axboe
-
Provide the needed kernel support for distinguishing readahead
from regular read requests when tracing block devices.
Signed-off-by: Nathan Scott
Signed-off-by: Jens Axboe
04 Jul, 2006
1 commit
-
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues
implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions
accordingly.

Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2006
3 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
Remove obsolete #include
remove obsolete swsusp_encrypt
arch/arm26/Kconfig typos
Documentation/IPMI typos
Kconfig: Typos in net/sched/Kconfig
v9fs: do not include linux/version.h
Documentation/DocBook/mtdnand.tmpl: typo fixes
typo fixes: specfic -> specific
typo fixes in Documentation/networking/pktgen.txt
typo fixes: occuring -> occurring
typo fixes: infomation -> information
typo fixes: disadvantadge -> disadvantage
typo fixes: aquire -> acquire
typo fixes: mecanism -> mechanism
typo fixes: bandwith -> bandwidth
fix a typo in the RTC_CLASS help text
smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
-
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat. They have no
essential function for the VM.

We use a simple increment of per cpu variables. In order to avoid the most
severe races we disable preempt. Disabling preempt does not prevent the race
between an increment and an interrupt handler incrementing the same statistics
counter. However, that race is exceedingly rare, we may only lose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter. For many architectures this will be reduced by the compiler to a
single instruction. This single instruction is atomic for i386 and x86_64.
And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single increment instruction being emitted
(i386, x86_64) or in the increment being hidden through instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
28 Jun, 2006
2 commits
-
Make use of the newly defined hotplug version of cpu_notifier functionality
wherever appropriate.
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch reverts notifier_block changes made in 2.6.17
Signed-off-by: Chandra Seetharaman
Cc: Ashok Raj
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jun, 2006
1 commit
-
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings
Signed-off-by: Andreas Mohr
Signed-off-by: Adrian Bunk
23 Jun, 2006
12 commits
-
Do a safer check for when to enable DMA. Currently we enable ISA DMA
for cases that do not need it, resulting in OOM conditions when ZONE_DMA
runs out of space.
Signed-off-by: Jens Axboe
-
They all duplicate macros to check for empty root and/or node, and
clearing a node. So put those in rbtree.h.
Signed-off-by: Jens Axboe
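To make the three shared checks concrete, here is a minimal sketch of what such macros could look like. It assumes one possible convention, in which a node that is not in any tree is marked by pointing its parent at itself; the sk_ and SK_ names are hypothetical and the real definitions in rbtree.h may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tree node and root. */
struct sk_rb_node {
    struct sk_rb_node *parent;
    struct sk_rb_node *left, *right;
};

struct sk_rb_root {
    struct sk_rb_node *node;
};

/* A root with no top node is an empty tree. */
#define SK_RB_EMPTY_ROOT(root)  ((root)->node == NULL)

/* Assumed convention: an off-tree node is its own parent. */
#define SK_RB_EMPTY_NODE(n)     ((n)->parent == (n))

/* Mark a node as off-tree using the same convention. */
#define SK_RB_CLEAR_NODE(n)     ((n)->parent = (n))
```

Keeping these in one header means each io scheduler tests and clears nodes the same way instead of carrying a private copy of the macros.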
-
- Remember to set ->last_sector so that the cfq_choose_req() logic
works correctly.
- Remove redundant call to cfq_choose_req()
Signed-off-by: Jens Axboe
-
This is a collection of patches that greatly improve CFQ performance
in some circumstances.

- Change the idling logic to only kick in after a request is done and we
are deciding what to do. Before the idling included the request service
time, so it was hard to adjust. Now it's true think/idle time.
- Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep
it in control for sync and sequential (or close to) workloads.
- Expire queues immediately and move on to other busy queues, if we are
not going to idle after the current one finishes.
- Don't rearm idle timer if there are no busy queues. Just leave the
system idle.

Signed-off-by: Jens Axboe
-
Patch originally from Vasily Tarasov
If you set the io-priority of process 1 using the sys_ioprio_set system call
from another process 2 (like ionice does), then the cfq_init_prio_data()
function sets the priority of process 2 (current) on the queue of process 1
and clears the flag that designates a change of ioprio. So process 1 will
work as if it had the priority of process 2.

I propose not to call cfq_init_prio_data() on io-priority change, but
only mark the queue as a queue with changed priority. Every time a new
request comes in, the cfq scheduler checks for this flag and automatically
changes the priority of the queue to the new value.
Signed-off-by: Jens Axboe
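The deferred update proposed above can be sketched as a flag that the setter raises and the queue owner consumes on its next request. All names here (demo_task, demo_queue, demo_set_ioprio, demo_new_request) are hypothetical; the sketch only mirrors the control flow described in the message, not the CFQ source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task carrying its own io priority. */
struct demo_task {
    int ioprio;
};

/* Hypothetical per-task queue inside the scheduler. */
struct demo_queue {
    struct demo_task *owner;
    int ioprio;                 /* priority the scheduler currently uses */
    bool prio_changed;          /* set by whoever changes the priority */
};

/*
 * May run in the context of a different process: record the new value
 * on the target task and only mark the queue. The queue's effective
 * priority is deliberately left untouched here.
 */
static void demo_set_ioprio(struct demo_task *target, struct demo_queue *q,
                            int prio)
{
    target->ioprio = prio;
    q->prio_changed = true;
}

/*
 * Runs when the owner submits a request: pick up the owner's priority
 * if the flag is set, then clear the flag.
 */
static void demo_new_request(struct demo_queue *q)
{
    if (q->prio_changed) {
        q->ioprio = q->owner->ioprio;
        q->prio_changed = false;
    }
}
```

Since the queue reads the priority from its owner rather than from whoever happens to be running, the caller's own priority can no longer leak into the target's queue.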
-
Signed-off-by: Jens Axboe
-
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.
Signed-off-by: Jens Axboe
-
We cannot update them if the user changes nr_requests, so don't
set it in the first place. The gains are pretty questionable as
well. The batching loss has been shown to decrease throughput.
Signed-off-by: Jens Axboe
-
We already drop the refcount in elevator_exit(), and as
we're setting 'e' to NULL, we'll never take that branch anyway.
Finally, as 'e' is a local var that isn't referenced afterwards,
setting it to NULL is pointless.
Signed-off-by: Dave Jones
Signed-off-by: Jens Axboe
-
The queue lock can be taken from interrupts so it must always be taken with
irq disabling primitives. Some primitives already verify this.
blk_start_queue() is called under this lock, so interrupts must be
disabled.

Also document this requirement clearly in blk_init_queue(), where the queue
spinlock is set.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe
-
Use hlist instead of list_head for the request hashtable in deadline-iosched
and as-iosched. This also lets us remove the flag that records whether a
request is hashed or unhashed.
Signed-off-by: Akinobu Mita
Signed-off-by: Jens Axboe

block/as-iosched.c | 45 +++++++++++++++++++--------------------------
block/deadline-iosched.c | 39 ++++++++++++++++-----------------------
2 files changed, 35 insertions(+), 49 deletions(-)
-
list_splice_init(list, head) does unneeded work if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.
Signed-off-by: Oleg Nesterov
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
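A userspace sketch of the replace-and-reinitialize operation on a circular doubly linked list shows why it is enough when the destination is known to be empty. The demo_list type and functions are hypothetical re-implementations written for this example; they follow the general shape of such list helpers but are not copied from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list head/node. */
struct demo_list {
    struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *h)
{
    h->next = h;
    h->prev = h;
}

static int demo_list_empty(const struct demo_list *h)
{
    return h->next == h;
}

static void demo_list_add_tail(struct demo_list *n, struct demo_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/*
 * Let 'head' take over every entry of 'old', then reinitialize 'old'.
 * Whatever 'head' held before is overwritten, so this is only valid
 * when 'head' is empty or not yet initialized. Four pointer stores
 * replace the extra checks and joins a general splice would perform.
 */
static void demo_list_replace_init(struct demo_list *old,
                                   struct demo_list *head)
{
    head->next = old->next;
    head->next->prev = head;
    head->prev = old->prev;
    head->prev->next = head;
    demo_list_init(old);
}
```

The same four stores also handle an empty source list: the destination ends up pointing at itself, which is the empty state.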
22 Jun, 2006
1 commit
-
Like the SUBSYSTEM= key we find in the environment of the uevent, this
creates a generic "subsystem" link in sysfs for every device. Userspace
usually doesn't care at all if it's a "class" or a "bus" device. This
provides a unified way to determine the subsystem of a device, regardless
of the way the driver core has created it.
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
21 Jun, 2006
2 commits
-
The color is now in the low bits of the parent pointer, and initializing
it to 0 happens as part of the whole memset above, so just remove the
unnecessary RB_CLEAR_COLOR.
Signed-off-by: Linus Torvalds
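The reasoning above (color stored in the low bits of the parent pointer, so a zeroing memset already clears it) can be checked with a small userspace sketch. The pk_ names, the choice of 0 for red, and the two-bit mask are assumptions of this example, and it relies on node addresses being at least 4-byte aligned and on an all-zero bit pattern being a null pointer, as on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PK_RB_RED   0
#define PK_RB_BLACK 1

/* Hypothetical node: parent pointer and color share one word. */
struct pk_rb_node {
    uintptr_t parent_color;     /* parent address | color in bit 0 */
    struct pk_rb_node *left, *right;
};

/* Mask off the low bits to recover the parent pointer. */
static struct pk_rb_node *pk_rb_parent(const struct pk_rb_node *n)
{
    return (struct pk_rb_node *)(n->parent_color & ~(uintptr_t)3);
}

/* The color lives in bit 0. */
static int pk_rb_color(const struct pk_rb_node *n)
{
    return (int)(n->parent_color & 1);
}

/* Replace the parent while preserving the low bits. */
static void pk_rb_set_parent(struct pk_rb_node *n, struct pk_rb_node *p)
{
    n->parent_color = (n->parent_color & 3) | (uintptr_t)p;
}

/* Replace the color while preserving the parent. */
static void pk_rb_set_color(struct pk_rb_node *n, int color)
{
    n->parent_color = (n->parent_color & ~(uintptr_t)1) | (uintptr_t)color;
}
```

With this layout, zeroing the whole node yields a NULL parent and color 0 in one step, so a separate call to clear the color adds nothing.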
-
* git://git.infradead.org/~dwmw2/rbtree-2.6:
[RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
Update UML kernel/physmem.c to use rb_parent() accessor macro
[RBTREE] Update hrtimers to use rb_parent() accessor macro.
[RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
[RBTREE] Merge colour and parent fields of struct rb_node.
[RBTREE] Remove dead code in rb_erase()
[RBTREE] Update JFFS2 to use rb_parent() accessor macro.
[RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
[RBTREE] Update key.c to use rb_parent() accessor macro.
[RBTREE] Update ext3 to use rb_parent() accessor macro.
[RBTREE] Change rbtree off-tree marking in I/O schedulers.
[RBTREE] Add accessor macros for colour and parent fields of rb_node
15 Jun, 2006
1 commit
-
We don't clear the seek stat values in cfq_alloc_io_context(), and if
->seek_mean is unlucky enough to be set to -36 by chance, the first
invocation of cfq_update_io_seektime() will oops with a divide by zero
in do_div().

Just memset the entire cic instead of filling individual values
independently.
Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds
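The fix pattern, zeroing the whole structure and then setting only the fields that need a non-zero start, can be sketched as follows. struct demo_cic and its fields are hypothetical and only loosely modeled on the seek statistics mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-context statistics block. */
struct demo_cic {
    unsigned long last_end_request;
    unsigned int seek_samples;
    unsigned long long seek_total;
    unsigned long long seek_mean;
};

/*
 * Zero everything first so that no field can keep whatever value the
 * allocator left behind, then set the fields that need a real value.
 */
static void demo_cic_init(struct demo_cic *cic, unsigned long now)
{
    memset(cic, 0, sizeof(*cic));
    cic->last_end_request = now;
}
```

Fields added to the structure later are covered automatically, which is the advantage over initializing each member by hand.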
09 Jun, 2006
1 commit
-
There's a race between shutting down one io scheduler and firing up the
next, in which a new io could enter and cause the io scheduler to be
invoked with bad or NULL data.

To fix this, we need to maintain the queue lock for a bit longer.
Unfortunately we cannot do that, since the elevator init has to be
run without the lock held. This isn't easily fixable, without also
changing the mempool API. So split the initialization into two parts,
an alloc-init operation and an attach operation. Then we can
preallocate the io scheduler and related structures, and run the attach
inside the lock after we detach the old one.

This patch has survived 30 minutes of 1 second io scheduler switching
with a very busy io load.
Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds
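The two-part initialization described above can be sketched in userspace with a mutex standing in for the queue lock. Everything here is a hypothetical illustration: the demo_ names, the fixed-size fifo, and the use of pthread_mutex_t are assumptions of the example, not the elevator code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scheduler instance with some private data. */
struct demo_sched {
    const char *name;
    int *fifo;
};

/* Hypothetical queue; 'lock' stands in for the queue lock. */
struct demo_queue {
    pthread_mutex_t lock;
    struct demo_sched *sched;
};

/* Part 1 (alloc-init): may allocate and block, so it runs unlocked. */
static struct demo_sched *demo_sched_alloc(const char *name)
{
    struct demo_sched *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->fifo = calloc(16, sizeof(*s->fifo));
    if (!s->fifo) {
        free(s);
        return NULL;
    }
    s->name = name;
    return s;
}

static void demo_sched_free(struct demo_sched *s)
{
    if (s) {
        free(s->fifo);
        free(s);
    }
}

/*
 * Part 2 (attach): detach the old scheduler and attach the new one
 * while holding the lock, so no request can see a half-switched queue.
 * Returns the old scheduler for the caller to tear down unlocked.
 */
static struct demo_sched *demo_sched_attach(struct demo_queue *q,
                                            struct demo_sched *fresh)
{
    struct demo_sched *old;

    pthread_mutex_lock(&q->lock);
    old = q->sched;
    q->sched = fresh;
    pthread_mutex_unlock(&q->lock);
    return old;
}

/* Switch schedulers; on allocation failure the old one stays in place. */
static int demo_switch(struct demo_queue *q, const char *name)
{
    struct demo_sched *fresh = demo_sched_alloc(name);

    if (!fresh)
        return -1;
    demo_sched_free(demo_sched_attach(q, fresh));
    return 0;
}
```

Because the new instance is fully built before the lock is taken, the locked section shrinks to a pointer swap and there is no moment at which the queue has no usable scheduler.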
02 Jun, 2006
1 commit
-
Now that we select busy_rr for possible service, insert entries at the
back of that list instead of at the front.
Signed-off-by: Jens Axboe
01 Jun, 2006
1 commit
-
There's a small window between when the timer is entered and when we grab
the queue lock, where cfq_set_active_queue() could be rearming the
timer for us. Seen in the wild on a 12-way ppc box. Fix this by
just using mod_timer(), which will do the right thing for us.
Signed-off-by: Jens Axboe