27 Sep, 2006

1 commit

  • The following patches reduce the size of the VFS inode structure by 28 bytes
    on a UP x86 (it would be more on an x86_64 system). This is a 10% reduction
    in the inode size on a UP kernel configured for production (i.e., with no
    spinlock or other debugging functions enabled); if you want to save memory
    taken up by in-core inodes, the first thing you should do is disable the
    debugging options, since they are responsible for a huge amount of bloat
    in the VFS inode structure.

    This patch:

    The filesystem- or device-specific pointer in the inode is inside a union,
    which is pretty pointless given that all 30+ users of this field have been
    using the void pointer. Get rid of the union and rename it to i_private, with
    a comment to explain who is allowed to use the void pointer. This is just a
    cleanup, but it allows us to reuse the union 'u' for something where the
    union will actually be used (see the sketch below).

    [judith@osdl.org: powerpc build fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Judith Lebzelter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
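    A minimal sketch of the shape of this change (illustrative fragments, not
    the verbatim kernel diff; the real struct is, of course, struct inode):

        /* Before: a union with effectively one used member. */
        struct inode_before {
                /* ... other fields ... */
                union {
                        void *generic_ip;  /* all 30+ users stored a void * */
                } u;
        };

        /* After: a plain pointer, freeing 'u' for a union that will
         * actually be used as one. */
        struct inode_after {
                /* ... other fields ... */
                void *i_private;  /* fs or device driver private data */
        };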
     

31 Aug, 2006

1 commit

  • The current block queue implementation already contains most of the
    machinery for shared tag maps. The only remaining pieces are a way to
    allocate and destroy a tag map independently of the queues, so that the
    maps can be managed over the life cycle of the overseeing entity (see the
    sketch below).

    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    James Bottomley
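    A hedged sketch of the resulting usage, assuming an interface along the
    lines of blk_init_tags()/blk_free_tags() combined with the existing
    blk_queue_init_tags():

        struct blk_queue_tag *host_tags;

        /* Allocate one tag map for the overseeing entity (e.g. a host). */
        host_tags = blk_init_tags(depth);
        if (!host_tags)
                return -ENOMEM;

        /* Attach each queue to the shared map instead of a private one. */
        blk_queue_init_tags(q1, depth, host_tags);
        blk_queue_init_tags(q2, depth, host_tags);

        /* On teardown of the overseeing entity, free the map once. */
        blk_free_tags(host_tags);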
     

04 Jul, 2006

1 commit

  • lockdep needs the waitqueue lock of on-stack completions (which are
    implicitly initialized by DECLARE_COMPLETION()) to be initialized
    explicitly. Annotate on-stack completions accordingly (see the sketch
    below).

    Has no effect on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
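    A minimal sketch of the resulting annotation, using the
    DECLARE_COMPLETION_ONSTACK() helper (start_async_work() is a hypothetical
    placeholder):

        #include <linux/completion.h>

        static void wait_for_async_work(void)
        {
                /* Tells lockdep this completion lives on the stack; on
                 * non-lockdep kernels it is plain DECLARE_COMPLETION(). */
                DECLARE_COMPLETION_ONSTACK(done);

                start_async_work(&done);  /* hypothetical async starter */
                wait_for_completion(&done);
        }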
     

01 Jul, 2006

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
    Remove obsolete #include
    remove obsolete swsusp_encrypt
    arch/arm26/Kconfig typos
    Documentation/IPMI typos
    Kconfig: Typos in net/sched/Kconfig
    v9fs: do not include linux/version.h
    Documentation/DocBook/mtdnand.tmpl: typo fixes
    typo fixes: specfic -> specific
    typo fixes in Documentation/networking/pktgen.txt
    typo fixes: occuring -> occurring
    typo fixes: infomation -> information
    typo fixes: disadvantadge -> disadvantage
    typo fixes: aquire -> acquire
    typo fixes: mecanism -> mechanism
    typo fixes: bandwith -> bandwidth
    fix a typo in the RTC_CLASS help text
    smb is no longer maintained

    Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S

    Linus Torvalds
     
  • The remaining counters in page_state after the zoned VM counter patches
    have been applied are all just for show in /proc/vmstat. They have no
    essential function for the VM.

    We use a simple increment of per-cpu variables. In order to avoid the most
    severe races we disable preemption. Preemption does not prevent the race
    between an increment and an interrupt handler incrementing the same
    statistics counter. However, that race is exceedingly rare, we may only
    lose an increment or so, and there is no requirement (at least not in the
    kernel) that the VM event counters be accurate.

    In the non-preempt case this results in a simple increment for each
    counter. For many architectures this will be reduced by the compiler to a
    single instruction. That single instruction is atomic on i386 and x86_64,
    so even the rare race condition in an interrupt is avoided for both
    architectures in most cases.

    The patchset also adds an off switch for embedded systems that allows
    building kernels without these counters.

    The counters are implemented as inline code that hopefully results in only
    a single increment instruction being emitted (i386, x86_64), or in the
    increment being hidden through instruction concurrency (EPIC architectures
    such as ia64 can get that done). A sketch of the increment path appears
    below.

    Benefits:
    - VM event counter operations usually reduce to a single inline instruction
    on i386 and x86_64.
    - No interrupt disable, only preempt disable for the preempt case.
    Preempt disable can also be avoided by moving the counter into a spinlock.
    - Handling is similar to zoned VM counters.
    - Simple and easily extendable.
    - Can be omitted to reduce memory use on embedded systems.

    References:

    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
    local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
    V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
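    A hedged sketch of the increment path described above (simplified from
    the vmstat interface; names approximate):

        struct vm_event_state {
                unsigned long event[NR_VM_EVENT_ITEMS];
        };

        DEFINE_PER_CPU(struct vm_event_state, vm_event_states);

        static inline void count_vm_event(enum vm_event_item item)
        {
                /* get_cpu_var() only disables preemption; an interrupt
                 * racing with the increment may still lose a count, which
                 * is acceptable for /proc/vmstat statistics. */
                get_cpu_var(vm_event_states).event[item]++;
                put_cpu_var(vm_event_states);
        }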
     
  • Signed-off-by: Jörn Engel
    Signed-off-by: Adrian Bunk

    Jörn Engel
     

27 Jun, 2006

1 commit

  • acquired (aquired)
    contiguous (contigious)
    successful (succesful, succesfull)
    surprise (suprise)
    whether (weather)
    some other misspellings

    Signed-off-by: Andreas Mohr
    Signed-off-by: Adrian Bunk

    Andreas Mohr
     

23 Jun, 2006

12 commits

  • Do a safer check for when to enable DMA. Currently we enable ISA DMA
    for cases that do not need it, resulting in OOM conditions when ZONE_DMA
    runs out of space.

    Signed-off-by: Jens Axboe

    Andi Kleen
     
  • The I/O schedulers all duplicate macros to check for an empty root and/or
    node, and for clearing a node. So put those in rbtree.h (see the sketch
    below).

    Signed-off-by: Jens Axboe

    Jens Axboe
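    A sketch of the helpers consolidated into rbtree.h (roughly as they ended
    up there), with typical I/O scheduler usage:

        #define RB_EMPTY_ROOT(root)  ((root)->rb_node == NULL)
        #define RB_EMPTY_NODE(node)  (rb_parent(node) == node)
        #define RB_CLEAR_NODE(node)  (rb_set_parent(node, node))

        /* e.g. removing a request from a scheduler's rbtree: */
        if (!RB_EMPTY_NODE(&rq->rb_node)) {
                rb_erase(&rq->rb_node, root);
                RB_CLEAR_NODE(&rq->rb_node);
        }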
     
  • - Remember to set ->last_sector so that the cfq_choose_req() logic
    works correctly.

    - Remove redundant call to cfq_choose_req()

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This is a collection of patches that greatly improve CFQ performance
    in some circumstances.

    - Change the idling logic to only kick in after a request is done and we
    are deciding what to do. Before, the idling included the request service
    time, so it was hard to adjust. Now it's true think/idle time.

    - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep
    it in control for sync and sequential (or close to) workloads.

    - Expire queues immediately and move on to other busy queues, if we are
    not going to idle after the current one finishes.

    - Don't rearm idle timer if there are no busy queues. Just leave the
    system idle.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Patch originally from Vasily Tarasov.

    If process 2 changes the I/O priority of process 1 using the
    sys_ioprio_set system call (as ionice does), then cfq_init_prio_data()
    sets the priority of process 2 (current) on the queue of process 1 and
    clears the flag that designates a change of ioprio. So process 1 will run
    with the priority of process 2.

    Instead of calling cfq_init_prio_data() on an I/O priority change, only
    mark the queue as having a changed priority. Every time a new request
    comes in, the CFQ scheduler checks this flag and automatically applies
    the new priority to the queue (see the sketch below).

    Signed-off-by: Jens Axboe

    Jens Axboe
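    A hedged sketch of the deferred reprioritization (rq_to_cfqq() is a
    hypothetical helper; the flag accessors follow CFQ's naming style):

        /* ionice path: only mark the queue, don't touch its priority. */
        static void cfq_ioprio_changed(struct cfq_queue *cfqq)
        {
                cfq_mark_cfqq_prio_changed(cfqq);
        }

        /* request path: the flag is resolved in the owner's own context. */
        static void cfq_insert_request(request_queue_t *q, struct request *rq)
        {
                struct cfq_queue *cfqq = rq_to_cfqq(rq);

                if (cfq_cfqq_prio_changed(cfqq))
                        cfq_init_prio_data(cfqq);  /* 'current' is the owner */
                /* ... queue the request ... */
        }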
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • A process flag to indicate whether we are doing sync I/O is incredibly
    ugly. It also causes performance problems when one does a lot of async
    I/O and then proceeds to sync it. Part of the I/O will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted I/O and the synced I/O. For I/O schedulers such as
    CFQ, this causes lost merges and suboptimal scheduling behaviour.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the
    O_DIRECT path just directly indicate that the writes are sync by using
    WRITE_SYNC instead (see the sketch below).

    Signed-off-by: Jens Axboe

    Jens Axboe
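    A minimal sketch of the per-request marking that replaces the per-process
    flag (dio_submit() is a hypothetical wrapper):

        static void dio_submit(struct bio *bio, int rw)
        {
                /* O_DIRECT writes declare themselves sync per request,
                 * instead of tainting the whole task with PF_SYNCWRITE. */
                if (rw == WRITE)
                        rw = WRITE_SYNC;
                submit_bio(rw, bio);
        }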
     
  • We cannot update them if the user changes nr_requests, so don't set them
    in the first place. The gains are pretty questionable as well, and the
    batching has been shown to decrease throughput.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We already drop the refcount in elevator_exit(), and as
    we're setting 'e' to NULL, we'll never take that branch anyway.
    Finally, as 'e' is a local var that isn't referenced afterwards,
    setting it to NULL is pointless.

    Signed-off-by: Dave Jones
    Signed-off-by: Jens Axboe

    Dave Jones
     
  • The queue lock can be taken from interrupts so it must always be taken with
    irq disabling primitives. Some primitives already verify this.
    blk_start_queue() is called under this lock, so interrupts must be
    disabled.

    Also document this requirement clearly in blk_init_queue(), where the
    queue spinlock is set (see the sketch below).

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Paolo 'Blaisorblade' Giarrusso
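    The documented calling convention, sketched:

        unsigned long flags;

        /* The queue lock is also taken from interrupt context, so it must
         * be held with interrupts disabled around blk_start_queue(). */
        spin_lock_irqsave(q->queue_lock, flags);
        blk_start_queue(q);
        spin_unlock_irqrestore(q->queue_lock, flags);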
     
  • Use hlist instead of list_head for the request hashtable in
    deadline-iosched and as-iosched. This also removes the flag needed to
    know whether a request is hashed (see the sketch below).

    Signed-off-by: Akinobu Mita
    Signed-off-by: Jens Axboe

    block/as-iosched.c | 45 +++++++++++++++++++--------------------------
    block/deadline-iosched.c | 39 ++++++++++++++++-----------------------
    2 files changed, 35 insertions(+), 49 deletions(-)

    Akinobu Mita
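    A hedged sketch of why the flag can go away: an initialized hlist_node
    already knows whether it is hashed (field names approximate):

        struct deadline_rq {
                struct hlist_node hash;  /* was: list_head plus a flag */
                /* ... */
        };

        static inline void deadline_del_rq_hash(struct deadline_rq *drq)
        {
                if (!hlist_unhashed(&drq->hash))
                        hlist_del_init(&drq->hash);
        }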
     
  • list_splice_init(list, head) does unneeded work if it is known that
    list_empty(head) == 1. We can use list_replace_init() instead (see the
    sketch below).

    Signed-off-by: Oleg Nesterov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
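    A minimal sketch of the substitution, assuming the consumer's local list
    is known to be empty:

        static void take_all(struct list_head *pending)
        {
                LIST_HEAD(local);

                /* Equivalent to list_splice_init(pending, &local) when
                 * 'local' is empty, but just a fixed handful of pointer
                 * assignments, with no emptiness check on the source. */
                list_replace_init(pending, &local);

                /* ... process 'local', e.g. outside the producer's lock ... */
        }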
     

22 Jun, 2006

1 commit

  • Like the SUBSYSTEM= key we find in the environment of the uevent, this
    creates a generic "subsystem" link in sysfs for every device. Userspace
    usually doesn't care at all whether it's a "class" or a "bus" device.
    This provides a unified way to determine the subsystem of a device,
    regardless of the way the driver core has created it.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

21 Jun, 2006

2 commits

  • The color is now in the low bits of the parent pointer, and initializing
    it to 0 happens as part of the whole memset above, so just remove the
    unnecessary RB_CLEAR_COLOR (see the sketch below).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
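    A sketch of the merged field behind this (as in the [RBTREE] series
    merged in the next entry):

        struct rb_node {
                unsigned long rb_parent_color;  /* parent pointer | color */
        #define RB_RED   0
        #define RB_BLACK 1
                struct rb_node *rb_right;
                struct rb_node *rb_left;
        };

        #define rb_parent(r)  ((struct rb_node *)((r)->rb_parent_color & ~3))
        #define rb_color(r)   ((r)->rb_parent_color & 1)

    A freshly memset() node therefore already has color RB_RED (0), which is
    what RB_CLEAR_COLOR used to set explicitly.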
     
  • * git://git.infradead.org/~dwmw2/rbtree-2.6:
    [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
    Update UML kernel/physmem.c to use rb_parent() accessor macro
    [RBTREE] Update hrtimers to use rb_parent() accessor macro.
    [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
    [RBTREE] Merge colour and parent fields of struct rb_node.
    [RBTREE] Remove dead code in rb_erase()
    [RBTREE] Update JFFS2 to use rb_parent() accessor macro.
    [RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
    [RBTREE] Update key.c to use rb_parent() accessor macro.
    [RBTREE] Update ext3 to use rb_parent() accessor macro.
    [RBTREE] Change rbtree off-tree marking in I/O schedulers.
    [RBTREE] Add accessor macros for colour and parent fields of rb_node

    Linus Torvalds
     

15 Jun, 2006

1 commit

  • We don't clear the seek stat values in cfq_alloc_io_context(), and if
    ->seek_mean is unlucky enough to be set to -36 by chance, the first
    invocation of cfq_update_io_seektime() will oops with a divide by zero
    in do_div().

    Just memset the entire cic instead of filling individual values
    independently (see the sketch below).

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
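    A hedged sketch of the fix (pool and field names approximate):

        cic = kmem_cache_alloc(cfq_ioc_pool, gfp_mask);
        if (cic) {
                /* Zero everything so the seek stats start from 0 rather
                 * than whatever the allocator left behind. */
                memset(cic, 0, sizeof(*cic));
                cic->last_end_request = jiffies;  /* non-zero defaults after */
        }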
     

09 Jun, 2006

1 commit

  • There's a race between shutting down one io scheduler and firing up the
    next, in which a new io could enter and cause the io scheduler to be
    invoked with bad or NULL data.

    To fix this, we need to hold the queue lock for a bit longer.
    Unfortunately we cannot do that, since the elevator init must run without
    the lock held. This isn't easily fixable without also changing the
    mempool API. So split the initialization into two parts, an alloc-init
    operation and an attach operation. Then we can preallocate the io
    scheduler and related structures, and run the attach inside the lock
    after we detach the old one (see the sketch below).

    This patch has survived 30 minutes of 1 second io scheduler switching
    with a very busy io load.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
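    A hedged sketch of the two-phase switch (helper names hypothetical; the
    point is that all sleeping allocation happens outside the queue lock):

        /* Phase 1: allocate and initialize the new scheduler, unlocked. */
        e = elevator_alloc_init(q, new_type);
        if (!e)
                return -ENOMEM;

        /* Phase 2: swap under the queue lock; attach cannot fail. */
        spin_lock_irq(q->queue_lock);
        old = q->elevator;
        elevator_attach(q, e);
        spin_unlock_irq(q->queue_lock);

        elevator_exit(old);  /* tear down the old scheduler, unlocked */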
     
