24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's lots left. So do a full sweep of
    the kernel and get rid of this typedef and replace its uses with
    the proper type.
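
    For illustration (editor's sketch, not taken from the patch itself; the
    function name is made up), a typical site changes like this:

    /* before: the old typedef from <linux/blkdev.h> */
    typedef struct request_queue request_queue_t;
    static void example_request_fn(request_queue_t *q);

    /* after: the typedef is gone, the struct type is used directly */
    static void example_request_fn(struct request_queue *q);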

    Signed-off-by: Jens Axboe

    Jens Axboe
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).
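
    A typical callsite conversion looks like this (illustrative sketch; the
    cache and ctor names are made up):

    /* before: trailing dtor argument, which was almost always NULL */
    foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                  SLAB_HWCACHE_ALIGN, foo_ctor, NULL);

    /* after: the dtor parameter is gone from kmem_cache_create() */
    foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                  SLAB_HWCACHE_ALIGN, foo_ctor);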

    Signed-off-by: Paul Mundt

    Paul Mundt
     

10 Jul, 2007

1 commit


08 May, 2007

1 commit

  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
            int a, b, c;
            struct list_head list;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

    will create a new slab named "test_slab" of size sizeof(struct test_slab),
    aligned to the alignment of struct test_slab. If creation fails, the kernel
    panics.
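
    For reference, the macro itself expands to a kmem_cache_create() call of
    roughly this shape (sketch; at this point ctor and dtor are passed as NULL):

    #define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct, \
            sizeof(struct __struct), __alignof__(struct __struct), \
            (__flags), NULL, NULL)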

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

30 Apr, 2007

1 commit

  • Currently we scale the mempool sizes depending on memory installed
    in the machine, except for the bio pool itself which sits at a fixed
    256 entry pre-allocation.

    There's really no point in "optimizing" this OOM path; we just need
    enough preallocated entries to make progress. A single unit is enough;
    let's scale it down to 2 just to be on the safe side.

    This patch saves ~150kb of pinned kernel memory on a 32-bit box.
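
    The change itself essentially boils down to shrinking the preallocation
    constant in fs/bio.c (sketch):

    #define BIO_POOL_SIZE 2    /* was 256 */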

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Dec, 2006

1 commit

  • Implement a block device specific .direct_IO method instead of going
    through the generic direct_io_worker for block devices.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
    filesystems, where it must perform block allocation, hole detection,
    extending the file on write, and tons of other corner cases. The end result
    is that it takes tons of CPU time to submit an I/O.

    For a block device, block allocation is much simpler, and a tight triple
    loop can be written to iterate over each iovec and each page within the
    iovec in order to construct/prepare a bio structure and then submit it to
    the block layer. This significantly speeds up O_DIRECT on block devices.
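
    Roughly, the loop has this shape (editor's sketch with a made-up
    get_user_page_at() helper and no error handling; not the actual
    fs/block_dev.c code):

    for (seg = 0; seg < nr_segs; seg++) {                 /* each iovec */
            unsigned long addr = (unsigned long)iov[seg].iov_base;
            size_t len = iov[seg].iov_len;

            while (len) {                                 /* each page in the iovec */
                    unsigned int off = offset_in_page(addr);
                    unsigned int bytes = min_t(size_t, len, PAGE_SIZE - off);
                    struct page *page = get_user_page_at(addr);

                    if (bio_add_page(bio, page, bytes, off) < bytes) {
                            submit_bio(rw, bio);          /* bio full, send it down */
                            bio = bio_alloc(GFP_KERNEL, nr_pages);
                            bio_add_page(bio, page, bytes, off);
                    }
                    addr += bytes;
                    len -= bytes;
            }
    }
    submit_bio(rw, bio);                                  /* submit the final bio */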

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h" | xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
            mv /tmp/$$ $file
            quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"
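
    The net effect on a typical declaration (illustrative):

    kmem_cache_t *foo_cache;           /* before */
    struct kmem_cache *foo_cache;      /* after  */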

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

05 Dec, 2006

1 commit


01 Dec, 2006

2 commits

  • This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c
    users so that they support requests larger than a single bio by chaining
    bios together.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • The target mode support is mapping in bios using bio_map_user. The
    current targets do not need their len to be aligned with a queue limit
    so this check is causing some problems. Note: pointers passed into the
    kernel are properly aligned by userspace tgt code, so the uaddr check
    in bio_map_user is ok.

    The major user, blk_rq_map_user, checks the len before mapping,
    so it is not affected by this patch.

    And the semi-newly added user blk_rq_map_user_iov has been failing
    out when the len is not aligned properly, so either people have been
    good and not sending misaligned lens, or that path is not used very
    often and this change will not be very dangerous. st and sg do not
    check the length, and we have not seen any problem reports from those
    more widely used paths, so this patch should be fairly safe - for -mm
    and wider testing at least.

    Signed-off-by: Mike Christie
    Signed-off-by: FUJITA Tomonori
    Signed-off-by: James Bottomley
    Signed-off-by: Jens Axboe

    Mike Christie
     

22 Nov, 2006

1 commit

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).
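
    As an illustration of the new convention (sketch; the struct and function
    names are made up), a work function now receives the work_struct pointer
    and recovers its container:

    struct my_device {
            int foo;
            struct work_struct work;
    };

    static void my_work_fn(struct work_struct *work)
    {
            struct my_device *dev = container_of(work, struct my_device, work);
            /* use dev; no separate context pointer is passed any more */
    }

    /* elsewhere: INIT_WORK(&dev->work, my_work_fn); */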

    Signed-Off-By: David Howells

    David Howells
     

12 Oct, 2006

1 commit

  • - Calculate a variable in bvec_alloc_bs() only once it is needed, not
    earlier (bio.o down from 18408 to 18376 bytes, 32 bytes saved, probably
    due to data locality improvements).

    - Init variable idx to silence a gcc warning which already existed in the
    unmodified original base file (bvec_alloc_bs() handles idx correctly, so
    there's no need for the warning):

    fs/bio.c: In function `bio_alloc_bioset':
    fs/bio.c:169: warning: `idx' may be used uninitialized in this function

    Signed-off-by: Andreas Mohr
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Mohr
     

01 Oct, 2006

2 commits


18 Jun, 2006

1 commit


24 May, 2006

1 commit


27 Mar, 2006

3 commits

  • Modify well over a dozen mempool users to call mempool_create_slab_pool()
    rather than calling mempool_create() with extra arguments, saving about 30
    lines of code and increasing readability.
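
    A typical conversion (illustrative; the pool size and cache name are made
    up):

    /* before */
    pool = mempool_create(16, mempool_alloc_slab, mempool_free_slab, foo_cachep);

    /* after */
    pool = mempool_create_slab_pool(16, foo_cachep);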

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • This patch changes several mempool users, all of which are basically just
    wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
    than their own wrapper function, removing a bunch of duplicated code.
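
    Likewise for a kmalloc-backed pool (illustrative; the wrapper names are
    made up):

    /* before: driver-private wrappers around kmalloc()/kfree() */
    pool = mempool_create(16, my_mempool_alloc, my_mempool_free, (void *)size);

    /* after: common helpers */
    pool = mempool_create_kmalloc_pool(16, size);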

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • While hunting with oprofile on an SMP platform, I discovered that dentry
    lookups were slowed down because d_hash_mask, d_hash_shift and
    dentry_hashtable were in a cache line that contained inodes_stat. So each
    time inodes_stat is changed by a cpu, other cpus have to refill their
    cache line.

    This patch moves some variables to the __read_mostly section, in order to
    avoid false sharing. RCU dentry lookups can go full speed.
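
    The annotation is simply added to the declarations, e.g. (roughly, from
    fs/dcache.c):

    static unsigned int d_hash_mask __read_mostly;
    static unsigned int d_hash_shift __read_mostly;
    static struct hlist_head *dentry_hashtable __read_mostly;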

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

26 Mar, 2006

1 commit


24 Mar, 2006

1 commit


23 Mar, 2006

1 commit

  • The biovec default mempool limit of 256 entries results in over 3MB of RAM
    being permanently pinned, even on systems with only 128MB of RAM. Since
    mempool tries to allocate from the system pool first, it makes sense to
    reduce the size of the mempool fallbacks to a more reasonable limit of 1-5
    entries -- enough for the system to be able to make progress even under
    load.

    Signed-off-by: Benjamin LaHaise
    Acked-by: Jens Axboe
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     

31 Jan, 2006

1 commit


15 Jan, 2006

1 commit


09 Jan, 2006

1 commit


06 Jan, 2006

1 commit


16 Dec, 2005

1 commit

  • - export __blk_put_request and blk_execute_rq_nowait,
    needed for async REQ_BLOCK_PC requests
    - separate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
    SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c
    SG_IO was already testing against max_sectors and SCSI-ml was setting
    max_sectors and max_hw_sectors to the same value, this does not change any
    SCSI SG_IO behavior. It only prepares ll_rw_blk.c, scsi_ioctl.c and bio.c
    for when SCSI-ml begins to set a valid max_hw_sectors for all LLDs. Today,
    if an LLD does not set it, SCSI-ml sets it to a safe default, and some
    LLDs set it to an artificially low value to overcome memory and feedback
    issues.

    Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
    drivers that used to call blk_queue_max_sectors with a large value of
    max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
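
    Sketch of the resulting split (not the literal ll_rw_blk.c code):

    q->max_hw_sectors = max_sectors;
    q->max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);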

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

15 Dec, 2005

1 commit

  • Add scsi helpers to create really-large-requests and convert
    scsi-ml to scsi_execute_async().

    Per Jens's previous comments, I placed this function in scsi_lib.c.
    I made it follow all the queue's limits - I think I did at least :), so
    I removed the warning on the function header.

    I think the scsi_execute_* functions should eventually take a request_queue
    and be placed some place where the dm-multipath hw_handler can use them
    if that failover code is going to stay in the kernel. That conversion
    patch will be sent in another mail though.

    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    Mike Christie
     

28 Oct, 2005

1 commit

  • - ->releasepage() annotated (s/int/gfp_t), instances updated
    - missing gfp_t in fs/* added
    - fixed misannotation from the original sweep caught by bitwise checks:
    XFS used __nocast both for gfp_t and for flags used by XFS allocator.
    The latter left with unsigned int __nocast; we might want to add a
    different type for those but for now let's leave them alone. That,
    BTW, is a case when __nocast use had been actively confusing - it had
    been used in the same code for two different and similar types, with
    no way to catch misuses. Switch of gfp_t to bitwise had caught that
    immediately...

    One tricky bit is left alone to be dealt with later - mapping->flags is
    a mix of gfp_t and error indications. Left alone for now.
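
    The ->releasepage() annotation, for example, changes the method's prototype
    in struct address_space_operations from

    int (*releasepage) (struct page *, int);

    to

    int (*releasepage) (struct page *, gfp_t);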

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.
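
    A typical prototype before and after (illustrative):

    void *kmalloc(size_t size, unsigned int __nocast flags);   /* before */
    void *kmalloc(size_t size, gfp_t flags);                   /* after  */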

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

10 Sep, 2005

1 commit


08 Sep, 2005

2 commits


28 Aug, 2005

1 commit


08 Aug, 2005

1 commit


28 Jul, 2005

1 commit

  • Fix bug introduced in 2.6.11-rc2: when we clone a BIO we need to copy over the
    current index into it as well.

    It corrupts data with some MD setups.

    See http://bugzilla.kernel.org/show_bug.cgi?id=4946

    Huuuuuuuuge thanks to Matthew Stapleton for doggedly
    chasing this one down.
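
    The fix boils down to also copying the index when cloning, roughly (sketch,
    not the literal diff):

    bio->bi_idx = bio_src->bi_idx;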

    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Jul, 2005

1 commit

  • Add a new section called ".data.read_mostly" for data items that are read
    frequently and rarely written to like cpumaps etc.

    If these maps are placed in the .data section then these frequently read
    items may end up in cache lines with data that is frequently updated. In
    that case all processors in an SMP system must needlessly reload, again
    and again, the cache lines containing those frequently used variables.

    The ability to share these cachelines will allow each cpu in an SMP system
    to keep local copies of those shared cachelines thereby optimizing
    performance.
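
    The section is used via a simple attribute macro that architectures can
    hook up, roughly:

    #define __read_mostly __attribute__((__section__(".data.read_mostly")))

    plus a matching .data.read_mostly entry in the linker scripts.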

    Signed-off-by: Alok N Kataria
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Christoph Lameter
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

20 Jun, 2005

3 commits