27 Jul, 2008

1 commit

  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.

    Signed-off-by: Arjan van de Ven
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
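
    A minimal before/after sketch of the transformation described above
    (the condition and message are illustrative placeholders, not the
    actual call site):

    /* before: message and backtrace are emitted separately */
    if (bad_request) {
            printk(KERN_ERR "bad request on %s\n", dev_name);
            WARN_ON(1);
    }

    /* after: WARN() folds the message into the warning section */
    WARN(bad_request, "bad request on %s\n", dev_name);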
     

03 Jul, 2008

1 commit

  • If we have multiple tasks freeing io contexts when as-iosched
    is being unloaded, we could complete() ioc_gone twice. Fix that by
    protecting the complete() and the clearing of ioc_gone with a
    spinlock dedicated to that purpose. This doesn't matter from a
    performance perspective, since the path is only entered when
    ioc_gone != NULL (i.e. when as-iosched is being rmmod'ed).

    Signed-off-by: Jens Axboe

    Jens Axboe
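
    A minimal sketch of the scheme described above, assuming a dedicated
    ioc_gone_lock and the era's elv_ioc_count_read() counter helper
    (illustrative, not the verbatim patch):

    static DEFINE_SPINLOCK(ioc_gone_lock);

    /* io-context free path */
    if (ioc_gone) {
            /*
             * Serialize so that only one task can complete() and
             * clear ioc_gone; a second freer then sees NULL.
             */
            spin_lock(&ioc_gone_lock);
            if (ioc_gone && !elv_ioc_count_read(ioc_count)) {
                    complete(ioc_gone);
                    ioc_gone = NULL;
            }
            spin_unlock(&ioc_gone_lock);
    }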
     

01 Jul, 2008

1 commit

  • The AS scheduler alternates between issuing read and write batches,
    and it performs the batch switch only after all requests from the
    previous batch have completed.

    When switching to a write batch while a read request is still
    in flight, it waits for that request to complete and signals its
    intention to switch by setting ad->changed_batch and the new
    direction, but it does not update batch_expire_time for the new
    write batch (which it does do when there are no pending requests
    from the previous batch).
    When the read request completes, the scheduler sees that it was
    waiting for the switch, schedules work for kblockd right away, and
    resets the ad->changed_batch flag.
    When kblockd then enters dispatch_request, where it is expected to
    pick up a write request, it immediately ends the write batch instead,
    because batch_expire_time was never updated and still holds the
    expiry timestamp of the previous batch.

    The result is write starvation in every case where there is an
    intention to switch to a write batch but a read request is still
    in flight: the batch is reverted to a read batch right away.

    This also holds true in the reverse case (switching from a write batch
    to a read batch with an in-flight write request).

    I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
    linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
    SCSI drives where the driver asks for the next request while a current
    request is in-flight.

    This patch is based off linux-2.6-block git HEAD.

    Bug reproduction:
    A simple scenario which reproduces this bug:
    - dd if=/dev/hda3 of=/dev/null &
    - lilo
    The lilo run takes forever to complete.

    This can also be reproduced fairly easily with the earlier dd and
    another test program doing msync().

    The example test program below should print out a message after
    every iteration, but it simply hangs forever. With this bugfix it
    makes forward progress.

    ====
    Example test program using msync() (thanks to suleiman AT google DOT
    com):

    #define _GNU_SOURCE             /* for O_NOATIME */
    #include <err.h>
    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Read the x86 time-stamp counter ("=A" means edx:eax on 32-bit x86). */
    static inline uint64_t
    rdtsc(void)
    {
        uint64_t tsc;

        __asm __volatile("rdtsc" : "=A" (tsc));
        return (tsc);
    }

    int
    main(int argc, char **argv)
    {
        struct stat st;
        uint64_t e, s, t;
        char *p;
        long i;
        int fd;

        if (argc < 2) {
            printf("Usage: %s <file>\n", argv[0]);
            return (1);
        }

        if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
            err(1, "open");

        if (fstat(fd, &st) < 0)
            err(1, "fstat");

        p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");

        t = 0;
        for (i = 0; i < 1000; i++) {
            *p = 0;                          /* dirty the page */
            msync(p, 4096, MS_SYNC);         /* force a sync write */
            s = rdtsc();
            *p = 0;                          /* dirty it again */
            __asm __volatile("" ::: "memory");
            e = rdtsc();
            if (argc > 2)
                printf("%ld: %" PRIu64 " cycles %jd %jd\n",
                    i, e - s, (intmax_t)s, (intmax_t)e);
            t += e - s;
        }
        printf("average time: %" PRIu64 " cycles\n", t / 1000);
        return (0);
    }

    Cc:
    Acked-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Divyesh Shah
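
    For context, a minimal sketch of the kind of fix described above:
    when the deferred batch switch is committed, the expiry time for the
    new batch direction is armed as well. Field names follow struct
    as_data in as-iosched.c, but this is illustrative, not the verbatim
    patch.

    /* dispatch path, once the switch to the new batch goes through */
    if (ad->changed_batch) {
            ad->changed_batch = 0;
            /* arm the expiry for the *new* direction as well */
            ad->current_batch_expires = jiffies +
                            ad->batch_expire[ad->batch_data_dir];
    }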
     

01 Feb, 2008

2 commits

  • The io-context swap done at request-to-request merge time blindly
    copies everything in the io_context, including the lock. That doesn't
    work so well for either lock ordering or lockdep. There seems to be
    zero point in swapping io contexts on a request-to-request merge, so
    the best course of action is to just remove it.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Since the io_context lock is acquired from irq context, all locking
    must be of the irq-safe variant. Most call sites are already inside
    the queue lock (which also disables interrupts), but the io scheduler
    rmmod path always has irqs enabled, and the put_io_context() path may
    legally be called with irqs enabled (even if it usually isn't). So
    fix up those two.

    Signed-off-by: Jens Axboe

    Jens Axboe
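
    A minimal sketch of the irq-safe pattern in question, assuming the
    lock is ioc->lock:

    unsigned long flags;

    /* irq-safe variant: disables local interrupts while held */
    spin_lock_irqsave(&ioc->lock, flags);
    /* ... touch the io_context ... */
    spin_unlock_irqrestore(&ioc->lock, flags);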
     

18 Dec, 2007

3 commits

  • elv_register() always returns 0, and there isn't anything it does where
    it should return an error (the only error condition is so grave that
    it's handled with a BUG_ON).

    Signed-off-by: Adrian Bunk
    Signed-off-by: Jens Axboe

    Adrian Bunk
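
    A sketch of what the resulting interface change looks like:

    /* before: callers had to check a return value that was always 0 */
    int elv_register(struct elevator_type *e);

    /* after: the function cannot fail, so its type says so */
    void elv_register(struct elevator_type *e);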
     
  • New write batches currently start from where the last one completed.
    We have no idea where the head is after switching batches, so this
    makes little sense. Instead, start the next batch from the request
    with the earliest deadline in the hope that we avoid a deadline
    expiry later on.

    Signed-off-by: Aaron Carroll
    Acked-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Aaron Carroll
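
    Because the deadline FIFO lists are kept in expiry order, the request
    with the earliest deadline is simply the FIFO head. A sketch, with
    names assumed from deadline-iosched.c of that era:

    /* start the new batch from the earliest deadline: the FIFO head */
    rq = rq_entry_fifo(dd->fifo_list[data_dir].next);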
     
  • Two comments refer to deadlines applying to reads only. This is
    not the case.

    Signed-off-by: Aaron Carroll
    Acked-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Aaron Carroll
     

24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's a lot left. So do a full sweep of
    the kernel, get rid of this typedef, and replace its uses with the
    proper type.

    Signed-off-by: Jens Axboe

    Jens Axboe
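
    A sketch of the mechanical change, with a hypothetical io scheduler
    hook as the example:

    /* before: uses the typedef */
    typedef struct request_queue request_queue_t;
    static void example_add_request(request_queue_t *q, struct request *rq);

    /* after: the plain struct type, typedef gone */
    static void example_add_request(struct request_queue *q, struct request *rq);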
     

18 Jul, 2007

1 commit

  • kmalloc_node() and kmem_cache_alloc_node() were not available in
    zeroing variants in the past. But with __GFP_ZERO it is now possible
    to zero while allocating.

    Use __GFP_ZERO to remove the explicit clearing of memory via memset
    wherever we can.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
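
    A before/after sketch of the pattern being removed (the variable and
    size are illustrative):

    /* before: allocate, then clear by hand */
    ad = kmalloc_node(sizeof(*ad), GFP_KERNEL, node);
    if (ad)
            memset(ad, 0, sizeof(*ad));

    /* after: let the allocator return zeroed memory */
    ad = kmalloc_node(sizeof(*ad), GFP_KERNEL | __GFP_ZERO, node);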
     

10 May, 2007

1 commit

  • Switch the kblockd flushing from a global flush to a more specific
    flush_work().

    (akpm: bypassed maintainers, sorry. There are other patches which depend on
    this)

    Cc: "Maciej W. Rozycki"
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
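
    The gist, sketched with the modern one-argument flush_work() (the
    original patch went through a kblockd-specific helper):

    /* before: wait for everything queued on kblockd */
    flush_workqueue(kblockd_workqueue);

    /* after: wait only for the one work item we care about */
    flush_work(&ad->antic_work);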
     

22 Nov, 2006

1 commit

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for
    further scheduling or deallocation by clearing the pending bit prior to
    jumping to the work function. This means that, unless the driver itself
    guarantees that the work_struct won't go away, the work function may
    not access anything else in the work_struct or its container, lest they
    be deallocated. This is a problem if the auxiliary data is taken away
    (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells
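
    A minimal sketch of the new convention: the work function receives
    the work_struct pointer and recovers its container with container_of()
    (the struct and names are illustrative):

    struct my_dev {
            int unit;
            struct work_struct work;
    };

    static void my_dev_work(struct work_struct *work)
    {
            /* recover the containing object from the member pointer */
            struct my_dev *dev = container_of(work, struct my_dev, work);

            printk(KERN_DEBUG "servicing unit %d\n", dev->unit);
    }

    /* set up: the context-data argument is gone */
    INIT_WORK(&dev->work, my_dev_work);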
     

27 Jun, 2006

1 commit

  • acquired (aquired)
    contiguous (contigious)
    successful (succesful, succesfull)
    surprise (suprise)
    whether (weather)
    some other misspellings

    Signed-off-by: Andreas Mohr
    Signed-off-by: Adrian Bunk

    Andreas Mohr
     

23 Jun, 2006

2 commits

  • The io schedulers all duplicate macros for checking whether a root
    or node is empty and for clearing a node. So put those in rbtree.h.

    Signed-off-by: Jens Axboe

    Jens Axboe
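
    The helpers in question became RB_EMPTY_ROOT(), RB_EMPTY_NODE() and
    RB_CLEAR_NODE() in include/linux/rbtree.h; sketched roughly as they
    looked then:

    #define RB_EMPTY_ROOT(root)  ((root)->rb_node == NULL)
    #define RB_EMPTY_NODE(node)  (rb_parent(node) == node)
    #define RB_CLEAR_NODE(node)  (rb_set_parent(node, node))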
     
  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as
    CFQ, this costs us merges and leads to suboptimal scheduling behaviour.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
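
    A before/after sketch of the idea, using the era's two-argument
    submit_bio() and a simplified sync path (illustrative, not the
    verbatim patch):

    /* before: flag the whole task for the duration of the sync */
    current->flags |= PF_SYNCWRITE;
    ret = do_fsync(file, datasync);
    current->flags &= ~PF_SYNCWRITE;

    /* after: the O_DIRECT path marks the io itself as sync */
    submit_bio(WRITE_SYNC, bio);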