21 Jun, 2010

1 commit


19 Jun, 2010

1 commit

  • Hi Jens,

    A few days back Ingo noticed a CFQ boot-time warning. This patch fixes it.
    The issue is that with CFQ_GROUP_IOSCHED=n, CFQ should not be making
    blkio stat related calls at all.
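
    One way to picture this kind of fix is a wrapper that compiles to a
    no-op when group scheduling is disabled. The sketch below is a
    user-space model, not the kernel patch: MODEL_CFQ_GROUP_IOSCHED stands
    in for the real config option, and every model_* name is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for CONFIG_CFQ_GROUP_IOSCHED. Leave it undefined
 * to model a kernel built with CFQ_GROUP_IOSCHED=n.
 */
/* #define MODEL_CFQ_GROUP_IOSCHED 1 */

struct model_blkio_group {
    int lock_initialized;   /* models the spinlock having been set up */
    unsigned long io_added; /* models one blkio statistic */
};

/* Models the real stat update, which needs a properly initialized lock. */
int model_blkiocg_update_io_add_stats(struct model_blkio_group *blkg)
{
    if (!blkg->lock_initialized) {
        fprintf(stderr, "BUG: spinlock bad magic (model)\n");
        return -1;
    }
    blkg->io_added++;
    return 0;
}

#ifdef MODEL_CFQ_GROUP_IOSCHED
/* Group scheduling enabled: forward to the real stat update. */
static inline int model_cfq_update_io_add_stats(struct model_blkio_group *blkg)
{
    return model_blkiocg_update_io_add_stats(blkg);
}
#else
/* Group scheduling disabled: the wrapper does nothing and touches no lock. */
static inline int model_cfq_update_io_add_stats(struct model_blkio_group *blkg)
{
    (void)blkg;
    return 0;
}
#endif
```

    With the macro undefined, a caller can pass a group whose lock was
    never initialized and the wrapper still returns 0 without touching it.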

    > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With
    > some
    > configs i get bad spinlock warnings during bootup:
    >
    > [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750
    > usecs
    > [ 28.972003] calling b44_init+0x0/0x55 @ 1
    > [ 28.976009] bus: 'pci': add driver b44
    > [ 28.976374] sda:
    > [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
    > [ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: /-1, +.owner_cpu: 0
    > [ 28.980000] Pid: 117, comm: async/0 Not tainted +2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
    > [ 28.980000] Call Trace:
    > [ 28.980000] [] ? printk+0x20/0x24
    > [ 28.980000] [] spin_bug+0x7c/0x87
    > [ 28.980000] [] do_raw_spin_lock+0x1e/0x123
    > [ 28.980000] [] ? _raw_spin_lock_irqsave+0x12/0x20
    > [ 28.980000] [] _raw_spin_lock_irqsave+0x1a/0x20
    > [ 28.980000] [] blkiocg_update_io_add_stats+0x25/0xfb
    > [ 28.980000] [] ? cfq_prio_tree_add+0xb1/0xc1
    > [ 28.980000] [] cfq_insert_request+0x8c/0x425

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

18 Jun, 2010

1 commit

  • Hi,

    A user reported a kernel bug when running a particular program that did
    the following:

    created 32 threads
    - each thread took a mutex, grabbed a global offset, added a buffer size
    to that offset, released the lock
    - read from the given offset in the file
    - created a new thread to do the same
    - exited

    The result is that cfq's close cooperator logic would trigger, as the
    threads were issuing I/O within the mean seek distance of one another.
    This workload managed to routinely trigger a use after free bug when
    walking the list of merge candidates for a particular cfqq
    (cfqq->new_cfqq). The logic used for merging queues looks like this:

    static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
    {
        int process_refs, new_process_refs;
        struct cfq_queue *__cfqq;

        /* Avoid a circular list and skip interim queue merges */
        while ((__cfqq = new_cfqq->new_cfqq)) {
            if (__cfqq == cfqq)
                return;
            new_cfqq = __cfqq;
        }

        process_refs = cfqq_process_refs(cfqq);
        /*
         * If the process for the cfqq has gone away, there is no
         * sense in merging the queues.
         */
        if (process_refs == 0)
            return;

        /*
         * Merge in the direction of the lesser amount of work.
         */
        new_process_refs = cfqq_process_refs(new_cfqq);
        if (new_process_refs >= process_refs) {
            cfqq->new_cfqq = new_cfqq;
            atomic_add(process_refs, &new_cfqq->ref);
        } else {
            new_cfqq->new_cfqq = cfqq;
            atomic_add(new_process_refs, &cfqq->ref);
        }
    }

    When a merge candidate is found, we add the process references for the
    queue with fewer references to the queue with more. The actual merging
    of queues happens when a new request is issued for a given cfqq. In the
    case of the test program, it only does a single pread call to read in
    1MB, so the actual merge never happens.

    Normally, this is fine, as when the queue exits, we simply drop the
    references we took on the other cfqqs in the merge chain:

    /*
     * If this queue was scheduled to merge with another queue, be
     * sure to drop the reference taken on that queue (and others in
     * the merge chain). See cfq_setup_merge and cfq_merge_cfqqs.
     */
    __cfqq = cfqq->new_cfqq;
    while (__cfqq) {
        if (__cfqq == cfqq) {
            WARN(1, "cfqq->new_cfqq loop detected\n");
            break;
        }
        next = __cfqq->new_cfqq;
        cfq_put_queue(__cfqq);
        __cfqq = next;
    }

    However, there is a hole in this logic. Consider the following (and
    keep in mind that each I/O keeps a reference to the cfqq):

    q1->new_cfqq = q2 // q2 now has 2 process references
    q3->new_cfqq = q2 // q2 now has 3 process references

    // the process associated with q2 exits
    // q2 now has 2 process references

    // q1 exits, drops its reference on q2
    // q2 now has 1 process reference

    // q3 exits, so has 0 process references, and hence drops its references
    // to q2, which leaves q2 also with 0 process references

    q4 comes along and wants to merge with q3

    q3->new_cfqq still points at q2! We follow that link and end up at an
    already freed cfqq.
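
    The sequence above can be replayed in a small user-space model. This is
    only a sketch under simplifying assumptions: struct model_cfqq and the
    model_* helpers are hypothetical, a "freed" flag replaces the real
    free so the model stays safe to run, and q3 is given one in-flight
    request so that it outlives its process.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a cfq_queue; not the kernel struct. */
struct model_cfqq {
    int ref;                     /* all references */
    int io_refs;                 /* references held by in-flight requests */
    bool freed;                  /* set instead of calling free() */
    struct model_cfqq *new_cfqq; /* scheduled merge target */
};

static int model_process_refs(const struct model_cfqq *q)
{
    return q->ref - q->io_refs;
}

static void model_put_queue(struct model_cfqq *q)
{
    if (--q->ref == 0)
        q->freed = true;
}

/* Schedule src to merge into dst: dst gains src's process references. */
static void model_schedule_merge(struct model_cfqq *src, struct model_cfqq *dst)
{
    src->new_cfqq = dst;
    dst->ref += model_process_refs(src);
}

/* Process exit: drop one reference on each queue in the chain, then our own. */
static void model_exit_queue(struct model_cfqq *q)
{
    struct model_cfqq *next, *p = q->new_cfqq;

    while (p) {
        next = p->new_cfqq;
        model_put_queue(p);
        p = next;
    }
    model_put_queue(q);
}

/* Replays the scenario; returns true if q3's chain reaches a freed queue. */
bool model_chain_hits_freed_queue(void)
{
    struct model_cfqq q1 = { .ref = 1 }, q2 = { .ref = 1 };
    struct model_cfqq q3 = { .ref = 2, .io_refs = 1 }; /* one request in flight */
    struct model_cfqq *p;

    model_schedule_merge(&q1, &q2); /* q2: 2 process references */
    model_schedule_merge(&q3, &q2); /* q2: 3 process references */
    model_exit_queue(&q2);          /* q2: 2 */
    model_exit_queue(&q1);          /* q2: 1 */
    model_exit_queue(&q3);          /* q2: 0 and freed; q3 lives on its I/O ref */

    /* q4 arrives and walks q3's merge chain, as the unpatched loop does. */
    for (p = q3.new_cfqq; p; p = p->new_cfqq)
        if (p->freed)
            return true;
    return false;
}
```

    In this model the walk starting at q3 lands on q2 after q2 has been
    marked freed, which corresponds to the use after free described above.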

    So, the fix is to not follow a merge chain if the top-most queue does
    not have a process reference, otherwise any queue in the chain could be
    already freed. I also changed the logic to disallow merging with a
    queue that does not have any process references. Previously, we did
    this check for one of the merge candidates, but not the other. That
    doesn't really make sense.
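
    As a rough illustration of those two checks, here is a user-space
    sketch of the merge setup with both guards in place. It is not the
    attached patch: struct sketch_cfqq, sketch_process_refs() and the
    return value are hypothetical, and plain integer arithmetic replaces
    the atomic reference counts.

```c
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's struct cfq_queue. */
struct sketch_cfqq {
    int ref;                      /* all references */
    int io_refs;                  /* references held by in-flight requests */
    struct sketch_cfqq *new_cfqq; /* scheduled merge target */
};

static int sketch_process_refs(const struct sketch_cfqq *q)
{
    return q->ref - q->io_refs;
}

/* Returns 1 if a merge was scheduled, 0 if it was refused. */
int sketch_setup_merge(struct sketch_cfqq *cfqq, struct sketch_cfqq *new_cfqq)
{
    struct sketch_cfqq *next;
    int process_refs, new_process_refs;

    /* No process reference on the head: queues behind it may be freed. */
    if (sketch_process_refs(new_cfqq) == 0)
        return 0;

    /* Avoid a circular list and skip interim queue merges. */
    while ((next = new_cfqq->new_cfqq)) {
        if (next == cfqq)
            return 0;
        new_cfqq = next;
    }

    process_refs = sketch_process_refs(cfqq);
    new_process_refs = sketch_process_refs(new_cfqq);
    /* Refuse to merge if either candidate has no process left. */
    if (process_refs == 0 || new_process_refs == 0)
        return 0;

    /* Merge in the direction of the lesser amount of work. */
    if (new_process_refs >= process_refs) {
        cfqq->new_cfqq = new_cfqq;
        new_cfqq->ref += process_refs;
    } else {
        new_cfqq->new_cfqq = cfqq;
        cfqq->ref += new_process_refs;
    }
    return 1;
}
```

    In this sketch, a queue kept alive only by in-flight I/O is rejected
    before its new_cfqq pointer is ever dereferenced.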

    Without the attached patch, my system would BUG within a couple of
    seconds of running the reproducer program. With the patch applied, my
    system ran the program for over an hour without issues.

    This addresses the following bugzilla:
    https://bugzilla.kernel.org/show_bug.cgi?id=16217

    Thanks a ton to Phil Carns for providing the bug report and an excellent
    reproducer.

    [ Note for stable: this applies to 2.6.32/33/34 ].

    Signed-off-by: Jeff Moyer
    Reported-by: Phil Carns
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

17 Jun, 2010

1 commit

  • Filesystems assume that DISCARD_BARRIER requests are full barriers, so that
    they don't have to track in-progress discard operations when submitting new
    I/O. But currently we only treat them as elevator barriers, which don't
    actually do the necessary queue drains.

    Also remove the unlikely() around both the DISCARD and BARRIER requests;
    they happen far too often for a static mispredict.
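
    For readers unfamiliar with the hint: the kernel's unlikely() expands
    to GCC's __builtin_expect(), which tells the compiler to lay out code
    for the branch being rarely taken. The sketch below shows the same test
    with and without the hint. The flag values and function names are
    hypothetical, and it assumes a GCC-compatible compiler.

```c
/* Same shape as the kernel's unlikely(): a hint built on __builtin_expect. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical request flags, for illustration only. */
#define SKETCH_REQ_DISCARD 0x1u
#define SKETCH_REQ_BARRIER 0x2u

/* With the hint: the compiler treats the branch as rarely taken. */
int sketch_is_special_hinted(unsigned int flags)
{
    if (sketch_unlikely(flags & (SKETCH_REQ_DISCARD | SKETCH_REQ_BARRIER)))
        return 1;
    return 0;
}

/* Without the hint: same result, no static prediction. */
int sketch_is_special_plain(unsigned int flags)
{
    if (flags & (SKETCH_REQ_DISCARD | SKETCH_REQ_BARRIER))
        return 1;
    return 0;
}
```

    Both functions return the same value for every input. The hint affects
    only code layout, so it costs performance when the "unlikely" case is
    in fact common.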

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

15 Jun, 2010

1 commit


14 Jun, 2010

4 commits


12 Jun, 2010

27 commits


11 Jun, 2010

4 commits

  • When we use remap_file_pages() to remap a file, remap_file_pages() always
    returns an error. This is because btrfs didn't set VM_CAN_NONLINEAR for
    the vma.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • refs can be used with uninitialized data if btrfs_lookup_extent_info()
    fails on the first pass through the loop. In the original code, if that
    happens then check_path_shared() probably returns 1. This patch
    changes it to return 1 explicitly in that case, for safety.
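
    The pattern can be sketched in user space as follows. This is a model
    only: sketch_lookup_refs() is a hypothetical stand-in for a lookup that
    can fail without writing its output, and the loop shape is an
    assumption, not a copy of check_path_shared().

```c
#include <stdint.h>

/* Hypothetical lookup: on failure it returns nonzero and leaves *refs alone. */
static int sketch_lookup_refs(int level, uint64_t *refs)
{
    if (level < 0)
        return -1;
    *refs = (level == 0) ? 1 : 2;
    return 0;
}

/* Returns 1 if the path should be treated as shared, 0 otherwise. */
int sketch_path_shared(int first_level, int levels)
{
    uint64_t refs; /* deliberately not initialized, as in the described code */
    int level;

    for (level = first_level; level < first_level + levels; level++) {
        /* On failure, assume shared instead of reading refs. */
        if (sketch_lookup_refs(level, &refs))
            return 1;
        if (refs > 1)
            return 1;
    }
    return 0;
}
```

    Checking the return value before looking at refs means the
    uninitialized variable is never read when the first lookup fails.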

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     
  • Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we
    dropped passing the mode to the function that actually does the preallocation.
    This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks,
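
    To make the effect concrete, here is a small user-space model of a
    caller that drops the mode versus one that forwards it. Everything
    here is hypothetical: the sketch_* names, the struct, and the
    assumption that the buggy caller passed 0 in place of mode.
    SKETCH_KEEP_SIZE stands in for FALLOC_FL_KEEP_SIZE.

```c
/* Stand-in for FALLOC_FL_KEEP_SIZE; the name is hypothetical. */
#define SKETCH_KEEP_SIZE 0x01

/* Hypothetical model of the two sizes an inode tracks. */
struct sketch_inode {
    long long i_size;    /* size reported to user space */
    long long allocated; /* bytes reserved on disk */
};

/* Models the function that actually does the preallocation. */
static void sketch_prealloc(struct sketch_inode *inode, int mode,
                            long long offset, long long len)
{
    long long end = offset + len;

    if (end > inode->allocated)
        inode->allocated = end;
    /* Only grow the visible size when KEEP_SIZE was not requested. */
    if (!(mode & SKETCH_KEEP_SIZE) && end > inode->i_size)
        inode->i_size = end;
}

/* Assumed buggy caller: the mode is dropped on the way down. */
void sketch_fallocate_buggy(struct sketch_inode *inode, int mode,
                            long long offset, long long len)
{
    (void)mode;
    sketch_prealloc(inode, 0, offset, len);
}

/* Fixed caller: the mode reaches the preallocation function. */
void sketch_fallocate_fixed(struct sketch_inode *inode, int mode,
                            long long offset, long long len)
{
    sketch_prealloc(inode, mode, offset, len);
}
```

    In the model, the buggy caller grows i_size even though the caller
    asked to keep it, while the fixed caller reserves space and leaves
    i_size alone.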

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • We cannot use the loop device which has been connected to a file in the btrfs

    The steps to reproduce are as follows:
    # dd if=/dev/zero of=vdev0 bs=1M count=1024
    # losetup /dev/loop0 vdev0
    # mkfs.btrfs /dev/loop0
    ...
    failed to zero device start -5

    The reason is that btrfs doesn't implement either ->write_begin or ->write
    of the VFS API, so we fix it by setting ->write to do_sync_write().

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie