15 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • When __generic_file_aio_read() hits an error during reading, it reports the
    error iff nothing has successfully been read yet. This is the convention:
    when an error occurs, if nothing has been read/written, report the error
    code; otherwise, report the number of bytes successfully transferred up to
    that point.

    This corner case can be exposed by performing readv(2) with the following
    iov.

    iov[0] = len0 @ ptr0
    iov[1] = len1 @ NULL (or any other invalid pointer)
    iov[2] = len2 @ ptr2

    When the file is large enough, performing the above readv(2) results in

    len0 bytes from file_pos @ ptr0
    len2 bytes from file_pos + len0 @ ptr2

    and the return value is len0 + len2. A test program is attached to this
    mail.

    This patch makes __generic_file_aio_read()'s error handling identical to
    other functions.

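    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/uio.h>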

    int main(int argc, char **argv)
    {
        const char *path;
        struct stat stbuf;
        size_t len0, len1;
        void *buf0, *buf1;
        struct iovec iov[3];
        int fd, i;
        ssize_t ret;

        if (argc < 2) {
            fprintf(stderr, "Usage: testreadv path (better be a "
                    "small text file)\n");
            return 1;
        }
        path = argv[1];

        if (stat(path, &stbuf) < 0) {
            perror("stat");
            return 1;
        }

        /* Split the file into two halves, one per valid buffer. */
        len0 = stbuf.st_size / 2;
        len1 = stbuf.st_size - len0;

        if (!len0 || !len1) {
            fprintf(stderr, "Dude, file is too small\n");
            return 1;
        }

        if ((fd = open(path, O_RDONLY)) < 0) {
            perror("open");
            return 1;
        }

        /* One spare byte keeps each buffer NUL-terminated for the %s below. */
        if (!(buf0 = malloc(len0 + 1)) || !(buf1 = malloc(len1 + 1))) {
            perror("malloc");
            return 1;
        }

        memset(buf0, 0, len0 + 1);
        memset(buf1, 0, len1 + 1);

        /* The middle segment deliberately points at an invalid address. */
        iov[0].iov_base = buf0;
        iov[0].iov_len = len0;
        iov[1].iov_base = NULL;
        iov[1].iov_len = len1;
        iov[2].iov_base = buf1;
        iov[2].iov_len = len1;

        printf("vector ");
        for (i = 0; i < 3; i++)
            printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len);
        printf("\n");

        ret = readv(fd, iov, 3);
        if (ret < 0)
            perror("readv");

        printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n",
               ret, (char *)buf0, (char *)buf1);

        return 0;
    }
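
    The convention described above can be sketched in user space. This is
    only an illustration, not the kernel code: struct seg and copy_to_segs()
    are hypothetical names, and a NULL base stands in for an invalid user
    pointer. The loop stops at the first bad segment and reports either the
    bytes already transferred or, if there are none, the error.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct iovec; a NULL base models a bad pointer. */
struct seg {
    void *base;
    size_t len;
};

/*
 * Copy src into each segment in order. On the first bad segment, stop:
 * return the bytes copied so far if there are any, otherwise -EFAULT.
 * Later segments are never touched once an error has been seen.
 */
static ssize_t copy_to_segs(const char *src, size_t src_len,
                            const struct seg *segs, int nr_segs)
{
    size_t done = 0;
    int i;

    for (i = 0; i < nr_segs && done < src_len; i++) {
        size_t n = segs[i].len;

        if (n > src_len - done)
            n = src_len - done;
        if (!segs[i].base)
            return done ? (ssize_t)done : -EFAULT;
        memcpy(segs[i].base, src + done, n);
        done += n;
    }
    return (ssize_t)done;
}
```

    With the vector from the message (valid, invalid, valid), this sketch
    returns len0 and leaves the third buffer untouched, instead of skipping
    the bad segment and returning len0 + len2.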

    Signed-off-by: Tejun Heo
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

30 Oct, 2005

4 commits

  • Move EXPORT_SYMBOL(filemap_populate) to its proper place, just after the
    function itself; otherwise it is easy to miss that the function is exported.

    Signed-off-by: Nikita Danilov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikita Danilov
     
  • Updated several references to page_table_lock in common code comments.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
    a many-threaded application which concurrently initializes different parts of
    a large anonymous area.

    This patch corrects that, by using a separate spinlock per page table page, to
    guard the page table entries in that page, instead of using the mm's single
    page_table_lock. (But even then, page_table_lock is still used to guard page
    table allocation, and anon_vma allocation.)

    In this implementation, the spinlock is tucked inside the struct page of the
    page table page: with a BUILD_BUG_ON in case it overflows - which it would in
    the case of 32-bit PA-RISC with spinlock debugging enabled.

    Splitting the lock is not quite for free: another cacheline access. Ideally,
    I suppose we would use split ptlock only for multi-threaded processes on
    multi-cpu machines; but deciding that dynamically would have its own costs.
    So for now enable it by config, at some number of cpus - since the Kconfig
    language doesn't support inequalities, let preprocessor compare that with
    NR_CPUS. But I don't think it's worth being user-configurable: for good
    testing of both split and unsplit configs, split now at 4 cpus, and perhaps
    change that to 8 later.

    There is a benefit even for singly threaded processes: kswapd can be attacking
    one part of the mm while another part is busy faulting.
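
    A minimal user-space sketch of the compile-time choice described above,
    assuming pthread mutexes in place of kernel spinlocks. NR_CPUS comes from
    the message; SPLIT_PTLOCK_CPUS, struct pt_page, struct mm_sketch,
    pte_lock_for() and set_pte_locked() are hypothetical names used only for
    illustration.

```c
#include <pthread.h>

#ifndef NR_CPUS
#define NR_CPUS 8                /* assumed build-time CPU count */
#endif
#define SPLIT_PTLOCK_CPUS 4      /* threshold suggested in the message */

/* The config language cannot compare, so the preprocessor does it. */
#if NR_CPUS >= SPLIT_PTLOCK_CPUS
#define USE_SPLIT_PTLOCK 1
#else
#define USE_SPLIT_PTLOCK 0
#endif

/* Stand-in for a page table page with its lock tucked inside. */
struct pt_page {
    pthread_mutex_t ptl;
    unsigned long pte[512];
};

/* Stand-in for the mm, which keeps the single coarse lock. */
struct mm_sketch {
    pthread_mutex_t page_table_lock;
};

/* Pick the lock that guards the entries of one page table page. */
static pthread_mutex_t *pte_lock_for(struct mm_sketch *mm, struct pt_page *pt)
{
#if USE_SPLIT_PTLOCK
    (void)mm;
    return &pt->ptl;
#else
    (void)pt;
    return &mm->page_table_lock;
#endif
}

/* Update one entry under whichever lock the build selected. */
static void set_pte_locked(struct mm_sketch *mm, struct pt_page *pt,
                           int idx, unsigned long val)
{
    pthread_mutex_t *ptl = pte_lock_for(mm, pt);

    pthread_mutex_lock(ptl);
    pt->pte[idx] = val;
    pthread_mutex_unlock(ptl);
}
```

    With the split enabled, two threads updating entries in different page
    table pages take different locks and do not contend; with it disabled,
    both paths fall back to the one lock in the mm.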

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Impose a little more consistency on the page fault handlers do_wp_page,
    do_swap_page, do_anonymous_page, do_no_page, do_file_page: why not pass their
    arguments in the same order, called the same names?

    break_cow is all very well, but what it did was inlined elsewhere: easier to
    compare if it's brought back into do_wp_page.

    do_file_page's fallback to do_no_page dates from a time when we were testing
    pte_file by using it wherever possible: currently it's peculiar to nonlinear
    vmas, so just check that. BUG_ON if not? Better not, it's probably page
    table corruption, so just show the pte: hmm, there's a pte_ERROR macro, let's
    use that for do_wp_page's invalid pfn too.

    Hah! Someone in the ppc64 world noticed pte_ERROR was unused so removed it:
    restored (and say "pud" not "pmd" in its pud_ERROR).

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Oct, 2005

1 commit


11 Sep, 2005

1 commit


05 Sep, 2005

2 commits

  • Either shmem_getpage returns a failure, or it found a page, or it was told
    it couldn't do any I/O. So it's useless to check nonblock in the else
    branch. We could add a BUG() there but I preferred to comment the
    offending function.

    This was taken from one of Ingo Molnar's old patches, which I'm resurrecting.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     
  • The idea of a swap_device_lock per device, and a swap_list_lock over them all,
    is appealing; but in practice almost every holder of swap_device_lock must
    already hold swap_list_lock, which defeats the purpose of the split.

    The only exceptions have been swap_duplicate, valid_swaphandles and an
    untrodden path in try_to_unuse (plus a few places added in this series).
    valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
    demand attention. However, with the hold time in get_swap_pages so much
    reduced, I've not yet found a load and set of swap device priorities to show
    even swap_duplicate benefitting from the split. Certainly the split is mere
    overhead in the common case of a single swap device.

    So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
    (generally we seem to prefer an _ in the name, and not hide in a macro).

    If someone can show a regression in swap_duplicate, then probably we should
    add a hashlock for the swap_map entries alone (shorts being anatomic), so as
    to help the case of the single swap device too.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

26 Jun, 2005

2 commits

  • Here is the fix for the problem described in

    http://bugzilla.kernel.org/show_bug.cgi?id=4721

    Basically, the problem is that generic_file_buffered_write() accesses
    memory beyond the end of the iov[] vector after handling the last segment.
    If we happen to cross a page boundary, we get a fault.

    I think this simple patch is good enough. If we really don't want to
    depend on the "count", then we need to pass nr_segs to
    filemap_set_next_iovec(), decrement it, and check it.
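
    The overrun and the "count" guard can be sketched in user space as
    follows. This is an illustration under assumptions, not the patch itself:
    struct iov_cursor, cursor_advance() and cursor_next_buf() are hypothetical
    names, and the caller is assumed never to advance past the total length
    of the vector.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical cursor: current segment plus the offset inside it. */
struct iov_cursor {
    const struct iovec *iov;
    size_t base;
};

/*
 * Step the cursor forward by "bytes". After the last segment is consumed,
 * c->iov points one past the end of the array and must not be dereferenced.
 */
static void cursor_advance(struct iov_cursor *c, size_t bytes)
{
    while (bytes) {
        size_t copy = c->iov->iov_len - c->base;

        if (copy > bytes)
            copy = bytes;
        bytes -= copy;
        c->base += copy;
        if (c->base == c->iov->iov_len) {
            c->iov++;
            c->base = 0;
        }
    }
}

/*
 * Return the next buffer address, touching the segment only while bytes
 * remain to be written. With count == 0 the cursor may already sit past
 * the end of the vector, so it is left alone.
 */
static char *cursor_next_buf(const struct iov_cursor *c, size_t count)
{
    if (!count)
        return NULL;
    return (char *)c->iov->iov_base + c->base;
}
```

    The guard relies on the remaining byte count being accurate; the
    alternative mentioned in the message would instead track the number of
    segments left.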

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Fix a bug on error handling in the direct I/O function.

    Currently, if a file is opened with the O_DIRECT|O_SYNC flag, the write()
    syscall cannot receive the EIO error after an I/O error (SCSI cable is
    disconnected etc.).

    Return values of other points that call generic_osync_inode() are treated
    appropriately.

    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hifumi Hisashi
     

24 Jun, 2005

2 commits

  • - generic_file* file operations no longer have a xip/non-xip split
    - filemap_xip.c implements a new set of fops that require the
      get_xip_page aop to work properly. All new fops are exported GPL-only
      (I don't want to see any code other than GPL modules use them)
    - __xip_unmap now uses page_check_address, which is no longer static
      in rmap.c, and defined in linux/rmap.h
    - mm/filemap.h is now much cleaner, plainly having just Linus'
      inline funcs moved here from filemap.c
    - fix includes in filemap_xip to make it build cleanly on i386

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • The following patch removes the f_error field and all checks of f_error.

    Trond said:

    f_error was introduced for NFS, and made sense when we were guaranteed
    always to have a file pointer around when write errors occurred. Since
    then, we have (for various reasons) had to introduce the nfs_open_context in
    order to track the file read/write state, and it made sense to move our
    f_error tracking there too.

    Signed-off-by: Christoph Lameter
    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Jun, 2005

1 commit


22 May, 2005

1 commit

  • I came across the following problem while running ltp-aiodio testcases from
    ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the tests with
    EXT3 as well as JFS filesystems.

    One or two fsx-linux testcases were hung after some time. These testcases
    were hanging at wait_for_all_aios().

    Debugging shows that there were some iocbs which were not getting completed
    even though the last retry for those returned -EIOCBQUEUED. Also, all such
    pending iocbs represented READ operations.

    Further debugging revealed that all such iocbs hit EOF in the DIO layer.
    To be more precise, the "pos" from which they were trying to read was
    greater than the "size" of the file. So the generic_file_direct_IO
    returned 0.

    This happens rarely as there is already a check in
    __generic_file_aio_read(), for whether "pos" < "size" before calling direct
    IO routine.

    >size = i_size_read(inode);
    >if (pos < size) {
    > retval = generic_file_direct_IO(READ, iocb,
    > iov, pos, nr_segs);

    But for READ, we take inode->i_sem only in the DIO layer. So it is
    possible for some other process to change the size of the file before
    we take the i_sem. In such a case (when "pos" > "size"),
    __generic_file_aio_read() would return -EIOCBQUEUED even though no I/O
    requests were submitted by the DIO layer. This would cause the AIO layer
    to expect aio_complete() for the iocb, which does not happen. Thus the
    test hangs forever, waiting for an I/O completion when no requests were
    submitted at all.

    The following patch makes __generic_file_aio_read() return 0 (instead of
    returning -EIOCBQUEUED), on getting 0 from generic_file_direct_IO(), so
    that the AIO layer does the aio_complete().

    Testing:

    I have tested the patch on an SMP machine (with 2 Pentium 4 (HT)) running
    linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the
    fsx-linux tests hung. Also, the aio-stress tests ran without any problem.
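
    The decision described above can be sketched as a small function. This
    is an assumption about the shape of the check, not a copy of the patch:
    aio_read_result() and its parameters are hypothetical, and EIOCBQUEUED is
    defined locally because it is a kernel-internal code that user-space
    headers do not provide.

```c
#include <errno.h>
#include <sys/types.h>

#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529  /* kernel-internal: iocb queued, completion follows */
#endif

/*
 * dio_ret is the hypothetical result of the direct I/O layer:
 *   > 0  some I/O was done or submitted
 *   0    nothing was submitted (for example, pos is at or past EOF)
 *   < 0  an error code
 * is_sync is nonzero for a synchronous request.
 *
 * "Queued" is reported only when something was really submitted, so a
 * zero result is passed through and the caller can complete the iocb.
 */
static ssize_t aio_read_result(ssize_t dio_ret, int is_sync)
{
    if (dio_ret > 0 && !is_sync)
        return -EIOCBQUEUED;
    return dio_ret;
}
```

    Passing the zero through is what lets the AIO layer call aio_complete()
    itself rather than wait for a completion that will never arrive.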

    Signed-off-by: Suzuki K P
    Signed-off-by: Suparna Bhattacharya
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suparna Bhattacharya
     

06 May, 2005

1 commit


01 May, 2005

4 commits

  • Some KernelDoc descriptions are updated to match the current code.
    No code changes.

    Signed-off-by: Martin Waitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Waitz
     
  • Remove PAGE_BUG and replace it with BUG and BUG_ON.

    Signed-off-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • The smp_mb() is there because sync_page() doesn't hold PG_locked while it
    accesses page_mapping(page). The comments in the patch (the entire patch
    is the addition of this comment) try to explain further how and why
    smp_mb() is used.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    William Lee Irwin III
     
  • Anton Altaparmakov points out:

    - It calls fault_in_pages_readable(), which is completely bogus if
    @nr_segs > 1. It needs to be replaced by a to-be-written
    "fault_in_pages_readable_iovec()".

    - It increments @buf even in the iovec case thus @buf can point to random
    memory really quickly (in the iovec case) and then it calls
    fault_in_pages_readable() on this random memory.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@osdl.org
     

17 Apr, 2005

2 commits

  • We will return NULL from filemap_getpage when a page does not exist in the
    page cache and MAP_NONBLOCK is specified, here:

        page = find_get_page(mapping, pgoff);
        if (!page) {
            if (nonblock)
                return NULL;
            goto no_cached_page;
        }

    But we forget to do so when the page in the cache is not uptodate. The
    following could result in a blocking call:

        /*
         * Ok, found a page in the page cache, now we need to check
         * that it's up-to-date.
         */
        if (!PageUptodate(page))
            goto page_not_uptodate;
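
    A small sketch of the intended behaviour, assuming the fix is to bail
    out in the not-uptodate case as well when nonblock is set. The names
    struct page_sketch, enum lookup and classify_lookup() are hypothetical
    and only model the two checks quoted above.

```c
#include <stddef.h>

/* Stand-in for a cached page; only the uptodate state matters here. */
struct page_sketch {
    int uptodate;
};

enum lookup {
    LOOKUP_HIT,          /* page is cached and up to date */
    LOOKUP_WOULD_BLOCK,  /* nonblock caller: give up, return NULL */
    LOOKUP_NEEDS_IO      /* blocking caller: go read the page */
};

/*
 * Both the "no page in cache" case and the "page not up to date" case
 * need I/O, so a nonblocking caller has to bail out in either one.
 */
static enum lookup classify_lookup(const struct page_sketch *page, int nonblock)
{
    if (!page)
        return nonblock ? LOOKUP_WOULD_BLOCK : LOOKUP_NEEDS_IO;
    if (!page->uptodate)
        return nonblock ? LOOKUP_WOULD_BLOCK : LOOKUP_NEEDS_IO;
    return LOOKUP_HIT;
}
```

    The second check is the one that was missing: before it, a stale page
    sent even a nonblocking caller down the path that waits for I/O.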

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     
  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds