17 Mar, 2006

2 commits


16 Mar, 2006

2 commits

  • This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
    in the ext2 filesystem "readdir()" function. It also cleans up the code
    to avoid the unnecessary goto mess.

    The bugs were related to re-validating the f_pos value after somebody had
    either done an "lseek()" on the directory to an invalid offset, or when
    the offset had become invalid due to a file being unlinked in the
    directory. The code would not only set the f_version too eagerly, it
    would also not update f_pos appropriately for when the offset fixup took
    place.

    When that happened, we'd occasionally subsequently fail the readdir()
    even when we shouldn't (no real harm done, but an ugly printk, and
    obviously you would end up not necessarily seeing all entries).

    Thanks to Masoud Sharbiani who noticed the problem
    and had a test-case for it, and also fixed up a thinko in the first
    version of this patch.

    Signed-off-by: Al Viro
    Acked-by: Masoud Sharbiani
    Signed-off-by: Linus Torvalds

    Al Viro
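
The re-validation described in this message can be pictured as walking the records of a directory block from its start until the walk reaches or passes the requested offset. The following is only a minimal user-space sketch under that assumption; `struct dir_rec` and `revalidate_offset()` are hypothetical names, not the ext2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified directory record: only the length matters here. */
struct dir_rec {
    unsigned rec_len;   /* bytes from this record to the next one */
};

/*
 * Walk the records of one directory block from its start and return the
 * first record boundary at or after `offset`. An offset that already sits
 * on a boundary is returned unchanged; a zero rec_len stops the walk.
 */
unsigned revalidate_offset(const struct dir_rec *recs, size_t nrecs,
                           unsigned offset)
{
    unsigned pos = 0;
    size_t i;

    assert(recs != NULL);
    for (i = 0; i < nrecs && pos < offset; i++) {
        if (recs[i].rec_len == 0)
            break;
        pos += recs[i].rec_len;
    }
    return pos;
}
```

As the message describes it, the bugs sat around a step like this one: the corrected offset has to be stored back into f_pos, and the version stamp should not be recorded before that has happened.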
     
  • The Coverity checker spotted the following bug in dup_namespace():

    if (!new_ns->root) {
    up_write(&namespace_sem);
    kfree(new_ns);
    goto out;
    }
    ...
    out:
    return new_ns;

    Callers expect a non-NULL result to not be freed.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
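
The bug pattern quoted above (free the object, then return the stale pointer) can be reproduced and avoided in a few lines of user-space C. This is a sketch only: `struct ns` and `dup_ns_sketch()` are hypothetical stand-ins, and the commit message does not show the actual fix, so clearing the pointer is just one plausible shape for it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's namespace object. */
struct ns {
    void *root;
};

/*
 * Duplicate `old`. `fail_root` simulates the root copy failing.
 * On failure the partially built object is freed and NULL is returned,
 * so a caller never receives a pointer to freed memory.
 */
struct ns *dup_ns_sketch(const struct ns *old, int fail_root)
{
    struct ns *new_ns;

    assert(old != NULL);
    new_ns = malloc(sizeof(*new_ns));
    if (!new_ns)
        return NULL;

    new_ns->root = fail_root ? NULL : old->root;
    if (!new_ns->root) {
        free(new_ns);
        new_ns = NULL;      /* without this, the freed pointer is returned */
    }
    return new_ns;
}
```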
     

15 Mar, 2006

4 commits

  • page migration currently simply retries a couple of times if try_to_unmap()
    fails without inspecting the return code.

    However, SWAP_FAIL indicates that the page is in a vma that has the
    VM_LOCKED flag set (if ignore_refs == 1). We can check for that return code
    and avoid retrying the migration.

    migrate_page_remove_references() now needs to return a reason why the
    failure occurred. So switch migrate_page_remove_references to use -Exx
    style error messages.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • * git://oss.sgi.com:8090/oss/git/rc-fixes:
    Fix a direct I/O locking issue revealed by the new mutex code.

    Linus Torvalds
     
  • Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
    not possible to get i_mutex locking correct when using DIO_OWN
    direct I/O locking in a filesystem due to indeterminism in the
    possible return code/lock/unlock combinations. This can cause
    a direct read to attempt a double i_mutex unlock inside XFS.

    We're now ensuring __blockdev_direct_IO always exits with the
    inode i_mutex (still) held for a direct reader.

    Tested with the three different locking modes (via direct block
    device access, ext3 and XFS) - both reading and writing; cannot
    find any regressions resulting from this change, and it clearly
    fixes the mutex_unlock warning originally reported here:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2

    Signed-off-by: Nathan Scott
    Acked-by: Christoph Hellwig

    Nathan Scott
     
  • This fixes a race where lsn could be cleared before taking the lock

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     

14 Mar, 2006

3 commits

  • In theory, NLM specs assure us that the server will only reply LCK_GRANTED or
    LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request.

    In practice, we should not assume this to be the case, and the code will
    currently Oops if the server replies with anything else.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
    mapping them to kernel errors. Problem spotted by Neil Horman.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • Based on an original patch by Mike O'Connor and Greg Banks of SGI.

    Mike states:

    A normal user can panic an NFS client and cause a local DoS with
    'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the
    user buffer starts with a valid mapped page and contains an unmapped page,
    will crash in this way. I haven't followed the code, but O_DIRECT reads with
    similar user buffers will probably also crash albeit in different ways.

    Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
    correctly handles get_user_pages() returning an error, which happens if the
    first page covered by the user buffer's address range is unmapped. However,
    if the first page is mapped but some subsequent page isn't, get_user_pages()
    will return a positive number which is less than the number of pages requested
    (this behaviour is sort of analogous to a short write() call and appears to be
    intentional). nfs_get_user_pages() doesn't detect this and hands off the
    array of pages (whose last few elements are random rubbish from the newly
    allocated array memory) to its caller, whence they go to
    nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
    and calculates its own idea of how many pages are in the array from the user
    buffer length. Needless to say, when it comes to transmit those uninitialised
    page* pointers, we see a crash in the network stack.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
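
The short-return case described above can be handled by treating "fewer pages than requested" as a failure and releasing whatever was pinned. The sketch below models that in user space; `pin_fn`, `get_pages_sketch()` and the demo pinners are hypothetical, and returning `-EFAULT` is an assumption rather than a statement about the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a page-pinning call: fills pages[0..n-1] and
 * returns how many were pinned (possibly fewer than requested) or -Exx.
 */
typedef int (*pin_fn)(void **pages, int nr_wanted);

/* Entries handed to release_pages_sketch(); kept for illustration only. */
int released_count;

void release_pages_sketch(void **pages, int nr)
{
    (void)pages;
    released_count += nr;
}

/*
 * Pin `nr_wanted` pages. A short result is treated as a failure: the pages
 * that were pinned are released, the array is freed, and an error is
 * returned, so the caller never sees uninitialised array slots.
 */
int get_pages_sketch(pin_fn pin, int nr_wanted, void ***out)
{
    void **pages;
    int got;

    assert(pin != NULL && out != NULL && nr_wanted > 0);
    *out = NULL;
    pages = calloc((size_t)nr_wanted, sizeof(*pages));
    if (!pages)
        return -ENOMEM;

    got = pin(pages, nr_wanted);
    if (got < nr_wanted) {
        if (got > 0)
            release_pages_sketch(pages, got);
        free(pages);
        return got < 0 ? got : -EFAULT;
    }
    *out = pages;
    return got;
}

/* Demo pinners: one succeeds fully, one stops one page short. */
int pin_all_demo(void **pages, int nr_wanted)
{
    static char backing;
    int i;

    for (i = 0; i < nr_wanted; i++)
        pages[i] = &backing;
    return nr_wanted;
}

int pin_short_demo(void **pages, int nr_wanted)
{
    return pin_all_demo(pages, nr_wanted - 1);
}
```

The second half of the problem in the message is that the consumer recomputed the page count from the buffer length; a caller of a helper like this one should use the returned count instead.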
     

12 Mar, 2006

2 commits

  • One can do "chattr +j" on a file to change its journalling mode. Fix
    writeback mode with "nobh" handling for it.

    Even though we mount the ext3 filesystem in writeback mode with the "nobh"
    option, someone can do "chattr +j" on a single file to force it into
    journalled mode. In order to do journalling, ext3_block_truncate_page() needs
    to fall back to the default case of creating buffers and adding them to the
    transaction, etc.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
    ext3_symlink(). Such allocation may re-enter ext3 code from
    try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal
    handle in task_struct and, hence, is not reentrant.

    This bug led to "Assertion failure in journal_dirty_metadata()" messages.

    http://bugzilla.openvz.org/show_bug.cgi?id=115

    Signed-off-by: Andrey Savochkin
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

10 Mar, 2006

1 commit

  • Fix some bugs in mtd/jffs2 on 64bit platform.

    The MEMGETBADBLOCK/MEMSETBADBLOCK ioctls are not listed in compat_ioctl.h.

    And some variables in jffs2 are declared as uint32_t but used to hold
    size_t values.

    Signed-off-by: Atsushi Nemoto
    Cc: Thomas Gleixner
    Acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Nemoto
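
The second problem mentioned above, 32-bit variables holding `size_t` values, matters on 64-bit platforms where `size_t` is wider than `uint32_t`. A minimal sketch of the safe pattern is to receive the value in a real `size_t` and narrow it only after a range check. `read_sketch()` and `read_len32_sketch()` are hypothetical helpers, not jffs2 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read routine that reports the byte count through a size_t. */
int read_sketch(size_t want, size_t *retlen)
{
    assert(retlen != NULL);
    *retlen = want;
    return 0;
}

/*
 * Receive the length in a size_t, the type the callee writes, and only
 * then narrow it to the 32-bit field, rejecting values that do not fit.
 */
int read_len32_sketch(size_t want, uint32_t *len32)
{
    size_t retlen = 0;
    int rc;

    assert(len32 != NULL);
    rc = read_sketch(want, &retlen);
    if (rc)
        return rc;
    if (retlen > UINT32_MAX)
        return -1;
    *len32 = (uint32_t)retlen;
    return 0;
}
```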
     

09 Mar, 2006

7 commits

  • A recent change to compat. dev_ifconf() in fs/compat_ioctl.c
    causes ifconf data to be truncated 1 entry too early when copying it
    to userspace. The correct amount of data (length) is returned,
    but the final entry is empty (zero, not filled in).
    The for-loop 'i' check should use
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • Miscellaneous fixes related to accessing uninitialized variables or memory
    that was already freed.

    Signed-off-by: Latchesar Ionkov
    Cc: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Latchesar Ionkov
     
    DASD allows a device to be opened as soon as gendisk is registered, which
    means the device is a fake device (capacity=0) and we know nothing about
    blocksize and partitions at that point in time. If the device is opened by
    someone, the bdev and inode creation is done with the fake device info and the
    following partition detection code is just using the wrong data.

    To avoid this, modify the DASD state machine to make sure that the open is
    rejected until the device analysis is either finished or an unformatted device
    was detected.

    Signed-off-by: Horst Hummel
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Horst Hummel
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • I have benchmarked this on an x86_64 NUMA system and see no significant
    performance difference on kernbench. Tested on both x86_64 and powerpc.

    The way we do file struct accounting is not very suitable for batched
    freeing. For scalability reasons, file accounting was
    constructor/destructor based. This meant that nr_files was decremented
    only when the object was removed from the slab cache. This is susceptible
    to slab fragmentation. With RCU based file structure, consequent batched
    freeing and a test program like Serge's, we just speed this up and end up
    with a very fragmented slab -

    llm22:~ # cat /proc/sys/fs/file-nr
    587730 0 758844

    At the same time, I see only 2000+ objects in the filp cache. The following
    patch fixes this problem.

    This patch changes the file counting by removing the filp_count_lock.
    Instead we use a separate percpu counter, nr_files, for now and all
    accesses to it are through get_nr_files() api. In the sysctl handler for
    nr_files, we populate files_stat.nr_files before returning to user.

    Counting files as and when they are created and destroyed (as opposed to
    inside slab) allows us to correctly count open files with RCU.

    Signed-off-by: Dipankar Sarma
    Cc: "Paul E. McKenney"
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     
  • Fix a bug in udf where it would write uid/gid = 0 to the disk for files
    owned by the id given with the uid=/gid= mount options. It also adds 4 new
    mount options: uid/gid=forget and uid/gid=ignore. Without any options the
    id in core and on disk always match. Giving uid/gid=nnn specifies a
    default ID to be used in core when the on disk ID is -1. uid/gid=ignore
    forces the in core ID to always be used no matter what the on disk ID is.
    uid/gid=forget forces the on disk ID to always be written out as -1.

    The use of these options allows you to override ownerships on a disk or
    disable ownership information from being written, allowing the media to be
    used portably between different computers and possibly different users
    without permissions issues that would require root to correct.

    Signed-off-by: Phillip Susi
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phillip Susi
     
  • They aren't used (nor even really usable) outside of pipe.c anyway

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Mar, 2006

1 commit


07 Mar, 2006

4 commits

  • The point of the smaps "shared" is to count the number of pages that are
    mapped by more than one process, according to Mauricio Lin. However, smaps
    uses page_count for this, so it will return a false positive for every page
    that is mapped by just that one process, which is also in pagecache or
    swapcache. There are false positive situations for anonymous pages not in
    swapcache as well:

      - page reclaim, migration
      - get_user_pages (e.g. direct-io, ptrace)

    Use page_mapcount instead, to count the number of mappings to the page.

    Use vm_normal_page so that weird things like /dev/mem aren't counted either.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • smaps doesn't have a hugepage pagetable walker. Skip walking hugepage
    vmas.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • ramfs neglects to update the directory mtime and ctime fields when creating
    a new symbolic link. Ramfs was modified in 2.6.15 to update these fields
    when other types of entries are created. The symlink support is separate
    from that other support, so that change did not cover quite all of the
    possibilities.

    All of the directory content manipulation entry points now seem to be
    covered with respect to these time field updates.

    Signed-off-by: Peter Staubach
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Staubach
     
  • Fix handling of cramfs images created by util-linux containing empty
    regular files. Images created by cramfstools 1.x were ok.

    Fill out inode contents in cramfs_iget5_set() instead of get_cramfs_inode()
    to prevent issues if cramfs_iget5_test() is called with I_LOCK|I_NEW still
    set.

    Signed-off-by: Dave Johnson
    Cc: Olaf Hering
    Cc: Chris Mason
    Cc: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Johnson
     

05 Mar, 2006

1 commit

  • session when multiply mounted.

    Fixes slow response when cifs client is mounted to shares on multiple
    servers and oplock break occurs (usually due to attempt to multiply open a
    file). When treeids on multiple mounted shares match and we find the wrong
    match first, we searched for the wrong cached files to send the oplock break
    response for, which usually meant that no matching file was found and thus
    the server would have to time out the notification. Oplock break timeout is
    about 20 seconds on some servers so this could cause significantly slower
    performance on file open calls in a few cases (in particular when multiple
    shares are mounted from multiple servers, tree ids match, and we have a
    cached file which is later opened multiple times). This was the most
    important of the bugs that was found and fixed at Connectathon
    (interoperability testing event) this week.

    Acked-by: Shaggy (shaggy@austin.ibm.com)
    Signed-off-by: Steve French (sfrench@us.ibm.com)

    Steve French
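
The mismatch described above arises because a tree id is only unique within one server. A lookup that also compares the server avoids picking the wrong share. The sketch below uses a hypothetical flattened `struct share_sketch`; the message does not spell out the fix, so the exact key is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of one mounted share. */
struct share_sketch {
    int      server_id;   /* which server/session the share belongs to */
    uint16_t tid;         /* tree id, unique only within one server */
};

/*
 * Find the share an oplock break refers to. Comparing `tid` alone can pick
 * a share on the wrong server, so both fields must match.
 */
const struct share_sketch *find_share(const struct share_sketch *shares,
                                      size_t n, int server_id, uint16_t tid)
{
    size_t i;

    assert(shares != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (shares[i].server_id == server_id && shares[i].tid == tid)
            return &shares[i];
    return NULL;
}
```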
     

03 Mar, 2006

5 commits

  • The bitmaps associated with generation numbers for directory entries
    are declared as an array of ints. On some platforms, this causes alignment
    exceptions.

    The following patch uses the standard bitmap declaration macros to
    declare the bitmaps, fixing the problem.

    Originally from Takashi Iwai.

    Signed-off-by: Takashi Iwai
    Acked-by: Jeff Mahoney
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
    This patch fixes bugs in reiserfs where unsigned integers were checked
    for being less than 0.

    Signed-off-by: Vladimir V. Saveliev
    Cc: Neil Brown
    Signed-off-by: Hans Reiser
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir V. Saveliev
     
  • v9fs has been plagued by an over-complicated approach trying to map Linux
    dentry semantics to Plan 9 fid semantics. Our previous approach called for
    aggressive flushing of the dcache resulting in several problems (including
    weird cwd behavior when running /bin/pwd).

    This patch dramatically simplifies our handling of this fid management. Fids
    will not be clunked as promptly, but the new approach is more functionally
    correct. We now clunk un-open fids only when their dentry ref_count reaches 0
    (and d_delete is called).

    Another simplification is we no longer seek to match fids to the process-id or
    uid of the action initiator. The uid-matching will need to be revisited when
    we fix the security model.

    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Van Hensbergen
     
  • Lucho's atomic create+open fix had a bug in the super block initialization
    causing all mounts to fail. He was freeing an fcall too early. This patch
    fixes that oversight.

    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Van Hensbergen
     
  • In order to assure atomic create+open v9fs stores the open fid produced by
    v9fs_vfs_create in the dentry, from where v9fs_file_open retrieves it and
    associates it with the open file.

    This patch modifies v9fs to use nameidata.intent.open values to do the atomic
    create+open.

    Signed-off-by: Latchesar Ionkov
    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Latchesar Ionkov
     

02 Mar, 2006

8 commits