15 May, 2009

12 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: Fix race in ext4_inode_info.i_cached_extent
    ext4: Clear the unwritten buffer_head flag after the extent is initialized
    ext4: Use a fake block number for delayed new buffer_head
    ext4: Fix sub-block zeroing for writes into preallocated extents

    Linus Torvalds
     
  • devpts_get_sb() calls memset(0) to clear mount options and calls
    parse_mount_options() if user specified any mount options.

    The memset(0) is bogus since the 'mode' and 'ptmxmode' options are
    non-zero by default. parse_mount_options() restores options to default
    anyway and can properly deal with NULL mount options.

    So in devpts_get_sb() remove memset(0) and call parse_mount_options() even
    for NULL mount options.

    Bug reported by Eric Paris: http://lkml.org/lkml/2009/5/7/448.

    Signed-off-by: Sukadev Bhattiprolu
    Tested-by: Marc Dionne
    Reported-by: Eric Paris
    Cc: Christoph Hellwig
    Cc: Alan Cox
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: "Rafael J. Wysocki"
    Reviewed-by: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
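
The devpts fix above can be illustrated with a minimal user-space sketch. Everything here is hypothetical: the struct, the function names and the default values are stand-ins chosen for illustration, not the kernel's actual definitions. It only shows why zeroing the options and skipping the parser on NULL input loses non-zero defaults.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative defaults only; not the kernel's real values. */
#define SKETCH_DEFAULT_MODE     0600
#define SKETCH_DEFAULT_PTMXMODE 0666

/* Hypothetical stand-in for the devpts mount options. */
struct sketch_opts {
    unsigned int mode;
    unsigned int ptmxmode;
};

/*
 * Restores the defaults first, then applies an optional "mode=<octal>"
 * string. NULL data is valid and simply leaves the defaults in place.
 */
static int sketch_parse_mount_options(const char *data, struct sketch_opts *opts)
{
    opts->mode = SKETCH_DEFAULT_MODE;
    opts->ptmxmode = SKETCH_DEFAULT_PTMXMODE;
    if (data == NULL)
        return 0;
    if (strncmp(data, "mode=", 5) == 0) {
        opts->mode = (unsigned int)strtoul(data + 5, NULL, 8);
        return 0;
    }
    return -1; /* unknown option in this sketch */
}

/* Old pattern: zero everything, parse only if the user passed options. */
static void sketch_old_setup(const char *data, struct sketch_opts *opts)
{
    memset(opts, 0, sizeof(*opts));
    if (data)
        sketch_parse_mount_options(data, opts);
}

/* New pattern: always parse, even for NULL, so defaults are restored. */
static void sketch_new_setup(const char *data, struct sketch_opts *opts)
{
    sketch_parse_mount_options(data, opts);
}
```

With NULL options, sketch_old_setup() leaves both modes at zero, while sketch_new_setup() ends up with the defaults.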
     
  • If two CPUs call ext4_ext_get_blocks() at the same time, there is
    nothing protecting the i_cached_extent structure from being used and
    updated concurrently. This could cause the wrong location on disk to
    be read or written, potentially corrupting the block group
    descriptors and/or inode table.

    This bug has been in the ext4 code since almost the very beginning of
    ext4's development. Fortunately, once the data is stored in the page
    cache, ext4_get_blocks() doesn't need to be called, so trying to
    replicate this problem to the point where we could identify its root
    cause was *extremely* difficult. Many thanks to Kevin Shanahan for
    working over several months to be able to reproduce this easily so we
    could finally nail down the cause of the corruption.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: "Aneesh Kumar K.V"

    Theodore Ts'o
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: fix error handling in parse_DFS_referrals

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: Spelling fix in btrfs_lookup_first_block_group comments
    Btrfs: make show_options result match actual option names
    Btrfs: remove outdated comment in btrfs_ioctl_resize()
    Btrfs: remove some WARN_ONs in the IO failure path
    Btrfs: Don't loop forever on metadata IO failures
    Btrfs: init inode ordered_data_close flag properly

    Linus Torvalds
     
  • The BH_Unwritten flag indicates that the buffer is allocated on disk
    but has not been written; that is, the block was part of a persistent
    preallocation area. That flag should only be set when a get_blocks()
    function is looking up an inode's logical to physical block mapping.

    When ext4_get_blocks_wrap() is called with create=1, the uninitialized
    extent is converted into an initialized one, so the BH_Unwritten flag
    is no longer appropriate. Hence, we need to make sure that
    BH_Unwritten is not left set, since the combination of BH_Mapped and
    BH_Unwritten is not allowed; among other things, it will cause ext4's
    get_block() to be called over and over again during the write_begin
    phase of write(2).

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
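
The flag handling described above can be sketched in a few lines. The flag names, the struct and the helper functions below are hypothetical stand-ins, not the kernel's buffer_head API; the sketch only shows that a create-style call should leave the buffer mapped and clear the unwritten bit, so the disallowed mapped-plus-unwritten combination never appears.

```c
#include <stdbool.h>

/* Hypothetical state bits standing in for BH_Mapped and BH_Unwritten. */
enum {
    SK_MAPPED    = 1 << 0,
    SK_UNWRITTEN = 1 << 1
};

struct sketch_bh {
    unsigned int state;
};

/*
 * Lookup (create == false): report a preallocated, not yet written block.
 * Create (create == true): the extent becomes initialized, so the buffer
 * is mapped and the unwritten bit must be cleared.
 */
static void sketch_get_blocks(struct sketch_bh *bh, bool create)
{
    if (!create) {
        bh->state |= SK_UNWRITTEN;
        return;
    }
    bh->state |= SK_MAPPED;
    bh->state &= ~(unsigned int)SK_UNWRITTEN;
}

/* The combination of both bits is the state the commit rules out. */
static bool sketch_state_valid(const struct sketch_bh *bh)
{
    unsigned int both = SK_MAPPED | SK_UNWRITTEN;
    return (bh->state & both) != both;
}
```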
     
  • Signed-off-by: Sankar P
    Signed-off-by: Chris Mason

    Sankar P
     
  • The notreelog and flushoncommit mount options were being printed slightly
    differently.

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     
  • In Li Zefan's commit dae7b665cf6d6e6e733f1c9c16cf55547dd37e33,
    a combination call of kmalloc() and copy_from_user() is replaced by
    memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more.

    Signed-off-by: Li Hong
    Signed-off-by: Chris Mason

    Li Hong
     
  • These debugging WARN_ONs make too much console noise during regular
    IO failures. An IO failure will still generate a number of messages
    as we verify checksums etc, but these two are not needed.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • When a btrfs metadata read fails, the first thing we try to do is find
    a good copy on another mirror of the block. If this fails, read_tree_block()
    ends up returning a buffer that isn't up to date.

    The btrfs btree reading code was reworked to drop locks and repeat
    the search when IO was done, but the changes didn't add a check for failed
    reads. The end result was looping forever on buffers that were never
    going to become up to date.

    Signed-off-by: Chris Mason

    Chris Mason
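
A minimal sketch of the retry loop described above, assuming a hypothetical buffer type and read helper (neither is the btrfs API). The point is the check after the read: without it, a buffer that never becomes up to date keeps the search repeating forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer: io_fails models a read that fails on every mirror. */
struct sketch_buf {
    bool uptodate;
    bool io_fails;
    int reads;
};

/* Stand-in for a metadata read; on failure the buffer stays stale. */
static void sketch_read_block(struct sketch_buf *b)
{
    b->reads++;
    b->uptodate = !b->io_fails;
}

/*
 * Search loop: if the buffer is not up to date, do the IO and repeat.
 * The check after the read is what stops the loop on a failed read.
 */
static int sketch_search(struct sketch_buf *b)
{
    for (;;) {
        if (b->uptodate)
            return 0;
        sketch_read_block(b);
        if (!b->uptodate)
            return -EIO; /* failed read: give up instead of retrying */
    }
}
```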
     
  • This flag is used to decide when we need to send a given file through
    the ordered code to make sure it is fully written before a transaction
    commits. It was not being properly set to zero when the inode was
    being set up.

    Signed-off-by: Chris Mason

    Chris Mason
     

14 May, 2009

5 commits


13 May, 2009

7 commits

  • The core VM assumes the page size used by the address_space in
    inode->i_mapping is PAGE_SIZE but hugetlbfs breaks this assumption by
    inserting pages into the page cache at offsets the core VM considers
    unexpected.

    This would not be a problem except that hugetlbfs also provides a
    ->readpage implementation. As it exists, the core VM can assume the
    base page size is being used, allocate pages on behalf of the
    filesystem, insert them into the page cache and call ->readpage to
    populate them. These pages are the wrong size and at the wrong offset
    for hugetlbfs, causing confusion.

    This patch deletes the ->readpage implementation for hugetlbfs on the
    grounds the core VM should not be allocating and populating pages on
    behalf of hugetlbfs. There should be no existing users of the
    ->readpage implementation so it should not cause a regression.

    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Normally the block size (by default 128K) will be larger than the
    page size, unless a non-standard block size has been specified in
    Mksquashfs, and the page size is larger than 4K.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Squashfs is broken on any system where the page size is larger than
    the metadata size (8192). This is easily fixed by ensuring cache->pages
    is always > 0.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Doug Chapman
    Signed-off-by: Andrew Morton
    Signed-off-by: Phillip Lougher

    Doug Chapman
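
The arithmetic behind the Squashfs fix can be shown with a short sketch. The function name is hypothetical and this is not the Squashfs source; it assumes the page count is derived by integer division, which truncates to zero when the page size exceeds the 8192-byte metadata size.

```c
#include <stddef.h>

/*
 * Number of pages needed to hold one cache block. Plain integer division
 * gives 0 when page_size > block_size (for example 8192 / 65536), so the
 * result is clamped to at least one page.
 */
static size_t sketch_cache_pages(size_t block_size, size_t page_size)
{
    size_t pages = block_size / page_size;
    return pages ? pages : 1;
}
```

For an 8192-byte metadata block and 64K pages this returns 1 instead of 0; for 4K pages it returns 2.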
     
  • * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
    nfsd: silence lockdep warning
    lockd: fix list corruption on lockd restart
    nfsd4: check for negative dentry before use in nfsv4 readdir
    nfsd41: slots are freed with session
    svcrdma: clean up error paths.
    svcrdma: Fix dma map direction for rdma read targets

    Linus Torvalds
     
  • Fix a size check WRT the manual pages. This was inadvertently broken by
    commit 9fe5ad9c8cef9ad5873d8ee55d1cf00d9b607df0 ("flag parameters
    add-on: remove epoll_create size param").

    Signed-off-by: Davide Libenzi
    Cc:
    Cc: rohit verma
    Cc: Ulrich Drepper
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Use a very large unsigned number (~0xffff) as the fake block number
    for the delayed new buffer. The VFS should never try to write out this
    number, but if it does, this will make it obvious.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
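
As a worked example of the value mentioned above, the sketch below computes ~0xffff in a 64-bit unsigned type. The function name is hypothetical, and the 64-bit width is an assumption; the kernel's block number type may be narrower depending on configuration.

```c
#include <stdint.h>

/*
 * All bits set except the low 16: a block number far beyond any real
 * device, so an accidental write to it would stand out.
 */
static uint64_t sketch_invalid_block(void)
{
    return ~(uint64_t)0xffff;
}
```

Under the 64-bit assumption the result is 0xFFFFFFFFFFFF0000.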
     

12 May, 2009

3 commits

  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The return value of dup2 when oldfd == newfd and the fd isn't valid is
    not getting properly sign extended. We end up with 4294967287 instead
    of -EBADF.

    I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory
    (2.6.29-rc5), and Ubuntu 9.04 (2.6.28).

    This patch uses a signed int for the error value so it is properly
    extended.

    Commit 6c5d0512a091480c9f981162227fdb1c9d70e555 introduced this
    regression.

    Reported-by: Jiri Dluhos
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
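
The sign-extension problem described above can be reproduced in user space. This is a sketch with hypothetical function names, not the kernel code: it only shows what happens when a negative errno value is held in a 32-bit unsigned variable and then widened to a 64-bit return value.

```c
#include <errno.h>
#include <stdint.h>

/* Error stored in an unsigned 32-bit variable: widening zero-extends. */
static int64_t sketch_widen_unsigned(void)
{
    uint32_t err = (uint32_t)-EBADF;
    return (int64_t)err;
}

/* Error stored in a signed 32-bit variable: widening sign-extends. */
static int64_t sketch_widen_signed(void)
{
    int32_t err = -EBADF;
    return (int64_t)err;
}
```

With EBADF equal to 9, as on Linux, the unsigned variant yields 4294967287 (2^32 - 9), matching the value in the report, while the signed variant yields -9.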
     
  • Although some ioctls of nilfs2 exchange data in the form of indirectly
    referenced arrays, some of them lack a size check on the array elements.

    This inserts the missing checks and rejects requests if the ioctl data
    does not have a valid format.

    We usually don't have to check the size of structures associated
    with ioctl commands because the size is tested implicitly when
    identifying the ioctl command; the checks this patch adds are for the
    cases where the implicit check is not applied.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
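
A sketch of the kind of check described above, under stated assumptions: the descriptor and element types below are hypothetical and simplified, not the nilfs2 ioctl structures. The descriptor itself has a fixed size, but the element size it carries is user-supplied and therefore needs an explicit test.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for an indirectly referenced array. */
struct sketch_argv {
    const void *base;   /* user pointer to the array */
    uint32_t nmembs;    /* number of elements */
    uint16_t size;      /* size of one element, as claimed by the caller */
};

/* Hypothetical element type the handler expects. */
struct sketch_item {
    uint64_t a;
    uint64_t b;
};

/* Reject the request if the claimed element size is not the expected one. */
static int sketch_check_argv(const struct sketch_argv *argv, size_t expected)
{
    if (argv->size != expected)
        return -EINVAL;
    return 0;
}
```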
     

11 May, 2009

3 commits

  • This is a companion patch to ("nilfs2: fix possible circular locking
    for get information ioctls").

    This corrects lock order reversal between mm->mmap_sem and
    nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by
    lockdep check:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3-nilfs-00003-g360bdc1 #7
    -------------------------------------------------------
    mmap/5294 is trying to acquire lock:
    (&nilfs->ns_segctor_sem){++++.+}, at: [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] __lock_acquire+0x1066/0x13b0
    [] lock_acquire+0xba/0xdd
    [] might_fault+0x68/0x88
    [] copy_from_user+0x2a/0x111
    [] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2]
    [] nilfs_clean_segments+0x6d/0x1b9 [nilfs2]
    [] nilfs_ioctl+0x2ad/0x318 [nilfs2]
    [] vfs_ioctl+0x22/0x69
    [] do_vfs_ioctl+0x460/0x499
    [] sys_ioctl+0x40/0x5a
    [] sysenter_do_call+0x12/0x38
    [] 0xffffffff

    -> #0 (&nilfs->ns_segctor_sem){++++.+}:
    [] __lock_acquire+0xdcc/0x13b0
    [] lock_acquire+0xba/0xdd
    [] down_read+0x2a/0x3e
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] do_page_fault+0x2fb/0x30a
    [] error_code+0x72/0x78
    [] 0xffffffff

    where nilfs_clean_segments() holds:

    nilfs->ns_segctor_sem -> copy_from_user()
    --> page fault -> mm->mmap_sem

    And, page fault path may hold:

    page fault -> mm->mmap_sem
    --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem

    Even though nilfs_clean_segments() does not perform write access on
    given user pages, it may cause deadlock because nilfs->ns_segctor_sem
    is shared per device and mm->mmap_sem can be shared with other tasks.

    To avoid this problem, this patch moves all calls of copy_from_user()
    outside the nilfs->ns_segctor_sem lock in the ioctl.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
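
The reordering described above (copy user data first, take the filesystem lock afterwards) can be sketched in user space. All names are hypothetical: the fake semaphore only tracks whether it is held, and memcpy() stands in for copy_from_user(), which in the kernel may fault and take mm->mmap_sem.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lock that only records whether it is currently held. */
struct sketch_sem {
    int held;
};

static void sketch_down(struct sketch_sem *s) { s->held = 1; }
static void sketch_up(struct sketch_sem *s)   { s->held = 0; }

/*
 * Step 1: copy everything from the "user" buffer while no lock is held.
 * Step 2: take the lock and work only on the private copy.
 * copied_under_lock reports whether the copy happened with the lock held.
 */
static int sketch_clean_segments(struct sketch_sem *sem,
                                 const char *user, size_t n,
                                 char *kbuf, int *copied_under_lock)
{
    int sum = 0;
    size_t i;

    *copied_under_lock = sem->held;
    memcpy(kbuf, user, n);      /* stand-in for copy_from_user() */

    sketch_down(sem);
    for (i = 0; i < n; i++)     /* placeholder for the real work */
        sum += kbuf[i];
    sketch_up(sem);

    return sum;
}
```

Because the copy finishes before the lock is taken, a page fault during the copy can never occur while the lock is held, which removes one side of the circular dependency.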
     
  • This is one of two patches which are to correct possible circular
    locking between mm->mmap_sem and nilfs->ns_segctor_sem.

    The problem was detected by lockdep check as follows:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3-nilfs-00002-g3552613 #6
    -------------------------------------------------------
    mmap/5418 is trying to acquire lock:
    (&nilfs->ns_segctor_sem){++++.+}, at: [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] __lock_acquire+0x1066/0x13b0
    [] lock_acquire+0xba/0xdd
    [] might_fault+0x68/0x88
    [] copy_to_user+0x2c/0xfc
    [] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2]
    [] nilfs_ioctl+0x30a/0x3b0 [nilfs2]
    [] vfs_ioctl+0x22/0x69
    [] do_vfs_ioctl+0x460/0x499
    [] sys_ioctl+0x40/0x5a
    [] sysenter_do_call+0x12/0x38
    [] 0xffffffff

    -> #0 (&nilfs->ns_segctor_sem){++++.+}:
    [] __lock_acquire+0xdcc/0x13b0
    [] lock_acquire+0xba/0xdd
    [] down_read+0x2a/0x3e
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] do_page_fault+0x2fb/0x30a
    [] error_code+0x72/0x78
    [] 0xffffffff

    other info that might help us debug this:

    1 lock held by mmap/5418:
    #0: (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    stack backtrace:
    Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6
    Call Trace:
    [] ? printk+0xf/0x12
    [] print_circular_bug_tail+0xaa/0xb5
    [] __lock_acquire+0xdcc/0x13b0
    [] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2]
    [] ? up_read+0x16/0x2c
    [] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2]
    [] lock_acquire+0xba/0xdd
    [] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] down_read+0x2a/0x3e
    [] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] ? do_page_fault+0x1d8/0x30a
    [] ? down_read_trylock+0x39/0x43
    [] do_page_fault+0x2fb/0x30a
    [] ? do_page_fault+0x0/0x30a
    [] error_code+0x72/0x78
    [] ? do_page_fault+0x0/0x30a

    This makes the lock granularity of nilfs->ns_segctor_sem finer than
    that of the mmap semaphore for ioctl commands except
    nilfs_clean_segments().

    The successive patch ("nilfs2: fix lock order reversal in
    nilfs_clean_segments ioctl") is required to fully resolve the problem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (22 commits)
    Fix the race between capifs remount and node creation
    Fix races around the access to ->s_options
    switch ufs directories to ufs_sync_file()
    Switch open_exec() and sys_uselib() to do_open_filp()
    Make open_exec() and sys_uselib() use may_open(), instead of duplicating its parts
    Reduce path_lookup() abuses
    Make checkpatch.pl shut up on fs/inode.c
    NULL noise in fs/super.c:kill_bdev_super()
    romfs: cleanup romfs_fs.h
    ROMFS: romfs_dev_read() error ignored
    fs: dcache fix LRU ordering
    ocfs2: Use nd_set_link().
    Fix deadlock in ipathfs ->get_sb()
    Fix a leak in failure exit in 9p ->get_sb()
    Convert obvious places to deactivate_locked_super()
    New helper: deactivate_locked_super()
    reiserfs: remove privroot hiding in lookup
    reiserfs: dont associate security.* with xattr files
    reiserfs: fixup xattr_root caching
    Always lookup priv_root on reiserfs mount and keep it
    ...

    Linus Torvalds
     

10 May, 2009

1 commit

  • This would fix the following failure during GC:

    nilfs_cpfile_delete_checkpoints: cannot delete block
    NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

    The problem was caused by a break in state consistency between the page
    cache and the btree; the above block was removed from the btree but the
    page buffering the block remained in the page cache in a dirty
    state.

    This resolves the inconsistency by ensuring that the dirty state of
    the page buffering the deleted block is cleared.

    Reported-by: David Arendt
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

09 May, 2009

9 commits