28 Aug, 2009

3 commits

  • * 'for-linus' of git://git.infradead.org/users/eparis/notify:
    inotify: Ensure we always write the terminating NULL.
    inotify: fix locking around inotify watching in the idr
    inotify: do not BUG on idr entries at inotify destruction
    inotify: separate new watch creation updating existing watches

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: update documentation pointers
    9p: remove unnecessary v9fses->options which duplicates the mount string
    net/9p: insulate the client against an invalid error code sent by a 9p server
    9p: Add missing cast for the error return value in v9fs_get_inode
    9p: Remove redundant inode uid/gid assignment
    9p: Fix possible regressions when ->get_sb fails.
    9p: Fix v9fs show_options
    9p: Fix possible memleak in v9fs_inode_from fid.
    9p: minor comment fixes
    9p: Fix possible inode leak in v9fs_get_inode.
    9p: Check for error in return value of v9fs_fid_add

    Linus Torvalds
     
  • kAFS crashes when asked to read a symbolic link because page_getlink()
    passes a NULL file pointer to read_mapping_page(), but afs_readpage()
    expects a file pointer from which to extract a key.

    Modify afs_readpage() to request the appropriate key from the calling
    process's keyrings if a file struct is not supplied with one attached.

    Signed-off-by: David Howells
    Acked-by: Anton Blanchard
    Signed-off-by: Linus Torvalds

    David Howells
     

27 Aug, 2009

4 commits

  • Before the rewrite, copy_event_to_user always wrote a terminating '\0'
    byte to user space after the filename. Since the rewrite, that
    terminating byte has been skipped if the filename length is exactly a
    multiple of event_size. Ouch!

    So add one byte to name_size before we round up and use clear_user to
    set userspace to zero like /dev/zero does instead of copying the
    strange nul_inotify_event. I can't quite convince myself len_to_zero
    will never exceed 16 and even if it doesn't clear_user should be more
    efficient and a more accurate reflection of what the code is trying to
    do.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Eric Paris

    Eric W. Biederman
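
The rounding issue described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the helper names are hypothetical, and the value 16 used for event_size in the usage note is taken from the message's remark about len_to_zero, not from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of unit (unit must be non-zero). */
static size_t round_up_to(size_t len, size_t unit)
{
    assert(unit != 0);
    return (len + unit - 1) / unit * unit;
}

/* Hypothetical model of the broken behaviour: no byte is reserved for
 * the terminating NUL, so an exact multiple gets no padding at all. */
static size_t name_size_without_nul(size_t name_len, size_t event_size)
{
    return round_up_to(name_len, event_size);
}

/* Hypothetical model of the fix: add one byte before rounding up, so
 * there is always at least one zero byte after the filename. */
static size_t name_size_with_nul(size_t name_len, size_t event_size)
{
    return round_up_to(name_len + 1, event_size);
}
```

With an event_size of 16, a 16-byte filename yields 16 bytes in the first variant (no room for the NUL) and 32 bytes in the second, while a 5-byte filename yields 16 bytes either way.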
     
  • There are races around the idr storage of inotify watches. It's possible
    that a watch could be found from sys_inotify_rm_watch() in the idr, but it
    could be removed from the idr before that code does its removal. Move the
    locking and the refcnt'ing so that these have to happen atomically.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • If an inotify watch is left in the idr when an fsnotify group is destroyed
    this will lead to a BUG. This is not a dangerous situation and really
    indicates a programming bug and leak of memory. This patch changes it to
    use a WARN and a printk rather than killing people's boxes.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Nothing is known to be wrong with the inotify watch addition/modification,
    but this patch separates the two code paths to make each of them easy to
    verify as correct.

    Signed-off-by: Eric Paris

    Eric Paris
     

26 Aug, 2009

1 commit


25 Aug, 2009

3 commits

  • Commit 76db6d9500caeaa774a3e32a997eba30bbdc176b (nfs41: add session setup
    to the state manager) introduces an infinite loop possibility in the NFSv4
    state manager. By first checking nfs4_has_session() before clearing the
    NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
    that flag, but it never gets cleared, and so the state manager loops.

    In fact commit c3fad1b1aaf850bf692642642ace7cd0d64af0a3 (nfs41: add session
    reset to state manager) causes this to happen every time we get a network
    partition error.

    Signed-off-by: Trond Myklebust
    Tested-by: Daniel J Blueman
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2/dlm: Wait on lockres instead of erroring cancel requests
    ocfs2: Add missing lock name
    ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
    ocfs2: release the buffer head in ocfs2_do_truncate.
    ocfs2: Handle quota file corruption more gracefully

    Linus Torvalds
     
  • 2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
    user_shm_lock() calls in hugetlb_file_setup() but left the
    user_shm_unlock call in shm_destroy().

    In detail:
    Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
    is not called in hugetlb_file_setup(). However, user_shm_unlock() is
    called unconditionally in shm_destroy(); it then executes
    atomic_dec_and_lock(&up->__count) in free_uid(), and if up->__count
    reaches zero, cleanup_user_struct() is scheduled as well.

    Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
    However, the ref counter up->__count gets unexpectedly non-positive and
    the corresponding structs are freed even though there are live
    references to them, resulting in a kernel oops after lots of
    shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles with CONFIG_USER_SCHED set.

    Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
    time of shm_destroy() may give a different answer from at the time
    of hugetlb_file_setup(). And fixed newseg()'s no_id error path,
    which has missed user_shm_unlock() ever since it came in 2.6.9.

    Reported-by: Stefan Huber
    Signed-off-by: Hugh Dickins
    Tested-by: Stefan Huber
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
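
The unbalanced lock/unlock pairing described above can be modelled in a few lines of userspace C. This is a sketch, not the kernel code: the struct and function names are hypothetical, the reference count is a plain int, and recording the user at setup time is one way (assumed here) to avoid re-asking the privilege question at destroy time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-user structure and its refcount. */
struct sketch_user {
    int refcount;
};

/* Hypothetical stand-in for a shared memory segment. */
struct sketch_segment {
    struct sketch_user *mlock_user; /* non-NULL only if setup took the lock */
};

/* Setup: the lock path (which takes a reference) runs only for
 * unprivileged callers, and the segment remembers whether it ran. */
static void sketch_setup(struct sketch_segment *seg,
                         struct sketch_user *user, bool privileged)
{
    seg->mlock_user = NULL;
    if (!privileged) {
        user->refcount++;
        seg->mlock_user = user;
    }
}

/* Destroy: drop the reference only if setup actually took one. An
 * unconditional decrement here is the imbalance the message describes. */
static void sketch_destroy(struct sketch_segment *seg)
{
    if (seg->mlock_user != NULL) {
        seg->mlock_user->refcount--;
        seg->mlock_user = NULL;
    }
}
```

Because the decision is stored in the segment, a change in privilege between setup and destroy cannot unbalance the count in this model.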
     

24 Aug, 2009

3 commits

  • This patch makes the error message about changing journaling mode on remount
    more descriptive. Some people are going to hit this error now due to commit
    bbae8bcc49bc4d002221dab52c79a50a82e7cd1f if they configure a kernel to default
    to data=writeback mode. The problem happens if they have data=ordered set for
    the root filesystem in /etc/fstab but not in the kernel command line (and they
    don't use initrd). Their filesystem then gets mounted as data=writeback by
    the kernel, but then their boot fails because init scripts won't be able to
    remount the filesystem rw. A better error message will hopefully make it
    easier for them to find the error in their setup and bother us less with
    error reports :).

    Signed-off-by: Jan Kara

    Jan Kara
     
  • The old description for this configuration option was perhaps not
    completely balanced in terms of describing the tradeoffs of using a
    default of data=writeback vs. data=ordered. Despite the fact that the old
    description very strongly recommended disabling this feature, all of
    the major distributions have elected to preserve the existing 'legacy'
    default, which is a strong hint that it perhaps wasn't telling the
    whole story.

    This revised description has been vetted by a number of ext3
    developers as being better at informing the user about the tradeoffs
    of enabling or disabling this configuration feature.

    Cc: linux-ext4@vger.kernel.org
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara

    Theodore Ts'o
     
  • vfs_read() offset is defined as loff_t, but kernel_read()
    offset is only defined as unsigned long. Redefine
    kernel_read() offset as loff_t.

    Cc: stable@kernel.org
    Signed-off-by: Mimi Zohar
    Signed-off-by: James Morris

    Mimi Zohar
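
The effect of passing a 64-bit offset through a narrower parameter can be shown with fixed-width types. This is a sketch under an explicit assumption: uint32_t stands in for unsigned long on a 32-bit system and int64_t stands in for loff_t; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for unsigned long on a 32-bit system. */
typedef uint32_t sketch_ulong32;

/* Assumed stand-in for loff_t, a 64-bit signed offset. */
typedef int64_t sketch_loff;

/* Offset passed through the narrow type: the upper 32 bits are lost. */
static sketch_loff offset_via_narrow(sketch_loff offset)
{
    assert(offset >= 0);
    sketch_ulong32 narrow = (sketch_ulong32)offset;
    return (sketch_loff)narrow;
}

/* Offset passed through the wide type: the value is preserved. */
static sketch_loff offset_via_wide(sketch_loff offset)
{
    return offset;
}
```

In this model an offset just past 20 GiB, such as (5 << 32) | 7, comes back as 7 through the narrow path, so a read would silently target the wrong position in a large file.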
     

22 Aug, 2009

2 commits

  • In commit a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5 ("Fix race in
    create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
    for a NULL page mapping unintentionally when some of the code inside
    __set_page_dirty() was moved to the callers.

    That removal generally didn't matter, since a filesystem would serialize
    truncation (which clears the page mapping) against writing (which marks
    the buffer dirty), so locking at a higher level (either per-page or an
    inode at a time) should mean that the buffer page would be stable. And
    indeed, nothing bad seemed to happen.

    Except it turns out that apparently reiserfs does something odd when
    under load and writing out the journal, and we have a number of bugzilla
    entries that look similar:

    http://bugzilla.kernel.org/show_bug.cgi?id=13556
    http://bugzilla.kernel.org/show_bug.cgi?id=13756
    http://bugzilla.kernel.org/show_bug.cgi?id=13876

    and it looks like reiserfs depended on that check (the common theme
    seems to be "data=journal", and a journal writeback during a truncate).

    I suspect reiserfs should have some additional locking, but in the
    meantime this should get us back to the pre-2.6.29 behavior.

    Pattern-pointed-out-by: Roland Kletzing
    Cc: stable@kernel.org (2.6.29 and 2.6.30)
    Cc: Jeff Mahoney
    Cc: Nick Piggin
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'btrfs' of git://git.kernel.dk/linux-2.6-block:
    btrfs: fix inode rbtree corruption

    Linus Torvalds
     

21 Aug, 2009

3 commits

  • A node may not be inserted over an existing node. This causes inode tree
    corruption, and I was seeing crashes in inode_tree_del which I cannot
    reproduce after this patch.

    The other way to fix this would be to tie inode lifetime in the rbtree
    with inode while not in freeing state. I had a look at this but it is
    not so trivial at this point. At least this patch gets things working again.

    Signed-off-by: Nick Piggin
    Cc: Chris Mason
    Acked-by: Yan Zheng
    Signed-off-by: Jens Axboe

    From: Nick Piggin
     
  • In case a downconvert is queued, and a flock receives a signal,
    BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered
    because a lock cancel triggers a dlmunlock while an AST is
    scheduled.

    To avoid this, allow a LKM_CANCEL to pass through, and let it
    wait on __dlm_wait_on_lockres().

    Signed-off-by: Goldwyn Rodrigues
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Goldwyn Rodrigues
     
  • The NFSSync cluster lock is missing a name. This makes lockdep unhappy
    because we end up passing NULL to lockdep when initializing the lock key.
    Fix it.

    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     

20 Aug, 2009

1 commit


19 Aug, 2009

3 commits

  • Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
    the mm_struct. It was a very good first step toward sanitizing OOM handling.

    However, Paul Menage reported that the commit causes a regression in his
    job scheduler: the current OOM logic can kill a process marked OOM_DISABLE.

    Why? His program has code similar to the following.

    ...
    set_oom_adj(OOM_DISABLE); /* The job scheduler must never be killed by the OOM killer */
    ...
    if (vfork() == 0) {
        set_oom_adj(0); /* The invoked child may be killed */
        execve("foo-bar-cmd");
    }
    ....

    The vfork() parent and child share the same mm_struct, so the above
    set_oom_adj(0) changes oom_adj not only for the vfork() child but also
    for the vfork() parent. The vfork() parent (the job scheduler) then
    loses its OOM immunity and gets killed.

    The fork-setting-exec idiom is used very frequently in userland programs.
    We must not break this assumption.

    Therefore, this patch reverts commit 2ff05b2b and the related commits.

    Reverted commit list
    ---------------------
    - commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
    - commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
    - commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
    - commit 933b787b57 (mm: copy over oom_adj value at fork time)

    Signed-off-by: KOSAKI Motohiro
    Cc: Paul Menage
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Nick Piggin
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
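
Why a per-mm oom_adj leaks from the vfork() child into the parent can be modelled without any real process creation. This is a simplified sketch: the struct and function names are hypothetical, and sketch_vfork() only copies a pointer to imitate the shared address space.

```c
#include <assert.h>

/* The value -17 matches OOM_DISABLE in kernels of this era. */
#define SKETCH_OOM_DISABLE (-17)

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct sketch_mm {
    int oom_adj;            /* placement introduced by the reverted commit */
};

struct sketch_task {
    struct sketch_mm *mm;   /* shared between vfork() parent and child */
    int oom_adj;            /* placement restored by the revert */
};

/* Model of vfork(): the child borrows the parent's address space. */
static struct sketch_task sketch_vfork(const struct sketch_task *parent)
{
    struct sketch_task child = { parent->mm, parent->oom_adj };
    return child;
}

/* Per-mm setting: every task sharing the mm sees the change. */
static void set_adj_per_mm(struct sketch_task *task, int value)
{
    task->mm->oom_adj = value;
}

/* Per-task setting: only the calling task is affected. */
static void set_adj_per_task(struct sketch_task *task, int value)
{
    task->oom_adj = value;
}
```

In this model, a child calling set_adj_per_task(&child, 0) leaves the parent's value at SKETCH_OOM_DISABLE, whereas set_adj_per_mm(&child, 0) also resets the value the parent reads through its own mm pointer.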
     
  • get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
    to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly
    to a positive signed value.

    Signed-off-by: Jeff Layton
    Reviewed-by: Johannes Weiner
    Acked-by: Steve French
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
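
The sign problem described above can be reproduced with fixed-width integers. This is a sketch with assumptions called out: the names are hypothetical, SKETCH_MAX_FILESIZE only imitates what MAX_LFS_FILESIZE would be on a 64-bit system, and converting an out-of-range unsigned value to a signed type is implementation-defined in C, although common compilers yield -1 for the all-ones pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit stand-in for MAX_LFS_FILESIZE: the largest positive
 * value a signed 64-bit offset can hold. */
#define SKETCH_MAX_FILESIZE 0x7fffffffffffffffULL

/* Reinterpret an unsigned size limit as a signed offset, as happens when
 * the limit is compared against a signed file position. */
static int64_t limit_as_signed(uint64_t maxbytes)
{
    return (int64_t)maxbytes;
}

/* Returns 1 if pos is within the limit once the limit is seen as signed. */
static int within_limit(int64_t pos, uint64_t maxbytes)
{
    return pos <= limit_as_signed(maxbytes);
}
```

With a limit of ~0ULL, even a small position such as 4096 fails the check in this model, because the limit is seen as negative; with SKETCH_MAX_FILESIZE the same check passes.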
     
  • This fixes kernel oopses like the following:

    # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
    # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
    # umount /test1
    # umount /test2

    BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
    in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
    1 lock held by umount.nilfs2/3886:
    #0: (&type->s_umount_key#31){+.+...}, at: [] deactivate_super+0x52/0x6c
    irq event stamp: 1219
    hardirqs last enabled at (1219): [] __mutex_unlock_slowpath+0xf8/0x119
    hardirqs last disabled at (1218): [] __mutex_unlock_slowpath+0x59/0x119
    softirqs last enabled at (1214): [] __do_softirq+0x1a5/0x1ad
    softirqs last disabled at (1205): [] do_softirq+0x36/0x5a
    Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
    Call Trace:
    [] __might_sleep+0x107/0x10e
    [] do_page_fault+0x246/0x397
    [] ? do_page_fault+0x0/0x397
    [] error_code+0x6b/0x70
    [] ? do_page_fault+0x0/0x397
    [] ? __lock_acquire+0x91/0x12fd
    [] ? __lock_acquire+0x12ee/0x12fd
    [] ? __lock_acquire+0x12ee/0x12fd
    [] lock_acquire+0xba/0xdd
    [] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
    [] down_write+0x2a/0x46
    [] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
    [] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
    [] ? mark_held_locks+0x43/0x5b
    [] ? trace_hardirqs_on_caller+0x10b/0x133
    [] ? trace_hardirqs_on+0xb/0xd
    [] nilfs_put_super+0x2f/0xca [nilfs2]
    [] generic_shutdown_super+0x49/0xb8
    [] kill_block_super+0x1d/0x31
    [] ? vfs_quota_off+0x0/0x12
    [] deactivate_super+0x57/0x6c
    [] mntput_no_expire+0x8c/0xb4
    [] sys_umount+0x27f/0x2a4
    [] sys_oldumount+0xd/0xf
    [] sysenter_do_call+0x12/0x38
    ...

    This turns out to be a bug introduced by an -rc1 patch ("nilfs2: simplify
    remaining sget() use").

    In the patch, a new "put resource" function, nilfs_put_sbinfo()
    was introduced to delay freeing nilfs_sb_info struct.

    But nilfs_put_sbinfo() mistakenly used the atomic_dec_and_test()
    function to check the reference count, which caused the nilfs_sb_info
    to be freed when a user mounted a snapshot twice.

    This bug also suggests there was an unseen memory leak in the usual
    mount/umount operations for nilfs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

18 Aug, 2009

17 commits