28 Feb, 2009

2 commits

  • Commit 8e961870bb9804110d5c8211d5d9d500451c4518 removed the FREEZE/THAW
    handling in xfs_compat_ioctl but never added any compat handler back, so
    now any freeze/thaw request from a 32-bit binary on a 64-bit kernel
    will fail.

    As these ioctls are 32/64-bit compatible, two simple COMPATIBLE_IOCTL
    entries in fs/compat_ioctl.c will do the job.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
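
The idea of a "compatible" ioctl entry can be sketched in user space as a whitelist lookup: commands whose argument layout is the same for 32-bit and 64-bit callers are forwarded unchanged, and anything else is rejected. This is only an illustration, not the code in fs/compat_ioctl.c. All `demo_*` names and the `DEMO_*` command numbers are hypothetical; the real command values are FIFREEZE and FITHAW from `<linux/fs.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command numbers, for illustration only. */
enum demo_cmd {
    DEMO_FIFREEZE = 119,
    DEMO_FITHAW   = 120,
    DEMO_OTHER    = 200   /* would need argument translation, so not listed */
};

/* Commands that can be passed to the native handler unchanged. */
static const unsigned int demo_compatible[] = {
    DEMO_FIFREEZE,
    DEMO_FITHAW,
};

/* Returns true if cmd is on the whitelist. */
static bool demo_is_compatible(unsigned int cmd)
{
    size_t n = sizeof(demo_compatible) / sizeof(demo_compatible[0]);

    for (size_t i = 0; i < n; i++) {
        if (demo_compatible[i] == cmd)
            return true;
    }
    return false;
}

/* Stand-in for the native handler; the sketch only forwards
 * whitelisted commands to it. */
static int demo_native_ioctl(unsigned int cmd)
{
    assert(demo_is_compatible(cmd));
    return 0;
}

/* Stand-in for the compat entry point: forward whitelisted commands,
 * fail everything else with -ENOTTY. */
static int demo_compat_ioctl(unsigned int cmd)
{
    if (demo_is_compatible(cmd))
        return demo_native_ioctl(cmd);
    return -ENOTTY;
}
```

With the two entries missing from the table, `demo_compat_ioctl(DEMO_FIFREEZE)` would return `-ENOTTY`, which corresponds to the failure described in the commit message.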
     
  • Commit 4ea3ada2955e4519befa98ff55dd62d6dfbd1705 declares d_obtain_alias()
    as EXPORT_SYMBOL_GPL where it's supposed to replace d_alloc_anon which was
    previously declared as EXPORT_SYMBOL and thus available to any loadable
    module.

    This patch reverts that.

    Signed-off-by: Benny Halevy
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Acked-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benny Halevy
     

27 Feb, 2009

11 commits


26 Feb, 2009

1 commit


25 Feb, 2009

3 commits


24 Feb, 2009

1 commit

  • de_get is called before every proc_get_inode, but the corresponding de_put is
    called only when dropping the last reference to an inode. This might cause
    something like
    remove_proc_entry: /proc/stats busy, count=14496
    to be printed to the syslog.

    The fix is to call de_put in case of an already initialized inode in
    proc_get_inode.

    Signed-off-by: Krzysztof Sachanowicz
    Tested-by: Marcin Pilipczuk
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Krzysztof Sachanowicz
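
The reference imbalance and its fix can be modelled in a few lines of user-space C. This is a simplified sketch, not the procfs code: the `demo_*` types and functions are hypothetical stand-ins, and the inode cache is reduced to a single cached inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a proc directory entry with a use count. */
struct demo_entry {
    int count;
};

/* Hypothetical stand-in for a cached inode that pins one entry. */
struct demo_inode {
    struct demo_entry *de;
    bool initialized;
    int refs;
};

static void demo_de_get(struct demo_entry *de)
{
    de->count++;
}

static void demo_de_put(struct demo_entry *de)
{
    assert(de->count > 0);
    de->count--;
}

/* The caller has already taken a reference with demo_de_get(de). */
static struct demo_inode *demo_get_inode(struct demo_inode *cached,
                                         struct demo_entry *de)
{
    if (!cached->initialized) {
        /* New inode: it keeps the caller's reference. */
        cached->de = de;
        cached->initialized = true;
    } else {
        /* Already initialized: the inode pins de already, so drop
         * the extra reference instead of leaking it. */
        demo_de_put(de);
    }
    cached->refs++;
    return cached;
}

/* Drops one inode reference; the entry reference is released only
 * together with the last inode reference. */
static void demo_put_inode(struct demo_inode *inode)
{
    assert(inode->refs > 0);
    if (--inode->refs == 0) {
        demo_de_put(inode->de);
        inode->initialized = false;
    }
}
```

Without the `else` branch, every repeated lookup of the same inode would leave the entry count one higher, which is the kind of growth the "busy, count=..." message reports.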
     

23 Feb, 2009

1 commit

  • Functions ext4_write_begin() and ext4_da_write_begin() call
    grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it
    can happen that page reclaim is triggered in that function
    and it recurses back into the filesystem (or some other filesystem).
    But this can lead to various problems as a transaction is already
    started at that point. Add the necessary flag.

    http://bugzilla.kernel.org/show_bug.cgi?id=11688

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

22 Feb, 2009

2 commits

  • This is a workaround for find_group_flex() which badly needs to be
    replaced. One of its problems (besides ignoring the Orlov algorithm)
    is that it is a bit hyperactive about returning failure under
    suspicious circumstances. This can lead to spurious ENOSPC failures
    even when there are inodes still available.

    Work around this for now by retrying the search using
    find_group_other() if find_group_flex() returns -1. If
    find_group_other() succeeds when find_group_flex() has failed, log a
    warning message.

    A better block/inode allocator that will fix this problem for real has
    been queued up for the next merge window.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
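
The retry described above follows a common pattern: try the strict search first, fall back to a more permissive one, and warn when the fallback succeeds. The sketch below shows that pattern in user-space C. It is not the ext4 allocator: the `demo_*` functions are hypothetical, and the threshold used by the strict search is an arbitrary assumption chosen so that it can fail while inodes are still free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed, arbitrary threshold that makes the strict search give up early. */
#define DEMO_STRICT_MIN_FREE 10

/* Strict search: returns a group index, or -1 if no group looks good enough. */
static int demo_find_group_strict(const int *free_inodes, int ngroups)
{
    for (int g = 0; g < ngroups; g++) {
        if (free_inodes[g] >= DEMO_STRICT_MIN_FREE)
            return g;
    }
    return -1;
}

/* Permissive search: accepts any group with at least one free inode. */
static int demo_find_group_any(const int *free_inodes, int ngroups)
{
    for (int g = 0; g < ngroups; g++) {
        if (free_inodes[g] > 0)
            return g;
    }
    return -1;
}

/* Tries the strict search, then retries with the permissive one.
 * *used_fallback is set to 1 when only the fallback found a group. */
static int demo_pick_group(const int *free_inodes, int ngroups,
                           int *used_fallback)
{
    int g;

    assert(used_fallback != NULL);
    *used_fallback = 0;

    g = demo_find_group_strict(free_inodes, ngroups);
    if (g == -1) {
        g = demo_find_group_any(free_inodes, ngroups);
        if (g >= 0) {
            *used_fallback = 1;
            fprintf(stderr, "warning: strict search failed, "
                            "fallback found group %d\n", g);
        }
    }
    return g;
}
```

Only when both searches fail does the caller see -1, so a failure of the strict search alone no longer turns into a spurious out-of-space error.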
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    [CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts
    [CIFS] improve posix semantics of file create
    [CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
    cifs: posix fill in inode needed by posix open
    cifs: properly handle case where CIFSGetSrvInodeNumber fails
    cifs: refactor new_inode() calls and inode initialization
    [CIFS] Prevent OOPs when mounting with remote prefixpath.
    [CIFS] ipv6_addr_equal for address comparison

    Linus Torvalds
     

21 Feb, 2009

10 commits

  • At scan time we observed the following scenario:

    node A inserted
    node B inserted
    node C inserted -> sets overlapped flag on node B

    node A is removed due to CRC failure -> overlapped flag on node B remains

    while (tn->overlapped)
        tn = tn_prev(tn);

    ==> crash, when tn_prev(B) is referenced.

    When the ultimate node is removed at scan time and the overlapped flag
    is set on the penultimate node, then nothing updates the overlapped
    flag of that node. The overlapped iterators blindly expect that the
    ultimate node does not have the overlapped flag set, which causes the
    scan code to crash.

    It would be a huge overhead to go through the node chain on node
    removal and fix up the overlapped flags, so detecting such a case on
    the fly in the overlapped iterators is a simpler and reliable
    solution.

    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner
    Signed-off-by: David Woodhouse

    Thomas Gleixner
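
Detecting the stale flag "on the fly" can be sketched as follows. This is a user-space illustration under simplifying assumptions, not the jffs2 code: the nodes form a plain linked list through a hypothetical `prev` pointer, and `demo_first_relevant()` is a made-up name for the backward walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a temporary node in the scan chain. */
struct demo_node {
    struct demo_node *prev;   /* NULL for the first node in the chain */
    bool overlapped;
};

/* Walks back while the overlapped flag is set. If there is no previous
 * node, the flag is stale (the node that justified it was removed), so
 * clear it and stop instead of dereferencing NULL. */
static struct demo_node *demo_first_relevant(struct demo_node *tn)
{
    assert(tn != NULL);

    while (tn->overlapped) {
        struct demo_node *prev = tn->prev;

        if (prev == NULL) {
            tn->overlapped = false;
            break;
        }
        tn = prev;
    }
    return tn;
}
```

The check costs one pointer comparison per step, whereas fixing up the flags at removal time would require walking the chain on every node removal.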
     
  • When two different users mount the same Windows 2003 Server share using CIFS,
    the first session mounted can be invalidated. Some servers invalidate the first
    smb session when a second similar user (e.g. two users who get mapped by server to "guest")
    authenticates an smb session from the same client.

    Setting the 2nd and subsequent vc numbers to nonzero values
    avoids this problem.

    Fixes Samba bug 6004, problem description follows:
    How to reproduce:

    - configure an "open share" (full permissions to Guest user) on Windows 2003
    Server (I couldn't reproduce the problem with Samba server or Windows older
    than 2003)
    - mount the share twice with different users who will be authenticated as guest.

    noacl,noperm,user=john,dir_mode=0700,domain=DOMAIN,rw
    noacl,noperm,user=jeff,dir_mode=0700,domain=DOMAIN,rw

    Result:

    - just the mount point mounted last is accessible:

    Signed-off-by: Steve French

    Steve French
     
  • Samba server added support for a new posix open/create/mkdir operation
    a year or so ago, and we added support to cifs for mkdir to use it,
    but had not added the corresponding code to file create.

    The following patch helps improve the performance of the cifs create
    path (to Samba and servers which support the cifs posix protocol
    extensions). Using Connectathon basic test1, with 2000 files, the
    performance improved about 15%, and also helped reduce network traffic
    (17% fewer SMBs sent over the wire) due to saving a network round trip
    for the SetPathInfo on every file create.

    It should also help the semantics (and probably the performance) of
    write (e.g. when posix byte range locks are on the file) on file
    handles opened with posix create, and adds support for a few flags
    which would have to be ignored otherwise.

    Signed-off-by: Steve French

    Steve French
     
  • Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451

    Certain NAS appliances do not set the operating system or network operating system
    fields in the session setup response on the wire. cifs was oopsing on the unexpected
    zero length response fields (when trying to null terminate a zero length field).

    This fixes the oops.

    Acked-by: Jeff Layton
    CC: stable
    Signed-off-by: Steve French

    Steve French
     
  • function needed to prepare for posix open

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • ...if it does then we pass a pointer to an uninitialized variable for
    the inode number to cifs_new_inode. Have it pass a NULL pointer instead.

    Also tweak the function prototypes to reduce the amount of casting.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Move new inode creation into a separate routine and refactor the
    callers to take advantage of it.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Fixes OOPs with message 'kernel BUG at fs/cifs/cifs_dfs_ref.c:274!'.
    Checks if the prefixpath is accessible while we are still in cifs_mount
    and fails with an error if we can't access the prefixpath

    Should fix Samba bugs 6086 and 5861 and kernel bug 12192

    Signed-off-by: Igor Mammedov
    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Igor Mammedov
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: check file pointer in btrfs_sync_file

    Linus Torvalds
     
  • This is a step in the direction of better -ENOSPC handling. Instead of
    checking the global bytes counter we check the space_info bytes counters to
    make sure we have enough space.

    If we don't, we go ahead and try to allocate a new chunk, and then if that fails
    we return -ENOSPC. This patch adds two counters to btrfs_space_info,
    bytes_delalloc and bytes_may_use.

    bytes_delalloc accounts for extents we've actually set up for delalloc and will
    be allocated at some point down the line.

    bytes_may_use is to keep track of how many bytes we may use for delalloc at
    some point. When we actually set the extent_bit for the delalloc bytes we
    subtract the reserved bytes from the bytes_may_use counter. This ensures that
    space can actually be allocated for every delalloc byte later on.

    Signed-off-by: Josef Bacik

    Josef Bacik
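
The accounting described above can be modelled with two counters and a reserve step. The sketch below is a user-space illustration, not the btrfs implementation: `struct demo_space_info`, the `demo_*` functions, the `unallocated` pool and the chunk size are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed chunk size for the example. */
#define DEMO_CHUNK 64u

/* Hypothetical, reduced stand-in for a space_info structure. */
struct demo_space_info {
    uint64_t total_bytes;     /* space in already allocated chunks */
    uint64_t bytes_used;      /* space holding allocated extents */
    uint64_t bytes_may_use;   /* reserved, delalloc not set up yet */
    uint64_t bytes_delalloc;  /* delalloc set up, allocation pending */
    uint64_t unallocated;     /* raw space still available for chunks */
};

/* Tries to add one chunk; returns 0 or -ENOSPC. */
static int demo_alloc_chunk(struct demo_space_info *si)
{
    if (si->unallocated < DEMO_CHUNK)
        return -ENOSPC;
    si->unallocated -= DEMO_CHUNK;
    si->total_bytes += DEMO_CHUNK;
    return 0;
}

/* Reserves bytes against the counters, allocating chunks when the
 * existing ones are too full; returns -ENOSPC if that fails too. */
static int demo_reserve(struct demo_space_info *si, uint64_t bytes)
{
    for (;;) {
        uint64_t committed = si->bytes_used + si->bytes_may_use +
                             si->bytes_delalloc;

        if (committed <= si->total_bytes &&
            bytes <= si->total_bytes - committed)
            break;
        if (demo_alloc_chunk(si) != 0)
            return -ENOSPC;
    }
    si->bytes_may_use += bytes;
    return 0;
}

/* Called when the delalloc range is actually set up: the bytes move
 * from "may use" to "delalloc". */
static void demo_set_delalloc(struct demo_space_info *si, uint64_t bytes)
{
    assert(si->bytes_may_use >= bytes);
    si->bytes_may_use -= bytes;
    si->bytes_delalloc += bytes;
}
```

Because a reservation is counted from the moment it is made, a later writer cannot be promised space that an earlier, not yet allocated delalloc range still needs.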
     

20 Feb, 2009

5 commits


19 Feb, 2009

4 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
    block: fix booting from partitioned md array
    block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb
    cciss: PCI power management reset for kexec
    paride/pg.c: xs(): &&/|| confusion
    fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
    block: fix bad definition of BIO_RW_SYNC
    bsg: Fix sense buffer bug in SG_IO

    Linus Torvalds
     
  • Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
    assert:

    [ 1093.677775]
    [ 1093.677781] =================================
    [ 1093.680031] [ INFO: inconsistent lock state ]
    [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
    [ 1093.680031] ---------------------------------
    [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x20/0x80
    [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
    [ 1093.680031] [] mark_held_locks+0x43/0x5b
    [ 1093.680031] [] lockdep_trace_alloc+0x6c/0x6e
    [ 1093.680031] [] kmem_cache_alloc+0x20/0x150
    [ 1093.680031] [] idr_pre_get+0x27/0x6c
    [ 1093.680031] [] inotify_handle_get_wd+0x25/0xad
    [ 1093.680031] [] inotify_add_watch+0x7a/0x129
    [ 1093.680031] [] sys_inotify_add_watch+0x20f/0x250
    [ 1093.680031] [] sysenter_do_call+0x12/0x35
    [ 1093.680031] [] 0xffffffff
    [ 1093.680031] irq event stamp: 60417
    [ 1093.680031] hardirqs last enabled at (60417): [] call_rcu+0x53/0x59
    [ 1093.680031] hardirqs last disabled at (60416): [] call_rcu+0x17/0x59
    [ 1093.680031] softirqs last enabled at (59656): [] __do_softirq+0x157/0x16b
    [ 1093.680031] softirqs last disabled at (59651): [] do_softirq+0x74/0x15d
    [ 1093.680031]
    [ 1093.680031] other info that might help us debug this:
    [ 1093.680031] 2 locks held by kswapd0/308:
    [ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x36/0x189
    [ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [] shrink_dcache_memory+0x110/0x1fb
    [ 1093.680031]
    [ 1093.680031] stack backtrace:
    [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
    [ 1093.680031] Call Trace:
    [ 1093.680031] [] valid_state+0x12a/0x13d
    [ 1093.680031] [] mark_lock+0xc1/0x1e9
    [ 1093.680031] [] ? check_usage_forwards+0x0/0x3f
    [ 1093.680031] [] __lock_acquire+0x2c6/0xac8
    [ 1093.680031] [] ? register_lock_class+0x17/0x228
    [ 1093.680031] [] lock_acquire+0x5d/0x7a
    [ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
    [ 1093.680031] [] __mutex_lock_common+0x3a/0x4cb
    [ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
    [ 1093.680031] [] mutex_lock_nested+0x2e/0x36
    [ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
    [ 1093.680031] [] inotify_inode_is_dead+0x20/0x80
    [ 1093.680031] [] dentry_iput+0x90/0xc2
    [ 1093.680031] [] d_kill+0x21/0x45
    [ 1093.680031] [] __shrink_dcache_sb+0x27f/0x355
    [ 1093.680031] [] shrink_dcache_memory+0x15e/0x1fb
    [ 1093.680031] [] shrink_slab+0x121/0x189
    [ 1093.680031] [] kswapd+0x39f/0x561
    [ 1093.680031] [] ? isolate_pages_global+0x0/0x233
    [ 1093.680031] [] ? autoremove_wake_function+0x0/0x43
    [ 1093.680031] [] ? kswapd+0x0/0x561
    [ 1093.680031] [] kthread+0x41/0x82
    [ 1093.680031] [] ? kthread+0x0/0x82
    [ 1093.680031] [] kernel_thread_helper+0x7/0x10

    inotify_handle_get_wd() does idr_pre_get() which does a
    kmem_cache_alloc() without GFP_NOFS - and is hence deadlockable under
    extreme MM pressure.

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Cc: MinChan Kim
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Otherwise, these don't work when called from 32-bit userspace on 64-bit
    kernels.

    Cc: Jiri Kosina
    Cc: Alan Cox
    Cc: [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bill Nottingham
     
  • Li Zefan said:

    Thread 1:
    for ((; ;))
    {
    mount -t cpuset xxx /mnt > /dev/null 2>&1
    cat /mnt/cpus > /dev/null 2>&1
    umount /mnt > /dev/null 2>&1
    }

    Thread 2:
    for ((; ;))
    {
    mount -t cpuset xxx /mnt > /dev/null 2>&1
    umount /mnt > /dev/null 2>&1
    }

    (Note: It is irrelevant which cgroup subsys is used.)

    After a while a lockdep warning showed up:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.28 #479
    ---------------------------------------------
    mount/13554 is trying to acquire lock:
    (&type->s_umount_key#19){--..}, at: [] sget+0x5e/0x321

    but task is already holding lock:
    (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

    other info that might help us debug this:
    1 lock held by mount/13554:
    #0: (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

    stack backtrace:
    Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
    Call Trace:
    [] validate_chain+0x4c6/0xbbd
    [] __lock_acquire+0x676/0x700
    [] lock_acquire+0x5d/0x7a
    [] ? sget+0x5e/0x321
    [] down_write+0x34/0x50
    [] ? sget+0x5e/0x321
    [] sget+0x5e/0x321
    [] ? cgroup_set_super+0x0/0x3e
    [] ? cgroup_test_super+0x0/0x2f
    [] cgroup_get_sb+0x98/0x2e7
    [] cpuset_get_sb+0x4a/0x5f
    [] vfs_kern_mount+0x40/0x7b
    [] do_kern_mount+0x37/0xbf
    [] do_mount+0x5c3/0x61a
    [] ? copy_mount_options+0x2c/0x111
    [] sys_mount+0x69/0xa0
    [] sysenter_do_call+0x12/0x31

    The cause is that after alloc_super() and the subsequent retry, an old entry
    in the fs_supers list is found, so grab_super(old) is called, but both
    functions take the s_umount lock:

    struct super_block *sget(...)
    {
        ...
    retry:
        spin_lock(&sb_lock);
        if (test) {
            list_for_each_entry(old, &type->fs_supers, s_instances) {
                if (!test(old, data))
                    continue;
                if (!grab_super(old))   /* down_write(&old->s_umount); */
                    goto retry;
                if (s)
                    destroy_super(s);
                return old;
            }
        }
        if (!s) {
            spin_unlock(&sb_lock);
            s = alloc_super(type);      /* down_write(&s->s_umount) */
            if (!s)
                return ERR_PTR(-ENOMEM);
            goto retry;
        }
        ...
    }

    It seems like a false positive, and it seems that the VFS, not cgroup,
    needs to be fixed.

    Peter said:

    We can simply put the new s_umount instance in a subclass [...] but lockdep
    doesn't particularly care about subclass order.

    If there's any issue with the callers of sget() assuming the s_umount lock
    being of subclass 0, then there is another annotation we can use to fix
    that, but let's not bother with that if this is sufficient.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673

    Signed-off-by: Peter Zijlstra
    Tested-by: Li Zefan
    Reported-by: Li Zefan
    Cc: Al Viro
    Cc: Paul Menage
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra