29 May, 2009

4 commits


08 May, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (32 commits)
    [CIFS] Fix double list addition in cifs posix open code
    [CIFS] Allow raw ntlmssp code to be enabled with sec=ntlmssp
    [CIFS] Fix SMB uid in NTLMSSP authenticate request
    [CIFS] NTLMSSP reenabled after move from connect.c to sess.c
    [CIFS] Remove sparse warning
    [CIFS] remove checkpatch warning
    [CIFS] Fix final user of old string conversion code
    [CIFS] remove cifs_strfromUCS_le
    [CIFS] NTLMSSP support moving into new file, old dead code removed
    [CIFS] Fix endian conversion of vcnum field
    [CIFS] Remove trailing whitespace
    [CIFS] Remove sparse endian warnings
    [CIFS] Add remaining ntlmssp flags and standardize field names
    [CIFS] Fix build warning
    cifs: fix length handling in cifs_get_name_from_search_buf
    [CIFS] Remove unneeded QuerySymlink call and fix mapping for unmapped status
    [CIFS] rename cifs_strndup to cifs_strndup_from_ucs
    Added loop check when mounting DFS tree.
    Enable dfs submounts to handle remote referrals.
    [CIFS] Remove older session setup implementation
    ...

    Linus Torvalds
     
  • Avoid adding the open file entry twice to the lists in the file.
    Do not fill in the file info twice for posix opens and creates.

    Signed-off-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     

07 May, 2009

2 commits

  • Fix a problem where the generic block based fiemap stuff would not
    properly set FIEMAP_EXTENT_LAST on the last extent. I've reworked things
    to keep track if we go past the EOF, and mark the last extent properly.
    The problem was reported by and tested by Eric Sandeen.

    Tested-by: Eric Sandeen
    Signed-off-by: Josef Bacik
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
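
The idea of tracking EOF and flagging the final extent can be sketched in userspace C. This is only an illustration, not the kernel's generic block fiemap code: `struct extent_sketch`, `mark_last_extent()` and `SKETCH_EXTENT_LAST` are hypothetical names, and the flag is a stand-in for FIEMAP_EXTENT_LAST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXTENT_LAST 0x1u  /* hypothetical stand-in for FIEMAP_EXTENT_LAST */

struct extent_sketch {
    uint64_t start;   /* byte offset of the extent within the file */
    uint64_t length;  /* length of the extent in bytes */
    uint32_t flags;
};

/*
 * Walk the extents in file order, note when the walk reaches or passes
 * EOF, and flag the last extent that starts before EOF.
 * Returns the index of the flagged extent, or -1 if there is none.
 */
int mark_last_extent(struct extent_sketch *ext, size_t n, uint64_t file_size)
{
    int last = -1;
    int past_eof = 0;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n && !past_eof; i++) {
        if (ext[i].start >= file_size)
            break;                      /* extent lies wholly beyond EOF */
        last = (int)i;
        if (ext[i].start + ext[i].length >= file_size)
            past_eof = 1;               /* this extent covers EOF */
    }
    if (last >= 0)
        ext[last].flags |= SKETCH_EXTENT_LAST;
    return last;
}
```

With extents at offsets 0, 4096 and 16384 and a file size of 8192, the second extent is the one that gets flagged in this sketch.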
     
  • There is what we believe to be a false positive reported by lockdep.

    inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
    kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
    dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock

    The plan is to fix this via lockdep annotation, but that is proving to be
    quite involved.

    The patch flips the allocation over to GFP_NFS to shut the warning up, for
    the 2.6.30 release.

    Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm
    to switch it back to GFP_KERNEL so we don't forget.

    =================================
    [ INFO: inconsistent lock state ]
    2.6.30-rc2-next-20090417 #203
    ---------------------------------
    inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x35/0xb0
    {RECLAIM_FS-ON-W} state was registered at:
    [] mark_held_locks+0x68/0x90
    [] lockdep_trace_alloc+0xf5/0x100
    [] __kmalloc_node+0x31/0x1e0
    [] kernel_event+0xe2/0x190
    [] inotify_dev_queue_event+0x126/0x230
    [] inotify_inode_queue_event+0xc6/0x110
    [] vfs_create+0xcd/0x140
    [] do_filp_open+0x88d/0xa20
    [] do_sys_open+0x98/0x140
    [] sys_open+0x20/0x30
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff
    irq event stamp: 690455
    hardirqs last enabled at (690455): [] _spin_unlock_irqrestore+0x44/0x80
    hardirqs last disabled at (690454): [] _spin_lock_irqsave+0x32/0xa0
    softirqs last enabled at (690178): [] __do_softirq+0x202/0x220
    softirqs last disabled at (690157): [] call_softirq+0x1c/0x50

    other info that might help us debug this:
    2 locks held by kswapd0/380:
    #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x37/0x180
    #1: (&type->s_umount_key#17){++++..}, at: [] shrink_dcache_memory+0x11f/0x1e0

    stack backtrace:
    Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
    Call Trace:
    [] print_usage_bug+0x19f/0x200
    [] ? save_stack_trace+0x2f/0x50
    [] mark_lock+0x4bb/0x6d0
    [] ? check_usage_forwards+0x0/0xc0
    [] __lock_acquire+0xc62/0x1ae0
    [] ? slob_free+0x10c/0x370
    [] lock_acquire+0xe1/0x120
    [] ? inotify_inode_is_dead+0x35/0xb0
    [] mutex_lock_nested+0x63/0x420
    [] ? inotify_inode_is_dead+0x35/0xb0
    [] ? inotify_inode_is_dead+0x35/0xb0
    [] ? sched_clock+0x9/0x10
    [] ? lock_release_holdtime+0x35/0x1c0
    [] inotify_inode_is_dead+0x35/0xb0
    [] dentry_iput+0xbc/0xe0
    [] d_kill+0x33/0x60
    [] __shrink_dcache_sb+0x2d3/0x350
    [] shrink_dcache_memory+0x15a/0x1e0
    [] shrink_slab+0x125/0x180
    [] kswapd+0x560/0x7a0
    [] ? isolate_pages_global+0x0/0x2c0
    [] ? autoremove_wake_function+0x0/0x40
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? kswapd+0x0/0x7a0
    [] kthread+0x5b/0xa0
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? kthread+0x0/0xa0
    [] ? child_rip+0x0/0x20

    [eparis@redhat.com: fix audit too]
    Cc: Al Viro
    Cc: Matt Mackall
    Cc: Christoph Lameter
    Signed-off-by: Wu Fengguang
    Signed-off-by: Eric Paris
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

06 May, 2009

2 commits


05 May, 2009

1 commit

  • By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
    only allow processes that can ptrace() a given process to see information
    that might be used to bypass address space layout randomization (ASLR).
    These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
    as the non-symbolic output from /proc/pid/wchan.

    ASLR can be bypassed by sampling eip as shown by the proof-of-concept
    code at http://code.google.com/p/fuzzyaslr/ As part of a presentation
    (http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
    also noted as possibly usable information leaks. The
    start_stack address also leaks potentially useful information.

    Cc: Stable Team
    Signed-off-by: Jake Edge
    Acked-by: Arjan van de Ven
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Jake Edge
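
A minimal sketch of the gating described above, assuming the result of the ptrace-style access check is already available as a boolean. `struct task_sketch`, `visible()` and `format_stat_sketch()` are hypothetical names for illustration and do not reproduce the kernel's /proc code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of the per-task values mentioned in the commit. */
struct task_sketch {
    unsigned long eip;
    unsigned long esp;
    unsigned long start_stack;
};

/* Report an address only if the reader passed the access check. */
unsigned long visible(int permitted, unsigned long value)
{
    return permitted ? value : 0;
}

/* Format the sensitive fields, zeroing them for unprivileged readers. */
int format_stat_sketch(char *buf, size_t len,
                       const struct task_sketch *t, int permitted)
{
    assert(buf != NULL && t != NULL);
    return snprintf(buf, len, "%lu %lu %lu",
                    visible(permitted, t->eip),
                    visible(permitted, t->esp),
                    visible(permitted, t->start_stack));
}
```

A reader that fails the check sees zeros in place of the addresses, so sampling the fields reveals nothing about the address space layout.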
     

04 May, 2009

1 commit

  • The NTLMSSP code was removed from fs/cifs/connect.c and merged
    (75% smaller, cleaner) into fs/cifs/sess.c

    As with the old code it requires that cifs be built with
    CONFIG_CIFS_EXPERIMENTAL, the /proc/fs/cifs/Experimental flag
    must be set to 2, and mount must turn on extended security
    (e.g. with sec=krb5).

    Although NTLMSSP encapsulated in SPNEGO is not enabled yet,
    "raw" ntlmssp is common and useful in some cases since it
    offers more complete security negotiation, and is the
    default way of negotiating security for many Windows systems.
    SPNEGO encapsulated NTLMSSP will be able to reuse the same
    code.

    Signed-off-by: Steve French

    Steve French
     

03 May, 2009

10 commits

  • Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite
    returns with the page lock held, and sets the VM_FAULT_LOCKED flag.

    See http://bugzilla.kernel.org/show_bug.cgi?id=12913

    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix getbmap vs mmap deadlock
    xfs: a couple getbmap cleanups
    xfs: add more checks to superblock validation
    xfs_file_last_byte() needs to acquire ilock

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/configfs:
    configfs: Fix Trivial Warning in fs/configfs/symlink.c

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2: Change repository in MAINTAINERS.
    ocfs2: Fix a missing credit when deleting from indexed directories.
    ocfs2/trivial: Remove unused variable in ocfs2_rename.
    ocfs2: Add missing iput() during error handling in ocfs2_dentry_attach_lock()
    ocfs2: Fix some printk() warnings.
    ocfs2: Fix 2 warning during ocfs2 make.
    ocfs2: Reserve 1 more cluster in expanding_inline_dir for indexed dir.

    Linus Torvalds
     
  • ->real_parent is the parent. ->parent may be the tracer.

    Signed-off-by: Oleg Nesterov
    Acked-by: David Howells
    Acked-by: Roland McGrath
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    Scaling by NR_CPUS is also not a good calculation. In general, the
    likelihood of lock contention is proportional to the number of online
    CPUs, not to the theoretical maximum number of CPUs (NR_CPUS).

    The current kernel has generic percpu-counter infrastructure, and using
    it is the right approach. It simplifies the code, and
    percpu_counter_read_positive() does not have the underflow problem.

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
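
The underflow can be reproduced with plain integer arithmetic. The sketch below assumes a hypothetical `struct counter_sketch` holding a signed 64-bit approximate count. `counter_read_positive()` only imitates the clamping behaviour that the commit attributes to percpu_counter_read_positive() and is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical approximate counter that may be transiently negative. */
struct counter_sketch {
    int64_t count;
};

/* Unchecked read: a negative count wraps to a huge unsigned value. */
uint64_t counter_read_unchecked(const struct counter_sketch *c)
{
    assert(c != NULL);
    return (uint64_t)c->count;
}

/* Clamped read: never reports less than zero. */
uint64_t counter_read_positive(const struct counter_sketch *c)
{
    assert(c != NULL);
    return c->count > 0 ? (uint64_t)c->count : 0;
}
```

For example, a count of -228224 read without the clamp gives 18446744073709323392 (2^64 - 228224), the same number as the first line of the listing above, while the clamped read gives 0.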
     
  • This fixes the problem introduced by commit 3bfacef412 (get rid of
    special-casing the /sbin/loader on alpha): an OSF/1 ECOFF binary segfaults
    when binfmt_aout is built as a module. That happens because the aout binary
    handler ends up at the top of the binfmt list due to late registration, and
    the kernel attempts to execute the binary without the preparatory work that
    must be done by binfmt_loader.

    Fixed by changing the registration order of the default binfmt handlers
    using list_add_tail() and introducing an insert_binfmt() function, which
    places a new handler at the top of the binfmt list. This might be generally
    useful for installing arch-specific frontends for default handlers or just
    for overriding them.

    Signed-off-by: Ivan Kokshaysky
    Cc: Al Viro
    Cc: Richard Henderson
    Signed-off-by: Linus Torvalds

    Ivan Kokshaysky
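
The ordering change can be pictured with a small singly linked list. This is a sketch under stated assumptions, not the kernel's list_head code: `struct handler_sketch`, `register_tail()` and `insert_head()` are hypothetical names that mimic tail registration for default handlers and head insertion for a frontend.

```c
#include <assert.h>
#include <stddef.h>

struct handler_sketch {
    const char *name;
    struct handler_sketch *next;
};

struct handler_list {
    struct handler_sketch *head;  /* handlers are tried from the head */
};

/* Default registration: append, so late registrants are tried last. */
void register_tail(struct handler_list *l, struct handler_sketch *h)
{
    assert(l != NULL && h != NULL);
    h->next = NULL;
    struct handler_sketch **pp = &l->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = h;
}

/* Explicit override: place the handler ahead of all the others. */
void insert_head(struct handler_list *l, struct handler_sketch *h)
{
    assert(l != NULL && h != NULL);
    h->next = l->head;
    l->head = h;
}
```

In this sketch a handler registered late with `register_tail()` can no longer jump ahead of a frontend that was placed with `insert_head()`.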
     
  • The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915
    ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
    /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
    multiple of 8 bytes, but it now allows a read of 0 bytes, which actually
    writes some data to the user's buffer. According to POSIX, if count is zero,
    read() should return zero and have no other results.

    Signed-off-by: Vitaly Mayatskikh
    Cc: Thomas Tuttle
    Acked-by: Matt Mackall
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Mayatskikh
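
The intended behaviour can be sketched as a read handler that returns early for a zero count. `pagemap_read_sketch()` is a hypothetical userspace function, and the `memset()` is only a placeholder for copying real entries. The 8-byte granularity is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_BYTES 8  /* reads must be a multiple of 8 bytes */

/*
 * Returns the number of bytes "read", 0 for a zero-length read,
 * or -EINVAL for a length that is not a multiple of ENTRY_BYTES.
 */
long pagemap_read_sketch(uint8_t *buf, size_t count)
{
    assert(buf != NULL);
    if (count % ENTRY_BYTES != 0)
        return -EINVAL;           /* reject unaligned lengths */
    if (count == 0)
        return 0;                 /* zero-length read: touch nothing */
    memset(buf, 0xAB, count);     /* placeholder for copying entries */
    return (long)count;
}
```

The early return is what keeps a zero-length read from writing anything into the caller's buffer.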
     
  • Change page_mkwrite to allow implementations to return with the page
    locked, and also change its callers (in page fault paths) to hold the
    lock until the page is marked dirty. This allows the filesystem to have
    full control of page dirtying events coming from the VM.

    Rather than simply hold the page locked over the page_mkwrite call, we
    call page_mkwrite with the page unlocked and allow implementations to return
    with it locked, so filesystems can avoid lock order reversal (LOR)
    conditions with the page lock.

    The problem with the current scheme is this: a filesystem that wants to
    associate some metadata with a page as long as the page is dirty, will
    perform this manipulation in its ->page_mkwrite. It currently then must
    return with the page unlocked and may not hold any other locks (according
    to existing page_mkwrite convention).

    In this window, the VM could write out the page, clearing page-dirty. The
    filesystem has no good way to detect that a dirty pte is about to be
    attached, so it will happily write out the page, at which point, the
    filesystem may manipulate the metadata to reflect that the page is no
    longer dirty.

    It is not always possible to perform the required metadata manipulation in
    ->set_page_dirty, because that function cannot block or fail. The
    filesystem may need to allocate some data structure, for example.

    And the VM cannot mark the pte dirty before page_mkwrite, because
    page_mkwrite is allowed to fail, so we must not allow any window where the
    page could be written to if page_mkwrite does fail.

    This solution of holding the page locked over the 3 critical operations
    (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
    closes out races nicely, preventing page cleaning for writeout being
    initiated in that window. This provides the filesystem with a strong
    synchronisation against the VM here.

    - Sage needs this race closed for ceph filesystem.
    - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
    - I need it for fsblock.
    - I suspect other filesystems may need it too (eg. btrfs).
    - I have converted buffer.c to the new locking. Even simple block allocation
    under dirty pages might be susceptible to i_size changing under partial page
    at the end of file (we also have a buffer.c-side problem here, but it cannot
    be fixed properly without this patch).
    - Other filesystems (eg. NFS, maybe btrfs) will need to change their
    page_mkwrite functions themselves.

    [ This also moves page_mkwrite another step closer to fault, which should
    eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
    filesystem calldown and page lock/unlock cycle in __do_fault. ]

    [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
    Cc: Sage Weil
    Cc: Trond Myklebust
    Signed-off-by: Nick Piggin
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix an obvious incorrect return status in autofs4_mount_busy().

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     

02 May, 2009

7 commits


01 May, 2009

10 commits

  • Removes two sparse CHECK_ENDIAN warnings from Jeff's earlier patch,
    and removes the dead readlink code (after noting where in
    findfirst we will need to add something like that in the future
    to handle the newly discovered unexpected error on FindFirst of NTFS symlinks).

    Signed-off-by: Steve French

    Steve French
     
  • Signed-off-by: Steve French

    Steve French
     
  • Signed-off-by: Steve French

    Steve French
     
  • The earlier patch to move this code to use the new unicode helpers
    assumed that the filename strings would be null terminated. That's not
    always the case.

    Instead of passing "max_len" to the string converter, pass "min(len,
    max_len)", which makes it do the right thing while still keeping the
    parser confined to the response. Also fix up the prototypes of this
    function and the callers so that max_len is unsigned (like len is).

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
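
The bounding rule described above, converting at most min(len, max_len) bytes, can be sketched with a plain byte copy. `copy_name_sketch()` and `min_size()` are hypothetical helpers, and the sketch deliberately skips the Unicode conversion that the real code performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Copy a name that is not guaranteed to be NUL-terminated.
 *   len     - length claimed by the directory entry
 *   max_len - bytes remaining in the response buffer
 * Returns the number of bytes copied, excluding the terminator.
 */
size_t copy_name_sketch(char *dst, size_t dst_size,
                        const char *src, size_t len, size_t max_len)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = min_size(len, max_len);  /* never read past the response */
    if (n > dst_size - 1)
        n = dst_size - 1;               /* leave room for the terminator */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Using the smaller of the two lengths stops the copy at the end of the name when the name is short, and at the end of the response when the claimed length is too large.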
     
  • Signed-off-by: Steve French

    Steve French
     
  • The ocfs2 directory index updates two blocks when we remove an entry -
    the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only
    accounting for the dx leaf. This shows up when ocfs2_delete_inode()
    runs out of credits in jbd2_journal_dirty_metadata() at
    "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".

    The test that caught this was running dirop_file_racer from the
    ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B
    blocksize, it forces the orphan dir index to grow large enough to
    trigger.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • In most cases, cifs_strndup is converting from Unicode (UCS-2 / UTF-16) to
    the configured local code page for the Linux mount (usually UTF-8), so
    Jeff suggested that, to make it clearer that cifs_strndup is doing
    a conversion and not just a memory allocation and copy, we rename the
    function to include "from_ucs" (i.e. Unicode).

    Signed-off-by: Steve French

    Steve French
     
  • Add a loop check when mounting a DFS tree. The mount will fail with
    ELOOP if referral walks exceed the MAX_NESTED_LINK count.

    Signed-off-by: Igor Mammedov
    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Igor Mammedov
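
A bounded referral walk of this kind might look like the sketch below. The referral table, `resolve_referrals_sketch()` and the limit of 8 are assumptions for illustration. `SKETCH_MAX_NESTED` only stands in for the MAX_NESTED_LINK count named in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_NESTED 8  /* assumed limit, stand-in for MAX_NESTED_LINK */

/*
 * referral[i] is the index of the share that share i refers to,
 * or -1 if share i is a final target.
 * Returns the index of the final target, or -ELOOP if the walk
 * follows more than SKETCH_MAX_NESTED referrals.
 */
int resolve_referrals_sketch(const int *referral, size_t n, int start)
{
    assert(referral != NULL);
    int cur = start;
    int hops = 0;

    while (cur >= 0 && (size_t)cur < n && referral[cur] >= 0) {
        if (++hops > SKETCH_MAX_NESTED)
            return -ELOOP;    /* too many nested referrals */
        cur = referral[cur];
    }
    return cur;
}
```

A chain that ends resolves to its final target, while two shares that refer to each other hit the limit and fail with ELOOP instead of looping forever.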
     
  • Now that cifs_mount supports remote DFS roots, we can
    pass it a UNC that is remote.

    Signed-off-by: Igor Mammedov
    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Igor Mammedov
     
  • Two years ago, when the session setup code in cifs was rewritten and moved
    to fs/cifs/sess.c, we were asked to keep the old code for a release or so
    (which could be reenabled at runtime) since it was such a large change and
    because the asn (SPNEGO) and NTLMSSP code was not rewritten and needed to
    be. This was useful to avoid regressions, but its removal is long overdue.
    Now that the Kerberos (asn/spnego) code is working in fs/cifs/sess.c,
    and the NTLMSSP code has moved (NTLMSSP blob setup will be rewritten with
    the next patch in this series), quite a bit of dead code from
    fs/cifs/connect.c can now be removed.

    This old code should have been removed last year, but the earlier krb5
    patches did not move/remove the NTLMSSP code which we had asked to
    be done first. Since no one else volunteered, I am doing it now.

    It is extremely important that we continue to examine the documentation
    for this area, to make sure our code stays up to date with
    changes since Windows 2003.

    Signed-off-by: Steve French

    Steve French
     

30 Apr, 2009

1 commit