03 May, 2011

1 commit


19 Apr, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
    Btrfs: fix free space cache leak
    Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
    Btrfs end_bio_extent_readpage should look for locked bits
    Btrfs: don't force chunk allocation in find_free_extent
    Btrfs: Check validity before setting an acl
    Btrfs: Fix incorrect inode nlink in btrfs_link()
    Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
    Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
    Btrfs: make uncache_state unconditional
    btrfs: using cached extent_state in set/unlock combinations
    Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
    Btrfs: fix subvolume mount by name problem when default mount subvolume is set
    fix user annotation in ioctl.c
    Btrfs: check for duplicate iov_base's when doing dio reads
    btrfs: properly handle overlapping areas in memmove_extent_buffer
    Btrfs: fix memory leaks in btrfs_new_inode()
    Btrfs: check for duplicate iov_base's when doing dio reads
    Btrfs: reuse the extent_map we found when calling btrfs_get_extent
    Btrfs: do not use async submit for small DIO io's
    Btrfs: don't split dio bios if we don't have to
    ...

    Linus Torvalds
     
  • Rather than pass in some random truncated offset to the pid-related
    functions, check that the offset is in range up-front.

    This is just cleanup, the previous commit fixed the real problem.

    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
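
The up-front range check described in the last entry above can be sketched in plain C. This is a minimal user-space illustration under stated assumptions, not the kernel change itself: `FIRST_PID_OFFSET`, `PID_LIMIT`, and `offset_to_pid()` are hypothetical names. The only point is that the full 64-bit offset is compared against the valid range before it is narrowed to a smaller type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for illustration; not taken from the kernel source. */
#define FIRST_PID_OFFSET 256u
#define PID_LIMIT        (4u * 1024u * 1024u)

/*
 * Convert a 64-bit directory offset to a pid.
 * The range check runs on the full 64-bit value, so a huge offset is
 * rejected instead of being truncated into a small, valid-looking one.
 */
static bool offset_to_pid(uint64_t offset, unsigned int *pid)
{
    if (offset < FIRST_PID_OFFSET)
        return false;
    if (offset >= (uint64_t)PID_LIMIT + FIRST_PID_OFFSET)
        return false;
    *pid = (unsigned int)(offset - FIRST_PID_OFFSET);
    return true;
}
```

With these assumed limits, an offset such as 0x100000000 + 298 is rejected, whereas a plain cast to a 32-bit type would have turned it into 298.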
     

18 Apr, 2011

2 commits

  • The free space caching code was recently reworked to
    cache all the pages it needed instead of using find_get_page everywhere.

    One loop was missed though, so it ended up leaking pages. This fixes
    it to use our page array instead of find_get_page.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • While checking unregister_filesystem for safety vs extra calls for
    "ext4: register ext2 and ext3 alias after ext4" I realized that
    the synchronize_rcu() was called on the error path but not on
    the success path.

    Cc: stable (2.6.38)
    Signed-off-by: Milton Miller
    [ This probably won't really make a difference since commit d863b50ab013
    ("vfs: call rcu_barrier after ->kill_sb()"), but it's the right thing
    to do. - Linus ]
    Signed-off-by: Linus Torvalds

    Milton Miller
     

16 Apr, 2011

9 commits

  • Every time we try to allocate disk space we try to see if we can pre-emptively
    allocate a chunk, but in the common case we don't allocate anything, so there is
    no sense in taking the chunk_mutex at all. So instead if we are allocating a
    chunk, mark it in the space_info so we don't get two people trying to allocate
    at the same time. Thanks,

    Signed-off-by: Josef Bacik
    Reviewed-by: Liu Bo

    Josef Bacik
     
  • A recent commit caches the extent state in end_bio_extent_readpage,
    but the search it does should look for locked extents. This
    fixes things to make it more effective.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    net/9p: nwname should be an unsigned int
    9p: Fix sparse error
    fs/9p: Fix error reported by coccicheck
    9p: revert tsyncfs related changes
    fs/9p: Use write_inode for data sync on server
    fs/9p: Fix revalidate to return correct value

    Linus Torvalds
     
  • During RCU walk in path_lookupat and path_openat, the rcu lookup
    frequently failed if looking up an absolute path, because when root
    directory was looked up, seq number was not properly set in nameidata.

    We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in
    directory entry's seq number. We reverted to the slow path walk, which needs
    to take references.

    With the following patch, I saw a 50% increase in an exim mail server
    benchmark throughput on a 4-socket Nehalem-EX system.

    Signed-off-by: Tim Chen
    Reviewed-by: Andi Kleen
    Cc: stable@kernel.org (v2.6.38)
    Signed-off-by: Linus Torvalds

    Tim Chen
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Now that we use write_inode to flush server
    cache related to fid, we don't need tsyncfs either for dotl or dotu
    protocols. For dotu this helps to do a more efficient server flush.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • revalidate should return > 0 on success. Also return 0 on ENOENT
    to force do_revalidate to return NULL dentry;

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • find_free_extent likes to allocate in contiguous clusters,
    which makes writeback faster, especially on SSD storage. As
    the FS fragments, these clusters become harder to find and we have
    to decide between allocating a new chunk to make more clusters
    or giving up on the cluster to allocate from the free space
    we have.

    Right now it creates too many chunks, and you can end up with
    a whole FS that is mostly empty metadata chunks. This commit
    changes the allocation code to be more strict and only
    allocate new chunks when we've made good use of the chunks we
    already have.

    Signed-off-by: Chris Mason

    Chris Mason
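
The idea in the first entry of this group (mark an in-progress chunk allocation in the space_info instead of taking chunk_mutex on every call) can be sketched in user-space C11. This is an assumption-laden illustration, not the btrfs code: `struct space_info_sketch`, `try_chunk_alloc()`, and the use of an atomic flag are hypothetical, and a losing caller here simply backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-space bookkeeping; not the btrfs struct. */
struct space_info_sketch {
    atomic_bool chunk_alloc;   /* true while someone is allocating a chunk */
    int chunks;                /* number of chunks allocated so far */
};

/* Returns true if this caller performed an allocation. */
static bool try_chunk_alloc(struct space_info_sketch *si, bool need_chunk)
{
    bool expected = false;

    /* Common case: nothing to allocate, so no lock or flag is touched. */
    if (!need_chunk)
        return false;

    /* Claim the allocation; back off if another caller already holds it. */
    if (!atomic_compare_exchange_strong(&si->chunk_alloc, &expected, true))
        return false;

    si->chunks++;              /* the expensive allocation would go here */

    atomic_store(&si->chunk_alloc, false);
    return true;
}
```

The common path returns before touching any shared state, which is the saving the commit message describes.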
     

15 Apr, 2011

6 commits

  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: fix compilation warnings when compiling with gcc 4.5
    UBIFS: fix oops when R/O file-system is fsync'ed

    Linus Torvalds
     
  • The case we should be verifying when updating the dentry name is that
    the _parent_ inode (the directory) semaphore is held, not the semaphore
    for the dentry itself. It's the directory locking that rename and
    readdir() etc all care about.

    The comment just above even says so - but then the BUG_ON() still
    checked the dentry inode itself.

    Very few people noticed, because this helper function really isn't used
    for very much, so you had to be using ncpfs to ever hit it.

    I think I should just remove the BUG_ON (the function really has just
    one user), but let's run with it fixed for a while before getting rid of
    it entirely.

    Reported-and-tested-by: Bongani Hlope
    Reported-and-tested-by: Bernd Feige
    Cc: Petr Vandrovec
    Cc: Arnd Bergmann
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • On no-mmu arch, there is a memleak during shmem test. The cause of this
    memleak is that ramfs_nommu_expand_for_mapping() raises the page refcount
    to 2, which prevents iput() from freeing those pages.

    The simple test file is like this:

    int main(void)
    {
    int i;
    key_t k = ftok("/etc", 42);

    for ( i=0; i [...]

    root:/> free
    total used free shared buffers
    Mem: 60320 17912 42408 0 0
    -/+ buffers: 17912 42408
    root:/> shmem
    run ok...
    root:/> free
    total used free shared buffers
    Mem: 60320 19096 41224 0 0
    -/+ buffers: 19096 41224
    root:/> shmem
    run ok...
    root:/> free
    total used free shared buffers
    Mem: 60320 20296 40024 0 0
    -/+ buffers: 20296 40024
    ...

    After this patch the test result is (no memleak anymore):

    root:/> free
    total used free shared buffers
    Mem: 60320 16668 43652 0 0
    -/+ buffers: 16668 43652
    root:/> shmem
    run ok...
    root:/> free
    total used free shared buffers
    Mem: 60320 16668 43652 0 0
    -/+ buffers: 16668 43652

    Signed-off-by: Bob Liu
    Acked-by: Hugh Dickins
    Signed-off-by: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • force_o_largefile() on ia64 is defined in [...] and requires [...].

    Signed-off-by: Jeff Mahoney
    Cc: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • 5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK")
    tried to get the whole logic of brk randomization for legacy
    (libc5-based) applications finally right.

    It turns out that the way to detect whether brk has actually been
    randomized in the end or not introduced by that patch still doesn't work
    for those binaries, as reported by Geert:

    : /sbin/init from my old m68k ramdisk exits prematurely.
    :
    : Before the patch:
    :
    : | brk(0x80005c8e) = 0x80006000
    :
    : After the patch:
    :
    : | brk(0x80005c8e) = 0x80005c8e
    :
    : Old libc5 considers brk() to have failed if the return value is not
    : identical to the requested value.

    I don't like it, but currently see no better option than a bit flag in
    task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2
    case.

    Signed-off-by: Jiri Kosina
    Tested-by: Geert Uytterhoeven
    Reported-by: Geert Uytterhoeven
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     
  • The kernel automatically evaluates partition tables of storage devices.
    The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
    a bug that causes a kernel oops on certain corrupted LDM partitions.
    A kernel subsystem seems to crash, because, after the oops, the kernel no
    longer recognizes newly connected storage devices.

    The patch validates the value of vblk_size.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Timo Warns
    Cc: Eugene Teo
    Cc: Harvey Harrison
    Cc: Richard Russon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Timo Warns
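
The kind of validation described in the last entry above can be sketched as follows. This is a generic user-space illustration of checking a size field read from disk before using it, not the actual ldm.c patch: `REC_HEAD_SIZE`, `REC_MAX_SIZE`, and `record_payload_len()` are hypothetical names and values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from fs/partitions/ldm.c. */
#define REC_HEAD_SIZE 16u     /* bytes of header in front of each record */
#define REC_MAX_SIZE  4096u   /* largest record size accepted */

/*
 * Compute the payload length of an on-disk record whose total size comes
 * from the (untrusted) disk. Returns false if the size cannot be valid, so
 * the caller never works with an underflowed or oversized length.
 */
static bool record_payload_len(uint32_t rec_size, size_t *payload_len)
{
    if (rec_size <= REC_HEAD_SIZE)   /* would underflow the subtraction below */
        return false;
    if (rec_size > REC_MAX_SIZE)     /* larger than any buffer we would use */
        return false;
    *payload_len = rec_size - REC_HEAD_SIZE;
    return true;
}
```

Rejecting the record early keeps a corrupted size from reaching later arithmetic or copies.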
     

13 Apr, 2011

10 commits

  • When compiling UBIFS with CONFIG_UBIFS_FS_DEBUG not set,
    gcc-4.5.2 generates a slew of "warning: statement with no effect"
    on references to non-void functions defined as 0.
    To avoid these warnings, replace #defines with dummy inline functions.

    Artem: massage the patch a bit, also remove the duplicate
    'dbg_check_lprops()' prototype.

    Signed-off-by: Maksim Rayskiy
    Acked-by: Mike Frysinger
    Signed-off-by: Artem Bityutskiy

    Maksim Rayskiy
     
  • This patch fixes a severe UBIFS bug: UBIFS oopses when we 'fsync()' a
    file on an R/O-mounted file-system. We (the UBIFS authors) incorrectly
    thought that VFS would not propagate 'fsync()' down to the file-system
    if it is read-only, but this is not the case.

    It is easy to exploit this bug using the following simple perl script:

    use strict;
    use File::Sync qw(fsync sync);

    die "File path is not specified" if not defined $ARGV[0];
    my $path = $ARGV[0];

    open FILE, " [...]

    [...] for reporting about this
    issue.

    Signed-off-by: Artem Bityutskiy
    Reported-by: Reuben Dowle
    Cc: stable@kernel.org

    Artem Bityutskiy
     
  • Call posix_acl_valid() to check if an acl is valid or not.

    Signed-off-by: Miao Xie
    Signed-off-by: Li Zefan

    Miao Xie
     
  • Link count of the inode is not decreased if btrfs_set_inode_index()
    fails.

    Signed-off-by: Miao Xie
    Signed-off-by: Li Zefan

    Miao Xie
     
  • btrfs_next_leaf() can return -errno, and we should propagate
    it to userspace.

    This also simplifies how we walk the btree path.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • btrfs_next_leaf() can return -errno, and we should propagate
    it to userspace.

    This also simplifies how we walk the btree path.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • The extent_io code can take cached pointers into the extent state trees,
    and these can make lookups much faster in common operations. The
    caching only happens when specific bits are set that prevent merging
    and splitting of the extent state.

    A helper function was added to uncache the state, and it was testing
    the same set of conditionals. This can leak in very strange corner
    cases where the lock bit goes away unexpectedly.

    The uncaching should be unconditional. Once we have a ref on the
    extent we should always give it up.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3)
    [CIFS] Warn on requesting default security (ntlm) on mount
    [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
    cifs: wrap received signature check in srv_mutex
    cifs: clean up various nits in unicode routines (try #2)
    cifs: clean up length checks in check2ndT2
    cifs: set ra_pages in backing_dev_info
    cifs: fix broken BCC check in is_valid_oplock_break
    cifs: always do is_path_accessible check in cifs_mount
    various endian fixes to cifs
    Elminate sparse __CHECK_ENDIAN__ warnings on port conversion
    Max share size is too small
    Allow user names longer than 32 bytes
    cifs: replace /proc/fs/cifs/Experimental with a module parm
    cifs: check for private_data before trying to put it

    Linus Torvalds
     
  • nfs_scan_commit() is called with the inode->i_lock held, but it then
    calls __mark_inode_dirty() while still holding the lock. This causes
    a deadlock.

    Push the inode->i_lock into nfs_scan_commit() so it can protect only
    the parts of the code it needs to and can be dropped before the call
    to __mark_inode_dirty() to avoid the deadlock.

    Signed-off-by: Dave Chinner
    Tested-by: Will Simoneau
    Signed-off-by: Linus Torvalds

    Dave Chinner
     
  • This reverts commit 93f1c20bc8cdb757be50566eff88d65c3b26881f.

    It turns out that libmount misparses it because it adds a '-' character
    in the uuid string, which libmount then incorrectly confuses with the
    separator string (" - ") at the end of all the optional arguments.

    Upstream libmount (in the util-linux tree) has been fixed, but until
    that fix actually percolates up to users, we'd better not expose this
    change in the kernel.

    Let's revisit this later (possibly by exposing the UUID without any '-'
    characters in it, avoiding the user-space bug).

    Reported-by: Dave Jones
    Cc: Aneesh Kumar K.V
    Cc: Al Viro
    Cc: Karel Zak
    Cc: Ram Pai
    Cc: Miklos Szeredi
    Cc: Eric Sandeen
    Signed-off-by: Linus Torvalds

    Linus Torvalds
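
The UBIFS warning fix at the top of this group (replace #defines with dummy inline functions) follows a common C pattern, sketched below. The names `SKETCH_DEBUG`, `dbg_check_thing()`, and `do_work()` are hypothetical and are not the UBIFS identifiers; the sketch only shows why a macro defined as 0 triggers "statement with no effect" while an inline stub does not.

```c
#include <assert.h>

/* Hypothetical debug switch and function name, for illustration only. */
#ifdef SKETCH_DEBUG
int dbg_check_thing(int value);   /* real checker, built only for debugging */
#else
/*
 * Old style (commented out): a call used as a statement expands to "0;",
 * which gcc reports as "statement with no effect".
 *
 * #define dbg_check_thing(value) 0
 */

/* New style: a dummy inline function that still type-checks its argument. */
static inline int dbg_check_thing(int value)
{
    (void)value;
    return 0;
}
#endif

static int do_work(int value)
{
    dbg_check_thing(value);       /* result ignored: no warning with the inline */
    return value * 2;
}
```

A side benefit of the inline form is that callers are still type-checked when debugging is compiled out.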
     

12 Apr, 2011

10 commits

  • This is more or less the same patch as before, but with some merge
    conflicts fixed up.

    If a process has a dirty page mapped into its page tables, then it has
    the ability to change it while the client is trying to write the data
    out to the server. If that happens after the signature has been
    calculated then that signature will then be wrong, and the server will
    likely reset the TCP connection.

    This patch adds a page_mkwrite handler for CIFS that simply takes the
    page lock. Because the page lock is held over the life of writepage and
    writepages, this prevents the page from becoming writeable until
    the write call has completed.

    With this, we can also remove the "sign_zero_copy" module option and
    always inline the pages when writing.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Warn once if default security (ntlm) requested. We will
    update the default to the stronger security mechanism
    (ntlmv2) in 2.6.41. Kerberos is also stronger than
    ntlm, but more servers support ntlmv2 and ntlmv2
    does not require an upcall, so ntlmv2 is a better
    default.

    Reviewed-by: Jeff Layton
    CC: Suresh Jayaraman
    Reviewed-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     
  • When the TCP_Server_Info is first allocated and connected, tcpStatus ==
    CifsGood means that the NEGOTIATE_PROTOCOL request has completed and the
    socket is ready for other calls. cifs_reconnect however sets tcpStatus
    to CifsGood as soon as the socket is reconnected and the optional
    RFC1001 session setup is done. We have no clear way to tell the
    difference between these two states, and we need to know this in order
    to know whether we can send an echo or not.

    Resolve this by adding a new statusEnum value -- CifsNeedNegotiate. When
    the socket has been connected but has not yet had a NEGOTIATE_PROTOCOL
    request done, set it to this value. Once the NEGOTIATE is done,
    cifs_negotiate_protocol will set tcpStatus to CifsGood.

    This also fixes and cleans the logic in cifs_reconnect and
    cifs_reconnect_tcon. The old code checked for specific states when what
    it really wants to know is whether the state has actually changed from
    CifsNeedReconnect.

    Reported-and-Tested-by: JG
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     
  • While testing my patchset to fix asynchronous writes, I hit a bunch
    of signature problems when testing with signing on. The problem seems
    to be that signature checks on receive can be running at the same
    time as a process that is sending, or even that multiple receives can
    be checking signatures at the same time, clobbering the same data
    structures.

    While we're at it, clean up the comments over cifs_calculate_signature
    and add a note that the srv_mutex should be held when calling this
    function.

    This patch seems to fix the problems for me, but I'm not clear on
    whether it's the best approach. If it is, then this should probably
    go to stable too.

    Cc: stable@kernel.org
    Cc: Shirish Pargaonkar
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Minor revision to the original patch. Don't abuse the __le16 variable
    on the stack by casting it to wchar_t and handing it off to char2uni.
    Declare an actual wchar_t on the stack instead. This fixes a valid
    sparse warning.

    Fix the spelling of UNI_ASTERISK. Eliminate the unneeded len_remaining
    variable in cifsConvertToUCS.

    Also, as David Howells points out, we are better off making
    cifsConvertToUCS *not* use put_unaligned_le16, since using it means that we
    can't optimize the mapped characters at compile time. Switch them
    instead to use cpu_to_le16, and simply use put_unaligned to set them
    in the string.

    Reported-and-acked-by: David Howells
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Thus spake David Howells:

    The code that follows this:

    remaining = total_data_size - data_in_this_rsp;
    if (remaining == 0)
    return 0;
    else if (remaining < 0) {

    generates better code if you drop the 'remaining' variable and compare
    the values directly.

    Clean it up per his recommendation...

    Reported-and-acked-by: David Howells
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Commit 522440ed made cifs set backing_dev_info on the mapping attached
    to new inodes. This change caused a fairly significant read performance
    regression, as cifs started doing page-sized reads exclusively.

    By virtue of the fact that they're allocated as part of cifs_sb_info by
    kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any
    readahead. This forces the normal read codepaths to use readpage instead
    of readpages causing a four-fold increase in the number of read calls
    with the default rsize.

    Fix it by setting ra_pages in the BDI to the same value as that in the
    default_backing_dev_info.

    Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662

    Cc: stable@kernel.org
    Reported-and-Tested-by: Till
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • The BCC is still __le16 at this point, and in any case we need to
    use the get_bcc_le macro to make sure we don't hit alignment
    problems.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Currently, we skip doing the is_path_accessible check in cifs_mount if
    there is no prefixpath. I have a report of at least one server however
    that allows a TREE_CONNECT to a share that has a DFS referral at its
    root. The reporter in this case was using a UNC that had no prefixpath,
    so the is_path_accessible check was not triggered and the box later hit
    a BUG() because we were chasing a DFS referral on the root dentry for
    the mount.

    This patch fixes this by removing the check for a zero-length
    prefixpath. That should make the is_path_accessible check be done in
    this situation and should allow the client to chase the DFS referral at
    mount time instead.

    Cc: stable@kernel.org
    Reported-and-Tested-by: Yogesh Sharma
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • make modules C=2 M=fs/cifs CF=-D__CHECK_ENDIAN__

    Found for example:

    CHECK fs/cifs/cifssmb.c
    fs/cifs/cifssmb.c:728:22: warning: incorrect type in assignment (different base types)
    fs/cifs/cifssmb.c:728:22: expected unsigned short [unsigned] [usertype] Tid
    fs/cifs/cifssmb.c:728:22: got restricted __le16 [usertype]
    fs/cifs/cifssmb.c:1883:45: warning: incorrect type in assignment (different base types)
    fs/cifs/cifssmb.c:1883:45: expected long long [signed] [usertype] fl_start
    fs/cifs/cifssmb.c:1883:45: got restricted __le64 [usertype] start
    fs/cifs/cifssmb.c:1884:54: warning: restricted __le64 degrades to integer
    fs/cifs/cifssmb.c:1885:58: warning: restricted __le64 degrades to integer
    fs/cifs/cifssmb.c:1886:43: warning: incorrect type in assignment (different base types)
    fs/cifs/cifssmb.c:1886:43: expected unsigned int [unsigned] fl_pid
    fs/cifs/cifssmb.c:1886:43: got restricted __le32 [usertype] pid

    In checking new smb2 code for missing endian conversions, I noticed
    some endian errors had crept in over the last few releases into the
    cifs code (symlink, ntlmssp, posix lock, and also a less problematic warning
    in fscache). A follow-on patch will address a few smb2 endian
    problems.

    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
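
The class of bug behind the sparse warnings in the last entry above is a little-endian wire field being used as if it were a host-order integer. A minimal user-space sketch of explicit conversion follows. The helper names are hypothetical; the kernel itself uses le16_to_cpu(), cpu_to_le16() and related macros, and this sketch decodes byte by byte so that it also works on unaligned buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the kernel itself uses le16_to_cpu() and friends. */

/* Decode a little-endian 16-bit wire field into host byte order. */
static uint16_t wire_le16_to_host(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit wire field into host byte order. */
static uint32_t wire_le32_to_host(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a host-order 16-bit value as a little-endian wire field. */
static void host_to_wire_le16(uint16_t v, uint8_t *p)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}
```

On a little-endian host a missing conversion goes unnoticed at run time, which is why a static check such as sparse with __CHECK_ENDIAN__ is useful for catching it.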