22 Apr, 2010

1 commit

  • kill_fasync() uses a central rwlock, candidate for RCU conversion, to
    avoid cache line ping pongs on SMP.

    fasync_remove_entry() and fasync_add_entry() can disable IRQs for a short
    section instead of during the whole list scan.

    Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
    fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
    doesn't need its own implementation and can use fasync_helper(), to
    reduce code size and complexity.

    We can remove __kill_fasync() direct use in net/socket.c, and rename it
    to kill_fasync_rcu().

    Signed-off-by: Eric Dumazet
    Cc: Paul E. McKenney
    Cc: Lai Jiangshan
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

2 commits


20 Apr, 2010

6 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    eCryptfs: Turn lower lookup error messages into debug messages
    eCryptfs: Copy lower directory inode times and size on link
    ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
    ecryptfs: fix error code for missing xattrs in lower fs
    eCryptfs: Decrypt symlink target for stat size
    eCryptfs: Strip metadata in xattr flag in encrypted view
    eCryptfs: Clear buffer before reading in metadata xattr
    eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front
    eCryptfs: Fix metadata in xattr feature regression

    Linus Torvalds
     
  • Vague warnings about ENAMETOOLONG errors when looking up an encrypted
    file name have caused many users to become concerned about their data.
    Since this is a rather harmless condition, I'm moving this warning to
    only be printed when the ecryptfs_verbosity module param is 1.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • The timestamps and size of a lower inode involved in a link() call were
    being copied to the upper parent inode. Instead, we should be
    copying the lower parent inode's timestamps and size to the upper parent
    inode. I discovered this bug using the POSIX test suite at Tuxera.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Since tmpfs has no persistent storage, it pins all its dentries in memory
    so they have d_count=1 when other file systems would have d_count=0.
    ->lookup is only used to create new dentries. If the caller doesn't
    instantiate it, it's freed immediately at dput(). ->readdir reads
    directly from the dcache and depends on the dentries being hashed.

    When an ecryptfs mount is mounted, it associates the lower file and dentry
    with the ecryptfs files as they're accessed. When it's umounted and
    destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
    d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
    2008 and several months later commit caeeeecf removed the d_delete. I
    believe the d_drop() needs to be removed as well.

    The d_drop effectively hides any file that has been accessed via ecryptfs
    from the underlying tmpfs since it depends on it being hashed for it to
    be accessible. I've removed the d_drop on my development node and see no
    ill effects with basic testing on both tmpfs and persistent storage.

    As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
    BUGs on umount. This is due to the dentries being unhashed.
    tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
    the reference pinning the dentry. It skips unhashed and negative dentries,
    but shrink_dcache_for_umount_subtree doesn't. Since those dentries
    still have an elevated d_count, we get a BUG().

    This patch removes the d_drop call and fixes both issues.

    This issue was reported at:
    https://bugzilla.novell.com/show_bug.cgi?id=567887

    Reported-by: Árpád Bíró
    Signed-off-by: Jeff Mahoney
    Cc: Dustin Kirkland
    Cc: stable@kernel.org
    Signed-off-by: Tyler Hicks

    Jeff Mahoney
     
  • If the lower file system driver has extended attributes disabled,
    ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
    This breaks execution of programs in the ecryptfs mount, since the
    kernel expects the latter error when checking for security
    capabilities in xattrs.

    Signed-off-by: Christian Pulvermacher
    Cc: stable@kernel.org
    Signed-off-by: Tyler Hicks

    Christian Pulvermacher
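
The error-code choice described above can be sketched in userspace C. This is a minimal illustration, not the eCryptfs code: `struct lower_ops` and `stacked_getxattr()` are hypothetical names, and a NULL handler stands in for a lower filesystem with extended attributes disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a lower filesystem's inode operations. */
struct lower_ops {
    /* NULL when the lower filesystem has no xattr support. */
    ssize_t (*getxattr)(const char *name, void *buf, size_t size);
};

/*
 * Sketch of a stacked getxattr wrapper. When the lower filesystem has no
 * xattr handler, report -EOPNOTSUPP rather than -ENOSYS, since that is the
 * error the commit message says callers expect in this situation.
 */
ssize_t stacked_getxattr(const struct lower_ops *ops, const char *name,
                         void *buf, size_t size)
{
    assert(ops != NULL);
    if (ops->getxattr == NULL)
        return -EOPNOTSUPP;
    return ops->getxattr(name, buf, size);
}
```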
     
  • Create a getattr handler for eCryptfs symlinks that is capable of
    reading the lower target and decrypting its path. Prior to this patch,
    a stat's st_size field would represent the strlen of the encrypted path,
    while readlink() would return the strlen of the decrypted path. This
    could lead to confusion in some userspace applications, since the two
    values should be equal.

    https://bugs.launchpad.net/bugs/524919

    Reported-by: Loïc Minier
    Cc: stable@kernel.org
    Signed-off-by: Tyler Hicks

    Tyler Hicks
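
The invariant the symlink patch restores can be checked from userspace. The sketch below is an assumption about how one might test it, not part of the patch: `symlink_size_consistent()` is a hypothetical helper that compares the `st_size` reported by lstat() with the byte count returned by readlink().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if lstat()'s st_size equals the number of bytes readlink()
 * returns for the symlink at `path`, 0 if they differ, -1 on error.
 * Targets longer than the local buffer are not handled by this sketch.
 */
int symlink_size_consistent(const char *path)
{
    struct stat st;
    char target[4096];
    ssize_t len;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return -1;
    len = readlink(path, target, sizeof(target));
    if (len < 0)
        return -1;
    return (off_t)len == st.st_size ? 1 : 0;
}
```

Running this against a symlink inside an eCryptfs mount with encrypted file names would be one way to observe the mismatch described above.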
     

17 Apr, 2010

2 commits

  • Any inode reclaim flush that returns EAGAIN will result in the inode
    reclaim being attempted again later. There is no need to issue a
    warning into the logs about this situation.

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Dave Chinner
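
As a rough sketch of the logging decision described above: the helper name `reclaim_flush_should_warn()` is hypothetical, and the convention of passing a positive errno value is an assumption of this example rather than something stated in the commit.

```c
#include <assert.h>
#include <errno.h>

/*
 * Returns 1 if a failed reclaim flush deserves a log warning.
 * EAGAIN only means the reclaim will be attempted again later,
 * so it is treated as not worth reporting.
 */
int reclaim_flush_should_warn(int error)
{
    assert(error >= 0);   /* this sketch takes positive errno values */
    return error != 0 && error != EAGAIN;
}
```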
     
  • Updates to the VFS layer removed an extra ->sync_fs call into the
    filesystem during the sync process (from the quota code).
    Unfortunately the sync code was unknowingly relying on this call to
    make sure metadata buffers were flushed via a xfs_buftarg_flush()
    call to move the tail of the log forward in memory before the final
    transactions of the sync process were issued.

    As a result, the old code would write a very recent log tail value
    to the log by the end of the sync process, and so a subsequent crash
    would leave nothing for log recovery to do. Hence in qa test 182,
    log recovery only replayed a small handful of inode fsync
    transactions in this case.

    However, with the removal of the extra ->sync_fs call, the log tail
    was now not moved forward with the inode fsync transactions near the
    end of the sync process; the first (and only) buftarg flush occurred
    after these transactions went to disk. The result is that log
    recovery now sees a large number of transactions for metadata that
    is already on disk.

    This usually isn't a problem, but when the transactions include
    inode chunk allocation, the inode create transactions and all
    subsequent changes are replayed as we cannot rely on what is on disk
    being valid. As a result, if the inode was written and contains
    unlogged changes, the unlogged changes are lost, thereby violating
    sync semantics.

    The fix is to always issue a transaction after the buftarg flush
    occurs if the log is not idle or covered. This results in a dummy
    transaction being written that contains the up-to-date log tail
    value, which will be very recent. Indeed, it will be at least as
    recent as the old code would have left on disk, so log recovery
    will behave exactly as it used to in this situation.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

15 Apr, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: use separate class for ceph sockets' sk_lock
    ceph: reserve one more caps space when doing readdir
    ceph: queue_cap_snap should always queue dirty context
    ceph: fix dentry reference leak in dcache readdir
    ceph: decode v5 of osdmap (pool names) [protocol change]
    ceph: fix ack counter reset on connection reset
    ceph: fix leaked inode ref due to snap metadata writeback race
    ceph: fix snap context reference leaks
    ceph: allow writeback of snapped pages older than 'oldest' snapc
    ceph: fix dentry rehashing on virtual .snap dir

    Linus Torvalds
     

14 Apr, 2010

4 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFSv4: fix delegated locking
    NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible
    NFS: Fix a race with the new commit code
    NFS: Ensure that writeback_single_inode() calls write_inode() when syncing
    NFS: Fix the mode calculation in nfs_find_open_context
    NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR

    Linus Torvalds
     
  • Use a separate class for ceph sockets to prevent lockdep confusion.
    Because ceph sockets only get passed kernel pointers, there is no
    dependency from sk_lock -> mmap_sem. If we share the same class as other
    sockets, lockdep detects a circular dependency from

    mmap_sem (page fault) -> fs mutex -> sk_lock -> mmap_sem

    because dependencies are noted from both ceph and user contexts. Using
    a separate class prevents the sk_lock(ceph) -> mmap_sem dependency and
    makes lockdep happy.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We were missing space for the directory cap. The result was a BUG at
    fs/ceph/caps.c:2178.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     
  • This simplifies the calling convention, and fixes a bug where we queue a
    capsnap with a context other than i_head_snapc (the one that matches the
    dirty pages). The result was a BUG at fs/ceph/caps.c:2178 on writeback
    completion when a capsnap matching the writeback snapc could not be found.

    Signed-off-by: Sage Weil

    Sage Weil
     

13 Apr, 2010

9 commits


12 Apr, 2010

2 commits

  • Arnaud Giersch reports that NFSv4 locking is broken when we hold a
    delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4:
    Don't allow posix locking against servers that don't support it).

    According to Arnaud, the lock succeeds the first time he opens the file
    (since we cannot do a delegated open) but then fails after we start using
    delegated opens.

    The following patch fixes it by ensuring that locking behaviour is
    governed by a per-filesystem capability flag that is initially set, but
    gets cleared if the server ever returns an OPEN without the
    NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.

    Reported-by: Arnaud Giersch
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
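
The "initially set, cleared on first counter-example" behaviour of the capability flag can be sketched as follows. All names here (`struct fs_caps`, `CAP_POSIX_LOCK`, `OPEN_RESULT_LOCKTYPE_POSIX`) and the bit values are illustrative assumptions, not the NFS client's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPEN_RESULT_LOCKTYPE_POSIX 0x0004u  /* illustrative reply flag */
#define CAP_POSIX_LOCK             0x0001u  /* hypothetical capability bit */

struct fs_caps { uint32_t caps; };

/* The capability starts out set for every filesystem. */
void fs_caps_init(struct fs_caps *fs)
{
    assert(fs != NULL);
    fs->caps = CAP_POSIX_LOCK;
}

/* Clear the capability if any OPEN reply lacks the POSIX lock-type flag. */
void fs_caps_note_open_reply(struct fs_caps *fs, uint32_t rflags)
{
    assert(fs != NULL);
    if (!(rflags & OPEN_RESULT_LOCKTYPE_POSIX))
        fs->caps &= ~CAP_POSIX_LOCK;
}

/* Locking decisions consult the per-filesystem flag, not a single open. */
int fs_caps_posix_lock_allowed(const struct fs_caps *fs)
{
    assert(fs != NULL);
    return (fs->caps & CAP_POSIX_LOCK) != 0;
}
```

Because the flag lives with the filesystem rather than with an individual open, a later delegated open (which never contacts the server) does not change the answer.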
     
  • Fixes the typo found in a warning message of a persistent object
    allocator function.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

10 Apr, 2010

7 commits


09 Apr, 2010

1 commit


08 Apr, 2010

3 commits

  • Generic setattr is no longer responsible for quota transfer.
    Use udf_setattr for all udf inodes.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • bloc->logicalBlockNum is unsigned so it's never less than zero.

    When I saw that, it made me worry that "bloc->logicalBlockNum + count"
    could overflow. That's why I changed the check for less than zero
    to an overflow check. (The test works because "count" is also
    unsigned.)

    Signed-off-by: Dan Carpenter
    Signed-off-by: Jan Kara

    Dan Carpenter
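
The overflow test described above relies on unsigned wrap-around. A minimal sketch, assuming 32-bit block numbers and using the hypothetical helper names `block_range_overflows()` and `block_range_end()`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if first + count wraps around in 32-bit unsigned arithmetic.
 * Unsigned addition wraps modulo 2^32, so a wrapped sum is always
 * smaller than the first operand.
 */
int block_range_overflows(uint32_t first, uint32_t count)
{
    uint32_t end = first + count;
    return end < first;
}

/* End of the range; callers must have ruled out overflow first. */
uint32_t block_range_end(uint32_t first, uint32_t count)
{
    assert(!block_range_overflows(first, count));
    return first + count;
}
```

A comparison such as `first < 0` can never be true for an unsigned value, which is why the original check was dead code.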
     
  • If the NFS atomic open implementation ends up doing an open request from
    the ->d_revalidate() codepath and gets an error from the server, return
    that error to the caller explicitly and don't bother with
    lookup_instantiate_filp() at all.
    ->d_revalidate() can return an error itself just fine...

    See
    http://bugzilla.kernel.org/show_bug.cgi?id=15674
    http://marc.info/?l=linux-kernel&m=126988782722711&w=2

    for original report.

    Reported-by: Daniel J Blueman
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

07 Apr, 2010

2 commits

  • Order the debugfs statistics correctly. The values displayed through a
    seq_printf() statement should be in the same order as the names in the
    format string.

    In the 'Lookups' line, objects created ('crt=') and lookups timed out
    ('tmo=') have their values transposed.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
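
The class of bug fixed here is easy to reproduce with any printf-style call. The sketch below uses snprintf() in place of seq_printf(); `struct lookup_stats`, `format_lookups()`, and the field labels other than 'crt=' and 'tmo=' are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters; only crt and tmo are named in the commit. */
struct lookup_stats { unsigned n, neg, pos, crt, tmo; };

/*
 * Formats the counters so that each argument lines up with the label
 * at the same position in the format string. Swapping s->crt and
 * s->tmo in the argument list would compile cleanly but print each
 * value under the other's label.
 */
int format_lookups(char *buf, size_t len, const struct lookup_stats *s)
{
    assert(buf != NULL && s != NULL);
    return snprintf(buf, len, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
                    s->n, s->neg, s->pos, s->crt, s->tmo);
}
```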
     
  • When we look into pagemap using page-types with option -p, the value of
    pfn for hugepages looks wrong (see below). This is because the pte was
    evaluated only once per vma, although it should be updated for each
    hugepage. This patch fixes it.

    $ page-types -p 3277 -Nl -b huge
    voffset offset len flags
    7f21e8a00 11e400 1 ___U___________H_G________________
    7f21e8a01 11e401 1ff ________________TG________________
    ^^^
    7f21e8c00 11e400 1 ___U___________H_G________________
    7f21e8c01 11e401 1ff ________________TG________________
    ^^^

    One hugepage contains 1 head page and 511 tail pages on x86_64, and each
    pair of lines represents one hugepage. Voffset and offset mean virtual
    address and physical address in page units, respectively. Different
    hugepages should not have the same offset value.

    With this patch applied:

    $ page-types -p 3386 -Nl -b huge
    voffset offset len flags
    7fec7a600 112c00 1 ___UD__________H_G________________
    7fec7a601 112c01 1ff ________________TG________________
    ^^^
    7fec7a800 113200 1 ___UD__________H_G________________
    7fec7a801 113201 1ff ________________TG________________
    ^^^
    OK

    More info:

    - This patch modifies walk_page_range()'s hugepage walker. But the
    change only affects pagemap_read(), which is the only caller of the
    hugepage callback.

    - Without this patch, the hugetlb_entry() callback is called per vma,
    which doesn't match the natural expectation from its name.

    - With this patch, hugetlb_entry() is called per hugepte entry and the
    callback can become much simpler.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
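
The per-hugepage callback pattern described in this commit can be sketched in plain C. This is a simplified model, not the kernel walker: `walk_hugepages()`, `record_pfn()`, and `struct pfn_log` are hypothetical, and an array of head pfns stands in for the page table entries that the real walker looks up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_HUGEPAGE 512u  /* 1 head + 511 tail pages on x86_64 */

typedef void (*hugepage_cb)(uint64_t voffset, uint64_t pfn, void *arg);

/* Example callback state: remembers the head pfn seen for each hugepage. */
struct pfn_log { uint64_t pfn[8]; size_t n; };

void record_pfn(uint64_t voffset, uint64_t pfn, void *arg)
{
    struct pfn_log *log = arg;
    (void)voffset;
    if (log->n < 8)
        log->pfn[log->n++] = pfn;
}

/*
 * Walks [start, end) in page units, one hugepage at a time, and fetches
 * a fresh entry for every hugepage instead of reusing the first one for
 * the whole range. Returns the number of callback invocations.
 */
size_t walk_hugepages(uint64_t start, uint64_t end,
                      const uint64_t *head_pfns, hugepage_cb cb, void *arg)
{
    size_t i = 0;
    uint64_t v;

    assert(start % PAGES_PER_HUGEPAGE == 0 && end % PAGES_PER_HUGEPAGE == 0);
    for (v = start; v < end; v += PAGES_PER_HUGEPAGE, i++)
        cb(v, head_pfns[i], arg);
    return i;
}
```

Fed the two voffset and offset pairs from the patched output above (7fec7a600 with 112c00, and 7fec7a800 with 113200), the callback sees a distinct pfn for each hugepage, which is the behaviour the patch restores.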