07 May, 2014

8 commits

  • Merge misc fixes from Andrew Morton:
    "13 fixes"

    * emailed patches from Andrew Morton:
    agp: info leak in agpioc_info_wrap()
    fs/affs/super.c: bugfix / double free
    fanotify: fix -EOVERFLOW with large files on 64-bit
    slub: use sysfs'es release mechanism for kmem_cache
    revert "mm: vmscan: do not swap anon pages just because free+file is low"
    autofs: fix lockref lookup
    mm: filemap: update find_get_pages_tag() to deal with shadow entries
    mm/compaction: make isolate_freepages start at pageblock boundary
    MAINTAINERS: zswap/zbud: change maintainer email address
    mm/page-writeback.c: fix divide by zero in pos_ratio_polynom
    hugetlb: ensure hugepage access is denied if hugepages are not supported
    slub: fix memcg_propagate_slab_attrs
    drivers/rtc/rtc-pcf8523.c: fix month definition

    Linus Torvalds
     
  • Commit 842a859db26b ("affs: use ->kill_sb() to simplify ->put_super()
    and failure exits of ->mount()") adds .kill_sb which frees sbi but
    doesn't remove sbi free in case of parse_options error causing double
    free+random crash.

    Signed-off-by: Fabian Frederick
    Cc: Alexander Viro
    Cc: [3.14.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • On 64-bit systems, O_LARGEFILE is automatically added to flags inside
    the open() syscall (also openat(), blkdev_open(), etc). Userspace
    therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
    no-op. Everything should be O_LARGEFILE by default.

    But: when fanotify does create_fd() it uses dentry_open(), which skips
    all that. And userspace can't set O_LARGEFILE in fanotify_init()
    because it's defined to 0. So if fanotify gets an event regarding a
    large file, the read() will just fail with -EOVERFLOW.

    This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
    systems, using the same test as open()/openat()/etc.

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821

    Signed-off-by: Will Woods
    Acked-by: Eric Paris
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Woods
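
    The flag handling described in this commit can be sketched in plain C.
    This is a minimal userspace model, not the kernel code: the names
    sketch_force_o_largefile() and sketch_event_f_flags() and the
    SKETCH_O_LARGEFILE value are assumptions made for illustration, and the
    word size is passed in as a parameter so both cases can be exercised.

```c
/* Hypothetical stand-in for the kernel-internal O_LARGEFILE bit. The real
 * value is architecture-specific, and 64-bit userspace sees it as 0. */
#define SKETCH_O_LARGEFILE 0100000u

/* Models the "force large file" test: true when long is not 32 bits wide. */
int sketch_force_o_largefile(unsigned int bits_per_long)
{
    return bits_per_long != 32;
}

/* Returns the flags that would be used when opening event file descriptors. */
unsigned int sketch_event_f_flags(unsigned int event_f_flags,
                                  unsigned int bits_per_long)
{
    if (sketch_force_o_largefile(bits_per_long))
        event_f_flags |= SKETCH_O_LARGEFILE;
    return event_f_flags;
}
```

    With a 64-bit word size the large-file bit is always added, so a
    caller that passed 0 still ends up with it; with a 32-bit word size
    the flags are left as the caller supplied them.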
     
  • autofs needs to be able to see private data dentry flags for its dentrys
    that are being created but not yet hashed and for its dentrys that have
    been rmdir()ed but not yet freed. It needs to do this so it can block
    processes in these states until a status has been returned to indicate
    the given operation is complete.

    It does this by keeping two lists, active and expiring, of dentrys in
    this state and uses ->d_release() to keep them stable while it checks
    the reference count to determine if they should be used.

    But with the recent lockref changes, dentrys being freed sometimes don't
    transition to a reference count of 0 before being freed, so autofs can
    occasionally use a dentry that is invalid, which can lead to a panic.

    Signed-off-by: Ian Kent
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Currently, I am seeing the following when I `mount -t hugetlbfs /none
    /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's
    related to the fact that hugetlbfs is probably not correctly setting
    itself up in this state?:

    Unable to handle kernel paging request for data at address 0x00000031
    Faulting instruction address: 0xc000000000245710
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=2048 NUMA pSeries
    ....

    In KVM guests on Power, in a guest not backed by hugepages, we see the
    following:

    AnonHugePages: 0 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 64 kB

    HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
    are not supported at boot-time, but this is only checked in
    hugetlb_init(). Extract the check to a helper function, and use it in a
    few relevant places.

    This does make hugetlbfs not supported (not registered at all) in this
    environment. I believe this is fine, as there are no valid hugepages
    and that won't change at runtime.

    [akpm@linux-foundation.org: use pr_info(), per Mel]
    [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
    Signed-off-by: Nishanth Aravamudan
    Reviewed-by: Aneesh Kumar K.V
    Acked-by: Mel Gorman
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
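
    The "extract the check to a helper" idea can be illustrated with a
    small userspace model. Everything here is a sketch under assumptions:
    struct sketch_hstate_config, sketch_hugepages_supported() and
    sketch_hugetlbfs_init() are hypothetical names, and the -1 return value
    is a placeholder rather than the kernel's actual error code.

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time configuration; in the kernel the
 * huge page shift comes from the architecture. */
struct sketch_hstate_config {
    unsigned int hpage_shift;
};

/* The single helper that every entry point consults. */
bool sketch_hugepages_supported(const struct sketch_hstate_config *cfg)
{
    return cfg->hpage_shift != 0;
}

/* Models filesystem registration: nothing is registered when huge pages
 * are unsupported. The -1 is a placeholder error value. */
int sketch_hugetlbfs_init(const struct sketch_hstate_config *cfg,
                          bool *registered)
{
    *registered = false;
    if (!sketch_hugepages_supported(cfg))
        return -1;
    *registered = true;
    return 0;
}
```

    Routing each entry point through one helper means a configuration
    with a zero shift is rejected consistently instead of only at init
    time.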
     
  • Pull vfs fixes from Al Viro:
    "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl
    bugfix from hch"

    The dcache fixes are for a subtle LRU list corruption bug reported by
    Miklos Szeredi, where people inside IBM saw list corruptions with the
    LTP/host01 test.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    nick kvfree() from apparmor
    posix_acl: handle NULL ACL in posix_acl_equiv_mode
    dcache: don't need rcu in shrink_dentry_list()
    more graceful recovery in umount_collect()
    don't remove from shrink list in select_collect()
    dentry_kill(): don't try to remove from shrink list
    expand the call of dentry_lru_del() in dentry_kill()
    new helper: dentry_free()
    fold try_prune_one_dentry()
    fold d_kill() and d_free()
    fix races between __d_instantiate() and checks of dentry flags

    Linus Torvalds
     
  • Various filesystems don't bother checking for a NULL ACL in
    posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
    gets passed one. This usually happens from the NFS server, as the ACL tools
    never pass a NULL ACL, but instead one representing the mode bits.

    Instead of adding boilerplate to all filesystems, put this check into one place,
    which will allow us to remove the check from other filesystems as well later
    on.

    Signed-off-by: Christoph Hellwig
    Reported-by: Ben Greear
    Reported-by: Marco Munderloh
    Cc: Chuck Lever
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Christoph Hellwig
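
    A centralized NULL check of this kind might look like the following
    simplified model. The struct sketch_acl layout and the function name
    sketch_acl_equiv_mode() are assumptions for illustration; the return
    convention (0 when the ACL is fully expressed by mode bits, 1 when it
    is not) and the choice to return 0 for a NULL ACL are likewise part of
    the sketch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified ACL: three rwx triplets plus a count
 * of entries that plain mode bits cannot express. */
struct sketch_acl {
    unsigned int owner, group, other; /* rwx bits, 0..7 */
    unsigned int extra_entries;       /* named users, named groups, mask */
};

/* Returns 0 if the ACL is equivalent to plain mode bits, 1 otherwise.
 * A NULL ACL carries nothing beyond the mode bits, so it is handled
 * once here and *mode_p is left untouched. */
int sketch_acl_equiv_mode(const struct sketch_acl *acl, unsigned int *mode_p)
{
    if (acl == NULL)
        return 0;

    if (mode_p != NULL)
        *mode_p = (*mode_p & ~0777u) |
                  ((acl->owner & 7u) << 6) |
                  ((acl->group & 7u) << 3) |
                  (acl->other & 7u);

    return acl->extra_entries != 0;
}
```

    Callers can then pass whatever they were handed without a guard of
    their own.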
     
  • Pull fuse fixes from Miklos Szeredi:
    "This adds ctime update in the new cached writeback mode and also
    fixes/simplifies the mtime update handling. Support for rename flags
    (aka renameat2) is also added to the userspace API"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: add renameat2 support
    fuse: clear MS_I_VERSION
    fuse: clear FUSE_I_CTIME_DIRTY flag on setattr
    fuse: trust kernel i_ctime only
    fuse: remove .update_time
    fuse: allow ctime flushing to userspace
    fuse: fuse: add time_gran to INIT_OUT
    fuse: add .write_inode
    fuse: clean up fsync
    fuse: fuse: fallocate: use file_update_time()
    fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode
    fuse: update mtime on truncate(2)
    fuse: do not use uninitialized i_mode
    fuse: fix mtime update error in fsync
    fuse: check fallocate mode
    fuse: add __exit to fuse_ctl_cleanup

    Linus Torvalds
     

06 May, 2014

1 commit

  • Pull Ceph fixes from Sage Weil:
    "First, there is a critical fix for the new primary-affinity function
    that went into -rc1.

    The second batch of patches from Zheng fix a range of problems with
    directory fragmentation, readdir, and a few odds and ends for cephfs"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: reserve caps for file layout/lock MDS requests
    ceph: avoid releasing caps that are being used
    ceph: clear directory's completeness when creating file
    libceph: fix non-default values check in apply_primary_affinity()
    ceph: use fpos_cmp() to compare dentry positions
    ceph: check directory's completeness before emitting directory entry

    Linus Torvalds
     

05 May, 2014

1 commit

  • Dan's "smatch" checker found a bug in the error path of the
    'ubifs_remount_rw()' function. Instead of jumping to the "out" label, which
    cleans things up, we just returned.

    This patch fixes the problem.

    Reported-by: Dan Carpenter
    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
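
    The class of bug fixed here is easy to show in isolation. The sketch
    below is not the UBIFS code: struct sketch_ctx, sketch_remount_rw() and
    the fail_at parameter are hypothetical, and the shared exit path is a
    simplification. It only illustrates why every failing step has to go
    through the cleanup label.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical state that must be released before returning. */
struct sketch_ctx {
    int lock_held;
    char *buf;
};

/* fail_at selects which step fails: 0 = none, 1 = first, 2 = second. */
int sketch_remount_rw(struct sketch_ctx *c, int fail_at)
{
    int err = 0;

    c->lock_held = 1;          /* models taking a mutex */
    c->buf = malloc(64);
    if (c->buf == NULL) {
        err = -ENOMEM;
        goto out;
    }

    if (fail_at == 1) {
        err = -EINVAL;
        goto out;              /* a bare "return err;" here would leak */
    }

    if (fail_at == 2) {
        err = -EIO;
        goto out;
    }

out:
    free(c->buf);
    c->buf = NULL;
    c->lock_held = 0;          /* models dropping the mutex */
    return err;
}
```

    Whatever step fails, the caller gets the error code back with the
    lock released and the buffer freed.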
     

04 May, 2014

3 commits

  • Since now the shrink list is private and nobody can free the dentry while
    it is on the shrink list, we can remove RCU protection from this.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Start with shrink_dcache_parent(), then scan what remains.

    First of all, BUG() is very much an overkill here; we are holding
    ->s_umount, and hitting BUG() means that a lot of interesting stuff
    will be hanging after that point (sync(2), for example). Moreover,
    in cases when there had been more than one leak, we'll be better
    off reporting all of them. And more than just the last component
    of pathname - %pd is there for just such uses...

    That was the last user of dentry_lru_del(), so kill it off...

    Signed-off-by: Al Viro

    Al Viro
     
  • If we find something already on a shrink list, just increment
    data->found and do nothing else. Loops in shrink_dcache_parent() and
    check_submounts_and_drop() will do the right thing - everything we
    did put into our list will be evicted and if there had been nothing,
    but data->found got non-zero, well, we have somebody else shrinking
    those guys; just try again.

    Signed-off-by: Al Viro

    Al Viro
     

01 May, 2014

7 commits

  • Pull aio fixes from Ben LaHaise:
    "The first change from Anatol fixes a regression where io_destroy() no
    longer waits for outstanding aios to complete. The second corrects a
    memory leak in an error path for vectored aio operations.

    Both of these bug fixes should be queued up for stable as well"

    * git://git.kvack.org/~bcrl/aio-fixes:
    aio: fix potential leak in aio_run_iocb().
    aio: block io_destroy() until all context requests are completed

    Linus Torvalds
     
  • If the victim is on the shrink list, don't remove it from there.
    If shrink_dentry_list() manages to remove it from the list before
    we are done - fine, we'll just free it as usual. If not - mark
    it with new flag (DCACHE_MAY_FREE) and leave it there.

    Eventually, shrink_dentry_list() will get to it, remove the sucker
    from shrink list and call dentry_kill(dentry, 0). Which is where
    we'll deal with freeing.

    Since now dentry_kill(dentry, 0) may happen after or during
    dentry_kill(dentry, 1), we need to recognize that (by seeing
    DCACHE_DENTRY_KILLED already set), unlock everything
    and either free the sucker (in case DCACHE_MAY_FREE has been
    set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.

    Signed-off-by: Al Viro

    Al Viro
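
    The hand-off between the two kill paths can be modelled as a small
    single-threaded state machine. This is only a sketch of the ordering
    rules in the description: it has no locking, the flag names and the
    split into sketch_kill_begin()/sketch_kill_finish() are assumptions,
    and free_count exists purely so the model can check that freeing
    happens exactly once.

```c
/* Hypothetical flag bits standing in for the dentry flags in the text. */
enum {
    SK_KILLED         = 1u << 0, /* a kill has started */
    SK_MAY_FREE       = 1u << 1, /* kill finished; shrinker must free */
    SK_ON_SHRINK_LIST = 1u << 2  /* still sitting on a shrink list */
};

struct sketch_dentry {
    unsigned int flags;
    int free_count;              /* how many times it was "freed" */
};

/* First half of the ordinary kill path: mark the dentry as being killed. */
void sketch_kill_begin(struct sketch_dentry *d)
{
    d->flags |= SK_KILLED;
}

/* Second half: free it, unless it is still on a shrink list, in which
 * case leave it there and let the shrinker do the freeing. */
void sketch_kill_finish(struct sketch_dentry *d)
{
    if (d->flags & SK_ON_SHRINK_LIST)
        d->flags |= SK_MAY_FREE;
    else
        d->free_count++;
}

/* Shrinker path: take the dentry off the list, then try to kill it. */
void sketch_shrink_one(struct sketch_dentry *d)
{
    d->flags &= ~(unsigned int)SK_ON_SHRINK_LIST;

    if (d->flags & SK_KILLED) {
        if (d->flags & SK_MAY_FREE)
            d->free_count++;     /* the other kill is done: free here */
        /* otherwise the ongoing kill will free it when it finishes */
        return;
    }

    sketch_kill_begin(d);
    sketch_kill_finish(d);
}
```

    In each of the three possible orderings (kill completes first, the
    shrinker arrives mid-kill, or the shrinker arrives first) the model
    frees the dentry exactly once.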
     
  • The iovec should be reclaimed whenever the caller of rw_copy_check_uvector()
    returns, but that doesn't hold when a failure happens right after
    aio_setup_vectored_rw().

    Fix that in such a way as to avoid a hairy goto.

    Signed-off-by: Leon Yu
    Signed-off-by: Benjamin LaHaise
    Cc: stable@vger.kernel.org

    Leon Yu
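
    One goto-free way to guarantee the release is to fold the follow-up
    check into the same error test as the setup call. The model below is a
    sketch only: sketch_setup_vectored(), sketch_run_iocb(), the fail_*
    parameters and the live_allocs counter are hypothetical and exist to
    make the leak observable.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_FAST_IOV 8

struct sketch_iov {
    void *base;
    size_t len;
};

/* Hypothetical setup step: switches *iovp to a heap array for large
 * requests and may still fail afterwards, leaving *iovp set. */
int sketch_setup_vectored(size_t nr, struct sketch_iov **iovp,
                          int fail_copy, int *live_allocs)
{
    if (nr > SKETCH_FAST_IOV) {
        struct sketch_iov *heap = calloc(nr, sizeof(*heap));

        if (heap == NULL)
            return -ENOMEM;
        *iovp = heap;
        ++*live_allocs;
    }
    return fail_copy ? -EFAULT : 0;
}

int sketch_run_iocb(size_t nr, int fail_copy, int fail_verify,
                    int *live_allocs)
{
    struct sketch_iov inline_vecs[SKETCH_FAST_IOV];
    struct sketch_iov *iovec = inline_vecs;
    int ret;

    ret = sketch_setup_vectored(nr, &iovec, fail_copy, live_allocs);
    if (!ret)
        ret = fail_verify ? -EINVAL : 0; /* models a later sanity check */
    if (ret < 0) {
        /* One release point covers both failure sources. */
        if (iovec != inline_vecs) {
            free(iovec);
            --*live_allocs;
        }
        return ret;
    }

    /* The I/O itself would be issued here. */

    if (iovec != inline_vecs) {
        free(iovec);
        --*live_allocs;
    }
    return 0;
}
```

    After any call, successful or not, live_allocs is back to its
    starting value.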
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • The part of old d_free() that dealt with actual freeing of dentry.
    Taken out of dentry_kill() into a separate function.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

29 Apr, 2014

5 commits


28 Apr, 2014

15 commits

  • Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Fuse doesn't support i_version (yet).

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The patch addresses two use-cases when the flag may be safely cleared:

    1. fuse_do_setattr() is called with ATTR_CTIME flag set in attr->ia_valid.
    In this case attr->ia_ctime bears actual value. In-kernel fuse must send it
    to the userspace server and then assign the value to inode->i_ctime.

    2. fuse_do_setattr() is called with ATTR_SIZE flag set in attr->ia_valid,
    whereas ATTR_CTIME is not set (truncate(2)).
    In this case in-kernel fuse must send "now" to the userspace server and then
    assign the value to inode->i_ctime.

    In both cases we could clear I_DIRTY_SYNC, but that needs more thought.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • Let the kernel maintain i_ctime locally: update i_ctime explicitly on
    truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename,
    unlink.

    The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should
    be flushed to the server eventually. The patch sets the flag and updates
    i_ctime in course of operations listed above.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • This implements updating ctime as well as mtime on file_update_time().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The patch extends fuse_setattr_in, and extends the flush procedure
    (fuse_flush_times()) called on ->write_inode() to send the ctime as well as
    mtime.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • Allow userspace fs to specify time granularity.

    This is needed because with writeback_cache mode the kernel is responsible
    for generating mtime and ctime, but if the underlying filesystem doesn't
    support nanosecond granularity then the cache will contain a different
    value from the one stored on the filesystem resulting in a change of times
    after a cache flush.

    Make the default granularity 1s.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
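
    The mismatch described above goes away if cached timestamps are
    rounded down to the granularity the filesystem can store. A minimal
    sketch of that rounding, assuming a granularity expressed in
    nanoseconds and a hypothetical helper name sketch_trunc_time():

```c
#include <time.h>

/* Rounds t down to a multiple of gran_ns nanoseconds. A granularity of
 * 0 or 1 leaves the value alone; one second or more clears tv_nsec. */
struct timespec sketch_trunc_time(struct timespec t, unsigned int gran_ns)
{
    if (gran_ns <= 1u) {
        /* nanosecond granularity: nothing to do */
    } else if (gran_ns >= 1000000000u) {
        t.tv_nsec = 0;
    } else {
        t.tv_nsec -= t.tv_nsec % (long)gran_ns;
    }
    return t;
}
```

    With a granularity of 1000 ns, for instance, a cached tv_nsec of
    123456789 becomes 123456000, which is the value a microsecond-resolution
    backing store would hand back after a flush.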
     
  • ...and flush mtime from this. This allows us to use the kernel
    infrastructure for writing out dirty metadata (mtime at this point, but
    ctime in the next patches and also maybe atime).

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Don't need to start I/O twice (once without i_mutex and once within).

    Also make sure that even if the userspace filesystem doesn't support FSYNC
    we do all the steps other than sending the message.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • in preparation for getting rid of FUSE_I_MTIME_DIRTY.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • In case fc->atomic_o_trunc is set, fuse does nothing in
    fuse_do_setattr() while handling open(O_TRUNC). Hence, i_mtime must be
    updated explicitly in fuse_finish_open(). The patch also adds extra locking
    encompassing open(O_TRUNC) operation to avoid races between the truncation
    and updating i_mtime.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • When handling truncate(2), the VFS doesn't set the ATTR_MTIME bit in the
    iattr structure; only the ATTR_SIZE bit is set. In-kernel fuse must handle
    this case by setting the mtime fields of struct fuse_setattr_in to "now"
    and setting the FATTR_MTIME bit even though ATTR_MTIME was not set.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
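
    The translation rule described above can be written down as a short
    model. The structs, the SK_* bit values and the function
    sketch_iattr_to_fattr() are all hypothetical stand-ins chosen for the
    sketch, with timestamps reduced to a single integer.

```c
/* Hypothetical bit values for the VFS-side and protocol-side masks. */
#define SK_ATTR_SIZE   (1u << 3)
#define SK_ATTR_MTIME  (1u << 5)
#define SK_FATTR_SIZE  (1u << 3)
#define SK_FATTR_MTIME (1u << 5)

struct sketch_iattr {
    unsigned int ia_valid;
    long long ia_size;
    long long ia_mtime;
};

struct sketch_setattr_in {
    unsigned int valid;
    long long size;
    long long mtime;
};

/* Builds the request sent to the userspace server. A size change with
 * no explicit mtime (the truncate case) still carries mtime = now. */
void sketch_iattr_to_fattr(const struct sketch_iattr *ia,
                           struct sketch_setattr_in *arg, long long now)
{
    arg->valid = 0;
    arg->size = 0;
    arg->mtime = 0;

    if (ia->ia_valid & SK_ATTR_SIZE) {
        arg->valid |= SK_FATTR_SIZE;
        arg->size = ia->ia_size;
    }

    if (ia->ia_valid & SK_ATTR_MTIME) {
        arg->valid |= SK_FATTR_MTIME;
        arg->mtime = ia->ia_mtime;
    } else if (ia->ia_valid & SK_ATTR_SIZE) {
        arg->valid |= SK_FATTR_MTIME;
        arg->mtime = now;
    }
}
```

    An explicit mtime from the caller still wins; "now" is only filled
    in when the size changes and no mtime was supplied.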
     
  • When inode is in I_NEW state, inode->i_mode is not initialized yet. Do not
    use it before fuse_init_inode() is called.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • Bad case of shadowing.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Don't allow new fallocate modes until we figure out what (if anything) that
    takes.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
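
    Rejecting unknown modes is typically done with an allow-list mask. The
    following is a sketch under assumptions: the function name
    sketch_check_fallocate_mode(), the locally defined SK_FALLOC_* values
    and the choice of which two modes are allowed are illustrative and not
    taken from the commit text.

```c
#include <errno.h>

/* Local stand-ins for the two mode bits this sketch accepts. */
#define SK_FALLOC_FL_KEEP_SIZE  0x01
#define SK_FALLOC_FL_PUNCH_HOLE 0x02

/* Returns 0 when every requested mode bit is on the allow-list and
 * -EOPNOTSUPP as soon as any other bit is set. */
int sketch_check_fallocate_mode(int mode)
{
    const int allowed = SK_FALLOC_FL_KEEP_SIZE | SK_FALLOC_FL_PUNCH_HOLE;

    if (mode & ~allowed)
        return -EOPNOTSUPP;
    return 0;
}
```

    Any mode bit introduced later is refused by default until someone
    deliberately adds it to the mask.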