29 Jul, 2016

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     

27 Jul, 2016

2 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent
    OOMs. This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.
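
    The mask propagation described above can be sketched in user space
    roughly as follows. This is a minimal mock, not the kernel code: the
    MOCK_* flag values, struct mock_mapping and the mock_* functions are
    assumptions made for illustration, and only the __GFP_NORETRY bit
    named in this message is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Mock flag bits for illustration only; the kernel's real gfp_t values differ. */
typedef unsigned int gfp_t;
#define MOCK_GFP_KERNEL  0x1u
#define MOCK_GFP_NORETRY 0x2u /* stands in for __GFP_NORETRY */

/* Minimal stand-in for struct address_space: only the per-mapping mask. */
struct mock_mapping {
    gfp_t gfp_mask;
};

/* Single place that decides which mask readahead allocations use. */
static gfp_t readahead_gfp_mask(const struct mock_mapping *mapping)
{
    return mapping->gfp_mask | MOCK_GFP_NORETRY;
}

/* Mock of read_pages(): takes the caller's mask instead of picking its own. */
static gfp_t mock_read_pages(const struct mock_mapping *mapping, gfp_t gfp)
{
    assert(mapping != NULL);
    return gfp; /* the mask that allocations on this path would use */
}

/* Mock of __do_page_cache_readahead(): compute the mask once, pass it down. */
static gfp_t mock_do_readahead(const struct mock_mapping *mapping)
{
    gfp_t gfp = readahead_gfp_mask(mapping);

    return mock_read_pages(mapping, gfp);
}
```

    With the mask computed in one helper and handed down, the lower
    layer can no longer drift back to a plain GFP_KERNEL allocation.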

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Jul, 2016

5 commits

  • In orangefs_inode_getxattr(), an fsuid is written to dmesg. The kuid is
    converted to a userspace uid via from_kuid(current_user_ns(), [...]), but
    since dmesg is global, init_user_ns should be used here instead.

    In copy_attributes_from_inode(), op_alloc() and fill_default_sys_attrs(),
    upcall structures are populated with uids/gids that have been mapped into
    the caller's namespace. However, those upcall structures are read by
    another process (the userspace filesystem driver), and that process might
    be running in another namespace. This effectively lets any user spoof its
    uid and gid as seen by the userspace filesystem driver.

    To fix the second issue, I just construct the upcall structures with
    init_user_ns uids/gids and require the filesystem server to run in the
    init namespace. Since orangefs is full of global state anyway (as the error
    message in DUMP_DEVICE_ERROR explains, there can only be one userspace
    orangefs filesystem driver at once), that shouldn't be a problem.

    [
    Why does orangefs even exist in the kernel if everything does upcalls into
    userspace? What does orangefs do that couldn't be done with the FUSE
    interface? If there is no good answer to those questions, I'd prefer to see
    orangefs kicked out of the kernel. Can that be done for something that
    shipped in a release?

    According to commit f7ab093f74bf ("Orangefs: kernel client part 1"), they
    even already have a FUSE daemon, and the only rational reason (apart from
    "but most of our users report preferring to use our kernel module instead")
    given for not wanting to use FUSE is one "in-the-works" feature that could
    probably be integrated into FUSE instead.
    ]

    This patch has been compile-tested.

    Signed-off-by: Jann Horn
    Signed-off-by: Mike Marshall

    Jann Horn
     
  • Signed-off-by: Mike Marshall

    Mike Marshall
     
  • Mike,

    On Fri, Jun 3, 2016 at 9:44 PM, Mike Marshall wrote:
    > We use the return value in this one line you changed, our userspace code gets
    > ill when we send it (-ENOMEM +1) as a key length...

    ah, my mistake. Here's a fixed version.

    Thanks,
    Andreas

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Mike Marshall

    Andreas Gruenbacher
     
  • Orangefs has a catch-all xattr handler that effectively does what the
    trusted handler does already.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Mike Marshall

    Andreas Gruenbacher
     
  • The ORANGEFS_XATTR_INDEX_ defines are unused; the ORANGEFS_XATTR_NAME_
    defines only obfuscate the code.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Mike Marshall

    Andreas Gruenbacher
     

30 May, 2016

2 commits


28 May, 2016

1 commit


03 May, 2016

2 commits


11 Apr, 2016

1 commit


10 Apr, 2016

1 commit

  • Pull orangefs fixes from Mike Marshall:
    "Orangefs cleanups and a strncpy vulnerability fix.

    Cleanups:
    - remove an unused variable from orangefs_readdir.
    - clean up printk wrapper used for ofs "gossip" debugging.
    - clean up truncate ctime and mtime setting in inode.c
    - remove a useless null check found by coccinelle.
    - optimize some memcpy/memset boilerplate code.
    - remove some useless sanity checks from xattr.c

    Fix:
    - fix a potential strncpy vulnerability"

    * tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: remove unused variable
    orangefs: Add KERN_ to gossip_ macros
    orangefs: strncpy -> strscpy
    orangefs: clean up truncate ctime and mtime setting
    Orangefs: fix ifnullfree.cocci warnings
    Orangefs: optimize boilerplate code.
    Orangefs: xattr.c cleanup

    Linus Torvalds
     

09 Apr, 2016

7 commits

  • Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Emit the logging messages at the appropriate levels.

    Miscellanea:

    o Change format to fmt
    o Use the more common ##__VA_ARGS__
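
    A rough user-space illustration of the macro shape, assuming
    hypothetical names (mock_gossip_err, mock_gossip_debug, MOCK_KERN_*)
    and snprintf() into a buffer in place of printk(). ##__VA_ARGS__ is
    a GNU extension, so this assumes GCC or Clang.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical level prefixes, modelled on the kernel's KERN_ERR/KERN_DEBUG. */
#define MOCK_KERN_ERR   "<3>"
#define MOCK_KERN_DEBUG "<7>"

/*
 * fmt must be a string literal so that the level prefix can be pasted in
 * front of it. ##__VA_ARGS__ removes the trailing comma when the caller
 * passes no variadic arguments.
 */
#define mock_gossip_err(buf, size, fmt, ...) \
    snprintf((buf), (size), MOCK_KERN_ERR fmt, ##__VA_ARGS__)

#define mock_gossip_debug(buf, size, fmt, ...) \
    snprintf((buf), (size), MOCK_KERN_DEBUG fmt, ##__VA_ARGS__)

/* Report whether a formatted line starts with the given level prefix. */
static int mock_has_level(const char *line, const char *level)
{
    assert(line != NULL && level != NULL);
    return strncmp(line, level, strlen(level)) == 0;
}
```

    Both macros accept calls with and without extra arguments, for
    example mock_gossip_err(buf, sizeof(buf), "no args") and
    mock_gossip_debug(buf, sizeof(buf), "n=%d", 7).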

    Signed-off-by: Joe Perches
    Signed-off-by: Mike Marshall

    Joe Perches
     
  • It would have been possible for a rogue client-core to send in a symlink
    target which is not NUL terminated. This returns EIO if the client-core
    gives us corrupt data.

    Leave debugfs and superblock code as is for now.

    Other dcache.c and namei.c strncpy instances are safe because
    ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
    name plus a NUL byte.
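
    The check can be sketched in user space as below. The kernel fix is
    listed as "strncpy -> strscpy"; since strscpy() is not available in
    libc, this sketch swaps in memchr() to detect a missing terminator.
    copy_link_target() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a string from an untrusted buffer of src_len bytes into dst.
 * Returns 0 on success, or -EIO if src holds no NUL byte within src_len
 * or the string (including its NUL) does not fit into dst_size bytes.
 */
static int copy_link_target(char *dst, size_t dst_size,
                            const char *src, size_t src_len)
{
    const char *nul;
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);

    nul = memchr(src, '\0', src_len);
    if (nul == NULL)
        return -EIO; /* corrupt data: no terminator */

    len = (size_t)(nul - src);
    if (len >= dst_size)
        return -EIO; /* would not fit with its terminator */

    memcpy(dst, src, len + 1); /* copy the string and its NUL */
    return 0;
}
```

    An unterminated four-byte buffer is rejected with -EIO, while a
    properly terminated "ab" is copied unchanged.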

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • The ctime and mtime are always updated on a successful ftruncate and
    only updated on a successful truncate where the size changed.

    We handle the ``if the size changed'' bit.

    This matches FUSE's behavior.
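
    The rule can be modelled with a small user-space mock. struct
    mock_inode, mock_set_size() and the via_ftruncate flag are
    assumptions for illustration, not the orangefs code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for an inode: a size and two timestamps. */
struct mock_inode {
    long long size;
    long mtime;
    long ctime;
};

/*
 * Apply a new size. An ftruncate-style call always updates the
 * timestamps; a truncate-style call updates them only when the size
 * actually changes.
 */
static void mock_set_size(struct mock_inode *inode, long long new_size,
                          long now, int via_ftruncate)
{
    int size_changed;

    assert(inode != NULL && new_size >= 0);

    size_changed = (new_size != inode->size);
    inode->size = new_size;
    if (via_ftruncate || size_changed) {
        inode->mtime = now;
        inode->ctime = now;
    }
}
```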

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

    NULL check before some freeing functions is not needed.

    Based on checkpatch warning
    "kfree(NULL) is safe this check is probably not required"
    and kfreeaddr.cocci by Julia Lawall.

    Generated by: scripts/coccinelle/free/ifnullfree.cocci
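
    The same property holds for free() in standard C, so the pattern
    can be shown in user space. struct mock_debug_state and
    mock_debug_cleanup() are hypothetical names, not taken from
    orangefs-debugfs.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state object owning one heap buffer. */
struct mock_debug_state {
    char *help_string;
};

/*
 * free(NULL) is defined to do nothing, so no "if (ptr)" guard is needed
 * before the call. Resetting the pointer makes a second cleanup harmless.
 */
static void mock_debug_cleanup(struct mock_debug_state *state)
{
    assert(state != NULL);
    free(state->help_string);
    state->help_string = NULL;
}
```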

    Signed-off-by: Fengguang Wu
    Signed-off-by: Mike Marshall

    kbuild test robot
     
  • Suggested by David Binderman.

    Of the two sequences below, the former can potentially be a
    performance win over the latter, since it writes each destination
    byte only once.

    memcpy(d, s, len);
    memset(d+len, c, size-len);

    memset(d, c, size);
    memcpy(d, s, len);
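
    Wrapped into functions, the two orderings look like this. The
    function names are made up for the sketch; both produce the same
    buffer contents as long as len <= size.

```c
#include <assert.h>
#include <string.h>

/* Former: copy the data, then pad only the tail. Each byte is written once. */
static void fill_copy_then_pad(char *d, size_t size,
                               const char *s, size_t len, int c)
{
    assert(len <= size);
    memcpy(d, s, len);
    memset(d + len, c, size - len);
}

/* Latter: pad everything, then overwrite the first len bytes again. */
static void fill_pad_then_copy(char *d, size_t size,
                               const char *s, size_t len, int c)
{
    assert(len <= size);
    memset(d, c, size);
    memcpy(d, s, len);
}
```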

    Signed-off-by: Mike Marshall

    Mike Marshall
     
  • 1. It is nonsense to test for negative size_t, suggested by
    David Binderman

    2. By the time Orangefs gets called, the vfs has ensured that
    name != NULL, and that buffer and size are sane.

    Signed-off-by: Mike Marshall

    Mike Marshall
     

05 Apr, 2016

2 commits

  • Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
    "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The first patch with most changes has been done with coccinelle. The
    second is manual fixups on top.

    The third patch removes macros definition"

    [ I was planning to apply this just before rc2, but then I spaced out,
    so here it is right _after_ rc2 instead.

    As Kirill suggested as a possibility, I could have decided to only
    merge the first two patches, and leave the old interfaces for
    compatibility, but I'd rather get it all done and any out-of-tree
    modules and patches can trivially do the conversion while still also
    working with older kernels, so there is little reason to try to
    maintain the redundant legacy model. - Linus ]

    * PAGE_CACHE_SIZE-removal:
    mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
    mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
    mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

    Linus Torvalds
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
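
    Why dropping the shifts preserves behavior can be seen in a small
    user-space mock. The MOCK_* macros and the 4 KiB page size are
    assumptions for illustration; the point is only that the shift
    amount is zero when both constants are equal.

```c
#include <assert.h>

#define MOCK_PAGE_SHIFT       12              /* assumed 4 KiB pages */
#define MOCK_PAGE_CACHE_SHIFT MOCK_PAGE_SHIFT /* modelled as always equal */

/* Old style: convert a page cache index to a page index with a shift. */
static unsigned long old_index_to_page(unsigned long index)
{
    /* The shift amount is zero, so the shift leaves the value unchanged. */
    assert(MOCK_PAGE_CACHE_SHIFT - MOCK_PAGE_SHIFT == 0);
    return index << (MOCK_PAGE_CACHE_SHIFT - MOCK_PAGE_SHIFT);
}

/* New style: the expression is used as-is. */
static unsigned long new_index_to_page(unsigned long index)
{
    return index;
}
```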

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

01 Apr, 2016

2 commits


26 Mar, 2016

8 commits


24 Mar, 2016

6 commits