28 Jul, 2020

3 commits

  • The arguments of fsnotify() are overloaded and mean different things
    for different event types.

    Replace the to_tell argument with separate arguments @dir and @inode,
    because we may be sending to both dir and child. Using the @data
    argument to pass the child is not enough, because dirent events pass
    this argument (for audit), but we do not report to child.

    Document the new fsnotify() function arguments.

    Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Simple helper to consolidate boilerplate code.

    Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Instead of calling fsnotify() twice, once with the parent inode and once
    with the child inode, if the event should be sent to the parent inode,
    send it with both the parent and child inode marks in the object type
    iterator and call the backend handle_event() callback only once.

    The parent inode is assigned to the standard "inode" iterator type and
    the child inode is assigned to the special "child" iterator type.

    In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask,
    the dir argument to handle_event will be the parent inode, the file_name
    argument to handle_event is non NULL and refers to the name of the child
    and the child inode can be accessed with fsnotify_data_inode().

    This will allow fanotify to make decisions based on the child's or the
    parent's ignored mask. For example, when a parent is interested in a
    specific event on its children, but a specific child wishes to ignore this
    event, the event will not be reported. This is not what happens with the
    current code, but according to the man page, it is the expected behavior.

    Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
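
    The ignored-mask decision described above can be sketched in userspace C.
    This is only an illustration under stated assumptions: struct sketch_mark,
    sketch_should_report() and the SKETCH_EV_* bits are hypothetical names,
    and the real fanotify logic handles more cases (such as
    FS_EVENT_ON_CHILD) than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits, for illustration only. */
#define SKETCH_EV_ACCESS 0x1u
#define SKETCH_EV_MODIFY 0x2u

/* Hypothetical stand-in for a mark: what it wants and what it ignores. */
struct sketch_mark {
    uint32_t mask;          /* events this mark is interested in */
    uint32_t ignored_mask;  /* events this mark asks to ignore */
};

/*
 * With both marks visible in one call, the decision can combine them:
 * an event is reported only if someone wants it and nobody ignores it.
 */
static bool sketch_should_report(uint32_t event,
                                 const struct sketch_mark *parent,
                                 const struct sketch_mark *child)
{
    uint32_t wanted = parent->mask | child->mask;
    uint32_t ignored = parent->ignored_mask | child->ignored_mask;

    return (event & wanted & ~ignored) != 0;
}
```

    With two separate calls, the call made for the parent could not see the
    child's ignored mask; a single call with both marks makes the combined
    check possible.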
     

15 Jul, 2020

1 commit

  • When creating an FS_MODIFY event on inode itself (not on parent)
    the file_name argument should be NULL.

    The change to send a non NULL name to inode itself was done on purpose
    as part of another commit, as Tejun writes: "...While at it, supply the
    target file name to fsnotify() from kernfs_node->name.".

    But this is wrong practice and inconsistent with inotify behavior when
    watching a single file. When a child is being watched (as opposed to the
    parent directory) the inotify event should contain the watch descriptor,
    but not the file name.

    Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...")
    Link: https://lore.kernel.org/r/20200708111156.24659-5-amir73il@gmail.com
    Acked-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

10 Jun, 2020

1 commit

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

23 Apr, 2020

1 commit

  • The kernfs_node lockdep tracking is being done on kn->active, the
    active reference count. The other reference count (kn->count) is not
    tracked by lockdep. So change the lockdep name to reflect what it is
    tracking.

    Signed-off-by: Waiman Long
    Acked-by: Tejun Heo
    Link: https://lore.kernel.org/r/20200402171056.27871-1-longman@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

17 Mar, 2020

2 commits

  • User extended attributes are useful as metadata storage for kernfs
    consumers like cgroups. Especially in the case of cgroups, it is useful
    to have a central metadata store that multiple processes/services can
    use to coordinate actions.

    A concrete example is userspace out-of-memory killers. We want to
    let delegated cgroup subtree owners (running as non-root) say
    "please avoid killing this cgroup". This is especially important for
    desktop Linux, as delegated subtree owners are less likely to run as
    root.

    This patch introduces a new flag, KERNFS_ROOT_SUPPORT_USER_XATTR, that
    lets kernfs consumers enable user xattr support. An initial limit of 128
    entries or 128KB -- whichever is hit first -- is placed per cgroup
    because xattrs come from kernel memory and we don't want to let
    unprivileged users accidentally eat up too much kernel memory.

    Signed-off-by: Daniel Xu
    Acked-by: Chris Down
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Tejun Heo

    Daniel Xu
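
    A minimal sketch of the "128 entries or 128KB, whichever is hit first"
    accounting, written as standalone C. All names here (sketch_xattr_acct,
    sketch_charge_xattr, the SKETCH_* constants) are hypothetical, and both
    the -ENOSPC return value and what exactly counts toward the size are
    assumptions of this sketch rather than statements about the patch.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_USER_XATTRS       128u
#define SKETCH_USER_XATTR_SIZE_LIMIT ((size_t)128 << 10)  /* 128KB */

/* Hypothetical per-cgroup accounting state. */
struct sketch_xattr_acct {
    unsigned int nr;  /* number of user xattrs currently stored */
    size_t size;      /* total bytes currently charged */
};

/*
 * Charge one new xattr of new_size bytes.
 * Returns 0 on success, or -ENOSPC (assumed error code) if either
 * limit would be exceeded; the accounting is left unchanged on failure.
 */
static int sketch_charge_xattr(struct sketch_xattr_acct *a, size_t new_size)
{
    if (a->nr >= SKETCH_MAX_USER_XATTRS)
        return -ENOSPC;
    /* a->size never exceeds the limit, so this subtraction cannot wrap. */
    if (new_size > SKETCH_USER_XATTR_SIZE_LIMIT - a->size)
        return -ENOSPC;

    a->nr++;
    a->size += new_size;
    return 0;
}
```

    Either limit alone is enough to reject a request: many tiny xattrs hit
    the entry count first, while a few large ones hit the byte limit first.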
     
  • This helps set up size accounting in the next commit. Without this out
    param, it's difficult to find out the removed xattr size without taking
    a lock for longer and walking the xattr linked list twice.

    Signed-off-by: Daniel Xu
    Acked-by: Chris Down
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Tejun Heo

    Daniel Xu
     

05 Feb, 2020

1 commit

  • Pull vfs timestamp updates from Al Viro:
    "More 64bit timestamp work"

    * 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    kernfs: don't bother with timestamp truncation
    fs: Do not overload update_time
    fs: Delete timespec64_trunc()
    fs: ubifs: Eliminate timespec64_trunc() usage
    fs: ceph: Delete timespec64_trunc() usage
    fs: cifs: Delete usage of timespec64_trunc
    fs: fat: Eliminate timespec64_trunc() usage
    utimes: Clamp the timestamps in notify_change()

    Linus Torvalds
     

14 Jan, 2020

1 commit

  • Previously there was an additional check that the variable pos is not
    null. However, this check is reached only after entering the while loop,
    which can happen only if pos is not null.
    Therefore the additional check is redundant and can be removed.

    Signed-off-by: Mateusz Nosek
    Acked-by: Tejun Heo
    Link: https://lore.kernel.org/r/20191230191628.21099-1-mateusznosek0@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Mateusz Nosek
     

09 Dec, 2019

1 commit


07 Dec, 2019

1 commit

  • Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
    "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
    Basically, the problem is that negative pinned dentries require
    careful treatment - unless ->d_lock is locked or parent is held at
    least shared, another thread can make them positive right under us.

    Most of the uses turned out to be safe - the main surprises as far as
    filesystems are concerned were

    - race in dget_parent() fastpath, that might end up with the caller
    observing the returned dentry _negative_, due to insufficient
    barriers. It is positive in memory, but we could end up seeing the
    wrong value of ->d_inode in CPU cache. Fixed.

    - manual checks that result of lookup_one_len_unlocked() is positive
    (and rejection of negatives). Again, insufficient barriers (we
    might end up with inconsistent observed values of ->d_inode and
    ->d_flags). Fixed by switching to a new primitive that does the
    checks itself and returns ERR_PTR(-ENOENT) instead of a negative
    dentry. That way we get rid of boilerplate converting negatives
    into ERR_PTR(-ENOENT) in the callers and have a single place to
    deal with the barrier-related mess - inside fs/namei.c rather than
    in every caller out there.

    The guts of pathname resolution *do* need to be careful - the race
    found by Ritesh is real, as well as several similar races.
    Fortunately, it turns out that we can take care of that with fairly
    local changes in there.

    The tree-wide audit had not been fun, and I hate the idea of repeating
    it. I think the right approach would be to annotate the places where
    we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
    catch regressions. But I'm still not sure what would be the least
    invasive way of doing that and it's clearly the next cycle fodder"

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/namei.c: fix missing barriers when checking positivity
    fix dget_parent() fastpath race
    new helper: lookup_positive_unlocked()
    fs/namei.c: pull positivity check into follow_managed()

    Linus Torvalds
     

27 Nov, 2019

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - A comprehensive rewrite of the robust/PI futex code's exit handling
    to fix various exit races. (Thomas Gleixner et al)

    - Rework the generic REFCOUNT_FULL implementation using
    atomic_fetch_* operations so that the performance impact of the
    cmpxchg() loops is mitigated for common refcount operations.

    With these performance improvements the generic implementation of
    refcount_t should be good enough for everybody - and this got
    confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
    REFCOUNT_FULL entirely, leaving the generic implementation enabled
    unconditionally. (Will Deacon)

    - Other misc changes, fixes, cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    lkdtm: Remove references to CONFIG_REFCOUNT_FULL
    locking/refcount: Remove unused 'refcount_error_report()' function
    locking/refcount: Consolidate implementations of refcount_t
    locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
    locking/refcount: Move saturation warnings out of line
    locking/refcount: Improve performance of generic REFCOUNT_FULL code
    locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the header
    locking/refcount: Remove unused refcount_*_checked() variants
    locking/refcount: Ensure integer operands are treated as signed
    locking/refcount: Define constants for saturation and max refcount values
    futex: Prevent exit livelock
    futex: Provide distinct return value when owner is exiting
    futex: Add mutex around futex exit
    futex: Provide state handling for exec() as well
    futex: Sanitize exit state handling
    futex: Mark the begin of futex exit explicitly
    futex: Set task::futex_state to DEAD right after handling futex exit
    futex: Split futex_mm_release() for exit/exec
    exit/exec: Seperate mm_release()
    futex: Replace PF_EXITPIDONE with a state
    ...

    Linus Torvalds
     

16 Nov, 2019

1 commit

  • Most of the callers of lookup_one_len_unlocked() treat negatives as
    ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
    that a pinned positive dentry remains positive - its ->d_inode is
    stable, etc.; a pinned _negative_ dentry can become positive at any
    point as long as you are not holding its parent at least shared.
    So using lookup_one_len_unlocked() needs to be careful;
    lookup_positive_unlocked() is safer and that's what the callers
    end up open-coding anyway.

    Signed-off-by: Al Viro

    Al Viro
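
    The calling convention described above (negatives become
    ERR_PTR(-ENOENT)) can be illustrated with a userspace mock. Everything
    prefixed mock_ is a hypothetical stand-in, not kernel code; reference
    counting and the memory barriers that motivate the real helper are
    deliberately left out of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a dentry: a NULL d_inode means "negative". */
struct mock_dentry {
    void *d_inode;
};

/* Stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom. */
static inline void *mock_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline int mock_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-4095;
}

static inline long mock_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * Hypothetical wrapper around a lookup result: errors pass through,
 * negative dentries are mapped to -ENOENT, positives are returned as-is.
 */
static struct mock_dentry *mock_lookup_positive(struct mock_dentry *found)
{
    if (!mock_is_err(found) && found->d_inode == NULL)
        return mock_err_ptr(-ENOENT);
    return found;
}
```

    Callers then only need a single error check on the result instead of
    open-coding a separate test for a negative dentry.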
     

13 Nov, 2019

7 commits

  • Each kernfs_node is identified with a 64bit ID. The low 32bit is
    exposed as ino and the high gen. While this already allows using inos
    as keys by looking up with wildcard generation number of 0, it's
    adding unnecessary complications for 64bit ino archs which can
    directly use kernfs_node IDs as inos to uniquely identify each cgroup
    instance.

    This patch exposes IDs directly as inos on 64bit ino archs. The
    conversion is mostly straight-forward.

    * 32bit ino archs behave the same as before. 64bit ino archs now use
    the whole 64bit ID as ino and the generation number is fixed at 1.

    * 64bit inos still use the same idr allocator which guarantees that the
    lower 32bits identify the current live instance uniquely and the
    high 32bits are incremented whenever the low bits wrap. As the
    upper 32bits are no longer used as gen and we don't wanna start ino
    allocation with 33rd bit set, the initial value for highbits
    allocation is changed to 0 on 64bit ino archs.

    * blktrace exposes two 32bit numbers - (INO,GEN) pair - to identify
    the issuing cgroup. Userland builds FILEID_INO32_GEN fids from
    these numbers to look up the cgroups. To remain compatible with the
    behavior, always output (LOW32,HIGH32) which will be constructed
    back to the original 64bit ID by __kernfs_fh_to_dentry().

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
     
  • The current kernfs exportfs implementation uses the generic_fh_*()
    helpers and FILEID_INO32_GEN[_PARENT] which limits ino to 32bits.
    Let's implement custom exportfs operations and fid type to remove the
    restriction.

    * FILEID_KERNFS is a single u64 value whose content is
    kernfs_node->id. This is the only native fid type.

    * For backward compatibility with blk_log_action() path which exposes
    (ino,gen) pairs which userland assembles into FILEID_INO32_GEN keys,
    combine the generic keys into 64bit IDs in the same order.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
     
  • kernfs_find_and_get_node_by_ino() looks up the kernfs_node matching the
    specified ino. On top of that, kernfs_get_node_by_id() and
    kernfs_fh_get_inode() implement full ID matching by testing the rest
    of ID.

    On the surface, confusingly, the two are slightly different in that the
    latter uses 0 gen as wildcard while the former doesn't - does it mean
    that the latter can't uniquely identify inodes w/ 0 gen? In practice,
    this is a distinction without a difference because generation number
    starts at 1. There are no actual IDs with 0 gen, so it can always
    safely be used as a wildcard.

    Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
    to kernfs_find_and_get_node_by_id(), moving all lookup logics into it,
    and removing now unnecessary kernfs_get_node_by_id().

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_node->id is currently a union kernfs_node_id which represents
    either a 32bit (ino, gen) pair or u64 value. I can't see much value
    in the usage of the union - all that's needed is a 64bit ID which the
    current code is already limited to. Using a union makes the code
    unnecessarily complicated and prevents using 64bit ino without adding
    practical benefits.

    This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
    ino is stored in the lower 32bits and gen upper. Accessors -
    kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
    ino and gen. This makes ID handling less cumbersome and will
    allow using 64bit inos on supported archs.

    This patch doesn't make any functional changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim
    Cc: Jens Axboe
    Cc: Alexei Starovoitov

    Tejun Heo
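
    A minimal sketch of the u64 layout described above, with ino in the
    lower 32 bits and gen in the upper 32 bits. The sketch_* helpers are
    hypothetical names for illustration; they are not the kernfs accessors
    themselves.

```c
#include <stdint.h>

/* Pack a 32-bit ino (low half) and a 32-bit gen (high half) into one ID. */
static inline uint64_t sketch_make_id(uint32_t ino, uint32_t gen)
{
    return ((uint64_t)gen << 32) | ino;
}

/* Extract the ino: the lower 32 bits of the ID. */
static inline uint32_t sketch_id_ino(uint64_t id)
{
    return (uint32_t)id;
}

/* Extract the gen: the upper 32 bits of the ID. */
static inline uint32_t sketch_id_gen(uint64_t id)
{
    return (uint32_t)(id >> 32);
}
```

    For example, ino 0x12345678 with gen 3 packs to 0x0000000312345678, and
    both halves can be recovered from that single value.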
     
  • kernfs node can be created in two separate steps - allocation and
    activation. This is used to make kernfs nodes visible only after the
    internal states attached to the node are fully initialized.
    kernfs_find_and_get_node_by_id() currently allows lookups of nodes
    which aren't activated yet and thus can expose nodes which are
    still being prepped by kernfs users.

    Fix it by disallowing lookups of nodes which aren't activated yet.

    kernfs_find_and_get_node_by_ino()

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
     
  • kernfs_find_and_get_node_by_ino() uses RCU protection. It's currently
    a bit buggy because it can look up a node which hasn't been activated
    yet and thus may end up exposing a node that the kernfs user is still
    prepping.

    While it can be fixed by pushing it further in the current direction,
    it's already complicated and isn't clear whether the complexity is
    justified. The main use of kernfs_find_and_get_node_by_ino() is for
    exportfs operations. They aren't super hot and all the follow-up
    operations (e.g. mapping to path) use normal locking anyway.

    Let's switch to a dumber locking scheme and protect the lookup with
    kernfs_idr_lock.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
     
  • When the 32bit ino wraps around, kernfs increments the generation
    number to distinguish reused ino instances. The wrap-around detection
    tests whether the allocated ino is lower than the cursor, but the
    cursor points to the next ino to allocate, so the condition never
    triggers.

    Fix it by remembering the last ino and comparing against that.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Fixes: 4a3ef68acacf ("kernfs: implement i_generation")
    Cc: Namhyung Kim
    Cc: stable@vger.kernel.org # v4.14+

    Tejun Heo
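
    The fix ("remember the last ino and compare against that") can be
    sketched with a toy cyclic allocator. This is a simplified userspace
    model, not the idr-based kernel code: struct sketch_alloc and
    sketch_alloc_id() are hypothetical, inos are never freed or skipped,
    and max_ino is kept tiny so the wrap is easy to see.

```c
#include <stdint.h>

/* Toy cyclic ino allocator state (hypothetical). */
struct sketch_alloc {
    uint32_t next_cursor;  /* next ino to hand out */
    uint32_t last_ino;     /* last ino actually handed out */
    uint32_t gen;          /* generation number, starts at 1 */
    uint32_t max_ino;      /* largest ino before wrapping back to 1 */
};

/* Allocate the next ino and return it combined with the generation. */
static uint64_t sketch_alloc_id(struct sketch_alloc *a)
{
    uint32_t ino = a->next_cursor;

    /* Advance the cursor cyclically. */
    a->next_cursor = (ino >= a->max_ino) ? 1 : ino + 1;

    /* Wrap-around detection: the new ino is lower than the last one. */
    if (ino < a->last_ino)
        a->gen++;
    a->last_ino = ino;

    return ((uint64_t)a->gen << 32) | ino;
}
```

    With max_ino set to 3, the allocator hands out inos 1, 2, 3 under
    generation 1, and the next allocation reuses ino 1 under generation 2.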
     

09 Oct, 2019

1 commit

  • Since the following commit:

    b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release")

    @nested is no longer used in lock_release(), so remove it from all
    lock_release() calls and friends.

    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: airlied@linux.ie
    Cc: akpm@linux-foundation.org
    Cc: alexander.levin@microsoft.com
    Cc: daniel@iogearbox.net
    Cc: davem@davemloft.net
    Cc: dri-devel@lists.freedesktop.org
    Cc: duyuyang@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hannes@cmpxchg.org
    Cc: intel-gfx@lists.freedesktop.org
    Cc: jack@suse.com
    Cc: jlbec@evilplan.org
    Cc: joonas.lahtinen@linux.intel.com
    Cc: joseph.qi@linux.alibaba.com
    Cc: jslaby@suse.com
    Cc: juri.lelli@redhat.com
    Cc: maarten.lankhorst@linux.intel.com
    Cc: mark@fasheh.com
    Cc: mhocko@kernel.org
    Cc: mripard@kernel.org
    Cc: ocfs2-devel@oss.oracle.com
    Cc: rodrigo.vivi@intel.com
    Cc: sean@poorly.run
    Cc: st@kernel.org
    Cc: tj@kernel.org
    Cc: tytso@mit.edu
    Cc: vdavydov.dev@gmail.com
    Cc: vincent.guittot@linaro.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw
    Signed-off-by: Ingo Molnar

    Qian Cai
     

20 Sep, 2019

1 commit

  • Pull y2038 vfs updates from Arnd Bergmann:
    "Add inode timestamp clamping.

    This series from Deepa Dinamani adds a per-superblock minimum/maximum
    timestamp limit for a file system, and clamps timestamps as they are
    written, to avoid random behavior from integer overflow as well as
    having different time stamps on disk vs in memory.

    At mount time, a warning is now printed for any file system that can
    represent current timestamps but not future timestamps more than 30
    years into the future, similar to the arbitrary 30 year limit that was
    added to settimeofday().

    This was picked as a compromise to warn users to migrate to other file
    systems (e.g. ext4 instead of ext3) when they need the file system to
    survive beyond 2038 (or similar limits in other file systems), but not
    get in the way of normal usage"

    * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    ext4: Reduce ext4 timestamp warnings
    isofs: Initialize filesystem timestamp ranges
    pstore: fs superblock limits
    fs: omfs: Initialize filesystem timestamp ranges
    fs: hpfs: Initialize filesystem timestamp ranges
    fs: ceph: Initialize filesystem timestamp ranges
    fs: sysv: Initialize filesystem timestamp ranges
    fs: affs: Initialize filesystem timestamp ranges
    fs: fat: Initialize filesystem timestamp ranges
    fs: cifs: Initialize filesystem timestamp ranges
    fs: nfs: Initialize filesystem timestamp ranges
    ext4: Initialize timestamps limits
    9p: Fill min and max timestamps in sb
    fs: Fill in max and min timestamps in superblock
    utimes: Clamp the timestamps before update
    mount: Add mount warning for impending timestamp expiry
    timestamp_truncate: Replace users of timespec64_trunc
    vfs: Add timestamp_truncate() api
    vfs: Add file timestamp range support

    Linus Torvalds
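
    The clamping idea from the series above can be sketched as follows. The
    struct and function names are hypothetical, only whole seconds are
    modelled, and the limits used in any example are illustrative rather
    than those of a particular file system.

```c
#include <stdint.h>

/* Hypothetical per-superblock timestamp range, in seconds. */
struct sketch_sb_limits {
    int64_t time_min;
    int64_t time_max;
};

/* Clamp a timestamp into the range the file system can represent. */
static int64_t sketch_clamp_seconds(int64_t t,
                                    const struct sketch_sb_limits *sb)
{
    if (t < sb->time_min)
        return sb->time_min;
    if (t > sb->time_max)
        return sb->time_max;
    return t;
}
```

    Clamping when the timestamp is written keeps the in-memory value equal
    to what the on-disk format can hold, instead of letting an out-of-range
    value overflow.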
     

30 Aug, 2019

1 commit

  • Update the inode timestamp updates to use timestamp_truncate()
    instead of timespec64_trunc().

    The change was mostly generated by the following coccinelle
    script.

    virtual context
    virtual patch

    @r1 depends on patch forall@
    struct inode *inode;
    identifier i_xtime =~ "^i_[acm]time$";
    expression e;
    @@

    inode->i_xtime =
    - timespec64_trunc(
    + timestamp_truncate(
    ...,
    - e);
    + inode);

    Signed-off-by: Deepa Dinamani
    Acked-by: Greg Kroah-Hartman
    Acked-by: Jeff Layton
    Cc: adrian.hunter@intel.com
    Cc: dedekind1@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hch@lst.de
    Cc: jaegeuk@kernel.org
    Cc: jlbec@evilplan.org
    Cc: richard@nod.at
    Cc: tj@kernel.org
    Cc: yuchao0@huawei.com
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: linux-mtd@lists.infradead.org

    Deepa Dinamani
     

25 Jul, 2019

2 commits

  • In kernfs_path_from_node_locked(), there is an if statement on line 147
    to check whether buf is NULL:
    if (buf)

    When buf is NULL, it is used on line 151:
    len += strlcpy(buf + len, parent_str, ...)
    and line 158:
    len += strlcpy(buf + len, "/", ...)
    and line 160:
    len += strlcpy(buf + len, kn->name, ...)

    Thus, possible null-pointer dereferences may occur.

    To fix these possible bugs, buf is checked before being used.
    If it is NULL, -EINVAL is returned.

    These bugs are found by a static analysis tool STCheck written by us.

    Signed-off-by: Jia-Ju Bai
    Link: https://lore.kernel.org/r/20190724022242.27505-1-baijiaju1990@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Jia-Ju Bai
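
    The shape of the fix (check buf before any use and return -EINVAL when
    it is NULL) can be sketched in userspace C. sketch_build_path() is a
    hypothetical, much simplified stand-in for the kernel function, and
    snprintf() is swapped in for strlcpy() because strlcpy() is not
    available in every userspace C library.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<parent>/<name>" into buf.
 * Returns the length the full path needs (excluding the NUL), or
 * -EINVAL if buf is NULL. Output is truncated if buflen is too small.
 */
static int sketch_build_path(char *buf, size_t buflen,
                             const char *parent, const char *name)
{
    /* Reject a NULL buffer up front, before any write through it. */
    if (!buf)
        return -EINVAL;

    return snprintf(buf, buflen, "%s/%s", parent, name);
}
```

    A caller passing a NULL buffer now gets an error code back rather than
    triggering a write through a NULL pointer.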
     
  • Get root safely after kn is ensured to be not null.

    Signed-off-by: Peng Wang
    Acked-by: Tejun Heo
    Link: https://lore.kernel.org/r/20190708151611.13242-1-rocking@whu.edu.cn
    Signed-off-by: Greg Kroah-Hartman

    Peng Wang
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


08 May, 2019

2 commits

  • Pull misc dcache updates from Al Viro:
    "Most of this pile is putting name length into struct name_snapshot and
    making use of it.

    The beginning of this series ("ovl_lookup_real_one(): don't bother
    with strlen()") ought to have been split in two (separate switch of
    name_snapshot to struct qstr from overlayfs reaping the trivial
    benefits of that), but I wanted to avoid a rebase - by the time I'd
    spotted that it was (a) in -next and (b) close to 5.1-final ;-/"

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    audit_compare_dname_path(): switch to const struct qstr *
    audit_update_watch(): switch to const struct qstr *
    inotify_handle_event(): don't bother with strlen()
    fsnotify: switch send_to_group() and ->handle_event to const struct qstr *
    fsnotify(): switch to passing const struct qstr * for file_name
    switch fsnotify_move() to passing const struct qstr * for old_name
    ovl_lookup_real_one(): don't bother with strlen()
    sysv: bury the broken "quietly truncate the long filenames" logics
    nsfs: unobfuscate
    unexport d_alloc_pseudo()

    Linus Torvalds
     
  • Pull selinux updates from Paul Moore:
    "We've got a few SELinux patches for the v5.2 merge window, the
    highlights are below:

    - Add LSM hooks, and the SELinux implementation, for proper labeling
    of kernfs. While we are only including the SELinux implementation
    here, the rest of the LSM folks have given the hooks a thumbs-up.

    - Update the SELinux mdp (Make Dummy Policy) script to actually work
    on a modern system.

    - Disallow userspace to change the LSM credentials via
    /proc/self/attr when the task's credentials are already overridden.

    The change was made in procfs because all the LSM folks agreed this
    was the Right Thing To Do and duplicating it across each LSM was
    going to be annoying"

    * tag 'selinux-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    proc: prevent changes to overridden credentials
    selinux: Check address length before reading address family
    kernfs: fix xattr name handling in LSM helpers
    MAINTAINERS: update SELinux file patterns
    selinux: avoid uninitialized variable warning
    selinux: remove useless assignments
    LSM: lsm_hooks.h - fix missing colon in docstring
    selinux: Make selinux_kernfs_init_security static
    kernfs: initialize security of newly created nodes
    selinux: implement the kernfs_init_security hook
    LSM: add new hook for kernfs node initialization
    kernfs: use simple_xattrs for security attributes
    selinux: try security xattr after genfs for kernfs filesystems
    kernfs: do not alloc iattrs in kernfs_xattr_get
    kernfs: clean up struct kernfs_iattrs
    scripts/selinux: fix build
    selinux: use kernel linux/socket.h for genheaders and mdp
    scripts/selinux: modernize mdp

    Linus Torvalds
     

27 Apr, 2019

1 commit

  • Note that in fsnotify_move() and fsnotify_link() we are guaranteed
    that dentry->d_name won't change during the fsnotify() evaluation
    (by having the parent directory locked exclusive), so we don't
    need to fetch dentry->d_name.name in the callers. In fsnotify_dirent()
    the same stability of dentry->d_name is also true, but it's a bit
    more convoluted - there is one callchain (devpts_pty_new() ->
    fsnotify_create() -> fsnotify_dirent()) where the parent is _not_
    locked, but on devpts ->d_name of everything is unchanging; it
    has neither explicit nor implicit renames.

    Signed-off-by: Al Viro

    Al Viro
     

26 Apr, 2019

1 commit

  • smp_mb__before_atomic() can not be applied to atomic_set(). Remove the
    barrier and rely on RELEASE synchronization.

    Fixes: ba16b2846a8c6 ("kernfs: add an API to get kernfs node from inode number")
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrea Parri
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Andrea Parri
     

04 Apr, 2019

1 commit

  • The implementation of kernfs_security_xattr_*() helpers reuses the
    kernfs_node_xattr_*() functions, which take the suffix of the xattr name
    and extract full xattr name from it using xattr_full_name(). However,
    this function relies on the fact that the suffix passed to xattr
    handlers from VFS is always constructed from the full name by just
    incrementing the pointer. This doesn't necessarily hold for the callers
    of kernfs_security_xattr_*(), so their usage will easily lead to
    out-of-bounds access.

    Fix this by moving the xattr name reconstruction to the VFS xattr
    handlers and replacing the kernfs_security_xattr_*() helpers with more
    general kernfs_xattr_*() helpers that take full xattr name and allow
    accessing all kernfs node's xattrs.

    Reported-by: kernel test robot
    Fixes: b230d5aba2d1 ("LSM: add new hook for kernfs node initialization")
    Fixes: ec882da5cda9 ("selinux: implement the kernfs_init_security hook")
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
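
    The pointer assumption described above can be illustrated with a small
    userspace sketch. sketch_full_name() is a hypothetical stand-in for the
    idea of recovering the full name by stepping back over the prefix; it is
    not the kernel's xattr_full_name() itself.

```c
#include <string.h>

/*
 * Recover the full xattr name from its suffix by stepping back over the
 * prefix. This is only valid when `suffix` points into a buffer that
 * really begins with `prefix`; for any other pointer the result lies
 * outside the object and must not be dereferenced.
 */
static const char *sketch_full_name(const char *suffix, const char *prefix)
{
    return suffix - strlen(prefix);
}
```

    When the suffix was produced as full + strlen("security."), stepping
    back yields the original full name. A caller that passes a standalone
    suffix string breaks that assumption, which is the out-of-bounds access
    the commit message warns about.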
     

21 Mar, 2019

5 commits

  • Use the new security_kernfs_init_security() hook to allow LSMs to
    possibly assign a non-default security context to a newly created kernfs
    node based on the attributes of the new node and also its parent node.

    This fixes an issue with cgroupfs under SELinux, where newly created
    cgroup subdirectories/files would not inherit their parent's context if
    it had been set explicitly to a non-default value (other than the genfs
    context specified by the policy). This can be reproduced as follows (on
    Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.procs
    -r--r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.stat
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:21 system.slice
    drwxr-xr-x. 3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x. 3 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

    Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

    Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

    Link: https://github.com/SELinuxProject/selinux-kernel/issues/39

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • This patch introduces a new security hook that is intended for
    initializing the security data for newly created kernfs nodes, which
    provide a way of storing a non-default security context, but need to
    operate independently from mounts (and therefore may not have an
    associated inode at the moment of creation).

    The main motivation is to allow kernfs nodes to inherit the context of
    the parent under SELinux, similar to the behavior of
    security_inode_init_security(). Other LSMs may implement their own logic
    for handling the creation of new nodes.

    This patch also adds helper functions for
    getting/setting security xattrs of a kernfs node so that LSM hooks are
    able to do their job. Other important attributes should be accessible
    directly in the kernfs_node fields (in case there is need for more, then
    new helpers should be added to kernfs.h along with the patch that needs
    them).

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: more manual merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
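
The inheritance idea behind this hook can be illustrated with a small userspace sketch. Everything here is hypothetical: `struct toy_node` and `toy_node_init_security()` are stand-ins invented for illustration, not the kernel's kernfs or LSM interfaces. The sketch simply copies the parent's label at creation time, before any inode exists; a real LSM may compute the label differently (the differing user field in the expected result above suggests that more than the parent's label is involved).

```c
#include <stdio.h>

/* Hypothetical stand-in for a kernfs node: only the fields this sketch needs. */
struct toy_node {
    const struct toy_node *parent;
    char secctx[64];            /* security label; empty string means "default" */
};

/*
 * Hypothetical init hook, called when a node is created and no inode is
 * available yet. Copies the parent's label if the parent has a non-default
 * one. Returns 0 if a label was inherited, -1 if there was nothing to inherit.
 */
static int toy_node_init_security(const struct toy_node *parent,
                                  struct toy_node *child)
{
    child->parent = parent;
    child->secctx[0] = '\0';

    if (parent == NULL || parent->secctx[0] == '\0')
        return -1;

    /* snprintf always NUL-terminates and never overruns the buffer. */
    snprintf(child->secctx, sizeof(child->secctx), "%s", parent->secctx);
    return 0;
}
```

In this model, a child created under a parent labeled `container_file_t` ends up with that label, while a child of an unlabeled parent keeps the default (empty) one.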
     
  • Replace the special handling of security xattrs with simple_xattrs, as
    is already done for the trusted xattrs. This simplifies the code and
    allows LSMs to use more than just a single xattr to do their business.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: manual merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
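
As a rough illustration of why a generic name/value store lets an LSM keep more than one xattr per node, here is a minimal userspace sketch. The types and functions (`toy_xattrs`, `toy_xattr_set()`, `toy_xattr_get()`) are hypothetical and are not the kernel's simple_xattrs API; a fixed-size array is assumed purely to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_XATTRS 8

/* One name/value pair; 'used' marks whether the slot holds an entry. */
struct toy_xattr {
    char name[32];
    char value[64];
    int used;
};

/* Hypothetical per-node store; zero-initialize before first use. */
struct toy_xattrs {
    struct toy_xattr slots[TOY_MAX_XATTRS];
};

/* Set or overwrite an xattr. Returns 0 on success, -1 if it does not fit. */
static int toy_xattr_set(struct toy_xattrs *xs, const char *name,
                         const char *value)
{
    struct toy_xattr *free_slot = NULL;

    if (strlen(name) >= sizeof(xs->slots[0].name) ||
        strlen(value) >= sizeof(xs->slots[0].value))
        return -1;

    for (size_t i = 0; i < TOY_MAX_XATTRS; i++) {
        struct toy_xattr *x = &xs->slots[i];

        if (x->used && strcmp(x->name, name) == 0) {
            strcpy(x->value, value);    /* length checked above */
            return 0;
        }
        if (!x->used && free_slot == NULL)
            free_slot = x;
    }
    if (free_slot == NULL)
        return -1;                      /* store is full */

    strcpy(free_slot->name, name);
    strcpy(free_slot->value, value);
    free_slot->used = 1;
    return 0;
}

/* Look up an xattr by name. Returns its value, or NULL if it is not set. */
static const char *toy_xattr_get(const struct toy_xattrs *xs, const char *name)
{
    for (size_t i = 0; i < TOY_MAX_XATTRS; i++)
        if (xs->slots[i].used && strcmp(xs->slots[i].name, name) == 0)
            return xs->slots[i].value;
    return NULL;
}
```

Because entries are keyed by name, two different security xattrs can coexist on the same node, which a single dedicated field could not express.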
     
  • This is a read-only operation, so we can simply return -ENODATA if
    kn->iattr is NULL.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: minor merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
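
The pattern described here, where a read path reports "no data" instead of allocating the attribute structure, might look like the following userspace sketch. The struct layouts and the function `toy_read_secctx()` are hypothetical and only mirror the `kn->iattr` naming from the message; they are not the kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, lazily allocated attribute block. */
struct toy_iattrs {
    const char *secctx;
};

/* Hypothetical node; iattr stays NULL until something is written. */
struct toy_kn {
    struct toy_iattrs *iattr;
};

/*
 * Read-only lookup: if the attribute block was never allocated there is
 * nothing to return, so report -ENODATA rather than allocating it.
 * On success, stores the label in *out and returns 0.
 */
static int toy_read_secctx(const struct toy_kn *kn, const char **out)
{
    if (kn->iattr == NULL || kn->iattr->secctx == NULL)
        return -ENODATA;

    *out = kn->iattr->secctx;
    return 0;
}
```

Only writers need to allocate the block; readers of a node that was never modified take the early return.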
     
  • Right now, kernfs_iattrs embeds the whole struct iattr, even though it
    doesn't really use half of its fields... This both leads to wasting
    space and makes the code look awkward. Let's just list the few fields
    we need directly in struct kernfs_iattrs.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: merged a number of chunks manually due to fuzz]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
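
The space argument can be sketched in userspace by comparing a struct that embeds a general-purpose attribute block with one that lists only the fields it uses. All types below are toy stand-ins, and the choice of fields kept in the slim variant is an assumption made for illustration, not a statement of the actual kernel layout.

```c
#include <stddef.h>
#include <time.h>

/* Toy stand-in for a general-purpose attribute struct with many fields. */
struct toy_iattr {
    unsigned int valid;
    unsigned short mode;
    unsigned int uid, gid;
    long long size;
    struct timespec atime, mtime, ctime;
    void *file;
};

/* Before: embed the whole struct even though some fields go unused. */
struct toy_attrs_embedded {
    struct toy_iattr ia;
    void *xattrs;
};

/* After: list only the fields assumed to be needed in this sketch. */
struct toy_attrs_slim {
    unsigned int uid, gid;
    struct timespec atime, mtime, ctime;
    void *xattrs;
};

/* Bytes saved per node by switching to the slim layout. */
static size_t toy_bytes_saved(void)
{
    return sizeof(struct toy_attrs_embedded) - sizeof(struct toy_attrs_slim);
}
```

The exact saving depends on the platform's type sizes and padding, so the sketch computes it with `sizeof` rather than stating a number.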
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logic), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows followup work to be done in independent branches,
    which should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows
    shrinking it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

07 Mar, 2019

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the big driver core patchset for 5.1-rc1

    More patches than "normal" here this merge window, due to some work in
    the driver core by Alexander Duyck to rework the async probe
    functionality to work better for a number of devices, and independent
    work from Rafael for the device link functionality to make it work
    "correctly".

    Also in here is:

    - lots of BUS_ATTR() removals, the macro is about to go away

    - firmware test fixups

    - ihex fixups and simplification

    - component additions (also includes i915 patches)

    - lots of minor coding style fixups and cleanups.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
    driver core: platform: remove misleading err_alloc label
    platform: set of_node in platform_device_register_full()
    firmware: hardcode the debug message for -ENOENT
    driver core: Add missing description of new struct device_link field
    driver core: Fix PM-runtime for links added during consumer probe
    drivers/component: kerneldoc polish
    async: Add cmdline option to specify drivers to be async probed
    driver core: Fix possible supplier PM-usage counter imbalance
    PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
    driver: platform: Support parsing GpioInt 0 in platform_get_irq()
    selftests: firmware: fix verify_reqs() return value
    Revert "selftests: firmware: remove use of non-standard diff -Z option"
    Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
    device: Fix comment for driver_data in struct device
    kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
    sysfs: remove unused include of kernfs-internal.h
    driver core: Postpone DMA tear-down until after devres release
    driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
    PM-runtime: Take suppliers into account in __pm_runtime_set_status()
    device.h: Add __cold to dev_ logging functions
    ...

    Linus Torvalds