19 Aug, 2020

1 commit

  • [ Upstream commit 9991bb84b27a2594187898f261866cfc50255454 ]

    When creating an FS_MODIFY event on inode itself (not on parent)
    the file_name argument should be NULL.

    The change to send a non NULL name to inode itself was done on purpose
    as part of another commit, as Tejun writes: "...While at it, supply the
    target file name to fsnotify() from kernfs_node->name.".

    But this is wrong practice and inconsistent with inotify behavior when
    watching a single file. When a child is being watched (as opposed to the
    parent directory) the inotify event should contain the watch descriptor,
    but not the file name.

    Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...")
    Link: https://lore.kernel.org/r/20200708111156.24659-5-amir73il@gmail.com
    Acked-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara
    Signed-off-by: Sasha Levin

    Amir Goldstein
     

13 Dec, 2019

1 commit

  • commit e23f568aa63f64cd6b355094224cc9356c0f696b upstream.

    When the 32bit ino wraps around, kernfs increments the generation
    number to distinguish reused ino instances. The wrap-around detection
    tests whether the allocated ino is lower than the cursor, but the
    cursor points to the next ino to allocate, so the condition never
    triggers.

    Fix it by remembering the last ino and comparing against that.
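
As a rough illustration of the fix described above, here is a minimal user-space sketch of a cyclic ino allocator that detects wrap-around by comparing against the last allocated ino. The struct and function names (`ino_alloc`, `ino_alloc_init`, `ino_alloc_get`) are hypothetical and the model is much simpler than the kernel's IDR-based allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator state; not the kernel structure. */
struct ino_alloc {
	uint32_t next;       /* cursor: the next ino that will be handed out */
	uint32_t max;        /* largest ino before the cursor wraps back to 1 */
	uint32_t last_ino;   /* most recently allocated ino (0 = none yet) */
	uint32_t generation; /* bumped each time the ino space wraps */
};

static void ino_alloc_init(struct ino_alloc *a, uint32_t max)
{
	assert(max >= 2); /* a wrap is only observable with at least two inos */
	a->next = 1;
	a->max = max;
	a->last_ino = 0;
	a->generation = 1;
}

/* Hand out the next ino cyclically and bump the generation on wrap-around. */
static uint32_t ino_alloc_get(struct ino_alloc *a)
{
	uint32_t ino = a->next;

	a->next = (ino == a->max) ? 1 : ino + 1;

	/*
	 * In this simplified model the cursor read before allocating equals
	 * the ino handed out, so "ino < cursor" could never be true. Comparing
	 * against the previous ino is what reveals the wrap.
	 */
	if (ino < a->last_ino)
		a->generation++;
	a->last_ino = ino;
	return ino;
}
```

With `max` set to 3, the fourth call returns ino 1 again and the generation moves from 1 to 2, so the pair (ino, generation) stays unique across the wrap.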

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Fixes: 4a3ef68acacf ("kernfs: implement i_generation")
    Cc: Namhyung Kim
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

20 Sep, 2019

1 commit

  • Pull y2038 vfs updates from Arnd Bergmann:
    "Add inode timestamp clamping.

    This series from Deepa Dinamani adds a per-superblock minimum/maximum
    timestamp limit for a file system, and clamps timestamps as they are
    written, to avoid random behavior from integer overflow as well as
    having different time stamps on disk vs in memory.

    At mount time, a warning is now printed for any file system that can
    represent current timestamps but not future timestamps more than 30
    years into the future, similar to the arbitrary 30 year limit that was
    added to settimeofday().

    This was picked as a compromise to warn users to migrate to other file
    systems (e.g. ext4 instead of ext3) when they need the file system to
    survive beyond 2038 (or similar limits in other file systems), but not
    get in the way of normal usage"
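
The clamping and the mount-time warning described in this message can be sketched in user space as follows. This is only an illustration under stated assumptions: `ts64`, `sb_limits`, `clamp_timestamp()` and `should_warn_at_mount()` are hypothetical names, and zeroing the sub-second part when clamping is a choice made for this sketch, not something the message specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; field names are illustrative, not the kernel's. */
struct ts64 {
	int64_t sec;
	long nsec;
};

struct sb_limits {
	int64_t time_min; /* earliest representable second */
	int64_t time_max; /* latest representable second */
};

#define THIRTY_YEARS_SEC ((int64_t)30 * 365 * 24 * 60 * 60)

/* Clamp a timestamp into the range the file system can store. */
static struct ts64 clamp_timestamp(struct ts64 t, const struct sb_limits *sb)
{
	if (t.sec < sb->time_min) {
		t.sec = sb->time_min;
		t.nsec = 0; /* sketch's choice: drop sub-seconds when clamping */
	} else if (t.sec > sb->time_max) {
		t.sec = sb->time_max;
		t.nsec = 0;
	}
	return t;
}

/* True if "now" is representable but "now + 30 years" is not. */
static bool should_warn_at_mount(int64_t now, const struct sb_limits *sb)
{
	if (now < sb->time_min || now > sb->time_max)
		return false;
	/* time_max >= now here, so the unsigned difference cannot wrap. */
	return (uint64_t)sb->time_max - (uint64_t)now <
	       (uint64_t)THIRTY_YEARS_SEC;
}
```

For a file system limited to signed 32-bit seconds, a timestamp in the year 2065 would be stored as the maximum representable second, and a mount in 2020 would trigger the warning because fewer than 30 years of range remain.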

    * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    ext4: Reduce ext4 timestamp warnings
    isofs: Initialize filesystem timestamp ranges
    pstore: fs superblock limits
    fs: omfs: Initialize filesystem timestamp ranges
    fs: hpfs: Initialize filesystem timestamp ranges
    fs: ceph: Initialize filesystem timestamp ranges
    fs: sysv: Initialize filesystem timestamp ranges
    fs: affs: Initialize filesystem timestamp ranges
    fs: fat: Initialize filesystem timestamp ranges
    fs: cifs: Initialize filesystem timestamp ranges
    fs: nfs: Initialize filesystem timestamp ranges
    ext4: Initialize timestamps limits
    9p: Fill min and max timestamps in sb
    fs: Fill in max and min timestamps in superblock
    utimes: Clamp the timestamps before update
    mount: Add mount warning for impending timestamp expiry
    timestamp_truncate: Replace users of timespec64_trunc
    vfs: Add timestamp_truncate() api
    vfs: Add file timestamp range support

    Linus Torvalds
     

30 Aug, 2019

1 commit

  • Update the inode timestamp updates to use timestamp_truncate()
    instead of timespec64_trunc().

    The change was mostly generated by the following coccinelle
    script.

    virtual context
    virtual patch

    @r1 depends on patch forall@
    struct inode *inode;
    identifier i_xtime =~ "^i_[acm]time$";
    expression e;
    @@

    inode->i_xtime =
    - timespec64_trunc(
    + timestamp_truncate(
    ...,
    - e);
    + inode);

    Signed-off-by: Deepa Dinamani
    Acked-by: Greg Kroah-Hartman
    Acked-by: Jeff Layton
    Cc: adrian.hunter@intel.com
    Cc: dedekind1@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hch@lst.de
    Cc: jaegeuk@kernel.org
    Cc: jlbec@evilplan.org
    Cc: richard@nod.at
    Cc: tj@kernel.org
    Cc: yuchao0@huawei.com
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: linux-mtd@lists.infradead.org

    Deepa Dinamani
     

25 Jul, 2019

2 commits

  • In kernfs_path_from_node_locked(), there is an if statement on line 147
    to check whether buf is NULL:
    if (buf)

    When buf is NULL, it is used on line 151:
    len += strlcpy(buf + len, parent_str, ...)
    and line 158:
    len += strlcpy(buf + len, "/", ...)
    and line 160:
    len += strlcpy(buf + len, kn->name, ...)

    Thus, possible null-pointer dereferences may occur.

    To fix these possible bugs, buf is checked before being used.
    If it is NULL, -EINVAL is returned.
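
The pattern of the fix (validate `buf` before any `buf + len` arithmetic) can be sketched like this. `join_path()` is a hypothetical helper for illustration and is not the kernfs function; it uses `snprintf()` rather than the `strlcpy()` calls quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernfs function: write "parent/name" into buf.
 * Returns the length the full path needs, or -EINVAL if buf is NULL.
 */
static int join_path(char *buf, size_t buflen, const char *parent,
		     const char *name)
{
	if (!buf)
		return -EINVAL; /* reject before any "buf + len" arithmetic */

	return snprintf(buf, buflen, "%s/%s", parent, name);
}
```

Returning an error up front keeps every later use of the buffer free of NULL checks.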

    These bugs were found by STCheck, a static analysis tool written by us.

    Signed-off-by: Jia-Ju Bai
    Link: https://lore.kernel.org/r/20190724022242.27505-1-baijiaju1990@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Jia-Ju Bai
     
  • Get root safely after kn is ensured to be not null.

    Signed-off-by: Peng Wang
    Acked-by: Tejun Heo
    Link: https://lore.kernel.org/r/20190708151611.13242-1-rocking@whu.edu.cn
    Signed-off-by: Greg Kroah-Hartman

    Peng Wang
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


08 May, 2019

2 commits

  • Pull misc dcache updates from Al Viro:
    "Most of this pile is putting name length into struct name_snapshot and
    making use of it.

    The beginning of this series ("ovl_lookup_real_one(): don't bother
    with strlen()") ought to have been split in two (separate switch of
    name_snapshot to struct qstr from overlayfs reaping the trivial
    benefits of that), but I wanted to avoid a rebase - by the time I'd
    spotted that it was (a) in -next and (b) close to 5.1-final ;-/"

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    audit_compare_dname_path(): switch to const struct qstr *
    audit_update_watch(): switch to const struct qstr *
    inotify_handle_event(): don't bother with strlen()
    fsnotify: switch send_to_group() and ->handle_event to const struct qstr *
    fsnotify(): switch to passing const struct qstr * for file_name
    switch fsnotify_move() to passing const struct qstr * for old_name
    ovl_lookup_real_one(): don't bother with strlen()
    sysv: bury the broken "quietly truncate the long filenames" logics
    nsfs: unobfuscate
    unexport d_alloc_pseudo()

    Linus Torvalds
     
  • Pull selinux updates from Paul Moore:
    "We've got a few SELinux patches for the v5.2 merge window, the
    highlights are below:

    - Add LSM hooks, and the SELinux implementation, for proper labeling
    of kernfs. While we are only including the SELinux implementation
    here, the rest of the LSM folks have given the hooks a thumbs-up.

    - Update the SELinux mdp (Make Dummy Policy) script to actually work
    on a modern system.

    - Disallow userspace to change the LSM credentials via
    /proc/self/attr when the task's credentials are already overridden.

    The change was made in procfs because all the LSM folks agreed this
    was the Right Thing To Do and duplicating it across each LSM was
    going to be annoying"

    * tag 'selinux-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    proc: prevent changes to overridden credentials
    selinux: Check address length before reading address family
    kernfs: fix xattr name handling in LSM helpers
    MAINTAINERS: update SELinux file patterns
    selinux: avoid uninitialized variable warning
    selinux: remove useless assignments
    LSM: lsm_hooks.h - fix missing colon in docstring
    selinux: Make selinux_kernfs_init_security static
    kernfs: initialize security of newly created nodes
    selinux: implement the kernfs_init_security hook
    LSM: add new hook for kernfs node initialization
    kernfs: use simple_xattrs for security attributes
    selinux: try security xattr after genfs for kernfs filesystems
    kernfs: do not alloc iattrs in kernfs_xattr_get
    kernfs: clean up struct kernfs_iattrs
    scripts/selinux: fix build
    selinux: use kernel linux/socket.h for genheaders and mdp
    scripts/selinux: modernize mdp

    Linus Torvalds
     

27 Apr, 2019

1 commit

  • Note that in fsnotify_move() and fsnotify_link() we are guaranteed
    that dentry->d_name won't change during the fsnotify() evaluation
    (by having the parent directory locked exclusive), so we don't
    need to fetch dentry->d_name.name in the callers. In fsnotify_dirent()
    the same stability of dentry->d_name is also true, but it's a bit
    more convoluted - there is one callchain (devpts_pty_new() ->
    fsnotify_create() -> fsnotify_dirent()) where the parent is _not_
    locked, but on devpts ->d_name of everything is unchanging; it
    has neither explicit nor implicit renames.

    Signed-off-by: Al Viro

    Al Viro
     

26 Apr, 2019

1 commit

  • smp_mb__before_atomic() can not be applied to atomic_set(). Remove the
    barrier and rely on RELEASE synchronization.
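
A user-space analogy of relying on RELEASE ordering, written with C11 atomics rather than the kernel's atomic API. The `node` struct and both functions are hypothetical; the point is only that a release store to the counter, paired with an acquire load on the reader side, makes the earlier plain write visible without a separate barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical node; only the publish/lookup ordering matters here. */
struct node {
	int ino;          /* plain field that must be visible before count */
	atomic_int count; /* 0 means "not yet published" */
};

/* Writer: fill in the fields, then publish with a release store. */
static void node_publish(struct node *n, int ino)
{
	n->ino = ino;
	atomic_store_explicit(&n->count, 1, memory_order_release);
}

/* Reader: an acquire load that sees count != 0 also sees ino. */
static bool node_lookup(struct node *n, int *ino)
{
	if (atomic_load_explicit(&n->count, memory_order_acquire) == 0)
		return false;
	*ino = n->ino;
	return true;
}
```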

    Fixes: ba16b2846a8c6 ("kernfs: add an API to get kernfs node from inode number")
    Cc: stable@vger.kernel.org
    Signed-off-by: Andrea Parri
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Andrea Parri
     

04 Apr, 2019

1 commit

  • The implementation of kernfs_security_xattr_*() helpers reuses the
    kernfs_node_xattr_*() functions, which take the suffix of the xattr name
    and extract full xattr name from it using xattr_full_name(). However,
    this function relies on the fact that the suffix passed to xattr
    handlers from VFS is always constructed from the full name by just
    incrementing the pointer. This doesn't necessarily hold for the callers
    of kernfs_security_xattr_*(), so their usage will easily lead to
    out-of-bounds access.

    Fix this by moving the xattr name reconstruction to the VFS xattr
    handlers and replacing the kernfs_security_xattr_*() helpers with more
    general kernfs_xattr_*() helpers that take full xattr name and allow
    accessing all kernfs node's xattrs.
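
To illustrate why helpers that take the full name are safer, here is a sketch of building a full xattr name by copying instead of stepping a pointer backwards from the suffix. `xattr_build_full_name()` is a hypothetical helper and is not part of the kernel API; it only shows one way a caller that holds just a suffix could produce a valid full name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "prefix" + "suffix" into buf by copying.
 * Unlike stepping a pointer backwards from the suffix, this is valid for any
 * suffix string, wherever it lives in memory.
 */
static int xattr_build_full_name(char *buf, size_t size, const char *prefix,
				 const char *suffix)
{
	int n = snprintf(buf, size, "%s%s", prefix, suffix);

	if (n < 0 || (size_t)n >= size)
		return -ERANGE; /* would not fit, including the NUL */
	return n;
}
```

Pointer arithmetic from the suffix is only valid when the suffix really points into a buffer that starts with the prefix, which is the assumption the message says does not hold for all callers.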

    Reported-by: kernel test robot
    Fixes: b230d5aba2d1 ("LSM: add new hook for kernfs node initialization")
    Fixes: ec882da5cda9 ("selinux: implement the kernfs_init_security hook")
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

21 Mar, 2019

5 commits

  • Use the new security_kernfs_init_security() hook to allow LSMs to
    possibly assign a non-default security context to a newly created kernfs
    node based on the attributes of the new node and also its parent node.

    This fixes an issue with cgroupfs under SELinux, where newly created
    cgroup subdirectories/files would not inherit its parent's context if
    it had been set explicitly to a non-default value (other than the genfs
    context specified by the policy). This can be reproduced as follows (on
    Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.procs
    -r--r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.stat
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:21 system.slice
    drwxr-xr-x. 3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x. 3 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

    Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

    Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

    Link: https://github.com/SELinuxProject/selinux-kernel/issues/39

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • This patch introduces a new security hook that is intended for
    initializing the security data for newly created kernfs nodes, which
    provide a way of storing a non-default security context, but need to
    operate independently from mounts (and therefore may not have an
    associated inode at the moment of creation).

    The main motivation is to allow kernfs nodes to inherit the context of
    the parent under SELinux, similar to the behavior of
    security_inode_init_security(). Other LSMs may implement their own logic
    for handling the creation of new nodes.

    This patch also adds helper functions for getting/setting security
    xattrs of a kernfs node so that LSM hooks are able to do their job.
    Other important attributes should be accessible directly in the
    kernfs_node fields (in case there is need for more, then
    new helpers should be added to kernfs.h along with the patch that needs
    them).

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: more manual merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • Replace the special handling of security xattrs with simple_xattrs, as
    is already done for the trusted xattrs. This simplifies the code and
    allows LSMs to use more than just a single xattr to do their business.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: manual merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • This is a read-only operation, so we can simply return -ENODATA if
    kn->iattr is NULL.
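
The idea can be sketched with heavily simplified, hypothetical structures: a read-only lookup has nothing to return when no attribute storage exists yet, so it reports -ENODATA instead of allocating. The names `attrs`, `node` and `node_attr_get()` are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernfs structures. */
struct attrs {
	const char *value; /* a single stored attribute value */
};

struct node {
	struct attrs *iattr; /* NULL until something is first stored */
};

/* Read-only lookup: never allocates; reports "no data" if nothing exists. */
static int node_attr_get(const struct node *kn, char *buf, size_t size)
{
	size_t len;

	if (!kn->iattr || !kn->iattr->value)
		return -ENODATA;

	len = strlen(kn->iattr->value);
	if (len > size)
		return -ERANGE;
	memcpy(buf, kn->iattr->value, len); /* value bytes only, no NUL */
	return (int)len;
}
```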

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: minor merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • Right now, kernfs_iattrs embeds the whole struct iattr, even though it
    doesn't really use half of its fields... This both leads to wasting
    space and makes the code look awkward. Let's just list the few fields
    we need directly in struct kernfs_iattrs.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: merged a number of chunks manually due to fuzz]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

07 Mar, 2019

1 commit

  • Pull driver core updates from Greg KH:
    "Here is the big driver core patchset for 5.1-rc1

    More patches than "normal" here this merge window, due to some work in
    the driver core by Alexander Duyck to rework the async probe
    functionality to work better for a number of devices, and independent
    work from Rafael for the device link functionality to make it work
    "correctly".

    Also in here is:

    - lots of BUS_ATTR() removals, the macro is about to go away

    - firmware test fixups

    - ihex fixups and simplification

    - component additions (also includes i915 patches)

    - lots of minor coding style fixups and cleanups.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
    driver core: platform: remove misleading err_alloc label
    platform: set of_node in platform_device_register_full()
    firmware: hardcode the debug message for -ENOENT
    driver core: Add missing description of new struct device_link field
    driver core: Fix PM-runtime for links added during consumer probe
    drivers/component: kerneldoc polish
    async: Add cmdline option to specify drivers to be async probed
    driver core: Fix possible supplier PM-usage counter imbalance
    PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
    driver: platform: Support parsing GpioInt 0 in platform_get_irq()
    selftests: firmware: fix verify_reqs() return value
    Revert "selftests: firmware: remove use of non-standard diff -Z option"
    Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
    device: Fix comment for driver_data in struct device
    kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
    sysfs: remove unused include of kernfs-internal.h
    driver core: Postpone DMA tear-down until after devres release
    driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
    PM-runtime: Take suppliers into account in __pm_runtime_set_status()
    device.h: Add __cold to dev_ logging functions
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Patch series "psi: pressure stall monitors", v3.

    Android is adopting psi to detect and remedy memory pressure that
    results in stuttering and decreased responsiveness on mobile devices.

    Psi gives us the stall information, but because we're dealing with
    latencies in the millisecond range, periodically reading the pressure
    files to detect stalls in a timely fashion is not feasible. Psi also
    doesn't aggregate its averages at a high enough frequency right now.

    This patch series extends the psi interface such that users can
    configure sensitive latency thresholds and use poll() and friends to be
    notified when these are breached.

    As high-frequency aggregation is costly, it implements an aggregation
    method that is optimized for fast, short-interval averaging, and makes
    the aggregation frequency adaptive, such that high-frequency updates
    only happen while monitored stall events are actively occurring.

    With these patches applied, Android can monitor for, and ward off,
    mounting memory shortages before they cause problems for the user. For
    example, using memory stall monitors in userspace low memory killer
    daemon (lmkd) we can detect mounting pressure and kill less important
    processes before device becomes visibly sluggish.

    In our memory stress testing psi memory monitors produce roughly 10x
    less false positives compared to vmpressure signals. Having ability to
    specify multiple triggers for the same psi metric allows other parts of
    Android framework to monitor memory state of the device and act
    accordingly.

    The new interface is straightforward. The user opens one of the
    pressure files for writing and writes a trigger description into the
    file descriptor that defines the stall state - some or full, and the
    maximum stall time over a given window of time. E.g.:

    /* Signal when stall time exceeds 100ms of a 1s window */
    char trigger[] = "full 100000 1000000";
    fd = open("/proc/pressure/memory");
    write(fd, trigger, sizeof(trigger));
    while (poll() >= 0) {
    ...
    }
    close(fd);
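
The snippet above is abbreviated. A fuller user-space sketch of the same flow might look like the following; the helper names (`psi_format_trigger`, `psi_open_trigger`, `psi_wait`) are hypothetical, and the choice of `O_RDWR | O_NONBLOCK` and `POLLPRI` is an assumption of this sketch rather than something stated in the message. Times are in microseconds, matching the 100ms and 1s values in the comment above.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format a trigger such as "full 100000 1000000" (times in microseconds). */
static int psi_format_trigger(char *buf, size_t size, const char *state,
			      unsigned long stall_us, unsigned long window_us)
{
	int n = snprintf(buf, size, "%s %lu %lu", state, stall_us, window_us);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Open a pressure file and register the trigger; returns an fd or -1. */
static int psi_open_trigger(const char *path, const char *trigger)
{
	int fd = open(path, O_RDWR | O_NONBLOCK);

	if (fd < 0)
		return -1;
	/* Write the trigger including its terminating NUL byte. */
	if (write(fd, trigger, strlen(trigger) + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Wait for one event: 1 = threshold breached, 0 = timeout, -1 = error. */
static int psi_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int n = poll(&pfd, 1, timeout_ms);

	if (n <= 0)
		return n;
	if (pfd.revents & POLLERR)
		return -1;
	return (pfd.revents & POLLPRI) ? 1 : 0;
}
```

A caller would format the trigger, pass it to `psi_open_trigger("/proc/pressure/memory", ...)`, loop on `psi_wait()`, and close the descriptor to discard the trigger.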

    When the monitored stall state is entered, psi adapts its aggregation
    frequency according to what the configured time window requires in order
    to emit event signals in a timely fashion. Once the stalling subsides,
    aggregation reverts back to normal.

    The trigger is associated with the open file descriptor. To stop
    monitoring, the user only needs to close the file descriptor and the
    trigger is discarded.

    Patches 1-4 prepare the psi code for polling support. Patch 5
    implements the adaptive polling logic, the pressure growth detection
    optimized for short intervals, and hooks up write() and poll() on the
    pressure files.

    The patches were developed in collaboration with Johannes Weiner.

    This patch (of 5):

    Kernfs has a standardized poll/notification mechanism for waking all
    pollers on all fds when a filesystem node changes. To allow polling for
    custom events, add a .poll callback that can override the default.
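
The override mechanism can be pictured as an optional function pointer with a generic fallback. Everything in this sketch is hypothetical (the structs, the event bits and the three functions); it only shows the dispatch shape, not the kernfs implementation.

```c
#include <assert.h>
#include <stddef.h>

#define EV_READABLE 0x1u /* illustrative event bits, not kernel values */
#define EV_PRIORITY 0x2u

struct open_file; /* hypothetical per-fd state */

struct file_ops {
	/* Optional: when NULL, the generic poll behaviour is used. */
	unsigned int (*poll)(struct open_file *of);
};

struct open_file {
	const struct file_ops *ops;
	int changed;      /* node-wide "something changed" flag */
	int fd_triggered; /* per-fd state a custom poll might consult */
};

/* Default: report an event whenever the node changed. */
static unsigned int generic_poll(struct open_file *of)
{
	return of->changed ? (EV_READABLE | EV_PRIORITY) : EV_READABLE;
}

/* Dispatch: prefer the file's own .poll, fall back to the default. */
static unsigned int file_poll(struct open_file *of)
{
	if (of->ops && of->ops->poll)
		return of->ops->poll(of);
	return generic_poll(of);
}

/* Example override: only this fd's trigger state matters. */
static unsigned int trigger_poll(struct open_file *of)
{
	return of->fd_triggered ? EV_PRIORITY : 0;
}
```

Files that leave `.poll` unset keep the node-wide behaviour, while a file with per-fd triggers can report events for one descriptor without waking the others.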

    This is in preparation for pollable cgroup pressure files which have
    per-fd trigger configurations.

    Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com
    Signed-off-by: Johannes Weiner
    Signed-off-by: Suren Baghdasaryan
    Cc: Dennis Zhou
    Cc: Ingo Molnar
    Cc: Jens Axboe
    Cc: Li Zefan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

28 Feb, 2019

1 commit

  • Make kernfs support superblock creation/mount/remount with fs_context.

    This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
    be made to support fs_context also.

    Notes:

    (1) A kernfs_fs_context struct is created to wrap fs_context and the
    kernfs mount parameters are moved in here (or are in fs_context).

    (2) kernfs_mount{,_ns}() are made into kernfs_get_tree(). The extra
    namespace tag parameter is passed in the context if desired

    (3) kernfs_free_fs_context() is provided as a destructor for the
    kernfs_fs_context struct, but for the moment it does nothing except
    get called in the right places.

    (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
    pass, but possibly this should be done anyway in case someone wants to
    add a parameter in future.

    (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
    the cgroup v1 and v2 mount parameters are all moved there.

    (6) cgroup1 parameter parsing error messages are now handled by invalf(),
    which allows userspace to collect them directly.

    (7) cgroup1 parameter cleanup is now done in the context destructor rather
    than in the mount/get_tree and remount functions.

    Weirdies:

    (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
    but then uses the resulting pointer after dropping the locks. I'm
    told this is okay and needs commenting.

    (*) The cgroup refcount web. This really needs documenting.

    (*) cgroup2 only has one root?

    Add a suggestion from Thomas Gleixner in which the RDT enablement code is
    placed into its own function.

    [folded a leak fix from Andrey Vagin]

    Signed-off-by: David Howells
    cc: Greg Kroah-Hartman
    cc: Tejun Heo
    cc: Li Zefan
    cc: Johannes Weiner
    cc: cgroups@vger.kernel.org
    cc: fenghua.yu@intel.com
    Signed-off-by: Al Viro

    David Howells
     

08 Feb, 2019

1 commit

  • Create a new cache for kernfs_iattrs.
    Currently, memory is allocated with kzalloc(), which
    always returns aligned memory. On ARM, this is 64-byte aligned.
    To avoid wasting memory by rounding up the requested size,
    a new cache for kernfs_iattrs is created.

    Size of struct kernfs_iattrs is 80 bytes.
    On ARM, it lands in the kmalloc-128 slab,
    or in the kmalloc-192 slab if debug info is enabled.
    Extra bytes taken: 48 bytes per object.

    Total number of objects created : 4096
    Total saving = 48*4096 = 192 KB
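
The arithmetic above can be reproduced with a small sketch, assuming (as the message states for ARM) that allocations are rounded up to a multiple of 64 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be non-zero). */
static size_t round_up_to(size_t size, size_t align)
{
	return (size + align - 1) / align * align;
}

/* Bytes lost per object when it is padded to the alignment. */
static size_t padding_per_object(size_t size, size_t align)
{
	return round_up_to(size, align) - size;
}
```

An 80-byte object rounds up to 128 bytes, which leaves 48 bytes of padding; over 4096 objects that is 196608 bytes, or 192 KB.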

    After creating the new slab (with debug info enabled):
    sh-3.2# cat /proc/slabinfo
    ...
    kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0
    ...

    All testing has been done on ARM target.

    Signed-off-by: Ayush Mittal
    Signed-off-by: Vaneet Narang
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Ayush Mittal
     

18 Jan, 2019

2 commits

  • unused now and impossible to use safely anyway.

    Signed-off-by: Al Viro

    Al Viro
     
  • same story as with last May fixes in sysfs (7b745a4e4051
    "unfuck sysfs_mount()"); new_sb is left uninitialized
    in case of early errors in kernfs_mount_ns() and papering
    over it by treating any error from kernfs_mount_ns() as
    equivalent to !new_ns ends up conflating the cases when
    objects had never been transferred to a superblock with
    ones when that has happened and resulting new superblock
    had been dropped. Easily fixed (same way as in sysfs
    case). Additionally, there's a superblock leak on
    kernfs_node_dentry() failure *and* a dentry leak inside
    kernfs_node_dentry() itself - the latter on probably
    impossible errors, but the former not impossible to trigger
    (as the matter of fact, injecting allocation failures
    at that point *does* trigger it).

    Cc: stable@kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

27 Nov, 2018

1 commit

  • kernfs_notify() does two notifications: poll and fsnotify. Originally,
    both notifications were done from scheduled work context and all that
    kernfs_notify() did was schedule the work.

    This patch simply moves the poll notification from the scheduled work
    handler to kernfs_notify(). The fsnotify notification still needs to be
    done from scheduled work context because it can sleep (it needs to lock
    a mutex).

    If the poll notification is time critical (the notified thread needs to
    wake as quickly as possible), it's better to do it from kernfs_notify()
    directly. One example is calling sysfs_notify_dirent() from a hardware
    interrupt handler to wake up a thread and handle the interrupt in user
    space.

    Signed-off-by: Radu Rendec
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Radu Rendec
     

27 Oct, 2018

1 commit

  • The page cache and most shrinkable slab caches hold data that has been
    read from disk, but there are some caches that only cache CPU work, such
    as the dentry and inode caches of procfs and sysfs, as well as the subset
    of radix tree nodes that track non-resident page cache.

    Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
    the shrinker's seeks setting tells the reclaim algorithm that for every
    two page cache pages scanned it should scan one slab object.

    This is a bogus setting. A virtual inode that required no IO to create is
    not twice as valuable as a page cache page; shadow cache entries with
    eviction distances beyond the size of memory aren't either.

    In most cases, the behavior in practice is still fine. Such virtual
    caches don't tend to grow and assert themselves aggressively, and usually
    get picked up before they cause problems. But there are scenarios where
    that's not true.

    Our database workloads suffer from two of those. For one, their file
    workingset is several times bigger than available memory, which has the
    kernel aggressively create shadow page cache entries for the non-resident
    parts of it. The workingset code does tell the VM that most of these are
    expendable, but the VM ends up balancing them 2:1 to cache pages as per
    the seeks setting. This is a huge waste of memory.

    These workloads also deal with tens of thousands of open files and use
    /proc for introspection, which ends up growing the proc_inode_cache to
    absurdly large sizes - again at the cost of valuable cache space, which
    isn't a reasonable trade-off, given that proc inodes can be re-created
    without involving the disk.

    This patch implements a "zero-seek" setting for shrinkers that results in
    a target ratio of 0:1 between their objects and IO-backed caches. This
    allows such virtual caches to grow when memory is available (they do
    cache/avoid CPU work after all), but effectively disables them as soon as
    IO-backed objects are under pressure.
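
A rough sketch of how a zero-seek setting could change a shrinker's scan target. This is an illustration only: the function name, the factor of 4 and the "half of the freeable objects" rule are assumptions of this sketch and should not be read as the exact reclaim code.

```c
#include <assert.h>

/*
 * Sketch of a scan-target calculation. "freeable" is the number of objects
 * the cache could release, "priority" is the reclaim priority (lower means
 * more pressure), and "seeks" is the relative cost of recreating an object.
 * The constants are illustrative assumptions.
 */
static unsigned long scan_target(unsigned long freeable, int priority,
				 unsigned int seeks)
{
	if (seeks == 0)
		return freeable / 2; /* zero-seek: cheap to recreate, trim hard */

	return (freeable >> priority) * 4 / seeks;
}
```

With 4096 freeable objects at priority 4, a seeks value of 2 yields a target of 512 objects, while the zero-seek case yields 2048, so caches that only save CPU work give way much faster once reclaim starts.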

    It then switches the shrinkers for procfs and sysfs metadata, as well as
    excess page cache shadow nodes, to the new zero-seek setting.

    Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reported-by: Domas Mituzas
    Reviewed-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

17 Sep, 2018

1 commit

  • The terminating NUL byte is only there because the buffer is
    allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the
    range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the
    returned string may not be zero-terminated if it is exactly
    PATH_MAX characters long. Furthermore, the initial loop
    may theoretically exceed PATH_MAX and cause a fault.
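
The off-by-one described here comes down to whether the range check leaves room for the terminating NUL. A minimal sketch, using a hypothetical `copy_path()` helper that is not the kernfs code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst only if the string and its
 * terminating NUL both fit. Returns the string length or -ENAMETOOLONG.
 */
static int copy_path(char *dst, size_t dst_size, const char *src)
{
	size_t len = strlen(src);

	/* "len > dst_size" would be off by one: no room left for the NUL. */
	if (len >= dst_size)
		return -ENAMETOOLONG;

	memcpy(dst, src, len + 1); /* the +1 copies the NUL explicitly */
	return (int)len;
}
```

Copying the NUL explicitly also removes the dependence on the buffer having been zero-filled at allocation time.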

    Signed-off-by: Bernd Edlinger
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Bernd Edlinger
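
The off-by-one described above is easy to reproduce in any bounded string append. The following is a minimal sketch under stated assumptions: `sketch_append_path` is a hypothetical helper, not the kernfs function, and it only shows a range check that reserves one byte for the terminator and writes that terminator explicitly instead of relying on zeroed memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append `src` to the string already in `buf`.
 * `buflen` is the total size of `buf`, including room for the NUL byte.
 * Returns 0 on success or -ENAMETOOLONG if the result would not fit.
 */
int sketch_append_path(char *buf, size_t buflen, const char *src)
{
    const char *end;
    size_t used, add;

    assert(buf != NULL && src != NULL && buflen > 0);

    /* Find the current terminator without reading past the buffer. */
    end = memchr(buf, '\0', buflen);
    if (end == NULL)
        return -ENAMETOOLONG;
    used = (size_t)(end - buf);

    add = strlen(src);
    /* ">=" rather than ">": one byte must stay free for the terminator. */
    if (add >= buflen - used)
        return -ENAMETOOLONG;

    memcpy(buf + used, src, add);
    buf[used + add] = '\0';
    return 0;
}
```

Using `>` in the range check would accept a string that fills the buffer exactly and leave no room for the NUL byte, which is the failure mode the commit message describes.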
     

19 Aug, 2018

1 commit

  • Pull driver core updates from Greg KH:
    "Here are all of the driver core and related patches for 4.19-rc1.

    Nothing huge here, just a number of small cleanups and the ability to
    now stop the deferred probing after init happens.

    All of these have been in linux-next for a while with only a merge
    issue reported"

    * tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
    base: core: Remove WARN_ON from link dependencies check
    drivers/base: stop new probing during shutdown
    drivers: core: Remove glue dirs from sysfs earlier
    driver core: remove unnecessary function extern declare
    sysfs.h: fix non-kernel-doc comment
    PM / Domains: Stop deferring probe at the end of initcall
    iommu: Remove IOMMU_OF_DECLARE
    iommu: Stop deferring probe at end of initcalls
    pinctrl: Support stopping deferred probe after initcalls
    dt-bindings: pinctrl: add a 'pinctrl-use-default' property
    driver core: allow stopping deferred probe after init
    driver core: add a debugfs entry to show deferred devices
    sysfs: Fix internal_create_group() for named group updates
    base: fix order of OF initialization
    linux/device.h: fix kernel-doc notation warning
    Documentation: update firmware loader fallback reference
    kobject: Replace strncpy with memcpy
    drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number
    kernfs: Replace strncpy with memcpy
    device: Add #define dev_fmt similar to #define pr_fmt
    ...

    Linus Torvalds
     

21 Jul, 2018

1 commit

  • This change allows creating kernfs files and directories with arbitrary
    uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
    kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
    The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
    and always create objects belonging to the global root.

    When creating symlinks ownership (uid/gid) is taken from the target kernfs
    object.

    Co-Developed-by: Tyler Hicks
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Tyler Hicks
    Signed-off-by: David S. Miller

    Dmitry Torokhov
     

07 Jul, 2018

1 commit

  • gcc 8.1.0 complains:

    fs/kernfs/symlink.c:91:3: warning:
    'strncpy' output truncated before terminating nul copying
    as many bytes from a string as its length
    fs/kernfs/symlink.c: In function 'kernfs_iop_get_link':
    fs/kernfs/symlink.c:88:14: note: length computed here

    Using strncpy() is indeed less than perfect since the length of data to
    be copied has already been determined with strlen(). Replace strncpy()
    with memcpy() to address the warning and optimize the code a little.

    Signed-off-by: Guenter Roeck
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Guenter Roeck
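
The replacement described above can be shown in isolation. This is a minimal sketch, assuming a hypothetical helper `sketch_splice` that copies a string into the middle of a larger buffer without a terminator. It is not the code from fs/kernfs/symlink.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` to the front of `dst` without a
 * terminating NUL, the way a path component is spliced into a larger
 * buffer. Returns the number of bytes copied.
 */
size_t sketch_splice(char *dst, size_t dstlen, const char *src)
{
    size_t len = strlen(src);

    assert(len <= dstlen);

    /*
     * strncpy(dst, src, len) would copy the same bytes, but the bound
     * equals the source length, so no NUL is ever written and gcc 8
     * warns about the truncation. memcpy() states the intent directly.
     */
    memcpy(dst, src, len);
    return len;
}
```

Because the length is already known from `strlen()`, `memcpy()` also avoids the per-byte check for a NUL terminator that `strncpy()` performs.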
     

15 Jun, 2018

1 commit

  • Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
    "This is a late set of changes from Deepa Dinamani doing an automated
    treewide conversion of the inode and iattr structures from 'timespec'
    to 'timespec64', to push the conversion from the VFS layer into the
    individual file systems.

    As Deepa writes:

    'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
    timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
    becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
    This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
    timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

    Thomas Gleixner adds:

    'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

    * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    pstore: Remove bogus format string definition
    vfs: change inode times to use struct timespec64
    pstore: Convert internal records to timespec64
    udf: Simplify calls to udf_disk_stamp_to_time
    fs: nfs: get rid of memcpys for inode times
    ceph: make inode time prints to be long long
    lustre: Use long long type to print inode time
    fs: add timespec64_truncate()

    Linus Torvalds
     

14 Jun, 2018

1 commit

  • Pull the timespec64 conversion from Deepa Dinamani:
    "The series aims to switch vfs timestamps to use
    struct timespec64. Currently vfs uses struct timespec,
    which is not y2038 safe.

    The flag patch applies cleanly. I've not seen the timestamps
    update logic change often. The series applies cleanly on 4.17-rc6
    and linux-next tip (top commit: next-20180517).

    I'm not sure how to merge this kind of a series with a flag patch.
    We are targeting 4.18 for this.
    Let me know if you have other suggestions.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64 timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual
    replacement becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
    This is a flag day patch.

    I've tried to keep the conversions with the script simple, to
    aid in the reviews. I've kept all the internal filesystem data
    structures and function signatures the same.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
    timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions."

    I've pulled it into a branch based on top of the NFS changes that
    are now in mainline, so I could resolve the non-obvious conflict
    between the two while merging.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

06 Jun, 2018

2 commits

  • struct timespec is not y2038 safe. Transition vfs to use
    y2038 safe struct timespec64 instead.

    The change was made with the help of the following coccinelle
    script. This catches about 80% of the changes.
    All the header file and logic changes are included in the
    first 5 rules. The rest are trivial substitutions.
    I avoid changing any of the function signatures or any other
    filesystem specific data structures to keep the patch simple
    for review.

    The script can be a little shorter by combining different cases.
    But, this version was sufficient for my usecase.

    virtual patch

    @ depends on patch @
    identifier now;
    @@
    - struct timespec
    + struct timespec64
    current_time ( ... )
    {
    - struct timespec now = current_kernel_time();
    + struct timespec64 now = current_kernel_time64();
    ...
    - return timespec_trunc(
    + return timespec64_trunc(
    ... );
    }

    @ depends on patch @
    identifier xtime;
    @@
    struct \( iattr \| inode \| kstat \) {
    ...
    - struct timespec xtime;
    + struct timespec64 xtime;
    ...
    }

    @ depends on patch @
    identifier t;
    @@
    struct inode_operations {
    ...
    int (*update_time) (...,
    - struct timespec t,
    + struct timespec64 t,
    ...);
    ...
    }

    @ depends on patch @
    identifier t;
    identifier fn_update_time =~ "update_time$";
    @@
    fn_update_time (...,
    - struct timespec *t,
    + struct timespec64 *t,
    ...) { ... }

    @ depends on patch @
    identifier t;
    @@
    lease_get_mtime( ... ,
    - struct timespec *t
    + struct timespec64 *t
    ) { ... }

    @te depends on patch forall@
    identifier ts;
    local idexpression struct inode *inode_node;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn_update_time =~ "update_time$";
    identifier fn;
    expression e, E3;
    local idexpression struct inode *node1;
    local idexpression struct inode *node2;
    local idexpression struct iattr *attr1;
    local idexpression struct iattr *attr2;
    local idexpression struct iattr attr;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    @@
    (
    (
    - struct timespec ts;
    + struct timespec64 ts;
    |
    - struct timespec ts = current_time(inode_node);
    + struct timespec64 ts = current_time(inode_node);
    )

    <+...
    (
    - timespec_equal(&inode_node->i_xtime, &ts)
    + timespec64_equal(&inode_node->i_xtime, &ts)
    |
    - timespec_equal(&ts, &inode_node->i_xtime)
    + timespec64_equal(&ts, &inode_node->i_xtime)
    |
    - timespec_compare(&inode_node->i_xtime, &ts)
    + timespec64_compare(&inode_node->i_xtime, &ts)
    |
    - timespec_compare(&ts, &inode_node->i_xtime)
    + timespec64_compare(&ts, &inode_node->i_xtime)
    |
    ts = current_time(e)
    |
    fn_update_time(..., &ts,...)
    |
    inode_node->i_xtime = ts
    |
    node1->i_xtime = ts
    |
    ts = inode_node->i_xtime
    |
    <+... attr1->ia_xtime ...+> = ts
    |
    ts = attr1->ia_xtime
    |
    ts.tv_sec
    |
    ts.tv_nsec
    |
    btrfs_set_stack_timespec_sec(..., ts.tv_sec)
    |
    btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
    |
    - ts = timespec64_to_timespec(
    + ts =
    ...
    -)
    |
    - ts = ktime_to_timespec(
    + ts = ktime_to_timespec64(
    ...)
    |
    - ts = E3
    + ts = timespec_to_timespec64(E3)
    |
    - ktime_get_real_ts(&ts)
    + ktime_get_real_ts64(&ts)
    |
    fn(...,
    - ts
    + timespec64_to_timespec(ts)
    ,...)
    )
    ...+>
    (

    )
    |
    - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
    |
    - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
    |
    - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
    |
    node1->i_xtime1 =
    - timespec_trunc(attr1->ia_xtime1,
    + timespec64_trunc(attr1->ia_xtime1,
    ...)
    |
    - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
    + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
    ...)
    |
    - ktime_get_real_ts(&attr1->ia_xtime1)
    + ktime_get_real_ts64(&attr1->ia_xtime1)
    |
    - ktime_get_real_ts(&attr.ia_xtime1)
    + ktime_get_real_ts64(&attr.ia_xtime1)
    )

    @ depends on patch @
    struct inode *node;
    struct iattr *attr;
    identifier fn;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    expression e;
    @@
    (
    - fn(node->i_xtime);
    + fn(timespec64_to_timespec(node->i_xtime));
    |
    fn(...,
    - node->i_xtime);
    + timespec64_to_timespec(node->i_xtime));
    |
    - e = fn(attr->ia_xtime);
    + e = fn(timespec64_to_timespec(attr->ia_xtime));
    )

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    )
    ...+>
    }

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    struct kstat *stat;
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier i_xtime =~ "^i_[acm]time$";
    identifier xtime =~ "^[acm]time$";
    identifier fn, ret;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(stat->xtime);
    ret = fn (...,
    - &stat->xtime);
    + &ts);
    )
    ...+>
    }

    @ depends on patch @
    struct inode *node;
    struct inode *node2;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier i_xtime3 =~ "^i_[acm]time$";
    struct iattr *attrp;
    struct iattr *attrp2;
    struct iattr attr ;
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    struct kstat *stat;
    struct kstat stat1;
    struct timespec64 ts;
    identifier xtime =~ "^[acmb]time$";
    expression e;
    @@
    (
    ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
    |
    node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    stat->xtime = node2->i_xtime1;
    |
    stat1.xtime = node2->i_xtime1;
    |
    ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
    |
    ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
    |
    - e = node->i_xtime1;
    + e = timespec64_to_timespec( node->i_xtime1 );
    |
    - e = attrp->ia_xtime1;
    + e = timespec64_to_timespec( attrp->ia_xtime1 );
    |
    node->i_xtime1 = current_time(...);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    - node->i_xtime1 = e;
    + node->i_xtime1 = timespec_to_timespec64(e);
    )

    Signed-off-by: Deepa Dinamani

    Deepa Dinamani
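
The reason for the conversion, and for the `timespec64_to_timespec()` and `timespec_to_timespec64()` calls the script inserts at the boundaries, can be illustrated with a small userspace model. This is a sketch under stated assumptions: the `sketch_*` types and functions are hypothetical stand-ins that model a 32-bit seconds field, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; field widths are assumptions. */
struct sketch_timespec32 { int32_t tv_sec; long tv_nsec; };
struct sketch_timespec64 { int64_t tv_sec; long tv_nsec; };

/* Widening is lossless: every 32-bit second count fits in 64 bits. */
struct sketch_timespec64 sketch_to_ts64(struct sketch_timespec32 ts)
{
    struct sketch_timespec64 out = { .tv_sec = ts.tv_sec, .tv_nsec = ts.tv_nsec };
    return out;
}

/* Narrowing is only valid while the value still fits in 32 bits. */
int sketch_fits_ts32(struct sketch_timespec64 ts)
{
    return ts.tv_sec >= INT32_MIN && ts.tv_sec <= INT32_MAX;
}
```

The last second a signed 32-bit field can hold is 2147483647, which falls in January 2038. One second later the value no longer fits, which is why narrowing conversions at the boundaries are meant to be temporary.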
     
  • Pull driver core updates from Greg KH:
    "Here is the driver core patchset for 4.18-rc1.

    The large chunk of these are firmware core documentation and api
    updates. Nothing major there, just better descriptions for others to
    be able to understand the firmware code better. There's also a user
    for a new firmware api call.

    Other than that, there are some minor updates for debugfs, kernfs, and
    the driver core itself.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits)
    driver core: hold dev's parent lock when needed
    driver-core: return EINVAL error instead of BUG_ON()
    driver core: add __printf verification to device_create_groups_vargs
    mm: memory_hotplug: use put_device() if device_register fail
    base: core: fix typo 'can by' to 'can be'
    debugfs: inode: debugfs_create_dir uses mode permission from parent
    debugfs: Re-use kstrtobool_from_user()
    Documentation: clarify firmware_class provenance and why we can't rename the module
    Documentation: remove stale firmware API reference
    Documentation: fix few typos and clarifications for the firmware loader
    ath10k: re-enable the firmware fallback mechanism for testmode
    ath10k: use firmware_request_nowarn() to load firmware
    firmware: add firmware_request_nowarn() - load firmware without warnings
    firmware_loader: make firmware_fallback_sysfs() print more useful
    firmware_loader: move kconfig FW_LOADER entries to its own file
    firmware_loader: replace ---help--- with help
    firmware_loader: enhance Kconfig documentation over FW_LOADER
    firmware_loader: document firmware_sysfs_fallback()
    firmware: rename fw_sysfs_fallback to firmware_fallback_sysfs()
    firmware: use () to terminate kernel-doc function names
    ...

    Linus Torvalds
     

22 May, 2018

1 commit


23 Apr, 2018

1 commit

  • Use new return type vm_fault_t for page_mkwrite and
    fault handler. For now, this is just documenting that
    the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t
    will become a distinct type.

    Reference id -> 1c8f422059ae ("mm: change return type to
    vm_fault_t")

    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Greg Kroah-Hartman

    Souptick Joarder
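
The distinction between a VM_FAULT value and an errno can be sketched in userspace. Everything named `sketch_*` or `SKETCH_*` below is hypothetical, including the numeric flag values. The sketch only shows the idea that a fault handler reports a fault code and that a negative errno has to be translated rather than returned directly.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical fault-code type and flag values, for illustration only. */
typedef unsigned int sketch_vm_fault_t;
#define SKETCH_VM_FAULT_OOM    0x1u
#define SKETCH_VM_FAULT_SIGBUS 0x2u

/*
 * Translate a negative errno into a fault code. Out-of-memory keeps its
 * meaning; every other error is reported as a bus error in this sketch.
 */
sketch_vm_fault_t sketch_fault_from_errno(int err)
{
    assert(err < 0);
    return err == -ENOMEM ? SKETCH_VM_FAULT_OOM : SKETCH_VM_FAULT_SIGBUS;
}
```

Giving the return value its own type name makes a handler that returns a raw errno stand out during review, and a checker can flag it once the type becomes distinct.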
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Feb, 2018

1 commit


31 Jan, 2018

1 commit

  • Pull poll annotations from Al Viro:
    "This introduces a __bitwise type for POLL### bitmap, and propagates
    the annotations through the tree. Most of that stuff is as simple as
    'make ->poll() instances return __poll_t and do the same to local
    variables used to hold the future return value'.

    Some of the obvious brainos found in process are fixed (e.g. POLLIN
    misspelled as POLL_IN). At that point the amount of sparse warnings is
    low and most of them are for genuine bugs - e.g. ->poll() instance
    deciding to return -EINVAL instead of a bitmap. I hadn't touched those
    in this series - it's large enough as it is.

    Another problem it has caught was eventpoll() ABI mess; select.c and
    eventpoll.c assumed that corresponding POLL### and EPOLL### were
    equal. That's true for some, but not all of them - EPOLL### are
    arch-independent, but POLL### are not.

    The last commit in this series separates userland POLL### values from
    the (now arch-independent) kernel-side ones, converting between them
    in the few places where they are copied to/from userland. AFAICS, this
    is the least disruptive fix preserving poll(2) ABI and making epoll()
    work on all architectures.

    As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
    it will trigger only on what would've triggered EPOLLWRBAND on other
    architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
    at all on sparc. With this patch they should work consistently on all
    architectures"

    * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    make kernel-side POLL... arch-independent
    eventpoll: no need to mask the result of epi_item_poll() again
    eventpoll: constify struct epoll_event pointers
    debugging printk in sg_poll() uses %x to print POLL... bitmap
    annotate poll(2) guts
    9p: untangle ->poll() mess
    ->si_band gets POLL... bitmap stored into a user-visible long field
    ring_buffer_poll_wait() return value used as return value of ->poll()
    the rest of drivers/*: annotate ->poll() instances
    media: annotate ->poll() instances
    fs: annotate ->poll() instances
    ipc, kernel, mm: annotate ->poll() instances
    net: annotate ->poll() instances
    apparmor: annotate ->poll() instances
    tomoyo: annotate ->poll() instances
    sound: annotate ->poll() instances
    acpi: annotate ->poll() instances
    crypto: annotate ->poll() instances
    block: annotate ->poll() instances
    x86: annotate ->poll() instances
    ...

    Linus Torvalds
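
The conversion between userland POLL values and arch-independent EPOLL values described above can be sketched from userspace on Linux. This is a minimal illustration, not the kernel's conversion code: `sketch_poll_to_epoll` is a hypothetical helper, and it covers only a handful of flags. The point is that each flag is translated by name, never by assuming the two encodings share numeric values.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/epoll.h>

/*
 * Translate a poll(2) event mask into the EPOLL* encoding by naming
 * each flag. Flags not listed here are dropped in this sketch.
 */
uint32_t sketch_poll_to_epoll(short events)
{
    uint32_t out = 0;

    if (events & POLLIN)
        out |= EPOLLIN;
    if (events & POLLOUT)
        out |= EPOLLOUT;
    if (events & POLLPRI)
        out |= EPOLLPRI;
    if (events & POLLERR)
        out |= EPOLLERR;
    if (events & POLLHUP)
        out |= EPOLLHUP;

    return out;
}
```

Copying the mask unchanged would give the same result wherever the two sets of constants happen to coincide, but the per-flag translation stays correct on an architecture where they do not.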