22 Mar, 2016

1 commit

  • Pull cgroup namespace support from Tejun Heo:
    "These are changes to implement namespace support for cgroup which has
    been pending for quite some time now. It is very straight-forward and
    only affects what part of cgroup hierarchies are visible.

    After unsharing, mounting a cgroup fs will be scoped to the cgroups
    the task belonged to at the time of unsharing and the cgroup paths
    exposed to userland would be adjusted accordingly"

    * 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix and restructure error handling in copy_cgroup_ns()
    cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns()
    Add FS_USERNS_FLAG to cgroup fs
    cgroup: Add documentation for cgroup namespaces
    cgroup: mount cgroupns-root when inside non-init cgroupns
    kernfs: define kernfs_node_dentry
    cgroup: cgroup namespace setns support
    cgroup: introduce cgroup namespaces
    sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
    kernfs: Add API to generate relative kernfs path

    Linus Torvalds
     

17 Feb, 2016

2 commits


08 Feb, 2016

1 commit

  • kernfs_walk_ns() uses a static path_buf[PATH_MAX] to separate out path
    components. Keeping around the 4k buffer just for kernfs_walk_ns() is
    wasteful. This patch makes it piggyback on kernfs_pr_cont_buf[]
    instead. This requires kernfs_walk_ns() to hold kernfs_rename_lock.

    Signed-off-by: Tejun Heo
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2016

1 commit

  • Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
    alloc_kmem_pages call) are accounted to memory cgroup automatically.
    Callers have to explicitly opt out if they don't want/need accounting
    for some reason. Such a design decision leads to several problems:

    - kmalloc users are highly sensitive to failures, many of them
    implicitly rely on the fact that kmalloc never fails, while memcg
    makes failures quite plausible.

    - A lot of objects are shared among different containers by design.
    Accounting such objects to one of containers is just unfair.
    Moreover, it might lead to pinning a dead memcg along with its kmem
    caches, which aren't tiny, which might result in noticeable increase
    in memory consumption for no apparent reason in the long run.

    - There are tons of short-lived objects. Accounting them to memcg will
    only result in slight noise and won't change the overall picture, but
    we still have to pay accounting overhead.

    For more info, see

    - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
    - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

    Therefore this patchset switches to the white list policy. Now kmalloc
    users have to explicitly opt in by passing __GFP_ACCOUNT flag.

    Currently, the list of accounted objects is quite limited and only
    includes those allocations that (1) are known to be easily triggered
    from userspace and (2) can fail gracefully (for the full list see patch
    no. 6) and it still misses many object types. However, accounting only
    those objects should be a satisfactory approximation of the behavior we
    used to have for most sane workloads.

    This patch (of 6):

    Revert 499611ed451508a42d1d7d ("kernfs: do not account ino_ida allocations
    to memcg").

    Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
    fragile and difficult to maintain, because there seem to be many more
    allocations that should not be accounted than those that should be.
    Besides, false accounting an allocation might result in much worse
    consequences than not accounting at all, namely increased memory
    consumption due to pinned dead kmem caches.

    So it was decided to switch to the white-list policy. This patch reverts
    bits introducing the black-list policy. The white-list policy will be
    introduced later in the series.

    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

13 Jan, 2016

1 commit

  • Pull networking updates from Davic Miller:

    1) Support busy polling generically, for all NAPI drivers. From Eric
    Dumazet.

    2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

    3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

    4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

    5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

    6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

    7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

    8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

    9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers. From Kalle Valo.

    10) Provide a way for administrators "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti.

    11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

    12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

    13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

    14) Dead route support in MPLS, from Roopa Prabhu.

    15) TSO support for thunderx chips, from Sunil Goutham.

    16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

    17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack. From Tom Herbert.

    18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

    19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
    net: bnxt: always return values from _bnxt_get_max_rings
    net: bpf: reject invalid shifts
    phonet: properly unshare skbs in phonet_rcv()
    dwc_eth_qos: Fix dma address for multi-fragment skbs
    phy: remove an unneeded condition
    mdio: remove an unneed condition
    mdio_bus: NULL dereference on allocation error
    net: Fix typo in netdev_intersect_features
    net: freescale: mac-fec: Fix build error from phy_device API change
    net: freescale: ucc_geth: Fix build error from phy_device API change
    bonding: Prevent IPv6 link local address on enslaved devices
    IB/mlx5: Add flow steering support
    net/mlx5_core: Export flow steering API
    net/mlx5_core: Make ipv4/ipv6 location more clear
    net/mlx5_core: Enable flow steering support for the IB driver
    net/mlx5_core: Initialize namespaces only when supported by device
    net/mlx5_core: Set priority attributes
    net/mlx5_core: Connect flow tables
    net/mlx5_core: Introduce modify flow table command
    net/mlx5_core: Managing root flow table
    ...

    Linus Torvalds
     

12 Jan, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "Andreas' xattr cleanup series.

    It's a followup to his xattr work that went in last cycle; -0.5KLoC"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xattr handlers: Simplify list operation
    ocfs2: Replace list xattr handler operations
    nfs: Move call to security_inode_listsecurity into nfs_listxattr
    xfs: Change how listxattr generates synthetic attributes
    tmpfs: listxattr should include POSIX ACL xattrs
    tmpfs: Use xattr handler infrastructure
    btrfs: Use xattr handler infrastructure
    vfs: Distinguish between full xattr names and proper prefixes
    posix acls: Remove duplicate xattr name definitions
    gfs2: Remove gfs2_xattr_acl_chmod
    vfs: Remove vfs_xattr_cmp

    Linus Torvalds
     

31 Dec, 2015

1 commit


30 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; said that, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     

07 Dec, 2015

2 commits

  • When a file on tmpfs has an ACL or a Default ACL, listxattr should include the
    corresponding xattr name.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: James Morris
    Cc: Hugh Dickins
    Cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Use the VFS xattr handler infrastructure and get rid of similar code in
    the filesystem. For implementing shmem_xattr_handler_set, we need a
    version of simple_xattr_set which removes the attribute when value is
    NULL. Use this to implement kernfs_iop_removexattr as well.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: James Morris
    Cc: Hugh Dickins
    Cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

21 Nov, 2015

1 commit

  • Implement kernfs_walk_and_get() which is similar to
    kernfs_find_and_get() but can walk a path instead of just a name.

    v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
    David.

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Cc: David Miller

    Tejun Heo
     

19 Aug, 2015

1 commit


04 Jul, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces where young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage could
    be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    it's purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc//ns/* are displayed. Recently readlink of
    /proc//fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is not now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce exectuable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • Add a new function kernfs_create_empty_dir that can be used to create
    directory that can not be modified.

    Update the code to use make_empty_dir_inode when reporting a
    permanently empty directory to the vfs.

    Update the code to not allow adding to permanently empty directories.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Jun, 2015

2 commits

  • Pull cgroup updates from Tejun Heo:

    - threadgroup_lock got reorganized so that its users can pick the
    actual locking mechanism to use. Its only user - cgroups - is
    updated to use a percpu_rwsem instead of per-process rwsem.

    This makes things a bit lighter on hot paths and allows cgroups to
    perform and fail multi-task (a process) migrations atomically.
    Multi-task migrations are used in several places including the
    unified hierarchy.

    - Delegation rule and documentation added to unified hierarchy. This
    will likely be the last interface update from the cgroup core side
    for unified hierarchy before lifting the devel mask.

    - Some groundwork for the pids controller which is scheduled to be
    merged in the coming devel cycle.

    * 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: add delegation section to unified hierarchy documentation
    cgroup: require write perm on common ancestor when moving processes on the default hierarchy
    cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
    kernfs: make kernfs_get_inode() public
    MAINTAINERS: add a cgroup core co-maintainer
    cgroup: fix uninitialised iterator in for_each_subsys_which
    cgroup: replace explicit ss_mask checking with for_each_subsys_which
    cgroup: use bitmask to filter for_each_subsys
    cgroup: add seq_file forward declaration for struct cftype
    cgroup: simplify threadgroup locking
    sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
    sched, cgroup: reorganize threadgroup locking
    cgroup: switch to unsigned long for bitmasks
    cgroup: reorganize include/linux/cgroup.h
    cgroup: separate out include/linux/cgroup-defs.h
    cgroup: fix some comment typos

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here is the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes was found to not work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this pile: pathname resolution rewrite.

    - recursion in link_path_walk() is gone.

    - nesting limits on symlinks are gone (the only limit remaining is
    that the total amount of symlinks is no more than 40, no matter how
    nested).

    - "fast" (inline) symlinks are handled without leaving rcuwalk mode.

    - stack footprint (independent of the nesting) is below kilobyte now,
    about on par with what it used to be with one level of nested
    symlinks and ~2.8 times lower than it used to be in the worst case.

    - struct nameidata is entirely private to fs/namei.c now (not even
    opaque pointers are being passed around).

    - ->follow_link() and ->put_link() calling conventions had been
    changed; all in-tree filesystems converted, out-of-tree should be
    able to follow reasonably easily.

    For out-of-tree conversions, see Documentation/filesystems/porting
    for details (and in-tree filesystems for examples of conversion).

    That has sat in -next since mid-May, seems to survive all testing
    without regressions and merges clean with v4.1"

    * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
    turn user_{path_at,path,lpath,path_dir}() into static inlines
    namei: move saved_nd pointer into struct nameidata
    inline user_path_create()
    inline user_path_parent()
    namei: trim do_last() arguments
    namei: stash dfd and name into nameidata
    namei: fold path_cleanup() into terminate_walk()
    namei: saner calling conventions for filename_parentat()
    namei: saner calling conventions for filename_create()
    namei: shift nameidata down into filename_parentat()
    namei: make filename_lookup() reject ERR_PTR() passed as name
    namei: shift nameidata inside filename_lookup()
    namei: move putname() call into filename_lookup()
    namei: pass the struct path to store the result down into path_lookupat()
    namei: uninline set_root{,_rcu}()
    namei: be careful with mountpoint crossings in follow_dotdot_rcu()
    Documentation: remove outdated information from automount-support.txt
    get rid of assorted nameidata-related debris
    lustre: kill unused helper
    lustre: kill unused macro (LOOKUP_CONTINUE)
    ...

    Linus Torvalds
     

19 Jun, 2015

1 commit

  • Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
    include/linux/kernfs.h. It obtains the matching inode for a
    kernfs_node.

    It will be used by cgroup for inode based permission checks for now
    but is generally useful.

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman

    Tejun Heo
     

25 May, 2015

1 commit


15 May, 2015

1 commit

  • root->ino_ida is used for kernfs inode number allocations. Since IDA has
    a layered structure, different IDs can reside on the same layer, which
    is currently accounted to some memory cgroup. The problem is that each
    kmem cache of a memory cgroup has its own directory on sysfs (under
    /sys/fs/kernel//cgroup). If the inode number of such a
    directory or any file in it gets allocated from a layer accounted to the
    cgroup which the cache is created for, the cgroup will get pinned for
    good, because one has to free all kmem allocations accounted to a cgroup
    in order to release it and destroy all its kmem caches. That said we
    must not account layers of ino_ida to any memory cgroup.

    Since per net init operations may create new sysfs entries directly
    (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
    per each namespace, which, in turn, creates new sysfs entries), an easy
    way to reproduce this issue is by creating network namespace(s) from
    inside a kmem-active memory cgroup.

    Signed-off-by: Vladimir Davydov
    Acked-by: Tejun Heo
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Greg Thelen
    Cc: Greg Kroah-Hartman
    Cc: [4.0.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

11 May, 2015

4 commits

  • similar to kfree_put_link()

    Signed-off-by: Al Viro

    Al Viro
     
  • only one instance looks at that argument at all; that sole
    exception wants inode rather than dentry.

    Signed-off-by: Al Viro

    Al Viro
     
  • its only use is getting passed to nd_jump_link(), which can obtain
    it from current->nameidata

    Signed-off-by: Al Viro

    Al Viro
     
  • a) instead of storing the symlink body (via nd_set_link()) and returning
    an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
    that opaque pointer (into void * passed by address by caller) and returns
    the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
    symlinks) and pointer to symlink body for normal symlinks. Stored pointer
    is ignored in all cases except the last one.

    Storing NULL for opaque pointer (or not storing it at all) means no call
    of ->put_link().

    b) the body used to be passed to ->put_link() implicitly (via nameidata).
    Now only the opaque pointer is. In the cases when we used the symlink body
    to free stuff, ->follow_link() now should store it as opaque pointer in addition
    to returning it.

    Signed-off-by: Al Viro

    Al Viro
     

16 Apr, 2015

1 commit


17 Mar, 2015

1 commit

  • Kernfs supports two styles of read: direct_read and seqfile_read.

    The latter supports 'poll' correctly thanks to the update of
    '->event' in kernfs_seq_show.
    The former does not as '->event' is never updated on a read.

    So add an appropriate update in kernfs_file_direct_read().

    This was noticed because some 'md' sysfs attributes were
    recently changed to use direct reads.

    Reported-by: Prakash Punnoor
    Reported-by: Torsten Kaiser
    Fixes: 750f199ee8b578062341e6ddfe36c59ac8ff2dcb
    Signed-off-by: NeilBrown
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
     

14 Feb, 2015

2 commits

  • When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
    making a separate copy of its name. It's currently only used for sysfs
    attributes whose filenames are required to stay accessible and unchanged.
    There are rare exceptions where these names are allocated and formatted
    dynamically but for the vast majority of cases they're consts in the
    rodata section.

    Now that kernfs is converted to use kstrdup_const() and kfree_const(),
    there's little point in keeping KERNFS_STATIC_NAME around. Remove it.

    Signed-off-by: Tejun Heo
    Cc: Andrzej Hajda
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • sysfs frequently performs duplication of strings located in read-only
    memory section. Replacing kstrdup by kstrdup_const allows to avoid such
    operations.

    Signed-off-by: Andrzej Hajda
    Cc: Marek Szyprowski
    Cc: Kyungmin Park
    Cc: Mike Turquette
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Acked-by: Tejun Heo
    Cc: Greg KH
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrzej Hajda
     

21 Jan, 2015

2 commits


10 Jan, 2015

1 commit

  • Returning a difference from a comparison functions is usually wrong
    (see acbbe6fbb240 "kcmp: fix standard comparison bug" for the long
    story). Here there is the additional twist that if the void pointers
    ns and kn->ns happen to differ by a multiple of 2^32,
    kernfs_name_compare returns 0, falsely reporting a match to the
    caller.

    Technically 'hash - kn->hash' is ok since the hashes are restricted to
    31 bits, but it's better to avoid that subtlety.

    Signed-off-by: Rasmus Villemoes
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Rasmus Villemoes
     

17 Dec, 2014

1 commit


15 Dec, 2014

1 commit

  • Pull driver core update from Greg KH:
    "Here's the set of driver core patches for 3.19-rc1.

    They are dominated by the removal of the .owner field in platform
    drivers. They touch a lot of files, but they are "simple" changes,
    just removing a line in a structure.

    Other than that, a few minor driver core and debugfs changes. There
    are some ath9k patches coming in through this tree that have been
    acked by the wireless maintainers as they relied on the debugfs
    changes.

    Everything has been in linux-next for a while"

    * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
    Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
    fs: debugfs: add forward declaration for struct device type
    firmware class: Deletion of an unnecessary check before the function call "vunmap"
    firmware loader: fix hung task warning dump
    devcoredump: provide a one-way disable function
    device: Add dev__once variants
    ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
    ath: use seq_file api for ath9k debugfs files
    debugfs: add helper function to create device related seq_file
    drivers/base: cacheinfo: remove noisy error boot message
    Revert "core: platform: add warning if driver has no owner"
    drivers: base: support cpu cache information interface to userspace via sysfs
    drivers: base: add cpu_device_create to support per-cpu devices
    topology: replace custom attribute macros with standard DEVICE_ATTR*
    cpumask: factor out show_cpumap into separate helper function
    driver core: Fix unbalanced device reference in drivers_probe
    driver core: fix race with userland in device_add()
    sysfs/kernfs: make read requests on pre-alloc files use the buffer.
    sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
    fs: sysfs: return EGBIG on write if offset is larger than file size
    ...

    Linus Torvalds
     

20 Nov, 2014

1 commit


08 Nov, 2014

2 commits

  • To match the previous patch which used the pre-alloc buffer for
    writes, this patch causes reads to use the same buffer.
    This is not strictly necessary as the current seq_read() will allocate
    on first read, so user-space can trigger the required pre-alloc. But
    consistency is valuable.

    The read function is somewhat simpler than seq_read() and, for example,
    does not support reading from an offset into the file: reads must be
    at the start of the file.

    As seq_read() does not use the prealloc buffer, ->seq_show is
    incompatible with ->prealloc and caused an EINVAL return from open().
    sysfs code which calls into kernfs always chooses the correct function.

    As the buffer is shared with writes and other reads, the mutex is
    extended to cover the copy_to_user.

    Signed-off-by: NeilBrown
    Reviewed-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
     
  • md/raid allows metadata management to be performed in user-space.
    A various times, particularly on device failure, the metadata needs
    to be updated before further writes can be permitted.
    This means that the user-space program which updates metadata much
    not block on writeout, and so must not allocate memory.

    mlockall(MCL_CURRENT|MCL_FUTURE) and pre-allocation can avoid all
    memory allocation issues for user-memory, but that does not help
    kernel memory.
    Several kernel objects can be pre-allocated. e.g. files opened before
    any writes to the array are permitted.
    However some kernel allocation happens in places that cannot be
    pre-allocated.
    In particular, writes to sysfs files (to tell md that it can now
    allow writes to the array) allocate a buffer using GFP_KERNEL.

    This patch allows attributes to be marked as "PREALLOC". In that case
    the maximal buffer is allocated when the file is opened, and then used
    on each write instead of allocating a new buffer.

    As the same buffer is now shared for all writes on the same file
    description, the mutex is extended to cover full use of the buffer
    including the copy_from_user().

    The new __ATTR_PREALLOC() 'or's a new flag in to the 'mode', which is
    inspected by sysfs_add_file_mode_ns() to determine if the file should be
    marked as requiring prealloc.

    Despite the comment, we *do* use ->seq_show together with ->prealloc
    in this patch. The next patch fixes that.

    Signed-off-by: NeilBrown
    Reviewed-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
     

09 Oct, 2014

1 commit