28 Feb, 2017

1 commit

  • Now that %z is standardised in C99 there is no reason to support %Z.
    Unlike %L it doesn't even make format strings smaller.

    Use BUILD_BUG_ON in a couple of ATM drivers.

    In case anyone didn't notice, lib/vsprintf.o is about half the size of
    SLUB, which in my opinion is quite an achievement. Hopefully this patch
    inspires someone else to trim vsprintf.c more.

    Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
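A minimal userspace sketch of the two ideas in the commit above: the C99 %zu conversion for size_t (the portable replacement for the non-standard %Zu), and a compile-time size check in the spirit of BUILD_BUG_ON(), written here with C11 _Static_assert. The struct and function names are hypothetical and are not taken from the ATM drivers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical 5-byte structure; the assertion fails the build, much like
 * BUILD_BUG_ON(), if its size ever changes. */
struct cell_header {
    unsigned char bytes[5];
};
_Static_assert(sizeof(struct cell_header) == 5, "cell header must be 5 bytes");

/* %zu is the C99 conversion for size_t; the non-standard %Zu spelling is
 * what the commit removes. Returns the length snprintf would write. */
int format_size(char *buf, size_t len, size_t n)
{
    return snprintf(buf, len, "%zu bytes", n);
}
```

Calling format_size(buf, sizeof buf, sizeof(struct cell_header)) writes "5 bytes" into buf.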
     

25 Feb, 2017

1 commit

  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take both a vma and a vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
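A simplified sketch of the handler-signature change above, assuming heavily reduced stand-ins for struct vm_area_struct and struct vm_fault (the real kernel structures carry many more fields, and real handlers return VM_FAULT_* codes). It only shows that the handler now reaches the VMA through vmf->vma instead of taking it as a separate argument; vm_ops_sketch and demo_fault are hypothetical names.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct vm_area_struct {
    unsigned long vm_start, vm_end;
};

struct vm_fault {
    struct vm_area_struct *vma;   /* the VMA travels inside vmf */
    unsigned long address;        /* faulting address */
};

/* Old shape: int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 * New shape: only vmf is passed and the handler reads vmf->vma. */
struct vm_ops_sketch {
    int (*fault)(struct vm_fault *vmf);
};

/* Demo handler: 0 if the address lies inside the VMA, -1 otherwise. */
static int demo_fault(struct vm_fault *vmf)
{
    struct vm_area_struct *vma = vmf->vma;

    return (vmf->address >= vma->vm_start && vmf->address < vma->vm_end) ? 0 : -1;
}

static const struct vm_ops_sketch demo_vm_ops = { .fault = demo_fault };
```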
     

24 Feb, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "There is a lot here. A lot of these changes result in subtle
    user-visible differences in kernel behavior. I don't expect anything
    will care, but I will revert/fix things immediately if any regressions
    show up.

    From Seth Forshee there is a continuation of the work to make the vfs
    ready for unprivileged mounts. We had thought the previous changes
    prevented the creation of files outside of s_user_ns of a filesystem,
    but it turns out we missed the O_CREAT path. Ooops.

    Pavel Tikhomirov and Oleg Nesterov worked together to fix a
    long-standing bug in the implementation of PR_SET_CHILD_SUBREAPER where
    only children that are forked after the prctl are considered and not
    children forked before the prctl. The only known user of this prctl,
    systemd, forks all children after the prctl, so no userspace
    regressions will occur. Holding earlier forked children to the same
    rules as later forked children creates a semantic that is sane enough
    to allow checkpointing of processes that use this feature.

    There is a long delayed change by Nikolay Borisov to limit inotify
    instances inside a user namespace.

    Michael Kerrisk extends the API for files used to manipulate
    namespaces with two new trivial ioctls to allow discovery of the
    hierarchy and properties of namespaces.

    Konstantin Khlebnikov, with the help of Al Viro, adds code that purges
    a network namespace's sysctl entries from the dcache when the namespace
    exits, as in some circumstances these entries could use a lot of memory.

    Vivek Goyal fixed a bug with stacked filesystems where the permissions
    on the wrong inode were being checked.

    I continue previous work on ptracing across exec, allowing a file to
    be setuid across exec while being ptraced if the tracer has enough
    credentials in the user namespace and the process has CAP_SETUID in
    its own namespace. Proc files for setuid or otherwise undumpable
    executables are now owned by the root in the user namespace of their
    mm, allowing debugging of setuid applications in containers to work
    better.

    A bug I introduced with permission checking and automount is now
    fixed. The big change is to mark the mounts that the kernel initiates
    as a result of an automount. This allows the permission checks in sget
    to be safely suppressed for this kind of mount, as the permission
    check happened when the original filesystem was mounted.

    Finally, a special case in the mount namespace is removed, preventing
    unbounded chains in the mount hash table and making the semantics
    simpler, which benefits CRIU.

    The vfs fix, along with related work in ima and evm, I believe makes us
    ready to finish developing and merge fully unprivileged mounts of the
    fuse filesystem. The cleanups of the mount namespace make it possible
    to discuss how to fix the worst-case complexity of umount. The stacked
    filesystem fixes pave the way for adding multiple mappings for the
    filesystem uids so that efficient and safer containers can be
    implemented"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc/sysctl: Don't grab i_lock under sysctl_lock.
    vfs: Use upper filesystem inode in bprm_fill_uid()
    proc/sysctl: prune stale dentries during unregistering
    mnt: Tuck mounts under others instead of creating shadow/side mounts.
    prctl: propagate has_child_subreaper flag to every descendant
    introduce the walk_process_tree() helper
    nsfs: Add an ioctl() to return owner UID of a userns
    fs: Better permission checking for submounts
    exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
    vfs: open() with O_CREAT should not create inodes with unknown ids
    nsfs: Add an ioctl() to return the namespace type
    proc: Better ownership of files for non-dumpable tasks in user namespaces
    exec: Remove LSM_UNSAFE_PTRACE_CAP
    exec: Test the ptracer's saved cred to see if the tracee can gain caps
    exec: Don't reset euid and egid when the tracee has CAP_SETUID
    inotify: Convert to using per-namespace limits

    Linus Torvalds
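The PR_SET_CHILD_SUBREAPER fix in this pull concerns a prctl(2) option that a process sets on itself; after the fix, descendants forked before the call are held to the same rules as later ones. A minimal sketch of setting and reading back the flag: PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER are the real prctl options, while become_subreaper() and is_subreaper() are hypothetical wrapper names.

```c
#include <sys/prctl.h>

/* Mark the calling process as a child subreaper: orphaned descendants are
 * reparented to it instead of to init. Returns 0 on success, -1 on error. */
int become_subreaper(void)
{
    return prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL);
}

/* Read the flag back; PR_GET_CHILD_SUBREAPER stores it through an int *.
 * Returns 1 or 0, or -1 if the prctl call itself failed. */
int is_subreaper(void)
{
    int flag = 0;

    if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long)&flag, 0UL, 0UL, 0UL) == -1)
        return -1;
    return flag;
}
```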
     

23 Feb, 2017

2 commits

  • Pull driver core updates from Greg KH:
    "Here are the "small" driver core patches for 4.11-rc1.

    Not much here, some firmware documentation and self-test updates, a
    debugfs code formatting issue, and a new feature for call_usermodehelper
    to make it more robust on systems that want to lock it down in a more
    secure way.

    All of these have been in linux-next for a while now with no reported
    issues"

    * tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    kernfs: handle null pointers while printing node name and path
    Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
    Make static usermode helper binaries constant
    kmod: make usermodehelper path a const string
    firmware: revamp firmware documentation
    selftests: firmware: send expected errors to /dev/null
    selftests: firmware: only modprobe if driver is missing
    platform: Print the resource range if device failed to claim
    kref: prefer atomic_inc_not_zero to atomic_add_unless
    debugfs: improve formatting of debugfs_real_fops()

    Linus Torvalds
     
  • Pull networking updates from David Miller:
    "Highlights:

    1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
    Varadhan.

    2) Simplify classifier state on sk_buff in order to shrink it a bit.
    From Willem de Bruijn.

    3) Introduce SIPHASH and its usage for secure sequence numbers and
    syncookies. From Jason A. Donenfeld.

    4) Reduce CPU usage for ICMP replies we are going to limit or
    suppress, from Jesper Dangaard Brouer.

    5) Introduce Shared Memory Communications socket layer, from Ursula
    Braun.

    6) Add RACK loss detection and allow it to actually trigger fast
    recovery instead of just assisting after other algorithms have
    triggered it. From Yuchung Cheng.

    7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

    8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

    9) Export MPLS packet stats via netlink, from Robert Shearman.

    10) Significantly improve inet port bind conflict handling, especially
    when an application is restarted and changes its setting of
    reuseport. From Josef Bacik.

    11) Implement TX batching in vhost_net, from Jason Wang.

    12) Extend the dummy device so that VF (virtual function) features,
    such as configuration, can be more easily tested. From Phil
    Sutter.

    13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
    Dumazet.

    14) Add new bpf MAP, implementing a longest prefix match trie. From
    Daniel Mack.

    15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

    16) Add new aquantia driver, from David VomLehn.

    17) Add bpf tracepoints, from Daniel Borkmann.

    18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
    Florian Fainelli.

    19) Remove custom busy polling in many drivers, as it has been done in
    the core networking code since 4.5. From Eric Dumazet.

    20) Support XDP adjust_head in virtio_net, from John Fastabend.

    21) Fix several major holes in neighbour entry confirmation, from
    Julian Anastasov.

    22) Add XDP support to bnxt_en driver, from Michael Chan.

    23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

    24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

    25) Support GRO in IPSEC protocols, from Steffen Klassert"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
    Revert "ath10k: Search SMBIOS for OEM board file extension"
    net: socket: fix recvmmsg not returning error from sock_error
    bnxt_en: use eth_hw_addr_random()
    bpf: fix unlocking of jited image when module ronx not set
    arch: add ARCH_HAS_SET_MEMORY config
    net: napi_watchdog() can use napi_schedule_irqoff()
    tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
    net/hsr: use eth_hw_addr_random()
    net: mvpp2: enable building on 64-bit platforms
    net: mvpp2: switch to build_skb() in the RX path
    net: mvpp2: simplify MVPP2_PRS_RI_* definitions
    net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
    net: mvpp2: remove unused register definitions
    net: mvpp2: simplify mvpp2_bm_bufs_add()
    net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
    net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
    net: mvpp2: release reference to txq_cpu[] entry after unmapping
    net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
    net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
    net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
    ...

    Linus Torvalds
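Item 3 above mentions SipHash. To illustrate the algorithm only, here is a standalone SipHash-2-4 written from the published description (two compression rounds per 8-byte block, four finalization rounds); it is not the kernel's lib/siphash.c, and siphash24() is a hypothetical name. For the sequence-number and syncookie uses described above, the 128-bit key has to stay secret.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: add-rotate-xor mixing of the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {                                 \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                      \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                      \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
    } while (0)

/* Read 8 bytes as a little-endian 64-bit integer. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* SipHash-2-4 of msg[0..len) under a 128-bit key. */
uint64_t siphash24(const uint8_t key[16], const uint8_t *msg, size_t len)
{
    const uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
    uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
    uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
    uint64_t v3 = k1 ^ 0x7465646279746573ULL;
    const uint8_t *end = msg + (len - len % 8);
    uint64_t b = (uint64_t)len << 56;       /* low byte of len in the top byte */

    for (; msg != end; msg += 8) {          /* compression: 2 rounds per block */
        uint64_t m = load_le64(msg);

        v3 ^= m;
        SIPROUND(v0, v1, v2, v3);
        SIPROUND(v0, v1, v2, v3);
        v0 ^= m;
    }
    for (size_t i = 0; i < len % 8; i++)    /* pack the 0..7 tail bytes */
        b |= (uint64_t)msg[i] << (8 * i);

    v3 ^= b;
    SIPROUND(v0, v1, v2, v3);
    SIPROUND(v0, v1, v2, v3);
    v0 ^= b;

    v2 ^= 0xff;                             /* finalization: 4 rounds */
    SIPROUND(v0, v1, v2, v3);
    SIPROUND(v0, v1, v2, v3);
    SIPROUND(v0, v1, v2, v3);
    SIPROUND(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}
```

With the key 00 01 ... 0f and the 15-byte message 00 01 ... 0e, this should reproduce the reference test vector 0xa129ca6149be45e5.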
     

22 Feb, 2017

1 commit

  • Pull security layer updates from James Morris:
    "Highlights:

    - major AppArmor update: policy namespaces & lots of fixes

    - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

    - SELinux cgroupfs labeling support

    - SELinux context mounts on tmpfs, ramfs, devpts within user
    namespaces

    - improved TPM 2.0 support"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
    tpm: declare tpm2_get_pcr_allocation() as static
    tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
    tpm xen: drop unneeded chip variable
    tpm: fix misspelled "facilitate" in module parameter description
    tpm_tis: fix the error handling of init_tis()
    KEYS: Use memzero_explicit() for secret data
    KEYS: Fix an error code in request_master_key()
    sign-file: fix build error in sign-file.c with libressl
    selinux: allow changing labels for cgroupfs
    selinux: fix off-by-one in setprocattr
    tpm: silence an array overflow warning
    tpm: fix the type of owned field in cap_t
    tpm: add securityfs support for TPM 2.0 firmware event log
    tpm: enhance read_log_of() to support Physical TPM event log
    tpm: enhance TPM 2.0 PCR extend to support multiple banks
    tpm: implement TPM 2.0 capability to get active PCR banks
    tpm: fix RC value check in tpm2_seal_trusted
    tpm_tis: fix iTPM probe via probe_itpm() function
    tpm: Begin the process to deprecate user_read_timer
    tpm: remove tpm_read_index and tpm_write_index from tpm.h
    ...

    Linus Torvalds
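The /sys/kernel/security/lsm node added in this pull holds a comma-separated list of the active security modules. A small sketch of testing that list for one module; lsm_list_contains() is a hypothetical helper, reading the file is left to the caller, and the optional trailing newline is tolerated because whether the file ends with one is not specified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` is a complete entry of the comma-separated `list`,
 * e.g. the contents of /sys/kernel/security/lsm. */
bool lsm_list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p) {
        const char *comma = strchr(p, ',');
        /* Entry ends at the next comma, or at a newline/NUL for the last one. */
        size_t len = comma ? (size_t)(comma - p) : strcspn(p, "\n");

        if (len == n && strncmp(p, name, n) == 0)
            return true;
        if (!comma)
            break;
        p = comma + 1;
    }
    return false;
}
```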
     

21 Feb, 2017

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Implement wraparound-safe refcount_t and kref_t types based on
    generic atomic primitives (Peter Zijlstra)

    - Improve and fix the ww_mutex code (Nicolai Hähnle)

    - Add self-tests to the ww_mutex code (Chris Wilson)

    - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
    Bueso)

    - Micro-optimize the current-task logic all around the core kernel
    (Davidlohr Bueso)

    - Tidy up after recent optimizations: remove stale code and APIs,
    clean up the code (Waiman Long)

    - ... plus misc fixes, updates and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    fork: Fix task_struct alignment
    locking/spinlock/debug: Remove spinlock lockup detection code
    lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
    lkdtm: Convert to refcount_t testing
    kref: Implement 'struct kref' using refcount_t
    refcount_t: Introduce a special purpose refcount type
    sched/wake_q: Clarify queue reinit comment
    sched/wait, rcuwait: Fix typo in comment
    locking/mutex: Fix lockdep_assert_held() fail
    locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
    locking/rwsem: Reinit wake_q after use
    locking/rwsem: Remove unnecessary atomic_long_t casts
    jump_labels: Move header guard #endif down where it belongs
    locking/atomic, kref: Implement kref_put_lock()
    locking/ww_mutex: Turn off __must_check for now
    locking/atomic, kref: Avoid more abuse
    locking/atomic, kref: Use kref_get_unless_zero() more
    locking/atomic, kref: Kill kref_sub()
    locking/atomic, kref: Add kref_read()
    locking/atomic, kref: Add KREF_INIT()
    ...

    Linus Torvalds
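The refcount_t and kref_get_unless_zero() items above rest on two rules: never take a reference on an object whose count already reached zero, and never let the count wrap around. A userspace sketch of those rules with C11 atomics; refcount_sketch, ref_inc_not_zero() and ref_dec_and_test() are hypothetical names modelled loosely on the kernel's refcount_inc_not_zero() and refcount_dec_and_test(), not their implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a saturating reference count. */
typedef struct {
    atomic_uint refs;
} refcount_sketch;

/* Take a reference unless the count is zero (object already being torn
 * down). A count at UINT_MAX is saturated and stays pinned there. */
bool ref_inc_not_zero(refcount_sketch *r)
{
    unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

    do {
        if (old == 0)
            return false;
        if (old == UINT_MAX)
            return true;          /* saturated: never wrap around to 0 */
    } while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Drop a reference; returns true only for the caller that dropped the last one. */
bool ref_dec_and_test(refcount_sketch *r)
{
    unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

    do {
        if (old == 0 || old == UINT_MAX)
            return false;         /* underflow refused; a saturated object is leaked */
    } while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old - 1,
                                                    memory_order_release,
                                                    memory_order_relaxed));
    if (old == 1) {
        atomic_thread_fence(memory_order_acquire);  /* see all prior releases before freeing */
        return true;
    }
    return false;
}
```

Saturating rather than wrapping trades a possible memory leak for protection against a use-after-free when a count is overflowed.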
     

11 Feb, 2017

1 commit


10 Feb, 2017

3 commits


08 Feb, 2017

4 commits

  • SELinux tries to support setting/clearing of /proc/pid/attr attributes
    from the shell by ignoring terminating newlines and treating an
    attribute value that begins with a NUL or newline as an attempt to
    clear the attribute. However, the test for clearing attributes has
    always been wrong; it has an off-by-one error, and this could further
    lead to reading past the end of the allocated buffer since commit
    bb646cdb12e75d82258c2f2e7746d5952d3e321a ("proc_pid_attr_write():
    switch to memdup_user()"). Fix the off-by-one error.

    Even with this fix, setting and clearing /proc/pid/attr attributes
    from the shell is not straightforward since the interface does not
    support multiple write() calls (so shells that write the value and
    newline separately will set and then immediately clear the attribute,
    requiring use of echo -n to set the attribute), whereas trying to use
    echo -n "" to clear the attribute causes the shell to skip the
    write() call altogether since POSIX says that a zero-length write
    causes no side effects. Thus, one must use echo -n to set and echo
    without -n to clear, as in the following example:
    $ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
    $ cat /proc/$$/attr/fscreate
    unconfined_u:object_r:user_home_t:s0
    $ echo "" > /proc/$$/attr/fscreate
    $ cat /proc/$$/attr/fscreate

    Note the use of /proc/$$ rather than /proc/self, as otherwise
    the cat command will read its own attribute value, not that of the shell.

    There are no users of this facility to my knowledge; possibly we
    should just get rid of it.

    UPDATE: Upon further investigation it appears that a local process
    with the process:setfscreate permission can cause a kernel panic as a
    result of this bug. This patch fixes CVE-2017-2618.

    Signed-off-by: Stephen Smalley
    [PM: added the update about CVE-2017-2618 to the commit description]
    Cc: stable@vger.kernel.org # 3.5: d6ea83ec6864e
    Signed-off-by: Paul Moore

    Signed-off-by: James Morris

    Stephen Smalley
     
  • James Morris
     
  • This patch allows changing labels for cgroup mounts. Previously, running
    chcon on cgroupfs would fail with "Operation not supported". This patch
    specifically whitelists cgroupfs.

    The patch could also allow containers to write only to the systemd cgroup
    for instance, while the other cgroups are kept with cgroup_t label.

    Signed-off-by: Antonio Murdaca
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore

    Antonio Murdaca
     
  • SELinux tries to support setting/clearing of /proc/pid/attr attributes
    from the shell by ignoring terminating newlines and treating an
    attribute value that begins with a NUL or newline as an attempt to
    clear the attribute. However, the test for clearing attributes has
    always been wrong; it has an off-by-one error, and this could further
    lead to reading past the end of the allocated buffer since commit
    bb646cdb12e75d82258c2f2e7746d5952d3e321a ("proc_pid_attr_write():
    switch to memdup_user()"). Fix the off-by-one error.

    Even with this fix, setting and clearing /proc/pid/attr attributes
    from the shell is not straightforward since the interface does not
    support multiple write() calls (so shells that write the value and
    newline separately will set and then immediately clear the attribute,
    requiring use of echo -n to set the attribute), whereas trying to use
    echo -n "" to clear the attribute causes the shell to skip the
    write() call altogether since POSIX says that a zero-length write
    causes no side effects. Thus, one must use echo -n to set and echo
    without -n to clear, as in the following example:
    $ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
    $ cat /proc/$$/attr/fscreate
    unconfined_u:object_r:user_home_t:s0
    $ echo "" > /proc/$$/attr/fscreate
    $ cat /proc/$$/attr/fscreate

    Note the use of /proc/$$ rather than /proc/self, as otherwise
    the cat command will read its own attribute value, not that of the shell.

    There are no users of this facility to my knowledge; possibly we
    should just get rid of it.

    UPDATE: Upon further investigation it appears that a local process
    with the process:setfscreate permission can cause a kernel panic as a
    result of this bug. This patch fixes CVE-2017-2618.

    Signed-off-by: Stephen Smalley
    [PM: added the update about CVE-2017-2618 to the commit description]
    Cc: stable@vger.kernel.org # 3.5: d6ea83ec6864e
    Signed-off-by: Paul Moore

    Stephen Smalley
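A sketch of the corrected clearing test from the SELinux /proc/pid/attr fix above, as I understand the patch: the pre-fix code examined the second byte (str[1]) instead of the first, which for a one-byte write reads just past a buffer allocated with exactly that many bytes by memdup_user(). attr_effective_len() is a hypothetical name and models only the length decision, not the SID lookup.

```c
#include <stddef.h>

/* A value whose first byte is NUL or '\n' clears the attribute; otherwise a
 * single trailing newline (as appended by plain echo) is ignored.
 * Returns the effective length; 0 means "clear". */
size_t attr_effective_len(const char *str, size_t size)
{
    /* Checking str[0], not str[1], keeps a size == 1 write in bounds. */
    if (size && str[0] && str[0] != '\n') {
        if (str[size - 1] == '\n')
            size--;
        return size;
    }
    return 0;
}
```

This matches the shell behaviour in the commit message: `echo -n value` sets the attribute, while `echo ""` writes a lone newline and therefore clears it.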
     

28 Jan, 2017

2 commits

  • Otherwise some mask and inmask tokens with the MAY_APPEND flag may not
    work as expected.

    Signed-off-by: Lans Zhang
    Signed-off-by: Mimi Zohar

    Lans Zhang
     
  • On failure to return a pathname from ima_d_path(), a pointer to
    dname is returned, which is subsequently used in the IMA measurement
    list, the IMA audit records, and other audit logging. Saving the
    pointer to dname for later use has the potential to race with rename.

    Instead of returning a pointer to dname on failure, this patch returns
    a pointer to a copy of the filename.

    Reported-by: Al Viro
    Signed-off-by: Mimi Zohar
    Cc: stable@vger.kernel.org

    Mimi Zohar
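The IMA fix above replaces a pointer into the dentry name with a copy of the filename. A userspace sketch of that pattern, assuming a caller-provided buffer: path_or_name_copy() and NAME_BUF_LEN are hypothetical, and the kernel's actual buffer size and copy routine may differ.

```c
#include <stdio.h>

#define NAME_BUF_LEN 256   /* hypothetical size of the caller's copy buffer */

/* full_path is NULL when building a pathname failed; live_name points at a
 * name that a concurrent rename may rewrite. Return a stable string: the full
 * path if available, otherwise a private copy of the name. */
const char *path_or_name_copy(const char *full_path, const char *live_name,
                              char namebuf[NAME_BUF_LEN])
{
    if (full_path != NULL)
        return full_path;
    snprintf(namebuf, NAME_BUF_LEN, "%s", live_name);  /* bounded, NUL-terminated */
    return namebuf;
}
```

Because later consumers (measurement list, audit records) see namebuf rather than live_name, a rename after the call no longer changes what they log.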
     

27 Jan, 2017

1 commit


25 Jan, 2017

1 commit

  • Add net.ipv4.ip_unprivileged_port_start, which is a per-namespace sysctl
    that denotes the first unprivileged inet port in the namespace. To
    disable all privileged ports, set this to zero. It also checks for
    overlap with the local port range: the privileged and local ranges may
    not overlap.

    The use case for this change is to allow containerized processes to bind
    to privileged ports, but prevent them from ever being allowed to modify
    their container's network configuration. The latter is accomplished by
    ensuring that the network namespace is not a child of the user
    namespace. This modification was needed to allow the container manager
    to disable a namespace's privileged port restrictions without exposing
    control of the network namespace to processes in the user namespace.

    Signed-off-by: Krister Johansen
    Signed-off-by: David S. Miller

    Krister Johansen
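A sketch of the two rules this sysctl introduces, as described in the commit: ports below the configured start need privilege (zero means none do), and the privileged range [0, start) must not overlap the local port range [lo, hi]. With lo <= hi, that overlap check reduces to start <= lo. Both function names are hypothetical; this is not the kernel's sysctl handler.

```c
#include <stdbool.h>

/* True if binding `port` would require privilege under the given setting. */
bool port_needs_privilege(unsigned int port, unsigned int unpriv_start)
{
    return port < unpriv_start;
}

/* The privileged range [0, unpriv_start) may not overlap the local range
 * starting at local_lo; an unpriv_start of 0 is always acceptable. */
bool unpriv_start_is_valid(unsigned int unpriv_start, unsigned int local_lo)
{
    return unpriv_start <= local_lo;
}
```

A container manager would then set the value from outside the container, for example with `sysctl -w net.ipv4.ip_unprivileged_port_start=0` inside the target network namespace.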
     

24 Jan, 2017

3 commits


19 Jan, 2017

3 commits

  • Some usermode helper applications are defined at kernel build time, while
    others can be changed at runtime. To provide a sane way to filter these, add a
    new kernel option "STATIC_USERMODEHELPER". This option routes all
    call_usermodehelper() calls through this binary, no matter what the caller
    wishes to have called.

    The new binary (by default set to /sbin/usermode-helper, but can be changed
    through the STATIC_USERMODEHELPER_PATH option) can properly filter the
    requested programs to be run by the kernel by looking at the first argument
    that is passed to it. All other options should then be passed on to the
    proper program if so desired.

    To disable all call_usermodehelper() calls by the kernel, set
    STATIC_USERMODEHELPER_PATH to an empty string.

    Thanks to Neil Brown for the idea of this feature.

    Cc: NeilBrown
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • There are a number of usermode helper binaries that are "hard coded" in
    the kernel today, so mark them as "const" to make it harder for someone
    to change where the variables point to.

    Cc: Benjamin Herrenschmidt
    Cc: Thomas Sailer
    Cc: "Rafael J. Wysocki"
    Cc: Johan Hovold
    Cc: Alex Elder
    Cc: "J. Bruce Fields"
    Cc: Jeff Layton
    Cc: David Howells
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • I am still tired of having to find indirect ways to determine
    what security modules are active on a system. I have added
    /sys/kernel/security/lsm, which contains a comma separated
    list of the active security modules. No more groping around
    in /proc/filesystems or other clever hacks.

    Unchanged from previous versions except for being updated
    to the latest security next branch.

    Signed-off-by: Casey Schaufler
    Acked-by: John Johansen
    Acked-by: Paul Moore
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    Casey Schaufler
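The STATIC_USERMODEHELPER commit above leaves the filtering policy to the helper binary, which inspects the program the kernel originally requested. A sketch of that allowlist decision; the allowlist entries and helper_request_allowed() are hypothetical, and which argv element carries the requested program is treated as an assumption, so the sketch simply takes it as a parameter. The const-qualified table also echoes the companion commit that makes hard-coded helper paths constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist: which helpers may run is a local policy choice. */
static const char *const allowed_helpers[] = {
    "/sbin/modprobe",
    "/sbin/hotplug",
};

/* Decide whether the program the kernel originally asked for may run. */
bool helper_request_allowed(const char *requested)
{
    if (requested == NULL)
        return false;
    for (size_t i = 0; i < sizeof(allowed_helpers) / sizeof(allowed_helpers[0]); i++) {
        if (strcmp(requested, allowed_helpers[i]) == 0)
            return true;
    }
    return false;
}
```

A complete helper would exec the requested program with the remaining arguments when this returns true and exit with an error otherwise.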
     

17 Jan, 2017

1 commit


16 Jan, 2017

14 commits