01 Oct, 2016

1 commit

  • CAI Qian pointed out that the semantics
    of shared subtrees make it possible to create an exponentially
    increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

    Will create create 2^20 or 1048576 mounts, which is a practical problem
    as some people have managed to hit this by accident.

    As such CVE-2016-6213 was assigned.

    Ian Kent described the situation for autofs users
    as follows:

    > The number of mounts for direct mount maps is usually not very large because of
    > the way they are implemented, large direct mount maps can have performance
    > problems. There can be anywhere from a few (likely case a few hundred) to less
    > than 10000, plus mounts that have been triggered and not yet expired.
    >
    > Indirect mounts have one autofs mount at the root plus the number of mounts that
    > have been triggered and not yet expired.
    >
    > The number of autofs indirect map entries can range from a few to the common
    > case of several thousand and in rare cases up to between 30000 and 50000. I've
    > not heard of people with maps larger than 50000 entries.
    >
    > The larger the number of map entries the greater the possibility for a large
    > number of active mounts so it's not hard to expect cases of a 1000 or somewhat
    > more active mounts.

    So I am setting the default number of mounts allowed per mount
    namespace at 100,000. This is more than enough for any use case I
    know of, but small enough to quickly stop an exponential increase
    in mounts. Which should be perfect to catch misconfigurations and
    malfunctioning programs.

    For anyone who needs a higher limit this can be changed by writing
    to the new /proc/sys/fs/mount-max sysctl.

    Tested-by: CAI Qian
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Sep, 2016

4 commits

  • From: Andrey Vagin

    Each namespace has an owning user namespace and now there is not way
    to discover these relationships.

    Pid and user namepaces are hierarchical. There is no way to discover
    parent-child relationships too.

    Why we may want to know relationships between namespaces?

    One use would be visualization, in order to understand the running
    system. Another would be to answer the question: what capability does
    process X have to perform operations on a resource governed by namespace
    Y?

    One more use-case (which usually called abnormal) is checkpoint/restart.
    In CRIU we are going to dump and restore nested namespaces.

    There [1] was a discussion about which interface to choose to determing
    relationships between namespaces.

    Eric suggested to add two ioctl-s [2]:
    > Grumble, Grumble. I think this may actually a case for creating ioctls
    > for these two cases. Now that random nsfs file descriptors are bind
    > mountable the original reason for using proc files is not as pressing.
    >
    > One ioctl for the user namespace that owns a file descriptor.
    > One ioctl for the parent namespace of a namespace file descriptor.

    Here is an implementaions of these ioctl-s.

    $ man man7/namespaces.7
    ...
    Since Linux 4.X, the following ioctl(2) calls are supported for
    namespace file descriptors. The correct syntax is:

    fd = ioctl(ns_fd, ioctl_type);

    where ioctl_type is one of the following:

    NS_GET_USERNS
    Returns a file descriptor that refers to an owning user names‐
    pace.

    NS_GET_PARENT
    Returns a file descriptor that refers to a parent namespace.
    This ioctl(2) can be used for pid and user namespaces. For
    user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
    meaning.

    In addition to generic ioctl(2) errors, the following specific ones
    can occur:

    EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

    EPERM The requested namespace is outside of the current namespace
    scope.

    [1] https://lkml.org/lkml/2016/7/6/158
    [2] https://lkml.org/lkml/2016/7/9/101

    Changes for v2:
    * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
    outside of the init namespace, so we can return EPERM in this case too.
    > The fewer special cases the easier the code is to get
    > correct, and the easier it is to read. // Eric

    Changes for v3:
    * rename ns->get_owner() to ns->owner(). get_* usually means that it
    grabs a reference.

    Cc: "Eric W. Biederman"
    Cc: James Bottomley
    Cc: "Michael Kerrisk (man-pages)"
    Cc: "W. Trevor King"
    Cc: Alexander Viro
    Cc: Serge Hallyn

    Eric W. Biederman
     
  • Pid and user namepaces are hierarchical. There is no way to discover
    parent-child relationships.

    In a future we will use this interface to dump and restore nested
    namespaces.

    Acked-by: Serge Hallyn
    Signed-off-by: Andrei Vagin
    Signed-off-by: Eric W. Biederman

    Andrey Vagin
     
  • Each namespace has an owning user namespace and now there is not way
    to discover these relationships.

    Understending namespaces relationships allows to answer the question:
    what capability does process X have to perform operations on a resource
    governed by namespace Y?

    After a long discussion, Eric W. Biederman proposed to use ioctl-s for
    this purpose.

    The NS_GET_USERNS ioctl returns a file descriptor to an owning user
    namespace.
    It returns EPERM if a target namespace is outside of a current user
    namespace.

    v2: rename parent to relative

    v3: Add a missing mntput when returning -EAGAIN --EWB

    Acked-by: Serge Hallyn
    Link: https://lkml.org/lkml/2016/7/6/158
    Signed-off-by: Andrei Vagin
    Signed-off-by: Eric W. Biederman

    Andrey Vagin
     
  • Return -EPERM if an owning user namespace is outside of a process
    current user namespace.

    v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
    This special cases was removed from this version. There is nothing
    outside of init_user_ns, so we can return EPERM.
    v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
    grabs a reference.

    Acked-by: Serge Hallyn
    Signed-off-by: Andrei Vagin
    Signed-off-by: Eric W. Biederman

    Andrey Vagin
     

31 Aug, 2016

1 commit


09 Aug, 2016

9 commits


08 Aug, 2016

6 commits

  • Add the necessary boiler plate to move freeing of user namespaces into
    work queue and thus into process context where things can sleep.

    This is a necessary precursor to per user namespace sysctls.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Passing nsproxy into sysctl_table_root.lookup was a premature
    optimization in attempt to avoid depending on current. The
    directory /proc/self/sys has not appeared and if and when
    it does this code will need to be reviewed closely and reworked
    anyway. So remove the premature optimization.

    Acked-by: Kees Cook
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Pull more block fixes from Jens Axboe:
    "As mentioned in the pull the other day, a few more fixes for this
    round, all related to the bio op changes in this series.

    Two fixes, and then a cleanup, renaming bio->bi_rw to bio->bi_opf. I
    wanted to do that change right after or right before -rc1, so that
    risk of conflict was reduced. I just rebased the series on top of
    current master, and no new ->bi_rw usage has snuck in"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: rename bio bi_rw to bi_opf
    target: iblock_execute_sync_cache() should use bio_set_op_attrs()
    mm: make __swap_writepage() use bio_set_op_attrs()
    block/mm: make bdev_ops->rw_page() take a bool for read/write

    Linus Torvalds
     
  • Pull drm zpos property support from Dave Airlie:
    "This tree was waiting on some media stuff I hadn't had time to get a
    stable branchpoint off, so I just waited until it was all in your tree
    first.

    It's been around a bit on the list and shouldn't affect anything
    outside adding the generic API and moving some ARM drivers to using
    it"

    * tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux:
    drm: rcar: use generic code for managing zpos plane property
    drm/exynos: use generic code for managing zpos plane property
    drm: sti: use generic zpos for plane
    drm: add generic zpos property

    Linus Torvalds
     
  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokeness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Aug, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    In the "trivial API change" department - ->d_compare() losing 'parent'
    argument"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    cachefiles: Fix race between inactivating and culling a cache object
    9p: use clone_fid()
    9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
    vfs: make dentry_needs_remove_privs() internal
    vfs: remove file_needs_remove_privs()
    vfs: fix deadlock in file_remove_privs() on overlayfs
    get rid of 'parent' argument of ->d_compare()
    cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
    affs ->d_compare(): don't bother with ->d_inode
    fold _d_rehash() and __d_rehash() together
    fold dentry_rcuwalk_invalidate() into its only remaining caller

    Linus Torvalds
     

06 Aug, 2016

6 commits

  • Pull qstr constification updates from Al Viro:
    "Fairly self-contained bunch - surprising lot of places passes struct
    qstr * as an argument when const struct qstr * would suffice; it
    complicates analysis for no good reason.

    I'd prefer to feed that separately from the assorted fixes (those are
    in #for-linus and with somewhat trickier topology)"

    * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    qstr: constify instances in adfs
    qstr: constify instances in lustre
    qstr: constify instances in f2fs
    qstr: constify instances in ext2
    qstr: constify instances in vfat
    qstr: constify instances in procfs
    qstr: constify instances in fuse
    qstr constify instances in fs/dcache.c
    qstr: constify instances in nfs
    qstr: constify instances in ocfs2
    qstr: constify instances in autofs4
    qstr: constify instances in hfs
    qstr: constify instances in hfsplus
    qstr: constify instances in logfs
    qstr: constify dentry_init_security

    Linus Torvalds
     
  • Pull virtio/vhost updates from Michael Tsirkin:

    - new vsock device support in host and guest

    - platform IOMMU support in host and guest, including compatibility
    quirks for legacy systems.

    - misc fixes and cleanups.

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    VSOCK: Use kvfree()
    vhost: split out vringh Kconfig
    vhost: detect 32 bit integer wrap around
    vhost: new device IOTLB API
    vhost: drop vringh dependency
    vhost: convert pre sorted vhost memory array to interval tree
    vhost: introduce vhost memory accessors
    VSOCK: Add Makefile and Kconfig
    VSOCK: Introduce vhost_vsock.ko
    VSOCK: Introduce virtio_transport.ko
    VSOCK: Introduce virtio_vsock_common.ko
    VSOCK: defer sock removal to transports
    VSOCK: transport-specific vsock_transport functions
    vhost: drop vringh dependency
    vop: pull in vhost Kconfig
    virtio: new feature to detect IOMMU device quirk
    balloon: check the number of available pages in leak balloon
    vhost: lockless enqueuing
    vhost: simplify work flushing

    Linus Torvalds
     
  • Pull more KVM updates from Paolo Bonzini:
    - ARM bugfix and MSI injection support
    - x86 nested virt tweak and OOPS fix
    - Simplify pvclock code (vdso bits acked by Andy Lutomirski).

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    nvmx: mark ept single context invalidation as supported
    nvmx: remove comment about missing nested vpid support
    KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
    KVM: documentation: fix KVM_CAP_X2APIC_API information
    x86: vdso: use __pvclock_read_cycles
    pvclock: introduce seqcount-like API
    arm64: KVM: Set cpsr before spsr on fault injection
    KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype
    KVM: arm/arm64: Enable MSI routing
    KVM: arm/arm64: Enable irqchip routing
    KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
    KVM: irqchip: Convey devid to kvm_set_msi
    KVM: Add devid in kvm_kernel_irq_routing_entry
    KVM: api: Pass the devid in the msi routing entry

    Linus Torvalds
     
  • …erry.reding/linux-pwm

    Pull pwm updates from Thierry Reding:
    "This set of changes improve some aspects of the atomic API as well as
    make use of this new API in the regulator framework to allow properly
    dealing with critical regulators controlled by a PWM.

    Aside from that there's a bunch of updates and cleanups for existing
    drivers, as well as the addition of new drivers for the Broadcom
    iProc, STMPE and ChromeOS EC controllers"

    * tag 'pwm/for-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (44 commits)
    regulator: pwm: Document pwm-dutycycle-unit and pwm-dutycycle-range
    regulator: pwm: Support extra continuous mode cases
    pwm: Add ChromeOS EC PWM driver
    dt-bindings: pwm: Add binding for ChromeOS EC PWM
    mfd: cros_ec: Add EC_PWM function definitions
    mfd: cros_ec: Add cros_ec_cmd_xfer_status() helper
    pwm: atmel: Use of_device_get_match_data()
    pwm: atmel: Fix checkpatch warnings
    pwm: atmel: Fix disabling of PWM channels
    dt-bindings: pwm: Add R-Car H3 device tree bindings
    pwm: rcar: Use ARCH_RENESAS
    pwm: tegra: Add support for Tegra186
    dt-bindings: pwm: tegra: Add compatible string for Tegra186
    pwm: tegra: Avoid overflow when calculating duty cycle
    pwm: tegra: Allow 100 % duty cycle
    pwm: tegra: Add support for reset control
    pwm: tegra: Rename mmio_base to regs
    pwm: tegra: Remove useless padding
    pwm: tegra: Drop NUM_PWM macro
    pwm: lpc32xx: Set PWM_PIN_LEVEL bit to default value
    ...

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Here's the second round of block updates for this merge window.

    It's a mix of fixes for changes that went in previously in this round,
    and fixes in general. This pull request contains:

    - Fixes for loop from Christoph

    - A bdi vs gendisk lifetime fix from Dan, worth two cookies.

    - A blk-mq timeout fix, when on frozen queues. From Gabriel.

    - Writeback fix from Jan, ensuring that __writeback_single_inode()
    does the right thing.

    - Fix for bio->bi_rw usage in f2fs from me.

    - Error path deadlock fix in blk-mq sysfs registration from me.

    - Floppy O_ACCMODE fix from Jiri.

    - Fix to the new bio op methods from Mike.

    One more followup will be coming here, ensuring that we don't
    propagate the block types outside of block. That, and a rename of
    bio->bi_rw is coming right after -rc1 is cut.

    - Various little fixes"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    mm/block: convert rw_page users to bio op use
    loop: make do_req_filebacked more robust
    loop: don't try to use AIO for discards
    blk-mq: fix deadlock in blk_mq_register_disk() error path
    Include: blkdev: Removed duplicate 'struct request;' declaration.
    Fixup direct bi_rw modifiers
    block: fix bdi vs gendisk lifetime mismatch
    blk-mq: Allow timeouts to run while queue is freezing
    nbd: fix race in ioctl
    block: fix use-after-free in seq file
    f2fs: drop bio->bi_rw manual assignment
    block: add missing group association in bio-cloning functions
    blkcg: kill unused field nr_undestroyed_grps
    writeback: Write dirty times for WB_SYNC_ALL writeback
    floppy: fix open(O_ACCMODE) for ioctl-only open

    Linus Torvalds
     
  • Pull more input updates from Dmitry Torokhov:
    "Two new drivers for touchscreen controllers:

    - Silead touchscreen controllers
    - SiS 9200 family touchscreen controllers

    and a few driver fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: silead - remove some dead code
    Input: sis-i2c - select CONFIG_CRC_ITU_T
    Input: add driver for SiS 9200 family I2C touchscreen controllers
    Input: ili210x - fix permissions on "calibrate" attribute
    Input: elan_i2c - properly wake up touchpad on ASUS laptops
    Input: add driver for Silead touchscreens
    Input: elantech - fix debug dump of the current packet
    Input: rotary_encoder - support binary encoding of states
    Input: xpad - power off wireless 360 controllers on suspend
    Input: i8042 - break load dependency between atkbd/psmouse and i8042
    Input: synaptics-rmi4 - do not check for NULL when calling of_node_put()
    Input: cros_ec_keyb - cleanup use of dev

    Linus Torvalds
     

05 Aug, 2016

12 commits

  • Pull RTC updates from Alexandre Belloni:
    "RTC for 4.8

    Cleanups:
    - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
    rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
    - move mn10300 to rtc-cmos

    Subsystem:
    - fix wakealarms after hibernate
    - multiples fixes for rctest
    - simplify implementations of .read_alarm

    New drivers:
    - Maxim MAX6916

    Drivers:
    - ds1307: fix weekday
    - m41t80: add wakeup support
    - pcf85063: add support for PCF85063A variant
    - rv8803: extend i2c fix and other fixes
    - s35390a: fix alarm reading, this fixes instant reboot after
    shutdown for QNAP TS-41x
    - s3c: clock fixes"

    * tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
    rtc: rv8803: Clear V1F when setting the time
    rtc: rv8803: Stop the clock while setting the time
    rtc: rv8803: Always apply the I²C workaround
    rtc: rv8803: Fix read day of week
    rtc: rv8803: Remove the check for valid time
    rtc: rv8803: Kconfig: Indicate rx8900 support
    rtc: asm9260: remove .owner field for driver
    rtc: at91sam9: Fix missing spin_lock_init()
    rtc: m41t80: add suspend handlers for alarm IRQ
    rtc: m41t80: make it a real error message
    rtc: pcf85063: Add support for the PCF85063A device
    rtc: pcf85063: fix year range
    rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
    rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
    rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
    rtc: s3c: Remove unnecessary call to disable already disabled clock
    rtc: abx80x: use devm_add_action_or_reset()
    rtc: m41t80: use devm_add_action_or_reset()
    rtc: fix a typo and reduce three empty lines to one
    rtc: s35390a: improve two comments in .set_alarm
    ...

    Linus Torvalds
     
  • Pull more powerpc updates from Michael Ellerman:
    "These were delayed for various reasons, so I let them sit in next a
    bit longer, rather than including them in my first pull request.

    Fixes:
    - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
    - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
    - Move register_process_table() out of ppc_md from Michael Ellerman

    Use jump_label use for [cpu|mmu]_has_feature():
    - Add mmu_early_init_devtree() from Michael Ellerman
    - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
    - Do hash device tree scanning earlier from Michael Ellerman
    - Do radix device tree scanning earlier from Michael Ellerman
    - Do feature patching before MMU init from Michael Ellerman
    - Check features don't change after patching from Michael Ellerman
    - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
    - Convert mmu_has_feature() to returning bool from Michael Ellerman
    - Convert cpu_has_feature() to returning bool from Michael Ellerman
    - Define radix_enabled() in one place & use static inline from Michael Ellerman
    - Add early_[cpu|mmu]_has_feature() from Michael Ellerman
    - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
    - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
    - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
    - Remove mfvtb() from Kevin Hao
    - Move cpu_has_feature() to a separate file from Kevin Hao
    - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
    - Add option to use jump label for cpu_has_feature() from Kevin Hao
    - Add option to use jump label for mmu_has_feature() from Kevin Hao
    - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
    - Annotate jump label assembly from Michael Ellerman

    TLB flush enhancements from Aneesh Kumar K.V:
    - radix: Implement tlb mmu gather flush efficiently
    - Add helper for finding SLBE LLP encoding
    - Use hugetlb flush functions
    - Drop multiple definition of mm_is_core_local
    - radix: Add tlb flush of THP ptes
    - radix: Rename function and drop unused arg
    - radix/hugetlb: Add helper for finding page size
    - hugetlb: Add flush_hugetlb_tlb_range
    - remove flush_tlb_page_nohash

    Add new ptrace regsets from Anshuman Khandual and Simon Guo:
    - elf: Add powerpc specific core note sections
    - Add the function flush_tmregs_to_thread
    - Enable in transaction NT_PRFPREG ptrace requests
    - Enable in transaction NT_PPC_VMX ptrace requests
    - Enable in transaction NT_PPC_VSX ptrace requests
    - Adapt gpr32_get, gpr32_set functions for transaction
    - Enable support for NT_PPC_CGPR
    - Enable support for NT_PPC_CFPR
    - Enable support for NT_PPC_CVMX
    - Enable support for NT_PPC_CVSX
    - Enable support for TM SPR state
    - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
    - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
    - Enable support for EBB registers
    - Enable support for Performance Monitor registers"

    * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
    powerpc/mm: Move register_process_table() out of ppc_md
    powerpc/perf: Fix incorrect event codes in power9-event-list
    powerpc/32: Fix early access to cpu_spec relocation
    powerpc/ptrace: Enable support for Performance Monitor registers
    powerpc/ptrace: Enable support for EBB registers
    powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
    powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
    powerpc/ptrace: Enable support for TM SPR state
    powerpc/ptrace: Enable support for NT_PPC_CVSX
    powerpc/ptrace: Enable support for NT_PPC_CVMX
    powerpc/ptrace: Enable support for NT_PPC_CFPR
    powerpc/ptrace: Enable support for NT_PPC_CGPR
    powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
    powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
    powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
    powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
    powerpc/process: Add the function flush_tmregs_to_thread
    elf: Add powerpc specific core note sections
    powerpc/mm: remove flush_tlb_page_nohash
    powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
    ...

    Linus Torvalds
     
  • Pull second round of rdma updates from Doug Ledford:
    "This can be split out into just two categories:

    - fixes to the RDMA R/W API in regards to SG list length limits
    (about 5 patches)

    - fixes/features for the Intel hfi1 driver (everything else)

    The hfi1 driver is still being brought to full feature support by
    Intel, and they have a lot of people working on it, so that amounts to
    almost the entirety of this pull request"

    * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
    IB/hfi1: Add cache evict LRU list
    IB/hfi1: Fix memory leak during unexpected shutdown
    IB/hfi1: Remove unneeded mm argument in remove function
    IB/hfi1: Consistently call ops->remove outside spinlock
    IB/hfi1: Use evict mmu rb operation
    IB/hfi1: Add evict operation to the mmu rb handler
    IB/hfi1: Fix TID caching actions
    IB/hfi1: Make the cache handler own its rb tree root
    IB/hfi1: Make use of mm consistent
    IB/hfi1: Fix user SDMA racy user request claim
    IB/hfi1: Fix error condition that needs to clean up
    IB/hfi1: Release node on insert failure
    IB/hfi1: Validate SDMA user iovector count
    IB/hfi1: Validate SDMA user request index
    IB/hfi1: Use the same capability state for all shared contexts
    IB/hfi1: Prevent null pointer dereference
    IB/hfi1: Rename TID mmu_rb_* functions
    IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
    IB/hfi1: Restructure hfi1_file_open
    IB/hfi1: Make iovec loop index easy to understand
    ...

    Linus Torvalds
     
  • Pull base rdma updates from Doug Ledford:
    "Round one of 4.8 code: while this is mostly normal, there is a new
    driver in here (the driver was hosted outside the kernel for several
    years and is actually a fairly mature and well coded driver). It
    amounts to 13,000 of the 16,000 lines of added code in here.

    Summary:

    - Updates/fixes for iw_cxgb4 driver
    - Updates/fixes for mlx5 driver
    - Add flow steering and RSS API
    - Add hardware stats to mlx4 and mlx5 drivers
    - Add firmware version API for RDMA driver use
    - Add the rxe driver (this is a software RoCE driver that makes any
    Ethernet device a RoCE device)
    - Fixes for i40iw driver
    - Support for send only multicast joins in the cma layer
    - Other minor fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
    Soft RoCE driver
    IB/core: Support for CMA multicast join flags
    IB/sa: Add cached attribute containing SM information to SA port
    IB/uverbs: Fix race between uverbs_close and remove_one
    IB/mthca: Clean up error unwind flow in mthca_reset()
    IB/mthca: NULL arg to pci_dev_put is OK
    IB/hfi1: NULL arg to sc_return_credits is OK
    IB/mlx4: Add diagnostic hardware counters
    net/mlx4: Query performance and diagnostics counters
    net/mlx4: Add diagnostic counters capability bit
    Use smaller 512 byte messages for portmapper messages
    IB/ipoib: Report SG feature regardless of HW UD CSUM capability
    IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
    IB/hfi1: Disable by default
    IB/rdmavt: Disable by default
    IB/mlx5: Fix port counter ID association to QP offset
    IB/mlx5: Fix iteration overrun in GSI qps
    i40iw: Add NULL check for puda buffer
    i40iw: Change dup_ack_thresh to u8
    i40iw: Remove unnecessary check for moving CQ head
    ...

    Linus Torvalds
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "The most notable item is IBM virtual SCSI target driver, that was
    originally ported to target-core back in 2010 by Tomo-san, and has
    been brought forward to v4.x code by Bryant Ly, Michael Cyr and co
    over the last months.

    Also included are two ORDERED task related bug-fixes Bryant + Michael
    found along the way using ibmvscsis with AIX guests, plus a few
    miscellaneous target-core + iscsi-target bug-fixes with associated
    stable tags"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: fix spelling mistake: "limitiation" -> "limitation"
    target: Fix residual overflow handling in target_complete_cmd_with_length
    tcm_fc: set and unset FCP_SPPF_TARG_FCN
    iscsi-target: Fix panic when adding second TCP connection to iSCSI session
    ibmvscsis: Initial commit of IBM VSCSI Tgt Driver
    target: Fix ordered task CHECK_CONDITION early exception handling
    target: Fix ordered task target_setup_cmd_from_cdb exception hang
    target: Fix max_unmap_lba_count calc overflow
    target: Fix race between iscsi-target connection shutdown + ABORT_TASK
    target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Trond made a change to the server's tcp logic that allows a fast
    client to better take advantage of high bandwidth networks, but may
    increase the risk that a single client could starve other clients;
    a new sunrpc.svc_rpc_per_connection_limit parameter should help
    mitigate this in the (hopefully unlikely) event this becomes a
    problem in practice.

    - Tom Haynes added a minimal flex-layout pnfs server, which is of no
    use in production for now--don't build it unless you're doing
    client testing or further server development"

    * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: remove some dead code in nfsd_create_locked()
    nfsd: drop unnecessary MAY_EXEC check from create
    nfsd: clean up bad-type check in nfsd_create_locked
    nfsd: remove unnecessary positive-dentry check
    nfsd: reorganize nfsd_create
    nfsd: check d_can_lookup in fh_verify of directories
    nfsd: remove redundant zero-length check from create
    nfsd: Make creates return EEXIST instead of EACCES
    SUNRPC: Detect immediate closure of accepted sockets
    SUNRPC: accept() may return sockets that are still in SYN_RECV
    nfsd: allow nfsd to advertise multiple layout types
    nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
    nfsd/blocklayout: Make sure calculate signature/designator length aligned
    xfs: abstract block export operations from nfsd layouts
    SUNRPC: Remove unused callback xpo_adjust_wspace()
    SUNRPC: Change TCP socket space reservation
    SUNRPC: Add a server side per-connection limit
    SUNRPC: Micro optimisation for svc_data_ready
    SUNRPC: Call the default socket callbacks instead of open coding
    SUNRPC: lock the socket while detaching it
    ...

    Linus Torvalds
     
  • Pull more btrfs updates from Chris Mason:
    "This is part two of my btrfs pull, which is some cleanups and a batch
    of fixes.

    Most of the code here is from Jeff Mahoney, making the pointers we
    pass around internally more consistent and less confusing overall. I
    noticed a small problem right before I sent this out yesterday, so I
    fixed it up and re-tested overnight"

    * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
    Btrfs: fix __MAX_CSUM_ITEMS
    btrfs: btrfs_abort_transaction, drop root parameter
    btrfs: add btrfs_trans_handle->fs_info pointer
    btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
    btrfs: convert nodesize macros to static inlines
    btrfs: introduce BTRFS_MAX_ITEM_SIZE
    btrfs: cleanup, remove prototype for btrfs_find_root_ref
    btrfs: copy_to_sk drop unused root parameter
    btrfs: simpilify btrfs_subvol_inherit_props
    btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
    btrfs: tests, require fs_info for root
    btrfs: tests, move initialization into tests/
    btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
    btrfs: prefix fsid to all trace events
    btrfs: plumb fs_info into btrfs_work
    btrfs: remove obsolete part of comment in statfs
    btrfs: hide test-only member under ifdef
    btrfs: Ratelimit "no csum found" info message
    btrfs: Add ratelimit to btrfs printing
    Btrfs: fix unexpected balance crash due to BUG_ON
    ...

    Linus Torvalds
     
  • Pull m68knommu updates from Greg Ungerer:
    "This series is all about Nicolas flat format support for MMU systems.

    Traditional m68k no-MMU flat format binaries can now be run on m68k
    MMU enabled systems too. The series includes some nice cleanups of
    the binfmt_flat code and converts it to using proper user space
    accessor functions.

    With all this in place you can boot and run a complete no-MMU flat
    format based user space on an MMU enabled system"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68k: enable binfmt_flat on systems with an MMU
    binfmt_flat: allow compressed flat binary format to work on MMU systems
    binfmt_flat: add MMU-specific support
    binfmt_flat: update libraries' data segment pointer with userspace accessors
    binfmt_flat: use clear_user() rather than memset() to clear .bss
    binfmt_flat: use proper user space accessors with old relocs code
    binfmt_flat: use proper user space accessors with relocs processing code
    binfmt_flat: clean up create_flat_tables() and stack accesses
    binfmt_flat: use generic transfer_args_to_stack()
    elf_fdpic_transfer_args_to_stack(): make it generic
    binfmt_flat: prevent kernel dammage from corrupted executable headers
    binfmt_flat: convert printk invocations to their modern form
    binfmt_flat: assorted cleanups
    m68k: use same start_thread() on MMU and no-MMU
    m68k: fix file path comment
    m68k: fix bFLT executable running on MMU enabled systems

    Linus Torvalds
     
  • The rw_page users were not converted to use bio/req ops. As a result
    bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
    be sent down as reads.

    Signed-off-by: Mike Christie
    Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

    Modified by me to:

    1) Drop op_flags passing into ->rw_page(), as we don't use it.
    2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

    Signed-off-by: Jens Axboe

    Mike Christie
     
  • In include/linux/blkdev.h duplicate declarations of the request
    struct exist. Cleaned up by removing the second, unneeded
    declaration.

    Signed-off-by: John Pittman
    Signed-off-by: Jens Axboe

    John Pittman
     
  • The name for a bdi of a gendisk is derived from the gendisk's devt.
    However, since the gendisk is destroyed before the bdi it leaves a
    window where a new gendisk could dynamically reuse the same devt while a
    bdi with the same name is still live. Arrange for the bdi to hold a
    reference against its "owner" disk device while it is registered.
    Otherwise we can hit sysfs duplicate name collisions like the following:

    WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
    sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

    Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
    0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
    ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
    0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
    Call Trace:
    [] dump_stack+0x63/0x87
    [] __warn+0xd1/0xf0
    [] warn_slowpath_fmt+0x5f/0x80
    [] sysfs_warn_dup+0x64/0x80
    [] sysfs_create_dir_ns+0x7e/0x90
    [] kobject_add_internal+0xaa/0x320
    [] ? vsnprintf+0x34e/0x4d0
    [] kobject_add+0x75/0xd0
    [] ? mutex_lock+0x12/0x2f
    [] device_add+0x125/0x610
    [] device_create_groups_vargs+0xd8/0x100
    [] device_create_vargs+0x1c/0x20
    [] bdi_register+0x8c/0x180
    [] bdi_register_dev+0x27/0x30
    [] add_disk+0x175/0x4a0

    Cc:
    Reported-by: Yi Zhang
    Tested-by: Yi Zhang
    Signed-off-by: Dan Williams

    Fixed up missing 0 return in bdi_register_owner().

    Signed-off-by: Jens Axboe

    Dan Williams
     
  • When a bio is cloned, the newly created bio must be associated with
    the same blkcg as the original bio (if BLK_CGROUP is enabled). If
    this operation is not performed, then the new bio is not associated
    with any group, and the group of the current task is returned when
    the group of the bio is requested.

    Depending on the cloning frequency, this may cause a large
    percentage of the bios belonging to a given group to be treated
    as if belonging to other groups (in most cases as if belonging to
    the root group). The expected group isolation may thereby be broken.

    This commit adds the missing association in bio-cloning functions.

    Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")
    Cc: stable@vger.kernel.org # v4.3+

    Signed-off-by: Paolo Valente
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Paolo Valente