29 Aug, 2015

5 commits

  • Add a ->handler and a ->handler_data field to struct scsi_device and kill
    this indirection. Also move struct scsi_device_handler to scsi_dh.h so that
    changes to it don't require rebuilding every SCSI LLDD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: James Bottomley

    Christoph Hellwig
     
  • Add a single list of devices that need non-ALUA device handlers to the core
    scsi_dh code so that we can autoload the modules for them at probe time.

    While this is a little ugly in terms of architecture it actually
    significantly simplifies the code in addition to the new autoloading
    functionality.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Acked-by: Mike Snitzer
    Signed-off-by: James Bottomley

    Christoph Hellwig
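
    The single-table lookup described in this commit can be sketched in plain
    C as follows. This is a minimal userspace illustration, not the kernel
    code: the struct name dh_entry, the function dh_match() and every table
    entry are hypothetical, and prefix matching on vendor and model is an
    assumption made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: INQUIRY vendor/model prefix -> handler name. */
struct dh_entry {
	const char *vendor;
	const char *model;
	const char *handler;
};

/* Illustrative entries only; this is not the list used by the kernel. */
static const struct dh_entry dh_table[] = {
	{ "ACME", "RAID",  "acme_dh" },
	{ "ACME", "DISK",  "acme_dh" },
	{ "FOO",  "ARRAY", "foo_dh"  },
};

/*
 * Return the handler name for a device, or NULL if no entry matches.
 * Both fields are compared as prefixes, so model "RAID 5" matches "RAID".
 */
const char *dh_match(const char *vendor, const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(dh_table) / sizeof(dh_table[0]); i++) {
		const struct dh_entry *e = &dh_table[i];

		if (!strncmp(vendor, e->vendor, strlen(e->vendor)) &&
		    !strncmp(model, e->model, strlen(e->model)))
			return e->handler;
	}
	return NULL;
}
```

    A probe path would call dh_match() once per device and, on a non-NULL
    result, ask for the module of that name to be loaded before attaching.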
     
  • Stop building scsi_dh as a separate module and integrate it fully into the
    core SCSI code with explicit callouts at bus scan time. For now the
    callouts are placed at the same point as the old bus notifiers were called,
    but in the future we will be able to look at ALUA INQUIRY data earlier on.

    Note that this also means that the device handler modules need to be loaded
    by the time we scan the bus. The next patches will add support for
    autoloading device handlers at bus scan time to make sure they are always
    loaded if they are enabled in the kernel config.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Acked-by: Mike Snitzer
    Signed-off-by: James Bottomley

    Christoph Hellwig
     
  • This way we can reuse the same code for any attachment method, not just those
    requested from dm-mpath.

    [jejb: fixup checkpatch error]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Acked-by: Mike Snitzer
    Signed-off-by: James Bottomley

    Christoph Hellwig
     
  • While allowing dm-mpath to attach device handlers is functionality we need
    for backwards compatibility reasons, there is no reason to reference count
    them and detach them if dm-mpath stops using the device for some reason.

    If the device handler works for the given device it can just stay attached,
    and we can take the retain_hw_handler codepath.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Snitzer
    Acked-by: Hannes Reinecke
    Signed-off-by: James Bottomley

    Christoph Hellwig
     

27 Aug, 2015

2 commits

  • Add support for physical LUN segmentation (virtual LUNs) to the device
    driver supporting the IBM CXL Flash adapter. This patch allows user
    space applications to virtually segment a physical LUN into N virtual
    LUNs, taking advantage of the translation features provided by this
    adapter.

    Signed-off-by: Matthew R. Ochs
    Signed-off-by: Manoj N. Kumar
    Reviewed-by: Michael Neuling
    Reviewed-by: Wen Xiong
    Signed-off-by: James Bottomley

    Matthew R. Ochs
     
  • Add superpipe supporting infrastructure to the device driver for the IBM CXL
    Flash adapter. This patch allows userspace applications to take advantage
    of the accelerated I/O features that this adapter provides and bypass the
    traditional filesystem stack.

    Signed-off-by: Matthew R. Ochs
    Signed-off-by: Manoj N. Kumar
    Reviewed-by: Michael Neuling
    Reviewed-by: Wen Xiong
    Reviewed-by: Brian King
    Signed-off-by: James Bottomley

    Matthew R. Ochs
     

26 Aug, 2015

1 commit


31 Jul, 2015

1 commit

  • The iSCSI session recovery_tmo setting is writeable in sysfs, but it's
    also set every time a connection is established when parameters are set
    from iscsid over netlink. That results in the timeout being reset to
    the default value after every recovery.

    The DM multipath tools want to use the sysfs interface to lower the
    default timeout when there are multiple paths to fail over. It has
    caused confusion that we have a writeable sysfs value that seems to keep
    resetting itself.

    This patch adds an in-kernel flag that gets set once a sysfs write
    occurs, and then ignores netlink parameter setting once it's been
    modified via the sysfs interface. My thinking here is that the sysfs
    interface is much simpler for external tools to influence the session
    timeout, but if we're going to allow it to be modified directly we
    should ensure that setting is maintained.

    Signed-off-by: Chris Leech
    Reviewed-by: Mike Christie
    Signed-off-by: James Bottomley

    Chris Leech
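
    The override behaviour described in this commit can be sketched as a small
    state machine. This is a minimal sketch under stated assumptions: struct
    session and both setter functions are hypothetical stand-ins, not the
    iSCSI transport class API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the session state discussed above. */
struct session {
	int recovery_tmo;        /* timeout in seconds */
	bool sysfs_override;     /* set once the value was written via sysfs */
};

/* sysfs write path: always applies and marks the value as user-owned. */
void session_set_tmo_sysfs(struct session *s, int tmo)
{
	s->recovery_tmo = tmo;
	s->sysfs_override = true;
}

/*
 * netlink path: ignored once sysfs has been used, so a reconnect no
 * longer resets the timeout. Returns true if the value was applied.
 */
bool session_set_tmo_netlink(struct session *s, int tmo)
{
	if (s->sysfs_override)
		return false;
	s->recovery_tmo = tmo;
	return true;
}
```

    With this shape, a tool that lowers the timeout through sysfs keeps its
    setting across recoveries, while sessions never touched through sysfs
    continue to follow the daemon's parameters.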
     

13 Jul, 2015

2 commits

  • Pull timer fixes from Thomas Gleixner:
    "This update from the timer department contains:

    - A series of patches which address a shortcoming in the tick
    broadcast code.

    If the broadcast device is not available or an hrtimer emulated
    broadcast device, some of the original assumptions lead to boot
    failures. I rather plugged all of the corner cases instead of only
    addressing the issue reported, so the change got a little larger.

    Has been extensively tested on x86 and arm.

    - Get rid of the last holdouts using do_posix_clock_monotonic_gettime()

    - A regression fix for the imx clocksource driver

    - An update to the new state callbacks mechanism for clockevents.
    This is required to simplify the conversion, which will take place
    in 4.3"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick/broadcast: Prevent NULL pointer dereference
    time: Get rid of do_posix_clock_monotonic_gettime
    cris: Replace do_posix_clock_monotonic_gettime()
    tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
    tick/broadcast: Handle spurious interrupts gracefully
    tick/broadcast: Check for hrtimer broadcast active early
    tick/broadcast: Return busy when IPI is pending
    tick/broadcast: Return busy if periodic mode and hrtimer broadcast
    tick/broadcast: Move the check for periodic mode inside state handling
    tick/broadcast: Prevent deep idle if no broadcast device available
    tick/broadcast: Make idle check independent from mode and config
    tick/broadcast: Sanity check the shutdown of the local clock_event
    tick/broadcast: Prevent hrtimer recursion
    clockevents: Allow set-state callbacks to be optional
    clocksource/imx: Define clocksource for mx27

    Linus Torvalds
     
  • Pull irq fix from Thomas Gleixner:
    "A single fix for a cpu hotplug race vs. interrupt descriptors:

    Prevent irq setup/teardown across the cpu starting/dying parts of cpu
    hotplug so that the starting/dying cpu has a stable view of the
    descriptor space. This has been an issue for all architectures in the
    cpu dying phase, where interrupts are migrated away from the dying
    cpu. In the starting phase it's mostly an x86 issue vs the vector space
    update"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hotplug: Prevent alloc/free of irq descriptors during cpu up/down

    Linus Torvalds
     

12 Jul, 2015

2 commits

  • Pull libnvdimm fixes from Dan Williams:
    "1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
    bug fixes (patches 1-6)

    2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).

    Granted these are slightly large for a -rc update. They have been
    out for review in one form or another since the end of May and were
    deferred from the merge window while we settled on the "PMEM API"
    for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
    and wmb_pmem).

    Now that those apis are merged we implement them in the BLK driver
    to guarantee that mmio aperture moves stay ordered with respect to
    incoming read/write requests, and that writes are flushed through
    those mmio-windows and platform-buffers to be persistent on media.

    These pass the sub-system unit tests with the updates to
    tools/testing/nvdimm, and have received a successful build-report from
    the kbuild robot (468 configs).

    With acks from Rafael for the touches to drivers/acpi/"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
    nfit: add support for NVDIMM "latch" flag
    nfit: update block I/O path to use PMEM API
    tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
    tools/testing/nvdimm: fix return code for unimplemented commands
    tools/testing/nvdimm: mock ioremap_wt
    pmem: add maintainer for include/linux/pmem.h
    nfit: fix smatch "use after null check" report
    nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
    libnvdimm: smatch cleanups in __nd_ioctl
    sparse: fix misplaced __pmem definition

    Linus Torvalds
     
  • Pull ARM SoC fixes from Kevin Hilman:
    "A fairly random collection of fixes based on -rc1 for OMAP, sunxi and
    prima2 as well as a few arm64-specific DT fixes.

    This series also includes a late change to support a new Allwinner (sunxi)
    SoC, but since it's rather simple and isolated to the
    platform-specific code, it's included for this -rc"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
    arm: dts: vexpress: add missing CCI PMU device node to TC2
    arm: dts: vexpress: describe all PMUs in TC2 dts
    GICv3: Add ITS entry to THUNDER dts
    arm64: dts: Add poweroff button device node for APM X-Gene platform
    ARM: dts: am4372.dtsi: disable rfbi
    ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
    ARM: dts: am4372: Add emif node
    Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
    ARM: sunxi: Enable simplefb in the defconfig
    ARM: Remove deprecated symbol from defconfig files
    ARM: sunxi: Add Machine support for A33
    ARM: sunxi: Introduce Allwinner H3 support
    Documentation: sunxi: Update Allwinner SoC documentation
    ARM: prima2: move to use REGMAP APIs for rtciobrg
    ARM: dts: atlas7: add pinctrl and gpio descriptions
    ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
    memory: omap-gpmc: Fix parsing of devices

    Linus Torvalds
     

10 Jul, 2015

2 commits

  • Pull Ceph fixes from Sage Weil:
    "There is a fix for CephFS and RBD when used within containers/namespaces,
    and a fix for the address learning the client is supposed to do when
    initially talking to the Ceph cluster.

    There are also two patches updating MAINTAINERS. One breaks out the
    common Ceph code shared by fs/ceph and drivers/block/rbd.c into a
    separate entry with the appropriate maintainers listed. The second
    adds a second reference to the github tree where the Ceph client
    development takes place (before it is pushed to korg and then to you).

    The goal here is to move closer to a situation where Ilya Dryomov or
    one of the other maintainers can push things to you if I am
    unavailable. Ilya has done most of the work preparing branches for
    upstream recently; you should not be surprised to hear from him if I
    am trapped in some internet-less wasteland or hit by a bus or
    something. In the meantime, we'll work on getting him added to the
    kernel web of trust"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    MAINTAINERS: add secondary tree for ceph modules
    MAINTAINERS: update ceph entries
    libceph: treat sockaddr_storage with uninitialized family as blank
    libceph: enable ceph in a non-default network namespace

    Linus Torvalds
     
  • Grab a reference on a network namespace of the 'rbd map' (in case of
    rbd) or 'mount' (in case of ceph) process and use that to open sockets
    instead of always using init_net and bailing if network namespace is
    anything but init_net. Be careful to not share struct ceph_client
    instances between different namespaces and don't add any code in the
    !CONFIG_NET_NS case.

    This is based on a patch from Hong Zhiguo.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil

    Ilya Dryomov
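
    The two rules in this commit (hold a reference on the creator's namespace,
    and never share a client across namespaces) can be sketched as follows.
    This is a simplified userspace model: struct net_ns, struct client and all
    functions here are hypothetical, and the plain integer refcount stands in
    for whatever reference counting the real code uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified network namespace with a plain refcount. */
struct net_ns {
	int refcount;
};

struct net_ns *net_get(struct net_ns *net)
{
	net->refcount++;
	return net;
}

void net_put(struct net_ns *net)
{
	net->refcount--;
}

/* Hypothetical client: remembers the namespace of the creating process. */
struct client {
	struct net_ns *net;
	unsigned int opts;
};

/* Pin the namespace of the 'rbd map' or 'mount' process for socket use. */
void client_init(struct client *c, struct net_ns *creator_net,
		 unsigned int opts)
{
	c->net = net_get(creator_net);
	c->opts = opts;
}

void client_destroy(struct client *c)
{
	net_put(c->net);
	c->net = NULL;
}

/* An existing client may be reused only from the same namespace. */
bool client_can_share(const struct client *c, const struct net_ns *net,
		      unsigned int opts)
{
	return c->net == net && c->opts == opts;
}
```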
     

09 Jul, 2015

3 commits

  • All users gone. Remove it before we get another one.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Pull power management and ACPI updates from Rafael Wysocki:
    "These are fixes on top of the previous PM+ACPI pull requests
    (including one fix for a 4.1 regression) and two commits adding
    _CLS-based device enumeration support to the ACPI core and the ATA
    subsystem that waited for the latest ACPICA changes to be merged.

    Specifics:

    - Fix for an ACPI resources management regression introduced during
    the 4.1 cycle (that unfortunately went into -stable) effectively
    reverting the bad commit along with the recent fixups on top of it
    and using an alternative approach to address the underlying issue
    (Rafael J Wysocki).

    - Fix for a memory leak and an incorrect return value in an error
    code path in the ACPI LPSS (Low-Power Subsystem) driver (Rafael J
    Wysocki).

    - Fix for a leftover dangling pointer in an error code path in the
    new wakeup IRQ support code (Rafael J Wysocki).

    - Fix to prevent infinite loops (due to errors in other places) from
    happening in the core generic PM domains support code (Geert
    Uytterhoeven).

    - Hibernation documentation update/clarification (Uwe Geuder).

    - Support for _CLS-based device enumeration in the ACPI core and in
    the ATA subsystem (Suravee Suthikulpanit)"

    * tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / wakeirq: Avoid setting power.wakeirq too hastily
    ata: ahci_platform: Add ACPI _CLS matching
    ACPI / scan: Add support for ACPI _CLS device matching
    PM / hibernate: clarify resume documentation
    PM / Domains: Avoid infinite loops in attach/detach code
    ACPI / LPSS: Fix up acpi_lpss_create_device()
    ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage

    Linus Torvalds
     
  • …el/git/baohua/linux into fixes

    Merge "CSR SiRFSoC rtc iobrg move to regmap for 4.2" from Barry Song:

    move CSR rtc iobrg read/write API to be regmap

    this moves to general APIs, and all drivers will be changed based
    on it.

    * tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux:
    ARM: prima2: move to use REGMAP APIs for rtciobrg

    Kevin Hilman
     

08 Jul, 2015

4 commits

  • When a cpu goes up some architectures (e.g. x86) have to walk the irq
    space to set up the vector space for the cpu. While this needs extra
    protection at the architecture level we can avoid a few race
    conditions by preventing the concurrent allocation/free of irq
    descriptors and the associated data.

    When a cpu goes down it moves the interrupts which are targeted to
    this cpu away by reassigning the affinities. While this happens
    interrupts can be allocated and freed, which opens a can of race
    conditions in the code which reassigns the affinities because
    interrupt descriptors might be freed underneath.

    Example:

    CPU1                            CPU2
    cpu_up/down
     irq_desc = irq_to_desc(irq);
                                    remove_from_radix_tree(desc);
     raw_spin_lock(&desc->lock);
                                    free(desc);

    We could protect the irq descriptors with RCU, but that would require
    a full tree change of all accesses to interrupt descriptors. But
    fortunately these kinds of race conditions are rather limited to a few
    things like cpu hotplug. The normal setup/teardown is very well
    serialized. So the simpler and obvious solution is:

    Prevent allocation and freeing of interrupt descriptors across cpu
    hotplug.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: xiao jin
    Cc: Joerg Roedel
    Cc: Borislav Petkov
    Cc: Yanmin Zhang
    Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de

    Thomas Gleixner
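
    The serialization described in this commit can be sketched in userspace C.
    This is a minimal model, assuming a single mutex shared by the
    allocation/free path and the hotplug path; every name below is
    hypothetical and a pthread mutex stands in for the kernel's locking.

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_IRQS 16

struct irq_desc {
	int irq;
};

static struct irq_desc *irq_table[NR_IRQS];
static pthread_mutex_t sparse_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hotplug brackets its walk of the descriptor table with these calls. */
void hotplug_lock_descs(void)
{
	pthread_mutex_lock(&sparse_lock);
}

void hotplug_unlock_descs(void)
{
	pthread_mutex_unlock(&sparse_lock);
}

/* Allocation takes the same lock, so it cannot run during hotplug. */
int desc_alloc(int irq)
{
	int ret = -1;

	if (irq < 0 || irq >= NR_IRQS)
		return -1;
	pthread_mutex_lock(&sparse_lock);
	if (!irq_table[irq]) {
		struct irq_desc *d = malloc(sizeof(*d));

		if (d) {
			d->irq = irq;
			irq_table[irq] = d;
			ret = 0;
		}
	}
	pthread_mutex_unlock(&sparse_lock);
	return ret;
}

/* Freeing is serialized too: no descriptor vanishes under hotplug. */
void desc_free(int irq)
{
	if (irq < 0 || irq >= NR_IRQS)
		return;
	pthread_mutex_lock(&sparse_lock);
	free(irq_table[irq]);
	irq_table[irq] = NULL;
	pthread_mutex_unlock(&sparse_lock);
}

/* Hotplug-side walk; the caller must hold the lock. */
int hotplug_count_descs(void)
{
	int i, n = 0;

	for (i = 0; i < NR_IRQS; i++)
		if (irq_table[i])
			n++;
	return n;
}
```

    The design choice mirrors the commit text: rather than protecting every
    descriptor access (the RCU option), only the rare hotplug window is
    excluded against allocation and freeing.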
     
  • * acpi-scan:
    ata: ahci_platform: Add ACPI _CLS matching
    ACPI / scan: Add support for ACPI _CLS device matching

    Rafael J. Wysocki
     
  • Making tick_broadcast_oneshot_control() independent from
    CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
    CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
    there.

    Provide a proper stub inline.

    Fixes: f32dd1170511 'tick/broadcast: Make idle check independent from mode and config'
    Reported-by: kbuild test robot
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Currently the broadcast busy check, which prevents the idle code from
    going into deep idle, works only in one shot mode.

    If NOHZ and HIGHRES are off (config or command line) there is no
    sanity check at all, so under certain conditions cpus are allowed to
    go into deep idle, where the local timer stops, and are not woken up
    again because there is no broadcast timer installed or a hrtimer based
    broadcast device is not evaluated.

    Move tick_broadcast_oneshot_control() into the common code and provide
    proper subfunctions for the various config combinations.

    The common check in tick_broadcast_oneshot_control() is for the C3STOP
    misfeature flag of the local clock event device. If it's not set, idle
    can proceed. If set, further checks are necessary.

    Provide checks for the trivial cases:

    - If broadcast is disabled in the config, then return busy

    - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
    busy if the broadcast device is hrtimer based.

    - If oneshot mode is enabled in the config call the original
    tick_broadcast_oneshot_control() function. That function needs
    extra checks which will be implemented in separate patches.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
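
    The decision order listed in this commit can be written as one small
    function. This is a sketch, not the kernel implementation: the config
    options are modelled as runtime booleans, struct idle_cfg and both
    functions are hypothetical, and the oneshot path is a stub that always
    allows idle.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical runtime model of the config and device state. */
struct idle_cfg {
	bool local_c3stop;       /* local timer stops in deep idle */
	bool broadcast_enabled;  /* broadcast support built in */
	bool oneshot_enabled;    /* NOHZ/HIGHRES available */
	bool bc_is_hrtimer;      /* broadcast device is hrtimer based */
};

/* Stub for the original oneshot handling, which needs further checks. */
int oneshot_control_stub(void)
{
	return 0;
}

/* Return 0 if deep idle may proceed, -EBUSY if it must be refused. */
int broadcast_idle_check(const struct idle_cfg *c)
{
	/* Common check: without C3STOP the local timer keeps running. */
	if (!c->local_c3stop)
		return 0;

	/* No broadcast support: nothing could wake the cpu up again. */
	if (!c->broadcast_enabled)
		return -EBUSY;

	/* Periodic mode: an hrtimer based broadcast device is not usable. */
	if (!c->oneshot_enabled)
		return c->bc_is_hrtimer ? -EBUSY : 0;

	/* Oneshot mode: defer to the original handling. */
	return oneshot_control_stub();
}
```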
     

07 Jul, 2015

2 commits

  • Device drivers typically use ACPI _HIDs/_CIDs listed in struct device_driver
    acpi_match_table to match devices. However, for generic drivers, we do not
    want to list _HID for all supported devices. Also, certain classes of devices
    do not have _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS,
    which specifies PCI-defined class code (i.e. base-class, subclass and
    programming interface). This patch adds support for matching ACPI devices using
    the _CLS method.

    To support loadable modules, the current design uses _HID or _CID to match a device's
    modalias. Matching with _CLS requires modifying
    the current ACPI modalias key to include _CLS. This patch appends the PCI-defined
    class code to the existing ACPI modalias as follows.

    acpi:<HID>:<CID1>:<CID2>:..:<CIDn>:<bbsspp>:
    E.g:
    # cat /sys/devices/platform/AMDI0600:00/modalias
    acpi:AMDI0600:010601:

    where bb is the base-class code, ss is the sub-class code, and pp is the
    programming interface code

    Since there would not be _HID/_CID in the ACPI matching table of the driver,
    this patch adds a field to acpi_device_id to specify the matching _CLS.

    static const struct acpi_device_id ahci_acpi_match[] = {
    { ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xffffff) },
    {},
    };

    In this case, the corresponding entry in modules.alias file would be:

    alias acpi*:010601:* ahci_platform

    Acked-by: Mika Westerberg
    Reviewed-by: Hanjun Guo
    Signed-off-by: Suravee Suthikulpanit
    Signed-off-by: Rafael J. Wysocki

    Suthikulpanit, Suravee
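
    The modalias layout and the masked class match from this commit can be
    sketched as two helpers. This is an illustration only: cls_modalias() and
    cls_match() are hypothetical names, only a single _HID is handled, and
    printing the class code as six uppercase hex digits is an assumption
    based on the example above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "acpi:<HID>:<bbsspp>:" for a device with one _HID and a 24-bit
 * class code (base class, subclass, programming interface).
 * Returns the snprintf() result.
 */
int cls_modalias(char *buf, size_t len, const char *hid, uint32_t cls)
{
	return snprintf(buf, len, "acpi:%s:%06X:", hid,
			(unsigned int)(cls & 0xffffff));
}

/*
 * Compare a device class code against a table entry under a mask, so an
 * entry can match on base class and subclass while ignoring the
 * programming interface.
 */
bool cls_match(uint32_t dev_cls, uint32_t id_cls, uint32_t mask)
{
	return ((dev_cls ^ id_cls) & mask & 0xffffff) == 0;
}
```

    With the class code 0x010601 and the _HID from the example, cls_modalias()
    produces the string shown in the sysfs output above.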
     
  • This effectively reverts the following three commits:

    7bc10388ccdd ACPI / resources: free memory on error in add_region_before()
    0f1b414d1907 ACPI / PNP: Avoid conflicting resource reservations
    b9a5e5e18fbf ACPI / init: Fix the ordering of acpi_reserve_resources()

    (commit b9a5e5e18fbf introduced regressions some of which, but not
    all, were addressed by commit 0f1b414d1907 and commit 7bc10388ccdd
    was a fixup on top of the latter) and causes ACPI fixed hardware
    resources to be reserved at the fs_initcall_sync stage of system
    initialization.

    The story is as follows. First, a boot regression was reported due
    to an apparent resource reservation ordering change after a commit
    that shouldn't lead to such changes. Investigation led to the
    conclusion that the problem happened because acpi_reserve_resources()
    was executed at the device_initcall() stage of system initialization
    which wasn't strictly ordered with respect to driver initialization
    (and with respect to the initialization of the pcieport driver in
    particular), so a random change causing the device initcalls to be
    run in a different order might break things.

    The response to that was to attempt to run acpi_reserve_resources()
    as soon as we knew that ACPI would be in use (commit b9a5e5e18fbf).
    However, that turned out to be too early, because it caused resource
    reservations made by the PNP system driver to fail on at least one
    system and that failure was addressed by commit 0f1b414d1907.

    That fix still turned out to be insufficient, though, because
    calling acpi_reserve_resources() before the fs_initcall stage of
    system initialization caused a boot regression to happen on the
    eCAFE EC-800-H20G/S netbook. That meant that we only could call
    acpi_reserve_resources() at the fs_initcall initialization stage
    or later, but then we might just as well call it after the PNP
    initialization in which case commit 0f1b414d1907 wouldn't be
    necessary any more.

    For this reason, the changes made by commit 0f1b414d1907 are reverted
    (along with a memory leak fixup on top of that commit), the changes
    made by commit b9a5e5e18fbf that went too far are reverted too and
    acpi_reserve_resources() is changed into fs_initcall_sync, which
    will cause it to be executed after the PNP subsystem initialization
    (which is an fs_initcall) and before device initcalls (including
    the pcieport driver initialization) which should avoid the initial
    issue.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
    Link: http://marc.info/?t=143092384600002&r=1&w=2
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
    Link: http://marc.info/?t=143389402600001&r=1&w=2
    Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()"
    Reported-by: Roland Dreier
    Cc: All applicable
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
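
    The ordering argument in this commit can be checked with a toy model of
    the three initcall stages involved. This is a sketch under stated
    assumptions: the kernel orders initcalls by link section rather than by
    sorting, and the enum, struct and entry names used here are illustrative
    only.

```c
#include <stdlib.h>

/* The three stages named in the commit, in execution order. */
enum init_level {
	LEVEL_FS = 0,       /* fs_initcall */
	LEVEL_FS_SYNC = 1,  /* fs_initcall_sync */
	LEVEL_DEVICE = 2,   /* device_initcall */
};

struct init_entry {
	const char *name;
	enum init_level level;
};

static int cmp_level(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	return (int)x->level - (int)y->level;
}

/* Arrange entries in the order their stages would run. */
void order_initcalls(struct init_entry *calls, size_t n)
{
	qsort(calls, n, sizeof(*calls), cmp_level);
}
```

    Placing the resource reservation at the fs_initcall_sync stage puts it
    after the PNP initialization and before any device initcall, which is the
    ordering the commit message argues for.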
     

06 Jul, 2015

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "Bug fixes (all for stable kernels) for ext4:

    - address corner cases for indirect blocks->extent migration

    - fix reserved block accounting invalidate_page when
    page_size != block_size (i.e., ppc or 1k block size file systems)

    - fix deadlocks when a memcg is under heavy memory pressure

    - fix fencepost error in lazytime optimization"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: replace open coded nofail allocation in ext4_free_blocks()
    ext4: correctly migrate a file with a hole at the beginning
    ext4: be more strict when migrating to non-extent based file
    ext4: fix reservation release on invalidatepage for delalloc fs
    ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
    bufferhead: Add _gfp version for sb_getblk()
    ext4: fix fencepost error in lazytime optimization

    Linus Torvalds
     

05 Jul, 2015

7 commits

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "It's been a busy development cycle for target-core in a number of
    different areas.

    The fabric API usage for se_node_acl allocation is now within
    target-core code, dropping the external API callers for all fabric
    drivers tree-wide.

    There is a new conversion to RCU hlists for se_node_acl and
    se_portal_group LUN mappings, that turns fast-path LUN lookup into a
    completely lockless code-path. It also removes the original
    hard-coded limitation of 256 LUNs per fabric endpoint.

    The configfs attributes for backends can now be shared between core
    and driver code, allowing existing drivers to use common code while
    still allowing flexibility for new backend provided attributes.

    The highlights include:

    - Merge sbc_verify_dif_* into common code (sagi)
    - Remove iscsi-target support for obsolete IFMarker/OFMarker
    (Christophe Vu-Brugier)
    - Add bidi support in target/user backend (ilias + vangelis + agover)
    - Move se_node_acl allocation into target-core code (hch)
    - Add crc_t10dif_update common helper (akinobu + mkp)
    - Handle target-core odd SGL mapping for data transfer memory
    (akinobu)
    - Move transport ID handling into target-core (hch)
    - Move task tag into struct se_cmd + support 64-bit tags (bart)
    - Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
    paulmck)
    - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
    paulmck)
    - Simplify target backend driver registration (hch)
    - Consolidate + simplify target backend attribute implementations
    (hch + nab)
    - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
    - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
    - Drop unnecessary core_tpg_register TFO parameter (nab)
    - Use 64-bit LUNs tree-wide (hannes)
    - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
    target: Bump core version to v5.0
    target: remove target_core_configfs.h
    target: remove unused TARGET_CORE_CONFIG_ROOT define
    target: consolidate version defines
    target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
    target: simplify UNMAP handling
    target: replace se_cmd->execute_rw with a protocol_data field
    target/user: Fix inconsistent kmap_atomic/kunmap_atomic
    target: Send UA when changing LUN inventory
    target: Send UA upon LUN RESET tmr completion
    target: Send UA on ALUA target port group change
    target: Convert se_lun->lun_deve_lock to normal spinlock
    target: use 'se_dev_entry' when allocating UAs
    target: Remove 'ua_nacl' pointer from se_ua structure
    target_core_alua: Correct UA handling when switching states
    xen-scsiback: Fix compile warning for 64-bit LUN
    target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
    target: use 64-bit LUNs
    target: Drop duplicate + unused se_dev_check_wce
    target: Drop unnecessary core_tpg_register TFO parameter
    ...

    Linus Torvalds
     
  • Pull NTB updates from Jon Mason:
    "This includes a pretty significant reworking of the NTB core code, but
    has already produced some significant performance improvements.

    An abstraction layer was added to allow the hardware and clients to be
    easily added. This required rewriting the NTB transport layer for
    this abstraction layer. This modification will allow future "high
    performance" NTB clients.

    In addition to this change, a number of performance modifications were
    added. These changes include NUMA enablement, using CPU memcpy
    instead of asyncdma, and modification of NTB layer MTU size"

    * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
    NTB: Add split BAR output for debugfs stats
    NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
    NTB: Print driver name and version in module init
    NTB: Increase transport MTU to 64k from 16k
    NTB: Rename Intel code names to platform names
    NTB: Default to CPU memcpy for performance
    NTB: Improve performance with write combining
    NTB: Use NUMA memory in Intel driver
    NTB: Use NUMA memory and DMA chan in transport
    NTB: Rate limit ntb_qp_link_work
    NTB: Add tool test client
    NTB: Add ping pong test client
    NTB: Add parameters for Intel SNB B2B addresses
    NTB: Reset transport QP link stats on down
    NTB: Do not advance transport RX on link down
    NTB: Differentiate transport link down messages
    NTB: Check the device ID to set errata flags
    NTB: Enable link for Intel root port mode in probe
    NTB: Read peer info from local SPAD in transport
    NTB: Split ntb_hw_intel and ntb_transport drivers
    ...

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "Except for the preempt notifiers fix, these are all small bugfixes
    that could have waited for -rc2. Sending them now since I was
    taking care of Peter's patch anyway"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: add hyper-v crash msrs values
    KVM: x86: remove data variable from kvm_get_msr_common
    KVM: s390: virtio-ccw: don't overwrite config space values
    KVM: x86: keep track of LVT0 changes under APICv
    KVM: x86: properly restore LVT0
    KVM: x86: make vapics_in_nmi_mode atomic
    sched, preempt_notifier: separate notifier registration from static_key inc/dec

    Linus Torvalds
     
  • Change ntb_hw_intel to use the new NTB hardware abstraction layer.

    Split ntb_transport into its own driver. Change it to use the new NTB
    hardware abstraction layer.

    Signed-off-by: Allen Hubbe
    Signed-off-by: Jon Mason

    Allen Hubbe
     
  • Abstract the NTB device behind a programming interface, so that it can
    support different hardware and client drivers.

    Signed-off-by: Allen Hubbe
    Signed-off-by: Jon Mason

    Allen Hubbe
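
    The idea of putting the device behind a programming interface can be
    sketched with a table of function pointers. This is a generic
    illustration and not the NTB API: every type and function name below is
    hypothetical, and the fake hardware exists only to exercise the
    interface.

```c
#include <stdbool.h>
#include <stdint.h>

struct ntb_dev_sketch;

/* Hypothetical operations a hardware driver would provide. */
struct ntb_ops_sketch {
	bool (*link_is_up)(struct ntb_dev_sketch *dev);
	void (*spad_write)(struct ntb_dev_sketch *dev, int idx, uint32_t val);
	uint32_t (*spad_read)(struct ntb_dev_sketch *dev, int idx);
};

struct ntb_dev_sketch {
	const struct ntb_ops_sketch *ops;
	void *priv;  /* hardware-specific state */
};

/* Client side: uses only the interface, never the hardware directly. */
int client_send_word(struct ntb_dev_sketch *dev, int idx, uint32_t val)
{
	if (!dev->ops->link_is_up(dev))
		return -1;
	dev->ops->spad_write(dev, idx, val);
	return 0;
}

uint32_t client_read_word(struct ntb_dev_sketch *dev, int idx)
{
	return dev->ops->spad_read(dev, idx);
}

/* Fake hardware with four scratchpad registers. */
struct fake_hw {
	bool link;
	uint32_t spad[4];
};

static bool fake_link_is_up(struct ntb_dev_sketch *dev)
{
	return ((struct fake_hw *)dev->priv)->link;
}

static void fake_spad_write(struct ntb_dev_sketch *dev, int idx, uint32_t val)
{
	if (idx >= 0 && idx < 4)
		((struct fake_hw *)dev->priv)->spad[idx] = val;
}

static uint32_t fake_spad_read(struct ntb_dev_sketch *dev, int idx)
{
	if (idx < 0 || idx >= 4)
		return 0;
	return ((struct fake_hw *)dev->priv)->spad[idx];
}

const struct ntb_ops_sketch fake_ops = {
	.link_is_up = fake_link_is_up,
	.spad_write = fake_spad_write,
	.spad_read = fake_spad_read,
};
```

    A second hardware driver would supply its own ops table, and the client
    functions would work unchanged, which is the point of the abstraction.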
     
  • Pull irq update from Thomas Gleixner:
    "The last update for 4.2 is just moving a macro from a local header to
    the global one, so it can be used in architecture code as well.

    Cleanup of the now empty local header is 4.3 material"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip: Move IRQCHIP_DECLARE macro to include/linux/irqchip.h

    Linus Torvalds
     

04 Jul, 2015

8 commits

  • Pull scheduler fixes from Ingo Molnar:
    "Debug info and other statistics fixes and related enhancements"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/numa: Fix numa balancing stats in /proc/pid/sched
    sched/numa: Show numa_group ID in /proc/sched_debug task listings
    sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
    sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
    sched/stat: Simplify the sched_info accounting dependency

    Linus Torvalds
     
  • Pull second round of input updates from Dmitry Torokhov:
    "A new driver for Weida wdt87xx touch controllers, and a bunch of
    fixups for other drivers"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: wdt87xx_i2c - add a scaling factor for TOUCH_MAJOR event
    Input: wdt87xx_i2c - remove stray newline in diagnostic message
    Input: arc_ps2 - add HAS_IOMEM dependency
    Input: wdt87xx_i2c - fix format warning
    Input: improve parsing OF parameters for touchscreens
    Input: edt-ft5x06 - mark as direct input device
    Input: use for_each_set_bit() where appropriate
    Input: add a driver for wdt87xx touchscreen controller
    Input: axp20x-pek - fix reporting button state as inverted
    Input: xpad - re-send LED command on present event
    Input: xpad - set the LEDs properly on XBox Wireless controllers
    Input: imx_keypad - check for clk_prepare_enable() error

    Linus Torvalds
     
  • Currently print_cfs_rq() is declared in include/linux/sched.h.
    However it's not used outside kernel/sched. Hence move the
    declaration to kernel/sched/sched.h

    Also some functions are only available for CONFIG_SCHED_DEBUG=y.
    Hence move the declarations to within the #ifdef.

    Signed-off-by: Srikar Dronamraju
    Acked-by: Rik van Riel
    Cc: Iulia Manda
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju
     
  • Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
    sched_info, which results in ugly #if clauses.

    Simplify the code by introducing a synthetic CONFIG_SCHED_INFO
    switch, selected by both.

    Signed-off-by: Naveen N. Rao
    Cc: Balbir Singh
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: a.p.zijlstra@chello.nl
    Cc: ricklind@us.ibm.com
    Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Naveen N. Rao
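
The effect of the synthetic switch described in the entry above can be sketched in standalone C. This is an illustration under assumptions, not the patch itself: in the kernel the selection is done in Kconfig, whereas here it is emulated with the preprocessor so the example compiles on its own, and `task_sketch` and the `*_sketch` helpers are hypothetical names.

```c
/* Stand-ins for build configuration; in the kernel these come from Kconfig. */
#define CONFIG_SCHEDSTATS 1
/* #define CONFIG_TASK_DELAY_ACCT 1 */

/*
 * Emulation of "selected by both": either option turns on the single
 * synthetic switch. (Assumption: done in the preprocessor here only so
 * that the example is self-contained.)
 */
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
#define CONFIG_SCHED_INFO 1
#endif

/* Hypothetical task structure with optional accounting data. */
struct task_sketch {
	int pid;
#ifdef CONFIG_SCHED_INFO	/* one test instead of two OR-ed ones */
	unsigned long pcount;	/* times the task was run on a CPU */
#endif
};

/* Record that the task got a CPU; compiles to nothing without the switch. */
static inline void sched_info_arrive_sketch(struct task_sketch *t)
{
#ifdef CONFIG_SCHED_INFO
	t->pcount++;
#else
	(void)t;
#endif
}

/* Read the counter back; reports 0 when accounting is compiled out. */
static inline unsigned long sched_info_pcount_sketch(const struct task_sketch *t)
{
#ifdef CONFIG_SCHED_INFO
	return t->pcount;
#else
	(void)t;
	return 0;
#endif
}
```

Each use site then tests one symbol rather than repeating the two-option condition.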
     
  • Pull virtio/vhost cross endian support from Michael Tsirkin:
    "I have just queued some more bugfix patches today but none fix
    regressions and none are related to these ones, so it looks like a
    good time for a merge for -rc1.

    The motivation for this is support for legacy BE guests on the new LE
    hosts. There are two redeeming properties that made me merge this:

    - It's a trivial amount of code: since we wrap host/guest accesses
    anyway, almost all of it is well hidden from drivers.

    - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
    and when it's clear, there's zero overhead (at some point it was
    tested by compiling with and without the patches, got the same
    stripped binary).

    Maybe we could create a Kconfig symbol to enforce the second point:
    prevent people from enabling it eg on x86. I will look into this"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio-pci: alloc only resources actually used.
    macvtap/tun: cross-endian support for little-endian hosts
    vhost: cross-endian support for legacy devices
    virtio: add explicit big-endian support to memory accessors
    vhost: introduce vhost_is_little_endian() helper
    vringh: introduce vringh_is_little_endian() helper
    macvtap: introduce macvtap_is_little_endian() helper
    tun: add tun_is_little_endian() helper
    virtio: introduce virtio_is_little_endian() helper

    Linus Torvalds
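
The "wrap host/guest accesses" idea in the entry above can be sketched as accessors that take the ring's byte order as a parameter and swap only when it differs from the host's. This is a simplified sketch, not the kernel's implementation: the `*_sketch` names and the runtime endianness probe are assumptions made to keep the example portable and self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Runtime probe of host byte order (assumption: the kernel decides this at build time). */
static inline bool host_is_little_endian_sketch(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Convert a 16-bit field from ring byte order to host byte order.
 * The value passes through untouched when both orders match.
 */
static inline uint16_t virtio16_to_cpu_sketch(bool ring_is_little_endian,
					      uint16_t val)
{
	if (ring_is_little_endian == host_is_little_endian_sketch())
		return val;
	return swab16_sketch(val);
}

/* The reverse direction is the same operation, since swapping is its own inverse. */
static inline uint16_t cpu_to_virtio16_sketch(bool ring_is_little_endian,
					      uint16_t val)
{
	return virtio16_to_cpu_sketch(ring_is_little_endian, val);
}
```

Because drivers call only the accessors, a cross-endian guest changes the flag passed in, not the driver code.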
     
  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces were young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage that could
    be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    its purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing that the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
    /proc/<pid>/fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce executable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
     
  • Pull remoteproc updates from Ohad Ben-Cohen:

    - remoteproc fixes/cleanups from Suman Anna

    - new remoteproc TI Wakeup M3 driver from Dave Gerlach

    - remoteproc core support for TI's Wakeup M3 driver from both Dave and Suman

    - tiny remoteproc build fix from myself

    * tag 'remoteproc-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
    remoteproc: fix !CONFIG_OF build breakage
    remoteproc/wkup_m3: add a remoteproc driver for TI Wakeup M3
    Documentation: dt: add bindings for TI Wakeup M3 processor
    remoteproc: add a rproc ops for performing address translation
    remoteproc: introduce rproc_get_by_phandle API
    remoteproc: fix various checkpatch warnings
    remoteproc/davinci: fix quoted split string checkpatch warning
    remoteproc/ste: add blank lines after declarations

    Linus Torvalds
     
  • Pull hwspinlock updates from Ohad Ben-Cohen:

    - hwspinlock core DT support from Suman Anna

    - OMAP hwspinlock DT support from Suman Anna

    - QCOM hwspinlock DT support from Bjorn Andersson

    - a new CSR atlas7 hwspinlock driver from Wei Chen

    - CSR atlas7 hwspinlock DT binding document from Wei Chen

    - a tiny QCOM hwspinlock driver fix from Bjorn Andersson

    * tag 'hwspinlock-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
    hwspinlock: qcom: Correct msb in regmap_field
    DT: hwspinlock: add the CSR atlas7 hwspinlock bindings document
    hwspinlock: add a CSR atlas7 driver
    hwspinlock: qcom: Add support for Qualcomm HW Mutex block
    DT: hwspinlock: Add binding documentation for Qualcomm hwmutex
    hwspinlock/omap: add support for dt nodes
    Documentation: dt: add the omap hwspinlock bindings document
    hwspinlock/core: add device tree support
    Documentation: dt: add common bindings for hwspinlock

    Linus Torvalds