11 Oct, 2016

22 commits

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
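
    From user space, the RENAME_NOREPLACE support mentioned above is reached
    through the renameat2() system call. The following is a minimal sketch,
    not code from this series: the helper name rename_noreplace() is
    hypothetical, and it assumes kernel headers recent enough to define
    SYS_renameat2 and RENAME_NOREPLACE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <linux/fs.h>     /* RENAME_NOREPLACE */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

/*
 * Hypothetical helper: rename oldpath to newpath, but fail rather than
 * overwrite an existing newpath. Returns 0 on success or a negative
 * errno value (for example -EEXIST when the target already exists).
 */
int rename_noreplace(const char *oldpath, const char *newpath)
{
	assert(oldpath != NULL && newpath != NULL);

	/* The raw syscall is used so the sketch does not depend on a libc wrapper. */
	if (syscall(SYS_renameat2, AT_FDCWD, oldpath, AT_FDCWD, newpath,
		    RENAME_NOREPLACE) == -1)
		return -errno;
	return 0;
}
```

    A caller would typically treat -EEXIST as "someone else created the
    target first" and either pick another name or give up, instead of
    silently replacing the file.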
     
  • Al Viro
     
  • Pull MTD updates from Brian Norris:
    "I've not been very active this cycle, so these are mostly from Boris,
    for the NAND flash subsystem.

    NAND:

    - Add the infrastructure to automate NAND timings configuration

    - Provide a generic DT property to maximize ECC strength

    - Some refactoring in the core bad block table handling, to help with
    improving some of the logic in error cases.

    - Minor cleanups and fixes

    MTD:

    - Add APIs for handling page pairing; this is necessary for reliably
    supporting MLC and TLC NAND flash, where paired-page disturbance
    affects reliability. Upper layers (e.g., UBI) should make use of
    these in the near future"

    * tag 'for-linus-20161008' of git://git.infradead.org/linux-mtd: (35 commits)
    mtd: nand: fix trivial spelling error
    mtdpart: Propagate _get/put_device()
    mtd: nand: Provide nand_cleanup() function to free NAND related resources
    mtd: Kill the OF_MTD Kconfig option
    mtd: nand: mxc: Test CONFIG_OF instead of CONFIG_OF_MTD
    mtd: nand: Fix nand_command_lp() for 8bits opcodes
    mtd: nand: sunxi: Support ECC maximization
    mtd: nand: Support maximizing ECC when using software BCH
    mtd: nand: Add an option to maximize the ECC strength
    mtd: nand: mxc: Add timing setup for v2 controllers
    mtd: nand: mxc: implement onfi get/set features
    mtd: nand: sunxi: switch from manual to automated timing config
    mtd: nand: automate NAND timings selection
    mtd: nand: Expose data interface for ONFI mode 0
    mtd: nand: Add function to convert ONFI mode to data_interface
    mtd: nand: convert ONFI mode into data interface
    mtd: nand: Introduce nand_data_interface
    mtd: nand: Create a NAND reset function
    mtd: nand: remove unnecessary 'extern' from function declarations
    MAINTAINERS: Add maintainer entry for Ingenic JZ4780 NAND driver
    ...

    Linus Torvalds
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     
  • Pull crypto updates from Herbert Xu:
    "Here is the crypto update for 4.9:

    API:
    - The crypto engine code now supports hashes.

    Algorithms:
    - Allow keys >= 2048 bits in FIPS mode for RSA.

    Drivers:
    - Memory overwrite fix for vmx ghash.
    - Add support for building ARM sha1-neon in Thumb2 mode.
    - Reenable ARM ghash-ce code by adding import/export.
    - Reenable img-hash by adding import/export.
    - Add support for multiple cores in omap-aes.
    - Add little-endian support for sha1-powerpc.
    - Add Cavium HWRNG driver for ThunderX SoC"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits)
    crypto: caam - treat SGT address pointer as u64
    crypto: ccp - Make syslog errors human-readable
    crypto: ccp - clean up data structure
    crypto: vmx - Ensure ghash-generic is enabled
    crypto: testmgr - add guard to dst buffer for ahash_export
    crypto: caam - Unmap region obtained by of_iomap
    crypto: sha1-powerpc - little-endian support
    crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
    crypto: vmx - Fix memory corruption caused by p8_ghash
    crypto: ghash-generic - move common definitions to a new header file
    crypto: caam - fix sg dump
    hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
    crypto: omap-sham - shrink the internal buffer size
    crypto: omap-sham - add support for export/import
    crypto: omap-sham - convert driver logic to use sgs for data xmit
    crypto: omap-sham - change the DMA threshold value to a define
    crypto: omap-sham - add support functions for sg based data handling
    crypto: omap-sham - rename sgl to sgl_tmp for deprecation
    crypto: omap-sham - align algorithms on word offset
    crypto: omap-sham - add context export/import stubs
    ...

    Linus Torvalds
     
  • Pull dlm fix from David Teigland:
    "This includes a bug fix for a bad memory access during workqueue
    cleanup, which can happen while shutting down the dlm networking
    layer"

    * tag 'dlm-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    dlm: free workqueues after the connections

    Linus Torvalds
     
  • Pull Ceph updates from Ilya Dryomov:
    "The big ticket item here is support for rbd exclusive-lock feature,
    with maintenance operations offloaded to userspace (Douglas Fuller,
    Mike Christie and myself). Another block device bullet is a series
    fixing up layering error paths (myself).

    On the filesystem side, we've got patches that improve our handling of
    buffered vs dio write races (Neil Brown) and a few assorted fixes from
    Zheng. Also included a couple of random cleanups and a minor CRUSH
    update"

    * tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
    crush: remove redundant local variable
    crush: don't normalize input of crush_ln iteratively
    libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
    libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
    ceph: fix description for rsize and rasize mount options
    rbd: use kmalloc_array() in rbd_header_from_disk()
    ceph: use list_move instead of list_del/list_add
    ceph: handle CEPH_SESSION_REJECT message
    ceph: avoid accessing / when mounting a subpath
    ceph: fix mandatory flock check
    ceph: remove warning when ceph_releasepage() is called on dirty page
    ceph: ignore error from invalidate_inode_pages2_range() in direct write
    ceph: fix error handling of start_read()
    rbd: add rbd_obj_request_error() helper
    rbd: img_data requests don't own their page array
    rbd: don't call rbd_osd_req_format_read() for !img_data requests
    rbd: rework rbd_img_obj_exists_submit() error paths
    rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
    rbd: move bumping img_request refcount into rbd_obj_request_submit()
    rbd: mark the original request as done if stat request fails
    ...

    Linus Torvalds
     
  • Pull splice fixups from Al Viro:
    "A couple of fixups for interaction of pipe-backed iov_iter with
    O_DIRECT reads + constification of a couple of primitives in uio.h
    missed by previous rounds.

    Kudos to davej - his fuzzing has caught those bugs"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [btrfs] fix check_direct_IO() for non-iovec iterators
    constify iov_iter_count() and iter_is_iovec()
    fix ITER_PIPE interaction with direct_IO

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     
  • Pull ARM pcmcia updates from Russell King:
    "These updates lay the foundations for more generic soc_common PCMCIA
    support, which will result in several of the board specific drivers
    being eliminated.

    As the dependencies for this are complex, the preliminary work is
    being submitted now, with the remainder scheduled for the next merge
    window"

    * 'pcmcia' of git://git.armlinux.org.uk/~rmk/linux-arm:
    pcmcia: soc_common: add driver-data pointer
    pcmcia: soc_common: add support for voltage sense GPIOs
    pcmcia: soc_common: constify pcmcia_low_level ops pointer
    pcmcia: soc_common: switch to a per-socket cpufreq notifier
    pcmcia: soc_common: add support for Vcc and Vpp regulators
    pcmcia: soc_common: add CF socket state helper
    pcmcia: soc_common: restore previous socket state on error
    pcmcia: soc_common: add support for reset and bus enable GPIOs
    pcmcia: soc_common: request legacy detect GPIO with active low
    pcmcia: soc_common: ignore invalid interrupts
    pcmcia: soc_common: switch to using gpio_descs
    pcmcia: soc_common: use devm_gpio_request_one()

    Linus Torvalds
     
  • Pull nios2 update from Ley Foon Tan:
    "Use of_property_read_bool() instead of open-coding it"

    * tag 'nios2-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
    nios2: use of_property_read_bool

    Linus Torvalds
     
  • Pull CRIS updates from Jesper Nilsson.

    * tag 'cris-for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris:
    cris: return of class_create should be considered
    CRIS: defconfig: remove MTDRAM_ABS_POS
    CRIS v32: remove some double unlocks
    Fix typos
    cris: migrate exception table users off module.h and onto extable.h
    cris: v10: axisflashmap: remove unused ifdefs
    cris: use generic io.h
    cris: fix Kconfig mismatch when building with CONFIG_PCI
    cris: cardbus: fix header include path
    cris: add dev88_defconfig
    cris: irq: stop loop from accessing array out of bounds
    cris: fasttimer: fix mixed declarations and code compile warning
    cris: intmem: fix pointer comparison compile warning
    cris: intmem: fix device_initcall compile warning

    Linus Torvalds
     
  • Pull protection keys syscall interface from Thomas Gleixner:
    "This is the final step of Protection Keys support which adds the
    syscalls so user space can actually allocate keys and protect memory
    areas with them. Details and usage examples can be found in the
    documentation.

    The mm side of this has been acked by Mel"

    * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/pkeys: Update documentation
    x86/mm/pkeys: Do not skip PKRU register if debug registers are not used
    x86/pkeys: Fix pkeys build breakage for some non-x86 arches
    x86/pkeys: Add self-tests
    x86/pkeys: Allow configuration of init_pkru
    x86/pkeys: Default to a restrictive init PKRU
    pkeys: Add details of system call use to Documentation/
    generic syscalls: Wire up memory protection keys syscalls
    x86: Wire up protection keys system calls
    x86/pkeys: Allocation/free syscalls
    x86/pkeys: Make mprotect_key() mask off additional vm_flags
    mm: Implement new pkey_mprotect() system call
    x86/pkeys: Add fault handling for PF_PK page fault bit

    Linus Torvalds
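
    As a rough illustration of what the new syscalls allow, the sketch below
    allocates a key with writes disabled and attaches it to a mapping. It is
    an assumption-laden example rather than the documented usage: the helper
    name protect_with_pkey() is hypothetical, raw syscall() is used instead of
    any libc wrapper, and the fallback value for PKEY_DISABLE_WRITE is only
    used if the system headers do not provide it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>     /* PROT_READ, PROT_WRITE, mmap() */
#include <sys/syscall.h>  /* SYS_pkey_alloc, SYS_pkey_mprotect, SYS_pkey_free */
#include <unistd.h>       /* syscall(), sysconf() */

#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2  /* fallback, assumed to match the uapi value */
#endif

/*
 * Hypothetical helper: allocate a protection key whose initial rights
 * forbid writes, then tag [addr, addr + len) with it.
 * Returns the allocated key on success or a negative errno value
 * (for example -ENOSPC when no key is available on this CPU).
 */
int protect_with_pkey(void *addr, size_t len)
{
	long pkey;

	assert(addr != NULL);

	pkey = syscall(SYS_pkey_alloc, 0UL, (unsigned long)PKEY_DISABLE_WRITE);
	if (pkey == -1)
		return -errno;

	/* Page permissions stay read/write; the key adds the write restriction. */
	if (syscall(SYS_pkey_mprotect, addr, len, PROT_READ | PROT_WRITE,
		    (int)pkey) == -1) {
		int err = errno;

		syscall(SYS_pkey_free, (int)pkey);
		return -err;
	}
	return (int)pkey;
}
```

    The sketch never frees the key on success; a real program would keep the
    returned key and release it once the mapping no longer needs it.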
     
  • Pull x86 updates from Thomas Gleixner:
    "A pile of regression fixes and updates:

    - address the fallout of the patches which made the cpuid - nodeid
    relation permanent: Handling of invalid APIC ids and preventing
    pointless warning messages.

    - force eager FPU when protection keys are enabled. Protection keys
    are not generating FPU exceptions so they cannot work with the lazy
    FPU mechanism.

    - prevent force migration of interrupts which are not part of the CPU
    vector domain.

    - handle the fact that APIC ids are not updated in the ACPI/MADT
    tables on physical CPU hotplug

    - remove bash-isms from syscall table generator script

    - use the hypervisor supplied APIC frequency when running on VMware"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/pkeys: Make protection keys an "eager" feature
    x86/apic: Prevent pointless warning messages
    x86/acpi: Prevent LAPIC id 0xff from being accounted
    arch/x86: Handle non enumerated CPU after physical hotplug
    x86/unwind: Fix oprofile module link error
    x86/vmware: Skip lapic calibration on VMware
    x86/syscalls: Remove bash-isms in syscall table generator
    x86/irq: Prevent force migration of irqs which are not in the vector domain

    Linus Torvalds
     
  • looking for duplicate ->iov_base makes sense only for
    iovec-backed iterators; for kvec-backed ones it's pointless,
    for bvec-backed ones it's pointless and broken on 32bit (we
    walk through an array of struct bio_vec accessing them as if
    they were struct iovec; works by accident on 64bit, but on
    32bit it'll blow up) and for pipe-backed ones it's pointless
    and ends up oopsing.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • by making sure we call iov_iter_advance() on original
    iov_iter even if direct_IO (done on its copy) has returned 0.
    It's a no-op for old iov_iter flavours and does the right thing
    (== truncation of the stuff we'd allocated, but not filled) in
    ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
    by cleanup in generic_file_read_iter().

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull perf tooling updates from Thomas Gleixner:

    - handle uretprobe placement properly on little endian PPC64

    - fix buffer handling in libtraceevent

    - add a missing pointer dereference in perf probe

    - fix the build of host tools in cross builds

    - fix Intel PT timestamp handling

    - synchronize memcpy, cpufeatures and bpf headers with the kernel headers

    - support for vendor supplied JSON files describing PMU events

    - a new set of tool tips

    - initial work for clang/llvm support

    - address some style issues found by cppcheck

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
    tools build: Add feature detection for g++
    tools build: Support compiling C++ source file
    perf top/report: Add tips about a list option
    perf report/top: Add a tip about system-wide collection from all CPUs
    perf report/top: Add a tip about source line numbers with overhead
    tools: Synchronize tools/include/uapi/linux/bpf.h
    tools: Synchronize tools/arch/x86/include/asm/cpufeatures.h
    perf bench mem: Sync memcpy assembly sources with the kernel
    perf jevents: Fix Intel JSON fixed counter conversions
    tools lib traceevent: Fix kbuffer_read_at_offset()
    perf intel-pt: Fix MTC timestamp calculation for large MTC periods
    perf intel-pt: Fix estimated timestamps for cycle-accurate mode
    perf uretprobe ppc64le: Fix probe location
    perf pmu-events: Add Skylake frontend MSR support
    perf pmu-events: Fix fixed counters on Intel
    perf tools: Make alias matching case-insensitive
    perf tools: Allow period= in perf stat CPU event descriptions.
    perf tools: Add README for info on parsing JSON/map files
    perf list jevents: Add support for event list topics
    perf list: Support long jevents descriptions
    ...

    Linus Torvalds
     
  • Pull scheduler fix from Thomas Gleixner:
    "A revert of a commit which pointlessly widened a preempt disabled
    section which in turn caused might_sleep() to trigger.

    The patch intended to prevent usage of smp_processor_id() in
    preemptible context, but the usage in that case is fine because the
    thread is pinned on a single cpu and therefore cannot be migrated off"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    Revert "sched/core: Do not use smp_processor_id() with preempt enabled in smpboot_thread_fn()"

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "Two small kerneldoc fixes from Julia Lawall"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/metag-ext: Improve function-level documentation
    irqchip/vic: Improve function-level documentation

    Linus Torvalds
     
  • Pull timer fix from Thomas Gleixner:
    "A single fix for a regression introduced in 4.8 which causes the
    trace/perf clock to return random nonsense if CONFIG_DEBUG_TIMEKEEPING
    is set"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping: Fix __ktime_get_fast_ns() regression

    Linus Torvalds
     
  • Merge my system logging cleanups, triggered by the broken '\n' patches.

    The line continuation handling has been broken basically forever, and
    the code to handle the system log records was both confusing and
    dubious. And it would do entirely the wrong thing unless you always had
    a terminating newline, partly because it couldn't actually see whether a
    message was marked KERN_CONT or not (but partly because the LOG_CONT
    handling in the recording code was rather confusing too).

    This re-introduces a real semantically meaningful KERN_CONT, and fixes
    the few places I noticed where it was missing. There are probably more
    missing cases, since KERN_CONT hasn't actually had any semantic meaning
    for at least four years (other than the checkpatch meaning of "no log
    level necessary, this is a continuation line").

    This also allows the combination of KERN_CONT and a log level. In that
    case the log level will be ignored if the merging with a previous line
    is successful, but if a new record is needed, that new record will now
    get the right log level.

    That also means that you can at least in theory combine KERN_CONT with
    the "pr_info()" style helpers, although any use of pr_fmt() prefixing
    would make that just result in a mess, of course (the prefix would end
    up in the middle of a continuing line).

    * printk-cleanups:
    printk: make reading the kernel log flush pending lines
    printk: re-organize log_output() to be more legible
    printk: split out core logging code into helper function
    printk: reinstate KERN_CONT for printing continuation lines

    Linus Torvalds
     

10 Oct, 2016

11 commits

  • After backporting commit ee44b4bc054a ("dlm: use sctp 1-to-1 API")
    series to a kernel with an older workqueue which didn't use RCU yet, it
    was noticed that we are freeing the workqueues in dlm_lowcomms_stop()
    too early as free_conn() will try to access that memory for canceling
    the queued works if any.

    This issue was introduced by commit 0d737a8cfd83 as before it such
    attempt to cancel the queued works wasn't performed, so the issue was
    not present.

    This patch fixes it by simply inverting the free order.

    Cc: stable@vger.kernel.org
    Fixes: 0d737a8cfd83 ("dlm: fix race while closing connections")
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David Teigland

    Marcelo Ricardo Leitner
     
  • Merge the crypto tree to pull in vmx ghash fix.

    Herbert Xu
     
  • Pull blk-mq CPU hotplug update from Jens Axboe:
    "This is the conversion of blk-mq to the new hotplug state machine"

    * 'for-4.9/block-smp' of git://git.kernel.dk/linux-block:
    blk-mq: fixup "Convert to new hotplug state machine"
    blk-mq: Convert to new hotplug state machine
    blk-mq/cpu-notif: Convert to new hotplug state machine

    Linus Torvalds
     
  • Pull blk-mq irq/cpu mapping updates from Jens Axboe:
    "This is the block-irq topic branch for 4.9-rc. It's mostly from
    Christoph, and it allows drivers to specify their own mappings, and
    more importantly, to share the blk-mq mappings with the IRQ affinity
    mappings. It's a good step towards making this work better out of the
    box"

    * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
    blk_mq: linux/blk-mq.h does not include all the headers it depends on
    blk-mq: kill unused blk_mq_create_mq_map()
    blk-mq: get rid of the cpumask in struct blk_mq_tags
    nvme: remove the post_scan callout
    nvme: switch to use pci_alloc_irq_vectors
    blk-mq: provide a default queue mapping for PCI device
    blk-mq: allow the driver to pass in a queue mapping
    blk-mq: remove ->map_queue
    blk-mq: only allocate a single mq_map per tag_set
    blk-mq: don't redistribute hardware queues on a CPU hotplug event

    Linus Torvalds
     
  • Pull device mapper updates from Mike Snitzer:

    - various fixes and cleanups for request-based DM core

    - add support for delaying the requeue of requests; used by DM
    multipath when all paths have failed and 'queue_if_no_path' is
    enabled

    - DM cache improvements to speed up the loading of metadata and the
    writing of the hint array

    - fix potential for a dm-crypt crash on device teardown

    - remove dm_bufio_cond_resched() and just use cond_resched()

    - change DM multipath to return a reservation conflict error
    immediately; rather than failing the path and retrying (potentially
    indefinitely)

    * tag 'dm-4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
    dm mpath: always return reservation conflict without failing over
    dm bufio: remove dm_bufio_cond_resched()
    dm crypt: fix crash on exit
    dm cache metadata: switch to using the new cursor api for loading metadata
    dm array: introduce cursor api
    dm btree: introduce cursor api
    dm cache policy smq: distribute entries to random levels when switching to smq
    dm cache: speed up writing of the hint array
    dm array: add dm_array_new()
    dm mpath: delay the requeue of blk-mq requests while all paths down
    dm mpath: use dm_mq_kick_requeue_list()
    dm rq: introduce dm_mq_kick_requeue_list()
    dm rq: reduce arguments passed to map_request() and dm_requeue_original_request()
    dm rq: add DM_MAPIO_DELAY_REQUEUE to delay requeue of blk-mq requests
    dm: convert wait loops to use autoremove_wake_function()
    dm: use signal_pending_state() in dm_wait_for_completion()
    dm: rename task state function arguments
    dm: add two lockdep_assert_held() statements
    dm rq: simplify dm_old_stop_queue()
    dm mpath: check if path's request_queue is dying in activate_path()
    ...

    Linus Torvalds
     
  • Pull main rdma updates from Doug Ledford:
    "This is the main pull request for the rdma stack this release. The
    code has been through 0day and I had it tagged for linux-next testing
    for a couple days.

    Summary:

    - updates to mlx5

    - updates to mlx4 (two conflicts, both minor and easily resolved)

    - updates to iw_cxgb4 (one conflict, not so obvious to resolve,
    proper resolution is to keep the code in cxgb4_main.c as it is in
    Linus' tree as attach_uld was refactored and moved into
    cxgb4_uld.c)

    - improvements to uAPI (moved vendor specific API elements to uAPI
    area)

    - add hns-roce driver and hns and hns-roce ACPI reset support

    - conversion of all rdma code away from deprecated
    create_singlethread_workqueue

    - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
    staging)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
    staging/lustre: Disable InfiniBand support
    iw_cxgb4: add fast-path for small REG_MR operations
    cxgb4: advertise support for FR_NSMR_TPTE_WR
    IB/core: correctly handle rdma_rw_init_mrs() failure
    IB/srp: Fix infinite loop when FMR sg[0].offset != 0
    IB/srp: Remove an unused argument
    IB/core: Improve ib_map_mr_sg() documentation
    IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
    IB/mthca: Move user vendor structures
    IB/nes: Move user vendor structures
    IB/ocrdma: Move user vendor structures
    IB/mlx4: Move user vendor structures
    IB/cxgb4: Move user vendor structures
    IB/cxgb3: Move user vendor structures
    IB/mlx5: Move and decouple user vendor structures
    IB/{core,hw}: Add constant for node_desc
    ipoib: Make ipoib_warn ratelimited
    IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
    IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
    IB/ipoib: Remove deprecated create_singlethread_workqueue
    ...

    Linus Torvalds
     
  • Pull more rdma updates from Doug Ledford:
    "Minor updates for rxe driver"

    [ Starting to do merge window pulls again - the current -git tree does
    appear to have some netfilter use-after-free issues, but I've sent
    off the report to the proper channels, and I don't want to delay merge
    window activity any more ]

    * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
    IB/rxe: improved debug prints & code cleanup
    rdma_rxe: Ensure rdma_rxe init occurs at correct time
    IB/rxe: Properly honor max IRD value for rd/atomic.
    IB/{rxe,core,rdmavt}: Fix kernel crash for reg MR
    IB/rxe: Fix sending out loopback packet on netdev interface.
    IB/rxe: Avoid scheduling tasklet for userspace QP

    Linus Torvalds
     
  • That will mean that any possible subsequent continuation will now be
    broken up onto a line of its own (since reading the log has finalized
    the beginning of the line), but if user space has activated system
    logging (or if there's a kernel message dump going on) that is the right
    thing to do.

    And now that we actually get the continuation flags _right_ for this
    all, the user space logger that is reading the kernel messages can
    actually see the continuation marker. Not that anybody seems to really
    bother with it (or care), but in theory user space can do its own
    message stitching.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Avoid some duplicate logic now that we can return early, and update the
    comments for the new LOG_CONT world order.

    This also stops the continuation flushing from just using random record
    flags for the flushing action, instead taking the flags from the proper
    original line and updating them as we add continuations to it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The code that actually decides how to log the message (whether to put it
    directly into the record log, whether to append it to an existing
    buffered log, or whether to start a new buffered log) is fairly
    non-obvious code in the middle of the vprintk_emit() function.

    Splitting that code up into a helper function makes it easier to
    understand, but perhaps more importantly also allows for the code to
    just return early out of the helper function once it has made the
    decision about where the new log content goes.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
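
    The three-way decision described above (append to the buffered line,
    start a new buffered line, or store a record directly) can be modelled in
    user space. This is a simplified sketch, not the kernel helper itself:
    struct log_model, the ACTION_* names and the integer "owner" are all
    hypothetical stand-ins, and timestamps, facilities and consoles are left
    out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum log_action {
	ACTION_APPENDED,  /* merged into the pending buffered line */
	ACTION_BUFFERED,  /* started a new buffered (partial) line */
	ACTION_STORED,    /* written straight to the record log */
	ACTION_DROPPED    /* empty continuation: it only flushed */
};

struct log_model {
	char buf[256];        /* pending partial line */
	size_t len;           /* bytes currently buffered */
	int owner;            /* task that started the buffered line */
	int level;            /* level taken from the first fragment */
	unsigned int records; /* finished records so far */
	int last_level;       /* level of the most recent record */
};

/* Turn the buffered line, if any, into a finished record. */
static void cont_flush(struct log_model *m)
{
	if (m->len == 0)
		return;
	m->records++;
	m->last_level = m->level;
	m->len = 0;
}

/* Try to add a fragment to the buffer; false means the caller must store it. */
static bool cont_add(struct log_model *m, int owner, int level,
		     const char *text, size_t len, bool newline)
{
	if (m->len + len > sizeof(m->buf)) {
		cont_flush(m);
		return false;
	}
	if (m->len == 0) {      /* only the first fragment sets owner and level */
		m->owner = owner;
		m->level = level;
	}
	memcpy(m->buf + m->len, text, len);
	m->len += len;
	if (newline)            /* a trailing newline completes the record */
		cont_flush(m);
	return true;
}

enum log_action log_output(struct log_model *m, int owner, int level,
			   bool is_cont, bool newline, const char *text)
{
	size_t len;

	assert(m != NULL && text != NULL);
	len = strlen(text);

	if (m->len) {
		/* Same task and marked as a continuation: try to merge. */
		if (m->owner == owner && is_cont &&
		    cont_add(m, owner, level, text, len, newline))
			return ACTION_APPENDED;
		cont_flush(m);  /* otherwise finish the pending line first */
	}
	if (len == 0 && is_cont)
		return ACTION_DROPPED;
	if (!newline && cont_add(m, owner, level, text, len, newline))
		return ACTION_BUFFERED;

	m->records++;           /* complete line: store it as its own record */
	m->last_level = level;
	return ACTION_STORED;
}
```

    Because each branch returns as soon as it has placed the text, the model
    shows why an early-return helper is easier to follow than the same logic
    inlined in a larger function. It also shows that a continuation's own log
    level is ignored when the merge succeeds and used when a new record is
    needed.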
     
  • Long long ago the kernel log buffer was a buffered stream of bytes, very
    much like stdio in user space. It supported log levels by scanning the
    stream and noticing the log level markers at the beginning of each line,
    but if you wanted to print a partial line in multiple chunks, you just
    did multiple printk() calls, and it just automatically worked.

    Except when it didn't, and you had very confusing output when different
    lines got all mixed up with each other. Then you got fragment lines
    mixing with each other, or with non-fragment lines, because it was
    traditionally impossible to tell whether a printk() call was a
    continuation or not.

    To at least help clarify the issue of continuation lines, we added a
    KERN_CONT marker back in 2007 to mark continuation lines:

    474925277671 ("printk: add KERN_CONT annotation").

    That continuation marker was initially an empty string, and didn't
    actually make any semantic difference. But it at least made it possible
    to annotate the source code, and have check-patch notice that a printk()
    didn't need or want a log level marker, because it was a continuation of
    a previous line.

    To avoid the ambiguity between a continuation line that had that
    KERN_CONT marker, and a printk with no level information at all, we then
    in 2009 made KERN_CONT be a real log level marker which meant that we
    could now reliably tell the difference between the two cases.

    5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines")

    and we could take advantage of that to make sure we didn't mix up
    continuation lines with lines that just didn't have any loglevel at all.

    Then, in 2012, the kernel log buffer was changed to be a "record" based
    log, where each line was a record that has a loglevel and a timestamp.

    You can see the beginning of that conversion in commits

    e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface")
    7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer")

    with a number of follow-up commits to fix some painful fallout from that
    conversion. Over all, it took a couple of months to sort out most of
    it. But the upside was that you could have concurrent readers (and
    writers) of the kernel log and not have lines with mixed output in them.

    And one particular pain-point for the record-based kernel logging was
    exactly the fragmentary lines that are generated in smaller chunks. In
    order to still log them as one record, the continuation lines need to be
    attached to the previous record properly.

    However, the explicit continuation record marker that is actually useful
    for this exact case was removed at around the same time by commit

    61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

    due to the incorrect belief that KERN_CONT wasn't meaningful. The
    ambiguity between "is this a continuation line" or "is this a plain
    printk with no log level information" was reintroduced, and in fact
    became an even bigger pain point because there was now the whole
    record-level merging of kernel messages going on.

    This patch reinstates the KERN_CONT as a real non-empty string marker,
    so that the ambiguity is fixed once again.

    But it's not a plain revert of that original removal: in the four years
    since we made KERN_CONT an empty string again, not only has the format
    of the log level markers changed, we've also had some usage changes in
    this area.

    For example, some ACPI code seems to use KERN_CONT _together_ with a log
    level, and now uses both the KERN_CONT marker and (for example) a
    KERN_INFO marker to show that it's an informational continuation of a
    line.

    Which is actually not a bad idea - if the continuation line cannot be
    attached to its predecessor, without the log level information we don't
    know what log level to assign to it (and we traditionally just assigned
    it the default loglevel). So having both a log level and the KERN_CONT
    marker is not necessarily a bad idea, but it does mean that we need to
    actually iterate over potentially multiple markers, rather than just a
    single one.
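
    As a rough userspace sketch of what "iterating over multiple markers"
    could look like (this is not the kernel's code): the marker encoding
    assumed here is a \001 byte followed by '0'..'7' for a level or 'c'
    for a continuation, and the names parse_prefix, struct prefix_info and
    DEFAULT_LEVEL are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SOH '\001'          /* assumed byte that introduces a marker */
#define DEFAULT_LEVEL (-1)  /* hypothetical "no explicit level" value */

struct prefix_info {
    int level;    /* 0..7, or DEFAULT_LEVEL if no level marker was seen */
    bool cont;    /* true if a continuation marker ('c') was seen */
    size_t skip;  /* number of prefix bytes consumed */
};

/* Consume every leading marker, not just the first one. */
static struct prefix_info parse_prefix(const char *text)
{
    struct prefix_info info = { DEFAULT_LEVEL, false, 0 };

    assert(text != NULL);
    while (text[info.skip] == SOH) {
        char kind = text[info.skip + 1];

        if (kind >= '0' && kind <= '7')
            info.level = kind - '0';
        else if (kind == 'c')
            info.cont = true;
        else
            break;  /* unknown marker (or end of string): leave it in the text */
        info.skip += 2;
    }
    return info;
}
```

    With a loop like this, a fragment carrying both markers yields a
    continuation flag and a fallback level, so a fragment that cannot be
    merged with its predecessor still has a level to be logged at.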

    Also, since KERN_CONT was still conceptually needed, and encouraged, but
    didn't actually _do_ anything, we've also had the reverse problem:
    rather than having too many annotations, some code has too few, and
    there is bit rot with code that no longer marks the continuation lines
    with the KERN_CONT marker.

    So this patch not only re-instates the non-empty KERN_CONT marker, it
    also fixes up the cases of bit-rot I noticed in my own logs.

    There are probably other cases where KERN_CONT will need to be
    added, either because it is new code that never dealt with the need for
    KERN_CONT, or old code that has bitrotted without anybody noticing.

    That said, we should strive to avoid the need for KERN_CONT. It does
    result in real problems for logging, and should generally not be seen as
    a good feature. If we some day can get rid of the feature entirely,
    because nobody does any fragmented printk calls, that would be lovely.

    But until that point, let's at least mark the code that relies on the hacky
    multi-fragment kernel printk's. Not only does it avoid the ambiguity,
    it also annotates code as "maybe this would be good to fix some day".

    (That said, particularly during single-threaded bootup, the downsides of
    KERN_CONT are very limited. Things get much hairier when you have
    multiple threads going on and user level reading and writing logs too).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Oct, 2016

1 commit


08 Oct, 2016

6 commits

  • Al Viro
     
  • Al Viro
     
  • Al Viro
     
  • Al Viro
     
  • Our XSAVE features are divided into two categories: those that
    generate FPU exceptions, and those that do not. MPX and pkeys do
    not generate FPU exceptions and thus can not be used lazily. We
    disable them when lazy mode is forced on.

    We have a pair of masks to collect these two sets of features, but
    XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
    Fix it by moving the feature to XFEATURE_MASK_EAGER.
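
    A minimal sketch of the two-mask idea, for illustration only: the bit
    positions, the shortened mask contents and the usable_features()
    helper are assumptions, not the definitions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumed, not copied from the kernel headers). */
#define XFEATURE_MASK_FP      (1ULL << 0)
#define XFEATURE_MASK_SSE     (1ULL << 1)
#define XFEATURE_MASK_YMM     (1ULL << 2)
#define XFEATURE_MASK_BNDREGS (1ULL << 3)
#define XFEATURE_MASK_BNDCSR  (1ULL << 4)
#define XFEATURE_MASK_PKRU    (1ULL << 9)

/* Features that can be restored lazily (they generate FPU exceptions). */
#define XFEATURE_MASK_LAZY \
    (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)

/* Features that cannot be used lazily: MPX state and, after the fix, PKRU. */
#define XFEATURE_MASK_EAGER \
    (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_PKRU)

static_assert((XFEATURE_MASK_LAZY & XFEATURE_MASK_EAGER) == 0,
              "a feature must belong to exactly one mask");

/* Hypothetical helper: drop the eager-only features when lazy mode is forced. */
static uint64_t usable_features(uint64_t supported, bool lazy_forced)
{
    if (lazy_forced)
        supported &= ~XFEATURE_MASK_EAGER;
    return supported;
}
```

    With PKRU in the eager mask, forcing lazy mode clears it along with
    the MPX bits; had it stayed in the lazy mask, it would have remained
    enabled in a mode that cannot support it.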

    Note: this only causes a problem if you boot with lazy FPU mode
    (eagerfpu=off) which is *not* the default. It also only affects
    hardware which is not currently publicly available. It looks like
    lazy mode is going away, but we still need this patch applied
    to any kernel that has protection keys and lazy mode, which is 4.6
    through 4.8 at this point, and 4.9 if the lazy removal isn't sent
    to Linus for 4.9.

    Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Dave Hansen
     
  • Markus reported that he sees new warnings:

    APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored.
    APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored.

    This comes from the recent persistent cpuid - nodeid changes. The code
    which emits the warning has been called prior to these changes only for
    enabled processors. Now it's called for disabled processors as well to get
    the possible cpu accounting correct. So if the kernel is compiled for the
    number of actual available/enabled CPUs and the BIOS reports disabled CPUs
    as well then the above warnings are printed.

    That's a pointless exercise as it only makes sense if there are more CPUs
    enabled than the kernel supports.

    Make the warning conditional on enabled processors so we are back to the
    state before these changes.
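
    The shape of that change can be sketched in userspace as follows. The
    struct, the function name and the simplified message are hypothetical
    stand-ins for illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bookkeeping, standing in for the kernel's CPU accounting. */
struct cpu_accounting {
    int limit;      /* stands in for the NR_CPUS/possible_cpus limit */
    int accounted;  /* processors accepted so far */
    int ignored;    /* processors rejected because the limit was reached */
    int warnings;   /* number of warnings actually printed */
};

/* Called for enabled and disabled processors alike. */
static bool register_processor(struct cpu_accounting *acct,
                               unsigned int apicid, bool enabled)
{
    assert(acct != NULL);

    if (acct->accounted >= acct->limit) {
        acct->ignored++;
        /* Only warn when an enabled processor is being dropped. */
        if (enabled) {
            fprintf(stderr, "limit of %d reached. Processor 0x%x ignored.\n",
                    acct->limit, apicid);
            acct->warnings++;
        }
        return false;
    }
    acct->accounted++;
    return true;
}
```

    The accounting still runs for every processor the firmware reports;
    only the message is gated on the enabled flag.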

    Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid apicid mapping")
    Reported-and-tested-by: Markus Trippelsdorf
    Cc: One Thousand Gnomes
    Cc: Dou Liyang
    Cc: linux-acpi@vger.kernel.org
    Cc: Gu Zheng
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner