14 Aug, 2015

6 commits

  • The split code in blkdev_issue_{discard,write_same} can go away
    now that any driver that cares does the split. We have to make
    sure bio size doesn't overflow.

    For discard, we set max discard sectors to (1<<31)>>9 to ensure
    it doesn't overflow bi_size and hopefully it is of the proper
    granularity as long as the granularity is a power of two.

    Acked-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Ming Lin
     
  • Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
    device limits as well as calling ->merge_bvec_fn() etc. That is not
    necessary any more, because generic_make_request() is now able to
    handle arbitrarily sized bios. So clean up unnecessary code paths.

    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Kent Overstreet
    Signed-off-by: Chris Mason
    [dpark: add more description in commit message]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • The bcache driver has always accepted arbitrarily large bios and split
    them internally. Now that every driver must accept arbitrarily large
    bios, this code isn't necessary anymore.

    Cc: linux-bcache@vger.kernel.org
    Signed-off-by: Kent Overstreet
    [dpark: add more description in commit message]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Since generic_make_request() can now handle arbitrary size bios, all we
    have to do is make sure the bvec array doesn't overflow.
    __bio_add_page() doesn't need to call ->merge_bvec_fn(), so
    we can get rid of unnecessary code paths.

    Removing the call to ->merge_bvec_fn() is also fine, as no driver that
    implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
    method.

    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Signed-off-by: Kent Overstreet
    [dpark: rebase and resolve merge conflicts, change a couple of comments,
    make bio_add_page() warn once upon a cloned bio.]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • The way the block layer is currently written, it goes to great lengths
    to avoid having to split bios; upper layer code (such as bio_add_page())
    checks what the underlying device can handle and tries to always create
    bios that don't need to be split.

    But this approach becomes unwieldy and eventually breaks down with
    stacked devices and devices with dynamic limits, and it adds a lot of
    complexity. If the block layer could split bios as needed, we could
    eliminate a lot of complexity elsewhere - particularly in stacked
    drivers. Code that creates bios can then create whatever size bios are
    convenient, and more importantly stacked drivers don't have to deal with
    both their own bio size limitations and the limitations of the
    (potentially multiple) devices underneath them. In the future this will
    let us delete merge_bvec_fn and a bunch of other code.

    We do this by adding calls to blk_queue_split() to the various
    make_request functions that need it - a few can already handle arbitrary
    size bios. Note that we add the call _after_ any call to
    blk_queue_bounce(); this means that blk_queue_split() and
    blk_recalc_rq_segments() don't need to be concerned with bouncing
    affecting segment merging.

    Some make_request_fn() callbacks were simple enough to audit and verify
    they don't need blk_queue_split() calls. The skipped ones are:

    * nfhd_make_request (arch/m68k/emu/nfblock.c)
    * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
    * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
    * brd_make_request (ramdisk - drivers/block/brd.c)
    * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
    * loop_make_request
    * null_queue_bio
    * bcache's make_request fns

    Some others are almost certainly safe to remove now, but will be left
    for future patches.

    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Ming Lei
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Lars Ellenberg
    Cc: drbd-user@lists.linbit.com
    Cc: Jiri Kosina
    Cc: Geoff Levand
    Cc: Jim Paris
    Cc: Philip Kelleher
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Oleg Drokin
    Cc: Andreas Dilger
    Acked-by: NeilBrown (for the 'md/md.c' bits)
    Acked-by: Mike Snitzer
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Kent Overstreet
    [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
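
The idea of splitting an oversized request at the point where the device limit is known can be modelled in a few lines. This is a simplified user-space sketch, not blk_queue_split() itself: it only considers a maximum sector count, whereas the kernel also has to respect segment limits. The struct io_range type and the split_io function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio: a start sector and a length in sectors. */
struct io_range {
    uint64_t sector;
    uint32_t nr_sectors;
};

/*
 * Split `io` into pieces of at most `max_sectors` each and store them in
 * `out`, which has room for `cap` entries. Returns the number of pieces,
 * or 0 if max_sectors is 0 or `out` is too small.
 */
static inline size_t split_io(struct io_range io, uint32_t max_sectors,
                              struct io_range *out, size_t cap)
{
    size_t n = 0;

    if (max_sectors == 0)
        return 0;

    while (io.nr_sectors > 0) {
        uint32_t len = io.nr_sectors < max_sectors ? io.nr_sectors
                                                   : max_sectors;

        if (n == cap)
            return 0;               /* caller's array is too small */
        out[n].sector = io.sector;
        out[n].nr_sectors = len;
        n++;
        io.sector += len;           /* advance past the piece just emitted */
        io.nr_sectors -= len;
    }
    return n;
}
```

With this shape, the code that builds the request does not need to know max_sectors at all; only the consumer that enforces the limit does.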
     
  • IS_ERR() and IS_ERR_OR_NULL() already contain an 'unlikely' compiler hint,
    so there is no need to add one again in their callers. Drop it.

    Acked-by: Tejun Heo
    Signed-off-by: Viresh Kumar
    Signed-off-by: Jens Axboe

    Viresh Kumar
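
A user-space model of the error-pointer helpers shows why a second unlikely() in the caller adds nothing. This is a sketch modelled on the kernel's err.h, not a copy of it; it assumes GCC or Clang for __builtin_expect, and the check() helper is hypothetical.

```c
#include <stdint.h>

#define MAX_ERRNO 4095
#define unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang builtin */

/* Pointers in the last MAX_ERRNO bytes of the address space encode -errno. */
#define IS_ERR_VALUE(x) unlikely((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* The branch hint lives here, inside the helper. */
static inline int IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE(ptr);
}

/* Caller: a plain if (IS_ERR(p)) already carries the hint. */
static inline long check(const void *p)
{
    if (IS_ERR(p))
        return PTR_ERR(p);
    return 0;
}
```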
     

12 Aug, 2015

1 commit

  • Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
    dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
    such as:

    [521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
    [521120.720638] Read of size 4 by task mount.ocfs2/9644
    [521120.721212] =============================================================================
    [521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
    [521120.722968] -----------------------------------------------------------------------------
    [521120.722968]
    [521120.723915] Disabling lock debugging due to kernel taint
    [521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
    [521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
    [521120.726037]
    [521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
    [521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
    [521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
    [521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
    [521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
    [521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..
    [...]
    ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [521120.781465] ^
    [521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [521120.784818] ==================================================================

    This patch fixes a few of those places that I caught while auditing the patch, but the
    original patch should be audited further for more occurrences of this issue since I'm
    not too familiar with the code.

    Signed-off-by: Sasha Levin
    Signed-off-by: Jens Axboe

    Sasha Levin
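
The general shape of this kind of fix is to copy whatever is needed out of the object before dropping the reference. The sketch below uses a hypothetical refcounted struct obj in place of struct bio; obj_alloc, obj_put and complete_and_put are illustrative names, not kernel functions.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct bio. */
struct obj {
    int refcount;
    int error;
};

static inline struct obj *obj_alloc(int error)
{
    struct obj *o = malloc(sizeof(*o));

    if (o) {
        o->refcount = 1;
        o->error = error;
    }
    return o;
}

/* Drop one reference; frees the object when the count reaches zero. */
static inline void obj_put(struct obj *o)
{
    if (--o->refcount == 0)
        free(o);
}

/* Safe completion: read the field while the reference is still held. */
static inline int complete_and_put(struct obj *o)
{
    int err = o->error;   /* copy out before the put */

    obj_put(o);           /* o may be freed here; do not touch it again */
    return err;
}
```

The buggy ordering would call obj_put(o) first and then return o->error, which reads freed memory whenever the put drops the last reference.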
     

29 Jul, 2015

3 commits

  • Commit bcf2843b3f8f added ->bi_error to cleanup the error passing
    for struct bio, but that ended up adding 4 bytes and a 4 byte hole
    to the size of struct bio. For a clean config, that bumped it from
    128 bytes, to 136 bytes, on x86-64.

    The ->bi_flags member is currently an unsigned long, but it fits
    easily within an int. Change it to an unsigned int, adjust
    the pool offset code, and move ->bi_error into the new hole. Then
    we end up with a 128 byte bio again.

    Change the bio flag set/clear to use cmpxchg to ensure we don't
    lose any flags when manipulating them.

    Signed-off-by: Jens Axboe

    Jens Axboe
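
A compare-and-exchange loop for updating a flags word without losing concurrent updates might look like the following. This is a user-space sketch using C11 atomics as a stand-in for the kernel's cmpxchg(); the flags_t type and the flag_set, flag_clear and flag_test helpers are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical flags word; the commit describes an unsigned int bi_flags. */
typedef _Atomic unsigned int flags_t;

/* Atomically set bit `bit`, retrying if another thread changed the word. */
static inline void flag_set(flags_t *flags, unsigned int bit)
{
    unsigned int old = atomic_load(flags);

    /* On failure, `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(flags, &old, old | (1U << bit)))
        ;
}

/* Atomically clear bit `bit` using the same retry loop. */
static inline void flag_clear(flags_t *flags, unsigned int bit)
{
    unsigned int old = atomic_load(flags);

    while (!atomic_compare_exchange_weak(flags, &old, old & ~(1U << bit)))
        ;
}

static inline int flag_test(flags_t *flags, unsigned int bit)
{
    return (atomic_load(flags) >> bit) & 1U;
}
```

Each update is computed from the value that was actually observed, so a flag set by another thread between the load and the exchange is never overwritten.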
     
  • Some places use helpers now, others don't. We only have the 'is set'
    helper, add helpers for setting and clearing flags too.

    It was a bit of a mess of atomic vs non-atomic access. With
    BIO_UPTODATE gone, we don't have any risk of concurrent access to the
    flags. So relax the restriction and don't make any of them atomic. The
    flags that do have serialization issues (reffed and chained), we
    already handle those separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
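
Storing the error in the object itself makes it straightforward to carry it from a child to its parent. The following is a simplified user-space sketch under the assumption that each parent has exactly one child; struct io, io_endio, record_error and last_error are hypothetical names, not the kernel's.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified I/O descriptor with an embedded error field. */
struct io {
    int error;                       /* 0 on success, negative errno on failure */
    struct io *parent;               /* optional parent in a chain */
    void (*end_io)(struct io *io);   /* completion callback, reads io->error */
};

/* Example callback that records the error it observed. */
static int last_error;

static inline void record_error(struct io *io)
{
    last_error = io->error;
}

/*
 * Complete `io` and then its ancestors. The first error seen is copied to
 * the parent before the child's callback runs, because the callback may
 * release the child.
 */
static inline void io_endio(struct io *io)
{
    while (io) {
        struct io *parent = io->parent;

        if (parent && io->error && !parent->error)
            parent->error = io->error;
        if (io->end_io)
            io->end_io(io);
        io = parent;
    }
}
```

The callback needs no separate error argument and no "uptodate" flag: it reads a single field that stays with the object while it is queued or chained.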
     

17 Jul, 2015

10 commits

  • Lots of devices support huge discard sizes these days. Depending
    on how the device handles them internally, huge discards can
    introduce massive latencies (hundreds of msec) on the device side.

    We have a sysfs file, discard_max_bytes, that advertises the max
    hardware supported discard size. Make this writeable, and split
    the settings into a soft and hard limit. This can be set from
    'discard_granularity' and up to the hardware limit.

    Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
    set limit.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
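
One way to model the soft and hard limit split is shown below. It is a sketch, not the sysfs store handler: the policy of rejecting values that are not a multiple of the granularity and clamping values above the hardware limit is an assumption, and struct discard_limits and set_discard_max are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of queue limits, all in bytes. */
struct discard_limits {
    uint64_t granularity;   /* discard_granularity, a power of two */
    uint64_t max_hw;        /* discard_max_hw_bytes: fixed by the hardware */
    uint64_t max_soft;      /* discard_max_bytes: tunable by the admin */
};

/*
 * Apply a user-requested soft limit. Requests that are not a multiple of
 * the granularity are rejected; requests above the hardware limit are
 * clamped to it.
 */
static inline int set_discard_max(struct discard_limits *l, uint64_t bytes)
{
    if (l->granularity && (bytes & (l->granularity - 1)))
        return -EINVAL;
    if (bytes > l->max_hw)
        bytes = l->max_hw;
    l->max_soft = bytes;
    return 0;
}
```

Lowering max_soft bounds the size of each discard sent to the device, which is how an administrator could trade discard throughput for lower latency.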
     
  • Some drivers use it now, others just set the limits field manually.
    But in preparation for splitting this into a hard and soft limit,
    ensure that they all call the proper function for setting the hw
    limit for discards.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Percpu refcount is the perfect match for the partition case,
    and the conversion is quite straightforward.

    With the conversion, one pair of atomic inc/dec can be saved
    for accounting block I/O, which runs in the hot path of block I/O.

    Signed-off-by: Ming Lei
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • So the helper can be used in both generic partition
    case and part0 case.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These fix two bugs in the cpufreq core (including one recent
    regression), fix a 4.0 PCI regression related to the ACPI resources
    management and quieten an RCU-related lockdep complaint about a
    tracepoint in the suspend-to-idle code.

    Specifics:

    - Fix a recently introduced issue in the cpufreq policy object
    reinitialization that leads to CPU offline/online breakage (Viresh
    Kumar)

    - Make it possible to access frequency tables of offline CPUs which
    is needed by thermal management code among other things (Viresh
    Kumar)

    - Fix an ACPI resource management regression introduced during the
    4.0 cycle that may cause incorrect resource validation results to
    appear in 32-bit x86 kernels due to silent truncation of 64-bit
    values to 32-bit (Jiang Liu)

    - Fix up an RCU-related lockdep complaint about suspicious RCU usage
    in idle caused by using a suspend tracepoint in the core suspend-
    to-idle code (Rafael J Wysocki)"

    * tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
    cpufreq: Allow freq_table to be obtained for offline CPUs
    cpufreq: Initialize the governor again while restoring policy
    suspend-to-idle: Prevent RCU from complaining about tick_freeze()

    Linus Torvalds
     
  • …linux-platform-drivers-x86

    Pull x86 platform driver fixes from Darren Hart:
    "Fix SMBIOS call handling and hwswitch state coherency in the
    dell-laptop driver. Cleanups for intel_*_ipc drivers. Details:

    dell-laptop:
    - Do not cache hwswitch state
    - Check return value of each SMBIOS call
    - Clear buffer before each SMBIOS call

    intel_scu_ipc:
    - Move local memory initialization out of a mutex

    intel_pmc_ipc:
    - Update kerneldoc formatting
    - Fix compiler casting warnings"

    * tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
    intel_scu_ipc: move local memory initialization out of a mutex
    intel_pmc_ipc: Update kerneldoc formatting
    dell-laptop: Do not cache hwswitch state
    dell-laptop: Check return value of each SMBIOS call
    dell-laptop: Clear buffer before each SMBIOS call
    intel_pmc_ipc: Fix compiler casting warnings

    Linus Torvalds
     
  • Pull m68knommu/coldfire fixes from Greg Ungerer:
    "Contains build fixes and updates for the ColdFire defconfigs.

    Specifically, there are a couple of fixes that address problems building
    allnoconfig. There is also a fix for enabling the PCI bus on the M54xx
    family of ColdFire"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68k: enable PCI support for m5475evb defconfig
    m68k: fix io functions for ColdFire/MMU/PCI case
    m68knommu: update defconfig for ColdFire m5475evb
    m68knommu: update defconfig for ColdFire m5407c3
    m68knommu: update defconfig for ColdFire m5307c3
    m68knommu: update defconfig for ColdFire m5275evb
    m68knommu: update defconfig for ColdFire m5272c3
    m68knommu: update defconfig for ColdFire m5249evb
    m68knommu: update defconfig for m5208evb
    m68knommu: make ColdFire SoC selection a choice
    m68knommu: improve the clock configuration defaults
    m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "A collection of fixes from the last few weeks that should go into the
    current series. This contains:

    - Various fixes for the per-blkcg policy data, fixing regressions
    since 4.1. From Arianna and Tejun

    - Code cleanup for bcache closure macros from me. Really just
    flushing this out, it's been sitting in another branch for months

    - FIELD_SIZEOF cleanup from Maninder Singh

    - bio integrity oops fix from Mike

    - Timeout regression fix for blk-mq from Ming Lei"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: set default timeout as 30 seconds
    NVMe: Reread partitions on metadata formats
    bcache: don't embed 'return' statements in closure macros
    blkcg: fix blkcg_policy_data allocation bug
    blkcg: implement all_blkcgs list
    blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
    blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
    block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
    block: use FIELD_SIZEOF to calculate size of a field
    bio integrity: do not assume bio_integrity_pool exists if bioset exists

    Linus Torvalds
     
  • Pull jfs fixes from David Kleikamp:
    "A couple trivial fixes and an error path fix"

    * tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy:
    jfs: clean up jfs_rename and fix out of order unlock
    jfs: fix indentation on if statement
    jfs: removed a prohibited space after opening parenthesis

    Linus Torvalds
     
  • * pm-cpuidle:
    suspend-to-idle: Prevent RCU from complaining about tick_freeze()

    * pm-cpufreq:
    cpufreq: Allow freq_table to be obtained for offline CPUs
    cpufreq: Initialize the governor again while restoring policy

    * acpi-resources:
    ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel

    Rafael J. Wysocki
     

16 Jul, 2015

11 commits

  • It is reasonable to set the default request timeout to 30 seconds instead of
    30000 ticks, which may be 300 seconds if HZ is 100. For example, some
    arm64-based systems may choose a HZ of 100.

    Signed-off-by: Ming Lei
    Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()")
    Signed-off-by: Jens Axboe

    Ming Lei
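
The unit mismatch is easy to see with a little arithmetic. The helpers below are a hypothetical user-space sketch (ticks_to_secs and secs_to_ticks are not kernel functions) that treats HZ as an ordinary parameter.

```c
/* Convert a tick count to whole seconds for a given tick rate (HZ). */
static inline unsigned long ticks_to_secs(unsigned long ticks, unsigned int hz)
{
    return ticks / hz;
}

/* A timeout of `secs` seconds expressed in ticks: secs * HZ. */
static inline unsigned long secs_to_ticks(unsigned long secs, unsigned int hz)
{
    return secs * hz;
}
```

A fixed 30000 ticks is 30 seconds only when HZ is 1000, while 30 * HZ ticks is 30 seconds for any HZ.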
     
  • Pull TPM bugfixes from James Morris.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    tpm, tpm_crb: fail when TPM2 ACPI table contents look corrupted
    tpm: Fix initialization of the cdev

    Linus Torvalds
     
  • Pull rdma fixes from Doug Ledford:
    "Mainly fix-ups for the various 4.2 items"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits)
    IB/core: Destroy ocrdma_dev_id IDR on module exit
    IB/core: Destroy multcast_idr on module exit
    IB/mlx4: Optimize do_slave_init
    IB/mlx4: Fix memory leak in do_slave_init
    IB/mlx4: Optimize freeing of items on error unwind
    IB/mlx4: Fix use of flow-counters for process_mad
    IB/ipath: Convert use of __constant_<foo> to <foo>
    IB/ipoib: Set MTU to max allowed by mode when mode changes
    IB/ipoib: Scatter-Gather support in connected mode
    IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES
    IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush
    IB/ucma: Fix lockdep warning in ucma_lock_files
    rds: rds_ib_device.refcount overflow
    RDMA/nes: Fix for incorrect recording of the MAC address
    RDMA/nes: Fix for resolving the neigh
    RDMA/core: Fixes for port mapper client registration
    IB/IPoIB: Fix bad error flow in ipoib_add_port()
    IB/mlx4: Do not attemp to report HCA clock offset on VFs
    IB/cm: Do not queue work to a device that's going away
    IB/srp: Avoid using uninitialized variable
    ...

    Linus Torvalds
     
  • This patch has the driver automatically reread partitions if a namespace
    has a separate metadata format. Previously revalidating a disk was
    sufficient to get the correct capacity set on such formatted drives,
    but partitions that may exist would not have been surfaced.

    Reported-by: Paul Grabinar
    Signed-off-by: Keith Busch
    Cc: Matthew Wilcox
    Tested-by: Paul Grabinar
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Pull file locking updates from Jeff Layton:
    "I had thought that I was going to get away without a pull request this
    cycle. There was a NFSv4 file locking problem that cropped up that I
    tried to fix in the NFSv4 code alone, but that fix has turned out to
    be problematic. These patches fix this in the correct way.

    Note that this touches some NFSv4 code as well. Ordinarily I'd wait
    for Trond to ACK this, but he's on holiday right now and the bug is
    rather nasty. So I suggest we merge this and if he raises issues with
    it we can sort it out when he gets back"

    Acked-by: Bruce Fields
    Acked-by: Dan Williams
    [ +1 to this series fixing a 100% reproducible slab corruption +
    general protection fault in my nfs-root test environment. - Dan ]
    Acked-by: Anna Schumaker

    * tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux:
    locks: inline posix_lock_file_wait and flock_lock_file_wait
    nfs4: have do_vfs_lock take an inode pointer
    locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait
    locks: have flock_lock_file take an inode pointer instead of a filp
    Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"

    Linus Torvalds
     
  • Pull KVM fixes from Paolo Bonzini:

    - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning")

    - Fix eager FPU mode (Cc stable)

    - AMD bits of MTRR virtualization

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: x86: fix load xsave feature warning
    KVM: x86: apply guest MTRR virtualization on host reserved pages
    KVM: SVM: Sync g_pat with guest-written PAT value
    KVM: SVM: use NPT page attributes
    KVM: count number of assigned devices
    KVM: VMX: fix vmwrite to invalid VMCS
    KVM: x86: reintroduce kvm_is_mmio_pfn
    x86: hyperv: add CPUID bit for crash handlers

    Linus Torvalds
     
  • Pull ARC fixes from Vineet Gupta:
    - Makefile changes (top-level+ARC) reinstates -O3 builds (regression
    since 3.16)
    - IDU intc related fixes, IRQ affinity
    - patch to make bitops safer for ARC
    - perf fix from Alexey to remove signed PC braino
    - Futex backend gets llock/scond support

    * tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARCv2: support HS38 releases
    ARC: make sure instruction_pointer() returns unsigned value
    ARC: slightly refactor macros for boot logging
    ARC: Add llock/scond to futex backend
    arc:irqchip: prepare for drivers/irqchip/irqchip.h removal
    ARC: Make ARC bitops "safer" (add anti-optimization)
    ARCv2: [axs103] bump CPU frequency from 75 to 90 MHZ
    ARCv2: intc: IDU: Fix potential race in installing a chained IRQ handler
    ARCv2: intc: IDU: support irq affinity
    ARC: fix unused var wanring
    ARC: Don't memzero twice in dma_alloc_coherent for __GFP_ZERO
    ARC: Override toplevel default -O2 with -O3
    kbuild: Allow arch Makefiles to override {cpp,ld,c}flags
    ARCv2: guard SLC DMA ops with spinlock
    ARC: Kconfig: better way to disable ARC_HAS_LLSC for ARC_CPU_750D

    Linus Torvalds
     
  • Pull s390 fixes from Martin Schwidefsky:
    "One improvement for the zcrypt driver, the quality attribute for the
    hwrng device has been missing. Without it the kernel entropy seeding
    will not happen automatically.

    And six bug fixes, the most important one is the fix for the vector
    register corruption due to machine checks"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/nmi: fix vector register corruption
    s390/process: fix sfpc inline assembly
    s390/dasd: fix kernel panic when alias is set offline
    s390/sclp: clear upper register halves in _sclp_print_early
    s390/oprofile: fix compile error
    s390/sclp: fix compile error
    s390/zcrypt: enable s390 hwrng to seed kernel entropy

    Linus Torvalds
     
  • The end of jfs_rename(), which is also used by the error paths,
    included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2
    and out3. If we come in through these labels, IWRITE_LOCK() has not
    been called yet.

    In moving that call to the correct spot, I also moved some
    exceptional truncate code earlier as well, since the early error
    paths don't need to deal with it, and I renamed out4: to out_tx: so
    a future patch by Jan Kara doesn't need to deal with renumbering or
    confusing out-of-order labels.

    Signed-off-by: Dave Kleikamp

    Dave Kleikamp
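
The label ordering problem can be illustrated with a small goto-based unwind. This is a generic sketch, not the jfs code: struct lock, lock_take, lock_release and do_op are hypothetical, and the lock counts unmatched unlocks so that misuse is observable.

```c
/* Hypothetical lock that counts unlocks without a matching lock. */
struct lock {
    int held;
    int bad_unlocks;
};

static inline void lock_take(struct lock *l)
{
    l->held = 1;
}

static inline void lock_release(struct lock *l)
{
    if (!l->held)
        l->bad_unlocks++;   /* unlock without a matching lock */
    l->held = 0;
}

/*
 * fail_early simulates an error before the lock is taken, fail_late one
 * after it. Early failures jump to a label placed after the unlock.
 */
static inline int do_op(struct lock *l, int fail_early, int fail_late)
{
    int rc = 0;

    if (fail_early) {
        rc = -1;
        goto out;           /* lock not taken yet: skip the unlock */
    }

    lock_take(l);
    if (fail_late) {
        rc = -2;
        goto out_unlock;
    }
    /* ... the real work would happen here ... */

out_unlock:
    lock_release(l);
out:
    return rc;
}
```

Placing the unlock above the early-exit label, and giving labels descriptive names instead of numbers, keeps each error path releasing only what it acquired.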
     
  • Pull final init.h/module.h code relocation from Paul Gortmaker:
    "With the release of 4.2-rc2 done, we should not be seeing any new code
    added that gets upset by this small code move, and we've banked yet
    another complete week of testing with this move in place on top of
    4.2-rc1 via linux-next to ensure that remained true.

    Given that, I'd like to put it in now so that people formulating new
    work for 4.3-rc1 will be exposed to the ever so slightly stricter (but
    sensible) requirements wrt. whether they are needing init.h vs.
    module.h macros, even if they are not using linux-next.

    The diffstat of the move is slightly asymmetrical due to needing to
    leave behind a couple #ifdef in the old location and add the same ones
    to the new location, but other than that, it is a 1:1 move, complete
    with the module_init/exit trailing semicolon that we can't fix. That
    is, until/unless someone does a tree-wide sed fix of all the
    approximately 800 currently in tree users relying on it"

    * tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    module: relocate module_init from init.h to module.h

    Linus Torvalds
     
  • Pull tracing fix from Steven Rostedt:
    "Fengguang Wu discovered a crash that happened to be because of the
    branch tracer (traces unlikely and likely branches) when enabled with
    certain debug options.

    What happened was that various debug options like lockdep and
    DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside
    its recursion protection. In fact, part of its recursion protection
    used these features that caused the lockup. This cleans up the code a
    little and makes the recursion protection a bit more robust"

    * tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Have branch tracer use recursive field of task struct

    Linus Torvalds
     

15 Jul, 2015

9 commits

  • James Morris
     
  • '{ }' and memset will both reset the cbuf buffer.
    Only once is enough and this can be done outside of the mutex.

    Signed-off-by: Christophe JAILLET
    Signed-off-by: Darren Hart

    Christophe JAILLET
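
That the two forms of zeroing are equivalent can be checked directly. The sketch below is illustrative only: CBUF_WORDS and zeroing_is_redundant are hypothetical, and it uses the portable '{ 0 }' initializer where the commit mentions '{ }'.

```c
#include <stdint.h>
#include <string.h>

#define CBUF_WORDS 4   /* hypothetical buffer length */

/* Returns 1 if an initializer-zeroed buffer equals a memset-zeroed one. */
static inline int zeroing_is_redundant(void)
{
    uint32_t a[CBUF_WORDS] = { 0 };   /* zeroed at declaration */
    uint32_t b[CBUF_WORDS];

    memset(b, 0, sizeof(b));          /* zeroed explicitly */
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Since the buffer is a local variable, zeroing it touches no shared state, so it does not need to happen while the mutex is held.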
     
  • Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.

    This was detected by the following semantic patch (written by Luis Rodriguez)

    @ defines_module_init @
    declarer name module_init, module_exit;
    declarer name DEFINE_IDR;
    identifier init;
    @@

    module_init(init);

    @ defines_module_exit @
    identifier exit;
    @@

    module_exit(exit);

    @ declares_idr depends on defines_module_init && defines_module_exit @
    identifier idr;
    @@

    DEFINE_IDR(idr);

    @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
    identifier declares_idr.idr, defines_module_exit.exit;
    @@

    exit(void)
    {
    ...
    idr_destroy(&idr);
    ...
    }

    @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
    identifier declares_idr.idr, defines_module_exit.exit;
    @@

    exit(void)
    {
    ...
    +idr_destroy(&idr);
    }

    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Doug Ledford

    Johannes Thumshirn
     
  • Destroy multcast_idr on module exit, reclaiming the allocated memory.

    This was detected by the following semantic patch (written by Luis Rodriguez)

    @ defines_module_init @
    declarer name module_init, module_exit;
    declarer name DEFINE_IDR;
    identifier init;
    @@

    module_init(init);

    @ defines_module_exit @
    identifier exit;
    @@

    module_exit(exit);

    @ declares_idr depends on defines_module_init && defines_module_exit @
    identifier idr;
    @@

    DEFINE_IDR(idr);

    @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
    identifier declares_idr.idr, defines_module_exit.exit;
    @@

    exit(void)
    {
    ...
    idr_destroy(&idr);
    ...
    }

    @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
    identifier declares_idr.idr, defines_module_exit.exit;
    @@

    exit(void)
    {
    ...
    +idr_destroy(&idr);
    }

    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Doug Ledford

    Johannes Thumshirn
     
  • There is little chance our memory allocation will fail, so we can
    combine initializing the work structs with allocating them instead of
    looping through all of them once to allocate and again to initialize.
    Then when we need to actually find out if our device is up or in the
    process of going down, have all of our work structs batched up, take the
    spin_lock once and only once, and do all of the batch under the one
    spin_lock invocation instead of incurring all of the locked memory cycles
    we would otherwise incur to take/release the spin_lock over and over
    again.

    Signed-off-by: Doug Ledford

    Doug Ledford
     
  • We create a number of work structs to be queued up to a workqueue, and
    on completion of the workqueue handler, the workqueue handler frees the
    allocated memory. If, however, we don't queue the work struct because
    the device is going down, then we need to free the memory ourselves.

    Signed-off-by: Doug Ledford

    Doug Ledford
     
  • On failure, we loop through all possible pointers and test them before
    calling kfree. But really, why even attempt to free items we didn't
    allocate when we can easily loop through exactly and only the devices
    for which the original memory allocation succeeded and free just those.

    Signed-off-by: Maninder Singh
    Signed-off-by: Doug Ledford

    Maninder Singh
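
An error unwind that frees exactly the entries that were allocated can be sketched as follows. This is a generic user-space illustration, not the mlx4 code; alloc_all is a hypothetical helper and its fail_at parameter is a test hook that simulates an allocation failure.

```c
#include <stdlib.h>

/*
 * Allocate `n` buffers of `size` bytes into `items`. On failure, free only
 * the entries that were successfully allocated and return -1.
 */
static inline int alloc_all(void **items, int n, size_t size, int fail_at)
{
    int i;

    for (i = 0; i < n; i++) {
        /* fail_at simulates malloc() failing at that index. */
        items[i] = (i == fail_at) ? NULL : malloc(size);
        if (!items[i])
            goto unwind;
    }
    return 0;

unwind:
    while (--i >= 0) {      /* walk back over indexes that succeeded */
        free(items[i]);
        items[i] = NULL;
    }
    return -1;
}
```

Because the loop index already records how far allocation got, the unwind needs no NULL tests and never looks at slots it did not fill.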
     
  • For IB links, HCA flow counters should be read through iboe_process_mad()
    only when mlx4_ib_process_mad() is invoked for VF PMA queries, and for
    nothing else.

    Fixes: 7193a141eb74 ('IB/mlx4: Set VF to read from QP counters')
    Reported-by: Linus Torvalds
    Signed-off-by: Or Gerlitz
    Signed-off-by: Doug Ledford

    Or Gerlitz
     
  • In little endian cases, the macros be16_to_cpu and cpu_to_be64
    unfold to __swab{16,64}, which provide a special case for constants.
    In big endian cases, __constant_be16_to_cpu and be16_to_cpu
    expand directly to the same expression. The same applies for
    __constant_cpu_to_be64 and cpu_to_be64.

    So, replace __constant_be16_to_cpu with be16_to_cpu and
    __constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
    rid of the definition of __constant_be16_to_cpu and
    __constant_cpu_to_be64 completely.

    Signed-off-by: Vaishali Thakkar
    Signed-off-by: Doug Ledford

    Vaishali Thakkar
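
The "special case for constants" can be sketched with a byte-swap macro that selects a constant-foldable expression when its argument is a compile-time constant. This is a user-space model assuming GCC or Clang for __builtin_constant_p; CONST_SWAB16, runtime_swab16 and SWAB16 are hypothetical names.

```c
#include <stdint.h>

/* Plain byte swap usable in constant expressions. */
#define CONST_SWAB16(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffU) << 8) | \
                (((uint16_t)(x) & 0xff00U) >> 8)))

/* Runtime byte swap. */
static inline uint16_t runtime_swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Use the constant form for compile-time constants (GCC/Clang builtin). */
#define SWAB16(x) \
    (__builtin_constant_p((uint16_t)(x)) ? CONST_SWAB16(x) \
                                         : runtime_swab16(x))
```

Since the generic macro already handles constants this way, a separate __constant_ variant gives the same result and can be replaced by the plain name.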