24 Nov, 2015

3 commits

  • We have seen many reports of this kind of issue, so add a
    warning in blk-merge; it can then be triggered easily, and we
    no longer depend on warnings/BUGs from drivers.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Commit bdced438acd83a ("block: setup bi_phys_segments after
    splitting") introduced computing bio->bi_phys_segments during
    bio splitting.

    Unfortunately, neither bio->bi_seg_front_size nor
    bio->bi_seg_back_size is computed, so one request may end up
    with too many physical segments, because these two fields are
    used to check whether a single segment can span two bios.

    This patch fixes the issue by computing the two variables in
    blk_bio_segment_split().

    Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
    Reported-by: Michael Ellerman
    Reported-by: Mark Salter
    Tested-by: Laurent Dufour
    Tested-by: Mark Salter
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Inside blk_bio_segment_split(), the previous-bvec pointer (bvprvp)
    always points to the iterator's local variable, which is overwritten
    on every iteration. This is obviously wrong, so fix it by pointing
    it at the local variable 'bvprv' instead.

    Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c)
    Cc: stable@kernel.org #4.3
    Reported-by: Michael Ellerman
    Reported-by: Mark Salter
    Tested-by: Laurent Dufour
    Tested-by: Mark Salter
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
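
The role of the front and back segment sizes described in the second commit can be sketched in userspace. This is a simplified model under stated assumptions: struct vec, struct seg_info, split_segments() and can_share_segment() are hypothetical names, not kernel code. The sketch is loosely modelled on the block layer idea that the last segment of one bio and the first segment of the next may be counted as one segment only if their combined size fits the maximum segment size; physical contiguity between the two bios is assumed to be checked elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: a byte range in one buffer. */
struct vec {
    unsigned int offset;
    unsigned int len;
};

/* Hypothetical per-bio summary, mirroring bi_phys_segments,
 * bi_seg_front_size and bi_seg_back_size. */
struct seg_info {
    unsigned int nr_segs;
    unsigned int front_size;  /* size of the first physical segment */
    unsigned int back_size;   /* size of the last physical segment */
};

/* Merge contiguous vectors into segments of at most max_seg bytes,
 * recording the segment count and the front/back segment sizes. */
static struct seg_info split_segments(const struct vec *v, size_t n,
                                      unsigned int max_seg)
{
    struct seg_info si = {0, 0, 0};
    unsigned int cur = 0;   /* size of the segment being built */

    for (size_t i = 0; i < n; i++) {
        int mergeable = i > 0 &&
                        v[i - 1].offset + v[i - 1].len == v[i].offset &&
                        cur + v[i].len <= max_seg;

        if (mergeable) {
            cur += v[i].len;
            continue;
        }
        if (si.nr_segs == 1)
            si.front_size = cur;    /* the first segment has just closed */
        si.nr_segs++;
        cur = v[i].len;
    }
    if (si.nr_segs == 1)
        si.front_size = cur;        /* a single segment is both front and back */
    si.back_size = cur;
    return si;
}

/* Can a's last segment and b's first segment be counted as one?
 * Physical contiguity between the two is assumed to be checked elsewhere. */
static int can_share_segment(const struct seg_info *a,
                             const struct seg_info *b, unsigned int max_seg)
{
    return a->back_size + b->front_size <= max_seg;
}
```

In this model, if front_size and back_size are left at zero, as before the fix, the size check always passes, so two segments can be counted as one and the request is built with more physical segments than were accounted for.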
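
A minimal userspace sketch of the pointer bug fixed in the third commit above, assuming a simplified model: struct vec stands in for struct bio_vec, and count_segments_buggy() and count_segments_fixed() are hypothetical helpers, not the kernel's blk_bio_segment_split(). The loop copies each element into a local, as the bio iterator does, so a pointer to that local always sees the current element rather than the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: a byte range in one buffer. */
struct vec {
    unsigned int offset;
    unsigned int len;
};

/* Returns 1 if 'next' starts exactly where 'prev' ends. */
static int contiguous(const struct vec *prev, const struct vec *next)
{
    return prev->offset + prev->len == next->offset;
}

/* Buggy pattern: prevp points at the loop copy 'bv', which is overwritten
 * at the start of each iteration, so *prevp is always the current vector. */
static unsigned int count_segments_buggy(const struct vec *v, size_t n)
{
    struct vec bv;
    const struct vec *prevp = NULL;
    unsigned int segs = 0;

    for (size_t i = 0; i < n; i++) {
        bv = v[i];                  /* iterator copies into a local */
        if (!prevp || !contiguous(prevp, &bv))
            segs++;
        prevp = &bv;                /* BUG: aliases the iterator copy */
    }
    return segs;
}

/* Fixed pattern: keep a private copy of the previous vector. */
static unsigned int count_segments_fixed(const struct vec *v, size_t n)
{
    struct vec bv, bvprv;
    const struct vec *prevp = NULL;
    unsigned int segs = 0;

    for (size_t i = 0; i < n; i++) {
        bv = v[i];
        if (!prevp || !contiguous(prevp, &bv))
            segs++;
        bvprv = bv;                 /* survives the next iterator update */
        prevp = &bvprv;
    }
    return segs;
}
```

With three vectors where only the first two are contiguous, the buggy version counts 3 segments and the fixed one 2: in this model the aliased pointer compares each vector with itself, so no merge is ever detected.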
     

21 Nov, 2015

1 commit

  • Liu reported that running certain parts of xfstests threw the
    following error:

    BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
    in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
    3 locks held by kworker/u16:0/6:
    #0: ("writeback"){++++.+}, at: [] process_one_work+0x173/0x730
    #1: ((&(&wb->dwork)->work)){+.+.+.}, at: [] process_one_work+0x173/0x730
    #2: (&type->s_umount_key#44){+++++.}, at: [] trylock_super+0x25/0x60
    CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G OE 4.3.0+ #3
    Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
    Workqueue: writeback wb_workfn (flush-btrfs-108)
    ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
    0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
    ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
    Call Trace:
    [] dump_stack+0x4f/0x74
    [] ___might_sleep+0x185/0x240
    [] __might_sleep+0x52/0x90
    [] __alloc_pages_nodemask+0x268/0x410
    [] ? sched_clock_local+0x1c/0x90
    [] ? local_clock+0x21/0x40
    [] ? __lock_release+0x420/0x510
    [] ? __lock_acquired+0x16c/0x3c0
    [] alloc_pages_current+0xc5/0x210
    [] ? rbio_is_full+0x55/0x70 [btrfs]
    [] ? mark_held_locks+0x78/0xa0
    [] ? _raw_spin_unlock_irqrestore+0x40/0x60
    [] full_stripe_write+0x5a/0xc0 [btrfs]
    [] __raid56_parity_write+0x39/0x60 [btrfs]
    [] run_plug+0x11b/0x140 [btrfs]
    [] btrfs_raid_unplug+0x23/0x70 [btrfs]
    [] blk_flush_plug_list+0x82/0x1f0
    [] blk_sq_make_request+0x1f9/0x740
    [] ? generic_make_request_checks+0x222/0x7c0
    [] ? blk_queue_enter+0x124/0x310
    [] ? blk_queue_enter+0x92/0x310
    [] generic_make_request+0x172/0x2c0
    [] ? generic_make_request+0x164/0x2c0
    [] submit_bio+0x70/0x140
    [] ? rbio_add_io_page+0x99/0x150 [btrfs]
    [] finish_rmw+0x4d9/0x600 [btrfs]
    [] full_stripe_write+0x9c/0xc0 [btrfs]
    [] raid56_parity_write+0xef/0x160 [btrfs]
    [] btrfs_map_bio+0xe3/0x2d0 [btrfs]
    [] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
    [] submit_one_bio+0x74/0xb0 [btrfs]
    [] submit_extent_page+0xe5/0x1c0 [btrfs]
    [] __extent_writepage_io+0x408/0x4c0 [btrfs]
    [] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
    [] __extent_writepage+0x218/0x3a0 [btrfs]
    [] ? mark_held_locks+0x78/0xa0
    [] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
    [] extent_writepages+0x52/0x70 [btrfs]
    [] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
    [] btrfs_writepages+0x27/0x30 [btrfs]
    [] do_writepages+0x23/0x40
    [] __writeback_single_inode+0x89/0x4d0
    [] ? writeback_sb_inodes+0x260/0x480
    [] ? writeback_sb_inodes+0x260/0x480
    [] ? writeback_sb_inodes+0x15f/0x480
    [] writeback_sb_inodes+0x2d2/0x480
    [] ? down_read_trylock+0x57/0x60
    [] ? trylock_super+0x25/0x60
    [] ? rcu_read_lock_sched_held+0x4f/0x90
    [] __writeback_inodes_wb+0x8c/0xc0
    [] wb_writeback+0x2b5/0x500
    [] ? mark_held_locks+0x78/0xa0
    [] ? __local_bh_enable_ip+0x68/0xc0
    [] ? wb_do_writeback+0x62/0x310
    [] wb_do_writeback+0xc1/0x310
    [] ? set_worker_desc+0x79/0x90
    [] wb_workfn+0x92/0x330
    [] process_one_work+0x223/0x730
    [] ? process_one_work+0x173/0x730
    [] ? worker_thread+0x18f/0x430
    [] worker_thread+0x11d/0x430
    [] ? maybe_create_worker+0xf0/0xf0
    [] ? maybe_create_worker+0xf0/0xf0
    [] kthread+0xef/0x110
    [] ? schedule_tail+0x1e/0xd0
    [] ? __init_kthread_worker+0x70/0x70
    [] ret_from_fork+0x3f/0x70
    [] ? __init_kthread_worker+0x70/0x70

    The issue is that we've got the software context pinned while
    calling blk_flush_plug_list(), which flushes callbacks that
    are allowed to sleep. btrfs and raid have such callbacks.

    Flip the checks around a bit, so that we re-enable preemption
    earlier and flush plugs with preemption enabled.

    This only affects blk-mq driven devices, and only those that
    register a single queue.

    Reported-by: Liu Bo
    Tested-by: Liu Bo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jens Axboe
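
The reordering in the 21 Nov commit can be illustrated with a toy model. Everything here is hypothetical: preempt_count stands in for the per-CPU preemption counter, get_ctx() and put_ctx() loosely model pinning and releasing the software context (blk_mq_get_ctx()/blk_mq_put_ctx() in the kernel), and flush_plug() runs a callback that may sleep. The sketch only demonstrates the ordering: releasing the context before flushing means the callbacks run with preemption enabled.

```c
#include <assert.h>

static int preempt_count;     /* models the per-CPU preemption counter */
static int sleep_violations;  /* counts "sleeping in atomic context" hits */

/* Pinning the software context disables preemption in this model. */
static void get_ctx(void) { preempt_count++; }
static void put_ctx(void) { preempt_count--; }

/* A plug callback that may sleep, as the btrfs and raid callbacks can. */
static void sleeping_callback(void)
{
    if (preempt_count != 0)
        sleep_violations++;   /* where the kernel would report the BUG */
}

static void flush_plug(void) { sleeping_callback(); }

/* Old order: callbacks run while the context is still pinned. */
static void submit_old_order(void)
{
    get_ctx();
    flush_plug();
    put_ctx();
}

/* New order: drop the context first, then flush the plug. */
static void submit_new_order(void)
{
    get_ctx();
    /* ... per-context work that needs the CPU pinned ... */
    put_ctx();
    flush_plug();
}
```

Both paths do the same work; only the position of put_ctx() relative to flush_plug() changes, which is what removes the warning in this model.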
     

20 Nov, 2015

12 commits


17 Nov, 2015

16 commits

  • Currently blk_insert_flush() just adds the flush request to
    q->queue_head when a flush is not required. That completely bypasses
    the IO scheduler, so e.g. CFQ can be idling, waiting for a new request
    to arrive, and will idle through the whole window unnecessarily.
    Luckily, this happens only in rare cases, as the checks in
    generic_make_request_checks() usually clear the FLUSH and FUA flags
    early if they are not needed.

    When no flushing is actually required, we can easily fix the problem by
    properly queueing the request through the IO scheduler. Ideally, the IO
    scheduler should also be made aware of requests queued via
    blk_flush_queue_rq(). However, inserting a flush request through the IO
    scheduler can have unwanted side effects: because of flush batching,
    delaying one flush request in the IO scheduler delays all flush
    requests, possibly including ones from other processes. So we keep
    adding those requests directly to q->queue_head.

    Signed-off-by: Jan Kara
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Add support for registering as a LightNVM device. This allows us to
    evaluate the performance of the LightNVM subsystem.

    In /drivers/Makefile, LightNVM is moved above block device drivers
    to make sure that the LightNVM media managers have been initialized
    before drivers under /drivers/block are initialized.

    Signed-off-by: Matias Bjørling
    Fix by Jens Axboe to remove unneeded slab cache and the following
    memory leak.
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • To make the intention clearer, use list_{first,prev,next}_entry
    instead of list_entry.

    Signed-off-by: Geliang Tang
    Signed-off-by: Jens Axboe

    Geliang Tang
     
  • This prevents outstanding IOs from being sent to the target for
    completion after the target has been removed. The flow is now: stop
    new IOs > clean up queue > remove target.

    Signed-off-by: Javier Gonzalez
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The specification was updated to remove the double word just after
    the number of configuration groups and capabilities. Update the
    identify structure to reflect this.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The ppa format was not copied from the NVMe specific ppa format to the
    lightnvm specific ppa format. This led to the ppa format not being
    communicated to the layers above.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The linear and device specific address modes can be replaced with a
    simple offset and bit length conversion that is generic across all
    devices.

    This both simplifies the specification and removes the special case for
    qemu nvme, which previously relied on the linear address mapping.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Both nvm_register and nvm_init do a kfree(dev) on error. Make sure
    to free it only once.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • We register with nvm_devices while registration can still fail.
    Move the final registration to the end of the nvm_register function
    to make sure we are fully registered when added to the nvm_devices list.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Only NAND flash with SLC and MLC is supported. Make sure to not try to
    initialize TLC memory or other non-volatile memory types.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The nvm_id, nvm_id_group and nvm_addr_format data structures contain
    reserved attributes. They are unused by media managers and targets.
    Remove them.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The mccap field is required for I/O command option support. It defines the
    following flash access modes:

    * SLC mode
    * Erase/Program Suspension
    * Scramble On/Off
    * Encryption

    It is slotted in between mpos and cpar, changing the offset for
    cpar as well.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A single 8-bit and a single 16-bit reserved field were inserted in the
    specification to align fields appropriately. Reflect this in the
    identify group structure.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The specification was changed to reflect a multi-value bad block table.
    Instead of a bit-based bad block table, the bad block table now allows
    eight bad block categories. Currently four are defined:

    * Factory bad blocks
    * Grown bad blocks
    * Device-side reserved blocks
    * Host-side reserved blocks

    The factory and grown bad blocks are the regular bad blocks. The
    reserved blocks are for either internal or external use. In
    particular, the device-side reserved blocks allow the host to
    bootstrap from a limited number of flash blocks, reducing the number
    of flash blocks to scan upon super block initialization.

    Support for both get bad block table and set bad block table is added.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • The max_phys_sect variable is defined as a char. We do a boundary check
    to allow at most 256 physical page descriptors per command (the count
    is not zero-indexed), but a char cannot hold 256, so this expression
    is always false. Bump max_phys_sect to an unsigned int to support the
    range check.

    Signed-off-by: Matias Bjørling
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
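
For the list_{first,prev,next}_entry commit, a self-contained userspace rendition of the helpers shows why they read more clearly than a bare list_entry(). These are simplified versions written for illustration, modelled on include/linux/list.h; the kernel's own container_of() adds type checking that is omitted here, and struct req is a hypothetical element type.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
    list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_prev_entry(pos, member) \
    list_entry((pos)->member.prev, __typeof__(*(pos)), member)

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Insert n just before h, i.e. at the tail of the list headed by h. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical element type for illustration. */
struct req {
    int tag;
    struct list_head queuelist;
};
```

list_first_entry(&head, struct req, queuelist) states the intent directly, whereas the equivalent list_entry(head.next, struct req, queuelist) leaves the reader to work out which pointer is being followed.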
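
The generic offset/bit-length conversion described in the address-format commit can be sketched as packing and unpacking bit fields. The field set (channel, LUN, block, page), struct field, pack() and unpack() are hypothetical and do not reproduce the LightNVM structures; the sketch only shows how one routine can serve any device once each field's offset and length are known.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor: where one field lives inside a device address. */
struct field {
    unsigned int offset;  /* bit position of the field's lowest bit (< 64) */
    unsigned int len;     /* width of the field in bits */
};

/* Hypothetical subset of address fields, for illustration only. */
enum { F_CH, F_LUN, F_BLK, F_PG, F_NR };

static uint64_t field_mask(unsigned int len)
{
    return (len >= 64) ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
}

/* Generic values -> device address: mask each field and shift it into place. */
static uint64_t pack(const struct field *fmt, const uint64_t *val)
{
    uint64_t addr = 0;

    for (int i = 0; i < F_NR; i++)
        addr |= (val[i] & field_mask(fmt[i].len)) << fmt[i].offset;
    return addr;
}

/* Device address -> generic values: shift each field down and mask it. */
static void unpack(const struct field *fmt, uint64_t addr, uint64_t *val)
{
    for (int i = 0; i < F_NR; i++)
        val[i] = (addr >> fmt[i].offset) & field_mask(fmt[i].len);
}
```

Two devices with different layouts differ only in their field tables; in this model no linear or device-specific mode is needed.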
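
The single-free rule from the nvm_register/nvm_init commit can be sketched with one owner on the error path. dev_register(), dev_init(), dev_free() and the free counter are hypothetical, not the LightNVM code: the init step reports failure but leaves freeing to its caller, so the object is released exactly once.

```c
#include <assert.h>
#include <stdlib.h>

static int frees;   /* counts releases so a double free would be visible */

struct dev {
    int id;
};

static void dev_free(struct dev *d)
{
    frees++;
    free(d);
}

/* Hypothetical init step: reports failure, but never frees 'd' itself. */
static int dev_init(struct dev *d, int fail)
{
    d->id = 1;
    return fail ? -1 : 0;
}

/* Hypothetical register step: the sole owner of 'd' on the error path. */
static struct dev *dev_register(int fail)
{
    struct dev *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    if (dev_init(d, fail)) {
        dev_free(d);    /* freed exactly once, here */
        return NULL;
    }
    return d;
}
```

Keeping the free in the function that allocated the object means a failing helper cannot release it a second time.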
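
The always-false range check from the max_phys_sect commit is easy to reproduce. MAX_PPAS_PER_CMD and the two helpers are hypothetical names, and the sketch assumes 8-bit chars (CHAR_BIT == 8): whether char is signed or unsigned, it can never exceed 255, so comparing it against 256 is always false, while an unsigned int holds the full range.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name for the limit of 256 physical page descriptors. */
#define MAX_PPAS_PER_CMD 256

/* Narrow field: the comparison can never be true, whatever the value. */
static int too_many_uchar(unsigned char max_phys_sect)
{
    return max_phys_sect > MAX_PPAS_PER_CMD;
}

/* Widened field: the range check works as intended. */
static int too_many_uint(unsigned int max_phys_sect)
{
    return max_phys_sect > MAX_PPAS_PER_CMD;
}
```

Note that an out-of-range value does not even reach the check in the narrow version: converting 300 to unsigned char wraps it to 44.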
     

16 Nov, 2015

7 commits

  • Linus Torvalds
     
  • Pull perf updates from Thomas Gleixner:
    "Mostly updates to the perf tool plus two fixes to the kernel core code:

    - Handle tracepoint filters correctly for inherited events (Peter
    Zijlstra)

    - Prevent a deadlock in perf_lock_task_context (Paul McKenney)

    - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
    Melo)

    - Print full source file paths when using 'perf annotate --print-line
    --full-paths' (Michael Petlan)

    - Fix 'perf probe -d' when just one out of uprobes and kprobes is
    enabled (Wang Nan)

    - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
    tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)

    - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
    the 'perf test' LLVM entries, when running it in-tree, to
    .gitignore (Yunlong Song)

    - libbpf error reporting improvements, using a strerror interface to
    more precisely tell the user about problems with the provided
    scriptlet, be it in C or as a ready made object file (Wang Nan)

    - Do not be case sensitive when searching for matching 'perf test'
    entries (Arnaldo Carvalho de Melo)

    - Inform the user about objdump failures in 'perf annotate' (Andi
    Kleen)

    - Improve the LLVM 'perf test' entry, introduce new ones for BPF
    and kbuild tests to check the environment used by clang to compile
    .c scriptlets (Wang Nan)"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
    tools include: Add compiler.h to list.h
    perf probe: Verify parameters in two functions
    perf session: Add missing newlines to some pr_err() calls
    perf annotate: Support full source file paths for srcline fix
    perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
    perf: Fix inherited events vs. tracepoint filters
    perf: Disable IRQs across RCU RS CS that acquires scheduler lock
    perf test: Do not be case sensitive when searching for matching tests
    perf test: Add 'perf test BPF'
    perf test: Enhance the LLVM tests: add kbuild test
    perf test: Enhance the LLVM test: update basic BPF test program
    perf bpf: Improve BPF related error messages
    perf tools: Make fetch_kernel_version() publicly available
    bpf tools: Add new API bpf_object__get_kversion()
    bpf tools: Improve libbpf error reporting
    perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
    perf annotate: Inform the user about objdump failures in --stdio
    perf stat: Make stat options global
    perf sched latency: Fix thread pid reuse issue
    ...

    Linus Torvalds
     
  • Pull scheduler fix from Thomas Gleixner:
    "A single fix to prevent math underflow in the numa balancing code"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/numa: Fix math underflow in task_tick_numa()

    Linus Torvalds
     
  • Pull liblockdep fixes from Thomas Gleixner:
    "Three small patches to synchronize liblockdep with the latest core
    changes"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tools/liblockdep: explicitly declare lockdep API we call from liblockdep
    tools/liblockdep: add userspace versions of WRITE_ONCE and RCU_INIT_POINTER
    tools/liblockdep: remove task argument from debug_check_no_locks_held

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A couple of fixes and updates related to x86:

    - Fix the W+X check regression on XEN

    - The real fix for the low identity map trainwreck

    - Probe legacy PIC early instead of unconditionally allocating legacy
    irqs

    - Add cpu verification to long mode entry

    - Adjust the cache topology to AMD Fam17H systems

    - Let Merrifield use the TSC across S3"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/cpu: Call verify_cpu() after having entered long mode too
    x86/setup: Fix low identity map for >= 2GB kernel range
    x86/mm: Skip the hypervisor range when walking PGD
    x86/AMD: Fix last level cache topology for AMD Fam17h systems
    x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
    x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield

    Linus Torvalds
     
  • ….kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull irq and timer fixes from Thomas Gleixner:

    - An irq regression fix to restore the wakeup behaviour of chained
    interrupts.

    - A timer fix for a long standing race versus timers scheduled on a
    target cpu which got exposed by recent changes in the workqueue
    implementation.

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq/PM: Restore system wake up from chained interrupts

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timers: Use proper base migration in add_timer_on()

    Linus Torvalds
     
  • Pull MIPS updates from Ralf Baechle:
    "These are the highlights of the main MIPS pull request for 4.4:

    - Add latencytop support
    - Support appended DTBs
    - VDSO support and initially use it for gettimeofday.
    - Drop the .MIPS.abiflags and ELF NOTE sections from vmlinux
    - Support for the 5KE, an internal test core.
    - Switch all MIPS platforms to libata drivers.
    - Improved support, cleanups for ralink and Lantiq platforms.
    - Support for the new xilfpga platform.
    - A number of DTB improvements for BMIPS.
    - Improved support for CM and CPS.
    - Minor JZ4740 and BCM47xx enhancements"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (120 commits)
    MIPS: idle: add case for CPU_5KE
    MIPS: Octeon: Support APPENDED_DTB
    MIPS: vmlinux: create a section for appended DTB
    MIPS: Clean up compat_siginfo_t
    MIPS: Fix PAGE_MASK definition
    MIPS: BMIPS: Enable GZIP ramdisk and timed printks
    MIPS: Add xilfpga defconfig
    MIPS: xilfpga: Add mipsfpga platform code
    MIPS: xilfpga: Add xilfpga device tree files.
    dt-bindings: MIPS: Document xilfpga bindings and boot style
    MIPS: Make MIPS_CMDLINE_DTB default
    MIPS: Make the kernel arguments from dtb available
    MIPS: Use USE_OF as the guard for appended dtb
    MIPS: BCM63XX: Use pr_* instead of printk
    MIPS: Loongson: Cleanup CONFIG_LOONGSON_SUSPEND.
    MIPS: lantiq: Disable xbar fpi burst mode
    MIPS: lantiq: Force the crossbar to big endian
    MIPS: lantiq: Initialize the USB core on boot
    MIPS: lantiq: Return correct value for fpi clock on ar9
    MIPS: ralink: Add missing clock on rt305x
    ...

    Linus Torvalds
     

15 Nov, 2015

1 commit

  • Pull sound fixes from Takashi Iwai:
    "Here is a collection of small fixes that have been gathered for
    4.4-rc1. The only significant changes are those in PCI drivers
    Kconfig, to use "depends on" instead of "select" for CONFIG_ZONE_DMA.
    A reverse select is often more user-friendly, but in this case it
    makes the conflict with ZONE_DEVICE hard to manage, so it was
    changed this way for now.

    Others are all small fixes and quirks: an error check in soundcore
    register_chrdev(), HD-audio HDMI/DP phantom jack fix, Intel Broxton DP
    quirk, USB-audio DSD device quirk, some constifications, etc"

    * tag 'sound-fix-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: pci: depend on ZONE_DMA
    ALSA: hda - Simplify phantom jack handling for HDMI/DP
    ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec
    ALSA: ctxfi: constify rsc ops structures
    ALSA: usb: Add native DSD support for Aune X1S
    ALSA: oxfw: add an comment to Kconfig for TASCAM FireOne
    sound: fix check for error condition of register_chrdev()

    Linus Torvalds