10 May, 2013

18 commits


09 May, 2013

8 commits

  • Pull InfiniBand/RDMA changes from Roland Dreier:
    - XRC transport fixes
    - Fix DHCP on IPoIB
    - mlx4 preparations for flow steering
    - iSER fixes
    - miscellaneous other fixes

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
    IB/iser: Add support for iser CM REQ additional info
    IB/iser: Return error to upper layers on EAGAIN registration failures
    IB/iser: Move informational messages from error to info level
    IB/iser: Add module version
    mlx4_core: Expose a few helpers to fill DMFS HW strucutures
    mlx4_core: Directly expose fields of DMFS HW rule control segment
    mlx4_core: Change a few DMFS fields names to match firmare spec
    mlx4: Match DMFS promiscuous field names to firmware spec
    mlx4_core: Move DMFS HW structs to common header file
    IB/mlx4: Set link type for RAW PACKET QPs in the QP context
    IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
    mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
    RDMA/iwcm: Don't touch cmid after dropping reference
    IB/qib: Correct qib_verbs_register_sysfs() error handling
    IB/ipath: Correct ipath_verbs_register_sysfs() error handling
    RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
    SRPT: Fix odd use of WARN_ON()
    IPoIB: Fix ipoib_hard_header() return value
    RDMA: Rename random32() to prandom_u32()
    RDMA/cxgb3: Fix uninitialized variable
    ...

    Linus Torvalds
     
  • Pull arm64 update from Catalin Marinas:

    - Since drivers/irqchip/irq-gic.c no longer has dependencies on arm32
    specifics (the 'gic' branch merged), it can be enabled on arm64.

    - Enable arm64 support for poweroff/restart (for code under
    drivers/power/reset/).

    - Fixes (dts file, exception handling, bitops)

    * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
    arm64: Treat the bitops index argument as an 'int'
    arm64: Ignore the 'write' ESR flag on cache maintenance faults
    arm64: dts: fix #address-cells for foundation-v8
    arm64: vexpress: Add support for poweroff/restart
    arm64: Enable support for the ARM GIC interrupt controller

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "This patch-set includes the following major enhancement patches.
    - introduce a new global lock scheme
    - add tracepoints on several major functions
    - fix the overall cleaning process focused on victim selection
    - apply the block plugging to merge IOs as much as possible
    - enhance management of free nids and its list
    - enhance the readahead mode for node pages
    - address several critical deadlock conditions
    - reduce lock_page calls

    The other minor bug fixes and enhancements are as follows.
    - calculation mistakes: overflow
    - bio types: READ, READA, and READ_SYNC
    - fix the recovery flow, data races, and null pointer errors"

    * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: cover free_nid management with spin_lock
    f2fs: optimize scan_nat_page()
    f2fs: code cleanup for scan_nat_page() and build_free_nids()
    f2fs: bugfix for alloc_nid_failed()
    f2fs: recover when journal contains deleted files
    f2fs: continue to mount after failing recovery
    f2fs: avoid deadlock during evict after f2fs_gc
    f2fs: modify the number of issued pages to merge IOs
    f2fs: remove useless #include as we're now using sysfs as debug entry.
    f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
    f2fs: check truncation of mapping after lock_page
    f2fs: enhance alloc_nid and build_free_nids flows
    f2fs: add a tracepoint on f2fs_new_inode
    f2fs: check nid == 0 in add_free_nid
    f2fs: add REQ_META about metadata requests for submit
    f2fs: give a chance to merge IOs by IO scheduler
    f2fs: avoid frequent background GC
    f2fs: add tracepoints to debug checkpoint request
    f2fs: add tracepoints for write page operations
    f2fs: add tracepoints to debug the block allocation
    ...

    Linus Torvalds
     
  • Pull Hexagon fixes from Richard Kuo:
    "A bug fix and a Kconfig cleanup"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
    HEXAGON: Remove non existent reference to GENERIC_KERNEL_EXECVE & GENERIC_KERNEL_THREAD
    Hexagon: fix register used to call do_work_pending

    Linus Torvalds
     
  • Commit 8a965b3baa89 ("mm, slab_common: Fix bootstrap creation of kmalloc
    caches") introduced a regression that caused us to crash early during
    boot. The commit introduced an ordering for slab creation, making sure
    the two odd-sized slabs were created after specific power-of-two sizes.

    However, if any of the power-of-two slabs had already been created
    earlier during boot, the slabs at index 1 or 2 might not get created at
    all. This patch makes sure none of the slabs get skipped.

    Tony Lindgren bisected this down to the offending commit, which really
    helped because bisect kept bringing me to almost but not quite this one.

    Signed-off-by: Chris Mason
    Acked-by: Christoph Lameter
    Acked-by: Tony Lindgren
    Acked-by: Soren Brinkmann
    Tested-by: Tetsuo Handa
    Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Linus Torvalds

    Chris Mason
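The ordering bug can be reproduced in a toy userspace model. This is not the mm/slab_common.c code: the `caches` array, `create_cache()` and the shift bounds are hypothetical, and the model assumes that slots 1 and 2 hold the odd sizes (96 and 192 bytes in the kmalloc index scheme) that must follow the 64- and 128-byte caches. Nesting the odd-size checks under "create the power of two if it is missing" skips them whenever that power of two already exists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: slot i holds the 2^i-byte cache, except slots 1 and 2,
 * which hold the odd-sized 96- and 192-byte caches. Bounds are
 * hypothetical. */
#define SHIFT_LOW  3
#define SHIFT_HIGH 13
#define NCACHES    (SHIFT_HIGH + 1)

static size_t caches[NCACHES]; /* 0 = not created, else object size */

static void create_cache(int idx, size_t size)
{
	assert(idx >= 0 && idx < NCACHES);
	assert(caches[idx] == 0); /* each cache is created exactly once */
	caches[idx] = size;
}

static void reset_caches(void)
{
	for (int i = 0; i < NCACHES; i++)
		caches[i] = 0;
}

/* Regressed shape: the odd sizes are only considered when the
 * power-of-two cache at index i is created by this loop. */
static void create_caches_regressed(void)
{
	for (int i = SHIFT_LOW; i <= SHIFT_HIGH; i++) {
		if (!caches[i]) {
			create_cache(i, (size_t)1 << i);
			if (!caches[1] && i == 6)
				create_cache(1, 96);
			if (!caches[2] && i == 7)
				create_cache(2, 192);
		}
	}
}

/* Fixed shape: the odd sizes are checked on every iteration, so a
 * power-of-two cache created earlier in boot cannot hide them. */
static void create_caches_fixed(void)
{
	for (int i = SHIFT_LOW; i <= SHIFT_HIGH; i++) {
		if (!caches[i])
			create_cache(i, (size_t)1 << i);
		if (!caches[1] && i == 6) /* 96 right after 64 */
			create_cache(1, 96);
		if (!caches[2] && i == 7) /* 192 right after 128 */
			create_cache(2, 192);
	}
}
```

Pre-creating slot 6 and then running `create_caches_regressed()` leaves slot 1 empty, which is the skipped-slab case; `create_caches_fixed()` fills it.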
     
  • Roland Dreier
     
  • Pull block driver updates from Jens Axboe:
    "It might look big in volume, but when categorized, not a lot of
    drivers are touched. The pull request contains:

    - mtip32xx fixes from Micron.

    - A slew of drbd updates, this time in a nicer series.

    - bcache, a flash/ssd caching framework from Kent.

    - Fixes for cciss"

    * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
    bcache: Use bd_link_disk_holder()
    bcache: Allocator cleanup/fixes
    cciss: bug fix to prevent cciss from loading in kdump crash kernel
    cciss: add cciss_allow_hpsa module parameter
    drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
    mtip32xx: Workaround for unaligned writes
    bcache: Make sure blocksize isn't smaller than device blocksize
    bcache: Fix merge_bvec_fn usage for when it modifies the bvm
    bcache: Correctly check against BIO_MAX_PAGES
    bcache: Hack around stuff that clones up to bi_max_vecs
    bcache: Set ra_pages based on backing device's ra_pages
    bcache: Take data offset from the bdev superblock.
    mtip32xx: mtip32xx: Disable TRIM support
    mtip32xx: fix a smatch warning
    bcache: Disable broken btree fuzz tester
    bcache: Fix a format string overflow
    bcache: Fix a minor memory leak on device teardown
    bcache: Documentation updates
    bcache: Use WARN_ONCE() instead of __WARN()
    bcache: Add missing #include
    ...

    Linus Torvalds
     
  • Pull block core updates from Jens Axboe:

    - The major bit is Kent's prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for a hang when merging discard bios exceeded the 32-bit
    unsigned rq->datalen.

    - Tejun's changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework; the SCSI patches that build on top of it are
    in James' tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
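The discard-merge item comes down to a 32-bit byte count. A minimal sketch of the arithmetic only, assuming what the item states (the request's data length is a 32-bit unsigned byte count) and 512-byte sectors; the helper names are hypothetical, and the item does not say how the block layer implements its check. A merged discard is representable only while its size stays at or below `UINT32_MAX >> 9` sectors; one sector more and the byte count wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Largest sector count whose byte length still fits in 32 bits. */
#define MAX_SECTORS_32BIT_LEN (UINT32_MAX >> SECTOR_SHIFT)

/* What an unchecked merge computes: the byte count silently wraps. */
static uint32_t toy_len_bytes(uint32_t sectors)
{
	return sectors << SECTOR_SHIFT;
}

/* Hypothetical merge check, summed in 64 bits so it cannot wrap itself. */
static bool toy_can_merge_discard(uint32_t rq_sectors, uint32_t bio_sectors)
{
	uint64_t total = (uint64_t)rq_sectors + bio_sectors;

	return total <= MAX_SECTORS_32BIT_LEN;
}
```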
     

08 May, 2013

14 commits

  • After build_free_nids() collects free nid candidates from NAT pages and
    the current journal blocks, it checks whether each candidate is already
    allocated, i.e. whether the NAT cache holds that nid with an allocated
    block address.

    Previously, this procedure walked the list with
    list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
    However, that walk was not covered by free_nid_list_lock, which resulted
    in a NULL pointer bug.

    This patch moves the check inside add_free_nid(), so that candidates are
    filtered before insertion and the separate walk, with the spin_lock it
    would need, is no longer required.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
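A toy userspace model of "filter before insert". The arrays, the `build` flag (marking candidates that come from the NAT scan) and the `toy_*` helpers are hypothetical; a C11 `atomic_flag` spinlock stands in for free_nid_list_lock, and the NAT cache's own locking is left out. The nid 0 rejection mirrors the "f2fs: check nid == 0 in add_free_nid" entry in the shortlog above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NULL_ADDR 0u
#define NR_NIDS   16 /* hypothetical, tiny nid space */

/* Hypothetical NAT cache: block address recorded for each nid. */
static unsigned int nat_blkaddr[NR_NIDS];

/* Hypothetical free nid list, guarded by a stand-in for
 * free_nid_list_lock. */
static atomic_flag free_nid_list_lock = ATOMIC_FLAG_INIT;
static bool on_free_list[NR_NIDS];
static int fcnt;

static void toy_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* busy-wait */
}

static void toy_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns 1 if nid was added, 0 if rejected. The "already allocated?"
 * filter runs before insertion, so no later pass has to walk the shared
 * list to prune allocated nids. */
static int toy_add_free_nid(unsigned int nid, bool build)
{
	if (nid == 0 || nid >= NR_NIDS) /* nid 0 is never handed out */
		return 0;
	if (build && nat_blkaddr[nid] != NULL_ADDR)
		return 0; /* already has a block address: not free */

	toy_spin_lock(&free_nid_list_lock);
	if (on_free_list[nid]) {
		toy_spin_unlock(&free_nid_list_lock);
		return 0; /* duplicate candidate */
	}
	on_free_list[nid] = true;
	fcnt++;
	toy_spin_unlock(&free_nid_list_lock);
	return 1;
}
```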
     
  • When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.

    Signed-off-by: Haicheng Li
    [Jaegeuk Kim: fix handling the return value of add_free_nid()]
    Signed-off-by: Jaegeuk Kim

    Haicheng Li
     
  • This patch does two cleanups:
    1. remove the unused variable "fcnt" in build_free_nids().
    2. make scan_nat_page() return void and remove the useless variable "fcnt".

    Signed-off-by: Haicheng Li
    Signed-off-by: Jaegeuk Kim

    Haicheng Li
     
  • Drop the free_nid directly, instead of caching it again, when
    nm_i->fcnt > 2 * MAX_FREE_NIDS.

    Since nm_i->free_nid_list_lock is not held between consecutive calls
    to alloc_nid() and alloc_nid_failed(), other threads may add new
    free_nids to the free_nid_list in the meantime.

    We need to make sure nm_i->fcnt never exceeds 2 * MAX_FREE_NIDS.

    Signed-off-by: Haicheng Li
    [Jaegeuk Kim: fit the coding style]
    Signed-off-by: Jaegeuk Kim

    Haicheng Li
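A toy model of the free nid cap, used both by this commit and by the "stop scanning other NAT entries" change above. Everything here is hypothetical: `MAX_FREE_NIDS` is shrunk to 4, nid states are a plain array, and the cap is enforced with `>=` so the model keeps `fcnt <= 2 * MAX_FREE_NIDS` exactly as the message asks (this is not a claim about the exact f2fs comparison). The scan stops when adding reports "full", and a failed allocation drops its nid if other threads refilled the list meanwhile.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_FREE_NIDS 4  /* hypothetical small cap, not f2fs's value */
#define NR_NIDS       64 /* hypothetical nid space */

enum nid_state { NID_NONE, NID_NEW, NID_ALLOC };

static atomic_flag list_lock = ATOMIC_FLAG_INIT; /* ~free_nid_list_lock */
static enum nid_state state[NR_NIDS];
static int fcnt; /* number of NID_NEW entries on the list */

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
		; /* busy-wait */
}

static void toy_unlock(void)
{
	assert(fcnt <= 2 * MAX_FREE_NIDS); /* invariant from the message */
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Returns 1 if added, 0 if already known, -1 if the cache is full. */
static int toy_add_free_nid(int nid)
{
	int ret = 1;

	toy_lock();
	if (fcnt >= 2 * MAX_FREE_NIDS) {
		ret = -1;
	} else if (state[nid] != NID_NONE) {
		ret = 0;
	} else {
		state[nid] = NID_NEW;
		fcnt++;
	}
	toy_unlock();
	return ret;
}

/* Scan path: stop as soon as the cache reports it is full. */
static void toy_scan_nat_range(int start, int end)
{
	for (int nid = start; nid < end; nid++)
		if (toy_add_free_nid(nid) < 0)
			break;
}

/* Hand out one NID_NEW nid, or -1 if none is cached. */
static int toy_alloc_nid(void)
{
	int nid = -1;

	toy_lock();
	for (int i = 1; i < NR_NIDS; i++) {
		if (state[i] == NID_NEW) {
			state[i] = NID_ALLOC;
			fcnt--;
			nid = i;
			break;
		}
	}
	toy_unlock();
	return nid;
}

/* Allocation failed: put the nid back, unless the list was refilled in
 * the meantime, in which case drop it rather than exceed the cap. */
static void toy_alloc_nid_failed(int nid)
{
	toy_lock();
	assert(state[nid] == NID_ALLOC);
	if (fcnt >= 2 * MAX_FREE_NIDS) {
		state[nid] = NID_NONE; /* dropped; a later scan can find it */
	} else {
		state[nid] = NID_NEW;
		fcnt++;
	}
	toy_unlock();
}
```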
     
  • When recovering a journal file with fsync data for files that have
    been deleted, don't bail out on recovery.

    Signed-off-by: Chris Fries
    Reviewed-by: Russell Knize
    Reviewed-by: Jason Hrycay
    [Jaegeuk Kim: fit the coding style]
    Signed-off-by: Jaegeuk Kim

    Chris Fries
     
  • When we are unable to roll forward the journal, we shouldn't bail out
    and refuse to mount; we should continue to attempt the mount. Bad
    recovery data is likely unrecoverable at this point, and requiring the
    user to try to mount again doesn't solve anything.

    Signed-off-by: Chris Fries
    Reviewed-by: Russell Knize
    Reviewed-by: Jason Hrycay
    Signed-off-by: Jaegeuk Kim

    Chris Fries
     
  • o Deadlock case #1

    Thread 1:
    - writeback_sb_inodes
    - do_writepages
    - f2fs_write_data_pages
    - write_cache_pages
    - f2fs_write_data_page
    - f2fs_balance_fs
    - wait mutex_lock(gc_mutex)

    Thread 2:
    - f2fs_balance_fs
    - mutex_lock(gc_mutex)
    - f2fs_gc
    - f2fs_iget
    - wait iget_locked(inode->i_lock)

    Thread 3:
    - do_unlinkat
    - iput
    - lock(inode->i_lock)
    - evict
    - inode_wait_for_writeback

    o Deadlock case #2

    Thread 1:
    - __writeback_single_inode
    : set I_SYNC
    - do_writepages
    - f2fs_write_data_page
    - f2fs_balance_fs
    - f2fs_gc
    - iput
    - evict
    - inode_wait_for_writeback(I_SYNC)

    To avoid this, even when iput() drops the last reference, we need to
    stop the eviction if the inode is under writeback. This patch therefore
    adds an f2fs_drop_inode() callback that checks the I_SYNC flag.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
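The fix keeps the final iput() from evicting an inode that is still under writeback. A toy sketch of that decision, assuming a simplified inode: `struct toy_inode`, its `hashed` field and the `I_SYNC` value are hypothetical stand-ins, and `toy_generic_drop_inode()` only approximates the VFS default (drop when the inode is unlinked or unhashed). Returning 0 keeps the inode cached instead of calling evict(), which avoids the self-wait in case #2.

```c
#include <assert.h>
#include <stdbool.h>

#define I_SYNC (1u << 0) /* hypothetical bit value */

struct toy_inode {
	unsigned int i_state;
	unsigned int i_nlink;
	bool hashed;
};

/* Approximates the default policy: drop when unlinked or unhashed. */
static int toy_generic_drop_inode(const struct toy_inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/* If the inode is hashed and writeback is in progress (I_SYNC), refuse
 * to drop it, so the last iput() does not evict it and then wait on
 * writeback that this very call chain is performing. */
static int toy_f2fs_drop_inode(const struct toy_inode *inode)
{
	if (inode->hashed && (inode->i_state & I_SYNC))
		return 0;
	return toy_generic_drop_inode(inode);
}
```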
     
  • The bitops prototypes use an 'int' as the bit index type, but the asm
    implementation assumes it to be a 'long'. Since the compiler does not
    guarantee that the upper 32 bits of a register are zeroed when it holds
    an 'int', change the bitops implementation accordingly.

    Signed-off-by: Catalin Marinas

    Catalin Marinas
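A C sketch of the point, making no claim about the actual arm64 assembly: `toy_index_from_reg()` models using only the low 32 bits of a register (all that an 'int' argument guarantees), and `toy_set_bit()`/`toy_test_bit()` are hypothetical portable bitops that derive the word and mask from that 32-bit index. Using the full 64-bit register value instead would index a word far outside the bitmap whenever the upper half holds stale data.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Models reading a 64-bit register through its 32-bit view: only the
 * low 32 bits are meaningful when the value was passed as an 'int'. */
static uint32_t toy_index_from_reg(uint64_t reg)
{
	return (uint32_t)reg;
}

/* Bitops taking an 'int' index, as the prototypes declare. */
static void toy_set_bit(int nr, unsigned long *addr)
{
	unsigned int bit = (unsigned int)nr;

	addr[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static bool toy_test_bit(int nr, const unsigned long *addr)
{
	unsigned int bit = (unsigned int)nr;

	return (addr[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}
```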
     
  • The ESR.WnR bit is always set on data cache maintenance faults, even
    though the page is not required to have write permission. If a
    translation fault (page not yet mapped) happens for a read-only user
    address range, Linux incorrectly assumes a permission fault. This patch
    adds a check of the ESR.CM bit during page fault handling so that the
    'write' flag is ignored for cache maintenance faults.

    Signed-off-by: Catalin Marinas
    Reported-by: Tim Northover
    Cc: stable@vger.kernel.org

    Catalin Marinas
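The classification rule can be sketched in C. The bit positions follow the ARMv8 data abort ISS encoding (WnR is bit 6, CM is bit 8); the macro and function names are hypothetical, and this is not the arch/arm64/mm/fault.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ESR_WNR (UINT32_C(1) << 6) /* write, not read */
#define TOY_ESR_CM  (UINT32_C(1) << 8) /* cache maintenance operation */

/* A cache maintenance fault reports WnR=1 although no write permission
 * is needed, so only count it as a write when CM is clear. */
static bool toy_fault_is_write(uint32_t esr)
{
	return (esr & TOY_ESR_WNR) && !(esr & TOY_ESR_CM);
}
```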
     
  • Commit 90556ca1 ("arm64: vexpress: Add dts files for the ARMv8 RTSM
    models") added foundation-v8.dts, but erroneously set
    /cpus/#address-cells = <1> while providing two cells in each cpus/cpu@N
    node's reg property.

    As of commit ea393a2e ("arm64: smp: honour #address-size when parsing
    CPU reg property") we read in as many address cells as specified rather
    than always reading two. This means that for foundation-v8.dts, we only
    read the first reg cell (zero) for each cpu node, and receive a lot of
    warnings at boot of the form "/cpus/cpu@1: duplicate cpu reg properties
    in the DT".

    This patch corrects foundation-v8.dts to have the correct value for
    /cpus/#address-cells.

    Signed-off-by: Mark Rutland
    Cc: Catalin Marinas
    Cc: Pawel Moll
    Cc: Will Deacon
    Tested-by: Marc Zyngier
    Acked-by: Marc Zyngier
    Signed-off-by: Catalin Marinas

    Mark Rutland
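The duplicate-reg warnings follow from how cells are folded into a CPU hardware ID. A toy sketch, assuming cells already converted to host byte order and four cpu nodes written as `<0 N>` (the node count is an assumption): `toy_read_number()` folds `size` cells most significant first, in the spirit of the kernel's of_read_number(), and `count_duplicate_hwids()` is hypothetical. With one address cell, every node reads back 0.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DT 4 /* hypothetical number of cpu@N nodes */

/* reg = <0 N> for cpu@N: two cells each, as in foundation-v8.dts. */
static const uint32_t cpu_reg[NR_CPUS_DT][2] = {
	{ 0, 0 }, { 0, 1 }, { 0, 2 }, { 0, 3 },
};

/* Fold 'size' 32-bit cells into one value, most significant cell first. */
static uint64_t toy_read_number(const uint32_t *cells, int size)
{
	uint64_t r = 0;

	while (size--)
		r = (r << 32) | *cells++;
	return r;
}

/* Count cpu nodes whose hwid repeats an earlier node's hwid when only
 * 'address_cells' cells are read from each reg property. */
static int count_duplicate_hwids(int address_cells)
{
	int dups = 0;

	for (int i = 1; i < NR_CPUS_DT; i++) {
		uint64_t hwid = toy_read_number(cpu_reg[i], address_cells);

		for (int j = 0; j < i; j++) {
			if (toy_read_number(cpu_reg[j], address_cells) == hwid) {
				dups++; /* one "duplicate cpu reg" warning */
				break;
			}
		}
	}
	return dups;
}
```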
     
  • This patch adds the arm_pm_poweroff definition expected by the
    vexpress-poweroff.c driver and enables the latter for arm64.

    Signed-off-by: Catalin Marinas
    Acked-by: Pawel Moll

    Catalin Marinas
     
  • This patch enables ARM_GIC on the arm64 kernel.

    Signed-off-by: Catalin Marinas

    Catalin Marinas
     
  • * arm64-prep-gic:
    irqchip: gic: Perform the gic_secondary_init() call via CPU notifier
    irqchip: gic: Call handle_bad_irq() directly
    arm: Move chained_irq_(enter|exit) to a generic file
    arm: Move the set_handle_irq and handle_arch_irq declarations to asm/irq.h

    Catalin Marinas
     
  • Merge more incoming from Andrew Morton:

    - Various fixes which were stalled or which I picked up recently

    - A large rotorooting of the AIO code. Allegedly to improve
    performance but I don't really have good performance numbers (I might
    have lost the email) and I can't raise Kent today. I held this out
    of 3.9 and we could give it another cycle if it's all too late/scary.

    I ended up taking only the first two thirds of the AIO rotorooting. I
    left the percpu parts and the batch completion for later. - Linus

    * emailed patches from Andrew Morton: (33 commits)
    aio: don't include aio.h in sched.h
    aio: kill ki_retry
    aio: kill ki_key
    aio: give shared kioctx fields their own cachelines
    aio: kill struct aio_ring_info
    aio: kill batch allocation
    aio: change reqs_active to include unreaped completions
    aio: use cancellation list lazily
    aio: use flush_dcache_page()
    aio: make aio_read_evt() more efficient, convert to hrtimers
    wait: add wait_event_hrtimeout()
    aio: refcounting cleanup
    aio: make aio_put_req() lockless
    aio: do fget() after aio_get_req()
    aio: dprintk() -> pr_debug()
    aio: move private stuff out of aio.h
    aio: add kiocb_cancel()
    aio: kill return value of aio_complete()
    char: add aio_{read,write} to /dev/{null,zero}
    aio: remove retry-based AIO
    ...

    Linus Torvalds