19 Apr, 2014

1 commit

  • This appears to be a copy/paste error. Update the description to
    reflect extra rbtree debug and checks for the config option instead of
    duplicating CONFIG_DEBUG_VM.

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

13 Apr, 2014

2 commits

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencies between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     
  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

09 Apr, 2014

1 commit

  • I got a bug report yesterday from Laszlo Ersek in which he states that
    his kvm instance fails to suspend. Laszlo bisected it down to this
    commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is
    converted to use the blk-mq infrastructure.

    After digging a bit, it became clear that the issue was with the queue
    drain. blk-mq tracks queue usage in a percpu counter, which is
    incremented on request alloc and decremented when the request is freed.
    The initial hunt was for an inconsistency in blk-mq, but everything
    seemed fine. In fact, the counter only returned crazy values when
    suspend was in progress.

    When a CPU is unplugged, the percpu counter code merges that CPU's state with
    the general state. blk-mq takes care to register a hotcpu notifier with
    the appropriate priority, so we know it runs after the percpu counter
    notifier. However, the percpu counter notifier only merges the state
    when the CPU is fully gone. This leaves a state transition where the
    CPU going away is no longer in the online mask, yet it still holds
    private values. This means that in this state, percpu_counter_sum()
    returns invalid results, and the suspend then hangs waiting for
    abs(dead-cpu-value) requests to complete which of course will never
    happen.

    Fix this by clearing the state earlier, so we never have a case where
    the CPU isn't in online mask but still holds private state. This bug
    has been there since forever; I guess we don't have a lot of users where
    percpu counters need to be reliable during the suspend cycle.
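
    The window described above can be reproduced with a small userspace
    model. This is only a sketch: the toy_* names, the 4-CPU limit and the
    per-counter online[] array are assumptions made for illustration and are
    not the kernel's percpu_counter implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy model: a shared count plus one private delta per CPU. */
struct toy_percpu_counter {
	long count;                 /* shared part */
	long delta[TOY_NR_CPUS];    /* per-CPU private part */
	bool online[TOY_NR_CPUS];   /* stand-in for the online mask */
};

/* Sum the shared count and the deltas of online CPUs only. */
long toy_counter_sum(const struct toy_percpu_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (c->online[cpu])
			sum += c->delta[cpu];
	return sum;
}

/* Fold one CPU's private delta into the shared count. */
void toy_counter_fold(struct toy_percpu_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	c->count += c->delta[cpu];
	c->delta[cpu] = 0;
}

/*
 * Old ordering: the CPU leaves the online mask first and its delta is
 * folded only later. Returns the sum a caller would see in between.
 */
long toy_offline_late_fold(struct toy_percpu_counter *c, int cpu)
{
	long seen;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	c->online[cpu] = false;
	seen = toy_counter_sum(c);  /* dead CPU's delta is invisible here */
	toy_counter_fold(c, cpu);
	return seen;
}

/* Fixed ordering: fold the delta before the CPU leaves the mask. */
long toy_offline_early_fold(struct toy_percpu_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_counter_fold(c, cpu);
	c->online[cpu] = false;
	return toy_counter_sum(c);
}
```

    With three requests allocated on CPU 0 (delta +3) and freed on CPU 1
    (delta -3), taking CPU 1 offline with the late fold reports 3 in-flight
    requests during the window, while the early fold reports 0.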

    Signed-off-by: Jens Axboe
    Reported-by: Laszlo Ersek
    Tested-by: Laszlo Ersek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

08 Apr, 2014

5 commits

  • We define a check function in order to avoid trouble with the include
    files. Then the higher level __this_cpu macros are modified to invoke
    the preemption check.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Lameter
    Acked-by: Ingo Molnar
    Cc: Tejun Heo
    Tested-by: Grygorii Strashko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If the renamed symbol is defined, lib/iomap.c implements ioport_map and
    ioport_unmap, and currently (nearly) all platforms define the port
    accessor functions outb/inb and friends unconditionally. So
    HAS_IOPORT_MAP is the better name for this.

    Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.

    The motivation for this change is to reintroduce a symbol HAS_IOPORT
    that signals if outb/inb et al are available. I will address that at
    least one merge window later though to keep surprises to a minimum and
    catch new introductions of (HAS|NO)_IOPORT.

    The changes in this commit were done using:

    $ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'

    Signed-off-by: Uwe Kleine-König
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • This can greatly aid in narrowing down the real source of initramfs
    problems such as failures related to the compression of the in-kernel
    initramfs when an external initramfs is in use as well. Existing errors
    are ambiguous as to which initramfs is a problem and why.

    [akpm@linux-foundation.org: use pr_debug()]
    Signed-off-by: Daniel M. Weeks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel M. Weeks
     
  • Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

    The rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before storing a pointer to that structure. And in the
    case of the NULL pointer, there is no structure to initialize.

    So, rcu_assign_pointer(p, NULL) can be safely converted to
    RCU_INIT_POINTER(p, NULL)
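
    The same reasoning can be illustrated in userspace with C11 atomics.
    This is an analogy only, not the kernel macros: the toy_* names are
    hypothetical, the release store stands in for rcu_assign_pointer(), the
    relaxed store stands in for RCU_INIT_POINTER(p, NULL), and the acquire
    load is a stronger substitute for rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_config {
	int value;
};

_Atomic(struct toy_config *) toy_current;

/*
 * Publish a structure: the release store orders the initialization of
 * cfg->value before the pointer becomes visible to readers.
 */
void toy_publish(struct toy_config *cfg, int value)
{
	assert(cfg != NULL);
	cfg->value = value;
	atomic_store_explicit(&toy_current, cfg, memory_order_release);
}

/*
 * Clear the pointer: NULL refers to no structure, so there is no
 * initialization to order against and a relaxed store is enough.
 */
void toy_clear(void)
{
	atomic_store_explicit(&toy_current, (struct toy_config *)NULL,
			      memory_order_relaxed);
}

/* Reader: use the structure only if a non-NULL pointer was loaded. */
int toy_read_or(int fallback)
{
	struct toy_config *cfg =
		atomic_load_explicit(&toy_current, memory_order_acquire);

	return cfg ? cfg->value : fallback;
}
```

    A reader that observes NULL never dereferences anything, which is why
    the cheaper store is safe in that case.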

    Signed-off-by: Monam Agarwal
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monam Agarwal
     
  • Remove no longer used deprecated code, and make local functions
    static.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jean Delvare
    Acked-by: Tejun Heo
    Cc: Jeff Layton
    Cc: Philipp Reisner
    Cc: Jens Axboe
    Cc: George Spelvin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
     

04 Apr, 2014

11 commits

  • Include appropriate header file include/linux/decompress/inflate.h in
    lib/decompress_inflate.c because it has prototype declaration of
    function defined in lib/decompress_inflate.c.

    Also, fix the guard around the header file
    include/linux/decompress/inflate.h to use a more unique guard symbol.
    This avoids conflict with the INFLATE_H defined by
    zlib_inflate/inflate.h.

    This eliminates the following warning in lib/decompress_inflate.c:

    lib/decompress_inflate.c:35:17: warning: no previous prototype for `gunzip' [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     
  • Add prototype declarations of functions in lib/clz_ctz.c. These
    functions are required by GCC builtins and hence cannot be removed
    despite appearing unreferenced in the kernel source.

    This eliminates the following warning in lib/clz_ctz.c:

    lib/clz_ctz.c:16:12: warning: no previous prototype for `__ctzsi2' [-Wmissing-prototypes]
    lib/clz_ctz.c:22:12: warning: no previous prototype for `__clzsi2' [-Wmissing-prototypes]
    lib/clz_ctz.c:44:12: warning: no previous prototype for `__clzdi2' [-Wmissing-prototypes]
    lib/clz_ctz.c:50:12: warning: no previous prototype for `__ctzdi2' [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Acked-by: Chanho Min
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     
  • These are just some very minor and misc cleanups in the PRNG. In
    prandom_u32() we store the result in an unsigned long, which is
    unnecessary: the value comes from prandom_u32_state() and should be a
    u32 instead. prandom_bytes_state()'s comment is meant to be in kdoc
    format, so convert it to proper kdoc as is done everywhere else. Also,
    use the normal comment style for the header comment. Last but not
    least, add some newlines for readability.

    Signed-off-by: Daniel Borkmann
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Borkmann
     
  • Having a discussion about sparse warnings in the kernel, and that we
    should clean them up, I decided to pick a random file to do so. This
    happened to be devres.c which gives the following warnings:

    CHECK lib/devres.c
    lib/devres.c:83:9: warning: cast removes address space of expression
    lib/devres.c:117:31: warning: incorrect type in return expression (different address spaces)
    lib/devres.c:117:31: expected void [noderef] *
    lib/devres.c:117:31: got void *
    lib/devres.c:125:31: warning: incorrect type in return expression (different address spaces)
    lib/devres.c:125:31: expected void [noderef] *
    lib/devres.c:125:31: got void *
    lib/devres.c:136:26: warning: incorrect type in assignment (different address spaces)
    lib/devres.c:136:26: expected void [noderef] *[assigned] dest_ptr
    lib/devres.c:136:26: got void *
    lib/devres.c:226:9: warning: cast removes address space of expression

    Mostly it's just the use of typecasting to void * without adding
    __force, or returning ERR_PTR(-ESOMEERR) without typecasting to a
    __iomem type.

    I added a helper macro IOMEM_ERR_PTR() that does the typecast to make
    the code a little nicer than adding ugly typecasts to the code.

    Signed-off-by: Steven Rostedt
    Cc: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • All in-kernel users of %n in format strings have now been removed and
    the %n directive is ignored. Remove the handling of %n so that it is
    treated the same as any other invalid format string directive. Keep a
    warning in place to deter new instances of %n in format strings.

    Signed-off-by: Ryan Mallon
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryan Mallon
     
  • It is only used by procfs and procfs cannot be a module.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Currently kobject_uevent has somewhat unpredictable semantics. The
    point is, since it may call a usermode helper and wait for it to execute
    (UMH_WAIT_EXEC), it is impossible to say for sure what lock dependencies
    it will introduce for the caller - strictly speaking it depends on what
    fs the binary is located on and the set of locks fork may take. There
    are quite a few kobject_uevent's users that do not take this into
    account and call it with various mutexes taken, e.g. rtnl_mutex,
    net_mutex, which might potentially lead to a deadlock.

    Since there is actually no reason to wait for the usermode helper to
    execute there, let's make kobject_uevent start the helper asynchronously
    with the aid of the UMH_NO_WAIT flag.

    Personally, I'm interested in this, because I really want kobject_uevent
    to be called under the slab_mutex in the slub implementation as it used
    to be some time ago, because it greatly simplifies synchronization and
    automatically fixes a kmemcg-related race. However, there was a
    deadlock detected on an attempt to call kobject_uevent under the
    slab_mutex (see https://lkml.org/lkml/2012/1/14/45), which was reported
    to be fixed by releasing the slab_mutex for kobject_uevent.

    Unfortunately, there was no information about who exactly blocked on the
    slab_mutex causing the usermode helper to stall, nor have I managed
    to find this out or reproduce the issue.

    BTW, this is not the first attempt to make kobject_uevent use
    UMH_NO_WAIT. Previous one was made by commit f520360d93cd ("kobject:
    don't block for each kobject_uevent"), but it was wrong (it passed
    arguments allocated on stack to async thread) so it was reverted in
    05f54c13cd0c ("Revert "kobject: don't block for each kobject_uevent".").
    That attempt was aimed at speeding up the boot process, though.

    Signed-off-by: Vladimir Davydov
    Cc: Greg KH
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Previously, page cache radix tree nodes were freed after reclaim emptied
    out their page pointers. But now reclaim stores shadow entries in their
    place, which are only reclaimed when the inodes themselves are
    reclaimed. This is problematic for bigger files that are still in use
    after they have a significant amount of their cache reclaimed, without
    any of those pages actually refaulting. The shadow entries will just
    sit there and waste memory. In the worst case, the shadow entries will
    accumulate until the machine runs out of memory.

    To get this under control, the VM will track radix tree nodes
    exclusively containing shadow entries on a per-NUMA node list. Per-NUMA
    rather than global because we expect the radix tree nodes themselves to
    be allocated node-locally and we want to reduce cross-node references of
    otherwise independent cache workloads. A simple shrinker will then
    reclaim these nodes on memory pressure.

    A few things need to be stored in the radix tree node to implement the
    shadow node LRU and allow tree deletions coming from the list:

    1. There is no index available that would describe the reverse path
    from the node up to the tree root, which is needed to perform a
    deletion. To solve this, encode in each node its offset inside the
    parent. This can be stored in the unused upper bits of the same
    member that stores the node's height at no extra space cost.

    2. The number of shadow entries needs to be counted in addition to the
    regular entries, to quickly detect when the node is ready to go to
    the shadow node LRU list. The current entry count is an unsigned
    int but the maximum number of entries is 64, so a shadow counter
    can easily be stored in the unused upper bits.

    3. Tree modification needs tree lock and tree root, which are located
    in the address space, so store an address_space backpointer in the
    node. The parent pointer of the node is in a union with the 2-word
    rcu_head, so the backpointer comes at no extra cost as well.

    4. The node needs to be linked to an LRU list, which requires a list
    head inside the node. This does increase the size of the node, but
    it does not change the number of objects that fit into a slab page.
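
    Point 2 can be sketched as plain bit packing. This is a hedged
    illustration: the TOY_* names and the 7-bit split are assumptions
    derived from the 64-entry maximum stated above, not the actual macros
    added by this commit.

```c
#include <assert.h>

#define TOY_MAP_SIZE    64u   /* maximum entries per node (from the text) */
#define TOY_COUNT_SHIFT 7u    /* 7 bits are enough to hold 0..64 */
#define TOY_COUNT_MASK  ((1u << TOY_COUNT_SHIFT) - 1u)

/* Regular entries live in the low bits of the counter. */
unsigned int toy_regular(unsigned int count)
{
	return count & TOY_COUNT_MASK;
}

/* Shadow entries live in the otherwise unused upper bits. */
unsigned int toy_shadows(unsigned int count)
{
	return count >> TOY_COUNT_SHIFT;
}

/* Account for one newly inserted regular entry. */
unsigned int toy_add_regular(unsigned int count)
{
	assert(toy_regular(count) < TOY_MAP_SIZE);
	return count + 1u;
}

/* Reclaim: one regular entry is replaced by a shadow entry. */
unsigned int toy_replace_with_shadow(unsigned int count)
{
	assert(toy_regular(count) > 0);
	return count - 1u + (1u << TOY_COUNT_SHIFT);
}

/* A node is a shadow-LRU candidate when only shadow entries remain. */
int toy_only_shadows(unsigned int count)
{
	return toy_regular(count) == 0 && toy_shadows(count) > 0;
}
```

    Because both counts share one unsigned int, detecting that a node holds
    nothing but shadow entries is a mask and a shift, with no extra field
    in the node.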

    [akpm@linux-foundation.org: export the right function]
    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Make struct radix_tree_node part of the public interface and provide API
    functions to create, look up, and delete whole nodes. Refactor the
    existing insert, look up, delete functions on top of these new node
    primitives.

    This will allow the VM to track and garbage collect page cache radix
    tree nodes.

    [sasha.levin@oracle.com: return correct error code on insertion failure]
    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The radix tree hole searching code is only used for page cache, for
    example the readahead code trying to get a picture of the area
    surrounding a fault.

    So far it sufficed to rely on the radix tree definition of holes, which
    is "empty tree slot". But this is about to change, as shadow page
    descriptors will be stored in the page cache after the actual pages get
    evicted from memory.

    Move the functions over to mm/filemap.c and make them native page cache
    operations, where they can later be adapted to handle the new definition
    of "page cache hole".

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Provide a function that does not just delete an entry at a given index,
    but also allows passing in an expected item. Delete only if that item
    is still located at the specified index.

    This is handy when lockless tree traversals want to delete entries as
    well because they don't have to do a second, locked lookup to verify
    the slot has not changed under them before deleting the entry.
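
    The semantics can be sketched on a flat slot array instead of a real
    radix tree. Everything here is hypothetical: the toy_* names, the fixed
    slot count, and the convention that a NULL expected item means
    "delete unconditionally" are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Flat stand-in for a tree: one pointer per index. */
struct toy_tree {
	void *slot[TOY_SLOTS];
};

/*
 * Delete the entry at index. If expected is non-NULL, delete only when
 * the slot still holds exactly that item. Returns the removed entry, or
 * NULL if the slot was empty or held a different item.
 */
void *toy_delete_item(struct toy_tree *tree, size_t index, void *expected)
{
	void *entry;

	assert(tree != NULL && index < TOY_SLOTS);
	entry = tree->slot[index];
	if (entry == NULL)
		return NULL;
	if (expected != NULL && entry != expected)
		return NULL;            /* slot changed under the caller */
	tree->slot[index] = NULL;
	return entry;
}
```

    A caller that found an item during a lockless walk would take the lock
    and pass that item as expected; a NULL return then signals that the
    slot changed in the meantime and nothing was removed.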

    Signed-off-by: Johannes Weiner
    Reviewed-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

03 Apr, 2014

1 commit

  • Pull networking updates from David Miller:
    "Here is my initial pull request for the networking subsystem during
    this merge window:

    1) Support for ESN in AH (RFC 4302) from Fan Du.

    2) Add full kernel doc for ethtool command structures, from Ben
    Hutchings.

    3) Add BCM7xxx PHY driver, from Florian Fainelli.

    4) Export computed TCP rate information in netlink socket dumps, from
    Eric Dumazet.

    5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
    Dichtel.

    6) Convert many drivers to pci_enable_msix_range(), from Alexander
    Gordeev.

    7) Record SKB timestamps more efficiently, from Eric Dumazet.

    8) Switch to microsecond resolution for TCP round trip times, also
    from Eric Dumazet.

    9) Clean up and fix 6lowpan fragmentation handling by making use of
    the existing inet_frag api for its implementation.

    10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

    11) Auto size SKB lengths when composing netlink messages based upon
    past message sizes used, from Eric Dumazet.

    12) qdisc dumps can take a long time, add a cond_resched(), From Eric
    Dumazet.

    13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
    Get rid of never-used-in-tree netpoll RX handling. From Eric W
    Biederman.

    14) Support inter-address-family and namespace changing in VTI tunnel
    driver(s). From Steffen Klassert.

    15) Add Altera TSE driver, from Vince Bridgers.

    16) Optimizing csum_replace2() so that it doesn't adjust the checksum
    by checksumming the entire header, from Eric Dumazet.

    17) Expand BPF internal implementation for faster interpreting, more
    direct translations into JIT'd code, and much cleaner uses of BPF
    filtering in non-socket contexts. From Daniel Borkmann and Alexei
    Starovoitov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
    netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
    net: Add a test to see if a skb is freeable in irq context
    qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
    net: ptp: move PTP classifier in its own file
    net: sxgbe: make "core_ops" static
    net: sxgbe: fix logical vs bitwise operation
    net: sxgbe: sxgbe_mdio_register() frees the bus
    Call efx_set_channels() before efx->type->dimension_resources()
    xen-netback: disable rogue vif in kthread context
    net/mlx4: Set proper build dependancy with vxlan
    be2net: fix build dependency on VxLAN
    mac802154: make csma/cca parameters per-wpan
    mac802154: allow only one WPAN to be up at any given time
    net: filter: minor: fix kdoc in __sk_run_filter
    netlink: don't compare the nul-termination in nla_strcmp
    can: c_can: Avoid led toggling for every packet.
    can: c_can: Simplify TX interrupt cleanup
    can: c_can: Store dlc private
    can: c_can: Reduce register access
    can: c_can: Make the code readable
    ...

    Linus Torvalds
     

02 Apr, 2014

5 commits

  • it only makes control flow in __fput() and friends more convoluted.

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull block driver update from Jens Axboe:
    "On top of the core pull request, here's the pull request for the
    driver related changes for 3.15. It contains:

    - Improvements for msi-x registration for block drivers (mtip32xx,
    skd, cciss, nvme) from Alexander Gordeev.

    - A round of cleanups and improvements for drbd from Andreas
    Gruenbacher and Rashika Kheria.

    - A round of cleanups and improvements for bcache from Kent.

    - Removal of sleep_on() and friends in DAC960, ataflop, swim3 from
    Arnd Bergmann.

    - Bug fix for a bug in the mtip32xx async completion code from Sam
    Bradshaw.

    - Bug fix for accidentally bouncing IO on 32-bit platforms with
    mtip32xx from Felipe Franciosi"

    * 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits)
    bcache: remove nested function usage
    bcache: Kill bucket->gc_gen
    bcache: Kill unused freelist
    bcache: Rework btree cache reserve handling
    bcache: Kill btree_io_wq
    bcache: btree locking rework
    bcache: Fix a race when freeing btree nodes
    bcache: Add a real GC_MARK_RECLAIMABLE
    bcache: Add bch_keylist_init_single()
    bcache: Improve priority_stats
    bcache: Better alloc tracepoints
    bcache: Kill dead cgroup code
    bcache: stop moving_gc marking buckets that can't be moved.
    bcache: Fix moving_pred()
    bcache: Fix moving_gc deadlocking with a foreground write
    bcache: Fix discard granularity
    bcache: Fix another bug recovering from unclean shutdown
    bcache: Fix a bug recovering from unclean shutdown
    bcache: Fix a journalling reclaim after recovery bug
    bcache: Fix a null ptr deref in journal replay
    ...

    Linus Torvalds
     
  • Pull driver core and sysfs updates from Greg KH:
    "Here's the big driver core / sysfs update for 3.15-rc1.

    Lots of kernfs updates to make it useful for other subsystems, and a
    few other tiny driver core patches.

    All have been in linux-next for a while"

    * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits)
    Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
    kernfs: cache atomic_write_len in kernfs_open_file
    numa: fix NULL pointer access and memory leak in unregister_one_node()
    Revert "driver core: synchronize device shutdown"
    kernfs: fix off by one error.
    kernfs: remove duplicate dir.c at the top dir
    x86: align x86 arch with generic CPU modalias handling
    cpu: add generic support for CPU feature based module autoloading
    sysfs: create bin_attributes under the requested group
    driver core: unexport static function create_syslog_header
    firmware: use power efficient workqueue for unloading and aborting fw load
    firmware: give a protection when map page failed
    firmware: google memconsole driver fixes
    firmware: fix google/gsmi duplicate efivars_sysfs_init()
    drivers/base: delete non-required instances of include
    kernfs: fix kernfs_node_from_dentry()
    ACPI / platform: drop redundant ACPI_HANDLE check
    kernfs: fix hash calculation in kernfs_rename_ns()
    kernfs: add CONFIG_KERNFS
    sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()
    ...

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Enumeration
    - Increment max correctly in pci_scan_bridge() (Andreas Noever)
    - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
    - Assign CardBus bus number only during the second pass (Andreas Noever)
    - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
    - Make sure bus number resources stay within their parents bounds (Andreas Noever)
    - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
    - Check for child busses which use more bus numbers than allocated (Andreas Noever)
    - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
    - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
    - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
    - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
    - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
    - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)

    NUMA
    - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
    - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
    - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
    - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
    - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
    - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
    - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
    - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)

    Resource management
    - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
    - Add resource_contains() (Bjorn Helgaas)
    - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
    - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
    - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
    - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
    - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
    - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
    - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
    - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
    - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
    - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
    - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
    - Set type in __request_region() (Bjorn Helgaas)
    - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
    - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
    - Log IDE resource quirk in dmesg (Bjorn Helgaas)
    - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)

    PCI device hotplug
    - Make check_link_active() non-static (Rajat Jain)
    - Use link change notifications for hot-plug and removal (Rajat Jain)
    - Enable link state change notifications (Rajat Jain)
    - Don't disable the link permanently during removal (Rajat Jain)
    - Don't check adapter or latch status while disabling (Rajat Jain)
    - Disable link notification across slot reset (Rajat Jain)
    - Ensure very fast hotplug events are also processed (Rajat Jain)
    - Add hotplug_lock to serialize hotplug events (Rajat Jain)
    - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
    - Don't turn slot off when hot-added device already exists (Yijing Wang)

    MSI
    - Keep pci_enable_msi() documentation (Alexander Gordeev)
    - ahci: Fix broken single MSI fallback (Alexander Gordeev)
    - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
    - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
    - Fix leak of msi_attrs (Greg Kroah-Hartman)
    - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)

    Virtualization
    - Device-specific ACS support (Alex Williamson)

    Freescale i.MX6
    - Wait for retraining (Marek Vasut)

    Marvell MVEBU
    - Use Device ID and revision from underlying endpoint (Andrew Lunn)
    - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
    - Call request_resource() on the apertures (Jason Gunthorpe)
    - Fix potential issue in range parsing (Jean-Jacques Hiblot)

    Renesas R-Car
    - Check platform_get_irq() return code (Ben Dooks)
    - Add error interrupt handling (Ben Dooks)
    - Fix bridge logic configuration accesses (Ben Dooks)
    - Register each instance independently (Magnus Damm)
    - Break out window size handling (Magnus Damm)
    - Make the Kconfig dependencies more generic (Magnus Damm)

    Synopsys DesignWare
    - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)

    Miscellaneous
    - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
    - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
    - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
    - Clean up per-arch object file list (Liviu Dudau)
    - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
    - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
    - Fix pci_bus_b() build failure (Paul Gortmaker)"

    * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
    Revert "[PATCH] Insert GART region into resource map"
    PCI: Log IDE resource quirk in dmesg
    PCI: Change pci_bus_alloc_resource() type_mask to unsigned long
    PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
    resources: Set type in __request_region()
    PCI: Don't check resource_size() in pci_bus_alloc_resource()
    s390/PCI: Use generic pci_enable_resources()
    tile PCI RC: Use default pcibios_enable_device()
    sparc/PCI: Use default pcibios_enable_device() (Leon only)
    sh/PCI: Use default pcibios_enable_device()
    microblaze/PCI: Use default pcibios_enable_device()
    alpha/PCI: Use default pcibios_enable_device()
    PCI: Add "weak" generic pcibios_enable_device() implementation
    PCI: Don't enable decoding if BAR hasn't been assigned an address
    PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled
    PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
    PCI: Don't try to claim IORESOURCE_UNSET resources
    PCI: Check IORESOURCE_UNSET before updating BAR
    PCI: Don't clear IORESOURCE_UNSET when updating BAR
    PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
    ...

    Conflicts:
    arch/x86/include/asm/topology.h
    drivers/ata/ahci.c

    Linus Torvalds
     
  • nla_strcmp compares the string length plus one, so it's implicitly
    including the nul-termination in the comparison.

    int nla_strcmp(const struct nlattr *nla, const char *str)
    {
            int len = strlen(str) + 1;
            ...
            d = memcmp(nla_data(nla), str, len);

    However, if NLA_STRING is used, userspace can send us a string without
    the nul-termination. This is a problem since the string comparison
    will not match, as the last byte compared may not be the
    nul-termination.

    Fix this by skipping the comparison of the nul-termination if the
    attribute data is nul-terminated. Suggested by Thomas Graf.
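
    A minimal userspace sketch of the comparison described above. The
    struct attr_sketch type and the attr_strcmp_sketch() name are
    hypothetical stand-ins for struct nlattr and nla_strcmp(); this is an
    illustration of the idea, not the kernel code itself.

```c
#include <string.h>

/* Hypothetical stand-in for a netlink attribute payload. The real
 * struct nlattr has a header and is read via nla_data()/nla_len(). */
struct attr_sketch {
	const char *data;	/* payload bytes, with or without a trailing NUL */
	int len;		/* payload length in bytes */
};

/* Compare an attribute payload with a NUL-terminated C string.
 * Returns 0 on a match and non-zero otherwise. */
int attr_strcmp_sketch(const struct attr_sketch *a, const char *str)
{
	int len = (int)strlen(str);
	int attrlen = a->len;
	int d;

	/* Ignore one trailing NUL in the payload, if there is one. */
	if (attrlen > 0 && a->data[attrlen - 1] == '\0')
		attrlen--;

	/* The lengths must agree before the bytes are compared. */
	d = attrlen - len;
	if (d == 0)
		d = memcmp(a->data, str, (size_t)len);

	return d;
}
```

    With this shape, a payload of "eth0" matches the string "eth0"
    whether userspace sent 4 bytes or 5 bytes including the terminator.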

    Cc: Florian Westphal
    Cc: Thomas Graf
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira
     

01 Apr, 2014

2 commits

  • Pull x86 LTO changes from Peter Anvin:
    "More infrastructure work in preparation for link-time optimization
    (LTO). Most of these changes are to make sure symbols accessed from
    assembly code are properly marked as visible so the linker doesn't
    remove them.

    My understanding is that the changes to support LTO are still not
    upstream in binutils, but are on the way there. This patchset should
    conclude the x86-specific changes, and remaining patches to actually
    enable LTO will be fed through the Kbuild tree (other than keeping up
    with changes to the x86 code base, of course), although not
    necessarily in this merge window"

    * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    Kbuild, lto: Handle basic LTO in modpost
    Kbuild, lto: Disable LTO for asm-offsets.c
    Kbuild, lto: Add a gcc-ld script to let run gcc as ld
    Kbuild, lto: add ld-version and ld-ifversion macros
    Kbuild, lto: Drop .number postfixes in modpost
    Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
    lto: Disable LTO for sys_ni
    lto: Handle LTO common symbols in module loader
    lto, workaround: Add workaround for initcall reordering
    lto: Make asmlinkage __visible
    x86, lto: Disable LTO for the x86 VDSO
    initconst, x86: Fix initconst mistake in ts5500 code
    initconst: Fix initconst mistake in dcdbas
    asmlinkage: Make trace_hardirqs_on/off_caller visible
    asmlinkage, x86: Fix 32bit memcpy for LTO
    asmlinkage Make __stack_chk_failed and memcmp visible
    asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
    asmlinkage: Make main_extable_sort_needed visible
    asmlinkage, mutex: Mark __visible
    asmlinkage: Make trace_hardirq visible
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "Main changes:

    - Torture-test changes, including refactoring of rcutorture and
    introduction of a vestigial locktorture.

    - Real-time latency fixes.

    - Documentation updates.

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    rcu: Provide grace-period piggybacking API
    rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone
    rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
    notifier: Substitute rcu_access_pointer() for rcu_dereference_raw()
    Documentation/memory-barriers.txt: Clarify release/acquire ordering
    rcutorture: Save kvm.sh output to log
    rcutorture: Add a lock_busted to test the test
    rcutorture: Place kvm-test-1-run.sh output into res directory
    rcutorture: Rename TREE_RCU-Kconfig.txt
    locktorture: Add kvm-recheck.sh plug-in for locktorture
    rcutorture: Gracefully handle NULL cleanup hooks
    locktorture: Add vestigial locktorture configuration
    rcutorture: Introduce "rcu" directory level underneath configs
    rcutorture: Rename kvm-test-1-rcu.sh
    rcutorture: Remove RCU dependencies from ver_functions.sh API
    rcutorture: Create CFcommon file for common Kconfig parameters
    rcutorture: Create config files for scripted test-the-test testing
    rcutorture: Add an rcu_busted to test the test
    locktorture: Add a lock-torture kernel module
    rcutorture: Abstract kvm-recheck.sh
    ...

    Linus Torvalds
     

29 Mar, 2014

1 commit

  • Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
    nonblocking pool becomes initialized") has added a late reseed stage
    that happens as soon as the nonblocking pool is marked as initialized.

    This fails in the case that the nonblocking pool gets initialized
    during __prandom_reseed()'s call to get_random_bytes(). In that case
    we'd double back into __prandom_reseed() in an attempt to do a late
    reseed - deadlocking on 'lock' early on in the boot process.

    Instead, just avoid even waiting to do a reseed if a reseed is already
    occurring.
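
    The re-entry and the try-lock style of fix can be sketched in
    userspace as follows. All names here (reseed_sketch(),
    get_random_bytes_sketch(), the counters) are hypothetical, and a C11
    atomic_flag stands in for the kernel spinlock; this is a sketch under
    those assumptions, not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag reseed_lock = ATOMIC_FLAG_INIT;	/* stand-in for 'lock' */
static bool pool_initialized;	/* stand-in for the nonblocking pool state */
int reseeds_done;		/* reseeds that ran to completion */
int reseeds_skipped;		/* reseeds skipped because one was in progress */

void reseed_sketch(bool late);

/* Stand-in for get_random_bytes(): the first call marks the pool as
 * initialized, which triggers the late-reseed hook. */
static void get_random_bytes_sketch(void)
{
	if (!pool_initialized) {
		pool_initialized = true;
		reseed_sketch(true);	/* re-enters while the lock is held */
	}
}

void reseed_sketch(bool late)
{
	(void)late;	/* both the early and the late path use the try-lock */

	/* Try-lock: if a reseed is already in progress, return at once
	 * instead of waiting on a lock the caller itself may hold. */
	if (atomic_flag_test_and_set(&reseed_lock)) {
		reseeds_skipped++;
		return;
	}

	get_random_bytes_sketch();
	reseeds_done++;

	atomic_flag_clear(&reseed_lock);
}
```

    With a blocking lock in place of the try-lock, the nested call made
    from get_random_bytes_sketch() would never return.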

    Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
    Signed-off-by: Sasha Levin
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Sasha Levin
     

23 Mar, 2014

1 commit


20 Mar, 2014

1 commit

  • lib/audit.c provides a generic function for auditing system calls.
    This patch extends it for compat syscall support on bi-architectures
    (32/64-bit) by adding lib/compat_audit.c.
    What is required to support this feature are:
    * add asm/unistd32.h for compat system call names
    * select CONFIG_AUDIT_ARCH_COMPAT_GENERIC

    Signed-off-by: AKASHI Takahiro
    Acked-by: Richard Guy Briggs
    Signed-off-by: Eric Paris

    AKASHI Takahiro
     

04 Mar, 2014

2 commits

  • Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of

    BUG: sleeping function called from invalid context at kernel/fork.c:606
    in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
    1 lock held by swapoff/1394:
    #0: (rcu_read_lock){.+.+.+}, at: [] radix_tree_locate_item+0x1f/0x2b6

    followed by

    ================================================
    [ BUG: lock held when returning to user space! ]
    3.14.0-rc1 #3 Not tainted
    ------------------------------------------------
    swapoff/1394 is leaving the kernel with locks still held!
    1 lock held by swapoff/1394:
    #0: (rcu_read_lock){.+.+.+}, at: [] radix_tree_locate_item+0x1f/0x2b6

    after which the system recovered nicely.

    Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.

    Fixes e504f3fdd63d ("tmpfs radix_tree: locate_item to speed up swapoff")

    Signed-off-by: Hugh Dickins
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • While debug_dma_assert_idle() checks if a given *page* is actively
    undergoing dma the valid granularity of a dma mapping is a *cacheline*.
    Sander's testing shows that the warning message "DMA-API: exceeded 7
    overlapping mappings of pfn..." is falsely triggering. The test is
    simply mapping multiple cachelines in a given page.

    Ultimately we want overlap tracking to be valid as it is a real api
    violation, so we need to track active mappings by cachelines. Update
    the active dma tracking to use the page-frame-relative cacheline of the
    mapping as the key, and update debug_dma_assert_idle() to check for all
    possible mapped cachelines for a given page.

    However, the need to track active mappings is only relevant when the
    dma-mapping is writable by the device. In fact it is fairly standard
    for read-only mappings to have hundreds or thousands of overlapping
    mappings at once. Limiting the overlap tracking to writable
    (!DMA_TO_DEVICE) eliminates this class of false-positive overlap
    reports.
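
    A small sketch of the keying scheme described above, assuming 4 KiB
    pages and 64-byte cachelines. The macro, enum, and function names
    are hypothetical and chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages and 64-byte cachelines. */
#define PAGE_SHIFT_SKETCH		12
#define CACHELINE_SHIFT_SKETCH		6
#define CACHELINES_PER_PAGE_SHIFT	(PAGE_SHIFT_SKETCH - CACHELINE_SHIFT_SKETCH)

/* Hypothetical direction enum mirroring the idea of DMA directions. */
enum dma_dir_sketch { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE };

/* Key for the active-mapping tracker: the page frame number scaled to
 * cachelines, plus the index of the cacheline within the page. */
uint64_t cacheline_key(uint64_t pfn, uint32_t offset)
{
	return (pfn << CACHELINES_PER_PAGE_SHIFT) +
	       (offset >> CACHELINE_SHIFT_SKETCH);
}

/* Only mappings the device may write to need overlap tracking. */
bool needs_overlap_tracking(enum dma_dir_sketch dir)
{
	return dir != DIR_TO_DEVICE;
}
```

    Two mappings at offsets 0 and 64 of the same page get different
    keys under this scheme, so they no longer count as overlapping.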

    Note, the radix gang lookup is sub-optimal. It would be best if it
    stopped fetching entries once the search passed a page boundary.
    Nevertheless, this implementation does not perturb the original net_dma
    failing case. That is to say the extra overhead does not show up in
    terms of making the failing case pass due to a timing change.

    References:
    http://marc.info/?l=linux-netdev&m=139232263419315&w=2
    http://marc.info/?l=linux-netdev&m=139217088107122&w=2

    Signed-off-by: Dan Williams
    Reported-by: Sander Eikelenboom
    Reported-by: Dave Jones
    Tested-by: Dave Jones
    Tested-by: Sander Eikelenboom
    Cc: Konrad Rzeszutek Wilk
    Cc: Francois Romieu
    Cc: Eric Dumazet
    Cc: Wei Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

27 Feb, 2014

1 commit

  • Sometimes we have a struct resource where we know the type (MEM/IO/etc.)
    and the size, but we haven't assigned address space for it. The
    IORESOURCE_UNSET flag is a way to indicate this situation. For these
    "unset" resources, the start address is meaningless, so print only the
    size, e.g.,

    - pci 0000:0c:00.0: reg 184: [mem 0x00000000-0x00001fff 64bit]
    + pci 0000:0c:00.0: reg 184: [mem size 0x2000 64bit]

    For %pr (printing with raw flags), we still print the address range,
    because %pr is mostly used for debugging anyway.
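
    The formatting rule can be sketched in userspace like this. The
    struct res_sketch type, its "unset" field, and format_mem_res() are
    hypothetical simplifications of struct resource, the IORESOURCE_UNSET
    flag, and the %pR printing code, and the "64bit" suffix is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified resource: an inclusive address range plus a
 * flag standing in for IORESOURCE_UNSET. */
struct res_sketch {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
	bool unset;
};

/* Size of an inclusive range, as resource_size() computes it. */
unsigned long long res_size(const struct res_sketch *r)
{
	return r->end - r->start + 1;
}

/* Print only the size for unset resources, else the address range. */
int format_mem_res(char *buf, size_t len, const struct res_sketch *r)
{
	if (r->unset)
		return snprintf(buf, len, "[mem size 0x%llx]", res_size(r));
	return snprintf(buf, len, "[mem 0x%08llx-0x%08llx]", r->start, r->end);
}
```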

    Thanks to Fengguang Wu for suggesting
    resource_size().

    Signed-off-by: Bjorn Helgaas

    Bjorn Helgaas
     

24 Feb, 2014

2 commits


19 Feb, 2014

1 commit


17 Feb, 2014

1 commit


15 Feb, 2014

1 commit

  • Pull block IO fixes from Jens Axboe:
    "Second round of updates and fixes for 3.14-rc2. Most of this stuff
    has been queued up for a while. The notable exception is the blk-mq
    changes, which are naturally a bit more in flux still.

    The pull request contains:

    - Two bug fixes for the new immutable vecs, causing crashes with raid
    or swap. From Kent.

    - Various blk-mq tweaks and fixes from Christoph. A fix for
    integrity bio's from Nic.

    - A few bcache fixes from Kent and Darrick Wong.

    - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
    Swenson, and Roger Pau Monne.

    - Fix for a vec miscount with integrity vectors from Martin.

    - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

    - Tweak to null_blk to do more normal FIFO processing of requests
    from Shlomo Pongratz.

    - Elevator switching bypass fix from Tejun.

    - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
    me"

    * 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
    block: add cond_resched() to potentially long running ioctl discard loop
    xen-blkback: init persistent_purge_work work_struct
    blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
    blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
    block: Fix cloning of discard/write same bios
    block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
    blk-mq: rework flush sequencing logic
    null_blk: use blk_complete_request and blk_mq_complete_request
    virtio_blk: use blk_mq_complete_request
    blk-mq: rework I/O completions
    fs: Add prototype declaration to appropriate header file include/linux/bio.h
    fs: Mark function as static in fs/bio-integrity.c
    block/null_blk: Fix completion processing from LIFO to FIFO
    block: Explicitly handle discard/write same segments
    block: Fix nr_vecs for inline integrity vectors
    blk-mq: Add bio_integrity setup to blk_mq_make_request
    blk-mq: initialize sg_reserved_size
    blk-mq: handle dma_drain_size
    blk-mq: divert __blk_put_request for MQ ops
    blk-mq: support at_head inserations for blk_execute_rq
    ...

    Linus Torvalds
     

14 Feb, 2014

1 commit

  • In LTO, symbols implicitly referenced by the compiler need
    to be visible. Earlier these symbols were visible implicitly
    from being exported, but we disabled implicit visibility for
    EXPORTs when modules are disabled to improve code size. So
    now these symbols have to be marked visible explicitly.

    Do this for __stack_chk_fail (with stack protector)
    and memcmp.
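
    As a rough illustration of explicit visibility, the sketch below
    marks a function with GCC's externally_visible attribute, so that
    whole-program optimization keeps the symbol even when no C caller is
    seen. VISIBLE_SKETCH and memcmp_sketch() are hypothetical names, and
    the macro expands to nothing on compilers assumed not to support the
    attribute.

```c
/* Hypothetical macro: GCC's externally_visible attribute where it is
 * assumed to be available, nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define VISIBLE_SKETCH __attribute__((externally_visible))
#else
#define VISIBLE_SKETCH
#endif

/* A byte-wise compare that the compiler might reference implicitly,
 * so it is marked visible explicitly. */
VISIBLE_SKETCH int memcmp_sketch(const void *a, const void *b, unsigned long n)
{
	const unsigned char *p = a;
	const unsigned char *q = b;

	for (; n; n--, p++, q++) {
		if (*p != *q)
			return *p - *q;
	}
	return 0;
}
```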

    Signed-off-by: Andi Kleen
    Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
    Signed-off-by: H. Peter Anvin

    Andi Kleen