13 Sep, 2013

4 commits

  • Pull generic hardirq option removal from Martin Schwidefsky:
    "All architectures now use generic hardirqs, s390 has been last to
    switch.

    With that the code under !CONFIG_GENERIC_HARDIRQS and the related
    HAVE_GENERIC_HARDIRQS and GENERIC_HARDIRQS config options can be
    removed. Yay!"

    * 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    Remove GENERIC_HARDIRQ config option

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a 7+ year race condition in the crypto API that causes
    sporadic crashes when multiple threads load the same algorithm.

    It also fixes the crct10dif algorithm again to prevent boot failures
    on systems where the initramfs tool ignores module softdeps"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: crct10dif - Add fallback for broken initrds
    crypto: api - Fix race condition in larval lookup

    Linus Torvalds
     
  • After the last architecture switched to generic hard irqs the config
    options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
    for !CONFIG_GENERIC_HARDIRQS can be removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "Lots of activity again this round for I/O performance optimizations
    (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
    addition of new fabric independent features to target-core
    (COMPARE_AND_WRITE + EXTENDED_COPY).

    The main highlights include:

    - Support for iscsi-target login multiplexing across individual
    network portals
    - Generic Per-cpu IDA logic (kent + akpm + clameter)
    - Conversion of vhost to use per-cpu IDA pre-allocation for
    descriptors, SGLs and userspace page pointer list
    - Conversion of iscsi-target + iser-target to use per-cpu IDA
    pre-allocation for descriptors
    - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
    emulation for virtual backend drivers
    - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
    virtual backend drivers.
    - Add support for fast memory registration mode to iser-target (Vu)

    The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
    particular significance, as they make us the first and only open source
    target to support the full set of VAAI primitives.

    Currently Linux clients are lacking upstream support to actually
    utilize these primitives. However, with server side support now in
    place for folks like MKP + ZAB working on the client, this logic, once
    reserved for the highest end of storage arrays, can now be run in VMs
    on their laptops"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
    target/iscsi: Bump versions to v4.1.0
    target: Update copyright ownership/year information to 2013
    iscsi-target: Bump default TCP listen backlog to 256
    target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
    iscsi-target; Bump default CmdSN Depth to 64
    iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
    iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
    iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
    target: remove unused including
    iser-target: introduce fast memory registration mode (FRWR)
    iser-target: generalize rdma memory registration and cleanup
    iser-target: move rdma wr processing to a shared function
    target: Enable global EXTENDED_COPY setup/release
    target: Add Third Party Copy (3PC) bit in INQUIRY response
    target: Enable EXTENDED_COPY setup in spc_parse_cdb
    target: Add support for EXTENDED_COPY copy offload emulation
    target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
    target: Add global device list for EXTENDED_COPY
    target: Make helpers non static for EXTENDED_COPY command setup
    target: Make spc_parse_naa_6h_vendor_specific non static
    ...

    Linus Torvalds
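
The COMPARE_AND_WRITE (AtomicTestandSet) semantics mentioned in the pull message above can be sketched roughly as follows. This is a minimal userspace illustration, not the target-core code: the function name, the in-memory byte buffer standing in for backend blocks, and the lack of any locking are all assumptions made for the example. A real target has to make the compare and the write atomic with respect to other commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: 'blocks' stands in for the backend storage,
 * 'verify' is the data the initiator expects to find there, and
 * 'data' is what should be written if the expectation holds.
 * No locking is modelled here.
 */
static bool compare_and_write(uint8_t *blocks, const uint8_t *verify,
                              const uint8_t *data, size_t len)
{
    assert(blocks && verify && data);

    if (memcmp(blocks, verify, len) != 0)
        return false;           /* miscompare: leave the blocks untouched */

    memcpy(blocks, data, len);  /* match: install the new contents */
    return true;
}
```

A second call with the same expected data fails once the first has succeeded, which is what makes the primitive usable as a test-and-set style lock on shared storage.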
     

12 Sep, 2013

11 commits

  • Unfortunately, even with a softdep some distros fail to include
    the necessary modules in the initrd. Therefore this patch adds
    a fallback path to restore existing behaviour where we cannot
    load the new crypto crct10dif algorithm.

    In order to do this, the underlying crct10dif has been split out
    from the crypto implementation so that it can be used on the
    fallback path.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • LZ4 compression and decompression functions require input/output
    parameters of different signedness: unsigned char for compression and
    signed char for decompression.

    Change decompression API to require "(const) unsigned char *".

    Signed-off-by: Sergey Senozhatsky
    Cc: Kyungsik Lee
    Cc: Geert Uytterhoeven
    Cc: Yann Collet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
    one such possible user), the following race can happen:

    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];

    ...
    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];

    And we give out one radix tree node twice. That clearly results in radix
    tree corruption with different results (usually OOPS) depending on which
    two users of radix tree race.

    We fix the problem by making radix_tree_node_alloc() always allocate fresh
    radix tree nodes when in interrupt. Using preloading when in interrupt
    doesn't make sense since all the allocations have to be atomic anyway and
    we cannot steal nodes from process-context users because some users rely
    on radix_tree_insert() succeeding after radix_tree_preload().
    in_interrupt() check is somewhat ugly but we cannot simply key off passed
    gfp_mask as that is acquired from root_gfp_mask() and thus the same for
    all preload users.

    Another part of the fix is to avoid node preallocation in
    radix_tree_preload() when passed gfp_mask doesn't allow waiting. Again,
    preallocation in such case doesn't make sense and when preallocation would
    happen in interrupt we could possibly leak some allocated nodes. However,
    some users of radix_tree_preload() require following radix_tree_insert()
    to succeed. To avoid unexpected effects for these users,
    radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
    and we provide a new function radix_tree_maybe_preload() for those users
    which get different gfp mask from different call sites and which are
    prepared to handle radix_tree_insert() failure.

    Signed-off-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
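
The allocation policy described in the commit message above (never hand out preloaded nodes when running in interrupt context) can be modelled in userspace roughly like this. It is only a sketch under stated assumptions: struct preload, node_alloc(), preload_fill() and the in_irq flag are hypothetical stand-ins, malloc() replaces the kernel's slab allocation, and there is no real per-CPU data or interrupt here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PRELOAD_SIZE 4

/* Hypothetical stand-in for a per-CPU pool of preallocated nodes. */
struct preload {
    int nr;
    void *nodes[PRELOAD_SIZE];
};

/* Fill the pool from process context; returns 0 on success, -1 on failure. */
static int preload_fill(struct preload *rtp, size_t node_size)
{
    while (rtp->nr < PRELOAD_SIZE) {
        void *n = malloc(node_size);

        if (!n)
            return -1;
        rtp->nodes[rtp->nr++] = n;
    }
    return 0;
}

/*
 * Allocate one node. When 'in_irq' is true the pool is never touched,
 * so a process-context caller that was interrupted halfway through
 * taking a node cannot see the same slot handed out twice.
 */
static void *node_alloc(struct preload *rtp, bool in_irq, size_t node_size)
{
    if (!in_irq && rtp->nr > 0) {
        void *ret = rtp->nodes[rtp->nr - 1];

        rtp->nodes[rtp->nr - 1] = NULL;
        rtp->nr--;
        return ret;
    }
    return malloc(node_size);   /* always a fresh node in "interrupt" */
}
```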
     
  • No reason to require the rbtree test code to be a module; allow it to be
    built in (streamlines my development process).

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Just check that we examine all nodes in the tree for the postorder
    iteration.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Postorder iteration yields all of a node's children prior to yielding the
    node itself, and this particular implementation also avoids examining the
    leaf links in a node after that node has been yielded.

    In what I expect will be its most common usage, postorder iteration allows
    the deletion of every node in an rbtree without modifying the rbtree nodes
    (no _requirement_ that they be nulled) while avoiding referencing child
    nodes after they have been "deleted" (most commonly, freed).

    I have only updated zswap to use this functionality at this point, but
    numerous bits of code (most notably in the filesystem drivers) use a hand
    rolled postorder iteration that NULLs child links as it traverses the
    tree. Each of those instances could be replaced with this common
    implementation.

    1 & 2 add rbtree postorder iteration functions.
    3 adds testing of the iteration to the rbtree runtime tests
    4 allows building the rbtree runtime tests as builtins
    5 updates zswap.

    This patch:

    Add postorder iteration functions for rbtree. These are useful for safely
    freeing an entire rbtree without modifying the tree at all.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
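
The postorder walk described above can be illustrated with a plain binary tree in userspace. This is a sketch, not the kernel's rbtree code: struct node, postorder_first(), postorder_next(), free_tree() and new_node() are hypothetical names, and the tree carries an explicit parent pointer instead of the packed parent/colour word of struct rb_node. The point it tries to show is that every node can be freed without clearing any link, as long as the next node is fetched before the current one is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical binary tree node with a parent link. */
struct node {
    struct node *parent, *left, *right;
    int key;
};

/* Descend as deep as possible, preferring left links over right ones. */
static struct node *left_deepest(struct node *n)
{
    for (;;) {
        if (n->left)
            n = n->left;
        else if (n->right)
            n = n->right;
        else
            return n;
    }
}

static struct node *postorder_first(struct node *root)
{
    return root ? left_deepest(root) : NULL;
}

/* Reads only the parent's links; the children of 'n' are never followed. */
static struct node *postorder_next(struct node *n)
{
    struct node *p = n->parent;

    if (p && n == p->left && p->right)
        return left_deepest(p->right);  /* right subtree still to visit */
    return p;                           /* both subtrees done: parent next */
}

/* Free every node without modifying the tree; returns the number freed. */
static int free_tree(struct node *root)
{
    int freed = 0;
    struct node *n = postorder_first(root);

    while (n) {
        struct node *next = postorder_next(n);  /* fetch before freeing */

        free(n);
        freed++;
        n = next;
    }
    return freed;
}

/* Small helper for building example trees. */
static struct node *new_node(struct node *parent, int key)
{
    struct node *n = calloc(1, sizeof(*n));

    assert(n);
    n->parent = parent;
    n->key = key;
    return n;
}
```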
     
  • When decompressing into memory, the output buffer length is set to some
    arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
    unlimited in size.

    The problem with this is that some platforms have their physical memory at
    high physical addresses (0x80000000 or more), and that the output buffer
    address and its "unlimited" length cannot be added without overflowing.
    An example of this can be found in inflate_fast():

    /* next_out is the output buffer address */
    out = strm->next_out - OFF;
    /* avail_out is the output buffer size. end will overflow if the output
     * address is >= 0x80000104 */
    end = out + (strm->avail_out - 257);

    This has huge consequences on the performance of kernel decompression,
    since the following loop condition of inflate_fast() will always be false:

    } while (in < last && out < end);

    Indeed, "end" has overflowed and is now always lower than "out". As a
    result, inflate_fast() will return after processing one single byte of
    input data, and will thus need to be called an unreasonably high number of
    times. This probably went unnoticed because kernel decompression is fast
    enough even with this issue.

    Nonetheless, adjusting the output buffer length in such a way that the
    above pointer arithmetic never overflows results in a kernel decompression
    that is about 3 times faster on affected machines.

    Signed-off-by: Alexandre Courbot
    Tested-by: Jon Medhurst
    Cc: Stephen Warren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Courbot
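
The wraparound described above can be reproduced with 32-bit unsigned arithmetic. The sketch below models addresses as uint32_t (unsigned wraparound is well defined in C, unlike out-of-range pointer arithmetic). naive_end() mirrors the computation quoted from inflate_fast(); clamp_avail_out() is a hypothetical helper showing one way to pick a length that cannot wrap, and is not claimed to be the exact adjustment made by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FAST_MARGIN 257u    /* margin subtracted in the quoted snippet */

/* End of the output window as computed in the snippet; may wrap around. */
static uint32_t naive_end(uint32_t out, uint32_t avail_out)
{
    assert(avail_out >= FAST_MARGIN);
    return out + (avail_out - FAST_MARGIN);
}

/*
 * Hypothetical helper: limit the "unlimited" length so that
 * out + avail_out still fits in a 32-bit address.
 */
static uint32_t clamp_avail_out(uint32_t out, uint32_t avail_out)
{
    uint32_t room = UINT32_MAX - out;   /* bytes left before wrapping */

    return avail_out < room ? avail_out : room;
}
```

With out = 0x80000104 and avail_out = 0x7fffffff, naive_end() wraps to a value below out, so a check such as "out < end" fails at once. After clamping, the computed end stays above out.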
     
  • [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Gu Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gu Zheng
     
  • The documentation mentions a "name" parameter, which does not exist. This
    commit removes such mention from the function documentation.

    Signed-off-by: Emilio López
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Emilio López
     
  • Use the helper function instead of __GFP_ZERO.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • In struct gen_pool_chunk, end_addr means the end address of the memory
    chunk (inclusive), but the implementation treats it as address + size of
    the memory chunk (exclusive), so it points one past the correct ending
    address.

    Storing the ending address plus one overflows for a memory chunk that
    includes the last address of the memory map, e.g. when the starting
    address is 0xFFF00000 and the size is 0x100000 on a 32-bit machine, the
    ending address would be 0x100000000.

    Use the correct ending address, i.e. starting address + size - 1.

    [akpm@linux-foundation.org: add comment to struct gen_pool_chunk:end_addr]
    Signed-off-by: Joonyoung Shim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonyoung Shim
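
The difference between an exclusive and an inclusive end address can be checked with the numbers from the commit message. The sketch below uses a hypothetical struct chunk with 32-bit addresses (it is not struct gen_pool_chunk) and stores the last valid address, so a chunk that reaches the top of the address space is still representable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory chunk on a 32-bit machine. */
struct chunk {
    uint32_t start_addr;    /* first address of the chunk */
    uint32_t end_addr;      /* last address of the chunk (inclusive) */
};

static struct chunk chunk_init(uint32_t start, uint32_t size)
{
    /* The chunk must be non-empty and must not run past UINT32_MAX. */
    assert(size > 0 && size - 1 <= UINT32_MAX - start);

    struct chunk c = { start, start + size - 1 };
    return c;
}

static uint32_t chunk_size(const struct chunk *c)
{
    return c->end_addr - c->start_addr + 1;
}

static int chunk_contains(const struct chunk *c, uint32_t addr)
{
    return addr >= c->start_addr && addr <= c->end_addr;
}
```

For start 0xFFF00000 and size 0x100000, the exclusive end (start + size) wraps to 0 in 32 bits, while the inclusive end is 0xFFFFFFFF and range checks keep working.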
     

11 Sep, 2013

2 commits

  • Pull device-mapper updates from Mike Snitzer:
    "Add the ability to collect I/O statistics on user-defined regions of a
    device-mapper device. This dm-stats code required the reintroduction
    of a div64_u64_rem() helper, but as a separate method that doesn't
    slow down div64_u64() -- especially on 32-bit systems.

    Allow the error target to replace request-based DM devices (e.g.
    multipath) in addition to bio-based DM devices.

    Various other small code fixes and improvements to thin-provisioning,
    DM cache and the DM ioctl interface"

    * tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm stripe: silence a couple sparse warnings
    dm: add statistics support
    dm thin: always return -ENOSPC if no_free_space is set
    dm ioctl: cleanup error handling in table_load
    dm ioctl: increase granularity of type_lock when loading table
    dm ioctl: prevent rename to empty name or uuid
    dm thin: set pool read-only if breaking_sharing fails block allocation
    dm thin: prefix pool error messages with pool device name
    dm: allow error target to replace bio-based and request-based targets
    math64: New separate div64_u64_rem helper
    dm space map: optimise sm_ll_dec and sm_ll_inc
    dm btree: prefetch child nodes when walking tree for a dm_btree_del
    dm btree: use pop_frame in dm_btree_del to cleanup code
    dm cache: eliminate holes in cache structure
    dm cache: fix stacking of geometry limits
    dm thin: fix stacking of geometry limits
    dm thin: add data block size limits to Documentation
    dm cache: add data block size limits to code and Documentation
    dm cache: document metadata device is exclussive to a cache
    dm: stop using WQ_NON_REENTRANT

    Linus Torvalds
     
  • Pull md update from Neil Brown:
    "Headline item is multithreading for RAID5 so that more IO/sec can be
    supported on fast (SSD) devices. Also TILE-Gx SIMD support for RAID6
    calculations and an assortment of bug fixes"

    * tag 'md/3.12' of git://neil.brown.name/md:
    raid5: only wakeup necessary threads
    md/raid5: flush out all pending requests before proceeding with reshape.
    md/raid5: use seqcount to protect access to shape in make_request.
    raid5: sysfs entry to control worker thread number
    raid5: offload stripe handle to workqueue
    raid5: fix stripe release order
    raid5: make release_stripe lockless
    md: avoid deadlock when dirty buffers during md_stop.
    md: Don't test all of mddev->flags at once.
    md: Fix apparent cut-and-paste error in super_90_validate
    raid6/test: replace echo -e with printf
    RAID: add tilegx SIMD implementation of raid6
    md: fix safe_mode buglet.
    md: don't call md_allow_write in get_bitmap_file.

    Linus Torvalds
     

10 Sep, 2013

2 commits

  • Percpu frontend for allocating ids. With percpu allocation (that works),
    it's impossible to guarantee it will always be possible to allocate all
    nr_tags - typically, some will be stuck on a remote percpu freelist
    where the current job can't get to them.

    We do guarantee that it will always be possible to allocate at least
    (nr_tags / 2) tags - this is done by keeping track of which and how many
    cpus have tags on their percpu freelists. On allocation failure if
    enough cpus have tags that there could potentially be (nr_tags / 2) tags
    stuck on remote percpu freelists, we then pick a remote cpu at random to
    steal from.

    Note that there's no cpu hotplug notifier - we don't care, because
    steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
    more allocations if we had a notifier - but we'll still meet our
    guarantees and it's absolutely not a correctness issue, so I don't think
    it's worth the extra code.

    From akpm:

    "It looks OK to me (that's as close as I get to an ack :))"

    v6 changes:
    - Add #include to include/linux/percpu_ida.h to
    make alpha/arc builds happy (Fengguang)
    - Move second (cpu >= nr_cpu_ids) check inside of first check scope
    in steal_tags() (akpm + nab)

    v5 changes:
    - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
    - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
    - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
    cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
    - Add comment for alloc_global_tags() (kmo + akpm)
    - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
    - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
    - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
    (kmo + akpm)
    - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
    (kmo + akpm)
    - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
    - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)

    v4 changes:

    - Fix tags.c reference in percpu_ida_init (akpm)

    Signed-off-by: Kent Overstreet
    Cc: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: "Nicholas A. Bellinger"
    Signed-off-by: Nicholas Bellinger

    Kent Overstreet
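
The idea of per-CPU freelists with stealing can be sketched as a single-threaded model. Everything here is an assumption made for illustration: the names tag_pool, tag_alloc() and tag_free(), the fixed NR_CPUS and NR_TAGS, and the CPU passed as an explicit argument. The model has no locking, moves one tag at a time, and always steals by scanning remote CPUs in order, whereas the commit describes stealing from a randomly chosen CPU and only once enough tags may be stranded.

```c
#include <assert.h>

#define NR_CPUS 4
#define NR_TAGS 8

/* Hypothetical per-CPU freelist of tags. */
struct cpu_tags {
    int nr_free;
    int freelist[NR_TAGS];
};

/* Hypothetical pool: one global freelist plus one freelist per CPU. */
struct tag_pool {
    int nr_global;
    int global[NR_TAGS];
    struct cpu_tags cpu[NR_CPUS];
};

static void pool_init(struct tag_pool *p)
{
    p->nr_global = 0;
    for (int t = 0; t < NR_TAGS; t++)
        p->global[p->nr_global++] = t;
    for (int c = 0; c < NR_CPUS; c++)
        p->cpu[c].nr_free = 0;
}

/* Freed tags go onto the freeing CPU's local list. */
static void tag_free(struct tag_pool *p, int cpu, int tag)
{
    struct cpu_tags *ct = &p->cpu[cpu];

    assert(ct->nr_free < NR_TAGS);
    ct->freelist[ct->nr_free++] = tag;
}

/* Returns a tag, or -1 if every freelist is empty. */
static int tag_alloc(struct tag_pool *p, int cpu)
{
    struct cpu_tags *ct = &p->cpu[cpu];

    if (ct->nr_free > 0)                /* fast path: local freelist */
        return ct->freelist[--ct->nr_free];
    if (p->nr_global > 0)               /* then the global freelist */
        return p->global[--p->nr_global];

    /* Last resort: steal a tag stranded on a remote CPU's freelist. */
    for (int i = 1; i < NR_CPUS; i++) {
        struct cpu_tags *remote = &p->cpu[(cpu + i) % NR_CPUS];

        if (remote->nr_free > 0)
            return remote->freelist[--remote->nr_free];
    }
    return -1;
}
```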
     
  • Pull ARC changes from Vineet Gupta:

    - ARC MM changes:
    - preparation for MMUv4 (accommodate new PTE bits, new cmds)
    - Rework the ASID allocation algorithm to remove asid-mm reverse map
    - Boilerplate code consolidation in Exception Handlers
    - Disable FRAME_POINTER for ARC
    - Unaligned Access Emulation for Big-Endian from Noam
    - Bunch of fixes (udelay, missing accessors) from Mischa

    * tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: fix new Section mismatches in build (post __cpuinit cleanup)
    Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC
    ARC: Fix __udelay calculation
    ARC: remove console_verbose() from setup_arch()
    ARC: Add read*_relaxed to asm/io.h
    ARC: Handle un-aligned user space access in BE.
    ARC: [ASID] Track ASID allocation cycles/generations
    ARC: [ASID] activate_mm() == switch_mm()
    ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID
    ARC: [ASID] Refactor the TLB paranoid debug code
    ARC: [ASID] Remove legacy/unused debug code
    ARC: No need to flush the TLB in early boot
    ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete
    ARC: MMUv4 preps/2 - Reshuffle PTE bits
    ARC: MMUv4 preps/1 - Fold PTE K/U access flags
    ARC: Code cosmetics (Nothing semantical)
    ARC: Entry Handler tweaks: Optimize away redundant IRQ_DISABLE_SAVE
    ARC: Exception Handlers Code consolidation
    ARC: Add some .gitignore entries

    Linus Torvalds
     

08 Sep, 2013

4 commits

  • The only actual current lockref user (dcache) uses zero reference counts
    even for perfectly live dentries, because it's a cache: there may not be
    any users, but that doesn't mean that we want to throw away the dentry.

    At the same time, the dentry cache does have a notion of a truly "dead"
    dentry that we must not even increment the reference count of, because
    we have pruned it and it is not valid.

    Currently that distinction is not visible in the lockref itself, and the
    dentry cache validation uses "lockref_get_or_lock()" to either get a new
    reference to a dentry that already had existing references (and thus
    cannot be dead), or get the dentry lock so that we can then verify the
    dentry and increment the reference count under the lock if that
    verification was successful.

    That's all somewhat complicated.

    This adds the concept of being "dead" to the lockref itself, by simply
    using a count that is negative. This allows a usage scenario where we
    can increment the refcount of a dentry without having to validate it,
    and pushing the special "we killed it" case into the lockref code.

    The dentry code itself doesn't actually use this yet, and it's probably
    too late in the merge window to do that code (the dentry_kill() code
    with its "should I decrement the count" logic really is pretty complex
    code), but let's introduce the concept at the lockref level now.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
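
The "negative count means dead" idea can be sketched with C11 atomics. This is a count-only model under stated assumptions: struct ref, ref_mark_dead(), ref_get_not_dead() and the REF_DEAD value are hypothetical, and the spinlock that a lockref pairs with its count is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_DEAD (-128)     /* illustrative; any negative count means dead */

/* Hypothetical count-only model of a reference with a "dead" state. */
struct ref {
    atomic_int count;       /* >= 0: live (0 is fine for a cached object) */
};

static void ref_init(struct ref *r, int count)
{
    atomic_init(&r->count, count);
}

static void ref_mark_dead(struct ref *r)
{
    atomic_store(&r->count, REF_DEAD);
}

/* Take a reference unless the object is dead; no separate validation. */
static bool ref_get_not_dead(struct ref *r)
{
    int old = atomic_load(&r->count);

    while (old >= 0) {
        /* On failure the CAS reloads 'old' and the loop rechecks it. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}
```

A zero count still allows a new reference, which matches the cache use case described above, while a dead object refuses every increment.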
     
  • The code got rewritten, but the comments got copied as-is from older
    versions, and as a result the argument name in the comment didn't
    actually match the code any more.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc//ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     
  • Pull crypto update from Herbert Xu:
    "Here is the crypto update for 3.12:

    - Added MODULE_SOFTDEP to allow pre-loading of modules.
    - Reinstated crct10dif driver using the module softdep feature.
    - Allow via rng driver to be auto-loaded.

    - Split large input data when necessary in nx.
    - Handle zero length messages correctly for GCM/XCBC in nx.
    - Handle SHA-2 chunks bigger than block size properly in nx.

    - Handle unaligned lengths in omap-aes.
    - Added SHA384/SHA512 to omap-sham.
    - Added OMAP5/AM43XX SHAM support.
    - Added OMAP4 TRNG support.

    - Misc fixes"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits)
    Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
    hwrng: via - Add MODULE_DEVICE_TABLE
    crypto: fcrypt - Fix bitoperation for compilation with clang
    crypto: nx - fix SHA-2 for chunks bigger than block size
    crypto: nx - fix GCM for zero length messages
    crypto: nx - fix XCBC for zero length messages
    crypto: nx - fix limits to sg lists for AES-CCM
    crypto: nx - fix limits to sg lists for AES-XCBC
    crypto: nx - fix limits to sg lists for AES-GCM
    crypto: nx - fix limits to sg lists for AES-CTR
    crypto: nx - fix limits to sg lists for AES-CBC
    crypto: nx - fix limits to sg lists for AES-ECB
    crypto: nx - add offset to nx_build_sg_lists()
    padata - Register hotcpu notifier after initialization
    padata - share code between CPU_ONLINE and CPU_DOWN_FAILED, same to CPU_DOWN_PREPARE and CPU_UP_CANCELED
    hwrng: omap - reorder OMAP TRNG driver code
    crypto: omap-sham - correct dma burst size
    crypto: omap-sham - Enable Polling mode if DMA fails
    crypto: tegra-aes - bitwise vs logical and
    crypto: sahara - checking the wrong variable
    ...

    Linus Torvalds
     

07 Sep, 2013

1 commit


06 Sep, 2013

1 commit

  • Pull ARM updates from Russell King:
    "This set includes adding support for Neon acceleration of RAID6 XOR
    code from Ard Biesheuvel, cache flushing and barrier updates from Will
    Deacon, and a cleanup to the ARM debug code which reduces the amount
    of code by about 500 lines.

    A few other cleanups, such as constifying the machine descriptors
    which already shouldn't be written to, cleaning up the printing of the
    L2 cache size"

    * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (55 commits)
    ARM: 7826/1: debug: support debug ll on hisilicon soc
    ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo
    ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables
    ARM: 7828/1: ARMv7-M: implement restart routine common to all v7-M machines
    ARM: 7827/1: highbank: fix debug uart virtual address for LPAE
    ARM: 7823/1: errata: workaround Cortex-A15 erratum 773022
    ARM: 7806/1: allow DEBUG_UNCOMPRESS for Tegra
    ARM: 7793/1: debug: use generic option for ep93xx PL10x debug port
    ARM: debug: move SPEAr debug to generic PL01x code
    ARM: debug: move davinci debug to generic 8250 code
    ARM: debug: move keystone debug to generic 8250 code
    ARM: debug: remove DEBUG_ROCKCHIP_UART
    ARM: debug: provide generic option choices for 8250 and PL01x ports
    ARM: debug: move PL01X debug include into arch/arm/include/debug/
    ARM: debug: provide PL01x debug uart phys/virt address configuration options
    ARM: debug: add support for word accesses to debug/8250.S
    ARM: debug: move 8250 debug include into arch/arm/include/debug/
    ARM: debug: provide 8250 debug uart phys/virt address configuration options
    ARM: debug: provide 8250 debug uart register shift configuration option
    ARM: debug: provide 8250 debug uart flow control configuration option
    ...

    Linus Torvalds
     

05 Sep, 2013

4 commits

  • Pull vfs pile 1 from Al Viro:
    "Unfortunately, this merge window it'll have to be a lot of small piles -
    my fault, actually, for not keeping #for-next in anything that would
    resemble a sane shape ;-/

    This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
    cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
    components) + several long-standing patches from various folks.

    There definitely will be a lot more (starting with Miklos'
    check_submount_and_drop() series)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    direct-io: Handle O_(D)SYNC AIO
    direct-io: Implement generic deferred AIO completions
    add formats for dentry/file pathnames
    kvm eventfd: switch to fdget
    powerpc kvm: use fdget
    switch fchmod() to fdget
    switch epoll_ctl() to fdget
    switch copy_module_from_fd() to fdget
    git simplify nilfs check for busy subtree
    ibmasmfs: don't bother passing superblock when not needed
    don't pass superblock to hypfs_{mkdir,create*}
    don't pass superblock to hypfs_diag_create_files
    don't pass superblock to hypfs_vm_create_files()
    oprofile: get rid of pointless forward declarations of struct super_block
    oprofilefs_create_...() do not need superblock argument
    oprofilefs_mkdir() doesn't need superblock argument
    don't bother with passing superblock to oprofile_create_stats_files()
    oprofile: don't bother with passing superblock to ->create_files()
    don't bother passing sb to oprofile_create_files()
    coh901318: don't open-code simple_read_from_buffer()
    ...

    Linus Torvalds
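
The "up to 4 last components" behaviour mentioned for the new pathname formats can be approximated on a plain string. This is only an illustration: last_components() is a hypothetical helper working on a C string without a trailing slash, whereas the kernel formats operate on dentries rather than on path strings.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer into 'path' at the start of
 * its last 'n' components. If the path has fewer components, the
 * whole string is returned. Assumes no trailing '/'.
 */
static const char *last_components(const char *path, int n)
{
    const char *p = path + strlen(path);

    assert(n >= 1);
    while (p > path) {
        /* Stop just after the n-th separator counted from the end. */
        if (p[-1] == '/' && --n == 0)
            break;
        p--;
    }
    return p;
}
```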
     
  • Russell King
     
  • Frame pointer on ARC doesn't serve the conventional purpose of stack
    unwinding due to the typical way the ABI designates its usage.
    Thus its explicit usage on ARC is discouraged (gcc is free to use it
    for some tricky stack frames even with -fomit-frame-pointer).

    Hence no point enabling it for ARC.

    References: http://www.spinics.net/lists/kernel/msg1593937.html
    Signed-off-by: Vineet Gupta
    Cc: Dave Hansen
    Cc: Andrew Morton
    Cc: "Paul E. McKenney"
    Cc: Catalin Marinas
    Cc: Michel Lespinasse
    Cc: linux-kernel@vger.kernel.org

    Vineet Gupta
     
  • Pull Xen updates from Konrad Rzeszutek Wilk:
    "A couple of features and a ton of bug-fixes. There is also some
    maintainership changes. Jeremy is enjoying the full-time work at the
    startup and as much as he would love to help - he can't find the time.
    I have a bunch of other things that I promised to work on - paravirt
    diet, get SWIOTLB working everywhere, etc, but haven't been able to
    find the time.

    As such both David Vrabel and Boris Ostrovsky have graciously
    volunteered to help with the maintainership role. They will keep the lid
    on regressions, bug-fixes, etc. I will be in the background to help -
    but eventually there will be less of me doing the Xen GIT pulls and
    more of them. Stefano is still doing the ARM/ARM64 and will continue
    on doing so.

    Features:
    - Xen Trusted Platform Module (TPM) frontend driver - with the
    backend in MiniOS.
    - Scalability improvements in event channel.
    - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy)

    Bug-fixes:
    - Make the 1:1 mapping work during early bootup on selective regions.
    - Add scratch page to balloon driver to deal with unexpected code
    still holding on stale pages.
    - Allow NMIs on PV guests (64-bit only)
    - Remove unnecessary TLB flush in M2P code.
    - Fixes duplicate callbacks in Xen granttable code.
    - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries
    - Fix for events being lost due to rescheduling on different VCPUs.
    - More documentation"

    * tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (23 commits)
    hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc
    drivers/xen-tpmfront: Fix compile issue with missing option.
    xen/balloon: don't set P2M entry for auto translated guest
    xen/evtchn: double free on error
    Xen: Fix retry calls into PRIVCMD_MMAPBATCH*.
    xen/pvhvm: Initialize xen panic handler for PVHVM guests
    xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping
    xen: fix ARM build after 6efa20e4
    MAINTAINERS: Remove Jeremy from the Xen subsystem.
    xen/events: document behaviour when scanning the start word for events
    x86/xen: during early setup, only 1:1 map the ISA region
    x86/xen: disable premption when enabling local irqs
    swiotlb-xen: replace dma_length with sg_dma_len() macro
    swiotlb: replace dma_length with sg_dma_len() macro
    xen/balloon: set a mapping for ballooned out pages
    xen/evtchn: improve scalability by using per-user locks
    xen/p2m: avoid unneccesary TLB flush in m2p_remove_override()
    MAINTAINERS: Add in two extra co-maintainers of the Xen tree.
    MAINTAINERS: Update the Xen subsystem's with proper mailing list.
    xen: replace strict_strtoul() with kstrtoul()
    ...

    Linus Torvalds
     

04 Sep, 2013

6 commits

  • Pull x86/asmlinkage changes from Ingo Molnar:
    "As a preparation for Andi Kleen's LTO patchset (link time
    optimizations using GCC's -flto, a build time optimization that has
    steadily increased in quality over the past few years and might
    eventually be usable for the kernel too) this tree includes a handful
    of preparatory patches that make function calling convention
    annotations consistent again:

    - Mark every function without arguments (or 64bit only) that is used
    by assembly code with asmlinkage()

    - Mark every function with parameters or variables that is used by
    assembly code as __visible.

    For the vanilla kernel this has documentation, consistency and
    debuggability advantages, for the time being"

    * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asmlinkage: Fix warning in xen asmlinkage change
    x86, asmlinkage, vdso: Mark vdso variables __visible
    x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
    x86, asmlinkage: Make dump_stack visible
    x86, asmlinkage: Make 64bit checksum functions visible
    x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
    x86, asmlinkage, apm: Make APM data structure used from assembler visible
    x86, asmlinkage: Make syscall tables visible
    x86, asmlinkage: Make several variables used from assembler/linker script visible
    x86, asmlinkage: Make kprobes code visible and fix assembler code
    x86, asmlinkage: Make various syscalls asmlinkage
    x86, asmlinkage: Make 32bit/64bit __switch_to visible
    x86, asmlinkage: Make _*_start_kernel visible
    x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
    x86, asmlinkage: Change dotraplinkage into __visible on 32bit
    x86: Fix sys_call_table type in asm/syscall.h

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "Main RCU changes this cycle were:

    - Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the
    timekeeping CPU to shut off its tick when all other CPUs are idle.

    - Miscellaneous fixes.

    - Improved rcutorture test coverage.

    - Updated RCU documentation"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
    nohz_full: Add full-system-idle state machine
    jiffies: Avoid undefined behavior from signed overflow
    rcu: Simplify _rcu_barrier() processing
    rcu: Make rcutorture emit online failures if verbose
    rcu: Remove unused variable from rcu_torture_writer()
    rcu: Sort rcutorture module parameters
    rcu: Increase rcutorture test coverage
    rcu: Add duplicate-callback tests to rcutorture
    doc: Fix memory-barrier control-dependency example
    rcu: Update RTFP documentation
    nohz_full: Add full-system-idle arguments to API
    nohz_full: Add full-system idle states and variables
    nohz_full: Add per-CPU idle-state tracking
    nohz_full: Add rcu_dyntick data for scalable detection of all-idle state
    nohz_full: Add Kconfig parameter for scalable detection of all-idle state
    nohz_full: Add testing information to documentation
    rcu: Eliminate unused APIs intended for adaptive ticks
    rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
    rculist: list_first_or_null_rcu() should use list_entry_rcu()
    ...

    Linus Torvalds
     
    New formats: %p[dD][234]?. The next pointer is interpreted as struct dentry *
    or struct file *, respectively ('d' => dentry, 'D' => file), and the last
    component(s) of the pathname are printed (%pd => just the last one, %pd2 =>
    the last two, etc.)
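
    As a rough userspace illustration of what "the last N components" means,
    the sketch below works on a plain path string. It is not the kernel
    implementation (which walks struct dentry parents); the function name
    last_components() is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer into `path` at the start of its
 * last `n` components. If the path has fewer than `n` components, the
 * whole string is returned. Trailing slashes are not treated specially.
 */
const char *last_components(const char *path, int n)
{
    assert(path != NULL && n >= 1);

    const char *p = path + strlen(path);

    /* Scan backwards, counting '/' separators until n have been seen. */
    while (p > path) {
        if (p[-1] == '/' && --n == 0)
            return p;
        p--;
    }
    return path;
}
```

    With this sketch, last_components("/usr/lib/modules/foo.ko", 1) points at
    "foo.ko" and the same call with 2 points at "modules/foo.ko", mirroring
    the difference between %pd and %pd2 described above.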

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull ACPI and power management updates from Rafael Wysocki:

    1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
    of Intel Thunderbolt support on systems that use ACPI for signalling
    Thunderbolt hotplug events. This also should make ACPIPHP work in
    some cases in which it was known to have problems. From
    Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.

    2) ACPI core code cleanups and dock station support cleanups from
    Jiang Liu and Rafael J Wysocki.

    3) Fixes for locking problems related to ACPI device hotplug from
    Rafael J Wysocki.

    4) ACPICA update to version 20130725 including fixes, cleanups, support
    for more than 256 GPEs per GPE block and a change to make the ACPI
    PM Timer optional (we've seen systems without the PM Timer in the
    field already). One of the fixes, related to the DeRefOf operator,
    is necessary to prevent some Windows 8 oriented AML from causing
    problems. From Bob Moore, Lv Zheng, and Jung-uk Kim.

    5) Removal of the old and long deprecated /proc/acpi/event interface
    and related driver changes from Thomas Renninger.

    6) ACPI and Xen changes to make the reduced hardware sleep work with
    the latter from Ben Guthro.

    7) ACPI video driver cleanups and a blacklist of systems that should
    not tell the BIOS that they are compatible with Windows 8 (or ACPI
    backlight and possibly other things will not work on them). From
    Felipe Contreras.

    8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
    Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
    Toshi Kani, and Wei Yongjun.

    9) cpufreq ondemand governor target frequency selection change to
    reduce oscillations between min and max frequencies (essentially,
    it causes the governor to choose target frequencies proportional
    to load) from Stratos Karafotis.

    10) cpufreq fixes allowing sysfs attributes file permissions to be
    preserved over suspend/resume cycles from Srivatsa S Bhat.

    11) Removal of Device Tree parsing for CPU device nodes from multiple
    cpufreq drivers that required some changes related to
    of_get_cpu_node() to be made in a few architectures and in the
    driver core. From Sudeep KarkadaNagesha.

    12) cpufreq core fixes and cleanups related to mutual exclusion and
    driver module references from Viresh Kumar, Lukasz Majewski and
    Rafael J Wysocki.

    13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
    Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
    Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
    Stratos Karafotis, and Viresh Kumar.

    14) Fixes to prevent race conditions in coupled cpuidle from
    Colin Cross.

    15) cpuidle core fixes and cleanups from Daniel Lezcano and
    Tuukka Tikkanen.

    16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
    Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
    and Sahara.

    17) System sleep tracing changes from Todd E Brandt and Shuah Khan.

    18) PNP subsystem conversion to using struct dev_pm_ops for power
    management from Shuah Khan.

    * tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits)
    cpufreq: Don't use smp_processor_id() in preemptible context
    cpuidle: coupled: fix race condition between pokes and safe state
    cpuidle: coupled: abort idle if pokes are pending
    cpuidle: coupled: disable interrupts after entering safe state
    ACPI / hotplug: Remove containers synchronously
    driver core / ACPI: Avoid device hot remove locking issues
    cpufreq: governor: Fix typos in comments
    cpufreq: governors: Remove duplicate check of target freq in supported range
    cpufreq: Fix timer/workqueue corruption due to double queueing
    ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
    ACPI / thermal: Add check of "_TZD" availability and evaluating result
    cpufreq: imx6q: Fix clock enable balance
    ACPI: blacklist win8 OSI for buggy laptops
    cpufreq: tegra: fix the wrong clock name
    cpuidle: Change struct menu_device field types
    cpuidle: Add a comment warning about possible overflow
    cpuidle: Fix variable domains in get_typical_interval()
    cpuidle: Fix menu_device->intervals type
    cpuidle: CodingStyle: Break up multiple assignments on single line
    cpuidle: Check called function parameter in get_typical_interval()
    ...

    Linus Torvalds
     
  • While we are likely to succeed and break out of this loop, it isn't
    guaranteed. We should be power and thread friendly if we do have to
    go around for a second (or third, or more) attempt.

    Signed-off-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Luck, Tony
     
  • Pull driver core patches from Greg KH:
    "Here's the big driver core pull request for 3.12-rc1.

    Lots of tiny changes here fixing up the way sysfs attributes are
    created, to try to make drivers simpler, and fix a whole class of race
    conditions with creations of device attributes after the device was
    announced to userspace.

    All the various pieces are acked by the different subsystem
    maintainers"

    * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
    firmware loader: fix pending_fw_head list corruption
    drivers/base/memory.c: introduce help macro to_memory_block
    dynamic debug: line queries failing due to uninitialized local variable
    sysfs: sysfs_create_groups returns a value.
    debugfs: provide debugfs_create_x64() when disabled
    rbd: convert bus code to use bus_groups
    firmware: dcdbas: use binary attribute groups
    sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
    driver core: add #include to core files.
    HID: convert bus code to use dev_groups
    Input: serio: convert bus code to use drv_groups
    Input: gameport: convert bus code to use drv_groups
    driver core: firmware: use __ATTR_RW()
    driver core: core: use DEVICE_ATTR_RO
    driver core: bus: use DRIVER_ATTR_WO()
    driver core: create write-only attribute macros for devices and drivers
    sysfs: create __ATTR_WO()
    driver-core: platform: convert bus code to use dev_groups
    workqueue: convert bus code to use dev_groups
    MEI: convert bus code to use dev_groups
    ...

    Linus Torvalds
     

03 Sep, 2013

3 commits

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/611.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/619.

    * Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is
    to allow the timekeeping CPU to shut off its tick when
    all other CPUs are idle. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/648.

    * Improve rcutorture test coverage. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/675.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Instead of taking the spinlock, the lockless versions atomically check
    that the lock is not taken, and do the reference count update using a
    cmpxchg() loop. This is semantically identical to doing the reference
    count update protected by the lock, but avoids the "wait for lock"
    contention that you get when accesses to the reference count are
    contended.

    Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
    Even when the lockref reference counts are updated atomically with
    cmpxchg, the fact that they also verify the state of the spinlock means
    that the lockless updates can never happen while somebody else holds the
    spinlock.

    So while "lockref_put_or_lock()" looks a lot like just another name for
    "atomic_dec_and_lock()", and both optimize to lockless updates, they are
    fundamentally different: the decrement done by atomic_dec_and_lock() is
    truly independent of any lock (as long as it doesn't decrement to zero),
    so a locked region can still see the count change.

    The lockref structure, in contrast, really is a *locked* reference
    count. If you hold the spinlock, the reference count will be stable and
    you can modify the reference count without using atomics, because even
    the lockless updates will see and respect the state of the lock.
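
    The lockless update described above can be sketched in userspace with C11
    atomics. This is only an illustration under stated assumptions, not the
    kernel's lockref code: the struct layout (lock word in the low 32 bits,
    count in the high 32 bits, lock value 0 meaning "unlocked") and every
    sketch_* name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a lock word and a count packed into one u64. */
struct lockref_sketch {
    _Atomic uint64_t lock_count;
};

static inline uint32_t sketch_lock(uint64_t v)  { return (uint32_t)v; }
static inline uint32_t sketch_count(uint64_t v) { return (uint32_t)(v >> 32); }

static inline uint64_t sketch_pack(uint32_t lock, uint32_t count)
{
    return ((uint64_t)count << 32) | lock;
}

/*
 * Try to take a reference without touching the lock. Returns true if the
 * count was incremented, false if the lock word was observed as held, in
 * which case a caller would fall back to locking and updating the count
 * under the lock.
 */
bool sketch_get_lockless(struct lockref_sketch *ref)
{
    assert(ref != NULL);

    uint64_t old = atomic_load(&ref->lock_count);

    /* Only attempt the update while the lock half reads as unlocked. */
    while (sketch_lock(old) == 0) {
        uint64_t next = sketch_pack(0, sketch_count(old) + 1);

        /*
         * The compare-exchange covers both halves, so it fails if either
         * the count or the lock changed since `old` was read. On failure
         * `old` is refreshed and the lock state is re-checked.
         */
        if (atomic_compare_exchange_weak(&ref->lock_count, &old, next))
            return true;
    }
    return false;
}
```

    Because the compare-exchange spans the lock word as well as the count, a
    lockless update cannot succeed while the lock is held, which is the
    property that distinguishes this scheme from a plain atomic counter.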

    In order to enable the cmpxchg lockless code, the architecture needs to
    do three things:

    (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
    in an aligned u64, and have a "cmpxchg()" implementation that works
    on such a u64 data type.

    (2) define a helper function to test for a spinlock being unlocked
    ("arch_spin_value_unlocked()")

    (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
    Kconfig file.

    This enables it for x86-64 (but not 32-bit, we'd need to make sure
    cmpxchg() turns into the proper cmpxchg8b in order to enable it for
    32-bit mode).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • They aren't very good to inline, since they already call external
    functions (the spinlock code), and we're going to create rather more
    complicated versions of them that can do the reference count updates
    locklessly.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Aug, 2013

2 commits

  • Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
    over the net namespace. The principle here is if you create or have
    capabilities over it you can mount it, otherwise you get to live with
    what other people have mounted.

    Instead of testing this with a straightforward ns_capable call,
    perform this check the long and tortuous way with kobject helpers;
    this keeps direct knowledge of namespaces out of sysfs, and preserves
    the existing sysfs abstractions.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Settings of the form, 'line x module y +p', can fail arbitrarily due to an
    uninitialized local variable. With this patch results are consistent, as
    expected.

    Signed-off-by: Jason Baron
    Signed-off-by: Greg Kroah-Hartman

    jbaron@akamai.com