14 Aug, 2012

1 commit

  • ARM recently moved to asm-generic/mutex-xchg.h for its mutex
    implementation after the previous implementation was found to be missing
    some crucial memory barriers. However, this has revealed some problems
    running hackbench on SMP platforms due to the way in which the
    MUTEX_SPIN_ON_OWNER code operates.

    The symptoms are that a bunch of hackbench tasks are left waiting on an
    unlocked mutex and therefore never get woken up to claim it. This boils
    down to the following sequence of events:

        Task A        Task B        Task C        Lock value
    0                                                 1
    1   lock()                                        0
    2                 lock()                          0
    3                 spin(A)                         0
    4   unlock()                                      1
    5                               lock()            0
    6                 cmpxchg(1,0)                    0
    7                 contended()                    -1
    8   lock()                                        0
    9   spin(C)                                       0
    10                              unlock()          1
    11  cmpxchg(1,0)                                  0
    12  unlock()                                      1

    At this point, the lock is unlocked, but Task B is in an uninterruptible
    sleep with nobody to wake it up.

    This patch fixes the problem by ensuring we put the lock into the
    contended state if we fail to acquire it on the fastpath, ensuring that
    any blocked waiters are woken up when the mutex is released.
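
    The sequence above can be replayed in a small single-threaded model. This
    is only a sketch: the helper names (fastpath_lock, unlock_runs_slowpath,
    task_b_is_woken) are hypothetical, the "atomics" are plain assignments,
    and the slow path is reduced to the lock-value transitions shown in the
    table. With fixed == false the replay reproduces the lock values
    1, 0, 0, 0, 1, 0, 0, -1, 0, 0, 1, 0, 1 and Task B is never woken; with
    fixed == true a failed fastpath leaves the lock at -1, so the unlock in
    step 10 takes the slow path.

```c
#include <stdbool.h>

/* Lock values: 1 = unlocked, 0 = locked, -1 = locked with waiters. */

/* Single-threaded stand-ins for the atomic primitives (model only). */
static int xchg(int *v, int newval)
{
    int old = *v;
    *v = newval;
    return old;
}

static bool cmpxchg(int *v, int expected, int newval)
{
    if (*v != expected)
        return false;
    *v = newval;
    return true;
}

/* Returns true if the lock was acquired on the fastpath. */
static bool fastpath_lock(int *count, bool fixed)
{
    if (xchg(count, 0) == 1)
        return true;
    /* The fix: after a failed attempt, mark the lock contended. */
    if (fixed && xchg(count, -1) == 1)
        return true;
    return false;
}

/* Returns true if the unlock would enter the slow path and wake waiters. */
static bool unlock_runs_slowpath(int *count)
{
    return xchg(count, 1) != 0;
}

/* Replays steps 0-12 and reports whether sleeping Task B gets a wakeup. */
bool task_b_is_woken(bool fixed)
{
    int count = 1;                       /* step 0: unlocked */
    bool b_woken = false;

    fastpath_lock(&count, fixed);        /* 1: A acquires */
    fastpath_lock(&count, fixed);        /* 2: B fails, 3: B spins on A */
    unlock_runs_slowpath(&count);        /* 4: A unlocks (B not asleep yet) */
    fastpath_lock(&count, fixed);        /* 5: C acquires */
    cmpxchg(&count, 1, 0);               /* 6: B's spin attempt fails */
    xchg(&count, -1);                    /* 7: B marks contended and sleeps */
    fastpath_lock(&count, fixed);        /* 8: A fails, 9: A spins on C */
    if (unlock_runs_slowpath(&count))    /* 10: C unlocks */
        b_woken = true;
    if (cmpxchg(&count, 1, 0) &&         /* 11: A acquires while spinning */
        unlock_runs_slowpath(&count))    /* 12: A unlocks */
        b_woken = true;
    return b_woken;
}
```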

    Signed-off-by: Will Deacon
    Cc: Arnd Bergmann
    Cc: Chris Mason
    Cc: Ingo Molnar
    Cc:
    Reviewed-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org
    Signed-off-by: Thomas Gleixner

    Will Deacon
     

31 Jul, 2012

3 commits

  • Merge Andrew's first set of patches:
    "Non-MM patches:

    - lots of misc bits

    - tree-wide have_clk() cleanups

    - quite a lot of printk tweaks. I draw your attention to "printk:
    convert the format for KERN_ to a 2 byte pattern" which
    looks a bit scary. But afaict it's solid.

    - backlight updates

    - lib/ feature work (notably the addition and use of memweight())

    - checkpatch updates

    - rtc updates

    - nilfs updates

    - fatfs updates (partial, still waiting for acks)

    - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc

    - new fault-injection feature work"

    * Merge emailed patches from Andrew Morton : (128 commits)
    drivers/misc/lkdtm.c: fix missing allocation failure check
    lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
    fault-injection: add tool to run command with failslab or fail_page_alloc
    fault-injection: add selftests for cpu and memory hotplug
    powerpc: pSeries reconfig notifier error injection module
    memory: memory notifier error injection module
    PM: PM notifier error injection module
    cpu: rewrite cpu-notifier-error-inject module
    fault-injection: notifier error injection
    c/r: fcntl: add F_GETOWNER_UIDS option
    resource: make sure requested range is included in the root range
    include/linux/aio.h: cpp->C conversions
    fs: cachefiles: add support for large files in filesystem caching
    pps: return PTR_ERR on error in device_create
    taskstats: check nla_reserve() return
    sysctl: suppress kmemleak messages
    ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
    ipc: compat: use signed size_t types for msgsnd and msgrcv
    ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
    ipc: add COMPAT_SHMLBA support
    ...

    Linus Torvalds
     
  • When we restore file descriptors we would like them to look exactly as
    they were at dumping time.

    With the help of fcntl this is almost possible; the missing piece is the
    file owner's UIDs.

    To make it possible to read their values, F_GETOWNER_UIDS is introduced.

    This option is valid if and only if CONFIG_CHECKPOINT_RESTORE is turned
    on; otherwise fcntl returns -EINVAL.
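
    A minimal userspace sketch of how the new command might be called. The
    wrapper name get_owner_uids is hypothetical, and the fallback value 17 for
    F_GETOWNER_UIDS is an assumption taken from the generic fcntl header for
    the case where the C library does not define the constant. The two-element
    array is assumed to receive the owner's uid and euid.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

#ifndef F_GETOWNER_UIDS
#define F_GETOWNER_UIDS 17   /* assumed generic value; libc may not define it */
#endif

/*
 * Hypothetical wrapper: on success returns 0 and fills uids[0] (owner uid)
 * and uids[1] (owner euid). On failure returns -1 with errno set; errno is
 * EINVAL when the kernel was built without CONFIG_CHECKPOINT_RESTORE.
 */
int get_owner_uids(int fd, uid_t uids[2])
{
    return fcntl(fd, F_GETOWNER_UIDS, uids);
}
```

    A caller would typically treat EINVAL as "not supported by this kernel"
    and fall back to leaving the owner UIDs unrestored.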

    Signed-off-by: Cyrill Gorcunov
    Acked-by: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Pull DMA-mapping updates from Marek Szyprowski:
    "These patches are a continuation of my earlier work.

    They contain extensions to the DMA-mapping framework that remove
    limitations of the current ARM implementation (such as the limited total
    size of DMA coherent/write combine buffers), improve the performance of
    buffer sharing between devices (attributes to skip CPU cache operations
    or the creation of an additional kernel mapping for some specific use
    cases), and unify some of the common code for the dma_mmap_attrs() and
    dma_mmap_coherent() functions. All extensions have been implemented
    and tested for the ARM architecture."

    * 'for-linus-for-3.6-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
    ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
    common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
    ARM: dma-mapping: add support for dma_get_sgtable()
    common: dma-mapping: introduce dma_get_sgtable() function
    ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
    common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
    common: dma-mapping: add support for generic dma_mmap_* calls
    ARM: dma-mapping: fix error path for memory allocation failure
    ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
    ARM: dma-mapping: remove custom consistent dma region
    mm: vmalloc: use const void * for caller argument
    scatterlist: add sg_alloc_table_from_pages function

    Linus Torvalds
     

30 Jul, 2012

2 commits

  • This patch adds the dma_get_sgtable() function, which is required to let
    drivers share the buffers allocated by the DMA-mapping subsystem. Right
    now the driver gets a DMA address of the allocated buffer and the kernel
    virtual mapping for it. If it wants to share the buffer with another
    device (that is, map it into that device's DMA address space), it usually
    hacks around kernel virtual addresses to get pointers to pages, or assumes
    that both devices share the DMA address space. Both solutions are just
    hacks for special cases, which should be avoided in the final version of
    buffer sharing.

    To solve this issue in a generic way, a new call to DMA mapping has been
    introduced: dma_get_sgtable(). It allocates a scatter-list which
    describes the allocated buffer and lets the driver(s) use it with
    other device(s) by calling dma_map_sg() on it.

    This patch provides a generic implementation based on virt_to_page()
    call. Architectures which require more sophisticated translation might
    provide their own get_sgtable() methods.

    Signed-off-by: Marek Szyprowski
    Reviewed-by: Kyungmin Park
    Reviewed-by: Daniel Vetter

    Marek Szyprowski
     
  • Commit 9adc5374 ('common: dma-mapping: introduce mmap method') added a
    generic method for implementing the mmap user call to the dma_map_ops
    structure.

    This patch converts the ARM and PowerPC architectures (the only providers
    of the dma_mmap_coherent/dma_mmap_writecombine calls) to use this generic
    dma_map_ops-based call and adds a generic cross-architecture
    definition of the dma_mmap_attrs, dma_mmap_coherent and
    dma_mmap_writecombine functions.

    A generic virt_to_page-based fallback implementation is provided for
    architectures that don't provide their own implementation of the mmap
    method.

    Signed-off-by: Marek Szyprowski
    Reviewed-by: Kyungmin Park

    Marek Szyprowski
     

28 Jul, 2012

3 commits

  • Pull ARM updates from Russell King:
    "First ARM push of this merge window, post me coming back from holiday.
    This is what has been in linux-next for the last few weeks. Not much
    to say which isn't described by the commit summaries."

    * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (32 commits)
    ARM: 7463/1: topology: Update cpu_power according to DT information
    ARM: 7462/1: topology: factorize the update of sibling masks
    ARM: 7461/1: topology: Add arch_scale_freq_power function
    ARM: 7456/1: ptrace: provide separate functions for tracing syscall {entry,exit}
    ARM: 7455/1: audit: move syscall auditing until after ptrace SIGTRAP handling
    ARM: 7454/1: entry: don't bother with syscall tracing on ret_from_fork path
    ARM: 7453/1: audit: only allow syscall auditing for pure EABI userspace
    ARM: 7452/1: delay: allow timer-based delay implementation to be selected
    ARM: 7451/1: arch timer: implement read_current_timer and get_cycles
    ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs
    ARM: 7449/1: use generic strnlen_user and strncpy_from_user functions
    ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration
    ARM: 7447/1: rwlocks: remove unused branch labels from trylock routines
    ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation
    ARM: 7445/1: mm: update CONTEXTIDR register to contain PID of current process
    ARM: 7444/1: kernel: add arch-timer C3STOP feature
    ARM: 7460/1: remove asm/locks.h
    ARM: 7439/1: head.S: simplify initial page table mapping
    ARM: 7437/1: zImage: Allow DTB command line concatenation with ATAG_CMDLINE
    ARM: 7436/1: Do not map the vectors page as write-through on UP systems
    ...

    Linus Torvalds
     
  • Russell King
     
  • Pull final kmap_atomic cleanups from Cong Wang:
    "This should be the final round of cleanup, as the definitions of enum
    km_type finally get removed from the whole tree. The patches have
    been in linux-next for a long time."

    * 'kmap_atomic' of git://github.com/congwang/linux:
    pipe: remove KM_USER0 from comments
    vmalloc: remove KM_USER0 from comments
    feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
    tile: remove km_type definitions
    um: remove km_type definitions
    asm-generic: remove km_type definitions
    avr32: remove km_type definitions
    frv: remove km_type definitions
    powerpc: remove km_type definitions
    arm: remove km_type definitions
    highmem: remove the deprecated form of kmap_atomic
    tile: remove usage of enum km_type
    frv: remove the second parameter of kmap_atomic_primary()
    jbd2: remove the second argument of kmap_atomic

    Linus Torvalds
     

27 Jul, 2012

1 commit

  • Pull x86/mm changes from Peter Anvin:
    "The big change here is the patchset by Alex Shi to use INVLPG to flush
    only the affected pages when we only need to flush a small page range.

    It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
    vectors!) and replaces them with an ordinary IPI function call."

    Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
    to changed line)

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tlb: Fix build warning and crash when building for !SMP
    x86/tlb: do flush_tlb_kernel_range by 'invlpg'
    x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
    x86/tlb: enable tlb flush range support for x86
    mm/mmu_gather: enable tlb flush range in generic mmu_gather
    x86/tlb: add tlb_flushall_shift knob into debugfs
    x86/tlb: add tlb_flushall_shift for specific CPU
    x86/tlb: fall back to flush all when meet a THP large page
    x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
    x86/tlb_info: get last level TLB entry number of CPU
    x86: Add read_mostly declaration/definition to variables from smp.h
    x86: Define early read-mostly per-cpu macros

    Linus Torvalds
     

24 Jul, 2012

1 commit


06 Jul, 2012

1 commit


29 Jun, 2012

1 commit

  • sizes.h is used throughout the AMBA code and drivers, so the header
    should be available to everyone in order to drive AMBA/PrimeCell
    peripherals behind a PCI bridge where the host can be any platform
    (I'm doing it under x86).

    At this step includes ,
    to allow a grace period for both in-tree and out-of-tree drivers.

    Signed-off-by: Alessandro Rubini
    Acked-by: Giancarlo Asnaghi
    Acked-by: Linus Walleij
    Cc: Alan Cox
    Signed-off-by: Russell King

    Alessandro Rubini
     

28 Jun, 2012

2 commits

  • This patch enables TLB flush range support in the generic mmu layer.

    Most architectures, such as ARM and IA64, have their own TLB flush range
    support. x86 has no such support in hardware yet, but the 'invlpg'
    instruction can implement this function to some degree. So enable this
    feature in the generic layer for x86 now; it may be useful for other
    architectures in the future.

    The generic mmu_gather struct is protected by the macro
    HAVE_GENERIC_MMU_GATHER. Other architectures that support flush range
    have their own mmu_gather struct, so this change is safe for them.

    In the future we may unify this struct and the related functions across
    multiple architectures.

    Thanks to Peter Zijlstra for his repeated reminders about keeping the
    code safe on multiple architectures!

    Signed-off-by: Alex Shi
    Link: http://lkml.kernel.org/r/1340845344-27557-7-git-send-email-alex.shi@intel.com
    Signed-off-by: H. Peter Anvin

    Alex Shi
     
  • Testing shows that different CPU types (microarchitectures and NUMA
    modes) have different balance points between flushing the whole TLB and
    issuing multiple invlpg instructions. There are also cases where the TLB
    flush change does not help at all.

    This patch provides an interface that lets x86 vendor developers set a
    different shift for each CPU type.

    On the machines I have at hand, for example, the balance point is 16
    entries on Romley-EP, 8 entries on Bloomfield NHM-EP, and 256 on an
    IVB mobile CPU, while on a model 15 Core 2 Xeon using invlpg does not
    help at all.

    For untested machines, do a conservative optimization, the same as for
    NHM CPUs.
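
    One way a per-CPU shift could encode such a balance point is sketched
    below. This is an assumption for illustration, not the patch's code: the
    function name and parameters are hypothetical, the threshold is taken to
    be the TLB entry count shifted right by the tuned value, and a negative
    shift is assumed to mean "always flush everything" (for CPUs where invlpg
    does not help).

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper.
 *   npages      - number of pages in the range to flush
 *   tlb_entries - number of last-level TLB entries on this CPU
 *   shift       - per-CPU tuning value; negative disables ranged flushing
 *
 * Returns true when flushing the whole TLB is expected to be cheaper than
 * issuing one invlpg per page.
 */
bool should_flush_whole_tlb(unsigned long npages,
                            unsigned long tlb_entries, int shift)
{
    if (shift < 0)
        return true;
    return npages > (tlb_entries >> shift);
}
```

    With 512 TLB entries and a shift of 5, for instance, the threshold in
    this model is 16 pages: ranges up to 16 pages are flushed page by page,
    larger ones with a full flush.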

    Signed-off-by: Alex Shi
    Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com
    Signed-off-by: H. Peter Anvin

    Alex Shi
     

26 Jun, 2012

1 commit

  • Commit 2603efa31a03 ("bug.h: Fix up powerpc build regression") corrected
    the powerpc build case and extended the __ASSEMBLY__ guards, but it also
    got caught in pre-processor hell accidentally matching the else case of
    CONFIG_BUG resulting in the BUG disabled case tripping up on
    -Werror=implicit-function-declaration.

    It's not possible to __ASSEMBLY__ guard the entire file as architecture
    code needs to get at the BUGFLAG_WARNING definition in the GENERIC_BUG
    case, but the rest of the CONFIG_BUG=y/n case needs to be guarded.

    Rather than littering endless __ASSEMBLY__ checks in each of the if/else
    cases we just move the BUGFLAG definitions up under their own
    GENERIC_BUG test and then shove everything else under one big
    __ASSEMBLY__ guard.

    Build tested on all of x86 CONFIG_BUG=y, CONFIG_BUG=n, powerpc (due to
    its dependence on BUGFLAG definitions in assembly code), and sh (due to
    not bringing in linux/kernel.h to satisfy the taint flag definitions used
    by the generic bug code).

    Hopefully that's the end of the corner cases and I can abstain from ever
    having to touch this infernal header ever again.

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Acked-by: Randy Dunlap
    Cc: Arnd Bergmann
    Signed-off-by: Paul Mundt
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

21 Jun, 2012

2 commits

  • * emailed from Andrew Morton : (21 patches)
    mm/memblock: fix overlapping allocation when doubling reserved array
    c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place
    pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper
    pidns: guarantee that the pidns init will be the last pidns process reaped
    fault-inject: avoid call to random32() if fault injection is disabled
    Viresh has moved
    get_maintainer: Fix --help warning
    mm/memory.c: fix kernel-doc warnings
    mm: fix kernel-doc warnings
    mm: correctly synchronize rss-counters at exit/exec
    mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range
    h8300: use the declarations provided by
    h8300: fix use of extinct _sbss and _ebss
    xtensa: use the declarations provided by
    xtensa: use "test -e" instead of bashism "test -a"
    xtensa: replace xtensa-specific _f{data,text} by _s{data,text}
    memcg: fix use_hierarchy css_is_ancestor oops regression
    mm, oom: fix and cleanup oom score calculations
    nilfs2: ensure proper cache clearing for gc-inodes
    thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
    ...

    Linus Torvalds
     
  • In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
    mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
    Xen.

    So instead of dealing only with "consistent" pmdvals in
    pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
    simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
    where the low 32bit and high 32bit could be inconsistent (to avoid having
    to use cmpxchg8b).

    The only guarantee we get from pmd_read_atomic is that if the low part of
    the pmd was found null, the high part will be null too (so the pmd will be
    considered unstable). And if the low part of the pmd is found "stable"
    later, then it means the whole pmd was read atomically (because after a
    pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
    and we read the high part after the low part).
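
    The read order described above can be modelled in userspace as follows.
    This is a sketch under stated assumptions: the function name is
    hypothetical, the pmd is represented as two 32-bit halves with halves[0]
    being the low part, and a compiler barrier (GCC/Clang inline asm) stands
    in for whatever ordering primitive the kernel actually uses between the
    two loads.

```c
#include <stdint.h>

/* Compiler barrier; stand-in for the kernel's ordering between the loads. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Reads the low half first. If it is zero the pmd is reported as null
 * without looking at the high half; otherwise the high half is read
 * afterwards and combined with the low half.
 */
uint64_t read_pmd_low_then_high(const volatile uint32_t halves[2])
{
    uint64_t val = halves[0];                 /* low 32 bits */

    if (val) {
        barrier();
        val |= (uint64_t)halves[1] << 32;     /* high 32 bits */
    }
    return val;
}
```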

    In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
    atomically to declare the pmd as "stable", and that's true both with and
    without THP. Furthermore, in the THP case we also have a barrier() that
    will prevent any inconsistent pmdvals from being cached by a later
    re-read of the *pmd.

    Signed-off-by: Andrea Arcangeli
    Cc: Jonathan Nieder
    Cc: Ulrich Obergfell
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Larry Woodman
    Cc: Petr Matousek
    Cc: Rik van Riel
    Cc: Jan Beulich
    Cc: KOSAKI Motohiro
    Tested-by: Andrew Jones
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

19 Jun, 2012

1 commit

  • The asm-generic/bug.h __ASSEMBLY__ guarding is completely bogus, which
    tripped up the powerpc build when the kernel.h include was added:

    In file included from include/asm-generic/bug.h:5:0,
    from arch/powerpc/include/asm/bug.h:127,
    from arch/powerpc/kernel/head_64.S:31:
    include/linux/kernel.h:44:0: warning: "ALIGN" redefined [enabled by default]
    include/linux/linkage.h:57:0: note: this is the location of the previous definition
    include/linux/sysinfo.h: Assembler messages:
    include/linux/sysinfo.h:7: Error: Unrecognized opcode: `struct'
    include/linux/sysinfo.h:8: Error: Unrecognized opcode: `__kernel_long_t'

    Moving the __ASSEMBLY__ guard up and stashing the kernel.h include under
    it fixes this up, as well as covering the case the original fix was
    attempting to handle.

    Tested-by: Stephen Rothwell
    Acked-by: Arnd Bergmann
    Signed-off-by: Paul Mundt
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

13 Jun, 2012

1 commit


11 Jun, 2012

1 commit

  • asm-generic/bug.h uses taint flags that are only defined in
    linux/kernel.h, resulting in build failures on platforms that
    don't include linux/kernel.h some other way:

    arch/sh/include/asm/thread_info.h:172:2: error: 'TAINT_WARN' undeclared (first use in this function)

    Caused by commit edd63a2763bd ("set_restore_sigmask() is never called
    without SIGPENDING (and never should be)").

    Reported-by: Stephen Rothwell
    Cc: Al Viro
    Signed-off-by: Paul Mundt

    Paul Mundt
     

02 Jun, 2012

1 commit

  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jffs2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     

01 Jun, 2012

3 commits

  • Merge misc patches from Andrew Morton:

    - the "misc" tree - stuff from all over the map

    - checkpatch updates

    - fatfs

    - kmod changes

    - procfs

    - cpumask

    - UML

    - kexec

    - mqueue

    - rapidio

    - pidns

    - some checkpoint-restore feature work. Reluctantly. Most of it
    delayed a release. I'm still rather worried that we don't have a
    clear roadmap to completion for this work.

    * emailed from Andrew Morton : (78 patches)
    kconfig: update compression algorithm info
    c/r: prctl: add ability to set new mm_struct::exe_file
    c/r: prctl: extend PR_SET_MM to set up more mm_struct entries
    c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
    syscalls, x86: add __NR_kcmp syscall
    fs, proc: introduce /proc//task//children entry
    sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
    aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
    eventfd: change int to __u64 in eventfd_signal()
    fs/nls: add Apple NLS
    pidns: make killed children autoreap
    pidns: use task_active_pid_ns in do_notify_parent
    rapidio/tsi721: add DMA engine support
    rapidio: add DMA engine support for RIO data transfers
    ipc/mqueue: add rbtree node caching support
    tools/selftests: add mq_perf_tests
    ipc/mqueue: strengthen checks on mqueue creation
    ipc/mqueue: correct mq_attr_ok test
    ipc/mqueue: improve performance of send/recv
    selftests: add mq_open_tests
    ...

    Linus Torvalds
     
  • Previous code was using optimizations which were developed to work well
    even on narrow-word CPUs (by today's standards). But Linux runs only on
    32-bit and wider CPUs. We can use that.

    First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
    divide by 10 much larger numbers, and thus we can print groups of 9 digits
    instead of groups of 5 digits.
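
    As an illustration of the multiply-and-shift idea, the sketch below
    divides a value of up to 9 decimal digits by 10 using one 32x32->64
    multiply and a 32-bit shift. The function name and the constant
    0x1999999a (2^32 / 10 rounded up) are this example's choices and are not
    claimed to match the patch; the rounding error of that constant stays
    below 0.1 for inputs under 2^30, which covers every 9-digit group.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide by 10 without a division instruction.
 * Valid only for x < 1000000000 (one group of 9 decimal digits).
 */
uint32_t div10_9digits(uint32_t x)
{
    assert(x < 1000000000u);
    /* 0x1999999a == ceil(2^32 / 10); keep the high 32 bits of the product. */
    return (uint32_t)(((uint64_t)x * 0x1999999au) >> 32);
}
```

    The remainder, and hence the next digit, then follows as
    x - 10 * div10_9digits(x).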

    Next: there are two algorithms to print larger numbers. One is generic:
    divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
    It's conceptually simple, but requires an (unsigned long long) /
    1000000000 division.
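
    A sketch of this first, generic algorithm follows. The function name and
    buffer convention are hypothetical, and plain / and % are used for the
    per-digit work to keep the example short; only the outer loop (split off
    groups of 9 digits with a 64-bit division by 1000000000) is the point
    being illustrated.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Writes the decimal representation of v into buf, which must hold at
 * least 21 bytes (20 digits plus the terminating NUL). Returns the number
 * of digits written.
 */
size_t u64_to_decimal(uint64_t v, char *buf)
{
    char tmp[20];       /* digits, least significant first */
    size_t n = 0;

    do {
        uint32_t group = (uint32_t)(v % 1000000000u);
        int digits = 0;

        v /= 1000000000u;
        /* Inner groups get exactly 9 digits; the top group is not padded. */
        do {
            tmp[n++] = (char)('0' + group % 10);
            group /= 10;
            digits++;
        } while (v ? digits < 9 : group != 0);
    } while (v);

    /* Reverse into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```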

    Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
    manipulates them cleverly and generates groups of 4 decimal digits. It so
    happens that it does NOT require long long division.

    If long is > 32 bits, division of 64-bit values is relatively easy, and we
    will use the first algorithm. If long long is > 64 bits (strange
    architecture with VERY large long long), second algorithm can't be used,
    and we again use the first one.

    Else (if long is 32 bits and long long is 64 bits) we use second one.

    And third: there is a simple optimization which takes fast path not only
    for zero as was done before, but for all one-digit numbers.

    In all tested cases the new code is faster than the old one, in many cases
    by 30%, in a few cases by more than 50% (for example, on x86-32,
    conversion of 12345678). Code growth is ~0 in the 32-bit case and ~130
    bytes in the 64-bit case.

    This patch is based upon an original from Michal Nazarewicz.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Denys Vlasenko
    Cc: Douglas W Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Pull two small kvm fixes from Avi Kivity:
    "A build fix for non-kvm archs and a transparent hugepage refcount
    bugfix on hosts with 4M pages."

    * git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: Export asm-generic/kvm_para.h
    KVM: MMU: fix huge page adapted on non-PAE host

    Linus Torvalds
     

31 May, 2012

1 commit


30 May, 2012

1 commit

  • When holding the mmap_sem for reading, pmd_offset_map_lock should only
    run on a pmd_t that has been read atomically from the pmdp pointer,
    otherwise we may read only half of it leading to this crash.

    PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic"
    #0 [f06a9dd8] crash_kexec at c049b5ec
    #1 [f06a9e2c] oops_end at c083d1c2
    #2 [f06a9e40] no_context at c0433ded
    #3 [f06a9e64] bad_area_nosemaphore at c043401a
    #4 [f06a9e6c] __do_page_fault at c0434493
    #5 [f06a9eec] do_page_fault at c083eb45
    #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
    DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0
    CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
    #7 [f06a9f38] _spin_lock at c083bc14
    #8 [f06a9f44] sys_mincore at c0507b7d
    #9 [f06a9fb0] system_call at c083becd
    start len
    EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f
    DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00
    SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033
    CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286

    This should be a longstanding bug affecting x86 32bit PAE without THP.
    Only archs with 64bit large pmd_t and 32bit unsigned long should be
    affected.

    With THP enabled, the barrier() in pmd_none_or_trans_huge_or_clear_bad()
    would partly hide the bug when the pmd transitions from none to stable,
    by forcing a re-read of the *pmd in pmd_offset_map_lock. But with THP
    enabled a new set of problems arises, because the pmd could then
    transition freely between the none, pmd_trans_huge and pmd_trans_stable
    states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
    unconditional isn't a good idea; it would be a flaky solution.

    This should be fully fixed by introducing a pmd_read_atomic that reads
    the pmd in order with THP disabled, or by reading the pmd atomically
    with cmpxchg8b with THP enabled.

    Luckily this new race condition only triggers in the places that must
    already be covered by pmd_none_or_trans_huge_or_clear_bad(), so the fix
    is localized there, but this bug is not related to THP.

    NOTE: this can only trigger on x86 32bit systems with PAE enabled and
    more than 4G of RAM; otherwise the high part of the pmd can never be
    truncated because it is zero at all times, which hides the SMP race.

    This bug was discovered and fully debugged by Ulrich, quote:

    ----
    [..]
    pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
    eax.

    static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
    {
        /* depend on compiler for an atomic pmd read */
        pmd_t pmdval = *pmd;

    // edi = pmd pointer
    0xc0507a74 : mov 0x8(%esp),%edi
    ...
    // edx = PTE page table high address
    0xc0507a84 : mov 0x4(%edi),%edx
    ...
    // eax = PTE page table low address
    0xc0507a8e : mov (%edi),%eax

    [..]

    Please note that the PMD is not read atomically. These are two "mov"
    instructions where the high order bits of the PMD entry are fetched
    first. Hence, the above machine code is prone to the following race.

    - The PMD entry {high|low} is 0x0000000000000000.
    The "mov" at 0xc0507a84 loads 0x00000000 into edx.

    - A page fault (on another CPU) sneaks in between the two "mov"
    instructions and instantiates the PMD.

    - The PMD entry {high|low} is now 0x00000003fda38067.
    The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
    ----
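
    The torn value that results from this race can be computed directly. The
    helper below is a hypothetical, single-threaded model: it takes the pmd
    contents before and after the concurrent page fault and returns what the
    two separate "mov" instructions would assemble when the high half is
    fetched first.

```c
#include <stdint.h>

/*
 * Model of a non-atomic 64-bit read done as two 32-bit loads, high half
 * first, with the entry changing from `before` to `after` in between.
 */
uint64_t torn_read_high_first(uint64_t before, uint64_t after)
{
    uint32_t high = (uint32_t)(before >> 32);   /* first mov: old entry */
    /* ... page fault on another CPU instantiates the pmd here ... */
    uint32_t low = (uint32_t)after;             /* second mov: new entry */

    return ((uint64_t)high << 32) | low;
}
```

    For the values quoted above, a pmd going from 0x0000000000000000 to
    0x00000003fda38067 is observed as 0x00000000fda38067: the high part of
    the page table address has been lost.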

    Reported-by: Ulrich Obergfell
    Signed-off-by: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Larry Woodman
    Cc: Petr Matousek
    Cc: Rik van Riel
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

29 May, 2012

1 commit


27 May, 2012

2 commits

  • This makes actually live up to its promise of
    allowing architectures to help tune the string functions that do their
    work a word at a time.

    David had already taken the x86 strncpy_from_user() function, modified
    it to work on sparc, and then done the extra work to make it generically
    useful. This then expands on that work by making x86 use that generic
    version, completing the circle.

    But more importantly, it fixes up the word-at-a-time interfaces so that
    it's now easy to also support things like strnlen_user(), and pretty
    much most random string functions.

    David reports that it all works fine on sparc, and Jonas Bonn reported
    that an earlier version of this worked on OpenRISC too. It's pretty
    easy for architectures to add support for this and just replace their
    private versions with the generic code.

    * generic-string-functions:
    sparc: use the new generic strnlen_user() function
    x86: use the new generic strnlen_user() function
    lib: add generic strnlen_user() function
    word-at-a-time: make the interfaces truly generic
    x86: use generic strncpy_from_user routine

    Linus Torvalds
     
  • This changes the interfaces in to be a bit more
    complicated, but a lot more generic.

    In particular, it allows us to really do the operations efficiently on
    both little-endian and big-endian machines, pretty much regardless of
    machine details. For example, if you can rely on a fast population
    count instruction on your architecture, this will allow you to make your
    optimized file with that.

    NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
    not truly generic, it actually only works on big-endian. Why? Because
    on little-endian the generic algorithms are wasteful, since you can
    inevitably do better. The x86 implementation is an example of that.

    (The only truly non-generic part of the asm-generic implementation is
    the "find_zero()" function, and you could make a little-endian version
    of it. And if the Kbuild infrastructure allowed us to pick a particular
    header file, that would be lovely)

    The functions are as follows:

    - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
    uses.

    - has_zero(): take a word, and determine if it has a zero byte in it.
    It gets the word, the pointer to the constant pool, and a pointer to
    an intermediate "data" field it can set.

    This is the "quick-and-dirty" zero tester: it's what is run inside
    the hot loops.

    - "prep_zero_mask()": take the word, the data that has_zero() produced,
    and the constant pool, and generate an *exact* mask of which byte had
    the first zero. This is run directly *outside* the loop, and allows
    the "has_zero()" function to answer the "is there a zero byte"
    question without necessarily getting exactly *which* byte is the
    first one to contain a zero.

    If you do multiple byte lookups concurrently (eg "hash_name()", which
    looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
    phase, the result of those can be or'ed together to get the "either
    or" case.

    - The result from "prep_zero_mask()" can then be fed into "find_zero()"
    (to find the byte offset of the first byte that was zero) or into
    "zero_bytemask()" (to find the bytemask of the bytes preceding the
    zero byte).

    The existence of zero_bytemask() is optional, and is not necessary
    for the normal string routines. But dentry name hashing needs it, so
    if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
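
    The division of labour between these helpers can be illustrated with a
    small user-space sketch. This is not the kernel's code: the wa_ names,
    the struct layout, the explicit little-endian load, and the padded-buffer
    string-length routine are all assumptions made for illustration. It only
    shows the shape described above: a cheap test inside the loop, and the
    exact work done once outside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the constant pool; not the kernel's definition. */
struct wa_consts {
    uint64_t one_bits;   /* 0x01 repeated in every byte */
    uint64_t high_bits;  /* 0x80 repeated in every byte */
};

static const struct wa_consts WA = {
    0x0101010101010101ULL, 0x8080808080808080ULL
};

/* Assemble 8 bytes in little-endian order so the sketch behaves the same on any host. */
static uint64_t wa_load_le(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Quick test for the hot loop: nonzero if and only if some byte of a is zero.
 * Bits above the first zero byte may be false positives caused by borrows. */
static uint64_t wa_has_zero(uint64_t a, uint64_t *data, const struct wa_consts *c)
{
    *data = (a - c->one_bits) & ~a & c->high_bits;
    return *data;
}

/* Exact step, run once outside the loop: keep only the lowest set bit,
 * which marks the first zero byte. a and c are unused in this variant. */
static uint64_t wa_prep_zero_mask(uint64_t a, uint64_t data, const struct wa_consts *c)
{
    (void)a;
    (void)c;
    return data & (~data + 1);
}

/* Byte offset of the first marked byte; mask must be nonzero. */
static size_t wa_find_zero(uint64_t mask)
{
    size_t n = 0;
    assert(mask != 0);
    while (n < 8 && !(mask & 0x80)) {
        mask >>= 8;
        n++;
    }
    return n;
}

/* Mask covering the bytes that precede the first marked byte. */
static uint64_t wa_zero_bytemask(uint64_t mask)
{
    return ((mask - 1) & ~mask) >> 7;
}

/* Length of the NUL-terminated string in buf, scanning nwords 8-byte words.
 * buf must hold at least 8 * nwords readable bytes. */
static size_t wa_strnlen_words(const unsigned char *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        uint64_t data;
        uint64_t a = wa_load_le(buf + 8 * i);
        if (wa_has_zero(a, &data, &WA)) {
            uint64_t mask = wa_prep_zero_mask(a, data, &WA);
            return 8 * i + wa_find_zero(mask);
        }
    }
    return 8 * nwords;
}
```

    In this sketch, two prepared masks (say, one for NUL and one for '/')
    can be or'ed together before calling wa_find_zero() or
    wa_zero_bytemask(), and the earlier of the two bytes wins.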

    This changes the generic strncpy_from_user() function and the dentry
    hashing functions to use these modified word-at-a-time interfaces. This
    gets us back to the optimized state of the x86 strncpy that we lost in
    the previous commit when moving over to the generic version.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 May, 2012

3 commits

  • Pull tile updates from Chris Metcalf:
    "These changes cover a range of new arch/tile features and
    optimizations. They've been through LKML review and on linux-next for
    a month or so. There's also one bug-fix that just missed 3.4, which
    I've marked for stable."

    Fixed up trivial conflict in arch/tile/Kconfig (newly added tile Kconfig
    entries clashing with the generic timer/clockevents changes).

    * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    tile: default to tilegx_defconfig for ARCH=tile
    tile: fix bug where fls(0) was not returning 0
    arch/tile: mark TILEGX as not EXPERIMENTAL
    tile/mm/fault.c: Port OOM changes to handle_page_fault
    arch/tile: add descriptive text if the kernel reports a bad trap
    arch/tile: allow querying cpu module information from the hypervisor
    arch/tile: fix hardwall for tilegx and generalize for idn and ipi
    arch/tile: support multiple huge page sizes dynamically
    mm: add new arch_make_huge_pte() method for tile support
    arch/tile: support kexec() for tilegx
    arch/tile: support header for cacheflush() syscall
    arch/tile: Allow tilegx to build with either 16K or 64K page size
    arch/tile: optimize get_user/put_user and friends
    arch/tile: support building big-endian kernel
    arch/tile: allow building Linux with transparent huge pages enabled
    arch/tile: use interrupt critical sections less

    Linus Torvalds
     
  • The change adds some infrastructure for managing tile pmd's more generally,
    using pte_pmd() and pmd_pte() methods to translate pmd values to and
    from ptes, since on TILEPro a pmd is really just a nested structure
    holding a pgd (aka pte). Several existing pmd methods are moved into
    this framework, and a whole raft of additional pmd accessors are defined
    that are used by the transparent hugepage framework.

    The tile PTE now has a "client2" bit. The bit is used to indicate a
    transparent huge page is in the process of being split into subpages.

    This change also fixes a generic bug where the return value of the
    generic pmdp_splitting_flush() was incorrect.

    Signed-off-by: Chris Metcalf

    Chris Metcalf
     
  • Pull CMA and ARM DMA-mapping updates from Marek Szyprowski:
    "These patches contain two major updates for DMA mapping subsystem
    (mainly for ARM architecture). First one is Contiguous Memory
    Allocator (CMA) which makes it possible for device drivers to allocate
    big contiguous chunks of memory after the system has booted.

    The main difference from similar frameworks is that CMA allows the
    memory region reserved for big chunk allocation to be transparently
    reused as system memory, so no memory is wasted when no big chunk is
    allocated. Once the alloc request is issued, the framework migrates
    system pages to create space for the required big chunk of physically
    contiguous memory.

    For more information one can refer to nice LWN articles:

    - 'A reworked contiguous memory allocator':
    http://lwn.net/Articles/447405/

    - 'CMA and ARM':
    http://lwn.net/Articles/450286/

    - 'A deep dive into CMA':
    http://lwn.net/Articles/486301/

    - and the following thread with the patches and links to all previous
    versions:
    https://lkml.org/lkml/2012/4/3/204

    The main client for this new framework is the ARM DMA-mapping subsystem.

    The second part provides a complete redesign of the ARM DMA-mapping
    subsystem. The core implementation has been changed to use the common
    struct dma_map_ops based infrastructure with the recent updates for
    new dma attributes merged in v3.4-rc2. This makes it possible to use
    more than one implementation of dma-mapping calls and to change/select
    them on a per-struct-device basis. The first client of this new
    infrastructure is the dmabounce implementation, which has been
    completely cut out of the core, common code.

    The last patch of this redesign update introduces a new, experimental
    implementation of dma-mapping calls on top of the generic IOMMU framework.
    This lets an ARM sub-platform transparently use an IOMMU for DMA-mapping
    calls if it provides the required IOMMU hardware.

    For more information please refer to the following thread:
    http://www.spinics.net/lists/arm-kernel/msg175729.html

    The last patch merges changes from both updates and provides a
    resolution for the conflicts which cannot be avoided when patches have
    been applied on the same files (mainly arch/arm/mm/dma-mapping.c)."

    Acked by Andrew Morton:
    "Yup, this one please. It's had much work, plenty of review and I
    think even Russell is happy with it."

    * 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits)
    ARM: dma-mapping: use PMD size for section unmap
    cma: fix migration mode
    ARM: integrate CMA with DMA-mapping subsystem
    X86: integrate CMA with DMA-mapping subsystem
    drivers: add Contiguous Memory Allocator
    mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
    mm: extract reclaim code from __alloc_pages_direct_reclaim()
    mm: Serialize access to min_free_kbytes
    mm: page_isolation: MIGRATE_CMA isolation functions added
    mm: mmzone: MIGRATE_CMA migration type added
    mm: page_alloc: change fallbacks array handling
    mm: page_alloc: introduce alloc_contig_range()
    mm: compaction: export some of the functions
    mm: compaction: introduce isolate_freepages_range()
    mm: compaction: introduce map_pages()
    mm: compaction: introduce isolate_migratepages_range()
    mm: page_alloc: remove trailing whitespace
    ARM: dma-mapping: add support for IOMMU mapper
    ARM: dma-mapping: use alloc, mmap, free from dma_ops
    ARM: dma-mapping: remove redundant code and do the cleanup
    ...

    Conflicts:
    arch/x86/include/asm/dma-mapping.h

    Linus Torvalds
     

25 May, 2012

2 commits

  • Pull KVM changes from Avi Kivity:
    "Changes include additional instruction emulation, page-crossing MMIO,
    faster dirty logging, preventing the watchdog from killing a stopped
    guest, module autoload, a new MSI ABI, and some minor optimizations
    and fixes. Outside x86 we have a small s390 and a very large ppc
    update.

    Regarding the new (for kvm) rebaseless workflow, some of the patches
    that were merged before we switched trees had to be rebased, while
    others are true pulls. In either case the signoffs should be correct
    now."

    Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
    arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.

    I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
    check effectively twice (it was done differently in two different
    commits), but better safe than sorry ;)

    * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
    KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
    KVM: s390: onereg for timer related registers
    KVM: s390: epoch difference and TOD programmable field
    KVM: s390: KVM_GET/SET_ONEREG for s390
    KVM: s390: add capability indicating COW support
    KVM: Fix mmu_reload() clash with nested vmx event injection
    KVM: MMU: Don't use RCU for lockless shadow walking
    KVM: VMX: Optimize %ds, %es reload
    KVM: VMX: Fix %ds/%es clobber
    KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
    KVM: VMX: unlike vmcs on fail path
    KVM: PPC: Emulator: clean up SPR reads and writes
    KVM: PPC: Emulator: clean up instruction parsing
    kvm/powerpc: Add new ioctl to retreive server MMU infos
    kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
    KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
    KVM: PPC: Book3S: Enable IRQs during exit handling
    KVM: PPC: Fix PR KVM on POWER7 bare metal
    KVM: PPC: Fix stbux emulation
    KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
    ...

    Linus Torvalds
     
  • Pull GPIO driver changes from Grant Likely:
    "Lots of gpio changes, both to core code and drivers.

    Changes do touch architecture code to remove the need for separate
    arm/gpio.h includes in most architectures.

    Some new drivers are added, and a number of gpio drivers are converted
    to use irq_domains for gpio inputs used as interrupts. Device tree
    support has been amended to allow multiple gpio_chips to use the same
    device tree node.

    Remaining changes are primarily bug fixes."

    * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6: (33 commits)
    gpio/generic: initialize basic_mmio_gpio shadow variables properly
    gpiolib: Remove 'const' from data argument of gpiochip_find()
    gpio/rc5t583: add gpio driver for RICOH PMIC RC5T583
    gpiolib: quiet gpiochip_add boot message noise
    gpio: mpc8xxx: Prevent NULL pointer deref in demux handler
    gpio/lpc32xx: Add device tree support
    gpio: Adjust of_xlate API to support multiple GPIO chips
    gpiolib: Implement devm_gpio_request_one()
    gpio-mcp23s08: dbg_show: fix pullup configuration display
    Add support for TCA6424A
    gpio/omap: (re)fix wakeups on level-triggered GPIOs
    gpio/omap: fix broken context restore for non-OFF mode transitions
    gpio/omap: fix missing check in *_runtime_suspend()
    gpio/omap: remove cpu_is_omapxxxx() checks from *_runtime_resume()
    gpio/omap: remove suspend/resume callbacks
    gpio/omap: remove retrigger variable in gpio_irq_handler
    gpio/omap: remove saved_wakeup field from struct gpio_bank
    gpio/omap: remove suspend_wakeup field from struct gpio_bank
    gpio/omap: remove saved_fallingdetect, saved_risingdetect
    gpio/omap: remove virtual_irq_start variable
    ...

    Conflicts:
    drivers/gpio/gpio-samsung.c

    Linus Torvalds
     

23 May, 2012

1 commit

  • Pull perf changes from Ingo Molnar:
    "Lots of changes:

    - (much) improved assembly annotation support in perf report, with
    jump visualization, searching, navigation, visual output
    improvements and more.

    - kernel support for AMD IBS PMU hardware features. Notably 'perf
    record -e cycles:p' and 'perf top -e cycles:p' should work without
    skid now, like PEBS does on the Intel side, because it takes
    advantage of IBS transparently.

    - the libtracevents library: it is the first step towards unifying
    tracing tooling and perf, and it also gives a tracing library for
    external tools like powertop to rely on.

    - infrastructure: various improvements and refactoring of the UI
    modules and related code

    - infrastructure: cleanup and simplification of the profiling
    targets code (--uid, --pid, --tid, --cpu, --all-cpus, etc.)

    - tons of robustness fixes all around

    - various ftrace updates: speedups, cleanups, robustness
    improvements.

    - typing 'make' in tools/ will now give you a menu of projects to
    build and a short help text to explain what each does.

    - ... and lots of other changes I forgot to list.

    The perf record make bzImage + perf report regression you reported
    should be fixed."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (166 commits)
    tracing: Remove kernel_lock annotations
    tracing: Fix initial buffer_size_kb state
    ring-buffer: Merge separate resize loops
    perf evsel: Create events initially disabled -- again
    perf tools: Split term type into value type and term type
    perf hists: Fix callchain ip printf format
    perf target: Add uses_mmap field
    ftrace: Remove selecting FRAME_POINTER with FUNCTION_TRACER
    ftrace/x86: Have x86 ftrace use the ftrace_modify_all_code()
    ftrace: Make ftrace_modify_all_code() global for archs to use
    ftrace: Return record ip addr for ftrace_location()
    ftrace: Consolidate ftrace_location() and ftrace_text_reserved()
    ftrace: Speed up search by skipping pages by address
    ftrace: Remove extra helper functions
    ftrace: Sort all function addresses, not just per page
    tracing: change CPU ring buffer state from tracing_cpumask
    tracing: Check return value of tracing_dentry_percpu()
    ring-buffer: Reset head page before running self test
    ring-buffer: Add integrity check at end of iter read
    ring-buffer: Make addition of pages in ring buffer atomic
    ...

    Linus Torvalds
     

22 May, 2012

3 commits

  • Conflicts:
    arch/arm/Kconfig
    arch/arm/mm/dma-mapping.c

    Signed-off-by: Marek Szyprowski

    Marek Szyprowski
     
  • Pull security subsystem updates from James Morris:
    "New notable features:
    - The seccomp work from Will Drewry
    - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
    - Longer security labels for Smack from Casey Schaufler
    - Additional ptrace restriction modes for Yama by Kees Cook"

    Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
    apparmor: fix long path failure due to disconnected path
    apparmor: fix profile lookup for unconfined
    ima: fix filename hint to reflect script interpreter name
    KEYS: Don't check for NULL key pointer in key_validate()
    Smack: allow for significantly longer Smack labels v4
    gfp flags for security_inode_alloc()?
    Smack: recursive tramsmute
    Yama: replace capable() with ns_capable()
    TOMOYO: Accept manager programs which do not start with / .
    KEYS: Add invalidation support
    KEYS: Do LRU discard in full keyrings
    KEYS: Permit in-place link replacement in keyring list
    KEYS: Perform RCU synchronisation on keys prior to key destruction
    KEYS: Announce key type (un)registration
    KEYS: Reorganise keys Makefile
    KEYS: Move the key config into security/keys/Kconfig
    KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
    Yama: remove an unused variable
    samples/seccomp: fix dependencies on arch macros
    Yama: add additional ptrace scopes
    ...

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    - Host bridge cleanups from Yinghai
    - Disable Bus Master bit on PCI device shutdown (kexec-related)
    - Stratus ftServer fix
    - pci_dev_reset() locking fix
    - IvyBridge graphics erratum workaround

    * tag 'pci-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (21 commits)
    microblaze/PCI: fix "io_offset undeclared" error
    x86/PCI: only check for spinlock being held in SMP kernels
    resources: add resource_overlaps()
    PCI: fix uninitialized variable 'cap_mask'
    MAINTAINERS: update PCI git tree and patchwork
    PCI: disable Bus Master on PCI device shutdown
    PCI: work around IvyBridge internal graphics FLR erratum
    x86/PCI: fix unused variable warning in amd_bus.c
    PCI: move mutex locking out of pci_dev_reset function
    PCI: work around Stratus ftServer broken PCIe hierarchy
    x86/PCI: merge pcibios_scan_root() and pci_scan_bus_on_node()
    x86/PCI: dynamically allocate pci_root_info for native host bridge drivers
    x86/PCI: embed pci_sysdata into pci_root_info on ACPI path
    x86/PCI: embed name into pci_root_info struct
    x86/PCI: add host bridge resource release for _CRS path
    x86/PCI: refactor get_current_resources()
    PCI: add host bridge release support
    PCI: add generic device into pci_host_bridge struct
    PCI: rename pci_host_bridge() to find_pci_root_bridge()
    x86/PCI: fix memleak with get_current_resources()
    ...

    Linus Torvalds
     

21 May, 2012

1 commit