19 Aug, 2014

2 commits

  • commit 197725de65477bc8509b41388157c1a2283542bb upstream.

    Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
    build which had broken due to the non-existence of init_espfix_bsp()
    in UML: since UML uses its own Kconfig, this option does not appear in
    the UML build.

    This also makes it possible to make support for 16-bit segments a
    configuration option, for the people who want to minimize the size of
    the kernel.

    Reported-by: Ingo Molnar
    Signed-off-by: H. Peter Anvin
    Cc: Richard Weinberger
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Signed-off-by: Jiri Slaby

    H. Peter Anvin
     
  • commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.

    The IRET instruction, when returning to a 16-bit segment, only
    restores the bottom 16 bits of the user space stack pointer. This
    causes some 16-bit software to break, but it also leaks kernel state
    to user space. We have a software workaround for that ("espfix") for
    the 32-bit kernel, but it relies on a nonzero stack segment base which
    is not available in 64-bit mode.

    In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

    we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
    the logic that 16-bit support is crippled on 64-bit kernels anyway (no
    V86 support), but it turns out that people are doing stuff like
    running old Win16 binaries under Wine and expect it to work.

    This works around this by creating percpu "ministacks", each of which
    is mapped 2^16 times 64K apart. When we detect that the return SS is
    on the LDT, we copy the IRET frame to the ministack and use the
    relevant alias to return to userspace. The ministacks are mapped
    readonly, so if IRET faults we promote #GP to #DF which is an IST
    vector and thus has its own stack; we then do the fixup in the #DF
    handler.

    (Making #GP an IST exception would make the msr_safe functions unsafe
    in NMI/MC context, and quite possibly have other effects.)

    Special thanks to:

    - Andy Lutomirski, for the suggestion of using very small stack slots
    and copy (as opposed to map) the IRET frame there, and for the
    suggestion to mark them readonly and let the fault promote to #DF.
    - Konrad Wilk for paravirt fixup and testing.
    - Borislav Petkov for testing help and useful comments.

    Reported-by: Brian Gerst
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Cc: Konrad Rzeszutek Wilk
    Cc: Borislav Petkov
    Cc: Andrew Lutomriski
    Cc: Linus Torvalds
    Cc: Dirk Hohndel
    Cc: Arjan van de Ven
    Cc: comex
    Cc: Alexander van Heukelum
    Cc: Boris Ostrovsky
    Cc: # consider after upstream merge
    Signed-off-by: Jiri Slaby

    H. Peter Anvin
     

13 Apr, 2014

1 commit

  • commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream.

    If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
    is no runtime check necessary, allow to skip the test within futex_init().

    This allows to get rid of some code which would always give the same result,
    and also allows the compiler to optimize a couple of if statements away.

    Signed-off-by: Heiko Carstens
    Cc: Finn Thain
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
    Signed-off-by: Thomas Gleixner
    [geert: Backported to v3.10..v3.13]
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Jiri Slaby

    Heiko Carstens
     

11 Oct, 2013

1 commit


23 Sep, 2013

1 commit


15 Sep, 2013

1 commit

  • Pull SLAB update from Pekka Enberg:
    "Nothing terribly exciting here apart from Christoph's kmalloc
    unification patches that brings sl[aou]b implementations closer to
    each other"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slab: Use correct GFP_DMA constant
    slub: remove verify_mem_not_deleted()
    mm/sl[aou]b: Move kmallocXXX functions to common code
    mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
    mm/slub.c: beautify code for removing redundancy 'break' statement.
    slub: Remove unnecessary page NULL check
    slub: don't use cpu partial pages on UP
    mm/slub: beautify code for 80 column limitation and tab alignment
    mm/slub: remove 'per_cpu' which is useless variable

    Linus Torvalds
     

12 Sep, 2013

3 commits

  • Command line option rootfstype=ramfs to obtain old initramfs behavior, and
    use ramfs instead of tmpfs for stub when root= defined (for cosmetic
    reasons).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • Conditionally call the appropriate fs_init function and fill_super
    functions. Add a use once guard to shmem_init() to simply succeed on a
    second call.

    (Note that IS_ENABLED() is a compile time constant so dead code
    elimination removes unused function calls when CONFIG_TMPFS is disabled.)

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • When the rootfs code was a wrapper around ramfs, having them in the same
    file made sense. Now that it can wrap another filesystem type, move it in
    with the init code instead.

    This also allows a subsequent patch to access rootfstype= command line
    arg.

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     

11 Sep, 2013

1 commit

  • Pull kconfig updates from Michal Marek:
    "This is the kconfig part of kbuild for v3.12-rc1:
    - post-3.11 search code fixes and micro-optimizations
    - CONFIG_MODULES is no longer a special case; this is needed to
    eventually fix the bug that using KCONFIG_ALLCONFIG breaks
    allmodconfig
    - long long is used to store hex and int values
    - make silentoldconfig no longer warns when a symbol changes from
    tristate to bool (it's a job for make oldconfig)
    - scripts/diffconfig updated to work with newer Pythons
    - scripts/config does not rely on GNU sed extensions"

    * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kconfig: do not allow more than one symbol to have 'option modules'
    kconfig: regenerate bison parser
    kconfig: do not special-case 'MODULES' symbol
    diffconfig: Update script to support python versions 2.5 through 3.3
    diffconfig: Gracefully exit if the default config files are not present
    modules: do not depend on kconfig to set 'modules' option to symbol MODULES
    kconfig: silence warning when parsing auto.conf when a symbol has changed type
    scripts/config: use sed's POSIX interface
    kconfig: switch to "long long" for sanity
    kconfig: simplify symbol-search code
    kconfig: don't allocate n+1 elements in temporary array
    kconfig: minor style fixes in symbol-search code
    kconfig/[mn]conf: shorten title in search-box
    kconfig: avoid multiple calls to strlen
    Documentation/kconfig: more concise and straightforward search explanation

    Linus Torvalds
     

10 Sep, 2013

1 commit

  • Pull xfs updates from Ben Myers:
    "For 3.12-rc1 there are a number of bugfixes in addition to work to
    ease usage of shared code between libxfs and the kernel, the rest of
    the work to enable project and group quotas to be used simultaneously,
    performance optimisations in the log and the CIL, directory entry file
    type support, fixes for log space reservations, some spelling/grammar
    cleanups, and the addition of user namespace support.

    - introduce readahead to log recovery
    - add directory entry file type support
    - fix a number of spelling errors in comments
    - introduce new Q_XGETQSTATV quotactl for project quotas
    - add USER_NS support
    - log space reservation rework
    - CIL optimisations
    - kernel/userspace libxfs rework"

    * tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
    xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
    xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
    Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
    xfs: finish removing IOP_* macros.
    xfs: inode log reservations are too small
    xfs: check correct status variable for xfs_inobt_get_rec() call
    xfs: inode buffers may not be valid during recovery readahead
    xfs: check LSN ordering for v5 superblocks during recovery
    xfs: btree block LSN escaping to disk uninitialised
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
    xfs: fix bad dquot buffer size in log recovery readahead
    xfs: don't account buffer cancellation during log recovery readahead
    xfs: check for underflow in xfs_iformat_fork()
    xfs: xfs_dir3_sfe_put_ino can be static
    xfs: introduce object readahead to log recovery
    xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
    xfs: Register hotcpu notifier after initialization
    xfs: add xfs sb v4 support for dirent filetype field
    xfs: Add write support for dirent filetype field
    xfs: Add read-only support for dirent filetype field
    ...

    Linus Torvalds
     

05 Sep, 2013

1 commit

  • Pull timers/nohz changes from Ingo Molnar:
    "It mostly contains fixes and full dynticks off-case optimizations, by
    Frederic Weisbecker"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    nohz: Include local CPU in full dynticks global kick
    nohz: Optimize full dynticks's sched hooks with static keys
    nohz: Optimize full dynticks state checks with static keys
    nohz: Rename a few state variables
    vtime: Always debug check snapshot source _before_ updating it
    vtime: Always scale generic vtime accounting results
    vtime: Optimize full dynticks accounting off case with static keys
    vtime: Describe overriden functions in dedicated arch headers
    m68k: hardirq_count() only need preempt_mask.h
    hardirq: Split preempt count mask definitions
    context_tracking: Split low level state headers
    vtime: Fix racy cputime delta update
    vtime: Remove a few unneeded generic vtime state checks
    context_tracking: User/kernel broundary cross trace events
    context_tracking: Optimize context switch off case with static keys
    context_tracking: Optimize guest APIs off case with static key
    context_tracking: Optimize main APIs off case with static key
    context_tracking: Ground setup for static key use
    context_tracking: Remove full dynticks' hacky dependency on wide context tracking
    nohz: Only enable context tracking on full dynticks CPUs
    ...

    Linus Torvalds
     

03 Sep, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/611.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/619.

    * Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is
    to allow the timekeeping CPU to shut off its tick when
    all other CPUs are idle. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/648.

    * Improve rcutorture test coverage. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/675.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

24 Aug, 2013

1 commit

  • The swapaccount kernel parameter without any values has been removed by
    commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but
    it seems that we didn't get rid of all the left overs.

    Make sure that menuconfig help text and kernel-parameters.txt are clear
    about value for the paramter and remove the stalled comment which is not
    very much useful on its own.

    Signed-off-by: Michal Hocko
    Reported-by: Gergely Risko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 Aug, 2013

1 commit

  • TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built,
    but only TREE_RCU selects IRQ_WORK, which can result in an undefined
    reference to irq_work_queue for some (random) configs:

    kernel/built-in.o In function `rcu_start_gp_advanced':
    kernel/rcutree.c:1564: undefined reference to `irq_work_queue'

    Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this.

    Signed-off-by: James Hogan
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    James Hogan
     

16 Aug, 2013

2 commits

  • Currently, the MODULES symbol is special-cased in different places in the
    kconfig language. For example, if no symbol is defined to enable tristates,
    then kconfig looks up for a symbol named 'MODULES', and forces the 'modules'
    option onto that symbol.

    This causes problems as such:
    - since MODULES is special-cased, reading the configuration with
    KCONFIG_ALLCONFIG set will forcibly set MODULES to be 'valid' (ie.
    it has a valid value), when no such value was previously set. So
    MODULES defaults to 'n' unless it is present in KCONFIG_ALLCONFIG
    - other third-party projects may decide that 'MODULES' plays a different
    role for them

    This has been exposed by cset #cfa98f2e:
    kconfig: do not override symbols already set
    and reported by Stephen in:
    http://marc.info/?l=linux-next&m=137592137915234&w=2

    As suggested by Sam, we explicitly define the MODULES symbol to be the
    tristate-enabler. This will allow us to drop special-casing of MODULES
    in the kconfig language, later.

    (Note: this patch is not a fix to Stephen's issue, just a first step).

    Reported-by: Stephen Rothwell
    Signed-off-by: yann.morin.1998@free.fr
    Cc: Stephen Rothwell
    Cc: Sam Ravnborg
    Cc: Michal Marek
    Cc: Kevin Hilman
    Cc: sedat.dilek@gmail.com
    Cc: Theodore Ts'o

    Yann E. MORIN
     
  • Reviewed-by: Dave Chinner
    Reviewed-by: Gao feng
    Signed-off-by: Dwight Engen
    Signed-off-by: Ben Myers

    Dwight Engen
     

14 Aug, 2013

1 commit

  • Prepare for using a static key in the context tracking subsystem.
    This will help optimizing the off case on its many users:

    * user_enter, user_exit, exception_enter, exception_exit, guest_enter,
    guest_exit, vtime_*()

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

13 Aug, 2013

2 commits

  • cpu partial pages are used to avoid contention which does not exist in
    the UP case. So let SLUB_CPU_PARTIAL depend on SMP.

    Acked-by: Christoph Lameter
    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Pekka Enberg

    Uwe Kleine-König
     
  • Now that the full dynticks subsystem only enables the context tracking
    on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE

    This dependency was a hack to enable the context tracking widely for the
    full dynticks susbsystem until the latter becomes able to enable it in a
    more CPU-finegrained fashion.

    Now CONTEXT_TRACKING_FORCE only stands for testing on archs that
    work on support for the context tracking while full dynticks can't be
    used yet due to unmet dependencies. It simulates a system where all CPUs
    are full dynticks so that RCU user extended quiescent states and dynticks
    cputime accounting can be tested on the given arch.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

15 Jul, 2013

2 commits

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Add support for extracting LZ4-compressed kernel images, as well as
    LZ4-compressed ramdisk images in the kernel boot process.

    Signed-off-by: Kyungsik Lee
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Russell King
    Cc: Borislav Petkov
    Cc: Florian Fainelli
    Cc: Yann Collet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyungsik Lee
     

08 Jul, 2013

1 commit

  • CPU partial support can introduce level of indeterminism that is not
    wanted in certain context (like a realtime kernel). Make it
    configurable.

    This patch is based on Christoph Lameter's "slub: Make cpu partial slab
    support configurable V2".

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

07 Jul, 2013

1 commit

  • Pull timer core updates from Thomas Gleixner:
    "The timer changes contain:

    - posix timer code consolidation and fixes for odd corner cases

    - sched_clock implementation moved from ARM to core code to avoid
    duplication by other architectures

    - alarm timer updates

    - clocksource and clockevents unregistration facilities

    - clocksource/events support for new hardware

    - precise nanoseconds RTC readout (Xen feature)

    - generic support for Xen suspend/resume oddities

    - the usual lot of fixes and cleanups all over the place

    The parts which touch other areas (ARM/XEN) have been coordinated with
    the relevant maintainers. Though this results in an handful of
    trivial to solve merge conflicts, which we preferred over nasty cross
    tree merge dependencies.

    The patches which have been committed in the last few days are bug
    fixes plus the posix timer lot. The latter was in akpms queue and
    next for quite some time; they just got forgotten and Frederic
    collected them last minute."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
    hrtimer: Remove unused variable
    hrtimers: Move SMP function call to thread context
    clocksource: Reselect clocksource when watchdog validated high-res capability
    posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
    posix_timers: fix racy timer delta caching on task exit
    posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
    selftests: add basic posix timers selftests
    posix_cpu_timers: consolidate expired timers check
    posix_cpu_timers: consolidate timer list cleanups
    posix_cpu_timer: consolidate expiry time type
    tick: Sanitize broadcast control logic
    tick: Prevent uncontrolled switch to oneshot mode
    tick: Make oneshot broadcast robust vs. CPU offlining
    x86: xen: Sync the CMOS RTC as well as the Xen wallclock
    x86: xen: Sync the wallclock when the system time is set
    timekeeping: Indicate that clock was set in the pvclock gtod notifier
    timekeeping: Pass flags instead of multiple bools to timekeeping_update()
    xen: Remove clock_was_set() call in the resume path
    hrtimers: Support resuming with two or more CPUs online (but stopped)
    timer: Fix jiffies wrap behavior of round_jiffies_common()
    ...

    Linus Torvalds
     

05 Jul, 2013

1 commit


04 Jul, 2013

3 commits

  • Trivial, but it really looks better.

    Signed-off-by: Toralf Förster
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toralf Förster
     
  • do_one_initcall() uses a 64 byte string buffer to save a message. This
    buffer is declared static and is only used at boot up and when a module
    is loaded. As 64 bytes is very small, and this function has very limited
    scope, there's no reason to waste permanent memory with this string and
    not just simply put it on the stack.

    Signed-off-by: Steven Rostedt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Now there are only 2 members in struct page_cgroup. Update config MEMCG
    description accordingly.

    Signed-off-by: Sergey Dyasly
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Dyasly
     

03 Jul, 2013

2 commits

  • Pull perf updates from Ingo Molnar:
    "Kernel improvements:

    - watchdog driver improvements by Li Zefan
    - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
    - event multiplexing via hrtimers and other improvements by Stephane
    Eranian
    - kernel stack use optimization by Andrew Hunter
    - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
    - NMI handling rate-limits by Dave Hansen
    - various hw_breakpoint fixes by Oleg Nesterov
    - hw_breakpoint overflow period sampling and related signal handling
    fixes by Jiri Olsa
    - Intel Haswell PMU support by Andi Kleen

    Tooling improvements:

    - Reset SIGTERM handler in workload child process, fix from David
    Ahern.
    - Makefile reorganization, prep work for Kconfig patches, from Jiri
    Olsa.
    - Add automated make test suite, from Jiri Olsa.
    - Add --percent-limit option to 'top' and 'report', from Namhyung
    Kim.
    - Sorting improvements, from Namhyung Kim.
    - Expand definition of sysfs format attribute, from Michael Ellerman.

    Tooling fixes:

    - 'perf tests' fixes from Jiri Olsa.
    - Make Power7 CPI stack events available in sysfs, from Sukadev
    Bhattiprolu.
    - Handle death by SIGTERM in 'perf record', fix from David Ahern.
    - Fix printing of perf_event_paranoid message, from David Ahern.
    - Handle realloc failures in 'perf kvm', from David Ahern.
    - Fix divide by 0 in variance, from David Ahern.
    - Save parent pid in thread struct, from David Ahern.
    - Handle JITed code in shared memory, from Andi Kleen.
    - Fixes for 'perf diff', from Jiri Olsa.
    - Remove some unused struct members, from Jiri Olsa.
    - Add missing liblk.a dependency for python/perf.so, fix from Jiri
    Olsa.
    - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
    - No need to do locking when adding hists in perf report, only 'top'
    needs that, from Namhyung Kim.
    - Fix alignment of symbol column in in the hists browser (top,
    report) when -v is given, from NAmhyung Kim.
    - Fix 'perf top' -E option behavior, from Namhyung Kim.
    - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
    - Fix compile errors in bp_signal 'perf test', from Sukadev
    Bhattiprolu.

    ... and more things"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
    perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
    perf/x86: Fix shared register mutual exclusion enforcement
    perf/x86/intel: Support full width counting
    x86: Add NMI duration tracepoints
    perf: Drop sample rate when sampling is too slow
    x86: Warn when NMI handlers take large amounts of time
    hw_breakpoint: Introduce "struct bp_cpuinfo"
    hw_breakpoint: Simplify *register_wide_hw_breakpoint()
    hw_breakpoint: Introduce cpumask_of_bp()
    hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
    hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
    perf/x86/intel: Add mem-loads/stores support for Haswell
    perf/x86/intel: Support Haswell/v4 LBR format
    perf/x86/intel: Move NMI clearing to end of PMI handler
    perf/x86/intel: Add Haswell PEBS support
    perf/x86/intel: Add simple Haswell PMU support
    perf/x86/intel: Add Haswell PEBS record support
    perf/x86/intel: Fix sparse warning
    perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
    perf/x86/amd: Add IOMMU Performance Counter resource management
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The major changes:

    - Simplify RCU's grace-period and callback processing based on the new
    numbering for callbacks.

    - Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
    single-CPU low-latency systems.

    - SRCU-related changes and fixes.

    - Miscellaneous fixes, including converting a few remaining printk()
    calls to pr_*().

    - Documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
    rcu: Shrink TINY_RCU by moving exit_rcu()
    rcu: Remove TINY_PREEMPT_RCU tracing documentation
    rcu: Consolidate rcutiny_plugin.h ifdefs
    rcu: Remove rcu_preempt_note_context_switch()
    rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
    rcu: Remove check_cpu_stall_preempt()
    rcu: Simplify RCU_TINY RCU callback invocation
    rcu: Remove rcu_preempt_process_callbacks()
    rcu: Remove rcu_preempt_remove_callbacks()
    rcu: Remove rcu_preempt_check_callbacks()
    rcu: Remove show_tiny_preempt_stats()
    rcu: Remove TINY_PREEMPT_RCU
    powerpc,kvm: fix imbalance srcu_read_[un]lock()
    rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
    rcu: Apply Dave Jones's NOCB Kconfig help feedback
    rcu: Merge adjacent identical ifdefs
    rcu: Drive quiescent-state-forcing delay from HZ
    rcu: Remove "Experimental" flags
    kthread: Add kworker kthreads to OS-jitter documentation
    ...

    Linus Torvalds
     

25 Jun, 2013

1 commit

  • Some drivers can be built on more platforms than they run on. This is
    a burden for users and distributors who package a kernel. They have to
    manually deselect some (for them useless) drivers when updating their
    configs via oldconfig. And yet, sometimes it is even impossible to
    disable the drivers without patching the kernel.

    Introduce a new config option COMPILE_TEST and make all those drivers
    to depend on the platform they run on, or on the COMPILE_TEST option.
    Now, when users/distributors choose COMPILE_TEST=n they will not have
    the drivers in their allmodconfig setups, but developers still can
    compile-test them with COMPILE_TEST=y.

    Now the drivers where we use this new option:
    * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
    processors so it should depend on x86.
    * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
    * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
    systems -- which do not actually support the hardware via that
    method.
    * INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom
    device.

    [v2]
    * remove EXPERT dependency

    [gregkh - remove chipidea portion, as it's incorrect, and also doesn't
    apply to my driver-core tree]

    Signed-off-by: Jiri Slaby
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jeff Mahoney
    Cc: Alexander Shishkin
    Cc: linux-usb@vger.kernel.org
    Cc: Florian Tobias Schandinat
    Cc: linux-geode@lists.infradead.org
    Cc: linux-fbdev@vger.kernel.org
    Cc: Richard Cochran
    Cc: netdev@vger.kernel.org
    Cc: Ben Hutchings
    Cc: "Keller, Jacob E"
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     

18 Jun, 2013

1 commit


13 Jun, 2013

1 commit


11 Jun, 2013

5 commits

  • …u.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

    cbnum.2013.06.10a: Apply simplifications stemming from the new callback
    numbering.

    doc.2013.06.10a: Documentation updates.

    fixes.2013.06.10a: Miscellaneous fixes.

    srcu.2013.06.10a: Updates to SRCU.

    tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO,
    and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit
    adds a bit more detail.

    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • After a release or two, features are no longer experimental. Therefore,
    this commit removes the "Experimental" tag from them.

    Reported-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. Then it calls
    rcu_start_gp_advance, which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do a rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (See below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did a rcu_read_unlock(). The
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    Now, what looks to have happened, is that we scheduled after taking that
    first rcu_read_lock() but before taking the spin lock. When we scheduled
    back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

    CPU0 CPU1 CPU2
    ---- ---- ----

    rcu_nocb_kthread()
    lock(rq->lock);
    lock(ctx->lock);
    lock(rnp->lock);

    wake_up();

    lock(rq->lock);

    rcu_read_unlock();

    rcu_read_unlock_special();

    lock(rnp->lock);
    lock(ctx->lock);

    **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499] CPU0 CPU1
    > [12573.445599] ---- ----
    > [12573.452691] lock(&ctx->lock);
    > [12573.459799] lock(&rq->lock);
    > [12573.467010] lock(&ctx->lock);
    > [12573.474192] lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
     

04 Jun, 2013

1 commit

  • Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
    it has been basically impossible to build a kernel with CONFIG_HOTPLUG
    turned off. Remove all the remaining references to it.

    Cc: Russell King
    Cc: Doug Thompson
    Cc: Bjorn Helgaas
    Cc: Steven Whitehouse
    Cc: Arnd Bergmann
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Andrew Morton
    Signed-off-by: Stephen Rothwell
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Hans Verkuil
    Signed-off-by: Greg Kroah-Hartman

    Stephen Rothwell