15 Sep, 2013

1 commit

  • Pull SLAB update from Pekka Enberg:
    "Nothing terribly exciting here apart from Christoph's kmalloc
    unification patches that bring sl[aou]b implementations closer to
    each other"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slab: Use correct GFP_DMA constant
    slub: remove verify_mem_not_deleted()
    mm/sl[aou]b: Move kmallocXXX functions to common code
    mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
    mm/slub.c: beautify code for removing redundancy 'break' statement.
    slub: Remove unnecessary page NULL check
    slub: don't use cpu partial pages on UP
    mm/slub: beautify code for 80 column limitation and tab alignment
    mm/slub: remove 'per_cpu' which is useless variable

    Linus Torvalds
     

11 Sep, 2013

1 commit

  • Pull kconfig updates from Michal Marek:
    "This is the kconfig part of kbuild for v3.12-rc1:
    - post-3.11 search code fixes and micro-optimizations
    - CONFIG_MODULES is no longer a special case; this is needed to
    eventually fix the bug that using KCONFIG_ALLCONFIG breaks
    allmodconfig
    - long long is used to store hex and int values
    - make silentoldconfig no longer warns when a symbol changes from
    tristate to bool (it's a job for make oldconfig)
    - scripts/diffconfig updated to work with newer Pythons
    - scripts/config does not rely on GNU sed extensions"

    * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kconfig: do not allow more than one symbol to have 'option modules'
    kconfig: regenerate bison parser
    kconfig: do not special-case 'MODULES' symbol
    diffconfig: Update script to support python versions 2.5 through 3.3
    diffconfig: Gracefully exit if the default config files are not present
    modules: do not depend on kconfig to set 'modules' option to symbol MODULES
    kconfig: silence warning when parsing auto.conf when a symbol has changed type
    scripts/config: use sed's POSIX interface
    kconfig: switch to "long long" for sanity
    kconfig: simplify symbol-search code
    kconfig: don't allocate n+1 elements in temporary array
    kconfig: minor style fixes in symbol-search code
    kconfig/[mn]conf: shorten title in search-box
    kconfig: avoid multiple calls to strlen
    Documentation/kconfig: more concise and straightforward search explanation

    Linus Torvalds
     

10 Sep, 2013

1 commit

  • Pull xfs updates from Ben Myers:
    "For 3.12-rc1 there are a number of bugfixes in addition to work to
    ease usage of shared code between libxfs and the kernel, the rest of
    the work to enable project and group quotas to be used simultaneously,
    performance optimisations in the log and the CIL, directory entry file
    type support, fixes for log space reservations, some spelling/grammar
    cleanups, and the addition of user namespace support.

    - introduce readahead to log recovery
    - add directory entry file type support
    - fix a number of spelling errors in comments
    - introduce new Q_XGETQSTATV quotactl for project quotas
    - add USER_NS support
    - log space reservation rework
    - CIL optimisations
    - kernel/userspace libxfs rework"

    * tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
    xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
    xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
    Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
    xfs: finish removing IOP_* macros.
    xfs: inode log reservations are too small
    xfs: check correct status variable for xfs_inobt_get_rec() call
    xfs: inode buffers may not be valid during recovery readahead
    xfs: check LSN ordering for v5 superblocks during recovery
    xfs: btree block LSN escaping to disk uninitialised
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
    xfs: fix bad dquot buffer size in log recovery readahead
    xfs: don't account buffer cancellation during log recovery readahead
    xfs: check for underflow in xfs_iformat_fork()
    xfs: xfs_dir3_sfe_put_ino can be static
    xfs: introduce object readahead to log recovery
    xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
    xfs: Register hotcpu notifier after initialization
    xfs: add xfs sb v4 support for dirent filetype field
    xfs: Add write support for dirent filetype field
    xfs: Add read-only support for dirent filetype field
    ...

    Linus Torvalds
     

05 Sep, 2013

1 commit

  • Pull timers/nohz changes from Ingo Molnar:
    "It mostly contains fixes and full dynticks off-case optimizations, by
    Frederic Weisbecker"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    nohz: Include local CPU in full dynticks global kick
    nohz: Optimize full dynticks's sched hooks with static keys
    nohz: Optimize full dynticks state checks with static keys
    nohz: Rename a few state variables
    vtime: Always debug check snapshot source _before_ updating it
    vtime: Always scale generic vtime accounting results
    vtime: Optimize full dynticks accounting off case with static keys
    vtime: Describe overriden functions in dedicated arch headers
    m68k: hardirq_count() only need preempt_mask.h
    hardirq: Split preempt count mask definitions
    context_tracking: Split low level state headers
    vtime: Fix racy cputime delta update
    vtime: Remove a few unneeded generic vtime state checks
    context_tracking: User/kernel broundary cross trace events
    context_tracking: Optimize context switch off case with static keys
    context_tracking: Optimize guest APIs off case with static key
    context_tracking: Optimize main APIs off case with static key
    context_tracking: Ground setup for static key use
    context_tracking: Remove full dynticks' hacky dependency on wide context tracking
    nohz: Only enable context tracking on full dynticks CPUs
    ...

    Linus Torvalds
     

03 Sep, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/611.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/619.

    * Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is
    to allow the timekeeping CPU to shut off its tick when
    all other CPUs are idle. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/648.

    * Improve rcutorture test coverage. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/675.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

24 Aug, 2013

1 commit

  • The swapaccount kernel parameter without any values has been removed by
    commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter"), but
    it seems that we didn't get rid of all the leftovers.

    Make sure that the menuconfig help text and kernel-parameters.txt are
    clear about the value for the parameter, and remove the stale comment,
    which is not very useful on its own.

    Signed-off-by: Michal Hocko
    Reported-by: Gergely Risko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 Aug, 2013

1 commit

  • TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built,
    but only TREE_RCU selects IRQ_WORK, which can result in an undefined
    reference to irq_work_queue for some (random) configs:

    kernel/built-in.o: In function `rcu_start_gp_advanced':
    kernel/rcutree.c:1564: undefined reference to `irq_work_queue'

    Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this.

    Signed-off-by: James Hogan
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    James Hogan
     

16 Aug, 2013

2 commits

  • Currently, the MODULES symbol is special-cased in different places in the
    kconfig language. For example, if no symbol is defined to enable tristates,
    then kconfig looks for a symbol named 'MODULES' and forces the 'modules'
    option onto that symbol.

    This causes problems such as:
    - since MODULES is special-cased, reading the configuration with
    KCONFIG_ALLCONFIG set will forcibly set MODULES to be 'valid' (i.e.
    it has a valid value) when no such value was previously set, so
    MODULES defaults to 'n' unless it is present in KCONFIG_ALLCONFIG
    - other third-party projects may decide that 'MODULES' plays a different
    role for them

    This has been exposed by cset #cfa98f2e:
    kconfig: do not override symbols already set
    and reported by Stephen in:
    http://marc.info/?l=linux-next&m=137592137915234&w=2

    As suggested by Sam, we explicitly define the MODULES symbol to be the
    tristate-enabler. This will allow us to drop special-casing of MODULES
    in the kconfig language, later.

    (Note: this patch is not a fix to Stephen's issue, just a first step).

    Reported-by: Stephen Rothwell
    Signed-off-by: yann.morin.1998@free.fr
    Cc: Stephen Rothwell
    Cc: Sam Ravnborg
    Cc: Michal Marek
    Cc: Kevin Hilman
    Cc: sedat.dilek@gmail.com
    Cc: Theodore Ts'o

    Yann E. MORIN
     
  • Reviewed-by: Dave Chinner
    Reviewed-by: Gao feng
    Signed-off-by: Dwight Engen
    Signed-off-by: Ben Myers

    Dwight Engen
     

13 Aug, 2013

2 commits

  • CPU partial pages are used to avoid contention, which does not exist in
    the UP case, so let SLUB_CPU_PARTIAL depend on SMP.

    Acked-by: Christoph Lameter
    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Pekka Enberg

    Uwe Kleine-König
     
  • Now that the full dynticks subsystem only enables the context tracking
    on full dynticks CPUs, let's remove the dependency on
    CONTEXT_TRACKING_FORCE.

    This dependency was a hack to enable the context tracking widely for the
    full dynticks subsystem until the latter became able to enable it in a
    finer-grained, per-CPU fashion.

    Now CONTEXT_TRACKING_FORCE only exists for testing on archs that are
    gaining context tracking support while full dynticks can't be used
    there yet due to unmet dependencies. It simulates a system where all
    CPUs are full dynticks so that RCU user extended quiescent states and
    dynticks cputime accounting can be tested on the given arch.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     

15 Jul, 2013

1 commit

  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Add support for extracting LZ4-compressed kernel images, as well as
    LZ4-compressed ramdisk images in the kernel boot process.

    Signed-off-by: Kyungsik Lee
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Russell King
    Cc: Borislav Petkov
    Cc: Florian Fainelli
    Cc: Yann Collet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyungsik Lee
     

08 Jul, 2013

1 commit

  • CPU partial support can introduce a level of indeterminism that is not
    wanted in certain contexts (like a realtime kernel). Make it
    configurable.

    This patch is based on Christoph Lameter's "slub: Make cpu partial slab
    support configurable V2".

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

07 Jul, 2013

1 commit

  • Pull timer core updates from Thomas Gleixner:
    "The timer changes contain:

    - posix timer code consolidation and fixes for odd corner cases

    - sched_clock implementation moved from ARM to core code to avoid
    duplication by other architectures

    - alarm timer updates

    - clocksource and clockevents unregistration facilities

    - clocksource/events support for new hardware

    - precise nanoseconds RTC readout (Xen feature)

    - generic support for Xen suspend/resume oddities

    - the usual lot of fixes and cleanups all over the place

    The parts which touch other areas (ARM/XEN) have been coordinated with
    the relevant maintainers. This does result in a handful of
    trivial-to-solve merge conflicts, which we preferred over nasty
    cross-tree merge dependencies.

    The patches which have been committed in the last few days are bug
    fixes plus the posix timer lot. The latter were in akpm's queue and
    linux-next for quite some time; they just got forgotten and Frederic
    collected them at the last minute."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
    hrtimer: Remove unused variable
    hrtimers: Move SMP function call to thread context
    clocksource: Reselect clocksource when watchdog validated high-res capability
    posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
    posix_timers: fix racy timer delta caching on task exit
    posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
    selftests: add basic posix timers selftests
    posix_cpu_timers: consolidate expired timers check
    posix_cpu_timers: consolidate timer list cleanups
    posix_cpu_timer: consolidate expiry time type
    tick: Sanitize broadcast control logic
    tick: Prevent uncontrolled switch to oneshot mode
    tick: Make oneshot broadcast robust vs. CPU offlining
    x86: xen: Sync the CMOS RTC as well as the Xen wallclock
    x86: xen: Sync the wallclock when the system time is set
    timekeeping: Indicate that clock was set in the pvclock gtod notifier
    timekeeping: Pass flags instead of multiple bools to timekeeping_update()
    xen: Remove clock_was_set() call in the resume path
    hrtimers: Support resuming with two or more CPUs online (but stopped)
    timer: Fix jiffies wrap behavior of round_jiffies_common()
    ...

    Linus Torvalds
     

05 Jul, 2013

1 commit


04 Jul, 2013

1 commit

  • Now there are only 2 members in struct page_cgroup. Update config MEMCG
    description accordingly.

    Signed-off-by: Sergey Dyasly
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Dyasly
     

03 Jul, 2013

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The major changes:

    - Simplify RCU's grace-period and callback processing based on the new
    numbering for callbacks.

    - Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
    single-CPU low-latency systems.

    - SRCU-related changes and fixes.

    - Miscellaneous fixes, including converting a few remaining printk()
    calls to pr_*().

    - Documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
    rcu: Shrink TINY_RCU by moving exit_rcu()
    rcu: Remove TINY_PREEMPT_RCU tracing documentation
    rcu: Consolidate rcutiny_plugin.h ifdefs
    rcu: Remove rcu_preempt_note_context_switch()
    rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
    rcu: Remove check_cpu_stall_preempt()
    rcu: Simplify RCU_TINY RCU callback invocation
    rcu: Remove rcu_preempt_process_callbacks()
    rcu: Remove rcu_preempt_remove_callbacks()
    rcu: Remove rcu_preempt_check_callbacks()
    rcu: Remove show_tiny_preempt_stats()
    rcu: Remove TINY_PREEMPT_RCU
    powerpc,kvm: fix imbalance srcu_read_[un]lock()
    rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
    rcu: Apply Dave Jones's NOCB Kconfig help feedback
    rcu: Merge adjacent identical ifdefs
    rcu: Drive quiescent-state-forcing delay from HZ
    rcu: Remove "Experimental" flags
    kthread: Add kworker kthreads to OS-jitter documentation
    ...

    Linus Torvalds
     

25 Jun, 2013

1 commit

  • Some drivers can be built on more platforms than they run on. This is
    a burden for users and distributors who package a kernel: they have to
    manually deselect drivers that are useless to them when updating their
    configs via oldconfig. Sometimes it is even impossible to disable
    those drivers without patching the kernel.

    Introduce a new config option COMPILE_TEST and make all those drivers
    depend on the platform they run on, or on the COMPILE_TEST option.
    Now, when users/distributors choose COMPILE_TEST=n, they will not have
    the drivers in their allmodconfig setups, but developers can still
    compile-test them with COMPILE_TEST=y.

    The drivers that now use this new option:
    * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
    processors so it should depend on x86.
    * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
    * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
    systems -- which do not actually support the hardware via that
    method.
    * INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom
    device.

    [v2]
    * remove EXPERT dependency

    [gregkh - remove chipidea portion, as it's incorrect, and also doesn't
    apply to my driver-core tree]

    Signed-off-by: Jiri Slaby
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jeff Mahoney
    Cc: Alexander Shishkin
    Cc: linux-usb@vger.kernel.org
    Cc: Florian Tobias Schandinat
    Cc: linux-geode@lists.infradead.org
    Cc: linux-fbdev@vger.kernel.org
    Cc: Richard Cochran
    Cc: netdev@vger.kernel.org
    Cc: Ben Hutchings
    Cc: "Keller, Jacob E"
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     

18 Jun, 2013

1 commit


13 Jun, 2013

1 commit


11 Jun, 2013

5 commits

  • …u.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

    cbnum.2013.06.10a: Apply simplifications stemming from the new callback
    numbering.

    doc.2013.06.10a: Documentation updates.

    fixes.2013.06.10a: Miscellaneous fixes.

    srcu.2013.06.10a: Updates to SRCU.

    tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.

    Paul E. McKenney
     
  • TINY_PREEMPT_RCU adds significant code and complexity, but does not
    offer commensurate benefits. People currently using TINY_PREEMPT_RCU
    can get much better memory footprint with TINY_RCU, or, if they really
    need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
    minor degradation in memory footprint. Please note that this move
    has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
    and on LWN (http://lwn.net/Articles/541037/).

    This commit therefore removes TINY_PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO,
    and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit
    adds a bit more detail.

    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • After a release or two, features are no longer experimental. Therefore,
    this commit removes the "Experimental" tag from them.

    Reported-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • This commit fixes a lockdep-detected deadlock by moving a wake_up()
    call out from a rnp->lock critical section. Please see below for
    the long version of this story.

    On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

    > [12572.705832] ======================================================
    > [12572.750317] [ INFO: possible circular locking dependency detected ]
    > [12572.796978] 3.10.0-rc3+ #39 Not tainted
    > [12572.833381] -------------------------------------------------------
    > [12572.862233] trinity-child17/31341 is trying to acquire lock:
    > [12572.870390] (rcu_node_0){..-.-.}, at: [] rcu_read_unlock_special+0x9f/0x4c0
    > [12572.878859]
    > but task is already holding lock:
    > [12572.894894] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12572.903381]
    > which lock already depends on the new lock.
    >
    > [12572.927541]
    > the existing dependency chain (in reverse order) is:
    > [12572.943736]
    > -> #4 (&ctx->lock){-.-...}:
    > [12572.960032] [] lock_acquire+0x91/0x1f0
    > [12572.968337] [] _raw_spin_lock+0x40/0x80
    > [12572.976633] [] __perf_event_task_sched_out+0x2e7/0x5e0
    > [12572.984969] [] perf_event_task_sched_out+0x93/0xa0
    > [12572.993326] [] __schedule+0x2cf/0x9c0
    > [12573.001652] [] schedule_user+0x2e/0x70
    > [12573.009998] [] retint_careful+0x12/0x2e
    > [12573.018321]
    > -> #3 (&rq->lock){-.-.-.}:
    > [12573.034628] [] lock_acquire+0x91/0x1f0
    > [12573.042930] [] _raw_spin_lock+0x40/0x80
    > [12573.051248] [] wake_up_new_task+0xb7/0x260
    > [12573.059579] [] do_fork+0x105/0x470
    > [12573.067880] [] kernel_thread+0x26/0x30
    > [12573.076202] [] rest_init+0x23/0x140
    > [12573.084508] [] start_kernel+0x3f1/0x3fe
    > [12573.092852] [] x86_64_start_reservations+0x2a/0x2c
    > [12573.101233] [] x86_64_start_kernel+0xcc/0xcf
    > [12573.109528]
    > -> #2 (&p->pi_lock){-.-.-.}:
    > [12573.125675] [] lock_acquire+0x91/0x1f0
    > [12573.133829] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.141964] [] try_to_wake_up+0x31/0x320
    > [12573.150065] [] default_wake_function+0x12/0x20
    > [12573.158151] [] autoremove_wake_function+0x18/0x40
    > [12573.166195] [] __wake_up_common+0x58/0x90
    > [12573.174215] [] __wake_up+0x39/0x50
    > [12573.182146] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.190119] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.198023] [] rcu_nocb_kthread+0x114/0x930
    > [12573.205860] [] kthread+0xed/0x100
    > [12573.213656] [] ret_from_fork+0x7c/0xb0
    > [12573.221379]
    > -> #1 (&rsp->gp_wq){..-.-.}:
    > [12573.236329] [] lock_acquire+0x91/0x1f0
    > [12573.243783] [] _raw_spin_lock_irqsave+0x4b/0x90
    > [12573.251178] [] __wake_up+0x23/0x50
    > [12573.258505] [] rcu_start_gp_advanced.isra.11+0x4a/0x50
    > [12573.265891] [] rcu_start_future_gp+0x1c9/0x1f0
    > [12573.273248] [] rcu_nocb_kthread+0x114/0x930
    > [12573.280564] [] kthread+0xed/0x100
    > [12573.287807] [] ret_from_fork+0x7c/0xb0

    Notice the above call chain.

    rcu_start_future_gp() is called with the rnp->lock held. It then calls
    rcu_start_gp_advanced(), which does a wakeup.

    You can't do wakeups while holding the rnp->lock, as that would mean
    that you could not do an rcu_read_unlock() while holding the rq lock, or
    any lock that was taken while holding the rq lock. This is because...
    (see below).

    > [12573.295067]
    > -> #0 (rcu_node_0){..-.-.}:
    > [12573.309293] [] __lock_acquire+0x1786/0x1af0
    > [12573.316568] [] lock_acquire+0x91/0x1f0
    > [12573.323825] [] _raw_spin_lock+0x40/0x80
    > [12573.331081] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.338377] [] __rcu_read_unlock+0x96/0xa0
    > [12573.345648] [] perf_lock_task_context+0x143/0x2d0
    > [12573.352942] [] find_get_context+0x4e/0x1f0
    > [12573.360211] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.367514] [] SyS_perf_event_open+0x9/0x10
    > [12573.374816] [] tracesys+0xdd/0xe2

    Notice the above trace.

    perf took its own ctx->lock, which can be taken while holding the rq
    lock. While holding this lock, it did an rcu_read_unlock().
    perf_lock_task_context() basically looks like:

    rcu_read_lock();
    raw_spin_lock(ctx->lock);
    rcu_read_unlock();

    What appears to have happened is that we were scheduled out after taking
    that first rcu_read_lock() but before taking the spin lock. When we were
    scheduled back in and took the ctx->lock, the following rcu_read_unlock()
    triggered the "special" code.

    The rcu_read_unlock_special() takes the rnp->lock, which gives us a
    possible deadlock scenario.

    CPU0                      CPU1                      CPU2
    ----                      ----                      ----
                                                        rcu_nocb_kthread()
                              lock(rq->lock);
    lock(ctx->lock);
                                                        lock(rnp->lock);

                                                        wake_up();

                                                        lock(rq->lock);

    rcu_read_unlock();

    rcu_read_unlock_special();

    lock(rnp->lock);
                              lock(ctx->lock);

                         **** DEADLOCK ****

    > [12573.382068]
    > other info that might help us debug this:
    >
    > [12573.403229] Chain exists of:
    > rcu_node_0 --> &rq->lock --> &ctx->lock
    >
    > [12573.424471] Possible unsafe locking scenario:
    >
    > [12573.438499]        CPU0                    CPU1
    > [12573.445599]        ----                    ----
    > [12573.452691]   lock(&ctx->lock);
    > [12573.459799]                                lock(&rq->lock);
    > [12573.467010]                                lock(&ctx->lock);
    > [12573.474192]   lock(rcu_node_0);
    > [12573.481262]
    > *** DEADLOCK ***
    >
    > [12573.501931] 1 lock held by trinity-child17/31341:
    > [12573.508990] #0: (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0x7d/0x2d0
    > [12573.516475]
    > stack backtrace:
    > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
    > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
    > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
    > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
    > [12573.567856] Call Trace:
    > [12573.575011] [] dump_stack+0x19/0x1b
    > [12573.582284] [] print_circular_bug+0x200/0x20f
    > [12573.589637] [] __lock_acquire+0x1786/0x1af0
    > [12573.596982] [] ? sched_clock_cpu+0xb5/0x100
    > [12573.604344] [] lock_acquire+0x91/0x1f0
    > [12573.611652] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.619030] [] _raw_spin_lock+0x40/0x80
    > [12573.626331] [] ? rcu_read_unlock_special+0x9f/0x4c0
    > [12573.633671] [] rcu_read_unlock_special+0x9f/0x4c0
    > [12573.640992] [] ? perf_lock_task_context+0x7d/0x2d0
    > [12573.648330] [] ? put_lock_stats.isra.29+0xe/0x40
    > [12573.655662] [] ? delay_tsc+0x90/0xe0
    > [12573.662964] [] __rcu_read_unlock+0x96/0xa0
    > [12573.670276] [] perf_lock_task_context+0x143/0x2d0
    > [12573.677622] [] ? __perf_event_enable+0x370/0x370
    > [12573.684981] [] find_get_context+0x4e/0x1f0
    > [12573.692358] [] SYSC_perf_event_open+0x514/0xbd0
    > [12573.699753] [] ? get_parent_ip+0xd/0x50
    > [12573.707135] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    > [12573.714599] [] SyS_perf_event_open+0x9/0x10
    > [12573.721996] [] tracesys+0xdd/0xe2

    This commit delays the wakeup via irq_work(), which is what
    perf and ftrace use to perform wakeups in critical sections.

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Steven Rostedt
     

04 Jun, 2013

1 commit

  • Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
    it has been basically impossible to build a kernel with CONFIG_HOTPLUG
    turned off. Remove all the remaining references to it.

    Cc: Russell King
    Cc: Doug Thompson
    Cc: Bjorn Helgaas
    Cc: Steven Whitehouse
    Cc: Arnd Bergmann
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Andrew Morton
    Signed-off-by: Stephen Rothwell
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Hans Verkuil
    Signed-off-by: Greg Kroah-Hartman

    Stephen Rothwell
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. Being default-off should
    limit its potential for regressions, and there are no known
    regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness, though: there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like your input on its
    design/implementation, in case you dislike some aspect we missed.
    Without flaming us to a crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1 Hz to 0 Hz, to essentially shut
    off the periodic tick altogether when there's a single busy task on
    a CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10: like the -rt tree, this effort has been going on for too long.
    But the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

04 May, 2013

1 commit

  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.
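    The fix can be modelled in a few lines of C. This is a hypothetical
    sketch, not Kconfig code: it assumes the unwanted entries
    (RCU_NOCB_CPU_NONE and RCU_NOCB_CPU_ZERO) gain a "depends on
    !NO_HZ_FULL", and that a choice whose preferred entry is hidden falls
    back to its first visible entry. Under those assumptions, enabling
    full dynticks leaves RCU_NOCB_CPU_ALL as the only possible outcome,
    which is what the ineffective "select" was meant to achieve.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the RCU no-CBs choice menu, in menu order. */
enum nocb_choice { NOCB_NONE, NOCB_ZERO, NOCB_ALL, NOCB_NR };

/* Assumed reverse dependency: NONE and ZERO "depend on !NO_HZ_FULL",
 * so with full dynticks they are hidden and only ALL remains. */
bool nocb_visible(enum nocb_choice c, bool no_hz_full)
{
    switch (c) {
    case NOCB_NONE:
    case NOCB_ZERO:
        return !no_hz_full;
    case NOCB_ALL:
        return true;
    default:
        return false;
    }
}

/* Keep the preferred entry if it is visible, otherwise fall back to the
 * first visible entry in menu order. */
enum nocb_choice nocb_resolve(enum nocb_choice preferred, bool no_hz_full)
{
    if (nocb_visible(preferred, no_hz_full))
        return preferred;
    for (int c = 0; c < NOCB_NR; c++)
        if (nocb_visible((enum nocb_choice)c, no_hz_full))
            return (enum nocb_choice)c;
    assert(!"choice has no visible entry");
    return NOCB_NONE;
}
```

    The design point is that hiding the unwanted entries constrains the
    choice from the outside, whereas "select" would try to force a value
    onto a choice member, which Kconfig ignores.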

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

02 May, 2013

1 commit


01 May, 2013

1 commit

  • The kconfig language requires that dependent options all follow the
    menuconfig symbol in order to be collapsed below it. Recently some hidden
    options were added below the EXPERT menuconfig, but did not depend on
    EXPERT (because hidden options can't). This broke the display. So
    re-order all these options, and while we're here stick the PCI quirks
    under the EXPERT menu (since it isn't sitting with any related options).

    Before this commit, we get:
    [*] Configure standard kernel features (expert users) --->
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...
    [ ] Embedded system

    Now we get the older (and correct) behavior:
    [*] Configure standard kernel features (expert users) --->
    [ ] Embedded system
    And if you go into the expert menu you get the expert options:
    [ ] Sysctl syscall support
    [*] Load all symbols for debugging/ksymoops
    ...
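    The collapsing rule can be illustrated with a simplified C model of
    the menu builder. The struct entry type and collapsed_children()
    helper are hypothetical, and the sketch only captures the rule stated
    above: the entries directly after a menuconfig symbol are nested under
    it as long as each one depends on that symbol, and the first entry
    that does not ends the submenu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical menu entry: a symbol name plus whether it depends on the
 * menuconfig symbol at the start of the run (EXPERT in this commit). */
struct entry {
    const char *name;
    bool depends_on_menuconfig;
};

/*
 * entries[0] is the menuconfig symbol itself. Returns how many of the
 * entries that follow it get collapsed into its submenu: the run stops
 * at the first entry that does not depend on the menuconfig symbol.
 */
size_t collapsed_children(const struct entry *entries, size_t n)
{
    size_t count = 0;

    assert(entries != NULL && n >= 1);
    for (size_t i = 1; i < n; i++) {
        if (!entries[i].depends_on_menuconfig)
            break;  /* this entry ends the submenu */
        count++;
    }
    return count;
}
```

    With a hidden, non-dependent option placed right after EXPERT (the
    name HIDDEN_OPTION in the test is made up), the submenu comes out
    empty and the expert options show up at the top level, matching the
    broken display shown above.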

    Signed-off-by: Mike Frysinger
    Acked-by: Randy Dunlap
    Cc: zhangwei(Jovi)
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

30 Apr, 2013

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this development cycle were:

    - full dynticks preparatory work by Frederic Weisbecker

    - factor out the cpu time accounting code better, by Li Zefan

    - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

    - various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    sched: Fix init NOHZ_IDLE flag
    sched: Prevent to re-select dst-cpu in load_balance()
    sched: Rename load_balance_tmpmask to load_balance_mask
    sched: Move up affinity check to mitigate useless redoing overhead
    sched: Don't consider other cpus in our group in case of NEWLY_IDLE
    sched: Explicitly cpu_idle_type checking in rebalance_domains()
    sched: Change position of resched_cpu() in load_balance()
    sched: Fix wrong rq's runnable_avg update with rt tasks
    sched: Document task_struct::personality field
    sched/cpuacct/UML: Fix header file dependency bug on the UML build
    cgroup: Kill subsys.active flag
    sched/cpuacct: No need to check subsys active state
    sched/cpuacct: Initialize cpuacct subsystem earlier
    sched/cpuacct: Initialize root cpuacct earlier
    sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
    sched/cpuacct: Clean up cpuacct.h
    sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
    sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
    sched/cpuacct: Add cpuacct_acount_field()
    sched/cpuacct: Add cpuacct_init()
    ...

    Linus Torvalds
     

27 Apr, 2013

1 commit

    Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
    into an active one.

    The full dynticks Kconfig is currently hidden behind the full dynticks
    cputime accounting, which is an awkward and counter-intuitive layout:
    the user first has to select the dynticks cputime accounting in order
    to make the full dynticks feature visible.

    We definitely want it the other way around. The usual way to express
    this kind of active dependency is to use "select" on the depended-upon
    target. However, we can't use the Kconfig "select" instruction when
    the target is a "choice".

    So this patch takes inspiration from how the RCU subsystem Kconfig
    interacts with its dependencies on SMP and PREEMPT: by using the right
    "depends on" instruction for each cputime accounting choice, we make
    sure that cputime accounting can't offer any option other than
    VIRT_CPU_ACCOUNTING_GEN when NO_HZ_FULL is selected.
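    A minimal C sketch of the resulting menu logic follows. It assumes the
    choice offers a tick-based option, a native virtual option and the
    generic VIRT_CPU_ACCOUNTING_GEN option; the ACCT_* constants and the
    arch_native/arch_gen flags are hypothetical stand-ins for the real
    Kconfig symbols and architecture capabilities. Putting "depends on
    !NO_HZ_FULL" on every entry except the generic one leaves only
    VIRT_CPU_ACCOUNTING_GEN visible once full dynticks is enabled, while
    the generic option stays available without full dynticks, as in v2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits, one per cputime accounting choice entry. */
enum {
    ACCT_TICK        = 1u << 0,  /* tick-based accounting */
    ACCT_VIRT_NATIVE = 1u << 1,  /* arch-native virtual accounting */
    ACCT_VIRT_GEN    = 1u << 2,  /* VIRT_CPU_ACCOUNTING_GEN */
};

/* Returns a bitmask of the accounting options the menu may offer. */
unsigned int visible_accounting(bool no_hz_full, bool arch_native,
                                bool arch_gen)
{
    unsigned int mask = 0;

    if (!no_hz_full)
        mask |= ACCT_TICK;         /* "depends on !NO_HZ_FULL" */
    if (arch_native && !no_hz_full)
        mask |= ACCT_VIRT_NATIVE;  /* arch support && !NO_HZ_FULL */
    if (arch_gen)
        mask |= ACCT_VIRT_GEN;     /* v2: offered with or without NO_HZ_FULL */

    /* With full dynticks, nothing but the generic option may remain. */
    assert(!no_hz_full || (mask & ~(unsigned int)ACCT_VIRT_GEN) == 0);
    return mask;
}
```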

    v2: Keep full dynticks cputime accounting available even without
    full dynticks, as per Paul McKenney's suggestion.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

10 Apr, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    * Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ
    take advantage of numbered callbacks, do additional callback
    accelerations based on numbered callbacks. Posted to LKML
    at https://lkml.org/lkml/2013/3/18/960.

    * RCU documentation updates. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/570.

    * Miscellaneous fixes. Posted to LKML at
    https://lkml.org/lkml/2013/3/18/594.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

03 Apr, 2013

1 commit

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies a mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code
    and the full dynticks depends on all that code for now.
    So ideally the choice menu would offer CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED, and both would then select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from both of their Kconfig entries.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.
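    The layout proposed above can be sketched in C to show why it avoids
    the cycle. The struct nohz_config type and resolve_nohz() helper are
    hypothetical. The model assumes the legacy CONFIG_NO_HZ value only
    seeds the default of CONFIG_NO_HZ_IDLE, while CONFIG_NO_HZ_COMMON is
    derived purely from the two dynticks options, so no symbol is both
    selected by and defaulted from the same option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the resolved dynticks symbols. */
struct nohz_config {
    bool no_hz_idle;
    bool no_hz_extended;  /* the full dynticks option */
    bool no_hz_common;    /* derived only, never set directly */
};

/*
 * legacy_no_hz:  CONFIG_NO_HZ=y was found in an older .config
 * want_extended: the user explicitly picked full dynticks
 */
struct nohz_config resolve_nohz(bool legacy_no_hz, bool want_extended)
{
    struct nohz_config c = { false, false, false };

    if (want_extended)
        c.no_hz_extended = true;
    else
        c.no_hz_idle = legacy_no_hz;  /* backward-compatible default */

    /* Both flavours "select" the shared code; nothing selects back. */
    c.no_hz_common = c.no_hz_idle || c.no_hz_extended;

    assert(!(c.no_hz_idle && c.no_hz_extended));  /* choice: exclusive */
    return c;
}
```

    Because information only flows from the legacy symbol to
    CONFIG_NO_HZ_IDLE and from the two flavours to CONFIG_NO_HZ_COMMON,
    the resolution is a single pass with no feedback loop.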

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

26 Mar, 2013

3 commits

    Because RCU callbacks are now associated with the number of the grace
    period that they must wait for, a CPU can now advance callbacks
    corresponding to grace periods that ended while it was in
    dyntick-idle mode. This eliminates the need to try forcing the RCU
    state machine while entering idle, thus reducing the CPU intensiveness
    of RCU_FAST_NO_HZ, which should increase its energy efficiency.
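    The idea of numbered callbacks can be illustrated with a small C
    sketch. Everything here is hypothetical (struct cb, ready_callbacks()),
    and the wraparound-safe comparison is a reimplementation in the spirit
    of the kernel's ULONG_CMP_GE() macro rather than a copy of RCU code.
    It shows how a CPU leaving dyntick-idle could decide, from the number
    of the last completed grace period alone, which callbacks are ready.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback record: only the grace period it waits for. */
struct cb {
    unsigned long gp_needed;  /* grace period that must complete first */
};

/* Wraparound-safe "a >= b" for free-running grace-period counters. */
static int ulong_cmp_ge(unsigned long a, unsigned long b)
{
    return ULONG_MAX / 2 >= a - b;
}

/* Count callbacks whose grace period has already completed, given the
 * number of the most recently completed grace period. */
size_t ready_callbacks(const struct cb *cbs, size_t n,
                       unsigned long completed)
{
    size_t ready = 0;

    assert(cbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ulong_cmp_ge(completed, cbs[i].gp_needed))
            ready++;
    return ready;
}
```

    Since the check needs only the completed counter, nothing has to be
    pushed through the RCU state machine just because the CPU is about to
    go idle.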

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
    the CPU number, for example, "rcuo". This is problematic given that
    there are either two or three RCU flavors, each of which gets a per-CPU
    kthread with exactly the same name. This commit therefore introduces
    a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
    'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
    to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
    "rcuob/0", "rcuop/0", and "rcuos/0".

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Dietmar Eggemann

    Paul E. McKenney
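    The naming scheme described in the commit above can be reproduced with
    snprintf(). The flavor enum and the two helpers below are
    hypothetical; only the resulting names ("rcuob/0", "rcuop/0",
    "rcuos/0") and the one-letter abbreviations come from the commit
    message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enum for the three RCU flavors named in the commit. */
enum rcu_flavor { RCU_BH, RCU_PREEMPT, RCU_SCHED };

/* One-letter abbreviations: 'b' for RCU-bh, 'p' for RCU-preempt,
 * 's' for RCU-sched. */
char rcu_flavor_abbr(enum rcu_flavor f)
{
    switch (f) {
    case RCU_BH:      return 'b';
    case RCU_PREEMPT: return 'p';
    case RCU_SCHED:   return 's';
    }
    assert(!"unknown RCU flavor");
    return '?';
}

/* Writes a name such as "rcuob/0" into buf; returns snprintf()'s result
 * (the length the full name needs, excluding the terminating NUL). */
int nocb_kthread_name(char *buf, size_t len, enum rcu_flavor f, int cpu)
{
    return snprintf(buf, len, "rcuo%c/%d", rcu_flavor_abbr(f), cpu);
}
```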
     
  • Currently, the only way to specify no-CBs CPUs is via the rcu_nocbs
    kernel command-line parameter. This is inconvenient in some cases,
    particularly for randconfig testing, so this commit adds a new set of
    kernel configuration parameters. CONFIG_RCU_NOCB_CPU_NONE (the default)
    retains the old behavior, CONFIG_RCU_NOCB_CPU_ZERO offloads callback
    processing from CPU 0 (along with any other CPUs specified by the
    rcu_nocbs boot-time parameter), and CONFIG_RCU_NOCB_CPU_ALL offloads
    callback processing from all CPUs.
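    The effect of the three build-time options on the set of offloaded
    CPUs can be sketched with a bitmask. This is a hypothetical model: a
    64-bit mask stands in for the kernel's cpumask, nocb_mask() and the
    enum are made-up names, and boot_mask represents whatever CPUs were
    listed with the rcu_nocbs boot-time parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three Kconfig choices. */
enum nocb_build { NOCB_CPU_NONE, NOCB_CPU_ZERO, NOCB_CPU_ALL };

/* boot_mask: CPUs named via rcu_nocbs (bit n = CPU n).
 * nr_cpus:   number of possible CPUs, 1..64 in this sketch.
 * Returns the mask of CPUs whose callbacks are offloaded. */
uint64_t nocb_mask(enum nocb_build build, uint64_t boot_mask,
                   unsigned int nr_cpus)
{
    assert(nr_cpus >= 1 && nr_cpus <= 64);
    uint64_t all = nr_cpus == 64 ? UINT64_MAX
                                 : (UINT64_C(1) << nr_cpus) - 1;

    switch (build) {
    case NOCB_CPU_ALL:
        return all;                              /* every CPU offloaded */
    case NOCB_CPU_ZERO:
        return (boot_mask | UINT64_C(1)) & all;  /* CPU 0 plus boot list */
    case NOCB_CPU_NONE:
    default:
        return boot_mask & all;                  /* old behavior */
    }
}
```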

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

13 Mar, 2013

2 commits

  • Remove "config EXPERIMENTAL" itself, now that every "depends on" it has
    been removed from the tree.

    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
    at least one no-CBs CPU must remain online at any given time. These
    restrictions are problematic in some situations, such as cases where
    all CPUs must run a real-time workload that needs to be insulated from
    OS jitter and latencies due to RCU callback invocation. This commit
    therefore provides no-CBs CPUs a (very crude and energy-inefficient)
    way to start and to wait for grace periods independently of the normal
    RCU callback mechanisms. This approach allows any or all of the CPUs to
    be designated as no-CBs CPUs, and allows any proper subset of the CPUs
    (whether no-CBs CPUs or not) to be offlined.

    This commit also provides a fix for a locking bug spotted by Xie
    ChanglongX.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney