16 Jul, 2011

3 commits

  • To support probing module init functions, kprobe-tracer allows the
    user to define a probe on a function that does not exist yet when
    it is given with a module name. This also enables the user to set a
    probe on a function in a specific module, even if a different
    function with the same name is locally defined in another module.

    The module name must precede the function name, separated by
    a ':', e.g. btrfs:btrfs_init_sysfs

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
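
The MOD:SYM syntax described above can be illustrated with a small userspace sketch. The helper name split_probe_target() and its return convention are assumptions made for this example; this is not the kernel's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel's parser: split "MOD:SYM" into a
 * module name and a symbol name. Returns 1 if a module prefix was found,
 * 0 if the spec is a bare symbol, and -1 if the module name does not fit.
 */
static int split_probe_target(const char *spec, char *mod, size_t modlen,
                              const char **sym)
{
    const char *colon;
    size_t len;

    assert(spec && mod && sym && modlen > 0);

    colon = strchr(spec, ':');
    if (!colon) {
        mod[0] = '\0';      /* no module prefix given */
        *sym = spec;
        return 0;
    }

    len = (size_t)(colon - spec);
    if (len >= modlen)
        return -1;

    memcpy(mod, spec, len);
    mod[len] = '\0';
    *sym = colon + 1;       /* symbol starts right after the ':' */
    return 1;
}
```

With the example from the commit message, "btrfs:btrfs_init_sysfs" splits into the module "btrfs" and the symbol "btrfs_init_sysfs".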
     
  • Return -ENOENT if the probe point doesn't exist, but still return
    -EINVAL if both kprobe->addr and kprobe->symbol_name are
    specified or neither is specified.

    Acked-by: Ananth N Mavinakayanahalli
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
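
The return-code policy above can be sketched in userspace as follows. lookup_symbol() and resolve_probe_addr() are hypothetical names invented for the illustration; only the -ENOENT and -EINVAL cases come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's symbol lookup. */
static void *lookup_symbol(const char *name)
{
    static int dummy_target;

    return strcmp(name, "known_function") == 0 ? &dummy_target : NULL;
}

/*
 * Sketch of the return-code policy: exactly one of addr and symbol_name
 * must be given (-EINVAL otherwise), and a symbol that cannot be
 * resolved yields -ENOENT.
 */
static int resolve_probe_addr(void *addr, const char *symbol_name, void **out)
{
    assert(out);

    if ((addr && symbol_name) || (!addr && !symbol_name))
        return -EINVAL;

    if (symbol_name) {
        addr = lookup_symbol(symbol_name);
        if (!addr)
            return -ENOENT;
    }

    *out = addr;
    return 0;
}
```

Separating the two codes lets a caller tell a malformed request apart from a target that is simply not present.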
     
  • Merge redundant enable/disable functions into enable_trace_probe()
    and disable_trace_probe().

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15

    [ converted kprobe selftest to use enable_trace_probe ]

    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     

15 Jul, 2011

3 commits

  • Rename probe_* to trace_probe_* to avoid namespace
    conflicts. This also renames the improperly named find_probe_event()
    and cleanup_all_probes() to find_trace_probe() and
    release_all_trace_probes() respectively.

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Instead of the hw_nmi_watchdog_set_attr() weak function
    and the corresponding x86_pmu::hw_watchdog_set_attr() call
    we introduce an event alias mechanism which allows us
    to drop these routines completely and isolate the quirks
    of the Netburst architecture inside the P4 PMU code only.

    The main idea remains the same though -- to allow
    nmi-watchdog and perf top to run simultaneously.

    Note that the aliasing mechanism applies to the generic
    PERF_COUNT_HW_CPU_CYCLES event only, because an arbitrary
    event (say, passed as RAW initially) might have some
    additional bits set inside the ESCR register that change
    the behaviour of the event, and we can no longer guarantee
    that the alias event will give the same result.

    P.S. Huge thanks to Don and Steven for testing
    and early review.

    Acked-by: Don Zickus
    Tested-by: Steven Rostedt
    Signed-off-by: Cyrill Gorcunov
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Stephane Eranian
    CC: Lin Ming
    CC: Arnaldo Carvalho de Melo
    CC: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
    Signed-off-by: Steven Rostedt

    Cyrill Gorcunov
     
  • Currently the stack trace per event in ftrace is only 8 frames.
    This can be quite limiting and sometimes useless, especially when
    the "ignore frames" is wrong and we also use up stack frames for
    the event processing itself.

    Change this to be dynamic by adding a percpu buffer that we can
    write a large stack frame into and then copy into the ring buffer.

    Interrupts and NMIs that come in while another event is being
    processed will only get to use the 8 frame stack. That should be enough
    as the task that it interrupted will have the full stack frame anyway.

    Requested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
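
The fallback rule for nested contexts can be sketched with a simple nesting counter. This is a userspace illustration under assumed names and sizes (struct stack_scratch, LARGE_STACK_FRAMES), not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_STACK_FRAMES 8    /* fixed per-event limit from the text */
#define LARGE_STACK_FRAMES 128  /* assumed size of the big scratch buffer */

/* One of these per CPU in the described design; a single instance here. */
struct stack_scratch {
    int reserve;                              /* nesting depth */
    unsigned long frames[LARGE_STACK_FRAMES];
};

/*
 * Returns how many frames the caller may record. Only the outermost
 * user gets the large buffer; nested users (interrupts, NMIs) fall back
 * to the small fixed-size trace.
 */
static size_t stack_scratch_enter(struct stack_scratch *s)
{
    assert(s);
    s->reserve++;
    return s->reserve == 1 ? LARGE_STACK_FRAMES : SMALL_STACK_FRAMES;
}

static void stack_scratch_exit(struct stack_scratch *s)
{
    assert(s && s->reserve > 0);
    s->reserve--;
}
```

The counter means the large buffer never needs a lock: a nested user sees a depth greater than one and simply does not touch it.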
     

14 Jul, 2011

3 commits

  • Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST will
    fail the dynamic ftrace selftest.

    The function tracer has a quick 'off' variable that will prevent
    the call back functions from being called. This variable is called
    function_trace_stop. In x86, this is implemented directly in the mcount
    assembly, but for other archs, an intermediate function is used called
    ftrace_test_stop_func().

    In dynamic ftrace, the function pointer variable ftrace_trace_function is
    used to update the caller code in the mcount caller. But for archs that
    do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
    ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.

    When more than one ftrace_ops is registered, the function it calls is
    ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
    and call the callbacks that have their hash matching.

    The issue happens when two ftrace_ops are registered for different functions
    and one is then unregistered. The __ftrace_trace_function is then pointed
    to the remaining ftrace_ops callback function directly. This means it will
    be called for all functions that were registered to trace by either of the
    two ftrace_ops.

    This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
    because the update of ftrace_trace_function doesn't happen until after all
    functions have been updated, and then the mcount caller is updated. But
    for those archs that do use the ftrace_test_stop_func(), the update is
    immediate.

    The dynamic selftest fails because it hits this situation, and the
    ftrace_ops that it registers fails to only trace what it was supposed to
    and instead traces all other functions.

    The solution is to delay the setting of __ftrace_trace_function until
    after all the functions have been updated according to the registered
    ftrace_ops. Also, function_trace_stop is set during the update to prevent
    function tracing from calling code that is caused by the function tracer
    itself.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, if set_ftrace_filter() is called when the ftrace_ops is
    active, the function filters will not be updated. They will only be updated
    when tracing is disabled and re-enabled.

    Update the functions immediately during set_ftrace_filter().

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Whenever the hash of the ftrace_ops is updated, the record counts
    must be balanced. This requires disabling the records that are set
    in the original hash, and then enabling the records that are set
    in the updated hash.

    Moving the update into ftrace_hash_move() removes the bug where the
    hash was updated but the records were not, which results in ftrace
    triggering a warning and disabling itself because the ftrace_ops filter
    is updated while the ftrace_ops was registered, and then the failure
    happens when the ftrace_ops is unregistered.

    The current code will not trigger this bug, but new code will.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
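
The balancing rule can be sketched with per-function reference counts. The names below (record_refs, struct filter_hash, hash_move) are hypothetical, and the sketch ignores details of the real ftrace hashes, such as how an empty filter is interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_RECORDS 4  /* assumed tiny function table */

/* Per-function reference counts, standing in for ftrace records. */
static int record_refs[NR_RECORDS];

struct filter_hash {
    bool selected[NR_RECORDS];
};

/* Add delta to the count of every record selected by the hash. */
static void hash_rec_update(const struct filter_hash *h, int delta)
{
    for (size_t i = 0; i < NR_RECORDS; i++)
        if (h->selected[i])
            record_refs[i] += delta;
}

/*
 * Replace *dst with *src while keeping the counts balanced: drop the
 * references held through the old hash, then take the new ones.
 */
static void hash_move(struct filter_hash *dst, const struct filter_hash *src)
{
    assert(dst && src);
    hash_rec_update(dst, -1);
    *dst = *src;
    hash_rec_update(dst, +1);
}
```

Doing both updates inside the move makes it impossible to change the hash without also adjusting the records.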
     

08 Jul, 2011

2 commits

  • When I mounted an NFS directory, it caused several modules to be loaded. At the
    time I was running the preemptirqsoff tracer, and it showed the following
    output:

    # tracer: preemptirqsoff
    #
    # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
    # --------------------------------------------------------------------
    # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: ftrace_module_notify
    # => ended at: ftrace_module_notify
    #
    #
    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| /_--=> lock-depth
    # |||||/ delay
    # cmd pid |||||| time | caller
    # \ / |||||| \ | /
    modprobe-19370 3d.... 0us!: ftrace_process_locs
    => ftrace_process_locs
    => ftrace_module_notify
    => notifier_call_chain
    => __blocking_notifier_call_chain
    => blocking_notifier_call_chain
    => sys_init_module
    => system_call_fastpath

    That's over 1ms that interrupts are disabled on a Real-Time kernel!

    Looking at the cause (being the ftrace author helped), I found that the
    interrupts are disabled before the code modification of mcounts into nops. The
    interrupts only need to be disabled on start up around this code, not when
    modules are being loaded.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If a function is set to be traced by the set_graph_function, but the
    option funcgraph-irqs is zero, and the traced function happens to be
    called from an interrupt, it will not be traced.

    The point of funcgraph-irqs is to not trace interrupts when we are
    preempted by an irq, not to not trace functions we want to trace that
    happen to be *in* an irq.

    Luckily the current->trace_recursion element is perfect to add a flag
    to help us be able to trace functions within an interrupt even when
    we are not tracing interrupts that preempt the trace.

    Reported-by: Heiko Carstens
    Tested-by: Heiko Carstens
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

05 Jul, 2011

1 commit


01 Jul, 2011

10 commits

  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
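
The variable period mentioned above follows from counter wraparound. A minimal arithmetic sketch, assuming a 40-bit counter as implied by the 2^40 figure; the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PMC_BITS 40  /* counter width implied by the 2^40 figure above */
#define PMC_MASK ((UINT64_C(1) << PMC_BITS) - 1)

/* Value stored in the counter when it is programmed to -X. */
static uint64_t pmc_program(uint64_t x)
{
    assert(x > 0 && x <= PMC_MASK);
    return (0 - x) & PMC_MASK;
}

/* Number of events until the counter next wraps past its top value. */
static uint64_t pmc_events_until_overflow(uint64_t counter)
{
    return (PMC_MASK - (counter & PMC_MASK)) + 1;
}
```

After the first overflow the counter sits at zero, so the next overflow is a full 2^40 events away unless the counter is reprogrammed. A one-shot sample avoids treating that second period as a regular one.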
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
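
The benefit of a context pointer can be sketched with a simplified mock. struct mock_event and struct guest_counter are stand-ins invented for this example; only the field name overflow_handler_context is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct mock_event;
typedef void (*overflow_handler_t)(struct mock_event *event);

/* Simplified stand-in for struct perf_event. */
struct mock_event {
    overflow_handler_t overflow_handler;
    void *overflow_handler_context;  /* field name taken from the text */
};

/* Hypothetical per-counter state owned by the caller. */
struct guest_counter {
    unsigned long overflows;
};

static void mock_event_init(struct mock_event *event,
                            overflow_handler_t handler, void *context)
{
    assert(event && handler);
    event->overflow_handler = handler;
    event->overflow_handler_context = context;
}

/* One callback can serve many events: no lookup table is needed. */
static void count_overflow(struct mock_event *event)
{
    struct guest_counter *gc = event->overflow_handler_context;

    gc->overflows++;
}
```

Each event carries its own context, so a single callback finds the right caller state in constant time.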
     
  • Since only samples call perf_output_sample(), it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
      the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
      perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Due to the restrictions and specifics of the Netburst PMU we need a
    separate event for the NMI watchdog. In particular every Netburst event
    consumes not just a counter and a config register, but also an
    additional ESCR register.

    Since ESCR registers are grouped upon counters (i.e. if an ESCR is occupied
    for some event there is no room for another event to enter until it's
    released) we need to pick up the "least" used ESCR (or the most available
    one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen.

    With this patch nmi-watchdog and perf top should be able to run simultaneously.

    Signed-off-by: Cyrill Gorcunov
    CC: Lin Ming
    CC: Arnaldo Carvalho de Melo
    CC: Frederic Weisbecker
    Tested-and-reviewed-by: Don Zickus
    Tested-and-reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     
  • The event tracing infrastructure exposes two timers which should be updated
    each time the value of the counter is updated. Currently, these timers are
    only updated when userspace calls read() on the fd associated with an event.
    This means that counters which are read via the mmap'd page exclusively never
    have their timers updated. This patch ensures that the timers are updated
    each time the values in the mmap'd page are updated.

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Take the timer calculation from perf_output_read and move it to a helper
    function for any place that needs timer values but cannot take the ctx->lock.

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the
    ring-buffer logic: make perf_buffer_alloc() do everything needed")),
    the perf_buffer_init_code() has been mis-setting the buffer watermark
    if perf_event_attr.wakeup_events has a non-zero value.

    This is because perf_event_attr.wakeup_events is a union with
    perf_event_attr.wakeup_watermark.

    This commit re-enables the check for perf_event_attr.watermark being
    set before continuing with setting a non-default watermark.

    This bug is most noticeable when you are trying to use PERF_IOC_REFRESH
    with a value larger than one and perf_event_attr.wakeup_events is set to
    one. In this case the buffer watermark will be set to 1 and you will
    get extraneous POLL_IN overflows rather than POLL_HUP as expected.

    [ avoid using attr.wakeup_events when attr.watermark is set ]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc:
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
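
The union aliasing behind this bug can be sketched with a reduced mock of the attribute structure. struct mock_attr and buffer_watermark() are hypothetical, and the default watermark is passed in by the caller rather than computed as the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced mock of the relevant perf_event_attr fields. */
struct mock_attr {
    unsigned int watermark : 1;  /* selects which union member is valid */
    union {
        uint32_t wakeup_events;     /* wake up every N events */
        uint32_t wakeup_watermark;  /* bytes before wakeup */
    };
};

/*
 * Pick the buffer watermark. Only trust wakeup_watermark when the
 * watermark bit is set; otherwise the union holds wakeup_events and the
 * caller-supplied default must be used.
 */
static uint32_t buffer_watermark(const struct mock_attr *attr,
                                 uint32_t buffer_size, uint32_t dflt)
{
    assert(attr);
    if (attr->watermark && attr->wakeup_watermark) {
        return attr->wakeup_watermark < buffer_size ?
               attr->wakeup_watermark : buffer_size;
    }
    return dflt;
}
```

Without the check on the watermark bit, a wakeup_events value of 1 would be read back as a 1-byte watermark, which is the symptom described above.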
     
  • Merge reason: Pick up the latest fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Jun, 2011

1 commit

  • Currently a single process may register exit handlers unlimited times.
    It may lead to a bloated listeners chain and very slow process
    terminations.

    E.g. after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
    kernel memory is stolen for the handlers chain and "time id" shows 2-7
    seconds instead of normal 0.003. It makes it possible to exhaust all
    kernel memory and to eat much of CPU time by triggering numerous exits
    on a single CPU.

    The patch limits the number of times a single process may register
    itself on a single CPU to one.

    One little issue is left unfixed: as taskstats_exit() is called before
    exit_files() in do_exit(), the orphaned listener entry (if it was not
    explicitly deregistered) is kept until some other process's next exit()
    and the implicit deregistration in send_cpu_listeners(). So, if a
    process that registered itself as a listener exits and the next spawned
    process gets the same pid, it would inherit taskstats attributes.

    Signed-off-by: Vasiliy Kulikov
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
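
The once-per-CPU limit can be sketched as a duplicate check before insertion. The fixed-size table, the names, and the boolean return value are assumptions for the example; the kernel keeps a linked list per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 16  /* assumed fixed capacity for the sketch */

/* Listener table for one CPU; the kernel uses a linked list instead. */
struct cpu_listeners {
    int pids[MAX_LISTENERS];
    size_t count;
};

/*
 * Register pid on this CPU at most once. Returns true if the pid is
 * registered when the call returns, false if the table is full.
 */
static bool listener_register(struct cpu_listeners *l, int pid)
{
    assert(l);
    for (size_t i = 0; i < l->count; i++)
        if (l->pids[i] == pid)
            return true;  /* already present: do not grow the chain */

    if (l->count == MAX_LISTENERS)
        return false;

    l->pids[l->count++] = pid;
    return true;
}
```

Repeating the same registration any number of times leaves a single entry, so the chain length is bounded by the number of distinct listeners.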
     

25 Jun, 2011

1 commit


22 Jun, 2011

3 commits

  • Toralf Förster and Richard Weinberger noted that if there is
    no RTC device, the alarm timers core prints out an annoying
    "ALARM timers will not wake from suspend" message.

    This warning has been removed in a previous patch, however
    the issue still remains: The original idea was to support
    alarm timers even if there was no rtc device, as long as the
    system didn't go into suspend.

    However, after further consideration, communicating to the application
    that alarmtimers are not fully functional seems like the better
    solution.

    So this patch makes it so we return -ENOTSUPP to any posix _ALARM
    clockid calls if there is no backing RTC device on the system.

    Further, this changes the behavior so that when there is no rtc device
    we check for one on clock_getres, clock_gettime, timer_create,
    and timer_nsleep instead of on suspend.

    CC: Toralf Förster
    CC: Richard Weinberger
    CC: Thomas Gleixner
    Reported-by: Toralf Förster
    Reported-by: Richard Weinberger
    Signed-off-by: John Stultz

    John Stultz
     
  • The alarmtimers code currently picks a rtc device to use at
    late init time. However, if your rtc driver is loaded as a module,
    it may be registered after the alarmtimers late init code, leaving
    the alarmtimers nonfunctional.

    This patch moves the rtc device selection to when we actually try
    to use it, allowing us to make use of rtc modules that may have been
    loaded at any point since bootup.

    CC: Thomas Gleixner
    CC: Meelis Roos
    Reported-by: Meelis Roos
    Signed-off-by: John Stultz

    John Stultz
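
Selecting the device at use time rather than at init can be sketched as a lazy lookup. All names here (struct rtc_dev, registered_rtc, alarm_get_rtc) are hypothetical and only illustrate the ordering problem.

```c
#include <assert.h>
#include <stddef.h>

struct rtc_dev {
    const char *name;
};

/* Hypothetical registry slot that a driver module fills in when loaded. */
static struct rtc_dev *registered_rtc;

/* Cached choice; stays NULL until a device has been found. */
static struct rtc_dev *alarm_rtc;

/* Called by a driver whenever it loads, possibly long after boot. */
static void rtc_register(struct rtc_dev *dev)
{
    assert(dev);
    registered_rtc = dev;
}

/*
 * Look the device up at use time instead of once at init, so a driver
 * that registers late is still picked up.
 */
static struct rtc_dev *alarm_get_rtc(void)
{
    if (!alarm_rtc)
        alarm_rtc = registered_rtc;
    return alarm_rtc;
}
```

A lookup that fails early is simply retried on the next use, so module load order no longer matters.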
     
  • When opening /dev/snapshot device, snapshot_open() creates memory
    bitmaps which are freed in snapshot_release(). But if any of the
    callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
    fails, snapshot_release() is never called and bitmaps are not freed.
    Next attempt to open /dev/snapshot then triggers BUG_ON() check in
    create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
    is active on s390x.

    Signed-off-by: Michal Kubecek
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Michal Kubecek
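
One way to avoid such a leak is to undo the allocation on the failing open path. The sketch below is a userspace mock under assumed names; the notifier_ok flag and the -EBUSY error code are placeholders, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool bitmaps_allocated;

static int create_bitmaps(void)
{
    assert(!bitmaps_allocated);  /* mirrors the BUG_ON() described above */
    bitmaps_allocated = true;
    return 0;
}

static void free_bitmaps(void)
{
    bitmaps_allocated = false;
}

/* notifier_ok is a hypothetical stand-in for the notifier chain result. */
static int mock_snapshot_open(bool notifier_ok)
{
    int error = create_bitmaps();

    if (error)
        return error;

    if (!notifier_ok) {
        /* release() will never run for a failed open, so undo here */
        free_bitmaps();
        return -EBUSY;
    }
    return 0;
}
```

With the cleanup in place, a second open after a failed one no longer trips the allocation check.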
     

20 Jun, 2011

1 commit

  • …-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tools/perf: Fix static build of perf tool
    tracing: Fix regression in printk_formats file

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Make watchdog robust vs. interruption
    timerfd: Fix wakeup of processes when timer is cancelled on clock change

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, MAINTAINERS: Add x86 MCE people
    x86, efi: Do not reserve boot services regions within reserved areas

    Linus Torvalds
     

19 Jun, 2011

1 commit


18 Jun, 2011

1 commit

  • ____call_usermodehelper() now erases any credentials set by the
    subprocess_info::init() function. The problem is that commit
    17f60a7da150 ("capabilites: allow the application of capability limits
    to usermode helpers") creates and commits new credentials with
    prepare_kernel_cred() after the call to the init() function. This wipes
    all keyrings after umh_keys_init() is called.

    The best way to deal with this is to put the init() call just prior to
    the commit_creds() call, and pass the cred pointer to init(). That
    means that umh_keys_init() and suchlike can modify the credentials
    _before_ they are published and potentially in use by the rest of the
    system.

    This prevents request_key() from working as it is prevented from passing
    the session keyring it set up with the authorisation token to
    /sbin/request-key, and so the latter can't assume the authority to
    instantiate the key. This causes the in-kernel DNS resolver to fail
    with ENOKEY unconditionally.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Jun, 2011

3 commits

  • There is a problem where kdump (the 2nd kernel) sometimes hangs due
    to a pending IPI from the 1st kernel. A kernel panic occurs because
    the IPI arrives before call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
     
  • The commit "use softirq instead of kthreads except when RCU_BOOST=y"
    just applied #ifdef in place. This commit is a cleanup that moves
    the newly #ifdef'ed code to the header file kernel/rcutree_plugin.h.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The clocksource watchdog code is interruptible and it has been
    observed that this can trigger false positives which disable the TSC.

    The reason is that an interrupt storm or a long running interrupt
    handler between the read of the watchdog source and the read of the
    TSC brings the two far enough apart that the delta is larger than the
    unstable threshold. Move both reads into a short interrupt disabled
    region to avoid that.

    Reported-and-tested-by: Vernon Mauery
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

16 Jun, 2011

5 commits

  • Merge reason: add the latest fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • This patch #ifdefs RCU kthreads out of the kernel unless RCU_BOOST=y,
    thus eliminating context-switch overhead if RCU priority boosting has
    not been configured.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • …el/git/tip/linux-2.6-tip

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Check if lowest_mask is initialized in find_lowest_rq()
    sched: Fix need_resched() when checking peempt

    Linus Torvalds
     
  • CONFIG_CONSTRUCTORS controls support for running constructor functions at
    kernel init time. According to commit b99b87f70c7785ab ("kernel:
    constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
    CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
    and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
    CONFIG_GCOV_KERNEL select it, so that the normal case of
    CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

    Observed in the short list of =y values in a minimal kernel configuration.

    Signed-off-by: Josh Triplett
    Acked-by: WANG Cong
    Acked-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • The following crash was reported:

    > Call Trace:
    > [] mem_cgroup_from_task+0x15/0x17
    > [] __mem_cgroup_try_charge+0x148/0x4b4
    > [] ? need_resched+0x23/0x2d
    > [] ? preempt_schedule+0x46/0x4f
    > [] mem_cgroup_charge_common+0x9a/0xce
    > [] mem_cgroup_newpage_charge+0x5d/0x5f
    > [] khugepaged+0x5da/0xfaf
    > [] ? __init_waitqueue_head+0x4b/0x4b
    > [] ? add_mm_counter.constprop.5+0x13/0x13
    > [] kthread+0xa8/0xb0
    > [] ? sub_preempt_count+0xa1/0xb4
    > [] kernel_thread_helper+0x4/0x10
    > [] ? retint_restore_args+0x13/0x13
    > [] ? __init_kthread_worker+0x5a/0x5a

    What happens is that khugepaged tries to charge a huge page against an mm
    whose last possible owner has already exited, and the memory controller
    crashes when the stale mm->owner is used to look up the cgroup to charge.

    mm->owner has never been set to NULL with the last owner going away, but
    nobody cared until khugepaged came along.

    Even then it wasn't a problem because the final mmput() on an mm was
    forced to acquire and release mmap_sem in write-mode, preventing an
    exiting owner from going away while the mmap_sem was held, and until
    "692e0b3 mm: thp: optimize memcg charge in khugepaged", the memory
    cgroup charge was protected by mmap_sem in read-mode.

    Instead of going back to relying on the mmap_sem to enforce lifetime of a
    task, this patch ensures that mm->owner is properly set to NULL when the
    last possible owner is exiting, which the memory controller can handle
    just fine.

    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Johannes Weiner
    Reported-by: Hugh Dickins
    Reported-by: Dave Jones
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

15 Jun, 2011

2 commits

  • On system boot up, the lowest_mask is initialized with an
    early_initcall(). But RT tasks may wake up on other
    early_initcall() callers before the lowest_mask is initialized,
    causing a system crash.

    Commit "d72bce0e67 rcu: Cure load woes" was the first commit
    to wake up RT tasks in early init. Before this commit this bug
    should not happen.

    Reported-by: Andrew Theurer
    Tested-by: Andrew Theurer
    Tested-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The RT preempt check tests the wrong task if NEED_RESCHED is
    set. It currently checks the local CPU task. It is supposed to
    check the task that is running on the runqueue we are about to
    wake another task on.

    Signed-off-by: Hillf Danton
    Reviewed-by: Yong Zhang
    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.org
    Signed-off-by: Ingo Molnar

    Hillf Danton