08 Jul, 2011

2 commits


07 Jul, 2011

2 commits

  • There is a bug in free_unnecessary_pages() that causes it to
    attempt to free too many pages in some cases, which triggers the
    BUG_ON() in memory_bm_clear_bit() for copy_bm. Namely, if
    count_data_pages() is initially greater than alloc_normal, we get
    to_free_normal equal to 0 and "save" greater than 0. In that case,
    if the sum of "save" and count_highmem_pages() is greater than
    alloc_highmem, we subtract a positive number from to_free_normal.
    Hence, since to_free_normal was 0 before the subtraction and is
    an unsigned int, the result is converted to a huge positive number
    that is used as the number of pages to free.

    Fix this bug by checking if to_free_normal is actually greater
    than or equal to the number we're going to subtract from it.

    Signed-off-by: Rafael J. Wysocki
    Reported-and-tested-by: Matthew Garrett
    Cc: stable@kernel.org

    Rafael J. Wysocki
     
  • Provides the ability to resize a resource that is already allocated.
    This functionality is put in place to support reallocation needs of
    pci resources.

    Signed-off-by: Ram Pai
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Ram Pai
     

01 Jul, 2011

1 commit

  • Commit c8b28116 ("sched: Increase SCHED_LOAD_SCALE resolution")
    intended to have no user-visible effect, but allows setting
    cpu.shares to < MIN_SHARES, which the user then sees.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Cc: Nikhil Rao
    Link: http://lkml.kernel.org/r/1307192600.8618.3.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
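
One way such a value can slip under the minimum is when the stored weight is scaled up but the limits are compared unscaled. The sketch below assumes that scenario; the constants and the helpers `scale_load()`, `scale_load_down()`, `set_shares()` and `get_shares()` are userspace stand-ins written for illustration, and the real kernel values may differ.

```c
/* Illustrative constants; assumptions, not copied from the kernel. */
#define LOAD_RESOLUTION 10
#define MIN_SHARES (1UL << 1)
#define MAX_SHARES (1UL << 18)

/* Stand-ins for scaling a user-visible weight up and back down. */
static unsigned long scale_load(unsigned long w)
{
	return w << LOAD_RESOLUTION;
}

static unsigned long scale_load_down(unsigned long w)
{
	return w >> LOAD_RESOLUTION;
}

static unsigned long clamp_ul(unsigned long v, unsigned long lo, unsigned long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp in the scaled domain, so that the value read back by the user
 * can never be smaller than MIN_SHARES or larger than MAX_SHARES. */
static unsigned long set_shares(unsigned long user_shares)
{
	return clamp_ul(scale_load(user_shares),
			scale_load(MIN_SHARES), scale_load(MAX_SHARES));
}

/* What the user would see when reading the setting back. */
static unsigned long get_shares(unsigned long stored)
{
	return scale_load_down(stored);
}
```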
     

29 Jun, 2011

1 commit

  • The jump labels entries for modules do not stop at __stop__jump_table,
    but after mod->jump_entries + mod_num_jump_entries.

    By checking the wrong end point, module trace events never get enabled.

    Cc: Ingo Molnar
    Acked-by: Jason Baron
    Tested-by: Avi Kivity
    Tested-by: Johannes Berg
    Signed-off-by: Xiao Guangrong
    Link: http://lkml.kernel.org/r/4E00038B.2060404@cn.fujitsu.com
    Signed-off-by: Steven Rostedt

    Xiao Guangrong
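
The end-point issue can be pictured with a small userspace sketch. The structures and the function `enable_key()` are hypothetical stand-ins, not the kernel's definitions; the point is only that a module's table is bounded by its own start pointer plus its own entry count.

```c
#include <stddef.h>

/* Hypothetical, simplified jump entry. */
struct jump_entry {
	unsigned long key;
	int enabled;
};

/* Hypothetical stand-in for the per-module bookkeeping. */
struct module_tables {
	struct jump_entry *jump_entries;
	unsigned int num_jump_entries;
};

/* Enable every entry of this module that matches key. The loop stops at
 * the module's own end (start + count), not at a global end marker. */
static unsigned int enable_key(struct module_tables *mod, unsigned long key)
{
	struct jump_entry *iter = mod->jump_entries;
	struct jump_entry *stop = mod->jump_entries + mod->num_jump_entries;
	unsigned int enabled = 0;

	for (; iter < stop; iter++) {
		if (iter->key == key) {
			iter->enabled = 1;
			enabled++;
		}
	}
	return enabled;
}
```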
     

28 Jun, 2011

1 commit

  • Currently a single process may register exit handlers an unlimited
    number of times. This may lead to a bloated listener chain and very
    slow process termination.

    E.g. after 10KK TASKSTATS_CMD_ATTR_REGISTER_CPUMASK commands have been
    sent, ~300 MB of kernel memory is consumed by the handler chain and
    "time id" shows 2-7 seconds instead of the normal 0.003. This makes it
    possible to exhaust all kernel memory and to eat much of the CPU time
    by triggering numerous exits on a single CPU.

    The patch limits the number of times a single process may register
    itself on a single CPU to one.

    One small issue remains unfixed: as taskstats_exit() is called before
    exit_files() in do_exit(), the orphaned listener entry (if it was not
    explicitly deregistered) is kept until some other process exits and
    triggers the implicit deregistration in send_cpu_listeners(). So, if a
    process that registered itself as a listener exits and the next spawned
    process gets the same pid, the new process would inherit the taskstats
    attributes.

    Signed-off-by: Vasiliy Kulikov
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
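
The "register at most once" rule can be sketched in userspace as a duplicate check before insertion. The list type and the functions `add_listener()`, `count_listeners()` and `free_listeners()` are hypothetical and only illustrate the idea; the patch itself operates on the kernel's per-CPU listener lists.

```c
#include <stdlib.h>

/* Hypothetical singly linked listener list for one CPU. */
struct listener {
	int pid;
	struct listener *next;
};

/* Add pid to the list unless it is already registered.
 * Returns 0 on success or if pid was already present, -1 on allocation failure. */
static int add_listener(struct listener **head, int pid)
{
	struct listener *l;

	for (l = *head; l != NULL; l = l->next) {
		if (l->pid == pid)
			return 0; /* already registered: the chain does not grow */
	}

	l = malloc(sizeof(*l));
	if (l == NULL)
		return -1;
	l->pid = pid;
	l->next = *head;
	*head = l;
	return 0;
}

static unsigned int count_listeners(const struct listener *head)
{
	unsigned int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}

static void free_listeners(struct listener *head)
{
	while (head != NULL) {
		struct listener *next = head->next;

		free(head);
		head = next;
	}
}
```

Repeated registration requests from the same pid leave the chain length unchanged, so the walk done at every exit stays short.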
     

25 Jun, 2011

1 commit


22 Jun, 2011

3 commits

  • Toralf Förster and Richard Weinberger noted that if there is
    no RTC device, the alarm timers core prints out an annoying
    "ALARM timers will not wake from suspend" message.

    This warning has been removed in a previous patch, however
    the issue still remains: The original idea was to support
    alarm timers even if there was no rtc device, as long as the
    system didn't go into suspend.

    However, after further consideration, communicating to the application
    that alarmtimers are not fully functional seems like the better
    solution.

    So this patch returns -ENOTSUPP to any posix _ALARM clockid calls
    if there is no backing RTC device on the system.

    Further, when there is no rtc device, we now check for one on
    clock_getres, clock_gettime, timer_create, and timer_nsleep instead
    of on suspend.

    CC: Toralf Förster
    CC: Richard Weinberger
    CC: Thomas Gleixner
    Reported-by: Toralf Förster
    Reported-by: Richard Weinberger
    Signed-off-by: John Stultz

    John Stultz
     
  • The alarmtimers code currently picks a rtc device to use at
    late init time. However, if your rtc driver is loaded as a module,
    it may be registered after the alarmtimers late init code, leaving
    the alarmtimers nonfunctional.

    This patch moves the rtc device selection to when we actually try
    to use it, allowing us to make use of rtc modules that may have been
    loaded at any point since bootup.

    CC: Thomas Gleixner
    CC: Meelis Roos
    Reported-by: Meelis Roos
    Signed-off-by: John Stultz

    John Stultz
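
Deferring the device selection to first use can be sketched as a lazy lookup. Everything here is a hypothetical userspace stand-in (`struct rtc_dev`, `register_rtc()`, `get_rtc()`), not the alarmtimer code; it only shows why a device registered late is still picked up.

```c
#include <stddef.h>

/* Hypothetical device handle. */
struct rtc_dev {
	const char *name;
};

/* Stand-in for the set of registered devices; a driver module may
 * fill this in at any point after boot. */
static struct rtc_dev *registered_rtc;

/* Device chosen so far, NULL until one has been found. */
static struct rtc_dev *cached_rtc;

static void register_rtc(struct rtc_dev *dev)
{
	registered_rtc = dev;
}

/* Look the device up whenever it is needed and not yet known, instead
 * of once at init time. Returns NULL while no device is registered. */
static struct rtc_dev *get_rtc(void)
{
	if (cached_rtc == NULL)
		cached_rtc = registered_rtc;
	return cached_rtc;
}
```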
     
  • When opening /dev/snapshot device, snapshot_open() creates memory
    bitmaps which are freed in snapshot_release(). But if any of the
    callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
    fails, snapshot_release() is never called and bitmaps are not freed.
    Next attempt to open /dev/snapshot then triggers BUG_ON() check in
    create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
    is active on s390x.

    Signed-off-by: Michal Kubecek
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Michal Kubecek
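
The leak follows a common pattern: a resource acquired early in open() must be released on every later failure path, because release() is not called for a failed open. A minimal sketch, assuming hypothetical stand-ins (`create_bitmaps()`, `free_bitmaps()`, `snapshot_open_sketch()`) and a flag that simulates the notifier chain result:

```c
#include <stdlib.h>

/* Stand-in for the memory bitmaps; NULL means "not allocated". */
static void *bitmaps;

static int create_bitmaps(void)
{
	if (bitmaps != NULL)
		return -1; /* the kernel would hit a BUG_ON() here */
	bitmaps = malloc(64);
	return bitmaps != NULL ? 0 : -1;
}

static void free_bitmaps(void)
{
	free(bitmaps);
	bitmaps = NULL;
}

/* notifier_ok simulates whether the notifier chain accepted the request. */
static int snapshot_open_sketch(int notifier_ok)
{
	int error = create_bitmaps();

	if (error)
		return error;

	if (!notifier_ok) {
		/* open() fails, so release() will never run: clean up here */
		free_bitmaps();
		return -1;
	}
	return 0;
}
```

After a failed open, a second attempt finds no stale bitmaps and can allocate them again.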
     

20 Jun, 2011

1 commit

  • …-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tools/perf: Fix static build of perf tool
    tracing: Fix regression in printk_formats file

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Make watchdog robust vs. interruption
    timerfd: Fix wakeup of processes when timer is cancelled on clock change

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, MAINTAINERS: Add x86 MCE people
    x86, efi: Do not reserve boot services regions within reserved areas

    Linus Torvalds
     

19 Jun, 2011

1 commit


18 Jun, 2011

1 commit

  • ____call_usermodehelper() now erases any credentials set by the
    subprocess_info::init() function. The problem is that commit
    17f60a7da150 ("capabilites: allow the application of capability limits
    to usermode helpers") creates and commits new credentials with
    prepare_kernel_cred() after the call to the init() function. This wipes
    all keyrings after umh_keys_init() is called.

    The best way to deal with this is to put the init() call just prior to
    the commit_creds() call, and pass the cred pointer to init(). That
    means that umh_keys_init() and suchlike can modify the credentials
    _before_ they are published and potentially in use by the rest of the
    system.

    This prevents request_key() from working as it is prevented from passing
    the session keyring it set up with the authorisation token to
    /sbin/request-key, and so the latter can't assume the authority to
    instantiate the key. This causes the in-kernel DNS resolver to fail
    with ENOKEY unconditionally.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
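
The ordering described in this commit (prepare new credentials, let init() modify them, then commit) can be sketched with stand-in types. `struct cred_sketch`, `run_helper()` and `keys_init()` are hypothetical and heavily simplified; they only show that changes made by init() survive when init() runs before the commit step.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified credentials. */
struct cred_sketch {
	int session_keyring;
};

/* Stand-in for the credentials currently in use by the task. */
static struct cred_sketch committed;

typedef int (*init_fn)(struct cred_sketch *new_cred);

static int run_helper(init_fn init)
{
	/* stand-in for preparing a fresh set of credentials */
	struct cred_sketch new_cred = { 0 };

	/* init() gets the not-yet-published credentials and may modify them */
	if (init != NULL) {
		int ret = init(&new_cred);

		if (ret != 0)
			return ret;
	}

	/* stand-in for committing: only now do the credentials take effect */
	committed = new_cred;
	return 0;
}

/* Example init() callback that installs a session keyring id. */
static int keys_init(struct cred_sketch *new_cred)
{
	new_cred->session_keyring = 1234;
	return 0;
}
```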
     

17 Jun, 2011

3 commits

  • There is a problem where kdump (the 2nd kernel) sometimes hangs due
    to a pending IPI from the 1st kernel. A kernel panic occurs because
    the IPI arrives before call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
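
Why the order matters can be shown with a plain circular list. The names below (`struct list_node`, `queue_init()`, `drain_queue()`, `boot_sketch()`, `pending_ipi_sketch()`) are hypothetical userspace stand-ins: a zero-filled list head has NULL links, so a handler that walks it before initialization would dereference NULL.

```c
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Static storage: next and prev start out as NULL, not as a valid empty list. */
static struct list_node call_queue;
static int irqs_enabled;

static void queue_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Walks the queue the way a handler would. It follows head->next, so the
 * head must have been initialized before this is ever called. */
static unsigned int drain_queue(struct list_node *head)
{
	unsigned int handled = 0;
	struct list_node *pos = head->next;

	while (pos != head) {
		handled++;
		pos = pos->next;
	}
	queue_init(head);
	return handled;
}

/* Boot order as described in the commit: initialize the queue first,
 * enable interrupts afterwards. */
static void boot_sketch(void)
{
	queue_init(&call_queue);
	irqs_enabled = 1;
}

/* A pending interrupt can only be delivered once interrupts are enabled. */
static unsigned int pending_ipi_sketch(void)
{
	return irqs_enabled ? drain_queue(&call_queue) : 0;
}
```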
     
  • The commit "use softirq instead of kthreads except when RCU_BOOST=y"
    just applied #ifdef in place. This commit is a cleanup that moves
    the newly #ifdef'ed code to the header file kernel/rcutree_plugin.h.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The clocksource watchdog code is interruptible and it has been
    observed that this can trigger false positives which disable the TSC.

    The reason is that an interrupt storm or a long running interrupt
    handler between the read of the watchdog source and the read of the
    TSC brings the two far enough apart that the delta is larger than the
    unstable threshold. Move both reads into a short interrupt disabled
    region to avoid that.

    Reported-and-tested-by: Vernon Mauery
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

16 Jun, 2011

4 commits

  • This patch #ifdefs RCU kthreads out of the kernel unless RCU_BOOST=y,
    thus eliminating context-switch overhead if RCU priority boosting has
    not been configured.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • …el/git/tip/linux-2.6-tip

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Check if lowest_mask is initialized in find_lowest_rq()
    sched: Fix need_resched() when checking peempt

    Linus Torvalds
     
  • CONFIG_CONSTRUCTORS controls support for running constructor functions at
    kernel init time. According to commit b99b87f70c7785ab ("kernel:
    constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
    CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
    and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
    CONFIG_GCOV_KERNEL select it, so that the normal case of
    CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

    Observed in the short list of =y values in a minimal kernel configuration.

    Signed-off-by: Josh Triplett
    Acked-by: WANG Cong
    Acked-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • The following crash was reported:

    > Call Trace:
    > [] mem_cgroup_from_task+0x15/0x17
    > [] __mem_cgroup_try_charge+0x148/0x4b4
    > [] ? need_resched+0x23/0x2d
    > [] ? preempt_schedule+0x46/0x4f
    > [] mem_cgroup_charge_common+0x9a/0xce
    > [] mem_cgroup_newpage_charge+0x5d/0x5f
    > [] khugepaged+0x5da/0xfaf
    > [] ? __init_waitqueue_head+0x4b/0x4b
    > [] ? add_mm_counter.constprop.5+0x13/0x13
    > [] kthread+0xa8/0xb0
    > [] ? sub_preempt_count+0xa1/0xb4
    > [] kernel_thread_helper+0x4/0x10
    > [] ? retint_restore_args+0x13/0x13
    > [] ? __init_kthread_worker+0x5a/0x5a

    What happens is that khugepaged tries to charge a huge page against an mm
    whose last possible owner has already exited, and the memory controller
    crashes when the stale mm->owner is used to look up the cgroup to charge.

    mm->owner has never been set to NULL with the last owner going away, but
    nobody cared until khugepaged came along.

    Even then it wasn't a problem because the final mmput() on an mm was
    forced to acquire and release mmap_sem in write-mode, preventing an
    exiting owner from going away while the mmap_sem was held, and until "692e0b3
    mm: thp: optimize memcg charge in khugepaged", the memory cgroup charge
    was protected by mmap_sem in read-mode.

    Instead of going back to relying on the mmap_sem to enforce lifetime of a
    task, this patch ensures that mm->owner is properly set to NULL when the
    last possible owner is exiting, which the memory controller can handle
    just fine.

    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Johannes Weiner
    Reported-by: Hugh Dickins
    Reported-by: Dave Jones
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

15 Jun, 2011

5 commits

  • On system boot up, the lowest_mask is initialized with an
    early_initcall(). But RT tasks may wake up on other
    early_initcall() callers before the lowest_mask is initialized,
    causing a system crash.

    Commit "d72bce0e67 rcu: Cure load woes" was the first commit
    to wake up RT tasks in early init. Before this commit this bug
    should not happen.

    Reported-by: Andrew Theurer
    Tested-by: Andrew Theurer
    Tested-by: Paul E. McKenney
    Signed-off-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
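
A guard against use before initialization can be sketched as follows. The variables and functions (`lowest_mask_sketch`, `init_mask()`, `find_lowest_cpu()`) are hypothetical userspace stand-ins; the lookup simply reports "no CPU found" while the mask has not been set up yet.

```c
#include <stddef.h>

/* NULL until the init routine has run. */
static unsigned long *lowest_mask_sketch;
static unsigned long mask_storage;

/* Stand-in for the early init call that sets up the mask. */
static void init_mask(void)
{
	lowest_mask_sketch = &mask_storage;
}

/* Returns the lowest CPU number set in allowed, or -1 if none is set
 * or if the mask has not been initialized yet. */
static int find_lowest_cpu(unsigned long allowed)
{
	int cpu;

	if (lowest_mask_sketch == NULL)
		return -1; /* called before init: bail out instead of crashing */

	*lowest_mask_sketch = allowed;
	for (cpu = 0; cpu < (int)(8 * sizeof(unsigned long)); cpu++) {
		if (*lowest_mask_sketch & (1UL << cpu))
			return cpu;
	}
	return -1;
}
```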
     
  • The RT preempt check tests the wrong task if NEED_RESCHED is
    set. It currently checks the local CPU task. It is supposed to
    check the task that is running on the runqueue we are about to
    wake another task on.

    Signed-off-by: Hillf Danton
    Reviewed-by: Yong Zhang
    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.org
    Signed-off-by: Ingo Molnar

    Hillf Danton
     
  • Fix kernel-doc warnings in signal.c:

    Warning(kernel/signal.c:2374): No description found for parameter 'nset'
    Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Commit a26ac2455ffcf3 ("rcu: move TREE_RCU from softirq to kthread")
    introduced a performance regression. In an AIM7 test, this commit
    degraded performance by about 40%.

    The commit runs RCU callbacks in a kthread instead of softirq. We observed
    a high rate of context switches caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per
    second, caused by RCU's per-CPU kthread. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
     
  • Make the functions creating the kthreads wake them up. Leverage the
    fact that the per-node and boost kthreads can run anywhere, thus
    dispensing with the need to wake them up once the incoming CPU has
    gone fully online.

    Signed-off-by: Paul E. McKenney
    Tested-by: Daniel J Blueman

    Paul E. McKenney
     

14 Jun, 2011

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ftrace: Revert 8ab2b7efd ftrace: Remove unnecessary disabling of irqs
    kprobes/trace: Fix kprobe selftest for gcc 4.6
    ftrace: Fix possible undefined return code
    oprofile, dcookies: Fix possible circular locking dependency
    oprofile: Fix locking dependency in sync_start()
    oprofile: Free potentially owned tasks in case of errors
    oprofile, x86: Add comments to IBS LVT offset initialization

    Linus Torvalds
     

10 Jun, 2011

1 commit

  • In kernel/irq/manage.c::irq_set_irq_wake() we call
    irq_get_desc_buslock() which may return NULL, but the code
    dereferences the result unconditionally.

    irq_set_irq_wake() has lots of callers - I checked a few and I couldn't
    find anything that guarantees that they won't call it with some input that
    will cause irq_get_desc_buslock() to return NULL, so I think it's a good
    thing to test and -EINVAL was the most sane error code in this situation
    that I could think of.

    Not all callers test the return value of irq_set_irq_wake(), but those
    that do take != 0 to mean error as far as I can see, so they should be
    fine. I guess those that don't test actually should, but that's a
    different issue.

    Signed-off-by: Jesper Juhl
    Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1106092300360.17868@swampdragon.chaosbits.net
    Signed-off-by: Thomas Gleixner

    Jesper Juhl
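
The added check has the usual "look up, test, then use" shape. A minimal sketch, assuming a hypothetical descriptor table and lookup function (`get_desc()`, `set_irq_wake_sketch()`); only the NULL test and the -EINVAL return mirror what the commit describes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified interrupt descriptor. */
struct irq_desc_sketch {
	unsigned int wake_depth;
};

#define NR_IRQS_SKETCH 4
static struct irq_desc_sketch descs[NR_IRQS_SKETCH];

/* Stand-in lookup: returns NULL for an irq number without a descriptor. */
static struct irq_desc_sketch *get_desc(unsigned int irq)
{
	return irq < NR_IRQS_SKETCH ? &descs[irq] : NULL;
}

static int set_irq_wake_sketch(unsigned int irq, int on)
{
	struct irq_desc_sketch *desc = get_desc(irq);

	/* test the lookup result before dereferencing it */
	if (desc == NULL)
		return -EINVAL;

	if (on)
		desc->wake_depth++;
	else if (desc->wake_depth > 0)
		desc->wake_depth--;
	return 0;
}
```

Callers that treat any non-zero return as an error keep working unchanged with this return value.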
     

09 Jun, 2011

1 commit

  • The fix for the printk_formats of modules broke the
    printk_formats of trace_printks in the kernel.

    What to show via the seq_file was only updated if the
    passed-in fmt was NULL, which happens only on the first
    iteration. The result was showing the first format every time
    instead of iterating through the available formats.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jun, 2011

5 commits

  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Fix comments in include/linux/perf_event.h
    perf: Comment /proc/sys/kernel/perf_event_paranoid to be part of user ABI
    perf python: Fix argument name list of read_on_cpu()
    perf evlist: Don't die if sample_{id_all|type} is invalid
    perf python: Use exception to propagate errors
    perf evlist: Remove dependency on debug routines
    perf, cgroups: Fix up for new API

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Ensure we locate the passed IRQ in irq_alloc_descs()
    genirq: Fix descriptor init on non-sparse IRQs
    irq: Handle spurios irq detection for threaded irqs
    genirq: Print threaded handler in spurious debug output

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix/clarify set_task_cpu() locking rules
    lockdep: Fix lock_is_held() on recursion
    sched: Fix schedstat.nr_wakeups_migrate
    sched: Fix cross-cpu clock sync on remote wakeups

    Linus Torvalds
     
  • Revert the commit that removed the disabling of interrupts around
    the initial modifying of mcount callers to nops, and update the comment.

    The original comment was outdated and stated that the interrupts were
    being disabled to prevent kstop machine, which was required with the
    old ftrace daemon, but was no longer the case.

    What the comment failed to mention was that interrupts needed to be
    disabled to keep interrupts from preempting the modifying of the code
    and then executing the code that was partially modified.

    Revert the commit and update the comment.

    Reported-by: Richard W.M. Jones
    Tested-by: Richard W.M. Jones
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • With gcc 4.6, the self test kprobe function:

    kprobe_trace_selftest_target()

    is optimized such that kallsyms does not list it. The kprobes
    test uses this function to insert a probe and test it. But
    it will fail the test if the function is not listed in kallsyms.

    Adding a __used annotation keeps the symbol in the kallsyms table.

    Suggested-by: David Daney
    Cc: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
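
For reference, a function can be marked as "used" with the GCC/Clang attribute of the same name, which tells the compiler to emit it even when it sees no reference to it. The macro name `USED_SKETCH` and the function below are hypothetical; the kernel's `__used` annotation is assumed here to expand to an equivalent attribute.

```c
/* GCC/Clang attribute; the macro name is made up for this sketch. */
#define USED_SKETCH __attribute__((used))

/* Without the attribute, a static function like this may be inlined,
 * cloned or dropped, so its plain name might not show up in the symbol
 * table. With it, the compiler keeps the function in the object file. */
static USED_SKETCH int selftest_target_sketch(int a, int b)
{
	return a + b;
}
```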
     

07 Jun, 2011

3 commits

  • Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
    set_task_cpu() was called with both relevant rq->locks held, which
    should be sufficient for running tasks since holding its rq->lock
    will serialize against sched_move_task().

    Update the comments and fix the task_group() lockdep test.

    Reported-and-tested-by: Sergey Senozhatsky
    Cc: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1307115427.2353.3456.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The main lock_is_held() user is lockdep_assert_held(); avoid false
    assertions in lockdep_off() sections by unconditionally reporting the
    lock is taken.

    [ the reason this is important is a lockdep_assert_held() in ttwu()
    which triggers a warning under lockdep_off() as in printk() which
    can trigger another wakeup and lock up due to spinlock
    recursion, as reported and heroically debugged by Arne Jansen ]

    Reported-and-tested-by: Arne Jansen
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc:
    Link: http://lkml.kernel.org/r/1307398759.2497.966.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • kernel/trace/ftrace.c: In function 'ftrace_regex_write.clone.15':
    kernel/trace/ftrace.c:2743:6: warning: 'ret' may be used uninitialized in this
    function

    Signed-off-by: GuoWen Li
    Link: http://lkml.kernel.org/r/201106011918.47939.guowen.li.linux@gmail.com
    Signed-off-by: Steven Rostedt

    GuoWen Li
     

04 Jun, 2011

2 commits


03 Jun, 2011

1 commit

  • There is an optimization which does not update the timer if the timer
    was pending and the expiration time was unchanged.

    Since commit 3bbb9ec9 ("timers: Introduce the concept of timer slack
    for legacy timers") this optimization is no longer applied for timers
    where the expiration time got extended due to the slack value. So we
    need to check again after the expiration time might have been updated.

    [ tglx: Made it a single check by applying slack first and sorting
    out the slack = 0 value (all timeouts < 256 jiffies) early ]

    Signed-off-by: Sebastian Andrzej Siewior
    Link: http://lkml.kernel.org/r/20110521105828.GA29442@Chamillionaire.breakpoint.cc
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
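
The "apply slack first, then compare" idea can be sketched in userspace. This is a simplified illustration under stated assumptions, not the kernel's timer code: `struct timer_sketch`, `apply_slack_sketch()` and `mod_timer_sketch()` are hypothetical, the current time is passed in explicitly, and the rounding rule is only an approximation of the behaviour described in the commit message (timeouts below 256 ticks get no slack when no explicit slack is set).

```c
/* Hypothetical, simplified timer. slack < 0 means "no explicit slack". */
struct timer_sketch {
	unsigned long expires;
	int pending;
	int slack;
};

/* Extend expires by the allowed slack and round it so that nearby
 * expiry times collapse onto the same value. */
static unsigned long apply_slack_sketch(const struct timer_sketch *t,
					unsigned long expires,
					unsigned long now)
{
	unsigned long limit, mask, bit = 0;

	if (t->slack >= 0) {
		limit = expires + (unsigned long)t->slack;
	} else {
		unsigned long delta = expires - now;

		if (delta < 256)
			return expires; /* short timeouts: slack is 0 */
		limit = expires + delta / 256;
	}

	mask = expires ^ limit;
	if (mask == 0)
		return expires;

	/* find the highest differing bit, then clear every bit below it */
	while (mask >>= 1)
		bit++;
	return limit & ~((1UL << bit) - 1);
}

/* Returns 1 if the timer had to be (re)queued, 0 if the update was skipped. */
static int mod_timer_sketch(struct timer_sketch *t, unsigned long expires,
			    unsigned long now)
{
	/* apply slack before comparing, so a slack-extended expiry still matches */
	expires = apply_slack_sketch(t, expires, now);

	if (t->pending && t->expires == expires)
		return 0;

	t->expires = expires;
	t->pending = 1;
	return 1;
}
```

Calling `mod_timer_sketch()` twice with the same requested expiry requeues the timer only the first time, because the comparison is made against the slack-adjusted value that was stored.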