01 Mar, 2013

1 commit

  • There are many retries happening on the per-cpu event device when
    running 'cat /proc/timer_list', as shown below:

    root@~$ cat /proc/timer_list
    Timer List Version: v0.6
    HRTIMER_MAX_CLOCK_BASES: 3
    now at 3297691988044 nsecs

    Tick Device: mode: 1
    Per CPU device: 0
    Clock Event Device: local_timer
    max_delta_ns: 8624432320
    min_delta_ns: 1000
    mult: 2138893713
    shift: 32
    mode: 3
    next_event: 3297700000000 nsecs
    set_next_event: twd_set_next_event
    set_mode: twd_set_mode
    event_handler: hrtimer_interrupt
    retries: 36383

    The reason is that the local timer stops when the CPU enters the C3
    state, so we need to switch from the local timer to the broadcast
    timer when entering that state and switch back on exit. The code
    looks like this:

    void arch_idle(void)
    {
            ....
            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
            enter_the_wait_mode();

            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
    }

    When the broadcast timer interrupt arrives (this interrupt just wakes
    up the ARM core, which has no chance to handle it since local irqs
    are disabled - in fact they are disabled in cpu_idle() of
    arch/arm/kernel/process.c), it wakes up the CPU, which then runs:

    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
    -> tick_broadcast_oneshot_control(...);
       -> tick_program_event(dev->next_event, 1);
          -> tick_dev_program_event(dev, expires, force);
             -> for (i = 0;;) {
                        int ret = clockevents_program_event(dev, expires, now);
                        if (!ret || !force)
                                return ret;
                        dev->retries++;
                        ....
                        now = ktime_get();
                        expires = ktime_add_ns(now, dev->min_delta_ns);
                }

    where clockevents_program_event() contains:

        delta = ktime_to_ns(ktime_sub(expires, now));
        if (delta <= 0)
                return -ETIME;

    Since dev->next_event has already expired, delta is <= 0 and
    clockevents_program_event() returns -ETIME, so tick_dev_program_event()
    does dev->retries++ every time it retries to program the expired timer.

    In the worst case, after re-programming the expired timer, the CPU
    enters idle again quickly, before the re-programmed timer expires;
    with no other interrupt arriving, the system will ping-pong this way
    forever.

    We hit this ping-pong issue during video playback testing: the system
    freezes and the video stops playing for some time, until another
    interrupt occurs and breaks the error condition.

    For detailed information, please refer to the LKML thread posted by
    Jason Liu: https://lkml.org/lkml/2013/2/20/216

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Jason Liu
    Tested-by: Jason Liu
    Tested-by: Santosh Shilimkar
    Tested-by: Lorenzo Pieralisi

    Jason Liu
     

20 Jul, 2012

9 commits

  • When performing cpu hotplug tests, the kernel printk log buffer gets
    flooded with pointless "Switched to NOHz mode..." messages, which can
    push more interesting information out of the buffer when a dump is
    analyzed afterwards.
    Assuming that switching to NOHz mode simply works, just remove the
    printk.

    Signed-off-by: Heiko Carstens
    Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     
  • Stepan found:

    CPU0                         CPUn

    _cpu_up()
      __cpu_up()

                                 bootstrap()
                                   notify_cpu_starting()
                                   set_cpu_online()
                                   while (!cpu_active())
                                     cpu_relax()

    smp_call_function(.wait=1)
      /* we find cpu_online() is true */
      arch_send_call_function_ipi_mask()

      /* wait-forever-more */

                                 local_irq_enable()

    cpu_notify(CPU_ONLINE)
      sched_cpu_active()
        set_cpu_active()

    Now the purpose of cpu_active is mostly with bringing down a cpu, where
    we mark it !active to avoid the load-balancer from moving tasks to it
    while we tear down the cpu. This is required because we only update the
    sched_domain tree after we brought the cpu-down. And this is needed so
    that some tasks can still run while we bring it down, we just don't want
    new tasks to appear.

    On cpu-up however the sched_domain tree doesn't yet include the new
    cpu, so it's invisible to the load-balancer, regardless of the active
    state.
    So instead of setting the active state after we boot the new cpu (and
    consequently having to wait for it before enabling interrupts) set the
    cpu active before we set it online and avoid the whole mess.

    Reported-by: Stepan Moskovchenko
    Signed-off-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • No need to assign ret in each case and break. Simply return the result
    of the handler function directly.
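
    A minimal sketch of the pattern, using the do_futex() dispatch as an
    illustration (not the exact diff):

        switch (cmd) {
        case FUTEX_WAKE:
                /* return the handler's result directly ... */
                return futex_wake(uaddr, flags, val, val3);
        case FUTEX_LOCK_PI:
                return futex_lock_pi(uaddr, flags, val, timeout, 0);
        /* ... instead of: ret = futex_wake(...); break; ... return ret; */
        }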

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Signed-off-by: Huang Shijie

    Thomas Gleixner
     
  • cpufreq: interactive: New 'interactive' governor

    This governor is designed for latency-sensitive workloads, such as
    interactive user interfaces. The interactive governor aims to be
    significantly more responsive, ramping the CPU up quickly when
    CPU-intensive activity begins.

    Existing governors sample CPU load at a particular rate, typically
    every X ms. This can under-power UI threads for the window between
    the moment the user starts interacting with a previously-idle system
    and the next sample period.

    The 'interactive' governor uses a different approach. Instead of sampling
    the CPU at a specified rate, the governor will check whether to scale the
    CPU frequency up soon after coming out of idle. When the CPU comes out of
    idle, a timer is configured to fire within 1-2 ticks. If the CPU is very
    busy from exiting idle to when the timer fires then we assume the CPU is
    underpowered and ramp to MAX speed.

    If the CPU was not sufficiently busy to immediately ramp to MAX speed, then
    the governor evaluates the CPU load since the last speed adjustment,
    choosing the highest value between that longer-term load or the short-term
    load since idle exit to determine the CPU speed to ramp to.
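
    A rough sketch of that decision, with hypothetical helpers
    cpu_load_since_idle_exit() and cpu_load_since_last_adjust() standing
    in for the governor's actual bookkeeping:

        /* Hedged sketch, not the actual driver code. */
        unsigned int short_term = cpu_load_since_idle_exit();   /* % load */
        unsigned int long_term  = cpu_load_since_last_adjust(); /* % load */

        if (short_term >= go_maxspeed_load)
                new_freq = policy->max; /* very busy since idle exit: MAX */
        else
                new_freq = policy->max * max(short_term, long_term) / 100;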

    A realtime thread is used for scaling up, giving the remaining tasks
    the CPU performance benefit; existing governors are more likely to
    schedule ramp-up work after the performance-starved tasks have
    completed.

    The tunables for this governor are:

    /sys/devices/system/cpu/cpufreq/interactive/min_sample_time:
        The minimum amount of time to spend at the current frequency
        before ramping down. This ensures the governor has seen enough
        historic CPU load data to determine the appropriate workload.

    /sys/devices/system/cpu/cpufreq/interactive/go_maxspeed_load:
        The CPU load at which to ramp to max speed.

    Signed-off-by: Anson Huang

    Anson Huang
     
  • Change a single occurrence of "unlcoked" into "unlocked".

    Signed-off-by: Bart Van Assche
    Cc: Darren Hart
    Cc: Thomas Gleixner
    Signed-off-by: Jiri Kosina

    Bart Van Assche
     
  • The variables here are really not used uninitialized.

    kernel/futex.c: In function 'fixup_pi_state_owner.clone.17':
    kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function
    kernel/futex.c: In function 'handle_futex_death':
    kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function
    kernel/futex.c: In function 'do_futex':
    kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function
    kernel/futex.c:828:6: note: 'curval' was declared here
    kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function
    kernel/futex.c:890:6: note: 'oldval' was declared here
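
    The usual fix for this kind of false positive in kernels of this era
    (assumed here, not quoted from the patch) is the uninitialized_var()
    annotation, which documents that the value is always written before
    use:

        u32 uninitialized_var(curval); /* expands to: u32 curval = curval; */
        u32 uninitialized_var(nval);
        u32 uninitialized_var(oldval);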

    Signed-off-by: Vitaliy Ivanov
    Acked-by: Darren Hart
    Signed-off-by: Jiri Kosina

    Vitaliy Ivanov
     
  • This was legacy code brought over from the RT tree and
    is no longer necessary.

    Signed-off-by: Dima Zavin
    Acked-by: Thomas Gleixner
    Cc: Daniel Walker
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andi Kleen
    Cc: Lai Jiangshan
    Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com
    Signed-off-by: Ingo Molnar

    Dima Zavin
     
  • Add a workaround for the SMP reboot issue: with SMP, all CPUs need
    to go through _rcu_barrier; if we enqueue an RCU callback, we need
    to keep the CPU tick alive until we have taken care of those
    callbacks by completing the appropriate grace period.

    This workaround only takes effect when the reboot command is issued,
    so it does not impact normal kernel operation.
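
    A minimal sketch of such a workaround, assuming it is wired up
    through a reboot notifier (the actual hook point is not shown in
    this log):

        #include <linux/reboot.h>
        #include <linux/rcupdate.h>

        /* Flush queued RCU callbacks before the CPUs go down. */
        static int rcu_reboot_notify(struct notifier_block *nb,
                                     unsigned long action, void *data)
        {
                rcu_barrier();  /* waits for pending callbacks on all CPUs */
                return NOTIFY_DONE;
        }

        static struct notifier_block rcu_reboot_nb = {
                .notifier_call = rcu_reboot_notify,
        };

        /* register_reboot_notifier(&rcu_reboot_nb); at init time */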

    Signed-off-by: Anson Huang

    Anson Huang
     
  • Some drivers are slow to suspend/resume, but some embedded systems
    such as eReaders and cellphones are time-sensitive. This commit
    reports drivers that are slow to suspend/resume; the default
    threshold is 500us (0.5ms).

    The threshold can be changed by writing a value, in microseconds, to
    '/sys/power/device_suspend_time_threshold'.

    The output is like:

    PM: device platform:soc-audio.2 suspend too slow, takes 606.696 msecs
    PM: device platform:mxc_sdc_fb.1 suspend too slow, takes 7.708 msecs

    This debug feature is off by default. To debug suspend/resume times,
    echo a time in microseconds to
    /sys/power/device_suspend_time_threshold.

    E.g., to find out which drivers take more than 0.5 ms (500 us) to
    suspend or resume:

    echo 500 > /sys/power/device_suspend_time_threshold
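
    A sketch of the kind of instrumentation described (assumed shape,
    not the actual patch), timing one device PM callback against the
    microsecond threshold:

        ktime_t starttime = ktime_get();
        error = cb(dev);        /* the driver's ->suspend()/->resume() */
        s64 elapsed_us = ktime_to_us(ktime_sub(ktime_get(), starttime));

        if (device_suspend_time_threshold &&
            elapsed_us > device_suspend_time_threshold)
                printk(KERN_WARNING "PM: device %s suspend too slow, "
                       "takes %lld usecs\n", dev_name(dev), elapsed_us);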

    Signed-off-by: Zhang Jiejing

    Zhang Jiejing
     

18 Jun, 2012

1 commit

  • commit a841f8cef4bb124f0f5563314d0beaf2e1249d72 upstream.

    The relax_domain_level boot parameter does not get processed because
    sched_domain_level_max is 0 at the time that
    setup_relax_domain_level() is run.

    Simply accept the value as it is, as we don't know the value of
    sched_domain_level_max until sched domain construction is completed.

    Fix sched_relax_domain_level in cpuset. The build_sched_domain() routine calls
    the set_domain_attribute() routine prior to setting the sd->level, however,
    the set_domain_attribute() routine relies on the sd->level to decide whether
    idle load balancing will be off/on.
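
    Sketched (the exact parsing and bounds check differ upstream), the
    boot-parameter path simply stores the value:

        static int __init setup_relax_domain_level(char *str)
        {
                /* accept as-is: sched_domain_level_max is unknown here */
                if (kstrtoint(str, 0, &default_relax_domain_level))
                        pr_warn("Unable to set relax_domain_level\n");
                return 1;
        }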

    Signed-off-by: Dimitri Sivanich
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dimitri Sivanich
     

01 Jun, 2012

1 commit

  • commit 544ecf310f0e7f51fa057ac2a295fc1b3b35a9d3 upstream.

    worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
    isn't zero when every worker is idle. This can trigger spuriously
    while a cpu is going down due to the way trustee sets %WORKER_ROGUE
    and zaps nr_running.

    It first sets %WORKER_ROGUE on all workers without updating
    nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
    then zaps nr_running. If the last running worker enters idle in
    between, it would see the stale nr_running which hasn't been zapped
    yet and trigger the WARN_ON_ONCE().

    Fix it by performing the sanity check iff the trustee is idle.

    Signed-off-by: Tejun Heo
    Reported-by: "Paul E. McKenney"
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

22 May, 2012

2 commits

  • commit b7dafa0ef3145c31d7753be0a08b3cbda51f0209 upstream.

    compat_sys_sigprocmask reads a smaller signal mask from userspace
    than sigprocmask accepts for setting. So the high word of
    blocked.sig[0] will be cleared, releasing any potentially blocked RT
    signal.

    This was discovered via userspace code that relies on get/setcontext.
    glibc's i386 versions of those functions use sigprocmask instead of
    rt_sigprocmask to save/restore the signal mask and caused RT signal
    unblocking this way.

    As suggested by Linus, this replaces the sys_sigprocmask based compat
    version with one that open-codes the required logic, including the merge
    of the existing blocked set with the new one provided on SIG_SETMASK.
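
    Sketched (illustrative; the upstream code differs in detail): on a
    64-bit kernel, SIG_SETMASK must take only the low 32 bits from
    userspace and preserve the high word of the existing blocked set:

        /* new_set is the 32-bit mask read from compat userspace */
        sigset_t new_blocked = current->blocked;

        new_blocked.sig[0] = (new_blocked.sig[0] & ~0xffffffffUL) | new_set;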

    Signed-off-by: Jan Kiszka
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jan Kiszka
     
  • commit 5e2bf0142231194d36fdc9596b36a261ed2b9fe7 upstream.

    Fork() failure post namespace creation for a child cloned with
    CLONE_NEWPID leaks pid_namespace/mnt_cache due to proc being mounted
    during creation, but not unmounted during cleanup. Call
    pid_ns_release_proc() during cleanup.

    Signed-off-by: Mike Galbraith
    Acked-by: Oleg Nesterov
    Reviewed-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Louis Rilling
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mike Galbraith
     

07 May, 2012

1 commit

  • commit c308b56b5398779cd3da0f62ab26b0453494c3d4 upstream.
    [ backported to 3.0 by Kerin Millar ]

    Various people reported nohz load tracking still being wrecked, but
    Doug spotted the actual problem. We fold the nohz remainder in too
    soon, causing us to lose samples and under-account.

    So instead of playing catch-up up-front, always do a single load-fold
    with whatever state we encounter and only then fold the nohz remainder
    and play catch-up.
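
    Schematically (a hedged sketch using that era's kernel/sched names),
    the order becomes:

        /* fold the sample for the state we actually found ... */
        delta = calc_load_fold_active(this_rq);
        if (delta)
                atomic_long_add(delta, &calc_load_tasks);

        /* ... and only then fold the NOHZ remainder / play catch-up */
        calc_global_nohz();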

    Reported-by: Doug Smythies
    Reported-by: Lesław Kopeć
    Reported-by: Aman Gupta
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org
    Signed-off-by: Ingo Molnar
    Cc: Kerin Millar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

23 Apr, 2012

2 commits

  • commit bdbb776f882f5ad431aa1e694c69c1c3d6a4a5b8 upstream.

    It was possible to extract the robust list head address from a setuid
    process if it had used set_robust_list(), allowing an ASLR info leak. This
    changes the permission checks to be the same as those used for similar
    info that comes out of /proc.

    Running a setuid program that uses robust futexes would have had:
    cred->euid != pcred->euid
    cred->euid == pcred->uid
    so the old permissions check would allow it. I'm not aware of any setuid
    programs that use robust futexes, so this is just a preventative measure.

    (This patch is based on changes from grsecurity.)
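
    The /proc-style check referred to above is ptrace_may_access(); a
    sketch of the reworked permission test (assumed detail):

        /* gate the robust list head behind a ptrace-style check */
        ret = -EPERM;
        if (!ptrace_may_access(p, PTRACE_MODE_READ))
                goto err_unlock;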

    Signed-off-by: Kees Cook
    Cc: Darren Hart
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Eric W. Biederman
    Cc: David Howells
    Cc: Serge E. Hallyn
    Cc: kernel-hardening@lists.openwall.com
    Cc: spender@grsecurity.net
    Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.net
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 6f103929f8979d2638e58d7f7fda0beefcb8ee7e upstream.

    Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
    calling tick_do_update_jiffies64(now).

    If we reach this point in the loop it means that we crossed a tick
    boundary since we grabbed the "now" timestamp, so at this point "now"
    refers to a time in the old jiffy, so using the old value for "now" is
    incorrect, and is likely to give us a stale jiffies value.

    In particular, the first time through the loop the
    tick_do_update_jiffies64(now) call is always a no-op, since the
    caller, tick_nohz_restart_sched_tick(), will have already called
    tick_do_update_jiffies64(now) with that "now" value.

    Note that tick_nohz_stop_sched_tick() already uses the correct
    approach: when we notice we cross a jiffy boundary, grab a new
    timestamp with ktime_get(), and *then* update jiffies.
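
    In other words, the loop should do (sketch):

        /* refresh the timestamp after crossing a tick boundary ... */
        now = ktime_get();
        /* ... and only then update jiffies with the fresh value */
        tick_do_update_jiffies64(now);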

    Signed-off-by: Neal Cardwell
    Cc: Ben Segall
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Neal Cardwell
     

13 Apr, 2012

5 commits

  • commit 79549c6dfda0603dba9a70a53467ce62d9335c33 upstream.

    keyctl_session_to_parent(task) sets ->replacement_session_keyring,
    it should be processed and cleared by key_replace_session_keyring().

    However, this task can fork before it notices TIF_NOTIFY_RESUME and
    the new child gets the bogus ->replacement_session_keyring copied by
    dup_task_struct(). This is obviously wrong and, if nothing else, this
    leads to put_cred(already_freed_cred).

    Change copy_creds() to clear this member. If copy_process() fails
    before this point, the wrong ->replacement_session_keyring doesn't
    matter, as exit_creds() won't be called.
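
    Sketched, the fix is essentially one line in copy_creds():

        /* the child must not inherit the parent's pending replacement */
        p->replacement_session_keyring = NULL;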

    Signed-off-by: Oleg Nesterov
    Acked-by: David Howells
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 620f6e8e855d6d447688a5f67a4e176944a084e8 upstream.

    Commit bfdc0b4 adds code to restrict access to dmesg_restrict,
    however, it incorrectly alters kptr_restrict rather than
    dmesg_restrict.

    The original patch from Richard Weinberger
    (https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
    expected, and so the patch seems to have been misapplied.

    This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
    kptr_restrict, since both are sensitive.
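
    A sketch of the guarded sysctl handler (the helper name follows
    kernel/sysctl.c conventions and is assumed here):

        static int proc_dointvec_minmax_sysadmin(struct ctl_table *table,
                        int write, void __user *buffer, size_t *lenp,
                        loff_t *ppos)
        {
                if (write && !capable(CAP_SYS_ADMIN))
                        return -EPERM;
                return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
        }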

    Reported-by: Phillip Lougher
    Signed-off-by: Kees Cook
    Acked-by: Serge Hallyn
    Acked-by: Richard Weinberger
    Signed-off-by: James Morris
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 98b54aa1a2241b59372468bd1e9c2d207bdba54b upstream.

    There is extra state information that needs to be exposed in the
    kgdb_bpt structure for tracking how a breakpoint was installed. The
    debug_core only uses probe_kernel_write() to install breakpoints,
    but this is not enough for all the archs. Some archs such as x86
    need to use text_poke() in order to install a breakpoint into a read
    only page.

    Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
    kgdb_arch_remove_breakpoint() allows other archs to set the type
    variable which indicates how the breakpoint was installed.
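
    The interface change, sketched (signatures illustrative):

        /* before: address plus saved instruction bytes */
        int kgdb_arch_set_breakpoint(unsigned long addr, char *saved_instr);

        /* after: pass the struct so the arch can record bpt->type,
         * e.g. that text_poke() had to be used for a read-only page */
        int kgdb_arch_set_breakpoint(struct kgdb_bpt *bpt);
        int kgdb_arch_remove_breakpoint(struct kgdb_bpt *bpt);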

    Signed-off-by: Jason Wessel
    Signed-off-by: Greg Kroah-Hartman

    Jason Wessel
     
  • commit 01de982abf8c9e10fc3089e10585cd2cc914bdab upstream.

    8 hex characters tell only half the tale for 64 bit CPUs,
    so use the appropriate length.

    Link: http://lkml.kernel.org/r/1332411501-8059-2-git-send-email-wolfgang.mauerer@siemens.com

    Signed-off-by: Wolfgang Mauerer
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Wolfgang Mauerer
     
  • commit f5cb92ac82d06cb583c1f66666314c5c0a4d7913 upstream.

    irq_move_masked_irq() checks the return code of
    chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
    also a valid return code, which is there to avoid a redundant copy of
    the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid
    the redundant copy, we also fail to adjust the thread affinity of an
    eventually threaded interrupt handler.

    Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY (==1) return
    values correctly by checking the valid return values separately.
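
    A sketch of the corrected handling (close to, but not necessarily,
    the exact upstream diff):

        switch (chip->irq_set_affinity(&desc->irq_data,
                                       desc->pending_mask, false)) {
        case IRQ_SET_MASK_OK:
                cpumask_copy(desc->irq_data.affinity, desc->pending_mask);
                /* fall through */
        case IRQ_SET_MASK_OK_NOCOPY:
                irq_set_thread_affinity(desc);
        }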

    Signed-off-by: Jiang Liu
    Cc: Jiang Liu
    Cc: Keping Chen
    Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Jiang Liu
     

03 Apr, 2012

6 commits

  • commit f946eeb9313ff1470758e171a60fe7438a2ded3f upstream.

    Module size was limited to 64MB; this was a legacy limitation
    stemming from a vmalloc() restriction that was removed a while ago.

    Limiting module size to 64MB is both pointless and affects real
    world use cases.
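
    The removed guard likely looked like this (sketch):

        /* legacy: vmalloc() once barfed on "unusual" sizes */
        if (len > 64 * 1024 * 1024 || (hdr = vmalloc(len)) == NULL)
                return ERR_PTR(-ENOMEM);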

    Cc: Tim Abbott
    Signed-off-by: Sasha Levin
    Signed-off-by: Rusty Russell
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     
  • commit 05b4877f6a4f1ba4952d1222213d262bf8c132b7 upstream.

    If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled
    before returning. Fix this. And while at it, reword the goto labels so that
    they look more meaningful.
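
    Sketched, the repaired error path looks like:

        error = create_basic_memory_bitmaps();
        if (error)
                usermodehelper_enable();  /* undo the earlier disable */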

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Srivatsa S. Bhat
     
  • commit 540b60e24f3f4781d80e47122f0c4486a03375b8 upstream.

    We do not want a bitwise AND between boolean operands
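
    Illustrative only (the offending expression is not quoted in this
    log):

        if (flag_a && flag_b)  /* logical AND: short-circuits, as intended */
                do_something();
        /* rather than: if (flag_a & flag_b) -- bitwise AND of booleans */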

    Signed-off-by: Alexander Gordeev
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20120309135912.GA2114@dhcp-26-207.brq.redhat.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Alexander Gordeev
     
  • commit a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 upstream.

    In 2008, commit 0c5d1eb77a8be ("genirq: record trigger type") modified the
    way set_irq_type() handles the 'no trigger' condition. However, this has
    an adverse effect on PCMCIA support on Intel StrongARM and probably PXA
    platforms.

    PCMCIA has several status signals on the socket which can trigger
    interrupts; some of these status signals depend on the card's mode
    (whether it is configured in memory or IO mode). For example, cards have
    a 'Ready/IRQ' signal: in memory mode, this provides an indication to
    PCMCIA that the card has finished its power up initialization. In IO
    mode, it provides the device interrupt signal. Other status signals
    switch between on-board battery status and loud speaker output.

    In classical PCMCIA implementations, where you have a specific socket
    controller, the controller provides a method to mask interrupts from the
    socket, and importantly ignore any state transitions on the pins which
    correspond with interrupts once masked. This masking prevents unwanted
    events caused by the removal and application of socket power being
    forwarded.

    However, on platforms where there is no socket controller, the PCMCIA
    status and interrupt signals are routed to standard edge-triggered GPIOs.
    These GPIOs can be configured to interrupt on rising edge, falling edge,
    or never. This is where the problems start.

    Edge triggered interrupts are required to record events while disabled via
    the usual methods of {free,request,disable,enable}_irq() to prevent
    problems with dropped interrupts (eg, the 8390 driver uses disable_irq()
    to defer the delivery of interrupts). As a result, these interfaces can
    not be used to implement the desired behaviour.

    The side effect of this is that if the 'Ready/IRQ' GPIO is disabled
    via disable_irq() on suspend, and enabled via enable_irq() after
    resume, we will record the state transitions caused by powering
    events as valid interrupts, and forward them to the card driver,
    which may attempt to access a card which is not powered up.

    This delays resume while drivers spin in their interrupt handlers,
    and leads to complaints from drivers before they realize what's
    happened.

    Moreover, in the case of the 'Ready/IRQ' signal, this is requested and
    freed by the card driver itself; the PCMCIA core has no idea whether the
    interrupt is requested, and, therefore, whether a call to disable_irq()
    would be valid. (We tried this around 2.4.17 / 2.5.1 kernel era, and
    ended up throwing it out because of this problem.)

    Therefore, it was decided back in around 2002 to disable the edge
    triggering instead, resulting in all state transitions on the GPIO being
    ignored. That's what we actually need the hardware to do.

    The commit above changes this behaviour; it explicitly prevents the 'no
    trigger' state being selected.

    The reason that request_irq() does not accept the 'no trigger' state is
    for compatibility with existing drivers which do not provide their desired
    triggering configuration. The set_irq_type() function is 'new' and not
    used by non-trigger aware drivers.

    Therefore, revert this change, and restore previously working platforms
    back to their former state.

    Signed-off-by: Russell King
    Cc: linux@arm.linux.org.uk
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Russell King
     
  • commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream.

    'long secs' is passed as the divisor to div_s64(), which accepts a
    32-bit divisor. On 64-bit machines that value is trimmed from 8
    bytes to 4, causing a divide by zero when the number is bigger than
    (1 << 32) - 1 and all 32 lower bits are 0.

    Use div64_long() instead.
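
    Sketched with illustrative names:

        #include <linux/math64.h>

        /* div_s64() takes a 32-bit divisor and would truncate the long
         * 'secs' on 64-bit; div64_long() keeps the full width. */
        s64 result = div64_long(value, secs);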

    Signed-off-by: Sasha Levin
    Cc: johnstul@us.ibm.com
    Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     
  • commit 59263b513c11398cd66a52d4c5b2b118ce1e0359 upstream.

    Some of the newer futex PI opcodes do not check the cmpxchg enabled
    variable and call unconditionally into the handling functions. Cover
    all PI opcodes in a separate check.
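
    A sketch matching that description, covering the PI opcodes up
    front:

        switch (cmd) {
        case FUTEX_LOCK_PI:
        case FUTEX_UNLOCK_PI:
        case FUTEX_TRYLOCK_PI:
        case FUTEX_WAIT_REQUEUE_PI:
        case FUTEX_CMP_REQUEUE_PI:
                if (!futex_cmpxchg_enabled)
                        return -ENOSYS;
        }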

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

19 Mar, 2012

1 commit

  • commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 upstream.

    This patch (as1519) fixes a bug in the block layer's disk-events
    polling. The polling is done by a work routine queued on the
    system_nrt_wq workqueue. Since that workqueue isn't freezable, the
    polling continues even in the middle of a system sleep transition.

    Obviously, polling a suspended drive for media changes and such isn't
    a good thing to do; in the case of USB mass-storage devices it can
    lead to real problems requiring device resets and even re-enumeration.

    The patch fixes things by creating a new system-wide, non-reentrant,
    freezable workqueue and using it for disk-events polling.
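
    Sketched (workqueue name and flags assumed for this kernel era):

        system_nrt_freezable_wq = alloc_workqueue("events_nrt_freezable",
                        WQ_NON_REENTRANT | WQ_FREEZABLE, 0);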

    Signed-off-by: Alan Stern
    Acked-by: Tejun Heo
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     

13 Mar, 2012

2 commits

  • 3.0.21's 603b63484725a6e88e4ae5da58716efd88154b1e directly used
    the upstream patch, yet kprobes locking in 3.0.x uses spin_lock...()
    rather than raw_spin_lock...().

    Signed-off-by: Jan Beulich
    Signed-off-by: Greg Kroah-Hartman

    Jan Beulich
     
  • commit 52abb700e16a9aa4cbc03f3d7f80206cbbc80680 upstream.

    Commit ac5637611 (genirq: Unmask oneshot irqs when thread was not
    woken) fails to unmask when a !IRQ_ONESHOT threaded handler is
    handled by handle_level_irq.

    This happens because thread_mask is or'ed unconditionally in
    irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So
    the check for !desc->thread_active fails and keeps the interrupt
    disabled.

    Keep the thread_mask zero for !IRQ_ONESHOT interrupts.

    Document the thread_mask magic while at it.
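
    Sketch of the fix in __setup_irq() (condition placement
    illustrative):

        /* only ONESHOT interrupts take part in the thread_mask game;
         * keep the mask zero otherwise so handle_level_irq can unmask */
        if (new->flags & IRQF_ONESHOT)
                new->thread_mask = 1 << ffz(thread_mask);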

    Reported-and-tested-by: Sven Joachim
    Reported-and-tested-by: Stefan Lippers-Hollmann
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

01 Mar, 2012

3 commits

  • commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream.

    This patch is intentionally incomplete to simplify the review.
    It ignores ep_unregister_pollwait() which plays with the same wqh.
    See the next change.

    epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
    f_op->poll() needs. In particular it assumes that the wait queue
    can't go away until eventpoll_release(). This is not true in case
    of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
    which is not connected to the file.

    This patch adds the special event, POLLFREE, currently only for
    epoll. It expects that init_poll_funcptr()'ed hook should do the
    necessary cleanup. Perhaps it should be defined as EPOLLFREE in
    eventpoll.

    __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
    ->signalfd_wqh is not empty, we add the new signalfd_cleanup()
    helper.

    ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
    This makes the poll entry inconsistent, but we don't care. If you
    share an epoll fd which contains our sigfd with another process you
    should blame yourself. signalfd is "really special". I simply do not
    know how we can define the "right" semantics if it is used with
    epoll.

    The main problem is, epoll calls signalfd_poll() once to establish
    the connection with the wait queue, after that signalfd_poll(NULL)
    returns the different/inconsistent results depending on who does
    EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
    has nothing to do with the file, it works with the current thread.

    In short: this patch is the hack which tries to fix the symptoms.
    It also assumes that nobody can take tasklist_lock under epoll
    locks, this seems to be true.

    Note:

    - we do not have wake_up_all_poll() but wake_up_poll()
    is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

    - signalfd_cleanup() uses POLLHUP along with POLLFREE,
    we need a couple of simple changes in eventpoll.c to
    make sure it can't be "lost".
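
    The ep_poll_callback() side mentioned above, sketched:

        /* on POLLFREE, unhook the wait entry: the wait queue head is
         * about to disappear together with the sighand */
        if ((unsigned long)key & POLLFREE) {
                list_del_init(&wait->task_list);
                return 0;
        }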

    Reported-by: Maxime Bizon
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit b4bc724e82e80478cba5fe9825b62e71ddf78757 upstream.

    An interrupt might be pending when irq_startup() is called, but the
    startup code does not invoke the resend logic. In some cases this
    prevents the device from issuing another interrupt, which renders
    the device non-functional.

    Call the resend function in irq_startup() to keep things going.
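
    Sketched, the end of irq_startup() gains (call shape assumed for
    this kernel era):

        /* replay anything that arrived while the line was shut down */
        check_irq_resend(desc, desc->irq_data.irq);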

    Reported-and-tested-by: Russell King
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit ac5637611150281f398bb7a47e3fcb69a09e7803 upstream.

    When the primary handler of an interrupt which is marked IRQ_ONESHOT
    returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
    woken and the unmask logic of the interrupt line is never
    invoked. This keeps the interrupt masked forever.

    This was not noticed as most IRQ_ONESHOT users wake the thread
    unconditionally (usually because they cannot access the underlying
    device from hard interrupt context). Though this behaviour was nowhere
    documented and not necessarily intentional. Some drivers can avoid the
    thread wakeup in certain cases and run into the situation where the
    interrupt line is kept masked.

    Handle it gracefully.

    Reported-and-tested-by: Lothar Wassmann
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     


14 Feb, 2012

3 commits

  • commit df754e6af2f237a6c020c0daff55a1a609338e31 upstream.

    It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false
    lockdep messages, so do not disable lockdep in that case.
    We still want to keep lockdep disabled in the
    TAINT_OOT_MODULE case:

    - bin-only modules can cause various instabilities in their own and
    in unrelated kernel code

    - they are impossible to debug for kernel developers

    - they also typically do not have the copyright license
    permission to link to the GPL-ed lockdep code.

    Suggested-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit cb297a3e433dbdcf7ad81e0564e7b804c941ff0d upstream.

    This issue happens under the following conditions:

    1. preemption is off
    2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
    3. RT scheduling class
    4. SMP system

    Sequence is as follows:

    1. Suppose the current task is A; schedule() starts.
    2. Task A is enqueued on the pushable task list at the entry of
       schedule():
       __schedule
         prev = rq->curr;
         ...
         put_prev_task
           put_prev_task_rt
             enqueue_pushable_task
    3. Task B is picked as the next task:
       next = pick_next_task(rq);
    4. rq->curr is set to task B and context_switch is started:
       rq->curr = next;
    5. At the entry of context_switch, this cpu's rq->lock is released:
       context_switch
         prepare_task_switch
           prepare_lock_switch
             raw_spin_unlock_irq(&rq->lock);
    6. Shortly after rq->lock is released, an interrupt occurs and IRQ
       context is entered.
    7. try_to_wake_up(), called by the ISR, acquires rq->lock:
       try_to_wake_up
         ttwu_remote
           rq = __task_rq_lock(p)
           ttwu_do_wakeup(rq, p, wake_flags);
             task_woken_rt
    8. push_rt_task picks task A, which was enqueued earlier:
       task_woken_rt
         push_rt_tasks(rq)
           next_task = pick_next_pushable_task(rq)
    9. At find_lock_lowest_rq(), if double_lock_balance() returns 0,
       lowest_rq can be the remote rq. (But if preemption is on,
       double_lock_balance() always returns 1 and this doesn't happen.)
       push_rt_task
         find_lock_lowest_rq
           if (double_lock_balance(rq, lowest_rq))..
    10. find_lock_lowest_rq() returns an available rq and task A is
        migrated to the remote cpu/rq:
        push_rt_task
          ...
          deactivate_task(rq, next_task, 0);
          set_task_cpu(next_task, lowest_rq->cpu);
          activate_task(lowest_rq, next_task, 0);
    11. But task A is still in IRQ context on this cpu, so task A is
        scheduled by two cpus at the same time until it returns from
        IRQ, and task A's stack is corrupted.

    To fix it, don't migrate an RT task if it's still running.
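
    A sketch of that condition (placement in find_lock_lowest_rq()
    illustrative):

        /* never migrate a task that is still running on this cpu */
        if (task_running(rq, next_task))
                return NULL;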

    Signed-off-by: Chanho Min
    Signed-off-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Chanho Min
     
  • commit 55ca6140e9bb307efc97a9301a4f501de02a6fd6 upstream.

    In function pre_handler_kretprobe(), the allocated kretprobe_instance
    object will get leaked if the entry_handler callback returns non-zero.
    This may exhaust all the preallocated kretprobe_instance objects.

    This issue can be reproduced by changing
    samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix
    is straightforward: just put the allocated kretprobe_instance object back
    onto the free_instances list.
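
    Sketched (the [akpm] note below points at the raw_spin_lock
    variant):

        if (rp->entry_handler && rp->entry_handler(ri, regs)) {
                /* probe rejected: recycle the instance, don't leak it */
                raw_spin_lock_irqsave(&rp->lock, flags);
                hlist_add_head(&ri->hlist, &rp->free_instances);
                raw_spin_unlock_irqrestore(&rp->lock, flags);
                return 0;
        }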

    [akpm@linux-foundation.org: use raw_spin_lock/unlock]
    Signed-off-by: Jiang Liu
    Acked-by: Jim Keniston
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Masami Hiramatsu
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jiang Liu