14 Dec, 2011

2 commits

  • cic is the association between an io_context and a request_queue.
    A cic is linked from both the ioc and the q and should be destroyed
    when either one goes away. As ioc and q each have their own lock,
    locking becomes a bit complex - both lock orders work for removal
    from one side but not from the other.

    Currently, cfq tries to circumvent this locking order issue with RCU.
    ioc->lock nests inside queue_lock, but the radix tree and the cic's
    are also protected by RCU, allowing either side to walk its lists
    without grabbing the lock.

    This rather unconventional use of RCU quickly devolves into extremely
    fragile convolution. e.g. the following oops is from cfqd going away
    too soon after an ioc/q exit race.

    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:
    [ 88.503444]
    Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
    RIP: 0010:[] [] cfq_exit_single_io_context+0x58/0xf0
    ...
    Call Trace:
    [] call_for_each_cic+0x5a/0x90
    [] cfq_exit_io_context+0x15/0x20
    [] exit_io_context+0x100/0x140
    [] do_exit+0x579/0x850
    [] do_group_exit+0x5b/0xd0
    [] sys_exit_group+0x17/0x20
    [] system_call_fastpath+0x16/0x1b

    The only real hot path here is cic lookup during request
    initialization, and avoiding extra locking there requires very
    confined use of RCU. This patch makes cic removal from both ioc and
    request_queue use double-locking and unlink immediately.

    * From q side, the change is almost trivial as ioc->lock nests inside
    queue_lock. It just needs to grab each ioc->lock as it walks
    cic_list and unlink it.

    * From ioc side, it's a bit more difficult because of the inverted
    lock order. ioc needs its lock to walk its cic_list but can't grab
    the matching queue_lock, so it needs to perform an unlock-relock
    dance.

    Unlinking is now wholly done from put_io_context() and the fast path
    is optimized by using the queue_lock the caller already holds, which
    is by far the most common case. If the ioc accessed multiple
    devices, it tries a trylock. In the unlikely case of fast path
    failure, it falls back to the full double-locking dance from a
    workqueue.
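
    As a rough illustration of that fast path / fallback split (a
    simplified sketch, not the exact code from the patch; first_cic(),
    unlink_cic() and release_work are assumed names):

        void put_io_context(struct io_context *ioc,
                            struct request_queue *locked_q)
        {
                unsigned long flags;

                if (!atomic_long_dec_and_test(&ioc->refcount))
                        return;

                spin_lock_irqsave(&ioc->lock, flags);
                while (!hlist_empty(&ioc->cic_list)) {
                        struct cfq_io_context *cic = first_cic(ioc);
                        struct request_queue *this_q = cic->q;

                        if (this_q == locked_q) {
                                /* caller already holds this queue_lock */
                                unlink_cic(cic);
                        } else if (spin_trylock(this_q->queue_lock)) {
                                /* opportunistically grab the other lock */
                                unlink_cic(cic);
                                spin_unlock(this_q->queue_lock);
                        } else {
                                /* inverse lock order - punt the full
                                 * unlock-relock dance to a workqueue */
                                schedule_work(&ioc->release_work);
                                break;
                        }
                }
                spin_unlock_irqrestore(&ioc->lock, flags);
        }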

    Double-locking isn't the prettiest thing in the world but it's *far*
    simpler and more understandable than the RCU trick, without adding
    any meaningful overhead.

    This still leaves a lot of now-unnecessary RCU logic. Future patches
    will trim it.

    -v2: Vivek pointed out that cic->q was being dereferenced after
    cic->release() was called. Updated to use local variable @this_q
    instead.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Ignoring copy_io() during fork, io_context can be allocated from two
    places - current_io_context() and set_task_ioprio(). The former is
    always called from the local task while the latter can be called
    from a different task. The synchronization between them is peculiar
    and dubious.

    * current_io_context() doesn't grab task_lock() and assumes that if it
    saw %NULL ->io_context, it would stay that way until allocation and
    assignment is complete. It has smp_wmb() between alloc/init and
    assignment.

    * set_task_ioprio() grabs task_lock() for assignment and does
    smp_read_barrier_depends() between "ioc = task->io_context" and "if
    (ioc)". Unfortunately, this doesn't achieve anything - the latter
    is not a dependent load of the former. i.e. if ioc itself were
    being dereferenced, "ioc->xxx", it would mean something (not sure
    what, though), but as the code currently stands, the dependent read
    barrier is a noop (illustrated below).
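
    (For reference, a dependent load - the only thing
    smp_read_barrier_depends() orders - would look like this sketch:

        ioc = task->io_context;
        smp_read_barrier_depends();
        prio = ioc->ioprio;     /* load through the obtained pointer */

    whereas "if (ioc)" is a control dependency, which this barrier does
    nothing for.)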

    As only one of the two test-assignment sequences is task_lock()
    protected, task_lock() can't do much about a race between the two.
    Nothing prevents current_io_context() and set_task_ioprio() from
    each allocating its own ioc for the same task and overwriting the
    other's.

    Also, set_task_ioprio() can race with an exiting task and create a
    new ioc after exit_io_context() has finished.

    ioc get/put doesn't have any reason to be complex. The only hot path
    is accessing the existing ioc of %current, which is simple to achieve
    given that ->io_context is never destroyed as long as the task is
    alive. All other paths can happily go through task_lock() like all
    other task substructures without impacting anything.

    This patch updates ioc get/put so that it becomes more conventional.

    * alloc_io_context() is replaced with get_task_io_context(). This is
    the only interface which can acquire access to the ioc of another
    task. On return, the caller has an explicit reference to the object
    which should later be put using put_io_context() (see the sketch
    after this list).

    * The functionality of current_io_context() remains the same but when
    creating a new ioc, it shares the code path with
    get_task_io_context() and always goes through task_lock().

    * get_io_context() now means incrementing ref on an ioc which the
    caller already has access to (be that an explicit refcnt or implicit
    %current one).

    * PF_EXITING inhibits creation of new io_context and once
    exit_io_context() is finished, it's guaranteed that both ioc
    acquisition functions return %NULL.

    * All users are updated. Most are trivial but
    smp_read_barrier_depends() removal from cfq_get_io_context() needs a
    bit of explanation. I suppose the original intention was to ensure
    ioc->ioprio is visible when set_task_ioprio() allocates a new
    io_context and installs it; however, this wouldn't have worked
    because set_task_ioprio() doesn't have a wmb between init and
    install. There are other problems with this which will be fixed in
    another patch.

    * While at it, use NUMA_NO_NODE instead of -1 for wildcard node
    specification.
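
    A sketch of the resulting get path (simplified;
    create_task_io_context() stands in for the shared allocation path
    and the exact code may differ):

        struct io_context *get_task_io_context(struct task_struct *task,
                                               gfp_t gfp_flags, int node)
        {
                struct io_context *ioc;

                task_lock(task);
                if (task->flags & PF_EXITING) {
                        /* exit_io_context() has run or is running -
                         * never install a new ioc past this point */
                        task_unlock(task);
                        return NULL;
                }
                ioc = task->io_context;
                if (ioc) {
                        get_io_context(ioc);    /* plain ref increment */
                        task_unlock(task);
                        return ioc;
                }
                task_unlock(task);

                /* slow path: allocate, then install under task_lock() */
                return create_task_io_context(task, gfp_flags, node);
        }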

    -v2: Vivek spotted contamination from debug patch. Removed.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

10 Dec, 2011

1 commit


09 Dec, 2011

3 commits


07 Dec, 2011

3 commits

  • perf_event_sched_in() shouldn't try to schedule task events if there
    are none; otherwise the task's ctx->is_active will be set and will
    not be cleared during sched_out. This will prevent newly added
    events from being scheduled into the task context.
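
    In code terms the idea is roughly this (a sketch, reusing the names
    from the referenced commit):

        /* only a context that actually has events becomes task_ctx */
        if (ctx->nr_events)
                cpuctx->task_ctx = ctx;

        /* schedule in the (possibly NULL) task_ctx, so an empty
         * context never gets ctx->is_active set */
        perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);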

    Fixes a boo-boo in commit 1d5f003f5a9 ("perf: Do not set task_ctx
    pointer in cpuctx if there are no events in the context").

    Signed-off-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.com
    Signed-off-by: Ingo Molnar

    Gleb Natapov
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ftrace: Fix hash record accounting bug
    perf: Fix parsing of __print_flags() in TP_printk()
    jump_label: jump_label_inc may return before the code is patched
    ftrace: Remove force undef config value left for testing
    tracing: Restore system filter behavior
    tracing: fix event_subsystem ref counting

    Linus Torvalds
     
  • Since commit f59de89 ("lockdep: Clear whole lockdep_map on initialization"),
    lockdep_init_map() clears the whole struct. But this breaks
    lock_set_class()/lock_set_subclass(). A typical race condition
    looks like this:

    CPU A                              CPU B
    lock_set_subclass(lockA);
                                       lock_set_class(lockA);
                                         lockdep_init_map(lockA);
                                           /* lockA->name is cleared */
                                           memset(lockA);
    __lock_acquire(lockA);
      /* lockA->class_cache[] is cleared */
      register_lock_class(lockA);
        look_up_lock_class(lockA);
          WARN_ON_ONCE(class->name !=
                       lock->name);

                                           lock->name = name;

    So restore what we did before commit f59de89 but annotate
    ->lock with kmemcheck_mark_initialized() to suppress the kmemcheck
    warning reported in commit f59de89.
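
    i.e., roughly (a sketch of the shape of the fix):

        /* annotate the map for kmemcheck instead of memset()ing it */
        kmemcheck_mark_initialized(lock, sizeof(*lock));
        /* ...then initialize the fields individually, as before
         * commit f59de89... */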

    Reported-by: Sergey Senozhatsky
    Reported-by: Borislav Petkov
    Suggested-by: Vegard Nossum
    Signed-off-by: Yong Zhang
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111109080451.GB8124@zhy
    Signed-off-by: Ingo Molnar

    Yong Zhang
     

06 Dec, 2011

10 commits

  • The expiry function compares the timer against the current time and
    does not expire the timer when the expiry time is >= now. That's
    wrong. If the timer is set for now, then it must expire.

    Make the condition expiry > now for breaking out of the loop.
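
    In code, the break condition in the expiry loop changes roughly like
    this (a sketch; the exact field names in the alarmtimer code may
    differ):

        if (expires > now)      /* was: expires >= now, so a timer due
                                 * exactly at 'now' never fired */
                break;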

    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: stable@kernel.org

    Thomas Gleixner
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix loss of notification with multi-event
    perf, x86: Force IBS LVT offset assignment for family 10h
    perf, x86: Disable PEBS on SandyBridge chips
    trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
    perf session: Fix crash with invalid CPU list
    perf python: Fix undefined symbol problem
    perf/x86: Enable raw event access to Intel offcore events
    perf: Don't use -ENOSPC for out of PMU resources
    perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
    perf/x86: Fix PEBS instruction unwind
    oprofile, x86: Fix crash when unloading module (nmi timer mode)
    oprofile: Fix crash when unloading module (hr timer mode)

    Linus Torvalds
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Set noop handler in clockevents_exchange_device()
    tick-broadcast: Stop active broadcast device when replacing it
    clocksource: Fix bug with max_deferment margin calculation
    rtc: Fix some bugs that allowed accumulating time drift in suspend/resume
    rtc: Disable the alarm in the hardware

    Linus Torvalds
     
  • …ernel.org/pub/scm/linux/kernel/git/tip/tip

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    slab, lockdep: Fix silly bug

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Fix race condition when stopping the irq thread

    Linus Torvalds
     
  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched, x86: Avoid unnecessary overflow in sched_clock
    sched: Fix buglet in return_cfs_rq_runtime()
    sched: Avoid SMT siblings in select_idle_sibling() if possible
    sched: Set the command name of the idle tasks in SMP kernels
    sched, rt: Provide means of disabling cross-cpu bandwidth sharing
    sched: Document wait_for_completion_*() return values
    sched_fair: Fix a typo in the comment describing update_sd_lb_stats
    sched: Add a comment to effective_load() since it's a pain

    Linus Torvalds
     
  • If the set_ftrace_filter file is cleared by writing just whitespace
    to it, then the filter hash refcounts will be decremented but not
    updated. This causes two bugs:

    1) No functions will be enabled for tracing when they all should be

    2) If the user clears the set_ftrace_filter twice, it will crash ftrace:

    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7()
    Modules linked in:
    Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32
    Call Trace:
    [] warn_slowpath_common+0x83/0x9b
    [] warn_slowpath_null+0x1a/0x1c
    [] __ftrace_hash_rec_update.part.27+0x157/0x1a7
    [] ? ftrace_regex_release+0xa7/0x10f
    [] ? kfree+0xe5/0x115
    [] ftrace_hash_move+0x2e/0x151
    [] ftrace_regex_release+0xba/0x10f
    [] fput+0xfd/0x1c2
    [] filp_close+0x6d/0x78
    [] sys_dup3+0x197/0x1c1
    [] sys_dup2+0x4f/0x54
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 77a3a7ee73794a02 ]---

    Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian

    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If cpu A calls jump_label_inc() just after atomic_add_return() is
    called by cpu B, atomic_inc_not_zero() will return a value greater
    than zero and jump_label_inc() will return to the caller before
    jump_label_update() finishes its job on cpu B.
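
    A sketch of the racy path (simplified, not the exact source):

        void jump_label_inc(struct jump_label_key *key)
        {
                if (atomic_inc_not_zero(&key->enabled))
                        return; /* BUG: enabled may already be non-zero
                                 * while cpu B is still inside
                                 * jump_label_update(), so we return
                                 * before the code is patched */

                jump_label_lock();
                if (atomic_add_return(1, &key->enabled) == 1)
                        jump_label_update(key, JUMP_LABEL_ENABLE);
                jump_label_unlock();
        }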

    Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com

    Cc: stable@vger.kernel.org
    Cc: Peter Zijlstra
    Acked-by: Jason Baron
    Signed-off-by: Gleb Natapov
    Signed-off-by: Steven Rostedt

    Gleb Natapov
     
  • A forced undef of a config value was used for testing and was
    accidentally left in during the final commit. This causes x86 to
    run slower than needed while running function tracing, and also
    causes the function graph selftest to fail when DYNAMIC_FTRACE
    is not set. This is because the code in MCOUNT expects the ftrace
    code to be processed with the config value set - the very value
    that was forced not set.

    The forced config option was left in by:
    commit 6331c28c962561aee59e5a493b7556a4bb585957
    ftrace: Fix dynamic selftest failure on some archs

    Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian

    Cc: stable@vger.kernel.org
    Reported-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Though not all events have the field 'prev_pid', it was allowed to do this:

    # echo 'prev_pid == 100' > events/sched/filter

    but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap
    entire filter of events) broke it without any reason.

    Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com

    Signed-off-by: Li Zefan
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
    ever being released.

    Ref_count was added to keep track of subsystem users, not for
    counting events. A subsystem is created with ref_count = 1, so there
    is no need to increment it for every event; we have nr_events for
    that. Fix this by touching ref_count only when we actually have a
    new user - subsystem_open().

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
    Signed-off-by: Steven Rostedt

    Ilya Dryomov
     

05 Dec, 2011

2 commits

  • …rostedt/linux-trace into perf/urgent

    Ingo Molnar
     
  • When you do:
    $ perf record -e cycles,cycles,cycles noploop 10

    You expect about 10,000 samples for each event, i.e., 10s at
    1000 samples/sec. However, this is not what's happening. You
    get far fewer samples, maybe 3700 samples/event:

    $ perf report -D | tail -15
    Aggregated stats:
           TOTAL events:      10998
            MMAP events:         66
            COMM events:          2
          SAMPLE events:      10930
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644
    cycles stats:
           TOTAL events:       3642
          SAMPLE events:       3642
    cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644

    On an Intel Nehalem or even AMD64, there are 4 counters capable
    of measuring cycles, so there is plenty of space to measure those
    events without multiplexing (even with the NMI watchdog active).
    And even with multiplexing, we'd expect roughly the same number
    of samples per event.

    The root of the problem was that when the event that caused the buffer
    to become full was not the first event passed on the cmdline, the user
    notification would get lost. The notification was sent to the file
    descriptor of the overflowed event but the perf tool was not polling
    on it. The perf tool aggregates all samples into a single buffer,
    i.e., the buffer of the first event. Consequently, it assumes
    notifications for any event will come via that descriptor.

    The seemingly straightforward solution of moving the waitq into the
    ringbuffer object doesn't work because of life-time issues. One could
    perf_event_set_output() on an fd that you're also blocking on and
    cause the old rb object to be freed while its waitq would still be
    referenced by the blocked thread -> FAIL.

    Therefore link all events to the ringbuffer and broadcast the wakeup
    from the ringbuffer object to all possible events that could be waited
    upon. This is rather ugly, and we're open to better solutions but it
    works for now.
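
    The broadcast roughly takes this shape (a sketch based on the
    description above; exact names may differ):

        static void ring_buffer_wakeup(struct perf_event *event)
        {
                struct ring_buffer *rb;
                struct perf_event *iter;

                rcu_read_lock();
                rb = rcu_dereference(event->rb);
                list_for_each_entry_rcu(iter, &rb->event_list, rb_entry)
                        wake_up_all(&iter->waitq);
                rcu_read_unlock();
        }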

    Reported-by: Stephane Eranian
    Finished-by: Stephane Eranian
    Reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Dec, 2011

5 commits

  • If a device is shut down, there might still be a pending interrupt,
    which will be processed after we reenable interrupts and which
    causes the original handler to be run. If the old handler is the
    (broadcast) periodic handler, the shutdown state might hang the
    kernel completely.
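
    Per the 'timers-urgent' summary above, the fix is to set a noop
    handler in clockevents_exchange_device() (sketch):

        /* a late interrupt from the released device now hits a noop
         * handler instead of the old (e.g. periodic) one */
        released->event_handler = clockevents_handle_noop;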

    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • When a better rated broadcast device is installed, the currently
    active device is not disabled, which results in two running
    broadcast devices.

    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • In irq_wait_for_interrupt(), the should_stop member is verified before
    setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
    In case kthread_stop sets should_stop and wakes up the process after
    should_stop is checked by the irq thread but before the task's state
    is changed, the irq thread might never exit:

    kthread_stop                     irq_wait_for_interrupt
    ------------                     ----------------------
                                     ...
    ...                              while (!kthread_should_stop()) {
    kthread->should_stop = 1;
    wake_up_process(k);
    wait_for_completion(&kthread->exited);
    ...
                                         set_current_state(TASK_INTERRUPTIBLE);

                                         ...

                                         schedule();
                                     }

    Fix this by checking if the thread should stop after modifying the
    task's state.
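
    With the fix, the wait loop takes roughly this shape (a simplified
    sketch of irq_wait_for_interrupt()):

        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                /* the stop flag is re-tested after the state change,
                 * so a wakeup from kthread_stop() is no longer lost */
                if (test_and_clear_bit(IRQTF_RUNTHREAD,
                                       &action->thread_flags)) {
                        __set_current_state(TASK_RUNNING);
                        return 0;
                }
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return -1;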

    [ tglx: Simplified it a bit ]

    Signed-off-by: Ido Yariv
    Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Ido Yariv
     
  • ftrace_event_call->filter is sched RCU protected but didn't use
    rcu_assign_pointer(). Use it.

    TODO: Add proper __rcu annotation to call->filter and all its users.

    -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric.
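
    The pattern, for reference (a minimal sketch):

        /* publish: RCU readers must see a fully initialized filter,
         * so the assignment needs the write barrier variant */
        rcu_assign_pointer(call->filter, filter);

        /* clearing to NULL publishes no new data - no barrier needed */
        RCU_INIT_POINTER(call->filter, NULL);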

    Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com

    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: stable@kernel.org # (2.6.39+)
    Signed-off-by: Tejun Heo
    Signed-off-by: Steven Rostedt

    Tejun Heo
     
  • In order to leave a margin of 12.5% (1/8) we should shift right by
    3 (>> 3), not by 5 (>> 5 leaves only a 1/32, i.e. ~3%, margin).
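
    In other words (a sketch of the corrected calculation; variable
    names assumed):

        /* leave a 12.5% (1/8) safety margin on the max deferment */
        return max_nsecs - (max_nsecs >> 3);    /* was: >> 5 */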

    CC: stable@kernel.org
    Signed-off-by: Yang Honggang (Joseph)
    [jstultz: Modified commit subject]
    Signed-off-by: John Stultz

    Yang Honggang (Joseph)
     

30 Nov, 2011

1 commit

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM: Update comments describing device power management callbacks
    PM / Sleep: Update documentation related to system wakeup
    PM / Runtime: Make documentation follow the new behavior of irq_safe
    PM / Sleep: Correct inaccurate information in devices.txt
    PM / Domains: Document how PM domains are used by the PM core
    PM / Hibernate: Do not leak memory in error/test code paths

    Linus Torvalds
     

29 Nov, 2011

4 commits


25 Nov, 2011

1 commit

  • 2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
    transitions) removed is_task_frozen_enough and replaced it with a
    simple frozen() call. This, however, breaks freezing for a group
    with stopped tasks, because those cannot be frozen and so the group
    remains in the CGROUP_FREEZING state (update_if_frozen doesn't count
    stopped tasks) and never reaches CGROUP_FROZEN.

    Let's add is_task_frozen_enough back and use it at the original locations
    (update_if_frozen and try_to_freeze_cgroup). Semantically we consider
    stopped tasks as frozen enough so we should consider both cases when
    testing frozen tasks.
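
    The restored helper boils down to this (a sketch reconstructed from
    the description above):

        static bool is_task_frozen_enough(struct task_struct *task)
        {
                /* a stopped/traced task that is freezing counts as
                 * frozen enough - it cannot enter the refrigerator */
                return frozen(task) ||
                       (task_is_stopped_or_traced(task) && freezing(task));
        }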

    Testcase:
    mkdir /dev/freezer
    mount -t cgroup -o freezer none /dev/freezer
    mkdir /dev/freezer/foo
    sleep 1h &
    pid=$!
    kill -STOP $pid
    echo $pid > /dev/freezer/foo/tasks
    echo FROZEN > /dev/freezer/foo/freezer.state
    while true
    do
        cat /dev/freezer/foo/freezer.state
        [ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
        sleep 1
    done
    echo OK

    Signed-off-by: Michal Hocko
    Acked-by: Li Zefan
    Cc: Tomasz Buchert
    Cc: Paul Menage
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Tejun Heo

    Michal Hocko
     

24 Nov, 2011

1 commit

  • The hibernation core code forgets to release memory preallocated
    for hibernation if there's an error in its early stages or if test
    modes causing hibernation_snapshot() to return early are used. This
    causes the system to be hardly usable, because the amount of
    preallocated memory is usually huge. Fix this problem.

    Reported-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Srivatsa S. Bhat

    Rafael J. Wysocki
     

23 Nov, 2011

1 commit


21 Nov, 2011

1 commit

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / Suspend: Fix bug in suspend statistics update
    PM / Hibernate: Fix the early termination of test modes
    PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset
    PM Sleep: Do not extend wakeup paths to devices with ignore_children set
    PM / driver core: disable device's runtime PM during shutdown
    PM / devfreq: correct Kconfig dependency
    PM / devfreq: fix use after free in devfreq_remove_device
    PM / shmobile: Avoid restoring the INTCS state during initialization
    PM / devfreq: Remove compiler error after irq.h update
    PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request()
    PM / Clocks: Only disable enabled clocks in pm_clk_suspend()
    ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix
    PM / shmobile: Don't skip debugging output in pd_power_up()

    Linus Torvalds
     

19 Nov, 2011

3 commits

  • After commit 2a77c46de1e3dace73745015635ebbc648eca69c
    (PM / Suspend: Add statistics debugfs file for suspend to RAM)
    a missing pair of braces inside the state_store() function causes even
    invalid arguments to suspend to be wrongly treated as failed suspend
    attempts. Fix this.
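
    Roughly, the statistics update needs to live inside the braces of
    the valid-state branch (a sketch; the exact code may differ):

        if (state < PM_SUSPEND_MAX && *s) {
                error = pm_suspend(state);
                if (error) {
                        suspend_stats.fail++;
                        dpm_save_failed_errno(error);
                } else {
                        suspend_stats.success++;
                }
        }
        /* before the fix, the stats update sat outside these braces,
         * so an invalid argument also bumped the failure count */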

    [rjw: Put the hash/subject of the buggy commit into the changelog.]

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • __remove_hrtimer() attempts to reprogram the clockevent device when
    the timer being removed is the next to expire. However,
    __remove_hrtimer() reprograms the clockevent *before* removing the
    timer from the timerqueue and thus when hrtimer_force_reprogram()
    finds the next timer to expire it finds the timer we're trying to
    remove.

    This is especially noticeable when the system switches to NOHZ mode
    and the system tick is removed. The timer tick is removed from the
    system, but the clockevent is programmed to wake up another HZ
    period later anyway.

    Silence the extra wakeup by removing the timer from the timerqueue
    before calling hrtimer_force_reprogram() so that we actually program
    the clockevent for the next timer to expire.
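
    A sketch of the corrected ordering in __remove_hrtimer()
    (simplified):

        /* note whether this timer is the next to expire... */
        next_timer = timerqueue_getnext(&base->active);
        /* ...remove it from the queue first... */
        timerqueue_del(&base->active, &timer->node);
        /* ...so that reprogramming sees the real next expiry */
        if (reprogram && &timer->node == next_timer)
                hrtimer_force_reprogram(base->cpu_base, 1);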

    This was broken by 998adc3 "hrtimers: Convert hrtimers to use
    timerlist infrastructure".

    Signed-off-by: Jeff Ohlstein
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
    Signed-off-by: Thomas Gleixner

    Jeff Ohlstein
     
  • Commit 2aede851ddf08666f68ffc17be446420e9d2a056
    (PM / Hibernate: Freeze kernel threads after preallocating memory)
    postponed the freezing of kernel threads to after preallocating
    memory for hibernation. But while doing that, the hibernation test
    TEST_FREEZER and the test mode HIBERNATION_TESTPROC were not moved
    accordingly.

    As a result, when using these test modes, hibernation only goes up
    to the freezing of userspace and exits, when in fact it should go
    to the very end of the task-freezing stage, namely the freezing of
    kernel threads as well.

    So, move these points of exit to the appropriate places so that
    freezing of kernel threads is also tested when using these test
    harnesses.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

18 Nov, 2011

2 commits

  • ktime_get and ktime_get_ts were calling timekeeping_get_ns()
    but were not then calling arch_gettimeoffset(), so architectures
    using this mechanism returned 0 ns when calling these functions.
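
    The fix essentially folds the arch offset back in (a sketch of the
    idea):

        /* in ktime_get()/ktime_get_ts() */
        nsecs = timekeeping_get_ns();
        /* if the arch requires it, add in the legacy gettimeoffset() */
        nsecs += arch_gettimeoffset();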

    This happened, for example, when running Busybox's ping, which calls
    syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) and eventually
    reaches ktime_get. As a result, the returned ping travel time was
    zero.

    CC: stable@kernel.org
    Signed-off-by: Hector Palacios
    Signed-off-by: John Stultz

    Hector Palacios
     
  • The power management functions related to interrupts do not know
    (yet) about per-cpu interrupts and end up calling the wrong
    low-level methods to enable/disable interrupts.

    This leads to all kinds of interesting issues (action taken on one
    CPU only, updating a refcount which is not otherwise used...).

    The workaround for the time being is simply to flag these interrupts
    with IRQF_NO_SUSPEND. At least on ARM, these interrupts are actually
    dealt with at the architecture level.
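
    For instance, such a per-cpu timer interrupt would now be requested
    along these lines (handler and name are hypothetical, for
    illustration only):

        err = request_irq(irq, twd_handler,
                          IRQF_PERCPU | IRQF_NO_SUSPEND,
                          "twd", dev_id);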

    Reported-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Signed-off-by: Marc Zyngier
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1321446459-31409-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier