14 Dec, 2012

1 commit

  • Pull KVM updates from Marcelo Tosatti:
    "Considerable KVM/PPC work, x86 kvmclock vsyscall support,
    IA32_TSC_ADJUST MSR emulation, amongst others."

    Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
    migration notifier added next to rq migration call-back.

    * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
    KVM: emulator: fix real mode segment checks in address linearization
    VMX: remove unneeded enable_unrestricted_guest check
    KVM: VMX: fix DPL during entry to protected mode
    x86/kexec: crash_vmclear_local_vmcss needs __rcu
    kvm: Fix irqfd resampler list walk
    KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
    x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
    KVM: MMU: optimize for set_spte
    KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
    KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
    KVM: PPC: bookehv: Add guest computation mode for irq delivery
    KVM: PPC: Make EPCR a valid field for booke64 and bookehv
    KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
    KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
    KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
    KVM: PPC: e500: Add emulation helper for getting instruction ea
    KVM: PPC: bookehv64: Add support for interrupt handling
    KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
    KVM: PPC: booke: Fix get_tb() compile error on 64-bit
    KVM: PPC: e500: Silence bogus GCC warning in tlb code
    ...

    Linus Torvalds
     

13 Dec, 2012

1 commit

  • Pull networking changes from David Miller:

    1) Allow dumping, monitoring, and changing the bridge multicast database
    using netlink. From Cong Wang.

    2) RFC 5961 TCP blind data injection attack mitigation, from Eric
    Dumazet.

    3) Networking user namespace support from Eric W. Biederman.

    4) tuntap/virtio-net multiqueue support by Jason Wang.

    5) Support for checksum offload of encapsulated packets (basically,
    tunneled traffic can still be checksummed by HW). From Joseph
    Gasparakis.

    6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
    Daniel Borkmann.

    7) Bridge port parameters over netlink and BPDU blocking support
    from Stephen Hemminger.

    8) Improve data access patterns during inet socket demux by rearranging
    socket layout, from Eric Dumazet.

    9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
    Jon Maloy.

    10) Update TCP socket hash sizing to be more in line with current-day
    realities. The existing heuristics were chosen a decade ago.
    From Eric Dumazet.

    11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

    12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

    13) Add "oops_only" mode to netconsole, from Amerigo Wang.

    14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace. From John Fastabend.

    15) Support PTP in the Tigon3 driver, from Matt Carlson.

    16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

    17) Support per-association statistics in SCTP, from Michele
    Baldessari.

    And many, many, driver updates, cleanups, and improvements. Too
    numerous to mention individually.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    net/mlx4_en: Add support for destination MAC in steering rules
    net/mlx4_en: Use generic etherdevice.h functions.
    net: ethtool: Add destination MAC address to flow steering API
    bridge: add support of adding and deleting mdb entries
    bridge: notify mdb changes via netlink
    ndisc: Unexport ndisc_{build,send}_skb().
    uapi: add missing netconf.h to export list
    pkt_sched: avoid requeues if possible
    solos-pci: fix double-free of TX skb in DMA mode
    bnx2: Fix accidental reversions.
    bna: Driver Version Updated to 3.1.2.1
    bna: Firmware update
    bna: Add RX State
    bna: Rx Page Based Allocation
    bna: TX Intr Coalescing Fix
    bna: Tx and Rx Optimizations
    bna: Code Cleanup and Enhancements
    ath9k: check pdata variable before dereferencing it
    ath5k: RX timestamp is reported at end of frame
    ath9k_htc: RX timestamp is reported at end of frame
    ...

    Linus Torvalds
     

12 Dec, 2012

2 commits

  • Pull core timer changes from Ingo Molnar:
    "It contains continued generic-NOHZ work by Frederic and smaller
    cleanups."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Kill xtime_lock, replacing it with jiffies_lock
    clocksource: arm_generic: use this_cpu_ptr per-cpu helper
    clocksource: arm_generic: use integer math helpers
    time/jiffies: Make clocksource_jiffies static
    clocksource: clean up parse_pmtmr()
    tick: Correct the comments for tick_sched_timer()
    tick: Conditionally build nohz specific code in tick handler
    tick: Consolidate tick handling for high and low res handlers
    tick: Consolidate timekeeping handling code

    Linus Torvalds
     
  • …it.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull trivial fix branches from Ingo Molnar.

    Cleanup in __get_key_name, and a timer comment fixlet.

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timers, sched: Correct the comments for tick_sched_timer()

    Linus Torvalds
     

28 Nov, 2012

1 commit


22 Nov, 2012

1 commit


15 Nov, 2012

1 commit

  • Predicting the future is difficult, and when the cpuidle governor's prediction
    fails, the governor may choose a shallower C-state than it should. Quickly
    noticing and recovering from such a failure is therefore important for power
    saving.

    The cpuidle menu governor has a method to predict a repeating pattern: if the
    last 8 C-state residencies are the same or very close, it predicts that the
    next C-state residency will have the same duration.

    A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
    or earlier. On Sandybridge, turbostat reads 10 registers one by one, which
    generates 10 IPIs that wake up idle CPUs. The menu governor therefore predicts
    a repeating pattern and expects another IPI to wake the idle CPU soon, so it
    keeps the idle CPU in C1 even though the CPU is completely idle. However,
    turbostat sleeps 5 seconds by default between rounds of reading the 10
    registers, so the idle CPU stays in C1 for a long time, until a break event
    occurs.
    On an idle Sandybridge system running "./turbostat -v", deep C-state residency
    fluctuates between 70% and 99%. With the patched kernel, deep C-state residency
    stays above 99.98%.

    In the patch, a timer is added when the menu governor detects a repeating
    pattern and chooses a shallow C-state. The timer is set to a timeout value
    greater than the predicted time, and a repeat-mode misprediction is concluded
    if the timer fires. When the repeating pattern occurs as expected, the CPU
    wakes from the C-state before the timer fires and proactively cancels the
    timer. When the pattern does not occur, the timer expires, the menu governor
    quickly notices that the repeat-mode prediction has failed, and it then
    re-evaluates whether a deeper C-state is possible.
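    The repeat-pattern check and the fallback timer described above can be
    modelled in a few lines of C. This is a simplified sketch, not the menu
    governor's code: the fixed tolerance test (in place of the governor's own
    statistics), the 2x timeout factor, and the names detect_repeat() and
    repeat_prediction_failed() are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of recent idle residencies examined (8, as in the text above). */
#define NR_INTERVALS 8

/*
 * Hypothetical, simplified repeat-pattern detector: if every one of the last
 * NR_INTERVALS residencies lies within tolerance_us of their average, treat
 * them as a repeating pattern and predict the average.
 */
static bool detect_repeat(const uint32_t intervals_us[NR_INTERVALS],
                          uint32_t tolerance_us, uint32_t *predicted_us)
{
    uint64_t sum = 0;

    for (int i = 0; i < NR_INTERVALS; i++)
        sum += intervals_us[i];
    uint32_t avg = (uint32_t)(sum / NR_INTERVALS);

    for (int i = 0; i < NR_INTERVALS; i++) {
        uint32_t diff = intervals_us[i] > avg ? intervals_us[i] - avg
                                              : avg - intervals_us[i];
        if (diff > tolerance_us)
            return false;   /* pattern broken: no repeat prediction */
    }
    *predicted_us = avg;
    return true;
}

/*
 * Hypothetical fallback timer: armed at a timeout larger than the prediction
 * (2x here, an assumed factor). If the CPU is still idle when it would fire,
 * the repeat prediction failed and deeper C-states should be reconsidered.
 */
static bool repeat_prediction_failed(uint32_t predicted_us,
                                     uint32_t actual_idle_us)
{
    uint32_t timeout_us = 2 * predicted_us;

    return actual_idle_us >= timeout_us;
}
```

    With the idle_predict pattern below (several ~20 us sleeps, then a 1 s
    sleep), the first check succeeds on the short sleeps, and the second one
    flags the 1 s sleep as a misprediction long before it ends.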

    Below is another case that clearly shows the benefit of the patch:

    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    volatile int *shutdown;   /* set to 1 by the signal handler to stop all threads */
    volatile long *count;     /* loop iterations, shared by all threads */
    int delay = 20;           /* sleep time (us) in the shallow C-state phase */
    int loop = 8;             /* every loop-th iteration sleeps for 1 second */

    void usage(void)
    {
        fprintf(stderr,
                "Usage: idle_predict [options]\n"
                " --help -h Print this help\n"
                " --thread -n Thread number\n"
                " --loop -l Loop times in shallow Cstate\n"
                " --delay -t Sleep time (uS) in shallow Cstate\n");
    }

    void *simple_loop(void *arg)
    {
        int idle_num = 1;

        (void)arg;
        while (!(*shutdown)) {
            *count = *count + 1;

            if (idle_num % loop) {
                /* short sleeps build a repeating pattern of short idle periods */
                usleep(delay);
            } else {
                /* sleep 1 second, breaking the pattern */
                usleep(1000000);
                idle_num = 0;
            }
            idle_num++;
        }
        return NULL;
    }

    static void sighand(int sig)
    {
        (void)sig;
        *shutdown = 1;
    }

    int main(int argc, char *argv[])
    {
        sigset_t sigset;
        int signum = SIGALRM;
        int i, c, thread_num = 8;
        pthread_t pt[1024];

        /* -h takes no argument */
        static char optstr[] = "n:l:t:h";

        while ((c = getopt(argc, argv, optstr)) != -1)
            switch (c) {
            case 'n':
                thread_num = atoi(optarg);
                break;
            case 'l':
                loop = atoi(optarg);
                break;
            case 't':
                delay = atoi(optarg);
                break;
            case 'h':
            default:
                usage();
                exit(1);
            }

        /* stay within pt[] and avoid a modulo by zero in simple_loop() */
        if (thread_num < 1 || thread_num > 1024 || loop < 1) {
            usage();
            exit(1);
        }

        printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
        count = malloc(sizeof(long));
        shutdown = malloc(sizeof(int));
        *count = 0;
        *shutdown = 0;

        sigemptyset(&sigset);
        sigaddset(&sigset, signum);
        sigprocmask(SIG_BLOCK, &sigset, NULL);
        signal(SIGINT, sighand);
        signal(SIGTERM, sighand);

        for (i = 0; i < thread_num; i++)
            pthread_create(&pt[i], NULL, simple_loop, NULL);

        for (i = 0; i < thread_num; i++)
            pthread_join(pt[i], NULL);

        exit(0);
    }

    Get powertop v2 from git://github.com/fenrus75/powertop and build it.
    Then build the above test application and run it. The test platform
    can be Intel Sandybridge or another recent platform.
    #./idle_predict -l 10 &
    #./powertop

    We find that deep C-state residency fluctuates between 40% and 100%, with
    much time spent in C1. This is because the menu governor wrongly predicts
    that the repeating pattern continues, so it chooses the shallow C1 state
    even though the CPU could sleep for 1 second in a deep C-state.

    With the patched kernel, deep C-state residency stays above 99.6%.

    Signed-off-by: Rik van Riel
    Signed-off-by: Youquan Song
    Signed-off-by: Rafael J. Wysocki

    Youquan Song
     

14 Nov, 2012

2 commits

  • Now that timekeeping is protected by its own locks, rename
    the xtime_lock to jiffies_lock to better describe what it
    protects.

    CC: Thomas Gleixner
    CC: Eric Dumazet
    CC: Richard Cochran
    Signed-off-by: John Stultz

    John Stultz
     
  • Commit f1b8274 ("clocksource: Cleanup clocksource selection") removed all
    external references to clocksource_jiffies so there is no need to have the
    symbol globally visible.

    Fixes the following sparse warning:
    CHECK kernel/time/jiffies.c
    kernel/time/jiffies.c:61:20: warning: symbol 'clocksource_jiffies' was not declared. Should it be static?

    Signed-off-by: Lars-Peter Clausen
    Signed-off-by: John Stultz

    Lars-Peter Clausen
     

01 Nov, 2012

2 commits

  • This patch removes the timecompare code from the kernel. The top five
    reasons to do this are:

    1. There are no more users of this code.
    2. The original idea was a bit weak.
    3. The original author has disappeared.
    4. The code was not general purpose but tuned to particular hardware.
    5. There are better ways to accomplish clock synchronization.

    Signed-off-by: Richard Cochran
    Acked-by: John Stultz
    Tested-by: Bob Liu
    Signed-off-by: David S. Miller

    Richard Cochran
     
  • In the comments of function tick_sched_timer(), the sentence
    "timer->base->cpu_base->lock held" is not right.

    In function __run_hrtimer(), cpu_base->lock is unlocked before
    timer->function() is called.
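    A small user-space sketch of the locking pattern described above, with a
    pthread mutex standing in for cpu_base->lock. The struct and function
    names are hypothetical; the point is only that the callback runs with the
    lock released, so a comment claiming it is held would be wrong.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the hrtimer cpu_base and a timer. */
struct demo_base {
    pthread_mutex_t lock;
};

struct demo_timer {
    struct demo_base *base;
    int (*function)(struct demo_timer *t);
    int lock_held_in_callback;  /* recorded by the callback for illustration */
};

/*
 * Called with base->lock held; returns with it held. The lock is dropped
 * around the callback, so the callback runs without it.
 */
static int demo_run_timer(struct demo_timer *t)
{
    int ret;

    pthread_mutex_unlock(&t->base->lock);
    ret = t->function(t);               /* runs without base->lock */
    pthread_mutex_lock(&t->base->lock);
    return ret;
}

/* Example callback: records whether base->lock was free while it ran. */
static int demo_callback(struct demo_timer *t)
{
    if (pthread_mutex_trylock(&t->base->lock) == 0) {
        pthread_mutex_unlock(&t->base->lock);
        t->lock_held_in_callback = 0;
    } else {
        t->lock_held_in_callback = 1;
    }
    return 0;
}
```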

    Signed-off-by: liu chuansheng
    Cc: fei.li@intel.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build
    Signed-off-by: Thomas Gleixner

    Chuansheng Liu
     

24 Oct, 2012

1 commit

  • In the comments of function tick_sched_timer(), the sentence
    "timer->base->cpu_base->lock held" is not right.

    In function __run_hrtimer(), cpu_base->lock is unlocked before
    timer->function() is called.

    Signed-off-by: liu chuansheng
    Cc: fei.li@intel.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build
    Signed-off-by: Ingo Molnar

    Chuansheng Liu
     

16 Oct, 2012

3 commits


12 Oct, 2012

2 commits

  • Pull timer core update from Thomas Gleixner:
    - Bug fixes (one for a longstanding dead loop issue)
    - Rework of time related vsyscalls
    - Alarm timer updates
    - Jiffies updates to remove compile time dependencies

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping: Cast raw_interval to u64 to avoid shift overflow
    timers: Fix endless looping between cascade() and internal_add_timer()
    time/jiffies: bring back unconditional LATCH definition
    time: Convert x86_64 to using new update_vsyscall
    time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems
    time: Introduce new GENERIC_TIME_VSYSCALL
    time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD
    time: Move update_vsyscall definitions to timekeeper_internal.h
    time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes
    jiffies: Remove compile time assumptions about CLOCK_TICK_RATE
    jiffies: Kill unused TICK_USEC_TO_NSEC
    alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue
    alarmtimer: Remove unused helpers & defines
    alarmtimer: Use hrtimer per-alarm instead of per-base
    alarmtimer: Implement minimum alarm interval for allowing suspend

    Linus Torvalds
     
  • Pull scheduler fixes from Ingo Molnar:
    "A CPU hotplug related crash fix and a nohz accounting fixlet."

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Update sched_domains_numa_masks[][] when new cpus are onlined
    sched: Ensure 'sched_domains_numa_levels' is safe to use in other functions
    nohz: Fix one jiffy count too far in idle cputime

    Linus Torvalds
     

10 Oct, 2012

1 commit


05 Oct, 2012

1 commit

  • When we stop the tick in idle, we save the current jiffies value
    in ts->idle_jiffies. This snapshot is subtracted from the later
    value of jiffies when the tick is restarted and the resulting
    delta is accounted as idle cputime. This is how we handle the
    idle cputime accounting without the tick.

    But sometimes we need to schedule the next tick to some time in
    the future instead of completely stopping it. In this case, a
    tick may happen before we restart the periodic behaviour and
    from that tick we account one jiffy to idle cputime as usual but
    we also increment the ts->idle_jiffies snapshot by one so that
    when we compute the delta to account, we subtract the one jiffy
    we just accounted.

    To prepare for stopping the tick outside idle, we introduced a
    check that prevents fixing up ts->idle_jiffies if we are not
    running the idle task. But we use idle_cpu() for that, and this
    is a problem if we run the tick while another CPU remotely
    enqueues a ttwu to our runqueue:

    CPU 0:                            CPU 1:

    tick_sched_timer() {              ttwu_queue_remote()
        if (idle_cpu(CPU 0))
            ts->idle_jiffies++;
    }

    Here, idle_cpu() notes that &rq->wake_list is not empty and
    hence won't consider the CPU as idle. As a result,
    ts->idle_jiffies won't be incremented. But this is wrong because
    we actually account the current jiffy to idle cputime. And that
    jiffy won't get subtracted from the nohz time delta. So in the
    end, this jiffy is accounted twice.

    Fix this by replacing idle_cpu(smp_processor_id()) with
    is_idle_task(current). This way the jiffy is subtracted
    correctly even if a ttwu operation is enqueued on the CPU.
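    The double accounting can be reproduced with a toy model of the snapshot
    arithmetic. The structure and helper names below are hypothetical, and
    the model ignores everything except jiffy counting; the treat_as_idle
    flag stands for whichever check decides whether ts->idle_jiffies is
    bumped.

```c
#include <stdbool.h>

/* Hypothetical, jiffies-only model of nohz idle cputime accounting. */
struct demo_tick_sched {
    unsigned long idle_jiffies;   /* snapshot taken when the tick stopped */
    unsigned long idle_accounted; /* idle cputime accounted so far, in jiffies */
};

static void demo_stop_tick(struct demo_tick_sched *ts, unsigned long jiffies)
{
    ts->idle_jiffies = jiffies;
}

/*
 * A tick firing while the tick is "stopped": one jiffy is accounted to idle,
 * and the snapshot must be bumped so that jiffy is not counted again at
 * restart. is_idle_task(current) corresponds to treat_as_idle == true here;
 * idle_cpu() can yield false when a remote wakeup is queued.
 */
static void demo_idle_tick(struct demo_tick_sched *ts, bool treat_as_idle)
{
    ts->idle_accounted += 1;
    if (treat_as_idle)
        ts->idle_jiffies++;
}

/* Restarting the tick accounts the remaining delta since the snapshot. */
static void demo_restart_tick(struct demo_tick_sched *ts, unsigned long jiffies)
{
    ts->idle_accounted += jiffies - ts->idle_jiffies;
}
```

    Stopping at jiffies 100, taking one tick, and restarting at 110 accounts
    10 jiffies when the snapshot is bumped, but 11 when it is not: the extra
    jiffy is the one counted twice.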

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: # 3.5+
    Link: http://lkml.kernel.org/r/1349308004-3482-1-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

03 Oct, 2012

1 commit

  • Pull power management updates from Rafael J Wysocki:

    - Improved system suspend/resume and runtime PM handling for the SH
    TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile).

    - Generic PM domains framework extensions related to cpuidle support
    and domain objects lookup using names.

    - ARM/shmobile power management updates including improved support for
    the SH7372's A4S power domain containing the CPU core.

    - cpufreq changes related to AMD CPUs support from Matthew Garrett,
    Andre Przywara and Borislav Petkov.

    - cpu0 cpufreq driver from Shawn Guo.

    - cpufreq governor fixes related to the relaxing of limit from Michal
    Pecio.

    - OMAP cpufreq updates from Axel Lin and Richard Zhao.

    - cpuidle ladder governor fixes related to the disabling of states from
    Carsten Emde and me.

    - Runtime PM core updates related to the interactions with the system
    suspend core from Alan Stern and Kevin Hilman.

    - Wakeup sources modification allowing more helper functions to be
    called from interrupt context from John Stultz and additional
    diagnostic code from Todd Poynor.

    - System suspend error code path fix from Feng Hong.

    Fixed up conflicts in cpufreq/powernow-k8 that stemmed from the
    workqueue fixes conflicting fairly badly with the removal of support for
    hardware P-state chips. The changes were independent but somewhat
    intertwined.

    * tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    Revert "PM QoS: Use spinlock in the per-device PM QoS constraints code"
    PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2
    cpuidle: rename function name "__cpuidle_register_driver", v2
    cpufreq: OMAP: Check IS_ERR() instead of NULL for omap_device_get_by_hwmod_name
    cpuidle: remove some empty lines
    PM: Prevent runtime suspend during system resume
    PM QoS: Use spinlock in the per-device PM QoS constraints code
    PM / Sleep: use resume event when call dpm_resume_early
    cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure
    ACPI / processor: remove pointless variable initialization
    ACPI / processor: remove unused function parameter
    cpufreq: OMAP: remove loops_per_jiffy recalculate for smp
    sections: fix section conflicts in drivers/cpufreq
    cpufreq: conservative: update frequency when limits are relaxed
    cpufreq / ondemand: update frequency when limits are relaxed
    properly __init-annotate pm_sysrq_init()
    cpufreq: Add a generic cpufreq-cpu0 driver
    PM / OPP: Initialize OPP table from device tree
    ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
    cpufreq: Remove support for hardware P-state chips from powernow-k8
    ...

    Linus Torvalds
     

02 Oct, 2012

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "Continued quest to clean up and enhance the cputime code by Frederic
    Weisbecker, in preparation for future tickless kernel features.

    Other than that, smallish changes."

    Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    cputime: Make finegrained irqtime accounting generally available
    cputime: Gather time/stats accounting config options into a single menu
    ia64: Reuse system and user vtime accounting functions on task switch
    ia64: Consolidate user vtime accounting
    vtime: Consolidate system/idle context detection
    cputime: Use a proper subsystem naming for vtime related APIs
    sched: cpu_power: enable ARCH_POWER
    sched/nohz: Clean up select_nohz_load_balancer()
    sched: Fix load avg vs. cpu-hotplug
    sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
    sched: Fix nohz_idle_balance()
    sched: Remove useless code in yield_to()
    sched: Add time unit suffix to sched sysctl knobs
    sched/debug: Limit sd->*_idx range on sysctl
    sched: Remove AFFINE_WAKEUPS feature flag
    s390: Remove leftover account_tick_vtime() header
    cputime: Consolidate vtime handling on context switch
    sched: Move cputime code to its own file
    cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
    tile: Remove SD_PREFER_LOCAL leftover
    ...

    Linus Torvalds
     

26 Sep, 2012

1 commit


25 Sep, 2012

8 commits

  • We only do rounding to the next nanosecond so we don't see minor
    1ns inconsistencies in the vsyscall implementations. Since we're
    changing the vsyscall implementations to avoid this, make the
    rounding conditional, applying it only on GENERIC_TIME_VSYSCALL_OLD
    architectures.
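    For reference, rounding a shifted nanosecond value up to the next whole
    nanosecond looks roughly like this. It is an illustrative sketch that
    assumes xtime_nsec stores nanoseconds left-shifted by shift, as the
    timekeeper does; the helper name is hypothetical, and a caller would
    typically need to account for the discarded remainder somewhere.

```c
#include <stdint.h>

/*
 * Round a left-shifted nanosecond value up to the next whole nanosecond.
 * Returns the sub-nanosecond remainder that was discarded (0 if the value
 * was already a whole nanosecond and nothing changed).
 */
static uint64_t round_up_shifted_ns(uint64_t *xtime_nsec, unsigned int shift)
{
    uint64_t one_ns = 1ULL << shift;
    uint64_t remainder = *xtime_nsec & (one_ns - 1);

    if (remainder) {
        *xtime_nsec -= remainder;   /* drop the fractional part */
        *xtime_nsec += one_ns;      /* and add one whole nanosecond */
    }
    return remainder;
}
```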

    Cc: Tony Luck
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andy Lutomirski
    Cc: Martin Schwidefsky
    Cc: Paul Turner
    Cc: Steven Rostedt
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • Now that we moved everyone over to GENERIC_TIME_VSYSCALL_OLD,
    introduce the new declaration and config option for the new
    update_vsyscall method.

    Cc: Tony Luck
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andy Lutomirski
    Cc: Martin Schwidefsky
    Cc: Paul Turner
    Cc: Steven Rostedt
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • To help migrate architectures over to the new update_vsyscall method,
    redefine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD.

    Cc: Tony Luck
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andy Lutomirski
    Cc: Martin Schwidefsky
    Cc: Paul Turner
    Cc: Steven Rostedt
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • We're going to need to access the timekeeper in update_vsyscall,
    so make the structure available for those who need it.

    Cc: Tony Luck
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andy Lutomirski
    Cc: Martin Schwidefsky
    Cc: Paul Turner
    Cc: Steven Rostedt
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • CLOCK_TICK_RATE is used to accurately calculate exactly how long
    a tick will be at a given HZ.

    This is useful, because while we'd expect NSEC_PER_SEC/HZ,
    the underlying hardware will have some granularity limit,
    so we won't be able to have exactly HZ ticks per second.
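    A worked example of that granularity error, using the x86 PIT rate of
    1193182 Hz and HZ=1000 as illustrative inputs: the timer can only count
    whole cycles, so a tick is 1193 cycles long, which is about 999847 ns
    rather than the ideal 1000000 ns. The sketch shows only this arithmetic;
    it is not the kernel's register_refined_jiffies() implementation, and the
    helper name is hypothetical.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/*
 * Actual tick length (ns, truncated) for a timer running at tick_rate Hz
 * that is programmed to interrupt hz times per second. The cycle count per
 * tick must be a whole number, so it is rounded to the nearest cycle.
 */
static uint64_t actual_tick_ns(uint64_t tick_rate, uint64_t hz)
{
    uint64_t cycles_per_tick = (tick_rate + hz / 2) / hz;

    return cycles_per_tick * DEMO_NSEC_PER_SEC / tick_rate;
}
```

    The 153 ns shortfall per tick is small, but it accumulates over every
    tick, which is why a jiffies-driven clocksource needs the correction.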

    This slight error can cause timekeeping quality problems
    when using the jiffies or other jiffies driven clocksources.
    Thus we currently use compile time CLOCK_TICK_RATE value to
    generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use
    to adjust the jiffies clocksource to correct this error.

    Unfortunately though, since CLOCK_TICK_RATE is a compile
    time value, and the jiffies clocksource is registered very
    early during boot, there are a number of cases where there
    are different possible hardware timers that have different
    tick rates. This causes problems in cases like ARM where
    there are numerous different types of hardware, each having
    their own compile-time CLOCK_TICK_RATE, making it hard to
    accurately support different hardware with a single kernel.

    For the most part, this doesn't matter all that much, as not
    too many systems actually utilize the jiffies or jiffies driven
    clocksource. Usually there are other highres clocksources
    whose granularity error is negligible.

    Even so, we have some complicated calculations that we do
    everywhere to handle these edge cases.

    This patch removes the compile time SHIFTED_HZ value, and
    introduces a register_refined_jiffies() function. This results
    in the default jiffies clock being assumed to have a perfect HZ
    frequency, and allows architectures that care about jiffies accuracy
    to call register_refined_jiffies() with the tick rate, specified
    dynamically at boot.

    This allows us, where necessary, to not have a compile time
    CLOCK_TICK_RATE constant, simplifies the jiffies code, and
    still provides a way to have an accurate jiffies clock.

    NOTE: Since this patch does not add register_refined_jiffies()
    calls for every arch, it may cause time quality regressions
    in some cases. It's likely these will not be noticeable, but
    if they are an issue, adding the following to the end of
    setup_arch() should resolve the regression:
    register_refined_jiffies(CLOCK_TICK_RATE);

    Cc: Catalin Marinas
    Cc: Arnd Bergmann
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • Now that alarmtimer_remove has been simplified, change
    its name to _dequeue to better match its paired _enqueue
    function.

    Cc: Arve Hjønnevåg
    Cc: Colin Cross
    Cc: Thomas Gleixner
    Signed-off-by: John Stultz

    John Stultz
     
  • Arve Hjønnevåg reported numerous crashes from the
    "BUG_ON(timer->state != HRTIMER_STATE_CALLBACK)" check
    in __run_hrtimer after it called alarmtimer_fired.

    It turns out the alarmtimer code was not properly handling
    possible failures of hrtimer_try_to_cancel, and because
    these failures occur when the underlying base hrtimer is
    being run, this limits the ability to properly handle
    modifications to any alarmtimers on that base.

    Because much of the logic duplicates the hrtimer logic,
    it seems that we might as well have a per-alarmtimer
    hrtimer, and avoid the extra complexity of trying to
    multiplex many alarmtimers off of one hrtimer.

    Thus this patch moves the hrtimer to the alarm structure
    and simplifies the management logic.

    Changelog:
    v2:
    * Includes a fix for double alarm_start calls found by
    Arve

    Cc: Arve Hjønnevåg
    Cc: Colin Cross
    Cc: Thomas Gleixner
    Reported-by: Arve Hjønnevåg
    Tested-by: Arve Hjønnevåg
    Signed-off-by: John Stultz

    John Stultz
     
  • alarmtimer suspend returns -EBUSY if the next alarm will fire in less
    than 2 seconds. This allows one RTC seconds tick to occur after
    this check and before the alarm wakeup time is set, while ensuring the
    wakeup time is still in the future (assuming the RTC does not tick one
    more second before the alarm is set).

    If suspend is rejected due to an imminent alarm, hold a wakeup source
    for 2 seconds to process the alarm before reattempting suspend.

    If setting the alarm incurs an -ETIME for an alarm set in the past,
    or any other problem setting the alarm, abort suspend and hold a
    wakelock for 1 second, allowing the alarm to be serviced or other
    (hopefully transient) conditions that prevent the alarm to clear up.
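    The suspend policy above reduces to a small decision function. This
    sketch uses hypothetical names and types rather than the alarmtimer API:
    ns_until_next_alarm is assumed to be INT64_MAX when no alarm is pending,
    and set_alarm_err stands for the result of programming the RTC alarm
    (0 on success, negative such as -ETIME on failure).

```c
#include <stdint.h>

/* Hypothetical outcomes for this sketch. */
enum suspend_decision {
    SUSPEND_OK,             /* alarm programmed; go ahead and suspend */
    SUSPEND_ABORT_IMMINENT, /* -EBUSY: alarm due in under 2 s */
    SUSPEND_ABORT_SET_FAIL  /* programming the RTC alarm failed */
};

struct suspend_result {
    enum suspend_decision decision;
    unsigned int hold_ms;   /* how long to hold a wakeup source, 0 if none */
};

static struct suspend_result alarm_suspend_policy(int64_t ns_until_next_alarm,
                                                  int set_alarm_err)
{
    const int64_t two_sec_ns = 2LL * 1000000000LL;
    struct suspend_result r = { SUSPEND_OK, 0 };

    if (ns_until_next_alarm < two_sec_ns) {
        /* too close: let the alarm be processed, then retry suspend */
        r.decision = SUSPEND_ABORT_IMMINENT;
        r.hold_ms = 2000;
    } else if (set_alarm_err < 0) {
        /* e.g. -ETIME: give the alarm or a transient problem time to clear */
        r.decision = SUSPEND_ABORT_SET_FAIL;
        r.hold_ms = 1000;
    }
    return r;
}
```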

    Signed-off-by: Todd Poynor
    Signed-off-by: John Stultz

    Todd Poynor
     

23 Sep, 2012

1 commit

  • The can_stop_idle_tick() function complains if a softirq vector is
    raised too late in the idle-entry process, presumably in order to
    prevent dangling softirq invocations from being delayed across the
    full idle period, which might be indefinitely long -- and if softirq
    was asserted any later than the call to this function, such a delay
    might well happen.

    However, RCU needs to be able to use softirq to stop idle entry in
    order to be able to drain RCU callbacks from the current CPU, which in
    turn enables faster entry into dyntick-idle mode, which in turn reduces
    power consumption. Because RCU takes this action at a well-defined
    point in the idle-entry path, it is safe for RCU to take this approach.

    This commit therefore silences the error message that is sometimes
    produced when the going-idle CPU suddenly finds that it has an RCU_SOFTIRQ
    to process. The error message will continue to be issued for other
    softirq vectors.

    Reported-by: Sedat Dilek
    Signed-off-by: Paul E. McKenney
    Tested-by: Sedat Dilek
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

22 Sep, 2012

1 commit


18 Sep, 2012

1 commit

  • * pm-timers:
    PM: Do not use the syscore flag for runtime PM
    sh: MTU2: Basic runtime PM support
    sh: CMT: Basic runtime PM support
    sh: TMU: Basic runtime PM support
    PM / Domains: Do not measure start time for "irq safe" devices
    PM / Domains: Move syscore flag from subsys data to struct device
    PM / Domains: Rename the always_on device flag to syscore
    PM / Runtime: Allow helpers to be called by early platform drivers
    PM: Reorganize device PM initialization
    sh: MTU2: Introduce clock events suspend/resume routines
    sh: CMT: Introduce clocksource/clock events suspend/resume routines
    sh: TMU: Introduce clocksource/clock events suspend/resume routines
    timekeeping: Add suspend and resume of clock event devices
    PM / Domains: Add power off/on function for system core suspend stage
    PM / Domains: Introduce simplified power on routine for system resume

    Rafael J. Wysocki
     

13 Sep, 2012

2 commits

  • Daniel Lezcano reported seeing multi-second stalls from
    keyboard input on his T61 laptop when NOHZ and CPU_IDLE
    were enabled on a 32bit kernel.

    He bisected the problem down to commit
    1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec").

    After reproducing this issue, I narrowed the problem down
    to the fact that timekeeping_get_ns() returns a 64bit
    nsec value that hasn't been accumulated. In some cases
    this value was being then stored in timespec.tv_nsec
    (which is a long).

    On 32bit systems, with idle times larger than 4 seconds
    (or less, depending on the value of xtime_nsec), the
    returned nsec value would overflow 32 bits. This kept
    time from increasing properly, causing timers to not expire.

    The fix is to make sure we don't directly store the
    result of timekeeping_get_ns() into a tv_nsec field,
    instead using a 64bit nsec value which can then be
    added into the timespec via timespec_add_ns().
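    To see why storing the raw value overflows, note that 5 seconds is
    5000000000 ns, which does not fit in a signed 32-bit tv_nsec. The sketch
    below folds a 64-bit nanosecond count into seconds and nanoseconds, in
    the spirit of timespec_add_ns(); the struct and helper are hypothetical
    stand-ins that use int32_t to model a 32-bit long, and they assume
    tv_nsec is already normalized to [0, 1e9).

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* timespec-like struct with a 32-bit tv_nsec, as on 32-bit systems */
struct demo_timespec32 {
    int64_t tv_sec;
    int32_t tv_nsec;
};

/* Add a 64-bit nanosecond count without storing it directly in tv_nsec. */
static void demo_add_ns(struct demo_timespec32 *ts, uint64_t ns)
{
    uint64_t total = (uint64_t)ts->tv_nsec + ns;

    ts->tv_sec += (int64_t)(total / DEMO_NSEC_PER_SEC);
    ts->tv_nsec = (int32_t)(total % DEMO_NSEC_PER_SEC);  /* always < 1e9 */
}
```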

    Reported-and-bisected-by: Daniel Lezcano
    Tested-by: Daniel Lezcano
    Signed-off-by: John Stultz
    Acked-by: Prarit Bhargava
    Cc: Richard Cochran
    Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • There is no load_balancer to be selected now. It just sets the
    state of the nohz tick to stop.

    So rename the function, pass the 'cpu' as a parameter and then
    remove the useless call from tick_nohz_restart_sched_tick().

    [ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g
    s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ]
    Signed-off-by: Alex Shi
    Acked-by: Suresh Siddha
    Cc: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex.shi@intel.com
    Signed-off-by: Ingo Molnar

    Alex Shi
     

04 Sep, 2012

2 commits

  • Azat Khuzhin reported high loadavg in Linux v3.6.

    After checking the upstream scheduler code, I found Peter's commit:

    5167e8d5417b sched/nohz: Rewrite and fix load-avg computation -- again

    not fully applied, missing the call to calc_load_exit_idle().

    As a result, an idle exit within the sampling window is always counted
    as non-idle, and the load is higher than normal.

    This patch adds the missing call to calc_load_exit_idle().

    Signed-off-by: Charles Wang
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1345449754-27130-1-git-send-email-muming.wq@gmail.com
    Signed-off-by: Ingo Molnar

    Charles Wang
     
  • Some clock event devices, for example those that belong to PM domains,
    need to be handled in a special way during the timekeeping suspend
    and resume (which takes place in the system core, or "syscore",
    stages of system power transitions) in analogy with clock sources.

    Introduce .suspend() and .resume() callbacks for clock event devices
    that will be executed by timekeeping_suspend/_resume(), respectively,
    next to the clock sources' .suspend() and .resume() callbacks.
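    A minimal sketch of the optional-callback pattern described above: each
    device may provide .suspend() and .resume() hooks, and a NULL hook is
    simply skipped. The struct, the helpers, and the sample callbacks are
    hypothetical and far smaller than the real struct clock_event_device.

```c
#include <stddef.h>

/* Hypothetical, cut-down clock event device with optional PM hooks. */
struct demo_clock_event_device {
    const char *name;
    void (*suspend)(struct demo_clock_event_device *dev);
    void (*resume)(struct demo_clock_event_device *dev);
    int suspended;
};

/* Call .suspend() on every device that provides one. */
static void demo_clockevents_suspend(struct demo_clock_event_device **devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i]->suspend)
            devs[i]->suspend(devs[i]);
}

/* Call .resume() on every device that provides one. */
static void demo_clockevents_resume(struct demo_clock_event_device **devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i]->resume)
            devs[i]->resume(devs[i]);
}

/* Sample hooks a device driver might supply. */
static void demo_mark_suspended(struct demo_clock_event_device *dev)
{
    dev->suspended = 1;
}

static void demo_mark_resumed(struct demo_clock_event_device *dev)
{
    dev->suspended = 0;
}
```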

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

02 Sep, 2012

1 commit

  • Andreas Bombe reported that the ktime_t overflow checking added to
    timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
    timekeeping inputs") was causing problems with X.org because it caused
    timeouts too large to fit in a ktime_t to be invalid.

    Previously, these large timeouts would be clamped to KTIME_MAX and would
    never expire, which is valid.

    This patch splits the ktime_t overflow checking into a new
    timespec_valid_strict function, and converts the timekeeping code's
    internal checking to use this stricter function.
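    The split between the two checks can be sketched as follows. ktime_t is
    a signed 64-bit nanosecond count, so the largest representable
    whole-second value is INT64_MAX / NSEC_PER_SEC. The names and the local
    timespec struct are assumptions for illustration; only the strict
    variant rejects values that would overflow ktime_t.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL
/* seconds representable in a signed 64-bit nanosecond count */
#define DEMO_KTIME_SEC_MAX (INT64_MAX / DEMO_NSEC_PER_SEC)

struct demo_timespec {
    int64_t tv_sec;
    long tv_nsec;
};

/* Basic validity: non-negative seconds, nanoseconds in [0, 1e9). */
static bool demo_timespec_valid(const struct demo_timespec *ts)
{
    if (ts->tv_sec < 0)
        return false;
    if (ts->tv_nsec < 0 || ts->tv_nsec >= DEMO_NSEC_PER_SEC)
        return false;
    return true;
}

/* Strict validity: additionally reject values that overflow ktime_t. */
static bool demo_timespec_valid_strict(const struct demo_timespec *ts)
{
    if (!demo_timespec_valid(ts))
        return false;
    if (ts->tv_sec >= DEMO_KTIME_SEC_MAX)
        return false;
    return true;
}
```

    A huge user-supplied timeout passes the basic check (and can then be
    clamped), while timekeeping-internal inputs go through the strict one.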

    Reported-and-tested-by: Andreas Bombe
    Cc: Zhouping Liu
    Cc: Ingo Molnar
    Cc: Prarit Bhargava
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Signed-off-by: John Stultz
    Signed-off-by: Linus Torvalds

    John Stultz
     

22 Aug, 2012

2 commits

  • If update_wall_time() is called and the current offset isn't large
    enough to accumulate, avoid re-calling timekeeping_adjust which may
    change the clock freq and can cause 1ns inconsistencies with
    CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.
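    A sketch of the early-return fix, using a cut-down, hypothetical
    timekeeper: when the elapsed cycles do not cover a full interval, the
    function returns before the adjustment step (represented here by a
    counter standing in for timekeeping_adjust()) can change the clock
    frequency.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down timekeeper state. */
struct demo_timekeeper {
    uint64_t cycle_last;       /* clocksource value at last accumulation */
    uint64_t cycle_interval;   /* cycles per accumulation interval */
    unsigned int adjust_calls; /* times the frequency adjustment ran */
};

/* Returns true if at least one interval was accumulated. */
static bool demo_update_wall_time(struct demo_timekeeper *tk, uint64_t now)
{
    uint64_t offset = now - tk->cycle_last;

    if (offset < tk->cycle_interval)
        return false;          /* nothing to accumulate: leave mult alone */

    while (offset >= tk->cycle_interval) {
        offset -= tk->cycle_interval;
        tk->cycle_last += tk->cycle_interval;
    }
    tk->adjust_calls++;        /* stands in for timekeeping_adjust() */
    return true;
}
```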

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • Andreas Schwab noticed that the 1 << tk->shift could overflow if the
    shift value was greater than 30, since 1 would be a 32bit long on
    32bit architectures. This issue was introduced by 1e75fa8be (time:
    Condense timekeeper.xtime into xtime_sec)

    Use 1ULL instead to ensure we don't overflow on the shift.
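    The difference is easy to demonstrate. A plain 1 has type int, so
    1 << shift is undefined once shift reaches 31 on a 32-bit int, while
    1ULL << shift is well defined for shifts up to 63. The helper names
    below are hypothetical, and uint32_t models the 32-bit type.

```c
#include <stdint.h>

/* Computed in 64 bits: valid for shift in [0, 63]. */
static uint64_t demo_one_shifted(unsigned int shift)
{
    return 1ULL << shift;
}

/* What survives if the result is forced into a 32-bit type. */
static uint32_t demo_truncate32(uint64_t v)
{
    return (uint32_t)v;
}
```

    For example, demo_one_shifted(32) is 4294967296, but forcing it into 32
    bits leaves 0, which is the kind of silent loss the fix avoids.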

    Reported-by: Andreas Schwab
    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/1345595449-34965-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz