Doug / smarc-fsl-linux-kernel | Embedian Git Server

14 Dec, 2012

1 commit

66cdd0cea Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM updates from Marcelo Tosatti:
"Considerable KVM/PPC work, x86 kvmclock vsyscall support,
IA32_TSC_ADJUST MSR emulation, amongst others."

Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
migration notifier added next to rq migration call-back.

* tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
KVM: emulator: fix real mode segment checks in address linearization
VMX: remove unneeded enable_unrestricted_guest check
KVM: VMX: fix DPL during entry to protected mode
x86/kexec: crash_vmclear_local_vmcss needs __rcu
kvm: Fix irqfd resampler list walk
KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
KVM: MMU: optimize for set_spte
KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
KVM: PPC: bookehv: Add guest computation mode for irq delivery
KVM: PPC: Make EPCR a valid field for booke64 and bookehv
KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
KVM: PPC: e500: Add emulation helper for getting instruction ea
KVM: PPC: bookehv64: Add support for interrupt handling
KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
KVM: PPC: booke: Fix get_tb() compile error on 64-bit
KVM: PPC: e500: Silence bogus GCC warning in tlb code
...

Linus Torvalds
2012-12-14 07:31:08 +0800

13 Dec, 2012

1 commit

6be35c700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
Baldessari.

And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...

Linus Torvalds
2012-12-13 10:07:07 +0800

12 Dec, 2012

2 commits

b64c5fda3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core timer changes from Ingo Molnar:
"It contains continued generic-NOHZ work by Frederic and smaller
cleanups."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Kill xtime_lock, replacing it with jiffies_lock
clocksource: arm_generic: use this_cpu_ptr per-cpu helper
clocksource: arm_generic: use integer math helpers
time/jiffies: Make clocksource_jiffies static
clocksource: clean up parse_pmtmr()
tick: Correct the comments for tick_sched_timer()
tick: Conditionally build nohz specific code in tick handler
tick: Consolidate tick handling for high and low res handlers
tick: Consolidate timekeeping handling code

Linus Torvalds
2012-12-12 10:22:46 +0800
de0c276b3 Merge branches 'core-locking-for-linus' and 'timers-urgent-for-linus' of git://g… ... Browse Code »

…it.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull trivial fix branches from Ingo Molnar.

Cleanup in __get_key_name, and a timer comment fixlet.

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers, sched: Correct the comments for tick_sched_timer()

Linus Torvalds
2012-12-12 10:09:18 +0800

28 Nov, 2012

1 commit

e0b306fef time: export time information for KVM pvclock ... Browse Code »

As suggested by John, export time data similarly to how its
done by vsyscall support. This allows KVM to retrieve necessary
information to implement vsyscall support in KVM guests.

Acked-by: John Stultz
Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2012-11-28 09:29:12 +0800

22 Nov, 2012

1 commit

9c3f9e281 Merge branch 'fortglx/3.8/time' of git://git.linaro.org/people/jstultz/linux into timers/core ... Browse Code »

Fix trivial conflicts in: kernel/time/tick-sched.c

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2012-11-22 03:31:52 +0800

15 Nov, 2012

1 commit

69a37beab cpuidle: Quickly notice prediction failure for repeat mode ... Browse Code »

The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

cpuidle menu governor has a method to predict the repeat pattern if there are 8
C-states residency which are continuous and the same or very close, so it will
predict the next C-states residency will keep same residency time.

There is a real case that turbostat utility (tools/power/x86/turbostat)
at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
governor will predict it is repeat mode and there is another IPI wake up idle
CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
idle. However, in the turbostat, following 10 registers reading is sleep 5
seconds by default, so the idle CPU will keep at C1 for a long time though it is
idle until break event occurs.
In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
deep C-state stays at >99.98%.

In the patch, a timer is added when menu governor detects a repeat mode and
choose a shallow C-state. The timer is set to a time out value that greater
than predicted time, and we conclude repeat mode prediction failure if timer is
triggered. When repeat mode happens as expected, the timer is not triggered
and CPU waken up from C-states and it will cancel the timer initiatively.
When repeat mode does not happen, the timer will be time out and menu governor
will quickly notice that the repeat mode prediction fails and then re-evaluates
deeper C-states possibility.

Below is another case which will clearly show the patch much benefit:

#include
#include
#include
#include
#include
#include
#include

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
fprintf(stderr,
"Usage: idle_predict [options]\n"
" --help -h Print this help\n"
" --thread -n Thread number\n"
" --loop -l Loop times in shallow Cstate\n"
" --delay -t Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
int idle_num = 1;
while (!(*shutdown)) {
*count = *count + 1;

if (idle_num % loop)
usleep(delay);
else {
/* sleep 1 second */
usleep(1000000);
idle_num = 0;
}
idle_num++;
}

}

static void sighand(int sig)
{
*shutdown = 1;
}

int main(int argc, char *argv[])
{
sigset_t sigset;
int signum = SIGALRM;
int i, c, er = 0, thread_num = 8;
pthread_t pt[1024];

static char optstr[] = "n:l:t:h:";

while ((c = getopt(argc, argv, optstr)) != EOF)
switch (c) {
case 'n':
thread_num = atoi(optarg);
break;
case 'l':
loop = atoi(optarg);
break;
case 't':
delay = atoi(optarg);
break;
case 'h':
default:
usage();
exit(1);
}

printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
count = malloc(sizeof(long));
shutdown = malloc(sizeof(int));
*count = 0;
*shutdown = 0;

sigemptyset(&sigset);
sigaddset(&sigset, signum);
sigprocmask (SIG_BLOCK, &sigset, NULL);
signal(SIGINT, sighand);
signal(SIGTERM, sighand);

for(i = 0; i < thread_num ; i++)
pthread_create(&pt[i], NULL, simple_loop, NULL);

for (i = 0; i < thread_num; i++)
pthread_join(pt[i], NULL);

exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
After build the above test application, then run it.
Test plaform can be Intel Sandybridge or other recent platforms.
#./idle_predict -l 10 &
#./powertop

We will find that deep C-state will dangle between 40%~100% and much time spent
on C1 state. It is because menu governor wrongly predict that repeat mode
is kept, so it will choose the C1 shallow C-state even though it has chance to
sleep 1 second in deep C-state.

While after patched the kernel, we find that deep C-state will keep >99.6%.

Signed-off-by: Rik van Riel
Signed-off-by: Youquan Song
Signed-off-by: Rafael J. Wysocki

Youquan Song
2012-11-15 07:34:19 +0800

14 Nov, 2012

2 commits

d6ad41876 time: Kill xtime_lock, replacing it with jiffies_lock ... Browse Code »

Now that timekeeping is protected by its own locks, rename
the xtime_lock to jifffies_lock to better describe what it
protects.

CC: Thomas Gleixner
CC: Eric Dumazet
CC: Richard Cochran
Signed-off-by: John Stultz

John Stultz
2012-11-14 03:08:23 +0800
f95a98578 time/jiffies: Make clocksource_jiffies static ... Browse Code »

Commit f1b8274 ("clocksource: Cleanup clocksource selection") removed all
external references to clocksource_jiffies so there is no need to have the
symbol globally visible.

Fixes the following sparse warning:
CHECK kernel/time/jiffies.c kernel/time/jiffies.c:61:20: warning: symbol 'clocksource_jiffies' was not declared. Should it be static?

Signed-off-by: Lars-Peter Clausen
Signed-off-by: John Stultz

Lars-Peter Clausen
2012-11-14 03:04:51 +0800

01 Nov, 2012

2 commits

65f8f9a1c time: remove the timecompare code. ... Browse Code »

This patch removes the timecompare code from the kernel. The top five
reasons to do this are:

1. There are no more users of this code.
2. The original idea was a bit weak.
3. The original author has disappeared.
4. The code was not general purpose but tuned to a particular hardware,
5. There are better ways to accomplish clock synchronization.

Signed-off-by: Richard Cochran
Acked-by: John Stultz
Tested-by: Bob Liu
Signed-off-by: David S. Miller

Richard Cochran
2012-11-01 23:41:35 +0800
b8f61116c tick: Correct the comments for tick_sched_timer() ... Browse Code »

In the comments of function tick_sched_timer(), the sentence
"timer->base->cpu_base->lock held" is not right.

In function __run_hrtimer(), before call timer->function(),
the cpu_base->lock has been unlocked.

Signed-off-by: liu chuansheng
Cc: fei.li@intel.com
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build
Signed-off-by: Thomas Gleixner

Chuansheng Liu
2012-11-01 19:13:59 +0800

24 Oct, 2012

1 commit

351f181f9 timers, sched: Correct the comments for tick_sched_timer() ... Browse Code »

In the comments of function tick_sched_timer(), the sentence
"timer->base->cpu_base->lock held" is not right.

In function __run_hrtimer(), before call timer->function(),
the cpu_base->lock has been unlocked.

Signed-off-by: liu chuansheng
Cc: fei.li@intel.com
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build
Signed-off-by: Ingo Molnar

Chuansheng Liu
2012-10-24 16:16:51 +0800

16 Oct, 2012

3 commits

94a571402 tick: Conditionally build nohz specific code in tick handler ... Browse Code »

This optimize a bit the high res tick sched handler.

Signed-off-by: Frederic Weisbecker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Peter Zijlstra

Frederic Weisbecker
2012-10-16 00:51:08 +0800
9e8f559b0 tick: Consolidate tick handling for high and low res handlers ... Browse Code »

Besides unifying code, this also adds the idle check before
processing idle accounting specifics on the low res handler.
This way we also generalize this part of the nohz code for
!CONFIG_HIGH_RES_TIMERS to prepare for the adaptive tickless
features.

Signed-off-by: Frederic Weisbecker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Peter Zijlstra

Frederic Weisbecker
2012-10-16 00:42:25 +0800
5bb962269 tick: Consolidate timekeeping handling code ... Browse Code »

Unify the duplicated timekeeping handling code of low and high res tick
sched handlers.

Signed-off-by: Frederic Weisbecker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Peter Zijlstra

Frederic Weisbecker
2012-10-16 00:35:11 +0800

12 Oct, 2012

2 commits

03d3602a8 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer core update from Thomas Gleixner:
- Bug fixes (one for a longstanding dead loop issue)
- Rework of time related vsyscalls
- Alarm timer updates
- Jiffies updates to remove compile time dependencies

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Cast raw_interval to u64 to avoid shift overflow
timers: Fix endless looping between cascade() and internal_add_timer()
time/jiffies: bring back unconditional LATCH definition
time: Convert x86_64 to using new update_vsyscall
time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems
time: Introduce new GENERIC_TIME_VSYSCALL
time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Move update_vsyscall definitions to timekeeper_internal.h
time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes
jiffies: Remove compile time assumptions about CLOCK_TICK_RATE
jiffies: Kill unused TICK_USEC_TO_NSEC
alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue
alarmtimer: Remove unused helpers & defines
alarmtimer: Use hrtimer per-alarm instead of per-base
alarmtimer: Implement minimum alarm interval for allowing suspend

Linus Torvalds
2012-10-12 21:17:48 +0800
0588f1f93 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler fixes from Ingo Molnar:
"A CPU hotplug related crash fix and a nohz accounting fixlet."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Update sched_domains_numa_masks[][] when new cpus are onlined
sched: Ensure 'sched_domains_numa_levels' is safe to use in other functions
nohz: Fix one jiffy count too far in idle cputime

Linus Torvalds
2012-10-12 21:13:05 +0800

10 Oct, 2012

1 commit

5b3900cd4 timekeeping: Cast raw_interval to u64 to avoid shift overflow ... Browse Code »

We fixed a bunch of integer overflows in timekeeping code during the 3.6
cycle. I did an audit based on that and found this potential overflow.

Signed-off-by: Dan Carpenter
Acked-by: John Stultz
Link: http://lkml.kernel.org/r/20121009071823.GA19159@elgon.mountain
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org

Dan Carpenter
2012-10-10 03:27:14 +0800

05 Oct, 2012

1 commit

2b17c545a nohz: Fix one jiffy count too far in idle cputime ... Browse Code »

When we stop the tick in idle, we save the current jiffies value
in ts->idle_jiffies. This snapshot is substracted from the later
value of jiffies when the tick is restarted and the resulting
delta is accounted as idle cputime. This is how we handle the
idle cputime accounting without the tick.

But sometimes we need to schedule the next tick to some time in
the future instead of completely stopping it. In this case, a
tick may happen before we restart the periodic behaviour and
from that tick we account one jiffy to idle cputime as usual but
we also increment the ts->idle_jiffies snapshot by one so that
when we compute the delta to account, we substract the one jiffy
we just accounted.

To prepare for stopping the tick outside idle, we introduced a
check that prevents from fixing up that ts->idle_jiffies if we
are not running the idle task. But we use idle_cpu() for that
and this is a problem if we run the tick while another CPU
remotely enqueues a ttwu to our runqueue:

CPU 0: CPU 1:

tick_sched_timer() { ttwu_queue_remote()
if (idle_cpu(CPU 0))
ts->idle_jiffies++;
}

Here, idle_cpu() notes that &rq->wake_list is not empty and
hence won't consider the CPU as idle. As a result,
ts->idle_jiffies won't be incremented. But this is wrong because
we actually account the current jiffy to idle cputime. And that
jiffy won't get substracted from the nohz time delta. So in the
end, this jiffy is accounted twice.

Fix this by changing idle_cpu(smp_processor_id()) with
is_idle_task(current). This way the jiffy is substracted
correctly even if a ttwu operation is enqueued on the CPU.

Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: # 3.5+
Link: http://lkml.kernel.org/r/1349308004-3482-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2012-10-05 16:22:20 +0800

03 Oct, 2012

1 commit

16642a2e7 Merge tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates from Rafael J Wysocki:

- Improved system suspend/resume and runtime PM handling for the SH
TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile).

- Generic PM domains framework extensions related to cpuidle support
and domain objects lookup using names.

- ARM/shmobile power management updates including improved support for
the SH7372's A4S power domain containing the CPU core.

- cpufreq changes related to AMD CPUs support from Matthew Garrett,
Andre Przywara and Borislav Petkov.

- cpu0 cpufreq driver from Shawn Guo.

- cpufreq governor fixes related to the relaxing of limit from Michal
Pecio.

- OMAP cpufreq updates from Axel Lin and Richard Zhao.

- cpuidle ladder governor fixes related to the disabling of states from
Carsten Emde and me.

- Runtime PM core updates related to the interactions with the system
suspend core from Alan Stern and Kevin Hilman.

- Wakeup sources modification allowing more helper functions to be
called from interrupt context from John Stultz and additional
diagnostic code from Todd Poynor.

- System suspend error code path fix from Feng Hong.

Fixed up conflicts in cpufreq/powernow-k8 that stemmed from the
workqueue fixes conflicting fairly badly with the removal of support for
hardware P-state chips. The changes were independent but somewhat
intertwined.

* tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
Revert "PM QoS: Use spinlock in the per-device PM QoS constraints code"
PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2
cpuidle: rename function name "__cpuidle_register_driver", v2
cpufreq: OMAP: Check IS_ERR() instead of NULL for omap_device_get_by_hwmod_name
cpuidle: remove some empty lines
PM: Prevent runtime suspend during system resume
PM QoS: Use spinlock in the per-device PM QoS constraints code
PM / Sleep: use resume event when call dpm_resume_early
cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure
ACPI / processor: remove pointless variable initialization
ACPI / processor: remove unused function parameter
cpufreq: OMAP: remove loops_per_jiffy recalculate for smp
sections: fix section conflicts in drivers/cpufreq
cpufreq: conservative: update frequency when limits are relaxed
cpufreq / ondemand: update frequency when limits are relaxed
properly __init-annotate pm_sysrq_init()
cpufreq: Add a generic cpufreq-cpu0 driver
PM / OPP: Initialize OPP table from device tree
ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
cpufreq: Remove support for hardware P-state chips from powernow-k8
...

Linus Torvalds
2012-10-03 09:32:35 +0800

02 Oct, 2012

1 commit

0b981cb94 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler changes from Ingo Molnar:
"Continued quest to clean up and enhance the cputime code by Frederic
Weisbecker, in preparation for future tickless kernel features.

Other than that, smallish changes."

Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
cputime: Make finegrained irqtime accounting generally available
cputime: Gather time/stats accounting config options into a single menu
ia64: Reuse system and user vtime accounting functions on task switch
ia64: Consolidate user vtime accounting
vtime: Consolidate system/idle context detection
cputime: Use a proper subsystem naming for vtime related APIs
sched: cpu_power: enable ARCH_POWER
sched/nohz: Clean up select_nohz_load_balancer()
sched: Fix load avg vs. cpu-hotplug
sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
sched: Fix nohz_idle_balance()
sched: Remove useless code in yield_to()
sched: Add time unit suffix to sched sysctl knobs
sched/debug: Limit sd->*_idx range on sysctl
sched: Remove AFFINE_WAKEUPS feature flag
s390: Remove leftover account_tick_vtime() header
cputime: Consolidate vtime handling on context switch
sched: Move cputime code to its own file
cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
tile: Remove SD_PREFER_LOCAL leftover
...

Linus Torvalds
2012-10-02 01:43:39 +0800

26 Sep, 2012

1 commit

593d1006c Merge remote-tracking branch 'tip/core/rcu' into next.2012.09.25b ... Browse Code »

Resolved conflict in kernel/sched/core.c using Peter Zijlstra's
approach from https://lkml.org/lkml/2012/9/5/585.

Paul E. McKenney
2012-09-26 01:03:56 +0800

25 Sep, 2012

8 commits

92bb1fcf5 time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems ... Browse Code »

We only do rounding to the next nanosecond so we don't see minor
1ns inconsistencies in the vsyscall implementations. Since we're
changing the vsyscall implementations to avoid this, conditionalize
the rounding only to the GENERIC_TIME_VSYSCALL_OLD architectures.

Cc: Tony Luck
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andy Lutomirski
Cc: Martin Schwidefsky
Cc: Paul Turner
Cc: Steven Rostedt
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:08 +0800
576094b7f time: Introduce new GENERIC_TIME_VSYSCALL ... Browse Code »

Now that we moved everyone over to GENERIC_TIME_VSYSCALL_OLD,
introduce the new declaration and config option for the new
update_vsyscall method.

Cc: Tony Luck
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andy Lutomirski
Cc: Martin Schwidefsky
Cc: Paul Turner
Cc: Steven Rostedt
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:08 +0800
706394211 time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD ... Browse Code »

To help migrate archtectures over to the new update_vsyscall method,
redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD

Cc: Tony Luck
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andy Lutomirski
Cc: Martin Schwidefsky
Cc: Paul Turner
Cc: Steven Rostedt
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:07 +0800
d7b4202e0 time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes ... Browse Code »

We're going to need to access the timekeeper in update_vsyscall,
so make the structure available for those who need it.

Cc: Tony Luck
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Andy Lutomirski
Cc: Martin Schwidefsky
Cc: Paul Turner
Cc: Steven Rostedt
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:05 +0800
b3c869d35 jiffies: Remove compile time assumptions about CLOCK_TICK_RATE ... Browse Code »

CLOCK_TICK_RATE is used to accurately caclulate exactly how
a tick will be at a given HZ.

This is useful, because while we'd expect NSEC_PER_SEC/HZ,
the underlying hardware will have some granularity limit,
so we won't be able to have exactly HZ ticks per second.

This slight error can cause timekeeping quality problems
when using the jiffies or other jiffies driven clocksources.
Thus we currently use compile time CLOCK_TICK_RATE value to
generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use
to adjust the jiffies clocksource to correct this error.

Unfortunately though, since CLOCK_TICK_RATE is a compile
time value, and the jiffies clocksource is registered very
early during boot, there are a number of cases where there
are different possible hardware timers that have different
tick rates. This causes problems in cases like ARM where
there are numerous different types of hardware, each having
their own compile-time CLOCK_TICK_RATE, making it hard to
accurately support different hardware with a single kernel.

For the most part, this doesn't matter all that much, as not
too many systems actually utilize the jiffies or jiffies driven
clocksource. Usually there are other highres clocksources
who's granularity error is negligable.

Even so, we have some complicated calcualtions that we do
everywhere to handle these edge cases.

This patch removes the compile time SHIFTED_HZ value, and
introduces a register_refined_jiffies() function. This results
in the default jiffies clock as being assumed a perfect HZ
freq, and allows archtectures that care about jiffies accuracy
to call register_refined_jiffies() with the tick rate, specified
dynamically at boot.

This allows us, where necessary, to not have a compile time
CLOCK_TICK_RATE constant, simplifies the jiffies code, and
still provides a way to have an accurate jiffies clock.

NOTE: Since this patch does not add register_refinied_jiffies()
calls for every arch, it may cause time quality regressions
in some cases. Its likely these will not be noticable, but
if they are an issue, adding the following to the end of
setup_arch() should resolve the regression:
register_refinied_jiffies(CLOCK_TICK_RATE)

Cc: Catalin Marinas
Cc: Arnd Bergmann
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:05 +0800
a65bcc12a alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue ... Browse Code »

Now that alarmtimer_remove has been simplified, change
its name to _dequeue to better match its paired _enqueue
function.

Cc: Arve Hjønnevåg
Cc: Colin Cross
Cc: Thomas Gleixner
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:03 +0800
dae373be9 alarmtimer: Use hrtimer per-alarm instead of per-base ... Browse Code »

Arve Hjønnevåg reported numerous crashes from the
"BUG_ON(timer->state != HRTIMER_STATE_CALLBACK)" check
in __run_hrtimer after it called alarmtimer_fired.

It ends up the alarmtimer code was not properly handling
possible failures of hrtimer_try_to_cancel, and because
these faulres occur when the underlying base hrtimer is
being run, this limits the ability to properly handle
modifications to any alarmtimers on that base.

Because much of the logic duplicates the hrtimer logic,
it seems that we might as well have a per-alarmtimer
hrtimer, and avoid the extra complextity of trying to
multiplex many alarmtimers off of one hrtimer.

Thus this patch moves the hrtimer to the alarm structure
and simplifies the management logic.

Changelog:
v2:
* Includes a fix for double alarm_start calls found by
Arve

Cc: Arve Hjønnevåg
Cc: Colin Cross
Cc: Thomas Gleixner
Reported-by: Arve Hjønnevåg
Tested-by: Arve Hjønnevåg
Signed-off-by: John Stultz

John Stultz
2012-09-25 00:38:02 +0800
59a93c27c alarmtimer: Implement minimum alarm interval for allowing suspend ... Browse Code »

alarmtimer suspend return -EBUSY if the next alarm will fire in less
than 2 seconds. This allows one RTC seconds tick to occur subsequent
to this check before the alarm wakeup time is set, ensuring the wakeup
time is still in the future (assuming the RTC does not tick one more
second prior to setting the alarm).

If suspend is rejected due to an imminent alarm, hold a wakeup source
for 2 seconds to process the alarm prior to reattempting suspend.

If setting the alarm incurs an -ETIME for an alarm set in the past,
or any other problem setting the alarm, abort suspend and hold a
wakelock for 1 second while the alarm is allowed to be serviced or
other hopefully transient conditions preventing the alarm clear up.

Signed-off-by: Todd Poynor
Signed-off-by: John Stultz

Todd Poynor
2012-09-25 00:38:01 +0800

23 Sep, 2012

1 commit

803b0ebae time: RCU permitted to stop idle entry via softirq ... Browse Code »

The can_stop_idle_tick() function complains if a softirq vector is
raised too late in the idle-entry process, presumably in order to
prevent dangling softirq invocations from being delayed across the
full idle period, which might be indefinitely long -- and if softirq
was asserted any later than the call to this function, such a delay
might well happen.

However, RCU needs to be able to use softirq to stop idle entry in
order to be able to drain RCU callbacks from the current CPU, which in
turn enables faster entry into dyntick-idle mode, which in turn reduces
power consumption. Because RCU takes this action at a well-defined
point in the idle-entry path, it is safe for RCU to take this approach.

This commit therefore silences the error message that is sometimes
produced when the going-idle CPU suddenly finds that it has an RCU_SOFTIRQ
to process. The error message will continue to be issued for other
softirq vectors.

Reported-by: Sedat Dilek
Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Tested-by: Sedat Dilek
Reviewed-by: Josh Triplett

Paul E. McKenney
2012-09-23 22:42:52 +0800

22 Sep, 2012

1 commit

519b3b742 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fix from Ingo Molnar:
"One more timekeeping fix for v3.6"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix timeekeping_get_ns overflow on 32bit systems

Linus Torvalds
2012-09-22 05:25:46 +0800

18 Sep, 2012

1 commit

00e8b2613 Merge branch 'pm-timers' ... Browse Code »

* pm-timers:
PM: Do not use the syscore flag for runtime PM
sh: MTU2: Basic runtime PM support
sh: CMT: Basic runtime PM support
sh: TMU: Basic runtime PM support
PM / Domains: Do not measure start time for "irq safe" devices
PM / Domains: Move syscore flag from subsys data to struct device
PM / Domains: Rename the always_on device flag to syscore
PM / Runtime: Allow helpers to be called by early platform drivers
PM: Reorganize device PM initialization
sh: MTU2: Introduce clock events suspend/resume routines
sh: CMT: Introduce clocksource/clock events suspend/resume routines
sh: TMU: Introduce clocksource/clock events suspend/resume routines
timekeeping: Add suspend and resume of clock event devices
PM / Domains: Add power off/on function for system core suspend stage
PM / Domains: Introduce simplified power on routine for system resume

Rafael J. Wysocki
2012-09-18 02:25:18 +0800

13 Sep, 2012

2 commits

ec145babe time: Fix timeekeping_get_ns overflow on 32bit systems ... Browse Code »

Daniel Lezcano reported seeing multi-second stalls from
keyboard input on his T61 laptop when NOHZ and CPU_IDLE
were enabled on a 32bit kernel.

He bisected the problem down to commit
1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec").

After reproducing this issue, I narrowed the problem down
to the fact that timekeeping_get_ns() returns a 64bit
nsec value that hasn't been accumulated. In some cases
this value was being then stored in timespec.tv_nsec
(which is a long).

On 32bit systems, with idle times larger then 4 seconds
(or less, depending on the value of xtime_nsec), the
returned nsec value would overflow 32bits. This limited
kept time from increasing, causing timers to not expire.

The fix is to make sure we don't directly store the
result of timekeeping_get_ns() into a tv_nsec field,
instead using a 64bit nsec value which can then be
added into the timespec via timespec_add_ns().

Reported-and-bisected-by: Daniel Lezcano
Tested-by: Daniel Lezcano
Signed-off-by: John Stultz
Acked-by: Prarit Bhargava
Cc: Richard Cochran
Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar

John Stultz
2012-09-13 23:39:14 +0800
c1cc017c5 sched/nohz: Clean up select_nohz_load_balancer() ... Browse Code »

There is no load_balancer to be selected now. It just sets the
state of the nohz tick to stop.

So rename the function, pass the 'cpu' as a parameter and then
remove the useless call from tick_nohz_restart_sched_tick().

[ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g
s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ]
Signed-off-by: Alex Shi
Acked-by: Suresh Siddha
Cc: Venkatesh Pallipadi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar

Alex Shi
2012-09-13 22:52:05 +0800

04 Sep, 2012

2 commits

749c8814f sched: Add missing call to calc_load_exit_idle() ... Browse Code »

Azat Khuzhin reported high loadavg in Linux v3.6

After checking the upstream scheduler code, I found Peter's commit:

5167e8d5417b sched/nohz: Rewrite and fix load-avg computation -- again

not fully applied, missing the call to calc_load_exit_idle().

After that idle exit in sampling window will always be calculated
to non-idle, and the load will be higher than normal.

This patch adds the missing call to calc_load_exit_idle().

Signed-off-by: Charles Wang
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1345449754-27130-1-git-send-email-muming.wq@gmail.com
Signed-off-by: Ingo Molnar

Charles Wang
2012-09-04 20:30:29 +0800
adc78e6b9 timekeeping: Add suspend and resume of clock event devices ... Browse Code »

Some clock event devices, for example such that belong to PM domains,
need to be handled in a spcial way during the timekeeping suspend
and resume (which takes place in the system core, or "syscore",
stages of system power transitions) in analogy with clock sources.

Introduce .suspend() and .resume() callbacks for clock event devices
that will be executed by timekeeping_suspend/_resume(), respectively,
next the the clock sources' .suspend() and .resume() callbacks.

Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2012-09-04 07:36:01 +0800

02 Sep, 2012

1 commit

cee58483c time: Move ktime_t overflow checking into timespec_valid_strict ... Browse Code »

Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe
Cc: Zhouping Liu
Cc: Ingo Molnar
Cc: Prarit Bhargava
Cc: Thomas Gleixner
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz
Signed-off-by: Linus Torvalds

John Stultz
2012-09-02 01:24:48 +0800

22 Aug, 2012

2 commits

bf2ac3121 time: Avoid making adjustments if we haven't accumulated anything ... Browse Code »

If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.

Signed-off-by: John Stultz
Cc: Prarit Bhargava
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner

John Stultz
2012-08-22 16:42:13 +0800
6ea565a9b time: Avoid potential shift overflow with large shift values ... Browse Code »

Andreas Schwab noticed that the 1 << tk->shift could overflow if the
shift value was greater than 30, since 1 would be a 32bit long on
32bit architectures. This issue was introduced by 1e75fa8be (time:
Condense timekeeper.xtime into xtime_sec)

Use 1ULL instead to ensure we don't overflow on the shift.

Reported-by: Andreas Schwab
Signed-off-by: John Stultz
Cc: Prarit Bhargava
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/1345595449-34965-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner

John Stultz
2012-08-22 16:42:13 +0800