Eric Lee / smarc-fsl-linux-kernel

19 Aug, 2012

1 commit

be53db6e4 alpha: take a bunch of syscalls into osf_sys.c ... Browse Code »

New helper: current_thread_info(). Allows to do a bunch of odd syscalls
in C. While we are at it, there had never been a reason to do
osf_getpriority() in assembler. We also get "namespace"-aware (read:
consistent with getuid(2), etc.) behaviour from getx?id() syscalls now.

Signed-off-by: Al Viro
Signed-off-by: Michael Cree
Acked-by: Matt Turner
Signed-off-by: Linus Torvalds

Al Viro
2012-08-19 23:41:19 +0800

06 Jun, 2012

4 commits

e40468a54 timers: Improve get_next_timer_interrupt() ... Browse Code »

Gilad reported at

http://lkml.kernel.org/r/1336056962-10465-2-git-send-email-gilad@benyossef.com

"Current timer code fails to correctly return a value meaning that
there is no future timer event, with the result that the timer keeps
getting re-armed in HZ one shot mode even when we could turn it off,
generating unneeded interrupts.

What is happening is that when __next_timer_interrupt() wishes
to return a value that signifies "there is no future timer
event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).

However, the code in tick_nohz_stop_sched_tick(), which called
__next_timer_interrupt() via get_next_timer_interrupt(),
compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
to see if the timer needs to be re-armed.

base->timer_jiffies != last_jiffies and so tick_nohz_stop_sched_tick()
interperts the return value as indication that there is a distant
future event 12 days from now and programs the timer to fire next
after KTIME_MAX nsecs instead of avoiding to arm it. This ends up
causing a needless interrupt once every KTIME_MAX nsecs."

Fix this by using the new active timer accounting. This avoids scans
when no active timer is enqueued completely, so we don't have to rely
on base->timer_next and base->timer_jiffies anymore.

Reported-by: Gilad Ben-Yossef
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/20120525214819.317535385@linutronix.de

Thomas Gleixner
2012-06-06 19:49:02 +0800
99d5f3aac timers: Add accounting of non deferrable timers ... Browse Code »

The code in get_next_timer_interrupt() is suboptimal as it has to run
through the cascade to find the next expiring timer. On a completely
idle core we should only do that when there is an active timer
enqueued and base->next_timer does not give us a fast answer.

Add accounting of the active timers to the now consolidated
attach/detach code. I deliberately avoided sanity checks because the
code is fully symetric and any fiddling with timers w/o using the API
functions will lead to cute explosions anyway. ulong is big enough
even on 32bit and if we really run into the situation to have more
than 1<
Cc: Peter Zijlstra
Cc: Gilad Ben-Yossef
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/20120525214819.236377028@linutronix.de

Thomas Gleixner
2012-06-06 19:49:01 +0800
facbb4a7e timers: Consolidate base->next_timer update ... Browse Code »

Another bunch of mindlessly copied code. All callers of
internal_add_timer() except the recascading code updates
base->next_timer.

Move this into internal_add_timer() and let the cascading code call
__internal_add_timer().

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Gilad Ben-Yossef
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/20120525214819.189946224@linutronix.de

Thomas Gleixner
2012-06-06 19:49:01 +0800
ec44bc7ac timers: Create detach_if_pending() and use it ... Browse Code »

Most callers of detach_timer() have the same pattern around
them. Check whether the timer is pending and eventually updating
base->next_timer.

Create detach_if_pending() and replace the duplicated code.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Gilad Ben-Yossef
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/20120525214819.131246037@linutronix.de

Thomas Gleixner
2012-06-06 19:49:01 +0800

24 May, 2012

1 commit

644473e9c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.

Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.

- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.

- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.

- With the user namespaces compiled out the performance is as good or
better than it is today.

- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.

- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).

- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.

- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.

- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.

- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.

My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...

Linus Torvalds
2012-05-24 08:42:39 +0800

23 May, 2012

1 commit

c54894cd4 Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »

Pull workqueue changes from Tejun Heo:
"Nothing exciting. Most are updates to debug stuff and related fixes.
Two not-too-critical bugs are fixed - WARN_ON() triggering spurious
during cpu offlining and unlikely lockdep related oops."

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
lockdep: fix oops in processing workqueue
workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active
workqueue: Catch more locking problems with flush_work()
workqueue: change BUG_ON() to WARN_ON()
trace: Remove unused workqueue tracer

Linus Torvalds
2012-05-23 08:36:56 +0800

15 May, 2012

1 commit

4d82a1deb lockdep: fix oops in processing workqueue ... Browse Code »

Under memory load, on x86_64, with lockdep enabled, the workqueue's
process_one_work() has been seen to oops in __lock_acquire(), barfing
on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].

Because it's permissible to free a work_struct from its callout function,
the map used is an onstack copy of the map given in the work_struct: and
that copy is made without any locking.

Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
"rep movsq" for that structure copy: which might race with a workqueue
user's wait_on_work() doing lock_map_acquire() on the source of the
copy, putting a pointer into the class_cache[], but only in time for
the top half of that pointer to be copied to the destination map.

Boom when process_one_work() subsequently does lock_map_acquire()
on its onstack copy of the lockdep_map.

Fix this, and a similar instance in call_timer_fn(), with a
lockdep_copy_map() function which additionally NULLs the class_cache[].

Note: this oops was actually seen on 3.4-next, where flush_work() newly
does the racing lock_map_acquire(); but Tejun points out that 3.4 and
earlier are already vulnerable to the same through wait_on_work().

* Patch orginally from Peter. Hugh modified it a bit and wrote the
description.

Signed-off-by: Peter Zijlstra
Reported-by: Hugh Dickins
LKML-Reference:
Signed-off-by: Tejun Heo

Peter Zijlstra
2012-05-15 23:08:31 +0800

03 May, 2012

1 commit

a29c33f4e userns: Convert setting and getting uid and gid system calls to use kuid and kgid ... Browse Code »

Convert setregid, setgid, setreuid, setuid,
setresuid, getresuid, setresgid, getresgid, setfsuid, setfsgid,
getuid, geteuid, getgid, getegid,
waitpid, waitid, wait4.

Convert userspace uids and gids into kuids and kgids before
being placed on struct cred. Convert struct cred kuids and
kgids into userspace uids and gids when returning them.

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-03 18:28:41 +0800

27 Apr, 2012

1 commit

048a0e8f5 timer: Fix mod_timer_pinned() header comment ... Browse Code »

The mod_timer_pinned() header comment states that it prevents timers
from being migrated to a different CPU. This is not the case, instead,
it ensures that the timer is posted to the current CPU, but does nothing
to prevent CPU-hotplug operations from migrating the timer.

This commit therefore brings the comment header into alignment with
reality.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Acked-by: Steven Rostedt

Paul E. McKenney
2012-04-27 03:28:03 +0800

06 Jan, 2012

1 commit

8c717b72d Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Use debugobjects to catch deletion of uninitialized timers
timer: Setup uninitialized timer with a stub callback
debugobjects: Extend to assert that an object is initialized
debugobjects: Be smarter about static objects

Linus Torvalds
2012-01-06 23:53:34 +0800

09 Dec, 2011

1 commit

031af165b sys_getppid: add missing rcu_dereference ... Browse Code »

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.

Signed-off-by: Mandeep Singh Baines
Cc: Thomas Gleixner
Cc: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mandeep Singh Baines
2011-12-09 23:50:29 +0800

24 Nov, 2011

2 commits

dc4218bd0 timer: Use debugobjects to catch deletion of uninitialized timers ... Browse Code »

del_timer_sync() calls debug_object_assert_init() to assert that
a timer has been initialized before calling lock_timer_base().
lock_timer_base() would spin forever on a NULL(uninit-ed) base.
The check is added to del_timer() to prevent silent failure, even
though it would not get stuck in an infinite loop.

[ sboyd@codeaurora.org: Remove WARN, intialize timer function]

Signed-off-by: Christine Chan
Signed-off-by: Stephen Boyd
Cc: John Stultz
Link: http://lkml.kernel.org/r/1320724108-20788-4-git-send-email-sboyd@codeaurora.org
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Christine Chan
2011-11-24 01:49:23 +0800
fb16b8cf0 timer: Setup uninitialized timer with a stub callback ... Browse Code »

Remove the WARN_ON() in timer_fixup_activate() as we now get the
debugobjects printout in the debugobjects activate check.

We also assign a dummy timer callback so that if the timer is
actually set to fire we don't oops.

[ tglx@linutronix.de: Split out the debugobjects vs. the timer change ]

Signed-off-by: Stephen Boyd
Cc: Christine Chan
Cc: John Stultz
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/1320724108-20788-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner

Stephen Boyd
2011-11-24 01:49:23 +0800

31 Oct, 2011

1 commit

9984de1a5 kernel: Map most files to use export.h instead of module.h ... Browse Code »

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include
+#include

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

03 Jun, 2011

1 commit

1c3cc1160 timers: Consider slack value in mod_timer() ... Browse Code »

There is an optimization which does not update the timer if the timer
was pending and the expiration time was unchanged.

Since commit 3bbb9ec9 ("timers: Introduce the concept of timer slack
for legacy timers") this optimization is no longer applied for timers
where the expiration time got extended due to the slack value. So we
need to check again after the expiration time might have been updated.

[ tglx: Made it a single check by applying slack first and sorting
out the slack = 0 value (all timeouts < 256 jiffies) early ]

Signed-off-by: Sebastian Andrzej Siewior
Link: http://lkml.kernel.org/r/20110521105828.GA29442@Chamillionaire.breakpoint.cc
Signed-off-by: Thomas Gleixner

Sebastian Andrzej Siewior
2011-06-03 21:02:32 +0800

16 Mar, 2011

2 commits

420c1c572 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
posix-clocks: Check write permissions in posix syscalls
hrtimer: Remove empty hrtimer_init_hres_timer()
hrtimer: Update hrtimer->state documentation
hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
timers: Export CLOCK_BOOTTIME via the posix timers interface
timers: Add CLOCK_BOOTTIME hrtimer base
time: Extend get_xtime_and_monotonic_offset() to also return sleep
time: Introduce get_monotonic_boottime and ktime_get_boottime
hrtimers: extend hrtimer base code to handle more then 2 clockids
ntp: Remove redundant and incorrect parameter check
mn10300: Switch do_timer() to xtimer_update()
posix clocks: Introduce dynamic clocks
posix-timers: Cleanup namespace
posix-timers: Add support for fd based clocks
x86: Add clock_adjtime for x86
posix-timers: Introduce a syscall for clock tuning.
time: Splitout compat timex accessors
ntp: Add ADJ_SETOFFSET mode bit
time: Introduce timekeeping_inject_offset
posix-timer: Update comment
...

Fix up new system-call-related conflicts in
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
kernel/time.c

Linus Torvalds
2011-03-16 09:53:35 +0800
0586bed3e Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtmutex: tester: Remove the remaining BKL leftovers
lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause
rtmutex: Simplify PI algorithm and make highest prio task get lock
rwsem: Remove redundant asmregparm annotation
rwsem: Move duplicate function prototypes to linux/rwsem.h
rwsem: Unify the duplicate rwsem_is_locked() inlines
rwsem: Move duplicate init macros and functions to linux/rwsem.h
rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h
x86: Cleanup rwsem_count_t typedef
rwsem: Cleanup includes
locking: Remove deprecated lock initializers
cred: Replace deprecated spinlock initialization
kthread: Replace deprecated spinlock initialization
xtensa: Replace deprecated spinlock initialization
um: Replace deprecated spinlock initialization
sparc: Replace deprecated spinlock initialization
mips: Replace deprecated spinlock initialization
cris: Replace deprecated spinlock initialization
alpha: Replace deprecated spinlock initialization
rtmutex-tester: Remove BKL tests

Linus Torvalds
2011-03-16 09:28:30 +0800

08 Mar, 2011

1 commit

997772884 debugobjects: Add hint for better object identification ... Browse Code »

In complex subsystems like mac80211 structures can contain several
timers and work structs, so identifying a specific instance from the
call trace and object type output of debugobjects can be hard.

Allow the subsystems which support debugobjects to provide a hint
function. This function returns a pointer to a kernel address
(preferrably the objects callback function) which is printed along
with the debugobjects type.

Add hint methods for timer_list, work_struct and hrtimer.

[ tglx: Massaged changelog, made it compile ]

Signed-off-by: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Thomas Gleixner

Stanislaw Gruszka
2011-03-08 23:10:38 +0800

16 Feb, 2011

1 commit

48228f7b4 lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause ... Browse Code »

Twice I had to explain the output about why lockdep gives an error with
locks in IRQ context and with del_timer_sync(). Might as well write it
up and place it in the comments above the code in del_timer_sync().
Perhaps the next time this lockdep dump triggers people will understand
the issues.

It is a ticky issue and very subtle, explaining it in detail in the code
may help others understand the issue when they stumble upon the bug
again.

Signed-off-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Steven Rostedt
2011-02-16 20:35:08 +0800

08 Feb, 2011

1 commit

7ff207928 Revert "lockdep, timer: Fix del_timer_sync() annotation" ... Browse Code »

Both attempts at trying to allow softirq usage for
del_timer_sync() failed (produced bogus warnings),
so revert the commit for this release:

f266a5110d45: lockdep, timer: Fix del_timer_sync() annotation

and try again later.

Reported-by: Borislav Petkov
Signed-off-by: Peter Zijlstra
Cc: Linus Torvalds
Cc: Yong Zhang
Cc: Thomas Gleixner
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-02-08 23:18:39 +0800

04 Feb, 2011

1 commit

f266a5110 lockdep, timer: Fix del_timer_sync() annotation ... Browse Code »

Calling local_bh_enable() will want to actually start processing
softirqs, which isn't a good idea since this can get called with IRQs
disabled.

Cure this by using _local_bh_enable() which doesn't start processing
softirqs, and use raw_local_irq_save() to avoid any softirqs from
happening without letting lockdep think IRQs are in fact disabled.

Reported-by: Nick Bowler
Signed-off-by: Peter Zijlstra
Reviewed-by: Yong Zhang
LKML-Reference:
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2011-02-04 17:31:22 +0800

31 Jan, 2011

1 commit

871cf1e5f time: Move do_timer() to kernel/time/timekeeping.c ... Browse Code »

do_timer() is primary timekeeping related. calc_global_load() is
called from do_timer() as well, but that's more for historical
reasons.

[ tglx: Fixed up the calc_global_load() reject andmassaged changelog ]

Signed-off-by: Torben Hohn
Cc: Peter Zijlstra
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
LKML-Reference:
Signed-off-by: Thomas Gleixner

Torben Hohn
2011-01-31 21:55:41 +0800

07 Jan, 2011

1 commit

dda5f0a37 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
MAINTAINERS: Update timer related entries
timers: Use this_cpu_read
timerqueue: Make timerqueue_getnext() static inline
hrtimer: fix timerqueue conversion flub
hrtimers: Convert hrtimers to use timerlist infrastructure
timers: Fixup allmodconfig build issue
timers: Rename timerlist infrastructure to timerqueue
timers: Introduce timerlist infrastructure.
hrtimer: Remove stale comment on curr_timer
timer: Warn when del_timer_sync() is called in hardirq context
timer: Del_timer_sync() can be used in softirq context
timer: Make try_to_del_timer_sync() the same on SMP and UP
posix-timers: Annotate lock_timer()
timer: Permit statically-declared work with deferrable timers
time: Use ARRAY_SIZE macro in timecompare.c
timer: Initialize the field slack of timer_list
timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS
time: Compensate for rounding on odd-frequency clocksources

Fix up trivial conflict in MAINTAINERS

Linus Torvalds
2011-01-07 02:42:43 +0800

13 Dec, 2010

1 commit

7496351ad timers: Use this_cpu_read ... Browse Code »

Eric asked for this.

[tglx: Because it generates faster code according to Erics ]

Signed-off-by: Christoph Lameter
Cc: Pekka Enberg
Cc: Eric Dumazet
Cc: Mathieu Desnoyers
Cc: Tejun Heo
Cc: linux-mm@kvack.org
LKML-Reference:
Signed-off-by: Thomas Gleixner

Christoph Lameter
2010-12-13 01:38:09 +0800

09 Dec, 2010

2 commits

dbd87b5af nohz: Fix get_next_timer_interrupt() vs cpu hotplug ... Browse Code »

This fixes a bug as seen on 2.6.32 based kernels where timers got
enqueued on offline cpus.

If a cpu goes offline it might still have pending timers. These will
be migrated during CPU_DEAD handling after the cpu is offline.
However while the cpu is going offline it will schedule the idle task
which will then call tick_nohz_stop_sched_tick().

That function in turn will call get_next_timer_intterupt() to figure
out if the tick of the cpu can be stopped or not. If it turns out that
the next tick is just one jiffy off (delta_jiffies == 1)
tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
not stop and takes an early exit and thus it won't update the load
balancer cpu.

Just afterwards the cpu will be killed and the load balancer cpu could
be the offline cpu.

On 2.6.32 based kernel get_nohz_load_balancer() gets called to decide
on which cpu a timer should be enqueued (see __mod_timer()). Which
leads to the possibility that timers get enqueued on an offline cpu.
These will never expire and can cause a system hang.

This has been observed 2.6.32 kernels. On current kernels
__mod_timer() uses get_nohz_timer_target() which doesn't have that
problem. However there might be other problems because of the too
early exit tick_nohz_stop_sched_tick() in case a cpu goes offline.

The easiest and probably safest fix seems to be to let
get_next_timer_interrupt() just lie and let it say there isn't any
pending timer if the current cpu is offline.

I also thought of moving migrate_[hr]timers() from CPU_DEAD to
CPU_DYING, but seeing that there already have been fixes at least in
the hrtimer code in this area I'm afraid that this could add new
subtle bugs.

Signed-off-by: Heiko Carstens
Signed-off-by: Peter Zijlstra
LKML-Reference:
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar

Heiko Carstens
2010-12-09 03:15:07 +0800
0f004f5a6 sched: Cure more NO_HZ load average woes ... Browse Code »

There's a long-running regression that proved difficult to fix and
which is hitting certain people and is rather annoying in its effects.

Damien reported that after 74f5187ac8 (sched: Cure load average vs
NO_HZ woes) his load average is unnaturally high, he also noted that
even with that patch reverted the load avgerage numbers are not
correct.

The problem is that the previous patch only solved half the NO_HZ
problem, it addressed the part of going into NO_HZ mode, not of
comming out of NO_HZ mode. This patch implements that missing half.

When comming out of NO_HZ mode there are two important things to take
care of:

- Folding the pending idle delta into the global active count.
- Correctly aging the averages for the idle-duration.

So with this patch the NO_HZ interaction should be complete and
behaviour between CONFIG_NO_HZ=[yn] should be equivalent.

Furthermore, this patch slightly changes the load average computation
by adding a rounding term to the fixed point multiplication.

Reported-by: Damien Wyart
Reported-by: Tim McGrath
Tested-by: Damien Wyart
Tested-by: Orion Poplawski
Tested-by: Kyle McMartin
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
Cc: Chase Douglas
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-12-09 03:15:04 +0800

22 Oct, 2010

3 commits

466bd3030 timer: Warn when del_timer_sync() is called in hardirq context ... Browse Code »

Add explict warning when del_timer_sync() is called in hardirq
context.

Signed-off-by: Yong Zhang
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Yong Zhang
2010-10-22 20:46:25 +0800
1118e2cd3 timer: Del_timer_sync() can be used in softirq context ... Browse Code »

Actually we have used del_timer_sync() in softirq context for a long time,
e.g. in __dst_free()::cancel_delayed_work().

So change the comments of it to warn on hardirq context only, and make
lockdep know about this change.

Signed-off-by: Yong Zhang
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Yong Zhang
2010-10-22 20:46:25 +0800
6f1bc451e timer: Make try_to_del_timer_sync() the same on SMP and UP ... Browse Code »

On UP try_to_del_timer_sync() is mapped to del_timer() which does not
take the running timer callback into account, so it has different
semantics.

Remove the SMP dependency of try_to_del_timer_sync() by using
base->running_timer in the UP case as well.

[ tglx: Removed set_running_timer() inline and tweaked the changelog ]

Signed-off-by: Yong Zhang
Cc: Ingo Molnar
Cc: Peter Zijlstra
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Yong Zhang
2010-10-22 20:46:25 +0800

21 Oct, 2010

1 commit

dd6414b50 timer: Permit statically-declared work with deferrable timers ... Browse Code »

Currently, you have to just define a delayed_work uninitialised, and then
initialise it before first use. That's a tad clumsy. At risk of playing
mind-games with the compiler, fooling it into doing pointer arithmetic
with compile-time-constants, this lets clients properly initialise delayed
work with deferrable timers statically.

This patch was inspired by the issues which lead Artem Bityutskiy to
commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue
deferrable").

Signed-off-by: Phil Carmody
Acked-by: Artem Bityutskiy
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Phil Carmody
2010-10-21 23:30:06 +0800

19 Oct, 2010

1 commit

e360adbe2 irq_work: Add generic hardirq context callbacks ... Browse Code »
2

Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.

Signed-off-by: Peter Zijlstra
Acked-by: Kyle McMartin
Acked-by: Martin Schwidefsky
[ various fixes ]
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-10-19 01:58:50 +0800

11 Aug, 2010

1 commit

0caa62106 kernel/timer.c: fix kernel-doc function parameter warning ... Browse Code »

Fix kernel-doc warning, add @timer description:

Warning(kernel/timer.c:335): No description found for parameter 'timer'

Signed-off-by: Randy Dunlap
Cc: Thomas Gleixner
Signed-off-by: Linus Torvalds

Randy Dunlap
2010-08-11 06:33:09 +0800

07 Aug, 2010

3 commits

af3900843 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Documentation: Add timers/timers-howto.txt
timer: Added usleep_range timer
Revert "timer: Added usleep[_range] timer"
clockevents: Remove the per cpu tick skew
posix_timer: Move copy_to_user(created_timer_id) down in timer_create()
timer: Added usleep[_range] timer
timers: Document meaning of deferrable timer

Linus Torvalds
2010-08-07 04:12:36 +0800
c4efd6b56 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
sched: No need for bootmem special cases
sched: Revert nohz_ratelimit() for now
sched: Reduce update_group_power() calls
sched: Update rq->clock for nohz balanced cpus
sched: Fix spelling of sibling
sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
sched: thread_group_cputime: Simplify, document the "alive" check
sched: Remove the obsolete exit_state/signal hacks
sched: task_tick_rt: Remove the obsolete ->signal != NULL check
sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
sched: Fix comments to make them DocBook happy
sched: Fix fix_small_capacity
powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
powerpc: Enable asymmetric SMT scheduling on POWER7
sched: Add asymmetric group packing option for sibling domain
sched: Fix capacity calculations for SMT4
sched: Change nohz idle load balancing logic to push model
...

Linus Torvalds
2010-08-07 00:39:22 +0800
4aed2fd8e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c

Linus Torvalds
2010-08-07 00:30:52 +0800

05 Aug, 2010

1 commit

61be7fdec Merge branch 'perf/nmi' into perf/core ... Browse Code »

Conflicts:
kernel/Makefile

Merge reason: Add the now complete topic, fix the conflict.

Signed-off-by: Ingo Molnar

Ingo Molnar
2010-08-05 14:45:05 +0800

04 Aug, 2010

2 commits

5e7f5a178 timer: Added usleep_range timer ... Browse Code »

usleep_range is a finer precision implementations of msleep
and is designed to be a drop-in replacement for udelay where
a precise sleep / busy-wait is unnecessary.

Since an easy interface to hrtimers could lead to an undesired
proliferation of interrupts, we provide only a "range" API,
forcing the caller to think about an acceptable tolerance on
both ends and hopefully avoiding introducing another interrupt.

INTRO

As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for ~an order of magnitude greater than requested
is worth fixing). This patch adds a usleep API so that udelay does not have
to be used. Obviously not every udelay can be replaced (those in atomic
contexts or being used for simple bitbanging come to mind), but there are
many, many examples of

mydriver_write(...)
/* Wait for hardware to latch */
udelay(100)

in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision and people
are using a busy-wait loop instead.

CONCERNS FROM THE RFC

Why is udelay a problem / necessary? Most callers of udelay are in device/
driver initialization code, which is serial...

As I see it, there is only benefit to sleeping over a delay; the
notion of "refactoring" areas that use udelay was presented, but
I see usleep as the refactoring. Consider i2c, if the bus is busy,
you need to wait a bit (say 100us) before trying again, your
current options are:

* udelay(100)
* msleep(1) = | COUNT
1000 | 319
500 | 414
100 | 1146
20 | 1832

I am working on Android, so that is my focus for this. The following table
is a modified usleep that simply printk's the amount of time requested to
sleep; these tests were run on a kernel with udelay >= 20 --> usleep

"boot" is power-on to lock screen
"power collapse" is when the power button is pushed and the device suspends
"resume" is when the power button is pushed and the lock screen is displayed
(no touchscreen events or anything, just turning on the display)
"use device" is from the unlock swipe to clicking around a bit; there is no
sd card in this phone, so fail loading music, video, camera

ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
boot | 22 | 1250
power-collapse | 9 | 1200
resume | 5 | 500
use device | 59 | 7700

The most interesting category to me is the "use device" field; 7700us of
busy-wait time that could be put towards better responsiveness, or at the
least less power usage.

Signed-off-by: Patrick Pannuto
Cc: apw@canonical.com
Cc: corbet@lwn.net
Cc: arjan@linux.intel.com
Cc: Randy Dunlap
Cc: Andrew Morton
Signed-off-by: Thomas Gleixner

Patrick Pannuto
2010-08-04 17:00:45 +0800
e1b004c3e Revert "timer: Added usleep[_range] timer" ... Browse Code »

This reverts commit 22b8f15c2f7130bb0386f548428df2ffd4e81903 to merge
an advanced version.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2010-08-04 16:53:00 +0800

03 Aug, 2010

1 commit

8cadd2831 timer: add on-stack deferrable timer interfaces ... Browse Code »

In some cases (for instance with kernel threads) it may be desireable to
use on-stack deferrable timers to get their power saving benefits. Add
interfaces to support this for the IPS driver.

Signed-off-by: Jesse Barnes
Signed-off-by: Matthew Garrett

Jesse Barnes
2010-08-03 21:48:45 +0800