Eric Lee / smarc-fsl-linux-kernel

01 Mar, 2010

1 commit

f66ffdedb Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
sched: Fix SCHED_MC regression caused by change in sched cpu_power
sched: Don't use possibly stale sched_class
kthread, sched: Remove reference to kthread_create_on_cpu
sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu_counter: Make __percpu_counter_add an inline function on UP
sched: Remove member rt_se from struct rt_rq
sched: Change usage of rt_rq->rt_se to rt_rq->tg->rt_se[cpu]
sched: Remove unused update_shares_locked()
sched: Use for_each_bit
sched: Queue a deboosted task to the head of the RT prio queue
sched: Implement head queueing for sched_rt
sched: Extend enqueue_task to allow head queueing
sched: Remove USER_SCHED
sched: Fix the place where group powers are updated
sched: Assume *balance is valid
sched: Remove load_balance_newidle()
sched: Unify load_balance{,_newidle}()
sched: Add a lock break for PREEMPT=y
sched: Remove from fwd decls
sched: Remove rq_iterator from move_one_task
...

Fix up trivial conflicts in kernel/sched.c

Linus Torvalds
2010-03-01 02:31:01 +0800

23 Feb, 2010

1 commit

701188374 kernel/sys.c: fix missing rcu protection for sys_getpriority() ... Browse Code »

find_task_by_vpid() is not safe without rcu_read_lock(). 2.6.33-rc7 got
RCU protection for sys_setpriority() but missed it for sys_getpriority().

Signed-off-by: Tetsuo Handa
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Acked-by: Serge Hallyn
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2010-02-23 11:50:34 +0800

21 Jan, 2010

1 commit

7c9414385 sched: Remove USER_SCHED ... Browse Code »

Remove the USER_SCHED feature. It has been scheduled to be removed in
2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2

Signed-off-by: Dhaval Giani
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Dhaval Giani
2010-01-21 20:40:18 +0800

20 Dec, 2009

1 commit

10e5453ff Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sys: Fix missing rcu protection for __task_cred() access
signals: Fix more rcu assumptions
signal: Fix racy access to __task_cred in kill_pid_info_as_uid()

Linus Torvalds
2009-12-20 01:47:34 +0800

16 Dec, 2009

1 commit

dfc6a736d kernel/sys.c: fix "warning: do-while statement is not a compound statement" noise ... Browse Code »

do_each_thread/while_each_thread wrap a block of code that is in this format:

for (...)
do
...
while

If curly braces do not surround the inner loop the following warning is
generated by sparse:

warning: do-while statement is not a compound statement

Fix the warning by adding the braces.

Signed-off-by: H Hartley Sweeten
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

H Hartley Sweeten
2009-12-16 00:53:26 +0800

11 Dec, 2009

1 commit

d4581a239 sys: Fix missing rcu protection for __task_cred() access ... Browse Code »

commit c69e8d9 (CRED: Use RCU to access another task's creds and to
release a task's own creds) added non rcu_read_lock() protected access
to task creds of the target task in set_prio_one().

The comment above the function says:
* - the caller must hold the RCU read lock

The calling code in sys_setpriority does read_lock(&tasklist_lock) but
not rcu_read_lock(). This works only when CONFIG_TREE_PREEMPT_RCU=n.
With CONFIG_TREE_PREEMPT_RCU=y the rcu_callbacks can run in the tick
interrupt when they see no read side critical section.

There is another instance of __task_cred() in sys_setpriority() itself
which is equally unprotected.

Wrap the whole code section into a rcu read side critical section to
fix this quick and dirty.

Will be revisited in course of the read_lock(&tasklist_lock) -> rcu
crusade.

Oleg noted further:

This also fixes another bug here. find_task_by_vpid() is not safe
without rcu_read_lock(). I do not mean it is not safe to use the
result, just find_pid_ns() by itself is not safe.

Usually tasklist gives enough protection, but if copy_process() fails
it calls free_pid() lockless and does call_rcu(delayed_put_pid().
This means, without rcu lock find_pid_ns() can't scan the hash table
safely.

Signed-off-by: Thomas Gleixner
LKML-Reference:
Acked-by: Paul E. McKenney

Thomas Gleixner
2009-12-11 06:04:11 +0800

10 Dec, 2009

1 commit

3b8ecd224 Merge branch 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sys: Remove BKL from sys_reboot
pm_qos: clean up racy global "name" variable
pm_qos: remove BKL

Linus Torvalds
2009-12-10 00:07:17 +0800

03 Dec, 2009

1 commit

0cf55e1ec sched, cputime: Introduce thread_group_times() ... Browse Code »

This is a real fix for problem of utime/stime values decreasing
described in the thread:

http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

- {u,s}time in task_struct are increased every time when the thread
is interrupted by a tick (timer interrupt).

- When a thread exits, its {u,s}time are added to signal->{u,s}time,
after adjusted by task_times().

- When all threads in a thread_group exits, accumulated {u,s}time
(and also c{u,s}time) in signal struct are added to c{u,s}time
in signal struct of the group's parent.

So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.

And accounted values are used by:

- task_times(), to get cputime of a thread:
This function returns adjusted values that originates from raw
{u,s}time and scaled by sum_exec_runtime that accounted by CFS.

- thread_group_cputime(), to get cputime of a thread group:
This function returns sum of all {u,s}time of living threads in
the group, plus {u,s}time in the signal struct that is sum of
adjusted cputimes of all exited threads belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:

group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it for
every thread should be avoided.

This patch fixes the above problem in the following way:

- Modify thread's exit (= __exit_signal()) not to use task_times().
It means {u,s}time in signal struct accumulates raw values instead
of adjusted values. As the result it makes thread_group_cputime()
to return pure sum of "raw" values.

- Introduce a new function thread_group_times(*task, *utime, *stime)
that converts "raw" values of thread_group_cputime() to "adjusted"
values, in same calculation procedure as task_times().

- Modify group's exit (= wait_task_zombie()) to use this introduced
thread_group_times(). It make c{u,s}time in signal struct to
have adjusted values like before this patch.

- Replace some thread_group_cputime() by thread_group_times().
This replacements are only applied where conveys the "adjusted"
cputime to users, and where already uses task_times() near by it.
(i.e. sys_times(), getrusage(), and /proc//stat.)

This patch have a positive side effect:

- Before this patch, if a group contains many short-life threads
(e.g. runs 0.9ms and not interrupted by ticks), the group's
cputime could be invisible since thread's cputime was accumulated
after adjusted: imagine adjustment function as adj(ticks, runtime),
{adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
After this patch it will not happen because the adjustment is
applied after accumulated.

v2:
- remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Spencer Candland
Cc: Americo Wang
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Stanislaw Gruszka
LKML-Reference:
Signed-off-by: Ingo Molnar

Hidetoshi Seto
2009-12-03 00:32:40 +0800

26 Nov, 2009

1 commit

d180c5bcc sched: Introduce task_times() to replace task_{u,s}time() pair ... Browse Code »

Functions task_{u,s}time() are called in pair in almost all
cases. However task_stime() is implemented to call task_utime()
from its inside, so such paired calls run task_utime() twice.

It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime which can be obtained at same time by one set
of divisions.

This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in better, optimized
way.

Signed-off-by: Hidetoshi Seto
Acked-by: Peter Zijlstra
Cc: Stanislaw Gruszka
Cc: Spencer Candland
Cc: Oleg Nesterov
Cc: Balbir Singh
Cc: Americo Wang
LKML-Reference:
Signed-off-by: Ingo Molnar

Hidetoshi Seto
2009-11-26 19:59:19 +0800

29 Oct, 2009

2 commits

3242f9804 Merge branch 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
HWPOISON: fix invalid page count in printk output
HWPOISON: Allow schedule_on_each_cpu() from keventd
HWPOISON: fix/proc/meminfo alignment
HWPOISON: fix oops on ksm pages
HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
HWPOISON: return early on non-LRU pages
HWPOISON: Add brief hwpoison description to Documentation
HWPOISON: Clean up PR_MCE_KILL interface

Linus Torvalds
2009-10-29 23:20:00 +0800
0d0df599f connector: fix regression introduced by sid connector ... Browse Code »

Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([] 0x13fe04100)
[] sk_filter+0x9a/0xd0
[] netlink_broadcast+0x2c0/0x53c
[] cn_netlink_send+0x272/0x2b0
[] proc_sid_connector+0xc4/0xd4
[] __set_special_pids+0x58/0x90
[] sys_setsid+0xb4/0xd8
[] sysc_noemu+0x10/0x16
[] 0x41616cb266

The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger
Cc: Scott James Remnant
Cc: Matt Helsley
Cc: David S. Miller
Acked-by: Oleg Nesterov
Acked-by: Evgeniy Polyakov
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christian Borntraeger
2009-10-29 22:39:25 +0800

14 Oct, 2009

1 commit

6f15fa500 sys: Remove BKL from sys_reboot ... Browse Code »

Serialization of sys_reboot can be done local. The BKL is not
protecting anything else.

LKML-Reference:
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2009-10-14 21:31:10 +0800

04 Oct, 2009

1 commit

1087e9b4f HWPOISON: Clean up PR_MCE_KILL interface ... Browse Code »

While writing the manpage I noticed some shortcomings in the
current interface.

- Define symbolic names for all the different values
- Boundary check the kill mode values
- For symmetry add a get interface too. This allows library
code to get/set the current state.
- For consistency define a PR_MCE_KILL_DEFAULT value

Signed-off-by: Andi Kleen

Andi Kleen
2009-10-04 09:23:17 +0800

24 Sep, 2009

1 commit

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800

23 Sep, 2009

1 commit

1f10206cf getrusage: fill ru_maxrss value ... Browse Code »

Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
mark. This struct is filled as a parameter to getrusage syscall.
->ru_maxrss value is set to KBs which is the way it is done in BSD
systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
which seems to be incorrect behavior. Maintainer of this util was
notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm
struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.

test programs
========================================================

getrusage.c
===========
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "common.h"

#define err(str) perror(str), exit(1)

int main(int argc, char** argv)
{
int status;

printf("allocate 100MB\n");
consume(100);

printf("testcase1: fork inherit? \n");
printf(" expect: initial.self ~= child.self\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
show_rusage("fork child");
_exit(0);
}
printf("\n");

printf("testcase2: fork inherit? (cont.) \n");
printf(" expect: initial.children ~= 100MB, but child.children = 0\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
show_rusage("child");
_exit(0);
}
printf("\n");

printf("testcase3: fork + malloc \n");
printf(" expect: child.self ~= initial.self + 50MB\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
printf("allocate +50MB\n");
consume(50);
show_rusage("fork child");
_exit(0);
}
printf("\n");

printf("testcase4: grandchild maxrss\n");
printf(" expect: post_wait.children ~= 300MB\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
show_rusage("post_wait");
} else {
system("./child -n 0 -g 300");
_exit(0);
}
printf("\n");

printf("testcase5: zombie\n");
printf(" expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n");
printf(" post_wait ~= 400MB, IOW wait() collect child's max_rss. \n");
show_rusage("initial");
if (__fork()) {
sleep(1); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
system("./child -n 400");
_exit(0);
}
printf("\n");

printf("testcase6: SIG_IGN\n");
printf(" expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n");
show_rusage("initial");
signal(SIGCHLD, SIG_IGN);
if (__fork()) {
sleep(1); /* children become zombie */
show_rusage("after_zombie");
} else {
system("./child -n 500");
_exit(0);
}
printf("\n");
signal(SIGCHLD, SIG_DFL);

printf("testcase7: exec (without fork) \n");
printf(" expect: initial ~= exec \n");
show_rusage("initial");
execl("./child", "child", "-v", NULL);

return 0;
}

child.c
=======
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "common.h"

int main(int argc, char** argv)
{
int status;
int c;
long consume_size = 0;
long grandchild_consume_size = 0;
int show = 0;

while ((c = getopt(argc, argv, "n:g:v")) != -1) {
switch (c) {
case 'n':
consume_size = atol(optarg);
break;
case 'v':
show = 1;
break;
case 'g':

grandchild_consume_size = atol(optarg);
break;
default:
break;
}
}

if (show)
show_rusage("exec");

if (consume_size) {
printf("child alloc %ldMB\n", consume_size);
consume(consume_size);
}

if (grandchild_consume_size) {
if (fork()) {
wait(&status);
} else {
printf("grandchild alloc %ldMB\n", grandchild_consume_size);
consume(grandchild_consume_size);

exit(0);
}
}

return 0;
}

common.c
========
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "common.h"
#define err(str) perror(str), exit(1)

void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

/* Some buggy OS need this worthless CPU waste. */
void make_pagefault(void)
{
void *addr;
int size = getpagesize();
int i;

for (i=0; i
Signed-off-by: KOSAKI Motohiro
Cc: Oleg Nesterov
Cc: Hugh Dickins
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Pirko
2009-09-23 22:39:30 +0800

21 Sep, 2009

1 commit

cdd6c482c perf: Do the big rename: Performance Counters -> Performance Events ... Browse Code »

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)

sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian
Acked-by: Peter Zijlstra
Acked-by: Paul Mackerras
Reviewed-by: Arjan van de Ven
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-21 20:28:04 +0800

16 Sep, 2009

1 commit

4db96cf07 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process ... Browse Code »

This allows processes to override their early/late kill
behaviour on hardware memory errors.

Typically applications which are memory error aware is
better of with early kill (see the error as soon
as possible), all others with late kill (only
see the error when the error is really impacting execution)

There's a global sysctl, but this way an application
can set its specific policy.

We're using two bits, one to signify that the process
stated its intention and that

I also made the prctl future proof by enforcing
the unused arguments are 0.

The state is inherited to children.

Note this makes us officially run out of process flags
on 32bit, but the next patch can easily add another field.

Manpage patch will be supplied separately.

Signed-off-by: Andi Kleen

Andi Kleen
2009-09-16 17:50:14 +0800

17 Jun, 2009

1 commit

30639b6af groups: move code to kernel/groups.c ... Browse Code »

Move supplementary groups implementation to kernel/groups.c .
kernel/sys.c already accumulated quite a few random stuff.

Do strictly copy/paste + add required headers to compile. Compile-tested
on many configs and archs.

Signed-off-by: Alexey Dobriyan
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-17 10:47:48 +0800

29 Apr, 2009

1 commit

e7fd5d4b3 Merge branch 'linus' into perfcounters/core ... Browse Code »

Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
the latest upstream fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-04-29 20:47:05 +0800

14 Apr, 2009

1 commit

3d26dcf76 kernel/sys.c: clean up sys_shutdown exit path ... Browse Code »

Impact: cleanup, fix

Clean up sys_shutdown() exit path. Factor out common code. Return
correct error code instead of always 0 on failure.

Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2009-04-14 06:04:32 +0800

06 Apr, 2009

1 commit

f541ae326 Merge branch 'linus' into perfcounters/core-v2 ... Browse Code »

Merge reason: we have gathered quite a few conflicts, need to merge upstream

Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-04-06 15:02:57 +0800

03 Apr, 2009

2 commits

8fe74cf05 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()

Linus Torvalds
2009-04-03 12:09:10 +0800
1b0f7ffd0 pids: kill signal_struct-> __pgrp/__session and friends ... Browse Code »

We are wasting 2 words in signal_struct without any reason to implement
task_pgrp_nr() and task_session_nr().

task_session_nr() has no callers since
2e2ba22ea4fd4bb85f0fa37c521066db6775cbef, we can remove it.

task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
fs/coda.

This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
__pgrp/__session and the related helpers.

The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
anyway.

Signed-off-by: Oleg Nesterov
Acked-by: Alan Cox [tty parts]
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Eric Biederman
Cc: Pavel Emelyanov
Cc: Serge Hallyn
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:02 +0800

01 Apr, 2009

1 commit

5ad4e53bd Get rid of indirect include of fs_struct.h ... Browse Code »

Don't pull it in sched.h; very few files actually need it and those
can include directly. sched.h itself only needs forward declaration
of struct fs_struct;

Signed-off-by: Al Viro

Al Viro
2009-04-01 11:00:27 +0800

04 Mar, 2009

1 commit

8163d88c7 Merge commit 'v2.6.29-rc7' into perfcounters/core ... Browse Code »

Conflicts:
arch/x86/mm/iomap_32.c

Ingo Molnar
2009-03-04 18:42:31 +0800

27 Feb, 2009

1 commit

54e991242 sched: don't allow setuid to succeed if the user does not have rt bandwidth ... Browse Code »

Impact: fix hung task with certain (non-default) rt-limit settings

Corey Hickey reported that on using setuid to change the uid of a
rt process, the process would be unkillable and not be running.
This is because there was no rt runtime for that user group. Add
in a check to see if a user can attach an rt task to its task group.
On failure, return EINVAL, which is also returned in
CONFIG_CGROUP_SCHED.

Reported-by: Corey Hickey
Signed-off-by: Dhaval Giani
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Dhaval Giani
2009-02-27 18:11:53 +0800

11 Feb, 2009

1 commit

95fd4845e Merge commit 'v2.6.29-rc4' into perfcounters/core ... Browse Code »

Conflicts:
arch/x86/kernel/setup_percpu.c
arch/x86/mm/fault.c
drivers/acpi/processor_idle.c
kernel/irq/handle.c

Ingo Molnar
2009-02-11 16:22:04 +0800

06 Feb, 2009

1 commit

60fd760fb revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" ... Browse Code »

Revert commit 0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f because it causes
(arguably poorly designed) existing userspace to spend interminable
periods closing billions of not-open file descriptors.

We could bring this back, with some sort of opt-in tunable in /proc, which
defaults to "off".

Peter's alanysis follows:

: I spent several hours trying to get to the bottom of a serious
: performance issue that appeared on one of our servers after upgrading to
: 2.6.28. In the end it's what could be considered a userspace bug that
: was triggered by a change in 2.6.28. Since this might also affect other
: people I figured I'd at least document what I found here, and maybe we
: can even do something about it:
:
:
: So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
: the team maintaining our ftp archive complained that one of their
: scripts that previously ran in a few minutes still hadn't even come
: close to being done after an hour or so. Downgrading to 2.6.27 fixed
: that.
:
: Turns out that script is forking a lot and something in it or python or
: whereever closes all the file descriptors it doesn't want to pass on.
: That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
: closes them all with a few exceptions.
:
: Turns out that takes a long time when your limit -n is now 2^20 (1048576).
:
: With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
: now a thousand times that.
:
: 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
: RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
: allows, as the title implies, to set the limit for number of files to
: infinity.
:
: Closer investigation showed that the broken default ulimit did not apply
: to "system" processes (like stuff started from init). In the end I
: could establish that all processes that passed through pam_limit at one
: point had the bad resource limit.
:
: Apparently the pam library in Debian etch (4.0) initializes the limits
: to some default values when it doesn't have any settings in limit.conf
: to override them. Turns out that for nofiles this is RLIM_INFINITY.
: Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
: package version 0.79-5 fixes that - tho I'm not sure what side effects
: that has.
:
: Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
: uses a different pam (version).

Reported-by: Peter Palfrader
Cc: Adam Tkac
Cc: Michael Kerrisk
Cc: [2.6.28.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2009-02-06 04:56:47 +0800

21 Jan, 2009

1 commit

77835492e Merge commit 'v2.6.29-rc2' into perfcounters/core ... Browse Code »

Conflicts:
include/linux/syscalls.h

Ingo Molnar
2009-01-21 23:37:27 +0800

14 Jan, 2009

9 commits

836f92adf [CVE-2009-0029] System call wrappers part 31 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:31 +0800
c4ea37c26 [CVE-2009-0029] System call wrappers part 26 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:29 +0800
e48fbb699 [CVE-2009-0029] System call wrappers part 24 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:28 +0800
5a8a82b1d [CVE-2009-0029] System call wrappers part 23 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:28 +0800
754fe8d29 [CVE-2009-0029] System call wrappers part 07 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:20 +0800
b290ebe2c [CVE-2009-0029] System call wrappers part 04 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:19 +0800
ae1251ab7 [CVE-2009-0029] System call wrappers part 03 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:19 +0800
dbf040d9d [CVE-2009-0029] System call wrappers part 02 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:19 +0800
58fd3aa28 [CVE-2009-0029] System call wrappers part 01 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:18 +0800

11 Jan, 2009

1 commit

506c10f26 Merge commit 'v2.6.29-rc1' into perfcounters/core ... Browse Code »

Conflicts:
include/linux/kernel_stat.h

Ingo Molnar
2009-01-11 09:42:53 +0800

07 Jan, 2009

1 commit

cfa97f993 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix section mismatch
sched: fix double kfree in failure path
sched: clean up arch_reinit_sched_domains()
sched: mark sched_create_sysfs_power_savings_entries() as __init
getrusage: RUSAGE_THREAD should return ru_utime and ru_stime
sched: fix sched_slice()
sched_clock: prevent scd->clock from moving backwards, take #2
sched: sched.c declare variables before they get used

Linus Torvalds
2009-01-07 09:10:33 +0800