Eric Lee / smarc-fsl-linux-kernel

13 Jan, 2012

1 commit

028ee4be3 c/r: prctl: add PR_SET_MM codes to set up mm_struct entries

When we restore a task we need to set up text, data and data heap sizes
from userspace to the values a task had at checkpoint time. This patch
adds auxiliary prctl codes for that.

While most of them have a statistical nature (their values are involved
in the calculation of /proc/<pid>/statm output), the start_brk and brk values
are used to compute the allowed size of program data segment expansion.
This means arbitrary changes to these values might be a dangerous
operation. So, to restrict access, the following requirements apply to
prctl calls:

- The process has to have CAP_SYS_ADMIN capability granted.
- For all opcodes except start_brk/brk members an appropriate
  VMA area must exist and should fit certain VMA flags,
  such as:
  - code segment must be executable but not writable;
  - data segment must not be executable.

start_brk/brk values must not intersect with data segment and must not
exceed RLIMIT_DATA resource limit.

Still the main guard is CAP_SYS_ADMIN capability check.

Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
otherwise these prctl calls will return -EINVAL.
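
The start_brk/brk rule above can be illustrated with a small user-space model. This is only a sketch: the struct, the function name and the exact arithmetic (heap size plus data size compared against the limit) are assumptions made for illustration, not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the mm_struct fields involved. */
struct mm_layout {
    uint64_t start_data, end_data;  /* data segment bounds */
    uint64_t start_brk, brk;        /* current heap bounds */
};

#define NO_LIMIT UINT64_MAX         /* stands in for RLIM_INFINITY */

/*
 * Returns true if a proposed heap range satisfies the two stated rules:
 * it must not intersect the data segment, and heap plus data must fit
 * within the RLIMIT_DATA value (assumed accounting).
 */
bool brk_range_ok(const struct mm_layout *mm, uint64_t new_start_brk,
                  uint64_t new_brk, uint64_t rlimit_data)
{
    if (new_start_brk <= mm->end_data || new_brk < new_start_brk)
        return false;               /* overlaps data or is inverted */
    if (rlimit_data == NO_LIMIT)
        return true;

    uint64_t heap = new_brk - new_start_brk;
    uint64_t data = mm->end_data - mm->start_data;

    /* Written this way to avoid overflow in heap + data. */
    return heap <= rlimit_data && data <= rlimit_data - heap;
}
```

In the kernel these checks only run after the CAP_SYS_ADMIN check has passed, which the message calls the main guard.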

[akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
Signed-off-by: Cyrill Gorcunov
Reviewed-by: Kees Cook
Cc: Tejun Heo
Cc: Andrew Vagin
Cc: Serge Hallyn
Cc: Pavel Emelyanov
Cc: Vasiliy Kulikov
Cc: KAMEZAWA Hiroyuki
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2012-01-13 12:13:13 +0800

15 Dec, 2011

1 commit

648616343 [S390] cputime: add sparse checking and cleanup

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2011-12-15 21:56:19 +0800

07 Nov, 2011

1 commit

32aaeffbd Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...

Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h

Linus Torvalds
2011-11-07 11:44:47 +0800

03 Nov, 2011

1 commit

f1ecf0685 sysctl: add support for poll()

Adding support for poll() in sysctl fs allows userspace to receive
notifications of changes in sysctl entries. This adds an infrastructure to
allow files in sysctl fs to be pollable and implements it for hostname and
domainname.
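
From user space, waiting for such a notification might look like the sketch below. The helper name is hypothetical, and two points are assumptions rather than facts taken from this commit message: that a change is signalled through POLLPRI/POLLERR (as with other pollable kernel attribute files), and that reading the file first marks the current value as seen.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for a pollable sysctl file
 * (e.g. /proc/sys/kernel/hostname) to report a change.
 * Returns 1 if a change was reported, 0 on timeout, -1 on error.
 */
int wait_for_sysctl_change(const char *path, int timeout_ms)
{
    char buf[256];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Assumption: an initial read acknowledges the current value. */
    if (read(fd, buf, sizeof(buf)) < 0) {
        close(fd);
        return -1;
    }

    /* Assumption: changes are reported as POLLPRI and/or POLLERR. */
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
    int n = poll(&pfd, 1, timeout_ms);
    close(fd);

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}
```

A daemon would typically call this in a loop and re-read the file each time it returns 1.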

[akpm@linux-foundation.org: s/declare/define/ for definitions]
Signed-off-by: Lucas De Marchi
Cc: Greg KH
Cc: Kay Sievers
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2011-11-03 07:07:02 +0800

31 Oct, 2011

2 commits

74da1ff71 kernel: fix several implicit usages of kmod.h

These files were implicitly relying on <linux/kmod.h> coming in via
module.h, as without it we get things like:

kernel/power/suspend.c:100: error: implicit declaration of function ‘usermodehelper_disable’
kernel/power/suspend.c:109: error: implicit declaration of function ‘usermodehelper_enable’
kernel/power/user.c:254: error: implicit declaration of function ‘usermodehelper_disable’
kernel/power/user.c:261: error: implicit declaration of function ‘usermodehelper_enable’

kernel/sys.c:317: error: implicit declaration of function ‘usermodehelper_disable’
kernel/sys.c:1816: error: implicit declaration of function ‘call_usermodehelper_setup’
kernel/sys.c:1822: error: implicit declaration of function ‘call_usermodehelper_setfns’
kernel/sys.c:1824: error: implicit declaration of function ‘call_usermodehelper_exec’

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800
9984de1a5 kernel: Map most files to use export.h instead of module.h

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include <linux/module.h>
+#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

25 Oct, 2011

1 commit

1805b2f04 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net

David S. Miller
2011-10-25 06:18:09 +0800

17 Oct, 2011

1 commit

a84a79e4d Avoid using variable-length arrays in kernel/sys.c

The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).
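
As an illustration of the pattern (not the actual kernel/sys.c code), the sketch below formats a release string through a fixed-size stack buffer and clamps the copy to the caller's length, instead of declaring `char buf[len]`. The function name and the 65-byte size (the length of a utsname field on Linux) are choices made for this example.

```c
#include <stdio.h>
#include <string.h>

#define RELEASE_MAX 65  /* utsname fields are 65 bytes on Linux */

/*
 * Formats "2.6.<v><rest>" into out, which has room for len bytes.
 * Returns the number of characters written, excluding the NUL.
 */
size_t format_release(char *out, size_t len, unsigned v, const char *rest)
{
    char buf[RELEASE_MAX];      /* fixed size instead of char buf[len] */
    size_t cap = len < sizeof(buf) ? len : sizeof(buf);

    if (cap == 0)
        return 0;

    int n = snprintf(buf, cap, "2.6.%u%s", v, rest);
    if (n < 0)
        return 0;

    /* snprintf reports the untruncated length; clamp to what fits. */
    size_t written = (size_t)n < cap ? (size_t)n : cap - 1;
    memcpy(out, buf, written + 1);
    return written;
}
```

The stack frame now has a constant size, so the compiler does not need to emit dynamic stack adjustment code for the buffer.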

Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?). That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.

"Just don't do that, then".

Reported-by: Henrik Grindal Bakken
Cc: Benjamin Herrenschmidt
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-10-17 23:24:24 +0800

29 Sep, 2011

1 commit

f786ecba4 connector: add comm change event report to proc connector

Add an event to monitor comm value changes of tasks. Such an event
becomes vital if someone desires to control threads of a process in
different ways.

A natural characteristic of a thread is its comm value, and helpfully
application developers have an opportunity to change it at runtime.
Reporting such events via the proc connector allows fine-grained
monitoring and control; for instance, a process control daemon
listening to the proc connector and following comm value policies can place
specific threads into assigned cgroup partitions.
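
For context, this is roughly how an application changes a thread's comm value at runtime, which is the kind of change the new event reports. The helper name is hypothetical; `prctl(PR_SET_NAME)` and `prctl(PR_GET_NAME)` are the standard Linux calls, and the kernel truncates names to 15 characters plus the terminating NUL.

```c
#include <sys/prctl.h>

#define COMM_LEN 16  /* 15 characters plus NUL */

/*
 * Hypothetical helper: rename the calling thread, then read the name
 * back so the caller can see what the kernel actually stored.
 * Returns 0 on success, -1 on failure.
 */
int rename_current_thread(const char *name, char out[COMM_LEN])
{
    if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0) != 0)
        return -1;
    return 0;
}
```

A listener on the proc connector could then apply a policy keyed on the new name, for example moving every thread named "io-worker" into a dedicated cgroup.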

It might be possible to achieve a pale partial one-shot likeness without
this update, if an application changes comm value of a thread generator
task beforehand, then a new thread is cloned, and after that proc
connector listener gets the fork event and reads new thread's comm value
from procfs stat file, but this change visibly simplifies and extends the
matter.

Signed-off-by: Vladimir Zapolskiy
Acked-by: Evgeniy Polyakov
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Vladimir Zapolskiy
2011-09-29 01:41:50 +0800

26 Aug, 2011

1 commit

be27425dc Add a personality to report 2.6.x version numbers

I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kinds of management software from HP doesn't work, unless
we pretend to run a 2.6 kernel.

$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

$ hpacucli ctrl all show

Error: No controllers detected.

$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform, which in turn can break things that were checking
sys.platform == "linux2":

https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.
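
The version mapping can be sketched in user space as follows. This is an illustrative model only: the function name is hypothetical, and dropping the third numeric component while keeping any remaining suffix is an assumption about how the rewritten string is built.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Maps a "3.x[.y]<suffix>" release string to "2.6.(40+x)<suffix>".
 * Returns 0 on success, -1 if the input is not a 3.x release or the
 * result does not fit in out.
 */
int uname26_release(const char *release, char *out, size_t len)
{
    char *end;

    if (release[0] != '3' || release[1] != '.' ||
        !isdigit((unsigned char)release[2]))
        return -1;

    unsigned long minor = strtoul(release + 2, &end, 10);

    /* Skip an optional third numeric component such as ".0". */
    const char *rest = end;
    if (*rest == '.') {
        rest++;
        while (isdigit((unsigned char)*rest))
            rest++;
    }

    int n = snprintf(out, len, "2.6.%lu%s", minor + 40, rest);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With this model, the 3.0.0-08107-g97cd98f kernel from the uname output above would be reported as 2.6.40-08107-g97cd98f.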

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.

Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace).

To use:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program

Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds

Andi Kleen
2011-08-26 01:17:28 +0800

12 Aug, 2011

1 commit

72fa59970 move RLIMIT_NPROC check from set_user() to do_execve_common()

The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only. But the check created a new security threat: many poorly
written programs simply don't check the setuid() return code and believe it
cannot fail if executed with root privileges. So, the check is removed
in this patch because privilege escalations related to buggy programs
are too frequent.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes. Most daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag. With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested re-checking whether the NPROC limit is still
exceeded at the moment of execve(). If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter stepped down
under the limit, a deferred execve() failure because the NPROC limit was
exceeded days ago would be unexpected. If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.
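
The flag handling described above can be modelled in a few lines of user-space C. All names here are hypothetical and the model is deliberately simplified: it ignores any exemption for root, and the exact comparisons (>= when recording, > when re-checking) are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical model types; the kernel tracks this state elsewhere. */
struct user_model {
    unsigned long processes;    /* processes currently owned by the user */
    unsigned long nproc_limit;  /* RLIMIT_NPROC value */
};

struct task_model {
    bool nproc_exceeded;        /* stands in for PF_NPROC_EXCEEDED */
};

/* set_user(): never fails because of NPROC, it only records the state. */
void model_set_user(struct task_model *t, const struct user_model *u)
{
    t->nproc_exceeded = u->processes >= u->nproc_limit;
}

/*
 * execve(): fails only if the flag was set earlier AND the limit is
 * still exceeded now; otherwise it clears the flag and succeeds.
 */
int model_execve(struct task_model *t, const struct user_model *u)
{
    if (t->nproc_exceeded && u->processes > u->nproc_limit)
        return -1;              /* the kernel would return -EAGAIN */
    t->nproc_exceeded = false;
    return 0;
}
```

A daemon that ignores the setuid() return value is therefore no longer left running as root when the limit is hit; the failure surfaces at execve() instead.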

A similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris
Signed-off-by: Vasiliy Kulikov
Acked-by: NeilBrown
Signed-off-by: Linus Torvalds

Vasiliy Kulikov
2011-08-12 02:24:42 +0800

26 Jul, 2011

1 commit

c5f41752f notifiers: sys: move reboot notifiers into reboot.h

It is not necessary to share the same notifier.h.

This patch also moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.

[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: WANG Cong
Cc: David Miller
Cc: "Rafael J. Wysocki"
Cc: Greg KH
Signed-off-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Amerigo Wang
2011-07-26 11:57:14 +0800

20 May, 2011

1 commit

39ab05c8e Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6

* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (44 commits)
debugfs: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning
sysfs: remove "last sysfs file:" line from the oops messages
drivers/base/memory.c: fix warning due to "memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION"
memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION
SYSFS: Fix erroneous comments for sysfs_update_group().
driver core: remove the driver-model structures from the documentation
driver core: Add the device driver-model structures to kerneldoc
Translated Documentation/email-clients.txt
RAW driver: Remove call to kobject_put().
reboot: disable usermodehelper to prevent fs access
efivars: prevent oops on unload when efi is not enabled
Allow setting of number of raw devices as a module parameter
Introduce CONFIG_GOOGLE_FIRMWARE
driver: Google Memory Console
driver: Google EFI SMI
x86: Better comments for get_bios_ebda()
x86: get_bios_ebda_length()
misc: fix ti-st build issues
params.c: Use new strtobool function to process boolean inputs
debugfs: move to new strtobool
...

Fix up trivial conflicts in fs/debugfs/file.c due to the same patch
being applied twice, and an unrelated cleanup nearby.

Linus Torvalds
2011-05-20 09:24:11 +0800

12 May, 2011

1 commit

2e711c04d PM: Remove sysdev suspend, resume and shutdown operations

Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them. Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly. This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki
Acked-by: Greg Kroah-Hartman

Rafael J. Wysocki
2011-05-12 03:37:15 +0800

07 May, 2011

1 commit

b50fa7c80 reboot: disable usermodehelper to prevent fs access

In case CONFIG_UEVENT_HELPER_PATH is not set to "", which it
should be on every system, the kernel forks processes during
shutdown, which try to access the rootfs, even when the
binary does not exist. It causes exceptions and long delays in
the disk driver, which gets read requests at the time it tries
to shut down the disk.

This patch disables all kernel-forked processes during reboot to
allow a clean poweroff.

Cc: Tejun Heo
Tested-By: Anton Guda
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman

Kay Sievers
2011-05-07 08:52:32 +0800

24 Mar, 2011

2 commits

fc832ad36 userns: user namespaces: convert all capable checks in kernel/sys.c

This allows setuid/setgid in containers. It also fixes some corner cases
where kernel logic foregoes capability checks when uids are equivalent.
The latter will need to be done throughout the whole kernel.

Changelog:
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Fix logic errors in uid checks pointed out by Bastian.
Feb 15: allow prlimit to current (was regression in previous version)
Feb 23: remove debugging printks, uninline set_one_prio_perm and
make it bool, and document its return value.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:06 +0800
bb96a6f50 userns: allow sethostname in a container

Changelog:
Feb 23: let clone_uts_ns() handle setting uts->user_ns
To do so we need to pass in the task_struct who'll
get the utsname, so we can get its user_ns.
Feb 23: As per Oleg's comment, just pass in tsk, instead of two
of its members.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:03 +0800

15 Mar, 2011

1 commit

40dc166cb PM / Core: Introduce struct syscore_ops for core subsystems PM

Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled. The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose which is cumbersome and
inefficient. Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.

For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
struct syscore_ops objects.

Signed-off-by: Rafael J. Wysocki
Acked-by: Greg Kroah-Hartman

Rafael J. Wysocki
2011-03-15 07:43:46 +0800

31 Jan, 2011

1 commit

aa5bd67dc Fix prlimit64 for suid/sgid processes

Since check_prlimit_permission always fails in the case of SUID/SGID
processes, such processes are not able to read or set their own limits.
This commit changes this by assuming that a process can always read/change
its own limits.

Signed-off-by: Kacper Kornet
Acked-by: Jiri Slaby
Signed-off-by: Linus Torvalds

Kacper Kornet
2011-01-31 11:01:27 +0800

14 Jan, 2011

1 commit

04c6862c0 kmsg_dump: add kmsg_dump() calls to the reboot, halt, poweroff and emergency_restart paths

In support service, we need to know the reason why a system rebooted.
However, we can't inform our customers of the reason because the final
messages are lost on the current Linux kernel.

This patch improves the situation above because the final messages are
saved by adding kmsg_dump() to the reboot, halt, poweroff and
emergency_restart paths.

Signed-off-by: Seiji Aguchi
Cc: David Woodhouse
Cc: Marco Stornelli
Reviewed-by: Artem Bityutskiy
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Seiji Aguchi
2011-01-14 00:03:07 +0800

30 Nov, 2010

1 commit

5091faa44 sched: Add 'autogroup' scheduling feature: automated per session task groups

A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity. This patch
implements an idea from Linus, to automatically create task
groups. Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group. When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the previous task group is dropped. Child
processes inherit this task group thereafter, and increase its
refcount. When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.

At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.

Autogroup bandwidth is controllable via setting its nice level
through the proc filesystem:

cat /proc/<pid>/autogroup

Displays the task's group and the group's nice level.

echo <nice level> > /proc/<pid>/autogroup

Sets the task group's shares to the weight of nice <level> task.
Setting nice level is rate limited for !admin users due to the
abuse risk of task group locking.

The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:

echo [01] > /proc/sys/kernel/sched_autogroup_enabled

... which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Cc: Markus Trippelsdorf
Cc: Mathieu Desnoyers
Cc: Paul Turner
Cc: Oleg Nesterov
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: Ingo Molnar
LKML-Reference:
Signed-off-by: Ingo Molnar

Mike Galbraith
2010-11-30 23:03:35 +0800

01 Sep, 2010

1 commit

950eaaca6 pid: make setpgid() system call use RCU read-side critical section

[ 23.584719]
[ 23.584720] ===================================================
[ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 23.585176] ---------------------------------------------------
[ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
[ 23.585176]
[ 23.585176] other info that might help us debug this:
[ 23.585176]
[ 23.585176]
[ 23.585176] rcu_scheduler_active = 1, debug_locks = 1
[ 23.585176] 1 lock held by rc.sysinit/728:
[ 23.585176] #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193
[ 23.585176]
[ 23.585176] stack backtrace:
[ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
[ 23.585176] Call Trace:
[ 23.585176] [] lockdep_rcu_dereference+0x99/0xa2
[ 23.585176] [] find_task_by_pid_ns+0x50/0x6a
[ 23.585176] [] find_task_by_vpid+0x1d/0x1f
[ 23.585176] [] sys_setpgid+0x67/0x193
[ 23.585176] [] system_call_fastpath+0x16/0x1b
[ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

It turns out that the setpgid() system call fails to enter an RCU
read-side critical section before doing a PID-to-task_struct translation.
This commit therefore does rcu_read_lock() before the translation, and
also does rcu_read_unlock() after the last use of the returned pointer.

Reported-by: Andrew Morton
Signed-off-by: Paul E. McKenney
Acked-by: David Howells

Paul E. McKenney
2010-09-01 08:00:18 +0800

16 Jul, 2010

9 commits

c022a0aca rlimits: implement prlimit64 syscall

This patch adds the code to support the sys_prlimit64 syscall which
modifies-and-returns the rlim values of a selected process atomically.
The first parameter, pid, being 0 means current process.

Unlike the current implementation, it is a generic interface,
architecture independent so that we needn't handle compat stuff
anymore. In the future, after glibc starts to use this, we can deprecate
sys_setrlimit and sys_getrlimit and finally clean up the code.

It also adds a possibility of changing limits of other processes. We
check the user's permissions to do that and if it succeeds, the new
limits are propagated online. This is good for large scale
applications such as SAP or databases where administrators need to
change limits from time to time (e.g. increase core size on crashes),
and where restarting the service is unacceptable.

For safety, all rlim users now either use accessors or don't need
them due to
- locking
- the fact a process was just forked and nobody else knows about it
  yet (and nobody can thus read/write limits)
hence it is safe to modify limits now.

The limitation is that we currently stay at ulong internal
representation. So the rlim64_is_infinity check is used where value is
compared against ULONG_MAX on 32-bit which is the maximum value there.

And since internally the limits are held in struct rlimit, converters
which are used before and after do_prlimit call in sys_prlimit64 are
introduced.
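
A minimal user-space sketch of the call follows. It invokes the system call directly through syscall(2) rather than a C library wrapper; the struct and function names are hypothetical, and the two-field 64-bit layout is assumed to match the kernel's rlimit64 structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed to mirror the kernel's 64-bit limit pair. */
struct rlimit64_raw {
    uint64_t cur;   /* soft limit */
    uint64_t max;   /* hard limit */
};

/* Thin wrapper; pid 0 means the calling process, as stated above. */
long raw_prlimit64(pid_t pid, int resource,
                   const struct rlimit64_raw *new_lim,
                   struct rlimit64_raw *old_lim)
{
    return syscall(SYS_prlimit64, pid, resource, new_lim, old_lim);
}

/*
 * Lowers the soft core-size limit of the current process to 0 and
 * returns the previous limits through old_lim in the same call.
 */
int zero_core_soft_limit(struct rlimit64_raw *old_lim)
{
    struct rlimit64_raw cur;

    /* Passing NULL for the new limits makes this a pure read. */
    if (raw_prlimit64(0, RLIMIT_CORE, NULL, &cur) != 0)
        return -1;

    struct rlimit64_raw new_lim = { .cur = 0, .max = cur.max };
    return raw_prlimit64(0, RLIMIT_CORE, &new_lim, old_lim) == 0 ? 0 : -1;
}
```

Passing another process's pid instead of 0 is what lets an administrator adjust the limits of a running service, subject to the permission check mentioned above.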

Signed-off-by: Jiri Slaby

Jiri Slaby
2010-07-16 15:48:48 +0800
b95183453 rlimits: switch more rlimit syscalls to do_prlimit

After we added more generic do_prlimit, switch sys_getrlimit to that.
Also switch compat handling, so we can get rid of ugly __user casts
and avoid setting process' address limit to kernel data and back.

Signed-off-by: Jiri Slaby

Jiri Slaby
2010-07-16 15:48:48 +0800
5b41535aa rlimits: redo do_setrlimit to more generic do_prlimit

It now also allows reading of limits, i.e. all reads and writes will
later use this function.

It takes two parameters, new and old limits which can be both NULL.
If new is non-NULL, the value in it is set to rlimits.
If old is non-NULL, current rlimits are stored there.
If both are non-NULL, old are stored prior to setting the new ones,
atomically.
(Similar to sigaction.)
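
The new/old contract can be modelled in user space as below. This is a sketch, not the kernel function: the names are hypothetical, the locking that makes the exchange atomic is left out, and the two validity checks (soft limit above hard limit, raising the hard limit without privilege) are added here as assumptions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one rlimit slot of a task. */
struct limit {
    unsigned long cur;  /* soft limit */
    unsigned long max;  /* hard limit */
};

/*
 * Either pointer may be NULL. When both are given, the old value is
 * stored before the new one is applied, as with sigaction().
 * Returns 0 on success or a negative errno-style value.
 */
int model_do_prlimit(struct limit *slot, const struct limit *new_lim,
                     struct limit *old_lim, int privileged)
{
    if (new_lim) {
        if (new_lim->cur > new_lim->max)
            return -EINVAL;
        if (new_lim->max > slot->max && !privileged)
            return -EPERM;
    }

    /* In the kernel this exchange would happen under a lock. */
    if (old_lim)
        *old_lim = *slot;
    if (new_lim)
        *slot = *new_lim;
    return 0;
}
```

Because the read and the write share one entry point, a get-style call is just the case where the new limits are NULL, and a set-style call is the case where the old limits are NULL.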

Signed-off-by: Jiri Slaby

Jiri Slaby
2010-07-16 15:48:48 +0800
86f162f4c rlimits: do security check under task_lock

Do security_task_setrlimit under task_lock. Other tasks may change
limits under our hands while we are checking limits inside the
function. From now on, they can't.

Note that all the security work is done under a spinlock here now.
Security hooks count with that, they are called from interrupt context
(like security_task_kill) and with spinlocks already held (e.g.
capable->security_capable).

Signed-off-by: Jiri Slaby
Acked-by: James Morris
Cc: Heiko Carstens

Jiri Slaby
2010-07-16 15:48:47 +0800
1c1e618dd rlimits: allow setrlimit to non-current tasks

Add locking to allow setrlimit to accept a task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has sighand non-null. Do all the signal processing under
that lock still held.

There are some points:
1) security_task_setrlimit is now called with that lock held. This is
   not new, many security_* functions are called with this lock held
   already so it doesn't harm (all this security_* stuff does almost
   the same).
2) task->sighand->siglock (in update_rlimit_cpu) is nested in
   tasklist_lock. This dependence is already existing.
3) tsk->alloc_lock is nested in tasklist_lock. This is OK too, already
   existing dependence.

Signed-off-by: Jiri Slaby
Cc: Oleg Nesterov

Jiri Slaby
2010-07-16 15:48:47 +0800
7855c35da rlimits: split sys_setrlimit

Create do_setrlimit from sys_setrlimit and declare do_setrlimit
in the resource header. This is the first phase to have generic
do_prlimit which can be called from read, write and compat
rlimits code.

The new do_setrlimit also accepts a task pointer to change the limits
of. Currently, it cannot be other than current, but this will change
with locking later.

Also pass tsk->group_leader to security_task_setrlimit to check
whether current is allowed to change rlimits of the process and not
its arbitrary thread because it makes more sense given that rlimit are
per process and not per-thread.

Signed-off-by: Jiri Slaby

Jiri Slaby
2010-07-16 15:48:46 +0800
2fb9d2689 rlimits: make sure ->rlim_max never grows in sys_setrlimit

Mostly preparation for Jiri's changes, but probably makes sense anyway.

sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when
it takes task_lock() old_rlim->rlim_max can be already lowered. Move this
check under task_lock().

Currently this is not important, we can only race with our sub-thread,
this means the application is stupid. But when we change the code to allow
the update of !current task's limits, it becomes important to make sure
->rlim_max can be lowered "reliably" even if we race with the application
doing sys_setrlimit().

Signed-off-by: Oleg Nesterov
Signed-off-by: Jiri Slaby

Oleg Nesterov
2010-07-16 15:48:46 +0800
5ab46b345 rlimits: add task_struct to update_rlimit_cpu

Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby
Acked-by: James Morris

Jiri Slaby
2010-07-16 15:48:45 +0800
8fd00b4d7 rlimits: security, add task_struct to setrlimit

Add task_struct to task_setrlimit of security_operations to be able to set
rlimit of task other than current.

Signed-off-by: Jiri Slaby
Acked-by: Eric Paris
Acked-by: James Morris

Jiri Slaby
2010-07-16 15:48:45 +0800

28 May, 2010

1 commit

a06a4dc3a kmod: add init function to usermodehelper

About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
feature in the kernel works. We had reports of several races, including
some reports of apps bypassing our recursion check so that a process that
was forked as part of a core_pattern setup could infinitely crash and
refork until the system crashed.

We fixed those by improving our recursion checks. The new check basically
refuses to fork a process if its core limit is zero, which works well.

Unfortunately, I've been getting grief from maintainers of user space
programs that are inserted as the forked process of core_pattern. They
contend that in order for their programs (such as abrt and apport) to
work, all the running processes in a system must have their core limits
set to a non-zero value, to which I say 'yes'. I did this by design, and
think that's the right way to do things.

But I've been asked to ease this burden on user space enough times that I
thought I would take a look at it. The first suggestion was to make the
recursion check fail on a non-zero 'special' number, like one. That way
the core collector process could set its core size ulimit to 1, and enable
the kernel's recursion detection. This isn't a bad idea on the surface,
but I don't like it since it's opt-in, in that if a program like abrt or
apport has a bug and fails to set such a core limit, we're left with a
recursively crashing system again.

So I've come up with this. What I've done is modify the
call_usermodehelper api such that an extra parameter is added, a function
pointer which will be called by the user helper task, after it forks, but
before it exec's the required process. This will give the caller the
opportunity to get a call back in the processes context, allowing it to do
whatever it needs to do to the process in the kernel prior to exec-ing the
user space code. In the case of do_coredump, this callback is used to set
the core ulimit of the helper process to 1. This eliminates the opt-in
problem that I had above, as it allows the ulimit for core sizes to be set
to the value of 1, which is what the recursion check looks for in
do_coredump.

This patch:

Create new function call_usermodehelper_fns() and allow it to assign both
an init and cleanup function, as well as arbitrary data.

The init function is called from the context of the forked process and
allows for customization of the helper process prior to calling exec. Its
return code gates the continuation of the process, or causes its exit.
Also add an arbitrary data pointer to the subprocess_info struct allowing
for data to be passed from the caller to the new process, and the
subsequent cleanup process.

Also, use this patch to clean up the cleanup function. It currently takes
an argp and envp pointer for freeing, which is ugly. Let's instead just
make the subprocess_info structure public, and pass that to the cleanup
and init routines.
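
A user-space analogue of the init hook is sketched below: the callback runs in the child after fork() and before exec, and a non-zero return stops the child from exec-ing. All names are hypothetical and this is not the kernel API; unlike the kernel callback described above, the example init function only lowers the soft core limit so that it also works for unprivileged callers.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*helper_init_fn)(void *data);

/*
 * Forks, runs init(data) in the child, then execs argv[0].
 * Returns the child's exit status, or -1 on failure.
 * Exit status 126 means init refused, 127 means exec failed.
 */
int spawn_helper(char *const argv[], helper_init_fn init, void *data)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The init return code gates the continuation of the child. */
        if (init && init(data) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example init: cap the soft core limit at 1 (or lower if required). */
int limit_core_to_one(void *data)
{
    struct rlimit lim;
    (void)data;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max < 1 ? lim.rlim_max : 1;
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Example init that always refuses, to show the gating behaviour. */
int always_refuse(void *data)
{
    (void)data;
    return -1;
}
```

Because the limit is applied by the spawner rather than by the helper program itself, a buggy helper cannot forget to set it, which is the opt-in problem the message describes.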

Signed-off-by: Neil Horman
Reviewed-by: Oleg Nesterov
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Horman
2010-05-28 00:12:44 +0800

06 May, 2010

1 commit

0ffbe2699 Merge branch 'master' into next

James Morris
2010-05-06 08:56:07 +0800

25 Apr, 2010

1 commit

46da27664 kernel/sys.c: fix compat uname machine

On ppc64 you get this error:

$ setarch ppc -R true
setarch: ppc: Unrecognized architecture

because uname still reports ppc64 as the machine.

So mask off the personality flags when checking for PER_LINUX32.
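
The effect of masking can be shown with a short sketch. The constants are redefined locally under hypothetical SK_ names so the example is self-contained; their values are the ones Linux uses for PER_MASK, PER_LINUX32 and ADDR_NO_RANDOMIZE (the flag that `setarch -R` sets).

```c
#include <stdbool.h>

#define SK_PER_MASK          0x00ffUL     /* low byte: execution domain */
#define SK_PER_LINUX32       0x0008UL
#define SK_ADDR_NO_RANDOMIZE 0x0040000UL  /* a flag bit above the mask */

/* Comparing the whole value fails as soon as any flag bit is set. */
bool is_linux32_unmasked(unsigned long pers)
{
    return pers == SK_PER_LINUX32;
}

/* Masking off the flag bits first gives the intended answer. */
bool is_linux32(unsigned long pers)
{
    return (pers & SK_PER_MASK) == SK_PER_LINUX32;
}
```

With `pers = SK_PER_LINUX32 | SK_ADDR_NO_RANDOMIZE`, the unmasked comparison is false while the masked one is true, which corresponds to the `setarch ppc -R` failure shown above.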

Signed-off-by: Andreas Schwab
Reviewed-by: Christoph Hellwig
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andreas Schwab
2010-04-25 02:31:24 +0800

12 Apr, 2010

2 commits

06ad187e2 security: remove dead hook task_setgid

Unused hook. Remove.

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2010-04-12 10:19:17 +0800
43ed8c3b4 security: remove dead hook task_setuid

Unused hook. Remove.

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2010-04-12 10:19:16 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

13 Mar, 2010

2 commits

5cacdb4ad Add generic sys_olduname()

Add generic implementations of the old and really old uname system calls.
Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
not going to bother with another ifdef for that special case.

m32r implemented an old uname but never wired it up, so kill it, too.

Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800
e28cbf229 improve sys_newuname() for compat architectures

On an architecture that supports 32-bit compat we need to override the
reported machine in uname with the 32-bit value. Instead of doing this
separately in every architecture introduce a COMPAT_UTS_MACHINE define in
and apply it directly in sys_newuname().
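
The override can be sketched in userspace roughly as follows. Everything here
is illustrative: struct uts, report_machine and the "ppc" value chosen for
COMPAT_UTS_MACHINE are assumptions, and the real sys_newuname() copies the
result to user memory rather than into a local structure.

```c
#include <stdio.h>

#define PER_LINUX32 0x0008UL

/* Hypothetical value; each 64-bit architecture would supply its own string. */
#define COMPAT_UTS_MACHINE "ppc"

/* Hypothetical stand-in for the machine field of the utsname structure. */
struct uts {
    char machine[65];
};

/* Report the native machine name, overridden for a 32-bit personality. */
void report_machine(struct uts *out, const char *native, unsigned long persona)
{
    snprintf(out->machine, sizeof(out->machine), "%s", native);
#ifdef COMPAT_UTS_MACHINE
    /* Architectures without the define compile this branch away. */
    if (persona == PER_LINUX32)
        snprintf(out->machine, sizeof(out->machine), "%s", COMPAT_UTS_MACHINE);
#endif
}
```

Keeping the override behind a single define means an architecture opts in by
providing one string instead of carrying its own copy of the uname logic.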

Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800

07 Mar, 2010

1 commit

78d7d407b kernel core: use helpers for rlimits

Make sure the compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.
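
The idea of reading a limit exactly once can be sketched in userspace as
below. The macro follows the general shape of the kernel's ACCESS_ONCE of
that era and relies on the GNU __typeof__ extension; struct limit and
clamp_to_limit are hypothetical names used only for this illustration.

```c
/* Force a single load through a volatile-qualified pointer (GNU C extension). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a soft/hard resource limit pair. */
struct limit {
    unsigned long cur;
    unsigned long max;
};

/* Clamp a request against the soft limit using one snapshot of that limit. */
unsigned long clamp_to_limit(struct limit *lim, unsigned long request)
{
    unsigned long cur = ACCESS_ONCE(lim->cur);  /* one load, then reuse */

    return request > cur ? cur : request;
}
```

Because the comparison and the returned value both use the local snapshot,
a concurrent update of lim->cur cannot make the two disagree.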

Signed-off-by: Jiri Slaby
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-07 03:26:33 +0800