Doug / smarc-fsl-linux-kernel | Embedian Git Server

20 Jul, 2007

4 commits

f20786ff4 lockstat: core infrastructure ... Browse Code »

Introduce the core lock statistics code.

Lock statistics provides lock wait-time and hold-time (as well as the count
of corresponding contention and acquisitions events). Also, the first few
call-sites that encounter contention are tracked.

Lock wait-time is the time spent waiting on the lock. This provides insight
into the locking scheme, that is, a heavily contended lock is indicative of
a too coarse locking scheme.

Lock hold-time is the duration the lock was held, this provides a reference for
the wait-time numbers, so they can be put into perspective.

1)
lock
2)
... do stuff ..
unlock
3)

The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
hold-time.

The lockdep held-lock tracking code is reused, because it already collects locks
into meaningful groups (classes), and because it is an existing infrastructure
for lock instrumentation.

Currently lockdep tracks lock acquisition with two hooks:

lock()
lock_acquire()
_lock()

... code protected by lock ...

unlock()
lock_release()
_unlock()

We need to extend this with two more hooks, in order to measure contention.

lock_contended() - used to measure contention events
lock_acquired() - completion of the contention

These are then placed the following way:

lock()
lock_acquire()
if (!_try_lock())
lock_contended()
_lock()
lock_acquired()

... do locked stuff ...

unlock()
lock_release()
_unlock()

(Note: the try_lock() 'trick' is used to avoid instrumenting all platform
dependent lock primitive implementations.)

It is also possible to toggle the two lockdep features at runtime using:

/proc/sys/kernel/prove_locking
/proc/sys/kernel/lock_stat

(esp. turning off the O(n^2) prove_locking functionaliy can help)

[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: nuke unneeded ifdefs]
Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Jason Baron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-07-20 01:04:49 +0800
76fdbb25f coredump masking: bound suid_dumpable sysctl ... Browse Code »

This patch series is version 5 of the core dump masking feature, which
controls which VMAs should be dumped based on their memory types and
per-process flags.

I adopted most of Andrew's suggestion at the previous version. He also
suggested using system call instead of /proc// interface, I decided to
use the latter continuously because adding new system call with pid argument
will give a big impact on the kernel.

You can access the per-process flags via /proc//coredump_filter
interface. coredump_filter represents a bitmask of memory types, and if a bit
is set, VMAs of corresponding memory type are written into a core file when
the process is dumped. The bitmask is inherited from the parent process when
a process is created.

The original purpose is to avoid longtime system slowdown when a number of
processes which share a huge shared memory are dumped at the same time. To
achieve this purpose, this patch series adds an ability to suppress dumping
anonymous shared memory for specified processes. In this version, three other
memory types are also supported.

Here are the coredump_filter bits:
bit 0: anonymous private memory
bit 1: anonymous shared memory
bit 2: file-backed private memory
bit 3: file-backed shared memory

The default value of coredump_filter is 0x3. This means the new core dump
routine has the same behavior as conventional behavior by default.

In this version, coredump_filter bits and mm.dumpable are merged into
mm.flags, and it is accessed by atomic bitops.

The supported core file formats are ELF and ELF-FDPIC. ELF has been tested,
but ELF-FDPIC has not been built and tested because I don't have the test
environment.

This patch limits a value of suid_dumpable sysctl to the range of 0 to 2.

Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kawai, Hidehiro
2007-07-20 01:04:46 +0800
bdf4c48af audit: rework execve audit ... Browse Code »

The purpose of audit_bprm() is to log the argv array to a userspace daemon at
the end of the execve system call. Since user-space hasn't had time to run,
this array is still in pristine state on the process' stack; so no need to
copy it, we can just grab it from there.

In order to minimize the damage to audit_log_*() copy each string into a
temporary kernel buffer first.

Currently the audit code requires that the full argument vector fits in a
single packet. So currently it does clip the argv size to a (sysctl) limit,
but only when execve auditing is enabled.

If the audit protocol gets extended to allow for multiple packets this check
can be removed.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ollie Wild
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-07-20 01:04:45 +0800
77afcf78a PM: Integrate beeping flag with existing acpi_sleep flags ... Browse Code »

Move "debug during resume from s2ram" into the variable we already use
for real-mode flags to simplify code. It also closes nasty trap for
the user in acpi_sleep_setup; order of parameters actually mattered there,
acpi_sleep=s3_bios,s3_mode doing something different from
acpi_sleep=s3_mode,s3_bios.

Signed-off-by: Pavel Machek
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Machek
2007-07-20 01:04:43 +0800

18 Jul, 2007

3 commits

10a0a8d4e Add common orderly_poweroff() ... Browse Code »

Various pieces of code around the kernel want to be able to trigger an
orderly poweroff. This pulls them together into a single
implementation.

By default the poweroff command is /sbin/poweroff, but it can be set
via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
can include command-line arguments.

This patch replaces four other instances of invoking either "poweroff"
or "shutdown -h now": two sbus drivers, and acpi thermal
management.

sparc64 has its own "powerd"; still need to determine whether it should
be replaced by orderly_poweroff().

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Len Brown
Signed-off-by: Chris Wright
Cc: Andrew Morton
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Al Viro
Cc: Arnd Bergmann
Cc: David S. Miller

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
62239ac2b proper prototype for proc_nr_files() ... Browse Code »

Add a proper prototype for proc_nr_files() in include/linux/fs.h

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-07-18 01:23:03 +0800
396faf030 Allow huge page allocations to use GFP_HIGH_MOVABLE ... Browse Code »

Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage pool
at runtime depending on the size of ZONE_MOVABLE.

This patch adds a new sysctl called hugepages_treat_as_movable. When a
non-zero value is written to it, future allocations for the huge page pool
will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.

[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2007-07-18 01:22:59 +0800

17 Jul, 2007

4 commits

7144521f5 Remove duplicate comments from sysctl.c ... Browse Code »

Randy Dunlap noticed that the recent comment clarifications from Andrew
had somehow gotten duplicated. Quoth Andrew: "hm, that could have been
some late-night reject-fixing."

Fix it up.

Cc: From: Andrew Morton
Cc: Randy Dunlap
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-07-17 02:50:38 +0800
2be7fe075 sysctl.c: add text telling people to use CTL_UNNUMBERED ... Browse Code »

Hopefully this will help people to understand the new regime.

Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:49 +0800
45807a1df vdso: print fatal signals ... Browse Code »

Add the print-fatal-signals=1 boot option and the
/proc/sys/kernel/print-fatal-signals runtime switch.

This feature prints some minimal information about userspace segfaults to
the kernel console. This is useful to find early bootup bugs where
userspace debugging is very hard.

Defaults to off.

[akpm@linux-foundation.org: Don't add new sysctl numbers]
Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2007-07-17 00:05:43 +0800
f0c0b2b80 change zonelist order: zonelist order selection logic ... Browse Code »

Make zonelist creation policy selectable from sysctl/boot option v6.

This patch makes NUMA's zonelist (of pgdat) order selectable.
Available order are Default(automatic)/ Node-based / Zone-based.

[Default Order]
The kernel selects Node-based or Zone-based order automatically.

[Node-based Order]
This policy treats the locality of memory as the most important parameter.
Zonelist order is created by each zone's locality. This means lower zones
(ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.
IOW. ZONE_DMA will be in the middle of zonelist.
current 2.6.21 kernel uses this.

Pros.
* A user can expect local memory as much as possible.
Cons.
* lower zone will be exhansted before higher zone. This may cause OOM_KILL.

Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL
because of ZONE_DMA exhaution and you need the best locality.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.

*node(1)'s memory allocation order:

node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.

[Zone-based order]
This policy treats the zone type as the most important parameter.
Zonelist order is created by zone-type order. This means lower zone
never be used bofere higher zone exhaustion.
IOW. ZONE_DMA will be always at the tail of zonelist.

Pros.
* OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.
Cons.
* memory locality may not be best.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.

*node(1)'s memory allocation order:

node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.

bootoption "numa_zonelist_order=" and proc/sysctl is supporetd.

command:
%echo N > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Node-based order.

command:
%echo Z > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Zone-based order.

Thanks to Lee Schermerhorn, he gives me much help and codes.

[Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
[akpm@linux-foundation.org: build fix]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Christoph Lameter
Cc: Andi Kleen
Cc: "jesse.barnes@intel.com"
Signed-off-by: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-07-17 00:05:35 +0800

12 Jul, 2007

1 commit

ed0321895 security: Protection for exploiting null dereference using mmap ... Browse Code »

Add a new security check on mmap operations to see if the user is attempting
to mmap to low area of the address space. The amount of space protected is
indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
0, preserving existing behavior.

This patch uses a new SELinux security class "memprotect." Policy already
contains a number of allow rules like a_t self:process * (unconfined_t being
one of them) which mean that putting this check in the process class (its
best current fit) would make it useless as all user processes, which we also
want to protect against, would be allowed. By taking the memprotect name of
the new class it will also make it possible for us to move some of the other
memory protect permissions out of 'process' and into the new class next time
we bump the policy version number (which I also think is a good future idea)

Acked-by: Stephen Smalley
Acked-by: Chris Wright
Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2007-07-12 10:52:29 +0800

10 Jul, 2007

1 commit

77e54a1f8 sched: add CFS debug sysctls ... Browse Code »

add CFS debug sysctls: only tweakable if SCHED_DEBUG is enabled.
This allows for faster debugging of scheduler problems.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-07-10 00:52:00 +0800

17 May, 2007

1 commit

71ce92f3f make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size ... Browse Code »

Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
filename size and change it to 128, so that extensive patterns such as
'/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
generation.

Signed-off-by: Dan Aloni
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Aloni
2007-05-17 20:23:05 +0800

10 May, 2007

1 commit

77461ab33 Make vm statistics update interval configurable ... Browse Code »

Make it configurable. Code in mm makes the vm statistics intervals
independent from the cache reaper use that opportunity to make it
configurable.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-10 03:30:56 +0800

09 May, 2007

1 commit

5096add84 proc: maps protection ... Browse Code »

The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
information about the memory location and usage of processes. Issues:

- maps should not be world-readable, especially if programs expect any
kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
check the maps when %n is in a *printf call, and a setuid(getuid())
process wouldn't be able to read its own maps file. (For reference
see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of
non-root applications that depend on access to the maps contents.

This change implements a check using "ptrace_may_attach" before allowing
access to read the maps contents. To control this protection, the new knob
/proc/sys/kernel/maps_protect has been added, with corresponding updates to
the procfs documentation.

[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: New sysctl numbers are old hat]
Signed-off-by: Kees Cook
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2007-05-09 02:15:02 +0800

24 Apr, 2007

1 commit

91fcd412e Allow reading tainted flag as user ... Browse Code »

The commit 34f5a39899f3f3e815da64f48ddb72942d86c366 restricted reading
of the tainted value. The attached patch changes this back to a
write-only check and restores the read behaviour of older versions.

Signed-off-by: Bastian Blank
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bastian Blank
2007-04-24 23:23:08 +0800

05 Mar, 2007

1 commit

5c36e6578 sysctl: Support vdso_enabled sysctl on SH. ... Browse Code »

All of the logic for this was already in place, we just hadn't wired it
up in the sysctl table.

Signed-off-by: Paul Mundt

Paul Mundt
2007-03-05 13:13:26 +0800

02 Mar, 2007

1 commit

93a6fefe2 [PATCH] fix the SYSCTL=n compilation ... Browse Code »

/home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/kernel/sysctl.c:1411: error: conflicting types for 'register_sysctl_table'
/home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/include/linux/sysctl.h:1042: error: previous declaration of 'register_sysctl_table' was here
make[2]: *** [kernel/sysctl.o] Error 1

Caused by commit 0b4d414714f0d2f922d39424b0c5c82ad900a381.

Signed-off-by: Adrian Bunk
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-03-02 06:53:37 +0800

15 Feb, 2007

11 commits

d912b0cc1 [PATCH] sysctl: add a parent entry to ctl_table and set the parent entry ... Browse Code »

Add a parent entry into the ctl_table so you can walk the list of parents and
find the entire path to a ctl_table entry.

Signed-off-by: Eric W. Biederman
Cc: Stephen Smalley
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:10:00 +0800
77b14db50 [PATCH] sysctl: reimplement the sysctl proc support ... Browse Code »

With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:10:00 +0800
1ff007eb8 [PATCH] sysctl: allow sysctl_perm to be called from outside of sysctl.c ... Browse Code »

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
805b5d5e0 [PATCH] sysctl: factor out sysctl_head_next from do_sysctl ... Browse Code »

The current logic to walk through the list of sysctl table headers is slightly
painful and implement in a way it cannot be used by code outside sysctl.c

I am in the process of implementing a version of the sysctl proc support that
instead of using the proc generic non-caching monster, just uses the existing
sysctl data structure as backing store for building the dcache entries and for
doing directory reads. To use the existing data structures however I need a
way to get at them.

[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
0b4d41471 [PATCH] sysctl: remove insert_at_head from register_sysctl ... Browse Code »

The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman
Acked-by: Ralf Baechle
Acked-by: Martin Schwidefsky
Cc: Russell King
Cc: David Howells
Cc: "Luck, Tony"
Cc: Ralf Baechle
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Andi Kleen
Cc: Jens Axboe
Cc: Corey Minyard
Cc: Neil Brown
Cc: "John W. Linville"
Cc: James Bottomley
Cc: Jan Kara
Cc: Trond Myklebust
Cc: Mark Fasheh
Cc: David Chinner
Cc: "David S. Miller"
Cc: Patrick McHardy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
ae8368102 [PATCH] sysctl: remove support for directory strategy routines ... Browse Code »

parse_table has support for calling a strategy routine when descending into a
directory. To date no one has used this functionality and the /proc/sys
interface has no analog to it.

So no one is using this functionality kill it and make the binary sysctl code
easier to follow.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
6703ddfcc [PATCH] sysctl: remove support for CTL_ANY ... Browse Code »

There are currently no users in the kernel for CTL_ANY and it only has effect
on the binary interface which is practically unused.

So this complicates sysctl lookups for no good reason so just remove it.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
2abc26fc6 [PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry ... Browse Code »

binfmt_misc has a mount point in the middle of the sysctl and that mount point
is created as a proc_generic directory.

Doing it that way gets in the way of cleaning up the sysctl proc support as it
continues the existence of a horrible hack. So instead simply create the
directory as an ordinary sysctl directory. At least that removes the magic
special case.

[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
a5494dcd8 [PATCH] sysctl: move SYSV IPC sysctls to their own file ... Browse Code »

This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the ipc logic to together it makes
the code a little more readable.

[gcoady.lk@gmail.com: build fix]
Signed-off-by: Eric W. Biederman
Cc: Serge E. Hallyn
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Signed-off-by: Grant Coady
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:59 +0800
39732acd9 [PATCH] sysctl: move utsname sysctls to their own file ... Browse Code »

This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the utsname logic to together it
makes the code a little more readable.

Signed-off-by: Eric W. Biederman
Cc: Serge E. Hallyn
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:58 +0800
b04c3afb2 [PATCH] sysctl: move init_irq_proc into init/main where it belongs ... Browse Code »

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-02-15 00:09:58 +0800

12 Feb, 2007

5 commits

8d0608771 [PATCH] _proc_do_string(): fix short reads ... Browse Code »

If you try to read things like /proc/sys/kernel/osrelease with single-byte
reads, you get just one byte and then EOF. This is because _proc_do_string()
assumes that the caller is read()ing into a buffer which is large enough to
fit the whole string in a single hit.

Fix.

Cc: "Eric W. Biederman"
Cc: Michael Tokarev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2007-02-12 03:18:07 +0800
cb799b898 [PATCH] sysctl warning fix ... Browse Code »

kernel/sysctl.c:2816: warning: 'sysctl_ipc_data' defined but not used

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-02-12 02:51:31 +0800
34f5a3989 [PATCH] Add TAINT_USER and ability to set taint flags from userspace ... Browse Code »

Allow taint flags to be set from userspace by writing to
/proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used
when userspace has potentially done something dangerous that might
compromise the kernel. This will allow support personnel to ask further
questions about what may have caused the user taint flag to have been set.

For example, they might examine the logs of the realtime JVM to see if the
Java program has used the really silly, stupid, dangerous, and
completely-non-portable direct access to physical memory feature which MUST
be implemented according to the Real-Time Specification for Java (RTSJ).
Sigh. What were those silly people at Sun thinking?

[akpm@osdl.org: build fix]
[bunk@stusta.de: cleanup]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Theodore Ts'o
2007-02-12 02:51:29 +0800
3ee75ac3c [PATCH] sysctl_{,ms_}jiffies: fix oldlen semantics ... Browse Code »

currently it's
1) if *oldlenp == 0,
don't writeback anything

2) if *oldlenp >= table->maxlen,
don't writeback more than table->maxlen bytes and rewrite *oldlenp
don't look at underlying type granularity

3) if 0 < *oldlenp < table->maxlen,
*cough*
string sysctls don't writeback more than *oldlenp bytes.
OK, that's because sizeof(char) == 1

int sysctls writeback anything in (0, table->maxlen] range
Though accept integers divisible by sizeof(int) for writing.

sysctl_jiffies and sysctl_ms_jiffies don't writeback anything but
sizeof(int), which violates 1) and 2).

So, make sysctl_jiffies and sysctl_ms_jiffies accept
a) *oldlenp == 0, not doing writeback
b) *oldlenp >= sizeof(int), writing one integer.

-EINVAL still returned for *oldlenp == 1, 2, 3.

Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-02-12 02:51:24 +0800
6ff1b4426 [PATCH] make reading /proc/sys/kernel/cap-bould not require CAP_SYS_MODULE ... Browse Code »

Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE. (see
proc_dointvec_bset in kernel/sysctl.c)

sysctl appears to drive all over proc reading everything it can get it's
hands on and is complaining when it is being denied access to read
cap-bound. Clearly writing to cap-bound should be a sensitive operation
but requiring CAP_SYS_MODULE to read cap-bound seems a bit to strong. I
believe the information could with reasonable certainty be obtained by
looking at a bunch of the output of /proc/pid/status which has very low
security protection, so at best we are just getting a little obfuscation of
information.

Currently SELinux policy has to 'dontaudit' capability checks for
CAP_SYS_MODULE for things like sysctl which just want to read cap-bound.
In doing so we also as a byproduct have to hide warnings of potential
exploits such as if at some time that sysctl actually tried to load a
module. I wondered if anyone would have a problem opening cap-bound up to
read from anyone?

Acked-by: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Paris
2007-02-12 02:51:19 +0800

14 Dec, 2006

1 commit

5d6f647fc [PATCH] debug: add sysrq_always_enabled boot option ... Browse Code »

Most distributions enable sysrq support but set it to 0 by default. Add a
sysrq_always_enabled boot option to always-enable sysrq keys. Useful for
debugging - without having to modify the disribution's config files (which
might not be possible if the kernel is on a live CD, etc.).

Also, while at it, clean up the sysrq interfaces.

[bunk@stusta.de: make sysrq_always_enabled_setup() static]
Signed-off-by: Ingo Molnar
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-12-14 01:05:50 +0800

11 Dec, 2006

3 commits

1f29bcd73 [PATCH] sysctl: remove unused "context" param ... Browse Code »

Signed-off-by: Alexey Dobriyan
Cc: Andi Kleen
Cc: "David S. Miller"
Cc: David Howells
Cc: Ralf Baechle
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-12-11 01:55:41 +0800
98d7340c3 [PATCH] sysctl: remove some OPs ... Browse Code »

kernel.cap-bound uses only OP_SET and OP_AND

Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-12-11 01:55:40 +0800
d53ef07ab [PATCH] ipc-procfs-sysctl mixups ... Browse Code »

When CONFIG_PROC_FS=n and CONFIG_PROC_SYSCTL=n but CONFIG_SYSVIPC=y, we get
this build error:

kernel/built-in.o:(.data+0xc38): undefined reference to `proc_ipc_doulongvec_minmax'
kernel/built-in.o:(.data+0xc88): undefined reference to `proc_ipc_doulongvec_minmax'
kernel/built-in.o:(.data+0xcd8): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xd28): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xd78): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xdc8): undefined reference to `proc_ipc_dointvec'
kernel/built-in.o:(.data+0xe18): undefined reference to `proc_ipc_dointvec'
make: *** [vmlinux] Error 1

Signed-off-by: Randy Dunlap
Acked-by: Eric Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-12-11 01:55:39 +0800

09 Dec, 2006

1 commit

6b49a2578 [PATCH] sysctl: fix sys_sysctl interface of ipc sysctls ... Browse Code »

Currently there is a regression and the ipc sysctls don't show up in the
binary sysctl namespace.

This patch adds sysctl_ipc_data to read data/write from the appropriate
namespace and deliver it in the expected manner.

[akpm@osdl.org: warning fix]
Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-12-09 00:29:03 +0800