20 Jul, 2007
4 commits
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
-
This patch adds an interface to set/reset flags which determines each memory
segment should be dumped or not when a core file is generated./proc//coredump_filter file is provided to access the flags. You can
change the flag status for a particular process by writing to or reading from
the file.The flag status is inherited to the child process when it is created.
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch changes mm_struct.dumpable to a pair of bit flags.
set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Optimize show_stat to collect per-irq information just once.
On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
Considering the fact that we already compute this value per-cpu, we can
save on the remote references as below.Signed-off-by: Alok N Kataria
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
1 commit
-
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.Signed-off-by: Tejun Heo
Acked-by: Paulo Marques
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2007
9 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
[PATCH] sched: fix up fs/proc/array.c whitespace problems
[PATCH] sched: prettify prio_to_wmult[]
[PATCH] sched: document prio_to_wmult[]
[PATCH] sched: improve weight-array comments
[PATCH] sched: remove dead code from task_stime()Fixed up trivial conflict in fs/proc/array.c
-
This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
strightforward [modulo preempt ;) ] TIF_NOTSC implementation).Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make available to the user the following task and process performance
statistics:* Involuntary Context Switches (task_struct->nivcsw)
* Voluntary Context Switches (task_struct->nvcsw)Statistics information is available from:
1. taskstats interface (Documentation/accounting/)
2. /proc/PID/status (task only).This data is useful for detecting hyperactivity patterns between processes.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Maxim Uvarov
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jay Lan
Cc: Jonathan Lim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's a bit dopey-looking and can permit a task to cause a pagefault in an mm
which it doesn't have permission to read from.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Simple and stupid like some previous ones. Just use new API.
Signed-off-by: Pavel Emelianov
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
during suspend. This may cause confusion so I restore the old behaviour by
using the boot based time instead of monotonic for uptime.Signed-off-by: Tomas Janousek
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
process start times to become invalid after suspend. Using boot based time
for those restores the old behaviour and fixes the issue.[akpm@linux-foundation.org: little cleanup]
Signed-off-by: Tomas Janousek
Cc: Tomas Smetana
Acked-by: John Stultz
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module dissapeares meanwhile.pde = create_proc_entry()
if (!pde)
return -ENOMEM;
pde->write_proc = ...
open
write
copy_from_user
pde = create_proc_entry();
if (!pde) {
remove_proc_entry();
return -ENOMEM;
/* module unloaded */
}
*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()remove_proc_entry vfs_read
proc_kill_inodes [check ->f_op validness]
[check ->f_op->read validness]
[verify_area, security permissions checks]
->f_op = NULL;
if (file->f_op->read)
/* ->f_op dereference, boom */NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell on me. Full audit pending.[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Jul, 2007
2 commits
-
while changing task_stime() i noticed a whitespace style problem in
array.c - fix it. While at it, fix all the other style problems too,
most of them in the scheduler-stats related portions of array.c.There is no change in functionality:
text data bss dec hex filename
4356 28 0 4384 1120 array.o-before
4356 28 0 4384 1120 array.o-afterSigned-off-by: Ingo Molnar
-
Alexey Dobriyan noticed that task_stime() contains a piece of dead code.
(which is a remnant of earlier versions of this code) Remove that code.Signed-off-by: Ingo Molnar
10 Jul, 2007
4 commits
-
scheduler debugging core: implement /proc/sched_debug and
/proc//sched files for scheduler debugging.Signed-off-by: Ingo Molnar
-
update delay-accounting to use CFS's precise stats.
Signed-off-by: Ingo Molnar
-
make use of CFS's precise accounting to drive /proc//stat statistics.
this code was co-authored by:
Balbir Singh
Dmitry Adamushko
Ingo MolnarSigned-off-by: Ingo Molnar
Signed-off-by: Dmitry Adamushko -
remove the SleepAVG field from /proc//status, as
with the removal of the sleep-average code this value
no longer makes sense.Signed-off-by: Ingo Molnar
17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
18 commits
-
/proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we
don't have any references to clear_refs_smap() in generic procfs code.Signed-off-by: David Rientjes
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cleanup using simple_read_from_buffer() in procfs.
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
notify_change() already calls security_inode_setattr() before
calling iop->setattr.Alan sayeth
This is a behaviour change on all of these and limits some behaviour of
existing established security modulesWhen inode_change_ok is called it has side effects. This includes
clearing the SGID bit on attribute changes caused by chmod. If you make
this change the results of some rulesets may be different before or after
the change is made.I'm not saying the change is wrong but it does change behaviour so that
needs looking at closely (ditto all other attribute twiddles)Signed-off-by: Steve Beattie
Signed-off-by: Andreas Gruenbacher
Signed-off-by: John Johansen
Acked-by: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Cc: Alan Cox
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
notify_change() already calls security_inode_setattr() before
calling iop->setattr.Signed-off-by: Tony Jones
Signed-off-by: Andreas Gruenbacher
Signed-off-by: John Johansen
Acked-by: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We can save some lines of code by using seq_release_private().
Signed-off-by: Martin Peschke
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kallsyms_lookup() can go iterating over modules list unprotected which is OK
for emergency situations (oops), but not OK for regular stuff like
/proc/*/wchan.Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol
name into caller-supplied buffer or return -ERANGE. All copying is done with
module_mutex held, so...Signed-off-by: Alexey Dobriyan
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Several kallsyms_lookup() pass dummy arguments but only need, say, module's
name. Make kallsyms_lookup() accept NULLs where possible.Also, makes picture clearer about what interfaces are needed for all symbol
resolving business.Signed-off-by: Alexey Dobriyan
Cc: Rusty Russell
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove includes of where it is not used/needed.
Suggested by Al Viro.Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Additions and removal from tty_drivers list were just done as well as
iterating on it for /proc/tty/drivers generation.testing: modprobe/rmmod loop of simple module which does nothing but
tty_register_driver() vs cat /proc/tty/drivers loopBUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip:
c01cefa7
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
last sysfs file: devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/bInterfaceProtocol
Modules linked in: ohci_hcd af_packet e1000 ehci_hcd uhci_hcd usbcore xfs
CPU: 0
EIP: 0060:[] Not tainted VLI
EFLAGS: 00010297 (2.6.21-rc4-mm1 #4)
EIP is at vsnprintf+0x3a4/0x5fc
eax: 6b6b6b6b ebx: f6cb50f2 ecx: 6b6b6b6b edx: fffffffe
esi: c0354700 edi: f6cb6000 ebp: 6b6b6b6b esp: f31f5e68
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process cat (pid: 31864, ti=f31f4000 task=c1998030 task.ti=f31f4000)
Stack: 00000000 c0103f20 c013003a c0103f20 00000000 f6cb50da 0000000a 00000f0e
f6cb50f2 00000010 00000014 ffffffff ffffffff 00000007 c0354753 f6cb50f2
f73e39dc f73e39dc 00000001 c0175416 f31f5ed8 f31f5ed4 0ee00000 f32090bc
Call Trace:
[] restore_nocheck+0x12/0x15
[] mark_held_locks+0x6d/0x86
[] restore_nocheck+0x12/0x15
[] seq_printf+0x2e/0x52
[] show_tty_range+0x35/0x1f3
[] seq_printf+0x2e/0x52
[] show_tty_driver+0x8a/0x1d9
[] seq_read+0x70/0x2ba
[] seq_read+0x0/0x2ba
[] proc_reg_read+0x63/0x9f
[] vfs_read+0x7d/0xb5
[] proc_reg_read+0x0/0x9f
[] sys_read+0x41/0x6a
[] sysenter_past_esp+0x5f/0x99
=======================
Code: 00 8b 4d 04 e9 44 ff ff ff 8d 4d 04 89 4c 24 50 8b 6d 00 81 fd ff 0f 00 00 b8 a4 c1 35 c0 0f 46 e8 8b 54 24 2c 89 e9 89 c8 eb 06 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 44 24 28 89
EIP: [] vsnprintf+0x3a4/0x5fc SS:ESP 0068:f31f5e68Signed-off-by: Alexey Dobriyan
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Eternal quest to make
while true; do cat /proc/fs/xfs/stat >/dev/null 2>/dev/null; done
while true; do find /proc -type f 2>/dev/null | xargs cat >/dev/null 2>/dev/null; done
while true; do modprobe xfs; rmmod xfs; donework reliably continues and now kernel oopses in the following way:
BUG: unable to handle ... at virtual address 6b6b6b6b
EIP is at badness
process: cat
proc_oom_score
proc_info_read
sys_fstat64
vfs_read
proc_info_read
sys_readFailing code is prefetch hidden in list_for_each_entry() in badness().
badness() is reachable from two points. One is proc_oom_score, another
is out_of_memory() => select_bad_process() => badness().Second path grabs tasklist_lock, while first doesn't.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add support for finding out the current file position, open flags and
possibly other info in the future.These new entries are added:
/proc/PID/fdinfo/FD
/proc/PID/task/TID/fdinfo/FDFor each fd the information is provided in the following format:
pos: 1234
flags: 0100002[bunk@stusta.de: make struct proc_fdinfo_file_operations static]
Signed-off-by: Miklos Szeredi
Cc: Alexey Dobriyan
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the order of fields of struct pid_entry (file fs/proc/base.c) in order
to avoid a hole on 64bit archs. (8 bytes saved per object)Also change all pid_entry arrays to be const qualified, to make clear they
must not be modified.Before (on x86_64) :
# size fs/proc/base.o
text data bss dec hex filename
15549 2192 0 17741 454d fs/proc/base.oAfter :
# size fs/proc/base.o
text data bss dec hex filename
17229 176 0 17405 43fd fs/proc/base.oThats 336 bytes saved on kernel size on x86_64
Signed-off-by: Eric Dumazet
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
information about the memory location and usage of processes. Issues:- maps should not be world-readable, especially if programs expect any
kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
check the maps when %n is in a *printf call, and a setuid(getuid())
process wouldn't be able to read its own maps file. (For reference
see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of
non-root applications that depend on access to the maps contents.This change implements a check using "ptrace_may_attach" before allowing
access to read the maps contents. To control this protection, the new knob
/proc/sys/kernel/maps_protect has been added, with corresponding updates to
the procfs documentation.[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: New sysctl numbers are old hat]
Signed-off-by: Kees Cook
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
WARN_ON(de && de->deleted); is sooo unreliable. Why?
proc_lookup remove_proc_entry
=========== =================
lock_kernel();
spin_lock(&proc_subdir_lock);
[find proc entry]
spin_unlock(&proc_subdir_lock);
spin_lock(&proc_subdir_lock);
[find proc entry]proc_get_inode
==============
WARN_ON(de && de->deleted); ...if (!atomic_read(&de->count))
free_proc_entry(de);
else
de->deleted = 1;So, if you have some strange oops [1], and doesn't see this WARN_ON it means
nothing.[1] try_module_get() of module which doesn't exist, two lines below
should suffice, or not?Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix the following race:
proc_readdir remove_proc_entry
============ =================spin_lock(&proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&proc_subdir_lock);
spin_lock(&proc_subdir_lock);
[find PDE]
[free PDE, refcount is 0]
spin_unlock(&proc_subdir_lock);
/* boom */
if (filldir(dirent, de->name, ...[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong
Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
proc_lookup remove_proc_entry
=========== =================lock_kernel();
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
spin_unlock(&proc_subdir_lock);
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
[check refcount and free PDE]
spin_unlock(&proc_subdir_lock);
proc_get_inode:
de_get(de); /* boom */Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This past week I was playing around with that pahole tool
(http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size
of various struct in the kernel. I was surprised by the size of the
task_struct on x86_64, approaching 4K. I looked through the fields in
task_struct and found that a number of them were declared as "unsigned
long" rather than "unsigned int" despite them appearing okay as 32-bit
sized fields. On x86_64 "unsigned long" ends up being 8 bytes in size and
forces 8 byte alignment. Is there a reason there a reason they are
"unsigned long"?The patch below drops the size of the struct from 3808 bytes (60 64-byte
cachelines) to 3760 bytes (59 64-byte cachelines). A couple other fields
in the task struct take a signficant amount of space:struct thread_struct thread; 688
struct held_lock held_locks[30]; 1680CONFIG_LOCKDEP is turned on in the .config
[akpm@linux-foundation.org: fix printk warnings]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
/proc/$PID/fd has r-x------ permissions, so if process does setuid(), it
will not be able to access /proc/*/fd/. This breaks fstatat() emulation
in glibc.open("foo", O_RDONLY|O_DIRECTORY) = 4
setuid32(65534) = 0
stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied)Signed-off-by: Alexey Dobriyan
Cc: "Eric W. Biederman"
Cc: James Morris
Cc: Chris Wright
Cc: Ulrich Drepper
Cc: Oleg Nesterov
Acked-By: Kirill Korotaev
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
1 commit
-
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds