20 Oct, 2007
4 commits
-
With multiple pid namespaces, a process is known by some pid_t in every
ancestor pid namespace. Every time the process forks, the child process also
gets a pid_t in every ancestor pid namespace.While a process is visible in >=1 pid namespaces, it can see pid_t's in only
one pid namespace. We call this pid namespace it's "active pid namespace",
and it is always the youngest pid namespace in which the process is known.This patch defines and uses a wrapper to find the active pid namespace of a
process. The implementation of the wrapper will be changed in when support
for multiple pid namespaces are added.Changelog:
2.6.22-rc4-mm2-pidns1:
- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
task_active_pid_ns() in child_reaper() since task->nsproxy
can be NULL during task exit (so child_reaper() continues to
use init_pid_ns).to implement child_reaper() since init_pid_ns.child_reaper to
implement child_reaper() since tsk->nsproxy can be NULL during exit.2.6.21-rc6-mm1:
- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
process can have multiple pid namespaces.Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelianov
Cc: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Herbert Poetzel
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed with each other when looking
at the code for a long time.The proposals are to
* equip the functions that return the integer with _nr suffix to
represent that fact,
* and to make all functions work with task (not process) by making
the common prefix of the same name.For monotony the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.Signed-off-by: Pavel Emelianov
Acked-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the filesystem support logic from the cpusets system and makes cpusets
a cgroup subsystemThe "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
passed through to the cgroup filesystem with the appropriate options to
emulate the old cpuset filesystem behaviour.Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add:
/proc/cgroups - general system info
/proc/*/cgroup - per-task cgroup membership info
[a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
8 commits
-
/proc/PID/environ currently truncates at 4096 characters, patch based on
the /proc/PID/mem code.Signed-off-by: James Pearson
Cc: Anton Arapov
Cc: Jan Engelhardt
Cc: "H. Peter Anvin"
Cc: Alexey Dobriyan
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix f_version type: should be u64 instead of long
There is a type inconsistency between struct inode i_version and struct file
f_version.fs.h:
struct inode
u64 i_version;and
struct file
unsigned long f_version;Users do:
fs/ext3/dir.c:
if (filp->f_version != inode->i_version) {
So why isn't f_version a u64 ? It becomes a problem if versions gets
higher than 2^32 and we are on an architecture where longs are 32 bits.This patch changes the f_version type to u64, and updates the users accordingly.
It applies to 2.6.23-rc2-mm2.
Signed-off-by: Mathieu Desnoyers
Cc: Martin Bligh
Cc: "Randy.Dunlap"
Cc: Al Viro
Cc:
Cc: Mark Fasheh
Cc: Christoph Hellwig
Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 4004c69ad68dd03733179277280ea2946990ba36 avoids too many remote cpu
references while reporting per-irq stats. Since we will not have the same
performance penalty of bringing in remote cpu cachelines while reporting
per-irq stats anymore, we can now afford to be consistent and report this
statistic on all arches, all configs.akpm: affects ia64, alpha and ppc64, mainly.
Kiran earlier said:
Read to /proc/stat takes:
Plain: 2.622832
With speedup patch: 0.013194
With the per-irq stats commented out: 0.008124So the performance problems which originally caused those architectures to
disable this statistic should now be fixed up.Signed-off-by: Ravikiran Thirumalai
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: "Luck, Tony"
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Acked-by: Linus Torvalds
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
fs/proc/mmu.c consists of only one function which uses only:
1) struct vmalloc_info *
2) struct vm_struct *
3) struct vmalloc_info
4) vmlist
5) VMALLOC_TOTAL, VMALLOC_START, VMALLOC_END
6) read_lock, read_unlock
7) vmlist_lock
8) struct vm_structThis gives us linux/spinlock.h, asm/pgtable.h, "internal.h", linux/vmalloc.h.
asm/pgtable.h uses PKMAP_BASE on i386, for which asm/highmem.h is needed.
But, linux/highmem.h is actually used to make it compile everywhere.
I'll deal later with this particular i386 surprise.Cross-compile tested on many archs and configs.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
These aren't modular, so SLAB_PANIC is OK.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo.
The information is collected only on request so there is no runtime overhead.
The statistics are in three parts:The first part prints information on the size of blocks that pages are
being grouped on and looks likePage block order: 10
Pages per block: 1024The second part is a more detailed version of /proc/buddyinfo and looks like
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Reclaimable 1 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Reserve 0 4 4 0 0 0 0 1 0 1 0
Node 0, zone Normal, type Unmovable 111 8 4 4 2 3 1 0 0 0 0
Node 0, zone Normal, type Reclaimable 293 89 8 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Movable 1 6 13 9 7 6 3 0 0 0 0
Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 4The third part looks like
Number of blocks type Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 0 1 2 1
Node 0, zone Normal 3 17 94 4To walk the zones within a node with interrupts disabled, walk_zones_in_node()
is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and
/proc/pagetypeinfo to reduce code duplication. It seems specific to what
vmstat.c requires but could be broken out as a general utility function in
mmzone.c if there were other other potential users.Signed-off-by: Mel Gorman
Acked-by: Andy Whitcroft
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved. i.e. they can be migrated by deleting
them and re-reading the information from elsewhere.Signed-off-by: Mel Gorman
Cc: Andy Whitcroft
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Oct, 2007
1 commit
-
* 'locks' of git://linux-nfs.org/~bfields/linux:
nfsd: remove IS_ISMNDLCK macro
Rework /proc/locks via seq_files and seq_list helpers
fs/locks.c: use list_for_each_entry() instead of list_for_each()
NFS: clean up explicit check for mandatory locks
AFS: clean up explicit check for mandatory locks
9PFS: clean up explicit check for mandatory locks
GFS2: clean up explicit check for mandatory locks
Cleanup macros for distinguishing mandatory locks
Documentation: move locks.txt in filesystems/
locks: add warning about mandatory locking races
Documentation: move mandatory locking documentation to filesystems/
locks: Fix potential OOPS in generic_setlease()
Use list_first_entry in locks_wake_up_blocks
locks: fix flock_lock_file() comment
Memory shortage can result in inconsistent flocks state
locks: kill redundant local variable
locks: reverse order of posix_locks_conflict() arguments
15 Oct, 2007
3 commits
-
like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.Modify /proc//stat to display these new fields.
Signed-off-by: Laurent Vivier
Acked-by: Avi Kivity
Signed-off-by: Ingo Molnar -
as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.Signed-off-by: Laurent Vivier
Acked-by: Avi Kivity
Signed-off-by: Ingo Molnar -
rename all 'cnt' fields and variables to the less yucky 'count' name.
yuckage noticed by Andrew Morton.
no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:text data bss dec hex filename
38236 3506 24 41766 a326 sched.o.before
38240 3506 24 41770 a32a sched.o.afterSigned-off-by: Ingo Molnar
Reviewed-by: Thomas Gleixner
11 Oct, 2007
5 commits
-
With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
The problem: proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace). So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.To fix this introduce maybe_get_net and get_proc_net.
maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL. In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.Signed-off-by: Eric W. Biederman
Acked-by: Paul E. McKenney
Signed-off-by: David S. Miller -
Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfigSigned-off-by: Mark Nelson
Acked-by: Benjamin Thery
Signed-off-by: David S. Miller -
My bad.
Signed-off-by: David S. Miller
-
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
10 Oct, 2007
1 commit
-
Currently /proc/locks is shown with a proc_read function, but its behavior
is rather complex as it has to manually handle current offset and buffer
length. On the other hand, files that show objects from lists can be
easily reimplemented using the sequential files and the seq_list_XXX()
helpers.This saves (as usually) 16 lines of code and more than 200 from
the .text section.[akpm@linux-foundation.org: no externs in C]
[akpm@linux-foundation.org: warning fixes]
Signed-off-by: Pavel Emelyanov
Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
12 Sep, 2007
1 commit
-
Taneli Vähäkangas reported that commit
786d7e1612f0b0adb6046f19b906609e4fe8b1ba aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
->poll handler. The new code makes ->poll always there and returns 0 by
default, which is not correct. Return DEFAULT_POLLMASK instead.Steps to reproduce:
install emacs, SBCL, SLIME
emacs
M-x slime in *inferior-lisp* buffer
[watch it doing "Connecting to Swank on port X.."]Please, apply before 2.6.23.
P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: Alexey Dobriyan
Cc: T Taneli Vahakangas
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Aug, 2007
1 commit
-
Fix the accounting regression for CONFIG_VIRT_CPU_ACCOUNTING. It
reverts parts of commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa by
converting fs/proc/array.c back to cputime_t. The new functions
task_utime and task_stime now return cputime_t instead of clock_t. If
CONFIG_VIRT_CPU_ACCOUTING is set, task->utime and task->stime are
returned directly instead of using sum_exec_runtime.Patch is tested on s390x with and without VIRT_CPU_ACCOUTING as well as
on i386.[ mingo@elte.hu: cleanups, comments. ]
Signed-off-by: Christian Borntraeger
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
01 Aug, 2007
1 commit
-
On every open/close one struct seq_operations leaks.
Kudos to /proc/slab_allocators.Signed-off-by: Alexey Dobriyan
Acked-by: Ingo Molnar
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Jul, 2007
1 commit
-
It is important to only provide the compat_ioctl method
if the downstream de->proc_fops does too, otherwise this
utterly confuses the logic in fs/compat_ioctl.c and we
end up doing the wrong thing.Signed-off-by: David S. Miller
Acked-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds
22 Jul, 2007
1 commit
-
Too many remote cpu references due to /proc/stat.
On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
/proc/stat is parsed by common commands like top, who etc, causing lots
of cacheline transfersThis statistic seems useless. Other 'big iron' arches disable this.
AK: changed to remove for all SMP setups
AK: add commentSigned-off-by: Ravikiran Thirumalai
Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
20 Jul, 2007
4 commits
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
-
This patch adds an interface to set/reset flags which determines each memory
segment should be dumped or not when a core file is generated./proc//coredump_filter file is provided to access the flags. You can
change the flag status for a particular process by writing to or reading from
the file.The flag status is inherited to the child process when it is created.
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch changes mm_struct.dumpable to a pair of bit flags.
set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Optimize show_stat to collect per-irq information just once.
On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
Considering the fact that we already compute this value per-cpu, we can
save on the remote references as below.Signed-off-by: Alok N Kataria
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
1 commit
-
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.Signed-off-by: Tejun Heo
Acked-by: Paulo Marques
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jul, 2007
8 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
[PATCH] sched: fix up fs/proc/array.c whitespace problems
[PATCH] sched: prettify prio_to_wmult[]
[PATCH] sched: document prio_to_wmult[]
[PATCH] sched: improve weight-array comments
[PATCH] sched: remove dead code from task_stime()Fixed up trivial conflict in fs/proc/array.c
-
This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
strightforward [modulo preempt ;) ] TIF_NOTSC implementation).Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make available to the user the following task and process performance
statistics:* Involuntary Context Switches (task_struct->nivcsw)
* Voluntary Context Switches (task_struct->nvcsw)Statistics information is available from:
1. taskstats interface (Documentation/accounting/)
2. /proc/PID/status (task only).This data is useful for detecting hyperactivity patterns between processes.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Maxim Uvarov
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jay Lan
Cc: Jonathan Lim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's a bit dopey-looking and can permit a task to cause a pagefault in an mm
which it doesn't have permission to read from.Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Simple and stupid like some previous ones. Just use new API.
Signed-off-by: Pavel Emelianov
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
during suspend. This may cause confusion so I restore the old behaviour by
using the boot based time instead of monotonic for uptime.Signed-off-by: Tomas Janousek
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
process start times to become invalid after suspend. Using boot based time
for those restores the old behaviour and fixes the issue.[akpm@linux-foundation.org: little cleanup]
Signed-off-by: Tomas Janousek
Cc: Tomas Smetana
Acked-by: John Stultz
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds