20 Oct, 2007
1 commit
-
The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed with each other when looking
at the code for a long time.The proposals are to
* equip the functions that return the integer with _nr suffix to
represent that fact,
* and to make all functions work with task (not process) by making
the common prefix of the same name.For monotony the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.Signed-off-by: Pavel Emelianov
Acked-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
3 commits
-
For some time /proc/sys/kernel/core_pattern has been able to set its output
destination as a pipe, allowing a user space helper to receive and
intellegently process a core. This infrastructure however has some
shortcommings which can be enhanced. Specifically:1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
when core_pattern is a pipe, since file system resources are not being
consumed in this case, unless the user application wishes to save the core,
at which point the app is restricted by usual file system limits and
restrictions.2) The core_pattern code should be able to parse and pass options to the
user space helper as an argv array. The real core limit of the uid of the
crashing proces should also be passable to the user space helper (since it
is overridden to zero when called).3) Some miscellaneous bugs need to be cleaned up (specifically the
recognition of a recursive core dump, should the user mode helper itself
crash. Also, the core dump code in the kernel should not wait for the user
mode helper to exit, since the same context is responsible for writing to
the pipe, and a read of the pipe by the user mode helper will result in a
deadlock.This patch:
Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that
core_pattern is a pipe, the entire core will be fed to the user mode helper.Signed-off-by: Neil Horman
Cc:
Cc:
Cc: Jeremy Fitzhardinge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
allows for more flexibility in the note type for the state of 'extended
floating point' implementations in coredumps. New note types can now be
added with an appropriate #define.This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
current users so there's are no change in behaviour.This will let us use different note types on powerpc for the Altivec/VMX
state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
(signal processing extension) state that some embedded PowerPC cpus from
Freescale have.Signed-off-by: Mark Nelson
Cc: Paul Mackerras
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andi Kleen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
(and thus mapcounted and count towards shared rss). These writes to
the struct page could cause excessive cacheline bouncing on big
systems. There are a number of ways this could be addressed if it is
an issue.And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completelyI will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see
much use to them except on benchmarks. All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus "snif" Torvalds
20 Jul, 2007
3 commits
-
This patch enables core dump filtering for ELF-FDPIC-formatted core file.
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes an unused argument from elf_fdpic_dump_segments().
Signed-off-by: Hidehiro Kawai
Cc: Alan Cox
Cc: David Howells
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space. Once the binfmt code runs and figures out
where the stack should be, we move it downwards.It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild
Signed-off-by: Peter Zijlstra
Cc:
Cc: Hugh Dickins
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
1 commit
-
Remove includes of where it is not used/needed.
Suggested by Al Viro.Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2007
1 commit
-
When the dump cannot occur most likely because of a full file system and
the page to be written is the zero page, the call to page_cache_release()
is missed.Signed-off-by: Brian Pomerantz
Cc: Hugh Dickins
Cc: Nick Piggin
Cc: David Howells
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2007
1 commit
-
Fix the /proc/pid/stat representation of executable boundaries. It should
show the bounds of the executable, but instead shows the bounds of the
loader.Before the patch is applied, the bug can be seen by examining, say, inetd:
# ps | grep inetd
610 root 0 S /usr/sbin/inetd -i
# cat /proc/610/maps
c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
c328c000-c328ea00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
c3290000-c329b6c0 rw-p 00000000 00:00 0
c32a0000-c32c0000 rwxp 00000000 00:00 0
c32d4000-c32d8000 rw-p 00000000 00:00 0
c3394000-c3398000 rw-p 00000000 00:00 0
c3458000-c345f464 r-xs 00000000 00:0b 16384612 /usr/sbin/inetd
c3470000-c34748f8 rw-p 00004000 00:0b 16384612 /usr/sbin/inetd
c34cc000-c34d0000 rw-p 00000000 00:00 0
c34d4000-c34d8000 rw-p 00000000 00:00 0
c34d8000-c34dc000 rw-p 00000000 00:00 0
# cat /proc/610/stat
610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 8 0 0 19 0 1 0 94392000718
950272 0 4294967295 3233480704 3233523592 3274440352 3274439976
3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0The code boundaries are 3233480704 to 3233523592, which are:
(gdb) p/x 3233480704
$1 = 0xc0bb0000
(gdb) p/x 3233523592
$2 = 0xc0bba788Which corresponds to this line in the maps file:
c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
Which is wrong. After the patch is applied, the maps file is pretty much
identical (there's some minor shuffling of the location of some of the
anonymous VMAs), but the stat file is now:# cat /proc/610/stat
610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 7 0 0 18 0 1 0 94392000722
950272 0 4294967295 3276111872 3276141668 3274440352 3274439976
3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0The code boundaries are then 3276111872 to 3276141668, which are:
(gdb) p/x 3276111872
$1 = 0xc3458000
(gdb) p/x 3276141668
$2 = 0xc345f464And these correspond to this line in the maps file instead:
c3458000-c345f464 r-xs 00000000 00:0b 16384612 /usr/sbin/inetd
Which is now correct.
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Feb, 2007
1 commit
-
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jan, 2007
1 commit
-
Proposed patch to fix #5 in
http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt
aka
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073To reproduce, do
* grab poc at the end of advisory.
* add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;"
where first "4096" is something equal to or greater than 4096.
* ./poc /usr/bin/sudo && ls -lHere I get with 2.6.20-rc5:
-rw------- 1 ad ad 102400 2007-01-15 19:17 core
---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudoCheck for MAY_READ like binfmt_misc.c does.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Dec, 2006
1 commit
-
Convert the single available instance of kmalloc() + memset() to
kzalloc() in the fs/ directory.Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk
09 Dec, 2006
2 commits
-
Replace occurences of task->signal->session by a new process_session() helper
routine.It will be useful for pid namespaces to abstract the session pid number.
Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
1 commit
-
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.Signed-off-by: Magnus Damm
Cc: Daniel Jacobowitz
Cc: Roland McGrath
Cc: Jakub Jelinek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Sep, 2006
1 commit
-
do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).
Signed-off-by: Oleg Nesterov
Acked-By: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2006
3 commits
-
Add coredump capability for the ELF-FDPIC binfmt.
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Adjust the ELF-FDPIC binfmt driver to conform much more to the CodingStyle,
silly though it may be.Further changes:
(*) Drop the casts to long for addresses in kdebug() statements (they're
unsigned long already).(*) Use extra variables to avoid expressions longer than 80 chars by splitting
the statement into multiple statements and letting the compiler optimise
them back together.(*) Eliminate duplicate call of ksize() when working out how much space was
actually allocated for the stack.(*) Discard the commented-out load_shlib prototype and op pointer as this will
not be supported in ELF-FDPIC for the foreseeable future.Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix FDPIC compile errors.
(akpm: we suspect it fixes a warning)
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Jun, 2006
1 commit
-
Add __user annotations to binfmt_elf_fdpic.
Signed-off-by: Al Viro
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Mar, 2006
1 commit
-
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized awaySigned-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
11 Jan, 2006
1 commit
-
Remove unneeded casts of kmalloc() return value in binfmt_elf.
Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2005
1 commit
-
This is the fs/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in fs/.
Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Oct, 2005
1 commit
-
How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type. And how is rss
initialized? By set_mm_counter, all over the place. Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Apr, 2005
1 commit
-
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.Let it rip!