17 Oct, 2007

11 commits

  • This patch contains the following cleanups that are now possible:
    - remove the unused security_operations->inode_xattr_getsuffix
    - remove the no longer used security_operations->unregister_security
    - remove some no longer required exit code
    - remove a bunch of no longer used exports

    Signed-off-by: Adrian Bunk
    Acked-by: James Morris
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks
    if an rt-prio execer shares the same cpu with ->group_leader. Change the code
    to use ->group_exit_task/notify_count mechanics.

    This patch certainly uglifies the code, perhaps someone can suggest something
    better.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that we don't pre-allocate the new ->sighand, we can kill the first fast
    path, it doesn't make sense any longer. At best, it can save one "list_empty()"
    check but leads to the code duplication.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • de_thread() pre-allocates newsighand to make sure that exec() can't fail after
    killing all sub-threads. Imho, this buys nothing, but complicates the code:

    - this is (mostly) needed to handle CLONE_SIGHAND without CLONE_THREAD
    tasks, this is very unlikely (if ever used) case

    - unless we already have some serious problems, GFP_KERNEL allocation
    should not fail

    - ENOMEM still can happen after de_thread(), ->sighand is not the last
    object we have to allocate

    Change the code to allocate the new ->sighand on demand.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There is no any reason to do recalc_sigpending() after changing ->sighand.
    To begin with, recalc_sigpending() does not take ->sighand into account.

    This means we don't need to take newsighand->siglock while changing sighands.
    rcu_assign_pointer() provides a necessary barrier, and if another process
    reads the new ->sighand it should either take tasklist_lock or it should use
    lock_task_sighand() which has a corresponding smp_read_barrier_depends().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • vfs_permission(MAY_EXEC) checks if the filesystem is mounted with "noexec", so
    there's no need to repeat this check in sys_uselib() and open_exec().

    Signed-off-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Fix do_coredump to detect a crash in the user mode helper process and abort
    the attempt to recursively dump core to another copy of the helper process,
    potentially ad-infinitum.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Neil Horman
    Cc:
    Cc:
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • A rewrite of my previous post for this enhancement. It uses jeremy's
    split_argv/free_argv library functions to translate core_pattern into an argv
    array to be passed to the user mode helper process. It also adds a
    translation to format_corename such that the origional value of RLIMIT_CORE
    can be passed to userspace as an argument.

    Signed-off-by: Neil Horman
    Cc:
    Cc:
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • For some time /proc/sys/kernel/core_pattern has been able to set its output
    destination as a pipe, allowing a user space helper to receive and
    intellegently process a core. This infrastructure however has some
    shortcommings which can be enhanced. Specifically:

    1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
    when core_pattern is a pipe, since file system resources are not being
    consumed in this case, unless the user application wishes to save the core,
    at which point the app is restricted by usual file system limits and
    restrictions.

    2) The core_pattern code should be able to parse and pass options to the
    user space helper as an argv array. The real core limit of the uid of the
    crashing proces should also be passable to the user space helper (since it
    is overridden to zero when called).

    3) Some miscellaneous bugs need to be cleaned up (specifically the
    recognition of a recursive core dump, should the user mode helper itself
    crash. Also, the core dump code in the kernel should not wait for the user
    mode helper to exit, since the same context is responsible for writing to
    the pipe, and a read of the pipe by the user mode helper will result in a
    deadlock.

    This patch:

    Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that
    core_pattern is a pipe, the entire core will be fed to the user mode helper.

    Signed-off-by: Neil Horman
    Cc:
    Cc:
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • list_del() hardly can fail, so checking for return value is pointless
    (and current code always return 0).

    Nobody really cared that return value anyway.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Switch single-linked binfmt formats list to usual list_head's. This leads
    to one-liners in register_binfmt() and unregister_binfmt(). The downside
    is one pointer more in struct linux_binfmt. This is not a problem, since
    the set of registered binfmts on typical box is very small -- (ELF +
    something distro enabled for you).

    Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

21 Sep, 2007

1 commit

  • This simplifies signalfd code, by avoiding it to remain attached to the
    sighand during its lifetime.

    In this way, the signalfd remain attached to the sighand only during
    poll(2) (and select and epoll) and read(2). This also allows to remove
    all the custom "tsk == current" checks in kernel/signal.c, since
    dequeue_signal() will only be called by "current".

    I think this is also what Ben was suggesting time ago.

    The external effect of this, is that a thread can extract only its own
    private signals and the group ones. I think this is an acceptable
    behaviour, in that those are the signals the thread would be able to
    fetch w/out signalfd.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

23 Aug, 2007

2 commits

  • de_thread:

    if (atomic_read(&oldsighand->count) count) != 1);

    This is not safe without the rmb() in between. The results of two
    correctly ordered __exit_signal()->atomic_dec_and_test()'s could be seen
    out of order on our CPU.

    The same is true for the "thread_group_empty()" case, __unhash_process()'s
    changes could be seen before atomic_dec_and_test(&sig->count).

    On some platforms (including i386) atomic_read() doesn't provide even the
    compiler barrier, in that case these checks are simply racy.

    Remove these BUG_ON()'s. Alternatively, we can do something like

    BUG_ON( ({ smp_rmb(); atomic_read(&sig->count) != 1; }) );

    Signed-off-by: Oleg Nesterov
    Acked-by: Paul E. McKenney
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • With this patch any thread can dequeue its own private signals via signalfd,
    even if it was created by another sub-thread.

    To do so, we pass "current" to dequeue_signal() if the caller is from the same
    thread group. This also fixes the scheduling of posix timers broken by the
    previous patch.

    If the caller doesn't belong to this thread group, we can't handle __SI_TIMER
    case properly anyway. Perhaps we should forbid the cross-process signalfd usage
    and convert ctx->tsk to ctx->sighand.

    Signed-off-by: Oleg Nesterov
    Cc: Benjamin Herrenschmidt
    Cc: Davide Libenzi
    Cc: Ingo Molnar
    Cc: Michael Kerrisk
    Cc: Roland McGrath
    Cc: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

19 Aug, 2007

1 commit


20 Jul, 2007

3 commits

  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts three-value dumpable to two flags and stores it into
    lower two bits of mm_struct.flags instead of mm_struct.dumpable.
    get_dumpable() behaves in the opposite way.

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     
  • The purpose of audit_bprm() is to log the argv array to a userspace daemon at
    the end of the execve system call. Since user-space hasn't had time to run,
    this array is still in pristine state on the process' stack; so no need to
    copy it, we can just grab it from there.

    In order to minimize the damage to audit_log_*() copy each string into a
    temporary kernel buffer first.

    Currently the audit code requires that the full argument vector fits in a
    single packet. So currently it does clip the argv size to a (sysctl) limit,
    but only when execve auditing is enabled.

    If the audit protocol gets extended to allow for multiple packets this check
    can be removed.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ollie Wild
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

24 May, 2007

1 commit


17 May, 2007

1 commit


12 May, 2007

1 commit


11 May, 2007

3 commits

  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor if created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
    to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
    signalfd file.

    The "mask" allows to specify the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can use
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is, and the valid fields depends of the (->code &
    __SI_MASK) value, in the same way a struct siginfo would:

    struct signalfd_siginfo {
    __u32 signo; /* si_signo */
    __s32 err; /* si_errno */
    __s32 code; /* si_code */
    __u32 pid; /* si_pid */
    __u32 uid; /* si_uid */
    __s32 fd; /* si_fd */
    __u32 tid; /* si_fd */
    __u32 band; /* si_band */
    __u32 overrun; /* si_overrun */
    __u32 trapno; /* si_trapno */
    __s32 status; /* si_status */
    __s32 svint; /* si_int */
    __u64 svptr; /* si_ptr */
    __u64 utime; /* si_utime */
    __u64 stime; /* si_stime */
    __u64 addr; /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. We can
    then skip find_pid() if attach_pid() were to take a struct pid parameter.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Hi,

    I have been working on some code that detects abnormal events based on audit
    system events. One kind of event that we currently have no visibility for is
    when a program terminates due to segfault - which should never happen on a
    production machine. And if it did, you'd want to investigate it. Attached is a
    patch that collects these events and sends them into the audit system.

    Signed-off-by: Steve Grubb
    Signed-off-by: Al Viro

    Steve Grubb
     

09 May, 2007

2 commits

  • When a binary format is unregistered and re-registered, register_binfmt
    fails with -EBUSY. The reason is that unregister_binfmt does not set
    fmt->next to NULL, and seeing (fmt->next != NULL), register_binfmt fails
    with -EBUSY.

    One can find his way around by explicitly setting fmt->next to NULL after
    unregistering, but that is kind of unclean (one should better be using only
    the interfaces, and not the interal members, isn't it?)

    Attached one-liner can fix it.

    Signed-off-by: Kalash Nainwal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    kalash nainwal
     
  • Petr Tesarik discovered a problem in remove_arg_zero(). He writes:

    When a script is loaded, load_script() replaces argv[0] with the
    name of the interpreter and the filename passed to the exec syscall.
    However, there is no guarantee that the length of the interpreter
    name plus the length of the filename is greater than the length of
    the original argv[0]. If the difference happens to cross a page boundary,
    setup_arg_pages() will call put_dirty_page() [aka install_arg_page()]
    with an address outside the VMA.

    Therefore, remove_arg_zero() must free all pages which would be unused
    after the argument is removed.

    So, rewrite the remove_arg_zero function without gotos, with a few comments,
    and with the commonly used explicit index/offset. This fixes the problem
    and makes it easier to understand as well.

    [a.p.zijlstra@chello.nl: add comment]
    Signed-off-by: Nick Piggin
    Cc: Petr Tesarik
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

18 Apr, 2007

1 commit

  • The patch checks for "|" in the pattern not the output and doesn't nail a
    pid on to a piped name (as it is a program name not a file)

    Also fixes a very very obscure security corner case. If you happen to have
    decided on a core pattern that starts with the program name then the user
    can run a program called "|myevilhack" as it stands. I doubt anyone does
    this.

    Signed-off-by: Alan Cox
    Confirmed-by: Christopher S. Aker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

12 Feb, 2007

1 commit

  • Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
    corresponding "kmem_cache_zalloc()" call.

    Signed-off-by: Robert P. J. Day
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: Roland McGrath
    Cc: James Bottomley
    Cc: Greg KH
    Acked-by: Joel Becker
    Cc: Steven Whitehouse
    Cc: Jan Kara
    Cc: Michael Halcrow
    Cc: "David S. Miller"
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

11 Dec, 2006

1 commit

  • Currently, each fdtable supports three dynamically-sized arrays of data: the
    fdarray and two fdsets. The code allows the number of fds supported by the
    fdarray (fdtable->max_fds) to differ from the number of fds supported by each
    of the fdsets (fdtable->max_fdset).

    In practice, it is wasteful for these two sizes to differ: whenever we hit a
    limit on the smaller-capacity structure, we will reallocate the entire fdtable
    and all the dynamic arrays within it, so any delta in the memory used by the
    larger-capacity structure will never be touched at all.

    Rather than hogging this excess, we shouldn't even allocate it in the first
    place, and keep the capacities of the fdarray and the fdsets equal. This
    patch removes fdtable->max_fdset. As an added bonus, most of the supporting
    code becomes simpler.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
     

09 Dec, 2006

2 commits

  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. Its
    also needed so containers preserve existing semantic that pid == 1 would reap
    orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

08 Dec, 2006

2 commits

  • On Sat, Dec 02, 2006 at 11:47:44PM +0300, Alexey Dobriyan wrote:
    > David Binderman compiled 2.6.19 with icc and grepped for "was set but never
    > used". Many warnings are on
    > http://coderock.org/kj/unused-2.6.19-fs

    Heh, the very first line:
    fs/exec.c(1465): remark #593: variable "flag" was set but never used

    fs/exec.c:
    1477 /*
    1478 * We cannot trust fsuid as being the "true" uid of the
    1479 * process nor do we know its entire history. We only know it
    1480 * was tainted so we dump it as root in mode 2.
    1481 */
    1482 if (mm->dumpable == 2) { /* Setuid core dump mode */
    1483 flag = O_EXCL; /* Stop rewrite attacks */
    1484 current->fsuid = 0; /* Dump root private */
    1485 }

    And then filp_open follows with "flag" totally ignored.

    (akpm: this restores the code to Alan's original version. Andi's "Support
    piping into commands in /proc/sys/kernel/core_pattern" (cset d025c9db) broke
    it).

    Cc: Alan Cox
    Cc:
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Oct, 2006

1 commit

  • Replace references to system_utsname to the per-process uts namespace
    where appropriate. This includes things like uname.

    Changes: Per Eric Biederman's comments, use the per-process uts namespace
    for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

    [jdike@addtoit.com: UML fix]
    [clg@fr.ibm.com: cleanup]
    [akpm@osdl.org: build fix]
    Signed-off-by: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Cedric Le Goater
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Oct, 2006

2 commits

  • Using the infrastructure created in previous patches implement support to
    pipe core dumps into programs.

    This is done by overloading the existing core_pattern sysctl
    with a new syntax:

    |program

    When the first character of the pattern is a '|' the kernel will instead
    threat the rest of the pattern as a command to run. The core dump will be
    written to the standard input of that program instead of to a file.

    This is useful for having automatic core dump analysis without filling up
    disks. The program can do some simple analysis and save only a summary of
    the core dump.

    The core dump proces will run with the privileges and in the name space of
    the process that caused the core dump.

    I also increased the core pattern size to 128 bytes so that longer command
    lines fit.

    Most of the changes comes from allowing core dumps without seeks. They are
    fairly straight forward though.

    One small incompatibility is that if someone had a core pattern previously
    that started with '|' they will get suddenly new behaviour. I think that's
    unlikely to be a real problem though.

    Additional background:

    > Very nice, do you happen to have a program that can accept this kind of
    > input for crash dumps? I'm guessing that the embedded people will
    > really want this functionality.

    I had a cheesy demo/prototype. Basically it wrote the dump to a file again,
    ran gdb on it to get a backtrace and wrote the summary to a shared directory.
    Then there was a simple CGI script to generate a "top 10" crashes HTML
    listing.

    Unfortunately this still had the disadvantage to needing full disk space for a
    dump except for deleting it afterwards (in fact it was worse because over the
    pipe holes didn't work so if you have a holey address map it would require
    more space).

    Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
    cores (at least it worked with zsh's =(cat core) syntax), so it would be
    likely possible to do it without temporary space with a simple wrapper that
    calls it in the right way. I ran out of time before doing that though.

    The demo prototype scripts weren't very good. If there is really interest I
    can dig them out (they are currently on a laptop disk on the desk with the
    laptop itself being in service), but I would recommend to rewrite them for any
    serious application of this and fix the disk space problem.

    Also to be really useful it should probably find a way to automatically fetch
    the debuginfos (I cheated and just installed them in advance). If nobody else
    does it I can probably do the rewrite myself again at some point.

    My hope at some point was that desktops would support it in their builtin
    crash reporters, but at least the KDE people I talked too seemed to be happy
    with their user space only solution.

    Alan sayeth:

    I don't believe that piping as such as neccessarily the right model, but
    the ability to intercept and processes core dumps from user space is asked
    for by many enterprise users as well. They want to know about, capture,
    analyse and process core dumps, often centrally and in automated form.

    [akpm@osdl.org: loff_t != unsigned long]
    Signed-off-by: Andi Kleen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • There were a few accounting data/macros that are used in CSA but are #ifdef'ed
    inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
    CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
    kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
    include/linux/tsacct_kern.h.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     

30 Sep, 2006

1 commit

  • Fixed race on put_files_struct on exec with proc. Restoring files on
    current on error path may lead to proc having a pointer to already kfree-d
    files_struct.

    ->files changing at exit.c and khtread.c are safe as exit_files() makes all
    things under lock.

    Found during OpenVZ stress testing.

    [akpm@osdl.org: add export]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

27 Sep, 2006

2 commits

  • Ingo Oeser pointed out that because current expands to an inline function
    it is more space efficient and somewhat faster to simply keep a cached copy
    of current in another variable. This patch implements that for the
    de_thread function.

    (akpm: saves nearly 100 bytes of text on x86)

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • In de_thread we move pids from one process to another, a rather ugly case.
    The function transfer_pid makes it clear what we are doing, and makes the
    action atomic. This is useful we ever want to atomically traverse the
    process group and session lists, in a rcu safe manner.

    Even if the atomic properties this change should be a win as transfer_pid
    should be less code to execute than executing both attach_pid and
    detach_pid, and this should make de_thread slightly smaller as only a
    single function call needs to be emitted. The only downside is that the
    code might be slower to execute as the odds are against transfer_pid being
    in cache.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

28 Aug, 2006

1 commit

  • This fixes the locking error noticed by lockdep:

    =============================================
    [ INFO: possible recursive locking detected ]
    ---------------------------------------------
    init/1 is trying to acquire lock:
    (&sighand->siglock){....}, at: [] flush_old_exec+0x3ae/0x859

    but task is already holding lock:
    (&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

    other info that might help us debug this:
    2 locks held by init/1:
    #0: (tasklist_lock){..--}, at: [] flush_old_exec+0x38e/0x859
    #1: (&sighand->siglock){....}, at: [] flush_old_exec+0x39e/0x859

    stack backtrace:
    [] show_trace_log_lvl+0x54/0xfd
    [] show_trace+0xd/0x10
    [] dump_stack+0x19/0x1b
    [] __lock_acquire+0x773/0x997
    [] lock_acquire+0x4b/0x6c
    [] _spin_lock+0x19/0x28
    [] flush_old_exec+0x3ae/0x859
    [] load_elf_binary+0x4aa/0x1628
    [] search_binary_handler+0xa7/0x24e
    [] do_execve+0x15b/0x1f9
    [] sys_execve+0x29/0x4d
    [] syscall_call+0x7/0xb

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones