26 Mar, 2006

3 commits

  • This patch avoids arithmetic on 'signed' types, which is slower than
    arithmetic on 'unsigned' ones. This saves space and CPU cycles.

    size of kernel/sys.o before the patch (gcc-3.4.5)

    text    data    bss    dec      hex     filename
    10924   252     4      11180    2bac    kernel/sys.o

    size of kernel/sys.o after the patch

    text    data    bss    dec      hex     filename
    10903   252     4      11159    2b97    kernel/sys.o

    I noticed that gcc-4.1.0 (from Fedora Core 5) even uses an idiv instruction
    for (a+b)/2 if a and b are signed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
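
    A minimal userspace sketch of the effect (the function names are
    hypothetical and not from the patch): C requires signed division to round
    toward zero, so when the sum may be negative the compiler cannot turn
    (a+b)/2 into a plain right shift, while for unsigned operands the shift is
    exact.

```c
/* Signed: the quotient must round toward zero, so a negative odd sum needs
 * a fix-up step before the shift (or the compiler may emit idiv). */
static int mid_signed(int lo, int hi)
{
	return (lo + hi) / 2;
}

/* Unsigned: division by 2 is exactly a logical right shift. */
static unsigned int mid_unsigned(unsigned int lo, unsigned int hi)
{
	return (lo + hi) / 2;
}
```

    For example, mid_signed(-3, 0) must yield -1, whereas an arithmetic right
    shift of -3 by one bit gives -2; that fix-up is the extra cost.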
     
  • While doing some benchmarks of an Apache/PHP SMP server, I noticed high
    oprofile numbers in in_group_p() and _atomic_dec_and_lock().

    rank   percent    symbol
     1     4.8911 %   __link_path_walk
     2     4.8503 %   __d_lookup
    *3     4.2911 %   _atomic_dec_and_lock
     4     3.9307 %   __copy_to_user_ll
     5     4.9004 %   sysenter_past_esp
    *6     3.3248 %   in_group_p

    It appears that in_group_p() does an unnecessary get/put pair:

    get_group_info(current->group_info); /* atomic_inc() */
    ... /* access current->group_info */
    put_group_info(current->group_info); /* _atomic_dec_and_lock */

    It is not necessary to do this, because the current task holds a reference
    on its own group_info, and this reference cannot change during the lookup.

    This patch deletes the get_group_info()/put_group_info() pair from
    sys_getgroups(), in_group_p() and in_egroup_p() functions.

    Signed-off-by: Eric Dumazet
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
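
    A simplified userspace sketch of the lookup that remains once the get/put
    pair is gone. The struct and function names are hypothetical; the real
    group_info stores gids in blocks, which this sketch flattens into one
    sorted array. Because the calling task already owns a reference to its
    own group list, the search only reads it and never touches the count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened stand-in for struct group_info: gids kept sorted. */
struct group_list {
	int refcount;               /* held by the owning task; lookups leave it alone */
	size_t ngroups;
	const unsigned int *gids;
};

/* Plain binary search over the sorted gid array. */
static bool group_list_contains(const struct group_list *gl, unsigned int gid)
{
	size_t left = 0, right = gl->ngroups;

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (gl->gids[mid] < gid)
			left = mid + 1;
		else if (gl->gids[mid] > gid)
			right = mid;
		else
			return true;
	}
	return false;
}

/* Checking our own groups: no get/put around the search, since the
 * caller's existing reference keeps 'own' alive for the whole lookup. */
static bool in_own_group(const struct group_list *own, unsigned int gid)
{
	return group_list_contains(own, gid);
}
```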
     
  • Move capable() to kernel/capability.c and eliminate duplicate
    implementations. Add a __capable() function, which can be used to check
    the capabilities of any process.

    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
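
    A simplified sketch of the split described above, assuming capabilities
    are modelled as bits in a per-task mask (every name here is a hypothetical
    stand-in): task_has_cap() plays the role of __capable() and works on any
    task, while current_has_cap() plays the role of capable(), a thin wrapper
    that passes the current task.

```c
#include <stdbool.h>

/* Hypothetical capability numbers and per-task state. */
enum { CAP_SKETCH_CHOWN = 0, CAP_SKETCH_KILL = 5 };

struct task_sketch {
	unsigned int cap_effective;   /* bit n set => capability n held */
};

static struct task_sketch *current_sketch;   /* plays the role of "current" */

/* General form: check a capability of any task. */
static bool task_has_cap(const struct task_sketch *t, int cap)
{
	return (t->cap_effective >> cap) & 1u;
}

/* Common case: a one-line wrapper around the general check. */
static bool current_has_cap(int cap)
{
	return task_has_cap(current_sketch, cap);
}
```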
     

24 Mar, 2006

3 commits

  • Document the fact that setrlimit(RLIMIT_CPU) doesn't return error codes when
    it should. I don't think we can fix this without a 2.7.x.

    Cc: Martin Schwidefsky
    Cc: Ulrich Weigand
    Cc: Cliff Wickman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • At present the kernel doesn't honour an attempt to set RLIMIT_CPU to zero
    seconds. But the spec says it should, and that's what 2.4.x does.

    Fixing this for real would involve some complexity (such as adding a new
    it-has-been-set flag to the task_struct, and testing that everywhere, instead
    of overloading the value of it_prof_expires).

    Given that a 2.4 kernel won't actually send the signal until one second has
    expired anyway, let's just handle this case by treating the caller's
    zero-seconds as one second.

    Cc: Martin Schwidefsky
    Cc: Ulrich Weigand
    Cc: Cliff Wickman
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
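
    A sketch of the normalisation, assuming the requested soft limit is
    already expressed in seconds (cpu_limit_to_arm() is a hypothetical helper,
    not the kernel's code): zero is rounded up to one second instead of being
    confused with "no timer set", and RLIM_INFINITY still means no limit.

```c
#include <sys/resource.h>

/* Returns the number of CPU seconds to arm the timer with. */
static rlim_t cpu_limit_to_arm(rlim_t requested)
{
	if (requested == RLIM_INFINITY)
		return RLIM_INFINITY;   /* no limit: leave the timer unset */
	if (requested == 0)
		return 1;               /* treat zero seconds as one second */
	return requested;
}
```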
     
  • - Whitespace cleanups

    - Make that expression comprehensible.

    There's a potential logic change here: we do the "is it_prof_expires equal to
    zero" test after converting it to seconds, rather than doing the comparison
    between raw cputime_t's.

    But given that it's in units of seconds anyway, that shouldn't change
    anything.

    Cc: Martin Schwidefsky
    Cc: Ulrich Weigand
    Cc: Cliff Wickman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Mar, 2006

2 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Cc: Alan Cox
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • During getrusage(), avoid taking the global tasklist_lock when possible,
    that is, when the process is single-threaded. Any avoidance of
    tasklist_lock is good for NUMA boxes (and possibly for large SMPs). Thanks
    to Oleg Nesterov for review and suggestions.

    Signed-off-by: Nippun Goel
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
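
    A userspace sketch of the fast-path idea, with a C11 spinlock standing in
    for the global tasklist_lock (all names are hypothetical). It assumes that
    a single-threaded caller's thread list cannot change while it is inside
    the call, so only multi-threaded callers pay for the global lock.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a global lock such as tasklist_lock. */
static atomic_flag global_list_lock = ATOMIC_FLAG_INIT;
static unsigned long slow_path_count;   /* how often the lock was taken */

static void global_lock(void)
{
	while (atomic_flag_test_and_set(&global_list_lock))
		;   /* spin */
}

static void global_unlock(void)
{
	atomic_flag_clear(&global_list_lock);
}

/* Hypothetical process: per-thread user times, first nr_threads used (max 8). */
struct proc_sketch {
	int nr_threads;
	unsigned long utime[8];
};

static unsigned long proc_user_time(const struct proc_sketch *p)
{
	unsigned long sum = 0;

	/* Fast path: a single-threaded caller only has its own counters. */
	if (p->nr_threads == 1)
		return p->utime[0];

	/* Slow path: walk every thread under the global lock. */
	global_lock();
	slow_path_count++;
	for (int i = 0; i < p->nr_threads; i++)
		sum += p->utime[i];
	global_unlock();
	return sum;
}
```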
     

08 Feb, 2006

1 commit


25 Jan, 2006

1 commit


12 Jan, 2006

2 commits

  • - Move capable() from sched.h to capability.h;

    - Use <linux/capability.h> where capable() is used
    (in include/, block/, ipc/, kernel/, a few drivers/,
    mm/, security/, & sound/;
    many more drivers/ to go)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy.Dunlap
     
  • Uninline capable(). Saves 2K of kernel text on a generic .config, and 1K on a
    tiny config. In addition, it makes the use of capable() more consistent
    between CONFIG_SECURITY and !CONFIG_SECURITY.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

09 Jan, 2006

5 commits

  • Factor out common code for different RUSAGE_xxx cases.

    Don't take ->sighand->siglock in the RUSAGE_SELF case, as suggested by
    Ravikiran G Thirumalai.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • sys_setpgid() allows changing the ->pgrp of ptraced children.

    'man setpgid' says nothing about that, so I consider this
    behaviour a bug.

    Signed-off-by: Oleg Nesterov
    Cc: Oren Laadan
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • setsid() does not work unless the calling process is a
    thread_group_leader().

    'man setpgid' says nothing about that, so I consider this
    behaviour a bug.

    Signed-off-by: Oren Laadan
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oren Laadan
     
  • setpgid(0, pgid) or setpgid(forked_child_pid, pgid) does not work unless
    the calling process is a thread_group_leader().

    'man setpgid' says nothing about that, so I consider this
    behaviour a bug.

    Signed-off-by: Oleg Nesterov
    Cc: Oren Laadan
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The problem: it is expected that /sbin/halt -p works exactly like
    /sbin/halt when the kernel does not implement power-off functionality.

    The kernel can do a lot of work in the reboot notifiers and in
    device_shutdown before we even get to machine_power_off. Some of that
    shutdown is not safe if you are leaving the power on, and it definitely
    gets in the way of using sysrq or pressing ctrl-alt-del. Since the
    shutdown happens in generic code, there is no way to fix this in
    architecture-specific code :(

    Some machines are kernel oopsing today because of this.

    The simple solution is to turn LINUX_REBOOT_CMD_POWER_OFF into
    LINUX_REBOOT_CMD_HALT if power_off functionality is not implemented.

    This has the unfortunate side effect of disabling the power-off
    functionality on architectures that leave pm_power_off set to NULL and
    still implement something in machine_power_off. And it will break the
    build on some architectures that don't have a pm_power_off variable at all.

    On both counts I say tough.

    For architectures like alpha that don't implement the pm_power_off
    variable: pm_power_off is declared in linux/pm.h, it is a generic part of
    our power management code, and all architectures should implement it.

    For architectures like parisc that have a default power-off method in
    machine_power_off, used if pm_power_off is not implemented or fails: it is
    easy enough to set the pm_power_off variable. And nothing bad happens
    there; the machines just stop powering off.

    The current semantics are impossible without a flag at the top level that
    lets us avoid the problem code when power off is not implemented.
    pm_power_off is as good a flag as any, with the bonus that it works without
    modification on at least x86, x86_64, powerpc, and ppc today.

    Andrew, can you pick this up and put it in the -mm tree? Kernels that
    don't compile or don't power off seem saner than kernels that oops or
    panic. Until we get the arch-specific patches for the problem
    architectures, this probably isn't smart to push into the stable kernel.
    Unfortunately I don't have the time at the moment to walk through every
    architecture and make them work. And even if I did, I couldn't test it :(

    From: Hirokazu Takata

    Add pm_power_off() for build fix of arch/m32r/kernel/process.c.

    From: Miklos Szeredi

    UML build fix

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Hayato Fujiwara
    Signed-off-by: Hirokazu Takata
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
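
    A sketch of the downgrade rule, with a function pointer standing in for
    pm_power_off and hypothetical command values (the real
    LINUX_REBOOT_CMD_* constants are different numbers): a power-off request
    becomes a halt when no power-off hook is registered, before any generic
    shutdown work runs.

```c
#include <stddef.h>

/* Hypothetical command values, not the real LINUX_REBOOT_CMD_* numbers. */
enum reboot_cmd_sketch { CMD_RESTART, CMD_HALT, CMD_POWER_OFF };

/* Stand-in for pm_power_off: NULL when the platform cannot power off. */
static void (*pm_power_off_sketch)(void);

/* Downgrade a power-off request to a halt if nobody can power off. */
static enum reboot_cmd_sketch effective_reboot_cmd(enum reboot_cmd_sketch cmd)
{
	if (cmd == CMD_POWER_OFF && pm_power_off_sketch == NULL)
		return CMD_HALT;
	return cmd;
}

/* Hypothetical platform hook used to show the registered case. */
static void dummy_power_off(void)
{
}
```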
     

16 Dec, 2005

1 commit


13 Dec, 2005

1 commit

  • For Kprobes, the critical path is the path from the debug break exception
    handler until control reaches the kprobes exception code. No probes can be
    supported in this path, as we would end up in recursion.

    This patch prevents this by moving the function below into the safe
    __kprobes section, into which no probes can be inserted.

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keshavamurthy Anil S
     

11 Nov, 2005

1 commit


07 Nov, 2005

4 commits

  • I didn't find any possible modular usage in the kernel.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Convert to proper kernel-doc format.

    Some have extra blank lines (not allowed immediately after the function
    name) or need blank lines (after all parameters). The function summary
    must be only one line.

    A colon (":") in a function description does weird things (sadly, it
    causes kernel-doc to think that it's a new section heading).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
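
    An illustration of the kernel-doc rules listed above, applied to a
    hypothetical helper: a one-line summary right after the name, no blank
    line before the parameters, a blank line after the last parameter, and no
    colon in the free-text description.

```c
/**
 * clamp_nice - limit a nice value to the valid range
 * @nice: requested nice value
 *
 * Values below -20 are raised to -20 and values above 19 are lowered
 * to 19, so callers never store an out-of-range priority.
 *
 * Returns @nice limited to the range -20 to 19.
 */
static int clamp_nice(int nice)
{
	if (nice < -20)
		return -20;
	if (nice > 19)
		return 19;
	return nice;
}
```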
     
  • Various core kernel-doc cleanups:
    - add missing function parameters in ipc, irq/manage, kernel/sys,
    kernel/sysctl, and mm/slab;
    - move description to just above function for kernel_restart()

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • This patch adds a connector that reports fork, exec, id change, and exit
    events for all processes to userspace. It replaces the fork_advisor patch
    that ELSA is currently using. Applications that may find these events
    useful include accounting/auditing (e.g. ELSA), system activity monitoring
    (e.g. top), security, and resource management (e.g. CKRM).

    Signed-off-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

23 Sep, 2005

1 commit

  • In the lead-up to 2.6.13 I fixed a large number of reboot problems by
    making the calling conventions consistent. Despite checking and
    double-checking my work, it appears I missed an obvious one.

    This first patch simply refactors the reboot routines so that all of the
    preparation for each kind of reboot is in its own function, making it very
    hard to get the various kinds of reboot out of sync.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
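
    A sketch of the shape of that refactoring, with a string trace standing in
    for the real notifier calls and device shutdown (all names are
    hypothetical): the common preparation lives in one helper, and each kind
    of reboot is one small function built on it, so every caller gets the
    same sequence.

```c
#include <string.h>

/* Records the steps taken, so the shared sequence is visible. */
static char trace[128];

static void step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, ";", sizeof(trace) - strlen(trace) - 1);
}

/* Shared preparation: in the kernel this would run the reboot notifiers
 * for the given event and then shut devices down. */
static void shutdown_prepare(const char *event)
{
	step(event);
	step("devices");
}

static void do_restart(void)   { shutdown_prepare("restart-notify");  step("restart"); }
static void do_halt(void)      { shutdown_prepare("halt-notify");     step("halt"); }
static void do_power_off(void) { shutdown_prepare("poweroff-notify"); step("power_off"); }
```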
     

18 Sep, 2005

1 commit

  • 2.6.13 incorporated Alan Cox's patch for /proc/sys/fs/suid_dumpable (one
    version of this patch can be found at
    http://marc.theaimsgroup.com/?l=linux-kernel&m=109647550421014&w=2 ).

    This patch also made corresponding changes in kernel/sys.c to change the
    prctl() PR_SET_DUMPABLE operation so that the permitted range of 'arg2' was
    modified from 0..1 to 0..2.

    However, a corresponding change was not made for PR_GET_DUMPABLE: if the
    dumpable flag is non-zero, then PR_GET_DUMPABLE always returns 1, so that
    the caller can't determine the true setting of this flag.

    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
     

08 Sep, 2005

1 commit

  • The patch removes a redundant variable `sig' from sys_prctl().

    For some reason, when sys_prctl is called with option == PR_SET_PDEATHSIG,
    the value of arg2 is assigned to an int variable named sig. Then sig
    is tested with valid_signal() and later used to set the value of
    current->pdeath_signal.

    There is no reason to use this intermediate variable: valid_signal()
    takes an unsigned long argument, so it can handle being passed arg2
    directly, and if the call to valid_signal() succeeds, we know the value of
    arg2 is in the range zero to _NSIG, so it will easily fit in a plain int
    and there's no problem assigning it later to current->pdeath_signal
    (which is an int).

    The patch gets rid of the pointless variable `sig'.
    This reduces the size of kernel/sys.o in 2.6.13-rc6-mm1 by 32 bytes on my
    system.

    Patch has been compile tested, boot tested, and just to make damn sure I
    didn't break anything I wrote a quick test app that calls
    prctl(PR_SET_PDEATHSIG ...) with the entire range of values for an
    unsigned long, and it behaves as expected with and without the patch.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
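
    A sketch of the simplified PR_SET_PDEATHSIG handling, assuming a stand-in
    value of 64 for _NSIG (the real value is architecture-specific) and
    hypothetical function names. The range check is done on the unsigned long
    argument itself, and only a value that passed it is narrowed to int.

```c
#include <stdbool.h>

#define NSIG_SKETCH 64   /* stand-in for _NSIG */

/* Same test as the kernel's valid_signal(): takes an unsigned long. */
static bool valid_signal_sketch(unsigned long sig)
{
	return sig <= NSIG_SKETCH;
}

/* No intermediate int: validate arg2 directly, then store it. */
static int set_pdeath_signal(unsigned long arg2, int *pdeath_signal)
{
	if (!valid_signal_sketch(arg2))
		return -1;                 /* the kernel returns -EINVAL here */
	*pdeath_signal = (int)arg2;    /* at most NSIG_SKETCH, so it fits */
	return 0;
}
```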
     

04 Aug, 2005

1 commit

  • This removes the calls to device_suspend() from the shutdown path that
    were added sometime during 2.6.13-rc*. They aren't working properly on
    a number of configs (I got reports from both ppc powerbook users and x86
    users), causing the system to no longer shut down.

    I think it isn't the right approach at the moment anyway. We already
    have a shutdown() callback for the drivers that actually care about
    shutdown, and the suspend() code isn't yet in good enough shape to be
    generalized this much. Also, the semantics of suspend and shutdown are
    slightly different on a number of setups, and the way this was patched in
    gives drivers little way to cleanly differentiate. It should
    have been at least a different message.

    For 2.6.13, I think we should revert to 2.6.12 behaviour and have a
    working suspend back.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

30 Jul, 2005

1 commit


28 Jul, 2005

1 commit


27 Jul, 2005

4 commits

  • When the kernel is working well and we want to restart cleanly,
    kernel_restart is the function to use. But in many instances
    the kernel wants to reboot when things are expected to be working
    very badly, such as from panic or a software watchdog handler.

    This patch adds the function emergency_restart() so that
    callers can be clear about what semantics they expect when calling
    restart. emergency_restart() is expected to be callable
    from interrupt context and possibly reliable in even more
    trying circumstances.

    This is an initial generic implementation for all architectures.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • It is obvious we wanted to call kernel_restart here, but since we
    didn't have it, the code was expanded inline and hasn't been correct
    since sometime in 2.4.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Because the factored-out pieces of sys_reboot don't exist, people calling
    into the reboot path duplicate the code badly, leading to
    inconsistent expectations of code in the reboot path.

    This patch is just code motion.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • In the recent addition of device_suspend calls into
    sys_reboot two code paths were missed.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

26 Jun, 2005

2 commits

  • This patch introduces the architecture-independent implementation of the
    sys_kexec_load and compat_sys_kexec_load system calls.

    Kexec on panic support has been integrated into the core patch and is
    relatively clean.

    In addition, the hopefully architecture-independent option
    crashkernel=size@location has been documented. Its purpose is to reserve
    space for the panic kernel to live in, where no DMA transfer will ever be
    set up to access.

    Signed-off-by: Eric Biederman
    Signed-off-by: Alexander Nyberg
    Signed-off-by: Adrian Bunk
    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
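
    A sketch of how a crashkernel=size@location value such as 64M@16M could be
    split, assuming K, M and G suffixes are binary multiples as with the
    kernel's memparse(); parse_size() and parse_crashkernel() are hypothetical
    simplified stand-ins, not the kernel's parser.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* A number (decimal, 0x hex or leading-0 octal) with an optional
 * K/M/G suffix; *end is left just past whatever was consumed. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;   /* consume the suffix */
		break;
	default:
		break;
	}
	return v;
}

/* Split "size@location"; returns false if the string is malformed. */
static bool parse_crashkernel(const char *arg, unsigned long long *size,
			      unsigned long long *base)
{
	char *cur;

	if (!isdigit((unsigned char)arg[0]))
		return false;
	*size = parse_size(arg, &cur);
	if (*cur != '@' || !isdigit((unsigned char)cur[1]))
		return false;
	*base = parse_size(cur + 1, &cur);
	return *cur == '\0';
}
```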
     
  • Without this patch, Linux provokes emergency disk shutdowns and
    similar nastiness. It was in SuSE kernels for some time, IIRC.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     

24 Jun, 2005

3 commits

  • The attached patch makes the following changes:

    (1) There's a new special key type called ".request_key_auth".

    This is an authorisation key for when one process requests a key and
    another process is started to construct it. This type of key cannot be
    created by the user; nor can it be requested by kernel services.

    Authorisation keys hold two references:

    (a) The key being constructed. When that key is instantiated, the
    authorisation key is revoked, rendering it of no further use.

    (b) The "authorising process". This is either:

    (i) the process that called request_key(), or:

    (ii) if the process that called request_key() itself had an
    authorisation key in its session keyring, then the authorising
    process referred to by that authorisation key will also be
    referred to by the new authorisation key.

    This means that the process that initiated a chain of key requests
    will authorise the lot of them, and will, by default, wind up with
    the keys obtained from them in its keyrings.

    (2) request_key() creates an authorisation key which is then passed to
    /sbin/request-key as part of a new session keyring.

    (3) When request_key() is searching for a key to hand back to the caller, if
    it comes across an authorisation key in the session keyring of the
    calling process, it will also search the keyrings of the process
    specified therein and it will use the specified process's credentials
    (fsuid, fsgid, groups) to do that rather than the calling process's
    credentials.

    This allows a process started by /sbin/request-key to find keys belonging
    to the authorising process.

    (4) A key can be read, even if the process executing KEYCTL_READ doesn't
    have direct read or search permission, if that key is contained within the
    keyrings of a process specified by an authorisation key found within the
    calling process's session keyring, and is searchable using the
    credentials of the authorising process.

    This allows a process started by /sbin/request-key to read keys belonging
    to the authorising process.

    (5) The magic KEY_SPEC_*_KEYRING key IDs when passed to KEYCTL_INSTANTIATE or
    KEYCTL_NEGATE will specify a keyring of the authorising process, rather
    than the process doing the instantiation.

    (6) One of the process keyrings can be nominated as the default to which
    request_key() should attach new keys if not otherwise specified. This is
    done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_*
    constants. The current setting can also be read using this call.

    (7) request_key() is partially interruptible. If it is waiting for another
    process to finish constructing a key, it can be interrupted. This permits
    a request-key cycle to be broken without recourse to rebooting.

    Signed-Off-By: David Howells
    Signed-Off-By: Benoit Boissinot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
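
    A heavily simplified model of rules (1)(a) and (1)(b) above (the structs
    and functions are hypothetical, not the keyring API): a new authorisation
    key names the requester as its authorising process unless the requester is
    itself running under an authorisation key, in which case that key's
    authorising process is inherited, so a whole chain of requests is
    authorised by the process that started it.

```c
#include <stddef.h>

struct proc_sketch;

struct auth_key_sketch {
	int target_key;                        /* (a) key under construction */
	const struct proc_sketch *authoriser;  /* (b) authorising process */
	int revoked;
};

struct proc_sketch {
	/* Authorisation key in this process's session keyring, or NULL. */
	const struct auth_key_sketch *session_auth;
};

/* Rule (b): inherit the authoriser if the requester holds an auth key. */
static struct auth_key_sketch make_auth_key(const struct proc_sketch *requester,
					    int target_key)
{
	struct auth_key_sketch k = { target_key, requester, 0 };

	if (requester->session_auth != NULL)
		k.authoriser = requester->session_auth->authoriser;
	return k;
}

/* Rule (a): instantiating the target key revokes its authorisation key. */
static void instantiate(struct auth_key_sketch *k)
{
	k->revoked = 1;
}
```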
     
  • Avoid taking the tasklist_lock in sys_times if the process is single
    threaded. In a NUMA system taking the tasklist_lock may cause a bouncing
    cacheline if multiple independent processes continually call sys_times to
    measure their performance.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Add a new `suid_dumpable' sysctl:

    This value can be used to query and set the core dump mode for setuid
    or otherwise protected/tainted binaries. The modes are

    0 - (default) - traditional behaviour. Any process which has changed
    privilege levels or is execute-only will not be dumped.

    1 - (debug) - all processes dump core when possible. The core dump is
    owned by the current user and no security is applied. This is intended
    for system debugging situations only. Ptrace is unchecked.

    2 - (suidsafe) - any binary which normally would not be dumped is dumped
    readable by root only. This allows the end user to remove such a dump but
    not access it directly. For security reasons core dumps in this mode will
    not overwrite one another or other files. This mode is appropriate when
    administrators are attempting to debug problems in a normal environment.

    (akpm:

    > > +EXPORT_SYMBOL(suid_dumpable);
    >
    > EXPORT_SYMBOL_GPL?

    No problem to me.

    > > if (current->euid == current->uid && current->egid == current->gid)
    > > current->mm->dumpable = 1;
    >
    > Should this be SUID_DUMP_USER?

    Actually the feedback I had from last time was that the SUID_ defines
    should go because it's clearer to follow the numbers. They can go
    everywhere (and there are lots of places where dumpable is tested/used
    as a bool in untouched code)

    > Maybe this should be renamed to `dump_policy' or something. Doing that
    > would help us catch any code which isn't using the #defines, too.

    Fair comment. The patch was designed to be easy to maintain for Red Hat
    rather than for merging. Changing that field would create a gigantic
    diff because it is used all over the place.

    )

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
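
    A sketch of the decision the three modes describe, with hypothetical names
    for the modes and outcomes; "tainted" stands for a process that changed
    privilege levels or is execute-only. It models only who owns and can read
    the dump, not the ptrace or no-overwrite rules.

```c
#include <stdbool.h>

/* Hypothetical names for the suid_dumpable values 0, 1 and 2. */
enum dump_mode { DUMP_TRADITIONAL = 0, DUMP_DEBUG = 1, DUMP_SUIDSAFE = 2 };

enum dump_action { NO_DUMP, DUMP_AS_USER, DUMP_ROOT_ONLY };

static enum dump_action core_dump_action(enum dump_mode mode, bool tainted)
{
	if (!tainted)
		return DUMP_AS_USER;       /* ordinary processes dump as usual */

	switch (mode) {
	case DUMP_DEBUG:
		return DUMP_AS_USER;       /* owned by the current user */
	case DUMP_SUIDSAFE:
		return DUMP_ROOT_ONLY;     /* readable by root only */
	case DUMP_TRADITIONAL:
	default:
		return NO_DUMP;            /* traditional: no dump at all */
	}
}
```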
     

06 May, 2005

1 commit

  • As per http://www.nist.gov/dads/HTML/shellsort.html, this should be
    referred to as a Shell sort. Shell-Metzner is a misnomer.

    Signed-off-by: Daniel Dickman
    Signed-off-by: Domen Puncer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Domen Puncer
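
    A generic Shell sort sketch over an array of gids (the element type and
    the Knuth gap sequence 1, 4, 13, 40, ... are assumptions; the log does not
    say which sequence the kernel uses): it runs an insertion sort over
    shrinking gaps, finishing with a gap of 1, which guarantees a sorted
    result.

```c
#include <stddef.h>

static void shell_sort_gids(unsigned int *a, size_t n)
{
	size_t gap = 1;

	/* Largest Knuth gap (3h + 1) that is still useful for n elements. */
	while (gap < n / 3)
		gap = 3 * gap + 1;

	for (; gap > 0; gap /= 3) {
		/* Gapped insertion sort. */
		for (size_t i = gap; i < n; i++) {
			unsigned int tmp = a[i];
			size_t j = i;

			while (j >= gap && a[j - gap] > tmp) {
				a[j] = a[j - gap];
				j -= gap;
			}
			a[j] = tmp;
		}
	}
}
```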