17 Nov, 2007

1 commit

  • don't use the vgetcpu tcache - it's causing problems with tasks
    migrating, they'll see the old cache up to a jiffy after the
    migration, further increasing the costs of the migration.

    In the worst case they see completely bogus information from
    the tcache, when a sys_getcpu() call "invalidated" the cache
    info by incrementing the jiffies _and_ the cpuid info in the
    cache and the following vdso_getcpu() call happens after
    vdso_jiffies have been incremented.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Ulrich Drepper
    Signed-off-by: Thomas Gleixner

    Ingo Molnar
     

20 Oct, 2007

5 commits

  • The pgrp field is not used widely around the kernel so it is now marked as
    deprecated with an appropriate comment.

    The initialization of INIT_SIGNALS is trimmed because
    a) they are set to 0 automatically;
    b) gcc cannot properly initialize two anonymous unions (the second one
    is the one with the session). In this particular case,
    to make it compile we'd have to add some field initialized
    right before the .pgrp.

    This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
    (from Cedric), but for the pgrp field.

    Some progress report:

    We have to deprecate the pid, tgid, session and pgrp fields on struct
    task_struct and struct signal_struct. The session and pgrp are already
    deprecated. The tgid value is close to being such - the worst known usage
    is in fs/locks.c and the audit code. The pid field deprecation is mainly
    blocked by numerous printk-s around the kernel that print the tsk->pid to
    the log.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The find_task_by_something macros are used to find a task by pid,
    depending on what kind of pid is supplied - a global or a virtual one. All of
    them are wrappers around the most generic one - find_task_by_pid_type_ns() -
    and just substitute some args for it.

    It turned out that dereferencing the current->nsproxy->pid_ns construction
    and pushing one more argument on the stack inline causes kernel text size to
    grow.

    This patch moves all this stuff out-of-line into kernel/pid.c. Together
    with the next patch it saves a bit less than 400 bytes from the .text
    section.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or obtained from the user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with the pid_nr() call;
    - when looking up the task from kernel code with the stored id one
    should use the find_task_by_pid() call, which works with global pids;
    - when showing a pid's numerical value to the user the virtual one
    should be used; however, when one shows a task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one needs to treat it as
    the virtual one and use the appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The set of functions process_session, task_session, process_group and
    task_pgrp is confusing, as the names are easily mistaken for one another when
    looking at the code for a long time.

    The proposals are to
    * give the functions that return an integer a _nr suffix to
    reflect that fact,
    * and to make all functions work with a task (not a process) by giving
    them the same common prefix.

    For consistency the routines signal_session() and set_signal_session() are
    replaced with task_session_nr() and set_task_session(), especially since they
    are only used with the explicit task->signal dereference.

    Signed-off-by: Pavel Emelianov
    Acked-by: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Cedric Le Goater
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • There is a separate notifier header, but no separate notifier .c file.

    Extract notifier code out of kernel/sys.c which will remain for
    misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

19 Oct, 2007

1 commit


01 Oct, 2007

1 commit

  • We need to disable all CPUs other than the boot CPU (usually 0) before
    attempting to power-off modern SMP machines. This fixes the
    hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
    new toybox.

    Signed-off-by: Mark Lord
    Acked-by: Thomas Gleixner
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Lord
     

31 Aug, 2007

1 commit

  • Spotted by Marcin Kowalczyk.

    sys_setpgid(child) fails if the child was forked by sub-thread.

    Fix the "is it our child" check. The previous commit
    ee0acf90d320c29916ba8c5c1b2e908d81f5057d was not complete.
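    A minimal userspace sketch of why a pointer-based "is it our child" check
    can miss a child forked by a sub-thread. The struct and both helper names
    are hypothetical, and comparing thread-group ids is an assumption about
    the shape of the fix, not a quote of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's task_struct; all fields are illustrative. */
struct task {
    int pid;                  /* id of this thread */
    int tgid;                 /* id of the thread group (the process) */
    struct task *real_parent; /* the thread that actually called fork() */
};

/* Pointer comparison: only matches when the caller itself forked p. */
static int forked_by_this_thread(const struct task *caller,
                                 const struct task *p)
{
    assert(caller != NULL && p != NULL && p->real_parent != NULL);
    return p->real_parent == caller;
}

/* Thread-group comparison: matches when any thread of the caller's
 * process forked p (assumed form of the corrected check). */
static int forked_by_this_process(const struct task *caller,
                                  const struct task *p)
{
    assert(caller != NULL && p != NULL && p->real_parent != NULL);
    return p->real_parent->tgid == caller->tgid;
}
```

    With a group leader (pid 100), a sub-thread (pid 101, tgid 100) and a
    child forked by that sub-thread, only the second helper reports the child
    as belonging to the leader's process.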

    (this patch asks for the new same_thread_group() helper, but mainline doesn't
    have it yet).

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc:
    Tested-by: "Marcin 'Qrczak' Kowalczyk"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

30 Jul, 2007

1 commit


27 Jul, 2007

1 commit

  • Commit bd804eba1c8597cbb7cd5a5f9fe886aae16a079a ("PM: Introduce
    pm_power_off_prepare") caused problems in the poweroff path, as reported by
    YOSHIFUJI Hideaki / 吉藤英明.

    Generally, sysdev_shutdown() should be called after the ACPI preparation for
    powering the system off. To make it happen, we can separate sysdev_shutdown()
    from device_shutdown() and call it directly wherever necessary.

    Signed-off-by: Rafael J. Wysocki
    Tested-by: YOSHIFUJI Hideaki / 吉藤英明
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

20 Jul, 2007

2 commits

  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts the three-value dumpable setting to two flags and
    stores them in the lower two bits of mm_struct.flags instead of
    mm_struct.dumpable. get_dumpable() performs the reverse conversion.
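    A userspace sketch of the three-value to two-bit conversion described
    above. The macro names, the helper names and the exact bit assignment
    are assumptions made for illustration; the commit text only says that
    the lower two bits of the flags word are used.

```c
#include <assert.h>

/* Hypothetical bit assignment, chosen for this sketch only. */
#define DUMPABLE_BIT      0x1UL /* core dumps are allowed */
#define DUMP_SECURELY_BIT 0x2UL /* dump must be readable by root only */
#define DUMPABLE_MASK     (DUMPABLE_BIT | DUMP_SECURELY_BIT)

/* Store a three-value setting (0, 1 or 2) in the low two bits of flags,
 * leaving all other bits untouched. */
static unsigned long set_dumpable_flags(unsigned long flags, int value)
{
    assert(value >= 0 && value <= 2);
    flags &= ~DUMPABLE_MASK;
    if (value == 1)
        flags |= DUMPABLE_BIT;
    else if (value == 2)
        flags |= DUMPABLE_BIT | DUMP_SECURELY_BIT;
    return flags;
}

/* Recover the three-value setting from the low two bits. */
static int get_dumpable_value(unsigned long flags)
{
    unsigned long bits = flags & DUMPABLE_MASK;

    return bits >= 2 ? 2 : (int)bits;
}
```

    Keeping the setting in a flags word means unrelated bits of the same
    word can be used for other purposes without widening the structure.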

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Introduce the pm_power_off_prepare() callback that can be registered by the
    interested platforms in analogy with pm_idle() and pm_power_off(), used for
    preparing the system to power off (needed by ACPI).

    This allows us to drop acpi_sysclass and device_acpi that are only defined in
    order to register the ACPI power off preparation callback, which is needed by
    pm_power_off() registered in a much different way.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

18 Jul, 2007

2 commits

  • Rather than using a tri-state integer for the wait flag in
    call_usermodehelper_exec, define a proper enum, and use that. I've
    preserved the integer values so that any callers I've missed should
    still work OK.
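    A sketch of what such an enum could look like. The enumerator names and
    values below are recalled from the kernel header rather than taken from
    this commit message, so treat them as assumptions; the two helper
    functions are hypothetical.

```c
#include <assert.h>

/* Assumed names and values; the old integer values are kept on purpose. */
enum umh_wait {
    UMH_NO_WAIT   = -1, /* do not wait at all */
    UMH_WAIT_EXEC = 0,  /* wait for the exec, but not for the process */
    UMH_WAIT_PROC = 1,  /* wait for the process to complete */
};

/* Hypothetical helper: does the caller block until the helper exits? */
static int waits_for_exit(enum umh_wait wait)
{
    assert(wait >= UMH_NO_WAIT && wait <= UMH_WAIT_PROC);
    return wait == UMH_WAIT_PROC;
}

/* Hypothetical helper: does the caller block at least until the exec? */
static int waits_for_exec(enum umh_wait wait)
{
    assert(wait >= UMH_NO_WAIT && wait <= UMH_WAIT_PROC);
    return wait != UMH_NO_WAIT;
}
```

    Because the enumerators keep the old integer values, a caller that still
    passes a bare 1 or -1 ends up with the same behaviour as before.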

    Signed-off-by: Jeremy Fitzhardinge
    Cc: James Bottomley
    Cc: Randy Dunlap
    Cc: Christoph Hellwig
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Johannes Berg
    Cc: Ralf Baechle
    Cc: Bjorn Helgaas
    Cc: Joel Becker
    Cc: Tony Luck
    Cc: Kay Sievers
    Cc: Srivatsa Vaddagiri
    Cc: Oleg Nesterov
    Cc: David Howells

    Jeremy Fitzhardinge
     
  • Various pieces of code around the kernel want to be able to trigger an
    orderly poweroff. This pulls them together into a single
    implementation.

    By default the poweroff command is /sbin/poweroff, but it can be set
    via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
    can include command-line arguments.
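    A userspace sketch of splitting a command string at whitespace into an
    argv-style array, as described above. split_command() is a hypothetical
    helper written for this example, not the kernel's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Split cmd in place at runs of whitespace. Fills argv with pointers into
 * cmd, NULL-terminates it, and returns the number of arguments found.
 * At most max_args - 1 arguments are stored; the rest are dropped. */
static int split_command(char *cmd, char **argv, int max_args)
{
    int argc = 0;
    char *p = cmd;

    assert(cmd != NULL && argv != NULL && max_args >= 1);
    while (*p != '\0' && argc < max_args - 1) {
        while (isspace((unsigned char)*p))  /* skip separators */
            p++;
        if (*p == '\0')
            break;
        argv[argc++] = p;                   /* start of an argument */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                    /* terminate the argument */
    }
    argv[argc] = NULL;
    return argc;
}
```

    For the string "/sbin/shutdown -h now" this yields three arguments, so a
    value written to the sysctl can carry options as well as the program
    path.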

    This patch replaces four other instances of invoking either "poweroff"
    or "shutdown -h now": two sbus drivers, and acpi thermal
    management.

    sparc64 has its own "powerd"; still need to determine whether it should
    be replaced by orderly_poweroff().

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Len Brown
    Signed-off-by: Chris Wright
    Cc: Andrew Morton
    Cc: Randy Dunlap
    Cc: Andi Kleen
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: David S. Miller

    Jeremy Fitzhardinge
     

17 Jul, 2007

2 commits

  • This reduces the memory footprint and it enforces that only the current
    task can enable seccomp on itself (this is a requirement for a
    straightforward [modulo preempt ;) ] TIF_NOTSC implementation).

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that these privileges
    should be controlled through some means (process capabilities ?)

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     

11 May, 2007

3 commits

  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. We could
    then skip find_pid() if attach_pid() were to take a struct pid parameter.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Switch to the defines for these two checks, instead of hard coding the
    values.

    [akpm@linux-foundation.org: add missing include]
    Signed-off-by: Daniel Walker
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • If CONFIG_TASK_IO_ACCOUNTING is defined, we update io accounting counters for
    each task.

    This patch permits reporting of values using the well known getrusage()
    syscall, filling ru_inblock and ru_oublock instead of null values.

    As TASK_IO_ACCOUNTING currently counts bytes, we approximate the block
    count as: nr_blocks = nr_bytes / 512

    Example of use :
    ----------------------
    After patch is applied, /usr/bin/time command can now give a good
    approximation of IO that the process had to do.

    $ /usr/bin/time grep tototo /usr/include/*
    Command exited with non-zero status 1
    0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
    24288inputs+0outputs (0major+259minor)pagefaults 0swaps

    $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
    1000+0 records in
    1000+0 records out
    512000 bytes (512 kB) copied, 0.00326601 seconds, 157 MB/s
    0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+3000outputs (0major+299minor)pagefaults 0swaps
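    A small userspace sketch of reading these counters directly with
    getrusage(), plus the bytes-to-blocks approximation from the commit text.
    bytes_to_blocks() and print_block_io() are hypothetical helper names; the
    values printed depend on the running kernel and workload.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

#define SECTOR_BYTES 512ULL

/* The approximation from the commit text: nr_blocks = nr_bytes / 512. */
static unsigned long long bytes_to_blocks(unsigned long long nr_bytes)
{
    return nr_bytes / SECTOR_BYTES;
}

/* Print the block I/O counters of the calling process in the same
 * "inputs+outputs" style that /usr/bin/time uses. Returns 0 on success. */
static int print_block_io(FILE *out)
{
    struct rusage ru;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    fprintf(out, "%ldinputs+%ldoutputs\n", ru.ru_inblock, ru.ru_oublock);
    return 0;
}
```

    On a kernel without task I/O accounting the two fields would simply stay
    at zero, which is the behaviour the patch set out to improve.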

    Signed-off-by: Eric Dumazet
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

10 May, 2007

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds
     
  • Since 2.6.18-something, the community has been struggling to provide a
    clean and stable mechanism to postpone a cpu-hotplug event, as
    lock_cpu_hotplug was badly broken.

    This is another proposal towards solving that problem. This one is along the
    lines of the solution provided in kernel/workqueue.c

    Instead of having a global mechanism like lock_cpu_hotplug, we allow the
    subsystems to define their own per-subsystem hot cpu mutexes. These would be
    taken (released) wherever we are currently calling
    lock_cpu_hotplug (unlock_cpu_hotplug).

    Also, in the per-subsystem hotcpu callback function, we take this mutex before
    we handle any pre-cpu-hotplug events and release it once we finish handling
    the post-cpu-hotplug events. A standard means for doing this has been
    provided in [PATCH 2/4] and demonstrated in [PATCH 3/4].

    The ordering of these per-subsystem mutexes might still prove to be a
    problem, but hopefully lockdep should help us get out of that muddle.

    The patch set to be applied against linux-2.6.19-rc5 is as follows:

    [PATCH 1/4] : Extend notifier_call_chain with an option to specify the
    number of notifications to be sent and also count the
    number of notifications actually sent.

    [PATCH 2/4] : Define events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
    and send out notifications for these in _cpu_up and
    _cpu_down. This would help us standardise the acquire and
    release of the subsystem locks in the hotcpu
    callback functions of these subsystems.

    [PATCH 3/4] : Eliminate lock_cpu_hotplug from kernel/sched.c.

    [PATCH 4/4] : In workqueue_cpu_callback function, acquire(release) the
    workqueue_mutex while handling
    CPU_LOCK_ACQUIRE(CPU_LOCK_RELEASE).

    If the per-subsystem-locking approach survives the test of time, we can expect
    a slow phasing out of lock_cpu_hotplug, which has not yet been eliminated in
    these patches :)

    This patch:

    Provide notifier_call_chain with an option to call only a specified number of
    notifiers and also record the number of calls to notifiers made.

    The need for this enhancement was identified in the post entitled
    "Slab - Eliminate lock_cpu_hotplug from slab"
    (http://lkml.org/lkml/2006/10/28/92) by Ravikiran G Thirumalai and
    Andrew Morton.

    This patch adds two additional parameters to notifier_call_chain API namely
    - int nr_to_calls : Number of notifier_functions to be called.
    The don't care value is -1.

    - unsigned int *nr_calls : Records the total number of notifier_functions
    called by notifier_call_chain. The don't care
    value is NULL.
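    A userspace sketch of a call chain with the two extra parameters
    described above. The struct layout, call_chain(), count_event() and the
    stop-mask constant are simplified stand-ins written for this example,
    not the kernel's code; no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000 /* a callback sets this to stop the walk */

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb,
                         unsigned long event, void *data);
    struct notifier_block *next;
};

/* Walk the chain, calling at most nr_to_call notifiers (-1: no limit).
 * If nr_calls is not NULL, it is incremented once per notifier called. */
static int call_chain(struct notifier_block *head, unsigned long event,
                      void *data, int nr_to_call, unsigned int *nr_calls)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb = head;

    assert(nr_to_call >= -1);
    while (nb != NULL && nr_to_call != 0) {
        struct notifier_block *next = nb->next;

        ret = nb->notifier_call(nb, event, data);
        if (nr_calls != NULL)
            (*nr_calls)++;
        if (ret & NOTIFY_STOP_MASK)
            break;
        nb = next;
        nr_to_call--;
    }
    return ret;
}

/* Example callback: counts how often it ran via the int behind data. */
static int count_event(struct notifier_block *nb, unsigned long event,
                       void *data)
{
    (void)nb;
    (void)event;
    (*(int *)data)++;
    return NOTIFY_DONE;
}
```

    Recording how many notifiers ran lets a caller that hits a failure part
    way through replay a "rollback" event to exactly that many notifiers.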

    [michal.k.k.piotrowski@gmail.com: build fix]
    Credit: Andrew Morton
    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     
  • [ With Johannes Berg ]

    Separate the hibernation (aka suspend to disk code) from the other suspend
    code. In particular:

    * Remove the definitions related to hibernation from include/linux/pm.h
    * Introduce struct hibernation_ops and a new hibernate() function to hibernate
    the system, defined in include/linux/suspend.h
    * Separate suspend code in kernel/power/main.c from hibernation-related code
    in kernel/power/disk.c and kernel/power/user.c (with the help of
    hibernation_ops)
    * Switch ACPI (the only user of pm_ops.pm_disk_mode) to hibernation_ops

    Signed-off-by: Rafael J. Wysocki
    Cc: Greg KH
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

09 May, 2007

2 commits

  • Convert the "kernel" subdirectory of the tree to UTF-8. The only file
    modified is .

    Signed-off-by: John Anthony Kazos Jr.
    Signed-off-by: Adrian Bunk

    John Anthony Kazos Jr
     
  • As discovered here today, the change in Kernel 2.6.17 intended to inhibit
    users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by
    "cheating" and setting it to 1 in such a case, does not make a difference,
    as the check is done in the wrong place (too late), and only applies to the
    profiling code.

    On all systems I checked running kernels above 2.6.17, no matter what the
    hard and soft CPU time limits were before, a user could escape them by
    issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's
    process was not ever killed.

    Attached is a trivial patch to fix that. Simply moving the check to a
    slightly earlier location (specifically, before the line that actually
    assigns the limit - *old_rlim = new_rlim), does the trick.
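    A toy model of the ordering issue: the adjustment of a zero soft limit
    has to happen before the new limit is stored. struct cpu_limit and
    set_cpu_limit() are hypothetical names for this sketch; it does not call
    setrlimit() and does not reproduce the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct rlimit (soft and hard limit, seconds). */
struct cpu_limit {
    unsigned long cur;
    unsigned long max;
};

/* Store a requested CPU time limit. A soft limit of 0 would read as
 * "never set", so it is bumped to 1 second before the assignment. */
static int set_cpu_limit(struct cpu_limit *stored, struct cpu_limit requested)
{
    assert(stored != NULL);
    if (requested.cur > requested.max)
        return -1;              /* soft limit above hard limit: reject */
    if (requested.cur == 0)
        requested.cur = 1;      /* adjust first ... */
    *stored = requested;        /* ... then assign */
    return 0;
}
```

    If the adjustment came after the assignment, the stored soft limit would
    remain 0, which is the escape described in the commit message.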

    Do note that at least the zsh (but not ash, dash, or bash) shell has the
    problem of "caching" the limits set by the ulimit command, so when running
    zsh the fix will not immediately be evident - after entering "ulimit -t 0",
    "ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual
    limit as returned by getrlimit(...) will be 1. It can be verified by
    opening a subshell (which will not have the values of the parent shell in
    cache) and checking in it, or just by running a CPU intensive command like
    "echo '65536^1048576' | bc" and verifying that it dumps core after one
    second.

    Regardless of whether that is a misfeature in the shell, perhaps it would
    be better to return -EINVAL from setrlimit in such a case instead of
    cheating and setting to 1, as that does not really reflect the actual state
    of the process anymore. I do not however know what the ground for that
    decision was in the original 2.6.17 change, and whether there would be any
    "backward" compatibility issues, so I preferred not to touch that right
    now.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Alsberg
     

08 May, 2007

1 commit

  • Remove software_suspend() and all its users since
    pm_suspend(PM_SUSPEND_DISK) should be equivalent and there's no point in
    having two interfaces for the same thing.

    The patch also changes the valid_state function to return 0 (false) for
    PM_SUSPEND_DISK when SOFTWARE_SUSPEND is not configured instead of
    accepting it and having the whole thing fail later.

    Signed-off-by: Johannes Berg
    Acked-by: "Rafael J. Wysocki"
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Berg
     

13 Feb, 2007

2 commits

  • There isn't any real advantage to this change except that it allows the old
    functions to be removed. Which is easier on maintenance and puts the code in
    a more uniform style.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Of kernel subsystems that work with pids the tty layer is probably the largest
    consumer. But it has the nice virtue that the association with a session only
    lasts until the session leader exits. Which means that no reference counting
    is required. So using struct pid winds up being a simple optimization to
    avoid hash table lookups.

    In the long term the use of pid_nr also ensures that when we have multiple pid
    spaces mixed everything will work correctly.

    Signed-off-by: Eric W. Biederman
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

12 Feb, 2007

1 commit

  • A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
    source files, including:

    * make multi-line initial descriptions single line
    * denote some function names, constants and structs as such
    * change erroneous opening '/*' to '/**' in a few places
    * reword some text for clarity

    Signed-off-by: Robert P. J. Day
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

24 Jan, 2007

1 commit

  • while lock-profiling the -rt kernel i noticed weird contention during
    mmap-intense workloads, and the tracer showed the following gem, in one
    of our MM hotpaths:

    threaded-2771 1.... 65us : sys_munmap (sysenter_do_call)
    threaded-2771 1.... 66us : profile_munmap (sys_munmap)
    threaded-2771 1.... 66us : blocking_notifier_call_chain (profile_munmap)
    threaded-2771 1.... 66us : rt_down_read (blocking_notifier_call_chain)

    ouch! a global rw-semaphore taken in one of the most performance-
    sensitive codepaths of the kernel. And i dont even have oprofile
    enabled! All distro kernels have CONFIG_PROFILING enabled, so this
    scalability problem affects the majority of Linux users.

    The fix is to enhance blocking_notifier_call_chain() to only take the
    lock if there appears to be work on the call-chain.
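    A single-threaded model of that fast path. The lock here is a stand-in
    that only counts acquisitions, so the sketch shows the control flow and
    nothing about real synchronization; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct callback {
    int (*fn)(void *data);
    struct callback *next;
};

/* Stand-in for an rw-semaphore: it only records how often it was taken. */
struct toy_rwsem {
    unsigned long read_acquisitions;
    int readers;
};

struct blocking_chain {
    struct toy_rwsem lock;
    struct callback *head;
};

static void toy_down_read(struct toy_rwsem *sem)
{
    sem->read_acquisitions++;
    sem->readers++;
}

static void toy_up_read(struct toy_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* Call every callback on the chain; skip the lock when the chain looks
 * empty. A real implementation needs a genuine rw-semaphore here. */
static int chain_call(struct blocking_chain *chain, void *data)
{
    int ret = 0;
    struct callback *cb;

    if (chain->head == NULL)    /* fast path: nothing registered */
        return 0;

    toy_down_read(&chain->lock);
    for (cb = chain->head; cb != NULL; cb = cb->next)
        ret = cb->fn(data);
    toy_up_read(&chain->lock);
    return ret;
}

/* Example callback: increments the int behind data. */
static int bump(void *data)
{
    (*(int *)data)++;
    return 0;
}
```

    The unlocked check can race with a concurrent registration, but a caller
    that loses that race behaves as if it had run just before the
    registration, which is why only an apparently empty chain is skipped.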

    With this patch applied i get a nicely saturated system, and much higher
    munmap performance, on SMP systems.

    And as a bonus this also fixes a similar scalability bottleneck in the
    thread-exit codepath: profile_task_exit() ...

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

09 Dec, 2006

3 commits

  • All tasks in the process group have the same sid, we don't need to iterate
    them all to check that the caller of sys_setpgid() doesn't change its
    session.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Replace occurrences of task->signal->session by a new process_session() helper
    routine.

    It will be useful for pid namespaces to abstract the session pid number.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Fix the locking of signal->tty.

    Use ->sighand->siglock to protect ->signal->tty; this lock is already used
    by most other members of ->signal/->sighand. And unless we are 'current'
    or the tasklist_lock is held we need ->siglock to access ->signal anyway.

    (NOTE: sys_unshare() is broken wrt ->sighand locking rules)

    Note that tty_mutex is held over tty destruction, so while holding
    tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
    is governed by their open file handles. This leaves some holes for tty
    access from signal->tty (or any other non file related tty access).

    It solves the tty SLAB scribbles we were seeing.

    (NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
    be examined by someone familiar with the security framework, I think
    it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
    invocations)

    [schwidefsky@de.ibm.com: 3270 fix]
    [akpm@osdl.org: various post-viro fixes]
    Signed-off-by: Peter Zijlstra
    Acked-by: Alan Cox
    Cc: Oleg Nesterov
    Cc: Prarit Bhargava
    Cc: Chris Wright
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Martin Schwidefsky
    Cc: Jan Kara
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Dec, 2006

1 commit


22 Nov, 2006

1 commit

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.
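    A userspace sketch of the pattern: the work function receives only the
    work_struct pointer and recovers its enclosing structure with
    container_of(). The macro is a simplified version built on offsetof();
    struct my_device, my_device_work() and run_work() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in for the kernel's work_struct. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

/* A driver-style structure that embeds its work item. */
struct my_device {
    int id;
    int events_handled;
    struct work_struct work;
};

/* The work function gets the work_struct, not a separate context pointer. */
static void my_device_work(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device, work);

    dev->events_handled++;
}

/* Stand-in for the work queue executor. */
static void run_work(struct work_struct *work)
{
    assert(work != NULL && work->func != NULL);
    work->func(work);
}
```

    Embedding the work item removes the need for a separate data pointer in
    every work_struct, at the cost of tying the work item's lifetime to its
    container.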

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells
     

04 Oct, 2006

2 commits

  • Currently the init_srcu_struct() routine has no way to report out-of-memory
    errors. This patch (as761) makes it return -ENOMEM when the per-cpu data
    allocation fails.

    The patch also makes srcu_init_notifier_head() report a BUG if a notifier
    head can't be initialized. Perhaps it should return -ENOMEM instead, but
    in the most likely cases where this might occur I don't think any recovery
    is possible. Notifier chains generally are not created dynamically.

    [akpm@osdl.org: avoid statement-with-side-effect in macro]
    Signed-off-by: Alan Stern
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Stern
     
  • This patch (as751) adds a new type of notifier chain, based on the SRCU
    (Sleepable Read-Copy Update) primitives recently added to the kernel. An
    SRCU notifier chain is much like a blocking notifier chain, in that it must
    be called in process context and its callout routines are allowed to sleep.
    The difference is that the chain's links are protected by the SRCU
    mechanism rather than by an rw-semaphore, so calling the chain has
    extremely low overhead: no memory barriers and no cache-line bouncing. On
    the other hand, unregistering from the chain is expensive and the chain
    head requires special runtime initialization (plus cleanup if it is to be
    deallocated).

    SRCU notifiers are appropriate for notifiers that will be called very
    frequently and for which unregistration occurs very seldom. The proposed
    "task notifier" scheme qualifies, as may some of the network notifiers.

    Signed-off-by: Alan Stern
    Acked-by: Paul E. McKenney
    Acked-by: Chandra Seetharaman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Stern
     

02 Oct, 2006

3 commits

  • There are a few places in the kernel where the init task is signaled. The
    ctrl+alt+del sequence is one of them. It kills a task, usually init, using a
    cached pid (cad_pid).

    This patch replaces the pid_t by a struct pid to avoid the pid wrap-around
    problem. The struct pid is initialized at boot time in init() and can be
    modified through sysctl with

    /proc/sys/kernel/cad_pid

    [ I haven't found any distro using it ? ]

    It also introduces a small helper routine kill_cad_pid() which is used
    where it seemed ok to use cad_pid instead of pid 1.

    [akpm@osdl.org: cleanups, build fix]
    Signed-off-by: Cedric Le Goater
    Cc: Eric W. Biederman
    Cc: Martin Schwidefsky
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Replace references to system_utsname with the per-process uts namespace
    where appropriate. This includes things like uname.

    Changes: Per Eric Biederman's comments, use the per-process uts namespace
    for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

    [jdike@addtoit.com: UML fix]
    [clg@fr.ibm.com: cleanup]
    [akpm@osdl.org: build fix]
    Signed-off-by: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Cedric Le Goater
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • When a kprobe is re-entered, the re-entered kprobe kernel path will call the
    atomic_notifier_call_chain function; if this function is kprobed, that will
    incur numerous recursive kprobe faults. This patch disallows kprobes on the
    atomic_notifier_call_chain function.

    Signed-off-by: bibo, mao
    Signed-off-by: Ananth N Mavinakayanahalli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    bibo,mao