26 Dec, 2011

1 commit

  • * pm-sleep: (51 commits)
    PM: Drop generic_subsys_pm_ops
    PM / Sleep: Remove forward-only callbacks from AMBA bus type
    PM / Sleep: Remove forward-only callbacks from platform bus type
    PM: Run the driver callback directly if the subsystem one is not there
    PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
    PM / Sleep: Merge internal functions in generic_ops.c
    PM / Sleep: Simplify generic system suspend callbacks
    PM / Hibernate: Remove deprecated hibernation snapshot ioctls
    PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
    PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
    PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
    PM / Sleep: Make [un]lock_system_sleep() generic
    PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
    PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
    PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
    Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
    PM / Sleep: Unify diagnostic messages from device suspend/resume
    ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
    PM / Hibernate: Remove deprecated hibernation test modes
    PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
    ...

    Conflicts:
    kernel/kmod.c

    Rafael J. Wysocki
     

10 Dec, 2011

1 commit

  • Commit a144c6a (PM: Print a warning if firmware is requested when tasks
    are frozen) introduced usermodehelper_is_disabled() to warn and exit
    immediately if firmware is requested when usermodehelpers are disabled.

    However, it is racy. Consider the following scenario, currently used in
    drivers/base/firmware_class.c:

    ...
    if (usermodehelper_is_disabled())
            goto out;

    /* Do actual work */
    ...

    out:
            return err;

    Nothing prevents someone from disabling usermodehelpers just after the check
    in the 'if' condition, which means that it is quite possible to try doing the
    "actual work" with usermodehelpers disabled, leading to undesirable
    consequences.

    In particular, this race condition in _request_firmware() causes task freezing
    failures whenever suspend/hibernation is in progress, because it wrongly waits
    to get the firmware/microcode image from userspace when the usermodehelpers
    are actually disabled or userspace has been frozen.

    Example scenarios that cause freezing failures due to this race are those
    that depend on userspace via request_firmware(), such as x86 microcode module
    initialization and microcode image reload.

    Previous discussions about this issue can be found at:
    http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

    This patch adds proper synchronization to fix this issue.

    Note that this patchset fixes the freezing failures but doesn't remove the
    warnings. IOW, it does not attempt to add explicit synchronization to the x86
    microcode driver to avoid requesting the microcode image at inopportune
    moments, because the warnings were introduced to highlight such cases in the
    first place. We need not silence the warnings, since we take care of the
    *real* problem (freezing failure), after which the warnings are pretty
    harmless anyway.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
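
The check-then-act race described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `umh_gate` structure and the `gate_*` functions are hypothetical names, the model is single-threaded, and it is not the kernel's implementation. The point it shows is that the caller must stay registered for the whole duration of the "actual work", so that disabling cannot slip in between the check and the work.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical single-threaded model; not the kernel's code. */
struct umh_gate {
	int disabled;	/* nonzero once helpers are disabled */
	int users;	/* callers currently inside the "actual work" */
};

/* Enter the protected region, or fail if helpers are already disabled. */
int gate_enter(struct umh_gate *g)
{
	if (g->disabled)
		return -EBUSY;
	g->users++;
	return 0;
}

/* Leave the protected region once the work is finished. */
void gate_exit(struct umh_gate *g)
{
	g->users--;
}

/* Disabling is refused while any caller is still inside the region. */
int gate_disable(struct umh_gate *g)
{
	if (g->users > 0)
		return -EAGAIN;
	g->disabled = 1;
	return 0;
}
```

A caller would bracket its work with gate_enter() and gate_exit(), and bail out when gate_enter() fails. In a real multi-threaded setting the two fields would need a lock or a reader/writer primitive around them.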
     

24 Nov, 2011

1 commit

  • usermodehelper_pm_callback() no longer exists in the kernel. There are 2
    comments in kernel/kmod.c that still refer to it.

    Also, the patch that introduced usermodehelper_pm_callback() #included two
    header files that are no longer necessary.

    This patch updates the comments as appropriate and removes the unnecessary
    header file inclusions.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

26 Oct, 2011

1 commit

  • Due to the post-increment in the kmod_loop_msg condition in __request_module(),
    the system log can be spammed by many more than 5 instances of the 'runaway
    loop' message if the number of events triggering it causes kmod_loop_msg to
    overflow.

    Fix that by making sure we never increment it past the threshold.

    Signed-off-by: Jiri Kosina
    Signed-off-by: Rusty Russell
    CC: stable@kernel.org

    Jiri Kosina
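
The counter problem described above can be reproduced in user space. The sketch below is an illustration, not the kernel code: it assumes an 8-bit counter so that the wraparound happens quickly and is well defined, and the names `should_warn_buggy`, `should_warn_fixed`, `count_warnings` and `MAX_KMOD_LOOP_MSGS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_KMOD_LOOP_MSGS 5	/* hypothetical name for the threshold */

/* Buggy form: the counter advances on every call, so it eventually wraps. */
int should_warn_buggy(uint8_t *counter)
{
	return (*counter)++ < MAX_KMOD_LOOP_MSGS;
}

/* Fixed form: the counter stops advancing once the threshold is reached. */
int should_warn_fixed(uint8_t *counter)
{
	if (*counter < MAX_KMOD_LOOP_MSGS) {
		(*counter)++;
		return 1;
	}
	return 0;
}

/* Count how many warnings a run of 'events' triggering events would emit. */
int count_warnings(int (*should_warn)(uint8_t *), int events)
{
	uint8_t counter = 0;
	int warnings = 0;

	for (int i = 0; i < events; i++)
		warnings += should_warn(&counter);
	return warnings;
}
```

With 300 events, the buggy form warns again after the 8-bit counter wraps past 255, while the fixed form stays at the threshold.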
     

04 Aug, 2011

1 commit

  • The core device layer sends tons of uevent notifications for each device
    it finds, and if the kernel has been built with a non-empty
    CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
    helper binary for all these events very early in the boot.

    Not only won't the root filesystem even be mounted at that point, we
    literally won't have necessarily even initialized all the process
    handling data structures at that point, which causes no end of silly
    problems even when the usermode helper doesn't actually succeed in
    executing.

    So just use our existing infrastructure to disable the usermodehelpers
    to make the kernel start out with them disabled. We enable them when
    we've at least initialized stuff a bit.

    Problems related to an uninitialized

    init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

    reported by various people.

    Reported-by: Manuel Lauss
    Reported-by: Richard Weinberger
    Reported-by: Marc Zyngier
    Acked-by: Kay Sievers
    Cc: Andrew Morton
    Cc: Vasiliy Kulikov
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jun, 2011

1 commit

  • ____call_usermodehelper() now erases any credentials set by the
    subprocess_info::init() function. The problem is that commit
    17f60a7da150 ("capabilites: allow the application of capability limits
    to usermode helpers") creates and commits new credentials with
    prepare_kernel_cred() after the call to the init() function. This wipes
    all keyrings after umh_keys_init() is called.

    The best way to deal with this is to put the init() call just prior to
    the commit_creds() call, and pass the cred pointer to init(). That
    means that umh_keys_init() and suchlike can modify the credentials
    _before_ they are published and potentially in use by the rest of the
    system.

    This prevents request_key() from working as it is prevented from passing
    the session keyring it set up with the authorisation token to
    /sbin/request-key, and so the latter can't assume the authority to
    instantiate the key. This causes the in-kernel DNS resolver to fail
    with ENOKEY unconditionally.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
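
The ordering issue above can be modelled in user space. This is a heavily reduced sketch, assuming a one-field `struct cred` and hypothetical names (`helper_info`, `install_keyring`, `run_helper_*`); it only shows why an init hook must operate on the new credentials before they are committed, rather than on the ones about to be replaced.

```c
#include <assert.h>
#include <stddef.h>

struct cred { int session_keyring; };	/* hypothetical, heavily reduced */

struct helper_info {
	int (*init)(struct helper_info *info, struct cred *new);
	int keyring;			/* value the init hook installs */
};

static struct cred current_cred;	/* stands in for the task's creds */

/* Example init hook: install the caller's keyring into the given creds. */
int install_keyring(struct helper_info *info, struct cred *new)
{
	new->session_keyring = info->keyring;
	return 0;
}

/* Broken order: init() touches the old creds, then fresh creds replace them. */
int run_helper_init_first(struct helper_info *info)
{
	struct cred new = { 0 };
	int ret = info->init ? info->init(info, &current_cred) : 0;

	if (ret)
		return ret;
	current_cred = new;		/* "commit": the hook's change is lost */
	return 0;
}

/* Fixed order: init() modifies the new creds just before they are committed. */
int run_helper_init_before_commit(struct helper_info *info)
{
	struct cred new = { 0 };
	int ret = info->init ? info->init(info, &new) : 0;

	if (ret)
		return ret;
	current_cred = new;		/* "commit": the hook's change survives */
	return 0;
}

int current_session_keyring(void)
{
	return current_cred.session_keyring;
}
```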
     

24 May, 2011

1 commit


18 May, 2011

2 commits

  • We need to prevent kernel-forked processes during system poweroff.
    Such processes try to access the filesystem whose disks we are
    trying to shut down at the same time. This causes delays and exceptions
    in the storage drivers.

    A follow-up patch will add these calls and needs usermodehelper_disable()
    also on systems without suspend support.

    Signed-off-by: Kay Sievers
    Signed-off-by: Rafael J. Wysocki

    Kay Sievers
     
  • Some drivers erroneously use request_firmware() from their ->resume()
    (or ->thaw(), or ->restore()) callbacks, which is not going to work
    unless the firmware has been built in. This causes system resume to
    stall until the firmware-loading timeout expires, which makes users
    think that the resume has failed and reboot their machines
    unnecessarily. For this reason, make _request_firmware() print a
    warning and return immediately with error code if it has been called
    when tasks are frozen and it's impossible to start any new usermode
    helpers.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Valdis Kletnieks

    Rafael J. Wysocki
     

04 Apr, 2011

1 commit

  • There is no way to limit the capabilities of usermodehelpers. This problem
    reared its head recently when someone complained that any user with
    cap_net_admin was able to load arbitrary kernel modules, even though the user
    didn't have cap_sys_module. The reason is that the actual load is done by
    a usermode helper and those always have the full cap set. This patch adds new
    sysctls which allow us to bound the permissions of usermode helpers.

    /proc/sys/kernel/usermodehelper/bset
    /proc/sys/kernel/usermodehelper/inheritable

    You must have CAP_SYS_MODULE and CAP_SETPCAP to change these (changes are
    &= ONLY). When the kernel launches a usermodehelper it will do so with these
    as the bset and pI.

    -v2: make globals static
    create spinlock to protect globals

    -v3: require both CAP_SETPCAP and CAP_SYS_MODULE
    -v4: fix the typo s/CAP_SET_PCAP/CAP_SETPCAP/ because I didn't commit
    Signed-off-by: Eric Paris
    No-objection-from: Serge E. Hallyn
    Acked-by: David Howells
    Acked-by: Serge E. Hallyn
    Acked-by: Andrew G. Morgan
    Signed-off-by: James Morris

    Eric Paris
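
The "changes are &= ONLY" rule above means a write can drop bits from a mask but never restore them. A minimal sketch of that rule follows, assuming a plain 64-bit mask; the type `cap_mask_t` and the function `umh_mask_write` are hypothetical and do not reflect the kernel's capability types or sysctl handlers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit capability mask; bit positions are illustrative only. */
typedef uint64_t cap_mask_t;

/* Apply a write to a bounding mask: bits can be dropped, never added back. */
cap_mask_t umh_mask_write(cap_mask_t current, cap_mask_t requested)
{
	return current & requested;
}
```

Writing a narrower mask shrinks the set; writing a wider mask afterwards leaves it unchanged.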
     

18 Aug, 2010

1 commit

  • Make do_execve() take a const filename pointer so that kernel_execve() compiles
    correctly on ARM:

    arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

    This also requires the argv and envp arguments to be consted twice, once for
    the pointer array and once for the strings the array points to. This is
    because do_execve() passes a pointer to the filename (now const) to
    copy_strings_kernel(). A simpler alternative would be to cast the filename
    pointer in do_execve() when it's passed to copy_strings_kernel().

    do_execve() may not change any of the strings it is passed as part of the argv
    or envp lists as some of them are in .rodata, so marking these strings as
    const should be fine.

    Further, kernel_execve() and sys_execve() need to be changed to match.

    This has been test built on x86_64, frv, arm and mips.

    Signed-off-by: David Howells
    Tested-by: Ralf Baechle
    Acked-by: Russell King
    Signed-off-by: Linus Torvalds

    David Howells
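
The "consted twice" wording above refers to a declaration such as `const char *const argv[]`, where both the pointer array and the strings are read-only. The sketch below is a user-space illustration with hypothetical names (`count_args`, `helper_argv`, `helper_argc`); it is not the kernel's do_execve() prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exec-style entry point: it only reads its arguments. */
size_t count_args(const char *filename, const char *const argv[])
{
	size_t n = 0;

	(void)filename;
	while (argv[n] != NULL)
		n++;
	return n;
}

/* Read-only argument vector, the kind of data that may live in .rodata. */
static const char *const helper_argv[] = { "/sbin/modprobe", "-q", NULL };

size_t helper_argc(void)
{
	return count_args(helper_argv[0], helper_argv);
}
```

Because the parameter is const at both levels, the fully read-only array can be passed without a cast and without a "discards qualifiers" warning.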
     

28 May, 2010

8 commits

  • UMH_WAIT_EXEC should report the error if kernel_thread() fails, like
    UMH_WAIT_PROC does.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • __call_usermodehelper(UMH_NO_WAIT) has 2 problems:

    - if kernel_thread() fails, call_usermodehelper_freeinfo()
    is not called.

    - for unknown reason UMH_NO_WAIT has UMH_WAIT_PROC logic,
    we spawn yet another thread which waits until the user
    mode application exits.

    Change the UMH_NO_WAIT code to use ____call_usermodehelper() instead of
    wait_for_helper(), and do call_usermodehelper_freeinfo() unconditionally.
    We can rely on CLONE_VFORK: do_fork(CLONE_VFORK) waits until the child
    exits or execs.

    With or without this patch, UMH_NO_WAIT does not report the error if
    kernel_thread() fails. This is correct since the caller doesn't wait for
    the result.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. wait_for_helper() calls allow_signal(SIGCHLD) to ensure the child
    can't autoreap itself.

    However, this means that a spurious SIGCHLD from user-space can
    set TIF_SIGPENDING and:

    - kernel_thread() or sys_wait4() can fail due to signal_pending()

    - worse, wait4() can fail before ____call_usermodehelper() execs
    or exits. In this case the caller may kfree(subprocess_info)
    while the child still uses this memory.

    Change the code to use SIG_DFL instead of magic "(void __user *)2"
    set by allow_signal(). This means that SIGCHLD won't be delivered,
    yet the child won't autoreap itself.

    The problem is minor, only root can send a signal to this kthread.

    2. If sys_wait4(&ret) fails it doesn't populate "ret", in this case
    wait_for_helper() reports a random value from uninitialized var.

    With this patch sys_wait4() should never fail, but still it makes
    sense to initialize ret = -ECHILD so that the caller can notice
    the problem.

    Signed-off-by: Oleg Nesterov
    Acked-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • ____call_usermodehelper() correctly calls flush_signal_handlers() to set
    SIG_DFL, but sigemptyset(->blocked) and recalc_sigpending() are not
    needed.

    This kthread was forked by workqueue thread, all signals must be unblocked
    and ignored, no pending signal is possible.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that nobody ever changes subprocess_info->cred we can kill this member
    and related code. ____call_usermodehelper() always runs in the context of
    freshly forked kernel thread, it has the proper ->cred copied from its
    parent kthread, keventd.

    Signed-off-by: Oleg Nesterov
    Acked-by: Neil Horman
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
    subprocess_info->cred in advance. Now that we have info->init() we can
    change this code to set tgcred->session_keyring in context of execing
    kernel thread.

    Note: since currently call_usermodehelper_keys() is never called with
    UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
    are not really needed, we could rely on install_session_keyring_to_cred()
    which does key_get() on success.

    Signed-off-by: Oleg Nesterov
    Acked-by: Neil Horman
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The first patch in this series introduced an init function to the
    call_usermodehelper api so that processes could be customized by the caller.
    This patch takes advantage of that fact, by customizing the helper in
    do_coredump to create the pipe and set its core limit to one (for our
    recursion check). This lets us clean up the previous ugliness in the
    usermodehelper internals and factor call_usermodehelper out entirely.
    While I'm at it, we can also modify the helper setup to look for a core
    limit value of 1 rather than zero for our recursion check.

    Signed-off-by: Neil Horman
    Reviewed-by: Oleg Nesterov
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
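
The recursion check described above can be pictured as follows. This is a sketch with hypothetical names (`struct task`, `mark_core_helper`, `is_recursive_dump`, `CORE_LIMIT_HELPER`): the init hook marks the helper with a core limit of 1, and a later crash of a task carrying that marker is treated as recursion.

```c
#include <assert.h>

#define CORE_LIMIT_HELPER 1UL	/* marker value the recursion check looks for */

struct task {
	unsigned long core_limit;	/* hypothetical stand-in for the core ulimit */
};

/* Init hook: mark the helper before it execs the user space program. */
int mark_core_helper(struct task *helper)
{
	helper->core_limit = CORE_LIMIT_HELPER;
	return 0;
}

/* Returns nonzero if piping a dump for this task must be refused. */
int is_recursive_dump(const struct task *crashing)
{
	return crashing->core_limit == CORE_LIMIT_HELPER;
}
```

Since the kernel side sets the marker, the collector program does not have to opt in.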
     
  • About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
    feature in the kernel works. We had reports of several races, including
    some reports of apps bypassing our recursion check so that a process that
    was forked as part of a core_pattern setup could infinitely crash and
    refork until the system crashed.

    We fixed those by improving our recursion checks. The new check basically
    refuses to fork a process if its core limit is zero, which works well.

    Unfortunately, I've been getting grief from maintainers of user space
    programs that are inserted as the forked process of core_pattern. They
    contend that in order for their programs (such as abrt and apport) to
    work, all the running processes in a system must have their core limits
    set to a non-zero value, to which I say 'yes'. I did this by design, and
    think that's the right way to do things.

    But I've been asked to ease this burden on user space enough times that I
    thought I would take a look at it. The first suggestion was to make the
    recursion check fail on a non-zero 'special' number, like one. That way
    the core collector process could set its core size ulimit to 1, and enable
    the kernel's recursion detection. This isn't a bad idea on the surface,
    but I don't like it since it's opt-in, in that if a program like abrt or
    apport has a bug and fails to set such a core limit, we're left with a
    recursively crashing system again.

    So I've come up with this. What I've done is modify the
    call_usermodehelper api such that an extra parameter is added, a function
    pointer which will be called by the user helper task, after it forks, but
    before it exec's the required process. This will give the caller the
    opportunity to get a callback in the process's context, allowing it to do
    whatever it needs to do to the process in the kernel prior to exec-ing the
    user space code. In the case of do_coredump, this callback is used to set
    the core ulimit of the helper process to 1. This eliminates the opt-in
    problem that I had above, as it allows the ulimit for core sizes to be set
    to the value of 1, which is what the recursion check looks for in
    do_coredump.

    This patch:

    Create new function call_usermodehelper_fns() and allow it to assign both
    an init and cleanup function, as well as arbitrary data.

    The init function is called from the context of the forked process and
    allows for customization of the helper process prior to calling exec. Its
    return code gates the continuation of the process, or causes its exit.
    Also add an arbitrary data pointer to the subprocess_info struct allowing
    for data to be passed from the caller to the new process, and the
    subsequent cleanup process.

    Also, use this patch to clean up the cleanup function. It currently takes
    an argp and envp pointer for freeing, which is ugly. Let's instead just
    make the subprocess_info structure public, and pass that to the cleanup
    and init routines.

    Signed-off-by: Neil Horman
    Reviewed-by: Oleg Nesterov
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
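
The init/cleanup/data arrangement described above can be sketched in user space. The structure below borrows the name subprocess_info from the text, but its layout, the `executed` field, and the functions `run_helper`, `refuse_init` and `note_cleanup` are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct subprocess_info {
	int  (*init)(struct subprocess_info *info);	/* runs before "exec" */
	void (*cleanup)(struct subprocess_info *info);	/* runs when done */
	void *data;					/* caller-supplied data */
	int   executed;		/* hypothetical: records that "exec" happened */
};

/* Run the helper: init gates the exec step, cleanup always runs. */
int run_helper(struct subprocess_info *info)
{
	int ret = 0;

	if (info->init)
		ret = info->init(info);
	if (ret == 0)
		info->executed = 1;	/* stands in for exec-ing the program */
	if (info->cleanup)
		info->cleanup(info);
	return ret;
}

/* Example init hook that vetoes the helper. */
int refuse_init(struct subprocess_info *info)
{
	(void)info;
	return -EPERM;
}

/* Example cleanup hook: count invocations through the data pointer. */
void note_cleanup(struct subprocess_info *info)
{
	++*(int *)info->data;
}
```

Passing the whole structure to both hooks lets them reach the shared data pointer without extra parameters.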
     

12 Jan, 2010

1 commit

  • Fix resource (write-pipe file) leak in call_usermodehelper_pipe().

    When call_usermodehelper_exec() fails, the write-pipe file has already been
    opened, but call_usermodehelper_pipe() just returns an error. Since it is
    hard for the caller to determine whether the error occurred when opening the
    pipe or executing the helper, the caller cannot close the pipe itself.

    I found this resource leak when testing coredump. You can check how
    the resource leaks as follows:

    $ echo "|nocommand" > /proc/sys/kernel/core_pattern
    $ ulimit -c unlimited
    $ while [ 1 ]; do ./segv; done &> /dev/null &
    $ cat /proc/meminfo (
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
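
The ownership rule behind this fix can be shown with ordinary POSIX pipes. This is a user-space sketch, not the kernel code: `helper_with_pipe` and `start_helper` are hypothetical names, and the launcher here always fails in order to exercise the error path. The function that opened the pipe closes both ends when the helper cannot be started, so the caller never has to guess.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical launcher that always fails, standing in for a failed exec. */
int start_helper(int read_fd)
{
	(void)read_fd;
	return -ENOENT;
}

/*
 * Open a pipe and start a helper reading from it. On success the caller
 * owns *write_fd and the helper owns the read end. On failure both ends
 * are closed here and *write_fd is set to -1.
 */
int helper_with_pipe(int (*start)(int read_fd), int *write_fd)
{
	int fds[2];
	int ret;

	if (pipe(fds) < 0)
		return -errno;

	ret = start(fds[0]);
	if (ret < 0) {
		close(fds[0]);
		close(fds[1]);
		*write_fd = -1;
		return ret;
	}

	*write_fd = fds[1];
	return 0;
}
```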
     

10 Nov, 2009

1 commit

  • For SELinux to do better filtering in userspace we send the name of the
    module along with the AVC denial when a program is denied module_request.

    Example output:

    type=SYSCALL msg=audit(11/03/2009 10:59:43.510:9) : arch=x86_64 syscall=write success=yes exit=2 a0=3 a1=7fc28c0d56c0 a2=2 a3=7fffca0d7440 items=0 ppid=1727 pid=1729 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.nfsd exe=/usr/sbin/rpc.nfsd subj=system_u:system_r:nfsd_t:s0 key=(null)
    type=AVC msg=audit(11/03/2009 10:59:43.510:9) : avc: denied { module_request } for pid=1729 comm=rpc.nfsd kmod="net-pf-10" scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     

12 Sep, 2009

1 commit

  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
    ring-buffer: only enable ring_buffer_swap_cpu when needed
    ring-buffer: check for swapped buffers in start of committing
    tracing: report error in trace if we fail to swap latency buffer
    tracing: add trace_array_printk for internal tracers to use
    tracing: pass around ring buffer instead of tracer
    tracing: make tracing_reset safe for external use
    tracing: use timestamp to determine start of latency traces
    tracing: Remove mentioning of legacy latency_trace file from documentation
    tracing/filters: Defer pred allocation, fix memory leak
    tracing: remove users of tracing_reset
    tracing: disable buffers and synchronize_sched before resetting
    tracing: disable update max tracer while reading trace
    tracing: print out start and stop in latency traces
    ring-buffer: disable all cpu buffers when one finds a problem
    ring-buffer: do not count discarded events
    ring-buffer: remove ring_buffer_event_discard
    ring-buffer: fix ring_buffer_read crossing pages
    ring-buffer: remove unnecessary cpu_relax
    ring-buffer: do not swap buffers during a commit
    ring-buffer: do not reset while in a commit
    ...

    Linus Torvalds
     

02 Sep, 2009

1 commit

  • Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
    for credential management. The additional code keeps track of the number of
    pointers from task_structs to any given cred struct, and checks to see that
    this number never exceeds the usage count of the cred struct (which includes
    all references, not just those from task_structs).

    Furthermore, if SELinux is enabled, the code also checks that the security
    pointer in the cred struct is never seen to be invalid.

    This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
    kernel thread on seeing cred->security be a NULL pointer (it appears that the
    credential struct has been previously released):

    http://www.kerneloops.org/oops.php?number=252883

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

17 Aug, 2009

1 commit

  • Add trace points to trace module_load, module_free, module_get,
    module_put and module_request, and use trace_event facility to
    get the trace output.

    Here's the sample output:

    TASK-PID CPU# TIMESTAMP FUNCTION
    | | | | |
    -42 [000] 1.758380: module_request: fb0 wait=1 call_site=fb_open
    ...
    -60 [000] 3.269403: module_load: scsi_wait_scan
    -60 [000] 3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
    -61 [001] 3.273168: module_free: scsi_wait_scan
    ...
    -1021 [000] 13.836081: module_load: sunrpc
    -1021 [000] 13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
    -1027 [000] 13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
    -1027 [000] 13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
    -1027 [000] 13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
    ...
    modprobe-2587 [001] 1088.437213: module_load: trace_events_sample F
    modprobe-2587 [001] 1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0

    Note:

    - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0

    - the module refcnt is percpu, so it can be negative in a
    specific cpu

    Signed-off-by: Li Zefan
    Acked-by: Rusty Russell
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

14 Aug, 2009

1 commit

  • Calling request_module() will trigger a userspace upcall which will load a
    new module into the kernel. This can be a dangerous event if the process
    able to trigger request_module() is able to control either the modprobe
    binary or the module binary. This patch adds a new security hook to
    request_module() which can be used by an LSM to control a process's ability
    to call request_module().

    Signed-off-by: Eric Paris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Eric Paris
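
A hook of the kind described above sits in front of the upcall and can veto it. The sketch below is a user-space model with hypothetical names (`module_request_hook`, `set_module_request_hook`, `request_module_checked`, `deny_net_pf`); passing the module name to the hook is an assumption made for the example, and nothing here reflects the actual LSM interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy hook: return 0 to allow, negative errno to deny. */
typedef int (*module_request_hook)(const char *kmod_name);

static module_request_hook active_hook;	/* NULL means no policy loaded */

void set_module_request_hook(module_request_hook hook)
{
	active_hook = hook;
}

/* Consult the hook first; only perform the "upcall" if it allows it. */
int request_module_checked(const char *name, int *upcalls)
{
	int ret = active_hook ? active_hook(name) : 0;

	if (ret)
		return ret;
	(*upcalls)++;		/* stands in for running modprobe */
	return 0;
}

/* Example policy: deny protocol-family aliases, allow everything else. */
int deny_net_pf(const char *name)
{
	return strncmp(name, "net-pf-", 7) == 0 ? -EPERM : 0;
}
```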
     

09 Jul, 2009

1 commit

  • Fix various silly problems wrt mnt_namespace.h:

    - exit_mnt_ns() isn't used, remove it
    - with that done, the sched.h and nsproxy.h inclusions aren't needed
    - the mount.h inclusion was needed for vfsmount_lock, but no longer is
    - remove the mnt_namespace.h inclusion from files which don't use anything
    from mnt_namespace.h

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

27 May, 2009

1 commit


31 Mar, 2009

1 commit

  • There seems to be a common pattern in the kernel where drivers want to
    call request_module() from inside a module_init() function. Currently
    this would deadlock.

    As a result, several drivers jump through hoops like scheduling things via
    kevent, or creating custom work queues (because kevent can deadlock on them).

    This patch changes this to use a request_module_nowait() function macro instead,
    which just fires the modprobe off but doesn't wait for it, and thus avoids the
    original deadlock entirely.

    On my laptop this already results in one less kernel thread running.

    (Includes Jiri's patch to use enum umh_wait)

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Rusty Russell (bool-ified)
    Cc: Jiri Slaby

    Arjan van de Ven
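
The "function macro" approach above can be sketched as two thin macros over one common function that takes a wait flag. This is a user-space illustration: `request_module_common` and the `last_request_*` accessors are hypothetical, and the stub only records its arguments instead of launching modprobe.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool last_wait;
static char last_name[64];

/* Hypothetical common entry point: records the request instead of upcalling. */
int request_module_common(bool wait, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf(last_name, sizeof(last_name), fmt, args);
	va_end(args);
	last_wait = wait;
	return 0;
}

/* Both spellings share one implementation and differ only in the flag. */
#define request_module(...)        request_module_common(true, __VA_ARGS__)
#define request_module_nowait(...) request_module_common(false, __VA_ARGS__)

bool last_request_waited(void)
{
	return last_wait;
}

const char *last_request_name(void)
{
	return last_name;
}
```

A caller in an init path would use request_module_nowait("net-pf-%d", 10) and carry on without blocking on the helper.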
     

30 Mar, 2009

1 commit

  • Impact: cleanup

    (Thanks to Al Viro for reminding me of this, via Ingo)

    CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

    #define CPU_MASK_ALL (cpumask_t) { { ... } }

    Taking the address of such a temporary is questionable at best.
    Unfortunately, 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
    CPU_MASK_ALL_PTR:

    #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

    Which formalizes this practice. One day gcc could bite us over this
    usage (though we seem to have gotten away with it so far).

    So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
    with the modern "cpu_all_mask" (a real const struct cpumask *).

    Signed-off-by: Rusty Russell
    Acked-by: Ingo Molnar
    Reported-by: Al Viro
    Cc: Mike Travis

    Rusty Russell
     

07 Jan, 2009

1 commit

  • Fix varargs kernel-doc format in kmod.c:
    Use @... instead of @varargs.

    Warning(kernel/kmod.c:67): Excess function parameter or struct member 'varargs' description in 'request_module'

    Signed-off-by: Randy Dunlap
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

14 Nov, 2008

2 commits

  • Inaugurate copy-on-write credentials management. This uses RCU to manage the
    credentials pointer in the task_struct with respect to accesses by other tasks.
    A process may only modify its own credentials, and so does not need locking to
    access or modify its own credentials.

    A mutex (cred_replace_mutex) is added to the task_struct to control the effect
    of PTRACE_ATTACHED on credential calculations, particularly with respect to
    execve().

    With this patch, the contents of an active credentials struct may not be
    changed directly; rather a new set of credentials must be prepared, modified
    and committed using something like the following sequence of events:

    struct cred *new = prepare_creds();
    int ret = blah(new);
    if (ret < 0) {
            abort_creds(new);
            return ret;
    }
    return commit_creds(new);

    There are some exceptions to this rule: the keyrings pointed to by the active
    credentials may be instantiated - keyrings violate the COW rule as managing
    COW keyrings is tricky, given that it is possible for a task to directly alter
    the keys in a keyring in use by another task.

    To help enforce this, various pointers to sets of credentials, such as those in
    the task_struct, are declared const. The purpose of this is compile-time
    discouragement of altering credentials through those pointers. Once a set of
    credentials has been made public through one of these pointers, it may not be
    modified, except under special circumstances:

    (1) Its reference count may be incremented and decremented.

    (2) The keyrings to which it points may be modified, but not replaced.

    The only safe way to modify anything else is to create a replacement and commit
    using the functions described in Documentation/credentials.txt (which will be
    added by a later patch).

    This patch and the preceding patches have been tested with the LTP SELinux
    testsuite.

    This patch makes several logical sets of alteration:

    (1) execve().

    This now prepares and commits credentials in various places in the
    security code rather than altering the current creds directly.

    (2) Temporary credential overrides.

    do_coredump() and sys_faccessat() now prepare their own credentials and
    temporarily override the ones currently on the acting thread, whilst
    preventing interference from other threads by holding cred_replace_mutex
    on the thread being dumped.

    This will be replaced in a future patch by something that hands down the
    credentials directly to the functions being called, rather than altering
    the task's objective credentials.

    (3) LSM interface.

    A number of functions have been changed, added or removed:

    (*) security_capset_check(), ->capset_check()
    (*) security_capset_set(), ->capset_set()

    Removed in favour of security_capset().

    (*) security_capset(), ->capset()

    New. This is passed a pointer to the new creds, a pointer to the old
    creds and the proposed capability sets. It should fill in the new
    creds or return an error. All pointers, barring the pointer to the
    new creds, are now const.

    (*) security_bprm_apply_creds(), ->bprm_apply_creds()

    Changed; now returns a value, which will cause the process to be
    killed if it's an error.

    (*) security_task_alloc(), ->task_alloc_security()

    Removed in favour of security_prepare_creds().

    (*) security_cred_free(), ->cred_free()

    New. Free security data attached to cred->security.

    (*) security_prepare_creds(), ->cred_prepare()

    New. Duplicate any security data attached to cred->security.

    (*) security_commit_creds(), ->cred_commit()

    New. Apply any security effects for the upcoming installation of new
    security by commit_creds().

    (*) security_task_post_setuid(), ->task_post_setuid()

    Removed in favour of security_task_fix_setuid().

    (*) security_task_fix_setuid(), ->task_fix_setuid()

    Fix up the proposed new credentials for setuid(). This is used by
    cap_set_fix_setuid() to implicitly adjust capabilities in line with
    setuid() changes. Changes are made to the new credentials, rather
    than the task itself as in security_task_post_setuid().

    (*) security_task_reparent_to_init(), ->task_reparent_to_init()

    Removed. Instead the task being reparented to init is referred
    directly to init's credentials.

    NOTE! This results in the loss of some state: SELinux's osid no
    longer records the sid of the thread that forked it.

    (*) security_key_alloc(), ->key_alloc()
    (*) security_key_permission(), ->key_permission()

    Changed. These now take cred pointers rather than task pointers to
    refer to the security context.

    (4) sys_capset().

    This has been simplified and uses less locking. The LSM functions it
    calls have been merged.

    (5) reparent_to_kthreadd().

    This gives the current thread the same credentials as init by simply using
    commit_thread() to point that way.

    (6) __sigqueue_alloc() and switch_uid()

    __sigqueue_alloc() can't stop the target task from changing its creds
    beneath it, so this function gets a reference to the currently applicable
    user_struct which it then passes into the sigqueue struct it returns if
    successful.

    switch_uid() is now called from commit_creds(), and possibly should be
    folded into that. commit_creds() should take care of protecting
    __sigqueue_alloc().

    (7) [sg]et[ug]id() and co and [sg]et_current_groups.

    The set functions now all use prepare_creds(), commit_creds() and
    abort_creds() to build and check a new set of credentials before applying
    it.

    security_task_set[ug]id() is called inside the prepared section. This
    guarantees that nothing else will affect the creds until we've finished.

    The calling of set_dumpable() has been moved into commit_creds().

    Much of the functionality of set_user() has been moved into
    commit_creds().

    The get functions all simply access the data directly.

    (8) security_task_prctl() and cap_task_prctl().

    security_task_prctl() has been modified to return -ENOSYS if it doesn't
    want to handle a function, or otherwise return the return value directly
    rather than through an argument.

    Additionally, cap_task_prctl() now prepares a new set of credentials, even
    if it doesn't end up using it.

    (9) Keyrings.

    A number of changes have been made to the keyrings code:

    (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
    all been dropped and built in to the credentials functions directly.
    They may want separating out again later.

    (b) key_alloc() and search_process_keyrings() now take a cred pointer
    rather than a task pointer to specify the security context.

    (c) copy_creds() gives a new thread within the same thread group a new
    thread keyring if its parent had one, otherwise it discards the thread
    keyring.

    (d) The authorisation key now points directly to the credentials to extend
    the search into rather than pointing to the task that carries them.

    (e) Installing thread, process or session keyrings causes a new set of
    credentials to be created, even though it's not strictly necessary for
    process or session keyrings (they're shared).

    (10) Usermode helper.

    The usermode helper code now carries a cred struct pointer in its
    subprocess_info struct instead of a new session keyring pointer. This set
    of credentials is derived from init_cred and installed on the new process
    after it has been cloned.

    call_usermodehelper_setup() allocates the new credentials and
    call_usermodehelper_freeinfo() discards them if they haven't been used. A
    special cred function (prepare_usermodeinfo_creds()) is provided
    specifically for call_usermodehelper_setup() to call.

    call_usermodehelper_setkeys() adjusts the credentials to sport the
    supplied keyring as the new session keyring.

    (11) SELinux.

    SELinux has a number of changes, in addition to those to support the LSM
    interface changes mentioned above:

    (a) selinux_setprocattr() no longer does its check for whether the
    current ptracer can access processes with the new SID inside the lock
    that covers getting the ptracer's SID. Whilst this lock ensures that
    the check is done with the ptracer pinned, the result is only valid
    until the lock is released, so there's no point doing it inside the
    lock.

    (12) is_single_threaded().

    This function has been extracted from selinux_setprocattr() and put into
    a file of its own in the lib/ directory as join_session_keyring() now
    wants to use it too.

    The code in SELinux just checked to see whether a task shared mm_structs
    with other tasks (CLONE_VM), but that isn't good enough. We really want
    to know if they're part of the same thread group (CLONE_THREAD).

    (13) nfsd.

    The NFS server daemon now has to use the COW credentials to set the
    credentials it is going to use. It really needs to pass the credentials
    down to the functions it calls, but it can't do that until other patches
    in this series have been applied.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Signed-off-by: James Morris

    David Howells
     
  • Alter the use of the key instantiation and negation functions' link-to-keyring
    arguments. Currently this specifies a keyring in the target process to link
    the key into, creating the keyring if it doesn't exist. This, however, can be
    a problem for copy-on-write credentials as it means that the instantiating
    process can alter the credentials of the requesting process.

    This patch alters the behaviour such that:

    (1) If keyctl_instantiate_key() or keyctl_negate_key() are given a specific
    keyring by ID (ringid >= 0), then that keyring will be used.

    (2) If keyctl_instantiate_key() or keyctl_negate_key() are given one of the
    special constants that refer to the requesting process's keyrings
    (KEY_SPEC_*_KEYRING, all [...]

    +-----------+        +--------------+        +--------------+
    |           |        |              |        |              |
    | Requestor |------->| Instantiator |------->| Instantiator |
    |           |        |              |        |              |
    +-----------+        +--------------+        +--------------+
               request_key()           request_key()

    This might be useful, for example, in Kerberos, where the requestor requests a
    ticket, and then the ticket instantiator requests the TGT, which someone else
    then has to go and fetch. The TGT, however, should be retained in the
    keyrings of the requestor, not the first instantiator. To make this
    explicit, an extra special keyring constant is also added.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Signed-off-by: James Morris

    David Howells
     

17 Oct, 2008

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    module: remove CONFIG_KMOD in comment after #endif
    remove CONFIG_KMOD from fs
    remove CONFIG_KMOD from drivers

    Manually fix conflict due to include cleanups in drivers/md/md.c

    Linus Torvalds
     
  • We currently use a PM notifier to disable user mode helpers before suspend
    and hibernation and to re-enable them during resume. However, this is not
    an ideal solution, because if any drivers want to upload firmware into
    memory before suspend, they have to use a PM notifier for this purpose and
    there is no guarantee that the ordering of PM notifiers will be as
    expected (ie. the notifier that disables user mode helpers has to be run
    after the driver's notifier used for uploading the firmware).

    For this reason, it seems better to move the disabling and enabling of
    user mode helpers to separate functions that will be called by the PM core
    as necessary.

    [akpm@linux-foundation.org: remove unneeded ifdefs]
    Signed-off-by: Rafael J. Wysocki
    Cc: Alan Stern
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

16 Oct, 2008

1 commit


26 Jul, 2008

1 commit

  • Presently call_usermodehelper_setup() uses GFP_ATOMIC, but with that mask
    it can return NULL _very_ easily.

    GFP_ATOMIC is needed only when we can't sleep; otherwise GFP_KERNEL is
    more robust.

    Thus, I add a gfp_mask argument to call_usermodehelper_setup().

    Its callers pass the gfp_t as follows:

    call_usermodehelper() and call_usermodehelper_keys():
        depends on the 'wait' argument.
    call_usermodehelper_pipe():
        always GFP_KERNEL, because it always runs in process context.
    orderly_poweroff():
        GFP_ATOMIC, because it may run in interrupt context.

    Signed-off-by: KOSAKI Motohiro
    Cc: "Paul Menage"
    Reviewed-by: Li Zefan
    Acked-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 Jul, 2008

1 commit

  • This patch adds O_NONBLOCK support to pipe2. It is minimally more involved
    than the patches for eventfd et al. but still trivial. The interfaces of
    the create_write_pipe and create_read_pipe helper functions were changed,
    and the one other caller was updated to match.

    The following test must be adjusted for architectures other than x86 and
    x86-64 and in case the syscall numbers changed.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_pipe2
    # ifdef __x86_64__
    # define __NR_pipe2 293
    # elif defined __i386__
    # define __NR_pipe2 331
    # else
    # error "need __NR_pipe2"
    # endif
    #endif

    int
    main (void)
    {
      int fds[2];
      if (syscall (__NR_pipe2, fds, 0) == -1)
        {
          puts ("pipe2(0) failed");
          return 1;
        }
      for (int i = 0; i < 2; ++i)
        {
          int fl = fcntl (fds[i], F_GETFL);
          if (fl == -1)
            {
              puts ("fcntl failed");
              return 1;
            }
          if (fl & O_NONBLOCK)
            {
              printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
              return 1;
            }
          close (fds[i]);
        }

      if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
        {
          puts ("pipe2(O_NONBLOCK) failed");
          return 1;
        }
      for (int i = 0; i < 2; ++i)
        {
          int fl = fcntl (fds[i], F_GETFL);
          if (fl == -1)
            {
              puts ("fcntl failed");
              return 1;
            }
          if ((fl & O_NONBLOCK) == 0)
            {
              printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
              return 1;
            }
          close (fds[i]);
        }

      puts ("OK");

      return 0;
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Signed-off-by: Ulrich Drepper
    Acked-by: Davide Libenzi
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

22 Jul, 2008

1 commit


02 May, 2008

1 commit


20 Apr, 2008

1 commit

  • * Use the new set_cpus_allowed_ptr() function added by the previous patch,
    which, instead of passing the "newly allowed cpus" cpumask_t arg
    by value, passes it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

    * Modify CPU_MASK_ALL

    Depends on:
    [sched-devel]: sched: add new set_cpus_allowed_ptr function

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis