18 Apr, 2013

1 commit

  • This fixes a kernel memory contents leak via the tkill and tgkill syscalls
    for compat processes.

    This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field
    when handling signals delivered from tkill.

    The place of the infoleak:

    int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
    {
    ...
    put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
    ...
    }

    Signed-off-by: Emese Revfy
    Reviewed-by: PaX Team
    Signed-off-by: Kees Cook
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Emese Revfy
     

14 Mar, 2013

2 commits

  • __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
    later kernels, per Kees.

    Cc: Emese Revfy
    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When the new signal handlers are set up, the location of sa_restorer is
    not cleared, leaking a parent process's address space location to
    children. This allows for a potential bypass of the parent's ASLR by
    examining the sa_restorer value returned when calling sigaction().

    Based on what should be considered "secret" about addresses, it only
    matters across the exec not the fork (since the VMAs haven't changed
    until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
    this is where it should be fixed.

    Given the few uses of sa_restorer, a "set" function was not written
    since this would be the only use. Instead, we use
    __ARCH_HAS_SA_RESTORER, as already done in other places.

    Example of the leak before applying this patch:

    $ cat /proc/$$/maps
    ...
    7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    $ ./leak
    ...
    7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    1 0 (nil) 0x7fb9f30b94a0
    2 4000000 (nil) 0x7f278bcaa4a0
    3 4000000 (nil) 0x7f278bcaa4a0
    4 0 (nil) 0x7fb9f30b94a0
    ...

    [akpm@linux-foundation.org: use SA_RESTORER for backportability]
    Signed-off-by: Kees Cook
    Reported-by: Emese Revfy
    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

13 Mar, 2013

1 commit

  • Fix new kernel-doc warnings in kernel/signal.c:

    Warning(kernel/signal.c:2689): No description found for parameter 'uset'
    Warning(kernel/signal.c:2689): Excess function parameter 'set' description in 'sys_rt_sigpending'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

03 Mar, 2013

2 commits


28 Feb, 2013

2 commits

  • Several printk's were missing KERN_INFO and KERN_CONT flags. In
    addition, a printk that was outside a #if/#endif should have been
    inside, which would result in stray blank line on non-x86 boxes.

    Signed-off-by: Valdis Kletnieks
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     
  • The idea is simple. We need to get the siginfo for each signal on
    checkpointing dump, and then return it back on restore.

    The first problem is that the kernel doesn't report complete siginfos to
    userspace. In a signal handler the kernel strips SI_CODE from siginfo.
    When a siginfo is received from signalfd, it has a different format with
    fixed sizes of fields. The interface of signalfd was extended. If a
    signalfd is created with the flag SFD_RAW, it returns siginfo in a raw
    format.

    rt_sigqueueinfo looks suitable for restoring signals, but it can't send
    siginfo with a positive si_code, because these codes are reserved for
    the kernel. In the real world each person has right to do anything with
    himself, so I think a process should able to send any siginfo to itself.

    This patch:

    The kernel prevents sending of siginfo with positive si_code, because
    these codes are reserved for kernel. I think we can allow a task to
    send such a siginfo to itself. This operation should not be dangerous.

    This functionality is required for restoring signals in
    checkpoint/restart.

    Signed-off-by: Andrey Vagin
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Michael Kerrisk
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

24 Feb, 2013

1 commit

  • Pull signal handling cleanups from Al Viro:
    "This is the first pile; another one will come a bit later and will
    contain SYSCALL_DEFINE-related patches.

    - a bunch of signal-related syscalls (both native and compat)
    unified.

    - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
    (fixing several potential problems with missing argument
    validation, while we are at it)

    - a lot of now-pointless wrappers killed

    - a couple of architectures (cris and hexagon) forgot to save
    altstack settings into sigframe, even though they used the
    (uninitialized) values in sigreturn; fixed.

    - microblaze fixes for delivery of multiple signals arriving at once

    - saner set of helpers for signal delivery introduced, several
    architectures switched to using those."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
    x86: convert to ksignal
    sparc: convert to ksignal
    arm: switch to struct ksignal * passing
    alpha: pass k_sigaction and siginfo_t using ksignal pointer
    burying unused conditionals
    make do_sigaltstack() static
    arm64: switch to generic old sigaction() (compat-only)
    arm64: switch to generic compat rt_sigaction()
    arm64: switch compat to generic old sigsuspend
    arm64: switch to generic compat rt_sigqueueinfo()
    arm64: switch to generic compat rt_sigpending()
    arm64: switch to generic compat rt_sigprocmask()
    arm64: switch to generic sigaltstack
    sparc: switch to generic old sigsuspend
    sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
    sparc: kill sign-extending wrappers for native syscalls
    kill sparc32_open()
    sparc: switch to use of generic old sigaction
    sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
    mips: switch to generic sys_fork() and sys_clone()
    ...

    Linus Torvalds
     

14 Feb, 2013

2 commits


05 Feb, 2013

1 commit

  • …x/kernel/git/frederic/linux-dynticks into sched/core

    Pull full-dynticks (user-space execution is undisturbed and
    receives no timer IRQs) preparation changes that convert the
    cputime accounting code to be full-dynticks ready,
    from Frederic Weisbecker:

    "This implements the cputime accounting on full dynticks CPUs.

    Typical cputime stats infrastructure relies on the timer tick and
    its periodic polling on the CPU to account the amount of time
    spent by the CPUs and the tasks per high level domains such as
    userspace, kernelspace, guest, ...

    Now we are preparing to implement full dynticks capability on
    Linux for Real Time and HPC users who want full CPU isolation.
    This feature requires a cputime accounting that doesn't depend
    on the timer tick.

    To implement it, this new cputime infrastructure plugs into
    kernel/user/guest boundaries to take snapshots of cputime and
    flush these to the stats when needed. This performs pretty
    much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
    and cputime snaphots are synchronized between write and read
    side such that the latter can safely retrieve the pending tickless
    cputime of a task and add it to its latest cputime snapshot to
    return the correct result to the user."

    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

04 Feb, 2013

11 commits


28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra-computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

23 Jan, 2013

2 commits

  • putreg() assumes that the tracee is not running and pt_regs_access() can
    safely play with its stack. However a killed tracee can return from
    ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
    that debugger can actually read/modify the kernel stack until the tracee
    does SAVE_REST again.

    set_task_blockstep() can race with SIGKILL too and in some sense this
    race is even worse, the very fact the tracee can be woken up breaks the
    logic.

    As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
    call, this ensures that nobody can ever wakeup the tracee while the
    debugger looks at it. Not only this fixes the mentioned problems, we
    can do some cleanups/simplifications in arch_ptrace() paths.

    Probably ptrace_unfreeze_traced() needs more callers, for example it
    makes sense to make the tracee killable for oom-killer before
    access_process_vm().

    While at it, add the comment into may_ptrace_stop() to explain why
    ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

    Reported-by: Salman Qazi
    Reported-by: Suleiman Souhlal
    Suggested-by: Linus Torvalds
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup and preparation for the next change.

    signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
    actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
    necessary mask.

    Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
    signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
    which adds __TASK_TRACED.

    This way ptrace_signal_wake_up() can work "inside" ptrace_request()
    even if the tracee doesn't have the TASK_WAKEKILL bit set.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Jan, 2013

1 commit

  • Pull misc syscall fixes from Al Viro:

    - compat syscall fixes (discussed back in December)

    - a couple of "make life easier for sigaltstack stuff by reducing
    inter-tree dependencies"

    - fix up compiler/asmlinkage calling convention disagreement of
    sys_clone()

    - misc

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    sys_clone() needs asmlinkage_protect
    make sure that /linuxrc has std{in,out,err}
    x32: fix sigtimedwait
    x32: fix waitid()
    switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
    switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
    CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
    Ensure that kernel_init_freeable() is not inlined into non __init code

    Linus Torvalds
     

06 Jan, 2013

2 commits

  • Cleanup. And I think we need more cleanups, in particular
    __set_current_blocked() and sigprocmask() should die. Nobody should
    ever block SIGKILL or SIGSTOP.

    - Change set_current_blocked() to use __set_current_blocked()

    - Change sys_sigprocmask() to use set_current_blocked(), this way it
    should not worry about SIGKILL/SIGSTOP.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Commit 77097ae503b1 ("most of set_current_blocked() callers want
    SIGKILL/SIGSTOP removed from set") removed the initialization of newmask
    by accident, causing ltp to complain like this:

    ssetmask01 1 TFAIL : sgetmask() failed: TEST_ERRNO=???(0): Success

    Restore the proper initialization.

    Reported-and-tested-by: CAI Qian
    Signed-off-by: Oleg Nesterov
    Cc: stable@kernel.org # v3.5+
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

26 Dec, 2012

1 commit


21 Dec, 2012

1 commit

  • Pull signal handling cleanups from Al Viro:
    "sigaltstack infrastructure + conversion for x86, alpha and um,
    COMPAT_SYSCALL_DEFINE infrastructure.

    Note that there are several conflicts between "unify
    SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
    resolution is trivial - just remove definitions of SS_ONSTACK and
    SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
    include/uapi/linux/signal.h contains the unified variant."

    Fixed up conflicts as per Al.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to generic sigaltstack
    new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
    generic compat_sys_sigaltstack()
    introduce generic sys_sigaltstack(), switch x86 and um to it
    new helper: compat_user_stack_pointer()
    new helper: restore_altstack()
    unify SS_ONSTACK/SS_DISABLE definitions
    new helper: current_user_stack_pointer()
    missing user_stack_pointer() instances
    Bury the conditionals from kernel_thread/kernel_execve series
    COMPAT_SYSCALL_DEFINE: infrastructure

    Linus Torvalds
     

20 Dec, 2012

4 commits


18 Dec, 2012

1 commit

  • Pull user namespace changes from Eric Biederman:
    "While small this set of changes is very significant with respect to
    containers in general and user namespaces in particular. The user
    space interface is now complete.

    This set of changes adds support for unprivileged users to create user
    namespaces and as a user namespace root to create other namespaces.
    The tyranny of supporting suid root preventing unprivileged users from
    using cool new kernel features is broken.

    This set of changes completes the work on setns, adding support for
    the pid, user, mount namespaces.

    This set of changes includes a bunch of basic pid namespace
    cleanups/simplifications. Of particular significance is the rework of
    the pid namespace cleanup so it no longer requires sending out
    tendrils into all kinds of unexpected cleanup paths for operation. At
    least one case of broken error handling is fixed by this cleanup.

    The files under /proc//ns/ have been converted from regular files
    to magic symlinks which prevents incorrect caching by the VFS,
    ensuring the files always refer to the namespace the process is
    currently using and ensuring that the ptrace_mayaccess permission
    checks are always applied.

    The files under /proc//ns/ have been given stable inode numbers
    so it is now possible to see if different processes share the same
    namespaces.

    Through the David Miller's net tree are changes to relax many of the
    permission checks in the networking stack to allowing the user
    namespace root to usefully use the networking stack. Similar changes
    for the mount namespace and the pid namespace are coming through my
    tree.

    Two small changes to add user namespace support were commited here adn
    in David Miller's -net tree so that I could complete the work on the
    /proc//ns/ files in this tree.

    Work remains to make it safe to build user namespaces and 9p, afs,
    ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
    Kconfig guard remains in place preventing that user namespaces from
    being built when any of those filesystems are enabled.

    Future design work remains to allow root users outside of the initial
    user namespace to mount more than just /proc and /sys."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
    proc: Usable inode numbers for the namespace file descriptors.
    proc: Fix the namespace inode permission checks.
    proc: Generalize proc inode allocation
    userns: Allow unprivilged mounts of proc and sysfs
    userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
    procfs: Print task uids and gids in the userns that opened the proc file
    userns: Implement unshare of the user namespace
    userns: Implent proc namespace operations
    userns: Kill task_user_ns
    userns: Make create_new_namespaces take a user_ns parameter
    userns: Allow unprivileged use of setns.
    userns: Allow unprivileged users to create new namespaces
    userns: Allow setting a userns mapping to your current uid.
    userns: Allow chown and setgid preservation
    userns: Allow unprivileged users to create user namespaces.
    userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
    userns: fix return value on mntns_install() failure
    vfs: Allow unprivileged manipulation of the mount namespace.
    vfs: Only support slave subtrees across different user namespaces
    vfs: Add a user namespace reference from struct mnt_namespace
    ...

    Linus Torvalds
     

13 Dec, 2012

1 commit

  • Pull big execve/kernel_thread/fork unification series from Al Viro:
    "All architectures are converted to new model. Quite a bit of that
    stuff is actually shared with architecture trees; in such cases it's
    literally shared branch pulled by both, not a cherry-pick.

    A lot of ugliness and black magic is gone (-3KLoC total in this one):

    - kernel_thread()/kernel_execve()/sys_execve() redesign.

    We don't do syscalls from kernel anymore for either kernel_thread()
    or kernel_execve():

    kernel_thread() is essentially clone(2) with callback run before we
    return to userland, the callbacks either never return or do
    successful do_execve() before returning.

    kernel_execve() is a wrapper for do_execve() - it doesn't need to
    do transition to user mode anymore.

    As a result kernel_thread() and kernel_execve() are
    arch-independent now - they live in kernel/fork.c and fs/exec.c
    resp. sys_execve() is also in fs/exec.c and it's completely
    architecture-independent.

    - daemonize() is gone, along with its parts in fs/*.c

    - struct pt_regs * is no longer passed to do_fork/copy_process/
    copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

    - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
    still need wrappers (ones with callee-saved registers not saved in
    pt_regs on syscall entry), but the main part of those suckers is in
    kernel/fork.c now."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
    do_coredump(): get rid of pt_regs argument
    print_fatal_signal(): get rid of pt_regs argument
    ptrace_signal(): get rid of unused arguments
    get rid of ptrace_signal_deliver() arguments
    new helper: signal_pt_regs()
    unify default ptrace_signal_deliver
    flagday: kill pt_regs argument of do_fork()
    death to idle_regs()
    don't pass regs to copy_process()
    flagday: don't pass regs to copy_thread()
    bfin: switch to generic vfork, get rid of pointless wrappers
    xtensa: switch to generic clone()
    openrisc: switch to use of generic fork and clone
    unicore32: switch to generic clone(2)
    score: switch to generic fork/vfork/clone
    c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
    take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
    mn10300: switch to generic fork/vfork/clone
    h8300: switch to generic fork/vfork/clone
    tile: switch to generic clone()
    ...

    Conflicts:
    arch/microblaze/include/asm/Kbuild

    Linus Torvalds
     

29 Nov, 2012

3 commits