08 Apr, 2014

1 commit


20 Feb, 2014

1 commit


19 Feb, 2014

1 commit


24 Jan, 2014

1 commit


09 Nov, 2013

1 commit


12 Sep, 2013

1 commit

  • I found the following pattern that leads to interesting findings:

    grep -r "ret.*|=.*__put_user" *
    grep -r "ret.*|=.*__get_user" *
    grep -r "ret.*|=.*__copy" *

    The __put_user() calls in compat_ioctl.c, ptrace compat, and signal compat
    all appear in compat code, so we can probably expect kernel addresses not
    to be reachable in the lower 32-bit range; I think they might not be
    exploitable.

    For the "__get_user" cases, I don't think those are exploitable: the worst
    that can happen is that the kernel will copy kernel memory into in-kernel
    buffers, and will fail immediately afterward.

    The alpha csum_partial_copy_from_user() seems to be missing the
    access_ok() check entirely. The fix is inspired by x86. This could
    lead to an information leak on alpha. I also noticed that many
    architectures map csum_partial_copy_from_user() to
    csum_partial_copy_generic(), but I wonder if the latter is performing
    the access checks on every architecture.

    Signed-off-by: Mathieu Desnoyers
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Jens Axboe
    Cc: Oleg Nesterov
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

12 May, 2013

1 commit

  • Avoid waking up every thread sleeping in a sigtimedwait call during
    suspend and resume by calling a freezable blocking call. Previous
    patches modified the freezer to avoid sending wakeups to threads
    that are blocked in freezable blocking calls.

    This call was selected to be converted to a freezable call because
    it doesn't hold any locks or release any resources when interrupted
    that might be needed by another freezing task or a kernel driver
    during suspend, and is a common site where idle userspace tasks are
    blocked.

    Acked-by: Tejun Heo
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

02 May, 2013

1 commit

  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configuring various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to module_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • There are 2 well known and ancient problems with coredump/signals, and a
    lot of related bug reports:

    - do_coredump() clears TIF_SIGPENDING but of course this can't help
    if, say, SIGCHLD comes after that.

    In this case the coredump can fail unexpectedly. See for example
    wait_for_dump_helper()->signal_pending() check but there are other
    reasons.

    - At the same time, dumping a huge core on slow media can take a
    lot of time/resources and there is no way to kill the coredumping
    task reliably. In particular this is not oom_kill-friendly.

    This patch tries to fix the 1st problem, and makes the preparation for the
    next changes.

    We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
    that this process dumps the core. prepare_signal() checks this flag and
    nacks any signal except SIGKILL.

    Note that this check tries to be conservative, in the long term we should
    probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
    discussion. See marc.info/?l=linux-kernel&m=120508897917439

    Notes:
    - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
    The patch assumes that dump_write/etc paths should never
    call it, but we can change it as well.

    - There is another source of TIF_SIGPENDING, freezer. This
    will be addressed separately.

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • After the recent generic debug info on dump_stack() and friends, arc
    is printing duplicate information on debug dumps.

    [ARCLinux]$ ./crash
    crash/50: potentially unexpected fatal signal 11.
    Signed-off-by: Tejun Heo
    Cc: Bjorn Helgaas
    Cc: David S. Miller
    Cc: Fengguang Wu
    Cc: Heiko Carstens
    Cc: Jesper Nilsson
    Cc: Martin Schwidefsky
    Cc: Mike Frysinger
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
     

23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Apr, 2013

1 commit

  • This fixes a kernel memory contents leak via the tkill and tgkill syscalls
    for compat processes.

    This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field
    when handling signals delivered from tkill.

    The place of the infoleak:

    int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
    {
    ...
    put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
    ...
    }

    Signed-off-by: Emese Revfy
    Reviewed-by: PaX Team
    Signed-off-by: Kees Cook
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Emese Revfy
     

21 Mar, 2013

1 commit

  • Process connector can now also detect coredumping events.

    The main aim of this patch is to get notified at the start of
    coredumping, instead of having to wait for it to finish and then being
    notified through the EXIT event.

    This could be used, for instance, by process managers that want to be
    notified as soon as possible about process failures rather than after
    the coredump, which could take on the order of minutes depending on the
    size of the coredump, piping and so on.

    Signed-off-by: Jesper Derehag
    Signed-off-by: David S. Miller

    Jesper Derehag
     

14 Mar, 2013

2 commits

  • __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
    later kernels, per Kees.

    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When the new signal handlers are set up, the location of sa_restorer is
    not cleared, leaking a parent process's address space location to
    children. This allows for a potential bypass of the parent's ASLR by
    examining the sa_restorer value returned when calling sigaction().

    Based on what should be considered "secret" about addresses, it only
    matters across the exec not the fork (since the VMAs haven't changed
    until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
    this is where it should be fixed.

    Given the few uses of sa_restorer, a "set" function was not written
    since this would be the only use. Instead, we use
    __ARCH_HAS_SA_RESTORER, as already done in other places.

    Example of the leak before applying this patch:

    $ cat /proc/$$/maps
    ...
    7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    $ ./leak
    ...
    7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
    ...
    1 0 (nil) 0x7fb9f30b94a0
    2 4000000 (nil) 0x7f278bcaa4a0
    3 4000000 (nil) 0x7f278bcaa4a0
    4 0 (nil) 0x7fb9f30b94a0
    ...

    [akpm@linux-foundation.org: use SA_RESTORER for backportability]
    Signed-off-by: Kees Cook
    Reported-by: Emese Revfy
    Cc: Emese Revfy
    Cc: PaX Team
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Cc: Julien Tinnes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

13 Mar, 2013

1 commit

  • Fix new kernel-doc warnings in kernel/signal.c:

    Warning(kernel/signal.c:2689): No description found for parameter 'uset'
    Warning(kernel/signal.c:2689): Excess function parameter 'set' description in 'sys_rt_sigpending'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

03 Mar, 2013

2 commits


28 Feb, 2013

2 commits

  • Several printk's were missing KERN_INFO and KERN_CONT flags. In
    addition, a printk that was outside a #if/#endif should have been
    inside; left outside, it produced a stray blank line on non-x86 boxes.

    Signed-off-by: Valdis Kletnieks
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     
  • The idea is simple. We need to get the siginfo for each signal on
    checkpointing dump, and then return it back on restore.

    The first problem is that the kernel doesn't report complete siginfos to
    userspace. In a signal handler the kernel strips SI_CODE from siginfo.
    When a siginfo is received from signalfd, it has a different format with
    fixed sizes of fields. The interface of signalfd was extended. If a
    signalfd is created with the flag SFD_RAW, it returns siginfo in a raw
    format.

    rt_sigqueueinfo looks suitable for restoring signals, but it can't send
    siginfo with a positive si_code, because these codes are reserved for
    the kernel. In the real world each person has the right to do anything
    with himself, so I think a process should be able to send any siginfo to
    itself.

    This patch:

    The kernel prevents sending of siginfo with positive si_code, because
    these codes are reserved for kernel. I think we can allow a task to
    send such a siginfo to itself. This operation should not be dangerous.

    This functionality is required for restoring signals in
    checkpoint/restart.

    Signed-off-by: Andrey Vagin
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

24 Feb, 2013

1 commit

  • Pull signal handling cleanups from Al Viro:
    "This is the first pile; another one will come a bit later and will
    contain SYSCALL_DEFINE-related patches.

    - a bunch of signal-related syscalls (both native and compat)
    unified.

    - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
    (fixing several potential problems with missing argument
    validation, while we are at it)

    - a lot of now-pointless wrappers killed

    - a couple of architectures (cris and hexagon) forgot to save
    altstack settings into sigframe, even though they used the
    (uninitialized) values in sigreturn; fixed.

    - microblaze fixes for delivery of multiple signals arriving at once

    - saner set of helpers for signal delivery introduced, several
    architectures switched to using those."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
    x86: convert to ksignal
    sparc: convert to ksignal
    arm: switch to struct ksignal * passing
    alpha: pass k_sigaction and siginfo_t using ksignal pointer
    burying unused conditionals
    make do_sigaltstack() static
    arm64: switch to generic old sigaction() (compat-only)
    arm64: switch to generic compat rt_sigaction()
    arm64: switch compat to generic old sigsuspend
    arm64: switch to generic compat rt_sigqueueinfo()
    arm64: switch to generic compat rt_sigpending()
    arm64: switch to generic compat rt_sigprocmask()
    arm64: switch to generic sigaltstack
    sparc: switch to generic old sigsuspend
    sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
    sparc: kill sign-extending wrappers for native syscalls
    kill sparc32_open()
    sparc: switch to use of generic old sigaction
    sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
    mips: switch to generic sys_fork() and sys_clone()
    ...

    Linus Torvalds
     

14 Feb, 2013

2 commits


05 Feb, 2013

1 commit

  • …x/kernel/git/frederic/linux-dynticks into sched/core

    Pull full-dynticks (user-space execution is undisturbed and
    receives no timer IRQs) preparation changes that convert the
    cputime accounting code to be full-dynticks ready,
    from Frederic Weisbecker:

    "This implements the cputime accounting on full dynticks CPUs.

    Typical cputime stats infrastructure relies on the timer tick and
    its periodic polling on the CPU to account the amount of time
    spent by the CPUs and the tasks per high level domains such as
    userspace, kernelspace, guest, ...

    Now we are preparing to implement full dynticks capability on
    Linux for Real Time and HPC users who want full CPU isolation.
    This feature requires a cputime accounting that doesn't depend
    on the timer tick.

    To implement it, this new cputime infrastructure plugs into
    kernel/user/guest boundaries to take snapshots of cputime and
    flush these to the stats when needed. This performs pretty
    much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
    and cputime snapshots are synchronized between write and read
    side such that the latter can safely retrieve the pending tickless
    cputime of a task and add it to its latest cputime snapshot to
    return the correct result to the user."

    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

04 Feb, 2013

11 commits


28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra-computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

23 Jan, 2013

2 commits

  • putreg() assumes that the tracee is not running and pt_regs_access() can
    safely play with its stack. However a killed tracee can return from
    ptrace_stop() to the low-level asm code and do RESTORE_REST, which means
    that the debugger can actually read/modify the kernel stack until the
    tracee does SAVE_REST again.

    set_task_blockstep() can race with SIGKILL too and in some sense this
    race is even worse, the very fact the tracee can be woken up breaks the
    logic.

    As Linus suggested, we can clear TASK_WAKEKILL around the arch_ptrace()
    call; this ensures that nobody can ever wake up the tracee while the
    debugger looks at it. Not only does this fix the mentioned problems, it
    also allows some cleanups/simplifications in arch_ptrace() paths.

    Probably ptrace_unfreeze_traced() needs more callers, for example it
    makes sense to make the tracee killable for oom-killer before
    access_process_vm().

    While at it, add a comment to may_ptrace_stop() to explain why
    ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

    Reported-by: Salman Qazi
    Reported-by: Suleiman Souhlal
    Suggested-by: Linus Torvalds
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup and preparation for the next change.

    signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
    actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
    necessary mask.

    Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
    signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
    which adds __TASK_TRACED.

    This way ptrace_signal_wake_up() can work "inside" ptrace_request()
    even if the tracee doesn't have the TASK_WAKEKILL bit set.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Jan, 2013

1 commit

  • Pull misc syscall fixes from Al Viro:

    - compat syscall fixes (discussed back in December)

    - a couple of "make life easier for sigaltstack stuff by reducing
    inter-tree dependencies"

    - fix up compiler/asmlinkage calling convention disagreement of
    sys_clone()

    - misc

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    sys_clone() needs asmlinkage_protect
    make sure that /linuxrc has std{in,out,err}
    x32: fix sigtimedwait
    x32: fix waitid()
    switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
    switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
    CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
    Ensure that kernel_init_freeable() is not inlined into non __init code

    Linus Torvalds
     

06 Jan, 2013

1 commit

  • Cleanup. And I think we need more cleanups, in particular
    __set_current_blocked() and sigprocmask() should die. Nobody should
    ever block SIGKILL or SIGSTOP.

    - Change set_current_blocked() to use __set_current_blocked()

    - Change sys_sigprocmask() to use set_current_blocked(), this way it
    should not worry about SIGKILL/SIGSTOP.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov