08 Jun, 2016

1 commit

  • This patch allows building the whole kernel with GCC plugins. It was ported from
    grsecurity/PaX. The infrastructure supports building out-of-tree modules and
    building in a separate directory. Cross-compilation is supported too.
    Currently the x86, arm, arm64 and uml architectures enable plugins.

    The directory of the gcc plugins is scripts/gcc-plugins. You can use a file or a directory
    there. The plugins compile with these options:
    * -fno-rtti: gcc is compiled with this option so the plugins must use it too
    * -fno-exceptions: this is inherited from gcc too
    * -fasynchronous-unwind-tables: this is inherited from gcc too
    * -ggdb: it is useful for debugging a plugin (better backtrace on internal
    errors)
    * -Wno-narrowing: to suppress warnings from gcc headers (ipa-utils.h)
    * -Wno-unused-variable: to suppress warnings from gcc headers (gcc_version
    variable, plugin-version.h)

    The infrastructure introduces a new Makefile target called gcc-plugins. It
    supports all gcc versions from 4.5 to 6.0. The scripts/gcc-plugin.sh script
    chooses the proper host compiler (gcc-4.7 can be built by either gcc or g++).
    This script also checks the availability of the included headers in
    scripts/gcc-plugins/gcc-common.h.

    The gcc-common.h header contains frequently included headers for GCC plugins
    and it has a compatibility layer for the supported gcc versions.

    The gcc-generate-*-pass.h headers automatically generate the registration
    structures for GIMPLE, SIMPLE_IPA, IPA and RTL passes.

    Note that 'make clean' keeps the *.so files (only the distclean or mrproper
    targets clean all) because they are needed for out-of-tree modules.

    Based on work created by the PaX Team.

    Signed-off-by: Emese Revfy
    Acked-by: Kees Cook
    Signed-off-by: Michal Marek

    Emese Revfy
     

28 May, 2016

1 commit


24 May, 2016

1 commit


22 May, 2016

2 commits

  • This patch extends save_fp_registers() and restore_fp_registers() to use
    PTRACE_GETREGSET and PTRACE_SETREGSET with the XSTATE note type, adding
    support for new processor state extensions between context switches.

    When the new ptrace requests are unavailable, it falls back to the old
    PTRACE_GETFPREGS and PTRACE_SETFPREGS methods, which have been renamed to
    save_i387_registers() and restore_i387_registers().

    These functions now expect *fp_regs to have room for an _xstate struct.
    Consequently, ptrace in UML now responds to PTRACE_GETFPREGS/_SETFPREGS
    requests with a user_i387_struct (independent of HOST_FP_SIZE), by
    calling save_i387_registers() and restore_i387_registers() instead of
    the extended save_fp_registers() and restore_fp_registers() functions.

    Signed-off-by: Eli Cooper

    Eli Cooper
     
  • Extends fpstate to _xstate, in order to hold AVX/YMM registers.

    To avoid oversized stack frames, the following functions have been
    refactored to use malloc:
    - sig_handler_common
    - timer_real_alarm_handler

    Signed-off-by: Eli Cooper

    Eli Cooper
     

21 May, 2016

1 commit

  • Define HAVE_EXIT_THREAD for archs which want to do something in
    exit_thread. For others, let's define exit_thread as an empty inline.

    This is a cleanup before we change the prototype of exit_thread to
    accept a task parameter.
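
    The pattern described above can be sketched as stand-alone C rather than
    kernel code. HAVE_EXIT_THREAD and exit_thread() are the names used in the
    commit message; finish_task() is a hypothetical caller added only for
    illustration.

```c
/* An architecture that needs real teardown work would define
 * HAVE_EXIT_THREAD and provide its own exit_thread(). */
#ifdef HAVE_EXIT_THREAD
void exit_thread(void);
#else
/* Everyone else gets an empty inline that the compiler can drop entirely. */
static inline void exit_thread(void)
{
}
#endif

/* Hypothetical caller: generic code calls exit_thread() unconditionally. */
static int finish_task(void)
{
	exit_thread();
	return 0;
}
```

    With this layout the generic caller needs no #ifdef of its own.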

    [akpm@linux-foundation.org: fix mips]
    Signed-off-by: Jiri Slaby
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Steven Miao
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

18 May, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support SPI based w5100 devices, from Akinobu Mita.

    2) Partial Segmentation Offload, from Alexander Duyck.

    3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

    4) Allow cls_flower stats offload, from Amir Vadai.

    5) Implement bpf blinding, from Daniel Borkmann.

    6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
    actually using FASYNC these atomics are superfluous. From Eric
    Dumazet.

    7) Run TCP more preemptibly, also from Eric Dumazet.

    8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
    driver, from Gal Pressman.

    9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

    10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

    11) Support tunneling offloads in qed, from Manish Chopra.

    12) aRFS offloading in mlx5e, from Maor Gottlieb.

    13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
    Leitner.

    14) Add MSG_EOR support to TCP, this allows controlling packet
    coalescing on application record boundaries for more accurate
    socket timestamp sampling. From Martin KaFai Lau.

    15) Fix alignment of 64-bit netlink attributes across the board, from
    Nicolas Dichtel.

    16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

    17) Several conversions of drivers to ethtool ksettings, from Philippe
    Reynes.

    18) Checksum neutral ILA in ipv6, from Tom Herbert.

    19) Factorize all of the various marvell dsa drivers into one, from
    Vivien Didelot

    20) Add VF support to qed driver, from Yuval Mintz"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
    Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
    Revert "phy dp83867: Make rgmii parameters optional"
    r8169: default to 64-bit DMA on recent PCIe chips
    phy dp83867: Make rgmii parameters optional
    phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
    bpf: arm64: remove callee-save registers use for tmp registers
    asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
    switchdev: pass pointer to fib_info instead of copy
    net_sched: close another race condition in tcf_mirred_release()
    tipc: fix nametable publication field in nl compat
    drivers: net: Don't print unpopulated net_device name
    qed: add support for dcbx.
    ravb: Add missing free_irq() calls to ravb_close()
    qed: Remove a stray tab
    net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
    net: ethernet: fec-mpc52xx: use phydev from struct net_device
    bpf, doc: fix typo on bpf_asm descriptions
    stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
    net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
    net: ethernet: fs-enet: use phydev from struct net_device
    ...

    Linus Torvalds
     

05 May, 2016

1 commit

  • Replace all trans_start updates with the netif_trans_update helper.
    The change was done via spatch:

    @@
    struct net_device *d;
    @@
    - d->trans_start = jiffies
    + netif_trans_update(d)

    Compile tested only.
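
    To show what the conversion looks like at a call site, here is a
    user-space mock. The jiffies variable, the one-field struct net_device
    and mock_xmit() are simplified stand-ins, and the helper body is an
    assumption for illustration; the commit message does not show the real
    implementation of netif_trans_update().

```c
/* Stand-ins for the kernel's jiffies counter and struct net_device;
 * both are simplified mocks for illustration only. */
static unsigned long jiffies;

struct net_device {
	unsigned long trans_start;
};

/* Illustrative helper: one place that records the last transmit time. */
static inline void netif_trans_update(struct net_device *d)
{
	d->trans_start = jiffies;
}

/* Hypothetical driver transmit path after the conversion. */
static void mock_xmit(struct net_device *d)
{
	netif_trans_update(d);   /* was: d->trans_start = jiffies; */
}
```

    Routing every update through one helper means the bookkeeping can later
    change in a single place instead of in every driver.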

    Cc: user-mode-linux-devel@lists.sourceforge.net
    Cc: linux-xtensa@linux-xtensa.org
    Cc: linux1394-devel@lists.sourceforge.net
    Cc: linux-rdma@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: MPT-FusionLinux.pdl@broadcom.com
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-can@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-omap@vger.kernel.org
    Cc: linux-hams@vger.kernel.org
    Cc: linux-usb@vger.kernel.org
    Cc: linux-wireless@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: devel@driverdev.osuosl.org
    Cc: b.a.t.m.a.n@lists.open-mesh.org
    Cc: linux-bluetooth@vger.kernel.org
    Signed-off-by: Florian Westphal
    Acked-by: Felipe Balbi
    Acked-by: Mugunthan V N
    Acked-by: Antonio Quartulli
    Signed-off-by: David S. Miller

    Florian Westphal
     

13 Apr, 2016

1 commit


23 Mar, 2016

1 commit

  • This commit fixes the following security hole affecting systems where
    all of the following conditions are fulfilled:

    - The fs.suid_dumpable sysctl is set to 2.
    - The kernel.core_pattern sysctl's value starts with "/". (Systems
    where kernel.core_pattern starts with "|/" are not affected.)
    - Unprivileged user namespace creation is permitted. (This is
    true on Linux >=3.8, but some distributions disallow it by
    default using a distro patch.)

    Under these conditions, if a program executes under secure exec rules,
    causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
    namespace, changes its root directory and crashes, the coredump will be
    written using fsuid=0 and a path derived from kernel.core_pattern - but
    this path is interpreted relative to the root directory of the process,
    allowing the attacker to control where a coredump will be written with
    root privileges.

    To fix the security issue, always interpret core_pattern for dumps that
    are written under SUID_DUMP_ROOT relative to the root directory of init.

    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     

21 Mar, 2016

1 commit

  • Pull x86 protection key support from Ingo Molnar:
    "This tree adds support for a new memory protection hardware feature
    that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

    There's a background article at LWN.net:

    https://lwn.net/Articles/643797/

    The gist is that protection keys allow the encoding of
    user-controllable permission masks in the pte. So instead of having a
    fixed protection mask in the pte (which needs a system call to change
    and works on a per page basis), the user can map a (handful of)
    protection mask variants and can change the masks at runtime relatively
    cheaply, without having to change every single page in the affected
    virtual memory range.

    This allows the dynamic switching of the protection bits of large
    amounts of virtual memory, via user-space instructions. It also
    allows more precise control of MMU permission bits: for example the
    executable bit is separate from the read bit (see more about that
    below).

    This tree adds the MM infrastructure and low level x86 glue needed for
    that, plus it adds a high level API to make use of protection keys -
    if a user-space application calls:

    mmap(..., PROT_EXEC);

    or

    mprotect(ptr, sz, PROT_EXEC);

    (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
    this special case, and will set a special protection key on this
    memory range. It also sets the appropriate bits in the Protection
    Keys User Rights (PKRU) register so that the memory becomes unreadable
    and unwritable.

    So using protection keys the kernel is able to implement 'true'
    PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
    PROT_READ as well. Unreadable executable mappings have security
    advantages: they cannot be read via information leaks to figure out
    ASLR details, nor can they be scanned for ROP gadgets - and they
    cannot be used by exploits for data purposes either.

    We know about no user-space code that relies on pure PROT_EXEC
    mappings today, but binary loaders could start making use of this new
    feature to map binaries and libraries in a more secure fashion.

    There is other pending pkeys work that offers more high level system
    call APIs to manage protection keys - but those are not part of this
    pull request.

    Right now there's a Kconfig that controls this feature
    (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
    (like most x86 CPU feature enablement code that has no runtime
    overhead), but it's not user-configurable at the moment. If there's
    any serious problem with this then we can make it configurable and/or
    flip the default"

    * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
    x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
    mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
    x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
    mm/core, x86/mm/pkeys: Add execute-only protection keys support
    x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
    x86/mm/pkeys: Allow kernel to modify user pkey rights register
    x86/fpu: Allow setting of XSAVE state
    x86/mm: Factor out LDT init from context init
    mm/core, x86/mm/pkeys: Add arch_validate_pkey()
    mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
    x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
    x86/mm/pkeys: Add Kconfig prompt to existing config option
    x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
    x86/mm/pkeys: Dump PKRU with other kernel registers
    mm/core, x86/mm/pkeys: Differentiate instruction fetches
    x86/mm/pkeys: Optimize fault handling in access_error()
    mm/core: Do not enforce PKEY permissions on remote mm access
    um, pkeys: Add UML arch_*_access_permitted() methods
    mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
    x86/mm/gup: Simplify get_user_pages() PTE bit handling
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • There are a few things about *pte_alloc*() helpers worth cleaning up:

    - 'vma' argument is unused, let's drop it;

    - most __pte_alloc() callers do speculative check for pmd_none(),
    before taking ptl: let's introduce pte_alloc() macro which does
    the check.

    The only direct user of __pte_alloc left is userfaultfd, which has
    different expectation about atomicity wrt pmd.

    - pte_alloc_map() and pte_alloc_map_lock() are redefined using
    pte_alloc().
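
    The speculative check can be modelled in user space as below. The types
    and the body of __pte_alloc() are simplified mocks (the pte_allocs
    counter is hypothetical and exists only to make the behaviour visible);
    only the shape of the pte_alloc() macro follows the description above.

```c
#include <stddef.h>

/* Simplified mocks: a pmd is "none" when it holds no page table yet. */
typedef struct { void *table; } pmd_t;
struct mm_struct { int pte_allocs; };   /* hypothetical counter for the demo */

static int pmd_none(pmd_t pmd)
{
	return pmd.table == NULL;
}

/* Mock slow path: in the kernel this allocates a page table and takes the
 * page-table lock; here it only installs a dummy table. Returns 0 on success. */
static int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
{
	static char dummy_table[64];

	(void)address;
	mm->pte_allocs++;
	pmd->table = dummy_table;
	return 0;
}

/* Speculative check first, slow path only when the pmd is still empty.
 * Evaluates to non-zero only if allocation was needed and failed. */
#define pte_alloc(mm, pmd, address) \
	(pmd_none(*(pmd)) && __pte_alloc(mm, pmd, address))
```

    Calling pte_alloc() twice on the same pmd reaches the slow path only
    once, which is the point of doing the check before taking the lock.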

    [sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
    [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
    Signed-off-by: Kirill A. Shutemov
    Cc: Dave Hansen
    Signed-off-by: Sudeep Holla
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

06 Mar, 2016

2 commits


19 Feb, 2016

1 commit

  • UML has a special mmu_context.h and needs updates whenever the generic one
    is updated.

    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: Jeff Dike
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Richard Weinberger
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: user-mode-linux-devel@lists.sourceforge.net
    Cc: user-mode-linux-user@lists.sourceforge.net
    Link: http://lkml.kernel.org/r/20160218183557.AE1DB383@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

06 Feb, 2016

1 commit

  • Commit 16da306849d0 ("um: kill pfn_t") introduced a compile warning for
    defconfig (SUBARCH=i386):

    arch/um/kernel/skas/mmu.c:38:206:
    warning: right shift count >= width of type [-Wshift-count-overflow]

    Aforementioned patch changes the definition of the phys_to_pfn() macro
    from

    ((pfn_t) ((p) >> PAGE_SHIFT))

    to

    ((p) >> PAGE_SHIFT)

    This effectively changes the phys_to_pfn() expansion's type from
    unsigned long long to unsigned long.

    Through the callchain init_stub_pte() => mk_pte(), the expansion of
    phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
    pte_set_val(pte, phys, prot) macro, eventually leading to

    (pte).pte_high = (phys) >> 32;

    This results in the warning from above.

    Since UML only deals with 32 bit addresses, the upper 32 bits from
    'phys' used to be always zero anyway. Also, all page protection flags
    defined by UML don't use any bits beyond bit 9. Since the contents of a
    PTE are defined within architecture scope only, the ->pte_high member
    can be safely removed.

    Remove the ->pte_high member from struct pte_t.
    Rename ->pte_low to ->pte.
    Adapt the pte helper macros in arch/um/include/asm/page.h.

    Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped. This
    write barrier doesn't seem to be paired with any read barrier though and
    thus, was useless anyway.

    Fixes: 16da306849d0 ("um: kill pfn_t")
    Signed-off-by: Nicolai Stange
    Cc: Dan Williams
    Cc: Richard Weinberger
    Cc: Nicolai Stange
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolai Stange
     

16 Jan, 2016

1 commit

  • The core has developed a need for a "pfn_t" type [1]. Convert the usage
    of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
    to drop its expectation of a typed pfn.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html

    Signed-off-by: Dan Williams
    Cc: Dave Hansen
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

13 Jan, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "All kinds of stuff. That probably should've been 5 or 6 separate
    branches, but by the time I'd realized how large and mixed that bag
    had become it had been too close to -final to play with rebasing.

    Some fs/namei.c cleanups there, memdup_user_nul() introduction and
    switching open-coded instances, burying long-dead code, whack-a-mole
    of various kinds, several new helpers for ->llseek(), assorted
    cleanups and fixes from various people, etc.

    One piece probably deserves special mention - Neil's
    lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
    called without ->i_mutex and tries to avoid ever taking it. That, of
    course, means that it's not useful for any directory modifications,
    but things like getting inode attributes in nfsd readdirplus are fine
    with that. I really should've asked for moratorium on lookup-related
    changes this cycle, but since I hadn't done that early enough... I
    *am* asking for that for the coming cycle, though - I'm going to try
    and get conversion of i_mutex to rwsem with ->lookup() done under lock
    taken shared.

    There will be a patch closer to the end of the window, along the lines
    of the one Linus had posted last May - mechanical conversion of
    ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
    inode_is_locked()/inode_lock_nested(). To quote Linus back then:

    -----
    | This is an automated patch using
    |
    | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    | with a very few manual fixups
    -----

    I'm going to send that once the ->i_mutex-affecting stuff in -next
    gets mostly merged (or when Linus says he's about to stop taking
    merges)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    nfsd: don't hold i_mutex over userspace upcalls
    fs:affs:Replace time_t with time64_t
    fs/9p: use fscache mutex rather than spinlock
    proc: add a reschedule point in proc_readfd_common()
    logfs: constify logfs_block_ops structures
    fcntl: allow to set O_DIRECT flag on pipe
    fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
    fs: xattr: Use kvfree()
    [s390] page_to_phys() always returns a multiple of PAGE_SIZE
    nbd: use ->compat_ioctl()
    fs: use block_device name vsprintf helper
    lib/vsprintf: add %*pg format specifier
    fs: use gendisk->disk_name where possible
    poll: plug an unused argument to do_poll
    amdkfd: don't open-code memdup_user()
    cdrom: don't open-code memdup_user()
    rsxx: don't open-code memdup_user()
    mtip32xx: don't open-code memdup_user()
    [um] mconsole: don't open-code memdup_user_nul()
    [um] hostaudio: don't open-code memdup_user()
    ...

    Linus Torvalds
     

11 Jan, 2016

9 commits

  • Open the memory mapped file with the O_TMPFILE flag when available.
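
    A minimal sketch of "use O_TMPFILE when available", assuming a fallback
    to mkstemp() plus unlink() when the flag is missing or the filesystem
    rejects it. The function name open_anonymous_file() and the "anon-XXXXXX"
    template are hypothetical and not taken from the UML source.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* O_TMPFILE is a Linux extension in glibc */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Open an anonymous file in dir. Tries O_TMPFILE first and falls back to
 * mkstemp() + unlink() when the flag is missing or unsupported.
 * Returns a file descriptor, or -1 on failure. */
static int open_anonymous_file(const char *dir)
{
	int fd = -1;

#ifdef O_TMPFILE
	fd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (fd >= 0)
		return fd;
#endif
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/anon-XXXXXX", dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);
	if (fd >= 0)
		unlink(path);   /* the name disappears, the descriptor stays valid */
	return fd;
}
```

    With O_TMPFILE the file never has a name in the directory, so there is
    no window in which another process could open it by path.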

    Signed-off-by: Mickaël Salaün
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Acked-by: Tristan Schmelcher
    Signed-off-by: Richard Weinberger

    Mickaël Salaün
     
  • Remove the insecure 0777 mode for the temporary file to prevent other users
    from changing the executable mapped code.

    An attacker could gain access to the mapped file descriptor from the
    temporary file (before it is unlinked) in a read-only mode but it should
    not be accessible in write mode to avoid arbitrary code execution.

    To not change the hostfs behavior, the temporary file creation
    permission now depends on the current umask(2) and the implementation of
    mkstemp(3).
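
    The effect can be checked from user space with a small sketch. Both
    function names are hypothetical. The sketch relies on mkstemp(3)
    creating the file with owner-only permissions, as POSIX.1-2008
    specifies, and simply never widens the mode afterwards.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temporary file from a mkstemp() template (modified in place)
 * without widening its permissions afterwards. Returns the fd or -1. */
static int create_private_tmp(char *template_path)
{
	int fd = mkstemp(template_path);

	if (fd < 0)
		return -1;
	/* Deliberately no step here that changes the mode to 0777. */
	unlink(template_path);
	return fd;
}

/* Returns 1 if group or others may write to the file, 0 if not, -1 on error. */
static int writable_by_others(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (st.st_mode & (S_IWGRP | S_IWOTH)) != 0;
}
```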

    Signed-off-by: Mickaël Salaün
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Acked-by: Tristan Schmelcher
    Signed-off-by: Richard Weinberger

    Mickaël Salaün
     
  • This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through
    prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64
    subarchitectures.

    secure_computing() is called first in handle_syscall() so that the
    syscall emulation will be aborted quickly if matching a seccomp rule.

    This is inspired by Meredydd Luff's patch
    (https://gerrit.chromium.org/gerrit/21425).
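
    From a process's point of view, SECCOMP_MODE_STRICT through prctl(2) can
    be exercised as in the sketch below. The function name is hypothetical.
    The sketch assumes the usual strict-mode behaviour on Linux, where a
    system call outside the small allowed set terminates the task with
    SIGKILL, and reports -1 when seccomp cannot be enabled at all.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, put it into SECCOMP_MODE_STRICT and let it make a system
 * call outside the strict-mode allowed set.
 * Returns 1 if the kernel killed the child with SIGKILL, 0 if the child
 * survived, and -1 if seccomp could not be enabled or on other errors. */
static int strict_mode_kills_child(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
			_exit(2);            /* seccomp unavailable */
		syscall(SYS_getpid);     /* not permitted in strict mode */
		_exit(0);                /* only reached if the call was allowed */
	}

	int status = 0;

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
		return 1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
		return -1;
	return 0;
}
```

    Running the experiment in a forked child keeps the calling process
    unrestricted, since a seccomp mode cannot be switched off again.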

    Signed-off-by: Mickaël Salaün
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Andy Lutomirski
    Cc: Will Drewry
    Cc: Chris Metcalf
    Cc: Michael Ellerman
    Cc: James Hogan
    Cc: Meredydd Luff
    Cc: David Drysdale
    Signed-off-by: Richard Weinberger
    Acked-by: Kees Cook

    Mickaël Salaün
     
  • Add subarchitecture-independent implementation of asm-generic/syscall.h
    allowing access to user system call parameters and results:
    * syscall_get_nr()
    * syscall_rollback()
    * syscall_get_error()
    * syscall_get_return_value()
    * syscall_set_return_value()
    * syscall_get_arguments()
    * syscall_set_arguments()
    * syscall_get_arch() provided by arch/x86/um/asm/syscall.h

    This provides the necessary syscall helpers needed by
    HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().

    This is inspired by Meredydd Luff's patch
    (https://gerrit.chromium.org/gerrit/21425).

    Signed-off-by: Mickaël Salaün
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Andy Lutomirski
    Cc: Will Drewry
    Cc: Meredydd Luff
    Cc: David Drysdale
    Signed-off-by: Richard Weinberger
    Acked-by: Kees Cook

    Mickaël Salaün
     
  • This fixes two related bugs:
    * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
    * PTRACE_SETREGS can't set the orig_ax value (erased by initial value)

    Get rid of the now useless and error-prone get_syscall().

    Fix inconsistent behavior in the ptrace implementation for i386, where
    updating orig_eax automatically updated the syscall number as well. The
    syscall number is now updated in handle_syscall().

    Signed-off-by: Mickaël Salaün
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Andy Lutomirski
    Cc: Will Drewry
    Cc: Thomas Meyer
    Cc: Nicolas Iooss
    Cc: Anton Ivanov
    Cc: Meredydd Luff
    Cc: David Drysdale
    Signed-off-by: Richard Weinberger
    Acked-by: Kees Cook

    Mickaël Salaün
     
  • This decreases the number of syscalls per read/write by half.

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • Software IRQ processing in generic architectures assumes that the
    exit out of hard IRQ may have re-enabled interrupts (some
    architectures may have an implicit EOI). It presumes them enabled
    and toggles the flags once more just in case unless this is turned
    off in the architecture specific hardirq.h by setting
    __ARCH_IRQ_EXIT_IRQS_DISABLED

    This patch adds this to UML where due to the way IRQs are handled
    it is an optimization (it works fine without it too).

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • The existing IRQ handler design in UML does not prevent reentrancy.

    This is mitigated by fd-enable/fd-disable semantics for the IO
    portion of the UML subsystem. The timer, however, can be and is
    re-entered, resulting in very deep stack usage and occasional
    stack exhaustion.

    This patch prevents this by checking if there is a timer
    interrupt in-flight before processing any pending timer interrupts.
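
    One way to picture such a guard is the mock below. It is a sketch of the
    general technique only: every name in it is hypothetical, the re-entry
    is simulated by a direct call, and deferring the nested tick until the
    current one finishes is an assumption rather than a description of the
    actual UML code.

```c
/* Mock state for the sketch; none of these names come from the UML source. */
static int timer_in_flight;   /* set while a timer interrupt is being handled */
static int timer_pending;     /* a tick arrived while one was in flight */
static int depth, max_depth;  /* instrumentation for the demo */
static int ticks_handled;

static void timer_interrupt(void);

/* Mock handler body: the first tick simulates another timer signal
 * arriving while it is still being processed. */
static void do_timer_work(void)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	if (ticks_handled++ == 0)
		timer_interrupt();    /* simulated re-entry */
	depth--;
}

static void timer_interrupt(void)
{
	if (timer_in_flight) {
		timer_pending = 1;    /* remember it, do not nest */
		return;
	}
	timer_in_flight = 1;
	do {
		timer_pending = 0;
		do_timer_work();
	} while (timer_pending);
	timer_in_flight = 0;
}
```

    Both ticks are handled, but the handler body never runs nested, so stack
    depth stays constant no matter how many ticks arrive.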

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • I was seeing some really weird behaviour where piping UML's output
    somewhere would cause output to get duplicated:

    $ ./vmlinux | head -n 40
    Checking that ptrace can change system call numbers...Core dump limits :
    soft - 0
    hard - NONE
    OK
    Checking syscall emulation patch for ptrace...Core dump limits :
    soft - 0
    hard - NONE
    OK
    Checking advanced syscall emulation patch for ptrace...Core dump limits :
    soft - 0
    hard - NONE
    OK
    Core dump limits :
    soft - 0
    hard - NONE

    This is because these tests do a fork() which duplicates the non-empty
    stdout buffer, then glibc flushes the duplicated buffer as each child
    exits.

    A simple workaround is to flush before forking.
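
    The duplication is easy to reproduce outside UML. The sketch below uses
    a temporary file instead of a pipe so that the result can be measured;
    bytes_after_fork() is a hypothetical name. It writes five bytes into a
    buffered stream, optionally flushes, forks a child that calls exit(),
    and reports how many bytes reached the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of bytes that ended up in the file, or -1 on error. */
static long bytes_after_fork(int flush_first)
{
	FILE *f = tmpfile();

	if (!f)
		return -1;

	fflush(stdout);             /* keep unrelated output out of the experiment */
	fputs("hello", f);          /* 5 bytes now sit in the stdio buffer */
	if (flush_first)
		fflush(f);              /* the workaround: empty the buffer first */

	pid_t pid = fork();

	if (pid < 0) {
		fclose(f);
		return -1;
	}
	if (pid == 0)
		exit(0);                /* exit() flushes the child's copy of the buffer */

	waitpid(pid, NULL, 0);
	fflush(f);                  /* the parent writes its own copy, if any */

	struct stat st;
	long size = fstat(fileno(f), &st) == 0 ? (long)st.st_size : -1;

	fclose(f);
	return size;
}
```

    Without the flush both processes write their copy of the buffer and the
    file holds 10 bytes; with the flush it holds the expected 5.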

    Cc: stable@vger.kernel.org
    Signed-off-by: Vegard Nossum
    Signed-off-by: Richard Weinberger

    Vegard Nossum
     

04 Jan, 2016

2 commits


09 Dec, 2015

3 commits

  • When using va_list ensure that va_start will be followed by va_end.
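
    As a reminder of the rule, here is a small stand-alone variadic function
    in which va_start() is matched by va_end() before returning. The
    function sum_ints() is a hypothetical example and is unrelated to the
    code touched by the commit.

```c
#include <stdarg.h>

/* Sum `count` int arguments. The va_start() below is paired with a
 * va_end() before the function returns. */
static int sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, count);
	for (int i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);

	return total;
}
```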

    Signed-off-by: Geyslan G. Bem
    Signed-off-by: Richard Weinberger

    Geyslan G. Bem
     
  • On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with:

    arch/um/os-Linux/built-in.o: In function `os_timer_create':
    /android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
    arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
    /android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
    arch/um/os-Linux/built-in.o: In function `os_timer_remain':
    /android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
    arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
    /android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
    arch/um/os-Linux/built-in.o: In function `os_timer_disable':
    /android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'

    This is because -lrt appears in the generated link commandline
    after arch/um/os-Linux/built-in.o. Fix this by removing -lrt from
    arch/um/Makefile and adding it to the UM-specific section of
    scripts/link-vmlinux.sh.

    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Richard Weinberger

    Lorenzo Colitti
     
  • If get_signal() returns us a signal to post
    we must not call it again, otherwise the already
    posted signal will be overridden.
    Before commit a610d6e672d this was the case as we stopped
    the while loop after a successful handle_signal().

    Cc: # 3.10-
    Fixes: a610d6e672d ("pull clearing RESTORE_SIGMASK into block_sigmask()")
    Signed-off-by: Richard Weinberger

    Richard Weinberger
     

07 Nov, 2015

6 commits

  • UML is using an obsolete itimer call for
    all timers and "polls" for kernel space timer firing
    in its userspace portion resulting in a long list
    of bugs and incorrect behaviour(s). It also uses
    ITIMER_VIRTUAL for its timer which results in the
    timer being dependent on it running and the cpu
    load.

    This patch fixes this by moving to posix high resolution
    timers firing off CLOCK_MONOTONIC and relaying the timer
    correctly to the UML userspace.

    Fixes:
    - crashes when the host suspends/resumes
    - broken userspace timers - effective ~40Hz instead
    of what they should be. Note - this modifies skas behavior
    by no longer setting an itimer per clone(). Timer events
    are relayed instead.
    - kernel network packet scheduling disciplines
    - tcp behaviour especially under load
    - various timer related corner cases

    Finally, overall responsiveness of userspace is better.

    Signed-off-by: Thomas Meyer
    Signed-off-by: Anton Ivanov
    [rw: massaged commit message]
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • Replace GFP_KERNEL with GFP_ATOMIC while a spinlock is held,
    as code holding a spinlock should be atomic.
    GFP_KERNEL may sleep and can cause deadlock,
    whereas GFP_ATOMIC may fail but certainly avoids deadlock.

    Signed-off-by: Saurabh Sengar
    Signed-off-by: Richard Weinberger

    Saurabh Sengar
     
  • If UML runs out of memory on the host side, report this
    condition more nicely.

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • We can use __NR_syscall_max.

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • To support changing syscall numbers we have to store
    it after syscall_trace_enter().

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • ...such that processes within UML can do a ptrace(PTRACE_OLDSETOPTIONS, ...)

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     

20 Oct, 2015

2 commits

  • We have to exclude memory locations

    Richard Weinberger
     
  • If UML is executing a helper program it is using
    waitpid() with the __WCLONE flag to wait for the program
    as the helper is executed from a clone()'ed thread.
    While using __WCLONE is perfectly fine for clone()'ed
    children, it won't detect terminated children if the helper
    has issued an execve().

    We have to use __WALL to wait for both clone()'ed and
    regular children to detect the termination before and
    after an execve().
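
    A minimal user-space sketch of waiting with __WALL follows. The function
    run_and_reap() is hypothetical and uses fork() rather than clone() for
    brevity; the fallback definition of __WALL is the Linux value and is
    only used if the libc headers do not provide it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __WALL
#define __WALL 0x40000000   /* Linux value, in case the libc headers hide it */
#endif

/* Fork a helper that exits with `code`, then reap it with __WALL so the
 * wait succeeds whether the child was created by fork() or clone().
 * Returns the child's exit status, or -1 on error. */
static int run_and_reap(int code)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);

	int status = 0;

	if (waitpid(pid, &status, __WALL) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```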

    Reported-and-tested-by: Thomas Meyer
    Signed-off-by: Richard Weinberger

    Richard Weinberger