19 Oct, 2016

1 commit

  • This removes the redundant 'write' and 'force' parameters from
    __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
    callers as use of this flag can result in surprising behaviour (and
    hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Paolo Bonzini
    Reviewed-by: Jan Kara
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

16 Feb, 2016

1 commit

  • For protection keys, we need to understand whether protections
    should be enforced in software or not. In general, we enforce
    protections when working on our own task, but not when on others.
    We call these "current" and "remote" operations.

    This patch introduces a new get_user_pages() variant:

    get_user_pages_remote()

    Which is a replacement for when get_user_pages() is called on
    non-current tsk/mm.

    We also introduce a new gup flag: FOLL_REMOTE which can be used
    for the "__" gup variants to get this new behavior.

    The uprobes is_trap_at_addr() location holds mmap_sem and
    calls get_user_pages(current->mm) on an instruction address. This
    makes it a pretty unique gup caller. Being an instruction access
    and also really originating from the kernel (vs. the app), I opted
    to consider this a 'remote' access where protection keys will not
    be enforced.

    Without protection keys, this patch should not change any behavior.

    Signed-off-by: Dave Hansen
    Reviewed-by: Thomas Gleixner
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Naoya Horiguchi
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vlastimil Babka
    Cc: jack@suse.cz
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

21 Jan, 2016

1 commit

  • By checking the effective credentials instead of the real UID / permitted
    capabilities, ensure that the calling process actually intended to use its
    credentials.

    To ensure that all ptrace checks use the correct caller credentials (e.g.
    in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
    flag), use two new flags and require one of them to be set.

    The problem was that when a privileged task had temporarily dropped its
    privileges, e.g. by calling setreuid(0, user_uid), with the intent to
    perform following syscalls with the credentials of a user, it still passed
    ptrace access checks that the user would not be able to pass.

    While an attacker should not be able to convince the privileged task to
    perform a ptrace() syscall, this is a problem because the ptrace access
    check is reused for things in procfs.

    In particular, the following somewhat interesting procfs entries only rely
    on ptrace access checks:

    /proc/$pid/stat - uses the check for determining whether pointers
    should be visible, useful for bypassing ASLR
    /proc/$pid/maps - also useful for bypassing ASLR
    /proc/$pid/cwd - useful for gaining access to restricted
    directories that contain files with lax permissions, e.g. in
    this scenario:
    lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
    drwx------ root root /root
    drwxr-xr-x root root /root/foobar
    -rw-r--r-- root root /root/foobar/secret

    Therefore, on a system where a root-owned mode 6755 binary changes its
    effective credentials as described and then dumps a user-specified file,
    this could be used by an attacker to reveal the memory layout of root's
    processes or reveal the contents of files he is not allowed to access
    (through /proc/$pid/cwd).

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Casey Schaufler
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Andy Shevchenko
    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     

12 Apr, 2015

1 commit


12 Feb, 2015

1 commit

  • This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY to
    the page fault in order to release the mmap_sem during the I/O.

    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Kirill A. Shutemov
    Cc: Andres Lagar-Cavilla
    Cc: Peter Feiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

07 May, 2014

2 commits


13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencie between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Mark function as static in process_vm_access.c because it is not used
    outside this file.

    This eliminates the following warning in mm/process_vm_access.c:

    mm/process_vm_access.c:416:1: warning: no previous prototype for `compat_process_vm_rw' [-Wmissing-prototypes]

    [akpm@linux-foundation.org: remove unneeded asmlinkage - compat_process_vm_rw isn't referenced from asm]
    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     

02 Apr, 2014

10 commits


06 Mar, 2014

1 commit


13 Mar, 2013

1 commit

  • Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
    compat_process_vm_rw() shows that the compatibility code requires an
    explicit "access_ok()" check before calling
    compat_rw_copy_check_uvector(). The same difference seems to appear when
    we compare fs/read_write.c:do_readv_writev() to
    fs/compat.c:compat_do_readv_writev().

    This subtle difference between the compat and non-compat requirements
    should probably be debated, as it seems to be error-prone. In fact,
    there are two others sites that use this function in the Linux kernel,
    and they both seem to get it wrong:

    Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
    also ends up calling compat_rw_copy_check_uvector() through
    aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
    be missing. Same situation for
    security/keys/compat.c:compat_keyctl_instantiate_key_iov().

    I propose that we add the access_ok() check directly into
    compat_rw_copy_check_uvector(), so callers don't have to worry about it,
    and it therefore makes the compat call code similar to its non-compat
    counterpart. Place the access_ok() check in the same location where
    copy_from_user() can trigger a -EFAULT error in the non-compat code, so
    the ABI behaviors are alike on both compat and non-compat.

    While we are here, fix compat_do_readv_writev() so it checks for
    compat_rw_copy_check_uvector() negative return values.

    And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
    handling.

    Acked-by: Linus Torvalds
    Acked-by: Al Viro
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

01 Jun, 2012

1 commit

  • A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
    changes made to support CMA in an earlier patch.

    Rather than having an additional check_access parameter to these
    functions, the first paramater type is overloaded to allow the caller to
    specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
    are valid, but do not check the memory that they point to. This is used
    by process_vm_readv/writev where we need to validate that a iovec passed
    to the syscall is valid but do not want to check the memory that it points
    to at this point because it refers to an address space in another process.

    Signed-off-by: Chris Yeoh
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

03 Feb, 2012

1 commit

  • This fixes the race in process_vm_core found by Oleg (see

    http://article.gmane.org/gmane.linux.kernel/1235667/

    for details).

    This has been updated since I last sent it as the creation of the new
    mm_access() function did almost exactly the same thing as parts of the
    previous version of this patch did.

    In order to use mm_access() even when /proc isn't enabled, we move it to
    kernel/fork.c where other related process mm access functions already
    are.

    Signed-off-by: Chris Yeoh
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh