04 Dec, 2011

1 commit


01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

27 Aug, 2011

1 commit


16 Jul, 2011

1 commit

  • As promised in feature-removal-schedule.txt it is time to
    remove the nfsctl system call.

    Userspace has perferred to not use this call throughout 2.6 and it has been
    excluded in the default configuration since 2.6.36 (9 months ago).

    So this patch removes all the code that was being compiled out.

    There are still references to sys_nfsctl in various arch systemcall tables
    and related code. These should be cleaned out too, probably in the next
    merge window.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

28 Jun, 2011

1 commit


24 May, 2011

1 commit

  • fixes this build error on sparc64 (at least):

    In file included from arch/sparc/include/asm/siginfo.h:19,
    from include/linux/signal.h:5,
    from include/linux/sched.h:73,
    from arch/sparc/kernel/asm-offsets.c:13:
    include/linux/compat.h:401: error: expected ')' before 'ctx_id'
    include/linux/compat.h:406: error: expected ')' before 'ctx_id'

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Chris Metcalf

    Stephen Rothwell
     

20 May, 2011

1 commit

  • I touched this file when adding support for the "tilegx" sub-architecture,
    and Andrew Morton observed "The file's a mismash of old-style, wrong-style
    and right-style. There's no point in doing mishmash preservation!
    May as well fix things up when we touch them."

    Accordingly, this change makes as checkpatch-clean
    as possible. It makes no semantic changes whatsoever.

    Signed-off-by: Chris Metcalf
    Signed-off-by: Andrew Morton

    Chris Metcalf
     

13 May, 2011

1 commit

  • The existing mechanism doesn't really provide
    enough to create the 64-bit "compat" ABI properly in a generic way,
    since the compat ABI is a mix of things were you can re-use the 64-bit
    versions of syscalls and things where you need a compat wrapper.

    To provide this in the most direct way possible, I added two new macros
    to go along with the existing __SYSCALL and __SC_3264 macros: __SC_COMP
    and SC_COMP_3264. These macros take an additional argument, typically a
    "compat_sys_xxx" function, which is passed to __SYSCALL if you define
    __SYSCALL_COMPAT when including the header, resulting in a pointer to
    the compat function being placed in the generated syscall table.

    The change also adds some missing definitions to so that
    it actually has declarations for all the compat syscalls, since the
    "[nr] = ##call" approach requires proper C declarations for all the
    functions included in the syscall table.

    Finally, compat.c defines compat_sys_sigpending() and
    compat_sys_sigprocmask() even if the underlying architecture doesn't
    request it, which tries to pull in undefined compat_old_sigset_t defines.
    We need to guard those compat syscall definitions with appropriate
    __ARCH_WANT_SYS_xxx ifdefs.

    Acked-by: Arnd Bergmann
    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

15 Sep, 2010

1 commit

  • compat_alloc_user_space() expects the caller to independently call
    access_ok() to verify the returned area. A missing call could
    introduce problems on some architectures.

    This patch incorporates the access_ok() check into
    compat_alloc_user_space() and also adds a sanity check on the length.
    The existing compat_alloc_user_space() implementations are renamed
    arch_compat_alloc_user_space() and are used as part of the
    implementation of the new global function.

    This patch assumes NULL will cause __get_user()/__put_user() to either
    fail or access userspace on all architectures. This should be
    followed by checking the return value of compat_access_user_space()
    for NULL in the callers, at which time the access_ok() in the callers
    can also be removed.

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Chris Metcalf
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Tony Luck
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: James Bottomley
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc:

    H. Peter Anvin
     

14 Aug, 2010

1 commit

  • Mark arguments to certain system calls as being const where they should be but
    aren't. The list includes:

    (*) The filename arguments of various stat syscalls, execve(), various utimes
    syscalls and some mount syscalls.

    (*) The filename arguments of some syscall helpers relating to the above.

    (*) The buffer argument of various write syscalls.

    Signed-off-by: David Howells
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David Howells
     

28 May, 2010

1 commit

  • It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
    writev AIO operations were not functioning properly. It turns out that
    the code to convert the 32bit io vectors to 64 bits was never written.
    The results of that can be pretty bad, but in my testing, it mostly ended
    up in generating EFAULT as we walked off the list of I/O vectors provided.

    This patch set fixes the problem in my environment. are greatly
    appreciated.

    This patch:

    Factor out code that will be used by both compat_do_readv_writev and the
    compat aio submission code paths.

    Signed-off-by: Jeff Moyer
    Reported-by: Michael Tokarev
    Cc: Zach Brown
    Cc: [2.6.35.1]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     

13 Mar, 2010

1 commit

  • Add a generic implementation of the old select() syscall, which expects
    its argument in a memory block and switch all architectures over to use
    it.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Acked-by: Andreas Schwab
    Acked-by: Russell King
    Acked-by: Greg Ungerer
    Acked-by: David Howells
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

09 Nov, 2009

1 commit

  • This adds compat_ioctl support for SIOCWANDEV, which has
    always been missing.

    The definition of struct compat_ifreq was missing an
    ifru_settings fields that is needed to support SIOCWANDEV,
    so add that and clean up the whitespace damage in the
    struct definition.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

07 Nov, 2009

2 commits


01 May, 2009

1 commit

  • sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is
    missing a thread directed counterpart. Such an interface is important
    for migrating applications from other OSes which have the per thread
    delivery implemented.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: Ulrich Drepper

    Thomas Gleixner
     

05 Apr, 2009

1 commit

  • Instead of always splitting the file offset into 32-bit 'high' and 'low'
    parts, just split them into the largest natural word-size - which in C
    terms is 'unsigned long'.

    This allows 64-bit architectures to avoid the unnecessary 32-bit
    shifting and masking for native format (while the compat interfaces will
    obviously always have to do it).

    This also changes the order of 'high' and 'low' to be "low first". Why?
    Because when we have it like this, the 64-bit system calls now don't use
    the "pos_high" argument at all, and it makes more sense for the native
    system call to simply match the user-mode prototype.

    This results in a much more natural calling convention, and allows the
    compiler to generate much more straightforward code. On x86-64, we now
    generate

    testq %rcx, %rcx # pos_l
    js .L122 #,
    movq %rcx, -48(%rbp) # pos_l, pos

    from the C source

    loff_t pos = pos_from_hilo(pos_h, pos_l);
    ...
    if (pos < 0)
    return -EINVAL;

    and the 'pos_h' register isn't even touched. It used to generate code
    like

    mov %r8d, %r8d # pos_low, pos_low
    salq $32, %rcx #, tmp71
    movq %r8, %rax # pos_low, pos.386
    orq %rcx, %rax # tmp71, pos.386
    js .L122 #,
    movq %rax, -48(%rbp) # pos.386, pos

    which isn't _that_ horrible, but it does show how the natural word size
    is just a more sensible interface (same arguments will hold in the user
    level glibc wrapper function, of course, so the kernel side is just half
    of the equation!)

    Note: in all cases the user code wrapper can again be the same. You can
    just do

    #define HALF_BITS (sizeof(unsigned long)*4)
    __syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

    or something like that. That way the user mode wrapper will also be
    nicely passing in a zero (it won't actually have to do the shifts, the
    compiler will understand what is going on) for the last argument.

    And that is a good idea, even if nobody will necessarily ever care: if
    we ever do move to a 128-bit lloff_t, this particular system call might
    be left alone. Of course, that will be the least of our worries if we
    really ever need to care, so this may not be worth really caring about.

    [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

    Acked-by: Gerd Hoffmann
    Cc: H. Peter Anvin
    Cc: Andrew Morton
    Cc: linux-api@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: Ingo Molnar
    Cc: Ralf Baechle >
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Apr, 2009

1 commit

  • This patch adds preadv and pwritev system calls. These syscalls are a
    pretty straightforward combination of pread and readv (same for write).
    They are quite useful for doing vectored I/O in threaded applications.
    Using lseek+readv instead opens race windows you'll have to plug with
    locking.

    Other systems have such system calls too, for example NetBSD, check
    here: http://www.daemon-systems.org/man/preadv.2.html

    The application-visible interface provided by glibc should look like
    this to be compatible to the existing implementations in the *BSD family:

    ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
    ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

    This prototype has one problem though: On 32bit archs is the (64bit)
    offset argument unaligned, which the syscall ABI of several archs doesn't
    allow to do. At least s390 needs a wrapper in glibc to handle this. As
    we'll need a wrappers in glibc anyway I've decided to push problem to
    glibc entriely and use a syscall prototype which works without
    arch-specific wrappers inside the kernel: The offset argument is
    explicitly splitted into two 32bit values.

    The patch sports the actual system call implementation and the windup in
    the x86 system call tables. Other archs follow as separate patches.

    Signed-off-by: Gerd Hoffmann
    Cc: Arnd Bergmann
    Cc: Al Viro
    Cc:
    Cc:
    Cc: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     

28 Mar, 2009

1 commit

  • Due to a different size of ino_t ustat needs a compat handler, but
    currently only x86 and mips provide one. Add a generic compat_sys_ustat
    and switch all architectures over to it. Instead of doing various
    user copy hacks compat_sys_ustat just reimplements sys_ustat as
    it's trivial. This was suggested by Arnd Bergmann.

    Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
    stack smashing warnings on RHEL/Fedora due to the too large amount of
    data writen by the syscall.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

14 Jan, 2009

1 commit


01 Dec, 2008

1 commit

  • All architectures now use the generic compat_sys_ptrace, as should every
    new architecture that needs 32bit compat (if we'll ever get another).

    Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
    kill a comment about __ARCH_SYS_PTRACE that was added after
    __ARCH_SYS_PTRACE was already gone.

    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

17 Oct, 2008

2 commits

  • Nothing arch specific in get/settimeofday. The details of the timeval
    conversion varied a little from arch to arch, but all with the same
    results.

    Also add an extern declaration for sys_tz to linux/time.h because externs
    in .c files are fowned upon. I'll kill the externs in various other files
    in a sparate patch.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller [ sparc bits ]
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Acked-by: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • struct stat / compat_stat is the same on all architectures, so
    cp_compat_stat should be, too.

    Turns out it is, except that various architectures have slightly and some
    high2lowuid/high2lowgid or the direct assignment instead of the
    SET_UID/SET_GID that expands to the correct one anyway.

    This patch replaces the arch-specific cp_compat_stat implementations with
    a common one based on the x86-64 one.

    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller [ sparc bits ]
    Acked-by: Kyle McMartin [ parisc bits ]
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

01 May, 2008

1 commit

  • This adds support for setting the TAI value (International Atomic Time). The
    value is reported back to userspace via timex (as we don't have a
    ntp_gettime() syscall).

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

31 Mar, 2008

1 commit


07 Feb, 2008

1 commit


06 Feb, 2008

1 commit

  • This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
    const struct itimerspec *utmr,
    struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

    The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
    parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

    The timerfd_settime() API give new settings by the timerfd fd, by optionally
    retrieving the previous expiration time (in case the "otmr" parameter is not
    NULL).

    The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
    is set in the "flags" parameter. Otherwise it's a relative time.

    The timerfd_gettime() API returns the next expiration time of the timer, or
    {0, 0} if the timerfd has not been set yet.

    Like the previous timerfd API implementation, read(2) and poll(2) are
    supported (with the same interface). Here's a simple test program I used to
    exercise the new timerfd APIs:

    http://www.xmailserver.org/timerfd-test2.c

    [akpm@linux-foundation.org: coding-style cleanups]
    [akpm@linux-foundation.org: fix ia64 build]
    [akpm@linux-foundation.org: fix m68k build]
    [akpm@linux-foundation.org: fix mips build]
    [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
    [heiko.carstens@de.ibm.com: fix s390]
    [akpm@linux-foundation.org: fix powerpc build]
    [akpm@linux-foundation.org: fix sparc64 more]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Thomas Gleixner
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Cc: Michael Kerrisk
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

30 Jan, 2008

3 commits

  • This adds a generic definition of compat_sys_ptrace that calls
    compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace. Some
    machines needing this already define a function by that name.
    The new generic function is defined only on machines that
    put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h.

    Signed-off-by: Roland McGrath
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Roland McGrath
     
  • This adds a compat_ptrace_request that is the analogue of ptrace_request
    for the things that 32-on-64 ptrace implementations can share in common.
    So far there are just a couple of requests handled generically.

    Signed-off-by: Roland McGrath
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Roland McGrath
     
  • White space and coding style clenaup.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

15 May, 2007

1 commit


11 May, 2007

1 commit


10 May, 2007

1 commit


08 Mar, 2007

1 commit

  • IA64 and ARM-OABI are currently using their own version of epoll compat_
    code.

    An architecture needs epoll_event translation if alignof(u64) in 32 bit
    mode is different from alignof(u64) in 64 bit mode. If an architecture
    needs epoll_event translation, it must define struct compat_epoll_event in
    asm/compat.h and set CONFIG_HAVE_COMPAT_EPOLL_EVENT and use
    compat_sys_epoll_ctl and compat_sys_epoll_wait.

    All 64 bit architecture should use compat_sys_epoll_pwait.

    [sfr: restructure and move to fs/compat.c, remove MIPS version
    of compat_sys_epoll_pwait, use __put_user_unaligned]

    Signed-off-by: Stephen Rothwell
    Cc: David Woodhouse
    Cc: Russell King
    Cc: "Luck, Tony"
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

04 Nov, 2006

1 commit


29 Oct, 2006

1 commit


11 Oct, 2006

1 commit


03 Oct, 2006

1 commit

  • Duh. I screwed up editing David Howells patch in commit
    3f2e05e90e0846c42626e3d272454f26be34a1bc, and the actual declaration for
    the sigset_from_compat() function went missing. My bad.

    Olaf Hering saved the day and noticed that I'm a moron.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 Oct, 2006

1 commit

  • Revert Andrew Morton's patch to temporarily hack around the lack of a
    declaration of sigset_t in linux/compat.h to make the block-disablement
    patches build on IA64. This got accidentally pushed to Linus and should
    be fixed in a different manner.

    Also make linux/compat.h #include asm/signal.h to gain a definition of
    sigset_t so that it can externally declare sigset_from_compat().

    This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
    ppc64 and run-tested on frv.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

27 Jun, 2006

1 commit