04 Dec, 2011
1 commit
-
Also prototype the "compat" functions so they can be referenced
from C code.Signed-off-by: Chris Metcalf
Acked-by: Arnd Bergmann
01 Nov, 2011
1 commit
-
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call. There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.- Use of /proc/pid/mem has been considered, but there are issues with
using it:
- Does not allow for specifying iovecs for both src and dest, assuming
preadv or pwritev was implemented either the area read from or
written to would need to be contiguous.
- Currently mem_read allows only processes who are currently
ptrace'ing the target and are still able to ptrace the target to read
from the target. This check could possibly be moved to the open call,
but its not clear exactly what race this restriction is stopping
(reason appears to have been lost)
- Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
domain socket is a bit ugly from a userspace point of view,
especially when you may have hundreds if not (eventually) thousands
of processes that all need to do this with each other
- Doesn't allow for some future use of the interface we would like to
consider adding in the future (see below)
- Interestingly reading from /proc/pid/mem currently actually
involves two copies! (But this could be fixed pretty easily)As mentioned previously use of vmsplice instead was considered, but has
problems. Since you need the reader and writer working co-operatively if
the pipe is not drained then you block. Which requires some wrapping to
do non blocking on the send side or polling on the receive. In all to all
communication it requires ordering otherwise you can deadlock. And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could. For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy. We don't need to keep a copy of the data
from the source. I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI). This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication. And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:http://marc.info/?l=linux-mm&m=130105930902915&w=2
There is a basic man page for the proposed interface here:
http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
Signed-off-by: Chris Yeoh
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Arnd Bergmann
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: James Morris
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Aug, 2011
1 commit
-
The nfsservctl system call is now gone, so we should remove all
linkage for it.Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Linus Torvalds
16 Jul, 2011
1 commit
-
As promised in feature-removal-schedule.txt it is time to
remove the nfsctl system call.Userspace has perferred to not use this call throughout 2.6 and it has been
excluded in the default configuration since 2.6.36 (9 months ago).So this patch removes all the code that was being compiled out.
There are still references to sys_nfsctl in various arch systemcall tables
and related code. These should be cleaned out too, probably in the next
merge window.Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
28 Jun, 2011
1 commit
-
This is required for tilegx to be able to use the compat unistd.h header
where compat_sys_sendmmsg() is now mentioned.Signed-off-by: Chris Metcalf
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 May, 2011
1 commit
-
fixes this build error on sparc64 (at least):
In file included from arch/sparc/include/asm/siginfo.h:19,
from include/linux/signal.h:5,
from include/linux/sched.h:73,
from arch/sparc/kernel/asm-offsets.c:13:
include/linux/compat.h:401: error: expected ')' before 'ctx_id'
include/linux/compat.h:406: error: expected ')' before 'ctx_id'Signed-off-by: Stephen Rothwell
Signed-off-by: Chris Metcalf
20 May, 2011
1 commit
-
I touched this file when adding support for the "tilegx" sub-architecture,
and Andrew Morton observed "The file's a mismash of old-style, wrong-style
and right-style. There's no point in doing mishmash preservation!
May as well fix things up when we touch them."Accordingly, this change makes as checkpatch-clean
as possible. It makes no semantic changes whatsoever.Signed-off-by: Chris Metcalf
Signed-off-by: Andrew Morton
13 May, 2011
1 commit
-
The existing mechanism doesn't really provide
enough to create the 64-bit "compat" ABI properly in a generic way,
since the compat ABI is a mix of things were you can re-use the 64-bit
versions of syscalls and things where you need a compat wrapper.To provide this in the most direct way possible, I added two new macros
to go along with the existing __SYSCALL and __SC_3264 macros: __SC_COMP
and SC_COMP_3264. These macros take an additional argument, typically a
"compat_sys_xxx" function, which is passed to __SYSCALL if you define
__SYSCALL_COMPAT when including the header, resulting in a pointer to
the compat function being placed in the generated syscall table.The change also adds some missing definitions to so that
it actually has declarations for all the compat syscalls, since the
"[nr] = ##call" approach requires proper C declarations for all the
functions included in the syscall table.Finally, compat.c defines compat_sys_sigpending() and
compat_sys_sigprocmask() even if the underlying architecture doesn't
request it, which tries to pull in undefined compat_old_sigset_t defines.
We need to guard those compat syscall definitions with appropriate
__ARCH_WANT_SYS_xxx ifdefs.Acked-by: Arnd Bergmann
Signed-off-by: Chris Metcalf
15 Sep, 2010
1 commit
-
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area. A missing call could
introduce problems on some architectures.This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures. This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.Reported-by: Ben Hawkes
Signed-off-by: H. Peter Anvin
Acked-by: Benjamin Herrenschmidt
Acked-by: Chris Metcalf
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Thomas Gleixner
Acked-by: Tony Luck
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Heiko Carstens
Cc: Helge Deller
Cc: James Bottomley
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc:
14 Aug, 2010
1 commit
-
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
28 May, 2010
1 commit
-
It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
writev AIO operations were not functioning properly. It turns out that
the code to convert the 32bit io vectors to 64 bits was never written.
The results of that can be pretty bad, but in my testing, it mostly ended
up in generating EFAULT as we walked off the list of I/O vectors provided.This patch set fixes the problem in my environment. are greatly
appreciated.This patch:
Factor out code that will be used by both compat_do_readv_writev and the
compat aio submission code paths.Signed-off-by: Jeff Moyer
Reported-by: Michael Tokarev
Cc: Zach Brown
Cc: [2.6.35.1]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Mar, 2010
1 commit
-
Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Acked-by: Andreas Schwab
Acked-by: Russell King
Acked-by: Greg Ungerer
Acked-by: David Howells
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Nov, 2009
1 commit
-
This adds compat_ioctl support for SIOCWANDEV, which has
always been missing.The definition of struct compat_ifreq was missing an
ifru_settings fields that is needed to support SIOCWANDEV,
so add that and clean up the whitespace damage in the
struct definition.Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller
07 Nov, 2009
2 commits
-
It's defined colloqually in linux/if.h and linux/compat.h
includes that.Signed-off-by: David S. Miller
-
In order to move socket ioctl conversion code into multiple
places in the socket code, we need a common defintion of
the data structures it uses.Also change the name from ifreq32 to compat_ifreq to
follow the naming convention for compat.hSigned-off-by: Arnd Bergmann
Signed-off-by: David S. Miller
01 May, 2009
1 commit
-
sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is
missing a thread directed counterpart. Such an interface is important
for migrating applications from other OSes which have the per thread
delivery implemented.Signed-off-by: Thomas Gleixner
Reviewed-by: Oleg Nesterov
Acked-by: Roland McGrath
Acked-by: Ulrich Drepper
05 Apr, 2009
1 commit
-
Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).This also changes the order of 'high' and 'low' to be "low first". Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code. On x86-64, we now
generatetestq %rcx, %rcx # pos_l
js .L122 #,
movq %rcx, -48(%rbp) # pos_l, posfrom the C source
loff_t pos = pos_from_hilo(pos_h, pos_l);
...
if (pos < 0)
return -EINVAL;and the 'pos_h' register isn't even touched. It used to generate code
likemov %r8d, %r8d # pos_low, pos_low
salq $32, %rcx #, tmp71
movq %r8, %rax # pos_low, pos.386
orq %rcx, %rax # tmp71, pos.386
js .L122 #,
movq %rax, -48(%rbp) # pos.386, poswhich isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)Note: in all cases the user code wrapper can again be the same. You can
just do#define HALF_BITS (sizeof(unsigned long)*4)
__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);or something like that. That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone. Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]
Acked-by: Gerd Hoffmann
Cc: H. Peter Anvin
Cc: Andrew Morton
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar
Cc: Ralf Baechle >
Cc: Al Viro
Signed-off-by: Linus Torvalds
03 Apr, 2009
1 commit
-
This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.htmlThe application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);This prototype has one problem though: On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs doesn't
allow to do. At least s390 needs a wrapper in glibc to handle this. As
we'll need a wrappers in glibc anyway I've decided to push problem to
glibc entriely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: The offset argument is
explicitly splitted into two 32bit values.The patch sports the actual system call implementation and the windup in
the x86 system call tables. Other archs follow as separate patches.Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc:
Cc:
Cc: Ralf Baechle
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Mar, 2009
1 commit
-
Due to a different size of ino_t ustat needs a compat handler, but
currently only x86 and mips provide one. Add a generic compat_sys_ustat
and switch all architectures over to it. Instead of doing various
user copy hacks compat_sys_ustat just reimplements sys_ustat as
it's trivial. This was suggested by Arnd Bergmann.Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
stack smashing warnings on RHEL/Fedora due to the too large amount of
data writen by the syscall.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
14 Jan, 2009
1 commit
-
Move declarations to correct header file.
Signed-off-by: Heiko Carstens
01 Dec, 2008
1 commit
-
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.Signed-off-by: Christoph Hellwig
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
17 Oct, 2008
2 commits
-
Nothing arch specific in get/settimeofday. The details of the timeval
conversion varied a little from arch to arch, but all with the same
results.Also add an extern declaration for sys_tz to linux/time.h because externs
in .c files are fowned upon. I'll kill the externs in various other files
in a sparate patch.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig
Acked-by: David S. Miller [ sparc bits ]
Cc: "Luck, Tony"
Cc: Ralf Baechle
Acked-by: Kyle McMartin
Cc: Matthew Wilcox
Cc: Grant Grundler
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
struct stat / compat_stat is the same on all architectures, so
cp_compat_stat should be, too.Turns out it is, except that various architectures have slightly and some
high2lowuid/high2lowgid or the direct assignment instead of the
SET_UID/SET_GID that expands to the correct one anyway.This patch replaces the arch-specific cp_compat_stat implementations with
a common one based on the x86-64 one.Signed-off-by: Christoph Hellwig
Acked-by: David S. Miller [ sparc bits ]
Acked-by: Kyle McMartin [ parisc bits ]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 May, 2008
1 commit
-
This adds support for setting the TAI value (International Atomic Time). The
value is reported back to userspace via timex (as we don't have a
ntp_gettime() syscall).Signed-off-by: Roman Zippel
Cc: john stultz
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Mar, 2008
1 commit
-
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
07 Feb, 2008
1 commit
-
Remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol.
Signed-off-by: Jiri Olsa
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Feb, 2008
1 commit
-
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi
Cc: Michael Kerrisk
Cc: Thomas Gleixner
Cc: Davide Libenzi
Cc: Michael Kerrisk
Cc: Martin Schwidefsky
Signed-off-by: Heiko Carstens
Cc: Michael Kerrisk
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Jan, 2008
3 commits
-
This adds a generic definition of compat_sys_ptrace that calls
compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace. Some
machines needing this already define a function by that name.
The new generic function is defined only on machines that
put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h.Signed-off-by: Roland McGrath
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner -
This adds a compat_ptrace_request that is the analogue of ptrace_request
for the things that 32-on-64 ptrace implementations can share in common.
So far there are just a couple of requests handled generically.Signed-off-by: Roland McGrath
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner -
White space and coding style clenaup.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
15 May, 2007
1 commit
-
compat_sys_signalfd and compat_sys_timerfd need declarations before
PowerPC can wire them up.Signed-off-by: Stephen Rothwell
Signed-off-by: Linus Torvalds
11 May, 2007
1 commit
-
This patch implements the necessary compat code for the timerfd system call.
Signed-off-by: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 May, 2007
1 commit
-
This is needed before Powerpc can wire up the syscall.
Signed-off-by: Stephen Rothwell
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Mar, 2007
1 commit
-
IA64 and ARM-OABI are currently using their own version of epoll compat_
code.An architecture needs epoll_event translation if alignof(u64) in 32 bit
mode is different from alignof(u64) in 64 bit mode. If an architecture
needs epoll_event translation, it must define struct compat_epoll_event in
asm/compat.h and set CONFIG_HAVE_COMPAT_EPOLL_EVENT and use
compat_sys_epoll_ctl and compat_sys_epoll_wait.All 64 bit architecture should use compat_sys_epoll_pwait.
[sfr: restructure and move to fs/compat.c, remove MIPS version
of compat_sys_epoll_pwait, use __put_user_unaligned]Signed-off-by: Stephen Rothwell
Cc: David Woodhouse
Cc: Russell King
Cc: "Luck, Tony"
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Nov, 2006
1 commit
-
This is needed on bigendian 64bit architectures.
Signed-off-by: Stephen Rothwell
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Oct, 2006
1 commit
-
This means we can call it when the bitmap we want to fetch is declared
const.Signed-off-by: Stephen Rothwell
Cc: Christoph Lameter
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Oct, 2006
1 commit
-
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
03 Oct, 2006
1 commit
-
Duh. I screwed up editing David Howells patch in commit
3f2e05e90e0846c42626e3d272454f26be34a1bc, and the actual declaration for
the sigset_from_compat() function went missing. My bad.Olaf Hering saved the day and noticed that I'm a moron.
Signed-off-by: Linus Torvalds
02 Oct, 2006
1 commit
-
Revert Andrew Morton's patch to temporarily hack around the lack of a
declaration of sigset_t in linux/compat.h to make the block-disablement
patches build on IA64. This got accidentally pushed to Linus and should
be fixed in a different manner.Also make linux/compat.h #include asm/signal.h to gain a definition of
sigset_t so that it can externally declare sigset_from_compat().This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
ppc64 and run-tested on frv.Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
27 Jun, 2006
1 commit
-
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds