Eric Lee / smarc-fsl-linux-kernel

17 Jan, 2012

1 commit

5674124f9 Merge branch 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Move from trace_syscalls.c to asm/syscall.h
x86, um: Fix typo in 32-bit system call modifications
um: Use $(srctree) not $(KBUILD_SRC)
x86, um: Mark system call tables readonly
x86, um: Use the same style generated syscall tables as native
um: Generate headers before generating user-offsets.s
um: Run host archheaders, allow use of host generated headers
kbuild, headers.sh: Don't make archheaders explicitly
x86, syscall: Allow syscall offset to be symbolic
x86, syscall: Re-fix typo in comment
x86: Simplify syscallhdr.sh
x86: Generate system call tables and unistd_*.h from tables
checksyscalls: Use arch/x86/syscalls/syscall_32.tbl as source
x86: Machine-readable syscall tables and scripts to process them
trace: Include in trace_syscalls.c
x86-64, ia32: Move compat_ni_syscall into C and its own file
x86-64, syscall: Adjust comment spacing and remove typo
kbuild: Add support for an "archheaders" target
kbuild: Add support for installing generated asm headers

Linus Torvalds
2012-01-17 10:19:19 +0800

06 Dec, 2011

2 commits

f6b2bc847 x86-64: Cleanup some assembly entry points ... Browse Code »

system_call_after_swapgs doesn't really benefit from forcing
alignment from it - quite the opposite, native code needlessly
so far got a big NOP instruction inserted in front of it. Xen
being the only user of the separate entry point can well live
with the branch going to three bytes into a cache line.

The compatibility mode ptregs entry points for one can make use
of the GLOBAL() macro, and should be suitably aligned. Their
shared continuation point (ia32_ptregs_common) otoh doesn't need
to be global at all, but should continue to be properly aligned.

Signed-off-by: Jan Beulich
Reviewed-by: Andi Kleen
Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar

Jan Beulich
2011-12-06 00:24:43 +0800
46db09d3f x86-64: Slightly shorten line system call entry and exit paths ... Browse Code »

GET_THREAD_INFO() involves a memory read immediately followed by
an "sub" on the value read, in turn (in several cases)
immediately followed by a use of the calculated value as the
base address of a memory access. This combination of
instructions has a non-negligible potential for stalls.

In the system call entry point code, however, the (fixed) offset
of the stack pointer from the end of the stack is generally
known, and hence we can instead avoid the memory load and
subtract, and instead do the memory reference using %rsp as the
base register. To do so in a legible fashion, introduce a
THREAD_INFO() macro which, provided a register (generally %rsp)
and the known offset from the end of the stack, produces a
suitable memory access operand.

The patch attempts to only touch the fast paths (no auditing and
alike), but manages to do so only in the 64-bit entry point
case; the compatibility mode entry points have so many
interdependencies between their various branch targets that it
was necessary to also adjust the slow paths to eliminate the
risk of having missed some register dependency during code
analysis.

Signed-off-by: Jan Beulich
Reviewed-by: Andi Kleen
Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar

Jan Beulich
2011-12-06 00:24:41 +0800

19 Nov, 2011

1 commit

61f1e7e20 x86, syscall: Re-fix typo in comment ... Browse Code »

Fix the same typo as was fixed in:

b7641d2c x86-64, syscall: Adjust comment spacing and remove typo

... for the new versions of this file (32-bit and IA32 compat).

Signed-off-by: H. Peter Anvin
Link: http://lkml.kernel.org/r/1321569446-20433-4-git-send-email-hpa@linux.intel.com

H. Peter Anvin
2011-11-19 08:25:07 +0800

18 Nov, 2011

2 commits

303395ac3 x86: Generate system call tables and unistd_*.h from tables ... Browse Code »
86

Generate system call tables and unistd_*.h automatically from the
tables in arch/x86/syscalls. All other information, like NR_syscalls,
is auto-generated, some of which is in asm-offsets_*.c.

This allows us to keep all the system call information in one place,
and allows for kernel space and user space to see different
information; this is currently used for the ia32 system call numbers
when building the 64-bit kernel, but will be used by the x32 ABI in
the near future.

This also removes some gratuitious differences between i386, x86-64
and ia32; in particular, now all system call tables are generated with
the same mechanism.

Cc: H. J. Lu
Cc: Sam Ravnborg
Cc: Michal Marek
Signed-off-by: H. Peter Anvin

H. Peter Anvin
2011-11-18 05:35:37 +0800
e79a7fccf x86-64, ia32: Move compat_ni_syscall into C and its own file ... Browse Code »

Move compat_ni_syscall out of ia32entry.S and into its own .c file.
Although this is a trivial function, it is not performance-critical,
and this will simplify further cleanups.

Signed-off-by: H. Peter Anvin

H. Peter Anvin
2011-11-18 05:35:35 +0800

01 Nov, 2011

1 commit

fcf634098 Cross Memory Attach ... Browse Code »

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call. There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
using it:
- Does not allow for specifying iovecs for both src and dest, assuming
preadv or pwritev was implemented either the area read from or
written to would need to be contiguous.
- Currently mem_read allows only processes who are currently
ptrace'ing the target and are still able to ptrace the target to read
from the target. This check could possibly be moved to the open call,
but its not clear exactly what race this restriction is stopping
(reason appears to have been lost)
- Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
domain socket is a bit ugly from a userspace point of view,
especially when you may have hundreds if not (eventually) thousands
of processes that all need to do this with each other
- Doesn't allow for some future use of the interface we would like to
consider adding in the future (see below)
- Interestingly reading from /proc/pid/mem currently actually
involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems. Since you need the reader and writer working co-operatively if
the pipe is not drained then you block. Which requires some wrapping to
do non blocking on the send side or polling on the receive. In all to all
communication it requires ordering otherwise you can deadlock. And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could. For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy. We don't need to keep a copy of the data
from the source. I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI). This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication. And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Arnd Bergmann
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: James Morris
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christopher Yeoh
2011-11-01 08:30:44 +0800

27 Aug, 2011

1 commit

f5b940997 All Arch: remove linkage for sys_nfsservctl system call ... Browse Code »

The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Linus Torvalds

NeilBrown
2011-08-27 06:09:58 +0800

27 Jul, 2011

1 commit

60063497a atomic: use <linux/atomic.h> ... Browse Code »

This allows us to move duplicated code in
(atomic_inc_not_zero() for now) to

Signed-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arun Sharma
2011-07-27 07:49:47 +0800

23 Jul, 2011

1 commit

805120795 Merge branch 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel… ... Browse Code »

…/git/tip/linux-2.6-tip

* 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Kill handle_signal()->set_fs()
x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
x86, signals: Convert the X86_32 code to use set_current_blocked()
x86, signals: Convert the IA32_EMULATION code to use set_current_blocked()

Linus Torvalds
2011-07-23 08:04:32 +0800

15 Jul, 2011

1 commit

905f29e2a x86, signals: Convert the IA32_EMULATION code to use set_current_blocked() ... Browse Code »

sys32_sigsuspend() and sys32_*sigreturn() change ->blocked directly.
This is not correct, see the changelog in e6fa16ab
"signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Signed-off-by: Oleg Nesterov
Link: http://lkml.kernel.org/r/20110710192724.GA31755@redhat.com
Signed-off-by: H. Peter Anvin

Oleg Nesterov
2011-07-15 12:21:31 +0800

04 Jun, 2011

2 commits

838feb475 x86, asm: Flip RESTORE_ARGS arguments logic ... Browse Code »

... thus getting rid of the "else" part of the conditional statement in
the macro.

No functionality change.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1306873314-32523-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin

Borislav Petkov
2011-06-04 05:38:53 +0800
cac0e0a78 x86, asm: Flip SAVE_ARGS arguments logic ... Browse Code »

This saves us the else part of the conditional statement in the macro.

No functionality change.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1306873314-32523-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin

Borislav Petkov
2011-06-04 05:38:51 +0800

29 May, 2011

1 commit

7b21fddd0 ns: Wire up the setns system call ... Browse Code »

32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.

setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.

While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.

v2: Most of the architecture support added by Daniel Lezcano
v3: ported to v2.6.36-rc4 by: Eric W. Biederman
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.

> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger

Oh - ia64 wiring looks good.
Acked-by: Tony Luck

Signed-off-by: Eric W. Biederman
Signed-off-by: Linus Torvalds

Eric W. Biederman
2011-05-29 01:48:39 +0800

06 May, 2011

1 commit

228e548e6 net: Add sendmmsg socket system call ... Browse Code »
1

This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)

64B raw socket

batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard
Signed-off-by: David S. Miller

Anton Blanchard
2011-05-06 02:10:14 +0800

24 Mar, 2011

1 commit

375906f87 x86: mark associated mm when running a task in 32 bit compatibility mode ... Browse Code »

This patch simply follows the same practice as for setting the TIF_IA32 flag.
In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is
exec'ed. Both ELF and a.out formats are updated.

Signed-off-by: Stephen Wilson
Reviewed-by: Michel Lespinasse
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Signed-off-by: Al Viro

Stephen Wilson
2011-03-24 04:36:53 +0800

21 Mar, 2011

1 commit

b7ed78f56 introduce sys_syncfs to sync a single file system ... Browse Code »

It is frequently useful to sync a single file system, instead of all
mounted file systems via sync(2):

- On machines with many mounts, it is not at all uncommon for some of
them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
those and may never get to the one you do care about (e.g., /).
- Some applications write lots of data to the file system and then
want to make sure it is flushed to disk. Calling fsync(2) on each
file introduces unnecessary ordering constraints that result in a large
amount of sub-optimal writeback/flush/commit behavior by the file
system.

There are currently two ways (that I know of) to sync a single super_block:

- BLKFLSBUF ioctl on the block device: That also invalidates the bdev
mapping, which isn't usually desirable, and doesn't work for non-block
file systems.
- 'mount -o remount,rw' will call sync_filesystem as an artifact of the
current implemention. Relying on this little-known side effect for
something like data safety sounds foolish.

Both of these approaches require root privileges, which some applications
do not have (nor should they need?) given that sync(2) is an unprivileged
operation.

This patch introduces a new system call syncfs(2) that takes an fd and
syncs only the file system it references. Maybe someday we can

$ sync /some/path

and not get

sync: ignoring all arguments

The syscall is motivated by comments by Al and Christoph at the last LSF.
syncfs(2) seems like an appropriate name given statfs(2).

A similar ioctl was also proposed a while back, see
http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-03-21 12:40:29 +0800

16 Mar, 2011

3 commits

da849abeb Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, binutils, xen: Fix another wrong size directive
x86: Remove dead config option X86_CPU
x86: Really print supported CPUs if PROCESSOR_SELECT=y
x86: Fix a bogus unwind annotation in lib/semaphore_32.S
um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
x86: Remove unused bits from lib/thunk_*.S
x86: Use {push,pop}_cfi in more places
x86-64: Add CFI annotations to lib/rwsem_64.S
x86, asm: Cleanup unnecssary macros in asm-offsets.c
x86, system.h: Drop unused __SAVE/__RESTORE macros
x86: Use bitmap library functions
x86: Partly unify asm-offsets_{32,64}.c
x86: Reduce back the alignment of the per-CPU data section

Linus Torvalds
2011-03-16 09:59:56 +0800
420c1c572 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
posix-clocks: Check write permissions in posix syscalls
hrtimer: Remove empty hrtimer_init_hres_timer()
hrtimer: Update hrtimer->state documentation
hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
timers: Export CLOCK_BOOTTIME via the posix timers interface
timers: Add CLOCK_BOOTTIME hrtimer base
time: Extend get_xtime_and_monotonic_offset() to also return sleep
time: Introduce get_monotonic_boottime and ktime_get_boottime
hrtimers: extend hrtimer base code to handle more then 2 clockids
ntp: Remove redundant and incorrect parameter check
mn10300: Switch do_timer() to xtimer_update()
posix clocks: Introduce dynamic clocks
posix-timers: Cleanup namespace
posix-timers: Add support for fd based clocks
x86: Add clock_adjtime for x86
posix-timers: Introduce a syscall for clock tuning.
time: Splitout compat timex accessors
ntp: Add ADJ_SETOFFSET mode bit
time: Introduce timekeeping_inject_offset
posix-timer: Update comment
...

Fix up new system-call-related conflicts in
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
kernel/time.c

Linus Torvalds
2011-03-16 09:53:35 +0800
a926021cb Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
perf probe: Clean up probe_point_lazy_walker() return value
tracing: Fix irqoff selftest expanding max buffer
tracing: Align 4 byte ints together in struct tracer
tracing: Export trace_set_clr_event()
tracing: Explain about unstable clock on resume with ring buffer warning
ftrace/graph: Trace function entry before updating index
ftrace: Add .ref.text as one of the safe areas to trace
tracing: Adjust conditional expression latency formatting.
tracing: Fix event alignment: skb:kfree_skb
tracing: Fix event alignment: mce:mce_record
tracing: Fix event alignment: kvm:kvm_hv_hypercall
tracing: Fix event alignment: module:module_request
tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
tracing: Remove lock_depth from event entry
perf header: Stop using 'self'
perf session: Use evlist/evsel for managing perf.data attributes
perf top: Don't let events to eat up whole header line
perf top: Fix events overflow in top command
ring-buffer: Remove unused #include <linux/trace_irq.h>
tracing: Add an 'overwrite' trace_option.
...

Linus Torvalds
2011-03-16 09:31:30 +0800

15 Mar, 2011

1 commit

6aae5f2b2 x86: Add new syscalls for x86_64 ... Browse Code »

This patch add new syscalls to x86_64

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Al Viro

Aneesh Kumar K.V
2011-03-15 14:21:44 +0800

09 Mar, 2011

1 commit

ea7145477 x86: Separate out entry text section ... Browse Code »

Put x86 entry code into a separate link section: .entry.text.

Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.

Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:

before patch:
19417627 L1-icache-load-misses ( +- 0.147% )

after patch:
16490788 L1-icache-load-misses ( +- 0.180% )

The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.

Whole perf output follows:

- results for current tip tree:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

19417627 L1-icache-load-misses ( +- 0.147% )
2676914223 instructions # 0.497 IPC ( +- 0.079% )
5389516026 cycles ( +- 0.144% )

0.206267711 seconds time elapsed ( +- 0.138% )

- results for current tip tree with the patch applied:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

16490788 L1-icache-load-misses ( +- 0.180% )
2717734941 instructions # 0.502 IPC ( +- 0.079% )
5414756975 cycles ( +- 0.148% )

0.206747566 seconds time elapsed ( +- 0.137% )

Signed-off-by: Jiri Olsa
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Nick Piggin
Cc: Eric Dumazet
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference:
Signed-off-by: Ingo Molnar

Jiri Olsa
2011-03-09 00:22:11 +0800

01 Mar, 2011

1 commit

60cf637a1 x86: Use {push,pop}_cfi in more places ... Browse Code »

Cleaning up and shortening code...

Signed-off-by: Jan Beulich
Cc: Alexander van Heukelum
LKML-Reference:
Signed-off-by: Ingo Molnar

Jan Beulich
2011-03-01 01:06:22 +0800

02 Feb, 2011

1 commit

ce26efdef x86: Add clock_adjtime for x86 ... Browse Code »

This patch adds the clock_adjtime system call to the x86 architecture.

Signed-off-by: Richard Cochran
Acked-by: John Stultz
LKML-Reference:
Signed-off-by: Thomas Gleixner

Richard Cochran
2011-02-02 22:28:19 +0800

18 Nov, 2010

1 commit

451a3c24b BKL: remove extraneous #include <smp_lock.h> ... Browse Code »

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-11-18 00:59:32 +0800

15 Oct, 2010

1 commit

0eead9ab4 Don't dump task struct in a.out core-dumps ... Browse Code »

akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.

Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-10-15 01:57:40 +0800

15 Sep, 2010

2 commits

eefdca043 x86-64, compat: Retruncate rax after ia32 syscall entry tracing ... Browse Code »

In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a
32-bit tracee in system call entry. A %rax value set via ptrace at the
entry tracing stop gets used whole as a 32-bit syscall number, while we
only check the low 32 bits for validity.

Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
in addition to testing the full 64 bits as has already been added.

Reported-by: Ben Hawkes
Signed-off-by: Roland McGrath
Signed-off-by: H. Peter Anvin

Roland McGrath
2010-09-15 07:08:47 +0800
36d001c70 x86-64, compat: Test %rax for the syscall number, not %eax ... Browse Code »

On 64 bits, we always, by necessity, jump through the system call
table via %rax. For 32-bit system calls, in theory the system call
number is stored in %eax, and the code was testing %eax for a valid
system call number. At one point we loaded the stored value back from
the stack to enforce zero-extension, but that was removed in checkin
d4d67150165df8bf1cc05e532f6efca96f907cab. An actual 32-bit process
will not be able to introduce a non-zero-extended number, but it can
happen via ptrace.

Instead of re-introducing the zero-extension, test what we are
actually going to use, i.e. %rax. This only adds a handful of REX
prefixes to the code.

Reported-by: Ben Hawkes
Signed-off-by: H. Peter Anvin
Cc:
Cc: Roland McGrath
Cc: Andrew Morton

H. Peter Anvin
2010-09-15 07:08:46 +0800

14 Aug, 2010

1 commit

c78873252 Mark arguments to certain syscalls as being const ... Browse Code »

Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:

(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.

(*) The filename arguments of some syscall helpers relating to the above.

(*) The buffer argument of various write syscalls.

Signed-off-by: David Howells
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds

David Howells
2010-08-14 07:53:13 +0800

11 Aug, 2010

2 commits

8cbd84f2d x86: fix up system call numbering nit ... Browse Code »

As pointed out by Jiri Slaby: when I resolved the the 32-bit x85 system
call entry tables for prlimit (due to the conflict with fanotify), I
forgot to add the numbering in comments that we do for every fifth entry.

Reported-by: Jiri Slaby
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-08-11 06:35:10 +0800
b34d8915c Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux ... Browse Code »

* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit

Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.

Linus Torvalds
2010-08-11 03:07:51 +0800

28 Jul, 2010

2 commits

bbaa4168b fanotify: sys_fanotify_mark declartion ... Browse Code »

This patch simply declares the new sys_fanotify_mark syscall

int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask,
int dfd const char *pathname)

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:58:55 +0800
11637e4b7 fanotify: fanotify_init syscall declaration ... Browse Code »

This patch defines a new syscall fanotify_init() of the form:

int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)

This syscall is used to create and fanotify group. This is very similar to
the inotify_init() syscall.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:58:55 +0800

16 Jul, 2010

1 commit

f33ebbe9d unistd: add __NR_prlimit64 syscall numbers ... Browse Code »

Add __NR_prlimit64 syscall numbers to asm-generic. Add them also to
asm-x86, both 32 and 64-bit.

Signed-off-by: Jiri Slaby
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"

Jiri Slaby
2010-07-16 15:52:32 +0800

21 Apr, 2010

1 commit

4cecd935f x86: correctly wire up the newuname system call ... Browse Code »

Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.

Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.

Reported-and-tested-by: Andy Isaacson
Signed-off-by: Christoph Hellwig
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-04-21 00:17:21 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

13 Mar, 2010

3 commits

5cacdb4ad Add generic sys_olduname() ... Browse Code »

Add generic implementations of the old and really old uname system calls.
Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
not going to bother with another ifdef for that special case.

m32r implemented an old uname but never wired it up, so kill it, too.

Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800
a4679373c Add generic sys_old_mmap() ... Browse Code »

Add a generic implementation of the old mmap() syscall, which expects its
argument in a memory block and switch all architectures over to use it.

Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Acked-by: Jesper Nilsson
Acked-by: Russell King
Acked-by: Greg Ungerer
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800
5d0e52830 Add generic sys_old_select() ... Browse Code »

Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.

Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Acked-by: Andreas Schwab
Acked-by: Russell King
Acked-by: Greg Ungerer
Acked-by: David Howells
Cc: Andreas Schwab
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800

04 Mar, 2010

1 commit

fb7b096d9 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
x86: Fix out of order of gsi
x86: apic: Fix mismerge, add arch_probe_nr_irqs() again
x86, irq: Keep chip_data in create_irq_nr and destroy_irq
xen: Remove unnecessary arch specific xen irq functions.
smp: Use nr_cpus= to set nr_cpu_ids early
x86, irq: Remove arch_probe_nr_irqs
sparseirq: Use radix_tree instead of ptrs array
sparseirq: Change irq_desc_ptrs to static
init: Move radix_tree_init() early
irq: Remove unnecessary bootmem code
x86: Add iMac9,1 to pci_reboot_dmi_table
x86: Convert i8259_lock to raw_spinlock
x86: Convert nmi_lock to raw_spinlock
x86: Convert ioapic_lock and vector_lock to raw_spinlock
x86: Avoid race condition in pci_enable_msix()
x86: Fix SCI on IOAPIC != 0
x86, ia32_aout: do not kill argument mapping
x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path
x86, irq: Update the vector domain for legacy irqs handled by io-apic
x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
...

Linus Torvalds
2010-03-04 00:15:37 +0800