Eric Lee / smarc-fsl-linux-kernel

28 Jul, 2010

1 commit

2a12a9d78 fsnotify: pass a file instead of an inode to open, read, and write

fanotify, the upcoming notification system, actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that was already passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well, since they are easily available to these hooks.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:58:32 +0800

28 May, 2010

1 commit

ae6afc3f5 vfs: introduce noop_llseek()

This is an implementation of ->llseek useable for the rare special case
when userspace expects the seek to succeed but the (device) file is
actually not able to perform the seek. In this case you use noop_llseek()
instead of falling back to the default implementation of ->llseek.

Signed-off-by: Jan Blunck
Cc: Frederic Weisbecker
Cc: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2010-05-28 00:12:56 +0800
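
The behaviour described in the noop_llseek() commit above can be modelled in userspace. This is a minimal sketch, not the kernel code: `model_file`, `model_loff_t` and `model_noop_llseek` are hypothetical stand-ins, and it assumes the helper simply reports the unchanged file position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's loff_t and struct file. */
typedef long long model_loff_t;

struct model_file {
	model_loff_t f_pos;	/* current file position */
};

/*
 * A seek that always "succeeds" but does nothing: the requested offset and
 * whence are ignored and the unchanged position is reported back.
 */
static model_loff_t model_noop_llseek(struct model_file *file,
				      model_loff_t offset, int whence)
{
	assert(file != NULL);
	(void)offset;
	(void)whence;
	return file->f_pos;
}
```

A caller that seeks on such a file gets a non-error return value, and the position it later observes is the same as before the call.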

25 Mar, 2010

1 commit

61964eba5 do_sync_read/write() should set kiocb.ki_nbytes to be consistent

do_sync_read/write() should set kiocb.ki_nbytes to be consistent with
do_sync_readv_writev().

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-03-25 07:43:29 +0800

04 Nov, 2009

1 commit

cc56f7de7 sendfile(): check f_op.splice_write() rather than f_op.sendpage()

sendfile(2) was reworked with the splice infrastructure, but it still
wrongly checks f_op.sendpage() instead of f_op.splice_write(). Although
f_op.splice_write() currently always exists whenever f_op.sendpage()
exists, that assumption could silently break in the future. This
patch also has a side effect: sendfile(2) can work with any output
file. Some security checks related to f_op are added too.

Signed-off-by: Changli Gao
Signed-off-by: Jens Axboe

Changli Gao
2009-11-04 16:09:52 +0800

24 Sep, 2009

1 commit

f9098980f vfs: remove redundant position check in do_sendfile

As Johannes Weiner pointed out, one of the range checks in do_sendfile
is redundant and is already checked in rw_verify_area.

Signed-off-by: Jeff Layton
Reviewed-by: Johannes Weiner
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Robert Love
Cc: Mandeep Singh Baines
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Jeff Layton
2009-09-24 19:47:34 +0800

11 May, 2009

1 commit

6818173bd splice: implement default splice_read method

If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.

This will allow splice and functions using splice, such as the loop
device, to work on all filesystems. This includes "direct_io" files
in fuse which bypass the page cache.

Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe

Miklos Szeredi
2009-05-11 20:13:10 +0800

05 Apr, 2009

1 commit

601cc11d0 Make non-compat preadv/pwritev use native register size

Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first". Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code. On x86-64, we now
generate

    testq %rcx, %rcx            # pos_l
    js .L122                    #,
    movq %rcx, -48(%rbp)        # pos_l, pos

from the C source

    loff_t pos = pos_from_hilo(pos_h, pos_l);
    ...
    if (pos < 0)
        return -EINVAL;

and the 'pos_h' register isn't even touched. It used to generate code
like

    mov %r8d, %r8d              # pos_low, pos_low
    salq $32, %rcx              #, tmp71
    movq %r8, %rax              # pos_low, pos.386
    orq %rcx, %rax              # tmp71, pos.386
    js .L122                    #,
    movq %rax, -48(%rbp)        # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

    #define HALF_BITS (sizeof(unsigned long)*4)
    __syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

or something like that. That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone. Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

Acked-by: Gerd Hoffmann
Cc: H. Peter Anvin
Cc: Andrew Morton
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar
Cc: Ralf Baechle
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-04-05 05:20:34 +0800
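
The "low first" split described in the commit above can be sketched in portable C. This is an illustration under stated assumptions: `model_loff_t`, `model_split_offset` and `model_pos_from_hilo` are hypothetical names, and the arithmetic is done on unsigned values so the userspace model stays well-defined; the kernel's own helper may differ in detail.

```c
#include <assert.h>
#include <limits.h>

typedef long long model_loff_t;	/* stand-in for the kernel's loff_t */

/* Half the width of unsigned long: 16 on 32-bit, 32 on 64-bit targets. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/* User side: split a 64-bit offset into "low first" native words. */
static void model_split_offset(model_loff_t offset,
			       unsigned long *low, unsigned long *high)
{
	unsigned long long u = (unsigned long long)offset;

	assert(low && high);
	*low = (unsigned long)u;
	/* Two half-width shifts avoid an undefined full-width shift. */
	*high = (unsigned long)((u >> HALF_LONG_BITS) >> HALF_LONG_BITS);
}

/* Kernel side: rebuild the offset from the two native words. */
static model_loff_t model_pos_from_hilo(unsigned long high, unsigned long low)
{
	unsigned long long u = (unsigned long long)high;

	u = (u << HALF_LONG_BITS) << HALF_LONG_BITS;
	return (model_loff_t)(u | low);
}
```

On a target where unsigned long is 64 bits wide, the high word comes out as zero and the low word carries the whole offset, which matches the commit's point that 'pos_h' is not needed there.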

03 Apr, 2009

1 commit

f3554f4bc preadv/pwritev: Add preadv and pwritev system calls.

This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

    ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
    ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs doesn't
allow. At least s390 needs a wrapper in glibc to handle this. As
we'll need wrappers in glibc anyway, I've decided to push the problem to
glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables. Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc: Ralf Baechle
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:08 +0800
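
From the application side, the glibc interface quoted in the commit above can be used roughly as follows. This is a sketch assuming a Linux system whose C library provides preadv(); `read_record_at` and `demo_preadv` are hypothetical helpers, and the temporary file path is only for the demonstration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read a header and the body that follows it with one call, at an explicit
 * offset, without moving the descriptor's shared file position.
 */
static ssize_t read_record_at(int fd, off_t offset,
			      char *hdr, size_t hdr_len,
			      char *body, size_t body_len)
{
	struct iovec iov[2] = {
		{ .iov_base = hdr,  .iov_len = hdr_len  },
		{ .iov_base = body, .iov_len = body_len },
	};

	return preadv(fd, iov, 2, offset);
}

/* Hypothetical demo: returns 0 if the data round-trips via a temp file. */
static int demo_preadv(void)
{
	char path[] = "/tmp/preadv_demoXXXXXX";
	char hdr[4], body[4];
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "xxHEADBODY", 10) != 10) {
		close(fd);
		return -1;
	}
	/* The file position stays at 10 because preadv() does not touch it. */
	ok = read_record_at(fd, 2, hdr, 4, body, 4) == 8 &&
	     memcmp(hdr, "HEAD", 4) == 0 &&
	     memcmp(body, "BODY", 4) == 0 &&
	     lseek(fd, 0, SEEK_CUR) == 10;
	close(fd);
	return ok ? 0 : -1;
}
```

Because the offset is passed with the call, two threads sharing the descriptor do not need a lock around an lseek+readv pair.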

14 Jan, 2009

5 commits

3cdad4288 [CVE-2009-0029] System call wrappers part 20

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:26 +0800

003d7ab47 [CVE-2009-0029] System call wrappers part 19

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:26 +0800

002c8976e [CVE-2009-0029] System call wrappers part 16

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:25 +0800

6673e0c3f [CVE-2009-0029] System call wrapper special cases

System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:18 +0800

2ed7c03ec [CVE-2009-0029] Convert all system calls to return a long

Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway, with the exception of
sys_exit_group, which returned void. But that doesn't
matter since the system call doesn't return.

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:14 +0800
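
The truncation that the "System call wrapper special cases" commit (6673e0c3f) guards against can be shown with fixed-width types. This is a minimal sketch: `via_32bit_long` is a hypothetical helper, and it models the 32-bit 'long' with `uint32_t` (unsigned, so the conversion is well-defined) rather than reproducing the real wrapper macros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models passing a 64-bit argument through a slot that is only as wide as
 * 'long' on a 32-bit architecture: the upper 32 bits do not survive.
 */
static uint64_t via_32bit_long(uint64_t arg)
{
	uint32_t narrowed = (uint32_t)arg;	/* upper half is dropped here */

	return (uint64_t)narrowed;
}
```

A value that fits in 32 bits passes through unchanged, while a value such as 0x100000002 comes back as 2, which is why such arguments need their own wrappers.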

06 Jan, 2009

1 commit

5b6f1eb97 vfs: lseek(fd, 0, SEEK_CUR) race condition

This patch fixes a race condition in lseek. While it is expected that
unpredictable behaviour may result while repositioning the offset of a
file descriptor concurrently with reading/writing to the same file
descriptor, this should not happen when merely *reading* the file
descriptor's offset.

Unfortunately, the only portable way in Unix to read a file
descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
concurrently with read/write may mess up the position.

[with fixes from akpm]

Signed-off-by: Alain Knaff
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Alain Knaff
2009-01-06 00:53:07 +0800
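
One way to avoid the problem described in the commit above is to treat a zero-offset SEEK_CUR as a pure query that never writes the position back. The following userspace model is a sketch under that assumption; `model_file`, `model_llseek` and the `stores` counter are hypothetical and exist only to make the absence of a write observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR */

typedef long long model_loff_t;

struct model_file {
	model_loff_t f_pos;
	unsigned int stores;	/* counts writes to f_pos, for demonstration */
};

static model_loff_t model_llseek(struct model_file *file,
				 model_loff_t offset, int whence)
{
	assert(file);
	switch (whence) {
	case SEEK_SET:
		break;
	case SEEK_CUR:
		/*
		 * Position query: return the value without storing it, so a
		 * concurrent read/write cannot have its update overwritten.
		 */
		if (offset == 0)
			return file->f_pos;
		offset += file->f_pos;
		break;
	default:
		return -EINVAL;
	}
	if (offset < 0)
		return -EINVAL;
	file->f_pos = offset;
	file->stores++;
	return offset;
}
```

In this model a query leaves `stores` untouched, while a real reposition increments it.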

23 Oct, 2008

1 commit

3a8cff4f0 [PATCH] generic_file_llseek tidyups

Add kerneldoc for generic_file_llseek and generic_file_llseek_unlocked,
use sane variable names and unclutter the code.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2008-10-23 17:12:59 +0800

03 Jul, 2008

1 commit

9465efc9e Remove BKL from remote_llseek v2

- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around it. I changed the file systems that don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS, SMBFS and NFS
take the BKL, but explicitly in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32-bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyway because there are other paths that
modify it without the BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz

Signed-off-by: Andi Kleen
Signed-off-by: Jonathan Corbet

Andi Kleen
2008-07-03 05:06:27 +0800

23 Apr, 2008

1 commit

16abef0e9 fs: use loff_t type instead of long long

Use offset type consistently.

Signed-off-by: David Sterba
Signed-off-by: Linus Torvalds

David Sterba
2008-04-23 06:17:11 +0800

09 Feb, 2008

1 commit

3287629ef remove the unused exports of sys_open/sys_read

These exports (which aren't used and which are in fact dangerous
because they pretty much form a security hole) have been marked
_UNUSED since 2.6.24 with removal in 2.6.25. This patch is their final
departure from the Linux kernel tree.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2008-02-09 01:22:36 +0800

29 Jan, 2008

1 commit

19295529d ext4: export iov_shorten from kernel for ext4's use

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.

Signed-off-by: Eric Sandeen

Eric Sandeen
2008-01-29 12:58:27 +0800

25 Jan, 2008

1 commit

c43e259cc security: call security_file_permission from rw_verify_area

All instances of rw_verify_area() are followed by a call to
security_file_permission(), so just call the latter from the former.

Acked-by: Eric Paris
Signed-off-by: James Morris

James Morris
2008-01-25 08:29:52 +0800

15 Nov, 2007

1 commit

cb51f973b mark sys_open/sys_read exports unused

sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers. Since 2.0 or so this has been deprecated behavior, but
several drivers were still using this. For a few years now we have had a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time.... however with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the
last user is now gone.

This is a good thing, since using sys_open / sys_read etc. for firmware is
somewhere between very buggy and dangerous; these operations put an fd in the
process file descriptor table.... which then can be tampered with from
other threads for example. For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.

The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2007-11-15 10:45:42 +0800

10 Oct, 2007

1 commit

a16877ca9 Cleanup macros for distinguishing mandatory locks

The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
the explicit i_mode checking. Besides, Andrew pointed out that this macro is
buggy itself, as it dereferences the inode arg twice.

Convert this macro into a static inline function and switch its users to it,
making the code shorter and more readable.

The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.

Signed-off-by: Pavel Emelyanov
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: David Howells
Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Latchesar Ionkov
Cc: Steven Whitehouse
Signed-off-by: Andrew Morton

Pavel Emelyanov
2007-10-10 06:32:46 +0800
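
The mode-bit test described in the commit above (S_ISGID set, S_IXGRP clear) can be written as a static inline function, which evaluates its argument once, unlike a macro that names it twice. This is a userspace sketch: `model_inode` and `model_mandatory_lock` are hypothetical names, and the superblock-level IS_MANDLOCK() check is deliberately left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the one struct inode field that matters here. */
struct model_inode {
	mode_t i_mode;
};

/*
 * "Mandatory lockable" marker: setgid bit set and group-execute bit clear.
 * As a function, the inode argument is evaluated exactly once.
 */
static inline int model_mandatory_lock(const struct model_inode *ino)
{
	assert(ino);
	return (ino->i_mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}
```

For example, mode 02644 qualifies, while 02654 (group-execute set) and 0644 (no setgid) do not.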

10 Jul, 2007

3 commits

d96e6e716 Remove remnants of sendfile()

There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:15 +0800

d6b29d7ce splice: divorce the splice structure/function definitions from the pipe header

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:14 +0800

534f2aaa6 sys_sendfile: switch to using ->splice_read, if available

This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:12 +0800

09 May, 2007

2 commits

1ae7075bc use SEEK_MAX to validate user lseek arguments

Add SEEK_MAX and use it to validate lseek arguments from userspace.

Signed-off-by: Chris Snook
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Snook
2007-05-09 02:14:59 +0800

7b8e89249 use symbolic constants in generic lseek code

Convert magic numbers to SEEK_* values from fs.h

Signed-off-by: Chris Snook
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Snook
2007-05-09 02:14:59 +0800
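
The two lseek commits above (symbolic SEEK_* constants and a SEEK_MAX bound) amount to a range check on the whence argument. The sketch below assumes SEEK_END is the largest valid value, as the standard three constants suggest; `MODEL_SEEK_MAX` and `model_check_whence` are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Assumption for this sketch: the highest whence value accepted. */
#define MODEL_SEEK_MAX SEEK_END

/* Reject any whence value from userspace that is out of range. */
static int model_check_whence(unsigned int whence)
{
	if (whence > MODEL_SEEK_MAX)
		return -EINVAL;
	return 0;
}
```

Taking whence as unsigned means a negative value passed by a caller wraps to a large number and is rejected by the same comparison.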

13 Feb, 2007

1 commit

163da958b [PATCH] FS: speed up rw_verify_area()

oprofile hunting showed a stall in rw_verify_area(), because of triple
indirection and potential cache misses.
(file->f_path.dentry->d_inode->i_flock)

By moving initialization of 'struct inode' pointer before the pos/count
sanity tests, we allow the compiler and processor to perform two loads by
anticipation, reducing stall, without prefetch() hints. Even x86 arch has
enough registers to not use temporary variables and not increase text size.

I validated this patch running a bench and studied oprofile changes, and
absolute perf of the test program.

Results of my epoll_pipe_bench (source available on request) on a Pentium-M
1.6 GHz machine

Before :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call
(best value out of 10 runs)

After :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call
(best value out of 10 runs)

oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
for the rw_verify_area() function.

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2007-02-13 01:48:29 +0800

12 Feb, 2007

1 commit

4b98d11b4 [PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct

They are fat: 4x8 bytes in task_struct.
They are unconditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".

And please, please, please, read(2) knows about bytes, not characters,
why is it called "rchar"?

Signed-off-by: Alexey Dobriyan
Cc: Jay Lan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-02-12 03:18:07 +0800

14 Dec, 2006

1 commit

029530f81 [PATCH] one more EXPORT_UNUSED_SYMBOL removal

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-12-14 01:05:53 +0800

09 Dec, 2006

1 commit

0f7fc9e4d [PATCH] VFS: change struct file to use struct path

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

01 Oct, 2006

4 commits

eed4e51fb [PATCH] Add vector AIO support

This work was initially done by Zach Brown to add support for vectored aio.
These are the core changes for AIO to support
IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.

[akpm@osdl.org: huge build fix]
Signed-off-by: Zach Brown
Signed-off-by: Christoph Hellwig
Signed-off-by: Badari Pulavarty
Acked-by: Benjamin LaHaise
Acked-by: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-10-01 15:39:29 +0800

543ade1fc [PATCH] Streamline generic_file_* interfaces and filemap cleanups

This patch cleans up generic_file_*_read/write() interfaces. Christoph
Hellwig gave me the idea for these cleanups.

In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read()/do_sync_write() as their .read/.write methods. This allows us
to clean up all variants of generic_file_* routines.

Final available interfaces:

generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler

__generic_file_aio_write_nolock() - internal worker routine

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-10-01 15:39:28 +0800

ee0b3e671 [PATCH] Remove readv/writev methods and use aio_read/aio_write instead

This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-10-01 15:39:28 +0800

027445c37 [PATCH] Vectorize aio_read/aio_write fileop methods

This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2006-10-01 15:39:28 +0800

11 Jul, 2006

1 commit

69c3a5b8f [PATCH] fs/read_write.c: EXPORT_UNUSED_SYMBOL

This patch marks an unused export as EXPORT_UNUSED_SYMBOL.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-07-11 04:24:18 +0800

11 Apr, 2006

1 commit

49570e9b2 [PATCH] splice: unlikely() optimizations

Also corrects a few comments. Patch mainly from Ingo, changes by me.

Signed-off-by: Ingo Molnar
Signed-off-by: Jens Axboe

Jens Axboe
2006-04-11 19:56:09 +0800

29 Mar, 2006

1 commit

4b6f5d20b [PATCH] Make most file operations structs in fs/ const

This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.

The goal is both to increase correctness (it is harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read-only and thus
cache clean).

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-29 01:16:06 +0800

26 Mar, 2006

1 commit

6cc6b1226 [PATCH] remove needless check in fs/read_write.c

nr_segs is unsigned long and thus cannot be negative. We checked against 0
a few lines before.

Signed-off-by: Carsten Otte
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Carsten Otte
2006-03-26 00:23:01 +0800

10 Jan, 2006

1 commit

1b1dcc1b5 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as is practical on an ia64. Your
luck with it might be different, though.

Modified-by: Ingo Molnar

(finished the conversion)

Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar

Jes Sorensen
2006-01-10 07:59:24 +0800