Eric Lee / smarc-fsl-linux-kernel

09 Feb, 2008

2 commits

3287629ef remove the unused exports of sys_open/sys_read ... Browse Code »

These exports (which aren't used and which are in fact dangerous to use
because they pretty much form a security hole to use) have been marked
_UNUSED since 2.6.24 with removal in 2.6.25. This patch is their final
departure from the Linux kernel tree.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2008-02-09 01:22:36 +0800
fc9b52cd8 fs: remove fastcall, it is always empty ... Browse Code »

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-02-09 01:22:31 +0800

15 Nov, 2007

1 commit

cb51f973b mark sys_open/sys_read exports unused ... Browse Code »

sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers. Since 2.0 or so this was deprecated behavior, but
several drivers still were using this. Since a few years we have a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time.... however with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the
last user is now gone.

This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table.... which then can be tampered with from
other threads for example. For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.

The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2007-11-15 10:45:42 +0800

21 Oct, 2007

1 commit

5a190ae69 [PATCH] pass dentry to audit_inode()/audit_inode_child() ... Browse Code »

makes caller simpler *and* allows to scan ancestors

Signed-off-by: Al Viro

Al Viro
2007-10-21 14:37:18 +0800

17 Oct, 2007

3 commits

b53767719 Implement file posix capabilities ... Browse Code »

Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.

Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.

Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.

Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.

Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.

Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.

Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().

Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.

Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.

Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.

Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.

One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?

It is a semantic change, as without fsccaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.

task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit

Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.

Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.

Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.

[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Cc: Andrew Morgan
Signed-off-by: Andrew Morgan
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-10-17 23:43:07 +0800
a9c62a18a fs: correct SuS compliance for open of large file without options ... Browse Code »

The early LFS work that Linux uses favours EFBIG in various places. SuSv3
specifically uses EOVERFLOW for this as noted by Michael (Bug 7253)

[EOVERFLOW]
The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t. We should therefore
transition to the proper error return code

Signed-off-by: Alan Cox
Cc: Theodore Tso
Cc: Jens Axboe
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2007-10-17 23:43:01 +0800
788e7dd4c SELinux: Improve read/write performance ... Browse Code »

It reduces the selinux overhead on read/write by only revalidating
permissions in selinux_file_permission if the task or inode labels have
changed or the policy has changed since the open-time check. A new LSM
hook, security_dentry_open, is added to capture the necessary state at open
time to allow this optimization.

(see http://marc.info/?l=selinux&m=118972995207740&w=2)

Signed-off-by: Yuichi Nakamura
Acked-by: Stephen Smalley
Signed-off-by: James Morris

Yuichi Nakamura
2007-10-17 06:59:31 +0800

01 Aug, 2007

1 commit

9700382c3 VFS: fix a race in lease-breaking during truncate ... Browse Code »

It is possible that another process could acquire a new file lease right
after break_lease() is called during a truncate, but before lease-granting
is disabled by the subsequent get_write_access(). Merely switching the
order of the break_lease() and get_write_access() calls prevents this race.

Signed-off-by: David M. Richter
Signed-off-by: "J. Bruce Fields"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

david m. richter
2007-08-01 06:39:42 +0800

25 Jul, 2007

1 commit

0d786d4a2 fallocate syscall interface deficiency ... Browse Code »

The fallocate syscall returns ENOSYS in case the filesystem does not support
the operation and expects the userlevel code to fill in. This is good in
concept.

The problem is that the libc code for old kernels should be able to
distinguish the case where the syscall is not at all available vs not
functioning for a specific mount point. As is this is not possible and we
always have to invoke the syscall even if the kernel doesn't support it.

I suggest the following patch. Using EOPNOTSUPP is IMO the right thing to do.

Cc: Amit Arora
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2007-07-25 03:24:58 +0800

18 Jul, 2007

1 commit

97ac73506 sys_fallocate() implementation on i386, x86_64 and powerpc ... Browse Code »

fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
and ppc). Patches for s390(x) and ia64 are already available from
previous posts, but it was decided that they should be added later
once fallocate is in the mainline. Hence not including those patches
in this take.
2. Changes to glibc,
a) to support fallocate() system call
b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora

Amit Arora
2007-07-18 09:42:44 +0800

17 Jul, 2007

2 commits

4a19542e5 O_CLOEXEC for SCM_RIGHTS ... Browse Code »

Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.

The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
is not used otherwise but the networking people will know better.

This new flag is not recognized by recvfrom and recv. These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.

The patch must be applied on the patch which introduced O_CLOEXEC. It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.

Here's a test program to make sure the code works. It's so much longer than
the actual patch...

#include
#include
#include
#include
#include
#include
#include
#include

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif

int
main (int argc, char *argv[])
{
if (argc > 1)
{
int fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;

}

struct sockaddr_un sun;
strcpy (sun.sun_path, "./testsocket");
sun.sun_family = AF_UNIX;

char databuf[] = "hello";
struct iovec iov[1];
iov[0].iov_base = databuf;
iov[0].iov_len = sizeof (databuf);

union
{
struct cmsghdr hdr;
char bytes[CMSG_SPACE (sizeof (int))];
} buf;
struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
.msg_control = buf.bytes,
.msg_controllen = sizeof (buf) };
struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);

cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN (sizeof (int));

msg.msg_controllen = cmsg->cmsg_len;

pid_t child = fork ();
if (child == -1)
error (1, errno, "fork");
if (child == 0)
{
int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");

if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "bind");
if (listen (sock, SOMAXCONN) < 0)
error (1, errno, "listen");

int conn = accept (sock, NULL, NULL);
if (conn == -1)
error (1, errno, "accept");

*(int *) CMSG_DATA (cmsg) = sock;
if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
error (1, errno, "sendmsg");

return 0;
}

/* For a test suite this should be more robust like a
barrier in shared memory. */
sleep (1);

int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");

if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "connect");
unlink (sun.sun_path);

*(int *) CMSG_DATA (cmsg) = -1;

if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
error (1, errno, "recvmsg");

int fd = *(int *) CMSG_DATA (cmsg);
if (fd == -1)
error (1, 0, "no descriptor received");

char fdname[20];
snprintf (fdname, sizeof (fdname), "%d", fd);
execl ("/proc/self/exe", argv[0], fdname, NULL);
puts ("execl failed");
return 1;
}

[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper
Cc: Ingo Molnar
Cc: Michael Buesch
Cc: Michael Kerrisk
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2007-07-17 00:05:45 +0800
f23513e8d Introduce O_CLOEXEC ... Browse Code »

The problem is as follows: in multi-threaded code (or more correctly: all
code using clone() with CLONE_FILES) we have a race when exec'ing.

thread #1 thread #2

fd=open()

fork + exec

fcntl(fd,F_SETFD,FD_CLOEXEC)

In some applications this can happen frequently. Take a web browser. One
thread opens a file and another thread starts, say, an external PDF viewer.
The result can even be a security issue if that open file descriptor
refers to a sensitive file and the external program can somehow be tricked
into using that descriptor.

Just adding O_CLOEXEC support to open() doesn't solve the whole set of
problems. There are other ways to create file descriptors (socket,
epoll_create, Unix domain socket transfer, etc). These can and should be
addressed separately though. open() is such an easy case that it makes not
much sense putting the fix off.

The test program:

#include
#include
#include
#include

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif

int
main (int argc, char *argv[])
{
int fd;
if (argc > 1)
{
fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;
}

fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
printf ("in parent: new fd = %d\n", fd);
char buf[20];
snprintf (buf, sizeof (buf), "%d", fd);
execl ("/proc/self/exe", argv[0], buf, NULL);
puts ("execl failed");
return 1;
}

[kyle@parisc-linux.org: parisc fix]
Signed-off-by: Ulrich Drepper
Acked-by: Ingo Molnar
Cc: Davide Libenzi
Cc: Michael Kerrisk
Cc: Chris Zankel
Signed-off-by: Kyle McMartin
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2007-07-17 00:05:45 +0800

09 May, 2007

2 commits

7b82dc0e6 Remove suid/sgid bits on [f]truncate() ... Browse Code »

.. to match what we do on write(). This way, people who write to files
by using [f]truncate + writable mmap have the same semantics as if they
were using the write() family of system calls.

Signed-off-by: Linus Torvalds

Linus Torvalds
2007-05-09 11:10:00 +0800
e63340ae6 header cleaning: don't include smp_lock.h when not used ... Browse Code »

Remove includes of where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-05-09 02:15:07 +0800

11 Dec, 2006

1 commit

bbea9f696 [PATCH] fdtable: Make fdarray and fdsets equal in size ... Browse Code »

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

09 Dec, 2006

2 commits

0f7fc9e4d [PATCH] VFS: change struct file to use struct path ... Browse Code »

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800
24ec839c4 [PATCH] tty: ->signal->tty locking ... Browse Code »

Fix the locking of signal->tty.

Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand. And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.

(NOTE: sys_unshare() is broken wrt ->sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
are governed by their open file handles. This leaves some holes for tty
access from signal->tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
be examined by someone familiar with the security framework, I think
it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra
Acked-by: Alan Cox
Cc: Oleg Nesterov
Cc: Prarit Bhargava
Cc: Chris Wright
Cc: Roland McGrath
Cc: Stephen Smalley
Cc: James Morris
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Martin Schwidefsky
Cc: Jan Kara
Signed-off-by: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2006-12-09 00:28:38 +0800

01 Oct, 2006

2 commits

6902d925d [PATCH] r/o bind mounts: prepare for write access checks: collapse if() ... Browse Code »

We're shortly going to be adding a bunch more permission checks in these
functions. That requires adding either a bunch of new if() conditions, or
some gotos. This patch collapses existing if()s and uses gotos instead to
prepare for the upcoming changes.

Signed-off-by: Dave Hansen
Acked-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2006-10-01 15:39:30 +0800
82b0547cf [PATCH] Create fs/utimes.c ... Browse Code »

* fs/open.c is getting bit crowdy
* preparation to lutimes(2)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-10-01 15:39:19 +0800

30 Sep, 2006

2 commits

ee731f4f7 [PATCH] fix wrong error code on interrupted close syscalls ... Browse Code »

The problem is that close() syscalls can call a file system's flush
handler, which in turn might sleep interruptibly and ultimately pass back
an -ERESTARTSYS return value. This happens for files backed by an
interruptible NFS mount under nfs_file_flush() when a large file has just
been written and nfs_wait_bit_interruptible() detects that there is a
signal pending.

I have a test case where the "strace" command is used to attach to a
process sleeping in such a close(). Since the SIGSTOP is forced onto the
victim process (removing it from the thread's "blocked" mask in
force_sig_info()), the RPC wait is interrupted and the close() is
terminated early.

But the file table entry has already been cleared before the flush handler
was called. Thus, when the syscall is restarted, the file descriptor
appears closed and an EBADF error is returned (which is wrong). What's
worse, there is the hypothetical case where another thread of a
multi-threaded application might have reused the file descriptor, in which
case that file would be mistakenly closed.

The bottom line is that close() syscalls are not restartable, and thus
-ERESTARTSYS return values should be mapped to -EINTR. This is consistent
with the close(2) manual page. The fix is below.

Signed-off-by: Ernie Petrides
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ernie Petrides
2006-09-30 00:18:13 +0800
650a89834 [PATCH] vfs: define new lookup flag for chdir ... Browse Code »

In the "operation does permission checking" model used by fuse, chdir
permission is not checked, since there's no chdir method.

For this case set a lookup flag, which will be passed to ->permission(), so
fuse can distinguish it from permission checks for other operations.

Signed-off-by: Miklos Szeredi
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-09-30 00:18:08 +0800

26 Jun, 2006

1 commit

6e656be89 [PATCH] ftruncate does not always update m/ctime ... Browse Code »

In the course of trying to track down a bug where a file mtime was not
being updated correctly, it was discovered that the m/ctime updates were
not quite being handled correctly for ftruncate() calls.

Quoth SUSv3:

open(2):

If O_TRUNC is set and the file did previously exist, upon
successful completion, open() shall mark for update the st_ctime
and st_mtime fields of the file.

truncate(2):

Upon successful completion, if the file size is changed, this
function shall mark for update the st_ctime and st_mtime fields
of the file, and the S_ISUID and S_ISGID bits of the file mode
may be cleared.

ftruncate(2):

Upon successful completion, if fildes refers to a regular file,
the ftruncate() function shall mark for update the st_ctime and
st_mtime fields of the file and the S_ISUID and S_ISGID bits of
the file mode may be cleared. If the ftruncate() function is
unsuccessful, the file is unaffected.

The open(O_TRUNC) and truncate cases were being handled correctly, but the
ftruncate case was being handled like the truncate case. The semantics of
truncate and ftruncate don't quite match, so ftruncate needs to be handled
slightly differently.

The attached patch addresses this issue for ftruncate(2).

My thanx to Stephen Tweedie and Trond Myklebust for their help in
understanding the situation and semantics.

Signed-off-by: Peter Staubach
Cc: "Stephen C. Tweedie"
Cc: Trond Myklebust
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Staubach
2006-06-26 01:01:15 +0800

23 Jun, 2006

2 commits

75e1fcc0b [PATCH] vfs: add lock owner argument to flush operation ... Browse Code »

Pass the POSIX lock owner ID to the flush operation.

This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.

Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.

Signed-off-by: Miklos Szeredi
Cc: Trond Myklebust
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2006-06-23 22:43:02 +0800
726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry ... Browse Code »

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

20 Jun, 2006

1 commit

9c937dcc7 [PATCH] log more info for directory entry change events ... Browse Code »

When an audit event involves changes to a directory entry, include
a PATH record for the directory itself. A few other notable changes:

- fixed audit_inode_child() hooks in fsnotify_move()
- removed unused flags arg from audit_inode()
- added audit log routines for logging a portion of a string

Here's some sample output.

before patch:
type=SYSCALL msg=audit(1149821605.320:26): arch=40000003 syscall=39 success=yes exit=0 a0=bf8d3c7c a1=1ff a2=804e1b8 a3=bf8d3c7c items=1 ppid=739 pid=800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
type=CWD msg=audit(1149821605.320:26): cwd="/root"
type=PATH msg=audit(1149821605.320:26): item=0 name="foo" parent=164068 inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

after patch:
type=SYSCALL msg=audit(1149822032.332:24): arch=40000003 syscall=39 success=yes exit=0 a0=bfdd9c7c a1=1ff a2=804e1b8 a3=bfdd9c7c items=2 ppid=714 pid=777 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
type=CWD msg=audit(1149822032.332:24): cwd="/root"
type=PATH msg=audit(1149822032.332:24): item=0 name="/root" inode=164068 dev=03:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_dir_t:s0
type=PATH msg=audit(1149822032.332:24): item=1 name="foo" inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

Signed-off-by: Amy Griffis
Signed-off-by: Al Viro

Amy Griffis
2006-06-20 17:25:28 +0800

16 May, 2006

1 commit

6aff5cb8e [PATCH] fs/open.c: unexport sys_openat ... Browse Code »

Remove the unused EXPORT_SYMBOL_GPL(sys_openat).

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-05-16 02:20:54 +0800

19 Apr, 2006

2 commits

385910f2b x86: be careful about tailcall breakage for sys_open[at] too ... Browse Code »

Came up through a quick grep for other cases similar to the ftruncate()
one in commit 0a489cb3b6a7b277030cdbc97c2c65905db94536.

Also, add a comment, so that people who read the code understand why we
do what looks like a no-op.

(Again, this won't actually matter to any sane user, since libc will
save and restore the register gcc stomps on, but it's still wrong to
stomp on it)

Signed-off-by: Linus Torvalds

Linus Torvalds
2006-04-19 04:22:59 +0800
0a489cb3b x86: don't allow tail-calls in sys_ftruncate[64]() ... Browse Code »

Gcc thinks it owns the incoming argument stack, but that's not true for
"asmlinkage" functions, and it corrupts the caller-set-up argument stack
when it pushes the third argument onto the stack. Which can result in
%ebx getting corrupted in user space.

Now, normally nobody sane would ever notice, since libc will save and
restore %ebx anyway over the system call, but it's still wrong.

I'd much rather have "asmlinkage" tell gcc directly that it doesn't own
the stack, but no such attribute exists, so we're stuck with our hacky
manual "prevent_tail_call()" macro once more (we've had the same issue
before with sys_waitpid() and sys_wait4()).

Thanks to Hans-Werner Hilse for reporting
the issue and testing the fix.

Signed-off-by: Linus Torvalds

Linus Torvalds
2006-04-19 04:02:48 +0800

26 Mar, 2006

2 commits

1b9a39173 Merge branch 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current ... Browse Code »

* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...

Fixed trivial conflict in security/selinux/hooks.c

Linus Torvalds
2006-03-26 01:24:53 +0800
9a56c2139 [PATCH] Add lookup_instantiate_filp usage warning ... Browse Code »

I think it would be nice to put an usage warning in header of
lookup_instantiate_filp() to indicate it is unsafe to use it on anything
but regular files (even that is potentially unsafe, but there your ->open()
is usually in your hands anyway), so that others won't fall into the same
trap I did.

Signed-off-by: Oleg Drokin
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Drokin
2006-03-26 00:22:51 +0800

23 Mar, 2006

1 commit

0c9e63fd3 [PATCH] Shrinks sizeof(files_struct) and better layout ... Browse Code »

1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.

3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.

UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-03-23 23:38:09 +0800

21 Mar, 2006

1 commit

73241ccca [PATCH] Collect more inode information during syscall processing. ... Browse Code »

This patch augments the collection of inode info during syscall
processing. It represents part of the functionality that was provided
by the auditfs patch included in RHEL4.

Specifically, it:

- Collects information for target inodes created or removed during
syscalls. Previous code only collects information for the target
inode's parent.

- Adds the audit_inode() hook to syscalls that operate on a file
descriptor (e.g. fchown), enabling audit to do inode filtering for
these calls.

- Modifies filtering code to check audit context for either an inode #
or a parent inode # matching a given rule.

- Modifies logging to provide inode # for both parent and child.

- Protect debug info from NULL audit_names.name.

[AV: folded a later typo fix from the same author]

Signed-off-by: Amy Griffis
Signed-off-by: David Woodhouse
Signed-off-by: Al Viro

Amy Griffis
2006-03-21 03:08:53 +0800

19 Jan, 2006

1 commit

5590ff0d5 [PATCH] vfs: *at functions: core ... Browse Code »

Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.

We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).

The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:

real 0m31.921s
user 0m0.688s
sys 0m31.234s

With syscall support the results are much better:

real 0m20.699s
user 0m0.536s
sys 0m20.149s

The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.

Signed-off-by: Ulrich Drepper
Signed-off-by: Alexey Dobriyan
Cc: Christoph Hellwig
Cc: Al Viro
Acked-by: Ingo Molnar
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2006-01-19 11:20:29 +0800

12 Jan, 2006

1 commit

16f7e0fe2 [PATCH] capable/capability.h (fs/) ... Browse Code »

fs: Use where capable() is used.

Signed-off-by: Randy Dunlap
Acked-by: Tim Schmielau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-01-12 10:42:13 +0800

10 Jan, 2006

1 commit

1b1dcc1b5 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem ... Browse Code »

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar

(finished the conversion)

Signed-off-by: Jes Sorensen
Signed-off-by: Ingo Molnar

Jes Sorensen
2006-01-10 07:59:24 +0800

09 Jan, 2006

2 commits

b01ec0ef6 [PATCH] tiny: Uninline some open.c functions ... Browse Code »

uninline some open.c functions

add/remove: 3/0 grow/shrink: 0/6 up/down: 679/-1166 (-487)
function old new delta
do_sys_truncate - 336 +336
do_sys_ftruncate - 317 +317
__put_unused_fd - 26 +26
put_unused_fd 57 49 -8
sys_close 150 119 -31
sys_ftruncate64 260 26 -234
sys_ftruncate 272 24 -248
sys_truncate 339 25 -314
sys_truncate64 336 5 -331

Signed-off-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Mackall
2006-01-09 12:14:10 +0800
4a30131e7 [PATCH] Fix some problems with truncate and mtime semantics. ... Browse Code »

SUS requires that when truncating a file to the size that it currently
is:
truncate and ftruncate should NOT modify ctime or mtime
O_TRUNC SHOULD modify ctime and mtime.

Currently mtime and ctime are always modified on most local
filesystems (side effect of ->truncate) or never modified (on NFS).

With this patch:
ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
an update of these times is required whether size changes or not
(via a new argument to do_truncate). This allows NFS to do
the right thing for O_TRUNC.
inode_setattr nolonger forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
sets the size to it's current value. This allows local filesystems
to do the right thing for f?truncate.

Also, the logic in inode_setattr is changed a bit so there are two return
points. One returns the error from vmtruncate if it failed, the other
returns 0 (there can be no other failure).

Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
requested, we now fall-through and mark_inode_dirty. If a filesystem did
not have a ->truncate function, then vmtruncate will have changed i_size,
without marking the inode as 'dirty', and I think this is wrong.

Signed-off-by: Neil Brown
Cc: Christoph Hellwig
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-09 12:13:52 +0800

09 Nov, 2005

2 commits

8c744fb83 [PATCH] add a file_permission helper ... Browse Code »

A few more callers of permission() just want to check for a different access
pattern on an already open file. This patch adds a wrapper for permission()
that takes a file in preparation of per-mount read-only support and to clean
up the callers a little. The helper is not intended for new code, everything
without the interface set in stone should use vfs_permission()

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2005-11-09 23:55:59 +0800
e4543eddf [PATCH] add a vfs_permission helper ... Browse Code »

Most permission() calls have a struct nameidata * available. This helper
takes that as an argument and thus makes sure we pass it down for lookup
intents and prepares for per-mount read-only support where we need a struct
vfsmount for checking whether a file is writeable.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2005-11-09 23:55:58 +0800

07 Nov, 2005

1 commit

cc4e69dee [PATCH] VFS: pass file pointer to filesystem from ftruncate() ... Browse Code »

This patch extends the iattr structure with a file pointer memeber, and adds
an ATTR_FILE validity flag for this member.

This is set if do_truncate() is invoked from ftruncate() or from
do_coredump().

The change is source and binary compatible.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2005-11-07 23:53:42 +0800