Eric Lee / smarc-fsl-linux-kernel

12 Jun, 2009

1 commit

6fac98dd2 Push BKL into do_mount()

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:08 +0800

11 May, 2009

1 commit

5e751e992 CRED: Rename cred_exec_mutex to reflect that it's a guard against ptrace

Rename cred_exec_mutex to reflect that it's a guard against foreign
intervention on a process's credential state, such as is made by ptrace(). The
attachment of a debugger to a process affects execve()'s calculation of the new
credential state - _and_ also setprocattr()'s calculation of that state.

Signed-off-by: David Howells
Signed-off-by: James Morris

David Howells
2009-05-11 06:15:36 +0800

24 Apr, 2009

1 commit

8c652f96d do_execve() must not clear fs->in_exec if it was set by another thread

If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

Two threads T1 and T2 and another process P, all share the same
->fs.

T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(); since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.

P exits and decrements fs->users.

T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.

T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to user space.

T1 does clone(CLONE_FS /* without CLONE_THREAD */).

T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.

Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.

When do_execve() succeeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads, either ->in_exec == 0 or we are the
only user of this ->fs.

Also, we do not need fs->lock to clear fs->in_exec.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Acked-by: Hugh Dickins
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-24 22:39:45 +0800

21 Apr, 2009

2 commits

2eae7a187 kill vfs_stat_fd / vfs_lstat_fd

There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat. Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.

Reviewed-by: Christoph Hellwig

Signed-off-by: Al Viro

Christoph Hellwig
2009-04-21 11:02:52 +0800
0112fc222 Separate out common fstatat code into vfs_fstatat

This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.

Signed-off-by: Oleg Drokin
Signed-off-by: Al Viro

Oleg Drokin
2009-04-21 11:02:51 +0800

05 Apr, 2009

1 commit

601cc11d0 Make non-compat preadv/pwritev use native register size

Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first". Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code. On x86-64, we now
generate

	testq %rcx, %rcx # pos_l
	js .L122 #,
	movq %rcx, -48(%rbp) # pos_l, pos

from the C source

	loff_t pos = pos_from_hilo(pos_h, pos_l);
	...
	if (pos < 0)
		return -EINVAL;

and the 'pos_h' register isn't even touched. It used to generate code
like

	mov %r8d, %r8d # pos_low, pos_low
	salq $32, %rcx #, tmp71
	movq %r8, %rax # pos_low, pos.386
	orq %rcx, %rax # tmp71, pos.386
	js .L122 #,
	movq %rax, -48(%rbp) # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

#define HALF_BITS (sizeof(unsigned long)*4)
__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

or something like that. That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit loff_t, this particular system call might
be left alone. Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

Acked-by: Gerd Hoffmann
Cc: H. Peter Anvin
Cc: Andrew Morton
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar
Cc: Ralf Baechle
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-04-05 05:20:34 +0800

03 Apr, 2009

5 commits

8fe74cf05 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()

Linus Torvalds
2009-04-03 12:09:10 +0800
10c7db279 preadv/pwritev: switch compat readv/preadv/writev/pwritev from fget to fget_light

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc:
Cc:
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:08 +0800
f3554f4bc preadv/pwritev: Add preadv and pwritev system calls.

This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs doesn't
allow. At least s390 needs a wrapper in glibc to handle this. As
we'll need a wrapper in glibc anyway, I've decided to push the problem to
glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables. Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc:
Cc:
Cc: Ralf Baechle
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:08 +0800
6949a6318 preadv/pwritev: create compat_writev()

Factor out some code from compat_sys_writev() which can be shared with the
upcoming compat_sys_pwritev().

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc:
Cc:
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:07 +0800
dac121384 preadv/pwritev: create compat_readv()

This patch series:

Implement the preadv() and pwritev() syscalls. *BSD has had this syscall for
quite some time.

Test code:

#if 0
set -x
gcc -Wall -O2 -o preadv $0
exit 0
#endif
/*
* preadv demo / test
*
* (c) 2008 Gerd Hoffmann
*
* build with "sh $thisfile"
*/

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/uio.h>

/* ----------------------------------------------------------------- */
/* syscall windup */

#include <sys/syscall.h>
#if 0
/* WARNING: Be sure you know what you are doing if you enable this.
* linux syscall code isn't upstream yet, syscall numbers are subject
* to change */
# ifndef __NR_preadv
# ifdef __i386__
# define __NR_preadv 333
# define __NR_pwritev 334
# endif
# ifdef __x86_64__
# define __NR_preadv 295
# define __NR_pwritev 296
# endif
# endif
#endif
#ifndef __NR_preadv
# error preadv/pwritev syscall numbers are unknown
#endif

static ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	uint32_t pos_high = (offset >> 32) & 0xffffffff;
	uint32_t pos_low  = offset & 0xffffffff;

	return syscall(__NR_preadv, fd, iov, iovcnt, pos_high, pos_low);
}

static ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	uint32_t pos_high = (offset >> 32) & 0xffffffff;
	uint32_t pos_low  = offset & 0xffffffff;

	return syscall(__NR_pwritev, fd, iov, iovcnt, pos_high, pos_low);
}

/* ----------------------------------------------------------------- */
/* demo/test app */

static char filename[] = "/tmp/preadv-XXXXXX";
static char outbuf[11] = "0123456789";
static char inbuf[11] = "----------";

static struct iovec ovec[2] = {{
	.iov_base = outbuf + 5,
	.iov_len  = 5,
},{
	.iov_base = outbuf + 0,
	.iov_len  = 5,
}};

static struct iovec ivec[3] = {{
	.iov_base = inbuf + 6,
	.iov_len  = 2,
},{
	.iov_base = inbuf + 4,
	.iov_len  = 2,
},{
	.iov_base = inbuf + 2,
	.iov_len  = 2,
}};

void cleanup(void)
{
	unlink(filename);
}

int main(int argc, char **argv)
{
	int fd, rc;

	fd = mkstemp(filename);
	if (-1 == fd) {
		perror("mkstemp");
		exit(1);
	}
	atexit(cleanup);

	/* write to file: "56789-01234" */
	rc = pwritev(fd, ovec, 2, 0);
	if (rc < 0) {
		perror("pwritev");
		exit(1);
	}

	/* read from file: "78-90-12" */
	rc = preadv(fd, ivec, 3, 2);
	if (rc < 0) {
		perror("preadv");
		exit(1);
	}

	printf("result  : %s\n", inbuf);
	printf("expected: %s\n", "--129078--");
	exit(0);
}

This patch:

Factor out some code from compat_sys_readv() which can be shared with the
upcoming compat_sys_preadv().

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc:
Cc:
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:07 +0800

01 Apr, 2009

1 commit

498052bba New locking/refcounting for fs_struct

* all changes of current->fs are done under task_lock and write_lock of
old fs->lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
same time we decide whether to free it.
* put_fs_struct() is gone
* new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do
execve() and only subthreads share fs_struct. Cleared when finishing exec
(success and failure alike). Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
is in progress.

Signed-off-by: Al Viro

Al Viro
2009-04-01 11:00:26 +0800

29 Mar, 2009

2 commits

e426b64c4 fix setuid sometimes doesn't

Joe Malicki reports that setuid sometimes doesn't: very rarely,
a setuid root program does not get root euid; and, by the way,
they have a health check running lsof every few minutes.

Right, check_unsafe_exec() notes whether the files_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient
use of get_files_struct(), which also raises that sharing count.

There's a rather simple fix for this: exec's check on files->count
has been redundant ever since 2.6.1 made it unshare_files() (except
while compat_do_execve() omitted to do so) - just remove that check.

[Note to -stable: this patch will not apply before 2.6.29: earlier
releases should just remove the files->count line from unsafe_exec().]

Reported-by: Joe Malicki
Narrowed-down-by: Michael Itz
Tested-by: Joe Malicki
Signed-off-by: Hugh Dickins
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-03-29 08:30:00 +0800
53e9309e0 compat_do_execve should unshare_files

2.6.26's commit fd8328be874f4190a811c58cd4778ec2c74d2c05
"sanitize handling of shared descriptor tables in failing execve()"
moved the unshare_files() from flush_old_exec() and several binfmts
to the head of do_execve(); but forgot to make the same change to
compat_do_execve(), leaving a CLONE_FILES files_struct shared across
exec from a 32-bit process on a 64-bit kernel.

It's arguable whether the files_struct really ought to be unshared
across exec; but 2.6.1 made that so to stop the loading binary's fd
leaking into other threads, and a 32-bit process on a 64-bit kernel
ought to behave in the same way as 32 on 32 and 64 on 64.

Signed-off-by: Hugh Dickins
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-03-29 08:30:00 +0800

28 Mar, 2009

2 commits

3ae5080f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
fs: avoid I_NEW inodes
Merge code for single and multiple-instance mounts
Remove get_init_pts_sb()
Move common mknod_ptmx() calls into caller
Parse mount options just once and copy them to super block
Unroll essentials of do_remount_sb() into devpts
vfs: simple_set_mnt() should return void
fs: move bdev code out of buffer.c
constify dentry_operations: rest
constify dentry_operations: configfs
constify dentry_operations: sysfs
constify dentry_operations: JFS
constify dentry_operations: OCFS2
constify dentry_operations: GFS2
constify dentry_operations: FAT
constify dentry_operations: FUSE
constify dentry_operations: procfs
constify dentry_operations: ecryptfs
constify dentry_operations: CIFS
constify dentry_operations: AFS
...

Linus Torvalds
2009-03-28 07:23:12 +0800
2b1c6bd77 generic compat_sys_ustat

Due to the different size of ino_t, ustat needs a compat handler, but
currently only x86 and mips provide one. Add a generic compat_sys_ustat
and switch all architectures over to it. Instead of doing various
user copy hacks, compat_sys_ustat just reimplements sys_ustat, as
it's trivial. This was suggested by Arnd Bergmann.

Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
stack smashing warnings on RHEL/Fedora because the syscall writes too
much data.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-03-28 02:43:57 +0800

24 Mar, 2009

1 commit

703a3cd72 Merge branch 'master' into next

James Morris
2009-03-24 07:52:46 +0800

12 Feb, 2009

1 commit

f9ce1f1cd Add in_execve flag into task_struct.

This patch allows LSM modules to determine whether the current process is in an
execve operation or not so that they can behave differently while an execve
operation is in progress.

This patch is needed by TOMOYO. Please see another patch titled "LSM adapter
functions." for background.

Signed-off-by: Tetsuo Handa
Signed-off-by: David Howells
Signed-off-by: James Morris

Kentaro Takeda
2009-02-12 12:15:03 +0800

07 Feb, 2009

1 commit

0bf2f3aec CRED: Fix SUID exec regression

The patch:

commit a6f76f23d297f70e2a6b3ec607f7aeeea9e37e8d
CRED: Make execve() take advantage of copy-on-write credentials

moved the place in which the 'safeness' of a SUID/SGID exec was performed to
before de_thread() was called. This means that LSM_UNSAFE_SHARE is now
calculated incorrectly. This flag is set if any of the usage counts for
fs_struct, files_struct and sighand_struct are greater than 1 at the time the
determination is made. All of which are true for threads created by the
pthread library.

However, since we wish to make the security calculation before irrevocably
damaging the process so that we can return it an error code in the case where
we decide we want to reject the exec request on this basis, we have to make the
determination before calling de_thread().

So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
(CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and
so can be discounted by check_unsafe_exec().

We do have to be careful because CLONE_THREAD does not imply FS or FILES.

We _assume_ that there will be no extra references to these structs held by the
threads we're going to kill.

This can be tested with the attached pair of programs. Build the two programs
using the Makefile supplied, and run ./test1 as a non-root user. If
successful, you should see something like:

[dhowells@andromeda tmp]$ ./test1
--TEST1--
uid=4043, euid=4043 suid=4043
exec ./test2
--TEST2--
uid=4043, euid=0 suid=0
SUCCESS - Correct effective user ID

and if unsuccessful, something like:

[dhowells@andromeda tmp]$ ./test1
--TEST1--
uid=4043, euid=4043 suid=4043
exec ./test2
--TEST2--
uid=4043, euid=4043 suid=4043
ERROR - Incorrect effective user ID!

The non-root user ID you see will depend on the user you run as.

[test1.c]
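#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

static void *thread_func(void *arg)
{
	while (1) {}
}

int main(int argc, char **argv)
{
	pthread_t tid;
	uid_t uid, euid, suid;
	int err;

	printf("--TEST1--\n");
	getresuid(&uid, &euid, &suid);
	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

	/* pthread_create() returns an error number; it does not set errno */
	err = pthread_create(&tid, NULL, thread_func, NULL);
	if (err != 0) {
		fprintf(stderr, "pthread_create: %s\n", strerror(err));
		exit(1);
	}

	printf("exec ./test2\n");
	fflush(stdout);	/* don't lose buffered output across exec */
	execlp("./test2", "test2", (char *)NULL);
	perror("./test2");
	_exit(1);
}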

[test2.c]
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	uid_t uid, euid, suid;

	getresuid(&uid, &euid, &suid);
	printf("--TEST2--\n");
	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

	if (euid != 0) {
		fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
		exit(1);
	}
	printf("SUCCESS - Correct effective user ID\n");
	exit(0);
}

[Makefile]
CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused
all: test1 test2

test1: test1.c
	gcc $(CFLAGS) -o test1 test1.c -lpthread

test2: test2.c
	gcc $(CFLAGS) -o test2 test2.c
	sudo chown root:root test2
	sudo chmod +s test2

Reported-by: David Smith
Signed-off-by: David Howells
Acked-by: David Smith
Signed-off-by: James Morris

David Howells
2009-02-07 05:46:18 +0800

14 Jan, 2009

1 commit

c9da9f212 [CVE-2009-0029] Make sys_pselect7 static

Not a single architecture has wired up sys_pselect7 plus it is the
only system call with seven parameters. Just make it static and
rename it to do_pselect which will do the work for sys_pselect6.

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:16 +0800

07 Jan, 2009

1 commit

ca8a5bd28 add missing accounting calls to compat_sys_{readv,writev}

Signed-off-by: Gerd Hoffmann
Cc: Jay Lan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-01-07 07:59:13 +0800

14 Nov, 2008

1 commit

a6f76f23d CRED: Make execve() take advantage of copy-on-write credentials

Make execve() take advantage of copy-on-write credentials, allowing it to set
up the credentials in advance, and then commit the whole lot after the point
of no return.

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

(1) execve().

The credential bits from struct linux_binprm are, for the most part,
replaced with a single credentials pointer (bprm->cred). This means that
all the creds can be calculated in advance and then applied at the point
of no return with no possibility of failure.

I would like to replace bprm->cap_effective with:

cap_isclear(bprm->cap_effective)

but this seems impossible due to special behaviour for processes of pid 1
(they always retain their parent's capability masks where normally they'd
be changed - see cap_bprm_set_creds()).

The following sequence of events now happens:

(a) At the start of do_execve, the current task's cred_exec_mutex is
locked to prevent PTRACE_ATTACH from obsoleting the calculation of
creds that we make.

(a) prepare_exec_creds() is then called to make a copy of the current
task's credentials and prepare it. This copy is then assigned to
bprm->cred.

This renders security_bprm_alloc() and security_bprm_free()
unnecessary, and so they've been removed.

(b) The determination of unsafe execution is now performed immediately
after (a) rather than later on in the code. The result is stored in
bprm->unsafe for future reference.

(c) prepare_binprm() is called, possibly multiple times.

(i) This applies the result of set[ug]id binaries to the new creds
attached to bprm->cred. Personality bit clearance is recorded,
but now deferred on the basis that the exec procedure may yet
fail.

(ii) This then calls the new security_bprm_set_creds(). This should
calculate the new LSM and capability credentials into *bprm->cred.

This folds together security_bprm_set() and parts of
security_bprm_apply_creds() (these two have been removed).
Anything that might fail must be done at this point.

(iii) bprm->cred_prepared is set to 1.

bprm->cred_prepared is 0 on the first pass of the security
calculations, and 1 on all subsequent passes. This allows SELinux
in (ii) to base its calculations only on the initial script and
not on the interpreter.

(d) flush_old_exec() is called to commit the task to execution. This
performs the following steps with regard to credentials:

(i) Clear pdeath_signal and set dumpable in certain circumstances that
may not be covered by commit_creds().

(ii) Clear any bits in current->personality that were deferred from
(c.i).

(e) install_exec_creds() [compute_creds() as was] is called to install the
new credentials. This performs the following steps with regard to
credentials:

(i) Calls security_bprm_committing_creds() to apply any security
requirements, such as flushing unauthorised files in SELinux, that
must be done before the credentials are changed.

This is made up of bits of security_bprm_apply_creds() and
security_bprm_post_apply_creds(), both of which have been removed.
This function is not allowed to fail; anything that might fail
must have been done in (c.ii).

(ii) Calls commit_creds() to apply the new credentials in a single
assignment (more or less). Possibly pdeath_signal and dumpable
should be part of struct creds.

(iii) Unlocks the task's cred_replace_mutex, thus allowing
PTRACE_ATTACH to take place.

(iv) Clears the bprm->cred pointer as the credentials it was holding
are now immutable.

(v) Calls security_bprm_committed_creds() to apply any security
alterations that must be done after the creds have been changed.
SELinux uses this to flush signals and signal handlers.

(f) If an error occurs before (d.i), bprm_free() will call abort_creds()
to destroy the proposed new credentials and will then unlock
cred_replace_mutex. No changes to the credentials will have been
made.

(2) LSM interface.

A number of functions have been changed, added or removed:

(*) security_bprm_alloc(), ->bprm_alloc_security()
(*) security_bprm_free(), ->bprm_free_security()

Removed in favour of preparing new credentials and modifying those.

(*) security_bprm_apply_creds(), ->bprm_apply_creds()
(*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

Removed; split between security_bprm_set_creds(),
security_bprm_committing_creds() and security_bprm_committed_creds().

(*) security_bprm_set(), ->bprm_set_security()

Removed; folded into security_bprm_set_creds().

(*) security_bprm_set_creds(), ->bprm_set_creds()

New. The new credentials in bprm->cred should be checked and set up
as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the
second and subsequent calls.

(*) security_bprm_committing_creds(), ->bprm_committing_creds()
(*) security_bprm_committed_creds(), ->bprm_committed_creds()

New. Apply the security effects of the new credentials. This
includes closing unauthorised files in SELinux. This function may not
fail. When the former is called, the creds haven't yet been applied
to the process; when the latter is called, they have.

The former may access bprm->cred, the latter may not.

(3) SELinux.

SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:

(a) The bprm_security_struct struct has been removed in favour of using
the credentials-under-construction approach.

(c) flush_unauthorized_files() now takes a cred pointer and passes it on
to inode_has_perm(), file_has_perm() and dentry_open().

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:24 +0800

27 Oct, 2008

1 commit

4d36a9e65 select: deal with math overflow from borderline valid userland data

Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select(). The old kernels accepted this just
fine, so the new kernels must too.

However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.

This patch fixes this by adding the excess microseconds to the seconds
field.

Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.

Reported-by: Carlos R. Mafra
Signed-off-by: Arjan van de Ven
Signed-off-by: Linus Torvalds

Arjan van de Ven
2008-10-27 02:22:08 +0800

24 Oct, 2008

1 commit

1f6d6e8eb Merge branch 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
hrtimers: add missing docbook comments to struct hrtimer
hrtimers: simplify hrtimer_peek_ahead_timers()
hrtimers: fix docbook comments
DECLARE_PER_CPU needs linux/percpu.h
hrtimers: fix typo
rangetimers: fix the bug reported by Ingo for real
rangetimer: fix BUG_ON reported by Ingo
rangetimer: fix x86 build failure for the !HRTIMERS case
select: fix alpha OSF wrapper
select: fix alpha OSF wrapper
hrtimer: peek at the timer queue just before going idle
hrtimer: make the futex() system call use the per process slack value
hrtimer: make the nanosleep() syscall use the per process slack
hrtimer: fix signed/unsigned bug in slack estimator
hrtimer: show the timer ranges in /proc/timer_list
hrtimer: incorporate feedback from Peter Zijlstra
hrtimer: add a hrtimer_start_range() function
hrtimer: another build fix
hrtimer: fix build bug found by Ingo
hrtimer: make select() and poll() use the hrtimer range feature
...

Linus Torvalds
2008-10-24 01:53:02 +0800

23 Oct, 2008

1 commit

53c9c5c0e [PATCH] prepare vfs_readdir() callers to returning filldir result

It's not the final state, but it allows moving ->readdir() instances
to passing filldir return value to caller of vfs_readdir().

Signed-off-by: Al Viro

Al Viro
2008-10-23 17:13:10 +0800

18 Oct, 2008

1 commit

651dab426 Merge commit 'linus/master' into merge-linus

Conflicts:

arch/x86/kvm/i8254.c

Arjan van de Ven
2008-10-18 00:20:26 +0800

17 Oct, 2008

2 commits

f7a5000f7 compat: move cp_compat_stat to common code

struct stat / compat_stat is the same on all architectures, so
cp_compat_stat should be, too.

Turns out it is, except that various architectures differ slightly: some
use high2lowuid/high2lowgid or direct assignment instead of the
SET_UID/SET_GID macros that expand to the correct one anyway.

This patch replaces the arch-specific cp_compat_stat implementations with
a common one based on the x86-64 one.

Signed-off-by: Christoph Hellwig
Acked-by: David S. Miller [ sparc bits ]
Acked-by: Kyle McMartin [ parisc bits ]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2008-10-17 02:21:33 +0800
362e6663e exec.c, compat.c: fix count(), compat_count() bounds checking

With MAX_ARG_STRINGS set to 0x7FFFFFFF, and being passed to 'count()' and
compat_count(), it would appear that the current max bounds check of
fs/exec.c:394:

	if(++i > max)
		return -E2BIG;

would never trigger. Since 'i' is of type int, its value would wrap and the
function would continue looping.

A simple fix seems to be changing ++i to i++ and checking for '>='.

Signed-off-by: Jason Baron
Acked-by: Peter Zijlstra
Cc: "Ollie Wild"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jason Baron
2008-10-17 02:21:32 +0800

06 Sep, 2008

2 commits

8ff3e8e85 select: switch select() and poll() over to hrtimers

With lots of help, input and cleanups from Thomas Gleixner

This patch switches select() and poll() over to hrtimers.

The core of the patch is replacing the "s64 timeout" with a
"struct timespec end_time" in all the plumbing.

But most of the diffstat comes from using the just introduced helpers:
poll_select_set_timeout
poll_select_copy_remaining
timespec_add_safe
which make manipulating the timespec easier and less error-prone.

Signed-off-by: Arjan van de Ven
Signed-off-by: Thomas Gleixner

Arjan van de Ven
2008-09-06 12:35:03 +0800
b773ad40a select: add poll_select_set_timeout() and poll_select_copy_remaining() helpers

This patch adds 2 helpers that will be used for the hrtimer based select/poll:

poll_select_set_timeout() is a helper that takes a timeout (as a second, nanosecond
pair) and turns that into a "struct timespec" that represents the absolute end time.
This is a common operation in the many select() and poll() variants and needs various,
common, sanity checks.

poll_select_copy_remaining() is a helper that takes care of copying the remaining
time to userspace, as select(), pselect() and ppoll() do. This function comes in
both a natural and a compat implementation (due to data structure differences).

Signed-off-by: Thomas Gleixner
Signed-off-by: Arjan van de Ven

Thomas Gleixner
2008-09-06 12:34:59 +0800

25 Aug, 2008

1 commit

8f3f655da [PATCH] fix regular readdir() and friends

Handling of -EOVERFLOW.

Signed-off-by: Al Viro

Al Viro
2008-08-25 13:18:08 +0800

27 Jul, 2008

1 commit

2d8f30380 [PATCH] sanitize __user_walk_fd() et.al.

* do not pass nameidata; struct path is all the callers want.
* switch to new helpers:
user_path_at(dfd, pathname, flags, &path)
user_path(pathname, &path)
user_lpath(pathname, &path)
user_path_dir(pathname, &path) (fail if not a directory)
The last 3 are trivial macro wrappers for the first one.
* remove nameidata in callers.

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:34 +0800

25 Jul, 2008

2 commits

9deb27bae flag parameters: signalfd

This patch adds the new signalfd4 syscall. It extends the old signalfd
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag supported is SFD_CLOEXEC, which causes the close-on-exec
flag for the returned file descriptor to be set.

A new name SFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64, and if the syscall numbers change.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_signalfd4
# ifdef __x86_64__
#  define __NR_signalfd4 289
# elif defined __i386__
#  define __NR_signalfd4 327
# else
#  error "need __NR_signalfd4"
# endif
#endif

#define SFD_CLOEXEC O_CLOEXEC

int
main (void)
{
  sigset_t ss;
  sigemptyset (&ss);
  sigaddset (&ss, SIGUSR1);
  /* 8 == size of the kernel's sigset_t (64 signals) on x86 and x86-64.  */
  int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
  if (fd == -1)
    {
      puts ("signalfd4(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("signalfd4(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
  if (fd == -1)
    {
      puts ("signalfd4(SFD_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800
f4a67ccee fs: check for statfs overflow

Adds a check for an overflow in the filesystem size, so that if someone
calls statfs() from a 32-bit binary on a hugetlbfs with a 16G block size,
it reports EOVERFLOW instead of a size of 0.
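The idea of the check can be sketched as follows. The structures are hypothetical stand-ins for a 64-bit internal statfs result and a 32-bit user-space layout, and `put_statfs32_sketch()` is a made-up name. Any value with bits above the low 32 is rejected with EOVERFLOW rather than truncated; truncation is exactly what turns a 16G (2^34-byte) block size into 0.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 64-bit internal result and a 32-bit ABI layout. */
struct kstatfs_sketch  { uint64_t f_bsize, f_blocks, f_bfree, f_bavail; };
struct statfs32_sketch { uint32_t f_bsize, f_blocks, f_bfree, f_bavail; };

/* Narrow IN into OUT, returning -EOVERFLOW if any field needs more than
   32 bits. Without the check, (uint32_t)(1ULL << 34) silently becomes 0. */
static int put_statfs32_sketch(struct statfs32_sketch *out,
                               const struct kstatfs_sketch *in)
{
    if ((in->f_bsize | in->f_blocks | in->f_bfree | in->f_bavail) >> 32)
        return -EOVERFLOW;

    out->f_bsize  = (uint32_t)in->f_bsize;
    out->f_blocks = (uint32_t)in->f_blocks;
    out->f_bfree  = (uint32_t)in->f_bfree;
    out->f_bavail = (uint32_t)in->f_bavail;
    return 0;
}
```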

Acked-by: Nishanth Aravamudan
Signed-off-by: Jon Tollefson
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jon Tollefson
2008-07-25 01:47:19 +0800

17 May, 2008

1 commit

08a6fac1c [PATCH] get rid of leak in compat_execve()

Even though copy_compat_strings() doesn't cache the pages,
copy_strings_kernel() and stuff indirectly called by e.g.
->load_binary() is doing that, so we need to drop the
cache contents in the end.

[found by WANG Cong]

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:23:05 +0800

02 May, 2008

1 commit

9f3acc314 [PATCH] split linux/file.h

Initial split-off of the low-level stuff, moved to fdtable.h.

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:16 +0800

30 Apr, 2008

2 commits

f3de272b8 signals: use HAVE_SET_RESTORE_SIGMASK

Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.
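The override pattern can be sketched in plain C. Everything here is a hypothetical stand-in: `thread_flags_sketch` replaces the per-thread flags word, `TIF_RESTORE_SIGMASK_SKETCH` replaces the real flag bit, and the generic set_restore_sigmask() is simplified. An architecture that wants its own version would define HAVE_SET_RESTORE_SIGMASK and supply the function before the generic block is seen.

```c
/* Hypothetical stand-ins for the per-thread flags word and its bit. */
#define TIF_RESTORE_SIGMASK_SKETCH (1u << 1)
static unsigned int thread_flags_sketch;

/* An architecture header included earlier could provide its own version:
 *
 *   #define HAVE_SET_RESTORE_SIGMASK 1
 *   static inline void set_restore_sigmask(void) { ... }
 *
 * Only when it has not done so (and the flag bit exists) is the generic
 * flag-based helper below defined. */
#if !defined(HAVE_SET_RESTORE_SIGMASK) && defined(TIF_RESTORE_SIGMASK_SKETCH)
#define HAVE_SET_RESTORE_SIGMASK 1
static inline void set_restore_sigmask(void)
{
    thread_flags_sketch |= TIF_RESTORE_SIGMASK_SKETCH;
}
#endif

/* Generic code tests for the capability, not for a particular flag.
   Returns 1 if a mask restore was requested, 0 if unsupported. */
static int request_sigmask_restore_sketch(void)
{
#ifdef HAVE_SET_RESTORE_SIGMASK
    set_restore_sigmask();
    return 1;
#else
    return 0;
#endif
}
```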

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-04-30 23:29:37 +0800
4e4c22c71 signals: add set_restore_sigmask

This adds the set_restore_sigmask() inline in and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
change, but abstracts the details of the flag protocol from all the calls.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-04-30 23:29:37 +0800

16 Feb, 2008

1 commit

52833e897 Merge branch 'linus_origin' into hotfixes

Trond Myklebust
2008-02-16 02:36:30 +0800

15 Feb, 2008

1 commit

1d957f9bf Introduce path_put()

* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()
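The "right order" can be sketched with hypothetical reference-counted stand-ins for struct dentry and struct vfsmount; `obj_sketch`, `path_sketch`, `put_sketch()` and `path_put_sketch()` are made-up names, and `release_log` exists only to record the order in which references are dropped. The sketch drops the dentry before the mount, because the mount reference is what keeps the dentry's filesystem pinned.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct dentry and struct vfsmount. */
struct obj_sketch {
    int refcount;
    char name;
};

struct path_sketch {
    struct obj_sketch *mnt;
    struct obj_sketch *dentry;
};

/* Records the order in which references are dropped. */
static char release_log[8];
static size_t release_len;

static void put_sketch(struct obj_sketch *o)
{
    o->refcount--;
    if (release_len < sizeof release_log)
        release_log[release_len++] = o->name;
}

/* Sketch of path_put(): dentry first, then the mount that pins it. */
static void path_put_sketch(struct path_sketch *p)
{
    put_sketch(p->dentry);
    put_sketch(p->mnt);
}
```

Putting the pair behind one helper means callers can no longer get the order wrong, which is the motivation for replacing open-coded releases with path_put(&nd->path).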

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck
Signed-off-by: Andreas Gruenbacher
Acked-by: Christoph Hellwig
Cc:
Cc: Al Viro
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2008-02-15 13:13:33 +0800