12 Jun, 2009

1 commit


11 May, 2009

1 commit


24 Apr, 2009

1 commit

  • If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
    unconditionally. This is wrong if we race with our sub-thread which
    also does do_execve:

    Two threads T1 and T2 and another process P, all share the same
    ->fs.

    T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
    ->fs is shared, we set LSM_UNSAFE but not ->in_exec.

    P exits and decrements fs->users.

    T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
    shared, we set fs->in_exec.

    T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
    return to the user-space.

    T1 does clone(CLONE_FS /* without CLONE_THREAD */).

    T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
    another process.

    Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
    do_execve() to clear ->in_exec depending on res.

    When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
    It can be set only if we don't share ->fs with another process, and since
    we already killed all sub-threads either ->in_exec == 0 or we are the
    only user of this ->fs.

    Also, we do not need fs->lock to clear fs->in_exec.

    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Apr, 2009

2 commits


05 Apr, 2009

1 commit

  • Instead of always splitting the file offset into 32-bit 'high' and 'low'
    parts, just split them into the largest natural word-size - which in C
    terms is 'unsigned long'.

    This allows 64-bit architectures to avoid the unnecessary 32-bit
    shifting and masking for native format (while the compat interfaces will
    obviously always have to do it).

    This also changes the order of 'high' and 'low' to be "low first". Why?
    Because when we have it like this, the 64-bit system calls now don't use
    the "pos_high" argument at all, and it makes more sense for the native
    system call to simply match the user-mode prototype.

    This results in a much more natural calling convention, and allows the
    compiler to generate much more straightforward code. On x86-64, we now
    generate

    testq %rcx, %rcx # pos_l
    js .L122 #,
    movq %rcx, -48(%rbp) # pos_l, pos

    from the C source

    loff_t pos = pos_from_hilo(pos_h, pos_l);
    ...
    if (pos < 0)
    return -EINVAL;

    and the 'pos_h' register isn't even touched. It used to generate code
    like

    mov %r8d, %r8d # pos_low, pos_low
    salq $32, %rcx #, tmp71
    movq %r8, %rax # pos_low, pos.386
    orq %rcx, %rax # tmp71, pos.386
    js .L122 #,
    movq %rax, -48(%rbp) # pos.386, pos

    which isn't _that_ horrible, but it does show how the natural word size
    is just a more sensible interface (same arguments will hold in the user
    level glibc wrapper function, of course, so the kernel side is just half
    of the equation!)

    Note: in all cases the user code wrapper can again be the same. You can
    just do

    #define HALF_BITS (sizeof(unsigned long)*4)
    __syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

    or something like that. That way the user mode wrapper will also be
    nicely passing in a zero (it won't actually have to do the shifts, the
    compiler will understand what is going on) for the last argument.

    And that is a good idea, even if nobody will necessarily ever care: if
    we ever do move to a 128-bit lloff_t, this particular system call might
    be left alone. Of course, that will be the least of our worries if we
    really ever need to care, so this may not be worth really caring about.

    [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

    Acked-by: Gerd Hoffmann
    Cc: H. Peter Anvin
    Cc: Andrew Morton
    Cc: linux-api@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: Ingo Molnar
    Cc: Ralf Baechle >
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Apr, 2009

5 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    Remove two unneeded exports and make two symbols static in fs/mpage.c
    Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
    Trim includes of fdtable.h
    Don't crap into descriptor table in binfmt_som
    Trim includes in binfmt_elf
    Don't mess with descriptor table in load_elf_binary()
    Get rid of indirect include of fs_struct.h
    New helper - current_umask()
    check_unsafe_exec() doesn't care about signal handlers sharing
    New locking/refcounting for fs_struct
    Take fs_struct handling to new file (fs/fs_struct.c)
    Get rid of bumping fs_struct refcount in pivot_root(2)
    Kill unsharing fs_struct in __set_personality()

    Linus Torvalds
     
  • Signed-off-by: Gerd Hoffmann
    Cc: Arnd Bergmann
    Cc: Al Viro
    Cc:
    Cc:
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     
  • This patch adds preadv and pwritev system calls. These syscalls are a
    pretty straightforward combination of pread and readv (same for write).
    They are quite useful for doing vectored I/O in threaded applications.
    Using lseek+readv instead opens race windows you'll have to plug with
    locking.

    Other systems have such system calls too, for example NetBSD, check
    here: http://www.daemon-systems.org/man/preadv.2.html

    The application-visible interface provided by glibc should look like
    this to be compatible to the existing implementations in the *BSD family:

    ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
    ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

    This prototype has one problem though: On 32bit archs is the (64bit)
    offset argument unaligned, which the syscall ABI of several archs doesn't
    allow to do. At least s390 needs a wrapper in glibc to handle this. As
    we'll need a wrappers in glibc anyway I've decided to push problem to
    glibc entriely and use a syscall prototype which works without
    arch-specific wrappers inside the kernel: The offset argument is
    explicitly splitted into two 32bit values.

    The patch sports the actual system call implementation and the windup in
    the x86 system call tables. Other archs follow as separate patches.

    Signed-off-by: Gerd Hoffmann
    Cc: Arnd Bergmann
    Cc: Al Viro
    Cc:
    Cc:
    Cc: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     
  • Factor out some code from compat_sys_writev() which can be shared with the
    upcoming compat_sys_pwritev().

    Signed-off-by: Gerd Hoffmann
    Cc: Arnd Bergmann
    Cc: Al Viro
    Cc:
    Cc:
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     
  • This patch series:

    Implement the preadv() and pwritev() syscalls. *BSD has this syscall for
    quite some time.

    Test code:

    #if 0
    set -x
    gcc -Wall -O2 -o preadv $0
    exit 0
    #endif
    /*
    * preadv demo / test
    *
    * (c) 2008 Gerd Hoffmann
    *
    * build with "sh $thisfile"
    */

    #include
    #include
    #include
    #include
    #include
    #include

    /* ----------------------------------------------------------------- */
    /* syscall windup */

    #include
    #if 0
    /* WARNING: Be sure you know what you are doing if you enable this.
    * linux syscall code isn't upstream yet, syscall numbers are subject
    * to change */
    # ifndef __NR_preadv
    # ifdef __i386__
    # define __NR_preadv 333
    # define __NR_pwritev 334
    # endif
    # ifdef __x86_64__
    # define __NR_preadv 295
    # define __NR_pwritev 296
    # endif
    # endif
    #endif
    #ifndef __NR_preadv
    # error preadv/pwritev syscall numbers are unknown
    #endif

    static ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
    {
    uint32_t pos_high = (offset >> 32) & 0xffffffff;
    uint32_t pos_low = offset & 0xffffffff;

    return syscall(__NR_preadv, fd, iov, iovcnt, pos_high, pos_low);
    }

    static ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
    {
    uint32_t pos_high = (offset >> 32) & 0xffffffff;
    uint32_t pos_low = offset & 0xffffffff;

    return syscall(__NR_pwritev, fd, iov, iovcnt, pos_high, pos_low);
    }

    /* ----------------------------------------------------------------- */
    /* demo/test app */

    static char filename[] = "/tmp/preadv-XXXXXX";
    static char outbuf[11] = "0123456789";
    static char inbuf[11] = "----------";

    static struct iovec ovec[2] = {{
    .iov_base = outbuf + 5,
    .iov_len = 5,
    },{
    .iov_base = outbuf + 0,
    .iov_len = 5,
    }};

    static struct iovec ivec[3] = {{
    .iov_base = inbuf + 6,
    .iov_len = 2,
    },{
    .iov_base = inbuf + 4,
    .iov_len = 2,
    },{
    .iov_base = inbuf + 2,
    .iov_len = 2,
    }};

    void cleanup(void)
    {
    unlink(filename);
    }

    int main(int argc, char **argv)
    {
    int fd, rc;

    fd = mkstemp(filename);
    if (-1 == fd) {
    perror("mkstemp");
    exit(1);
    }
    atexit(cleanup);

    /* write to file: "56789-01234" */
    rc = pwritev(fd, ovec, 2, 0);
    if (rc < 0) {
    perror("pwritev");
    exit(1);
    }

    /* read from file: "78-90-12" */
    rc = preadv(fd, ivec, 3, 2);
    if (rc < 0) {
    perror("preadv");
    exit(1);
    }

    printf("result : %s\n", inbuf);
    printf("expected: %s\n", "--129078--");
    exit(0);
    }

    This patch:

    Factor out some code from compat_sys_readv() which can be shared with the
    upcoming compat_sys_preadv().

    Signed-off-by: Gerd Hoffmann
    Cc: Arnd Bergmann
    Cc: Al Viro
    Cc:
    Cc:
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerd Hoffmann
     

01 Apr, 2009

1 commit

  • * all changes of current->fs are done under task_lock and write_lock of
    old fs->lock
    * refcount is not atomic anymore (same protection)
    * its decrements are done when removing reference from current; at the
    same time we decide whether to free it.
    * put_fs_struct() is gone
    * new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do
    execve() and only subthreads share fs_struct. Cleared when finishing exec
    (success and failure alike). Makes CLONE_FS fail with -EAGAIN if set.
    * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
    is in progress.

    Signed-off-by: Al Viro

    Al Viro
     

29 Mar, 2009

2 commits

  • Joe Malicki reports that setuid sometimes doesn't: very rarely,
    a setuid root program does not get root euid; and, by the way,
    they have a health check running lsof every few minutes.

    Right, check_unsafe_exec() notes whether the files_struct is being
    shared by more threads than will get killed by the exec, and if so
    sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
    But /proc//fd and /proc//fdinfo lookups make transient
    use of get_files_struct(), which also raises that sharing count.

    There's a rather simple fix for this: exec's check on files->count
    has been redundant ever since 2.6.1 made it unshare_files() (except
    while compat_do_execve() omitted to do so) - just remove that check.

    [Note to -stable: this patch will not apply before 2.6.29: earlier
    releases should just remove the files->count line from unsafe_exec().]

    Reported-by: Joe Malicki
    Narrowed-down-by: Michael Itz
    Tested-by: Joe Malicki
    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • 2.6.26's commit fd8328be874f4190a811c58cd4778ec2c74d2c05
    "sanitize handling of shared descriptor tables in failing execve()"
    moved the unshare_files() from flush_old_exec() and several binfmts
    to the head of do_execve(); but forgot to make the same change to
    compat_do_execve(), leaving a CLONE_FILES files_struct shared across
    exec from a 32-bit process on a 64-bit kernel.

    It's arguable whether the files_struct really ought to be unshared
    across exec; but 2.6.1 made that so to stop the loading binary's fd
    leaking into other threads, and a 32-bit process on a 64-bit kernel
    ought to behave in the same way as 32 on 32 and 64 on 64.

    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Mar, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • Due to a different size of ino_t ustat needs a compat handler, but
    currently only x86 and mips provide one. Add a generic compat_sys_ustat
    and switch all architectures over to it. Instead of doing various
    user copy hacks compat_sys_ustat just reimplements sys_ustat as
    it's trivial. This was suggested by Arnd Bergmann.

    Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
    stack smashing warnings on RHEL/Fedora due to the too large amount of
    data writen by the syscall.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

24 Mar, 2009

1 commit


12 Feb, 2009

1 commit

  • This patch allows LSM modules to determine whether current process is in an
    execve operation or not so that they can behave differently while an execve
    operation is in progress.

    This patch is needed by TOMOYO. Please see another patch titled "LSM adapter
    functions." for backgrounds.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    Kentaro Takeda
     

07 Feb, 2009

1 commit

  • The patch:

    commit a6f76f23d297f70e2a6b3ec607f7aeeea9e37e8d
    CRED: Make execve() take advantage of copy-on-write credentials

    moved the place in which the 'safeness' of a SUID/SGID exec was performed to
    before de_thread() was called. This means that LSM_UNSAFE_SHARE is now
    calculated incorrectly. This flag is set if any of the usage counts for
    fs_struct, files_struct and sighand_struct are greater than 1 at the time the
    determination is made. All of which are true for threads created by the
    pthread library.

    However, since we wish to make the security calculation before irrevocably
    damaging the process so that we can return it an error code in the case where
    we decide we want to reject the exec request on this basis, we have to make the
    determination before calling de_thread().

    So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
    our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
    (CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and
    so can be discounted by check_unsafe_exec().

    We do have to be careful because CLONE_THREAD does not imply FS or FILES.

    We _assume_ that there will be no extra references to these structs held by the
    threads we're going to kill.

    This can be tested with the attached pair of programs. Build the two programs
    using the Makefile supplied, and run ./test1 as a non-root user. If
    successful, you should see something like:

    [dhowells@andromeda tmp]$ ./test1
    --TEST1--
    uid=4043, euid=4043 suid=4043
    exec ./test2
    --TEST2--
    uid=4043, euid=0 suid=0
    SUCCESS - Correct effective user ID

    and if unsuccessful, something like:

    [dhowells@andromeda tmp]$ ./test1
    --TEST1--
    uid=4043, euid=4043 suid=4043
    exec ./test2
    --TEST2--
    uid=4043, euid=4043 suid=4043
    ERROR - Incorrect effective user ID!

    The non-root user ID you see will depend on the user you run as.

    [test1.c]
    #include
    #include
    #include
    #include

    static void *thread_func(void *arg)
    {
    while (1) {}
    }

    int main(int argc, char **argv)
    {
    pthread_t tid;
    uid_t uid, euid, suid;

    printf("--TEST1--\n");
    getresuid(&uid, &euid, &suid);
    printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

    if (pthread_create(&tid, NULL, thread_func, NULL) < 0) {
    perror("pthread_create");
    exit(1);
    }

    printf("exec ./test2\n");
    execlp("./test2", "test2", NULL);
    perror("./test2");
    _exit(1);
    }

    [test2.c]
    #include
    #include
    #include

    int main(int argc, char **argv)
    {
    uid_t uid, euid, suid;

    getresuid(&uid, &euid, &suid);
    printf("--TEST2--\n");
    printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

    if (euid != 0) {
    fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
    exit(1);
    }
    printf("SUCCESS - Correct effective user ID\n");
    exit(0);
    }

    [Makefile]
    CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused
    all: test1 test2

    test1: test1.c
    gcc $(CFLAGS) -o test1 test1.c -lpthread

    test2: test2.c
    gcc $(CFLAGS) -o test2 test2.c
    sudo chown root.root test2
    sudo chmod +s test2

    Reported-by: David Smith
    Signed-off-by: David Howells
    Acked-by: David Smith
    Signed-off-by: James Morris

    David Howells
     

14 Jan, 2009

1 commit


07 Jan, 2009

1 commit


14 Nov, 2008

1 commit

  • Make execve() take advantage of copy-on-write credentials, allowing it to set
    up the credentials in advance, and then commit the whole lot after the point
    of no return.

    This patch and the preceding patches have been tested with the LTP SELinux
    testsuite.

    This patch makes several logical sets of alteration:

    (1) execve().

    The credential bits from struct linux_binprm are, for the most part,
    replaced with a single credentials pointer (bprm->cred). This means that
    all the creds can be calculated in advance and then applied at the point
    of no return with no possibility of failure.

    I would like to replace bprm->cap_effective with:

    cap_isclear(bprm->cap_effective)

    but this seems impossible due to special behaviour for processes of pid 1
    (they always retain their parent's capability masks where normally they'd
    be changed - see cap_bprm_set_creds()).

    The following sequence of events now happens:

    (a) At the start of do_execve, the current task's cred_exec_mutex is
    locked to prevent PTRACE_ATTACH from obsoleting the calculation of
    creds that we make.

    (a) prepare_exec_creds() is then called to make a copy of the current
    task's credentials and prepare it. This copy is then assigned to
    bprm->cred.

    This renders security_bprm_alloc() and security_bprm_free()
    unnecessary, and so they've been removed.

    (b) The determination of unsafe execution is now performed immediately
    after (a) rather than later on in the code. The result is stored in
    bprm->unsafe for future reference.

    (c) prepare_binprm() is called, possibly multiple times.

    (i) This applies the result of set[ug]id binaries to the new creds
    attached to bprm->cred. Personality bit clearance is recorded,
    but now deferred on the basis that the exec procedure may yet
    fail.

    (ii) This then calls the new security_bprm_set_creds(). This should
    calculate the new LSM and capability credentials into *bprm->cred.

    This folds together security_bprm_set() and parts of
    security_bprm_apply_creds() (these two have been removed).
    Anything that might fail must be done at this point.

    (iii) bprm->cred_prepared is set to 1.

    bprm->cred_prepared is 0 on the first pass of the security
    calculations, and 1 on all subsequent passes. This allows SELinux
    in (ii) to base its calculations only on the initial script and
    not on the interpreter.

    (d) flush_old_exec() is called to commit the task to execution. This
    performs the following steps with regard to credentials:

    (i) Clear pdeath_signal and set dumpable on certain circumstances that
    may not be covered by commit_creds().

    (ii) Clear any bits in current->personality that were deferred from
    (c.i).

    (e) install_exec_creds() [compute_creds() as was] is called to install the
    new credentials. This performs the following steps with regard to
    credentials:

    (i) Calls security_bprm_committing_creds() to apply any security
    requirements, such as flushing unauthorised files in SELinux, that
    must be done before the credentials are changed.

    This is made up of bits of security_bprm_apply_creds() and
    security_bprm_post_apply_creds(), both of which have been removed.
    This function is not allowed to fail; anything that might fail
    must have been done in (c.ii).

    (ii) Calls commit_creds() to apply the new credentials in a single
    assignment (more or less). Possibly pdeath_signal and dumpable
    should be part of struct creds.

    (iii) Unlocks the task's cred_replace_mutex, thus allowing
    PTRACE_ATTACH to take place.

    (iv) Clears The bprm->cred pointer as the credentials it was holding
    are now immutable.

    (v) Calls security_bprm_committed_creds() to apply any security
    alterations that must be done after the creds have been changed.
    SELinux uses this to flush signals and signal handlers.

    (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
    to destroy the proposed new credentials and will then unlock
    cred_replace_mutex. No changes to the credentials will have been
    made.

    (2) LSM interface.

    A number of functions have been changed, added or removed:

    (*) security_bprm_alloc(), ->bprm_alloc_security()
    (*) security_bprm_free(), ->bprm_free_security()

    Removed in favour of preparing new credentials and modifying those.

    (*) security_bprm_apply_creds(), ->bprm_apply_creds()
    (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

    Removed; split between security_bprm_set_creds(),
    security_bprm_committing_creds() and security_bprm_committed_creds().

    (*) security_bprm_set(), ->bprm_set_security()

    Removed; folded into security_bprm_set_creds().

    (*) security_bprm_set_creds(), ->bprm_set_creds()

    New. The new credentials in bprm->creds should be checked and set up
    as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the
    second and subsequent calls.

    (*) security_bprm_committing_creds(), ->bprm_committing_creds()
    (*) security_bprm_committed_creds(), ->bprm_committed_creds()

    New. Apply the security effects of the new credentials. This
    includes closing unauthorised files in SELinux. This function may not
    fail. When the former is called, the creds haven't yet been applied
    to the process; when the latter is called, they have.

    The former may access bprm->cred, the latter may not.

    (3) SELinux.

    SELinux has a number of changes, in addition to those to support the LSM
    interface changes mentioned above:

    (a) The bprm_security_struct struct has been removed in favour of using
    the credentials-under-construction approach.

    (c) flush_unauthorized_files() now takes a cred pointer and passes it on
    to inode_has_perm(), file_has_perm() and dentry_open().

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

27 Oct, 2008

1 commit

  • Some userland apps seem to pass in a "0" for the seconds, and several
    seconds worth of usecs to select(). The old kernels accepted this just
    fine, so the new kernels must too.

    However, due to the upscaling of the microseconds to nanoseconds we had
    some cases where we got math overflow, and depending on the GCC version
    (due to inlining decisions) that actually resulted in an -EINVAL return.

    This patch fixes this by adding the excess microseconds to the seconds
    field.

    Also with thanks to Marcin Slusarz for spotting some implementation bugs
    in the diagnostics patches.

    Reported-by: Carlos R. Mafra
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

24 Oct, 2008

1 commit

  • …inux/kernel/git/tip/linux-2.6-tip

    * 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
    hrtimers: add missing docbook comments to struct hrtimer
    hrtimers: simplify hrtimer_peek_ahead_timers()
    hrtimers: fix docbook comments
    DECLARE_PER_CPU needs linux/percpu.h
    hrtimers: fix typo
    rangetimers: fix the bug reported by Ingo for real
    rangetimer: fix BUG_ON reported by Ingo
    rangetimer: fix x86 build failure for the !HRTIMERS case
    select: fix alpha OSF wrapper
    select: fix alpha OSF wrapper
    hrtimer: peek at the timer queue just before going idle
    hrtimer: make the futex() system call use the per process slack value
    hrtimer: make the nanosleep() syscall use the per process slack
    hrtimer: fix signed/unsigned bug in slack estimator
    hrtimer: show the timer ranges in /proc/timer_list
    hrtimer: incorporate feedback from Peter Zijlstra
    hrtimer: add a hrtimer_start_range() function
    hrtimer: another build fix
    hrtimer: fix build bug found by Ingo
    hrtimer: make select() and poll() use the hrtimer range feature
    ...

    Linus Torvalds
     

23 Oct, 2008

1 commit


18 Oct, 2008

1 commit


17 Oct, 2008

2 commits

  • struct stat / compat_stat is the same on all architectures, so
    cp_compat_stat should be, too.

    Turns out it is, except that various architectures have slightly and some
    high2lowuid/high2lowgid or the direct assignment instead of the
    SET_UID/SET_GID that expands to the correct one anyway.

    This patch replaces the arch-specific cp_compat_stat implementations with
    a common one based on the x86-64 one.

    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller [ sparc bits ]
    Acked-by: Kyle McMartin [ parisc bits ]
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • With MAX_ARG_STRINGS set to 0x7FFFFFFF, and being passed to 'count()' and
    compat_count(), it would appear that the current max bounds check of
    fs/exec.c:394:

    if(++i > max)
    return -E2BIG;

    would never trigger. Since 'i' is of type int, so values would wrap and the
    function would continue looping.

    Simple fix seems to be chaning ++i to i++ and checking for '>='.

    Signed-off-by: Jason Baron
    Acked-by: Peter Zijlstra
    Cc: "Ollie Wild"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     

06 Sep, 2008

2 commits

  • With lots of help, input and cleanups from Thomas Gleixner

    This patch switches select() and poll() over to hrtimers.

    The core of the patch is replacing the "s64 timeout" with a
    "struct timespec end_time" in all the plumbing.

    But most of the diffstat comes from using the just introduced helpers:
    poll_select_set_timeout
    poll_select_copy_remaining
    timespec_add_safe
    which make manipulating the timespec easier and less error-prone.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     
  • This patch adds 2 helpers that will be used for the hrtimer based select/poll:

    poll_select_set_timeout() is a helper that takes a timeout (as a second, nanosecond
    pair) and turns that into a "struct timespec" that represents the absolute end time.
    This is a common operation in the many select() and poll() variants and needs various,
    common, sanity checks.

    poll_select_copy_remaining() is a helper that takes care of copying the remaining
    time to userspace, as select(), pselect() and ppoll() do. This function comes in
    both a natural and a compat implementation (due to datastructure differences).

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arjan van de Ven

    Thomas Gleixner
     

25 Aug, 2008

1 commit


27 Jul, 2008

1 commit

  • * do not pass nameidata; struct path is all the callers want.
    * switch to new helpers:
    user_path_at(dfd, pathname, flags, &path)
    user_path(pathname, &path)
    user_lpath(pathname, &path)
    user_path_dir(pathname, &path) (fail if not a directory)
    The last 3 are trivial macro wrappers for the first one.
    * remove nameidata in callers.

    Signed-off-by: Al Viro

    Al Viro
     

25 Jul, 2008

2 commits

  • This patch adds the new signalfd4 syscall. It extends the old signalfd
    syscall by one parameter which is meant to hold a flag value. In this
    patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
    flag for the returned file descriptor to be set.

    A new name SFD_CLOEXEC is introduced which in this implementation must
    have the same value as O_CLOEXEC.

    The following test must be adjusted for architectures other than x86 and
    x86-64 and in case the syscall numbers changed.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #include
    #include
    #include
    #include
    #include

    #ifndef __NR_signalfd4
    # ifdef __x86_64__
    # define __NR_signalfd4 289
    # elif defined __i386__
    # define __NR_signalfd4 327
    # else
    # error "need __NR_signalfd4"
    # endif
    #endif

    #define SFD_CLOEXEC O_CLOEXEC

    int
    main (void)
    {
    sigset_t ss;
    sigemptyset (&ss);
    sigaddset (&ss, SIGUSR1);
    int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
    if (fd == -1)
    {
    puts ("signalfd4(0) failed");
    return 1;
    }
    int coe = fcntl (fd, F_GETFD);
    if (coe == -1)
    {
    puts ("fcntl failed");
    return 1;
    }
    if (coe & FD_CLOEXEC)
    {
    puts ("signalfd4(0) set close-on-exec flag");
    return 1;
    }
    close (fd);

    fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
    if (fd == -1)
    {
    puts ("signalfd4(SFD_CLOEXEC) failed");
    return 1;
    }
    coe = fcntl (fd, F_GETFD);
    if (coe == -1)
    {
    puts ("fcntl failed");
    return 1;
    }
    if ((coe & FD_CLOEXEC) == 0)
    {
    puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
    return 1;
    }
    close (fd);

    puts ("OK");

    return 0;
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    [akpm@linux-foundation.org: add sys_ni stub]
    Signed-off-by: Ulrich Drepper
    Acked-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Adds a check for an overflow in the filesystem size so if someone is
    checking with statfs() on a 16G blocksize hugetlbfs in a 32bit binary that
    it will report back EOVERFLOW instead of a size of 0.

    Acked-by: Nishanth Aravamudan
    Signed-off-by: Jon Tollefson
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jon Tollefson
     

17 May, 2008

1 commit

  • Even though copy_compat_strings() doesn't cache the pages,
    copy_strings_kernel() and stuff indirectly called by e.g.
    ->load_binary() is doing that, so we need to drop the
    cache contents in the end.

    [found by WANG Cong ]

    Signed-off-by: Al Viro

    Al Viro
     

02 May, 2008

1 commit


30 Apr, 2008

2 commits

  • Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
    #ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
    set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This adds the set_restore_sigmask() inline in and
    replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
    change, but abstracts the details of the flag protocol from all the calls.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

16 Feb, 2008

1 commit


15 Feb, 2008

1 commit

  • * Add path_put() functions for releasing a reference to the dentry and
    vfsmount of a struct path in the right order

    * Switch from path_release(nd) to path_put(&nd->path)

    * Rename dput_path() to path_put_conditional()

    [akpm@linux-foundation.org: fix cifs]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc:
    Cc: Al Viro
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck