15 Aug, 2012

1 commit

  • Userspace can pass weird create mode in open(2) that we canonicalize to
    "(mode & S_IALLUGO) | S_IFREG" in vfs_create().

    The problem is that we use the uncanonicalized mode before calling vfs_create()
    with unforseen consequences.

    So do the canonicalization early in build_open_flags().

    Signed-off-by: Miklos Szeredi
    Tested-by: Richard W.M. Jones
    CC: stable@vger.kernel.org

    Miklos Szeredi
     

04 Aug, 2012

1 commit


31 Jul, 2012

2 commits

  • There are several entry points which dirty pages in a filesystem. mmap
    (handled by block_page_mkwrite()), buffered write (handled by
    __generic_file_aio_write()), splice write (generic_file_splice_write),
    truncate, and fallocate (these can dirty last partial page - handled inside
    each filesystem separately). Protect these places with sb_start_write() and
    sb_end_write().

    ->page_mkwrite() calls are particularly complex since they are called with
    mmap_sem held and thus we cannot use standard sb_start_write() due to lock
    ordering constraints. We solve the problem by using a special freeze protection
    sb_start_pagefault() which ranks below mmap_sem.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Most of places where we want freeze protection coincides with the places where
    we also have remount-ro protection. So make mnt_want_write() and
    mnt_drop_write() (and their _file alternative) prevent freezing as well.
    For the few cases that are really interested only in remount-ro protection
    provide new function variants.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

30 Jul, 2012

1 commit


23 Jul, 2012

1 commit


14 Jul, 2012

12 commits


08 Jul, 2012

1 commit

  • We already use them for openat() and friends, but fchdir() also wants to
    be able to use O_PATH file descriptors. This should make it comparable
    to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
    (not-quite-open) a directory you don't have read persmission to, only
    execute permission.

    Noticed during development of multithread support for ksh93.

    Reported-by: ольга крыжановская
    Cc: Al Viro
    Cc: stable@kernel.org # O_PATH introduced in 3.0+
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

02 Jun, 2012

4 commits


24 May, 2012

1 commit

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures the if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes performance or
    operationally with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these value specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
     

03 May, 2012

2 commits


10 Apr, 2012

1 commit


20 Feb, 2012

1 commit

  • Wrap accesses to the fd_sets in struct fdtable (for recording open files and
    close-on-exec flags) so that we can move away from using fd_sets since we
    abuse the fd_set structs by not allocating the full-sized structure under
    normal circumstances and by non-core code looking at the internals of the
    fd_sets.

    The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
    since that cannot be told about their abnormal lengths.

    This introduces six wrapper functions for setting, clearing and testing
    close-on-exec flags and fd-is-open flags:

    void __set_close_on_exec(int fd, struct fdtable *fdt);
    void __clear_close_on_exec(int fd, struct fdtable *fdt);
    bool close_on_exec(int fd, const struct fdtable *fdt);
    void __set_open_fd(int fd, struct fdtable *fdt);
    void __clear_open_fd(int fd, struct fdtable *fdt);
    bool fd_is_open(int fd, const struct fdtable *fdt);

    Note that I've prepended '__' to the names of the set/clear functions because
    they require the caller to hold a lock to use them.

    Note also that I haven't added wrappers for looking behind the scenes at the
    the array. Possibly that should exist too.

    Signed-off-by: David Howells
    Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
    Signed-off-by: H. Peter Anvin
    Cc: Al Viro

    David Howells
     

07 Jan, 2012

1 commit


04 Jan, 2012

3 commits


28 Oct, 2011

1 commit

  • In setlease, we use i_writecount to decide whether we can give out a
    read lease.

    In open, we break leases before incrementing i_writecount.

    There is therefore a window between the break lease and the i_writecount
    increment when setlease could add a new read lease.

    This would leave us with a simultaneous write open and read lease, which
    shouldn't happen.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Christoph Hellwig

    J. Bruce Fields
     

27 Jul, 2011

1 commit

  • The kludge in question is undocumented and doesn't work for 32bit
    binaries on amd64, sparc64 and s390. Passing (mode_t)-1 as
    mode had (since 0.99.14v and contrary to behaviour of any
    other Unix, prescriptions of POSIX, SuS and our own manpages)
    was kinda-sorta no-op. Note that any software relying on
    that (and looking for examples shows none) would be visibly
    broken on sparc64, where practically all userland is built
    32bit. No such complaints noticed...

    Signed-off-by: Al Viro

    Al Viro
     

23 Jul, 2011

1 commit


21 Mar, 2011

1 commit


17 Mar, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

16 Mar, 2011

1 commit


15 Mar, 2011

2 commits

  • For readlinkat() we simply allow empty pathname; it will fail unless
    we have dfd equal to O_PATH-opened symlink, so we are outside of
    POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH;
    let the caller explicitly ask for such behaviour.

    Signed-off-by: Al Viro

    Al Viro
     
  • New flag for open(2) - O_PATH. Semantics:
    * pathname is resolved, but the file itself is _NOT_ opened
    as far as filesystem is concerned.
    * almost all operations on the resulting descriptors shall
    fail with -EBADF. Exceptions are:
    1) operations on descriptors themselves (i.e.
    close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
    fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
    fcntl(fd, F_SETFD, ...))
    2) fcntl(fd, F_GETFL), for a common non-destructive way to
    check if descriptor is open
    3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
    points of pathname resolution
    * closing such descriptor does *NOT* affect dnotify or
    posix locks.
    * permissions are checked as usual along the way to file;
    no permission checks are applied to the file itself. Of course,
    giving such thing to syscall will result in permission checks (at
    the moment it means checking that starting point of ....at() is
    a directory and caller has exec permissions on it).

    fget() and fget_light() return NULL on such descriptors; use of
    fget_raw() and fget_raw_light() is needed to get them. That protects
    existing code from dealing with those things.

    There are two things still missing (they come in the next commits):
    one is handling of symlinks (right now we refuse to open them that
    way; see the next commit for semantics related to those) and another
    is descriptor passing via SCM_RIGHTS datagrams.

    Signed-off-by: Al Viro

    Al Viro