22 Aug, 2012

1 commit

  • Fix kernel-doc warnings in fs/namei.c:

    Warning(fs/namei.c:360): No description found for parameter 'inode'
    Warning(fs/namei.c:672): No description found for parameter 'nd'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Al Viro

    Randy Dunlap
     

17 Aug, 2012

1 commit

  • If ->atomic_open() returns -ENOENT, we take care to return the create
    error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1
    and provides a negative dentry.

    This fixes a regression where an unprivileged open O_CREAT fails with
    ENOENT instead of EACCES, introduced with the new atomic_open code. It
    is tested by the open/08.t test in the pjd posix test suite, and was
    observed on top of fuse (backed by ceph-fuse).

    Signed-off-by: Sage Weil
    Signed-off-by: Miklos Szeredi

    Sage Weil
     

15 Aug, 2012

2 commits


31 Jul, 2012

2 commits

  • Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
    without it. This isn't really a problem because mnt_want_write() is a
    non-blocking operation (essentially has a trylock semantics) but when the
    function starts to handle also frozen filesystems, it will get a full lock
    semantics and thus proper lock ordering has to be established. So move
    all mnt_want_write() calls outside of i_mutex.

    One non-trivial case needing conversion is kern_path_create() /
    user_path_create() which didn't include mnt_want_write() but now needs to
    because it acquires i_mutex. Because there are virtual file systems which
    don't bother with freeze / remount-ro protection we actually provide both
    versions of the function - one which calls mnt_want_write() and one which does
    not.

    [AV: scratch the previous, mnt_want_write() has been moved to kern_path_create()
    by now]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • The write ref to vfsmount taken in lookup_open()/atomic_open() is going to
    be dropped; we take the one to stay in dentry_open(). Just grab the temporary
    in caller if it looks like we are going to need it (create/truncate/writable open)
    and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write()
    inside, check that flag and treat "false" as "mnt_want_write() has just failed".
    mnt_want_write() is cheap and the things get considerably simpler and more robust
    that way - we get it and drop it in the same function, to start with, rather
    than passing a "has something in the guts of really scary functions taken it"
    back to caller.

    Signed-off-by: Al Viro

    Al Viro
     

30 Jul, 2012

7 commits

  • O_EXCL without O_CREAT has different semantics; it's "fail if already opened",
    not "fail if already exists". commit 71574865 broke that...

    Signed-off-by: Al Viro

    Al Viro
     
  • Adds audit messages for unexpected link restriction violations so that
    system owners will have some sort of potentially actionable information
    about misbehaving processes.

    Signed-off-by: Kees Cook
    Signed-off-by: Al Viro

    Kees Cook
     
  • This adds symlink and hardlink restrictions to the Linux VFS.

    Symlinks:

    A long-standing class of security issues is the symlink-based
    time-of-check-time-of-use race, most commonly seen in world-writable
    directories like /tmp. The common method of exploitation of this flaw
    is to cross privilege boundaries when following a given symlink (i.e. a
    root process follows a symlink belonging to another user). For a likely
    incomplete list of hundreds of examples across the years, please see:
    http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

    The solution is to permit symlinks to only be followed when outside
    a sticky world-writable directory, or when the uid of the symlink and
    follower match, or when the directory owner matches the symlink's owner.

    Some pointers to the history of earlier discussion that I could find:

    1996 Aug, Zygo Blaxell
    http://marc.info/?l=bugtraq&m=87602167419830&w=2
    1996 Oct, Andrew Tridgell
    http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
    1997 Dec, Albert D Cahalan
    http://lkml.org/lkml/1997/12/16/4
    2005 Feb, Lorenzo Hernández García-Hierro
    http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
    2010 May, Kees Cook
    https://lkml.org/lkml/2010/5/30/144

    Past objections and rebuttals could be summarized as:

    - Violates POSIX.
    - POSIX didn't consider this situation and it's not useful to follow
    a broken specification at the cost of security.
    - Might break unknown applications that use this feature.
    - Applications that break because of the change are easy to spot and
    fix. Applications that are vulnerable to symlink ToCToU by not having
    the change aren't. Additionally, no applications have yet been found
    that rely on this behavior.
    - Applications should just use mkstemp() or O_CREATE|O_EXCL.
    - True, but applications are not perfect, and new software is written
    all the time that makes these mistakes; blocking this flaw at the
    kernel is a single solution to the entire class of vulnerability.
    - This should live in the core VFS.
    - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
    - This should live in an LSM.
    - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

    Hardlinks:

    On systems that have user-writable directories on the same partition
    as system files, a long-standing class of security issues is the
    hardlink-based time-of-check-time-of-use race, most commonly seen in
    world-writable directories like /tmp. The common method of exploitation
    of this flaw is to cross privilege boundaries when following a given
    hardlink (i.e. a root process follows a hardlink created by another
    user). Additionally, an issue exists where users can "pin" a potentially
    vulnerable setuid/setgid file so that an administrator will not actually
    upgrade a system fully.

    The solution is to permit hardlinks to only be created when the user is
    already the existing file's owner, or if they already have read/write
    access to the existing file.

    Many Linux users are surprised when they learn they can link to files
    they have no access to, so this change appears to follow the doctrine
    of "least surprise". Additionally, this change does not violate POSIX,
    which states "the implementation may require that the calling process
    has permission to access the existing file"[1].

    This change is known to break some implementations of the "at" daemon,
    though the version used by Fedora and Ubuntu has been fixed[2] for
    a while. Otherwise, the change has been undisruptive while in use in
    Ubuntu for the last 1.5 years.

    [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
    [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

    This patch is based on the patches in Openwall and grsecurity, along with
    suggestions from Al Viro. I have added a sysctl to enable the protected
    behavior, and documentation.

    Signed-off-by: Kees Cook
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kees Cook
     
  • I can reliably reproduce the following panic by simply setting an audit
    rule on a recent 3.5.0+ kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    IP: [] audit_copy_inode+0x10/0x90
    PGD 7acd9067 PUD 7b8fb067 PMD 0
    Oops: 0000 [#86] SMP
    Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
    CPU 0
    Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs
    RIP: 0010:[] [] audit_copy_inode+0x10/0x90
    RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
    RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
    RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
    FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
    Stack:
    ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
    ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
    ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
    Call Trace:
    [] ? ext4_lookup+0xe8/0x160
    [] __audit_inode+0x118/0x2d0
    [] do_last+0x999/0xe80
    [] ? inode_permission+0x18/0x50
    [] ? kmem_cache_alloc_trace+0x11a/0x130
    [] path_openat+0xba/0x420
    [] do_filp_open+0x41/0xa0
    [] ? alloc_fd+0x4d/0x120
    [] do_sys_open+0xed/0x1c0
    [] ? __audit_syscall_entry+0xcc/0x300
    [] sys_open+0x21/0x30
    [] system_call_fastpath+0x16/0x1b
    RSP
    CR2: 0000000000000040

    The problem is that do_last is passing a negative dentry to audit_inode.
    The comments on lookup_open note that it can pass back a negative dentry
    if O_CREAT is not set.

    This patch fixes the oops, but I'm not clear on whether there's a better
    approach.

    Cc: Miklos Szeredi
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • One side effect - attempt to create a cross-device link on a read-only fs fails
    with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.

    Signed-off-by: Al Viro

    Al Viro
     
  • Note that applying umask can't affect their results. While
    that affects errno in cases like
    mknod("/no_such_directory/a", 030000)
    yielding -EINVAL (due to impossible mode_t) instead of
    -ENOENT (due to inexistent directory), IMO that makes a lot
    more sense, POSIX allows to return either and any software
    that relies on getting -ENOENT instead of -EINVAL in that
    case deserves everything it gets.

    Signed-off-by: Al Viro

    Al Viro
     
  • releases what needs to be released after {kern,user}_path_create()

    Signed-off-by: Al Viro

    Al Viro
     

23 Jul, 2012

3 commits


14 Jul, 2012

24 commits