09 Mar, 2013

1 commit


02 Mar, 2013

1 commit


26 Feb, 2013

1 commit

  • The following set of operations on an NFS client and server will cause
    an ESTALE error:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

    Obviously, we should not be getting an ESTALE error back there since the
    inode still exists on the server. The problem is that the lookup code
    will call d_revalidate on the dentry that "." refers to, because NFS has
    FS_REVAL_DOT set.

    nfs_lookup_revalidate will see that the parent directory has changed and
    will try to reverify the dentry by redoing a LOOKUP. That of course
    fails, so the lookup code returns ESTALE.

    The problem here is that d_revalidate is really a bad fit for this case.
    What we really want to know at this point is whether the inode is still
    good or not, but we don't really care what name it goes by or whether
    the dcache is still valid.

    Add a new d_op->d_weak_revalidate operation and have complete_walk call
    that instead of d_revalidate. The intent there is to allow for a
    "weaker" d_revalidate that just checks to see whether the inode is still
    good. This also gives us an opportunity to kill off the FS_REVAL_DOT
    special casing.

    [AV: changed method name, added note in porting, fixed confusion re
    having it possibly called from RCU mode (it won't be)]

    Cc: NeilBrown
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
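
The difference between the two checks described above can be sketched in userspace C. This is only a toy model, not kernel code: `inode_sim`, `dentry_sim`, `revalidate_sim` and `weak_revalidate_sim` are hypothetical names invented for illustration, and the "server" is reduced to a name string and a liveness flag. Both functions return 1 for "still valid" and 0 for "invalid".

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for an inode: does the server still have it? */
struct inode_sim {
    bool alive;
};

/* Toy stand-in for a dentry (hypothetical layout, not the kernel's). */
struct dentry_sim {
    const char *cached_name;   /* name the client's dcache remembers */
    const char *server_name;   /* name the directory has on the server now */
    struct inode_sim *inode;
};

/* Strong check: the name must still resolve, as if redoing a LOOKUP.
 * After "mv a a.bak" on the server this fails even though the inode lives. */
int revalidate_sim(const struct dentry_sim *d)
{
    return d->inode->alive && strcmp(d->cached_name, d->server_name) == 0;
}

/* Weak check: only ask whether the inode itself is still good,
 * ignoring what name it currently goes by. */
int weak_revalidate_sim(const struct dentry_sim *d)
{
    return d->inode->alive;
}
```

With `cached_name = "a"`, `server_name = "a.bak"` and a live inode, the strong check rejects the dentry while the weak check accepts it, which is the behaviour the commit wants for ".".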
     

23 Feb, 2013

6 commits


21 Dec, 2012

12 commits


30 Nov, 2012

1 commit


27 Oct, 2012

1 commit

  • In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to
    the Linux VFS"), the new link protections were enabled by default, in
    the hope that no actual application would care, despite it being
    technically against legacy UNIX (and documented POSIX) behavior.

    However, it does turn out to break some applications. It's rare, and
    it's unfortunate, but it's unacceptable to break existing systems, so
    we'll have to default to legacy behavior.

    In particular, it has broken the way AFD distributes files, see

    http://www.dwd.de/AFD/

    along with some legacy scripts.

    Distributions can end up setting this at initrd time or in system
    scripts: if you have security problems due to link attacks during your
    early boot sequence, you have bigger problems than some kernel sysctl
    setting. Do:

    echo 1 > /proc/sys/fs/protected_symlinks
    echo 1 > /proc/sys/fs/protected_hardlinks

    to re-enable the link protections.

    Alternatively, we may at some point introduce a kernel config option
    that sets these kinds of "more secure but not traditional" behavioural
    options automatically.

    Reported-by: Nick Bowler
    Reported-by: Holger Kiehl
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Alan Cox
    Cc: Theodore Ts'o
    Cc: stable@kernel.org # v3.6
    Signed-off-by: Linus Torvalds

    Linus Torvalds
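
To check which default a running system ended up with, the sysctl files named above can be read back. The helper below is a minimal sketch in C; `read_int_file` is a hypothetical name, and it assumes the file holds a single decimal integer, as these two sysctls do.

```c
#include <stdio.h>

/* Read one decimal integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass "/proc/sys/fs/protected_symlinks" or "/proc/sys/fs/protected_hardlinks" as the path; a value of 1 means the protection is enabled.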
     

13 Oct, 2012

6 commits

  • In the common case where a name is much smaller than PATH_MAX, an extra
    allocation for struct filename is unnecessary. Before allocating a
    separate one, try to embed the struct filename inside the buffer first. If
    it turns out that that's not long enough, then fall back to allocating a
    separate struct filename and redoing the copy.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Keep a pointer to the audit_names "slot" in struct filename.

    Have all of the audit_inode callers pass a struct filename pointer to
    audit_inode instead of a string pointer. If the aname field is already
    populated, then we can skip walking the list altogether and just use it
    directly.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • ...and fix up the callers. For do_file_open_root, just declare a
    struct filename on the stack and fill out the .name field. For
    do_filp_open, make it also take a struct filename pointer, and fix up its
    callers to call it appropriately.

    For filp_open, add a variant that takes a struct filename pointer and turn
    filp_open into a wrapper around it.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • ...and make the user_path callers use that variant instead.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Currently, if we call getname() on a userland string more than once,
    we'll get multiple copies of the string and multiple audit_names
    records.

    Add a function that will allow the audit_names code to satisfy getname
    requests using info from the audit_names list, avoiding a new allocation
    and audit_names records.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • getname() is intended to copy pathname strings from userspace into a
    kernel buffer. The result is just a string in kernel space. It would
    however be quite helpful to be able to attach some ancillary info to
    the string.

    For instance, we could attach some audit-related info to reduce the
    amount of audit-related processing needed. When auditing is enabled,
    we could also call getname() on the string more than once and not
    need to recopy it from userspace.

    This patchset converts the getname()/putname() interfaces to return
    a struct instead of a string. For now, the struct just tracks the
    string in kernel space and the original userland pointer for it.

    Later, we'll add other information to the struct as it becomes
    convenient.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
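
The struct-plus-string idea from this series, including the embedded fast path, can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: `filename_sim`, `getname_sim`, `putname_sim` and the tiny `SIM_PATH_MAX` are hypothetical, and `malloc` stands in for the kernel allocators. Unlike the commit, which copies first and redoes the copy on overflow, this toy knows the length up front and picks the layout directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PATH_MAX 64   /* hypothetical limit, kept tiny on purpose */

struct filename_sim {
    const char *name;     /* copy of the string in "kernel" memory */
    const char *uptr;     /* original "userland" pointer */
    bool separate;        /* true if the struct was allocated on its own */
};

/* Room left for the string when the struct sits at the front of the buffer. */
#define SIM_EMBEDDED_MAX (SIM_PATH_MAX - sizeof(struct filename_sim))

struct filename_sim *getname_sim(const char *user)
{
    size_t len = strlen(user);
    struct filename_sim *fn;
    char *buf;

    if (len >= SIM_PATH_MAX)
        return NULL;                      /* name too long */

    buf = malloc(SIM_PATH_MAX);
    if (!buf)
        return NULL;

    if (len < SIM_EMBEDDED_MAX) {
        /* Common case: one allocation, struct first, string right after. */
        fn = (struct filename_sim *)buf;
        memcpy(buf + sizeof(*fn), user, len + 1);
        fn->name = buf + sizeof(*fn);
        fn->separate = false;
    } else {
        /* Fallback: string owns the whole buffer, struct allocated apart. */
        fn = malloc(sizeof(*fn));
        if (!fn) {
            free(buf);
            return NULL;
        }
        memcpy(buf, user, len + 1);
        fn->name = buf;
        fn->separate = true;
    }
    fn->uptr = user;
    return fn;
}

void putname_sim(struct filename_sim *fn)
{
    if (!fn)
        return;
    if (fn->separate)
        free((void *)fn->name);           /* the string buffer */
    free(fn);                             /* the struct (or whole buffer) */
}
```

Short names cost a single allocation; only names that do not fit next to the struct pay for a second one.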
     

12 Oct, 2012

6 commits

  • I see no callers in module code.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • In order to accommodate retrying path-based syscalls, we need to add a
    new "type" argument to audit_inode_child. This will tell us whether
    we're looking for a child entry that represents a create or a delete.

    If we find a parent, don't automatically assume that we need to create a
    new entry. Instead, use the information we have to try to find an
    existing entry first. Update it if one is found and create a new one if
    not.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Currently, this gets set mostly by happenstance when we call into
    audit_inode_child. While that might be a little more efficient, it seems
    wrong. If the syscall ends up failing before audit_inode_child ever gets
    called, then you'll have an audit_names record that shows the full path
    but has the parent inode info attached.

    Fix this by passing in a parent flag when we call audit_inode that gets
    set to the value of LOOKUP_PARENT. We can then fix up the pathname for
    the audit entry correctly from the get-go.

    While we're at it, clean up the no-op macro for audit_inode in the
    !CONFIG_AUDITSYSCALL case.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Most of the callers get called with an inode and dentry in the reverse
    order. The compiler then has to reshuffle the arg registers and/or
    stack in order to pass them on to audit_inode_child.

    Reverse those arguments for a micro-optimization.

    Reported-by: Eric Paris
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • As best I can tell, whenever retval == 0, nd->path.dentry and nd->inode
    are also non-NULL. Eliminate those checks and the superfluous
    audit_context check.

    Signed-off-by: Eric Paris
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • The follow_link() function always initializes its *p argument,
    or returns an error, but when building with 'gcc -Os', the compiler
    gets confused by the __always_inline attribute to the function
    and can no longer detect where the cookie was initialized.

    The solution is to always initialize the pointer from follow_link,
    even in the error path. When building with -O2, this has zero impact
    on generated code and adds a single instruction in the error path
    for a -Os build on ARM.

    Without this patch, building with gcc-4.6 through gcc-4.8 and
    CONFIG_CC_OPTIMIZE_FOR_SIZE results in:

    fs/namei.c: In function 'link_path_walk':
    fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
    fs/namei.c:1544:9: note: 'cookie' was declared here
    fs/namei.c: In function 'path_lookupat':
    fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
    fs/namei.c:1934:10: note: 'cookie' was declared here
    fs/namei.c: In function 'path_openat':
    fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
    fs/namei.c:2899:9: note: 'cookie' was declared here

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Arnd Bergmann
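
The pattern behind the follow_link() fix is to write the output pointer on every path, including the error path, so the compiler can always prove the caller's variable is initialized. A minimal sketch, with `follow_link_sim` and its `fail` parameter being hypothetical stand-ins rather than the real function:

```c
#include <errno.h>
#include <stddef.h>

/* Toy version of a function that hands back a cookie through *p.
 * The point is the assignment on the error path: without it, a caller
 * that later touches its cookie variable may trigger a
 * "may be used uninitialized" warning in size-optimized builds. */
int follow_link_sim(int fail, void **p)
{
    static char cookie_storage[] = "cookie";

    if (fail) {
        *p = NULL;            /* always initialize, even when failing */
        return -ELOOP;
    }
    *p = cookie_storage;
    return 0;
}
```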
     

10 Oct, 2012

1 commit


03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    "- big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting type-incompatibility compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user namespace to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
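
The kuid_t/kgid_t strategy described in the user namespace pull relies on wrapping raw ids in distinct struct types, so that mixing a uid with a gid or with a plain integer becomes a compile error, and on converting only at the boundaries. The following is a simplified userspace sketch of that idea, not the kernel implementation: the `_sim` names and the single-extent mapping in `userns_sim` are assumptions made for illustration.

```c
#include <limits.h>

#define INVALID_ID_SIM UINT_MAX   /* marker for "no mapping" */

/* Distinct wrapper types: a kgid_sim_t cannot be passed where a
 * kuid_sim_t is expected without the compiler complaining. */
typedef struct { unsigned int val; } kuid_sim_t;
typedef struct { unsigned int val; } kgid_sim_t;

/* Hypothetical namespace with one mapping extent:
 * ids [first, first + count) inside map to [lower_first, ...) outside. */
struct userns_sim {
    unsigned int first;
    unsigned int lower_first;
    unsigned int count;
};

/* Boundary conversion: namespace-local uid -> global wrapped id. */
kuid_sim_t make_kuid_sim(const struct userns_sim *ns, unsigned int uid)
{
    kuid_sim_t k = { INVALID_ID_SIM };

    if (uid >= ns->first && uid - ns->first < ns->count)
        k.val = ns->lower_first + (uid - ns->first);
    return k;
}

/* Boundary conversion: global wrapped id -> namespace-local uid. */
unsigned int from_kuid_sim(const struct userns_sim *ns, kuid_sim_t k)
{
    if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
        return ns->first + (k.val - ns->lower_first);
    return INVALID_ID_SIM;
}
```

Code between the two conversions only ever sees the wrapped type, which is what lets type errors point at the places that still assume a single global id space.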
     

27 Sep, 2012

2 commits