10 Mar, 2013

1 commit

  • Pull namespace bugfixes from Eric Biederman:
    "This is three simple fixes against 3.9-rc1. I have tested each of
    these fixes and verified they work correctly.

    The userns oops in key_change_session_keyring and the BUG_ON triggered
    by proc_ns_follow_link were found by Dave Jones.

    I am including the enhancement for mount to only trigger requests of
    filesystem modules here instead of delaying this for the 3.10 merge
    window because it is both trivial and the kind of change that tends to
    bit-rot if left untouched for two months."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: Use nd_jump_link in proc_ns_follow_link
    fs: Limit sys_mount to only request filesystem modules (Part 2).
    fs: Limit sys_mount to only request filesystem modules.
    userns: Stop oopsing in key_change_session_keyring

    Linus Torvalds
     

08 Mar, 2013

1 commit

  • Pull ecryptfs fixes from Tyler Hicks:
    "Minor code cleanups and new Kconfig option to disable /dev/ecryptfs

    The code cleanups fix up W=1 compiler warnings and some unnecessary
    checks. The new Kconfig option, defaulting to N, allows the rarely
    used eCryptfs kernel to userspace communication channel to be compiled
    out. This may be the first step in it being eventually removed."

    Hmm. I'm not sure whether these should be called "fixes", and it
    probably should have gone in the merge window. But I'll let it slide.

    * tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: allow userspace messaging to be disabled
    eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
    ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
    eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
    eCryptfs: remove unneeded checks in virt_to_scatterlist()
    eCryptfs: Fix -Wmissing-prototypes warnings
    eCryptfs: Fix -Wunused-but-set-variable warnings
    eCryptfs: initialize payload_len in keystore.c

    Linus Torvalds
     

04 Mar, 2013

2 commits

  • When the userspace messaging (for the less common case of userspace key
    wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with
    it removed. This saves on kernel code size and reduces potential attack
    surface by removing the /dev/ecryptfs node.

    Signed-off-by: Kees Cook
    Signed-off-by: Tyler Hicks

    Kees Cook
     
  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

28 Feb, 2013

2 commits

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • It is sufficient to check the return code of
    ecryptfs_find_daemon_by_euid(). If it returns 0, it always sets the
    daemon pointer to point to a valid ecryptfs_daemon.

    Signed-off-by: Tyler Hicks
    Reported-by: Kees Cook

    Tyler Hicks
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

2 commits


23 Feb, 2013

1 commit


13 Feb, 2013

2 commits


29 Jan, 2013

1 commit


18 Jan, 2013

3 commits


12 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Tyler Hicks
    CC: Dustin Kirkland
    Signed-off-by: Kees Cook
    Acked-by: Tyler Hicks

    Kees Cook
     

19 Dec, 2012

2 commits


08 Nov, 2012

1 commit

  • ecryptfs_write_begin grabs a page from page cache for writing.
    If the page contains invalid data, or data older than the
    counterpart on the disk, eCryptfs will read out the
    corresponing data from the disk into the page, decrypt them,
    then perform writing. However, for this page, if the length
    of the data to be written into is equal to page size,
    that means the whole page of data will be overwritten,
    in which case, it does not matter whatever the data were before,
    it is beneficial to perform writing directly rather than bothering
    to read and decrypt first.

    With this optimization, according to our test on a machine with
    Intel Core 2 Duo processor, iozone 'write' operation on an existing
    file with write size being multiple of page size will enjoy a steady
    3x speedup.

    Signed-off-by: Li Wang
    Signed-off-by: Yunchuan Wen
    Signed-off-by: Tyler Hicks

    Li Wang
     

03 Oct, 2012

3 commits

  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c spiltoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values are far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privlige checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user names to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/git logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

21 Sep, 2012

1 commit


15 Sep, 2012

3 commits

  • After calling into the lower filesystem to do a rename, the lower target
    inode's attributes were not copied up to the eCryptfs target inode. This
    resulted in the eCryptfs target inode staying around, rather than being
    evicted, because i_nlink was not updated for the eCryptfs inode. This
    also meant that eCryptfs didn't do the final iput() on the lower target
    inode so it stayed around, as well. This would result in a failure to
    free up space occupied by the target file in the rename() operation.
    Both target inodes would eventually be evicted when the eCryptfs
    filesystem was unmounted.

    This patch calls fsstack_copy_attr_all() after the lower filesystem
    does its ->rename() so that important inode attributes, such as i_nlink,
    are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
    and eCryptfs can drop its final reference on the lower inode.

    http://launchpad.net/bugs/561129

    Signed-off-by: Tyler Hicks
    Tested-by: Colin Ian King
    Cc: [2.6.39+]

    Tyler Hicks
     
  • Since eCryptfs only calls fput() on the lower file in
    ecryptfs_release(), eCryptfs should call the lower filesystem's
    ->flush() from ecryptfs_flush().

    If the lower filesystem implements ->flush(), then eCryptfs should try
    to flush out any dirty pages prior to calling the lower ->flush(). If
    the lower filesystem does not implement ->flush(), then eCryptfs has no
    need to do anything in ecryptfs_flush() since dirty pages are now
    written out to the lower filesystem in ecryptfs_release().

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Fixes a regression caused by:

    821f749 eCryptfs: Revert to a writethrough cache model

    That patch reverted some code (specifically, 32001d6f) that was
    necessary to properly handle open() -> mmap() -> close() -> dirty pages
    -> munmap(), because the lower file could be closed before the dirty
    pages are written out.

    Rather than reapplying 32001d6f, this approach is a better way of
    ensuring that the lower file is still open in order to handle writing
    out the dirty pages. It is called from ecryptfs_release(), while we have
    a lock on the lower file pointer, just before the lower file gets the
    final fput() and we overwrite the pointer.

    https://launchpad.net/bugs/1047261

    Signed-off-by: Tyler Hicks
    Reported-by: Artemy Tregubenko
    Tested-by: Artemy Tregubenko
    Tested-by: Colin Ian King

    Tyler Hicks
     

03 Aug, 2012

1 commit

  • Pull ecryptfs fixes from Tyler Hicks:
    - Fixes a bug when the lower filesystem mount options include 'acl',
    but the eCryptfs mount options do not
    - Cleanups in the messaging code
    - Better handling of empty files in the lower filesystem to improve
    usability. Failed file creations are now cleaned up and empty lower
    files are converted into eCryptfs during open().
    - The write-through cache changes are being reverted due to bugs that
    are not easy to fix. Stability outweighs the performance
    enhancements here.
    - Improvement to the mount code to catch unsupported ciphers specified
    in the mount options

    * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: check for eCryptfs cipher support at mount
    eCryptfs: Revert to a writethrough cache model
    eCryptfs: Initialize empty lower files when opening them
    eCryptfs: Unlink lower inode when ecryptfs_create() fails
    eCryptfs: Make all miscdev functions use daemon ptr in file private_data
    eCryptfs: Remove unused messaging declarations and function
    eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

    Linus Torvalds
     

30 Jul, 2012

2 commits


23 Jul, 2012

3 commits


14 Jul, 2012

7 commits

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • all we want is a boolean flag, same as the method gets now

    Signed-off-by: Al Viro

    Al Viro
     
  • boolean "does it have to be exclusive?" flag is passed instead;
    Local filesystem should just ignore it - the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the lookup flags. Die, bastard, die...

    Signed-off-by: Al Viro

    Al Viro
     
  • The issue occurs when eCryptfs is mounted with a cipher supported by
    the crypto subsystem but not by eCryptfs. The mount succeeds and an
    error does not occur until a write. This change checks for eCryptfs
    cipher support at mount time.

    Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009.
    https://bugs.launchpad.net/ecryptfs/+bug/338914

    Signed-off-by: Tim Sally
    Signed-off-by: Tyler Hicks

    Tim Sally
     
  • A change was made about a year ago to get eCryptfs to better utilize its
    page cache during writes. The idea was to do the page encryption
    operations during page writeback, rather than doing them when initially
    writing into the page cache, to reduce the number of page encryption
    operations during sequential writes. This meant that the encrypted page
    would only be written to the lower filesystem during page writeback,
    which was a change from how eCryptfs had previously wrote to the lower
    filesystem in ecryptfs_write_end().

    The change caused a few eCryptfs-internal bugs that were shook out.
    Unfortunately, more grave side effects have been identified that will
    force changes outside of eCryptfs. Because the lower filesystem isn't
    consulted until page writeback, eCryptfs has no way to pass lower write
    errors (ENOSPC, mainly) back to userspace. Additionaly, it was reported
    that quotas could be bypassed because of the way eCryptfs may sometimes
    open the lower filesystem using a privileged kthread.

    It would be nice to resolve the latest issues, but it is best if the
    eCryptfs commits be reverted to the old behavior in the meantime.

    This reverts:
    32001d6f "eCryptfs: Flush file in vma close"
    5be79de2 "eCryptfs: Flush dirty pages in setattr"
    57db4e8d "ecryptfs: modify write path to encrypt page in writepage"

    Signed-off-by: Tyler Hicks
    Tested-by: Colin King
    Cc: Colin King
    Cc: Thieu Le

    Tyler Hicks