19 Dec, 2012

2 commits


08 Nov, 2012

1 commit

  • ecryptfs_write_begin grabs a page from the page cache for writing.
    If the page contains invalid data, or data older than its
    counterpart on disk, eCryptfs reads the corresponding data
    from disk into the page, decrypts it, and then performs the
    write. However, if the length of the data to be written
    equals the page size, the whole page will be overwritten.
    In that case the previous contents of the page do not matter,
    and it is beneficial to write directly rather than bothering
    to read and decrypt first.

    With this optimization, according to our test on a machine with an
    Intel Core 2 Duo processor, the iozone 'write' operation on an existing
    file, with the write size a multiple of the page size, shows a steady
    3x speedup.

    Signed-off-by: Li Wang
    Signed-off-by: Yunchuan Wen
    Signed-off-by: Tyler Hicks

    Li Wang
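
The full-page shortcut described in the ecryptfs_write_begin commit above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the helper name, the `page_uptodate` parameter, and the fixed 4096-byte page size are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch; the kernel uses its own constant. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: decide whether the page must be filled from the
 * lower file (read + decrypt) before new data is copied into it.
 */
static bool sketch_needs_read_before_write(bool page_uptodate, size_t len)
{
    /* The cached page already holds current data: nothing to read. */
    if (page_uptodate)
        return false;

    /* The write covers the whole page: old contents are irrelevant. */
    if (len == SKETCH_PAGE_SIZE)
        return false;

    /* Partial write into a stale page: read and decrypt first. */
    return true;
}
```

Only the partial-write case pays for the read and decrypt, which is consistent with the speedup being reported for write sizes that are a multiple of the page size.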
     

03 Oct, 2012

3 commits

  • Pull vfs update from Al Viro:

    "- big one - consolidation of descriptor-related logic; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting type-incompatibility compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user namespace to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
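
The user namespace pull above relies on kuid_t and kgid_t being distinct types, so that compile errors point at every place a raw id is used. A rough userspace sketch of that idea follows. Every name here is hypothetical, and the single-range mapping is a simplification chosen for the example; it is not the kernel's make_kuid()/from_kuid() implementation.

```c
#include <stdbool.h>

/* Distinct wrapper types: passing a raw integer, or a gid where a uid
 * is expected, fails to compile. */
typedef struct { unsigned int val; } sketch_kuid_t;
typedef struct { unsigned int val; } sketch_kgid_t;

#define SKETCH_INVALID_ID 0xffffffffu

/* Hypothetical namespace with one contiguous id range (an assumption). */
struct sketch_user_ns {
    unsigned int first;       /* first id as seen inside the namespace */
    unsigned int lower_first; /* matching kernel-internal id */
    unsigned int count;       /* number of ids in the range */
};

/* Convert at the edge: namespace-relative uid -> kernel-internal kuid. */
static sketch_kuid_t sketch_make_kuid(const struct sketch_user_ns *ns,
                                      unsigned int uid)
{
    sketch_kuid_t k = { SKETCH_INVALID_ID };

    if (uid >= ns->first && uid - ns->first < ns->count)
        k.val = ns->lower_first + (uid - ns->first);
    return k;
}

/* Convert back when a value leaves for userspace, disk, or the network. */
static unsigned int sketch_from_kuid(const struct sketch_user_ns *ns,
                                     sketch_kuid_t k)
{
    if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
        return ns->first + (k.val - ns->lower_first);
    return SKETCH_INVALID_ID;
}

/* Comparisons go through a helper because structs cannot use ==. */
static bool sketch_uid_eq(sketch_kuid_t a, sketch_kuid_t b)
{
    return a.val == b.val;
}
```

Wrapping the integer in a struct is what turns accidental mixing of id kinds into a compile error instead of a silent bug.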
     

21 Sep, 2012

1 commit


15 Sep, 2012

3 commits

  • After calling into the lower filesystem to do a rename, the lower target
    inode's attributes were not copied up to the eCryptfs target inode. This
    resulted in the eCryptfs target inode staying around, rather than being
    evicted, because i_nlink was not updated for the eCryptfs inode. This
    also meant that eCryptfs didn't do the final iput() on the lower target
    inode so it stayed around, as well. This would result in a failure to
    free up space occupied by the target file in the rename() operation.
    Both target inodes would eventually be evicted when the eCryptfs
    filesystem was unmounted.

    This patch calls fsstack_copy_attr_all() after the lower filesystem
    does its ->rename() so that important inode attributes, such as i_nlink,
    are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
    and eCryptfs can drop its final reference on the lower inode.

    http://launchpad.net/bugs/561129

    Signed-off-by: Tyler Hicks
    Tested-by: Colin Ian King
    Cc: [2.6.39+]

    Tyler Hicks
     
  • Since eCryptfs only calls fput() on the lower file in
    ecryptfs_release(), eCryptfs should call the lower filesystem's
    ->flush() from ecryptfs_flush().

    If the lower filesystem implements ->flush(), then eCryptfs should try
    to flush out any dirty pages prior to calling the lower ->flush(). If
    the lower filesystem does not implement ->flush(), then eCryptfs has no
    need to do anything in ecryptfs_flush() since dirty pages are now
    written out to the lower filesystem in ecryptfs_release().

    Signed-off-by: Tyler Hicks

    Tyler Hicks
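
The ecryptfs_flush() behaviour described in the commit above can be sketched as follows. The struct layouts and every `sketch_` name are hypothetical stand-ins for the kernel's file and file_operations types, and the dirty-page counter merely stands in for writing pages out to the lower file.

```c
#include <stddef.h>

struct sketch_file;

/* Hypothetical, pared-down operations table containing only ->flush(). */
struct sketch_file_ops {
    int (*flush)(struct sketch_file *file);
};

struct sketch_file {
    const struct sketch_file_ops *ops;
    int flush_calls;   /* how many times ->flush() ran */
};

struct sketch_ecryptfs_file {
    struct sketch_file *lower;
    int dirty_pages;   /* stand-in for dirty pages held by eCryptfs */
};

/* Example lower ->flush() that only records that it was called. */
static int sketch_counting_flush(struct sketch_file *file)
{
    file->flush_calls++;
    return 0;
}

static const struct sketch_file_ops sketch_ops_with_flush = { sketch_counting_flush };
static const struct sketch_file_ops sketch_ops_without_flush = { NULL };

static int sketch_ecryptfs_flush(struct sketch_ecryptfs_file *file)
{
    struct sketch_file *lower = file->lower;

    if (lower->ops && lower->ops->flush) {
        /* Push dirty pages down first, then let the lower fs flush. */
        file->dirty_pages = 0;
        return lower->ops->flush(lower);
    }

    /* No lower ->flush(): nothing to do here; release writes pages out. */
    return 0;
}
```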
     
  • Fixes a regression caused by:

    821f749 eCryptfs: Revert to a writethrough cache model

    That patch reverted some code (specifically, 32001d6f) that was
    necessary to properly handle open() -> mmap() -> close() -> dirty pages
    -> munmap(), because the lower file could be closed before the dirty
    pages are written out.

    Rather than reapplying 32001d6f, this approach is a better way of
    ensuring that the lower file is still open in order to handle writing
    out the dirty pages. It is called from ecryptfs_release(), while we have
    a lock on the lower file pointer, just before the lower file gets the
    final fput() and we overwrite the pointer.

    https://launchpad.net/bugs/1047261

    Signed-off-by: Tyler Hicks
    Reported-by: Artemy Tregubenko
    Tested-by: Artemy Tregubenko
    Tested-by: Colin Ian King

    Tyler Hicks
     

03 Aug, 2012

1 commit

  • Pull ecryptfs fixes from Tyler Hicks:
    - Fixes a bug when the lower filesystem mount options include 'acl',
    but the eCryptfs mount options do not
    - Cleanups in the messaging code
    - Better handling of empty files in the lower filesystem to improve
    usability. Failed file creations are now cleaned up and empty lower
    files are converted into eCryptfs during open().
    - The write-through cache changes are being reverted due to bugs that
    are not easy to fix. Stability outweighs the performance
    enhancements here.
    - Improvement to the mount code to catch unsupported ciphers specified
    in the mount options

    * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: check for eCryptfs cipher support at mount
    eCryptfs: Revert to a writethrough cache model
    eCryptfs: Initialize empty lower files when opening them
    eCryptfs: Unlink lower inode when ecryptfs_create() fails
    eCryptfs: Make all miscdev functions use daemon ptr in file private_data
    eCryptfs: Remove unused messaging declarations and function
    eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

    Linus Torvalds
     

30 Jul, 2012

2 commits


23 Jul, 2012

3 commits


14 Jul, 2012

7 commits

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • all we want is a boolean flag, same as the method gets now

    Signed-off-by: Al Viro

    Al Viro
     
  • boolean "does it have to be exclusive?" flag is passed instead;
    Local filesystems should just ignore it - the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the lookup flags. Die, bastard, die...

    Signed-off-by: Al Viro

    Al Viro
     
  • The issue occurs when eCryptfs is mounted with a cipher supported by
    the crypto subsystem but not by eCryptfs. The mount succeeds and an
    error does not occur until a write. This change checks for eCryptfs
    cipher support at mount time.

    Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009.
    https://bugs.launchpad.net/ecryptfs/+bug/338914

    Signed-off-by: Tim Sally
    Signed-off-by: Tyler Hicks

    Tim Sally
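
A mount-time cipher check of the kind the commit above describes might look like the sketch below. The cipher list is purely illustrative and is not a statement of which ciphers eCryptfs accepts; the function names and the 0 / -1 return convention are likewise assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; the real set lives in the eCryptfs sources. */
static const char *const sketch_supported_ciphers[] = {
    "aes", "blowfish", "twofish",
};

static bool sketch_cipher_supported(const char *name)
{
    size_t n = sizeof(sketch_supported_ciphers) /
               sizeof(sketch_supported_ciphers[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, sketch_supported_ciphers[i]) == 0)
            return true;
    }
    return false;
}

/* Returns 0 if the mount may proceed, -1 if the cipher is rejected. */
static int sketch_check_mount_cipher(const char *name)
{
    return sketch_cipher_supported(name) ? 0 : -1;
}
```

Failing here, rather than at the first write, means the user sees the error at the moment the bad option is supplied.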
     
  • A change was made about a year ago to get eCryptfs to better utilize its
    page cache during writes. The idea was to do the page encryption
    operations during page writeback, rather than doing them when initially
    writing into the page cache, to reduce the number of page encryption
    operations during sequential writes. This meant that the encrypted page
    would only be written to the lower filesystem during page writeback,
    which was a change from how eCryptfs had previously written to the lower
    filesystem in ecryptfs_write_end().

    The change caused a few eCryptfs-internal bugs that were shaken out.
    Unfortunately, more grave side effects have been identified that will
    force changes outside of eCryptfs. Because the lower filesystem isn't
    consulted until page writeback, eCryptfs has no way to pass lower write
    errors (ENOSPC, mainly) back to userspace. Additionally, it was reported
    that quotas could be bypassed because of the way eCryptfs may sometimes
    open the lower filesystem using a privileged kthread.

    It would be nice to resolve the latest issues, but it is best if the
    eCryptfs commits be reverted to the old behavior in the meantime.

    This reverts:
    32001d6f "eCryptfs: Flush file in vma close"
    5be79de2 "eCryptfs: Flush dirty pages in setattr"
    57db4e8d "ecryptfs: modify write path to encrypt page in writepage"

    Signed-off-by: Tyler Hicks
    Tested-by: Colin King
    Cc: Colin King
    Cc: Thieu Le

    Tyler Hicks
     

09 Jul, 2012

5 commits

  • Historically, eCryptfs has only initialized lower files in the
    ecryptfs_create() path. Lower file initialization is the act of writing
    the cryptographic metadata from the inode's crypt_stat to the header of
    the file. The ecryptfs_open() path already expects that metadata to be
    in the header of the file.

    A number of users have reported empty lower files beneath their
    eCryptfs mounts. Most of the causes for those empty files being left
    around have been addressed, but the presence of empty files causes
    problems due to the lack of proper cryptographic metadata.

    To transparently solve this problem, this patch initializes empty lower
    files in the ecryptfs_open() error path. If the metadata is unreadable
    due to the lower inode size being 0, plaintext passthrough support is
    not in use, and the metadata is stored in the header of the file (as
    opposed to the user.ecryptfs extended attribute), the lower file will be
    initialized.

    The number of nested conditionals in ecryptfs_open() was getting out of
    hand, so a helper function was created. To avoid the same nested
    conditional problem, the conditional logic was reversed inside of the
    helper function.

    https://launchpad.net/bugs/911507

    Signed-off-by: Tyler Hicks
    Cc: John Johansen
    Cc: Colin Ian King

    Tyler Hicks
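
The three conditions under which the commit above initializes an empty lower file, together with its note about reversing the conditional logic inside a helper, can be sketched with early returns. The struct and function names are hypothetical and only mirror the wording of the commit message.

```c
#include <stdbool.h>

/* Hypothetical inputs mirroring the conditions in the commit message. */
struct sketch_open_state {
    long long lower_size;        /* size of the lower inode in bytes */
    bool plaintext_passthrough;  /* plaintext passthrough is in use */
    bool metadata_in_xattr;      /* metadata kept in an xattr, not the header */
};

/*
 * Reversed conditionals: each disqualifying case returns early, so the
 * function never nests more than one level deep.
 */
static bool sketch_should_init_lower(const struct sketch_open_state *s)
{
    if (s->lower_size != 0)
        return false;
    if (s->plaintext_passthrough)
        return false;
    if (s->metadata_in_xattr)
        return false;
    return true;
}
```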
     
  • ecryptfs_create() creates a lower inode, allocates an eCryptfs inode,
    initializes the eCryptfs inode and cryptographic metadata attached to
    the inode, and then writes the metadata to the header of the file.

    If an error were to occur after the lower inode was created, an empty
    lower file would be left in the lower filesystem. This is a problem
    because ecryptfs_open() refuses to open any lower files which do not
    have the appropriate metadata in the file header.

    This patch properly unlinks the lower inode when an error occurs in the
    later stages of ecryptfs_create(), reducing the chance that an empty
    lower file will be left in the lower filesystem.

    https://launchpad.net/bugs/872905

    Signed-off-by: Tyler Hicks
    Cc: John Johansen
    Cc: Colin Ian King

    Tyler Hicks
     
  • Now that a pointer to a valid struct ecryptfs_daemon is stored in the
    private_data of an opened /dev/ecryptfs file, the remaining miscdev
    functions can utilize the pointer rather than looking up the
    ecryptfs_daemon at the beginning of each operation.

    The security model of /dev/ecryptfs is simplified a little bit with this
    patch. Upon opening /dev/ecryptfs, a per-user ecryptfs_daemon is
    registered. Another daemon cannot be registered for that user until the
    last file reference is released. During the lifetime of the
    ecryptfs_daemon, access checks are not performed on the /dev/ecryptfs
    operations because it is assumed that the application securely handles
    the opened file descriptor and does not unintentionally leak it to
    processes that are not trusted.

    Signed-off-by: Tyler Hicks
    Cc: Sasha Levin

    Tyler Hicks
     
  • These are no longer needed.

    Signed-off-by: Tyler Hicks
    Cc: Sasha Levin

    Tyler Hicks
     
  • When the eCryptfs mount options do not include '-o acl', but the lower
    filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
    flipped on in the eCryptfs super block flags. This flag is what the VFS
    checks in do_last() when deciding if the current umask should be applied
    to a newly created inode's mode or not. When a default POSIX ACL mask is
    set on a directory, the current umask is incorrectly applied to new
    inodes created in the directory. This patch ignores the MS_POSIXACL flag
    passed into ecryptfs_mount() and sets the flag on the eCryptfs super
    block depending on the flag's presence on the lower super block.

    Additionally, it is incorrect to allow a writeable eCryptfs mount on top
    of a read-only lower mount. The missing check did not actually allow
    writes to the read-only lower mount, because permission checks are still
    performed on the lower filesystem's objects, but it is best to simply not
    allow a rw mount on top of a ro mount. However, a ro eCryptfs mount on
    top of a rw mount is valid and still allowed.

    https://launchpad.net/bugs/1009207

    Signed-off-by: Tyler Hicks
    Reported-by: Stefan Beller
    Cc: John Johansen

    Tyler Hicks
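
The flag handling described in the commit above can be sketched as a pure function over flag words. The bit positions and names below are assumptions made for the example. The sketch also forces the eCryptfs mount read-only when the lower mount is read-only; refusing the mount would be another way to satisfy the description, and the commit message alone does not say which the patch does.

```c
/* Assumed bit positions, for illustration only. */
#define SKETCH_MS_RDONLY   (1UL << 0)
#define SKETCH_MS_POSIXACL (1UL << 16)

/*
 * Compute the eCryptfs super block flags from the flags requested at
 * mount time and the flags of the lower super block.
 */
static unsigned long sketch_ecryptfs_sb_flags(unsigned long requested,
                                              unsigned long lower)
{
    /* Ignore whatever the caller asked for regarding POSIX ACLs. */
    unsigned long flags = requested & ~SKETCH_MS_POSIXACL;

    /* Inherit ACL support and read-only state from the lower mount. */
    flags |= lower & (SKETCH_MS_RDONLY | SKETCH_MS_POSIXACL);

    return flags;
}
```

A read-only request on top of a read-write lower mount passes through unchanged, matching the last sentence of the commit message.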
     

07 Jul, 2012

1 commit

  • File operations on /dev/ecryptfs would BUG() when the operations were
    performed by processes other than the process that originally opened the
    file. This could happen with open files inherited after fork() or file
    descriptors passed through IPC mechanisms. Rather than calling BUG(), an
    error code can be safely returned in most situations.

    In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
    release even if the last file reference is being held by a process that
    didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
    be successful, so a pointer to the daemon is stored in the file's
    private_data. The private_data pointer is initialized when the miscdev
    file is opened and only used when the file is released.

    https://launchpad.net/bugs/994247

    Signed-off-by: Tyler Hicks
    Reported-by: Sasha Levin
    Tested-by: Sasha Levin

    Tyler Hicks
     

04 Jul, 2012

2 commits

  • Don't grab the daemon mutex while holding the message context mutex.
    Addresses this lockdep warning:

    ecryptfsd/2141 is trying to acquire lock:
    (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

    but task is already holding lock:
    (&(*daemon)->mux){+.+...}, at: [] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&(*daemon)->mux){+.+...}:
    [] lock_acquire+0x9d/0x220
    [] __mutex_lock_common+0x5a/0x4b0
    [] mutex_lock_nested+0x44/0x50
    [] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
    [] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
    [] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
    [] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
    [] ecryptfs_create+0x130/0x250 [ecryptfs]
    [] vfs_create+0xb4/0x120
    [] do_last+0x8c5/0xa10
    [] path_openat+0xd9/0x460
    [] do_filp_open+0x42/0xa0
    [] do_sys_open+0xf8/0x1d0
    [] sys_open+0x21/0x30
    [] system_call_fastpath+0x16/0x1b

    -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
    [] __lock_acquire+0x1bf8/0x1c50
    [] lock_acquire+0x9d/0x220
    [] __mutex_lock_common+0x5a/0x4b0
    [] mutex_lock_nested+0x44/0x50
    [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
    [] vfs_read+0xb3/0x180
    [] sys_read+0x4d/0x90
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • If the first attempt at opening the lower file read/write fails,
    eCryptfs will retry using a privileged kthread. However, the privileged
    retry should not happen if the lower file's inode is read-only because a
    read/write open will still be unsuccessful.

    The check for determining if the open should be retried was intended to
    be based on the access mode of the lower file's open flags being
    O_RDONLY, but the check was incorrectly performed. This would cause the
    open to be retried by the privileged kthread, resulting in a second
    failed open of the lower file. This patch corrects the check to
    determine if the open request should be handled by the privileged
    kthread.

    Signed-off-by: Tyler Hicks
    Reported-by: Dan Carpenter
    Acked-by: Dan Carpenter

    Tyler Hicks
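
The access-mode check discussed in the commit above is easy to get wrong in C, because O_RDONLY is not a bit that can be tested with a bitwise AND; on Linux its value is 0, so `flags & O_RDONLY` is never true. Whether that was the exact form of the original mistake is an assumption; the sketch below only shows a correct way to test the access mode, with hypothetical function names.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Mask out the access mode, then compare against O_RDONLY. */
static bool sketch_is_read_only_open(int flags)
{
    return (flags & O_ACCMODE) == O_RDONLY;
}

/*
 * Retry through the privileged path only when the first open failed
 * and the open was not read-only to begin with.
 */
static bool sketch_should_retry_privileged(bool first_open_failed, int flags)
{
    return first_open_failed && !sketch_is_read_only_open(flags);
}
```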
     

30 May, 2012

1 commit


29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

06 May, 2012

1 commit

    After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode(),
    which better describes what the function really does: set the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

08 Apr, 2012

1 commit


21 Mar, 2012

4 commits


29 Feb, 2012

1 commit

  • Fix printk format warning (from Linus's suggestion):

    on i386:
    fs/ecryptfs/miscdev.c:433:38: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'

    and on x86_64:
    fs/ecryptfs/miscdev.c:433:38: warning: format '%u' expects type 'unsigned int', but argument 4 has type 'long unsigned int'

    Signed-off-by: Randy Dunlap
    Cc: Geert Uytterhoeven
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Cc: ecryptfs@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
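
The pair of warnings in the commit above is typical of an argument whose width differs between 32-bit and 64-bit builds, such as a size_t. In standard C the `z` length modifier matches size_t on every architecture. That argument 4 was a size_t, and which specifier the patch chose, are assumptions here; the userspace sketch below only demonstrates the portable form, and its function name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t portably. "%zu" is correct whether size_t is
 * 'unsigned int' (as on i386) or 'long unsigned int' (as on x86_64).
 * Returns the number of characters that were (or would be) written.
 */
static int sketch_format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%zu", value);
}
```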