04 Mar, 2013

2 commits

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Pull more VFS bits from Al Viro:
    "Unfortunately, it looks like xattr series will have to wait until the
    next cycle ;-/

    This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
    etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
    more file_inode() work"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    constify path_get/path_put and fs_struct.c stuff
    fix nommu breakage in shmem.c
    cache the value of file_inode() in struct file
    9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
    9p: make sure ->lookup() adds fid to the right dentry
    9p: untangle ->lookup() a bit
    9p: double iput() in ->lookup() if d_materialise_unique() fails
    9p: v9fs_fid_add() can't fail now
    v9fs: get rid of v9fs_dentry
    9p: turn fid->dlist into hlist
    9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
    more file_inode() open-coded instances
    selinux: opened file can't have NULL or negative ->f_path.dentry

    (In the meantime, the hlist traversal macros have changed, so this
    required a semantic conflict fixup for the newly hlistified fid->dlist)

    Linus Torvalds
     

28 Feb, 2013

8 commits


27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

8 commits

  • The following set of operations on a NFS client and server will cause

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

    Obviously, we should not be getting an ESTALE error back there since the
    inode still exists on the server. The problem is that the lookup code
    will call d_revalidate on the dentry that "." refers to, because NFS has
    FS_REVAL_DOT set.

    nfs_lookup_revalidate will see that the parent directory has changed and
    will try to reverify the dentry by redoing a LOOKUP. That of course
    fails, so the lookup code returns ESTALE.

    The problem here is that d_revalidate is really a bad fit for this case.
    What we really want to know at this point is whether the inode is still
    good or not, but we don't really care what name it goes by or whether
    the dcache is still valid.

    Add a new d_op->d_weak_revalidate operation and have complete_walk call
    that instead of d_revalidate. The intent there is to allow for a
    "weaker" d_revalidate that just checks to see whether the inode is still
    good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
    special casing.

    [AV: changed method name, added note in porting, fixed confusion re
    having it possibly called from RCU mode (it won't be)]

    Cc: NeilBrown
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • caller has both, might as well pass them explicitly.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... making v9fs_xattr_set() a wrapper for it.

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull user namespace and namespace infrastructure changes from Eric W Biederman:
    "This set of changes starts with a few small enhnacements to the user
    namespace. reboot support, allowing more arbitrary mappings, and
    support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
    user namespace root.

    I do my best to document that if you care about limiting your
    unprivileged users that when you have the user namespace support
    enabled you will need to enable memory control groups.

    There is a minor bug fix to prevent overflowing the stack if someone
    creates way too many user namespaces.

    The bulk of the changes are a continuation of the kuid/kgid push down
    work through the filesystems. These changes make using uids and gids
    typesafe which ensures that these filesystems are safe to use when
    multiple user namespaces are in use. The filesystems converted for
    3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
    changes for these filesystems were a little more involved so I split
    the changes into smaller hopefully obviously correct changes.

    XFS is the only filesystem that remains. I was hoping I could get
    that in this release so that user namespace support would be enabled
    with an allyesconfig or an allmodconfig but it looks like the xfs
    changes need another couple of days before it they are ready."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
    cifs: Enable building with user namespaces enabled.
    cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
    cifs: Convert struct cifs_sb_info to use kuids and kgids
    cifs: Modify struct smb_vol to use kuids and kgids
    cifs: Convert struct cifsFileInfo to use a kuid
    cifs: Convert struct cifs_fattr to use kuid and kgids
    cifs: Convert struct tcon_link to use a kuid.
    cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
    cifs: Convert from a kuid before printing current_fsuid
    cifs: Use kuids and kgids SID to uid/gid mapping
    cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
    cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
    cifs: Override unmappable incoming uids and gids
    nfsd: Enable building with user namespaces enabled.
    nfsd: Properly compare and initialize kuids and kgids
    nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
    nfsd: Modify nfsd4_cb_sec to use kuids and kgids
    nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
    nfsd: Convert nfsxdr to use kuids and kgids
    nfsd: Convert nfs3xdr to use kuids and kgids
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Feb, 2013

3 commits

  • Merge misc patches from Andrew Morton:

    - Florian has vanished so I appear to have become fbdev maintainer
    again :(

    - Joel and Mark are distracted to welcome to the new OCFS2 maintainer

    - The backlight queue

    - Small core kernel changes

    - lib/ updates

    - The rtc queue

    - Various random bits

    * akpm: (164 commits)
    rtc: rtc-davinci: use devm_*() functions
    rtc: rtc-max8997: use devm_request_threaded_irq()
    rtc: rtc-max8907: use devm_request_threaded_irq()
    rtc: rtc-da9052: use devm_request_threaded_irq()
    rtc: rtc-wm831x: use devm_request_threaded_irq()
    rtc: rtc-tps80031: use devm_request_threaded_irq()
    rtc: rtc-lp8788: use devm_request_threaded_irq()
    rtc: rtc-coh901331: use devm_clk_get()
    rtc: rtc-vt8500: use devm_*() functions
    rtc: rtc-tps6586x: use devm_request_threaded_irq()
    rtc: rtc-imxdi: use devm_clk_get()
    rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
    rtc: rtc-pcf8583: use dev_warn() instead of printk()
    rtc: rtc-sun4v: use pr_warn() instead of printk()
    rtc: rtc-vr41xx: use dev_info() instead of printk()
    rtc: rtc-rs5c313: use pr_err() instead of printk()
    rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
    rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
    rtc: rtc-ds2404: use dev_err() instead of printk()
    rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
    ...

    Linus Torvalds
     
  • Fix up the ->page_mkwrite handler to provide stable page writes if necessary.

    Signed-off-by: Darrick J. Wong
    Cc: Adrian Hunter
    Cc: Andy Lutomirski
    Cc: Artem Bityutskiy
    Cc: Jan Kara
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Steven Whitehouse
    Cc: Jens Axboe
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

12 Feb, 2013

5 commits

  • Modify v9fs_get_fsgid_for_create to return a kgid and modify all of
    the variables that hold the result of v9fs_get_fsgid_for_create to be
    of type kgid_t.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Change struct v9fs_session_info and the code that popluates it to use
    kuids and kgids. When parsing the 9p mount options convert the
    dfltuid, dflutgid, and the session uid from the current user namespace
    into kuids and kgids. Modify V9FS_DEFUID and V9FS_DEFGUID to be kuid
    and kgid values.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Change struct 9p_fid and it's associated functions to
    use kuid_t's instead of uid_t.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • 9p has thre strucrtures that can encode inode stat information. Modify
    all of those structures to contain kuid_t and kgid_t values. Modify
    he wire encoders and decoders of those structures to use 'u' and 'g' instead of
    'd' in the format string where uids and gids are present.

    This results in all kuid and kgid conversion to and from on the wire values
    being performed by the same code in protocol.c where the client is known
    at the time of the conversion.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • Modify the p9_client_rpc format specifiers of every function that
    directly transmits a uid or a gid from 'd' to 'u' or 'g' as
    appropriate.

    Modify those same functions to take kuid_t and kgid_t parameters
    instead of uid_t and gid_t parameters.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

11 Feb, 2013

3 commits

  • Return EEXISTS if requested file already exists, without this patch open
    call will always succeed even if the file exists and user specified
    O_CREAT|O_EXCL.

    Following test code can be used to verify this patch. Without this patch
    executing following test code on 9p mount will result in printing 'test case
    failed' always.

    main()
    {
    int fd;

    /* first create the file */
    fd = open("./file", O_CREAT|O_WRONLY);
    if (fd < 0) {
    perror("open");
    return -1;
    }
    close(fd);

    /* Now opening same file with O_CREAT|O_EXCL should fail */
    fd = open("./file", O_CREAT|O_EXCL);
    if (fd < 0 && errno == EEXIST)
    printf("test case pass\n");
    else
    printf("test case failed\n");
    close(fd);
    return 0;
    }

    Signed-off-by: M. Mohan Kumar
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
     
  • We do the truncate via setattr request, hence don't pass the O_TRUNC flag in
    open request. Without this patch we end up sending zero sized write request
    to server when we try to truncate. Some servers (VirtFS) were not handling that
    properly.

    Reported-by: M. Mohan Kumar
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • ... is really excessive. First of all, ->readdir() is serialized by
    file->f_path.dentry->d_inode->i_mutex; playing with file->f_path.dentry->d_lock
    is not buying you anything. Moreover, rdir->mutex is pointless for exactly
    the same reason - you'll never see contention on it.

    While we are at it, there's no point in having rdir->buf a pointer -
    you have it point just past the end of rdir, so it might as well be a flex
    array (and no, it's not a gccism).

    Absolutely untested patch follows:

    Signed-off-by: Al Viro
    Signed-off-by: Eric Van Hensbergen

    Al Viro
     

22 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Eric Van Hensbergen
    CC: Ron Minnich
    CC: Latchesar Ionkov
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

12 Oct, 2012

1 commit


09 Oct, 2012

1 commit

  • Move actual pte filling for non-linear file mappings into the new special
    vma operation: ->remap_pages().

    Filesystems must implement this method to get non-linear mapping support,
    if it uses filemap_fault() then generic_file_remap_pages() can be used.

    Now device drivers can implement this method and obtain nonlinear vma support.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf #arch/tile
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c spiltoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

18 Sep, 2012

2 commits

  • - Pass the user namespace the uid and gid values in the xattr are stored
    in into posix_acl_from_xattr.

    - Pass the user namespace kuid and kgid values should be converted into
    when storing uid and gid values in an xattr in posix_acl_to_xattr.

    - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
    pass in &init_user_ns.

    In the short term this change is not strictly needed but it makes the
    code clearer. In the longer term this change is necessary to be able to
    mount filesystems outside of the initial user namespace that natively
    store posix acls in the linux xattr format.

    Cc: Theodore Tso
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • These are generally very small strings. We don't need an entire 4k
    allocation for each. Instead, just free and reallocate them on an
    as-needed basis.

    Note: This patch is untested since I don't have a 9p server available at
    the moment. It's mainly something I noticed while doing some
    getname/putname cleanup work.

    Signed-off-by: Jeff Layton
    Signed-off-by: Eric Van Hensbergen

    Jeff Layton
     

07 Sep, 2012

1 commit

  • Reading a symlink longer than the given buffer, a p9_debug use would
    try to print the link name (not NUL-terminated) using a %s format.
    Use %.*s instead, and replace the strncpy+strnlen with functionally
    equivalent strlen+memcpy.

    Signed-off-by: Jim Meyering
    Signed-off-by: Eric Van Hensbergen

    Jim Meyering
     

31 Jul, 2012

1 commit