20 Jun, 2019

1 commit

  • We would like to move fsnotify_nameremove() calls from d_delete()
    into a higher layer where the hook makes more sense and so we can
    consider every d_delete() call site individually.

    Start by creating empty hook fsnotify_{unlink,rmdir}() and place
    them in the proper VFS call sites. After all d_delete() call sites
    will be converted to use the new hook, the new hook will generate the
    delete events and fsnotify_nameremove() hook will be removed.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

08 May, 2019

1 commit

  • Pull fscrypt updates from Ted Ts'o:
    "Clean up fscrypt's dcache revalidation support, and other
    miscellaneous cleanups"

    * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    fscrypt: cache decrypted symlink target in ->i_link
    vfs: use READ_ONCE() to access ->i_link
    fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
    fscrypt: only set dentry_operations on ciphertext dentries
    fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
    fscrypt: fix race allowing rename() and link() of ciphertext dentries
    fscrypt: clean up and improve dentry revalidation
    fscrypt: use READ_ONCE() to access ->i_crypt_info
    fscrypt: remove WARN_ON_ONCE() when decryption fails
    fscrypt: drop inode argument from fscrypt_get_ctx()

    Linus Torvalds
     

27 Apr, 2019

2 commits


18 Apr, 2019

1 commit

  • Use 'READ_ONCE(inode->i_link)' to explicitly support filesystems caching
    the symlink target in ->i_link later if it was unavailable at iget()
    time, or wasn't easily available. I'll be doing this in fscrypt, to
    improve the performance of encrypted symlinks on ext4, f2fs, and ubifs.

    ->i_link will start NULL and may later be set to a non-NULL value by a
    smp_store_release() or cmpxchg_release(). READ_ONCE() is needed on the
    read side. smp_load_acquire() is unnecessary because only a data
    dependency barrier is required. (Thanks to Al for pointing this out.)

    Acked-by: Al Viro
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

11 Mar, 2019

1 commit

  • …morris/linux-security

    Pull integrity updates from James Morris:
    "Mimi Zohar says:

    'Linux 5.0 introduced the platform keyring to allow verifying the IMA
    kexec kernel image signature using the pre-boot keys. This pull
    request similarly makes keys on the platform keyring accessible for
    verifying the PE kernel image signature.

    Also included in this pull request is a new IMA hook that tags tmp
    files, in policy, indicating the file hash needs to be calculated.
    The remaining patches are cleanup'"

    * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    evm: Use defined constant for UUID representation
    ima: define ima_post_create_tmpfile() hook and add missing call
    evm: remove set but not used variable 'xattr'
    encrypted-keys: fix Opt_err/Opt_error = -1
    kexec, KEYS: Make use of platform keyring for signature verify
    integrity, KEYS: add a reference to platform keyring

    Linus Torvalds
     

08 Mar, 2019

2 commits

  • Merge more updates from Andrew Morton:

    - some of the rest of MM

    - various misc things

    - dynamic-debug updates

    - checkpatch

    - some epoll speedups

    - autofs

    - rapidio

    - lib/, lib/lzo/ updates

    * emailed patches from Andrew Morton : (83 commits)
    samples/mic/mpssd/mpssd.h: remove duplicate header
    kernel/fork.c: remove duplicated include
    include/linux/relay.h: fix percpu annotation in struct rchan
    arch/nios2/mm/fault.c: remove duplicate include
    unicore32: stop printing the virtual memory layout
    MAINTAINERS: fix GTA02 entry and mark as orphan
    mm: create the new vm_fault_t type
    arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
    arch: simplify several early memory allocations
    openrisc: simplify pte_alloc_one_kernel()
    sh: prefer memblock APIs returning virtual address
    microblaze: prefer memblock API returning virtual address
    powerpc: prefer memblock APIs returning virtual address
    lib/lzo: separate lzo-rle from lzo
    lib/lzo: implement run-length encoding
    lib/lzo: fast 8-byte copy on arm64
    lib/lzo: 64-bit CTZ on arm64
    lib/lzo: tidy-up ifdefs
    ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
    ipc: annotate implicit fall through
    ...

    Linus Torvalds
     
  • Instead of doing this compile-time check in some slightly arbitrary user
    of struct filename, put it next to the definition.

    Link: http://lkml.kernel.org/r/20190208203015.29702-3-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Cc: Alexander Viro
    Cc: Kees Cook
    Cc: Luc Van Oostenryck
    Cc: Masahiro Yamada
    Cc: Nick Desaulniers
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

28 Feb, 2019

1 commit

  • Because the new API passes in key,value parameters, match_token() cannot be
    used with it. Instead, provide three new helpers to aid with parsing:

    (1) fs_parse(). This takes a parameter and a simple static description of
    all the parameters and maps the key name to an ID. It returns 1 on a
    match, 0 on no match if unknowns should be ignored and some other
    negative error code on a parse error.

    The parameter description includes a list of key names to IDs, desired
    parameter types and a list of enumeration name -> ID mappings.

    [!] Note that for the moment I've required that the key->ID mapping
    array is expected to be sorted and unterminated. The size of the
    array is noted in the fsconfig_parser struct. This allows me to use
    bsearch(), but I'm not sure any performance gain is worth the hassle
    of requiring people to keep the array sorted.

    The parameter type array is sized according to the number of parameter
    IDs and is indexed directly. The optional enum mapping array is an
    unterminated, unsorted list and the size goes into the fsconfig_parser
    struct.

    The function can do some additional things:

    (a) If it's not ambiguous and no value is given, the prefix "no" on
    a key name is permitted to indicate that the parameter should
    be considered negatory.

    (b) If the desired type is a single simple integer, it will perform
    an appropriate conversion and store the result in a union in
    the parse result.

    (c) If the desired type is an enumeration, {key ID, name} will be
    looked up in the enumeration list and the matching value will
    be stored in the parse result union.

    (d) Optionally generate an error if the key is unrecognised.

    This is called something like:

    enum rdt_param {
    Opt_cdp,
    Opt_cdpl2,
    Opt_mba_mpbs,
    nr__rdt_params
    };

    const struct fs_parameter_spec rdt_param_specs[nr__rdt_params] = {
    [Opt_cdp] = { fs_param_is_bool },
    [Opt_cdpl2] = { fs_param_is_bool },
    [Opt_mba_mpbs] = { fs_param_is_bool },
    };

    const const char *const rdt_param_keys[nr__rdt_params] = {
    [Opt_cdp] = "cdp",
    [Opt_cdpl2] = "cdpl2",
    [Opt_mba_mpbs] = "mba_mbps",
    };

    const struct fs_parameter_description rdt_parser = {
    .name = "rdt",
    .nr_params = nr__rdt_params,
    .keys = rdt_param_keys,
    .specs = rdt_param_specs,
    .no_source = true,
    };

    int rdt_parse_param(struct fs_context *fc,
    struct fs_parameter *param)
    {
    struct fs_parse_result parse;
    struct rdt_fs_context *ctx = rdt_fc2context(fc);
    int ret;

    ret = fs_parse(fc, &rdt_parser, param, &parse);
    if (ret < 0)
    return ret;

    switch (parse.key) {
    case Opt_cdp:
    ctx->enable_cdpl3 = true;
    return 0;
    case Opt_cdpl2:
    ctx->enable_cdpl2 = true;
    return 0;
    case Opt_mba_mpbs:
    ctx->enable_mba_mbps = true;
    return 0;
    }

    return -EINVAL;
    }

    (2) fs_lookup_param(). This takes a { dirfd, path, LOOKUP_EMPTY? } or
    string value and performs an appropriate path lookup to convert it
    into a path object, which it will then return.

    If the desired type was a blockdev, the type of the looked up inode
    will be checked to make sure it is one.

    This can be used like:

    enum foo_param {
    Opt_source,
    nr__foo_params
    };

    const struct fs_parameter_spec foo_param_specs[nr__foo_params] = {
    [Opt_source] = { fs_param_is_blockdev },
    };

    const char *char foo_param_keys[nr__foo_params] = {
    [Opt_source] = "source",
    };

    const struct constant_table foo_param_alt_keys[] = {
    { "device", Opt_source },
    };

    const struct fs_parameter_description foo_parser = {
    .name = "foo",
    .nr_params = nr__foo_params,
    .nr_alt_keys = ARRAY_SIZE(foo_param_alt_keys),
    .keys = foo_param_keys,
    .alt_keys = foo_param_alt_keys,
    .specs = foo_param_specs,
    };

    int foo_parse_param(struct fs_context *fc,
    struct fs_parameter *param)
    {
    struct fs_parse_result parse;
    struct foo_fs_context *ctx = foo_fc2context(fc);
    int ret;

    ret = fs_parse(fc, &foo_parser, param, &parse);
    if (ret < 0)
    return ret;

    switch (parse.key) {
    case Opt_source:
    return fs_lookup_param(fc, &foo_parser, param,
    &parse, &ctx->source);
    default:
    return -EINVAL;
    }
    }

    (3) lookup_constant(). This takes a table of named constants and looks up
    the given name within it. The table is expected to be sorted such
    that bsearch() be used upon it.

    Possibly I should require the table be terminated and just use a
    for-loop to scan it instead of using bsearch() to reduce hassle.

    Tables look something like:

    static const struct constant_table bool_names[] = {
    { "0", false },
    { "1", true },
    { "false", false },
    { "no", false },
    { "true", true },
    { "yes", true },
    };

    and a lookup is done with something like:

    b = lookup_constant(bool_names, param->string, -1);

    Additionally, optional validation routines for the parameter description
    are provided that can be enabled at compile time. A later patch will
    invoke these when a filesystem is registered.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

05 Feb, 2019

1 commit

  • If tmpfiles can be made persistent, then newly created tmpfiles need to
    be treated like any other new files in policy.

    This patch indicates which newly created tmpfiles are in policy, causing
    the file hash to be calculated on __fput().

    Reported-by: Ignaz Forster
    [rgoldwyn@suse.com: Call ima_post_create_tmpfile() in vfs_tmpfile() as
    opposed to do_tmpfile(). This will help the case for overlayfs where
    copy_up is denied while overwriting a file.]
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: Mimi Zohar

    Mimi Zohar
     

31 Jan, 2019

1 commit

  • Don't fetch fcaps when umount2 is called to avoid a process hang while
    it waits for the missing resource to (possibly never) re-appear.

    Note the comment above user_path_mountpoint_at():
    * A umount is a special case for path walking. We're not actually interested
    * in the inode in this situation, and ESTALE errors can be a problem. We
    * simply want track down the dentry and vfsmount attached at the mountpoint
    * and avoid revalidating the last component.

    This can happen on ceph, cifs, 9p, lustre, fuse (gluster) or NFS.

    Please see the github issue tracker
    https://github.com/linux-audit/audit-kernel/issues/100

    Signed-off-by: Richard Guy Briggs
    [PM: merge fuzz in audit_log_fcaps()]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

23 Dec, 2018

1 commit

  • This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd.

    commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.")
    enabled mknod() in user namespaces for userns root if CAP_MKNOD is
    available. However, these device nodes are useless since any filesystem
    mounted from a non-initial user namespace will set the SB_I_NODEV flag on
    the filesystem. Now, when a device node s created in a non-initial user
    namespace a call to open() on said device node will fail due to:

    bool may_open_dev(const struct path *path)
    {
    return !(path->mnt->mnt_flags & MNT_NODEV) &&
    !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
    }

    The problem with this is that as of the aforementioned commit mknod()
    creates partially functional device nodes in non-initial user namespaces.
    In particular, it has the consequence that as of the aforementioned commit
    open() will be more privileged with respect to device nodes than mknod().
    Before it was the other way around. Specifically, if mknod() succeeded
    then it was transparent for any userspace application that a fatal error
    must have occured when open() failed.

    All of this breaks multiple userspace workloads and a widespread assumption
    about how to handle mknod(). Basically, all container runtimes and systemd
    live by the slogan "ask for forgiveness not permission" when running user
    namespace workloads. For mknod() the assumption is that if the syscall
    succeeds the device nodes are useable irrespective of whether it succeeds
    in a non-initial user namespace or not. This logic was chosen explicitly
    to allow for the glorious day when mknod() will actually be able to create
    fully functional device nodes in user namespaces.
    A specific problem people are already running into when running 4.18 rc
    kernels are failing systemd services. For any distro that is run in a
    container systemd services started with the PrivateDevices= property set
    will fail to start since the device nodes in question cannot be
    opened (cf. the arguments in [1]).

    Full disclosure, Seth made the very sound argument that it is already
    possible to end up with partially functional device nodes. Any filesystem
    mounted with MS_NODEV set will allow mknod() to succeed but will not allow
    open() to succeed. The difference to the case here is that the MS_NODEV
    case is transparent to userspace since it is an explicitly set mount option
    while the SB_I_NODEV case is an implicit property enforced by the kernel
    and hence opaque to userspace.

    [1]: https://github.com/systemd/systemd/pull/9483

    Signed-off-by: Christian Brauner
    Cc: "Eric W. Biederman"
    Cc: Seth Forshee
    Cc: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Christian Brauner
     

24 Aug, 2018

1 commit

  • Disallows open of FIFOs or regular files not owned by the user in world
    writable sticky directories, unless the owner is the same as that of the
    directory or the file is opened without the O_CREAT flag. The purpose
    is to make data spoofing attacks harder. This protection can be turned
    on and off separately for FIFOs and regular files via sysctl, just like
    the symlinks/hardlinks protection. This patch is based on Openwall's
    "HARDEN_FIFO" feature by Solar Designer.

    This is a brief list of old vulnerabilities that could have been prevented
    by this feature, some of them even allow for privilege escalation:

    CVE-2000-1134
    CVE-2007-3852
    CVE-2008-0525
    CVE-2009-0416
    CVE-2011-4834
    CVE-2015-1838
    CVE-2015-7442
    CVE-2016-7489

    This list is not meant to be complete. It's difficult to track down all
    vulnerabilities of this kind because they were often reported without any
    mention of this particular attack vector. In fact, before
    hardlinks/symlinks restrictions, fifos/regular files weren't the favorite
    vehicle to exploit them.

    [s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter]
    Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda
    Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com
    [keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future]
    [keescook@chromium.org: adjust commit subjet]
    Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast
    Signed-off-by: Salvatore Mesoraca
    Signed-off-by: Kees Cook
    Suggested-by: Solar Designer
    Suggested-by: Kees Cook
    Cc: Al Viro
    Cc: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Salvatore Mesoraca
     

22 Aug, 2018

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This contains two new features:

    - Stack file operations: this allows removal of several hacks from
    the VFS, proper interaction of read-only open files with copy-up,
    possibility to implement fs modifying ioctls properly, and others.

    - Metadata only copy-up: when file is on lower layer and only
    metadata is modified (except size) then only copy up the metadata
    and continue to use the data from the lower file"

    * tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
    ovl: Enable metadata only feature
    ovl: Do not do metacopy only for ioctl modifying file attr
    ovl: Do not do metadata only copy-up for truncate operation
    ovl: add helper to force data copy-up
    ovl: Check redirect on index as well
    ovl: Set redirect on upper inode when it is linked
    ovl: Set redirect on metacopy files upon rename
    ovl: Do not set dentry type ORIGIN for broken hardlinks
    ovl: Add an inode flag OVL_CONST_INO
    ovl: Treat metacopy dentries as type OVL_PATH_MERGE
    ovl: Check redirects for metacopy files
    ovl: Move some dir related ovl_lookup_single() code in else block
    ovl: Do not expose metacopy only dentry from d_real()
    ovl: Open file with data except for the case of fsync
    ovl: Add helper ovl_inode_realdata()
    ovl: Store lower data inode in ovl_inode
    ovl: Fix ovl_getattr() to get number of blocks from lower
    ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
    ovl: Copy up meta inode data from lowest data inode
    ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • …ux/kernel/git/viro/vfs

    Pull misc vfs updates from Al Viro:
    "Misc cleanups from various folks all over the place

    I expected more fs/dcache.c cleanups this cycle, so that went into a
    separate branch. Said cleanups have missed the window, so in the
    hindsight it could've gone into work.misc instead. Decided not to
    cherry-pick, thus the 'work.dcache' branch"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: dcache: Use true and false for boolean values
    fold generic_readlink() into its only caller
    fs: shave 8 bytes off of struct inode
    fs: Add more kernel-doc to the produced documentation
    fs: Fix attr.c kernel-doc
    removed extra extern file_fdatawait_range

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    kill dentry_update_name_case()

    Linus Torvalds
     

20 Jul, 2018

1 commit


18 Jul, 2018

1 commit


12 Jul, 2018

17 commits


16 Jun, 2018

1 commit

  • Pull AFS updates from Al Viro:
    "Assorted AFS stuff - ended up in vfs.git since most of that consists
    of David's AFS-related followups to Christoph's procfs series"

    * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    afs: Optimise callback breaking by not repeating volume lookup
    afs: Display manually added cells in dynamic root mount
    afs: Enable IPv6 DNS lookups
    afs: Show all of a server's addresses in /proc/fs/afs/servers
    afs: Handle CONFIG_PROC_FS=n
    proc: Make inline name size calculation automatic
    afs: Implement network namespacing
    afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
    afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
    proc: Add a way to make network proc files writable
    afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
    afs: Rearrange fs/afs/proc.c to move the show routines up
    afs: Rearrange fs/afs/proc.c by moving fops and open functions down
    afs: Move /proc management functions to the end of the file

    Linus Torvalds
     

15 Jun, 2018

1 commit

  • Alter the dynroot mount so that cells created by manipulation of
    /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
    cell as a module parameter will cause directories for those cells to be
    created in the dynamic root superblock for the network namespace[*].

    To this end:

    (1) Only one dynamic root superblock is now created per network namespace
    and this is shared between all attempts to mount it. This makes it
    easier to find the superblock to modify.

    (2) When a dynamic root superblock is created, the list of cells is walked
    and directories created for each cell already defined.

    (3) When a new cell is added, if a dynamic root superblock exists, a
    directory is created for it.

    (4) When a cell is destroyed, the directory is removed.

    (5) These directories are created by calling lookup_one_len() on the root
    dir which automatically creates them if they don't exist.

    [*] Inasmuch as network namespaces are currently supported here.

    Signed-off-by: David Howells

    David Howells
     

13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a * b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

05 Jun, 2018

2 commits

  • Pull userns updates from Eric Biederman:
    "This is the last couple of vfs bits to enable root in a user namespace
    to mount and manipulate a filesystem with backing store (AKA not a
    virtual filesystem like proc, but a filesystem where the unprivileged
    user controls the content). The target filesystem for this work is
    fuse, and Miklos should be sending you the pull request for the fuse
    bits this merge window.

    The two key patches are "evm: Don't update hmacs in user ns mounts"
    and "vfs: Don't allow changing the link count of an inode with an
    invalid uid or gid". Those close small gaps in the vfs that would be a
    problem if an unprivileged fuse filesystem is mounted.

    The rest of the changes are things that are now safe to allow a root
    user in a user namespace to do with a filesystem they have mounted.
    The most interesting development is that remount is now safe"

    * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
    capabilities: Allow privileged user in s_user_ns to set security.* xattrs
    fs: Allow superblock owner to access do_remount_sb()
    fs: Allow superblock owner to replace invalid owners of inodes
    vfs: Allow userns root to call mknod on owned filesystems.
    vfs: Don't allow changing the link count of an inode with an invalid uid or gid
    evm: Don't update hmacs in user ns mounts

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Misc bits and pieces not fitting into anything more specific"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: delete unnecessary assignment in vfs_listxattr
    Documentation: filesystems: update filesystem locking documentation
    vfs: namei: use path_equal() in follow_dotdot()
    fs.h: fix outdated comment about file flags
    __inode_security_revalidate() never gets NULL opt_dentry
    make xattr_getsecurity() static
    vfat: simplify checks in vfat_lookup()
    get rid of dead code in d_find_alias()
    it's SB_BORN, not MS_BORN...
    msdos_rmdir(): kill BS comment
    remove rpc_rmdir()
    fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()

    Linus Torvalds