21 Mar, 2019

3 commits

  • Add a syscall for configuring a filesystem creation context and triggering
    actions upon it, to be used in conjunction with fsopen, fspick and fsmount.

    long fsconfig(int fs_fd, unsigned int cmd, const char *key,
    const void *value, int aux);

    Where fs_fd indicates the context, cmd indicates the action to take, key
    indicates the parameter name for parameter-setting actions and, if needed,
    value points to a buffer containing the value and aux can give more
    information for the value.

    The following command IDs are proposed:

    (*) FSCONFIG_SET_FLAG: No value is specified. The parameter must be
    boolean in nature. The key may be prefixed with "no" to invert the
    setting. value must be NULL and aux must be 0.

    (*) FSCONFIG_SET_STRING: A string value is specified. The parameter can
    be expecting boolean, integer, string or take a path. A conversion to
    an appropriate type will be attempted (which may include looking up as
    a path). value points to a NUL-terminated string and aux must be 0.

    (*) FSCONFIG_SET_BINARY: A binary blob is specified. value points to
    the blob and aux indicates its size. The parameter must be expecting
    a blob.

    (*) FSCONFIG_SET_PATH: A non-empty path is specified. The parameter must
    be expecting a path object. value points to a NUL-terminated string
    that is the path and aux is a file descriptor at which to start a
    relative lookup or AT_FDCWD.

    (*) FSCONFIG_SET_PATH_EMPTY: As FSCONFIG_SET_PATH, but with AT_EMPTY_PATH
    implied.

    (*) FSCONFIG_SET_FD: An open file descriptor is specified. value must
    be NULL and aux indicates the file descriptor.

    (*) FSCONFIG_CMD_CREATE: Trigger superblock creation.

    (*) FSCONFIG_CMD_RECONFIGURE: Trigger superblock reconfiguration.

    For the "set" command IDs, the idea is that the file_system_type will point
    to a list of parameters and the types of value that those parameters expect
    to take. The core code can then do the parse and argument conversion and
    then give the LSM and FS a cooked option or array of options to use.
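
    As a rough userspace illustration of the parameter-table idea, the sketch
    below looks a key up in a table of expected types and handles the "no"
    prefix for boolean parameters. The names param_spec, lookup_param() and
    example_params are hypothetical and are not the kernel's structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value types a parameter may expect. */
enum param_type { PARAM_FLAG, PARAM_INT, PARAM_STRING, PARAM_PATH };

/* One entry in a filesystem's (hypothetical) parameter table. */
struct param_spec {
    const char *name;
    enum param_type type;
};

/*
 * Look up key in a table terminated by a NULL name. An exact match wins;
 * failing that, a leading "no" is stripped and the remainder is matched
 * against flag-type parameters only. *negated reports whether the prefix
 * was used.
 */
static const struct param_spec *
lookup_param(const struct param_spec *table, const char *key, bool *negated)
{
    const struct param_spec *p;

    *negated = false;
    for (p = table; p->name; p++)
        if (strcmp(p->name, key) == 0)
            return p;

    if (strncmp(key, "no", 2) == 0 && key[2] != '\0') {
        for (p = table; p->name; p++) {
            if (p->type == PARAM_FLAG && strcmp(p->name, key + 2) == 0) {
                *negated = true;
                return p;
            }
        }
    }
    return NULL;
}

/* Example table, loosely modelled on the ext4 options in this message. */
static const struct param_spec example_params[] = {
    { "acl",        PARAM_FLAG   },
    { "user_xattr", PARAM_FLAG   },
    { "sb",         PARAM_INT    },
    { "errors",     PARAM_STRING },
    { "source",     PARAM_PATH   },
    { NULL,         PARAM_FLAG   },
};
```

    With this table, "noacl" resolves to the "acl" entry with negated set,
    while "nosb" is rejected because "sb" is not boolean in nature.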

    Source specification is also done the same way, using special keys
    "source", "source1", "source2", etc.

    [!] Note that, for the moment, the key and value are just glued back
    together and handed to the filesystem. Every filesystem that uses options
    uses match_token() and co. to do this, and this will need to be changed -
    but not all at once.

    Example usage:

    fd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(fd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD);
    fsconfig(fd, FSCONFIG_SET_PATH_EMPTY, "journal_path", "", journal_fd);
    fsconfig(fd, FSCONFIG_SET_FD, "journal_fd", NULL, journal_fd);
    fsconfig(fd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
    fsconfig(fd, FSCONFIG_SET_FLAG, "noacl", NULL, 0);
    fsconfig(fd, FSCONFIG_SET_STRING, "sb", "1", 0);
    fsconfig(fd, FSCONFIG_SET_STRING, "errors", "continue", 0);
    fsconfig(fd, FSCONFIG_SET_STRING, "data", "journal", 0);
    fsconfig(fd, FSCONFIG_SET_STRING, "context", "unconfined_u:...", 0);
    fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

    or:

    fd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(fd, FSCONFIG_SET_STRING, "source", "/dev/sda1", 0);
    fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

    or:

    fd = fsopen("afs", FSOPEN_CLOEXEC);
    fsconfig(fd, FSCONFIG_SET_STRING, "source", "#grand.central.org:root.cell", 0);
    fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

    or:

    fd = fsopen("jffs2", FSOPEN_CLOEXEC);
    fsconfig(fd, FSCONFIG_SET_STRING, "source", "mtd0", 0);
    fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Implement the ability for filesystems to log error, warning and
    informational messages through the fs_context. These can be extracted by
    userspace by reading from an fd created by fsopen().

    Error messages are prefixed with "e ", warnings with "w " and informational
    messages with "i ".

    Inside the kernel, formatted messages are malloc'd but unformatted messages
    are not copied if they're either in the core .rodata section or in the
    .rodata section of the filesystem module pinned by fs_context::fs_type.
    The messages are only good till the fs_type is released.

    Note that the logging object is shared between duplicated fs_context
    structures. This is so that filesystems such as NFS, which do a mount
    within a mount, can get at least some of the errors from the inner mount.

    Five logging functions are provided for this:

    (1) void logfc(struct fs_context *fc, const char *fmt, ...);

    This logs a message into the context. If the buffer is full, the
    earliest message is discarded.

    (2) void errorf(fc, fmt, ...);

    This wraps logfc() to log an error.

    (3) int invalf(fc, fmt, ...);

    This wraps errorf() and returns -EINVAL for convenience.

    (4) void warnf(fc, fmt, ...);

    This wraps logfc() to log a warning.

    (5) void infof(fc, fmt, ...);

    This wraps logfc() to log an informational message.
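
    The prefixing and discard-earliest behaviour described above can be
    modelled in userspace roughly as follows. This is only a sketch: the
    structure, the slot count, the message size and the *_model names are
    illustrative assumptions, not the kernel implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS   4     /* capacity chosen for illustration only */
#define LOG_MSG_MAX 128   /* likewise an arbitrary limit */

/* Hypothetical userspace stand-in for the per-context log. */
struct log_model {
    char msgs[LOG_SLOTS][LOG_MSG_MAX];
    unsigned int head;    /* total messages ever written */
    unsigned int tail;    /* total messages ever read or discarded */
};

/* Store "<level> <formatted text>"; drop the earliest entry when full. */
static void log_model_add(struct log_model *log, char level,
                          const char *fmt, ...)
{
    va_list ap;
    char *slot;

    if (log->head - log->tail == LOG_SLOTS)
        log->tail++;      /* buffer full: discard the earliest message */

    slot = log->msgs[log->head % LOG_SLOTS];
    slot[0] = level;
    slot[1] = ' ';
    va_start(ap, fmt);
    vsnprintf(slot + 2, LOG_MSG_MAX - 2, fmt, ap);
    va_end(ap);
    log->head++;
}

/* Return the oldest unread message, or NULL if the log is empty. */
static const char *log_model_read(struct log_model *log)
{
    if (log->head == log->tail)
        return NULL;
    return log->msgs[log->tail++ % LOG_SLOTS];
}

/* Thin wrappers mirroring the error/warning/info split. */
#define errorf_model(log, ...) log_model_add((log), 'e', __VA_ARGS__)
#define warnf_model(log, ...)  log_model_add((log), 'w', __VA_ARGS__)
#define infof_model(log, ...)  log_model_add((log), 'i', __VA_ARGS__)
```

    Logging five messages into this four-slot model and then reading it back
    yields the second through fifth messages, each carrying its prefix.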

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Provide an fsopen() system call that starts the process of preparing to
    create a superblock that will then be mountable, using an fd as a context
    handle. fsopen() is given the name of the filesystem that will be used:

    int mfd = fsopen(const char *fsname, unsigned int flags);

    where flags can be 0 or FSOPEN_CLOEXEC.

    For example:

    sfd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(sfd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_STRING, "sb", "1", 0);
    fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    fsinfo(sfd, NULL, ...); // query new superblock attributes
    mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME);
    move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

    sfd = fsopen("afs", 0);
    fsconfig(sfd, FSCONFIG_SET_STRING, "source",
             "#grand.central.org:root.cell", 0);
    fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(sfd, 0, MS_NODEV);
    move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

    If an error is reported at any step, an error message may be available to be
    read() back (ENODATA will be reported if there isn't an error available) in
    the form:

    "e <subsys>:<problem>"
    "e SELinux:Mount on mountpoint not permitted"

    Once fsmount() has been called, further fsconfig() calls will incur EBUSY,
    even if the fsmount() fails. read() is still possible to retrieve error
    information.
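
    A program reading such messages back might split them as in the sketch
    below. It assumes every line follows the form of the SELinux example
    above (a one-character level, a space, a subsystem name, a colon, then
    the text); struct fs_msg and parse_fs_msg() are hypothetical helpers,
    not part of any kernel or libc interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of one message read back from the context fd. */
struct fs_msg {
    char level;            /* 'e', 'w' or 'i' */
    char subsys[32];       /* text before the first ':' (may be empty) */
    const char *problem;   /* points into the caller's line buffer */
};

/* Split one message line; return 0 on success, -1 if it is malformed. */
static int parse_fs_msg(const char *line, struct fs_msg *out)
{
    const char *colon;
    size_t len;

    /* Require a known level character followed by a single space. */
    if (!line[0] || !strchr("ewi", line[0]) || line[1] != ' ')
        return -1;

    colon = strchr(line + 2, ':');
    if (!colon)
        return -1;

    len = (size_t)(colon - (line + 2));
    if (len >= sizeof(out->subsys))
        return -1;

    out->level = line[0];
    memcpy(out->subsys, line + 2, len);
    out->subsys[len] = '\0';
    out->problem = colon + 1;
    return 0;
}
```

    Lines that lack the colon, such as free-form informational text, are
    rejected by this sketch; a real reader would probably want to pass them
    through unparsed instead.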

    The fsopen() syscall creates a mount context and hangs it off the fd that it
    returns.

    Netlink is not used because it is optional and would make the core VFS
    dependent on the networking layer and also potentially add network
    namespace issues.

    Note that, for the moment, the caller must have CAP_SYS_ADMIN to use
    fsopen().

    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     

28 Feb, 2019

3 commits

  • Implement the ability for filesystems to log error, warning and
    informational messages through the fs_context. In the future, these will
    be extractable by userspace by reading from an fd created by the fsopen()
    syscall.

    Error messages are prefixed with "e ", warnings with "w " and informational
    messages with "i ".

    In the future, inside the kernel, formatted messages will be malloc'd but
    unformatted messages will not be copied if they're either in the core .rodata
    section or in the .rodata section of the filesystem module pinned by
    fs_context::fs_type. The messages will only be good till the fs_type is
    released.

    Note that the logging object will be shared between duplicated fs_context
    structures. This is so that filesystems such as NFS, which do a mount
    within a mount, can get at least some of the errors from the inner mount.

    Five logging functions are provided for this:

    (1) void logfc(struct fs_context *fc, const char *fmt, ...);

    This logs a message into the context. If the buffer is full, the
    earliest message is discarded.

    (2) void errorf(fc, fmt, ...);

    This wraps logfc() to log an error.

    (3) int invalf(fc, fmt, ...);

    This wraps errorf() and returns -EINVAL for convenience.

    (4) void warnf(fc, fmt, ...);

    This wraps logfc() to log a warning.

    (5) void infof(fc, fmt, ...);

    This wraps logfc() to log an informational message.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • new primitive: vfs_dup_fs_context(). Comes with fs_context
    method (->dup()) for copying the filesystem-specific parts
    of fs_context, along with LSM one (->fs_context_dup()) for
    doing the same to LSM parts.

    [needs better commit message, and change of Author:, anyway]

    Signed-off-by: Al Viro

    Al Viro
     
  • [AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living
    mounts]
    [AV - reordering fs/namespace.c is badly overdue, but let's keep it
    separate from that series]
    [AV - drop simple_pin_fs() change]
    [AV - clean vfs_kern_mount() failure exits up]

    Implement a filesystem context concept to be used during superblock
    creation for mount and superblock reconfiguration for remount.

    The mounting procedure then becomes:

    (1) Allocate new fs_context context.

    (2) Configure the context.

    (3) Create superblock.

    (4) Query the superblock.

    (5) Create a mount for the superblock.

    (6) Destroy the context.
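
    The ordering constraint between steps (2), (3) and (5) can be pictured as
    a small state machine. The sketch below is a userspace model only: the
    phase names, the function names and the specific error codes returned for
    out-of-order calls are assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical phases a context passes through. */
enum ctx_phase {
    CTX_CONFIGURING,      /* step (2): parameters may still be set */
    CTX_AWAITING_MOUNT,   /* step (3) done: superblock exists */
    CTX_MOUNT_ATTEMPTED,  /* step (5) tried, whether or not it worked */
};

struct ctx_model {
    enum ctx_phase phase;
};

/* Step (2): setting a parameter is only allowed while configuring. */
static int model_set_param(struct ctx_model *c)
{
    return c->phase == CTX_CONFIGURING ? 0 : -EBUSY;
}

/* Step (3): creating the superblock ends the configuration phase. */
static int model_create(struct ctx_model *c)
{
    if (c->phase != CTX_CONFIGURING)
        return -EBUSY;
    c->phase = CTX_AWAITING_MOUNT;
    return 0;
}

/*
 * Step (5): creating the mount needs a superblock. mount_ok simulates
 * whether the mount itself succeeds; -EIO is a placeholder failure code.
 */
static int model_mount(struct ctx_model *c, int mount_ok)
{
    if (c->phase != CTX_AWAITING_MOUNT)
        return -EINVAL;
    c->phase = CTX_MOUNT_ATTEMPTED;
    return mount_ok ? 0 : -EIO;
}
```

    In this model a failed mount still moves the context out of the
    configuring phase, so later attempts to set parameters are refused.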

    Rather than calling fs_type->mount(), an fs_context struct is created and
    fs_type->init_fs_context() is called to set it up. Pointers exist for the
    filesystem and LSM to hang their private data off.

    A set of operations has to be set by ->init_fs_context() to provide
    freeing, duplication, option parsing, binary data parsing, validation,
    mounting and superblock filling.

    Legacy filesystems are supported by the provision of a set of legacy
    fs_context operations that build up a list of mount options and then invoke
    fs_type->mount() from within the fs_context ->get_tree() operation. This
    allows all filesystems to be accessed using fs_context.
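
    How a legacy wrapper might build up such an option list can be sketched
    in userspace as below. The struct, the buffer size and the function name
    are hypothetical, and rejecting commas inside keys or values is a
    precaution added for this sketch rather than documented behaviour.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define LEGACY_BUF_SIZE 256   /* illustrative; not a kernel constant */

/* Hypothetical accumulator for a legacy, comma-separated option string. */
struct legacy_opts {
    char buf[LEGACY_BUF_SIZE];
    size_t len;
};

/*
 * Append "key" or "key=value" to the list, comma-separated.
 * Returns 0 on success, -EINVAL if a comma would corrupt the list,
 * or -ENOMEM if the option does not fit in the buffer.
 */
static int legacy_opts_add(struct legacy_opts *o, const char *key,
                           const char *value)
{
    size_t room = sizeof(o->buf) - o->len;
    const char *sep = o->len ? "," : "";
    int n;

    if (strchr(key, ',') || (value && strchr(value, ',')))
        return -EINVAL;

    if (value)
        n = snprintf(o->buf + o->len, room, "%s%s=%s", sep, key, value);
    else
        n = snprintf(o->buf + o->len, room, "%s%s", sep, key);

    if (n < 0 || (size_t)n >= room) {
        o->buf[o->len] = '\0';   /* undo any partial write */
        return -ENOMEM;
    }
    o->len += (size_t)n;
    return 0;
}
```

    Adding "errors"/"continue", then "noacl" with no value, then
    "data"/"journal" produces "errors=continue,noacl,data=journal", which is
    the shape of string a ->mount() implementation expects to parse.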

    It should be noted that, whilst this patch adds a lot of lines of code,
    there is quite a bit of duplication with existing code that can be
    eliminated should all filesystems be converted over.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

31 Jan, 2019

5 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • This is an eventual replacement for vfs_submount() uses. Unlike the
    "mount" and "remount" cases, the users of that thing are not in VFS -
    they are buried in various ->d_automount() instances and rather than
    converting them all at once we introduce the (thankfully small and
    simple) infrastructure here and deal with the prospective users in
    afs, nfs, etc. parts of the series.

    Here we just introduce a new constructor (fs_context_for_submount())
    along with the corresponding enum constant to be put into fc->purpose
    for those.

    Signed-off-by: Al Viro

    Al Viro
     
  • Replace do_remount_sb() with a function, reconfigure_super(), that's
    fs_context aware. The fs_context is expected to be parameterised already
    and have ->root pointing to the superblock to be reconfigured.

    A legacy wrapper is provided that is intended to be called from the
    fs_context ops when those appear, but for now is called directly from
    reconfigure_super(). This wrapper invokes the ->remount_fs() superblock op
    for the moment. It is intended that the remount_fs() op will be phased
    out.

    The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
    that the context is being used for reconfiguration.

    do_umount_root() is provided to consolidate remount-to-R/O for umount and
    emergency remount by creating a context and invoking reconfiguration.

    do_remount(), do_umount() and do_emergency_remount_callback() are switched
    to use the new process.

    [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
    umount / bug, gets rid of pointless complexity]
    [AV -- set ->net_ns in all cases; nfs remount will need that]
    [AV -- shift security_sb_remount() call into reconfigure_super(); the callers
    that didn't do security_sb_remount() have NULL fc->security anyway, so it's
    a no-op for them]

    Signed-off-by: David Howells
    Co-developed-by: Al Viro
    Signed-off-by: Al Viro

    David Howells
     
  • Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
    mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
    Doing it that way is both clumsy and imprecise.

    Consider the callers' tree of vfs_get_tree():
    vfs_get_tree()
    s_umount (in
    do_new_mount_fc()).

    Signed-off-by: Al Viro

    Al Viro
     
  • Introduce a filesystem context concept to be used during superblock
    creation for mount and superblock reconfiguration for remount. This is
    allocated at the beginning of the mount procedure and into it is placed:

    (1) Filesystem type.

    (2) Namespaces.

    (3) Source/Device names (there may be multiple).

    (4) Superblock flags (SB_*).

    (5) Security details.

    (6) Filesystem-specific data, as set by the mount options.

    Accessor functions are then provided to set up a context, parameterise it
    from monolithic mount data (the data page passed to mount(2)) and tear it
    down again.

    A legacy wrapper is provided that implements what will be the basic
    operations, wrapping access to filesystems that aren't yet aware of the
    fs_context.

    Finally, vfs_kern_mount() is changed to make use of the fs_context and
    mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
    [AV -- add missing kstrdup()]
    [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
    [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
    [AV -- merge KERNEL_MOUNT and USER_MOUNT]
    [AV -- don't unlock superblock on success return from vfs_get_tree()]
    [AV -- kill 'reference' argument of init_fs_context()]

    Signed-off-by: David Howells
    Co-developed-by: Al Viro
    Signed-off-by: Al Viro

    David Howells