19 Sep, 2019

1 commit

  • Pull fs-verity support from Eric Biggers:
    "fs-verity is a filesystem feature that provides Merkle tree based
    hashing (similar to dm-verity) for individual readonly files, mainly
    for the purpose of efficient authenticity verification.

    This pull request includes:

    (a) The fs/verity/ support layer and documentation.

    (b) fs-verity support for ext4 and f2fs.

    Compared to the original fs-verity patchset from last year, the UAPI
    to enable fs-verity on a file has been greatly simplified. Lots of
    other things were cleaned up too.

    fs-verity is planned to be used by two different projects on Android;
    most of the userspace code is in place already. Another userspace tool
    ("fsverity-utils"), and xfstests, are also available. e2fsprogs and
    f2fs-tools already have fs-verity support. Other people have shown
    interest in using fs-verity too.

    I've tested this on ext4 and f2fs with xfstests, both the existing
    tests and the new fs-verity tests. This has also been in linux-next
    since July 30 with no reported issues except a couple minor ones I
    found myself and folded in fixes for.

    Ted and I will be co-maintaining fs-verity"

    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    f2fs: add fs-verity support
    ext4: update on-disk format documentation for fs-verity
    ext4: add fs-verity read support
    ext4: add basic fs-verity support
    fs-verity: support builtin file signatures
    fs-verity: add SHA-512 support
    fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
    fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
    fs-verity: add data verification hooks for ->readpages()
    fs-verity: add the hook for file ->setattr()
    fs-verity: add the hook for file ->open()
    fs-verity: add inode and superblock fields
    fs-verity: add Kconfig and the helper functions for hashing
    fs: uapi: define verity bit for FS_IOC_GETFLAGS
    fs-verity: add UAPI header
    fs-verity: add MAINTAINERS file entry
    fs-verity: add a documentation file
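
    As a rough illustration of the Merkle tree based hashing mentioned
    above, here is a minimal userspace sketch. It is not the fs-verity
    implementation: toy_hash() is FNV-1a standing in for SHA-256 or
    SHA-512, the 4-byte block size is chosen only to keep the example
    small, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 4   /* assumption: tiny blocks, for illustration only */
#define TOY_MAX_BLOCKS 64  /* assumption: upper bound on blocks in this sketch */

/* FNV-1a, a non-cryptographic stand-in for a real digest such as SHA-256. */
static uint64_t toy_hash(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash every data block, then hash pairs of hashes upward until one root is left. */
static uint64_t toy_merkle_root(const unsigned char *data, size_t len)
{
    uint64_t level[TOY_MAX_BLOCKS];
    size_t n = (len + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;

    assert(n >= 1 && n <= TOY_MAX_BLOCKS);

    /* Leaf level: one hash per data block (the last block may be short). */
    for (size_t i = 0; i < n; i++) {
        size_t off = i * TOY_BLOCK_SIZE;
        size_t chunk = len - off < TOY_BLOCK_SIZE ? len - off : TOY_BLOCK_SIZE;

        level[i] = toy_hash(data + off, chunk);
    }

    /* Interior levels: an odd node out is paired with a zero hash. */
    while (n > 1) {
        size_t parents = 0;

        for (size_t i = 0; i < n; i += 2) {
            uint64_t pair[2] = { level[i], i + 1 < n ? level[i + 1] : 0 };

            level[parents++] = toy_hash(pair, sizeof(pair));
        }
        n = parents;
    }
    return level[0];
}
```

    Comparing the root of a file's contents against a trusted root is
    enough to detect a change in any block, and a single block can be
    checked against the root by rehashing only the path above it.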

    Linus Torvalds
     

24 Aug, 2019

1 commit

  • EROFS filesystem has been merged into linux-staging for a year.

    EROFS is designed to save storage space while keeping guaranteed
    end-to-end performance for read-only files, with the help of
    reduced metadata, fixed-sized output compression and in-place
    decompression.

    In the past year, EROFS was greatly improved by many people as
    a staging driver, self-tested, betaed by a large number of our
    internal users, successfully applied to almost all in-service
    HUAWEI smartphones as part of EMUI 9.1 and proven to be stable
    enough to be moved out of staging.

    EROFS is a self-contained filesystem driver. Although there are
    still some TODOs to be more generic, we have a dedicated team
    that keeps working on EROFS so that it improves along with the
    Linux kernel, like the other in-kernel filesystems.

    As Pavel suggested, it's better to do this as one commit since git
    can track moves and all history will be preserved this way.

    Let's promote it from staging and enhance it more actively as
    a "real" part of the kernel for a wider range of scenarios!

    Cc: Greg Kroah-Hartman
    Cc: Alexander Viro
    Cc: Andrew Morton
    Cc: Stephen Rothwell
    Cc: Theodore Ts'o
    Cc: Pavel Machek
    Cc: David Sterba
    Cc: Amir Goldstein
    Cc: Christoph Hellwig
    Cc: Darrick J . Wong
    Cc: Dave Chinner
    Cc: Jaegeuk Kim
    Cc: Jan Kara
    Cc: Richard Weinberger
    Cc: Linus Torvalds
    Cc: Chao Yu
    Cc: Miao Xie
    Cc: Li Guifu
    Cc: Fang Wei
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190822213659.5501-1-hsiangkao@aol.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     

29 Jul, 2019

1 commit


17 Jul, 2019

1 commit


15 Jul, 2019

1 commit


08 May, 2019

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Add as a feature case-insensitive directories (the casefold feature)
    using Unicode 12.1.

    Also, the usual largish number of cleanups and bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
    ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
    ext4: fix ext4_show_options for file systems w/o journal
    unicode: refactor the rule for regenerating utf8data.h
    docs: ext4.rst: document case-insensitive directories
    ext4: Support case-insensitive file name lookups
    ext4: include charset encoding information in the superblock
    MAINTAINERS: add Unicode subsystem entry
    unicode: update unicode database unicode version 12.1.0
    unicode: introduce test module for normalized utf8 implementation
    unicode: implement higher level API for string handling
    unicode: reduce the size of utf8data[]
    unicode: introduce code for UTF-8 normalization
    unicode: introduce UTF-8 character database
    ext4: actually request zeroing of inode table after grow
    ext4: cond_resched in work-heavy group loops
    ext4: fix use-after-free race with debug_want_extra_isize
    ext4: avoid drop reference to iloc.bh twice
    ext4: ignore e_value_offs for xattrs with value-in-ea-inode
    ext4: protect journal inode's blocks using block_validity
    ext4: use BUG() instead of BUG_ON(1)
    ...

    Linus Torvalds
     

26 Apr, 2019

1 commit

  • The decomposition and casefolding of UTF-8 characters are described in a
    prefix tree in utf8data.h, which is generated from the Unicode
    Character Database (UCD), published by the Unicode Consortium, and
    should not be edited by hand. The structures in utf8data.h are meant to
    be used for lookup operations by the unicode subsystem, when decoding a
    utf-8 string.

    mkutf8data.c is the source for a program that generates utf8data.h. It
    was written by Olaf Weber from SGI and originally proposed to be merged
    into Linux in 2014. The original proposal performed the compatibility
    decomposition, NFKD, but the current version was modified by me to do
    canonical decomposition, NFD, as suggested by the community. The
    changes from the original submission are:

    * Rebase to mainline.
    * Fix out-of-tree-build.
    * Update makefile to build 11.0.0 ucd files.
    * drop references to xfs.
    * Convert NFKD to NFD.
    * Merge back robustness fixes from original patch. Requested by
    Dave Chinner.

    The original submission is archived at:

    The utf8data.h file can be regenerated using the instructions in
    fs/unicode/README.utf8data.

    - Notes on the update from 8.0.0 to 11.0:

    The structure of the ucd files and special cases have not experienced
    any changes between versions 8.0.0 and 11.0.0. 8.0.0 saw the addition
    of Cherokee LC characters, which is an interesting case for
    case-folding. The update is accompanied by new tests on the test_ucd
    module to catch specific cases. No changes to mkutf8data script were
    required for the updates.

    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Theodore Ts'o

    Gabriel Krisman Bertazi
     

21 Mar, 2019

2 commits

  • Provide an fsopen() system call that starts the process of preparing to
    create a superblock that will then be mountable, using an fd as a context
    handle. fsopen() is given the name of the filesystem that will be used:

    int mfd = fsopen(const char *fsname, unsigned int flags);

    where flags can be 0 or FSOPEN_CLOEXEC.

    For example:

    sfd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(sfd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
    fsconfig(sfd, FSCONFIG_SET_STRING, "sb", "1", 0);
    fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    fsinfo(sfd, NULL, ...); // query new superblock attributes
    mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME);
    move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

    sfd = fsopen("afs", -1);
    fsconfig(sfd, FSCONFIG_SET_STRING, "source",
             "#grand.central.org:root.cell", 0);
    fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
    mfd = fsmount(sfd, 0, MS_NODEV);
    move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

    If an error is reported at any step, an error message may be available to be
    read() back (ENODATA will be reported if there isn't an error available) in
    the form:

    "e <subsys>:<problem>"
    "e SELinux:Mount on mountpoint not permitted"

    Once fsmount() has been called, further fsconfig() calls will incur EBUSY,
    even if the fsmount() fails. read() can still be used to retrieve error
    information.

    The fsopen() syscall creates a mount context and hangs it off the fd that it
    returns.

    Netlink is not used because it is optional and would make the core VFS
    dependent on the networking layer and also potentially add network
    namespace issues.

    Note that, for the moment, the caller must have CAP_SYS_ADMIN to use
    fsopen().

    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Make the anon_inodes facility unconditional so that it can be used by core
    VFS code.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows followup work to be done in independent branches,
    which should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows it to
    be shrunk considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

10 Mar, 2019

1 commit

  • Pull SCSI updates from James Bottomley:
    "This is mostly updates of the usual drivers: arcmsr, qla2xxx, lpfc,
    hisi_sas, target/iscsi and target/core.

    Additionally Christoph refactored gdth as part of the dma changes. The
    major mid-layer change this time is the removal of bidi commands and
    with them the whole of the osd/exofs driver and filesystem. This is a
    major simplification for block and mq in particular"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
    scsi: cxgb4i: validate tcp sequence number only if chip version pf
    scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
    scsi: mpt3sas: Add missing breaks in switch statements
    scsi: aacraid: Fix missing break in switch statement
    scsi: kill command serial number
    scsi: csiostor: drop serial_number usage
    scsi: mvumi: use request tag instead of serial_number
    scsi: dpt_i2o: remove serial number usage
    scsi: st: osst: Remove negative constant left-shifts
    scsi: ufs-bsg: Allow reading descriptors
    scsi: ufs: Allow reading descriptor via raw upiu
    scsi: ufs-bsg: Change the calling convention for write descriptor
    scsi: ufs: Remove unused device quirks
    Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
    scsi: megaraid_sas: Remove a bunch of set but not used variables
    scsi: clean obsolete return values of eh_timed_out
    scsi: sd: Optimal I/O size should be a multiple of physical block size
    scsi: MAINTAINERS: SCSI initiator and target tweaks
    scsi: fcoe: make use of fip_mode enum complete
    ...

    Linus Torvalds
     

09 Mar, 2019

1 commit

  • Pull io_uring IO interface from Jens Axboe:
    "Second attempt at adding the io_uring interface.

    Since the first one, we've added basic unit testing of the three
    system calls, that resides in liburing like the other unit tests that
    we have so far. It'll take a while to get full coverage of it, but
    we're working towards it. I've also added two basic test programs to
    tools/io_uring. One uses the raw interface and has support for all the
    various features that io_uring supports outside of standard IO, like
    fixed files, fixed IO buffers, and polled IO. The other uses the
    liburing API, and is a simplified version of cp(1).

    This adds support for a new IO interface, io_uring.

    io_uring allows an application to communicate with the kernel through
    two rings, the submission queue (SQ) and completion queue (CQ) ring.
    This allows for very efficient handling of IOs, see the v5 posting for
    some basic numbers:

    https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

    Outside of just efficiency, the interface is also flexible and
    extendable, and allows for future use cases like the upcoming NVMe
    key-value store API, networked IO, and so on. It also supports async
    buffered IO, something that we've always failed to support in the
    kernel.

    Outside of basic IO features, it supports async polled IO as well.
    This particular feature has already been tested at Facebook months ago
    for flash storage boxes, with 25-33% improvements. It makes polled IO
    actually useful for real world use cases, where even basic flash sees
    a nice win in terms of efficiency, latency, and performance. These
    boxes were IOPS bound before, now they are not.

    This series adds three new system calls. One for setting up an
    io_uring instance (io_uring_setup(2)), one for submitting/completing
    IO (io_uring_enter(2)), and one for aux functions like registering
    file sets, buffers, etc (io_uring_register(2)). Through the help of
    Arnd, I've coordinated the syscall numbers so merge on that front
    should be painless.

    Jon did a writeup of the interface a while back, which (except for
    minor details that have been tweaked) is still accurate. Find that
    here:

    https://lwn.net/Articles/776703/

    Huge thanks to Al Viro for helping get the reference cycle code
    correct, and to Jann Horn for his extensive reviews focused on both
    security and bugs in general.

    There's a userspace library that provides basic functionality for
    applications that don't need or want to care about how to fiddle with
    the rings directly. It has helpers to allow applications to easily set
    up an io_uring instance, and submit/complete IO through it without
    knowing about the intricacies of the rings. It also includes man pages
    (thanks to Jeff Moyer), and will continue to grow support helper
    functions and features as time progresses. Find it here:

    git://git.kernel.dk/liburing

    Fio has full support for the raw interface, both in the form of an IO
    engine (io_uring), but also with a small test application (t/io_uring)
    that can exercise and benchmark the interface"

    * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
    io_uring: add a few test tools
    io_uring: allow workqueue item to handle multiple buffered requests
    io_uring: add support for IORING_OP_POLL
    io_uring: add io_kiocb ref count
    io_uring: add submission polling
    io_uring: add file set registration
    net: split out functions related to registering inflight socket files
    io_uring: add support for pre-mapped user IO buffers
    block: implement bio helper to add iter bvec pages to bio
    io_uring: batch io_kiocb allocation
    io_uring: use fget/fput_many() for file references
    fs: add fget_many() and fput_many()
    io_uring: support for IO polling
    io_uring: add fsync support
    Add io_uring IO interface

    Linus Torvalds
     

28 Feb, 2019

2 commits

  • The submission queue (SQ) and completion queue (CQ) rings are shared
    between the application and the kernel. This eliminates the need to
    copy data back and forth to submit and complete IO.

    IO submissions use the io_uring_sqe data structure, and completions
    are generated in the form of io_uring_cqe data structures. The SQ
    ring is an index into the io_uring_sqe array, which makes it possible
    to submit a batch of IOs without them being contiguous in the ring.
    The CQ ring is always contiguous, as completion events are inherently
    unordered, and hence any io_uring_cqe entry can point back to an
    arbitrary submission.
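
    The indirection described above can be pictured with a small
    single-threaded simulation. This is only a sketch under stated
    assumptions: the toy_* structures are hypothetical and far simpler
    than the real shared-memory layout, and the memory barriers a real
    application and kernel need are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRIES 4u               /* power of two, so masking wraps the ring */
#define TOY_MASK (TOY_ENTRIES - 1u)

struct toy_sqe { uint64_t user_data; };               /* stand-in for io_uring_sqe */
struct toy_cqe { uint64_t user_data; int32_t res; };  /* stand-in for io_uring_cqe */

struct toy_ring {
    struct toy_sqe sqes[TOY_ENTRIES];  /* submission entries, filled in any order */
    uint32_t sq_array[TOY_ENTRIES];    /* SQ ring: indices into sqes[] */
    uint32_t sq_head, sq_tail;         /* consumer moves head, producer moves tail */
    struct toy_cqe cqes[TOY_ENTRIES];  /* CQ ring: completions stored directly */
    uint32_t cq_head, cq_tail;
};

/* Application side: queue sqes[sqe_index]. Returns 0, or -1 if the SQ ring is full. */
static int toy_submit(struct toy_ring *r, uint32_t sqe_index)
{
    assert(sqe_index < TOY_ENTRIES);
    if (r->sq_tail - r->sq_head == TOY_ENTRIES)
        return -1;
    r->sq_array[r->sq_tail & TOY_MASK] = sqe_index;
    r->sq_tail++;
    return 0;
}

/* "Kernel" side: consume one submission and post its completion.
 * Returns 0, or -1 if nothing is queued or the CQ ring is full. */
static int toy_complete_one(struct toy_ring *r, int32_t res)
{
    uint32_t idx;

    if (r->sq_head == r->sq_tail || r->cq_tail - r->cq_head == TOY_ENTRIES)
        return -1;
    idx = r->sq_array[r->sq_head & TOY_MASK];
    r->sq_head++;
    r->cqes[r->cq_tail & TOY_MASK] =
        (struct toy_cqe){ r->sqes[idx].user_data, res };
    r->cq_tail++;
    return 0;
}

/* Application side: reap one completion. Returns 0, or -1 if none is pending. */
static int toy_reap(struct toy_ring *r, struct toy_cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return -1;
    *out = r->cqes[r->cq_head & TOY_MASK];
    r->cq_head++;
    return 0;
}
```

    Because the SQ ring holds indices, entries 3 and 1 of sqes[] can be
    submitted back to back even though they are not adjacent, and the
    user_data field is what ties each completion to its submission.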

    Two new system calls are added for this:

    io_uring_setup(entries, params)
    Sets up an io_uring instance for doing async IO. On success,
    returns a file descriptor that the application can mmap to
    gain access to the SQ ring, CQ ring, and io_uring_sqes.

    io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
    Initiates IO against the rings mapped to this fd, or waits for
    them to complete, or both. The behavior is controlled by the
    parameters passed in. If 'to_submit' is non-zero, then we'll
    try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
    kernel will wait for 'min_complete' events, if they aren't
    already available. It's valid to set IORING_ENTER_GETEVENTS
    and 'min_complete' == 0 at the same time, this allows the
    kernel to return already completed events without waiting
    for them. This is useful only for polling, as for IRQ
    driven IO, the application can just check the CQ ring
    without entering the kernel.

    With this setup, it's possible to do async IO with a single system
    call. Future developments will enable polled IO with this interface,
    and polled submission as well. The latter will enable an application
    to do IO without doing ANY system calls at all.

    For IRQ driven IO, an application only needs to enter the kernel for
    completions if it wants to wait for them to occur.

    Each io_uring is backed by a workqueue, to support buffered async IO
    as well. We will only punt to an async context if the command would
    need to wait for IO on the device side. Any data that can be accessed
    directly in the page cache is done inline. This avoids the slowness
    issue of usual threadpools, since cached data is accessed as quickly
    as a sync interface.

    Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Because the new API passes in key,value parameters, match_token() cannot be
    used with it. Instead, provide three new helpers to aid with parsing:

    (1) fs_parse(). This takes a parameter and a simple static description of
    all the parameters and maps the key name to an ID. It returns 1 on a
    match, 0 on no match if unknowns should be ignored, and a negative
    error code on a parse error.

    The parameter description includes a list of key names to IDs, desired
    parameter types and a list of enumeration name -> ID mappings.

    [!] Note that for the moment I've required that the key->ID mapping
    array be sorted and unterminated. The size of the array is noted in
    the fsconfig_parser struct. This allows me to use bsearch(), but I'm
    not sure any performance gain is worth the hassle of requiring people
    to keep the array sorted.

    The parameter type array is sized according to the number of parameter
    IDs and is indexed directly. The optional enum mapping array is an
    unterminated, unsorted list and the size goes into the fsconfig_parser
    struct.

    The function can do some additional things:

    (a) If it's not ambiguous and no value is given, the prefix "no" on
    a key name is permitted to indicate that the parameter should
    be considered negatory.

    (b) If the desired type is a single simple integer, it will perform
    an appropriate conversion and store the result in a union in
    the parse result.

    (c) If the desired type is an enumeration, {key ID, name} will be
    looked up in the enumeration list and the matching value will
    be stored in the parse result union.

    (d) Optionally generate an error if the key is unrecognised.

    This is called something like:

    enum rdt_param {
        Opt_cdp,
        Opt_cdpl2,
        Opt_mba_mpbs,
        nr__rdt_params
    };

    const struct fs_parameter_spec rdt_param_specs[nr__rdt_params] = {
        [Opt_cdp] = { fs_param_is_bool },
        [Opt_cdpl2] = { fs_param_is_bool },
        [Opt_mba_mpbs] = { fs_param_is_bool },
    };

    const char *const rdt_param_keys[nr__rdt_params] = {
        [Opt_cdp] = "cdp",
        [Opt_cdpl2] = "cdpl2",
        [Opt_mba_mpbs] = "mba_mbps",
    };

    const struct fs_parameter_description rdt_parser = {
        .name = "rdt",
        .nr_params = nr__rdt_params,
        .keys = rdt_param_keys,
        .specs = rdt_param_specs,
        .no_source = true,
    };

    int rdt_parse_param(struct fs_context *fc,
                        struct fs_parameter *param)
    {
        struct fs_parse_result parse;
        struct rdt_fs_context *ctx = rdt_fc2context(fc);
        int ret;

        ret = fs_parse(fc, &rdt_parser, param, &parse);
        if (ret < 0)
            return ret;

        switch (parse.key) {
        case Opt_cdp:
            ctx->enable_cdpl3 = true;
            return 0;
        case Opt_cdpl2:
            ctx->enable_cdpl2 = true;
            return 0;
        case Opt_mba_mpbs:
            ctx->enable_mba_mbps = true;
            return 0;
        }

        return -EINVAL;
    }

    (2) fs_lookup_param(). This takes a { dirfd, path, LOOKUP_EMPTY? } or
    string value and performs an appropriate path lookup to convert it
    into a path object, which it will then return.

    If the desired type was a blockdev, the type of the looked up inode
    will be checked to make sure it is one.

    This can be used like:

    enum foo_param {
        Opt_source,
        nr__foo_params
    };

    const struct fs_parameter_spec foo_param_specs[nr__foo_params] = {
        [Opt_source] = { fs_param_is_blockdev },
    };

    const char *const foo_param_keys[nr__foo_params] = {
        [Opt_source] = "source",
    };

    const struct constant_table foo_param_alt_keys[] = {
        { "device", Opt_source },
    };

    const struct fs_parameter_description foo_parser = {
        .name = "foo",
        .nr_params = nr__foo_params,
        .nr_alt_keys = ARRAY_SIZE(foo_param_alt_keys),
        .keys = foo_param_keys,
        .alt_keys = foo_param_alt_keys,
        .specs = foo_param_specs,
    };

    int foo_parse_param(struct fs_context *fc,
                        struct fs_parameter *param)
    {
        struct fs_parse_result parse;
        struct foo_fs_context *ctx = foo_fc2context(fc);
        int ret;

        ret = fs_parse(fc, &foo_parser, param, &parse);
        if (ret < 0)
            return ret;

        switch (parse.key) {
        case Opt_source:
            return fs_lookup_param(fc, &foo_parser, param,
                                   &parse, &ctx->source);
        default:
            return -EINVAL;
        }
    }

    (3) lookup_constant(). This takes a table of named constants and looks up
    the given name within it. The table is expected to be sorted such
    that bsearch() can be used upon it.

    Possibly I should require the table be terminated and just use a
    for-loop to scan it instead of using bsearch() to reduce hassle.

    Tables look something like:

    static const struct constant_table bool_names[] = {
        { "0", false },
        { "1", true },
        { "false", false },
        { "no", false },
        { "true", true },
        { "yes", true },
    };

    and a lookup is done with something like:

    b = lookup_constant(bool_names, param->string, -1);
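
    For readers who want to see the sorted-table lookup in isolation,
    here is a userspace sketch built on the C library's bsearch(). It
    is an approximation, not the kernel helper: struct toy_constant and
    toy_lookup_constant() are hypothetical names, and the table size is
    passed explicitly rather than being derived from the array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the shape of the name/value pairs in the tables above. */
struct toy_constant { const char *name; int value; };

/* Must stay sorted by name in strcmp() order for bsearch() to work. */
static const struct toy_constant toy_bool_names[] = {
    { "0", 0 },
    { "1", 1 },
    { "false", 0 },
    { "no", 0 },
    { "true", 1 },
    { "yes", 1 },
};

/* bsearch() comparator: the key is a C string, the element a table entry. */
static int toy_cmp(const void *key, const void *elem)
{
    return strcmp(key, ((const struct toy_constant *)elem)->name);
}

/* Return the value bound to name, or not_found if the table has no such entry. */
static int toy_lookup_constant(const struct toy_constant *tbl, size_t n,
                               const char *name, int not_found)
{
    const struct toy_constant *hit;

    assert(tbl != NULL && name != NULL);
    hit = bsearch(name, tbl, n, sizeof(*tbl), toy_cmp);
    return hit ? hit->value : not_found;
}
```

    The cost of the binary search is the sorting requirement: an entry
    added out of order would silently make some names unfindable, which
    is the hassle the message weighs against a simple linear scan.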

    Additionally, optional validation routines for the parameter description
    are provided that can be enabled at compile time. A later patch will
    invoke these when a filesystem is registered.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

06 Feb, 2019

1 commit


31 Jan, 2019

1 commit

  • Introduce a filesystem context concept to be used during superblock
    creation for mount and superblock reconfiguration for remount. This is
    allocated at the beginning of the mount procedure and into it is placed:

    (1) Filesystem type.

    (2) Namespaces.

    (3) Source/Device names (there may be multiple).

    (4) Superblock flags (SB_*).

    (5) Security details.

    (6) Filesystem-specific data, as set by the mount options.

    Accessor functions are then provided to set up a context, parameterise it
    from monolithic mount data (the data page passed to mount(2)) and tear it
    down again.

    A legacy wrapper is provided that implements what will be the basic
    operations, wrapping access to filesystems that aren't yet aware of the
    fs_context.

    Finally, vfs_kern_mount() is changed to make use of the fs_context and
    mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
    [AV -- add missing kstrdup()]
    [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
    [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
    [AV -- merge KERNEL_MOUNT and USER_MOUNT]
    [AV -- don't unlock superblock on success return from vfs_get_tree()]
    [AV -- kill 'reference' argument of init_fs_context()]

    Signed-off-by: David Howells
    Co-developed-by: Al Viro
    Signed-off-by: Al Viro

    David Howells
     

22 Jan, 2019

1 commit

  • Many file systems use a copy&paste implementation
    of dirent to on-disk file type conversions.

    Create a common implementation to be used by file systems
    with some useful conversion helpers to reduce open coded
    file type conversions in file system code.
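
    The kind of shared helper this describes might look roughly like
    the userspace sketch below. All toy_* names are hypothetical, and
    the on-disk type codes and mode bits are written out locally as
    assumptions so that the example compiles without kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* File-type bits of a mode value, spelled out locally. */
#define TOY_S_IFMT   0170000u
#define TOY_S_IFIFO  0010000u
#define TOY_S_IFCHR  0020000u
#define TOY_S_IFDIR  0040000u
#define TOY_S_IFBLK  0060000u
#define TOY_S_IFREG  0100000u
#define TOY_S_IFLNK  0120000u
#define TOY_S_IFSOCK 0140000u

/* On-disk file type codes (values assumed for this sketch). */
enum toy_ftype {
    TOY_FT_UNKNOWN, TOY_FT_REG_FILE, TOY_FT_DIR, TOY_FT_CHRDEV,
    TOY_FT_BLKDEV, TOY_FT_FIFO, TOY_FT_SOCK, TOY_FT_SYMLINK, TOY_FT_MAX
};

/* dirent d_type codes. */
enum toy_dtype {
    TOY_DT_UNKNOWN = 0, TOY_DT_FIFO = 1, TOY_DT_CHR = 2, TOY_DT_DIR = 4,
    TOY_DT_BLK = 6, TOY_DT_REG = 8, TOY_DT_LNK = 10, TOY_DT_SOCK = 12
};

/* One shared table instead of a copy in every filesystem. */
static const unsigned char toy_ftype_to_dtype_tbl[TOY_FT_MAX] = {
    [TOY_FT_UNKNOWN]  = TOY_DT_UNKNOWN,
    [TOY_FT_REG_FILE] = TOY_DT_REG,
    [TOY_FT_DIR]      = TOY_DT_DIR,
    [TOY_FT_CHRDEV]   = TOY_DT_CHR,
    [TOY_FT_BLKDEV]   = TOY_DT_BLK,
    [TOY_FT_FIFO]     = TOY_DT_FIFO,
    [TOY_FT_SOCK]     = TOY_DT_SOCK,
    [TOY_FT_SYMLINK]  = TOY_DT_LNK,
};

/* On-disk type -> dirent type; out-of-range input maps to "unknown". */
static unsigned char toy_ftype_to_dtype(unsigned int ftype)
{
    return ftype < TOY_FT_MAX ? toy_ftype_to_dtype_tbl[ftype] : TOY_DT_UNKNOWN;
}

/* Mode bits -> on-disk type. */
static unsigned char toy_umode_to_ftype(unsigned int mode)
{
    switch (mode & TOY_S_IFMT) {
    case TOY_S_IFREG:  return TOY_FT_REG_FILE;
    case TOY_S_IFDIR:  return TOY_FT_DIR;
    case TOY_S_IFCHR:  return TOY_FT_CHRDEV;
    case TOY_S_IFBLK:  return TOY_FT_BLKDEV;
    case TOY_S_IFIFO:  return TOY_FT_FIFO;
    case TOY_S_IFSOCK: return TOY_FT_SOCK;
    case TOY_S_IFLNK:  return TOY_FT_SYMLINK;
    default:           return TOY_FT_UNKNOWN;
    }
}

/* Mode bits -> dirent type: the type bits shifted down by 12. */
static unsigned char toy_umode_to_dtype(unsigned int mode)
{
    return (unsigned char)((mode & TOY_S_IFMT) >> 12);
}

/* Sanity check: both conversion routes must agree for every file type. */
static void toy_check_consistency(void)
{
    static const unsigned int modes[] = {
        TOY_S_IFREG, TOY_S_IFDIR, TOY_S_IFCHR, TOY_S_IFBLK,
        TOY_S_IFIFO, TOY_S_IFSOCK, TOY_S_IFLNK,
    };

    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++)
        assert(toy_ftype_to_dtype(toy_umode_to_ftype(modes[i])) ==
               toy_umode_to_dtype(modes[i]));
}
```

    With the table and switch in one place, a filesystem only has to
    call the helpers instead of carrying its own copy of the mapping.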

    Signed-off-by: Amir Goldstein
    Signed-off-by: Phillip Potter
    Signed-off-by: Jan Kara

    Phillip Potter
     

11 Jun, 2018

1 commit

  • There's no need to retain the fs/autofs4 directory for backward
    compatibility.

    Adding an AUTOFS4_FS fragment to the autofs Kconfig and a module alias
    for autofs4 is sufficient for almost all cases. Not keeping fs/autofs4
    remnants will prevent "insmod /autofs4/autofs4.ko" from working,
    but insmod shouldn't be used in automation scripts in place of
    modprobe(8) anyway.

    There were some comments about things to look out for with the module
    rename in the fs/autofs4/Kconfig that is removed by this patch, see the
    commit patch if you are interested.

    One potential problem with this change is that when the
    fs/autofs/Kconfig fragment for AUTOFS4_FS is removed any AUTOFS4_FS
    entries will be removed from the kernel config, resulting in no autofs
    file system being built if there is no AUTOFS_FS entry also.

    This would have also happened if the fs/autofs4 remnants had remained
    and is most likely to be a problem with automated builds.

    Please check your build configurations before the removal which will
    occur after the next couple of kernel releases.

    Acked-by: Ian Kent
    [ With edits and commit message from Ian Kent ]
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Jun, 2018

1 commit

  • Create Makefile and Kconfig for autofs module.

    [raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
    Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
    Link: http://lkml.kernel.org/r/152626705591.28589.356365986974038383.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Tested-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     

30 Mar, 2018

1 commit


28 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Dec, 2016

1 commit

  • Logfs was introduced to the kernel in 2009, and hasn't seen any non
    drive-by changes since 2012, while having lots of unsolved issues
    including the complete lack of error handling, with more and more
    issues popping up without any fixes.

    The logfs.org domain has been bouncing mail, and the maintainer
    on the non-logfs.org domain hasn't responded to past queries either.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

21 Jun, 2016

1 commit

  • Add infrastructure for multipage buffered writes. This is implemented
    using a main iterator that applies an actor function to a range that
    can be written.

    This infrastructure is used to implement a buffered write helper, one
    to zero file ranges and one to implement the ->page_mkwrite VM
    operations. All of them borrow a fair amount of code from fs/buffer.c
    for now by using an internal version of __block_write_begin that
    gets passed an iomap and builds the corresponding buffer head.

    The file system gets a set of paired ->iomap_begin and ->iomap_end
    calls which allow it to map/reserve a range and get a notification
    once the write code is finished with it.
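
    The paired-callback pattern described above can be sketched in plain
    userspace C. This is only an illustration of the idea, not the kernel
    code: struct mapping, struct mapping_ops, apply_range() and the demo_*
    helpers are hypothetical names made up for this sketch, and the real
    ->iomap_begin and ->iomap_end signatures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mapped extent. */
struct mapping {
    long long offset;   /* start of the mapped extent */
    long long length;   /* bytes the file system could map at once */
};

/* Hypothetical pair of callbacks supplied by the file system. */
struct mapping_ops {
    /* Map/reserve up to len bytes at pos; 0 on success. */
    int (*begin)(void *fs, long long pos, long long len, struct mapping *m);
    /* Notification of how many of the len bytes were really written. */
    int (*end)(void *fs, long long pos, long long len, long long written);
};

/* The actor processes one mapped chunk and returns the bytes handled. */
typedef long long (*actor_fn)(void *data, long long pos, long long len,
                              const struct mapping *m);

/* Main iterator: walk [pos, pos + len) one mapping at a time. */
long long apply_range(void *fs, const struct mapping_ops *ops,
                      long long pos, long long len,
                      actor_fn actor, void *data)
{
    long long done = 0;

    while (len > 0) {
        struct mapping m = { 0, 0 };
        long long chunk, written;
        int err;

        err = ops->begin(fs, pos, len, &m);
        if (err)
            return done ? done : err;
        if (m.length <= 0) {
            /* Nothing mapped: still pair the begin with an end. */
            ops->end(fs, pos, 0, 0);
            return done ? done : -1;
        }

        chunk = m.length < len ? m.length : len;
        written = actor(data, pos, chunk, &m);

        /* Every successful begin gets exactly one matching end. */
        err = ops->end(fs, pos, chunk, written > 0 ? written : 0);
        if (written <= 0)
            return done ? done : written;

        done += written;
        pos += written;
        len -= written;
        if (err)
            return done;
    }
    return done;
}

/* Demo file system: maps at most 'extent' bytes per begin call. */
struct demo_fs {
    long long extent;
    int begins;
    int ends;
};

int demo_begin(void *fs, long long pos, long long len, struct mapping *m)
{
    struct demo_fs *d = fs;

    d->begins++;
    m->offset = pos;
    m->length = len < d->extent ? len : d->extent;
    return 0;
}

int demo_end(void *fs, long long pos, long long len, long long written)
{
    struct demo_fs *d = fs;

    (void)pos; (void)len; (void)written;
    d->ends++;
    return 0;
}

/* Demo actor: pretends to write the chunk and counts the bytes. */
long long count_actor(void *data, long long pos, long long len,
                      const struct mapping *m)
{
    (void)pos; (void)m;
    *(long long *)data += len;
    return len;
}
```

    With a demo_fs whose extent is 4096, calling apply_range() over 10000
    bytes makes three begin/end pairs (4096 + 4096 + 1808 bytes), which is
    the behaviour the commit message attributes to the iterator.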

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

27 Mar, 2016

1 commit

  • Pull orangefs filesystem from Mike Marshall.

    This finally merges the long-pending orangefs filesystem, which has been
    much cleaned up with input from Al Viro over the last six months. From
    the documentation file:

    "OrangeFS is an LGPL userspace scale-out parallel storage system. It
    is ideal for large storage problems faced by HPC, BigData, Streaming
    Video, Genomics, Bioinformatics.

    Orangefs, originally called PVFS, was first developed in 1993 by Walt
    Ligon and Eric Blumer as a parallel file system for Parallel Virtual
    Machine (PVM) as part of a NASA grant to study the I/O patterns of
    parallel programs.

    Orangefs features include:

    - Distributes file data among multiple file servers
    - Supports simultaneous access by multiple clients
    - Stores file data and metadata on servers using local file system
    and access methods
    - Userspace implementation is easy to install and maintain
    - Direct MPI support
    - Stateless"

    see Documentation/filesystems/orangefs.txt for more in-depth details.

    * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
    orangefs: fix orangefs_superblock locking
    orangefs: fix do_readv_writev() handling of error halfway through
    orangefs: have ->kill_sb() evict the VFS side of things first
    orangefs: sanitize ->llseek()
    orangefs-bufmap.h: trim unused junk
    orangefs: saner calling conventions for getting a slot
    orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
    orangefs: get rid of readdir_handle_s
    ornagefs: ensure that truncate has an up to date inode size
    orangefs: move code which sets i_link to orangefs_inode_getattr
    orangefs: remove needless wrapper around GFP_KERNEL
    orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
    orangefs: refactor inode type or link_target change detection
    orangefs: use new getattr for revalidate and remove old getattr
    orangefs: use new getattr in inode getattr and permission
    orangefs: use new orangefs_inode_getattr to get size in write and llseek
    orangefs: use new orangefs_inode_getattr to create new inodes
    orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
    orangefs: remove inode->i_lock wrapper
    orangefs: put register_chrdev immediately before register_filesystem
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • This patch adds the renamed functions moved from the f2fs crypto files.

    1. definitions for per-file encryption used by ext4 and f2fs.

    2. crypto.c for encrypt/decrypt functions
    a. IO preparation:
    - fscrypt_get_ctx / fscrypt_release_ctx
    b. before IOs:
    - fscrypt_encrypt_page
    - fscrypt_decrypt_page
    - fscrypt_zeroout_range
    c. after IOs:
    - fscrypt_decrypt_bio_pages
    - fscrypt_pullback_bio_page
    - fscrypt_restore_control_page

    3. policy.c supporting context management.
    a. For ioctls:
    - fscrypt_process_policy
    - fscrypt_get_policy
    b. For context permission
    - fscrypt_has_permitted_context
    - fscrypt_inherit_context

    4. keyinfo.c to handle permissions
    - fscrypt_get_encryption_info
    - fscrypt_free_encryption_info

    5. fname.c to support filename encryption
    a. general wrapper functions
    - fscrypt_fname_disk_to_usr
    - fscrypt_fname_usr_to_disk
    - fscrypt_setup_filename
    - fscrypt_free_filename

    b. specific filename handling functions
    - fscrypt_fname_alloc_buffer
    - fscrypt_fname_free_buffer

    6. Makefile and Kconfig

    Cc: Al Viro
    Signed-off-by: Michael Halcrow
    Signed-off-by: Ildar Muslukhov
    Signed-off-by: Uday Savagaonkar
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

16 Nov, 2015

1 commit


15 Oct, 2015

1 commit

  • Prevent clean ext3 filesystems from mounting by default with the ext2
    driver (with no journal!) by putting ext4 ahead of ext2 in the default
    probe order. This will have the effect of mounting ext2 filesystems
    with ext4.ko by default, which is a safer failure than hoping the user
    notices that their journalled ext3 is now running without a journal!

    Users who require ext2.ko for ext2 can either disable ext4.ko or
    explicitly request ext2 via "mount -t ext2" or "rootfstype=ext2".

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o

    Darrick J. Wong
     

03 Oct, 2015

1 commit


05 Sep, 2015

1 commit

  • This allows userfaultfd to be selected during configuration so that it is built.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Pavel Emelyanov
    Cc: Sanidhya Kashyap
    Cc: zhang.zhanghailiang@huawei.com
    Cc: "Kirill A. Shutemov"
    Cc: Andres Lagar-Cavilla
    Cc: Dave Hansen
    Cc: Paolo Bonzini
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Peter Feiner
    Cc: "Dr. David Alan Gilbert"
    Cc: Johannes Weiner
    Cc: "Huangpeng (Peter)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

24 Jul, 2015

1 commit

  • The functionality of ext3 is fully supported by the ext4 driver. Major
    distributions (SUSE, RedHat) have been using the ext4 driver to handle
    ext3 filesystems for quite some time. There is some ugliness in mm
    resulting from jbd cleaning buffers in a dirty page without cleaning the
    page dirty bit, and support for buffer bouncing in the block layer when
    stable pages are required is there only because of jbd. So let's remove
    the ext3 driver. This saves us some 28k lines of duplicated code.

    Acked-by: Theodore Ts'o
    Signed-off-by: Jan Kara

    Jan Kara
     

31 May, 2015

1 commit

  • hppfs (honeypot procfs) was an attempt to use UML as a honeypot.
    It was never stable, nor was it in heavy use.

    Since Al Viro and Christoph Hellwig pointed out some major issues,
    it is better to let it die.

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     

15 Apr, 2015

1 commit

  • Pull tracefs from Steven Rostedt:
    "This adds the new tracefs file system.

    This has been in linux-next for more than one release, as I had it
    ready for the 4.0 merge window, but a last minute thing that needed to
    go into Linux first had to be done. That was that perf hard coded the
    file system number when reading /sys/kernel/debugfs/tracing directory
    making sure that the path had the debugfs mount # before it would
    parse the tracing file. This broke other use cases of perf, and the
    check is removed.

    Now when mounting /sys/kernel/debug, tracefs is automatically mounted
    in /sys/kernel/debug/tracing such that old tools will still see that
    path as expected. But now system admins can mount tracefs directly
    and not need to mount debugfs, which can expose security issues. A
    new directory is created when tracefs is configured such that system
    admins can now mount it separately (/sys/kernel/tracing)"

    * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Have mkdir and rmdir be part of tracefs
    tracefs: Add directory /sys/kernel/tracing
    tracing: Automatically mount tracefs on debugfs/tracing
    tracing: Convert the tracing facility over to use tracefs
    tracefs: Add new tracefs file system
    tracing: Create cmdline tracer options on tracing fs init
    tracing: Only create tracer options files if directory exists
    debugfs: Provide a file creation function that also takes an initial size

    Linus Torvalds
     

18 Feb, 2015

1 commit

  • Pull parisc update from Helge Deller:
    "The major change in here is the removal of the old HP-UX compat code
    which should have made it possible to load and execute 32-bit HP-UX
    binaries on PA-RISC Linux. Since it was never functional and since
    nobody cares about old 32-bit HPUX binaries any longer, it's now time
    to free up 3200 lines of kernel code (CONFIG_HPUX and
    CONFIG_BINFMT_SOM).

    Other than that we wire up the execveat() syscall, fix sparse errors
    and have some whitespace cleanups"

    * 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
    parisc: Remove unused function
    parisc: macro whitespace fixes
    parisc/uaccess: fix sparse errors
    parisc: hpux - Remove HPUX syscall numbers
    parisc: hpux - Remove hpux gateway page
    parisc: hpux - Delete files in hpux subdirectory
    parisc: hpux - Do not compile hpux subdirectory
    parisc: hpux - Drop support for HP-UX binaries
    parisc: Add error checks when building up signal trampoline handler
    parisc: Wire up execveat syscall

    Linus Torvalds
     

17 Feb, 2015

3 commits

  • The parisc arch has been the only user of HP-UX SOM binaries.

    Support for HP-UX executables was never finished and since we now drop support
    for the HP-UX compat layer anyway, it does not make sense to keep the
    BINFMT_SOM support.

    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Signed-off-by: Helge Deller

    Helge Deller
     
  • The fewer Kconfig options we have the better. Use the generic
    CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the generic AIO infrastructure instead of custom read and write
    methods. In addition to giving us support for AIO, this adds the missing
    locking between read() and truncate().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

04 Feb, 2015

1 commit

  • Add a separate file system to handle the tracing directory. Currently it
    is part of debugfs, but that is starting to show its limits.

    One thing is that in order to access the tracing infrastructure, you need
    to mount debugfs. As that includes debugging from all sorts of sub systems
    in the kernel, it is not considered advisable to mount such an all
    encompassing debugging system.

    Having the tracing system in its own file system gives access to the
    tracing sub system without needing to include all other systems.

    Another problem with tracing using the debugfs system is that the
    instances use mkdir to create sub buffers. debugfs does not support mkdir
    from userspace so to implement it, special hacks were used. By controlling
    the file system that the tracing infrastructure uses, this can be properly
    done without hacks.
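
    From userspace, the mkdir use case mentioned above amounts to creating
    a directory under the tracing directory's "instances" subdirectory. A
    minimal sketch follows; make_trace_instance() is a hypothetical helper
    written for this illustration, and the mount point is passed in by the
    caller rather than assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create the tracing instance <name> by calling
 * mkdir(2) on <tracing_dir>/instances/<name>.
 * Returns 0 on success (or if the instance already exists), -1 on error
 * with errno set.
 */
int make_trace_instance(const char *tracing_dir, const char *name)
{
    char path[4096];
    int n;

    n = snprintf(path, sizeof(path), "%s/instances/%s", tracing_dir, name);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* On a real tracing mount, this mkdir is what creates the sub buffer. */
    if (mkdir(path, 0750) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

    On a system where the tracing file system is mounted, the first
    argument would be that mount point and the call would normally need
    root privileges. Any directory containing an "instances" subdirectory
    works for trying the helper out.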

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

11 Dec, 2014

2 commits

  • Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().
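
    One way to observe this from userspace is to open a /proc/*/ns/* entry
    and ask which filesystem the open file lives on. The sketch below is an
    illustration only: is_on_nsfs() is a hypothetical helper, and it assumes
    a kernel that reports NSFS_MAGIC (the constant from <linux/magic.h>,
    repeated here in case the installed headers lack it) for such files.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef NSFS_MAGIC
#define NSFS_MAGIC 0x6e736673 /* same value as in <linux/magic.h> */
#endif

/*
 * Hypothetical helper: returns 1 if the file at path lives on nsfs,
 * 0 if it lives on some other filesystem, and -1 on error.
 */
int is_on_nsfs(const char *path)
{
    struct statfs st;
    int fd, ret;

    /* Opening follows the /proc/<pid>/ns/<name> link to its target. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ret = fstatfs(fd, &st);
    close(fd);
    if (ret != 0)
        return -1;

    return (unsigned long)st.f_type == NSFS_MAGIC ? 1 : 0;
}
```

    For example, is_on_nsfs("/proc/self/ns/mnt") is expected to return 1 on
    a kernel with this change, while an ordinary path such as "/" returns 0.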

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro