20 Sep, 2021

2 commits

  • Attempting to mount a 9p file system as root causes the following kernel panic:

    9pnet_virtio: no channels available for device root
    Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2
    CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc1+ #127
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Call Trace:
    dump_stack_lvl+0x45/0x59
    panic+0x1e2/0x44b
    ? __warn_printk+0xf3/0xf3
    ? free_unref_page+0x2d4/0x4a0
    ? trace_hardirqs_on+0x32/0x120
    ? free_unref_page+0x2d4/0x4a0
    mount_root+0x189/0x1e0
    prepare_namespace+0x136/0x165
    kernel_init_freeable+0x3b8/0x3cb
    ? rest_init+0x2e0/0x2e0
    kernel_init+0x19/0x130
    ret_from_fork+0x1f/0x30
    Kernel Offset: disabled
    ---[ end Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2 ]---

    QEMU command line:
    "qemu-system-x86_64 -append root=/dev/root rw rootfstype=9p rootflags=trans=virtio ..."

    The error occurs because prepare_namespace() truncates root_device_name
    from "/dev/root" to "root" before calling mount_nodev_root().

    As a solution, don't treat errors in mount_nodev_root() as errors that
    require a panic, and allow fallback to the mount flow that existed before
    the patch cited in the Fixes tag.

    Fixes: f9259be6a9e7 ("init: allow mounting arbitrary non-blockdevice filesystems as root")
    Signed-off-by: Leon Romanovsky
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Leon Romanovsky
     
  • split_fs_names() currently takes a comma-separated list of filesystems
    and converts it into individual filesystem strings. It places these
    strings in the buffer passed by the caller and returns the number of
    strings.

    If the caller manages to pass an input string bigger than the buffer,
    we can write beyond the buffer. Even if the string just fits the
    buffer, we still write beyond it, as we append a '\0' byte at the end.

    Pass the size of the input buffer to split_fs_names() and put enough
    checks in place so that such buffer overruns cannot occur.

    This patch does a few things:

    - Add a parameter "size" to split_fs_names(). This specifies the size
    of the input buffer.

    - Use strlcpy() (instead of strcpy()) so that we can't go beyond
    the buffer size. If the input string "names" is larger than the
    passed-in buffer, the input string will be truncated to fit.

    - Stop appending an extra '\0' character at the end, which avoids one
    possibility of going beyond the input buffer size.

    - Do not use an extra loop to count the number of strings.

    - Previously, if one passed "rootfstype=foo,,bar", split_fs_names()
    would return only 1 string, "foo" (and "bar" would be dropped
    due to the extra ,). After this patch, split_fs_names() returns
    3 strings ("foo", a zero-sized string, and "bar").

    Callers of split_fs_names() have been modified to check for
    zero-sized strings and skip to the next one.

    Reported-by: xu xin
    Signed-off-by: Vivek Goyal
    Reviewed-by: Jan Kara
    Signed-off-by: Al Viro

    Vivek Goyal
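
    The behaviour described in this entry can be illustrated with a small
    userspace sketch. It is not the kernel code: split_fs_names_sketch()
    and bounded_copy() are hypothetical names, and bounded_copy() merely
    stands in for the kernel's strlcpy() so the example builds with any
    C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for strlcpy(): copies at most size - 1 bytes, always NUL-terminates. */
static void bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (len >= size)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Copy the comma-separated list in names into buf (truncating to size),
 * turn every ',' into '\0', and return the number of strings. Empty
 * strings are counted too, so callers have to skip them.
 */
static int split_fs_names_sketch(char *buf, size_t size, const char *names)
{
	int count = 1;
	char *p;

	assert(buf && names && size > 0);
	bounded_copy(buf, names, size);
	for (p = buf; *p; p++) {
		if (*p == ',') {
			*p = '\0';
			count++;
		}
	}
	return count;
}
```

    With a 16-byte buffer, "foo,,bar" yields three strings ("foo", an
    empty string, and "bar"); with a 4-byte buffer, "ext4,xfs" is
    truncated to the single string "ext" and nothing is written past
    the end of the buffer.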
     

10 Sep, 2021

1 commit


24 Aug, 2021

1 commit


23 Aug, 2021

3 commits

  • Just output the '\0'-separated list of supported file systems for block
    devices directly rather than going through a pointless round of string
    manipulation.

    Based on an earlier patch from Al Viro.

    Vivek:
    Modified list_bdev_fs_names() and split_fs_names() to return the number
    of null-terminated strings to the caller. Callers now use that
    information to loop through all the strings instead of relying on one
    extra null char being present at the end.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Signed-off-by: Al Viro

    Christoph Hellwig
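
    A count-driven walk over such a buffer might look like the sketch
    below. It is a userspace illustration only; count_nonempty_fs_names()
    is a hypothetical helper, not a function from init/do_mounts.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk `count` consecutive NUL-terminated strings packed into buf and
 * return how many of them are non-empty. The loop is bounded by count
 * alone; it does not look for an extra terminating '\0' after the last
 * string.
 */
static int count_nonempty_fs_names(const char *buf, int count)
{
	const char *p = buf;
	int nonempty = 0;
	int i;

	assert(buf && count >= 0);
	for (i = 0; i < count; i++, p += strlen(p) + 1) {
		if (!*p)
			continue;	/* zero-sized entry: skip to the next one */
		nonempty++;
	}
	return nonempty;
}
```

    A real caller would attempt a mount where this sketch increments
    the counter.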
     
  • Currently the only non-blockdevice filesystems that can be used as the
    initial root filesystem are NFS and CIFS, which use the magic
    "root=/dev/nfs" and "root=/dev/cifs" syntax that requires the root
    device file system details to come from filesystem specific kernel
    command line options.

    Add a little bit of new code that allows passing arbitrary string
    mount options to any non-blockdevice filesystem so that it can
    be mounted as the root file system.

    For example a virtiofs root file system can be mounted using the
    following syntax:

    "root=myfs rootfstype=virtiofs rw"

    Based on an earlier patch from Vivek Goyal.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split get_fs_names into one function that splits up the command line
    argument, and one that gets the list of all registered file systems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

01 Jun, 2021

1 commit


02 Dec, 2020

6 commits

  • Instead of having two structures that represent each block device with
    different life time rules, merge them into a single one. This also
    greatly simplifies the reference counting rules, as we can use the inode
    reference count as the main reference count for the new struct
    block_device, with the device model reference front ending it for device
    model interaction.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Just use the bd_partno field in struct block_device everywhere.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Move the partition_meta_info to struct block_device in preparation for
    killing struct hd_struct.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Avoid a totally pointless goto label, and use the same style of
    comparison for both helpers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The code in devt_from_partuuid is very convoluted. Refactor a bit by
    sanitizing the goto and variable name usage.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Split each case into a self-contained helper, and move the block
    dependent code entirely under the pre-existing #ifdef CONFIG_BLOCK.
    This allows removing the blk_lookup_devt stub in genhd.h.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jan Kara
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 Jul, 2020

3 commits


30 Jul, 2020

1 commit


16 Jul, 2020

1 commit


24 Mar, 2020

1 commit

  • There is no good reason for __bdevname to exist. Just open code
    printing the string in the callers. For three of them the format
    string can be trivially merged into existing printk statements,
    and in init/do_mounts.c we can at least do the scnprintf once at
    the start of the function, independent of CONFIG_BLOCK, to
    make the output for tiny configs a little more helpful.

    Acked-by: Theodore Ts'o # for ext4
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Dec, 2019

1 commit

  • The "trivial conversion" in commit cccaa5e33525 ("init: use do_mount()
    instead of ksys_mount()") was totally broken, since it didn't handle the
    case of a NULL mount data pointer. And while I had "tested" it (and
    presumably Dominik had too) that bug was hidden by me having options.

    Cc: Dominik Brodowski
    Cc: Arnd Bergmann
    Reported-by: Ondřej Jirman
    Reported-by: Guenter Roeck
    Reported-by: Naresh Kamboju
    Reported-and-tested-by: Borislav Petkov
    Tested-by: Chris Clayton
    Tested-by: Eric Biggers
    Tested-by: Geert Uytterhoeven
    Tested-by: Guido Günther
    Signed-off-by: Linus Torvalds

    Linus Torvalds
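
    The missing case can be shown with a userspace sketch. The page-sized
    copy follows the description in the 12 Dec, 2019 entry below; the
    name copy_mount_data(), the SKETCH_PAGE_SIZE constant and the use of
    calloc() are assumptions made for illustration and are not the
    kernel's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/*
 * Copy optional mount data into a zeroed, page-sized buffer of its own.
 * A NULL data pointer means "no options": *out is set to NULL and
 * nothing is allocated or dereferenced. Returns 0 on success and -1 if
 * the allocation fails.
 */
static int copy_mount_data(const char *data, char **out)
{
	char *page;

	assert(out);
	*out = NULL;
	if (!data)
		return 0;	/* the case the broken conversion missed */

	page = calloc(1, SKETCH_PAGE_SIZE);
	if (!page)
		return -1;
	/* Leave the last byte untouched so the copy stays NUL-terminated. */
	strncpy(page, data, SKETCH_PAGE_SIZE - 1);
	*out = page;
	return 0;
}
```

    The caller owns the returned buffer and releases it with free() once
    the mount attempt has finished.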
     

12 Dec, 2019

2 commits

  • In prepare_namespace(), do_mount() can be used instead of ksys_mount()
    as the first and third argument are const strings in the kernel, the
    second and fourth argument are passed through anyway, and the fifth
    argument is NULL.

    In do_mount_root(), ksys_mount() is called with the first and third
    argument being already kernelspace strings, which do not need to be
    copied over from userspace to kernelspace (again). The second and
    fourth arguments are passed through to do_mount() anyway. The fifth
    argument, while already residing in kernelspace, needs to be put into
    a page of its own. Then, do_mount() can be used instead of
    ksys_mount().

    Once this is done, there are no in-kernel users of ksys_mount() left,
    so it can be removed.

    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • In devtmpfs, do_mount() can be called directly instead of complex wrapping
    by ksys_mount():
    - the first and third arguments are const strings in the kernel,
    and do not need to be copied over from userspace;
    - the fifth argument is NULL, and therefore no page needs to be
    copied over from userspace;
    - the second and fourth argument are passed through anyway.

    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

03 Oct, 2019

1 commit

  • Add a new virtual device named /dev/cifs (0xfe) to tell the kernel to
    mount the root file system over the network by using SMB protocol.

    cifs_root_data() is responsible for retrieving the parsed
    information of the new command-line option (cifsroot=) and then calling
    do_mount_root() with the appropriate mount options for cifs.ko.

    Signed-off-by: Paulo Alcantara (SUSE)
    Signed-off-by: David S. Miller

    Paulo Alcantara (SUSE)
     

13 Sep, 2019

1 commit

  • Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new
    internal mount API as the old one will be obsoleted and removed. This
    allows greater flexibility in communication of mount parameters between
    userspace, the VFS and the filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Note that tmpfs is slightly tricky as it can contain embedded commas, so it
    can't be trivially split up using strsep() to break on commas in
    generic_parse_monolithic(). Instead, tmpfs has to supply its own generic
    parser.

    However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers
    around tmpfs or ramfs, must change too - and thus so must ramfs, so these
    had to be converted also.

    [AV: rewritten]

    Signed-off-by: David Howells
    cc: Hugh Dickins
    cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    David Howells
     

06 Sep, 2019

2 commits


20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

05 Jul, 2019

3 commits

  • No point having two call sites for shmem_init() (earlier in
    init_rootfs() from mnt_init() in case we are going to use shmem-style
    rootfs, later from do_basic_setup() unconditionally), along with the
    logic in shmem_init() itself to make the second call a no-op...

    Signed-off-by: Al Viro

    Al Viro
     
  • init_mount_tree() can get to rootfs_fs_type directly and that simplifies
    a lot of things. We don't need to register it, we don't need to look
    it up *and* we don't need to bother with preventing subsequent userland
    mounts. That's the way we should've done that from the very beginning.

    There is a user-visible change, namely the disappearance of "rootfs"
    from /proc/filesystems. Note that it's been unmountable all along
    and it didn't show up in /proc/mounts; however, it *is* a user-visible
    change and theoretically some script might've been using its presence
    in /proc/filesystems to tell 2.4.11+ from earlier kernels.

    *IF* any complaints about behaviour change do show up, we could fake
    it in /proc/filesystems. I very much doubt we'll have to, though.

    Signed-off-by: Al Viro

    Al Viro
     
  • The only thing done by init_ramfs_fs() is making ramfs visible
    to mount(2); we don't need it there - rootfs is separate
    and, in fact, made visible to mount(2) in the same init_rootfs().

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Dec, 2018

1 commit


31 Oct, 2018

1 commit

  • Support referencing the root partition label from GPT as argument
    to the root= option on the kernel command line in analogy to
    referencing the partition uuid as root=PARTUUID=.

    Specifying the partition label instead of the uuid is often much
    easier, e.g. in embedded environments when there is an
    A/B rootfs partition scheme for interruptible firmware updates
    (i.e. rootfsA/ rootfsB).

    The partition label can be queried with the blkid command.

    Link: http://lkml.kernel.org/r/20180822060904.828E510665E@pc-niv.weinmann.com
    Signed-off-by: Nikolaus Voss
    Reviewed-by: Andrew Morton
    Cc: Dominik Brodowski
    Cc: Sasha Levin
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolaus Voss
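
    Telling the two root= forms apart comes down to a prefix check, as
    in the userspace sketch below. The PARTUUID= and PARTLABEL= prefixes
    are the kernel's documented command-line syntax; the enum and
    classify_root_spec() are hypothetical names, and the sketch stops
    before the actual partition lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum root_spec_kind {
	ROOT_SPEC_OTHER,	/* e.g. a device path such as /dev/sda1 */
	ROOT_SPEC_PARTUUID,
	ROOT_SPEC_PARTLABEL,
};

/*
 * Classify the argument of root= and point *value at the part that
 * follows the recognised prefix (or at the whole string if there is
 * no recognised prefix).
 */
static enum root_spec_kind classify_root_spec(const char *name,
					      const char **value)
{
	static const char uuid_prefix[] = "PARTUUID=";
	static const char label_prefix[] = "PARTLABEL=";

	assert(name && value);
	if (strncmp(name, uuid_prefix, sizeof(uuid_prefix) - 1) == 0) {
		*value = name + sizeof(uuid_prefix) - 1;
		return ROOT_SPEC_PARTUUID;
	}
	if (strncmp(name, label_prefix, sizeof(label_prefix) - 1) == 0) {
		*value = name + sizeof(label_prefix) - 1;
		return ROOT_SPEC_PARTLABEL;
	}
	*value = name;
	return ROOT_SPEC_OTHER;
}
```

    For an A/B scheme like the one mentioned above, "PARTLABEL=rootfsA"
    is classified as a label with the value "rootfsA".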
     

23 Aug, 2018

1 commit

  • Sparse checking used to be disabled on init/do_mounts.c and a few related
    files because "Many of the syscalls used in this file expect some of the
    arguments to be __user pointers not __kernel pointers".

    However, since 28128c61e ("kconfig.h: Include compiler types to avoid
    missed struct attributes") the checks are, in fact, not disabled anymore
    because of the earlier include of "linux/compiler_types.h".

    So remove the now ineffective #undefery that was done to disable these
    warnings, as well as the associated comment.

    Link: http://lkml.kernel.org/r/20180617115355.53799-1-luc.vanoostenryck@gmail.com
    Signed-off-by: Luc Van Oostenryck
    Cc: Dominik Brodowski
    Cc: Al Viro
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luc Van Oostenryck
     

03 Apr, 2018

5 commits

  • Using the ksys_read() helper allows us to avoid the in-kernel calls to the
    sys_read() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_read().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the ksys_ioctl() helper allows us to avoid the in-kernel calls to the
    sys_ioctl() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_ioctl().

    After careful review, at least some of these calls could be converted
    to do_vfs_ioctl() in future.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the ksys_open() wrapper allows us to avoid the in-kernel calls to the
    sys_open() syscall. The ksys_ prefix denotes that this function is meant
    as a drop-in replacement for the syscall. In particular, it uses the
    same calling convention as sys_open().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the ksys_close() wrapper allows us to get rid of in-kernel calls
    to the sys_close() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_close(), with one subtle
    difference:

    The few places which checked the return value did not care about the return
    value re-writing in sys_close(), so simply use a wrapper around
    __close_fd().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the ksys_chdir() helper allows us to avoid the in-kernel calls to the sys_chdir()
    syscall. The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_chdir().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski