11 Dec, 2019

1 commit


13 Oct, 2019

2 commits

  • If on boot up, lockdown is activated for tracefs, don't even bother creating
    the files. This can also prevent instances from being created if lockdown is
    in effect.

    Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com

    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
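
The "don't even bother creating the files" behaviour described above can be sketched in user-space C. This is only a rough model: `sketch_locked_down`, `struct sketch_dentry` and `sketch_create_file` are hypothetical stand-ins, not the kernel's API. The only point it shows is that the creation helper returns early, before anything is allocated, when lockdown is in effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's lockdown query. */
static bool sketch_locked_down;

/* Hypothetical, minimal stand-in for a directory entry. */
struct sketch_dentry {
    const char *name;
};

/*
 * Fill in 'slot' and return it, or return NULL without touching
 * 'slot' when lockdown is active.
 */
static struct sketch_dentry *sketch_create_file(struct sketch_dentry *slot,
                                                const char *name)
{
    assert(slot != NULL);

    if (sketch_locked_down)
        return NULL;    /* nothing is created under lockdown */

    slot->name = name;
    return slot;
}
```

Callers in this model have to tolerate a NULL result, since a missing file is the expected outcome under lockdown.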
     
  • Running the latest kernel through my "make instances" stress tests, I
    triggered the following bug (with KASAN and kmemleak enabled):

    mkdir invoked oom-killer:
    gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
    oom_score_adj=0
    CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    Call Trace:
    dump_stack+0x64/0x8c
    dump_header+0x43/0x3b7
    ? trace_hardirqs_on+0x48/0x4a
    oom_kill_process+0x68/0x2d5
    out_of_memory+0x2aa/0x2d0
    __alloc_pages_nodemask+0x96d/0xb67
    __alloc_pages_node+0x19/0x1e
    alloc_slab_page+0x17/0x45
    new_slab+0xd0/0x234
    ___slab_alloc.constprop.86+0x18f/0x336
    ? alloc_inode+0x2c/0x74
    ? irq_trace+0x12/0x1e
    ? tracer_hardirqs_off+0x1d/0xd7
    ? __slab_alloc.constprop.85+0x21/0x53
    __slab_alloc.constprop.85+0x31/0x53
    ? __slab_alloc.constprop.85+0x31/0x53
    ? alloc_inode+0x2c/0x74
    kmem_cache_alloc+0x50/0x179
    ? alloc_inode+0x2c/0x74
    alloc_inode+0x2c/0x74
    new_inode_pseudo+0xf/0x48
    new_inode+0x15/0x25
    tracefs_get_inode+0x23/0x7c
    ? lookup_one_len+0x54/0x6c
    tracefs_create_file+0x53/0x11d
    trace_create_file+0x15/0x33
    event_create_dir+0x2a3/0x34b
    __trace_add_new_event+0x1c/0x26
    event_trace_add_tracer+0x56/0x86
    trace_array_create+0x13e/0x1e1
    instance_mkdir+0x8/0x17
    tracefs_syscall_mkdir+0x39/0x50
    ? get_dname+0x31/0x31
    vfs_mkdir+0x78/0xa3
    do_mkdirat+0x71/0xb0
    sys_mkdir+0x19/0x1b
    do_fast_syscall_32+0xb0/0xed

    I bisected this down to the addition of the proxy_ops into tracefs for
    lockdown. It appears that the allocation of the proxy_ops and then freeing
    it in the destroy_inode callback, is causing havoc with the memory system.
    Reading the documentation about destroy_inode and talking with Linus about
    this, this is buggy and wrong. When defining the destroy_inode() method, it
    is expected that the destroy_inode() will also free the inode, and not just
    the extra allocations done in the creation of the inode. The faulty commit
    causes a memory leak of the inode data structure when inodes are deleted.

    Instead of allocating the proxy_ops (and then having to free it) the checks
    should be done by the open functions themselves, and not hack into the
    tracefs directory. First revert the tracefs updates for locked_down and then
    later we can add the locked_down checks in the kernel/trace files.

    Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home

    Fixes: ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
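
The destroy_inode() contract described in this commit can be modelled with a small user-space sketch. All names here (`sketch_inode`, `sketch_evict`, and so on) are hypothetical, and the "VFS" is reduced to a single function. It assumes only what the message states: once a destroy callback is defined, the caller no longer frees the inode itself.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_inode {
    void *extra;    /* side allocation, e.g. a proxy ops table */
};

static int sketch_live_inodes;    /* inodes allocated and not yet freed */

static struct sketch_inode *sketch_alloc_inode(int with_extra)
{
    struct sketch_inode *inode = calloc(1, sizeof(*inode));

    if (!inode)
        return NULL;
    if (with_extra)
        inode->extra = malloc(16);
    sketch_live_inodes++;
    return inode;
}

static void sketch_free_inode(struct sketch_inode *inode)
{
    assert(inode != NULL);
    free(inode);
    sketch_live_inodes--;
}

typedef void (*sketch_destroy_fn)(struct sketch_inode *inode);

/* Buggy: frees only the side allocation, so the inode itself leaks. */
static void sketch_buggy_destroy(struct sketch_inode *inode)
{
    free(inode->extra);
    inode->extra = NULL;
}

/* Correct: frees the side allocation and the inode. */
static void sketch_correct_destroy(struct sketch_inode *inode)
{
    free(inode->extra);
    sketch_free_inode(inode);
}

/* Model of the caller: it frees the inode only when no callback is set. */
static void sketch_evict(struct sketch_inode *inode, sketch_destroy_fn destroy)
{
    if (destroy)
        destroy(inode);
    else
        sketch_free_inode(inode);
}
```

With `sketch_buggy_destroy`, `sketch_live_inodes` stays at 1 after eviction, which is the leak the bisect pointed at.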
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing an
    upstream implementation. It doesn't meet every distribution requirement,
    but gets us much closer to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
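
The two-level `lockdown={integrity|confidentiality}` policy quoted above can be sketched as an ordered enum. The type and function names are hypothetical and this is not the LSM's actual code. It assumes only what the description says: confidentiality disables everything integrity does, plus more.

```c
#include <assert.h>
#include <string.h>

/* Ordered so that a higher value restricts strictly more. */
enum sketch_lockdown_level {
    SKETCH_LOCKDOWN_NONE = 0,
    SKETCH_LOCKDOWN_INTEGRITY = 1,
    SKETCH_LOCKDOWN_CONFIDENTIALITY = 2
};

/* Map the value of a "lockdown=" option to a level; unknown means none. */
static enum sketch_lockdown_level sketch_parse_lockdown(const char *value)
{
    if (value && strcmp(value, "integrity") == 0)
        return SKETCH_LOCKDOWN_INTEGRITY;
    if (value && strcmp(value, "confidentiality") == 0)
        return SKETCH_LOCKDOWN_CONFIDENTIALITY;
    return SKETCH_LOCKDOWN_NONE;
}

/*
 * 'restricted_at' is the lowest level at which a feature is disabled.
 * The feature is blocked once the current level reaches it.
 */
static int sketch_is_blocked(enum sketch_lockdown_level current,
                             enum sketch_lockdown_level restricted_at)
{
    assert(restricted_at != SKETCH_LOCKDOWN_NONE);
    return current >= restricted_at;
}
```

In this model a feature restricted at the integrity level is also blocked in confidentiality mode, but not the other way round.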
     

20 Aug, 2019

1 commit

  • Tracefs may release more information about the kernel than desirable, so
    restrict it when the kernel is locked down in confidentiality mode by
    preventing open().

    (Fixed by Ben Hutchings to avoid a null dereference in
    default_file_open())

    Signed-off-by: Matthew Garrett
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Ben Hutchings
    Signed-off-by: James Morris

    Matthew Garrett
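
Restricting a file system "by preventing open()" amounts to a check at the top of the open path. The sketch below models that in user space. `sketch_confidentiality_mode`, `struct sketch_file` and `sketch_tracefs_open` are hypothetical names, and the choice of `-EPERM` as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: true when locked down in confidentiality mode. */
static bool sketch_confidentiality_mode;

struct sketch_file {
    bool is_open;
};

/* Refuse the open before any per-file state is set up. */
static int sketch_tracefs_open(struct sketch_file *file)
{
    assert(file != NULL);

    if (sketch_confidentiality_mode)
        return -EPERM;    /* assumed error code */

    file->is_open = true;
    return 0;
}
```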
     

11 Jul, 2019

1 commit

  • Pull fsnotify updates from Jan Kara:
    "This contains cleanups of the fsnotify name removal hook and also a
    patch to disable fanotify permission events for 'proc' filesystem"

    * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    fsnotify: get rid of fsnotify_nameremove()
    fsnotify: move fsnotify_nameremove() hook out of d_delete()
    configfs: call fsnotify_rmdir() hook
    debugfs: call fsnotify_{unlink,rmdir}() hooks
    debugfs: simplify __debugfs_remove_file()
    devpts: call fsnotify_unlink() hook
    tracefs: call fsnotify_{unlink,rmdir}() hooks
    rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
    btrfs: call fsnotify_rmdir() hook
    fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
    fanotify: Disallow permission events for proc filesystem

    Linus Torvalds
     

20 Jun, 2019

1 commit


19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


31 Jul, 2018

1 commit

  • tracefs_ops is initialized inside tracefs_create_instance_dir and not
    modified after. tracefs_create_instance_dir allows for initialization
    only once, and is called from create_trace_instances (marked __init),
    which is called from tracer_init_tracefs (marked __init). Also, mark
    tracefs_create_instance_dir as __init.

    Link: http://lkml.kernel.org/r/20180725171901.4468-1-zsm@chromium.org

    Reviewed-by: Kees Cook
    Signed-off-by: Zubin Mithra
    Signed-off-by: Steven Rostedt (VMware)

    Zubin Mithra
     

06 Jul, 2017

1 commit

  • btrfs, debugfs, reiserfs and tracefs call save_mount_options() and reiserfs
    calls replace_mount_options(), but they then implement their own
    ->show_options() methods and don't touch s_options, rendering the saved
    options unnecessary. I'm trying to eliminate s_options to make it easier
    to implement a context-based mount where the mount options can be passed
    individually over a file descriptor.

    Remove the calls to save/replace_mount_options() in these cases.

    Signed-off-by: David Howells
    cc: Chris Mason
    cc: Greg Kroah-Hartman
    cc: Steven Rostedt
    cc: linux-btrfs@vger.kernel.org
    cc: reiserfs-devel@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     

27 Apr, 2017

1 commit

  • simple_fill_super() is passed an array of tree_descr structures which
    describe the files to create in the filesystem's root directory. Since
    these arrays are never modified intentionally, they should be 'const' so
    that they are placed in .rodata and benefit from memory protection.
    This patch updates the function signature and all users, and also
    constifies tree_descr.name.

    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
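
The effect of constifying the descriptor array can be shown with a reduced sketch. `struct sketch_tree_descr` is a hypothetical, simplified stand-in for tree_descr, and the convention that an empty name ends the table is an assumption made for this example. The table and the parameter are both `const`, so the compiler rejects writes through them and can place the table in read-only data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor: a file name and its mode bits. */
struct sketch_tree_descr {
    const char *name;
    int mode;
};

/* Never modified after build time, so it can be const. */
static const struct sketch_tree_descr sketch_files[] = {
    { "trace",  0644 },
    { "README", 0444 },
    { "",       0    },    /* empty name ends the table (assumed convention) */
};

/* Takes a pointer to const, so it accepts the read-only table above. */
static size_t sketch_count_entries(const struct sketch_tree_descr *files)
{
    size_t n = 0;

    assert(files != NULL);
    while (files[n].name[0] != '\0')
        n++;
    return n;
}
```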
     

28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
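
The granularity point can be illustrated with a small truncation helper. This is a user-space sketch with hypothetical names (`struct sketch_time`, `sketch_truncate_time`). It assumes a granularity expressed in nanoseconds, between 1 and one second, and is not the kernel's current_time() implementation.

```c
#include <assert.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical timestamp: whole seconds plus nanoseconds. */
struct sketch_time {
    long long sec;
    long nsec;
};

/*
 * Round a timestamp down to a multiple of the file system's
 * granularity, given in nanoseconds.
 */
static struct sketch_time sketch_truncate_time(struct sketch_time t,
                                               long gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= SKETCH_NSEC_PER_SEC);
    assert(t.nsec >= 0 && t.nsec < SKETCH_NSEC_PER_SEC);

    t.nsec -= t.nsec % gran_ns;
    return t;
}
```

A file system with one-second granularity would see the nanosecond field cleared, while one with nanosecond granularity keeps the value unchanged.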
     

30 May, 2016

1 commit


23 Jan, 2016

1 commit

  • Add inode_{lock,unlock,trylock,is_locked,lock_nested} wrappers, parallel to
    mutex_{lock,unlock,trylock,is_locked,lock_nested}, with inode_foo(inode)
    being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
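
The wrapper pattern, inode_foo(inode) forwarding to mutex_foo(&inode->i_mutex), can be sketched in user space with a toy lock. Every `sketch_` name is hypothetical and the "mutex" is just a flag, so this shows only the shape of the indirection, not real locking. The benefit is that callers stop naming the lock field, so its type can change later without touching them.

```c
#include <assert.h>

/* Toy lock: a flag only, with no blocking and no thread safety. */
struct sketch_mutex {
    int locked;
};

static void sketch_mutex_lock(struct sketch_mutex *m)
{
    assert(!m->locked);
    m->locked = 1;
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
    assert(m->locked);
    m->locked = 0;
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int sketch_mutex_trylock(struct sketch_mutex *m)
{
    if (m->locked)
        return 0;
    m->locked = 1;
    return 1;
}

struct sketch_inode {
    struct sketch_mutex i_mutex;
};

/* Wrappers: callers no longer mention the i_mutex field directly. */
static void sketch_inode_lock(struct sketch_inode *inode)
{
    sketch_mutex_lock(&inode->i_mutex);
}

static void sketch_inode_unlock(struct sketch_inode *inode)
{
    sketch_mutex_unlock(&inode->i_mutex);
}

static int sketch_inode_trylock(struct sketch_inode *inode)
{
    return sketch_mutex_trylock(&inode->i_mutex);
}

static int sketch_inode_is_locked(const struct sketch_inode *inode)
{
    return inode->i_mutex.locked;
}
```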
     

05 Nov, 2015

1 commit

  • In tracefs' start_creating(), we pin the file system to safely access
    its root. When we fail to create a file, we unpin the file system via
    failed_creating() to release the mount count and eventually the reference
    of the singleton vfsmount.

    However, when we run into an error during lookup_one_len() while still
    in start_creating(), we only release the parent's mutex but not the
    reference on the mount.

    For example, in securityfs_create_file(), after doing simple_pin_fs(), when
    lookup_one_len() fails there, we in fact do simple_release_fs(). This
    seems necessary here as well.

    The same issue is seen in debugfs due to 190afd81e4a5 ("debugfs: split the
    beginning and the end of __create_file() off"), which seems to have been
    carried over into tracefs, too. Noticed during code review.

    Link: http://lkml.kernel.org/r/68efa86101b778cf7517ed7c6ad573bd69f60ec6.1446672850.git.daniel@iogearbox.net

    Fixes: 4282d60689d4 ("tracefs: Add new tracefs file system")
    Cc: stable@vger.kernel.org # 4.1+
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Steven Rostedt

    Daniel Borkmann
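
The error path fixed here can be modelled with a counter standing in for the mount reference. This is a user-space sketch: the `sketch_` functions are hypothetical, and the lookup is assumed to fail on an empty name purely so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int sketch_mount_count;    /* stand-in for the pinned mount reference */

static int sketch_pin_fs(void)
{
    sketch_mount_count++;
    return 0;
}

static void sketch_release_fs(void)
{
    assert(sketch_mount_count > 0);
    sketch_mount_count--;
}

/* Assumed failure condition for the example: empty or missing name. */
static int sketch_lookup(const char *name)
{
    return (name && name[0] != '\0') ? 0 : -ENOENT;
}

/* Pin first, then look up; undo the pin if the lookup fails. */
static int sketch_start_creating(const char *name)
{
    int err = sketch_pin_fs();

    if (err)
        return err;

    err = sketch_lookup(name);
    if (err) {
        sketch_release_fs();    /* the release the fix adds */
        return err;
    }
    return 0;
}
```

Without the release in the failure branch, every failed lookup would leave the counter one higher than before the call.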
     

05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/ s390_hypfs
    /sys/kernel/config/ configfs
    /sys/kernel/debug/ debugfs
    /sys/firmware/efi/efivars/ efivarfs
    /sys/fs/fuse/connections/ fusectl
    /sys/fs/pstore/ pstore
    /sys/kernel/tracing/ tracefs
    /sys/fs/cgroup/ cgroup
    /sys/kernel/security/ securityfs
    /sys/fs/selinux/ selinuxfs
    /sys/fs/smackfs/ smackfs

    Cc: stable@vger.kernel.org
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

24 Jun, 2015

1 commit


04 Feb, 2015

3 commits

  • The tracing "instances" directory can create sub tracing buffers
    with mkdir, and remove them with rmdir. As a mkdir will also create
    all the files and directories that control the sub buffer the inode
    mutexes need to be released before this is done, to avoid deadlocks.
    It is better to let the tracing system unlock the inode mutexes before
    calling the functions that create the files within the new directory
    (or delete the files from the one being destroyed).

    Now that tracing has been converted over to tracefs, the tracefs file
    system can be modified to accommodate this feature. It still releases
    the locks, but the filesystem itself can take care of the ugly
    business and let the user just do what it needs.

    The tracing system now attaches a descriptor to the directory dentry
    that can have userspace create or remove sub directories. If this
    descriptor does not exist for a dentry, then that dentry can not be
    used to create other directories. This descriptor holds a mkdir and
    rmdir method that only takes a character string as an argument.

    The tracefs file system will first make a copy of the dentry name
    before releasing the locks. Then it will pass the copied name to the
    methods. It is up to the tracing system that supplied the methods to
    handle races with duplicate names and such as all the inode mutexes
    would be released when the functions are called.

    Cc: Al Viro
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
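
The sequence described above (copy the name, drop the lock, call the supplied method with only the string, retake the lock) can be sketched in user space. All `sketch_` names are hypothetical, the lock is a plain flag, and returning `-EPERM` for a directory without a descriptor is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int sketch_lock_held;                /* stand-in for the parent inode mutex */
static int sketch_lock_seen_in_callback;    /* recorded by the callback below */
static char sketch_last_name[32];

/* Descriptor attached to a directory that allows mkdir from user space. */
struct sketch_dir_ops {
    int (*mkdir)(const char *name);
};

/* Example method: receives only a character string, as in the description. */
static int sketch_instance_mkdir(const char *name)
{
    sketch_lock_seen_in_callback = sketch_lock_held;
    strncpy(sketch_last_name, name, sizeof(sketch_last_name) - 1);
    sketch_last_name[sizeof(sketch_last_name) - 1] = '\0';
    return 0;
}

static int sketch_syscall_mkdir(const struct sketch_dir_ops *ops,
                                const char *dentry_name)
{
    char *name;
    size_t len;
    int ret;

    assert(dentry_name != NULL);

    if (!ops || !ops->mkdir)
        return -EPERM;    /* no descriptor: mkdir not allowed (assumed code) */

    /* Copy the name while the lock is still held. */
    len = strlen(dentry_name) + 1;
    name = malloc(len);
    if (!name)
        return -ENOMEM;
    memcpy(name, dentry_name, len);

    sketch_lock_held = 0;      /* release before calling out */
    ret = ops->mkdir(name);
    sketch_lock_held = 1;      /* retake for the caller */

    free(name);
    return ret;
}
```

Because the lock is dropped around the call, the method itself has to cope with races such as duplicate names, as the commit message notes.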
     
  • When tracefs is configured, have the directory /sys/kernel/tracing appear
    just like /sys/kernel/debug appears when debugfs is configured.

    This will give a consistent place for system admins to mount tracefs.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add a separate file system to handle the tracing directory. Currently it
    is part of debugfs, but that is starting to show its limits.

    One thing is that in order to access the tracing infrastructure, you need
    to mount debugfs. As that includes debugging from all sorts of sub systems
    in the kernel, it is not considered advisable to mount such an all
    encompassing debugging system.

    Having the tracing system in its own file system gives access to the
    tracing sub system without needing to include all other systems.

    Another problem with tracing using the debugfs system is that the
    instances use mkdir to create sub buffers. debugfs does not support mkdir
    from userspace so to implement it, special hacks were used. By controlling
    the file system that the tracing infrastructure uses, this can be properly
    done without hacks.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)