14 Oct, 2020

1 commit

  • kmemleak-test.c is just a kmemleak test module, and it cannot be used
    as a built-in kernel module. It therefore does not belong in the mm
    directory, so move it to samples/kmemleak/kmemleak-test.c.
    Fix the spelling of built-in while at it.

    Signed-off-by: Hui Su
    Signed-off-by: Andrew Morton
    Cc: Catalin Marinas
    Cc: Jonathan Corbet
    Cc: Mauro Carvalho Chehab
    Cc: David S. Miller
    Cc: Rob Herring
    Cc: Masahiro Yamada
    Cc: Sam Ravnborg
    Cc: Josh Poimboeuf
    Cc: Steven Rostedt (VMware)
    Cc: Miguel Ojeda
    Cc: Divya Indi
    Cc: Tomas Winkler
    Cc: David Howells
    Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk
    Signed-off-by: Linus Torvalds

    Hui Su
     

14 Jun, 2020

2 commits

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix build rules in binderfs sample

    - fix build errors when Kbuild recurses to the top Makefile

    - convert '---help---' in Kconfig to 'help'

    * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    treewide: replace '---help---' in Kconfig files with 'help'
    kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
    samples: binderfs: really compile this sample and fix build issues

    Linus Torvalds
     
  • …git/dhowells/linux-fs

    Pull notification queue from David Howells:
    "This adds a general notification queue concept and adds an event
    source for keys/keyrings, such as linking and unlinking keys and
    changing their attributes.

    Thanks to Debarshi Ray, we do have a pull request to use this to fix a
    problem with gnome-online-accounts - as mentioned last time:

    https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

    Without this, g-o-a has to constantly poll a keyring-based kerberos
    cache to find out if kinit has changed anything.

    [ There are other notifications pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

    LSM hooks are included:

    - A set of hooks are provided that allow an LSM to rule on whether or
    not a watch may be set. Each of these hooks takes a different
    "watched object" parameter, so they're not really shareable. The
    LSM should use current's credentials. [Wanted by SELinux & Smack]

    - A hook is provided to allow an LSM to rule on whether or not a
    particular message may be posted to a particular queue. This is
    given the credentials from the event generator (which may be the
    system) and the watch setter. [Wanted by Smack]

    I've provided SELinux and Smack with implementations of some of these
    hooks.

    WHY
    ===

    Key/keyring notifications are desirable because if you have your
    kerberos tickets in a file/directory, your Gnome desktop will monitor
    that using something like fanotify and tell you if your credentials
    cache changes.

    However, we also have the ability to cache your kerberos tickets in
    the session, user or persistent keyring so that it isn't left around
    on disk across a reboot or logout. Keyrings, however, cannot currently
    be monitored asynchronously, so the desktop has to poll for it - not
    so good on a laptop. This facility will allow the desktop to avoid the
    need to poll.

    DESIGN DECISIONS
    ================

    - The notification queue is built on top of a standard pipe. Messages
    are effectively spliced in. The pipe is opened with a special flag:

    pipe2(fds, O_NOTIFICATION_PIPE);

    The special flag has the same value as O_EXCL (which doesn't seem
    like it will ever be applicable in this context)[?]. It is given up
    front to make it a lot easier to prohibit splice&co from accessing
    the pipe.

    [?] Should this be done some other way? I'd rather not use up a new
    O_* flag if I can avoid it - should I add a pipe3() system call
    instead?

    The pipe is then configured:

    ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
    ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

    Messages are then read out of the pipe using read().

    - It should be possible to allow write() to insert data into the
    notification pipes too, but this is currently disabled as the
    kernel has to be able to insert messages into the pipe *without*
    holding pipe->mutex and the code to make this work needs careful
    auditing.

    - sendfile(), splice() and vmsplice() are disabled on notification
    pipes because of the pipe->mutex issue and also because they
    sometimes want to revert what they just did - but one or more
    notification messages might've been interleaved in the ring.

    - The kernel inserts messages with the wait queue spinlock held. This
    means that pipe_read() and pipe_write() have to take the spinlock
    to update the queue pointers.

    - Records in the buffer are binary, typed and have a length so that
    they can be of varying size.

    This allows multiple heterogeneous sources to share a common
    buffer; there are 16 million types available, of which I've used
    just a few, so there is scope for others to be used. Tags may be
    specified when a watchpoint is created to help distinguish the
    sources.

    - Records are filterable as types have up to 256 subtypes that can be
    individually filtered. Other filtration is also available.

    - Notification pipes don't interfere with each other; each may be
    bound to a different set of watches. Any particular notification
    will be copied to all the queues that are currently watching for it
    - and only those that are watching for it.

    - When recording a notification, the kernel will not sleep, but will
    rather mark a queue as having lost a message if there's
    insufficient space. read() will fabricate a loss notification
    message at an appropriate point later.

    - The notification pipe is created and then watchpoints are attached
    to it, using one of:

    keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
    watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
    watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

    where in each case, fd indicates the queue and the number after is
    a tag between 0 and 255.

    - Watches are removed if either the notification pipe is destroyed or
    the watched object is destroyed. In the latter case, a message will
    be generated indicating the enforced watch removal.
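
    The pipe-creation step described in the design notes above can be
    sketched as follows. This is only an illustration under stated
    assumptions: open_notification_pipe() is a hypothetical helper, and the
    fallback definition of O_NOTIFICATION_PIPE relies on the statement above
    that the flag shares its value with O_EXCL. The ioctl() configuration
    and the watch calls are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Assumption: libc headers do not provide the flag, so fall back to the
 * value the text above gives for it (the same value as O_EXCL).
 */
#ifndef O_NOTIFICATION_PIPE
#define O_NOTIFICATION_PIPE O_EXCL
#endif

/*
 * Hypothetical helper: open a notification pipe.
 * Returns 0 and fills fds[] on success, or a negative errno value, for
 * example when the running kernel has no notification queue support.
 */
static int open_notification_pipe(int fds[2])
{
	assert(fds != NULL);

	if (pipe2(fds, O_NOTIFICATION_PIPE) == -1)
		return -errno;
	return 0;
}
```

    A caller would treat a negative return as "facility not available" and
    fall back to polling, which is the behaviour this facility is meant to
    replace.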

    Things I want to avoid:

    - Introducing features that make the core VFS dependent on the
    network stack or networking namespaces (ie. usage of netlink).

    - Dumping all this stuff into dmesg and having a daemon that sits
    there parsing the output and distributing it as this then puts the
    responsibility for security into userspace and makes handling
    namespaces tricky. Further, dmesg might not exist or might be
    inaccessible inside a container.

    - Letting users see events they shouldn't be able to see.

    TESTING AND MANPAGES
    ====================

    - The keyutils tree has a pipe-watch branch that has keyctl commands
    for making use of notifications. Proposed manual pages can also be
    found on this branch, though a couple of them really need to go to
    the main manpages repository instead.

    If the kernel supports the watching of keys, then running "make
    test" on that branch will cause the testing infrastructure to spawn
    a monitoring process on the side that monitors a notifications pipe
    for all the key/keyring changes induced by the tests and they'll
    all be checked off to make sure they happened.

    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

    - A test program is provided (samples/watch_queue/watch_test) that
    can be used to monitor for keyrings, mount and superblock events.
    Information on the notifications is simply logged to stdout"

    * tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    smack: Implement the watch_key and post_notification hooks
    selinux: Implement the watch_key security hook
    keys: Make the KEY_NEED_* perms an enum rather than a mask
    pipe: Add notification lossage handling
    pipe: Allow buffers to be marked read-whole-or-error for notifications
    Add sample notification program
    watch_queue: Add a key/keyring notification facility
    security: Add hooks to rule on setting a watch
    pipe: Add general notification queue support
    pipe: Add O_NOTIFICATION_PIPE
    security: Add a hook for the point of notification insertion
    uapi: General notification queue definitions

    Linus Torvalds
     

11 Jun, 2020

1 commit

  • Even after commit c624adc9cb6e ("samples: fix binderfs sample"), this
    sample is never compiled.

    'hostprogs' teaches Kbuild that this is a host program, but that alone
    is not enough to compile it. You must also add it to 'always-y' to
    really compile it.

    Since this sample has never been compiled in upstream, various issues
    are left unnoticed.

    [1] compilers without <linux/android/binderfs.h> are still widely used

    <linux/android/binderfs.h> is only available since commit c13295ad219d
    ("binderfs: rename header to binderfs.h"), i.e., Linux 5.0

    If your compiler is based on UAPI headers older than Linux 5.0, you
    will see the following error:

    samples/binderfs/binderfs_example.c:16:10: fatal error: linux/android/binderfs.h: No such file or directory
    #include <linux/android/binderfs.h>
             ^~~~~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.

    You cannot rely on compilers having such a new header.

    The common approach is to install UAPI headers of this kernel into
    usr/include, and then add it to the header search path.

    I added 'depends on HEADERS_INSTALL' in Kconfig, and '-I usr/include'
    compiler flag in Makefile.

    [2] compile the sample for target architecture

    Because headers_install works for the target architecture, only the
    native compiler was able to build sample code that requires
    '-I usr/include'.

    Commit 7f3a59db274c ("kbuild: add infrastructure to build userspace
    programs") added the new syntax 'userprogs' to compile user-space
    programs for the target architecture.

    Use it, and then 'ifndef CROSS_COMPILE' will go away.

    I added 'depends on CC_CAN_LINK' because $(CC) is not necessarily
    capable of linking user-space programs.

    [3] use subdir-y to descend into samples/binderfs

    Since this directory does not contain any kernel-space code, there is
    no point in generating built-in.a or modules.order.

    Replace obj-$(CONFIG_...) with subdir-$(CONFIG_...).

    [4] -Wunused-variable warning

    If I compile this, I see the following warning.

    samples/binderfs/binderfs_example.c: In function 'main':
    samples/binderfs/binderfs_example.c:21:9: warning: unused variable 'len' [-Wunused-variable]
    21 | size_t len;
    | ^~~

    I removed the unused 'len'.

    [5] CONFIG_ANDROID_BINDERFS is not required

    Since this is a user-space standalone program, it is independent of
    the kernel configuration.

    Remove 'depends on ANDROID_BINDERFS'.

    Fixes: 9762dc1432e1 ("samples: add binderfs sample program")
    Fixes: c624adc9cb6e ("samples: fix binderfs sample")
    Signed-off-by: Masahiro Yamada
    Acked-by: Christian Brauner

    Masahiro Yamada
     

19 May, 2020

1 commit

  • The sample program is run like:

    ./samples/watch_queue/watch_test

    and watches "/" for mount changes and the current session keyring for key
    changes:

    # keyctl add user a a @s
    1035096409
    # keyctl unlink 1035096409 @s

    producing:

    # ./watch_test
    read() = 16
    NOTIFY[000]: ty=000001 sy=02 i=00000110
    KEY 2ffc2e5d change=2[linked] aux=1035096409
    read() = 16
    NOTIFY[000]: ty=000001 sy=02 i=00000110
    KEY 2ffc2e5d change=3[unlinked] aux=1035096409

    Other events may be produced, such as with a failing disk:

    read() = 22
    NOTIFY[000]: ty=000003 sy=02 i=00000416
    USB 3-7.7 dev-reset e=0 r=0
    read() = 24
    NOTIFY[000]: ty=000002 sy=06 i=00000418
    BLOCK 00800050 e=6[critical medium] s=64000ef8

    This corresponds to:

    blk_update_request: critical medium error, dev sdf, sector 1677725432 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

    in dmesg.
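
    The "i=" values in the output above appear to carry the record length
    in their low bits: 0x110 ends in 0x10 (16), 0x416 in 0x16 (22) and
    0x418 in 0x18 (24), which matches the read() sizes shown. A minimal
    sketch of that decoding follows. The mask values and helper names are
    assumptions modelled on the WATCH_INFO_* definitions in the UAPI header
    and should be checked against it.

```c
#include <stdint.h>

/* Assumed layout of the 32-bit info word printed as "i=" above. */
#define INFO_LENGTH_MASK 0x0000007fu	/* record length in bytes */
#define INFO_ID_MASK     0x0000ff00u	/* tag given when the watch was set */
#define INFO_ID_SHIFT    8

/* Hypothetical helper: length of the whole record, header included. */
static unsigned int info_record_length(uint32_t info)
{
	return info & INFO_LENGTH_MASK;
}

/* Hypothetical helper: the 0-255 tag identifying the watch. */
static unsigned int info_watch_tag(uint32_t info)
{
	return (info & INFO_ID_MASK) >> INFO_ID_SHIFT;
}
```

    Under these assumptions the key events above decode to a 16-byte
    record with tag 1.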

    Signed-off-by: David Howells

    David Howells
     

17 May, 2020

4 commits

  • Kbuild now supports the 'userprogs' syntax to compile userspace
    programs for the same architecture as the kernel.

    Add the entry to samples/Makefile to put this into the build bot
    coverage.

    I also added the CONFIG option guarded by 'depends on CC_CAN_LINK'
    because $(CC) may not provide libc.

    Signed-off-by: Masahiro Yamada
    Acked-by: Sam Ravnborg

    Masahiro Yamada
     
  • Kbuild now supports the 'userprogs' syntax to compile userspace
    programs for the same architecture as the kernel.

    Add the entry to samples/Makefile to put this into the build bot
    coverage.

    I also added the CONFIG option guarded by 'depends on CC_CAN_LINK'
    because $(CC) may not provide libc.

    Signed-off-by: Masahiro Yamada
    Acked-by: Sam Ravnborg

    Masahiro Yamada
     
  • Kbuild now supports the 'userprogs' syntax to compile userspace
    programs for the same architecture as the kernel.

    Add the entry to samples/Makefile to put this into the build bot
    coverage.

    I also added the CONFIG option guarded by 'depends on CC_CAN_LINK'
    because $(CC) may not provide libc.

    Signed-off-by: Masahiro Yamada
    Acked-by: Miguel Ojeda
    Acked-by: Sam Ravnborg

    Masahiro Yamada
     
  • This userspace program includes UAPI headers exported to usr/include/.
    'make headers' always works for the target architecture (i.e. the same
    architecture as the kernel), so the sample program should be built for
    the target as well. Kbuild now supports 'userprogs' for that.

    Add the entry to samples/Makefile to put this into the build bot
    coverage.

    I also added the CONFIG option guarded by 'depends on CC_CAN_LINK'
    because $(CC) may not provide libc.

    Signed-off-by: Masahiro Yamada
    Acked-by: Sam Ravnborg

    Masahiro Yamada
     

12 May, 2020

1 commit


22 Jan, 2020

1 commit

  • The code in the 'samples' subdirectory isn't part of the kernel, so
    there's no need to validate it.

    Reported-by: Randy Dunlap
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/c4cb4ef635ec606454ab834cb49fc3e9381fb1b1.1579543924.git.jpoimboe@redhat.com

    Josh Poimboeuf
     

28 Nov, 2019

1 commit

  • Pull tracing updates from Steven Rostedt:
    "New tracing features:

    - New PERMANENT flag to ftrace_ops when attaching a callback to a
    function.

    As /proc/sys/kernel/ftrace_enabled when set to zero will disable
    all attached callbacks in ftrace, this has a detrimental impact on
    live kernel tracing, as it disables all that it patched. If a
    ftrace_ops is registered to ftrace with the PERMANENT flag set, it
    will prevent ftrace_enabled from being disabled, and if
    ftrace_enabled is already disabled, it will prevent a ftrace_ops
    with the PERMANENT flag set from being registered.

    - New register_ftrace_direct().

    As eBPF would like to register its own trampolines to be called by
    the ftrace nop locations directly, without going through the ftrace
    trampoline, this function has been added. This allows for eBPF
    trampolines to live alongside ftrace, perf, kprobe and live
    patching. It also utilizes the ftrace enabled_functions file that
    keeps track of functions that have been modified in the kernel, to
    allow for security auditing.

    - Allow for kernel internal use of ftrace instances.

    Subsystems in the kernel can now create and destroy their own
    tracing instances which allows them to have their own tracing
    buffer, and be able to record events without worrying about other
    users writing over their data.

    - New seq_buf_hex_dump() that lets users use the hex_dump() in their
    seq_buf usage.

    - Notifications now added to tracing_max_latency to allow user space
    to know when a new max latency is hit by one of the latency
    tracers.

    - Wider spread use of generic compare operations for use of bsearch
    and friends.

    - More synthetic event fields may be defined (32 up from 16)

    - Use of xarray for architectures with sparse system calls, for the
    system call trace events.

    This along with small clean ups and fixes"

    * tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
    tracing: Enable syscall optimization for MIPS
    tracing: Use xarray for syscall trace events
    tracing: Sample module to demonstrate kernel access to Ftrace instances.
    tracing: Adding new functions for kernel access to Ftrace instances
    tracing: Fix Kconfig indentation
    ring-buffer: Fix typos in function ring_buffer_producer
    ftrace: Use BIT() macro
    ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
    ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
    ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
    ftrace: Add helper find_direct_entry() to consolidate code
    ftrace: Add another check for match in register_ftrace_direct()
    ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
    ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
    tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
    ftrace/samples: Add a sample module that implements modify_ftrace_direct()
    ftrace: Add modify_ftrace_direct()
    tracing: Add missing "inline" in stub function of latency_fsnotify()
    tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
    tracing: Use seq_buf_hex_dump() to dump buffers
    ...

    Linus Torvalds
     

23 Nov, 2019

1 commit

  • This is a sample module to demonstrate the use of the newly introduced and
    exported APIs to access Ftrace instances from within the kernel.

    Newly introduced APIs used here -

    1. Create/Lookup a trace array with the given name.
    struct trace_array *trace_array_get_by_name(const char *name)

    2. Destroy/Remove a trace array.
    int trace_array_destroy(struct trace_array *tr)

    3. Enable/Disable trace events:
    int trace_array_set_clr_event(struct trace_array *tr, const char *system,
    const char *event, bool enable);

    Exported APIs -
    1. trace_printk equivalent for instances.
    int trace_array_printk(struct trace_array *tr,
    unsigned long ip, const char *fmt, ...);

    2. Helper function.
    void trace_printk_init_buffers(void);

    3. To decrement the reference counter.
    void trace_array_put(struct trace_array *tr)

    Sample output (contents of /sys/kernel/tracing/instances/sample-instance;
    NOTE: tracing disabled after ~5 sec):

                                  _-----=> irqs-off
                                 / _----=> need-resched
                                | / _---=> hardirq/softirq
                                || / _--=> preempt-depth
                                ||| /     delay
               TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
                  | |       |   ||||       |         |
    sample-instance-1452 [002] .... 49.430948: simple_thread: trace_array_printk: count=0
    sample-instance-1452 [002] .... 49.430951: sample_event: count value=0 at jiffies=4294716608
    sample-instance-1452 [002] .... 50.454847: simple_thread: trace_array_printk: count=1
    sample-instance-1452 [002] .... 50.454849: sample_event: count value=1 at jiffies=4294717632
    sample-instance-1452 [002] .... 51.478748: simple_thread: trace_array_printk: count=2
    sample-instance-1452 [002] .... 51.478750: sample_event: count value=2 at jiffies=4294718656
    sample-instance-1452 [002] .... 52.502652: simple_thread: trace_array_printk: count=3
    sample-instance-1452 [002] .... 52.502655: sample_event: count value=3 at jiffies=4294719680
    sample-instance-1452 [002] .... 53.526533: simple_thread: trace_array_printk: count=4
    sample-instance-1452 [002] .... 53.526535: sample_event: count value=4 at jiffies=4294720704
    sample-instance-1452 [002] .... 54.550438: simple_thread: trace_array_printk: count=5
    sample-instance-1452 [002] .... 55.574336: simple_thread: trace_array_printk: count=6

    Link: http://lkml.kernel.org/r/1574276919-11119-3-git-send-email-divya.indi@oracle.com

    Reviewed-by: Aruna Ramakrishna
    Signed-off-by: Divya Indi
    [ Moved to samples/ftrace ]
    Signed-off-by: Steven Rostedt (VMware)

    Divya Indi
     

13 Nov, 2019

1 commit


22 Oct, 2019

1 commit

  • Use hostprogs kbuild constructs to compile
    mei sample program mei-amt-version

    Add CONFIG_SAMPLE_INTEL_MEI option to enable/disable
    the feature.

    Signed-off-by: Tomas Winkler
    Link: https://lore.kernel.org/r/20191010132710.4075-1-tomas.winkler@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Tomas Winkler
     

15 Jun, 2019

1 commit

  • Commit 5318321d367c ("samples: disable CONFIG_SAMPLES for UML") used
    a big hammer to fix the build errors under the samples/ directory.
    Only some samples actually include uapi headers from usr/include.

    Introduce CONFIG_HEADERS_INSTALL since 'depends on HEADERS_INSTALL' is
    clearer than 'depends on !UML'. If this option is enabled, uapi headers
    are installed before starting directory descending.

    I added 'depends on HEADERS_INSTALL' to per-sample CONFIG options.
    This allows UML to compile some samples.

    $ make ARCH=um allmodconfig samples/
    [ snip ]
    CC [M] samples/configfs/configfs_sample.o
    CC [M] samples/kfifo/bytestream-example.o
    CC [M] samples/kfifo/dma-example.o
    CC [M] samples/kfifo/inttype-example.o
    CC [M] samples/kfifo/record-example.o
    CC [M] samples/kobject/kobject-example.o
    CC [M] samples/kobject/kset-example.o
    CC [M] samples/trace_events/trace-events-sample.o
    CC [M] samples/trace_printk/trace-printk.o
    AR samples/vfio-mdev/built-in.a
    AR samples/built-in.a

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

18 May, 2019

1 commit


09 May, 2019

1 commit

  • Pull Kbuild updates from Masahiro Yamada:

    - allow users to invoke 'make' out of the source tree

    - refactor scripts/mkmakefile

    - deprecate KBUILD_SRC, which was used to track the source tree
    location for O= build.

    - fix recordmcount.pl in case objdump output is localized

    - turn unresolved symbols in external modules to errors from warnings
    by default; pass KBUILD_MODPOST_WARN=1 to get them back to warnings

    - generate modules.builtin.modinfo to collect .modinfo data from
    built-in modules

    - misc Makefile cleanups

    * tag 'kbuild-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits)
    .gitignore: add more all*.config patterns
    moduleparam: Save information about built-in modules in separate file
    Remove MODULE_ALIAS() calls that take undefined macro
    .gitignore: add leading and trailing slashes to generated directories
    scripts/tags.sh: fix direct execution of scripts/tags.sh
    scripts: override locale from environment when running recordmcount.pl
    samples: kobject: allow CONFIG_SAMPLE_KOBJECT to become y
    samples: seccomp: turn CONFIG_SAMPLE_SECCOMP into a bool option
    kbuild: move Documentation to vmlinux-alldirs
    kbuild: move samples/ to KBUILD_VMLINUX_OBJS
    modpost: make KBUILD_MODPOST_WARN also configurable for external modules
    kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build
    kbuild: remove unneeded dependency for include/config/kernel.release
    memory: squash drivers/memory/Makefile.asm-offsets
    kbuild: use $(srctree) instead of KBUILD_SRC to check out-of-tree build
    kbuild: mkmakefile: generate a simple wrapper of top Makefile
    kbuild: mkmakefile: do not check the generated Makefile marker
    kbuild: allow Kbuild to start from any directory
    kbuild: pass $(MAKECMDGOALS) to sub-make as is
    kbuild: fix warning "overriding recipe for target 'Makefile'"
    ...

    Linus Torvalds
     

08 May, 2019

1 commit

  • Pull mount ABI updates from Al Viro:
    "The syscalls themselves, finally.

    That's not all there is to that stuff, but switching individual
    filesystems to new methods is fortunately independent from everything
    else, so e.g. NFS series can go through NFS tree, etc.

    As those conversions get done, we'll be finally able to get rid of a
    bunch of duplication in fs/super.c introduced in the beginning of the
    entire thing. I expect that to be finished in the next window..."

    * 'work.mount-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Add a sample program for the new mount API
    vfs: syscall: Add fspick() to select a superblock for reconfiguration
    vfs: syscall: Add fsmount() to create a mount for a superblock
    vfs: syscall: Add fsconfig() for configuring and managing a context
    vfs: Implement logging through fs_context
    vfs: syscall: Add fsopen() to prepare for superblock creation
    Make anon_inodes unconditional
    teach move_mount(2) to work with OPEN_TREE_CLONE
    vfs: syscall: Add move_mount(2) to move mounts around
    vfs: syscall: Add open_tree(2) to reference or clone a mount

    Linus Torvalds
     

07 May, 2019

1 commit

  • This is a sample program showing userspace how to get race-free access
    to process metadata from a pidfd. It is rather easy to do and userspace
    can actually simply reuse code that currently parses a process's status
    file in procfs.
    The program can easily be extended into a generic helper suitable for
    inclusion in a libc to make it even easier for userspace to gain metadata
    access.

    Since this came up in a discussion because this API is going to be used
    in various service managers: A lot of programs will have a whitelist
    seccomp filter that returns an error code for all new syscalls. This
    means that programs might get confused if CLONE_PIDFD works but the
    later pidfd_send_signal() syscall doesn't. Hence, here's an
    ahead-of-time check that pidfd_send_signal() is supported:

    bool pidfd_send_signal_supported()
    {
            int procfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
            if (procfd < 0)
                    return false;

            /*
             * A process is always allowed to signal itself so
             * pidfd_send_signal() should never fail this test. If it does
             * it must mean it is not available, blocked by an LSM, seccomp,
             * or other.
             */
            return pidfd_send_signal(procfd, 0, NULL, 0) == 0;
    }
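
    The check above assumes a pidfd_send_signal() wrapper is available. A
    self-contained variant might look like the sketch below, which goes
    through syscall() directly. The names raw_pidfd_send_signal() and
    pidfd_send_signal_probe() are hypothetical, and the fallback to -1 when
    the headers lack the syscall number is an assumption made so that the
    call fails cleanly instead of breaking the build.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: headers older than the syscall get an invalid number. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal -1
#endif

/* Hypothetical raw wrapper, named to avoid clashing with a libc one. */
static int raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
				 unsigned int flags)
{
	return (int)syscall(SYS_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Returns true if the running system accepts pidfd_send_signal(). */
static bool pidfd_send_signal_probe(void)
{
	bool ok;
	int procfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC);

	if (procfd < 0)
		return false;

	/* Signal 0 only runs the checks; nothing is delivered. */
	ok = raw_pidfd_send_signal(procfd, 0, NULL, 0) == 0;

	/* Unlike the snippet above, release the descriptor before returning. */
	close(procfd);
	return ok;
}
```

    A service manager could call the probe once at startup and only rely
    on CLONE_PIDFD when it returns true.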

    Signed-off-by: Christian Brauner
    Co-developed-by: Jann Horn
    Signed-off-by: Jann Horn
    Reviewed-by: Oleg Nesterov
    Cc: Arnd Bergmann
    Cc: "Eric W. Biederman"
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: David Howells
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Andy Lutomirsky
    Cc: Andrew Morton
    Cc: Aleksa Sarai
    Cc: Linus Torvalds
    Cc: Al Viro

    Christian Brauner
     

03 May, 2019

1 commit


21 Mar, 2019

1 commit

  • Add a sample program to demonstrate fsopen/fsmount/move_mount to mount
    something.

    To make it compile on all arches, irrespective of whether or not syscall
    numbers are assigned, define the syscall number as -1 if it isn't
    assigned, which causes the kernel to return -ENOSYS.
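
    The fallback described above can be sketched for fsopen() as follows.
    This is an illustration rather than the sample's actual code: the
    wrapper name sample_fsopen() is hypothetical, chosen so that it cannot
    clash with a libc-provided fsopen().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* No number assigned on this arch or in these headers: use -1. */
#ifndef __NR_fsopen
#define __NR_fsopen -1
#endif

/*
 * Hypothetical wrapper. With the -1 fallback the call still compiles and
 * simply fails at run time (normally with ENOSYS).
 */
static int sample_fsopen(const char *fs_name, unsigned int flags)
{
	assert(fs_name != NULL);
	return (int)syscall(__NR_fsopen, fs_name, flags);
}
```

    Callers then handle failure at run time with an ordinary errno check,
    so no per-architecture #ifdef is needed at the call site.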

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

15 Jan, 2019

1 commit

  • This adds a simple sample program mounting binderfs and adding, then
    removing a binder device. Hopefully, it will be helpful to users who want
    to know how binderfs is supposed to be used.

    Signed-off-by: Christian Brauner
    Signed-off-by: Jonathan Corbet

    Christian Brauner
     

11 Apr, 2018

1 commit

  • Pull remoteproc updates from Bjorn Andersson:

    - add support for generating coredumps for remoteprocs using
    devcoredump

    - add the Qualcomm sysmon driver for intra-remoteproc crash handling

    - a number of fixes in Qualcomm and IMX drivers

    * tag 'rproc-v4.17' of git://github.com/andersson/remoteproc:
    remoteproc: fix null pointer dereference on glink only platforms
    soc: qcom: qmi: add CONFIG_NET dependency
    remoteproc: imx_rproc: Slightly simplify code in 'imx_rproc_probe()'
    remoteproc: imx_rproc: Re-use existing error handling path in 'imx_rproc_probe()'
    remoteproc: imx_rproc: Fix an error handling path in 'imx_rproc_probe()'
    samples: Introduce Qualcomm QMI sample client
    remoteproc: qcom: Introduce sysmon
    remoteproc: Pass type of shutdown to subdev remove
    remoteproc: qcom: Register segments for core dump
    soc: qcom: mdt-loader: Return relocation base
    remoteproc: Rename "load_rsc_table" to "parse_fw"
    remoteproc: Add remote processor coredump support
    remoteproc: Remove null character write of shared mem

    Linus Torvalds
     

16 Mar, 2018

1 commit

  • The Analog Devices Blackfin port was added in 2007 and was rather
    active for a while, but all work on it has come to a standstill
    over time, as Analog have changed their product line-up.

    Aaron Wu confirmed that the architecture port is no longer relevant,
    and multiple people suggested removing blackfin independently because
    of some of its oddities like a non-working SMP port, and the amount of
    duplication between the chip variants, which cause extra work when
    doing cross-architecture changes.

    Link: https://docs.blackfin.uclinux.org/
    Acked-by: Aaron Wu
    Acked-by: Bryan Wu
    Cc: Steven Miao
    Cc: Mike Frysinger
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

13 Feb, 2018

1 commit

  • Introduce a sample driver that registers for server notifications and
    spawns clients for each available test service (service 15). The spawned
    clients implement the interface for encoding "ping" and "data" requests
    and decoding the responses from the remote.

    Acked-By: Chris Lew
    Signed-off-by: Bjorn Andersson

    Bjorn Andersson
     

03 Mar, 2017

1 commit

  • Add a system call to make extended file information available, including
    file creation and some attribute flags where available through the
    underlying filesystem.

    The getattr inode operation is altered to take two additional arguments: a
    u32 request_mask and an unsigned int flags that indicate the
    synchronisation mode. This change is propagated to the vfs_getattr*()
    function.

    Functions like vfs_stat() are now inline wrappers around new functions
    vfs_statx() and vfs_statx_fd() to reduce stack usage.

    ========
    OVERVIEW
    ========

    The idea was initially proposed as a set of xattrs that could be retrieved
    with getxattr(), but the general preference proved to be for a new syscall
    with an extended stat structure.

    A number of requests were gathered for features to be included. The
    following have been included:

    (1) Make the fields a consistent size on all arches and make them large.

    (2) Spare space, request flags and information flags are provided for
    future expansion.

    (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
    __s64).

    (4) Creation time: The SMB protocol carries the creation time, which could
    be exported by Samba, which will in turn help CIFS make use of
    FS-Cache as that can be used for coherency data (stx_btime).

    This is also specified in NFSv4 as a recommended attribute and could
    be exported by NFSD [Steve French].

    (5) Lightweight stat: Ask for just those details of interest, and allow a
    netfs (such as NFS) to approximate anything not of interest, possibly
    without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
    Dilger] (AT_STATX_DONT_SYNC).

    (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
    its cached attributes are up to date [Trond Myklebust]
    (AT_STATX_FORCE_SYNC).

    And the following have been left out for future extension:

    (7) Data version number: Could be used by userspace NFS servers [Aneesh
    Kumar].

    Can also be used to modify fill_post_wcc() in NFSD which retrieves
    i_version directly, but has just called vfs_getattr(). It could get
    it from the kstat struct if it used vfs_xgetattr() instead.

    (There's disagreement on the exact semantics of a single field, since
    not all filesystems do this the same way).

    (8) BSD stat compatibility: Including more fields from the BSD stat such
    as creation time (st_btime) and inode generation number (st_gen)
    [Jeremy Allison, Bernd Schubert].

    (9) Inode generation number: Useful for FUSE and userspace NFS servers
    [Bernd Schubert].

    (This was asked for but later deemed unnecessary with the
    open-by-handle capability available and caused disagreement as to
    whether it's a security hole or not).

    (10) Extra coherency data may be useful in making backups [Andreas Dilger].

    (No particular data were offered, but things like last backup
    timestamp, the data version number and the DOS archive bit would come
    into this category).

    (11) Allow the filesystem to indicate what it can/cannot provide: A
    filesystem can now say it doesn't support a standard stat feature if
    that isn't available, so if, for instance, inode numbers or UIDs don't
    exist or are fabricated locally...

    (This requires a separate system call - I have an fsinfo() call idea
    for this).

    (12) Store a 16-byte volume ID in the superblock that can be returned in
    struct xstat [Steve French].

    (Deferred to fsinfo).

    (13) Include granularity fields in the time data to indicate the
    granularity of each of the times (NFSv4 time_delta) [Steve French].

    (Deferred to fsinfo).

    (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
    Note that the Linux IOC flags are a mess and filesystems such as Ext4
    define flags that aren't in linux/fs.h, so translation in the kernel
    may be a necessity (or, possibly, we provide the filesystem type too).

    (Some attributes are made available in stx_attributes, but the general
    feeling was that the IOC flags were too ext[234]-specific and shouldn't
    be exposed through statx this way).

    (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
    Michael Kerrisk].

    (Deferred, probably to fsinfo. Finding out if there's an ACL or
    seclabel might require extra filesystem operations).

    (16) Femtosecond-resolution timestamps [Dave Chinner].

    (A __reserved field has been left in the statx_timestamp struct for
    this - if there proves to be a need).

    (17) A set multiple attributes syscall to go with this.

    ===============
    NEW SYSTEM CALL
    ===============

    The new system call is:

    int ret = statx(int dfd,
                    const char *filename,
                    unsigned int flags,
                    unsigned int mask,
                    struct statx *buffer);

    The dfd, filename and flags parameters indicate the file to query, in a
    similar way to fstatat(). There is no equivalent of lstat() as that can be
    emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
    also no equivalent of fstat() as that can be emulated by passing a NULL
    filename to statx() with the fd of interest in dfd.

    Whether or not statx() synchronises the attributes with the backing store
    can be controlled by OR'ing a value into the flags argument (this typically
    only affects network filesystems):

    (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
    respect.

    (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
    its attributes with the server - which might require data writeback to
    occur to get the timestamps correct.

    (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
    network filesystem. The resulting values should be considered
    approximate.

    mask is a bitmask indicating the fields in struct statx that are of
    interest to the caller. The user should set this to STATX_BASIC_STATS to
    get the basic set returned by stat(). It should be noted that asking for
    more information may entail extra I/O operations.

    buffer points to the destination for the data. This must be 256 bytes in
    size.

    ======================
    MAIN ATTRIBUTES RECORD
    ======================

    The following structures are defined in which to return the main attribute
    set:

    struct statx_timestamp {
            __s64   tv_sec;
            __s32   tv_nsec;
            __s32   __reserved;
    };

    struct statx {
            __u32   stx_mask;
            __u32   stx_blksize;
            __u64   stx_attributes;
            __u32   stx_nlink;
            __u32   stx_uid;
            __u32   stx_gid;
            __u16   stx_mode;
            __u16   __spare0[1];
            __u64   stx_ino;
            __u64   stx_size;
            __u64   stx_blocks;
            __u64   __spare1[1];
            struct statx_timestamp  stx_atime;
            struct statx_timestamp  stx_btime;
            struct statx_timestamp  stx_ctime;
            struct statx_timestamp  stx_mtime;
            __u32   stx_rdev_major;
            __u32   stx_rdev_minor;
            __u32   stx_dev_major;
            __u32   stx_dev_minor;
            __u64   __spare2[14];
    };

    The defined bits in request_mask and stx_mask are:

    STATX_TYPE              Want/got stx_mode & S_IFMT
    STATX_MODE              Want/got stx_mode & ~S_IFMT
    STATX_NLINK             Want/got stx_nlink
    STATX_UID               Want/got stx_uid
    STATX_GID               Want/got stx_gid
    STATX_ATIME             Want/got stx_atime{,_ns}
    STATX_MTIME             Want/got stx_mtime{,_ns}
    STATX_CTIME             Want/got stx_ctime{,_ns}
    STATX_INO               Want/got stx_ino
    STATX_SIZE              Want/got stx_size
    STATX_BLOCKS            Want/got stx_blocks
    STATX_BASIC_STATS       [The stuff in the normal stat struct]
    STATX_BTIME             Want/got stx_btime{,_ns}
    STATX_ALL               [All currently available stuff]

    stx_btime is the file creation time, stx_mask is a bitmask indicating the
    data provided and __spare*[] are where as-yet undefined fields can be
    placed.

    Time fields are structures with separate seconds and nanoseconds fields
    plus a reserved field in case we want to add even finer resolution. Note
    that times will be negative if before 1970; in such a case, the nanosecond
    fields will also be negative if not zero.
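
    The sign convention described above can be checked with a little
    arithmetic. The following is only a sketch: struct ts_sketch and
    ts_to_ns() are hypothetical names that mirror the statx_timestamp layout
    quoted in this message, and the convention is taken from this text alone,
    so later kernels may behave differently. Because both fields carry the
    same sign, a plain sum gives the signed offset from the 1970 epoch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the statx_timestamp layout quoted above. */
struct ts_sketch {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t reserved;
};

#define NSEC_PER_SEC INT64_C(1000000000)

/*
 * Collapse a timestamp into signed nanoseconds relative to the epoch.
 * Assumes the convention from the text: for times before 1970 both
 * tv_sec and tv_nsec are negative (or zero), never of opposite sign.
 * Overflow for extremely distant times is not handled in this sketch.
 */
static int64_t ts_to_ns(struct ts_sketch ts)
{
    assert(!(ts.tv_sec > 0 && ts.tv_nsec < 0));
    assert(!(ts.tv_sec < 0 && ts.tv_nsec > 0));
    assert(ts.tv_nsec > -NSEC_PER_SEC && ts.tv_nsec < NSEC_PER_SEC);

    return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}
```

    For example, a time 1.5 seconds before the epoch would be carried as
    tv_sec = -1 and tv_nsec = -500000000 under this convention, which sums to
    -1500000000 nanoseconds.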

    The bits defined in the stx_attributes field convey information about a
    file, how it is accessed, where it is and what it does. The following
    attributes map to FS_*_FL flags and are the same numerical value:

    STATX_ATTR_COMPRESSED   File is compressed by the fs
    STATX_ATTR_IMMUTABLE    File is marked immutable
    STATX_ATTR_APPEND       File is append-only
    STATX_ATTR_NODUMP       File is not to be dumped
    STATX_ATTR_ENCRYPTED    File requires key to decrypt in fs

    Within the kernel, the supported flags are listed by:

    KSTAT_ATTR_FS_IOC_FLAGS

    [Are any other IOC flags of sufficient general interest to be exposed
    through this interface?]

    New flags include:

    STATX_ATTR_AUTOMOUNT Object is an automount trigger

    These are for the use of GUI tools that might want to mark files specially,
    depending on what they are.

    Fields in struct statx come in a number of classes:

    (0) stx_dev_*, stx_blksize.

    These are local system information and are always available.

    (1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
    stx_size, stx_blocks.

    These will be returned whether the caller asks for them or not. The
    corresponding bits in stx_mask will be set to indicate whether they
    actually have valid values.

    If the caller didn't ask for them, then they may be approximated. For
    example, NFS won't waste any time updating them from the server,
    unless as a byproduct of updating something requested.

    If the values don't actually exist for the underlying object (such as
    UID or GID on a DOS file), then the bit won't be set in the stx_mask,
    even if the caller asked for the value. In such a case, the returned
    value will be a fabrication.

    Note that there are instances where the type might not be valid, for
    instance Windows reparse points.

    (2) stx_rdev_*.

    This will be set only if stx_mode indicates we're looking at a
    blockdev or a chardev, otherwise it will be 0.

    (3) stx_btime.

    Similar to (1), except this will be set to 0 if it doesn't exist.

    =======
    TESTING
    =======

    The following test program can be used to test the statx system call:

    samples/statx/test-statx.c

    Just compile and run, passing it paths to the files you want to examine.
    The file is built automatically if CONFIG_SAMPLES is enabled.

    Here's some example output. Firstly, an NFS directory that crosses to
    another FSID. Note that the AUTOMOUNT attribute is set because transiting
    this directory will cause d_automount to be invoked by the VFS.

    [root@andromeda ~]# /tmp/test-statx -A /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:26 Inode: 1703937 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000
    Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

    Secondly, the result of automounting on that directory.

    [root@andromeda ~]# /tmp/test-statx /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:27 Inode: 2 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

30 Dec, 2016

1 commit


10 Oct, 2016

1 commit

  • Move blackfin gptimers-example to samples and remove it from Documentation
    Makefile. Update samples Kconfig and Makefile to build gptimers-example.

    blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile.
    Hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from
    Makefile and lib/Kconfig.debug and updates VIDEO_PCI_SKELETON dependency
    on BUILD_DOCSRC.

    Documentation/Makefile is not deleted to avoid breaking make htmldocs and
    make distclean.

    Acked-by: Michal Marek
    Acked-by: Jonathan Corbet
    Reviewed-by: Kees Cook
    Reported-by: Valentin Rothberg
    Reported-by: Paul Gortmaker
    Signed-off-by: Shuah Khan

    Shuah Khan
     

20 Jun, 2016

1 commit

  • Add sample code to test trace_printk(). The trace_printk() functions should
    never be used in production code. This makes testing it a bit more
    difficult. Having a sample module that can test use cases of trace_printk()
    can help out.

    Currently it just tests trace_printk() where it will be converted into:

    trace_bputs()
    trace_puts()
    trace_bprintk()

    as well as staying as the normal _trace_printk().

    It also tests its use in interrupt context as that will test the auxiliary
    buffers.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

28 Apr, 2016

2 commits

  • A small bug with the new autoksyms support showed that there are
    two kernel modules in the Documentation directory that qualify
    as samples, while all other samples are in the samples/ directory.

    This patch was originally meant as a workaround for that bug, but
    it has now been solved in a different way. However, I still think
    it makes sense as a cleanup to consolidate all sample code in
    one place.

    Signed-off-by: Arnd Bergmann
    Acked-by: Hans Verkuil
    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Jonathan Corbet

    Arnd Bergmann
     
  • A small bug with the new autoksyms support showed that there are
    two kernel modules in the Documentation directory that qualify
    as samples, while all other samples are in the samples/ directory.

    This patch was originally meant as a workaround for that bug, but
    it has now been solved in a different way. However, I still think
    it makes sense as a cleanup to consolidate all sample code in
    one place.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jonathan Corbet

    Arnd Bergmann
     

14 Oct, 2015

1 commit

  • Remove the old show_attribute and store_attribute methods and update
    the documentation. Also replace the two C samples with a single new
    one in the proper samples directory where people expect to find it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Nicholas Bellinger

    Christoph Hellwig
     

22 Dec, 2014

1 commit


26 Jan, 2013

1 commit

  • The tracepoint sample code was used to teach developers how to
    create their own tracepoints. But now the trace_events have been
    added as a higher level that is used directly by developers today.

    Only the trace_event code should use the tracepoint interface
    directly and no new tracepoints should be added.

    Besides, the example had a race condition with the use of the
    ->d_name.name dentry field, as pointed out by Al Viro.

    Best just to remove the code so it won't be used by other developers.

    Link: http://lkml.kernel.org/r/20130123225523.GY4939@ZenIV.linux.org.uk

    Cc: Al Viro
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Apr, 2012

1 commit

  • Documents how system call filtering using Berkeley Packet
    Filter programs works and how it may be used.
    Includes an example for x86 and a semi-generic
    example using a macro-based code generator.
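
    To give a feel for the mechanism, here is a minimal userspace sketch of
    a filter that makes one system call fail with an errno value and allows
    everything else. It is not the sample shipped with this patch. The helper
    name deny_syscall_with_errno() and the SKETCH_AUDIT_ARCH macro are
    hypothetical, and the sketch only knows about x86_64, i386 and arm64. It
    checks the architecture first and sets PR_SET_NO_NEW_PRIVS before
    installing the filter, in line with the notes in the changelog below.

```c
#include <assert.h>
#include <errno.h>        /* errno values for the usage shown below */
#include <linux/audit.h>  /* AUDIT_ARCH_* */
#include <linux/filter.h> /* struct sock_filter, struct sock_fprog, BPF_* */
#include <linux/seccomp.h>/* struct seccomp_data, SECCOMP_RET_* */
#include <stddef.h>       /* offsetof */
#include <sys/prctl.h>    /* prctl(), PR_SET_SECCOMP, PR_SET_NO_NEW_PRIVS */
#include <sys/syscall.h>  /* SYS_* numbers for the usage shown below */
#include <unistd.h>

/* Hypothetical macro: the audit arch this sketch expects to run on. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__i386__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_I386
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64, i386 and arm64"
#endif

/*
 * Hypothetical helper: make system call 'nr' fail with errno 'err' for the
 * calling thread; all other system calls are allowed. The filter cannot be
 * removed once installed. Returns 0 on success, -1 with errno set.
 */
static int deny_syscall_with_errno(int nr, int err)
{
    struct sock_filter filter[] = {
        /* Load the architecture and kill on a mismatch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the system call number and compare it with 'nr'. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | ((unsigned int)err & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    assert(nr >= 0 && err > 0 && err < 4096);

    /* Required before an unprivileged task may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

    After deny_syscall_with_errno(SYS_getppid, EPERM) succeeds, a later
    syscall(SYS_getppid) in the same thread should return -1 with errno set
    to EPERM, while unrelated calls such as getpid() keep working.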

    Acked-by: Eric Paris
    Signed-off-by: Will Drewry
    Acked-by: Kees Cook

    v18: - added acked by
    - update no new privs numbers
    v17: - remove @compat note and add Pitfalls section for arch checking
    (keescook@chromium.org)
    v16: -
    v15: -
    v14: - rebase/nochanges
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    v12: - comment on the ptrace_event use
    - update arch support comment
    - note the behavior of SECCOMP_RET_DATA when there are multiple filters
    (keescook@chromium.org)
    - lots of samples/ clean up incl 64-bit bpf-direct support
    (markus@chromium.org)
    - rebase to linux-next
    v11: - overhaul return value language, updates (keescook@chromium.org)
    - comment on do_exit(SIGSYS)
    v10: - update for SIGSYS
    - update for new seccomp_data layout
    - update for ptrace option use
    v9: - updated bpf-direct.c for SIGILL
    v8: - add PR_SET_NO_NEW_PRIVS to the samples.
    v7: - updated for all the new stuff in v7: TRAP, TRACE
    - only talk about PR_SET_SECCOMP now
    - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com)
    - adds dropper.c: a simple system call disabler
    v6: - tweak the language to note the requirement of
    PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
    v5: - update sample to use system call arguments
    - adds a "fancy" example using a macro-based generator
    - cleaned up bpf in the sample
    - update docs to mention arguments
    - fix prctl value (eparis@redhat.com)
    - language cleanup (rdunlap@xenotime.net)
    v4: - update for no_new_privs use
    - minor tweaks
    v3: - call out BPF Berkeley Packet Filter (rdunlap@xenotime.net)
    - document use of tentative always-unprivileged
    - guard sample compilation for i386 and x86_64
    v2: - move code to samples (corbet@lwn.net)
    Signed-off-by: James Morris

    Will Drewry
     

09 Feb, 2012

1 commit

  • Add an rpmsg driver sample, which demonstrates how to communicate with
    an AMP-configured remote processor over the rpmsg bus.

    Note how once probed, the driver can immediately start sending messages
    using the rpmsg_send() API, without having to worry about creating endpoints
    or allocating rpmsg addresses: all that work is done by the rpmsg bus,
    and the required information is already embedded in the rpmsg channel
    that the driver is probed with.

    In this sample, the driver simply sends a "Hello World!" message to the remote
    processor repeatedly.

    Designed with Brian Swetland.

    Signed-off-by: Ohad Ben-Cohen
    Cc: Brian Swetland
    Cc: Arnd Bergmann
    Cc: Grant Likely
    Cc: Tony Lindgren
    Cc: Russell King
    Cc: Rusty Russell
    Cc: Andrew Morton
    Cc: Greg KH
    Cc: Stephen Boyd

    Ohad Ben-Cohen
     

22 Mar, 2011

1 commit


30 Oct, 2010

1 commit


11 Aug, 2010

1 commit

  • Add four examples to the kernel sample directory.

    They show how to handle:
    - a byte stream fifo
    - an integer type fifo
    - a dynamic record sized fifo
    - the fifo DMA functions

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stefani Seibold
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefani Seibold