17 Nov, 2013

1 commit

  • Pull tracing update from Steven Rostedt:
    "This batch of changes is mostly clean ups and small bug fixes. The
    only real feature that was added this release is from Namhyung Kim,
    who introduced "set_graph_notrace" filter that lets you run the
    function graph tracer and not trace particular functions and their
    call chain.

    Tom Zanussi added some updates to the ftrace multibuffer tracing that
    made it more consistent with the top level tracing.

    One of the fixes for perf function tracing required an API change in
    RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing
    that change in this release too, he gave me a branch that included all
    the changes to get that working, and I pulled that into my tree in
    order to complete the perf function tracing fix"

    * tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add rcu annotation for syscall trace descriptors
    tracing: Do not use signed enums with unsigned long long in fgragh output
    tracing: Remove unused function ftrace_off_permanent()
    tracing: Do not assign filp->private_data to freed memory
    tracing: Add helper function tracing_is_disabled()
    tracing: Open tracer when ftrace_dump_on_oops is used
    tracing: Add support for SOFT_DISABLE to syscall events
    tracing: Make register/unregister_ftrace_command __init
    tracing: Update event filters for multibuffer
    recordmcount.pl: Add support for __fentry__
    ftrace: Have control op function callback only trace when RCU is watching
    rcu: Do not trace rcu_is_watching() functions
    ftrace/x86: skip over the breakpoint for ftrace caller
    trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
    ftrace: Add set_graph_notrace filter
    ftrace: Narrow down the protected area of graph_lock
    ftrace: Introduce struct ftrace_graph_data
    ftrace: Get rid of ftrace_graph_filter_enabled
    tracing: Fix potential out-of-bounds in trace_get_user()
    tracing: Show more exact help information about snapshot

    Linus Torvalds
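
    A minimal sketch of driving the new filter from C, assuming the
    tracing directory lives at /sys/kernel/debug/tracing (the path used by
    other entries in this log) and that the caller has permission to write
    there. The helper names write_control_file() and graph_trace_without()
    are hypothetical; they are not part of any kernel or libc API.

```c
#include <stdio.h>

/* Hypothetical helper: write one string to a tracing control file.
 * Returns 0 on success, -1 on failure. */
static int write_control_file(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(value, f) >= 0;

    /* fclose() flushes the buffer; a value rejected by the kernel
     * would surface as an error at this point. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Assumed layout: <tracing_dir>/set_graph_notrace and
 * <tracing_dir>/current_tracer. */
static int graph_trace_without(const char *tracing_dir, const char *func)
{
    char path[512];

    /* Exclude `func` and its call chain from the graph tracer. */
    snprintf(path, sizeof(path), "%s/set_graph_notrace", tracing_dir);
    if (write_control_file(path, func) != 0)
        return -1;

    /* Then select the function graph tracer. */
    snprintf(path, sizeof(path), "%s/current_tracer", tracing_dir);
    return write_control_file(path, "function_graph");
}
```

    For example, graph_trace_without("/sys/kernel/debug/tracing", "schedule")
    would ask the function graph tracer to skip schedule() and the
    functions it calls; "schedule" is only an illustrative target.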
     

13 Nov, 2013

1 commit


06 Nov, 2013

2 commits

  • The original SOFT_DISABLE patches didn't add support for soft disable
    of syscall events; this adds it.

    Add an array of ftrace_event_file pointers indexed by syscall number
    to the trace array and remove the existing enabled bitmaps, which as a
    result are now redundant. The ftrace_event_file structs in turn
    contain the soft disable flags we need for per-syscall soft disable
    accounting.

    Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
    bit, thus enabling multibuffer filter support for syscall events.

    Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • The trace event filters are still tied to event calls rather than
    event files, which means you don't get what you'd expect when using
    filters in the multibuffer case:

    Before:

    # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # mkdir /sys/kernel/debug/tracing/instances/test1
    # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 2048
    # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    bytes_alloc > 2048

    Setting the filter in tracing/instances/test1/events shouldn't affect
    the same event in tracing/events as it does above.

    After:

    # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # mkdir /sys/kernel/debug/tracing/instances/test1
    # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    bytes_alloc > 2048

    We'd like to just move the filter directly from ftrace_event_call to
    ftrace_event_file, but there are a couple cases that don't yet have
    multibuffer support and therefore have to continue using the current
    event_call-based filters. For those cases, a new USE_CALL_FILTER bit
    is added to the event_call flags, whose main purpose is to keep the
    old behavior for those cases until they can be updated with
    multibuffer support; at that point, the USE_CALL_FILTER flag (and the
    new associated call_filter_check_discard() function) can go away.

    The multibuffer support also made filter_current_check_discard()
    redundant, so this change removes that function as well and replaces
    it with filter_check_discard() (or call_filter_check_discard() as
    appropriate).

    Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     

12 Sep, 2013

1 commit

  • Unclutter -Wmissing-prototypes warning types (enabled at make W=1)

    linux/include/linux/syscalls.h:190:18: warning: no previous prototype for 'SyS_semctl' [-Wmissing-prototypes]
    asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
    ^
    linux/include/linux/syscalls.h:183:2: note: in expansion of macro '__SYSCALL_DEFINEx'
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
    ^
    by adding forward declarations right before definitions.

    Signed-off-by: Sergei Trofimovich
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergei Trofimovich
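
    The pattern of placing a forward declaration right before the
    definition can be sketched with a simplified macro. DEFINE_WRAPPER and
    impl_twice are hypothetical stand-ins; the kernel's real macro is
    __SYSCALL_DEFINEx and is considerably more involved.

```c
/* Simplified stand-in for a syscall-definition macro. The first line of
 * the expansion is a prototype, so the non-static definition that
 * follows no longer triggers -Wmissing-prototypes. */
#define DEFINE_WRAPPER(name)            \
    long SyS_##name(long arg);          \
    long SyS_##name(long arg)           \
    {                                   \
        return impl_##name(arg);        \
    }

/* Hypothetical implementation that the wrapper forwards to. */
static long impl_twice(long x)
{
    return 2 * x;
}

/* Expands to a prototype plus the definition of SyS_twice(). */
DEFINE_WRAPPER(twice)
```

    Without the prototype line inside the macro, compiling with
    -Wmissing-prototypes would be expected to warn once per generated
    wrapper.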
     

14 Aug, 2013

1 commit

  • Fix inadvertent breakage in the clone syscall ABI for Microblaze that
    was introduced in commit f3268edbe6fe ("microblaze: switch to generic
    fork/vfork/clone").

    The Microblaze syscall ABI for clone takes the parent tid address in the
    4th argument; the third argument slot is used for the stack size. The
    incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
    slot.

    This commit restores the original ABI so that existing userspace libc
    code will work correctly.

    All kernel versions from v3.8-rc1 were affected.

    Signed-off-by: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Simek
     

06 Mar, 2013

2 commits

  • a) teach __MAP(num, m, ) to take an empty
    list (with num being 0, of course)
    b) fold the types__... and args__... declaration and initialization into
    SYSCALL_METADATA(num, ...), making their use conditional on num != 0.
    That allows using SYSCALL_METADATA instead of its near-duplicate
    in SYSCALL_DEFINE0.
    c) make SYSCALL_METADATA expand to nothing if CONFIG_FTRACE_SYSCALLS
    is not defined; that makes the SYSCALL_DEFINE0 and SYSCALL_DEFINEx
    definitions independent of CONFIG_FTRACE_SYSCALLS.
    d) kill SYSCALL_DEFINE - no users left (SYSCALL_DEFINE[0-6] is, of course,
    still alive and well).

    Signed-off-by: Al Viro

    Al Viro
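
    Point a) can be illustrated with a cut-down mapper. This is a sketch
    modelled on the idea in the message, not the kernel's actual
    definition: MAP, MAP0..MAP2 and DECL are hypothetical names, and only
    counts up to 2 are handled.

```c
/* MAP(n, m, t1, a1, ...) applies m to each (type, name) pair.
 * MAP0 swallows its (empty) list and expands to nothing. */
#define MAP0(m, ...)
#define MAP1(m, t, a)      m(t, a)
#define MAP2(m, t, a, ...) m(t, a), MAP1(m, __VA_ARGS__)
#define MAP(n, ...)        MAP##n(__VA_ARGS__)

/* Mapping callback: turn a (type, name) pair into a parameter. */
#define DECL(t, a) t a

/* num = 2: the parameter list expands to "long x, long y". */
static long add2(MAP(2, DECL, long, x, long, y))
{
    return x + y;
}

/* num = 0 with an empty list: the parameter list expands to nothing. */
static long answer(MAP(0, DECL, ))
{
    return 42;
}
```

    Because the zero-argument case goes through the same mapper, one
    macro can serve both the 0-argument and the N-argument definitions,
    which is what makes the near-duplicate unnecessary.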
     
  • just have the bugger take unsigned long and deal with SETVAL
    case (when we use an int member in the union) explicitly.

    Signed-off-by: Al Viro

    Al Viro
     

04 Mar, 2013

5 commits


14 Feb, 2013

1 commit

  • __ARCH_WANT_SYS_RT_SIGACTION,
    __ARCH_WANT_SYS_RT_SIGSUSPEND,
    __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
    __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
    CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
    can be assumed always set.

    Al Viro
     

04 Feb, 2013

5 commits


21 Dec, 2012

1 commit

  • Pull signal handling cleanups from Al Viro:
    "sigaltstack infrastructure + conversion for x86, alpha and um,
    COMPAT_SYSCALL_DEFINE infrastructure.

    Note that there are several conflicts between "unify
    SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
    resolution is trivial - just remove definitions of SS_ONSTACK and
    SS_DISABLE from arch/*/uapi/asm/signal.h; they are all identical and
    include/uapi/linux/signal.h contains the unified variant."

    Fixed up conflicts as per Al.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to generic sigaltstack
    new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
    generic compat_sys_sigaltstack()
    introduce generic sys_sigaltstack(), switch x86 and um to it
    new helper: compat_user_stack_pointer()
    new helper: restore_altstack()
    unify SS_ONSTACK/SS_DISABLE definitions
    new helper: current_user_stack_pointer()
    missing user_stack_pointer() instances
    Bury the conditionals from kernel_thread/kernel_execve series
    COMPAT_SYSCALL_DEFINE: infrastructure

    Linus Torvalds
     

20 Dec, 2012

2 commits


19 Dec, 2012

1 commit

  • Pull module update from Rusty Russell:
    "Nothing all that exciting; a new module-from-fd syscall for those who
    want to verify the source of the module (ChromeOS) and/or use standard
    IMA on it or other security hooks."

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    MODSIGN: Fix kbuild output when using default extra_certificates
    MODSIGN: Avoid using .incbin in C source
    modules: don't hand 0 to vmalloc.
    module: Remove a extra null character at the top of module->strtab.
    ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
    ASN.1: Define indefinite length marker constant
    moduleparam: use __UNIQUE_ID()
    __UNIQUE_ID()
    MODSIGN: Add modules_sign make target
    powerpc: add finit_module syscall.
    ima: support new kernel module syscall
    add finit_module syscall to asm-generic
    ARM: add finit_module syscall to ARM
    security: introduce kernel_module_from_file hook
    module: add flags arg to sys_finit_module()
    module: add syscall to load module from fd

    Linus Torvalds
     

18 Dec, 2012

1 commit


14 Dec, 2012

2 commits

  • Thanks to Michael Kerrisk for keeping us honest. These flags are actually
    useful for eliminating the only case where kmod has to mangle a module's
    internals: for overriding module versioning.

    Signed-off-by: Rusty Russell
    Acked-by: Lucas De Marchi
    Acked-by: Kees Cook

    Rusty Russell
     
  • As part of the effort to create a stronger boundary between root and
    kernel, Chrome OS wants to be able to enforce that kernel modules are
    being loaded only from our read-only crypto-hash verified (dm_verity)
    root filesystem. Since the init_module syscall hands the kernel a module
    as a memory blob, no reasoning about the origin of the blob can be made.

    Earlier proposals for appending signatures to kernel modules would not be
    useful in Chrome OS, since it would involve adding an additional set of
    keys to our kernel and builds for no good reason: we already trust the
    contents of our root filesystem. We don't need to verify those kernel
    modules a second time. Having to do signature checking on module loading
    would slow us down and be redundant. All we need to know is where a
    module is coming from so we can say yes/no to loading it.

    If a file descriptor is used as the source of a kernel module, many more
    things can be reasoned about. In Chrome OS's case, we could enforce that
    the module lives on the filesystem we expect it to live on. In the case
    of IMA (or other LSMs), it would be possible, for example, to examine
    extended attributes that may contain signatures over the contents of
    the module.

    This introduces a new syscall (on x86), similar to init_module, that has
    only two arguments. The first argument is used as a file descriptor to
    the module and the second argument is a pointer to the NULL terminated
    string of module arguments.

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Signed-off-by: Rusty Russell (merge fixes)

    Kees Cook
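
    A sketch of how user space might call the new syscall, assuming a
    libc that has no dedicated wrapper so that syscall(2) is used
    directly. The message above describes two arguments; the follow-up
    commit in this log adds a flags argument, so the sketch passes 0 for
    it. load_module_from_path() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and hand the descriptor to
 * finit_module together with the module parameter string.
 * Returns 0 on success, -1 on failure with errno set. */
static int load_module_from_path(const char *path, const char *params)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /* Arguments: fd, NUL-terminated parameter string, flags (0 here). */
    long ret = syscall(SYS_finit_module, fd, params, 0);

    /* Preserve the syscall's errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return ret == 0 ? 0 : -1;
}
```

    Since the kernel receives a file descriptor rather than a memory
    blob, security hooks can inspect where the file lives before the
    module is accepted, which is the point of the change.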
     

29 Nov, 2012

3 commits


13 Oct, 2012

1 commit

  • * allow kernel_execve() to leave the actual return to userland to
      the caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers
      updated accordingly.
    * an architecture that does select GENERIC_KERNEL_EXECVE in its
      Kconfig should have its ret_from_kernel_thread() do this:
        call schedule_tail
        call the callback left for it by copy_thread(); if it ever
          returns, that's because it has just done a successful
          kernel_execve()
        jump to return from syscall
      IOW, its only difference from ret_from_fork() is that it does call
      the callback.
    * such an architecture should also get rid of ret_from_kernel_execve()
      and __ARCH_WANT_KERNEL_EXECVE

    This is the last part of infrastructure patches in that area - from
    that point on work on different architectures can live independently.

    Signed-off-by: Al Viro

    Al Viro
     

01 Jun, 2012

1 commit

  • When doing checkpoint-restore in user space, one needs to determine
    whether various kernel objects (like mm_struct-s or file_struct-s) are
    shared between tasks, and then restore this state.

    The second step can be solved by using the appropriate CLONE_ flags and
    the unshare syscall, while there is currently no way to solve the first.

    One way to check whether two tasks share e.g. an mm_struct is to
    expose some mm_struct ID of a task in its proc file, but showing such
    info is considered undesirable for security reasons.

    Thus, after some debate, we concluded that a dedicated 'comparison'
    syscall might be the best candidate. So here it is:
    __NR_kcmp.

    It takes up to 5 arguments: the pids of the two tasks (whose
    characteristics should be compared), the comparison type and (when
    comparing files) two file descriptors.

    Lookups for pids are done in the caller's PID namespace only.

    At the moment only x86 is supported and tested.

    [akpm@linux-foundation.org: fix up selftests, warnings]
    [akpm@linux-foundation.org: include errno.h]
    [akpm@linux-foundation.org: tweak comment text]
    Signed-off-by: Cyrill Gorcunov
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Andrey Vagin
    Cc: KOSAKI Motohiro
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Glauber Costa
    Cc: Andi Kleen
    Cc: Tejun Heo
    Cc: Matt Helsley
    Cc: Pekka Enberg
    Cc: Eric Dumazet
    Cc: Vasiliy Kulikov
    Cc: Alexey Dobriyan
    Cc: Valdis.Kletnieks@vt.edu
    Cc: Michal Marek
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
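
    A sketch of calling the syscall from user space, assuming kernel
    headers that provide <linux/kcmp.h> and SYS_kcmp, and a kernel built
    with kcmp support. The helpers tasks_share_mm() and fds_share_file()
    are hypothetical names; kcmp has no libc wrapper here, so syscall(2)
    is used. A result of 0 from kcmp means the two objects are the same.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kcmp.h>     /* KCMP_VM, KCMP_FILE */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if both tasks share one address space, 0 if they do not,
 * and -1 on error with errno set. */
static int tasks_share_mm(pid_t pid1, pid_t pid2)
{
    /* The two trailing index arguments are unused for KCMP_VM. */
    long r = syscall(SYS_kcmp, pid1, pid2, KCMP_VM, 0UL, 0UL);
    if (r < 0)
        return -1;
    return r == 0;
}

/* Returns 1 if fd1 in pid1 and fd2 in pid2 refer to the same open
 * file, 0 if not, and -1 on error with errno set. */
static int fds_share_file(pid_t pid1, int fd1, pid_t pid2, int fd2)
{
    long r = syscall(SYS_kcmp, pid1, pid2, KCMP_FILE,
                     (unsigned long)fd1, (unsigned long)fd2);
    if (r < 0)
        return -1;
    return r == 0;
}
```

    A checkpoint tool could call tasks_share_mm() on each pair of tasks
    it dumps and record which ones need CLONE_VM on restore, without the
    kernel ever exposing an mm_struct identifier.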
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

22 Feb, 2012

1 commit

  • The 'poll()' system call timeout parameter is supposed to be 'int', not
    'long'.

    Now, the reason this matters is that right now 32-bit compat mode is
    broken on at least x86-64, because the 32-bit code just calls
    'sys_poll()' directly on x86-64, and the 32-bit argument will have been
    zero-extended, turning a signed 'int' into a large unsigned 'long'
    value.

    We could just introduce a 'compat_sys_poll()' function for this, and
    that may eventually be what we have to do, but since the actual standard
    poll() semantics is *supposed* to be 'int', and since at least on x86-64
    glibc sign-extends the argument before invoking the system call (so
    nobody can actually use a 64-bit timeout value in user space _anyway_,
    even in 64-bit binaries), the simpler solution would seem to be to just
    fix the definition of the system call to match what it should have been
    from the very start.

    If it turns out that somebody somehow circumvents the user-level libc
    64-bit sign extension and actually uses a large unsigned 64-bit timeout
    despite that not being how poll() is supposed to work, we will need to
    do the compat_sys_poll() approach.

    Reported-by: Thomas Meyer
    Acked-by: Eric Dumazet
    Signed-off-by: Linus Torvalds

    Linus Torvalds
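
    The arithmetic behind the breakage can be sketched in plain C. The
    two helpers below are hypothetical and only model how a 32-bit
    timeout value is interpreted; they do not call poll() themselves.

```c
#include <stdint.h>

/* Models the old behaviour: the 32-bit register value arrives
 * zero-extended and is then read as a 64-bit 'long' timeout. */
static int64_t timeout_seen_as_long(int32_t timeout)
{
    return (int64_t)(uint32_t)timeout;
}

/* Models the fixed behaviour: the parameter is an 'int', so the low
 * 32 bits are interpreted as a signed value. */
static int64_t timeout_seen_as_int(int32_t timeout)
{
    return (int64_t)timeout;
}
```

    With a timeout of -1, which poll() treats as "wait forever",
    timeout_seen_as_long(-1) yields 4294967295 (a finite timeout of
    roughly 49.7 days in milliseconds), while timeout_seen_as_int(-1)
    stays -1. Small positive timeouts are unaffected either way.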
     

04 Jan, 2012

5 commits


01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
      using it:
      - Does not allow for specifying iovecs for both src and dest; assuming
        preadv or pwritev was implemented, either the area read from or
        written to would need to be contiguous.
      - Currently mem_read allows only processes who are currently
        ptrace'ing the target and are still able to ptrace the target to read
        from the target. This check could possibly be moved to the open call,
        but it's not clear exactly what race this restriction is stopping
        (reason appears to have been lost)
      - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
        domain socket is a bit ugly from a userspace point of view,
        especially when you may have hundreds if not (eventually) thousands
        of processes that all need to do this with each other
      - Doesn't allow for some future use of the interface we would like to
        consider adding in the future (see below)
      - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)

    As mentioned previously, use of vmsplice instead was considered, but it
    has problems. Since the reader and writer need to work co-operatively,
    you block if the pipe is not drained, which requires some wrapping to do
    non-blocking on the send side or polling on the receive side. In
    all-to-all communication it requires ordering, otherwise you can
    deadlock. And in the example of many MPI tasks writing to one MPI task,
    vmsplice serialises the copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us all the performance gain we could achieve. For example, in
    an MPI_Reduce, rather than copy the data from the source we would like to
    instead use it directly in a math operation (say the reduce is doing a
    sum), as this would save us doing a copy. We don't need to keep a copy of
    the data from the source. I haven't implemented this, but I think this
    interface could in the future do all this through the use of the flags,
    e.g. one could specify the math operation and type, and the kernel, rather
    than just copying the data, would apply the specified operation between
    the source and destination and store the result in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI), this interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. So in addition to this being useful for
    OpenMPI, it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc; other architectures should
    mainly (I think) just need to add syscall numbers for process_vm_readv
    and process_vm_writev. There are 32-bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
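
    A sketch of the single-copy read path from user space, assuming
    kernel headers that define SYS_process_vm_readv. read_remote() is a
    hypothetical helper; it issues the syscall directly through
    syscall(2) with one local and one remote iovec and flags set to 0.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>        /* struct iovec */
#include <unistd.h>

/* Copy `len` bytes starting at `remote_addr` in process `pid` into
 * `dst` using one system call. Returns the number of bytes copied, or
 * -1 on error with errno set. */
static ssize_t read_remote(pid_t pid, const void *remote_addr,
                           void *dst, size_t len)
{
    struct iovec local  = { .iov_base = dst, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr,
                            .iov_len  = len };

    /* Arguments: pid, local iovecs, count, remote iovecs, count, flags. */
    return syscall(SYS_process_vm_readv, pid,
                   &local, 1UL, &remote, 1UL, 0UL);
}
```

    In an MPI-style exchange the source process would send only the
    address and size of its buffer, and the destination would then call
    something like read_remote() to pull the payload across in one copy.
    Newer C libraries also declare a process_vm_readv() wrapper in
    <sys/uio.h> that can be used instead of the raw syscall.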
     

27 Aug, 2011

1 commit