24 Aug, 2015

1 commit

  • Poma (on the way to another bug) reported an assertion triggering:

    [] module_assert_mutex_or_preempt+0x49/0x90
    [] __module_address+0x32/0x150
    [] __module_text_address+0x16/0x70
    [] symbol_put_addr+0x29/0x40
    [] dvb_frontend_detach+0x7d/0x90 [dvb_core]

    Laura Abbott produced a patch which led us to
    inspect symbol_put_addr(). This function has a comment claiming it
    doesn't need to disable preemption around the module lookup
    because it holds a reference to the module it wants to find, which
    therefore cannot go away.

    This is wrong (and a false optimization too: preempt_disable() is really
    rather cheap, and I doubt any of this is on uber-critical paths;
    otherwise it would've retained a pointer to the actual module anyway and
    avoided the second lookup).

    While it's true that the module cannot go away while we hold a reference
    on it, the data structure we do the lookup in very much _CAN_ change
    while we do the lookup. Therefore fix the comment and add the
    required preempt_disable() (the shape of the fix is sketched below).

    Reported-by: poma
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell
    Fixes: a6e6abd575fc ("module: remove module_text_address()")
    Cc: stable@kernel.org

    Peter Zijlstra
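
    A minimal sketch of the shape of the fix (not the exact patch): only the
    lookup itself needs to run with preemption disabled.

    void symbol_put_addr(void *addr)
    {
            struct module *modaddr;
            unsigned long a = (unsigned long)dereference_function_descriptor(addr);

            if (core_kernel_text(a))
                    return;

            /*
             * Even though we hold a reference on the module, the list/tree
             * the lookup walks can change under us, so the lookup must run
             * inside an RCU-sched (preemption disabled) section.
             */
            preempt_disable();
            modaddr = __module_text_address(a);
            preempt_enable();

            module_put(modaddr);
    }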
     

29 Jul, 2015

1 commit

  • We don't actually hold the module_mutex when calling find_module_all
    from module_kallsyms_lookup_name: that's because it's used by the oops
    code and we don't want to deadlock.

    However, read-only access to the list is safe if preemption is disabled,
    so we can weaken the assertion. Keep a strong version for external
    callers though (the weaker check is sketched below).

    Fixes: 0be964be0d45 ("module: Sanitize RCU usage and locking")
    Reported-by: He Kuang
    Cc: stable@kernel.org
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Rusty Russell
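
    The weakened check is roughly the following sketch: a lockdep-only
    assertion that accepts either holding module_mutex or being in an
    RCU-sched (preemption disabled) section.

    static void module_assert_mutex_or_preempt(void)
    {
    #ifdef CONFIG_LOCKDEP
            if (unlikely(!debug_locks))
                    return;

            WARN_ON_ONCE(!rcu_read_lock_sched_held() &&
                         !lockdep_is_held(&module_mutex));
    #endif
    }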
     

09 Jul, 2015

1 commit

  • The load_module() error path frees a module but forgets to take it out
    of the mod_tree, leaving a dangling entry in the tree and causing havoc.

    Cc: Mathieu Desnoyers
    Reported-by: Arthur Marsh
    Tested-by: Arthur Marsh
    Fixes: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree")
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     

02 Jul, 2015

1 commit

  • Pull module updates from Rusty Russell:
    "Main excitement here is Peter Zijlstra's lockless rbtree optimization
    to speed module address lookup. He found some abusers of the module
    lock doing that too.

    A little bit of parameter work here too, including Dan Streetman's
    breaking up of the big param mutex so writing a parameter can load
    another module (yeah, really). Unfortunately that broke the usual
    suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
    appended too"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
    modules: only use mod->param_lock if CONFIG_MODULES
    param: fix module param locks when !CONFIG_SYSFS.
    rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
    module: add per-module param_lock
    module: make perm const
    params: suppress unused variable error, warn once just in case code changes.
    modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
    kernel/module.c: avoid ifdefs for sig_enforce declaration
    kernel/workqueue.c: remove ifdefs over wq_power_efficient
    kernel/params.c: export param_ops_bool_enable_only
    kernel/params.c: generalize bool_enable_only
    kernel/module.c: use generic module param operaters for sig_enforce
    kernel/params: constify struct kernel_param_ops uses
    sysfs: tightened sysfs permission checks
    module: Rework module_addr_{min,max}
    module: Use __module_address() for module_address_lookup()
    module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
    module: Optimize __module_address() using a latched RB-tree
    rbtree: Implement generic latch_tree
    seqlock: Introduce raw_read_seqcount_latch()
    ...

    Linus Torvalds
     

28 Jun, 2015

1 commit


27 Jun, 2015

2 commits

  • Pull driver core updates from Greg KH:
    "Here is the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes were found not to work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "This patch series contains several clean ups and even a new trace
    clock "monitonic raw". Also some enhancements to make the ring buffer
    even faster. But the biggest and most noticeable change is the
    renaming of the ftrace* files, structures and variables that have to
    deal with trace events.

    Over the years I've had several developers tell me about their
    confusion with what ftrace is compared to events. Technically,
    "ftrace" is the infrastructure to do the function hooks, which include
    tracing and also helps with live kernel patching. But the trace
    events are a separate entity altogether, and the files that affect the
    trace events should not be named "ftrace". These include:

    include/trace/ftrace.h -> include/trace/trace_events.h
    include/linux/ftrace_event.h -> include/linux/trace_events.h

    Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*() -> trace_print_*()
    (un)register_ftrace_event() -> (un)register_trace_event()
    ftrace_event_name() -> trace_event_name()
    ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
    ftrace_define_fields_##call() -> trace_define_fields_##call()
    ftrace_get_offsets_##call() -> trace_get_offsets_##call()

    Structures have been renamed:

    ftrace_event_file -> trace_event_file
    ftrace_event_{call,class} -> trace_event_{call,class}
    ftrace_event_buffer -> trace_event_buffer
    ftrace_subsystem_dir -> trace_subsystem_dir
    ftrace_event_raw_##call -> trace_event_raw_##call
    ftrace_event_data_offset_##call-> trace_event_data_offset_##call
    ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

    And a few various variables and flags have also been updated.

    This has been sitting in linux-next for some time, and I have not
    heard a single complaint about this rename breaking anything. Mostly
    because these functions, variables and structures are mostly internal
    to the tracing system and are seldom (if ever) used by anything
    external to that"

    * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
    ring_buffer: Allow to exit the ring buffer benchmark immediately
    ring-buffer-benchmark: Fix the wrong type
    ring-buffer-benchmark: Fix the wrong param in module_param
    ring-buffer: Add enum names for the context levels
    ring-buffer: Remove useless unused tracing_off_permanent()
    ring-buffer: Give NMIs a chance to lock the reader_lock
    ring-buffer: Add trace_recursive checks to ring_buffer_write()
    ring-buffer: Allways do the trace_recursive checks
    ring-buffer: Move recursive check to per_cpu descriptor
    ring-buffer: Add unlikelys to make fast path the default
    tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
    tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
    tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
    tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
    tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
    tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
    tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
    tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
    tracing: Rename ftrace_event_name() to trace_event_name()
    tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
    ...

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Add a "param_lock" mutex to each module, and update params.c to use
    the correct built-in or module mutex while locking kernel params.
    Remove the kparam_block_sysfs_r/w() macros, replace them with direct
    calls to kernel_param_[un]lock(module).

    The kernel param code currently uses a single mutex to protect
    modification of any and all kernel params. While this generally works,
    there is one specific problem with it; a module callback function
    cannot safely load another module, i.e. with request_module() or even
    with indirect calls such as crypto_has_alg(). If the module to be
    loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
    config file), then the attempt will result in a deadlock between the
    first module param callback waiting for modprobe, and modprobe trying to
    lock the single kernel param mutex to set the new module's param.

    This fixes that by using per-module mutexes, so that each individual module
    is protected against concurrent changes in its own kernel params, but is
    not blocked by changes to other module params. All built-in modules
    continue to use the built-in mutex, since they are always already loaded
    and references to them (e.g. request_module(), crypto_has_alg()) will
    never cause load-time param changing.

    This also simplifies the interface modules use to block sysfs access to
    their params. The current functions to block and unblock sysfs param
    access are split up by read and write and expect a single kernel param
    to be passed, yet their actual operation is identical and applies to all
    params, not just the one passed to them: they simply lock and unlock the
    global param mutex. They are replaced with direct calls to
    kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock
    or, if the module is built-in, the built-in mutex (usage is sketched
    below).

    Suggested-by: Rusty Russell
    Signed-off-by: Dan Streetman
    Signed-off-by: Rusty Russell

    Dan Streetman
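
    A usage sketch from a module's point of view (illustrative only; the
    critical section and the old macro names in the comments are just for
    comparison):

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    static void example_update_state(void)
    {
            /* Was: kparam_block_sysfs_w(some_param); */
            kernel_param_lock(THIS_MODULE);

            /* ... touch state that a sysfs write to a param could race with ... */

            /* Was: kparam_unblock_sysfs_w(some_param); */
            kernel_param_unlock(THIS_MODULE);
    }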
     

09 Jun, 2015

1 commit


28 May, 2015

8 commits

  • There's no need to require an ifdef over the declaration
    of sig_enforce as IS_ENABLED() can be used. While at it,
    there's no harm in exposing this kernel parameter outside of
    CONFIG_MODULE_SIG as it'd be a no-op on non module sig
    kernels.

    Now, technically we should also be able to remove the #ifdef'ery over
    the declaration of the module parameter, as we are already trusting the
    bool_enable_only code for CONFIG_MODULE_SIG kernels, but for now remain
    paranoid and keep it.

    With time, if no one can put a bullet through bool_enable_only and
    there are no technical reasons against exposing CONFIG_MODULE_SIG_FORCE
    with the protection bool_enable_only provides, we could remove this
    last ifdef (the resulting declaration is sketched below).

    Cc: Rusty Russell
    Cc: Andrew Morton
    Cc: Kees Cook
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: linux-kernel@vger.kernel.org
    Cc: cocci@systeme.lip6.fr
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Rusty Russell

    Luis R. Rodriguez
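
    Roughly the shape of the declaration after this change (a sketch, not a
    verbatim excerpt):

    /* Always declared; fixed to true when CONFIG_MODULE_SIG_FORCE is set. */
    static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);

    #ifndef CONFIG_MODULE_SIG_FORCE
    /* The remaining ifdef: the parameter is only writable (off -> on) when
     * signature enforcement is not forced at build time. */
    module_param(sig_enforce, bool_enable_only, 0644);
    #endif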
     
  • This takes out the bool_enable_only implementation from
    the module loading code and generalizes it so that others
    can make use of it (a usage sketch follows below).

    Cc: Rusty Russell
    Cc: Jani Nikula
    Cc: Andrew Morton
    Cc: Kees Cook
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: linux-kernel@vger.kernel.org
    Cc: cocci@systeme.lip6.fr
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Rusty Russell

    Luis R. Rodriguez
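
    Once generalized, other code can declare a one-way (off-to-on only) bool
    parameter; a hypothetical usage sketch (the parameter name is made up):

    #include <linux/moduleparam.h>

    /* Can be switched on via sysfs or the command line, but never back off. */
    static bool hardening_enabled;
    module_param_cb(hardening_enabled, &param_ops_bool_enable_only,
                    &hardening_enabled, 0644);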
     
  • We're directly checking and modifying sig_enforce when needed instead
    of using the generic helpers. This prevents us from generalizing this
    helper so that others can use it. Use indirect helpers to allow us
    to generalize this code a bit and to make it a bit more clear what
    this is doing.

    Cc: Rusty Russell
    Cc: Jani Nikula
    Cc: Andrew Morton
    Cc: Kees Cook
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: linux-kernel@vger.kernel.org
    Cc: cocci@systeme.lip6.fr
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Rusty Russell

    Luis R. Rodriguez
     
  • __module_address() does an initial bound check before doing the
    {list/tree} iteration to find the actual module. The bound variables
    are nowhere near the mod_tree cacheline, in fact they're nowhere near
    one another.

    module_addr_min lives in .data while module_addr_max lives in .bss
    (smarty pants GCC thinks the explicit 0 assignment is a mistake).

    Rectify this by moving the two variables into a structure together
    with the latch_tree_root to guarantee they all share the same
    cacheline and avoid hitting two extra cachelines for the lookup.

    While reworking the bounds code, move the bound update from allocation
    to insertion time; this avoids updating the bounds for a few error
    paths (the resulting layout is sketched below).

    Cc: Rusty Russell
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
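
    The resulting layout is roughly the following sketch: the latch-tree root
    and both bounds packed into one cacheline-aligned structure.

    static struct mod_tree_root {
            struct latch_tree_root root;
            unsigned long addr_min;
            unsigned long addr_max;
    } mod_tree __cacheline_aligned = {
            .addr_min = -1UL,
    };

    #define module_addr_min mod_tree.addr_min
    #define module_addr_max mod_tree.addr_max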
     
  • Use the generic __module_address() addr to struct module lookup
    instead of open coding it once more.

    Cc: Rusty Russell
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • Andrew worried about the overhead on small systems; only use the fancy
    code when either perf or tracing is enabled.

    Cc: Rusty Russell
    Cc: Steven Rostedt
    Requested-by: Andrew Morton
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • Currently __module_address() is using a linear search through all
    modules in order to find the module corresponding to the provided
    address. With a lot of modules this can take a lot of time.

    One of the users of this is kernel_text_address(), which is employed
    in many stack unwinders, which in turn are used by perf-callchain and
    ftrace (possibly from NMI context).

    So by optimizing __module_address() we optimize many stack unwinders
    used by both perf and tracing in performance-sensitive code (the
    post-optimization shape is sketched below).

    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
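
    A sketch of the post-optimization shape of the lookup: a cheap bounds
    check, then a lock-free latched-RB-tree lookup instead of the linear
    list walk.

    struct module *__module_address(unsigned long addr)
    {
            struct module *mod;

            if (addr < module_addr_min || addr > module_addr_max)
                    return NULL;

            module_assert_mutex_or_preempt();

            mod = mod_find(addr);           /* latched RB-tree lookup */
            if (mod) {
                    BUG_ON(!within_module(addr, mod));
                    if (mod->state == MODULE_STATE_UNFORMED)
                            mod = NULL;
            }
            return mod;
    }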
     
  • Currently the RCU usage in the module code is an inconsistent mess of
    RCU and RCU-sched; this is broken for CONFIG_PREEMPT, where
    synchronize_rcu() does not imply synchronize_sched().

    Most usage sites use preempt_{dis,en}able(), which is RCU-sched, but
    (most of) the modification sites use synchronize_rcu(), with the
    exception of the module bug list, which actually uses RCU.

    Convert everything over to RCU-sched (the resulting reader/writer
    pattern is sketched below).

    Furthermore, add lockdep asserts to all sites, because it's not at all
    clear to me that the required locking is observed, especially on
    exported functions.

    Cc: Rusty Russell
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
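
    The resulting reader/writer pattern, as an illustrative sketch (not a
    verbatim excerpt; the helper names are made up):

    static struct module *lookup_example(unsigned long addr)
    {
            struct module *mod;

            /* Reader: an RCU-sched section keeps the module data stable. */
            preempt_disable();
            mod = __module_address(addr);
            preempt_enable();

            return mod;     /* NB: only the lookup itself is protected here */
    }

    static void unpublish_example(struct module *mod)
    {
            /* Writer: unpublish under module_mutex, then wait for readers. */
            mutex_lock(&module_mutex);
            list_del_rcu(&mod->list);
            mutex_unlock(&module_mutex);

            /* Not synchronize_rcu(): the readers only disable preemption. */
            synchronize_sched();
    }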
     

27 May, 2015

1 commit

  • Due to the new lockdep checks in the coming patch, we go:

    [ 9.759380] ------------[ cut here ]------------
    [ 9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
    [ 9.759391] Modules linked in:
    [ 9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
    [ 9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
    [ 9.759396] ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
    [ 9.759398] 0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
    [ 9.759400] 0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
    [ 9.759400] Call Trace:
    [ 9.759407] [] dump_stack+0x4f/0x7b
    [ 9.759410] [] warn_slowpath_common+0x97/0xe0
    [ 9.759412] [] ? section_objs+0x60/0x60
    [ 9.759414] [] warn_slowpath_null+0x1a/0x20
    [ 9.759415] [] each_symbol_section+0x121/0x130
    [ 9.759417] [] find_symbol+0x31/0x70
    [ 9.759420] [] load_module+0x20f/0x2660
    [ 9.759422] [] ? __do_page_fault+0x190/0x4e0
    [ 9.759426] [] ? retint_restore_args+0x13/0x13
    [ 9.759427] [] ? retint_restore_args+0x13/0x13
    [ 9.759433] [] ? trace_hardirqs_on_caller+0x11d/0x1e0
    [ 9.759437] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 9.759439] [] ? retint_restore_args+0x13/0x13
    [ 9.759441] [] SyS_init_module+0xce/0x100
    [ 9.759443] [] system_call_fastpath+0x12/0x17
    [ 9.759445] ---[ end trace 9294429076a9c644 ]---

    As per the comment this site should be fine, but let's wrap it in
    preempt_disable() anyhow to placate lockdep.

    Cc: Rusty Russell
    Acked-by: Paul E. McKenney
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     

20 May, 2015

2 commits

  • Some init systems may wish to have device drivers run their probe()
    code asynchronously. This implements support for that and allows
    userspace to request async probe as a preference through a generic
    shared device driver module parameter, async_probe.

    Async probe is implemented as a module parameter because synchronous
    probe has been prevalent for years, and some userspace might exist
    which relies on the fact that the device driver will probe synchronously
    and that the devices it provides will be immediately available
    afterwards.

    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Luis R. Rodriguez
     
  • This adds an extra argument onto parse_args() to be used
    as a way to make the unused callback a bit more useful and
    generic, by allowing the caller to pass on a data structure
    of its choice. An example use case is to allow us to easily
    make module parameters for every module, which we will do
    next.

    @ parse @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    extern char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ));

    @ parse_mod @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ))
    {
    ...
    }

    @ parse_args_found @
    expression R, E1, E2, E3, E4, E5, E6;
    identifier func;
    @@

    (
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    )

    @ parse_args_unused depends on parse_args_found @
    identifier parse_args_found.func;
    @@

    int func(char *param, char *val, const char *unused
    + , void *arg
    )
    {
    ...
    }

    @ mod_unused depends on parse_args_found @
    identifier parse_args_found.func;
    expression A1, A2, A3;
    @@

    - func(A1, A2, A3);
    + func(A1, A2, A3, NULL);

    Generated-by: Coccinelle SmPL
    Cc: cocci@systeme.lip6.fr
    Cc: Tejun Heo
    Cc: Arjan van de Ven
    Cc: Greg Kroah-Hartman
    Cc: Rusty Russell
    Cc: Christoph Hellwig
    Cc: Felipe Contreras
    Cc: Ewan Milne
    Cc: Jean Delvare
    Cc: Hannes Reinecke
    Cc: Jani Nikula
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Tejun Heo
    Acked-by: Rusty Russell
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Greg Kroah-Hartman

    Luis R. Rodriguez
     

14 May, 2015

1 commit


09 May, 2015

1 commit

  • The module notifier call chain for MODULE_STATE_COMING was moved up
    before the parsing of args, into the complete_formation() call. But if
    the module failed to load after that, the notifier call chain for
    MODULE_STATE_GOING was never called, and that prevented the users of
    those call chains from cleaning up anything that was allocated (the
    error-path fix is sketched below).

    Link: http://lkml.kernel.org/r/554C52B9.9060700@gmail.com

    Reported-by: Pontus Fuchs
    Fixes: 4982223e51e8 "module: set nx before marking module MODULE_STATE_COMING"
    Cc: stable@vger.kernel.org # 3.16+
    Signed-off-by: Steven Rostedt
    Signed-off-by: Rusty Russell

    Steven Rostedt
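
    A sketch of the shape of the fix in load_module()'s unwind path (labels
    and ordering approximate): the GOING notifier is now fired so that
    COMING/GOING stay balanced.

    bug_cleanup:
            /* module_bug_cleanup() needs module_mutex protection */
            mutex_lock(&module_mutex);
            module_bug_cleanup(mod);
            mutex_unlock(&module_mutex);

            /* Balance the MODULE_STATE_COMING sent by complete_formation(). */
            blocking_notifier_call_chain(&module_notify_list,
                                         MODULE_STATE_GOING, mod);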
     

23 Apr, 2015

1 commit

  • Pull module updates from Rusty Russell:
    "Quentin opened a can of worms by adding extable entry checking to
    modpost, but most architectures seem fixed now. Thanks to all
    involved.

    Last minute rebase because I noticed a "[PATCH]" had snuck into a
    commit message somehow"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: don't emit section mismatch warnings for compiler optimizations
    modpost: expand pattern matching to support substring matches
    modpost: do not try to match the SHT_NUL section.
    modpost: fix extable entry size calculation.
    modpost: fix inverted logic in is_extable_fault_address().
    modpost: handle -ffunction-sections
    modpost: Whitelist .text.fixup and .exception.text
    params: handle quotes properly for values not of form foo="bar".
    modpost: document the use of struct section_check.
    modpost: handle relocations mismatch in __ex_table.
    scripts: add check_extable.sh script.
    modpost: mismatch_handler: retrieve tosym information only when needed.
    modpost: factorize symbol pretty print in get_pretty_name().
    modpost: add handler function pointer to sectioncheck.
    modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
    modpost: add strict white-listing when referencing sections.
    module: do not print allocation-fail warning on bogus user buffer size
    kernel/module.c: fix typos in message about unused symbols

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Some clean ups and small fixes, but the biggest change is the addition
    of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

    Tracepoints have helper functions for the TP_printk() called
    __print_symbolic() and __print_flags() that let a numeric value be
    displayed as human-comprehensible text. What is placed in the
    TP_printk() is also shown in the tracepoint format file such that user
    space tools like perf and trace-cmd can parse the binary data and
    express the values too. Unfortunately, the way the TRACE_EVENT()
    macro works, anything placed in the TP_printk() will be shown pretty
    much exactly as is. The problem arises when enums are used. That's
    because unlike macros, enums will not be changed into their values by
    the C pre-processor. Thus, the enum string is exported to the format
    file, and this makes it useless for user space tools.

    The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
    the TP_printk() format into their number, and that is what is shown to
    user space. For example, the tracepoint tlb_flush currently has this
    in its format file:

    __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    After adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    Its format file will contain this:

    __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })"

    * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
    tracing: Add enum_map file to show enums that have been mapped
    writeback: Export enums used by tracepoint to user space
    v4l: Export enums used by tracepoints to user space
    SUNRPC: Export enums in tracepoints to user space
    mm: tracing: Export enums in tracepoints to user space
    irq/tracing: Export enums in tracepoints to user space
    f2fs: Export the enums in the tracepoints to userspace
    net/9p/tracing: Export enums in tracepoints to userspace
    x86/tlb/trace: Export enums in used by tlb_flush tracepoint
    tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
    tracing: Allow for modules to convert their enums to values
    tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
    tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
    tracing: Give system name a pointer
    brcmsmac: Move each system tracepoints to their own header
    iwlwifi: Move each system tracepoints to their own header
    mac80211: Move message tracepoints to their own header
    tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
    tracing: Add TRACE_SYSTEM_VAR to kvm-s390
    tracing: Add TRACE_SYSTEM_VAR to intel-sst
    ...

    Linus Torvalds
     

09 Apr, 2015

1 commit

  • Unlike most (all?) other copies from user space, kernel module loading
    is almost unlimited in size. So we do a potentially huge
    "copy_from_user()" when we copy the module data from user space to the
    kernel buffer, which can be a latency concern when preemption is
    disabled (or voluntary).

    Also, because 'copy_from_user()' clears the tail of the kernel buffer on
    failures, even a *failed* copy can end up wasting a lot of time.

    Normally neither of these is a concern in real life, but they do trigger
    when doing stress-testing with trinity. Running in a VM seems to add
    its own overhead, causing trinity module load testing to even trigger
    the watchdog.

    The simple fix is to just chunk up the module loading, so that it never
    tries to copy insanely big areas in one go. That bounds the latency,
    and also the amount of (unnecessarily, in this case) cleared memory for
    the failure case (the chunked copy is sketched below).

    Reported-by: Sasha Levin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
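
    Roughly the shape of the chunked copy (a sketch; the chunk size constant
    here is illustrative):

    #define COPY_CHUNK_SIZE (16 * PAGE_SIZE)

    static int copy_chunked_from_user(void *dst, const void __user *usrc,
                                      unsigned long len)
    {
            do {
                    unsigned long n = min(len, (unsigned long)COPY_CHUNK_SIZE);

                    if (copy_from_user(dst, usrc, n) != 0)
                            return -EFAULT;
                    cond_resched();         /* bound the latency per chunk */
                    dst += n;
                    usrc += n;
                    len -= n;
            } while (len);

            return 0;
    }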
     

08 Apr, 2015

1 commit


24 Mar, 2015

2 commits


23 Mar, 2015

1 commit

  • Module unload calls lockdep_free_key_range(), which removes entries
    from the data structures. Most of the lockdep code OTOH assumes the
    data structures are append only; in specific see the comments in
    add_lock_to_list() and look_up_lock_class().

    Clearly this has only worked by accident; make it work properly. The
    actual scenario to make it go boom would involve the memory freed by
    the module unload being re-allocated and re-used for a lock inside of
    an rcu-sched grace period. This is a very unlikely scenario, but still
    better to plug the hole.

    Use RCU list iteration in all places and amend the comments.

    Change lockdep_free_key_range() to issue a sync_sched() between
    removal from the lists and returning -- which results in the memory
    being freed. Further ensure the callers are placed correctly and
    comment the requirements.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andrey Tsyvarev
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Mar, 2015

1 commit

  • The current approach to handling shadow memory for modules is broken.

    Shadow memory may be freed only after the memory it corresponds to is no
    longer used. vfree() called from interrupt context could use the memory
    it is freeing to store a 'struct llist_node' in it:

    void vfree(const void *addr)
    {
            ...
            if (unlikely(in_interrupt())) {
                    struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                    if (llist_add((struct llist_node *)addr, &p->list))
                            schedule_work(&p->wq);

    This list node is later used in free_work(), which actually frees the
    memory. Currently module_memfree() called in interrupt context will free
    the shadow before freeing the module's memory, which can provoke a
    kernel crash.

    So the shadow memory should be freed after the module's memory. However,
    such a deallocation order could race with kasan_module_alloc() in
    module_alloc().

    Free the shadow right before releasing the vm area. At this point the
    vfree()'d memory is not used anymore, yet not available for other
    allocations. A new VM_KASAN flag is used to indicate that a vm area has
    dynamically allocated shadow memory, so kasan frees the shadow only if it
    was previously allocated.

    Signed-off-by: Andrey Ryabinin
    Acked-by: Rusty Russell
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

06 Mar, 2015

1 commit


18 Feb, 2015

1 commit

  • This provides a reliable breakpoint target, required for automatic symbol
    loading via the gdb helper command 'lx-symbols'.

    Signed-off-by: Jan Kiszka
    Acked-by: Rusty Russell
    Cc: Thomas Gleixner
    Cc: Jason Wessel
    Cc: Andi Kleen
    Cc: Ben Widawsky
    Cc: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     

14 Feb, 2015

1 commit

  • This feature lets us detect out-of-bounds accesses to global variables.
    It works for globals in the kernel image as well as for globals in
    modules. Currently it won't work for symbols in user-specified sections
    (e.g. __init, __read_mostly, ...).

    The idea is simple. The compiler pads each global variable with a
    redzone and adds constructors that invoke the __asan_register_globals()
    function. Information about each global variable (address, size, size
    with redzone, ...) is passed to __asan_register_globals() so that we can
    poison the variable's redzone.

    This patch also forces module_alloc() to return 8*PAGE_SIZE-aligned
    addresses, making shadow memory handling
    (kasan_module_alloc()/kasan_module_free()) simpler. Such alignment
    guarantees that each shadow page backing the modules' address space
    corresponds to only one module_alloc() allocation.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

11 Feb, 2015

2 commits

  • Since the introduction of the nested sleep warning, we've established
    that the occasional sleep inside a wait_event() is fine.

    wait_event() loops are invariant wrt. spurious wakeups, and the
    occasional sleep has a similar effect on them. As long as it's
    occasional, it's harmless.

    Therefore replace the 'correct' but verbose wait_woken() thing with
    a simple annotation to shut up the warning.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • Because wait_event() loops are safe vs spurious wakeups we can allow the
    occasional sleep -- which ends up being very similar.

    Reported-by: Dave Jones
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Dave Jones
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     

06 Feb, 2015

2 commits


22 Jan, 2015

1 commit

  • James Bottomley points out that the module refcount will be -1 during
    unload. It's only used for diagnostics, so let's not hide that, as it
    could be a clue as to what's gone wrong.

    Cc: Jason Wessel
    Acked-and-documentation-added-by: James Bottomley
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Rusty Russell
     

20 Jan, 2015

2 commits

  • The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
    preemption to walk the modules rather than taking the module_mutex:
    this is because they are used for symbol resolution during oopses.

    This works because there are synchronize_sched() and synchronize_rcu()
    in the unload and failure paths. However, there's one case which doesn't
    have that: the normal case where module loading succeeds, and we free
    the init section.

    We don't want a synchronize_rcu() there, because it would slow down
    module loading: this bug was introduced in 2009 to speed module
    loading in the first place.

    Thus, we want to do the free in an RCU callback. We do this in the
    simplest possible way by allocating a new rcu_head: if we put it in
    the module structure we'd have to worry about that getting freed (the
    shape of this is sketched below).

    Reported-by: Rui Xiang
    Signed-off-by: Rusty Russell

    Rusty Russell
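
    The shape of this is roughly the following sketch (names approximate): a
    small, separately allocated carrier holds the rcu_head, and the init
    region is freed from an RCU-sched callback.

    struct mod_initfree {
            struct rcu_head rcu;
            void *module_init;
    };

    static void do_free_init(struct rcu_head *head)
    {
            struct mod_initfree *m = container_of(head, struct mod_initfree, rcu);

            module_memfree(m->module_init);
            kfree(m);
    }

    /*
     * In do_init_module(), instead of freeing the init section directly:
     *
     *      call_rcu_sched(&freeinit->rcu, do_free_init);
     */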
     
  • Nothing needs the module pointer any more, and the next patch will
    call it from RCU, where the module itself might no longer exist.
    Removing the arg is the safest approach.

    This just codifies the use of the module_alloc/module_free pattern
    which ftrace and bpf use (the generic fallback after the rename is
    sketched below).

    Signed-off-by: Rusty Russell
    Acked-by: Alexei Starovoitov
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: Ralf Baechle
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Steven Rostedt
    Cc: x86@kernel.org
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: nios2-dev@lists.rocketboards.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: sparclinux@vger.kernel.org
    Cc: netdev@vger.kernel.org

    Rusty Russell
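
    The generic fallback after the rename is roughly (a sketch; architectures
    can still override it):

    void __weak module_memfree(void *module_region)
    {
            vfree(module_region);
    }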