04 Aug, 2016

2 commits

  • Add ro_after_init support for modules by adding a new page-aligned section
    in the module layout (after rodata) for ro_after_init data and enabling RO
    protection for that section after module init runs.

    Signed-off-by: Jessica Yu
    Acked-by: Kees Cook
    Signed-off-by: Rusty Russell

    Jessica Yu
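
The idea in the commit above (data that is writable during init and read-only afterwards) can be imitated in user space. The following is only a sketch under stated assumptions: it uses POSIX mmap()/mprotect() on an anonymous page rather than the module loader's own page-permission code, and the names ro_after_init_region, region_alloc(), region_init(), region_seal() and region_read() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a module's page-aligned ro_after_init section. */
struct ro_after_init_region {
	void *base;	/* page-aligned start of the region */
	size_t size;	/* length, a multiple of the page size */
};

/* Map one writable, page-aligned page. Returns 0 on success, -1 on error. */
static int region_alloc(struct ro_after_init_region *r)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	r->base = p;
	r->size = (size_t)page;
	return 0;
}

/* "Init" phase: the only point at which the data may be written. */
static void region_init(struct ro_after_init_region *r, int value)
{
	memcpy(r->base, &value, sizeof(value));
}

/* After init has run: drop write permission for the whole region. */
static int region_seal(struct ro_after_init_region *r)
{
	return mprotect(r->base, r->size, PROT_READ);
}

/* Reads keep working after the region has been sealed. */
static int region_read(const struct ro_after_init_region *r)
{
	int value;

	memcpy(&value, r->base, sizeof(value));
	return value;
}
```

Because protection is applied per page, the region has to start on a page boundary and cover whole pages, which is why the commit adds a page-aligned section instead of reusing part of an existing one.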
     
  • For historical reasons (i.e. pre-git) the exception table stuff was
    buried in the middle of the module.h file. I noticed this while
    doing an audit for needless includes of module.h and found core
    kernel files (both arch specific and arch independent) were just
    including module.h for this.

    The converse is also true, in that conventional drivers, be they
    for filesystems or actual hardware peripherals or similar, do not
    normally care about the exception tables.

    Here we fork the exception table content out of module.h into a
    new file called extable.h -- and temporarily include it into the
    module.h itself.

    Then we will work our way across the arch independent and arch
    specific files needing just exception table content, and move
    them off module.h and onto extable.h

    Once that is done, we can remove the extable.h from module.h
    and in doing it like this, we avoid introducing build failures
    into the git history.

    The gain here is that module.h gets a bit smaller, across all
    modular drivers that we build for allmodconfig. Also the core
    files that only need exception table stuff don't have an include
    of module.h that brings in lots of extra stuff and just looks
    generally out of place.

    Cc: Andrew Morton
    Cc: Linus Torvalds
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Rusty Russell

    Paul Gortmaker
     

27 Jul, 2016

1 commit

  • __module_put_and_exit() is marked noreturn in its module.h declaration, but
    the definition lacks the attribute, which makes some tools (such as
    sparse) unhappy. Add the attribute to the definition as well (and
    reformat the declaration so that it uses the more common format).

    Signed-off-by: Jiri Kosina
    Signed-off-by: Rusty Russell

    Jiri Kosina
     

01 Apr, 2016

1 commit

  • For livepatch modules, copy Elf section, symbol, and string information
    from the load_info struct in the module loader. Persist copies of the
    original symbol table and string table.

    Livepatch manages its own relocation sections in order to reuse module
    loader code to write relocations. Livepatch modules must preserve Elf
    information such as section indices in order to apply livepatch relocation
    sections using the module loader's apply_relocate_add() function.

    In order to apply livepatch relocation sections, livepatch modules must
    keep a complete copy of their original symbol table in memory. Normally, a
    stripped down copy of a module's symbol table (containing only "core"
    symbols) is made available through module->core_symtab. But for livepatch
    modules, the symbol table copied into memory on module load must be exactly
    the same as the symbol table produced when the patch module was compiled.
    This is because the relocations in each livepatch relocation section refer
    to their respective symbols with their symbol indices, and the original
    symbol indices (and thus the symtab ordering) must be preserved in order
    for apply_relocate_add() to find the right symbol.

    Signed-off-by: Jessica Yu
    Reviewed-by: Miroslav Benes
    Acked-by: Josh Poimboeuf
    Acked-by: Rusty Russell
    Reviewed-by: Rusty Russell
    Signed-off-by: Jiri Kosina

    Jessica Yu
     

03 Feb, 2016

1 commit

  • For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
    There's one full copy, marked SHF_ALLOC and laid out at the end of the
    module's init section. There's also a cut-down version that only
    contains core symbols and strings, and lives in the module's core
    section.

    After module init (and before we free the module memory), we switch
    the mod->symtab, mod->num_symtab and mod->strtab to point to the core
    versions. We do this under the module_mutex.

    However, kallsyms doesn't take the module_mutex: it uses
    preempt_disable() and rcu tricks to walk through the modules, because
    it's used in the oops path. It's also used in /proc/kallsyms.
    There's nothing atomic about the change of these variables, so we can
    get the old (larger!) num_symtab and the new symtab pointer; in fact
    this is what I saw when trying to reproduce.

    By grouping these variables together, we can use a
    carefully-dereferenced pointer to ensure we always get one or the
    other (the free of the module init section is already done in an RCU
    callback, so that's safe). We allocate the init one at the end of the
    module init section, and keep the core one inside the struct module
    itself (it could also have been allocated at the end of the module
    core, but that's probably overkill).

    Reported-by: Weilong Chen
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
    Cc: stable@kernel.org
    Signed-off-by: Rusty Russell

    Rusty Russell
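
The fix described above can be sketched with C11 atomics in user space. This is an illustration only, not the kernel code: the struct and function names (kallsyms_view, toy_module, count_symbols(), switch_to_core()) are hypothetical, the symbol table is reduced to an array of strings, and C11 acquire/release stands in for the kernel's RCU pointer handling.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical grouping of the three fields that must change together. */
struct kallsyms_view {
	const char *const *symtab;	/* stand-in for the ELF symbol table */
	size_t num_symtab;		/* number of entries in symtab */
	const char *strtab;		/* stand-in for the string table */
};

struct toy_module {
	/* Points at either the init-time view or the core view. */
	_Atomic(const struct kallsyms_view *) kallsyms;
	/* The core view lives inside the module structure itself. */
	struct kallsyms_view core_kallsyms;
};

/*
 * Reader: load the pointer once and use only that snapshot, so the
 * symbol count and the table it belongs to can never be mixed up.
 */
static size_t count_symbols(struct toy_module *mod)
{
	const struct kallsyms_view *k =
		atomic_load_explicit(&mod->kallsyms, memory_order_acquire);
	size_t i, n = 0;

	for (i = 0; i < k->num_symtab; i++)
		if (k->symtab[i])
			n++;
	return n;
}

/* Writer: publish the core view with a single pointer store. */
static void switch_to_core(struct toy_module *mod)
{
	atomic_store_explicit(&mod->kallsyms, &mod->core_kallsyms,
			      memory_order_release);
}
```

A reader sees either the complete init view or the complete core view. In the original layout the three fields were updated one by one, so a reader could pair the old num_symtab with the new symtab pointer.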
     

05 Dec, 2015

2 commits

  • Modules have three sections: text, rodata and writable data. The code
    handled the case where these overlapped, however they never can:
    debug_align() ensures they are always page-aligned.

    This is why we got away with manually traversing the pages in
    set_all_modules_text_rw() without rounding.

    We create three helper functions: frob_text(), frob_rodata() and
    frob_writable_data(). We then call these explicitly at every point,
    so it's clear what we're doing.

    We also expose module_enable_ro() and module_disable_ro() for
    livepatch to use.

    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Rusty Russell
    Signed-off-by: Jiri Kosina

    Rusty Russell
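
A rough user-space sketch of the three helpers named above, assuming a layout described by cumulative, page-aligned offsets (text ends at text_size, rodata ends at ro_size, writable data ends at size). The struct toy_layout, the TOY_PAGE_SIZE constant and the record_range() callback are hypothetical; the callback only records its arguments where a real implementation would change page permissions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Hypothetical layout: cumulative, page-aligned end offsets from base. */
struct toy_layout {
	unsigned long base;		/* page-aligned start address */
	unsigned long text_size;	/* end of text */
	unsigned long ro_size;		/* end of text + rodata */
	unsigned long size;		/* end of everything */
};

typedef int (*set_memory_fn)(unsigned long start, int num_pages);

/* Apply set_memory to the text pages. */
static void frob_text(const struct toy_layout *l, set_memory_fn set_memory)
{
	assert(l->base % TOY_PAGE_SIZE == 0);
	assert(l->text_size % TOY_PAGE_SIZE == 0);
	set_memory(l->base, (int)(l->text_size / TOY_PAGE_SIZE));
}

/* Apply set_memory to the read-only data pages that follow the text. */
static void frob_rodata(const struct toy_layout *l, set_memory_fn set_memory)
{
	assert(l->ro_size % TOY_PAGE_SIZE == 0);
	set_memory(l->base + l->text_size,
		   (int)((l->ro_size - l->text_size) / TOY_PAGE_SIZE));
}

/* Apply set_memory to the writable data pages at the end. */
static void frob_writable_data(const struct toy_layout *l,
			       set_memory_fn set_memory)
{
	assert(l->size % TOY_PAGE_SIZE == 0);
	set_memory(l->base + l->ro_size,
		   (int)((l->size - l->ro_size) / TOY_PAGE_SIZE));
}

/* Example callback: remember the last range it was asked to change. */
static unsigned long last_start;
static int last_pages;

static int record_range(unsigned long start, int num_pages)
{
	last_start = start;
	last_pages = num_pages;
	return 0;
}
```

Since every boundary is page-aligned, the three ranges never share a page, so each helper can pass whole pages to the callback without rounding.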
     
  • Makes it easier to handle init vs core cleanly, though the change is
    fairly invasive across random architectures.

    It simplifies the rbtree code immediately, however, while keeping the
    core data together in the same cacheline (now iff the rbtree code is
    enabled).

    Acked-by: Peter Zijlstra
    Reviewed-by: Josh Poimboeuf
    Signed-off-by: Rusty Russell
    Signed-off-by: Jiri Kosina

    Rusty Russell
     

06 Jul, 2015

1 commit

  • Modular users will always be users of init functionality, but
    users of init functionality are not necessarily always modules.

    Hence any functionality like module_init and module_exit would
    be more at home in the module.h file. And module.h should
    explicitly include init.h to make the dependency clear.

    We've already done all the legwork needed to ensure that this
    move does not cause any build regressions due to implicit
    header file include assumptions about where module_init lives.

    Cc: Rusty Russell
    Acked-by: Rusty Russell
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

02 Jul, 2015

1 commit

  • Pull module updates from Rusty Russell:
    "Main excitement here is Peter Zijlstra's lockless rbtree optimization
    to speed module address lookup. He found some abusers of the module
    lock doing that too.

    A little bit of parameter work here too; including Dan Streetman's
    breaking up the big param mutex so writing a parameter can load
    another module (yeah, really). Unfortunately that broke the usual
    suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
    appended too"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
    modules: only use mod->param_lock if CONFIG_MODULES
    param: fix module param locks when !CONFIG_SYSFS.
    rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
    module: add per-module param_lock
    module: make perm const
    params: suppress unused variable error, warn once just in case code changes.
    modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
    kernel/module.c: avoid ifdefs for sig_enforce declaration
    kernel/workqueue.c: remove ifdefs over wq_power_efficient
    kernel/params.c: export param_ops_bool_enable_only
    kernel/params.c: generalize bool_enable_only
    kernel/module.c: use generic module param operaters for sig_enforce
    kernel/params: constify struct kernel_param_ops uses
    sysfs: tightened sysfs permission checks
    module: Rework module_addr_{min,max}
    module: Use __module_address() for module_address_lookup()
    module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
    module: Optimize __module_address() using a latched RB-tree
    rbtree: Implement generic latch_tree
    seqlock: Introduce raw_read_seqcount_latch()
    ...

    Linus Torvalds
     

28 Jun, 2015

1 commit


27 Jun, 2015

2 commits

  • Pull driver core updates from Greg KH:
    "Here are the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes were found to not work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "This patch series contains several clean ups and even a new trace
    clock "monotonic raw". Also some enhancements to make the ring buffer
    even faster. But the biggest and most noticeable change is the
    renaming of the ftrace* files, structures and variables that have to
    deal with trace events.

    Over the years I've had several developers tell me about their
    confusion with what ftrace is compared to events. Technically,
    "ftrace" is the infrastructure to do the function hooks, which include
    tracing and also helps with live kernel patching. But the trace
    events are a separate entity altogether, and the files that affect the
    trace events should not be named "ftrace". These include:

    include/trace/ftrace.h -> include/trace/trace_events.h
    include/linux/ftrace_event.h -> include/linux/trace_events.h

    Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*() -> trace_print_*()
    (un)register_ftrace_event() -> (un)register_trace_event()
    ftrace_event_name() -> trace_event_name()
    ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
    ftrace_define_fields_##call() -> trace_define_fields_##call()
    ftrace_get_offsets_##call() -> trace_get_offsets_##call()

    Structures have been renamed:

    ftrace_event_file -> trace_event_file
    ftrace_event_{call,class} -> trace_event_{call,class}
    ftrace_event_buffer -> trace_event_buffer
    ftrace_subsystem_dir -> trace_subsystem_dir
    ftrace_event_raw_##call -> trace_event_raw_##call
    ftrace_event_data_offset_##call-> trace_event_data_offset_##call
    ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

    And a few various variables and flags have also been updated.

    This has been sitting in linux-next for some time, and I have not
    heard a single complaint about this rename breaking anything. Mostly
    because these functions, variables and structures are mostly internal
    to the tracing system and are seldom (if ever) used by anything
    external to that"

    * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
    ring_buffer: Allow to exit the ring buffer benchmark immediately
    ring-buffer-benchmark: Fix the wrong type
    ring-buffer-benchmark: Fix the wrong param in module_param
    ring-buffer: Add enum names for the context levels
    ring-buffer: Remove useless unused tracing_off_permanent()
    ring-buffer: Give NMIs a chance to lock the reader_lock
    ring-buffer: Add trace_recursive checks to ring_buffer_write()
    ring-buffer: Allways do the trace_recursive checks
    ring-buffer: Move recursive check to per_cpu descriptor
    ring-buffer: Add unlikelys to make fast path the default
    tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
    tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
    tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
    tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
    tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
    tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
    tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
    tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
    tracing: Rename ftrace_event_name() to trace_event_name()
    tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
    ...

    Linus Torvalds
     

23 Jun, 2015

1 commit

  • Add a "param_lock" mutex to each module, and update params.c to use
    the correct built-in or module mutex while locking kernel params.
    Remove the kparam_block_sysfs_r/w() macros, replace them with direct
    calls to kernel_param_[un]lock(module).

    The kernel param code currently uses a single mutex to protect
    modification of any and all kernel params. While this generally works,
    there is one specific problem with it; a module callback function
    cannot safely load another module, i.e. with request_module() or even
    with indirect calls such as crypto_has_alg(). If the module to be
    loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
    config file), then the attempt will result in a deadlock between the
    first module param callback waiting for modprobe, and modprobe trying to
    lock the single kernel param mutex to set the new module's param.

    This fixes that by using per-module mutexes, so that each individual module
    is protected against concurrent changes in its own kernel params, but is
    not blocked by changes to other module params. All built-in modules
    continue to use the built-in mutex, since they will always be loaded at
    runtime and references (e.g. request_module(), crypto_has_alg()) to them
    will never cause load-time param changing.

    This also simplifies the interface used by modules to block sysfs access
    to their params; while there are currently functions to block and unblock
    sysfs param access which are split up by read and write and expect a single
    kernel param to be passed, their actual operation is identical and applies
    to all params, not just the one passed to them; they simply lock and unlock
    the global param mutex. They are replaced with direct calls to
    kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
    if the module is built-in, it locks the built-in mutex.

    Suggested-by: Rusty Russell
    Signed-off-by: Dan Streetman
    Signed-off-by: Rusty Russell

    Dan Streetman
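
The lock selection described above can be sketched as follows. This is a single-threaded toy model, not the kernel implementation: toy_mutex is a plain flag rather than a real mutex, and toy_param_trylock()/toy_param_unlock() are hypothetical names. The only point is that a loadable module uses its own lock, while built-in code (represented by a NULL module pointer) shares one built-in lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a mutex: a flag is enough to show which lock is taken. */
struct toy_mutex {
	bool locked;
};

struct toy_module {
	const char *name;
	struct toy_mutex param_lock;	/* one lock per loadable module */
};

/* Shared by all built-in code. */
static struct toy_mutex builtin_param_lock;

/* Pick the module's own lock, or the built-in one when mod is NULL. */
static struct toy_mutex *param_mutex_for(struct toy_module *mod)
{
	return mod ? &mod->param_lock : &builtin_param_lock;
}

/* Try to take the selected lock; returns false if it is already held. */
static bool toy_param_trylock(struct toy_module *mod)
{
	struct toy_mutex *m = param_mutex_for(mod);

	if (m->locked)
		return false;
	m->locked = true;
	return true;
}

static void toy_param_unlock(struct toy_module *mod)
{
	param_mutex_for(mod)->locked = false;
}
```

With a single global lock, a param callback in one module that triggers loading another module would wait on modprobe while modprobe waits on the same lock. With per-module locks the second module's lock is independent of the first.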
     

28 May, 2015

3 commits

  • Andrew worried about the overhead on small systems; only use the fancy
    code when either perf or tracing is enabled.

    Cc: Rusty Russell
    Cc: Steven Rostedt
    Requested-by: Andrew Morton
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • Currently __module_address() is using a linear search through all
    modules in order to find the module corresponding to the provided
    address. With a lot of modules this can take a lot of time.

    One of the users of this is kernel_text_address() which is employed
    in many stack unwinders; which in turn are used by perf-callchain and
    ftrace (possibly from NMI context).

    So by optimizing __module_address() we optimize many stack unwinders
    which are used by both perf and tracing in performance sensitive code.

    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
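
To make the cost difference concrete, here is a small sketch that compares a linear scan with a logarithmic lookup over address ranges. Note the substitution: the series itself uses a latched RB-tree, whereas this sketch uses binary search over a sorted array as a simpler stand-in, and it ignores concurrent updates entirely. The names toy_range, lookup_linear() and lookup_sorted() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one module's address range. */
struct toy_range {
	unsigned long start;
	unsigned long size;
	const char *name;
};

/* Linear scan: cost grows with the number of loaded modules. */
static const struct toy_range *lookup_linear(const struct toy_range *r,
					     size_t n, unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= r[i].start && addr - r[i].start < r[i].size)
			return &r[i];
	return NULL;
}

/*
 * Binary search over ranges sorted by start address and assumed not
 * to overlap: cost grows with the logarithm of the module count.
 */
static const struct toy_range *lookup_sorted(const struct toy_range *r,
					     size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < r[mid].start)
			hi = mid;
		else if (addr - r[mid].start >= r[mid].size)
			lo = mid + 1;
		else
			return &r[mid];
	}
	return NULL;
}
```

Both functions return the same answer for the same input; only the number of comparisons differs.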
     
  • Currently the RCU usage in module is an inconsistent mess of RCU and
    RCU-sched; this is broken for CONFIG_PREEMPT where synchronize_rcu()
    does not imply synchronize_sched().

    Most usage sites use preempt_{dis,en}able() which is RCU-sched, but
    (most of) the modification sites use synchronize_rcu(), with the
    exception of the module bug list, which actually uses RCU.

    Convert everything over to RCU-sched.

    Furthermore add lockdep asserts to all sites, because it's not at all
    clear to me the required locking is observed, esp. on exported
    functions.

    Cc: Rusty Russell
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     

25 May, 2015

1 commit

  • Commit f2411da74698 ("driver-core: add driver module asynchronous probe
    support") broke build in case modules are disabled, because in this case
    "struct module" is not defined and we can't dereference it. Let's define
    module_requested_async_probing() helper and stub it out if modules are
    disabled.

    Reported-by: kbuild test robot
    Reported-by: Stephen Rothwell
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
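
The stub pattern described above looks roughly like this in a stand-alone file. It is a sketch, not a copy of the header: the struct is reduced to the single field the helper needs, the field name async_probe_requested is an assumption here, and driver_wants_async() is a hypothetical caller. Building with -DCONFIG_MODULES selects the first branch.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef CONFIG_MODULES

/* Modules enabled: the structure is defined and can be dereferenced. */
struct module {
	bool async_probe_requested;	/* assumed field name */
};

static inline bool module_requested_async_probing(struct module *module)
{
	return module && module->async_probe_requested;
}

#else /* !CONFIG_MODULES */

/* Modules disabled: the type stays incomplete, so it must not be touched. */
struct module;

static inline bool module_requested_async_probing(struct module *module)
{
	(void)module;
	return false;
}

#endif /* CONFIG_MODULES */

/* Hypothetical caller: compiles unchanged in both configurations. */
static bool driver_wants_async(struct module *owner)
{
	return module_requested_async_probing(owner);
}
```

The caller never dereferences struct module itself, so it builds whether or not the structure is defined.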
     

20 May, 2015

1 commit

  • Some init systems may wish to express the desire to have device drivers
    run their probe() code asynchronously. This implements support for this
    and allows userspace to request async probe as a preference through a
    generic shared device driver module parameter, async_probe.

    Async probe is exposed through a module parameter because synchronous
    probe has been prevalent for years: some userspace might rely on the
    device driver probing synchronously, and on the devices it provides
    being available immediately afterwards.

    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Luis R. Rodriguez
     

14 May, 2015

1 commit


23 Apr, 2015

1 commit


15 Apr, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Some clean ups and small fixes, but the biggest change is the addition
    of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

    Tracepoints have helper functions for the TP_printk() called
    __print_symbolic() and __print_flags() that let a numeric value be
    displayed as human-comprehensible text. What is placed in the
    TP_printk() is also shown in the tracepoint format file such that user
    space tools like perf and trace-cmd can parse the binary data and
    express the values too. Unfortunately, the way the TRACE_EVENT()
    macro works, anything placed in the TP_printk() will be shown pretty
    much exactly as is. The problem arises when enums are used. That's
    because unlike macros, enums will not be changed into their values by
    the C pre-processor. Thus, the enum string is exported to the format
    file, and this makes it useless for user space tools.

    The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
    the TP_printk() format into their number, and that is what is shown to
    user space. For example, the tracepoint tlb_flush currently has this
    in its format file:

    __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    After adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    Its format file will contain this:

    __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })"

    * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
    tracing: Add enum_map file to show enums that have been mapped
    writeback: Export enums used by tracepoint to user space
    v4l: Export enums used by tracepoints to user space
    SUNRPC: Export enums in tracepoints to user space
    mm: tracing: Export enums in tracepoints to user space
    irq/tracing: Export enums in tracepoints to user space
    f2fs: Export the enums in the tracepoints to userspace
    net/9p/tracing: Export enums in tracepoints to userspace
    x86/tlb/trace: Export enums in used by tlb_flush tracepoint
    tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
    tracing: Allow for modules to convert their enums to values
    tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
    tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
    tracing: Give system name a pointer
    brcmsmac: Move each system tracepoints to their own header
    iwlwifi: Move each system tracepoints to their own header
    mac80211: Move message tracepoints to their own header
    tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
    tracing: Add TRACE_SYSTEM_VAR to kvm-s390
    tracing: Add TRACE_SYSTEM_VAR to intel-sst
    ...

    Linus Torvalds
     

08 Apr, 2015

1 commit


19 Mar, 2015

1 commit


17 Mar, 2015

1 commit

  • There is a notifier that handles live patches for coming and going modules.
    It takes klp_mutex lock to avoid races with coming and going patches but
    it does not keep the lock all the time. Therefore the following races are
    possible:

    1. The notifier is called sometime in STATE_MODULE_COMING. The module
    is visible to find_module() in this state all the time. This means that
    a new patch can be registered and enabled even before the notifier is
    called. It might create a wrong order of stacked patches, see below
    for an example.

    2. A new patch could still see the module in the GOING state even after
    the notifier has been called. It will try to initialize the related
    object structures but the module could disappear at any time. This
    would leave a mess in the structures. It might even cause an invalid
    memory access.

    This patch solves the problem by adding a boolean variable into struct module.
    The value is true after the coming and before the going handler is called.
    New patches need to be applied when the value is true and they need to ignore
    the module when the value is false.

    Note that we need to know state of all modules on the system. The races are
    related to new patches. Therefore we do not know what modules will get
    patched.

    Also note that we could not simply ignore going modules. The code from the
    module could be called even in the GOING state until mod->exit() finishes.
    If we start supporting patches with semantic changes between function
    calls, we need to apply new patches to any still usable code.
    See below for an example.

    Finally note that the patch solves only the situation when a new patch is
    registered. There are no such problems when the patch is being removed.
    It does not matter who disables the patch first, whether the normal
    disable_patch() or the module notifier. There is nothing to do
    once the patch is disabled.

    Alternative solutions:
    ======================

    + reject new patches when a patched module is coming or going; this is ugly

    + wait with adding new patch until the module leaves the COMING and GOING
    states; this might be dangerous and complicated; we would need to release
    kgr_lock in the middle of the patch registration to avoid a deadlock
    with the coming and going handlers; also we might need a waitqueue for
    each module which seems to be even bigger overhead than the boolean

    + stop modules from entering COMING and GOING states; wait until modules
    leave these states when they are already there; looks complicated; we would
    need to ignore the module that asked to stop the others to avoid a deadlock;
    also it is unclear what to do when two modules asked to stop others and
    both are in COMING state (situation when two new patches are applied)

    + always register/enable new patches and fix up the potential mess (registered
    patches order) in klp_module_init(); this is nasty and prone to regressions
    in the future development

    + add another MODULE_STATE where the kallsyms are visible but the module is not
    used yet; this looks too complex; the module states are checked on "many"
    locations

    Example of patch stacking breakage:
    ===================================

    The notifier could _not_ _simply_ ignore already initialized module objects.
    For example, let's have three patches (P1, P2, P3) for functions a() and b()
    where a() is from vmcore and b() is from a module M. Something like:

            a()     b()
    P1      a1()    b1()
    P2      a2()    b2()
    P3      a3()    b3()

    If you load the module M after all patches are registered and enabled,
    the ftrace ops for functions a() and b() list the functions in this
    order:

    ops_a->func_stack -> list(a3,a2,a1)
    ops_b->func_stack -> list(b3,b2,b1)

    , so the pointer to b3() is the first and will be used.

    Then you might have the following scenario. Let's start with state when patches
    P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
    ops for b() does not exist. Then we get into the following race:

    CPU0                                    CPU1

    load_module(M)

      complete_formation()

      mod->state = MODULE_STATE_COMING;
      mutex_unlock(&module_mutex);

                                            klp_register_patch(P3);
                                            klp_enable_patch(P3);

                                            # STATE 1

      klp_module_notify(M)
        klp_module_notify_coming(P1);
        klp_module_notify_coming(P2);
        klp_module_notify_coming(P3);

                                            # STATE 2

    The ftrace ops for a() and b() then look like this:

    STATE1:

    ops_a->func_stack -> list(a3,a2,a1);
    ops_b->func_stack -> list(b3);

    STATE2:
    ops_a->func_stack -> list(a3,a2,a1);
    ops_b->func_stack -> list(b2,b1,b3);

    therefore, b2() is used for the module but a3() is used for vmcore
    because they were the last added.

    Example of the race with going modules:
    =======================================

    CPU0                                    CPU1

    delete_module()  #SYSCALL

      try_stop_module()
        mod->state = MODULE_STATE_GOING;

      mutex_unlock(&module_mutex);

                                            klp_register_patch()
                                            klp_enable_patch()

                                            #save place to switch universe

                                            b()     # from module that is going
                                              a()   # from core (patched)

      mod->exit();

    Note that the function b() can be called until we call mod->exit().

    If we do not apply the patch against b() because it is in MODULE_STATE_GOING,
    it will call the patched a() with modified semantics and things might go wrong.

    [jpoimboe@redhat.com: use one boolean instead of two]
    Signed-off-by: Petr Mladek
    Acked-by: Josh Poimboeuf
    Acked-by: Rusty Russell
    Signed-off-by: Jiri Kosina

    Petr Mladek
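
The boolean described above can be modelled in a few lines. This is a toy sketch with hypothetical names (toy_module, toy_coming_handler(), toy_going_handler(), toy_patch_applies()); it leaves out all locking and the notifier itself, and only shows that the decision to patch follows the flag rather than the module state.

```c
#include <stdbool.h>

enum toy_module_state {
	TOY_STATE_COMING,
	TOY_STATE_LIVE,
	TOY_STATE_GOING,
};

/* Hypothetical, reduced module structure. */
struct toy_module {
	enum toy_module_state state;
	bool klp_alive;	/* true between the coming and going handlers */
};

/* Called by the coming handler once it has processed the module. */
static void toy_coming_handler(struct toy_module *mod)
{
	mod->klp_alive = true;
}

/* Called by the going handler; new patches must ignore the module now. */
static void toy_going_handler(struct toy_module *mod)
{
	mod->klp_alive = false;
}

/* A newly registered patch consults the flag, not the module state. */
static bool toy_patch_applies(const struct toy_module *mod)
{
	return mod->klp_alive;
}
```

A module that is already in the GOING state but whose going handler has not yet run still gets new patches, which matches the requirement that code callable until mod->exit() finishes must be patched.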
     

14 Feb, 2015

1 commit

  • The MODULE_DEVICE_TABLE() macro is used to create aliases to device tables.
    Normally an alias should have the same type as the aliased symbol.

    Device tables are arrays, so they have 'struct type##_device_id[x]'
    types. Alias created by MODULE_DEVICE_TABLE() will have non-array type -
    'struct type##_device_id'.

    This inconsistency confuses the compiler: it can make a wrong assumption
    about the variable's size, which leads KASan to produce a false positive
    report about an out-of-bounds access.

    For every global variable the compiler calls __asan_register_globals(),
    passing information about the global variable (address, size, size with
    redzone, name ...). __asan_register_globals() poisons the symbol's
    redzone to detect possible out-of-bounds accesses.

    When a symbol has an alias, __asan_register_globals() will be called for
    both the symbol and the alias. The compiler determines the size of a
    variable from the size of the variable's type. The alias and the symbol
    have the same address, so if the alias has the wrong size, part of the
    memory that actually belongs to the symbol could be poisoned as the
    redzone of the alias symbol.

    By fixing the type of the alias symbol we fix its size, so
    __asan_register_globals() will not poison valid memory.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
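
The size mismatch described above can be shown without creating a real alias. The sketch below only compares declared types: toy_device_id, toy_ids, toy_alias_old and toy_alias_new are hypothetical names, the extern objects are never defined (they are used only inside sizeof), and __typeof__ is a GNU C extension.

```c
#include <stddef.h>

/* Hypothetical device-id entry and table. */
struct toy_device_id {
	unsigned int vendor;
	unsigned int device;
};

static const struct toy_device_id toy_ids[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0, 0 },	/* terminating entry */
};

/* Old style: declared with the element type, so it looks like one entry. */
extern const struct toy_device_id toy_alias_old;

/* Fixed style: declared with the table's own (array) type. */
extern const __typeof__(toy_ids) toy_alias_new;

/* sizeof does not evaluate its operand, so no definition is needed. */
static size_t old_alias_size(void)
{
	return sizeof(toy_alias_old);
}

static size_t new_alias_size(void)
{
	return sizeof(toy_alias_new);
}

static size_t table_size(void)
{
	return sizeof(toy_ids);
}
```

With the element type, the declared object appears to be one entry long even though it would share an address with a three-entry table. With the array type, both sizes agree.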
     

22 Jan, 2015

1 commit

  • James Bottomley points out that it will be -1 during unload. It's
    only used for diagnostics, so let's not hide that as it could be a
    clue as to what's gone wrong.

    Cc: Jason Wessel
    Acked-and-documention-added-by: James Bottomley
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Rusty Russell
     

11 Nov, 2014

1 commit


27 Jul, 2014

2 commits


07 Apr, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Nothing major: the stricter permissions checking for sysfs broke a
    staging driver; fix included. Greg KH said he'd take the patch but
    hadn't as the merge window opened, so it's included here to avoid
    breaking build"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    staging: fix up speakup kobject mode
    Use 'E' instead of 'X' for unsigned module taint flag.
    VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
    kallsyms: fix percpu vars on x86-64 with relocation.
    kallsyms: generalize address range checking
    module: LLVMLinux: Remove unused function warning from __param_check macro
    Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
    module: remove MODULE_GENERIC_TABLE
    module: allow multiple calls to MODULE_DEVICE_TABLE() per module
    module: use pr_cont

    Linus Torvalds
     

13 Mar, 2014

2 commits

  • MODULE_DEVICE_TABLE() calls MODULE_GENERIC_TABLE(); make it do the
    work directly. This also removes a wart introduced in the last patch,
    where the alias is defined to be an unknown struct type "struct
    type##__##name##_device_id" instead of "struct type##_device_id" (it's
    an extern so GCC doesn't care, but it's wrong).

    The other user of MODULE_GENERIC_TABLE (ISAPNP_CARD_TABLE) is unused,
    so delete it.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Commit 78551277e4df5: "Input: i8042 - add PNP modaliases" had a bug, where the
    second call to MODULE_DEVICE_TABLE() overrode the first resulting in not all
    the modaliases being exposed.

    This fixes the problem by including the name of the device_id table in the
    __mod_*_device_table alias, allowing us to export several device_id tables
    per module.

    Suggested-by: Kay Sievers
    Acked-by: Greg Kroah-Hartman
    Cc: Dmitry Torokhov
    Signed-off-by: Tom Gundersen
    Signed-off-by: Rusty Russell

    Tom Gundersen
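
    The naming change described in the two commits above can be illustrated
    with a small user-space sketch. It only models how the alias symbol name
    is pasted together; it is not the kernel macro itself. DEMO_STR,
    DEMO_OLD_ALIAS, DEMO_NEW_ALIAS and demo_aliases_collide() are
    hypothetical names made up for this example, and the alias layout is
    taken from the __mod_*_device_table pattern quoted in the messages.

```c
#include <string.h>

/* Stringify the token that the outer macro has already pasted together. */
#define DEMO_STR(x) #x

/* Old scheme (assumed): the alias name depends only on the bus type. */
#define DEMO_OLD_ALIAS(type, name) DEMO_STR(__mod_##type##_device_table)

/* New scheme: the table's own name is pasted into the alias as well. */
#define DEMO_NEW_ALIAS(type, name) \
	DEMO_STR(__mod_##type##__##name##_device_table)

/* Returns nonzero when two alias names would collide within one module. */
static int demo_aliases_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

    With two tables of the same type (say, illustrative names pnp_kbd_devids
    and pnp_aux_devids), the old scheme yields the same alias name for both,
    so only one table can be exposed, while the new scheme yields two
    distinct names.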
     

07 Mar, 2014

1 commit

  • There's nothing in the module.h header that requires tracepoint.h to be
    included, and there may be cases where tracepoint.h needs to include
    module.h, which would cause recursive header issues.

    However, module.h needs to see HAVE_JUMP_LABEL, which is set in
    jump_label.h, and so far it has only picked that up coincidentally
    through tracepoint.h.

    Link: http://lkml.kernel.org/r/20140307084712.5c68641a@gandalf.local.home

    Acked-by: Rusty Russell
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

16 Jan, 2014

1 commit


04 Dec, 2013

1 commit


23 Sep, 2013

1 commit

  • The option to wait for a module reference count to reach zero was in
    the initial module implementation, but it was never supported in
    modprobe (you had to use rmmod --wait). After discussion with Lucas,
    it has been deprecated (with a 10 second sleep) in kmod for the last
    year.

    This finally removes it: the flag will evoke a printk warning and a
    normal (non-blocking) remove attempt.

    Cc: Lucas De Marchi
    Signed-off-by: Rusty Russell

    Rusty Russell
     

03 Sep, 2013

1 commit

  • DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

    After some investigation, the cause appears to be the following:
    mod->mkobj.kobj (ffffffffa01600d0 below) is freed together with mod
    itself in free_module(). However, its children still hold references to
    it because DEBUG_KOBJECT_RELEASE delays their release. So when a
    child ('holders' below) drops its reference to the parent in
    kobject_del(), it accesses already-freed memory and triggers the BUG.

    This patch addresses it by waiting until mod->mkobj.kobj has really
    been released during module removal (and in some error paths).

    [ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
    [ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
    [ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
    [ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
    [ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
    [ 1845.185026] IP: [] kobject_put+0x11/0x60
    [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
    [ 1845.185026] Oops: 0000 [#1] PREEMPT
    [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
    [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G O 3.11.0-rc6-next-20130819+ #1
    [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 1845.185026] Workqueue: events kobject_delayed_cleanup
    [ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
    [ 1845.185026] RIP: 0010:[] [] kobject_put+0x11/0x60
    [ 1845.185026] RSP: 0018:ffff88007ca5dd08 EFLAGS: 00010282
    [ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
    [ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
    [ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
    [ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
    [ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
    [ 1845.185026] FS: 0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
    [ 1845.185026] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
    [ 1845.185026] Stack:
    [ 1845.185026] ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
    [ 1845.185026] 0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
    [ 1845.185026] ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
    [ 1845.185026] Call Trace:
    [ 1845.185026] [] kobject_del+0x2e/0x40
    [ 1845.185026] [] kobject_delayed_cleanup+0x6e/0x1d0
    [ 1845.185026] [] process_one_work+0x1e5/0x670
    [ 1845.185026] [] ? process_one_work+0x183/0x670
    [ 1845.185026] [] worker_thread+0x113/0x370
    [ 1845.185026] [] ? rescuer_thread+0x290/0x290
    [ 1845.185026] [] kthread+0xda/0xe0
    [ 1845.185026] [] ? _raw_spin_unlock_irq+0x30/0x60
    [ 1845.185026] [] ? kthread_create_on_node+0x130/0x130
    [ 1845.185026] [] ret_from_fork+0x7a/0xb0
    [ 1845.185026] [] ? kthread_create_on_node+0x130/0x130
    [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
    [ 1845.185026] RIP [] kobject_put+0x11/0x60
    [ 1845.185026] RSP
    [ 1845.185026] CR2: ffffffffa01601d0
    [ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

    Signed-off-by: Li Zhong
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Rusty Russell

    Li Zhong
     

20 Aug, 2013

1 commit


15 Mar, 2013

1 commit

  • We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
    "_". But Al Viro broke this in "consolidate cond_syscall and
    SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
    do so.

    Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
    prefix it to something. So various places define helpers which are
    defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

    1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
    2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
    3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
    4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
    5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
    6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
    7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
    CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
    for pasting.

    (arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

    Let's solve this properly:
    1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
    2) Make linux/export.h usable from asm.
    3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
    4) Make everyone use them.

    Signed-off-by: Rusty Russell
    Reviewed-by: James Hogan
    Tested-by: James Hogan (metag)

    Rusty Russell
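
    The pair of helpers proposed above (one for token pasting, one for
    strings) can be sketched in user space as follows. This is a minimal
    illustration under stated assumptions, not the kernel's definition: all
    DEMO_* names and demo_pasted_value() are hypothetical, and the prefix
    switch is hard-coded on so that the prefixed case is visible.

```c
/* Assumption for this sketch: pretend the arch uses a "_" symbol prefix. */
#define DEMO_HAVE_UNDERSCORE_SYMBOL_PREFIX 1

#ifdef DEMO_HAVE_UNDERSCORE_SYMBOL_PREFIX
#define DEMO_SYMBOL_RAW(x)     _##x
#define DEMO_SYMBOL_STR_RAW(x) "_" #x
#else
#define DEMO_SYMBOL_RAW(x)     x
#define DEMO_SYMBOL_STR_RAW(x) #x
#endif

/* Indirection so that macro arguments are expanded before prefixing. */
#define DEMO_SYMBOL(x)     DEMO_SYMBOL_RAW(x)
#define DEMO_SYMBOL_STR(x) DEMO_SYMBOL_STR_RAW(x)

/* A macro used as an argument, to show it is expanded first. */
#define DEMO_ENTRY start_here

/* Uses the pasting form to declare and read a prefixed identifier. */
static int demo_pasted_value(void)
{
	int DEMO_SYMBOL(value) = 42;	/* declares a local named _value */

	return _value;
}
```

    Having both forms avoids the situation listed above, where a string
    helper and a separate non-string helper for pasting were defined in
    different places. Without the prefix switch, both helpers reduce to the
    plain symbol name.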
     

21 Jan, 2013

1 commit