31 Oct, 2011

2 commits

  • There are files which use module_param and MODULE_PARM_DESC
    back to back. They only include moduleparam.h which makes sense,
    but the implicit presence of module.h everywhere hid the fact
    that MODULE_PARM_DESC wasn't in moduleparam.h at all. Relocate
    the macro to moduleparam.h so that the moduleparam infrastructure
    can be used independently of module.h

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • A lot of files pull in module.h when all they are really
    looking for is the basic EXPORT_SYMBOL functionality. The
    recent data from Ingo[1] shows that this is one of several
    instances that has a significant impact on compile times,
    and it should be targeted for factoring out (as done here).

    Note that several commonly used header files in include/*
    directly include module.h themselves (some 34 of them!)
    The most commonly used ones of these will have to be made
    independent of module.h before the full benefit of this change
    can be realized.

    We also transition THIS_MODULE from module.h to export.h,
    since there are lots of files with subsystem structs that
    in turn will have a struct module *owner and only be doing:

    .owner = THIS_MODULE;

    and absolutely nothing else modular. So, we also want to have
    the THIS_MODULE definition present in the lightweight header.

    [1] https://lkml.org/lkml/2011/5/23/76

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

11 Aug, 2011

1 commit

  • Copy the information needed from struct module into a local module list
    held within tracepoint.c from within the module coming/going notifier.

    This vastly simplifies locking of tracepoint registration /
    unregistration, because we don't have to take the module mutex to
    register and unregister tracepoints anymore. Steven Rostedt ran into
    dependency problems related to modules mutex vs kprobes mutex vs ftrace
    mutex vs tracepoint mutex that seems to be hard to fix without removing
    this dependency between tracepoint and module mutex. (note: it should be
    investigated whether kprobes could benefit from being dissociated from
    the modules mutex too.)

    This also fixes module handling of tracepoint list iterators, because it
    was expecting the list to be sorted by pointer address. Given we have
    control on our own list now, it's OK to sort this list which has
    tracepoints as its only purpose. The reason why this sorting is required
    is to handle the fact that seq files (and any read() operation from
    user-space) cannot hold the tracepoint mutex across multiple calls, so
    list entries may vanish between calls. With sorting, the tracepoint
    iterator becomes usable even if the list doesn't contain the exact item
    pointed to by the iterator anymore.

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Jason Baron
    CC: Ingo Molnar
    CC: Lai Jiangshan
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Masami Hiramatsu
    Link: http://lkml.kernel.org/r/20110810191839.GC8525@Krystal
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
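
    The iterator behaviour described in this commit can be sketched in
    user space. This is only an illustration, not the tracepoint.c code:
    struct sorted_list and iter_resume() are hypothetical names, and the
    keys stand in for the pointer addresses the list is sorted by. A saved
    position is resumed at the first entry whose key is not below it, so
    it stays usable even when the original entry has vanished.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a list kept sorted by ascending address. */
struct sorted_list {
    const uintptr_t *keys;  /* sorted ascending */
    size_t len;
};

/*
 * Return the index of the first entry whose key is >= pos, or len if
 * there is none. Because the list is sorted, a position saved by an
 * earlier read() remains meaningful even if the entry it referred to
 * was removed in between.
 */
size_t iter_resume(const struct sorted_list *l, uintptr_t pos)
{
    size_t lo = 0, hi = l->len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (l->keys[mid] < pos)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

    If the entry at 0x2000 is removed between two calls, resuming from
    0x2000 lands on the next surviving entry instead of failing.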
     

24 Jul, 2011

2 commits

  • Userspace wants to manage module parameters with udev rules.
    This currently only works for loaded modules, but not for
    built-in ones.

    To allow access to the built-in modules we need to
    re-trigger all module load events that happened before any
    userspace was running. We already do the same thing for all
    devices, subsystems(buses) and drivers.

    This adds the currently missing /sys/module/<name>/uevent files
    to all module entries.

    Signed-off-by: Kay Sievers
    Signed-off-by: Rusty Russell (split & trivial fix)

    Kay Sievers
     
  • This simplifies the next patch, where we have an attribute on a
    builtin module (ie. module == NULL).

    Signed-off-by: Kay Sievers
    Signed-off-by: Rusty Russell (split into 2)

    Kay Sievers
     

19 May, 2011

5 commits

  • This patch places every exported symbol in its own section
    (i.e. "___ksymtab+printk"). Thus the linker will use its SORT() directive
    to sort and finally merge all symbols in the right and final section
    (i.e. "__ksymtab").

    The symbol prefixed archs use an underscore as prefix for symbols.
    To avoid collision we use a different character to create the temporary
    section names.

    This work was supported by a hardware donation from the CE Linux Forum.

    Signed-off-by: Alessio Igor Bogani
    Signed-off-by: Rusty Russell (folded in '+' fixup)
    Tested-by: Dirk Behme

    Alessio Igor Bogani
     
  • Instead of having a callback function for each symbol in the kernel,
    have a callback for each array of symbols.

    This eases the logic when we move to sorted symbols and binary search.

    Signed-off-by: Rusty Russell
    Signed-off-by: Alessio Igor Bogani

    Rusty Russell
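
    Why sorted symbols matter can be shown with a small user-space
    sketch. It is not the kernel's lookup code: struct sym and find_sym()
    are hypothetical, and the table is assumed to be already sorted by
    name, as the linker's SORT() directive would leave it. With that
    ordering, a lookup becomes a binary search over one array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry: a name and an address-like value. */
struct sym {
    const char *name;
    unsigned long value;
};

/* bsearch() comparator: the key is a plain string, the element a struct sym. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct sym *)elem)->name);
}

/* Look up a name in a table that is already sorted by name. */
const struct sym *find_sym(const struct sym *tab, size_t n, const char *name)
{
    return bsearch(name, tab, n, sizeof(*tab), cmp_name);
}
```

    find_sym() returns NULL for a missing name. Handing the search a
    whole array at a time is also what the per-array callback makes
    convenient.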
     
  • Reorder struct module to remove 24 bytes of alignment padding on 64 bit
    builds when the CONFIG_TRACE options are selected. This allows the
    structure to fit into one fewer cache lines, and its size drops from 592
    to 568 on x86_64.

    Signed-off-by: Richard Kennedy
    Signed-off-by: Rusty Russell

    Richard Kennedy
     
  • Doing so prevents the following warning from sparse:

    CHECK kernel/params.c
    kernel/params.c:817:9: warning: symbol '__modver_version_show' was not
    declared. Should it be static?

    since kernel/params.c is never compiled with MODULE being set.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Rusty Russell

    Dmitry Torokhov
     
  • On m68k natural alignment is 2-byte boundary but we are trying to
    align structures in __modver section on sizeof(void *) boundary.
    This causes trouble when we try to access elements in this section
    in array-like fashion when we create "version" attributes for built-in
    modules.

    Moreover, as DaveM said, we can't reliably put structures into
    independent objects, put them into a special section, and then expect
    array access over them (via the section boundaries) after linking the
    objects together to just "work" due to variable alignment choices in
    different situations. The only solution that seems to work reliably
    is to make an array of plain pointers to the objects in question and
    put those pointers in the special section.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Rusty Russell

    Dmitry Torokhov
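
    The pointer-array approach can be tried in user space. The sketch
    below assumes GCC or Clang with an ELF linker that provides
    __start_<section> and __stop_<section> symbols for sections whose
    names are valid C identifiers. The section name demo_ver, struct
    ver_attr and count_versions() are made up for illustration. Only
    pointers are placed in the section, so the walk does not depend on
    how the compiler pads or aligns the structures themselves.

```c
#include <stddef.h>

/* Hypothetical per-module record; its size and alignment do not matter. */
struct ver_attr {
    const char *module;
    const char *version;
};

/* The objects themselves live in ordinary data. */
static struct ver_attr ver_a = { "alpha", "1.0" };
static struct ver_attr ver_b = { "beta", "2.3" };

/* Only plain pointers go into the special section (assumed name). */
static struct ver_attr *const ptr_a
    __attribute__((used, section("demo_ver"))) = &ver_a;
static struct ver_attr *const ptr_b
    __attribute__((used, section("demo_ver"))) = &ver_b;

/* Section bounds, assumed to be provided by the linker. */
extern struct ver_attr *const __start_demo_ver[];
extern struct ver_attr *const __stop_demo_ver[];

/* Walk the pointer array between the section bounds. */
size_t count_versions(void)
{
    struct ver_attr *const *p;
    size_t n = 0;

    for (p = __start_demo_ver; p < __stop_demo_ver; p++)
        if ((*p)->version)
            n++;
    return n;
}
```

    Pointers have one natural alignment, so they pack into a real array
    regardless of how the pointed-to structures are laid out.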
     

22 Feb, 2011

1 commit

  • We force particular alignment when we generate attribute structures
    when generating MODULE_VERSION() data and we need to make sure that
    this alignment is followed when we iterate over these structures,
    otherwise we may crash on platforms whose natural alignment is not
    sizeof(void *), such as m68k.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Dmitry Torokhov
    [ There are more issues here, but the fixes are incredibly ugly - Linus ]
    Signed-off-by: Linus Torvalds

    Dmitry Torokhov
     

03 Feb, 2011

2 commits

  • Make the tracepoints more robust, making them solid enough to handle compiler
    changes by not relying on anything based on compiler-specific behavior with
    respect to structure alignment. Implement an approach proposed by David Miller:
    use an array of const pointers to refer to the individual structures, and export
    this pointer array through the linker script rather than the structures per se.
    It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
    for the pointers), but is less likely to break due to compiler changes.

    History:

    commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
    added the aligned(32) type and variable attribute to the tracepoint structures
    to deal with gcc happily aligning statically defined structures on 32-byte
    multiples.

    One attempt was to use a 8-byte alignment for tracepoint structures by applying
    both the variable and type attribute to tracepoint structures definitions and
    declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

    The reason is that the "aligned" attribute only specifies the _minimum_
    alignment for a structure, leaving both the compiler and the linker free to
    align on larger multiples. Because tracepoint.c expects the structures to be
    placed as an array within each section, up-alignment causes NULL-pointer
    exceptions due to the extra unexpected padding.

    (this patch applies on top of -tip)

    Signed-off-by: Mathieu Desnoyers
    Acked-by: David S. Miller
    LKML-Reference:
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Andrew Morton
    CC: Peter Zijlstra
    CC: Rusty Russell
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     
  • Currently the trace_event structures are placed in the _ftrace_events
    section, and at link time, the linker makes one large array of all
    the trace_event structures. On boot up, this array is read (much like
    the initcall sections) and the events are processed.

    The problem is that there is no guarantee that gcc will place complex
    structures nicely together in an array format. Two structures in the
    same file may be placed awkwardly, because gcc has no clue that they
    are supposed to be in an array.

    A hack was used previously to force the alignment to 4, to pack the
    structures together. But this caused alignment issues with other
    architectures (sparc).

    Instead of packing the structures into an array, the structures' addresses
    are now put into the _ftrace_event section. As pointers are always the
    natural alignment, gcc should always pack them tightly together
    (otherwise initcall, extable, etc would also fail).

    By having the pointers to the structures in the section, we can still
    iterate the trace_events without causing unnecessary alignment problems
    with other architectures, or depending on the current behaviour of
    gcc that will likely change in the future just to tick us kernel developers
    off a little more.

    The _ftrace_event section is also moved into the .init.data section
    as it is now only needed at boot up.

    Suggested-by: David Miller
    Cc: Mathieu Desnoyers
    Acked-by: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

24 Jan, 2011

2 commits

  • lib/built-in.o:(__modver+0x8): undefined reference to `__modver_version_show'
    lib/built-in.o:(__modver+0x2c): undefined reference to `__modver_version_show'

    Simplest to just not emit anything: if they've disabled SYSFS they probably
    want the smallest kernel possible.

    Reported-by: Randy Dunlap
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Currently only drivers that are built as modules have their versions
    shown in /sys/module/<module>/version, but this information might
    also be useful for built-in drivers. This is especially important
    for drivers that do not define any parameters - such drivers, if
    built-in, are completely invisible from userspace.

    This patch changes MODULE_VERSION() macro so that in case when we are
    compiling built-in module, version information is stored in a separate
    section. Kernel then uses this data to create 'version' sysfs attribute
    in the same fashion it creates attributes for module parameters.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Rusty Russell

    Dmitry Torokhov
     

23 Dec, 2010

1 commit


24 Nov, 2010

1 commit


18 Nov, 2010

1 commit

  • This patch is a logical extension of the protection provided by
    CONFIG_DEBUG_RODATA to LKMs. The protection is provided by
    splitting module_core and module_init into three logical parts
    each and setting appropriate page access permissions for each
    individual section:

    1. Code: RO+X
    2. RO data: RO+NX
    3. RW data: RW+NX

    In order to achieve proper protection, layout_sections() has
    been modified to align each of the three parts mentioned above
    onto page boundary. Next, the corresponding page access
    permissions are set right before successful exit from
    load_module(). Further, free_module() and sys_init_module have
    been modified to set module_core and module_init as RW+NX right
    before calling module_free().

    By default, the original section layout and access flags are
    preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y,
    the patch will page-align each group of sections to ensure that
    each page contains only one type of content and will enforce
    RO/NX for each group of pages.

    -v1: Initial proof-of-concept patch.
    -v2: The patch has been re-written to reduce the number of #ifdefs
    and to make it architecture-agnostic. Code formatting has also
    been corrected.
    -v3: Opportunistic RO/NX protection is now unconditional. Section
    page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
    -v4: Removed most macros and improved coding style.
    -v5: Changed page-alignment and RO/NX section size calculation
    -v6: Fixed comments. Restricted RO/NX enforcement to x86 only
    -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added
    calls to set_all_modules_text_rw() and set_all_modules_text_ro()
    in ftrace
    -v8: updated for compatibility with linux 2.6.33-rc5
    -v9: coding style fixes
    -v10: more coding style fixes
    -v11: minor adjustments for -tip
    -v12: minor adjustments for v2.6.35-rc2-tip
    -v13: minor adjustments for v2.6.37-rc1-tip

    Signed-off-by: Siarhei Liakh
    Signed-off-by: Xuxian Jiang
    Acked-by: Arjan van de Ven
    Reviewed-by: James Morris
    Signed-off-by: H. Peter Anvin
    Cc: Andi Kleen
    Cc: Rusty Russell
    Cc: Stephen Rothwell
    Cc: Dave Jones
    Cc: Kees Cook
    Cc: Linus Torvalds
    LKML-Reference:
    [ minor cleanliness edits, -v14: build failure fix ]
    Signed-off-by: Ingo Molnar

    matthieu castet
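
    The page-alignment step in this commit comes down to simple
    arithmetic, sketched below. This is not the layout_sections() code:
    the 4096-byte page size, struct demo_layout and layout_groups() are
    assumptions made for the example. Each group (code, RO data, RW data)
    is rounded up to a page boundary so that no page mixes two kinds of
    content.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* End offsets of the three groups inside one module region. */
struct demo_layout {
    size_t text_size;   /* end of code:    RO+X  */
    size_t ro_size;     /* end of RO data: RO+NX */
    size_t core_size;   /* end of RW data: RW+NX */
};

/* Place the groups back to back, page-aligning the end of each one. */
struct demo_layout layout_groups(size_t text, size_t rodata, size_t rwdata)
{
    struct demo_layout l;

    l.text_size = DEMO_ALIGN(text, DEMO_PAGE_SIZE);
    l.ro_size = DEMO_ALIGN(l.text_size + rodata, DEMO_PAGE_SIZE);
    l.core_size = DEMO_ALIGN(l.ro_size + rwdata, DEMO_PAGE_SIZE);
    return l;
}
```

    With 5000 bytes of code, 100 bytes of RO data and 9000 bytes of RW
    data, the three boundaries fall at 8192, 12288 and 24576, so
    permissions can be set on whole pages.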
     

08 Oct, 2010

1 commit


06 Oct, 2010

1 commit

  • With all the recent module loading cleanups, we've minimized the code
    that sits under module_mutex, fixing various deadlocks and making it
    possible to do most of the module loading in parallel.

    However, that whole conversion totally missed the rather obscure code
    that adds a new module to the list for BUG() handling. That code was
    doubly obscure because (a) the code itself lives in lib/bugs.c (for
    dubious reasons) and (b) it gets called from the architecture-specific
    "module_finalize()" rather than from generic code.

    Calling it from arch-specific code makes no sense what-so-ever to begin
    with, and is now actively wrong since that code isn't protected by the
    module loading lock any more.

    So this commit moves the "module_bug_{finalize,cleanup}()" calls away
    from the arch-specific code, and into the generic code - and in the
    process protects it with the module_mutex so that the list operations
    are now safe.

    Future fixups:
    - move the module list handling code into kernel/module.c where it
    belongs.
    - get rid of 'module_bug_list' and just use the regular list of modules
    (called 'modules' - imagine that) that we already create and maintain
    for other reasons.

    Reported-and-tested-by: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Adrian Bunk
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Sep, 2010

1 commit

  • base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
    assembly gcc mechanism, we can now branch to labels from an 'asm goto'
    statement. This allows us to create a 'no-op' fastpath, which can subsequently
    be patched with a jump to the slowpath code. This is useful for code which
    might be rarely used, but which we'd like to be able to call, if needed.
    Tracepoints are the current usecase that these are being implemented for.

    Acked-by: David S. Miller
    Signed-off-by: Jason Baron
    LKML-Reference:

    [ cleaned up some formatting ]

    Signed-off-by: Steven Rostedt

    Jason Baron
     

05 Jun, 2010

3 commits

  • These were placed in the header in ef665c1a06 to get the various
    SYSFS/MODULE config combinations to compile.

    That may have been necessary then, but it's not now. These functions
    are all local to module.c.

    Signed-off-by: Rusty Russell
    Cc: Randy Dunlap

    Rusty Russell
     
  • Linus changed the structure, and luckily this didn't compile any more.

    Reported-by: Stephen Rothwell
    Signed-off-by: Rusty Russell
    Cc: Jason Wessel
    Cc: Martin Hicks

    Rusty Russell
     
  • When adding a module that depends on another one, we used to create a
    one-way list of "modules_which_use_me", so that module unloading could
    see who needs a module.

    It's actually quite simple to make that list go both ways: so that we
    not only can see "who uses me", but also see a list of modules that are
    "used by me".

    In fact, we always wanted that list in "module_unload_free()": when we
    unload a module, we want to also release all the other modules that are
    used by that module. But because we didn't have that list, we used to
    first iterate over all modules, and then iterate over each "used by me"
    list of that module.

    By making the list two-way, we simplify module_unload_free(), and it
    allows for some trivial fixes later too.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Rusty Russell (cleaned & rebased)

    Linus Torvalds
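
    The two-way list can be sketched with plain singly linked lists. All
    names here (demo_module, demo_use, add_use, unload_free) are
    hypothetical and the real code uses the kernel's list primitives.
    Each use record sits on two lists at once: the used module's "who
    uses me" list and the user's "used by me" list.

```c
#include <stddef.h>

struct demo_module;

/* One dependency edge: source uses target. */
struct demo_use {
    struct demo_module *source;     /* the module doing the using */
    struct demo_module *target;     /* the module being used */
    struct demo_use *next_source;   /* link in target's "who uses me" list */
    struct demo_use *next_target;   /* link in source's "used by me" list */
};

struct demo_module {
    const char *name;
    int refcnt;
    struct demo_use *source_list;   /* who uses me */
    struct demo_use *target_list;   /* used by me */
};

/* Record that 'user' depends on 'used', linking the edge both ways. */
void add_use(struct demo_use *u, struct demo_module *user,
             struct demo_module *used)
{
    u->source = user;
    u->target = used;
    u->next_source = used->source_list;
    used->source_list = u;
    u->next_target = user->target_list;
    user->target_list = u;
    used->refcnt++;
}

/* On unload, walk only this module's own "used by me" list. */
int unload_free(struct demo_module *mod)
{
    struct demo_use *u, *next;
    int released = 0;

    for (u = mod->target_list; u; u = next) {
        struct demo_use **pp = &u->target->source_list;

        next = u->next_target;
        while (*pp != u)            /* unlink from target's list */
            pp = &(*pp)->next_source;
        *pp = u->next_source;
        u->target->refcnt--;
        released++;
    }
    mod->target_list = NULL;
    return released;
}
```

    unload_free() never has to scan every module in the system, which is
    the simplification the commit message describes.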
     

08 Apr, 2010

1 commit

  • Conflicts:
    include/linux/module.h
    kernel/module.c

    Semantic conflict:
    include/trace/events/module.h

    Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up
    possibly racy module refcounting")

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 Apr, 2010

1 commit

  • Module refcounting is implemented with a per-cpu counter for speed.
    However there is a race when tallying the counter where a reference may
    be taken by one CPU and released by another. Reference count summation
    may then see the decrement without having seen the previous increment,
    leading to lower than expected count. A module which never has its
    actual reference drop below 1 may return a reference count of 0 due to
    this race.

    Module removal generally runs under stop_machine, which prevents this
    race causing bugs due to removal of in-use modules. However there are
    other real bugs in module.c code and driver code (module_refcount is
    exported) where the callers do not run under stop_machine.

    Fix this by maintaining running per-cpu counters for the number of
    module refcount increments and the number of refcount decrements. The
    increments are tallied after the decrements, so any decrement seen will
    always have its corresponding increment counted. The final refcount is
    the difference of the total increments and decrements, preventing a
    low-refcount from being returned.

    Signed-off-by: Nick Piggin
    Acked-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Nick Piggin
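
    The counting scheme can be modelled in a single-threaded sketch. The
    names (fake_module, mod_get, mod_put, mod_refcount) and the fixed
    NR_CPUS value are assumptions, and a real concurrent version would
    also need a read barrier between the two summation loops. The point
    shown is the ordering: decrements are summed first and increments
    second, so a decrement that has been counted always has its matching
    increment counted too.

```c
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* Per-CPU running totals; neither counter ever goes down. */
struct mod_ref {
    unsigned long incs;
    unsigned long decs;
};

struct fake_module {
    struct mod_ref ref[NR_CPUS];
};

/* Take a reference on the given CPU. */
void mod_get(struct fake_module *m, int cpu)
{
    m->ref[cpu].incs++;
}

/* Drop a reference, possibly on a different CPU than the get. */
void mod_put(struct fake_module *m, int cpu)
{
    m->ref[cpu].decs++;
}

/* Sum decrements first, then increments; the difference is the count. */
unsigned long mod_refcount(const struct fake_module *m)
{
    unsigned long incs = 0, decs = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        decs += m->ref[cpu].decs;
    /* A concurrent implementation would place a read barrier here. */
    for (cpu = 0; cpu < NR_CPUS; cpu++)
        incs += m->ref[cpu].incs;
    return incs - decs;
}
```

    A get on CPU 0 followed by a put on CPU 1 still sums to zero, and a
    racing reader can only overestimate the count, never underestimate
    it.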
     

01 Apr, 2010

1 commit

  • Remove the @refcnt argument, because it has side-effects, and arguments with
    side-effects are not skipped by the jump over disabled instrumentation and are
    executed even when the tracepoint is disabled.

    This was also causing a GPF as found by Randy Dunlap:

    Subject: 2.6.33 GP fault only when built with tracing
    LKML-Reference:

    Note, the current 2.6.34-rc has a fix for the actual cause of the GPF,
    but this fixes one of its triggers.

    Tested-by: Randy Dunlap
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
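
    The side-effect problem can be reproduced outside the kernel. In this
    sketch every name is hypothetical and the "tracepoint" is just a flag
    check. When the instrumentation is a function, its argument is
    evaluated at the call site before the enabled check runs. A macro
    that places the expression behind the check skips it.

```c
static int demo_enabled;        /* the "tracepoint" is off by default */
static int demo_evaluations;    /* how often the argument was evaluated */
static int demo_last;           /* last value seen by the probe */

/* Stand-in for an argument expression with a side effect. */
static int read_refcnt(void)
{
    demo_evaluations++;
    return 42;
}

static void demo_probe(int v)
{
    demo_last = v;
}

/* Function style: the caller evaluates the argument unconditionally. */
static void trace_fn(int v)
{
    if (demo_enabled)
        demo_probe(v);
}

/* Macro style: the expression is only evaluated when enabled. */
#define TRACE_MACRO(expr) \
    do { if (demo_enabled) demo_probe(expr); } while (0)

/* Number of argument evaluations with the function form disabled. */
int evals_disabled_fn(void)
{
    demo_enabled = 0;
    demo_evaluations = 0;
    trace_fn(read_refcnt());
    return demo_evaluations;
}

/* Number of argument evaluations with the macro form disabled. */
int evals_disabled_macro(void)
{
    demo_enabled = 0;
    demo_evaluations = 0;
    TRACE_MACRO(read_refcnt());
    return demo_evaluations;
}
```

    Dropping the argument, as this commit does, removes the evaluation
    from the disabled path altogether.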
     

31 Mar, 2010

1 commit


29 Mar, 2010

2 commits

  • lockdep has custom code to check whether a pointer belongs to static
    percpu area which is somewhat broken. Implement proper
    is_kernel/module_percpu_address() and replace the custom code.

    On UP, percpu variables are regular static variables and can't be
    distinguished from them. Always return %false on UP.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar

    Tejun Heo
     
  • Better encapsulate module static percpu area handling so that code
    outside of CONFIG_SMP ifdef doesn't deal with mod->percpu directly
    and add mod->percpu_size and record percpu_size in it. Both percpu
    fields are compiled out on UP. While at it, mark mod->percpu w/
    __percpu.

    This is to prepare for is_module_percpu_address().

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell

    Tejun Heo
     

13 Mar, 2010

1 commit

    Extern declarations in sysctl.c should be moved to their own header files,
    which are then included in the relevant .c files.

    Move modprobe_path extern declaration to linux/kmod.h
    Move modules_disabled extern declaration to linux/module.h

    Signed-off-by: Dave Young
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to core subsystems.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Paul E. McKenney
    Cc: Jens Axboe
    Cc: linux-mm@kvack.org
    Cc: Rusty Russell
    Cc: Dipankar Sarma
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Eric Biederman

    Tejun Heo
     

05 Jan, 2010

2 commits

  • ringbuffer*.c are the last users of local.h.

    Remove the include from modules.h and add it to ringbuffer files.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     
  • Use cpu ops to deal with the per cpu data instead of a local_t. Reduces memory
    requirements, cache footprint and decreases cycle counts.

    The this_cpu_xx operations are also used for !SMP mode. Otherwise we could
    not drop the use of __module_ref_addr() which would make per cpu data handling
    complicated. this_cpu_xx operations have their own fallback for !SMP.

    V8-V9:
    - Leave include asm/module.h since ringbuffer.c depends on it. Nothing else
    does though. Another patch will deal with that.
    - Remove spurious free.

    Signed-off-by: Christoph Lameter
    Acked-by: Rusty Russell
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

15 Dec, 2009

1 commit

  • The next commit will require the use of MODULE_SYMBOL_PREFIX in
    .tmp_exports-asm.S. Currently it is mixed in with C structure
    definitions in "asm/module.h". Move the definition of this arch option
    into Kconfig, so it can be easily accessed by any code.

    This also lets modpost.c use the same definition. Previously modpost
    relied on a hardcoded list of architectures in mk_elfconfig.c.

    A build test for blackfin, one of the two MODULE_SYMBOL_PREFIX archs,
    showed the generated code was unchanged. vmlinux was identical save
    for build ids, and an apparently randomized suffix on a single "__key"
    symbol in the kallsyms data.

    Signed-off-by: Alan Jenkins
    Acked-by: Mike Frysinger (blackfin)
    CC: Sam Ravnborg
    Signed-off-by: Rusty Russell

    Alan Jenkins
     

24 Sep, 2009

3 commits


19 Sep, 2009

1 commit

  • Now that the last users of markers have migrated to the event
    tracer we can kill off the (now orphan) support code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mathieu Desnoyers
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christoph Hellwig
     

17 Aug, 2009

1 commit

  • Add trace points to trace module_load, module_free, module_get,
    module_put and module_request, and use trace_event facility to
    get the trace output.

    Here's the sample output:

    TASK-PID CPU# TIMESTAMP FUNCTION
    | | | | |
    -42 [000] 1.758380: module_request: fb0 wait=1 call_site=fb_open
    ...
    -60 [000] 3.269403: module_load: scsi_wait_scan
    -60 [000] 3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
    -61 [001] 3.273168: module_free: scsi_wait_scan
    ...
    -1021 [000] 13.836081: module_load: sunrpc
    -1021 [000] 13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
    -1027 [000] 13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
    -1027 [000] 13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
    -1027 [000] 13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
    ...
    modprobe-2587 [001] 1088.437213: module_load: trace_events_sample F
    modprobe-2587 [001] 1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0

    Note:

    - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0

    - the module refcnt is percpu, so it can be negative in a
    specific cpu

    Signed-off-by: Li Zefan
    Acked-by: Rusty Russell
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan