12 Apr, 2009

1 commit

  • Several drivers use asynchronous work to do device discovery, and we
    synchronize with them in the compiled-in case before we actually try to
    mount root filesystems etc.

    However, when compiled as modules, that synchronization is missing - the
    module loading completes, but the driver hasn't actually finished
    probing for devices, and that means that any user mode that expects to
    use the devices after the 'insmod' is now potentially broken.

    We already saw one case of a similar issue in the ACPI battery code,
    where the kernel itself expected the module to be all done, and unmapped
    the init memory - but the async device discovery was still running.
    That got hacked around by just removing the "__init" (see commit
    5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot
    oops"), but the real fix is to just make the module loading wait for all
    async work to be completed.

    It will slow down module loading, but since common devices should be
    built in anyway, and since the bug is really annoying and hard to handle
    from user space (and caused several S3 resume regressions), the simple
    fix to wait is the right one.

    This fixes at least

    http://bugzilla.kernel.org/show_bug.cgi?id=13063

    but probably a few other bugzilla entries too (12936, for example), and
    is confirmed to fix Rafael's storage driver breakage after resume bug
    report (no bugzilla entry).

    We should also be able to now revert that ACPI battery fix.

    Reported-and-tested-by: Rafael J. Wysocki
    Tested-by: Heinz Diehl
    Acked-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Linus Torvalds
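
The ordering problem described above can be modelled in userspace. This is only a sketch: every `toy_*` name is hypothetical, a simple array stands in for the kernel's async machinery, and the "wait" step stands in for the call to async_synchronize_full() that the commit message implies.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WORK 16

typedef void (*toy_work_fn)(void *data);

struct toy_work {
    toy_work_fn fn;
    void *data;
};

/* Pending deferred work; a stand-in for the kernel's async queue. */
static struct toy_work toy_queue[TOY_MAX_WORK];
static size_t toy_pending;

/* Queue work to run later (hypothetical stand-in for async_schedule()). */
void toy_async_schedule(toy_work_fn fn, void *data)
{
    assert(toy_pending < TOY_MAX_WORK);
    toy_queue[toy_pending].fn = fn;
    toy_queue[toy_pending].data = data;
    toy_pending++;
}

/* Run everything queued so far (stand-in for async_synchronize_full()). */
void toy_async_synchronize_full(void)
{
    size_t i;

    for (i = 0; i < toy_pending; i++)
        toy_queue[i].fn(toy_queue[i].data);
    toy_pending = 0;
}

/* Model of module loading: run init, then optionally wait for async work. */
int toy_load_module(int (*init)(void), int wait_for_async)
{
    int ret = init();

    if (ret == 0 && wait_for_async)
        toy_async_synchronize_full();
    return ret;
}

/* Example driver: init only queues the probe, so the device shows up later. */
int toy_device_ready;

void toy_probe(void *data)
{
    (void)data;
    toy_device_ready = 1;
}

int toy_driver_init(void)
{
    toy_async_schedule(toy_probe, NULL);
    return 0;
}
```

With `wait_for_async` set to 0, `toy_load_module()` returns while `toy_device_ready` is still 0, which is the window user space used to fall into after 'insmod'. With it set to 1, the device is ready by the time the load returns.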
     

07 Apr, 2009

1 commit

  • This reverts commit 9cb610d8e35fe3ec95a2fe2030b02f85aeea83c1.

    This was an impressively stupid patch. Firstly, we reset the SHF_ALLOC
    flag lower down in the same function, so the patch was useless. Even
    better, find_sec() ignores sections with SHF_ALLOC not set, so
    it breaks CONFIG_MODVERSIONS=y with CONFIG_MODULE_FORCE_LOAD=n, which
    refuses to load the module since it can't find the __versions section.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

06 Apr, 2009

1 commit

  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
    tracing, net: fix net tree and tracing tree merge interaction
    tracing, powerpc: fix powerpc tree and tracing tree interaction
    ring-buffer: do not remove reader page from list on ring buffer free
    function-graph: allow unregistering twice
    trace: make argument 'mem' of trace_seq_putmem() const
    tracing: add missing 'extern' keywords to trace_output.h
    tracing: provide trace_seq_reserve()
    blktrace: print out BLK_TN_MESSAGE properly
    blktrace: extract duplidate code
    blktrace: fix memory leak when freeing struct blk_io_trace
    blktrace: fix blk_probes_ref chaos
    blktrace: make classic output more classic
    blktrace: fix off-by-one bug
    blktrace: fix the original blktrace
    blktrace: fix a race when creating blk_tree_root in debugfs
    blktrace: fix timestamp in binary output
    tracing, Text Edit Lock: cleanup
    tracing: filter fix for TRACE_EVENT_FORMAT events
    ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
    x86: kretprobe-booster interrupt emulation code fix
    ...

    Fix up trivial conflicts in
    arch/parisc/include/asm/ftrace.h
    include/linux/memory.h
    kernel/extable.c
    kernel/module.c

    Linus Torvalds
     

02 Apr, 2009

1 commit


31 Mar, 2009

12 commits

  • Impact: minor cleanup.

    I'm not going to neaten anyone else's code, but I'm happy to clean up
    my own.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Kay Sievers discovered that boot times are slowed
    by about half a second because of all the stop_machine_create() calls,
    and he only probes about 40 modules (I have 125 loaded on this laptop).

    We only do stop_machine_create() so we can unlink the module if
    something goes wrong, but it's overkill (and buggy anyway: if
    stop_machine_create() fails we still call stop_machine_destroy()).

    Since we are only protecting against kallsyms (esp. oops) walking the
    list, synchronize_sched() is sufficient (synchronize_rcu() is probably
    sufficient, but we're not in a hurry).

    Kay says of this patch:
    ... no module takes more than 40 millisecs to link now, most of
    them are between 3 and 8 millisecs.

    That looks very different to the numbers without this patch
    and the otherwise same setup, where we get heavy noise in the
    traces and many delays of up to 200 millisecs until linking,
    most of them taking 30+ millisecs.

    Tested-by: Kay Sievers
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • With CONFIG_MODVERSIONS, we version 'struct module' using a dummy
    export, but other things matter too:

    1) 'struct modversion_info' determines the layout of the __versions section,
    2) 'struct kernel_param' determines the layout of the __params section,
    3) 'struct kernel_symbol' determines __ksymtab*.
    4) 'struct marker' determines __markers.
    5) 'struct tracepoint' determines __tracepoints.

    So we rename 'struct_module' to 'module_layout' and include these in
    the signature. Now that it's general, we can add others later on
    without confusion.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: reduce kernel memory usage

    This patch just takes off the SHF_ALLOC flag on __versions so we don't
    keep them around after module load.

    This saves about 7% of module memory if CONFIG_MODVERSIONS=y.

    Cc: Shawn Bohrer
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: Message cleanup

    Two of the three callers of try_to_force_load() are not there because
    of a missing version, so change the messages:

    Old:
    <modname>: no version for "magic" found: kernel tainted.
    New:
    <modname>: bad vermagic: kernel tainted.

    Old:
    <modname>: no version for "nocrc" found: kernel tainted.
    New:
    <modname>: no versions for exported symbols: kernel tainted.

    Old:
    <modname>: no version for "<symbol>" found: kernel tainted.
    New:
    <modname>: <symbol>: kernel tainted.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: Expose some module.c symbols

    Ksplice uses several functions from module.c in order to resolve
    symbols and implement dependency handling. Calling these functions
    requires holding module_mutex, so it is exported.

    (This is just the module part of a bigger add-exports patch from Tim).

    Cc: Anders Kaseorg
    Cc: Jeff Arnold
    Signed-off-by: Tim Abbott
    Signed-off-by: Rusty Russell

    Tim Abbott
     
  • Impact: New API

    kallsyms_lookup_name only returns the first match that it finds. Ksplice
    needs information about all symbols with a given name in order to correctly
    resolve local symbols.

    kallsyms_on_each_symbol provides a generic mechanism for iterating over the
    kallsyms table.

    Cc: Jeff Arnold
    Cc: Tim Abbott
    Signed-off-by: Anders Kaseorg
    Signed-off-by: Rusty Russell

    Anders Kaseorg
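
The difference between a first-match lookup and a full iteration can be sketched in userspace as follows. The table contents and all `toy_*` names are hypothetical; only the callback-with-early-stop shape is meant to mirror the kind of generic iterator the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_sym {
    const char *name;
    unsigned long addr;
};

/* Hypothetical symbol table containing two symbols named "foo". */
static const struct toy_sym toy_symtab[] = {
    { "foo", 0x1000 },
    { "bar", 0x2000 },
    { "foo", 0x3000 },
};
#define TOY_NSYMS (sizeof(toy_symtab) / sizeof(toy_symtab[0]))

/* First-match lookup: returns the address of the first hit, 0 if none. */
unsigned long toy_lookup_name(const char *name)
{
    size_t i;

    for (i = 0; i < TOY_NSYMS; i++)
        if (strcmp(toy_symtab[i].name, name) == 0)
            return toy_symtab[i].addr;
    return 0;
}

/* Visit every symbol; stop early if the callback returns nonzero. */
int toy_on_each_symbol(int (*fn)(void *data, const char *name,
                                 unsigned long addr),
                       void *data)
{
    size_t i;

    for (i = 0; i < TOY_NSYMS; i++) {
        int ret = fn(data, toy_symtab[i].name, toy_symtab[i].addr);

        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Example callback: count the symbols whose name matches. */
struct toy_count {
    const char *name;
    int hits;
};

int toy_count_matches(void *data, const char *name, unsigned long addr)
{
    struct toy_count *c = data;

    (void)addr;
    if (strcmp(name, c->name) == 0)
        c->hits++;
    return 0;
}
```

Here `toy_lookup_name("foo")` only ever sees the entry at 0x1000, while the iterator lets a caller observe both "foo" entries and decide which one it wants.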
     
  • Impact: Replace and remove risky (non-EXPORTed) API

    module_text_address() returns a pointer to the module, which given locking
    improvements in module.c, is useless except to test for NULL:

    1) If the module can't go away, use __module_text_address.
    2) Otherwise, just use is_module_text_address().

    Cc: linux-mtd@lists.infradead.org
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: New API, cleanup

    ksplice wants to know the bounds of a module, not just the module text.

    It makes sense to have __module_address. We then implement
    is_module_address and __module_text_address in terms of this (and
    change is_module_text_address() to bool while we're at it).

    Also, add proper kerneldoc for them all.

    Cc: Anders Kaseorg
    Cc: Jeff Arnold
    Cc: Tim Abbott
    Signed-off-by: Rusty Russell

    Rusty Russell
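
The layering described above (a general address lookup, with the boolean test and the text-only lookup built on top of it) might look roughly like this in a userspace model. The struct layout, the single-entry module table and all `toy_*` names are assumptions for illustration, and the locking the kernel needs is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_module {
    unsigned long init_base, init_size, init_text_size; /* init region */
    unsigned long core_base, core_size, core_text_size; /* core region */
};

/* Hypothetical loaded-module list with a single entry. */
static struct toy_module toy_modules[] = {
    { 0x1000, 0x100, 0x40, 0x2000, 0x400, 0x200 },
};
#define TOY_NMODS (sizeof(toy_modules) / sizeof(toy_modules[0]))

/* True if addr lies in the half-open range [base, base + size). */
static bool toy_within(unsigned long addr, unsigned long base,
                       unsigned long size)
{
    return base <= addr && addr < base + size;
}

/* General lookup: any address inside a module's init or core region. */
struct toy_module *toy_module_address(unsigned long addr)
{
    size_t i;

    for (i = 0; i < TOY_NMODS; i++) {
        struct toy_module *m = &toy_modules[i];

        if (toy_within(addr, m->init_base, m->init_size) ||
            toy_within(addr, m->core_base, m->core_size))
            return m;
    }
    return NULL;
}

/* Boolean test, implemented in terms of the general lookup. */
bool toy_is_module_address(unsigned long addr)
{
    return toy_module_address(addr) != NULL;
}

/* Text-only lookup: same lookup, then restrict to the text portions. */
struct toy_module *toy_module_text_address(unsigned long addr)
{
    struct toy_module *m = toy_module_address(addr);

    if (m && !toy_within(addr, m->init_base, m->init_text_size) &&
        !toy_within(addr, m->core_base, m->core_text_size))
        m = NULL;
    return m;
}
```

In this model an address in the data part of the core region is a module address but not a module text address, which is the distinction the commit says ksplice needs.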
     
  • Impact: Cleanup, internal API change

    Ksplice needs access to the kernel_symbol structure in order to support
    modifications to the exported symbol table.

    Cc: Anders Kaseorg
    Cc: Jeff Arnold
    Signed-off-by: Tim Abbott
    Signed-off-by: Rusty Russell (bugfix and style)

    Tim Abbott
     
  • Impact: cleanup

    Label 'free_init' is only used when defined(CONFIG_MODULE_UNLOAD) &&
    defined(CONFIG_SMP), so move it inside to shut up gcc.

    Signed-off-by: WANG Cong
    Cc: Rusty Russell
    Signed-off-by: Rusty Russell

    Américo Wang
     
  • Impact: fix crash on reading from /sys/module/.../ieee80211_default_rc_algo

    The module_param type "charp" simply sets a char * pointer in the
    module to the parameter in the commandline string: this is why we keep
    the (mangled) module command line around. But when set via sysfs (as
    about 11 charp parameters can be) this memory is freed on the way
    out of the write(). Future reads hit random mem.

    So we kstrdup instead: we have to check we're not in early commandline
    parsing, and we have to note when we've used it so we can reliably
    kfree the parameter when it's next overwritten, and also on module
    unload.

    (Thanks to Randy Dunlap for CONFIG_SYSFS=n fixes)

    Reported-by: Sitsofe Wheeler
    Diagnosed-by: Frederic Weisbecker
    Tested-by: Frederic Weisbecker
    Tested-by: Christof Schmitt
    Signed-off-by: Rusty Russell

    Rusty Russell
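
A userspace sketch of the "duplicate and remember ownership" idea described above, with malloc()/free() standing in for kstrdup()/kfree(). The struct and the `toy_*` functions are hypothetical, and the early-commandline special case mentioned in the message is reduced to the `owned` flag starting out as 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_charp {
    char *value; /* current parameter string */
    int owned;   /* nonzero if value was allocated by toy_set_charp() */
};

/* Copy val so the parameter never points into a caller's temporary buffer. */
int toy_set_charp(struct toy_charp *p, const char *val)
{
    size_t len = strlen(val) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, val, len);
    if (p->owned)
        free(p->value); /* release the previous copy on overwrite */
    p->value = copy;
    p->owned = 1;
    return 0;
}

/* Release the copy, as would happen on module unload. */
void toy_free_charp(struct toy_charp *p)
{
    if (p->owned)
        free(p->value);
    p->value = NULL;
    p->owned = 0;
}
```

Because the parameter holds its own copy, the writer's buffer can be reused or freed after the write without later reads hitting stale memory.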
     

28 Mar, 2009

1 commit


25 Mar, 2009

1 commit

  • This patch combines Greg Banks's dprintk() work with the existing dynamic
    printk patchset; we are now calling it 'dynamic debug'.

    The new feature of this patchset is a richer /debugfs control file interface
    (an example output from my system is at the bottom), which allows fine-grained
    control over the debug output. The output can be controlled by function,
    file, module, format string, and line number.

    For example, to enable all debug messages in module 'nf_conntrack':

    echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

    To disable them:

    echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

    A further explanation can be found in the documentation patch.

    Signed-off-by: Greg Banks
    Signed-off-by: Jason Baron
    Signed-off-by: Greg Kroah-Hartman

    Jason Baron
     

20 Mar, 2009

1 commit


18 Mar, 2009

1 commit

  • Impact: fix ref-after-free crash on failed module load

    Fix refptr bug: change the refptr allocation and release order so that the
    module data structure pointed to by 'mod' is not accessed after
    mod->module_core has been freed. This bug causes a kernel panic when a
    module load fails (e.g. when undefined symbols cannot be resolved).

    This bug was reported on systemtap bugzilla.
    http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927

    Signed-off-by: Masami Hiramatsu
    Cc: Eric Dumazet
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
     

10 Mar, 2009

1 commit


06 Mar, 2009

2 commits

  • Conflicts:
    arch/x86/Kconfig
    block/blktrace.c
    kernel/irq/handle.c

    Semantic conflict:
    kernel/trace/blktrace.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: add reserved allocation functionality and use it for module
    percpu variables

    This patch implements reserved allocation from the first chunk. When
    setting up the first chunk, arch can ask to set aside a certain number
    of bytes right after the core static area which is available only
    through a separate reserved allocator. This will be used primarily
    for module static percpu variables on architectures with limited
    relocation range to ensure that the module percpu symbols are inside
    the relocatable range.

    If a reserved area is requested, the first chunk becomes reserved and
    isn't available for regular allocation. If the first chunk also
    includes a piggy-back dynamic allocation area, a separate chunk mapping
    the same region is created to serve dynamic allocation. The first one
    is called static first chunk and the second dynamic first chunk.
    Although they share the page map, their different area map
    initializations guarantee they serve disjoint areas according to their
    purposes.

    If arch doesn't set up a reserved area, reserved allocation is handled
    like any other allocation.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

20 Feb, 2009

2 commits

  • Impact: new scalable dynamic percpu allocator which allows dynamic
    percpu areas to be accessed the same way as static ones

    Implement scalable dynamic percpu allocator which can be used for both
    static and dynamic percpu areas. This will allow static and dynamic
    areas to share faster direct access methods. This feature is optional
    and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
    arch. Please read comment on top of mm/percpu.c for details.

    Signed-off-by: Tejun Heo
    Cc: Andrew Morton

    Tejun Heo
     
  • Impact: cleanup

    Move percpu_modinit() upwards. This is to ease further changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

09 Feb, 2009

1 commit

  • When the function graph tracer picks a return address, it ensures this address
    is really a kernel text one by calling __kernel_text_address().

    Actually, this path has never been taken. Its role was more likely to debug the
    tracer at the beginning of its development, but this check is wasteful since it
    is called for every traced function.

    The fault check is already sufficient.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

03 Feb, 2009

1 commit

  • Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is
    using a lot of memory.

    Each 'struct module' contains an [NR_CPUS] array of full cache lines.

    This patch uses existing infrastructure (percpu_modalloc() &
    percpu_modfree()) to allocate percpu space for the refcount storage.

    Instead of wasting NR_CPUS*128 bytes (on i386), we now use
    nr_cpu_ids*sizeof(local_t) bytes.

    On a typical distro, where NR_CPUS=8, shipping 2000 modules, we reduce
    the size of module files by about 2 Mbytes (1 KB per module).

    Instead of having all refcounters in the same memory node - with TLB misses
    because of vmalloc() - this new implementation has better NUMA
    properties, since each CPU will use storage on its preferred node,
    thanks to percpu storage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Eric Dumazet
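
The numbers quoted above can be checked with a small calculation. The function names are hypothetical; the 128-byte cache line and NR_CPUS=8 come from the message, while the 4-byte counter and the 2 possible CPUs used in the usage note are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Old layout: one padded cache line per configured CPU, per module. */
size_t toy_old_refcount_bytes(size_t nr_cpus, size_t cacheline_bytes)
{
    return nr_cpus * cacheline_bytes;
}

/* New layout: one small counter per possible CPU id, in percpu space. */
size_t toy_new_refcount_bytes(size_t nr_cpu_ids, size_t counter_bytes)
{
    return nr_cpu_ids * counter_bytes;
}
```

With NR_CPUS=8 and 128-byte lines the old layout costs 1024 bytes per module, so 2000 modules come to 2,048,000 bytes, in line with the "about 2 Mbytes" figure. Assuming a 4-byte counter and 2 possible CPUs, the new layout would need only 8 bytes per module.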
     

14 Jan, 2009

1 commit


08 Jan, 2009

1 commit

  • Right now, most of the kernel boot is strictly synchronous, such that
    various hardware delays are done sequentially.

    In order to make the kernel boot faster, this patch introduces
    infrastructure to allow doing some of the initialization steps
    asynchronously, which will hide significant portions of the hardware delays
    in practice.

    In order to not change device order and other similar observables, this
    patch does NOT do full parallel initialization.

    Rather, it operates more in the way an out of order CPU does; the work may
    be done out of order and asynchronous, but the observable effects
    (instruction retiring for the CPU) are still done in the original sequence.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

07 Jan, 2009

3 commits

  • Add a module notifier call which notifies that the state of a module
    changes from MODULE_STATE_COMING to MODULE_STATE_LIVE.

    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • This series of patches allows kprobes to probe a module's __init and __exit
    functions. This means you can probe driver initialization and
    termination.

    Currently, kprobes can't probe __init functions because these functions are
    freed after module initialization. It also can't probe module __exit
    functions because kprobes increments the reference count of the target
    module, so the user can't unload it. This means __exit functions are never
    called unless the probes are removed from the module.

    To solve both cases, this series of patches introduces a GONE flag and sets
    it when the target code is freed (for this purpose, kprobes hooks
    MODULE_STATE_* events). It also removes the refcount increment so that
    the user can unload the target module. Users can check which probes are
    GONE through the debugfs interface. To catch the moment when a module's
    .init text is freed, the series also includes a patch which adds a module
    notifier call for the MODULE_STATE_LIVE event.

    This patch:

    Add within_module_core() and within_module_init() for checking whether an
    address is in the module .init.text section or .text section, and replace
    within() local inline functions in kernel/module.c with them.

    kprobes uses these functions to check where the kprobe is inserted.

    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • Signed-off-by: Alexey Dobriyan
    Cc: Gabor Gombas
    Cc: Jan Beulich
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

05 Jan, 2009

4 commits

  • The module code relies on a non-failing stop_machine call. So we create
    the kstop threads in advance and with that make sure the call won't fail.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Rusty Russell

    Heiko Carstens
     
  • When creating the final layout of a kernel module in memory, allow the
    module loader to reserve some additional memory in front of a given section.
    This is currently only needed for the parisc port which needs to put the
    stub entries there to fulfill the 17/22bit PCREL relocations with large
    kernel modules like xfs.

    Signed-off-by: Helge Deller
    Signed-off-by: Rusty Russell (renamed fn)

    Helge Deller
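
One way to picture "reserving memory in front of a section" during layout is the following userspace sketch. The function names are hypothetical, and the `prepend` argument stands in for whatever amount an architecture hook would report; this is an illustration of the idea, not the loader's actual code.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned long toy_align(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Place a section of sec_size bytes after the *size bytes laid out so far,
 * leaving at least `prepend` extra bytes in front of it. Returns the
 * section's offset and advances *size past the end of the section.
 */
unsigned long toy_get_offset(unsigned long *size, unsigned long prepend,
                             unsigned long sec_align, unsigned long sec_size)
{
    unsigned long off;

    *size += prepend; /* arch-reserved space, e.g. for stub entries */
    off = toy_align(*size, sec_align ? sec_align : 1);
    *size = off + sec_size;
    return off;
}
```

With 10 bytes already laid out, a 6-byte reservation and 16-byte alignment, a 100-byte section lands at offset 16 and the layout grows to 116 bytes; the reserved bytes sit directly in front of the section.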
     
  • Fix this warning:
    kernel/module.c:824: warning: ‘print_unload_info’ defined but not used
    print_unload_info() is only used when CONFIG_PROC_FS is defined.
    This patch marks print_unload_info() inline to solve the problem.

    Signed-off-by: Jianjun Kong
    Signed-off-by: Rusty Russell
    CC: Ingo Molnar
    CC: Américo Wang

    Jianjun Kong
     
  • When there are two symbols in a module with the same name, one of which is
    exported, both will be marked as exported in /proc/kallsyms. There aren't
    any instances of this in the current kernel, but it is easy to construct a
    simple module with two compilation units that exhibits the problem.

    $ objdump -j .text -t testmod.ko | grep foo
    00000000 l F .text 00000032 foo
    00000080 g F .text 00000001 foo
    $ sudo insmod testmod.ko
    $ grep "T foo" /proc/kallsyms
    c28e8000 T foo [testmod]
    c28e8080 T foo [testmod]

    Fix this by comparing the symbol values once we've found the exported
    symbol table entry matching the symbol name. Tested using Ksplice:

    $ ksplice-create --patch=this_commit.patch --id=bar .
    $ sudo ksplice-apply ksplice-bar.tar.gz
    Done!
    $ grep "T foo" /proc/kallsyms
    c28e8080 T foo [testmod]

    Signed-off-by: Tim Abbott
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Rusty Russell

    Tim Abbott
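
The fix described above (match on the symbol value as well as the name) can be sketched in userspace like this. The `toy_*` names and the one-entry export table are hypothetical; the two addresses are the ones shown in the /proc/kallsyms output in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_export {
    const char *name;
    unsigned long value;
};

/* Export table for the testmod example: only the global foo is exported. */
static const struct toy_export toy_exports[] = {
    { "foo", 0xc28e8080UL },
};
#define TOY_NEXPORTS (sizeof(toy_exports) / sizeof(toy_exports[0]))

/* Find an export table entry by name, or NULL if there is none. */
static const struct toy_export *toy_lookup_export(const char *name)
{
    size_t i;

    for (i = 0; i < TOY_NEXPORTS; i++)
        if (strcmp(toy_exports[i].name, name) == 0)
            return &toy_exports[i];
    return NULL;
}

/* Before the fix: any symbol whose name is in the export table counts. */
bool toy_is_exported_old(const char *name)
{
    return toy_lookup_export(name) != NULL;
}

/* After the fix: the name must match and the value must be the exported one. */
bool toy_is_exported(const char *name, unsigned long value)
{
    const struct toy_export *e = toy_lookup_export(name);

    return e != NULL && e->value == value;
}
```

Under the name-only check both foo symbols look exported; with the value comparison only the one at c28e8080 does, matching the output shown after the patch is applied.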
     

08 Dec, 2008

1 commit

  • Impact: trace more functions

    When the function graph tracer is configured, three more files are not
    traced, just to prevent four functions from being traced. This impacts
    the normal function tracer too.

    arch/x86/kernel/process_64/32.c:

    I had crashes when I let this file be traced. After some debugging, I saw
    that the "current" task pointer was changed inside __switch_to(), i.e.
    "write_pda(pcurrent, next_p);" inside process_64.c. Since the tracer stores
    the original return address of the function inside current, we had
    crashes. Only __switch_to() has to be excluded from tracing.

    kernel/module.c and kernel/extable.c:

    Because of a function used internally by the function graph tracer:
    __kernel_text_address()

    To let the other functions inside these files be traced, this patch
    introduces the __notrace_funcgraph function prefix, which is __notrace if
    the function graph tracer is configured and nothing if not.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

17 Nov, 2008

1 commit


16 Nov, 2008

2 commits