10 Nov, 2015

1 commit

  • commit 275d7d44d802ef271a42dc87ac091a495ba72fc5 upstream.

    Poma (on the way to another bug) reported an assertion triggering:

    [] module_assert_mutex_or_preempt+0x49/0x90
    [] __module_address+0x32/0x150
    [] __module_text_address+0x16/0x70
    [] symbol_put_addr+0x29/0x40
    [] dvb_frontend_detach+0x7d/0x90 [dvb_core]

    Laura Abbott produced a patch which led us to
    inspect symbol_put_addr(). This function has a comment claiming it
    doesn't need to disable preemption around the module lookup
    because it holds a reference to the module it wants to find, which
    therefore cannot go away.

    This is wrong (and a false optimization too, preempt_disable() is really
    rather cheap, and I doubt any of this is on uber critical paths,
    otherwise it would've retained a pointer to the actual module anyway and
    avoided the second lookup).

    While it's true that the module cannot go away while we hold a reference
    on it, the data structure we do the lookup in very much _CAN_ change
    while we do the lookup. Therefore fix the comment and add the
    required preempt_disable().

    Reported-by: poma
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell
    Fixes: a6e6abd575fc ("module: remove module_text_address()")
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

09 May, 2015

1 commit

  • The module notifier call chain for MODULE_STATE_COMING was moved up before
    the parsing of args, into the complete_formation() call. But if the module failed
    to load after that, the notifier call chain for MODULE_STATE_GOING was
    never called and that prevented the users of those call chains from
    cleaning up anything that was allocated.

    Link: http://lkml.kernel.org/r/554C52B9.9060700@gmail.com

    Reported-by: Pontus Fuchs
    Fixes: 4982223e51e8 "module: set nx before marking module MODULE_STATE_COMING"
    Cc: stable@vger.kernel.org # 3.16+
    Signed-off-by: Steven Rostedt
    Signed-off-by: Rusty Russell

    Steven Rostedt
     

23 Apr, 2015

1 commit

  • Pull module updates from Rusty Russell:
    "Quentin opened a can of worms by adding extable entry checking to
    modpost, but most architectures seem fixed now. Thanks to all
    involved.

    Last minute rebase because I noticed a "[PATCH]" had snuck into a
    commit message somehow"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: don't emit section mismatch warnings for compiler optimizations
    modpost: expand pattern matching to support substring matches
    modpost: do not try to match the SHT_NUL section.
    modpost: fix extable entry size calculation.
    modpost: fix inverted logic in is_extable_fault_address().
    modpost: handle -ffunction-sections
    modpost: Whitelist .text.fixup and .exception.text
    params: handle quotes properly for values not of form foo="bar".
    modpost: document the use of struct section_check.
    modpost: handle relocations mismatch in __ex_table.
    scripts: add check_extable.sh script.
    modpost: mismatch_handler: retrieve tosym information only when needed.
    modpost: factorize symbol pretty print in get_pretty_name().
    modpost: add handler function pointer to sectioncheck.
    modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
    modpost: add strict white-listing when referencing sections.
    module: do not print allocation-fail warning on bogus user buffer size
    kernel/module.c: fix typos in message about unused symbols

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Some clean ups and small fixes, but the biggest change is the addition
    of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

    Tracepoints have helper functions for TP_printk() called
    __print_symbolic() and __print_flags() that let a numeric value be
    displayed as human comprehensible text. What is placed in the
    TP_printk() is also shown in the tracepoint format file such that user
    space tools like perf and trace-cmd can parse the binary data and
    express the values too. Unfortunately, the way the TRACE_EVENT()
    macro works, anything placed in the TP_printk() will be shown pretty
    much exactly as is. The problem arises when enums are used. That's
    because unlike macros, enums will not be changed into their values by
    the C pre-processor. Thus, the enum string is exported to the format
    file, and this makes it useless for user space tools.

    The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
    the TP_printk() format into their number, and that is what is shown to
    user space. For example, the tracepoint tlb_flush currently has this
    in its format file:

    __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    After adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    Its format file will contain this:

    __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })"

    * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
    tracing: Add enum_map file to show enums that have been mapped
    writeback: Export enums used by tracepoint to user space
    v4l: Export enums used by tracepoints to user space
    SUNRPC: Export enums in tracepoints to user space
    mm: tracing: Export enums in tracepoints to user space
    irq/tracing: Export enums in tracepoints to user space
    f2fs: Export the enums in the tracepoints to userspace
    net/9p/tracing: Export enums in tracepoints to userspace
    x86/tlb/trace: Export enums in used by tlb_flush tracepoint
    tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
    tracing: Allow for modules to convert their enums to values
    tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
    tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
    tracing: Give system name a pointer
    brcmsmac: Move each system tracepoints to their own header
    iwlwifi: Move each system tracepoints to their own header
    mac80211: Move message tracepoints to their own header
    tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
    tracing: Add TRACE_SYSTEM_VAR to kvm-s390
    tracing: Add TRACE_SYSTEM_VAR to intel-sst
    ...

    Linus Torvalds
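
The enum problem described in the tracing pull above comes down to how the C preprocessor works, and a small userspace sketch can illustrate it. This is not the kernel's TRACE_EVENT() machinery; `STR()`, `REASON_AS_MACRO`, `REASON_AS_ENUM` and the two helper functions are hypothetical names invented for the illustration.

```c
#include <stdio.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* A value defined as a macro: the preprocessor substitutes it. */
#define REASON_AS_MACRO 1

/* The same value defined as an enum: the preprocessor never sees it. */
enum demo_reason { REASON_AS_ENUM = 1 };

/* Text that a format string built from the macro would contain. */
const char *reason_from_macro(void)
{
    return STR(REASON_AS_MACRO);
}

/* Text that a format string built from the enum would contain. */
const char *reason_from_enum(void)
{
    return STR(REASON_AS_ENUM);
}

/* Print both strings side by side. */
void print_reasons(void)
{
    printf("macro -> \"%s\"\n", reason_from_macro());
    printf("enum  -> \"%s\"\n", reason_from_enum());
}
```

Calling `print_reasons()` shows the macro turned into its number while the enum stays a bare identifier, which is the kind of string a user space parser cannot resolve on its own.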
     

09 Apr, 2015

1 commit

  • Unlike most (all?) other copies from user space, kernel module loading
    is almost unlimited in size. So we do a potentially huge
    "copy_from_user()" when we copy the module data from user space to the
    kernel buffer, which can be a latency concern when preemption is
    disabled (or voluntary).

    Also, because 'copy_from_user()' clears the tail of the kernel buffer on
    failures, even a *failed* copy can end up wasting a lot of time.

    Normally neither of these are concerns in real life, but they do trigger
    when doing stress-testing with trinity. Running in a VM seems to add
    its own overhead, causing trinity module load testing to even trigger
    the watchdog.

    The simple fix is to just chunk up the module loading, so that it never
    tries to copy insanely big areas in one go. That bounds the latency,
    and also the amount of (unnecessarily, in this case) cleared memory for
    the failure case.

    Reported-by: Sasha Levin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
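
The chunking idea in the commit above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel patch: `fake_copy_from_user()` is a stand-in built on `memcpy()`, and `COPY_CHUNK_SIZE` and `copy_chunked()` are names and values chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size for this sketch; not taken from the commit. */
#define COPY_CHUNK_SIZE ((size_t)16 * 4096)

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
static size_t fake_copy_from_user(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/*
 * Copy len bytes in pieces of at most COPY_CHUNK_SIZE.
 * Returns 0 on success and -1 on failure. If chunks is non-NULL it
 * receives the number of pieces that were copied.
 */
int copy_chunked(void *dst, const void *src, size_t len, size_t *chunks)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t count = 0;

    while (len > 0) {
        size_t n = len < COPY_CHUNK_SIZE ? len : COPY_CHUNK_SIZE;

        if (fake_copy_from_user(d, s, n) != 0)
            return -1;

        /* Assumption: kernel code could place a rescheduling point here. */
        d += n;
        s += n;
        len -= n;
        count++;
    }

    if (chunks)
        *chunks = count;
    return 0;
}
```

Because each iteration handles a bounded amount of data, both the time spent in a single copy and the amount of memory touched by a failing copy are limited by the chunk size.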
     

08 Apr, 2015

1 commit


24 Mar, 2015

2 commits


23 Mar, 2015

1 commit

  • Module unload calls lockdep_free_key_range(), which removes entries
    from the data structures. Most of the lockdep code OTOH assumes the
    data structures are append only; specifically, see the comments in
    add_lock_to_list() and look_up_lock_class().

    Clearly this has only worked by accident; make it work properly. The
    actual scenario to make it go boom would involve the memory freed by
    the module unload being re-allocated and re-used for a lock inside of
    an rcu-sched grace period. This is a very unlikely scenario; still,
    better plug the hole.

    Use RCU list iteration in all places and amend the comments.

    Change lockdep_free_key_range() to issue a sync_sched() between
    removal from the lists and returning -- which results in the memory
    being freed. Further ensure the callers are placed correctly and
    comment the requirements.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andrey Tsyvarev
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Mar, 2015

1 commit

  • Current approach in handling shadow memory for modules is broken.

    Shadow memory may be freed only after the memory it shadows is no
    longer used. vfree() called from interrupt context may use the memory it
    is freeing to store a 'struct llist_node' in it:

    void vfree(const void *addr)
    {
    ...
    if (unlikely(in_interrupt())) {
    struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
    if (llist_add((struct llist_node *)addr, &p->list))
    schedule_work(&p->wq);

    Later this list node is used in free_work(), which actually frees the memory.
    Currently, module_memfree() called in interrupt context will free the shadow
    before freeing the module's memory, which could provoke a kernel crash.

    So shadow memory should be freed after the module's memory. However, such a
    deallocation order could race with kasan_module_alloc() in module_alloc().

    Free the shadow right before releasing the vm area. At this point vfree()'d
    memory is not used anymore and yet not available for other allocations.
    A new VM_KASAN flag indicates that the vm area has dynamically allocated
    shadow memory, so kasan frees the shadow only if it was previously allocated.

    Signed-off-by: Andrey Ryabinin
    Acked-by: Rusty Russell
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

06 Mar, 2015

1 commit


18 Feb, 2015

1 commit

  • This provides a reliable breakpoint target, required for automatic symbol
    loading via the gdb helper command 'lx-symbols'.

    Signed-off-by: Jan Kiszka
    Acked-by: Rusty Russell
    Cc: Thomas Gleixner
    Cc: Jason Wessel
    Cc: Andi Kleen
    Cc: Ben Widawsky
    Cc: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     

14 Feb, 2015

1 commit

  • This feature lets us detect out-of-bounds accesses to global variables.
    It works for globals in the kernel image as well as for globals in modules.
    Currently this won't work for symbols in user-specified sections (e.g.
    __init, __read_mostly, ...)

    The idea is simple. The compiler enlarges each global variable by the
    redzone size and adds constructors invoking the __asan_register_globals()
    function. Information about each global variable (address, size, size with
    redzone ...) is passed to __asan_register_globals() so we can poison the
    variable's redzone.

    This patch also forces module_alloc() to return an 8*PAGE_SIZE aligned
    address, making shadow memory handling
    (kasan_module_alloc()/kasan_module_free()) simpler. Such alignment
    guarantees that each shadow page backing the modules address space
    corresponds to only one module_alloc() allocation.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
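
The alignment argument in the commit above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and one shadow byte per 8 bytes of memory (which is what makes 8*PAGE_SIZE the natural alignment), and it ignores the shadow offset. All names (`DEMO_PAGE_SIZE`, `shadow_page_of()`, `share_shadow_page()`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE    4096u  /* assumption for this sketch */
#define DEMO_SHADOW_SCALE 8u     /* assumption: 1 shadow byte per 8 bytes */
#define DEMO_MODULE_ALIGN (DEMO_SHADOW_SCALE * DEMO_PAGE_SIZE)

/* Index of the shadow page that holds the shadow byte for addr. */
uint64_t shadow_page_of(uint64_t addr)
{
    return (addr / DEMO_SHADOW_SCALE) / DEMO_PAGE_SIZE;
}

/* Round addr up to the next multiple of align. */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) / align * align;
}

/*
 * True if the allocations [a, a + a_size) and [b, b + b_size) are backed
 * by at least one common shadow page. Both sizes must be non-zero.
 */
bool share_shadow_page(uint64_t a, uint64_t a_size,
                       uint64_t b, uint64_t b_size)
{
    uint64_t a_first = shadow_page_of(a);
    uint64_t a_last  = shadow_page_of(a + a_size - 1);
    uint64_t b_first = shadow_page_of(b);
    uint64_t b_last  = shadow_page_of(b + b_size - 1);

    return a_first <= b_last && b_first <= a_last;
}
```

With these assumptions, two neighbouring one-page allocations that are only page aligned land on the same shadow page, while placing the second one at `align_up(end_of_first, DEMO_MODULE_ALIGN)` gives each allocation shadow pages of its own, so the shadow can be allocated and freed per allocation.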
     

11 Feb, 2015

2 commits

  • Since the introduction of the nested sleep warning, we've established
    that the occasional sleep inside a wait_event() is fine.

    wait_event() loops are invariant wrt. spurious wakeups, and the
    occasional sleep has a similar effect on them. As long as it's occasional
    it's harmless.

    Therefore replace the 'correct' but verbose wait_woken() thing with
    a simple annotation to shut up the warning.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • Because wait_event() loops are safe vs spurious wakeups we can allow the
    occasional sleep -- which ends up being very similar.

    Reported-by: Dave Jones
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Dave Jones
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     

06 Feb, 2015

2 commits


22 Jan, 2015

1 commit

  • James Bottomley points out that it will be -1 during unload. It's
    only used for diagnostics, so let's not hide that as it could be a
    clue as to what's gone wrong.

    Cc: Jason Wessel
    Acked-and-documention-added-by: James Bottomley
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Rusty Russell
     

20 Jan, 2015

3 commits

  • The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
    preemption to walk the modules rather than taking the module_mutex:
    this is because they are used for symbol resolution during oopses.

    This works because there are synchronize_sched() and synchronize_rcu()
    in the unload and failure paths. However, there's one case which doesn't
    have that: the normal case where module loading succeeds, and we free
    the init section.

    We don't want a synchronize_rcu() there, because it would slow down
    module loading: this bug was introduced in 2009 to speed module
    loading in the first place.

    Thus, we want to do the free in an RCU callback. We do this in the
    simplest possible way by allocating a new rcu_head: if we put it in
    the module structure we'd have to worry about that getting freed.

    Reported-by: Rui Xiang
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Nothing needs the module pointer any more, and the next patch will
    call it from RCU, where the module itself might no longer exist.
    Removing the arg is the safest approach.

    This just codifies the use of the module_alloc/module_free pattern
    which ftrace and bpf use.

    Signed-off-by: Rusty Russell
    Acked-by: Alexei Starovoitov
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: Ralf Baechle
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Steven Rostedt
    Cc: x86@kernel.org
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: nios2-dev@lists.rocketboards.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: sparclinux@vger.kernel.org
    Cc: netdev@vger.kernel.org

    Rusty Russell
     
  • Archs have been abusing module_free() to clean up their arch-specific
    allocations. Since module_free() is also (ab)used by BPF and trace code,
    let's keep it to simple allocations, and provide a hook called before
    that.

    This means that avr32, ia64, parisc and s390 no longer need to implement
    their own module_free() at all. avr32 doesn't need module_finalize()
    either.

    Signed-off-by: Rusty Russell
    Cc: Chris Metcalf
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-s390@vger.kernel.org

    Rusty Russell
     

19 Dec, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "The exciting thing here is the getting rid of stop_machine on module
    removal. This is possible by using a simple atomic_t for the counter,
    rather than our fancy per-cpu counter: it turns out that no one is
    doing a module increment per net packet, so the slowdown should be in
    the noise"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    param: do not set store func without write perm
    params: cleanup sysfs allocation
    kernel:module Fix coding style errors and warnings.
    module: Remove stop_machine from module unloading
    module: Replace module_ref with atomic_t refcnt
    lib/bug: Use RCU list ops for module_bug_list
    module: Unlink module with RCU synchronizing instead of stop_machine
    module: Wait for RCU synchronizing before releasing a module

    Linus Torvalds
     

11 Nov, 2014

6 commits

  • Fixed coding style errors and warnings. Replaced printk with
    pr_debug/pr_warn. Changed seq_printf to seq_puts.

    Signed-off-by: Ionut Alexa
    Signed-off-by: Rusty Russell (removed bogus KERN_DEFAULT conversion)

    Ionut Alexa
     
  • Remove stop_machine from module unloading by adding new reference
    counting algorithm.

    This atomic refcounter works like a semaphore: it can be taken
    (incremented) only when the counter is not 0. When loading a module,
    the kmodule subsystem sets the counter to MODULE_REF_BASE (= 1). When
    unloading the module, it subtracts MODULE_REF_BASE from the counter.
    If no one refers to the module, the refcounter becomes 0 and we can
    remove the module safely. If someone refers to it, we try to recover
    the counter by adding MODULE_REF_BASE unless the counter has become 0,
    because the referrer can put the module right before the recovery.
    If the recovery fails, the refcount is 0 and can never be
    incremented again, so the module can be removed safely too.

    Note that __module_get() forcibly takes a module reference;
    users should use try_module_get() instead.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
     
  • Replace module_ref per-cpu complex reference counter with
    an atomic_t simple refcnt. This is for code simplification.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
     
  • Actually, since module_bug_list should be used in BUG context,
    we may not need this. But for anyone who wants to use it
    from normal context, this makes module_bug_list an RCU list.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
     
  • Unlink module from module list with RCU synchronizing instead
    of using stop_machine(). Since module list is already protected
    by rcu, we don't need stop_machine() anymore.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
     
  • Wait for RCU synchronizing on failure path of module loading
    before releasing struct module, because the memory of mod->list
    can still be accessed by list walkers (e.g. kallsyms).

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Rusty Russell

    Masami Hiramatsu
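
The MODULE_REF_BASE counting scheme from the stop_machine removal commit in this group can be modelled in userspace with C11 atomics. This is a minimal sketch, not the kernel code: `struct demo_refmod`, `add_unless()` and the `demo_*` functions are hypothetical names, and only the value MODULE_REF_BASE = 1 comes from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODULE_REF_BASE 1   /* value named in the commit message */

/* Hypothetical stand-in for the refcount part of struct module. */
struct demo_refmod {
    atomic_int refcnt;
};

/* Add a to *v unless *v equals unless; true if the add happened. */
static bool add_unless(atomic_int *v, int a, int unless)
{
    int cur = atomic_load(v);

    while (cur != unless) {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return true;
    }
    return false;
}

/* Loading sets the counter to the base reference. */
void demo_load(struct demo_refmod *m)
{
    atomic_init(&m->refcnt, MODULE_REF_BASE);
}

/* Take a reference only while the counter is non-zero. */
bool demo_try_get(struct demo_refmod *m)
{
    return add_unless(&m->refcnt, 1, 0);
}

/* Drop a reference taken with demo_try_get(). */
void demo_put(struct demo_refmod *m)
{
    atomic_fetch_sub(&m->refcnt, 1);
}

/*
 * Drop the base reference. Returns true if the module may be removed,
 * false if it is still in use (the base reference was restored).
 */
bool demo_try_unload(struct demo_refmod *m)
{
    /* atomic_fetch_sub() returns the old value. */
    int now = atomic_fetch_sub(&m->refcnt, MODULE_REF_BASE) - MODULE_REF_BASE;

    if (now == 0)
        return true;
    /* Still referenced: restore the base unless the last user just left. */
    if (add_unless(&m->refcnt, MODULE_REF_BASE, 0))
        return false;
    /* The counter reached 0 in between and can never rise again. */
    return true;
}
```

Once the counter is 0, `demo_try_get()` always fails, which is what lets removal proceed without stopping every CPU.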
     

28 Oct, 2014

1 commit

  • This is a genuine bug in add_unformed_module(), we cannot use blocking
    primitives inside a wait loop.

    So rewrite the wait_event_interruptible() usage to use the fresh
    wait_woken() stuff.

    Reported-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: tglx@linutronix.de
    Cc: ilya.dryomov@inktank.com
    Cc: umgwanakikbuti@gmail.com
    Cc: Rusty Russell
    Cc: oleg@redhat.com
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20140924082242.458562904@infradead.org
    [ So this is probably complex to backport and the race wasn't reported AFAIK,
    so not marked for -stable. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

19 Oct, 2014

1 commit


15 Oct, 2014

1 commit

  • A panic was seen in the following situation.

    There are two threads running on the system. The first thread is a system
    monitoring thread that is reading /proc/modules. The second thread is
    loading and unloading a module (in this example I'm using my simple
    dummy-module.ko). Note, in the "real world" this occurred with the qlogic
    driver module.

    When doing this, the following panic occurred:

    ------------[ cut here ]------------
    kernel BUG at kernel/module.c:3739!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
    CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
    task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
    RIP: 0010:[] [] module_flags+0xb5/0xc0
    RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246
    RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
    RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
    RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
    R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
    R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
    FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
    ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
    ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
    Call Trace:
    [] m_show+0x19c/0x1e0
    [] seq_read+0x16e/0x3b0
    [] proc_reg_read+0x3d/0x80
    [] vfs_read+0x9c/0x170
    [] SyS_read+0x58/0xb0
    [] system_call_fastpath+0x16/0x1b
    Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
    RIP [] module_flags+0xb5/0xc0
    RSP

    Consider the two processes running on the system.

    CPU 0 (/proc/modules reader)
    CPU 1 (loading/unloading module)

    CPU 0 opens /proc/modules, and starts displaying data for each module by
    traversing the modules list via fs/seq_file.c:seq_open() and
    fs/seq_file.c:seq_read(). For each module in the modules list, seq_read
    does

    op->start()
    op->show()
    op->stop()

    m_show() in turn calls module_flags(), which begins with

    BUG_ON(mod->state == MODULE_STATE_UNFORMED);
    ...

    The other thread, CPU 1, in unloading the module calls the syscall
    delete_module() defined in kernel/module.c. The module_mutex is acquired
    for a short time, and then released. free_module() is called without the
    module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED,
    also without the module_mutex. Some additional code is called and then the
    module_mutex is reacquired to remove the module from the modules list:

    /* Now we can delete it from the lists */
    mutex_lock(&module_mutex);
    stop_machine(__unlink_module, mod, NULL);
    mutex_unlock(&module_mutex);

    This is the sequence of events that leads to the panic.

    CPU 1 is removing dummy_module via delete_module(). It acquires the
    module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to
    MODULE_STATE_UNFORMED yet.

    CPU 0, which is reading the /proc/modules, acquires the module_mutex and
    acquires a pointer to the dummy_module which is still in the modules list.
    CPU 0 calls m_show for dummy_module. The check in m_show() for
    MODULE_STATE_UNFORMED passed for dummy_module even though it is being
    torn down.

    Meanwhile CPU 1, which has been continuing to remove dummy_module without
    holding the module_mutex, now calls free_module() and sets
    dummy_module->state to MODULE_STATE_UNFORMED.

    CPU 0 now calls module_flags() with dummy_module and ...

    static char *module_flags(struct module *mod, char *buf)
    {
    int bx = 0;

    BUG_ON(mod->state == MODULE_STATE_UNFORMED);

    and BOOM.

    Acquire and release the module_mutex lock around the setting of
    MODULE_STATE_UNFORMED in the teardown path, which should resolve the
    problem.

    Testing: In the unpatched kernel I can panic the system within 1 minute by
    doing

    while true; do insmod dummy_module.ko; rmmod dummy_module.ko; done

    and

    while true; do cat /proc/modules; done

    in separate terminals.

    In the patched kernel I was able to run just over one hour without seeing
    any issues. I also verified the output of panic via sysrq-c and the output
    of /proc/modules looks correct for all three states for the dummy_module.

    dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
    dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
    dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)

    Signed-off-by: Prarit Bhargava
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Prarit Bhargava
     

08 Oct, 2014

1 commit

  • Pull arm64 updates from Catalin Marinas:
    - eBPF JIT compiler for arm64
    - CPU suspend backend for PSCI (firmware interface) with standard idle
    states defined in DT (generic idle driver to be merged via a
    different tree)
    - Support for CONFIG_DEBUG_SET_MODULE_RONX
    - Support for unmapped cpu-release-addr (outside kernel linear mapping)
    - set_arch_dma_coherent_ops() implemented and bus notifiers removed
    - EFI_STUB improvements when base of DRAM is occupied
    - Typos in KGDB macros
    - Clean-up to (partially) allow kernel building with LLVM
    - Other clean-ups (extern keyword, phys_addr_t usage)

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (51 commits)
    arm64: Remove unneeded extern keyword
    ARM64: make of_device_ids const
    arm64: Use phys_addr_t type for physical address
    aarch64: filter $x from kallsyms
    arm64: Use DMA_ERROR_CODE to denote failed allocation
    arm64: Fix typos in KGDB macros
    arm64: insn: Add return statements after BUG_ON()
    arm64: debug: don't re-enable debug exceptions on return from el1_dbg
    Revert "arm64: dmi: Add SMBIOS/DMI support"
    arm64: Implement set_arch_dma_coherent_ops() to replace bus notifiers
    of: amba: use of_dma_configure for AMBA devices
    arm64: dmi: Add SMBIOS/DMI support
    arm64: Correct ftrace calls to aarch64_insn_gen_branch_imm()
    arm64:mm: initialize max_mapnr using function set_max_mapnr
    setup: Move unmask of async interrupts after possible earlycon setup
    arm64: LLVMLinux: Fix inline arm64 assembly for use with clang
    arm64: pageattr: Correctly adjust unaligned start addresses
    net: bpf: arm64: fix module memory leak when JIT image build fails
    arm64: add PSCI CPU_SUSPEND based cpu_suspend support
    arm64: kernel: introduce cpu_init_idle CPU operation
    ...

    Linus Torvalds
     

03 Oct, 2014

1 commit

  • Similar to ARM, AArch64 is generating $x and $d syms... which isn't
    terribly helpful when looking at %pF output and the like. Filter those
    out in kallsyms, modpost and when looking at module symbols.

    Seems simplest since none of these check EM_ARM anyway, to just add it
    to the strchr used, rather than trying to make things overly
    complicated.

    initcall_debug improves:
    dmesg_before.txt: initcall $x+0x0/0x154 [sg] returned 0 after 26331 usecs
    dmesg_after.txt: initcall init_sg+0x0/0x154 [sg] returned 0 after 15461 usecs

    Signed-off-by: Kyle McMartin
    Acked-by: Rusty Russell
    Signed-off-by: Catalin Marinas

    Kyle McMartin
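
The filter described in the commit above can be sketched as a small predicate. This is an illustration only: the function name `is_mapping_symbol()` is hypothetical, and the character set "axtd" and the optional ".suffix" form are assumptions based on ARM ELF mapping symbol conventions; the commit message itself only names $x and $d.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true for mapping symbols such as "$a", "$t", "$d" and "$x",
 * optionally followed by ".<suffix>". Anything else is treated as a
 * real symbol name.
 */
bool is_mapping_symbol(const char *name)
{
    /* Must start with '$' and have a type character after it. */
    if (name[0] != '$' || name[1] == '\0')
        return false;

    /* 'x' is the AArch64 code marker added to the existing set. */
    if (strchr("axtd", name[1]) == NULL)
        return false;

    /* Either the name ends here or a '.' suffix follows. */
    return name[2] == '\0' || name[2] == '.';
}
```

A symbol lookup that skips every name for which this predicate is true would report `init_sg` instead of `$x`, as in the initcall_debug example in the commit message.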
     

27 Aug, 2014

1 commit


16 Aug, 2014

1 commit

  • The commit

    4982223e51e8 module: set nx before marking module MODULE_STATE_COMING.

    introduced a regression: if a module fails to parse its arguments or
    if mod_sysfs_setup fails, then the module's memory will be freed
    while still read-only. Anything that reuses that memory will crash
    as soon as it tries to write to it.

    Cc: stable@vger.kernel.org # v3.16
    Cc: Rusty Russell
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Rusty Russell

    Andy Lutomirski
     

11 Aug, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "This finally applies the stricter sysfs perms checking we pulled out
    before last merge window. A few stragglers are fixed (thanks
    linux-next!)"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    arch/powerpc/platforms/powernv/opal-dump.c: fix world-writable sysfs files
    arch/powerpc/platforms/powernv/opal-elog.c: fix world-writable sysfs files
    drivers/video/fbdev/s3c2410fb.c: don't make debug world-writable.
    ARM: avoid ARM binutils leaking ELF local symbols
    scripts: modpost: Remove numeric suffix pattern matching
    scripts: modpost: fix compilation warning
    sysfs: disallow world-writable files.
    module: return bool from within_module*()
    module: add within_module() function
    modules: Fix build error in moduleloader.h

    Linus Torvalds
     

27 Jul, 2014

2 commits

  • Symbols starting with .L are ELF local symbols and should not appear
    in ELF symbol tables. However, unfortunately ARM binutils leaks the
    .LANCHOR symbols into the symbol table, which leads kallsyms to report
    these symbols rather than the real name. It is not very useful when
    %pf reports symbols against these leaked .LANCHOR symbols.

    Arrange for kallsyms to ignore these symbols using the same mechanism
    that is used for the ARM mapping symbols.

    Signed-off-by: Russell King
    Signed-off-by: Rusty Russell

    Russell King
     
  • This is just a small optimization that allows replacing a few
    occurrences of within_module_init() || within_module_core()
    with a single call.

    Signed-off-by: Petr Mladek
    Signed-off-by: Rusty Russell

    Petr Mladek
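
The combined check from the commit above might look like the following sketch. `struct demo_mod_layout` is a hypothetical, pared-down stand-in for struct module with just two address ranges, and the `demo_` prefixed functions are illustrations rather than the kernel helpers themselves.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two memory regions of a module. */
struct demo_mod_layout {
    unsigned long init_base, init_size;  /* init section */
    unsigned long core_base, core_size;  /* core section */
};

/* True if addr lies in [base, base + size). */
static bool in_range(unsigned long addr, unsigned long base,
                     unsigned long size)
{
    return addr >= base && addr - base < size;
}

bool demo_within_module_init(unsigned long addr,
                             const struct demo_mod_layout *mod)
{
    return in_range(addr, mod->init_base, mod->init_size);
}

bool demo_within_module_core(unsigned long addr,
                             const struct demo_mod_layout *mod)
{
    return in_range(addr, mod->core_base, mod->core_size);
}

/* Single call replacing the init-or-core pair at each call site. */
bool demo_within_module(unsigned long addr,
                        const struct demo_mod_layout *mod)
{
    return demo_within_module_init(addr, mod) ||
           demo_within_module_core(addr, mod);
}
```

Callers that only need to know whether an address belongs to a module at all can then use `demo_within_module()` and no longer spell out both checks.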
     

03 Jul, 2014

1 commit

  • Per further discussion with NIST, the requirements for FIPS state that
    we only need to panic the system on failed kernel module signature checks
    for crypto subsystem modules. This moves the fips-mode-only module
    signature check out of the generic module loading code, into the crypto
    subsystem, at points where we can catch both algorithm module loads and
    mode module loads. At the same time, make CONFIG_CRYPTO_FIPS dependent on
    CONFIG_MODULE_SIG, as this is entirely necessary for FIPS mode.

    v2: remove extraneous blank line, perform checks in static inline
    function, drop no longer necessary fips.h include.

    CC: "David S. Miller"
    CC: Rusty Russell
    CC: Stephan Mueller
    Signed-off-by: Jarod Wilson
    Acked-by: Neil Horman
    Signed-off-by: Herbert Xu

    Jarod Wilson
     

12 Jun, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Most of this is cleaning up various driver sysfs permissions so we can
    re-add the perm check (we unified the module param and sysfs checks,
    but the module ones were stronger so we weakened them temporarily).

    Param parsing gets documented, and also "--" now forces args to be
    handed to init (and ignored by the kernel).

    Module NX/RO protections get tightened: we now set them before calling
    parse_args()"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: set nx before marking module MODULE_STATE_COMING.
    samples/kobject/: avoid world-writable sysfs files.
    drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files.
    drivers/staging/speakup/: avoid world-writable sysfs files.
    drivers/regulator/virtual: avoid world-writable sysfs files.
    drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files.
    drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files.
    drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files.
    drivers/mtd/devices/docg3.c: avoid world-writable sysfs files.
    speakup: fix incorrect perms on speakup_acntsa.c
    cpumask.h: silence warning with -Wsign-compare
    Documentation: Update kernel-parameters.tx
    param: hand arguments after -- straight to init
    modpost: Fix resource leak in read_dump()

    Linus Torvalds