Eric Lee / smarc-fsl-linux-kernel

04 Aug, 2016

2 commits

444d13ff1 modules: add ro_after_init support ... Browse Code »

Add ro_after_init support for modules by adding a new page-aligned section
in the module layout (after rodata) for ro_after_init data and enabling RO
protection for that section after module init runs.

Signed-off-by: Jessica Yu
Acked-by: Kees Cook
Signed-off-by: Rusty Russell

Jessica Yu
2016-08-04 08:46:55 +0800
0ef765379 exceptions: fork exception table content from module.h into extable.h ... Browse Code »

For historical reasons (i.e. pre-git) the exception table stuff was
buried in the middle of the module.h file. I noticed this while
doing an audit for needless includes of module.h and found core
kernel files (both arch specific and arch independent) were just
including module.h for this.

The converse is also true, in that conventional drivers, be they
for filesystems or actual hardware peripherals or similar, do not
normally care about the exception tables.

Here we fork the exception table content out of module.h into a
new file called extable.h -- and temporarily include it into the
module.h itself.

Then we will work our way across the arch independent and arch
specific files needing just exception table content, and move
them off module.h and onto extable.h

Once that is done, we can remove the extable.h from module.h
and in doing it like this, we avoid introducing build failures
into the git history.

The gain here is that module.h gets a bit smaller, across all
modular drivers that we build for allmodconfig. Also the core
files that only need exception table stuff don't have an include
of module.h that brings in lots of extra stuff and just looks
generally out of place.

Cc: Andrew Morton
Cc: Linus Torvalds
Signed-off-by: Paul Gortmaker
Signed-off-by: Rusty Russell

Paul Gortmaker
2016-08-04 08:46:54 +0800

27 Jul, 2016

1 commit

bf262dcec module: fix noreturn attribute for __module_put_and_exit() ... Browse Code »

__module_put_and_exit() is makred noreturn in module.h declaration, but is
lacking the attribute in the definition, which makes some tools (such as
sparse) unhappy. Amend the definition with the attribute as well (and
reformat the declaration so that it uses more common format).

Signed-off-by: Jiri Kosina
Signed-off-by: Rusty Russell

Jiri Kosina
2016-07-27 11:08:00 +0800

01 Apr, 2016

1 commit

1ce15ef4f module: preserve Elf information for livepatch modules ... Browse Code »

For livepatch modules, copy Elf section, symbol, and string information
from the load_info struct in the module loader. Persist copies of the
original symbol table and string table.

Livepatch manages its own relocation sections in order to reuse module
loader code to write relocations. Livepatch modules must preserve Elf
information such as section indices in order to apply livepatch relocation
sections using the module loader's apply_relocate_add() function.

In order to apply livepatch relocation sections, livepatch modules must
keep a complete copy of their original symbol table in memory. Normally, a
stripped down copy of a module's symbol table (containing only "core"
symbols) is made available through module->core_symtab. But for livepatch
modules, the symbol table copied into memory on module load must be exactly
the same as the symbol table produced when the patch module was compiled.
This is because the relocations in each livepatch relocation section refer
to their respective symbols with their symbol indices, and the original
symbol indices (and thus the symtab ordering) must be preserved in order
for apply_relocate_add() to find the right symbol.

Signed-off-by: Jessica Yu
Reviewed-by: Miroslav Benes
Acked-by: Josh Poimboeuf
Acked-by: Rusty Russell
Reviewed-by: Rusty Russell
Signed-off-by: Jiri Kosina

Jessica Yu
2016-04-01 21:00:10 +0800

03 Feb, 2016

1 commit

8244062ef modules: fix longstanding /proc/kallsyms vs module insertion race. ... Browse Code »

For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
There's one full copy, marked SHF_ALLOC and laid out at the end of the
module's init section. There's also a cut-down version that only
contains core symbols and strings, and lives in the module's core
section.

After module init (and before we free the module memory), we switch
the mod->symtab, mod->num_symtab and mod->strtab to point to the core
versions. We do this under the module_mutex.

However, kallsyms doesn't take the module_mutex: it uses
preempt_disable() and rcu tricks to walk through the modules, because
it's used in the oops path. It's also used in /proc/kallsyms.
There's nothing atomic about the change of these variables, so we can
get the old (larger!) num_symtab and the new symtab pointer; in fact
this is what I saw when trying to reproduce.

By grouping these variables together, we can use a
carefully-dereferenced pointer to ensure we always get one or the
other (the free of the module init section is already done in an RCU
callback, so that's safe). We allocate the init one at the end of the
module init section, and keep the core one inside the struct module
itself (it could also have been allocated at the end of the module
core, but that's probably overkill).

Reported-by: Weilong Chen
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
Cc: stable@kernel.org
Signed-off-by: Rusty Russell

Rusty Russell
2016-02-03 14:28:15 +0800

05 Dec, 2015

2 commits

85c898db6 module: clean up RO/NX handling. ... Browse Code »

Modules have three sections: text, rodata and writable data. The code
handled the case where these overlapped, however they never can:
debug_align() ensures they are always page-aligned.

This is why we got away with manually traversing the pages in
set_all_modules_text_rw() without rounding.

We create three helper functions: frob_text(), frob_rodata() and
frob_writable_data(). We then call these explicitly at every point,
so it's clear what we're doing.

We also expose module_enable_ro() and module_disable_ro() for
livepatch to use.

Reviewed-by: Josh Poimboeuf
Signed-off-by: Rusty Russell
Signed-off-by: Jiri Kosina

Rusty Russell
2015-12-05 05:46:26 +0800
7523e4dc5 module: use a structure to encapsulate layout. ... Browse Code »

Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.

It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).

Acked-by: Peter Zijlstra
Reviewed-by: Josh Poimboeuf
Signed-off-by: Rusty Russell
Signed-off-by: Jiri Kosina

Rusty Russell
2015-12-05 05:46:25 +0800

06 Jul, 2015

1 commit

0fd972a7d module: relocate module_init from init.h to module.h ... Browse Code »

Modular users will always be users of init functionality, but
users of init functionality are not necessarily always modules.

Hence any functionality like module_init and module_exit would
be more at home in the module.h file. And module.h should
explicitly include init.h to make the dependency clear.

We've already done all the legwork needed to ensure that this
move does not cause any build regressions due to implicit
header file include assumptions about where module_init lives.

Cc: Rusty Russell
Acked-by: Rusty Russell
Signed-off-by: Paul Gortmaker

Paul Gortmaker
2015-07-06 11:59:14 +0800

02 Jul, 2015

1 commit

02201e3f1 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.

A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...

Linus Torvalds
2015-07-02 01:49:25 +0800

28 Jun, 2015

1 commit

cf2fde7b3 param: fix module param locks when !CONFIG_SYSFS. ... Browse Code »

As Dan Streetman points out, the entire point of locking for is to
stop sysfs accesses, so they're elided entirely in the !SYSFS case.

Reported-by: Stephen Rothwell
Signed-off-by: Rusty Russell

Rusty Russell
2015-06-28 13:16:14 +0800

27 Jun, 2015

2 commits

8d7804a2f Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Here is the driver core / firmware changes for 4.2-rc1.

A number of small changes all over the place in the driver core, and
in the firmware subsystem. Nothing really major, full details in the
shortlog. Some of it is a bit of churn, given that the platform
driver probing changes was found to not work well, so they were
reverted.

All of these have been in linux-next for a while with no reported
issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
Revert "base/platform: Only insert MEM and IO resources"
Revert "base/platform: Continue on insert_resource() error"
Revert "of/platform: Use platform_device interface"
Revert "base/platform: Remove code duplication"
firmware: add missing kfree for work on async call
fs: sysfs: don't pass count == 0 to bin file readers
base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
base/platform: Remove code duplication
of/platform: Use platform_device interface
base/platform: Continue on insert_resource() error
base/platform: Only insert MEM and IO resources
firmware: use const for remaining firmware names
firmware: fix possible use after free on name on asynchronous request
firmware: check for file truncation on direct firmware loading
firmware: fix __getname() missing failure check
drivers: of/base: move of_init to driver_init
drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
sysfs: disambiguate between "error code" and "failure" in comments
driver-core: fix build for !CONFIG_MODULES
driver-core: make __device_attach() static
...

Linus Torvalds
2015-06-27 06:07:37 +0800
e38260825 Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"This patch series contains several clean ups and even a new trace
clock "monitonic raw". Also some enhancements to make the ring buffer
even faster. But the biggest and most noticeable change is the
renaming of the ftrace* files, structures and variables that have to
deal with trace events.

Over the years I've had several developers tell me about their
confusion with what ftrace is compared to events. Technically,
"ftrace" is the infrastructure to do the function hooks, which include
tracing and also helps with live kernel patching. But the trace
events are a separate entity altogether, and the files that affect the
trace events should not be named "ftrace". These include:

include/trace/ftrace.h -> include/trace/trace_events.h
include/linux/ftrace_event.h -> include/linux/trace_events.h

Also, functions that are specific for trace events have also been renamed:

ftrace_print_*() -> trace_print_*()
(un)register_ftrace_event() -> (un)register_trace_event()
ftrace_event_name() -> trace_event_name()
ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
ftrace_define_fields_##call() -> trace_define_fields_##call()
ftrace_get_offsets_##call() -> trace_get_offsets_##call()

Structures have been renamed:

ftrace_event_file -> trace_event_file
ftrace_event_{call,class} -> trace_event_{call,class}
ftrace_event_buffer -> trace_event_buffer
ftrace_subsystem_dir -> trace_subsystem_dir
ftrace_event_raw_##call -> trace_event_raw_##call
ftrace_event_data_offset_##call-> trace_event_data_offset_##call
ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

And a few various variables and flags have also been updated.

This has been sitting in linux-next for some time, and I have not
heard a single complaint about this rename breaking anything. Mostly
because these functions, variables and structures are mostly internal
to the tracing system and are seldom (if ever) used by anything
external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
ring_buffer: Allow to exit the ring buffer benchmark immediately
ring-buffer-benchmark: Fix the wrong type
ring-buffer-benchmark: Fix the wrong param in module_param
ring-buffer: Add enum names for the context levels
ring-buffer: Remove useless unused tracing_off_permanent()
ring-buffer: Give NMIs a chance to lock the reader_lock
ring-buffer: Add trace_recursive checks to ring_buffer_write()
ring-buffer: Allways do the trace_recursive checks
ring-buffer: Move recursive check to per_cpu descriptor
ring-buffer: Add unlikelys to make fast path the default
tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
tracing: Rename ftrace_event_name() to trace_event_name()
tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
...

Linus Torvalds
2015-06-27 05:02:43 +0800

23 Jun, 2015

1 commit

b51d23e4e module: add per-module param_lock ... Browse Code »

Add a "param_lock" mutex to each module, and update params.c to use
the correct built-in or module mutex while locking kernel params.
Remove the kparam_block_sysfs_r/w() macros, replace them with direct
calls to kernel_param_[un]lock(module).

The kernel param code currently uses a single mutex to protect
modification of any and all kernel params. While this generally works,
there is one specific problem with it; a module callback function
cannot safely load another module, i.e. with request_module() or even
with indirect calls such as crypto_has_alg(). If the module to be
loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
config file), then the attempt will result in a deadlock between the
first module param callback waiting for modprobe, and modprobe trying to
lock the single kernel param mutex to set the new module's param.

This fixes that by using per-module mutexes, so that each individual module
is protected against concurrent changes in its own kernel params, but is
not blocked by changes to other module params. All built-in modules
continue to use the built-in mutex, since they will always be loaded at
runtime and references (e.g. request_module(), crypto_has_alg()) to them
will never cause load-time param changing.

This also simplifies the interface used by modules to block sysfs access
to their params; while there are currently functions to block and unblock
sysfs param access which are split up by read and write and expect a single
kernel param to be passed, their actual operation is identical and applies
to all params, not just the one passed to them; they simply lock and unlock
the global param mutex. They are replaced with direct calls to
kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
if the module is built-in, it locks the built-in mutex.

Suggested-by: Rusty Russell
Signed-off-by: Dan Streetman
Signed-off-by: Rusty Russell

Dan Streetman
2015-06-23 13:57:38 +0800

28 May, 2015

3 commits

6c9692e2d module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING ... Browse Code »

Andrew worried about the overhead on small systems; only use the fancy
code when either perf or tracing is enabled.

Cc: Rusty Russell
Cc: Steven Rostedt
Requested-by: Andrew Morton
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Rusty Russell

Peter Zijlstra
2015-05-28 10:02:07 +0800
93c2e105f module: Optimize __module_address() using a latched RB-tree ... Browse Code »

Currently __module_address() is using a linear search through all
modules in order to find the module corresponding to the provided
address. With a lot of modules this can take a lot of time.

One of the users of this is kernel_text_address() which is employed
in many stack unwinders; which in turn are used by perf-callchain and
ftrace (possibly from NMI context).

So by optimizing __module_address() we optimize many stack unwinders
which are used by both perf and tracing in performance sensitive code.

Cc: Rusty Russell
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Rusty Russell

Peter Zijlstra
2015-05-28 10:02:07 +0800
0be964be0 module: Sanitize RCU usage and locking ... Browse Code »

Currently the RCU usage in module is an inconsistent mess of RCU and
RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
does not imply synchronize_sched().

Most usage sites use preempt_{dis,en}able() which is RCU-sched, but
(most of) the modification sites use synchronize_rcu(). With the
exception of the module bug list, which actually uses RCU.

Convert everything over to RCU-sched.

Furthermore add lockdep asserts to all sites, because it's not at all
clear to me the required locking is observed, esp. on exported
functions.

Cc: Rusty Russell
Acked-by: "Paul E. McKenney"
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Rusty Russell

Peter Zijlstra
2015-05-28 10:01:52 +0800

25 May, 2015

1 commit

80c6e1465 driver-core: fix build for !CONFIG_MODULES ... Browse Code »

Commit f2411da74698 ("driver-core: add driver module asynchronous probe
support") broke build in case modules are disabled, because in this case
"struct module" is not defined and we can't dereference it. Let's define
module_requested_async_probing() helper and stub it out if modules are
disabled.

Reported-by: kbuild test robot
Reported-by: Stephen Rothwell
Signed-off-by: Dmitry Torokhov
Signed-off-by: Greg Kroah-Hartman

Dmitry Torokhov
2015-05-25 03:28:30 +0800

20 May, 2015

1 commit

f2411da74 driver-core: add driver module asynchronous probe support ... Browse Code »

Some init systems may wish to express the desire to have device drivers
run their probe() code asynchronously. This implements support for this
and allows userspace to request async probe as a preference through a
generic shared device driver module parameter, async_probe.

Implementation for async probe is supported through a module parameter
given that since synchronous probe has been prevalent for years some
userspace might exist which relies on the fact that the device driver
will probe synchronously and the assumption that devices it provides
will be immediately available after this.

Signed-off-by: Luis R. Rodriguez
Signed-off-by: Dmitry Torokhov
Signed-off-by: Greg Kroah-Hartman

Luis R. Rodriguez
2015-05-20 15:25:24 +0800

14 May, 2015

1 commit

2425bcb92 tracing: Rename ftrace_event_{call,class} to trace_event_{call,class} ... Browse Code »

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.

Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2015-05-14 02:06:10 +0800

23 Apr, 2015

1 commit

59afdc7b3 crypto: api - Move module sig ifdef into accessor function ... Browse Code »

Currently we're hiding mod->sig_ok under an ifdef in open code.
This patch adds a module_sig_ok accessor function and removes that
ifdef.

Signed-off-by: Herbert Xu
Acked-by: Rusty Russell

Herbert Xu
2015-04-23 14:18:07 +0800

15 Apr, 2015

1 commit

eeee78cf7 Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that lets a numeric number be
displayed as a a human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that user
space tools like perf and trace-cmd can parse the binary data and
express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values by
the C pre-processor. Thus, the enum string is exported to the format
file, and this makes it useless for user space tools.

The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
the TP_printk() format into their number, and that is what is shown to
user space. For example, the tracepoint tlb_flush currently has this
in its format file:

__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

After adding:

TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

Its format file will contain this:

__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })"

* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Add enum_map file to show enums that have been mapped
writeback: Export enums used by tracepoint to user space
v4l: Export enums used by tracepoints to user space
SUNRPC: Export enums in tracepoints to user space
mm: tracing: Export enums in tracepoints to user space
irq/tracing: Export enums in tracepoints to user space
f2fs: Export the enums in the tracepoints to userspace
net/9p/tracing: Export enums in tracepoints to userspace
x86/tlb/trace: Export enums in used by tlb_flush tracepoint
tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
tracing: Allow for modules to convert their enums to values
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
tracing: Give system name a pointer
brcmsmac: Move each system tracepoints to their own header
iwlwifi: Move each system tracepoints to their own header
mac80211: Move message tracepoints to their own header
tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
tracing: Add TRACE_SYSTEM_VAR to kvm-s390
tracing: Add TRACE_SYSTEM_VAR to intel-sst
...

Linus Torvalds
2015-04-15 01:49:03 +0800

08 Apr, 2015

1 commit

3673b8e4c tracing: Allow for modules to convert their enums to values ... Browse Code »

Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
will have those enums converted into their values in the tracepoint
print fmt strings.

Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au

Acked-by: Rusty Russell
Reviewed-by: Masami Hiramatsu
Tested-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2015-04-08 21:39:57 +0800

19 Mar, 2015

1 commit

da11508eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching ... Browse Code »

Pull livepatching fix from Jiri Kosina:

- fix for potential race with module loading, from Petr Mladek.

The race is very unlikely to be seen in real world and has been found
by code inspection, but should be fixed for 4.0 anyway.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix subtle race with coming and going modules

Linus Torvalds
2015-03-19 01:46:39 +0800

17 Mar, 2015

1 commit

8cb2c2dc4 livepatch: Fix subtle race with coming and going modules ... Browse Code »

There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:

1. The notifier is called sometime in STATE_MODULE_COMING. The module
is visible by find_module() in this state all the time. It means that
new patch can be registered and enabled even before the notifier is
called. It might create wrong order of stacked patches, see below
for an example.

2. New patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. There
will stay mess in the structures. It might even cause an invalid
memory access.

This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.

Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.

Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.

Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.

Alternative solutions:
======================

+ reject new patches when a patched module is coming or going; this is ugly

+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean

+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)

+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development

+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations

Example of patch stacking breakage:
===================================

The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:

a() b()
P1 a1() b1()
P2 a2() b2()
P3 a3() b3(3)

If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:

ops_a->func_stack -> list(a3,a2,a1)
ops_b->func_stack -> list(b3,b2,b1)

, so the pointer to b3() is the first and will be used.

Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:

CPU0 CPU1

load_module(M)

complete_formation()

mod->state = MODULE_STATE_COMING;
mutex_unlock(&module_mutex);

klp_register_patch(P3);
klp_enable_patch(P3);

# STATE 1

klp_module_notify(M)
klp_module_notify_coming(P1);
klp_module_notify_coming(P2);
klp_module_notify_coming(P3);

# STATE 2

The ftrace ops for a() and b() then looks:

STATE1:

ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b3);

STATE2:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b2,b1,b3);

therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.

Example of the race with going modules:
=======================================

CPU0 CPU1

delete_module() #SYSCALL

try_stop_module()
mod->state = MODULE_STATE_GOING;

mutex_unlock(&module_mutex);

klp_register_patch()
klp_enable_patch()

#save place to switch universe

b() # from module that is going
a() # from core (patched)

mod->exit();

Note that the function b() can be called until we call mod->exit().

If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.

[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek
Acked-by: Josh Poimboeuf
Acked-by: Rusty Russell
Signed-off-by: Jiri Kosina

Petr Mladek
2015-03-17 17:31:54 +0800

14 Feb, 2015

1 commit

6301939d9 module: fix types of device tables aliases ... Browse Code »

MODULE_DEVICE_TABLE() macro used to create aliases to device tables.
Normally alias should have the same type as aliased symbol.

Device tables are arrays, so they have 'struct type##_device_id[x]'
types. Alias created by MODULE_DEVICE_TABLE() will have non-array type -
'struct type##_device_id'.

This inconsistency confuses compiler, it could make a wrong assumption
about variable's size which leads KASan to produce a false positive report
about out of bounds access.

For every global variable compiler calls __asan_register_globals() passing
information about global variable (address, size, size with redzone, name
...) __asan_register_globals() poison symbols redzone to detect possible
out of bounds accesses.

When symbol has an alias __asan_register_globals() will be called as for
symbol so for alias. Compiler determines size of variable by size of
variable's type. Alias and symbol have the same address, so if alias have
the wrong size part of memory that actually belongs to the symbol could be
poisoned as redzone of alias symbol.

By fixing type of alias symbol we will fix size of it, so
__asan_register_globals() will not poison valid memory.

Signed-off-by: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Konstantin Serebryany
Cc: Dmitry Chernenkov
Signed-off-by: Andrey Konovalov
Cc: Yuri Gribov
Cc: Konstantin Khlebnikov
Cc: Sasha Levin
Cc: Christoph Lameter
Cc: Joonsoo Kim
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Ryabinin
2015-02-14 13:21:42 +0800

22 Jan, 2015

1 commit

d5db139ab module: make module_refcount() a signed integer. ... Browse Code »

James Bottomley points out that it will be -1 during unload. It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel
Acked-and-documention-added-by: James Bottomley
Reviewed-by: Masami Hiramatsu
Signed-off-by: Rusty Russell

Rusty Russell
2015-01-22 08:45:54 +0800

11 Nov, 2014

1 commit

2f35c41f5 module: Replace module_ref with atomic_t refcnt ... Browse Code »

Replace module_ref per-cpu complex reference counter with
an atomic_t simple refcnt. This is for code simplification.

Signed-off-by: Masami Hiramatsu
Signed-off-by: Rusty Russell

Masami Hiramatsu
2014-11-11 14:37:46 +0800

27 Jul, 2014

2 commits

76681c8fa module: return bool from within_module*() ... Browse Code »

The within_module*() functions return only true or false. Let's use bool as
the return type.

Note that it should not change kABI because these are inline functions.

Signed-off-by: Petr Mladek
Signed-off-by: Rusty Russell

Petr Mladek
2014-07-27 19:22:44 +0800
9b20a352d module: add within_module() function ... Browse Code »

It is just a small optimization that allows to replace few
occurrences of within_module_init() || within_module_core()
with a single call.

Signed-off-by: Petr Mladek
Signed-off-by: Rusty Russell

Petr Mladek
2014-07-27 19:22:43 +0800

07 Apr, 2014

1 commit

6f4c98e1c Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont

Linus Torvalds
2014-04-07 00:38:07 +0800

13 Mar, 2014

2 commits

cff26a51d module: remove MODULE_GENERIC_TABLE ... Browse Code »

MODULE_DEVICE_TABLE() calles MODULE_GENERIC_TABLE(); make it do the
work directly. This also removes a wart introduced in the last patch,
where the alias is defined to be an unknown struct type "struct
type##__##name##_device_id" instead of "struct type##_device_id" (it's
an extern so GCC doesn't care, but it's wrong).

The other user of MODULE_GENERIC_TABLE (ISAPNP_CARD_TABLE) is unused,
so delete it.

Signed-off-by: Rusty Russell

Rusty Russell
2014-03-13 09:41:00 +0800
21bdd17b2 module: allow multiple calls to MODULE_DEVICE_TABLE() per module ... Browse Code »

Commit 78551277e4df5: "Input: i8042 - add PNP modaliases" had a bug, where the
second call to MODULE_DEVICE_TABLE() overrode the first resulting in not all
the modaliases being exposed.

This fixes the problem by including the name of the device_id table in the
__mod_*_device_table alias, allowing us to export several device_id tables
per module.

Suggested-by: Kay Sievers
Acked-by: Greg Kroah-Hartman
Cc: Dmitry Torokhov
Signed-off-by: Tom Gundersen
Signed-off-by: Rusty Russell

Tom Gundersen
2014-03-13 09:41:00 +0800

07 Mar, 2014

1 commit

b2e285fcb tracing/module: Replace include of tracepoint.h with jump_label.h in module.h ... Browse Code »

There's nothing in the module.h header that requires tracepoint.h to be
included, and there may be cases that tracepoint.h may need to include
module.h, which will cause recursive header issues.

But module.h requires seeing HAVE_JUMP_LABEL which is set in jump_label.h
which it just coincidentally gets from tracepoint.h.

Link: http://lkml.kernel.org/r/20140307084712.5c68641a@gandalf.local.home

Acked-by: Rusty Russell
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2014-03-07 23:06:09 +0800

16 Jan, 2014

1 commit

e865d06b1 module: fix coding style ... Browse Code »

Fix coding style of module.h

Signed-off-by: Seunghun Lee
Signed-off-by: Rusty Russell

Seunghun Lee
2014-01-16 07:53:03 +0800

04 Dec, 2013

1 commit

74e22fac8 module.h: Remove unnecessary semicolon ... Browse Code »

[All 8 callers already have semicolons. -- RR]

Signed-off-by: Joe Perches
Signed-off-by: Rusty Russell

Joe Perches
2013-12-04 11:39:46 +0800

23 Sep, 2013

1 commit

3f2b9c9cd module: remove rmmod --wait option. ... Browse Code »

The option to wait for a module reference count to reach zero was in
the initial module implementation, but it was never supported in
modprobe (you had to use rmmod --wait). After discussion with Lucas,
It has been deprecated (with a 10 second sleep) in kmod for the last
year.

This finally removes it: the flag will evoke a printk warning and a
normal (non-blocking) remove attempt.

Cc: Lucas De Marchi
Signed-off-by: Rusty Russell

Rusty Russell
2013-09-23 14:14:58 +0800

03 Sep, 2013

1 commit

942e44312 module: Fix mod->mkobj.kobj potentially freed too early ... Browse Code »

DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod->mkobj.kobj(ffffffffa01600d0 below) is freed together with mod
itself in free_module(). However, its children still hold references to
it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
child(holders below) tries to decrease the reference count to its parent
in kobject_del(), BUG happens as it tries to access already freed memory.

This patch tries to fix it by waiting for the mod->mkobj.kobj to be
really released in the module removing process (and some error code
paths).

[ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
[ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
[ 1845.185026] IP: [] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops: 0000 [#1] PREEMPT
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G O 3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
[ 1845.185026] RIP: 0010:[] [] kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:ffff88007ca5dd08 EFLAGS: 00010282
[ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
[ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
[ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
[ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
[ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
[ 1845.185026] FS: 0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
[ 1845.185026] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
[ 1845.185026] Stack:
[ 1845.185026] ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
[ 1845.185026] 0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
[ 1845.185026] ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
[ 1845.185026] Call Trace:
[ 1845.185026] [] kobject_del+0x2e/0x40
[ 1845.185026] [] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026] [] process_one_work+0x1e5/0x670
[ 1845.185026] [] ? process_one_work+0x183/0x670
[ 1845.185026] [] worker_thread+0x113/0x370
[ 1845.185026] [] ? rescuer_thread+0x290/0x290
[ 1845.185026] [] kthread+0xda/0xe0
[ 1845.185026] [] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026] [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] [] ret_from_fork+0x7a/0xb0
[ 1845.185026] [] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
[ 1845.185026] RIP [] kobject_put+0x11/0x60
[ 1845.185026] RSP
[ 1845.185026] CR2: ffffffffa01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong
Acked-by: Greg Kroah-Hartman
Signed-off-by: Rusty Russell

Li Zhong
2013-09-03 15:05:47 +0800

20 Aug, 2013

1 commit

7cb14ba75 modules: add support for soft module dependencies ... Browse Code »

Additional and optional dependencies not found while building the kernel and
modules, can now be declared explicitly.

Signed-off-by: Andreas Robinson
Acked-by: Herbert Xu
Signed-off-by: Rusty Russell

Andreas Robinson
2013-08-20 14:07:41 +0800

15 Mar, 2013

1 commit

b92021b09 CONFIG_SYMBOL_PREFIX: cleanup. ... Browse Code »

We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
"_". But Al Viro broke this in "consolidate cond_syscall and
SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
do so.

Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
prefix it so something. So various places define helpers which are
defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
for pasting.

(arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

Let's solve this properly:
1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
2) Make linux/export.h usable from asm.
3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
4) Make everyone use them.

Signed-off-by: Rusty Russell
Reviewed-by: James Hogan
Tested-by: James Hogan (metag)

Rusty Russell
2013-03-15 12:39:43 +0800

21 Jan, 2013

1 commit

93843b376 module: constify within_module_* ... Browse Code »

These helper functions just check a set intersection with a range, and
don't actually modify struct module.

Signed-off-by: Sasha Levin
Signed-off-by: Rusty Russell

Sasha Levin
2013-01-21 14:48:20 +0800