05 Jun, 2010
13 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: fix bne2 "gave up waiting for init of module libcrc32c"
module: verify_export_symbols under the lock
module: move find_module check to end
module: make locking more fine-grained.
module: Make module sysfs functions private.
module: move sysfs exposure to end of load_module
module: fix kdb's illicit use of struct module_use.
module: Make the 'usage' lists be two-way -
Problem: it's hard to avoid an init routine stumbling over a
request_module these days. And it's not clear it's always a bad idea:
for example, a module like kvm with dynamic dependencies on kvm-intel
or kvm-amd would be neater if it could simply request_module the right
one.In this particular case, it's libcrc32c:
libcrc32c_mod_init
crypto_alloc_shash
crypto_alloc_tfm
crypto_find_alg
crypto_alg_mod_lookup
crypto_larval_lookup
request_moduleIf another module is waiting inside resolve_symbol() for libcrc32c to
finish initializing (ie. bne2 depends on libcrc32c) then it does so
holding the module lock, and our request_module() can't make progress
until that is released.Waiting inside resolve_symbol() without the lock isn't all that hard:
we just need to pass the -EBUSY up the call chain so we can sleep
where we don't hold the lock. Error reporting is a bit trickier: we
need to copy the name of the unfinished module before releasing the
lock.Other notes:
1) This also fixes a theoretical issue where a weak dependency would allow
symbol version mismatches to be ignored.
2) We rename use_module to ref_module to make life easier for the only
external user (the out-of-tree ksplice patches).Signed-off-by: Rusty Russell
Cc: Linus Torvalds
Cc: Tim Abbot
Tested-by: Brandon Philips -
It disabled preempt so it was "safe", but nothing stops another module
slipping in before this module is added to the global list now we don't
hold the lock the whole time.So we check this just after we check for duplicate modules, and just
before we put the module in the global list.(find_symbol finds symbols in coming and going modules, too).
Signed-off-by: Rusty Russell
-
I think Rusty may have made the lock a bit _too_ finegrained there, and
didn't add it to some places that needed it. It looks, for example, like
PATCH 1/2 actually drops the lock in places where it's needed
("find_module()" is documented to need it, but now load_module() didn't
hold it at all when it did the find_module()).Rather than adding a new "module_loading" list, I think we should be able
to just use the existing "modules" list, and just fix up the locking a
bit.In fact, maybe we could just move the "look up existing module" a bit
later - optimistically assuming that the module doesn't exist, and then
just undoing the work if it turns out that we were wrong, just before
adding ourselves to the list.Signed-off-by: Rusty Russell
-
Kay Sievers reports that we still have some
contention over module loading which is slowing boot.Linus also disliked a previous "drop lock and regrab" patch to fix the
bne2 "gave up waiting for init of module libcrc32c" message.This is more ambitious: we only grab the lock where we need it.
Signed-off-by: Rusty Russell
Cc: Brandon Philips
Cc: Kay Sievers
Cc: Linus Torvalds -
These were placed in the header in ef665c1a06 to get the various
SYSFS/MODULE config combintations to compile.That may have been necessary then, but it's not now. These functions
are all local to module.c.Signed-off-by: Rusty Russell
Cc: Randy Dunlap -
This means a little extra work, but is more logical: we don't put
anything in sysfs until we're about to put the module into the
global list an parse its parameters.This also gives us a logical place to put duplicate module detection
in the next patch.Signed-off-by: Rusty Russell
-
Linus changed the structure, and luckily this didn't compile any more.
Reported-by: Stephen Rothwell
Signed-off-by: Rusty Russell
Cc: Jason Wessel
Cc: Martin Hicks -
When adding a module that depends on another one, we used to create a
one-way list of "modules_which_use_me", so that module unloading could
see who needs a module.It's actually quite simple to make that list go both ways: so that we
not only can see "who uses me", but also see a list of modules that are
"used by me".In fact, we always wanted that list in "module_unload_free()": when we
unload a module, we want to also release all the other modules that are
used by that module. But because we didn't have that list, we used to
first iterate over all modules, and then iterate over each "used by me"
list of that module.By making the list two-way, we simplify module_unload_free(), and it
allows for some trivial fixes later too.Signed-off-by: Linus Torvalds
Signed-off-by: Rusty Russell (cleaned & rebased) -
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
block: make blk_init_free_list and elevator_init idempotent
block: avoid unconditionally freeing previously allocated request_queue
pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
pipe: change the privilege required for growing a pipe beyond system max
pipe: adjust minimum pipe size to 1 page
block: disable preemption before using sched_clock()
cciss: call BUG() earlier
Preparing 8.3.8rc2
drbd: Reduce verbosity
drbd: use drbd specific ratelimit instead of global printk_ratelimit
drbd: fix hang on local read errors while disconnected
drbd: Removed the now empty w_io_error() function
drbd: removed duplicated #includes
drbd: improve usage of MSG_MORE
drbd: need to set socket bufsize early to take effect
drbd: improve network latency, TCP_QUICKACK
drbd: Revert "drbd: Create new current UUID as late as possible"
brd: support discard
Revert "writeback: fix WB_SYNC_NONE writeback from umount"
Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
... -
The commit 80b5184cc537718122e036afe7e62d202b70d077 ("kernel/: convert cpu
notifier to return encapsulate errno value") changed the return value of
cpu notifier callbacks.Those callbacks don't return NOTIFY_BAD on failures anymore. But there
are a few callbacks which are called directly at init time and checking
the return value.I forgot to change BUG_ON checking by the direct callers in the commit.
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Child groups should have a greater depth than their parents. Prior to
this change, the parent would incorrectly report zero memory usage for
child cgroups when use_hierarchy is enabled.test script:
mount -t cgroup none /cgroups -o memory
cd /cgroups
mkdir cg1echo 1 > cg1/memory.use_hierarchy
mkdir cg1/cg11echo $$ > cg1/cg11/tasks
dd if=/dev/zero of=/tmp/foo bs=1M count=1echo
echo CHILD
grep cache cg1/cg11/memory.statecho
echo PARENT
grep cache cg1/memory.statecho $$ > tasks
rmdir cg1/cg11 cg1
cd /
umount /cgroupsUsing fae9c79, a recent patch that changed alloc_css_id() depth computation,
the parent incorrectly reports zero usage:
root@ubuntu:~# ./test
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/sCHILD
cache 1048576
total_cache 1048576PARENT
cache 0
total_cache 0With this patch, the parent correctly includes child usage:
root@ubuntu:~# ./test
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/sCHILD
cache 1052672
total_cache 1052672PARENT
cache 0
total_cache 1052672Signed-off-by: Greg Thelen
Acked-by: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Li Zefan
Cc: [2.6.34.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
task_struct->pesonality is "unsigned int", but sys_personality() paths use
"unsigned long pesonality". This means that every assignment or
comparison is not right. In particular, if this argument does not fit
into "unsigned int" __set_personality() changes the caller's personality
and then sys_personality() returns -EINVAL.Turn this argument into "unsigned int" and avoid overflows. Obviously,
this is the user-visible change, we just ignore the upper bits. But this
can't break the sane application.There is another thing which can confuse the poorly written applications.
User-space thinks that this syscall returns int, not long. This means
that the returned value can be negative and look like the error code. But
note that libc won't be confused and thus errno won't be set, and with
this patch the user-space can never get -1 unless sys_personality() really
fails. And, most importantly, the negative RET != -1 is only possible if
that app previously called personality(RET).Pointed-out-by: Wenming Zhang
Suggested-by: Linus Torvalds
Signed-off-by: Oleg Nesterov
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jun, 2010
2 commits
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched, trace: Fix sched_switch() prev_state argument
sched: Fix wake_affine() vs RT tasks
sched: Make sure timers have migrated before killing the migration_thread -
…el/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix crash in swevents
perf buildid-list: Fix --with-hits event processing
perf scripts python: Give field dict to unhandled callback
perf hist: fix objdump output parsing
perf-record: Check correct pid when forking
perf: Do the comm inheritance per thread in event__process_task
perf: Use event__process_task from perf sched
perf: Process comm events by tid
blktrace: Fix new kernel-doc warnings
perf_events: Fix unincremented buffer base on partial copy
perf_events: Fix event scheduling issues introduced by transactional API
perf_events, trace: Fix perf_trace_destroy(), mutex went missing
perf_events, trace: Fix probe unregister race
perf_events: Fix races in group composition
perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
03 Jun, 2010
2 commits
-
Frederic reported that because swevents handling doesn't disable IRQs
anymore, we can get a recursion of perf_adjust_period(), once from
overflow handling and once from the tick.If both call ->disable, we get a double hlist_del_rcu() and trigger
a LIST_POISON2 dereference.Since we don't actually need to stop/start a swevent to re-programm
the hardware (lack of hardware to program), simply nop out these
callbacks for the swevent pmu.Reported-by: Frederic Weisbecker
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
This changes the interface to be based on bytes instead. The API
matches that of F_SETPIPE_SZ in that it rounds up the passed in
size so that the resulting page array is a power-of-2 in size.The proc file is renamed to /proc/sys/fs/pipe-max-size to
reflect this change.Signed-off-by: Jens Axboe
02 Jun, 2010
1 commit
-
In commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") the new helper functions access
cpu_chain. As a result, it shouldn't be marked __cpuinitdata (via
section mismatch warning).Alternatively, the helper functions should be forced inline, or marked
__ref or __cpuinit. In the meantime, this patch silences the warning
the trivial way.Signed-off-by: Daniel J Blueman
Signed-off-by: Linus Torvalds
01 Jun, 2010
3 commits
-
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
kbuild: Revert part of e8d400a to resolve a conflict
kbuild: Fix checking of scm-identifier variable
gconfig: add support to show hidden options that have prompts
menuconfig: add support to show hidden options which have prompts
gconfig: remove show_debug option
gconfig: remove dbg_print_ptype() and dbg_print_stype()
kconfig: fix zconfdump()
kconfig: some small fixes
add random binaries to .gitignore
kbuild: Include gen_initramfs_list.sh and the file list in the .d file
kconfig: recalc symbol value before showing search results
.gitignore: ignore *.lzo files
headerdep: perlcritic warning
scripts/Makefile.lib: Align the output of LZO
kbuild: Generate modules.builtin in make modules_install
Revert "kbuild: specify absolute paths for cscope"
kbuild: Do not unnecessarily regenerate modules.builtin
headers_install: use local file handles
headers_check: fix perl warnings
export_report: fix perl warnings
... -
Mike reports that since e9e9250b (sched: Scale down cpu_power due to RT
tasks), wake_affine() goes funny on RT tasks due to them still having a
!0 weight and wake_affine() still subtracts that from the rq weight.Since nobody should be using se->weight for RT tasks, set the value to
zero. Also, since we now use ->cpu_power to normalize rq weights to
account for RT cpu usage, add that factor into the imbalance computation.Reported-by: Mike Galbraith
Tested-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Rafael sees a sometimes crash at precpu_modfree from kernel/module.c; it
only occurred with another (since-reverted) patch, but that patch simply
changed timing to uncover this bug, it was otherwise unrelated.The comment about the mod being freed is self-explanatory, but neither
Tejun nor I read it. This bug was introduced in 259354deaa, after it
had previously been fixed in 6e2b75740b. How embarrassing.Reported-by: "Rafael J. Wysocki"
Signed-off-by: Rusty Russell
Embarrassingly-Acked-by: Tejun Heo
Cc: Masami Hiramatsu
Tested-by: "Rafael J. Wysocki"
Signed-off-by: Linus Torvalds
31 May, 2010
12 commits
-
Fix blktrace.c kernel-doc warnings:
Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'Signed-off-by: Randy Dunlap
Cc: Jens Axboe
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
If a sample size crosses to the next page boundary, the copy
will be made in more than one step. However we forget to advance
the source offset for the next copy, leading to unexpected double
copies that completely mess up the traces.This fixes various kinds of bad traces that have irrelevant
data inside, as an example:geany-4979 [001] 5758.077775: sched_switch: prev_comm=! prev_pid=121
prev_prio=0 prev_state=S|D|Z|X|x ==> next_comm= next_pid=7497072
next_prio=0Signed-off-by: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
The transactional API patch between the generic and model-specific
code introduced several important bugs with event scheduling, at
least on X86. If you had pinned events, e.g., watchdog, and were
over-committing the PMU, you would get bogus counts. The bug was
showing up on Intel CPU because events would move around more
often that on AMD. But the problem also existed on AMD, though
harder to expose.The issues were:
- group_sched_in() was missing a cancel_txn() in the error path
- cpuc->n_added was not properly maintained, leading to missing
actions in hw_perf_enable(), i.e., n_running being 0. You cannot
update n_added until you know the transaction has succeeded. In
case of failed transaction n_added was not adjusted back.- in case of failed transactions, event_sched_out() was called
and eventually invoked x86_disable_event() to touch the HW reg.
But with transactions, on X86, event_sched_in() does not touch
HW registers, it simply collects events into a list. Thus, you
could end up calling x86_disable_event() on a counter which
did not correspond to the current event when idx != -1.The patch modifies the generic and X86 code to avoid all those problems.
First, we keep track of the number of events added last. In case the
transaction fails, we substract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Steve spotted I forgot to do the destroy under event_mutex.
Reported-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
tracepoint_probe_unregister() does not synchronize against the probe
callbacks, so do that explicitly. This properly serializes the callbacks
and the free of the data used therein.Also, use this_cpu_ptr() where possible.
Acked-by: Frederic Weisbecker
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Group siblings don't pin each-other or the parent, so when we destroy
events we must make sure to clean up all cross referencing pointers.In particular, for destruction of a group leader we must be able to
find all its siblings and remove their reference to it.This means that detaching an event from its context must not detach it
from the group, otherwise we can end up failing to clear all pointers.Solve this by clearly separating the attachment to a context and
attachment to a group, and keep the group composed until we destroy
the events.Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
In order to move toward separate buffer objects, rework the whole
perf_mmap_data construct to be a more self-sufficient entity, one
with its own lifetime rules.This greatly sanitizes the whole output redirection code, which
was riddled with bugs and races.Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar -
Problem: In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems
to be hung at a point where migration_call() tries to kill the
migration_thread of the dying CPU, which just got moved to the current
CPU. This migration thread does not get a chance to run (and die) since
rt_throttled is set to 1 on current, and it doesn't get cleared as the
hrtimer which is supposed to reset the rt bandwidth
(sched_rt_period_timer) is tied to the CPU which we just marked dead!Solution: This patch pushes the killing of migration thread to
"CPU_POST_DEAD" event. By then all the timers (including
sched_rt_period_timer) should have got migrated (along with other
callbacks).Signed-off-by: Amit Arora
Signed-off-by: Gautham R Shenoy
Acked-by: Tejun Heo
Signed-off-by: Peter Zijlstra
Cc: Thomas Gleixner
LKML-Reference:
Signed-off-by: Ingo Molnar -
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix optimistic spinning vs. BKL -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tui: Fix last use_browser problem related to .perfconfig
perf symbols: Add the build id cache to the vmlinux path
perf tui: Reset use_browser if stdout is not a tty
ring-buffer: Move zeroing out excess in page to ring buffer code
ring-buffer: Reset "real_end" when page is filled -
If there's only one CPU online when disable_nonboot_cpus() is called,
the error variable will not be initialized and that may lead to
erroneous behavior. Fix this issue by initializing error in
disable_nonboot_cpus() as appropriate.Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds -
This reverts commit 0ac0c0d0f837c499afd02a802f9cf52d3027fa3b, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.Requested-by: Stephen Rothwell
Signed-off-by: Linus Torvalds
30 May, 2010
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: clean up on forwarded aborted mds request
ceph: fix leak of osd authorizer
ceph: close out mds, osd connections before stopping auth
ceph: make lease code DN specific
fs/ceph: Use ERR_CAST
ceph: renew auth tickets before they expire
ceph: do not resend mon requests on auth ticket renewal
ceph: removed duplicated #includes
ceph: avoid possible null dereference
ceph: make mds requests killable, not interruptible
sched: add wait_for_completion_killable_timeout -
Add missing _killable_timeout variant for wait_for_completion that will
return when a timeout expires or the task is killed.CC: Ingo Molnar
CC: Andreas Herrmann
CC: Thomas Gleixner
CC: Mike Galbraith
Acked-by: Peter Zijlstra
Signed-off-by: Sage Weil
29 May, 2010
1 commit
-
…el/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_timer: Fix error path in timer_create
hrtimer: Avoid double seqlock
timers: Move local variable into else section
timers: Fix slack calculation really
28 May, 2010
4 commits
-
once anon_inode_getfd() is called, you can't expect *anything* about
struct file that descriptor points to - another thread might be doing
whatever it likes with descriptor table at that point.Cc: stable
Signed-off-by: Al Viro -
…git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
tracing: Add __used annotation to event variable
perf, trace: Fix !x86 build bug
perf report: Support multiple events on the TUI
perf annotate: Fix up usage of the build id cache
x86/mmiotrace: Remove redundant instruction prefix checks
perf annotate: Add TUI interface
perf tui: Remove annotate from popup menu after failure
perf report: Don't start the TUI if -D is used
perf: Fix getline undeclared
perf: Optimize perf_tp_event_match()
perf: Remove more code from the fastpath
perf: Optimize the !vmalloc backed buffer
perf: Optimize perf_output_copy()
perf: Fix wakeup storm for RO mmap()s
perf-record: Share per-cpu buffers
perf-record: Remove -M
perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
... -
Move CLOCK_DISPATCH(which_clock, timer_create, (new_timer)) after all
posible EFAULT erros.*_timer_create may allocate/get resources.
(for example posix_cpu_timer_create does get_task_struct)[ tglx: fold the remove crappy comment patch into this ]
Signed-off-by: Andrey Vagin
Cc: Oleg Nesterov
Cc: Pavel Emelyanov
Cc:
Reviewed-by: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner -
Commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") also introduced this annoying
warning:kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used
when CONFIG_HOTPLUG_CPU wasn't set.
So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
simplify it while at it.Signed-off-by: Linus Torvalds