05 Mar, 2011
1 commit
-
This merge creates two set of conflicts. One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q->request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core. The conflict
isn't trivial but the resolution is straight-forward.* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
should be called with @force_kblockd set to %true.* elv_insert() in blk_kick_flush() should use
%ELEVATOR_INSERT_REQUEUE.Both changes are to avoid invoking ->request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.Signed-off-by: Tejun Heo
04 Mar, 2011
1 commit
-
Following steps lead to deadlock in kernel:
dd if=/dev/zero of=img bs=512 count=1000
losetup -f img
mkfs.ext2 /dev/loop0
mount -t ext2 -o loop /dev/loop0 mnt
umount mnt/Stacktrace:
[] irq_exit+0x36/0x59
[] smp_apic_timer_interrupt+0x6b/0x75
[] apic_timer_interrupt+0x31/0x38
[] mutex_spin_on_owner+0x54/0x5b
[] lo_release+0x12/0x67 [loop]
[] __blkdev_put+0x7c/0x10c
[] fput+0xd5/0x1aa
[] loop_clr_fd+0x1a9/0x1b1 [loop]
[] lo_release+0x39/0x67 [loop]
[] __blkdev_put+0x7c/0x10c
[] deactivate_locked_super+0x17/0x36
[] sys_umount+0x27e/0x2a5
[] sys_oldumount+0xb/0xe
[] sysenter_do_call+0x12/0x26
[] 0xffffffffRegression since 2a48fc0ab24241755dc9, which introduced the private
loop_mutex as part of the BKL removal process.As per [1], the mutex can be safely removed.
[1] http://www.gossamer-threads.com/lists/linux/kernel/1341930
Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394
Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172Signed-off-by: Petr Uzel
Cc: stable@kernel.org
Reviewed-by: Nikanth Karthikesan
Acked-by: Arnd Bergmann
Signed-off-by: Jens Axboe
03 Mar, 2011
5 commits
-
If we enable trace events to trace block actions, We use
blk_fill_rwbs_rq to analyze the corresponding actions
in request's cmd_flags, but we only choose the minor 2 bits
from it, so most of other flags(e.g, REQ_SYNC) are missing.
For example, with a sync write we get:
write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
8 [write_test]Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
blk_fill_rwbs_rq isn't needed any more.With this patch, after a sync write we get:
write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
8 [write_test]Signed-off-by: Tao Ma
Acked-by: Jeff Moyer
Signed-off-by: Jens Axboe -
Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is
written in such a way that it needs queue lock. In blk_release_queue()
there is no gurantee that ->queue_lock is still around.Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
one problem.https://lkml.org/lkml/2010/10/23/86
And a quick fix moved blk_throtl_exit() to blk_release_queue().
commit 7ad58c028652753814054f4e3ac58f925e7343f4
Author: Jens Axboe
Date: Sat Oct 23 20:40:26 2010 +0200block: fix use-after-free bug in blk throttle code
This patch reverts above change and does not try to shutdown the
throtl work in blk_sync_queue(). By avoiding call to
throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
the problem reported by Ingo.blk_sync_queue() seems to be used only by md driver and it seems to be
using it to make sure q->unplug_fn is not called as md registers its
own unplug functions and it is about to free up the data structures
used by unplug_fn(). Block throttle does not call back into unplug_fn()
or into md. So there is no need to cancel blk throttle work.In fact I think cancelling block throttle work is bad because it might
happen that some bios are throttled and scheduled to be dispatched later
with the help of pending work and if work is cancelled, these bios might
never be dispatched.Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
blk_release_queue() time. That should be safe as we are also calling
blk_throtl_exit() which should make sure all the throttling related
data structures are cleaned up.Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe -
Now we initialize ->queue_lock at queue allocation time so driver does
not have to worry about initializing it before calling
blk_cleanup_queue().Signed-off-by: Jens Axboe
-
There does not seem to be a clear convention whether q->queue_lock is
initialized or not when blk_cleanup_queue() is called. In the past it
was not necessary but now blk_throtl_exit() takes up queue lock by
default and needs queue lock to be available.In fact elevator_exit() code also has similar requirement just that it
is less stringent in the sense that elevator_exit() is called only if
elevator is initialized.Two problems have been noticed because of ambiguity about spin lock
status.- If a driver calls blk_alloc_queue() and then soon calls
blk_cleanup_queue() almost immediately, (because some other
driver structure allocation failed or some other error happened)
then blk_throtl_exit() will run into issues as queue lock is not
initialized. Loop driver ran into this issue recently and I
noticed error paths in md driver too. Similar error paths should
exist in other drivers too.- If some driver provided external spin lock and zapped the lock
before blk_cleanup_queue(), then it can lead to issues.So this patch initializes the default queue lock at queue allocation time.
block throttling code is one of the users of queue lock and it is
initialized at the queue allocation time, so it makes sense to
initialize ->queue_lock also to internal lock. A driver can overide that
lock later. This will take care of the issue where a driver does not have
to worry about initializing the queue lock to default before calling
blk_cleanup_queue()Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe -
Rename the numerals in the diskstats_show() into the macros.
Cc: Jens Axboe
Signed-off-by: Liu Yuan
Signed-off-by: Jens Axboe
02 Mar, 2011
6 commits
-
blk-flush decomposes a flush into sequence of multiple requests. On
completion of a request, the next one is queued; however, block layer
must not implicitly call into q->request_fn() directly from completion
path. This makes the queue behave unexpectedly when seen from the
drivers and violates the assumption that q->request_fn() is called
with process context + queue_lock.This patch makes blk-flush the following two changes to make sure
q->request_fn() is not called directly from request completion path.- blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
use kblockd instead of calling directly into q->request_fn().- queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
request queue directly.Reported by Jan in the following threads.
http://thread.gmane.org/gmane.linux.ide/48778
http://thread.gmane.org/gmane.linux.ide/48786stable: applicable to v2.6.37.
Signed-off-by: Tejun Heo
Reported-by: Jan Beulich
Cc: "David S. Miller"
Cc: stable@kernel.org
Signed-off-by: Jens Axboe -
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd. Add @force_kblockd.All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.stable: This is prerequisite for fixing ide oops caused by the new
blk-flush implementation.Signed-off-by: Tejun Heo
Cc: Jan Beulich
Cc: James Bottomley
Cc: stable@kernel.org
Signed-off-by: Jens Axboe -
Effectively, make group_isolation=1 the default and remove the tunable.
The setting group_isolation=0 was because by default we idle on
sync-noidle tree and on fast devices, this can be very harmful for
throughput.However, this problem can also be addressed by tuning slice_idle and
possibly group_idle on faster storage devices.This change simplifies the CFQ code by removing the feature entirely.
Signed-off-by: Justin TerAvest
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe -
Conflicts:
block/cfq-iosched.cSigned-off-by: Jens Axboe
-
Signed-off-by: Ben Hutchings
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe -
o Dominik Klein reported a system hang issue while doing some blkio
throttling testing.https://lkml.org/lkml/2011/2/24/173
o Some tracing revealed that CFQ was not dispatching any more jobs as
queue unplug was not happening. And queue unplug was not happening
because unplug work was not being called as there was one throttling
work on same cpu which as not finished yet. And throttling work had not
finished as it was tyring to dispatch a bio to CFQ but all the request
descriptors were consume to it was put to sleep.o So basically it is a cyclic dependecny between CFQ unplug work and
throtl dispatch work. Tejun suggested that use separate workqueue for
such cases.o This patch uses a separate workqueue for throttle related work and
does not rely on kblockd workqueue anymore.Cc: stable@kernel.org
Reported-by: Dominik Klein
Signed-off-by: Vivek Goyal
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
01 Mar, 2011
10 commits
-
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (adt7411) add MODULE_DEVICE_TABLE
hwmon: (ad7414) add MODULE_DEVICE_TABLE -
Fix new kernel-doc warning in fs/block_dev.c:
Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty'
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds -
Several ACPI drivers fail to build if CONFIG_NET is unset, because
they refer to things depending on CONFIG_THERMAL that in turn depends
on CONFIG_NET. However, CONFIG_THERMAL doesn't really need to depend
on CONFIG_NET, because the only part of it requiring CONFIG_NET is
the netlink interface in thermal_sys.c.Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET
and remove the dependency of CONFIG_THERMAL on CONFIG_NET from
drivers/thermal/Kconfig.Signed-off-by: Rafael J. Wysocki
Acked-by: Randy Dunlap
Cc: Ingo Molnar
Cc: Len Brown
Cc: Stephen Rothwell
Cc: Luming Yu
Cc: Andrew Morton
Signed-off-by: Linus Torvalds -
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo -
…/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
omap4: prcm: Fix the CPUx clockdomain offsets
OMAP2+: clocksource: fix crash on boot when !CONFIG_OMAP_32K_TIMER
OMAP2/3: clock: fix fint calculation for DPLL_FREQSEL
OMAP2+: mailbox: fix lookups for multiple mailboxes
OMAP2420: mailbox: fix IVA vs DSP IRQ numbering
mach-omap2: smartreflex: world-writable debugfs voltage files
mach-omap2: pm: world-writable debugfs timer files
mach-omap2: mux: world-writable debugfs files -
…or-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf timechart: Fix max number of cpus
perf timechart: Fix black idle boxes in the title
perf hists: Print number of samples, not the period sum* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Use u32 instead of long to set reset vector back to 0* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevents: Prevent oneshot mode when broadcast device is periodic -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix truncate after open
fuse: fix hang of single threaded fuseblk filesystem -
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Check heartbeat mode for kernel stacks only
Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
ocfs2: Fix estimate of necessary credits for mkdir -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
eukrea-tlv320: fix platform_name
ASoC: correct pxa AC97 DAI names
ALSA: hda - Add support for new IDT 92HD98 and 92HD99 codecs
ALSA: HDA: Add ideapad quirk for two Dell machines
ALSA: HDA: Add a new Conexant codec 506e (20590)
ALSA: usb-audio: fix oops due to cleanup race when disconnecting
ASoC: Hook wm_hubs micbiases up to CLK_SYS
ASoC: Correct definition of WM8903_VMID_RES_5K
ASoC: Fix WM8958 default microphone detection argument ordering
ALSA: HDA: Fix mic initialization in VIA auto parser
ALSA: fix one memory leak in sound jack -
Commit e2cda3226481 ("thp: add pmd mangling generic functions") replaced
some macros in with inline functions.If the functions are to be defined (not all architectures need them)
then struct vm_area_struct must be defined first. So include
.Fixes a build failure seen in Debian:
CC [M] drivers/media/dvb/mantis/mantis_pci.o
In file included from arch/arm/include/asm/pgtable.h:460,
from drivers/media/dvb/mantis/mantis_pci.c:25:
include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete typeSigned-off-by: Ben Hutchings
Signed-off-by: Linus Torvalds
28 Feb, 2011
6 commits
-
A customer of ours, complained that when setting the reset
vector back to 0, it trashed other data and hung their box.
They noticed when only 4 bytes were set to 0 instead of 8,
everything worked correctly.Mathew pointed out:
|
| We're supposed to be resetting trampoline_phys_low and
| trampoline_phys_high here, which are two 16-bit values.
| Writing 64 bits is definitely going to overwrite space
| that we're not supposed to be touching.
|So limit the area modified to u32.
Signed-off-by: Don Zickus
Acked-by: Matthew Garrett
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar -
Currently numcpus is determined in pid_put_sample which is only
called on sched_switch/sched_wakeup sample processing.On a machine with a lot cpus I often saw the last cpu missing.
Check for (max) numcpus on every event happening and in the
beginning. -> fixes the issue for me.Signed-off-by: Thomas Renninger
Cc: Arjan van de Ven
Cc: Arnaldo Carvalho de Melo
Cc: lenb@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
This fix is needed for eye of gnome and firefox svg viewers.
Only Inkscape can handle the broken case.Compare with the other svg_legenda_box declarations, looks
like a typo slipped in at this place.Signed-off-by: Thomas Renninger
Cc: Arjan van de Ven
Cc: Arnaldo Carvalho de Melo
Cc: lenb@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
drm/nv50-nvc0: make sure vma is definitely unmapped when destroying bo -
This fixes CVE-2011-1013.
Reported-by: Matthiew Herrb (OpenBSD X.org team)
Cc: stable@kernel.org
Signed-off-by: Dave Airlie -
Somehow fixes a misrendering + hang at GDM startup on my NVA8...
My first guess would have been stale TLB entries laying around that a new
bo then accidentally inherits. That doesn't make a great deal of sense
however, as when we mapped the pages for the new bo the TLBs would've
gotten flushed anyway.Signed-off-by: Ben Skeggs
27 Feb, 2011
2 commits
-
The device table is required to load modules based on modaliases.
Signed-off-by: Axel Lin
Acked-by: Wolfram Sang
Signed-off-by: Guenter Roeck -
The device table is required to load modules based on modaliases.
Signed-off-by: Axel Lin
Signed-off-by: Guenter Roeck
26 Feb, 2011
9 commits
-
When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.Add the necessary check to tick_is_oneshot_available().
Reported-and-tested-by: Seth Forshee
Signed-off-by: Thomas Gleixner
LKML-Reference:
Cc: stable@kernel.org # .21 -> -
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset -
Fixes sysfs config attribute to allow access to entire 16MB maintenance
space of RapidIO devices.Signed-off-by: Alexandre Bounine
Cc: Kumar Gala
Cc: Matt Porter
Cc: Li Yang
Cc: Thomas Moll
Cc: Micha Nelissen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Initialize ts_real.flags to fix compiler warning about possible
uninitialized use of this field.Signed-off-by: Alexander Gordeev
Cc: john stultz
Cc: Rodolfo Giometti
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It seems odd that truncate_inode_pages_range(), called not only when
truncating but also when evicting inodes, has mem_cgroup_uncharge_start
and _end() batching in its second loop to clear up a few leftovers, but
not in its first loop that does almost all the work: add them there too.Signed-off-by: Hugh Dickins
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Acked-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The THP code didn't pass the correct interleaving shift to the memory
policy code. Fix this here by adjusting for the order.Signed-off-by: Andi Kleen
Reviewed-by: Christoph Lameter
Acked-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A race can occur when io_submit() races with io_destroy():
CPU1 CPU2
io_submit()
do_io_submit()
...
ctx = lookup_ioctx(ctx_id);
io_destroy()
Now do_io_submit() holds the last reference to ctx.
...
queue new AIO
put_ioctx(ctx) - frees ctx with active AIOsWe solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx. Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.Cc: Nick Piggin
Reviewed-by: Jeff Moyer
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.Fix the bug by atomically testing for zero refcount before incrementing.
[jack@suse.cz: added comment into the code]
Reviewed-by: Jeff Moyer
Signed-off-by: Nick Piggin
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds