12 Jul, 2011
2 commits
-
Move the variables used for the think time check to a separate struct. This
prepares for adding the think time check to service trees and groups. No
functional change.

Signed-off-by: Shaohua Li
Acked-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
fs_excl is a poor man's priority inheritance for filesystems to hint to
the block layer that an operation is important. It was never clearly
specified, not widely adopted, and will not prevent starvation in many
cases (like across cgroups).

fs_excl was introduced with the time sliced CFQ IO scheduler, to
indicate when a process held FS exclusive resources and thus needed
a boost.

It doesn't cover all file systems, and it was never fully complete.
Let's kill it.

Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
11 Jul, 2011
1 commit
-
There is no consistency among filesystems as to which bios (or requests)
are marked as being metadata. It's interesting to expose this in traces,
but we shouldn't schedule the requests differently based on whether or
not they're marked as being metadata.

Signed-off-by: Justin TerAvest
Signed-off-by: Jens Axboe
08 Jul, 2011
2 commits
-
I'm often confused about why preemption is not disabled when changing the
blk_plug list. It would be better to add comments here in case others have
similar concerns.

Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe
-
When I test a fio script with a big I/O depth, I find the total throughput drops
compared to a relatively small I/O depth. The reason is that the thread accumulates
many requests in its plug list, which causes some delays (surely this depends
on CPU speed).
I think we should have a threshold for requests. When the threshold is reached,
it means requests are not being merged and queue lock contention isn't severe
when pushing per-task requests to the queue, so the main advantages of blk plug
don't exist. We can force a plug list flush in this case.
With this, my test throughput actually increases and almost equals that of the small
I/O depth. Another side effect is that irq off time decreases in blk_flush_plug_list()
for big I/O depth.
BLK_MAX_REQUEST_COUNT is chosen arbitrarily, but 16 is enough to reduce lock
contention in my tests. I'm open here; 32 is ok in my test too.

Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe
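
The threshold idea in the commit above can be illustrated with a small model. This is only a sketch in Python, not the kernel code: the PlugList class and the dispatch callback are hypothetical names, and only BLK_MAX_REQUEST_COUNT and its value of 16 come from the commit message.

```python
BLK_MAX_REQUEST_COUNT = 16  # value named in the commit message


class PlugList:
    """Toy per-task plug list (hypothetical; not the kernel's struct blk_plug)."""

    def __init__(self, dispatch):
        self.pending = []          # requests held back by the task
        self.dispatch = dispatch   # assumed callback standing in for the queue
        self.flushes = 0

    def add(self, request):
        self.pending.append(request)
        # Once the list reaches the threshold, holding requests back no
        # longer pays off, so force a flush instead of waiting for unplug.
        if len(self.pending) >= BLK_MAX_REQUEST_COUNT:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.dispatch(batch)
        self.flushes += 1
```

Adding 40 requests to such a list would hand two batches of 16 to the queue and leave 8 pending until the final flush, which bounds how much work a single flush has to do.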
07 Jul, 2011
2 commits
-
Fix headers_check error introduced by 390192b30057:
include/linux/fd.h:6: included file 'linux/compat.h' is not exported
Signed-off-by: Johannes Stezenbach
Signed-off-by: Jens Axboe
-
Due to the recently identified overflow in read_capacity_16() it was
possible for max_discard_sectors to be zero but still have discards
enabled on the associated device's queue.

Eliminate the possibility of blkdev_issue_discard() looping infinitely.
Interestingly this issue wasn't identified until a device, whose
discard_granularity was 0 due to read_capacity_16 overflow, was consumed
by blk_stack_limits() to construct limits for a higher-level DM
multipath device. The multipath device's resulting limits never had the
discard limits stacked because blk_stack_limits() will only do so if
the bottom device's discard_granularity != 0. This resulted in the
multipath device's limits.max_discard_sectors being 0.

Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
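
To see why a zero max_discard_sectors can loop forever, consider a simplified model of splitting a discard range into chunks. This is a sketch in Python under stated assumptions, not the kernel implementation: split_discard is a hypothetical helper, and raising an error for the zero case is one possible guard, not necessarily the one the patch uses.

```python
def split_discard(start, nr_sects, max_discard_sectors):
    """Split a discard range into chunks of at most max_discard_sectors.

    Hypothetical model of a chunking loop; sector units are assumed.
    """
    # Guard: with a limit of 0, every chunk would have length 0 and
    # nr_sects would never shrink, so the loop below would never end.
    if max_discard_sectors == 0:
        raise ValueError("max_discard_sectors is 0; discard cannot proceed")

    chunks = []
    while nr_sects > 0:
        n = min(nr_sects, max_discard_sectors)
        chunks.append((start, n))
        start += n
        nr_sects -= n
    return chunks
```

Without the guard, each iteration would make no progress, which is the infinite loop the commit message describes.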
02 Jul, 2011
1 commit
-
On a Linux x86_64 host with 32-bit userspace, running
qemu or even just "qemu-img create -f qcow2 some.img 1G"
causes a kernel warning:

ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(00005326){t:'S';sz:0} arg(7fffffff) on some.img
ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img

ioctl 00005326 is CDROM_DRIVE_STATUS,
ioctl 801c0204 is FDGETPRM.

The warning appears because the Linux compat-ioctl handler for these
ioctls only applies to block devices, while qemu also uses the ioctls on
plain files.

Signed-off-by: Johannes Stezenbach
Acked-by: Arnd Bergmann
Signed-off-by: Jens Axboe
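
The {t:..;sz:..} fields in the warning above can be recovered from the command numbers by hand. The following Python sketch assumes the generic ioctl number layout used on x86 (8 bits of number, 8 bits of type, 14 bits of size, 2 bits of direction); decode_ioctl is a hypothetical helper name, not a kernel function.

```python
def decode_ioctl(cmd):
    """Split a 32-bit ioctl command into its fields.

    Assumes the generic layout: bits 0-7 number, bits 8-15 type,
    bits 16-29 size, bits 30-31 direction.
    """
    return {
        "nr": cmd & 0xFF,
        "type": (cmd >> 8) & 0xFF,
        "size": (cmd >> 16) & 0x3FFF,
        "dir": (cmd >> 30) & 0x3,
    }


cdrom = decode_ioctl(0x00005326)   # CDROM_DRIVE_STATUS per the commit message
floppy = decode_ioctl(0x801C0204)  # FDGETPRM per the commit message
```

Under that layout, 00005326 decodes to type 0x53 (the character 'S') with size 0, and 801c0204 decodes to type 0x02 with size 28, matching the values printed in the warning.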
01 Jul, 2011
2 commits
-
Currently, only open(2) is defined as the 'clearing' point. It has
two roles - first, it's an acknowledgement from userland indicating
that the event has been received and kernel can clear pending states
and proceed to generate more events. Secondly, it's passed on to
device drivers as a hint indicating that a synchronization point has
been reached and it might want to take a deeper look at the device.

The latter currently is only used by sr which uses two different
mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
to discover events, where the former is lighter weight and safe to be
used repeatedly but may not provide full coverage. Among other
things, GET_EVENT can't detect media removal while TUR can.

This patch makes close(2) - blkdev_put() - indicate clearing hint for
MEDIA_CHANGE to drivers. disk_check_events() is renamed to
disk_flush_events() and updated to take @mask for events to flush
which is or'd to ev->clearing and will be passed to the driver on the
next ->check_events() invocation.

This change makes sr generate MEDIA_CHANGE when media is ejected from
userland - e.g. with eject(1).

Note: Given the current usage, it seems @clearing hint is needlessly
complex. disk_clear_events() can simply clear all events and the hint
can be boolean @flush.

Signed-off-by: Tejun Heo
Cc: Kay Sievers
Signed-off-by: Jens Axboe
-
Conflicts:
block/blk-throttle.c
block/cfq-iosched.c

Signed-off-by: Jens Axboe
30 Jun, 2011
8 commits
-
We used to write these with BIO_RW_BARRIER aka REQ_HARDBARRIER (unless
disabled in the configuration). The correct semantic now would be to
write with FLUSH/FUA.
For example, with activity log transactions, FUA alone is not enough, we
need the corresponding bitmap update (and all related application
updates) on stable storage as well.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
If we have an asymmetrically congested network, we may send P_PING,
but due to congestion, the corresponding P_PING_ACK would time out,
and we would drop a (congested, but otherwise) healthy connection
("PingAck did not arrive in time.")

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
If we have a good resync rate, we will frequently update the on-disk
bitmap, which, if not accounted for as resync io, may let an otherwise
idle device appear to be "busy", and cause us to throttle resync.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
The last commit, drbd: add missing spinlock to bitmap receive,
introduced a cond_resched_lock(), where the lock in question is taken
with irqs disabled.

As we must not schedule with IRQs disabled,
and cond_resched_lock_irq() does not exist yet,
we re-acquire the spin_lock_irq() for each bitmap page processed in turn.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
During bitmap exchange, when using the RLE bitmap compression scheme,
we have a code path that can set the whole bitmap at once.

To avoid holding spin_lock_irq() for too long, we used to lock out other
bitmap modifications during bitmap exchange by other means, and then,
knowing we have exclusive access to the bitmap, modify it without
the spinlock, and with IRQs enabled.

Since we now allow local IO to continue, potentially setting additional
bits during the bitmap receive phase, this is no longer true, and we get
uncoordinated updates of bitmap members, causing bm_set to no longer
accurately reflect the total number of set bits.

To actually see this, you'd need to have a large bitmap, use RLE bitmap
compression, and have busy IO during sync handshake and bitmap exchange.

Fix this by taking the spin_lock_irq() in this code path as well, but
calling cond_resched_lock() after each page worth of bits processed.

Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
-
Signed-off-by: Philipp Reisner
Signed-off-by: Lars Ellenberg
27 Jun, 2011
4 commits
-
ioc->ioc_data is RCU protected, so use the correct API to access it.
This doesn't change any behavior; it just makes the code consistent.

Signed-off-by: Shaohua Li
Cc: stable@kernel.org # after ab4bd22d
Signed-off-by: Jens Axboe
-
I got an RCU warning at boot. ioc->ioc_data is rcu_dereferenced, but
the caller doesn't hold rcu_read_lock.

Signed-off-by: Shaohua Li
Cc: stable@kernel.org # after ab4bd22d
Signed-off-by: Jens Axboe
-
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN
cifs: free blkcipher in smbhash
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
cifs: propagate errors from cifs_get_root() to mount(2)
cifs: tidy cifs_do_mount() up a bit
cifs: more breakage on mount failures
cifs: close sget() races
cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()
cifs: move cifs_umount() call into ->kill_sb()
cifs: pull cifs_mount() call up
sanitize cifs_umount() prototype
cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()
cifs: allocate mountdata earlier
cifs: leak on mount if we share superblock
cifs: don't pass superblock to cifs_mount()
cifs: don't leak nls on mount failure
cifs: double free on mount failure
take bdi setup/destruction into cifs_mount/cifs_umount

Acked-by: Steve French
26 Jun, 2011
1 commit
25 Jun, 2011
17 commits
-
…l/git/tip/linux-2.6-tip
* 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtc: vt8500: Fix build error & cleanup rtc_class_ops->update_irq_enable()
alarmtimers: Return -ENOTSUPP if no RTC device is present
alarmtimers: Handle late rtc module loading
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: Remove unneeded version.h includes from sound/
ASoC: pxa-ssp: Correct check for stream presence
ASoC: imx: add missing module informations
ASoC: imx: Remove unused Kconfig SND_MXC_SOC_SSI entry
ALSA: HDA: Pinfix quirk for HP Z200 Workstation
ALSA: VIA HDA: Create a master amplifier control for VT1718S.
ALSA: VIA HDA: Mute/unmute mixer conncted to Headphone for VT1718S.
ALSA: VIA HDA: Modify initial verbs list for VT1718S.
ALSA: hda - Remove ALC268 model override for CPR2000
ALSA: HDA: Remove quirk for an HP device
ASoC: Remove unused and about to be broken SND_SOC_CUSTOM I/O bus
-
…/linux into timers/urgent
* rtc: vt8500: Fix build error & cleanup rtc_class_ops->update_irq_enable()
-
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
drm/i915: save/resume forcewake lock fixes
Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
drm/i915: Apply HWSTAM workaround for BSD ring on SandyBridge
drm/i915: Call intel_enable_plane from i9xx_crtc_mode_set (again)
-
... instead of just failing with -EINVAL
Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
if cifs_get_root() fails, we end up with ->mount() returning NULL,
which is not what callers expect. Moreover, in case of superblock
reuse we end up leaking a superblock reference...

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
have ->s_fs_info set by the set() callback passed to sget()
Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
all callers of cifs_umount() proceed to do the same thing; pull it into
cifs_umount() itself.

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
instead of calling it manually in case cifs_read_super() fails
to set ->s_root, just call it from ->kill_sb(). cifs_put_super()
is gone now *and* we have cifs_sb shutdown and destruction done
after the superblock is gone from ->s_instances.

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
... to the point prior to sget(). Now we have cifs_sb set up early
enough.

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
a) superblock argument is unused
b) it always returns 0

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
no need to wait until cifs_read_super() and we need it done
by the time cifs_mount() is called.

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
pull mountdata allocation up, so that it won't stand in the way when
we lift cifs_mount() to a location before sget().

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
cifs_sb and nls end up leaked...
Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
To close sget() races we'll need to be able to set cifs_sb up before
we get the superblock, so we'll want to be able to do cifs_mount()
earlier. Fortunately, it's easy to do - setting ->s_maxbytes can
be done in cifs_read_super(), ditto for ->s_time_gran and as for
putting MS_POSIXACL into ->s_flags, we can mirror it in ->mnt_cifs_flags
until cifs_read_super() is called. Kill unused 'devname' argument,
while we are at it...

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro
-
if cifs_sb allocation fails, we still need to drop nls we'd stashed
into volume_info - the one we would've copied to cifs_sb if we could
allocate the latter.

Acked-by: Pavel Shilovsky
Reviewed-by: Jeff Layton
Signed-off-by: Al Viro