09 Oct, 2008
29 commits
-
A klist entry is kept on the list till all its current iterations are
finished; however, a new iteration after deletion also iterates over
deleted entries as long as their reference count stays above zero.
This causes problems for cases where there are users which iterate
over the list while synchronized against list manipulations and
naturally expect already deleted entries to not show up during
iteration.

This patch implements a dead flag which gets set on deletion so that
iteration can skip already deleted entries. The dead flag piggybacks
on the lowest bit of knode->n_klist and is only visible to the klist
implementation proper.

While at it, drop klist_iter->i_head as it's redundant and doesn't
offer anything semantically or performance-wise, as klist_iter->i_klist
is dereferenced on every iteration anyway.
Signed-off-by: Tejun Heo
Cc: Greg Kroah-Hartman
Cc: Alan Stern
Cc: Jens Axboe
Signed-off-by: Jens Axboe -
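For the klist change above, the idea of piggybacking a dead flag on the lowest bit of a pointer can be sketched as follows. This is only an illustration under assumptions: the struct layouts and the helper names (knode_klist, knode_dead, knode_kill, live_count) are hypothetical stand-ins, not the kernel's actual definitions. It relies on the pointed-to object being at least 2-byte aligned, so bit 0 of its address is always zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct klist {
	int nodes;		/* int member gives at least 2-byte alignment */
};

struct knode {
	void *n_klist;		/* owning list; bit 0 doubles as the dead flag */
};

#define KNODE_DEAD ((uintptr_t)1)

/* Recover the real list pointer by masking off the flag bit. */
static struct klist *knode_klist(const struct knode *n)
{
	return (struct klist *)((uintptr_t)n->n_klist & ~KNODE_DEAD);
}

/* Non-zero once the node has been marked as deleted. */
static int knode_dead(const struct knode *n)
{
	return (int)((uintptr_t)n->n_klist & KNODE_DEAD);
}

static void knode_set_klist(struct knode *n, struct klist *k)
{
	n->n_klist = k;
}

/* Mark the node dead without losing the list pointer. */
static void knode_kill(struct knode *n)
{
	n->n_klist = (void *)((uintptr_t)n->n_klist | KNODE_DEAD);
}

/* An iterator would skip dead nodes; here we just count the live ones. */
static size_t live_count(const struct knode *nodes, size_t n)
{
	size_t live = 0;

	for (size_t i = 0; i < n; i++)
		if (!knode_dead(&nodes[i]))
			live++;
	return live;
}
```

Because the flag lives inside an existing field, the node does not grow, and only code that goes through the accessor helpers ever sees the tagged pointer.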
…in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com> -
raid5 can overflow with more than 255 stripes, and we can increase it
to an int for free on both 32 and 64-bit archs due to the padding.
Signed-off-by: Jens Axboe
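The "for free due to the padding" remark in the raid5 commit above can be illustrated with a generic layout. The structs below are hypothetical and are not the kernel structure that was changed; the narrow field is shown as an unsigned short purely as an assumption. On common 32-bit and 64-bit ABIs the compiler pads the struct out to pointer alignment, so widening the trailing counter to an int does not change the struct size.

```c
#include <stddef.h>

/* Hypothetical layout: a pointer followed by a narrow counter. */
struct item_narrow {
	void *next;
	unsigned short count;	/* followed by padding up to pointer alignment */
};

/* Same layout with the counter widened to an int. */
struct item_wide {
	void *next;
	unsigned int count;	/* occupies bytes that were padding before */
};

/* Returns 1 when widening the counter does not grow the struct. */
static int widening_is_free(void)
{
	return sizeof(struct item_narrow) == sizeof(struct item_wide);
}
```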
-
Signed-off-by: Jens Axboe
-
Remove hw_segments field from struct bio and struct request. Without virtual
merge accounting they have no purpose.
Signed-off-by: Mikulas Patocka
Signed-off-by: Jens Axboe -
Remove virtual merge accounting.
Signed-off-by: Mikulas Patocka
Signed-off-by: Jens Axboe -
Update the description of fifo_batch to match the current implementation,
and include a description of how to tune it.
Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe -
* convert goto to simpler while loop;
* use rq_end_sector() instead of computing manually;
* fix false comments;
* remove spurious whitespace;
* convert rq_rb_root macro to an inline function.
Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe -
Deadline currently only batches sector-contiguous requests, so except
for a few circumstances (e.g. requests in a single direction), it is
essentially first come first served. This is bad for throughput, so
change it to CSCAN, which means requests in a batch do not need to be
sequential and are issued in increasing sector order.
Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe -
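As a rough illustration of the CSCAN ordering described in the deadline commit above: requests are served in increasing sector order from the current position, wrapping around to the lowest sector when nothing remains ahead. This is a generic sketch of CSCAN, not the deadline scheduler's code; cscan_next and its array-based interface are hypothetical.

```c
#include <stddef.h>

/*
 * Pick the next request in CSCAN order.
 * sectors[] must be sorted in increasing order; head is the current position.
 * Returns the index of the chosen request, or -1 if there are none.
 */
static long cscan_next(const unsigned long long *sectors, size_t n,
		       unsigned long long head)
{
	if (n == 0)
		return -1;

	/* First request at or beyond the current position. */
	for (size_t i = 0; i < n; i++)
		if (sectors[i] >= head)
			return (long)i;

	/* Nothing ahead: wrap around to the lowest sector. */
	return 0;
}
```

Requests in a batch chosen this way need not be contiguous; they only need to lie ahead of the current position.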
struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.

That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that the information cannot be directly accessed in an IO
scheduler-independent way.

This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.

This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.
Signed-off-by: Fernando Luis Vazquez Cao
Acked-by: Rusty Russell
Signed-off-by: Jens Axboe -
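The accessor approach from the ioprio commit above might look roughly like the sketch below. The struct and function names are hypothetical stand-ins, not the names used in the patch; the point is only that callers go through one function instead of reading the field directly, so the storage can change later without touching them.

```c
/* Hypothetical, cut-down stand-in for struct request. */
struct request_sketch {
	unsigned short ioprio;	/* may move or be derived differently later */
};

/* Single access point for a request's IO priority. */
static inline unsigned short req_ioprio_sketch(const struct request_sketch *rq)
{
	return rq->ioprio;
}
```

A driver would then call req_ioprio_sketch(rq) rather than reading rq->ioprio itself.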
It was only used by ps3disk, and it should probably have been
REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
But blkdev_issue_discard() still emits requests which are interpreted as
soft barriers, because naïve callers might otherwise issue subsequent
writes to those same sectors, which might cross on the queue (if they're
reallocated quickly enough).

Callers still _can_ issue non-barrier discard requests, but they have to
take care of queue ordering for themselves.
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
We may well want mkfs tools to use this to mark the whole device as
unwanted before they format it, for example.

The ioctl takes a pair of uint64_ts, which are start offset and length
in _bytes_. Although at the moment it might make sense for them both to
be in 512-byte sectors, I don't want to limit the ABI to that.
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
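From userspace, the discard ioctl described above (start offset and length, both in bytes, passed as a pair of uint64_t) could be used roughly as follows. This sketch assumes the BLKDISCARD definition from <linux/fs.h>; the wrapper name discard_bytes is hypothetical. Discarding destroys data, so it should only ever be pointed at a device whose contents are expendable.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD */

/*
 * Ask the block device behind fd to discard len bytes starting at start.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int discard_bytes(int fd, uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };	/* { offset, length } in bytes */

	return ioctl(fd, BLKDISCARD, range);
}
```

A mkfs-style tool would open the device for writing and call discard_bytes(fd, 0, device_size) before laying out the new file system.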
Barriers should be submitted with the WRITE flag set.
Signed-off-by: OGAWA Hirofumi
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
Let the compiler see what's going on, and it can all get a lot simpler.
On PPC64 this reduces the size of the code calculating these bits by
about 60%. On x86_64 it's less of a win -- only 40%.
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
We can benefit from knowing that the file system no longer cares about
the contents of certain sectors, by throwing them away immediately and
then never having to garbage collect them, and using the extra free
space to make our operations more efficient. Do so.
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
[hirofumi@mail.parknet.co.jp: discard _after_ checking for corrupt chains]
Signed-off-by: David Woodhouse
Acked-by: OGAWA Hirofumi
Signed-off-by: Jens Axboe -
Some block devices benefit from a hint that they can forget the contents
of certain sectors. Add basic support for this to the block core, along
with a 'blkdev_issue_discard()' helper function which issues such
requests.

The caller doesn't get to provide an end_io function, since
blkdev_issue_discard() will automatically split the request up into
multiple bios if appropriate. Neither does the function wait for
completion -- it's expected that callers won't care about when, or even
_if_, the request completes. It's only a hint to the device anyway. By
definition, the file system doesn't _care_ about these sectors any more.

[With feedback from OGAWA Hirofumi and
Jens Axboe]
Signed-off-by: Jens Axboe -
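The splitting behaviour mentioned in the discard commit above (one logical range turned into several smaller requests) can be sketched generically. This is not the blkdev_issue_discard() implementation; split_discard, the chunk_fn callback, and the max_sects limit are hypothetical names used only to show the loop shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Called once per chunk; stands in for submitting one bio. */
typedef void (*chunk_fn)(uint64_t sector, uint64_t nr_sects, void *ctx);

/*
 * Split [sector, sector + nr_sects) into chunks of at most max_sects.
 * Returns the number of chunks emitted.
 */
static unsigned int split_discard(uint64_t sector, uint64_t nr_sects,
				  uint64_t max_sects, chunk_fn emit, void *ctx)
{
	unsigned int chunks = 0;

	if (max_sects == 0)
		return 0;

	while (nr_sects > 0) {
		uint64_t len = nr_sects < max_sects ? nr_sects : max_sects;

		if (emit)
			emit(sector, len, ctx);	/* fire and forget */
		sector += len;
		nr_sects -= len;
		chunks++;
	}
	return chunks;
}
```

Since the caller never sees the individual chunks, there is no single completion to hand back, which matches the commit's point that no end_io callback is offered.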
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe -
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
Signed-off-by: Jens Axboe
-
I have another request for the block filter SG_IO command whitelist,
specifically the MMC streaming command set SET READ AHEAD command.
The command applies only to MMC CDROM/DVDROM drives with the streaming
optional feature set. The command is useful to cdparanoia in that it
allows explicit cache control side effects that are, on many drives,
cdparanoia's most efficient way to flush/disable the media cache on
cdrom drives. I am aware of no reason why it should not be accessible
from userspace.

Also note that the command is already fully accessible through the
SCSI-native version of the SG_IO ioctl as well as the traditional SG
interface. The command is only being refused on block devices. That
means that on a typical stock distro, the command is available through
/dev/sg* but not /dev/scd* although both are typically available and
accessible. Filtering the command is not providing any protection,
only a confusing inconsistency.
Signed-off-by: Jens Axboe
-
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Sibyte: Register PIO PATA device only for Swarm and Litte Sur -
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.
net: Fix netdev_run_todo dead-lock
tcp: Fix possible double-ack w/ user dma
net: only invoke dev->change_rx_flags when device is UP
netrom: Fix sock_orphan() use in nr_release
ax25: Quick fix for making sure unaccepted sockets get destroyed.
Revert "ax25: Fix std timer socket destroy handling."
[Bluetooth] Add reset quirk for A-Link BlueUSB21 dongle
[Bluetooth] Add reset quirk for new Targus and Belkin dongles
[Bluetooth] Fix double frees on error paths of btusb and bpa10x drivers -
Symbol name spaghetti which is too complicated to clean up at this stage
of the release cycle breaks the build on BCM1480 platforms.
Signed-off-by: Ralf Baechle
08 Oct, 2008
6 commits
-
Because of rounding, in certain conditions, i.e. when in congestion
avoidance state rho is smaller than 1/128 of the current cwnd, TCP
Hybla congestion control starves and the cwnd is kept constant
forever.

This patch forces an increment by one segment after #send_cwnd calls
without increments (newreno behavior).
Signed-off-by: Daniele Lacamera
Signed-off-by: David S. Miller -
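The fallback described in the TCP Hybla commit above can be sketched in isolation: when the computed increment rounds down to zero, count the calls, and once that count reaches the window size, grow the window by one segment. The struct and function names are hypothetical, and resetting the counter whenever a non-zero increment is applied is an assumption of this sketch rather than something stated in the commit message.

```c
/* Hypothetical, minimal congestion state. */
struct cc_state {
	unsigned int cwnd;	/* congestion window, in segments */
	unsigned int cwnd_cnt;	/* calls since the last window growth */
};

/*
 * Apply one congestion-avoidance step.
 * increment is the (possibly rounded-to-zero) growth computed by the caller.
 */
static void cong_avoid_step(struct cc_state *s, unsigned int increment)
{
	if (increment > 0) {
		s->cwnd += increment;
		s->cwnd_cnt = 0;	/* assumption: restart the count */
		return;
	}

	/* Rounded to zero: fall back to one segment per cwnd calls. */
	if (++s->cwnd_cnt >= s->cwnd) {
		s->cwnd++;
		s->cwnd_cnt = 0;
	}
}
```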
Benjamin Thery tracked down a bug that explains many instances
of the error

	unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly. We were trying to create
parallelism where none was required. As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.
This is exactly what the following patch does.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller -
From: Ali Saidi
When TCP receive copy offload is enabled it's possible that
tcp_rcv_established() will cause two acks to be sent for a single
packet. In the case that a tcp_dma_early_copy() is successful,
copied_early is set to true which causes tcp_cleanup_rbuf() to be
called early which can send an ack. Further along in
tcp_rcv_established(), __tcp_ack_snd_check() is called and will
schedule a delayed ACK. If no packets are processed before the delayed
ack timer expires the packet will be acked twice.
Signed-off-by: David S. Miller
-
Jesper Dangaard Brouer reported a bug when setting a VLAN
device down that is in promiscuous mode:

When the VLAN device is set down, the promiscuous count on the real
device is decremented by one by vlan_dev_stop(). When removing the
promiscuous flag from the VLAN device afterwards, the promiscuous
count on the real device is decremented a second time by the
vlan_change_rx_flags() callback.

The root cause for this is that the ->change_rx_flags() callback is
invoked while the device is down. The synchronization is meant to mirror
the behaviour of the ->set_rx_mode callbacks, meaning the ->open function
is responsible for doing a full sync on open, the ->close() function is
responsible for doing full cleanup on ->stop() and ->change_rx_flags()
is meant to do incremental changes while the device is UP.

Only invoke ->change_rx_flags() while the device is UP to provide the
intended behaviour.
Tested-by: Jesper Dangaard Brouer
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
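The guard described in the VLAN commit above amounts to checking the device state before calling the callback. The sketch below uses hypothetical types and names (netdev_sketch, SKETCH_IFF_UP, count_rx_change) rather than the kernel's struct net_device, and only shows the shape of the check.

```c
#define SKETCH_IFF_UP 0x1	/* hypothetical stand-in for the UP flag */

struct netdev_sketch {
	unsigned int flags;
	int rx_changes;		/* how often the callback actually ran */
	void (*change_rx_flags)(struct netdev_sketch *dev, int change);
};

/* Example callback: just records that an incremental change happened. */
static void count_rx_change(struct netdev_sketch *dev, int change)
{
	(void)change;
	dev->rx_changes++;
}

/* Only forward incremental rx flag changes while the device is up. */
static void dev_change_rx_flags_sketch(struct netdev_sketch *dev, int change)
{
	if ((dev->flags & SKETCH_IFF_UP) && dev->change_rx_flags)
		dev->change_rx_flags(dev, change);
}
```

With the device down the callback is skipped, so the full cleanup done on stop is not followed by a second, incremental decrement.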
SLOB's ksize calculation was braindamaged and generally harmlessly
underreported the allocation size. But for very small buffers, it could
in fact overreport them, leading code depending on krealloc to overrun
the allocation and trample other data.
Signed-off-by: Matt Mackall
Tested-by: Peter Zijlstra
Signed-off-by: Linus Torvalds
07 Oct, 2008
5 commits
-
This reverts commit 135aedc38e812b922aa56096f36a3d72ffbcf2fb, as
requested by Hans Verkuil.

It was a patch for 2.6.28 where the BKL was pushed down from v4l core to
the drivers, not for 2.6.27!
Requested-by: Hans Verkuil
Cc: Mauro Carvalho Chehab
Signed-off-by: Linus Torvalds -
* Theodore Ts'o (tytso@mit.edu) wrote:
>
> I've been playing with adding some markers into ext4 to see if they
> could be useful in solving some problems along with Systemtap. It
> appears, though, that as of 2.6.27-rc8, markers defined in code which is
> compiled directly into the kernel (i.e., not as modules) don't show up
> in Module.markers:
>
> kvm_trace_entryexit arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_entryexit arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
>
> (Note the lack of any of the kernel_sched_* markers, and the markers I
> added for ext4_* and jbd2_* are missing as well.)
>
> Systemtap apparently depends on in-kernel trace_mark being recorded in
> Module.markers, and apparently it's been claimed that it used to be
> there. Is this a bug in systemtap, or in how Module.markers is getting
> built? And is there a file that contains the equivalent information
> for markers located in non-modules code?

I think the problem comes from "markers: fix duplicate modpost entry"
(commit d35cb360c29956510b2fe1a953bd4968536f7216)

Especially :

-	add_marker(mod, marker, fmt);
+	if (!mod->skip)
+		add_marker(mod, marker, fmt);
 }
 return;
 fail:

Here is a fix that should take care of this problem.
Thanks for the bug report!
Signed-off-by: Mathieu Desnoyers
Tested-by: "Theodore Ts'o"
CC: Greg KH
CC: David Smith
CC: Roland McGrath
CC: Sam Ravnborg
CC: Wenji Huang
CC: Takashi Nishiie
Signed-off-by: Linus Torvalds -
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: call touch_softlockup_watchdog on resume
kgdb, x86: Avoid invoking kgdb_nmicallback twice per NMI -
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: gart iommu have direct mapping when agp is present too