Eric Lee / linux-smarc-t335x-v3.2

09 Oct, 2008

29 commits

a1ed5b0cf klist: don't iterate over deleted entries

A klist entry is kept on the list till all its current iterations are
finished; however, a new iteration after deletion also iterates over
deleted entries as long as their reference count stays above zero.
This causes problems for users who iterate over the list while
synchronized against list manipulations and naturally expect already
deleted entries not to show up during iteration.

This patch implements a dead flag which gets set on deletion so that
iteration can skip already deleted entries. The dead flag piggybacks
on the lowest bit of knode->n_klist and is only visible to the klist
implementation proper.
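The low-bit trick can be sketched in plain C. This is a minimal illustration, assuming the list head is at least 2-byte aligned so that bit 0 of its address is always zero; the types `klist_sketch` and `knode_sketch` and the helper names are hypothetical and chosen for readability, not copied from lib/klist.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct klist / struct klist_node. */
struct klist_sketch {
	int dummy; /* gives the struct at least 4-byte alignment */
};

struct knode_sketch {
	/* Pointer to the owning list; bit 0 doubles as the "dead" flag. */
	uintptr_t n_klist;
};

#define KNODE_DEAD_BIT ((uintptr_t)1)

static void knode_set_klist(struct knode_sketch *n, struct klist_sketch *k)
{
	/* The trick only works if bit 0 of the real pointer is free. */
	assert(((uintptr_t)k & KNODE_DEAD_BIT) == 0);
	n->n_klist = (uintptr_t)k;
}

static struct klist_sketch *knode_klist(const struct knode_sketch *n)
{
	/* Mask the flag off before handing the pointer out. */
	return (struct klist_sketch *)(n->n_klist & ~KNODE_DEAD_BIT);
}

static bool knode_dead(const struct knode_sketch *n)
{
	return (n->n_klist & KNODE_DEAD_BIT) != 0;
}

static void knode_kill(struct knode_sketch *n)
{
	/* Set on deletion; an iterator would test knode_dead() and skip. */
	n->n_klist |= KNODE_DEAD_BIT;
}
```

Because the flag lives inside a field the node already has, marking a node dead costs no extra memory, and every accessor outside these helpers sees only a clean pointer.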

While at it, drop klist_iter->i_head as it's redundant and offers
nothing semantics- or performance-wise, since klist_iter->i_klist
is dereferenced on every iteration anyway.

Signed-off-by: Tejun Heo
Cc: Greg Kroah-Hartman
Cc: Alan Stern
Cc: Jens Axboe
Signed-off-by: Jens Axboe

Tejun Heo
2008-10-09 14:56:04 +0800
710027a48 Add some block/ source files to the kernel-api docbook. Fix kernel-doc notation in them as needed.

Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Randy Dunlap
2008-10-09 14:56:03 +0800
5b99c2ffa block: make bi_phys_segments an unsigned int instead of short

raid5 can overflow with more than 255 stripes, and we can increase it
to an int for free on both 32 and 64-bit archs due to the padding.
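The "for free" argument rests on alignment padding. A minimal sketch, using hypothetical two-field structs rather than the real struct bio: when a 16-bit counter is followed by a pointer, the compiler pads the counter up to the pointer's alignment, so widening it to 32 bits reuses bytes that were already padding. That the sizes match is an assumption about common 32- and 64-bit ABIs, not a C language guarantee, which is why it is checked at compile time.

```c
#include <assert.h>

/* Hypothetical layout: a 16-bit counter followed by a pointer member. */
struct seg_short {
	unsigned short nr_segs; /* 2 bytes, then padding up to pointer alignment */
	void *next;
};

/* Same layout with the counter widened to 32 bits. */
struct seg_int {
	unsigned int nr_segs; /* 4 bytes, reusing space that used to be padding */
	void *next;
};

/* Compile-time check of the "for free" claim on the ABI being built for. */
static_assert(sizeof(struct seg_short) == sizeof(struct seg_int),
	      "widening the counter should not grow the struct");
```

On x86_64, for example, both structs would occupy 16 bytes (2 + 6 padding + 8 versus 4 + 4 padding + 8), while the counter's range grows from 65535 to over four billion.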

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:03 +0800
960e739d9 block: raid fixups for removal of bi_hw_segments

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:03 +0800
5df97b91b drop vmerge accounting

Remove hw_segments field from struct bio and struct request. Without virtual
merge accounting they have no purpose.

Signed-off-by: Mikulas Patocka
Signed-off-by: Jens Axboe

Mikulas Patocka
2008-10-09 14:56:03 +0800
b8b3e16cf block: drop virtual merging accounting

Remove virtual merge accounting.

Signed-off-by: Mikulas Patocka
Signed-off-by: Jens Axboe

Mikulas Patocka
2008-10-09 14:56:03 +0800
6a421c1dc block: update documentation for deadline fifo_batch tunable

Update the description of fifo_batch to match the current implementation,
and include a description of how to tune it.

Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe

Aaron Carroll
2008-10-09 14:56:03 +0800
4fb72f764 deadline-iosched: non-functional fixes

* convert goto to simpler while loop;
* use rq_end_sector() instead of computing manually;
* fix false comments;
* remove spurious whitespace;
* convert rq_rb_root macro to an inline function.

Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe

Aaron Carroll
2008-10-09 14:56:03 +0800
63de428b1 deadline-iosched: allow non-sequential batching

Deadline currently only batches sector-contiguous requests, so except
for a few circumstances (e.g. requests in a single direction), it is
essentially first come first served. This is bad for throughput, so
change it to CSCAN, which means requests in a batch do not need to be
sequential and are issued in increasing sector order.
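The CSCAN ordering described above can be sketched as follows. This is a minimal illustration assuming each request is represented only by its start sector in an ascending array (the deadline scheduler itself keeps requests in a sorted tree per direction); `cscan_order` is a hypothetical helper. Starting from the sector where the previous request ended, requests are issued in increasing sector order, and the scan wraps around to the lowest sector once it runs off the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * sorted[]: pending request start sectors, ascending.
 * head:     sector where the last dispatched request ended.
 * out[]:    receives the dispatch order (n entries).
 */
static void cscan_order(const uint64_t *sorted, size_t n, uint64_t head,
			uint64_t *out)
{
	size_t start = 0, i;

	for (i = 1; i < n; i++)
		assert(sorted[i - 1] <= sorted[i]); /* caller must pass sorted input */

	/* Find the first request at or beyond the current head position. */
	while (start < n && sorted[start] < head)
		start++;

	/* Sweep upwards from there, then wrap around to the lowest sector. */
	for (i = 0; i < n; i++)
		out[i] = sorted[(start + i) % n];
}
```

With pending sectors {10, 40, 70, 100} and the head at 50, the dispatch order is 70, 100, 10, 40: a batch no longer has to be sector-contiguous, only monotonically increasing until the wrap.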

Signed-off-by: Aaron Carroll
Signed-off-by: Jens Axboe

Aaron Carroll
2008-10-09 14:56:02 +0800
766ca4428 virtio_blk: use a wrapper function to access io context information of IO requests

struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.

That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that the information cannot be directly accessed in an IO
scheduler-independent way.

This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.
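The accessor pattern can be sketched in a few lines. This is a minimal illustration only: `struct request_sketch`, `struct cmd_header_sketch`, `request_sketch_ioprio()` and `driver_fill_header()` are hypothetical names, and the real helper's name and signature should be taken from include/linux/blkdev.h rather than from this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down request: only the field this sketch needs. */
struct request_sketch {
	unsigned short ioprio;
};

/* Hypothetical command header a driver fills in for its backend. */
struct cmd_header_sketch {
	unsigned int ioprio;
};

/*
 * Accessor: callers ask the request for its priority instead of reading
 * the field. If the information later moves (for example into private
 * data owned by an IO controller), only this function has to change.
 */
static inline unsigned short request_sketch_ioprio(const struct request_sketch *rq)
{
	assert(rq != NULL);
	return rq->ioprio;
}

/* Driver side: goes through the accessor, never touches rq->ioprio. */
static void driver_fill_header(struct cmd_header_sketch *hdr,
			       const struct request_sketch *rq)
{
	hdr->ioprio = request_sketch_ioprio(rq);
}
```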

This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.

Signed-off-by: Fernando Luis Vazquez Cao
Acked-by: Rusty Russell
Signed-off-by: Jens Axboe

Fernando Luis Vázquez Cao
2008-10-09 14:56:02 +0800
1a8e2bddd Kill REQ_TYPE_FLUSH

It was only used by ps3disk, and it should probably have been
REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:02 +0800
e17fc0a1c Allow elevators to sort/merge discard requests

But blkdev_issue_discard() still emits requests which are interpreted as
soft barriers, because naïve callers might otherwise issue subsequent
writes to those same sectors, which might cross on the queue (if they're
reallocated quickly enough).

Callers still _can_ issue non-barrier discard requests, but they have to
take care of queue ordering for themselves.

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:02 +0800
d30a2605b Add BLKDISCARD ioctl to allow userspace to discard sectors

We may well want mkfs tools to use this to mark the whole device as
unwanted before they format it, for example.

The ioctl takes a pair of uint64_ts, which are start offset and length
in _bytes_. Although at the moment it might make sense for them both to
be in 512-byte sectors, I don't want to limit the ABI to that.

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:02 +0800
2ebca85ab Use WRITE_BARRIER in blkdev_issue_flush(), not (1<<BIO_RW_BARRIER)

Barriers should be submitted with the WRITE flag set.

Signed-off-by: OGAWA Hirofumi
Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

OGAWA Hirofumi
2008-10-09 14:56:02 +0800
35ba8f708 blktrace: simplify flags handling in __blk_add_trace

Let the compiler see what's going on, and it can all get a lot simpler.
On PPC64 this reduces the size of the code calculating these bits by
about 60%. On x86_64 it's less of a win -- only 40%.
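One way to "let the compiler see what's going on" is to replace a test-and-set branch per flag with a shift that moves each bit from its position in one flag word to its position in another. When both positions are compile-time constants, each translation reduces to a mask and a shift. A minimal sketch with hypothetical bit positions (`RW_*_BIT` and `TC_*_BIT` are illustrative values, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions in the request flags (source word). */
#define RW_BARRIER_BIT 2
#define RW_SYNC_BIT    4

/* Hypothetical bit positions in the trace category mask (target word). */
#define TC_SYNC_BIT    19
#define TC_BARRIER_BIT 21

/* This sketch only handles bits that move upwards. */
static_assert(TC_SYNC_BIT >= RW_SYNC_BIT, "sync bit must move up");
static_assert(TC_BARRIER_BIT >= RW_BARRIER_BIT, "barrier bit must move up");

/* Move bit `from` of `word` to bit `to` with a mask and a single shift. */
#define MOVE_BIT_UP(word, from, to) (((word) & (1u << (from))) << ((to) - (from)))

/* Branchy version: one conditional per flag. */
static uint32_t trace_bits_branchy(uint32_t rw)
{
	uint32_t what = 0;

	if (rw & (1u << RW_SYNC_BIT))
		what |= 1u << TC_SYNC_BIT;
	if (rw & (1u << RW_BARRIER_BIT))
		what |= 1u << TC_BARRIER_BIT;
	return what;
}

/* Shift version: same result, straight-line code the compiler can fold. */
static uint32_t trace_bits_shifted(uint32_t rw)
{
	return MOVE_BIT_UP(rw, RW_SYNC_BIT, TC_SYNC_BIT) |
	       MOVE_BIT_UP(rw, RW_BARRIER_BIT, TC_BARRIER_BIT);
}
```

Both functions compute the same mapping; the second simply exposes it as arithmetic, which is the kind of transformation that lets a compiler emit smaller code.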

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
27b29e86b blktrace: support discard requests

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
fdc53971b Support 'discard sectors' operation.

We can benefit from knowing that the file system no longer cares about
the contents of certain sectors, by throwing them away immediately and
then never having to garbage collect them, and using the extra free
space to make our operations more efficient. Do so.

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
eae9acd13 Support 'discard sectors' operation in translation layer support core

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
8c540a96c Let the block device know when sectors can be discarded

[hirofumi@mail.parknet.co.jp: discard _after_ checking for corrupt chains]

Signed-off-by: David Woodhouse
Acked-by: OGAWA Hirofumi
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
fb2dce862 Add 'discard' request handling

Some block devices benefit from a hint that they can forget the contents
of certain sectors. Add basic support for this to the block core, along
with a 'blkdev_issue_discard()' helper function which issues such
requests.

The caller doesn't get to provide an end_io function, since
blkdev_issue_discard() will automatically split the request up into
multiple bios if appropriate. Nor does the function wait for
completion -- it's expected that callers won't care about when, or even
_if_, the request completes. It's only a hint to the device anyway. By
definition, the file system doesn't _care_ about these sectors any more.
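Splitting one large discard into several bios can be sketched as a loop that carves the sector range into chunks no larger than a per-queue limit. This is a minimal illustration assuming a hypothetical `max_sectors` limit, with the chunks recorded in an array standing in for bio submission; `split_discard()` and `struct discard_chunk` are hypothetical and this is not the body of blkdev_issue_discard().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One would-be discard bio. */
struct discard_chunk {
	uint64_t sector;   /* first sector covered */
	uint32_t nr_sects; /* number of sectors covered */
};

/*
 * Split [sector, sector + nr_sects) into chunks of at most max_sectors.
 * Up to cap chunks are stored in out[]; the return value is the total
 * number of chunks (bios) needed, even if that exceeds cap.
 * Fire-and-forget: nothing here waits for completion.
 */
static size_t split_discard(uint64_t sector, uint64_t nr_sects,
			    uint32_t max_sectors,
			    struct discard_chunk *out, size_t cap)
{
	size_t n = 0;

	assert(max_sectors > 0);
	while (nr_sects > 0) {
		uint32_t len = nr_sects > max_sectors ? max_sectors
						      : (uint32_t)nr_sects;

		if (n < cap) {
			out[n].sector = sector;
			out[n].nr_sects = len;
		}
		n++;
		sector += len;
		nr_sects -= len;
	}
	return n;
}
```

Because the split happens inside the helper, a single caller-supplied end_io would not map cleanly onto the several bios that actually get issued.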

[With feedback from OGAWA Hirofumi and Jens Axboe]
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
d628eaef3 Fix up comments about matching flags between bio and rq

Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe

David Woodhouse
2008-10-09 14:56:01 +0800
36144077b highmem: use bio_has_data() in the bounce path

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:01 +0800
051cc3952 block: use bio_has_data() in the IO completion path

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:00 +0800
a9c701e59 block: use bio_has_data() to check for data carrying bio

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:00 +0800
7a67f63b3 block: add bio_has_data() to detect whether a bio carries data or not

Signed-off-by: Jens Axboe

Jens Axboe
2008-10-09 14:56:00 +0800
35e396cd1 SG_IO block filter whitelist missing MMC SET READ AHEAD command

I have another request for the block filter SG_IO command whitelist,
specifically the MMC streaming command set SET READ AHEAD command.
The command applies only to MMC CDROM/DVDROM drives with the streaming
optional feature set. The command is useful to cdparanoia in that it
allows explicit cache control side effects that are, on many drives,
cdparanoia's most efficient way to flush/disable the media cache on
cdrom drives. I am aware of no reason why it should not be accessible
from userspace.

Also note that the command is already fully accessible through the
SCSI-native version of the SG_IO ioctl as well as the traditional SG
interface. The command is only being refused on block devices. That
means that on a typical stock distro, the command is available through
/dev/sg* but not /dev/scd* although both are typically available and
accessible. Filtering the command is not providing any protection,
only a confusing inconsistency.

Signed-off-by: Jens Axboe

xiphmont@xiph.org
2008-10-09 14:56:00 +0800
69849375d Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Sibyte: Register PIO PATA device only for Swarm and Litte Sur

Linus Torvalds
2008-10-09 02:41:10 +0800
392eaef2e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.
net: Fix netdev_run_todo dead-lock
tcp: Fix possible double-ack w/ user dma
net: only invoke dev->change_rx_flags when device is UP
netrom: Fix sock_orphan() use in nr_release
ax25: Quick fix for making sure unaccepted sockets get destroyed.
Revert "ax25: Fix std timer socket destroy handling."
[Bluetooth] Add reset quirk for A-Link BlueUSB21 dongle
[Bluetooth] Add reset quirk for new Targus and Belkin dongles
[Bluetooth] Fix double frees on error paths of btusb and bpa10x drivers

Linus Torvalds
2008-10-09 02:40:19 +0800
880604887 [MIPS] Sibyte: Register PIO PATA device only for Swarm and Litte Sur

Symbol name spaghetti, which is too complicated to clean up at this
stage of the release cycle, breaks the build on BCM1480 platforms.

Signed-off-by: Ralf Baechle

Ralf Baechle
2008-10-09 02:19:28 +0800

08 Oct, 2008

6 commits

9d2c27e17 tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.

Because of rounding, in certain conditions, i.e. when in congestion
avoidance state rho is smaller than 1/128 of the current cwnd, TCP
Hybla congestion control starves and the cwnd is kept constant
forever.

This patch forces an increment by one segment after #send_cwnd calls
without an increment (NewReno behavior).
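The starvation and the fallback can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the tcp_hybla code: `struct cwnd_model` and `on_ack()` are hypothetical, growth per ACK is computed in fixed point with 7 fractional bits (units of 1/128 segment), and integer division truncates small increments to zero. The fallback counts ACKs since the last whole-segment increase and adds one segment after cwnd of them, as NewReno would.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified congestion-avoidance state; not the kernel's struct. */
struct cwnd_model {
	uint32_t cwnd;  /* congestion window, in segments (must be >= 1) */
	uint32_t cents; /* accumulated fractional growth, in 1/128 segment */
	uint32_t cnt;   /* ACKs since the last whole-segment increase */
};

/*
 * One ACK in congestion avoidance. rho2_7ls is rho^2 scaled by 128, so the
 * ideal per-ACK growth rho^2 / cwnd, in 1/128 units, is rho2_7ls / cwnd.
 */
static void on_ack(struct cwnd_model *m, uint32_t rho2_7ls, int newreno_fallback)
{
	uint32_t incr = rho2_7ls / m->cwnd; /* truncates to 0 for large cwnd */

	assert(m->cwnd > 0);
	m->cents += incr;
	m->cnt++;
	while (m->cents >= 128) { /* whole segments earned */
		m->cwnd++;
		m->cents -= 128;
		m->cnt = 0;
	}
	/* Fallback: after cwnd ACKs without growth, add one segment. */
	if (newreno_fallback && m->cnt >= m->cwnd) {
		m->cwnd++;
		m->cnt = 0;
	}
}
```

With rho2_7ls = 128 (rho = 1) and cwnd = 200, every increment truncates to zero, so without the fallback the window stays at 200 forever; with it, the window reaches 201 after 200 ACKs.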

Signed-off-by: Daniele Lacamera
Signed-off-by: David S. Miller

Daniele Lacamera
2008-10-08 06:58:17 +0800
58ec3b4db net: Fix netdev_run_todo dead-lock

Benjamin Thery tracked down a bug that explains many instances
of the error

unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly. We were trying to create
parallelism where none was required. As netdev_run_todo always
follows an RTNL section, and todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.

This is exactly what the following patch does.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-10-08 06:50:03 +0800
742201e7b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6

David S. Miller
2008-10-08 06:32:20 +0800
53240c208 tcp: Fix possible double-ack w/ user dma

From: Ali Saidi

When TCP receive copy offload is enabled it's possible that
tcp_rcv_established() will cause two acks to be sent for a single
packet. If tcp_dma_early_copy() succeeds, copied_early is set to
true, which causes tcp_cleanup_rbuf() to be called early; that call
can send an ACK. Further along in tcp_rcv_established(),
__tcp_ack_snd_check() is called and will schedule a delayed ACK. If
no packets are processed before the delayed ACK timer expires, the
packet will be ACKed twice.

Signed-off-by: David S. Miller

Ali Saidi
2008-10-08 06:31:19 +0800
b6c40d68f net: only invoke dev->change_rx_flags when device is UP

Jesper Dangaard Brouer reported a bug when taking down a VLAN
device that is in promiscuous mode:

When the VLAN device is set down, the promiscuous count on the real
device is decremented by one by vlan_dev_stop(). When the
promiscuous flag is removed from the VLAN device afterwards, the
promiscuous count on the real device is decremented a second time by
the vlan_change_rx_flags() callback.

The root cause is that the ->change_rx_flags() callback is
invoked while the device is down. The synchronization is meant to mirror
the behaviour of the ->set_rx_mode callbacks: the ->open() function
is responsible for doing a full sync on open, the ->stop() function is
responsible for doing a full cleanup, and ->change_rx_flags()
is meant to do incremental changes while the device is UP.

Only invoke ->change_rx_flags() while the device is UP to provide the
intended behaviour.
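The double decrement and the fix can be reproduced with a small model. This is a minimal sketch using hypothetical, cut-down devices (`real_dev_sketch`, `vlan_dev_sketch`) and a hypothetical `set_promisc_flag()` helper rather than struct net_device and the real core functions; the `only_when_up` parameter toggles between the old behaviour and the fixed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down devices; not struct net_device. */
struct real_dev_sketch {
	int promisc; /* promiscuity reference count on the real device */
};

struct vlan_dev_sketch {
	bool up;
	bool promisc_flag; /* IFF_PROMISC as set on the VLAN device */
	struct real_dev_sketch *real;
};

/* ->open(): full sync, take a reference if the flag is already set. */
static void vlan_open(struct vlan_dev_sketch *v)
{
	if (v->promisc_flag)
		v->real->promisc++;
	v->up = true;
}

/* ->stop(): full cleanup, drop what the VLAN holds on the real device. */
static void vlan_stop(struct vlan_dev_sketch *v)
{
	if (v->promisc_flag)
		v->real->promisc--;
	v->up = false;
}

/* ->change_rx_flags(): incremental update, only valid while UP. */
static void vlan_change_rx_flags(struct vlan_dev_sketch *v, bool promisc_on)
{
	assert(v->real != NULL);
	v->real->promisc += promisc_on ? 1 : -1;
}

/* Core helper toggling the flag; only_when_up models the fix. */
static void set_promisc_flag(struct vlan_dev_sketch *v, bool on, bool only_when_up)
{
	if (v->promisc_flag == on)
		return;
	v->promisc_flag = on;
	if (!only_when_up || v->up)
		vlan_change_rx_flags(v, on);
}
```

Running open, set promisc, stop, clear promisc leaves the real device's count at 0 with the fix and at -1 without it, which is exactly the second decrement described in the report.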

Tested-by: Jesper Dangaard Brouer

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-10-08 06:26:48 +0800
85ba94ba0 SLOB: fix bogus ksize calculation

SLOB's ksize calculation was braindamaged and generally harmlessly
underreported the allocation size. But for very small buffers, it could
in fact overreport the size, leading code that depends on krealloc to
overrun the allocation and trample other data.

Signed-off-by: Matt Mackall
Tested-by: Peter Zijlstra
Signed-off-by: Linus Torvalds

Matt Mackall
2008-10-08 02:19:23 +0800

07 Oct, 2008

5 commits

e09e6e2b6 Revert "V4L/DVB (8904): cx88: add missing unlock_kernel"

This reverts commit 135aedc38e812b922aa56096f36a3d72ffbcf2fb, as
requested by Hans Verkuil.

It was a patch for 2.6.28 where the BKL was pushed down from v4l core to
the drivers, not for 2.6.27!

Requested-by: Hans Verkuil
Cc: Mauro Carvalho Chehab
Signed-off-by: Linus Torvalds

Linus Torvalds
2008-10-07 22:54:34 +0800
4330ed8ed Linux 2.6.27-rc9

Linus Torvalds
2008-10-07 07:39:58 +0800
87f3b6b6f Marker depmod fix core kernel list

* Theodore Ts'o (tytso@mit.edu) wrote:
>
> I've been playing with adding some markers into ext4 to see if they
> could be useful in solving some problems along with Systemtap. It
> appears, though, that as of 2.6.27-rc8, markers defined in code which is
> compiled directly into the kernel (i.e., not as modules) don't show up
> in Module.markers:
>
> kvm_trace_entryexit arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-intel %u %p %u %u %u %u %u %u
> kvm_trace_entryexit arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
> kvm_trace_handler arch/x86/kvm/kvm-amd %u %p %u %u %u %u %u %u
>
> (Note the lack of any of the kernel_sched_* markers, and the markers I
> added for ext4_* and jbd2_* are missing as well.)
>
> Systemtap apparently depends on in-kernel trace_mark being recorded in
> Module.markers, and apparently it's been claimed that it used to be
> there. Is this a bug in systemtap, or in how Module.markers is getting
> built? And is there a file that contains the equivalent information
> for markers located in non-modules code?

I think the problem comes from "markers: fix duplicate modpost entry"
(commit d35cb360c29956510b2fe1a953bd4968536f7216)

Especially:

- add_marker(mod, marker, fmt);
+ if (!mod->skip)
+ add_marker(mod, marker, fmt);
}
return;
fail:

Here is a fix that should take care of this problem.

Thanks for the bug report!

Signed-off-by: Mathieu Desnoyers
Tested-by: "Theodore Ts'o"
CC: Greg KH
CC: David Smith
CC: Roland McGrath
CC: Sam Ravnborg
CC: Wenji Huang
CC: Takashi Nishiie
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2008-10-07 07:34:19 +0800
afed26d15 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: call touch_softlockup_watchdog on resume
kgdb, x86: Avoid invoking kgdb_nmicallback twice per NMI

Linus Torvalds
2008-10-07 05:30:02 +0800
6106611e1 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: gart iommu have direct mapping when agp is present too

Linus Torvalds
2008-10-07 05:29:16 +0800