Eric Lee / smarc-fsl-linux-kernel

18 May, 2011

1 commit

a934a00a6 block: Fix discard topology stacking and reporting

In some cases we would end up stacking discard_zeroes_data incorrectly.
Fix this by enabling the feature by default for stacking drivers and
clearing it for low-level drivers. Incorporating a device that does not
support dzd will then cause the feature to be disabled in the stacking
driver.

Also ensure that the maximum discard value does not overflow when
exported in sysfs and return 0 in the alignment and dzd fields for
devices that don't support discard.
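
The stacking rule and the two reporting fixes described above can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel's actual limits-stacking code: the `demo_limits` struct and every `demo_*` function are hypothetical names, and a 512-byte sector (hence the shift by 9) is assumed.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's per-queue limits. */
struct demo_limits {
    unsigned int max_discard_sectors; /* 0 means discard is not supported */
    bool discard_zeroes_data;         /* "dzd": discarded blocks read back as zeroes */
};

/* Stacking drivers start with dzd enabled; components can only clear it. */
struct demo_limits demo_stacking_defaults(void)
{
    struct demo_limits l = { .max_discard_sectors = 0, .discard_zeroes_data = true };
    return l;
}

/* Low-level drivers start with dzd cleared and have to opt in explicitly. */
struct demo_limits demo_lowlevel_defaults(void)
{
    struct demo_limits l = { .max_discard_sectors = 0, .discard_zeroes_data = false };
    return l;
}

/* Fold one component device into the stacked device's limits. */
void demo_stack_limits(struct demo_limits *top, const struct demo_limits *bottom)
{
    top->discard_zeroes_data = top->discard_zeroes_data && bottom->discard_zeroes_data;
}

/* Widen before shifting so the sectors-to-bytes conversion cannot overflow
 * 32 bits (assumes 512-byte sectors). */
unsigned long long demo_max_discard_bytes(const struct demo_limits *l)
{
    return (unsigned long long)l->max_discard_sectors << 9;
}

/* Report 0 for dzd when the device does not support discard at all. */
int demo_report_dzd(const struct demo_limits *l)
{
    return l->max_discard_sectors ? (int)l->discard_zeroes_data : 0;
}
```

Starting the stacked device at "enabled" makes the logical AND behave as intended: a single component without dzd is enough to disable the feature for the whole stack.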

Reported-by: Lukas Czerner
Signed-off-by: Martin K. Petersen
Acked-by: Mike Snitzer
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Martin K. Petersen
2011-05-18 16:37:35 +0800

09 May, 2011

1 commit

bbdd304cf fs: fixup warning part_discard_alignment_show()

Stephen reports:

-----

After merging the block tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

fs/partitions/check.c: In function 'part_discard_alignment_show':
fs/partitions/check.c:263: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long long unsigned int'

Introduced by commit ("block: Remove extra discard_alignment from
hd_struct")

-----

Fix it up by just removing the cast, we return an int already.

Reported-by: Stephen Rothwell
Signed-off-by: Jens Axboe

Jens Axboe
2011-05-09 14:28:13 +0800

07 May, 2011

7 commits

23ceb5b77 block: Remove extra discard_alignment from hd_struct.

Currently, hd_struct.discard_alignment is only used when we
show /sys/block/sdx/sdx/discard_alignment. So remove it and
calculate when it is asked to show.

Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe

Tao Ma
2011-05-07 09:30:02 +0800

8af1954d1 blkdev: Do not return -EOPNOTSUPP if discard is supported

Currently we return -EOPNOTSUPP in blkdev_issue_discard() if any of the
bios fails because the underlying device does not support discard
requests. However, if the device is, for example, a dm device composed of
devices of which some support discard and some do not, it is OK for some
bios to fail with EOPNOTSUPP; that does not mean that discard is not
supported at all.

This commit removes the check for bios that failed with EOPNOTSUPP and
changes blkdev_issue_discard() to return "operation not supported" if and
only if the device does not actually support it, not just part of the
device as some bios might indicate.

This change also fixes a problem with the BLKDISCARD ioctl(), which now
works correctly on such dm devices.
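
The return-value rule described above can be sketched as a small user-space model. It is not the kernel's blkdev_issue_discard(): `demo_issue_discard()` and its parameters are hypothetical, and mapping any other per-bio failure to -EIO is an assumption made for the illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: bio_status[i] holds the completion status of the
 * i-th discard bio (0 on success, negative errno on failure).
 */
int demo_issue_discard(bool queue_supports_discard,
                       const int *bio_status, size_t nr_bios)
{
    int ret = 0;

    /* "Not supported" is decided once, for the device as a whole. */
    if (!queue_supports_discard)
        return -EOPNOTSUPP;

    for (size_t i = 0; i < nr_bios; i++) {
        /* A component that cannot discard is not a failure of the call. */
        if (bio_status[i] == 0 || bio_status[i] == -EOPNOTSUPP)
            continue;
        ret = -EIO; /* assumed mapping for any other error */
    }
    return ret;
}
```

With a mixed dm device, some entries in `bio_status` would be -EOPNOTSUPP, yet the call as a whole still reports success.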

Signed-off-by: Lukas Czerner
CC: Jens Axboe
CC: Jeff Moyer
Signed-off-by: Jens Axboe

Lukas Czerner
2011-05-07 09:30:01 +0800

5baebe5c8 blkdev: Simple cleanup in blkdev_issue_zeroout()

In blkdev_issue_zeroout() we are submitting regular WRITE bios, so we do
not need to check for -EOPNOTSUPP specifically in case of error. There is
also no need for the submit: label, because there is no way to leave the
while loop without an error, and in that case we really want to exit
rather than try again. Also remove the check for (sz == 0), since at that
point sz can never be zero.

Signed-off-by: Lukas Czerner
Reviewed-by: Jeff Moyer
CC: Dmitry Monakhov
CC: Jens Axboe
Signed-off-by: Jens Axboe

Lukas Czerner
2011-05-07 09:26:28 +0800

5dba3089e blkdev: Submit discard bio in batches in blkdev_issue_discard()

Currently we are waiting for every submitted REQ_DISCARD bio separately,
but it can have unwanted consequences of repeatedly flushing the queue,
so we rather submit bios in batches and wait for the entire batch, hence
narrowing the window of other ios going in.

Use bio_batch_end_io() and struct bio_batch for that purpose, the same
is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
always set !BIO_UPTODATE in the case of error and remove the check for
bb, since we are the only user of this function and we always set this.

Remove bio_get()/bio_put() from blkdev_issue_discard(), since bio_alloc()
and bio_batch_end_io() already do the same thing, so the extra reference
is not needed anymore.
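
The batching idea can be sketched as follows. This is a single-threaded user-space model, not the kernel's struct bio_batch: the `demo_batch` type and its helpers are hypothetical, and a plain counter stands in for the atomic counter and completion a real implementation would need.

```c
#include <stdbool.h>

/* Hypothetical batch state shared by all bios submitted together. */
struct demo_batch {
    int  pending;   /* bios submitted but not yet completed */
    bool uptodate;  /* cleared as soon as any bio reports an error */
};

void demo_batch_init(struct demo_batch *bb)
{
    bb->pending = 0;
    bb->uptodate = true;
}

/* Called once per bio at submission time. */
void demo_batch_submit(struct demo_batch *bb)
{
    bb->pending++;
}

/* Completion callback: record any error, then drop the pending count. */
void demo_batch_end_io(struct demo_batch *bb, int err)
{
    if (err)
        bb->uptodate = false;
    bb->pending--;
}

/* The submitter waits once for the whole batch instead of once per bio. */
bool demo_batch_done(const struct demo_batch *bb)
{
    return bb->pending == 0;
}
```

The caller submits every bio first and only then waits for `demo_batch_done()`, which is what narrows the window in which other I/O can slip in between discards.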

I have done simple dd testing with surprising results. The script I have
used is:

for i in $(seq 10); do
	echo $i
	dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
	sleep 5
done
/usr/bin/time -f %e ./blkdiscard /dev/sdc1

Running time of BLKDISCARD on the whole device (elapsed seconds):

with patch    without patch
0.95          15.58

So we can see that in this artificial test the kernel with the patch
applied is approx 16x faster in discarding the device.

Signed-off-by: Lukas Czerner
CC: Dmitry Monakhov
CC: Jens Axboe
CC: Jeff Moyer
Signed-off-by: Jens Axboe

Lukas Czerner
2011-05-07 09:26:27 +0800

900e599eb SATA: enable non-queueable flush flag

Enable non-queueable flush flag for SATA.

Stable: 2.6.39 only

Cc: stable@kernel.org
Signed-off-by: Shaohua Li
Acked-by: Tejun Heo
Acked-by: Jeff Garzik
Signed-off-by: Jens Axboe

shaohua.li@intel.com
2011-05-07 01:36:25 +0800

3ac0cc450 block: hold queue if flush is running for non-queueable flush drive

In some drives, flush requests are non-queueable: while a flush request is
running, normal read/write requests can't run. If the block layer
dispatches such a request, the driver can't handle it and requeues it.
Tejun suggested holding the queue while a flush is running, which avoids
the unnecessary requeue. It can also improve performance. For example,
given the requests flush1, write1, flush2: flush1 is dispatched, then the
queue is held, so write1 isn't inserted into the queue. After flush1
finishes, flush2 is dispatched. Since the disk cache is already clean,
flush2 finishes very quickly, so it looks as if flush2 was folded into
flush1.
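
The queue-holding behaviour can be sketched as a small state machine. This is a user-space illustration only: `demo_queue`, `demo_dispatch()` and `demo_flush_done()` are hypothetical names, and counters stand in for the real request lists.

```c
#include <stdbool.h>

enum demo_req_type { DEMO_REQ_WRITE, DEMO_REQ_FLUSH };

/* Hypothetical queue state; counters replace the real request lists. */
struct demo_queue {
    bool flush_queueable; /* drive can accept other commands during a flush */
    bool flush_running;   /* a flush has been dispatched and not completed */
    int  held;            /* requests held back while the flush runs */
    int  dispatched;      /* requests actually handed to the driver */
};

/* Returns true if the request reached the driver, false if it was held. */
bool demo_dispatch(struct demo_queue *q, enum demo_req_type type)
{
    if (q->flush_running && !q->flush_queueable) {
        q->held++; /* hold instead of dispatching and requeueing */
        return false;
    }
    if (type == DEMO_REQ_FLUSH)
        q->flush_running = true;
    q->dispatched++;
    return true;
}

/* Flush completion: release the queue and report how many requests waited. */
int demo_flush_done(struct demo_queue *q)
{
    int released = q->held;

    q->flush_running = false;
    q->held = 0;
    return released;
}
```

Holding the request costs nothing at the driver, whereas dispatching it would only produce a requeue.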

In my test, the queue holding completely solves a regression introduced by
commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

which causes about 20% regression running a sysbench fileio
workload.

Stable: 2.6.39 only

Cc: stable@kernel.org
Signed-off-by: Shaohua Li
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

shaohua.li@intel.com
2011-05-07 01:36:25 +0800

f38769309 block: add a non-queueable flush flag

Flush requests aren't queueable in some drives. Add a flag that lets the
driver notify the block layer about this. We can optimize flush
performance with this knowledge.

Stable: 2.6.39 only

Cc: stable@kernel.org
Signed-off-by: Shaohua Li
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

shaohua.li@intel.com
2011-05-07 01:36:25 +0800

06 May, 2011

2 commits

490b94be0 iosched: remove redundant sprintf

After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.

Signed-off-by: Kees Cook
Signed-off-by: Jens Axboe

Kees Cook
2011-05-06 08:02:12 +0800

addd0a09f block: Remove 'plug/unplug' comment in blk_execute_rq_nowait

unplug is replaced with blk_run_queue now in blk_execute_rq_nowait,
so change the comment accordingly.

Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe

Tao Ma
2011-05-06 05:10:05 +0800

22 Apr, 2011

3 commits

d4dc210f6 block: don't block events on excl write for non-optical devices

Disk event code automatically blocks events on excl write. This is
primarily to avoid issuing polling commands while burning is in
progress. This behavior doesn't fit other types of devices with
removable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.

This patch introduces a new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo
Reported-by: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2011-04-22 02:54:46 +0800

1196f8b81 block: rescan partitions on invalidated devices on -ENOMEDIA too

__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medium removal
is known to both the kernel and userland. The behavior also creates a
subtle inconsistency where an O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghost partitions, which is exploited to
work around the problem from userland.

Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIA too.

This was reported in the following bz.

https://bugzilla.kernel.org/show_bug.cgi?id=13029

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo
Reported-by: David Zeuthen
Reported-by: Martin Pitt
Reported-by: Kay Sievers
Tested-by: Kay Sievers
Cc: Alan Cox
Signed-off-by: Jens Axboe

Tejun Heo
2011-04-22 02:54:45 +0800

ea6949b66 cdrom: always check_disk_change() on open

cdrom_open() called check_disk_change() after the rest of open path
succeeded which leads to the following bizarre behavior.

* After media change, if the device is opened without O_NONBLOCK,
  open_for_data() naturally fails with -ENOMEDIA and
  check_disk_change() is never called. The media is known to be gone
  and the open failure makes it obvious to the userland but device
  invalidation never happens.

* But if the device is opened with O_NONBLOCK, all the checks are
  bypassed and cdrom_open() doesn't notice that the media is not there
  and check_disk_change() is called and invalidation happens.

There's nothing to be gained by avoiding calling check_disk_change()
on open failure. Common cases end up calling check_disk_change()
anyway. All we get is inconsistent behavior.

Fix it by moving check_disk_change() invocation to the top of
cdrom_open() so that it always gets called regardless of how the rest
of open proceeds.
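
The reordering can be sketched in user-space C. This is an illustration rather than the cdrom driver: `demo_drive`, `demo_check_disk_change()` and `demo_open()` are hypothetical, and the code uses the standard errno name ENOMEDIUM for what the message calls -ENOMEDIA (with a fallback definition for platforms that lack it).

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM 123 /* fallback for non-Linux platforms; value is arbitrary here */
#endif

/* Hypothetical drive state. */
struct demo_drive {
    bool media_present;
    bool media_changed; /* hardware reported a media change event */
    bool invalidated;   /* cached device state has been dropped */
};

/* Stand-in for check_disk_change(): invalidate on a pending media change. */
void demo_check_disk_change(struct demo_drive *d)
{
    if (d->media_changed) {
        d->invalidated = true;
        d->media_changed = false;
    }
}

int demo_open(struct demo_drive *d, bool nonblock)
{
    /* Run the check first, so it happens however the open turns out. */
    demo_check_disk_change(d);

    if (nonblock)
        return 0; /* O_NONBLOCK opens skip the media checks */
    return d->media_present ? 0 : -ENOMEDIUM;
}
```

Because the check no longer sits behind the media test, a failed blocking open and a successful O_NONBLOCK open both leave the device invalidated.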

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo
Reported-by: Amit Shah
Tested-by: Amit Shah
Signed-off-by: Jens Axboe

Tejun Heo
2011-04-22 02:54:44 +0800

19 Apr, 2011

10 commits

f0e615c3c Linux 2.6.39-rc4

Linus Torvalds
2011-04-19 12:26:00 +0800

e024f69de Merge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm

* 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
msm: timer: fix missing return value
msm: Remove extraneous ffa device check

Linus Torvalds
2011-04-19 06:44:29 +0800

96fd2d57b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xen-kbdfront - fix mouse getting stuck after save/restore
Input: estimate number of events per packet
Input: evdev - indicate buffer overrun with SYN_DROPPED
Input: document event types and codes and their intended use
Input: add KEY_IMAGES specifically for AL Image Browser
Input: twl4030_keypad - fix potential NULL dereference in twl4030_kp_probe()
Input: h3600_ts - fix error handling at connect
Input: twl4030_keypad - avoid potential NULL-pointer dereference

Linus Torvalds
2011-04-19 04:29:03 +0800

8a83f3310 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: add blk_run_queue_async
block: blk_delay_queue() should use kblockd workqueue
md: fix up raid1/raid10 unplugging.
md: incorporate new plugging into raid5.
md: provide generic support for handling unplug callbacks.
md - remove old plugging code.
md/dm - remove remains of plug_fn callback.
md: use new plugging interface for RAID IO.
block: drop queue lock before calling __blk_run_queue() for kblockd punt
Revert "block: add callback function for unplug notification"
block: Enhance new plugging support to support general callbacks

Linus Torvalds
2011-04-19 04:21:18 +0800

5d5b1b9f7 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/powermac: Build fix with SMP and CPU hotplug
powerpc/perf_event: Skip updating kernel counters if register value shrinks
powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled
powerpc: Fix oops if scan_dispatch_log is called too early
powerpc/pseries: Use a kmem cache for DTL buffers
powerpc/kexec: Fix regression causing compile failure on UP
powerpc/85xx: disable Suspend support if SMP enabled
powerpc/e500mc: Remove CPU_FTR_MAYBE_CAN_NAP/CPU_FTR_MAYBE_CAN_DOZE
powerpc/book3e: Fix CPU feature handling on 64-bit e5500
powerpc: Check device status before adding serial device
powerpc/85xx: Don't add disabled PCIe devices

Linus Torvalds
2011-04-19 03:24:24 +0800

adff377bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
Btrfs: fix free space cache leak
Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
Btrfs end_bio_extent_readpage should look for locked bits
Btrfs: don't force chunk allocation in find_free_extent
Btrfs: Check validity before setting an acl
Btrfs: Fix incorrect inode nlink in btrfs_link()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
Btrfs: make uncache_state unconditional
btrfs: using cached extent_state in set/unlock combinations
Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
Btrfs: fix subvolume mount by name problem when default mount subvolume is set
fix user annotation in ioctl.c
Btrfs: check for duplicate iov_base's when doing dio reads
btrfs: properly handle overlapping areas in memmove_extent_buffer
Btrfs: fix memory leaks in btrfs_new_inode()
Btrfs: check for duplicate iov_base's when doing dio reads
Btrfs: reuse the extent_map we found when calling btrfs_get_extent
Btrfs: do not use async submit for small DIO io's
Btrfs: don't split dio bios if we don't have to
...

Linus Torvalds
2011-04-19 03:24:05 +0800

d8bdc59f2 proc: do proper range check on readdir offset

Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.

This is just cleanup, the previous commit fixed the real problem.

Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-04-19 01:36:54 +0800

c78193e9c next_pidmap: fix overflow condition

next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).
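
The range check can be illustrated with a toy bitmap search. This is a sketch, not the kernel's next_pidmap(): `DEMO_PID_MAX_LIMIT` is a deliberately tiny hypothetical limit, the `demo_*` helpers are invented for the example, and the sketch rejects an out-of-range value outright as one way of enforcing the bound.

```c
#include <limits.h>

#define DEMO_PID_MAX_LIMIT 64u /* hypothetical; the kernel's PID_MAX_LIMIT is far larger */

/* One bit per pid; a set bit means the pid is in use. */
static unsigned char demo_pid_bitmap[DEMO_PID_MAX_LIMIT / CHAR_BIT];

void demo_set_pid(unsigned int pid)
{
    if (pid < DEMO_PID_MAX_LIMIT)
        demo_pid_bitmap[pid / CHAR_BIT] |= (unsigned char)(1u << (pid % CHAR_BIT));
}

/* Return the next in-use pid after 'last', or -1 if there is none. */
int demo_next_pid(unsigned int last)
{
    /* Validate the caller-supplied value before it is used as an index. */
    if (last >= DEMO_PID_MAX_LIMIT)
        return -1;

    /* last + 1 is safe here: the loop condition bounds every access. */
    for (unsigned int pid = last + 1; pid < DEMO_PID_MAX_LIMIT; pid++) {
        if (demo_pid_bitmap[pid / CHAR_BIT] & (1u << (pid % CHAR_BIT)))
            return (int)pid;
    }
    return -1;
}
```

The point is that the helper protects itself, so a caller such as /proc passing in an unchecked offset can no longer drive the index arithmetic out of bounds.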

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy
Analyzed-by: Robert Święcki
Cc: Eric W. Biederman
Cc: Pavel Emelyanov
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-04-19 01:35:30 +0800

c36b58e8a Input: xen-kbdfront - fix mouse getting stuck after save/restore

Mouse gets "stuck" after restore of PV guest but buttons are in working
condition.

If driver has been configured for ABS coordinates at start it will get
XENKBD_TYPE_POS events and then suddenly after restore it'll start getting
XENKBD_TYPE_MOTION events, that will be dropped later and they won't get
into user-space.

Regression was introduced by hunk 5 and 6 of
5ea5254aa0ad269cfbd2875c973ef25ab5b5e9db
("Input: xen-kbdfront - advertise either absolute or relative
coordinates").

Driver on restore should ask xen for request-abs-pointer again if it is
available. So restore parts that did it before 5ea5254.

Acked-by: Olaf Hering
Signed-off-by: Igor Mammedov
[v1: Expanded the commit description]
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Dmitry Torokhov

Igor Mammedov
2011-04-19 01:17:45 +0800

80b4895aa Input: estimate number of events per packet

Calculate a default based on the number of ABS axes, REL axes,
and MT slots for the device during input device registration.

Signed-off-by: Jeff Brown
Reviewed-by: Henrik Rydberg
Signed-off-by: Dmitry Torokhov

Jeff Brown
2011-04-19 01:15:43 +0800

18 Apr, 2011

16 commits

f65647c29 Btrfs: fix free space cache leak

The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.

One loop was missed though, so it ended up leaking pages. This fixes
it to use our page array instead of find_get_page.

Signed-off-by: Chris Mason

Chris Mason
2011-04-18 20:55:34 +0800

24ecfbe27 block: add blk_run_queue_async

Instead of overloading __blk_run_queue to force an offload to kblockd,
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed,
as the check we do when the workqueue item runs should be enough.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2011-04-18 17:41:33 +0800

4521cc4ed block: blk_delay_queue() should use kblockd workqueue

Reported-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2011-04-18 17:36:39 +0800

c3b328ac8 md: fix up raid1/raid10 unplugging.

We just need to make sure that an unplug event wakes up the md
thread, which is exactly what mddev_check_plugged does.

Also remove some plug-related code that is no longer needed.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:43 +0800

7c13edc87 md: incorporate new plugging into raid5.

In raid5 plugging is used for 2 things:
1/ collecting writes that require a bitmap update
2/ collecting writes in the hope that we can create full
stripes - or at least more-full.

We now release these different sets of stripes when plug_cnt
is zero.

Also in make_request, we call mddev_check_plugged to hopefully increase
plug_cnt, and wake up the thread at the end if plugging wasn't
achieved for some reason.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:43 +0800

97658cdd3 md: provide generic support for handling unplug callbacks.

When an md device adds a request to a queue, it can call
mddev_check_plugged.
If this succeeds then we know that the md thread will be woken up
shortly, and ->plug_cnt will be non-zero until then, so some
processing can be delayed.

If it fails, then no unplug callback is expected and the make_request
function needs to do whatever is required to make the request happen.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:42 +0800

482c08349 md - remove old plugging code.

md has some plugging infrastructure for RAID5 to use because the
normal plugging infrastructure required a 'request_queue', and when
called from dm, RAID5 doesn't have one of those available.

This relied on the ->unplug_fn callback which doesn't exist any more.

So remove all of that code, both in md and raid5. Subsequent patches
will restore the plugging functionality.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:42 +0800

af1db72d8 md/dm - remove remains of plug_fn callback.

Now that unplugging is done differently, the unplug_fn callback is
never called, so it can be completely discarded.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:41 +0800

e1dfa0a29 md: use new plugging interface for RAID IO.

md/raid submits a lot of IO from the various raid threads.
So add start/finish plug calls to those threads so that some
plugging happens.

Signed-off-by: NeilBrown

NeilBrown
2011-04-18 16:25:41 +0800

99e22598e block: drop queue lock before calling __blk_run_queue() for kblockd punt

If we know we are going to punt to kblockd, we can drop the queue
lock before calling into __blk_run_queue() since it only does a
safe bit test and a workqueue call. Since kblockd needs to grab
this very lock as one of the first things it does, it's a good
optimization to drop the lock before waking kblockd.

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-18 15:59:55 +0800

b4cb290e0 Revert "block: add callback function for unplug notification"

MD can't use this since it really requires us to be able to
keep more than a single piece of state for the unplug. Commit
048c9374 added the required support for MD, so get rid of this
now unused code.

This reverts commit f75664570d8b75469cc468f23c2b27220984983b.

Conflicts:

block/blk-core.c

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-18 15:54:05 +0800

048c9374a block: Enhance new plugging support to support general callbacks

md/raid requires an unplug callback, but as it does not use
requests the current code cannot provide one.

So allow arbitrary callbacks to be attached to the blk_plug.

Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe

NeilBrown
2011-04-18 15:52:22 +0800

7b84b29b8 powerpc/powermac: Build fix with SMP and CPU hotplug

Signed-off-by: Benjamin Herrenschmidt

Benjamin Herrenschmidt
2011-04-18 13:46:35 +0800

86c74ab31 powerpc/perf_event: Skip updating kernel counters if register value shrinks

Because of speculative event roll back, it is possible for some event counters
to decrease between reads on POWER7. This causes a problem with the way that
counters are updated. Delta values are calculated in a 64-bit value and the
top 32 bits are masked. If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters. This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the counter values, but from my testing
the value is typically fewer than 10 samples at a time.
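
The arithmetic behind the problem can be worked through in a few lines of C. This is a sketch and not the powerpc perf code: the function names are hypothetical, and `DEMO_ROLLBACK_WINDOW` is an assumed threshold, introduced here only so that a small backwards step can be told apart from a genuine 32-bit wrap.

```c
#include <stdint.h>

/* Assumed threshold: a backwards step smaller than this is treated as
 * speculative rollback rather than a counter wrap. */
#define DEMO_ROLLBACK_WINDOW 256u

/* Delta as described above: computed in 64 bits, top 32 bits masked off. */
uint64_t demo_naive_delta(uint64_t prev, uint64_t val)
{
    return (val - prev) & 0xffffffffull;
}

/* Guarded version: skip the update when the register moved backwards. */
uint64_t demo_guarded_delta(uint64_t prev, uint64_t val)
{
    if (prev > val && (prev - val) < DEMO_ROLLBACK_WINDOW)
        return 0;
    return demo_naive_delta(prev, val);
}
```

For example, with a previous reading of 1000 and a new reading of 995, the naive delta is 0xfffffffb (about 4.29 billion) instead of -5, while the guarded version returns 0 and simply loses those five counts.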

Signed-off-by: Eric B Munson
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt

Eric B Munson
2011-04-18 11:08:23 +0800

09597cfe9 powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled

This problem was noticed on an MPC855T platform. Ftrace did oops
when trying to write to the kernel text segment.

Many thanks to Joakim for finding the root cause of this problem.

Signed-off-by: Stefan Roese
Cc: Joakim Tjernlund
Cc: Benjamin Herrenschmidt
Cc: Steven Rostedt
Signed-off-by: Benjamin Herrenschmidt

Stefan Roese
2011-04-18 11:08:21 +0800

84ffae55a powerpc: Fix oops if scan_dispatch_log is called too early

We currently enable interrupts before the dispatch log for the boot
cpu is set up. If a timer interrupt comes in early enough we oops in
scan_dispatch_log:

Unable to handle kernel paging request for data at address 0x00000010

...

.scan_dispatch_log+0xb0/0x170
.account_system_vtime+0xa0/0x220
.irq_enter+0x88/0xc0
.do_IRQ+0x48/0x230

The patch below adds a check to scan_dispatch_log to ensure the
dispatch log has been allocated.

Signed-off-by: Anton Blanchard
Cc:
Signed-off-by: Benjamin Herrenschmidt

Anton Blanchard
2011-04-18 11:08:19 +0800