08 Mar, 2014

1 commit

  • Pull block fixes from Jens Axboe:
    "Small collection of fixes for 3.14-rc. It contains:

    - Three minor updates to blk-mq from Christoph.

    - Reduce the number of unaligned (< 4kb) in-flight writes on mtip32xx
    to two. From Micron.

    - Make the blk-mq CPU notify spinlock raw, since it can't be a
    sleeper spinlock on RT. From Mike Galbraith.

    - Drop a now-bogus BUG_ON() for bio iteration with blk integrity. From
    Nic Bellinger.

    - Properly propagate the SYNC flag on requests. From Shaohua"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: add REQ_SYNC early
    rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
    bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
    blk-mq: support partial I/O completions
    blk-mq: merge blk_mq_insert_request and blk_mq_run_request
    blk-mq: remove blk_mq_alloc_rq
    mtip32xx: Reduce the number of unaligned writes to 2

    Linus Torvalds
     

04 Mar, 2014

2 commits

  • zram_meta_alloc() could fail, so the caller should check its return
    value. Otherwise, your system will hang.
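
    A minimal sketch of the required check, assuming a zram_init_device()-
    style caller of that era (the function and field names here are
    illustrative, not the exact diff):

      /* zram_meta_alloc() returns NULL on failure, so bail out instead of
       * letting a later dereference hang or crash the system. */
      static int zram_setup_meta(struct zram *zram, u64 disksize)
      {
              struct zram_meta *meta = zram_meta_alloc(disksize);

              if (!meta)
                      return -ENOMEM;   /* caller must handle the failure */

              zram->meta = meta;
              return 0;
      }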

    Signed-off-by: Minchan Kim
    Acked-by: Jerome Marchand
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference on the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count(), is susceptible to racing with
    prep_compound_page() and seeing a NULL or dangling page->first_page
    pointer.

    This patch uses Andrea's implementation of compound_trans_head() that
    deals with such a race and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that,
    if PageTail(head) is true, we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set.

    This is the safest way to ensure we see the head page that we are
    expecting; PageTail(page) is already in the unlikely() path, and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception: we don't enforce a store memory barrier
    during init since no race is possible.
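
    A sketch of the barrier pairing described above, close to the generic
    compound_head()/prep_compound_page() of that era (simplified, not the
    verbatim patch):

      static inline struct page *compound_head(struct page *page)
      {
              if (unlikely(PageTail(page))) {
                      struct page *head = page->first_page;

                      /* Pairs with the smp_wmb() in prep_compound_page():
                       * if the tail bit is still set after the read
                       * barrier, head is neither NULL nor dangling. */
                      smp_rmb();
                      if (likely(PageTail(page)))
                              return head;
              }
              return page;
      }

      /* in prep_compound_page(), for each tail page p: */
      p->first_page = page;
      smp_wmb();              /* publish first_page before the tail bit */
      __SetPageTail(p);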

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

15 Feb, 2014

1 commit

  • Pull block IO fixes from Jens Axboe:
    "Second round of updates and fixes for 3.14-rc2. Most of this stuff
    has been queued up for a while. The notable exception is the blk-mq
    changes, which are naturally a bit more in flux still.

    The pull request contains:

    - Two bug fixes for the new immutable vecs, causing crashes with raid
    or swap. From Kent.

    - Various blk-mq tweaks and fixes from Christoph. A fix for
    integrity bios from Nic.

    - A few bcache fixes from Kent and Darrick Wong.

    - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
    Swenson, and Roger Pau Monne.

    - Fix for a vec miscount with integrity vectors from Martin.

    - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

    - Tweak to null_blk to do more normal FIFO processing of requests
    from Shlomo Pongratz.

    - Elevator switching bypass fix from Tejun.

    - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
    me"

    * 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
    block: add cond_resched() to potentially long running ioctl discard loop
    xen-blkback: init persistent_purge_work work_struct
    blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
    blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
    block: Fix cloning of discard/write same bios
    block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
    blk-mq: rework flush sequencing logic
    null_blk: use blk_complete_request and blk_mq_complete_request
    virtio_blk: use blk_mq_complete_request
    blk-mq: rework I/O completions
    fs: Add prototype declaration to appropriate header file include/linux/bio.h
    fs: Mark function as static in fs/bio-integrity.c
    block/null_blk: Fix completion processing from LIFO to FIFO
    block: Explicitly handle discard/write same segments
    block: Fix nr_vecs for inline integrity vectors
    blk-mq: Add bio_integrity setup to blk_mq_make_request
    blk-mq: initialize sg_reserved_size
    blk-mq: handle dma_drain_size
    blk-mq: divert __blk_put_request for MQ ops
    blk-mq: support at_head inserations for blk_execute_rq
    ...

    Linus Torvalds
     

12 Feb, 2014

1 commit

  • Initialize the persistent_purge_work work_struct in xen_blkif_alloc (and
    remove the previous initialization done in purge_persistent_gnt). This
    prevents flush_work from complaining even if purge_persistent_gnt has
    not been used.

    Signed-off-by: Roger Pau Monné
    Reviewed-by: David Vrabel
    Tested-by: Sander Eikelenboom
    Signed-off-by: Jens Axboe

    Roger Pau Monne
     

11 Feb, 2014

3 commits

  • …/git/xen/tip into for-linus

    Konrad writes:

    Please git pull the following branch:

    git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.14

    which is based off v3.13-rc6. If you would like me to rebase it on
    a different branch/tag I would be more than happy to do so.

    The patches are all bug-fixes and hopefully can go into 3.14.

    They deal with xen-blkback shutdown issues that caused memory leaks
    as well as shutdown races. They should go to the stable tree, and if
    you are OK with it I will ask for those fixes to be backported.

    There is also a fix to xen-blkfront to deal with an unexpected state
    transition. And lastly, a fix to the header, which was using
    __aligned__ unnecessarily.

    Jens Axboe
     
  • Use the block layer helpers for CPU-local completions instead of
    reimplementing them locally.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Make sure to complete requests on the submitting CPU. Previously this
    was done in blk_mq_end_io, but the responsibility shifted to the drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Feb, 2014

6 commits

  • The completion queue is implemented using a lockless list.

    llist_add adds events at the list head, which is a push operation. The
    completion elements are processed by disconnecting all the pushed
    elements and iterating over the disconnected list. The problem is that
    this processes elements in reverse order w.r.t. the order of insertion,
    i.e. LIFO processing. By reversing the disconnected list, which takes
    linear time, the desired FIFO processing is achieved.
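
    A sketch of the pattern, using the llist helpers available at the time
    (the event type and consumer are illustrative):

      #include <linux/llist.h>

      struct completion_event {
              struct llist_node node;
              /* driver-specific payload */
      };

      static void process_completions(struct llist_head *head)
      {
              /* detach all pushed elements atomically ... */
              struct llist_node *entry = llist_del_all(head);

              /* ... and undo the LIFO effect of llist_add() in O(n) */
              entry = llist_reverse_order(entry);

              while (entry) {
                      struct completion_event *ev =
                              llist_entry(entry, struct completion_event, node);
                      entry = entry->next;
                      handle_event(ev);       /* hypothetical consumer */
              }
      }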

    Signed-off-by: Shlomo Pongratz
    Signed-off-by: Jens Axboe

    Shlomo Pongratz
     
  • Backend drivers shouldn't transition to CLOSED unless the frontend is
    CLOSED. If a backend does transition to CLOSED too soon then the
    frontend may not see the CLOSING state and will not shut down properly.

    So, treat an unexpected backend CLOSED state the same as CLOSING.

    Signed-off-by: David Vrabel
    Acked-by: Konrad Rzeszutek Wilk
    Cc: stable@vger.kernel.org
    Signed-off-by: Konrad Rzeszutek Wilk

    David Vrabel
     
  • This was wrongly introduced in commit 402b27f9; the only difference
    between blkif_request_segment_aligned and blkif_request_segment is
    that the former has named padding, while both share the same
    memory layout.

    Also correct a few minor glitches in the description, including no
    longer assuming PAGE_SIZE == 4096.

    Signed-off-by: Roger Pau Monné
    [Description fix by Jan Beulich]
    Signed-off-by: Jan Beulich
    Reported-by: Jan Beulich
    Cc: Konrad Rzeszutek Wilk
    Cc: David Vrabel
    Cc: Boris Ostrovsky
    Tested-by: Matt Rushton
    Cc: Matt Wilson
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monne
     
  • Introduce a new variable to keep track of the number of in-flight
    requests. We need to make sure that when xen_blkif_put is called the
    requests have already been freed, so that we can safely free xen_blkif;
    this was not the case before.
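
    A sketch of the counting idea (the field and wait-queue names are
    illustrative, not the exact patch):

      /* submission path: account the request before issuing it */
      atomic_inc(&blkif->inflight);

      /* completion path: the last free may release a pending shutdown */
      static void xen_blkif_put_request(struct xen_blkif *blkif)
      {
              if (atomic_dec_and_test(&blkif->inflight))
                      wake_up(&blkif->shutdown_wq);
      }

      /* teardown waits until every request has been freed, so the final
       * xen_blkif_put() can safely free the structure itself */
      wait_event(blkif->shutdown_wq, atomic_read(&blkif->inflight) == 0);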

    Signed-off-by: Roger Pau Monné
    Cc: Konrad Rzeszutek Wilk
    Cc: David Vrabel
    Reviewed-by: Boris Ostrovsky
    Tested-by: Matt Rushton
    Reviewed-by: Matt Rushton
    Cc: Matt Wilson
    Cc: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monne
     
  • I've at least identified two possible memory leaks in blkback, both
    related to the shutdown path of a VBD:

    - blkback doesn't wait for any pending purge work to finish before
    cleaning the list of free_pages. The purge work will call
    put_free_pages and thus we might end up with pages being added to
    the free_pages list after we have emptied it. Fix this by making
    sure there's no pending purge work before exiting
    xen_blkif_schedule, and by moving the free_page cleanup code to
    xen_blkif_free.
    - blkback doesn't wait for pending requests to end before cleaning
    persistent grants and the list of free_pages. Again, this can add
    pages to the free_pages list or persistent grants to the
    persistent_gnts red-black tree. Fixed by moving the persistent
    grants and free_pages cleanup code to xen_blkif_free.

    Also, add some checks in xen_blkif_free to make sure we are cleaning
    everything.

    Signed-off-by: Roger Pau Monné
    Cc: Konrad Rzeszutek Wilk
    Reviewed-by: David Vrabel
    Cc: Boris Ostrovsky
    Tested-by: Matt Rushton
    Reviewed-by: Matt Rushton
    Cc: Matt Wilson
    Cc: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monne
     
  • Currently shrink_free_pagepool() is called before the pages used for
    persistent grants are released via free_persistent_gnts(). This
    results in a memory leak when a VBD that uses persistent grants is
    torn down.

    Cc: Konrad Rzeszutek Wilk
    Cc: "Roger Pau Monné"
    Cc: Ian Campbell
    Reviewed-by: David Vrabel
    Cc: linux-kernel@vger.kernel.org
    Cc: xen-devel@lists.xen.org
    Cc: Anthony Liguori
    Signed-off-by: Matt Rushton
    Signed-off-by: Matt Wilson
    Signed-off-by: Konrad Rzeszutek Wilk

    Matt Rushton
     

06 Feb, 2014

2 commits

  • Pull Xen fixes from Konrad Rzeszutek Wilk:
    "Bug-fixes:
    - Revert "xen/grant-table: Avoid m2p_override during mapping" as it
    broke the Xen ARM build.
    - Fix CR4 not being set on AP processors in Xen PVH mode"

    * tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pvh: set CR4 flags for APs
    Revert "xen/grant-table: Avoid m2p_override during mapping"

    Linus Torvalds
     
  • Pull NVMe driver update from Matthew Wilcox:
    "Looks like I missed the merge window ... but these are almost all
    bugfixes anyway (the ones that aren't have been baking for months)"

    * git://git.infradead.org/users/willy/linux-nvme:
    NVMe: Namespace use after free on surprise removal
    NVMe: Correct uses of INIT_WORK
    NVMe: Include device and queue numbers in interrupt name
    NVMe: Add a pci_driver shutdown method
    NVMe: Disable admin queue on init failure
    NVMe: Dynamically allocate partition numbers
    NVMe: Async IO queue deletion
    NVMe: Surprise removal handling
    NVMe: Abort timed out commands
    NVMe: Schedule reset for failed controllers
    NVMe: Device resume error handling
    NVMe: Cache dev->pci_dev in a local pointer
    NVMe: Fix lockdep warnings
    NVMe: compat SG_IO ioctl
    NVMe: remove deprecated IRQF_DISABLED
    NVMe: Avoid shift operation when writing cq head doorbell

    Linus Torvalds
     

01 Feb, 2014

1 commit

  • …inux/kernel/git/xen/tip

    Pull Xen bugfixes from Konrad Rzeszutek Wilk:
    "Bug-fixes for the new features that were added during this cycle.

    There are also two fixes for long-standing issues: grant-table
    operations were doing extra, unneeded work, causing performance
    issues, and the self-balloon code was too aggressive, causing OOMs.

    Details:
    - Xen ARM couldn't use the new FIFO events
    - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices.
    - Grant tables were doing needless M2P operations.
    - Ratchet down the self-balloon code so it won't OOM.
    - Fix misplaced kfree in Xen PVH error code paths"

    * tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pages
    drivers: xen: deaggressive selfballoon driver
    xen/grant-table: Avoid m2p_override during mapping
    xen/gnttab: Use phys_addr_t to describe the grant frame base address
    xen: swiotlb: handle sizeof(dma_addr_t) != sizeof(phys_addr_t)
    arm/xen: Initialize event channels earlier

    Linus Torvalds
     

31 Jan, 2014

13 commits

  • The grant mapping API does m2p_override unnecessarily: only gntdev needs it;
    for blkback and future netback patches it just causes lock contention, as
    those pages never go to userspace. Therefore this series does the following:
    - the original functions were renamed to __gnttab_[un]map_refs, with a new
    m2p_override parameter
    - based on m2p_override they either follow the original behaviour, or just set
    the private flag and call set_phys_to_machine
    - gnttab_[un]map_refs are now wrappers that call __gnttab_[un]map_refs with
    m2p_override false
    - a new function gnttab_[un]map_refs_userspace provides the old behaviour

    It also removes a stray space from page.h and changes ret to 0 if
    XENFEAT_auto_translated_physmap, as that is the only possible return value
    there.
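
    The resulting API shape looks roughly like this (signatures
    abbreviated; treat this as a sketch of the wrapper pattern rather than
    the exact prototypes):

      /* kernel-only users (blkback, netback): no m2p_override */
      int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
                          struct page **pages, unsigned int count)
      {
              return __gnttab_map_refs(map_ops, NULL, pages, count, false);
      }

      /* gntdev, where pages are mapped into userspace */
      int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
                                    struct gnttab_map_grant_ref *kmap_ops,
                                    struct page **pages, unsigned int count)
      {
              return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
      }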

    v2:
    - move the storing of the old mfn in page->index to gnttab_map_refs
    - move the function header update to a separate patch

    v3:
    - a new approach to retain old behaviour where it needed
    - squash the patches into one

    v4:
    - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
    - clear page->private before doing anything with the page, so m2p_find_override
    won't race with this

    v5:
    - change return value handling in __gnttab_[un]map_refs
    - remove a stray space in page.h
    - add detail why ret = 0 now at some places

    v6:
    - don't pass pfn to m2p* functions, just get it locally

    Signed-off-by: Zoltan Kiss
    Suggested-by: David Vrabel
    Acked-by: David Vrabel
    Acked-by: Stefano Stabellini
    Signed-off-by: Konrad Rzeszutek Wilk

    Zoltan Kiss
     
  • Finally, we separated the zram->lock dependency from 32-bit stat/table
    handling, so there is no reason to use an rw_semaphore between the read
    and write paths. This patch removes the lock from the read path entirely
    and replaces the rw_semaphore with a mutex, so we now have:

    old:

    read-read: OK
    read-write: NO
    write-write: NO

    Now:

    read-read: OK
    read-write: OK
    write-write: NO
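
    A minimal sketch of the new scheme, simplified from zram's bvec
    read/write path of that era:

      static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec,
                              u32 index, int offset, struct bio *bio, int rw)
      {
              int ret;

              if (rw == READ) {
                      /* no zram-wide lock at all on the read path */
                      ret = zram_bvec_read(zram, bvec, index, offset, bio);
              } else {
                      mutex_lock(&zram->lock);        /* was down_write() */
                      ret = zram_bvec_write(zram, bvec, index, offset);
                      mutex_unlock(&zram->lock);
              }
              return ret;
      }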

    The data below shows the mixed workload performing 11 times better, and
    there is also an improvement on the write-write path because the current
    rw_semaphore doesn't support SPIN_ON_OWNER. That is a side effect, but a
    welcome one.

    Write-related tests perform better (from 61% to 1058%), while the read
    path varies (from -2.22% to 1.45%), all of it marginal within stddev.

    CPU 12
    iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

    ==Initial write ==Initial write
    records: 10 records: 10
    avg: 516189.16 avg: 839907.96
    std: 22486.53 (4.36%) std: 47902.17 (5.70%)
    max: 546970.60 max: 909910.35
    min: 481131.54 min: 751148.38
    ==Rewrite ==Rewrite
    records: 10 records: 10
    avg: 509527.98 avg: 1050156.37
    std: 45799.94 (8.99%) std: 40695.44 (3.88%)
    max: 611574.27 max: 1111929.26
    min: 443679.95 min: 980409.62
    ==Read ==Read
    records: 10 records: 10
    avg: 4408624.17 avg: 4472546.76
    std: 281152.61 (6.38%) std: 163662.78 (3.66%)
    max: 4867888.66 max: 4727351.03
    min: 4058347.69 min: 4126520.88
    ==Re-read ==Re-read
    records: 10 records: 10
    avg: 4462147.53 avg: 4363257.75
    std: 283546.11 (6.35%) std: 247292.63 (5.67%)
    max: 4912894.44 max: 4677241.75
    min: 4131386.50 min: 4035235.84
    ==Reverse Read ==Reverse Read
    records: 10 records: 10
    avg: 4565865.97 avg: 4485818.08
    std: 313395.63 (6.86%) std: 248470.10 (5.54%)
    max: 5232749.16 max: 4789749.94
    min: 4185809.62 min: 3963081.34
    ==Stride read ==Stride read
    records: 10 records: 10
    avg: 4515981.80 avg: 4418806.01
    std: 211192.32 (4.68%) std: 212837.97 (4.82%)
    max: 4889287.28 max: 4686967.22
    min: 4210362.00 min: 4083041.84
    ==Random read ==Random read
    records: 10 records: 10
    avg: 4410525.23 avg: 4387093.18
    std: 236693.22 (5.37%) std: 235285.23 (5.36%)
    max: 4713698.47 max: 4669760.62
    min: 4057163.62 min: 3952002.16
    ==Mixed workload ==Mixed workload
    records: 10 records: 10
    avg: 243234.25 avg: 2818677.27
    std: 28505.07 (11.72%) std: 195569.70 (6.94%)
    max: 288905.23 max: 3126478.11
    min: 212473.16 min: 2484150.69
    ==Random write ==Random write
    records: 10 records: 10
    avg: 555887.07 avg: 1053057.79
    std: 70841.98 (12.74%) std: 35195.36 (3.34%)
    max: 683188.28 max: 1096125.73
    min: 437299.57 min: 992481.93
    ==Pwrite ==Pwrite
    records: 10 records: 10
    avg: 501745.93 avg: 810363.09
    std: 16373.54 (3.26%) std: 19245.01 (2.37%)
    max: 518724.52 max: 833359.70
    min: 464208.73 min: 765501.87
    ==Pread ==Pread
    records: 10 records: 10
    avg: 4539894.60 avg: 4457680.58
    std: 197094.66 (4.34%) std: 188965.60 (4.24%)
    max: 4877170.38 max: 4689905.53
    min: 4226326.03 min: 4095739.72

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Tested-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
    introduced pending-free-request code to avoid scheduling on a mutex under
    a spinlock, and it was a mess that made the code lengthy and increased
    overhead.

    Now we don't need zram->lock any more to free a slot, so this patch
    reverts that code; tb_lock protects the table instead.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Tested-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, the zram table is protected by zram->lock, but that is a
    rather coarse-grained lock and hurts scalability.

    Let's use our own rwlock instead of depending on zram->lock. This patch
    adds new locking, so it will obviously be slower, but it is just
    preparation for removing the coarse-grained rw_semaphore (ie, zram->lock)
    that is the hurdle for zram scalability.

    The final patch in this series will remove the lock from the read path
    and replace the rw_semaphore with a mutex in the write path. As a bonus,
    we can drop the pending-slot-free mess in the next patch.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Tested-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Some of the fields in zram->stats are protected by zram->lock, which is
    rather coarse-grained, so let's use atomic operations without explicit
    locking.

    This patch prepares for removing the read path's dependency on
    zram->lock, a very coarse-grained rw_semaphore. Of course, it adds new
    atomic operations, so it might be slower, but my 12-CPU test couldn't
    spot any regression. All gains/losses are marginal within stddev.
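
    A sketch of the conversion (field names follow zram's stats of that
    era, but this is illustrative rather than the full diff):

      struct zram_stats {
              atomic64_t compr_size;  /* was u64, guarded by zram->lock */
              atomic64_t num_reads;
              atomic64_t num_writes;
      };

      /* update sites become single lockless operations */
      atomic64_inc(&zram->stats.num_reads);
      atomic64_add(clen, &zram->stats.compr_size);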

    iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

    ==Initial write ==Initial write
    records: 50 records: 50
    avg: 412875.17 avg: 415638.23
    std: 38543.12 (9.34%) std: 36601.11 (8.81%)
    max: 521262.03 max: 502976.72
    min: 343263.13 min: 351389.12
    ==Rewrite ==Rewrite
    records: 50 records: 50
    avg: 416640.34 avg: 397914.33
    std: 60798.92 (14.59%) std: 46150.42 (11.60%)
    max: 543057.07 max: 522669.17
    min: 304071.67 min: 316588.77
    ==Read ==Read
    records: 50 records: 50
    avg: 4147338.63 avg: 4070736.51
    std: 179333.25 (4.32%) std: 223499.89 (5.49%)
    max: 4459295.28 max: 4539514.44
    min: 3753057.53 min: 3444686.31
    ==Re-read ==Re-read
    records: 50 records: 50
    avg: 4096706.71 avg: 4117218.57
    std: 229735.04 (5.61%) std: 171676.25 (4.17%)
    max: 4430012.09 max: 4459263.94
    min: 2987217.80 min: 3666904.28
    ==Reverse Read ==Reverse Read
    records: 50 records: 50
    avg: 4062763.83 avg: 4078508.32
    std: 186208.46 (4.58%) std: 172684.34 (4.23%)
    max: 4401358.78 max: 4424757.22
    min: 3381625.00 min: 3679359.94
    ==Stride read ==Stride read
    records: 50 records: 50
    avg: 4094933.49 avg: 4082170.22
    std: 185710.52 (4.54%) std: 196346.68 (4.81%)
    max: 4478241.25 max: 4460060.97
    min: 3732593.23 min: 3584125.78
    ==Random read ==Random read
    records: 50 records: 50
    avg: 4031070.04 avg: 4074847.49
    std: 192065.51 (4.76%) std: 206911.33 (5.08%)
    max: 4356931.16 max: 4399442.56
    min: 3481619.62 min: 3548372.44
    ==Mixed workload ==Mixed workload
    records: 50 records: 50
    avg: 149925.73 avg: 149675.54
    std: 7701.26 (5.14%) std: 6902.09 (4.61%)
    max: 191301.56 max: 175162.05
    min: 133566.28 min: 137762.87
    ==Random write ==Random write
    records: 50 records: 50
    avg: 404050.11 avg: 393021.47
    std: 58887.57 (14.57%) std: 42813.70 (10.89%)
    max: 601798.09 max: 524533.43
    min: 325176.99 min: 313255.34
    ==Pwrite ==Pwrite
    records: 50 records: 50
    avg: 411217.70 avg: 411237.96
    std: 43114.99 (10.48%) std: 33136.29 (8.06%)
    max: 530766.79 max: 471899.76
    min: 320786.84 min: 317906.94
    ==Pread ==Pread
    records: 50 records: 50
    avg: 4154908.65 avg: 4087121.92
    std: 151272.08 (3.64%) std: 219505.04 (5.37%)
    max: 4459478.12 max: 4435857.38
    min: 3730512.41 min: 3101101.67

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Tested-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
    introduced pending zram slot free in zram's write path to cover a slot
    free missed due to memory allocation failure in zram_slot_free_notify,
    but it is not necessary because we have already freed the slot right
    before overwriting.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Tested-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Sergey reported that we don't need to handle pending free requests on
    every I/O, so this patch removes that handling from the read path while
    keeping it in the write path.

    Consider the example below.

    The swap subsystem asks zram to free block "A" via
    swap_slot_free_notify, but zram pends the request without really
    freeing. The swap subsystem then allocates block "A" for new data; when
    the long-pended request is finally handled, zram blindly frees the new
    data in block "A". :(

    That's why we can't remove the pending-free handling right before a
    zram write.

    Signed-off-by: Minchan Kim
    Reported-by: Sergey Senozhatsky
    Tested-by: Sergey Senozhatsky
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Dan and Sergey reported a race between reset and the flushing of
    pending work: reset can oops by freeing zram->meta while zram_slot_free
    can still access zram->meta if a new request arrives during the race
    window.

    This patch moves the flush to after init_lock is taken, which prevents
    new requests and closes the race.
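
    A sketch of the ordering fix (the work-item name is illustrative):

      static void zram_reset_device(struct zram *zram)
      {
              down_write(&zram->init_lock);   /* block new requests first */
              flush_work(&zram->free_work);   /* drain already-queued frees */

              /* only now is it safe to tear down the table */
              zram_meta_free(zram->meta);
              zram->meta = NULL;
              up_write(&zram->init_lock);
      }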

    Signed-off-by: Minchan Kim
    Reported-by: Dan Carpenter
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Tested-by: Sergey Senozhatsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Add my copyright to the zram source code which I maintain.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Remove the old private compcache project address; upcoming patches
    should be sent to LKML, because the Linux kernel community will take
    care of them.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Zram has lived in staging for a LONG LONG time and has been
    fixed/improved by many contributors, so the code is clean and stable
    now. Of course, there are lots of products using zram in real practice.

    Major TV companies have used zram as swap for two years now, and
    recently our production team released an Android smartphone that uses
    zram as swap, too; Android KitKat has started to use zram for
    small-memory smartphones. There were also reports that Google shipped
    ChromeOS with zram, and CyanogenMod has used zram for a long time. I
    have heard that some distros use the zram block device for tmpfs, and I
    have seen reports from many other people; for example, Lubuntu has
    started to use it.

    The benefit of zram is very clear. In my experience, one of the
    benefits was removing jitter from a video application under background
    memory pressure. Part of that comes from efficient memory usage via
    compression, but the bigger issue is whether swap exists in the system
    at all. Recent mobile platforms use Java, so there are many anonymous
    pages, but embedded systems are normally reluctant to use eMMC or SD
    cards as swap because of wear-leveling and latency issues. If we do not
    use swap, we can't reclaim anonymous pages, and in the end we could
    encounter an OOM kill. :(

    Even having real storage as swap was a problem, too, because it
    sometimes ends up making the system very unresponsive due to slow swap
    storage performance.

    Quote from Luigi at Google:
    "Since Chrome OS was mentioned: the main reason why we don't use swap
    to a disk (rotating or SSD) is because it doesn't degrade gracefully
    and leads to a bad interactive experience. Generally we prefer to
    manage RAM at a higher level, by transparently killing and restarting
    processes. But we noticed that zram is fast enough to be competitive
    with the latter, and it lets us make more efficient use of the
    available RAM."
    http://www.spinics.net/lists/linux-mm/msg57717.html

    Another use case is zram as a plain block device. Zram is a block
    device, so anyone can format and mount it; some people on the internet
    use zram for /var/tmp.
    http://forums.gentoo.org/viewtopic-t-838198-start-0.html

    Let's promote zram out of staging and enhance/maintain it instead of
    removing it.

    Signed-off-by: Minchan Kim
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Nitin Gupta
    Acked-by: Pekka Enberg
    Cc: Bob Liu
    Cc: Greg Kroah-Hartman
    Cc: Hugh Dickins
    Cc: Jens Axboe
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Pull block IO driver changes from Jens Axboe:

    - bcache update from Kent Overstreet.

    - two bcache fixes from Nicholas Swenson.

    - cciss pci init error fix from Andrew.

    - underflow fix in the parallel IDE pg_write code from Dan Carpenter.
    I'm sure the 1 (or 0) users of that are now happy.

    - two PCI related fixes for sx8 from Jingoo Han.

    - floppy init fix for first block read from Jiri Kosina.

    - pktcdvd error return miss fix from Julia Lawall.

    - removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from
    Michael Opdenacker.

    - comment typo fix for the loop driver from Olaf Hering.

    - potential oops fix for null_blk from Raghavendra K T.

    - two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing
    an OOM problem and a problem with handling security locked conditions.

    * 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits)
    mg_disk: Spelling s/finised/finished/
    null_blk: Null pointer deference problem in alloc_page_buffers
    mtip32xx: Correctly handle security locked condition
    mtip32xx: Make SGL container per-command to eliminate high order dma allocation
    drivers/block/loop.c: fix comment typo in loop_config_discard
    drivers/block/cciss.c:cciss_init_one(): use proper errnos
    drivers/block/paride/pg.c: underflow bug in pg_write()
    drivers/block/sx8.c: remove unnecessary pci_set_drvdata()
    drivers/block/sx8.c: use module_pci_driver()
    floppy: bail out in open() if drive is not responding to block0 read
    bcache: Fix auxiliary search trees for key size > cacheline size
    bcache: Don't return -EINTR when insert finished
    bcache: Improve bucket_prio() calculation
    bcache: Add bch_bkey_equal_header()
    bcache: update bch_bkey_try_merge
    bcache: Move insert_fixup() to btree_keys_ops
    bcache: Convert sorting to btree_keys
    bcache: Convert debug code to btree_keys
    bcache: Convert btree_iter to struct btree_keys
    bcache: Refactor bset_tree sysfs stats
    ...

    Linus Torvalds
     
  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_ve series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

30 Jan, 2014

1 commit

  • We need to initialise the work_struct when we initialise the rest of the
    struct nvme_dev; otherwise we'll hit a lockdep warning when we remove
    the device. Use PREPARE_WORK to change the function pointer instead
    of INIT_WORK.
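
    A sketch of the rule, with illustrative handler names: INIT_WORK does
    the full one-time setup, including the lockdep key, while PREPARE_WORK
    (still present in 3.14) only swaps the function pointer:

      /* at device initialisation, exactly once */
      INIT_WORK(&dev->reset_work, nvme_reset_failed_dev);

      /* later, to reuse the same work item for a different task */
      PREPARE_WORK(&dev->reset_work, nvme_remove_disks);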

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

29 Jan, 2014

1 commit

  • Pull ceph updates from Sage Weil:
    "This is a big batch. From Ilya we have:

    - rbd support for more than ~250 mapped devices (now uses same scheme
    that SCSI does for device major/minor numbering)
    - crush updates for new mapping behaviors (will be needed for coming
    erasure coding support, among other things)
    - preliminary support for tiered storage pools

    There is also a big series fixing a pile of cephfs bugs with clustered
    MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph
    fscache improvements from Li Wang, improved behavior when we get
    ENOSPC from Josh Durgin, some readv/writev improvements from
    Majianpeng, and the usual mix of small cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits)
    ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
    ceph: fix dout() compile warnings in ceph_filemap_fault()
    libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
    libceph: follow redirect replies from osds
    libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
    libceph: follow {read,write}_tier fields on osd request submission
    libceph: add ceph_pg_pool_by_id()
    libceph: CEPH_OSD_FLAG_* enum update
    libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
    libceph: introduce and start using oid abstraction
    libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
    libceph: move ceph_file_layout helpers to ceph_fs.h
    libceph: start using oloc abstraction
    libceph: dout() is missing a newline
    libceph: add ceph_kv{malloc,free}() and switch to them
    libceph: support CEPH_FEATURE_EXPORT_PEER
    ceph: add imported caps when handling cap export message
    ceph: add open export target session helper
    ceph: remove exported caps when handling cap import message
    ceph: handle session flush message
    ...

    Linus Torvalds
     

28 Jan, 2014

5 commits

  • On larger systems with many drives, it may help debugging to know which
    queue is tied to which interrupt, just by looking at /proc/interrupts.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • We need to shut down the device cleanly when the system is being shut down.
    This was in an earlier patch but was inadvertently lost during a rewrite.
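
    Wiring this up uses the standard pci_driver hook; a sketch, assuming an
    nvme_shutdown() handler as the patch title suggests:

      static struct pci_driver nvme_driver = {
              .name     = "nvme",
              .probe    = nvme_probe,
              .remove   = nvme_remove,
              .shutdown = nvme_shutdown,  /* called at system shutdown */
      };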

    Signed-off-by: Keith Busch
    Signed-off-by: Matthew Wilcox

    Keith Busch
     
  • Disable the admin queue if the device fails during initialization so the
    queue's irq is freed.

    Signed-off-by: Keith Busch
    [rewritten to use nvme_free_queues]
    Signed-off-by: Matthew Wilcox

    Keith Busch
     
  • Some users need more than 64 partitions per device. Rather than simply
    increasing the number of partitions, switch to the dynamic partition
    allocation scheme.

    This means that minor numbers are not stable across boots, but since major
    numbers aren't either, I cannot see this being a significant problem.
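
    The usual way to do this (and, I assume, what the patch amounts to) is
    to allocate the disk with no static minors and let the extended dev_t
    scheme number partitions dynamically:

      ns->disk = alloc_disk(0);               /* 0 preallocated minors */
      ns->disk->flags |= GENHD_FL_EXT_DEVT;   /* dynamic partition dev_ts */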

    Tested-by: Matias Bjørling
    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • This attempts to delete all IO queues at the same time asynchronously on
    shutdown. This is necessary for a present device that is not responding;
    a shutdown operation previously would take 2 minutes per queue-pair
    to time out before moving on to the next queue, making a device removal
    appear to take a very long time or to be "hung", as reported by users.

    In the previous worst case, a removal may be stuck forever until a kill
    signal is given if there are more than 32 queue pairs since it would run
    out of admin command IDs after over an hour of timed out sync commands
    (admin queue depth is 64).

    This patch will wait for the admin command timeout for all commands to
    complete, so the worst case now for an unresponsive controller is 60
    seconds, though that still seems like a long time.

    Since this adds another way to take queues offline, some duplicate code
    resulted, so I moved it into more convenient functions.
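
    A sketch of the async pattern described above; the helper, counter and
    wait queue are hypothetical stand-ins for the real admin-command
    plumbing:

      static void nvme_delete_io_queues(struct nvme_dev *dev)
      {
              int i;

              atomic_set(&dev->pending_deletes, dev->queue_count - 1);
              for (i = 1; i < dev->queue_count; i++)
                      nvme_submit_delete_async(dev, i);   /* hypothetical */

              /* one shared admin timeout bounds the whole teardown,
               * instead of two minutes per queue pair */
              wait_event_timeout(dev->delete_wq,
                                 atomic_read(&dev->pending_deletes) == 0,
                                 ADMIN_TIMEOUT);
      }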

    Signed-off-by: Keith Busch
    [make functions static, correct line length and whitespace issues]
    Signed-off-by: Matthew Wilcox

    Keith Busch