27 Jul, 2016

2 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent OOMs.
    This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.
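
    The helper itself is tiny; a sketch of the pagemap.h addition as
    introduced (modulo exact flags):

    static inline gfp_t readahead_gfp_mask(struct address_space *x)
    {
            /* readahead is best-effort: start from the mapping's mask,
             * but never retry hard and never warn on failure */
            return mapping_gfp_mask(x) |
                   __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN;
    }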

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

08 Jun, 2016

2 commits


05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion over whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
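
    For instance, a typical caller changes like this (an illustrative
    fragment, not taken from the patch itself):

    /* before */
    index = pos >> PAGE_CACHE_SHIFT;
    page_cache_release(page);

    /* after: identical behavior, since PAGE_CACHE_SIZE == PAGE_SIZE */
    index = pos >> PAGE_SHIFT;
    put_page(page);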

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Mar, 2016

1 commit


07 Nov, 2015

1 commit

    There are many places which use mapping_gfp_mask to restrict a more
    generic gfp mask which would be used for allocations that are not
    directly related to the page cache but are performed in the same
    context.

    Let's introduce a helper function which makes the restriction explicit and
    easier to track. This patch doesn't introduce any functional changes.
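
    The helper amounts to a one-line mask intersection; a sketch (in
    mainline it is mapping_gfp_constraint() in linux/pagemap.h):

    static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
                                               gfp_t gfp_mask)
    {
            /* never allow more than the mapping itself allows */
            return mapping_gfp_mask(mapping) & gfp_mask;
    }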

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Michal Hocko
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

05 Nov, 2015

1 commit

  • Pull core block updates from Jens Axboe:
    "This is the core block pull request for 4.4. I've got a few more
    topic branches this time around, some of them will layer on top of the
    core+drivers changes and will come in a separate round. So not a huge
    chunk of changes in this round.

    This pull request contains:

    - Enable blk-mq page allocation tracking with kmemleak, from Catalin.

    - Unused prototype removal in blk-mq from Christoph.

    - Cleanup of the q->blk_trace exchange, using cmpxchg instead of two
    xchg()'s, from Davidlohr.

    - A plug flush fix from Jeff.

    - Also from Jeff, a fix that means we don't have to update shared tag
    sets at init time unless we do a state change. This cuts down boot
    times on thousands of devices a lot with scsi/blk-mq.

    - blk-mq waitqueue barrier fix from Kosuke.

    - Various fixes from Ming:

    - Fixes for segment merging and splitting, and checks, for
    the old core and blk-mq.

    - Potential blk-mq speedup by marking ctx pending at the end
    of a plug insertion batch in blk-mq.

    - direct-io no page dirty on kernel direct reads.

    - A WRITE_SYNC fix for mpage from Roman"

    * 'for-4.4/core' of git://git.kernel.dk/linux-block:
    blk-mq: avoid excessive boot delays with large lun counts
    blktrace: re-write setting q->blk_trace
    blk-mq: mark ctx as pending at batch in flush plug path
    blk-mq: fix for trace_block_plug()
    block: check bio_mergeable() early before merging
    blk-mq: check bio_mergeable() early before merging
    block: avoid to merge splitted bio
    block: setup bi_phys_segments after splitting
    block: fix plug list flushing for nomerge queues
    blk-mq: remove unused blk_mq_clone_flush_request prototype
    blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c
    fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read
    fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write
    block: kmemleak: Track the page allocations for struct request

    Linus Torvalds
     

17 Oct, 2015

1 commit

  • Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
    allocation paths") has caught some users of hardcoded GFP_KERNEL used in
    the page cache allocation paths. This, however, wasn't complete and
    there were others which went unnoticed.

    Dave Chinner has reported the following deadlock for xfs on loop device:
    : With the recent merge of the loop device changes, I'm now seeing
    : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
    :
    : The deadlock is as follows:
    :
    : kloopd1: loop_queue_read_work
    : xfs_file_iter_read
    : lock XFS inode XFS_IOLOCK_SHARED (on image file)
    : page cache read (GFP_KERNEL)
    : radix tree alloc
    : memory reclaim
    : reclaim XFS inodes
    : log force to unpin inodes
    :
    : xfs-cil/loop1:
    : xlog_cil_push
    : xlog_write
    : xlog_state_get_iclog_space()
    :
    : kloopd1: loop_queue_write_work
    : xfs_file_write_iter
    : lock XFS inode XFS_IOLOCK_EXCL (on image file)
    :
    : i.e. the kloopd, with its split read and write work queues, has
    : introduced a dependency through memory reclaim, i.e. writes
    : need to be able to progress for reads to make progress.
    :
    : The problem, fundamentally, is that mpage_readpages() does a
    : GFP_KERNEL allocation, rather than paying attention to the inode's
    : mapping gfp mask, which is set to GFP_NOFS.
    :
    : This didn't use to happen, because the loop device used to issue
    : reads through the splice path and that does:
    :
    : error = add_to_page_cache_lru(page, mapping, index,
    : GFP_KERNEL & mapping_gfp_mask(mapping));

    This changed with commit aa4d86163e4 ("block: loop: switch to VFS
    ITER_BVEC").

    This patch changes mpage_readpage{s} to follow the gfp mask set for the
    mapping. There are, however, other places which are doing basically the
    same.
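
    Schematically, the mpage_readpages() change looks like this (a
    sketch, mirroring the add_to_page_cache_lru() call quoted above):

    /* derive the allocation mask from the mapping instead of
     * hardcoding GFP_KERNEL */
    gfp_t gfp = GFP_KERNEL & mapping_gfp_mask(mapping);

    error = add_to_page_cache_lru(page, mapping, page->index, gfp);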

    lustre:ll_dir_filler is doing GFP_KERNEL from a function which
    apparently uses GFP_NOFS for other allocations, so let's make this
    consistent.

    cifs:readpages_get_pages is called from cifs_readpages, and
    __cifs_readpages_from_fscache, called from the same path, obeys the
    mapping gfp.

    ramfs_nommu_expand_for_mapping hardcodes GFP_KERNEL as well, even
    though it uses mapping_gfp_mask for the page allocation.

    ext4_mpage_readpages is called from the page cache allocation path,
    the same as read_pages and read_cache_pages.

    As I've noted in my previous post, I cannot say I would be happy about
    sprinkling mapping_gfp_mask all over the place, and it sounds like we
    should drop the gfp_mask argument altogether and handle it internally
    in __add_to_page_cache_locked. That would require all the filesystems
    to use the mapping gfp consistently, which I am not sure is the case
    here. From a quick glance it seems that some file systems use it all
    the time while others are selective.

    Signed-off-by: Michal Hocko
    Reported-by: Dave Chinner
    Cc: "Theodore Ts'o"
    Cc: Ming Lei
    Cc: Andreas Dilger
    Cc: Oleg Drokin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

24 Sep, 2015

1 commit

  • In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity
    write, thus mark request as WRITE_SYNC.

    akpm: afaict this change will cause the data integrity write bios to be
    placed onto the second queue in cfq_io_cq.cfqq[], which presumably results
    in special treatment. The documentation for REQ_SYNC is horrid.

    Signed-off-by: Roman Pen
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Reviewed-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Roman Pen
     

14 Aug, 2015

1 commit

  • We can always fill up the bio now, no need to estimate the possible
    size based on queue parameters.

    Acked-by: Steven Whitehouse
    Signed-off-by: Kent Overstreet
    [hch: rebased and wrote a changelog]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.
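
    Schematically, for an endio handler (illustrative handlers, not from
    the patch; handle_completion() is a made-up helper):

    /* before: two channels, a flag plus an errno argument */
    static void my_end_io(struct bio *bio, int err)
    {
            if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
                    err = -EIO;
            handle_completion(bio, err);
            bio_put(bio);
    }

    /* after: one authoritative field */
    static void my_end_io(struct bio *bio)
    {
            handle_completion(bio, bio->bi_error);  /* 0 or -Exxx */
            bio_put(bio);
    }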

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Jun, 2015

3 commits

    As concurrent write sharing of an inode is expected to be very rare,
    and memcg only tracks page ownership on a first-use basis (severely
    confining the usefulness of such sharing), cgroup writeback tracks
    ownership per-inode. While the support for concurrent write sharing
    of an inode is deemed unnecessary, an inode being written to by
    different cgroups at different points in time is a lot more common,
    and, more importantly, charging only by first-use can too readily lead
    to grossly incorrect behaviors (single foreign page can lead to
    gigabytes of writeback to be incorrectly attributed).

    To resolve this issue, cgroup writeback detects the majority dirtier
    of an inode and will transfer the ownership to it. To avoid
    unnecessary oscillation, the detection mechanism keeps track of
    history and gives out the switch verdict only if the foreign usage
    pattern is stable over a certain amount of time and/or writeback
    attempts.

    The detection mechanism has fairly low space and computation overhead.
    It adds 8 bytes to struct inode (one int and two u16's) and minimal
    amount of calculation per IO. The detection mechanism converges to
    the correct answer usually in several seconds of IO time when there's
    a clear majority dirtier. Even when there isn't, it can reach an
    acceptable answer fairly quickly under most circumstances.
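
    The fields in question sit in struct inode under
    CONFIG_CGROUP_WRITEBACK; the comments below paraphrase their roles
    (field names as merged in mainline):

    int     i_wb_frn_winner;        /* wb that would win an ownership switch */
    u16     i_wb_frn_avg_time;      /* smoothed per-IO writeback time */
    u16     i_wb_frn_history;       /* recent foreign/local verdicts */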

    Please see wbc_detach_inode() for more details.

    This patch only implements detection. Following patches will
    implement actual switching.

    v2: wbc_account_io() now checks whether the wbc is associated with a
    wb before dereferencing it. This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path. As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Wu Fengguang
    Cc: Greg Thelen
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, for cgroup writeback, the IO submission paths directly
    associate the bio's with the blkcg from inode_to_wb_blkcg_css();
    however, it'd be necessary to keep more writeback context to implement
    foreign inode writeback detection. wbc (writeback_control) is the
    natural fit for the extra context - it persists throughout the
    writeback of each inode and is passed all the way down to IO
    submission paths.

    This patch adds wbc_attach_and_unlock_inode(), wbc_detach_inode(), and
    wbc_attach_fdatawrite_inode() which are used to associate wbc with the
    inode being written back. IO submission paths now use wbc_init_bio()
    instead of directly associating bio's with blkcg themselves. This
    leaves inode_to_wb_blkcg_css() w/o any user. The function is removed.

    wbc currently only tracks the associated wb (bdi_writeback). Future
    patches will add more for foreign inode detection. The association is
    established under i_lock which will be depended upon when migrating
    foreign inodes to other wb's.

    As, currently, an inode-to-wb association never changes once
    established, going through the wbc when initializing bio's doesn't
    cause any behavior changes.

    v2: submit_blk_blkcg() now checks whether the wbc is associated with a
    wb before dereferencing it. This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path. As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.
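
    The resulting pattern for writing back one inode looks roughly like
    this (a sketch against the interface as introduced):

    struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL };

    spin_lock(&inode->i_lock);
    wbc_attach_and_unlock_inode(&wbc, inode);    /* drops i_lock */

    /* the IO submission paths underneath call wbc_init_bio() */
    ret = do_writepages(inode->i_mapping, &wbc);

    wbc_detach_inode(&wbc);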

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Wu Fengguang
    Cc: Greg Thelen
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __mpage_writepage() is used to implement mpage_writepages() which in
    turn is used for ->writepages() of various filesystems. All writeback
    logic is now updated to handle cgroup writeback and the block cgroup
    to issue IOs for is encoded in writeback_control and can be retrieved
    from the inode; however, __mpage_writepage() currently ignores the
    blkcg indicated by the inode and issues all bio's without explicit
    blkcg association.

    This patch updates __mpage_writepage() so that the issued bio's are
    associated with inode_to_writeback_blkcg_css(inode).

    v2: Updated for per-inode wb association.
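
    In __mpage_writepage() this amounts to tagging each newly allocated
    bio (a sketch against the v2, wbc-based interface):

    bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
                      bio_get_nr_vecs(bdev), GFP_NOFS|__GFP_HIGH);
    if (bio == NULL)
            goto confused;

    wbc_init_bio(wbc, bio);    /* associate the bio with the inode's blkcg */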

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Alexander Viro
    Signed-off-by: Jens Axboe

    Tejun Heo
     

10 Oct, 2014

1 commit

  • Add guard_bio_eod() check for mpage code in order to allow us to do IO
    even on the odd last sectors of a device, even if the block size is some
    multiple of the physical sector size.

    Using mpage_readpages() for a block device requires this guard check.

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

05 Jun, 2014

3 commits

    A block device driver may choose to provide a rw_page operation. These
    will be called when the filesystem is attempting to do page-sized I/O to
    page cache pages (i.e. not for direct I/O). This does preclude I/Os that
    are larger than page size, so this may only be a performance gain for
    some devices.
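
    At introduction the hook in struct block_device_operations had
    roughly this shape (sketch):

    /* optional: synchronously read or write a single page-cache page */
    int (*rw_page)(struct block_device *bdev, sector_t sector,
                   struct page *page, int rw);

    Callers go through bdev_read_page()/bdev_write_page(), which fall
    back to normal bio submission when a driver doesn't provide the
    method.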

    Signed-off-by: Matthew Wilcox
    Tested-by: Dheeraj Reddy
    Cc: Dave Chinner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • page_endio() takes care of updating all the appropriate page flags once
    I/O has finished to a page. Switch to using mapping_set_error() instead
    of setting AS_EIO directly; this will handle thin-provisioned devices
    correctly.
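
    A sketch of the helper's behavior:

    void page_endio(struct page *page, int rw, int err)
    {
            if (rw == READ) {
                    if (!err)
                            SetPageUptodate(page);
                    else {
                            ClearPageUptodate(page);
                            SetPageError(page);
                    }
                    unlock_page(page);
            } else {        /* WRITE */
                    if (err) {
                            SetPageError(page);
                            if (page->mapping)
                                    mapping_set_error(page->mapping, err);
                    }
                    end_page_writeback(page);
            }
    }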

    Signed-off-by: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Dheeraj Reddy
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
    __mpage_writepage() is over 200 lines long, has 20 local variables, four
    goto labels, and could desperately use simplification. Splitting
    clean_buffers() out into a helper function improves matters a little,
    removing 20+ lines from it.
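
    The extracted helper walks the page's buffers and cleans the ones
    that are about to be written (a simplified sketch):

    static void clean_buffers(struct page *page, unsigned first_unmapped)
    {
            unsigned buffer_counter = 0;
            struct buffer_head *bh, *head;

            if (!page_has_buffers(page))
                    return;
            head = page_buffers(page);
            bh = head;
            do {
                    if (buffer_counter++ == first_unmapped)
                            break;
                    clear_buffer_dirty(bh);
                    bh = bh->b_this_page;
            } while (bh != head);
    }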

    Signed-off-by: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Dheeraj Reddy
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.
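
    For fs/mpage.c the conversion is mechanical (illustrative):

    /* before */
    bio->bi_sector = first_sector;
    /* after */
    bio->bi_iter.bi_sector = first_sector;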

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     
    With immutable biovecs we don't want code accessing bi_io_vec directly.
    The uses this patch changes weren't incorrect, since they all own the
    bio, but direct access makes the code harder to audit for no good
    reason; also, this will help with multipage bvecs later.
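
    A typical conversion in a completion handler (illustrative fragment;
    finish_page() is a made-up per-page handler):

    /* before: walking bi_io_vec by hand */
    struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;

    /* after: the iterator macro hides the vector layout */
    struct bio_vec *bv;
    int i;

    bio_for_each_segment_all(bv, bio, i)
            finish_page(bv->bv_page);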

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: Jaegeuk Kim
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust

    Kent Overstreet
     

29 Feb, 2012

1 commit


12 Jan, 2012

1 commit


27 May, 2011

1 commit

  • This fourth patch of eight in this cleancache series provides the
    core hooks in VFS for: initializing cleancache per filesystem;
    capturing clean pages reclaimed by page cache; attempting to get
    pages from cleancache before filesystem read; and ensuring coherency
    between pagecache, disk, and cleancache. Note that the placement
    of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
    change was required due to a patchset in 2.6.39.

    All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
    a check of a boolean global if CONFIG_CLEANCACHE is set but no
    cleancache "backend" has claimed cleancache_ops.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt
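
    The read-side hook in do_mpage_readpage() is representative
    (a sketch):

    if (fully_mapped && blocks_per_page == 1 && !PageUptodate(page) &&
        cleancache_get_page(page) == 0) {
            /* page was filled from cleancache, skip the disk read */
            SetPageUptodate(page);
            goto confused;
    }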

    [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
    Signed-off-by: Chris Mason
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

10 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to
    eliminate code duplication.
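
    The merged handler simply branches on the bio direction (a sketch of
    the combined function):

    static void mpage_end_io(struct bio *bio, int err)
    {
            const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
            struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;

            do {
                    struct page *page = bvec->bv_page;

                    if (--bvec >= bio->bi_io_vec)
                            prefetchw(&bvec->bv_page->flags);
                    if (bio_data_dir(bio) == READ) {
                            if (uptodate)
                                    SetPageUptodate(page);
                            else {
                                    ClearPageUptodate(page);
                                    SetPageError(page);
                            }
                            unlock_page(page);
                    } else {        /* WRITE */
                            if (!uptodate) {
                                    SetPageError(page);
                                    if (page->mapping)
                                            set_bit(AS_EIO, &page->mapping->flags);
                            }
                            end_page_writeback(page);
                    }
            } while (bvec >= bio->bi_io_vec);
            bio_put(bio);
    }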

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hai Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hai Shan
     

30 Mar, 2010

1 commit

  • include cleanup: Update gfp.h and slab.h includes to prepare for
    breaking implicit slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

04 Feb, 2010

1 commit


14 May, 2009

1 commit

  • These struct buffer_heads are allocated on the stack (and hence are
    initialized with stack garbage). They are only used to call a
    get_blocks() function, so that's mostly OK, but b_state must be
    initialized to be 0 so we don't have any unexpected BH_* flags set by
    accident, such as BH_Unwritten or BH_Delay.
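
    The fix is two explicit initializations before the on-stack
    buffer_head is handed to get_block() (sketch):

    struct buffer_head map_bh;

    map_bh.b_state = 0;   /* no stray BH_Unwritten/BH_Delay from stack garbage */
    map_bh.b_size = 0;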

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     

01 Apr, 2009

1 commit

  • Commit 29a814d2ee0e43c2980f33f91c1311ec06c0aa35 (vfs: add hooks for
    ext4's delayed allocation support) exported the following functions

    mpage_bio_submit()
    __mpage_writepage()

    for the benefit of ext4's delayed allocation support. Since commit
    a1d6cc563bfdf1bf2829d3e6ce4d8b774251796b (ext4: Rework the
    ext4_da_writepages() function), these functions are not used by the
    ext4 driver anymore. However, the now unnecessary exports still
    remain, and this patch removes those. Moreover, these two functions
    can become static again.

    The issue was spotted by namespacecheck.

    Signed-off-by: Dmitri Vorobiev
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Dmitri Vorobiev
     

07 Jan, 2009

2 commits

  • It is known that buffer_mapped() is false in this code path.

    Signed-off-by: Franck Bui-Huu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Franck Bui-Huu
     
    While tracing I/O patterns with blktrace (a great tool) a few weeks ago,
    I identified a minor issue in fs/mpage.c.

    As the comment above mpage_readpages() says, a fs's get_block function
    will set BH_Boundary when it maps a block just before a block for which
    extra I/O is required.

    Since get_block() can map a range of pages, for all these pages the
    BH_Boundary flag will be set. But we only need to push what I/O we have
    accumulated at the last block of this range.
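
    Illustrative only (last_block_in_range is a made-up name): the
    submit moves behind a check that we are at the end of the mapped
    range.

    if (buffer_boundary(&map_bh) && block_in_file == last_block_in_range)
            bio = mpage_bio_submit(READ, bio);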

    This makes do_mpage_readpage() send out the largest possible bio instead
    of a bunch of page-sized ones in the BH_Boundary case.

    Signed-off-by: Miquel van Smoorenburg
    Cc: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miquel van Smoorenburg
     

17 Oct, 2008

1 commit


12 Jul, 2008

1 commit

    Export mpage_bio_submit() and __mpage_writepage() for the benefit of
    ext4's delayed allocation support. Also change __block_write_full_page
    so that, for buffers that have the BH_Delay flag set, it will call
    get_block() to get the physical block allocated, just as in the
    !BH_Mapped case.

    Signed-off-by: Alex Tomas
    Signed-off-by: "Theodore Ts'o"

    Alex Tomas
     

04 Mar, 2008

1 commit


06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 makes a lot of code not having to be dealing
    with the special casing for HIGHMEM anymore. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    function could not be a macro. The zero_user_* functions introduced
    here can be inline because that constant is not used when these
    functions are called.

    Also extract the flushing of the caches to be outside of the kmap.
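
    A sketch of the two-segment helper, assuming the HIGHMEM-style
    kmap_atomic() of this era:

    static inline void zero_user_segments(struct page *page,
            unsigned start1, unsigned end1,
            unsigned start2, unsigned end2)
    {
            void *kaddr = kmap_atomic(page, KM_USER0);

            if (end1 > start1)
                    memset(kaddr + start1, 0, end1 - start1);
            if (end2 > start2)
                    memset(kaddr + start2, 0, end2 - start2);

            kunmap_atomic(kaddr, KM_USER0);
            flush_dcache_page(page);    /* flush outside the kmap */
    }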

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Oct, 2007

1 commit

  • Quite a bit of code is used in maintaining these "cached pages" that are
    probably pretty unlikely to get used. It would require a narrow race where
    the page is inserted concurrently while this process is allocating a page
    in order to create the spare page. Then a multi-page write into an uncached
    part of the file, to make use of it.

    Next, the buffered write path (and others) uses its own LRU pagevec when it
    should be just using the per-CPU LRU pagevec (which will cut down on both data
    and code size cacheline footprint). Also, these private LRU pagevecs are
    emptied after just a very short time, in contrast with the per-CPU pagevecs
    that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
    to add the pages to pagecache for a bulk write (in 4K chunks).

    [this gets rid of some cond_resched() calls in readahead.c and mpage.c due
    to clashes in -mm. What put them there, and why? ]

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

10 Oct, 2007

1 commit

    As bi_end_io is only called once when the request is complete,
    the 'size' argument is now redundant. Remove it.

    Now there is no need for bio_endio to subtract the size completed
    from bi_size. So don't do that either.

    While we are at it, change bi_end_io to return void.
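
    The signature change, schematically:

    /* before: could be called several times as parts of the bio completed */
    int  (*bi_end_io)(struct bio *bio, unsigned int bytes_done, int error);

    /* after: called exactly once, when the whole bio completes */
    void (*bi_end_io)(struct bio *bio, int error);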

    Signed-off-by: Neil Brown
    Signed-off-by: Jens Axboe

    NeilBrown
     

11 May, 2007

1 commit

  • Clean up massive code duplication between mpage_writepages() and
    generic_writepages().

    The new generic function, write_cache_pages() takes a function pointer
    argument, which will be called for each page to be written.

    Maybe cifs_writepages() too can use this infrastructure, but I'm not
    touching that with a ten-foot pole.

    The upcoming page writeback support in fuse will also want this.
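
    With the new helper, mpage_writepages() collapses to little more
    than this (simplified sketch):

    int mpage_writepages(struct address_space *mapping,
                         struct writeback_control *wbc, get_block_t get_block)
    {
            struct mpage_data mpd = {
                    .bio            = NULL,
                    .get_block      = get_block,
                    .use_writepage  = 1,
            };
            int ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd);

            if (mpd.bio)
                    mpage_bio_submit(WRITE, mpd.bio);
            return ret;
    }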

    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi