Commit 62c230bc1790923a1b35da03596a68a6c9b5b100

Authored by Mel Gorman
Committed by Linus Torvalds
1 parent 18022c5d86

mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages

Currently swapfiles are managed entirely by the core VM by using ->bmap to
allocate space and write to the blocks directly.  This effectively ensures
that the underlying blocks are allocated and avoids the need for the swap
subsystem to locate what physical blocks store offsets within a file.

If the swap subsystem is to use the filesystem information to locate the
blocks, it is critical that information such as block groups, block
bitmaps and the block descriptor table that map the swap file be
resident in memory.  This patch adds address_space_operations that the VM
can call when activating or deactivating swap backed by a file.

  int swap_activate(struct file *);
  int swap_deactivate(struct file *);

The ->swap_activate() method is used to communicate to the file that the
VM relies on it, and the address_space should take adequate measures such
as reserving space in the underlying device, reserving memory for mempools
and pinning information such as the block descriptor table in memory.  The
->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
returned success.
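
As an illustration only, the activate/deactivate pairing described above can
be modelled in userspace.  This is a sketch, not kernel code: the struct
definitions are cut-down stand-ins, and "examplefs", "metadata_pinned",
model_swapon() and model_swapoff() are hypothetical names that do not appear
in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SWP_FILE (1 << 7)	/* same bit the patch adds to enum in swap.h */

struct file;

/* Userspace stand-in holding only the two hooks this patch adds. */
struct address_space_operations {
	int (*swap_activate)(struct file *file);
	int (*swap_deactivate)(struct file *file);
};

/* Stand-in for struct file; the kernel reaches a_ops via f_mapping. */
struct file {
	const struct address_space_operations *a_ops;
	int metadata_pinned;	/* hypothetical per-file state */
};

/* Stand-in for struct swap_info_struct. */
struct swap_info {
	struct file *swap_file;
	unsigned long flags;
};

/* Hypothetical filesystem hook: pin whatever is needed to find blocks. */
int examplefs_swap_activate(struct file *file)
{
	file->metadata_pinned = 1;
	return 0;		/* zero means the file may back swap */
}

/* Hypothetical filesystem hook: undo what swap_activate did. */
int examplefs_swap_deactivate(struct file *file)
{
	file->metadata_pinned = 0;
	return 0;
}

const struct address_space_operations examplefs_aops = {
	.swap_activate = examplefs_swap_activate,
	.swap_deactivate = examplefs_swap_deactivate,
};

/* Models the swapon side: mark SWP_FILE only if the hook succeeds. */
int model_swapon(struct swap_info *sis)
{
	const struct address_space_operations *aops = sis->swap_file->a_ops;
	int ret;

	if (!aops->swap_activate)
		return 0;	/* the kernel would use the ->bmap path here */

	ret = aops->swap_activate(sis->swap_file);
	if (!ret)
		sis->flags |= SWP_FILE;
	return ret;
}

/* Models the swapoff side: deactivate only after a successful activate. */
void model_swapoff(struct swap_info *sis)
{
	const struct address_space_operations *aops = sis->swap_file->a_ops;

	if (sis->flags & SWP_FILE) {
		/* The patch calls the hook unconditionally in this branch. */
		assert(aops->swap_deactivate != NULL);
		sis->flags &= ~SWP_FILE;
		aops->swap_deactivate(sis->swap_file);
	}
}
```

In this model SWP_FILE is the only record that activation succeeded, which
is why the swapoff side tests the flag rather than the presence of the hook.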

After a successful swapfile ->swap_activate(), the swapfile is marked
SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mapping->a_ops using ->direct_IO to write swapcache
pages and ->readpage to read.
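
The proxying can be pictured with a small userspace model.  This is a
hedged sketch under simplifying assumptions: the hook signatures are reduced
to a buffer and an offset rather than the kernel's kiocb/iovec and struct
page arguments, and "memfs", the model_* names and MODEL_USE_BIO are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define SWP_FILE (1 << 7)
#define MODEL_USE_BIO 1		/* hypothetical: caller takes the bio path */

struct model_file;

/* Simplified hooks; the real ones take kiocb/iovec and struct page. */
struct model_aops {
	long (*direct_IO)(struct model_file *f, const void *buf,
			  size_t len, long pos);
	int (*readpage)(struct model_file *f, void *buf, long pos);
};

struct model_file {
	const struct model_aops *a_ops;
	unsigned char data[4 * MODEL_PAGE_SIZE];	/* backing store */
};

struct model_swap_info {
	struct model_file *swap_file;
	unsigned long flags;
	unsigned long pswpout, pswpin;	/* stand-ins for the vm events */
};

/* Hypothetical filesystem write hook: returns bytes written or -1. */
long memfs_direct_IO(struct model_file *f, const void *buf,
		     size_t len, long pos)
{
	if (pos < 0 || (size_t)pos + len > sizeof(f->data))
		return -1;
	memcpy(f->data + pos, buf, len);
	return (long)len;
}

/* Hypothetical filesystem read hook: returns 0 on success. */
int memfs_readpage(struct model_file *f, void *buf, long pos)
{
	if (pos < 0 || (size_t)pos + MODEL_PAGE_SIZE > sizeof(f->data))
		return -1;
	memcpy(buf, f->data + pos, MODEL_PAGE_SIZE);
	return 0;
}

const struct model_aops memfs_aops = {
	.direct_IO = memfs_direct_IO,
	.readpage = memfs_readpage,
};

/* Write side: a full-page write is normalised to 0 and counted. */
int model_swap_writepage(struct model_swap_info *sis, const void *page,
			 long pos)
{
	assert(sis->swap_file != NULL);
	if (sis->flags & SWP_FILE) {
		struct model_file *f = sis->swap_file;
		long ret = f->a_ops->direct_IO(f, page, MODEL_PAGE_SIZE, pos);

		if (ret == MODEL_PAGE_SIZE) {
			sis->pswpout++;
			ret = 0;
		}
		return (int)ret;	/* anything else is passed through */
	}
	return MODEL_USE_BIO;
}

/* Read side: success is counted, errors are returned unchanged. */
int model_swap_readpage(struct model_swap_info *sis, void *page, long pos)
{
	assert(sis->swap_file != NULL);
	if (sis->flags & SWP_FILE) {
		struct model_file *f = sis->swap_file;
		int ret = f->a_ops->readpage(f, page, pos);

		if (!ret)
			sis->pswpin++;
		return ret;
	}
	return MODEL_USE_BIO;
}
```

The write side converts a byte count into the 0-on-success convention that
writepage callers expect, while the read hook already follows that
convention and needs no translation.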

It is perfectly possible to use direct_IO to read the swap pages as well,
but that is an unnecessary complication.  Similarly, ->writepage could be
used instead of direct_IO to write the pages, but filesystem developers
have stated that calling writepage from the VM is undesirable for a
variety of reasons, and using direct_IO opens up the possibility of
writing back batches of swap pages in the future.

[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 7 changed files with 105 additions and 3 deletions

Documentation/filesystems/Locking
... ... @@ -206,6 +206,8 @@
206 206 int (*launder_page)(struct page *);
207 207 int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
208 208 int (*error_remove_page)(struct address_space *, struct page *);
  209 + int (*swap_activate)(struct file *);
  210 + int (*swap_deactivate)(struct file *);
209 211  
210 212 locking rules:
211 213 All except set_page_dirty and freepage may block
... ... @@ -229,6 +231,8 @@
229 231 launder_page: yes
230 232 is_partially_uptodate: yes
231 233 error_remove_page: yes
  234 +swap_activate: no
  235 +swap_deactivate: no
232 236  
233 237 ->write_begin(), ->write_end(), ->sync_page() and ->readpage()
234 238 may be called from the request handler (/dev/loop).
... ... @@ -329,6 +333,15 @@
329 333 cleaned, or an error value if not. Note that in order to prevent the page
330 334 getting mapped back in and redirtied, it needs to be kept locked
331 335 across the entire operation.
  336 +
  337 + ->swap_activate will be called with a non-zero argument on
  338 +files backing (non block device backed) swapfiles. A return value
  339 +of zero indicates success, in which case this file can be used for
  340 +backing swapspace. The swapspace operations will be proxied to the
  341 +address space operations.
  342 +
  343 + ->swap_deactivate() will be called in the sys_swapoff()
  344 +path after ->swap_activate() returned success.
332 345  
333 346 ----------------------- file_lock_operations ------------------------------
334 347 prototypes:
Documentation/filesystems/vfs.txt
... ... @@ -592,6 +592,8 @@
592 592 int (*migratepage) (struct page *, struct page *);
593 593 int (*launder_page) (struct page *);
594 594 int (*error_remove_page) (struct mapping *mapping, struct page *page);
  595 + int (*swap_activate)(struct file *);
  596 + int (*swap_deactivate)(struct file *);
595 597 };
596 598  
597 599 writepage: called by the VM to write a dirty page to backing store.
... ... @@ -759,6 +761,16 @@
759 761 is ok for this address space. Used for memory failure handling.
760 762 Setting this implies you deal with pages going away under you,
761 763 unless you have them locked or reference counts increased.
  764 +
  765 + swap_activate: Called when swapon is used on a file to allocate
  766 + space if necessary and pin the block lookup information in
  767 + memory. A return value of zero indicates success,
  768 + in which case this file can be used to back swapspace. The
  769 + swapspace operations will be proxied to this address space's
  770 + ->swap_{out,in} methods.
  771 +
  772 + swap_deactivate: Called during swapoff on files where swap_activate
  773 + was successful.
762 774  
763 775  
764 776 The File Object
include/linux/fs.h
... ... @@ -638,6 +638,10 @@
638 638 int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
639 639 unsigned long);
640 640 int (*error_remove_page)(struct address_space *, struct page *);
  641 +
  642 + /* swapfile support */
  643 + int (*swap_activate)(struct file *file);
  644 + int (*swap_deactivate)(struct file *file);
641 645 };
642 646  
643 647 extern const struct address_space_operations empty_aops;
include/linux/swap.h
... ... @@ -151,6 +151,7 @@
151 151 SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
152 152 SWP_CONTINUED = (1 << 5), /* swap_map has count continuation */
153 153 SWP_BLKDEV = (1 << 6), /* its a block device */
  154 + SWP_FILE = (1 << 7), /* set after swap_activate success */
154 155 /* add others here before... */
155 156 SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
156 157 };
... ... @@ -320,6 +321,7 @@
320 321 /* linux/mm/page_io.c */
321 322 extern int swap_readpage(struct page *);
322 323 extern int swap_writepage(struct page *page, struct writeback_control *wbc);
  324 +extern int swap_set_page_dirty(struct page *page);
323 325 extern void end_swap_bio_read(struct bio *bio, int err);
324 326  
325 327 /* linux/mm/swap_state.c */
mm/page_io.c
... ... @@ -17,6 +17,7 @@
17 17 #include <linux/swap.h>
18 18 #include <linux/bio.h>
19 19 #include <linux/swapops.h>
  20 +#include <linux/buffer_head.h>
20 21 #include <linux/writeback.h>
21 22 #include <linux/frontswap.h>
22 23 #include <asm/pgtable.h>
... ... @@ -94,6 +95,7 @@
94 95 {
95 96 struct bio *bio;
96 97 int ret = 0, rw = WRITE;
  98 + struct swap_info_struct *sis = page_swap_info(page);
97 99  
98 100 if (try_to_free_swap(page)) {
99 101 unlock_page(page);
... ... @@ -105,6 +107,32 @@
105 107 end_page_writeback(page);
106 108 goto out;
107 109 }
  110 +
  111 + if (sis->flags & SWP_FILE) {
  112 + struct kiocb kiocb;
  113 + struct file *swap_file = sis->swap_file;
  114 + struct address_space *mapping = swap_file->f_mapping;
  115 + struct iovec iov = {
  116 + .iov_base = page_address(page),
  117 + .iov_len = PAGE_SIZE,
  118 + };
  119 +
  120 + init_sync_kiocb(&kiocb, swap_file);
  121 + kiocb.ki_pos = page_file_offset(page);
  122 + kiocb.ki_left = PAGE_SIZE;
  123 + kiocb.ki_nbytes = PAGE_SIZE;
  124 +
  125 + unlock_page(page);
  126 + ret = mapping->a_ops->direct_IO(KERNEL_WRITE,
  127 + &kiocb, &iov,
  128 + kiocb.ki_pos, 1);
  129 + if (ret == PAGE_SIZE) {
  130 + count_vm_event(PSWPOUT);
  131 + ret = 0;
  132 + }
  133 + return ret;
  134 + }
  135 +
108 136 bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
109 137 if (bio == NULL) {
110 138 set_page_dirty(page);
... ... @@ -126,6 +154,7 @@
126 154 {
127 155 struct bio *bio;
128 156 int ret = 0;
  157 + struct swap_info_struct *sis = page_swap_info(page);
129 158  
130 159 VM_BUG_ON(!PageLocked(page));
131 160 VM_BUG_ON(PageUptodate(page));
... ... @@ -134,6 +163,17 @@
134 163 unlock_page(page);
135 164 goto out;
136 165 }
  166 +
  167 + if (sis->flags & SWP_FILE) {
  168 + struct file *swap_file = sis->swap_file;
  169 + struct address_space *mapping = swap_file->f_mapping;
  170 +
  171 + ret = mapping->a_ops->readpage(swap_file, page);
  172 + if (!ret)
  173 + count_vm_event(PSWPIN);
  174 + return ret;
  175 + }
  176 +
137 177 bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
138 178 if (bio == NULL) {
139 179 unlock_page(page);
... ... @@ -144,5 +184,17 @@
144 184 submit_bio(READ, bio);
145 185 out:
146 186 return ret;
  187 +}
  188 +
  189 +int swap_set_page_dirty(struct page *page)
  190 +{
  191 + struct swap_info_struct *sis = page_swap_info(page);
  192 +
  193 + if (sis->flags & SWP_FILE) {
  194 + struct address_space *mapping = sis->swap_file->f_mapping;
  195 + return mapping->a_ops->set_page_dirty(page);
  196 + } else {
  197 + return __set_page_dirty_no_writeback(page);
  198 + }
147 199 }
mm/swap_state.c
... ... @@ -27,7 +27,7 @@
27 27 */
28 28 static const struct address_space_operations swap_aops = {
29 29 .writepage = swap_writepage,
30   - .set_page_dirty = __set_page_dirty_no_writeback,
  30 + .set_page_dirty = swap_set_page_dirty,
31 31 .migratepage = migrate_page,
32 32 };
33 33  
mm/swapfile.c
... ... @@ -1329,6 +1329,14 @@
1329 1329 list_del(&se->list);
1330 1330 kfree(se);
1331 1331 }
  1332 +
  1333 + if (sis->flags & SWP_FILE) {
  1334 + struct file *swap_file = sis->swap_file;
  1335 + struct address_space *mapping = swap_file->f_mapping;
  1336 +
  1337 + sis->flags &= ~SWP_FILE;
  1338 + mapping->a_ops->swap_deactivate(swap_file);
  1339 + }
1332 1340 }
1333 1341  
1334 1342 /*
... ... @@ -1410,7 +1418,9 @@
1410 1418 */
1411 1419 static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
1412 1420 {
1413   - struct inode *inode;
  1421 + struct file *swap_file = sis->swap_file;
  1422 + struct address_space *mapping = swap_file->f_mapping;
  1423 + struct inode *inode = mapping->host;
1414 1424 unsigned blocks_per_page;
1415 1425 unsigned long page_no;
1416 1426 unsigned blkbits;
1417 1427  
... ... @@ -1421,10 +1431,19 @@
1421 1431 int nr_extents = 0;
1422 1432 int ret;
1423 1433  
1424   - inode = sis->swap_file->f_mapping->host;
1425 1434 if (S_ISBLK(inode->i_mode)) {
1426 1435 ret = add_swap_extent(sis, 0, sis->max, 0);
1427 1436 *span = sis->pages;
  1437 + goto out;
  1438 + }
  1439 +
  1440 + if (mapping->a_ops->swap_activate) {
  1441 + ret = mapping->a_ops->swap_activate(swap_file);
  1442 + if (!ret) {
  1443 + sis->flags |= SWP_FILE;
  1444 + ret = add_swap_extent(sis, 0, sis->max, 0);
  1445 + *span = sis->pages;
  1446 + }
1428 1447 goto out;
1429 1448 }
1430 1449