08 Oct, 2016

1 commit


20 Sep, 2016

1 commit

  • Commit 62c230bc1790 ("mm: add support for a filesystem to activate
    swap files and use direct_IO for writing swap pages") replaced the
    swap_aops dirty hook from __set_page_dirty_no_writeback() with
    swap_set_page_dirty().

    For normal cases without these special SWP flags code path falls back to
    __set_page_dirty_no_writeback() so the behaviour is expected to be the
    same as before.

    But swap_set_page_dirty() makes use of the page_swap_info() helper to
    get the swap_info_struct to check for the flags like SWP_FILE,
    SWP_BLKDEV etc as desired for those features. This helper has
    BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
    set_page_dirty_lock() path.

    For the set_page_dirty() path which is often needed for cases to be
    called from irq context, kswapd() can toggle the flag behind the back
    while the call is getting executed when system is low on memory and
    heavy swapping is ongoing.

    This ends up with undesired kernel panic.

    This patch just moves the check outside the helper to its users
    appropriately to fix kernel panic for the described path. Couple of
    users of helpers already take care of SwapCache condition so I skipped
    them.

    Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com
    Signed-off-by: Santosh Shilimkar
    Cc: Mel Gorman
    Cc: Joe Perches
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: David S. Miller
    Cc: Jens Axboe
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Al Viro
    Cc: [4.7.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     

08 Aug, 2016

1 commit


29 Jul, 2016

1 commit

  • generic_swapfile_activate() can take quite long time, it iterates over
    all blocks of a file, so add cond_resched to it. I observed about 1
    second stalls when activating a swapfile that was almost unfragmented -
    this patch fixes it.

    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221710580.4818@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka
    Acked-by: Michal Hocko
    Cc: Hugh Dickins
    Cc: Alexander Viro
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

08 Jun, 2016

2 commits

  • This patch converts the simple bi_rw use cases in the block,
    drivers, mm and fs code to set/get the bio operation using
    bio_set_op_attrs/bio_op

    These should be simple one or two liner cases, so I just did them
    in one patch. The next patches handle the more complicated
    cases in a module per patch.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
    instead of passing it in. This makes that use the same as
    generic_make_request and how we set the other bio fields.

    Signed-off-by: Mike Christie

    Fixed up fs/ext4/crypto.c

    Signed-off-by: Jens Axboe

    Mike Christie
     

18 May, 2016

1 commit

  • Pull vfs cleanups from Al Viro:
    "More cleanups from Christoph"

    * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    nfsd: use RWF_SYNC
    fs: add RWF_DSYNC aand RWF_SYNC
    ceph: use generic_write_sync
    fs: simplify the generic_write_sync prototype
    fs: add IOCB_SYNC and IOCB_DSYNC
    direct-io: remove the offset argument to dio_complete
    direct-io: eliminate the offset argument to ->direct_IO
    xfs: eliminate the pos variable in xfs_file_dio_aio_write
    filemap: remove the pos argument to generic_file_direct_write
    filemap: remove pos variables in generic_file_read_iter

    Linus Torvalds
     

02 May, 2016

1 commit


29 Apr, 2016

1 commit

  • Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
    page_swap_info. The reason is that page_endio in rw_page unlocks the
    page if read I/O is completed so we need to hold a PG_lock again to
    check PageSwapCache. Otherwise, the page can be removed from swapcache.

    Kernel BUG at c00f9040 [verbose debug info unavailable]
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
    Modules linked in:
    CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73
    task: c3b73200 ti: dd192000 task.ti: dd192000
    PC is at page_swap_info+0x10/0x2c
    LR is at swap_slot_free_notify+0x18/0x6c
    pc : [] lr : [] psr: 400f0113
    sp : dd193d78 ip : c2deb1e4 fp : da015180
    r10: 00000000 r9 : 000200da r8 : c120fe08
    r7 : 00000000 r6 : 00000000 r5 : c249a6c0 r4 : = c249a6c0
    r3 : 00000000 r2 : 40080009 r1 : 200f0113 r0 : = c249a6c0
    .. ..
    Call Trace:
    page_swap_info+0x10/0x2c
    swap_slot_free_notify+0x18/0x6c
    swap_readpage+0x90/0x11c
    read_swap_cache_async+0x134/0x1ac
    swapin_readahead+0x70/0xb0
    handle_pte_fault+0x320/0x6fc
    handle_mm_fault+0xc0/0xf0
    do_page_fault+0x11c/0x36c
    do_DataAbort+0x34/0x118

    Fixes: 3f2b1a04f44933f2 ("zram: revive swap_slot_free_notify")
    Signed-off-by: Minchan Kim
    Tested-by: Kyeongdon Kim
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Mar, 2016

1 commit

  • Commit b430e9d1c6d4 ("remove compressed copy from zram in-memory")
    applied swap_slot_free_notify call in *end_swap_bio_read* to remove
    duplicated memory between zram and memory.

    However, with the introduction of rw_page in zram: 8c7f01025f7b ("zram:
    implement rw_page operation of zram"), it became void because rw_page
    doesn't need bio.

    Memory footprint is really important in embedded platforms which have
    small memory, for example, 512M) recently because it could start to kill
    processes if memory footprint exceeds some threshold by LMK or some
    similar memory management modules.

    This patch restores the function for rw_page, thereby eliminating this
    duplication.

    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: karam.lee
    Cc:
    Cc: Chan Jeong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

18 Mar, 2016

1 commit

  • Most of the mm subsystem uses pr_ so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
    "Kmemleak testing" logging message via pr_fmt

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

14 Aug, 2015

1 commit

  • Call pre-defined helper bio_add_page() instead of open coding for
    iterating through bi_io_vec[]. Doing that, it's possible to make some
    parts in filesystems and mm/page_io.c simpler than before.

    Acked-by: Dave Kleikamp
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Kent Overstreet
    [dpark: add more description in commit message]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

19 May, 2015

1 commit

  • Stop abusing struct page functionality and the swap end_io handler, and
    instead add a modified version of the blk-lib.c bio_batch helpers.

    Also move the block I/O code into swap.c as they are directly tied into
    each other.

    Signed-off-by: Christoph Hellwig
    Tested-by: Pavel Machek
    Tested-by: Ming Lin
    Acked-by: Pavel Machek
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Apr, 2015

1 commit


26 Mar, 2015

1 commit


13 Mar, 2015

1 commit


29 Jan, 2015

1 commit


15 Jun, 2014

1 commit

  • Tetsuo Handa wrote:
    "Commit 62a8067a7f35 ("bio_vec-backed iov_iter") introduced an unnamed
    union inside a struct which gcc-4.4.7 cannot handle. Name the unnamed
    union as u in order to fix build failure"

    Let's do this instead: there is only one place in the entire tree that
    steps into this breakage. Anon structs and unions work in older gcc
    versions; as the matter of fact, we have those in the tree - see e.g.
    struct ieee80211_tx_info in include/net/mac80211.h

    What doesn't work is handling their initializers:

    struct {
    int a;
    union {
    int b;
    char c;
    };
    } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};

    is the obvious syntax for initializer, perfectly fine for C11 and
    handled correctly by gcc-4.7 or later.

    Earlier versions, though, break on it - declaration is fine and so's
    access to fields (i.e. x[0].c = 'a'; would produce the right code), but
    members of the anon structs and unions are not inserted into the right
    namespace. Tellingly, those older versions will not barf on struct {int
    a; struct {int a;};}; - looks like they just have it hacked up somewhere
    around the handling of . and -> instead of doing the right thing.

    The easiest way to deal with that crap is to turn initialization of
    those fields (in the only place where we have such initializer of
    iov_iter) into plain assignment.

    Reported-by: Tetsuo Handa
    Reported-by: Russell King
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

05 Jun, 2014

1 commit

  • By calling the device driver to write the page directly, we avoid
    allocating a BIO, which allows us to free memory without allocating
    memory.

    [akpm@linux-foundation.org: fix used-uninitialized bug]
    Signed-off-by: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Dheeraj Reddy
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

07 May, 2014

3 commits

  • New variant of iov_iter - ITER_BVEC in iter->type, backed with
    bio_vec array instead of iovec one. Primitives taught to deal
    with such beasts, __swap_write() switched to using that kind
    of iov_iter.

    Note that bio_vec is just a triple - there's
    nothing block-specific about it. I've left the definition where it
    was, but took it from under ifdef CONFIG_BLOCK.

    Next target: ->splice_write()...

    Signed-off-by: Al Viro

    Al Viro
     
  • For now, just use the same thing we pass to ->direct_IO() - it's all
    iovec-based at the moment. Pass it explicitly to iov_iter_init() and
    account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
    uses.

    Signed-off-by: Al Viro

    Al Viro
     
  • unmodified, for now

    Signed-off-by: Al Viro

    Al Viro
     

31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_ve series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

24 Jan, 2014

1 commit

  • Most of the VM_BUG_ON assertions are performed on a page. Usually, when
    one of these assertions fails we'll get a BUG_ON with a call stack and
    the registers.

    I've recently noticed based on the requests to add a small piece of code
    that dumps the page to various VM_BUG_ON sites that the page dump is
    quite useful to people debugging issues in mm.

    This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
    VM_BUG_ON() does, also dumps the page before executing the actual
    BUG_ON.

    [akpm@linux-foundation.org: fix up includes]
    Signed-off-by: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     

30 Jul, 2013

1 commit

  • This code doesn't serve any purpose anymore, since the aio retry
    infrastructure has been removed.

    This change should be safe because aio_read/write are also used for
    synchronous IO, and called from do_sync_read()/do_sync_write() - and
    there's no looping done in the sync case (the read and write syscalls).

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Signed-off-by: Benjamin LaHaise

    Kent Overstreet
     

04 Jul, 2013

1 commit

  • Swap subsystem does lazy swap slot free with expecting the page would be
    swapped out again so we can avoid unnecessary write.

    But the problem in in-memory swap(ex, zram) is that it consumes memory
    space until vm_swap_full(ie, used half of all of swap device) condition
    meet. It could be bad if we use multiple swap device, small in-memory
    swap and big storage swap or in-memory swap alone.

    This patch makes swap subsystem free swap slot as soon as swap-read is
    completed and make the swapcache page dirty so the page should be
    written out the swap device to reclaim it. It means we never lose it.

    I tested this patch with kernel compile workload.

    1. before

    compile time : 9882.42
    zram max wasted space by fragmentation: 13471881 byte
    memory space consumed by zram: 174227456 byte
    the number of slot free notify: 206684

    2. after

    compile time : 9653.90
    zram max wasted space by fragmentation: 11805932 byte
    memory space consumed by zram: 154001408 byte
    the number of slot free notify: 426972

    [akpm@linux-foundation.org: tweak comment text]
    [artem.savkov@gmail.com: fix BUG due to non-swapcache pages in end_swap_bio_read()]
    [akpm@linux-foundation.org: invert unlikely() test, augment comment, 80-col cleanup]
    Signed-off-by: Dan Magenheimer
    Signed-off-by: Minchan Kim
    Signed-off-by: Artem Savkov
    Cc: Hugh Dickins
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Konrad Rzeszutek Wilk
    Cc: Shaohua Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

09 May, 2013

1 commit

  • Pull block core updates from Jens Axboe:

    - Major bit is Kents prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejuns changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework, SCSI patches exists on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

30 Apr, 2013

4 commits

  • As pointed out by Andrew Morton, the swap-over-NFS writeback is not
    setting PageWriteback before it is queued for direct IO. While swap
    pages do not participate in BDI or process dirty accounting and the IO
    is synchronous, the writeback bit is still required and not setting it
    in this case was an oversight. swapoff depends on the page writeback to
    synchronoise all pending writes on a swap page before it is reused.
    Swapcache freeing and reuse depend on checking the PageWriteback under
    lock to ensure the page is safe to reuse.

    Direct IO handlers and the direct IO handler for NFS do not deal with
    PageWriteback as they are synchronous writes. In the case of NFS, it
    schedules pages (or a page in the case of swap) for IO and then waits
    synchronously for IO to complete in nfs_direct_write(). It is
    recognised that this is a slowdown from normal swap handling which is
    asynchronous and uses a completion handler. Shoving PageWriteback
    handling down into direct IO handlers looks like a bad fit to handle the
    swap case although it may have to be dealt with some day if swap is
    converted to use direct IO in general and bmap is finally done away
    with. At that point it will be necessary to refit asynchronous direct
    IO with completion handlers onto the swap subsystem.

    As swapcache currently depends on PageWriteback to protect against
    races, this patch sets PageWriteback under the page lock before queueing
    it for direct IO. It is cleared when the direct IO handler returns. IO
    errors are treated similarly to the direct-to-bio case except PageError
    is not set as in the case of swap-over-NFS, it is likely to be a
    transient error.

    It was asked what prevents such a page being reclaimed in parallel.
    With this patch applied, such a page will now be skipped (most of the
    time) or blocked until the writeback completes. Reclaim checks
    PageWriteback under the page lock before calling try_to_free_swap and
    the page lock should prevent the page being requeued for IO before it is
    freed.

    This and Jerome's related patch should considered for -stable as far
    back as 3.6 when swap-over-NFS was introduced.

    [akpm@linux-foundation.org: use pr_err_ratelimited()]
    [akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
    Signed-off-by: Mel Gorman
    Cc: Jerome Marchand
    Cc: Hugh Dickins
    Cc: [3.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Since commit 62c230bc1790 ("mm: add support for a filesystem to activate
    swap files and use direct_IO for writing swap pages"), swap_writepage()
    calls direct_IO on swap files. However, in that case the page isn't
    redirtied if I/O fails, and is therefore handled afterwards as if it has
    been successfully written to the swap file, leading to memory corruption
    when the page is eventually swapped back in.

    This patch sets the page dirty when direct_IO() fails. It fixes a
    memory corruption that happened while using swap-over-NFS.

    Signed-off-by: Jerome Marchand
    Acked-by: Johannes Weiner
    Acked-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: [3.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • To prevent flooding the swap device with writebacks, frontswap backends
    need to count and limit the number of outstanding writebacks. The
    incrementing of the counter can be done before the call to
    __swap_writepage(). However, the caller must receive a notification
    when the writeback completes in order to decrement the counter.

    To achieve this functionality, this patch modifies __swap_writepage() to
    take the bio completion callback function as an argument.

    end_swap_bio_write(), the normal bio completion function, is also made
    non-static so that code doing the accounting can call it after the
    accounting is done.

    There should be no behavioural change to existing code.

    Signed-off-by: Seth Jennings
    Signed-off-by: Bob Liu
    Acked-by: Minchan Kim
    Reviewed-by: Dan Magenheimer
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Seth Jennings
     
  • swap_writepage() is currently where frontswap hooks into the swap write
    path to capture pages with the frontswap_store() function. However, if
    a frontswap backend wants to "resume" the writeback of a page to the
    swap device, it can't call swap_writepage() as the page will simply
    reenter the backend.

    This patch separates swap_writepage() into a top and bottom half, the
    bottom half named __swap_writepage() to allow a frontswap backend, like
    zswap, to resume writeback beyond the frontswap_store() hook.

    __add_to_swap_cache() is also made non-static so that the page for which
    writeback is to be resumed can be added to the swap cache.

    Signed-off-by: Seth Jennings
    Signed-off-by: Bob Liu
    Acked-by: Minchan Kim
    Reviewed-by: Dan Magenheimer
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Seth Jennings
     

24 Mar, 2013

1 commit

  • For immutable bvecs, all bi_idx usage needs to be audited - so here
    we're removing all the unnecessary uses.

    Most of these are places where it was being initialized on a bio that
    was just allocated, a few others are conversions to standard macros.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe

    Kent Overstreet
     

01 Aug, 2012

3 commits

  • The patch "mm: add support for a filesystem to activate swap files and use
    direct_IO for writing swap pages" added support for using direct_IO to
    write swap pages but it is insufficient for highmem pages.

    To support highmem pages, this patch kmaps() the page before calling the
    direct_IO() handler. As direct_IO deals with virtual addresses an
    additional helper is necessary for get_kernel_pages() to lookup the struct
    page for a kmap virtual address.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The version of swap_activate introduced is sufficient for swap-over-NFS
    but would not provide enough information to implement a generic handler.
    This patch shuffles things slightly to ensure the same information is
    available for aops->swap_activate() as is available to the core.

    No functionality change.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Currently swapfiles are managed entirely by the core VM by using ->bmap to
    allocate space and write to the blocks directly. This effectively ensures
    that the underlying blocks are allocated and avoids the need for the swap
    subsystem to locate what physical blocks store offsets within a file.

    If the swap subsystem is to use the filesystem information to locate the
    blocks, it is critical that information such as block groups, block
    bitmaps and the block descriptor table that map the swap file were
    resident in memory. This patch adds address_space_operations that the VM
    can call when activating or deactivating swap backed by a file.

    int swap_activate(struct file *);
    int swap_deactivate(struct file *);

    The ->swap_activate() method is used to communicate to the file that the
    VM relies on it, and the address_space should take adequate measures such
    as reserving space in the underlying device, reserving memory for mempools
    and pinning information such as the block descriptor table in memory. The
    ->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
    returned success.

    After a successful swapfile ->swap_activate, the swapfile is marked
    SWP_FILE and swapper_space.a_ops will proxy to
    sis->swap_file->f_mappings->a_ops using ->direct_io to write swapcache
    pages and ->readpage to read.

    It is perfectly possible that direct_IO be used to read the swap pages but
    it is an unnecessary complication. Similarly, it is possible that
    ->writepage be used instead of direct_io to write the pages but filesystem
    developers have stated that calling writepage from the VM is undesirable
    for a variety of reasons and using direct_IO opens up the possibility of
    writing back batches of swap pages in the future.

    [a.p.zijlstra@chello.nl: Original patch]
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman