13 Jan, 2012

2 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check;

    if (PageDirty(page) && !sync &&
    mapping->a_ops->migratepage != migrate_page)
    rc = -EBUSY;

    This skips over all mapping aops using buffer_migrate_page() even though
    it is possible to migrate some of these pages without blocking. This
    patch updates the ->migratepage callback with a "sync" parameter. It is
    the responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

06 Jan, 2012

1 commit

  • We have no business doing any this in the standard write release path.
    Get rid of it, and put it in the pNFS layer.

    Also, while we're at it, get rid of the completely bogus unlock/relock
    semantics that were present in nfs_writeback_release_full(). It is
    not only unnecessary, but actually dangerous to release the write lock
    just in order to take it again in nfs_page_async_flush(). Better just
    to open code the pgio operations in a pnfs helper.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • When CONFIG_NFS=y and CONFIG_NFS_V3_{,V4}=n we get the following warning.

    fs/nfs/write.c: In function ‘nfs_writeback_done’:
    fs/nfs/write.c:1246:21: warning: unused variable ‘server’

    Remove the variable 'server' to fix the above warning.

    Signed-off-by: Rakib Mullick
    Signed-off-by: Trond Myklebust

    Rakib Mullick
     

01 Nov, 2011

1 commit


20 Oct, 2011

2 commits


19 Oct, 2011

5 commits

  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to
    mds.

    Signed-off-by: Peng Tao
    Signed-off-by: Jim Rees
    Cc: stable@kernel.org [3.0]
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • nfs_find_and_lock_request will take a reference to the nfs_page and
    will then put it if the req is already locked. It's possible though
    that the reference will be the last one. That put then can kick off
    a whole series of reference puts:

    nfs_page
    nfs_open_context
    dentry
    inode

    If the inode ends up being deleted, then the VFS will call
    truncate_inode_pages. That function will try to take the page lock, but
    it was already locked when migrate_page was called. The code
    deadlocks.

    Fix this by simply refusing the migration request if PagePrivate is
    already set, indicating that the page is already associated with an
    active read or write request.

    We've had a customer test a backported version of this patch and
    the preliminary results seem good.

    Cc: stable@kernel.org
    Cc: Andrea Arcangeli
    Reported-by: Harshula Jayasuriya
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • commit 420e3646 allowed the kernel to reduce the number of unnecessary
    commit calls by skipping the commit when there are a large number of
    outstanding pages.

    However, the current test in nfs_commit_unstable_pages does not handle
    the edge condition properly. When ncommit == 0, then that means that the
    kernel doesn't need to do anything more for the inode. The current test
    though in the WB_SYNC_NONE case will return true, and the inode will end
    up being marked dirty. Once that happens the inode will never be clean
    until there's a WB_SYNC_ALL flush.

    Fix this by immediately returning from nfs_commit_unstable_pages when
    ncommit == 0.

    Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
    has a backported version of 420e3646. The inode cache there was growing
    very large. The inode cache was unable to be shrunk since the inodes
    were all marked dirty. Calling sync() would essentially "fix" the
    problem -- the WB_SYNC_ALL flush would result in the inodes all being
    marked clean.

    What I'm not clear on is how big a problem this is in mainline kernels
    as the writeback code there is very different. Either way, it seems
    incorrect to re-mark the inode dirty in this case.

    Reported-by: Mike McLean
    Signed-off-by: Jeff Layton
    Cc: stable@kernel.org [2.6.34+]
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2.

    The reverted commit was rendered obsolete by a VFS fix: commit
    5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in
    two steps). We now no longer need to worry about writeback_single_inode()
    missing our marking the inode for COMMIT in 'do_writepages()' call.

    Reverting this patch, fixes a performance regression in which the inode
    would continuously get queued to the dirty list, causing the writeback
    code to unnecessarily try to send a COMMIT.

    Signed-off-by: Trond Myklebust
    Tested-by: Simon Kirby
    Cc: stable@kernel.org [2.6.35+]

    Trond Myklebust
     

14 Sep, 2011

1 commit


28 Jul, 2011

1 commit

  • * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
    NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
    nfs: don't use d_move in nfs_async_rename_done
    RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
    SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
    SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
    SUNRPC: Support dynamic slot allocation for TCP connections
    SUNRPC: Clean up the slot table allocation
    SUNRPC: Initalise the struct xprt upon allocation
    SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
    pnfs: simplify pnfs files module autoloading
    nfs: document nfsv4 sillyrename issues
    NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
    SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
    SUNRPC: sunrpc should not explicitly depend on NFS config options
    NFS: Clean up - simplify the switch to read/write-through-MDS
    NFS: Move the pnfs write code into pnfs.c
    NFS: Move the pnfs read code into pnfs.c
    NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
    NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
    NFS: Cache rpc_ops in struct nfs_pageio_descriptor
    ...

    Linus Torvalds
     

27 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
    mm: properly reflect task dirty limits in dirty_exceeded logic
    writeback: don't busy retry writeback on new/freeing inodes
    writeback: scale IO chunk size up to half device bandwidth
    writeback: trace global_dirty_state
    writeback: introduce max-pause and pass-good dirty limits
    writeback: introduce smoothed global dirty limit
    writeback: consolidate variable names in balance_dirty_pages()
    writeback: show bdi write bandwidth in debugfs
    writeback: bdi write bandwidth estimation
    writeback: account per-bdi accumulated written pages
    writeback: make writeback_control.nr_to_write straight
    writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
    writeback: trace event writeback_queue_io
    writeback: trace event writeback_single_inode
    writeback: remove .nonblocking and .encountered_congestion
    writeback: remove writeback_control.more_io
    writeback: skip balance_dirty_pages() for in-memory fs
    writeback: add bdi_dirty_limit() kernel-doc
    writeback: avoid extra sync work at enqueue time
    writeback: elevate queue_io() into wb_writeback()
    ...

    Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c

    Linus Torvalds
     

26 Jul, 2011

1 commit


20 Jul, 2011

1 commit


15 Jul, 2011

6 commits


13 Jul, 2011

6 commits


29 Jun, 2011

1 commit

  • In current pnfs tree, all the layouts set mds_offset in their
    .write_pagelist member.
    mds_offset is only used by generic layer and should be handled by it.

    This patch is for upstream. It is needed in this -rc series to fix a
    bug in objects layout_commit.

    I'll send patches for objects and blocks to be
    squashed into current pnfs tree.

    TODO: It looks like the read path needs the same patch.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     

08 Jun, 2011

1 commit

  • Remove two unused struct writeback_control fields:

    .encountered_congestion (completely unused)
    .nonblocking (never set, checked/showed in XFS,NFS/btrfs)

    The .for_background check in nfs_write_inode() is also removed btw,
    as .for_background implies WB_SYNC_NONE.

    Reviewed-by: Jan Kara
    Proposed-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     

30 May, 2011

2 commits

  • Use common code for pnfs_pageio_init_{read,write} and use
    a common generic pg_test function.

    Note that this function always assumes the the layout driver's
    pg_test method is implemented.

    [Fix BUG]
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Benny Halevy

    Benny Halevy
     
  • Add offset and count parameters to pnfs_update_layout and use them to get
    the layout in the pageio path.

    Order cache layout segments in the following order:
    * offset (ascending)
    * length (descending)
    * iomode (RW before READ)

    Test byte range against the layout segment in use in pnfs_{read,write}_pg_test
    so not to coalesce pages not using the same layout segment.

    [fix lseg ordering]
    [clean up pnfs_find_lseg lseg arg]
    [remove unnecessary FIXME]
    [fix ordering in pnfs_insert_layout]
    [clean up pnfs_insert_layout]
    Signed-off-by: Benny Halevy

    Benny Halevy
     

12 May, 2011

1 commit


13 Apr, 2011

3 commits


27 Mar, 2011

1 commit

  • Now that the inode scalability patches have been merged, it is no longer
    safe to call igrab() under the inode->i_lock.
    Now that we no longer call nfs_clear_request() until the nfs_page is
    being freed, we know that we are always holding a reference to the
    nfs_open_context, which again holds a reference to the path, and so
    the inode cannot be freed until the last nfs_page has been removed
    from the radix tree and freed.

    We can therefore skip the igrab()/iput() altogether.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

25 Mar, 2011

1 commit