29 Nov, 2007

1 commit

  • tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
    which shmem_getpage receives the page from its caller instead of allocating.
    We must cover this case by clear_highpage before SetPageUptodate, as before.
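
    A toy user-space sketch of the fix (a small buffer stands in for a
    highmem page, and the struct and function names are hypothetical; the
    real code calls clear_highpage and SetPageUptodate on a struct page):

```c
#include <assert.h>
#include <string.h>

enum { PAGE_SZ = 64 };            /* toy page size, for illustration */

struct page_sketch {
    unsigned char data[PAGE_SZ];
    int uptodate;
};

/* When the caller supplies the page, it was not allocated with
 * __GFP_ZERO, so it must be cleared before being marked up to date. */
static void shmem_getpage_finish_sketch(struct page_sketch *page,
                                        int page_from_caller)
{
    if (page_from_caller)
        memset(page->data, 0, PAGE_SZ);   /* clear_highpage(page) */
    page->uptodate = 1;                   /* SetPageUptodate(page) */
}
```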

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Oct, 2007

1 commit

  • It's possible to provoke unionfs (not yet in mainline, though in mm and
    some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)). I expect
    it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
    2.6.24 ecryptfs no longer calls lower level's ->writepage).

    This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
    leak from tmpfs via write_cache_pages and unionfs to userspace. There's
    already a fix (e423003028183df54f039dfda8b58c49e78c89d7 - writeback: don't
    propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
    far as it goes; but insufficient because it doesn't address the underlying
    issue, that shmem_writepage expects to be called only by vmscan (relying on
    backing_dev_info capabilities to prevent the normal writeback path from
    ever approaching it).

    That's an increasingly fragile assumption, and ramdisk_writepage (the other
    source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
    wbc->for_reclaim before returning it. Make the same check in
    shmem_writepage, thereby sidestepping the page_mapped BUG also.
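
    A user-space sketch of that check (simplified stand-ins for the kernel
    types; the real shmem_writepage redirties the page and returns success
    when wbc->for_reclaim is clear):

```c
#include <assert.h>

#define AOP_WRITEPAGE_ACTIVATE 0x80000   /* illustrative value */

struct writeback_control {
    int for_reclaim;   /* set when the caller is vmscan's reclaim path */
};

/* If we were not called for reclaim, redirty the page and report
 * success, instead of leaking AOP_WRITEPAGE_ACTIVATE to the normal
 * writeback path (which does not understand it). */
static int shmem_writepage_sketch(struct writeback_control *wbc,
                                  int *redirtied)
{
    if (!wbc->for_reclaim) {
        *redirtied = 1;                  /* set_page_dirty(page) */
        return 0;
    }
    /* ... real work: move the page towards swap ... */
    return AOP_WRITEPAGE_ACTIVATE;       /* only vmscan handles this */
}
```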

    Signed-off-by: Hugh Dickins
    Cc: Erez Zadok
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Oct, 2007

2 commits

  • Now that nfsd has stopped writing to the find_exported_dentry member, we
    can mark the export_operations const.

    Signed-off-by: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc: Dave Kleikamp
    Cc: Anton Altaparmakov
    Cc: David Chinner
    Cc: Timothy Shimmin
    Cc: OGAWA Hirofumi
    Cc: Hugh Dickins
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Cc: "Vladimir V. Saveliev"
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • I'm not sure what people were thinking when adding support to export tmpfs,
    but here's the conversion anyway:

    Signed-off-by: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

17 Oct, 2007

9 commits

  • Why do we need r/o bind mounts?

    This feature allows a read-only view into a read-write filesystem. In the
    process of doing that, it also provides infrastructure for keeping track of
    the number of writers to any given mount.

    This has a number of uses. It allows chroots to have parts of filesystems
    writable. It will be useful for containers in the future because users may
    have root inside a container, but should not be allowed to write to
    some filesystems. This also replaces patches that vserver has had out of the
    tree for several years.

    It also allows a security enhancement: keeping parts of your filesystem
    read-only (such as when you don't trust your FTP server), when you don't want
    to have entire new filesystems mounted, or when you want atime selectively
    updated. I've been using the following script to test that the feature is
    working as desired. It takes a directory and makes a regular bind and a r/o
    bind mount of it. It then performs some normal filesystem operations on the
    three directories, including ones that are expected to fail, like creating a
    file on the r/o mount.

    This patch:

    Some filesystems forego the vfs and may_open() and create their own 'struct
    file's.

    This patch creates a couple of helper functions which can be used by these
    filesystems, and will provide a unified place which the r/o bind mount code
    may patch.

    Also, rename an existing, static-scope init_file() to a less generic name.

    Signed-off-by: Dave Hansen
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • These aren't modular, so SLAB_PANIC is OK.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • provide BDI constructor/destructor hooks

    [akpm@linux-foundation.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • This patch makes three needlessly global functions static.

    Signed-off-by: Adrian Bunk
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved. i.e. they can be migrated by deleting
    them and re-reading the information from elsewhere.

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Here's a cut at fixing up uses of the online node map in generic code.

    mm/shmem.c:shmem_parse_mpol()

    Ensure nodelist is subset of nodes with memory.
    Use node_states[N_HIGH_MEMORY] as default for missing
    nodelist for interleave policy.

    mm/shmem.c:shmem_fill_super()

    initialize policy_nodes to node_states[N_HIGH_MEMORY]

    mm/page-writeback.c:highmem_dirtyable_memory()

    sum over nodes with memory

    mm/page_alloc.c:zlc_setup()

    allowednodes - use nodes with memory.

    mm/page_alloc.c:default_zonelist_order()

    average over nodes with memory.

    mm/page_alloc.c:find_next_best_node()

    skip nodes w/o memory.
    N_HIGH_MEMORY state mask may not be initialized at this time,
    unless we want to depend on early_calculate_totalpages() [see
    below]. Will ZONE_MOVABLE ever be configurable?

    mm/page_alloc.c:find_zone_movable_pfns_for_nodes()

    spread kernelcore over nodes with memory.

    This required calling early_calculate_totalpages()
    unconditionally, and populating N_HIGH_MEMORY node
    state therein from nodes in the early_node_map[].
    If we can depend on this, we can eliminate the
    population of N_HIGH_MEMORY mask from __build_all_zonelists()
    and use the N_HIGH_MEMORY mask in find_next_best_node().

    mm/mempolicy.c:mpol_check_policy()

    Ensure nodes specified for policy are subset of
    nodes with memory.

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Cc: Shaohua Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Implement new aops for some of the simpler filesystems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • This patch cleans up duplicate includes in
    mm/

    Signed-off-by: Jesper Juhl
    Acked-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     

20 Jul, 2007

5 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • This patch completes Linus's wish that the fault return codes be made into
    bit flags, which I agree makes everything nicer. This requires all
    handle_mm_fault callers to be modified (possibly the modifications
    should go further and do things like fault accounting in handle_mm_fault --
    however that would be for another patch).

    [akpm@linux-foundation.org: fix alpha build]
    [akpm@linux-foundation.org: fix s390 build]
    [akpm@linux-foundation.org: fix sparc build]
    [akpm@linux-foundation.org: fix sparc64 build]
    [akpm@linux-foundation.org: fix ia64 build]
    Signed-off-by: Nick Piggin
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Bryan Wu
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Greg Ungerer
    Cc: Matthew Wilcox
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Acked-by: Kyle McMartin
    Acked-by: Haavard Skinnemoen
    Acked-by: Ralf Baechle
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    [ Still apparently needs some ARM and PPC loving - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Change ->fault prototype. We now return an int, which contains
    VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
    FAULT_RET_ code tells the VM whether a page was found, whether it has been
    locked, and potentially other things. This is not quite the way Linus wanted
    it yet, but that's changed in the next patch (which requires changes to
    arch code).
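
    The byte layout described above can be illustrated with a toy encoding
    (the constant values are hypothetical, not the actual kernel header):

```c
#include <assert.h>

/* Low byte: VM_FAULT_xxx status code.  Next byte: FAULT_RET_xxx flags. */
#define VM_FAULT_MINOR    0x00
#define VM_FAULT_MAJOR    0x01
#define VM_FAULT_SIGBUS   0x02
#define VM_FAULT_OOM      0x03

#define FAULT_RET_LOCKED  0x100   /* the returned page is locked */

static int fault_status(int ret) { return ret & 0xff; }   /* VM_FAULT_xxx */
static int fault_flags(int ret)  { return ret & ~0xff; }  /* FAULT_RET_xxx */
```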

    This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
    that a page is locked which requires filemap_nopage to go away (because we
    can no longer remain backward compatible without that flag), but we were
    going to do that anyway.

    struct fault_data is renamed to struct vm_fault as Linus asked. address
    is now a void __user * that we should firmly encourage drivers not to use
    without really good reason.

    The page is now returned via a page pointer in the vm_fault struct.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation, because the filesystem/pagecache code
    should not need to know anything about the virtual memory mapping. The hitch here
    is that the ->nopage handler didn't pass down enough information (ie. pgoff).
    But it is more logical to pass pgoff rather than have the ->nopage function
    calculate it itself anyway (because that's a similar layering violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in place
    so there is a lot of duplication and nice cleanups that can be removed if
    everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. Seems like a fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix the race between invalidate_inode_pages and do_no_page.

    Andrea Arcangeli identified a subtle race between invalidation of pages from
    pagecache with userspace mappings, and do_no_page.

    The issue is that invalidation has to shoot down all mappings to the page,
    before it can be discarded from the pagecache. Between shooting down ptes to
    a particular page, and actually dropping the struct page from the pagecache,
    do_no_page from any process might fault on that page and establish a new
    mapping to the page just before it gets discarded from the pagecache.

    The most common case where such invalidation is used is in file truncation.
    This case was catered for by doing a sort of open-coded seqlock between the
    file's i_size, and its truncate_count.

    Truncation will decrease i_size, then increment truncate_count before
    unmapping userspace pages; do_no_page will read truncate_count, then find the
    page if it is within i_size, and then check truncate_count under the page
    table lock and back out and retry if it had subsequently been changed (ptl
    will serialise against unmapping, and ensure a potentially updated
    truncate_count is actually visible).
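
    That ordering can be sketched in user space (single-threaded illustration
    with hypothetical names; the real fault path rechecks truncate_count
    under the page table lock):

```c
#include <assert.h>

struct inode_sketch {
    long     i_size;
    unsigned truncate_count;   /* open-coded sequence count */
};

static void truncate_sketch(struct inode_sketch *inode, long new_size)
{
    inode->i_size = new_size;  /* 1: lower i_size ... */
    inode->truncate_count++;   /* 2: ... then bump the sequence count */
    /* 3: ... then unmap userspace pages */
}

/* do_no_page samples truncate_count before looking up the page; if it
 * has changed by the time the pte is about to be installed, back out
 * and retry the fault. */
static int fault_must_retry(const struct inode_sketch *inode,
                            unsigned sampled_count)
{
    return inode->truncate_count != sampled_count;
}
```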

    Complexity and documentation issues aside, the locking protocol fails in the
    case where we would like to invalidate pagecache inside i_size. do_no_page
    can come in anytime and filemap_nopage is not aware of the invalidation in
    progress (as it is when it is outside i_size). The end result is that
    dangling (->mapping == NULL) pages that appear to be from a particular file
    may be mapped into userspace with nonsense data. Valid mappings to the same
    place will see a different page.

    Andrea implemented two working fixes, one using a real seqlock, another using
    a page->flags bit. He also proposed using the page lock in do_no_page, but
    that was initially considered too heavyweight. However, it is not a global or
    per-file lock, and the page cacheline is modified in do_no_page to increment
    _count and _mapcount anyway, so a further modification should not be a large
    performance hit. Scalability is not an issue.

    This patch implements this latter approach. ->nopage implementations return
    with the page locked if it is possible for their underlying file to be
    invalidated (in that case, they must set a special vm_flags bit to indicate
    so). do_no_page only unlocks the page after setting up the mapping
    completely. Invalidation is excluded because it holds the page lock during
    invalidation of each page (and ensures that the page is not mapped while
    holding the lock).

    This also allows significant simplifications in do_no_page, because we have
    the page locked in the right place in the pagecache from the start.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

18 Jul, 2007

2 commits

  • Currently the export_operations structure and the helpers related to it
    are in fs.h. fs.h is already far too large, and there are very few places
    needing the
    export bits, so split them off into a separate header.

    [akpm@linux-foundation.org: fix cifs build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • It is often known at allocation time whether a page may be migrated or not.
    This patch adds a flag called __GFP_MOVABLE and a new mask called
    GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated
    using the page migration mechanism or reclaimed by syncing with backing
    storage and discarding.

    An API function very similar to alloc_zeroed_user_highpage() is added for
    __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
    flags used by alloc_zeroed_user_highpage() are not changed because it would
    change the semantics of an existing API. After this patch is applied there
    are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
    be marked deprecated if this patch is merged.

    Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
    shmem.c to keep all flag modifications to inode->mapping in the
    shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
    Hugh Dickins.

    Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
    concept. Credit to Hugh Dickins for catching issues with shmem swap vector
    and ramfs allocations.

    [akpm@linux-foundation.org: build fix]
    [hugh@veritas.com: __GFP_ZERO cleanup]
    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

10 Jul, 2007

1 commit

  • Remove shmem_file_sendfile and resurrect shmem_readpage, as used by tmpfs
    to support loop and sendfile in 2.4 and 2.5. Now tmpfs can support splice,
    loop and sendfile in the simplest way, using generic_file_splice_read and
    generic_file_splice_write (with the aid of shmem_prepare_write).

    We could make some efficiency tweaks later, if there's a real need;
    but this is stable and works well as is.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Jens Axboe

    Hugh Dickins
     

09 Jun, 2007

1 commit

  • Randy Dunlap reports that a tmpfs, mounted with NUMA mpol= specifying an
    offline node, crashes as soon as data is allocated upon it. Now restrict it
    to online nodes, where before it restricted to MAX_NUMNODES.

    Signed-off-by: Hugh Dickins
    Cc: Robin Holt
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Tested-and-acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 May, 2007

1 commit

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback, performed before each freeing
    of an object, to verify that the object is back in its constructor state.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Mar, 2007

3 commits

  • shmem_truncate_range has its own truncate_inode_pages_range, to free any pages
    racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when
    this might have happened. But holepunching gets no chance to clear that flag
    at the start of vmtruncate_range, so it's always set (unless a truncate came
    just before), so holepunch almost always does this second
    truncate_inode_pages_range.

    shmem holepunch has unlikely swapfile races hereabouts whatever we do
    (without a fuller rework than is fit for this release): I was going to skip
    the second truncate in the punch_hole case, but Miklos points out that would
    make holepunch correctness more vulnerable to swapoff. So keep the second
    truncate, but follow it by an unmap_mapping_range to eliminate the
    disconnected pages (freed from pagecache while still mapped in userspace) that
    it might have left behind.

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Miklos Szeredi observes that during truncation of shmem page directories,
    info->lock is released to improve latency (after lowering i_size and
    next_index to exclude races); but this is quite wrong for holepunching, which
    receives no such protection from i_size or next_index, and is left vulnerable
    to races with shmem_unuse, shmem_getpage and shmem_writepage.

    Hold info->lock throughout when holepunching? No, any user could prevent
    rescheduling for far too long. Instead take info->lock just when needed: in
    shmem_free_swp when removing the swap entries, and whenever removing a
    directory page from the level above. But so long as we remove before
    scanning, we can safely skip taking the lock at the lower levels, except at
    misaligned start and end of the hole.

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare
    circumstances, because shmem_truncate_range() erroneously removes partially
    truncated directory pages at the end of the range: later reclaim on pages
    pointing to these removed directories triggers the BUG. Indeed, and it can
    also cause data loss beyond the hole.

    Fix this as in the patch proposed by Miklos, but distinguish between "limit"
    (how far we need to search: ignore truncation's next_index optimization in the
    holepunch case - if there are races it's more consistent to act on the whole
    range specified) and "upper_limit" (how far we can free directory pages:
    generally we must be careful to keep partially punched pages, but can relax at
    end of file - i_size being held stable by i_mutex).

    Signed-off-by: Hugh Dickins
    Cc: Miklos Szeredi
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

12 Feb, 2007

1 commit

  • A shmem-backed file does not have page writeback, nor does it participate
    in the backing device's dirty or writeback accounting. So using the
    generic __set_page_dirty_nobuffers() for its .set_page_dirty aops method
    is a bit of overkill. It unnecessarily prolongs shm unmap latency.

    For example, on a densely populated large shm segment (several GBs), the
    unmapping operation becomes painfully long, because at unmap time the
    kernel transfers the dirty bit in each PTE into the page struct and on to
    the radix tree tag. Tagging the radix tree is particularly expensive
    because it has to traverse the tree from the root to the leaf node for
    every dirty page. What's bothersome is that the radix tree tag is used
    for page writeback; shmem, however, is memory backed, and there is no
    page writeback for such a filesystem. In the end we spend all that time
    tagging the radix tree, and none of that fancy tagging is ever used. So
    let's simplify by introducing a new aop, __set_page_dirty_no_writeback,
    which will speed up shm unmap.
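
    The saving can be sketched in user space, with a counter standing in for
    the root-to-leaf radix-tree walk (all names hypothetical):

```c
#include <assert.h>

struct page_sketch { int dirty; };

static int radix_tag_walks;   /* counts the expensive tagging operation */

/* Generic path: set the dirty bit AND tag the radix tree for writeback. */
static int set_page_dirty_generic_sketch(struct page_sketch *p)
{
    if (p->dirty)
        return 0;
    p->dirty = 1;
    radix_tag_walks++;        /* root-to-leaf walk, once per dirty page */
    return 1;
}

/* Memory-backed path: just set the bit; no writeback, so no tagging. */
static int set_page_dirty_no_writeback_sketch(struct page_sketch *p)
{
    if (p->dirty)
        return 0;
    p->dirty = 1;
    return 1;
}
```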

    Signed-off-by: Ken Chen
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
     

23 Dec, 2006

1 commit

  • Ran into BUG() while doing madvise(REMOVE) testing. If we are punching a
    hole into shared memory segment using madvise(REMOVE) and the entire hole
    is below the indirect blocks, we hit following assert.

    BUG_ON(limit
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

21 Oct, 2006

1 commit

  • Separate out the concept of "queue congestion" from "backing-dev congestion".
    Congestion is a backing-dev concept, not a queue concept.

    The blk_* congestion functions are retained, as wrappers around the core
    backing-dev congestion functions.

    This proper layering is needed so that NFS can cleanly use the congestion
    functions, and so that CONFIG_BLOCK=n actually links.
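
    The layering can be sketched as follows (simplified stand-ins; the real
    functions also take flags saying which congestion direction to test):

```c
#include <assert.h>

struct backing_dev_info_sketch { int congested; };
struct request_queue_sketch    { struct backing_dev_info_sketch bdi; };

/* Core: congestion is a property of the backing device. */
static int bdi_congested_sketch(const struct backing_dev_info_sketch *bdi)
{
    return bdi->congested;
}

/* Retained blk_* wrapper: a queue is congested iff its backing-dev is. */
static int blk_queue_congested_sketch(const struct request_queue_sketch *q)
{
    return bdi_congested_sketch(&q->bdi);
}
```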

    Cc: "Thomas Maier"
    Cc: "Jens Axboe"
    Cc: Trond Myklebust
    Cc: David Howells
    Cc: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Oct, 2006

1 commit

  • We need to encode and decode the 'file' part of a handle. We simply use
    the inode number and generation number to construct the filehandle.

    The generation number is the time when the file was created. As inode numbers
    cycle through the full 32 bits before being reused, there is no real chance of
    the same inum being allocated to different files in the same second so this is
    suitably unique. Using time-of-day rather than e.g. jiffies makes it less
    likely that the same filehandle can be created after a reboot.
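
    A hypothetical packing of the two 32-bit values into one 64-bit handle
    (illustration only; the layout actually used by the export code may
    differ):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t encode_fh_sketch(uint32_t ino, uint32_t generation)
{
    return ((uint64_t)generation << 32) | ino;
}

static void decode_fh_sketch(uint64_t fh, uint32_t *ino, uint32_t *generation)
{
    *ino        = (uint32_t)fh;
    *generation = (uint32_t)(fh >> 32);
}
```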

    In order to be able to decode a filehandle we need to be able to lookup by
    inum, which means that the inode needs to be added to the inode hash table
    (tmpfs doesn't currently hash inodes as there is never a need to lookup by
    inum). To avoid overhead when not exporting, we only hash an inode when it is
    first exported. This requires a lock to ensure it isn't hashed twice.

    This code is separate from the patch posted in June06 from Atal Shargorodsky
    which provided the same functionality, but does borrow slightly from it.

    Locking comment: Most filesystems that hash their inodes do so at the point
    where the 'struct inode' is initialised, and that has suitable locking
    (I_NEW). Here in shmem, we are hashing the inode later, the first time we
    need an NFS file handle for it. We no longer have I_NEW to ensure only one
    thread tries to add it to the hash table.

    Cc: Atal Shargorodsky
    Cc: Gilad Ben-Yossef
    Signed-off-by: David M. Grimes
    Signed-off-by: Neil Brown
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David M. Grimes
     

01 Oct, 2006

2 commits

  • This is mostly included for parity with dec_nlink(), where we will have some
    more hooks. This one should stay pretty darn straightforward for now.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have to keep filesystems from being remounted r/o
    between the time this i_nlink decrement happens and that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.
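
    A sketch of such a helper over a minimal stand-in inode (the real helper
    operates on struct inode; the hook body is the part a later patch fills
    in):

```c
#include <assert.h>

struct inode_sketch { unsigned int i_nlink; };

/* Centralize the decrement so the nlink==0 case has one place to hook:
 * that is the moment a write becomes necessary to drop the inode. */
static void drop_nlink_sketch(struct inode_sketch *inode)
{
    inode->i_nlink--;
    if (inode->i_nlink == 0) {
        /* hook point: note that this filesystem now owes a write */
    }
}
```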

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen