27 Jul, 2016

1 commit

  • I have noticed that frontswap.h first declares "frontswap_enabled" as
    an extern bool variable, and then overrides it with "#define
    frontswap_enabled (1)" when CONFIG_FRONTSWAP=y, or (0) when it is
    disabled. The bool variable isn't actually instantiated anywhere.

    This all looks like an unfinished attempt to make frontswap_enabled
    reflect whether a backend is instantiated. But in the current state,
    all frontswap hooks call unconditionally into frontswap.c just to check
    if frontswap_ops is non-NULL. This should at least be checked inline,
    but we can further eliminate the overhead when CONFIG_FRONTSWAP is
    enabled and no backend is registered, using a static key that is
    initially disabled and gets enabled only upon the first backend
    registration (see the sketch after this entry).

    Thus, checks for "frontswap_enabled" are replaced with
    "frontswap_enabled()" wrapping the static key check. There are two
    exceptions:

    - xen's selfballoon_process() was testing frontswap_enabled in code guarded
    by #ifdef CONFIG_FRONTSWAP, which was effectively always true when reachable.
    The patch just removes this check. Using frontswap_enabled() does not sound
    correct here, as this can be true even without xen's own backend being
    registered.

    - in SYSCALL_DEFINE2(swapon), change the check to IS_ENABLED(CONFIG_FRONTSWAP)
    as it seems the bitmap allocation cannot currently be postponed until a
    backend is registered. This means that frontswap will still have some
    memory overhead when it is configured but no backend is registered.

    After the patch, we can expect that some functions in frontswap.c are
    called only when frontswap_ops is non-NULL. Change the checks there to
    VM_BUG_ONs. While at it, convert other BUG_ONs to VM_BUG_ONs as
    frontswap has been stable for some time.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1463152235-9717-1-git-send-email-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Juergen Gross
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
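
    A minimal sketch of the resulting inline check, assuming the
    static-branch API from <linux/jump_label.h>; the key name and the
    registration detail shown here are illustrative rather than the exact
    code:

        #include <linux/jump_label.h>

        /* frontswap.h: declared here, defined in mm/frontswap.c, initially off. */
        extern struct static_key_false frontswap_enabled_key;

        static inline bool frontswap_enabled(void)
        {
                /* Compiles to a patched no-op branch until a backend registers. */
                return static_branch_unlikely(&frontswap_enabled_key);
        }

        /* mm/frontswap.c: flip the key on the first backend registration. */
        DEFINE_STATIC_KEY_FALSE(frontswap_enabled_key);

        void frontswap_register_ops(struct frontswap_ops *ops)
        {
                /* ... record ops ... */
                static_branch_enable(&frontswap_enabled_key);
        }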
     

25 Jun, 2015

1 commit

  • Change frontswap's single frontswap_ops pointer to a singly linked list
    of frontswap implementations, and update the Xen tmem implementation,
    since registration no longer returns anything.

    Frontswap currently keeps track of only a single implementation; any
    implementation that registers second (or later) replaces the previously
    registered implementation and receives a pointer to it, and the new
    implementation is then expected to pass on any frontswap call it cannot
    handle itself. That scheme doesn't really make much sense, as forwarding
    work to the previous implementation adds unnecessary work to every
    implementation; instead, frontswap should simply keep a list of all
    registered implementations and try each implementation for any function.
    Most importantly, neither of the two frontswap implementations currently
    in the kernel actually does anything with the previous implementation it
    replaces when registering.

    This allows frontswap to successfully manage multiple implementations by
    keeping a list of them all (registration and the new call path are
    sketched after this entry).

    Signed-off-by: Dan Streetman
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
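
    A rough sketch of the new shape (the 'next' field, helper names and the
    invalidate example below are illustrative; registration-time
    synchronization is omitted):

        /* frontswap_ops now carries a 'next' pointer; frontswap keeps the head. */
        static struct frontswap_ops *frontswap_ops;

        void frontswap_register_ops(struct frontswap_ops *ops)
        {
                ops->next = frontswap_ops;      /* prepend; nothing is returned */
                frontswap_ops = ops;
        }

        /* Hooks now walk the whole list instead of relying on backends
         * chaining calls to whatever implementation they replaced. */
        void __frontswap_invalidate_area(unsigned type)
        {
                struct swap_info_struct *sis = swap_info[type];
                struct frontswap_ops *ops;

                if (sis->frontswap_map == NULL)
                        return;
                for (ops = frontswap_ops; ops != NULL; ops = ops->next)
                        ops->invalidate_area(type);
                bitmap_zero(sis->frontswap_map, sis->max);
        }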
     

03 Dec, 2014

1 commit

  • If a frontswap dup-store fails, it should invalidate the expired page
    in the backend, otherwise it can trigger data corruption. For example:
    1. use zswap as the frontswap backend, with the writeback feature
    2. store a swap page (version_1) to entry A: success
    3. dup-store a newer page (version_2) to the same entry A: failure
    4. use __swap_writepage() to write the version_2 page to the swapfile: success
    5. zswap shrinks and writes the version_1 page back to the swapfile
    6. the version_2 page is overwritten by version_1: data corruption.

    This patch fixes the issue by invalidating the expired data immediately
    upon a dup-store failure (see the sketch after this entry).

    Signed-off-by: Weijie Yang
    Cc: Konrad Rzeszutek Wilk
    Cc: Seth Jennings
    Cc: Dan Streetman
    Cc: Minchan Kim
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
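
    Roughly, the store path after the fix (a simplified sketch of
    __frontswap_store(); statistics are left out and helper names are
    approximate):

        int __frontswap_store(struct page *page)
        {
                swp_entry_t entry = { .val = page_private(page) };
                int type = swp_type(entry);
                pgoff_t offset = swp_offset(entry);
                struct swap_info_struct *sis = swap_info[type];
                int dup = frontswap_test(sis, offset);  /* old copy stored? */
                int ret;

                ret = frontswap_ops->store(type, offset, page);
                if (ret == 0) {
                        frontswap_set(sis, offset);
                } else if (dup) {
                        /*
                         * The dup-store failed, so the backend still holds
                         * the stale version_1 page. Drop it now, otherwise
                         * a later writeback from the backend could overwrite
                         * the newer version_2 data on the swapfile.
                         */
                        frontswap_ops->invalidate_page(type, offset);
                        frontswap_clear(sis, offset);
                }
                return ret;
        }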
     

05 Jun, 2014

2 commits

  • Originally get_swap_page() started iterating through the singly-linked
    list of swap_info_structs using swap_list.next or highest_priority_index,
    both of which were intended to point to the highest priority active swap
    target that was not full. The first patch in this series changed the
    singly-linked list to a doubly-linked list, and removed the logic to start
    at the highest priority non-full entry; it now starts scanning at the
    highest priority entry each time, even if the entry is full.

    Replace the manually ordered swap_list_head with a plist, swap_active_head.
    Add a new plist, swap_avail_head. The original swap_active_head plist
    contains all active swap_info_structs, as before, while the new
    swap_avail_head plist contains only swap_info_structs that are active and
    available, i.e. not full. Add a new spinlock, swap_avail_lock, to protect
    the swap_avail_head list.

    Mel Gorman suggested using plists since they internally handle ordering
    the list entries based on priority, which is exactly what swap was doing
    manually. All the ordering code is now removed, and swap_info_struct
    entries are simply added to their corresponding plist and automatically
    ordered correctly.

    Using a new plist for available swap_info_structs simplifies and
    optimizes get_swap_page(), which no longer has to iterate over full
    swap_info_structs. Using a new spinlock for the swap_avail_head plist
    allows each swap_info_struct to add or remove itself from the plist
    when it becomes full or not-full; previously this was not possible
    because swap_info_struct->lock is held while the entry changes between
    full and not-full, and the swap_lock protecting the main
    swap_active_head must be ordered before any swap_info_struct->lock.
    The resulting list and lock layout is sketched after this entry.

    Signed-off-by: Dan Streetman
    Acked-by: Mel Gorman
    Cc: Shaohua Li
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Dan Streetman
    Cc: Michal Hocko
    Cc: Christian Ehrhardt
    Cc: Weijie Yang
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Bob Liu
    Cc: Paul Gortmaker
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
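
    A sketch of the resulting list and lock layout (the declarations, the
    helper below and the avail_list field name are illustrative):

        #include <linux/plist.h>
        #include <linux/spinlock.h>

        /* All active (swapon'ed) devices, kept in priority order by the plist. */
        static struct plist_head swap_active_head = PLIST_HEAD_INIT(swap_active_head);
        /* Only devices that are active AND not full; what get_swap_page() scans. */
        static struct plist_head swap_avail_head = PLIST_HEAD_INIT(swap_avail_head);
        static DEFINE_SPINLOCK(swap_avail_lock);    /* protects swap_avail_head */

        /* A device that has just filled up can now drop off the available
         * list by itself, without touching swap_lock. */
        static void swap_remove_from_avail(struct swap_info_struct *si)
        {
                spin_lock(&swap_avail_lock);
                if (!plist_node_empty(&si->avail_list))
                        plist_del(&si->avail_list, &swap_avail_head);
                spin_unlock(&swap_avail_lock);
        }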
     
  • The logic controlling the singly-linked list of swap_info_struct entries
    for all active, i.e. swapon'ed, swap targets is rather complex, because:

    - it stores the entries in priority order
    - there is a pointer to the highest priority entry
    - there is a pointer to the highest priority not-full entry
    - there is a highest_priority_index variable set outside the swap_lock
    - swap entries of equal priority should be used equally

    This complexity leads to bugs such as https://lkml.org/lkml/2014/2/13/181,
    where swap targets of different priorities are incorrectly used equally.

    That bug probably could be solved with the existing singly-linked lists,
    but I think it would only add more complexity to the already difficult to
    understand get_swap_page() swap_list iteration logic.

    The first patch changes from a singly-linked list to a doubly-linked list
    using list_heads; the highest_priority_index and related code are removed
    and get_swap_page() starts each iteration at the highest priority
    swap_info entry, even if it's full. While this does introduce unnecessary
    list iteration (i.e. Schlemiel the painter's algorithm) in the case where
    one or more of the highest priority entries are full, the iteration and
    manipulation code is much simpler and behaves correctly re: the above bug;
    and the fourth patch removes the unnecessary iteration.

    The second patch adds some minor plist helper functions; nothing new
    really, just functions to match existing regular list functions. These
    are used by the next two patches.

    The third patch adds plist_requeue(), which is used by get_swap_page() in
    the next patch - it performs the requeueing of same-priority entries
    (which moves the entry to the end of its priority in the plist), so that
    all equal-priority swap_info_structs get used equally.

    The fourth patch converts the main list into a plist, and adds a new plist
    that contains only swap_info entries that are both active and not full.
    As Mel suggested using plists allows removing all the ordering code from
    swap - plists handle ordering automatically. The list naming is also
    clarified now that there are two lists, with the original list changed
    from swap_list_head to swap_active_head and the new list named
    swap_avail_head. A new spinlock is also added for the new list, so
    swap_info entries can be added or removed from the new list immediately as
    they become full or not full.

    This patch (of 4):

    Replace the singly-linked list tracking active, i.e. swapon'ed,
    swap_info_struct entries with a doubly-linked list using struct
    list_heads. Simplify the logic iterating and manipulating the list of
    entries, especially get_swap_page(), by using standard list_head
    functions, and removing the highest priority iteration logic (a
    simplified sketch of the new iteration follows this entry).

    The change fixes the bug:
    https://lkml.org/lkml/2014/2/13/181
    in which different priority swap entries after the highest priority entry
    are incorrectly used equally in pairs. The swap behavior is now as
    advertised, i.e. different priority swap entries are used in order, and
    equal priority swap targets are used concurrently.

    Signed-off-by: Dan Streetman
    Acked-by: Mel Gorman
    Cc: Shaohua Li
    Cc: Hugh Dickins
    Cc: Dan Streetman
    Cc: Michal Hocko
    Cc: Christian Ehrhardt
    Cc: Weijie Yang
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Bob Liu
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Paul Gortmaker
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
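
    A simplified sketch of the patch-1 iteration described above
    (per-device locking and the actual slot scan are elided; swap_list_head
    is the list_head introduced here and renamed swap_active_head later in
    the series):

        static LIST_HEAD(swap_list_head);   /* priority-ordered, doubly linked */

        swp_entry_t get_swap_page(void)
        {
                struct swap_info_struct *si;

                spin_lock(&swap_lock);
                /*
                 * highest_priority_index and swap_list.next are gone: every
                 * call simply walks the list from the highest-priority entry.
                 */
                list_for_each_entry(si, &swap_list_head, list) {
                        if (!(si->flags & SWP_WRITEOK))
                                continue;
                        /* ... try to allocate a slot from si; on success,
                         *     unlock and return the swp_entry_t ... */
                }
                spin_unlock(&swap_lock);
                return (swp_entry_t) { 0 };
        }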
     

13 Jun, 2013

1 commit

  • A bitmap accessed by bitops must be large enough to hold the required
    number of bits rounded up to a multiple of BITS_PER_LONG, and it must
    not be zeroed with memset() when the number of bits being cleared is
    not a multiple of BITS_PER_LONG.

    This fixes incorrect zeroing and an incorrect allocation size for
    frontswap_map. The incorrect zeroing doesn't cause any problem, because
    frontswap_map is freed just after being zeroed. But the wrongly
    calculated allocation size may cause problems.

    On 32-bit systems, the allocation size of frontswap_map is about twice
    the required size. On 64-bit systems, the allocation size is smaller
    than required if the number of bits is not a multiple of BITS_PER_LONG.
    The corrected sizing and clearing idioms are sketched after this entry.

    Signed-off-by: Akinobu Mita
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
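
    The corrected sizing and clearing idioms, sketched as hypothetical
    helpers (maxpages is the number of swap slots being tracked):

        #include <linux/bitmap.h>
        #include <linux/vmalloc.h>

        static unsigned long *frontswap_map_alloc(unsigned long maxpages)
        {
                /* Allocate whole longs: BITS_TO_LONGS() rounds the bit count
                 * up, unlike maxpages / sizeof(long), which over-allocates on
                 * 32-bit and under-allocates on 64-bit for odd bit counts. */
                return vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));
        }

        static void frontswap_map_clear(unsigned long *map, unsigned long maxpages)
        {
                /* Clear by bit count; bitmap_zero() covers a partial last
                 * word, unlike a hand-computed byte length fed to memset(). */
                bitmap_zero(map, maxpages);
        }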
     

01 May, 2013

4 commits

  • The frontswap initialization routine depends on swap_lock, which wants
    frontswap's first appearance to be atomic: either frontswap is not
    present and all calls fail, or frontswap is fully functional. But as
    long as a new swap_info_struct isn't registered by enable_swap_info(),
    the swap subsystem doesn't start I/O on it, so there is no race between
    the init procedure and page I/O working on frontswap.

    So let's remove the unnecessary swap_lock dependency (the lock-free
    init hook is sketched after this entry).

    Cc: Dan Magenheimer
    Signed-off-by: Minchan Kim
    [v1: Rebased on my branch, reworked to work with backends loading late]
    [v2: Added a check for !map]
    [v3: Made the invalidate path follow the init path]
    [v4: Address comments by Wanpeng Li ]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Florian Schmaus
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
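
    A minimal sketch of the init hook without the swap_lock dependency
    (names are approximate; the !map check is the one mentioned in v2):

        void __frontswap_init(unsigned type, unsigned long *map)
        {
                struct swap_info_struct *sis = swap_info[type];

                BUG_ON(sis == NULL);

                /* The device may have no frontswap bitmap at all. */
                if (!map)
                        return;

                /*
                 * No swap_lock needed: until enable_swap_info() publishes
                 * this swap_info_struct, the swap subsystem issues no I/O
                 * on it, so nothing can race with this initialization.
                 */
                frontswap_map_set(sis, map);
                if (frontswap_ops)
                        frontswap_ops->init(type);
        }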
     
  • After allowing tmem backends to be built/run as modules,
    frontswap_enabled is always true if CONFIG_FRONTSWAP is defined. But
    frontswap_test() depends on whether a backend is registered, so move it
    into frontswap.c and use frontswap_ops to make the decision (see the
    sketch after this entry).

    frontswap_set/clear are not used outside frontswap, so don't export them.

    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Dan Magenheimer
    Cc: Florian Schmaus
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
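
    Roughly what the relocated test looks like (a sketch, not the exact
    code):

        /* Moved from a frontswap.h inline into mm/frontswap.c, where it can
         * see whether a backend has actually registered. */
        bool __frontswap_test(struct swap_info_struct *sis, pgoff_t offset)
        {
                bool ret = false;

                if (frontswap_ops && sis->frontswap_map)
                        ret = test_bit(offset, sis->frontswap_map);
                return ret;
        }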
     
  • This simplifies the code in frontswap - we can get rid of the
    'backend_registered' test and instead check against frontswap_ops.

    [v1: Rebase on top of 703ba7fe5e0 (ramster->zcache move)]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Dan Magenheimer
    Cc: Florian Schmaus
    Cc: Minchan Kim
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konrad Rzeszutek Wilk
     
  • With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to
    be built/loaded as modules rather than built in and enabled by a boot
    parameter, this patch provides "lazy initialization", allowing backends
    to register with frontswap even after swapon has been run. Before a
    backend registers, all calls to init are recorded, and the creation of
    tmem_pools is delayed until a backend registers or until a frontswap
    store is attempted. The bookkeeping is sketched after this entry.

    Signed-off-by: Stefan Hengelein
    Signed-off-by: Florian Schmaus
    Signed-off-by: Andor Daam
    Signed-off-by: Dan Magenheimer
    [v1: Fixes per Seth Jennings suggestions]
    [v2: Removed FRONTSWAP_HAS_.. ]
    [v3: Fix up per Bob Liu recommendations]
    [v4: Fix up per Andrew's comments]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Dan Magenheimer
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Magenheimer
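
    The bookkeeping can be pictured roughly like this (a sketch;
    frontswap_replay_init() is an illustrative name and the handling of the
    ops structure is simplified):

        static bool backend_registered __read_mostly;
        static DECLARE_BITMAP(need_init, MAX_SWAPFILES);

        /* swapon path: may run before any tmem backend module is loaded. */
        void __frontswap_init(unsigned type)
        {
                if (backend_registered)
                        frontswap_ops.init(type);
                else
                        set_bit(type, need_init);   /* remember, replay later */
        }

        /* backend registration path: replay the recorded init calls so the
         * backend creates tmem pools for already-enabled swap devices. */
        static void frontswap_replay_init(void)
        {
                int type;

                for_each_set_bit(type, need_init, MAX_SWAPFILES)
                        frontswap_ops.init(type);
                bitmap_zero(need_init, MAX_SWAPFILES);
        }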
     

21 Sep, 2012

2 commits

  • Tmem, as originally specified, assumes that "get" operations
    performed on persistent pools never flush the page of data out
    of tmem on a successful get, waiting instead for a flush
    operation. This is intended to mimic the model of a swap
    disk, where a disk read is non-destructive. Unlike a
    disk, however, freeing up the RAM can be valuable. Over
    the years that frontswap was in the review process, several
    reviewers (and notably Hugh Dickins in 2010) pointed out that
    this would result, at least temporarily, in two copies of the
    data in RAM: one (compressed for zcache) copy in tmem,
    and one copy in the swap cache. We wondered if this could
    be done differently, at least optionally.

    This patch allows a tmem backend to instruct the frontswap
    code that it performs exclusive gets (roughly as sketched
    after this entry). Zcache2 already contains hooks to support
    this feature. Other backends are completely unaffected
    unless/until they are updated to support this feature.

    While it is not clear that exclusive gets are a performance
    win on all workloads at all times, this small patch allows for
    experimentation by backends.

    P.S. Let's not quibble about the naming of "get" vs "read" vs
    "load" etc. The naming is currently horribly inconsistent between
    cleancache and frontswap and existing tmem backends, so will need
    to be straightened out as a separate patch. "Get" is used
    by the tmem architecture spec, existing backends, and
    all documentation and presentation material so I am
    using it in this patch.

    Signed-off-by: Dan Magenheimer
    Signed-off-by: Konrad Rzeszutek Wilk

    Dan Magenheimer
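
    The hook itself is small; a simplified sketch of the flag and the load
    side (statistics omitted, names approximate):

        static bool frontswap_tmem_exclusive_gets_enabled __read_mostly;

        /* Called once by a backend whose "get" frees the data in tmem. */
        void frontswap_tmem_exclusive_gets(bool setting)
        {
                frontswap_tmem_exclusive_gets_enabled = setting;
        }

        int __frontswap_load(struct page *page)
        {
                swp_entry_t entry = { .val = page_private(page) };
                int type = swp_type(entry);
                pgoff_t offset = swp_offset(entry);
                struct swap_info_struct *sis = swap_info[type];
                int ret;

                ret = frontswap_ops.load(type, offset, page);
                if (ret == 0 && frontswap_tmem_exclusive_gets_enabled) {
                        /* The backend no longer holds the page: keep it
                         * dirty so it will be rewritten to the swap device
                         * if reclaimed again, and clear its frontswap bit. */
                        SetPageDirty(page);
                        frontswap_clear(sis, offset);
                }
                return ret;
        }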
     
  • pages_to_unuse is set to 0 to unuse all frontswap pages, but that
    doesn't happen, because a wrong condition in frontswap_shrink
    cancels it.

    -v2: Add comment to explain return value of __frontswap_shrink,
    as suggested by Dan Carpenter, thanks

    Signed-off-by: Zhenzhong Duan
    Signed-off-by: Konrad Rzeszutek Wilk

    Zhenzhong Duan
     

14 Aug, 2012

1 commit

  • Fixes an uninitialized-variable warning on 'type' in frontswap_shrink().
    type is actually set by __frontswap_unuse_pages(), which is called by
    __frontswap_shrink(), which is called by frontswap_shrink(), before type
    is used by try_to_unuse().

    Signed-off-by: Seth Jennings
    Signed-off-by: Konrad Rzeszutek Wilk

    Seth Jennings
     

15 May, 2012

2 commits

  • Sounds so much more natural.

    Suggested-by: Andrea Arcangeli
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • This patch, 3 of 4, provides the core frontswap code that interfaces
    between the hooks in the swap subsystem and a frontswap backend via
    frontswap_ops (the backend-facing interface is sketched after this
    entry).

    ---
    New file added: mm/frontswap.c

    [v14: add support for writethrough, per suggestion by aarcange@redhat.com]
    [v11: sjenning@linux.vnet.ibm.com: s/puts/failed_puts/]
    [v10: sjenning@linux.vnet.ibm.com: fix debugfs calls on 32-bit]
    [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 1]
    [v9: akpm@linux-foundation.org: mark some statics __read_mostly]
    [v9: akpm@linux-foundation.org: add clarifying comments]
    [v9: akpm@linux-foundation.org: no need to loop repeating try_to_unuse]
    [v9: error27@gmail.com: remove superfluous check for NULL]
    [v8: rebase to 3.0-rc4]
    [v8: kamezawa.hiroyu@jp.fujitsu.com: add comment to clarify find_next_to_unuse]
    [v7: rebase to 3.0-rc3]
    [v7: JBeulich@novell.com: use new static inlines, no-ops if not config'd]
    [v6: rebase to 3.1-rc1]
    [v6: lliubbo@gmail.com: use vzalloc]
    [v6: lliubbo@gmail.com: fix null pointer deref if vzalloc fails]
    [v6: konrad.wilk@oracl.com: various checks and code clarifications/comments]
    [v4: rebase to 2.6.39]
    Signed-off-by: Dan Magenheimer
    Acked-by: Jan Beulich
    Acked-by: Seth Jennings
    Cc: Jeremy Fitzhardinge
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Nitin Gupta
    Cc: Matthew Wilcox
    Cc: Chris Mason
    Cc: Rik Riel
    Cc: Andrew Morton
    [v12: Squashed s/flush/invalidate/ in]
    [v15: A bit of cleanup and separate DEBUGFS]
    Signed-off-by: Konrad Wilk

    Dan Magenheimer
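
    For orientation, the backend-facing interface this file plugs into
    looks roughly like the following (an illustrative sketch, using the
    store/load naming adopted shortly afterwards; see
    include/linux/frontswap.h for the real definition):

        struct frontswap_ops {
                void (*init)(unsigned type);                    /* swapon      */
                int  (*store)(unsigned type, pgoff_t offset,
                              struct page *page);               /* put a page  */
                int  (*load)(unsigned type, pgoff_t offset,
                             struct page *page);                /* get it back */
                void (*invalidate_page)(unsigned type, pgoff_t offset);
                void (*invalidate_area)(unsigned type);         /* swapoff     */
        };

        /* A tmem backend (zcache, Xen tmem, ...) registers its ops with
         * mm/frontswap.c; the previously registered ops are handed back. */
        struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops);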