12 Nov, 2016

1 commit

  • Christian Borntraeger reports:

    With commit 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to
    static key") kmemleak complains about a memory leak in swapon

    unreferenced object 0x3e09ba56000 (size 32112640):
    comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    __vmalloc_node_range+0x194/0x2d8
    vzalloc+0x58/0x68
    SyS_swapon+0xd60/0x12f8
    system_call+0xd6/0x270

    Turns out kmemleak is right. We now allocate the frontswap map
    depending on the kernel config (and no longer on runtime enablement):

    swapfile.c:
    [...]
    if (IS_ENABLED(CONFIG_FRONTSWAP))
            frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));

    but later on this is passed along
    --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);

    and ignored if frontswap is disabled
    --> frontswap_init(p->type, frontswap_map);

    static inline void frontswap_init(unsigned type, unsigned long *map)
    {
            if (frontswap_enabled())
                    __frontswap_init(type, map);
    }

    The problem is that this frontswap map is never freed.

    The leak is relatively benign, because swapon is an infrequent and
    privileged operation. However, if the first frontswap backend is
    registered after a swap type has already been enabled, it will WARN_ON
    in frontswap_register_ops() and frontswap will not be available for
    that swap type.

    Fix this by making sure the map is assigned by frontswap_init() as long
    as CONFIG_FRONTSWAP is enabled.
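
    A minimal sketch of the shape of the fix (assuming the map assignment
    is pushed down into __frontswap_init() and the inline wrapper keys off
    the config option instead of the static key; names follow the snippets
    above):

    static inline void frontswap_init(unsigned type, unsigned long *map)
    {
    #ifdef CONFIG_FRONTSWAP
            __frontswap_init(type, map);
    #endif
    }

    This way the vzalloc'ed map is always attached to the swap_info_struct
    and can be freed on swapoff, whether or not a backend ever registers.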

    Fixes: 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to static key")
    Link: http://lkml.kernel.org/r/20161026134220.2566-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reported-by: Christian Borntraeger
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Juergen Gross
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka

27 Jul, 2016

1 commit

  • I have noticed that frontswap.h first declares "frontswap_enabled" as
    an extern bool variable, and then overrides it with "#define
    frontswap_enabled (1)" for CONFIG_FRONTSWAP=y, or (0) when disabled.
    The bool variable isn't actually instantiated anywhere.
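
    The declarations in question look roughly like this (a paraphrase of
    the frontswap.h arrangement described above, not a verbatim quote):

    extern bool frontswap_enabled;

    #ifdef CONFIG_FRONTSWAP
    #define frontswap_enabled (1)
    #else
    #define frontswap_enabled (0)
    #endif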

    This all looks like an unfinished attempt to make frontswap_enabled
    reflect whether a backend is instantiated. But in the current state,
    all frontswap hooks call unconditionally into frontswap.c just to
    check if frontswap_ops is non-NULL. This should at least be checked
    inline, but we can further eliminate the overhead when
    CONFIG_FRONTSWAP is enabled and no backend is registered, using a
    static key that is initially disabled and gets enabled only upon the
    first backend registration.

    Thus, checks for "frontswap_enabled" are replaced with
    "frontswap_enabled()" wrapping the static key check (see the sketch
    after the exceptions below). There are two exceptions:

    - xen's selfballoon_process() was testing frontswap_enabled in code guarded
    by #ifdef CONFIG_FRONTSWAP, which was effectively always true when reachable.
    The patch just removes this check. Using frontswap_enabled() does not sound
    correct here, as this can be true even without xen's own backend being
    registered.

    - in SYSCALL_DEFINE2(swapon), change the check to
    IS_ENABLED(CONFIG_FRONTSWAP), as it seems the bitmap allocation cannot
    currently be postponed until a backend is registered. This means that
    frontswap will still have some memory overhead when configured in,
    even without a backend.
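
    A minimal sketch of the static-key wrapper described above (using the
    kernel's static key API; treat the key name and the exact call sites
    as illustrative):

    DEFINE_STATIC_KEY_FALSE(frontswap_enabled_key);

    static inline bool frontswap_enabled(void)
    {
            return static_branch_unlikely(&frontswap_enabled_key);
    }

    /* In frontswap_register_ops(), upon registering a backend: */
    static_branch_enable(&frontswap_enabled_key);

    While the key is disabled, each hook reduces to a patched no-op branch
    instead of an unconditional function call into frontswap.c.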

    After the patch, we can expect that some functions in frontswap.c are
    called only when frontswap_ops is non-NULL. Change the checks there to
    VM_BUG_ONs. While at it, convert other BUG_ONs to VM_BUG_ONs as
    frontswap has been stable for some time.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1463152235-9717-1-git-send-email-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Juergen Gross
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka

25 Jun, 2015

1 commit

  • Change frontswap's single ops pointer to a singly linked list of
    frontswap implementations. Update the Xen tmem implementation, as
    register no longer returns anything.

    Frontswap currently keeps track of only a single implementation; any
    implementation registered second (or later) replaces the previously
    registered one and receives a pointer to it, which the new
    implementation is expected to forward all frontswap calls to if it
    can't handle them itself. That method doesn't make much sense:
    forwarding the work on adds unnecessary complexity to every
    implementation. Instead, frontswap should simply keep a list of all
    registered implementations and try each of them for any operation.
    Most importantly, neither of the two frontswap implementations
    currently in the kernel actually does anything with a previous
    implementation it replaces when registering.

    This allows frontswap to successfully manage multiple implementations by
    keeping a list of them all.
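
    A minimal sketch of the list-based registration described above (a
    lock-free prepend via cmpxchg is one plausible way to do it; treat the
    details as an assumption rather than the exact upstream code):

    /* Head of the singly linked list of registered backends. */
    static struct frontswap_ops *frontswap_ops __read_mostly;

    #define for_each_frontswap_ops(ops) \
            for ((ops) = frontswap_ops; (ops); (ops) = (ops)->next)

    void frontswap_register_ops(struct frontswap_ops *ops)
    {
            /* Prepend; retry if a concurrent registration races with us. */
            do {
                    ops->next = frontswap_ops;
            } while (cmpxchg(&frontswap_ops, ops->next, ops) != ops->next);
    }

    Every hook then walks the list with for_each_frontswap_ops() and tries
    each registered backend in turn.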

    Signed-off-by: Dan Streetman
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman

01 May, 2013

3 commits

  • The frontswap initialization routine depends on swap_lock, which
    wants to be atomic about frontswap's first appearance. IOW, either
    frontswap is not present and will fail all calls, OR it is fully
    functional. But since the swap subsystem doesn't start I/O until the
    new swap_info_struct is registered by enable_swap_info, there is no
    race between the init procedure and page I/O working on frontswap.

    So let's remove the unnecessary swap_lock dependency.

    Cc: Dan Magenheimer
    Signed-off-by: Minchan Kim
    [v1: Rebased on my branch, reworked to work with backends loading late]
    [v2: Added a check for !map]
    [v3: Made the invalidate path follow the init path]
    [v4: Address comments by Wanpeng Li ]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Florian Schmaus
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
  • After allowing tmem backends to build/run as modules,
    frontswap_enabled is always true if CONFIG_FRONTSWAP is defined. But
    frontswap_test() depends on whether a backend is registered, so move
    it into frontswap.c and use frontswap_ops to make the decision.

    frontswap_set/clear are not used outside frontswap, so don't export them.
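
    A hedged sketch of what the moved check might look like inside
    frontswap.c (assuming the per-type bitmap lives in swap_info_struct
    as frontswap_map, consistent with the other entries in this log):

    static bool frontswap_test(struct swap_info_struct *sis, pgoff_t offset)
    {
            bool ret = false;

            /* Meaningful only once a backend and a map both exist. */
            if (frontswap_ops && sis->frontswap_map)
                    ret = test_bit(offset, sis->frontswap_map);
            return ret;
    }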

    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Dan Magenheimer
    Cc: Florian Schmaus
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
  • This simplifies the code in frontswap - we can get rid of the
    'backend_registered' test and instead check against frontswap_ops.

    [v1: Rebase on top of 703ba7fe5e0 (ramster->zcache move)]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Bob Liu
    Cc: Wanpeng Li
    Cc: Andor Daam
    Cc: Dan Magenheimer
    Cc: Florian Schmaus
    Cc: Minchan Kim
    Cc: Stefan Hengelein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konrad Rzeszutek Wilk

21 Sep, 2012

1 commit

  • Tmem, as originally specified, assumes that "get" operations
    performed on persistent pools never flush the page of data out
    of tmem on a successful get, waiting instead for a flush
    operation. This is intended to mimic the model of a swap
    disk, where a disk read is non-destructive. Unlike a
    disk, however, freeing up the RAM can be valuable. Over
    the years that frontswap was in the review process, several
    reviewers (and notably Hugh Dickins in 2010) pointed out that
    this would result, at least temporarily, in two copies of the
    data in RAM: one (compressed for zcache) copy in tmem,
    and one copy in the swap cache. We wondered if this could
    be done differently, at least optionally.

    This patch allows tmem backends to instruct the frontswap
    code that this backend performs exclusive gets. Zcache2
    already contains hooks to support this feature. Other
    backends are completely unaffected unless/until they are
    updated to support this feature.

    While it is not clear that exclusive gets are a performance
    win on all workloads at all times, this small patch allows for
    experimentation by backends.
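
    As a hedged sketch of the mechanism (a module-level flag plus a hook
    for backends to set it; the load-path fragment assumes names used
    elsewhere in frontswap and is illustrative, not verbatim):

    static bool frontswap_tmem_exclusive_gets_enabled __read_mostly;

    /* A backend that performs destructive gets calls this at init. */
    void frontswap_tmem_exclusive_gets(bool setting)
    {
            frontswap_tmem_exclusive_gets_enabled = setting;
    }

    /* In the load path, after a successful get: the tmem copy is gone,
       so dirty the page so it is rewritten on the next swap-out. */
    if (frontswap_tmem_exclusive_gets_enabled) {
            SetPageDirty(page);
            frontswap_clear(sis, offset);
    }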

    P.S. Let's not quibble about the naming of "get" vs "read" vs
    "load" etc. The naming is currently horribly inconsistent between
    cleancache and frontswap and existing tmem backends, so will need
    to be straightened out as a separate patch. "Get" is used
    by the tmem architecture spec, existing backends, and
    all documentation and presentation material so I am
    using it in this patch.

    Signed-off-by: Dan Magenheimer
    Signed-off-by: Konrad Rzeszutek Wilk

    Dan Magenheimer

15 May, 2012

2 commits

  • Sounds so much more natural.

    Suggested-by: Andrea Arcangeli
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
  • Frontswap is the alter ego of cleancache, the "yang" to cleancache's
    "yin"... and more precisely frontswap is the provider of anonymous
    pages to transcendent memory to nicely complement cleancache's providing
    of clean pagecache pages to transcendent memory. For optimal use
    of transcendent memory, both are necessary... because a kernel
    under memory pressure first reclaims clean pagecache pages and,
    when under more memory pressure, starts swapping anonymous pages.

    Frontswap and cleancache (which was merged at 3.0) are the "frontends"
    and the only necessary changes to the core kernel for transcendent memory;
    all other supporting code -- the "backends" -- is implemented as drivers.
    See the LWN.net article "Transcendent memory in a nutshell" for a detailed
    overview of frontswap and related kernel parts:
    https://lwn.net/Articles/454795/

    Frontswap code was first posted publicly in January 2009 and on LKML in
    May 2009, and has remained functionally stable for nearly three years now.
    It is barely invasive, touching only the swap subsystem, and adds
    fewer than 100 lines of code to existing swap subsystem files. It has
    improved substantially in syntax between V1 and this posting of V14,
    thanks to the review of a few kernel developers, and has adapted
    easily to at least one major swap subsystem change. As of 3.4, there
    are three in-tree users of frontswap patiently waiting for this
    patchset and for CONFIG_FRONTSWAP to be enabled: zcache (staging
    driver merged at 2.6.39), Xen tmem (merged at 3.0 and 3.1) and
    RAMster (staging driver merged at 3.4). In addition, an RFC has been
    posted for a KVM backend.
    The frontswap patchset has been in linux-next since next-110603. Earlier
    versions of frontswap already ship in the Oracle Unbreakable Enterprise Kernel
    and SuSE SLES.

    This patch, 1 of 4, provides the header file for the core frontswap
    code that interfaces between the hooks in the swap subsystem and a
    frontswap backend via frontswap_ops.
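
    As a hedged sketch, the ops table a backend registers plausibly has
    this shape (operation names follow the store/load/invalidate
    terminology used in this log; the exact signatures are an assumption):

    struct frontswap_ops {
            void (*init)(unsigned);            /* swap type was swapon'ed */
            int (*store)(unsigned, pgoff_t, struct page *);
            int (*load)(unsigned, pgoff_t, struct page *);
            void (*invalidate_page)(unsigned, pgoff_t);
            void (*invalidate_area)(unsigned); /* swap type was swapoff'ed */
    };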
    ---
    New file added: include/linux/frontswap.h

    [v14: add support for writethrough, per suggestion by aarcange@redhat.com]
    [v14: rebase to 3.4-rc2]
    [v11: konrad.wilk@oracle.com: squashed s/flush/invalidate/ in]
    [v10: no change]
    [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 1]
    [v8: rebase to 3.0-rc4]
    [v7: rebase to 3.0-rc3]
    [v7: JBeulich@novell.com: new static inlines resolve to no-ops if not config'd]
    [v7: JBeulich@novell.com: avoid redundant shifts/divides for *_bit lib calls]
    [v6: rebase to 3.1-rc1]
    [v5: no change from v4]
    [v4: rebase to 2.6.39]
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Andrew Morton
    Acked-by: Jan Beulich
    Acked-by: Seth Jennings
    Cc: Jeremy Fitzhardinge
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Nitin Gupta
    Cc: Matthew Wilcox
    Cc: Chris Mason
    Cc: Rik Riel
    [v15: int/bool on some functions]
    Signed-off-by: Konrad Wilk

    Dan Magenheimer