17 Dec, 2009

5 commits


16 Dec, 2009

35 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (80 commits)
    dm snapshot: use merge origin if snapshot invalid
    dm snapshot: report merge failure in status
    dm snapshot: merge consecutive chunks together
    dm snapshot: trigger exceptions in remaining snapshots during merge
    dm snapshot: delay merging a chunk until writes to it complete
    dm snapshot: queue writes to chunks being merged
    dm snapshot: add merging
    dm snapshot: permit only one merge at once
    dm snapshot: support barriers in snapshot merge target
    dm snapshot: avoid allocating exceptions in merge
    dm snapshot: rework writing to origin
    dm snapshot: add merge target
    dm exception store: add merge specific methods
    dm snapshot: create function for chunk_is_tracked wait
    dm snapshot: make bio optional in __origin_write
    dm mpath: reject messages when device is suspended
    dm: export suspended state to targets
    dm: rename dm_suspended to dm_suspended_md
    dm: swap target postsuspend call and setting suspended flag
    dm crypt: add plain64 iv
    ...

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    clockevents: Convert to raw_spinlock
    clockevents: Make tick_device_lock static
    debugobjects: Convert to raw_spinlocks
    perf_event: Convert to raw_spinlock
    hrtimers: Convert to raw_spinlocks
    genirq: Convert irq_desc.lock to raw_spinlock
    smp: Convert smplocks to raw_spinlocks
    rtmutes: Convert rtmutex.lock to raw_spinlock
    sched: Convert pi_lock to raw_spinlock
    sched: Convert cpupri lock to raw_spinlock
    sched: Convert rt_runtime_lock to raw_spinlock
    sched: Convert rq->lock to raw_spinlock
    plist: Make plist debugging raw_spinlock aware
    bkl: Fixup core_lock fallout
    locking: Cleanup the name space completely
    locking: Further name space cleanups
    alpha: Fix fallout from locking changes
    locking: Implement new raw_spinlock
    locking: Convert raw_rwlock functions to arch_rwlock
    locking: Convert raw_rwlock to arch_rwlock
    ...

    Linus Torvalds
     
  • * git://git.infradead.org/battery-2.6:
    power_supply_sysfs: Handle -ENODATA in a special way
    wm831x_backup: Remove unused variables
    gta02: Set pcf50633 charger_reference_current_ma
    pcf50633: Query charger status directly
    pcf50633: Properly reenable charging when the supply conditions change
    pcf50633: Get rid of charging restart software auto-triggering
    pcf50633: introduces battery charging current control
    pcf50633: Add ac power supply class to the charger
    wm831x: Factor out WM831x backup battery charger

    Linus Torvalds
     
  • Implement selftest feature as specified by chip manufacturer. Control:
    read selftest sysfs entry

    Response: "OK x y z" or "FAIL x y z"

    where x, y, and z are difference between selftest mode and normal mode.
    Test is passed when values are within acceptance limit values.

    Acceptance limits are provided via platform data. See chip spesifications
    for acceptance limits. If limits are not properly set, OK / FAIL decision
    is meaningless. However, userspace application can still make decision
    based on the numeric x, y, z values.

    Selftest is meant for HW diagnostic purposes. It is not meant to be
    called during normal use of the chip. It may cause false interrupt
    events. Selftest mode delays polling of the normal results but it doesn't
    cause wrong values. Chip must be in static state during selftest. Any
    acceration during the test causes most probably failure.

    Signed-off-by: Samu Onkalo
    Acked-by: Éric Piel
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Samu Onkalo
     
  • Add the possibility to remap axes via platform data. Function pointers
    for resource setup and release purposes

    Signed-off-by: Samu Onkalo
    Acked-by: Éric Piel
    Cc: Pavel Machek
    Cc: Jean Delvare
    Cc: "Trisal, Kalhan"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Samu Onkalo
     
  • Allow the use of another DMA controller driver in atmel-mci sd/mmc driver.
    This adds a generic dma_slave pointer to the mci platform structure where
    we can store DMA controller information. In atmel-mci we use information
    provided by this structure to initialize the driver (with new helper
    functions that are architecture dependant).

    This also adds at32/avr32 chip modifications to cope with this new access
    method.

    Signed-off-by: Nicolas Ferre
    Cc: Haavard Skinnemoen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Ferre
     
  • Recently, We marked strstrip() as must_check. because it was frequently
    misused and it should be checked. However, we found one exception.
    scsi/ipr.c intentionally ignore return value of strstrip. Because it
    wishes to keep the whitespace at the beginning.

    Thus we need to keep with and without checked whitespace trim function.
    This patch adds a new strim() and changes ipr.c to use it.

    [akpm@linux-foundation.org: coding-style fixes]
    Suggested-by: Alan Cox
    Signed-off-by: KOSAKI Motohiro
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Shrinks vmlinux

    without:
    $ size vmlinux
    text data bss dec hex filename
    6975863 679652 1359668 9015183 898f8f vmlinux

    with:
    $ size vmlinux
    text data bss dec hex filename
    6975639 679652 1359668 9014959 898eaf vmlinux

    Signed-off-by: Joe Perches
    Cc: Jeff Garzik
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • On the following sentence:
    while (*s && isspace(*s))
    s++;

    If *s == 0, isspace() evaluates to ((_ctype[*s] & 0x20) != 0), which
    evaluates to ((0x08 & 0x20) != 0) which equals to 0 as well.
    If *s == 1, we depend on isspace() result anyway. In other words,
    "a char equals zero is never a space", so remove this check.

    Also, *s != 0 is most common case (non-null string).

    Fixed const return as noticed by Jan Engelhardt and James Bottomley.
    Fixed unnecessary extra cast on strstrip() as noticed by Jan Engelhardt.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    André Goddard Rosa
     
  • While at it, use tabs to indent the comments.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    André Goddard Rosa
     
  • The kernel offers with TIOCL_GETKMSGREDIRECT ioctl() the possibility to
    redirect the kernel messages to a specific console.

    However, since it's not possible to switch to the kernel message console
    after a panic(), it would be nice if the kernel would print the panic
    message on the current console.

    This patch series adds a new interface to access the global kmsg_redirect
    variable by a function to be able to use it in code where
    CONFIG_VT_CONSOLE is not set (kernel/panic.c).

    This patch:

    Instead of using and exporting a global value kmsg_redirect, introduce a
    function vt_kmsg_redirect() that both can set and return the console where
    messages are printed.

    Change all users of kmsg_redirect (the VT code itself and kernel/power.c)
    to the new interface.

    The main advantage is that vt_kmsg_redirect() can also be used when
    CONFIG_VT_CONSOLE is not set.

    Signed-off-by: Bernhard Walle
    Cc: Alan Cox
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
  • ..and include them in the lxfb/gxfb drivers rather than asm/geode.h (where
    possible).

    Signed-off-by: Andres Salomon
    Cc: Jordan Crouse
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Chris Ball
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • Signed-off-by: Andres Salomon
    Cc: Jordan Crouse
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Chris Ball
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • The only thing that uses this is the reboot_fixups code.

    Signed-off-by: Andres Salomon
    Cc: Jordan Crouse
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Chris Ball
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • This is based on the old code on arch/x86/kernel/mfgpt_32.c, except it's
    not x86 specific, it's modular, and it makes use of a PCI BAR rather than
    a random MSR. Currently module unloading is not supported; it's uncertain
    whether or not it can be made work with the hardware.

    [akpm@linux-foundation.org: add X86 dependency]
    Signed-off-by: Andres Salomon
    Cc: Jordan Crouse
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Chris Ball
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • This creates a CS5535/CS5536 GPIO driver which uses a gpio_chip backend
    (allowing GPIO users to use the generic GPIO API if desired) while also
    allowing architecture-specific users directly (via the cs5535_gpio_*
    functions).

    Tested on an OLPC machine. Some Leemotes also use CS5536 (with a mips
    cpu), which is why this is in drivers/gpio rather than arch/x86.
    Currently, it conflicts with older geode GPIO support; once MFGPT support
    is reworked to also be more generic, the older geode code will be removed.

    Signed-off-by: Andres Salomon
    Cc: Takashi Iwai
    Cc: Jordan Crouse
    Cc: David Brownell
    Reviewed-by: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • There are quite a few instances in the kernel of checks of pointers both
    against NULL and against the errno range, handling both cases identically.
    This additional helper function would simplify such code.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Phil Carmody
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
     
  • journal_info in task_struct is used in journaling file system only. So
    introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: KONISHI Ryusuke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     
  • Add a printk_ratelimited statement expression macro that uses a per-call
    ratelimit_state so that multiple subsystems output messages are not
    suppressed by a global __ratelimit state.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: s/_rl/_ratelimited/g]
    Signed-off-by: Joe Perches
    Cc: Naohiro Ooiwa
    Cc: Ingo Molnar
    Cc: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • According to feature-removal-schedule.txt, it is the time to remove
    print_fn_descriptor_symbol().

    And a quick grep shows that it no longer has any callers.

    Signed-off-by: WANG Cong
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • rwsem_is_locked() tests ->activity without locks, so we should always keep
    ->activity consistent. However, the code in __rwsem_do_wake() breaks this
    rule, it updates ->activity after _all_ readers waken up, this may give
    some reader a wrong ->activity value, thus cause rwsem_is_locked() behaves
    wrong.

    Quote from Andrew:

    "
    - we have one or more processes sleeping in down_read(), waiting for access.

    - we wake one or more processes up without altering ->activity

    - they start to run and they do rwsem_is_locked(). This incorrectly
    returns "false", because the waker process is still crunching away in
    __rwsem_do_wake().

    - the waker now alters ->activity, but it was too late.
    "

    So we need get a spinlock to protect this. And rwsem_is_locked() should
    not block, thus we use spin_trylock_irqsave().

    [akpm@linux-foundation.org: simplify code]
    Reported-by: Brian Behlendorf
    Cc: Ben Woodard
    Cc: David Howells
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • Don't initialize __print_once. Invert the test to reduce initialized
    data.

    defconfig before: $size vmlinux
    text data bss dec hex filename
    6976022 679572 1359668 9015262 898fde vmlinux

    defconfig after: $size vmlinux
    text data bss dec hex filename
    6976006 679508 1359700 9015214 898fae vmlinux

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • If CONFIG_DYNAMIC_DEBUG is enabled and a source file has:

    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    #include

    dynamic_debug.h will duplicate KBUILD_MODNAME
    in the output string.

    Remove the use of KBUILD_MODNAME from the
    output format string generated by dynamic_debug.h

    If CONFIG_DYNAMIC_DEBUG is not enabled, no compile-time
    check is done to printk/dev_printk arguments.

    Add it.

    Signed-off-by: Joe Perches
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Commit 70867453092297be9afb2249e712a1f960ec0a09 ("printk_once(): use bool
    for boolean flag") changed printk_once() to use bool instead of int for
    its guard variable. Do the same change to WARN_ONCE() and WARN_ON_ONCE(),
    for the same reasons.

    This resulted in a reduction of 1462 bytes on a x86-64 defconfig:

    text data bss dec hex filename
    8101271 1207116 992764 10301151 9d2edf vmlinux.before
    8100553 1207148 991988 10299689 9d2929 vmlinux.after

    Signed-off-by: Cesar Eduardo Barros
    Cc: Roland Dreier
    Cc: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cesar Eduardo Barros
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The NOMMU code currently clears all anonymous mmapped memory. While this
    is what we want in the default case, all memory allocation from userspace
    under NOMMU has to go through this interface, including malloc() which is
    allowed to return uninitialized memory. This can easily be a significant
    performance penalty. So for constrained embedded systems were security is
    irrelevant, allow people to avoid clearing memory unnecessarily.

    This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
    memory for the brk and stack region.

    Signed-off-by: Jie Zhang
    Signed-off-by: Robin Getz
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Acked-by: Paul Mundt
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jie Zhang
     
  • This patch enables extraction of the pfn of a hugepage from
    /proc/pid/pagemap in an architecture independent manner.

    Details
    -------
    My test program (leak_pagemap) works as follows:
    - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
    - read()/write() something on it,
    - call page-types with option -p,
    - munmap() and unlink() the file on hugetlbfs

    Without my patches
    ------------------
    $ ./leak_pagemap
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 1 0 __________________________________
    0x0000000000000804 1 0 __R________M______________________ referenced,mmap
    0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
    0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 101 0

    The output of page-types don't show any hugepage.

    With my patches
    ---------------
    $ ./leak_pagemap
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 1 0 __________________________________
    0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge
    0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge
    0x0000000000000804 1 0 __R________M______________________ referenced,mmap
    0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap
    0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
    0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 51300 200

    The output of page-types shows 51200 pages contributing to hugepages,
    containing 100 head pages and 51100 tail pages as expected.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • The check code for CONFIG_SWAP is redundant, because there is a
    non-CONFIG_SWAP version for PageSwapCache() which just returns 0.

    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • It has no references outside memory_hotplug.c.

    Cc: "Rafael J. Wysocki"
    Cc: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The previous patch enables page migration of ksm pages, but that soon gets
    into trouble: not surprising, since we're using the ksm page lock to lock
    operations on its stable_node, but page migration switches the page whose
    lock is to be used for that. Another layer of locking would fix it, but
    do we need that yet?

    Do we actually need page migration of ksm pages? Yes, memory hotremove
    needs to offline sections of memory: and since we stopped allocating ksm
    pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
    candidates for migration.

    But KSM is currently unconscious of NUMA issues, happily merging pages
    from different NUMA nodes: at present the rule must be, not to use
    MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
    ksm pages does not make sense yet.

    So, to complete support for ksm swapping we need to make hotremove safe.
    ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
    release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
    are freed before migration reaches them, stable_nodes may be left still
    pointing to struct pages which have been removed from the system: the
    stable_node needs to identify a page by pfn rather than page pointer, then
    it can safely prune them when MEM_OFFLINE.

    And make NUMA migration skip PageKsm pages where it skips PageReserved.
    But it's only when we reach unmap_and_move() that the page lock is taken
    and we can be sure that raised pagecount has prevented a PageAnon from
    being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
    page when offlining (has sufficient locking) but reject it otherwise.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • A side-effect of making ksm pages swappable is that they have to be placed
    on the LRUs: which then exposes them to isolate_lru_page() and hence to
    page migration.

    Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and
    rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c. Perhaps some
    consolidation with existing code is possible, but don't attempt that yet
    (try_to_unmap needs to handle nonlinears, but migration pte removal does
    not).

    rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like
    remove_anon_migration_ptes() which it replaces, avoids calling
    page_lock_anon_vma(), because that includes a page_mapped() test which
    fails when all migration ptes are in place. That was valid when NUMA page
    migration was introduced (holding mmap_sem provided the missing guarantee
    that anon_vma's slab had not already been destroyed), but I believe not
    valid in the memory hotremove case added since.

    For now do the same as before, and consider the best way to fix that
    unlikely race later on. When fixed, we can probably use rmap_walk() on
    hwpoisoned ksm pages too: for now, they remain among hwpoison's various
    exceptions (its PageKsm test comes before the page is locked, but its
    page_lock_anon_vma fails safely if an anon gets upgraded).

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • For full functionality, page_referenced_one() and try_to_unmap_one() need
    to know the vma: to pass vma down to arch-dependent flushes, or to observe
    VM_LOCKED or VM_EXEC. But KSM keeps no record of vma: nor can it, since
    vmas get split and merged without its knowledge.

    Instead, note page's anon_vma in its rmap_item when adding to stable tree:
    all the vmas which might map that page are listed by its anon_vma.

    page_referenced_ksm() and try_to_unmap_ksm() then traverse the anon_vma,
    first to find the probable vma, that which matches rmap_item's mm; but if
    that is not enough to locate all instances, traverse again to try the
    others. This catches those occasions when fork has duplicated a pte of a
    ksm page, but ksmd has not yet come around to assign it an rmap_item.

    But each rmap_item in the stable tree which refers to an anon_vma needs to
    take a reference to it. Andrea's anon_vma design cleverly avoided a
    reference count (an anon_vma was free when its list of vmas was empty),
    but KSM now needs to add that. Is a 32-bit count sufficient? I believe
    so - the anon_vma is only free when both count is 0 and list is empty.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Initial implementation for swapping out KSM's shared pages: add
    page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
    faced with a PageKsm page.

    Most of what's needed can be got from the rmap_items listed from the
    stable_node of the ksm page, without discovering the actual vma: so in
    this patch just fake up a struct vma for page_referenced_one() or
    try_to_unmap_one(), then refine that in the next patch.

    Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
    implicit there (being only set with VM_SHARED, already excluded), but
    let's make it explicit, to help justify the lack of nonlinear unmap.

    Rely on the page lock to protect against concurrent modifications to that
    page's node of the stable tree.

    The awkward part is not swapout but swapin: do_swap_page() and
    page_add_anon_rmap() now have to allow for new possibilities - perhaps a
    ksm page still in swapcache, perhaps a swapcache page associated with one
    location in one anon_vma now needed for another location or anon_vma.
    (And the vma might even be no longer VM_MERGEABLE when that happens.)

    ksm_might_need_to_copy() checks for that case, and supplies a duplicate
    page when necessary, simply leaving it to a subsequent pass of ksmd to
    rediscover the identity and merge them back into one ksm page.
    Disappointingly primitive: but the alternative would have to accumulate
    unswappable info about the swapped out ksm pages, limiting swappability.

    Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
    particular case it was handling, so just use it instead.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Add a pointer to the ksm page into struct stable_node, holding a reference
    to the page while the node exists. Put a pointer to the stable_node into
    the ksm page's ->mapping.

    Then we don't need get_ksm_page() while traversing the stable tree: the
    page to compare against is sure to be present and correct, even if it's no
    longer visible through any of its existing rmap_items.

    And we can handle the forked ksm page case more efficiently: no need to
    memcmp our way through the tree to find its match.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove three degrees of obfuscation, left over from when we had
    CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is
    CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only
    built when CONFIG_MMU, so don't need such conditions at all.

    Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from
    169 defconfigs: leave those to evolve in due course.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins