26 Jun, 2015

1 commit

  • Change the "enabled" parameter to be configurable at runtime. Remove the
    enabled check from init(), and move it to the frontswap store() function;
    when enabled, pages will be stored, and when disabled, pages won't be
    stored.

    This is almost identical to Seth's patch from 2 years ago:
    http://lkml.iu.edu/hypermail/linux/kernel/1307.2/04289.html

    [akpm@linux-foundation.org: tweak documentation]
    Signed-off-by: Dan Streetman
    Suggested-by: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

25 Jun, 2015

1 commit

  • There is a very subtle difference between the semantics of mmap()+mlock()
    and mmap(MAP_LOCKED). The former fails if the population of the
    area fails, while the latter doesn't. This basically means that
    mmap(MAP_LOCKED) areas might see a major fault after the mmap syscall
    returns, which is not the case for mlock. The mmap man page has already
    been altered, but Documentation/vm/unevictable-lru.txt deserves a
    clarification as well.
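
    A minimal userspace sketch of the stricter of the two variants, assuming
    an anonymous private mapping. The helper name map_locked_strict() is
    hypothetical and not part of any kernel or libc API; it only illustrates
    that the mmap()+mlock() sequence reports a population failure to the
    caller:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, then lock
 * them with mlock(). Returns the mapping, or NULL with errno set if
 * either step fails. Because mlock() is a separate call, a failure to
 * populate the range is visible here instead of showing up later as a
 * major fault.
 */
void *map_locked_strict(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	if (mlock(p, len) != 0) {
		int saved = errno;	/* munmap() may clobber errno */

		munmap(p, len);
		errno = saved;
		return NULL;
	}
	return p;
}
```

    A caller that gets NULL back can inspect errno (for example ENOMEM or
    EAGAIN) and decide whether to retry with a smaller region. Whether the
    lock succeeds also depends on RLIMIT_MEMLOCK.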

    Signed-off-by: Michal Hocko
    Reported-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

18 Apr, 2015

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "Numerous fixes, the overdue removal of the i2o docs, some new Chinese
    translations, and, hopefully, the README fix that will end the flow of
    identical patches to that file"

    * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
    Documentation/memcg: update memcg/kmem status
    Documentation: blackfin: Makefile: Typo building issue
    Documentation/vm/pagemap.txt: correct location of page-types tool
    Documentation/memory-barriers.txt: typo fix
    doc: Add guest_nice column to example output of `cat /proc/stat'
    Documentation/kernel-parameters: Move "eagerfpu" to its right place
    Documentation: gpio: Update ACPI part of the document to mention _DSD
    docs/completion.txt: Various tweaks and corrections
    doc: completion: context, scope and language fixes
    Documentation:Update Documentation/zh_CN/arm64/memory.txt
    Documentation:Update Documentation/zh_CN/arm64/booting.txt
    Documentation: Chinese translation of arm64/legacy_instructions.txt
    DocBook media: fix broken EIA hyperlink
    Documentation: tweak the maintainers entry
    README: Change gzip/bzip2 to xz compression format
    README: Update version number reference
    doc:pci: Fix typo in Documentation/PCI
    Documentation: drm: Use '->' when describing access through pointers.
    Documentation: Remove mentioning of block barriers
    Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
    ...

    Linus Torvalds
     

16 Apr, 2015

4 commits

  • Create zsmalloc doc which explains design concept and stat information.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • munmap(2) of hugetlb memory requires a length that is hugepage aligned,
    otherwise it may fail. Add this to the documentation.

    This also cleans up the documentation and separates it into logical units:
    one part refers to MAP_HUGETLB and another part refers to requirements for
    shared memory segments.
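
    As a rough illustration of the alignment requirement, the length passed
    to munmap(2) can be rounded up to a multiple of the huge page size. The
    helper name and the 2 MiB constant below are assumptions for the
    example; real code should take the size from the "Hugepagesize" line in
    /proc/meminfo rather than hard-coding it:

```c
#include <stddef.h>

/* Assumed huge page size, for illustration only. */
#define EXAMPLE_HPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: round `len` up to the next multiple of
 * `hpage_size`, which must be a power of two.
 */
size_t hugepage_align_len(size_t len, size_t hpage_size)
{
	return (len + hpage_size - 1) & ~(hpage_size - 1);
}
```

    A mapping created with MAP_HUGETLB would then be released with
    munmap(addr, hugepage_align_len(len, EXAMPLE_HPAGE_SIZE)).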

    Signed-off-by: David Rientjes
    Cc: Jonathan Corbet
    Cc: Davide Libenzi
    Cc: Luiz Capitulino
    Cc: Shuah Khan
    Acked-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Joern Engel
    Cc: Jianguo Wu
    Cc: Eric B Munson
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Add min_size mount option to the hugetlbfs documentation. Also, add the
    missing pagesize option and mention that size can be specified as bytes or
    a percentage of huge page pool.

    Signed-off-by: Mike Kravetz
    Cc: Davidlohr Bueso
    Cc: Aneesh Kumar
    Cc: Joonsoo Kim
    Cc: Andi Kleen
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • …d the unevictable LRU

    The memory compaction code uses the migration code to do most of the
    work in compaction. However, the compaction code interacts with the
    unevictable LRU differently than migration code and this difference
    should be noted in the documentation.

    [akpm@linux-foundation.org: identify /proc/sys/vm/compact_unevictable directly]
    Signed-off-by: Eric B Munson <emunson@akamai.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Eric B Munson
     

15 Apr, 2015

2 commits

  • Currently, cleancache_register_ops returns the previous value of
    cleancache_ops to allow chaining. However, chaining, as it is
    implemented now, is extremely dangerous due to possible pool id
    collisions. Suppose, a new cleancache driver is registered after the
    previous one assigned an id to a super block. If the new driver assigns
    the same id to another super block, which is perfectly possible, we will
    have two different filesystems using the same id. No matter if the new
    driver implements chaining or not, we are likely to get data corruption
    with such a configuration eventually.

    This patch therefore disables the ability to override cleancache_ops
    altogether as potentially dangerous. If there is already cleancache
    driver registered, all further calls to cleancache_register_ops will
    return EBUSY. Since no user of cleancache implements chaining, we only
    need to make minor changes to the code outside the cleancache core.
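
    The register-once behaviour described above can be modelled in
    userspace roughly as follows. This is not the kernel code: the struct
    and function names are hypothetical, and the model is single-threaded,
    whereas a real implementation would need an atomic update:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend operations table. */
struct backend_ops {
	int (*init_fs)(size_t pagesize);
};

static const struct backend_ops *registered_ops;

/*
 * Model of register-once semantics: the first caller wins, and every
 * later call is refused with -EBUSY instead of replacing the ops.
 */
int register_backend(const struct backend_ops *ops)
{
	if (registered_ops != NULL)
		return -EBUSY;
	registered_ops = ops;
	return 0;
}
```

    Since the first set of ops is never replaced, pool ids handed out by
    one backend cannot collide with ids handed out by a later one.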

    Signed-off-by: Vladimir Davydov
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Stefan Hengelein
    Cc: Florian Schmaus
    Cc: Andor Daam
    Cc: Dan Magenheimer
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • __mlock_vma_pages_range() doesn't necessarily mlock pages. It depends on
    vma flags. The same codepath is used for MAP_POPULATE.

    Let's rename __mlock_vma_pages_range() to populate_vma_page_range().

    This patch also drops mlock_vma_pages_range() references from
    documentation. It has gone in cea10a19b797 ("mm: directly use
    __mlock_vma_pages_range() in find_extend_vma()").

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Apr, 2015

1 commit


20 Mar, 2015

1 commit

  • max_ptes_none specifies how many extra small pages (that are
    not already mapped) can be allocated when collapsing a group
    of small pages into one large page.

    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

    A higher value can make programs use additional memory.
    A lower value can reduce the THP performance gain. The CPU time
    cost of max_ptes_none itself is very small and can be ignored.

    Signed-off-by: Ebru Akagunduz
    Reviewed-by: Rik van Riel
    Signed-off-by: Jonathan Corbet

    Ebru Akagunduz
     

12 Feb, 2015

1 commit

  • Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can
    detect zero_page in /proc/kpageflags, and then do memory analysis more
    accurately.
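
    A sketch of how a userspace tool might test the new flag, assuming the
    usual /proc/kpageflags layout of one 64-bit flags word per PFN and
    KPF_ZERO_PAGE being bit 24 as in linux/kernel-page-flags.h. The helper
    names are hypothetical, and reading the file normally requires root:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit number of the zero-page flag (see linux/kernel-page-flags.h). */
#define KPF_ZERO_PAGE 24

/* Return 1 if the flags word marks the page as the zero page. */
int flags_mark_zero_page(uint64_t flags)
{
	return (int)((flags >> KPF_ZERO_PAGE) & 1);
}

/*
 * Read the flags word for `pfn` from /proc/kpageflags.
 * Returns 0 on success, -1 on failure (for example, no permission).
 */
int read_kpageflags(uint64_t pfn, uint64_t *flags)
{
	ssize_t n;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, flags, sizeof(*flags), (off_t)(pfn * sizeof(*flags)));
	close(fd);
	return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

    A memory analysis tool could call read_kpageflags() for each PFN of
    interest and count the pages for which flags_mark_zero_page() is true.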

    Signed-off-by: Yalin Wang
    Acked-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang, Yalin
     

11 Feb, 2015

2 commits

  • Pull trivial tree changes from Jiri Kosina:
    "Patches from trivial.git that keep the world turning around.

    Mostly documentation and comment fixes, and a two corner-case code
    fixes from Alan Cox"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kexec, Kconfig: spell "architecture" properly
    mm: fix cleancache debugfs directory path
    blackfin: mach-common: ints-priority: remove unused function
    doubletalk: probe failure causes OOPS
    ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
    msdos_fs.h: fix 'fields' in comment
    scsi: aic7xxx: fix comment
    ARM: l2c: fix comment
    ibmraid: fix writeable attribute with no store method
    dynamic_debug: fix comment
    doc: usbmon: fix spelling s/unpriviledged/unprivileged/
    x86: init_mem_mapping(): use capital BIOS in comment

    Linus Torvalds
     
  • remap_file_pages(2) was invented to be able to efficiently map parts of
    a huge file into a limited 32-bit virtual address space, such as in
    database workloads.

    Nonlinear mappings are a pain to support and it seems there are no
    legitimate use-cases nowadays since 64-bit systems are widely available.

    Let's drop it and get rid of all this special-cased code.

    The patch replaces the syscall with an emulation which creates a new VMA
    on each remap_file_pages(), unless it can be merged with an adjacent
    one.

    I didn't find *any* real code that uses remap_file_pages(2) to test
    emulation impact on. I've checked Debian code search and source of all
    packages in ALT Linux. No real users: libc wrappers, mentions in
    strace, gdb, valgrind and this kind of stuff.

    There are few basic tests in LTP for the syscall. They work just fine
    with emulation.

    To test the performance impact, I've written a small test case which
    demonstrates pretty much the worst-case scenario: map a 4G shmfs file,
    write the pgoff of each page to the beginning of that page, remap the
    pages in reverse order, then read every page.

    The test creates 1 million VMAs if emulation is in use, so I had to
    set vm.max_map_count to 1100000 to avoid -ENOMEM.

    Before: 23.3 ( +- 4.31% ) seconds
    After: 43.9 ( +- 0.85% ) seconds
    Slowdown: 1.88x

    I believe we can live with that.

    Test case:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define MB (1024UL * 1024)
    #define SIZE (4096 * MB)

    int main(int argc, char **argv)
    {
            unsigned long *p;
            long i, pass;

            for (pass = 0; pass < 10; pass++) {
                    p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                    if (p == MAP_FAILED) {
                            perror("mmap");
                            return -1;
                    }

                    /* Write each page's offset to the start of that page. */
                    for (i = 0; i < SIZE / 4096; i++)
                            p[i * 4096 / sizeof(*p)] = i;

                    /* Remap the pages in reverse order. */
                    for (i = 0; i < SIZE / 4096; i++) {
                            if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                            0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                    perror("remap_file_pages");
                                    return -1;
                            }
                    }

                    /* Read every page and check that the order is reversed. */
                    for (i = SIZE / 4096 - 1; i >= 0; i--)
                            assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                    munmap(p, SIZE);
            }

            return 0;
    }

    [akpm@linux-foundation.org: fix spello]
    [sasha.levin@oracle.com: initialize populate before usage]
    [sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
    Signed-off-by: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Linus Torvalds
    Cc: Armin Rigo
    Signed-off-by: Sasha Levin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

20 Jan, 2015

1 commit


14 Dec, 2014

2 commits

  • Merge second patchbomb from Andrew Morton:
    - the rest of MM
    - misc fs fixes
    - add execveat() syscall
    - new ratelimit feature for fault-injection
    - decompressor updates
    - ipc/ updates
    - fallocate feature creep
    - fsnotify cleanups
    - a few other misc things

    * emailed patches from Andrew Morton : (99 commits)
    cgroups: Documentation: fix trivial typos and wrong paragraph numberings
    parisc: percpu: update comments referring to __get_cpu_var
    percpu: update local_ops.txt to reflect this_cpu operations
    percpu: remove __get_cpu_var and __raw_get_cpu_var macros
    fsnotify: remove destroy_list from fsnotify_mark
    fsnotify: unify inode and mount marks handling
    fallocate: create FAN_MODIFY and IN_MODIFY events
    mm/cma: make kmemleak ignore CMA regions
    slub: fix cpuset check in get_any_partial
    slab: fix cpuset check in fallback_alloc
    shmdt: use i_size_read() instead of ->i_size
    ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
    ipc/msg: increase MSGMNI, remove scaling
    ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
    ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
    lib/decompress.c: consistency of compress formats for kernel image
    decompress_bunzip2: off by one in get_next_block()
    usr/Kconfig: make initrd compression algorithm selection not expert
    fault-inject: add ratelimit option
    ratelimit: add initialization macro
    ...

    Linus Torvalds
     
  • Page owner tracks who allocated each page. This document explains
    what the page owner feature is and what its merits are. It also
    includes a simple HOW-TO. See the document for detailed
    information.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

07 Nov, 2014

1 commit


23 Oct, 2014

1 commit


07 Jun, 2014

1 commit

  • The remap_file_pages() system call is used to create a nonlinear
    mapping, that is, a mapping in which the pages of the file are mapped
    into a nonsequential order in memory. The advantage of using
    remap_file_pages() over using repeated calls to mmap(2) is that the
    former approach does not require the kernel to create additional VMA
    (Virtual Memory Area) data structures.

    Supporting nonlinear mappings requires a significant amount of
    non-trivial code in the kernel virtual memory subsystem, including hot
    paths. Also, to make nonlinear mappings work, the kernel needs a way to
    distinguish normal page table entries from entries with a file offset
    (pte_file). The kernel reserves a flag in the PTE for this purpose. PTE
    flags are a scarce resource, especially on some CPU architectures. It
    would be nice to free up the flag for other usage.

    Fortunately, there are not many users of remap_file_pages() in the wild.
    It's only known that one enterprise RDBMS implementation uses the
    syscall on 32-bit systems to map files bigger than can linearly fit into
    32-bit virtual address space. This use-case is not critical anymore
    since 64-bit systems are widely available.

    The plan is to deprecate the syscall and replace it with an emulation.
    The emulation will create new VMAs instead of nonlinear mappings. It's
    going to work slower for rare users of remap_file_pages() but ABI is
    preserved.

    One side effect of emulation (apart from performance) is that user can
    hit vm.max_map_count limit more easily due to additional VMAs. See
    comment for DEFAULT_MAX_MAP_COUNT for more details on the limit.

    [akpm@linux-foundation.org: fix spello]
    Signed-off-by: Kirill A. Shutemov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Jones
    Cc: Armin Rigo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

05 Jun, 2014

1 commit

  • Currently the memory error handler handles action optional errors in a
    deferred manner by default. If a recovery-aware application wants
    to handle them immediately, it can do so by setting the PF_MCE_EARLY
    flag. However, such a signal can be sent only to the main thread, so
    it's problematic if the application wants to have a dedicated thread to
    handle such signals.

    So this patch adds dedicated thread support to memory error handler. We
    have PF_MCE_EARLY flags for each thread separately, so with this patch
    AO signal is sent to the thread with PF_MCE_EARLY flag set, not the main
    thread. If you want to implement a dedicated thread, you call prctl()
    to set PF_MCE_EARLY on the thread.
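
    From userspace, the per-thread opt-in might look like the sketch below.
    It assumes the prctl(2) PR_MCE_KILL interface is what sets PF_MCE_EARLY
    on the calling thread; the wrapper names are hypothetical:

```c
#include <sys/prctl.h>

/*
 * Ask for early delivery of action optional memory errors to the
 * calling thread. Returns 0 on success, -1 on error (errno is set).
 */
int mce_request_early_kill(void)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Return the calling thread's current policy, or -1 on error. */
int mce_get_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

    The dedicated handler thread would call mce_request_early_kill() itself
    before waiting for SIGBUS, since the flag is kept per thread.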

    Memory error handler collects processes to be killed, so this patch lets
    it check PF_MCE_EARLY flag on each thread in the collecting routines.

    No behavioral change for all non-early kill cases.

    Tony said:

    : The old behavior was crazy - someone with a multithreaded process might
    : well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
    : that thread would see the SIGBUS with si_code = BUS_MCEERR_AO - even if
    : that thread wasn't the main thread for the process.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Tony Luck
    Cc: Kamil Iskra
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Chen Gong
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

04 Jun, 2014

1 commit

  • Pull trivial tree changes from Jiri Kosina:
    "Usual pile of patches from trivial tree that make the world go round"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    staging: go7007: remove reference to CONFIG_KMOD
    aic7xxx: Remove obsolete preprocessor define
    of: dma: doc fixes
    doc: fix incorrect formula to calculate CommitLimit value
    doc: Note need of bc in the kernel build from 3.10 onwards
    mm: Fix printk typo in dmapool.c
    modpost: Fix comment typo "Modules.symvers"
    Kconfig.debug: Grammar s/addition/additional/
    wimax: Spelling s/than/that/, wording s/destinatary/recipient/
    aic7xxx: Spelling s/termnation/termination/
    arm64: mm: Remove superfluous "the" in comment
    of: Spelling s/anonymouns/anonymous/
    dma: imx-sdma: Spelling s/determnine/determine/
    ath10k: Improve grammar in comments
    ath6kl: Spelling s/determnine/determine/
    of: Improve grammar for of_alias_get_id() documentation
    drm/exynos: Spelling s/contro/control/
    radio-bcm2048.c: fix wrong overflow check
    doc: printk-formats: do not mention casts for u64/s64
    doc: spelling error changes
    ...

    Linus Torvalds
     

05 May, 2014

1 commit


19 Apr, 2014

1 commit

  • In document numa_memory_policy.txt, the following examples for flag
    MPOL_F_RELATIVE_NODES are incorrect.

    For example, consider a task that is attached to a cpuset with
    mems 2-5 that sets an Interleave policy over the same set with
    MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the
    interleave now occurs over nodes 3,5-6. If the cpuset's mems
    then change to 0,2-3,5, then the interleave occurs over nodes
    0,3,5.

    According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
    the nodemasks the user specifies should be considered relative to the
    current task's mems_allowed.

    (https://lkml.org/lkml/2008/2/29/428)

    And according to numa_memory_policy.txt, if the user's nodemask includes
    nodes that are outside the range of the new set of allowed nodes, then
    the remap wraps around to the beginning of the nodemask and, if not
    already set, sets the node in the mempolicy nodemask.

    So in the example, if the user specifies 2-5 for a task whose
    mems_allowed is 3-7, the nodemask should be remapped to the third,
    fourth, fifth and sixth nodes in mems_allowed, like the following:

        mems_allowed:       3  4  5  6  7

        relative index:     0  1  2  3  4
                            5

    So the nodemasks should be remapped to 3,5-7, but not 3,5-6.

    And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
    remapped to 0,2-3,5, but not 0,3,5.

        mems_allowed:       0  2  3  5

        relative index:     0  1  2  3
                            4  5
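
    The wrap-around remap described in this message can be modelled in a
    few lines of userspace C. This is a sketch, not the kernel's
    implementation: nodemasks are represented as 64-bit words where bit i
    stands for node i, and the function name is hypothetical:

```c
#include <stdint.h>

/*
 * Map each relative index set in `relative` onto the corresponding
 * node in `allowed`, wrapping around when the index exceeds the
 * number of allowed nodes. Bit i of a mask stands for node i.
 */
uint64_t remap_relative_nodes(uint64_t relative, uint64_t allowed)
{
	int nodes[64];
	int n = 0;
	uint64_t result = 0;

	/* Collect the allowed nodes in ascending order. */
	for (int i = 0; i < 64; i++)
		if (allowed & (1ULL << i))
			nodes[n++] = i;
	if (n == 0)
		return 0;

	/* Relative index r selects the (r mod n)-th allowed node. */
	for (int r = 0; r < 64; r++)
		if (relative & (1ULL << r))
			result |= 1ULL << nodes[r % n];
	return result;
}
```

    With a relative mask of 2-5 (0x3c), a mems_allowed of 3-7 (0xf8) gives
    0xe8, which is nodes 3,5-7, and a mems_allowed of 0,2-3,5 (0x2d) gives
    0x2d, which is nodes 0,2-3,5.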

    Signed-off-by: Tang Chen
    Cc: Randy Dunlap
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

21 Mar, 2014

1 commit


11 Feb, 2014

1 commit

  • Some of the 00-INDEX files are somewhat outdated and some folders do
    not contain 00-INDEX at all. Only outdated (with the notable exception
    of spi) indexes are touched here; the 169 folders without 00-INDEX have
    not been touched.

    New 00-INDEX
    - spi/* was added in a series of commits dating back to 2006

    Added files (missing in (*/)00-INDEX)
    - dmatest.txt was added by commit 851b7e16a07d ("dmatest: run test via
    debugfs")
    - this_cpu_ops.txt was added by commit a1b2a555d637 ("percpu: add
    documentation on this_cpu operations")
    - ww-mutex-design.txt was added by commit 040a0a371005 ("mutex: Add
    support for wound/wait style locks")
    - bcache.txt was added by commit cafe56359144 ("bcache: A block layer
    cache")
    - kernel-per-CPU-kthreads.txt was added by commit 49717cb40410
    ("kthread: Document ways of reducing OS jitter due to per-CPU
    kthreads")
    - phy.txt was added by commit ff764963479a ("drivers: phy: add generic
    PHY framework")
    - block/null_blk was added by commit 12f8f4fc0314 ("null_blk:
    documentation")
    - module-signing.txt was added by commit 3cafea307642 ("Add
    Documentation/module-signing.txt file")
    - assoc_array.txt was added by commit 3cb989501c26 ("Add a generic
    associative array implementation.")
    - arm/IXP4xx was part of the initial repo
    - arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28e839
    ("ARM: mcpm: introduce helpers for platform coherency exit/setup")
    - arm/firmware.txt was added by commit 7366b92a77fc ("ARM: Add
    interface for registering and calling firmware-specific operations")
    - arm/kernel_mode_neon.txt was added by commit 2afd0a05241d ("ARM:
    7825/1: document the use of NEON in kernel mode")
    - arm/tcm.txt was added by commit bc581770cfdd ("ARM: 5580/2: ARM TCM
    (Tightly-Coupled Memory) support v3")
    - arm/vlocks.txt was added by commit 9762f12d3e05 ("ARM: mcpm: Add
    baremetal voting mutexes")
    - blackfin/gptimers-example.c, Makefile was added by commit
    4b60779d5ea7 ("Blackfin: add an example showing how to use the
    gptimers API")
    - devicetree/usage-model.txt was added by commit 31134efc681a ("dt:
    Linux DT usage model documentation")
    - fb/api.txt was added by commit fb21c2f42879 ("fbdev: Add FOURCC-based
    format configuration API")
    - fb/sm501.txt was added by commit e6a049807105 ("video, sm501: add
    edid and commandline support")
    - fb/udlfb.txt was added by commit 96f8d864afd6 ("fbdev: move udlfb out
    of staging.")
    - filesystems/Makefile was added by commit 1e0051ae48a2
    ("Documentation/fs/: split txt and source files")
    - filesystems/nfs/nfsd-admin-interfaces.txt was added by commit
    8a4c6e19cfed ("nfsd: document kernel interfaces for nfsd
    configuration")
    - ide/warm-plug-howto.txt was added by commit f74c91413ec6 ("ide: add
    warm-plug support for IDE devices (take 2)")
    - laptops/Makefile was added by commit d49129accc21
    ("Documentation/laptop/: split txt and source files")
    - leds/leds-blinkm.txt was added by commit b54cf35a7f65 ("LEDS: add
    BlinkM RGB LED driver, documentation and update MAINTAINERS")
    - leds/ledtrig-oneshot.txt was added by commit 5e417281cde2 ("leds: add
    oneshot trigger")
    - leds/ledtrig-transient.txt was added by commit 44e1e9f8e705 ("leds:
    add new transient trigger for one shot timer activation")
    - m68k/README.buddha was part of the initial repo
    - networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits
    40839129f779, c4e84bde1d59, 5a4faa873782
    - networking/Makefile was added by commit 3794f3e812ef ("docsrc: build
    Documentation/ sources")
    - networking/i40evf.txt was added by commit 105bf2fe6b32 ("i40evf: add
    driver to kernel build system")
    - networking/ipsec.txt was added by commit b3c6efbc36e2 ("xfrm: Add
    file to document IPsec corner case")
    - networking/mac80211-auth-assoc-deauth.txt was added by commit
    3cd7920a2be8 ("mac80211: add auth/assoc/deauth flow diagram")
    - networking/netlink_mmap.txt was added by commit 5683264c3981
    ("netlink: add documentation for memory mapped I/O")
    - networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e1597f
    ("netfilter: doc: add nf_conntrack sysctl api documentation")
    - networking/team.txt was added by commit 3d249d4ca7d0 ("net: introduce
    ethernet teaming device")
    - networking/vxlan.txt was added by commit d342894c5d2f ("vxlan:
    virtual extensible lan")
    - power/runtime_pm.txt was added by commit 5e928f77a09a ("PM: Introduce
    core framework for run-time PM of I/O devices (rev. 17)")
    - power/charger-manager.txt was added by commit 3bb3dbbd56ea
    ("power_supply: Add initial Charger-Manager driver")
    - RCU/lockdep-splat.txt was added by commit d7bd2d68aa2e ("rcu:
    Document interpretation of RCU-lockdep splats")
    - s390/kvm.txt was added by 5ecee4b (KVM: s390: API documentation)
    - s390/qeth.txt was added by commit b4d72c08b358 ("qeth: bridgeport
    support - basic control")
    - scheduler/sched-bwc.txt was added by commit 88ebc08ea9f7 ("sched: Add
    documentation for bandwidth control")
    - scsi/advansys.txt was added by commit 4bd6d7f35661 ("[SCSI] advansys:
    Move documentation to Documentation/scsi")
    - scsi/bfa.txt was added by commit 1ec90174bdb4 ("[SCSI] bfa: add
    readme file")
    - scsi/bnx2fc.txt was added by commit 12b8fc10eaf4 ("[SCSI] bnx2fc: Add
    driver documentation")
    - scsi/cxgb3i.txt was added by commit c3673464ebc0 ("[SCSI] cxgb3i: Add
    cxgb3i iSCSI driver.")
    - scsi/hpsa.txt was added by commit 992ebcf14f3c ("[SCSI] hpsa: Add
    hpsa.txt to Documentation/scsi")
    - scsi/link_power_management_policy.txt was added by commit
    ca77329fb713 ("[libata] Link power management infrastructure")
    - scsi/osd.txt was added by commit 78e0c621deca ("[SCSI] osd:
    Documentation for OSD library")
    - scsi/scsi-parameter.txt was created/moved by commit 163475fb111c
    ("Documentation: move SCSI parameters to their own text file")
    - serial/driver was part of the initial repo
    - serial/n_gsm.txt was added by commit 323e84122ec6 ("n_gsm: add a
    documentation")
    - timers/Makefile was added by commit 3794f3e812ef ("docsrc: build
    Documentation/ sources")
    - virt/kvm/s390.txt was added by commit d9101fca3d57 ("KVM: s390:
    diagnose call documentation")
    - vm/split_page_table_lock was added by commit 49076ec2ccaf ("mm:
    dynamically allocate page->ptl if it cannot be embedded to struct
    page")
    - w1/slaves/w1_ds28e04 was added by commit fbf7f7b4e2ae ("w1: Add
    1-wire slave device driver for DS28E04-100")
    - w1/masters/omap-hdq was added by commit e0a29382c6f5 ("hdq:
    documentation for OMAP HDQ")
    - x86/early-microcode.txt was added by commit 0d91ea86a895 ("x86, doc:
    Documentation for early microcode loading")
    - x86/earlyprintk.txt was added by commit a1aade478862 ("x86/doc:
    mini-howto for using earlyprintk=dbgp")
    - x86/entry_64.txt was added by commit 8b4777a4b50c ("x86-64: Document
    some of entry_64.S")
    - x86/pat.txt was added by commit d27554d874c7 ("x86: PAT
    documentation")

    Moved files
    - arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by
    commit 37b8304642c7 ("ARM: kuser: move interface documentation out of
    the source code")
    - efi-stub.txt was moved out of x86/ and down into Documentation/ in
    commit 4172fe2f8a47 ("EFI stub documentation updates")
    - laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit
    efcfed9bad88 ("Move hp_accel to drivers/platform/x86")
    - commit 5616c23ad9cd ("x86: doc: move x86-generic documentation from
    Doc/x86/i386"):
    * x86/usb-legacy-support.txt
    * x86/boot.txt
    * x86/zero_page.txt
    - power/video_extension.txt was moved to acpi in commit 70e66e4df191
    ("ACPI / video: move video_extension.txt to Documentation/acpi")

    Removed files (left in 00-INDEX)
    - memory.txt was removed by commit 00ea8990aadf ("memory.txt: remove
    stray information")
    - gpio.txt was moved to gpio/ in commit fd8e198cfcaa ("Documentation:
    gpiolib: document new interface")
    - networking/DLINK.txt was removed by commit 168e06ae26dd
    ("drivers/net: delete old parallel port de600/de620 drivers")
    - serial/hayes-esp.txt was removed by commit f53a2ade0bb9 ("tty: esp:
    remove broken driver")
    - s390/TAPE was removed by commit 9e280f669308 ("[S390] remove tape
    block docu")
    - vm/locking was removed by commit 57ea8171d2bc ("mm: documentation:
    remove hopelessly out-of-date locking doc")
    - laptops/acer-wmi.txt was removed by commit 020036678e81 ("acer-wmi:
    Delete out-of-date documentation")

    Typos/misc issues
    - rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit
    030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS
    authentication.")
    - commit b88cf73d9278 ("net: add missing entries to
    Documentation/networking/00-INDEX")
    * generic-hdlc.txt was added as generic_hdlc.txt
    * spider_net.txt was added as spider-net.txt
    - w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139f74c ("w1: add
    1-wire master driver for i.MX27 / i.MX31")
    - s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a407e
    ("[S390] Add Documentation/s390/00-INDEX.")

    Signed-off-by: Henrik Austad
    Reviewed-by: Paul E. McKenney [rcu bits]
    Acked-by: Rob Landley
    Cc: Jiri Kosina
    Cc: Thomas Gleixner
    Cc: Rob Herring
    Cc: David S. Miller
    Cc: Mark Brown
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Gleb Natapov
    Cc: Linus Torvalds
    Cc: Len Brown
    Cc: James Bottomley
    Cc: Jean-Christophe Plagniol-Villard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Austad
     

24 Jan, 2014

1 commit

  • Documentation/vm/locking is a blast from the past. In the entire git
    history, it has had precisely three modifications. Two of those look to
    be pure renames, and the third was from 2005.

    The doc contains such gems as:

    > The page_table_lock is grabbed while holding the
    > kernel_lock spinning monitor.

    > Page stealers hold kernel_lock to protect against a bunch of
    > races.

    Or this which talks about mmap_sem:

    > 4. The exception to this rule is expand_stack, which just
    > takes the read lock and the page_table_lock, this is ok
    > because it doesn't really modify fields anybody relies on.

    expand_stack() doesn't take any locks any more directly, and the
    mmap_sem acquisition was long ago moved up in to the page fault code
    itself.

    It could be argued that we need to rewrite this, but it is dangerous to
    leave it as-is. It will confuse more people than it helps.

    Signed-off-by: Dave Hansen
    Cc: Hugh Dickins
    Acked-by: Vlastimil Babka
    Cc: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

22 Jan, 2014

1 commit

  • Some applications that run on HPC clusters are designed around the
    availability of RAM and the overcommit ratio is fine tuned to get the
    maximum usage of memory without swapping. With growing memory, the
    1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
    for these workloads (on a 2TB machine it represents no less than 20GB).

    This patch adds the new overcommit_kbytes sysctl variable that allows a
    much finer grain.
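
    The granularity argument above can be checked with a few lines of
    arithmetic. This is only a sketch: the function names are invented for
    illustration, and it ignores any rounding the kernel applies when it
    converts either setting into pages.

```python
# Sketch: smallest step by which each sysctl can move the commit limit.
# Function names are illustrative, not kernel or procfs APIs.

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def ratio_step_bytes(total_ram_bytes: int) -> int:
    """Bytes added to the limit by raising overcommit_ratio by one point."""
    return total_ram_bytes // 100


def kbytes_step_bytes() -> int:
    """Bytes added to the limit by raising overcommit_kbytes by one."""
    return KIB


if __name__ == "__main__":
    ram = 2 * TIB
    print(f"overcommit_ratio step:  {ratio_step_bytes(ram) / GIB:.2f} GiB")
    print(f"overcommit_kbytes step: {kbytes_step_bytes()} bytes")
```

    On a 2TB machine the percentage-based step comes out at a little over
    20GiB, which matches the figure quoted in the commit message.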

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     

22 Nov, 2013

1 commit

  • There are two code paths through which a page holding a pmd page table
    can be freed: pmd_free() and pmd_free_tlb().

    I missed the second one and didn't add a page table destructor call
    there. This leaks page->ptl for pmd page tables when a dynamically
    allocated page->ptl is in use.

    The patch adds the missing destructor and modifies the documentation
    accordingly.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Andrey Vagin
    Tested-by: Andrey Vagin
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds
     

15 Nov, 2013

1 commit

  • If split page table lock is in use, we embed the lock into the struct
    page of the table's page. We have to disable the split lock if
    spinlock_t is too big to be embedded, such as when DEBUG_SPINLOCK or
    DEBUG_LOCK_ALLOC is enabled.

    This patch adds support for dynamic allocation of the split page table
    lock if we can't embed it in struct page.

    page->ptl is unsigned long now and we use it as spinlock_t if
    sizeof(spinlock_t) <= sizeof(long); otherwise it is a pointer to a
    spinlock_t.

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

13 Nov, 2013

1 commit


14 Oct, 2013

1 commit

  • The following commits moved files out of Documentation/vm/
    c6dd897f ("mm: move page-types.c from Documentation to tools/vm")
    f0f57b2b ("move hugepage test examples to tools/testing/selftests/vm")

    Remove these files from vm/00-INDEX.

    The following commits added new files to Documentation/vm/
    4fe4746a ("mm/fs: cleancache documentation") added vm/cleancache.txt
    d65bfacb ("mm: highmem documentation") added vm/highmem.txt
    1c9bf22c ("thp: transparent hugepage support documentation") added
    vm/transhuge.txt
    0f8975ec ("mm: soft-dirty bits for user memory changes tracking")
    61b0d760 ("zswap: add documentation")
    27c6aec2 ("mm: frontswap: config and doc files")

    Add the missing documentation files with a short description to 00-INDEX.

    Signed-off-by: Henrik Austad
    Signed-off-by: Jiri Kosina

    Henrik Austad
     

12 Sep, 2013

2 commits

  • Pavel reported that if a vma area gets unmapped and then mapped (or
    expanded) in place, the soft-dirty tracker can't recognize this
    situation, since it works at the pte level and ptes get zapped on unmap,
    losing the soft-dirty bit.

    To resolve this we need to track actions at the vma level; this is where
    the VM_SOFTDIRTY flag comes in. When a new vma area is created (or an old
    one is expanded) we set this bit, and keep it there until the application
    asks for the soft-dirty bit to be cleared.

    Thus a user-space application that tracks memory changes can now detect
    that a vma area has been renewed.
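
    The interaction between the two levels can be illustrated with a toy
    model. Everything here is hypothetical: ToyVma and remap_in_place() are
    invented names that mimic the behaviour described above, not kernel
    structures or functions, and the rule that a page is reported soft-dirty
    when either level is set is an assumption of the model.

```python
# Toy model only: class and method names are invented for illustration and
# do not correspond to kernel structures or functions.


class ToyVma:
    def __init__(self):
        self.vma_soft_dirty = True    # new areas start out flagged
        self.pte_soft_dirty = {}      # page index -> per-PTE soft-dirty bit

    def clear_refs(self):
        """Model the application asking for soft-dirty bits to be cleared."""
        self.vma_soft_dirty = False
        for page in self.pte_soft_dirty:
            self.pte_soft_dirty[page] = False

    def write(self, page):
        """Model a write fault setting the per-PTE bit."""
        self.pte_soft_dirty[page] = True

    def is_soft_dirty(self, page):
        """Model assumption: dirty if either the VMA or the PTE says so."""
        return self.vma_soft_dirty or self.pte_soft_dirty.get(page, False)


def remap_in_place(old):
    """Model an unmap followed by a map at the same address.

    The old PTE bits are discarded along with the old area; the new area
    is a fresh ToyVma and therefore carries the VMA-level flag.
    """
    return ToyVma()


if __name__ == "__main__":
    vma = ToyVma()
    vma.clear_refs()
    vma.write(3)
    print("page 3 after write:", vma.is_soft_dirty(3))
    print("page 4 untouched:  ", vma.is_soft_dirty(4))
    vma = remap_in_place(vma)
    print("page 4 after remap:", vma.is_soft_dirty(4))
```

    Without the VMA-level flag, page 4 would look clean after the remap even
    though its contents were replaced, which is the case Pavel reported.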

    Reported-by: Pavel Emelyanov
    Signed-off-by: Cyrill Gorcunov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Explicitly mention/recommend using the libhugetlbfs test cases when
    changing related kernel code. Developers that are unaware of the project
    can easily miss this and introduce potential regressions that may or may
    not be caught by community review.

    Also do some cleanups that make the document visually easier to view at a
    first glance.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

11 Jul, 2013

1 commit

  • Add the documentation file for the zswap functionality

    Signed-off-by: Seth Jennings
    Acked-by: Rik van Riel
    Cc: Greg Kroah-Hartman
    Cc: Nitin Gupta
    Cc: Minchan Kim
    Cc: Konrad Rzeszutek Wilk
    Cc: Dan Magenheimer
    Cc: Robert Jennings
    Cc: Jenifer Hopper
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Larry Woodman
    Cc: Benjamin Herrenschmidt
    Cc: Dave Hansen
    Cc: Joe Perches
    Cc: Joonsoo Kim
    Cc: Cody P Schafer
    Cc: Hugh Dickins
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Seth Jennings
     

10 Jul, 2013

1 commit

  • Transparent huge zero page is used during the page fault instead of in
    khugepaged.

    # ls /sys/kernel/mm/transparent_hugepage/
    defrag enabled khugepaged use_zero_page
    # ls /sys/kernel/mm/transparent_hugepage/khugepaged/
    alloc_sleep_millisecs defrag full_scans max_ptes_none pages_collapsed pages_to_scan scan_sleep_millisecs

    This patch corrects the documentation to match what the code does.

    Signed-off-by: Wanpeng Li
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds
     

04 Jul, 2013

2 commits

  • In order to reuse bits from pagemap entries gracefully, we leave the
    entries as is but on pagemap open emit a warning in dmesg, that bits
    55-60 are about to change in a couple of releases. Next, if a user
    issues soft-dirty clear command via the clear_refs file (it was disabled
    before v3.9) we assume that he's aware of the new pagemap format, note
    that fact and report the bits in pagemap in the new manner.

    The "migration strategy" looks like this then:

    1. existing users are not affected -- they don't touch soft-dirty feature, thus
    see old bits in pagemap, but are warned and have time to fix themselves
    2. those who use soft-dirty know about new pagemap format
    3. some time soon we get rid of any signs of page-shift in pagemap as well as
    this trick with clear-soft-dirty affecting pagemap format.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Soft-dirty is a bit on a PTE which helps to track which pages a task
    writes to. In order to do this tracking one should

    1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
    2. Wait some time.
    3. Read soft-dirty bits (55th in /proc/PID/pagemap2 entries)

    To do this tracking, the writable bit is cleared from PTEs when the
    soft-dirty bit is. Thus, after this, when the task tries to modify a
    page at some virtual address the #PF occurs and the kernel sets the
    soft-dirty bit on the respective PTE.

    Note that although all of the task's address space is marked as r/o
    after the soft-dirty bits are cleared, the #PF-s that occur after that
    are processed fast. This is because the pages are still mapped to
    physical memory, so all the kernel does is find this fact out and put
    back the writable, dirty and soft-dirty bits on the PTE.

    Another thing to note is that when mremap moves PTEs they are marked
    with soft-dirty as well, since from the user's perspective mremap
    modifies the virtual memory at mremap's new address.
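
    The three tracking steps listed above could be driven from user space
    roughly as follows. This is a sketch under stated assumptions: the
    helper names are invented, it reads /proc/PID/pagemap (the file
    described in Documentation/vm/pagemap.txt) rather than the pagemap2 name
    used in this message, and it assumes one native-endian 64-bit entry per
    virtual page. It needs a Linux kernel built with soft-dirty support to
    do anything useful.

```python
import os
import struct

SOFT_DIRTY_BIT = 55        # bit position named in the commit message
PAGEMAP_ENTRY_SIZE = 8     # assumption: one 64-bit entry per virtual page


def is_soft_dirty(entry: int) -> bool:
    """Return True if the soft-dirty bit is set in a pagemap entry."""
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in the pagemap file of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def clear_soft_dirty(pid="self") -> None:
    """Step 1: clear soft-dirty bits (echo 4 > /proc/PID/clear_refs)."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def read_pagemap_entry(vaddr: int, pid="self") -> int:
    """Step 3: read the raw 64-bit pagemap entry for one virtual address."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        data = f.read(PAGEMAP_ENTRY_SIZE)
    if len(data) != PAGEMAP_ENTRY_SIZE:
        raise OSError("short read from pagemap")
    return struct.unpack("=Q", data)[0]
```

    A tracker would call clear_soft_dirty(), let the task run for a while
    (step 2), and then pass each entry from read_pagemap_entry() to
    is_soft_dirty() to find the pages written in between.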

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov