20 Dec, 2017

1 commit

  • [ Upstream commit 592e254502041f953e84d091eae2c68cba04c10b ]

    _calc_vm_trans() does not handle the situation when some of the passed
    flags are 0 (which can happen if these VM flags do not make sense for
    the architecture). Improve the _calc_vm_trans() macro to return 0 in
    such a situation. Since all passed flags are constant, this does not add
    any runtime overhead.

    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
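
The zero-flag handling described in this entry can be illustrated with a minimal userspace sketch. It is written as a function rather than a macro for readability, and the name translate_flag is hypothetical, not the kernel's; it assumes each flag is either zero or a single bit.

```c
#include <assert.h>

/*
 * Hypothetical userspace analogue of the flag translation described above:
 * if bit1 is set in x, return bit2, otherwise return 0.
 * Assumption: bit1 and bit2 are each either 0 or a single power of two.
 */
unsigned long translate_flag(unsigned long x,
                             unsigned long bit1,
                             unsigned long bit2)
{
    assert((bit1 & (bit1 - 1)) == 0);
    assert((bit2 & (bit2 - 1)) == 0);

    /* A zero flag means "not meaningful on this architecture". */
    if (bit1 == 0 || bit2 == 0)
        return 0;

    /* Move the bit up by multiplying, or down by dividing. */
    if (bit1 <= bit2)
        return (x & bit1) * (bit2 / bit1);
    return (x & bit1) / (bit1 / bit2);
}
```

Without the zero check, a zero flag would lead to a division by zero in one of the two branches. With constant arguments a compiler can fold the whole expression away, which is consistent with the "no runtime overhead" remark in the commit message.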
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

21 Jun, 2017

1 commit

  • Currently, percpu_counter_add is a wrapper around __percpu_counter_add
    which is preempt safe due to explicit calls to preempt_disable. Given
    how the __ prefix is used in percpu related interfaces, the naming
    unfortunately creates the false sense that __percpu_counter_add is
    less safe than percpu_counter_add. In terms of context-safety,
    they're equivalent. The only difference is that the __ version takes
    a batch parameter.

    Make this a bit more explicit by just renaming __percpu_counter_add to
    percpu_counter_add_batch.

    This patch doesn't cause any functional changes.

    tj: Minor updates to patch description for clarity. Cosmetic
    indentation updates.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: Tejun Heo
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Cc: Darrick J. Wong
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: linux-mm@kvack.org
    Cc: "David S. Miller"

    Nikolay Borisov
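
To make the role of the batch parameter concrete, here is a minimal single-threaded sketch of a batched counter. All names (sketch_counter, sketch_counter_add_batch, SKETCH_NR_CPUS) and the default batch of 32 are assumptions for illustration; the real interface works on per-CPU data and handles preemption and locking, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* hypothetical fixed CPU count for the sketch */

struct sketch_counter {
    int64_t count;                   /* shared, approximate total */
    int32_t local[SKETCH_NR_CPUS];   /* per-CPU deltas not yet folded in */
};

/*
 * Add `amount` on behalf of `cpu`. The local delta is folded into the
 * shared total only once its magnitude reaches `batch`.
 */
void sketch_counter_add_batch(struct sketch_counter *c, int cpu,
                              int64_t amount, int32_t batch)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

    int64_t delta = c->local[cpu] + amount;
    if (delta >= batch || delta <= -batch) {
        c->count += delta;           /* a real implementation locks here */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)delta;
    }
}

/* Same operation with a default batch (32 is an arbitrary choice here). */
void sketch_counter_add(struct sketch_counter *c, int cpu, int64_t amount)
{
    sketch_counter_add_batch(c, cpu, amount, 32);
}
```

Both entry points have the same context-safety in this sketch; the only difference is whether the caller chooses the batch, which is the point the rename is meant to convey.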
     

03 Aug, 2016

1 commit


19 Feb, 2016

1 commit

  • This plumbs a protection key through calc_vm_flag_bits(). We
    could have done this in calc_vm_prot_bits(), but I did not feel
    super strongly which way to go. It was pretty arbitrary which
    one to use.

    Signed-off-by: Dave Hansen
    Reviewed-by: Thomas Gleixner
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arve Hjønnevåg
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Chen Gang
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: David Airlie
    Cc: Denys Vlasenko
    Cc: Eric W. Biederman
    Cc: Geliang Tang
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Leon Romanovsky
    Cc: Linus Torvalds
    Cc: Masahiro Yamada
    Cc: Maxime Coquelin
    Cc: Mel Gorman
    Cc: Michael Ellerman
    Cc: Oleg Nesterov
    Cc: Paul Gortmaker
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Riley Andrews
    Cc: Vladimir Davydov
    Cc: devel@driverdev.osuosl.org
    Cc: linux-api@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20160212210231.E6F1F0D6@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

22 Jan, 2014

1 commit

  • Some applications that run on HPC clusters are designed around the
    availability of RAM and the overcommit ratio is fine tuned to get the
    maximum usage of memory without swapping. With growing memory, the
    1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
    for these workloads (on a 2TB machine it represents no less than 20GB).

    This patch adds the new overcommit_kbytes sysctl variable that allows a
    much finer grain.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
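
The difference in granularity between the two settings can be sketched as follows. The function name, the 4 KiB page size, and the rule that a non-zero kbytes value takes precedence over the ratio are assumptions made for this illustration.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/*
 * Hypothetical commit limit, in pages.
 * Assumption: a non-zero overcommit_kbytes overrides overcommit_ratio.
 */
unsigned long long sketch_commit_limit_pages(unsigned long long ram_pages,
                                             unsigned long long swap_pages,
                                             unsigned int overcommit_ratio,
                                             unsigned long long overcommit_kbytes)
{
    unsigned long long allowed;

    if (overcommit_kbytes != 0) {
        /* kilobytes to pages: 1 page = 2^(PAGE_SHIFT - 10) KiB */
        allowed = overcommit_kbytes >> (SKETCH_PAGE_SHIFT - 10);
    } else {
        /* guard the multiplication against overflow */
        assert(overcommit_ratio == 0 ||
               ram_pages <= ULLONG_MAX / overcommit_ratio);
        allowed = ram_pages * overcommit_ratio / 100;
    }
    return allowed + swap_pages;
}
```

On a machine with 2 TiB of RAM (536870912 pages of 4 KiB), one step of the ratio changes the limit by roughly 5.4 million pages, about 20 GiB, whereas the kbytes setting moves in steps of a single page.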
     

13 Nov, 2013

1 commit

  • The same calculation is currently done in three different places.
    Factor out that code so future changes have to be made in only one place.

    [akpm@linux-foundation.org: uninline vm_commit_limit()]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     

04 Jul, 2013

1 commit

  • Currently the per cpu counter's batch size for memory accounting is
    configured as twice the number of cpus in the system. However, for
    systems with very large memory, it is more appropriate to make it
    proportional to the memory size per cpu in the system.

    For example, for an x86_64 system with 64 cpus and 128 GB of memory, the
    batch size is only 2*64 pages (0.5 MB). So any memory accounting
    changes of more than 0.5 MB will overflow the per cpu counter into the
    global counter. Instead, for the new scheme, the batch size is
    configured to be 0.4% of the memory/cpu = 8 MB (128 GB/64/256), which is
    more in line with the memory size.

    I've done a repeated brk test of 800KB (from will-it-scale test suite)
    with 80 concurrent processes on a 4 socket Westmere machine with a total
    of 40 cores. Without the patch, about 80% of cpu is spent on spin-lock
    contention within the vm_committed_as counter. With the patch, there's
    a 73x speedup on the benchmark and the lock contention drops off almost
    entirely.

    [akpm@linux-foundation.org: fix section mismatch]
    Signed-off-by: Tim Chen
    Cc: Tejun Heo
    Cc: Eric Dumazet
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
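
The arithmetic in the example above can be checked with a short sketch. The function names are hypothetical, 4 KiB pages are assumed, and keeping the old value as a lower bound and clamping to INT32_MAX are assumptions of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme from the commit message: twice the number of CPUs, in pages. */
int32_t sketch_batch_old(int nr_cpus)
{
    assert(nr_cpus > 0);
    return 2 * nr_cpus;
}

/*
 * New scheme from the commit message: pages per CPU divided by 256,
 * i.e. about 0.4% of the per-CPU memory.
 * Assumptions: clamp to INT32_MAX, never go below the old value.
 */
int32_t sketch_batch_new(uint64_t ram_pages, int nr_cpus)
{
    assert(nr_cpus > 0);

    uint64_t mem_sized = ram_pages / (uint64_t)nr_cpus / 256;
    if (mem_sized > INT32_MAX)
        mem_sized = INT32_MAX;

    int32_t min_batch = sketch_batch_old(nr_cpus);
    return (int32_t)mem_sized > min_batch ? (int32_t)mem_sized : min_batch;
}
```

For 64 CPUs and 128 GiB of RAM (33554432 pages), the old scheme gives 128 pages (0.5 MiB) and the new one gives 2048 pages (8 MiB), matching the figures quoted in the message.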
     

29 Mar, 2013

1 commit

  • This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
    better deal with racy userspace programs").

    VM_POPULATE only has any effect when userspace plays racy games with
    vmas by trying to unmap and remap memory regions that mmap or mlock are
    operating on.

    Also, the only effect of VM_POPULATE when userspace plays such games is
    that it avoids populating new memory regions that get remapped into the
    address range that was being operated on by the original mmap or mlock
    calls.

    Let's remove VM_POPULATE as there isn't any strong argument to mandate a
    new vm_flag.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

24 Feb, 2013

1 commit

  • The vm_populate() code populates user mappings without constantly
    holding the mmap_sem. This makes it susceptible to racy userspace
    programs: the user mappings may change while vm_populate() is running,
    and in this case vm_populate() may end up populating the new mapping
    instead of the old one.

    In order to reduce the possibility of userspace getting surprised by
    this behavior, this change introduces the VM_POPULATE vma flag which
    gets set on vmas we want vm_populate() to work on. This way
    vm_populate() may still end up populating the new mapping after such a
    race, but only if the new mapping is also one that the user has
    requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

16 Nov, 2012

1 commit

  • It will be useful to be able to access global memory commitment from
    device drivers. On the Hyper-V platform, the host has a policy engine to
    balance the available physical memory amongst all competing virtual
    machines hosted on a given node. This policy engine is driven by a number
    of metrics including the memory commitment reported by the guests. The
    balloon driver for Linux on Hyper-V will use this function to retrieve
    guest memory commitment. This function is also used in Xen self
    ballooning code.

    [akpm@linux-foundation.org: coding-style tweak]
    Signed-off-by: K. Y. Srinivasan
    Acked-by: David Rientjes
    Acked-by: Dan Magenheimer
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     

13 Oct, 2012

1 commit


09 Oct, 2012

1 commit

  • Currently the kernel sets mm->exe_file during sys_execve() and then tracks
    the number of vmas with the VM_EXECUTABLE flag in mm->num_exe_file_vmas.
    As soon as this counter drops to zero, the kernel resets mm->exe_file to
    NULL. It also resets mm->exe_file at the last mmput(), when mm->mm_users
    drops to zero.

    A VMA with the VM_EXECUTABLE flag appears after mapping a file with the
    MAP_EXECUTABLE flag. Such vmas can appear only at sys_execve() or after vma
    splitting, because sys_mmap ignores this flag. Usually the binfmt module
    sets mm->exe_file and mmaps executable vmas with this file; they hold
    mm->exe_file while the task is running.

    Comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
    where all this stuff was introduced:

    > The kernel implements readlink of /proc/pid/exe by getting the file from
    > the first executable VMA. Then the path to the file is reconstructed and
    > reported as the result.
    >
    > Because of the VMA walk the code is slightly different on nommu systems.
    > This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
    > walking the VMAs to find the first executable file-backed VMA we store a
    > reference to the exec'd file in the mm_struct.
    >
    > That reference would prevent the filesystem holding the executable file
    > from being unmounted even after unmapping the VMAs. So we track the number
    > of VM_EXECUTABLE VMAs and drop the new reference when the last one is
    > unmapped. This avoids pinning the mounted filesystem.

    exe_file's vma accounting is hooked into every file mmap/unmap and vma
    split/merge just to fix a hypothetical case: an mm that has already
    unmapped all its executable files but is still alive, and so pins the
    filesystem and prevents it from being unmounted.

    It seems that currently nobody depends on this behaviour. We can try to
    remove this logic and keep mm->exe_file until the final mmput().

    mm->exe_file is still protected with mm->mmap_sem, because we want to
    change it via the new sys_prctl(PR_SET_MM_EXE_FILE). A task can also use
    this syscall to change its mm->exe_file and unpin the mountpoint explicitly.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

03 May, 2009

1 commit

  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    But scaling with NR_CPUS is not a good calculation. In general, the
    possibility of lock contention is proportional to the number of online
    cpus, not the theoretical maximum number of cpus (NR_CPUS).

    The current kernel has a generic percpu-counter facility, and using it is
    the right way. It simplifies the code, and percpu_counter_read_positive()
    does not have the underflow issue.

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
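
The huge values in the report above are what a slightly negative approximate counter looks like once it is printed as an unsigned 64-bit number. A minimal sketch of the difference between a raw read and a clamped read, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Raw read: a negative approximate value wraps around when cast. */
uint64_t sketch_read_raw(int64_t approx)
{
    return (uint64_t)approx;
}

/* Clamped read: negative approximate values are reported as zero. */
uint64_t sketch_read_positive(int64_t approx)
{
    return approx > 0 ? (uint64_t)approx : 0;
}

/* The first value in the report corresponds to -228224 kB. */
void sketch_underflow_check(void)
{
    assert(sketch_read_raw(-228224) == 18446744073709323392ULL);
    assert(sketch_read_positive(-228224) == 0);
    assert(sketch_read_positive(35136) == 35136);
}
```

Clamping hides the transient negative value but does not make the counter exact; the reading is still an approximation until the per-CPU deltas are folded in.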
     

09 Jul, 2008

1 commit

  • This patch allows architectures to define functions to deal with
    additional protection bits for mmap() and mprotect().

    arch_calc_vm_prot_bits() maps additional protection bits to vm_flags
    arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
    arch_validate_prot() checks for valid values of the protection bits

    Note: vm_get_page_prot() is now pretty ugly, but the generated code
    should be identical for architectures that don't define additional
    protection bits.

    Signed-off-by: Dave Kleikamp
    Acked-by: Andrew Morton
    Acked-by: Hugh Dickins
    Signed-off-by: Benjamin Herrenschmidt

    Dave Kleikamp
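
One way such architecture hooks can be wired up is with macros that fall back to a generic default when the architecture defines nothing. The sketch below assumes that mechanism; every sk_ and SK_ name and every flag value is hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical flag values for the sketch; not the kernel's definitions. */
#define SK_PROT_READ   0x1UL
#define SK_PROT_WRITE  0x2UL
#define SK_PROT_EXEC   0x4UL
#define SK_PROT_ARCH   0x10UL    /* imaginary architecture-specific bit */
#define SK_VM_READ     0x1UL
#define SK_VM_WRITE    0x2UL
#define SK_VM_EXEC     0x4UL
#define SK_VM_ARCH     0x100UL

/* An imaginary architecture supplies its own hooks first. */
#define sk_arch_calc_vm_prot_bits(prot) \
    (((prot) & SK_PROT_ARCH) ? SK_VM_ARCH : 0UL)
#define sk_arch_validate_prot(prot) \
    (((prot) & ~(SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXEC | SK_PROT_ARCH)) == 0)

/* Generic fallbacks, used only when no architecture hook was defined. */
#ifndef sk_arch_calc_vm_prot_bits
#define sk_arch_calc_vm_prot_bits(prot) 0UL
#endif
#ifndef sk_arch_validate_prot
#define sk_arch_validate_prot(prot) \
    (((prot) & ~(SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXEC)) == 0)
#endif

/* Generic translation plus whatever the architecture hook contributes. */
unsigned long sk_calc_vm_prot_bits(unsigned long prot)
{
    assert(sk_arch_validate_prot(prot));
    return ((prot & SK_PROT_READ)  ? SK_VM_READ  : 0UL) |
           ((prot & SK_PROT_WRITE) ? SK_VM_WRITE : 0UL) |
           ((prot & SK_PROT_EXEC)  ? SK_VM_EXEC  : 0UL) |
           sk_arch_calc_vm_prot_bits(prot);
}
```

When the fallbacks are in effect, the extra term is a constant zero, so a compiler can drop it; that matches the note that the generated code should be identical for architectures without additional protection bits.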
     

25 May, 2008

1 commit

  • The atomic_t type is 32bit but a 64bit system can have more than 2^32
    pages of virtual address space available. Without this we overflow on
    ludicrously large mappings.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
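
The overflow described above can be shown with a small sketch. It assumes 4 KiB pages and uses an unsigned 32-bit type so that the wraparound is well defined in C; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages in a mapping of `bytes` bytes, assuming 4 KiB pages. */
uint64_t sketch_pages(uint64_t bytes)
{
    return bytes >> 12;
}

/* What survives when a page count is stored in only 32 bits. */
uint32_t sketch_store_32(uint64_t pages)
{
    return (uint32_t)pages;
}

/* A 16 TiB mapping is exactly 2^32 pages and truncates to zero. */
void sketch_overflow_check(void)
{
    uint64_t pages = sketch_pages(1ULL << 44);

    assert(pages == (1ULL << 32));
    assert(sketch_store_32(pages) == 0);
}
```

A 64-bit counter holds the same page count without loss, which is why a wider type is needed for accounting on 64-bit systems.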
     

26 Apr, 2006

1 commit


25 Apr, 2006

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds