17 Jul, 2018

1 commit

  • The mm_struct always contains a cpumask bitmap, regardless of
    CONFIG_CPUMASK_OFFSTACK. That means the first step can be to
    simplify things, and simply have one bitmask at the end of the
    mm_struct for the mm_cpumask.

    This does necessitate moving everything else in mm_struct into
    an anonymous sub-structure, which can be randomized when struct
    randomization is enabled.

    The second step is to determine the correct size for the
    mm_struct slab object from the size of the mm_struct
    (excluding the CPU bitmap) and the size of the cpumask.

    For init_mm we can simply allocate the maximum size this
    kernel is compiled for, since we only have one init_mm
    in the system, anyway.

    Pointer magic by Mike Galbraith, to evade -Wstringop-overflow
    getting confused by the dynamically sized array.
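
    The sizing described above can be sketched in userspace C. This is a
    minimal illustration, not the kernel code: struct toy_mm, its fields
    and the toy_* helpers are hypothetical stand-ins, and calloc() stands
    in for the slab allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mm_struct: fixed fields plus a trailing bitmap. */
struct toy_mm {
	struct {
		int map_count;          /* placeholder for the real fields */
		unsigned long total_vm;
	};                              /* anonymous sub-structure */
	unsigned long cpu_bitmap[];     /* sized at allocation time */
};

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap covering nr_cpus CPUs, rounded up to whole longs. */
static size_t toy_cpumask_size(unsigned int nr_cpus)
{
	return ((nr_cpus + BITS_PER_ULONG - 1) / BITS_PER_ULONG)
		* sizeof(unsigned long);
}

/* Object size: the fixed part of the struct plus the bitmap. */
static size_t toy_mm_size(unsigned int nr_cpus)
{
	return offsetof(struct toy_mm, cpu_bitmap) + toy_cpumask_size(nr_cpus);
}

/* Allocate one zeroed object sized for nr_cpus CPUs. */
static struct toy_mm *toy_mm_alloc(unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return calloc(1, toy_mm_size(nr_cpus));
}
```

    A machine with few possible CPUs then gets a smaller object than one
    sized for the compile-time maximum, which is the point of computing
    the size at run time.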

    Tested-by: Song Liu
    Signed-off-by: Rik van Riel
    Signed-off-by: Mike Galbraith
    Signed-off-by: Rik van Riel
    Acked-by: Dave Hansen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Cc: luto@kernel.org
    Link: http://lkml.kernel.org/r/20180716190337.26133-2-riel@surriel.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

08 Jun, 2018

1 commit

  • mmap_sem is on the hot path of the kernel and is heavily contended, but
    it is also abused. It is used to protect arg_start|end and env_start|end
    when reading /proc/$PID/cmdline and /proc/$PID/environ. That makes
    little sense: those proc files just expect to read 4 values atomically,
    the values are not related to the VM, and they can be set to arbitrary
    values by C/R.

    The mmap_sem contention can also cause unexpected issues such as this:

    INFO: task ps:14018 blocked for more than 120 seconds.
    Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    ps D 0 14018 1 0x00000004
    Call Trace:
    schedule+0x36/0x80
    rwsem_down_read_failed+0xf0/0x150
    call_rwsem_down_read_failed+0x18/0x30
    down_read+0x20/0x40
    proc_pid_cmdline_read+0xd9/0x4e0
    __vfs_read+0x37/0x150
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xc5

    Both Alexey Dobriyan and Michal Hocko suggested using a dedicated lock
    for them to mitigate the abuse of mmap_sem.

    So, introduce a new spinlock in mm_struct to protect concurrent access
    to arg_start|end, env_start|end and others. Also take mmap_sem for read
    instead of write in prctl to guard against the race between prctl and
    sys_brk, which might break check_data_rlimit(). This makes prctl more
    friendly to other VM operations.
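
    A minimal userspace sketch of the idea, assuming a C11 atomic_flag
    spinlock as a stand-in for the kernel spinlock. All toy_* names are
    hypothetical; only the field names arg_start|end, env_start|end and
    arg_lock come from the text above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a kernel spinlock. */
struct toy_spinlock { atomic_flag locked; };

static void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* The four values that readers expect to see as one consistent set. */
struct toy_args {
	unsigned long arg_start, arg_end, env_start, env_end;
};

struct toy_mm {
	struct toy_spinlock arg_lock;   /* protects args only */
	struct toy_args args;
};

static void toy_mm_init(struct toy_mm *mm)
{
	atomic_flag_clear(&mm->arg_lock.locked);
	mm->args = (struct toy_args){ 0, 0, 0, 0 };
}

/* Writer side (think prctl): publish all four values under the lock. */
static void toy_set_args(struct toy_mm *mm, const struct toy_args *new_args)
{
	/* Sanity check for this sketch only. */
	assert(new_args->arg_start <= new_args->arg_end);
	assert(new_args->env_start <= new_args->env_end);
	toy_spin_lock(&mm->arg_lock);
	mm->args = *new_args;
	toy_spin_unlock(&mm->arg_lock);
}

/* Reader side (think /proc/$PID/cmdline): copy a snapshot, then drop the lock. */
static struct toy_args toy_read_args(struct toy_mm *mm)
{
	struct toy_args snap;

	toy_spin_lock(&mm->arg_lock);
	snap = mm->args;
	toy_spin_unlock(&mm->arg_lock);
	return snap;
}
```

    The reader holds the lock only long enough to copy four words, so it
    never has to wait behind a long-running mmap_sem holder for this step.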

    This patch just eliminates the abuse of mmap_sem. It can't resolve
    the above hung task warning completely, since the later
    access_remote_vm() call still needs to acquire mmap_sem. The mmap_sem
    scalability issue will be solved in the future.

    [yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
    Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
    Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Michal Hocko
    Cc: Alexey Dobriyan
    Cc: Matthew Wilcox
    Cc: Mateusz Guzik
    Cc: Kirill Tkhai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.
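
    For illustration, a C source file tagged this way carries the
    identifier as a comment on its first line. The helper below is a
    hypothetical sketch (not one of the scanners mentioned in this
    message) that checks a line for the tag.

```c
// SPDX-License-Identifier: GPL-2.0
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the given line carries an SPDX
 * license tag, 0 otherwise. It does not validate the license expression.
 */
static int has_spdx_tag(const char *first_line)
{
	assert(first_line != NULL);
	return strstr(first_line, "SPDX-License-Identifier:") != NULL;
}
```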

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

23 Nov, 2016

1 commit

  • During exec, dumpable is cleared if the file that is being executed is
    not readable by the user executing the file. A bug in
    ptrace_may_access allows reading the file if the executable happens to
    enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
    unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER)).

    This problem is fixed with only necessary userspace breakage by adding
    a user namespace owner to mm_struct, captured at the time of exec, so
    it is clear in which user namespace CAP_SYS_PTRACE must be present
    to be able to safely give read permission to the executable.

    The function ptrace_may_access is modified to verify that the ptracer
    has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
    This ensures that if the task changes its cred into a subordinate
    user namespace it does not become ptraceable.

    The function ptrace_attach is modified to only set PT_PTRACE_CAP when
    CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
    PT_PTRACE_CAP is to be a flag to note that, whatever permission changes
    the task might go through, the tracer has sufficient permissions for
    it not to be an issue. task->cred->user_ns is always the same
    as or a descendant of mm->user_ns, which guarantees that having
    CAP_SYS_PTRACE over mm->user_ns is the worst case for the task's
    credentials.
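
    The reasoning above can be modelled with a toy namespace tree. This
    is a simplified sketch under stated assumptions: every toy_* name is
    hypothetical, and the capability rule is reduced to "a capability
    held in a namespace applies to that namespace and its descendants",
    which is narrower than the real kernel rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: just a link to the parent (NULL for the root). */
struct toy_userns { const struct toy_userns *parent; };

/* True if ns is target itself or lies beneath target in the tree. */
static bool toy_ns_same_or_descendant(const struct toy_userns *ns,
				      const struct toy_userns *target)
{
	for (; ns; ns = ns->parent)
		if (ns == target)
			return true;
	return false;
}

struct toy_task {
	const struct toy_userns *cred_ns; /* models task->cred->user_ns */
	const struct toy_userns *mm_ns;   /* models task->mm->user_ns, set at exec */
};

struct toy_tracer {
	const struct toy_userns *cap_ns;  /* where the tracer holds the capability */
};

/* Simplified rule: the capability covers cap_ns and everything beneath it. */
static bool toy_capable_over(const struct toy_tracer *t,
			     const struct toy_userns *ns)
{
	return toy_ns_same_or_descendant(ns, t->cap_ns);
}

/* Check against the mm's namespace, not the task's current credentials. */
static bool toy_may_access(const struct toy_tracer *t,
			   const struct toy_task *task)
{
	/* Invariant from the text: cred_ns is mm_ns or a descendant of it. */
	assert(toy_ns_same_or_descendant(task->cred_ns, task->mm_ns));
	return toy_capable_over(t, task->mm_ns);
}
```

    In this model, a task that execs in the root namespace and then moves
    its credentials into a child namespace stays out of reach of a tracer
    that is privileged only in the child, because mm_ns still points at
    the root.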

    To prevent regressions, mm->dumpable and mm->user_ns are not considered
    when a task has no mm, as simply failing ptrace_may_attach causes
    regressions in privileged applications attempting to read things
    such as /proc//stat

    Cc: stable@vger.kernel.org
    Acked-by: Kees Cook
    Tested-by: Cyrill Gorcunov
    Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to
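
    For readers unfamiliar with the helper named above, its semantics
    can be sketched with C11 atomics. This is an illustrative userspace
    version with a hypothetical name, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *v unless it is currently zero.
 * Returns true if the increment happened, false if *v was zero.
 */
static bool toy_atomic_inc_not_zero(atomic_int *v)
{
	int old;

	assert(v);
	old = atomic_load(v);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

    This pattern is typically used for reference counts: a count that
    has already dropped to zero must not be revived.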

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

25 May, 2011

1 commit

  • cpumask_t is a very big struct and cpu_vm_mask is placed in the wrong
    position, which might reduce the cache hit ratio.

    This patch makes two changes:
    1) Move the cpumask to the end of mm_struct, because usually only the
    front bits of the cpumask are accessed when the system has cpu-hotplug
    capability.
    2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
    footprint if cpumask_size() uses nr_cpumask_bits properly in the future.

    In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var.
    This may help to detect out-of-tree cpu_vm_mask users.

    This patch has no functional change.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Hugh Dickins
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

10 Aug, 2010

1 commit

  • Provide an INIT_MM_CONTEXT initializer macro which can be used to
    statically initialize mm_struct:mm_context of init_mm. This way we can
    get rid of code which will do the initialization at run time (on s390).

    In addition the current code can be found at a place where it is not
    expected. So let's have a common initializer which architectures
    can use if needed.
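
    The pattern can be sketched as follows. The struct layouts and the
    asid field are hypothetical; only the INIT_MM_CONTEXT name and the
    idea of an empty default come from this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-architecture context and a cut-down mm. */
struct toy_mm_context { int asid; };
struct toy_mm {
	int map_count;
	struct toy_mm_context context;
};

/* An architecture that needs static setup would define this in its headers. */
#define INIT_MM_CONTEXT(name) .context = { .asid = 1 },

/* Common code: fall back to an empty initializer if the arch gave none. */
#ifndef INIT_MM_CONTEXT
#define INIT_MM_CONTEXT(name)
#endif

/* Static initialization; no run-time setup code is needed. */
static struct toy_mm toy_init_mm = {
	.map_count = 0,
	INIT_MM_CONTEXT(toy_init_mm)
};

/* Small accessor so the result can be inspected. */
static int toy_mm_asid(const struct toy_mm *mm)
{
	assert(mm != NULL);
	return mm->context.asid;
}
```

    Architectures that do not define the macro get the empty default, so
    the common initializer compiles unchanged for them.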

    This is based on a patch from Suzuki Poulose.

    Signed-off-by: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Suzuki Poulose
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

17 Jun, 2009

1 commit

  • * create mm/init-mm.c, move init_mm there
    * remove INIT_MM, initialize init_mm with C99 initializer
    * unexport init_mm on all arches:

    init_mm is already unexported on x86.

    One strange place is some OMAP driver (drivers/video/omap/) which
    won't build modular, but it already wants the get_vm_area() export.
    Somebody should look there.

    [akpm@linux-foundation.org: add missing #includes]
    Signed-off-by: Alexey Dobriyan
    Cc: Mike Frysinger
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan