13 Aug, 2020

2 commits

  • Use the general page fault accounting by passing regs into
    handle_mm_fault().

    Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the
    other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in
    handle_mm_fault().

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Link: http://lkml.kernel.org/r/20200707225021.200906-3-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the pf accounting cleanup series. It originates from Gerald
    Schaefer's report on an issue a week ago regarding to incorrect page fault
    accountings for retried page fault after commit 4064b9827063 ("mm: allow
    VM_FAULT_RETRY for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: we do accounting for a page fault
    (no matter whether it's from #PF handling, or gup, or anything else)
    only with the one that completed the fault. For example, page fault
    retries should not be counted in page fault counters. Same to the
    perf events.

    - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an adhoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it will also cover e.g. errornous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there're still quite some archs that have not enabled
    this perf event.

    Since this series will touch merely all the archs, we unify this
    perf event to always follow case (1), which is the one that makes most
    sense. And since we moved the accounting into handle_mm_fault, the
    other two MAJ/MIN perf events are well taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handlings, but mostly for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
    Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

    This patch (of 25):

    This is a preparation patch to move page fault accountings into the
    general code in handle_mm_fault(). This includes both the per task
    flt_maj/flt_min counters, and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointer that passed into handle_mm_fault() is
    NULL, which means this patch should have no intented functional change.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

08 Aug, 2020

1 commit

  • Patch series "mm: cleanup usage of "

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in headers are
    used mostly by core mm and early mm initialization in arch and there is no
    actual reason to have the included all over the place.
    The first patch in this series removes unneeded includes of

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases header is required only for allocations of
    page table memory. Most of the .c files that include that header do not
    use symbols declared in and do not require that header.

    As for the other header files that used to include , it is
    possible to move that include into the .c file that actually uses symbols
    from and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

10 Jun, 2020

3 commits

  • Convert comments that reference old mmap_sem APIs to reference
    corresponding new mmap locking APIs instead.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Daniel Jordan
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
    return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
    return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include
    in the files that include .

    The include statements in such cases are remove with a simple loop:

    for f in $(git grep -l "include ") ; do
    sed -i -e '/include / d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Jun, 2020

3 commits

  • free_area_init() only requires the definition of maximal PFN for each of
    the supported zone rater than calculation of actual zone sizes and the
    sizes of the holes between the zones.

    After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
    available to all architectures.

    Using this function instead of free_area_init_node() simplifies the zone
    detection.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Tested-by: Hoan Tran [arm64]
    Cc: Baoquan He
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200412194859.12663-7-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Currently, architectures that use free_area_init() to initialize memory
    map and node and zone structures need to calculate zone and hole sizes.
    We can use free_area_init_nodes() instead and let it detect the zone
    boundaries while the architectures will only have to supply the possible
    limits for the zones.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Tested-by: Hoan Tran [arm64]
    Reviewed-by: Baoquan He
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200412194859.12663-5-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
    nodes and zones structures between the systems that have region to node
    mapping in memblock and those that don't.

    Currently all the NUMA architectures enable this option and for the
    non-NUMA systems we can presume that all the memory belongs to node 0 and
    therefore the compile time configuration option is not required.

    The remaining few architectures that use DISCONTIGMEM without NUMA are
    easily updated to use memblock_add_node() instead of memblock_add() and
    thus have proper correspondence of memblock regions to NUMA nodes.

    Still, free_area_init_node() must have a backward compatible version
    because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
    different. Once all the architectures will use the new semantics, the
    entire compatibility layer can be dropped.

    To avoid addition of extra run time memory to store node id for
    architectures that keep memblock but have only a single node, the node id
    field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
    the corresponding accessors presume that in those cases it is always 0.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Tested-by: Hoan Tran [arm64]
    Acked-by: Catalin Marinas [arm64]
    Cc: Baoquan He
    Cc: Brian Cain
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

03 Apr, 2020

3 commits

  • The idea comes from a discussion between Linus and Andrea [1].

    Before this patch we only allow a page fault to retry once. We achieved
    this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
    handle_mm_fault() the second time. This was majorly used to avoid
    unexpected starvation of the system by looping over forever to handle the
    page fault on a single page. However that should hardly happen, and after
    all for each code path to return a VM_FAULT_RETRY we'll first wait for a
    condition (during which time we should possibly yield the cpu) to happen
    before VM_FAULT_RETRY is really returned.

    This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
    flag when we receive VM_FAULT_RETRY. It means that the page fault handler
    now can retry the page fault for multiple times if necessary without the
    need to generate another page fault event. Meanwhile we still keep the
    FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
    page fault is the first attempt or not.

    Then we'll have these combinations of fault flags (only considering
    ALLOW_RETRY flag and TRIED flag):

    - ALLOW_RETRY and !TRIED: this means the page fault allows to
    retry, and this is the first try

    - ALLOW_RETRY and TRIED: this means the page fault allows to
    retry, and this is not the first try

    - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
    to retry at all

    - !ALLOW_RETRY and TRIED: this is forbidden and should never be used

    In existing code we have multiple places that has taken special care of
    the first condition above by checking against (fault_flags &
    FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
    the first retry of a page fault by checking against both (fault_flags &
    FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
    even the 2nd try will have the ALLOW_RETRY set, then use that helper in
    all existing special paths. One example is in __lock_page_or_retry(), now
    we'll drop the mmap_sem only in the first attempt of page fault and we'll
    keep it in follow up retries, so old locking behavior will be retained.

    This will be a nice enhancement for current code [2] at the same time a
    supporting material for the future userfaultfd-writeprotect work, since in
    that work there will always be an explicit userfault writeprotect retry
    for protected pages, and if that cannot resolve the page fault (e.g., when
    userfaultfd-writeprotect is used in conjunction with swapped pages) then
    we'll possibly need a 3rd retry of the page fault. It might also benefit
    other potential users who will have similar requirement like userfault
    write-protection.

    GUP code is not touched yet and will be covered in follow up patch.

    Please read the thread below for more information.

    [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
    [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

    Suggested-by: Linus Torvalds
    Suggested-by: Andrea Arcangeli
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Although there're tons of arch-specific page fault handlers, most of them
    are still sharing the same initial value of the page fault flags. Say,
    merely all of the page fault handlers would allow the fault to be retried,
    and they also allow the fault to respond to SIGKILL.

    Let's define a default value for the fault flags to replace those initial
    page fault flags that were copied over. With this, it'll be far easier to
    introduce new fault flag that can be used by all the architectures instead
    of touching all the archs.

    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Reviewed-by: David Hildenbrand
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • For most architectures, we've got a quick path to detect fatal signal
    after a handle_mm_fault(). Introduce a helper for that quick path.

    It cleans the current codes a bit so we don't need to duplicate the same
    check across archs. More importantly, this will be an unified place that
    we handle the signal immediately right after an interrupted page fault, so
    it'll be much easier for us if we want to change the behavior of handling
    signals later on for all the archs.

    Note that currently only part of the archs are using this new helper,
    because some archs have their own way to handle signals. In the follow up
    patches, we'll try to apply this helper to all the rest of archs.

    Another note is that the "regs" parameter in the new helper is not used
    yet. It'll be used very soon. Now we kept it in this patch only to avoid
    touching all the archs again in the follow up patches.

    [peterx@redhat.com: fix sparse warnings]
    Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Tested-by: Brian Geffon
    Cc: Andrea Arcangeli
    Cc: Bobby Powers
    Cc: David Hildenbrand
    Cc: Denis Plotnikov
    Cc: "Dr . David Alan Gilbert"
    Cc: Hugh Dickins
    Cc: Jerome Glisse
    Cc: Johannes Weiner
    Cc: "Kirill A . Shutemov"
    Cc: Martin Cracauer
    Cc: Marty McFadden
    Cc: Matthew Wilcox
    Cc: Maya Gokhale
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     

05 Dec, 2019

1 commit

  • Patch series "mm: remove __ARCH_HAS_4LEVEL_HACK", v13.

    These patches convert several architectures to use page table folding
    and remove __ARCH_HAS_4LEVEL_HACK along with
    include/asm-generic/4level-fixup.h.

    For the nommu configurations the folding is already implemented by the
    generic code so the only change was to use the appropriate header file.

    As for the rest, the changes are mostly about mechanical replacement of
    pgd accessors with pud/pmd ones and the addition of higher levels to
    page table traversals.

    With Vineet's patches from "elide extraneous generated code for folded
    p4d/pud/pmd" series [1] there is a small shrink of the kernel size of
    about -0.01% for the defconfig builds.

    This patch (of 13):

    It is not likely alpha will have 5-level page tables.

    Replace usage of include/asm-generic/4level-fixup.h and implied
    __ARCH_HAS_4LEVEL_HACK with include/asm-generic/pgtable-nopud.h and
    adjust page table manipulation macros and functions accordingly.

    Link: http://lkml.kernel.org/r/1572938135-31886-2-git-send-email-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Cc: Anton Ivanov
    Cc: Arnd Bergmann
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jeff Dike
    Cc: "Kirill A. Shutemov"
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Peter Rosin
    Cc: Richard Weinberger
    Cc: Rolf Eike Beer
    Cc: Russell King
    Cc: Sam Creasey
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Anatoly Pugachev
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

09 Jul, 2019

1 commit

  • …iederm/user-namespace

    Pull force_sig() argument change from Eric Biederman:
    "A source of error over the years has been that force_sig has taken a
    task parameter when it is only safe to use force_sig with the current
    task.

    The force_sig function is built for delivering synchronous signals
    such as SIGSEGV where the userspace application caused a synchronous
    fault (such as a page fault) and the kernel responded with a signal.

    Because the name force_sig does not make this clear, and because the
    force_sig takes a task parameter the function force_sig has been
    abused for sending other kinds of signals over the years. Slowly those
    have been fixed when the oopses have been tracked down.

    This set of changes fixes the remaining abusers of force_sig and
    carefully rips out the task parameter from force_sig and friends
    making this kind of error almost impossible in the future"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
    signal: Remove the signal number and task parameters from force_sig_info
    signal: Factor force_sig_info_to_task out of force_sig_info
    signal: Generate the siginfo in force_sig
    signal: Move the computation of force into send_signal and correct it.
    signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
    signal: Remove the task parameter from force_sig_fault
    signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
    signal: Explicitly call force_sig_fault on current
    signal/unicore32: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from ptrace_break
    signal/nds32: Remove tsk parameter from send_sigtrap
    signal/riscv: Remove tsk parameter from do_trap
    signal/sh: Remove tsk parameter from force_sig_info_fault
    signal/um: Remove task parameter from send_sigtrap
    signal/x86: Remove task parameter from send_sigtrap
    signal: Remove task parameter from force_sig_mceerr
    signal: Remove task parameter from force_sig
    signal: Remove task parameter from force_sigsegv
    ...

    Linus Torvalds
     

29 May, 2019

1 commit

  • As synchronous exceptions really only make sense against the current
    task (otherwise how are you synchronous) remove the task parameter
    from from force_sig_fault to make it explicit that is what is going
    on.

    The two known exceptions that deliver a synchronous exception to a
    stopped ptraced task have already been changed to
    force_sig_fault_to_task.

    The callers have been changed with the following emacs regular expression
    (with obvious variations on the architectures that take more arguments)
    to avoid typos:

    force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
    ->
    force_sig_fault(\1,\2,\3)

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

21 May, 2019

1 commit


15 May, 2019

2 commits

  • Patch series "provide a generic free_initmem implementation", v2.

    Many architectures implement free_initmem() in exactly the same or very
    similar way: they wrap the call to free_initmem_default() with sometimes
    different 'poison' parameter.

    These patches switch those architectures to use a generic implementation
    that does free_initmem_default(POISON_FREE_INITMEM).

    This was inspired by Christoph's patches for free_initrd_mem [1] and I
    shamelessly copied changelog entries from his patches :)

    [1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

    This patch (of 2):

    For most architectures free_initmem just a wrapper for the same
    free_initmem_default(-1) call. Provide that as a generic implementation
    marked __weak.

    Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • For most architectures free_initrd_mem just expands to the same
    free_reserved_area call. Provide that as a generic implementation marked
    __weak.

    Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

11 Feb, 2019

1 commit

  • Fix page fault handling code to fixup r16-r18 registers.
    Before the patch code had off-by-two registers bug.
    This bug caused overwriting of ps,pc,gp registers instead
    of fixing intended r16,r17,r18 (see `struct pt_regs`).

    More details:

    Initially Dmitry noticed a kernel bug as a failure
    on strace test suite. Test passes unmapped userspace
    pointer to io_submit:

    ```c
    #include
    #include
    #include
    #include
    int main(void)
    {
    unsigned long ctx = 0;
    if (syscall(__NR_io_setup, 1, &ctx))
    err(1, "io_setup");
    const size_t page_size = sysconf(_SC_PAGESIZE);
    const size_t size = page_size * 2;
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == ptr)
    err(1, "mmap(%zu)", size);
    if (munmap(ptr, size))
    err(1, "munmap");
    syscall(__NR_io_submit, ctx, 1, ptr + page_size);
    syscall(__NR_io_destroy, ctx);
    return 0;
    }
    ```

    Running this test causes kernel to crash when handling page fault:

    ```
    Unable to handle kernel paging request at virtual address ffffffffffff9468
    CPU 3
    aio(26027): Oops 0
    pc = [] ra = [] ps = 0000 Not tainted
    pc is at sys_io_submit+0x108/0x200
    ra is at sys_io_submit+0x6c/0x200
    v0 = fffffc00c58e6300 t0 = fffffffffffffff2 t1 = 000002000025e000
    t2 = fffffc01f159fef8 t3 = fffffc0001009640 t4 = fffffc0000e0f6e0
    t5 = 0000020001002e9e t6 = 4c41564e49452031 t7 = fffffc01f159c000
    s0 = 0000000000000002 s1 = 000002000025e000 s2 = 0000000000000000
    s3 = 0000000000000000 s4 = 0000000000000000 s5 = fffffffffffffff2
    s6 = fffffc00c58e6300
    a0 = fffffc00c58e6300 a1 = 0000000000000000 a2 = 000002000025e000
    a3 = 00000200001ac260 a4 = 00000200001ac1e8 a5 = 0000000000000001
    t8 = 0000000000000008 t9 = 000000011f8bce30 t10= 00000200001ac440
    t11= 0000000000000000 pv = fffffc00006fd320 at = 0000000000000000
    gp = 0000000000000000 sp = 00000000265fd174
    Disabling lock debugging due to kernel taint
    Trace:
    [] entSys+0xa4/0xc0
    ```

    Here `gp` has invalid value. `gp is s overwritten by a fixup for the
    following page fault handler in `io_submit` syscall handler:

    ```
    __se_sys_io_submit
    ...
    ldq a1,0(t1)
    bne t0,4280
    ```

    After a page fault `t0` should contain -EFALUT and `a1` is 0.
    Instead `gp` was overwritten in place of `a1`.

    This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
    (aka `a0-a2`).

    I think the bug went unnoticed for a long time as `gp` is one
    of scratch registers. Any kernel function call would re-calculate `gp`.

    Dmitry tracked down the bug origin back to 2.1.32 kernel version
    where trap_a{0,1,2} fields were inserted into struct pt_regs.
    And even before that `dpf_reg()` contained off-by-one error.

    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Reported-and-reviewed-by: "Dmitry V. Levin"
    Cc: stable@vger.kernel.org # v2.1.32+
    Bug: https://bugs.gentoo.org/672040
    Signed-off-by: Sergei Trofimovich
    Signed-off-by: Matt Turner

    Sergei Trofimovich
     

15 Dec, 2018

1 commit

  • The conversion of alpha to memblock as the early memory manager caused
    boot to hang as described at [1].

    The issue is caused because for CONFIG_DISCTONTIGMEM=y case,
    memblock_add() is called using memory start PFN that had been rounded
    down to the nearest 8Mb and it caused memblock to see more memory that
    is actually present in the system.

    Besides, memblock allocates memory from high addresses while bootmem was
    using low memory, which broke the assumption that early allocations are
    always accessible by the hardware.

    This patch ensures that memblock_add() is using the correct PFN for the
    memory start and forces memblock to use bottom-up allocations.

    [1] https://lkml.org/lkml/2018/11/22/1032

    Link: http://lkml.kernel.org/r/1543233216-25833-1-git-send-email-rppt@linux.ibm.com
    Reported-by: Meelis Roos
    Signed-off-by: Mike Rapoport
    Tested-by: Meelis Roos
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

31 Oct, 2018

2 commits

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include

    @@
    @@
    - #include
    + #include

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@free_all_bootmem@memblock_free_all@' \
    $(git grep -l free_all_bootmem)

    Link: http://lkml.kernel.org/r/1536927045-23536-26-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

27 Oct, 2018

1 commit

  • Replace bootmem allocator with memblock and enable use of NO_BOOTMEM like
    on most other architectures.

    Alpha gets the description of the physical memory from the firmware as an
    array of memory clusters. Each cluster that is not reserved by the
    firmware is added to memblock.memory.

    Once the memblock.memory is set up, we reserve the kernel and initrd pages
    with memblock reserve.

    Since we don't need the bootmem bitmap anymore, the code that finds an
    appropriate place is removed.

    The conversion does not take care of NUMA support which is marked broken
    for more than 10 years now.

    Link: http://lkml.kernel.org/r/1535952894-10967-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

18 Aug, 2018

1 commit

  • Use new return type vm_fault_t for fault handler. For now, this is just
    documenting that the function returns a VM_FAULT value rather than an
    errno. Once all instances are converted, vm_fault_t will become a
    distinct type.

    Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    In this patch all the caller of handle_mm_fault() are changed to return
    vm_fault_t type.

    Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Cc: Matthew Wilcox
    Cc: Richard Henderson
    Cc: Tony Luck
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Richard Kuo
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: James Hogan
    Cc: Ley Foon Tan
    Cc: Jonas Bonn
    Cc: James E.J. Bottomley
    Cc: Benjamin Herrenschmidt
    Cc: Palmer Dabbelt
    Cc: Yoshinori Sato
    Cc: David S. Miller
    Cc: Richard Weinberger
    Cc: Guan Xuetao
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: "Levin, Alexander (Sasha Levin)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

25 Apr, 2018

2 commits

  • Filling in struct siginfo before calling force_sig_info a tedious and
    error prone process, where once in a great while the wrong fields
    are filled out, and siginfo has been inconsistently cleared.

    Simplify this process by using the helper force_sig_fault. Which
    takes as a parameters all of the information it needs, ensures
    all of the fiddly bits of filling in struct siginfo are done properly
    and then calls force_sig_info.

    In short about a 5 line reduction in code for every time force_sig_info
    is called, which makes the calling function clearer.

    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: linux-alpha@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Call clear_siginfo to ensure every stack allocated siginfo is properly
    initialized before being passed to the signal sending functions.

    Note: It is not safe to depend on C initializers to initialize struct
    siginfo on the stack because C is allowed to skip holes when
    initializing a structure.

    The initialization of struct siginfo in tracehook_report_syscall_exit
    was moved from the helper user_single_step_siginfo into
    tracehook_report_syscall_exit itself, to make it clear that the local
    variable siginfo gets fully initialized.

    In a few cases the scope of struct siginfo has been reduced to make it
    clear that siginfo siginfo is not used on other paths in the function
    in which it is declared.

    Instances of using memset to initialize siginfo have been replaced
    with calls clear_siginfo for clarity.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

1 commit


25 Jan, 2017

1 commit

  • These files were only including module.h for exception table
    related functions. We've now separated that content out into its
    own file "extable.h" so now move over to that and avoid all the
    extra header content in module.h that we don't really need to compile
    these files.

    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: linux-alpha@vger.kernel.org
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

25 Dec, 2016

1 commit


27 Jul, 2016

1 commit


23 Mar, 2016

1 commit


19 May, 2015

1 commit

  • Introduce faulthandler_disabled() and use it to check for irq context and
    disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

    Please note that we keep the in_atomic() checks in place - to detect
    whether in irq context (in which case preemption is always properly
    disabled).

    In contrast, preempt_disable() should never be used to disable pagefaults.
    With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
    counter, and therefore the result of in_atomic() differs.
    We validate that condition by using might_fault() checks when calling
    might_sleep().

    Therefore, add a comment to faulthandler_disabled(), describing why this
    is needed.

    faulthandler_disabled() and pagefault_disable() are defined in
    linux/uaccess.h, so let's properly add that include to all relevant files.

    This patch is based on a patch from Thomas Gleixner.

    Reviewed-and-tested-by: Thomas Gleixner
    Signed-off-by: David Hildenbrand
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: David.Laight@ACULAB.COM
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: airlied@linux.ie
    Cc: akpm@linux-foundation.org
    Cc: benh@kernel.crashing.org
    Cc: bigeasy@linutronix.de
    Cc: borntraeger@de.ibm.com
    Cc: daniel.vetter@intel.com
    Cc: heiko.carstens@de.ibm.com
    Cc: herbert@gondor.apana.org.au
    Cc: hocko@suse.cz
    Cc: hughd@google.com
    Cc: mst@redhat.com
    Cc: paulus@samba.org
    Cc: ralf@linux-mips.org
    Cc: schwidefsky@de.ibm.com
    Cc: yang.shi@windriver.com
    Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    David Hildenbrand
     

30 Jan, 2015

1 commit

  • The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
    "you should SIGSEGV" error, because the SIGSEGV case was generally
    handled by the caller - usually the architecture fault handler.

    That results in lots of duplication - all the architecture fault
    handlers end up doing very similar "look up vma, check permissions, do
    retries etc" - but it generally works. However, there are cases where
    the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

    In particular, when accessing the stack guard page, libsigsegv expects a
    SIGSEGV. And it usually got one, because the stack growth is handled by
    that duplicated architecture fault handler.

    However, when the generic VM layer started propagating the error return
    from the stack expansion in commit fee7e49d4514 ("mm: propagate error
    from stack expansion even for guard page"), that now exposed the
    existing VM_FAULT_SIGBUS result to user space. And user space really
    expected SIGSEGV, not SIGBUS.

    To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
    duplicate architecture fault handlers about it. They all already have
    the code to handle SIGSEGV, so it's about just tying that new return
    value to the existing code, but it's all a bit annoying.

    This is the mindless minimal patch to do this. A more extensive patch
    would be to try to gather up the mostly shared fault handling logic into
    one generic helper routine, and long-term we really should do that
    cleanup.

    Just from this patch, you can generally see that most architectures just
    copied (directly or indirectly) the old x86 way of doing things, but in
    the meantime that original x86 model has been improved to hold the VM
    semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
    "newer" things, so it would be a good idea to bring all those
    improvements to the generic case and teach other architectures about
    them too.

    Reported-and-tested-by: Takashi Iwai
    Tested-by: Jan Engelhardt
    Acked-by: Heiko Carstens # "s390 still compiles and boots"
    Cc: linux-arch@vger.kernel.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Sep, 2013

1 commit

  • Unlike global OOM handling, memory cgroup code will invoke the OOM killer
    in any OOM situation because it has no way of telling faults occuring in
    kernel context - which could be handled more gracefully - from
    user-triggered faults.

    Pass a flag that identifies faults originating in user space from the
    architecture-specific fault handlers to generic code so that memcg OOM
    handling can be improved.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

04 Jul, 2013

5 commits

  • Now mem_init() for both Alpha UMA and Alpha NUMA are the same, so unify it
    to reduce duplicated code.

    Signed-off-by: Jiang Liu
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Prepare for removing num_physpages and simplify mem_init().

    Signed-off-by: Jiang Liu
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate code to modify totalram_pages into the mm core, so the arch
    memory initialized code doesn't need to take care of it. With these
    changes applied, only following functions from mm core modify global
    variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much more easier for us to keep
    totalram_pages and zone->managed_pages in consistence.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Address more review comments from last round of code review.
    1) Enhance free_reserved_area() to support poisoning freed memory with
    pattern '0'. This could be used to get rid of poison_init_mem()
    on ARM64.
    2) A previous patch has disabled memory poison for initmem on s390
    by mistake, so restore to the original behavior.
    3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().

    Signed-off-by: Jiang Liu
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Change signature of free_reserved_area() according to Russell King's
    suggestion to fix following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
    In file included from include/linux/mman.h:4:0,
    from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
    extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
    from include/linux/mmzone.h:20,
    from include/linux/gfp.h:4,
    from include/linux/mm.h:8,
    from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

    Also address some minor code review comments.

    Signed-off-by: Jiang Liu
    Reported-by: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu