15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. The fix in commit
    5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a
    good example of the nasty bugs that improper use of the various
    __init prefixes can create.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    Note that some harmless section mismatch warnings may result, since
    notify_cpu_starting() and cpu_up() are arch-independent (kernel/cpu.c)
    and are flagged as __cpuinit -- so if we remove __cpuinit from the
    arch-specific callers, we will also get section mismatch warnings.
    As an intermediate step, we intend to turn the linux/init.h cpuinit
    content into no-ops as early as possible, since that will get rid
    of these warnings. In any case, they are temporary and harmless.

    This removes all the arch/x86 uses of the __cpuinit macros from
    all C files. x86 only had the one __CPUINIT used in assembly files,
    and it wasn't paired with a .previous or a __FINIT, so we can
    delete it directly without any corresponding change there.

    [1] https://lkml.org/lkml/2013/5/20/589
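
    As a reference for that intermediate step, a minimal sketch of what
    "turning the cpuinit content into no-ops" means (an assumed form of
    the linux/init.h change, not the actual diff): the markers stay
    defined but expand to nothing, so remaining users keep compiling
    and the section mismatch warnings disappear.

        /* sketch: cpuinit markers as no-ops (assumed form) */
        #define __cpuinit
        #define __cpuinitdata
        #define __cpuinitconst
        #define __CPUINIT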

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

11 Jul, 2013

1 commit

  • Since all architectures have been converted to use vm_unmapped_area(),
    there is no remaining use for the free_area_cache.
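
    For illustration, this is roughly what an arch helper looks like on
    top of vm_unmapped_area() (a minimal sketch using the 3.10-era
    struct vm_unmapped_area_info; the function name is hypothetical):

        /* sketch: bottom-up search via vm_unmapped_area(), no
         * free_area_cache involved */
        static unsigned long
        example_get_unmapped_area(unsigned long len)
        {
                struct vm_unmapped_area_info info;

                info.flags = 0;                 /* 0 = bottom-up */
                info.length = len;
                info.low_limit = TASK_UNMAPPED_BASE;
                info.high_limit = TASK_SIZE;
                info.align_mask = 0;
                info.align_offset = 0;
                return vm_unmapped_area(&info);
        }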

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: "James E.J. Bottomley"
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Helge Deller
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

10 Jul, 2013

1 commit

  • The old code accumulated addr to get the right pmd; however, pmds
    are now preallocated and passed in as a parameter, so there is no
    need to accumulate the addr variable any more. This patch removes it.

    Signed-off-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Reviewed-by: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

04 Jul, 2013

6 commits

  • Prepare for removing num_physpages and simplify mem_init().

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Andreas Herrmann
    Cc: Tang Chen
    Cc: Wen Congyang
    Cc: Jianguo Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate the code that modifies totalram_pages into the mm core,
    so that arch memory initialization code doesn't need to take care of
    it. With these changes applied, only the following functions from the
    mm core modify the global variable totalram_pages: free_bootmem_late(),
    free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.
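
    For reference, a minimal sketch of what adjust_managed_page_count()
    does under this scheme (lock and field names are assumptions based
    on this series):

        /* sketch: the one place that moves pages in/out of the
         * managed counters */
        void adjust_managed_page_count(struct page *page, long count)
        {
                spin_lock(&managed_page_count_lock);
                page_zone(page)->managed_pages += count;
                totalram_pages += count;
                spin_unlock(&managed_page_count_lock);
        }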

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • In order to simplify management of totalram_pages and
    zone->managed_pages, make __free_pages_bootmem() only available at boot
    time. With this change applied, __free_pages_bootmem() will only be
    used by bootmem.c and nobootmem.c at boot time, so mark it as __init.
    Other callers of __free_pages_bootmem() have been converted to use
    free_reserved_page(), which handles totalram_pages and
    zone->managed_pages in a safer way.

    This patch also fixes a bug in free_pagetable() for x86_64, which should
    increase zone->managed_pages instead of zone->present_pages when freeing
    reserved pages.

    And now we have managed_pages_count_lock to protect totalram_pages and
    zone->managed_pages, so remove the redundant ppb_lock lock in
    put_page_bootmem(). This greatly simplifies the locking rules.
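
    A minimal sketch of the safer helper mentioned above (assumed shape,
    inferred from this description):

        /* sketch: free one reserved page and keep both counters in sync */
        static inline void free_reserved_page(struct page *page)
        {
                ClearPageReserved(page);
                init_page_count(page);
                __free_page(page);
                adjust_managed_page_count(page, 1);
        }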

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Will Deacon
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true: some architectures may reserve
    some highmem pages during boot. For example, PPC may allocate highmem
    pages for giant HugeTLB pages, and several architectures have code to
    check the PageReserved flag to exclude highmem pages allocated during
    boot when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is to:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.
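
    A sketch of what step 2) amounts to in helper form (assumed shape;
    the counters are the ones named in this series):

        /* sketch: free one boot-reserved highmem page into the buddy
         * system and recalculate the counters */
        void free_highmem_page(struct page *page)
        {
                __free_reserved_page(page);     /* clear reserved + free */
                totalram_pages++;
                page_zone(page)->managed_pages++;
                totalhigh_pages++;
        }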

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use the common helper function free_reserved_area() to simplify the code.
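
    For illustration, a converted call site roughly looks like this (a
    sketch against the helper signature of that era; treat the exact
    form as an assumption):

        /* sketch: an arch free_initmem() collapsing into the common helper */
        void free_initmem(void)
        {
                free_reserved_area(&__init_begin, &__init_end,
                                   POISON_FREE_INITMEM, "unused kernel");
        }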

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Tang Chen
    Cc: Wen Congyang
    Cc: Jianguo Wu
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Jeremy Fitzhardinge
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Pull ARM64 updates from Catalin Marinas:
    "Main features:
    - KVM and Xen ports to AArch64
    - Hugetlbfs and transparent huge pages support for arm64
    - Applied Micro X-Gene Kconfig entry and dts file
    - Cache flushing improvements

    For arm64 huge pages support, there are x86 changes moving part of
    arch/x86/mm/hugetlbpage.c into mm/hugetlb.c to be re-used by arm64"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (66 commits)
    arm64: Add initial DTS for APM X-Gene Storm SOC and APM Mustang board
    arm64: Add defines for APM ARMv8 implementation
    arm64: Enable APM X-Gene SOC family in the defconfig
    arm64: Add Kconfig option for APM X-Gene SOC family
    arm64/Makefile: provide vdso_install target
    ARM64: mm: THP support.
    ARM64: mm: Raise MAX_ORDER for 64KB pages and THP.
    ARM64: mm: HugeTLB support.
    ARM64: mm: Move PTE_PROT_NONE bit.
    ARM64: mm: Make PAGE_NONE pages read only and no-execute.
    ARM64: mm: Restore memblock limit when map_mem finished.
    mm: thp: Correct the HPAGE_PMD_ORDER check.
    x86: mm: Remove general hugetlb code from x86.
    mm: hugetlb: Copy general hugetlb code from x86 to mm.
    x86: mm: Remove x86 version of huge_pmd_share.
    mm: hugetlb: Copy huge_pmd_share from x86 to mm.
    arm64: KVM: document kernel object mappings in HYP
    arm64: KVM: MAINTAINERS update
    arm64: KVM: userspace API documentation
    arm64: KVM: enable initialization of a 32bit vcpu
    ...

    Linus Torvalds
     

03 Jul, 2013

1 commit

  • Pull x86 mm changes from Ingo Molnar:
    "Misc improvements:

    - Fix /proc/mtrr reporting
    - Fix ioremap printout
    - Remove the unused pvclock fixmap entry on 32-bit
    - misc cleanups"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/ioremap: Correct function name output
    x86: Fix /proc/mtrr with base/size more than 44bits
    ix86: Don't waste fixmap entries
    x86/mm: Drop unneeded include
    x86_64: Correct phys_addr in cleanup_highmap comment

    Linus Torvalds
     

14 Jun, 2013

2 commits

  • huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
    already been copied over to mm.

    This patch removes the x86 copies of these functions and activates
    the general ones by enabling:
    CONFIG_ARCH_WANT_GENERAL_HUGETLB
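
    A sketch of how the general version is gated (illustrative; the body
    follows the usual x86-style page-table walk and is an assumption,
    not a verbatim copy of mm/hugetlb.c):

        #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
        pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
        {
                pgd_t *pgd = pgd_offset(mm, addr);
                pud_t *pud;
                pmd_t *pmd = NULL;

                if (pgd_present(*pgd)) {
                        pud = pud_offset(pgd, addr);
                        if (pud_present(*pud)) {
                                if (pud_huge(*pud))     /* 1G page */
                                        return (pte_t *)pud;
                                pmd = pmd_offset(pud, addr);
                        }
                }
                return (pte_t *)pmd;
        }
        #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */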

    Signed-off-by: Steve Capper
    Acked-by: Catalin Marinas
    Acked-by: Andrew Morton

    Steve Capper
     
  • The huge_pmd_share code has been copied over to mm/hugetlb.c to
    make it accessible to other architectures.

    Remove the x86 copy of the huge_pmd_share code and enable the
    ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
    general one.

    Signed-off-by: Steve Capper
    Acked-by: Catalin Marinas
    Acked-by: Andrew Morton

    Steve Capper
     

01 Jun, 2013

1 commit

  • Commit

    8d57470d x86, mm: setup page table in top-down

    causes a kernel panic while setting mem=2G.

    [mem 0x00000000-0x000fffff] page 4k
    [mem 0x7fe00000-0x7fffffff] page 1G
    [mem 0x7c000000-0x7fdfffff] page 1G
    [mem 0x00100000-0x001fffff] page 4k
    [mem 0x00200000-0x7bffffff] page 2M

    The last entry is not what we want; we should have:
    [mem 0x00200000-0x3fffffff] page 2M
    [mem 0x40000000-0x7bffffff] page 1G

    Actually, we merge contiguous ranges with the same page size too early.
    In this case, before merging we have:
    [mem 0x00200000-0x3fffffff] page 2M
    [mem 0x40000000-0x7bffffff] page 2M
    and after merging them we get:
    [mem 0x00200000-0x7bffffff] page 2M
    even though we could use a 1G page to map
    [mem 0x40000000-0x7bffffff]

    That causes a problem, because we have already mapped
    [mem 0x7fe00000-0x7fffffff] page 1G
    [mem 0x7c000000-0x7fdfffff] page 1G
    with 1G pages, i.e. [0x40000000-0x7fffffff] is already mapped with 1G
    pages. During phys_pud_init() for [0x40000000-0x7bffffff], it will not
    reuse that existing pud page; it allocates a new one and then tries to
    map the range with 2M pages instead, as page_size_mask does not include
    PG_LEVEL_1G. In the end [0x7c000000-0x7fffffff] is left unmapped,
    because the loop in phys_pmd_init() stops mapping at 0x7bffffff.

    That is the right behavior: it maps the exact range with the exact page
    size that we ask for, and we would have to explicitly call it to map
    [0x7c000000-0x7fffffff] before or after mapping [0x40000000-0x7bffffff].
    In any case, we need to make sure each range's page_size_mask is correct
    and consistent after split_mem_range().

    Fix this by calling adjust_range_size_mask() before merging ranges
    with the same page size.

    -v2: update change log.
    -v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and
    it causes panic.

    Bisected-by: "Xie, ChanglongX"
    Bisected-by: Yuanhan Liu
    Reported-and-tested-by: Yuanhan Liu
    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org
    Cc: v3.9
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

28 May, 2013

1 commit

  • For x86_64, we have phys_base, which is the delta between the
    address the kernel is actually running at and the address the
    kernel was compiled to run at. It is not phys_addr, so correct it.

    Signed-off-by: Zhang Yanfei
    Link: http://lkml.kernel.org/r/5192F9BF.2000802@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Zhang Yanfei
     

10 May, 2013

1 commit

  • Two sets of comments were lost during patch-series shuffling:

    - comments for init_range_memory_mapping()

    - comments in init_mem_mapping that are helpful for reminding people
    that the pagetable is set up top-down

    The comments were written by Yinghai in his patch in:

    https://lkml.org/lkml/2012/11/28/620

    This patch reintroduces them.

    Originally-From: Yinghai Lu
    Signed-off-by: Zhang Yanfei
    Cc: Yasuaki Ishimatsu
    Cc: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/518BC776.7010506@gmail.com
    [ Tidied it all up a bit. ]
    Signed-off-by: Ingo Molnar

    Zhang Yanfei
     

02 May, 2013

1 commit

  • Pull VFS updates from Al Viro:

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

30 Apr, 2013

14 commits

  • Pull x86 mm changes from Ingo Molnar:
    "Misc smaller changes all over the map"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/iommu/dmar: Remove warning for HPET scope type
    x86/mm/gart: Drop unnecessary check
    x86/mm/hotplug: Put kernel_physical_mapping_remove() declaration in CONFIG_MEMORY_HOTREMOVE
    x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER
    x86/mm/numa: Simplify some bit mangling
    x86/mm: Re-enable DEBUG_TLBFLUSH for X86_32
    x86/mm/cpa: Cleanup split_large_page() and its callee
    x86: Drop always empty .text..page_aligned section

    Linus Torvalds
     
  • Pull x86 cpuid changes from Ingo Molnar:
    "The biggest change is x86 CPU bug handling refactoring and cleanups,
    by Borislav Petkov"

    * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, CPU, AMD: Drop useless label
    x86, AMD: Correct {rd,wr}msr_amd_safe warnings
    x86: Fold-in trivial check_config function
    x86, cpu: Convert AMD Erratum 400
    x86, cpu: Convert AMD Erratum 383
    x86, cpu: Convert Cyrix coma bug detection
    x86, cpu: Convert FDIV bug detection
    x86, cpu: Convert F00F bug detection
    x86, cpu: Expand cpufeature facility to include cpu bugs

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this development cycle were:

    - full dynticks preparatory work by Frederic Weisbecker

    - factor out the cpu time accounting code better, by Li Zefan

    - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

    - various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    sched: Fix init NOHZ_IDLE flag
    sched: Prevent to re-select dst-cpu in load_balance()
    sched: Rename load_balance_tmpmask to load_balance_mask
    sched: Move up affinity check to mitigate useless redoing overhead
    sched: Don't consider other cpus in our group in case of NEWLY_IDLE
    sched: Explicitly cpu_idle_type checking in rebalance_domains()
    sched: Change position of resched_cpu() in load_balance()
    sched: Fix wrong rq's runnable_avg update with rt tasks
    sched: Document task_struct::personality field
    sched/cpuacct/UML: Fix header file dependency bug on the UML build
    cgroup: Kill subsys.active flag
    sched/cpuacct: No need to check subsys active state
    sched/cpuacct: Initialize cpuacct subsystem earlier
    sched/cpuacct: Initialize root cpuacct earlier
    sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
    sched/cpuacct: Clean up cpuacct.h
    sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
    sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
    sched/cpuacct: Add cpuacct_acount_field()
    sched/cpuacct: Add cpuacct_init()
    ...

    Linus Torvalds
     
  • Use the preferable function name, which makes it explicit that a
    pseudo-random number generator is being used.
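
    The change is mechanical; a hypothetical call site (names assumed,
    since the log does not quote the diff):

        u32 r = prandom_u32();  /* was random32(); same generator,
                                 * clearer name */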

    Signed-off-by: Akinobu Mita
    Acked-by: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • pageattr-test calls srandom32() once every test iteration. But
    calling srandom32() after late initcalls is not meaningful, because
    the random state for random32() has already been mixed with good
    random numbers by the late_initcall prandom_reseed().

    So remove the call to srandom32().

    Signed-off-by: Akinobu Mita
    Acked-by: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Signed-off-by: Cody P Schafer
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Benjamin Herrenschmidt
    Acked-by: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Memory hotplug can happen on a machine under load, with memory
    shortage and fragmentation, so huge page allocations for the vmemmap
    are not
    guaranteed to succeed.

    Try to fall back to regular pages before failing the hotplug event
    completely.
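
    A minimal sketch of the fallback shape (simplified and assumed from
    this description, not the exact x86-64 code):

        /* sketch: prefer a 2M-backed vmemmap chunk, fall back to 4K */
        void *p = vmemmap_alloc_block_buf(PMD_SIZE, node);
        if (p)
                set_pmd(pmd, __pmd(__pa(p) | pgprot_val(PAGE_KERNEL_LARGE)));
        else if (vmemmap_populate_basepages(addr, next, node))
                return -ENOMEM;         /* even base pages failed */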

    Signed-off-by: Johannes Weiner
    Cc: Ben Hutchings
    Cc: Bernhard Schmidt
    Cc: Johannes Weiner
    Cc: Russell King
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We already have generic code to allocate the vmemmap with regular
    pages; use it.

    Signed-off-by: Johannes Weiner
    Cc: Ben Hutchings
    Cc: Bernhard Schmidt
    Cc: Johannes Weiner
    Cc: Russell King
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: Heiko Carstens
    Cc: David Miller
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • No need to maintain addr_end and p_end when they are never actually read
    anywhere on !pse setups. Remove the dead code.

    Signed-off-by: Johannes Weiner
    Cc: Ben Hutchings
    Cc: Bernhard Schmidt
    Cc: Johannes Weiner
    Cc: Russell King
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The sparse code, when asking the architecture to populate the vmemmap,
    specifies the section range as a starting page and a number of pages.

    This is an awkward interface, because none of the arch-specific code
    actually thinks of the range in terms of 'struct page' units and always
    translates it to bytes first.

    In addition, later patches mix huge page and regular page backing for
    the vmemmap. For this, they need to call vmemmap_populate_basepages()
    on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these
    are not necessarily multiples of the 'struct page' size and so this unit
    is too coarse.

    Just translate the section range into bytes once in the generic sparse
    code, then pass byte ranges down the stack.
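
    Sketched as prototypes (parameter names are assumptions; the shapes
    follow the description above):

        /* before: section range expressed in struct-page units */
        int vmemmap_populate(struct page *start_page,
                             unsigned long nr_pages, int node);

        /* after: a plain byte range, computed once in the sparse code */
        int vmemmap_populate(unsigned long start, unsigned long end,
                             int node);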

    Signed-off-by: Johannes Weiner
    Cc: Ben Hutchings
    Cc: Bernhard Schmidt
    Cc: Johannes Weiner
    Cc: Russell King
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Cc: Heiko Carstens
    Acked-by: David S. Miller
    Tested-by: David S. Miller
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This patchset removes vm_struct list management after initializing
    vmalloc. Adding and removing an entry to vmlist is linear in time,
    so it is inefficient. If we maintain this list, the overall time
    complexity of adding and removing an area to/from the vmalloc space
    is O(N), even though we use an rbtree to find a vacant place, which
    takes just O(log N).

    vmlist and vmlist_lock are also used in many places outside of
    vmalloc.c. It is preferable to hide these raw data structures and
    provide well-defined functions for accessing them, because that
    prevents mistakes when manipulating these structures and makes the
    vmalloc layer easier to maintain.

    For kexec and makedumpfile, I export vmap_area_list instead of
    vmlist. This comes from Atsushi's recommendation. For more
    information, please refer to the link below:
    https://lkml.org/lkml/2012/12/6/184

    This patch:

    The purpose of iterating over the vmlist is to find the vm area with
    a specific virtual address. find_vm_area() is provided for this
    purpose and is more efficient, because it uses an rbtree. So switch
    to it.
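
    The shape of the conversion (the before-side walk is illustrative):

        struct vm_struct *vm;

        /* before: O(N) walk over the vmlist */
        for (vm = vmlist; vm; vm = vm->next)
                if (vm->addr == addr)
                        break;

        /* after: O(log N) rbtree lookup */
        vm = find_vm_area(addr);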

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Joonsoo Kim
    Acked-by: Guan Xuetao
    Acked-by: Ingo Molnar
    Acked-by: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Atsushi Kumagai
    Cc: Dave Anderson
    Cc: Eric Biederman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Use helper function free_highmem_page() to free highmem pages into
    the buddy system.

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Cong Wang
    Cc: Yinghai Lu
    Cc: Attilio Rao
    Cc: Konrad Rzeszutek Wilk
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use common helper functions to free reserved pages.

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Split kcore bits from linux/procfs.h into linux/kcore.h.

    Signed-off-by: David Howells
    Acked-by: KOSAKI Motohiro
    Acked-by: Ralf Baechle
    cc: linux-mips@linux-mips.org
    cc: sparclinux@vger.kernel.org
    cc: x86@kernel.org
    cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    David Howells
     

15 Apr, 2013

2 commits

  • kernel_physical_mapping_remove() is only called by
    arch_remove_memory() in init_64.c, which is enclosed in
    CONFIG_MEMORY_HOTREMOVE. So when we don't configure
    CONFIG_MEMORY_HOTREMOVE, the compiler will give a warning:

    warning: ‘kernel_physical_mapping_remove’ defined but not used

    So put kernel_physical_mapping_remove() in
    CONFIG_MEMORY_HOTREMOVE.
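
    The resulting shape (a sketch consistent with the description; the
    function body is assumed):

        #ifdef CONFIG_MEMORY_HOTREMOVE
        static void __meminit
        kernel_physical_mapping_remove(unsigned long start, unsigned long end)
        {
                start = (unsigned long)__va(start);
                end = (unsigned long)__va(end);
                remove_pagetable(start, end, true);
        }
        #endif /* CONFIG_MEMORY_HOTREMOVE */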

    Signed-off-by: Tang Chen
    Cc: linux-mm@kvack.org
    Cc: gregkh@linuxfoundation.org
    Cc: yinghai@kernel.org
    Cc: wency@cn.fujitsu.com
    Cc: mgorman@suse.de
    Cc: tj@kernel.org
    Cc: liwanp@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1366019207-27818-3-git-send-email-tangchen@cn.fujitsu.com
    Signed-off-by: Ingo Molnar

    Tang Chen
     
  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
    x86/mm/cpa/selftest: Fix false positive in CPA self test
    x86/mm/cpa: Convert noop to functional fix
    x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
    x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

    Linus Torvalds
     

13 Apr, 2013

1 commit

  • This patch attempts to fix:

    https://bugzilla.kernel.org/show_bug.cgi?id=56461

    The symptom is a crash and messages like this:

    chrome: Corrupted page table at address 34a03000
    *pdpt = 0000000000000000 *pde = 0000000000000000
    Bad pagetable: 000f [#1] PREEMPT SMP

    Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb:
    enable tlb flush range support for x86") since that code started to free
    unused pagetables.

    On x86-32 PAE kernels, that new code has the potential to free an
    entire PMD page and will clear one of the four
    page-directory-pointer-table entries (aka pgd_t entries).

    The hardware aggressively "caches" these top-level entries and invlpg
    does not actually affect the CPU's copy. If we clear one we *HAVE* to
    do a full TLB flush, otherwise we might continue using a freed pmd page.
    (note, we do this properly on the population side in pud_populate()).

    This patch tracks whenever we clear one of these entries in the 'struct
    mmu_gather', and ensures that we follow up with a full tlb flush.

    BTW, I disassembled and checked that:

    if (tlb->fullmm == 0)
    and
    if (!tlb->fullmm && !tlb->need_flush_all)

    generate essentially the same code, so there should be zero impact there
    to the !PAE case.
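
    In context, the flush decision ends up looking roughly like this (a
    sketch built around the quoted condition; the range-flush call and
    its arguments are assumptions):

        /* sketch: a cleared PAE top-level entry forces a full flush */
        if (!tlb->fullmm && !tlb->need_flush_all)
                flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, 0UL);
        else
                flush_tlb_mm_range(tlb->mm, 0UL, TLB_FLUSH_ALL, 0UL);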

    Signed-off-by: Dave Hansen
    Cc: Peter Anvin
    Cc: Ingo Molnar
    Cc: Artem S Tashkinov
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

12 Apr, 2013

2 commits

  • When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
    kernel_map_pages() are not made visible (via TLB flush)
    immediately if lazy MMU is on. In environments that support lazy
    MMU (e.g. Xen) this may lead to fatal page faults, for example,
    when zap_pte_range() needs to allocate pages in
    __tlb_remove_page() -> tlb_next_batch().
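
    The fix's shape, sketched (the placement inside the page-mapping
    path is assumed from the description):

        /* sketch: after kernel_map_pages() rewrites the PTEs */
        __flush_tlb_all();
        arch_flush_lazy_mmu_mode();     /* commit batched updates now */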

    Signed-off-by: Boris Ostrovsky
    Cc: konrad.wilk@oracle.com
    Link: http://lkml.kernel.org/r/1365703192-2089-1-git-send-email-boris.ostrovsky@oracle.com
    Signed-off-by: Ingo Molnar

    Boris Ostrovsky
     
  • If the pmd is not present, _PAGE_PSE will not be set anymore.
    Fix the false positive.

    Reported-by: Ingo Molnar
    Signed-off-by: Andrea Arcangeli
    Cc: Stefan Bader
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/1365687369-30802-1-git-send-email-aarcange@redhat.com
    Signed-off-by: Ingo Molnar

    Andrea Arcangeli
     

11 Apr, 2013

3 commits

  • Commit:

    a8aed3e0752b ("x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge")

    introduced a valid fix but one location that didn't trigger the bug that
    lead to finding those (small) problems, wasn't updated using the
    right variable.

    The wrong variable was also initialized for no good reason, that
    may have been the source of the confusion. Remove the noop
    initialization accordingly.

    Commit a8aed3e0752b also erroneously removed one canon_pgprot pass meant
    to clear pmd bitflags not supported in hardware by older CPUs, that
    automatically gets corrected by this patch too by applying it to the right
    variable in the new location.

    Reported-by: Stefan Bader
    Signed-off-by: Andrea Arcangeli
    Acked-by: Borislav Petkov
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Link: http://lkml.kernel.org/r/1365600505-19314-1-git-send-email-aarcange@redhat.com
    Signed-off-by: Ingo Molnar

    Andrea Arcangeli
     
  • In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
    when lazy MMU updates are enabled, because set_pgd effects are being
    deferred.

    One instance of this problem is during process mm cleanup with memory
    cgroups enabled. The chain of events is as follows:

    - zap_pte_range enables lazy MMU updates
    - zap_pte_range eventually calls mem_cgroup_charge_statistics,
    which accesses the vmalloc'd mem_cgroup per-cpu stat area
    - vmalloc_fault is triggered which tries to sync the corresponding
    PGD entry with set_pgd, but the update is deferred
    - vmalloc_fault oopses due to a mismatch in the PUD entries

    The oops usually looks like this:

    ------------[ cut here ]------------
    kernel BUG at arch/x86/mm/fault.c:396!
    invalid opcode: 0000 [#1] SMP
    .. snip ..
    CPU 1
    Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
    RIP: e030:[] [] vmalloc_fault+0x11f/0x208
    .. snip ..
    Call Trace:
    [] do_page_fault+0x399/0x4b0
    [] ? xen_mc_extend_args+0xec/0x110
    [] page_fault+0x25/0x30
    [] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
    [] __mem_cgroup_uncharge_common+0xd8/0x350
    [] mem_cgroup_uncharge_page+0x57/0x60
    [] page_remove_rmap+0xe0/0x150
    [] ? vm_normal_page+0x1a/0x80
    [] unmap_single_vma+0x531/0x870
    [] unmap_vmas+0x52/0xa0
    [] ? pte_mfn_to_pfn+0x72/0x100
    [] exit_mmap+0x98/0x170
    [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
    [] mmput+0x83/0xf0
    [] exit_mm+0x104/0x130
    [] do_exit+0x15a/0x8c0
    [] do_group_exit+0x3f/0xa0
    [] sys_exit_group+0x17/0x20
    [] system_call_fastpath+0x16/0x1b

    Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
    changes visible to the consistency checks.
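
    Sketched at the sync site (surrounding context assumed; the pairing
    is exactly what the paragraph above describes):

        /* sketch: flush the lazy-MMU batch so the PGD sync is visible */
        set_pgd(pgd, *pgd_ref);
        arch_flush_lazy_mmu_mode();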

    RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
    Tested-by: Josh Boyer
    Reported-and-Tested-by: Krishna Raman
    Signed-off-by: Samu Kallio
    Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
    Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: H. Peter Anvin

    Samu Kallio
     
  • Minor. Reordered a few lines to lose a superfluous OR operation.

    Signed-off-by: Martin Bundgaard
    Link: http://lkml.kernel.org/r/1363286075-62615-1-git-send-email-martin@mindflux.org
    Signed-off-by: Ingo Molnar

    Martin Bundgaard
     

10 Apr, 2013

1 commit

  • So basically we're generating the pte_t * from a struct page and
    we're handing it down to the __split_large_page() internal version,
    which then goes and gets the struct page * back from it because it
    needs it.

    Change the caller to hand down the struct page * directly; the
    callee can compute the pte_t * itself.

    The net saving is one virt_to_page() call and simpler code. While
    at it, make __split_large_page() static.
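
    As prototypes, the change looks like this (exact names assumed):

        /* before: the caller pre-computes the pte_t * from the page */
        int __split_large_page(pte_t *kpte, unsigned long address,
                               pte_t *pbase);

        /* after: hand down the page; the callee derives the pte_t * */
        static int __split_large_page(pte_t *kpte, unsigned long address,
                                      struct page *base);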

    Signed-off-by: Borislav Petkov
    Acked-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1363886217-24703-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov