15 Nov, 2013

15 commits

  • Add missing check for memory allocation failure.

    Signed-off-by: Kirill A. Shutemov
    Cc: Mikael Starvik
    Acked-by: Jesper Nilsson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • In the split page table lock case, we embed spinlock_t into struct page.
    For obvious reasons, we don't want to increase the size of struct page if
    spinlock_t is too big, as with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or on
    an -rt kernel. So we disable split page table lock if spinlock_t is too
    big.

    This patchset allows the lock to be allocated dynamically if spinlock_t
    is big. In this case page->ptl stores a pointer to the spinlock instead
    of the spinlock itself. This costs an additional cache line for the
    indirect access, but fixes page fault scalability for multi-threaded
    applications.
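The dynamic allocation described above can be sketched in userspace C. This is only an illustrative model, not the kernel code: struct big_lock, struct table_page, and the table_page_*() helpers are hypothetical names, and calloc() stands in for whatever allocator the kernel uses. It shows a pointer-sized slot holding the lock's address and a constructor that reports allocation failure.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a spinlock_t that is too big to embed. */
struct big_lock {
    int locked;
    char debug_state[64];   /* models the bloat added by lock debugging */
};

/* Hypothetical stand-in for struct page: only a pointer-sized slot. */
struct table_page {
    struct big_lock *ptl;   /* pointer to the lock, not the lock itself */
};

/* Constructor: allocates the lock and returns false if allocation fails. */
static bool table_page_ctor(struct table_page *page)
{
    page->ptl = calloc(1, sizeof(*page->ptl));
    return page->ptl != NULL;
}

/* Destructor: releases the dynamically allocated lock. */
static void table_page_dtor(struct table_page *page)
{
    free(page->ptl);
    page->ptl = NULL;
}

/* Accessor: reaching the lock costs one extra pointer dereference. */
static struct big_lock *table_page_lock(struct table_page *page)
{
    return page->ptl;
}
```

The page structure stays pointer-sized regardless of how large the lock grows, which is the trade-off the message describes: one extra indirection in exchange for keeping the split lock available.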

    LOCK_STAT depends on DEBUG_SPINLOCK, so on the current kernel, enabling
    LOCK_STAT to analyse scalability issues breaks scalability. ;)

    The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M
    on a 4-socket machine:

    baseline, no CONFIG_LOCK_STAT: 9.115460703 seconds time elapsed
    baseline, CONFIG_LOCK_STAT=y: 53.890567123 seconds time elapsed
    patched, no CONFIG_LOCK_STAT: 8.852250368 seconds time elapsed
    patched, CONFIG_LOCK_STAT=y: 11.069770759 seconds time elapsed

    The patch count is scary, but most of the patches are trivial. Overview:

    Patches 1-4    A few bug fixes. No dependencies on other patches.
                   Should probably be applied as soon as possible.

    Patch 5        Changes the signature of pgtable_page_ctor(). We will use
                   it for dynamic lock allocation, so it can fail.

    Patches 6-8    Add missing constructor/destructor calls on a few archs.
                   This fixes NR_PAGETABLE accounting and prepares for using
                   split ptl.

    Patches 9-33   Add pgtable_page_ctor() failure handling to all archs.

    Patch 34       Finally adds support for dynamically-allocated page->ptl.
                   Also contains documentation for split page table lock.

    This patch (of 34):

    I missed that we preallocate a few pmds in pgd_alloc() if X86_PAE is
    enabled. Let's add the missing constructor/destructor calls.

    I didn't notice it during testing since prep_new_page() clears
    page->mapping and therefore page->ptl. That is effectively equal to
    spin_lock_init(&page->ptl).

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Kirill A. Shutemov"
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: Christoph Lameter
    Cc: David Howells
    Cc: David S. Miller
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Grant Likely
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Hirokazu Takata
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rob Herring
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Enable PMD split page table lock for X86_64 and PAE.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Reviewed-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • The basic idea is the same as at the PTE level: the lock is embedded into
    the struct page of the table's page.

    We can't use mm->pmd_huge_pte to store pgtables for THP, since we no
    longer take mm->page_table_lock. Let's reuse page->lru of the table's
    page for that.

    pgtable_pmd_page_ctor() returns true if initialization is successful
    and false otherwise. The current implementation never fails, but the
    assumption that the constructor can fail will help to port it to -rt,
    where spinlock_t is rather huge and cannot be embedded into struct page,
    so dynamic allocation is required.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Reviewed-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Only trivial cases are left. Let's convert them all together.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Hugetlb supports multiple page sizes. We use the split lock only at the
    PMD level, not at the PUD level.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Currently mm->pmd_huge_pte is protected by the page table lock. That
    will not work with the split lock. We need a per-pmd pmd_huge_pte for
    proper access serialization.

    For now, let's just introduce a wrapper to access mm->pmd_huge_pte.

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • With split page table lock we can't know which lock we need to take
    before we find the relevant pmd.

    Let's move lock taking inside the function.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • With split ptlock it's important to know which lock
    pmd_trans_huge_lock() took. This patch adds one more parameter to the
    function to return the lock.

    In most places migration to the new API is trivial. The exception is
    move_huge_pmd(): we need to take two locks if the pmd tables are different.
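The calling convention described above can be illustrated with a small userspace sketch. Everything here is hypothetical (struct toy_lock, struct toy_pmd_table, and the toy_*() functions are invented for illustration and are not the kernel API); a plain flag stands in for a real spinlock. It shows a lock function that hands the taken lock back through an out-parameter, and a move operation that takes a second lock only when the destination table differs from the source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: a flag instead of a real spinlock, for illustration only. */
struct toy_lock { int held; };

/* Toy pmd table with its own (split) lock. */
struct toy_pmd_table {
    struct toy_lock lock;
    bool huge;              /* true if the entry maps a huge page */
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/*
 * Returns true with *ptl pointing at the held lock if the entry is huge.
 * Otherwise drops the lock, sets *ptl to NULL, and returns false.
 */
static bool toy_trans_huge_lock(struct toy_pmd_table *t, struct toy_lock **ptl)
{
    *ptl = &t->lock;
    toy_lock_acquire(*ptl);
    if (t->huge)
        return true;
    toy_lock_release(*ptl);
    *ptl = NULL;
    return false;
}

/* Moves an entry between tables; returns the number of locks taken. */
static int toy_move_huge(struct toy_pmd_table *src, struct toy_pmd_table *dst)
{
    struct toy_lock *src_ptl, *dst_ptl;
    int locks_taken;

    if (!toy_trans_huge_lock(src, &src_ptl))
        return 0;
    locks_taken = 1;

    dst_ptl = &dst->lock;
    if (dst_ptl != src_ptl) {       /* different tables: second lock needed */
        toy_lock_acquire(dst_ptl);
        locks_taken++;
    }

    /* ... the entry would be moved here ... */

    if (dst_ptl != src_ptl)
        toy_lock_release(dst_ptl);
    toy_lock_release(src_ptl);
    return locks_taken;
}
```

Returning the lock through the out-parameter means the caller never has to guess which lock to release, which is the point of the extra parameter.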

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Basic api, backed by mm->page_table_lock for now. Actual implementation
    will be added later.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • With split page table lock for PMD level we can't hold mm->page_table_lock
    while updating nr_ptes.

    Let's convert it to atomic_long_t to avoid races.
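As a rough userspace analogy, C11 atomics show the same idea: a counter that can be updated safely without holding any lock. The struct toy_mm type and the toy_*() helpers below are hypothetical names chosen for illustration, and C11 atomic_long is used here in place of the kernel's atomic_long_t.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the mm structure: only the counter is modeled. */
struct toy_mm {
    atomic_long nr_ptes;    /* number of page table pages in use */
};

static void toy_mm_init(struct toy_mm *mm)
{
    atomic_init(&mm->nr_ptes, 0);
}

/* Each update is a single atomic read-modify-write; no lock is required. */
static void toy_inc_nr_ptes(struct toy_mm *mm)
{
    atomic_fetch_add(&mm->nr_ptes, 1);
}

static void toy_dec_nr_ptes(struct toy_mm *mm)
{
    atomic_fetch_sub(&mm->nr_ptes, 1);
}

static long toy_nr_ptes(struct toy_mm *mm)
{
    return atomic_load(&mm->nr_ptes);
}
```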

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • We're going to introduce split page table lock for PMD level. Let's
    rename existing split ptlock for PTE level to avoid confusion.

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Alex Thorlton noticed that some massively threaded workloads perform
    poorly if THP is enabled. This patchset fixes this by introducing a split
    page table lock for PMD tables. hugetlbfs is not covered yet.

    This patchset is based on work by Naoya Horiguchi.

    : akpm result summary:
    :
    : THP off, v3.12-rc2: 18.059261877 seconds time elapsed
    : THP off, patched: 16.768027318 seconds time elapsed
    :
    : THP on, v3.12-rc2: 42.162306788 seconds time elapsed
    : THP on, patched: 8.397885779 seconds time elapsed
    :
    : HUGETLB, v3.12-rc2: 47.574936948 seconds time elapsed
    : HUGETLB, patched: 19.447481153 seconds time elapsed

    THP off, v3.12-rc2:
    -------------------

    Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

    1037072.835207 task-clock # 57.426 CPUs utilized ( +- 3.59% )
    95,093 context-switches # 0.092 K/sec ( +- 3.93% )
    140 cpu-migrations # 0.000 K/sec ( +- 5.28% )
    10,000,550 page-faults # 0.010 M/sec ( +- 0.00% )
    2,455,210,400,261 cycles # 2.367 GHz ( +- 3.62% ) [83.33%]
    2,429,281,882,056 stalled-cycles-frontend # 98.94% frontend cycles idle ( +- 3.67% ) [83.33%]
    1,975,960,019,659 stalled-cycles-backend # 80.48% backend cycles idle ( +- 3.88% ) [66.68%]
    46,503,296,013 instructions # 0.02 insns per cycle
    # 52.24 stalled cycles per insn ( +- 3.21% ) [83.34%]
    9,278,997,542 branches # 8.947 M/sec ( +- 4.00% ) [83.34%]
    89,881,640 branch-misses # 0.97% of all branches ( +- 1.17% ) [83.33%]

    18.059261877 seconds time elapsed ( +- 2.65% )

    THP on, v3.12-rc2:
    ------------------

    Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

    3114745.395974 task-clock # 73.875 CPUs utilized ( +- 1.84% )
    267,356 context-switches # 0.086 K/sec ( +- 1.84% )
    99 cpu-migrations # 0.000 K/sec ( +- 1.40% )
    58,313 page-faults # 0.019 K/sec ( +- 0.28% )
    7,416,635,817,510 cycles # 2.381 GHz ( +- 1.83% ) [83.33%]
    7,342,619,196,993 stalled-cycles-frontend # 99.00% frontend cycles idle ( +- 1.88% ) [83.33%]
    6,267,671,641,967 stalled-cycles-backend # 84.51% backend cycles idle ( +- 2.03% ) [66.67%]
    117,819,935,165 instructions # 0.02 insns per cycle
    # 62.32 stalled cycles per insn ( +- 4.39% ) [83.34%]
    28,899,314,777 branches # 9.278 M/sec ( +- 4.48% ) [83.34%]
    71,787,032 branch-misses # 0.25% of all branches ( +- 1.03% ) [83.33%]

    42.162306788 seconds time elapsed ( +- 1.73% )

    HUGETLB, v3.12-rc2:
    -------------------

    Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):

    2588052.787264 task-clock # 54.400 CPUs utilized ( +- 3.69% )
    246,831 context-switches # 0.095 K/sec ( +- 4.15% )
    138 cpu-migrations # 0.000 K/sec ( +- 5.30% )
    21,027 page-faults # 0.008 K/sec ( +- 0.01% )
    6,166,666,307,263 cycles # 2.383 GHz ( +- 3.68% ) [83.33%]
    6,086,008,929,407 stalled-cycles-frontend # 98.69% frontend cycles idle ( +- 3.77% ) [83.33%]
    5,087,874,435,481 stalled-cycles-backend # 82.51% backend cycles idle ( +- 4.41% ) [66.67%]
    133,782,831,249 instructions # 0.02 insns per cycle
    # 45.49 stalled cycles per insn ( +- 4.30% ) [83.34%]
    34,026,870,541 branches # 13.148 M/sec ( +- 4.24% ) [83.34%]
    68,670,942 branch-misses # 0.20% of all branches ( +- 3.26% ) [83.33%]

    47.574936948 seconds time elapsed ( +- 2.09% )

    THP off, patched:
    -----------------

    Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

    943301.957892 task-clock # 56.256 CPUs utilized ( +- 3.01% )
    86,218 context-switches # 0.091 K/sec ( +- 3.17% )
    121 cpu-migrations # 0.000 K/sec ( +- 6.64% )
    10,000,551 page-faults # 0.011 M/sec ( +- 0.00% )
    2,230,462,457,654 cycles # 2.365 GHz ( +- 3.04% ) [83.32%]
    2,204,616,385,805 stalled-cycles-frontend # 98.84% frontend cycles idle ( +- 3.09% ) [83.32%]
    1,778,640,046,926 stalled-cycles-backend # 79.74% backend cycles idle ( +- 3.47% ) [66.69%]
    45,995,472,617 instructions # 0.02 insns per cycle
    # 47.93 stalled cycles per insn ( +- 2.51% ) [83.34%]
    9,179,700,174 branches # 9.731 M/sec ( +- 3.04% ) [83.35%]
    89,166,529 branch-misses # 0.97% of all branches ( +- 1.45% ) [83.33%]

    16.768027318 seconds time elapsed ( +- 2.47% )

    THP on, patched:
    ----------------

    Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):

    458793.837905 task-clock # 54.632 CPUs utilized ( +- 0.79% )
    41,831 context-switches # 0.091 K/sec ( +- 0.97% )
    98 cpu-migrations # 0.000 K/sec ( +- 1.66% )
    57,829 page-faults # 0.126 K/sec ( +- 0.62% )
    1,077,543,336,716 cycles # 2.349 GHz ( +- 0.81% ) [83.33%]
    1,067,403,802,964 stalled-cycles-frontend # 99.06% frontend cycles idle ( +- 0.87% ) [83.33%]
    864,764,616,143 stalled-cycles-backend # 80.25% backend cycles idle ( +- 0.73% ) [66.68%]
    16,129,177,440 instructions # 0.01 insns per cycle
    # 66.18 stalled cycles per insn ( +- 7.94% ) [83.35%]
    3,618,938,569 branches # 7.888 M/sec ( +- 8.46% ) [83.36%]
    33,242,032 branch-misses # 0.92% of all branches ( +- 2.02% ) [83.32%]

    8.397885779 seconds time elapsed ( +- 0.18% )

    HUGETLB, patched:
    -----------------

    Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):

    395353.076837 task-clock # 20.329 CPUs utilized ( +- 8.16% )
    55,730 context-switches # 0.141 K/sec ( +- 5.31% )
    138 cpu-migrations # 0.000 K/sec ( +- 4.24% )
    21,027 page-faults # 0.053 K/sec ( +- 0.00% )
    930,219,717,244 cycles # 2.353 GHz ( +- 8.21% ) [83.32%]
    914,295,694,103 stalled-cycles-frontend # 98.29% frontend cycles idle ( +- 8.35% ) [83.33%]
    704,137,950,187 stalled-cycles-backend # 75.70% backend cycles idle ( +- 9.16% ) [66.69%]
    30,541,538,385 instructions # 0.03 insns per cycle
    # 29.94 stalled cycles per insn ( +- 3.98% ) [83.35%]
    8,415,376,631 branches # 21.286 M/sec ( +- 3.61% ) [83.36%]
    32,645,478 branch-misses # 0.39% of all branches ( +- 3.41% ) [83.32%]

    19.447481153 seconds time elapsed ( +- 2.00% )

    This patch (of 11):

    CONFIG_GENERIC_LOCKBREAK increases sizeof(spinlock_t) to 8 bytes. This
    increases sizeof(struct page) by 4 bytes on 32-bit systems if split
    page table lock is in use, since page->ptl shares space in a union with
    longs and pointers.

    Let's disable split page table lock on 32-bit systems with
    GENERIC_LOCKBREAK enabled.
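The size effect can be reproduced with a small model. This is a sketch under stated assumptions: 32-bit longs and pointers are modeled as uint32_t so the result does not depend on the host, and all type names and field layouts are hypothetical rather than the real struct page definition. A union is as large as its largest member, so an 8-byte lock grows a slot that otherwise holds only 4-byte words.

```c
#include <stddef.h>
#include <stdint.h>

/* Models a 32-bit long or pointer, independent of the host word size. */
typedef uint32_t word32;

/* Hypothetical 4-byte lock: fits in the space of one word. */
struct small_lock { uint32_t slock; };

/* Hypothetical 8-byte lock: an extra field doubles its size. */
struct lockbreak_lock { uint32_t slock; uint32_t break_lock; };

/* Slot shared between words and the lock, as in a union inside a page. */
union slot_small { word32 private_data; word32 mapping; struct small_lock ptl; };
union slot_big   { word32 private_data; word32 mapping; struct lockbreak_lock ptl; };

_Static_assert(sizeof(union slot_small) == 4, "4-byte lock fits the slot");
_Static_assert(sizeof(union slot_big) == 8, "8-byte lock grows the slot");

/* Number of bytes the containing structure grows by with the bigger lock. */
static size_t slot_growth_bytes(void)
{
    return sizeof(union slot_big) - sizeof(union slot_small);
}
```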

    Signed-off-by: Kirill A. Shutemov
    Cc: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • There's only one caller of do_generic_file_read() and the only actor is
    file_read_actor(). No reason to have a callback parameter.
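The kind of simplification described can be sketched generically. The functions below are hypothetical userspace stand-ins, not the kernel's signatures: copy_actor() plays the role of the single actor, read_with_callback() models the old shape with a callback parameter, and read_direct() models the new shape that calls the actor directly.

```c
#include <stddef.h>
#include <string.h>

/* The only actor that ever gets used: copies a chunk to the destination. */
static size_t copy_actor(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}

/* Before: the actor is passed in, although only one actor exists. */
typedef size_t (*read_actor_t)(char *dst, const char *src, size_t len);

static size_t read_with_callback(char *dst, const char *src, size_t len,
                                 read_actor_t actor)
{
    return actor(dst, src, len);
}

/* After: the callback parameter is gone and the actor is called directly. */
static size_t read_direct(char *dst, const char *src, size_t len)
{
    return copy_actor(dst, src, len);
}
```

With a single caller and a single actor, both forms behave identically; the direct call just removes an unused degree of freedom.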

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Dave Hansen
    Reviewed-by: Wanpeng Li
    Cc: Matthew Wilcox
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Cc: Maxim Levitsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

14 Nov, 2013

19 commits

  • Pull ext4 changes from Ted Ts'o:
    "Ext4 updates for 3.13. Mostly bug fixes and cleanups"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: add prototypes for macro-generated functions
    ext4: return non-zero st_blocks for inline data
    ext4: use prandom_u32() instead of get_random_bytes()
    ext4: remove unreachable code after ext4_can_extents_be_merged()
    ext4: remove unreachable code in ext4_can_extents_be_merged()
    ext4: avoid bh leak in retry path of ext4_expand_extra_isize_ea()
    ext4: don't count free clusters from a corrupt block group
    ext4: fix FITRIM in no journal mode
    ext4: drop set but otherwise unused variable from ext4_add_dirent_to_inline()
    ext4: change ext4_read_inline_dir() to return 0 on success
    ext4: pair trace_ext4_writepages & trace_ext4_writepages_result
    ext4: add ratelimiting to ext4 messages
    ext4: fix performance regression in ext4_writepages
    ext4: fixup kerndoc annotation of mpage_map_and_submit_extent()
    ext4: fix assertion in ext4_add_complete_io()

    Linus Torvalds
     
  • Pull xfs update from Ben Myers:
    "For 3.13-rc1 we have an eclectic assortment of bugfixes, cleanups, and
    refactoring. Bugfixes that stand out are the fix for the AGF/AGI
    deadlock, incore extent list fixes, verifier fixes for v4 superblocks
    and growfs, and memory leaks. There are some asserts, warnings, and
    strings that were cleaned up. There was further rearrangement of code
    to make libxfs and the kernel sync up more easily, differences between
    v2 and v3 directory code were abstracted using an ops vector,
    xfs_inactive was reworked, and the preallocation/hole punching code
    was refactored.

    - simplify kmem_zone_zalloc
    - add traces for AGF/AGI read ops
    - add additional AIL traces
    - fix xfs_remove AGF vs AGI deadlock
    - fix the extent count of new incore extent page in the indirection
    array
    - don't fail bad secondary superblocks verification on v4 filesystems
    due to unzeroed bits after v4 fields
    - fix possible NULL dereference in xlog_verify_iclog
    - remove redundant assert in xfs_dir2_leafn_split
    - prevent stack overflows from page cache allocation
    - fix some sparse warnings
    - fix directory block format verifier to check the leaf entry count
    - abstract the differences in dir2/dir3 via an ops vector
    - continue process of reorganization to make libxfs/kernel code
    merges easier
    - refactor the preallocation and hole punching code
    - fix for growfs and verifiers
    - remove unnecessary scary corruption error when probing non-xfs
    filesystems
    - remove extra newlines from strings passed to printk
    - prevent deadlock trying to cover an active log
    - rework xfs_inactive()
    - add the inode directory type support to XFS_IOC_FSGEOM
    - cleanup (remove) usage of is_bad_inode
    - fix miscalculation in xfs_iext_realloc_direct which results in
    oversized direct extent list
    - remove unnecessary count arg to xfs_iomap_write_allocate
    - fix memory leak in xlog_recover_add_to_trans
    - check superblock instead of block magic to determine if dtype field
    is present
    - fix lockdep annotation due to project quotas
    - fix regression in xfs_node_toosmall which can lead to incorrect
    directory btree node collapse
    - make log recovery verify filesystem uuid of recovering blocks
    - fix XFS_IOC_FREE_EOFBLOCKS definition
    - remove invalid assert in xfs_inode_free
    - fix for AIL lock regression"

    * tag 'xfs-for-linus-v3.13-rc1' of git://oss.sgi.com/xfs/xfs: (49 commits)
    xfs: simplify kmem_{zone_}zalloc
    xfs: add tracepoints to AGF/AGI read operations
    xfs: trace AIL manipulations
    xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
    xfs: fix the extent count when allocating an new indirection array entry
    xfs: be more forgiving of a v4 secondary sb w/ junk in v5 fields
    xfs: fix possible NULL dereference in xlog_verify_iclog
    xfs:xfs_dir2_node.c: pointer use before check for null
    xfs: prevent stack overflows from page cache allocation
    xfs: fix static and extern sparse warnings
    xfs: validity check the directory block leaf entry count
    xfs: make dir2 ftype offset pointers explicit
    xfs: convert directory vector functions to constants
    xfs: convert directory vector functions to constants
    xfs: vectorise encoding/decoding directory headers
    xfs: vectorise DA btree operations
    xfs: vectorise directory leaf operations
    xfs: vectorise directory data operations part 2
    xfs: vectorise directory data operations
    xfs: vectorise remaining shortform dir2 ops
    ...

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "A number of fixes:

    - Fix segfault on perf trace -i perf.data, from Namhyung Kim.

    - Fix segfault with --no-mmap-pages, from David Ahern.

    - Don't force a refresh during progress update in the TUI, greatly
    reducing startup costs, fix from Patrick Palka.

    - Fix sw clock event period test wrt not checking if using >
    max_sample_freq.

    - Handle throttle events in 'object code reading' test, fix from
    Adrian Hunter.

    - Prevent condition that all sort keys are elided, fix from Namhyung
    Kim.

    - Round mmap pages to power 2, from David Ahern.

    And a number of late arrival changes:

    - Add summary only option to 'perf trace', suppressing the decoding
    of events, from David Ahern

    - 'perf trace --summary' formatting simplifications, from Pekka
    Enberg.

    - Beautify fifth argument of mmap() as fd, in 'perf trace', from
    Namhyung Kim.

    - Add direct access to dynamic arrays in libtraceevent, from Steven
    Rostedt.

    - Synthesize non-exec MMAP records when --data used, allowing the
    resolution of data addresses to symbols (global variables, etc), by
    Arnaldo Carvalho de Melo.

    - Code cleanups by David Ahern and Adrian Hunter"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    tools lib traceevent: Add direct access to dynamic arrays
    perf target: Shorten perf_target__ to target__
    perf tests: Handle throttle events in 'object code reading' test
    perf evlist: Refactor mmap_pages parsing
    perf evlist: Round mmap pages to power 2 - v2
    perf record: Fix segfault with --no-mmap-pages
    perf trace: Add summary only option
    perf trace: Simplify '--summary' output
    perf trace: Change syscall summary duration order
    perf tests: Compensate lower sample freq with longer test loop
    perf trace: Fix segfault on perf trace -i perf.data
    perf trace: Separate tp syscall field caching into init routine to be reused
    perf trace: Beautify fifth argument of mmap() as fd
    perf tests: Use lower sample_freq in sw clock event period test
    perf tests: Check return of perf_evlist__open sw clock event period test
    perf record: Move existing write_output into helper function
    perf record: Use correct return type for write()
    perf tools: Prevent condition that all sort keys are elided
    perf machine: Simplify synthesize_threads method
    perf machine: Introduce synthesize_threads method out of open coded equivalent
    ...

    Linus Torvalds
     
  • Pull two x86 fixes from Ingo Molnar.

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error
    x86/dumpstack: Fix printk_address for direct addresses

    Linus Torvalds
     
  • Pull scheduler fixes from Ingo Molnar:
    "Four bugfixes and one performance fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Avoid integer overflow
    sched: Optimize task_sched_runtime()
    sched/numa: Cure update_numa_stats() vs. hotplug
    sched/numa: Fix NULL pointer dereference in task_numa_migrate()
    sched: Fix endless sync_sched/rcu() loop inside _cpu_down()

    Linus Torvalds
     
  • Pull core locking changes from Ingo Molnar:
    "The biggest changes:

    - add lockdep support for seqcount/seqlocks structures, this
    unearthed both bugs and required extra annotation.

    - move the various kernel locking primitives to the new
    kernel/locking/ directory"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    block: Use u64_stats_init() to initialize seqcounts
    locking/lockdep: Mark __lockdep_count_forward_deps() as static
    lockdep/proc: Fix lock-time avg computation
    locking/doc: Update references to kernel/mutex.c
    ipv6: Fix possible ipv6 seqlock deadlock
    cpuset: Fix potential deadlock w/ set_mems_allowed
    seqcount: Add lockdep functionality to seqcount/seqlock structures
    net: Explicitly initialize u64_stats_sync structures for lockdep
    locking: Move the percpu-rwsem code to kernel/locking/
    locking: Move the lglocks code to kernel/locking/
    locking: Move the rwsem code to kernel/locking/
    locking: Move the rtmutex code to kernel/locking/
    locking: Move the semaphore core to kernel/locking/
    locking: Move the spinlock code to kernel/locking/
    locking: Move the lockdep code to kernel/locking/
    locking: Move the mutex code to kernel/locking/
    hung_task debugging: Add tracepoint to report the hang
    x86/locking/kconfig: Update paravirt spinlock Kconfig description
    lockstat: Report avg wait and hold times
    lockdep, x86/alternatives: Drop ancient lockdep fixup message
    ...

    Linus Torvalds
     
  • Pull x86/trace changes from Ingo Molnar:
    "This adds page fault tracepoints which have zero runtime cost in the
    disabled case via IDT trickery (no NOPs in the page fault hotpath)"

    * 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
    x86, trace: Add page fault tracepoints
    x86, trace: Delete __trace_alloc_intr_gate()
    x86, trace: Register exception handler to trace IDT
    x86, trace: Remove __alloc_intr_gate()

    Linus Torvalds
     
  • Pull fbdev changes from Tomi Valkeinen:
    "Nothing particularly stands out in this pull request. The biggest
    part of the changes are cleanups.

    Maybe one fix to mention is the "fb: reorder the lock sequence to fix
    potential dead lock" which hopefully fixes the fb locking issues
    reported by multiple persons.

    There are also a few commits that have changes to arch/arm/mach-at91
    and arch/avr32, which have been acked by the maintainers"

    * tag 'fbdev-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (143 commits)
    fb: reorder the lock sequence to fix potential dead lock
    fbdev: shmobile-lcdcfb: Convert to clk_prepare/unprepare
    fbdev: shmobile-hdmi: Convert to clk_prepare/unprepare
    omapdss: Add new panel driver for Topolly td028ttec1 LCD.
    video: exynos_mipi_dsi: Unlock the mutex before returning
    video: da8xx-fb: remove unwanted define
    video: Remove unnecessary semicolons
    simplefb: use write-combined remapping
    simplefb: fix unmapping fb during destruction
    OMAPDSS: connector-dvi: fix releasing i2c_adapter
    OMAPDSS: DSI: fix perf measuring ifdefs
    framebuffer: Use fb_
    framebuffer: Add fb_ convenience logging macros
    efifb: prevent null-deref when iterating dmi_list
    fbdev: fix error return code in metronomefb_probe()
    video: xilinxfb: Fix for "Use standard variable name convention"
    OMAPDSS: Fix de_level in videomode_to_omap_video_timings()
    video: xilinxfb: Simplify error path
    video: xilinxfb: Use devm_kzalloc instead of kzalloc
    video: xilinxfb: Use standard variable name convention
    ...

    Linus Torvalds
     
  • Pull thermal management updates from Zhang Rui:
    "This time we only have a few changes as there are no soc thermal
    changes from Eduardo. The only big change is the introduction of
    TMON, a tool to help visualize, tune, and test the thermal subsystem.
    The rest is mostly cleanups and fixes all over.

    Specifics:

    - introduce TMON, a tool based on thermal sysfs I/F. It can be used
    to visualize, tune and test the thermal subsystem.

    - fix a zone/cooling device binding problem, when both thermal zone
    bind parameters and .bind() callback are available"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
    tools/thermal: Introduce tmon, a tool for thermal subsystem
    thermal: Fix binding problem when there is thermal zone params
    thermal: cpu_cooling: fix return value check in cpufreq_cooling_register()
    Thermal: Check for validity before doing kfree
    thermal/intel_powerclamp: Add newer CPU models
    Thermal: Tidy up error handling in powerclamp_init
    thermal: Kconfig: cosmetic fixes
    ACPI/thermal : Remove zone disabled warning
    typo in drivers/thermal/Kconfig: lpatform instead of platform

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Resource management
    - Fix host bridge window coalescing (Alexey Neyman)
    - Pass type, width, and prefetchability for window alignment (Wei Yang)

    PCI device hotplug
    - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)

    Power management
    - Remove pci_pm_complete() (Liu Chuansheng)

    MSI
    - Fail initialization if device is not in PCI_D0 (Yijing Wang)

    MPS (Max Payload Size)
    - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
    - Use pcie_set_readrq() to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)

    SR-IOV
    - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
    - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)

    Virtualization
    - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)

    Freescale i.MX6
    - Support i.MX6 PCIe controller (Sean Cross)
    - Increase link startup timeout (Marek Vasut)
    - Probe PCIe in fs_initcall() (Marek Vasut)
    - Fix imprecise abort handler (Tim Harvey)
    - Remove redundant of_match_ptr (Sachin Kamat)

    Renesas R-Car
    - Support Gen2 internal PCIe controller (Valentine Barshak)

    Samsung Exynos
    - Add MSI support (Jingoo Han)
    - Turn off power when link fails (Jingoo Han)
    - Add Jingoo Han as maintainer (Jingoo Han)
    - Add clk_disable_unprepare() on error path (Wei Yongjun)
    - Remove redundant of_match_ptr (Sachin Kamat)

    Synopsys DesignWare
    - Add irq_create_mapping() (Pratyush Anand)
    - Add header guards (Seungwon Jeon)

    Miscellaneous
    - Enable native PCIe services by default on non-ACPI (Andrew Murray)
    - Cleanup _OSC usage and messages (Bjorn Helgaas)
    - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
    - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
    - Remove unused pci_mem_start (Myron Stowe)
    - Make sysfs functions static (Sachin Kamat)
    - Warn on invalid return from driver probe (Stephen M. Cameron)
    - Remove Intel Haswell D3 delays (Todd E Brandt)
    - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
    - Use pci_is_pcie() to simplify code (Yijing Wang)
    - Use PCIe capability accessors to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
    - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
    - Simplify sysfs CPU affinity implementation (Yijing Wang)"

    * tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits)
    PCI: Enable upstream bridges even for VFs on virtual buses
    PCI: Add pci_upstream_bridge()
    PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
    PCI: Warn on driver probe return value greater than zero
    PCI: Drop warning about drivers that don't use pci_set_master()
    PCI: Workaround missing pci_set_master in pci drivers
    powerpc/pci: Use pci_is_pcie() to simplify code [fix]
    PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms
    PCI: imx6: Probe the PCIe in fs_initcall()
    PCI: Add R-Car Gen2 internal PCI support
    PCI: imx6: Remove redundant of_match_ptr
    PCI: Report pci_pme_active() kmalloc failure
    mn10300/PCI: Remove useless pcibios_last_bus
    frv/PCI: Remove pcibios_last_bus
    PCI: imx6: Increase link startup timeout
    PCI: exynos: Remove redundant of_match_ptr
    PCI: imx6: Fix imprecise abort handler
    PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0
    PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe()
    x86/PCI: Coalesce multiple overlapping host bridge windows
    ...

    Linus Torvalds
     
  • Pull ACPI and power management updates from Rafael J Wysocki:

    - New power capping framework and the Intel Running Average Power
    Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.

    - Addition of the in-kernel switching feature to the arm_big_little
    cpufreq driver from Viresh Kumar and Nicolas Pitre.

    - cpufreq support for iMac G5 from Aaro Koskinen.

    - Baytrail processors support for intel_pstate from Dirk Brandewie.

    - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.

    - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.

    - ACPI power management support for the I2C and SPI bus types from Mika
    Westerberg and Lv Zheng.

    - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
    Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.

    - cpufreq drivers updates (mostly fixes and cleanups) from Viresh
    Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
    Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.

    - intel_pstate updates from Dirk Brandewie and Adrian Huang.

    - ACPICA update to version 20130927 including fixes and cleanups and
    some reduction of divergences between the ACPICA code in the kernel
    and ACPICA upstream in order to improve the automatic ACPICA patch
    generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
    Bhat, Bjorn Helgaas, David E Box.

    - ACPI IPMI driver fixes and cleanups from Lv Zheng.

    - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
    Yanfei, Rafael J Wysocki.

    - Conversion of the ACPI AC driver to the platform bus type and
    multiple driver fixes and cleanups related to ACPI from Zhang Rui.

    - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
    Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.

    - Fixes and cleanups and new blacklist entries related to the ACPI
    video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
    Kirill Tkhai.

    - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.

    - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
    Bartlomiej Zolnierkiewicz, Prarit Bhargava.

    - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.

    - Operation Performance Points (OPP) core updates from Nishanth Menon.

    - Runtime power management core fix from Rafael J Wysocki and update
    from Ulf Hansson.

    - Hibernation fixes from Aaron Lu and Rafael J Wysocki.

    - Device suspend/resume lockup detection mechanism from Benoit Goby.

    - Removal of unused proc directories created for various ACPI drivers
    from Lan Tianyu.

    - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
    handler from Heikki Krogerus and Jarkko Nikula.

    - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.

    - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
    Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
    Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
    Liu Chuansheng.

    - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
    Jean-Christophe Plagniol-Villard.

    * tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
    cpufreq: conservative: fix requested_freq reduction issue
    ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
    PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
    ACPI / event: remove unneeded NULL pointer check
    Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
    ACPI / video: Quirk initial backlight level 0
    ACPI / video: Fix initial level validity test
    intel_pstate: skip the driver if ACPI has power mgmt option
    PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
    ACPI / hotplug: Do not execute "insert in progress" _OST
    ACPI / hotplug: Carry out PCI root eject directly
    ACPI / hotplug: Merge device hot-removal routines
    ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
    ACPI / hotplug: Simplify device ejection routines
    ACPI / hotplug: Fix handle_root_bridge_removal()
    ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
    ACPI / scan: Start matching drivers after trying scan handlers
    ACPI: Remove acpi_pci_slot_init() headers from internal.h
    ACPI / blacklist: fix name of ThinkPad Edge E530
    PowerCap: Fix build error with option -Werror=format-security
    ...

    Conflicts:
    arch/arm/mach-omap2/opp.c
    drivers/Kconfig
    drivers/spi/spi.c

    Linus Torvalds
     
  • Pull device mapper changes from Mike Snitzer:
    "A set of device-mapper changes for 3.13.

    Improve reliability of buffer allocations for dm messages with a small
    number of arguments, a couple path group initialization fixes for dm
    multipath, a fix for resizing a dm array, various fixes and
    optimizations for dm cache, a fix for device mapper's Kconfig menu
    indentation.

    Features added include:
    - dm crypt support for activating legacy CBC TrueCrypt containers
    (useful for forensics of these old TCRYPT containers)
    - reduced dm-cache memory requirements for each block in the cache
    - basic support for shrinking a dm-cache's cache (fast) device
    - most notably, dm-cache support for managing cache coherency when
    deploying dm-cache with sophisticated origin volumes (that support
    hardware snapshots and/or clustering): these changes come in the
    form of a new passthrough operation mode and a cache block
    invalidation interface"

    * tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (32 commits)
    dm cache: resolve small nits and improve Documentation
    dm cache: add cache block invalidation support
    dm cache: add remove_cblock method to policy interface
    dm cache policy mq: reduce memory requirements
    dm cache metadata: check the metadata version when reading the superblock
    dm cache: add passthrough mode
    dm cache: cache shrinking support
    dm cache: promotion optimisation for writes
    dm cache: be much more aggressive about promoting writes to discarded blocks
    dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()
    dm cache: optimize commit_if_needed
    dm space map disk: optimise sm_disk_dec_block
    MAINTAINERS: add reference to device-mapper's linux-dm.git tree
    dm: fix Kconfig menu indentation
    dm: allow remove to be deferred
    dm table: print error on preresume failure
    dm crypt: add TCW IV mode for old CBC TCRYPT containers
    dm crypt: properly handle extra key string in initialization
    dm cache: log error message if dm_kcopyd_copy() fails
    dm cache: use cell_defer() boolean argument consistently
    ...

    Linus Torvalds
     
  • Pull MTD changes from Brian Norris:
    - Unify some compile-time differences so that we have fewer uses of
    #ifdef CONFIG_OF in atmel_nand
    - Other general cleanups (removing unused functions, options,
    variables, fields; use correct interfaces)
    - Fix BUG() for new odd-sized NAND, which report non-power-of-2
    dimensions via ONFI
    - Miscellaneous driver fixes (SPI NOR flash; BCM47xx NAND flash; etc.)
    - Improve differentiation between SLC and MLC NAND -- this clarifies an
    ABI issue regarding the MTD "type" (in sysfs and in the MEMGETINFO
    ioctl), where the MTD_MLCNANDFLASH type was present but
    inconsistently used
    - Extend GPMI NAND to support multi-chip-select NAND for some platforms
    - Many improvements to the OMAP2/3 NAND driver, including an expanded
    DT binding to bring us closer to mainline support for some OMAP
    systems
    - Fix a deadlock in the error path of the Atmel NAND driver probe
    - Correct the error codes from MTD mmap() to conform to POSIX and the
    Linux Programmer's Manual. This is an acknowledged change in the MTD
    ABI, but I can't imagine somebody relying on the non-standard -ENOSYS
    error code specifically. Am I just being unimaginative? :)
    - Fix a few important GPMI NAND bugs (one regression from 3.12 and one
    long-standing race condition)
    - More? Read the log!

    * tag 'for-linus-20131112' of git://git.infradead.org/linux-mtd: (98 commits)
    mtd: gpmi: fix the NULL pointer
    mtd: gpmi: fix kernel BUG due to racing DMA operations
    mtd: mtdchar: return expected errors on mmap() call
    mtd: gpmi: only scan two chips for imx6
    mtd: gpmi: Use devm_kzalloc()
    mtd: atmel_nand: fix bug driver will in a dead lock if no nand detected
    mtd: nand: use a local variable to simplify the nand_scan_tail
    mtd: nand: remove deprecated IRQF_DISABLED
    mtd: dataflash: Say if we find a device we don't support
    mtd: nand: omap: fix error return code in omap_nand_probe()
    mtd: nand_bbt: kill NAND_BBT_SCANALLPAGES
    mtd: m25p80: fixup device removal failure path
    mtd: mxc_nand: Include linux/of.h header
    mtd: remove duplicated include from mtdcore.c
    mtd: m25p80: add support for Macronix mx25l3255e
    mtd: nand: omap: remove selection of BCH ecc-scheme via KConfig
    mtd: nand: omap: updated devm_xx for all resource allocation and free calls
    mtd: nand: omap: use drivers/mtd/nand/nand_bch.c wrapper for BCH ECC instead of lib/bch.c
    mtd: nand: omap: clean-up ecc layout for BCH ecc schemes
    mtd: nand: omap2: clean-up BCHx_HW and BCHx_SW ECC configurations in device_probe
    ...

    Linus Torvalds
     
  • Pull first round of SCSI updates from James Bottomley:
    "This patch set is driver updates for qla4xxx, scsi_debug, pm80xx,
    fcoe/libfc, esas2r, lpfc, be2iscsi and megaraid_sas plus some assorted
    bug fixes and cleanups"

    * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits)
    [SCSI] scsi_error: Escalate to LUN reset if abort fails
    [SCSI] Add 'eh_deadline' to limit SCSI EH runtime
    [SCSI] remove check for 'resetting'
    [SCSI] dc395: Move 'last_reset' into internal host structure
    [SCSI] tmscsim: Move 'last_reset' into host structure
    [SCSI] advansys: Remove 'last_reset' references
    [SCSI] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset
    [SCSI] dpt_i2o: Remove DPTI_STATE_IOCTL
    [SCSI] megaraid_sas: Fix synchronization problem between sysPD IO path and AEN path
    [SCSI] lpfc: Fix typo on NULL assignment
    [SCSI] scsi_dh_alua: ALUA handler attach should succeed while TPG is transitioning
    [SCSI] scsi_dh_alua: ALUA check sense should retry device internal reset unit attention
    [SCSI] esas2r: Cleanup snprinf formatting of firmware version
    [SCSI] esas2r: Remove superfluous mask of pcie_cap_reg
    [SCSI] esas2r: Fixes for big-endian platforms
    [SCSI] esas2r: Directly call kernel functions for atomic bit operations
    [SCSI] lpfc 8.3.43: Update lpfc version to driver version 8.3.43
    [SCSI] lpfc 8.3.43: Fixed not processing task management IOCB response status
    [SCSI] lpfc 8.3.43: Fixed spinlock hang.
    [SCSI] lpfc 8.3.43: Fixed invalid Total_Data_Placed value received for els and ct command responses
    ...

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:
    "This is the block driver pull request for 3.13. As with the core pull
    request just sent out, this was rebased on top of the core branch
    again after the immutable series was pulled. This also means that
    bcache gets to sit the initial pull over. I will send a second driver
    pull request in the merge window to get those fixes in, once they have
    been rebased and tested on top of the non-immutable stack.

    This pull request contains:

    - Add support for the sTec Kronos pci-e flash card from sTec. Also
    has various cleanups for this driver, from myself, Bart, Mike
    Snitzer, and Wei Yongjun.

    - Add surprise removal support for the micron mtip32xx driver from
    Micron.

    - Floppy documentation fix from Ben Harris.

    - debugfs bug fix for pktcdvd from Dan Carpenter.

    - Fix for the mtip32xx driver stack usage in the debugfs path,
    dynamically allocating those buffers instead. From David Milburn.

    - Disable cpqarray in Kconfig. The plan is to remove it on request
    of HP, but let's disable it for a few revisions just to see if
    anyone yells.

    - drbd fixes from Lars Ellenberg and Philipp Reisner.

    - Elevator switch fix for the s390 block driver from Heiko Carstens.

    - loop crash fix on IO to unassigned device from Mikulas Patocka.

    - A series of bug fixes for the IBM rsxx pci-e flash driver from
    Philip J Kelleher.

    - cciss probe fix from Stephen Cameron.

    - Xen block front/back fixes from Roger Pau Monne and Vegard Nossum"

    * 'for-3.13/drivers' of git://git.kernel.dk/linux-block: (41 commits)
    floppy: Correct documentation of driver options when used as a module.
    pktcdvd: debugfs functions return NULL on error
    xen-blkfront: restore the non-persistent data path
    skd: fix formatting in skd_s1120.h
    skd: reorder construct/destruct code
    skd: cleanup skd_do_inq_page_da()
    skd: remove SKD_OMIT_FROM_SRC_DIST ifdefs
    skd: remove redundant skdev->pdev assignment from skd_pci_probe()
    skd: use
    skd: remove SCSI subsystem specific includes
    skd: register block device only if some devices are present
    skd: fix error messages in skd_init()
    skd: fix error paths in skd_init()
    skd: fix unregister_blkdev() placement
    skd: more removal of bio-based code
    skd: cleanup the skd_*() function block wrapping
    skd: rip out bio path
    skd: fix error return code in skd_pci_probe()
    s390/dasd: hold request queue sysfs lock when calling elevator_init()
    cciss: return 0 from driver probe function on success, not 1
    ...

    Linus Torvalds
     
  • Pull block IO core updates from Jens Axboe:
    "This is the pull request for the core changes in the block layer for
    3.13. It contains:

    - The new blk-mq request interface.

    This is a new and more scalable queueing model that marries the
    best part of the request based interface we currently have (which
    is fully featured, but scales poorly) and the bio based "interface"
    which the new drivers for high IOPS devices end up using because
    it's much faster than the request based one.

    The bio interface has no block layer support, since it taps into
    the stack much earlier. This means that drivers end up having to
    implement a lot of functionality on their own, like tagging,
    timeout handling, requeue, etc. The blk-mq interface provides all
    these. Some drivers even provide a switch to select bio or rq and
    have code to handle both, since things like merging only work in
    the rq model, which is hence faster for some workloads. This is a
    huge mess. Conversion of these drivers nets us a substantial code
    reduction. Initial results on converting SCSI to this model even
    show an 8x improvement on single queue devices. So while the
    model was intended to work on the newer multiqueue devices, it has
    substantial improvements for "classic" hardware as well. This code
    has gone through extensive testing and development and is now ready
    to go. A pull request to convert virtio-blk to this model will be
    coming as well, with more drivers scheduled for 3.14 conversion.

    - Two blktrace fixes from Jan and Chen Gang.

    - A plug merge fix from Alireza Haghdoost.

    - Conversion of __get_cpu_var() from Christoph Lameter.

    - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

    - A fix for a race between request completion and the timeout
    handling from Jeff Moyer. This is what caused the merge conflict
    with blk-mq/core, in case you are looking at that.

    - A dm stacking fix from Mike Snitzer.

    - A code consolidation fix and duplicated code removal from Kent
    Overstreet.

    - A handful of block bug fixes from Mikulas Patocka, fixing a loop
    crash and memory corruption on blk cg.

    - Elevator switch bug fix from Tomoki Sekiyama.

    A heads-up that I had to rebase this branch. Initially the immutable
    bio_vecs had been queued up for inclusion, but a week later, it became
    clear that it wasn't fully cooked yet. So the decision was made to
    pull this out and postpone it until 3.14. It was a straightforward
    rebase, just pruning out the immutable series and the later fixes of
    problems with it. The rest of the patches applied directly and no
    further changes were made"

    * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: Do not call sector_div() with a 64-bit divisor
    kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
    block: Consolidate duplicated bio_trim() implementations
    block: Use rw_copy_check_uvector()
    block: Enable sysfs nomerge control for I/O requests in the plug list
    block: properly stack underlying max_segment_size to DM device
    elevator: acquire q->sysfs_lock in elevator_change()
    elevator: Fix a race in elevator switching and md device initialization
    block: Replace __get_cpu_var uses
    bdi: test bdi_init failure
    block: fix a probe argument to blk_register_region
    loop: fix crash if blk_alloc_queue fails
    blk-core: Fix memory corruption if blkcg_init_queue fails
    block: fix race between request completion and timeout handling
    blktrace: Send BLK_TN_PROCESS events to all running traces
    blk-mq: don't disallow request merges for req->special being set
    blk-mq: mq plug list breakage
    blk-mq: fix for flush deadlock
    ...

    Linus Torvalds
     
  • Pull VFS fixes from Al Viro:
    "Several fixes, mostly for regressions in the last pile. However,
    prepend_path() forgetting to reinitialize dentry/vfsmount is in 3.12
    as well and qib_fs had been leaking all along..."

    The unpaired RCU lock issue was also independently reported by Dave
    Jones with his fuzzer tool..

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    qib_fs: fix (some) dcache abuses
    prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts
    fix unpaired rcu lock in prepend_path()
    locks: missing unlock on error in generic_add_lease()
    aio: checking for NULL instead of IS_ERR

    Linus Torvalds
     
  • Pull ARM updates from Russell King:
    "Included in this series are:

    1. BE8 (modern big endian) changes for ARM from Ben Dooks
    2. big.Little support from Nicolas Pitre and Dave Martin
    3. support for LPAE systems with all system memory above 4GB
    4. Perf updates from Will Deacon
    5. Additional prefetching and other performance improvements from Will.
    6. Neon-optimised AES implementation from Ard.
    7. A number of smaller fixes scattered around the place.

    There is a rather horrid merge conflict in tools/perf - I was never
    notified of the conflict because it originally occurred between Will's
    tree and other stuff. Consequently I have a resolution which Will
    forwarded me, which I'll forward on immediately after sending this
    mail.

    The other notable thing is I'm expecting some build breakage in the
    crypto stuff on ARM only with Ard's AES patches. These were merged
    into a stable git branch which others had already pulled, so there's
    little I can do about this. The problem is caused because these
    patches have a dependency on some code in the crypto git tree - I
    tried requesting a branch I can pull to resolve these, and all I got
    each time from the crypto people was "we'll revert our patches then"
    which would only make things worse since I still don't have the
    dependent patches. I've no idea what's going on there or how to
    resolve that, and since I can't split these patches from the rest of
    this pull request, I'm rather stuck with pushing this as-is or
    reverting Ard's patches.

    Since it should "come out in the wash" I've left them in - the only
    build problems they seem to cause at the moment are with randconfigs,
    and since it's a new feature anyway. However, if by -rc1 the
    dependencies aren't in, I think it'd be best to revert Ard's patches"

    I resolved the perf conflict roughly as per the patch sent by Russell,
    but there may be some differences. Any errors are likely mine. Let's
    see how the crypto issues work out..

    * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
    ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
    ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
    ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
    ARM: 7871/1: amba: Extend number of IRQS
    ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
    ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
    ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
    ARM: 7878/1: nommu: Implement dummy early_paging_init()
    ARM: 7876/1: clear Thumb-2 IT state on exception handling
    ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
    ARM: footbridge: fix build warnings for netwinder
    ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
    ARM: fix misplaced arch_virt_to_idmap()
    ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
    ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
    ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
    ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
    ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
    ARM: 7862/1: pcpu: replace __get_cpu_var_uses
    ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
    ...

    Linus Torvalds
     
  • Pull DMA mask updates from Russell King:
    "This series cleans up the handling of DMA masks in a lot of drivers,
    fixing some bugs as we go.

    Some of the more serious errors include:
    - drivers which only set their coherent DMA mask if the attempt to
    set the streaming mask fails.
    - drivers which test for a NULL dma mask pointer, and then set the
    dma mask pointer to a location in their module .data section -
    which will cause problems if the module is reloaded.

    To counter these, I have introduced two helper functions:
    - dma_set_mask_and_coherent() takes care of setting both the
    streaming and coherent masks at the same time, with the correct
    error handling as specified by the API.
    - dma_coerce_mask_and_coherent() which resolves the problem of
    drivers forcefully setting DMA masks. This is more a marker for
    future work to further clean these locations up - the code which
    creates the devices really should be initialising these, but to fix
    that in one go along with this change could potentially be very
    disruptive.

    The last thing this series does is prise away some of Linux's addiction
    to "DMA addresses are physical addresses and RAM always starts at
    zero". We have ARM LPAE systems where all system memory is above 4GB
    physical, hence having DMA masks interpreted by (eg) the block layers
    as describing physical addresses in the range 0..DMAMASK fails on
    these platforms. Santosh Shilimkar addresses this in this series; the
    patches were copied to the appropriate people multiple times but were
    ignored.

    Fixing this also gets rid of some ARM weirdness in the setup of the
    max*pfn variables, and brings ARM into line with every other Linux
    architecture as far as those go"

    * 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-arm: (52 commits)
    ARM: 7805/1: mm: change max*pfn to include the physical offset of memory
    ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations
    ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
    ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function
    ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit()
    ARM: DMA-API: better handing of DMA masks for coherent allocations
    ARM: 7857/1: dma: imx-sdma: setup dma mask
    DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
    DMA-API: dcdbas: update DMA mask handing
    DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
    DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks
    DMA-API: crypto: remove last references to 'static struct device *dev'
    DMA-API: crypto: fix ixp4xx crypto platform device support
    DMA-API: others: use dma_set_coherent_mask()
    DMA-API: staging: use dma_set_coherent_mask()
    DMA-API: usb: use new dma_coerce_mask_and_coherent()
    DMA-API: usb: use dma_set_coherent_mask()
    DMA-API: parport: parport_pc.c: use dma_coerce_mask_and_coherent()
    DMA-API: net: octeon: use dma_coerce_mask_and_coherent()
    DMA-API: net: nxp/lpc_eth: use dma_coerce_mask_and_coherent()
    ...

    Linus Torvalds
     

13 Nov, 2013

6 commits

  • * lookup_one_len() really wants i_mutex held on directory.
    * leaks galore - just mount ipathfs, then
    cd /sys/bus/pci/drivers/qib_ib; echo *:*:*.* >unbind
    on a box with that card present and try to umount ipathfs...

    Signed-off-by: Al Viro

    Al Viro
     
  • Now that seqcounts are lockdep enabled objects, we need to explicitly
    initialize runtime allocated seqcounts so that lockdep can track them.

    Without this patch, Fengguang was seeing:

    [ 4.127282] INFO: trying to register non-static key.
    [ 4.128027] the code is fine but needs lockdep annotation.
    [ 4.128027] turning off the locking correctness validator.
    [ 4.128027] CPU: 0 PID: 96 Comm: kworker/u4:1 Not tainted 3.12.0-next-20131108-10601-gbad570d #2
    [ 4.128027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ ... ]
    [ 4.128027] Call Trace:
    [ 4.128027] [] ? console_unlock+0x353/0x380
    [ 4.128027] [] dump_stack+0x48/0x60
    [ 4.128027] [] __lock_acquire.isra.26+0x7e3/0xceb
    [ 4.128027] [] lock_acquire+0x71/0x9a
    [ 4.128027] [] ? blk_throtl_bio+0x1c3/0x485
    [ 4.128027] [] throtl_update_dispatch_stats+0x7c/0x153
    [ 4.128027] [] ? blk_throtl_bio+0x1c3/0x485
    [ 4.128027] [] blk_throtl_bio+0x1c3/0x485
    ...

    Use u64_stats_init() for all affected data structures, which initializes
    the seqcount.

    Reported-and-Tested-by: Fengguang Wu
    Cc: Vivek Goyal
    Cc: Jens Axboe
    Signed-off-by: Peter Zijlstra
    [ Folded in another fix from the mailing list as well as a fix to that fix. Tweaked commit message. ]
    Signed-off-by: John Stultz
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1384314134-6895-1-git-send-email-john.stultz@linaro.org
    [ So I actually think that the two SOBs from PeterZ are the right depiction of the patch route. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • There are new Sparse warnings:

    >> kernel/locking/lockdep.c:1235:15: sparse: symbol '__lockdep_count_forward_deps' was not declared. Should it be static?
    >> kernel/locking/lockdep.c:1261:15: sparse: symbol '__lockdep_count_backward_deps' was not declared. Should it be static?

    Please consider folding the attached diff :-)

    Signed-off-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/527d1787.ThzXGoUspZWehFDl\%fengguang.wu@intel.com
    Signed-off-by: Ingo Molnar

    Fengguang Wu
     
  • ... and equivalent is needed in 3.12; it's broken there as well

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Li Zhong
    Signed-off-by: Al Viro

    Li Zhong
     
    sa->runnable_avg_sum is of type u32 but after shifting it by NICE_0_SHIFT
    bits it is promoted to u64. This of course makes no sense, since the
    result will never be more than 32 bits long. Casting sa->runnable_avg_sum
    to u64 before it is shifted fixes this problem.

    Reviewed-by: Ben Segall
    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1384112521-25177-1-git-send-email-mpn@google.com
    Signed-off-by: Ingo Molnar

    Michal Nazarewicz