13 Feb, 2015

34 commits

  • Currently, the isolate callback passed to the list_lru_walk family of
    functions is supposed to just delete an item from the list upon returning
    LRU_REMOVED or LRU_REMOVED_RETRY, while the nr_items counter is fixed up
    by __list_lru_walk_one after the callback returns. Since the callback is
    allowed to drop the lock after removing an item (it has to return
    LRU_REMOVED_RETRY then), nr_items can be less than the actual number
    of elements on the list even if we check it under the lock. This makes
    it difficult to move items from one list_lru_one to another, which is
    required for per-memcg list_lru reparenting - we can't just splice the
    lists, we have to move entries one by one.

    This patch therefore introduces helpers that must be used by callback
    functions to isolate items instead of raw list_del/list_move. These are
    list_lru_isolate and list_lru_isolate_move. They not only remove the
    entry from the list, but also fix the nr_items counter, making sure
    nr_items always reflects the actual number of elements on the list if
    checked under the appropriate lock.
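
    The helpers are tiny; their essential shape (close to the merged code)
    is:

    void list_lru_isolate(struct list_lru_one *list, struct list_head *item)
    {
            list_del_init(item);
            list->nr_items--;       /* keep the count exact under the lock */
    }

    void list_lru_isolate_move(struct list_lru_one *list,
                               struct list_head *item, struct list_head *head)
    {
            list_move(item, head);
            list->nr_items--;       /* the item now lives on another list */
    }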

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We need to look up a kmem_cache in ->memcg_params.memcg_caches arrays only
    on allocations, so there is no need to have the array entries set until
    css free - we can clear them on css offline. This will allow us to reuse
    array entries more efficiently and avoid costly array relocations.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Currently, we use mem_cgroup->kmemcg_id to guarantee kmem_cache->name
    uniqueness. This is correct, because kmemcg_id is only released on css
    free after destroying all per memcg caches.

    However, I am going to change that and release kmemcg_id on css offline,
    because it is not wise to keep it for so long, wasting valuable entries of
    memcg_cache_params->memcg_caches arrays. Therefore, to preserve cache
    name uniqueness, let us switch to css->id.
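
    For illustration, the per-memcg cache name can then be derived roughly
    like this (a sketch: memcg_cache_name() and the exact format are
    illustrative, memcg_name is whatever cgroup name string the caller has):

    static char *memcg_cache_name(struct kmem_cache *root_cache,
                                  struct cgroup_subsys_state *css,
                                  const char *memcg_name)
    {
            /* css->id stays unique for the whole css lifetime */
            return kasprintf(GFP_KERNEL, "%s(%d:%s)",
                             root_cache->name, css->id, memcg_name);
    }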

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Currently, we release css->id in css_release_work_fn, right before calling
    css_free callback, so that when css_free is called, the id may have
    already been reused for a new cgroup.

    I am going to use css->id to create unique names for per memcg kmem
    caches. Since kmem caches are destroyed only on css_free, I need css->id
    to be freed after css_free was called to avoid name clashes. This patch
    therefore moves css->id removal to css_free_work_fn. To prevent
    css_from_id from returning a pointer to a stale css, it makes
    css_release_work_fn replace the css pointer stored in css_idr at index
    css->id with NULL.
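
    A sketch of the resulting ordering (helper names are illustrative; the
    real code lives in css_release_work_fn and css_free_work_fn):

    static void css_release_id(struct cgroup_subsys_state *css)
    {
            /* css_from_id() must stop returning the stale css... */
            idr_replace(&css->ss->css_idr, NULL, css->id);
    }

    static void css_free_id(struct cgroup_subsys_state *css)
    {
            /* ...but the id itself is recycled only after css_free */
            idr_remove(&css->ss->css_idr, css->id);
    }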

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Acked-by: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Sometimes, we need to iterate over all memcg copies of a particular root
    kmem cache. Currently, we use memcg_cache_params->memcg_caches array for
    that, because it contains all existing memcg caches.

    However, it's bad practice to keep all caches, including those that
    belong to offline cgroups, in this array, because the array would then
    grow without bound. I'm going to wipe dead caches away from it to save
    space. To still be able to iterate over all memcg caches of the same
    kind, let us link them into a list.
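
    With such a list in place, iteration becomes a plain list walk (sketch;
    do_something_with() is a placeholder):

    struct kmem_cache *c;

    /* on creation, hang the new memcg cache off its root cache */
    list_add(&s->memcg_params.list, &root_cache->memcg_params.list);

    /* later, visit every memcg copy of root_cache */
    list_for_each_entry(c, &root_cache->memcg_params.list, memcg_params.list)
            do_something_with(c);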

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Currently, kmem_cache stores a pointer to struct memcg_cache_params
    instead of embedding it. The rationale is to save memory when kmem
    accounting is disabled. However, the memcg_cache_params has shrivelled
    drastically since it was first introduced:

    * Initially:

    struct memcg_cache_params {
            bool is_root_cache;
            union {
                    struct kmem_cache *memcg_caches[0];
                    struct {
                            struct mem_cgroup *memcg;
                            struct list_head list;
                            struct kmem_cache *root_cache;
                            bool dead;
                            atomic_t nr_pages;
                            struct work_struct destroy;
                    };
            };
    };

    * Now:

    struct memcg_cache_params {
            bool is_root_cache;
            union {
                    struct {
                            struct rcu_head rcu_head;
                            struct kmem_cache *memcg_caches[0];
                    };
                    struct {
                            struct mem_cgroup *memcg;
                            struct kmem_cache *root_cache;
                    };
            };
    };

    So the memory saving does not seem to be a clear win anymore.

    OTOH, keeping a pointer to the memcg_cache_params struct instead of
    embedding it means touching one more cache line on the kmem alloc/free
    hot paths. Besides, the extra level of indirection makes it really
    painful to link kmem caches in a list chained by a field of struct
    memcg_cache_params, which is what I want to do in the following patch.
    So let us embed it.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Cc: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • In super_cache_scan() we divide the number of objects of a particular
    type by the total number of objects in order to distribute pressure
    among them. As a result, in some corner cases we can get nr_to_scan=0
    even if there are some objects to reclaim, e.g. dentries=1, inodes=1,
    fs_objects=1 gives nr_to_scan=1/3=0.

    This is unacceptable for per memcg kmem accounting, because it means
    that some objects may never get reclaimed after memcg death, preventing
    the memcg from being freed.

    This patch therefore ensures that super_cache_scan() will scan at least
    one object of each type, if any are present.
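
    Roughly, the fix amounts to bumping each proportional share by one (a
    sketch of the idea, not the verbatim diff):

    dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
    inodes   = mult_frac(sc->nr_to_scan, inodes, total_objects);

    sc->nr_to_scan = dentries + 1;          /* never hand 0 to the pruners */
    freed = prune_dcache_sb(sb, sc);
    sc->nr_to_scan = inodes + 1;
    freed += prune_icache_sb(sb, sc);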

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Alexander Viro
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Now, to make any list_lru-based shrinker memcg aware, all we have to do
    is initialize its list_lru as memcg aware. Let's do it for the general
    FS shrinker (super_block::s_shrink).

    There are other FS-specific shrinkers that use list_lru for storing
    objects, such as XFS and GFS2 dquot cache shrinkers, but since they
    reclaim objects that are shared among different cgroups, there is no point
    making them memcg aware. It's a big question whether we should account
    them to memcg at all.
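
    For the sb shrinker the change boils down to roughly this in
    alloc_super() (a sketch of the wiring, error handling elided):

    s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
    if (list_lru_init_memcg(&s->s_dentry_lru))
            goto fail;
    if (list_lru_init_memcg(&s->s_inode_lru))
            goto fail;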

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • There are several FS shrinkers, including super_block::s_shrink, that
    keep reclaimable objects in the list_lru structure. Hence to turn them
    into memcg-aware shrinkers, it is enough to make list_lru per-memcg.

    This patch does the trick. It adds an array of lru lists to the
    list_lru_node structure (per-node part of the list_lru), one for each
    kmem-active memcg, and dispatches every item addition or removal to the
    list corresponding to the memcg which the item is accounted to. So now
    the list_lru structure is not just per node, but per node and per memcg.

    Not all list_lrus need this feature, so this patch also adds a new
    method, list_lru_init_memcg, which initializes a list_lru as memcg
    aware. Otherwise (i.e. if initialized with the old list_lru_init), the
    list_lru won't have per memcg lists.

    Just like the per-memcg caches arrays, the arrays of per-memcg lists
    are indexed by memcg_cache_id, so we must grow them whenever
    memcg_nr_cache_ids is increased. To that end we introduce a callback,
    memcg_update_all_list_lrus, invoked by memcg_alloc_cache_id if the id
    space is full.

    The locking is implemented in a manner similar to lruvecs, i.e. we have
    one lock per node that protects all lists (both global and per cgroup) on
    the node.
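
    The resulting layout looks roughly like this (comments added for
    orientation):

    struct list_lru_one {
            struct list_head list;
            long nr_items;
    };

    struct list_lru_memcg {
            /* per-cgroup lists, indexed by memcg_cache_id */
            struct list_lru_one *lru[0];
    };

    struct list_lru_node {
            spinlock_t lock;                /* protects all lists on the node */
            struct list_lru_one lru;        /* global (unaccounted) objects */
            struct list_lru_memcg *memcg_lrus;  /* NULL if not memcg aware */
    };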

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • To make list_lru memcg aware, we need all list_lrus to be kept on a list
    protected by a mutex, so that we can sleep while walking over the
    list.

    Therefore, after this change list_lru_destroy may sleep. Fortunately,
    there is only one user that calls it from an atomic context - put_super
    - and we can easily fix it by calling list_lru_destroy before put_super
    in destroy_locked_super; we no longer need the lrus by that time anyway.

    Another point that should be noted is that list_lru_destroy is allowed
    to be called on an uninitialized zeroed-out object, in which case it is
    a no-op. Before this patch this was guaranteed by kfree, but now we
    need an explicit check there.
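
    The destroy path then looks roughly like this (list_lru_unregister() is
    the part that takes the global mutex and may sleep):

    void list_lru_destroy(struct list_lru *lru)
    {
            /* uninitialized zeroed-out object: nothing to do */
            if (!lru->node)
                    return;

            list_lru_unregister(lru);
            kfree(lru->node);
            lru->node = NULL;
    }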

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • The active_nodes mask allows us to skip empty nodes when walking over
    list_lru items from all nodes in list_lru_count/walk. However, these
    functions are never called from hot paths, so we don't seem to need
    that kind of optimization there. OTOH, removing the mask will make it
    easier to make list_lru per-memcg.

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We need a stable value of memcg_nr_cache_ids in kmem_cache_create()
    (memcg_alloc_cache_params() wants it for root caches), where we only
    hold the slab_mutex and no memcg-related locks. As a result, we have to
    update memcg_nr_cache_ids under the slab_mutex, which we can only take
    on the slab's side (see memcg_update_array_size). This looks awkward
    and will become even worse when per-memcg list_lru is introduced, which
    also wants stable access to memcg_nr_cache_ids.

    To get rid of this dependency between the memcg_nr_cache_ids and the
    slab_mutex, this patch introduces a special rwsem. The rwsem is held
    for writing during memcg_caches arrays relocation and memcg_nr_cache_ids
    updates. Therefore one can take it for reading to get a stable access
    to memcg_caches arrays and/or memcg_nr_cache_ids.

    Currently the semaphore is taken for reading only from
    kmem_cache_create, right before taking the slab_mutex, so right now
    there is not much point in using an rwsem instead of a mutex. However,
    once list_lru is made per-memcg it will allow list_lru initializations
    to proceed concurrently.
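
    The reader side is deliberately trivial (essentially as merged):

    static DECLARE_RWSEM(memcg_cache_ids_sem);

    void memcg_get_cache_ids(void)
    {
            /* memcg_nr_cache_ids is stable until the matching put */
            down_read(&memcg_cache_ids_sem);
    }

    void memcg_put_cache_ids(void)
    {
            up_read(&memcg_cache_ids_sem);
    }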

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • memcg_limited_groups_array_size, which defines the size of memcg_caches
    arrays, is rather cumbersome, and nothing in the name indicates that it
    is related to kmem caches. So let's rename it to memcg_nr_cache_ids.
    It's concise and points us directly to memcg_cache_id.

    Also, rename kmem_limited_groups to memcg_cache_ida.

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • This patch adds SHRINKER_MEMCG_AWARE flag. If a shrinker has this flag
    set, it will be called per memory cgroup. The memory cgroup to scan
    objects from is passed in shrink_control->memcg. If the memory cgroup
    is NULL, a memcg aware shrinker is supposed to scan objects from the
    global list. Unaware shrinkers are only called on global pressure with
    memcg=NULL.
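
    Registration of a memcg-aware shrinker then looks like this (my_count
    and my_scan are placeholders for the shrinker's own callbacks):

    static struct shrinker my_shrinker = {
            .count_objects  = my_count,     /* target comes in sc->memcg */
            .scan_objects   = my_scan,      /* sc->memcg == NULL: global list */
            .seeks          = DEFAULT_SEEKS,
            .flags          = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
    };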

    Signed-off-by: Vladimir Davydov
    Cc: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We are going to make FS shrinkers memcg-aware. To achieve that, we will
    have to pass the memcg to scan to the nr_cached_objects and
    free_cached_objects VFS methods, which currently take only the NUMA node
    to scan. Since the shrink_control structure already holds the node, and
    the memcg to scan will be added to it when we introduce memcg-aware
    vmscan, let us consolidate the methods' arguments in this structure to
    keep things clean.
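
    The signature change is mechanical; roughly:

    /* before: the node is passed explicitly */
    long (*nr_cached_objects)(struct super_block *, int nid);
    long (*free_cached_objects)(struct super_block *, long nr_to_scan, int nid);

    /* after: nid, nr_to_scan (and later the memcg) travel in shrink_control */
    long (*nr_cached_objects)(struct super_block *, struct shrink_control *);
    long (*free_cached_objects)(struct super_block *, struct shrink_control *);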

    Signed-off-by: Vladimir Davydov
    Suggested-by: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Kmem accounting of memcg is unusable now, because it lacks slab shrinker
    support. That means when we hit the limit we will get ENOMEM without
    any chance to recover. What we should do then is call shrink_slab,
    which would reclaim old inode/dentry caches from this cgroup. This is
    what this patch set is intended to do.

    Basically, it does two things. First, it introduces the notion of
    per-memcg slab shrinker. A shrinker that wants to reclaim objects per
    cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be
    passed the memory cgroup to scan from in shrink_control->memcg. For
    such shrinkers shrink_slab iterates over the whole cgroup subtree under
    the target cgroup and calls the shrinker for each kmem-active memory
    cgroup.
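
    In sketch form, the per-memcg dispatch looks like this (run_shrinkers()
    stands in for the existing loop over registered shrinkers):

    struct mem_cgroup *memcg = mem_cgroup_iter(root, NULL, NULL);
    do {
            if (memcg_kmem_is_active(memcg))
                    freed += run_shrinkers(gfp_mask, nid, memcg);
    } while ((memcg = mem_cgroup_iter(root, memcg, NULL)) != NULL);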

    Second, this patch set makes the list_lru structure per-memcg. This is
    done transparently to list_lru users - all they have to do is tell
    list_lru_init that they want a memcg-aware list_lru. The list_lru will
    then automatically distribute objects among per-memcg lists based on
    which cgroup each object is accounted to. This way, to make FS
    shrinkers (icache, dcache) memcg-aware, we only need to make them use a
    memcg-aware list_lru, and this is what this patch set does.

    As before, this patch set only enables per-memcg kmem reclaim when the
    pressure goes from memory.limit, not from memory.kmem.limit. Handling
    memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
    it is still unclear whether we will have this knob in the unified
    hierarchy.

    This patch (of 9):

    NUMA aware slab shrinkers use the list_lru structure to distribute
    objects coming from different NUMA nodes to different lists. Whenever
    such a shrinker needs to count or scan objects from a particular node,
    it issues commands like this:

    count = list_lru_count_node(lru, sc->nid);
    freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                               isolate_arg, &sc->nr_to_scan);

    where sc is an instance of the shrink_control structure passed to it
    from vmscan.

    To simplify this, let's add special list_lru functions to be used by
    shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
    consolidate the nid and nr_to_scan arguments in the shrink_control
    structure.
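
    These are thin wrappers; essentially:

    static inline unsigned long
    list_lru_shrink_count(struct list_lru *lru, struct shrink_control *sc)
    {
            return list_lru_count_node(lru, sc->nid);
    }

    static inline unsigned long
    list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
                         list_lru_walk_cb isolate, void *cb_arg)
    {
            return list_lru_walk_node(lru, sc->nid, isolate, cb_arg,
                                      &sc->nr_to_scan);
    }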

    This will also allow us to avoid patching shrinkers that use list_lru
    when we make shrink_slab() per-memcg - all we will have to do is extend
    the shrink_control structure to include the target memcg and make
    list_lru_shrink_{count,walk} handle this appropriately.

    Signed-off-by: Vladimir Davydov
    Suggested-by: Dave Chinner
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Glauber Costa
    Cc: Alexander Viro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • If a PTE or PMD is already marked NUMA when scanning to mark entries
    for NUMA hinting, then it is not necessary to update the entry and
    incur a TLB flush penalty. Avoid the overhead where possible.
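
    The check is a one-liner in the PTE-marking loop; in sketch form:

    if (prot_numa) {
            /* already marked: skip the update and the TLB flush */
            if (pte_protnone(oldpte))
                    continue;
    }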

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • pte_protnone_numa is only safe to use after VMA checks for PROT_NONE are
    complete. Treating a real PROT_NONE PTE as a NUMA hinting fault is going
    to result in strangeness so add a check for it. BUG_ON looks like
    overkill but if this is hit then it's a serious bug that could result in
    corruption so do not even try recovering. It would have been more
    comprehensive to check VMA flags in pte_protnone_numa but it would have
    made the API ugly just for a debugging check.

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Commit b38af4721f59 ("x86,mm: fix pte_special versus pte_numa") adjusted
    the pte_special check to take into account that a special pte had
    SPECIAL and neither PRESENT nor PROTNONE. Now that NUMA hinting PTEs
    are no longer modifying _PAGE_PRESENT it should be safe to restore the
    original pte_special behaviour.

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Faults on the huge zero page are pointless and there is a BUG_ON to catch
    them during fault time. This patch reintroduces a check that avoids
    marking the zero page PAGE_NONE.
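
    In sketch form, the protection-change path simply skips the huge zero
    page when marking entries for NUMA hinting (exact placement per the
    patch):

    if (prot_numa && is_huge_zero_page(page)) {
            /* faults on the zero page tell us nothing about locality */
            spin_unlock(ptl);
            return ret;
    }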

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This patch removes the NUMA PTE bits and associated helpers. As a
    side-effect it increases the maximum possible swap space on x86-64.

    One potential source of problems is races between the marking of PTEs
    PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that
    a PTE being protected is not faulted in parallel, seen as a pte_none and
    corrupting memory. The base case is safe, but transhuge has had problems
    in the past due to a different migration mechanism and a dependence on
    the page lock to serialise migrations, and warrants a closer look.

    task_work hinting update              parallel fault
    ------------------------              --------------
    change_pmd_range
      change_huge_pmd
        __pmd_trans_huge_lock
          pmdp_get_and_clear
                                          __handle_mm_fault
                                          pmd_none
                                          do_huge_pmd_anonymous_page
            read? pmd_lock blocks until hinting complete, fail !pmd_none test
            write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
          pmd_modify
          set_pmd_at

    task_work hinting update              parallel migration
    ------------------------              ------------------
    change_pmd_range
      change_huge_pmd
        __pmd_trans_huge_lock
          pmdp_get_and_clear
                                          __handle_mm_fault
                                          do_huge_pmd_numa_page
                                          migrate_misplaced_transhuge_page
            pmd_lock waits for updates to complete, recheck pmd_same
          pmd_modify
          set_pmd_at

    Both of those are safe and the case where a transhuge page is inserted
    during a protection update is unchanged. The case where two processes try
    migrating at the same time is unchanged by this series so should still be
    ok. I could not find a case where we are accidentally depending on the
    PTE not being cleared and flushed. If one is missed, it'll manifest as
    corruption problems that start triggering shortly after this series is
    merged and only happen when NUMA balancing is enabled.

    Signed-off-by: Mel Gorman
    Tested-by: Sasha Levin
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Mark Brown
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • With PROT_NONE, the traditional page table manipulation functions are
    sufficient.

    [andre.przywara@arm.com: fix compiler warning in pmdp_invalidate()]
    [akpm@linux-foundation.org: fix build with STRICT_MM_TYPECHECKS]
    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if
    they are triggered. This patch adds warnings just in case they are being
    accidentally depended upon.

    Signed-off-by: Mel Gorman
    Acked-by: Aneesh Kumar K.V
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Convert existing users of pte_numa and friends to the new helper. Note
    that the kernel is broken after this patch is applied until the other page
    table modifiers are also altered. This patch layout is to make review
    easier.

    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar
    Acked-by: Benjamin Herrenschmidt
    Tested-by: Sasha Levin
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This is a preparatory patch that introduces protnone helpers for automatic
    NUMA balancing.
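
    On x86 the helpers reduce to a flags check; roughly:

    static inline int pte_protnone(pte_t pte)
    {
            /* PROT_NONE set, but not a present mapping */
            return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
                    == _PAGE_PROTNONE;
    }

    static inline int pmd_protnone(pmd_t pmd)
    {
            return (pmd_flags(pmd) & (_PAGE_PROTNONE | _PAGE_PRESENT))
                    == _PAGE_PROTNONE;
    }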

    Signed-off-by: Mel Gorman
    Acked-by: Linus Torvalds
    Acked-by: Aneesh Kumar K.V
    Tested-by: Sasha Levin
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Automatic NUMA balancing depends on being able to protect PTEs to trap a
    fault and gather reference locality information. Very broadly speaking
    it would mark PTEs as not present and use another bit to distinguish
    between NUMA hinting faults and other types of faults. It was
    universally loved by everybody and caused no problems whatsoever. That
    last sentence might be a lie.

    This series is very heavily based on patches from Linus and Aneesh to
    replace the existing PTE/PMD NUMA helper functions with normal change
    protections. I did alter and add parts of it but I consider them
    relatively minor contributions. At their suggestion, acked-bys are in
    there but I've no problem converting them to Signed-off-by if requested.

    AFAIK, this has received no testing on ppc64 and I'm depending on Aneesh
    for that. I tested trinity under kvm-tool, which passed, and ran a few
    other basic tests. At the time of writing, only the short-lived tests
    have completed, but testing of V2 indicated that long-term testing had
    no surprises. In most cases I'm leaving out detail as it's not that
    interesting.

    specjbb single JVM: There was negligible performance difference in the
    benchmark itself for short runs. However, system activity is
    higher and interrupts are much higher over time -- possibly TLB
    flushes. Migrations are also higher. Overall, this is more overhead
    but considering the problems faced with the old approach I think
    we just have to suck it up and find another way of reducing the
    overhead.

    specjbb multi JVM: Negligible performance difference to the actual benchmark
    but like the single JVM case, the system overhead is noticeably
    higher. Again, interrupts are a major factor.

    autonumabench: This was all over the place and about all that can be
    reasonably concluded is that it's different but not necessarily
    better or worse.

    autonumabench
                                      3.18.0-rc5             3.18.0-rc5
                                  mmotm-20141119          protnone-v3r3
    User    NUMA01              32380.24 (  0.00%)    21642.92 ( 33.16%)
    User    NUMA01_THEADLOCAL   22481.02 (  0.00%)    22283.22 (  0.88%)
    User    NUMA02               3137.00 (  0.00%)     3116.54 (  0.65%)
    User    NUMA02_SMT           1614.03 (  0.00%)     1543.53 (  4.37%)
    System  NUMA01                322.97 (  0.00%)     1465.89 (-353.88%)
    System  NUMA01_THEADLOCAL      91.87 (  0.00%)       49.32 ( 46.32%)
    System  NUMA02                 37.83 (  0.00%)       14.61 ( 61.38%)
    System  NUMA02_SMT              7.36 (  0.00%)        7.45 ( -1.22%)
    Elapsed NUMA01                716.63 (  0.00%)      599.29 ( 16.37%)
    Elapsed NUMA01_THEADLOCAL     553.98 (  0.00%)      539.94 (  2.53%)
    Elapsed NUMA02                 83.85 (  0.00%)       83.04 (  0.97%)
    Elapsed NUMA02_SMT             86.57 (  0.00%)       79.15 (  8.57%)
    CPU     NUMA01               4563.00 (  0.00%)     3855.00 ( 15.52%)
    CPU     NUMA01_THEADLOCAL    4074.00 (  0.00%)     4136.00 ( -1.52%)
    CPU     NUMA02               3785.00 (  0.00%)     3770.00 (  0.40%)
    CPU     NUMA02_SMT           1872.00 (  0.00%)     1959.00 ( -4.65%)

    System CPU usage of NUMA01 is worse, but it's an adverse workload on
    this machine so I'm reluctant to conclude that it's a problem that
    matters. On the other workloads that are sensible on this machine,
    system CPU usage is great. Overall time to complete the benchmark is
    comparable.

                      3.18.0-rc5      3.18.0-rc5
                  mmotm-20141119   protnone-v3r3
    User            59612.50        48586.44
    System            460.22         1537.45
    Elapsed          1442.20         1304.29

    NUMA alloc hit                 5075182      5743353
    NUMA alloc miss                      0            0
    NUMA interleave hit                  0            0
    NUMA alloc local               5075174      5743339
    NUMA base PTE updates        637061448    443106883
    NUMA huge PMD updates          1243434       864747
    NUMA page range updates     1273699656    885857347
    NUMA hint faults               1658116      1214277
    NUMA hint local faults          959487       754113
    NUMA hint local percent             57           62
    NUMA pages migrated            5467056     61676398

    The NUMA pages migrated look terrible but when I looked at a graph of the
    activity over time I see that the massive spike in migration activity was
    during NUMA01. This correlates with high system CPU usage and could be
    simply down to bad luck but any modifications that affect that workload
    would be related to scan rates and migrations, not the protection
    mechanism. For all other workloads, migration activity was comparable.

    Overall, headline performance figures are comparable but the overhead is
    higher, mostly in interrupts. To some extent, higher overhead from this
    approach was anticipated but not to this degree. It's going to be
    necessary to reduce this again with a separate series in the future. It's
    still worth going ahead with this series though as it's likely to avoid
    constant headaches with Xen and is probably easier to maintain.

    This patch (of 10):

    A transhuge NUMA hinting fault may find the page is migrating and should
    wait until migration completes. The check is race-prone because the pmd
    is dereferenced outside of the page lock and, while the race window is
    tiny, it'll be larger if the PMD is cleared while marking PMDs for
    hinting faults. This patch closes the race.

    Signed-off-by: Mel Gorman
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Dave Jones
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Kirill Shutemov
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull jfs updates from David Kleikamp:
    "A couple cleanups for jfs"

    * tag 'jfs-3.20' of git://github.com/kleikamp/linux-shaggy:
    jfs: Deletion of an unnecessary check before the function call "unload_nls"
    jfs: get rid of homegrown endianness helpers

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "The main change is the pNFS block server support from Christoph, which
    allows an NFS client connected to shared disk to do block IO to the
    shared disk in place of NFS reads and writes. This also requires xfs
    patches, which should arrive soon through the xfs tree, barring
    unexpected problems. Support for other filesystems is also possible
    if there's interest.

    Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
    shape"

    * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: default NFSv4.2 to on
    nfsd: pNFS block layout driver
    exportfs: add methods for block layout exports
    nfsd: add trace events
    nfsd: update documentation for pNFS support
    nfsd: implement pNFS layout recalls
    nfsd: implement pNFS operations
    nfsd: make find_any_file available outside nfs4state.c
    nfsd: make find/get/put file available outside nfs4state.c
    nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
    nfsd: add fh_fsid_match helper
    nfsd: move nfsd_fh_match to nfsfh.h
    fs: add FL_LAYOUT lease type
    fs: track fl_owner for leases
    nfs: add LAYOUT_TYPE_MAX enum value
    nfsd: factor out a helper to decode nfstime4 values
    sunrpc/lockd: fix references to the BKL
    nfsd: fix year-2038 nfs4 state problem
    svcrdma: Handle additional inline content
    svcrdma: Move read list XDR round-up logic
    ...

    Linus Torvalds
     
  • Pull IOMMU updates from Joerg Roedel:
    "This time with:

    - Generic page-table framework for ARM IOMMUs using the LPAE
    page-table format, ARM-SMMU and Renesas IPMMU make use of it
    already.

    - Break out the IO virtual address allocator from the Intel IOMMU so
    that it can be used by other DMA-API implementations too. The
    first user will be the ARM64 common DMA-API implementation for
    IOMMUs

    - Device tree support for Renesas IPMMU

    - Various fixes and cleanups all over the place"

    * tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
    iommu/amd: Convert non-returned local variable to boolean when relevant
    iommu: Update my email address
    iommu/amd: Use wait_event in put_pasid_state_wait
    iommu/amd: Fix amd_iommu_free_device()
    iommu/arm-smmu: Avoid build warning
    iommu/fsl: Various cleanups
    iommu/fsl: Use %pa to print phys_addr_t
    iommu/omap: Print phys_addr_t using %pa
    iommu: Make more drivers depend on COMPILE_TEST
    iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
    iommu: Disable on !MMU builds
    iommu/fsl: Remove unused fsl_of_pamu_ids[]
    iommu/fsl: Fix section mismatch
    iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
    iommu: Fix trace_map() to report original iova and original size
    iommu/arm-smmu: add support for iova_to_phys through ATS1PR
    iopoll: Introduce memory-mapped IO polling macros
    iommu/arm-smmu: don't touch the secure STLBIALL register
    iommu/arm-smmu: make use of generic LPAE allocator
    iommu: io-pgtable-arm: add non-secure quirk
    ...

    Linus Torvalds
     
  • Pull DeviceTree changes from Rob Herring:

    - DT unittests for I2C probing and overlays from Pantelis Antoniou

    - Remove DT unittest dependency on OF_DYNAMIC from Gaurav Minocha

    - Add Tegra compatible strings missing for newer parts from Paul
    Walmsley

    - Various vendor prefix additions

    * tag 'devicetree-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
    of: Add vendor prefix for OmniVision Technologies
    of: Use ovti for Omnivision
    of: Add vendor prefix for Truly Semiconductors Limited
    of: Add vendor prefix for Himax Technologies Inc.
    of/fdt: fix sparse warning
    of: unitest: Add I2C overlay unit tests.
    Documentation: DT: document compatible string existence requirement
    Documentation: DT bindings: add nvidia, tegra132-denver compatible string
    Documentation: DT bindings: add more Tegra chip compatible strings
    of: EXPORT_SYMBOL_GPL of_property_read_u64_array
    of: Fix brace position for struct of_device_id definition
    of/unittest: Remove obsolete code
    dt-bindings: use isil prefix for Intersil in vendor-prefixes.txt
    Add AD Holdings Plc. to vendor-prefixes.
    dt-bindings: Add Silicon Mitus vendor prefix
    Removes OF_UNITTEST dependency on OF_DYNAMIC config symbol
    pinctrl: fix up device tree bindings
    DT: Vendors: Add Everspin
    doc: add bindings document for altera fpga manager
    drivers: of: Export of_reserved_mem_device_{init,release}

    Linus Torvalds
     
  • Pull ARM updates from Russell King:

    - clang assembly fixes from Ard

    - optimisations and cleanups for Aurora L2 cache support

    - efficient L2 cache support for secure monitor API on Exynos SoCs

    - debug menu cleanup from Daniel Thompson to allow better behaviour for
    multiplatform kernels

    - StrongARM SA11x0 conversion to irq domains, and pxa_timer

    - kprobes updates for older ARM CPUs

    - move probes support out of arch/arm/kernel to arch/arm/probes

    - add inline asm support for the rbit (reverse bits) instruction

    - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs)

    - remove the unused ARMv3 user access code

    - add driver_override support to AMBA Primecell bus

    * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits)
    ARM: 8256/1: driver coamba: add device binding path 'driver_override'
    ARM: 8301/1: qcom: Use secondary_startup_arm()
    ARM: 8302/1: Add a secondary_startup that assumes ARM mode
    ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip
    ARM: kprobes: Fix compilation error caused by superfluous '*'
    ARM: 8297/1: cache-l2x0: optimize aurora range operations
    ARM: 8296/1: cache-l2x0: clean up aurora cache handling
    ARM: 8284/1: sa1100: clear RCSR_SMR on resume
    ARM: 8283/1: sa1100: collie: clear PWER register on machine init
    ARM: 8282/1: sa1100: use handle_domain_irq
    ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver
    ARM: 8280/1: sa1100: switch to irq_domain_add_simple()
    ARM: 8279/1: sa1100: merge both GPIO irqdomains
    ARM: 8278/1: sa1100: split irq handling for low GPIOs
    ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code
    ARM: 8290/1: decompressor: fix a wrong comment
    ARM: 8286/1: mm: Fix dma_contiguous_reserve comment
    ARM: 8248/1: pm: remove outdated comment
    ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X)
    ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX
    ...

    Linus Torvalds
     
  • Pull AVR32 update from Hans-Christian Egtvedt.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
    avr32: update all default configurations
    avr32: remove fake at91 cpu identification
    avr32: wire up missing syscalls

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "The updates included in this pull request for ftrace are:

    o Several clean ups to the code

    One such clean up was to convert to 64 bit time keeping, in the
    ring buffer benchmark code.

    o Adding of __print_array() helper macro for TRACE_EVENT()

    o Updating the sample/trace_events/ to add samples of different ways
    to make trace events. Lots of features have been added since the
    sample code was made, and these features are mostly unknown.
    Developers have been making their own hacks to do things that are
    already available.

    o Performance improvements. Most notably, I found a performance bug
    where a waiter that is waiting for a full page from the ring buffer
    will see that a full page is not available, and go to sleep. The
    sched event caused by it going to sleep would cause it to wake up
    again. It would see that there was still not a full page, and go
    back to sleep again, and that would wake it up again, until finally
    it would see a full page. This change has been marked for stable.

    Other improvements include removing global locks from fast paths"

    * tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ring-buffer: Do not wake up a splice waiter when page is not full
    tracing: Fix unmapping loop in tracing_mark_write
    tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
    tracing: Add TRACE_EVENT_FN example
    tracing: Add TRACE_EVENT_CONDITION sample
    tracing: Update the TRACE_EVENT fields available in the sample code
    tracing: Separate out initializing top level dir from instances
    tracing: Make tracing_init_dentry_tr() static
    trace: Use 64-bit timekeeping
    tracing: Add array printing helper
    tracing: Remove newline from trace_printk warning banner
    tracing: Use IS_ERR() check for return value of tracing_init_dentry()
    tracing: Remove unneeded includes of debugfs.h and fs.h
    tracing: Remove taking of trace_types_lock in pipe files
    tracing: Add ref count to tracer for when they are being read by pipe

    Linus Torvalds
     
  • Pull ktest updates from Steven Rostedt:
    "The following ktest updates were done:

    o Added timings to various parts of the test (build, install, boot,
    tests) and report them so that the users can keep track of changes.

    o Josh Poimboeuf fixed the console output to work better with virtual
    machine targets.

    o Various clean ups and fixes"

    * tag 'ktest-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
    ktest: Place quotes around item variable
    ktest: Cleanup terminal on dodie() failure
    ktest: Print build,install,boot,test times at success and failure
    ktest: Enable user input to the console
    ktest: Give console process a dedicated tty
    ktest: Rename start_monitor_and_boot to start_monitor_and_install
    ktest: Show times for build, install, boot and test
    ktest: Restore tty settings after closing console
    ktest: Add timings for commands

    Linus Torvalds
     

12 Feb, 2015

6 commits

  • Pull security layer updates from James Morris:
    "Highlights:

    - Smack adds secmark support for Netfilter
    - /proc/keys is now mandatory if CONFIG_KEYS=y
    - TPM gets its own device class
    - Added TPM 2.0 support
    - Smack file hook rework (all Smack users should review this!)"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
    cipso: don't use IPCB() to locate the CIPSO IP option
    SELinux: fix error code in policydb_init()
    selinux: add security in-core xattr support for pstore and debugfs
    selinux: quiet the filesystem labeling behavior message
    selinux: Remove unused function avc_sidcmp()
    ima: /proc/keys is now mandatory
    Smack: Repair netfilter dependency
    X.509: silence asn1 compiler debug output
    X.509: shut up about included cert for silent build
    KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
    MAINTAINERS: email update
    tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
    smack: fix possible use after frees in task_security() callers
    smack: Add missing logging in bidirectional UDS connect check
    Smack: secmark support for netfilter
    Smack: Rework file hooks
    tpm: fix format string error in tpm-chip.c
    char/tpm/tpm_crb: fix build error
    smack: Fix a bidirectional UDS connect check typo
    smack: introduce a special case for tmpfs in smack_d_instantiate()
    ...

    Linus Torvalds
     
  • Pull audit fix from Paul Moore:
    "Just one patch from the audit tree for v3.20, and a very minor one at
    that.

    The patch simply removes an old, unused field from the audit_krule
    structure, a private audit-only struct. In audit related news, we did
    a proper overhaul of the audit pathname code and removed the nasty
    getname()/putname() hacks for audit, you should see those patches in
    Al's vfs tree if you haven't already.

    That's it for audit this time, let's hope for a quiet -rcX series"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: remove vestiges of vers_ops

    Linus Torvalds
     
  • Rob Herring
     
  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton : (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc//map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:

    - Update of all defconfigs

    - Addition of a bunch of config options to modernise our defconfigs

    - Some PS3 updates from Geoff

    - Optimised memcmp for 64 bit from Anton

    - Fix for kprobes that allows 'perf probe' to work from Naveen

    - Several cxl updates from Ian & Ryan

    - Expanded support for the '24x7' PMU from Cody & Sukadev

    - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
    device tree content, e300 machine check support, t1040 corenet
    error reporting, and various cleanups and fixes"

    * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
    cxl: Add missing return statement after handling AFU errror
    cxl: Fail AFU initialisation if an invalid configuration record is found
    cxl: Export optional AFU configuration record in sysfs
    powerpc/mm: Warn on flushing tlb page in kernel context
    powerpc/powernv: Add OPAL soft-poweroff routine
    powerpc/perf/hv-24x7: Document sysfs event description entries
    powerpc/perf/hv-gpci: add the remaining gpci requests
    powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
    powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
    perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
    perf: add PMU_EVENT_ATTR_STRING() helper
    perf: provide sysfs_show for struct perf_pmu_events_attr
    powerpc/kernel: Avoid initializing device-tree pointer twice
    powerpc: Remove old compile time disabled syscall tracing code
    powerpc/kernel: Make syscall_exit a local label
    cxl: Fix device_node reference counting
    powerpc/mm: bail out early when flushing TLB page
    powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
    perf/powerpc: reset event hw state when adding it to the PMU
    powerpc/qe: Use strlcpy()
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Catalin Marinas:
    "arm64 updates for 3.20:

    - reimplementation of the virtual remapping of UEFI Runtime Services
    in a way that is stable across kexec
    - emulation of the "setend" instruction for 32-bit tasks (user
    endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
    accordingly)
    - compat_sys_call_table implemented in C (from asm) and made it a
    constant array together with sys_call_table
    - export CPU cache information via /sys (like other architectures)
    - DMA API implementation clean-up in preparation for IOMMU support
    - macros clean-up for KVM
    - dropped some unnecessary cache+tlb maintenance
    - CONFIG_ARM64_CPU_SUSPEND clean-up
    - defconfig update (CPU_IDLE)

    The EFI changes going via the arm64 tree have been acked by Matt
    Fleming. There is also a patch adding sys_*stat64 prototypes to
    include/linux/syscalls.h, acked by Andrew Morton"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
    arm64: compat: Remove incorrect comment in compat_siginfo
    arm64: Fix section mismatch on alloc_init_p[mu]d()
    arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
    arm64: mm: use *_sect to check for section maps
    arm64: drop unnecessary cache+tlb maintenance
    arm64:mm: free the useless initial page table
    arm64: Enable CPU_IDLE in defconfig
    arm64: kernel: remove ARM64_CPU_SUSPEND config option
    arm64: make sys_call_table const
    arm64: Remove asm/syscalls.h
    arm64: Implement the compat_sys_call_table in C
    syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
    compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
    arm64: uapi: expose our struct ucontext to the uapi headers
    smp, ARM64: Kill SMP single function call interrupt
    arm64: Emulate SETEND for AArch32 tasks
    arm64: Consolidate hotplug notifier for instruction emulation
    arm64: Track system support for mixed endian EL0
    arm64: implement generic IOMMU configuration
    arm64: Combine coherent and non-coherent swiotlb dma_ops
    ...

    Linus Torvalds