08 Oct, 2016

14 commits

  • After using the offset of the swap entry as the key of the swap cache,
    page_index() becomes exactly the same as page_file_index(). So
    page_file_index() is removed and its callers are changed to use
    page_index() instead.

    Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Cc: Trond Myklebust
    Cc: Anna Schumaker
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Dan Williams
    Cc: Joonsoo Kim
    Cc: Ross Zwisler
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • The global zero page is used to satisfy an anonymous read fault. If
    THP(Transparent HugePage) is enabled then the global huge zero page is
    used. The global huge zero page uses an atomic counter for reference
    counting and is allocated/freed dynamically according to its counter
    value.

    CPU time spent on that counter will greatly increase if there are a lot
    of processes doing anonymous read faults. This patch proposes a way to
    reduce the access to the global counter so that the CPU load can be
    reduced accordingly.

    To do this, a new flag of the mm_struct is introduced:
    MMF_USED_HUGE_ZERO_PAGE. With this flag, the process only needs to touch
    the global counter in two cases:

    1 The first time it uses the global huge zero page;
    2 The time when mm_users of its mm_struct reaches zero.

    Note that right now, the huge zero page is eligible to be freed as soon
    as its last use goes away. With this patch, the page will not be
    eligible to be freed until the exit of the last process from which it
    was ever used.

    And with the use of mm_users, a kthread is not eligible to use the huge
    zero page either. Since no kthread is using the huge zero page today,
    there is no difference after applying this patch. But if that is not
    desired, I can change it to when mm_count reaches zero.
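The flag-based scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct mm_sketch`, `huge_zero_refcount` and the `_sketch` functions are hypothetical stand-ins, not the kernel's code, and C11 atomics replace the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Global reference count shared by all processes (the contended counter). */
static atomic_int huge_zero_refcount;

/* Hypothetical stand-in for mm_struct: one per process address space. */
struct mm_sketch {
    atomic_bool used_huge_zero_page; /* models MMF_USED_HUGE_ZERO_PAGE */
};

/* Called on every read fault that maps the huge zero page. */
static void mm_get_huge_zero_page_sketch(struct mm_sketch *mm)
{
    /* Fast path: this mm already holds its single reference. */
    if (atomic_load(&mm->used_huge_zero_page))
        return;

    /* Slow path: only the first caller per mm touches the global counter. */
    if (!atomic_exchange(&mm->used_huge_zero_page, true))
        atomic_fetch_add(&huge_zero_refcount, 1);
}

/* Called once, when the last user of the mm goes away. */
static void mm_put_huge_zero_page_sketch(struct mm_sketch *mm)
{
    if (atomic_load(&mm->used_huge_zero_page))
        atomic_fetch_sub(&huge_zero_refcount, 1);
}
```

In this sketch, repeated faults in one process touch only the per-mm flag, so the shared counter changes at most twice per address space.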

    Case used for test on Haswell EP:

    usemem -n 72 --readonly -j 0x200000 100G

    Which spawns 72 processes and each will mmap 100G anonymous space and
    then do read only access to that space sequentially with a step of 2MB.

    CPU cycles from perf report for base commit:
    54.03% usemem [kernel.kallsyms] [k] get_huge_zero_page
    CPU cycles from perf report for this commit:
    0.11% usemem [kernel.kallsyms] [k] mm_get_huge_zero_page

    Performance(throughput) of the workload for base commit: 1784430792
    Performance(throughput) of the workload for this commit: 4726928591
    164% increase.

    Runtime of the workload for base commit: 707592 us
    Runtime of the workload for this commit: 303970 us
    57% drop.

    Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com
    Signed-off-by: Aaron Lu
    Cc: Sergey Senozhatsky
    Cc: "Kirill A. Shutemov"
    Cc: Dave Hansen
    Cc: Tim Chen
    Cc: Huang Ying
    Cc: Vlastimil Babka
    Cc: Jerome Marchand
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Ebru Akagunduz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aaron Lu
     
  • Trying to walk all of virtual memory requires architecture specific
    knowledge. On x86_64, addresses must be sign extended from bit 48,
    whereas on arm64 the top VA_BITS of address space have their own set of
    page tables.

    clear_refs_write() calls walk_page_range() on the range 0 to ~0UL and
    provides a test_walk() callback that only expects to be walking over
    VMAs. Currently walk_pmd_range() will skip memory regions that don't
    have a VMA, reporting them as a hole.

    As this call only expects to walk user address space, make it walk 0 to
    'highest_vm_end'.
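To illustrate why a walk from 0 to ~0UL needs architecture knowledge, here is a minimal userspace sketch of the x86_64 canonical-address rule. It assumes a 48-bit virtual address layout in which bit 47 (counting from zero) is replicated into bits 48 to 63; `VA_BITS_SKETCH` and `is_canonical_sketch()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS_SKETCH 48 /* assumed number of implemented address bits */

/* An address is canonical when bits 63..47 are all zeros or all ones. */
static bool is_canonical_sketch(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS_SKETCH - 1);
    uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS_SKETCH + 1)) - 1;

    return top == 0 || top == all_ones;
}
```

Under this assumption, everything between 0x0000800000000000 and 0xffff7fffffffffff is a hole that a generic walker would have to know how to skip, which is why limiting the walk to user addresses is simpler.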

    Link: http://lkml.kernel.org/r/1472655792-22439-1-git-send-email-james.morse@arm.com
    Signed-off-by: James Morse
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morse
     
  • To support DAX pmd mappings with unmodified applications, filesystems
    need to align an mmap address by the pmd size.

    Call thp_get_unmapped_area() from f_op->get_unmapped_area.

    Note, there is no change in behavior for a non-DAX file.
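The alignment requirement can be illustrated with a small userspace sketch. It assumes a 2 MiB pmd size and shows only the arithmetic of padding a request and rounding the start up; the helper names are hypothetical, and this is not the logic of thp_get_unmapped_area() itself.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE_SKETCH (UINT64_C(2) << 20) /* assumed 2 MiB pmd size */

/* Round an address up to the next pmd boundary. */
static uint64_t pmd_align_up_sketch(uint64_t addr)
{
    return (addr + PMD_SIZE_SKETCH - 1) & ~(PMD_SIZE_SKETCH - 1);
}

/* Ask for extra room so that an aligned start always fits in the area. */
static uint64_t padded_len_sketch(uint64_t len)
{
    return len + PMD_SIZE_SKETCH;
}
```

A caller would search for a free area of `padded_len_sketch(len)` bytes and then map at the aligned address inside it, so the mapping can be backed by pmd-sized entries.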

    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Theodore Ts'o
    Cc: Andreas Dilger
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • The extern struct variable ocfs2_inode_cache is not defined. It was meant
    to use ocfs2_inode_cachep defined in super.c, I think. Fortunately it is
    not used anywhere now, so there is no actual impact. Clean it up to fix
    this mistake.

    Link: http://lkml.kernel.org/r/57E1E49D.8050503@huawei.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Eric Ren
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • The workqueue "dlm_worker" queues a single work item &dlm->dispatched_work
    and thus it doesn't require execution ordering. Hence, alloc_workqueue
    has been used to replace the deprecated create_singlethread_workqueue
    instance.

    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
    memory pressure.

    Since there is a fixed number of work items, an explicit concurrency
    limit is unnecessary here.

    Link: http://lkml.kernel.org/r/2b5ad8d6688effe1a9ddb2bc2082d26fbbe00302.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "ocfs2_wq" queues multiple work items viz
    &osb->la_enable_wq, &journal->j_recovery_work, &os->os_orphan_scan_work,
    &osb->osb_truncate_log_wq which require strict execution ordering. Hence,
    an ordered dedicated workqueue has been used.

    WQ_MEM_RECLAIM has been set to ensure forward progress under memory
    pressure because the workqueue is being used on a memory reclaim path.

    Link: http://lkml.kernel.org/r/66279de510a7f4cfc6e386d99b7e04b3f65fb11b.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "o2net_wq" queues multiple work items viz
    &old_sc->sc_shutdown_work, &sc->sc_rx_work, &sc->sc_connect_work which
    require strict execution ordering. Hence, an ordered dedicated
    workqueue has been used.

    WQ_MEM_RECLAIM has been set to ensure forward progress under memory
    pressure.

    Link: http://lkml.kernel.org/r/ddc12e5766c79ba26f8a00d98049107f8a1d4866.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "user_dlm_worker" queues a single work item
    &lockres->l_work per user_lock_res instance and so it doesn't require
    execution ordering. Hence, alloc_workqueue has been used to replace the
    deprecated create_singlethread_workqueue instance.

    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
    memory pressure.

    Since there is a fixed number of work items, an explicit concurrency
    limit is unnecessary here.

    Link: http://lkml.kernel.org/r/9748136d3a3b18138ad1d6ba708367aa1fe9f98c.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • Use assert_spin_locked() macro instead of hand-made BUG_ON statements.

    Link: http://lkml.kernel.org/r/1474537439-18919-1-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Suggested-by: Heiner Kallweit
    Reviewed-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • When freeing permission events by fsnotify_destroy_event(), the warning
    WARN_ON(!list_empty(&event->list)); may falsely hit.

    This is because although fanotify_get_response() saw event->response
    set, there is nothing to make sure the current CPU also sees the removal
    of the event from the list. Add proper locking around the WARN_ON() to
    avoid the false warning.

    Link: http://lkml.kernel.org/r/1473797711-14111-7-git-send-email-jack@suse.cz
    Reported-by: Miklos Szeredi
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fanotify code has its own lock (access_lock) to protect a list of events
    waiting for a response from userspace.

    However this is somewhat awkward as the same list_head in the event is
    protected by notification_lock if it is part of the notification queue
    and by access_lock if it is part of the fanotify private queue which
    makes it difficult for any reliable checks in the generic code. So make
    fanotify use the same lock - notification_lock - for protecting its
    private event list.

    Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Cc: Miklos Szeredi
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • notification_mutex is used to protect the list of pending events. As such
    there's no reason to use a sleeping lock for it. Convert it to a
    spinlock.

    [jack@suse.cz: fixed version]
    Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
    Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Tested-by: Guenter Roeck
    Cc: Miklos Szeredi
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • fsnotify_flush_notify() and fanotify_release() destroy notification
    events while holding notification_mutex.

    The destruction of fanotify event includes a path_put() call which may
    end up calling into a filesystem to delete an inode if we happen to be
    the last holders of dentry reference which happens to be the last holder
    of inode reference.

    That in turn may violate lock ordering for some filesystems since
    notification_mutex is also acquired e.g. during write when generating
    fanotify event.

    Also this is the only thing that forces notification_mutex to be a
    sleeping lock. So drop notification_mutex before destroying a
    notification event.
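The pattern of unlinking an event under the lock and destroying it only after the lock is dropped can be sketched in userspace C. Everything here is a hypothetical model: a boolean flag stands in for notification_mutex, and the `_sketch` types and functions are not the fsnotify API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct event_sketch {
    struct event_sketch *next;
};

struct group_sketch {
    bool lock_held;            /* toy stand-in for notification_mutex */
    struct event_sketch *head; /* pending notification events */
    int destroyed;             /* number of events destroyed so far */
};

static void lock_sketch(struct group_sketch *g)
{
    assert(!g->lock_held);
    g->lock_held = true;
}

static void unlock_sketch(struct group_sketch *g)
{
    assert(g->lock_held);
    g->lock_held = false;
}

/* Destruction may call into a filesystem, so it must run unlocked. */
static void destroy_event_sketch(struct group_sketch *g, struct event_sketch *ev)
{
    assert(!g->lock_held);
    free(ev);
    g->destroyed++;
}

static void queue_event_sketch(struct group_sketch *g)
{
    struct event_sketch *ev = malloc(sizeof(*ev));

    assert(ev != NULL);
    lock_sketch(g);
    ev->next = g->head;
    g->head = ev;
    unlock_sketch(g);
}

static void flush_notify_sketch(struct group_sketch *g)
{
    lock_sketch(g);
    while (g->head != NULL) {
        struct event_sketch *ev = g->head;

        g->head = ev->next;          /* unlink while holding the lock */
        unlock_sketch(g);            /* drop the lock first ... */
        destroy_event_sketch(g, ev); /* ... then destroy the event */
        lock_sketch(g);
    }
    unlock_sketch(g);
}
```

Because nothing that can sleep or take filesystem locks runs while the lock is held, a lock used this way no longer has to be a sleeping lock.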

    Link: http://lkml.kernel.org/r/1473797711-14111-4-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Cc: Miklos Szeredi
    Cc: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

07 Oct, 2016

4 commits

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've investigated how f2fs deals with errors given by
    our fault injection facility. With this, we could fix several corner
    cases. And, in order to improve the performance, we set inline_dentry
    by default and enhance the existing discard issue flow. In addition,
    we added f2fs_migrate_page for better memory management.

    Enhancements:
    - set inline_dentry by default
    - improve discard issue flow
    - add more fault injection cases in f2fs
    - allow block preallocation for encrypted files
    - introduce migrate_page callback function
    - avoid truncating the next direct node block at every checkpoint

    Bug fixes:
    - set page flag correctly between write_begin and write_end
    - missing error handling cases detected by fault injection
    - preallocate blocks correctly with regard to 4KB alignment
    - dentry and filename handling of encryption
    - lost xattrs of directories"

    * tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits)
    f2fs: introduce update_ckpt_flags to clean up
    f2fs: don't submit irrelevant page
    f2fs: fix to commit bio cache after flushing node pages
    f2fs: introduce get_checkpoint_version for cleanup
    f2fs: remove dead variable
    f2fs: remove redundant io plug
    f2fs: support checkpoint error injection
    f2fs: fix to recover old fault injection config in ->remount_fs
    f2fs: do fault injection initialization in default_options
    f2fs: remove redundant value definition
    f2fs: support configuring fault injection per superblock
    f2fs: adjust display format of segment bit
    f2fs: remove dirty inode pages in error path
    f2fs: do not unnecessarily null-terminate encrypted symlink data
    f2fs: handle errors during recover_orphan_inodes
    f2fs: avoid gc in cp_error case
    f2fs: should put_page for summary page
    f2fs: assign return value in f2fs_gc
    f2fs: add customized migrate_page callback
    f2fs: introduce cp_lock to protect updating of ckpt_flags
    ...

    Linus Torvalds
     
  • Pull pstore updates from Kees Cook:

    - Fix bug in module unloading

    - Switch to always using spinlock over cmpxchg

    - Explicitly define pstore backend's supported modes

    - Remove bounce buffer from pmsg

    - Switch to using memcpy_to/fromio()

    - Error checking improvements

    * tag 'pstore-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    ramoops: move spin_lock_init after kmalloc error checking
    pstore/ram: Use memcpy_fromio() to save old buffer
    pstore/ram: Use memcpy_toio instead of memcpy
    pstore/pmsg: drop bounce buffer
    pstore/ram: Set pstore flags dynamically
    pstore: Split pstore fragile flags
    pstore/core: drop cmpxchg based updates
    pstore/ramoops: fixup driver removal

    Linus Torvalds
     
  • Pull orangefs updates from Mike Marshall:
    "Miscellaneous improvements:
    - clean up debugfs globals
    - remove dead code in sysfs
    - reorganize duplicated sysfs attribute structs
    - consolidate sysfs show and store functions
    - remove duplicated sysfs_ops structures
    - describe organization of sysfs
    - make devreq_mutex static
    - g_orangefs_stats -> orangefs_stats for consistency
    - rename most remaining global variables

    Feature negotiation:
    - enable Orangefs userspace and kernel module to negotiate mutually
    supported features"

    * tag 'for-linus-4.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    Revert "orangefs: bump minimum userspace version"
    orangefs: bump minimum userspace version
    orangefs: rename most remaining global variables
    orangefs: g_orangefs_stats -> orangefs_stats for consistency
    orangefs: make devreq_mutex static
    orangefs: describe organization of sysfs
    orangefs: remove duplicated sysfs_ops structures
    orangefs: consolidate sysfs show and store functions
    orangefs: reorganize duplicated sysfs attribute structs
    orangefs: remove dead code in sysfs
    orangefs: clean up debugfs globals
    orangefs: do not allow client readahead cache without feature bit
    orangefs: add features op
    orangefs: record userspace version for feature compatbility
    orangefs: add readahead count and size to sysfs
    orangefs: re-add flush_racache from out-of-tree
    orangefs: turn param response value into union
    orangefs: add missing param request ops
    orangefs: rename remaining bits of mmap readahead cache

    Linus Torvalds
     
  • Pull namespace updates from Eric Biederman:
    "This set of changes is a number of smaller things that have been
    overlooked in other development cycles focused on more fundamental
    change. The devpts changes are small things that were a distraction
    until we managed to kill off DEVPTS_MULTIPLE_INSTANCES. There is a
    trivial regression fix to autofs for the unprivileged mount changes
    that went in last cycle. A pair of ioctls has been added by Andrey
    Vagin making it possible to discover the relationships between
    namespaces when referring to them through file descriptors.

    The big user visible change is starting to add simple resource limits
    to catch programs that misbehave. With namespaces in general and user
    namespaces in particular allowing users to use more kinds of
    resources, it has become important to have something to limit errant
    programs. Because the purpose of these limits is to catch errant
    programs, the code needs to be inexpensive to use as it is always on, and
    the default limits need to be high enough that well behaved programs
    on well behaved systems don't encounter them.

    To this end, after some review I have implemented per user per user
    namespace limits, and use them to limit the number of namespaces. The
    limits being per user mean that one user cannot exhaust the limits of
    another user. The limits being per user namespace allow contexts where
    the limit is 0 and security conscious folks can remove from their
    threat analysis the code used to manage namespaces (as they have
    historically done as it is root only). At the same time the limits being
    per user namespace allow other parts of the system to use namespaces.

    Namespaces are increasingly being used in application sandboxing
    scenarios so an all or nothing disable for the entire system for the
    security conscious folks makes increasing use of these sandboxes
    impossible.

    There is also added a limit on the maximum number of mounts present in
    a single mount namespace. It is nontrivial to guess what a reasonable
    system wide limit on the number of mount structures in the kernel would
    be, especially as it varies based on how a system is using
    containers. A limit on the number of mounts in a mount namespace
    however is much easier to understand and set. In most cases in
    practice only about 1000 mounts are used. Given that some autofs
    scenarios have the potential to be 30,000 to 50,000 mounts I have set
    the default limit for the number of mounts at 100,000 which is well
    above every known set of users but low enough that the mount hash
    tables don't degrade unreasonably.

    These limits are a start. I expect this establishes a pattern that
    other limits for resources that namespaces use will follow. There has
    been interest in making inotify event limits per user per user
    namespace as well as interest expressed in making details about what
    is going on in the kernel more visible"
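The per user per user namespace limits described above can be pictured as a chain of counters, one per namespace level, where an allocation succeeds only if every level stays under its limit. The following single-threaded userspace sketch uses hypothetical names (`struct ucount_sketch`, `inc_count_sketch()`, `dec_count_sketch()`) and is not the kernel's implementation; the -ENOSPC return value follows the shortlog entry below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One counter per (user, user namespace) pair, linked to its parent. */
struct ucount_sketch {
    struct ucount_sketch *parent; /* counter in the parent user namespace */
    long count;                   /* objects currently charged here */
    long max;                     /* limit for this level */
};

/* Charge one object at every level; roll back and fail if any is full. */
static int inc_count_sketch(struct ucount_sketch *uc)
{
    struct ucount_sketch *iter, *undo;

    for (iter = uc; iter != NULL; iter = iter->parent) {
        if (iter->count >= iter->max)
            goto fail;
        iter->count++;
    }
    return 0;

fail:
    /* Undo the levels charged before the one that refused. */
    for (undo = uc; undo != iter; undo = undo->parent)
        undo->count--;
    return -ENOSPC;
}

/* Release one object at every level. */
static void dec_count_sketch(struct ucount_sketch *uc)
{
    for (; uc != NULL; uc = uc->parent)
        uc->count--;
}
```

A level with `max` set to 0 refuses every request, which models the context where namespace creation is disabled without affecting sibling namespaces.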

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
    autofs: Fix automounts by using current_real_cred()->uid
    mnt: Add a per mount namespace limit on the number of mounts
    netns: move {inc,dec}_net_namespaces into #ifdef
    nsfs: Simplify __ns_get_path
    tools/testing: add a test to check nsfs ioctl-s
    nsfs: add ioctl to get a parent namespace
    nsfs: add ioctl to get an owning user namespace for ns file descriptor
    kernel: add a helper to get an owning user namespace for a namespace
    devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
    devpts: Remove sync_filesystems
    devpts: Make devpts_kill_sb safe if fsi is NULL
    devpts: Simplify devpts_mount by using mount_nodev
    devpts: Move the creation of /dev/pts/ptmx into fill_super
    devpts: Move parse_mount_options into fill_super
    userns: When the per user per user namespace limit is reached return ENOSPC
    userns; Document per user per user namespace limits.
    mntns: Add a limit on the number of mount namespaces.
    netns: Add a limit on the number of net namespaces
    cgroupns: Add a limit on the number of cgroup namespaces
    ipcns: Add a limit on the number of ipc namespaces
    ...

    Linus Torvalds
     

06 Oct, 2016

6 commits

  • Pull xfs and iomap updates from Dave Chinner:
    "The main things in this update are the iomap-based DAX infrastructure,
    an XFS delalloc rework, and a chunk of fixes to how log recovery
    schedules writeback to prevent spurious corruption detections when
    recovery of certain items was not required.

    The other main chunk of code is some preparation for the upcoming
    reflink functionality. Most of it is generic and cleanups that stand
    alone, but they were ready and reviewed so are in this pull request.

    Speaking of reflink, I'm currently planning to send you another pull
    request next week containing all the new reflink functionality. I'm
    working through a similar process to the last cycle, where I sent the
    reverse mapping code in a separate request because of how large it
    was. The reflink code merge is even bigger than reverse mapping, so
    I'll be doing the same thing again....

    Summary for this update:

    - change of XFS mailing list to linux-xfs@vger.kernel.org

    - iomap-based DAX infrastructure w/ XFS and ext2 support

    - small iomap fixes and additions

    - more efficient XFS delayed allocation infrastructure based on iomap

    - a rework of log recovery writeback scheduling to ensure we don't
    fail recovery when trying to replay items that are already on disk

    - some preparation patches for upcoming reflink support

    - configurable error handling fixes and documentation

    - aio access time update race fixes for XFS and
    generic_file_read_iter"

    * tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (40 commits)
    fs: update atime before I/O in generic_file_read_iter
    xfs: update atime before I/O in xfs_file_dio_aio_read
    ext2: fix possible integer truncation in ext2_iomap_begin
    xfs: log recovery tracepoints to track current lsn and buffer submission
    xfs: update metadata LSN in buffers during log recovery
    xfs: don't warn on buffers not being recovered due to LSN
    xfs: pass current lsn to log recovery buffer validation
    xfs: rework log recovery to submit buffers on LSN boundaries
    xfs: quiesce the filesystem after recovery on readonly mount
    xfs: remote attribute blocks aren't really userdata
    ext2: use iomap to implement DAX
    ext2: stop passing buffer_head to ext2_get_blocks
    xfs: use iomap to implement DAX
    xfs: refactor xfs_setfilesize
    xfs: take the ilock shared if possible in xfs_file_iomap_begin
    xfs: fix locking for DAX writes
    dax: provide an iomap based fault handler
    dax: provide an iomap based dax read/write path
    dax: don't pass buffer_head to copy_user_dax
    dax: don't pass buffer_head to dax_insert_mapping
    ...

    Linus Torvalds
     
  • Pull ARM updates from Russell King:

    - Correct ARM's dma-mapping to use the correct printk format strings.

    - Avoid defining OBJCOPYFLAGS globally which upsets lkdtm rodata
    testing.

    - Cleanups to ARM's asm/memory.h include.

    - L2 cache cleanups.

    - Allow flat nommu binaries to be executed on ARM MMU systems.

    - Kernel hardening - add more read-only after init annotations,
    including making some kernel vdso variables const.

    - Ensure AMBA primecell clocks are appropriately defaulted.

    - ARM breakpoint cleanup.

    - Various StrongARM 11x0 and companion chip (SA1111) updates to bring
    this legacy platform to use more modern APIs for (eg) GPIOs and
    interrupts, which will allow us in the future to reduce some of the
    board-level driver clutter and eliminate function callbacks into board
    code via platform data. There still appears to be interest in these
    platforms!

    - Remove the now redundant secure_flush_area() API.

    - Module PLT relocation optimisations. Ard says: This series of 4
    patches optimizes the ARM PLT generation code that is invoked at
    module load time, to get rid of the O(n^2) algorithm that results in
    pathological load times of 10 seconds or more for large modules on
    certain STB platforms.

    - ARMv7M cache maintenance support.

    - L2 cache PMU support
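The module PLT item above mentions replacing an O(n^2) algorithm. As a rough illustration of that kind of change, the sketch below counts distinct relocation targets two ways: by comparing every entry with all earlier ones, and by sorting first so duplicates become adjacent. It assumes that duplicate detection is the costly step, and all names are hypothetical; this is not the ARM module loader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32_sketch(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Brute force: compare each target with all earlier ones, O(n^2). */
static size_t count_unique_bruteforce_sketch(const uint32_t *t, size_t n)
{
    size_t unique = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        for (j = 0; j < i; j++)
            if (t[j] == t[i])
                break;
        if (j == i)
            unique++;
    }
    return unique;
}

/* Sort first so duplicates are adjacent, then scan once. Sorts t in place. */
static size_t count_unique_sorted_sketch(uint32_t *t, size_t n)
{
    size_t unique = 0;

    qsort(t, n, sizeof(*t), cmp_u32_sketch);
    for (size_t i = 0; i < n; i++)
        if (i == 0 || t[i] != t[i - 1])
            unique++;
    return unique;
}
```

Both functions return the same count; only the amount of work differs, which matters once a module carries many thousands of relocations.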

    * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (35 commits)
    ARM: sa1111: provide to_sa1111_device() macro
    ARM: sa1111: add sa1111_get_irq()
    ARM: sa1111: clean up duplication in IRQ chip implementation
    ARM: sa1111: implement a gpio_chip for SA1111 GPIOs
    ARM: sa1111: move irq cleanup to separate function
    ARM: sa1111: use devm_clk_get()
    ARM: sa1111: use devm_kzalloc()
    ARM: sa1111: ensure we only touch RAB bus type devices when removing
    ARM: 8611/1: l2x0: add PMU support
    ARM: 8610/1: V7M: Add dsb before jumping in handler mode
    ARM: 8609/1: V7M: Add support for the Cortex-M7 processor
    ARM: 8608/1: V7M: Indirect proc_info construction for V7M CPUs
    ARM: 8607/1: V7M: Wire up caches for V7M processors with cache support.
    ARM: 8606/1: V7M: introduce cache operations
    ARM: 8605/1: V7M: fix notrace variant of save_and_disable_irqs
    ARM: 8604/1: V7M: Add support for reading the CTR with read_cpuid_cachetype()
    ARM: 8603/1: V7M: Add addresses for mem-mapped V7M cache operations
    ARM: 8602/1: factor out CSSELR/CCSIDR operations that use cp15 directly
    ARM: kernel: avoid brute force search on PLT generation
    ARM: kernel: sort relocation sections before allocating PLTs
    ...

    Linus Torvalds
     
  • Russell King
     
  • Pull fuse updates from Miklos Szeredi:
    "This adds POSIX ACL permission checking to the fuse kernel module.

    In addition there are minor bug fixes as well as cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: limit xattr returned size
    fuse: remove duplicate cs->offset assignment
    fuse: don't use fuse_ioctl_copy_user() helper
    fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()
    fuse: get rid of fc->flags
    fuse: use timespec64
    fuse: don't use ->d_time
    fuse: Add posix ACL support
    fuse: handle killpriv in userspace fs
    fuse: fix killing s[ug]id in setattr
    fuse: invalidate dir dentry after chmod
    fuse: Use generic xattr ops
    fuse: listxattr: verify xattr list

    Linus Torvalds
     
  • Pull misc filesystem and quota fixes from Jan Kara:
    "Some smaller udf, ext2, quota & reiserfs fixes"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2: Unmap metadata when zeroing blocks
    udf: don't bother with full-page write optimisations in adinicb case
    reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
    udf: Remove useless check in udf_adinicb_write_begin()
    quota: fill in Q_XGETQSTAT inode information for inactive quotas
    ext2: Check return value from ext2_get_group_desc()

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and
    co. at Google. https://lwn.net/Articles/701165/

    2) Do TCP Small Queues for retransmits, from Eric Dumazet.

    3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei
    Starovoitov.

    4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.

    5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.

    6) Support GMAC protocol in iwlwifi mvm, from Ayala Beker.

    7) Support ndo_poll_controller in mlx5, from Calvin Owens.

    8) Move VRF processing to an output hook and allow l3mdev to be
    loopback, from David Ahern.

    9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.

    10) Congestion control in RXRPC, from David Howells.

    11) Support geneve RX offload in ixgbe, from Emil Tantilov.

    12) When hitting pressure for new incoming TCP data SKBs, perform a
    partial rather than a full purge of the OFO queue (which could be
    huge). From Eric Dumazet.

    13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.

    14) Support RX network flow classification to igb, from Gangfeng Huang.

    15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.

    16) New skbmod packet action, from Jamal Hadi Salim.

    17) Remove some inefficiencies in snmp proc output, from Jia He.

    18) Add FIB notifications to properly propagate route changes to
    hardware which is doing forwarding offloading. From Jiri Pirko.

    19) New dsa driver for qca8xxx chips, from John Crispin.

    20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej
    Żenczykowski.

    21) Add L3 mode to ipvlan, from Mahesh Bandewar.

    22) Support 802.1ad in mlx4, from Moshe Shemesh.

    23) Support hardware LRO in mediatek driver, from Nelson Chang.

    24) Add TC offloading to mlx5, from Or Gerlitz.

    25) Convert various drivers to ethtool ksettings interfaces, from
    Philippe Reynes.

    26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.

    27) NAPI support for ath10k, from Rajkumar Manoharan.

    28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.

    29) UDP replicast support in TIPC, from Richard Alpe.

    30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.

    31) Support BQL in thunderx driver, from Sunil Goutham.

    32) TSO support in alx driver, from Tobias Regnery.

    33) Add stream parser engine and use it in kcm.

    34) Support async DHCP replies in ipconfig module, from Uwe
    Kleine-König.

    35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
    mlxsw: switchx2: Fix misuse of hard_header_len
    mlxsw: spectrum: Fix misuse of hard_header_len
    net/faraday: Stop NCSI device on shutdown
    net/ncsi: Introduce ncsi_stop_dev()
    net/ncsi: Rework the channel monitoring
    net/ncsi: Allow to extend NCSI request properties
    net/ncsi: Rework request index allocation
    net/ncsi: Don't probe on the reserved channel ID (0x1f)
    net/ncsi: Introduce NCSI_RESERVED_CHANNEL
    net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
    net: Add netdev all_adj_list refcnt propagation to fix panic
    net: phy: Add Edge-rate driver for Microsemi PHYs.
    vmxnet3: Wake queue from reset work
    i40e: avoid NULL pointer dereference and recursive errors on early PCI error
    qed: Add RoCE ll2 & GSI support
    qed: Add support for memory registeration verbs
    qed: Add support for QP verbs
    qed: PD,PKEY and CQ verb support
    qed: Add support for RoCE hw init
    qede: Add qedr framework
    ...

    Linus Torvalds
     

05 Oct, 2016

4 commits

  • Pull security subsystem updates from James Morris:

    SELinux/LSM:
    - overlayfs support, necessary for container filesystems

    LSM:
    - finally remove the kernel_module_from_file hook

    Smack:
    - treat signal delivery as an 'append' operation

    TPM:
    - lots of bugfixes & updates

    Audit:
    - new audit data type: LSM_AUDIT_DATA_FILE

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (47 commits)
    Revert "tpm/tpm_crb: implement tpm crb idle state"
    Revert "tmp/tpm_crb: fix Intel PTT hw bug during idle state"
    Revert "tpm/tpm_crb: open code the crb_init into acpi_add"
    Revert "tmp/tpm_crb: implement runtime pm for tpm_crb"
    lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE
    tmp/tpm_crb: implement runtime pm for tpm_crb
    tpm/tpm_crb: open code the crb_init into acpi_add
    tmp/tpm_crb: fix Intel PTT hw bug during idle state
    tpm/tpm_crb: implement tpm crb idle state
    tpm: add check for minimum buffer size in tpm_transmit()
    tpm: constify TPM 1.x header structures
    tpm/tpm_crb: fix the over 80 characters checkpatch warring
    tpm/tpm_crb: drop useless cpu_to_le32 when writing to registers
    tpm/tpm_crb: cache cmd_size register value.
    tmp/tpm_crb: drop include to platform_device
    tpm/tpm_tis: remove unused itpm variable
    tpm_crb: fix incorrect values of cmdReady and goIdle bits
    tpm_crb: refine the naming of constants
    tpm_crb: remove wmb()'s
    tpm_crb: fix crb_req_canceled behavior
    ...

    Linus Torvalds
     
  • Pull jfs updates from David Kleikamp:
    "Minor jfs updates"

    * tag 'jfs-4.9' of git://github.com/kleikamp/linux-shaggy:
    jfs: Simplify code
    jfs: jump to error_out when filemap_{fdatawait, write_and_wait} fails

    Linus Torvalds
     
  • Pull gfs2 updates from Bob Peterson:
    "We've only got six GFS2 patches for this merge window. In patch
    order:

    - Fabian Frederick submitted a nice cleanup that uses the BIT macro
    rather than bit shifting.

    - Andreas Gruenbacher contributed a patch that fixes a long-standing
    annoyance whereby GFS2 warned about dirty pages.

    - Andreas also fixed a problem with the recent extended attribute
    readahead feature.

    - Chao Yu contributed a patch that checks the return code from
    function register_shrinker and reacts accordingly. Previously, it
    was not checked.

    - Andreas Gruenbacher also fixed a problem whereby incore file
    timestamps were forgotten if the file was invalidated. This merely
    moves the assignment inside the inode glock where it belongs.

    - Andreas also fixed a problem where incore timestamps were not
    initialized"

    * tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: Initialize atime of I_NEW inodes
    gfs2: Update file times after grabbing glock
    gfs2: fix to detect failure of register_shrinker
    gfs2: Fix extended attribute readahead optimization
    gfs2: Remove dirty buffer warning from gfs2_releasepage
    GFS2: use BIT() macro

    Linus Torvalds
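
The BIT() cleanup mentioned in the gfs2 pull above swaps open-coded shifts for the kernel's BIT() helper. A minimal userspace sketch of that equivalence follows. The macro here is a stand-in for the one in <linux/bitops.h>, and the flag names and helper functions are hypothetical, chosen only for illustration. None of this is taken from the gfs2 patch itself.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() helper (assumption: the
 * kernel defines it in <linux/bitops.h> as a 1UL shift). */
#define BIT(nr) (1UL << (nr))

/* Hypothetical bit positions, named for illustration only. */
enum {
	DEMO_FLAG_DIRTY  = 0,
	DEMO_FLAG_LOCKED = 3,
};

/* Mask built with open-coded shifts, the style the cleanup replaces. */
static unsigned long demo_mask_shift(void)
{
	return (1UL << DEMO_FLAG_DIRTY) | (1UL << DEMO_FLAG_LOCKED);
}

/* The same mask built with BIT(), the style the cleanup introduces. */
static unsigned long demo_mask_bit(void)
{
	return BIT(DEMO_FLAG_DIRTY) | BIT(DEMO_FLAG_LOCKED);
}

/* Both spellings must yield an identical mask. */
static void demo_self_check(void)
{
	assert(demo_mask_shift() == demo_mask_bit());
}
```

The two helpers compute the same value, so a conversion of this kind is expected to be purely cosmetic. The benefit is that the bit number is stated directly instead of being buried in a shift expression.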
     
  • Pull file locking updates from Jeff Layton:
    "Only a single patch from Nikolay this cycle, with a small change to
    better handle /proc/locks in a containerized host"

    * tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux:
    locks: Filter /proc/locks output on proc pid ns

    Linus Torvalds
     

04 Oct, 2016

7 commits

  • Pull tty and serial updates from Greg KH:
    "Here is the big tty and serial patch set for 4.9-rc1.

    It also includes some drivers/dma/ changes, as those were needed by
    some serial drivers, and they were all acked by the DMA maintainer.

    Also in here is the long-suffering ACPI SPCR patchset, which was
    passed around from maintainer to maintainer like a hot potato. Seems I
    was the sucker^Wlucky one. All of those patches have been acked by the
    various subsystem maintainers as well.

    All of this has been in linux-next with no reported issues"

    * tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (111 commits)
    Revert "serial: pl011: add console matching function"
    MAINTAINERS: update entry for atmel_serial driver
    serial: pl011: add console matching function
    ARM64: ACPI: enable ACPI_SPCR_TABLE
    ACPI: parse SPCR and enable matching console
    of/serial: move earlycon early_param handling to serial
    Revert "drivers/tty: Explicitly pass current to show_stack"
    tty: amba-pl011: Don't complain on -EPROBE_DEFER when no irq
    nios2: dts: 10m50: Add tx-threshold parameter
    serial: 8250: Set Altera 16550 TX FIFO Threshold
    serial: 8250: of: Load TX FIFO Threshold from DT
    Documentation: dt: serial: Add TX FIFO threshold parameter
    drivers/tty: Explicitly pass current to show_stack
    serial: imx: Fix DCD reading
    serial: stm32: mark symbols static where possible
    serial: xuartps: Add some register initialisation to cdns_early_console_setup()
    serial: xuartps: Removed unwanted checks while reading the error conditions
    serial: xuartps: Rewrite the interrupt handling logic
    serial: stm32: use mapbase instead of membase for DMA
    tty/serial: atmel: fix fractional baud rate computation
    ...

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here are the "big" driver core patches for 4.9-rc1. Also in here are a
    number of debugfs fixes that cropped up due to the changes that
    happened in 4.8 for that filesystem. Overall, nothing major, just a
    few fixes and cleanups.

    All of these have been in linux-next with no reported issues"

    * tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits)
    drivers: dma-coherent: Move spinlock in dma_alloc_from_coherent()
    drivers: dma-coherent: Fix DMA coherent size for less than page
    MAINTAINERS: extend firmware_class maintainer list
    debugfs: propagate release() call result
    driver-core: platform: Catch errors from calls to irq_get_irq_data
    sysfs print name of undiscoverable attribute group
    carl9170: fix debugfs crashes
    b43legacy: fix debugfs crash
    b43: fix debugfs crash
    debugfs: introduce a public file_operations accessor
    device core: Remove deprecated create_singlethread_workqueue
    drivers/base dmam_declare_coherent_memory leaks
    platform: don't return 0 from platform_get_irq[_byname]() on error
    cpu: clean up register_cpu func
    dma-mapping: use vma_pages().
    drivers: dma-coherent: use vma_pages().
    attribute_container: Fix typo
    base: soc: make it explicitly non-modular
    drivers: base: dma-mapping: page align the size when unmap_kernel_range
    platform driver: fix use-after-free in platform_device_del()
    ...

    Linus Torvalds
     
  • Pull x86 vdso updates from Ingo Molnar:
    "The main changes in this cycle centered around adding support for
    32-bit compatible checkpoint/restore (C/R) of the vDSO on 64-bit
    kernels, by Dmitry Safonov"

    * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl
    x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64
    x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE
    x86/signal: Add SA_{X32,IA32}_ABI sa_flags
    x86/ptrace: Down with test_thread_flag(TIF_IA32)
    x86/coredump: Use pr_reg size, rather that TIF_IA32 flag
    x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*
    x86/vdso: Replace calculate_addr in map_vdso() with addr
    x86/vdso: Unmap vdso blob on vvar mapping failure

    Linus Torvalds
     
  • Pull low-level x86 updates from Ingo Molnar:
    "In this cycle this topic tree has become one of those 'super topics'
    that accumulated a lot of changes:

    - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
    x86 - preceded by an array of changes. v4.8 saw preparatory changes
    in this area already - this is the rest of the work. Includes the
    thread stack caching performance optimization. (Andy Lutomirski)

    - switch_to() cleanups and all around enhancements. (Brian Gerst)

    - A large number of dumpstack infrastructure enhancements and an
    unwinder abstraction. The secret long term plan is safe(r) live
    patching plus maybe another attempt at debuginfo based unwinding -
    but all these current bits are standalone enhancements in a frame
    pointer based debug environment as well. (Josh Poimboeuf)

    - More __ro_after_init and const annotations. (Kees Cook)

    - Enable KASLR for the vmemmap memory region. (Thomas Garnier)"

    [ The virtually mapped stack changes are pretty fundamental, and not
    x86-specific per se, even if they are only used on x86 right now. ]

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    x86/asm: Get rid of __read_cr4_safe()
    thread_info: Use unsigned long for flags
    x86/alternatives: Add stack frame dependency to alternative_call_2()
    x86/dumpstack: Fix show_stack() task pointer regression
    x86/dumpstack: Remove dump_trace() and related callbacks
    x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
    oprofile/x86: Convert x86_backtrace() to use the new unwinder
    x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
    perf/x86: Convert perf_callchain_kernel() to use the new unwinder
    x86/unwind: Add new unwind interface and implementations
    x86/dumpstack: Remove NULL task pointer convention
    fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
    sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
    lib/syscall: Pin the task stack in collect_syscall()
    x86/process: Pin the target stack in get_wchan()
    x86/dumpstack: Pin the target stack when dumping it
    kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
    sched/core: Add try_get_task_stack() and put_task_stack()
    x86/entry/64: Fix a minor comment rebase error
    iommu/amd: Don't put completion-wait semaphore on stack
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - rwsem micro-optimizations (Davidlohr Bueso)

    - Improve the implementation and optimize the performance of
    percpu-rwsems. (Peter Zijlstra)

    - Convert all lglock users to better facilities such as percpu-rwsems
    or percpu-spinlocks and remove lglocks. (Peter Zijlstra)

    - Remove the ticket (spin)lock implementation. (Peter Zijlstra)

    - Korean translation of memory-barriers.txt and related fixes to the
    English document. (SeongJae Park)

    - misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    x86/cmpxchg, locking/atomics: Remove superfluous definitions
    x86, locking/spinlocks: Remove ticket (spin)lock implementation
    locking/lglock: Remove lglock implementation
    stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
    fs/locks: Use percpu_down_read_preempt_disable()
    locking/percpu-rwsem: Add down_read_preempt_disable()
    fs/locks: Replace lg_local with a per-cpu spinlock
    fs/locks: Replace lg_global with a percpu-rwsem
    locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held()
    locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
    locking/rwsem, x86: Drop a bogus cc clobber
    futex: Add some more function commentry
    locking/hung_task: Show all locks
    locking/rwsem: Scan the wait_list for readers only once
    locking/rwsem: Remove a few useless comments
    locking/rwsem: Return void in __rwsem_mark_wake()
    locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
    locking/Documentation: Add Korean translation
    locking/Documentation: Fix a typo of example result
    locking/Documentation: Fix wrong section reference
    ...

    Linus Torvalds
     
  • The features op did make it into OrangeFS 2.9.6 after all.

    This reverts commit 0c95ad76361f1d75a1ffdf82deafbcec44d19c42.

    Mike Marshall
     
  • Pull EFI updates from Ingo Molnar:
    "Main changes in this cycle were:

    - Refactor the EFI memory map code into architecture neutral files
    and allow drivers to permanently reserve EFI boot services regions
    on x86, as well as ARM/arm64. (Matt Fleming)

    - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel)

    - Make the EFI runtime services and efivar API interruptible by
    swapping spinlocks for semaphores. (Sylvain Chouleur)

    - Provide the EFI identity mapping for kexec which allows kexec to
    work on SGI/UV platforms without requiring the "noefi" kernel command
    line parameter. (Alex Thorlton)

    - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel)

    - Merge the EFI test driver being carried out of tree until now in
    the FWTS project. (Ivan Hu)

    - Expand the list of flags for classifying EFI regions as "RAM" on
    arm64 so we align with the UEFI spec. (Ard Biesheuvel)

    - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
    or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
    services function table for direct calls, relieving us of
    having to maintain the custom function table. (Lukas Wunner)

    - Miscellaneous cleanups and fixes"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE
    x86/efi: Allow invocation of arbitrary boot services
    x86/efi: Optimize away setup_gop32/64 if unused
    x86/efi: Use kmalloc_array() in efi_call_phys_prolog()
    efi/arm64: Treat regions with WT/WC set but WB cleared as memory
    efi: Add efi_test driver for exporting UEFI runtime service interfaces
    x86/efi: Defer efi_esrt_init until after memblock_x86_fill
    efi/arm64: Add debugfs node to dump UEFI runtime page tables
    x86/efi: Remove unused find_bits() function
    fs/efivarfs: Fix double kfree() in error path
    x86/efi: Map in physical addresses in efi_map_region_fixed
    lib/ucs2_string: Speed up ucs2_utf8size()
    firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy"
    x86/efi: Initialize status to ensure garbage is not returned on small size
    efi: Replace runtime services spinlock with semaphore
    efi: Don't use spinlocks for efi vars
    efi: Use a file local lock for efivars
    efi/arm*: esrt: Add missing call to efi_esrt_init()
    efi/esrt: Use memremap not ioremap to access ESRT table in memory
    x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data
    ...

    Linus Torvalds
     

03 Oct, 2016

5 commits