27 Sep, 2016

7 commits

  • NFSv4.1 has built-in trunking support that allows a client to determine
    whether two connections to two different IP addresses are actually to
    the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
    clients a means to do this, basically by performing a SETCLIENTID to one
    address and confirming it with a SETCLIENTID_CONFIRM to the other.

    Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking
    when mounting" implement a variation on this suggestion. It is possible
    that other clients do too.

    This depends on the clientid and verifier not being accepted by an
    unrelated server. Since both are 64-bit values, that would be very
    unlikely if they were random numbers. But they aren't:

    knfsd generates the 64-bit clientid by concatenating the 32-bit boot
    time (in seconds) and a counter. This makes collisions between
    clientids generated by the same server extremely unlikely. But
    collisions are very likely between clientids generated by servers that
    boot at the same time, and it's quite common for multiple servers to
    boot at the same time. The verifier is a concatenation of the
    SETCLIENTID time (in seconds) and a counter, so again collisions between
    different servers are likely if multiple SETCLIENTIDs are done at the
    same time, which is a common case.

    Therefore recent NFSv4.0 clients may decide two different servers are
    really the same, and mount a filesystem from the wrong server.

    Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the
    client owner id during trunking detection", only does this when given
    the non-default "migration" mount option.

    The fault is really with RFC 7931, and needs a client fix, but in the
    meantime we can mitigate the chance of these collisions by randomizing
    the starting value of the counters used to generate clientids and
    verifiers.

    Reported-by: Frank Sorenson
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
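    A minimal sketch of the clientid scheme described above and of the mitigation, assuming a clientid built as (boot time << 32) | counter. The names clid_gen, clid_gen_init_fixed, clid_gen_init_random and clid_next are hypothetical, not knfsd's, and the random_start argument stands in for a real random source. The verifier (SETCLIENTID time plus a counter) follows the same pattern.

```c
#include <stdint.h>

/* Hypothetical generator: clientid = (boot_time << 32) | counter. */
struct clid_gen {
    uint32_t boot_time;  /* server boot time, in seconds */
    uint32_t counter;    /* incremented for each new clientid */
};

/* Old behaviour: the counter starts at a fixed value, so two servers
 * booted in the same second hand out identical clientid sequences. */
void clid_gen_init_fixed(struct clid_gen *g, uint32_t boot_time)
{
    g->boot_time = boot_time;
    g->counter = 0;
}

/* Mitigation: start the counter at a random value.  'random_start'
 * stands in for a real random number source (an assumption here). */
void clid_gen_init_random(struct clid_gen *g, uint32_t boot_time,
                          uint32_t random_start)
{
    g->boot_time = boot_time;
    g->counter = random_start;
}

/* Hand out the next clientid; the unsigned counter wraps harmlessly. */
uint64_t clid_next(struct clid_gen *g)
{
    return ((uint64_t)g->boot_time << 32) | g->counter++;
}
```

    Randomizing the start only makes a cross-server collision unlikely, not impossible, which is why the message above still calls for a client-side fix.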
     
  • If we are using v4.1+, then we can send notification when contended
    locks become free. Inform the client of that fact.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • As defined in RFC 5661, section 18.16.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
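    A sketch of what the two preceding commits amount to: advertise in the OPEN reply that the server may send lock notifications, but only for minor version 1 and later, since NFSv4.0 has no CB_NOTIFY_LOCK. The flag name and value follow RFC 5661, section 18.16; the helper function is hypothetical.

```c
#include <stdint.h>

/* OPEN result flag defined in RFC 5661, section 18.16. */
#define OPEN4_RESULT_MAY_NOTIFY_LOCK 0x00000020u

/* Hypothetical helper: the extra OPEN rflags bits to advertise.
 * NFSv4.0 clients never receive the flag. */
uint32_t open_result_notify_flags(uint32_t minorversion)
{
    return minorversion >= 1 ? OPEN4_RESULT_MAY_NOTIFY_LOCK : 0;
}
```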
     
  • It's possible for a client to call in on a lock that is blocked for a
    long time, but discontinue polling for it. A malicious client could
    even set a lock on a file, and then spam the server with failing lock
    requests from different lockowners that pile up in a DoS attack.

    Add the blocked lock structures to a per-net namespace LRU when hashing
    them, and timestamp them. If the lock request is not revisited after a
    lease period, we'll drop it under the assumption that the client is no
    longer interested.

    This also gives us a mechanism to clean up these objects at server
    shutdown time.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
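    A minimal sketch of the LRU reaping described above, assuming a singly linked list kept oldest-first and an entry considered stale once it has gone a full lease period without being revisited. The struct and function names are hypothetical; the real code uses kernel list primitives and locking.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical blocked-lock record kept on an LRU list. */
struct blocked_lock {
    struct blocked_lock *next;  /* oldest entry first */
    time_t stamp;               /* last time the client asked for it */
    int id;
};

/* Unlink every entry not revisited within one lease period and return
 * them as a list for the caller to free.  Because the LRU is ordered
 * oldest-first, the scan stops at the first entry that is still fresh. */
struct blocked_lock *reap_stale_blocks(struct blocked_lock **lru,
                                       time_t now, time_t lease)
{
    struct blocked_lock *reaped = NULL, **tail = &reaped;

    while (*lru && now - (*lru)->stamp >= lease) {
        struct blocked_lock *b = *lru;

        *lru = b->next;     /* unlink from the LRU */
        b->next = NULL;
        *tail = b;          /* append to the reaped list */
        tail = &b->next;
    }
    return reaped;
}
```

    When a client re-sends the lock request, its entry would be re-stamped and moved to the tail, which keeps the oldest-first ordering the early exit relies on. The same routine, called with an infinite cutoff, also covers cleanup at shutdown.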
     
  • Create a new per-lockowner+per-inode structure that contains a
    file_lock. Have nfsd4_lock add this structure to the lockowner's list
    prior to setting the lock. Then call the vfs and request a blocking lock
    (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
    back, then we dequeue the block structure and free it. When the next
    lock request comes in, we'll look for an existing block for the same
    filehandle and dequeue and reuse it if there is one.

    When the lock comes free (à la an lm_notify call), we dequeue it
    from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
    inform the client that it should retry the lock request.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
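    A sketch of the block lifecycle described above, assuming a simplified fixed-size filehandle and a plain linked list per lockowner. All names are hypothetical: a request first looks for an existing block for the same filehandle to reuse, queues it before asking the VFS for a blocking lock, and dequeues it either when the VFS does not defer the request or when the lock comes free (at which point a CB_NOTIFY_LOCK would be sent).

```c
#include <stdlib.h>
#include <string.h>

#define FH_LEN 16  /* simplified fixed-size filehandle (assumption) */

/* Hypothetical per-lockowner+per-inode pending lock request. */
struct lock_block {
    struct lock_block *next;
    unsigned char fh[FH_LEN];
};

struct lockowner {
    struct lock_block *blocks;  /* pending blocked requests */
};

/* Dequeue an existing block for this filehandle for reuse, or
 * allocate a fresh one if there is none.  Returns NULL on ENOMEM. */
struct lock_block *find_or_alloc_block(struct lockowner *lo,
                                       const unsigned char fh[FH_LEN])
{
    for (struct lock_block **pp = &lo->blocks; *pp; pp = &(*pp)->next) {
        if (memcmp((*pp)->fh, fh, FH_LEN) == 0) {
            struct lock_block *b = *pp;
            *pp = b->next;
            b->next = NULL;
            return b;
        }
    }
    struct lock_block *b = calloc(1, sizeof(*b));
    if (b)
        memcpy(b->fh, fh, FH_LEN);
    return b;
}

/* Queue the block before requesting a blocking lock. */
void queue_block(struct lockowner *lo, struct lock_block *b)
{
    b->next = lo->blocks;
    lo->blocks = b;
}

/* Remove a block from the lockowner's list.  Used when the lock
 * request was not deferred, and when the lock comes free (after which
 * the caller would send CB_NOTIFY_LOCK).  Returns 1 if it was queued. */
int dequeue_block(struct lockowner *lo, struct lock_block *b)
{
    for (struct lock_block **pp = &lo->blocks; *pp; pp = &(*pp)->next) {
        if (*pp == b) {
            *pp = b->next;
            b->next = NULL;
            return 1;
        }
    }
    return 0;
}
```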
     
  • Add the encoding/decoding for CB_NOTIFY_LOCK operations.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
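    A minimal sketch of encoding one CB_NOTIFY_LOCK operation in XDR, following the argument layout in RFC 5661, section 20.11 (a filehandle followed by a lock_owner4 of clientid plus opaque owner). The small writer type and helper names are hypothetical, and the surrounding CB_COMPOUND header is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_CB_NOTIFY_LOCK 13  /* RFC 5661, section 20.11 */

/* Hypothetical minimal XDR writer over a caller-supplied buffer. */
struct xdr_w {
    uint8_t *buf;
    size_t len;
    size_t cap;
    int overflow;
};

static void xdr_put_u32(struct xdr_w *x, uint32_t v)
{
    if (x->overflow || x->cap - x->len < 4) {
        x->overflow = 1;
        return;
    }
    for (int i = 3; i >= 0; i--)          /* XDR is big-endian */
        x->buf[x->len++] = (uint8_t)(v >> (8 * i));
}

static void xdr_put_u64(struct xdr_w *x, uint64_t v)  /* XDR "hyper" */
{
    xdr_put_u32(x, (uint32_t)(v >> 32));
    xdr_put_u32(x, (uint32_t)v);
}

/* Variable-length opaque: 4-byte length, data, zero pad to 4 bytes. */
static void xdr_put_opaque(struct xdr_w *x, const void *data, uint32_t n)
{
    size_t padded = ((size_t)n + 3) & ~(size_t)3;

    xdr_put_u32(x, n);
    if (x->overflow || x->cap - x->len < padded) {
        x->overflow = 1;
        return;
    }
    memcpy(x->buf + x->len, data, n);
    memset(x->buf + x->len + n, 0, padded - n);
    x->len += padded;
}

/* opcode, then CB_NOTIFY_LOCK4args { nfs_fh4 cnla_fh;
 * lock_owner4 cnla_lock_owner; }.  Returns -1 if the buffer is too small. */
int encode_cb_notify_lock(struct xdr_w *x, const void *fh, uint32_t fh_len,
                          uint64_t clientid,
                          const void *owner, uint32_t owner_len)
{
    xdr_put_u32(x, OP_CB_NOTIFY_LOCK);
    xdr_put_opaque(x, fh, fh_len);
    xdr_put_u64(x, clientid);
    xdr_put_opaque(x, owner, owner_len);
    return x->overflow ? -1 : 0;
}
```

    With a 3-byte filehandle and a 5-byte owner this produces 32 bytes: 4 for the opcode, 8 for the padded filehandle, 8 for the clientid and 12 for the padded owner.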
     
  • By design, a notifier can be registered only once; however, nfsd
    registers the same inetaddr notifiers once per net namespace. When this
    happens, it corrupts the list of notifiers: as a result, some notifiers
    may not be called on the proper event, traversal of the list can loop
    forever, and the second unregister can access already freed memory.

    Cc: stable@vger.kernel.org
    Fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
    Signed-off-by: Vasily Averin
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Vasily Averin
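    A minimal sketch of one way to honor the "register once" rule, assuming a shared reference count: the first namespace to start registers the notifier and the last one to stop unregisters it. The names are hypothetical, the chain is modeled by a flag, and this single-threaded model uses a plain int where real code would need an atomic counter or a lock.

```c
#include <assert.h>

static int chain_registered;  /* models the notifier's presence on the chain */
static int notifier_users;    /* namespaces currently running nfsd */

/* The chain must never see the same notifier registered twice. */
static void chain_register(void)   { assert(!chain_registered); chain_registered = 1; }
static void chain_unregister(void) { assert(chain_registered);  chain_registered = 0; }

void nfsd_net_start_model(void)
{
    if (++notifier_users == 1)   /* first user registers */
        chain_register();
}

void nfsd_net_stop_model(void)
{
    if (--notifier_users == 0)   /* last user unregisters */
        chain_unregister();
}
```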
     

23 Sep, 2016

6 commits

  • Support Remote Invalidation. A private message is exchanged with
    the client upon RDMA transport connect that indicates whether
    Send With Invalidation may be used by the server to send RPC
    replies. The invalidate_rkey is arbitrarily chosen from among
    rkeys present in the RPC-over-RDMA header's chunk lists.

    Send With Invalidate improves performance only when clients can
    recognize, while processing an RPC reply, that an rkey has already
    been invalidated. That has been submitted as a separate change.

    In the future, the RPC-over-RDMA protocol might support Remote
    Invalidation properly. The protocol needs to enable signaling
    between peers to indicate when Remote Invalidation can be used
    for each individual RPC.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Prepare to receive an RDMA-CM private message when handling a new
    connection attempt, and send a similar message as part of connection
    acceptance.

    Both sides can communicate their various implementation limits.
    Implementations that don't support this sideband protocol ignore it.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce data structure used by both client and server to exchange
    implementation details during RDMA/CM connection establishment.

    This is an experimental out-of-band exchange between Linux
    RPC-over-RDMA Version One implementations, replacing the deprecated
    CCP (see RFC 5666bis). The purpose of this extension is to enable
    prototyping of features that might be introduced in a subsequent
    version of RPC-over-RDMA.

    Suggested by Christoph Hellwig and Devesh Sharma.

    Signed-off-by: Chuck Lever
    Reviewed-by: Sagi Grimberg
    Signed-off-by: J. Bruce Fields

    Chuck Lever
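    A sketch of the kind of private message the three commits above describe: a small fixed layout carrying a magic number, a version, a flag saying Send With Invalidate is acceptable, and inline size limits. The layout, magic value, flag bit and 1KB size encoding are all illustrative assumptions, not the actual on-the-wire format. A message without the expected magic is ignored, so peers that don't speak the sideband protocol simply fall back to defaults.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

#define CM_PRIV_MAGIC      0x52504331u  /* hypothetical value */
#define CM_PRIV_F_SEND_INV 0x01u        /* hypothetical: peer may Send With Invalidate */

/* Hypothetical 8-byte private message exchanged at connect time. */
struct cm_private {
    uint32_t magic;      /* network byte order */
    uint8_t  version;
    uint8_t  flags;
    uint8_t  send_size;  /* inline send size: (bytes / 1KB) - 1 */
    uint8_t  recv_size;  /* inline receive size, same encoding */
};

/* Sizes are assumed to be multiples of 1KB between 1KB and 256KB. */
static uint8_t encode_size(uint32_t bytes) { return (uint8_t)((bytes >> 10) - 1); }
static uint32_t decode_size(uint8_t v)     { return ((uint32_t)v + 1) << 10; }

void cm_private_init(struct cm_private *m, int send_inv_ok,
                     uint32_t send_bytes, uint32_t recv_bytes)
{
    m->magic = htonl(CM_PRIV_MAGIC);
    m->version = 1;
    m->flags = send_inv_ok ? CM_PRIV_F_SEND_INV : 0;
    m->send_size = encode_size(send_bytes);
    m->recv_size = encode_size(recv_bytes);
}

/* Returns 0 if 'data' holds a recognizable message, else -1, in which
 * case the receiver should ignore it and use default limits. */
int cm_private_parse(const void *data, size_t len, struct cm_private *out)
{
    if (data == NULL || len < sizeof(*out))
        return -1;
    memcpy(out, data, sizeof(*out));
    return ntohl(out->magic) == CM_PRIV_MAGIC ? 0 : -1;
}
```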
     
  • Message from syslogd@klimt at Aug 18 17:00:37 ...
    kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping: (null) index:0x0
    Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
    Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

    Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
    Aug 18 17:00:37 klimt kernel: RIP: 0010:[] svc_rdma_sendto+0x641/0x820 [rpcrdma]

    send_reply() assigns its page argument as the first page of ctxt. On
    error, send_reply() already invokes svc_rdma_put_context(ctxt, 1),
    which does a put_page() on that very page. There is no need to do that
    again as svc_rdma_sendto() exits.

    Fixes: 3e1eeb980822 ("svcrdma: Close connection when a send error occurs")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
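    A small model of the ownership rule behind this fix, assuming a page reduced to a plain reference count: when a helper already drops the page reference on its error path, the caller must not drop it again. The names are hypothetical; the assert plays the role of the VM_BUG_ON_PAGE() seen in the log above.

```c
#include <assert.h>

struct page_ref { int count; };

static void put_page_ref(struct page_ref *pg)
{
    assert(pg->count > 0);  /* models VM_BUG_ON_PAGE(page_ref_count(page) == 0) */
    pg->count--;
}

/* Models send_reply(): on failure it releases the context, which puts 'pg'. */
static int send_reply_model(struct page_ref *pg, int fail)
{
    if (fail) {
        put_page_ref(pg);
        return -1;
    }
    return 0;  /* in this model, success leaves the reference for later release */
}

/* Models the fixed caller: no second put on the error path, because
 * send_reply_model() has already dropped the reference. */
int sendto_model(struct page_ref *pg, int fail)
{
    return send_reply_model(pg, fail);
}
```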
     
  • The ctxt's count field is overloaded to mean the number of pages in
    the ctxt->page array and the number of SGEs in the ctxt->sge array.
    Typically these two numbers are the same.

    However, when an inline RPC reply is constructed from an xdr_buf
    with a tail iovec, the head and tail often occupy the same page,
    but each is DMA mapped independently. In that case, ->count equals
    the number of pages, but it does not equal the number of SGEs.
    There's one more SGE, for the tail iovec. Hence there is one more
    DMA mapping than there are pages in the ctxt->page array.

    This isn't a real problem until the server's iommu is enabled. Then
    each RPC reply that has content in that iovec orphans a DMA mapping
    that consists of real resources.

    krb5i and krb5p always populate that tail iovec. After a couple
    million sent krb5i/p RPC replies, the NFS server starts behaving
    erratically. Reboot is needed to clear the problem.

    Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
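    A sketch of the accounting problem described above, assuming a send context that tracks its page count and its number of DMA-mapped SGEs separately. The names are hypothetical and a mapping is modeled as a flag. Unmapping by page count leaves the tail iovec's mapping orphaned; unmapping by the SGE count releases every mapping.

```c
#include <assert.h>

#define MAX_SGES 8

/* Hypothetical send context with separate page and SGE counts. */
struct send_ctxt {
    int page_count;        /* pages held by the context */
    int mapped_sges;       /* SGEs that were DMA mapped */
    int mapped[MAX_SGES];  /* 1 while an SGE's mapping is live */
};

static void map_sge(struct send_ctxt *c)
{
    assert(c->mapped_sges < MAX_SGES);
    c->mapped[c->mapped_sges++] = 1;
}

/* Inline reply whose head and tail iovecs share one page but are
 * mapped independently: one page, two SGEs (as with krb5i/krb5p). */
void build_head_and_tail_reply(struct send_ctxt *c)
{
    c->page_count = 1;
    map_sge(c);  /* head iovec */
    map_sge(c);  /* tail iovec, same page */
}

static int count_live(const struct send_ctxt *c)
{
    int live = 0;
    for (int i = 0; i < MAX_SGES; i++)
        live += c->mapped[i];
    return live;
}

/* Old behaviour: unmap 'page_count' SGEs; returns mappings left live. */
int unmap_by_page_count(struct send_ctxt *c)
{
    for (int i = 0; i < c->page_count; i++)
        c->mapped[i] = 0;
    return count_live(c);
}

/* Fixed behaviour: unmap every mapped SGE; returns mappings left live. */
int unmap_by_mapped_sges(struct send_ctxt *c)
{
    for (int i = 0; i < c->mapped_sges; i++)
        c->mapped[i] = 0;
    c->mapped_sges = 0;
    return count_live(c);
}
```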
     
  • nfserr is big-endian, so we should convert it to host-endian before
    printing it.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
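    A userspace illustration of the conversion, assuming a status value held in network (big-endian) byte order as knfsd's nfserr values are. Printing the raw value on a little-endian host would show the bytes reversed; ntohl() here plays the role of the kernel's be32_to_cpu(). The helper name is hypothetical.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <stdio.h>

/* Format a big-endian status code for a log message, converting it
 * to host byte order first.  Returns the snprintf() result. */
int format_nfserr(char *out, size_t n, uint32_t nfserr_be)
{
    return snprintf(out, n, "status %u", (unsigned)ntohl(nfserr_be));
}
```

    For example, NFS4ERR_DELAY (10008) stored as htonl(10008) prints as "status 10008" regardless of host endianness.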
     

17 Sep, 2016

2 commits

  • We already have that info in the client pointer. No need to pass around
    a copy.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • We currently can hit a deadlock (of sorts) when trying to use flexfiles
    layouts with XFS. XFS will call break_layout when something wants to
    write to the file. In the case of the (super-simple) flexfiles layout
    driver in knfsd, the MDS and DS are the same machine.

    The client can get a layout and then issue a v3 write to do its I/O. XFS
    will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
    be issued to the client. The client however can't return the layout
    until the v3 WRITE completes, but XFS won't allow the write to proceed
    until the layout is returned.

    Christoph says:

    XFS only cares about block-like layouts where the client has direct
    access to the file blocks. I'd need to look how to propagate the
    flag into break_layout, but in principle we don't need to do any
    recalls on truncate ever for file and flexfile layouts.

    If we're never going to recall the layout, then we don't even need to
    set the lease at all. Just skip doing so on flexfiles layouts by
    adding a new flag to struct nfsd4_layout_ops and skipping the lease
    setting and removal when that flag is true.

    Cc: Christoph Hellwig
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
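    A sketch of the approach described above, assuming a per-layout-type operations table with a boolean saying the layout never needs recalling. The struct, field and function names are hypothetical. For such layouts the lease is never set, so the filesystem has nothing to break and no CB_LAYOUTRECALL is issued.

```c
#include <stdbool.h>

/* Hypothetical per-layout-type operations table. */
struct layout_ops {
    const char *name;
    bool disable_recalls;  /* true: never recall, so no lease needed */
};

static const struct layout_ops block_layout_ops    = { "block",     false };
static const struct layout_ops flexfile_layout_ops = { "flexfiles", true  };

/* Stand-in for per-layout state; 'has_lease' models the lease that
 * lets the filesystem break (recall) the layout. */
struct layout_state {
    const struct layout_ops *ops;
    bool has_lease;
};

void layout_setlease(struct layout_state *ls)
{
    if (ls->ops->disable_recalls)
        return;  /* skip lease setting entirely */
    ls->has_lease = true;
}

void layout_unsetlease(struct layout_state *ls)
{
    if (ls->ops->disable_recalls)
        return;  /* nothing was set, so nothing to remove */
    ls->has_lease = false;
}
```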
     

13 Sep, 2016

1 commit

  • rsc_lookup steals the passed-in memory to avoid doing an allocation of
    its own, so we can't just pass in a pointer to memory that someone else
    is using.

    If we really want to avoid allocation there then maybe we should
    preallocate somewhere, or reference count these handles.

    For now we should revert.

    On occasion I see this on my server:

    kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
    kernel: invalid opcode: 0000 [#1] SMP
    kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
    kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
    kernel: Workqueue: events do_cache_clean [sunrpc]
    kernel: task: ffff8808541d8000 task.stack: ffff880854344000
    kernel: RIP: 0010:[] [] kfree+0x155/0x180
    kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
    kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
    kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
    kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
    kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
    kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
    kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
    kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
    kernel: Stack:
    kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
    kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
    kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
    kernel: Call Trace:
    kernel: [] rsc_free+0x16/0x90 [auth_rpcgss]
    kernel: [] rsc_put+0x16/0x30 [auth_rpcgss]
    kernel: [] cache_clean+0x2e0/0x300 [sunrpc]
    kernel: [] do_cache_clean+0xe/0x70 [sunrpc]
    kernel: [] process_one_work+0x1ff/0x3b0
    kernel: [] worker_thread+0x2bc/0x4a0
    kernel: [] ? rescuer_thread+0x3a0/0x3a0
    kernel: [] kthread+0xe4/0xf0
    kernel: [] ret_from_fork+0x1f/0x40
    kernel: [] ? kthread_stop+0x110/0x110
    kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
    kernel: RIP [] kfree+0x155/0x180
    kernel: RSP
    kernel: ---[ end trace 3fdec044969def26 ]---

    It seems to be most common after a server reboot where a client has been
    using a Kerberos mount, and reconnects to continue its workload.

    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Chuck Lever
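    A sketch of the ownership contract at issue, assuming a cache entry that, like rsc_lookup, takes ownership of the buffer it is given and frees it when the entry is destroyed. The names are hypothetical. The safe caller hands over its own heap copy; passing a pointer into memory that someone else still uses leads to the bad kfree() in the trace above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry that owns its handle buffer. */
struct ctx_entry {
    char *handle;  /* owned by the entry; freed with it */
    size_t len;
};

/* Takes ownership of 'handle'; the caller must not use or free it again. */
void ctx_entry_init(struct ctx_entry *e, char *handle, size_t len)
{
    e->handle = handle;
    e->len = len;
}

void ctx_entry_free(struct ctx_entry *e)
{
    free(e->handle);
    e->handle = NULL;
}

/* Safe caller: give the entry a private heap copy, so freeing it never
 * touches 'shared'.  Returns -1 if the copy cannot be allocated. */
int ctx_entry_from_shared(struct ctx_entry *e, const char *shared, size_t len)
{
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, shared, len);
    ctx_entry_init(e, copy, len);
    return 0;
}
```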
     

12 Sep, 2016

4 commits

  • Linus Torvalds
     
  • Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
    driver") removed the dependency on BLK_DEV_NVME, but the code does
    depend on the block layer (which used to be an implicit dependency
    through BLK_DEV_NVME).

    Otherwise you get various errors from the kbuild test robot random
    config testing when that happens to hit a configuration with BLOCK
    device support disabled.

    Cc: Christoph Hellwig
    Cc: Jay Freyensee
    Cc: Sagi Grimberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull IIO fixes from Greg KH:
    "Here are a few small IIO fixes for 4.8-rc6.

    Nothing major, full details are in the shortlog, all of these have
    been in linux-next with no reported issues"

    * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    iio:core: fix IIO_VAL_FRACTIONAL sign handling
    iio: ensure ret is initialized to zero before entering do loop
    iio: accel: kxsd9: Fix scaling bug
    iio: accel: bmc150: reset chip at init time
    iio: fix pressure data output unit in hid-sensor-attributes
    tools:iio:iio_generic_buffer: fix trigger-less mode

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.

    All of these resolve minor issues that have been reported, and all
    have been in linux-next with no reported issues"

    * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
    xhci: fix null pointer dereference in stop command timeout function
    usb: dwc3: pci: fix build warning on !PM_SLEEP
    usb: gadget: prevent potenial null pointer dereference on skb->len
    usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
    usb: phy: phy-generic: Check clk_prepare_enable() error
    usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
    Revert "usb: dwc3: gadget: always decrement by 1"

    Linus Torvalds
     

11 Sep, 2016

3 commits

  • Pull libnvdimm fixes from Dan Williams:
    "nvdimm fixes for v4.8, two of them are tagged for -stable:

    - Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
    DAX pmd mappings end up with an uncached pgprot, and unusable
    performance for the device-dax interface. The device-dax interface
    appeared in 4.7 so this is tagged for -stable.

    - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
    understand DAX pmd entries. This fix is tagged for -stable.

    - Fix a mis-merge of the nfit machine-check handler to flip the
    polarity of an if() to match the final version of the patch that
    Vishal sent for 4.8-rc1. Without this the nfit machine check
    handler never detects / inserts new 'badblocks' entries which
    applications use to identify lost portions of files.

    - For test purposes, fix the nvdimm_clear_poison() path to operate on
    legacy / simulated nvdimm memory ranges. Without this fix a test
    can set badblocks, but never clear them on these ranges.

    - Fix the range checking done by dax_dev_pmd_fault(). This is not
    tagged for -stable since this problem is mitigated by specifying
    aligned resources at device-dax setup time.

    These patches have appeared in a next release over the past week. The
    recent rebase you can see in the timestamps was to drop an invalid fix
    as identified by the updated device-dax unit tests [1]. The -mm
    touches have an ack from Andrew"

    [1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
    https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm: allow legacy (e820) pmem region to clear bad blocks
    nfit, mce: Fix SPA matching logic in MCE handler
    mm: fix cache mode of dax pmd mappings
    mm: fix show_smap() for zone_device-pmd ranges
    dax: fix mapping size check

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "Mostly driver bugfixes, but also a few cleanups which are nice to have
    out of the way"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: rk3x: Restore clock settings at resume time
    i2c: Spelling s/acknowedge/acknowledge/
    i2c: designware: save the preset value of DW_IC_SDA_HOLD
    Documentation: i2c: slave-interface: add note for driver development
    i2c: mux: demux-pinctrl: run properly with multiple instances
    i2c: bcm-kona: fix inconsistent indenting
    i2c: rcar: use proper device with dma_mapping_error
    i2c: sh_mobile: use proper device with dma_mapping_error
    i2c: mux: demux-pinctrl: invalidate properly when switching fails

    Linus Torvalds
     
  • Pull fscrypto fixes from Ted Ts'o:
    "Fix some brown-paper-bag bugs for fscrypto, including one which
    allows a malicious user to set an encryption policy on an empty
    directory which they do not own"

    * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    fscrypto: require write access to mount to set encryption policy
    fscrypto: only allow setting encryption policy on directories
    fscrypto: add authorization check for setting encryption policy

    Linus Torvalds
     

10 Sep, 2016

17 commits

  • Since setting an encryption policy requires writing metadata to the
    filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
    Otherwise, a user could cause a write to a frozen or readonly
    filesystem. This was handled correctly by f2fs but not by ext4. Make
    fscrypt_process_policy() handle it rather than relying on the filesystem
    to get it right.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
    Signed-off-by: Theodore Ts'o
    Acked-by: Jaegeuk Kim

    Eric Biggers
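    A minimal model of bracketing a metadata write with a want/drop pair in the shared helper, so every filesystem gets the check. The mount struct and helpers are hypothetical stand-ins for mnt_want_write()/mnt_drop_write(); only the read-only case is modeled here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mount state. */
struct mount_model {
    bool readonly;
    int writers;  /* active write sections */
};

static int want_write(struct mount_model *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

static void drop_write(struct mount_model *m)
{
    m->writers--;
}

/* Hypothetical shared policy setter: the metadata write happens only
 * inside a successful want/drop section. */
int set_policy(struct mount_model *m, int *policy_slot, int policy)
{
    int err = want_write(m);
    if (err)
        return err;
    *policy_slot = policy;  /* the metadata write */
    drop_write(m);
    return 0;
}
```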
     
  • The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
    policy on nondirectory files. This was unintentional, and in the case
    of nonempty regular files did not behave as expected because existing
    data was not actually encrypted by the ioctl.

    In the case of ext4, the user could also trigger filesystem errors in
    ->empty_dir(), e.g. due to mismatched "directory" checksums when the
    kernel incorrectly tried to interpret a regular file as a directory.

    This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
    kernels v4.6 and later. It appears that older kernels only permitted
    directories and that the check was accidentally lost during the
    refactoring to share the file encryption code between ext4 and f2fs.

    This patch restores the !S_ISDIR() check that was present in older
    kernels.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     
  • On an ext4 or f2fs filesystem with file encryption supported, a user
    could set an encryption policy on any empty directory(*) to which they
    had readonly access. This is obviously problematic, since such a
    directory might be owned by another user and the new encryption policy
    would prevent that other user from creating files in their own directory
    (for example).

    Fix this by requiring inode_owner_or_capable() permission to set an
    encryption policy. This means that either the caller must own the file,
    or the caller must have the capability CAP_FOWNER.

    (*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.

    Signed-off-by: Eric Biggers
    Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
    Signed-off-by: Theodore Ts'o

    Eric Biggers
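    A simplified model of the permission rule: the caller must own the file or hold CAP_FOWNER, so read-only access no longer suffices. The struct and function names are hypothetical, and the real inode_owner_or_capable() also accounts for user namespaces, which this sketch ignores.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical caller credentials. */
struct caller {
    uid_t fsuid;
    bool cap_fowner;
};

bool owner_or_capable(const struct caller *c, uid_t inode_owner)
{
    return c->fsuid == inode_owner || c->cap_fowner;
}
```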
     
  • Bad blocks can be injected via /sys/block/pmemN/badblocks. When legacy
    pmem is used, or a pmem region has been created with the memmap kernel
    parameter, the injected bad blocks are not cleared because
    nvdimm_clear_poison() fails for lack of an ndctl function pointer. In
    this case we should simply return as handled and allow the bad blocks
    to be cleared, rather than fail.

    Reviewed-by: Vishal Verma
    Signed-off-by: Dave Jiang
    Signed-off-by: Dan Williams

    Dave Jiang
     
  • The check for a 'pmem' type SPA in the MCE handler was inverted due to a
    merge/rebase error.

    Fixes: 6839a6d ("nfit: do an ARS scrub on hitting a latent media error")
    Cc: linux-acpi@vger.kernel.org
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
    uncacheable rendering them impractical for application usage. DAX-pte
    mappings are cached and the goal of establishing DAX-pmd mappings is to
    attain more performance, not dramatically less (3 orders of magnitude).

    track_pfn_insert() relies on a previous call to reserve_memtype() to
    establish the expected page_cache_mode for the range. While memremap()
    arranges for reserve_memtype() to be called, devm_memremap_pages() does
    not. So, teach track_pfn_insert() and untrack_pfn() how to handle
    tracking without a vma, and arrange for devm_memremap_pages() to
    establish the write-back-cache reservation in the memtype tree.

    Cc:
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Nilesh Choudhury
    Cc: Kirill A. Shutemov
    Reported-by: Toshi Kani
    Reported-by: Kai Zhang
    Acked-by: Andrew Morton
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Attempting to dump /proc//smaps for a process with pmd dax mappings
    currently results in the following VM_BUG_ONs:

    kernel BUG at mm/huge_memory.c:1105!
    task: ffff88045f16b140 task.stack: ffff88045be14000
    RIP: 0010:[] [] follow_trans_huge_pmd+0x2cb/0x340
    [..]
    Call Trace:
    [] smaps_pte_range+0xa0/0x4b0
    [] ? vsnprintf+0x255/0x4c0
    [] __walk_page_range+0x1fe/0x4d0
    [] walk_page_vma+0x62/0x80
    [] show_smap+0xa6/0x2b0

    kernel BUG at fs/proc/task_mmu.c:585!
    RIP: 0010:[] [] smaps_pte_range+0x499/0x4b0
    Call Trace:
    [] ? vsnprintf+0x255/0x4c0
    [] __walk_page_range+0x1fe/0x4d0
    [] walk_page_vma+0x62/0x80
    [] show_smap+0xa6/0x2b0

    These locations are sanity checking page flags that must be set for an
    anonymous transparent huge page, but are not set for the zone_device
    pages associated with dax mappings.

    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Acked-by: Andrew Morton
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Pull virtio fixes from Michael Tsirkin:
    "This includes a couple of bugfixes for virtio.

    The virtio console patch is actually also in x86/tip targeting 4.9
    because it helps vmap stacks, but it also fixes IOMMU_PLATFORM which
    was added in 4.8, and it seems important not to ship that in a broken
    configuration"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_console: Stop doing DMA on the stack
    virtio: mark vring_dma_dev() static

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "This includes a PM QoS framework fix from Tejun to prevent interrupts
    from being enabled unexpectedly during early boot and a cpufreq
    documentation fix.

    Specifics:

    - If the PM QoS framework invokes cancel_delayed_work_sync() during
    early boot, it will enable interrupts which is not expected at that
    point, so prevent it from happening (Tejun Heo)

    - Fix cpufreq statistic documentation to follow a recent change in
    behavior that forgot to update it as appropriate (Jean Delvare)"

    * tag 'pm-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpufreq-stats: Minor documentation fix
    PM / QoS: avoid calling cancel_delayed_work_sync() during early boot

    Linus Torvalds
     
  • * pm-core-fixes:
    PM / QoS: avoid calling cancel_delayed_work_sync() during early boot

    * pm-cpufreq-fixes:
    cpufreq-stats: Minor documentation fix

    Rafael J. Wysocki
     
  • Pull GPIO fixes from Linus Walleij:
    "Some GPIO fixes that have been boiling the last two weeks or so.
    Nothing special, I'm trying to sort out some Kconfig business and
    Russell needs a fix in for his SA1100 rework.

    Summary:

    - Revert a pointless attempt to add an include to solve the UM allyes
    compilation problem.

    - Make the mcp23s08 depend on OF_GPIO as it uses it and doesn't
    compile properly without it.

    - Fix a probing problem for ucb1x00"

    * tag 'gpio-v4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
    gpio: sa1100: fix irq probing for ucb1x00
    gpio: mcp23s08: make driver depend on OF_GPIO
    Revert "gpio: include in gpiolib-of"

    Linus Torvalds
     
  • Pull fuse fix from Miklos Szeredi:
    "This fixes a deadlock when fuse, direct I/O and loop device are
    combined"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: direct-io: don't dirty ITER_BVEC pages

    Linus Torvalds
     
  • Pull overlayfs fix from Miklos Szeredi:
    "This fixes a regression caused by the last pull request"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix workdir creation

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "I'm not proud of how long it took me to track down that one liner in
    btrfs_sync_log(), but the good news is the patches I was trying to
    blame for these problems were actually fine (sorry Filipe)"

    * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress
    btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
    btrfs: do not decrease bytes_may_use when replaying extents

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "We've got quite a few fixes at this time, and all are stable patches.

    syzkaller strikes back again (episode 19 or so), and we had to plug
    some holes in ALSA core part (mostly timer).

    In addition, a couple of FireWire audio fixes for the invalid copy
    user calls in locks, and a few quirks for HD-audio and USB-audio as
    usual are included"

    * tag 'sound-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: rawmidi: Fix possible deadlock with virmidi registration
    ALSA: timer: Fix zero-division by continue of uninitialized instance
    ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
    ALSA: fireworks: accessing to user space outside spinlock
    ALSA: firewire-tascam: accessing to user space outside spinlock
    ALSA: hda - Enable subwoofer on Dell Inspiron 7559
    ALSA: hda - Add headset mic quirk for Dell Inspiron 5468
    ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114
    ALSA: timer: fix NULL pointer dereference on memory allocation failure
    ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE

    Linus Torvalds
     
  • virtio_console uses a small DMA buffer for control requests. Move
    that buffer into heap memory.

    Doing virtio DMA on the stack is normally okay on non-DMA-API virtio
    systems (which is currently most of them), but it breaks completely
    if the stack is virtually mapped.

    Tested by typing both directions using picocom aimed at /dev/hvc0.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Amit Shah

    Andy Lutomirski
     
  • We get one warning when building the kernel with W=1:
    drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes]

    In fact, this function is only used in the file in which it is
    defined, so it doesn't need a declaration and can be made static.
    This patch marks the function 'static'.

    Signed-off-by: Baoyou Xie
    Acked-by: Arnd Bergmann
    Signed-off-by: Michael S. Tsirkin

    Baoyou Xie
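    A small illustration of why marking a file-local helper static silences the warning: -Wmissing-prototypes only applies to functions with external linkage, so a static helper needs no separate declaration, while an exported function still gets a prototype (normally in a header). The functions here are hypothetical stand-ins, not vring_dma_dev().

```c
/* Exported function: declared before its definition, as a header would. */
int ring_next(int head, int size);

/* File-local helper: internal linkage, so no prototype is required. */
static int ring_index(int head, int size)
{
    return head % size;
}

int ring_next(int head, int size)
{
    return ring_index(head + 1, size);
}
```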