16 Jan, 2016

40 commits

  • As far as I can see there are no users of PG_reserved on compound pages.
    Let's use PF_NO_COMPOUND here.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Cc: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PG_pinned and PG_savepinned are about page table's pages which are never
    compound.

    I'm not so sure about PG_foreign, but it seems we shouldn't see compound
    pages there too.

    Let's use PF_NO_COMPOUND for all of them.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • SL*B uses compound pages and marks head pages with PG_slab.
    __SetPageSlab() and __ClearPageSlab() are never called for tail pages.

    The same applies to PG_slob_free in the SLOB allocator.

    PF_NO_TAIL is appropriate for these flags.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Only head pages are ever on LRU. Let's use PF_HEAD policy to avoid any
    confusion for all LRU-related flags.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • It seems we don't currently have compound pages on the FS/IO path. Use
    PF_NO_COMPOUND to catch it if we do.

    The odd exception is PG_dirty: sound uses compound pages and maps them
    with PTEs. PF_NO_COMPOUND triggers VM_BUG_ON() in set_page_dirty() when
    handling a shared fault. Let's use PF_HEAD for PG_dirty.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • lock_page() must operate on the whole compound page. It doesn't make
    much sense to lock part of a compound page. Change the code to use the
    head page's PG_locked if a tail page is passed.

    This patch also gets rid of custom helper functions --
    __set_page_locked() and __clear_page_locked(). They are replaced with
    helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Passing tail pages
    to these helpers would trigger VM_BUG_ON().

    SLUB uses PG_locked as a bit spin lock. IIUC, tail pages should never
    appear there. VM_BUG_ON() is added to make sure that this assumption is
    correct.

    [akpm@linux-foundation.org: fix fs/cifs/file.c]
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
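
The head-page redirection described in the lock_page() entry above can be sketched in user space. This is a minimal model, not the kernel's code: `struct page` here is a simplified stand-in, `PG_LOCKED_BIT` and `page_init()` are hypothetical names, and C11 atomics stand in for the kernel's bit-lock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_LOCKED_BIT 0UL /* hypothetical bit number for this sketch */

/* Simplified stand-in for struct page; not the kernel's layout. */
struct page {
    atomic_ulong flags;
    struct page *head; /* NULL unless this is a tail page */
};

static void page_init(struct page *p, struct page *head)
{
    atomic_init(&p->flags, 0UL);
    p->head = head;
}

/* Tail pages resolve to their head; every other page resolves to itself. */
static struct page *compound_head(struct page *p)
{
    return p->head ? p->head : p;
}

/* Returns true if the caller took the lock. Always acts on the head page. */
static bool trylock_page(struct page *p)
{
    struct page *h = compound_head(p);
    unsigned long old = atomic_fetch_or(&h->flags, 1UL << PG_LOCKED_BIT);

    return !(old & (1UL << PG_LOCKED_BIT));
}

static void unlock_page(struct page *p)
{
    struct page *h = compound_head(p);

    atomic_fetch_and(&h->flags, ~(1UL << PG_LOCKED_BIT));
}

static bool page_locked(struct page *p)
{
    struct page *h = compound_head(p);

    return (atomic_load(&h->flags) >> PG_LOCKED_BIT) & 1UL;
}
```

In this model, locking through a tail page and then trying to lock the head fails, because both calls contend on the same bit in the head page's flags.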
     
  • This patch adds a third argument to macros which create function
    definitions for page flags. This argument defines how page-flags
    helpers behave on compound pages.

    For now we define four policies:

    - PF_ANY: the helper function operates on the page it gets, regardless
      of whether it is non-compound, head or tail.

    - PF_HEAD: the helper function operates on the head page of the
      compound page if it gets a tail page.

    - PF_NO_TAIL: only head and non-compound pages are acceptable for this
      helper function.

    - PF_NO_COMPOUND: only non-compound pages are acceptable for this
      helper function.

    For now we use policy PF_ANY for all helpers, which matches current
    behaviour.

    We do not enforce the policy for TESTPAGEFLAG, because we have flags
    checked for random pages all over the kernel. A notable exception to
    this is PageTransHuge(), which triggers VM_BUG_ON() for a tail page.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
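
The four policies listed in the entry above can be modelled as a small user-space sketch. The names `pf_resolve()` and `set_page_flag()` and the layout of `struct page` are assumptions made for illustration; the kernel expresses the policies as macros, and a violated policy triggers VM_BUG_ON() rather than returning an error as it does here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page; not the kernel's layout. */
struct page {
    unsigned long flags;
    struct page *head; /* points at the head page for tail pages, else NULL */
    bool is_head;      /* true for the head page of a compound page */
};

enum pf_policy { PF_ANY, PF_HEAD, PF_NO_TAIL, PF_NO_COMPOUND };

static bool page_tail(const struct page *p)
{
    return p->head != NULL;
}

static bool page_compound(const struct page *p)
{
    return p->is_head || page_tail(p);
}

static struct page *compound_head(struct page *p)
{
    return page_tail(p) ? p->head : p;
}

/*
 * Pick the page a flag helper should act on under the given policy.
 * A NULL return models the point where the kernel would hit VM_BUG_ON().
 */
static struct page *pf_resolve(struct page *p, enum pf_policy policy)
{
    switch (policy) {
    case PF_ANY:
        return p;
    case PF_HEAD:
        return compound_head(p);
    case PF_NO_TAIL:
        return page_tail(p) ? NULL : p;
    case PF_NO_COMPOUND:
        return page_compound(p) ? NULL : p;
    }
    return NULL;
}

/* Set a flag bit under a policy; false means the policy rejected the page. */
static bool set_page_flag(struct page *p, unsigned int bit,
                          enum pf_policy policy)
{
    struct page *target = pf_resolve(p, policy);

    if (!target)
        return false;
    target->flags |= 1UL << bit;
    return true;
}
```

With a head page, one of its tail pages and an ordinary page, setting a bit on the tail under PF_HEAD changes the head's flags, while PF_NO_TAIL rejects the tail and PF_NO_COMPOUND rejects both the head and the tail.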
     
  • The preparation patch: we are going to use compound_head(), PageTail()
    and PageCompound() to define page-flags helpers.

    Let's define them before the macros.

    We cannot use the PageHead() helper in PageCompound() as it's not yet
    defined -- use test_bit(PG_head, &page->flags) instead.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
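
The ordering constraint in the entry above can be illustrated with a user-space sketch in which PageCompound() tests the PG_head bit directly instead of calling a generated PageHead() helper. The bit number, the `struct page` layout and the convention of tagging a tail page's head pointer with bit 0 are assumptions of this sketch, not details stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PG_head = 0 }; /* hypothetical bit number for this sketch */

/* Simplified stand-in for struct page; not the kernel's layout. */
struct page {
    unsigned long flags;
    uintptr_t compound_head; /* tail pages: address of the head page | 1 */
};

static bool test_bit(unsigned int nr, const unsigned long *addr)
{
    return (*addr >> nr) & 1UL;
}

/* Resolve a tail page to its head; any other page resolves to itself. */
static struct page *compound_head(struct page *page)
{
    uintptr_t head = page->compound_head;

    if (head & 1)
        return (struct page *)(head - 1);
    return page;
}

static bool PageTail(const struct page *page)
{
    return page->compound_head & 1;
}

/*
 * Uses a raw bit test on PG_head: in the real header the macro-generated
 * PageHead() helper is defined further down, so it cannot be called here.
 */
static bool PageCompound(const struct page *page)
{
    return test_bit(PG_head, &page->flags) || PageTail(page);
}
```

The three functions depend only on the raw flag word and the tagged pointer, which is what allows them to be defined ahead of the helper-generating macros.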
     
  • Use TESTPAGEFLAG_FALSE() to make it a bit cleaner.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Pull networking fixes from David Miller:
    "A quick set of bug fixes after the initial networking merge:

    1) Netlink multicast group storage allocator only was tested with
    nr_groups equal to 1, make it work for other values too. From
    Matti Vaittinen.

    2) Check build_skb() return value in macb and hip04_eth drivers, from
    Weidong Wang.

    3) Don't leak x25_asy on x25_asy_open() failure.

    4) More DMA map/unmap fixes in 3c59x from Neil Horman.

    5) Don't clobber IP skb control block during GSO segmentation, from
    Konstantin Khlebnikov.

    6) ECN helpers for ipv6 don't fixup the checksum, from Eric Dumazet.

    7) Fix SKB segment utilization estimation in xen-netback, from David
    Vrabel.

    8) Fix lockdep splat in bridge addrlist handling, from Nikolay
    Aleksandrov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
    bgmac: Fix reversed test of build_skb() return value.
    bridge: fix lockdep addr_list_lock false positive splat
    net: smsc: Add support h8300
    xen-netback: free queues after freeing the net device
    xen-netback: delete NAPI instance when queue fails to initialize
    xen-netback: use skb to determine number of required guest Rx requests
    net: sctp: Move sequence start handling into sctp_transport_get_idx()
    ipv6: update skb->csum when CE mark is propagated
    net: phy: turn carrier off on phy attach
    net: macb: clear interrupts when disabling them
    sctp: support to lookup with ep+paddr in transport rhashtable
    net: hns: fixes no syscon error when init mdio
    dts: hisi: fixes no syscon fault when init mdio
    net: preserve IP control block during GSO segmentation
    fsl/fman: Delete one function call "put_device" in dtsec_config()
    hip04_eth: fix missing error handle for build_skb failed
    3c59x: fix another page map/single unmap imbalance
    3c59x: balance page maps and unmaps
    x25_asy: Free x25_asy on x25_asy_open() failure.
    mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
    ...

    Linus Torvalds
     
  • Pull sparc fixes from David Miller:
    "Two sparc bug fixes"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Fix numa node distance initialization
    sparc64: fix incorrect sign extension in sys_sparc64_personality

    Linus Torvalds
     
  • Pull powerpc updates from Michael Ellerman:
    "Core:
    - Ground work for the new Power9 MMU from Aneesh Kumar K.V
    - Optimise FP/VMX/VSX context switching from Anton Blanchard

    Misc:
    - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica
    Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling,
    Andrew Donnellan
    - Allow wrapper to work on non-english system from Laurent Vivier
    - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
    - Fix module autoload for rackmeter & axonram drivers from Luis de
    Bethencourt
    - Include KVM guest test in all interrupt vectors from Paul Mackerras
    - Fix DSCR inheritance over fork() from Anton Blanchard
    - Make value-returning atomics & {cmp}xchg* & their atomic_ versions
    fully ordered from Boqun Feng
    - Print MSR TM bits in oops messages from Michael Neuling
    - Add TM signal return & invalid stack selftests from Michael Neuling
    - Limit EPOW reset event warnings from Vipin K Parashar
    - Remove the Cell QPACE code from Rashmica Gupta
    - Append linux_banner to exception information in xmon from Rashmica
    Gupta
    - Add selftest to check if VSRs are corrupted from Rashmica Gupta
    - Remove broken GregorianDay() from Daniel Axtens
    - Import Anton's context_switch2 benchmark into selftests from
    Michael Ellerman
    - Add selftest script to test HMI functionality from Daniel Axtens
    - Remove obsolete OPAL v2 support from Stewart Smith
    - Make enter_rtas() private from Michael Ellerman
    - PPR exception cleanups from Michael Ellerman
    - Add page soft dirty tracking from Laurent Dufour
    - Add support for Nvlink NPUs from Alistair Popple
    - Add support for kexec on 476fpe from Alistair Popple
    - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
    - Copy only required pieces of the mm_context_t to the paca from
    Michael Neuling
    - Add a kmsg_dumper that flushes OPAL console output on panic from
    Russell Currey
    - Implement save_stack_trace_regs() to enable kprobe stack tracing
    from Steven Rostedt
    - Add HWCAP bits for Power9 from Michael Ellerman
    - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
    - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
    - scripts/recordmcount.pl: support data in text section on powerpc
    from Ulrich Weigand
    - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand

    cxl:
    - cxl: Fix possible idr warning when contexts are released from
    Vaibhav Jain
    - cxl: use correct operator when writing pcie config space values
    from Andrew Donnellan
    - cxl: Fix DSI misses when the context owning task exits from Vaibhav
    Jain
    - cxl: fix build for GCC 4.6.x from Brian Norris
    - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
    - cxl: Enable PCI device ID for future IBM CXL adapter from Uma
    Krishnan

    Freescale:
    - Freescale updates from Scott: Highlights include moving QE code out
    of arch/powerpc (to be shared with arm), device tree updates, and
    minor fixes"

    * tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits)
    powerpc/module: Handle R_PPC64_ENTRY relocations
    scripts/recordmcount.pl: support data in text section on powerpc
    powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
    powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
    powerpc/mm: Fix _PAGE_PTE breaking swapoff
    cxl: Enable PCI device ID for future IBM CXL adapter
    cxl: use -Werror only with CONFIG_PPC_WERROR
    cxl: fix build for GCC 4.6.x
    powerpc: Add HWCAP bits for Power9
    powerpc/powernv: Reserve PE#0 on NPU
    powerpc/powernv: Change NPU PE# assignment
    powerpc/powernv: Fix update of NVLink DMA mask
    powerpc/powernv: Remove misleading comment in pci.c
    powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
    powerpc: Fix build break due to paca mm_context_t changes
    cxl: Fix DSI misses when the context owning task exits
    MAINTAINERS: Update Scott Wood's e-mail address
    powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
    powerpc: Fix style of self-test config prompts
    powerpc/powernv: Only delay opal_rtc_read() retry when necessary
    ...

    Linus Torvalds
     
  • Fixes: f1640c3ddeec ("bgmac: fix a missing check for build_skb")
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull VFIO updates from Alex Williamson:

    - Fixes in AMD xgbe reset, spapr structure padding, type 1 flags (Dan
    Carpenter, Alexey Kardashevskiy, Pierre Morel)

    - Re-introduce no-iommu mode, with a user this time (Alex Williamson)

    * tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio:
    vfio/iommu_type1: make use of info.flags
    vfio: Include No-IOMMU mode
    vfio: Add explicit alignments in vfio_iommu_spapr_tce_create
    VFIO: platform: reset: fix a warning message condition

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Smaller bugfixes and cleanup, including a fix for failures of
    kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
    storms that can affect some high-availability NFS setups"

    * tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
    nfsd: add new io class tracepoint
    nfsd: give up on CB_LAYOUTRECALLs after two lease periods
    nfsd: Fix nfsd leaks sunrpc module references
    lockd: constify nlmsvc_binding structure
    lockd: use to_delayed_work
    nfsd: use to_delayed_work
    Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
    lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
    nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
    sunrpc: Add a function to close temporary transports immediately
    nfsd: don't base cl_cb_status on stale information
    nfsd4: fix gss-proxy 4.1 mounts for some AD principals
    nfsd: fix unlikely NULL deref in mach_creds_match
    nfsd: minor consolidation of mach_cred handling code
    nfsd: helper for dup of possibly NULL string
    svcrpc: move some initialization to common code
    nfsd: fix a warning message
    nfsd: constify nfsd4_callback_ops structure
    nfsd: recover: constify nfsd4_client_tracking_ops structures
    svcrdma: Do not send XDR roundup bytes for a write chunk

    Linus Torvalds
     
  • Pull vfs regression fix from Al Viro:
    "Fix for braino introduced in vfs.git#work.misc"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    amdkfd: Copy from the proper user command pointer

    Linus Torvalds
     
  • After promisc mode management was introduced a bridge device could do
    dev_set_promiscuity from its ndo_change_rx_flags() callback which in
    turn can be called after the bridge's addr_list_lock has been taken
    (e.g. by dev_uc_add). This causes a false positive lockdep splat because
    the port interfaces' addr_list_lock is taken when br_manage_promisc()
    runs after the bridge's addr list lock was already taken.
    To remove the false positive introduce a custom bridge addr_list_lock
    class and set it on bridge init.
    A simple way to reproduce this is with the following:
    $ brctl addbr br0
    $ ip l add l br0 br0.100 type vlan id 100
    $ ip l set br0 up
    $ ip l set br0.100 up
    $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
    $ brctl addif br0 eth0
    Splat:
    [ 43.684325] =============================================
    [ 43.684485] [ INFO: possible recursive locking detected ]
    [ 43.684636] 4.4.0-rc8+ #54 Not tainted
    [ 43.684755] ---------------------------------------------
    [ 43.684906] brctl/1187 is trying to acquire lock:
    [ 43.685047] (_xmit_ETHER){+.....}, at: [] dev_set_rx_mode+0x1e/0x40
    [ 43.685460] but task is already holding lock:
    [ 43.685618] (_xmit_ETHER){+.....}, at: [] dev_uc_add+0x27/0x80
    [ 43.686015] other info that might help us debug this:
    [ 43.686316] Possible unsafe locking scenario:

    [ 43.686743] CPU0
    [ 43.686967] ----
    [ 43.687197] lock(_xmit_ETHER);
    [ 43.687544] lock(_xmit_ETHER);
    [ 43.687886] *** DEADLOCK ***

    [ 43.688438] May be due to missing lock nesting notation

    [ 43.688882] 2 locks held by brctl/1187:
    [ 43.689134] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
    [ 43.689852] #1: (_xmit_ETHER){+.....}, at: [] dev_uc_add+0x27/0x80
    [ 43.690575] stack backtrace:
    [ 43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
    [ 43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
    [ 43.691770] ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
    [ 43.692425] ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
    [ 43.693071] ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
    [ 43.693709] Call Trace:
    [ 43.693931] [] dump_stack+0x4b/0x70
    [ 43.694199] [] __lock_acquire+0x1e46/0x1e90
    [ 43.694483] [] ? netlink_broadcast_filtered+0x139/0x3e0
    [ 43.694789] [] ? nlmsg_notify+0x5a/0xc0
    [ 43.695064] [] lock_acquire+0xe5/0x1f0
    [ 43.695340] [] ? dev_set_rx_mode+0x1e/0x40
    [ 43.695623] [] _raw_spin_lock_bh+0x45/0x80
    [ 43.695901] [] ? dev_set_rx_mode+0x1e/0x40
    [ 43.696180] [] dev_set_rx_mode+0x1e/0x40
    [ 43.696460] [] dev_set_promiscuity+0x3c/0x50
    [ 43.696750] [] br_port_set_promisc+0x25/0x50 [bridge]
    [ 43.697052] [] br_manage_promisc+0x8a/0xe0 [bridge]
    [ 43.697348] [] br_dev_change_rx_flags+0x1e/0x20 [bridge]
    [ 43.697655] [] __dev_set_promiscuity+0x132/0x1f0
    [ 43.697943] [] __dev_set_rx_mode+0x82/0x90
    [ 43.698223] [] dev_uc_add+0x5e/0x80
    [ 43.698498] [] vlan_device_event+0x542/0x650 [8021q]
    [ 43.698798] [] notifier_call_chain+0x5d/0x80
    [ 43.699083] [] raw_notifier_call_chain+0x16/0x20
    [ 43.699374] [] call_netdevice_notifiers_info+0x6e/0x80
    [ 43.699678] [] call_netdevice_notifiers+0x16/0x20
    [ 43.699973] [] br_add_if+0x47e/0x4c0 [bridge]
    [ 43.700259] [] add_del_if+0x6e/0x80 [bridge]
    [ 43.700548] [] br_dev_ioctl+0xaf/0xc0 [bridge]
    [ 43.700836] [] dev_ifsioc+0x30c/0x3c0
    [ 43.701106] [] dev_ioctl+0xf9/0x6f0
    [ 43.701379] [] ? mntput_no_expire+0x5/0x450
    [ 43.701665] [] ? mntput_no_expire+0xae/0x450
    [ 43.701947] [] sock_do_ioctl+0x42/0x50
    [ 43.702219] [] sock_ioctl+0x1e5/0x290
    [ 43.702500] [] do_vfs_ioctl+0x2cb/0x5c0
    [ 43.702771] [] SyS_ioctl+0x79/0x90
    [ 43.703033] [] entry_SYSCALL_64_fastpath+0x16/0x7a

    CC: Vlad Yasevich
    CC: Stephen Hemminger
    CC: Bridge list
    CC: Andy Gospodarek
    CC: Roopa Prabhu
    Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.")
    Reported-by: Andy Gospodarek
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
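
Why a separate lock class removes the false positive can be shown with a toy model. This is not how lockdep is implemented; it only mimics the one rule relevant here, namely that acquiring a lock whose class is already held is reported as possible recursive locking. All names below (`toy_lock`, `toy_acquire()`, `toy_release_all()`) are hypothetical. In the kernel, a custom class is typically attached with lockdep_set_class() and a static struct lock_class_key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the address of a key matters: it identifies a lock class. */
struct lock_class_key { char dummy; };

struct toy_lock {
    const struct lock_class_key *key;
};

#define MAX_HELD 8
static const struct lock_class_key *held[MAX_HELD];
static size_t nr_held;

/*
 * Returns false if a lock of the same class is already held, which is
 * the condition reported as "possible recursive locking detected".
 */
static bool toy_acquire(const struct toy_lock *lock)
{
    for (size_t i = 0; i < nr_held; i++)
        if (held[i] == lock->key)
            return false;

    assert(nr_held < MAX_HELD);
    held[nr_held++] = lock->key;
    return true;
}

static void toy_release_all(void)
{
    nr_held = 0;
}
```

If the bridge and its port share one class, taking the port's lock while holding the bridge's lock is flagged even though they are different locks. Giving the bridge its own class makes the same sequence pass.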
     
  • Pull md updates from Neil Brown:
    "Mostly clustered-raid1 and raid5 journal updates. One Y2038 fix and
    other minor stuff.

    One patch removes me from the MAINTAINERS file and adds a record of my
    md maintainership to Credits"

    Many thanks to Neil, who has been around for a _looong_ time.

    * tag 'md/4.5' of git://neil.brown.name/md: (26 commits)
    md/raid: only permit hot-add of compatible integrity profiles
    Remove myself as MD Maintainer, and add to Credits.
    raid5-cache: handle journal hotadd in quiesce
    MD: add journal with array suspended
    md: set MD_HAS_JOURNAL in correct places
    md: Remove 'ready' field from mddev.
    md: remove unnecesary md_new_event_inintr
    raid5: allow r5l_io_unit allocations to fail
    raid5-cache: use a mempool for the metadata block
    raid5-cache: use a bio_set
    raid5-cache: add journal hot add/remove support
    drivers: md: use ktime_get_real_seconds()
    md: avoid warning for 32-bit sector_t
    raid5-cache: free meta_page earlier
    raid5-cache: simplify r5l_move_io_unit_list
    md: update comment for md_allow_write
    md-cluster: update comments for MD_CLUSTER_SEND_LOCKED_ALREADY
    md-cluster: Protect communication with mutexes
    md-cluster: Defer MD reloading to mddev->thread
    md-cluster: update the documentation
    ...

    Linus Torvalds
     
  • Pull regulator updates from Mark Brown:
    "Aside from a fix for a spurious warning (which caused more problems
    than it fixed in the fixing really) this is all driver updates,
    including new drivers for Dialog PV88060/90 and TI LM363x and TPS65086
    devices. The qcom_smd driver has had PM8916 and PMA8084 support
    added"

    * tag 'regulator-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (36 commits)
    regulator: core: remove some dead code
    regulator: core: use dev_to_rdev
    regulator: lp872x: Get rid of duplicate reference to DVS GPIO
    regulator: lp872x: Add missing of_match in regulators descriptions
    regulator: axp20x: Fix GPIO LDO enable value for AXP22x
    regulator: lp8788: constify regulator_ops structures
    regulator: wm8*: constify regulator_ops structures
    regulator: da9*: constify regulator_ops structures
    regulator: mt6311: Use REGCACHE_RBTREE
    regulator: tps65917/palmas: Add bypass ops for LDOs with bypass capability
    regulator: qcom-smd: Add support for PMA8084
    regulator: qcom-smd: Add PM8916 support
    soc: qcom: documentation: Update SMD/RPM Docs
    regulator: pv88090: logical vs bitwise AND typo
    regulator: pv88090: Fix irq leak
    regulator: pv88090: new regulator driver
    regulator: wm831x-ldo: Use platform_register/unregister_drivers()
    regulator: wm831x-dcdc: Use platform_register/unregister_drivers()
    regulator: lp8788-ldo: Use platform_register/unregister_drivers()
    regulator: core: Fix nested locking of supplies
    ...

    Linus Torvalds
     
  • Add H8/300 platform support for smc91x

    Signed-off-by: Yoshinori Sato
    Signed-off-by: David S. Miller

    Yoshinori Sato
     
  • 8f1d57c17248 ("amdkfd: don't open-code memdup_user()") mistakenly uses
    an uninitialized local pointer, gcc complains:

    drivers/gpu/drm/amd/amdkfd/kfd_chardev.c: In function ‘kfd_ioctl_dbg_address_watch’:
    drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:562:12: warning: ‘args_buff’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    args_buff = memdup_user(args_buff,
    ^

    Fix it.

    Signed-off-by: Borislav Petkov
    Cc: Al Viro
    Signed-off-by: Al Viro

    Borislav Petkov
     
  • Pull mailbox fixlet from Jassi Brar.

    * 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
    mailbox: constify mbox_chan_ops structure

    Linus Torvalds
     
  • David Vrabel says:

    ====================
    xen-netback: use skb to determine number of required (etc.)

    "xen-netback: use skb to determine number of required" plus two other
    minor fixes I found down the back of the sofa.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If a queue still has a NAPI instance added to the net device, freeing
    the queues early results in a use-after-free.

    This shouldn't ever happen because we disconnect and tear down all queues
    before freeing the net device, but doing this makes it obviously safe.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     
  • When xenvif_connect() fails it may leave a stale NAPI instance added to
    the device. Make sure we delete it in the error path.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Using the MTU or GSO size to determine the number of required guest Rx
    requests for an skb was subtly broken since these values may change at
    runtime.

    After 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b (xen-netback: always
    fully coalesce guest Rx packets) we always fully pack a packet into
    its guest Rx slots. Calculating the number of required slots from the
    packet length is then easy.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
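
Once a packet is always fully packed into its slots, the slot count follows from the packet length alone. The sketch below shows that arithmetic under stated assumptions: a slot holds 4096 bytes, and a GSO packet needs one extra slot for its metadata. `SLOT_SIZE` and `rx_slots_needed()` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT_SIZE 4096u /* assumed bytes of payload per guest Rx slot */

/*
 * Number of Rx slots for a fully coalesced packet: the length rounded
 * up to whole slots, plus one slot for GSO metadata (an assumption of
 * this sketch).
 */
static unsigned int rx_slots_needed(unsigned int pkt_len, bool is_gso)
{
    unsigned int needed = (pkt_len + SLOT_SIZE - 1) / SLOT_SIZE;

    if (is_gso)
        needed++;
    return needed;
}
```

The result depends only on properties of the packet itself, so a later change to the MTU or the GSO size cannot invalidate it.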
     
  • net/sctp/proc.c: In function ‘sctp_transport_get_idx’:
    net/sctp/proc.c:313: warning: ‘obj’ may be used uninitialized in this function

    This is currently a false positive, as all callers check for a zero
    offset first, and handle this case in the exact same way.

    Move the check and handling into sctp_transport_get_idx() to kill the
    compiler warning, and avoid future bugs.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     
  • When a tunnel decapsulates the outer header, it has to comply
    with RFC 6040 and eventually propagate the CE mark into the inner header.

    It turns out IP6_ECN_set_ce() does not correctly update skb->csum
    for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
    messages and stack traces.

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
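
The checksum problem can be reproduced in miniature: when a header word is modified after a full-packet checksum was computed, the stored sum must be patched by removing the old word and adding the new one. The sketch below uses a 16-bit ones' complement sum over a small buffer. It is a simplified model, not the kernel's 32-bit skb->csum handling, and every function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide sum down to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum over an even-length buffer of big-endian words. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    return fold16(sum);
}

/* Incremental update: subtract the old word, add the new one. */
static uint16_t csum_update16(uint16_t sum, uint16_t from, uint16_t to)
{
    return fold16((uint64_t)sum + (uint16_t)~from + to);
}

/* ECN field position within the first 16-bit word of an IPv6 header. */
#define ECN_MASK_WORD0 0x0030u

/*
 * Set the CE codepoint and patch the stored checksum to match.
 * Returns 0 and changes nothing for Not-ECT packets, 1 otherwise.
 */
static int set_ce_and_fix_csum(uint8_t *hdr, uint16_t *csum)
{
    uint16_t from = (uint16_t)((hdr[0] << 8) | hdr[1]);
    uint16_t to;

    if ((from & ECN_MASK_WORD0) == 0)
        return 0;

    to = from | ECN_MASK_WORD0;
    hdr[0] = (uint8_t)(to >> 8);
    hdr[1] = (uint8_t)(to & 0xff);
    *csum = csum_update16(*csum, from, to);
    return 1;
}
```

Skipping the `csum_update16()` step leaves the stored sum describing the old header, which is the kind of mismatch that surfaces as a checksum failure later on.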
     
  • Pull UDF fixes and quota cleanups from Jan Kara:
    "Several UDF fixes and some minor quota cleanups"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: Check output buffer length when converting name to CS0
    udf: Prevent buffer overrun with multi-byte characters
    quota: constify qtree_fmt_operations structures
    udf: avoid uninitialized variable use
    udf: Fix lost indirect extent block
    udf: Factor out code for creating indirect extent
    udf: limit the maximum number of indirect extents in a row
    udf: limit the maximum number of TD redirections
    fs: make quota/dquot.c explicitly non-modular
    fs: make quota/netlink.c explicitly non-modular

    Linus Torvalds
     
  • The operstate of a networking device is initially IF_OPER_UNKNOWN, aka
    "unknown", and is updated on carrier state changes (with the carrier
    state being on by default). This means it will stay unknown unless the
    carrier state goes to off at some point, which is not the case if the phy
    is already up/connected at startup.

    Explicitly turn off the carrier on phy attach, leaving the phy state
    machine to turn the carrier on when it has done the initial negotiation.

    Signed-off-by: Sjoerd Simons
    Signed-off-by: David S. Miller

    Sjoerd Simons
     
  • Disabling interrupts with the IDR register does not stop the macb hardware
    from asserting its interrupt line if there are interrupts pending. Always
    clear the interrupts using ISR, and be sure to write it on hardware that
    is not read-to-clear, like Zynq. Not doing so will cause interrupts when
    the driver doesn't expect them.

    Signed-off-by: Nathan Sullivan
    Acked-by: Nicolas Ferre
    Signed-off-by: David S. Miller

    Nathan Sullivan
     
  • Merge first patch-bomb from Andrew Morton:

    - A few hotfixes which missed 4.4 because I was asleep. cc'ed to
    -stable

    - A few misc fixes

    - OCFS2 updates

    - Part of MM. Including pretty large changes to page-flags handling
    and to thp management which have been buffered up for 2-3 cycles now.

    I have a lot of MM material this time.

    [ It turns out the THP part wasn't quite ready, so that got dropped from
    this series - Linus ]

    * emailed patches from Andrew Morton : (117 commits)
    zsmalloc: reorganize struct size_class to pack 4 bytes hole
    mm/zbud.c: use list_last_entry() instead of list_tail_entry()
    zram/zcomp: do not zero out zcomp private pages
    zram: pass gfp from zcomp frontend to backend
    zram: try vmalloc() after kmalloc()
    zram/zcomp: use GFP_NOIO to allocate streams
    mm: add tracepoint for scanning pages
    drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
    mm/page_isolation: use macro to judge the alignment
    mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
    mm: rework virtual memory accounting
    include/linux/memblock.h: fix ordering of 'flags' argument in comments
    mm: move lru_to_page to mm_inline.h
    Documentation/filesystems: describe the shared memory usage/accounting
    memory-hotplug: don't BUG() in register_memory_resource()
    hugetlb: make mm and fs code explicitly non-modular
    mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
    mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
    mm: make sure isolate_lru_page() is never called for tail page
    vmstat: make vmstat_updater deferrable again and shut down on idle
    ...

    Linus Torvalds
     
  • Now, when we sendmsg, we translate the ep to laddr by selecting the
    first element of the list, and then do a lookup for a transport.

    But sctp_hash_cmp() will compare it against asoc addr_list, which may
    be a subset of ep addr_list, meaning that this chosen laddr may not be
    there, and thus making it impossible to find the transport.

    So we fix it by using ep + paddr to lookup transports in hashtable. In
    sctp_hash_cmp, if .ep is set, we will check if this ep == asoc->ep,
    or we will do the laddr check.

    Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Reported-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Xin Long
     
  • Reorder the pages_per_zspage field in struct size_class, which
    eliminates the 4-byte hole between it and the stats field.

    Signed-off-by: Weijie Yang
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
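
The effect of such a reordering can be checked with sizeof and offsetof. The two structs below are hypothetical stand-ins, not the real struct size_class: they only reproduce the pattern of three 4-byte fields followed by an 8-byte-aligned member, which leaves a hole on common 64-bit ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a stats block that is 8-byte aligned on LP64 targets. */
struct stats {
    unsigned long objs[4];
};

/* Hypothetical "before" layout: 12 bytes of ints, then aligned stats. */
struct class_before {
    int size;
    unsigned int index;
    int pages_per_zspage; /* a hole follows this field on LP64 */
    struct stats stats;
    bool huge;
};

/* Hypothetical "after" layout: the int moves next to the bool. */
struct class_after {
    int size;
    unsigned int index;
    struct stats stats;
    int pages_per_zspage;
    bool huge;
};

/* Padding between pages_per_zspage and stats in the "before" layout. */
static size_t hole_before_stats(void)
{
    return offsetof(struct class_before, stats) -
           (offsetof(struct class_before, pages_per_zspage) + sizeof(int));
}
```

On a typical x86-64 build the hole is 4 bytes and the reordered struct is 8 bytes smaller. Tools such as pahole report the same information for real kernel structures.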
     
  • list_last_entry() has been defined in list.h, so replace
    list_tail_entry() with it.

    Signed-off-by: Geliang Tang
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geliang Tang
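
For readers unfamiliar with the helper, the sketch below is a user-space rendition of a circular doubly linked list with a list_last_entry() macro. The list primitives follow the shape of the kernel's list.h, while `struct item` and its fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* The last entry is the one linked just before the list head. */
#define list_last_entry(ptr, type, member) \
    list_entry((ptr)->prev, type, member)

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical element type embedding a list node. */
struct item {
    int id;
    struct list_head lru;
};
```

As with the kernel helper, the macro assumes a non-empty list. On an empty list it would compute an entry pointer from the head itself.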
     
  • Do not use __GFP_ZERO when allocating zcomp ->private pages. We keep
    allocated streams around and use them for read/write requests, so we
    supply a zeroed-out ->private to the compression algorithm as a scratch
    buffer only once -- the first time we use that stream. For the rest of
    the IO requests served by this stream, ->private usually contains some
    temporary data from previous requests.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Each zcomp backend uses its own gfp flags, but this is pointless because
    the context in which a backend is called is determined by the upper
    layer (i.e. the zcomp frontend), and the frontend can call backends
    from different contexts. In one context (zram init), the allocation
    should be made as likely to succeed as possible; in another (allocating
    further streams to accelerate I/O), the allocation is merely optional.
    So pass gfp down from the driver (i.e. the zcomp frontend), following
    the normal MM convention.

    [sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
    Signed-off-by: Minchan Kim
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • When using LZ4 multi compression streams for zram swap, we saw page
    allocation failure messages during system tests. This happened
    several times (2 - 5 times per test), and some failures kept
    recurring on order-3 allocation attempts.

    To allocate the private data for parallel compression, we call
    kzalloc() with order 2/3 at runtime (lzo/lz4). If no order 2/3
    memory is available at that moment, the page allocation fails. This
    patch uses vmalloc() as a fallback for kmalloc(), which prevents the
    page allocation failure warning.

    With this change we no longer saw the warning during testing, and
    process startup latency could be reduced by about 60-120ms in each case.

    For reference, a call trace:

    Binder_1: page allocation failure: order:3, mode:0x10c0d0
    CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
    Call trace:
    dump_backtrace+0x0/0x270
    show_stack+0x10/0x1c
    dump_stack+0x1c/0x28
    warn_alloc_failed+0xfc/0x11c
    __alloc_pages_nodemask+0x724/0x7f0
    __get_free_pages+0x14/0x5c
    kmalloc_order_trace+0x38/0xd8
    zcomp_lz4_create+0x2c/0x38
    zcomp_strm_alloc+0x34/0x78
    zcomp_strm_multi_find+0x124/0x1ec
    zcomp_strm_find+0xc/0x18
    zram_bvec_rw+0x2fc/0x780
    zram_make_request+0x25c/0x2d4
    generic_make_request+0x80/0xbc
    submit_bio+0xa4/0x15c
    __swap_writepage+0x218/0x230
    swap_writepage+0x3c/0x4c
    shrink_page_list+0x51c/0x8d0
    shrink_inactive_list+0x3f8/0x60c
    shrink_lruvec+0x33c/0x4cc
    shrink_zone+0x3c/0x100
    try_to_free_pages+0x2b8/0x54c
    __alloc_pages_nodemask+0x514/0x7f0
    __get_free_pages+0x14/0x5c
    proc_info_read+0x50/0xe4
    vfs_read+0xa0/0x12c
    SyS_read+0x44/0x74
    DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
    0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB

    [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
    [sergey.senozhatsky@gmail.com: tweak comments and styles]
    Signed-off-by: Kyeongdon Kim
    Signed-off-by: Minchan Kim
    Acked-by: Sergey Senozhatsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyeongdon Kim
     
  • We can end up allocating a new compression stream with GFP_KERNEL from
    within the IO path, which may result in nested (recursive) IO
    operations. That can cause problems if the IO path in question is a
    reclaimer holding locks that the nested IOs will deadlock on.

    Allocate streams and working memory using GFP_NOIO flag, forbidding
    recursive IO and FS operations.

    An example:

    inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
    git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (jbd2_handle){+.+.?.}, at: start_this_handle+0x4ca/0x555
    {IN-RECLAIM_FS-W} state was registered at:
    __lock_acquire+0x8da/0x117b
    lock_acquire+0x10c/0x1a7
    start_this_handle+0x52d/0x555
    jbd2__journal_start+0xb4/0x237
    __ext4_journal_start_sb+0x108/0x17e
    ext4_dirty_inode+0x32/0x61
    __mark_inode_dirty+0x16b/0x60c
    iput+0x11e/0x274
    __dentry_kill+0x148/0x1b8
    shrink_dentry_list+0x274/0x44a
    prune_dcache_sb+0x4a/0x55
    super_cache_scan+0xfc/0x176
    shrink_slab.part.14.constprop.25+0x2a2/0x4d3
    shrink_zone+0x74/0x140
    kswapd+0x6b7/0x930
    kthread+0x107/0x10f
    ret_from_fork+0x3f/0x70
    irq event stamp: 138297
    hardirqs last enabled at (138297): debug_check_no_locks_freed+0x113/0x12f
    hardirqs last disabled at (138296): debug_check_no_locks_freed+0x33/0x12f
    softirqs last enabled at (137818): __do_softirq+0x2d3/0x3e9
    softirqs last disabled at (137813): irq_exit+0x41/0x95

    other info that might help us debug this:
    Possible unsafe locking scenario:
    CPU0
    ----
    lock(jbd2_handle);

    lock(jbd2_handle);

    *** DEADLOCK ***
    5 locks held by git/20158:
    #0: (sb_writers#7){.+.+.+}, at: [] mnt_want_write+0x24/0x4b
    #1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [] lock_rename+0xd9/0xe3
    #2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [] lock_two_nondirectories+0x3f/0x6b
    #3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [] lock_two_nondirectories+0x66/0x6b
    #4: (jbd2_handle){+.+.?.}, at: [] start_this_handle+0x4ca/0x555

    stack backtrace:
    CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
    Call Trace:
    dump_stack+0x4c/0x6e
    mark_lock+0x384/0x56d
    mark_held_locks+0x5f/0x76
    lockdep_trace_alloc+0xb2/0xb5
    kmem_cache_alloc_trace+0x32/0x1e2
    zcomp_strm_alloc+0x25/0x73 [zram]
    zcomp_strm_multi_find+0xe7/0x173 [zram]
    zcomp_strm_find+0xc/0xe [zram]
    zram_bvec_rw+0x2ca/0x7e0 [zram]
    zram_make_request+0x1fa/0x301 [zram]
    generic_make_request+0x9c/0xdb
    submit_bio+0xf7/0x120
    ext4_io_submit+0x2e/0x43
    ext4_bio_write_page+0x1b7/0x300
    mpage_submit_page+0x60/0x77
    mpage_map_and_submit_buffers+0x10f/0x21d
    ext4_writepages+0xc8c/0xe1b
    do_writepages+0x23/0x2c
    __filemap_fdatawrite_range+0x84/0x8b
    filemap_flush+0x1c/0x1e
    ext4_alloc_da_blocks+0xb8/0x117
    ext4_rename+0x132/0x6dc
    ? mark_held_locks+0x5f/0x76
    ext4_rename2+0x29/0x2b
    vfs_rename+0x540/0x636
    SyS_renameat2+0x359/0x44d
    SyS_rename+0x1e/0x20
    entry_SYSCALL_64_fastpath+0x12/0x6f

    [minchan@kernel.org: add stable mark]
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Kyeongdon Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Kejian Yan says:

    ====================
    dts: hisi: fixes no syscon fault when init mdio

    This patchset fixes a bug where eth cannot be initialized successfully
    on hip05-D02 because the dts files do not match the source code.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller