27 Sep, 2015

1 commit

  • Pull SCSI target fixes from Nicholas Bellinger:
    "This includes an iser-target series from Jenny + Sagi @ Mellanox that
    addresses the few remaining active I/O shutdown bugs, along with a
    patch to support zero-copy for immediate data payloads that gives a
    nice performance improvement for small block WRITEs.

    Also included are some recent >= v4.2 regression bug-fixes. The most
    notable is an RCU conversion regression for SPC-3 PR registrations, and
    recent removal of obsolete RFC-3720 markers that introduced a login
    regression bug with MSFT iSCSI initiators.

    Thanks to everyone who has been testing + reporting bugs for v4.x"

    * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    iscsi-target: Avoid OFMarker + IFMarker negotiation
    target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit
    target: Fix target_sense_desc_format NULL pointer dereference
    target: Propigate backend read-only to core_tpg_add_lun
    target: Fix PR registration + APTPL RCU conversion regression
    iser-target: Skip data copy if all the command data comes as immediate
    iser-target: Change the recv buffers posting logic
    iser-target: Fix pending connections handling in target stack shutdown sequnce
    iser-target: Remove np_ prefix from isert_np members
    iser-target: Remove unused variables
    iser-target: Put the reference on commands waiting for unsol data
    iser-target: remove command with state ISTATE_REMOVE

    Linus Torvalds
     

26 Sep, 2015

3 commits

  • Pull networking fixes from David Miller:

    1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them. From Daniel Borkmann.

    2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X. From Tycho Andersen.

    3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

    4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

    5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

    6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

    7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

    8) Check DMA mapping errors in bna driver, from Ivan Vecera.

    9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

    10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

    11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

    12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

    13) Fix various races in timewait timer and reqsk_queue_hash_req, from
    Eric Dumazet.

    14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

    15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

    16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

    17) Fix byte order in geneve tunnel port config, from John W Linville.

    18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

    19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

    20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russell King.

    21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

    22) Fix crash in icmp_route_lookup(), from David Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net: Fix panic in icmp_route_lookup
    net: update docbook comment for __mdiobus_register()
    ppp: fix lockdep splat in ppp_dev_uninit()
    net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
    phy: marvell: add link partner advertised modes
    net: fix net_device refcounting
    phy: add phy_device_remove()
    phy: fixed-phy: properly validate phy in fixed_phy_update_state()
    net: fix phy refcounting in a bunch of drivers
    of_mdio: fix MDIO phy device refcounting
    phy: add proper phy struct device refcounting
    phy: fix mdiobus module safety
    net: dsa: fix of_mdio_find_bus() device refcount leak
    phy: fix of_mdio_find_bus() device refcount leak
    ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
    ip6_gre: Reduce log level in ip6gre_err() to debug
    fib_rules: fix fib rule dumps across multiple skbs
    bnx2x: byte swap rss_key to comply to Toeplitz specs
    net: revert "net_sched: move tp->root allocation into fw_init()"
    lwtunnel: remove source and destination UDP port config option
    ...

    Linus Torvalds
     
  • Pull another cgroup fix from Tejun Heo:
    "The cgroup writeback support got inadvertently enabled for traditional
    hierarchies revealing two regressions which are currently being worked
    on. It shouldn't have been enabled on traditional hierarchies, so
    disable it on them. This is enough to make the regressions go away
    for people who aren't experimenting with cgroup"

    * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - fix v4.2 SEEK on files over 2 gigs
    - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
    - Fix recovery of recalled read delegations

    Bugfixes:
    - Fix a case where NFSv4 fails to send CLOSE after a server reboot
    - Fix sunrpc to wait for connections to complete before retrying
    - Fix sunrpc races between transport connect/disconnect and shutdown
    - Fix an infinite loop when layoutget fails with BAD_STATEID
    - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
    - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
    - Fix layoutreturn/close ordering issues"

    * tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS41: make close wait for layoutreturn
    NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
    NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
    NFSv4: Recovery of recalled read delegations is broken
    NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
    NFS: Do cleanup before resetting pageio read/write to mds
    SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
    SUNRPC: Lock the transport layer on shutdown
    nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
    SUNRPC: Ensure that we wait for connections to complete before retrying
    SUNRPC: drop null test before destroy functions
    nfs: fix v4.2 SEEK on files over 2 gigs
    SUNRPC: Fix races between socket connection and destroy code
    nfs: fix pg_test page count calculation
    Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

    Linus Torvalds
     

25 Sep, 2015

10 commits

  • This patch adds a DF_READ_ONLY flag that is used by IBLOCK to
    signal when a backend has been set to read-only mode, in order
    to propagate read-only status up to core_tpg_add_lun() for all
    future LUN fabric exports.

    With this in place, existing emulation for reporting read-only
    in spc_emulate_modesense() and normal transport_lookup_cmd_lun()
    TCM_WRITE_PROTECTED status checking just works as expected.

    Reported-by: Joeue Deng
    Reported-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Add a phy_device_remove() function to complement phy_device_register(),
    which undoes the effects of phy_device_register() by removing the phy
    device from visibility, but not freeing it.

    This allows these details to be moved out of the mdio bus code into
    the phy code where this action belongs.

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Re-implement the mdiobus module refcounting so that we actually
    ensure that the mdiobus module code does not go away while we might call
    into it.

    The old scheme using bus->dev.driver was buggy, because bus->dev is a
    class device which never has a struct device_driver associated with it,
    and hence the associated code trying to obtain a refcount did nothing
    useful.

    Instead, take the approach that other subsystems do: pass the module
    when calling mdiobus_register(), and record that in the mii_bus struct.
    When we need to increment the module use count in the phy code, use
    this stored pointer. When the phy is detached, drop the module
    refcount, remembering that the phy device might go away at that point.

    This doesn't stop the mii_bus going away while there are in-use phys -
    it merely stops the underlying code vanishing.
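
The ownership scheme described above can be modelled in a few lines. This is only a sketch in Python, not the kernel code: the `Module`, `MiiBus` and `PhyDevice` classes and the module name used below are stand-ins invented for illustration, and only the register/attach/detach flow follows the commit message.

```python
class Module:
    """Stand-in for a kernel module with a use count (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0

    def get(self):
        self.refcount += 1

    def put(self):
        if self.refcount == 0:
            raise RuntimeError("module refcount underflow")
        self.refcount -= 1


class MiiBus:
    def __init__(self, name):
        self.name = name
        self.owner = None  # filled in at registration time


class PhyDevice:
    def __init__(self, bus):
        self.bus = bus
        self.attached = False


def mdiobus_register(bus, owner):
    """Record the owning module in the bus structure."""
    bus.owner = owner


def phy_attach(phy):
    # Pin the bus driver's code for as long as the phy is in use.
    phy.bus.owner.get()
    phy.attached = True


def phy_detach(phy):
    # Read the owner first: the phy device might go away at this point.
    owner = phy.bus.owner
    phy.attached = False
    owner.put()
```

Note that the model, like the commit, only pins the module code; nothing here prevents the `MiiBus` object itself from being dropped while a phy is attached.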

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Pull thermal management fixes from Zhang Rui:

    - Power allocator governor changes to allow binding on thermal zones
    with missing power estimates information. From Javi Merino.

    - Add compile test flags on thermal drivers that allow it without
    producing compilation errors. From Eduardo Valentin.

    - Fixes around memory allocation on cpu_cooling. From Javi Merino.

    - Fix on db8500 cpufreq code to allow autoload. From Luis de
    Bethencourt.

    - Maintainer entries for cpu cooling device

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
    thermal: power_allocator: exit early if there are no cooling devices
    thermal: power_allocator: don't require tzp to be present for the thermal zone
    thermal: power_allocator: relax the requirement of two passive trip points
    thermal: power_allocator: relax the requirement of a sustainable_power in tzp
    thermal: Add a function to get the minimum power
    thermal: cpu_cooling: free power table on error or when unregistering
    thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock
    thermal: db8500_cpufreq_cooling: Fix module autoload for OF platform driver
    thermal: cpu_cooling: Add MAINTAINERS entry
    thermal: ti-soc: Kconfig fix to avoid menu showing wrongly
    thermal: ti-soc: allow compile test
    thermal: qcom_spmi: allow compile test
    thermal: exynos: allow compile test
    thermal: armada: allow compile test
    thermal: dove: allow compile test
    thermal: kirkwood: allow compile test
    thermal: rockchip: allow compile test
    thermal: spear: allow compile test
    thermal: hisi: allow compile test
    thermal: Fix thermal_zone_of_sensor_register to match documentation

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "15 fixes"

    * emailed patches from Andrew Morton:
    ocfs2/dlm: fix deadlock when dispatch assert master
    membarrier: clean up selftest
    vmscan: fix sane_reclaim helper for legacy memcg
    lib/iommu-common.c: do not try to deref a null iommu->lazy_flush() pointer when n < pool->hint
    x86, efi, kasan: #undef memset/memcpy/memmove per arch
    mm: migrate: hugetlb: putback destination hugepage to active list
    mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified
    userfaultfd: register uapi generic syscall (aarch64)
    userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
    userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
    userfaultfd: selftest: avoid my_bcmp false positives with powerpc
    userfaultfd: selftest: only warn if __NR_userfaultfd is undefined
    userfaultfd: selftest: headers fixup
    userfaultfd: selftests: vm: pick up sanitized kernel headers
    userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"

    Linus Torvalds
     
  • The UDP tunnel config is asymmetric with respect to the ports used. The source and
    destination ports from one direction of the tunnel are not related to the
    ports of the other direction. We need to be able to respond to ARP requests
    using the correct ports without involving routing.

    As a consequence, UDP ports need to be a fixed property of the tunnel
    interface and cannot be set per route. Remove the ability to set ports per
    route. This is still okay to do, as no kernel has been released with these
    attributes yet.

    Note that the ability to specify source and destination ports is preserved
    for other users of the lwtunnel API which don't use routes for tunnel key
    specification (like openvswitch).

    If in the future we rework ARP handling to allow port specification, the
    attributes can be added back.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • When using ip lwtunnels, the additional data for xmit (basically, the actual
    tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
    metadata dst. When replying to ARP requests, we need to send the reply to
    the same tunnel the request came from. This means we need to construct
    proper metadata dst for ARP replies.

    We could perform another route lookup to get a dst entry with the correct
    lwtstate. However, this won't always ensure that the outgoing tunnel is the
    same as the incoming one, and it won't work anyway for IPv4 duplicate
    address detection.

    The only thing to do is to "reverse" the ip_tunnel_info.
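
As a rough illustration of what "reversing" means here, the sketch below builds transmit metadata for a reply from the metadata of the received request. The `TunnelInfo` type and its field names are assumptions made for this example, and the symmetric swap is a simplification rather than the kernel's actual ip_tunnel_info handling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TunnelInfo:
    """Hypothetical, simplified stand-in for ip_tunnel_info."""
    tun_id: int
    src: str
    dst: str
    tx: bool  # True when the metadata describes an outgoing packet


def reverse_tunnel_info(rx_info: TunnelInfo) -> TunnelInfo:
    """Build reply metadata: same tunnel id, endpoints swapped, marked TX."""
    return replace(rx_info, src=rx_info.dst, dst=rx_info.src, tx=True)
```

Because the reply metadata is derived from the request itself, no route lookup is involved, which is the property the commit message asks for.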

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • A VXLAN device can receive an skb with checksum partial, but the checksum
    offset could be in the outer header, which is pulled on receive. This results
    in a negative checksum offset for the skb. Such an skb can cause the assert
    failure in skb_checksum_help(). This patch fixes the bug by setting
    checksum-none while pulling the outer header.
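
The failure mode can be reproduced with a toy model. The sketch below is not the kernel code: the `Skb` class, its `csum_start_offset` field and the helper names are assumptions for illustration. The two CHECKSUM_* values match the kernel's definitions.

```python
from dataclasses import dataclass

CHECKSUM_NONE = 0
CHECKSUM_PARTIAL = 3


@dataclass
class Skb:
    ip_summed: int
    csum_start_offset: int  # bytes from the current data pointer


def pull_header(skb, hdr_len, fixed=True):
    """Advance the data pointer past a header of hdr_len bytes."""
    skb.csum_start_offset -= hdr_len
    # The fix: a partial checksum that started inside the pulled header
    # can no longer be completed, so stop claiming it is pending.
    if fixed and skb.ip_summed == CHECKSUM_PARTIAL and skb.csum_start_offset < 0:
        skb.ip_summed = CHECKSUM_NONE


def checksum_help(skb):
    """Toy skb_checksum_help(): refuses a negative checksum offset."""
    if skb.ip_summed != CHECKSUM_PARTIAL:
        return "nothing to do"
    if skb.csum_start_offset < 0:
        raise RuntimeError("negative checksum offset")
    return "checksum computed"
```

With `fixed=False` the model raises in `checksum_help()`, which corresponds to the BUG shown in the trace that follows.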

    The following is the kernel panic message from an old kernel hitting the bug.

    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:1906!
    RIP: 0010:skb_checksum_help+0x144/0x150
    Call Trace:

    queue_userspace_packet+0x408/0x470 [openvswitch]
    ovs_dp_upcall+0x5d/0x60 [openvswitch]
    ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
    ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
    ovs_vport_receive+0x2a/0x30 [openvswitch]
    vxlan_rcv+0x53/0x60 [openvswitch]
    vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
    udp_queue_rcv_skb+0x2dc/0x3b0
    __udp4_lib_rcv+0x1cf/0x6c0
    udp_rcv+0x1a/0x20
    ip_local_deliver_finish+0xdd/0x280
    ip_local_deliver+0x88/0x90
    ip_rcv_finish+0x10d/0x370
    ip_rcv+0x235/0x300
    __netif_receive_skb+0x55d/0x620
    netif_receive_skb+0x80/0x90
    virtnet_poll+0x555/0x6f0
    net_rx_action+0x134/0x290
    __do_softirq+0xa8/0x210
    call_softirq+0x1c/0x30
    do_softirq+0x65/0xa0
    irq_exit+0x8e/0xb0
    do_IRQ+0x63/0xe0
    common_interrupt+0x6e/0x6e

    Reported-by: Anupam Chanda
    Signed-off-by: Pravin B Shelar
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • inode_cgwb_enabled() gates cgroup writeback support. If it returns
    true, each inode is attached to the corresponding memory domain which
    gets mapped to io domain. It currently only tests whether the
    filesystem and bdi support cgroup writeback; however, cgroup writeback
    support doesn't work on traditional hierarchies and thus it should
    also test whether memcg and iocg are on the default hierarchy.

    This caused traditional hierarchy setups to hit the cgroup writeback
    path inadvertently and ended up creating separate writeback domains
    for each memcg and mapping them all to the root iocg uncovering a
    couple issues in the cgroup writeback path.

    cgroup writeback was never meant to be enabled on traditional
    hierarchies. Make inode_cgwb_enabled() test whether both memcg and
    iocg are on the default hierarchy.
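
The change to the gate amounts to adding two conditions to a predicate. A minimal sketch, assuming a hypothetical `WritebackContext` record in place of the real inode, bdi and cgroup state:

```python
from dataclasses import dataclass


@dataclass
class WritebackContext:
    """Hypothetical summary of the state the real check inspects."""
    fs_supports_cgwb: bool
    bdi_supports_cgwb: bool
    memcg_on_default_hierarchy: bool
    iocg_on_default_hierarchy: bool


def inode_cgwb_enabled_old(ctx):
    # Before the fix: only filesystem and bdi support were tested.
    return ctx.fs_supports_cgwb and ctx.bdi_supports_cgwb


def inode_cgwb_enabled_new(ctx):
    # After the fix: both controllers must also be on the default hierarchy.
    return (ctx.fs_supports_cgwb
            and ctx.bdi_supports_cgwb
            and ctx.memcg_on_default_hierarchy
            and ctx.iocg_on_default_hierarchy)
```

A traditional-hierarchy setup (both hierarchy flags false) passes the old check and fails the new one, which is the behaviour change the commit describes.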

    Signed-off-by: Tejun Heo
    Reported-by: Artem Bityutskiy
    Reported-by: Dexuan Cui
    Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
    Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net

    Tejun Heo
     
  • Pull spi fixes from Mark Brown:
    "A disappointingly large collection of fixes for SPI issues, though
    almost all in drivers (and there mainly the newly added Mediatek
    driver) and the core fixes are documentation and error handling.

    The driver fixes are all of the usual 'important if you see them'
    variety"

    * tag 'spi-fix-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
    spi: xtensa-xtfpga: fix register endianness
    spi: meson: Fix module autoload for OF platform driver
    spi: mediatek: fix wrong error return value on probe
    spi: fix kernel-doc warnings in spi.h
    spi: spidev: fix possible NULL dereference
    spi: atmel: remove warning when !CONFIG_PM_SLEEP
    spi: bcm2835: BUG: fix wrong use of PAGE_MASK
    spi: mediatek: fix spi cs polarity error
    spi: Fix documentation of spi_alloc_master()
    spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled
    spi: Mediatek: Document devicetree bindings update for spi bus
    spi: mediatek: fix spi clock usage error
    spi: mediatek: remove clk_disable_unprepare()

    Linus Torvalds
     

24 Sep, 2015

1 commit

  • Drivers might call napi_disable while not holding the napi instance poll_lock.
    In those instances, it's possible for a race condition to exist between
    poll_one_napi and napi_disable. That is to say, poll_one_napi only tests the
    NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
    the following may happen:

    CPU0                        CPU1
    ndo_tx_timeout              napi_poll_dev
    napi_disable                poll_one_napi
    test_and_set_bit (ret 0)
                                test_bit (ret 1)
    reset adapter               napi_poll_routine

    If the adapter gets a tx timeout without a napi instance scheduled, it's possible
    for the adapter to think it has exclusive access to the hardware (as the napi
    instance is now scheduled via the napi_disable call), while the netpoll code
    thinks there is simply work to do. The result is parallel hardware access
    leading to corrupt data structures in the driver, and a crash.

    Additionally, there is another, more critical race between netpoll and
    napi_disable. The disabled napi state is actually identical to the scheduled
    state for a given napi instance. The implication being that, if a napi instance
    is disabled, a netconsole instance would see the napi state of the device as
    having been scheduled, and poll it, likely while the driver was doing something
    requiring exclusive access. In the case above, it's fairly clear that not having
    the rings in a state ready to be polled will cause any number of crashes.

    The fix should be pretty easy. netpoll uses its own bit to indicate that
    the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
    We can just gate disabling on that bit as well as the sched bit. That should
    prevent netpoll from conducting a napi poll if we convert its set bit to a
    test_and_set_bit operation to provide mutual exclusion.
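
The mutual exclusion this relies on can be sketched as follows. This is a userspace model, not the kernel code: a lock stands in for atomic bitops, the bit numbers are illustrative, and `napi_try_disable()` is a non-blocking simplification (the real napi_disable() waits until it owns the bits).

```python
import threading

NAPI_STATE_SCHED = 0  # bit numbers are illustrative
NAPI_STATE_NPSVC = 1


class NapiState:
    def __init__(self):
        self._bits = 0
        self._lock = threading.Lock()  # stands in for atomic bit operations

    def test_and_set_bit(self, nr):
        """Set bit nr and return its previous value, atomically."""
        with self._lock:
            old = bool(self._bits & (1 << nr))
            self._bits |= 1 << nr
            return old

    def clear_bit(self, nr):
        with self._lock:
            self._bits &= ~(1 << nr)


def napi_try_disable(napi):
    """Disabling is gated on BOTH the sched bit and the netpoll bit."""
    if napi.test_and_set_bit(NAPI_STATE_SCHED):
        return False
    if napi.test_and_set_bit(NAPI_STATE_NPSVC):
        napi.clear_bit(NAPI_STATE_SCHED)  # back off; the caller would retry
        return False
    return True


def netpoll_try_poll(napi):
    """netpoll claims NPSVC with test_and_set_bit, or skips the poll."""
    if napi.test_and_set_bit(NAPI_STATE_NPSVC):
        return False
    # ... the driver's poll routine would run here ...
    napi.clear_bit(NAPI_STATE_NPSVC)
    return True
```

Once `napi_try_disable()` succeeds, `netpoll_try_poll()` returns False, so netpoll can no longer touch the rings while the driver has exclusive access.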

    Change notes:
    V2)
    Remove a trailing whitespace
    Resubmit with proper subject prefix

    V3)
    Clean up spacing nits

    Signed-off-by: Neil Horman
    CC: "David S. Miller"
    CC: jmaxwell@redhat.com
    Tested-by: jmaxwell@redhat.com
    Signed-off-by: David S. Miller

    Neil Horman
     

23 Sep, 2015

3 commits

  • Add the userfaultfd syscalls to uapi asm-generic; it was tested with
    postcopy live migration on aarch64 with both 4k and 64k pagesize
    kernels.

    Signed-off-by: Dr. David Alan Gilbert
    Signed-off-by: Andrea Arcangeli
    Cc: Michael Ellerman
    Cc: Shuah Khan
    Cc: Thierry Reding
    Cc: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dr. David Alan Gilbert
     
  • This reverts commit 51360155eccb907ff8635bd10fc7de876408c2e0 and adapts
    fs/userfaultfd.c to use the old version of that function.

    It didn't look robust to call __wake_up_common with "nr == 1" when we
    absolutely require wakeall semantics, but we have full control over what
    we insert into the two waitqueue heads of the blocked userfaults. There is
    no risk of an exclusive waitqueue entry being inserted into those two
    heads, so we may as well stick to the "nr == 1" of the old code and rely
    purely on the fact that no entry inserted into either of the two heads,
    which we must treat as wakeall, has WQ_FLAG_EXCLUSIVE set in wait->flags.
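
Why "nr == 1" still wakes everyone when no waiter is exclusive can be seen in a small model of the wake loop. This is a sketch, assuming every waiter's wake function succeeds; the waiter names are made up, and the list of `(name, flags)` pairs stands in for the waitqueue.

```python
WQ_FLAG_EXCLUSIVE = 0x01


def wake_up_common(waiters, nr_exclusive):
    """Walk the queue; only exclusive waiters count against nr_exclusive."""
    woken = []
    for name, flags in waiters:
        woken.append(name)  # assume the wake function succeeded
        if flags & WQ_FLAG_EXCLUSIVE:
            nr_exclusive -= 1
            if nr_exclusive == 0:
                break
    return woken
```

With only non-exclusive entries the counter is never decremented, so the loop runs to the end of the queue regardless of `nr_exclusive`.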

    Signed-off-by: Andrea Arcangeli
    Cc: Dr. David Alan Gilbert
    Cc: Michael Ellerman
    Cc: Shuah Khan
    Cc: Thierry Reding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • …, 'spi/fix/mediatek', 'spi/fix/meson', 'spi/fix/mtk' and 'spi/fix/pxa2xx' into spi-linus

    Mark Brown
     

22 Sep, 2015

2 commits

  • Pull cgroup fixes from Tejun Heo:
    "The threadgroup locking changes which went in during 4.2 devel cycle
    added write locking of a percpu_rwsem in cgroup task migration path;
    unfortunately, that involved expedited rcu syncing which turned out to
    be too slow and heavy for certain workloads. The patchset which is
    dependent on this one didn't get committed during that devel cycle, so
    these two patches can be reverted safely.

    Oleg reworked percpu_rwsem for 4.4 so that the writer path is a lot
    lighter. The reported issue goes away with Oleg's reworked
    percpu_rwsem and I'll reapply these patches on the for-4.4 branch so
    that they can land together with Oleg's changes"

    * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"
    Revert "cgroup: simplify threadgroup locking"

    Linus Torvalds
     
  • When creating a timewait socket, we need to arm the timer before
    allowing other cpus to find it. The signal allowing cpus to find
    the socket is setting tw_refcnt to a non-zero value.

    As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
    call inet_twsk_schedule() first.

    This also means we need to remove tw_refcnt changes from
    inet_twsk_schedule() and let the caller handle it.

    Note that because we use mod_timer_pinned(), we have the guarantee
    the timer won't expire before we set tw_refcnt as we run in BH context.

    To make things more readable I introduced inet_twsk_reschedule() helper.

    When rearming the timer, we can use mod_timer_pending() to make sure
    we do not rearm a canceled timer.

    Note: This bug can possibly trigger if packets of a flow can hit
    multiple cpus. This does not normally happen, unless flow steering
    is broken somehow. This explains why this bug was spotted ~5 months after
    its introduction.

    A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
    but will be provided in a separate patch for proper tracking.
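
The ordering argument can be checked with a toy model that records, after each setup step, whether the socket is visible to other cpus and whether its timer is armed. The class, step names and refcount value below are assumptions for illustration only.

```python
class TimewaitSock:
    def __init__(self):
        self.refcnt = 0          # zero: lookups on other cpus ignore it
        self.timer_armed = False


def create_timewait(order):
    """Run the setup steps in the given order and snapshot the state."""
    tw = TimewaitSock()
    snapshots = []
    for step in order:
        if step == "arm_timer":       # models inet_twsk_schedule()
            tw.timer_armed = True
        elif step == "set_refcnt":    # models __inet_twsk_hashdance()
            tw.refcnt = 1             # any non-zero value publishes it
        else:
            raise ValueError(step)
        snapshots.append((tw.refcnt != 0, tw.timer_armed))
    return snapshots


def visible_without_timer(snapshots):
    """True if some cpu could have found the socket before its timer was armed."""
    return any(visible and not armed for visible, armed in snapshots)
```

Arming the timer first never exposes a visible socket without a timer; the reverse order does, which is the window the patch closes.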

    Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
    Signed-off-by: Eric Dumazet
    Reported-by: Ying Cai
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Sep, 2015

4 commits

  • Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.

    Before the patch, the external ipv6 header + gre header were included on
    tx.

    After the patch:
    $ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
    PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
    64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms

    --- 192.168.6.121 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
    7: ip6gre1@NONE: mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
    RX: bytes packets errors dropped overrun mcast
    84 1 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    84 1 0 0 0 0
    $ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
    PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
    64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms

    --- 192.168.1.121 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
    8: ip6tnl1@NONE: mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
    RX: bytes packets errors dropped overrun mcast
    84 1 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    84 1 0 0 0 0

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patch contains Netfilter fixes for your net tree, they are:

    1) nf_log_unregister() should only set to NULL the logger that is being
    unregistered, instead of everything else. Patch from Florian Westphal.

    2) Fix a crash when accessing physoutdev from PREROUTING in br_netfilter.
    This is partially reverting the patch to shrink nf_bridge_info to 32 bytes.
    Also from Florian.

    3) Use existing match/target extensions in the internal nft_compat extension
    lists when the extension is family unspecific (ie. NFPROTO_UNSPEC).

    4) Wait for rcu grace period before leaving nf_log_unregister().
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The man page of ip-route(8) says the following about route types:

    unreachable - these destinations are unreachable. Packets are discarded
    and the ICMP message host unreachable is generated. The local senders
    get an EHOSTUNREACH error.

    blackhole - these destinations are unreachable. Packets are discarded
    silently. The local senders get an EINVAL error.

    prohibit - these destinations are unreachable. Packets are discarded
    and the ICMP message communication administratively prohibited is
    generated. The local senders get an EACCES error.

    In the inet6 address family, this was correct, except the local senders
    got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
    In the inet address family, all three route types generated ICMP message
    net unreachable, and the local senders got ENETUNREACH error.

    In both address families all three route types now behave consistently
    with documentation.
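
The documented behaviour can be written down as a lookup table. The sketch below restates the man page excerpt quoted above using Python's `errno` module; the table and function names are made up for this example.

```python
import errno

# route type -> (ICMP message generated, errno seen by local senders)
ROUTE_TYPE_BEHAVIOUR = {
    "unreachable": ("host unreachable", errno.EHOSTUNREACH),
    "blackhole": (None, errno.EINVAL),  # discarded silently, no ICMP
    "prohibit": ("communication administratively prohibited", errno.EACCES),
}


def icmp_message(route_type):
    """ICMP message generated for the route type, or None if silent."""
    return ROUTE_TYPE_BEHAVIOUR[route_type][0]


def local_sender_error(route_type):
    """errno value a local sender gets for the route type."""
    return ROUTE_TYPE_BEHAVIOUR[route_type][1]
```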

    Signed-off-by: Nikola Forró
    Signed-off-by: David S. Miller

    Nikola Forró
     
  • Signed-off-by: Jann Horn
    Reviewed-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Jann Horn
     

20 Sep, 2015

4 commits

  • Pull power management and ACPI updates from Rafael Wysocki:
    "Included are: a somewhat late devfreq update which however is mostly
    fixes and cleanups with one new thing only (the PPMUv2 support on
    Exynos5433), an ACPI cpufreq driver fixup and two ACPI core cleanups
    related to preprocessor directives.

    Specifics:

    - Fix a memory allocation size in the devfreq core (Xiaolong Ye).

    - Fix a mistake in the exynos-ppmu DT binding (Javier Martinez
    Canillas).

    - Add support for PPMUv2 (Platform Performance Monitoring Unit
    version 2.0) on the Exynos5433 SoCs (Chanwoo Choi).

    - Fix a type casting bug in the Exynos PPMU code (MyungJoo Ham).

    - Assorted devfreq code cleanups and optimizations (Javi Merino,
    MyungJoo Ham, Viresh Kumar).

    - Fix up the ACPI cpufreq driver to use a more lightweight way to get
    to its private data in the ->get() callback (Rafael J Wysocki).

    - Fix a CONFIG_ prefix bug in one of the ACPI drivers and make the
    ACPI subsystem use IS_ENABLED() instead of #ifdefs in function
    bodies (Sudeep Holla)"

    * tag 'pm+acpi-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
    ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
    ACPI: int340x_thermal: add missing CONFIG_ prefix
    PM / devfreq: Fix incorrect type issue.
    PM / devfreq: tegra: Update governor to use devfreq_update_stats()
    PM / devfreq: comments for get_dev_status usage updated
    PM / devfreq: drop comment about thermal setting max_freq
    PM / devfreq: cache the last call to get_dev_status()
    PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
    PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
    PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
    PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
    PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding

    Linus Torvalds
     
  • Pull rdma fixes from Doug Ledford:
    "The new hfi1 driver in staging/rdma has had a number of fixup patches
    since being added to the tree. This is the first batch of those fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
    IB/hfi: Properly set permissions for user device files
    IB/hfi1: mask vs shift confusion
    IB/hfi1: clean up some defines
    IB/hfi1: info leak in get_ctxt_info()
    IB/hfi1: fix a locking bug
    IB/hfi1: checking for NULL instead of IS_ERR
    IB/hfi1: fix sdma_descq_cnt parameter parsing
    IB/hfi1: fix copy_to/from_user() error handling
    IB/hfi1: fix pstateinfo from returning improperly byteswapped value

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:

    - a boot regression (since v4.2) fix for some ARM configurations from
    Tyler

    - regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
    from Jeff. These are tagged for -stable.

    - a pair of locking fixes from Axel that are hidden from lockdep since
    they involve device_lock(). The "btt" one is tagged for -stable, the
    other only applies to the new "pfn" mechanism in v4.3.

    - a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    mm: fix type cast in __pfn_to_phys()
    pmem: add proper fencing to pmem_rw_page()
    libnvdimm: pfn_devs: Fix locking in namespace_store
    libnvdimm: btt_devs: Fix locking in namespace_store
    blockdev: don't set S_DAX for misaligned partitions
    dax: fix O_DIRECT I/O to the last block of a blockdev

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is a bit bigger than it should be, but I did not want to
    send it off last week due to both wanting extra testing, and expecting
    a fix for the bounce regression as well. In any case, this contains:

    - Fix for the blk-merge.c compilation warning on gcc 5.x from me.

    - A set of back/front SG gap merge fixes, from me and from Sagi.
    This ensures that we honor SG gapping for integrity payloads as
    well.

    - Two small fixes for null_blk from Matias, fixing a leak and a
    capacity propagation issue.

    - A blkcg fix from Tejun, fixing a NULL dereference.

    - A fast clone optimization from Ming, fixing a performance
    regression since the arbitrarily sized bio's were introduced.

    - Also from Ming, a regression fix for bouncing IOs"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix bounce_end_io
    block: blk-merge: fast-clone bio when splitting rw bios
    block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
    block: Copy a user iovec if it includes gaps
    block: Refuse adding appending a gapped integrity page to a bio
    block: Refuse request/bio merges with gaps in the integrity payload
    block: Check for gaps on front and back merges
    null_blk: fix wrong capacity when bs is not 512 bytes
    null_blk: fix memory leak on cleanup
    block: fix bogus compiler warnings in blk-merge.c

    Linus Torvalds
     

19 Sep, 2015

4 commits

  • The various definitions of __pfn_to_phys() have been consolidated to
    use a generic macro in include/asm-generic/memory_model.h. This hit
    mainline in the form of 012dcef3f058 "mm: move __phys_to_pfn and
    __pfn_to_phys to asm/generic/memory_model.h". When the generic macro
    was implemented the type cast to phys_addr_t was dropped which caused
    boot regressions on ARM platforms with more than 4GB of memory and
    LPAE enabled.

    It was suggested to use PFN_PHYS() defined in include/linux/pfn.h,
    as it provides the correct logic and avoids further duplication.

    Reported-by: kernelci.org bot
    Suggested-by: Dan Williams
    Signed-off-by: Tyler Baker
    Signed-off-by: Dan Williams

    Tyler Baker
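
    The truncation described in the __pfn_to_phys() commit above can be
    sketched in user-space C. This is a minimal illustration, not the kernel
    macro: the PAGE_SHIFT value of 12, the 32-bit PFN type, and both function
    names are assumptions made for the example. The second function follows
    the cast-then-shift shape that PFN_PHYS() is understood to use.

```c
#include <stdint.h>

#define PAGE_SHIFT 12            /* assumption: 4 KiB pages */

typedef uint64_t phys_addr_t;    /* 64-bit physical addresses, as with LPAE */

/* Hypothetical "no cast" variant: the shift happens in 32 bits, so any
 * address bits above bit 31 are lost before the result is widened. */
static phys_addr_t pfn_to_phys_nocast(uint32_t pfn)
{
    return (uint32_t)(pfn << PAGE_SHIFT);
}

/* Hypothetical "cast first" variant: widen to phys_addr_t, then shift,
 * so frames above 4 GiB keep their high bits. */
static phys_addr_t pfn_to_phys_cast(uint32_t pfn)
{
    return (phys_addr_t)pfn << PAGE_SHIFT;
}
```

    With these definitions, page frame 0x100000 (the first frame above
    4 GiB) maps to 0 without the cast and to 0x100000000 with it, while
    frames below 4 GiB give the same result either way.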
     
  • * acpi-bus:
    ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED()
    ACPI: int340x_thermal: add missing CONFIG_ prefix

    Rafael J. Wysocki
     
  • * pm-cpufreq:
    cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()

    * pm-devfreq:
    PM / devfreq: Fix incorrect type issue.
    PM / devfreq: tegra: Update governor to use devfreq_update_stats()
    PM / devfreq: comments for get_dev_status usage updated
    PM / devfreq: drop comment about thermal setting max_freq
    PM / devfreq: cache the last call to get_dev_status()
    PM / devfreq: Drop unlikely before IS_ERR(_OR_NULL)
    PM / devfreq: exynos-ppmu: bit-wise operation bugfix.
    PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2
    PM / devfreq: exynos-ppmu: Add the support of PPMUv2 for Exynos5433
    PM / devfreq: event: Remove incorrect property in exynos-ppmu DT binding

    Rafael J. Wysocki
     
  • Pull KVM fixes from Paolo Bonzini:
    "Mostly stable material, a lot of ARM fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
    sched: access local runqueue directly in single_task_running
    arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
    arm64: KVM: Remove all traces of the ThumbEE registers
    arm: KVM: Disable virtual timer even if the guest is not using it
    arm64: KVM: Disable virtual timer even if the guest is not using it
    arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
    KVM: s390: Replace incorrect atomic_or with atomic_andnot
    arm: KVM: Fix incorrect device to IPA mapping
    arm64: KVM: Fix user access for debug registers
    KVM: vmx: fix VPID is 0000H in non-root operation
    KVM: add halt_attempted_poll to VCPU stats
    kvm: fix zero length mmio searching
    kvm: fix double free for fast mmio eventfd
    kvm: factor out core eventfd assign/deassign logic
    kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
    KVM: make the declaration of functions within 80 characters
    KVM: arm64: add workaround for Cortex-A57 erratum #852523
    KVM: fix polling for guest halt continued even if disable it
    arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
    arm64: KVM: set {v,}TCR_EL2 RES1 bits
    ...

    Linus Torvalds
     

18 Sep, 2015

8 commits

  • Byteswap link_width_downgrade_*_active values before sending on the wire. In
    addition properly define the Port State Info structure.

    Reviewed-by: Dennis Dalessandro
    Reviewed-by: Christian Gomez
    Signed-off-by: Rimmer, Todd
    Signed-off-by: Ira Weiny
    Acked-by: Mike Marciniszyn
    Signed-off-by: Doug Ledford

    Ira Weiny
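
    Converting a 16-bit field to wire order before sending, as the commit
    above does for the link_width_downgrade_*_active values, can be sketched
    as follows. This is a portable user-space stand-in, not the driver code:
    the function name is hypothetical, and the assumption is that the wire
    format is big-endian with 16-bit fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return a 16-bit value whose in-memory byte order is
 * big-endian (most significant byte first), whatever the host byte order. */
static uint16_t to_wire16_sketch(uint16_t host_value)
{
    uint8_t bytes[2];
    uint16_t wire_value;

    bytes[0] = (uint8_t)(host_value >> 8);   /* high byte goes out first */
    bytes[1] = (uint8_t)(host_value & 0xff); /* low byte goes out second */
    memcpy(&wire_value, bytes, sizeof(wire_value));
    return wire_value;
}
```

    Copying the returned value into a packet buffer then places the high
    byte first on both little-endian and big-endian hosts.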
     
  • Pull irq updates from Thomas Gleixner:
    "This is a rather large update post rc1 due to the final steps of
    cleanups and API changes which had to wait for the preparatory patches
    to hit your tree.

    - Regression fixes for ARM GIC irqchips

    - Regression fixes and lockdep annotations for renesas irq chips

    - The leftovers of the cleanup and preparatory patches which have
    been ignored by maintainers

    - Final conversions of the newly merged users of obsolete APIs

    - Final removal of obsolete APIs

    - Final removal of ARM artifacts which had been introduced during the
    conversion of ARM to the generic interrupt code.

    - Final split of the irq_data into chip specific and common data to
    reflect the needs of hierarchical irq domains.

    - Treewide removal of the first argument of interrupt flow handlers,
    i.e. the irq number, which is not used by the majority of handlers
    and is simple to retrieve from the other argument, the irq descriptor.

    - A few comment updates and build warning fixes"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    arm64: Remove ununsed set_irq_flags
    ARM: Remove ununsed set_irq_flags
    sh: Kill off set_irq_flags usage
    irqchip: Kill off set_irq_flags usage
    gpu/drm: Kill off set_irq_flags usage
    genirq: Remove irq argument from irq flow handlers
    genirq: Move field 'msi_desc' from irq_data into irq_common_data
    genirq: Move field 'affinity' from irq_data into irq_common_data
    genirq: Move field 'handler_data' from irq_data into irq_common_data
    genirq: Move field 'node' from irq_data into irq_common_data
    irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag
    irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag
    genirq: Provide IRQD_FORWARDED_TO_VCPU status flag
    genirq: Simplify irq_data_to_desc()
    genirq: Remove __irq_set_handler_locked()
    pinctrl/pistachio: Use irq_set_handler_locked
    gpio: vf610: Use irq_set_handler_locked
    powerpc/mpc8xx: Use irq_set_handler_locked()
    powerpc/ipic: Use irq_set_handler_locked()
    powerpc/cpm2: Use irq_set_handler_locked()
    ...

    Linus Torvalds
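
    The flow handler signature change listed in the irq pull above can be
    sketched with a mock descriptor. Everything here is hypothetical
    user-space scaffolding (the struct, its fields, and all three function
    names); it only illustrates why the irq number argument is redundant
    once the descriptor is passed.

```c
/* Hypothetical stand-in for an irq descriptor that knows its own number. */
struct irq_desc_sketch {
    unsigned int irq;           /* the interrupt number */
    unsigned int last_handled;  /* recorded by the handlers below */
};

/* Hypothetical accessor: recover the irq number from the descriptor. */
static unsigned int desc_get_irq(const struct irq_desc_sketch *desc)
{
    return desc->irq;
}

/* Old shape: the irq number is passed alongside the descriptor. */
static void old_style_handler(unsigned int irq, struct irq_desc_sketch *desc)
{
    desc->last_handled = irq;
}

/* New shape: only the descriptor is passed; handlers that need the
 * number look it up. */
static void new_style_handler(struct irq_desc_sketch *desc)
{
    desc->last_handled = desc_get_irq(desc);
}
```

    Both handlers record the same irq number, which is the point of the
    cleanup: the second argument already carries the first.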
     
  • Steffen reported that the recent change to add oif to dst lookups breaks
    the VTI use case. The problem is that with the oif set in the flow struct
    the comparison to the nh_oif is triggered. Fix by splitting the
    FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
    bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
    nh oif (FLOWI_FLAG_SKIP_NH_OIF).

    Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")

    Signed-off-by: David Ahern
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David Ahern
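
    The flag split described in the commit above can be sketched as two
    independent bits. This is an illustration of the idea, not the kernel
    lookup code: the bit values and both function names are assumptions, and
    the real nexthop selection involves more conditions than shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, chosen only for this example. */
#define FLOWI_FLAG_VRFSRC      0x04  /* bypass the vrf device cache */
#define FLOWI_FLAG_SKIP_NH_OIF 0x08  /* do not compare against nh oif */

/* Hypothetical check: should the vrf device cache be bypassed? */
static bool bypass_vrf_cache(uint8_t flags)
{
    return (flags & FLOWI_FLAG_VRFSRC) != 0;
}

/* Hypothetical check: is a nexthop with output interface nh_oif acceptable?
 * The oif comparison applies only when an oif is set in the flow and the
 * caller has not asked to skip it. */
static bool nh_matches(uint8_t flags, int oif, int nh_oif)
{
    if (oif != 0 && !(flags & FLOWI_FLAG_SKIP_NH_OIF))
        return oif == nh_oif;
    return true;
}
```

    With separate bits, a caller such as the VTI path can set an oif in the
    flow without triggering the nexthop comparison, independently of whether
    the vrf cache bypass is requested.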
     
  • Commit 718ba5b87343 moved the responsibility for unlocking the socket to
    xs_tcp_setup_socket, meaning that the socket will be unlocked before we
    know that it has finished trying to connect. The following patch is based on
    an initial patch by Russell King to ensure that we delay clearing the
    XPRT_CONNECTING flag until we either know that we failed to initiate
    a connection attempt, or the connection attempt itself failed.

    Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
    Reported-by: Russell King
    Tested-by: Russell King
    Tested-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This patch adds NLM_F_REPLACE flag to ipv6 route replace notifications.
    This makes nlm_flags in ipv6 replace notifications consistent
    with ipv4.

    Signed-off-by: Roopa Prabhu
    Acked-by: Nicolas Dichtel
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • Pull Ceph fixes from Sage Weil:
    "These are both fixes to the new and improved keepalive2 behavior"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: advertise support for keepalive2
    libceph: don't access invalid memory in keepalive2 path

    Linus Torvalds
     
  • Pull timer fixes from Ingo Molnar:
    "A fix for an abs()/abs64() bug that caused too slow NTP convergence on
    32-bit kernels, plus a removal of an obsolete clockevents driver
    facility after all users got converted during the merge window"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    clockevents: Remove unused set_mode() callback
    time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()

    Linus Torvalds
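
    The abs()/abs64() difference mentioned in the timer pull above can be
    sketched in user-space C. This assumes that, on the affected 32-bit
    kernels, abs() evaluated a 64-bit argument in a 32-bit type; both
    function names below are hypothetical and model that behaviour rather
    than reproduce the kernel macros.

```c
#include <stdint.h>

/* Hypothetical model of an absolute value computed in 32 bits: only the
 * low 32 bits of x are kept, interpreted as a signed value. */
static int64_t abs_via_32bit(int64_t x)
{
    uint32_t low = (uint32_t)(uint64_t)x;
    int64_t t = (low & 0x80000000u) ? (int64_t)low - 0x100000000LL
                                    : (int64_t)low;
    return t < 0 ? -t : t;
}

/* Hypothetical 64-bit absolute value: the full width is preserved. */
static int64_t abs64_sketch(int64_t x)
{
    return x < 0 ? -x : x;
}
```

    For small inputs the two agree, but once the magnitude exceeds 32 bits
    the truncating version reports a much smaller value, which is the kind
    of error that would slow a frequency adjustment loop.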
     
  • Pull scheduler fixes from Ingo Molnar:
    "A migrate_tasks() locking fix, and a late-coming nohz change plus a
    nohz debug check"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: 'Annotate' migrate_tasks()
    nohz: Assert existing housekeepers when nohz full enabled
    nohz: Affine unpinned timers to housekeepers

    Linus Torvalds