22 Feb, 2015

1 commit

  • Pull more NFS client updates from Trond Myklebust:
    "Highlights include:

    - Fix a use-after-free in decode_cb_sequence_args()
    - Fix a compile error when #undef CONFIG_PROC_FS
    - NFSv4.1 backchannel spinlocking issue
    - Cleanups in the NFS unstable write code requested by Linus
    - NFSv4.1 fix issues when the server denies our backchannel request
    - Cleanups in create_session and bind_conn_to_session"

    * tag 'nfs-for-3.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.1: Clean up bind_conn_to_session
    NFSv4.1: Always set up a forward channel when binding the session
    NFSv4.1: Don't set up a backchannel if the server didn't agree to do so
    NFSv4.1: Clean up create_session
    pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
    NFSv4: Kill unused nfs_inode->delegation_state field
    NFS: struct nfs_commit_info.lock must always point to inode->i_lock
    nfs: Can call nfs_clear_page_commit() instead
    nfs: Provide and use helper functions for marking a page as unstable
    SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock
    SUNRPC: Fix a compile error when #undef CONFIG_PROC_FS
    NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()
    NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args

    Linus Torvalds
     

20 Feb, 2015

2 commits

  • Pull Ceph changes from Sage Weil:
    "On the RBD side, there is a conversion to blk-mq from Christoph,
    several long-standing bug fixes from Ilya, and some cleanup from
    Rickard Strandqvist.

    On the CephFS side there is a long list of fixes from Zheng, including
    improved session handling, a few IO path fixes, some dcache management
    correctness fixes, and several blocking while !TASK_RUNNING fixes.

    The core code gets a few cleanups and Chaitanya has added support for
    TCP_NODELAY (which has been used on the server side for ages but we
    somehow missed on the kernel client).

    There is also an update to MAINTAINERS to fix up some email addresses
    and reflect that Ilya and Zheng are doing most of the maintenance for
    RBD and CephFS these days. Do not be surprised to see a pull request
    come from one of them in the future if I am unavailable for some
    reason"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    MAINTAINERS: update Ceph and RBD maintainers
    libceph: kfree() in put_osd() shouldn't depend on authorizer
    libceph: fix double __remove_osd() problem
    rbd: convert to blk-mq
    ceph: return error for traceless reply race
    ceph: fix dentry leaks
    ceph: re-send requests when MDS enters reconnecting stage
    ceph: show nocephx_require_signatures and notcp_nodelay options
    libceph: tcp_nodelay support
    rbd: do not treat standalone as flatten
    ceph: fix atomic_open snapdir
    ceph: properly mark empty directory as complete
    client: include kernel version in client metadata
    ceph: provide seperate {inode,file}_operations for snapdir
    ceph: fix request time stamp encoding
    ceph: fix reading inline data when i_size > PAGE_SIZE
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
    ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
    rbd: fix error paths in rbd_dev_refresh()
    ...

    Linus Torvalds
     
  • Pull kconfig updates from Michal Marek:
    "Yann E Morin was supposed to take over kconfig maintainership, but
    this hasn't happened. So I'm sending a few kconfig patches that I
    collected:

    - Fix for missing va_end in kconfig
    - merge_config.sh displays usage if given too few arguments
    - s/boolean/bool/ in Kconfig files for consistency, with the plan to
    only support bool in the future"

    * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kconfig: use va_end to match corresponding va_start
    merge_config.sh: Display usage if given too few arguments
    kconfig: use bool instead of boolean for type definition attributes

    Linus Torvalds
     

19 Feb, 2015

6 commits

  • a255651d4cad ("ceph: ensure auth ops are defined before use") made
    kfree() in put_osd() conditional on the authorizer. A mechanical
    mistake most likely - fix it.

    Cc: Alex Elder
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • It turns out it's possible to get __remove_osd() called twice on the
    same OSD. That doesn't sit well with rb_erase() - depending on the
    shape of the tree we can get a NULL dereference, a soft lockup or
    a random crash at some point in the future as we end up touching freed
    memory. One scenario that I was able to reproduce is as follows:

    con_fault_finish()
    osd_reset()

    ceph_osdc_handle_map()

    kick_requests()

    reset_changed_osds()
    __reset_osd()
    __remove_osd()




    __kick_osd_requests()
    __reset_osd()
    __remove_osd()
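
    The hazard of removing the same element twice can be illustrated
    outside the kernel. What follows is a minimal userspace sketch,
    assuming a circular doubly linked list in place of the rbtree that
    libceph uses. All names are hypothetical, and it shows a general
    guard pattern rather than the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
};

/* An unlinked node points at itself, so membership is cheap to test. */
void node_init(struct node *n)
{
    n->prev = n->next = n;
}

bool node_linked(const struct node *n)
{
    return n->next != n;
}

/* Insert n right after head; head is the sentinel of a circular list. */
void node_add(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Safe to call more than once: a second call finds the node already
 * unlinked and does nothing. Returns true only if a removal happened. */
bool node_remove(struct node *n)
{
    assert(n != NULL);
    if (!node_linked(n))
        return false;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);           /* mark as unlinked for later callers */
    return true;
}
```

    Calling node_remove() twice on the same node returns true and then
    false. Without the node_linked() check, the second call would write
    through stale neighbour pointers.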
    Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd()
    Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts
    Cc: stable@vger.kernel.org # 3.9+
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • Set the TCP_NODELAY socket option on connection sockets. This
    disables Nagle’s algorithm and improves latency characteristics.
    The tcp_nodelay (default) and notcp_nodelay option flags enable
    and disable setting the socket option.
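
    For readers unfamiliar with the option, here is a userspace
    analogue, assuming an ordinary POSIX TCP socket. The kernel client
    sets the option on its in-kernel socket instead, and the helper
    names below are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable (on == 0) TCP_NODELAY on a TCP socket.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
int set_tcp_nodelay(int fd, int on)
{
    int val = on ? 1 : 0;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val));
}

/* Read the option back: 1 if Nagle is disabled, 0 if it is enabled,
 * -1 on error. */
int get_tcp_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

    A caller would typically invoke set_tcp_nodelay(fd, 1) right after
    creating the socket, which corresponds to the default tcp_nodelay
    behaviour described above.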

    Signed-off-by: Chaitanya Huilgol
    [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
    Signed-off-by: Ilya Dryomov

    Chaitanya Huilgol
     
  • Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil wrote:
    > On Mon, 22 Dec 2014, Ilya Dryomov wrote:
    >> Actually, pool op stuff has been unused for over two years - looks like
    >> it was added for rbd create_snap and that got ripped out in 2012. It's
    >> unlikely we'd ever need to manage pools or snaps from the kernel client
    >> so I think it makes sense to nuke it. Sage?
    >
    > Yep!

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Pull virtio updates from Rusty Russell:
    "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

    On top of that is the major rework of lguest, to use PCI and virtio
    1.0, to double-check the implementation.

    Then comes the inevitable fixes and cleanups from that work"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
    virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
    virtio_net: unconditionally define struct virtio_net_hdr_v1.
    tools/lguest: don't use legacy definitions for net device in example launcher.
    virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
    tools/lguest: use common error macros in the example launcher.
    tools/lguest: give virtqueues names for better error messages
    tools/lguest: more documentation and checking of virtio 1.0 compliance.
    lguest: don't look in console features to find emerg_wr.
    tools/lguest: don't start devices until DRIVER_OK status set.
    tools/lguest: handle indirect partway through chain.
    tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: rename virtio_pci_cfg_cap field to match spec.
    tools/lguest: fix features_accepted logic in example launcher.
    tools/lguest: handle device reset correctly in example launcher.
    virtual: Documentation: simplify and generalize paravirt_ops.txt
    lguest: remove NOTIFY call and eventfd facility.
    lguest: remove NOTIFY facility from demonstration launcher.
    lguest: use the PCI console device's emerg_wr for early boot messages.
    lguest: always put console in PCI slot #1.
    ...

    Linus Torvalds
     

18 Feb, 2015

3 commits

  • Merge cleanups requested by Linus.

    * cleanups: (3 commits)
    pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
    nfs: Can call nfs_clear_page_commit() instead
    nfs: Provide and use helper functions for marking a page as unstable

    Trond Myklebust
     
  • Pull networking updates from David Miller:

    1) Missing netlink attribute validation in nft_lookup, from Patrick
    McHardy.

    2) Restrict ipv6 partial checksum handling to UDP, since that's the
    only case it works for. From Vlad Yasevich.

    3) Clear out silly device table sentinel macros used by SSB and BCMA
    drivers. From Joe Perches.

    4) Make sure the remote checksum code never creates a situation where
    the remote checksum is applied yet the tunneling metadata describing
    the remote checksum transformation is still present. Otherwise an
    external entity might see this and apply the checksum again. From
    Tom Herbert.

    5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire.

    6) Don't explicitly initialize timer struct fields, use setup_timer()
    and mod_timer() instead. From Vaishali Thakkar.

    7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi
    Nomura.

    8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet.

    9) Don't potentially perform skb_get() on shared skbs, also from Eric
    Dumazet.

    10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin
    KaFai Lau.

    11) Fix merge resolution error between the iov_iter changes in vhost and
    some bug fixes that occurred at the same time. From Jason Wang.

    12) If rtnl_configure_link() fails we have to perform a call to
    ->dellink() before unregistering the device. From WANG Cong.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
    net: dsa: Set valid phy interface type
    rtnetlink: call ->dellink on failure when ->newlink exists
    com20020-pci: add support for eae single card
    vhost_net: fix wrong iter offset when setting number of buffers
    net: spelling fixes
    net/core: Fix warning while make xmldocs caused by dev.c
    net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081
    ipv6: fix ipv6_cow_metrics for non DST_HOST case
    openvswitch: Fix key serialization.
    r8152: restore hw settings
    hso: fix rx parsing logic when skb allocation fails
    tcp: make sure skb is not shared before using skb_get()
    bridge: netfilter: Move sysctl-specific error code inside #ifdef
    ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc
    ipvlan: add a missing __percpu pcpu_stats
    tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
    bgmac: fix device initialization on Northstar SoCs (condition typo)
    qlcnic: Delete existing multicast MAC list before adding new
    net/mlx5_core: Fix configuration of log_uar_page_sz
    sunvnet: don't change gso data on clones
    ...

    Linus Torvalds
     
  • If the phy interface mode is not found in devicetree, or if devicetree
    is not configured, of_get_phy_mode returns -ENODEV. The current code
    sets the phy interface mode to the return value from of_get_phy_mode
    without checking if it is valid.

    This invalid phy interface mode is passed as parameter to of_phy_connect
    or to phy_connect_direct. This sets the phy interface mode to the invalid
    value, which in turn causes problems for any code using phydev->interface.
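
    The missing check can be sketched in plain C. This is a minimal
    illustration under stated assumptions: the enum values and both
    function names are hypothetical stand-ins, and only the -ENODEV
    behaviour is taken from the description above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's PHY interface mode values. */
enum phy_mode { PHY_MODE_NA = 0, PHY_MODE_MII = 1, PHY_MODE_RGMII = 2 };

/* Hypothetical lookup: returns a mode (>= 0) or a negative errno, the
 * way the text describes of_get_phy_mode() returning -ENODEV. */
int lookup_phy_mode(int have_property, enum phy_mode configured)
{
    return have_property ? (int)configured : -ENODEV;
}

/* Never store a negative errno as a mode: fall back to a known value. */
enum phy_mode resolve_phy_mode(int have_property, enum phy_mode configured)
{
    int ret = lookup_phy_mode(have_property, configured);

    if (ret < 0)
        return PHY_MODE_NA;
    return (enum phy_mode)ret;
}
```

    With the check in place, resolve_phy_mode(0, PHY_MODE_RGMII) yields
    PHY_MODE_NA instead of leaking -ENODEV into code that expects a
    valid mode.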

    Fixes: b31f65fb4383 ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus")
    Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups")
    Cc: Florian Fainelli
    Cc: Andrew Lunn
    Signed-off-by: Guenter Roeck
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Guenter Roeck
     

16 Feb, 2015

1 commit

  • Ignacy reported that when eth0 is down and we add a vlan device
    on top of it like:

    ip link add link eth0 name eth0.1 up type vlan id 1

    We will get a refcount leak:

    unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2

    The problem is that when rtnl_configure_link() fails in rtnl_newlink(),
    we simply call unregister_netdevice(). For a stacked device like vlan,
    we do almost nothing when we unregister the upper device; more work
    is done when we unregister the lower device, so call its ->dellink().

    Reported-by: Ignacy Gawedzki
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

15 Feb, 2015

4 commits

  • Spelling errors caught by codespell.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This patch fixes the following warnings while running make xmldocs.

    Warning(.//net/core/dev.c:5345): No description found
    for parameter 'bonding_info'
    Warning(.//net/core/dev.c:5345): Excess function parameter
    'netdev_bonding_info' description in 'netdev_bonding_info_change'

    These warnings started to appear after the following patch was added
    into Linus's tree during the merge window.

    commit 61bd3857ff2c7daf756d49b41e6277bbdaa8f789
    net/core: Add event for a change in slave state

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
     
  • ipv6_cow_metrics() currently assumes only DST_HOST routes require
    dynamic metrics allocation from inetpeer. The assumption breaks
    when ndisc discovers a router with RTAX_MTU and RTAX_HOPLIMIT metrics.
    Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
    is called after the route is created.

    This patch creates the metrics array (by calling dst_cow_metrics_generic) in
    ipv6_cow_metrics().

    Test:
    radvd.conf:
    interface qemubr0
    {
    AdvLinkMTU 1300;
    AdvCurHopLimit 30;

    prefix fd00:face:face:face::/64
    {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr off;
    };
    };

    Before:
    [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
    fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
    fe80::/64 dev eth0 proto kernel metric 256
    default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec

    After:
    [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
    fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
    fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
    default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30

    Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.)
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • Fix typo where mask is used rather than key.

    Fixes: 74ed7ab9264 ("openvswitch: Add support for unique flow IDs.")
    Reported-by: Joe Stringer
    Signed-off-by: Pravin B Shelar
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

14 Feb, 2015

2 commits


13 Feb, 2015

4 commits

  • IPv6 can keep a copy of the SYN message using skb_get() in
    tcp_v6_conn_request() so that the caller won't free the skb when
    calling kfree_skb() later.

    Therefore TCP fast open has to clone the skb it is queuing in
    child->sk_receive_queue, as all skbs consumed from receive_queue are
    freed using __kfree_skb() (ie assuming skb->users == 1)
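
    The ownership rule can be modelled with a small reference-counted
    buffer. This is a userspace sketch under assumptions: struct buf
    and its helpers are hypothetical, and buf_clone() copies the
    payload, whereas a real skb clone shares it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    int users;                /* reference count on this buffer head */
    size_t len;
    unsigned char data[64];
};

struct buf *buf_alloc(const void *src, size_t len)
{
    struct buf *b;

    if (len > sizeof(b->data))
        return NULL;
    b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->users = 1;
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}

/* Take an extra reference on the same head (the skb_get() role). */
struct buf *buf_get(struct buf *b)
{
    b->users++;
    return b;
}

/* Create an independent head whose own count is 1 (the clone role). */
struct buf *buf_clone(const struct buf *b)
{
    return buf_alloc(b->data, b->len);
}

/* Drop one reference and free on the last one (the kfree_skb() role). */
void buf_put(struct buf *b)
{
    if (--b->users == 0)
        free(b);
}

/* A queue consumer that frees unconditionally, so it needs users == 1. */
void consume_exclusive(struct buf *b)
{
    assert(b->users == 1);
    free(b);
}
```

    Queuing the result of buf_clone() keeps consume_exclusive() safe
    even while another holder still has a reference to the original.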

    Signed-off-by: Eric Dumazet
    Signed-off-by: Yuchung Cheng
    Fixes: 5b7ed0892f2af ("tcp: move fastopen functions to tcp_fastopen.c")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
    from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
    wrapper functions.

    Although this patch moves static keys decrement from __mem_cgroup_free to
    mem_cgroup_css_free, it does not introduce any functional changes, because
    the keys are incremented on setting the limit (tcp or kmem), which can
    only happen after successful mem_cgroup_css_online.

    Signed-off-by: Vladimir Davydov
    Cc: Glauber Costa
    Cc: KAMEZAWA Hiroyuki
    Cc: Eric W. Biederman
    Cc: David S. Miller
    Cc: Johannes Weiner
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Pull nfsd updates from Bruce Fields:
    "The main change is the pNFS block server support from Christoph, which
    allows an NFS client connected to shared disk to do block IO to the
    shared disk in place of NFS reads and writes. This also requires xfs
    patches, which should arrive soon through the xfs tree, barring
    unexpected problems. Support for other filesystems is also possible
    if there's interest.

    Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
    shape"

    * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: default NFSv4.2 to on
    nfsd: pNFS block layout driver
    exportfs: add methods for block layout exports
    nfsd: add trace events
    nfsd: update documentation for pNFS support
    nfsd: implement pNFS layout recalls
    nfsd: implement pNFS operations
    nfsd: make find_any_file available outside nfs4state.c
    nfsd: make find/get/put file available outside nfs4state.c
    nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
    nfsd: add fh_fsid_match helper
    nfsd: move nfsd_fh_match to nfsfh.h
    fs: add FL_LAYOUT lease type
    fs: track fl_owner for leases
    nfs: add LAYOUT_TYPE_MAX enum value
    nfsd: factor out a helper to decode nfstime4 values
    sunrpc/lockd: fix references to the BKL
    nfsd: fix year-2038 nfs4 state problem
    svcrdma: Handle additional inline content
    svcrdma: Move read list XDR round-up logic
    ...

    Linus Torvalds
     
  • If CONFIG_SYSCTL=n:

    net/bridge/br_netfilter.c: In function ‘br_netfilter_init’:
    net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used

    Move the label and the code after it inside the existing #ifdef to get
    rid of the warning.
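
    One way to keep a label from being defined when nothing jumps to it
    is sketched below. The function names are hypothetical, the layout
    is not necessarily that of the actual patch, and CONFIG_SYSCTL is
    deliberately left undefined to mimic the =n build.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real registration steps. */
int register_hooks(void) { return 0; }
void unregister_hooks(void) { }
#ifdef CONFIG_SYSCTL
int register_sysctl(void) { return 0; }
#endif

int example_init(void)
{
    int ret = register_hooks();

    if (ret < 0)
        return ret;

#ifdef CONFIG_SYSCTL
    ret = register_sysctl();
    if (ret < 0)
        goto err1;          /* the only jump to the label */
#endif
    return 0;

#ifdef CONFIG_SYSCTL
err1:                       /* compiled only when it can be reached */
    unregister_hooks();
    return ret;
#endif
}
```

    Built without CONFIG_SYSCTL, neither the goto nor the label exists,
    so the compiler has no unused label to warn about.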

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     

12 Feb, 2015

16 commits

  • Use spin_lock_bh in ip6_fl_purge() to prevent the following potential
    deadlock scenario between ip6_fl_purge() and the ip6_fl_gc() timer.

    =================================
    [ INFO: inconsistent lock state ]
    3.19.0 #1 Not tainted
    ---------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    (ip6_fl_lock){+.?...}, at: [] ip6_fl_gc+0x2d/0x180
    {SOFTIRQ-ON-W} state was registered at:
    [] __lock_acquire+0x4a0/0x10b0
    [] lock_acquire+0xc4/0x2b0
    [] _raw_spin_lock+0x3d/0x80
    [] ip6_flowlabel_net_exit+0x28/0x110
    [] ops_exit_list.isra.1+0x39/0x60
    [] cleanup_net+0x100/0x1e0
    [] process_one_work+0x20a/0x830
    [] worker_thread+0x11b/0x460
    [] kthread+0x104/0x120
    [] ret_from_fork+0x7c/0xb0
    irq event stamp: 84640
    hardirqs last enabled at (84640): [] _raw_spin_unlock_irq+0x30/0x50
    hardirqs last disabled at (84639): [] _raw_spin_lock_irq+0x1f/0x80
    softirqs last enabled at (84628): [] _local_bh_enable+0x21/0x50
    softirqs last disabled at (84629): [] irq_exit+0x12d/0x150

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(ip6_fl_lock);
    <Interrupt>
    lock(ip6_fl_lock);

    *** DEADLOCK ***

    Signed-off-by: Jan Stancek
    Signed-off-by: David S. Miller

    Jan Stancek
     
  • Pull security layer updates from James Morris:
    "Highlights:

    - Smack adds secmark support for Netfilter
    - /proc/keys is now mandatory if CONFIG_KEYS=y
    - TPM gets its own device class
    - Added TPM 2.0 support
    - Smack file hook rework (all Smack users should review this!)"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
    cipso: don't use IPCB() to locate the CIPSO IP option
    SELinux: fix error code in policydb_init()
    selinux: add security in-core xattr support for pstore and debugfs
    selinux: quiet the filesystem labeling behavior message
    selinux: Remove unused function avc_sidcmp()
    ima: /proc/keys is now mandatory
    Smack: Repair netfilter dependency
    X.509: silence asn1 compiler debug output
    X.509: shut up about included cert for silent build
    KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
    MAINTAINERS: email update
    tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
    smack: fix possible use after frees in task_security() callers
    smack: Add missing logging in bidirectional UDS connect check
    Smack: secmark support for netfilter
    Smack: Rework file hooks
    tpm: fix format string error in tpm-chip.c
    char/tpm/tpm_crb: fix build error
    smack: Fix a bidirectional UDS connect check typo
    smack: introduce a special case for tmpfs in smack_d_instantiate()
    ...

    Linus Torvalds
     
  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton: (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Features:
    - Removing the forced serialisation of open()/close() calls in
    NFSv4.x (x>0) makes for a significant performance improvement in
    metadata intensive workloads.
    - Full support for the pNFS "flexible files" layout type
    - Further RPC/RDMA client improvements from Chuck

    Bugfixes:
    - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
    - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
    - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
    as part of the namespace cleanup
    - Stable fix: Ensure we reference the inode for return-on-close in
    delegreturn
    - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
    the same source address/port combination during a disconnect/
    reconnect event. This is a requirement imposed by most NFSv3
    server duplicate reply cache implementations.

    Optimisations:
    - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT

    Other:
    - Add Anna Schumaker as co-maintainer for the NFS client"

    * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
    SUNRPC: Cleanup to remove xs_tcp_close()
    pnfs: delete an unintended goto
    pnfs/flexfiles: Do not dprintk after the free
    SUNRPC: Fix stupid typo in xs_sock_set_reuseport
    SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
    SUNRPC: Handle connection reset more efficiently.
    SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
    SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
    SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
    SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
    SUNRPC: Remove TCP socket linger code
    SUNRPC: Remove TCP client connection reset hack
    SUNRPC: TCP/UDP always close the old socket before reconnecting
    SUNRPC: Add helpers to prevent socket create from racing
    SUNRPC: Ensure xs_reset_transport() resets the close connection flags
    SUNRPC: Do not clear the source port in xs_reset_transport
    SUNRPC: Handle EADDRINUSE on connect
    SUNRPC: Set SO_REUSEPORT socket option for TCP connections
    NFSv4.1: Fix pnfs_put_lseg races
    NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
    ...

    Linus Torvalds
     
  • This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY to
    the page fault in order to release the mmap_sem during the I/O.

    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Kirill A. Shutemov
    Cc: Andres Lagar-Cavilla
    Cc: Peter Feiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • The unified hierarchy interface for memory cgroups will no longer use "-1"
    to mean maximum possible resource value. In preparation for this, make
    the string an argument and let the caller supply it.
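
    The idea of passing the "maximum" token in from the caller can be
    sketched as follows. This is a userspace illustration only:
    parse_limit() and its signature are hypothetical and are not the
    kernel helper this commit changes.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse a limit written by the user. The caller supplies the token that
 * means "no limit", so one interface can use "-1" and another "max". */
int parse_limit(const char *buf, const char *max_token, unsigned long *out)
{
    char *end;
    unsigned long val;

    assert(buf != NULL && max_token != NULL && out != NULL);

    if (strcmp(buf, max_token) == 0) {
        *out = ULONG_MAX;
        return 0;
    }
    /* strtoul() would silently accept a leading minus sign. */
    if (buf[0] == '\0' || buf[0] == '-')
        return -EINVAL;

    errno = 0;
    val = strtoul(buf, &end, 10);
    if (errno != 0 || *end != '\0')
        return -EINVAL;
    *out = val;
    return 0;
}
```

    A legacy caller would pass "-1" as max_token, while a caller for
    the new interface could pass a different string without touching
    the parser.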

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Change remote checksum handling to set checksum partial as default
    behavior. Added an iflink parameter to configure not using
    checksum partial (calling csum_partial to update checksum).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • This patch adds infrastructure so that remote checksum offload can
    set CHECKSUM_PARTIAL instead of calling csum_partial and writing
    the modified checksum field.

    Add skb_remcsum_adjust_partial function to set an skb for using
    CHECKSUM_PARTIAL with remote checksum offload. Changed
    skb_remcsum_process and skb_gro_remcsum_process to take a boolean
    argument to indicate if checksum partial can be set or the
    checksum needs to be modified using the normal algorithm.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Properly set GSO types and skb->encapsulation in the UDP tunnel GRO
    complete so that packets are properly represented for GSO. This sets
    SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether
    non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if
    the remote checksum option was processed.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Remote checksum offload processing is currently the same for both
    the GRO and non-GRO path. When the remote checksum offload option
    is encountered, the checksum field referred to is modified in
    the packet. So in the GRO case, the packet is modified in the
    GRO path and then the operation is skipped when the packet goes
    through the normal path based on skb->remcsum_offload. There is
    a problem in that the packet may be modified in the GRO path, but
    then forwarded off host still containing the remote checksum option.
    A remote host will again perform RCO but now the checksum verification
    will fail since GRO RCO already modified the checksum.

    To fix this, we ensure that GRO restores a packet to its original
    state before returning. In this model, when GRO processes a remote
    checksum option it still changes the checksum per the algorithm
    but on return from lower layer processing the checksum is restored
    to its original value.

    In this patch we define the gro_remcsum structure, which is passed
    to skb_gro_remcsum_process to save offset and delta for the checksum
    being changed. After lower layer processing, skb_gro_remcsum_cleanup
    is called to restore the checksum before returning from GRO.
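
    The save-and-restore scheme can be modelled in a few lines. This is
    a simplified sketch under loud assumptions: the struct and function
    names are hypothetical, and the delta uses plain modulo-2^16
    arithmetic rather than the kernel's one's-complement checksum
    helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Saved state for one adjustment (models the role of gro_remcsum). */
struct remcsum_state {
    size_t offset;      /* byte offset of the 16-bit checksum field */
    uint16_t delta;     /* amount added to the field */
    int active;         /* non-zero while a change needs undoing */
};

static uint16_t load16(const uint8_t *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

static void store16(uint8_t *p, uint16_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Overwrite the checksum field and remember how to undo the change. */
void remcsum_process(uint8_t *pkt, size_t offset, uint16_t new_csum,
                     struct remcsum_state *st)
{
    uint16_t old = load16(pkt + offset);

    assert(st != NULL);
    st->offset = offset;
    st->delta = (uint16_t)(new_csum - old);
    st->active = 1;
    store16(pkt + offset, new_csum);
}

/* Restore the original field value before the packet leaves this path. */
void remcsum_cleanup(uint8_t *pkt, struct remcsum_state *st)
{
    if (!st->active)
        return;
    store16(pkt + st->offset,
            (uint16_t)(load16(pkt + st->offset) - st->delta));
    st->active = 0;
}
```

    After remcsum_process() followed by remcsum_cleanup(), the packet
    bytes are identical to the input, which is the property the fix
    relies on when a packet is forwarded off host.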

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
    net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function

    If ipv4_tun_from_nlattr() returns a different positive value than
    OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
    validate_and_copy_set_tun() may return an undefined value instead of a
    zero success indicator. Initialize err to zero to fix this.
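
    The shape of the bug and of the fix can be shown with a small
    stand-alone function. This is a sketch with hypothetical names:
    parse_attr() plays the role of the parser that returns either a
    negative error or a positive attribute kind.

```c
#include <assert.h>
#include <errno.h>

enum { ATTR_PLAIN = 1, ATTR_WITH_OPTS = 2 };   /* hypothetical kinds */

/* Hypothetical parser: negative errno on failure, else the kind seen. */
int parse_attr(int raw)
{
    if (raw != ATTR_PLAIN && raw != ATTR_WITH_OPTS)
        return -EINVAL;
    return raw;
}

/* Hypothetical extra validation, needed for one kind only. */
int check_opts(int raw)
{
    (void)raw;
    return 0;
}

int validate_attr(int raw)
{
    int err = 0;        /* defined result even if no branch assigns it */
    int kind = parse_attr(raw);

    if (kind < 0)
        return kind;
    if (kind == ATTR_WITH_OPTS)
        err = check_opts(raw);      /* the only path that assigns err */
    return err;
}
```

    Without the "= 0" initializer, validate_attr(ATTR_PLAIN) would
    return an indeterminate value, which mirrors the warning quoted
    above.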

    Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension")
    Signed-off-by: Geert Uytterhoeven
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     
  • The userspace packet execute command passes down the flow key for
    a given packet, but userspace can skip parameters that have a zero
    value. Therefore the kernel needs to initialize the key metadata
    to zero.

    Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
    from rds_cong_queue_update because
    (a) we are already holding the sock_lock in the recv path, and
    will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
    lock
    (b) cong_queue_update does an irqsave on the rds_cong_lock, and this
    will trigger warnings (for a good reason) from functions called
    out of sock_lock.

    This patch reverts the change introduced by
    2fa57129d ("RDS: Bypass workqueue when queueing cong updates").

    The patch has been verified for both RDS/TCP as well as RDS/RDMA
    to ensure that there are no regressions for either transport:
    - for verification of RDS/TCP a client-server unit-test was used,
    with the server blocked in gdb and thus unable to drain its rcvbuf,
    eventually triggering a RDS congestion update.
    - for RDS/RDMA, the standard IB regression tests were used

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • ip6_append_data is used by other protocols and some of them can't
    be partially checksummed. Only use partial checksums for UDP.

    Fixes: 32dce968dd987a (ipv6: Allow for partial checksums on non-ufo packets)
    Reported-by: Sabrina Dubroca
    Tested-by: Sabrina Dubroca
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains two small Netfilter updates for your
    net-next tree, they are:

    1) Add ebtables support to nft_compat, from Arturo Borrero.

    2) Fix missing validation of the SET_ID attribute in the lookup
    expressions, from Patrick McHardy.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Using the IPCB() macro to get the IPv4 options is convenient, but
    unfortunately NetLabel often needs to examine the CIPSO option outside
    of the scope of the IP layer in the stack. While historically IPCB()
    worked above the IP layer, due to the inclusion of the inet_skb_param
    struct at the head of the {tcp,udp}_skb_cb structs, recent commit
    971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    reordered the tcp_skb_cb struct and invalidated this IPCB() trick.

    This patch fixes the problem by creating a new function,
    cipso_v4_optptr(), which locates the CIPSO option inside the IP header
    without calling IPCB(). Unfortunately, this isn't as fast as a simple
    lookup so some additional tweaks were made to limit the use of this
    new function.
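
    Locating an option by walking the IPv4 header directly can be
    sketched in userspace as follows. This is an illustration under
    assumptions: find_cipso_opt() is a hypothetical name that works on
    a raw byte buffer, and it is not the kernel's cipso_v4_optptr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_END    0     /* end of option list */
#define OPT_NOOP   1     /* one-byte padding */
#define OPT_CIPSO  134   /* CIPSO option type */

/* Return a pointer to the CIPSO option inside a raw IPv4 header, or
 * NULL if there is none. hdr must hold at least hdr_len bytes. */
const uint8_t *find_cipso_opt(const uint8_t *hdr, size_t hdr_len)
{
    size_t ihl, pos;

    assert(hdr != NULL);
    if (hdr_len < 20)
        return NULL;
    ihl = (size_t)(hdr[0] & 0x0f) * 4;      /* header length in bytes */
    if (ihl < 20 || ihl > hdr_len)
        return NULL;

    for (pos = 20; pos < ihl; ) {
        uint8_t type = hdr[pos];
        uint8_t len;

        if (type == OPT_END)
            break;
        if (type == OPT_NOOP) {
            pos++;
            continue;
        }
        if (pos + 1 >= ihl)
            break;                          /* no room for a length byte */
        len = hdr[pos + 1];
        if (len < 2 || pos + len > ihl)
            break;                          /* malformed option */
        if (type == OPT_CIPSO)
            return hdr + pos;
        pos += len;
    }
    return NULL;
}
```

    The linear walk is what makes this slower than a cached offset
    lookup, which matches the trade-off noted above.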

    Cc: stable@vger.kernel.org # 3.18
    Reported-by: Casey Schaufler
    Signed-off-by: Paul Moore
    Tested-by: Casey Schaufler

    Paul Moore
     

11 Feb, 2015

1 commit