04 Feb, 2017

14 commits

  • [ Upstream commit 9f427a0e474a67b454420c131709600d44850486 ]

    MPLS multipath for LSR is broken -- always selecting the first nexthop
    in the one label case. For example:

    $ ip -f mpls ro ls
    100
    nexthop as to 200 via inet 172.16.2.2 dev virt12
    nexthop as to 300 via inet 172.16.3.2 dev virt13
    101
    nexthop as to 201 via inet6 2000:2::2 dev virt12
    nexthop as to 301 via inet6 2000:3::2 dev virt13

    In this example incoming packets have a single MPLS label, which means the
    BOS bit is set. The BOS bit is passed from mpls_forward down to
    mpls_multipath_hash which never processes the hash loop because BOS is 1.

    Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
    tracks the total mpls header length on each pass (on pass N mpls_hdr_len
    is N * sizeof(mpls_shim_hdr)). When a label with the BOS bit set is found,
    it verifies that the skb has a sufficient header for IPv4 or IPv6, and
    finds the IPv4 or IPv6 header by taking the last mpls_hdr pointer and
    adding 1 to advance past it.

    With these changes I have verified the code correctly sees the label,
    BOS, IPv4 and IPv6 addresses in the network header, and that icmp/tcp/udp
    traffic for IPv4 and IPv6 is distributed across the nexthops.

    Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection")
    Acked-by: Robert Shearman
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
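
The label-stack walk described in this commit can be sketched as follows. This is a simplified Python model, not the kernel code: `parse_label_stack` and `shim` are hypothetical helpers written for illustration. The 4-byte entry layout (20-bit label, 3-bit TC, 1-bit BOS, 8-bit TTL) follows RFC 3032.

```python
import struct

MPLS_SHIM_LEN = 4  # one label stack entry is 4 bytes (RFC 3032)

def parse_label_stack(packet: bytes):
    """Walk every label until the bottom-of-stack (BOS) bit is set.

    Returns (labels, payload_offset). Hypothetical helper for illustration.
    """
    labels = []
    offset = 0
    while True:
        if len(packet) < offset + MPLS_SHIM_LEN:
            raise ValueError("truncated MPLS header")
        (entry,) = struct.unpack_from("!I", packet, offset)
        labels.append(entry >> 12)       # top 20 bits: label value
        offset += MPLS_SHIM_LEN          # N labels -> N * 4 bytes of header
        if (entry >> 8) & 1:             # BOS bit: payload starts right after
            return labels, offset

def shim(label: int, bos: bool, ttl: int = 64) -> bytes:
    """Build one label stack entry (TC bits left at zero)."""
    return struct.pack("!I", (label << 12) | (int(bos) << 8) | ttl)

# A fake 20-byte IPv4 header (first byte 0x45) follows the label stack.
one_label = shim(100, bos=True) + b"\x45" + b"\x00" * 19
two_labels = shim(100, bos=False) + shim(200, bos=True) + b"\x45" + b"\x00" * 19
```

In the single-label case the walk still runs once and yields the payload offset, so a hash over the IP header has something to work with.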
     
  • [ Upstream commit b6677449dff674cf5b81429b11d5c7f358852ef9 ]

    Any bridge options specified during link creation (e.g. ip link add)
    are ignored as br_dev_newlink() does not process them.
    Use br_changelink() to do it.

    Fixes: 133235161721 ("bridge: implement rtnl_link_ops->changelink")
    Signed-off-by: Ivan Vecera
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ivan Vecera
     
  • [ Upstream commit 0dbd7ff3ac5017a46033a9d0a87a8267d69119d9 ]

    Found that if we run LTP netstress test with large MSS (65K),
    the first attempt from server to send data comparable to this
    MSS on fastopen connection will be delayed by the probe timer.

    Here is an example:

    < S seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
    > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
    < . ack 1 win 342 length 0

    Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
    as well as in 'size_goal'. As a result, the segment is not queued for
    transmission until all the data is copied from the user buffer. Then, inside
    __tcp_push_pending_frames(), it breaks on send window test and
    continues with the check probe timer.

    Fragmentation occurs in tcp_write_wakeup()...

    +0.2 > P. seq 1:43777 ack 1 win 342 length 43776
    < . ack 43777, win 1365 length 0
    > P. seq 43777:65001 ack 1 win 342 options [...] length 21224
    ...

    This also contradicts the fact that we should bound to half of the
    window if it is large.

    Fix this flaw by correctly initializing max_window. Before that, it
    could have large values that affect further calculations of 'size_goal'.

    Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
    Signed-off-by: Alexey Kodanev
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
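
The effect of max_window on the send size goal can be illustrated with a small model. This is a sketch under stated assumptions: `bound_to_half_wnd` is a simplified stand-in for the kernel's half-window bounding (it omits the minimum-size floor the kernel applies), and the "stale" window value is invented for the example.

```python
TCP_MSS_DEFAULT = 536  # default IPv4 MSS

def bound_to_half_wnd(max_window: int, pktsize: int) -> int:
    """Simplified model: cap a send size at half of a large peer window."""
    cutoff = max_window >> 1 if max_window > TCP_MSS_DEFAULT else max_window
    if cutoff and pktsize > cutoff:
        return cutoff
    return pktsize

MSS = 65495                 # from the trace above
ADVERTISED_WINDOW = 43690   # "win 43690" in the trace above
STALE_MAX_WINDOW = 1 << 20  # assumed: some large uninitialized value

goal_with_stale_window = bound_to_half_wnd(STALE_MAX_WINDOW, MSS)
goal_with_real_window = bound_to_half_wnd(ADVERTISED_WINDOW, MSS)
```

With the stale value the goal stays at the full MSS, which is larger than the peer's window; with max_window initialized from the real window the goal drops to half of it.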
     
  • [ Upstream commit 03e4deff4987f79c34112c5ba4eb195d4f9382b0 ]

    Just like commit 4acd4945cd1e ("ipv6: addrconf: Avoid calling
    netdevice notifiers with RCU read-side lock"), it is unnecessary
    to make addrconf_disable_change() use RCU iteration over the
    netdev list, since it already holds the RTNL lock. Otherwise we may hit
    an "Illegal context switch in RCU read-side critical section" warning.

    Signed-off-by: Kefeng Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kefeng Wang
     
  • [ Upstream commit 9ed59592e3e379b2e9557dc1d9e9ec8fcbb33f16 ]

    Trying to add an mpls encap route when the MPLS modules are not loaded
    hangs. For example:

    CONFIG_MPLS=y
    CONFIG_NET_MPLS_GSO=m
    CONFIG_MPLS_ROUTING=m
    CONFIG_MPLS_IPTUNNEL=m

    $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    The ip command hangs:
    root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    $ cat /proc/880/stack
    [] call_usermodehelper_exec+0xd6/0x134
    [] __request_module+0x27b/0x30a
    [] lwtunnel_build_state+0xe4/0x178
    [] fib_create_info+0x47f/0xdd4
    [] fib_table_insert+0x90/0x41f
    [] inet_rtm_newroute+0x4b/0x52
    ...

    modprobe is trying to load rtnl-lwt-MPLS:

    root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS

    and it hangs after loading mpls_router:

    $ cat /proc/881/stack
    [] rtnl_lock+0x12/0x14
    [] register_netdevice_notifier+0x16/0x179
    [] mpls_init+0x25/0x1000 [mpls_router]
    [] do_one_initcall+0x8e/0x13f
    [] do_init_module+0x5a/0x1e5
    [] load_module+0x13bd/0x17d6
    ...

    The problem is that lwtunnel_build_state is called with the rtnl lock
    held, preventing mpls_init from registering.

    Given the potential references held by the time lwtunnel_build_state is
    called, it cannot drop the rtnl lock to load the module. So, extract the module
    loading code from lwtunnel_build_state into a new function to validate
    the encap type. The new function is called while converting the user
    request into a fib_config which is well before any table, device or
    fib entries are examined.

    Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
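
The lock-ordering problem in this commit can be modeled in a few lines. This is only a sketch: `rtnl` is an ordinary Python lock standing in for the RTNL mutex, all function names are hypothetical, and the model does not show how the kernel arranges for the lock not to be held during the module load. It only shows why the load must happen at a point where the module's init can take the lock.

```python
import threading

rtnl = threading.Lock()   # stand-in for the RTNL mutex (non-reentrant)
loaded = set()            # names of "modules" whose init has finished

def module_init(name: str) -> bool:
    """Module init needs rtnl; report failure instead of hanging if it is held."""
    if not rtnl.acquire(blocking=False):
        return False      # in the kernel this is where modprobe blocks
    try:
        loaded.add(name)
        return True
    finally:
        rtnl.release()

def add_route_old(encap: str) -> bool:
    with rtnl:                   # lock held while requesting the module
        return module_init(encap)

def add_route_new(encap: str) -> bool:
    ok = module_init(encap)      # encap type validated while rtnl is free
    with rtnl:                   # route is built afterwards, under the lock
        return ok and encap in loaded
```

`add_route_old("mpls")` fails in this model where the real command hangs, while `add_route_new("mpls")` succeeds because the module is loaded before the lock is needed for the route itself.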
     
  • [ Upstream commit 7be2c82cfd5d28d7adb66821a992604eb6dd112e ]

    Ashizuka reported a highmem oddity and sent a patch for freescale
    fec driver.

    But the problem root cause is that core networking stack
    must ensure no skb with highmem fragment is ever sent through
    a device that does not assert NETIF_F_HIGHDMA in its features.

    We need to call illegal_highdma() from harmonize_features()
    regardless of CSUM checks.

    Fixes: ec5f06156423 ("net: Kill link between CSUM and SG features.")
    Signed-off-by: Eric Dumazet
    Cc: Pravin Shelar
    Reported-by: "Ashizuka, Yuusuke"
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 6391a4481ba0796805d6581e42f9f0418c099e34 ]

    Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on
    xmit") in fact disables VIRTIO_NET_HDR_F_DATA_VALID on the receive path
    too. Fix this by adding a hint (has_data_valid) and setting it only on
    the receive path.

    Cc: Rolf Neugebauer
    Signed-off-by: Jason Wang
    Acked-by: Rolf Neugebauer
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Wang
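
A minimal sketch of the has_data_valid hint, assuming the header flags are derived from the skb checksum state. `hdr_flags` is a hypothetical helper, not the kernel function; the constants mirror the kernel and virtio definitions.

```python
CHECKSUM_UNNECESSARY = 1         # skb->ip_summed: checksum already verified
CHECKSUM_PARTIAL = 3             # skb->ip_summed: checksum still to be filled in
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1
VIRTIO_NET_HDR_F_DATA_VALID = 2

def hdr_flags(ip_summed: int, has_data_valid: bool) -> int:
    """Pick virtio-net header flags; DATA_VALID only if the caller allows it."""
    if ip_summed == CHECKSUM_PARTIAL:
        return VIRTIO_NET_HDR_F_NEEDS_CSUM
    if has_data_valid and ip_summed == CHECKSUM_UNNECESSARY:
        return VIRTIO_NET_HDR_F_DATA_VALID
    return 0
```

A transmit-side caller would pass `has_data_valid=False`, and a receive-side caller `True`, so the flag is kept where it is meaningful.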
     
  • [ Upstream commit 0faa9cb5b3836a979864a6357e01d2046884ad52 ]

    Demonstrating the issue:

    .. add a drop action
    $ sudo $TC actions add action drop index 10

    .. retrieve it
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 29 sec used 29 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... bug 1 above: reference is two.
    Reference is actually 1 but we forget to subtract 1.

    ... do a GET again and we see the same issue
    try a few times and nothing changes
    ~$ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 31 sec used 31 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... let's try to bind the action to a filter..
    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    ... and now a few GETs:
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 3 bind 1 installed 204 sec used 204 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 4 bind 1 installed 206 sec used 206 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 5 bind 1 installed 235 sec used 235 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    .... as can be observed the reference count keeps going up.

    After the fix

    $ sudo $TC actions add action drop index 10
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 4 sec used 4 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 6 sec used 6 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 32 sec used 32 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 33 sec used 33 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jamal Hadi Salim
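
The reference counts in the transcript above can be reproduced with a toy model. This is a guess that matches the printed numbers, not the kernel's logic: the `Action` class, the function names, and in particular the "release only while unbound" condition in `get_buggy` are assumptions made for illustration.

```python
class Action:
    def __init__(self):
        self.ref = 1    # reference held by the action table
        self.bind = 0   # number of filters bound to the action

def bind_to_filter(action: Action) -> None:
    action.ref += 1
    action.bind += 1

def get_buggy(action: Action) -> int:
    """GET before the fix: reports the temporary reference, leaks it when bound."""
    action.ref += 1               # lookup takes a reference
    shown = action.ref            # reported count includes that reference
    if action.bind == 0:          # assumed: released only while unbound
        action.ref -= 1
    return shown

def get_fixed(action: Action) -> int:
    """GET after the fix: reports the real count and always drops its reference."""
    action.ref += 1
    shown = action.ref - 1        # subtract the lookup's own reference
    action.ref -= 1
    return shown
```

Running the buggy variant on a fresh action gives 2, 2 and then 3, 4, 5 once a filter is bound; the fixed variant gives 1, 1 and then 2, 2, as in the "After the fix" output.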
     
  • [ Upstream commit 8a367e74c0120ef68c8c70d5a025648c96626dff ]

    The ax.25 socket connection timed out and the sock struct had already
    been taken down, i.e. the sock struct pointer is now NULL. Checking
    sock_flag then causes a segfault. Check that the sock struct pointer
    is not NULL before checking sock_flag. This segfault is seen in
    timed-out netrom connections.

    Please submit to -stable.

    Signed-off-by: Basil Gunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Basil Gunn
     
  • [ Upstream commit 02ca0423fd65a0a9c4d70da0dbb8f4b8503f08c7 ]

    With ip6gre we have a tunnel header which also makes the tunnel MTU
    smaller. We need to reserve room for it. Previously we were using up
    space reserved for the Tunnel Encapsulation Limit option
    header (RFC 2473).

    Also, after commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
    call common GRE functions") our contract with the caller has
    changed. Now we check if the packet length exceeds the tunnel MTU after
    the tunnel header has been pushed, unlike before.

    This is reflected in the check where we look at the packet length minus
    the size of the tunnel header, which is already accounted for in tunnel
    MTU.

    Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Sitnicki
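
The length check described in this commit amounts to a one-line comparison. The sketch below assumes illustrative numbers (a 4-byte tunnel header and a tunnel MTU of 1456); `exceeds_tunnel_mtu` is a hypothetical helper, not a kernel function.

```python
def exceeds_tunnel_mtu(pkt_len: int, tun_hlen: int, mtu: int) -> bool:
    """pkt_len already includes the pushed tunnel header; the tunnel MTU
    already accounts for that header, so subtract it before comparing."""
    return pkt_len - tun_hlen > mtu

TUN_HLEN = 4        # assumed size of the tunnel header
TUNNEL_MTU = 1456   # assumed tunnel MTU

payload = 1456                       # fits the tunnel MTU exactly
pkt_len = payload + TUN_HLEN         # length seen after the header is pushed
naive_check = pkt_len > TUNNEL_MTU   # comparing without the subtraction
```

Here `naive_check` is True even though the payload fits, which is the kind of spurious "too big" result the subtraction avoids.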
     
  • [ Upstream commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee ]

    When executing conntrack actions on skbuffs with checksum mode
    CHECKSUM_COMPLETE, the checksum must be updated to account for
    header pushes and pulls. Otherwise we get "hw csum failure"
    logs similar to this (ICMP packet received on geneve tunnel
    via ixgbe NIC):

    [ 405.740065] genev_sys_6081: hw csum failure
    [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1
    [ 405.740108] Call Trace:
    [ 405.740110]
    [ 405.740113] dump_stack+0x63/0x87
    [ 405.740116] netdev_rx_csum_fault+0x3a/0x40
    [ 405.740118] __skb_checksum_complete+0xcf/0xe0
    [ 405.740120] nf_ip_checksum+0xc8/0xf0
    [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
    [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack]
    [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
    [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch]
    [ 405.740145] ? netif_rx_internal+0x44/0x110
    [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch]
    [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch]
    [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch]
    [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch]
    [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch]
    [ 405.740168] ? udp_rcv+0x1a/0x20
    [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0
    [ 405.740172] ? ip_local_deliver+0x6f/0xe0
    [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0
    [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0
    [ 405.740177] ? ip_rcv+0x2a7/0x400
    [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00
    [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch]
    [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00
    [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
    [ 405.740197] __netif_receive_skb+0x18/0x60
    [ 405.740199] netif_receive_skb_internal+0x40/0xb0
    [ 405.740201] napi_gro_receive+0xcd/0x120
    [ 405.740204] gro_cell_poll+0x57/0x80 [geneve]
    [ 405.740206] net_rx_action+0x260/0x3c0
    [ 405.740209] __do_softirq+0xc9/0x28c
    [ 405.740211] irq_exit+0xd9/0xf0
    [ 405.740213] do_IRQ+0x51/0xd0
    [ 405.740215] common_interrupt+0x93/0x93

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Lance Richardson
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Lance Richardson
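
Why a header pull must adjust a CHECKSUM_COMPLETE value can be shown with 16-bit ones' complement arithmetic (RFC 1071). This is a simplified model: the kernel works on a 32-bit partial sum, and the byte strings and helper names below are made up for the example.

```python
def csum(data: bytes) -> int:
    """16-bit ones' complement sum of an even-length buffer."""
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def csum_sub(a: int, b: int) -> int:
    """Remove b's contribution from a (ones' complement subtraction)."""
    total = a + (~b & 0xFFFF)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

header = bytes.fromhex("4500003c1c46400040060000")   # 12 arbitrary header bytes
payload = bytes.fromhex("deadbeef0102030405060708")  # arbitrary payload bytes

complete = csum(header + payload)                # sum over everything received
after_pull = csum_sub(complete, csum(header))    # sum adjusted for the pulled header
```

After the pull, `after_pull` matches the sum over the payload alone, while the unadjusted `complete` does not. Comparisons are best done modulo 0xFFFF, since 0x0000 and 0xFFFF both represent zero in ones' complement.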
     
  • [ Upstream commit 003c941057eaa868ca6fedd29a274c863167230d ]

    Fix up a data alignment issue on sparc by swapping the order
    of the cookie byte array field with the length field in
    struct tcp_fastopen_cookie, and making it a proper union
    to clean up the typecasting.

    This addresses log complaints like these:
    log_unaligned: 113 callbacks suppressed
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
    Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
    Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
    Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360

    Cc: Eric Dumazet
    Signed-off-by: Shannon Nelson
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shannon Nelson
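
The alignment effect of the field swap can be checked with ctypes. The layouts below are assumptions for illustration: the message only names the cookie byte array and the length field, so the `exp` flag, the IPv6 address member of the union, and all class names are not taken from this commit.

```python
import ctypes

TCP_FASTOPEN_COOKIE_MAX = 16

class In6Addr(ctypes.Structure):
    # An IPv6 address viewed as four 32-bit words, hence 4-byte aligned.
    _fields_ = [("s6_addr32", ctypes.c_uint32 * 4)]

class OldCookie(ctypes.Structure):
    # Length first: the byte array starts at offset 1.
    _fields_ = [("len", ctypes.c_int8),
                ("val", ctypes.c_uint8 * TCP_FASTOPEN_COOKIE_MAX),
                ("exp", ctypes.c_bool)]

class CookieData(ctypes.Union):
    _fields_ = [("val", ctypes.c_uint8 * TCP_FASTOPEN_COOKIE_MAX),
                ("addr", In6Addr)]

class NewCookie(ctypes.Structure):
    # Union first: the cookie bytes start at offset 0 and inherit the
    # alignment of the widest union member.
    _fields_ = [("data", CookieData),
                ("len", ctypes.c_int8),
                ("exp", ctypes.c_bool)]
```

In the old layout a cast of `val` to a wider type reads from an odd offset; in the new layout the union gives the same bytes a naturally aligned view without a cast.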
     
  • [ Upstream commit 8a430ed50bb1b19ca14a46661f3b1b35f2fb5c39 ]

    rtm_table is an 8-bit field while table ids are allowed up to u32. Commit
    709772e6e065 ("net: Fix routing tables with id > 255 for legacy software")
    added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the
    table id is > 255. The table id returned on get route requests should do
    the same.

    Fixes: c36ba6603a11 ("net: Allow user to get table id from route lookup")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
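
The rule for filling the 8-bit rtm_table field is small enough to write out. A minimal sketch, assuming a hypothetical helper `rtm_table_for`; RT_TABLE_COMPAT is 252 in the rtnetlink headers.

```python
RT_TABLE_COMPAT = 252   # placeholder id for tables that do not fit in 8 bits

def rtm_table_for(table_id: int) -> int:
    """Value for the 8-bit rtm_table field given a u32 table id."""
    return table_id if table_id < 256 else RT_TABLE_COMPAT
```

For example, `rtm_table_for(254)` stays 254, while `rtm_table_for(1000)` becomes 252.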
     
  • [ Upstream commit ea7a80858f57d8878b1499ea0f1b8a635cc48de7 ]

    Handle failure in lwtunnel_fill_encap adding attributes to skb.

    Fixes: 571e722676fe ("ipv4: support for fib route lwtunnel encap attributes")
    Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

01 Feb, 2017

1 commit

  • commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream.

    After removing the sunrpc module, I get many kmemleak reports such as:
    unreferenced object 0xffff88003316b1e0 (size 544):
    comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] kmem_cache_alloc+0x15e/0x1f0
    [] ida_pre_get+0xaa/0x150
    [] ida_simple_get+0xad/0x180
    [] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    [] lockd+0x4d/0x270 [lockd]
    [] param_set_timeout+0x55/0x100 [lockd]
    [] svc_defer+0x114/0x3f0 [sunrpc]
    [] svc_defer+0x2d7/0x3f0 [sunrpc]
    [] rpc_show_info+0x8a/0x110 [sunrpc]
    [] proc_reg_write+0x7f/0xc0
    [] __vfs_write+0xdf/0x3c0
    [] vfs_write+0xef/0x240
    [] SyS_write+0xad/0x130
    [] entry_SYSCALL_64_fastpath+0x1a/0xa9
    [] 0xffffffffffffffff

    I found that the ida information (dynamic memory) isn't cleaned up.

    Signed-off-by: Kinglong Mee
    Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     

26 Jan, 2017

18 commits

  • commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream.

    This is useless and more importantly not allowed on the writeback path,
    because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
    can recurse back into the filesystem:

    kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
    ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
    ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
    00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
    [] ? schedule+0x31/0x80
    [] ? schedule_preempt_disabled+0xa/0x10
    [] ? __mutex_lock_slowpath+0xb4/0x130
    [] ? mutex_lock+0x1b/0x30
    [] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
    [] ? move_active_pages_to_lru+0x125/0x270
    [] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
    [] ? __list_lru_walk_one.isra.3+0x33/0x120
    [] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
    [] ? super_cache_scan+0x17e/0x190
    [] ? shrink_slab.part.38+0x1e3/0x3d0
    [] ? shrink_node+0x10a/0x320
    [] ? do_try_to_free_pages+0xf4/0x350
    [] ? try_to_free_pages+0xea/0x1b0
    [] ? __alloc_pages_nodemask+0x61d/0xe60
    [] ? cache_grow_begin+0x9d/0x560
    [] ? fallback_alloc+0x148/0x1c0
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? __kmalloc+0x1eb/0x580
    [] ? crush_choose_firstn+0x3eb/0x470 [libceph]
    [] ? __crypto_alloc_tfm+0x37/0x130
    [] ? crypto_spawn_tfm+0x39/0x60
    [] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
    [] ? __crypto_alloc_tfm+0xcc/0x130
    [] ? crypto_skcipher_init_tfm+0x113/0x180
    [] ? crypto_create_tfm+0x43/0xb0
    [] ? crypto_larval_lookup+0x150/0x150
    [] ? crypto_alloc_tfm+0x72/0x120
    [] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
    [] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
    [] ? release_sock+0x40/0x90
    [] ? tcp_recvmsg+0x4b4/0xae0
    [] ? ceph_encrypt2+0x54/0xc0 [libceph]
    [] ? ceph_x_encrypt+0x5d/0x90 [libceph]
    [] ? calcu_signature+0x5f/0x90 [libceph]
    [] ? ceph_x_sign_message+0x35/0x50 [libceph]
    [] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
    [] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
    [] ? queue_con_delay+0x33/0xd0 [libceph]
    [] ? __submit_request+0x20d/0x2f0 [libceph]
    [] ? ceph_osdc_start_request+0x28/0x30 [libceph]
    [] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
    [] ? process_one_work+0x160/0x410
    [] ? worker_thread+0x4d/0x480
    [] ? process_one_work+0x410/0x410
    [] ? kthread+0xcd/0xf0
    [] ? ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x190/0x190

    Allocating the cipher along with the key fixes the issue - as long as the
    key doesn't change, a single cipher context can be used concurrently in
    multiple requests.

    We still can't take that GFP_KERNEL allocation though. Both
    ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
    GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

    Reported-by: Lucas Stach
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 6d6bf72de914059b304f7b99530a7856e5c846aa upstream.

    Clean up: This message was intended to be a dprintk, as it is on the
    server-side.

    Fixes: 87cfb9a0c85c ('xprtrdma: Client-side support for ...')
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • commit 8d38de65644d900199f035277aa5f3da4aa9fc17 upstream.

    Verbs providers may perform house-keeping on the Send Queue during
    each signaled send completion. It is necessary therefore for a verbs
    consumer (like xprtrdma) to occasionally force a signaled send
    completion if it runs unsignaled most of the time.

    xprtrdma does not require signaled completions for Send or FastReg
    Work Requests, but does signal some LocalInv Work Requests. To
    ensure that Send Queue house-keeping can run before the Send Queue
    is more than half-consumed, xprtrdma forces a signaled completion
    on occasion by counting the number of Send Queue Entries it
    consumes. It currently does this by counting each ib_post_send as
    one Entry.

    Commit c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
    introduced the ability for frwr_op_unmap_sync to post more than one
    Work Request with a single post_send. Thus the underlying assumption
    of one Send Queue Entry per ib_post_send is no longer true.

    Also, FastReg Work Requests are currently never signaled. They
    should be signaled once in a while, just as Send is, to keep the
    accounting of consumed SQEs accurate.

    While we're here, convert the CQCOUNT macros to the currently
    preferred kernel coding style, which is inline functions.

    Fixes: c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
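
The accounting change in this commit, counting Work Requests rather than post calls, can be sketched like this. The class and its names are illustrative, not the xprtrdma code; the budget would be chosen so that a signaled completion happens before the Send Queue is more than half consumed.

```python
class SendQueueCounter:
    """Counts consumed Send Queue Entries and asks for a signaled
    completion when the budget is used up. Illustrative names only."""

    def __init__(self, budget: int):
        self.budget = budget
        self.remaining = budget

    def post(self, work_requests: int = 1) -> bool:
        """Account for one post carrying `work_requests` Work Requests.

        Returns True when this post should request a signaled completion.
        """
        self.remaining -= work_requests
        if self.remaining <= 0:
            self.remaining = self.budget   # start a new accounting window
            return True
        return False
```

With a budget of 4, a single-WR post followed by a three-WR post triggers a signaled completion on the second post. Counting each post as one entry would have delayed it by two more posts.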
     
  • commit 124f930b8cbc4ac11236e6eb1c5f008318864588 upstream.

    ... otherwise the crypto stack will align it for us with a GFP_ATOMIC
    allocation and a memcpy() -- see skcipher_walk_first().

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 2b1e1a7cd0a615d57455567a549f9965023321b5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit e15fd0a11db00fc7f470a9fc804657ec3f6d04a5 upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit d03857c63bb036edff0aa7a107276360173aca4e upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 4eb4517ce7c9c573b6c823de403aeccb40018cfc upstream.

    - replace an ad-hoc array with a struct
    - rename to calc_signature() for consistency

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 7882a26d2e2e520099e2961d5e2e870f8e4172dc upstream.

    It's going to be used as a temporary buffer for in-place en/decryption
    with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
    Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit a45f795c65b479b4ba107b6ccde29b896d51ee98 upstream.

    Starting with 4.9, kernel stacks may be vmalloced and therefore not
    guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
    option is enabled by default on x86. This makes it invalid to use
    on-stack buffers with the crypto scatterlist API, as sg_set_buf()
    expects a logical address and won't work with vmalloced addresses.

    There isn't a different (e.g. kvec-based) crypto API we could switch
    net/ceph/crypto.c to and the current scatterlist.h API isn't getting
    updated to accommodate this use case. Allocating a new header and
    padding for each operation is a non-starter, so do the en/decryption
    in-place on a single pre-assembled (header + data + padding) heap
    buffer. This is explicitly supported by the crypto API:

    "... the caller may provide the same scatter/gather list for the
    plaintext and cipher text. After the completion of the cipher
    operation, the plaintext data is replaced with the ciphertext data
    in case of an encryption and vice versa for a decryption."

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 55d9cc834f933698fc864f0d36f3cca533d30a8d upstream.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 462e650451c577d15eeb4d883d70fa9e4e529fad upstream.

    Since commit 0a990e709356 ("ceph: clean up service ticket decoding"),
    th->session_key isn't assigned until everything is decoded.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit 36721ece1e84a25130c4befb930509b3f96de020 upstream.

    Pass what's going to be encrypted - that's msg_b, not ticket_blob.
    ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
    the maxlen calculation, but makes it a bit clearer.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit ce1ca7d2d140a1f4aaffd297ac487f246963dd2f upstream.

    In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
    invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
    svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
    ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
    problems when the iova being unmapped is subsequently reused. Remove
    the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
    handle it.

    Fixes: 412a15c0fe53 ("svcrdma: Port to new memory registration API")
    Signed-off-by: Sriharsha Basavapatna
    Reviewed-by: Chuck Lever
    Reviewed-by: Yuval Shaia
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Sriharsha Basavapatna
     
  • commit eeb0d56fab4cd7848cf2be6704fa48900dbc1381 upstream.

    In AP (or VLAN) mode, when unicast 802.11 packets are received,
    they might actually be multicast after conversion. In this case
    the fast-RX path didn't handle them properly to send them back
    to the wireless medium. Implement that by copying the SKB and
    sending it back out.

    The possible alternative would be to just punt the packet back
    to the regular (slow) RX path, but since we have almost all of
    the required code here already it's not so complicated to add
    here. Punting it back would also mean acquiring the spinlock,
    which would be bad for the stated purpose of the fast-RX path,
    to enable well-performing parallel RX.

    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

    Context expiry times are in units of seconds since boot, not unix time.

    The use of get_seconds() here therefore sets the expiry time decades in
    the future. This prevents timely freeing of contexts destroyed by
    client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
    (when the module is unloaded or the container shut down), but a lot of
    contexts could pile up before then.

    Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
    Reported-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    J. Bruce Fields
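
The size of the clock mismatch is easy to put into numbers. The timestamps below are illustrative (a machine that booted one hour ago, in roughly January 2017), and `seconds_until_expiry` is a hypothetical helper.

```python
def seconds_until_expiry(expiry: int, now_since_boot: int) -> int:
    """Time left for a context whose expiry is kept in seconds since boot."""
    return expiry - now_since_boot

UNIX_NOW = 1_485_000_000   # seconds since 1970, roughly January 2017
UPTIME = 3_600             # seconds since boot

buggy_expiry = UNIX_NOW    # "expire now", but read from the wall clock
fixed_expiry = UPTIME      # "expire now" on the seconds-since-boot clock

YEAR = 365 * 24 * 3600
buggy_years_left = seconds_until_expiry(buggy_expiry, UPTIME) // YEAR
```

The context that should expire immediately instead has about 47 years left in this example.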
     
  • commit 546125d1614264d26080817d0c8cddb9b25081fa upstream.

    The inet6addr_chain is an atomic notifier chain, so we can't call
    anything that might sleep (like lock_sock)... instead of closing the
    socket from svc_age_temp_xprts_now (which is called by the notifier
    function), just have the rpc service threads do it instead.

    Fixes: c3d4879e01be "sunrpc: Add a function to close..."
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     

20 Jan, 2017

3 commits

  • commit dc5367bcc556e97555fc94a32cd1aadbebdff47e upstream.

    With commit e53743994e21
    ("af_iucv: use paged SKBs for big outbound messages"),
    we transmit paged skbs for both of AF_IUCV's transport modes
    (IUCV or HiperSockets).
    The qeth driver for Layer 3 HiperSockets currently doesn't
    support NETIF_F_SG, so these skbs would just be linearized again
    by the stack.
    Avoid that overhead by using paged skbs only for IUCV transport.

    cc stable, since this also circumvents a significant skb leak when
    sending large messages (where the skb then needs to be linearized).

    Signed-off-by: Julian Wiedmann
    Signed-off-by: Ursula Braun
    Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages")
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • commit 14221cc45caad2fcab3a8543234bb7eda9b540d5 upstream.

    Problem:
    br_nf_pre_routing_finish() calls itself instead of
    br_nf_pre_routing_finish_bridge(). Due to this bug reverse path filter drops
    packets that go through bridge interface.

    User impact:
    Local docker containers with bridge network can not communicate with each
    other.

    Fixes: c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
    Signed-off-by: Artur Molchanov
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Artur Molchanov
     
  • commit 753aacfd2e95df6a0caf23c03dc309020765bea9 upstream.

    A single netlink socket might own multiple interfaces *and* a
    scheduled scan request (which might belong to another interface),
    so when it goes away both may need to be destroyed.

    Remove the schedule_scan_stop indirection to fix this - it's only
    needed for interface destruction because of the way this works
    right now, with a single work taking care of all interfaces.

    Fixes: 93a1e86ce10e4 ("nl80211: Stop scheduled scan if netlink client disappears")
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     

15 Jan, 2017

4 commits

  • commit 1b9f700b8cfc31089e2dfa5d0905c52fd4529b50 upstream.

    Logic copied from xs_setup_bc_tcp().

    Fixes: 39a9beab5acb ('rpc: share one xps between all backchannels')
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     
  • [ Upstream commit 7a18c5b9fb31a999afc62b0e60978aa896fc89e9 ]

    fib_select_path does not call fib_select_multipath if oif is set in the
    flow struct. For VRF use cases oif is always set, so multipath route
    selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
    check similar to what is done in fib_table_lookup.

    Add saddr and proto to the flow struct for the fib lookup done by the
    VRF driver to better match hash computation for a flow.

    Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
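
The change to the multipath decision can be written as a pair of predicates. This is a sketch of the condition as described in the message, not the kernel function; the flag's numeric value is a placeholder and the function names are hypothetical.

```python
FLOWI_FLAG_SKIP_NH_OIF = 0x04   # placeholder value, not checked against kernel headers

def use_multipath_old(num_nexthops: int, oif: int) -> bool:
    """Before: any oif in the flow disables multipath selection."""
    return num_nexthops > 1 and oif == 0

def use_multipath_new(num_nexthops: int, oif: int, flags: int) -> bool:
    """After: the oif check is skipped when the flow carries the flag."""
    skip_oif_check = bool(flags & FLOWI_FLAG_SKIP_NH_OIF)
    return num_nexthops > 1 and (oif == 0 or skip_oif_check)
```

For a VRF flow, where oif is always set, the old predicate is always False; the new one allows multipath selection as long as the flag is present.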
     
  • [ Upstream commit 57ea52a865144aedbcd619ee0081155e658b6f7d ]

    The GRO fast path caches the frag0 address. This address becomes
    invalid if frag0 is modified by pskb_may_pull or its variants.
    So whenever that happens we must disable the frag0 optimization.

    This is usually done through the combination of gro_header_hard
    and gro_header_slow, however, the IPv6 extension header path did
    the pulling directly and would continue to use the GRO fast path
    incorrectly.

    This patch fixes it by disabling the fast path when we enter the
    IPv6 extension header path.

    Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
    Reported-by: Slava Shwartsman
    Signed-off-by: Herbert Xu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     
  • [ Upstream commit 7cfd5fd5a9813f1430290d20c0fead9b4582a307 ]

    On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
    so we shall use min_t() instead of min() to avoid a compiler error.

    Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
    Reported-by: kernel test robot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet