12 Sep, 2014
1 commit
-
Pull Ceph fixes from Sage Weil:
"The main thing here is a set of three patches that fix a buffer
overrun for large authentication tickets (sigh).
There is also a trivial warning fix and an error path fix that are
both regressions."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: do not hard code max auth ticket len
libceph: add process_one_ticket() helper
libceph: gracefully handle large reply messages from the mon
rbd: fix error return code in rbd_dev_device_setup()
rbd: avoid format-security warning inside alloc_workqueue()
11 Sep, 2014
3 commits
-
We hard code cephx auth ticket buffer size to 256 bytes. This isn't
enough for any moderate setup and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the
buffer is allocated dynamically anyway, allocate it a bit later, at
the point where we know how much is going to be needed.
Fixes: http://tracker.ceph.com/issues/8979
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil -
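The idea of sizing the buffer only once the length is known can be sketched in userspace C. This is a minimal illustration, not the libceph code: the wire format (a 4-byte little-endian length followed by the payload), the struct ticket_blob type and the decode_ticket_blob() function are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one decoded ticket blob (not a libceph type). */
struct ticket_blob {
    uint8_t *data;   /* heap buffer sized to the decoded length */
    uint32_t len;
};

/*
 * Decode one blob from [p, end). Assumed format: 4-byte little-endian
 * length, then that many payload bytes. The buffer is allocated only
 * after the length has been read and checked against the remaining
 * input, so there is no fixed-size buffer to overrun.
 * Returns 0 on success, -1 on truncated input or allocation failure.
 */
static int decode_ticket_blob(const uint8_t *p, const uint8_t *end,
                              struct ticket_blob *out)
{
    uint32_t len;

    assert(p != NULL && end != NULL && out != NULL && p <= end);

    if ((size_t)(end - p) < 4)
        return -1;
    len = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    p += 4;

    if (len > (size_t)(end - p))
        return -1;              /* claimed length exceeds the input */

    out->data = malloc(len ? len : 1);
    if (!out->data)
        return -1;
    memcpy(out->data, p, len);
    out->len = len;
    return 0;
}
```

A 300-byte payload, which would not fit a hard-coded 256-byte buffer, decodes fine here, while input that is shorter than its claimed length is rejected before any copy happens.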
Add a helper for processing individual cephx auth tickets. Needed for
the next commit, which deals with allocating ticket buffers. (Most of
the diff here is whitespace - view with git diff -b).
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil -
We preallocate a few of the message types we get back from the mon. If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.
CC: stable@vger.kernel.org
Signed-off-by: Sage Weil
Reviewed-by: Ilya Dryomov
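The fallback described in this commit might look roughly like the following userspace sketch. The struct reply_msg type and the helper names are hypothetical and chosen only for illustration; the real code works on libceph message structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical message type for this sketch; not the libceph structure. */
struct reply_msg {
    size_t front_cap;       /* capacity of the front buffer in bytes */
    unsigned char *front;
};

static struct reply_msg *reply_msg_new(size_t front_cap)
{
    struct reply_msg *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->front = malloc(front_cap ? front_cap : 1);
    if (!m->front) {
        free(m);
        return NULL;
    }
    m->front_cap = front_cap;
    return m;
}

static void reply_msg_free(struct reply_msg *m)
{
    if (m) {
        free(m->front);
        free(m);
    }
}

/*
 * Pick the message to receive into. Reuse the preallocated one when the
 * incoming front fits; otherwise try a fresh allocation of the right
 * size instead of writing past the preallocated buffer.
 * Returns NULL if that allocation fails.
 */
static struct reply_msg *pick_reply_msg(struct reply_msg *prealloc,
                                        size_t incoming_front_len)
{
    assert(prealloc != NULL);
    if (incoming_front_len <= prealloc->front_cap)
        return prealloc;
    return reply_msg_new(incoming_front_len);
}
```

The caller has to compare the returned pointer with the preallocated one to know whether it owns a fresh message that must be freed later.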
08 Sep, 2014
2 commits
-
John W. Linville says:
====================
pull request: wireless 2014-09-05
Please pull this batch of fixes intended for the 3.17 stream...
For the mac80211 bits, Johannes says:
"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.
In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."
For the iwlwifi bits, Emmanuel says:
"I revert a patch that disabled CTS to self in dvm because users
reported issues. The revert is CCed to stable since the offending
patch was sent to stable too. I also bump the firmware API versions
since a new firmware is coming up. On top of that, Marcel fixes a
bug I introduced while fixing a bug in our Kconfig file."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller
-
It is possible that the interface is already gone after joining
the list of anycast on this interface, as we don't hold a refcount
for the device; in this case we are safe to ignore the error.
What's more important, for API compatibility we should not
change this behavior for applications even if it were correct.
Fixes: commit a9ed4a2986e13011 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
Cc: Sabrina Dubroca
Cc: David S. Miller
Signed-off-by: Cong Wang
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
06 Sep, 2014
7 commits
-
This patch fixes a spelling typo found in DocBook/networking.xml.
Because networking.xml is generated from comments in the source,
the typo has to be fixed in the comments within the source.
Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: David S. Miller -
Paul Bolle reports that 'select NETFILTER_XT_NAT' from the IPV4 and IPV6
NAT tables becomes a no-op since there is no Kconfig switch for it. Add the
Kconfig switch to resolve this problem.
Fixes: 8993cf8 netfilter: move NAT Kconfig switches out of the iptables scope
Reported-by: Paul Bolle
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller -
addrconf_get_prefix_route() ensures that we get the right route from the right table.
Signed-off-by: Nicolas Dichtel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller -
There is no reason to take a refcnt before deleting the peer address route.
It's done some lines below for the local prefix route because
inet6_ifa_finish_destroy() will release it at the end.
For the peer address route, we want to free it right now.
This bug has been introduced by commit
caeaba79009c ("ipv6: add support of peer address").
Signed-off-by: Nicolas Dichtel
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller -
The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.
The second check was accidentally removed. Reinstitute the original
behavior.
Tested:
Without this patch, Documentation/networking/txtimestamp reports
timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
After the patch, it only reports them when the flag is set.
Fixes: f24b9be5957b ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
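The two-condition rule above can be written down as a small predicate. This is only a sketch: PKT_GEN_SW_TSTAMP and SOCK_REPORT_SW are hypothetical stand-ins for the per-packet generation flag (SKBTX_..) and the per-socket SOF_TIMESTAMPING_SOFTWARE bit, and their values here are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-packet generation flag (SKBTX_..). */
#define PKT_GEN_SW_TSTAMP (1u << 0)
/* Hypothetical stand-in for the per-socket SOF_TIMESTAMPING_SOFTWARE bit. */
#define SOCK_REPORT_SW    (1u << 0)

/* Report a software timestamp only when BOTH bits are set. */
static bool should_report_sw_tstamp(unsigned pkt_flags, unsigned sock_flags)
{
    return (pkt_flags & PKT_GEN_SW_TSTAMP) && (sock_flags & SOCK_REPORT_SW);
}

/* Usage: generation alone, or reporting alone, must not report. */
static void check_sw_tstamp_rule(void)
{
    assert(should_report_sw_tstamp(PKT_GEN_SW_TSTAMP, SOCK_REPORT_SW));
    assert(!should_report_sw_tstamp(PKT_GEN_SW_TSTAMP, 0));
    assert(!should_report_sw_tstamp(0, SOCK_REPORT_SW));
}
```

The regression described in the commit corresponds to dropping the second operand of the conjunction.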
Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
could return NULL if tunnel->sock->sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:
[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[] [] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005] [] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005] [] ? might_fault+0x9e/0xa5
[ 1937.664005] [] ? might_fault+0x55/0xa5
[ 1937.664005] [] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005] [] SYSC_connect+0x87/0xb1
[ 1937.664005] [] ? sysret_check+0x1b/0x56
[ 1937.664005] [] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005] [] ? spin_lock+0x9/0xb
[ 1937.664005] [] SyS_connect+0x9/0xb
[ 1937.664005] [] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP [] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
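The shape of the fix (take a reference, check for NULL, read the MTU, drop the reference) can be modelled in plain C. This is a single-threaded toy model, so it does not reproduce the race itself; struct dst_model, struct sock_model and the helper names are assumptions for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy models; not the kernel's dst_entry or sock. */
struct dst_model {
    int refcnt;
    unsigned mtu;
};

struct sock_model {
    struct dst_model *dst_cache;   /* may be reset to NULL at any time */
};

/* Model of a "get with reference": returns NULL if the cache is empty. */
static struct dst_model *sock_dst_get(struct sock_model *sk)
{
    struct dst_model *d = sk->dst_cache;

    if (d)
        d->refcnt++;
    return d;
}

static void dst_put(struct dst_model *d)
{
    if (d)
        d->refcnt--;
}

/* Return the cached path MTU, or the fallback when no route is cached. */
static unsigned tunnel_pmtu(struct sock_model *sk, unsigned fallback)
{
    struct dst_model *d;
    unsigned mtu = fallback;

    assert(sk != NULL);
    d = sock_dst_get(sk);
    if (d) {
        mtu = d->mtu;
        dst_put(d);
    }
    return mtu;
}
```

The important property is that the pointer is tested before it is dereferenced and that the reference count is balanced on every path.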
Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()
ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
calling ipv6_dev_mc_inc/dec.
This patch moves ASSERT_RTNL() up a level in the call stack.
Signed-off-by: Cong Wang
Signed-off-by: Sabrina Dubroca
Reported-by: Tommi Rantala
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
05 Sep, 2014
1 commit
-
…ernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.
In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."
Signed-off-by: John W. Linville <linville@redhat.com>
04 Sep, 2014
1 commit
-
Distinguish between a dropped and a consumed skb; do not assume the skb
is always consumed.
Cc: Thomas Graf
Cc: Pravin Shelar
Signed-off-by: Li RongQing
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
03 Sep, 2014
3 commits
-
The user_skb may be leaked if an operation on it fails and the code
jumps to the label "out:" without calling genlmsg_unicast.
Cc: Pravin Shelar
Signed-off-by: Li RongQing
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller -
make defconfig reports:
warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED)
Fixes: d79a61d netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_*
Reported-by: kbuild test robot
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller -
Pablo Neira Ayuso says:
====================
pull request: Netfilter/IPVS fixes for net
The following patchset contains seven Netfilter fixes for your net
tree, they are:
1) Make the NAT infrastructure independent of x_tables, some users are
already starting to test nf_tables with NAT without enabling x_tables.
Without this patch for Kconfig, there's a superfluous dependency
between NAT and x_tables.
2) Allow the use of 0 in the cgroup match; the kernel rejects it with
-EINVAL for no good reason. From Daniel Borkmann.
3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also
resolves another NAT dependency with x_tables.
4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook
code as elsewhere in the kernel to resolve toolchain problems, from
Zhouyi Zhou.
5) Use iptunnel_handle_offloads() to set up tunnel encapsulation
depending on the offload capabilities, reported by Alex Gartrell,
patch from Julian Anastasov.
6) Fix wrong family when registering the ip_vs_local_reply6() hook,
also from Julian.
7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał
Miłecki reported that when jumping from 3.16 to 3.17-rc, his log
target is not selected anymore due to changes in the previous
development cycle to accommodate the full logging support for
nf_tables.
====================
Signed-off-by: David S. Miller
02 Sep, 2014
1 commit
-
John W. Linville says:
====================
pull request: wireless 2014-08-28
Please pull this batch of fixes intended for the 3.17 stream.
For the Bluetooth/6LoWPAN/802.15.4 bits, Johan says:
'It contains a connection reference counting fix for LE where a
connection might stay up even though it should get disconnected.
The other 802.15.4 6LoWPAN related patches were sent to the bluetooth
tree by Alexander Aring and described as follows by him:
"
these patches contains patches for the bluetooth branch.
This series includes memory leak fixes and an errno value fix.
Also there are two patches for sending and receiving 1280 6LoWPAN
packets, which makes the IEEE 802.15.4 6LoWPAN stack more RFC
compliant.
"'
Along with that...
Alexey Khoroshilov fixes a use-after-free bug on at76c50x-usb.
Hauke Mehrtens adds a PCI ID to bcma.
Himangi Saraogi fixes a silly "A || A" test in rtlwifi.
Larry Finger adds a device ID to rtl8192cu.
Maks Naumov fixes a strncmp argument in ath9k.
Álvaro Fernández Rojas adds a PCI ID to ssb.
====================
Signed-off-by: David S. Miller
01 Sep, 2014
1 commit
-
CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping
from 3.16 to 3.17-rc1 if you don't enable the new NF_LOG_IPV4 and
NF_LOG_IPV6 switches.
Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4
and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled
when moving from old to new kernels.
Reported-by: Rafał Miłecki
Signed-off-by: Pablo Neira Ayuso
30 Aug, 2014
2 commits
-
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
tree, the official header carries a copy of enum
sctp_sstat_state that looks like (compared to the current in-kernel
enumeration):
User definition:                  Kernel definition:
enum sctp_sstat_state {           typedef enum {
  SCTP_EMPTY = 0,
  SCTP_CLOSED = 1,                  SCTP_STATE_CLOSED = 0,
  SCTP_COOKIE_WAIT = 2,             SCTP_STATE_COOKIE_WAIT = 1,
  SCTP_COOKIE_ECHOED = 3,           SCTP_STATE_COOKIE_ECHOED = 2,
  SCTP_ESTABLISHED = 4,             SCTP_STATE_ESTABLISHED = 3,
  SCTP_SHUTDOWN_PENDING = 5,        SCTP_STATE_SHUTDOWN_PENDING = 4,
  SCTP_SHUTDOWN_SENT = 6,           SCTP_STATE_SHUTDOWN_SENT = 5,
  SCTP_SHUTDOWN_RECEIVED = 7,       SCTP_STATE_SHUTDOWN_RECEIVED = 6,
  SCTP_SHUTDOWN_ACK_SENT = 8,       SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
};                                } sctp_state_t;
This header was later on also placed into the uapi, so that user space
programs can compile without having , but the shipped
with instead.
While RFC6458 under 8.2.1. Association Status (SCTP_STATUS) says that
sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
nevertheless have what appears to be a dummy SCTP_EMPTY state from
the very early days.
While it seems to do just nothing, commit 0b8f9e25b0aa ("sctp: remove
completely unsed EMPTY state") did the right thing and removed this dead
code. That, however, causes an off-by-one when the user asks the SCTP
stack via SCTP_STATUS API and checks for the current socket state, thus
yielding possibly undefined behaviour in applications as they expect
the kernel to tell the right thing.
The enumeration had to be changed, however, as based on the current socket
state, we access a function pointer lookup-table through this. Therefore,
I think the best way to deal with this is just to add a helper function
sctp_assoc_to_state() to encapsulate the off-by-one quirk.
Reported-by: Tristan Su
Fixes: 0b8f9e25b0aa ("sctp: remove completely unsed EMPTY state")
Signed-off-by: Daniel Borkmann
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller -
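Given the two enumerations in the table above, the translation amounts to adding one to the in-kernel value. The sketch below illustrates that mapping in isolation; kernel_state_to_user() is a hypothetical name, and the body is inferred from the table rather than copied from the patch, where the helper is called sctp_assoc_to_state().

```c
#include <assert.h>

/* User-visible states, as listed in the table above. */
enum sctp_sstat_state {
    SCTP_EMPTY = 0,
    SCTP_CLOSED = 1,
    SCTP_COOKIE_WAIT = 2,
    SCTP_COOKIE_ECHOED = 3,
    SCTP_ESTABLISHED = 4,
    SCTP_SHUTDOWN_PENDING = 5,
    SCTP_SHUTDOWN_SENT = 6,
    SCTP_SHUTDOWN_RECEIVED = 7,
    SCTP_SHUTDOWN_ACK_SENT = 8,
};

/* In-kernel states after the EMPTY state was removed, as listed above. */
typedef enum {
    SCTP_STATE_CLOSED = 0,
    SCTP_STATE_COOKIE_WAIT = 1,
    SCTP_STATE_COOKIE_ECHOED = 2,
    SCTP_STATE_ESTABLISHED = 3,
    SCTP_STATE_SHUTDOWN_PENDING = 4,
    SCTP_STATE_SHUTDOWN_SENT = 5,
    SCTP_STATE_SHUTDOWN_RECEIVED = 6,
    SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
} sctp_state_t;

/* Sketch of the translation: the user value is the kernel value plus one. */
static int kernel_state_to_user(sctp_state_t state)
{
    return (int)state + 1;
}

/* Usage: both ends of the range and a middle value line up. */
static void check_state_mapping(void)
{
    assert(kernel_state_to_user(SCTP_STATE_CLOSED) == SCTP_CLOSED);
    assert(kernel_state_to_user(SCTP_STATE_ESTABLISHED) == SCTP_ESTABLISHED);
    assert(kernel_state_to_user(SCTP_STATE_SHUTDOWN_ACK_SENT) ==
           SCTP_SHUTDOWN_ACK_SENT);
}
```

Keeping the offset in one helper leaves the in-kernel enumeration usable as a zero-based index into the lookup table while user space keeps seeing the values it was compiled against.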
In commit ed98df3361f0 ("net: use __GFP_NORETRY for high order
allocations") we tried to address one issue caused by order-3
allocations.
We still observe high latencies and system overhead in situations where
compaction is not successful.
Instead of trying order-3, order-2, and order-1, do a single order-3
best effort and immediately fall back to plain order-0.
This mimics slub strategy to fall back to slab min order if the high
order allocation used for performance failed.
Order-3 allocations give a performance boost only if they can be done
without recurring and expensive memory scan.
Quoting David :
The page allocator relies on synchronous (sync light) memory compaction
after direct reclaim for allocations that don't retry and deferred
compaction doesn't work with this strategy because the allocation order
is always decreasing from the previous failed attempt.
This means sync light compaction will always be encountered if memory
cannot be defragmented or reclaimed several times during the
skb_page_frag_refill() iteration.
Signed-off-by: Eric Dumazet
Acked-by: David Rientjes
Signed-off-by: David S. Miller
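The "one high-order attempt, then straight to order-0" strategy can be sketched independently of the page allocator. Everything named here (try_alloc_fn, pick_refill_order(), alloc_up_to()) is hypothetical and only models the control flow; it is not the skb_page_frag_refill() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HIGH_ORDER 3u

/* Hypothetical allocator hook: true if 2^order pages can be had right now. */
typedef bool (*try_alloc_fn)(unsigned order, void *ctx);

/*
 * Make a single best-effort attempt at HIGH_ORDER and, if it fails,
 * fall back directly to order 0 without walking orders 2 and 1.
 * Returns the order obtained, or -1 if even order 0 failed.
 * *attempts receives the number of allocator calls made.
 */
static int pick_refill_order(try_alloc_fn try_alloc, void *ctx,
                             unsigned *attempts)
{
    assert(try_alloc != NULL && attempts != NULL);

    *attempts = 1;
    if (try_alloc(HIGH_ORDER, ctx))
        return (int)HIGH_ORDER;

    *attempts = 2;
    if (try_alloc(0, ctx))
        return 0;

    return -1;
}

/* Toy allocator: succeeds for any order up to the limit stored in ctx. */
static bool alloc_up_to(unsigned order, void *ctx)
{
    return order <= *(const unsigned *)ctx;
}
```

With the toy allocator limited to order 2, the function still ends up at order 0 after two calls, which is the point of the change: the intermediate orders are never tried, so at most one expensive high-order attempt is made per refill.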
28 Aug, 2014
1 commit
-
commit fc604767613b6d2036cdc35b660bc39451040a47
("ipvs: changes for local real server") from 2.6.37
introduced DNAT support to local real server but the
IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
registered incorrectly as IPv4 hook causing any outgoing
IPv4 traffic to be dropped depending on the IP header values.
Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768
Reported-by: Chris J Arges
Tested-by: Chris J Arges
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
27 Aug, 2014
1 commit
-
The tunneling method should properly use tunnel encapsulation.
Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
offload is supported.
Thanks to Alex Gartrell for reporting the problem, providing
solution and for all suggestions.
Reported-by: Alex Gartrell
Signed-off-by: Julian Anastasov
Signed-off-by: Alex Gartrell
Signed-off-by: Simon Horman
26 Aug, 2014
10 commits
-
The "RX active" string is too long, so the columns get
shifted. Change it to just "RX" to avoid this.
Signed-off-by: Johannes Berg
-
sta->last_seq_ctrl is the seq_ctrl field from the last header
seen, need to shift it 4 bits to extract the sequence number.
Otherwise the ieee80211_sn_less() check at the top of
ieee80211_sta_manage_reorder_buf drops frames until the sequence
number catches up.
Cc: Michal Kazior
Signed-off-by: Denton Gentry
Signed-off-by: Johannes Berg -
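In the 802.11 Sequence Control field the low 4 bits carry the fragment number and the upper 12 bits the sequence number, which is why the shift is needed. The sketch below uses local names (SCTL_SEQ_MASK, seq_ctrl_to_sn(), sn_less()) modelled on the mac80211 helpers; treat them as illustrative rather than as the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTL_SEQ_MASK 0xFFF0u   /* upper 12 bits: sequence number */
#define SN_MASK       0x0FFFu   /* sequence numbers are 12 bits wide */
#define SN_MODULO     0x1000

/* Extract the 12-bit sequence number from a Sequence Control value. */
static uint16_t seq_ctrl_to_sn(uint16_t seq_ctrl)
{
    return (uint16_t)((seq_ctrl & SCTL_SEQ_MASK) >> 4);
}

/* Modulo-4096 "a is before b" comparison on sequence numbers. */
static bool sn_less(uint16_t a, uint16_t b)
{
    return ((a - b) & SN_MASK) > (SN_MODULO >> 1);
}

/* Usage: sequence number 100, fragment 3, and the effect of not shifting. */
static void check_seq_ctrl(void)
{
    uint16_t seq_ctrl = (uint16_t)((100u << 4) | 3u);

    assert(seq_ctrl_to_sn(seq_ctrl) == 100);
    /* Frame 101 is not older than a correctly extracted 100 ... */
    assert(!sn_less(101, seq_ctrl_to_sn(seq_ctrl)));
    /* ... but looks older than the unshifted value, so it would be dropped. */
    assert(sn_less(101, seq_ctrl));
}
```

The last assertion shows the failure mode from the commit message: compared against the raw field, in-order frames appear stale until the sequence number catches up.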
The 802.11 standard says when processing a plink confirm
frame:
"If the peerLinkID in the mesh peering instance has not been
set, the Local Link ID field of the Mesh Peering Confirm
request shall be copied into the peerLinkID in the mesh
peering instance."
We were only doing this when receiving an open peering frame,
but it could happen that the open frame gets lost and so we
should handle this case rather than rejecting the confirm and
failing the whole peering process.
Reported-by: Yu Niiro
Signed-off-by: Bob Copeland
Signed-off-by: Johannes Berg -
In ieee80211_sta_ps_deliver_wakeup, sdata->smps_mode is checked. This is
initialized only for the base AP interface, not the individual VLANs.
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg -
When bringing down the AP, a WARN_ON is hit because the bss config chandef
is empty here.
Since AP_VLAN channel settings do not matter for anything chanctx related
(always inherits the settings from the AP interface), let's just ignore
it here.
Signed-off-by: Felix Fietkau
Signed-off-by: Johannes Berg -
This reverts commit 24aa11ab8ae03292d38ec0dbd9bc2ac49fe8a6dd.
That commit was wrong since it uses data that hasn't even been set
up yet, but might be a hold-over from a previous connection.
Additionally, it seems like a driver-specific workaround that
shouldn't have been in mac80211 to start with.
Cc: stable@vger.kernel.org
Fixes: 24aa11ab8ae0 ("mac80211: disable uAPSD if all ACs are under ACM")
Reviewed-by: Luciano Coelho
Signed-off-by: Johannes Berg -
This is follow-up to
da08143b8520 ("vlan: more careful checksum features handling")
which introduced more careful feature intersection in vlan code,
taking into account that HW_CSUM should be considered superset
of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
in order to avoid offloading mismatch warning when vlan is
created on top of a bond consisting of slaves supporting IP/IPv6
checksumming but not vlan Tx offloading.
Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller -
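Treating HW_CSUM as a superset of IP_CSUM/IPV6_CSUM means a plain bitwise AND is not enough when intersecting feature sets. A minimal sketch of such an intersection follows; the F_* bit values and the function names are hypothetical and do not match the kernel's NETIF_F_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the kernel's NETIF_F_* values differ. */
#define F_IP_CSUM   (1u << 0)
#define F_IPV6_CSUM (1u << 1)
#define F_HW_CSUM   (1u << 2)

/* A device with generic checksumming can also do the protocol-specific kinds. */
static uint32_t expand_csum(uint32_t features)
{
    if (features & F_HW_CSUM)
        features |= F_IP_CSUM | F_IPV6_CSUM;
    return features;
}

/* Intersect two feature sets after expanding the superset bit on both sides. */
static uint32_t intersect_features(uint32_t a, uint32_t b)
{
    return expand_csum(a) & expand_csum(b);
}

/* Usage: HW_CSUM on one side, IP/IPv6 checksumming on the other. */
static void check_intersection(void)
{
    uint32_t upper = F_HW_CSUM;
    uint32_t lower = F_IP_CSUM | F_IPV6_CSUM;

    /* A plain AND loses all checksum offload ... */
    assert((upper & lower) == 0);
    /* ... while the careful intersection keeps what both can do. */
    assert(intersect_features(upper, lower) == (F_IP_CSUM | F_IPV6_CSUM));
}
```

This mirrors the situation in the commit message, where a vlan sits on top of a bond whose slaves support IP/IPv6 checksumming only.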
Code manipulating sysfs symlinks on adjacent net_device(s)
currently doesn't take into account that devices potentially
belong to different namespaces.
This patch tries to fix the issue as follows:
- Check for net_ns before creating / deleting a symlink.
  For now only netdev_adjacent_rename_links and
  __netdev_adjacent_dev_remove are affected; afaics
  __netdev_adjacent_dev_insert implies both net_devs
  belong to the same namespace.
- Drop all existing symlinks to / from all adj_devs before
  switching namespace and recreate them just after.
Signed-off-by: Alexander Y. Fomichev
Signed-off-by: David S. Miller -
This adds one more ACPI ID of a Broadcom bluetooth chip.
Signed-off-by: Mika Westerberg
Signed-off-by: John W. Linville
25 Aug, 2014
1 commit
-
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure
that the toolchain has the required support in addition to
CONFIG_JUMP_LABEL being set.
Signed-off-by: Zhouyi Zhou
Reviewed-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
23 Aug, 2014
5 commits
-
The new_ctx pointer is set only for non-chanctx drivers. This yielded a
crash for chanctx-based drivers during channel switch finalization:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
Use an adequate chanctx pointer to fix this.
Reported-by: Linus Torvalds
Signed-off-by: Michal Kazior
Signed-off-by: Linus Torvalds -
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
Commits 4c47af4d5eb2 ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd509 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.
Currently, the selection algorithm for T.ACT and T.RET is as follows:
1) Elect the two most recently used ACTIVE transports T1, T2 for
T.ACT, T.RET, where T.ACT
T1: p1p1 (10.0.10.10) .'`) p1p1 (10.0.10.12) (_ . ) p1p2 (10.0.10.22)
net.sctp.rto_min = 1000
net.sctp.path_max_retrans = 2
net.sctp.pf_retrans = 0
net.sctp.hb_interval = 1000
T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.
After the patch, it's choosing better transports in both cases by
modifying step 4):
4) If none is ACTIVE, set T.ACT_newPF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.
Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.
In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().
Now both tests work fine:
Case 1:
1. T1 S(ACTIVE) T.ACT
   T2 S(ACTIVE) T.RET
2. T1 S(ACTIVE) T.ACT, T.RET
   T2 S(PF)
3. T1 S(ACTIVE) T.ACT, T.RET
   T2 S(INACTIVE)
5. T1 S(PF) T.ACT, T.RET
   T2 S(INACTIVE)
[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
      T2 S(INACTIVE) ]
6. T1 S(ACTIVE) T.ACT, T.RET
   T2 S(INACTIVE)
7. T1 S(ACTIVE) T.ACT
   T2 S(ACTIVE) T.RET
Case 2:
1. T1 S(ACTIVE) T.ACT
   T2 S(ACTIVE) T.RET
2. T1 S(PF)
   T2 S(ACTIVE) T.ACT, T.RET
3. T1 S(INACTIVE)
   T2 S(ACTIVE) T.ACT, T.RET
5. T1 S(INACTIVE)
   T2 S(PF) T.ACT, T.RET
[ 5.1 T1 S(INACTIVE)
      T2 S(INACTIVE) T.ACT, T.RET ]
6. T1 S(INACTIVE)
   T2 S(ACTIVE) T.ACT, T.RET
7. T1 S(ACTIVE) T.ACT
   T2 S(ACTIVE) T.RET
Signed-off-by: Daniel Borkmann
Acked-by: Neil Horman
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller -
When both transports are the same, we don't have to go down that
road only to realize that we will return the very same transport.
We are guaranteed that curr is always non-NULL. Therefore, just
short-circuit this special case.
Signed-off-by: Daniel Borkmann
Acked-by: Neil Horman
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller -
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of the
   skb. When eth_type_trans is called later as the result of
   OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
   is in fact accessing random data when it reads past the TPID.
2. network_header points into the ethernet header instead of behind it.
   mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller -
The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:
> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
> #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> #1: (&tb->tb6_lock){++--+.}, at: [] ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
> 0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
> ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
> 0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
> Call Trace:
> [] dump_stack+0x4e/0x68
> [] __might_sleep+0x10a/0x120
> [] kmem_cache_alloc_trace+0x4e/0x190
> [] ? fib6_commit_metrics+0x66/0x110
> [] fib6_commit_metrics+0x66/0x110
> [] fib6_add+0x883/0xa80
> [] ? ip6_route_add+0x65a/0x800
> [] ip6_route_add+0x675/0x800
> [] ? ip6_route_add+0x6a/0x800
> [] inet6_rtm_newroute+0x5c/0x80
> [] rtnetlink_rcv_msg+0x211/0x260
> [] ? rtnl_lock+0x17/0x20
> [] ? lock_release_holdtime+0x28/0x180
> [] ? rtnl_lock+0x17/0x20
> [] ? __rtnl_unlock+0x20/0x20
> [] netlink_rcv_skb+0x6e/0xd0
> [] rtnetlink_rcv+0x25/0x40
> [] netlink_unicast+0xd9/0x180
> [] netlink_sendmsg+0x700/0x770
> [] ? local_clock+0x25/0x30
> [] sock_sendmsg+0x6c/0x90
> [] ? might_fault+0xa3/0xb0
> [] ? verify_iovec+0x7d/0xf0
> [] ___sys_sendmsg+0x37e/0x3b0
> [] ? trace_hardirqs_on_caller+0x185/0x220
> [] ? mutex_unlock+0xe/0x10
> [] ? netlink_insert+0xbc/0xe0
> [] ? netlink_autobind.isra.30+0x125/0x150
> [] ? netlink_autobind.isra.30+0x60/0x150
> [] ? netlink_bind+0x159/0x230
> [] ? might_fault+0x5a/0xb0
> [] ? SYSC_bind+0x7e/0xd0
> [] __sys_sendmsg+0x4d/0x80
> [] SyS_sendmsg+0x12/0x20
> [] system_call_fastpath+0x16/0x1b
Fix this by replacing the mode GFP_KERNEL with GFP_ATOMIC.
Signed-off-by: Benjamin Block
Acked-by: David Rientjes
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller