23 Feb, 2016
14 commits
-
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli -
BATADV_BONDING_TQ_THRESHOLD is not used anymore since the implementation
of the bat_neigh_is_similar_or_better() API function.
Such function uses the more generic BATADV_TQ_SIMILARITY_THRESHOLD
constant.Therefore, remove definition of the unused BATADV_BONDING_TQ_THRESHOLD
constant.Signed-off-by: Antonio Quartulli
Signed-off-by: Marek Lindner -
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.cAll three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller
22 Feb, 2016
11 commits
-
Johan Hedberg says:
====================
pull request: bluetooth 2016-02-20Here's an important patch for 4.5 which fixes potential invalid pointer
access when processing completed Bluetooth HCI commands.Please let me know if there are any issues pulling. Thanks.
====================Signed-off-by: David S. Miller
-
While debugging with bpf_jit_disasm I noticed emissions of 'mov %eax,%eax',
and found that this comes from BPF_RET | BPF_A translations from classic
BPF. Emitting this is unnecessary as BPF_REG_A is mapped into BPF_REG_0
already, therefore only emit a mov when immediates are used as return value.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
When using this helper for updating UDP checksums, we need to extend
this in order to write CSUM_MANGLED_0 for csum computations that result
into 0 as sum. Reason we need this is because packets with a checksum
could otherwise become incorrectly marked as a packet without a checksum.
Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
not turn packets without a checksum into ones with a checksum.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().Reported-by: Ashhad Sheikh
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
We currently limit bpf_skb_store_bytes() and bpf_skb_load_bytes()
helpers to only store or load a maximum buffer of 16 bytes. Thus,
loading, rewriting and storing headers require several bpf_skb_load_bytes()
and bpf_skb_store_bytes() calls.Also here we can use a per-cpu scratch buffer instead in order to not
pressure stack space any further. I do suspect that this limit was mainly
set in place for this particular reason. So, ease program development
by removing this limitation and make the scratchpad generic, so it can
be reused.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
currently limited to handle 2 and 4 byte changes in a header and feeds the
from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
working with IPv6, for example, this makes it rather cumbersome to deal
with, similarly when editing larger parts of a header.Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
add a case for header field mask of 0 to change the checksum at a given
offset through inet_proto_csum_replace_by_diff(), and provide a helper
bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
amounts of data.This can be used in multiple ways: for the bpf_l4_csum_replace() only
part, this even provides us with the option to insert precalculated diffs
from user space f.e. from a map, or from bpf_csum_diff() during runtime.bpf_csum_diff() has a optional from/to stack buffer input, so we can
calculate a diff by using a scratchbuffer for scenarios where we're
inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
don't need to be of equal size) data. Also, bpf_csum_diff() allows to
feed a previous csum into csum_partial(), so the function can also be
cascaded.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
The lwt implementations using net devices can autoload using the
existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
that lwt modules not using net devices can use.Therefore, add the ability to autoload modules registering lwt
operations for lwt implementations not using a net device so that
users don't have to manually load the modules.Only users with the CAP_NET_ADMIN capability can cause modules to be
loaded, which is ensured by rtnetlink_rcv_msg rejecting non-RTM_GETxxx
messages for users without this capability, and by
lwtunnel_build_state not being called in response to RTM_GETxxx
messages.Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
Currently vlan device inherits unicast filtering flag from underlying
device. If underlying device doesn't support unicast filter, this will
put vlan device into promiscuous mode when it's stacked.Tun on IFF_UNICAST_FLT on the vlan device in any case so that it does
not go into promiscuous mode needlessly. If underlying device does not
support unicast filtering, that device will enter promiscuous mode.Signed-off-by: Zhang Shengju
Signed-off-by: David S. Miller -
Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires. The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage requries, as passed to __get_free_pages, and two the
number of entries for the hash table). Both need to be ^2, but for
different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1<
Reported-by: Dmitry Vyukov
CC: Dmitry Vyukov
CC: Vladislav Yasevich
CC: "David S. Miller"
Signed-off-by: David S. Miller
20 Feb, 2016
15 commits
-
In commit 44d271377479 ("Bluetooth: Compress the size of struct
hci_ctrl") we squashed down the size of the structure by using a union
with the assumption that all users would use the flag to determine
whether we had a req_complete or a req_complete_skb.Unfortunately we had a case in hci_req_cmd_complete() where we weren't
looking at the flag. This can result in a situation where we might be
storing a hci_req_complete_skb_t in a hci_req_complete_t variable, or
vice versa.During some testing I found at least one case where the function
hci_req_sync_complete() was called improperly because the kernel thought
that it didn't require an SKB. Looking through the stack in kgdb I
found that it was called by hci_event_packet() and that
hci_event_packet() had both of its locals "req_complete" and
"req_complete_skb" pointing to the same place: both to
hci_req_sync_complete().Let's make sure we always check the flag.
For more details on debugging done, see .
Fixes: 44d271377479 ("Bluetooth: Compress the size of struct hci_ctrl")
Signed-off-by: Douglas Anderson
Acked-by: Johan Hedberg
Signed-off-by: Marcel Holtmann -
The unix_stream_read_generic function tries to use a continue statement
to restart the receive loop after waiting for a message. This may not
work as intended as the caller might use a recvmsg call to peek at
control messages without specifying a message buffer. If this was the
case, the continue will cause the function to return without an error
and without the credential information if the function had to wait for a
message while it had returned with the credentials otherwise. Change to
using goto to restart the loop without checking the condition first in
this case so that credentials are returned either way.Signed-off-by: Rainer Weikusat
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller -
The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
to incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.This bug was found by strace test suite.
Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core")
Signed-off-by: Dmitry V. Levin
Acked-by: Cong Wang
Signed-off-by: David S. Miller -
Replace individual implementations with the recently introduced
skb_postpush_rcsum() helper.Signed-off-by: Daniel Borkmann
Acked-by: Tom Herbert
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
This patch implements sub command ETHTOOL_SCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
set coalesce of each masked queue to device driver. The wanted coalesce
information are stored in "data" for each masked queue, which can copy
from userspace.
If it fails to set coalesce to device driver, the value which already
set to specific queue will be tried to rollback.Signed-off-by: Kan Liang
Reviewed-by: Ben Hutchings
Signed-off-by: David S. Miller -
This patch implements sub command ETHTOOL_GCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
get coalesce of each masked queue from device driver. Then the interrupt
coalescing parameters will be copied back to user space one by one.Signed-off-by: Kan Liang
Reviewed-by: Ben Hutchings
Signed-off-by: David S. Miller -
Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
The following patches will enable some SUB_COMMANDs for per queue
setting.Signed-off-by: Kan Liang
Reviewed-by: Ben Hutchings
Signed-off-by: David S. Miller -
In ipv4, when the machine receives a ICMP_FRAG_NEEDED message, the
connected UDP socket will get EMSGSIZE message on its next read from the
socket.
However, this is not the case for ipv6.
This fix modifies the udp err handler in Ipv6 for ICMP6_PKT_TOOBIG to
make it similar to ipv4 behavior. That is when the machine gets an
ICMP6_PKT_TOOBIG message, the connected UDP socket will get EMSGSIZE
message on its next read from the socket.Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
the commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be
correctly controlled.") changed the default xmit checksum setting
for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
set into external UDP header.
This commit changes the rx checksum setting for both lwt vxlan/geneve
devices created by openvswitch accordingly, so that lwt over ipv6
tunnel pairs are again able to communicate with default values.Signed-off-by: Paolo Abeni
Acked-by: Jiri Benc
Acked-by: Jesse Gross
Signed-off-by: David S. Miller -
tipc_bcast_unlock need to be unlocked in error path.
Signed-off-by: Insu Yun
Signed-off-by: David S. Miller -
Antonio Quartulli says:
====================
Two of the fixes included in this patchset prevent wrong memory
access - it was triggered when removing an object from a list
after it was already free'd due to bad reference counting.
This misbehaviour existed for both the gw_node and the
orig_node_vlan object and has been fixed by Sven Eckelmann.The last patch fixes our interface feasibility check and prevents
it from looping indefinitely when two net_device objects
reference each other via iflink index (i.e. veth pair), by
Andrew Lunn
====================Signed-off-by: David S. Miller
-
An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.Signed-off-by: Anton Protopopov
Acked-by: Cong Wang
Signed-off-by: David S. Miller -
When I used netdev_for_each_lower_dev in commit bad531623253 ("vrf:
remove slave queue and private slave struct") I thought that it acts
like netdev_for_each_lower_private and can be used to remove the current
device from the list while walking, but unfortunately it acts more like
netdev_for_each_lower_private_rcu and doesn't allow it. The difference
is where the "iter" points to, right now it points to the current element
and that makes it impossible to remove it. Change the logic to be
similar to netdev_for_each_lower_private and make it point to the "next"
element so we can safely delete the current one. VRF is the only such
user right now, there's no change for the read-only users.Here's what can happen now:
[98423.249858] general protection fault: 0000 [#1] SMP
[98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
virtio_ring virtio scsi_mod [last unloaded: bridge]
[98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O
4.5.0-rc2+ #81
[98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
ffff88003428c000
[98423.256123] RIP: 0010:[] []
netdev_lower_get_next+0x1e/0x30
[98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207
[98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
0000000000000000
[98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
ffff880054ff90c0
[98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
0000000000000000
[98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff88003428f9e0
[98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
0000000000000001
[98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000)
knlGS:0000000000000000
[98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
00000000000406e0
[98423.258902] Stack:
[98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004
ffff880054ff9000
[98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
ffff88003428f978
[98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
ffff880035b35f00
[98423.260739] Call Trace:
[98423.260920] [] vrf_dev_uninit+0x76/0xa0 [vrf]
[98423.261156] []
rollback_registered_many+0x205/0x390
[98423.261401] [] unregister_netdevice_many+0x1c/0x70
[98423.261641] [] rtnl_delete_link+0x3c/0x50
[98423.271557] [] rtnl_dellink+0xcb/0x1d0
[98423.271800] [] ? __inc_zone_state+0x4a/0x90
[98423.272049] [] rtnetlink_rcv_msg+0x84/0x200
[98423.272279] [] ? trace_hardirqs_on+0xd/0x10
[98423.272513] [] ? rtnetlink_rcv+0x1b/0x40
[98423.272755] [] ? rtnetlink_rcv+0x40/0x40
[98423.272983] [] netlink_rcv_skb+0x97/0xb0
[98423.273209] [] rtnetlink_rcv+0x2a/0x40
[98423.273476] [] netlink_unicast+0x11b/0x1a0
[98423.273710] [] netlink_sendmsg+0x3e1/0x610
[98423.273947] [] sock_sendmsg+0x38/0x70
[98423.274175] [] ___sys_sendmsg+0x2e3/0x2f0
[98423.274416] [] ? do_raw_spin_unlock+0xbe/0x140
[98423.274658] [] ? handle_mm_fault+0x26c/0x2210
[98423.274894] [] ? handle_mm_fault+0x4d/0x2210
[98423.275130] [] ? __fget_light+0x91/0xb0
[98423.275365] [] __sys_sendmsg+0x42/0x80
[98423.275595] [] SyS_sendmsg+0x12/0x20
[98423.275827] [] entry_SYSCALL_64_fastpath+0x16/0x7a
[98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
89 06 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
[98423.279639] RIP [] netdev_lower_get_next+0x1e/0x30
[98423.279920] RSPCC: David Ahern
CC: David S. Miller
CC: Roopa Prabhu
CC: Vlad Yasevich
Fixes: bad531623253 ("vrf: remove slave queue and private slave struct")
Signed-off-by: Nikolay Aleksandrov
Reviewed-by: David Ahern
Tested-by: David Ahern
Signed-off-by: David S. Miller -
Currently mdb entries are exported directly as a structure inside
MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without
breaking user-space. In order to export new mdb fields, I've converted
the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before
with struct br_mdb_entry (without header, as it's casted directly in
iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we
keep compatibility with older users and can export new data.
I've tested this with iproute2, both with and without support for the
added attribute and it works fine.
So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry
inside but it may contain also some additional MDBA_MDB_EATTR_ attributes
such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space.So the new structure is:
[MDBA_MDB] = {
[MDBA_MDB_ENTRY] = {
[MDBA_MDB_ENTRY_INFO]
[MDBA_MDB_ENTRY_INFO] {
Signed-off-by: David S. Miller -
Switch the port check and skip if it's null, this allows us to reduce one
indentation level.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller