23 Feb, 2016

14 commits


22 Feb, 2016

11 commits

  • Johan Hedberg says:

    ====================
    pull request: bluetooth 2016-02-20

    Here's an important patch for 4.5 which fixes potential invalid pointer
    access when processing completed Bluetooth HCI commands.

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • While debugging with bpf_jit_disasm I noticed emissions of 'mov %eax,%eax',
    and found that this comes from BPF_RET | BPF_A translations from classic
    BPF. Emitting this is unnecessary as BPF_REG_A is mapped into BPF_REG_0
    already, therefore only emit a mov when immediates are used as return value.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • When using this helper for updating UDP checksums, we need to extend
    this in order to write CSUM_MANGLED_0 for csum computations that result
    in a sum of 0. We need this because packets with a checksum
    could otherwise become incorrectly marked as packets without a checksum.
    Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
    not turn packets without a checksum into ones with a checksum.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
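
    The rule described in this commit can be sketched in plain C as
    follows. This is a user-space illustration only, not the helper's
    actual code: the function name udp_csum_fixup() and its parameters
    are hypothetical, and the boolean stands in for the caller passing
    BPF_F_MARK_MANGLED_0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSUM_MANGLED_0 0xffffu  /* on-wire stand-in for a computed sum of 0 */

/*
 * old_csum:       UDP checksum field before the update (0 = "no checksum")
 * new_csum:       freshly computed checksum value
 * mark_mangled_0: models the caller passing BPF_F_MARK_MANGLED_0
 */
uint16_t udp_csum_fixup(uint16_t old_csum, uint16_t new_csum,
                        bool mark_mangled_0)
{
    if (!mark_mangled_0)
        return new_csum;
    if (old_csum == 0)
        return 0;               /* packet had no checksum: keep it that way */
    if (new_csum == 0)
        return CSUM_MANGLED_0;  /* avoid being read as "no checksum" */
    return new_csum;
}
```

    With the flag set, a computed sum of 0 is stored as 0xffff, while a
    packet that carried no checksum to begin with is left untouched.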
     
  • When we're dealing with clones and the area is not writeable, try
    harder and get a copy via pskb_expand_head(). Also replace other
    occurrences in tc actions with the new skb_try_make_writable().

    Reported-by: Ashhad Sheikh
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We currently limit bpf_skb_store_bytes() and bpf_skb_load_bytes()
    helpers to only store or load a maximum buffer of 16 bytes. Thus,
    loading, rewriting and storing headers require several bpf_skb_load_bytes()
    and bpf_skb_store_bytes() calls.

    Also here we can use a per-cpu scratch buffer instead in order to not
    pressure stack space any further. I do suspect that this limit was mainly
    set in place for this particular reason. So, ease program development
    by removing this limitation and make the scratchpad generic, so it can
    be reused.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
    currently limited to handle 2 and 4 byte changes in a header and feeds the
    from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
    working with IPv6, for example, this makes it rather cumbersome to deal
    with, similarly when editing larger parts of a header.

    Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
    add a case for header field mask of 0 to change the checksum at a given
    offset through inet_proto_csum_replace_by_diff(), and provide a helper
    bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
    amounts of data.

    This can be used in multiple ways: for the bpf_l4_csum_replace() only
    part, this even provides us with the option to insert precalculated diffs
    from user space f.e. from a map, or from bpf_csum_diff() during runtime.

    bpf_csum_diff() has an optional from/to stack buffer input, so we can
    calculate a diff by using a scratch buffer for scenarios where we're
    inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
    don't need to be of equal size) data. Also, bpf_csum_diff() allows to
    feed a previous csum into csum_partial(), so the function can also be
    cascaded.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
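
    To make the from/to diff idea concrete, here is a minimal user-space
    sketch of ones' complement checksum diffing. It is not the kernel
    implementation: all function names are hypothetical, it works on
    big-endian 16-bit words rather than going through csum_partial(),
    and it assumes even buffer lengths. A NULL "from" models insertion,
    a NULL "to" models removal, and the seed argument models feeding in
    a previous result so that calls can be cascaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add the 16-bit words of buf to sum; complement each word if negate is set. */
static uint32_t add_words(const uint8_t *buf, size_t len, uint32_t sum,
                          int negate)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint32_t w = ((uint32_t)buf[i] << 8) | buf[i + 1];

        if (negate)
            w = ~w & 0xffffu;
        sum = csum_fold16(sum + w);
    }
    return sum;
}

/* diff = seed + sum(~from) + sum(to), in ones' complement arithmetic. */
uint16_t csum_diff(const uint8_t *from, size_t from_len,
                   const uint8_t *to, size_t to_len, uint16_t seed)
{
    uint32_t sum = seed;

    assert(from_len % 2 == 0 && to_len % 2 == 0);
    if (from)
        sum = add_words(from, from_len, sum, 1);
    if (to)
        sum = add_words(to, to_len, sum, 0);
    return csum_fold16(sum);
}

/* Apply a diff to an existing checksum field: ~(~csum + diff). */
uint16_t csum_apply_diff(uint16_t csum, uint16_t diff)
{
    uint32_t sum = (uint32_t)(~csum & 0xffff) + diff;

    return (uint16_t)(~csum_fold16(sum) & 0xffff);
}

/* Reference: checksum recomputed over the whole buffer. */
uint16_t csum_full(const uint8_t *buf, size_t len)
{
    return (uint16_t)(~add_words(buf, len, 0, 0) & 0xffff);
}
```

    Applying the diff to the old checksum should give the same value as
    recomputing the checksum over the rewritten data, which is what
    makes precalculated diffs usable.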
     
  • Avoid users having to manually load the module by adding a module
    alias allowing it to be autoloaded by the lwt infra.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Avoid users having to manually load the module by adding a module
    alias allowing it to be autoloaded by the lwt infra.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • The lwt implementations using net devices can autoload using the
    existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
    that lwt modules not using net devices can use.

    Therefore, add the ability to autoload modules registering lwt
    operations for lwt implementations not using a net device so that
    users don't have to manually load the modules.

    Only users with the CAP_NET_ADMIN capability can cause modules to be
    loaded, which is ensured by rtnetlink_rcv_msg rejecting non-RTM_GETxxx
    messages for users without this capability, and by
    lwtunnel_build_state not being called in response to RTM_GETxxx
    messages.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Currently vlan device inherits unicast filtering flag from underlying
    device. If underlying device doesn't support unicast filter, this will
    put vlan device into promiscuous mode when it's stacked.

    Turn on IFF_UNICAST_FLT on the vlan device in any case so that it does
    not go into promiscuous mode needlessly. If underlying device does not
    support unicast filtering, that device will enter promiscuous mode.

    Signed-off-by: Zhang Shengju
    Signed-off-by: David S. Miller

    Zhang Shengju
     
  • Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
    its size computation, observing that the current method never guaranteed
    that the hashsize (measured in number of entries) would be a power of two,
    which the input hash function for that table requires. The root cause of
    the problem is that two values need to be computed (one, the allocation
    order the storage requires, as passed to __get_free_pages, and two, the
    number of entries for the hash table). Both need to be powers of two, but for
    different reasons, and the existing code is simply computing one order
    value, and using it as the basis for both, which is wrong (i.e. it assumes
    that ((1<
    Reported-by: Dmitry Vyukov
    CC: Dmitry Vyukov
    CC: Vladislav Yasevich
    CC: "David S. Miller"
    Signed-off-by: David S. Miller

    Neil Horman
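
    The power-of-two requirement mentioned above can be illustrated with
    a small user-space sketch. It is an assumption-laden example, not
    the code from the patch: the helper names are hypothetical, and the
    12-byte bucket size in the usage note is chosen only to show a case
    where the allocation size is a power of two but the entry count is
    not.

```c
#include <assert.h>
#include <stddef.h>

/* Largest power of two that is <= n (n must be non-zero). */
size_t rounddown_pow2(size_t n)
{
    size_t p = 1;

    assert(n != 0);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/*
 * Number of usable hash entries in (page_size << order) bytes of storage,
 * rounded down so that the count is a power of two.
 */
size_t hash_entries(size_t page_size, unsigned int order, size_t bucket_size)
{
    size_t bytes = page_size << order;

    return rounddown_pow2(bytes / bucket_size);
}

/* Masking only maps hashes onto all buckets if entries is a power of two. */
size_t hash_index(size_t hash, size_t entries)
{
    return hash & (entries - 1);
}
```

    For example, one 4096-byte page with 12-byte buckets holds 341
    buckets, so hash_entries(4096, 0, 12) rounds that down to 256.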
     

20 Feb, 2016

15 commits

  • In commit 44d271377479 ("Bluetooth: Compress the size of struct
    hci_ctrl") we squashed down the size of the structure by using a union
    with the assumption that all users would use the flag to determine
    whether we had a req_complete or a req_complete_skb.

    Unfortunately we had a case in hci_req_cmd_complete() where we weren't
    looking at the flag. This can result in a situation where we might be
    storing a hci_req_complete_skb_t in a hci_req_complete_t variable, or
    vice versa.

    During some testing I found at least one case where the function
    hci_req_sync_complete() was called improperly because the kernel thought
    that it didn't require an SKB. Looking through the stack in kgdb I
    found that it was called by hci_event_packet() and that
    hci_event_packet() had both of its locals "req_complete" and
    "req_complete_skb" pointing to the same place: both to
    hci_req_sync_complete().

    Let's make sure we always check the flag.

    For more details on debugging done, see .

    Fixes: 44d271377479 ("Bluetooth: Compress the size of struct hci_ctrl")
    Signed-off-by: Douglas Anderson
    Acked-by: Johan Hedberg
    Signed-off-by: Marcel Holtmann

    Douglas Anderson
     
  • The unix_stream_read_generic function tries to use a continue statement
    to restart the receive loop after waiting for a message. This may not
    work as intended as the caller might use a recvmsg call to peek at
    control messages without specifying a message buffer. In that case,
    the continue causes the function to return without an error and
    without the credential information if it had to wait for a message,
    whereas it would have returned the credentials otherwise. Change to
    using goto to restart the loop without checking the condition first in
    this case so that credentials are returned either way.

    Signed-off-by: Rainer Weikusat
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Rainer Weikusat
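
    The difference between continue and goto in a do/while loop can be
    shown with a reduced model. This is a sketch, not the code of
    unix_stream_read_generic(): the function name and parameters are
    hypothetical, "size" models the length of the caller's message
    buffer, and "must_wait" models an initially empty receive queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the credentials were picked up before the loop ended. */
bool read_loop_gets_creds(size_t size, bool must_wait, bool use_goto)
{
    bool got_creds = false;

    do {
redo:
        if (must_wait) {
            must_wait = false;   /* pretend we slept until a message arrived */
            if (use_goto)
                goto redo;       /* restart without testing the loop condition */
            continue;            /* jumps to the while (size) test below */
        }
        got_creds = true;        /* credentials copied from the message */
        size = 0;                /* pretend the message filled the buffer */
    } while (size);

    return got_creds;
}
```

    With a zero-sized buffer, the continue variant leaves the loop right
    after waiting and never reaches the credentials, while the goto
    variant does.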
     
  • The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
    __u32, but unix_lookup_by_ino's argument ino has type int, which is not
    a problem yet.
    However, when ino is compared with sock_i_ino return value of type
    unsigned long, ino is sign-extended to signed long, and this results
    in an incorrect comparison on 64-bit architectures for inode numbers
    greater than INT_MAX.

    This bug was found by strace test suite.

    Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core")
    Signed-off-by: Dmitry V. Levin
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Dmitry V. Levin
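
    The sign-extension problem can be reproduced in user space with
    fixed-width types. This is a sketch with hypothetical function
    names: int32_t and uint64_t stand in for int and unsigned long on a
    64-bit architecture, and the casts spell out the conversions that
    the comparison would otherwise perform implicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy variant: the wanted inode number passes through a signed int. */
bool ino_matches_signed(uint32_t wanted, uint64_t sock_ino)
{
    /* Values above INT32_MAX become negative here (as on gcc and clang). */
    int32_t ino = (int32_t)wanted;

    /* Widening a negative value sets the upper 32 bits. */
    return (uint64_t)(int64_t)ino == sock_ino;
}

/* Variant that keeps the inode number unsigned all the way. */
bool ino_matches_unsigned(uint32_t wanted, uint64_t sock_ino)
{
    return (uint64_t)wanted == sock_ino;
}
```

    For an inode number such as 0x80000001, the signed variant compares
    0xffffffff80000001 against 0x80000001 and reports a mismatch.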
     
  • Replace individual implementations with the recently introduced
    skb_postpush_rcsum() helper.

    Signed-off-by: Daniel Borkmann
    Acked-by: Tom Herbert
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch implements sub command ETHTOOL_SCOALESCE for ioctl
    ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
    set the coalesce parameters of each masked queue in the device driver.
    The wanted coalesce information for each masked queue is stored in
    "data", which is copied from user space.
    If setting coalesce in the device driver fails, the queues that were
    already updated are rolled back to their previous values where possible.

    Signed-off-by: Kan Liang
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Kan Liang
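
    The set-then-roll-back behaviour can be sketched against a fake
    device. Everything here is hypothetical (struct fake_dev,
    dev_set_queue(), set_masked_queues(), the 8-queue limit); it only
    illustrates the pattern of saving the old per-queue values and
    restoring them for queues that were already updated when a later
    queue fails.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 8

struct coalesce {
    unsigned int rx_usecs;
};

struct fake_dev {
    struct coalesce q[NUM_QUEUES];
    int fail_queue;             /* queue whose update is rejected, -1 for none */
};

/* Stand-in for the driver callback that programs one queue. */
static int dev_set_queue(struct fake_dev *dev, int queue,
                         const struct coalesce *c)
{
    if (queue == dev->fail_queue)
        return -1;
    dev->q[queue] = *c;
    return 0;
}

/* wanted[] holds one entry per set bit in mask, in ascending queue order. */
int set_masked_queues(struct fake_dev *dev, uint32_t mask,
                      const struct coalesce *wanted)
{
    struct coalesce backup[NUM_QUEUES] = { { 0 } };
    uint32_t done = 0;
    int next = 0, bit;

    for (bit = 0; bit < NUM_QUEUES; bit++) {
        if (!(mask & (1u << bit)))
            continue;
        backup[bit] = dev->q[bit];          /* remember the old value */
        if (dev_set_queue(dev, bit, &wanted[next++]) != 0)
            goto roll_back;
        done |= 1u << bit;
    }
    return 0;

roll_back:
    /* Best effort: restore only the queues that were already changed. */
    for (bit = 0; bit < NUM_QUEUES; bit++)
        if (done & (1u << bit))
            dev_set_queue(dev, bit, &backup[bit]);
    return -1;
}
```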
     
  • This patch implements sub command ETHTOOL_GCOALESCE for ioctl
    ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
    get coalesce of each masked queue from device driver. Then the interrupt
    coalescing parameters will be copied back to user space one by one.

    Signed-off-by: Kan Liang
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Kan Liang
     
  • Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
    The following patches will enable some SUB_COMMANDs for per queue
    setting.

    Signed-off-by: Kan Liang
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Kan Liang
     
  • In IPv4, when the machine receives an ICMP_FRAG_NEEDED message, the
    connected UDP socket will get an EMSGSIZE error on its next read from
    the socket.
    However, this is not the case for IPv6.
    This fix modifies the UDP err handler in IPv6 for ICMP6_PKT_TOOBIG to
    make it similar to the IPv4 behavior. That is, when the machine gets an
    ICMP6_PKT_TOOBIG message, the connected UDP socket will get an EMSGSIZE
    error on its next read from the socket.

    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
     
  • Commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be
    correctly controlled.") changed the default xmit checksum setting
    for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
    set in the external UDP header.
    This commit changes the rx checksum setting for both lwt vxlan/geneve
    devices created by openvswitch accordingly, so that lwt over ipv6
    tunnel pairs are again able to communicate with default values.

    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Benc
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • The broadcast lock needs to be released with tipc_bcast_unlock() in
    the error path.

    Signed-off-by: Insu Yun
    Signed-off-by: David S. Miller

    Insu Yun
     
  • Antonio Quartulli says:

    ====================
    Two of the fixes included in this patchset prevent wrong memory
    access - it was triggered when removing an object from a list
    after it was already free'd due to bad reference counting.
    This misbehaviour existed for both the gw_node and the
    orig_node_vlan object and has been fixed by Sven Eckelmann.

    The last patch fixes our interface feasibility check and prevents
    it from looping indefinitely when two net_device objects
    reference each other via iflink index (i.e. veth pair), by
    Andrew Lunn
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • An error response from a RTM_GETNETCONF request can return the positive
    error value EINVAL in the struct nlmsgerr that can mislead userspace.

    Signed-off-by: Anton Protopopov
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Anton Protopopov
     
  • When I used netdev_for_each_lower_dev in commit bad531623253 ("vrf:
    remove slave queue and private slave struct") I thought that it acts
    like netdev_for_each_lower_private and can be used to remove the current
    device from the list while walking, but unfortunately it acts more like
    netdev_for_each_lower_private_rcu and doesn't allow it. The difference
    is where "iter" points: right now it points to the current element,
    which makes it impossible to remove that element. Change the logic to be
    similar to netdev_for_each_lower_private and make it point to the "next"
    element so we can safely delete the current one. VRF is the only such
    user right now; there's no change for the read-only users.

    Here's what can happen now:
    [98423.249858] general protection fault: 0000 [#1] SMP
    [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
    oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
    crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
    sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
    gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
    parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
    9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
    virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
    ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
    virtio_ring virtio scsi_mod [last unloaded: bridge]
    [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O
    4.5.0-rc2+ #81
    [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
    BIOS 1.8.1-20150318_183358- 04/01/2014
    [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
    ffff88003428c000
    [98423.256123] RIP: 0010: netdev_lower_get_next+0x1e/0x30
    [98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207
    [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
    0000000000000000
    [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
    ffff880054ff90c0
    [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
    0000000000000000
    [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
    ffff88003428f9e0
    [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
    0000000000000001
    [98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000)
    knlGS:0000000000000000
    [98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
    00000000000406e0
    [98423.258902] Stack:
    [98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004
    ffff880054ff9000
    [98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
    ffff88003428f978
    [98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
    ffff880035b35f00
    [98423.260739] Call Trace:
    [98423.260920] vrf_dev_uninit+0x76/0xa0 [vrf]
    [98423.261156] rollback_registered_many+0x205/0x390
    [98423.261401] unregister_netdevice_many+0x1c/0x70
    [98423.261641] rtnl_delete_link+0x3c/0x50
    [98423.271557] rtnl_dellink+0xcb/0x1d0
    [98423.271800] ? __inc_zone_state+0x4a/0x90
    [98423.272049] rtnetlink_rcv_msg+0x84/0x200
    [98423.272279] ? trace_hardirqs_on+0xd/0x10
    [98423.272513] ? rtnetlink_rcv+0x1b/0x40
    [98423.272755] ? rtnetlink_rcv+0x40/0x40
    [98423.272983] netlink_rcv_skb+0x97/0xb0
    [98423.273209] rtnetlink_rcv+0x2a/0x40
    [98423.273476] netlink_unicast+0x11b/0x1a0
    [98423.273710] netlink_sendmsg+0x3e1/0x610
    [98423.273947] sock_sendmsg+0x38/0x70
    [98423.274175] ___sys_sendmsg+0x2e3/0x2f0
    [98423.274416] ? do_raw_spin_unlock+0xbe/0x140
    [98423.274658] ? handle_mm_fault+0x26c/0x2210
    [98423.274894] ? handle_mm_fault+0x4d/0x2210
    [98423.275130] ? __fget_light+0x91/0xb0
    [98423.275365] __sys_sendmsg+0x42/0x80
    [98423.275595] SyS_sendmsg+0x12/0x20
    [98423.275827] entry_SYSCALL_64_fastpath+0x16/0x7a
    [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
    90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
    89 06 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
    [98423.279639] RIP netdev_lower_get_next+0x1e/0x30
    [98423.279920] RSP

    CC: David Ahern
    CC: David S. Miller
    CC: Roopa Prabhu
    CC: Vlad Yasevich
    Fixes: bad531623253 ("vrf: remove slave queue and private slave struct")
    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
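
    The iterator change described in this commit, where "iter" is
    advanced to the next element before the current one is handed out,
    can be sketched on a plain singly linked list. This is a user-space
    illustration with hypothetical names, not the kernel's
    netdev_lower_get_next() or its list implementation.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    int id;
};

/* Hand out the current element and advance *iter past it first. */
static struct node *lower_get_next(struct node **iter)
{
    struct node *cur = *iter;

    if (cur)
        *iter = cur->next;
    return cur;
}

/* Unlink n from the list and clear its next pointer. */
static void unlink_node(struct node **head, struct node *n)
{
    struct node **pp = head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = n->next;
        n->next = NULL;
    }
}

/* Remove every element while walking; returns how many were removed. */
int remove_all(struct node **head)
{
    struct node *iter = *head, *cur;
    int removed = 0;

    while ((cur = lower_get_next(&iter)) != NULL) {
        /* cur->next is cleared here, but iter already points past cur. */
        unlink_node(head, cur);
        removed++;
    }
    return removed;
}
```

    Had the iterator still pointed at the current element, following
    its cleared next pointer after the unlink would have ended the walk
    early or, with freed memory, touched invalid data.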
     
  • Currently mdb entries are exported directly as a structure inside the
    MDBA_MDB_ENTRY_INFO attribute, so we can't really extend it without
    breaking user-space. In order to export new mdb fields, I've converted
    the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before
    with struct br_mdb_entry (without header, as it's cast directly in
    iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we
    keep compatibility with older users and can export new data.
    I've tested this with iproute2, both with and without support for the
    added attribute and it works fine.
    So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry
    inside but it may also contain some additional MDBA_MDB_EATTR_ attributes
    such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space.

    So the new structure is:
    [MDBA_MDB] = {
    [MDBA_MDB_ENTRY] = {
    [MDBA_MDB_ENTRY_INFO]
    [MDBA_MDB_ENTRY_INFO] {
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
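
    The compatibility idea, a fixed structure at the start of the
    payload followed by optional attributes, can be sketched in user
    space as below. All names and layouts are hypothetical stand-ins:
    struct entry is not the real struct br_mdb_entry, struct attr_hdr
    only mimics the length/type header of a netlink attribute, and the
    value of EATTR_TIMER is made up. Real code would use the netlink
    attribute helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)
#define EATTR_TIMER 1           /* hypothetical attribute type */

struct entry {                  /* stand-in for the fixed leading structure */
    uint32_t ifindex;
    uint16_t vid;
    uint16_t state;
};

struct attr_hdr {               /* len covers header plus payload */
    uint16_t len;
    uint16_t type;
};

/* Writer: fixed structure first, then one timer attribute. */
size_t build_entry_info(uint8_t *buf, size_t cap, const struct entry *e,
                        uint32_t timer)
{
    struct attr_hdr h = { (uint16_t)(sizeof(h) + sizeof(timer)), EATTR_TIMER };
    size_t off = ATTR_ALIGN(sizeof(*e));
    size_t total = off + ATTR_ALIGN(h.len);

    assert(total <= cap);
    memset(buf, 0, total);
    memcpy(buf, e, sizeof(*e));
    memcpy(buf + off, &h, sizeof(h));
    memcpy(buf + off + sizeof(h), &timer, sizeof(timer));
    return total;
}

/* Old-style reader: only looks at the leading structure. */
void read_entry(const uint8_t *buf, struct entry *out)
{
    memcpy(out, buf, sizeof(*out));
}

/* New-style reader: walks the attributes that follow the structure. */
int read_timer(const uint8_t *buf, size_t len, uint32_t *timer)
{
    size_t off = ATTR_ALIGN(sizeof(struct entry));

    while (off + sizeof(struct attr_hdr) <= len) {
        struct attr_hdr h;
        size_t alen;

        memcpy(&h, buf + off, sizeof(h));
        alen = h.len;
        if (alen < sizeof(h) || off + alen > len)
            return -1;          /* malformed attribute */
        if (h.type == EATTR_TIMER && alen == sizeof(h) + sizeof(*timer)) {
            memcpy(timer, buf + off + sizeof(h), sizeof(*timer));
            return 0;
        }
        off += ATTR_ALIGN(alen);
    }
    return -1;                  /* attribute not present */
}
```

    A reader that knows nothing about the extra attributes still finds
    the structure at the same offset as before, which is the property
    the commit relies on.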
     
  • Switch the port check and skip if it's null; this allows us to reduce
    indentation by one level.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov