13 May, 2015

2 commits

  • Pull networking fixes from David Miller:

    1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from
    Avri Altman.

    2) Use the correct FW API for scan completions in iwlwifi, from Avraham
    Stern.

    3) FW monitor in iwlwifi accidentally uses unmapped memory, fix from Liad
    Kaufman.

    4) rhashtable conversion of mac80211 station table was buggy, the
    virtual interface was not taken into account. Fix from Johannes
    Berg.

    5) Fix deadlock in rtlwifi by not using a zero timeout for
    usb_control_msg(), from Larry Finger.

    6) Update reordering state before calculating loss detection, from
    Yuchung Cheng.

    7) Fix off by one in bluetooth firmware parsing, from Dan Carpenter.

    8) Fix extended frame handling in xilinx_can driver, from Jeppe
    Ledet-Pedersen.

    9) Fix CODEL packet scheduler behavior in the presence of TSO packets,
    from Eric Dumazet.

    10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck.

    11) macvlan needs to propagate promisc settings down to the lower
    device, from Vlad Yasevich.

    12) igb driver can oops when changing number of rings, from Toshiaki
    Makita.

    13) Source specific default routes not handled properly in ipv6, from
    Markus Stenberg.

    14) Use after free in tc_ctl_tfilter(), from WANG Cong.

    15) Use softirq spinlocking in netxen driver, from Tony Camuso.

    16) Two ARM bpf JIT fixes from Nicolas Schichan.

    17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from
    Mathias Kretschmer.

    18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei
    Starovoitov.

    19) ll_temac driver DMA maps TX packet header with incorrect length, fix
    from Michal Simek.

    20) We removed pm_qos bits from netdevice.h, but some indirect
    references remained. Kill them. From David Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
    net: Remove remaining remnants of pm_qos from netdevice.h
    e1000e: Add pm_qos header
    net: phy: micrel: Fix regression in kszphy_probe
    net: ll_temac: Fix DMA map size bug
    x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions
    netns: return RTM_NEWNSID instead of RTM_GETNSID on a get
    Update be2net maintainers' email addresses
    net_sched: gred: use correct backlog value in WRED mode
    pppoe: drop pppoe device in pppoe_unbind_sock_work
    net: qca_spi: Fix possible race during probe
    net: mdio-gpio: Allow for unspecified bus id
    af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
    bnx2x: limit fw delay in kdump to 5s after boot
    ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits.
    ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
    mpls: Change reserved label names to be consistent with netbsd
    usbnet: avoid integer overflow in start_xmit
    netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
    net: xgene_enet: Set hardware dependency
    net: amd-xgbe: Add hardware dependency
    ...

    Linus Torvalds
     
  • Usually, RTM_NEWxxx is returned on a get (same as a dump).

    Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

12 May, 2015

1 commit

  • In WRED mode, the backlog for a single virtual queue (VQ) should not be
    used to determine queue behavior; instead the backlog is summed across
    all VQs. This sum is currently used when calculating the average queue
    lengths. It also needs to be used when determining if the queue's hard
    limit has been reached, or when reporting each VQ's backlog via netlink.
    q->backlog will only be used if the queue switches out of WRED mode.

    Signed-off-by: David Ward
    Signed-off-by: David S. Miller

    David Ward
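
The backlog rule described in the commit above can be sketched as follows. This is a simplified model rather than the actual sch_gred code: `struct vq`, `struct gred_model` and `effective_backlog()` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a GRED qdisc and its virtual queues. */
struct vq {
    unsigned int backlog;       /* bytes queued in this virtual queue */
};

struct gred_model {
    bool wred_mode;             /* true when the qdisc runs in WRED mode */
    size_t num_vqs;
    const struct vq *vqs;
};

/* Backlog to use for VQ `idx`: the sum over all VQs in WRED mode,
 * otherwise that VQ's own backlog. */
static unsigned int effective_backlog(const struct gred_model *g, size_t idx)
{
    if (!g->wred_mode)
        return g->vqs[idx].backlog;

    unsigned int total = 0;
    for (size_t i = 0; i < g->num_vqs; i++)
        total += g->vqs[i].backlog;
    return total;
}
```

Under this model the same helper would feed the average calculation, the hard-limit check and the netlink report, so the three can no longer disagree.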
     

11 May, 2015

1 commit

  • This patch fixes an issue where the send(MSG_DONTWAIT) call
    on a TX_RING is not fully non-blocking in cases where the device's sndBuf is
    full. We pass nonblock=true to sock_alloc_send_skb() and return any error
    code that occurs (most likely EAGAIN) to the caller. As the fast path stays
    as it is, we keep the unlikely() around skb == NULL.

    Signed-off-by: Mathias Kretschmer
    Signed-off-by: David S. Miller

    Kretschmer, Mathias
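
From the caller's side, the behaviour described above means a non-blocking send can now report EAGAIN instead of sleeping. A minimal userspace sketch of handling that result follows; `try_send()` is a hypothetical helper, and it works on any socket, not only on a TX_RING.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Attempt one non-blocking send.
 * Returns 1 if the kernel accepted the request, 0 if it would have
 * blocked (caller should retry later), and -1 on any other error. */
static int try_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);

    if (n >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}
```

A caller would typically poll() for writability after a 0 return rather than spin.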
     

10 May, 2015

6 commits

  • Since these are now visible to userspace it is nice to be consistent
    with BSD (sys/netmpls/mpls.h in NetBSD).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • When tcf_destroy() returns true, tp could be already destroyed,
    we should not use tp->next after that.

    In the long term, we should probably move the tp list to list_head.

    Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
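
The pattern behind this kind of fix is to read the next pointer before calling anything that may free the current node. The sketch below shows it on a hypothetical singly linked chain; `struct proto`, `proto_destroy()` and `chain_flush()` are illustrative stand-ins, not the net_sched code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a classifier instance on a singly linked chain. */
struct proto {
    struct proto *next;
    int nfilters;               /* filters still attached */
};

/* Frees `tp` and returns true when it has no filters left, mirroring the
 * "a true return means tp may be gone" contract described above. */
static bool proto_destroy(struct proto *tp)
{
    if (tp->nfilters > 0)
        return false;
    free(tp);
    return true;
}

/* Walk the chain and unlink every destroyed node. `next` is read before
 * proto_destroy() runs, so a freed node is never dereferenced. */
static int chain_flush(struct proto **head)
{
    int destroyed = 0;
    struct proto **back = head;
    struct proto *tp, *next;

    for (tp = *head; tp != NULL; tp = next) {
        next = tp->next;        /* save before tp may be freed */
        if (proto_destroy(tp)) {
            *back = next;       /* unlink the freed node */
            destroyed++;
        } else {
            back = &tp->next;
        }
    }
    return destroyed;
}
```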
     
  • When the peer of an RDS-TCP connection restarts, a reconnect
    attempt should only be made from the active side of the TCP
    connection, i.e. the side that has a transient TCP port
    number. Do not add the passive side of the TCP connection
    to the c_hash_node and thus avoid triggering rds_queue_reconnect()
    for passive rds connections.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • When running RDS over TCP, the active (client) side connects to the
    listening ("passive") side at the RDS_TCP_PORT. After the connection
    is established, if the client side reboots (potentially without even
    sending a FIN) the server still has a TCP socket in the established
    state. If the server side now gets a new SYN from the client
    with a different client port, TCP will create a new socket-pair, but
    the RDS layer will incorrectly pull up the old rds_connection (which
    is still associated with the stale t_sock and RDS socket state).

    This patch corrects this behavior by having rds_tcp_accept_one()
    always create a new connection for an incoming TCP SYN.
    The rds and tcp state associated with the old socket-pair is cleaned
    up via the rds_tcp_state_change() callback which would typically be
    invoked in most cases when the client-TCP sends a FIN on TCP restart,
    triggering a transition to CLOSE_WAIT state. In the rarer event of client
    death without a FIN, TCP_KEEPALIVE probes on the socket will detect
    the stale socket, and the TCP transition to CLOSE state will trigger
    the RDS state cleanup.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • If there are only IPv6 source specific default routes present, the
    host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail
    calls ip6_route_output first, and given source address any, it fails,
    and ip6_route_get_saddr is never called.

    The change is to call ip6_route_get_saddr even if the initial
    ip6_route_output fails, and then to do ip6_route_output _again_ once
    an appropriate source address is available.

    Note that this is a '99% fix' to the problem; a correct fix would be to
    do route lookups only within addrconf.c when picking a source address,
    and never call ip6_route_output before source address has been
    populated.

    Signed-off-by: Markus Stenberg
    Signed-off-by: David S. Miller

    Markus Stenberg
     
  • Johan Hedberg says:

    ====================
    Here are a couple of important Bluetooth & mac802154 fixes for 4.1:

    - mac802154 fix for crypto algorithm allocation failure checking
    - mac802154 wpan phy leak fix for error code path
    - Fix for not calling Bluetooth shutdown() if interface is not up

    Let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

06 May, 2015

2 commits


05 May, 2015

5 commits

  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    We have only a few fixes right now:
    * a fix for an issue with hash collision handling in the
    rhashtable conversion
    * a merge issue - rhashtable removed default shrinking
    just before mac80211 was converted, so enable it now
    * remove an invalid WARN that can trigger with legitimate
    userspace behaviour
    * add a struct member missing from kernel-doc that caused
    a lot of warnings
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2015-05-04

    Here's the first bluetooth-next pull request for 4.2:

    - Various fixes for at86rf230 driver
    - ieee802154: trace events support for rdev->ops
    - HCI UART driver refactoring
    - New Realtek IDs added to btusb driver
    - Off-by-one fix for rtl8723b in btusb driver
    - Refactoring of btbcm driver for both UART & USB use

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • c0adf54a109 introduced new sparse warnings:
    CHECK /home/dahern/kernels/linux.git/net/rds/ib_cm.c
    net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
    net/rds/ib_cm.c:191:34: expected unsigned long long [unsigned] [usertype] dp_ack_seq
    net/rds/ib_cm.c:191:34: got restricted __be64
    net/rds/ib_cm.c:194:51: warning: cast to restricted __be64

    The temporary variable for sequence number should have been declared as __be64
    rather than u64. Make it so.

    Signed-off-by: David Ahern
    Cc: shamir rabinovitch
    Signed-off-by: David S. Miller

    David Ahern
     
  • The code in __netdev_upper_dev_link() has an over-stringent
    loop detection logic that actually prevents valid configurations
    from working correctly.

    In particular, the logic returns an error if an upper device
    is already in the list of all upper devices for a given dev.
    This particular check seems to be overzealous as it disallows
    perfectly valid configurations. For example:
    # ip l a link eth0 name eth0.10 type vlan id 10
    # ip l a dev br0 typ bridge
    # ip l s eth0.10 master br0
    # ip l s eth0 master br0

    Acked-by: Jiri Pirko
    Acked-by: Veaceslav Falico
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • In an environment where the KDC is running Active Directory, the
    exported composite name field returned in the context could be large
    enough to span a page boundary. Attaching a scratch buffer to the
    decoding xdr_stream helps deal with those cases.

    The case where we saw this was actually due to behavior that's been
    fixed in newer gss-proxy versions, but we're fixing it here too.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Reviewed-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

04 May, 2015

4 commits

  • This reverts commit c243d7e20996254f89c28d4838b5feca735c030d.

    That patch is solving a nonexistent problem while creating a
    real problem. Just because a socket is allocated in the init
    name space doesn't mean that it gets hashed in the init name space.

    When we unhash it the name space must be the same as the one
    we had when we hashed it. So this patch is completely bogus
    and causes socket leaks.

    Reported-by: Andrey Wagin
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The rdma_conn_param private data is copied using memcpy after headers such
    as cma_hdr (see cma_resolve_ib_udp for an example), so the start of the
    private data is aligned to the end of the structure that comes before it.
    If that structure ends with a u32, the private data starts on a 4-byte
    boundary. Structures that use u8/u16/u32/u64 are naturally aligned, but if
    the structure start is not 8-byte aligned, none of its u64 members will be
    aligned either. To solve this issue we must use the special macros that
    allow unaligned access to those members.

    Addresses the following kernel log seen when attempting to use RDMA:

    Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]

    Acked-by: Chien Yen
    Signed-off-by: shamir rabinovitch
    [Minor tweaks for top of tree by:]
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    shamir rabinovitch
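
To illustrate the alignment problem described above outside the kernel, here is a minimal userspace sketch that reads and writes a 64-bit value at an address that is only 4-byte aligned. It goes through memcpy instead of the kernel's unaligned-access macros; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from a possibly unaligned address.
 * memcpy lets the compiler emit an access sequence that is safe on
 * architectures that trap on unaligned loads. */
static uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Write a 64-bit value to a possibly unaligned address. */
static void store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}
```

Dereferencing `(uint64_t *)(buf + 4)` directly is what can produce an unaligned-access trap like the one in the log above on strict-alignment machines.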
     
  • We currently limit the hash table size to 64K which is very bad
    as even 10 years ago it was relatively easy to generate millions
    of sockets.

    Since the hash table is naturally limited by memory allocation
    failure, we don't really need an explicit limit so this patch
    removes it.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • In the presence of TSO/GSO/GRO packets, codel at low rates can be quite
    useless. In the following example, not a single packet was ever dropped,
    while the average delay in the codel queue is ~100 ms!

    qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
    Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 13626b 3p requeues 0
    count 0 lastcount 0 ldelay 96.9ms drop_next 0us
    maxpacket 9084 ecn_mark 0 drop_overlimit 0

    This comes from a confusion of what should be the minimal backlog. It is
    pretty clear it is not 64KB or whatever max GSO packet ever reached the
    qdisc.

    The intent of codel was to use the MTU of the device.

    After the fix, we finally drop some packets, and rtt/cwnd of my single
    TCP flow are meeting our expectations.

    qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
    Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
    backlog 6056b 3p requeues 0
    count 1 lastcount 1 ldelay 36.3ms drop_next 0us
    maxpacket 10598 ecn_mark 0 drop_overlimit 0

    Signed-off-by: Eric Dumazet
    Cc: Kathleen Nichols
    Cc: Dave Taht
    Cc: Van Jacobson
    Signed-off-by: David S. Miller

    Eric Dumazet
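
The effect of the minimal-backlog threshold can be sketched with a simplified version of the "is the queue fine?" test. This is an illustration only, not the kernel's codel code: `struct codel_sample` and `queue_is_ok()` are hypothetical, and the 1500-byte MTU used in the usage note is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inputs to the check. */
struct codel_sample {
    uint32_t sojourn_us;        /* time the head packet spent queued */
    uint32_t target_us;         /* acceptable standing delay */
    uint32_t backlog_bytes;     /* bytes currently queued */
};

/* The queue is treated as fine (no dropping) when the delay is below
 * target or when no more than `floor_bytes` are queued. */
static bool queue_is_ok(const struct codel_sample *s, uint32_t floor_bytes)
{
    return s->sojourn_us < s->target_us || s->backlog_bytes <= floor_bytes;
}
```

With a sample of 96.9 ms delay, a 5 ms target and 13626 bytes queued, a floor the size of a 64KB GSO packet reports the queue as fine, while a floor of one assumed 1500-byte MTU does not, which is when dropping can begin.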
     

02 May, 2015

1 commit


01 May, 2015

6 commits

  • This patch fixes endian conversions for extended address and short address
    handling when TP_printk is called.

    Signed-off-by: Alexander Aring
    Cc: Guido Günther
    Signed-off-by: Marcel Holtmann

    Alexander Aring
     
  • This code is based on commit 6bab2e19c5ffd
    ("cfg80211: pass name_assign_type to rdev_add_virtual_intf()")

    This will expose in sysfs whether the ifname of an IEEE 802.15.4
    device is set by userspace or generated by the kernel.
    Two name_assign_type values are used:
    o NET_NAME_ENUM: Default interface name provided by kernel
    o NET_NAME_USER: Interface name provided by user.

    Signed-off-by: Varka Bhadram
    Signed-off-by: Alexander Aring
    Signed-off-by: Marcel Holtmann

    Varka Bhadram
     
  • Enabling tracing via

    echo 1 > /sys/kernel/debug/tracing/events/cfg802154/enable

    enables event tracing like

    iwpan dev wpan0 set pan_id 0xbeef
    cat /sys/kernel/debug/tracing/trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 2/2 #P:1
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    iwpan-2663 [000] .... 170.369142: 802154_rdev_set_pan_id: phy0, wpan_dev(1), pan id: 0xbeef
    iwpan-2663 [000] .... 170.369177: 802154_rdev_return_int: phy0, returned: 0

    Signed-off-by: Guido Günther
    Signed-off-by: Alexander Aring
    Signed-off-by: Marcel Holtmann

    Guido Günther
     
  • In case of error, the functions crypto_alloc_aead() and crypto_alloc_blkcipher()
    return ERR_PTR() and never return NULL. The NULL test in the return value check
    should be replaced with IS_ERR().

    Signed-off-by: Wei Yongjun
    Signed-off-by: Alexander Aring
    Signed-off-by: Marcel Holtmann

    Wei Yongjun
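
Why a NULL test misses such failures can be shown with a userspace imitation of the error-pointer idiom. The helpers below mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() but are re-implemented here for illustration; `alloc_cipher()` is a hypothetical allocator.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when `p` lies in the top MAX_ERRNO addresses, i.e. encodes an error. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that fails by returning an error pointer. */
static void *alloc_cipher(bool fail)
{
    static int dummy;

    return fail ? err_ptr(-ENOMEM) : (void *)&dummy;
}
```

A failed `alloc_cipher(true)` is not NULL, so `if (!p)` lets the error through; only the `is_err()` style check catches it.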
     
  • Currently, if ieee802154_if_add fails, we don't unregister the wpan phy
    which was registered before. This patch adds correct error handling
    that unregisters the wpan phy when ieee802154_if_add fails.

    Signed-off-by: Alexander Aring
    Signed-off-by: Marcel Holtmann

    Alexander Aring
     
  • Most likely, the shutdown routine requires the interface to be up.
    This is the case for BTUSB_INTEL: the routine tries to send a command
    to the interface, but since this one is down, it fails and exits once
    HCI_INIT_TIMEOUT has expired.

    Signed-off-by: Gabriele Mazzotta
    Signed-off-by: Marcel Holtmann
    Cc: stable@vger.kernel.org # 4.0.x

    Gabriele Mazzotta
     

30 Apr, 2015

12 commits

  • tcp_mark_lost_retrans is not used when FACK is disabled. Since
    tcp_update_reordering may disable FACK, it should be called
    before tcp_mark_lost_retrans.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • Some Congestion Control modules can provide per-flow information,
    but the current way to get this information is to use netlink.

    Like TCP_INFO, let's add TCP_CC_INFO so that applications can
    issue a getsockopt() if they have a socket file descriptor,
    instead of playing complex netlink games.

    Sample usage would be :

    union tcp_cc_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1)
            perror("getsockopt(TCP_CC_INFO)");

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Neal Cardwell
    Acked-by: Daniel Borkmann
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We would like the optional info provided by Congestion Control
    modules using netlink to also be readable using getsockopt().

    This patch changes get_info() to put this information in a buffer
    instead of an skb, like tcp_get_info(), so that the following patch
    can reuse this common infrastructure.

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Neal Cardwell
    Acked-by: Daniel Borkmann
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch tracks total number of payload bytes received on a TCP socket.
    This is the sum of all changes done to tp->rcv_nxt

    RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

    This is a 64bit field, and can be fetched both from TCP_INFO
    getsockopt() if one has a handle on a TCP socket, or from inet_diag
    netlink facility (iproute2/ss patch will follow)

    Note that tp->bytes_received was placed near tp->rcv_nxt for
    best data locality and minimal performance impact.

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Matt Mathis
    Cc: Eric Salo
    Cc: Martin Lau
    Cc: Chris Rapier
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
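
Summing "all changes done to tp->rcv_nxt" into a 64-bit counter can be sketched as below. The point of interest is that the 32-bit sequence difference stays correct across sequence-number wraparound. `struct tcp_model` and `rcv_nxt_update()` are hypothetical names for this illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical, reduced view of the receive-side state. */
struct tcp_model {
    uint32_t rcv_nxt;           /* next sequence number expected */
    uint64_t bytes_received;    /* running total of payload bytes */
};

/* Advance rcv_nxt to `seq` and account for the bytes in between.
 * Unsigned 32-bit subtraction is modulo 2^32, so the delta is right
 * even when `seq` has wrapped past zero. */
static void rcv_nxt_update(struct tcp_model *tp, uint32_t seq)
{
    uint32_t delta = seq - tp->rcv_nxt;

    tp->bytes_received += delta;
    tp->rcv_nxt = seq;
}
```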
     
  • This patch tracks total number of bytes acked for a TCP socket.
    This is the sum of all changes done to tp->snd_una, and allows
    for precise tracking of delivered data.

    RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

    This is a 64bit field, and can be fetched both from TCP_INFO
    getsockopt() if one has a handle on a TCP socket, or from inet_diag
    netlink facility (iproute2/ss patch will follow)

    Note that tp->bytes_acked was placed near tp->snd_una for
    best data locality and minimal performance impact.

    Signed-off-by: Eric Dumazet
    Acked-by: Yuchung Cheng
    Cc: Matt Mathis
    Cc: Eric Salo
    Cc: Martin Lau
    Cc: Chris Rapier
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • eeprom-length is a switch property, not a dsa property, and thus
    needs to be attached to the switch node, not to the dsa node.

    Reported-by: Andrew Lunn
    Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access")
    Signed-off-by: Guenter Roeck
    Acked-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Currently, we try to accumulate arrived packets in the link's
    'deferred' queue during the parallel link synchronization phase.

    This entails two problems:

    - With an unlucky combination of arriving packets the algorithm
    may go into a lockstep with the out-of-sequence handling function,
    where the synch mechanism is adding a packet to the deferred queue,
    while the out-of-sequence handling is retrieving it again, thus
    ending up in a loop inside the node_lock scope.

    - Even if this is avoided, the link will very often send out
    unnecessary protocol messages, in the worst case leading to
    redundant retransmissions.

    We fix this by just dropping arriving packets on the upcoming link
    during the synchronization phase, thus relying on the retransmission
    protocol to resolve the situation once the two links have arrived at
    a synchronized state.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
    it is sent only at the end of a dump.

    Libraries like libnl will wait forever for NLMSG_DONE.

    Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api")
    Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api")
    Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
    CC: Richard Alpe
    CC: Jon Maloy
    CC: Ying Xue
    CC: tipc-discussion@lists.sourceforge.net
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
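
The hang described above follows from how a multipart-aware reader decides when a reply is complete. The sketch below is an assumption about that general logic, not libnl's actual code; `expect_more()` is a hypothetical helper, while the struct and constants come from the Linux UAPI headers.

```c
#include <stdbool.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Returns true when a reader should keep waiting for further messages:
 * the message is flagged as part of a multipart reply and is not the
 * terminating NLMSG_DONE. */
static bool expect_more(const struct nlmsghdr *h)
{
    return (h->nlmsg_flags & NLM_F_MULTI) && h->nlmsg_type != NLMSG_DONE;
}
```

If a single reply to a get or a notification carries NLM_F_MULTI but no NLMSG_DONE ever follows, a reader built this way never sees its stop condition.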
     
  • NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
    it is sent only at the end of a dump.

    Libraries like libnl will wait forever for NLMSG_DONE.

    Fixes: e5a55a898720 ("net: create generic bridge ops")
    Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
    CC: John Fastabend
    CC: Sathya Perla
    CC: Subbu Seetharaman
    CC: Ajit Khaparde
    CC: Jeff Kirsher
    CC: intel-wired-lan@lists.osuosl.org
    CC: Jiri Pirko
    CC: Scott Feldman
    CC: Stephen Hemminger
    CC: bridge@lists.linux-foundation.org
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
    it is sent only at the end of a dump.

    Libraries like libnl will wait forever for NLMSG_DONE.

    Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink")
    CC: Cong Wang
    CC: Stephen Hemminger
    CC: bridge@lists.linux-foundation.org
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • This action is meant to be passive, i.e. we should not alter
    skb->nfct: If nfct is present just leave it alone.

    Compile tested only.

    Cc: Jamal Hadi Salim
    Signed-off-by: Florian Westphal
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't
    increase PMTU with Datagram Too Big message") broke PMTU in cases
    where the rt_pmtu value has expired but is smaller than the new
    PMTU value.

    This obsolete rt_pmtu then prevents the new PMTU value from being
    installed.

    Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message")
    Reported-by: Gerd v. Egidy
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu