29 Oct, 2009

5 commits


28 Oct, 2009

8 commits


27 Oct, 2009

2 commits

  • Conflicts:
    drivers/net/sh_eth.c

    David S. Miller
     
  • We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb.

    Null value is used as a special value, meaning vlan tagging not enabled.
    This forbids use of null vlan ID.

    As pointed out by David, some drivers use the 3 high-order bits (PRIO).

    As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and
    allow null VLAN ID.

    In case future code really wants to use VLAN_CFI_MASK, we'll have to use
    a bit outside of vlan_tci.

    #define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */
    #define VLAN_PRIO_SHIFT 13
    #define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */
    #define VLAN_TAG_PRESENT VLAN_CFI_MASK
    #define VLAN_VID_MASK 0x0fff /* VLAN Identifier */

    Reported-by: Gertjan Hofman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
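
    The bit layout described in the VLAN commit above can be sketched in
    userspace C. The five macros are copied from the commit message; the
    helper names (tci_make, tci_tag_present, tci_vid, tci_prio) are
    hypothetical and exist only for this illustration. The point is that a
    tag with VLAN ID 0 and priority 0 is no longer indistinguishable from
    "no tag", because the CFI bit doubles as a presence flag.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define VLAN_PRIO_MASK   0xe000 /* Priority Code Point */
    #define VLAN_PRIO_SHIFT  13
    #define VLAN_CFI_MASK    0x1000 /* Canonical Format Indicator */
    #define VLAN_TAG_PRESENT VLAN_CFI_MASK
    #define VLAN_VID_MASK    0x0fff /* VLAN Identifier */

    /* Hypothetical helper: build a 16-bit TCI and mark the tag as present. */
    static inline uint16_t tci_make(uint16_t vid, uint16_t prio)
    {
        return (uint16_t)(((prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK) |
                          (vid & VLAN_VID_MASK) | VLAN_TAG_PRESENT);
    }

    /* Hypothetical helper: true if the TCI carries a tag, even with VID 0. */
    static inline bool tci_tag_present(uint16_t tci)
    {
        return (tci & VLAN_TAG_PRESENT) != 0;
    }

    /* Hypothetical helper: extract the 12-bit VLAN ID. */
    static inline uint16_t tci_vid(uint16_t tci)
    {
        return tci & VLAN_VID_MASK;
    }

    /* Hypothetical helper: extract the 3-bit priority. */
    static inline uint16_t tci_prio(uint16_t tci)
    {
        return (uint16_t)((tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT);
    }
    ```

    With these helpers, tci_make(0, 0) yields 0x1000, which tests as present,
    while a plain 0 still means that tagging is not enabled.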
     

24 Oct, 2009

8 commits

  • While playing with pktgen, I realized IP ID was not filled and a
    random value was taken, possibly leaking 2 bytes of kernel memory.

    We can use an increasing ID, this can help diagnostics anyway.

    Also clear packet payload, instead of leaking kernel memory.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When handling a large number of netdevices, rtnl_dump_ifinfo()
    is very slow because it has O(N^2) complexity.

    Instead of scanning one single list, we can use the 256 sub lists
    of the dev_index hash table.

    This considerably speeds up "ip link" operations.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • GRE tunnels use one rwlock to protect their hash tables.

    This locking scheme can be converted to RCU for free, since netdevice
    already must wait for an RCU grace period at dismantle time.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • ip6_tunnels use one rwlock to protect their hash tables.

    This locking scheme can be converted to RCU for free, since netdevice
    already must wait for an RCU grace period at dismantle time.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • IPIP tunnels use one rwlock to protect their hash tables.

    This locking scheme can be converted to RCU for free, since netdevice
    already must wait for an RCU grace period at dismantle time.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • xfrm6_tunnels use one rwlock to protect their hash tables.

    Plain and straightforward conversion to RCU locking to permit better SMP
    performance.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • SIT tunnels use one rwlock to protect their hash tables.

    This locking scheme can be converted to RCU for free, since netdevice
    already must wait for an RCU grace period at dismantle time.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • SIT tunnels use one rwlock to protect their prl entries.

    This first patch adds RCU locking for prl management,
    with standard call_rcu() calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
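
    The rtnl_dump_ifinfo() commit in this group replaces a single-list scan
    with a walk over the 256 sub lists of the dev_index hash table. The
    following is a userspace model of that idea, not the kernel code: all
    type and function names (dev, dev_table, dump_cursor, table_add,
    dump_some) are hypothetical, and hashing by the low 8 bits of ifindex is
    an assumption. A dump that is resumed many times only skips entries
    inside one short bucket instead of re-walking the whole device list.

    ```c
    #include <stddef.h>

    #define NETDEV_HASHENTRIES 256 /* the "256 sub lists" from the commit text */

    struct dev {
        int ifindex;
        struct dev *next;
    };

    struct dev_table {
        struct dev *head[NETDEV_HASHENTRIES];
    };

    /* Where a partial dump stopped: bucket number and position inside it. */
    struct dump_cursor {
        unsigned int bucket;
        unsigned int pos;
    };

    /* Insert a device into the bucket selected by its ifindex (assumed hash). */
    static void table_add(struct dev_table *t, struct dev *d)
    {
        unsigned int h = (unsigned int)d->ifindex & (NETDEV_HASHENTRIES - 1);

        d->next = t->head[h];
        t->head[h] = d;
    }

    /*
     * Copy up to max ifindex values into out, starting at *cur, and update
     * *cur so the next call resumes where this one stopped.
     * Returns the number of entries written; 0 means the dump is complete.
     */
    static size_t dump_some(const struct dev_table *t, struct dump_cursor *cur,
                            int *out, size_t max)
    {
        size_t n = 0;
        unsigned int h = cur->bucket;
        unsigned int skip = cur->pos;

        for (; h < NETDEV_HASHENTRIES; h++, skip = 0) {
            unsigned int idx = 0;

            for (const struct dev *d = t->head[h]; d; d = d->next, idx++) {
                if (idx < skip)
                    continue;       /* already reported by an earlier call */
                if (n == max) {     /* output full: remember this entry */
                    cur->bucket = h;
                    cur->pos = idx;
                    return n;
                }
                out[n++] = d->ifindex;
            }
        }
        cur->bucket = h;            /* past the last bucket: nothing left */
        cur->pos = 0;
        return n;
    }
    ```

    A caller would zero-initialize the cursor and call dump_some() repeatedly
    until it returns 0, much like successive netlink dump callbacks.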
     

23 Oct, 2009

2 commits


22 Oct, 2009

1 commit

  • rtnl_getlink() & rtnl_setlink() run with RTNL held, so we can use the
    __dev_get_by_index() and __dev_get_by_name() variants and avoid
    dev_hold()/dev_put().

    Also add to rtnl_getlink() the capability to find a device by its name,
    not only by its index.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
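
    The lookup behaviour described above (find a device by index, or by name
    when no index is given) can be modelled in userspace as follows. This is
    only a sketch: dev_model, find_by_index, find_by_name and getlink_lookup
    are hypothetical names, and giving a positive index precedence over the
    name is an assumption, not something the commit message states.

    ```c
    #include <stddef.h>
    #include <string.h>

    struct dev_model {
        int ifindex;
        const char *name;
    };

    /* Linear search by interface index; returns NULL if not found. */
    static const struct dev_model *find_by_index(const struct dev_model *devs,
                                                 size_t n, int ifindex)
    {
        for (size_t i = 0; i < n; i++)
            if (devs[i].ifindex == ifindex)
                return &devs[i];
        return NULL;
    }

    /* Linear search by interface name; returns NULL if not found. */
    static const struct dev_model *find_by_name(const struct dev_model *devs,
                                                size_t n, const char *name)
    {
        for (size_t i = 0; i < n; i++)
            if (strcmp(devs[i].name, name) == 0)
                return &devs[i];
        return NULL;
    }

    /*
     * Assumed policy: a positive index wins; otherwise fall back to the
     * name if one was supplied; otherwise the request cannot be resolved.
     */
    static const struct dev_model *getlink_lookup(const struct dev_model *devs,
                                                  size_t n, int ifindex,
                                                  const char *name)
    {
        if (ifindex > 0)
            return find_by_index(devs, n, ifindex);
        if (name != NULL)
            return find_by_name(devs, n, name);
        return NULL;
    }
    ```

    No reference counting appears in the model, mirroring the commit's point
    that holding RTNL makes dev_hold()/dev_put() unnecessary for the lookup.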
     

21 Oct, 2009

4 commits

  • For connected sockets, the first run of dev_pick_tx saves the
    calculated txq in sk_tx_queue_mapping. The value is not saved if
    either the device has a queue select or the socket is not
    connected. Subsequent runs of dev_pick_tx use the cached value
    of sk_tx_queue_mapping.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     
  • dst_negative_advice() should check for changed dst and reset
    sk_tx_queue_mapping accordingly. Pass sock to the callers of
    dst_negative_advice.

    (sk_reset_txq is defined just for use by dst_negative_advice. The
    only way I could find to get around this is to move dst_negative_()
    from dst.h to dst.c, include sock.h in dst.c, etc)

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     
  • IPv6: Reset sk_tx_queue_mapping when dst_cache is reset. Use existing
    macro to do the work.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     
  • Introduce sk_tx_queue_mapping, and functions that set, test and
    get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
    cache is set/reset, and in socket alloc. Using -1 for "not set" and
    valid txq values of 0 and above allows the tx path to use the value
    of sk_tx_queue_mapping directly instead of subtracting 1 on every
    tx.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
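
    The caching scheme introduced by the four commits above can be sketched
    in userspace C. Everything here is a simplified model with hypothetical
    names (sock_model, txq_clear, txq_set, txq_recorded, pick_tx); in
    particular, reducing the flow hash modulo the queue count stands in for
    whatever queue selection the device or stack really performs. The sketch
    shows why -1 is a convenient "not recorded" value: a recorded queue
    number can be returned as-is.

    ```c
    #include <stdbool.h>

    struct sock_model {
        int tx_queue_mapping;   /* -1 means "no queue recorded" */
        bool connected;
    };

    /* Forget any cached queue, e.g. when the cached route is set or reset. */
    static void txq_clear(struct sock_model *sk)
    {
        sk->tx_queue_mapping = -1;
    }

    /* Record the queue chosen for this socket. */
    static void txq_set(struct sock_model *sk, int q)
    {
        sk->tx_queue_mapping = q;
    }

    /* True if a usable queue number is cached. */
    static bool txq_recorded(const struct sock_model *sk)
    {
        return sk && sk->tx_queue_mapping >= 0;
    }

    /*
     * Pick a transmit queue. A cached value is used directly; otherwise a
     * queue is computed and cached only for connected sockets on devices
     * without their own queue selection.
     */
    static int pick_tx(struct sock_model *sk, unsigned int flow_hash,
                       unsigned int num_queues, bool dev_has_select_queue)
    {
        if (txq_recorded(sk))
            return sk->tx_queue_mapping;

        int q = (int)(flow_hash % num_queues);  /* stand-in for real selection */

        if (sk && sk->connected && !dev_has_select_queue)
            txq_set(sk, q);
        return q;
    }
    ```

    Calling txq_clear() models the reset that the commits perform when the
    dst cache changes, after which the next pick_tx() recomputes the queue.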
     

20 Oct, 2009

10 commits

  • It can help to be able to filter packets on their queue_mapping.

    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with an unexpected queue number.

    Let's experiment with this simple filter extension first.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We hold RTNL, so we can use __dev_get_by_index() instead of dev_get_by_index().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • While doing multiple captures, I found af_packet was dirtying the cache
    line containing its prot_hook.

    This slows down machines where several cpus are necessary to handle capture
    traffic, as each prot_hook is traversed for each packet coming in or out
    of the host.

    This patch moves "struct packet_type prot_hook" to the end of
    packet_sock, and uses ____cacheline_aligned_in_smp to make sure
    this remains shared by all cpus.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch tries to print out more information when we hit the
    MSG_PEEK bug in tcp_recvmsg. It's been around since at least
    2005 and it's about time that we finally fix it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Use symbols instead of magic constants while checking PMTU discovery
    setsockopt.

    Remove redundant test in ip_rt_frag_needed() (done by caller).

    Signed-off-by: John Dykstra
    Signed-off-by: David S. Miller

    John Dykstra
     
  • Allow bpf to set a filter to drop packets that don't
    match a specific mark.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    jamal
     
  • ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.

    This function should be called only with RTNL or dev_base_lock held, or a reader
    could see a corrupt hash chain and eventually enter an endless loop.

    Fix is to call dev_get_by_index()/dev_put().

    If this happens to be performance critical, we could define a new dev_exist_by_index()
    function to avoid touching dev refcount.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When shutting down a ppp connection, a lockdep warning about a
    non-static key is triggered, because the lock is not initialized
    properly at that time.

    Fix by adjusting the lock/skb_queue_head init order.

    [ 94.339261] INFO: trying to register non-static key.
    [ 94.342509] the code is fine but needs lockdep annotation.
    [ 94.342509] turning off the locking correctness validator.
    [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
    [ 94.342509] Call Trace:
    [ 94.342509] [] register_lock_class+0x58/0x241
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] __lock_acquire+0xac/0xb73
    [ 94.342509] [] ? lock_release_non_nested+0x17b/0x1de
    [ 94.342509] [] lock_acquire+0x67/0x84
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] _spin_lock_irqsave+0x2f/0x3f
    [ 94.342509] [] ? skb_dequeue+0x15/0x41
    [ 94.342509] [] skb_dequeue+0x15/0x41
    [ 94.342509] [] ? _read_unlock+0x1d/0x20
    [ 94.342509] [] skb_queue_purge+0x14/0x1b
    [ 94.342509] [] l2cap_recv_frame+0xea1/0x115a [l2cap]
    [ 94.342509] [] ? __lock_acquire+0xb57/0xb73
    [ 94.342509] [] ? mark_lock+0x1e/0x1c7
    [ 94.342509] [] ? hci_rx_task+0xd2/0x1bc [bluetooth]
    [ 94.342509] [] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
    [ 94.342509] [] hci_rx_task+0x106/0x1bc [bluetooth]
    [ 94.342509] [] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
    [ 94.342509] [] tasklet_action+0x69/0xc1
    [ 94.342509] [] __do_softirq+0x94/0x11e
    [ 94.342509] [] do_softirq+0x36/0x5a
    [ 94.342509] [] irq_exit+0x35/0x68
    [ 94.342509] [] do_IRQ+0x72/0x89
    [ 94.342509] [] common_interrupt+0x2e/0x34
    [ 94.342509] [] ? pm_qos_add_requirement+0x63/0x9d
    [ 94.342509] [] ? acpi_idle_enter_bm+0x209/0x238
    [ 94.342509] [] cpuidle_idle_call+0x5c/0x94
    [ 94.342509] [] cpu_idle+0x4e/0x6f
    [ 94.342509] [] rest_init+0x53/0x55
    [ 94.342509] [] start_kernel+0x2f0/0x2f5
    [ 94.342509] [] i386_start_kernel+0x91/0x96

    Reported-by: Oliver Hartkopp
    Signed-off-by: Dave Young
    Tested-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Dave Young
     
  • Due to driver core changes, dev_set_drvdata will call kzalloc, which must
    run in a context that may sleep, but hci_conn_add is called in atomic context.

    As with dev_set_name, move dev_set_drvdata to the work queue function.

    The oops is as follows:

    Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
    Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
    Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133:
    Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [] lock_sock+0xa/0xc [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [] l2cap_sock_connect+0x103/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
    Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace:
    Oct 2 17:41:59 darkstar kernel: [ 438.001381] [] __might_sleep+0xde/0xe5
    Oct 2 17:41:59 darkstar kernel: [ 438.001386] [] __kmalloc+0x4a/0x15a
    Oct 2 17:41:59 darkstar kernel: [ 438.001392] [] ? kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001396] [] kzalloc+0xb/0xd
    Oct 2 17:41:59 darkstar kernel: [ 438.001400] [] device_private_init+0x15/0x3d
    Oct 2 17:41:59 darkstar kernel: [ 438.001405] [] dev_set_drvdata+0x18/0x26
    Oct 2 17:41:59 darkstar kernel: [ 438.001414] [] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001422] [] ? hci_conn_add+0x128/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001429] [] hci_conn_add+0x177/0x186 [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001437] [] hci_connect+0x3c/0xfb [bluetooth]
    Oct 2 17:41:59 darkstar kernel: [ 438.001442] [] l2cap_sock_connect+0x174/0x26b [l2cap]
    Oct 2 17:41:59 darkstar kernel: [ 438.001448] [] sys_connect+0x60/0x7a
    Oct 2 17:41:59 darkstar kernel: [ 438.001453] [] ? lock_release_non_nested+0x84/0x1de
    Oct 2 17:41:59 darkstar kernel: [ 438.001458] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001462] [] ? might_fault+0x47/0x81
    Oct 2 17:41:59 darkstar kernel: [ 438.001468] [] ? __copy_from_user_ll+0x11/0xce
    Oct 2 17:41:59 darkstar kernel: [ 438.001472] [] sys_socketcall+0x82/0x17b
    Oct 2 17:41:59 darkstar kernel: [ 438.001477] [] syscall_call+0x7/0xb

    Signed-off-by: Dave Young
    Signed-off-by: David S. Miller

    Dave Young
     
  • Fix the TCP_DEFER_ACCEPT conversion between seconds and
    retransmissions to match the TCP SYN-ACK retransmission periods,
    because the time is converted to such retransmissions. The old
    algorithm selects one more retransmission in some cases. Allow
    up to 255 retransmissions.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
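
    The seconds-to-retransmissions conversion described in the last commit
    can be sketched as below. This is a userspace illustration under stated
    assumptions, not the patch itself: the function name secs_to_retrans and
    its parameters are chosen for this sketch, and the model assumes each
    SYN-ACK retransmission period doubles the previous one up to a maximum.
    The caller supplies the initial timeout and the maximum, both in seconds.

    ```c
    #include <stdint.h>

    /*
     * Return how many SYN-ACK retransmission periods are needed to cover
     * "seconds", given an initial timeout and an upper bound per period.
     * The result is capped at 255, matching the limit named in the commit.
     */
    static uint8_t secs_to_retrans(int seconds, int timeout, int rto_max)
    {
        uint8_t res = 0;

        if (seconds > 0) {
            int period = timeout;   /* time covered by the periods so far */

            res = 1;
            while (seconds > period && res < 255) {
                res++;
                timeout <<= 1;      /* assumed exponential backoff */
                if (timeout > rto_max)
                    timeout = rto_max;
                period += timeout;
            }
        }
        return res;
    }
    ```

    For example, with an initial timeout of 3 seconds and a maximum of 120
    seconds (values assumed here for illustration), a request of 3 seconds
    maps to 1 retransmission, 4 to 9 seconds map to 2, and 10 seconds map
    to 3.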