20 Jul, 2010

1 commit

  • It can happen that there are no packets in the queue when
    tcp_xmit_retransmit_queue() is called. tcp_write_queue_head() then
    returns NULL, which is dereferenced to read its sacked field into a
    local variable.

    There is no work to do if no packets are outstanding so we just
    exit early.

    This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
    guard to make joining diff nicer).

    Signed-off-by: Ilpo Järvinen
    Reported-by: Lennart Schulte
    Tested-by: Lennart Schulte
    Signed-off-by: David S. Miller

    Ilpo Järvinen
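A minimal userspace sketch of the early-exit guard described above, assuming a simplified singly linked queue. `struct pkt`, `struct txq`, `queue_head()` and `retransmit_queue()` are hypothetical stand-ins for the skb write queue, tcp_write_queue_head() and tcp_xmit_retransmit_queue(); this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb and the TCP write queue. */
struct pkt {
    struct pkt *next;
    unsigned char sacked;   /* models TCP_SKB_CB(skb)->sacked */
};

struct txq {
    struct pkt *head;
};

/* Models tcp_write_queue_head(): NULL when nothing is queued. */
static struct pkt *queue_head(const struct txq *q)
{
    return q->head;
}

/* Walks the queue and returns how many packets it visited. */
static int retransmit_queue(const struct txq *q)
{
    struct pkt *p = queue_head(q);
    unsigned char sacked;
    int visited = 0;

    /* No packets outstanding: nothing to do, and p must not be read. */
    if (p == NULL)
        return 0;

    /* Without the check above, this line would dereference NULL. */
    sacked = p->sacked;
    (void)sacked;

    for (; p != NULL; p = p->next)
        visited++;
    return visited;
}
```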
     

16 Jul, 2010

1 commit

  • This was detected using two mcast router tables. The
    pimreg for the second interface did not have a specific
    mrule, so packets received by it were handled by the
    default table, which had nothing configured.

    This caused ipmr_fib_lookup() to fail, which led to
    the memory leak.

    Signed-off-by: Ben Greear
    Signed-off-by: David S. Miller

    Ben Greear
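A small sketch of the ownership rule behind this kind of leak, assuming the leaked object is the packet buffer that the receive handler owns: every exit path, including a failed table lookup, must consume it. `struct buf`, `buf_alloc()`, `buf_free()`, `table_lookup()` and `handle_packet()` are hypothetical; the counter only exists to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet buffer; live_bufs counts buffers not yet freed. */
struct buf { int len; };
static int live_bufs;

static struct buf *buf_alloc(int len)
{
    struct buf *b = malloc(sizeof *b);
    if (b) {
        b->len = len;
        live_bufs++;
    }
    return b;
}

static void buf_free(struct buf *b)
{
    if (b) {
        free(b);
        live_bufs--;
    }
}

/* Hypothetical table lookup: fails (-1) when no rule selects a table. */
static int table_lookup(int have_rule)
{
    return have_rule ? 0 : -1;
}

/* The handler owns b, so every return path must free it. */
static int handle_packet(struct buf *b, int have_rule)
{
    if (table_lookup(have_rule) < 0) {
        buf_free(b);    /* omitting this leaks one buffer per packet */
        return -1;
    }
    /* "Deliver" the packet; in this sketch delivery just frees it. */
    buf_free(b);
    return 0;
}
```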
     

15 Jul, 2010

1 commit


05 Jul, 2010

1 commit

  • When using the xfrm-by-MARK feature in
    2.6.34 - 2.6.35 kernels, the mark
    is always cleared in the flowi structure by the memset in
    _decode_session4() (net/ipv4/xfrm4_policy.c), so
    the policy lookup fails.
    The IPv6 code is affected by this bug too.

    Signed-off-by: Peter Kosyh
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Peter Kosyh
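A simplified sketch of the failure mode and one way to avoid it: the memset that zeroes the flow key also wipes the mark, so the mark has to be copied from the packet after the memset. `struct flow`, `struct packet`, `decode_session()` and `policy_matches()` are hypothetical, trimmed-down stand-ins for struct flowi, the skb, _decode_session4() and the xfrm policy lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for struct flowi and an skb. */
struct flow   { uint32_t mark; uint32_t daddr; uint8_t proto; };
struct packet { uint32_t mark; uint32_t daddr; uint8_t proto; };

/* Build the flow key used for the policy lookup. */
static void decode_session(const struct packet *pkt, struct flow *fl)
{
    memset(fl, 0, sizeof(*fl));  /* zeroes every field, mark included */
    fl->mark = pkt->mark;        /* so the mark must be set after it */
    fl->daddr = pkt->daddr;
    fl->proto = pkt->proto;
}

/* Hypothetical policy that only matches flows carrying want_mark. */
static int policy_matches(const struct flow *fl, uint32_t want_mark)
{
    return fl->mark == want_mark;
}
```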
     

22 Jun, 2010

1 commit

  • It has been reported that the new UFO software fallback path
    fails under certain conditions with NFS. I tracked the problem
    down to the generation of UFO packets that are smaller than the
    MTU. The software fallback path simply discards these packets.

    This patch fixes the problem by not generating such packets on
    the UFO path.

    Signed-off-by: Herbert Xu
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Herbert Xu
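The rule this fix enforces can be sketched as a single predicate: a datagram that already fits within the MTU needs no fragmentation, so it should never be built as a UFO packet. `use_ufo_path()` and its parameters are hypothetical; the real check in the IP output path involves more state than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when a UDP datagram should be built as one
 * large UFO (UDP fragmentation offload) packet. Datagrams that fit in
 * the MTU are never sent down the UFO path, since the software
 * fallback would discard them. */
static bool use_ufo_path(unsigned int length, unsigned int mtu,
                         bool dev_supports_ufo, bool is_udp)
{
    return is_udp && dev_supports_ufo && length > mtu;
}
```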
     

07 Jun, 2010

1 commit

  • ipmr_rules_exit() and ip6mr_rules_exit() free a list of items, but
    forget to remove these items from the list. The list head is not
    updated and still points to freed memory.

    This can trigger a fault later when icmpv6_sk_exit() is called.

    The fix is to either reinitialize the list or use list_del() to remove
    items from the list before freeing them.

    Bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=16120

    Introduced by commit d1db275dd3f6e4 (ipv6: ip6mr: support multiple
    tables) and commit f0ad0860d01e (ipv4: ipmr: support multiple tables).

    Reported-by: Alex Zhavnerchik
    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
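A userspace sketch of the list_del()-before-free pattern, using a minimal circular doubly linked list modeled on the kernel's list_head. The helpers here are simplified re-implementations for illustration, and `struct table` and `tables_exit()` are hypothetical stand-ins for the multicast routing tables and the *_rules_exit() functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Hypothetical table entry. The node is the first member, so a node
 * pointer can be cast back to the entry (a simplified container_of). */
struct table { struct list_head list; int id; };

/* Unlink each entry before freeing it, so that once the loop ends the
 * head points back to itself instead of at freed memory. */
static void tables_exit(struct list_head *head)
{
    while (!list_empty(head)) {
        struct table *t = (struct table *)head->next;
        list_del(&t->list);
        free(t);
    }
}
```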
     

05 Jun, 2010

3 commits

  • It's better to do the route lookup in the appropriate namespace.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I believe a moderate SYN flood attack can corrupt the RFS flow table
    (rps_sock_flow_table), making RPS/RFS much less effective.

    Even in a normal situation, servers handling short-lived sessions suffer
    from bad steering for the first data packet of a session if another SYN
    packet is received for another session.

    We do the following in tcp_v4_rcv():

    sock_rps_save_rxhash(sk, skb->rxhash);

    We should _not_ do this if sk is a LISTEN socket, as nearly every
    packet received on a LISTEN socket has a different rxhash than the
    previous one.
    -> RPS_NO_CPU markers are spread all over rps_sock_flow_table.

    Also, it makes sense to protect sk->rxhash field changes with the socket
    lock (we can currently change it even if a user thread owns the lock
    and might use rxhash).

    This patch moves sock_rps_save_rxhash() to a sock locked section,
    and only for non LISTEN sockets.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • syncookies default to on since
    e994b7c901ded7200b525a707c6da71f2cf6d4bb
    (tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
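For the RFS change above, the rule can be sketched in a few lines: record the flow hash only for non-LISTEN sockets, since each SYN reaching a listener belongs to a different flow. `struct sock_model`, `enum sk_state` and `save_rxhash()` are hypothetical simplifications of the socket and sock_rps_save_rxhash(); the caller is assumed to hold the socket lock, as the patch arranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified socket: state plus the last saved flow hash. */
enum sk_state { SK_LISTEN, SK_ESTABLISHED };
struct sock_model { enum sk_state state; uint32_t rxhash; };

/* Caller is assumed to hold the socket lock. A listener sees a new flow
 * (and a new hash) with almost every SYN, so saving it would only
 * scatter stale entries across the steering table. */
static void save_rxhash(struct sock_model *sk, uint32_t hash)
{
    if (sk->state == SK_LISTEN)
        return;
    sk->rxhash = hash;
}
```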
     

02 Jun, 2010

1 commit


01 Jun, 2010

3 commits


31 May, 2010

2 commits


29 May, 2010

2 commits

  • As David found out, sock_queue_err_skb() should be called with the socket
    lock held, or we risk sk_forward_alloc corruption, since we use non-atomic
    operations to update this field.

    This patch adds a bh_lock_sock()/bh_unlock_sock() pair in three spots
    (BH are already disabled):

    1) skb_tstamp_tx()
    2) Before calling ip_icmp_error(), in __udp4_lib_err()
    3) Before calling ipv6_icmp_error(), in __udp6_lib_err()

    Reported-by: Anton Blanchard
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
    netlink: bug fix: wrong size was calculated for vfinfo list blob
    netlink: bug fix: don't overrun skbs on vf_port dump
    xt_tee: use skb_dst_drop()
    netdev/fec: fix ifconfig eth0 down hang issue
    cnic: Fix context memory init. on 5709.
    drivers/net: Eliminate a NULL pointer dereference
    drivers/net/hamradio: Eliminate a NULL pointer dereference
    be2net: Patch removes redundant while statement in loop.
    ipv6: Add GSO support on forwarding path
    net: fix __neigh_event_send()
    vhost: fix the memory leak which will happen when memory_access_ok fails
    vhost-net: fix to check the return value of copy_to/from_user() correctly
    vhost: fix to check the return value of copy_to/from_user() correctly
    vhost: Fix host panic if ioctl called with wrong index
    net: fix lock_sock_bh/unlock_sock_bh
    net/iucv: Add missing spin_unlock
    net: ll_temac: fix checksum offload logic
    net: ll_temac: fix interrupt bug when interrupt 0 is used
    sctp: dubious bitfields in sctp_transport
    ipmr: off by one in __ipmr_fill_mroute()
    ...

    Linus Torvalds
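The sock_queue_err_skb() locking fix above can be illustrated with a userspace sketch: a plain read-modify-write on a counter is only safe if every updater holds the same lock. Here a C11 `atomic_flag` spinlock stands in for the socket spinlock taken by bh_lock_sock(), and `forward_alloc` for sk_forward_alloc; `struct sock_model`, `sock_lock()`, `sock_unlock()` and `queue_err_skb()` are hypothetical. The sketch is single-threaded and only shows the locking pattern.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical socket model: a spinlock standing in for the socket lock
 * and a plain, non-atomic counter standing in for sk_forward_alloc. */
struct sock_model {
    atomic_flag lock;
    int forward_alloc;
};

static void sock_lock(struct sock_model *sk)
{
    while (atomic_flag_test_and_set_explicit(&sk->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void sock_unlock(struct sock_model *sk)
{
    atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* Charge a queued error skb of 'size' bytes to the socket. The update is
 * a non-atomic read-modify-write; without the lock, two CPUs doing it
 * concurrently could lose one of the updates. */
static void queue_err_skb(struct sock_model *sk, int size)
{
    sock_lock(sk);
    sk->forward_alloc -= size;
    sock_unlock(sk);
}
```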
     

27 May, 2010

1 commit

  • This new sock lock primitive was introduced to speed up some user-context
    socket manipulation. But it does not safely protect against two threads,
    one using regular lock_sock/release_sock and one using
    lock_sock_bh/unlock_sock_bh.

    This patch changes lock_sock_bh to be careful against 'owned' state.
    If owned is found to be set, we must take the slow path.
    lock_sock_bh() now returns a boolean to say if the slow path was taken,
    and this boolean is used at unlock_sock_bh time to call the appropriate
    unlock function.

    After this change, BH are either disabled or enabled during the
    lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
    so we rename these functions to lock_sock_fast()/unlock_sock_fast().

    Reported-by: Anton Blanchard
    Signed-off-by: Eric Dumazet
    Tested-by: Anton Blanchard
    Signed-off-by: David S. Miller

    Eric Dumazet
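A userspace model of the lock_sock_fast()/unlock_sock_fast() contract described above: the lock function reports which path it took, and the caller hands that boolean back so the matching unlock is used. All names are hypothetical models, not the kernel functions. The kernel's lock_sock() sleeps while another context owns the socket; that wait is elided in this single-threaded sketch, so it only demonstrates the path selection and unlock pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: 'slock' stands in for the socket spinlock and
 * 'owned' for the flag set by lock_sock(). */
struct sock_model {
    atomic_flag slock;
    bool owned;
};

static void spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set(f))
        ; /* busy-wait */
}

static void spin_release(atomic_flag *f)
{
    atomic_flag_clear(f);
}

/* Models lock_sock(): become the owner (the wait for a current owner
 * is elided in this single-threaded sketch). */
static void lock_sock_model(struct sock_model *sk)
{
    spin_acquire(&sk->slock);
    sk->owned = true;
    spin_release(&sk->slock);
}

/* Models release_sock(): give up ownership. */
static void release_sock_model(struct sock_model *sk)
{
    spin_acquire(&sk->slock);
    sk->owned = false;
    spin_release(&sk->slock);
}

/* Returns true when the slow, owner-aware path was taken. */
static bool lock_sock_fast_model(struct sock_model *sk)
{
    spin_acquire(&sk->slock);
    if (!sk->owned)
        return false;          /* fast path: keep holding the spinlock */
    spin_release(&sk->slock);
    lock_sock_model(sk);       /* socket is owned: take the slow path */
    return true;
}

/* 'slow' is the value returned by lock_sock_fast_model(). */
static void unlock_sock_fast_model(struct sock_model *sk, bool slow)
{
    if (slow)
        release_sock_model(sk);
    else
        spin_release(&sk->slock);
}
```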
     

26 May, 2010

1 commit


25 May, 2010

1 commit


21 May, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
    qlcnic: adding co maintainer
    ixgbe: add support for active DA cables
    ixgbe: dcb, do not tag tc_prio_control frames
    ixgbe: fix ixgbe_tx_is_paused logic
    ixgbe: always enable vlan strip/insert when DCB is enabled
    ixgbe: remove some redundant code in setting FCoE FIP filter
    ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
    ixgbe: fix header len when unsplit packet overflows to data buffer
    ipv6: Never schedule DAD timer on dead address
    ipv6: Use POSTDAD state
    ipv6: Use state_lock to protect ifa state
    ipv6: Replace inet6_ifaddr->dead with state
    cxgb4: notify upper drivers if the device is already up when they load
    cxgb4: keep interrupts available when the ports are brought down
    cxgb4: fix initial addition of MAC address
    cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
    cnic: Convert cnic_local_flags to atomic ops.
    can: Fix SJA1000 command register writes on SMP systems
    bridge: fix build for CONFIG_SYSFS disabled
    ARCNET: Limit com20020 PCI ID matches for SOHARD cards
    ...

    Fix up various conflicts with pcmcia tree drivers/net/
    {pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
    wireless/orinoco/spectrum_cs.c} and feature removal
    (Documentation/feature-removal-schedule.txt).

    Also fix a non-content conflict due to pm_qos_requirement getting
    renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
    vlynq: make whole Kconfig-menu dependant on architecture
    add descriptive comment for TIF_MEMDIE task flag declaration.
    EEPROM: max6875: Header file cleanup
    EEPROM: 93cx6: Header file cleanup
    EEPROM: Header file cleanup
    agp: use NULL instead of 0 when pointer is needed
    rtc-v3020: make bitfield unsigned
    PCI: make bitfield unsigned
    jbd2: use NULL instead of 0 when pointer is needed
    cciss: fix shadows sparse warning
    doc: inode uses a mutex instead of a semaphore.
    uml: i386: Avoid redefinition of NR_syscalls
    fix "seperate" typos in comments
    cocbalt_lcdfb: correct sections
    doc: Change urls for sparse
    Powerpc: wii: Fix typo in comment
    i2o: cleanup some exit paths
    Documentation/: it's -> its where appropriate
    UML: Fix compiler warning due to missing task_struct declaration
    UML: add kernel.h include to signal.c
    ...

    Linus Torvalds
     

18 May, 2010

8 commits

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • skb rxhash should be cleared when an skb is handled by a tunnel before
    being delivered again, so that correct packet steering can take place.

    There are other cleanups and accounting steps that we can factor out into
    a new helper, skb_tunnel_rx().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
    if MD5+SACK+timestamps were used in the initial SYN message.

    Some stacks (old Linux for example) try to negotiate MD5+SACK+TSTAMP
    sessions, but since 40 bytes of TCP option space are not enough to
    store all the bits needed, we chose to disable timestamps in this case.

    We send a SYN-ACK _without_ a timestamp option, but the socket has
    timestamps enabled, and all further outgoing messages contain a TS block,
    all carrying the initial timestamp of the remote peer.

    The fix is to really disable the timestamps option for the whole session.

    Reported-by: Bijay Singh
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Also added an explicit break; to avoid
    a fallthrough in net/ipv4/tcp_input.c

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Using the new refdst infrastructure, outgoing TCP packets can avoid
    two atomic ops and the dirtying of a previously highly contended
    cache line.

    Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE
    Note 2: UDP packet dsts are built before ip_queue_xmit().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Use ip_route_input_noref() in the IP fast path to avoid two atomic ops
    per incoming packet.

    Note: loopback is excluded from this optimization in ip_rcv_finish()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • ip_route_input() is the version returning a refcounted dst, while
    ip_route_input_noref() returns a non-refcounted one.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Use the low-order bit of skb->_skb_dst to indicate that the dst is not
    refcounted.

    Rename _skb_dst to _skb_refdst to make sure all uses are caught.

    skb_dst() returns the dst regardless of whether the noref bit is set,
    but with a lockdep check to make sure a noref dst is not returned if
    the current user is not RCU protected.

    New skb_dst_set_noref() helper to set a non-refcounted dst on an skb
    (with lockdep check).

    skb_dst_drop() drops a reference only if the skb dst was refcounted.

    skb_dst_force() helper is used to force a refcount on the dst when the
    skb is queued and no longer RCU protected.

    Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
    !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
    sock_queue_rcv_skb(), in __nf_queue().

    Use skb_dst_force() in dev_requeue_skb().

    Note: dst_use_noref() still dirties the dst; we might change it
    later to do one dirtying per jiffy.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
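The low-bit tagging described in the refdst commit above can be modeled in userspace. This sketch re-implements simplified versions of skb_dst(), skb_dst_set_noref(), skb_dst_drop() and skb_dst_force() over a hypothetical `struct skb_model` with a plain integer refcount; it omits RCU and lockdep entirely and is not the kernel code. It relies on `struct dst` being at least 2-byte aligned, so bit 0 of a real pointer is always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dst entry with a plain reference count. */
struct dst { int refcnt; };

/* Bit 0 of the stored word means "this dst is not refcounted". */
#define SKB_DST_NOREF ((uintptr_t)1)

struct skb_model { uintptr_t refdst; };

/* Return the dst whether or not the noref bit is set. */
static struct dst *skb_dst(const struct skb_model *skb)
{
    return (struct dst *)(skb->refdst & ~SKB_DST_NOREF);
}

/* Attach a dst without taking a reference (RCU protection assumed). */
static void skb_dst_set_noref(struct skb_model *skb, struct dst *d)
{
    skb->refdst = (uintptr_t)d | SKB_DST_NOREF;
}

/* Drop the dst; only a refcounted dst holds a reference to release. */
static void skb_dst_drop(struct skb_model *skb)
{
    if (skb->refdst && !(skb->refdst & SKB_DST_NOREF))
        skb_dst(skb)->refcnt--;
    skb->refdst = 0;
}

/* Before queueing (leaving RCU protection), upgrade to a real reference. */
static void skb_dst_force(struct skb_model *skb)
{
    if (skb->refdst & SKB_DST_NOREF) {
        struct dst *d = skb_dst(skb);
        d->refcnt++;
        skb->refdst = (uintptr_t)d;
    }
}
```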
     

17 May, 2010

1 commit


16 May, 2010

3 commits

  • TCP-MD5 sessions have intermittent failures when the route cache is
    invalidated. ip_queue_xmit() has to find a new route and calls
    sk_setup_caps(sk, &rt->u.dst), destroying the

    sk->sk_route_caps &= ~NETIF_F_GSO_MASK

    that MD5 desperately tries to apply all along its path (from
    tcp_transmit_skb() for example).

    So we send a few bad packets, and everything is fine when
    tcp_transmit_skb() is called again for this socket.

    Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
    socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

    Reported-by: Bhaskar Dutta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • TCP MD5 support uses percpu data for temporary storage. It currently
    disables preemption so that the same storage cannot be reclaimed by
    another thread on the same CPU.

    We also have to make sure a softirq handler won't try to use the same
    context as well. Various bug reports demonstrated corruption.

    The fix is to disable both preemption and BH.

    Reported-by: Bhaskar Dutta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • (Dropped the InfiniBand part, because Tetsuo modified the related code;
    I will send a separate patch for it once this is accepted.)

    This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
    allows users to reserve ports for third-party applications.

    The reserved ports will not be used by automatic port assignments
    (e.g. when calling connect() or bind() with port number 0). Explicit
    port allocation behavior is unchanged.

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Neil Horman
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Amerigo Wang
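A sketch of the reserved-ports idea from the commit above: keep a 65536-bit bitmap of reserved ports and have automatic (port 0) assignment skip them, while explicit binds would not consult it. `reserve_port()`, `is_reserved()` and `pick_ephemeral()` are hypothetical helpers, the linear scan is a simplification (the kernel's search differs), and `in_use` is an optional hypothetical callback for ports already taken.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NPORTS 65536
static unsigned char reserved[NPORTS / CHAR_BIT];  /* one bit per port */

static void reserve_port(unsigned int port)
{
    reserved[port / CHAR_BIT] |= (unsigned char)(1u << (port % CHAR_BIT));
}

static bool is_reserved(unsigned int port)
{
    return reserved[port / CHAR_BIT] & (1u << (port % CHAR_BIT));
}

/* Automatic assignment: first port in [low, high] that is neither
 * reserved nor in use; -1 if none is available. */
static int pick_ephemeral(unsigned int low, unsigned int high,
                          bool (*in_use)(unsigned int))
{
    for (unsigned int p = low; p <= high; p++) {
        if (is_reserved(p))
            continue;
        if (in_use != NULL && in_use(p))
            continue;
        return (int)p;
    }
    return -1;
}
```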
     

14 May, 2010

1 commit


13 May, 2010

3 commits

  • This patch removes from net/ netfilter files
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    [Patrick: changed to keep return statements in otherwise empty function bodies]
    Signed-off-by: Patrick McHardy

    Joe Perches
     
  • Make sure all printk messages have a severity level.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Patrick McHardy

    Stephen Hemminger
     
  • Change netfilter asserts to standard WARN_ON. This has the
    benefit of backtrace info and also causes netfilter errors
    to show up on kerneloops.org.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Patrick McHardy

    Stephen Hemminger
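The WARN_ON() change above relies on two properties of the kernel macro: it reports the location when the condition is true, and it evaluates to the condition so the caller can recover instead of stopping. This userspace stand-in, `WARN_ON_MODEL()`, is hypothetical and prints only file and line (no backtrace); `table_slot()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdio.h>

/* Report the location when cond is true and pass cond through. */
static int warn_on_report(int cond, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING: at %s:%d\n", file, line);
    return cond;
}

#define WARN_ON_MODEL(cond) warn_on_report(!!(cond), __FILE__, __LINE__)

/* Hypothetical caller: warn on a bad index and recover, rather than
 * aborting as an assert would. */
static int table_slot(int idx)
{
    if (WARN_ON_MODEL(idx < 0))
        return -1;
    return idx * 2;
}
```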
     

12 May, 2010

2 commits