11 Oct, 2008

3 commits


10 Oct, 2008

14 commits

  • This patch fixes an error when CONFIG_TCP_MD5SIG is disabled.

    Signed-off-by: Guo-Fu Tseng
    Signed-off-by: David S. Miller

    Guo-Fu Tseng
     
  • While looking at UDP port randomization, I noticed it
    was a little bit pessimistic, not looking at the type of sockets
    (IPv6/IPv4) and not looking at bound addresses, if any.

    We should perform the same tests as when binding to a
    specific port.

    This permits a cleanup of udp_lib_get_port().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
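"The same tests as when binding" boils down to a conflict check: two sockets on the same port collide only if their bound addresses overlap. The sketch below illustrates this with a simplified socket model. Sockets conflict when the families match and either side is bound to the wildcard address, or both are bound to the same address. The struct and function names are hypothetical and are not udp_lib_get_port() internals. Treating different families as never conflicting is an assumption that ignores dual-stack IPv6 sockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a bound UDP socket. */
struct bound_sock {
    int family;          /* 4 or 6 in this sketch */
    uint16_t port;
    uint8_t addr[16];    /* IPv4 uses the first 4 bytes */
    bool wildcard;       /* bound to INADDR_ANY / in6addr_any */
};

/* Same-port sockets conflict only when their bound addresses overlap.
 * Simplification: different families never conflict here. */
static bool udp_conflict(const struct bound_sock *a, const struct bound_sock *b)
{
    size_t len;

    if (a->port != b->port || a->family != b->family)
        return false;
    if (a->wildcard || b->wildcard)
        return true;
    len = (a->family == 4) ? 4 : 16;
    return memcmp(a->addr, b->addr, len) == 0;
}

/* A candidate port is usable if no existing socket conflicts with it. */
static bool udp_port_usable(const struct bound_sock *cand,
                            const struct bound_sock *socks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (udp_conflict(cand, &socks[i]))
            return false;
    return true;
}
```

With this check, a randomly chosen port is rejected only when it really collides, instead of whenever any socket already uses that port number.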
     
  • $ codiff tcp_ipv6.o.old tcp_ipv6.o.new
    net/ipv6/tcp_ipv6.c:
    tcp_v6_md5_hash_hdr | -144
    tcp_v6_send_ack | -585
    tcp_v6_send_reset | -540
    3 functions changed, 1269 bytes removed, diff: -1269

    net/ipv6/tcp_ipv6.c:
    tcp_v6_send_response | +791
    1 function changed, 791 bytes added, diff: +791

    tcp_ipv6.o.new:
    4 functions changed, 791 bytes added, 1269 bytes removed, diff: -478

    I chose to leave the reset-related netns comment in place (not
    the one that is killed), as I cannot understand its English, so
    it's a bit hard for me to evaluate its usefulness :-).

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
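One piece of work that the unified tcp_v6_send_response() does for both ACKs and RSTs is sizing the header. The base header is 20 bytes. Add 12 bytes for an aligned timestamp option when ts is non-zero, and 20 bytes for an aligned MD5 option when a key exists. doff is that length in 32-bit words. The userspace sketch below shows this calculation. The constants mirror the kernel's TCPOLEN_TSTAMP_ALIGNED and TCPOLEN_MD5SIG_ALIGNED values. tcp_reply_len() and tcp_doff() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCP_BASE_HDR_LEN        20  /* sizeof(struct tcphdr) */
#define TCPOLEN_TSTAMP_ALIGNED  12  /* NOP, NOP, kind 8, len 10, 2 x 32-bit */
#define TCPOLEN_MD5SIG_ALIGNED  20  /* NOP, NOP, kind 19, len 18, 16-byte digest */

/* Hypothetical helper: header length of a reply (ACK or RST) segment. */
static unsigned int tcp_reply_len(uint32_t ts, bool have_md5_key)
{
    unsigned int tot_len = TCP_BASE_HDR_LEN;

    if (ts)
        tot_len += TCPOLEN_TSTAMP_ALIGNED;
    if (have_md5_key)
        tot_len += TCPOLEN_MD5SIG_ALIGNED;
    return tot_len;
}

/* The data offset field counts 32-bit words. */
static unsigned int tcp_doff(unsigned int tot_len)
{
    return tot_len / 4;
}
```

For example, a reply carrying both options is 20 + 12 + 20 = 52 bytes, so doff is 13.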
     
  • After this I get:

    $ diff-funcs tcp_v6_send_reset tcp_ipv6.c tcp_ipv6.c tcp_v6_send_ack
    --- tcp_ipv6.c:tcp_v6_send_reset()
    +++ tcp_ipv6.c:tcp_v6_send_ack()
    @@ -1,4 +1,5 @@
    -static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
    +static void tcp_v6_send_ack(struct sk_buff *skb, u32 seq, u32 ack, u32 win, u32 ts,
    + struct tcp_md5sig_key *key)
    {
    struct tcphdr *th = tcp_hdr(skb), *t1;
    struct sk_buff *buff;
    @@ -7,31 +8,14 @@
    struct sock *ctl_sk = net->ipv6.tcp_sk;
    unsigned int tot_len = sizeof(struct tcphdr);
    __be32 *topt;
    -#ifdef CONFIG_TCP_MD5SIG
    - struct tcp_md5sig_key *key;
    -#endif
    -
    - if (th->rst)
    - return;
    -
    - if (!ipv6_unicast_destination(skb))
    - return;

    + if (ts)
    + tot_len += TCPOLEN_TSTAMP_ALIGNED;
    #ifdef CONFIG_TCP_MD5SIG
    - if (sk)
    - key = tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr);
    - else
    - key = NULL;
    -
    if (key)
    tot_len += TCPOLEN_MD5SIG_ALIGNED;
    #endif

    - /*
    - * We need to grab some memory, and put together an RST,
    - * and then put it into the queue to be sent.
    - */
    -
    buff = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + tot_len,
    GFP_ATOMIC);
    if (buff == NULL)
    @@ -46,18 +30,20 @@
    t1->dest = th->source;
    t1->source = th->dest;
    t1->doff = tot_len / 4;
    - t1->rst = 1;
    -
    - if(th->ack) {
    - t1->seq = th->ack_seq;
    - } else {
    - t1->ack = 1;
    - t1->ack_seq = htonl(ntohl(th->seq) + th->syn + th->fin
    - + skb->len - (th->doff<<2));
    - }
    + t1->seq = htonl(seq);
    + t1->ack_seq = htonl(ack);
    + t1->ack = 1;
    + t1->window = htons(win);

    topt = (__be32 *)(t1 + 1);

    + if (ts) {
    + *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
    + (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP);
    + *topt++ = htonl(tcp_time_stamp);
    + *topt++ = htonl(ts);
    + }
    +
    #ifdef CONFIG_TCP_MD5SIG
    if (key) {
    *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
    @@ -84,15 +70,10 @@
    fl.fl_ip_sport = t1->source;
    security_skb_classify_flow(skb, &fl);

    - /* Pass a socket to ip6_dst_lookup either it is for RST
    - * Underlying function will use this to retrieve the network
    - * namespace
    - */
    if (!ip6_dst_lookup(ctl_sk, &buff->dst, &fl)) {
    if (xfrm_lookup(&buff->dst, &fl, NULL, 0) >= 0) {
    ip6_xmit(ctl_sk, buff, &fl, NULL, 0);
    TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
    - TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
    return;
    }
    }

    ...which is starting to look trivial to combine.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
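The first 32-bit word written for the timestamp option in the diff packs two NOPs, the timestamp kind, and its length. The sketch below shows that packing, assuming the standard values TCPOPT_NOP = 1, TCPOPT_TIMESTAMP = 8 and TCPOLEN_TIMESTAMP = 10 (RFC 7323). htonl()/ntohl() come from <arpa/inet.h>. put_ts_option() is a hypothetical name. It follows the diff: the first value is the local timestamp (tcp_time_stamp there), and the echoed value is ts.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define TCPOPT_NOP        1
#define TCPOPT_TIMESTAMP  8
#define TCPOLEN_TIMESTAMP 10

/* Hypothetical helper: write NOP, NOP, TS kind/len, TSval, TSecr
 * (12 bytes) in network byte order and return the next free word. */
static uint32_t *put_ts_option(uint32_t *topt, uint32_t tsval, uint32_t tsecr)
{
    *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
                    (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP);
    *topt++ = htonl(tsval);
    *topt++ = htonl(tsecr);
    return topt;
}
```

On the wire, the first word is therefore the byte sequence 01 01 08 0a.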
     
  • Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Maybe it's just me, but I guess those md5 people made a mess
    of it by having *_md5_hash_* use daddr, saddr order
    instead of the natural one (which is also what the csum
    functions use). For the segment we're sending, the original
    addresses are reversed, so buff's saddr == skb's daddr and
    vice versa.

    Maybe I can finally proceed with unification of some code
    after fixing it first... :-)

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
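The address reversal can be made concrete with a pseudo-header builder. For a reply built from a received skb, the reply's source is the received packet's destination. The sketch below assumes IPv6-sized addresses and a hypothetical pseudo-header struct ordered source first, as the checksum helpers do. It does not reproduce the kernel's MD5 hashing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct addr6 { uint8_t b[16]; };

/* Hypothetical pseudo-header, ordered like the checksum helpers: source first. */
struct pseudo_hdr {
    struct addr6 saddr;
    struct addr6 daddr;
};

static void fill_pseudo_hdr(struct pseudo_hdr *ph,
                            const struct addr6 *saddr, const struct addr6 *daddr)
{
    ph->saddr = *saddr;
    ph->daddr = *daddr;
}

/* For a segment sent in reply to a received one, the addresses swap:
 * our saddr is the received packet's daddr, and vice versa. */
static void fill_reply_pseudo_hdr(struct pseudo_hdr *ph,
                                  const struct addr6 *rx_saddr,
                                  const struct addr6 *rx_daddr)
{
    fill_pseudo_hdr(ph, rx_daddr, rx_saddr);
}
```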
     
  • The T5 timer is the timer for the overall shutdown procedure. If
    this timer expires, the shutdown procedure has not completed and we
    ABORT the association. We should update SCTP_MIB_ABORTED and
    SCTP_MIB_CURRESTAB when aborting.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • If ABORT chunks require authentication and a protocol violation
    is triggered, we do not tear down the association. Consequently,
    we should not increment SCTP_MIB_ABORTED.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • RFC 3873 defines SCTP_MIB_CURRESTAB:

        sctpCurrEstab OBJECT-TYPE
            SYNTAX      Gauge32
            MAX-ACCESS  read-only
            STATUS      current
            DESCRIPTION
                "The number of associations for which the current state is
                either ESTABLISHED, SHUTDOWN-RECEIVED or SHUTDOWN-PENDING."
            REFERENCE
                "Section 4 in RFC2960 covers the SCTP Association state
                diagram."

    If the T4 RTO timer expires too many times (timeout), the association enters
    the CLOSED state, so we should decrement SCTP_MIB_CURRESTAB, not increment
    it.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
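SCTP_MIB_CURRESTAB is a gauge (RFC 3873), so one way to avoid the inc/dec mix-ups fixed in the last few commits is to derive its update from the state transition itself, and to count an abort separately. The sketch below takes that approach. The state enum and the mib_transition() helper are hypothetical; only the counter names and the three counted states come from the text. The timer-driven aborts (T4 RTO, T5) would pass is_abort = true. An authenticated-ABORT protocol violation that does not tear down the association makes no transition at all.

```c
#include <assert.h>
#include <stdbool.h>

enum sctp_state {   /* hypothetical subset of association states */
    ST_CLOSED,
    ST_ESTABLISHED,
    ST_SHUTDOWN_PENDING,
    ST_SHUTDOWN_RECEIVED,
    ST_SHUTDOWN_SENT,
};

struct sctp_mib {
    unsigned long curr_estab;   /* gauge: SCTP_MIB_CURRESTAB */
    unsigned long aborted;      /* counter: SCTP_MIB_ABORTED */
};

/* RFC 3873: states that count toward sctpCurrEstab. */
static bool counts_as_curr_estab(enum sctp_state s)
{
    return s == ST_ESTABLISHED || s == ST_SHUTDOWN_RECEIVED ||
           s == ST_SHUTDOWN_PENDING;
}

/* Adjust the gauge from the transition itself; count an abort separately. */
static void mib_transition(struct sctp_mib *mib, enum sctp_state from,
                           enum sctp_state to, bool is_abort)
{
    bool was_counted = counts_as_curr_estab(from);
    bool now_counted = counts_as_curr_estab(to);

    if (was_counted && !now_counted)
        mib->curr_estab--;
    else if (!was_counted && now_counted)
        mib->curr_estab++;
    if (is_abort)
        mib->aborted++;
}
```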
     
  • This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
    consistent. Previously we included the external IP headers on the
    way out but not when the packet is inbound.

    The new scheme is to count payload only in both directions. For
    IPIP and SIT this simply means the exclusion of the external IP
    header. For GRE this means that we exclude the GRE header as
    well.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
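The new rule is that both directions count payload only. The sketch below illustrates this. It assumes a 20-byte outer IPv4 header without options. encap_len is the tunnel header beyond the outer IP header: 0 for IPIP and SIT, and the GRE header length for GRE (4 bytes for a basic GRE header, in this example). The stats struct and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OUTER_IPV4_HLEN 20  /* assumption: no IP options */

/* Hypothetical per-tunnel counters. */
struct tunnel_stats {
    unsigned long rx_bytes;
    unsigned long tx_bytes;
};

/* Inbound: strip the outer IP header and any tunnel header before counting. */
static void count_rx(struct tunnel_stats *st, size_t wire_len, size_t encap_len)
{
    st->rx_bytes += wire_len - OUTER_IPV4_HLEN - encap_len;
}

/* Outbound: count the payload before any encapsulation is added. */
static void count_tx(struct tunnel_stats *st, size_t payload_len)
{
    st->tx_bytes += payload_len;
}
```

With this rule, a 1000-byte packet sent through a GRE tunnel and received on the other end is counted as 1000 bytes on both sides.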
     
  • This patch adds support for Ethernet over GRE encapsulation.
    This is exposed to user-space with a new link type of "gretap"
    instead of "gre". It will create an ARPHRD_ETHER device in
    lieu of the usual ARPHRD_IPGRE.

    Note that to preserve backwards compatibility, all Transparent
    Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel
    if its key matches and there is no ARPHRD_ETHER device whose
    key matches more closely.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
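The backwards-compatibility rule can be read as a lookup for Transparent Ethernet Bridging packets. A matching ARPHRD_ETHER (gretap) device is preferred. If none exists, the packet falls back to an ARPHRD_IPGRE tunnel with a matching key. The sketch below simplifies "matches more closely" to an exact key match; the real lookup also weighs tunnel addresses. The types are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tun_type { TUN_IPGRE, TUN_ETHER };  /* stand-ins for ARPHRD_IPGRE / ARPHRD_ETHER */

struct tunnel {
    enum tun_type type;
    uint32_t key;
    const char *name;
};

/* Pick the tunnel for a Transparent Ethernet Bridging packet with @key:
 * a matching ETHER-type device wins, else the first matching IPGRE one. */
static const struct tunnel *teb_lookup(const struct tunnel *t, size_t n, uint32_t key)
{
    const struct tunnel *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (t[i].key != key)
            continue;
        if (t[i].type == TUN_ETHER)
            return &t[i];
        if (!fallback)
            fallback = &t[i];
    }
    return fallback;
}
```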
     
  • This patch adds a netlink interface that will eventually displace
    the existing ioctl interface. It utilises the elegant rtnl_link_ops
    mechanism.

    This also means that user-space no longer needs to rely on the
    tunnel interface being of type GRE to identify GRE tunnels. The
    identification can now occur using rtnl_link_ops.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev.
    This is in preparation for using rtnl_link, where we'll need to make
    the MTU setting conditional on whether the user has supplied an
    MTU. This also requires moving the ipgre_tunnel_bind_dev call out
    of the dev->init function so that we can access the user
    parameters later.

    This patch also adds a check to prevent setting the MTU below
    the minimum of 68.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
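The new lower bound amounts to rejecting MTUs under 68 bytes, the minimum IPv4 MTU (RFC 791). The sketch below shows such a change_mtu-style check. tunnel_check_mtu() is hypothetical. The max_mtu parameter is an assumption of this example, standing in for whatever upper bound the tunnel derives from its encapsulation overhead.

```c
#include <assert.h>
#include <errno.h>

#define TUNNEL_MIN_MTU 68  /* minimum IPv4 MTU */

/* Hypothetical check: 0 if new_mtu is acceptable, -EINVAL otherwise.
 * max_mtu is supplied by the caller (assumed, not from the text). */
static int tunnel_check_mtu(int new_mtu, int max_mtu)
{
    if (new_mtu < TUNNEL_MIN_MTU || new_mtu > max_mtu)
        return -EINVAL;
    return 0;
}
```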
     
  • Now that we have dev->needed_headroom, we can use it instead of
    having a bogus dev->hard_header_len. This also allows us to
    include dev->hard_header_len in the MTU computation, so that when
    we do have a meaningful hard_header_len in the future it is included
    automatically in figuring out the MTU.

    Incidentally, this fixes a bug where we ignored the needed_headroom
    field of the underlying device in calculating our own hard_header_len.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
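The text describes two computations, sketched below with a hypothetical struct whose fields are named after the net_device ones. First, our needed_headroom covers the underlying device's hard_header_len and needed_headroom plus the tunnel header. Second, the MTU subtracts both our own hard_header_len and the tunnel header from the underlying MTU. This is an illustration of the text, and the kernel's exact arithmetic may differ.

```c
#include <assert.h>

/* Hypothetical subset of the net_device fields involved. */
struct dev_sketch {
    unsigned int mtu;
    unsigned int hard_header_len;
    unsigned int needed_headroom;
};

/* tunnel_hlen: outer IP header plus GRE header. Sets dev->needed_headroom
 * and returns the MTU derived from the underlying device @tdev. */
static unsigned int tunnel_link_params(struct dev_sketch *dev,
                                       const struct dev_sketch *tdev,
                                       unsigned int tunnel_hlen)
{
    /* Everything the underlying device prepends, plus our own header;
     * leaving out tdev->needed_headroom is the bug described above. */
    dev->needed_headroom = tdev->hard_header_len + tdev->needed_headroom +
                           tunnel_hlen;

    /* Our own hard_header_len (0 for now) is subtracted automatically. */
    return tdev->mtu - dev->hard_header_len - tunnel_hlen;
}
```

For example, with an Ethernet underlay (MTU 1500, 14-byte header, 2 bytes of extra headroom) and a 24-byte tunnel header (20-byte IPv4 plus 4-byte basic GRE), the headroom is 40 and the MTU is 1476.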
     

09 Oct, 2008

23 commits

  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Add support for the Marvell 88E6060 switch chip. This chip only
    supports the Header and Trailer tagging formats, and we use it in
    Trailer mode since that mode is slightly easier to handle than
    Header mode.

    Signed-off-by: Lennert Buytenhek
    Tested-by: Byron Bradley
    Tested-by: Tim Ellis
    Signed-off-by: David S. Miller

    Lennert Buytenhek
     
  • This adds support for the Trailer switch tagging format. This is
    another tagging format that doesn't explicitly mark tagged packets
    with a distinct ethertype, so we need to add a similar hack in the
    receive path as for the original DSA tagging format.

    Signed-off-by: Lennert Buytenhek
    Tested-by: Byron Bradley
    Tested-by: Tim Ellis
    Signed-off-by: David S. Miller

    Lennert Buytenhek
     
  • Add support for the Marvell 88E6131 switch chip. This chip only
    supports the original (ethertype-less) DSA tagging format.

    On the 88E6131, there is a PHY Polling Unit (PPU) which has exclusive
    access to each of the PHYs' MII management registers. If we want to
    talk to the PHYs from software, we have to disable the PPU and wait
    for it to complete its current transaction before we can do so, and we
    need to re-enable the PPU afterwards to make sure that the switch will
    notice changes in link state and speed on the individual ports as they
    occur.

    Since disabling the PPU is rather slow, and since MII management
    accesses are typically done in bursts, this patch keeps the PPU disabled
    for 10ms after a software access completes. This makes handling the
    PPU slightly more complex, but speeds up something like running ethtool
    on one of the switch slave interfaces from ~300ms to ~30ms on typical
    hardware.

    Signed-off-by: Lennert Buytenhek
    Tested-by: Nicolas Pitre
    Tested-by: Peter van Valderen
    Tested-by: Dirk Teurlings
    Signed-off-by: David S. Miller

    Lennert Buytenhek
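The PPU handling described above is a lazy re-enable. The PPU is disabled on the first software access, and it is only turned back on once 10 ms pass without further accesses. Below is a single-threaded sketch with hypothetical hooks and a caller-supplied millisecond clock. The real driver uses a kernel timer and locking, which are omitted here. disable_ops counts the slow disable operations, so the test can show that a burst costs only one.

```c
#include <assert.h>
#include <stdbool.h>

#define PPU_REENABLE_DELAY_MS 10

struct ppu_state {
    bool enabled;              /* PPU currently owns the PHY registers */
    long last_access_ms;       /* when the last software access completed */
    unsigned int disable_ops;  /* number of (slow) disable operations */
};

/* Call before a software MII access. */
static void ppu_access_begin(struct ppu_state *p)
{
    if (p->enabled) {
        /* Slow path: hardware would disable the PPU and wait for it to finish. */
        p->enabled = false;
        p->disable_ops++;
    }
}

/* Call when the access completes. */
static void ppu_access_end(struct ppu_state *p, long now_ms)
{
    p->last_access_ms = now_ms;
}

/* Periodic check; in the driver a timer would do this. */
static void ppu_tick(struct ppu_state *p, long now_ms)
{
    if (!p->enabled && now_ms - p->last_access_ms >= PPU_REENABLE_DELAY_MS)
        p->enabled = true;   /* let the switch resume link-state polling */
}
```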
     
  • Most of the DSA switches currently in the field do not support the
    Ethertype DSA tagging format that one of the previous patches added
    support for, but only the original DSA tagging format.

    The original DSA tagging format carries the same information as the
    Ethertype DSA tagging format, but with the difference that it does not
    have an ethertype field. In other words, when receiving a packet that
    is tagged with an original DSA tag, there is no way of telling in
    eth_type_trans() that this packet is in fact a DSA-tagged packet.

    This patch adds a hook into eth_type_trans() which is only compiled in
    if support for a switch chip that doesn't support Ethertype DSA is
    selected, and which checks whether there is a DSA switch driver
    instance attached to this network device which uses the old tag format.
    If so, it sets the protocol field to ETH_P_DSA without looking at the
    packet, so that the packet ends up in the right place.

    Signed-off-by: Lennert Buytenhek
    Tested-by: Nicolas Pitre
    Tested-by: Peter van Valderen
    Tested-by: Dirk Teurlings
    Signed-off-by: David S. Miller

    Lennert Buytenhek
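The hook's decision is sketched below, using hypothetical types to stand in for the net_device and the attached switch instance. If the receiving device has a switch that uses the original (ethertype-less) tag, the protocol is forced to ETH_P_DSA without inspecting the frame. Otherwise, the ethertype at bytes 12-13 is used. The value 0x001B for ETH_P_DSA is assumed to match linux/if_ether.h. The rest of eth_type_trans(), such as 802.3 length handling, is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_DSA 0x001B   /* assumed value */

/* Hypothetical stand-ins for a net_device and its attached switch. */
struct dsa_switch_sketch {
    bool uses_original_tag;   /* original (ethertype-less) DSA format */
};

struct netdev_sketch {
    const struct dsa_switch_sketch *dsa;   /* NULL if no switch attached */
};

static uint16_t frame_protocol(const struct netdev_sketch *dev,
                               const uint8_t *frame, size_t len)
{
    /* Original DSA tags carry no ethertype, so decide by device alone. */
    if (dev->dsa && dev->dsa->uses_original_tag)
        return ETH_P_DSA;
    if (len < 14)
        return 0;   /* runt frame: no ethertype to read */
    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```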
     
  • Distributed Switch Architecture is a protocol for managing hardware
    switch chips. It consists of a set of MII management registers and
    commands to configure the switch, and an ethernet header format to
    signal which of the ports of the switch a packet was received from
    or is intended to be sent to.

    The switches that this driver supports are typically embedded in
    access points and routers, and a typical setup with a DSA switch
    looks something like this:

    +-----------+       +-----------+
    |           | RGMII |           |
    |           +-------+           +------ 1000baseT MDI ("WAN")
    |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
    |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
    |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
    |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
    |           |       |           |
    +-----------+       +-----------+

    The switch driver presents each port on the switch as a separate
    network interface to Linux, polls the switch to maintain software
    link state of those ports, forwards MII management interface
    accesses to those network interfaces (e.g. as done by ethtool) to
    the switch, and exposes the switch's hardware statistics counters
    via the appropriate Linux kernel interfaces.

    This initial patch supports the MII management interface register
    layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
    supports the "Ethertype DSA" packet tagging format.

    (There is no officially registered ethertype for the Ethertype DSA
    packet format, so we just grab a random one. The ethertype to use
    is programmed into the switch, and the switch driver uses the value
    of ETH_P_EDSA for this, so this define can be changed at any time in
    the future if the one we chose is allocated to another protocol or
    if Ethertype DSA gets its own officially registered ethertype, and
    everything will continue to work.)

    Signed-off-by: Lennert Buytenhek
    Tested-by: Nicolas Pitre
    Tested-by: Byron Bradley
    Tested-by: Tim Ellis
    Tested-by: Peter van Valderen
    Tested-by: Dirk Teurlings
    Signed-off-by: David S. Miller

    Lennert Buytenhek
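The Ethertype DSA value is only a define shared by the driver and the switch configuration. Recognizing such frames therefore reduces to comparing bytes 12-13 of the Ethernet header against that define. The value 0xDADA below is assumed to be what linux/if_ether.h uses for ETH_P_EDSA. As the note above says, it should be treated as changeable. The layout of the tag that follows the ethertype is not modeled, and is_edsa_frame() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the driver programs whatever this define says into the switch. */
#ifndef ETH_P_EDSA
#define ETH_P_EDSA 0xDADA
#endif

static bool is_edsa_frame(const uint8_t *frame, size_t len)
{
    if (len < 14)
        return false;   /* too short to hold an ethertype */
    return ((frame[12] << 8) | frame[13]) == ETH_P_EDSA;
}
```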
     
  • Conflicts:

    drivers/net/e1000e/ich8lan.c
    drivers/net/e1000e/netdev.c

    David S. Miller
     
  • Commit cb7f6a7b716e801097b564dec3ccb58d330aef56 ("IPVS: Move IPVS to
    net/netfilter/ipvs") has left a stray file in the old location of ipvs.

    Signed-off-by: Sven Wegener
    Signed-off-by: David S. Miller

    Sven Wegener
     
  • More breakage :-): part of the timestamps were previously just
    overwritten.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Conflicts:

    net/netfilter/Kconfig

    David S. Miller
     
  • The gabs array in the sctp_tsnmap structure is only used
    in one place, sctp_make_sack(). As such, carrying the
    array around in the sctp_tsnmap and thus directly in
    the sctp_association is rather pointless since most
    of the time it's just taking up space. Now, let
    sctp_make_sack create and populate it and then throw
    it away when it's done.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
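Building the gap array on demand in sctp_make_sack() amounts to a scan over the map's bits, sketched below. The sketch assumes that bit i of a hypothetical bitmap means TSN cum_tsn_ack + 1 + i was received. Gap ack block start and end are expressed as offsets from the cumulative TSN ack, as in the SACK chunk (RFC 4960, section 3.3.4). The caller owns the gabs array, so nothing needs to live in the association.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gap_block {
    uint16_t start;   /* offset from cumulative TSN ack */
    uint16_t end;
};

static int test_bit_u32(const uint32_t *map, size_t bit)
{
    return (map[bit / 32] >> (bit % 32)) & 1u;
}

/* Fill @gabs (capacity @max) from a map of @nbits bits; returns block count. */
static size_t build_gap_blocks(const uint32_t *map, size_t nbits,
                               struct gap_block *gabs, size_t max)
{
    size_t n = 0, i = 0;

    while (i < nbits && n < max) {
        if (!test_bit_u32(map, i)) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < nbits && test_bit_u32(map, i))
            i++;
        /* Bits start..i-1 were received; offsets are bit index + 1. */
        gabs[n].start = (uint16_t)(start + 1);
        gabs[n].end = (uint16_t)i;
        n++;
    }
    return n;
}
```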
     
  • The TSN map currently in use is 4K large and is stuck inside
    the sctp_association structure, making memory references REALLY
    expensive. What we really need is at most 4K worth of bits,
    so the biggest map we would have is 512 bytes. Also, the
    map is only really useful when we have gaps to store and
    report. As such, starting with a minimal map of, say, 32 TSNs (bits)
    should be enough for normal low-loss operations. We can grow
    the map by some multiple of 32 along with some extra room any
    time we receive a TSN which would put us outside of the map
    boundary. As we close gaps, we can shift the map to rebase
    it on the latest TSN we've seen. This saves 4088 bytes per
    association in the map alone, along with savings from the now
    unnecessary structure members.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
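Below is a minimal sketch of a growable TSN map, under assumed policies. The map is a heap-allocated bit array that starts at 32 bits. When a TSN falls past its end, the size is rounded up to a multiple of 32 bits, with one extra 32-bit word of headroom. That amount of extra room is a guess; the text only says "some". Rebasing as gaps close is omitted. All names are hypothetical, and the kernel's sctp_tsnmap is different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct tsnmap_sketch {
    uint32_t base_tsn;   /* TSN represented by bit 0 */
    uint32_t len;        /* map size in bits, a multiple of 32 */
    uint32_t *bits;
};

static int tsnmap_init(struct tsnmap_sketch *m, uint32_t base_tsn)
{
    m->base_tsn = base_tsn;
    m->len = 32;                              /* minimal map: 32 TSNs */
    m->bits = calloc(1, sizeof(uint32_t));
    return m->bits ? 0 : -1;
}

/* Grow so that @tsn fits; assumes tsn is at or after base_tsn. */
static int tsnmap_grow(struct tsnmap_sketch *m, uint32_t tsn)
{
    uint32_t off = tsn - m->base_tsn;        /* unsigned math handles TSN wrap */
    uint32_t new_len, *nb;

    if (off < m->len)
        return 0;
    new_len = ((off / 32) + 1) * 32 + 32;    /* round up, plus one word of room */
    nb = realloc(m->bits, new_len / 8);
    if (!nb)
        return -1;
    memset((char *)nb + m->len / 8, 0, (new_len - m->len) / 8);
    m->bits = nb;
    m->len = new_len;
    return 0;
}

static int tsnmap_mark(struct tsnmap_sketch *m, uint32_t tsn)
{
    uint32_t off = tsn - m->base_tsn;

    if (tsnmap_grow(m, tsn))
        return -1;
    m->bits[off / 32] |= 1u << (off % 32);
    return 0;
}
```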
     
  • I noticed sysctl_local_port_range[] and its associated seqlock
    sysctl_local_port_range_lock were on separate cache lines.
    Moreover, sysctl_local_port_range[] was close to unrelated
    variables, highly modified, leading to cache misses.

    Moving these two variables into a structure can help data
    locality, and moving this structure to the read_mostly section
    helps sharing of this data among CPUs.

    Cleanup of extern declarations (moved into the include file where
    they belong), and use of the inet_get_local_port_range()
    accessor instead of direct access to the port values.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
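The layout change and the accessor are sketched below in userspace C11. The two port bounds and their sequence counter live in one struct, so a reader touches a single cache line. Readers go through one accessor that retries if a writer raced with them. This is a simplified seqlock built on <stdatomic.h>, and it assumes a single writer; the kernel's seqlock also serializes writers. The struct and function names are hypothetical. local_port_range_get() only plays the role of inet_get_local_port_range(). The kernel's __read_mostly placement has no userspace equivalent here, and 32768-61000 are illustrative defaults.

```c
#include <assert.h>
#include <stdatomic.h>

/* Counter and data side by side, so a reader touches one cache line. */
struct local_ports_sketch {
    atomic_uint seq;        /* odd while a write is in progress */
    atomic_int range[2];    /* [0] = low, [1] = high */
};

static struct local_ports_sketch local_ports = { 0, { 32768, 61000 } };

/* Single writer assumed (e.g. serialized by a mutex elsewhere). */
static void local_port_range_set(int low, int high)
{
    unsigned int s = atomic_load_explicit(&local_ports.seq, memory_order_relaxed);

    atomic_store_explicit(&local_ports.seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&local_ports.range[0], low, memory_order_relaxed);
    atomic_store_explicit(&local_ports.range[1], high, memory_order_relaxed);
    atomic_store_explicit(&local_ports.seq, s + 2, memory_order_release);
}

/* Reader: retry until a consistent (low, high) pair was observed. */
static void local_port_range_get(int *low, int *high)
{
    unsigned int s0, s1;

    do {
        s0 = atomic_load_explicit(&local_ports.seq, memory_order_acquire);
        *low = atomic_load_explicit(&local_ports.range[0], memory_order_relaxed);
        *high = atomic_load_explicit(&local_ports.range[1], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&local_ports.seq, memory_order_relaxed);
    } while ((s0 & 1) || s0 != s1);
}
```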
     
  • Current UDP port allocation is suboptimal.
    We select the shortest chain in order to choose a port (out of 512)
    that will hash into this shortest chain.

    First, it can lead to not-so-random ports and give
    attackers more opportunities to break the system.

    Second, it can consume a lot of CPU to scan the whole table
    in order to find the shortest chain.

    Third, in some pathological cases we can fail to find
    a free port even if there are plenty of them.

    This patch removes the search for a short chain and only
    uses one random seed. The problem of long chains
    should be addressed in another way, since we can
    obtain long chains with non-random ports.

    Based on a report and patch from Vitaly Mayatskikh

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
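The "one random seed" approach draws one random starting point inside the configured range, then probes ports in order, wrapping around, and takes the first free one. The sketch below assumes a hypothetical in-use table indexed by port number. Every port in the range is examined at most once, so a free port is found whenever one exists, and no chain lengths are compared. The random value is a parameter to keep the example deterministic. The kernel draws it from its own RNG and checks hash chains rather than a flat table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPORTS 65536

/* Hypothetical: in_use[p] is true if some socket holds port p. */
static int pick_port(const bool *in_use, int low, int high, uint32_t rand)
{
    uint32_t remaining = (uint32_t)(high - low + 1);
    uint32_t first = rand % remaining;

    for (uint32_t i = 0; i < remaining; i++) {
        int port = low + (int)((first + i) % remaining);
        if (!in_use[port])
            return port;
    }
    return -1;   /* every port in the range is taken */
}
```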
     
  • After the last change to requeuing, there is no information about such
    incidents in tc stats. This patch updates the counter, but we should
    keep in mind that it differs from previous stats because of additional
    checks that prevent repeated requeuing. On the other hand, previous stats
    didn't include requeuing of gso_segmented skbs.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • While looking for some common code I came across a difference
    in checksum calculation between tcp_v6_send_(reset|ack) that I
    couldn't explain. I checked both v4 and v6 and found out that
    both seem to have the same "feature". I couldn't find anything
    in the RFC nor anywhere else which would state that the md5 option
    should be ignored like it was in the case of reset, so I came to
    the conclusion that this is probably a genuine bug. I suspect
    that the md5 addition was simply fooled by the excessive
    copy-paste code in those functions, and the reset part was
    never tested well enough to find the problem.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
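The fix comes down to checksumming the whole header that is sent, options included. If the checksum is computed over only sizeof(struct tcphdr) while doff covers an MD5 option, the option bytes are left out. The sketch below is the RFC 1071 one's-complement sum over a caller-given length. The pseudo-header is omitted, and inet_csum() is a hypothetical name. The result changes once option bytes are included, which is why the length passed in must be tot_len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over @len bytes. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);    /* fold carries */
    return (uint16_t)~sum;
}
```

With the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, the folded sum is 0xddf2 and the checksum is 0x220d.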
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev