09 Jan, 2008

14 commits

  • Al went through the ip_fast_csum callers and found this piece of code
    that did not validate the IP header. While root crashing the machine
    by sending bogus packets through raw or AF_PACKET sockets isn't that
    serious, it is still nice to react gracefully.

    This patch ensures that the skb has enough data for an IP header and
    that the header length field is valid.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
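    A minimal userspace sketch of the two checks described above, assuming a plain byte buffer in place of an skb. The helper name ip_hdr_valid() is hypothetical and is not the kernel's code. It only shows the bounds being enforced: at least 20 bytes of data, and an ihl (counted in 32-bit words) that is at least 5 and does not run past the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if buf plausibly starts with an IPv4
 * header that fits inside len bytes, 0 otherwise. */
static int ip_hdr_valid(const uint8_t *buf, size_t len)
{
    unsigned int ihl;

    /* Enough data for the fixed 20-byte part of an IPv4 header? */
    if (len < 20)
        return 0;

    /* ihl is the low nibble of the first byte, counted in 32-bit words. */
    ihl = buf[0] & 0x0f;

    /* Reject a header length below the minimum or beyond the data. */
    if (ihl < 5 || (size_t)ihl * 4 > len)
        return 0;

    return 1;
}
```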
     
  • Signed-off-by: Brian Haley
    Acked-by: David L Stevens
    Signed-off-by: David S. Miller

    Brian Haley
     
  • alg_key_len is the length in bits of the key, not in bytes.

    The best way to fix this is to move the alg_len() function from
    net/xfrm/xfrm_user.c to include/net/xfrm.h and use it in
    xfrm_algo_clone().

    alg_len() is renamed to xfrm_alg_len() because it is now globally exposed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
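    A sketch of why the bit/byte distinction matters, assuming a simplified struct shaped like struct xfrm_algo (the names struct algo, algo_len() and algo_clone() are hypothetical). The length helper rounds the key length in bits up to whole bytes, which is the conversion a bit count needs before it can size a copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfrm_algo. */
struct algo {
    char alg_name[64];
    unsigned int alg_key_len;   /* key length in BITS */
    char alg_key[];             /* key bytes follow the header */
};

/* Header size plus key size, with the bit count rounded up to bytes. */
static size_t algo_len(const struct algo *alg)
{
    return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
}

/* Duplicate an algorithm description, copying only the bytes it owns. */
static struct algo *algo_clone(const struct algo *orig)
{
    size_t n = algo_len(orig);
    struct algo *copy = malloc(n);

    if (copy)
        memcpy(copy, orig, n);
    return copy;
}
```

    Using the raw bit count as a byte count would read (and copy) eight times as many key bytes as the allocation holds.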
     
  • lro_mgr->features contains a bitmask of LRO_F_* values, which are
    defined as powers of two, not as bit indexes. They must be checked
    with x & LRO_F_FOO, not with test_bit(LRO_F_FOO, &x).

    Signed-off-by: Brice Goglin
    Acked-by: Andrew Gallatin
    Signed-off-by: David S. Miller

    Brice Goglin
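    The difference between the two checks, shown with assumed flag values of 1 and 2 (powers of two, as the commit describes; the real definitions are not reproduced here). bit_index_set() mimics what test_bit() does: it treats its first argument as a bit number, not as a mask.

```c
/* Assumed flag values: powers of two, not bit indexes. */
#define LRO_F_NAPI            1
#define LRO_F_EXTRACT_VLAN_ID 2

/* test_bit()-style helper: tests bit NUMBER nr, i.e. the value 1 << nr. */
static int bit_index_set(unsigned int nr, unsigned long word)
{
    return (word >> nr) & 1UL;
}

/* Correct check for a power-of-two flag: mask with the flag value. */
static int flag_set(unsigned long features, unsigned long flag)
{
    return (features & flag) != 0;
}
```

    With features = LRO_F_EXTRACT_VLAN_ID (2), the test_bit()-style check looks at bit 2 (value 4) and misses the flag, while asking about LRO_F_NAPI looks at bit 1 (value 2) and wrongly reports it as set.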
     
  • Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
    on the 'iif' field to determine the receiving network interface of
    inbound packets. Unfortunately, at present this field is not
    preserved across an skb clone operation, which can lead to garbage
    values if the cloned skb is sent back through the network stack. This
    patch corrects this problem by properly copying the 'iif' field in
    __skb_clone() and removing the 'iif' field assignment from
    skb_act_clone() since it is no longer needed.

    Also, while we are here, put the assignments in the same order as the
    offsets to reduce cacheline bounces.

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
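    A simplified illustration of the copy the fix adds, using a hypothetical struct pkt in place of struct sk_buff. The assignments follow the field order, in the spirit of the note about matching the offsets.

```c
/* Hypothetical, heavily simplified packet metadata; fields are listed
 * in offset order. */
struct pkt {
    int iif;            /* index of the receiving interface */
    unsigned int len;
    unsigned int mark;
};

/* Copy metadata into a clone in the same order as the field offsets.
 * Omitting the iif assignment would leave whatever value the clone's
 * memory already held. */
static void pkt_copy_meta(struct pkt *n, const struct pkt *old)
{
    n->iif  = old->iif;
    n->len  = old->len;
    n->mark = old->mark;
}
```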
     
  • I noticed that "ip route list cache x.y.z.t" can be *very* slow.

    While running it under strace -T, I also noticed that the first part
    of the route cache is fetched quite fast:

    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680

    while the part at the end of the table is more expensive:

    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736

    The following patch corrects this performance/latency problem,
    removing quadratic behavior.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
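    The commit does not spell out the mechanics. One common way a chunked dump becomes quadratic is restarting the hash-table walk from bucket 0 on every recvmsg() and skipping forward to the saved position; the usual cure is to resume directly at a saved (bucket, index) cursor. The sketch below shows that pattern with hypothetical names, using an array of chain lengths as a stand-in for the route cache.

```c
/* Hypothetical dump cursor, saved between recvmsg() calls. */
struct dump_cursor {
    unsigned int bucket;   /* hash bucket to resume in */
    unsigned int idx;      /* entries already sent from that bucket */
};

/* Emit up to budget entries from a hash table whose chain lengths are in
 * chain_len[0..nbuckets-1], resuming at *c. The walk starts at the saved
 * bucket instead of bucket 0, so earlier buckets are never re-scanned. */
static unsigned int dump_chunk(const unsigned int *chain_len,
                               unsigned int nbuckets,
                               struct dump_cursor *c, unsigned int budget)
{
    unsigned int emitted = 0;
    unsigned int h;

    for (h = c->bucket; h < nbuckets; h++, c->idx = 0) {
        for (unsigned int i = c->idx; i < chain_len[h]; i++) {
            if (emitted == budget) {
                /* Buffer full: remember exactly where to resume. */
                c->bucket = h;
                c->idx = i;
                return emitted;
            }
            emitted++;   /* a real dump would fill in one route here */
        }
    }
    c->bucket = nbuckets;   /* table exhausted */
    c->idx = 0;
    return emitted;
}
```

    Each call then costs time proportional to the entries it emits plus the empty buckets it crosses, rather than to the whole part of the table before the cursor.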
     
  • This finally adds the code in net_rx_action() to break out of the
    ->poll()'ing loop when a napi_disable() is found to be pending.

    Now, even if a device is being flooded with packets it can be cleanly
    brought down.

    Signed-off-by: David S. Miller

    David S. Miller
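    A minimal model of the check, assuming a hypothetical struct poll_ctx whose disable_pending flag plays the role of a pending napi_disable(). The loop stops polling as soon as the flag is seen, even though packets are still queued.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a NAPI context. */
struct poll_ctx {
    bool disable_pending;   /* set by a napi_disable()-like caller */
    unsigned int backlog;   /* packets still queued on the device */
};

/* Poll in chunks of up to weight packets until the budget or backlog is
 * used up, but break out early when a disable is pending. */
static unsigned int rx_loop(struct poll_ctx *ctx, unsigned int budget,
                            unsigned int weight)
{
    unsigned int done = 0;

    while (done < budget && ctx->backlog > 0) {
        unsigned int n;

        if (ctx->disable_pending)
            break;          /* let the device be brought down */
        n = ctx->backlog < weight ? ctx->backlog : weight;
        ctx->backlog -= n;  /* "poll" up to weight packets */
        done += n;
    }
    return done;
}
```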
     
  • Currently mac80211 fails silently when trying to set a nonexistent
    rate. Return an error instead.

    Signed-Off-By: Andy Lutomirski
    Signed-off-by: John W. Linville

    Andrew Lutomirski
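    A sketch of the behaviour change, assuming a hypothetical table of 802.11b rates in units of 100 kbit/s. Returning -EINVAL for an unknown rate is an assumption made for illustration; the commit only says that an error is returned.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical supported-rate table, in units of 100 kbit/s. */
static const int supported_rates[] = { 10, 20, 55, 110 };

/* Select a rate; report an error instead of silently ignoring it. */
static int set_rate(int rate, int *current)
{
    size_t i;

    for (i = 0; i < sizeof(supported_rates) / sizeof(supported_rates[0]); i++) {
        if (supported_rates[i] == rate) {
            *current = rate;
            return 0;
        }
    }
    return -EINVAL;   /* unknown rate: leave *current untouched */
}
```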
     
  • Easy to trigger as a user with sfuzz.

    irda_create() is quiet about an unknown sock->type; match this
    behaviour for an unknown SOCK_DGRAM protocol.

    Signed-off-by: maximilian attems
    Signed-off-by: David S. Miller

    maximilian attems
     
  • Some recent changes completely removed accounting for the FORWARD_TSN
    parameter length in the INIT and INIT-ACK chunks. This is wrong and
    should be restored.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
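    A sketch of the accounting, assuming PR-SCTP is what adds the parameter. Per RFC 3758, the Forward-TSN-Supported parameter is 4 bytes (type and length only); per RFC 4960, the INIT chunk header plus its fixed fields take 20 bytes and parameters are padded to 4-byte boundaries. The function name and the way optional parameters are passed in are hypothetical.

```c
#include <stddef.h>

#define INIT_FIXED_LEN    20u  /* chunk header (4) + fixed INIT fields (16) */
#define FWD_TSN_PARAM_LEN  4u  /* Forward-TSN-Supported: type + length only */

/* SCTP parameters are padded to a multiple of 4 bytes. */
static size_t pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Hypothetical: bytes to reserve for an INIT chunk with the given
 * optional parameters, plus Forward-TSN-Supported when PR-SCTP is on. */
static size_t init_chunk_space(const size_t *param_lens, size_t nparams,
                               int prsctp_enabled)
{
    size_t len = INIT_FIXED_LEN;
    size_t i;

    for (i = 0; i < nparams; i++)
        len += pad4(param_lens[i]);
    if (prsctp_enabled)
        len += FWD_TSN_PARAM_LEN;   /* the term that must not be dropped */
    return len;
}
```

    Dropping the last term undercounts the chunk by 4 bytes whenever the parameter is actually emitted.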
     
  • When processing an unexpected INIT chunk, we do not need to
    preserve the old AUTH parameters. In fact, preserving them would
    nullify AUTH and allow connection stealing.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • The event should be called SCTP_AUTHENTICATION_INDICATION.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • The recent changes for ip command line processing fixed some problems
    but unfortunately broke some common usage scenarios. In current
    2.6.24-rc6 the following command line results in no IP address
    assignment, which is surely a regression:

    ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off

    Please find below a patch that works for all cases I can find.

    Signed-off-by: Amos Waterland
    Signed-off-by: David S. Miller

    Amos Waterland
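    For reference, Documentation/nfsroot.txt describes the value as ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>, where empty fields are meaningful. The sketch below (a hypothetical split_ip_arg(), not the kernel's parser) splits the example line while keeping empty fields, which strtok() would silently collapse.

```c
#include <stddef.h>
#include <string.h>

#define IP_FIELDS 7   /* client:server:gw:netmask:hostname:device:autoconf */

/* Split s in place at ':' keeping empty fields. Returns the number of
 * fields found; fields that are not present stay NULL. */
static int split_ip_arg(char *s, char *fields[IP_FIELDS])
{
    int n = 0;

    for (int i = 0; i < IP_FIELDS; i++)
        fields[i] = NULL;
    while (s && n < IP_FIELDS) {
        char *colon = strchr(s, ':');

        fields[n++] = s;
        if (colon) {
            *colon = '\0';      /* terminate this field */
            s = colon + 1;      /* next field may be empty */
        } else {
            s = NULL;           /* last field */
        }
    }
    return n;
}
```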
     
  • We currently check that iph->ihl is bounded by the real length and that
    the real length is greater than the minimum IP header length. However,
    we do not check the case where iph->ihl is less than the minimum IP
    header length.

    This breaks because some ip_fast_csum implementations assume that
    iph->ihl is at least the minimum, which is quite reasonable.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
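    A sketch of the assumption being protected, using a hypothetical checksum routine that, like some optimized ip_fast_csum versions, always sums the 20-byte base header and then loops over ihl - 5 option words. With an unsigned ihl below 5 that count wraps to a huge value, so the guard rejects such headers before the call. The test header is a common textbook example whose checksum field (0xb861) is valid, so the folded result is 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n32 32-bit words as 16-bit big-endian halves. */
static uint32_t sum16(const uint8_t *p, unsigned int n32)
{
    uint32_t sum = 0;

    for (unsigned int i = 0; i < n32 * 2; i++)
        sum += ((uint32_t)p[2 * i] << 8) | p[2 * i + 1];
    return sum;
}

/* Hypothetical checksum that assumes ihl >= 5: if ihl < 5, "ihl - 5"
 * wraps and the second sum would read far past the header. */
static uint16_t csum_assumes_min(const uint8_t *iph, unsigned int ihl)
{
    uint32_t sum = sum16(iph, 5) + sum16(iph + 20, ihl - 5);

    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Caller-side guard in the spirit of the fix: never pass ihl < 5. */
static int ip_csum_checked(const uint8_t *iph, size_t len, uint16_t *out)
{
    unsigned int ihl;

    if (len < 20)
        return -1;
    ihl = iph[0] & 0x0f;
    if (ihl < 5 || (size_t)ihl * 4 > len)
        return -1;
    *out = csum_assumes_min(iph, ihl);
    return 0;
}
```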
     

04 Jan, 2008

3 commits

  • When renaming an interface, the previous secondary address
    labels get lost, e.g.:

    $> brctl addbr foo
    $> ip addr add 192.168.0.1 dev foo
    $> ip addr add 192.168.0.2 dev foo label foo:00
    $> ip addr show dev foo | grep inet
    inet 192.168.0.1/32 scope global foo
    inet 192.168.0.2/32 scope global foo:00
    $> ip link set foo name bar
    $> ip addr show dev bar | grep inet
    inet 192.168.0.1/32 scope global bar
    inet 192.168.0.2/32 scope global bar:2

    Turns out to be a simple thinko in inetdev_changename() - clearly we
    want to look at the address label, rather than the device name, for
    a suffix to retain.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: David S. Miller

    Mark McLoughlin
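    A sketch of the intended rule, with a hypothetical relabel() helper: the suffix to keep is taken from the old address label (foo:00 becomes bar:00), not from the device name. Buffer handling is simplified and may differ from inetdev_changename().

```c
#include <stdio.h>
#include <string.h>

/* Build an address label for the renamed device, keeping any ":suffix"
 * found in the OLD ADDRESS LABEL. */
static void relabel(char *out, size_t outsz, const char *old_label,
                    const char *new_dev)
{
    const char *colon = strchr(old_label, ':');

    if (colon)
        snprintf(out, outsz, "%s%s", new_dev, colon);  /* keep ":00" */
    else
        snprintf(out, outsz, "%s", new_dev);           /* primary label */
}
```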
     
  • In include/net/xfrm.h we find:

    #ifdef CONFIG_XFRM_MIGRATE
    extern int km_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
                          struct xfrm_migrate *m, int num_bundles);
    ...
    #endif

    We can also guard the function body itself in net/xfrm/xfrm_state.c
    with the same condition.

    (Problem spotted by the sparse checker)
    make C=2 net/xfrm/xfrm_state.o
    ...
    net/xfrm/xfrm_state.c:1765:5: warning: symbol 'km_migrate' was not declared. Should it be static?
    ...

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The function x25_get_neigh increments a reference count. At the point of
    the second goto out, the result of calling x25_get_neigh is only stored in
    a local variable, and thus no one outside the function will be able to
    decrease the reference count. Thus, x25_neigh_put should be called before
    the return in this case.

    The problem was found using the following semantic match.
    (http://www.emn.fr/x-info/coccinelle/)

    //

    @@
    type T,T1,T2;
    identifier E;
    statement S;
    expression x1,x2,x3;
    int ret;
    @@

    T E;
    ...
    * if ((E = x25_get_neigh(...)) == NULL)
    S
    ... when != x25_neigh_put(...,(T1)E,...)
    when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...}
    when != x1 = (T1)E
    when != E = x3;
    when any
    if (...) {
    ... when != x25_neigh_put(...,(T2)E,...)
    when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...}
    when != x2 = (T2)E
    (
    * return;
    |
    * return ret;
    )
    }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
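    The leak pattern the semantic match looks for, reduced to a hypothetical refcounted struct neigh (standing in for struct x25_neigh) and a hypothetical do_connect(). Once a reference is held only in a local variable, every error exit after the lookup has to drop it; the successful path is assumed here to hand the reference off to the caller.

```c
#include <stddef.h>

/* Hypothetical refcounted neighbour. */
struct neigh {
    int refcnt;
};

static struct neigh *neigh_get(struct neigh *n)
{
    if (n)
        n->refcnt++;
    return n;
}

static void neigh_put(struct neigh *n)
{
    n->refcnt--;
}

static int do_connect(struct neigh *entry, int route_ok)
{
    struct neigh *nb = neigh_get(entry);
    int rc = -1;

    if (nb == NULL)
        goto out;           /* nothing was taken, nothing to drop */
    if (!route_ok)
        goto out_put;       /* the leak: this exit used to skip the put */
    return 0;               /* success: reference handed off (assumed) */

out_put:
    neigh_put(nb);
out:
    return rc;
}
```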
     

03 Jan, 2008

1 commit


30 Dec, 2007

2 commits

  • Because of workqueue delay, the put_device could be called before
    device_del, so move it to del_conn.

    Signed-off-by: Dave Young
    Signed-off-by: David S. Miller

    Dave Young
     
  • When a delayed ACK representing two packets arrives, there are two RTT
    samples available, one for each packet. The first (in order of seq
    number) will be artificially long due to the delay waiting for the
    second packet, the second will trigger the ACK and so will not itself
    be delayed.

    According to RFC 1323, the SRTT used for RTO calculation should use the
    first RTT, so receivers echo the timestamp from the first packet in
    the delayed ACK. For congestion control, however, measuring the
    delayed-ACK delay seems undesirable, as it varies independently of
    congestion.

    The patch below causes seq_rtt and last_ackt to be updated with any
    available later packet rtts which should have less (and hopefully
    zero) delack delay. The rtt value then gets passed to
    ca_ops->pkts_acked().

    Where TCP_CONG_RTT_STAMP was set, effort was made to suppress RTTs from
    within a TSO chunk (!fully_acked), using only the final ACK (which
    includes any TSO delay) to generate RTTs. This patch removes these
    checks so RTTs are passed for each ACK to ca_ops->pkts_acked().

    For non-delay based congestion control (cubic, h-tcp), rtt is
    sometimes used for rtt-scaling. In shortening the RTT, this may make
    them a little less aggressive. Delay-based schemes (e.g. Vegas, Veno,
    Illinois) should get a cleaner, more accurate congestion signal,
    particularly for small cwnds. The congestion control module can
    potentially also filter out bad RTTs due to the delayed ack alarm by
    looking at the associated cnt which (where delayed acking is in use)
    should probably be 1 if the alarm went off or greater if the ACK was
    triggered by a packet.

    Signed-off-by: Gavin McCullagh
    Acked-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Gavin McCullagh
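    A minimal model of the two RTT views described above, with hypothetical names: given the send times of the segments covered by one cumulative ACK, in sequence order, the first sample is the one a timestamp echo reflects, while the last one carries the least delayed-ACK delay and is the kind of sample this patch prefers for ca_ops->pkts_acked().

```c
#include <stddef.h>

/* Hypothetical RTT samples derived from one cumulative ACK. */
struct rtt_samples {
    long first_rtt;   /* oldest segment: includes delayed-ACK wait */
    long last_rtt;    /* newest segment: least delayed-ACK delay */
};

/* sent[] holds send times of newly acked segments in sequence order. */
static struct rtt_samples rtts_for_ack(const long *sent, size_t n, long now)
{
    struct rtt_samples r = { -1, -1 };   /* -1: no sample available */

    for (size_t i = 0; i < n; i++) {
        long rtt = now - sent[i];

        if (i == 0)
            r.first_rtt = rtt;
        r.last_rtt = rtt;   /* later segments overwrite earlier ones */
    }
    return r;
}
```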
     

29 Dec, 2007

1 commit

  • David Brownell pointed out a regression in my recent "Fix ip command
    line processing" patch. It turns out to be a fairly blatant oversight on
    my part whereby ic_enable is never set, and thus autoconfiguration is
    never enabled. Clearly my testing was broken :-(

    The solution that I have is to set ic_enable to 1 if we hit
    ip_auto_config_setup(), which basically means that autoconfiguration is
    activated unless told otherwise. I then flip ic_enable to 0 if ip=off,
    ip=none, ip=::::::off or ip=::::::none, using ic_proto_name().

    The incremental patch is below; let me know if a non-incremental version
    is preferred, as I did ask for the original patch to be reverted pending
    a fix.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     

27 Dec, 2007

4 commits

  • Recently the documentation in Documentation/nfsroot.txt was
    updated to note that ip=off and ip=::::::off are in fact not
    equivalent, as the latter is ignored and the default (on) is used.

    This was certainly a step in the direction of reducing confusion.
    But it seems to me that the code ought to be fixed up so that
    ip=::::::off actually turns off ip autoconfiguration.

    This patch also notes more specifically that ip=on (aka ip=::::::on)
    is the default.

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Some users do "modprobe ip_conntrack hashsize=...". Because of the
    module aliases, this loads nf_conntrack_ipv4 and nf_conntrack; however,
    nf_conntrack_ipv4 does not know the hashsize parameter, so loading
    fails.

    Allow hashsize= to be specified for both nf_conntrack and
    nf_conntrack_ipv4.

    Note: the nf_conntrack message in the ringbuffer will display an
    incorrect hashsize since nf_conntrack is first pulled in as a
    dependency and calculates the size itself, then it gets changed
    through a call to nf_conntrack_set_hashsize().

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • This patch makes mac80211 warn (once) when the driver passes up a
    frame in which the payload data is not aligned on a four-byte
    boundary, with a long comment for people who run into the condition
    and need to know what to do.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
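    A sketch of a warn-once alignment check, assuming a plain pointer to the payload. The function name, flag and message text are hypothetical; mac80211's actual warning and comment are not reproduced.

```c
#include <stdint.h>
#include <stdio.h>

static int warned_once;   /* hypothetical "already warned" flag */

/* Return 1 if payload is not 4-byte aligned, warning only the first time. */
static int payload_misaligned(const void *payload)
{
    if (((uintptr_t)payload & 3) == 0)
        return 0;
    if (!warned_once) {
        warned_once = 1;
        fprintf(stderr, "warning: RX payload not 4-byte aligned\n");
    }
    return 1;
}
```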
     
  • The station cleanup timer runs every ten seconds. Its exact timing
    is not relevant at all, so it may as well run together with other
    things to save power.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
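    A sketch of the batching idea, assuming ticks at hz per second: rounding the expiry of a periodic, timing-insensitive timer up to a whole second lets unrelated timers rounded the same way expire together, so the CPU wakes less often. The kernel's round_jiffies() family serves this purpose; the helper below is a simplified, hypothetical round-up, not its code.

```c
/* Round an absolute expiry time (in ticks) up to the next whole second. */
static unsigned long round_up_to_second(unsigned long ticks, unsigned long hz)
{
    return ((ticks + hz - 1) / hz) * hz;
}
```

    For a ten-second interval, the next expiry would be round_up_to_second(now + 10 * hz, hz).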
     

21 Dec, 2007

11 commits

  • [ Regression added by changeset:
    cd40b7d3983c708aabe3d3008ec64ffce56d33b0
    [NET]: make netlink user -> kernel interface synchronious
    -DaveM ]

    nl_fib_input reuses the incoming skb to send the reply. This means that
    the packet is freed twice, namely in:
    - netlink_unicast_kernel
    - the receive path
    Fix this by sending a clone instead; the caller is responsible for
    kfree_skb on error.

    Thanks to Alexey Dobryan, who originally found the problem.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • When put_cmsg() is used to copy kernel information to user
    application memory and the buffer length supplied by the application
    is too small, a bad length calculation can make put_cmsg() leave
    msg.msg_controllen at a very large value, such as 0xFFFFFFF0. A
    following put_cmsg() call can then write data into user memory even
    though the application has no valid memory there to store it, which
    may overflow the application's buffer.

    int put_cmsg(struct msghdr *msg, int level, int type, int len, void *data)
    {
        struct cmsghdr __user *cm
            = (__force struct cmsghdr __user *)msg->msg_control;
        struct cmsghdr cmhdr;
        int cmlen = CMSG_LEN(len);                  /* <-- */
        int err;

        if (MSG_CMSG_COMPAT & msg->msg_flags)
            return put_cmsg_compat(msg, level, type, len, data);

        if (cm == NULL || msg->msg_controllen < sizeof(*cm)) {
            msg->msg_flags |= MSG_CTRUNC;
            return 0; /* XXX: return error? check spec. */
        }
        if (msg->msg_controllen < cmlen) {          /* <-- */
            msg->msg_flags |= MSG_CTRUNC;
            cmlen = msg->msg_controllen;
        }
        cmhdr.cmsg_level = level;
        cmhdr.cmsg_type = type;
        cmhdr.cmsg_len = cmlen;

        err = -EFAULT;
        if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
            goto out;
        if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
            goto out;
        cmlen = CMSG_SPACE(len);                    /* <-- */
        /*
         * If the MSG_CTRUNC flag is set, msg->msg_controllen is less than
         * CMSG_SPACE(len), so "msg->msg_controllen -= cmlen" turns the
         * unsigned msg->msg_controllen into a large value.
         */
        msg->msg_control += cmlen;
        msg->msg_controllen -= cmlen;               /* <-- */
        err = 0;
    out:
        return err;
    }

    The same problem exists in put_cmsg_compat(). This patch fixes the
    problem.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
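    The arithmetic behind the bug, reduced to a sketch with hypothetical stand-ins for CMSG_LEN()/CMSG_SPACE() (a 16-byte header and 8-byte alignment are assumed). As we read the description, the cure is to never consume more than msg_controllen still holds; remaining_fixed() shows that clamp and remaining_buggy() shows the wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: 16-byte header, payload padded to 8 bytes. */
#define HDR_LEN     16u
#define ALIGN8(x)   (((x) + 7u) & ~(size_t)7u)
#define SPACE(len)  (HDR_LEN + ALIGN8(len))

/* Buggy accounting: subtracts the full aligned space even when the
 * message was truncated, so the unsigned length can wrap around. */
static size_t remaining_buggy(size_t controllen, size_t len)
{
    return controllen - SPACE(len);
}

/* Clamped accounting: never consume more than what is left. */
static size_t remaining_fixed(size_t controllen, size_t len)
{
    size_t cmlen = SPACE(len);

    if (controllen < cmlen)
        cmlen = controllen;
    return controllen - cmlen;
}
```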
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

20 Dec, 2007

4 commits

  • This operation helper abstracts:

    skb->mac_header = skb->data;

    but it was done in two more places which were actually:

    skb->mac_header = skb->network_header;

    and those are corrected here.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • mac_header update in ipgre_recv() was incorrectly changed to
    skb_reset_mac_header() when it was introduced.

    Signed-off-by: Timo Teras
    Signed-off-by: David S. Miller

    Timo Teras
     
  • In several places the arguments to the xfrm_audit_start() function are
    in the wrong order resulting in incorrect user information being
    reported. This patch corrects this by placing the arguments in the
    correct order.

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     
  • The aalgos/ealgos fields are only 32 bits wide. However, af_key tries
    to test them with the expression 1 << id where id can be as large as
    253. This produces different behaviour on different architectures.

    The following patch explicitly checks whether id is greater than 31
    and fails the check if that's the case.

    We cannot easily extend the mask to be longer than 32 bits due to
    exposure to user-space. Besides, this whole interface is obsolete
    anyway in favour of the xfrm_user interface which doesn't use this
    bit mask in templates (well not within the kernel anyway).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
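    A sketch of the bounded test, with a hypothetical algo_allowed() helper. In C, shifting a 32-bit value by 32 or more is undefined behaviour, which is why 1 << id can give architecture-dependent results for large ids; checking the range first makes the result well defined.

```c
#include <stdint.h>

/* Is algorithm id permitted by a 32-bit mask such as aalgos/ealgos? */
static int algo_allowed(uint32_t mask, unsigned int id)
{
    if (id > 31)
        return 0;   /* cannot be represented in the mask: fail the check */
    return (mask & (UINT32_C(1) << id)) != 0;
}
```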