14 Jan, 2008

1 commit


12 Jan, 2008

1 commit

  • The bridge code incorrectly causes two POST_ROUTING hook invocations
    for DNATed packets that end up on the same bridge device. This
    happens because packets with a changed destination address are passed
    to dst_output() to make them go through the neighbour output function
    again to build a new destination MAC address, before they continue
    through the IP hooks simulated by bridge netfilter.

    The resulting hook order is:
    PREROUTING (bridge netfilter)
    POSTROUTING (dst_output -> ip_output)
    FORWARD (bridge netfilter)
    POSTROUTING (bridge netfilter)

    The deferred hooks used to abort the first POST_ROUTING invocation,
    but since the only thing bridge netfilter actually really wants is
    a new MAC address, we can avoid going through the IP stack completely
    by simply calling the neighbour output function directly (see the
    sketch after this entry).

    Tested, reported and lots of data provided by: Damien Thebault

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
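
    A rough sketch of the idea described above, not the actual patch: it
    assumes 2.6.24-era fields (skb->dst, dst->hh, dst->neighbour) and uses a
    hypothetical helper name.

        /* Sketch: rebuild the destination MAC via the neighbour layer
         * directly, instead of handing the DNATed skb to dst_output(),
         * which would run the IP POST_ROUTING hook a second time. */
        static int br_nf_neigh_output_sketch(struct sk_buff *skb)
        {
                struct dst_entry *dst = skb->dst;      /* 2.6.24-era field */

                if (dst->hh)                    /* cached hardware header */
                        return neigh_hh_output(dst->hh, skb);
                if (dst->neighbour)             /* build a fresh header */
                        return dst->neighbour->output(skb);

                kfree_skb(skb);
                return -EINVAL;
        }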
     

11 Jan, 2008

8 commits

  • Use the @helper variable that was just obtained.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jan Engelhardt
     
  • RFC2464 says that the next-to-lowest order bit of the first octet
    of the Interface Identifier is formed by complementing the
    Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR, not XOR
    (see the example after this entry).

    Thanks to Peter Ivancik for reporting this bug and posting a patch
    for it.

    Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Yasuyuki Kozakai
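
    A small user-space illustration (not kernel code) of why OR cannot undo
    the complemented Universal/Local bit; the difference only shows for a
    locally administered MAC address, where that bit is already set.

        #include <stdio.h>

        int main(void)
        {
                /* First octet of a locally administered MAC (U/L bit set). */
                unsigned char mac0 = 0x02;

                unsigned char iid_xor = mac0 ^ 0x02; /* RFC 2464: complement the bit -> 0x00 */
                unsigned char iid_or  = mac0 | 0x02; /* buggy variant: bit stays set -> 0x02 */

                printf("expected IID octet with XOR: 0x%02x\n", iid_xor);
                printf("expected IID octet with OR:  0x%02x\n", iid_or);
                return 0;
        }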
     
  • Don't allow macvlan devices to be nested, since doing so causes lockdep
    warnings and isn't really useful for anything.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Allow vlans to be nested in other vlans without lockdep warnings (max. 2
    levels, i.e. parent + child). Thanks to Patrick McHardy for pointing out
    a bug in the first version of this patch.

    Reported-by: Benny Amorsen

    Signed-off-by: Jarek Poplawski
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • In dn_rt_cache_get_next(), there is no need to guard seq->private with a
    rcu_dereference(), since seq is private to the thread running this
    function. Reading seq.private once (as guaranteed by rcu_dereference())
    or several times, if the compiler really is dumb enough, won't change
    the result.

    But we were missing real spots where rcu_dereference() is needed, both
    in dn_rt_cache_get_first() and dn_rt_cache_get_next().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In the (rare) event of simultaneous mutual wake up requests,
    do send the chip an explicit wake-up ack. This is required
    for Texas Instruments's BRF6350 chip.

    Signed-off-by: Ohad Ben-Cohen
    Signed-off-by: David S. Miller

    Ohad Ben-Cohen
     
  • 1) In tty.c the BUG_ON at line 115 will never trigger, because
    list_del_init is called earlier in the same function:
    115 BUG_ON(!list_empty(&dev->list));
    So move the list_del_init to rfcomm_dev_del.

    2) rfcomm_dev_del can be called from different paths
    (rfcomm_tty_hangup/rfcomm_dev_state_change/rfcomm_release_dev),
    so add another BUG_ON to catch rfcomm_dev_del being called more than
    once.

    Signed-off-by: Dave Young
    Signed-off-by: David S. Miller

    Dave Young
     
  • Bernard Pidoux F6BVP reported:
    > When I killall kissattach I can see the following message.
    >
    > This happens on kernel 2.6.24-rc5 already patched with the 6 patches I
    > sent previously.
    >
    >
    > =======================================================
    > [ INFO: possible circular locking dependency detected ]
    > 2.6.23.9 #1
    > -------------------------------------------------------
    > kissattach/2906 is trying to acquire lock:
    > (linkfail_lock){-+..}, at: [] ax25_link_failed+0x11/0x39 [ax25]
    >
    > but task is already holding lock:
    > (ax25_list_lock){-+..}, at: [] ax25_device_event+0x38/0x84
    > [ax25]
    >
    > which lock already depends on the new lock.
    >
    >
    > the existing dependency chain (in reverse order) is:
    ...

    lockdep is worried about the different order here:

    #1 (rose_neigh_list_lock){-+..}:
    #3 (ax25_list_lock){-+..}:

    #0 (linkfail_lock){-+..}:
    #1 (rose_neigh_list_lock){-+..}:

    #3 (ax25_list_lock){-+..}:
    #0 (linkfail_lock){-+..}:

    So, ax25_list_lock could be taken before and after linkfail_lock.
    I don't know if this three-thread clutch is very probable (or
    possible at all), but it seems another bug reported by Bernard
    ("[...] system impossible to reboot with linux-2.6.24-rc5")
    could have a similar source - namely ax25_list_lock held by
    ax25_kill_by_device() during ax25_disconnect(). It looks like this is
    the only place which calls ax25_disconnect() this way, so I guess it
    isn't necessary.

    This patch breaks the lock around ax25_disconnect().

    Reported-and-tested-by: Bernard Pidoux
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

10 Jan, 2008

6 commits

  • sfuzz can easily trigger any of those.

    Move the printk message to the corresponding comment: this makes the
    intention of the code clear and easy to pick up on a scheduled
    removal. As a bonus, simplify the brace placement.

    Signed-off-by: maximilian attems
    Signed-off-by: David S. Miller

    maximilian attems
     
  • In rt_cache_get_next(), there is no need to guard seq->private with a
    rcu_dereference(), since seq is private to the thread running this
    function. Reading seq.private once (as guaranteed by rcu_dereference())
    or several times, if the compiler really is dumb enough, won't change
    the result.

    But we were missing real spots where rcu_dereference() is needed, both
    in rt_cache_get_first() and rt_cache_get_next() (see the sketch after
    this entry).

    Signed-off-by: Eric Dumazet
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
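
    A hedged sketch of where the annotation belongs, modelled on the
    2.6.24-era rt_cache_get_first() in net/ipv4/route.c (names quoted from
    memory, so treat the details as approximate):

        /* The RCU-protected pointer is the hash chain head, so that is
         * where rcu_dereference() is needed -- not around seq->private. */
        static struct rtable *rt_cache_get_first_sketch(struct rt_cache_iter_state *st)
        {
                struct rtable *r = NULL;

                for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
                        rcu_read_lock_bh();
                        r = rcu_dereference(rt_hash_table[st->bucket].chain);
                        if (r)
                                break;
                        rcu_read_unlock_bh();
                }
                return r;
        }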
     
  • neightbl_fill_parms() is called under the write-locked tbl->lock and
    accesses parms->dev. neigh_parms_release() calls dev_put(parms->dev)
    without this lock. This creates a tiny race window in which the parms
    contain a potentially stale dev pointer.

    To fix this race it's enough to move the dev_put() up, under the
    tbl->lock, but note that the parms are held by neighbours and can thus
    outlive the neigh_parms_release() call, so we can still end up with a
    parm carrying a bad dev pointer.

    I didn't find where neigh->parms->dev is accessed, but I still think
    that putting the dev should be done where the parms are really freed.
    Am I right about that?

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • From: Mirko Lindner

    This patch makes the necessary changes in the Neptune driver to support
    the new Marvell PHY. It also adds support for LED blinking on Neptune
    cards with a Marvell PHY. All registers use defines from the niu.h
    header file, as is already done for the BCM8704 registers.

    [ Coding style, etc. cleanups -DaveM ]

    Signed-off-by: David S. Miller

    Mirko Lindner
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (36 commits)
    [ATM]: Check IP header validity in mpc_send_packet
    [IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
    [CONNECTOR]: Don't touch queue dev after decrement of ref count.
    [SOCK]: Adds a rcu_dereference() in sk_filter
    [XFRM]: xfrm_algo_clone() allocates too much memory
    [FORCEDETH]: Fix reversing the MAC address on suspend.
    [NET]: mcs7830 passes msecs instead of jiffies to usb_control_msg
    [LRO] Fix lro_mgr->features checks
    [NET]: Clone the sk_buff 'iif' field in __skb_clone()
    [IPV4] ROUTE: ip_rt_dump() is unecessary slow
    [NET]: kaweth was forgotten in msec switchover of usb_start_wait_urb
    [NET] Intel ethernet drivers: update MAINTAINERS
    [NET]: Make ->poll() breakout consistent in Intel ethernet drivers.
    [NET]: Stop polling when napi_disable() is pending.
    [NET]: Fix drivers to handle napi_disable() disabling interrupts.
    [NETXEN]: Fix ->poll() done logic.
    mac80211: return an error when SIWRATE doesn't match any rate
    ssb: Fix probing of PCI cores if PCI and PCIE core is available
    [NET]: Do not check netif_running() and carrier state in ->poll()
    [NET]: Add NAPI_STATE_DISABLE.
    ...

    Linus Torvalds
     
  • The show_task function invoked by sysrq-t et al displays the
    pid and parent's pid of each task. It seems more useful to
    show the actual process hierarchy here than who is using
    ptrace on each process.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

09 Jan, 2008

24 commits

  • Al went through the ip_fast_csum callers and found this piece of code
    that did not validate the IP header. While root crashing the machine
    by sending bogus packets through raw or AF_PACKET sockets isn't that
    serious, it is still nice to react gracefully.

    This patch ensures that the skb has enough data for an IP header and
    that the header length field is valid (a sketch of such checks follows
    this entry).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
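
    A hedged sketch of the kind of validation the description implies
    (helper name hypothetical; the real change is in net/atm/mpc.c):

        /* Sketch: refuse to parse unless the buffer holds a full, sane IPv4
         * header behind the Ethernet header. */
        static int ip_header_looks_valid(const struct sk_buff *skb)
        {
                const struct iphdr *iph;

                if (skb->len < ETH_HLEN + sizeof(struct iphdr))
                        return 0;       /* too short for any IP header */

                iph = (const struct iphdr *)(skb->data + ETH_HLEN);
                if (iph->ihl < 5 || skb->len < ETH_HLEN + iph->ihl * 4)
                        return 0;       /* bogus header length field */

                return 1;
        }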
     
  • Signed-off-by: Brian Haley
    Acked-by: David L Stevens
    Signed-off-by: David S. Miller

    Brian Haley
     
  • cn_queue_free_callback() will touch 'dev' (i.e. cbq->pdev), so it
    should be called before atomic_dec(&dev->refcnt).

    Signed-off-by: Li Zefan
    Signed-off-by: David S. Miller

    Li Zefan
     
  • It seems commit fda9ef5d679b07c9d9097aaf6ef7f069d794a8f9 introduced RCU
    protection for sk_filter() without a rcu_dereference().

    Either we need a rcu_dereference(), or a comment should explain why we
    don't need it. I vote for the former (see the sketch after this entry).

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
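
    A sketch of the fix as described, modelled on the 2.6.24-era sk_filter()
    in net/core/filter.c (quoted from memory, so treat details as
    approximate):

        int sk_filter_sketch(struct sock *sk, struct sk_buff *skb)
        {
                struct sk_filter *filter;
                int err = 0;

                rcu_read_lock_bh();
                filter = rcu_dereference(sk->sk_filter); /* added annotation */
                if (filter) {
                        unsigned int pkt_len = sk_run_filter(skb, filter->insns,
                                                             filter->len);
                        err = pkt_len ? pskb_trim(skb, pkt_len) : -EPERM;
                }
                rcu_read_unlock_bh();

                return err;
        }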
     
  • alg_key_len is the length of the key in bits, not bytes.

    The best way to fix this is to move the alg_len() function from
    net/xfrm/xfrm_user.c to include/net/xfrm.h and use it in
    xfrm_algo_clone() (see the sketch after this entry).

    alg_len() is renamed to xfrm_alg_len() because it is now globally
    exposed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
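
    The essential point is the bits-to-bytes conversion; a sketch of the
    helper as the message describes it (exact form approximate):

        /* alg_key_len is in bits, so round up to whole bytes when sizing
         * the allocation and the copy. */
        static inline int xfrm_alg_len(const struct xfrm_algo *alg)
        {
                return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
        }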
     
  • For cards that initially have the MAC address stored in reverse order,
    the forcedeth driver uses a flag to signal whether the address was
    already corrected, so that it is not reversed again on a subsequent
    probe.

    Unfortunately this flag, which is stored in a register of the card,
    seems to get lost during suspend, resulting in the MAC address being
    reversed again. To fix that, the MAC address needs to be written back
    in reversed order before we suspend and the flag needs to be reset.

    The flag is still required because at least kexec will never write
    back the reversed address and thus needs to know what state the card
    is in.

    Signed-off-by: Björn Steinbrink
    Signed-off-by: David S. Miller

    Björn Steinbrink
     
  • usb_control_msg was changed long ago (2.6.12-pre) to take milliseconds
    instead of jiffies. Oddly, mcs7830 wasn't added until 2.6.19-rc3.
    (A before/after sketch follows this entry.)

    Signed-off-by: Russ Dill
    Signed-off-by: David S. Miller

    Russ Dill
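
    The change itself is mechanical; a hedged before/after sketch (argument
    names and the 1000 ms timeout are hypothetical):

        /* Before: the timeout was scaled as if the last argument were jiffies. */
        usb_control_msg(udev, pipe, request, requesttype,
                        value, index, data, size, msecs_to_jiffies(1000));

        /* After: usb_control_msg() already expects milliseconds. */
        usb_control_msg(udev, pipe, request, requesttype,
                        value, index, data, size, 1000);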
     
  • lro_mgr->features contains a bitmask of LRO_F_* values, which are
    defined as powers of two, not as bit indexes. They must be checked with
    x & LRO_F_FOO, not with test_bit(LRO_F_FOO, &x) (see the example after
    this entry).

    Signed-off-by: Brice Goglin
    Acked-by: Andrew Gallatin
    Signed-off-by: David S. Miller

    Brice Goglin
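
    A tiny user-space illustration: test_bit() treats its first argument as
    a bit index, so with the LRO_F_* values (powers of two, quoted from
    memory) it tests the wrong bit.

        #include <stdio.h>

        #define LRO_F_NAPI            1   /* flag values, powers of two */
        #define LRO_F_EXTRACT_VLAN_ID 2

        int main(void)
        {
                unsigned long features = LRO_F_NAPI;

                /* Correct: treat LRO_F_NAPI as a mask. */
                printf("mask check:      %d\n", !!(features & LRO_F_NAPI));

                /* Wrong: treat LRO_F_NAPI as a bit index, as test_bit() does;
                 * this actually tests bit 1, i.e. LRO_F_EXTRACT_VLAN_ID. */
                printf("bit-index check: %d\n", !!(features & (1UL << LRO_F_NAPI)));
                return 0;
        }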
     
  • Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
    on the 'iif' field to determine the receiving network interface of
    inbound packets. Unfortunately, at present this field is not
    preserved across a skb clone operation which can lead to garbage
    values if the cloned skb is sent back through the network stack. This
    patch corrects this problem by properly copying the 'iif' field in
    __skb_clone() and removing the 'iif' field assignment from
    skb_act_clone() since it is no longer needed.

    Also, while we are here, put the assignments in the same order as the
    offsets to reduce cacheline bounces.

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     
  • I noticed "ip route list cache x.y.z.t" can be *very* slow.

    While strace-ing it with -T, I also noticed that the first part of the
    route cache is fetched quite fast:

    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680

    while the part at the end of the table is more expensive:

    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724
    recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736

    The following patch corrects this performance/latency problem,
    removing quadratic behavior.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Back in 2.6.12-pre, usb_start_wait_urb was switched over to take
    milliseconds instead of jiffies. kaweth.c was never updated to match.

    Signed-off-by: Russ Dill
    Signed-off-by: David S. Miller

    Russ Dill
     
  • Unfortunately Jeb decided to move away from our group. We wish Jeb
    good luck with his new group!

    Reordered people a bit so the most active team members are on top.

    Signed-off-by: Auke Kok
    Signed-off-by: David S. Miller

    Auke Kok
     
  • This makes the ->poll() routines of the E100, E1000, E1000E, IXGB, and
    IXGBE drivers complete ->poll() consistently.

    Now they will all break out when the amount of RX work done is less
    than 'budget'.

    At a later time, we may want to put back code to include the TX work as
    well (as at least one other NAPI driver does, but by and large NAPI
    drivers do not do this). But if so, it should be done consistently
    across the board for all of these drivers. (A sketch of the common
    ->poll() pattern follows this entry.)

    Signed-off-by: David S. Miller
    Acked-by: Auke Kok

    David S. Miller
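
    A hedged sketch of the common pattern the drivers converge on (driver
    names hypothetical; 2.6.24-era netif_rx_complete() signature assumed):

        /* Complete NAPI and re-enable interrupts only when less than the
         * full budget of RX work was done; otherwise stay scheduled. */
        static int foo_poll(struct napi_struct *napi, int budget)
        {
                struct foo_adapter *adapter =
                        container_of(napi, struct foo_adapter, napi);
                int work_done = foo_clean_rx_irq(adapter, budget);

                if (work_done < budget) {
                        netif_rx_complete(adapter->netdev, napi);
                        foo_irq_enable(adapter);
                }
                return work_done;
        }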
     
  • This finally adds the code in net_rx_action() to break out of the
    ->poll()'ing loop when a napi_disable() is found to be pending.

    Now, even if a device is being flooded with packets it can be cleanly
    brought down.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When we add the generic napi_disable_pending() breakout
    logic to net_rx_action() it means that napi_disable()
    can cause NAPI poll interrupt events to be disabled.

    And this is exactly what we want. If a napi_disable()
    is pending, and we are looping in the ->poll(), we want
    ->poll() event interrupts to stay disabled and we want
    to complete the NAPI poll ASAP.

    When ->poll() break out during device down was being handled on a
    per-driver basis, often these drivers would turn interrupts back on
    when '!netif_running()' was detected.

    And this would just cause a reschedule of the NAPI ->poll() in the
    interrupt handler before the napi_disable() could get in there and
    grab the NAPI_STATE_SCHED bit.

    The vast majority of drivers don't care if napi_disable() might have
    the side effect of disabling NAPI ->poll() event interrupts. In all
    such cases, when a napi_disable() is performed, the driver has just
    disabled interrupts or is about to.

    However there were three exceptions to this in PCNET32, R8169, and
    SKY2. To fix those cases, at the subsequent napi_enable() points, I
    added code to ensure that the ->poll() interrupt events are enabled in
    the hardware.

    Signed-off-by: David S. Miller
    Acked-by: Don Fry

    David S. Miller
     
  • If work_done >= budget we should always elide the NAPI
    completion.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently mac80211 fails silently when trying to set a nonexistent
    rate. Return an error instead.

    Signed-Off-By: Andy Lutomirski
    Signed-off-by: John W. Linville

    Andrew Lutomirski
     
  • This will make sure that the correct core is always selected, even if
    both a PCI and a PCI-E core are present on a PCI or PCI-E card.

    Signed-off-by: Michael Buesch
    Signed-off-by: John W. Linville

    Michael Buesch
     
  • Drivers do this to try to break out of the ->poll()'ing loop
    when the device is being brought administratively down.

    Now that we have a napi_disable() "pending" state we are going
    to solve that problem generically.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Create a bit to signal that a napi_disable() is in progress.

    This sets up infrastructure such that net_rx_action() can generically
    break out of the ->poll() loop on a NAPI context that has a pending
    napi_disable() yet is being bombed with packets (and thus would
    otherwise poll endlessly and not allow the napi_disable() to finish).

    Now, what napi_disable() does is first set the NAPI_STATE_DISABLE bit
    (to indicate that a disable is pending), then it polls for the
    NAPI_STATE_SCHED bit, and once the NAPI_STATE_SCHED bit is acquired
    the NAPI_STATE_DISABLE bit is cleared. Here, the test_and_set_bit()
    provides the necessary memory barrier between the various bitops.

    napi_schedule_prep() now tests for a pending disable as its first
    action and won't try to obtain the NAPI_STATE_SCHED bit if a disable
    is pending (see the sketch after this entry).

    As a result, we can remove the netif_running() check in
    netif_rx_schedule_prep() because the NAPI disable pending state serves
    this purpose. And, it does so in a NAPI centric manner which is what
    we really want.

    Signed-off-by: David S. Miller

    David S. Miller
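
    The description maps onto a few small helpers; a sketch of how they fit
    together, modelled on the 2.6.24-era include/linux/netdevice.h (details
    approximate):

        static inline int napi_disable_pending(struct napi_struct *n)
        {
                return test_bit(NAPI_STATE_DISABLE, &n->state);
        }

        /* Refuse to schedule while a disable is pending; otherwise grab SCHED. */
        static inline int napi_schedule_prep(struct napi_struct *n)
        {
                return !napi_disable_pending(n) &&
                        !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
        }

        /* Mark the disable pending, wait until SCHED can be taken, then clear. */
        static inline void napi_disable(struct napi_struct *n)
        {
                set_bit(NAPI_STATE_DISABLE, &n->state);
                while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
                        msleep(1);
                clear_bit(NAPI_STATE_DISABLE, &n->state);
        }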
     
  • It is pointless, because everything that can make a device go away
    will do a napi_disable() first.

    The main impetus behind this is that now we can legally do a NAPI
    completion in generic code like net_rx_action() which a following
    changeset needs to do. net_rx_action() can only perform actions
    in NAPI centric ways, because there may be a one to many mapping
    between NAPI contexts and network devices (SKY2 is one example).

    We also want to get rid of this because it's an extra atomic in the
    NAPI paths, and also because it is one of the last instances where the
    NAPI interfaces care about net devices.

    The one remaining netdev detail the NAPI stuff cares about is the
    netif_running() check which will be killed off in a subsequent
    changeset.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch fixes the parsing of the RX data header channel field.

    The current code parses the header incorrectly and passes a wrong
    channel number and frequency for each frame to mac80211.
    The FIXMEs added by this patch don't matter for now as the code
    where they live won't get executed anyway. They will be fixed later.

    Signed-off-by: Michael Buesch
    Signed-off-by: John W. Linville

    Michael Buesch
     
  • Easy to trigger as a user with sfuzz.

    irda_create() is quiet on an unknown sock->type; match this behaviour
    for an unknown SOCK_DGRAM protocol.

    Signed-off-by: maximilian attems
    Signed-off-by: David S. Miller

    maximilian attems
     
  • Some recent changes completely removed accounting for the FORWARD_TSN
    parameter length in the INIT and INIT-ACK chunk. This is wrong and
    should be restored.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich