04 May, 2005

21 commits

  • Some network drivers call netif_stop_queue() when detecting loss of
    carrier. This leads to packets being queued up at the qdisc level for
    an unbounded period of time. To prevent this, the core
    networking stack will now cease to queue packets for any device that
    is operationally down (i.e. the queue is flushed and disabled).

    Signed-off-by: Tommy S. Christensen
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Tommy S. Christensen
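
    The behaviour described above can be sketched in userspace as follows.
    This is a minimal illustration, not the kernel code: struct toy_dev,
    toy_enqueue(), toy_carrier_off() and toy_carrier_on() are hypothetical
    names, and the fixed-size array merely stands in for a qdisc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TXQ_CAP 8  /* hypothetical queue capacity */

/* Hypothetical stand-in for a device with a qdisc-level queue. */
struct toy_dev {
    bool   carrier_ok;           /* operational state of the link */
    int    queue[TOY_TXQ_CAP];   /* queued packet ids */
    size_t qlen;                 /* number of queued packets */
    size_t dropped;              /* packets dropped so far */
};

/* Queue a packet only while the device is operationally up.
 * Returns true if the packet was queued, false if it was dropped. */
static bool toy_enqueue(struct toy_dev *dev, int pkt)
{
    assert(dev != NULL);
    if (!dev->carrier_ok || dev->qlen == TOY_TXQ_CAP) {
        dev->dropped++;
        return false;
    }
    dev->queue[dev->qlen++] = pkt;
    return true;
}

/* Loss of carrier: flush whatever is queued and disable queueing. */
static void toy_carrier_off(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->carrier_ok = false;
    dev->dropped += dev->qlen;
    dev->qlen = 0;
}

/* Carrier is back: queueing is allowed again. */
static void toy_carrier_on(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->carrier_ok = true;
}
```

    With this shape, packets sent while the link is down are dropped
    immediately instead of sitting in the queue until the carrier returns.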
     
  • If we free up a partially processed packet because its
    skb->len dropped to zero, we need to decrement qlen because
    we are dropping out of the top-level loop, so it will not do
    the decrement for us.

    Spotted by Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The qlen should continue to decrement, even if we
    pop partially processed SKBs back onto the receive queue.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Let's recap the problem. The current asynchronous netlink kernel
    message processing is vulnerable to these attacks:

    1) Hit and run: Attacker sends one or more messages and then exits
    before they're processed. This may confuse/disable the next netlink
    user that gets the netlink address of the attacker since it may
    receive the responses to the attacker's messages.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.
    c) Restrict/prohibit binding.

    2) Starvation: Because various netlink rcv functions were written
    to not return until all messages have been processed on a socket,
    it is possible for these functions to execute for an arbitrarily
    long period of time. If this is successfully exploited it could
    also be used to hold rtnl forever.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.

    Firstly let's cross off solution c). It only solves the first
    problem and it has user-visible impacts. In particular, it'll
    break user space applications that expect to bind or communicate
    with specific netlink addresses (pid's).

    So we're left with a choice of synchronous processing versus
    SOCK_STREAM for netlink.

    For the moment I'm sticking with the synchronous approach as
    suggested by Alexey since it's simpler and I'd rather spend
    my time working on other things.

    However, it does have a number of deficiencies compared to the
    stream mode solution:

    1) User-space to user-space netlink communication is still vulnerable.

    2) Inefficient use of resources. This is especially true for rtnetlink
    since the lock is shared with other users such as networking drivers.
    The latter could hold the rtnl while communicating with hardware which
    causes the rtnetlink user to wait when it could be doing other things.

    3) It is still possible to DoS all netlink users by flooding the kernel
    netlink receive queue. The attacker simply fills the receive socket
    with a single netlink message that fills up the entire queue. The
    attacker then continues to call sendmsg with the same message in a loop.

    Point 3) can be countered by retransmissions in user-space code, but
    that is pretty messy.

    In light of these problems (in particular, point 3), we should implement
    stream mode netlink at some point. In the meantime, here is a patch
    that implements synchronous processing.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Here is a little optimisation for the cb_lock used by netlink_dump.
    While fixing that race earlier, I noticed that the reference count
    held by cb_lock is completely useless. The reason is that in order
    to obtain the protection of the reference count, you have to take
    the cb_lock. But the only way to take the cb_lock is through
    dereferencing the socket.

    That is, you must already possess a reference count on the socket
    before you can take advantage of the reference count held by cb_lock.
    As a corollary, we can remove the reference count held by the cb_lock.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • htb_enqueue(): Free skb and return NET_XMIT_DROP if a packet is
    destined for the direct_queue but the direct_queue is full. (Before
    this: erroneously returned NET_XMIT_SUCCESS even though the packet was
    not enqueued)

    Signed-off-by: Asim Shankar
    Signed-off-by: David S. Miller

    Asim Shankar
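
    A minimal sketch of the corrected return path, assuming a fixed-size
    direct queue. The names toy_direct_queue, toy_enqueue_direct() and the
    TOY_XMIT_* constants are hypothetical stand-ins for the real qdisc
    structures and NET_XMIT_* codes; the freed counter stands in for
    freeing the skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NET_XMIT_SUCCESS / NET_XMIT_DROP. */
enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

#define TOY_DIRECT_QLEN 2  /* hypothetical direct queue limit */

struct toy_direct_queue {
    int    pkts[TOY_DIRECT_QLEN];
    size_t len;     /* packets currently queued */
    size_t freed;   /* packets freed because the queue was full */
};

/* Enqueue on the direct queue; when it is full, free the packet and
 * report the drop instead of claiming success. */
static int toy_enqueue_direct(struct toy_direct_queue *q, int pkt)
{
    assert(q != NULL);
    if (q->len >= TOY_DIRECT_QLEN) {
        q->freed++;               /* stands in for freeing the skb */
        return TOY_XMIT_DROP;
    }
    q->pkts[q->len++] = pkt;
    return TOY_XMIT_SUCCESS;
}
```

    The caller can then account for the drop, which it could not do while
    the full-queue case still reported success.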
     
  • kfree() and vfree() can both deal with NULL pointers. This patch removes
    redundant NULL pointer checks from the ppp code in drivers/net/

    Signed-off-by: Jesper Juhl
    Signed-off-by: David S. Miller

    Jesper Juhl
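
    The same cleanup can be illustrated in userspace, where the C standard
    guarantees that free(NULL) is a no-op. This is only an analogy for
    kfree()/vfree(); struct toy_state and toy_state_release() are
    hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object owning one optional heap buffer. */
struct toy_state {
    char *buf;   /* may legitimately be NULL */
};

/* Release the buffer without a redundant "if (st->buf)" guard:
 * free(NULL) does nothing, so the check adds no safety. */
static void toy_state_release(struct toy_state *st)
{
    assert(st != NULL);
    free(st->buf);
    st->buf = NULL;   /* makes a second release harmless */
}
```

    Calling toy_state_release() on a state whose buffer was never
    allocated, or calling it twice, is safe for the same reason.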
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden
     
  • This is a trivial fix for a typo in Kconfig, where the Generic Random Early
    Detection algorithm is abbreviated as RED instead of GRED.

    Signed-off-by: Lucas Correia Villa Real
    Signed-off-by: David S. Miller

    Lucas Correia Villa Real
     
  • kfree(0) is perfectly valid, checking pointers for NULL before calling
    kfree() on them is redundant. The patch below cleans away a few such
    redundant checks (and while I was around some of those bits I couldn't
    stop myself from making a few tiny whitespace changes as well).

    Signed-off-by: Jesper Juhl
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Jesper Juhl
     
  • Converts remaining rtnetlink_link tables to use C99 designated
    initializers to make grepping a little bit easier.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Converts rtm_min and rtm_max arrays to use C99 designated
    initializers for easier insertion of new message families.
    RTM_GETMULTICAST and RTM_GETANYCAST did not have the minimal
    message size specified which means that the netlink message
    was parsed for routing attributes starting from the header.
    Adds the proper minimal message sizes for these messages
    (netlink header + common rtnetlink header) to fix this issue.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
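
    A small sketch of the idea, assuming a per-family table of minimal
    message sizes. The enum, the two header sizes and toy_len_ok() are
    hypothetical and do not reproduce the kernel's rtm_min array; they only
    show how designated initializers keep each entry next to its name and
    how a missing entry (size 0) can be told apart from a real minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message families. */
enum {
    TOY_LINK,
    TOY_ADDR,
    TOY_MULTICAST,
    TOY_ANYCAST,
    TOY_NFAMILIES
};

#define TOY_NLHDR_LEN  16  /* assumed netlink header size */
#define TOY_GENHDR_LEN 4   /* assumed common rtnetlink header size */

/* Designated initializers: each family names its own slot, so adding a
 * family does not depend on the position of the other entries. */
static const size_t toy_min_len[TOY_NFAMILIES] = {
    [TOY_LINK]      = TOY_NLHDR_LEN + 16,
    [TOY_ADDR]      = TOY_NLHDR_LEN + 8,
    [TOY_MULTICAST] = TOY_NLHDR_LEN + TOY_GENHDR_LEN,
    [TOY_ANYCAST]   = TOY_NLHDR_LEN + TOY_GENHDR_LEN,
};

/* A message is acceptable only if its family has a minimal size and
 * the message is at least that long. */
static bool toy_len_ok(int fam, size_t msg_len)
{
    assert(fam >= 0 && fam < TOY_NFAMILIES);
    return toy_min_len[fam] != 0 && msg_len >= toy_min_len[fam];
}
```

    With the minimal size in place, attribute parsing starts after both
    headers rather than at the start of the message.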
     
  • RTM_MAX is currently set to the maximum reserved message type plus one,
    which causes two bugs when new types are assigned: a) if the new family
    registers only the NEW command in its reserved block, the array size for
    per-family entries is calculated one entry short, and b) if the new
    family registers all commands, RTM_MAX points to the first entry of the
    block following this one and the rtnetlink receive path accepts a
    message type for a nonexistent family.

    This patch changes RTM_MAX to point to the maximum valid message type
    by aligning it to the start of the next block and subtracting one.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
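
    The arithmetic can be checked with a small sketch. It assumes blocks
    of four message types per family and a base type of 16; TOY_BLOCK,
    TOY_BASE and the three helpers are hypothetical names, and
    next_unused means "last assigned message type plus one".

```c
#include <assert.h>

#define TOY_BLOCK 4    /* assumed message types per family block */
#define TOY_BASE  16   /* assumed first message type */

/* Old behaviour: the maximum is simply "last assigned type plus one". */
static int toy_old_max(int next_unused)
{
    return next_unused;
}

/* New behaviour: round up to the start of the next block, then step
 * back one, giving the last valid type of the current block. */
static int toy_new_max(int next_unused)
{
    assert(next_unused > 0);
    return ((next_unused + TOY_BLOCK - 1) & ~(TOY_BLOCK - 1)) - 1;
}

/* Number of per-family table entries implied by a given maximum. */
static int toy_nr_families(int max)
{
    return (max + 1 - TOY_BASE) / TOY_BLOCK;
}
```

    Take a family whose block is 68..71. If only NEW (68) is registered,
    next_unused is 69: the old maximum yields 13 table entries although
    index 13 is needed, while the new maximum of 71 yields 14. If all four
    commands are registered, next_unused is 72: the old maximum of 72
    already belongs to the next block, while the new maximum is still 71.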
     
  • Converts xfrm_msg_min and xfrm_dispatch to use C99 designated
    initializers to make grepping a little bit easier. Also replaces
    two hardcoded message types with meaningful names.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Makes the type > XFRM_MSG_MAX check behave correctly to
    protect access to xfrm_dispatch.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
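
    The message above does not say what was wrong with the check, so the
    following is only a sketch of the general pattern: validate the message
    type against both ends of the range before using it as an index into a
    dispatch table. All toy_ names and the TOY_MSG_* values are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types; TOY_MSG_MAX is the last valid one. */
enum {
    TOY_MSG_BASE = 16,
    TOY_MSG_NEW  = 16,
    TOY_MSG_DEL  = 17,
    TOY_MSG_GET  = 18,
    TOY_MSG_MAX  = 18
};

typedef int (*toy_handler)(void);

static int toy_noop(void)
{
    return 0;
}

/* One slot per valid type; unregistered slots stay NULL. */
static const toy_handler toy_dispatch[TOY_MSG_MAX + 1 - TOY_MSG_BASE] = {
    [TOY_MSG_NEW - TOY_MSG_BASE] = toy_noop,
    [TOY_MSG_GET - TOY_MSG_BASE] = toy_noop,
};

/* Reject anything outside [TOY_MSG_BASE, TOY_MSG_MAX]. */
static bool toy_type_valid(int type)
{
    return type >= TOY_MSG_BASE && type <= TOY_MSG_MAX;
}

/* Look up a handler; returns NULL for invalid or unregistered types. */
static toy_handler toy_lookup(int type)
{
    if (!toy_type_valid(type))
        return NULL;
    return toy_dispatch[type - TOY_MSG_BASE];
}
```

    Because the range check runs first, the subtraction can never produce
    an index outside the table.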
     
  • This patch includes net/ipv6.h from addrconf.h since it needs
    ipv6_addr_set.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • I made a mistake in my last patch to the raw socket checksum code.
    I used the value of inet->cork.length as the length of the payload.
    While this works with normal packets, it breaks down when IPsec is
    present since the cork length includes the extension header length.

    So here is a patch to fix the length calculations.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

03 May, 2005

7 commits

  • gcc-4.0 generates altivec code implicitly when -mcpu indicates an
    altivec capable CPU, which is not suitable for the kernel. However, we
    used to set -mcpu=970 when CONFIG_ALTIVEC was set because a gcc-3.x bug
    prevented us from using -maltivec along with -mcpu=power4, and thus
    prevented building the RAID6 altivec code.

    This patch fixes all of this by testing for the gcc version. If 4.0 or
    later, just normally use -mcpu=power4 and let the RAID6 code add
    -maltivec to the few files it needs to be compiled with altivec support.
    For 3.x, we still use -mcpu=970 to work around the above problem, which
    is fine as 3.x will never implicitly generate altivec code.

    The Makefile hackery may not be the most lovely, I welcome anybody more
    skilled than me to improve it.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Modify xtSearch so that it returns the next allocated block when the
    requested block is unmapped. This can be used to make sure we don't
    create a new extent that overlaps the next one.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
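
    A sketch of the idea, using a sorted array of extents in place of the
    JFS xtree. struct toy_extent, toy_search() and toy_clamp_len() are
    hypothetical names; the value -1 for "no later extent" is an
    assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent covering blocks [off, off + len). */
struct toy_extent {
    int64_t off;
    int64_t len;
};

/* Search extents sorted by offset. Returns true if blk is mapped.
 * Otherwise *next receives the offset of the next allocated extent,
 * or -1 if nothing is allocated after blk. */
static bool toy_search(const struct toy_extent *ext, size_t n,
                       int64_t blk, int64_t *next)
{
    assert(next != NULL);
    assert(ext != NULL || n == 0);
    *next = -1;
    for (size_t i = 0; i < n; i++) {
        if (blk < ext[i].off) {
            *next = ext[i].off;      /* unmapped: report what follows */
            return false;
        }
        if (blk < ext[i].off + ext[i].len)
            return true;             /* mapped by this extent */
    }
    return false;
}

/* Shorten a proposed new extent so it cannot overlap the next one. */
static int64_t toy_clamp_len(int64_t blk, int64_t want, int64_t next)
{
    if (next >= 0 && blk + want > next)
        return next - blk;
    return want;
}
```

    A caller that wants to allocate at an unmapped block can pass the
    reported offset to toy_clamp_len() before creating the extent.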
     
  • This patch adds jfs_syncpt, which calls lmLogSync to write sync points
    to the journal both in jfs_sync_fs and when sync barrier processing
    completes.

    lmLogSync accomplishes two things: 1) it pushes logged-but-dirty
    metadata pages to disk, and 2) it writes a sync record to the journal
    so that jfs_fsck doesn't need to replay more transactions than is
    necessary.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • jfs has never worked on architectures where the page size was not 4K.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • JFS code has always assumed a page size of 4K. This patch fixes the
    non-pagecache uses of pages to deal with larger pages.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • JFS was creating a new IAG (inode aggregate group) in one address
    space, and afterwards, accessing it from another. This could lead to
    complications when cache pages contain more than one page of jfs
    metadata. This patch causes the IAG to be initialized in the same
    address space that it is subsequently accessed with.

    This also eliminates an I/O, but IAGs aren't created too often.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • Use an inline pxd list rather than an xad list in the xadlock.
    When the number of extents being modified can fit within the xadlock,
    a transaction can be committed asynchronously. Using a list of
    pxd's instead of xad's allows us to fit 4 extents, rather than 2.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
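
    The 4-versus-2 figure follows from entry size. The sketch below
    assumes a pxd occupies 8 bytes, an xad 16 bytes, and that the inline
    area holds 32 bytes; these sizes and all toy_ names are assumptions
    made for illustration, not values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins with the assumed on-disk sizes. */
struct toy_pxd { uint8_t raw[8];  };   /* address + length only */
struct toy_xad { uint8_t raw[16]; };   /* adds offset and flags */

#define TOY_INLINE_BYTES 32  /* assumed inline space in the lock */

/* How many entries of a given size fit in the inline area. */
static size_t toy_inline_capacity(size_t entry_size)
{
    assert(entry_size > 0);
    return TOY_INLINE_BYTES / entry_size;
}
```

    Under these assumptions the same inline area holds two xad entries or
    four pxd entries, so more transactions qualify for asynchronous
    commit.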
     

02 May, 2005

7 commits

  • The cpufreq core patch I sent earlier got only half-applied. I added a
    flag to let the low level driver disable an annoying warning on
    suspend/resume that is normal on ppc, but the "resume" part of it wasn't
    applied.

    This just adds back that missing bit. The original patch also reworked
    the resume() function to avoid nesting too many if () statements, the
    same way I did for the suspend() one, but I didn't include that in the
    patch below.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • The clock spreading disable/enable code was called too late/early during
    the suspend/resume code on some laptops and would trigger a
    might_sleep() warning due to the down() call in the low level i2c code.

    This fixes it by calling those functions earlier/later when interrupts
    are still enabled.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • As Al Viro noticed, my previous fix missed one instance of "device" in
    the driver local debug code. Harmless unless you tweak the #define's in
    there but still worth fixing.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • A typo in the machine table incorrectly marks the 101 PowerBook as
    needing an explicit callback from the video driver to enable sleep mode. I
    did not implement that mechanism for chipsets older than r128, so we need
    to mark this machine as always being able to sleep for now.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • My newer iMac mini driver doesn't build with verbose debug enabled.

    This fixes it, and removes an erroneous error printk (since it's normal
    on some machines to not find some gpios on the "first try").

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • We are experiencing a problem when flushing the CPU caches before sleep
    on some laptop models using the 750FX CPU rev 1.X. While I haven't been
    able to figure out a proper explanation for what's going on, I do have a
    workaround that seems to work reliably and allows those machines to sleep
    and wake up properly again.

    I'll re-update that code if/when I ever find exactly what is happening
    with those CPU revisions.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Only issue a cdrom cache flush if we've done a write to the drive. The
    ->media_written() flag keeps track of that.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
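
    A minimal sketch of the dirty-flag pattern, not the cdrom driver
    itself. struct toy_drive and its helpers are hypothetical, and
    clearing the flag after the flush is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical drive state. */
struct toy_drive {
    bool     media_written;   /* set once anything has been written */
    unsigned flushes;         /* number of cache flushes issued */
};

/* Any write marks the media as written. */
static void toy_drive_write(struct toy_drive *d)
{
    assert(d != NULL);
    d->media_written = true;
}

/* On release, flush the cache only if there was a write. */
static void toy_drive_release(struct toy_drive *d)
{
    assert(d != NULL);
    if (!d->media_written)
        return;               /* read-only use: nothing to flush */
    d->flushes++;             /* stands in for the flush command */
    d->media_written = false; /* assumption: flag resets after flush */
}
```

    A drive that was only read from is released without a flush command
    being sent.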
     

01 May, 2005

5 commits