27 Jun, 2005

5 commits

  • From:

    $subject was fixed in 2.4 already, 2.6 needs it as well.

    The impact of the bugs is a kernel stack overflow and privilege escalation
    from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON
    ioctls. People running with 'root=all caps' (i.e., most users) are not
    really affected (there's nothing to escalate), but SELinux and similar
    users should take it seriously if they grant CAP_NET_ADMIN to other users.

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    pageexec
     
  • 1) netlink_release() should only decrement the hash entry
    count if the socket was actually hashed.

    This was causing hash->entries to underflow, which
    resulted in all kinds of trouble.

    On 64-bit systems, this would cause the following
    conditional to erroneously trigger:

        err = -ENOMEM;
        if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX))
                goto err;

    2) netlink_autobind() needs to propagate the error return from
    netlink_insert(). Otherwise, callers will not see the error
    as they should and thus try to operate on a socket with a zero pid,
    which is very bad.

    However, it should not propagate -EBUSY. If two threads race
    to autobind the socket, that is fine. This is consistent with the
    autobind behavior in other protocols.

    So bug #1 above, combined with this one, resulted in hangs
    on netlink_sendmsg() calls to the rtnetlink socket. We'd try
    to do the user sendmsg() with the socket's pid set to zero,
    later we do a socket lookup using that pid (via the value we
    stashed away in NETLINK_CB(skb).pid), but that won't give us the
    user socket, it will give us the rtnetlink socket. So when we
    try to wake up the receive queue, we dive back into rtnetlink_rcv()
    which tries to recursively take the rtnetlink semaphore.

    Thanks to Jakub Jelinek for providing backtraces. Also, thanks to
    Herbert Xu for supplying debugging patches to help track this down,
    and also finding a mistake in an earlier version of this fix.

    Signed-off-by: David S. Miller

    David S. Miller
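A minimal userspace sketch of the two fixes, with hypothetical simplified stand-ins for the kernel structures (the real code lives in net/netlink/af_netlink.c): release only decrements the entry count when the socket was actually hashed, and autobind propagates every insert error except -EBUSY.

```c
#include <errno.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct nl_table { unsigned int entries; };
struct nl_sock  { int hashed; int pid; };

/* Fix #1: only decrement the entry count if the socket was hashed,
 * so table->entries can no longer underflow. */
static void nl_release(struct nl_table *table, struct nl_sock *sk)
{
        if (sk->hashed) {
                table->entries--;
                sk->hashed = 0;
        }
}

/* Fix #2: propagate the error from insert, except -EBUSY --
 * losing an autobind race to another thread is fine. */
static int nl_autobind(int insert_err)
{
        if (insert_err == -EBUSY)
                return 0;
        return insert_err;
}

/* Quick demonstration: start with one hashed entry, release an
 * unhashed socket (no-op) and a hashed one (decrements to zero). */
static unsigned int demo_entries(void)
{
        struct nl_table t = { 1 };
        struct nl_sock unhashed = { 0, 0 }, hashed = { 1, 0 };

        nl_release(&t, &unhashed);
        nl_release(&t, &hashed);
        return t.entries;
}
```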
     
  • Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Robert Olsson
     
  • It doesn't seem to make much sense to let an "If unsure, say N." option
    default to y.

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
    Since it is tristate when we offer it as a choice, we should
    define it as tristate when forcing it as the default as well.
    Otherwise kconfig warns.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Jun, 2005

2 commits

  • Linus Torvalds
     
  • 1. Establish a simple API for process freezing defined in linux/include/sched.h:

    frozen(process)          Check whether a process is frozen
    freezing(process)        Check whether a process is being frozen
    freeze(process)          Tell a process to freeze (go to the refrigerator)
    thaw_process(process)    Restart a process
    frozen_process(process)  Mark a process as now frozen

    2. Remove all references to PF_FREEZE and PF_FROZEN from all
    kernel sources except sched.h

    3. Fix numerous locations where try_to_freeze is manually done by a driver

    4. Remove the argument that is no longer necessary from two function calls.

    5. Some whitespace cleanup

    6. Close a potential race in the refrigerator (there was an open window in
    which PF_FREEZE was cleared before PF_FROZEN was set, and
    recalc_sigpending does not check PF_FROZEN).

    This patch does not address the problem of freeze_processes() violating the rule
    that a task may only modify its own flags by setting PF_FREEZE. This is not clean
    in an SMP environment. freeze(process) is therefore not SMP safe!

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
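The API above can be modeled in plain C with two flag bits. This is a userspace sketch with hypothetical flag values, not the kernel's definitions in sched.h:

```c
/* Hypothetical flag values, for illustration only. */
#define PF_FREEZE 0x1   /* process has been told to freeze */
#define PF_FROZEN 0x2   /* process is now in the refrigerator */

struct task { unsigned long flags; };

static int frozen(struct task *p)        { return (p->flags & PF_FROZEN) != 0; }
static int freezing(struct task *p)      { return (p->flags & PF_FREEZE) != 0; }
static void freeze(struct task *p)       { p->flags |= PF_FREEZE; }
static void thaw_process(struct task *p) { p->flags &= ~PF_FROZEN; }

/* Clear FREEZE and set FROZEN in one step: the race the patch closes
 * was a window where PF_FREEZE was cleared before PF_FROZEN was set. */
static void frozen_process(struct task *p)
{
        p->flags = (p->flags & ~PF_FREEZE) | PF_FROZEN;
}

/* Walk a task through one freeze/thaw cycle; returns 0 if each
 * predicate reports the expected state at each step. */
static int demo_freeze_cycle(void)
{
        struct task t = { 0 };

        freeze(&t);
        if (!freezing(&t) || frozen(&t))
                return -1;
        frozen_process(&t);
        if (freezing(&t) || !frozen(&t))
                return -1;
        thaw_process(&t);
        if (frozen(&t))
                return -1;
        return 0;
}
```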
     

25 Jun, 2005

4 commits


24 Jun, 2005

23 commits

  • Linus Torvalds
     
  • Another rollup of patches which give various symbols static scope

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • rpc_create_client was modified recently to do its own (synchronous) NULL ping
    of the server. We'd rather do that on our own, asynchronously, so that we
    don't have to block the nfsd thread doing the probe, and so that setclientid
    handling (hence, client mounts) can proceed normally whether the callback is
    successful or not. (We can still function fine without the callback
    channel--we just won't be able to give out delegations till it's verified to
    work.)

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Finds a pattern in the skb data according to the specified
    textsearch configuration. Use textsearch_next() to retrieve
    subsequent occurrences of the pattern. Returns the offset
    to the first occurrence or UINT_MAX if no match was found.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
    Implements sequential reading for both linear and non-linear
    skb data at zerocopy cost. The data is returned in chunks of
    arbitrary length; random access is therefore not possible.

    Usage:
        from := 0
        to := 128
        state := undef
        data := undef
        len := undef
        consumed := 0

        skb_prepare_seq_read(skb, from, to, &state)
        while (len = skb_seq_read(consumed, &data, &state)) != 0 do
                /* do something with 'data' of length 'len' */
                if abort then
                        /* abort read if we don't wait for
                         * skb_seq_read() to return 0 */
                        skb_abort_seq_read(&state)
                        return
                endif
                /* not necessary to consume all of 'len' */
                consumed += len
        done

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
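The same sequential-read pattern over fragmented data can be sketched in userspace. The names here are ours; the real kernel API is skb_prepare_seq_read() / skb_seq_read() / skb_abort_seq_read():

```c
#include <stddef.h>

/* Fragmented buffer: an array of chunks, a stand-in for a
 * non-linear skb. */
struct frag { const char *data; size_t len; };

struct seq_state {
        const struct frag *frags;
        size_t nfrags;
        size_t idx;             /* current fragment */
};

static void seq_prepare(struct seq_state *st,
                        const struct frag *frags, size_t nfrags)
{
        st->frags = frags;
        st->nfrags = nfrags;
        st->idx = 0;
}

/* Return the next chunk. Chunks have arbitrary length, so random
 * access is not possible. Returns 0 when the data is exhausted. */
static size_t seq_read(struct seq_state *st, const char **data)
{
        if (st->idx >= st->nfrags)
                return 0;
        *data = st->frags[st->idx].data;
        return st->frags[st->idx++].len;
}

/* Consume a three-fragment buffer and report the total bytes seen. */
static size_t demo_total(void)
{
        static const struct frag frags[] = {
                { "abc", 3 }, { "de", 2 }, { "fghi", 4 },
        };
        struct seq_state st;
        const char *data;
        size_t len, total = 0;

        seq_prepare(&st, frags, 3);
        while ((len = seq_read(&st, &data)) != 0)
                total += len;   /* 'data' points at each chunk in turn */
        return total;
}
```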
     
  • Allow using setsockopt to set TCP congestion control to use on a per
    socket basis.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
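A minimal userspace example of the new per-socket setting; the TCP_CONGESTION socket option later appeared in the uapi headers (value 13 in linux/tcp.h). The helper names are ours, and "reno" is assumed to be an allowed algorithm:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13       /* value from linux/tcp.h */
#endif

/* Set the congestion control algorithm on one socket.
 * Returns 0 on success, -1 on failure. */
static int set_congestion(int fd, const char *name)
{
        return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                          name, strlen(name));
}

/* Try the option on a fresh TCP socket. Returns 0 on success, and
 * also 0 when sockets or the option are unavailable in this
 * environment, so the sketch degrades safely. */
static int demo(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int ret;

        if (fd < 0)
                return 0;       /* no networking here: skip */
        ret = set_congestion(fd, "reno");
        close(fd);
        if (ret != 0 && (errno == ENOPROTOOPT || errno == EPERM))
                return 0;       /* option unsupported here: skip */
        return ret;
}
```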
     
  • Separate out the two uses of netdev_max_backlog. One controls the
    upper bound on packets processed per softirq, the new name for this is
    netdev_budget; the other controls the limit on packets queued via
    netif_rx.

    Increase the max_backlog default to account for faster processors.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
    Eliminate the throttling behaviour when the netif receive queue fills,
    because it behaves badly when using high-speed networks under load.
    The throttling causes multiple packet drops that force TCP into
    slow-start mode. The same effective patch has been part of BIC TCP and
    H-TCP as well as part of Web100.

    The existing code drops hundreds of packets when the queue fills;
    this changes it to individual-packet drop-tail.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Remove the congestion sensing mechanism from netif_rx, and always
    return either full or empty. Almost no driver checks the return value
    from netif_rx, and those that do only use it for debug messages.

    The original design of netif_rx was to do flow control based on the
    receive queue, but NAPI has supplanted this and no driver uses the
    feedback.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
    Remove the last vestiges of the fastroute code, which is no longer used.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This patch implements Tom Kelly's Scalable TCP congestion control algorithm
    for the modular framework.

    The algorithm has some nice scaling properties, and has been used a fair bit
    in research, though is known to have significant fairness issues, so it's not
    really suitable for general purpose use.

    Signed-off-by: John Heffner
    Signed-off-by: David S. Miller

    John Heffner
     
    H-TCP is a congestion control algorithm developed at the Hamilton Institute by
    Douglas Leith and Robert Shorten. It extends the standard Reno algorithm
    with mode switching and is thus a relatively simple modification.

    H-TCP is defined in a layered manner, as it is still a research platform. The
    basic form includes the modification of beta according to the ratio of maxRTT
    to minRTT and the alpha = 2*factor*(1-beta) relation, where factor is dependent
    on the time since the last congestion event.

    The other layers improve convergence by adding appropriate factors to alpha.

    The following patch implements the H-TCP algorithm in its basic form.

    Signed-off-by: Baruch Even
    Signed-off-by: David S. Miller

    Baruch Even
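The basic-form relation quoted above, alpha = 2*factor*(1-beta), as a small numeric sketch; the factor and beta values in the usage note are illustrative only:

```c
/* H-TCP's basic form: alpha = 2 * factor * (1 - beta), where factor
 * grows with the time since the last congestion event and beta is
 * derived from the ratio of maxRTT to minRTT. */
static double htcp_alpha(double factor, double beta)
{
        return 2.0 * factor * (1.0 - beta);
}
```

For example, with factor = 1 and beta = 0.5 (Reno-like backoff), alpha comes out to 1, matching Reno's additive increase.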
     
  • TCP Vegas code modified for the new TCP infrastructure.
    Vegas now uses microsecond resolution timestamps for
    better estimation of performance over higher speed links.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • TCP Hybla congestion avoidance.

    - "In heterogeneous networks, TCP connections that incorporate a
    terrestrial or satellite radio link are greatly disadvantaged with
    respect to entirely wired connections, because of their longer round
    trip times (RTTs). To cope with this problem, a new TCP proposal, the
    TCP Hybla, is presented and discussed in the paper[1]. It stems from an
    analytical evaluation of the congestion window dynamics in the TCP
    standard versions (Tahoe, Reno, NewReno), which suggests the necessary
    modifications to remove the performance dependence on RTT.[...]"[1]

    [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for
    heterogeneous networks",
    International Journal of Satellite Communications and Networking
    Volume 22, Issue 5 , Pages 547 - 566. September 2004.

    Signed-off-by: Daniele Lacamera (root at danielinux.net)
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Daniele Lacamera
     
  • Sally Floyd's high speed TCP congestion control.
    This is useful for comparison and research.

    Signed-off-by: John Heffner
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    John Heffner
     
  • This is the existing 2.6.12 Westwood code moved from tcp_input
    to the new congestion framework. A lot of the inline functions
    have been eliminated to try to make it clearer.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • TCP BIC congestion control reworked to use the new congestion control
    infrastructure. This version is more up to date than the BIC
    code in 2.6.12; it incorporates enhancements from BICTCP 1.1,
    to handle low latency links.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Enhancement to the tcp_diag interface used by the iproute2 ss command
    to report the tcp congestion control being used by a socket.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Allow TCP to have multiple pluggable congestion control algorithms.
    Algorithms are defined by a set of operations and can be built in
    or modules. The legacy "new RENO" algorithm is used as a starting
    point and fallback.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This patch creates a new kstrdup library function and changes the "local"
    implementations in several places to use this function.

    Most of the changes come from the sound and net subsystems. The sound part
    had already been acknowledged by Takashi Iwai and the net part by David S.
    Miller.

    I left UML alone for now because I would need more time to read the code
    carefully before making changes there.

    Signed-off-by: Paulo Marques
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paulo Marques
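The new helper is essentially strlen + allocate + memcpy. A userspace sketch using malloc in place of kmalloc (the kernel version also takes a gfp_t flags argument):

```c
#include <stdlib.h>
#include <string.h>

/* Userspace model of kstrdup(): duplicate a NUL-terminated string
 * into freshly allocated memory; returns NULL on failure or when
 * given a NULL pointer. */
static char *kstrdup_model(const char *s)
{
        size_t len;
        char *buf;

        if (!s)
                return NULL;
        len = strlen(s) + 1;    /* include the terminating NUL */
        buf = malloc(len);
        if (buf)
                memcpy(buf, s, len);
        return buf;
}
```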
     

23 Jun, 2005

6 commits

  • Linus Torvalds
     
  • This patch is a follow up to patch 1 regarding "Selective Sub Address
    matching with call user data". It allows use of the Fast-Select-Acceptance
    optional user facility for X.25.

    This patch just implements fast select with no restriction on response
    (NRR). What this means (according to ITU-T Recommendation 10/96 section
    6.16) is that if in an incoming call packet, the relevant facility bits are
    set for fast-select-NRR, then the called DTE can issue a direct response to
    the incoming packet using a call-accepted packet that contains
    call-user-data. This patch allows such a response.

    The called DTE can also respond with a clear-request packet that contains
    call-user-data. However, this feature is currently not implemented by the
    patch.

    How is Fast Select Acceptance used?

    By default, the system does not allow fast select acceptance (as before).
    To enable a response to fast select acceptance, first create and bind a
    listen socket:

        call_soc = socket(AF_X25, SOCK_SEQPACKET, 0);
        bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));

    Before the listen system call is made, approve call acceptance with:

        ioctl(call_soc, SIOCX25CALLACCPTAPPRV);

    Now the listen system call can be made:

        listen(call_soc, 4);

    After this, an incoming-call packet will be accepted, but no call-accepted
    packet will be sent back until the following ioctl is issued on the socket
    that accepts the call:

        ioctl(vc_soc, SIOCX25SENDCALLACCPT);

    The network (or the cisco xot router used for testing here) will allow the
    application server's call-user-data in the call-accepted packet,
    provided the call-request was made with Fast-select NRR.

    Signed-off-by: Shaun Pereira
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Shaun Pereira
     
  • From: Shaun Pereira

    This is the first (independent of the second) patch of two that I am
    working on with x25 on linux (tested with xot on a cisco router). Details
    are as follows.

    Current state of module:

    A server using the current implementation (2.6.11.7) of the x25 module will
    accept a call request/ incoming call packet at the listening x.25 address,
    from all callers to that address, as long as NO call user data is present
    in the packet header.

    If the server needs to choose to accept a particular call request/ incoming
    call packet arriving at its listening x25 address, then the kernel has to
    allow a match of call user data present in the call request packet with its
    own. This is required when multiple servers listen at the same x25 address
    and device interface. The kernel currently matches ALL call user data, if
    present.

    Current Changes:

    This patch is a follow up to the patch submitted previously by Andrew
    Hendry, and allows the user to selectively control the number of octets of
    call user data in the call request packet, that the kernel will match. By
    default no call user data is matched, even if call user data is present.
    To allow call user data matching, a cudmatchlength > 0 has to be passed
    into the kernel after which the passed number of octets will be matched.
    Otherwise the kernel behaves exactly as in the original implementation.

    This patch also ensures that as is normally the case, no call user data
    will be present in the Call accepted / call connected packet sent back to
    the caller.

    Future Changes on next patch:

    There are cases however when call user data may be present in the call
    accepted packet. According to the X.25 recommendation (ITU-T 10/96)
    section 5.2.3.2 call user data may be present in the call accepted packet
    provided the fast select facility is used. My next patch will include this
    fast select utility and the ability to send up to 128 octets call user data
    in the call accepted packet provided the fast select facility is used. I
    am currently testing this, again with xot on linux and cisco.

    Signed-off-by: Shaun Pereira

    (With a fix from Alexey Dobriyan )
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Shaun Pereira
     
  • From: jlamanna@gmail.com

    ebtables.c vfree() checking cleanups.

    Signed-off-by: James Lamanna
    Signed-off-by: Domen Puncer
    Signed-off-by: David S. Miller

    James Lamanna
     
  • From: Nishanth Aravamudan

    Use msleep() instead of schedule_timeout() to guarantee the task
    delays as expected. The current code is not wrong, but it does not account for
    early return due to signals, so I think msleep() should be appropriate.

    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Domen Puncer
    Signed-off-by: David S. Miller

    Nishanth Aravamudan
     
    Signed-off-by: Chuck Short
    Signed-off-by: David S. Miller

    Chuck Short