14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

23 Dec, 2010

1 commit


21 Dec, 2010

1 commit

  • This patch changes the default initial receive window to 10 mss
    (defined constant). The default window is limited to the maximum
    of 10*1460 and 2*mss (when mss > 1460).

    draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends
    increasing TCP's initial congestion window to 10 mss or about 15KB.
    Leading up to this proposal were several large-scale live Internet
    experiments with an initial congestion window of 10 mss (IW10), where
    we showed that the average latency of HTTP responses improved by
    approximately 10%. This was accompanied by a slight increase in
    retransmission rate (0.5%), most of which is coming from applications
    opening multiple simultaneous connections. To understand the extreme
    worst case scenarios, and fairness issues (IW10 versus IW3), we further
    conducted controlled testbed experiments. We came away finding minimal
    negative impact even under low link bandwidths (dial-ups) and small
    buffers. These results are extremely encouraging for adopting IW10.

    However, an initial congestion window of 10 mss is useless unless a TCP
    receiver advertises an initial receive window of at least 10 mss.
    Fortunately, in the large-scale Internet experiments we found that most
    widely used operating systems advertised large initial receive windows
    of 64KB, allowing us to experiment with a wide range of initial
    congestion windows. Linux systems were among the few exceptions that
    advertised a small receive window of 6KB. The purpose of this patch is
    to fix this shortcoming.

    References:
    1. A comprehensive list of all IW10 references to date.
    http://code.google.com/speed/protocols/tcpm-IW10.html

    2. Paper describing results from large-scale Internet experiments with IW10.
    http://ccr.sigcomm.org/drupal/?q=node/621

    3. Controlled testbed experiments under worst case scenarios and a
    fairness study.
    http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

    4. Raw test data from testbed experiments (Linux senders/receivers)
    with initial congestion and receive windows of both 10 mss.
    http://research.csc.ncsu.edu/netsrv/?q=content/iw10

    5. Internet-Draft. Increasing TCP's Initial Window.
    https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/

    Signed-off-by: Nandita Dukkipati
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nandita Dukkipati
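
The window rule in the first paragraph of this commit message can be checked with a small calculation. The sketch below is not the kernel code: the function name and constants are hypothetical, and it only assumes the rule as stated (10 segments by default, and for mss > 1460 roughly the larger of 10*1460 bytes and 2*mss).

```c
#include <stdint.h>

/* Hypothetical names; the values are taken from the commit text. */
#define EX_DEFAULT_INIT_RCVWND 10u   /* segments */
#define EX_REFERENCE_MSS       1460u /* bytes */

/* Default initial receive window in bytes for a given MSS. */
static uint32_t ex_init_rcvwnd_bytes(uint32_t mss)
{
    uint32_t segs = EX_DEFAULT_INIT_RCVWND;

    if (mss > EX_REFERENCE_MSS) {
        /* Scale the segment count down so the window stays near 10*1460 bytes. */
        segs = (EX_REFERENCE_MSS * EX_DEFAULT_INIT_RCVWND) / mss;
        /* Never advertise fewer than two full segments. */
        if (segs < 2)
            segs = 2;
    }
    return segs * mss;
}
```

With a standard Ethernet MSS of 1460 this gives 14600 bytes; with a 9000-byte MSS the two-segment floor applies and the result is 18000 bytes.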
     

14 Dec, 2010

1 commit

  • Make all RTAX_ADVMSS metric accesses go through a new helper function,
    dst_metric_advmss().

    Leave the actual default metric as "zero" in the real metric slot,
    and compute the actual default value dynamically via a new dst_ops
    AF specific callback.

    For stacked IPSEC routes, we use the advmss of the path which
    preserves existing behavior.

    Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
    advmss on pmtu updates. This inconsistency in advmss handling
    results in more raw metric accesses than I wish we ended up with.

    Signed-off-by: David S. Miller

    David S. Miller
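
The helper pattern described here (a zero metric slot means "use the address-family default") might look roughly like the sketch below. All type and function names are hypothetical stand-ins, and the IPv4 default shown (MTU minus 40 header bytes, with a floor of 256) is an assumption for illustration only.

```c
struct ex_dst;

/* Per-address-family operations; only the callback relevant here is shown. */
struct ex_dst_ops {
    unsigned int (*default_advmss)(const struct ex_dst *dst);
};

struct ex_dst {
    const struct ex_dst_ops *ops;
    unsigned int advmss_metric; /* 0 means "not set, use the default" */
    unsigned int mtu;
};

/* Assumed IPv4-style default: MTU minus 40 bytes of headers, floored at 256. */
static unsigned int ex_ipv4_default_advmss(const struct ex_dst *dst)
{
    unsigned int advmss = dst->mtu > 40 ? dst->mtu - 40 : 0;

    return advmss < 256 ? 256 : advmss;
}

/* All readers go through this helper instead of touching the raw slot. */
static unsigned int ex_dst_metric_advmss(const struct ex_dst *dst)
{
    unsigned int advmss = dst->advmss_metric;

    if (!advmss)
        advmss = dst->ops->default_advmss(dst);
    return advmss;
}
```

Keeping the raw slot at zero means the default never has to be written into every route; it is computed only when someone asks for it.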
     

09 Dec, 2010

4 commits


03 Dec, 2010

1 commit

  • TCP_BASE_MSS is defined but not used.
    Commit 5d424d5a introduced this macro, so use
    it to initialize sysctl_tcp_base_mss.

    commit 5d424d5a674f782d0659a3b66d951f412901faee
    Author: John Heffner
    Date: Mon Mar 20 17:53:41 2006 -0800

    [TCP]: MTU probing

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

25 Nov, 2010

1 commit

  • In dev_pick_tx, don't do work in calculating the queue index or setting
    the index in the sock unless the device has more than one queue. This
    allows the sock to be set only with a queue index of a multi-queue
    device, which is desirable if devices are stacked, as in a tunnel.

    We also allow the mapping of a socket to queue to be changed. To
    maintain in-order packet transmission a flag (ooo_okay) has been
    added to the sk_buff structure. If a transport layer sets this flag
    on a packet, the transmit queue can be changed for the socket.
    Presumably, the transport would set this if there was no possibility
    of creating OOO packets (for instance, there are no packets in flight
    for the socket). This patch includes the modification in TCP output
    for setting this flag.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
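
The queue-selection behaviour described in this commit can be illustrated with a simplified model. The structures and the hash-modulo mapping below are hypothetical; the sketch only assumes what the message states: nothing is computed or cached for single-queue devices, and a cached mapping may change only when the packet carries ooo_okay.

```c
/* Hypothetical stand-ins for the socket and packet state involved. */
struct ex_sock {
    int tx_queue;        /* cached queue index, -1 when unset */
};

struct ex_skb {
    int ooo_okay;        /* transport says reordering cannot occur */
    unsigned int hash;   /* flow hash used to choose a queue */
};

static int ex_pick_tx(unsigned int num_queues, struct ex_sock *sk,
                      const struct ex_skb *skb)
{
    int q;

    /* Single-queue device: no hashing, and the sock is left untouched. */
    if (num_queues <= 1)
        return 0;

    q = sk->tx_queue;
    /* Remap only if nothing valid is cached or reordering is impossible. */
    if (q < 0 || q >= (int)num_queues || skb->ooo_okay) {
        q = (int)(skb->hash % num_queues);
        sk->tx_queue = q;
    }
    return q;
}
```

Because the single-queue path never writes sk->tx_queue, a tunnel device stacked on a multi-queue device does not overwrite the mapping chosen for the underlying device.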
     

18 Nov, 2010

1 commit

  • The current tcp_connect code completely ignores errors from sending an skb.
    This makes sense in many situations (like -ENOBUFS) but I want to be able to
    immediately fail connections if they are denied by the SELinux netfilter hook.
    Netfilter does not normally return ECONNREFUSED when it drops a packet, so we
    respect that error code as a final and fatal error that cannot be recovered.

    Based-on-patch-by: Patrick McHardy
    Signed-off-by: Eric Paris
    Signed-off-by: David S. Miller

    Eric Paris
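
The error policy described here can be summarised in a few lines. This is a sketch with a hypothetical function name, not the tcp_connect code itself; it assumes only the rule from the message: -ECONNREFUSED from the transmit path is fatal, and every other error is ignored so that the usual retransmission can recover.

```c
#include <errno.h>

/* Map the result of sending the initial SYN to the connect() outcome. */
static int ex_connect_result(int xmit_err)
{
    /* A refusal (for example from a netfilter hook) fails the connect now. */
    if (xmit_err == -ECONNREFUSED)
        return xmit_err;

    /* Transient errors such as -ENOBUFS are left to the retransmit timer. */
    return 0;
}
```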
     

02 Nov, 2010

1 commit

  • "gadget", "through", "command", "maintain", "maintain", "controller", "address",
    "between", "initiali[zs]e", "instead", "function", "select", "already",
    "equal", "access", "management", "hierarchy", "registration", "interest",
    "relative", "memory", "offset", "already",

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Jiri Kosina

    Uwe Kleine-König
     

24 Sep, 2010

1 commit


02 Sep, 2010

1 commit


23 Aug, 2010

1 commit

  • Via setsockopt it is possible to reduce the socket RX buffer
    (SO_RCVBUF). The TCP code that selects the initial window and window
    scaling option, tcp_select_initial_window(), currently misbehaves: it
    does not consider an RX socket buffer reduced via setsockopt.

    Even though the server's RX buffer is reduced via setsockopt() to 256
    bytes (initial window 384 bytes => 256 * 2 - (256 * 2 / 4)) the window
    scale option is still 7:

    192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0
    78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0
    192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0

    Within tcp_select_initial_window() the original space argument - a
    representation of the rx buffer size - is expanded. Only
    sysctl_tcp_rmem[2], sysctl_rmem_max and window_clamp are considered
    when calculating the initial window.

    This patch adjusts the window_clamp argument if the user explicitly
    reduces the receive buffer.

    Signed-off-by: Hagen Paul Pfeifer
    Cc: David S. Miller
    Cc: Patrick McHardy
    Cc: Eric Dumazet
    Cc: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
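
The arithmetic quoted in this message (256 * 2 - (256 * 2 / 4) = 384) and the mismatch with a window scale of 7 can be reproduced with the sketch below. The function names are hypothetical. It assumes that the requested SO_RCVBUF value is doubled, that one quarter of the buffer is reserved for overhead, and that a scale factor is only needed once the space no longer fits in the 16-bit window field.

```c
/* Usable window for `space` bytes of buffer, reserving one quarter. */
static int ex_win_from_space(int space)
{
    return space - (space / 4);
}

/* Initial window for a requested SO_RCVBUF value (assumed to be doubled). */
static int ex_initial_window(int requested_rcvbuf)
{
    return ex_win_from_space(requested_rcvbuf * 2);
}

/* Smallest shift that makes `space` fit in the 16-bit window field. */
static int ex_wscale_for_space(unsigned int space)
{
    int wscale = 0;

    while (space > 65535 && wscale < 14) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

Under these assumptions a 384-byte window needs no scaling at all, while a scale of 7 corresponds to a buffer in the megabyte range, which is consistent with the scale having been derived from the global limits rather than from the reduced buffer.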
     

21 Jul, 2010

1 commit


20 Jul, 2010

1 commit

  • It can happen that there are no packets in queue while calling
    tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
    NULL and that gets deref'ed to get sacked into a local var.

    There is no work to do if no packets are outstanding so we just
    exit early.

    This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
    guard to make joining diff nicer).

    Signed-off-by: Ilpo Järvinen
    Reported-by: Lennart Schulte
    Tested-by: Lennart Schulte
    Signed-off-by: David S. Miller

    Ilpo Järvinen
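
The shape of the fix is an early return before the first dereference of the queue head. The sketch below uses hypothetical types and a hypothetical helper, and shows only that guard, not the retransmission logic.

```c
#include <stddef.h>

struct ex_pkt {
    unsigned int sacked;
    struct ex_pkt *next;
};

struct ex_queue {
    struct ex_pkt *head;   /* NULL when nothing is outstanding */
};

/*
 * Copy the head packet's sacked flags into *sacked and return 1,
 * or return 0 without touching *sacked when the queue is empty.
 */
static int ex_head_sacked(const struct ex_queue *q, unsigned int *sacked)
{
    const struct ex_pkt *skb = q->head;

    if (skb == NULL)       /* no packets outstanding: nothing to do */
        return 0;

    *sacked = skb->sacked; /* safe: skb is known to be non-NULL here */
    return 1;
}
```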
     

13 Jul, 2010

1 commit


29 Jun, 2010

1 commit


16 Jun, 2010

1 commit

  • Unify TCP flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
    TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCPCB_FLAG_* are replaced
    with the corresponding TCPHDR_*.

    Signed-off-by: Changli Gao
    ----
    include/net/tcp.h                      |   24 ++++++-------
    net/ipv4/tcp.c                         |    8 ++--
    net/ipv4/tcp_input.c                   |    2 -
    net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
    net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
    net/netfilter/xt_TCPMSS.c              |    4 --
    6 files changed, 58 insertions(+), 71 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
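
For reference, the eight flags named in this commit correspond to the eight bits of the TCP header flags byte. The macro names below are the ones listed in the message and the values are the on-wire bit positions; the helper function is a hypothetical illustration of how such masks are typically combined.

```c
#include <stdint.h>

/* One bit per flag, in the order they occupy the TCP header flags byte. */
#define TCPHDR_FIN 0x01
#define TCPHDR_SYN 0x02
#define TCPHDR_RST 0x04
#define TCPHDR_PSH 0x08
#define TCPHDR_ACK 0x10
#define TCPHDR_URG 0x20
#define TCPHDR_ECE 0x40
#define TCPHDR_CWR 0x80

/* Hypothetical helper: true when both SYN and ACK are set. */
static int ex_is_synack(uint8_t flags)
{
    const uint8_t mask = TCPHDR_SYN | TCPHDR_ACK;

    return (flags & mask) == mask;
}
```

Using a single set of masks means the same constants can be applied both to the control-block flags and to the header byte itself.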
     

18 May, 2010

1 commit

  • Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
    if MD5+SACK+timestamps were used in initial SYN message.

    Some stacks (old Linux for example) try to negotiate MD5+SACK+TSTAMP
    sessions, but since 40 bytes of tcp options space are not enough to
    store all the bits needed, we chose to disable timestamps in this case.

    We send a SYN-ACK _without_ timestamp option, but socket has timestamps
    enabled and all further outgoing messages contain a TS block, all with
    the initial timestamp of the remote peer.

    The fix is to really disable the timestamps option for the whole session.

    Reported-by: Bijay Singh
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
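
The 40-byte limit mentioned here can be made concrete with a rough budget. The message does not spell out the byte counts, so the sketch below rests on assumptions: the option lengths come from the option definitions (MD5 signature 18 bytes, timestamps 10 bytes, SACK 2 bytes plus 8 per block), and each option is assumed to be padded to a 4-byte boundary. All names are hypothetical.

```c
#define EX_TCP_OPTION_SPACE  40 /* 60-byte maximum header minus 20 fixed bytes */
#define EX_MD5_ALIGNED       20 /* 18-byte MD5 option, padded */
#define EX_TS_ALIGNED        12 /* 10-byte timestamp option, padded */
#define EX_SACK_BASE_ALIGNED  4 /* 2-byte SACK header, padded */
#define EX_SACK_PER_BLOCK     8 /* two 32-bit sequence numbers per block */

/* How many SACK blocks fit next to the selected options. */
static int ex_max_sack_blocks(int use_md5, int use_ts)
{
    int remaining = EX_TCP_OPTION_SPACE;

    if (use_md5)
        remaining -= EX_MD5_ALIGNED;
    if (use_ts)
        remaining -= EX_TS_ALIGNED;

    /* Not even one block fits: SACK cannot be sent at all. */
    if (remaining < EX_SACK_BASE_ALIGNED + EX_SACK_PER_BLOCK)
        return 0;
    return (remaining - EX_SACK_BASE_ALIGNED) / EX_SACK_PER_BLOCK;
}
```

Under these assumptions MD5 plus timestamps leave 8 bytes, which is less than the 12 needed for a single SACK block, whereas dropping timestamps leaves room for two blocks.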
     

16 May, 2010

1 commit

  • TCP-MD5 sessions have intermittent failures when the route cache is
    invalidated. ip_queue_xmit() has to find a new route and calls
    sk_setup_caps(sk, &rt->u.dst), destroying the

    sk->sk_route_caps &= ~NETIF_F_GSO_MASK

    setting that MD5 desperately tries to apply along its path (from
    tcp_transmit_skb() for example).

    So we send a few bad packets, and everything is fine once
    tcp_transmit_skb() is called again for this socket.

    Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
    socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

    Reported-by: Bhaskar Dutta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
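
The idea of a persistent "never enable these" mask can be sketched as follows. The structure, function names and mask value are hypothetical; the sketch assumes only what the message describes: capability bits recorded in sk_route_nocaps are masked out every time the route capabilities are recomputed.

```c
#include <stdint.h>

#define EX_GSO_MASK 0x00ff0000u /* hypothetical stand-in for NETIF_F_GSO_MASK */

struct ex_sock {
    uint32_t route_caps;   /* capabilities currently in effect */
    uint32_t route_nocaps; /* capabilities that must stay disabled */
};

/* Recompute capabilities from the new route, honouring the mask. */
static void ex_setup_caps(struct ex_sock *sk, uint32_t dev_features)
{
    sk->route_caps = dev_features & ~sk->route_nocaps;
}

/* Permanently disable some capabilities for this socket. */
static void ex_nocaps_add(struct ex_sock *sk, uint32_t flags)
{
    sk->route_nocaps |= flags;
    sk->route_caps &= ~flags;
}
```

Because the mask lives in the socket, a route lookup at a lower layer cannot silently turn the capability back on.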
     

23 Apr, 2010

1 commit


21 Apr, 2010

1 commit


16 Apr, 2010

1 commit

  • As Herbert Xu said: we should be able to simply replace ipfragok
    with skb->local_df. Commit f88037 (sctp: Drop ipfargok in sctp_xmit function)
    has dropped ipfragok and set the local_df value properly.

    The patch kills the ipfragok parameter of .queue_xmit().

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

12 Apr, 2010

2 commits

  • Back in commit 04a0551c87363f100b04d28d7a15a632b70e18e7
    ("loopback: Drop obsolete ip_summed setting") we stopped
    setting CHECKSUM_UNNECESSARY in the loopback xmit.

    This is because such a setting was a lie since it implies that the
    checksum field of the packet is properly filled in.

    Instead what happens normally is that CHECKSUM_PARTIAL is set and
    skb->csum is calculated as needed.

    But this was only happening for TCP data packets (via the
    skb->ip_summed assignment done in tcp_sendmsg()). It doesn't
    happen for non-data packets like ACKs etc.

    Fix this by setting skb->ip_summed in the common non-data packet
    constructor. It already is setting skb->csum to zero.

    But this reminds us that we still have things like ip_output.c's
    ip_dev_loopback_xmit() which sets skb->ip_summed to the value
    CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
    valid. So we'll have to address that at some point too.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • inet: Remove unused send_check length argument

    This patch removes the unused length argument from the send_check
    function in struct inet_connection_sock_af_ops.

    Signed-off-by: Herbert Xu
    Tested-by: Yinghai
    Signed-off-by: David S. Miller

    Herbert Xu
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Mar, 2010

1 commit

  • Commit 4957faad (TCPCT part 1g: Responder Cookie => Initiator), part
    of TCP_COOKIE_TRANSACTION implementation, forgot to correctly size
    synack skb in case user data must be included.

    Many thanks to Mika Pentillä for spotting this error.

    Reported-by: Penttillä Mika
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Dec, 2009

2 commits

  • Add rtnetlink init_rcvwnd to set the TCP initial receive window size
    advertised by passive and active TCP connections.
    The current Linux TCP implementation limits the advertised TCP initial
    receive window to the one prescribed by slow start. For short-lived
    TCP connections used for transaction-type traffic (e.g. HTTP
    requests), bounding the advertised TCP initial receive window results
    in increased latency to complete the transaction.
    Setting the initial congestion window is already supported
    using rtnetlink init_cwnd, but the feature is useless without the
    ability to set a larger TCP initial receive window.
    The rtnetlink init_rcvwnd allows increasing the TCP initial receive
    window, allowing TCP connections to advertise a larger TCP receive
    window than the one bounded by slow start.

    Signed-off-by: Laurent Chavey
    Signed-off-by: David S. Miller

    laurent chavey
     
  • tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
    which again checks tcp_send_head, and this unnecessary check is
    done for every other caller of __tcp_push_pending_frames.

    Remove tcp_send_head check in __tcp_push_pending_frames and add
    the check to tcp_push_pending_frames. Other functions call
    __tcp_push_pending_frames only when tcp_send_head would evaluate
    to true.

    Signed-off-by: Krishna Kumar
    Acked-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Krishna Kumar
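
The restructuring described here moves a check from the inner function to its wrapper. The sketch below models that with hypothetical types and names (the inner function is called ex_push_unchecked here); it assumes only the split described in the message: the wrapper tests for pending data, and the inner function trusts its callers.

```c
struct ex_sock {
    int send_head_len; /* bytes waiting at the send head, 0 when empty */
    int pushes;        /* number of times the inner push actually ran */
};

/* Inner function: callers guarantee that there is something to send. */
static void ex_push_unchecked(struct ex_sock *sk)
{
    sk->pushes++;
    sk->send_head_len = 0;
}

/* Wrapper for callers that have not checked the send head themselves. */
static void ex_push_pending_frames(struct ex_sock *sk)
{
    if (sk->send_head_len > 0)
        ex_push_unchecked(sk);
}
```

Callers that already know data is pending can call the inner function directly and avoid testing the same condition twice.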
     

16 Dec, 2009

1 commit

  • It creates a regression, triggering badness for SYN_RECV
    sockets, for example:

    [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
    [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
    [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
    [19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
    [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

    This is likely caused by the change in the 'estab' parameter
    passed to tcp_parse_options() when invoked by the functions
    in net/ipv4/tcp_minisocks.c

    But even if that is fixed, the ->conn_request() changes made in
    this patch series are fundamentally wrong. They try to use the
    listening socket's 'dst' to probe the route settings. The
    listening socket doesn't even have a route, and you can't
    get the right route (the child request one) until much later
    after we setup all of the state, and it must be done by hand.

    This stuff really isn't ready, so the best thing to do is a
    full revert. This reverts the following commits:

    f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
    022c3f7d82f0f1c68018696f2f027b87b9bb45c2
    1aba721eba1d84a2defce45b950272cee1e6c72a
    cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
    345cda2fd695534be5a4494f1b59da9daed33663
    dc343475ed062e13fc260acccaab91d7d80fd5b2
    05eaade2782fb0c90d3034fd7a7d5a16266182bb
    6a2a2d6bf8581216e08be15fcb563cfd6c430e1e

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Dec, 2009

6 commits

  • Otherwise:

    ERROR: "sysctl_tcp_cookie_size" [net/ipv6/ipv6.ko] undefined!
    make[1]: *** [__modpost] Error 1

    Signed-off-by: David S. Miller

    David S. Miller
     
  • net/ipv4/tcp_output.c: In function ‘tcp_make_synack’:
    net/ipv4/tcp_output.c:2488: warning: cast from pointer to integer of different size

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Parse incoming TCP_COOKIE option(s).

    Calculate TCP_COOKIE option.

    Send optional data.

    This is a significantly revised implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    Requires:
    TCPCT part 1a: add request_values parameter for sending SYNACK
    TCPCT part 1b: generate Responder Cookie secret
    TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
    TCPCT part 1d: define TCP cookie option, extend existing struct's
    TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
    TCPCT part 1f: Initiator Cookie => Responder

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     
  • Calculate and format TCP_COOKIE option.

    This is a significantly revised implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    Requires:
    TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
    TCPCT part 1d: define TCP cookie option, extend existing struct's

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     
  • Define sysctl (tcp_cookie_size) to turn on and off the cookie option
    default globally, instead of a compiled configuration option.

    Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
    data values, retrieving variable cookie values, and other facilities.

    Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
    near its corresponding struct tcp_options_received (prior to changes).

    This is a straightforward re-implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    These functions will also be used in subsequent patches that implement
    additional features.

    Requires:
    net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     
  • Add optional function parameters associated with sending SYNACK.
    These parameters are not needed after sending SYNACK, and are not
    used for retransmission. Avoids extending struct tcp_request_sock,
    and avoids allocating kernel memory.

    Also affects DCCP as it uses common struct request_sock_ops,
    but this parameter is currently reserved for future use.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

24 Nov, 2009

1 commit

  • On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
    > It should be of the form:
    >     if (x &&
    >         y)
    >
    > or:
    >     if (x && y)
    >
    > Fix patches, rather than complaints, for existing cases where things
    > do not follow this pattern are certainly welcome.

    Also collapsed some multiple tabs to single space.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

29 Oct, 2009

1 commit