02 Sep, 2017

1 commit


26 Aug, 2017

1 commit

  • There are a few bugs around refcnt handling in the new BPF congestion
    control setsockopt:

    - The new ca is assigned to icsk->icsk_ca_ops even in the case where we
    cannot get a reference on it. This would lead to a use after free,
    since that ca is going away soon.

    - The case where the congestion control is changed doesn't release
    the refcnt on the previous ca.

    - In the reinit case, we first leak a reference on the old ca, then we
    call tcp_reinit_congestion_control on the ca that we have just
    assigned, leading to deinitializing the wrong ca (->release of the
    new ca on the old ca's data) and releasing the refcount on the ca
    that we actually want to use.

    This is visible by building (for example) BIC as a module and setting
    net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from
    samples/bpf.

    This patch fixes the refcount issues, and moves reinit back into tcp
    core to avoid passing a ca pointer back to BPF.

    Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
    Signed-off-by: Sabrina Dubroca
    Acked-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Sabrina Dubroca
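
The ordering this fix is after can be modeled in user space. The sketch below is not the kernel code: `ca_model`, `sock_model`, `ca_try_get()`, `ca_put()` and `set_ca()` are hypothetical names, and a plain integer stands in for module refcounting. It only illustrates the two rules from the message: never assign a ca without holding a reference, and release the previous ca exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a congestion control module. */
struct ca_model {
    const char *name;
    int refcnt;       /* references held by sockets */
    bool going_away;  /* module is unloading: no new references */
};

/* Hypothetical socket; `ca` plays the role of icsk->icsk_ca_ops. */
struct sock_model {
    struct ca_model *ca;
};

static bool ca_try_get(struct ca_model *ca)
{
    if (ca->going_away)
        return false;
    ca->refcnt++;
    return true;
}

static void ca_put(struct ca_model *ca)
{
    assert(ca->refcnt > 0);
    ca->refcnt--;
}

/* Returns 0 on success, -1 if no reference could be taken. */
static int set_ca(struct sock_model *sk, struct ca_model *new_ca)
{
    if (new_ca == sk->ca)
        return 0;
    if (!ca_try_get(new_ca))
        return -1;      /* keep the old ca; never assign without a ref */
    ca_put(sk->ca);     /* release the previous ca exactly once */
    sk->ca = new_ca;
    return 0;
}
```

With this ordering a failed switch leaves both the socket and all counters untouched, which is the property the first bullet in the message asks for.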
     

07 Aug, 2017

1 commit

  • Using ssthresh to revert cwnd is less reliable when ssthresh is
    bounded to 2 packets. This patch uses an existing variable in TCP
    "prior_cwnd" that snapshots the cwnd right before entering fast
    recovery and RTO recovery in Reno. This fixes the issue discussed
    in netdev thread: "A buggy behavior for Linux TCP Reno and HTCP"
    https://www.spinics.net/lists/netdev/msg444955.html

    Suggested-by: Neal Cardwell
    Reported-by: Wei Sun
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
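
A small user-space comparison shows why the snapshot is more reliable. This is a sketch under assumptions: the helper names are hypothetical, and the ssthresh rule is the usual Reno "half the window, floor of 2" rule rather than code quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Reno-style ssthresh: half the window, but never below 2 packets. */
static uint32_t reno_ssthresh(uint32_t cwnd)
{
    return max_u32(cwnd >> 1, 2);
}

/* Older approach: reconstruct the pre-loss cwnd by doubling ssthresh. */
static uint32_t undo_from_ssthresh(uint32_t cwnd, uint32_t ssthresh)
{
    return max_u32(cwnd, ssthresh << 1);
}

/* Approach described above: restore the snapshot taken before recovery. */
static uint32_t undo_from_prior_cwnd(uint32_t cwnd, uint32_t prior_cwnd)
{
    return max_u32(cwnd, prior_cwnd);
}
```

For a window of 3 packets the floor makes ssthresh 2, so doubling it yields 4 and overshoots the real pre-loss window, while the snapshot restores exactly 3. For large even windows the two approaches agree.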
     

02 Jul, 2017

1 commit

  • Added support for changing congestion control for SOCK_OPS bpf
    programs through the setsockopt bpf helper function. It also adds
    a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for
    congestion controls, like dctcp, that need to enable ECN in the
    SYN packets.

    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Lawrence Brakmo
     

03 Jun, 2017

1 commit

  • When the sender switches its congestion control during loss
    recovery, if the recovery is spurious then it may incorrectly
    revert cwnd and ssthresh to the older values set by a previous
    congestion control. Consider a congestion control (like BBR)
    that does not use ssthresh and keeps it infinite: the connection
    may incorrectly revert cwnd to an infinite value when switching
    from BBR to another congestion control.

    This patch fixes it by disallowing such cwnd undo operation
    upon switching congestion control. Note that undo_marker
    is not reset s.t. the packets that were incorrectly marked
    lost would be corrected. We only avoid undoing the cwnd in
    tcp_undo_cwnd_reduction().

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

27 Apr, 2017

1 commit

  • Always zero out ca_priv data in tcp_assign_congestion_control() so that
    ca_priv data is cleared out during socket creation.
    Also always zero out ca_priv data in tcp_reinit_congestion_control() so
    that when cc algorithm is changed, ca_priv data is cleared out as well.
    We should still zero out ca_priv data even in TCP_CLOSE state because
    user could call connect() on AF_UNSPEC to disconnect the socket and
    leave it in TCP_CLOSE state and later call setsockopt() to switch cc
    algorithm on this socket.

    Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
    Reported-by: Andrey Konovalov
    Signed-off-by: Wei Wang
    Acked-by: Eric Dumazet
    Acked-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Wei Wang
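
As a rough illustration of the rule, the following user-space sketch clears the private area on every switch, with no check of the connection state. `sock_model`, `reinit_ca()` and the array size are assumptions made for the example, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CA_PRIV_WORDS 8  /* hypothetical size of the private area */

struct sock_model {
    const char *ca_name;
    uint64_t ca_priv[CA_PRIV_WORDS];  /* scratch space owned by the cc module */
};

/* Switch algorithms and wipe whatever the previous one left behind. */
static void reinit_ca(struct sock_model *sk, const char *new_name)
{
    assert(new_name != NULL);
    sk->ca_name = new_name;
    /* Cleared unconditionally, whatever state the socket is in. */
    memset(sk->ca_priv, 0, sizeof(sk->ca_priv));
}
```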
     

23 Nov, 2016

1 commit

  • All conflicts were simple overlapping changes except perhaps
    for the Thunder driver.

    That driver has a change_mtu method explicitly for sending
    a message to the hardware. If that fails it returns an
    error.

    Normally a driver doesn't need an ndo_change_mtu method because those
    are usually just range changes, which are now handled generically.
    But since this extra operation is needed in the Thunder driver, it has
    to stay.

    However, if the message send fails we have to restore the original
    MTU before the change because the entire call chain expects that if
    an error is thrown by ndo_change_mtu then the MTU did not change.
    Therefore code is added to nicvf_change_mtu to remember the original
    MTU, and to restore it upon nicvf_update_hw_max_frs() failure.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Nov, 2016

2 commits

  • The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
    which un-does reno halving behaviour.

    It seems more appropriate to let congctl algorithms pair .ssthresh
    and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
    up for all congestion algorithms that used to rely on the fallback.

    Cc: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • We need to zero out the private data area when application switches
    connection to different algorithm (TCP_CONGESTION setsockopt).

    When congestion ops get assigned at connect time everything is already
    zeroed because sk_alloc uses GFP_ZERO flag. But in the setsockopt case
    this contains whatever previous cc placed there.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

21 Sep, 2016

1 commit

  • This commit introduces an optional new "omnipotent" hook,
    cong_control(), for congestion control modules. The cong_control()
    function is called at the end of processing an ACK (i.e., after
    updating sequence numbers, the SACK scoreboard, and loss
    detection). At that moment we have precise delivery rate information
    the congestion control module can use to control the sending behavior
    (using cwnd, TSO skb size, and pacing rate) in any CA state.

    This function can also be used by a congestion control that prefers
    not to use the default cwnd reduction approach (i.e., the PRR
    algorithm) during CA_Recovery to control the cwnd and sending rate
    during loss recovery.

    We take advantage of the fact that recent changes defer the
    retransmission or transmission of new data (e.g. by F-RTO) in recovery
    until the new tcp_cong_control() function is run.

    With this commit, we only run tcp_update_pacing_rate() if the
    congestion control is not using this new API. New congestion controls
    which use the new API do not want the TCP stack to run the default
    pacing rate calculation and overwrite whatever pacing rate they have
    chosen at initialization time.

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Yuchung Cheng
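
The dispatch described in this message can be sketched as an optional function pointer. Everything below is a simplified user-space model: the struct layouts, `ack_done()`, the toy modules and the pacing formula are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn_model;

struct ca_ops_model {
    void (*cong_avoid)(struct conn_model *c, uint32_t acked);
    /* Optional: when set, the module owns cwnd and pacing rate. */
    void (*cong_control)(struct conn_model *c, uint32_t delivered);
};

struct conn_model {
    const struct ca_ops_model *ops;
    uint32_t cwnd;
    uint64_t pacing_rate;
    int default_pacing_runs;  /* counts runs of the stack's own pacing update */
};

static void default_update_pacing_rate(struct conn_model *c)
{
    c->pacing_rate = (uint64_t)c->cwnd * 1000;  /* placeholder formula */
    c->default_pacing_runs++;
}

/* Called once per ACK, after loss detection has finished. */
static void ack_done(struct conn_model *c, uint32_t acked)
{
    assert(c->ops != NULL);
    if (c->ops->cong_control) {
        c->ops->cong_control(c, acked);
        return;  /* skip the default cwnd and pacing logic */
    }
    if (c->ops->cong_avoid)
        c->ops->cong_avoid(c, acked);
    default_update_pacing_rate(c);
}

/* Toy module using the classic hook. */
static void classic_cong_avoid(struct conn_model *c, uint32_t acked)
{
    c->cwnd += acked;
}

/* Toy module using the new hook; the rate is an arbitrary value. */
static void rate_based_control(struct conn_model *c, uint32_t delivered)
{
    (void)delivered;
    c->pacing_rate = 12345;
}
```

A module that fills in only `cong_avoid` keeps the stack's pacing update, while one that fills in `cong_control` has its chosen rate left alone.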
     

26 Sep, 2015

1 commit

  • SYNACK packets might be sent without holding socket lock.

    For DCTCP/ECN sake, we should call INET_ECN_xmit() while
    socket lock is owned, and only when we init/change congestion control.

    This also fixes a bug if the congestion module is changed from
    dctcp to another one on a listener: we now clear ECN bits
    properly.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Sep, 2015

1 commit

  • Currently, the following case doesn't use DCTCP, even if it should:
    A responder has e.g. Cubic as system wide default, but for a specific
    route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
    initiating host then uses DCTCP as congestion control, but since the
    initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
    and we have to fall back to Reno after 3WHS completes.

    We were thinking about how to solve this in a minimal, non-intrusive
    way without bloating tcp_ecn_create_request() needlessly: let's cache
    the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
    is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
    contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
    us to do only a single metric feature lookup inside tcp_ecn_create_request().

    Joint work with Florian Westphal.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Jul, 2015

2 commits

  • In the original design slow start is only used to raise cwnd
    when cwnd is strictly below ssthresh. It makes little sense
    to slow start when cwnd == ssthresh: especially
    when hystart has set ssthresh in the initial ramp, or after
    recovery when cwnd resets to ssthresh. Not doing so will
    also help reduce the buffer bloat slightly.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • Add a helper to test the slow start condition in various congestion
    control modules and other places. This is to prepare a slight improvement
    in policy as to exactly when to slow start.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Yuchung Cheng
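
Taken together, the two commits above amount to a one-line predicate. A minimal sketch, with `in_slow_start()` as a hypothetical user-space name for the helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Slow start applies only while cwnd is strictly below ssthresh. */
static bool in_slow_start(uint32_t cwnd, uint32_t ssthresh)
{
    return cwnd < ssthresh;
}
```

With cwnd == ssthresh, for instance right after recovery resets cwnd to ssthresh, the predicate is false and the connection proceeds in congestion avoidance.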
     

01 Jun, 2015

1 commit

  • Linux 3.17 and earlier are explicitly engineered so that if the app
    doesn't specifically request a CC module on a listener before the SYN
    arrives, then the child gets the system default CC when the connection
    is established. See tcp_init_congestion_control() in 3.17 or earlier,
    which says "if no choice made yet assign the current value set as
    default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
    created") altered these semantics, so that children got their parent
    listener's congestion control even if the system default had changed
    after the listener was created.

    This commit returns to those original semantics from 3.17 and earlier,
    since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
    Congestion control initialization."), and some Linux congestion
    control workflows depend on that.

    In summary, if a listener socket specifically sets TCP_CONGESTION to
    "x", or the route locks the CC module to "x", then the child gets
    "x". Otherwise the child gets current system default from
    net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
    earlier, and this commit restores that.

    Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
    Cc: Florian Westphal
    Cc: Daniel Borkmann
    Cc: Glenn Judd
    Cc: Stephen Hemminger
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: Yuchung Cheng
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Neal Cardwell
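
The selection rule in the summary paragraph can be written as a single choice. In this sketch `child_ca()` is a hypothetical helper, and `explicit_ca` stands for a module name pinned either by TCP_CONGESTION on the listener or by a route lock (NULL when neither applies); how the two explicit sources rank against each other is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * explicit_ca:    name pinned on the listener or locked by the route,
 *                 or NULL if no explicit choice was made.
 * system_default: value of net.ipv4.tcp_congestion_control at the time
 *                 the child connection is established.
 */
static const char *child_ca(const char *explicit_ca, const char *system_default)
{
    assert(system_default != NULL);
    return explicit_ca ? explicit_ca : system_default;
}
```

Because the default is read when the child is created, changing the sysctl after the listener exists still affects new children that have no explicit choice.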
     

21 Mar, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by git were very unhelpful, to say the least. It
    split functions in half and moved them around, when the actual
    conflict existed solely inside one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Mar, 2015

1 commit

  • The recent change to tcp_cong_avoid_ai() to handle stretch ACKs
    introduced a bug where snd_cwnd_cnt could accumulate a very large
    value while w was large, and then if w was reduced snd_cwnd could be
    incremented by a large delta, leading to a large burst and high packet
    loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt =
    100 * cwnd".

    This bug crept in while preparing the upstream version of
    814d488c6126.

    Testing: This patch has been tested in datacenter netperf transfers
    and live youtube.com and google.com servers.

    Fixes: 814d488c6126 ("tcp: fix the timid additive increase on stretch ACKs")
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
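
The burst can be reproduced with a small user-space model. Both functions below are sketches with hypothetical names; the guard in `ai_guarded()` is one way to bound the jump and is an assumption of this example, since the message describes the bug rather than the exact change.

```c
#include <assert.h>
#include <stdint.h>

struct cwnd_model {
    uint32_t snd_cwnd;
    uint32_t snd_cwnd_cnt;  /* ACK credits collected toward the next increase */
};

/* Unguarded: credits collected under a large w are all cashed in at once. */
static void ai_unguarded(struct cwnd_model *tp, uint32_t w, uint32_t acked)
{
    assert(w > 0);
    tp->snd_cwnd_cnt += acked;
    if (tp->snd_cwnd_cnt >= w) {
        uint32_t delta = tp->snd_cwnd_cnt / w;

        tp->snd_cwnd_cnt -= delta * w;
        tp->snd_cwnd += delta;
    }
}

/* Guarded: stale credits from a larger w buy at most one increment. */
static void ai_guarded(struct cwnd_model *tp, uint32_t w, uint32_t acked)
{
    assert(w > 0);
    if (tp->snd_cwnd_cnt >= w) {
        tp->snd_cwnd_cnt = 0;
        tp->snd_cwnd++;
    }
    tp->snd_cwnd_cnt += acked;
    if (tp->snd_cwnd_cnt >= w) {
        uint32_t delta = tp->snd_cwnd_cnt / w;

        tp->snd_cwnd_cnt -= delta * w;
        tp->snd_cwnd += delta;
    }
}
```

Starting from cwnd 10 with 500 credits collected while w was 1000, a single ACK at w = 10 raises cwnd to 60 in the unguarded version and to 11 in the guarded one.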
     

21 Feb, 2015

1 commit


06 Feb, 2015

1 commit

  • Conflicts:
    drivers/net/vxlan.c
    drivers/vhost/net.c
    include/linux/if_vlan.h
    net/core/dev.c

    The net/core/dev.c conflict was the overlap of one commit marking an
    existing function static whilst another was adding a new function.

    In the include/linux/if_vlan.h case, the type used for a local
    variable was changed in 'net', whereas the function got rewritten
    to fix a stacked vlan bug in 'net-next'.

    In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
    overlapped with an endianness fix for VHOST 1.0 in 'net'.

    In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
    in 'net-next' whereas in 'net' there was a bug fix to pass in the
    correct network namespace pointer in calls to this function.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Jan, 2015

3 commits

  • Change Reno to properly handle stretch ACKs in additive increase mode
    by passing in the count of ACKed packets to tcp_cong_avoid_ai().

    In addition, if snd_cwnd crosses snd_ssthresh during slow start
    processing, and we then exit slow start mode, we need to carry over
    any remaining "credit" for packets ACKed and apply that to additive
    increase by passing this remaining "acked" count to
    tcp_cong_avoid_ai().

    Reported-by: Eyal Perry
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on
    "stretch ACKs" -- cases where the receiver ACKed more than 1 packet in
    a single ACK. For example, suppose w is 10 and we get a stretch ACK
    for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2
    (since acked/w = 20/10 = 2), but instead we were only increasing cwnd
    by 1. This patch fixes that behavior.

    Reported-by: Eyal Perry
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
    cover more than the RFC-specified maximum of 2 packets. These stretch
    ACKs can cause serious performance shortfalls in common congestion
    control algorithms that were designed and tuned years ago with
    receiver hosts that were not using LRO or GRO, and were instead
    politely ACKing every other packet.

    This patch series fixes Reno and CUBIC to handle stretch ACKs.

    This patch prepares for the upcoming stretch ACK bug fix patches. It
    adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
    fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
    changes all congestion control algorithms to pass in 1 for the ACKed
    count. It also changes tcp_slow_start() to return the number of packet
    ACK "credits" that were not processed in slow start mode, and can be
    processed by the congestion control module in additive increase mode.

    In future patches we will fix tcp_cong_avoid_ai() to handle stretch
    ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
    and additive increase mode.

    Reported-by: Eyal Perry
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
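
The three commits above fit together as follows. This is a simplified user-space model, not the kernel source: the struct and function names are hypothetical, clamping is omitted, and the strict `cwnd < ssthresh` test for slow start is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

struct cwnd_model {
    uint32_t snd_cwnd;
    uint32_t snd_ssthresh;
    uint32_t snd_cwnd_cnt;  /* ACK credits collected toward the next increase */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Grow cwnd by up to `acked`, stop at ssthresh, return unused credits. */
static uint32_t slow_start(struct cwnd_model *tp, uint32_t acked)
{
    uint32_t cwnd;

    assert(tp->snd_cwnd < tp->snd_ssthresh);
    cwnd = min_u32(tp->snd_cwnd + acked, tp->snd_ssthresh);
    acked -= cwnd - tp->snd_cwnd;
    tp->snd_cwnd = cwnd;
    return acked;
}

/* Additive increase: one extra packet of cwnd per w packets ACKed. */
static void cong_avoid_ai(struct cwnd_model *tp, uint32_t w, uint32_t acked)
{
    assert(w > 0);
    tp->snd_cwnd_cnt += acked;
    if (tp->snd_cwnd_cnt >= w) {
        uint32_t delta = tp->snd_cwnd_cnt / w;

        tp->snd_cwnd_cnt -= delta * w;
        tp->snd_cwnd += delta;
    }
}

/* Reno: slow start first, then spend leftover credits on additive increase. */
static void reno_cong_avoid(struct cwnd_model *tp, uint32_t acked)
{
    if (tp->snd_cwnd < tp->snd_ssthresh) {
        acked = slow_start(tp, acked);
        if (!acked)
            return;
    }
    cong_avoid_ai(tp, tp->snd_cwnd, acked);
}
```

With w = 10 and a stretch ACK for 20 packets, `cong_avoid_ai()` raises cwnd by 2, matching the example in the second commit. With cwnd 8, ssthresh 10 and 5 packets ACKed, slow start uses 2 credits to reach ssthresh and the remaining 3 are carried into additive increase.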
     

06 Jan, 2015

2 commits

  • This patch adds necessary infrastructure to the congestion control
    framework for later per route congestion control support.

    For a per route congestion control possibility, our aim is to store
    a unique u32 key identifier into dst metrics, which can then be
    mapped into a tcp_congestion_ops struct. We argue that having a
    RTAX key entry is the most simple, generic and easy way to manage,
    and also keeps the memory footprint of dst entries lower on 64 bit
    than with storing a pointer directly, for example. Having a unique
    key id also allows for decoupling actual TCP congestion control
    module management from the FIB layer, i.e. we don't have to care
    about expensive module refcounting inside the FIB at this point.

    We first thought of using an IDR store for the realization, which
    takes over dynamic assignment of unused key space and also performs
    the key to pointer mapping in RCU. While doing so, we stumbled upon
    the issue that due to the nature of dynamic key distribution, it
    just so happens, arguably in very rare occasions, that excessive
    module loads and unloads can lead to a possible reuse of previously
    used key space. Thus, previously stale keys in the dst metric are
    now being reassigned to a different congestion control algorithm,
    which might lead to unexpected behaviour. One way to resolve this
    would have been to walk FIBs on the actually rare occasion of a
    module unload and reset the metric keys for each FIB in each netns,
    but that's just very costly.

    Therefore, we argue a better solution is to reuse the unique
    congestion control algorithm name member and map that into u32 key
    space through jhash. For that, we split the flags attribute (as it
    currently uses 2 bits only anyway) into two u32 attributes, flags
    and key, so that we can keep the cacheline boundary of 2 cachelines
    on x86_64 and cache the precalculated key at registration time for
    the fast path. On average we might expect 2 - 4 modules being loaded,
    worst case perhaps 15, so a key collision possibility is extremely
    low, and guaranteed collision-free on LE/BE for all in-tree modules.
    Overall this results in much simpler code, and all without the
    overhead of an IDR. Due to the deterministic nature, modules can
    now be unloaded, the congestion control algorithm for a specific
    but unloaded key will fall back to the default one, and on module
    reload time it will switch back to the expected algorithm
    transparently.

    Joint work with Florian Westphal.

    Signed-off-by: Florian Westphal
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We can just move this to an extra function and make the code
    a bit more readable, no functional change.

    Joint work with Florian Westphal.

    Signed-off-by: Florian Westphal
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
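
The name-to-key scheme from the first commit above can be sketched in user space. Note the substitution: the djb2 string hash below is a stand-in for jhash, chosen only to keep the example short, and the table, its size and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CA 8  /* hypothetical table size */

struct ca_entry {
    const char *name;
    uint32_t key;  /* precalculated once, at registration time */
};

static struct ca_entry ca_table[MAX_CA];
static size_t ca_count;

/* djb2 string hash, used here in place of jhash. */
static uint32_t name_to_key(const char *name)
{
    uint32_t h = 5381;

    assert(name != NULL);
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h;
}

/* Returns 0 on success, -1 on a full table or a key collision. */
static int ca_register(const char *name)
{
    uint32_t key = name_to_key(name);

    for (size_t i = 0; i < ca_count; i++)
        if (ca_table[i].key == key)
            return -1;
    if (ca_count == MAX_CA)
        return -1;
    ca_table[ca_count].name = name;
    ca_table[ca_count].key = key;
    ca_count++;
    return 0;
}

static void ca_unregister(const char *name)
{
    for (size_t i = 0; i < ca_count; i++) {
        if (strcmp(ca_table[i].name, name) == 0) {
            ca_table[i] = ca_table[--ca_count];
            return;
        }
    }
}

/* A key with no registered module falls back to the default. */
static const char *ca_find_by_key(uint32_t key, const char *fallback)
{
    for (size_t i = 0; i < ca_count; i++)
        if (ca_table[i].key == key)
            return ca_table[i].name;
    return fallback;
}
```

Because the key depends only on the name, a key stored for "cubic" resolves to the fallback while the module is unloaded and resolves to "cubic" again after reload, with no need to walk or rewrite the stored keys.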
     

05 Nov, 2014

1 commit


01 Oct, 2014

1 commit


29 Sep, 2014

1 commit

  • Split assignment and initialization from one into two functions.

    This is required by followup patches that add Datacenter TCP
    (DCTCP) congestion control algorithm - we need to be able to
    determine if the connection is moderated by DCTCP before the
    3WHS has finished.

    As we walk the available congestion control list during the
    assignment, we are always guaranteed to have Reno present as
    it's fixed compiled-in. Therefore, since we're doing the
    early assignment, we don't have a real use for the Reno alias
    tcp_init_congestion_ops anymore and can thus remove it.

    Actual use of the congestion control operations happens only
    after the 3WHS has finished; in some cases, however, we
    can access get_info() via diag if implemented, therefore we
    need to zero out the private area for those modules.

    Joint work with Daniel Borkmann and Glenn Judd.

    Signed-off-by: Florian Westphal
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Glenn Judd
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Florian Westphal
     

02 Sep, 2014

1 commit

  • Fix places where there is space before tab, long lines, and
    awkward if(){, double spacing etc. Add blank line after
    declaration/initialization.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

04 May, 2014

1 commit


03 May, 2014

1 commit

  • Yuchung discovered tcp_is_cwnd_limited() was returning false in
    slow start phase even if the application filled the socket write queue.

    All congestion modules take into account tcp_is_cwnd_limited()
    before increasing cwnd, so this behavior limits slow start from
    probing the bandwidth at full speed.

    The problem is that even if write queue is full (aka we are _not_
    application limited), cwnd can be under utilized if TSO should auto
    defer or TCP Small queues decided to hold packets.

    So the in_flight can be kept to a smaller value, and we can get to the
    point where tcp_is_cwnd_limited() returns false.

    With TCP Small Queues and FQ/pacing, this issue is more visible.

    We fix this by having tcp_cwnd_validate(), which is supposed to track
    such things, take into account unsent_segs, the number of segs that we
    are not sending at the moment due to TSO or TSQ, but intend to send
    real soon. Then when we are cwnd-limited, remember this fact while we
    are processing the window of ACKs that comes back.

    For example, suppose we have a brand new connection with cwnd=10; we
    are in slow start, and we send a flight of 9 packets. By the time we
    have received ACKs for all 9 packets we want our cwnd to be 18.
    We implement this by setting tp->lsnd_pending to 9, and
    considering ourselves to be cwnd-limited while cwnd is less than
    twice tp->lsnd_pending (2*9 -> 18).

    This makes tcp_is_cwnd_limited() more understandable, by removing
    the GSO/TSO kludge, that tried to work around the issue.

    Note the in_flight parameter can be removed in a followup cleanup
    patch.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
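
The worked example in the message (cwnd=10, a flight of 9 packets, target cwnd of 18) can be checked with a few lines of user-space C. The function names are hypothetical, and growth of one packet per ACK is the simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In slow start, stay cwnd-limited until cwnd reaches twice the last flight. */
static bool cwnd_limited_in_slow_start(uint32_t cwnd, uint32_t lsnd_pending)
{
    return cwnd < 2 * lsnd_pending;
}

/* Process one ACK per packet of the flight, growing cwnd only while limited. */
static uint32_t grow_over_flight(uint32_t cwnd, uint32_t lsnd_pending)
{
    assert(lsnd_pending > 0);
    for (uint32_t ack = 0; ack < lsnd_pending; ack++)
        if (cwnd_limited_in_slow_start(cwnd, lsnd_pending))
            cwnd++;
    return cwnd;
}
```

Starting at cwnd 10 with 9 packets pending, the first 8 ACKs each add one packet and the ninth finds cwnd already at 18, so growth stops exactly at 2*9.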
     

06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Feb, 2014

1 commit

  • tcp_is_cwnd_limited() allows GSO/TSO enabled flows to increase
    their cwnd to allow a full size (64KB) TSO packet to be sent.

    Non GSO flows only allow an extra room of 3 MSS.

    For most flows with a BDP below 10 MSS, this results in cwnd
    bloating to as much as 90, and in an inflated RTT.

    Thanks to TSO auto sizing, we can restrict the bloat to the number
    of MSS contained in a TSO packet (tp->xmit_size_goal_segs), to keep
    original intent without performance impact.

    Because we keep cwnd small, this helps keep TSO packets at their
    optimal size.

    Example for a 10Mbit flow, with low TCP Small queue limits (no more than
    2 skb in qdisc/device tx ring)

    Before patch :

    lpk51:~# ./ss -i dst lpk52:44862 | grep cwnd
    cubic wscale:6,6 rto:215 rtt:15.875/2.5 mss:1448 cwnd:96
    ssthresh:96
    send 70.1Mbps unacked:14 rcv_space:29200

    After patch :

    lpk51:~# ./ss -i dst lpk52:52916 | grep cwnd
    cubic wscale:6,6 rto:206 rtt:5.206/0.036 mss:1448 cwnd:15
    ssthresh:14
    send 33.4Mbps unacked:4 rcv_space:29200

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Nandita Dukkipati
    Cc: Van Jacobson
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Feb, 2014

1 commit


05 Nov, 2013

1 commit

  • Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
    regardless of the number of packets. Consequently slow start performance
    is highly dependent on the degree of the stretch ACKs caused by
    receiver or network ACK compression mechanisms (e.g., delayed-ACK,
    GRO, etc). But the slow start algorithm is meant to send twice the number
    of packets that have left the network, so it should process a stretch ACK
    of degree N as if it were N ACKs of degree 1, and exit when cwnd exceeds
    ssthresh. A follow up patch will use the remainder of the N (if greater
    than 1) to adjust cwnd in the congestion avoidance phase.

    In addition this patch retires the experimental limited slow start
    (LSS) feature. LSS has multiple drawbacks but questionable benefit. The
    fractional cwnd increase in LSS requires a loop in slow start even
    though it's rarely used. Configuring such an increase step via a global
    sysctl on different BDPs seems hard. Finally and most importantly the
    slow start overshoot concern is now better covered by the Hybrid slow
    start (hystart) enabled by default.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

06 Feb, 2013

1 commit

  • TCP Appropriate Byte Count was added by me, but later disabled.
    There is no point in maintaining it since it is a potential source
    of bugs and Linux already implements other better window protection
    heuristics.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

04 Feb, 2013

1 commit

  • Since commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start()),
    a zero snd_cwnd triggers an infinite loop in tcp_slow_start().

    Avoid this infinite loop and log a one time error for further
    analysis. FRTO code is suspected to cause this bug.

    Reported-by: Pasi Kärkkäinen
    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Dec, 2012

1 commit

  • Pull trivial branch from Jiri Kosina:
    "Usual stuff -- comment/printk typo fixes, documentation updates, dead
    code elimination."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    HOWTO: fix double words typo
    x86 mtrr: fix comment typo in mtrr_bp_init
    propagate name change to comments in kernel source
    doc: Update the name of profiling based on sysfs
    treewide: Fix typos in various drivers
    treewide: Fix typos in various Kconfig
    wireless: mwifiex: Fix typo in wireless/mwifiex driver
    messages: i2o: Fix typo in messages/i2o
    scripts/kernel-doc: check that non-void fcts describe their return value
    Kernel-doc: Convention: Use a "Return" section to describe return values
    radeon: Fix typo and copy/paste error in comments
    doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
    various: Fix spelling of "asynchronous" in comments.
    Fix misspellings of "whether" in comments.
    eisa: Fix spelling of "asynchronous".
    various: Fix spelling of "registered" in comments.
    doc: fix quite a few typos within Documentation
    target: iscsi: fix comment typos in target/iscsi drivers
    treewide: fix typo of "suport" in various comments and Kconfig
    treewide: fix typo of "suppport" in various comments
    ...

    Linus Torvalds
     

19 Nov, 2012

2 commits

  • Signed-off-by: Masanari Iida
    Signed-off-by: Jiri Kosina

    Masanari Iida
     
  • Allow an unprivileged user who has created a user namespace, and then
    created a network namespace to effectively use the new network
    namespace, by reducing capable(CAP_NET_ADMIN) and
    capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
    CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.

    Settings that merely control a single network device are allowed.
    Either the network device is a logical network device where
    restrictions make no difference or the network device is a hardware NIC
    that has been explicitly moved from the initial network namespace.

    In general policy and network stack state changes are allowed
    while resource control is left unchanged.

    Allow creating raw sockets.
    Allow the SIOCSARP ioctl to control the arp cache.
    Allow the SIOCSIFFLAGS ioctl to allow setting network device flags.
    Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
    Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
    Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
    Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
    Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

    Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
    adding, changing and deleting gre tunnels.

    Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
    adding, changing and deleting ipip tunnels.

    Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
    adding, changing and deleting ipsec virtual tunnel interfaces.

    Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
    MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
    sockets.

    Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
    arbitrary ip options.

    Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
    Allow setting the IP_TRANSPARENT ipv4 socket option.
    Allow setting the TCP_REPAIR socket option.
    Allow setting the TCP_CONGESTION socket option.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

02 Aug, 2012

1 commit