28 Apr, 2007

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
    selinux: preserve boolean values across policy reloads
    selinux: change numbering of boolean directory inodes in selinuxfs
    selinux: remove unused enumeration constant from selinuxfs
    selinux: explicitly number all selinuxfs inodes
    selinux: export initial SID contexts via selinuxfs
    selinux: remove userland security class and permission definitions
    SELinux: move security_skb_extlbl_sid() out of the security server
    MAINTAINERS: update selinux entry
    SELinux: rename selinux_netlabel.h to netlabel.h
    SELinux: extract the NetLabel SELinux support from the security server
    NetLabel: convert a BUG_ON in the CIPSO code to a runtime check
    NetLabel: cleanup and document CIPSO constants

    Linus Torvalds
     

27 Apr, 2007

1 commit


26 Apr, 2007

38 commits

  • This patch changes a BUG_ON in the CIPSO code to a runtime check. It should
    also increase the readability of the code as it replaces an unexplained
    constant with a well defined macro.

    Signed-off-by: Paul Moore
    Signed-off-by: James Morris

    Paul Moore
     
  • This patch collects all of the CIPSO constants and puts them in one place; it
    also documents each value explaining how the value is derived.

    Signed-off-by: Paul Moore
    Signed-off-by: James Morris

    Paul Moore
     
  • Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • This patch moves the SNMP code shared between IPv4/IPv6 from proc.c
    into net/ipv4/af_inet.c. This makes sense because these functions
    aren't specific to /proc.

    As a result we can again skip proc.o if /proc is disabled.

    Signed-off-by: Herbert Xu
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • To avoid raw division, use ktime_to_timeval() to get usec.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Rather than using a copy of vegas code, the YEAH code should just have
    it exported so there is common code.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Make some simple changes to make the congestion control API faster/cleaner:
    * use ktime_t rather than timeval
    * merge RTT sampling into the existing ack callback, which means
      one indirect call per ack instead of two
    * use flag bits to store options/settings

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This version more closely matches the paper, and fixes several
    math errors. The biggest difference is that it updates alpha/beta
    once per RTT.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This is a (mostly) automated change using magic:

    sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp\([^{]*\n{\n\)|
    struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g'
    -e 's|struct sock \*sk, struct tcp_sock \*tp|
    struct sock \*sk|g' -e 's|sk, tp\([^-]\)|sk\1|g'

    Fixed four unused variable (tp) warnings that were introduced.

    In addition, manually added newlines after local variables and
    tweaked function arguments positioning.

    $ gcc --version
    gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
    ...
    $ codiff -fV built-in.o.old built-in.o.new
    net/ipv4/route.c:
    rt_cache_flush | +14
    1 function changed, 14 bytes added

    net/ipv4/tcp.c:
    tcp_setsockopt | -5
    tcp_sendpage | -25
    tcp_sendmsg | -16
    3 functions changed, 46 bytes removed

    net/ipv4/tcp_input.c:
    tcp_try_undo_recovery | +3
    tcp_try_undo_dsack | +2
    tcp_mark_head_lost | -12
    tcp_ack | -15
    tcp_event_data_recv | -32
    tcp_rcv_state_process | -10
    tcp_rcv_established | +1
    7 functions changed, 6 bytes added, 69 bytes removed, diff: -63

    net/ipv4/tcp_output.c:
    update_send_head | -9
    tcp_transmit_skb | +19
    tcp_cwnd_validate | +1
    tcp_write_wakeup | -17
    __tcp_push_pending_frames | -25
    tcp_push_one | -8
    tcp_send_fin | -4
    7 functions changed, 20 bytes added, 63 bytes removed, diff: -43

    built-in.o.new:
    18 functions changed, 40 bytes added, 178 bytes removed, diff: -138

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • The function is quite big, has several call sites, and offers nothing
    for the compiler to collapse when it is inlined.

    Besides, it's nicer to read in a .c file.

    Signed-off-by: Andi Kleen
    Signed-off-by: David S. Miller

    Andi Kleen
     
  • Spring cleaning time...

    There seem to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. The most common is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This is an implementation of TCP Illinois, invented by Shao Liu
    at the University of Illinois. It is another variant of Reno which adapts
    the alpha and beta parameters based on RTT. The basic idea is to increase
    the window less rapidly as delay approaches the maximum. See the papers
    and talks to get a more complete description.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This also fixes memory leak in error path.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Add IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER. This option forces
    us not to fragment, but does not make use of the kernel path MTU discovery.
    That is, it allows for user-mode MTU probing (or, packetization-layer path
    MTU discovery). This is particularly useful for diagnostic utilities, like
    traceroute/tracepath.

    Signed-off-by: John Heffner
    Signed-off-by: David S. Miller

    John Heffner
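
A minimal user-space sketch of how a diagnostic tool might request this mode on an IPv4 UDP socket. The helper name open_probe_socket() is hypothetical, and the fallback value 3 for IP_PMTUDISC_PROBE is an assumption used only when the system headers predate the option.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_PMTUDISC_PROBE
#define IP_PMTUDISC_PROBE 3 /* assumed value for headers that lack it */
#endif

/*
 * Hypothetical helper: open a UDP socket that never fragments
 * but does not consult the kernel's cached path MTU.
 * Returns the descriptor, or -1 on failure.
 */
static int open_probe_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int val = IP_PMTUDISC_PROBE;

    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A tool in the style of tracepath would then send datagrams of varying sizes on this socket and interpret errors or ICMP responses itself.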
     
  • Since we're now holding the rtnl during the entire dump operation, we can
    remove additional locking for rtnl protected data. This patch does that
    for all simple cases (dev_base_lock for dev_base walking, RCU protection
    for FIB rule dumping).

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Switch cb_lock to mutex and allow netlink kernel users to override it
    with a subsystem specific mutex for consistent locking in dump callbacks.
    All netlink_dump_start users have been audited not to rely on any
    side-effects of the previously used spinlock.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • All LOG targets always use their internal logging function nowadays, so
    remove the incorrect error message and handle real errors (!= -EEXIST)
    by failing to load.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • When mangling packets forwarded to a HW checksumming capable device,
    offload recalculation of the checksum instead of doing it in software.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The skb transport pointer is currently used to specify the start
    of the checksum region for transmit checksum offload. Unfortunately,
    the same pointer is also used during receive side processing.

    This creates a problem when we want to retransmit a received
    packet with partial checksums since the skb transport pointer
    would be overwritten.

    This patch solves this problem by creating a new 16-bit csum_start
    offset value to replace the skb transport header for the purpose
    of checksums. This offset is calculated from skb->head so that
    it does not have to change when skb->data changes.

    No extra space is required since csum_offset itself fits within
    a 16-bit word so we can use the other 16 bits for csum_start.

    For backwards compatibility, just before we push a packet with
    partial checksums off into the device driver, we set the skb
    transport header to what it would have been under the old scheme.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
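
The point about measuring from skb->head can be illustrated with a simplified stand-in for the buffer. This is only a sketch: struct buf_sketch and its helpers are hypothetical, not the kernel's sk_buff API, and only the field names csum_start and csum_offset come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an sk_buff. */
struct buf_sketch {
    uint8_t *head;        /* start of the allocated buffer, never moves */
    uint8_t *data;        /* start of current payload, moves on push/pull */
    uint16_t csum_start;  /* offset from head where checksumming begins */
    uint16_t csum_offset; /* offset from csum_start to the checksum field */
};

/* Record where the checksummed region starts, relative to head. */
static void set_csum_start(struct buf_sketch *b, uint8_t *transport_hdr)
{
    b->csum_start = (uint16_t)(transport_hdr - b->head);
}

/* Start of the checksummed region; independent of b->data. */
static uint8_t *csum_region(const struct buf_sketch *b)
{
    return b->head + b->csum_start;
}

/* Location where the computed checksum would be stored. */
static uint8_t *csum_field(const struct buf_sketch *b)
{
    return csum_region(b) + b->csum_offset;
}
```

Because both helpers are computed from head, moving data while headers are pushed or pulled leaves the checksum region untouched, and the two 16-bit fields together occupy one 32-bit word.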
     
  • esp_init_state doesn't account for the beet pseudo header in the header_len
    calculation, which may result in undersized skbs hitting xfrm4_beet_output,
    causing unnecessary reallocations in ip_finish_output2.

    The skbs should still always have enough room to avoid causing
    skb_under_panic in skb_push since we have at least 16 bytes available
    from LL_RESERVED_SPACE in xfrm_state_check_space.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Replace the probing based MTU estimation, which usually takes 2-3 iterations
    to find a fitting value and may underestimate the MTU, by an exact calculation.

    Also fix underestimation of the XFRM trailer_len, which causes unnecessary
    reallocations.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Fix incorrect replacement of the "trailer" skb with "skb" during the
    skb_tail_pointer conversion:

    - *(u8*)(trailer->tail - 1) = top_iph->protocol;
    + *(skb_tail_pointer(skb) - 1) = top_iph->protocol;

    - *(u8 *)(trailer->tail - 1) = *skb_network_header(skb);
    + *(skb_tail_pointer(skb) - 1) = *skb_network_header(skb);

    Signed-off-by: Patrick McHardy
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Remove unnecessary initialization/variable.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To clearly state the intent of copying to linear sk_buffs, _offset being an
    overly long variant but interesting for the sake of saving some bytes.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • To clearly state the intent of copying from linear sk_buffs, _offset being an
    overly long variant but interesting for the sake of saving some bytes.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • As IPPROTO_TCP is 6, it makes sense to make sure inet_protos[] array
    is properly cache line aligned to avoid false sharing on SMP.

    c0680540 b peer_total
    c0680544 b inet_peer_unused_head
    c0680560 B inet_protos

    In this i386 example, we can see that inet_protos[IPPROTO_TCP] shares
    a potentially hot (and modified) cache line.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
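
The arithmetic behind the symbol listing above can be sketched as follows. The 64-byte cache line size and the 4-byte pointer size are assumptions for this i386 example (the commit message states neither), and the helper names are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE_SKETCH 64u /* assumed cache line size in bytes */
#define I386_PTR_SIZE      4u /* assumed size of one inet_protos[] entry */
#define IPPROTO_TCP_NUM    6u /* IPPROTO_TCP, as stated in the message */

/* Base address of the cache line that contains addr. */
static uint32_t cache_line_base(uint32_t addr)
{
    return addr & ~(CACHE_LINE_SKETCH - 1u);
}

/* Address of entry `index` in an array of pointers starting at array_base. */
static uint32_t inet_protos_slot(uint32_t array_base, uint32_t index)
{
    return array_base + index * I386_PTR_SIZE;
}
```

Under these assumptions, inet_protos[6] sits at 0xc0680560 + 24 = 0xc0680578, which falls in the line starting at 0xc0680540, the same line that holds peer_total. If the array started on a line boundary such as 0xc0680580, the TCP slot would no longer share a line with those variables.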
     
  • tcp_memory_pressure and tcp_socket currently share a cache line with
    tcp_memory_allocated and tcp_sockets_allocated (a very hot cache line).
    It makes sense to declare these variables as __read_mostly, to avoid
    false sharing on SMP.

    ffffffff8081d9c0 B tcp_orphan_count
    ffffffff8081d9c4 B tcp_memory_allocated
    ffffffff8081d9c8 B tcp_sockets_allocated
    ffffffff8081d9cc B tcp_memory_pressure
    ffffffff8081d9d0 b tcp_md5sig_users
    ffffffff8081d9d8 b tcp_md5sig_pool
    ffffffff8081d9e0 b warntime.31570
    ffffffff8081d9e8 b tcp_socket

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The results of FIB rule lookups are cached in the routing cache,
    except for IPv6, where no such cache exists. So far, it was the
    responsibility of the user to flush the cache after modifying any
    rules. This led to many false bug reports due to misunderstanding
    of this concept.

    This patch automatically flushes the route cache after inserting
    or deleting a rule.

    Thanks to Muli Ben-Yehuda for catching a bug
    in the previous patch.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • There is a very tiny probability that build_ehash_secret() is called
    at the same time by different CPUs.

    Also, using __read_mostly is a must for inet_ehash_secret.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
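
The commit message does not say how the race is closed. One common approach, sketched here in user space with C11 atomics, is to let only the first caller install a non-zero value via compare-and-exchange. The names ehash_secret_sketch and build_secret_once() are hypothetical, and this is not claimed to be what the patch does.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the secret; zero means "not yet built". */
static _Atomic uint32_t ehash_secret_sketch;

/*
 * Install `candidate` as the secret only if none has been set yet.
 * Concurrent callers all end up observing the same winning value.
 */
static uint32_t build_secret_once(uint32_t candidate)
{
    uint32_t expected = 0;

    if (candidate == 0)
        candidate = 1; /* keep zero reserved as the "unset" marker */

    /* Succeeds for exactly one caller; others leave the value alone. */
    atomic_compare_exchange_strong(&ehash_secret_sketch, &expected, candidate);
    return atomic_load(&ehash_secret_sketch);
}
```

In real use the candidate would come from a random source; a fixed argument is used here only so the behavior is easy to follow.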
     
  • Right now Xen has a horrible hack that lets it forward packets with
    partial checksums. One of the reasons that CHECKSUM_PARTIAL and
    CHECKSUM_COMPLETE were added is so that we can get rid of this hack
    (where it creates two extra bits in the skbuff to essentially mirror
    ip_summed without being destroyed by the forwarding code).

    I had forgotten that I've already gone through all the device drivers
    last time around to make sure that they're looking at ip_summed ==
    CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case,
    I've now done that again so it should definitely be safe.

    Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE
    values on forward, so I'm setting that to CHECKSUM_NONE. This should
    be safe to remove for bridging but I'd like to check that code path
    first.

    So here is the patch that lets us get rid of the hack by preserving
    ip_summed (mostly) on forwarded packets.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
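
The forwarding rule described above can be sketched as a small pure function. The enum and function names are hypothetical stand-ins for the kernel's CHECKSUM_* constants and are not the actual helper added by the patch.

```c
/* Hypothetical stand-ins for the kernel's CHECKSUM_* states. */
enum csum_state_sketch {
    CSUM_NONE,
    CSUM_UNNECESSARY,
    CSUM_COMPLETE,
    CSUM_PARTIAL
};

/*
 * State to carry on a forwarded packet: everything is preserved
 * except CSUM_COMPLETE, which nothing updates on forward and so
 * is downgraded to CSUM_NONE.
 */
static enum csum_state_sketch forward_csum_state(enum csum_state_sketch s)
{
    return s == CSUM_COMPLETE ? CSUM_NONE : s;
}
```

The important case is CSUM_PARTIAL surviving the forward path, which is what removes the need for the extra mirror bits mentioned in the message.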
     
  • This is a small patch by Janusz Krzysztofik to ip_route_output_slow()
    that allows a VIP-less LVS Linux director to generate packets
    originating from the VIP if sysctl_ip_nonlocal_bind is set.

    In a nutshell, the intention is for an LVS Linux director to be able
    to send ICMP unreachable responses to end-users when real-servers are
    removed.

    http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00106.html

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Janusz Krzysztofik
     
  • Change tcp_probe to use ktime (needed to add one export).
    Add an option to only get events when cwnd changes (from Doug Leith).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger