04 Jan, 2006

27 commits

  • I noticed that some of the 'struct proto_ops' instances used in the kernel
    may share a cache line with locks or other heavily modified data (the
    default linker alignment is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
    least).

    This patch makes sure a 'struct proto_ops' can be declared as const,
    so that all cpus can share all parts of it without false sharing.

    This is not mandatory: a driver can still use a read/write structure
    if it needs to (possibly marked __read_mostly).

    I made a global substitution to change all existing occurrences to make
    them const.

    This should reduce the possibility of false sharing on SMP, and
    speed up some socket system calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
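
To illustrate the idea in the commit above, here is a minimal user-space sketch of a const operations table. The names (demo_proto_ops, demo_stream_ops, demo_sock_poll) and the family value are hypothetical stand-ins, not the kernel's definitions. Declaring the table const lets the toolchain place it in read-only data, so it is less likely to sit on a cache line next to frequently written data.

```c
/* Hypothetical, cut-down stand-in for the kernel's struct proto_ops. */
struct demo_proto_ops {
    int family;
    int (*release)(int fd);
    int (*poll)(int fd);
};

static int demo_release(int fd) { return fd >= 0 ? 0 : -1; }
static int demo_poll(int fd)    { return fd >= 0 ? 1 : -1; }

/*
 * const: the table is never written after initialisation, so it can live
 * in read-only data, away from locks and counters that change often.
 */
static const struct demo_proto_ops demo_stream_ops = {
    .family  = 2,            /* illustrative value only */
    .release = demo_release,
    .poll    = demo_poll,
};

/* A socket only keeps a pointer-to-const, so it cannot modify the table. */
struct demo_socket {
    const struct demo_proto_ops *ops;
};

static int demo_sock_poll(const struct demo_socket *sock, int fd)
{
    return sock->ops->poll(fd);
}
```

A caller would set `ops` to `&demo_stream_ops` once and dispatch through it; any attempt to assign through `sock->ops` fails at compile time.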
     
  • sock_init can be done as a core_initcall instead of calling
    it directly in init/main.c

    Also I removed an out of date #ifdef.

    Signed-off-by: Andi Kleen
    Signed-off-by: David S. Miller

    Andi Kleen
     
  • Signed-off-by: Frank Filz
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Frank Filz
     
  • This patch adds support to set/get heartbeat interval, maximum number of
    retransmissions, pathmtu, sackdelay time for a particular transport/
    association/socket as per the latest SCTP sockets api draft11.

    Signed-off-by: Frank Filz
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Frank Filz
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Here is a new feature for netem in 2.6.16. It adds the ability to
    randomly corrupt packets with netem. A version was done by
    Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
    compatibility with netlink interface and presence of hardware checksum
    offload. It is useful for testing hardware offload in devices.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
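
The commit text above does not say how a packet is corrupted. As a sketch under that caveat, one simple approach is to flip a single randomly chosen bit in the packet data for a configured percentage of packets. The function names and the percentage-based decision below are assumptions for illustration; the random values are passed in by the caller so the behaviour is reproducible.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Flip exactly one bit in buf. byte_rnd and bit_rnd are caller-supplied
 * random values; they are reduced modulo the buffer length and 8.
 */
static void demo_corrupt_one_bit(uint8_t *buf, size_t len,
                                 unsigned int byte_rnd, unsigned int bit_rnd)
{
    if (len == 0)
        return;
    buf[byte_rnd % len] ^= (uint8_t)(1u << (bit_rnd % 8));
}

/*
 * Decide whether a given packet should be corrupted. percent is the
 * configured corruption rate (0..100), rnd is a caller-supplied random value.
 */
static int demo_should_corrupt(unsigned int percent, unsigned int rnd)
{
    return (rnd % 100) < percent;
}
```

A packet modified this way no longer matches its checksum, which is the property that makes this kind of test useful for checksum offload.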
     
  • This lock is actually taken mostly as a writer,
    so using a rwlock actually just makes performance
    worse especially on chips like the Intel P4.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As DCCP needs to be called in the same spots.

    Now we have a member in inet_sock (is_icsk), set at sock creation time from
    struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, as for TCP and
    DCCP), that tells whether a struct sock instance is an inet_connection_sock.
    It is for places like the ones in ip_sockglue.c (v4 and v6) where we
    previously checked whether sk_type was SOCK_STREAM. That check is
    insufficient because we now use the same code for DCCP, which has sk_type
    SOCK_DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
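
The flag described above can be sketched in user space as follows. All names and numeric values here (demo_protosw, DEMO_PROTOSW_ICSK, the DEMO_SOCK_* constants) are hypothetical stand-ins for the kernel structures the commit mentions. The point is that the check keys off a per-protocol flag copied at creation time rather than off the socket type.

```c
/* Hypothetical stand-ins; the values are illustrative only. */
#define DEMO_SOCK_STREAM   1
#define DEMO_SOCK_DGRAM    2
#define DEMO_SOCK_DCCP     6
#define DEMO_PROTOSW_ICSK  0x04u

struct demo_protosw {
    int type;
    unsigned int flags;
};

struct demo_inet_sock {
    int sk_type;
    unsigned int is_icsk : 1;   /* set once, at socket creation */
};

/* Copy the "is a connection sock" property from the protocol switch entry. */
static void demo_sock_create(struct demo_inet_sock *inet,
                             const struct demo_protosw *psw)
{
    inet->sk_type = psw->type;
    inet->is_icsk = (psw->flags & DEMO_PROTOSW_ICSK) != 0;
}

/* Old-style check: misses DCCP because its type is not SOCK_STREAM. */
static int demo_old_check(const struct demo_inet_sock *inet)
{
    return inet->sk_type == DEMO_SOCK_STREAM;
}

/* New-style check: works for every protocol that sets the flag. */
static int demo_new_check(const struct demo_inet_sock *inet)
{
    return inet->is_icsk;
}
```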
     
  • Upcoming patches will make, for instance, ip_sockglue.c need just this enum
    and not all of tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming it to inet6_hash_connect, making it possible to ditch
    dccp_v6_hash_connect and share the same code with TCP instead.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming it to inet_hash_connect, making it possible to ditch
    dccp_v4_hash_connect and share the same code with TCP instead.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • So that we can share several timewait sockets related functions and
    make the timewait mini sockets infrastructure closer to the request
    mini sockets one.

    Next changesets will take advantage of this, moving more code out of
    TCP and DCCP v4 and v6 to common infrastructure.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • It was already non-TCP specific, will be used by DCCPv6.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Out of tcp6_timewait_sock, that now is just an aggregation of
    inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
    inet_timewait_sock, that is common to the IPv6 transport protocols that use
    timewait sockets, like DCCP and TCP.

    tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
    code to find the IPv6 area in a timewait sock.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
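
The role of tw_ipv6_offset described above can be sketched as an offset-based lookup: generic code holds only a pointer to the common part and adds a stored byte offset to reach the IPv6 area, wherever each protocol placed it. The struct layouts and names below are hypothetical and much smaller than the kernel's.

```c
#include <stddef.h>

/* Hypothetical IPv6 area shared by all IPv6 timewait socks. */
struct demo_inet6_tw {
    unsigned char daddr[16];
    unsigned char rcv_saddr[16];
};

/* Hypothetical common part; must be the first member of each variant. */
struct demo_inet_tw {
    unsigned short tw_family;
    unsigned short tw_ipv6_offset;  /* bytes from start of sock to IPv6 area */
};

/* Two protocol-specific aggregates with the IPv6 area at different offsets. */
struct demo_tcp6_tw {
    struct demo_inet_tw inet;
    int tcp_private;
    struct demo_inet6_tw tw6;
};

struct demo_dccp6_tw {
    struct demo_inet_tw inet;
    long dccp_private[3];
    struct demo_inet6_tw tw6;
};

/* Generic code: find the IPv6 area without knowing the concrete type. */
static struct demo_inet6_tw *demo_tw6(struct demo_inet_tw *tw)
{
    return (struct demo_inet6_tw *)((unsigned char *)tw + tw->tw_ipv6_offset);
}
```

Each protocol would store `offsetof(its_struct, tw6)` in tw_ipv6_offset when it creates the timewait sock.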
     
  • Using sk->sk_protocol instead of IPPROTO_TCP.

    Will be used by DCCPv6 in the next changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • It also looks like there were 2 places where the test on sk_err was
    missing from the event wait logic (in sk_stream_wait_connect and
    sk_stream_wait_memory), while the rest of the sock_error() users look
    to be doing the right thing. This version of the patch fixes those,
    and cleans up a few places that were testing ->sk_err directly.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
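
The fix described above amounts to including the pending-error test in the wait condition. The following is a single-threaded user-space sketch with hypothetical names (demo_sock, demo_wait_done); it ignores the locking and atomicity that kernel code needs.

```c
/* Hypothetical, minimal socket state. */
struct demo_sock {
    int sk_err;      /* pending error, 0 if none */
    int connected;
};

/* Return the pending error as a negative value and clear it. */
static int demo_sock_error(struct demo_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

/*
 * Wait condition: stop waiting when the event has happened OR an error
 * is pending. Leaving out the sk_err test is the kind of omission the
 * commit describes, since the waiter would then sleep through an error.
 */
static int demo_wait_done(const struct demo_sock *sk)
{
    return sk->sk_err != 0 || sk->connected;
}
```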
     
  • When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
    it is left on the socket receive queue. This means that when we detect
    a checksum error we have to be careful when trying to free the packet
    as someone could have dequeued it in the meantime.

    Currently this delicate logic is duplicated three times between UDPv4,
    UDPv6 and RAWv6. This patch moves them into one place and simplifies
    the code somewhat.

    This is based on a suggestion by Eric Dumazet.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
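
The delicate part described above is that a peeked packet may or may not still be on the receive queue when the checksum error is detected. A minimal sketch of that check follows, using a hypothetical singly linked queue and hypothetical names; it omits the queue lock and reference counting that real code would need.

```c
#include <stddef.h>

/* Hypothetical packet and receive queue. */
struct demo_pkt {
    struct demo_pkt *next;
    int id;
};

struct demo_queue {
    struct demo_pkt *head;
};

/* Unlink pkt only if it is still queued. Returns 1 if it was unlinked. */
static int demo_unlink_if_queued(struct demo_queue *q, struct demo_pkt *pkt)
{
    struct demo_pkt **pp;

    for (pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == pkt) {
            *pp = pkt->next;
            pkt->next = NULL;
            return 1;
        }
    }
    return 0;
}

/*
 * Drop a packet that failed its checksum.
 * Not peeked: the caller already dequeued it, nothing to unlink.
 * Peeked: it was left on the queue, so unlink it, unless someone
 * else dequeued it in the meantime (then report -1).
 */
static int demo_kill_datagram(struct demo_queue *q, struct demo_pkt *pkt,
                              int peeked)
{
    if (peeked && !demo_unlink_if_queued(q, pkt))
        return -1;
    return 0;
}
```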
     
  • Renaming it to inet_csk_addr2sockaddr.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • And move it to struct inet_connection_sock. DCCP will use it in the
    upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • And inet6_rsk_offset in inet_request_sock, for the same reasons as
    inet_sock's pinfo6 member.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • More work is needed, though, to introduce inet6_request_sock from
    tcp6_request_sock, with the same layout considerations as ipv6_pinfo in
    inet_sock; the next changeset will do that.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Another spin of Herbert Xu's "safer ip reassembly" patch
    for 2.6.16.

    (The original patch is here:
    http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
    and my only contribution is to have tested it.)

    This patch (optionally) does additional checks before accepting IP
    fragments, which can greatly reduce the possibility of reassembling
    fragments which originated from different IP datagrams.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arthur Kepner
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch series implements per packet access control via the
    extension of the Linux Security Modules (LSM) interface by hooks in
    the XFRM and pfkey subsystems that leverage IPSec security
    associations to label packets. Extensions to the SELinux LSM are
    included that leverage the patch for this purpose.

    This patch implements the changes necessary to the XFRM subsystem,
    pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
    socket to use only authorized security associations (or no security
    association) to send/receive network packets.

    Patch purpose:

    The patch is designed to enable per-packet access control based on
    the strongly authenticated IPSec security association. Such access
    controls augment the existing ones based on network interface and IP
    address. The former are very coarse-grained, and the latter can be
    spoofed. By using IPSec, the system can control access to remote
    hosts based on cryptographic keys generated using the IPSec mechanism.
    This enables access control on a per-machine basis or per-application
    if the remote machine is running the same mechanism and trusted to
    enforce the access control policy.

    Patch design approach:

    The overall approach is that policy (xfrm_policy) entries set by
    user-level programs (e.g., setkey for ipsec-tools) are extended with a
    security context that is used at policy selection time in the XFRM
    subsystem to restrict the sockets that can send/receive packets via
    security associations (xfrm_states) that are built from those
    policies.

    A presentation available at
    www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
    from the SELinux symposium describes the overall approach.

    Patch implementation details:

    On output, the policy retrieved (via xfrm_policy_lookup or
    xfrm_sk_policy_lookup) must be authorized for the security context of
    the socket and the same security context is required for resultant
    security association (retrieved or negotiated via racoon in
    ipsec-tools). This is enforced in xfrm_state_find.

    On input, the policy retrieved must also be authorized for the socket
    (at __xfrm_policy_check), and the security context of the policy must
    also match the security association being used.

    The patch has virtually no impact on packets that do not use IPSec.
    The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
    before.

    Also, if IPSec is used without security contexts, the impact is
    minimal. The LSM must allow such policies to be selected for the
    combination of socket and remote machine, but subsequent IPSec
    processing proceeds as in the original case.

    Testing:

    The pfkey interface is tested using ipsec-tools. ipsec-tools has
    been modified (a separate ipsec-tools patch is available for version
    0.5) to support assignment of xfrm_policy entries and security
    associations with security contexts via setkey, and negotiation
    using the security contexts via racoon.

    The xfrm_user interface is tested via ad hoc programs that set
    security contexts. These programs are also available from me, and
    contain programs for setting, getting, and deleting policy for testing
    this interface. Testing of sa functions was done by tracing kernel
    behavior.

    Signed-off-by: Trent Jaeger
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Trent Jaeger
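
The two conditions in the implementation details above (the policy must be authorized for the socket, and the security association must carry the same security context as the policy) can be sketched as follows. Everything here is hypothetical: the names, the use of plain strings for security contexts, and the authorizer, which simply compares strings. A real LSM makes the authorization decision from its own policy, not by string equality.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy and security association; NULL ctx means "no context". */
struct demo_policy { const char *ctx; };
struct demo_state  { const char *ctx; };

/* Stand-in for the LSM decision: may this socket use this policy? */
typedef int (*demo_authorize_fn)(const char *sock_ctx, const char *policy_ctx);

static int demo_ctx_equal(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Toy authorizer for the example only: allow when the contexts are equal. */
static int demo_authorize_same_ctx(const char *sock_ctx, const char *policy_ctx)
{
    return demo_ctx_equal(sock_ctx, policy_ctx);
}

/*
 * A security association is usable for a socket only if
 *  1) the selected policy is authorized for the socket's context, and
 *  2) the association's context matches the policy's context.
 */
static int demo_state_usable(const char *sock_ctx,
                             const struct demo_policy *pol,
                             const struct demo_state *sa,
                             demo_authorize_fn authorize)
{
    if (!authorize(sock_ctx, pol->ctx))
        return 0;
    return demo_ctx_equal(pol->ctx, sa->ctx);
}
```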
     

03 Jan, 2006

1 commit

  • In commit 59121003721a8fad11ee72e646fd9d3076b5679c, the x86 and x86-64
    asm/param.h was changed to include the kernel configuration header for
    the configurable timer frequency.

    However, asm/param.h is sometimes used in userland (it is included
    indirectly from other headers), so your commit pollutes the userland
    namespace with tons of CONFIG_FOO macros. This greatly confuses
    software packages (such as BusyBox) which use CONFIG_FOO macros
    themselves to control the inclusion of optional features.

    After a short exchange, Christoph approved this patch.

    Signed-off-by: Linus Torvalds

    Dag-Erling Smørgrav
     

28 Dec, 2005

1 commit

  • The below "jumbo" patch fixes the following problems in MLDv2.

    1) Add necessary "ntohs" to recent "pskb_may_pull" check [breaks
    all nonzero source queries on little-endian (!)]

    2) Add locking to source filter list [resend of prior patch]

    3) fix "mld_marksources()" to
    a) send nothing when all queried sources are excluded
    b) send full exclude report when some queried sources are
    not excluded
    c) don't schedule a timer when there's nothing to report

    NOTE: RFC 3810 specifies the source list should be saved and each
    source reported individually as an IS_IN. This is an obvious DOS
    path, requiring the host to store and then multicast as many sources
    as are queried (e.g., millions...). This alternative sends a full,
    relevant report that's limited to the number of sources present on the
    machine.

    4) fix "add_grec()" to send empty-source records when it should
    The original check doesn't account for a non-empty source
    list with all sources inactive; the new code keeps that
    short-circuit case, and also generates the group header
    with an empty list if needed.

    5) fix mca_crcount decrement to be after add_grec(), which needs
    its original value

    These issues (other than item #1 ;-) ) were all found by Yan Zheng,
    much thanks!

    Signed-off-by: David L Stevens
    Signed-off-by: David S. Miller

    David L Stevens
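
Item 1 above is a byte-order problem: the source count arrives in network byte order and must be converted before it is used to compute how many bytes the query needs. A minimal sketch follows; the function names are hypothetical, and the 28-byte fixed query header and 16-byte source address sizes are stated here as assumptions taken from the MLDv2 query layout.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUERY_HDR_LEN 28u   /* assumed fixed part of an MLDv2 query */
#define DEMO_IPV6_ADDR_LEN 16u   /* one IPv6 source address */

/* Bytes needed for a query whose source count is in network byte order. */
static size_t demo_query_len(uint16_t nsrcs_be)
{
    return DEMO_QUERY_HDR_LEN + (size_t)ntohs(nsrcs_be) * DEMO_IPV6_ADDR_LEN;
}

/* Length check in the spirit of a "may pull" test: is enough data present? */
static int demo_may_pull(size_t have, uint16_t nsrcs_be)
{
    return have >= demo_query_len(nsrcs_be);
}
```

Without the ntohs() call, a little-endian host would read a count of 1 as 256, demand far more data than the packet holds, and reject the query, which matches the "breaks all nonzero source queries" remark in the commit.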
     

25 Dec, 2005

3 commits


23 Dec, 2005

3 commits

  • Len Brown
     
  • Linus Torvalds
     
  • Currently a simple

    void foo(void) { preempt_enable(); }

    produces the following code on ARM:

    foo:
            bic     r3, sp, #8128
            bic     r3, r3, #63
            ldr     r2, [r3, #4]
            ldr     r1, [r3, #0]
            sub     r2, r2, #1
            tst     r1, #4
            str     r2, [r3, #4]
            blne    preempt_schedule
            mov     pc, lr

    The problem is that the TIF_NEED_RESCHED flag is loaded _before_ the
    preemption count is stored back, hence any interrupt coming within that
    3 instruction window causing TIF_NEED_RESCHED to be set won't be
    seen and scheduling won't happen as it should.

    Nothing currently prevents gcc from performing that reordering. There
    is already a barrier() before the decrement of the preemption count, but
    another one is needed between this and the TIF_NEED_RESCHED flag test
    for proper code ordering.

    Signed-off-by: Nicolas Pitre
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nicolas Pitre
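
The ordering requirement described above can be sketched in user space with a compiler barrier between the count store and the flag test. The names are hypothetical, the flag value 0x4 mirrors the `tst r1, #4` in the listing, and the barrier uses the GCC/Clang empty-asm idiom with a "memory" clobber. This is not the kernel's macro, only an illustration of where the second barrier goes.

```c
/* Compiler-only barrier (GCC/Clang): forbids reordering memory accesses
 * across this point at compile time; emits no instruction. */
#define demo_barrier() __asm__ __volatile__("" ::: "memory")

#define DEMO_NEED_RESCHED 0x4ul

/* Hypothetical per-thread state. */
struct demo_thread_info {
    unsigned long flags;
    int preempt_count;
};

static int demo_schedule_calls;

static void demo_preempt_schedule(void)
{
    demo_schedule_calls++;
}

static void demo_preempt_enable(struct demo_thread_info *ti)
{
    demo_barrier();              /* barrier before the decrement */
    ti->preempt_count--;
    demo_barrier();              /* the added one: store the count first,
                                    only then load and test the flag */
    if (ti->flags & DEMO_NEED_RESCHED)
        demo_preempt_schedule();
}
```

With the second barrier in place the compiler may not hoist the load of `flags` above the store to `preempt_count`, which closes the window described in the commit.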
     

22 Dec, 2005

3 commits


21 Dec, 2005

2 commits