04 Jan, 2006

16 commits

  • I noticed that some of the 'struct proto_ops' instances used in the
    kernel may share a cache line with locks or other heavily modified
    data. (The default linker alignment is 32 bytes, and L1_CACHE_LINE is
    64 or 128 bytes at least.)

    This patch makes sure a 'struct proto_ops' can be declared as const,
    so that all cpus can share all parts of it without false sharing.

    This is not mandatory: a driver can still use a read/write structure
    if it needs to (possibly marked __read_mostly).

    I made a global substitution to change all existing occurrences to
    make them const.

    This should reduce the possibility of false sharing on SMP, and
    speed up some socket system calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
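
A minimal userspace sketch of the idea, not the kernel's code: an operations table declared const can live in read-only data, away from locks and counters that are written often. All demo_* names are hypothetical stand-ins for struct proto_ops and its users.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct proto_ops. */
struct demo_proto_ops {
    int family;
    int (*bind)(int fd);
    int (*release)(int fd);
};

static int demo_bind(int fd)    { return fd + 1; }
static int demo_release(int fd) { return fd - 1; }

/* const: the table may be placed in read-only data, so no CPU ever
 * dirties the cache line that holds it. */
static const struct demo_proto_ops demo_ops = {
    .family  = 2,
    .bind    = demo_bind,
    .release = demo_release,
};

/* Hypothetical socket: it holds a pointer-to-const, so users of the
 * socket cannot write to the shared table. */
struct demo_socket {
    const struct demo_proto_ops *ops;
};

static int demo_sock_bind(const struct demo_socket *sock, int fd)
{
    assert(sock->ops != NULL && sock->ops->bind != NULL);
    return sock->ops->bind(fd);
}
```

A structure that must stay writable would simply drop the const qualifier on its definition.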
     
  • sock_init can be done as a core_initcall instead of calling
    it directly in init/main.c

    Also I removed an out of date #ifdef.

    Signed-off-by: Andi Kleen
    Signed-off-by: David S. Miller

    Andi Kleen
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Here is a new feature for netem in 2.6.16. It adds the ability to
    randomly corrupt packets with netem. A version was done by
    Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
    compatibility with netlink interface and presence of hardware checksum
    offload. It is useful for testing hardware offload in devices.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
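
As a rough illustration of random packet corruption, the sketch below flips a single randomly chosen bit in a buffer. It is an assumption-laden userspace model: the demo_* names are hypothetical, rand() stands in for the kernel's random source, and the netlink compatibility and checksum offload handling mentioned above are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flip one randomly chosen bit in buf[0..len). */
static void demo_corrupt_one_bit(unsigned char *buf, size_t len)
{
    assert(len > 0);
    size_t byte = (size_t)rand() % len;
    unsigned bit = (unsigned)rand() % 8;
    buf[byte] ^= (unsigned char)(1u << bit);
}

/* Count the bits that differ between two buffers of equal length. */
static unsigned demo_bit_diff(const unsigned char *a,
                              const unsigned char *b, size_t len)
{
    unsigned diff = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char x = (unsigned char)(a[i] ^ b[i]);
        while (x) {
            diff += x & 1u;
            x >>= 1;
        }
    }
    return diff;
}
```

Comparing a corrupted copy against the original with demo_bit_diff should report exactly one differing bit.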
     
  • As DCCP needs to be called in the same spots.

    Now we have a member in inet_sock (is_icsk), set at sock creation time from
    struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, as for TCP and
    DCCP), to tell whether a struct sock instance is an inet_connection_sock.
    It is used in places like ip_sockglue.c (v4 and v6), where we previously
    checked whether sk_type was SOCK_STREAM. That check is insufficient because
    we now use the same code for DCCP, which has sk_type SOCK_DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
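
The difference between the old sk_type test and the new is_icsk test can be sketched as follows. This is a simplified userspace model: the demo_* types, the flag value and the enum values are hypothetical, and only the decision logic described above is reproduced.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PROTOSW_ICSK 0x04u  /* hypothetical flag value */

enum demo_sock_type {
    DEMO_SOCK_STREAM = 1,
    DEMO_SOCK_DGRAM  = 2,
    DEMO_SOCK_DCCP   = 6,
};

struct demo_protosw   { enum demo_sock_type type; unsigned flags; };
struct demo_inet_sock { enum demo_sock_type sk_type; bool is_icsk; };

/* At creation time, copy the protocol switch flag into the socket. */
static void demo_sock_create(struct demo_inet_sock *sk,
                             const struct demo_protosw *psw)
{
    sk->sk_type = psw->type;
    sk->is_icsk = (psw->flags & DEMO_PROTOSW_ICSK) != 0;
}

/* Old test: misses DCCP, whose type is not SOCK_STREAM. */
static bool demo_old_check(const struct demo_inet_sock *sk)
{
    return sk->sk_type == DEMO_SOCK_STREAM;
}

/* New test: true for every connection-oriented protocol that set the flag. */
static bool demo_new_check(const struct demo_inet_sock *sk)
{
    return sk->is_icsk;
}
```

With a DCCP-like entry that sets the flag, demo_old_check returns false while demo_new_check returns true.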
     
  • Upcoming patches will make, for instance, ip_sockglue.c need just this enum
    and not all of tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming it to inet6_hash_connect, making it possible to ditch
    dccp_v6_hash_connect and share the same code with TCP instead.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming it to inet_hash_connect, making it possible to ditch
    dccp_v4_hash_connect and share the same code with TCP instead.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • So that we can share several timewait sockets related functions and
    make the timewait mini sockets infrastructure closer to the request
    mini sockets one.

    Next changesets will take advantage of this, moving more code out of
    TCP and DCCP v4 and v6 to common infrastructure.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Out of tcp6_timewait_sock, that now is just an aggregation of
    inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
    inet_timewait_sock, that is common to the IPv6 transport protocols that use
    timewait sockets, like DCCP and TCP.

    tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
    code to find the IPv6 area in a timewait sock.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
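
One way to picture the offset mechanism: generic code holds only a pointer to the common timewait part and uses a stored offset to reach the IPv6 area, wherever a given protocol placed it. The sketch below is a userspace model with hypothetical demo_* types and field layouts; only the name tw_ipv6_offset comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Common part, shared by all protocols. */
struct demo_inet_tw {
    int    tw_family;
    size_t tw_ipv6_offset;  /* distance from this struct to the IPv6 area */
};

/* IPv6-specific part. */
struct demo_inet6_tw {
    unsigned char tw_v6_daddr[16];
};

/* Two protocols that lay out their timewait socks differently. */
struct demo_tcp6_tw {
    struct demo_inet_tw  tw;
    struct demo_inet6_tw tw6;
};

struct demo_dccp6_tw {
    struct demo_inet_tw  tw;
    int                  dccp_private;  /* hypothetical extra field */
    struct demo_inet6_tw tw6;
};

/* Generic lookup: needs no knowledge of the enclosing protocol struct. */
static struct demo_inet6_tw *demo_inet6_twsk(struct demo_inet_tw *tw)
{
    assert(tw->tw_ipv6_offset != 0);
    return (struct demo_inet6_tw *)((unsigned char *)tw + tw->tw_ipv6_offset);
}
```

Each protocol would set tw_ipv6_offset once, for example with offsetof(struct demo_tcp6_tw, tw6), when it creates the timewait sock.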
     
  • Using sk->sk_protocol instead of IPPROTO_TCP.

    Will be used by DCCPv6 in the next changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
    it is left on the socket receive queue. This means that when we detect
    a checksum error we have to be careful when trying to free the packet
    as someone could have dequeued it in the meantime.

    Currently this delicate logic is duplicated three times between UDPv4,
    UDPv6 and RAWv6. This patch moves it into one place and simplifies
    the code somewhat.

    This is based on a suggestion by Eric Dumazet.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
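
The delicate part is deciding whether the peeked packet is still queued before unlinking it. A single-threaded sketch under stated assumptions: the demo_* names are hypothetical, the queue is a plain singly linked list, reference counts are simple integers, and the check only looks at the queue head. A real implementation would hold the queue lock around the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pkt   { struct demo_pkt *next; int users; };
struct demo_queue { struct demo_pkt *head; };

/* Drop a peeked packet that failed its checksum.
 * Returns true if this caller unlinked it from the queue, false if
 * someone else had already dequeued it. */
static bool demo_kill_peeked(struct demo_queue *q, struct demo_pkt *pkt)
{
    bool unlinked = false;

    /* The queue lock would be taken here. */
    if (q->head == pkt) {
        q->head = pkt->next;
        pkt->next = NULL;
        pkt->users--;       /* drop the queue's reference */
        unlinked = true;
    }
    /* The queue lock would be released here. */

    pkt->users--;           /* drop the peeker's own reference */
    return unlinked;
}
```

If another reader already dequeued the packet, only the peeker's reference is dropped and the other reader's reference keeps the packet alive.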
     
  • And move it to struct inet_connection_sock. DCCP will use it in the
    upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • And inet6_rsk_offset in inet_request_sock, for the same reasons as
    inet_sock's pinfo6 member.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Another spin of Herbert Xu's "safer ip reassembly" patch
    for 2.6.16.

    (The original patch is here:
    http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
    and my only contribution is to have tested it.)

    This patch (optionally) does additional checks before accepting IP
    fragments, which can greatly reduce the possibility of reassembling
    fragments which originated from different IP datagrams.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arthur Kepner
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch series implements per packet access control via the
    extension of the Linux Security Modules (LSM) interface by hooks in
    the XFRM and pfkey subsystems that leverage IPSec security
    associations to label packets. Extensions to the SELinux LSM are
    included that leverage the patch for this purpose.

    This patch implements the changes necessary to the XFRM subsystem,
    pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
    socket to use only authorized security associations (or no security
    association) to send/receive network packets.

    Patch purpose:

    The patch is designed to enable per-packet access control based on
    the strongly authenticated IPSec security association. Such access
    controls augment the existing ones based on network interface and IP
    address. The former are very coarse-grained, and the latter can be
    spoofed. By using IPSec, the system can control access to remote
    hosts based on cryptographic keys generated using the IPSec mechanism.
    This enables access control on a per-machine basis or per-application
    if the remote machine is running the same mechanism and trusted to
    enforce the access control policy.

    Patch design approach:

    The overall approach is that policy (xfrm_policy) entries set by
    user-level programs (e.g., setkey for ipsec-tools) are extended with a
    security context that is used at policy selection time in the XFRM
    subsystem to restrict the sockets that can send/receive packets via
    security associations (xfrm_states) that are built from those
    policies.

    A presentation available at
    www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
    from the SELinux symposium describes the overall approach.

    Patch implementation details:

    On output, the policy retrieved (via xfrm_policy_lookup or
    xfrm_sk_policy_lookup) must be authorized for the security context of
    the socket and the same security context is required for resultant
    security association (retrieved or negotiated via racoon in
    ipsec-tools). This is enforced in xfrm_state_find.

    On input, the policy retrieved must also be authorized for the socket
    (at __xfrm_policy_check), and the security context of the policy must
    also match the security association being used.

    The patch has virtually no impact on packets that do not use IPSec.
    The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
    before.

    Also, if IPSec is used without security contexts, the impact is
    minimal. The LSM must allow such policies to be selected for the
    combination of socket and remote machine, but subsequent IPSec
    processing proceeds as in the original case.

    Testing:

    The pfkey interface is tested using ipsec-tools. ipsec-tools has
    been modified (a separate ipsec-tools patch is available for version
    0.5) to support assignment of xfrm_policy entries and security
    associations with security contexts via setkey, and negotiation
    using the security contexts via racoon.

    The xfrm_user interface is tested via ad hoc programs that set
    security contexts. These programs are also available from me, and
    include programs for setting, getting, and deleting policy for testing
    this interface. Testing of sa functions was done by tracing kernel
    behavior.

    Signed-off-by: Trent Jaeger
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Trent Jaeger
     

25 Dec, 2005

1 commit


23 Dec, 2005

2 commits

  • Linus Torvalds
     
  • Currently a simple

    void foo(void) { preempt_enable(); }

    produces the following code on ARM:

    foo:
    bic r3, sp, #8128
    bic r3, r3, #63
    ldr r2, [r3, #4]
    ldr r1, [r3, #0]
    sub r2, r2, #1
    tst r1, #4
    str r2, [r3, #4]
    blne preempt_schedule
    mov pc, lr

    The problem is that the TIF_NEED_RESCHED flag is loaded _before_ the
    preemption count is stored back, hence any interrupt arriving within
    that three-instruction window and setting TIF_NEED_RESCHED won't be
    seen, and scheduling won't happen as it should.

    Nothing currently prevents gcc from performing that reordering. There
    is already a barrier() before the decrement of the preemption count, but
    another one is needed between this and the TIF_NEED_RESCHED flag test
    for proper code ordering.

    Signed-off-by: Nicolas Pitre
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nicolas Pitre
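
The ordering requirement can be sketched in userspace as below. This is an illustration only: the demo_* names and plain int variables are hypothetical stand-ins for the preemption count and the TIF_NEED_RESCHED flag, and demo_barrier() is the usual GCC/Clang compiler barrier (an empty asm with a "memory" clobber).

```c
#include <assert.h>

/* Compiler barrier: forbids the compiler from moving memory accesses
 * across this point. It emits no instructions. */
#define demo_barrier() __asm__ __volatile__("" ::: "memory")

static int demo_preempt_count;
static int demo_need_resched;
static int demo_schedule_calls;

static void demo_preempt_schedule(void)
{
    if (demo_preempt_count != 0)
        return;             /* preemption still disabled */
    demo_need_resched = 0;
    demo_schedule_calls++;
}

static void demo_preempt_disable(void)
{
    demo_preempt_count++;
    demo_barrier();
}

static void demo_preempt_enable(void)
{
    demo_barrier();         /* existing barrier, before the decrement */
    demo_preempt_count--;
    demo_barrier();         /* added barrier: the flag must be loaded
                             * only after the count has been stored */
    if (demo_need_resched)
        demo_preempt_schedule();
}
```

Without the second barrier, the compiler is free to load the flag before storing the decremented count, which is the reordering shown in the ARM listing.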
     

22 Dec, 2005

2 commits

  • David S. Miller
     
  • Jan's crosscompile page [1] shows that one regression in 2.6.15-rc is
    that the v850 defconfig no longer compiles.

    The compile error is:

    ...
    CC arch/v850/kernel/setup.o
    In file included from /usr/src/ctest/rc/kernel/arch/v850/kernel/setup.c:17:
    /usr/src/ctest/rc/kernel/include/linux/irq.h:13:43: asm/smp.h: No such file or directory
    make[2]: *** [arch/v850/kernel/setup.o] Error 1

    The #include in irq.h was introduced in 2.6.15-rc.

    Since include/linux/irq.h needs code from asm/smp.h only in the
    CONFIG_SMP=y case and linux/smp.h #include's asm/smp.h only in the
    CONFIG_SMP=y case, I'm suggesting this patch to #include <linux/smp.h>
    in irq.h.

    I've tested the compilation with both CONFIG_SMP=y and CONFIG_SMP=n
    on i386.

    Signed-off-by: Adrian Bunk
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

21 Dec, 2005

2 commits


20 Dec, 2005

2 commits

  • Ensure we call unmap_mapping_range() and sync dirty pages to disk before
    doing an NFS direct write.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • I reported a problem and gave hints to the solution, but nobody seemed
    to react. So I prepared a patch against 2.6.14.4.

    Tested on 2.6.14.4 with "ip monitor addr" and with the program
    attached, while adding and removing an IPv6 address. Neither program
    received any messages. Tested 2.6.14.4 + this patch, and both programs
    received add and remove messages.

    Signed-off-by: Kristian Slavov
    Acked-by: Jamal Hadi Salim
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Kristian Slavov
     

19 Dec, 2005

2 commits


17 Dec, 2005

3 commits


15 Dec, 2005

4 commits


13 Dec, 2005

6 commits

  • Some hardware does not support the PACKET command at all.
    Other hardware supports ATAPI, but the driver does something nasty such
    as calling BUG() when an ATAPI command is issued.

    For such cases, we mark them with a new flag, ATA_FLAG_NO_ATAPI.

    Initial version contributed by Ben Collins.

    Jeff Garzik
     
  • The drawing function cfbfillrect does not work correctly when access is not
    unsigned-long aligned. It manifests as extra lines of pixels that are not
    completely drawn. Reversing the shift operator solves the problem, so I would
    presume that this bug would manifest only on little endian machines. The
    function cfbcopyarea may also have this bug.

    Aligned access should present no problems.

    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Antonino A. Daplas
     
  • Every framebuffer driver relies on the assumption that the set_par()
    function of the driver is called before drawing functions and other
    functions dependent on the hardware state are executed.

    Whenever you switch from X to a framebuffer console for the very first
    time, there is a chance that a broken X system has _not_ set the mode to
    KD_GRAPHICS, thus the vt and framebuffer code executes a screen redraw and
    several other functions before a set_par() is executed. This is believed
    to be a bug in X/xdm rather than in Linux. At least some X releases used
    by SuSE and Debian show this behaviour.

    There was a 2nd case, but that has been fixed by Antonino Daplas on
    10-dec-2005.

    This patch allows drivers to set a flag to inform fbcon_switch() that they
    prefer a set_par() call on every console switch, working around the
    problems caused by the broken X releases.

    The flag will be used by the next release of cyblafb and might help other
    drivers that assume a hardware state different to the one used by X.

    As the default behaviour does not change, this patch should be acceptable
    to everybody.

    Signed-off-by: Knut Petersen
    Acked-by: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Knut Petersen
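
A rough model of the opt-in flag follows. The flag name DEMO_FB_ALWAYS_SETPAR, the demo_* types and the default rule (reprogram the hardware only when the requested mode differs from the recorded one) are assumptions made for illustration. The sketch only shows how a per-driver flag can force set_par() on every console switch.

```c
#include <assert.h>

#define DEMO_FB_ALWAYS_SETPAR 0x1u  /* hypothetical flag name and value */

struct demo_fb_info {
    unsigned flags;
    int mode;           /* mode the console code believes is active */
    int hw_mode;        /* mode the hardware is really in */
    int set_par_calls;
};

/* Program the hardware to match the recorded mode. */
static void demo_set_par(struct demo_fb_info *info)
{
    info->hw_mode = info->mode;
    info->set_par_calls++;
}

/* Console switch: by default (an assumption of this sketch) the hardware
 * is reprogrammed only when the mode changes. A driver that sets the
 * flag gets set_par() on every switch. */
static void demo_console_switch(struct demo_fb_info *info, int new_mode)
{
    int mode_changed = (info->mode != new_mode);

    info->mode = new_mode;
    if (mode_changed || (info->flags & DEMO_FB_ALWAYS_SETPAR))
        demo_set_par(info);
}
```

If something else has changed the hardware state behind the console's back, only the flagged driver recovers on a switch to the same mode.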
     
  • Add hooks to save and restore the graphics state. These hooks are called in
    fbcon_blank() when entering/leaving KD_GRAPHICS mode. This is needed by
    savagefb at least so it can cooperate with savage_dri and by cyblafb.

    State save/restoration can be full or partial.

    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Antonino A. Daplas
     
  • Spotted by a Fedora user. Compiling with DEBUG_PARPORT set fails due to
    the broken cast.

    Just remove it.

    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • When multiple probes are registered at the same address and a probe is
    hit recursively (a probe triggered within a probe handler), we skip
    calling the pre_handlers and just increment the nmissed field.

    This patch makes sure the list is walked in the multiple-probe case.
    Without it, the nmissed count is incorrect when multiple probes share
    an address.

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keshavamurthy Anil S
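
The list walk can be sketched as below, assuming a simple singly linked list of probes registered at one address. The demo_* names are hypothetical and only the nmissed field name comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_probe {
    struct demo_probe *next;
    unsigned long nmissed;
};

/* Recursion case: the handlers are skipped, but every probe registered
 * at the address records the miss, not just the first one. */
static void demo_count_missed(struct demo_probe *head)
{
    for (struct demo_probe *p = head; p != NULL; p = p->next)
        p->nmissed++;
}
```

After one recursive hit, each probe in the list reports nmissed equal to 1.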