31 May, 2007

1 commit


09 May, 2007

1 commit

  • Merge all compat ioctl handling into compat_ioctl.c instead of splitting it
    over compat.c and compat_ioctl.c. This also lets us get rid of ioctl32.h.

    Signed-off-by: Christoph Hellwig
    Looks-good-to: Andi Kleen
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

26 Apr, 2007

7 commits

  • It is far too large to be an inline and not in any hot paths.

    Signed-off-by: Andi Kleen
    Signed-off-by: David S. Miller

    Andi Kleen
     
  • Now that network timestamps use ktime_t infrastructure, we can add a new
    SOL_SOCKET sockopt SO_TIMESTAMPNS.

    This command is similar to SO_TIMESTAMP, but permits transmission of
    a 'timespec struct' instead of a 'timeval struct' control message.
    (nanosecond resolution instead of microsecond)

    Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP

    A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS: the two modes are
    mutually exclusive.

    sock_recv_timestamp() became too big to be fully inlined so I added a
    __sock_recv_timestamp() helper function.

    Signed-off-by: Eric Dumazet
    CC: linux-arch@vger.kernel.org
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Convert network warning messages from a compile time to runtime choice.
    Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Now that network timestamps use ktime_t infrastructure, we can add a new
    ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
    User programs can thus access nanosecond resolution.
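
    A rough user-space sketch of calling the new ioctl, assuming a Linux
    system where <linux/sockios.h> defines SIOCGSTAMPNS. The function names
    are hypothetical, and siocgstampns_loopback_demo() only exists to get one
    datagram delivered before the timestamp is queried.

```c
#include <arpa/inet.h>
#include <linux/sockios.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Fetch the timestamp of the last packet passed to user space on fd.
 * Returns 0 on success, -1 with errno set otherwise. */
int last_packet_timestamp_ns(int fd, struct timespec *ts)
{
    return ioctl(fd, SIOCGSTAMPNS, ts);
}

/* Hypothetical self-test: send one UDP datagram to ourselves over loopback,
 * receive it, then ask the kernel for its timestamp. */
int siocgstampns_loopback_demo(struct timespec *ts)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char buf[16];
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    int ok = bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0
          && getsockname(fd, (struct sockaddr *)&addr, &alen) == 0
          && sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) == 4
          && recv(fd, buf, sizeof(buf), 0) == 4
          && last_packet_timestamp_ns(fd, ts) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

    Unlike SIOCGSTAMP, which fills a 'struct timeval', the result here is a
    'struct timespec', so the caller reads tv_nsec rather than tv_usec.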

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This allows the write queue implementation to be changed,
    for example, to one which allows fast interval searching.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We currently use a special structure (struct skb_timeval) and plain
    'struct timeval' to store packet timestamps in sk_buffs and struct
    sock.

    This has some drawbacks:
    - Fixed resolution of one microsecond.
    - Waste of space on 64bit platforms where sizeof(struct timeval)=16

    I suggest using ktime_t that is a nice abstraction of high resolution
    time services, currently capable of nanosecond resolution.

    As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
    a 8 byte shrink of this structure on 64bit architectures. Some other
    structures also benefit from this size reduction (struct ipq in
    ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

    Once this ktime infrastructure is adopted, we can more easily provide
    nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
    SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

    Note: this patch includes a bug correction in
    compat_sock_get_timestamp() where an "err = 0;" was missing (so this
    syscall returned -ENOENT instead of 0)
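
    To illustrate the size and resolution argument, here is a user-space
    model that treats a timestamp as a signed 64-bit nanosecond count. The
    type and function names are hypothetical stand-ins, not the kernel's
    ktime API, and the conversions assume a non-negative value.

```c
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for an 8-byte nanosecond timestamp. */
typedef int64_t model_ktime_t;

#define MODEL_NSEC_PER_SEC  1000000000LL
#define MODEL_NSEC_PER_USEC 1000LL

/* Convert to microsecond resolution; the sub-microsecond part is dropped. */
struct timeval model_ktime_to_timeval(model_ktime_t kt)
{
    struct timeval tv;
    tv.tv_sec = kt / MODEL_NSEC_PER_SEC;
    tv.tv_usec = (kt % MODEL_NSEC_PER_SEC) / MODEL_NSEC_PER_USEC;
    return tv;
}

/* Convert to nanosecond resolution; nothing is lost. */
struct timespec model_ktime_to_timespec(model_ktime_t kt)
{
    struct timespec ts;
    ts.tv_sec = kt / MODEL_NSEC_PER_SEC;
    ts.tv_nsec = kt % MODEL_NSEC_PER_SEC;
    return ts;
}
```

    A single 8-byte value can therefore back both the existing timeval
    interfaces and a timespec interface, which is what makes the later
    nanosecond options cheap to add.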

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: John find
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • sk_backlog is a critical field of struct sock. (known famous words)

    It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
    tcp_v4_rcv(), sk_receive_skb().

    It really makes sense to place it next to sk_lock, because sk_backlog is only
    used after sk_lock has been taken (so its cache line is already in the L1
    cache). This should reduce cache misses and sk_lock acquisition time.

    (In theory, we could move only the head pointer near sk_lock and leave the
    tail far away, because 'tail' is normally not so hot, but keep it simple :) )

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Mar, 2007

1 commit

  • This reverts two changes:

    8488df894d05d6fa41c2bd298c335f944bb0e401
    248f06726e866942b3d8ca8f411f9067713b7ff8

    A backlog value of N really does mean allow "N + 1" connections
    to queue to a listening socket. This allows one to specify
    "0" as the backlog and still get 1 connection.

    Noticed by Gerrit Renker and Rick Jones.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Mar, 2007

1 commit

  • While using a Linux TCP socket, I found a bug in the function
    sk_acceptq_is_full().

    When a new SYN arrives, the TCP module first validates it. If it is valid,
    the module sends a SYN,ACK to the client and adds the sock to the syn hash
    table. When a valid ACK for that SYN,ACK later arrives from the client, the
    server accepts the connection and increments sk->sk_ack_backlog, which is
    done in tcp_check_req(). Whether the accept queue is full is checked in
    tcp_v4_syn_recv_sock().

    Consider an example:

    After the listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
    1. As we know, sk->sk_ack_backlog is initialized to 0. Assume that accept()
    has not been called yet.

    1. The 1st connection arrives and sk_acceptq_is_full() is invoked with
       sk->sk_ack_backlog=0 and sk->sk_max_ack_backlog=1. The function returns
       0, so the connection is accepted and sk->sk_ack_backlog is incremented.
    2. The 2nd connection arrives and sk_acceptq_is_full() is invoked with
       sk->sk_ack_backlog=1 and sk->sk_max_ack_backlog=1. The function returns
       0, so the connection is accepted and sk->sk_ack_backlog is incremented.
    3. The 3rd connection arrives and sk_acceptq_is_full() is invoked with
       sk->sk_ack_backlog=2 and sk->sk_max_ack_backlog=1. The function returns
       1, so the connection is refused.

    I think this is a bug: after listen(sockfd, 1), sk->sk_max_ack_backlog=1,
    yet the socket can accept 2 connections.
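
    The walk-through above can be reproduced with a small user-space model.
    This is only a sketch: the struct and function names are hypothetical,
    the ">" comparison is inferred from the return values listed in the
    example, and the ">=" variant is an assumed form of a stricter check
    rather than the actual patch.

```c
#include <stdbool.h>

/* Hypothetical model of the two counters involved. */
struct model_sock {
    unsigned int ack_backlog;     /* connections waiting for accept() */
    unsigned int max_ack_backlog; /* value passed to listen() */
};

/* Behaviour matching the example: full only once the limit is exceeded. */
bool acceptq_is_full_gt(const struct model_sock *sk)
{
    return sk->ack_backlog > sk->max_ack_backlog;
}

/* Assumed stricter variant: full as soon as the limit is reached. */
bool acceptq_is_full_ge(const struct model_sock *sk)
{
    return sk->ack_backlog >= sk->max_ack_backlog;
}

/* Offer `attempts` connections to a listener that never calls accept()
 * and count how many get queued. */
unsigned int count_accepted(unsigned int backlog,
                            bool (*is_full)(const struct model_sock *),
                            unsigned int attempts)
{
    struct model_sock sk = { 0, backlog };
    unsigned int accepted = 0;

    for (unsigned int i = 0; i < attempts; i++) {
        if (is_full(&sk))
            continue; /* connection refused */
        sk.ack_backlog++;
        accepted++;
    }
    return accepted;
}
```

    With a backlog of 1 and three attempts, the ">" check queues 2
    connections and the ">=" check queues 1. With a backlog of 0, ">" still
    queues 1 connection while ">=" queues none, which is the "N + 1"
    behaviour that the 07 Mar, 2007 revert describes as intended.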

    Signed-off-by: Wei Dong
    Signed-off-by: David S. Miller

    Wei Dong
     

01 Mar, 2007

1 commit

  • ctnetlink uses netlink_unicast from an atomic_notifier_chain
    (which is called within a RCU read side critical section)
    without holding further locks. netlink_unicast calls netlink_trim
    with the result of gfp_any() for the gfp flags, which are passed
    down to pskb_expand_head. gfp_any() only checks for softirq
    context and returns GFP_KERNEL, resulting in this warning:

    BUG: sleeping function called from invalid context at mm/slab.c:3032
    in_atomic():1, irqs_disabled():0
    no locks held by rmmod/7010.

    Call Trace:
    [] debug_show_held_locks+0x9/0xb
    [] __might_sleep+0xd9/0xdb
    [] __kmalloc+0x68/0x110
    [] pskb_expand_head+0x4d/0x13b
    [] netlink_broadcast+0xa5/0x2e0
    [] :nfnetlink:nfnetlink_send+0x83/0x8a
    [] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a
    [] notifier_call_chain+0x29/0x3e
    [] atomic_notifier_call_chain+0x32/0x60
    [] :nf_conntrack:destroy_conntrack+0xa5/0x1d3
    [] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c
    [] :nf_conntrack:kill_l3proto+0x0/0x13
    [] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94
    [] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d
    [] sys_delete_module+0x1b5/0x1e6
    [] trace_hardirqs_on_thunk+0x35/0x37
    [] system_call+0x7e/0x83

    Since netlink_unicast is supposed to be callable from within RCU
    read side critical sections, make gfp_any() check for in_atomic()
    instead of in_softirq().

    Additionally nfnetlink_send needs to use gfp_any() as well for the
    call to netlink_broadcast.
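
    The change in gfp_any() can be pictured with a user-space model. This is
    a sketch only: the context struct, the enum and both function names are
    hypothetical, and the two booleans stand in for what in_softirq() and
    in_atomic() would report in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two allocation modes. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Hypothetical snapshot of the execution context. */
struct model_context {
    bool in_softirq; /* running in softirq context */
    bool in_atomic;  /* sleeping is not allowed for any reason */
};

/* Old behaviour: only softirq context selects the atomic allocation mode. */
enum model_gfp gfp_any_softirq_only(const struct model_context *c)
{
    return c->in_softirq ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}

/* New behaviour: any atomic context selects the atomic allocation mode. */
enum model_gfp gfp_any_atomic_aware(const struct model_context *c)
{
    return c->in_atomic ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}
```

    The warning above reports in_atomic():1 from process context (rmmod), so
    the softirq-only check picks the sleeping allocation mode there while
    the atomic-aware check does not.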

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

08 Dec, 2006

2 commits

  • Stick NFS sockets in their own class to avoid some lockdep warnings. NFS
    sockets are never exposed to user-space, and will hence not trigger certain
    code paths that would otherwise pose deadlock scenarios.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Steven Dickson
    Acked-by: Ingo Molnar
    Cc: Trond Myklebust
    Acked-by: Neil Brown
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    [ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Dec, 2006

1 commit


03 Dec, 2006

3 commits


26 Nov, 2006

1 commit

  • Restoring old, correct comment for sk_filter_release, moving it to
    where it should actually be, and changing new comment into proper
    comment for sk_filter_rcu_free, where it actually makes sense.

    The original fix submitted for this on Oct 23 mistakenly documented
    the wrong function.

    Signed-off-by: Paul Bonser
    Signed-off-by: David S. Miller

    Paul Bonser
     

23 Oct, 2006

1 commit


01 Oct, 2006

1 commit

  • This patch vectorizes aio_read() and aio_write() methods to prepare for
    collapsing all aio & vectored operations into one interface - which is
    aio_read()/aio_write().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Cc: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

23 Sep, 2006

3 commits

  • Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg
    needlock = 0, while socket is not locked at that moment. In order to avoid
    this and similar issues in the future, use rcu for sk->sk_filter field read
    protection.

    Signed-off-by: Dmitry Mishin
    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev

    Dmitry Mishin
     
  • This automatically labels the TCP, Unix stream, and dccp child sockets
    as well as openreqs to be at the same MLS level as the peer. This will
    result in the selection of appropriately labeled IPSec Security
    Associations.

    This also uses the sock's sid (as opposed to the isec sid) in SELinux
    enforcement of secmark in rcv_skb and postroute_last hooks.

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: David S. Miller

    Venkat Yekkirala
     
  • This adds security for IP sockets at the sock level. Security at the
    sock level is needed to enforce the SELinux security policy for
    security associations even when a sock is orphaned (such as in the TCP
    LAST_ACK state).

    This will also be used to enforce SELinux controls over data arriving
    at or leaving a child socket while it's still waiting to be accepted.

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: David S. Miller

    Venkat Yekkirala
     

04 Jul, 2006

3 commits

  • Teach sk_lock semantics to the lock validator. In the softirq path the
    slock has mutex_trylock()+mutex_unlock() semantics, in the process context
    sock_lock() case it has mutex_lock()/mutex_unlock() semantics.

    Thus we treat sock_owned_by_user() flagged areas as an exclusion area too,
    not just those areas covered by a held sk_lock.slock.

    Effect on non-lockdep kernels: minimal, sk_lock_sock_init() has been turned
    into an inline function.

    Signed-off-by: Ingo Molnar
    Cc: Arjan van de Ven
    Cc: "David S. Miller"
    Cc: Herbert Xu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Teach special (recursive) locking code to the lock validator. Has no effect
    on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Teach special (multi-initialized, per-address-family) locking code to the lock
    validator. Has no effect on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

01 Jul, 2006

1 commit

  • This patch generalises the TSO-specific bits from sk_setup_caps by adding
    the sk_gso_type member to struct sock. This makes sk_setup_caps generic
    so that it can be used by TCPv6 or UFO.

    The only catch is that whoever uses this must provide a GSO implementation
    for their protocol which I think is a fair deal :) For now UFO continues to
    live without a GSO implementation which is OK since it doesn't use the sock
    caps field at the moment.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

30 Jun, 2006

1 commit

  • In the current TSO implementation, NETIF_F_TSO and ECN cannot be
    turned on together in a TCP connection. The problem is that most
    hardware that supports TSO does not handle CWR correctly if it is set
    in the TSO packet. Correct handling requires CWR to be set in the
    first packet only if it is set in the TSO header.

    This patch adds the ability to turn on NETIF_F_TSO and ECN using
    GSO if necessary to handle TSO packets with CWR set. Hardware
    that handles CWR correctly can turn on NETIF_F_TSO_ECN in the
    dev->features flag.

    All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If
    the output device does not have the NETIF_F_TSO_ECN feature set, GSO
    will split the packet up correctly with CWR only set in the first
    segment.

    With help from Herbert Xu.

    Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
    flag is completely removed.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     

23 Jun, 2006

2 commits

  • Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
    Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
    Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • This patch adds a generic segmentation offload toggle that can be turned
    on/off for each net device. For now it only supports TCPv4.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

21 Jun, 2006

1 commit

  • * git://git.infradead.org/hdrcleanup-2.6: (63 commits)
    [S390] __FD_foo definitions.
    Switch to __s32 types in joystick.h instead of C99 types for consistency.
    Add to headers included for userspace in
    Move inclusion of out of user scope in asm-x86_64/mtrr.h
    Remove struct fddi_statistics from user view in
    Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
    Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
    Include and use __uXX types in
    Use __uXX types in , include too
    Remove private struct dx_hash_info from public view in
    Include and use __uXX types in
    Use __uXX types in for struct divert_blk et al.
    Use __u32 for elf_addr_t in , not u32. It's user-visible.
    Remove PPP_FCS from user view in , remove __P mess entirely
    Use __uXX types in user-visible structures in
    Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
    Use __uXX types for S390 DASD volume label definitions which are user-visible
    S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
    Remove unneeded inclusion of from
    Fix private integer types used in V4L2 ioctls.
    ...

    Manually resolve conflict in include/linux/mtd/physmap.h

    Linus Torvalds
     

18 Jun, 2006

3 commits


07 May, 2006

1 commit


30 Apr, 2006

1 commit


26 Apr, 2006

1 commit


20 Apr, 2006

1 commit

  • Add some sanity checking. truesize should be at least sizeof(struct
    sk_buff) plus the current packet length. If not, then truesize is
    seriously mangled and deserves a kernel log message.

    Currently we'll do the check for release of stream socket buffers.

    But we can add checks to more spots over time.
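
    The sanity rule described above can be sketched in user space as follows.
    The struct is a hypothetical, heavily reduced stand-in for struct
    sk_buff, and the function name is made up for illustration; only the
    inequality itself comes from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced buffer header; the real struct sk_buff is far larger. */
struct model_sk_buff {
    unsigned int len;      /* current packet length in bytes */
    unsigned int truesize; /* memory accounted to this buffer */
    void *placeholder[4];  /* stands in for the remaining header fields */
};

/* truesize must cover at least the header plus the packet data. */
bool truesize_looks_sane(const struct model_sk_buff *skb)
{
    return skb->truesize >= sizeof(struct model_sk_buff) + skb->len;
}
```

    A caller would log a message when the function returns false, since a
    smaller truesize means the accounting has been corrupted somewhere.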

    Incorporating ideas from Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Mar, 2006

1 commit

  • Sizes in bytes (allyesconfig, i386) and files where those inlines
    are used:

    238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
    238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
    238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
    238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
    238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
    238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
    238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
    238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
    238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
    238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
    238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
    238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
    238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
    238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
    238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
    238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
    238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
    238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
    238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
    238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
    238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
    238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o

    276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
    276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
    276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
    276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
    276 sk_receive_skb 2.6.16/drivers/net/pppoe.o

    209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
    209 sk_dst_check 2.6.16/net/ipv4/udp.o
    209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o

    Large inlines with multiple callers:
    Size  Uses  Wasted  Name and definition
    ===== ====  ======  ================================================
      238   21    4360  sock_queue_rcv_skb          include/net/sock.h
      109   10     801  sock_recv_timestamp         include/net/sock.h
      276    4     768  sk_receive_skb              include/net/sock.h
       94    8     518  __sk_dst_check              include/net/sock.h
      209    3     378  sk_dst_check                include/net/sock.h
      131    4     333  sk_setup_caps               include/net/sock.h
      152    2     132  sk_stream_alloc_pskb        include/net/sock.h
      125    2     105  sk_stream_writequeue_purge  include/net/sock.h
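
    The "Wasted" column is not explained in the message, but every row is
    consistent with wasted = (size - 20) * (uses - 1), that is, each extra
    copy costs its size minus roughly 20 bytes that a call would presumably
    need anyway. That reading is an inference from the numbers, and the
    names below are hypothetical. A short check:

```c
#include <stddef.h>

/* Assumed per-call overhead in bytes; inferred from the table, not stated. */
#define ASSUMED_CALL_COST 20

struct inline_stat {
    const char *name;
    int size;   /* bytes per inlined copy */
    int uses;   /* number of call sites */
    int wasted; /* value reported in the table */
};

/* One copy is needed regardless; every further copy wastes its size minus
 * what an out-of-line call would have cost. */
int wasted_bytes(int size, int uses)
{
    return (size - ASSUMED_CALL_COST) * (uses - 1);
}

static const struct inline_stat inline_table[] = {
    { "sock_queue_rcv_skb",         238, 21, 4360 },
    { "sock_recv_timestamp",        109, 10,  801 },
    { "sk_receive_skb",             276,  4,  768 },
    { "__sk_dst_check",              94,  8,  518 },
    { "sk_dst_check",               209,  3,  378 },
    { "sk_setup_caps",              131,  4,  333 },
    { "sk_stream_alloc_pskb",       152,  2,  132 },
    { "sk_stream_writequeue_purge", 125,  2,  105 },
};

/* Returns 1 if every row of the table matches the inferred formula. */
int table_matches_formula(void)
{
    size_t n = sizeof(inline_table) / sizeof(inline_table[0]);

    for (size_t i = 0; i < n; i++) {
        const struct inline_stat *s = &inline_table[i];
        if (wasted_bytes(s->size, s->uses) != s->wasted)
            return 0;
    }
    return 1;
}
```

    For sock_queue_rcv_skb this gives (238 - 20) * (21 - 1) = 4360 bytes,
    which is why it heads the list.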

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Denis Vlasenko