21 Dec, 2011

5 commits


14 Oct, 2011

1 commit

  • skb truesize currently accounts for sk_buff struct and part of skb head.
    kmalloc() roundings are also ignored.

    Considering that skb_shared_info is larger than sk_buff, it's time to
    take it into account for better memory accounting.

    This patch introduces SKB_TRUESIZE(X) macro to centralize various
    assumptions into a single place.

    At skb alloc phase, we put skb_shared_info struct at the exact end of
    skb head, to allow a better use of memory (lowering number of
    reallocations), since kmalloc() gives us power-of-two memory blocks.

    Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
    aligned to cache lines, as before.

    Note: This patch might trigger performance regressions because of
    misconfigured protocol stacks, hitting per socket or global memory
    limits that were previously not reached. But it's a necessary step for
    more accurate memory accounting.

    Signed-off-by: Eric Dumazet
    CC: Andi Kleen
    CC: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
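
The accounting described in this commit can be sketched in userspace C. This is only an illustration: the structure sizes and the cache line size below are assumed stand-ins (the real values depend on architecture and kernel configuration), and `TRUESIZE_SKETCH` and `truesize_sketch()` are hypothetical names, not the kernel macro itself.

```c
#include <stddef.h>

/* Assumed values for illustration only; real sizes depend on the kernel build. */
#define CACHE_LINE        64u
#define FAKE_SKBUFF_SIZE  232u   /* stand-in for sizeof(struct sk_buff) */
#define FAKE_SHINFO_SIZE  320u   /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to the next multiple of the cache line size. */
#define DATA_ALIGN(x) (((x) + (CACHE_LINE - 1)) & ~(size_t)(CACHE_LINE - 1))

/* Payload plus both cache-line-aligned metadata structures. */
#define TRUESIZE_SKETCH(x) \
    ((x) + DATA_ALIGN(FAKE_SKBUFF_SIZE) + DATA_ALIGN(FAKE_SHINFO_SIZE))

/* Function wrapper so the macro is evaluated with a size_t argument. */
static size_t truesize_sketch(size_t payload)
{
    return TRUESIZE_SKETCH(payload);
}
```

With these assumed sizes, an empty payload is already charged 576 bytes, which shows why counting the shared info struct can push a socket toward its memory limits sooner than before.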
     

13 Aug, 2011

3 commits

  • The current transport mechanism for af_iucv is the z/VM offered
    communications facility IUCV. To provide equivalent support when
    running Linux in an LPAR, HiperSockets transport is added to the
    AF_IUCV address family. It requires explicit binding of an AF_IUCV
    socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV
    is announced. An af_iucv specific transport header is defined
    preceding the skb data. A small protocol is implemented for
    connecting and for flow control/congestion management.

    Signed-off-by: Ursula Braun
    Signed-off-by: Frank Blaschka
    Reviewed-by: Hendrik Brueckner
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Code cleanup making use of a local variable for struct iucv_sock.

    Signed-off-by: Ursula Braun
    Signed-off-by: Frank Blaschka
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • For future af_iucv extensions the module should be able to run in LPAR
    mode too. For this we use the new dynamic loading iucv interface.

    Signed-off-by: Frank Blaschka
    Signed-off-by: David S. Miller

    Frank Blaschka
     

14 May, 2011

1 commit


31 Mar, 2011

1 commit


27 May, 2010

1 commit

  • Add a spin_unlock missing on the error path. There seems to be no reason
    why the lock should continue to be held if the kzalloc fails.

    The semantic match that finds this problem is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    expression E1;
    @@

    * spin_lock(E1,...);

    * spin_unlock(E1,...);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
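
The class of bug fixed here (returning on an allocation failure while still holding a lock) can be shown with a small userspace analogue. The sketch below is not the kernel code: `toy_spinlock`, `add_entry()` and the allocator hook are hypothetical names, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace spinlock standing in for the kernel's spinlock_t. */
typedef struct { atomic_flag held; } toy_spinlock;

static void toy_lock(toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Returns 1 if the lock was free (and is now taken), 0 otherwise. */
static int toy_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Hypothetical allocator hook so that a failure can be forced. */
typedef void *(*alloc_fn)(size_t);

static int add_entry(toy_spinlock *l, alloc_fn alloc, void **out)
{
    void *p;

    toy_lock(l);
    p = alloc(64);
    if (!p) {
        toy_unlock(l);   /* the release that was missing on the error path */
        return -ENOMEM;
    }
    *out = p;
    toy_unlock(l);
    return 0;
}

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

After a failed `add_entry()` call the lock can be taken again immediately; without the extra `toy_unlock()` the next caller would spin forever.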
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed :

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one of such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
    socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to :
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to eventually wakeup tasks.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions :
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They don't need RCU freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep won't be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Dec, 2009

1 commit


06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on if a socket is being created by
    the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

18 Oct, 2009

2 commits


07 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
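
The idea of moving the check into the type can be illustrated as follows. Both functions are hypothetical userspace stand-ins for a protocol's setsockopt handler, and `OPT_BUF_SIZE` is an assumed limit. With a signed length every implementation has to remember the negative check; with an unsigned length a negative value passed by the caller turns into a very large one and is rejected by the ordinary upper bound test.

```c
#include <errno.h>
#include <string.h>

#define OPT_BUF_SIZE 16   /* assumed maximum option size for this sketch */

/* Old style: signed length. Dropping the "< 0" test would let a
 * negative value slip past the upper bound check. */
static int set_opt_signed(char *dst, const char *src, int optlen)
{
    if (optlen < 0 || optlen > OPT_BUF_SIZE)   /* both checks are needed */
        return -EINVAL;
    memcpy(dst, src, (size_t)optlen);
    return 0;
}

/* New style: unsigned length; a single upper bound check suffices. */
static int set_opt_unsigned(char *dst, const char *src, unsigned int optlen)
{
    if (optlen > OPT_BUF_SIZE)
        return -EINVAL;
    memcpy(dst, src, optlen);
    return 0;
}
```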
     

17 Sep, 2009

4 commits

  • iucv_sock_recvmsg() and iucv_process_message()/iucv_fragment_skb race
    for dequeuing an skb from the backlog queue.

    If iucv_sock_recvmsg() dequeues first, iucv_process_message() calls
    sock_queue_rcv_skb() with an skb that is NULL.

    This results in the following kernel panic:

    Unable to handle kernel pointer dereference at virtual kernel address (null)
    Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod vmur qeth ccwgroup
    CPU: 0 Not tainted 2.6.30 #4
    Process client-iucv (pid: 4787, task: 0000000034e75940, ksp: 00000000353e3710)
    Krnl PSW : 0704000180000000 000000000043ebca (sock_queue_rcv_skb+0x7a/0x138)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
    Krnl GPRS: 0052900000000000 000003e0016e0fe8 0000000000000000 0000000000000000
    000000000043eba8 0000000000000002 0000000000000001 00000000341aa7f0
    0000000000000000 0000000000007800 0000000000000000 0000000000000000
    00000000341aa7f0 0000000000594650 000000000043eba8 000000003fc2fb28
    Krnl Code: 000000000043ebbe: a7840006 brc 8,43ebca
    000000000043ebc2: 5930c23c c %r3,572(%r12)
    000000000043ebc6: a724004c brc 2,43ec5e
    >000000000043ebca: e3c0b0100024 stg %r12,16(%r11)
    000000000043ebd0: a7190000 lghi %r1,0
    000000000043ebd4: e310b0200024 stg %r1,32(%r11)
    000000000043ebda: c010ffffdce9 larl %r1,43a5ac
    000000000043ebe0: e310b0800024 stg %r1,128(%r11)
    Call Trace:
    ([] sock_queue_rcv_skb+0x58/0x138)
    [] iucv_process_message+0x112/0x3cc [af_iucv]
    [] iucv_callback_rx+0x1f0/0x274 [af_iucv]
    [] iucv_message_pending+0xa2/0x120
    [] iucv_tasklet_fn+0x176/0x1b8
    [] tasklet_action+0xfe/0x1f4
    [] __do_softirq+0x116/0x284
    [] do_softirq+0xe4/0xe8
    [] irq_exit+0xba/0xd8
    [] do_extint+0x146/0x190
    [] ext_no_vtime+0x1e/0x22
    [] kfree+0x202/0x28c
    ([] kfree+0x1f8/0x28c)
    [] __kfree_skb+0x32/0x124
    [] iucv_sock_recvmsg+0x236/0x41c [af_iucv]
    [] sock_aio_read+0x136/0x160
    [] do_sync_read+0xe4/0x13c
    [] vfs_read+0x152/0x15c
    [] SyS_read+0x54/0xac
    [] sysc_noemu+0x10/0x16
    [] 0x42ff8def3c

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • For non-accepted sockets on the accept queue, iucv_sock_kill()
    is called twice (in iucv_sock_close() and iucv_sock_cleanup_listen()).
    This typically results in a kernel oops as shown below.

    Remove the duplicate call to iucv_sock_kill() and set the SOCK_ZAPPED
    flag in iucv_sock_close() only.

    The iucv_sock_kill() function frees a socket only if the socket is zapped
    and orphaned (sk->sk_socket == NULL):
    - Non-accepted sockets are always orphaned and, thus, iucv_sock_kill()
    frees the socket twice.
    - For accepted sockets or sockets created with iucv_sock_create(),
    sk->sk_socket is initialized. This caused the first call to
    iucv_sock_kill() to return immediately. To free these sockets,
    iucv_sock_release() uses sock_orphan() before calling iucv_sock_kill().

    Unable to handle kernel pointer dereference at virtual kernel address 000000003edd3000
    Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod qeth vmur ccwgroup
    CPU: 0 Not tainted 2.6.30 #4
    Process iucv_sock_close (pid: 2486, task: 000000003aea4340, ksp: 000000003b75bc68)
    Krnl PSW : 0704200180000000 000003e00168e23a (iucv_sock_kill+0x2e/0xcc [af_iucv])
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
    Krnl GPRS: 0000000000000000 000000003b75c000 000000003edd37f0 0000000000000001
    000003e00168ec62 000000003988d960 0000000000000000 000003e0016b0608
    000000003fe81b20 000000003839bb58 00000000399977f0 000000003edd37f0
    000003e00168b000 000003e00168f138 000000003b75bcd0 000000003b75bc98
    Krnl Code: 000003e00168e22a: c0c0ffffe6eb larl %r12,3e00168b000
    000003e00168e230: b90400b2 lgr %r11,%r2
    000003e00168e234: e3e0f0980024 stg %r14,152(%r15)
    >000003e00168e23a: e310225e0090 llgc %r1,606(%r2)
    000003e00168e240: a7110001 tmll %r1,1
    000003e00168e244: a7840007 brc 8,3e00168e252
    000003e00168e248: d507d00023c8 clc 0(8,%r13),968(%r2)
    000003e00168e24e: a7840009 brc 8,3e00168e260
    Call Trace:
    ([] afiucv_dbf+0x0/0xfffffffffffdea20 [af_iucv])
    [] iucv_sock_close+0x130/0x368 [af_iucv]
    [] iucv_sock_release+0x5e/0xe4 [af_iucv]
    [] sock_release+0x44/0x104
    [] sock_close+0x32/0x50
    [] __fput+0xf4/0x250
    [] filp_close+0x7a/0xa8
    [] SyS_close+0xe2/0x148
    [] sysc_noemu+0x10/0x16
    [] 0x42ff8deeac

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • After resuming from suspend, all af_iucv sockets are disconnected.
    Ensure that iucv_accept_dequeue() can handle disconnected sockets
    which are not yet accepted.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • Moving prepare_to_wait before the condition to avoid a race between
    schedule_timeout and wake up.
    The race can appear during iucv_sock_connect() and iucv_callback_connack().

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     

15 Sep, 2009

1 commit


10 Jul, 2009

1 commit

  • Adding memory barrier after the poll_wait function, paired with
    receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
    to wrap the memory barrier.

    Without the memory barrier, the following race can happen.
    The race fires when the following code paths meet, and the tp->rcv_nxt
    and __add_wait_queue updates stay in CPU caches.

    CPU1                         CPU2

    sys_select                   receive packet
      ...                        ...
      __add_wait_queue           update tp->rcv_nxt
      ...                        ...
      tp->rcv_nxt check          sock_def_readable
      ...                        {
      schedule                      ...
                                    if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                            wake_up_interruptible(sk->sk_sleep)
                                    ...
                                 }

    If there was no cache the code would work ok, since the wait_queue and
    rcv_nxt are opposite to each other.

    Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
    passed the tp->rcv_nxt check and sleeps, or will get the new value for
    tp->rcv_nxt and will return with new data mask.
    In both cases the process (CPU1) is being added to the wait queue, so the
    waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

    The bad case is when the __add_wait_queue changes done by CPU1 stay in its
    cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
    end up calling schedule and sleep forever if there is no more data on the
    socket.

    Calls to poll_wait in the following modules were omitted:
    net/bluetooth/af_bluetooth.c
    net/irda/af_irda.c
    net/irda/irnet/irnet_ppp.c
    net/mac80211/rc80211_pid_debugfs.c
    net/phonet/socket.c
    net/rds/af_rds.c
    net/rfkill/core.c
    net/sunrpc/cache.c
    net/sunrpc/rpc_pipe.c
    net/tipc/socket.c

    Signed-off-by: Jiri Olsa
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Olsa
     

23 Jun, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (43 commits)
    via-velocity: Fix velocity driver unmapping incorrect size.
    mlx4_en: Remove redundant refill code on RX
    mlx4_en: Removed redundant check on lso header size
    mlx4_en: Cancel port_up check in transmit function
    mlx4_en: using stop/start_all_queues
    mlx4_en: Removed redundant skb->len check
    mlx4_en: Counting all the dropped packets on the TX side
    usbnet cdc_subset: fix issues talking to PXA gadgets
    Net: qla3xxx, remove sleeping in atomic
    ipv4: fix NULL pointer + success return in route lookup path
    isdn: clean up documentation index
    cfg80211: validate station settings
    cfg80211: allow setting station parameters in mesh
    cfg80211: allow adding/deleting stations on mesh
    ath5k: fix beacon_int handling
    MAINTAINERS: Fix Atheros pattern paths
    ath9k: restore PS mode, before we put the chip into FULL SLEEP state.
    ath9k: wait for beacon frame along with CAB
    acer-wmi: fix rfkill conversion
    ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling
    ...

    Linus Torvalds
     

19 Jun, 2009

2 commits

  • If the iucv message limit for a communication path is exceeded,
    sendmsg() returns -EAGAIN instead of -EPIPE.
    The calling application can then handle this error situation,
    e.g. by trying again after waiting some time.

    For blocking sockets, sendmsg() waits up to the socket timeout
    before returning -EAGAIN. For the new wait condition, a macro
    has been introduced and the iucv_sock_wait_state() has been
    refactored to this macro.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
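
From the application side, handling -EAGAIN as described above might look like the following sketch. It is deliberately independent of a real socket: `send_fn` is a hypothetical hook that follows the send()/sendmsg() convention (bytes sent, or -1 with errno set), and `fake_send()` only exists to simulate a message limit that is exceeded for a while.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical send hook: returns bytes sent, or -1 with errno set. */
typedef ssize_t (*send_fn)(void *ctx, const void *buf, size_t len);

/* Try to send up to max_tries times, retrying only on EAGAIN. */
static ssize_t send_with_retry(send_fn do_send, void *ctx,
                               const void *buf, size_t len, int max_tries)
{
    ssize_t n = -1;

    for (int i = 0; i < max_tries; i++) {
        n = do_send(ctx, buf, len);
        if (n >= 0 || errno != EAGAIN)
            break;   /* success, or an error that should not be retried */
        /* A real caller would wait here before the next attempt,
         * for example by sleeping or polling the socket for writability. */
    }
    return n;
}

/* Fake sender: fails with EAGAIN until the counter in ctx reaches 0. */
static ssize_t fake_send(void *ctx, const void *buf, size_t len)
{
    int *remaining_failures = ctx;

    (void)buf;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        errno = EAGAIN;
        return -1;
    }
    return (ssize_t)len;
}
```

Bounding the number of attempts keeps the caller from looping forever if the peer never drains its outstanding messages.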
     
  • Change the if condition to exit sendmsg() if the socket is not connected.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     

16 Jun, 2009

1 commit

  • Patch establishes a dummy afiucv-device to make sure af_iucv is
    notified as iucv-bus device about suspend/resume.

    The PM freeze callback severs all iucv paths of connected af_iucv sockets.
    The PM thaw/restore callback switches the state of all previously connected
    sockets to IUCV_DISCONN.

    Signed-off-by: Ursula Braun
    Signed-off-by: Martin Schwidefsky

    Ursula Braun
     

23 Apr, 2009

9 commits

  • From: Ursula Braun

    net/iucv/af_iucv.c in net-next-2.6 is almost correct. 4 lines should
    still be deleted. These are the remaining changes:

    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Conflicts:
    net/iucv/af_iucv.c

    David S. Miller
     
  • The SO_MSGLIMIT socket option modifies the message limit for new
    IUCV communication paths.

    The message limit specifies the maximum number of outstanding messages
    that are allowed for connections. This setting can be lowered by z/VM
    when an IUCV connection is established.

    Expects an integer value in the range of 1 to 65535.
    The default value is 65535.

    The message limit must be set before calling connect() or listen()
    for sockets.

    If sockets are already connected or in state listen, changing the message
    limit is not supported.
    For reading the message limit value, unconnected sockets return the limit
    that has been set or the default limit. For connected sockets, the actual
    message limit is returned. The actual message limit is assigned by z/VM
    for each connection and it depends on IUCV MSGLIMIT authorizations
    specified for the z/VM guest virtual machine.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
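
A userspace caller could set the message limit roughly as sketched below. The numeric values of `SOL_IUCV` and `SO_MSGLIMIT` are assumptions written from memory of the kernel headers and should be checked against the headers of the target system; `msglimit_in_range()` and `set_msglimit()` are hypothetical helper names. The range check mirrors the 1 to 65535 range stated in the commit message.

```c
#include <errno.h>
#include <sys/socket.h>

/* Assumed constants; verify against the kernel headers before use. */
#ifndef SOL_IUCV
#define SOL_IUCV 277
#endif
#ifndef SO_MSGLIMIT
#define SO_MSGLIMIT 0x1000
#endif

/* The commit message documents a valid range of 1 to 65535. */
static int msglimit_in_range(int limit)
{
    return limit >= 1 && limit <= 65535;
}

/* Set the limit on an AF_IUCV socket; must happen before connect()/listen(). */
static int set_msglimit(int fd, int limit)
{
    if (!msglimit_in_range(limit)) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(fd, SOL_IUCV, SO_MSGLIMIT, &limit, sizeof(limit));
}
```

Reading the value back with getsockopt() after connecting would return the limit actually assigned by z/VM, which may be lower than the one requested.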
     
  • If the skb cannot be copied to user iovec, always return -EFAULT.
    The skb is enqueued again, unless the MSG_PEEK flag is set, to allow user
    space applications to correct their iovec pointer.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • This patch provides the socket type SOCK_SEQPACKET in addition to
    SOCK_STREAM.

    AF_IUCV sockets of type SOCK_SEQPACKET support a 1:1 mapping of
    socket read or write operations to complete IUCV messages.
    Socket data or IUCV message data is not fragmented, unlike with
    SOCK_STREAM sockets.

    The intention is to help application developers who write
    applications or device drivers using native IUCV interfaces
    (Linux kernel or z/VM IUCV interfaces).

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • Allow 'classification' of socket data that is sent or received over
    an af_iucv socket. For classification of data, the target class of an
    (native) iucv message is used.

    This patch provides the cmsg interface for iucv_sock_recvmsg() and
    iucv_sock_sendmsg(). Applications can use the msg_control field of
    struct msghdr to set or get the target class as a
    "socket control message" (SCM/CMSG).

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
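
Attaching a target class to an outgoing message through msg_control might look like this sketch. The values of `SOL_IUCV` and `SCM_IUCV_TRGCLS`, and the assumption that the target class is a 32-bit value, come from memory of the kernel headers and are not confirmed by this log; `attach_trgcls()` is a hypothetical helper. Only the standard CMSG macros from `<sys/socket.h>` are used.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed constants; verify against the kernel headers before use. */
#ifndef SOL_IUCV
#define SOL_IUCV 277
#endif
#ifndef SCM_IUCV_TRGCLS
#define SCM_IUCV_TRGCLS 0x0001
#endif

/* Fill msg->msg_control with one control message carrying the target class.
 * cbuf must hold at least CMSG_SPACE(sizeof(uint32_t)) bytes and be aligned
 * for struct cmsghdr. Returns 0 on success, -1 if the buffer is too small. */
static int attach_trgcls(struct msghdr *msg, void *cbuf, size_t cbuf_len,
                         uint32_t trgcls)
{
    struct cmsghdr *cmsg;

    if (cbuf_len < CMSG_SPACE(sizeof(trgcls)))
        return -1;
    memset(cbuf, 0, cbuf_len);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(trgcls));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_IUCV;
    cmsg->cmsg_type = SCM_IUCV_TRGCLS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(trgcls));
    memcpy(CMSG_DATA(cmsg), &trgcls, sizeof(trgcls));
    return 0;
}
```

The prepared `struct msghdr` would then be passed to sendmsg(); on the receive side the same macros can be used to walk the control messages returned by recvmsg().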
     
  • This patch allows sending and receiving data in the parameter list of an
    iucv message.
    The parameter list is an array of 8 bytes that are used by af_iucv as
    follows:
      0..6   7 bytes for socket data and
         7   1 byte to store the data length.

    Instead of storing the data length directly, the difference
    between 0xFF and the data length is used.
    This convention does not interfere with the existing use of PRM
    messages for shutting down the send direction of an AF_IUCV socket
    (shutdown() operation). Data lengths greater than 7 (that is, the eighth
    PRM message byte is less than 0xF8) denote special messages.
    Currently, only the special SEND_SHUTDOWN message is supported.

    To use IPRM messages, both communicators must set the IUCV_IPRMDATA
    flag during path negotiation, i.e. in iucv_connect() and
    path_pending().

    To be compatible with older af_iucv implementations, sending PRM
    messages is controlled by the socket option SO_IPRMDATA_MSG.
    Receiving PRM messages does not depend on the socket option (but
    requires the IUCV_IPRMDATA path flag to be set).

    Sending/Receiving data in the parameter list improves performance for
    small amounts of data by reducing message_completion() interrupts and
    memory copy operations.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
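
The length convention described above (the last byte holds 0xFF minus the data length) can be sketched as a pair of helpers. The function and macro names are hypothetical and this is not the af_iucv code; it only encodes the layout stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRM_SIZE     8   /* the parameter list is 8 bytes long */
#define PRM_MAX_DATA 7   /* bytes 0..6 carry socket data */

/* Pack up to 7 bytes into prm; byte 7 stores 0xFF - len.
 * Returns 0 on success, -1 if the data does not fit. */
static int prm_encode(uint8_t prm[PRM_SIZE], const void *data, size_t len)
{
    if (len > PRM_MAX_DATA)
        return -1;
    memset(prm, 0, PRM_SIZE);
    memcpy(prm, data, len);
    prm[7] = (uint8_t)(0xFF - len);
    return 0;
}

/* Returns the data length (0..7), or -1 if byte 7 is below 0xF8,
 * which marks a special message rather than inline data. */
static int prm_decode_len(const uint8_t prm[PRM_SIZE])
{
    int len = 0xFF - prm[7];

    return len > PRM_MAX_DATA ? -1 : len;
}
```

Because valid lengths only occupy the values 0xF8 to 0xFF in the last byte, every smaller value remains free for special messages such as the shutdown notification.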
     
  • Provide the socket operations getsockopt() and setsockopt() to enable/disable
    sending of data in the parameter list of IUCV messages.
    The patch sets the respective flag only.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • If the af_iucv communication partner quiesces the path to shutdown its
    receive direction, provide a quiesce callback implementation to shutdown
    the (local) send direction. This ensures that both sides are synchronized.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner