04 Nov, 2016

1 commit

  • Andrey reported following warning while fuzzing with syzkaller

    WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
    ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
    0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
    Call Trace:
    [< inline >] __dump_stack lib/dump_stack.c:15
    [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
    [] panic+0x1bc/0x39d kernel/panic.c:179
    [] __warn+0x1cc/0x1f0 kernel/panic.c:542
    [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
    [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
    [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
    [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
    [] sock_release+0x8e/0x1d0 net/socket.c:570
    [] sock_close+0x16/0x20 net/socket.c:1017
    [] __fput+0x29d/0x720 fs/file_table.c:208
    [] ____fput+0x15/0x20 fs/file_table.c:244
    [] task_work_run+0xf8/0x170 kernel/task_work.c:116
    [< inline >] exit_task_work include/linux/task_work.h:21
    [] do_exit+0x883/0x2ac0 kernel/exit.c:828
    [] do_group_exit+0x10e/0x340 kernel/exit.c:931
    [] get_signal+0x634/0x15a0 kernel/signal.c:2307
    [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
    [] exit_to_usermode_loop+0xe5/0x130
    arch/x86/entry/common.c:156
    [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
    [] syscall_return_slowpath+0x1a8/0x1e0
    arch/x86/entry/common.c:259
    [] entry_SYSCALL_64_fastpath+0xc0/0xc2
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Kernel Offset: disabled

    Fix this the same way we did for TCP in commit 565b7b2d2e63
    ("tcp: do not send reset to already closed sockets")

    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Dec, 2015

1 commit

  • This patch is a cleanup to make following patch easier to
    review.

    Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
    from (struct socket)->flags to a (struct socket_wq)->flags
    to benefit from RCU protection in sock_wake_async()

    To ease backports, we rename both constants.

    Two new helpers, sk_set_bit(int nr, struct sock *sk)
    and sk_clear_bit(int net, struct sock *sk) are added so that
    following patch can change their implementation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Jul, 2015

1 commit

  • Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
    with flags = MSG_WAITALL | MSG_PEEK.

    sk_wait_data waits for sk_receive_queue not empty, but in this case,
    the receive queue is not empty, but does not contain any skb that we
    can use.

    Add a "last skb seen on receive queue" argument to sk_wait_data, so
    that it sleeps until the receive queue has new skbs.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
    Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
    Reported-by: Enrico Scholz
    Reported-by: Dan Searle
    Signed-off-by: Sabrina Dubroca
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

03 Mar, 2015

1 commit

  • After TIPC doesn't depend on iocb argument in its internal
    implementations of sendmsg() and recvmsg() hooks defined in proto
    structure, no any user is using iocb argument in them at all now.
    Then we can drop the redundant iocb argument completely from kinds of
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

11 Dec, 2014

1 commit


24 Nov, 2014

1 commit


06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be less places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Oct, 2014

1 commit

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - percpu allocator now can take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator and a work item tries to keep the reserve around
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     

09 Oct, 2014

1 commit

  • Pull networking updates from David Miller:
    "Most notable changes in here:

    1) By far the biggest accomplishment, thanks to a large range of
    contributors, is the addition of multi-send for transmit. This is
    the result of discussions back in Chicago, and the hard work of
    several individuals.

    Now, when the ->ndo_start_xmit() method of a driver sees
    skb->xmit_more as true, it can choose to defer the doorbell
    telling the driver to start processing the new TX queue entires.

    skb->xmit_more means that the generic networking is guaranteed to
    call the driver immediately with another SKB to send.

    There is logic added to the qdisc layer to dequeue multiple
    packets at a time, and the handling mis-predicted offloads in
    software is now done with no locks held.

    Finally, pktgen is extended to have a "burst" parameter that can
    be used to test a multi-send implementation.

    Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
    virtio_net

    Adding support is almost trivial, so export more drivers to
    support this optimization soon.

    I want to thank, in no particular or implied order, Jesper
    Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
    Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
    David Tat, Hannes Frederic Sowa, and Rusty Russell.

    2) PTP and timestamping support in bnx2x, from Michal Kalderon.

    3) Allow adjusting the rx_copybreak threshold for a driver via
    ethtool, and add rx_copybreak support to enic driver. From
    Govindarajulu Varadarajan.

    4) Significant enhancements to the generic PHY layer and the bcm7xxx
    driver in particular (EEE support, auto power down, etc.) from
    Florian Fainelli.

    5) Allow raw buffers to be used for flow dissection, allowing drivers
    to determine the optimal "linear pull" size for devices that DMA
    into pools of pages. The objective is to get exactly the
    necessary amount of headers into the linear SKB area pre-pulled,
    but no more. The new interface drivers use is eth_get_headlen().
    From WANG Cong, with driver conversions (several had their own
    by-hand duplicated implementations) by Alexander Duyck and Eric
    Dumazet.

    6) Support checksumming more smoothly and efficiently for
    encapsulations, and add "foo over UDP" facility. From Tom
    Herbert.

    7) Add Broadcom SF2 switch driver to DSA layer, from Florian
    Fainelli.

    8) eBPF now can load programs via a system call and has an extensive
    testsuite. Alexei Starovoitov and Daniel Borkmann.

    9) Major overhaul of the packet scheduler to use RCU in several major
    areas such as the classifiers and rate estimators. From John
    Fastabend.

    10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
    Duyck.

    11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
    Dumazet.

    12) Add Datacenter TCP congestion control algorithm support, From
    Florian Westphal.

    13) Reorganize sk_buff so that __copy_skb_header() is significantly
    faster. From Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
    netlabel: directly return netlbl_unlabel_genl_init()
    net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
    net: description of dma_cookie cause make xmldocs warning
    cxgb4: clean up a type issue
    cxgb4: potential shift wrapping bug
    i40e: skb->xmit_more support
    net: fs_enet: Add NAPI TX
    net: fs_enet: Remove non NAPI RX
    r8169:add support for RTL8168EP
    net_sched: copy exts->type in tcf_exts_change()
    wimax: convert printk to pr_foo()
    af_unix: remove 0 assignment on static
    ipv6: Do not warn for informational ICMP messages, regardless of type.
    Update Intel Ethernet Driver maintainers list
    bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
    tipc: fix bug in multicast congestion handling
    net: better IFF_XMIT_DST_RELEASE support
    net/mlx4_en: remove NETDEV_TX_BUSY
    3c59x: fix bad split of cpu_to_le32(pci_map_single())
    net: bcmgenet: fix Tx ring priority programming
    ...

    Linus Torvalds
     

08 Oct, 2014

1 commit

  • Pull dmaengine updates from Dan Williams:
    "Even though this has fixes marked for -stable, given the size and the
    needed conflict resolutions this is 3.18-rc1/merge-window material.

    These patches have been languishing in my tree for a long while. The
    fact that I do not have the time to do proper/prompt maintenance of
    this tree is a primary factor in the decision to step down as
    dmaengine maintainer. That and the fact that the bulk of drivers/dma/
    activity is going through Vinod these days.

    The net_dma removal has not been in -next. It has developed simple
    conflicts against mainline and net-next (for-3.18).

    Continuing thanks to Vinod for staying on top of drivers/dma/.

    Summary:

    1/ Step down as dmaengine maintainer see commit 08223d80df38
    "dmaengine maintainer update"

    2/ Removal of net_dma, as it has been marked 'broken' since 3.13
    (commit 77873803363c "net_dma: mark broken"), without reports of
    performance regression.

    3/ Miscellaneous fixes"

    * tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
    net: make tcp_cleanup_rbuf private
    net_dma: revert 'copied_early'
    net_dma: simple removal
    dmaengine maintainer update
    dmatest: prevent memory leakage on error path in thread
    ioat: Use time_before_jiffies()
    dmaengine: fix xor sources continuation
    dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
    dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
    dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
    ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
    drivers: dma: Include appropriate header file in dca.c
    drivers: dma: Mark functions as static in dma_v3.c
    dma: mv_xor: Add DMA API error checks
    ioat/dca: Use dev_is_pci() to check whether it is pci device

    Linus Torvalds
     

02 Oct, 2014

1 commit


28 Sep, 2014

1 commit

  • Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
    and there is no plan to fix it.

    This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
    Reverting the remainder of the net_dma induced changes is deferred to
    subsequent patches.

    Marked for stable due to Roman's report of a memory leak in
    dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

    Cc: Dave Jiang
    Cc: Vinod Koul
    Cc: David Whipple
    Cc: Alexander Duyck
    Cc:
    Reported-by: Roman Gushchin
    Acked-by: David S. Miller
    Signed-off-by: Dan Williams

    Dan Williams
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_counters too.

    We could have left percpu_counter_init() alone and added
    percpu_counter_init_gfp(); however, the number of users isn't that
    high and introducing _gfp variants to all percpu data structures would
    be quite ugly, so let's just do the conversion. This is the one with
    the most users. Other percpu data structures are a lot easier to
    convert.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-by: Jan Kara
    Acked-by: "David S. Miller"
    Cc: x86@kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andrew Morton

    Tejun Heo
     

08 May, 2014

1 commit

  • commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
    reduced snmp array size to 1, so technically it doesn't have to be
    an array any more. What's more, after the following commit:

    commit 933393f58fef9963eac61db8093689544e29a600
    Date: Thu Dec 22 11:58:51 2011 -0600

    percpu: Remove irqsafe_cpu_xxx variants

    We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
    almost 3 years, no one complains.

    So, just convert the array to a single pointer and remove snmp_mib_init()
    and snmp_mib_free() as well.

    Cc: Christoph Lameter
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

09 Oct, 2013

1 commit

  • TCP listener refactoring, part 3 :

    Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
    and parallel SYN processing.

    Current inet_ehash_bucket contains two chains, one for ESTABLISH (and
    friend states) sockets, another for TIME_WAIT sockets only.

    As the hash table is sized to get at most one socket per bucket, it
    makes little sense to have separate twchain, as it makes the lookup
    slightly more complicated, and doubles hash table memory usage.

    If we make sure all socket types have the lookup keys at the same
    offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
    and ESTABLISHED sockets already have common lookup fields for IPv4.

    [ INET_TW_MATCH() is no longer needed ]

    I'll provide a follow-up to factorize IPv6 lookup as well, to remove
    INET6_TW_MATCH()

    This way, SYN_RECV pseudo sockets will be supported the same.

    A new sock_gen_put() helper is added, doing either a sock_put() or
    inet_twsk_put() [ and will support SYN_RECV later ].

    Note this helper should only be called in real slow path, when rcu
    lookup found a socket that was moved to another identity (freed/reused
    immediately), but could eventually be used in other contexts, like
    sock_edemux()

    Before patch :

    dmesg | grep "TCP established"

    TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

    After patch :

    TCP established hash table entries: 524288 (order: 10, 4194304 bytes)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Jul, 2013

1 commit

  • Several call sites use the hardcoded following condition :

    sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

    Lets use a helper because TCP_NOTSENT_LOWAT support will change this
    condition for TCP sockets.

    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 May, 2012

1 commit


20 Dec, 2011

1 commit

  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     

01 Aug, 2011

1 commit

  • This uses the new feature-negotiation framework to signal Ack Ratio changes,
    as required by RFC 4341, sec. 6.1.2.

    That raises some problems with CCID-2, which at the moment can not cope
    gracefully with Ack Ratios > 1. Since these issues are not directly related
    to feature negotiation, they are marked by a FIXME.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Samuel Jero
    Acked-by: Ian McDonald

    Gerrit Renker
     

07 Dec, 2010

2 commits

  • Ensure that cmsg->cmsg_type value is valid for qpolicy
    that is currently in use.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     
  • This patch adds a generic infrastructure for policy-based dequeueing of
    TX packets and provides two policies:
    * a simple FIFO policy (which is the default) and
    * a priority based policy (set via socket options).
    Both policies honour the tx_qlen sysctl for the maximum size of the write
    queue (can be overridden via socket options).

    The priority policy uses skb->priority internally to assign an u32 priority
    identifier, using the same ranking as SO_PRIORITY. The skb->priority field
    is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
    data using cmsg(3), the patch also provides the requisite parsing routines.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     

29 Oct, 2010

1 commit

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs to can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

12 Oct, 2010

1 commit

  • This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
    echoes the function name, avoiding messages like

    kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     

07 Oct, 2010

1 commit


26 Jun, 2010

1 commit

  • In preparation for 64bit snmp counters for some mibs,
    add an 'align' parameter to snmp_mib_init(), instead
    of assuming mibs only contain 'unsigned long' fields.

    Callers can use __alignof__(type) to provide correct
    alignment.

    Signed-off-by: Eric Dumazet
    CC: Herbert Xu
    CC: Arnaldo Carvalho de Melo
    CC: Hideaki YOSHIFUJI
    CC: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 May, 2010

1 commit

  • Use memdup_user when user data is immediately copied into the
    allocated region.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    expression from,to,size,flag;
    position p;
    identifier l1,l2;
    @@

    - to = \(kmalloc@p\|kzalloc@p\)(size,flag);
    + to = memdup_user(from,size);
    if (
    - to==NULL
    + IS_ERR(to)
    || ...) {

    }
    - if (copy_from_user(to, from, size) != 0) {
    -
    - }
    //

    Signed-off-by: Julia Lawall
    Acked-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Julia Lawall
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
    return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep wont be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Mar, 2010

1 commit

  • dccp: fix panic caused by failed initialisation

    This fixes a kernel panic reported thanks to Andre Noll:

    if DCCP is compiled into the kernel and any out of the initialisation
    steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
    a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
    from creating DCCP sockets.

    This patch fixes the problem by propagating a failure in dccp_init() to
    dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
    DCCP protocol is not made available if its initialisation fails.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    The macro and type tricks around snmp stats make things a bit
    interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
    as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
    snmp_mib_*() users which used to cast the argument to (void **) are
    updated to cast it to (void __percpu **).

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Patrick McHardy
    Cc: Arnaldo Carvalho de Melo
    Cc: Vlad Yasevich
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Tejun Heo
     

13 Feb, 2010

1 commit

  • DCCP is datagram-oriented but lacks UDP's support for MSG_TRUNC as defined in
    recvmsg(2)/recv(2). Hence the following 'Hello world\0' receiver

    len = recv(fd, buf, 10, MSG_PEEK | MSG_TRUNC);

    wrongly (always) returns 10, while in UDP it returns 12 as expected.
    This patch adds the missing MSG_TRUNC support to recvmsg().

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfert fields used at lookup time in the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

13 Aug, 2009

1 commit


10 Aug, 2009

1 commit


06 Aug, 2009

2 commits

  • String literals are constant, and usually, we can also tag the array
    of pointers const too, moving it to the .rodata section.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: David S. Miller

    Jan Engelhardt
     
  • percpu counter dccp_orphan_count is init in dccp_init() by
    percpu_counter_init() while dccp module is loaded, but the
    destroy of it is missing while dccp module is unloaded. We
    can get the kernel WARNING about this. Reproduct by the
    following commands:

    $ modprobe dccp
    $ rmmod dccp
    $ modprobe dccp

    WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
    Hardware name: VMware Virtual Platform
    list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next
    =ca7188cc).
    Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
    Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
    Call Trace:
    [] warn_slowpath_common+0x6a/0x81
    [] ? __list_add+0x27/0x5c
    [] warn_slowpath_fmt+0x29/0x2c
    [] __list_add+0x27/0x5c
    [] __percpu_counter_init+0x4d/0x5d
    [] dccp_init+0x19/0x2ed [dccp]
    [] do_one_initcall+0x4f/0x111
    [] ? dccp_init+0x0/0x2ed [dccp]
    [] ? notifier_call_chain+0x26/0x48
    [] ? __blocking_notifier_call_chain+0x45/0x51
    [] sys_init_module+0xac/0x1bd
    [] sysenter_do_call+0x12/0x22

    Signed-off-by: Wei Yongjun
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Yongjun
     

30 Jul, 2009

1 commit

  • The DCCP protocol tries to allocate some large hash tables during
    initialisation using the largest size possible. This can be larger than
    what the page allocator can provide so it prints a warning. However, the
    caller is able to handle the situation so this patch suppresses the
    warning.

    Signed-off-by: Mel Gorman
    Acked-by: Arnaldo Carvalho de Melo
    Cc: "David S. Miller"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman