04 Oct, 2010

1 commit

  • This patch fixes the condition (3rd arg) passed to sk_wait_event() in
    sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
    causes the following soft lockup in tcp_sendmsg() when the global tcp
    memory pool has exhausted.

    >>> snip <<<

    localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
    localhost kernel: CPU 3:
    localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
    localhost kernel:
    localhost kernel: Call Trace:
    localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
    localhost kernel: [] autoremove_wake_function+0x0/0x40
    localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
    localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
    localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
    localhost kernel: [] autoremove_wake_function+0x0/0x40
    localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
    localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
    localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
    localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83

    >>> snip <<<

    What is happening is, that the sk_wait_event() condition passed from
    sk_stream_wait_memory() evaluates to true for the case of tcp global memory
    exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
    which causes sk_wait_event() to *not* call schedule_timeout().
    Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
    This causes the caller to again try allocation, which again fails and again
    calls sk_stream_wait_memory(), and so on.

    [ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088
    ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]

    Signed-off-by: Nagendra Singh Tomar
    Signed-off-by: David S. Miller

    Nagendra Tomar
     

13 Jul, 2010

1 commit


02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed :

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one of such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
    socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to :
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to eventually wakeup tasks.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions :
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They dont need rcu freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
    return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep wont be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2009

1 commit

  • When TCP frees up write buffer space, avoid waking up tasks that have
    done a poll() or select() on the same socket specifying read-side
    events.

    This is an extension of a read-side patch by Eric Dumazet.

    Signed-off-by: John Dykstra
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Dykstra
     

14 Oct, 2008

1 commit

  • Clean up the various different email addresses of mine listed in the code
    to a single current and valid address. As Dave says his network merges
    for 2.6.28 are now done this seems a good point to send them in where
    they won't risk disrupting real changes.

    Signed-off-by: Alan Cox
    Signed-off-by: David S. Miller

    Alan Cox
     

26 Jul, 2008

1 commit

  • Removes legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better to automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuively BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON() though some might actually be
    promoted to BUG_ON() but I left that to future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

29 Jan, 2008

4 commits

  • This patch introduces new memory accounting functions for each network
    protocol. Most of them are renamed from memory accounting functions
    for stream protocols. At the same time, some stream memory accounting
    functions are removed since other functions do same thing.

    Renaming:
    sk_stream_free_skb() -> sk_wmem_free_skb()
    __sk_stream_mem_reclaim() -> __sk_mem_reclaim()
    sk_stream_mem_reclaim() -> sk_mem_reclaim()
    sk_stream_mem_schedule -> __sk_mem_schedule()
    sk_stream_pages() -> sk_mem_pages()
    sk_stream_rmem_schedule() -> sk_rmem_schedule()
    sk_stream_wmem_schedule() -> sk_wmem_schedule()
    sk_charge_skb() -> sk_mem_charge()

    Removeing
    sk_stream_rfree(): consolidates into sock_rfree()
    sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
    sk_stream_mem_schedule()

    The following functions are added.
    sk_has_account(): check if the protocol supports accounting
    sk_mem_uncharge(): do the opposite of sk_mem_charge()

    In addition, to achieve consolidation, updating sk_wmem_queued is
    removed from sk_mem_charge().

    Next, to consolidate memory accounting functions, this patch adds
    memory accounting calls to network core functions. Moreover, present
    memory accounting call is renamed to new accounting call.

    Finally we replace present memory accounting calls with new interface
    in TCP and SCTP.

    Signed-off-by: Takahiro Yasui
    Signed-off-by: Hideo Aoki
    Signed-off-by: David S. Miller

    Hideo Aoki
     
  • sk_forward_alloc being signed, we should take care of divides by
    SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
    __sk_stream_mem_reclaim()

    This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
    as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
    shifts instead of plain divides.

    This should help compiler to choose right shifts instead of
    expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The sock_wake_async() performs a bit different actions
    depending on "how" argument. Unfortunately this argument
    ony has numerical magic values.

    I propose to give names to their constants to help people
    reading this function callers understand what's going on
    without looking into this function all the time.

    I suppose this is 2.6.25 material, but if it's not (or the
    naming seems poor/bad/awful), I can rework it against the
    current net-2.6 tree.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This function references sk->sk_prot->xxx for many times.
    It turned out, that there's so many code in it, that gcc
    cannot always optimize access to sk->sk_prot's fields.

    After saving the sk->sk_prot on the stack and comparing
    disassembled code, it turned out that the function became
    ~10 bytes shorter and made less dereferences (on i386 and
    x86_64). Stack consumption didn't grow.

    Besides, this patch drives most of this function into the
    80 columns limit.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

11 Feb, 2007

1 commit


13 Jul, 2006

1 commit

  • __sk_stream_mem_reclaim is only called by sk_stream_mem_reclaim.

    As such the check on sk->sk_forward_alloc is not needed and can be
    removed.

    Signed-off-by: Ian McDonald
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Ian McDonald
     

20 Apr, 2006

1 commit

  • Add some sanity checking. truesize should be at least sizeof(struct
    sk_buff) plus the current packet length. If not, then truesize is
    seriously mangled and deserves a kernel log message.

    Currently we'll do the check for release of stream socket buffers.

    But we can add checks to more spots over time.

    Incorporating ideas from Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Jan, 2006

1 commit

  • It also looks like there were 2 places where the test on sk_err was
    missing from the event wait logic (in sk_stream_wait_connect and
    sk_stream_wait_memory), while the rest of the sock_error() users look
    to be doing the right thing. This version of the patch fixes those,
    and cleans up a few places that were testing ->sk_err directly.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

06 Nov, 2005

1 commit

  • When sk_stream_wait_connect detects a state transition to ESTABLISHED
    or CLOSE_WAIT prior to it going to sleep, it will return without
    calling finish_wait and decrementing sk_write_pending.

    This may result in crashes and other unintended behaviour.

    The fix is to always call finish_wait and update sk_write_pending since
    it is safe to do so even if the wait entry is no longer on the queue.

    This bug was tracked down with the help of Alex Sidorenko and the
    fix is also based on his suggestion.

    Signed-off-by: Herbert Xu
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
     

01 May, 2005

1 commit

  • I have recompiled Linux kernel 2.6.11.5 documentation for me and our
    university students again. The documentation could be extended for more
    sources which are equipped by structured comments for recent 2.6 kernels. I
    have tried to proceed with that task. I have done that more times from 2.6.0
    time and it gets boring to do same changes again and again. Linux kernel
    compiles after changes for i386 and ARM targets. I have added references to
    some more files into kernel-api book, I have added some section names as well.
    So please, check that changes do not break something and that categories are
    not too much skewed.

    I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
    by kernel convention. Most of the other changes are modifications in the
    comments to make kernel-doc happy, accept some parameters description and do
    not bail out on errors. Changed to @pid in the description, moved some
    #ifdef before comments to correct function to comments bindings, etc.

    You can see result of the modified documentation build at
    http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz

    Some more sources are ready to be included into kernel-doc generated
    documentation. Sources has been added into kernel-api for now. Some more
    section names added and probably some more chaos introduced as result of quick
    cleanup work.

    Signed-off-by: Pavel Pisa
    Signed-off-by: Martin Waitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Pisa
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds