23 Mar, 2008

4 commits

  • Sorry for the patch sequence confusion :| but I realized too late that
    the same thing can easily be done for raw sockets.

    Expand the proto.h union with the raw_hashinfo member and use it in
    raw_prot and rawv6_prot. This allows dropping the protocol-specific
    versions of the hash and unhash callbacks.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After this we have only udp_lib_get_port to get the port and two
    stubs for ipv4 and ipv6. There is no difference between udp and
    udplite except for the initialized h.udp_hash member.

    I tried to find a graceful way to drop the only difference between
    udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
    routine), but adding one more callback to struct proto didn't
    seem graceful :( Maybe later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
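
The shape described in the commit above (one shared port routine plus thin IPv4 and IPv6 stubs that differ only in the rcv_saddr comparison) can be sketched in userspace as follows. This is a simplified model, not the kernel code: `sock_model`, `lib_get_port`, `v4_get_port`, `v6_get_port` and the flat array of bound sockets are all hypothetical names chosen for illustration, and wildcard addresses are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified socket: a port and a receive address. */
struct sock_model {
    unsigned short port;
    unsigned char  rcv_saddr[16]; /* first 4 bytes used for IPv4, all 16 for IPv6 */
};

typedef bool (*saddr_cmp_fn)(const struct sock_model *a,
                             const struct sock_model *b);

static bool ipv4_saddr_equal(const struct sock_model *a,
                             const struct sock_model *b)
{
    return memcmp(a->rcv_saddr, b->rcv_saddr, 4) == 0;
}

static bool ipv6_saddr_equal(const struct sock_model *a,
                             const struct sock_model *b)
{
    return memcmp(a->rcv_saddr, b->rcv_saddr, 16) == 0;
}

/* Shared logic: 0 if the port was taken for sk, -1 if it conflicts. */
static int lib_get_port(struct sock_model *sk, unsigned short port,
                        const struct sock_model *bound, size_t nbound,
                        saddr_cmp_fn saddr_equal)
{
    assert(sk != NULL && saddr_equal != NULL);
    for (size_t i = 0; i < nbound; i++)
        if (bound[i].port == port && saddr_equal(sk, &bound[i]))
            return -1;
    sk->port = port;
    return 0;
}

/* The two stubs only choose the address comparison routine. */
static int v4_get_port(struct sock_model *sk, unsigned short port,
                       const struct sock_model *bound, size_t nbound)
{
    return lib_get_port(sk, port, bound, nbound, ipv4_saddr_equal);
}

static int v6_get_port(struct sock_model *sk, unsigned short port,
                       const struct sock_model *bound, size_t nbound)
{
    return lib_get_port(sk, port, bound, nbound, ipv6_saddr_equal);
}
```

Passing the comparison routine as an argument keeps the difference out of the shared structure, which is the trade-off the commit message settles for.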
     
  • Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
    struct proto) from Arnaldo, I did a similar thing for the UDP/-Lite
    IPv4 and IPv6 protocols.

    The result is not that exciting, but it removes some levels of
    indirection in udpxxx_get_port and saves some space in code and text.

    The first step is to put the existing hashinfo and the new udp_hash
    into a union in struct proto and to give this union a name, since the
    future initialization of tcpxxx_prot, dccp_vx_protinfo and
    udpxxx_protinfo would otherwise cause a gcc warning about being unable
    to initialize an anonymous member this way.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
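
To illustrate why the union gets a name in the commit above, here is a minimal userspace sketch. The member names `h`, `hashinfo` and `udp_hash` follow the commit text; everything else (`proto_model`, `hashinfo_model`, `udp_hash_model`, the table sizes) is a hypothetical stand-in rather than the kernel definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's hash table types. */
struct hashinfo_model { int nbuckets; };
struct udp_hash_model { int nslots; };

/* Simplified model of struct proto. Because the union is named "h",
 * a designated initializer can pick exactly one of its members. */
struct proto_model {
    const char *name;
    union {
        struct hashinfo_model *hashinfo;
        struct udp_hash_model *udp_hash;
    } h;
};

static struct hashinfo_model tcp_table = { .nbuckets = 256 };
static struct udp_hash_model udp_table = { .nslots = 128 };

static struct proto_model tcp_prot_model = {
    .name       = "TCP",
    .h.hashinfo = &tcp_table,
};

static struct proto_model udp_prot_model = {
    .name       = "UDP",
    .h.udp_hash = &udp_table,
};

/* Accessor used by UDP-style protocols only. */
static int udp_slots(const struct proto_model *p)
{
    assert(p != NULL && p->h.udp_hash != NULL);
    return p->h.udp_hash->nslots;
}
```

Both pointers share the same storage, so each protocol must only read the member it initialized.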
     
  • ip_options->is_data is only assigned and never checked. The structure is
    not part of the kernel interface to userspace, so it is safe to remove
    this field.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     

22 Mar, 2008

2 commits

  • Change TCP_DEFER_ACCEPT implementation so that it transitions a
    connection to ESTABLISHED after handshake is complete instead of
    leaving it in SYN-RECV until some data arrives. Place the connection
    in the accept queue when the first data packet arrives from slow path.

    Benefits:
    - established connection is now reset if it never makes it
    to the accept queue

    - diagnostic state of established matches with the packet traces
    showing completed handshake

    - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
    enforced with reasonable accuracy instead of rounding up to next
    exponential back-off of syn-ack retry.

    Signed-off-by: Patrick McManus
    Signed-off-by: David S. Miller

    Patrick McManus
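
For context on the commit above, this is how a userspace server typically requests deferred accept on Linux. The sketch only shows the socket option call, with the timeout given in seconds as the commit describes; `set_defer_accept` is a hypothetical helper name, and the block says nothing about how the kernel side is implemented.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to hold back accept() wake-ups on a TCP socket until
 * data arrives, or until roughly `seconds` have passed.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_defer_accept(int listen_fd, int seconds)
{
    assert(seconds >= 0);
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &seconds, sizeof(seconds));
}
```

A server would call this on its listening socket before or after `listen()`; TCP_DEFER_ACCEPT is Linux-specific, so portable code needs a fallback.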
     
  • Use the inline trick (same as pr_debug) to get checking of debug
    statements even if no code is generated.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
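
The inline trick mentioned in the commit above can be sketched like this. `MY_DEBUG`, `my_debug` and `handle_packet` are hypothetical names for illustration; the block relies on GNU C extensions (the `format` attribute and `##__VA_ARGS__`), which is an assumption about the compiler.

```c
#include <assert.h>
#include <stdio.h>

#ifdef MY_DEBUG
/* Debugging enabled: really print. */
#define my_debug(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#else
/* Debugging disabled: an empty inline function. The format attribute
 * makes the compiler check the arguments against the format string
 * even though the body does nothing. */
static inline __attribute__((format(printf, 1, 2)))
int my_debug(const char *fmt, ...)
{
    (void)fmt;
    return 0;
}
#endif

static int handle_packet(int len)
{
    assert(len >= 0);
    /* A mismatch such as passing a pointer for %d would still be
     * reported at compile time with -Wformat. */
    my_debug("handling packet, len=%d\n", len);
    return len;
}
```

Compared with an empty macro, the disabled variant keeps format/argument mismatches from going unnoticed until someone turns debugging on.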
     

21 Mar, 2008

7 commits

  • Make the proc entry for tcp6 per namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • The proc init/exit functions take a new network namespace parameter in
    order to register/unregister /proc/net/udp6 for a namespace.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • This patch, like the udp proc one, makes the proc functions take into
    account which namespace the socket belongs to.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • This patch makes the common udp proc functions select which sockets
    to show, taking into account the namespace each socket belongs to.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
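
The namespace-aware iteration described in the commit above amounts to skipping sockets that belong to a different namespace while walking the table. A minimal userspace sketch, assuming a flat array instead of the kernel's hash table; `net_model`, `sock_model`, `next_sock_in_net` and `count_socks_in_net` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct net_model  { int id; };
struct sock_model { const struct net_model *net; unsigned short port; };

/* Return the next socket at or after *pos that belongs to `net`,
 * advancing *pos past it, or NULL when the table is exhausted. */
static const struct sock_model *
next_sock_in_net(const struct sock_model *table, size_t n, size_t *pos,
                 const struct net_model *net)
{
    assert(pos != NULL);
    while (*pos < n) {
        const struct sock_model *sk = &table[(*pos)++];
        if (sk->net == net)      /* skip sockets of other namespaces */
            return sk;
    }
    return NULL;
}

/* What a per-namespace proc listing would show: only matching sockets. */
static size_t count_socks_in_net(const struct sock_model *table, size_t n,
                                 const struct net_model *net)
{
    size_t pos = 0, count = 0;
    while (next_sock_in_net(table, n, &pos, net) != NULL)
        count++;
    return count;
}
```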
     
  • Update: My mailer ate one of Jarek's feedback mails... Fixed the
    parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the
    whitespace issue due to a patch import botch. Changed the types from
    u32 to unsigned int to be more consistent with other variables in the
    area. Also brought the patch up to the latest net-2.6.26 tree.

    Update: Made gso_max_size container 32 bits, not 16. Moved the
    location of gso_max_size within netdev to be less hotpath. Made more
    consistent names between the sock and netdev layers, and added a
    define for the max GSO size.

    Update: Respun for net-2.6.26 tree.

    Update: changed max_gso_frame_size and sk_gso_max_size from signed to
    unsigned - thanks Stephen!

    This patch adds the ability for device drivers to control the size of
    the TSO frames being sent to them, per TCP connection. When a driver
    sets the netdevice's gso_max_size value, the socket layer sets the GSO
    frame size based on that value. This propagates into the TCP
    layer, which sends TSOs of that size to the hardware.

    This can be desirable to help tune the bursty nature of TSO on a
    per-adapter basis, where one may have 1 GbE and 10 GbE devices
    coexisting in a system, one running multiqueue and the other not, etc.

    This can also be desirable for devices that cannot support full 64 KB
    TSO's, but still want to benefit from some level of segmentation
    offloading.

    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: David S. Miller

    Peter P Waskiewicz Jr
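
The flow described in the commit above (driver sets a limit on the netdevice, the socket inherits it, TCP clamps its bursts) can be modelled in a few lines. This is a userspace sketch under assumptions: the field names `gso_max_size` and `sk_gso_max_size` come from the commit text, while the structs, the `_model` functions and the 64 KB constant used as the upper bound are illustrative stand-ins.

```c
#include <assert.h>

/* Assumed upper bound for illustration: full-size 64 KB TSO. */
#define GSO_MAX_SIZE_MODEL 65536u

struct netdev_model { unsigned int gso_max_size; };
struct sock_model   { unsigned int sk_gso_max_size; };

/* Driver side: cap the TSO frame size this device wants to receive. */
static void set_gso_max_size_model(struct netdev_model *dev, unsigned int size)
{
    assert(size <= GSO_MAX_SIZE_MODEL);
    dev->gso_max_size = size;
}

/* Socket side: inherit the device limit once the output device is known. */
static void sock_setup_caps_model(struct sock_model *sk,
                                  const struct netdev_model *dev)
{
    sk->sk_gso_max_size = dev->gso_max_size;
}

/* TCP side: never build a burst larger than the socket's limit. */
static unsigned int tso_burst_bytes(const struct sock_model *sk,
                                    unsigned int wanted)
{
    return wanted < sk->sk_gso_max_size ? wanted : sk->sk_gso_max_size;
}
```

Because the limit lives on the socket, two connections going out through different adapters can use different TSO sizes on the same host.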
     
  • David S. Miller
     
  • There is a race in SCTP between the loading of the module
    and the access by the socket layer to the protocol functions.
    In particular, a list of addresses that SCTP maintains is
    not initialized prior to the registration with the protosw.
    Thus it is possible for a user application to gain access
    to SCTP functions before everything has been initialized.
    The problem shows up as odd crashes during connection
    initialization when we try to access the SCTP address list.

    The solution is to refactor how we do registration and
    initialize the lists prior to registering with the protosw.
    Care must be taken since the address list initialization
    depends on some other pieces of SCTP initialization. The
    clean-up in case of failure also needs to be refactored.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
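
The ordering rule from the commit above (initialize everything a protocol handler touches before registering it, and unwind in reverse on failure) is commonly written with goto-based clean-up. The sketch below is a generic userspace model, not the SCTP code: `proto_state`, `init_addr_list`, `register_protosw_model` and the failure-injection flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct proto_state {
    bool addr_list_ready;  /* models the address list being initialized */
    bool registered;       /* models visibility to the socket layer */
};

static int init_addr_list(struct proto_state *st, bool fail)
{
    if (fail)
        return -1;
    st->addr_list_ready = true;
    return 0;
}

static void free_addr_list(struct proto_state *st)
{
    st->addr_list_ready = false;
}

static int register_protosw_model(struct proto_state *st, bool fail)
{
    /* Never become reachable from user space with an uninitialized list. */
    assert(st->addr_list_ready);
    if (fail)
        return -1;
    st->registered = true;
    return 0;
}

/* Module init: initialize first, register last, unwind in reverse order. */
static int proto_module_init(struct proto_state *st,
                             bool fail_list, bool fail_register)
{
    int err;

    err = init_addr_list(st, fail_list);
    if (err)
        goto out;

    err = register_protosw_model(st, fail_register);
    if (err)
        goto err_free_list;

    return 0;

err_free_list:
    free_addr_list(st);
out:
    return err;
}
```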
     

18 Mar, 2008

6 commits


17 Mar, 2008

3 commits

  • Some drivers need to reserve all PCI BARs to prevent other drivers
    from misusing unoccupied BARs. pcim_iomap_regions_request_all()
    requests all BARs and iomaps the specified BARs.

    Signed-off-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Cc: Alan Cox
    Cc: Jeff Garzik
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • There is a race in virtio_net, dealing with disabling/enabling the callback.
    I saw the following oops:

    kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
    illegal operation: 0001 [#1] SMP
    Modules linked in: sunrpc dm_mod
    CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
    Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
    Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
    Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
    000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
    0000000000000000 000000000f870000 0000000000000000 0000000000001237
    000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
    Krnl Code: 00000000002b819a: a7110001 tmll %r1,1
    00000000002b819e: a7840004 brc 8,2b81a6
    00000000002b81a2: a7f40001 brc 15,2b81a4
    >00000000002b81a6: a51b0001 oill %r1,1
    00000000002b81aa: 40102000 sth %r1,0(%r2)
    00000000002b81ae: 07fe bcr 15,%r14
    00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15)
    00000000002b81b6: a7f13e00 tmll %r15,15872
    Call Trace:
    ([] 0xfa0bcd0)
    [] vring_interrupt+0x5c/0x6c
    [] do_extint+0xb8/0xf0
    [] ext_no_vtime+0x16/0x1a
    [] cpu_idle+0x1c2/0x1e0

    The problem can be triggered with a high amount of host->guest traffic.
    I think it's the following race:

    poll says netif_rx_complete
    poll calls enable_cb
    enable_cb opens the interrupt mask
    a new packet comes, an interrupt is triggered----\
    enable_cb sees that there is more work           |
    enable_cb disables the interrupt                 |
           .                                         V
           .                            interrupt is delivered
           .                            skb_recv_done does atomic napi test, ok
     some waiting                       disable_cb is called->check fails->bang!
           .
    poll would do napi check
    poll would do disable_cb

    The fix is for enable_cb not to disable the interrupt again, but to
    expect the caller to do the cleanup if it returns false. In that case,
    the interrupt is only disabled if the napi test_set_bit was successful.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell (cleaned up doco)

    Christian Borntraeger
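
The fix described in the commit above can be modelled in userspace to make the ownership rule explicit: whoever wins the NAPI test-and-set is the only one allowed to call disable_cb. This is a single-threaded sketch under assumptions, not the virtio_net code; `vq_model`, `napi_model`, `poll_finish` and `recv_done` are hypothetical names, and the `assert` in `disable_cb` stands in for the check that fired in the oops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct vq_model {
    bool cb_enabled;  /* models the interrupt mask being open */
    int  pending;     /* buffers that arrived and are not yet consumed */
};

struct napi_model { atomic_flag scheduled; };

/* After the fix: only enable, and report whether more work raced in. */
static bool enable_cb(struct vq_model *vq)
{
    vq->cb_enabled = true;
    return vq->pending == 0;  /* false: caller must clean up */
}

static void disable_cb(struct vq_model *vq)
{
    assert(vq->cb_enabled);   /* disabling twice is the bug */
    vq->cb_enabled = false;
}

/* True only for the single caller that wins the test-and-set. */
static bool napi_schedule_prep_model(struct napi_model *n)
{
    return !atomic_flag_test_and_set(&n->scheduled);
}

/* Tail of the poll handler; returns true if polling must continue. */
static bool poll_finish(struct vq_model *vq, struct napi_model *napi)
{
    atomic_flag_clear(&napi->scheduled);          /* poll is complete */
    if (!enable_cb(vq) && napi_schedule_prep_model(napi)) {
        disable_cb(vq);   /* we own the NAPI bit, so nobody else disabled */
        return true;
    }
    return false;
}

/* Interrupt side: disable callbacks only when it wins the NAPI bit. */
static void recv_done(struct vq_model *vq, struct napi_model *napi)
{
    if (napi_schedule_prep_model(napi))
        disable_cb(vq);
}
```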
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
    [PARISC] make ptr_to_pide() static
    [PARISC] head.S: section mismatch fixes
    [PARISC] add back Crestone Peak cpu
    [PARISC] futex: special case cmpxchg NULL in kernel space
    [PARISC] clean up show_stack
    [PARISC] add pa8900 CPUs to hardware inventory
    [PARISC] clean up include/asm-parisc/elf.h
    [PARISC] move defconfig to arch/parisc/configs/
    [PARISC] add back AD1889 MAINTAINERS entry
    [PARISC] pdc_console: fix bizarre panic on boot
    [PARISC] dump_stack in show_regs
    [PARISC] pdc_stable: fix compile errors
    [PARISC] remove unused pdc_iodc_printf function
    [PARISC] bump __NR_syscalls
    [PARISC] unbreak pgalloc.h
    [PARISC] move VMALLOC_* definitions to fixmap.h
    [PARISC] wire up timerfd syscalls
    [PARISC] remove old timerfd syscall

    Linus Torvalds
     

16 Mar, 2008

9 commits


15 Mar, 2008

1 commit


14 Mar, 2008

6 commits


13 Mar, 2008

2 commits

  • Compared with kernel 2.6.24, the tbench result regresses with
    2.6.25-rc1.

    1) On 2 quad-core processor stoakley: 4%.
    2) On 4 quad-core processor tigerton: more than 30%.

    Bisect located the patch below.

    b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit
    commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b
    Author: Herbert Xu
    Date: Tue Nov 13 21:33:32 2007 -0800

    [IPV6]: Move nfheader_len into rt6_info

    The dst member nfheader_len is only used by IPv6. It's also currently
    creating a rather ugly alignment hole in struct dst. Therefore this patch
    moves it from there into struct rt6_info.

    The above patch changes the cache line alignment, especially of member
    __refcnt. I tested this by adding 2 unsigned long of padding before
    lastuse, so that the 3 members, lastuse/__refcnt/__use, are moved to
    the next cache line. The performance is recovered.

    I created a patch to rearrange the members in struct dst_entry.

    Following Eric's and Valdis Kletnieks's suggestions, I made a finer
    arrangement.

    1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So
    sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I
    tested many patches on my 16-core tigerton by moving tclassid to
    different places. It looks like tclassid could also have an impact on
    performance. If tclassid is moved before metrics, or not moved at
    all, the performance isn't good. So I moved it behind metrics.

    2) Add comments before __refcnt.

    On 16-core tigerton:

    If CONFIG_NET_CLS_ROUTE=y, the result with the patch below is about
    18% better than the one without the patch;

    If CONFIG_NET_CLS_ROUTE=n, the result with the patch below is about
    30% better than the one without the patch.

    With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't
    introduce a regression.

    Thanks to Eric, Valdis, and David!

    Signed-off-by: Zhang Yanmin
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhang Yanmin
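
The effect discussed in the commit above (keeping the frequently written lastuse/__refcnt/__use members off the cache line that holds read-mostly members) can be checked with `offsetof`. The sketch below is a userspace model with assumptions: the struct is a made-up miniature, a 64-byte line size is assumed, and it uses C11 `alignas` where the commit used member reordering and padding.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical miniature of a routing cache entry. */
struct dst_model {
    /* Read-mostly members share the first cache line. */
    void          *next;
    void          *dev;
    unsigned long  expires;

    /* Write-hot members start on their own cache line so that updating
     * them does not invalidate the line holding the members above. */
    alignas(CACHE_LINE) unsigned long lastuse;
    int refcnt;
    int use;
};

/* Index of the cache line a member starts in, relative to the struct. */
#define LINE_OF(type, member) (offsetof(type, member) / CACHE_LINE)

static_assert(offsetof(struct dst_model, lastuse) % CACHE_LINE == 0,
              "hot members must start on a cache line boundary");

static int hot_fields_isolated(void)
{
    return LINE_OF(struct dst_model, lastuse) != LINE_OF(struct dst_model, expires)
        && LINE_OF(struct dst_model, lastuse) == LINE_OF(struct dst_model, refcnt)
        && LINE_OF(struct dst_model, refcnt)  == LINE_OF(struct dst_model, use);
}
```

A layout check like this only shows where members land; whether a given arrangement is faster still has to be measured, as the commit does with tbench.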
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm: (26 commits)
    [ARM] 4856/1: Orion: initialise the sixth PCIe MBUS mapping window as well
    [ARM] 4855/1: Orion: use correct ethernet unit address range
    [ARM] 4853/1: include uImage target in make help
    [ARM] 4851/1: ns9xxx: fix size of gpiores
    [ARM] AT91: correct at91sam9263ek LCD power gpio pin
    [ARM] replace remaining __FUNCTION__ occurrences
    [ARM] 4850/1: include generic pgtable.h for !CONFIG_MMU case
    [ARM] 4849/1: move ATAGS asm definitions
    [ARM] 4848/1: at91: remove false lockdep warnings
    [ARM] 4847/1: kprobes: fix compilation with CONFIG_DEBUG_FS=y
    [ARM] include/asm-arm - use angle brackets for includes
    [ARM] 4845/1: Orion: Ignore memory tags with invalid data
    ARM: OMAP2: Register the L4 io bus to boot OMAP2
    ARM: OMAP1: Compile in other 16xx boards to OSK defconfig
    ARM: OMAP1: Refresh H2 defconfig
    ARM: OMAP1: Refresh OSK defconfig
    ARM: OMAP: gpio lockdep updates
    ARM: OMAP1: omap1/pm.c build fix
    ARM: OMAP1: omap h2 regression fix
    ARM: OMAP1: Fix compile for boards depending on old gpio expander
    ...

    Linus Torvalds