13 Sep, 2008

1 commit

  • inet6_rsk() is called on a struct request_sock * before we
    have checked whether the socket is an ipv6 socket or an
    ipv6-mapped ipv4 socket. The access that triggers this is the
    inet_rsk(rsk)->inet6_rsk_offset dereference in inet6_rsk().

    This is arguably not a critical error as the inet6_rsk_offset
    is only used to compute a pointer which is never really used
    (in the code path in question) anyway. But it might be a
    latent error, so let's fix it.

    Spotted by kmemcheck.

    Signed-off-by: Vegard Nossum
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Vegard Nossum
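The ordering problem described above can be sketched in userspace C. This is only an illustration: `struct req_sketch`, `FAMILY_V4`, `FAMILY_V6` and `req_v6_area()` are hypothetical stand-ins, not the kernel's request_sock layout or the actual patch. The point is that the family is tested before the offset field is read, so the offset is never touched for a non-IPv6 request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a request socket. */
enum { FAMILY_V4 = 4, FAMILY_V6 = 6 };

struct req_sketch {
    int family;                 /* FAMILY_V4 or FAMILY_V6 */
    size_t v6_offset;           /* only meaningful when family == FAMILY_V6 */
    unsigned char storage[64];  /* backing memory for per-family data */
};

/*
 * Return the IPv6-specific area, or NULL for non-IPv6 requests.
 * The family check comes first, so v6_offset is not read at all
 * when the request is IPv4.
 */
static unsigned char *req_v6_area(struct req_sketch *req)
{
    assert(req != NULL);
    if (req->family != FAMILY_V6)
        return NULL;
    assert(req->v6_offset < sizeof(req->storage));
    return req->storage + req->v6_offset;
}
```

A caller would test the returned pointer for NULL before using it.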
     

11 Sep, 2008

1 commit

  • Johannes Berg reported that occasionally, bringing an interface
    down or unregistering it would hang for up to 30 seconds. Using
    debugging output he provided it became clear that ICMP6 routes
    were the culprit.

    The problem is that ICMP6 routes live in their own world totally
    separate from normal ipv6 routes. So there are all kinds of special
    cases throughout the ipv6 code to handle this.

    While we should really try to unify all of this stuff somehow,
    for the time being let's fix this by purging the ICMP6 routes
    that match the device in question during rt6_ifdown().

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Sep, 2008

1 commit

  • This fixes kernel bugzilla 11469: "TUN with 1024 neighbours:
    ip6_dst_lookup_tail NULL crash"

    dst->neighbour is not necessarily hooked up at this point
    in the processing path, so blindly dereferencing it is
    the wrong thing to do. This NULL check exists in other
    similar paths and this case was just an oversight.

    Also fix the completely wrong and confusing indentation
    here while we're at it.

    Based upon a patch by Evgeniy Polyakov.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
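A minimal sketch of the kind of guard the message describes, assuming simplified stand-in types. `struct dst_sketch`, `struct neigh_sketch`, `NUD_VALID_SKETCH` and `neighbour_is_valid()` are hypothetical names for illustration and do not reproduce the kernel code touched by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the neighbour and dst entries. */
#define NUD_VALID_SKETCH 0x1u

struct neigh_sketch {
    unsigned int nud_state;
};

struct dst_sketch {
    struct neigh_sketch *neighbour;  /* may legitimately be NULL */
};

/*
 * Return 1 only when a neighbour is attached and marked valid.
 * The NULL test happens before any field of the neighbour is read.
 */
static int neighbour_is_valid(const struct dst_sketch *dst)
{
    assert(dst != NULL);
    if (dst->neighbour == NULL)
        return 0;
    return (dst->neighbour->nud_state & NUD_VALID_SKETCH) != 0;
}
```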
     

09 Sep, 2008

1 commit

  • How to reproduce:
    - create a network namespace
    - use the tcp protocol and get a timewait socket
    - exit the network namespace
    - after a moment (when the timewait socket is destroyed), the kernel
      panics.

    # BUG: unable to handle kernel NULL pointer dereference at
    0000000000000007
    IP: [] inet_twdr_do_twkill_work+0x6e/0xb8
    PGD 119985067 PUD 11c5c0067 PMD 0
    Oops: 0000 [1] SMP
    CPU 1
    Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
    edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
    sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
    Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
    RIP: 0010:[] []
    inet_twdr_do_twkill_work+0x6e/0xb8
    RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
    RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
    RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
    RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
    R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
    R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
    FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
    ffff88011ff76690)
    Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
    0000000000000008
    0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
    ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
    Call Trace:
    [] ? inet_twdr_hangman+0x0/0x9e
    [] ? inet_twdr_hangman+0x27/0x9e
    [] ? run_timer_softirq+0x12c/0x193
    [] ? __do_softirq+0x5e/0xcd
    [] ? call_softirq+0x1c/0x28
    [] ? do_softirq+0x2c/0x68
    [] ? smp_apic_timer_interrupt+0x8e/0xa9
    [] ? apic_timer_interrupt+0x66/0x70
    [] ? default_idle+0x27/0x3b
    [] ? cpu_idle+0x5f/0x7d

    Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
    65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 8b 04 d0
    48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
    RIP [] inet_twdr_do_twkill_work+0x6e/0xb8
    RSP
    CR2: 0000000000000007

    This patch provides a function to purge all timewait sockets related
    to a network namespace. The timewait socket life cycle is not tied to
    the network namespace, which means the timewait sockets stay alive while
    the network namespace dies. Timewait sockets exist to avoid receiving
    a duplicate packet from the network; if the network namespace is
    freed, the network stack is removed, so there is no chance of receiving
    any packets from the outside world. Furthermore, having a pending
    destruction timer on these sockets after the network namespace has been
    freed is not safe and will lead to an oops if the timer callback tries
    to access data belonging to the namespace, for example in:
    inet_twdr_do_twkill_work
    -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

    Purging the timewait sockets at the network namespace destruction will:
    1) speed up memory freeing for the namespace
    2) fix kernel panic on asynchronous timewait destruction

    Signed-off-by: Daniel Lezcano
    Acked-by: Denis V. Lunev
    Acked-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Daniel Lezcano
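The purge idea can be sketched with a plain singly linked list. This is an assumption-laden illustration: `struct tw_sketch`, `struct netns_sketch`, `tw_push()` and `tw_purge()` are hypothetical names, and the real kernel code works on hash tables with locking and timers that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a namespace and a timewait entry owned by it. */
struct netns_sketch {
    int id;
};

struct tw_sketch {
    const struct netns_sketch *net;  /* owning namespace */
    struct tw_sketch *next;
};

/* Prepend a new entry owned by `net`; returns the new list head. */
static struct tw_sketch *tw_push(struct tw_sketch *head,
                                 const struct netns_sketch *net)
{
    struct tw_sketch *tw = malloc(sizeof(*tw));

    if (tw == NULL)
        return head;  /* allocation failed: list unchanged */
    tw->net = net;
    tw->next = head;
    return tw;
}

/*
 * Unlink and free every entry owned by `net`, so nothing can later
 * reference the namespace. Returns the number of entries purged.
 */
static size_t tw_purge(struct tw_sketch **head, const struct netns_sketch *net)
{
    size_t purged = 0;
    struct tw_sketch **link = head;

    assert(head != NULL);
    while (*link != NULL) {
        struct tw_sketch *tw = *link;

        if (tw->net == net) {
            *link = tw->next;   /* unlink before freeing */
            free(tw);
            purged++;
        } else {
            link = &tw->next;
        }
    }
    return purged;
}
```

Calling `tw_purge()` when the namespace is torn down leaves only entries that belong to other namespaces on the list.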
     

30 Aug, 2008

1 commit


26 Aug, 2008

1 commit


23 Aug, 2008

2 commits

  • This fixes a problem spotted with zebra, but I am not sure if it is
    necessarily a kernel problem. With IPV6, when an address is added to
    an interface, Zebra creates a duplicate RIB entry: one as a connected
    route, and the other as a kernel route.

    When an address is added to an interface, the RTM_NEWADDR message
    causes Zebra to create a connected route. In IPV4, when an address is
    added to an interface, an RTM_NEWROUTE message is sent to user space
    with the protocol RTPROT_KERNEL. Zebra ignores these messages, because
    it already has the connected route.

    The problem is that the route created in IPV6 has route protocol ==
    RTPROT_BOOT. Was this a design decision or a bug? This fixes it. The
    same patch applies to both net-2.6 and stable.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
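The duplicate-entry behaviour can be modelled in a few lines. This is a hypothetical model of a routing daemon as described in the message, not Zebra's code: `struct rib_sketch`, `on_new_address()` and `on_new_route()` are invented for illustration, and the two protocol constants mirror the values of RTPROT_KERNEL and RTPROT_BOOT in linux/rtnetlink.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the route protocol values (assumed to match rtnetlink.h). */
enum {
    RTPROT_KERNEL_SKETCH = 2,
    RTPROT_BOOT_SKETCH = 3
};

/* Hypothetical RIB that only counts entries per origin. */
struct rib_sketch {
    int connected;  /* routes derived from address notifications */
    int kernel;     /* routes learned from route notifications */
};

/* An address notification always yields a connected route. */
static void on_new_address(struct rib_sketch *rib)
{
    assert(rib != NULL);
    rib->connected++;
}

/*
 * A route notification is ignored when the kernel itself created the
 * route, because the connected route already covers it.
 */
static void on_new_route(struct rib_sketch *rib, int protocol)
{
    assert(rib != NULL);
    if (protocol == RTPROT_KERNEL_SKETCH)
        return;
    rib->kernel++;
}
```

With the route tagged RTPROT_BOOT the model ends up with two entries for one address; tagged RTPROT_KERNEL it ends up with one.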
     
  • Pass namespace into icmp_xmit_lock, obtain socket inside and return
    it as a result for caller.

    Thanks Alexey Dobriyan for this report:

    Steps to reproduce:

    CONFIG_PREEMPT=y
    CONFIG_DEBUG_PREEMPT=y
    tracepath

    BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
    caller is icmp_sk+0x15/0x30
    Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1

    Call Trace:
    [] debug_smp_processor_id+0xe4/0xf0
    [] icmp_sk+0x15/0x30
    [] icmp_send+0x4b/0x3f0
    [] ? trace_hardirqs_on_caller+0xd5/0x160
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? local_bh_enable_ip+0x95/0x110
    [] ? _spin_unlock_bh+0x39/0x40
    [] ? mark_held_locks+0x4c/0x90
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? trace_hardirqs_on_caller+0xd5/0x160
    [] ip_fragment+0x8d4/0x900
    [] ? ip_finish_output2+0x0/0x290
    [] ? ip_finish_output+0x0/0x60
    [] ? dst_output+0x0/0x10
    [] ip_finish_output+0x4c/0x60
    [] ip_output+0xa3/0xf0
    [] ip_local_out+0x20/0x30
    [] ip_push_pending_frames+0x27f/0x400
    [] udp_push_pending_frames+0x233/0x3d0
    [] udp_sendmsg+0x321/0x6f0
    [] inet_sendmsg+0x45/0x80
    [] sock_sendmsg+0xdf/0x110
    [] ? autoremove_wake_function+0x0/0x40
    [] ? validate_chain+0x415/0x1010
    [] ? __do_fault+0x140/0x450
    [] ? __lock_acquire+0x260/0x590
    [] ? sockfd_lookup_light+0x45/0x80
    [] sys_sendto+0xea/0x120
    [] ? _spin_unlock_irqrestore+0x42/0x80
    [] ? __up_read+0x4c/0xb0
    [] ? up_read+0x26/0x30
    [] system_call_fastpath+0x16/0x1b

    icmp6_sk() is similar.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     

18 Aug, 2008

1 commit

  • When getting the receiving interface index while no message has been
    received, the index of the device the socket is bound to should be
    returned.

    RFC 3542:
    Issuing getsockopt() for the above options will return the sticky
    option value i.e., the value set with setsockopt(). If no sticky
    option value has been set getsockopt() will return the following
    values:

    - For the IPV6_PKTINFO option, it will return an in6_pktinfo
    structure with ipi6_addr being in6addr_any and ipi6_ifindex being
    zero.

    Signed-off-by: Yang Hongyang
    Signed-off-by: David S. Miller

    Yang Hongyang
     

15 Aug, 2008

1 commit


13 Aug, 2008

1 commit

  • Alexey Dobriyan wrote:
    > On Thu, Aug 07, 2008 at 07:00:56PM +0200, John Gumb wrote:
    >> Scenario: no ipv6 default route set.
    >
    >> # ip -f inet6 route get fec0::1
    >>
    >> BUG: unable to handle kernel NULL pointer dereference at 00000000
    >> IP: [] rt6_fill_node+0x175/0x3b0
    >> EIP is at rt6_fill_node+0x175/0x3b0
    >
    > 0xffffffff80424dd3 is in rt6_fill_node (net/ipv6/route.c:2191).
    > 2186 } else
    > 2187 #endif
    > 2188 NLA_PUT_U32(skb, RTA_IIF, iif);
    > 2189 } else if (dst) {
    > 2190 struct in6_addr saddr_buf;
    > 2191 ====> if (ipv6_dev_get_saddr(ip6_dst_idev(&rt->u.dst)->dev,
    > ^^^^^^^^^^^^^^^^^^^^^^^^
    > NULL
    >
    > 2192 dst, 0, &saddr_buf) == 0)
    > 2193 NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
    > 2194 }

    The commit that changed this can't be reverted easily, but the patch
    below works for me.

    Fix NULL de-reference in rt6_fill_node() when there's no IPv6 input
    device present in the dst entry.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     

09 Aug, 2008

1 commit

  • The socket lock is there to protect the normal UDP receive path.
    Encapsulation UDP sockets don't need that protection. In fact
    the locking is deadly for them as they may contain another UDP
    packet within, possibly with the same addresses.

    Also the nested bit was copied from TCP. TCP needs it because
    of accept(2) spawning sockets. This simply doesn't apply to UDP
    so I've removed it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

07 Aug, 2008

1 commit

  • If the following packet flow happens, the kernel will panic.
    MachineA MachineB
    SYN
    ---------------------->
    SYN+ACK

    When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)
    is finally called by tcp_v4_reqsk_send_ack(), but the first parameter (skb->sk) is
    NULL at that moment, so a kernel panic happens.
    This patch fixes the bug.

    The oops output is as follows:
    [ 302.812793] IP: [] tcp_v4_md5_do_lookup+0x12/0x42
    [ 302.817075] Oops: 0000 [#1] SMP
    [ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
    [ 302.849946]
    [ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
    [ 302.855184] EIP: 0060:[] EFLAGS: 00010296 CPU: 0
    [ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
    [ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
    [ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
    [ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    [ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
    [ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400
    [ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0
    [ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634
    [ 302.900140] Call Trace:
    [ 302.902392] [] tcp_v4_reqsk_send_ack+0x17/0x35
    [ 302.907060] [] tcp_check_req+0x156/0x372
    [ 302.910082] [] printk+0x14/0x18
    [ 302.912868] [] tcp_v4_do_rcv+0x1d3/0x2bf
    [ 302.917423] [] tcp_v4_rcv+0x563/0x5b9
    [ 302.920453] [] ip_local_deliver_finish+0xe8/0x183
    [ 302.923865] [] ip_rcv_finish+0x286/0x2a3
    [ 302.928569] [] dev_alloc_skb+0x11/0x25
    [ 302.931563] [] netif_receive_skb+0x2d6/0x33a
    [ 302.934914] [] pcnet32_poll+0x333/0x680 [pcnet32]
    [ 302.938735] [] net_rx_action+0x5c/0xfe
    [ 302.941792] [] __do_softirq+0x5d/0xc1
    [ 302.944788] [] __do_softirq+0x0/0xc1
    [ 302.948999] [] do_softirq+0x55/0x88
    [ 302.951870] [] handle_fasteoi_irq+0x0/0xa4
    [ 302.954986] [] irq_exit+0x35/0x69
    [ 302.959081] [] do_IRQ+0x99/0xae
    [ 302.961896] [] common_interrupt+0x23/0x28
    [ 302.966279] [] default_idle+0x2a/0x3d
    [ 302.969212] [] cpu_idle+0xb2/0xd2
    [ 302.972169] =======================
    [ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6
    [ 303.011610] EIP: [] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
    [ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt

    Signed-off-by: Gui Jianfeng
    Signed-off-by: David S. Miller

    Gui Jianfeng
     

06 Aug, 2008

2 commits


04 Aug, 2008

3 commits


01 Aug, 2008

3 commits

  • Reported by Stefanos Harhalakis; although 2.6.27-rc1 talks to itself using IPv6
    TCP MD5 packets just fine, Stefanos noted that tcpdump claimed that the
    signatures were invalid.

    I broke this in 49a72dfb8814c2d65bd9f8c9c6daf6395a1ec58d ("tcp: Fix MD5
    signatures for non-linear skbs"), it was just a typo.

    Note that tcpdump will still sometimes claim that the signatures are incorrect.
    A patch to tcpdump has been submitted for this[1].

    [1] http://tinyurl.com/6a4fl2

    Signed-off-by: Adam Langley
    Signed-off-by: David S. Miller

    Adam Langley
     
  • I noticed, looking at tcpdumps, that timewait ACKs were getting sent
    with an incorrect MD5 signature when signatures were enabled.

    I broke this in 49a72dfb8814c2d65bd9f8c9c6daf6395a1ec58d ("tcp: Fix
    MD5 signatures for non-linear skbs"). I didn't take into account that
    the skb passed to tcp_*_send_ack was the inbound packet, thus the
    source and dest addresses need to be swapped when calculating the MD5
    pseudoheader.

    Signed-off-by: Adam Langley
    Signed-off-by: David S. Miller

    Adam Langley
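The address swap can be sketched as follows. Only the two address fields of the pseudoheader are modelled, and `struct ip4_addrs`, `struct md5_pseudo` and `pseudo_for_reply()` are hypothetical names, not the kernel's structures or the patched functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses as they appear in the header of the inbound packet. */
struct ip4_addrs {
    uint32_t saddr;
    uint32_t daddr;
};

/* Address part of the pseudoheader that feeds the MD5 computation. */
struct md5_pseudo {
    uint32_t saddr;
    uint32_t daddr;
};

/*
 * Build the pseudoheader for a reply to `inbound`. The reply travels in
 * the opposite direction, so the inbound destination becomes the source
 * and the inbound source becomes the destination.
 */
static struct md5_pseudo pseudo_for_reply(const struct ip4_addrs *inbound)
{
    struct md5_pseudo p;

    assert(inbound != NULL);
    p.saddr = inbound->daddr;
    p.daddr = inbound->saddr;
    return p;
}
```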
     
  • SCTP uses ip6_xmit() to send fragments after receiving an ICMP packet
    too big message. But when the packet is sent with ip6_xmit(),
    skb->local_df is not initialized. So when the skb enters
    ip6_fragment(), the following code will discard it.

    ip6_fragment(...)
    {
        if (!skb->local_df) {
            ...
            return -EMSGSIZE;
        }
        ...
    }

    SCTP does the following steps:
    1. send the packet with ip6_xmit(skb, ipfragok=0)
    2. receive an ICMP packet too big message
    3. if PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1)

    This patch fixes the problem by setting local_df if ipfragok is true.

    Signed-off-by: Wei Yongjun
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Wei Yongjun
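The effect of the fix can be sketched in userspace. This is a simplification under stated assumptions: `struct skb_sketch`, `fragment_sketch()` and `xmit_sketch()` are hypothetical, the fragment function only mirrors the local_df check quoted in the message, and no real fragmentation is performed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the packet buffer. */
struct skb_sketch {
    unsigned int len;
    int local_df;  /* non-zero: local fragmentation is allowed */
};

/* Mirrors the quoted check: refuse to fragment unless local_df is set. */
static int fragment_sketch(const struct skb_sketch *skb)
{
    assert(skb != NULL);
    if (!skb->local_df)
        return -EMSGSIZE;
    return 0;  /* fragmentation itself is not modelled */
}

/*
 * Send path: packets that fit the MTU go out directly, larger ones go
 * through the fragment function. Setting local_df when the caller passed
 * ipfragok is the step the patch adds.
 */
static int xmit_sketch(struct skb_sketch *skb, unsigned int mtu, int ipfragok)
{
    assert(skb != NULL);
    if (ipfragok)
        skb->local_df = 1;
    if (skb->len <= mtu)
        return 0;
    return fragment_sketch(skb);
}
```

Without the `if (ipfragok)` assignment, the third step of the SCTP sequence would hit the -EMSGSIZE return even though the caller allowed fragmentation.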
     

30 Jul, 2008

3 commits


27 Jul, 2008

5 commits

  • Piss-poor sysctl registration API strikes again, film at 11...
    What we really need is _pathname_ required to be present in
    already registered table, so that kernel could warn about bad
    order. That's the next target for sysctl stuff (and generally
    saner and more explicit order of initialization of ipv[46]
    internals wouldn't hurt either).

    For the time being, here are full fixups required by ..._rotable()
    stuff; we make per-net sysctl sets descendents of "ro" one and
    make sure that sufficient skeleton is there before we start registering
    per-net sysctls.

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
     
  • David S. Miller
     
  • net/ipv4/ipcomp.c: In function ‘ipcomp4_init_state’:
    net/ipv4/ipcomp.c:109: warning: unused variable ‘calg_desc’
    net/ipv4/ipcomp.c:108: warning: unused variable ‘ipcd’
    net/ipv4/ipcomp.c:107: warning: ‘err’ may be used uninitialized in this function
    net/ipv6/ipcomp6.c: In function ‘ipcomp6_init_state’:
    net/ipv6/ipcomp6.c:139: warning: unused variable ‘calg_desc’
    net/ipv6/ipcomp6.c:138: warning: unused variable ‘ipcd’
    net/ipv6/ipcomp6.c:137: warning: ‘err’ may be used uninitialized in this function

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    netns: fix ip_rt_frag_needed rt_is_expired
    netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
    netfilter: fix double-free and use-after free
    netfilter: arptables in netns for real
    netfilter: ip{,6}tables_security: fix future section mismatch
    selinux: use nf_register_hooks()
    netfilter: ebtables: use nf_register_hooks()
    Revert "pkt_sched: sch_sfq: dump a real number of flows"
    qeth: use dev->ml_priv instead of dev->priv
    syncookies: Make sure ECN is disabled
    net: drop unused BUG_TRAP()
    net: convert BUG_TRAP to generic WARN_ON
    drivers/net: convert BUG_TRAP to generic WARN_ON

    Linus Torvalds
     
  • Currently not visible, because NET_NS is mutually exclusive with SYSFS
    which is required by SECURITY.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

26 Jul, 2008

4 commits

  • ecn_ok is not initialized when a connection is established by cookies.
    The cookie syn-ack never sets ECN, so ecn_ok must be set to 0.

    Spotted using ns-3/network simulation cradle simulator and valgrind.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Removes legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better with automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively, BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON(), though some might actually be
    promoted to BUG_ON(); I left that for the future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
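A userspace sketch of the semantics involved, assuming (this is not stated in the message) that the old macro complained when its argument was false, so a conversion negates the condition. `WARN_ON_SKETCH`, `BUG_TRAP_SKETCH`, `warn_on_sketch()` and `warnings_seen` are hypothetical names; the kernel macros are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings were reported; exists only for this sketch. */
static int warnings_seen;

/* Report when cond is true, keep running, and hand cond back to the caller. */
static int warn_on_sketch(int cond, const char *expr, const char *file, int line)
{
    assert(expr != NULL && file != NULL);
    if (cond) {
        fprintf(stderr, "WARNING: (%s) at %s:%d\n", expr, file, line);
        warnings_seen++;
    }
    return cond;
}

/* WARN_ON-like: warns when the condition holds; never stops execution. */
#define WARN_ON_SKETCH(cond) \
    warn_on_sketch(!!(cond), #cond, __FILE__, __LINE__)

/* BUG_TRAP-like (assumed): an assertion, so it warns when x is false. */
#define BUG_TRAP_SKETCH(x) ((void)WARN_ON_SKETCH(!(x)))
```

Neither macro halts the program, which is the sense in which the old name was misleading.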
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    ipsec: ipcomp - Decompress into frags if necessary
    ipsec: ipcomp - Merge IPComp implementations
    pkt_sched: Fix locking in shutdown_scheduler_queue()

    Linus Torvalds
     
  • All uses of list_for_each_rcu() can be profitably replaced by the
    easier-to-use list_for_each_entry_rcu(). This patch makes this change for
    networking, in preparation for removing the list_for_each_rcu() API
    entirely.

    Acked-by: David S. Miller
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
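The difference between the two iteration styles can be sketched without RCU. Everything here is a userspace approximation: the `_sketch` macros and `struct proto_sketch` are hypothetical, the entry macro takes the type explicitly instead of using typeof, and none of the memory-ordering guarantees of the real RCU variants are modelled.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Node style: the cursor is a raw list_head, the caller converts it. */
#define list_for_each_sketch(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* Entry style: the cursor is already the enclosing structure. */
#define list_for_each_entry_sketch(pos, head, type, member) \
    for (pos = container_of_sketch((head)->next, type, member); \
         &pos->member != (head); \
         pos = container_of_sketch(pos->member.next, type, member))

struct proto_sketch {
    int id;
    struct list_head node;
};

static void list_add_tail_sketch(struct list_head *n, struct list_head *head)
{
    assert(n != NULL && head != NULL);
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Older pattern: iterate nodes, then convert each one by hand. */
static int sum_ids_node_style(struct list_head *head)
{
    struct list_head *pos;
    int sum = 0;

    list_for_each_sketch(pos, head)
        sum += container_of_sketch(pos, struct proto_sketch, node)->id;
    return sum;
}

/* Newer pattern: the macro performs the conversion. */
static int sum_ids_entry_style(struct list_head *head)
{
    struct proto_sketch *p;
    int sum = 0;

    list_for_each_entry_sketch(p, head, struct proto_sketch, node)
        sum += p->id;
    return sum;
}
```

Both functions walk the same list; the entry style simply removes the per-iteration conversion from the caller.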
     

25 Jul, 2008

1 commit


24 Jul, 2008

1 commit


23 Jul, 2008

5 commits