24 May, 2011

2 commits

  • The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1 (the default) and the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    The supporting code for kptr_restrict and %pK is currently in the -mm
    tree. This patch converts users of %p in net/ to %pK. Cases of printing
    pointers to the syslog are not covered, since this would eliminate useful
    information for postmortem debugging and the reading of the syslog is
    already optionally protected by the dmesg_restrict sysctl.

    Signed-off-by: Dan Rosenberg
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
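
    The kptr_restrict rules in the commit message above can be sketched in
    userspace C. This is a toy model, not the kernel's vsprintf code:
    restrict_level stands in for the sysctl value and has_cap_syslog for a
    CAP_SYSLOG check on the reader, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Decide which value a %pK-style conversion would print.
 * restrict_level models the kptr_restrict sysctl (0, 1 or 2);
 * has_cap_syslog models whether the reader holds CAP_SYSLOG. */
static uintptr_t pk_value_to_print(uintptr_t ptr, int restrict_level,
                                   bool has_cap_syslog)
{
    if (restrict_level == 0)
        return ptr;     /* same as plain %p */
    if (restrict_level == 1 && has_cap_syslog)
        return ptr;     /* privileged reader sees the real value */
    return 0;           /* level 1 without the capability, or level 2 */
}

/* Format the chosen value as zero-padded hex, so a hidden pointer shows
 * up as a run of 0's that userland can still parse as a number. */
static int pk_format(char *buf, size_t len, uintptr_t ptr,
                     int restrict_level, bool has_cap_syslog)
{
    uintptr_t v = pk_value_to_print(ptr, restrict_level, has_cap_syslog);

    return snprintf(buf, len, "%0*llx", (int)(2 * sizeof(void *)),
                    (unsigned long long)v);
}
```

    With restrict_level 2 the result is all zeros even when has_cap_syslog
    is true, which matches the "regardless of privileges" wording above.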
     
  • Like ipv4, just return xfrm6_rcv_spi()'s return value directly.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 May, 2011

2 commits

  • commit c3968a857a6b6c3d2ef4ead35776b055fb664d74
    ('ipv6: RTA_PREFSRC support for ipv6 route source address selection')
    added support for ipv6 prefsrc as an alternative to ipv6 addrlabels,
    but it did not work because the prefsrc entry was not copied.

    Cc: Daniel Walter
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

20 May, 2011

2 commits

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
    Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
    net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
    batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
    batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
    batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
    net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
    net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
    net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
    perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
    perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
    net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
    security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
    net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
    net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
    net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
    ...

    Linus Torvalds
     
  • ipv6 has per device ICMP SNMP counters, taking too much space because
    they use percpu storage.

    The size needed per device is:
    (512+4)*sizeof(long)*number_of_possible_cpus*2

    On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of
    memory per ipv6 enabled network device, taken in vmalloc pool.

    Since ICMP messages are rare, just use shared counters (atomic_long_t).

    Per network space ICMP counters are still using percpu memory; we might
    also convert them to shared counters in a future patch.

    Signed-off-by: Eric Dumazet
    CC: Denys Fedoryshchenko
    Signed-off-by: David S. Miller

    Eric Dumazet
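
    The memory figure quoted in the commit above can be checked with a few
    lines of C. The first helper follows the formula from the message; the
    second assumes that a shared counter set needs one long-sized slot per
    counter, which is an illustration rather than the kernel's exact layout.
    Both helper names are hypothetical.

```c
#include <stddef.h>

/* Old percpu layout, as given in the commit message:
 * (512 + 4) * sizeof(long) * number_of_possible_cpus * 2.
 * long_size is a parameter so a 32-bit kernel can be modelled on any host. */
static size_t icmp6_percpu_bytes(size_t long_size, size_t possible_cpus)
{
    return (512 + 4) * long_size * possible_cpus * 2;
}

/* Assumed shared layout: one slot per counter, no per-CPU factor and no
 * doubling. */
static size_t icmp6_shared_bytes(size_t long_size)
{
    return (512 + 4) * long_size;
}
```

    For a 32-bit kernel with 16 possible CPUs, icmp6_percpu_bytes(4, 16)
    gives 66048 bytes, just above 64 KiB (65536 bytes), while the assumed
    shared layout needs 2064 bytes per device.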
     

12 May, 2011

1 commit


11 May, 2011

2 commits

  • David S. Miller
     
    As it is, we assign the outer mode's output function to the dst entry
    when we create the xfrm bundle. This leads to two problems in interfamily
    scenarios. We might insert ipv4 packets into ip6_fragment when called
    from xfrm6_output. The system crashes if we try to fragment an ipv4
    packet with ip6_fragment. This issue was introduced with git commit
    ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
    as needed). The second issue is that we might insert ipv4 packets into
    netfilter6 and vice versa in interfamily scenarios.

    With this patch we assign the inner mode output function to the dst entry
    when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
    mode is used and the right fragmentation and netfilter functions are called.
    We then switch to the outer mode with the output_finish functions.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

10 May, 2011

1 commit

  • The IPv6 header is not zeroed out in alloc_skb so we must initialize
    it properly unless we want to see IPv6 packets with random TOS fields
    floating around. The current implementation resets the flow label
    but this could be changed if deemed necessary.

    We stumbled upon this issue when trying to apply a mangle rule to
    the RST packet generated by the REJECT target module.

    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Pablo Neira Ayuso

    Fernando Luis Vazquez Cao
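
    A minimal sketch of the initialization described in the commit above:
    the first 32 bits of an IPv6 header hold a 4-bit version, an 8-bit
    traffic class (the "TOS" field) and a 20-bit flow label, so all three
    can be set in one step instead of inheriting whatever was in the
    buffer. The function name is hypothetical and this is userspace code,
    not the patch itself.

```c
#include <stdint.h>

/* Write the first 4 bytes of an IPv6 header in network byte order:
 * version 6, the given traffic class, and a flow label of zero. */
static void ip6_init_first_word(uint8_t hdr[4], uint8_t tclass)
{
    uint32_t word = 0x60000000u | ((uint32_t)tclass << 20);

    hdr[0] = (uint8_t)(word >> 24);
    hdr[1] = (uint8_t)(word >> 16);
    hdr[2] = (uint8_t)(word >> 8);
    hdr[3] = (uint8_t)word;
}
```

    Calling it with tclass 0 on a buffer holding stale bytes leaves 0x60
    followed by three zero bytes, so no random traffic class leaks out.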
     

09 May, 2011

1 commit

  • This allows us to acquire the exact route keying information from the
    protocol, however that might be managed.

    It handles all of the possibilities, from the simplest case of storing
    the key in inet->cork.fl to the more complex setup SCTP has where
    individual transports determine the flow.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 May, 2011

4 commits


07 May, 2011

1 commit

  • When we fast path datagram sends to avoid locking by putting
    the inet_cork on the stack we use up lots of space that isn't
    necessary.

    This is because inet_cork contains a "struct flowi" which isn't
    used in these code paths.

    Split inet_cork into two parts, "inet_cork" and "inet_cork_full".
    Only the latter has the "struct flowi", and it is what is
    stored in inet_sock.

    Signed-off-by: David S. Miller
    Acked-by: Eric Dumazet

    David S. Miller
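
    The split described in the commit above can be pictured with two
    structs. Every type and field name below is a hypothetical stand-in
    (the real inet_cork and struct flowi have different members); the
    sketch only shows the shape: a small base struct for on-stack use and a
    full struct that embeds it and adds the flow key.

```c
#include <stddef.h>

/* Stand-in for the kernel's "struct flowi"; the size is illustrative. */
struct flow_key_example {
    unsigned char bytes[64];
};

/* State needed while building a datagram; field names are made up. */
struct cork_example {
    unsigned int flags;
    unsigned int fragsize;
    void *opt;
};

/* Full variant kept in the socket: base part first, flow key after it. */
struct cork_full_example {
    struct cork_example base;
    struct flow_key_example fl;
};

/* Bytes no longer placed on the stack when only the base part is used. */
static size_t stack_bytes_saved(void)
{
    return sizeof(struct cork_full_example) - sizeof(struct cork_example);
}
```

    Because the base part is the first member, code that only needs the
    small struct can be handed a pointer to the full one without copying.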
     

06 May, 2011

2 commits


05 May, 2011

1 commit


04 May, 2011

1 commit


03 May, 2011

2 commits

    ctl_table_headers registered with register_net_sysctl_table should
    have been unregistered with the equivalent unregister_net_sysctl_table.

    Signed-off-by: Lucian Adrian Grijincu
    Signed-off-by: David S. Miller

    Lucian Adrian Grijincu
     
  • Four years ago, Patrick made a change to hold rtnl mutex during netlink
    dump callbacks.

    I believe it was a wrong move. This slows down concurrent dumps, making
    good old /proc/net/ files faster than rtnetlink in some situations.

    This occurred to me because one "ip link show dev ..." was _very_ slow
    on a workload adding/removing network devices in background.

    All dump callbacks are able to use RCU locking now, so this patch does
    roughly a revert of commits :

    1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
    6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks

    This lets writers fight for the rtnl mutex while readers go full speed.

    It also takes care of phonet: phonet_route_get() is now called from an
    RCU read-side section, so I renamed it to phonet_route_get_rcu().

    Signed-off-by: Eric Dumazet
    Cc: Patrick McHardy
    Cc: Remi Denis-Courmont
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Apr, 2011

1 commit

  • For backward compatibility, we should retain the module parameters and
    sysfs attributes to control the number of peer notifications
    (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
    Also, it is possible for failover to take place even though the new
    active slave does not have link up, and in that case the peer
    notification should be deferred until it does.

    Change ipv4 and ipv6 so they do not automatically send peer
    notifications on bonding failover.

    Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
    notifications when the link is up, as many times as requested. Since
    it does not directly control which protocols send notifications, make
    num_grat_arp and num_unsol_na aliases for a single parameter. Bump
    the bonding version number and update its documentation.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Jay Vosburgh
    Acked-by: Brian Haley
    Signed-off-by: David S. Miller

    Ben Hutchings
     

29 Apr, 2011

4 commits


27 Apr, 2011

2 commits


26 Apr, 2011

1 commit

  • Since commit 62fa8a846d7d (net: Implement read-only protection and COW'ing
    of metrics.) the kernel throws an oops.

    [ 101.620985] BUG: unable to handle kernel NULL pointer dereference at
    (null)
    [ 101.621050] IP: [< (null)>] (null)
    [ 101.621084] PGD 6e53c067 PUD 3dd6a067 PMD 0
    [ 101.621122] Oops: 0010 [#1] SMP
    [ 101.621153] last sysfs file: /sys/devices/virtual/ppp/ppp/uevent
    [ 101.621192] CPU 2
    [ 101.621206] Modules linked in: l2tp_ppp pppox ppp_generic slhc
    l2tp_netlink l2tp_core deflate zlib_deflate twofish_x86_64
    twofish_common des_generic cbc ecb sha1_generic hmac af_key
    iptable_filter snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device loop
    snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
    snd_pcm snd_timer snd i2c_i801 iTCO_wdt psmouse soundcore snd_page_alloc
    evdev uhci_hcd ehci_hcd thermal
    [ 101.621552]
    [ 101.621567] Pid: 5129, comm: openl2tpd Not tainted 2.6.39-rc4-Quad #3
    Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
    [ 101.621637] RIP: 0010:[] [< (null)>] (null)
    [ 101.621684] RSP: 0018:ffff88003ddeba60 EFLAGS: 00010202
    [ 101.621716] RAX: ffff88003ddb5600 RBX: ffff88003ddb5600 RCX:
    0000000000000020
    [ 101.621758] RDX: ffffffff81a69a00 RSI: ffffffff81b7ee61 RDI:
    ffff88003ddb5600
    [ 101.621800] RBP: ffff8800537cd900 R08: 0000000000000000 R09:
    ffff88003ddb5600
    [ 101.621840] R10: 0000000000000005 R11: 0000000000014b38 R12:
    ffff88003ddb5600
    [ 101.621881] R13: ffffffff81b7e480 R14: ffffffff81b7e8b8 R15:
    ffff88003ddebad8
    [ 101.621924] FS: 00007f06e4182700(0000) GS:ffff88007fd00000(0000)
    knlGS:0000000000000000
    [ 101.621971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 101.622005] CR2: 0000000000000000 CR3: 0000000045274000 CR4:
    00000000000006e0
    [ 101.622046] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
    0000000000000000
    [ 101.622087] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
    0000000000000400
    [ 101.622129] Process openl2tpd (pid: 5129, threadinfo
    ffff88003ddea000, task ffff88003de9a280)
    [ 101.622177] Stack:
    [ 101.622191] ffffffff81447efa ffff88007d3ded80 ffff88003de9a280
    ffff88007d3ded80
    [ 101.622245] 0000000000000001 ffff88003ddebbb8 ffffffff8148d5a7
    0000000000000212
    [ 101.622299] ffff88003dcea000 ffff88003dcea188 ffffffff00000001
    ffffffff81b7e480
    [ 101.622353] Call Trace:
    [ 101.622374] [] ? ipv4_blackhole_route+0x1ba/0x210
    [ 101.622415] [] ? xfrm_lookup+0x417/0x510
    [ 101.622450] [] ? extract_buf+0x9a/0x140
    [ 101.622485] [] ? __ip_flush_pending_frames+0x70/0x70
    [ 101.622526] [] ? udp_sendmsg+0x62f/0x810
    [ 101.622562] [] ? sock_sendmsg+0x116/0x130
    [ 101.622599] [] ? find_get_page+0x18/0x90
    [ 101.622633] [] ? filemap_fault+0x12a/0x4b0
    [ 101.622668] [] ? move_addr_to_kernel+0x64/0x90
    [ 101.622706] [] ? verify_iovec+0x7a/0xf0
    [ 101.622739] [] ? sys_sendmsg+0x292/0x420
    [ 101.622774] [] ? handle_pte_fault+0x8a/0x7c0
    [ 101.622810] [] ? __pte_alloc+0xae/0x130
    [ 101.622844] [] ? handle_mm_fault+0x138/0x380
    [ 101.622880] [] ? do_page_fault+0x189/0x410
    [ 101.622915] [] ? sys_getsockname+0xf3/0x110
    [ 101.622952] [] ? ip_setsockopt+0x4d/0xa0
    [ 101.622986] [] ? sockfd_lookup_light+0x22/0x90
    [ 101.623024] [] ? system_call_fastpath+0x16/0x1b
    [ 101.623060] Code: Bad RIP value.
    [ 101.623090] RIP [< (null)>] (null)
    [ 101.623125] RSP
    [ 101.623146] CR2: 0000000000000000
    [ 101.650871] ---[ end trace ca3856a7d8e8dad4 ]---
    [ 101.651011] __sk_free: optmem leakage (160 bytes) detected.

    The oops happens in dst_metrics_write_ptr()
    include/net/dst.h:124: return dst->ops->cow_metrics(dst, p);

    dst->ops->cow_metrics is NULL and causes the oops.

    Provide cow_metrics() methods, like we did in commit 214f45c91bb
    (net: provide default_advmss() methods to blackhole dst_ops)

    Signed-off-by: Held Bernhard
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Held Bernhard
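
    The failure mode in the commit above is a call through an unset
    function pointer in a method table. The sketch below reproduces the
    pattern in userspace C with hypothetical types: a caller in the style
    of dst_metrics_write_ptr() invokes the hook unconditionally, so every
    ops table has to supply one, even if it is only a stub. The stub's
    return value here is an assumption made for illustration.

```c
#include <stddef.h>

struct dst_example;

/* Toy method table; only the hook discussed above is modelled. */
struct dst_ops_example {
    unsigned int *(*cow_metrics)(struct dst_example *dst, unsigned long old);
};

struct dst_example {
    const struct dst_ops_example *ops;
};

/* Stub for blackhole-style entries: there is nothing to copy, so it
 * reports "no writable metrics" instead of leaving the hook NULL. */
static unsigned int *blackhole_cow_metrics_example(struct dst_example *dst,
                                                   unsigned long old)
{
    (void)dst;
    (void)old;
    return NULL;
}

static const struct dst_ops_example blackhole_ops_example = {
    .cow_metrics = blackhole_cow_metrics_example,
};

/* The caller does not test the hook for NULL before calling it. */
static unsigned int *metrics_write_ptr_example(struct dst_example *dst,
                                               unsigned long old)
{
    return dst->ops->cow_metrics(dst, old);
}
```

    Had the cow_metrics member been left out of the initializer, the call
    would jump to address 0, which is consistent with the "Bad RIP value"
    line in the trace.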
     

23 Apr, 2011

1 commit


22 Apr, 2011

2 commits

    The changes introduced with git-commit a02e4b7d ("ipv6: Demark default
    hoplimit as zero.") failed to remove the hoplimit initialization. As a
    result, ipv6_get_mtu interprets the return value of dst_metric_raw
    (-1) as 255 and answers ping6 with this hoplimit. This patch removes
    the line so that ping6 is answered with the hoplimit value
    configured via sysctl.

    Signed-off-by: Thomas Egerer
    Signed-off-by: David S. Miller

    Thomas Egerer
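
    Why -1 shows up as 255 can be sketched as follows, assuming the value
    ends up in an 8-bit hop limit field and that a raw metric of 0 means
    "unset, use the configured default" as the referenced commit title
    suggests. Both function names are hypothetical.

```c
#include <stdint.h>

/* Storing an int into an 8-bit field keeps only the low 8 bits,
 * so -1 becomes 255. */
static uint8_t hop_limit_field(int hoplimit)
{
    return (uint8_t)hoplimit;
}

/* Zero marks "not set": fall back to the sysctl-configured default.
 * Any other value, including a stale -1, is used as-is. */
static int pick_hoplimit(int raw_metric, int sysctl_default)
{
    return raw_metric == 0 ? sysctl_default : raw_metric;
}
```

    With the leftover initialization the raw metric is -1, so the default
    is never consulted; once the metric is left at 0, a sysctl default of
    64 is what reaches the header.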
     
    At this point, skb->data points to skb_transport_header,
    so the headroom check is wrong.

    In some cases, e.g. bridge (UFO on) + eth device (UFO off),
    there is not enough headroom for the IPv6 frag header,
    but the headroom check is always false.

    As a result, data is moved to before skb->head
    when the IPv6 frag header is added to the skb.

    Signed-off-by: Shan Wei
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Shan Wei
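
    A toy model of the problem described in the commit above. It assumes
    that the headers in front of skb->data have to be shifted back to make
    room for the fragment header, so the space that matters lies between
    the start of the buffer and the first header, not between the start of
    the buffer and data. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct buf_example {
    unsigned char *head;        /* start of the allocation */
    unsigned char *mac_header;  /* first byte that has to move back */
    unsigned char *data;        /* currently at the transport header */
};

/* Headroom measured from data: counts the header bytes in between as
 * free space, so it overstates what is available. */
static size_t headroom_from_data(const struct buf_example *b)
{
    return (size_t)(b->data - b->head);
}

/* Measured from the first header instead: true when the buffer must be
 * grown before frag_hdr_sz bytes can be inserted. */
static bool needs_expand(const struct buf_example *b, size_t frag_hdr_sz)
{
    return (size_t)(b->mac_header - b->head) < frag_hdr_sz;
}
```

    With 2 bytes free before the first header and data 56 bytes into the
    buffer, the data-based measure reports 56 bytes of room for an 8-byte
    fragment header, while needs_expand() correctly reports that the
    buffer is too small.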
     

20 Apr, 2011

1 commit


19 Apr, 2011

1 commit


18 Apr, 2011

4 commits


16 Apr, 2011

1 commit