11 Mar, 2011

2 commits


10 Mar, 2011

2 commits

  • Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462

    In commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 ("ipv6: Always
    clone offlink routes.") we forced the kernel to always clone offlink
    routes.

    The reason we do that is to make sure we never bind an inetpeer to a
    prefixed route.

    The logic turned on here has existed in the tree for many years,
    but was always off due to a protecting CPP define. So perhaps
    it's no surprise that there is a logic bug here.

    The problem is that we cannot clone a route that is already a
    host route (i.e. has DST_HOST set), because if we do, an identical
    entry already exists in the routing tree and therefore the
    ip6_rt_ins() call is going to fail.

    This sets off a series of failures and high cpu usage, because when
    ip6_rt_ins() fails we loop retrying this operation a few times in
    order to handle a race between two threads trying to clone and insert
    the same host route at the same time.

    Fix this by simply using the route as-is when DST_HOST is set.

    Reported-by: slash@ac.auone-net.jp
    Reported-by: Ernst Sjöstrand
    Signed-off-by: David S. Miller

    David S. Miller
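
    The failure mode described in this commit can be illustrated with a small
    userspace model in Python. This is a sketch, not kernel code: the Route
    class, the set used as a routing tree, and the resolve() helper are all
    hypothetical names, and the retry loop is simplified to repeated insert
    attempts.

```python
from dataclasses import dataclass

DST_HOST = 0x1  # illustrative flag value, not the kernel's definition


@dataclass(frozen=True)
class Route:
    dest: str       # destination address or prefix
    plen: int       # prefix length; 128 means a host route
    flags: int = 0


def clone_as_host(route, daddr):
    """Build a host-route copy for daddr (stand-in for the clone step)."""
    return Route(dest=daddr, plen=128, flags=route.flags | DST_HOST)


def insert(tree, route):
    """Insert into the tree; fail if an identical entry already exists."""
    if route in tree:
        return False
    tree.add(route)
    return True


def resolve(tree, route, daddr, skip_host_clone=True, max_attempts=3):
    """Return (route to use, number of failed insert attempts)."""
    if skip_host_clone and route.flags & DST_HOST:
        # Already a host route: use it as-is instead of cloning it.
        return route, 0
    failed = 0
    while failed < max_attempts:
        clone = clone_as_host(route, daddr)
        if insert(tree, clone):
            return clone, failed
        failed += 1  # identical entry exists, so the insert keeps failing
    return route, failed


host = Route("2001:db8::1", 128, DST_HOST)
prefix = Route("2001:db8::", 64)
tree = {host, prefix}

print(resolve(tree, host, "2001:db8::1"))                         # with the fix
print(resolve(tree, host, "2001:db8::1", skip_host_clone=False))  # without it
```

    With skip_host_clone=False the clone of a host route is identical to the
    entry already in the tree, so every attempt fails, which mirrors the
    retry storm the commit message describes.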
     
  • Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
    CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
    that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
    limited to /lib/modules/**. However, the CAP_NET_ADMIN capability
    shouldn't allow anybody to load a module not related to networking.

    This patch restricts the ability to autoload modules to netdev modules
    with explicit aliases. This fixes CVE-2011-1019.

    Arnd Bergmann suggested leaving the old pre-v2.6.32 behavior of loading
    netdev modules by name (without any prefix) untouched for processes
    with CAP_SYS_MODULE, to maintain compatibility with network scripts
    that autoload netdev modules by aliases like "eth0", "wlan0".

    Currently there are only three users of the feature in the upstream
    kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: fffffff800001000
    CapEff: fffffff800001000
    CapBnd: fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0 Link encap:IPv6-in-IPv4
    NOARP MTU:1480 Metric:1

    root@albatros:~# lsmod | grep sit
    sit 10457 0
    tunnel4 2957 1 sit

    For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: ffffffffffffffff
    CapEff: ffffffffffffffff
    CapBnd: ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs 745319 0

    Reference: https://lkml.org/lkml/2011/2/24/203

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Michael Tokarev
    Acked-by: David S. Miller
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    Vasiliy Kulikov
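
    The capability masks in the session above can be decoded with a few lines
    of Python, and the resulting policy modelled alongside them. This is a
    sketch under stated assumptions: the bit numbers for CAP_NET_ADMIN (12)
    and CAP_SYS_MODULE (16) come from linux/capability.h, the "netdev-" alias
    prefix is assumed here rather than stated in the commit message, and
    has_cap() and autoload_requests() are hypothetical helper names.

```python
CAP_NET_ADMIN = 12   # bit number from linux/capability.h
CAP_SYS_MODULE = 16  # bit number from linux/capability.h

NETDEV_PREFIX = "netdev-"  # assumed alias prefix, not quoted in the commit


def has_cap(mask, cap):
    """True if capability bit `cap` is set in the hex mask from /proc."""
    return bool((mask >> cap) & 1)


def autoload_requests(mask, ifname):
    """Module names a process with this mask may ask the kernel to load."""
    requests = []
    if has_cap(mask, CAP_NET_ADMIN):
        # Restricted path: only modules with an explicit netdev alias match.
        requests.append(NETDEV_PREFIX + ifname)
    if has_cap(mask, CAP_SYS_MODULE):
        # Relaxed legacy path: load by bare name, as before v2.6.32.
        requests.append(ifname)
    return requests


restricted = 0xfffffff800001000  # CapEff after the capsh --drop call
full = 0xffffffffffffffff        # CapEff of an unrestricted root shell

print(has_cap(restricted, CAP_NET_ADMIN), has_cap(restricted, CAP_SYS_MODULE))
print(autoload_requests(restricted, "xfs"))
print(autoload_requests(full, "xfs"))
```

    Under this model the restricted shell can only request an aliased name
    for "xfs", which no filesystem module provides, while the unrestricted
    shell can still fall back to the bare name.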
     

26 Feb, 2011

1 commit


20 Feb, 2011

1 commit


19 Feb, 2011

1 commit


17 Feb, 2011

1 commit


04 Feb, 2011

1 commit


01 Feb, 2011

2 commits

  • In my testing of 2.6.37 I was occasionally getting a warning about
    sysctl table entries being unregistered in the wrong order. Digging
    in, it turns out this dates back to the last great sysctl reorg,
    where Al Viro introduced the requirement that sysctl directories
    be created before and destroyed after the files in them.

    It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
    overlooked. So this patch fixes that oversight and makes an annoying
    warning message go away.

    >------------[ cut here ]------------
    >WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
    >Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
    >Call Trace:
    > [] warn_slowpath_common+0x80/0x98
    > [] warn_slowpath_null+0x15/0x17
    > [] unregister_sysctl_table+0x134/0x164
    > [] ? kfree+0xc4/0xd1
    > [] neigh_sysctl_unregister+0x22/0x3a
    > [] addrconf_ifdown+0x33f/0x37b [ipv6]
    > [] ? skb_dequeue+0x5f/0x6b
    > [] addrconf_notify+0x69b/0x75c [ipv6]
    > [] ? ip6mr_device_event+0x98/0xa9 [ipv6]
    > [] notifier_call_chain+0x32/0x5e
    > [] raw_notifier_call_chain+0xf/0x11
    > [] call_netdevice_notifiers+0x45/0x4a
    > [] rollback_registered_many+0x118/0x201
    > [] unregister_netdevice_many+0x16/0x6d
    > [] default_device_exit_batch+0xa4/0xb8
    > [] ? cleanup_net+0x0/0x194
    > [] ops_exit_list+0x4e/0x56
    > [] cleanup_net+0xf4/0x194
    > [] process_one_work+0x187/0x280
    > [] worker_thread+0xff/0x19f
    > [] ? worker_thread+0x0/0x19f
    > [] kthread+0x7d/0x85
    > [] kernel_thread_helper+0x4/0x10
    > [] ? kthread+0x0/0x85
    > [] ? kernel_thread_helper+0x0/0x10
    >---[ end trace 8a7e9310b35e9486 ]---

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • When an IPSEC SA is still being set up, __xfrm_lookup() will return
    -EREMOTE and so ip_route_output_flow() will return a blackhole route.
    This can happen in a sendmsg call, and after d33e455337ea ("net: Abstract
    default MTU metric calculation behind an accessor.") this leads to a
    crash in ip_append_data() because the blackhole dst_ops have no
    default_mtu() method and so dst_mtu() calls a NULL pointer.

    Fix this by adding default_mtu() methods (that simply return 0, matching
    the old behavior) to the blackhole dst_ops.

    The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
    VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
    looks to be needed as well.

    Signed-off-by: Roland Dreier
    Signed-off-by: David S. Miller

    Roland Dreier
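
    The crash and its fix can be modelled in a few lines of Python. This is
    an illustrative sketch, not the kernel implementation: DstOps, Dst and
    the mtu_metric field are hypothetical stand-ins, and a None method plays
    the role of the NULL function pointer.

```python
class DstOps:
    """Per-family operations table; default_mtu may be missing (None)."""

    def __init__(self, name, default_mtu=None):
        self.name = name
        self.default_mtu = default_mtu


class Dst:
    """Route entry with an optional explicit MTU metric (0 means unset)."""

    def __init__(self, ops, mtu_metric=0):
        self.ops = ops
        self.mtu_metric = mtu_metric


def dst_mtu(dst):
    """Return the explicit MTU metric, else ask the ops table for a default."""
    if dst.mtu_metric:
        return dst.mtu_metric
    return dst.ops.default_mtu(dst)  # fails if the method was never set


# Before the fix: blackhole ops without a default_mtu method.
broken_ops = DstOps("blackhole")
# After the fix: a method that simply returns 0, matching the old behavior.
fixed_ops = DstOps("blackhole", default_mtu=lambda dst: 0)

try:
    dst_mtu(Dst(broken_ops))
except TypeError as exc:
    print("missing method:", exc)

print(dst_mtu(Dst(fixed_ops)))
```

    In Python the missing method raises TypeError; in the kernel the
    equivalent call through a NULL pointer is the crash reported above.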
     

28 Jan, 2011

1 commit

  • They are bogus. The basic idea is that I wanted to make sure
    that prefixed routes never bind to peers.

    The test I used was whether RTF_CACHE was set.

    But first of all, the RTF_CACHE flag is set at different spots
    depending upon which ip6_rt_copy() caller you're talking about.

    I've validated all of the code paths, and even in the future
    where we bind peers more aggressively (for route metric COW'ing)
    we never bind to prefix'd routes, only fully specified ones.
    This even applies when addrconf or icmp6 routes are allocated.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Jan, 2011

1 commit


26 Jan, 2011

1 commit

  • This reverts the following set of commits:

    d1ed113f1669390da9898da3beddcc058d938587 ("ipv6: remove duplicate neigh_ifdown")
    29ba5fed1bbd09c2cba890798c8f9eaab251401d ("ipv6: don't flush routes when setting loopback down")
    9d82ca98f71fd686ef2f3017c5e3e6a4871b6e46 ("ipv6: fix missing in6_ifa_put in addrconf")
    2de795707294972f6c34bae9de713e502c431296 ("ipv6: addrconf: don't remove address state on ifdown if the address is being kept")
    8595805aafc8b077e01804c9a3668e9aa3510e89 ("IPv6: only notify protocols if address is compeletely gone")
    27bdb2abcc5edb3526e25407b74bf17d1872c329 ("IPv6: keep tentative addresses in hash table")
    93fa159abe50d3c55c7f83622d3f5c09b6e06f4b ("IPv6: keep route for tentative address")
    8f37ada5b5f6bfb4d251a7f510f249cb855b77b3 ("IPv6: fix race between cleanup and add/delete address")
    84e8b803f1e16f3a2b8b80f80a63fa2f2f8a9be6 ("IPv6: addrconf notify when address is unavailable")
    dc2b99f71ef477a31020511876ab4403fb7c4420 ("IPv6: keep permanent addresses on admin down")

    because the core semantic change to ipv6 address handling on ifdown
    has broken some things, in particular "disable_ipv6" sysctl handling.

    Stephen has made several attempts to get things back in working order,
    but nothing has restored disable_ipv6 fully yet.

    Reported-by: Eric W. Biederman
    Tested-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    David S. Miller
     

25 Jan, 2011

1 commit


20 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
    sctp: user perfect name for Delayed SACK Timer option
    net: fix can_checksum_protocol() arguments swap
    Revert "netlink: test for all flags of the NLM_F_DUMP composite"
    gianfar: Fix misleading indentation in startup_gfar()
    net/irda/sh_irda: return to RX mode when TX error
    net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
    USB CDC NCM: tx_fixup() race condition fix
    ns83820: Avoid bad pointer deref in ns83820_init_one().
    ipv6: Silence privacy extensions initialization
    bnx2x: Update bnx2x version to 1.62.00-4
    bnx2x: Fix AER setting for BCM57712
    bnx2x: Fix BCM84823 LED behavior
    bnx2x: Mark full duplex on some external PHYs
    bnx2x: Fix BCM8073/BCM8727 microcode loading
    bnx2x: LED fix for BCM8727 over BCM57712
    bnx2x: Common init will be executed only once after POR
    bnx2x: Swap BCM8073 PHY polarity if required
    iwlwifi: fix valid chain reading from EEPROM
    ath5k: fix locking in tx_complete_poll_work
    ath9k_hw: do PA offset calibration only on longcal interval
    ...

    Linus Torvalds
     

19 Jan, 2011

1 commit

  • When a network namespace is created (via CLONE_NEWNET), the loopback
    interface is automatically added to the new namespace, triggering a
    printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.

    This is problematic for applications which use CLONE_NEWNET as
    part of a sandbox, like Chromium's suid sandbox or recent versions of
    vsftpd. On a busy machine, it can lead to thousands of useless
    "lo: Disabled Privacy Extensions" messages appearing in dmesg.

    It's easy enough to check the status of privacy extensions via the
    use_tempaddr sysctl, so just removing the printk seems like the most
    sensible solution.

    Signed-off-by: Romain Francoise
    Signed-off-by: David S. Miller

    Romain Francoise
     

15 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
    GRETH: resolve SMP issues and other problems
    GRETH: handle frame error interrupts
    GRETH: avoid writing bad speed/duplex when setting transfer mode
    GRETH: fixed skb buffer memory leak on frame errors
    GRETH: GBit transmit descriptor handling optimization
    GRETH: fix opening/closing
    GRETH: added raw AMBA vendor/device number to match against.
    cassini: Fix build bustage on x86.
    e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
    e1000e: update Copyright for 2011
    e1000: Avoid unhandled IRQ
    r8169: keep firmware in memory.
    netdev: tilepro: Use is_unicast_ether_addr helper
    etherdevice.h: Add is_unicast_ether_addr function
    ks8695net: Use default implementation of ethtool_ops::get_link
    ks8695net: Disable non-working ethtool operations
    USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
    vxge: Remember to release firmware after upgrading firmware
    netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
    ipsec: update MAX_AH_AUTH_LEN to support sha512
    ...

    Linus Torvalds
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

13 Jan, 2011

3 commits


12 Jan, 2011

3 commits

  • David S. Miller
     
  • skb_cow_data() may allocate a new data buffer, so pointers on
    skb should be set after this function.

    Bug was introduced by commit dff3bb06 ("ah4: convert to ahash")
    and 8631e9bd ("ah6: convert to ahash").

    Signed-off-by: Wang Xuefu
    Acked-by: Krzysztof Witek
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Dang Hongwu
     
  • inet_csk_bind_conflict() logic currently disallows a bind() if
    it finds a friend socket (a socket bound on same address/port)
    satisfying a set of conditions:

    1) Current (to be bound) socket doesn't have sk_reuse set
    OR
    2) other socket doesn't have sk_reuse set
    OR
    3) other socket is in LISTEN state

    We should add the CLOSE state in the 3) condition, in order to avoid two
    REUSEADDR sockets in CLOSE state with same local address/port, since
    this can deny further operations.

    Note: a prior patch tried to address the problem in a different (and
    buggy) way. (commit fda48a0d7a8412ced tcp: bind() fix when many ports
    are bound).

    Reported-by: Gaspar Chilingarov
    Reported-by: Daniel Baluta
    Tested-by: Daniel Baluta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
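
    The three conditions listed in this commit can be written as a single
    predicate. The Python sketch below is a simplified model: the Sock class
    and the blocking_states parameter are hypothetical, and the address,
    port and device comparisons that decide whether two sockets are
    "friends" in the first place are omitted.

```python
from dataclasses import dataclass

LISTEN, CLOSE, ESTABLISHED = "LISTEN", "CLOSE", "ESTABLISHED"


@dataclass
class Sock:
    reuse: bool   # models sk_reuse (SO_REUSEADDR)
    state: str    # models the TCP state of the socket


def bind_conflict(sk, sk2, blocking_states=(LISTEN, CLOSE)):
    """True if binding sk must be refused because of friend socket sk2."""
    return (
        not sk.reuse                      # condition 1
        or not sk2.reuse                  # condition 2
        or sk2.state in blocking_states   # condition 3, extended with CLOSE
    )


new_sock = Sock(reuse=True, state=CLOSE)
friend = Sock(reuse=True, state=CLOSE)

print(bind_conflict(new_sock, friend, blocking_states=(LISTEN,)))  # old rule
print(bind_conflict(new_sock, friend))                             # new rule
```

    With only LISTEN in condition 3, two REUSEADDR sockets in CLOSE state are
    allowed onto the same address/port; adding CLOSE makes the second bind()
    conflict.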
     

11 Jan, 2011

1 commit

  • Using "iptables -L" with a lot of rules causes excessive BH latency.
    Jesper mentioned ~6 ms and worried about frame drops.

    Switch to a per_cpu seqlock scheme, so that taking a snapshot of
    counters doesn't need to block BH (for this cpu, but also other cpus).

    This adds two increments on seqlock sequence per ipt_do_table() call,
    which is a reasonable cost for allowing "iptables -L" not to block BH
    processing.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    Acked-by: Stephen Hemminger
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
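
    The sequence-counter idea behind this change can be sketched in Python.
    This is a single-threaded model for illustration only: CpuCounter, add()
    and snapshot() are hypothetical names, add() stands in for one
    ipt_do_table() call, and real per-cpu data and memory barriers are not
    modelled.

```python
class CpuCounter:
    """Counters for one CPU, guarded by a sequence number."""

    def __init__(self):
        self.seq = 0       # even: stable, odd: update in progress
        self.packets = 0
        self.bytes = 0

    def add(self, nbytes):
        """Writer side: two sequence increments around the update."""
        self.seq += 1
        self.packets += 1
        self.bytes += nbytes
        self.seq += 1

    def snapshot(self):
        """Reader side: retry until a consistent pair is observed."""
        while True:
            start = self.seq
            if start & 1:
                continue  # writer active, read again
            pkts, byts = self.packets, self.bytes
            if self.seq == start:
                return pkts, byts  # no writer ran during the copy


def total(counters):
    """Sum consistent snapshots from every CPU without blocking writers."""
    pkts = byts = 0
    for counter in counters:
        p, b = counter.snapshot()
        pkts += p
        byts += b
    return pkts, byts


cpus = [CpuCounter(), CpuCounter()]
cpus[0].add(100)
cpus[0].add(50)
cpus[1].add(10)
print(total(cpus), cpus[0].seq)
```

    The reader never takes a lock; it pays with an occasional retry, while
    the writer pays the two increments mentioned in the commit message.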
     

23 Dec, 2010

1 commit


21 Dec, 2010

1 commit


20 Dec, 2010

1 commit


19 Dec, 2010

2 commits


18 Dec, 2010

1 commit


17 Dec, 2010

3 commits

  • When the loopback device is being brought down, keep the route table
    entries because they are special. The entries in the local table for
    linklocal routes and ::1 address should not be purged.

    This is a suboptimal solution to the problem and should be replaced
    by a better fix in future.

    Signed-off-by: Stephen Hemminger
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • Special care is taken inside sk_prot_alloc to avoid overwriting
    skc_node/skc_nulls_node. We should also avoid overwriting
    skc_bind_node/skc_portaddr_node.

    The patch fixes the following crash:

    BUG: unable to handle kernel paging request at fffffffffffffff0
    IP: [] udp4_lib_lookup2+0xad/0x370
    [] __udp4_lib_lookup+0x282/0x360
    [] __udp4_lib_rcv+0x31e/0x700
    [] ? ip_local_deliver_finish+0x65/0x190
    [] ? ip_local_deliver+0x88/0xa0
    [] udp_rcv+0x15/0x20
    [] ip_local_deliver_finish+0x65/0x190
    [] ip_local_deliver+0x88/0xa0
    [] ip_rcv_finish+0x32d/0x6f0
    [] ? netif_receive_skb+0x99c/0x11c0
    [] ip_rcv+0x2bb/0x350
    [] netif_receive_skb+0x99c/0x11c0

    Signed-off-by: Leonard Crestez
    Signed-off-by: Octavian Purdila
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Octavian Purdila
     
  • The first big packets sent to a "low-MTU" client correctly
    trigger the creation of a temporary route containing the reduced MTU.

    But after the temporary route has expired, a new ICMP6 "packet too big"
    will be sent; rt6_pmtu_discovery will find the previous EXPIRED route,
    check that its mtu isn't bigger than the one in the icmp packet, and do
    nothing until the temporary route is deleted by gc.

    I ran a simple experiment:
    while :; do
    time ( dd if=/dev/zero bs=10K count=1 | ssh hostname dd of=/dev/null ) || break;
    done

    The "time" reports real 0m0.197s if a temporary route isn't expired, but
    it reports real 0m52.837s (!!!!) immediately after a temporary route has
    expired.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

15 Dec, 2010

1 commit


14 Dec, 2010

1 commit

  • Make all RTAX_ADVMSS metric accesses go through a new helper function,
    dst_metric_advmss().

    Leave the actual default metric as "zero" in the real metric slot,
    and compute the actual default value dynamically via a new dst_ops
    AF specific callback.

    For stacked IPSEC routes, we use the advmss of the path which
    preserves existing behavior.

    Unlike ipv4/ipv6, DECnet ties the advmss to the mtu and thus updates
    advmss on pmtu updates. This inconsistency in advmss handling
    results in more raw metric accesses than I wish we ended up with.

    Signed-off-by: David S. Miller

    David S. Miller
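
    The accessor pattern described in this commit can be sketched in Python.
    This is a simplified model, not the kernel code: the Dst and DstOps
    classes are hypothetical, and the example callback (MTU minus 40 bytes of
    headers) is an assumption chosen only to have something concrete to
    compute.

```python
RTAX_ADVMSS = "advmss"  # key standing in for the metric slot index


class DstOps:
    """Address-family specific operations for a route entry."""

    def __init__(self, default_advmss):
        self.default_advmss = default_advmss


class Dst:
    def __init__(self, ops, mtu, metrics=None):
        self.ops = ops
        self.mtu = mtu
        self.metrics = dict(metrics or {})  # unset metrics read as zero


def dst_metric_advmss(dst):
    """Return the stored advmss, or compute the default when the slot is 0."""
    advmss = dst.metrics.get(RTAX_ADVMSS, 0)
    if not advmss:
        advmss = dst.ops.default_advmss(dst)
    return advmss


def example_default_advmss(dst):
    # Hypothetical default: MTU minus 40 bytes of network + TCP headers.
    return dst.mtu - 40


ops = DstOps(default_advmss=example_default_advmss)

print(dst_metric_advmss(Dst(ops, mtu=1500)))                        # computed
print(dst_metric_advmss(Dst(ops, mtu=1500, metrics={RTAX_ADVMSS: 1200})))
```

    Keeping zero in the slot means the default never has to be written into
    the metrics at route creation; it is derived on demand by the callback.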
     

13 Dec, 2010

3 commits