24 Feb, 2011

2 commits


23 Feb, 2011

4 commits


21 Feb, 2011

1 commit


20 Feb, 2011

2 commits


19 Feb, 2011

1 commit


18 Feb, 2011

1 commit


17 Feb, 2011

1 commit


11 Feb, 2011

2 commits

  • If we didn't have a routing cache, we would not be able to properly
    propagate certain kinds of dynamic path attributes, for example
    PMTU information and redirects.

    The reason is that if we didn't have a routing cache, then there would
    be no way to look up all of the active cached routes hanging off of
    sockets, tunnels, IPSEC bundles, etc.

    Consider the case where we created a cached route, but no inetpeer
    entry existed and also we were not asked to pre-COW the route metrics
    and therefore did not force the creation of a new inetpeer entry.

    If we later get a PMTU message, or a redirect, and store this
    information in a new inetpeer entry, there is no way to teach that
    cached route about the newly existing inetpeer entry.

    The facilities implemented here handle this problem.

    First we create a generation ID. When we create a cached route of any
    kind, we remember the generation ID at the time of attachment. Any
    time we force-create an inetpeer entry in response to new path
    information, we bump that generation ID.

    The dst_ops->check() callback is where the knowledge of this event
    is propagated. If the global generation ID does not equal the one
    stored in the cached route, and the cached route has not attached
    to an inetpeer yet, we look it up and attach if one is found. Now
    that we've updated the cached route's information, we update the
    route's generation ID too.

    This clears the way for implementing PMTU and redirects directly in
    the inetpeer cache. There is absolutely no need to consult cached
    route information in order to maintain this information.

    At this point nothing bumps the inetpeer genids yet; that comes in the
    later changes which handle PMTUs and redirects using inetpeers.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Future changes will add caching information, and some of
    these new elements will be addresses.

    Since the family is implicit via the ->daddr.family member,
    replicating the family in every address we store is entirely
    redundant.

    Signed-off-by: David S. Miller

    David S. Miller
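The generation-ID scheme in the inetpeer commit above can be modeled in a few lines of C. This is a minimal user-space sketch under stated assumptions, not the kernel's code: `struct peer`, `struct cached_route`, `peer_genid`, `peer_lookup()`, `peer_force_create()`, `route_init()` and `route_check()` are hypothetical names, and the "inetpeer cache" is a toy 16-slot table that ignores collisions. It only illustrates the idea: a route remembers the genid it saw, force-creating a peer bumps the global genid, and the check hook attaches a newly created peer lazily.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the generation-ID scheme; not kernel code. */

#define PEER_SLOTS 16

struct peer {
    unsigned int pmtu;              /* learned path information */
};

struct cached_route {
    unsigned int daddr;             /* destination, used as the lookup key */
    struct peer *peer;              /* NULL until a peer is attached */
    int genid;                      /* global genid seen at attach/check time */
};

static int peer_genid;                      /* bumped whenever a peer is force-created */
static struct peer peer_table[PEER_SLOTS];  /* toy cache: one slot per daddr % PEER_SLOTS */
static int peer_present[PEER_SLOTS];

static struct peer *peer_lookup(unsigned int daddr)
{
    unsigned int slot = daddr % PEER_SLOTS;
    return peer_present[slot] ? &peer_table[slot] : NULL;
}

/* New path information (e.g. a PMTU message) forces a peer into existence. */
static struct peer *peer_force_create(unsigned int daddr)
{
    unsigned int slot = daddr % PEER_SLOTS;
    if (!peer_present[slot]) {
        peer_present[slot] = 1;
        peer_genid++;               /* cached routes must re-check */
    }
    return &peer_table[slot];
}

/* Create a cached route without forcing a peer; remember the current genid. */
static void route_init(struct cached_route *rt, unsigned int daddr)
{
    rt->daddr = daddr;
    rt->peer = peer_lookup(daddr);  /* may be NULL */
    rt->genid = peer_genid;
}

/* Analogue of the dst_ops->check() hook: on a genid mismatch, attach a
 * peer that appeared after this route was created, then resync the genid. */
static void route_check(struct cached_route *rt)
{
    if (rt->genid != peer_genid) {
        if (!rt->peer)
            rt->peer = peer_lookup(rt->daddr);
        rt->genid = peer_genid;
    }
    assert(rt->genid == peer_genid);
}
```

For example, a route created before any peer exists starts with no peer; after a PMTU event force-creates one, the next `route_check()` sees the genid mismatch and attaches it.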
     

09 Feb, 2011

1 commit


05 Feb, 2011

2 commits


04 Feb, 2011

1 commit


01 Feb, 2011

2 commits

    In my testing of 2.6.37 I was occasionally getting a warning about
    sysctl table entries being unregistered in the wrong order. Digging
    in, it turns out this dates back to the last great sysctl reorg,
    in which Al Viro introduced the requirement that sysctl directories
    be created before, and destroyed after, the files in them.

    In that great reorg, however, /proc/sys/net/ipv6/neigh was
    overlooked. So this patch fixes that oversight and makes an annoying
    warning message go away.

    >------------[ cut here ]------------
    >WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
    >Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
    >Call Trace:
    > [] warn_slowpath_common+0x80/0x98
    > [] warn_slowpath_null+0x15/0x17
    > [] unregister_sysctl_table+0x134/0x164
    > [] ? kfree+0xc4/0xd1
    > [] neigh_sysctl_unregister+0x22/0x3a
    > [] addrconf_ifdown+0x33f/0x37b [ipv6]
    > [] ? skb_dequeue+0x5f/0x6b
    > [] addrconf_notify+0x69b/0x75c [ipv6]
    > [] ? ip6mr_device_event+0x98/0xa9 [ipv6]
    > [] notifier_call_chain+0x32/0x5e
    > [] raw_notifier_call_chain+0xf/0x11
    > [] call_netdevice_notifiers+0x45/0x4a
    > [] rollback_registered_many+0x118/0x201
    > [] unregister_netdevice_many+0x16/0x6d
    > [] default_device_exit_batch+0xa4/0xb8
    > [] ? cleanup_net+0x0/0x194
    > [] ops_exit_list+0x4e/0x56
    > [] cleanup_net+0xf4/0x194
    > [] process_one_work+0x187/0x280
    > [] worker_thread+0xff/0x19f
    > [] ? worker_thread+0x0/0x19f
    > [] kthread+0x7d/0x85
    > [] kernel_thread_helper+0x4/0x10
    > [] ? kthread+0x0/0x85
    > [] ? kernel_thread_helper+0x0/0x10
    >---[ end trace 8a7e9310b35e9486 ]---

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • When an IPSEC SA is still being set up, __xfrm_lookup() will return
    -EREMOTE and so ip_route_output_flow() will return a blackhole route.
    This can happen in a sendmsg() call, and after d33e455337ea ("net: Abstract
    default MTU metric calculation behind an accessor.") this leads to a
    crash in ip_append_data() because the blackhole dst_ops have no
    default_mtu() method and so dst_mtu() calls a NULL pointer.

    Fix this by adding default_mtu() methods (that simply return 0, matching
    the old behavior) to the blackhole dst_ops.

    The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
    VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
    looks to be needed as well.

    Signed-off-by: Roland Dreier
    Signed-off-by: David S. Miller

    Roland Dreier
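The blackhole crash described above comes down to an ops table with a missing method pointer behind an accessor that calls it unconditionally. The following is a hedged user-space sketch, not kernel code: `struct dst_ops`, `struct dst_entry`, `dst_mtu_sketch()` and `blackhole_default_mtu()` are simplified, hypothetical stand-ins, kept only to show why every ops table must supply `default_mtu()` and how a method returning 0 restores the old result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down model of the blackhole-route bug; not kernel code. */

struct dst_entry;

struct dst_ops {
    unsigned int (*default_mtu)(const struct dst_entry *dst);
};

struct dst_entry {
    const struct dst_ops *ops;
    unsigned int mtu_metric;        /* 0 means "not set, ask the ops table" */
};

/* Accessor in the style described above: with no explicit metric it calls
 * ops->default_mtu(). The assert documents the invariant; without it, a
 * missing method means a call through a NULL pointer. */
static unsigned int dst_mtu_sketch(const struct dst_entry *dst)
{
    if (dst->mtu_metric)
        return dst->mtu_metric;
    assert(dst->ops->default_mtu != NULL);
    return dst->ops->default_mtu(dst);
}

/* The fix: blackhole ops get a method that simply returns 0. */
static unsigned int blackhole_default_mtu(const struct dst_entry *dst)
{
    (void)dst;
    return 0;
}

static const struct dst_ops blackhole_ops = {
    .default_mtu = blackhole_default_mtu,
};
```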
     

28 Jan, 2011

3 commits


27 Jan, 2011

3 commits

  • Routing metrics are now copy-on-write.

    Initially a route entry points its metrics at a read-only location.
    If a routing table entry exists, it points there; otherwise it
    points at the all-zero metric placeholder called 'dst_default_metrics'.

    The writability state of the metrics is stored in the low bits of the
    metrics pointer, which leaves two spare bits if we want to store
    more states.

    For the initial implementation, COW is implemented simply via kmalloc.
    However future enhancements will change this to place the writable
    metrics somewhere else, in order to increase sharing. Very likely
    this "somewhere else" will be the inetpeer cache.

    Note also that this means that metrics updates may transiently fail
    if we cannot COW the metrics successfully.

    But even by itself, this patch should decrease memory usage and
    increase cache locality especially for routing workloads. In those
    cases the read-only metric copies stay in place and never get written
    to.

    TCP workloads where metrics get updated, and those rare cases where
    PMTU triggers occur, will take a very slight performance hit. But
    that hit will be alleviated when the long-term writable metrics
    move to a more sharable location.

    Since the metrics storage went from a u32 array of RTAX_MAX entries to
    what is essentially a pointer, some retooling of the dst_entry layout
    was necessary.

    Most importantly, we need to preserve the alignment of the reference
    count so that it doesn't share cache lines with the read-mostly state,
    as per Eric Dumazet's alignment assertion checks.

    The only non-trivial bit here is the move of the 'flags' member into
    the writable cacheline. This is OK since we always access the flags
    at around the same moment that we modify the reference count.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • David S. Miller
     
  • Like ipv4, we have to propagate the ipv6 route peer into
    the ipsec top-level route during instantiation.

    Signed-off-by: David S. Miller

    David S. Miller
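The copy-on-write metrics commit above keeps a read-only flag in the low bits of the metrics pointer. A minimal sketch of that technique in portable C, under these assumptions: `METRICS_MAX` stands in for `RTAX_MAX` with an arbitrary value, `malloc()` stands in for `kmalloc()`, and `struct route`, `route_set_metric()` and the other names are hypothetical. Routes start out pointing at a shared all-zero array and only get a private copy on their first write.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space sketch of copy-on-write metrics; not kernel code. */

#define METRICS_MAX        14                /* arbitrary stand-in for RTAX_MAX */
#define METRICS_READ_ONLY  ((uintptr_t)0x1)  /* low pointer bit: shared, read-only */
#define METRICS_FLAGS_MASK ((uintptr_t)0x3)  /* low bits reserved for flags */

/* Shared all-zero placeholder, playing the role of dst_default_metrics. */
static const uint32_t default_metrics[METRICS_MAX];

struct route {
    uintptr_t metrics;            /* metrics pointer with flag bits or'ed in */
};

static uint32_t *metrics_ptr(const struct route *rt)
{
    /* Dropping const is safe only because read-only metrics are never
     * written through this pointer (route_set_metric copies first). */
    return (uint32_t *)(rt->metrics & ~METRICS_FLAGS_MASK);
}

static int metrics_read_only(const struct route *rt)
{
    return (rt->metrics & METRICS_READ_ONLY) != 0;
}

/* Point a route at shared metrics (a table entry or the zero placeholder). */
static void route_init_metrics(struct route *rt, const uint32_t *shared)
{
    uintptr_t p = (uintptr_t)shared;
    assert((p & METRICS_FLAGS_MASK) == 0);   /* alignment keeps low bits free */
    rt->metrics = p | METRICS_READ_ONLY;
}

static uint32_t route_metric(const struct route *rt, int idx)
{
    return metrics_ptr(rt)[idx];
}

/* Write one metric, copying first if still shared. Returns -1 if the copy
 * cannot be allocated: the transient failure mentioned in the commit. */
static int route_set_metric(struct route *rt, int idx, uint32_t val)
{
    if (metrics_read_only(rt)) {
        uint32_t *copy = malloc(METRICS_MAX * sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy, metrics_ptr(rt), METRICS_MAX * sizeof(*copy));
        rt->metrics = (uintptr_t)copy;       /* no flag bits: writable */
    }
    metrics_ptr(rt)[idx] = val;
    return 0;
}

static void route_release_metrics(struct route *rt)
{
    if (!metrics_read_only(rt))
        free(metrics_ptr(rt));
    rt->metrics = 0;
}
```

Because both the shared `uint32_t` array and `malloc()` results are at least 4-byte aligned, the two low pointer bits are always zero and can carry flags; the sketch uses one of them.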
     

26 Jan, 2011

1 commit

  • This reverts the following set of commits:

    d1ed113f1669390da9898da3beddcc058d938587 ("ipv6: remove duplicate neigh_ifdown")
    29ba5fed1bbd09c2cba890798c8f9eaab251401d ("ipv6: don't flush routes when setting loopback down")
    9d82ca98f71fd686ef2f3017c5e3e6a4871b6e46 ("ipv6: fix missing in6_ifa_put in addrconf")
    2de795707294972f6c34bae9de713e502c431296 ("ipv6: addrconf: don't remove address state on ifdown if the address is being kept")
    8595805aafc8b077e01804c9a3668e9aa3510e89 ("IPv6: only notify protocols if address is compeletely gone")
    27bdb2abcc5edb3526e25407b74bf17d1872c329 ("IPv6: keep tentative addresses in hash table")
    93fa159abe50d3c55c7f83622d3f5c09b6e06f4b ("IPv6: keep route for tentative address")
    8f37ada5b5f6bfb4d251a7f510f249cb855b77b3 ("IPv6: fix race between cleanup and add/delete address")
    84e8b803f1e16f3a2b8b80f80a63fa2f2f8a9be6 ("IPv6: addrconf notify when address is unavailable")
    dc2b99f71ef477a31020511876ab4403fb7c4420 ("IPv6: keep permanent addresses on admin down")

    because the core semantic change to ipv6 address handling on ifdown
    has broken some things, in particular "disable_ipv6" sysctl handling.

    Stephen has made several attempts to get things back in working order,
    but nothing has restored disable_ipv6 fully yet.

    Reported-by: Eric W. Biederman
    Tested-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    David S. Miller
     

25 Jan, 2011

2 commits

    Do not handle PMTU vs. route lookup creation any differently
    with respect to offlink routes; always clone them.

    Reported-by: PK
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Quoting Ben Hutchings: we presumably won't be defining features that
    can only be enabled on 64-bit architectures.

    Occurrences found by `grep -r` on net/, drivers/net, include/

    [ Move features and vlan_features next to each other in
    struct netdev, as per Eric Dumazet's suggestion -DaveM ]

    Signed-off-by: Michał Mirosław
    Signed-off-by: David S. Miller

    Michał Mirosław
     

21 Jan, 2011

3 commits

  • Remove sparse warnings, using a function typedef to be able to use __rcu
    annotation on mh_filter pointer.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Fix minor __rcu annotations and remove sparse warnings

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
    After commit ae90bdeaeac6b (netfilter: fix compilation when conntrack is
    disabled but tproxy is enabled) we get the following warnings:

    net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
    'nf_ct_frag6_gather' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
    'nf_ct_frag6_output' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
    'nf_ct_frag6_init' was not declared. Should it be static?
    net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
    'nf_ct_frag6_cleanup' was not declared. Should it be static?

    Fix this by including net/netfilter/ipv6/nf_defrag_ipv6.h

    Signed-off-by: Eric Dumazet
    CC: KOVACS Krisztian
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

20 Jan, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
    sctp: user perfect name for Delayed SACK Timer option
    net: fix can_checksum_protocol() arguments swap
    Revert "netlink: test for all flags of the NLM_F_DUMP composite"
    gianfar: Fix misleading indentation in startup_gfar()
    net/irda/sh_irda: return to RX mode when TX error
    net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
    USB CDC NCM: tx_fixup() race condition fix
    ns83820: Avoid bad pointer deref in ns83820_init_one().
    ipv6: Silence privacy extensions initialization
    bnx2x: Update bnx2x version to 1.62.00-4
    bnx2x: Fix AER setting for BCM57712
    bnx2x: Fix BCM84823 LED behavior
    bnx2x: Mark full duplex on some external PHYs
    bnx2x: Fix BCM8073/BCM8727 microcode loading
    bnx2x: LED fix for BCM8727 over BCM57712
    bnx2x: Common init will be executed only once after POR
    bnx2x: Swap BCM8073 PHY polarity if required
    iwlwifi: fix valid chain reading from EEPROM
    ath5k: fix locking in tx_complete_poll_work
    ath9k_hw: do PA offset calibration only on longcal interval
    ...

    Linus Torvalds
     
  • Patrick McHardy
     

19 Jan, 2011

2 commits


15 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
    GRETH: resolve SMP issues and other problems
    GRETH: handle frame error interrupts
    GRETH: avoid writing bad speed/duplex when setting transfer mode
    GRETH: fixed skb buffer memory leak on frame errors
    GRETH: GBit transmit descriptor handling optimization
    GRETH: fix opening/closing
    GRETH: added raw AMBA vendor/device number to match against.
    cassini: Fix build bustage on x86.
    e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
    e1000e: update Copyright for 2011
    e1000: Avoid unhandled IRQ
    r8169: keep firmware in memory.
    netdev: tilepro: Use is_unicast_ether_addr helper
    etherdevice.h: Add is_unicast_ether_addr function
    ks8695net: Use default implementation of ethtool_ops::get_link
    ks8695net: Disable non-working ethtool operations
    USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
    vxge: Remember to release firmware after upgrading firmware
    netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
    ipsec: update MAX_AH_AUTH_LEN to support sha512
    ...

    Linus Torvalds
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

13 Jan, 2011

2 commits

    One iptables invocation with 135000 rules takes 35 seconds of CPU time
    on a recent server, using a 32-bit distro and a 64-bit kernel.

    We eventually trigger the NMI/RCU watchdog.

    INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

    COMPAT mode has quadratic behavior and consumes 16 bytes of memory per
    rule.

    Switch the xt_compat algorithms to use an array instead of a list, and use
    a binary search to locate an offset in the sorted array.

    This halves memory needs (8 bytes per rule) and removes the quadratic
    behavior [ O(N*N) -> O(N*log2(N)) ].

    The iptables run time goes from 35 s to 150 ms.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     
  • David S. Miller
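The xt_compat change from 13 Jan replaces a linked list with a sorted array searched by binary search. A minimal sketch of that data structure, assuming each entry holds a rule offset plus the cumulative size delta up to that rule; `struct delta_entry`, `struct delta_table`, `table_add()` and `table_calc_jump()` are hypothetical names, not the netfilter API. Appends stay amortized O(1) because rules are visited in offset order, and each lookup is O(log N), so N lookups cost O(N log N) instead of O(N*N). On common ABIs each entry is two 32-bit fields, i.e. 8 bytes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of an offset/delta table; not the netfilter code. */
struct delta_entry {
    unsigned int offset;   /* rule offset; appended in increasing order */
    int delta;             /* cumulative size delta up to and including this rule */
};

struct delta_table {
    struct delta_entry *v;
    unsigned int n, cap;
};

/* Append an entry. Offsets arrive in ascending order, so the array stays
 * sorted with no extra work. Returns -1 on allocation failure. */
static int table_add(struct delta_table *t, unsigned int offset, int delta)
{
    if (t->n == t->cap) {
        unsigned int ncap = t->cap ? t->cap * 2 : 16;
        struct delta_entry *nv = realloc(t->v, ncap * sizeof(*nv));
        if (!nv)
            return -1;
        t->v = nv;
        t->cap = ncap;
    }
    assert(t->n == 0 || t->v[t->n - 1].offset < offset);
    t->v[t->n].offset = offset;
    t->v[t->n].delta = delta + (t->n ? t->v[t->n - 1].delta : 0);
    t->n++;
    return 0;
}

/* Total delta of all rules strictly before `offset`: binary search for
 * the first entry with entry.offset >= offset, then step back one. */
static int table_calc_jump(const struct delta_table *t, unsigned int offset)
{
    unsigned int lo = 0, hi = t->n;   /* search range [lo, hi) */
    while (lo < hi) {
        unsigned int mid = lo + (hi - lo) / 2;
        if (t->v[mid].offset < offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries with offset < target */
    return lo ? t->v[lo - 1].delta : 0;
}
```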