13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
    complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
    y).

    We miss needed barriers, even on x86, when y is not NULL.

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Dec, 2011

1 commit


02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer() used to handle that special case, but
    will soon change to stop handling it.

    Convert all rcu_assign_pointer of NULL value.

    //smpl
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    //
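
The distinction can be sketched in userspace, assuming C11 atomics as a stand-in for the kernel macros. The names publish_pointer(), clear_pointer() and read_pointer() are hypothetical and are not the kernel implementations; the sketch only illustrates why storing a non-NULL pointer needs ordering while storing NULL does not.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int mtu;
};

/* Shared pointer that readers load concurrently (hypothetical example state). */
static _Atomic(struct config *) current_config;

/* Analogue of rcu_assign_pointer(p, v) for non-NULL v: release ordering makes
 * the writes that initialised *cfg visible before the pointer itself. */
static void publish_pointer(struct config *cfg)
{
    assert(cfg != NULL); /* NULL should go through clear_pointer() instead */
    atomic_store_explicit(&current_config, cfg, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(p, NULL): a reader that sees NULL has no
 * pointed-to data to read, so there is nothing to order. */
static void clear_pointer(void)
{
    atomic_store_explicit(&current_config, NULL, memory_order_relaxed);
}

/* Reader side: an acquire load pairs with the release store above.
 * (The kernel's rcu_dereference() relies on dependency ordering instead.) */
static struct config *read_pointer(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}
```

Using the relaxed store for a non-NULL value is the mistake the later fix (13 Jan, 2012) describes: readers could then observe the pointer before the data it points to.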

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

26 Jul, 2011

1 commit


10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.
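
A minimal userspace sketch of the idea, assuming a size-calculation callback is consulted before the buffer is allocated. The struct, the callback type and all sizes below are hypothetical illustrations, not the rtnl_register interface itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-interface description; not a kernel structure. */
struct iface {
    size_t base_info_len; /* bytes of fixed interface info */
    size_t num_vfs;       /* number of SR-IOV virtual functions */
    size_t per_vf_len;    /* bytes of info per VF */
};

/* Callback type: report how many bytes a dump of this interface needs. */
typedef size_t (*calc_size_fn)(const struct iface *ifc);

static size_t iface_dump_size(const struct iface *ifc)
{
    return ifc->base_info_len + ifc->num_vfs * ifc->per_vf_len;
}

/* Allocate a buffer sized by the callback, never smaller than min_len.
 * Without a callback the old fixed-size behaviour is kept. */
static void *alloc_dump_buffer(const struct iface *ifc, calc_size_fn calc,
                               size_t min_len, size_t *out_len)
{
    size_t need;

    assert(ifc != NULL && out_len != NULL);
    need = calc ? calc(ifc) : 0;
    if (need < min_len)
        need = min_len;
    *out_len = need;
    return malloc(need);
}
```

With a fixed minimum of one page, an interface with few VFs still gets a page, while one with many VFs gets a buffer large enough for all of them.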

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

11 May, 2011

1 commit

  • Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump
    callbacks) switched rtnl protection to RCU, but we forgot to adjust two
    rcu_dereference() lockdep annotations:

    inet_get_link_af_size() or inet_fill_link_af() might be called with
    rcu_read_lock or rtnl held, so use rcu_dereference_rtnl()
    instead of rtnl_dereference()

    Reported-by: Valdis Kletnieks
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 May, 2011

1 commit


24 Mar, 2011

1 commit

  • In commit 9435eb1cf0b76b323019cebf8d16762a50a12a19
    ("ipv4: Implement __ip_dev_find using new interface address hash.")
    we reimplemented __ip_dev_find() so that it doesn't have to
    do a full FIB table lookup.

    Instead, it consults a hash table of addresses configured to
    interfaces.

    This works identically to the old code in all except one case,
    and that is for loopback subnets.

    The old code would match the loopback device for any IP address
    that falls within a subnet configured to the loopback device.

    Handle this corner case by doing the FIB lookup.

    We could implement this via inet_addr_onlink() but:

    1) Someone could configure many addresses to loopback and
    inet_addr_onlink() is a simple list traversal.

    2) We know the old code works.

    Reported-by: Julian Anastasov
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    David S. Miller
     

22 Mar, 2011

2 commits

  • Optimize the calling of fib_add_ifaddr for all
    secondary addresses after the promoted one to start from
    their place, not from the new place of the promoted
    secondary. It will save some CPU cycles because we
    are sure the promoted secondary was first for the subnet
    and all next secondaries do not change their place.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • The secondary address promotion relies on fib_sync_down_addr
    to remove all routes created for the secondary addresses when
    the old primary address is deleted. It does not happen for cases
    when the primary address is also in another subnet. Fix that
    by deleting local and broadcast routes for all secondaries while
    they are on device list and by faking that all addresses from
    this subnet are to be deleted. It relies on fib_del_ifaddr being
    able to ignore the IPs from the concerned subnet while checking
    for duplication.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

11 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • In usual cases ifa_address == ifa_local, but in the case where
    SIOCSIFDSTADDR sets the destination address on a point-to-point
    link, ifa_address gets set to that destination address.

    Therefore we should use ifa_local when we want the local interface
    address.

    There were two cases where the selection was done incorrectly:

    1) When devinet_ioctl() does matching, it checks ifa_address even
    though gifconf correctly reported ifa_local to the user.

    2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
    ifa_address instead of ifa_local.
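
The difference between the two fields can be illustrated with a simplified stand-in for the kernel's address structure. Only the two field names come from the text above; the struct name and matches_local() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's per-address structure. */
struct ifaddr_sketch {
    uint32_t ifa_local;   /* address owned by this interface */
    uint32_t ifa_address; /* peer address on point-to-point links,
                           * otherwise equal to ifa_local */
};

/* Match a user-supplied address against the local interface address. */
static bool matches_local(const struct ifaddr_sketch *ifa, uint32_t addr)
{
    assert(ifa != NULL);
    return ifa->ifa_local == addr;
}
```

On a point-to-point link with local 10.0.0.1 and peer 10.0.0.2, a lookup for 10.0.0.1 succeeds only when the comparison uses ifa_local.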

    Reported-by: Julian Anastasov
    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2011

1 commit


20 Feb, 2011

1 commit


19 Feb, 2011

2 commits


15 Feb, 2011

1 commit

  • NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link
    notification while NETDEV_UP/NETDEV_CHANGEADDR generate link
    notifications as a sort of side effect.

    In the latter cases the sysctl option is present because link
    notification events can have undesired effects e.g. if the link is
    flapping. I don't think this applies in the case of an explicit
    request from a driver.

    This patch makes NETDEV_NOTIFY_PEER unconditional. If preferred, we
    could add a new sysctl for this case which defaults to on.

    This change causes Xen post-migration ARP notifications (which cause
    switches to relearn their MAC tables etc) to be sent by default.

    Signed-off-by: Ian Campbell
    Signed-off-by: David S. Miller

    Ian Campbell
     

13 Dec, 2010

1 commit

  • Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

    This allowed several simplifications:

    1) The interim dst_metric_hoplimit() can go as it's no longer
    used.

    2) The sysctl_ip_default_ttl entry no longer needs to use
    ipv4_doint_and_flush, since the sysctl is not cached in
    routing cache metrics any longer.

    3) ipv4_doint_and_flush no longer needs to be exported and
    therefore can be marked static.

    When ipv4_doint_and_flush_strategy was removed some time ago,
    the external declaration in ip.h was mistakenly left around
    so kill that off too.

    We have to move the sysctl_ip_default_ttl declaration into
    ipv4's route cache definition header net/route.h, because
    currently net/ip.h (where the declaration lives now) has
    a back dependency on net/route.h

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Dec, 2010

1 commit

  • commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfeb8c
    (rtnl: make link af-specific updates atomic) used incorrect
    __in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU
    warnings.

    Switch to __in_dev_get_rtnl(), which is more appropriate, since we hold
    RTNL.

    Based on a report and initial patch from Amerigo Wang.

    Reported-by: Amerigo Wang
    Signed-off-by: Eric Dumazet
    Cc: Thomas Graf
    Reviewed-by: WANG Cong
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Nov, 2010

1 commit

  • As David pointed out correctly, updates to af-specific attributes
    are currently not atomic. If multiple changes are requested and
    one of them fails, previous updates may have been applied already
    leaving the link behind in an undefined state.

    This patch splits the function parse_link_af() into two functions,
    validate_link_af() and set_link_af(). validate_link_af() is placed
    in validate_linkmsg() to check for errors as early as possible, before
    any changes to the link have been made. set_link_af() is called to
    commit the changes later.

    This method is not failproof. While it is currently sufficient
    to make set_link_af() inerrable and thus 100% atomic, the
    validation function method will not be able to detect all error
    scenarios in the future: there will likely always be errors
    depending on states which are, for example, not protected by
    rtnl_mutex and thus may change between validation and setting.

    Also, instead of silently ignoring unknown address families and
    config blocks for address families which did not register a set
    function, the errors EAFNOSUPPORT and EOPNOTSUPP, respectively, are
    returned to avoid committing 4 out of 5 update requests without
    notifying the user.
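
The validate-then-commit split can be sketched as follows. This is an illustration under stated assumptions, not the rtnetlink code: the ops table, the slot layout and apply_changes() are hypothetical, and only the two-phase structure and the two error codes follow the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_AF 4 /* hypothetical number of address-family slots */

/* Hypothetical per-family operations. */
struct af_ops {
    int (*validate)(int value);         /* may fail, changes nothing */
    void (*set)(int *state, int value); /* must not fail */
};

struct change {
    int af;
    int value;
};

static int validate_nonneg(int value) { return value >= 0 ? 0 : -EINVAL; }
static void set_value(int *state, int value) { *state = value; }

static const struct af_ops full_ops = { validate_nonneg, set_value };
static const struct af_ops no_set_ops = { validate_nonneg, NULL };

/* Slot 1 is fully supported, slot 2 registered no set function,
 * slots 0 and 3 are unknown families. */
static const struct af_ops *af_table[MAX_AF] = {
    NULL, &full_ops, &no_set_ops, NULL
};

static int apply_changes(int state[MAX_AF], const struct change *c, size_t n)
{
    size_t i;

    assert(state != NULL && (c != NULL || n == 0));

    /* Phase 1: validate every request before touching any state. */
    for (i = 0; i < n; i++) {
        const struct af_ops *ops;
        int err;

        if (c[i].af < 0 || c[i].af >= MAX_AF || !af_table[c[i].af])
            return -EAFNOSUPPORT;
        ops = af_table[c[i].af];
        if (!ops->validate || !ops->set)
            return -EOPNOTSUPP;
        err = ops->validate(c[i].value);
        if (err)
            return err;
    }

    /* Phase 2: commit; set() cannot fail, so all requests are applied. */
    for (i = 0; i < n; i++)
        af_table[c[i].af]->set(&state[c[i].af], c[i].value);
    return 0;
}
```

If any request in a batch is rejected in the first phase, none of the earlier requests in that batch has been applied.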

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

18 Nov, 2010

1 commit

  • Implements the AF_INET link address family exposing the per
    device configuration settings via netlink using the attribute
    IFLA_INET_CONF.

    The format of IFLA_INET_CONF differs depending on the direction
    the attribute is sent. The attribute sent by the kernel consists
    of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
    The attribute expected by the kernel must consist of a sequence
    of nested u32 attributes, each representing a change request,
    e.g.
    [IFLA_INET_CONF] = {
    [IPV4_DEVCONF_FORWARDING] = 1,
    [IPV4_DEVCONF_NOXFRM] = 0,
    }

    libnl userspace API documentation and example available from:
    http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

19 Oct, 2010

1 commit


16 Sep, 2010

1 commit

  • dev->ip_ptr is protected by rtnl and rcu.

    Yet some places don't use appropriate primitives and/or locking rules.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 May, 2010

1 commit

  • Currently such notifications are only generated when the device comes up or the
    address changes. However one use case for these notifications is to enable
    faster network recovery after a virtual machine migration (by causing switches
    to relearn their MAC tables). A migration appears to the network stack as a
    temporary loss of carrier and therefore does not trigger either of the current
    conditions. Rather than adding carrier up as a trigger (which can cause issues
    when interfaces are flapping) simply add an interface which the driver can use
    to explicitly trigger the notification.

    Signed-off-by: Ian Campbell
    Cc: Stephen Hemminger
    Cc: Jeremy Fitzhardinge
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: David S. Miller

    Ian Campbell
     

12 Apr, 2010

1 commit


07 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

1 commit

  • When a dump is interrupted at the last device in a hash chain and
    then continued, "idx" won't get incremented past s_idx, so s_ip_idx
    is not reset when moving on to the next device. This means of all
    following devices only the last n - s_ip_idx addresses are dumped.
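
The failure mode can be reproduced with a small userspace model of a resumable dump. Everything here is a hypothetical simplification: the table of address counts, the per-call budget and the `fixed` flag are assumptions for illustration, and the exact shape of the real patch is assumed rather than quoted. Only the names idx, s_idx and s_ip_idx follow the description above.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 2
#define MAXDEV   2

/* Hypothetical layout: addr_count[h][idx] is the number of addresses on
 * device idx of hash chain h; dev_count[h] is the chain length. */
static const int addr_count[NBUCKETS][MAXDEV] = { { 1, 3 }, { 3, 0 } };
static const int dev_count[NBUCKETS] = { 2, 1 };

struct dump_state {
    int s_h, s_idx, s_ip_idx; /* where the previous call stopped */
};

/* Emit up to `budget` addresses and record where to resume.
 * With fixed == 0, s_ip_idx is not reset when moving to a new chain. */
static int dump_some(struct dump_state *st, int budget, int fixed)
{
    int s_h = st->s_h, s_idx = st->s_idx, s_ip_idx = st->s_ip_idx;
    int h, idx = 0, ip_idx = 0, emitted = 0;

    for (h = s_h; h < NBUCKETS; h++, s_idx = 0) {
        for (idx = 0; idx < dev_count[h]; idx++) {
            if (idx < s_idx)
                continue;
            if ((fixed && h > s_h) || idx > s_idx)
                s_ip_idx = 0; /* new device: start at its first address */
            for (ip_idx = 0; ip_idx < addr_count[h][idx]; ip_idx++) {
                if (ip_idx < s_ip_idx)
                    continue;
                if (emitted == budget)
                    goto done; /* buffer full, resume here next time */
                emitted++;
            }
        }
    }
done:
    st->s_h = h;
    st->s_idx = idx;
    st->s_ip_idx = ip_idx;
    return emitted;
}

/* Repeat partial dumps until nothing is left; return the total emitted. */
static int dump_all(int budget, int fixed)
{
    struct dump_state st = { 0, 0, 0 };
    int n, total = 0;

    assert(budget > 0);
    while ((n = dump_some(&st, budget, fixed)) > 0)
        total += n;
    return total;
}
```

In this model there are seven addresses in total. With a budget of three per call, the first call stops inside the last device of chain 0; without the reset, the continuation skips the first two addresses of the first device in chain 1.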

    Tested-by: Pawel Staszewski
    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

19 Mar, 2010

1 commit


26 Feb, 2010

1 commit


20 Feb, 2010

1 commit

  • Yuck. It turns out that when we restart sysctls we were restarting
    with the values already changed. Which unfortunately meant that
    the second time through we thought there was no change and skipped
    all kinds of work, despite the fact that there was indeed a change.

    I have fixed this the simplest way possible by restoring the changed
    values when we restart the sysctl write.
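
A minimal sketch of the restore-on-restart pattern, assuming a handler that stores the new value first and then needs a lock to perform the follow-up work. The RESTART code, struct conf and write_forwarding() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RESTART (-1) /* hypothetical stand-in for a "restart the write" code */

struct conf {
    int forwarding; /* current setting */
    int flushes;    /* counts the follow-up work done on a real change */
};

static int write_forwarding(struct conf *c, int new_val, bool lock_available)
{
    int old;

    assert(c != NULL);
    old = c->forwarding;
    c->forwarding = new_val; /* the generic handler stores the value first */

    if (c->forwarding != old) {
        if (!lock_available) {
            /* Restore the old value so that the retried write still
             * sees a change and does not skip the follow-up work. */
            c->forwarding = old;
            return RESTART;
        }
        c->flushes++; /* follow-up work, done only on a real change */
    }
    return 0;
}
```

Without the restore, the retried write would compare the new value against itself, conclude nothing changed, and skip the follow-up work.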

    One of my coworkers spotted this bug when after disabling forwarding
    on an interface pings were still forwarded.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

17 Feb, 2010

2 commits


11 Jan, 2010

1 commit


07 Jan, 2010

1 commit

  • This is to be used together with switch technologies, like RFC3069,
    where the individual ports are not allowed to communicate with
    each other, but they are allowed to talk to the upstream router. As
    described in RFC 3069, it is possible to allow these hosts to
    communicate through the upstream router by proxy_arp'ing.

    This patch basically allows proxy arp replies back to the same
    interface (from which the ARP request/solicitation was received).

    Tunable per device via proc "proxy_arp_pvlan":
    /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan

    This switch technology is known by different vendor names:
    - In RFC 3069 it is called VLAN Aggregation.
    - Cisco and Allied Telesyn call it Private VLAN.
    - Hewlett-Packard call it Source-Port filtering or port-isolation.
    - Ericsson call it MAC-Forced Forwarding (RFC Draft).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

26 Dec, 2009

1 commit

  • When using policy routing and the skb mark:
    there are cases where a back path validation requires us
    to use a different routing table for src ip validation than
    the one used for mapping ingress dst ip.
    One such case is transparent proxying, where we pretend to be
    the destination system and therefore the local table
    is used for incoming packets but possibly a main table would
    be used on outbound.
    Make the default behavior allow the above; users who need
    the symmetry can turn it on via the sysctl src_valid_mark.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

04 Dec, 2009

1 commit

  • commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8
    Author: Patrick McHardy
    Date: Thu Dec 3 12:16:35 2009 +0100

    ipv4: add sysctl to accept packets with local source addresses

    Change fib_validate_source() to accept packets with a local source address when
    the "accept_local" sysctl is set for the incoming inet device. Combined with the
    previous patches, this allows communication between multiple local interfaces
    over the wire.

    Signed-off-by: Patrick McHardy

    Signed-off-by: David S. Miller

    Patrick McHardy
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

14 Nov, 2009

1 commit

  • Stephen Hemminger wrote:
    > On Thu, 12 Nov 2009 15:11:36 +0100
    > Eric Dumazet wrote:
    >
    >> When handling large number of netdevices, inet_dump_ifaddr()
    >> is very slow because it has O(N^2) complexity.
    >>
    >> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
    >> sub lists of the dev_index hash table, and RCU lookups.
    >>
    >> Signed-off-by: Eric Dumazet
    >
    > You might be able to make RCU critical section smaller by moving
    > it into loop.
    >

    Indeed. But we dump at most one skb (
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet