30 Jan, 2009

2 commits

• Unfortunately, simplicity isn't always best. The fraginfo
    interface turned out to be suboptimal. The problem was quite
    obvious: for every packet, we have to copy the headers from
    the frags structure into skb->head, even though for 99% of
    packets this part is immediately thrown away after the merge.

    LRO didn't have this problem because it directly read the headers
    from the frags structure.

    This patch attempts to address this by creating an interface
    that allows GRO to access the headers in the first frag without
    having to copy them. Because all drivers that use frags place the
    headers in the first frag, this optimisation should be enough.
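    The idea can be modelled in plain userspace C (the struct and
    helper names below are illustrative, not the kernel's skb API):

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Userspace model of the optimisation: instead of copying the packet
     * headers out of the first page fragment into a linear buffer, hand
     * the GRO path a pointer directly into that fragment. */
    struct frag {
        const unsigned char *page;  /* backing data of this fragment */
        size_t offset;              /* where the data starts in the page */
        size_t len;                 /* bytes in this fragment */
    };

    struct pkt {
        struct frag frags[4];
        int nr_frags;
    };

    /* Old approach: duplicate the headers into a linear area, even though
     * they are usually discarded right after the merge. */
    static void header_copy(const struct pkt *p, unsigned char *linear,
                            size_t hdrlen)
    {
        memcpy(linear, p->frags[0].page + p->frags[0].offset, hdrlen);
    }

    /* New approach: return a pointer into the first frag.  Valid because
     * all drivers that use frags place the headers in the first frag. */
    static const unsigned char *frag_headers(const struct pkt *p,
                                             size_t hdrlen)
    {
        const struct frag *f = &p->frags[0];
        return f->len < hdrlen ? NULL : f->page + f->offset;
    }
    ```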

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
• Currently VLAN still has a bit of common code handling the aftermath
    of GRO that is shared with the common path. This patch moves that
    code into shared helpers to reduce duplication.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

28 Jan, 2009

1 commit


27 Jan, 2009

19 commits


24 Jan, 2009

2 commits


23 Jan, 2009

10 commits

  • Typo fix.

    Signed-off-by: Peter Ujfalusi
    Signed-off-by: Mark Brown

    Peter Ujfalusi
     
• The crc32c algorithm provides a byte-swapped result. On little-endian
    arches, the result ends up in big-endian/network byte order.
    On big-endian arches, the result ends up in little-endian
    order and needs to be byte-swapped again. Thus calling cpu_to_le32
    gives the right output.
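    The reasoning can be checked in userspace; my_cpu_to_le32 below is a
    stand-in for the kernel's cpu_to_le32, which is the identity on
    little-endian CPUs and a byte swap on big-endian ones, so the bytes
    stored in memory come out the same on either architecture:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Plain rotate-free 32-bit byte swap. */
    static uint32_t bswap32(uint32_t x)
    {
        return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
               ((x << 8) & 0x00ff0000u) | (x << 24);
    }

    static int cpu_is_little_endian(void)
    {
        const uint32_t probe = 1;
        unsigned char b;
        memcpy(&b, &probe, 1);
        return b == 1;
    }

    /* Stand-in for the kernel's cpu_to_le32: identity on little-endian
     * machines, a byte swap on big-endian ones. */
    static uint32_t my_cpu_to_le32(uint32_t x)
    {
        return cpu_is_little_endian() ? x : bswap32(x);
    }
    ```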

    Tested-by: Jukka Taimisto
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
• This last patch makes the appropriate changes to use and propagate the
    network namespace where needed in the IPv4 multicast routing code.

    This consists mainly of replacing all the remaining init_net
    occurrences with the current netns pointer, retrieved from sockets,
    net devices or mfc_caches depending on the routine's context.

    Some routines receive a new 'struct net' parameter to propagate the current
    netns:
    * vif_add/vif_delete
    * ipmr_new_tunnel
    * mroute_clean_tables
    * ipmr_cache_find
    * ipmr_cache_report
    * ipmr_cache_unresolved
    * ipmr_mfc_add/ipmr_mfc_delete
    * ipmr_get_route
    * rt_fill_info (in route.c)
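    The shape of the change can be sketched with a simplified userspace
    analogue (struct net's fields and vif_add here are illustrative
    stand-ins, not the real kernel prototypes): routines stop acting on a
    global and instead operate on the namespace they are handed.

    ```c
    /* Simplified analogue: routines that used to act on the global
     * init_net now receive the namespace to operate on as a parameter. */
    struct net {
        int id;
        int vif_count;      /* stand-in for per-netns multicast state */
    };

    static struct net init_net = { 0, 0 };

    /* before: int vif_add(void) { return ++init_net.vif_count; }
     * after: */
    static int vif_add(struct net *net)
    {
        return ++net->vif_count;
    }
    ```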

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv4 multicast routing netns-aware.

    Declare the variable 'reg_vif_num' per-namespace and move it into
    struct netns_ipv4.

    At the moment, this variable is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv4 multicast routing netns-aware.

    Declare the IPv4 multicast routing variables 'mroute_do_assert' and
    'mroute_do_pim' per-namespace in struct netns_ipv4.

    At the moment, these variables are only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv4 multicast routing netns-aware.

    Declare variable cache_resolve_queue_len per-namespace: move it into
    struct netns_ipv4.

    This variable counts the number of unresolved cache entries queued in the
    list mfc_unres_queue. This list is kept global to all netns as the number
    of entries per namespace is limited to 10 (hardcoded in routine
    ipmr_cache_unresolved).
    Entries belonging to different namespaces in mfc_unres_queue will be
    identified by matching the mfc_net member introduced previously in
    struct mfc_cache.

    Keeping this list global to all netns also allows us to keep a single
    timer (ipmr_expire_timer) to handle expiration.
    In some places the cache_resolve_queue_len value was tested to decide
    whether to arm or delete the timer. These tests were equivalent to
    testing the mfc_unres_queue value instead, and are replaced in this
    patch.

    At the moment, cache_resolve_queue_len is only referenced in init_net.
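    A minimal userspace model of this design (all struct and function
    names below are illustrative, not the kernel's):

    ```c
    #include <stdlib.h>

    /* One unresolved-entry list shared by every namespace; each entry
     * records its namespace, and a namespace may queue at most 10
     * unresolved entries (the hardcoded limit mentioned above). */
    struct net {
        int cache_resolve_queue_len;   /* per-namespace counter */
    };

    struct unres_entry {
        struct net *mfc_net;           /* owning namespace */
        struct unres_entry *next;
    };

    static struct unres_entry *mfc_unres_queue;   /* global, all netns */

    static int queue_unresolved(struct net *net)
    {
        struct unres_entry *e;

        if (net->cache_resolve_queue_len >= 10)
            return -1;                 /* this namespace is at its limit */
        e = calloc(1, sizeof(*e));
        if (!e)
            return -1;
        e->mfc_net = net;
        e->next = mfc_unres_queue;     /* single shared list */
        mfc_unres_queue = e;
        net->cache_resolve_queue_len++;
        return 0;
    }
    ```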

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv4 multicast routing netns-aware.

    Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array,
    and move it to struct netns_ipv4.

    At the moment, mfc_cache_array is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • This patch stores into struct mfc_cache the network namespace each
    mfc_cache belongs to. The new member is mfc_net.

    mfc_net is assigned at cache allocation and doesn't change for the
    rest of the cache entry's life.
    A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres.

    This will help to retrieve the current netns around the IPv4 multicast
    routing code.

    At the moment, all mfc_cache are allocated in init_net.
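    An illustrative userspace model of the new member (the struct layout
    is simplified, not the real mfc_cache):

    ```c
    #include <stdlib.h>

    /* The cache entry remembers the namespace it was allocated in, so
     * later code can recover the netns from the entry itself instead of
     * assuming init_net. */
    struct net { int id; };

    struct mfc_cache {
        struct net *mfc_net;    /* set at allocation, constant thereafter */
        /* ... flow key, vif bitmap, etc. ... */
    };

    static struct mfc_cache *ipmr_cache_alloc(struct net *net)
    {
        struct mfc_cache *c = calloc(1, sizeof(*c));

        if (c)
            c->mfc_net = net;   /* the new net parameter lands here */
        return c;
    }
    ```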

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
• Preliminary work to make IPv4 multicast routing netns-aware.

    Dynamically allocate the interface table vif_table, move it to
    struct netns_ipv4, and update the VIF_EXISTS() macro.

    At the moment, vif_table is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
• Preliminary work to make IPv4 multicast routing netns-aware.

    Make the IPv4 multicast routing mroute_socket per-namespace by
    moving it into struct netns_ipv4.

    At the moment, mroute_socket is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     

22 Jan, 2009

6 commits

  • Thomas Gleixner
     
• Impact: fix debugobjects warning

    debugobjects-enabled kernels spit out a warning in the hpet code due
    to a work item which is initialized on the stack.

    Add INIT_WORK_ON_STACK(), which calls init_timer_on_stack(), and use
    it in hpet.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
• Create a separate mode_config IDR lock for simplicity. The core DRM
    config structures (connector, mode, etc. lists) are still protected by
    the mode_config mutex, but the CRTC IDR (used for the various identifier
    IDs) is now protected by the mode_config idr_mutex. This simplifies the
    locking a bit and removes a warning.

    All objects are still protected by the config mutex; in the future we
    may split the objects further and add reference counts.

    Signed-off-by: Jesse Barnes
    Signed-off-by: Dave Airlie

    Jesse Barnes
     
  • - Each namespace contains ppp channels and units separately
    with appropriate locks

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: David S. Miller

    Cyrill Gorcunov
     
• Allow the host to inform us that the link is down by adding
    a VIRTIO_NET_F_STATUS feature bit which indicates that device status
    is available in the virtio_net config.

    This is currently useful for simulating link down conditions
    (e.g. using proposed qemu 'set_link' monitor command) but
    would also be needed if we were to support device assignment
    via virtio.
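    A sketch of how a guest driver might use the new bit; the constants
    match the virtio spec (feature bit 16, link-up status bit 0), but the
    struct and helper below are illustrative, not the kernel driver's code:

    ```c
    #include <stdint.h>

    #define VIRTIO_NET_F_STATUS   16   /* config.status field is valid */
    #define VIRTIO_NET_S_LINK_UP   1

    struct virtio_net_config {
        uint8_t  mac[6];
        uint16_t status;
    };

    /* If the host never offered VIRTIO_NET_F_STATUS, assume the link is
     * up, as drivers did before this feature existed. */
    static int link_is_up(uint32_t features,
                          const struct virtio_net_config *cfg)
    {
        if (!(features & (1u << VIRTIO_NET_F_STATUS)))
            return 1;
        return (cfg->status & VIRTIO_NET_S_LINK_UP) != 0;
    }
    ```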

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell (added future masking)
    Signed-off-by: David S. Miller

    Mark McLoughlin
     
• With a simple extension to the binding mechanism, which allows binding
    more than 64k sockets (or fewer, depending on sysctl parameters),
    we have to traverse the whole bind hash table to find an empty bucket.
    While that is not a problem for, say, 32k connections, bind()
    completion time grows exponentially (since after each successful
    binding we have to traverse one more bucket to find an empty one),
    even if we start each time from a random offset inside the hash table.

    So, when the hash table is full and we want to add another socket, we
    have to traverse the whole table no matter what, so effectively this
    will be the worst-case performance, and it will be constant.

    The attached picture shows bind() time depending on the number of
    already-bound sockets.

    The green area corresponds to the usual binding-to-port-zero process,
    which turns on the kernel port selection described above. The red area
    is the bind process when the number of reuse-bound sockets is not
    limited by 64k (or the sysctl parameters). The same exponential growth
    (hidden by the green area) occurs before the number of ports reaches
    the sysctl limit.

    At that point the bind hash table has exactly one reuse-enabled socket
    per bucket, but it is possible that they have different addresses.
    Actually, the kernel selects the first port to try randomly, so at the
    beginning bind will take roughly constant time, but over time the
    number of ports to check after the random start increases. That gives
    exponential growth, but because of the random selection above, not
    every port selection will necessarily take longer than the previous
    one. So we have to consider the area below in the graph (if you could
    zoom in, you would find many different times plotted there), so one
    area can hide another.

    Blue area corresponds to the port selection optimization.

    This is a rather simple design approach: the hash table now maintains
    an (imprecise and racily updated) count of currently bound sockets,
    and when the number of such sockets exceeds a predefined value (I use
    the maximum port range defined by the sysctls), we stop traversing the
    whole bind hash table and just stop at the first matching bucket after
    a random start. The above limit roughly corresponds to the case when
    the bind hash table is full and we have turned on the mechanism that
    allows binding more reuse-enabled sockets, so it does not change the
    behaviour of other sockets.
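    A toy model of the heuristic, with made-up table sizes and names:
    below the threshold we scan for a truly empty bucket; above it we
    settle for the first usable bucket after the random starting port.

    ```c
    #define PORTS 16    /* toy table size; the real table is much larger */

    struct bind_table {
        int bucket_len[PORTS];   /* sockets bound to each port bucket */
        int bound_count;         /* imprecise total, as in the patch */
        int threshold;           /* e.g. the sysctl port-range size */
    };

    static int pick_port(struct bind_table *t, unsigned start)
    {
        unsigned i, p;

        for (i = 0; i < PORTS; i++) {
            p = (start + i) % PORTS;
            if (t->bucket_len[p] == 0)
                return (int)p;       /* empty bucket found */
            if (t->bound_count > t->threshold)
                return (int)p;       /* table is "full": stop early at
                                      * the first matching bucket */
        }
        return -1;                   /* full scan found nothing */
    }
    ```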

    Signed-off-by: Evgeniy Polyakov
    Tested-by: Denys Fedoryschenko
    Signed-off-by: David S. Miller

    Evgeniy Polyakov