11 Oct, 2007

40 commits

  • with the macro max provided by , so changed its name
    to a more proper one: limit

    Signed-off-by: Denis Cheng
    Signed-off-by: David S. Miller

    Denis Cheng
     
  • Signed-off-by: Denis Cheng
    Signed-off-by: David S. Miller

    Denis Cheng
     
  • Thanks for noticing the bug where csum_start is not updated
    when the head room changes.

    This patch fixes that. It also moves the csum/ip_summed
    copying into copy_skb_header so that skb_copy_expand gets
    it too. I've checked its callers and no one should be upset
    by this.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
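
The headroom fix above can be illustrated with a small userspace sketch. It assumes, as a simplification, that csum_start is stored as an offset from the start of the buffer, so it has to move by the same amount as the headroom changes. struct toy_skb and toy_copy_expand() are hypothetical names used only for illustration; they are not kernel APIs.

```c
#include <stdlib.h>
#include <string.h>

/* Toy buffer: 'data' and 'csum_start' are both offsets from 'head'. */
struct toy_skb {
    unsigned char *head;
    size_t size;       /* total allocated bytes */
    size_t data;       /* headroom, i.e. offset of the payload start */
    size_t len;        /* payload length */
    size_t csum_start; /* offset from head where checksumming begins */
};

/*
 * Copy 'old' into a freshly allocated buffer with 'new_headroom' bytes of
 * headroom. Returns 0 on success and -1 if the allocation fails.
 */
int toy_copy_expand(const struct toy_skb *old, struct toy_skb *copy,
                    size_t new_headroom)
{
    copy->size = new_headroom + old->len;
    copy->head = malloc(copy->size);
    if (!copy->head)
        return -1;

    copy->data = new_headroom;
    copy->len = old->len;
    memcpy(copy->head + copy->data, old->head + old->data, old->len);

    /* Shift csum_start by the headroom delta so that it still points at
     * the same payload byte as before. */
    copy->csum_start = old->csum_start - old->data + new_headroom;
    return 0;
}
```

Forgetting the last assignment would leave csum_start pointing into the new headroom rather than at the payload.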
     
  • I was looking at Patrick's fix to inet_diag and it occurred
    to me that we're using a pointer argument to return values
    unnecessarily in netlink_run_queue. Changing it to return
    the value will allow the compiler to generate better code
    since the value won't have to be memory-backed.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
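
The difference between the two calling conventions can be sketched as follows. Both functions are hypothetical stand-ins written for this note, not the real netlink_run_queue(); they only show an in/out pointer argument being replaced by a by-value argument plus a return value.

```c
/* Pointer style: the remaining budget is updated through 'qlen', so the
 * caller's variable must have an address. */
void toy_drain_ptr(unsigned int pending, unsigned int *qlen)
{
    while (pending > 0 && *qlen > 0) {
        pending--;
        (*qlen)--;
    }
}

/* Value style: the budget is passed by value and the remainder is
 * returned, so the compiler is free to keep it in a register. */
unsigned int toy_drain_ret(unsigned int pending, unsigned int qlen)
{
    while (pending > 0 && qlen > 0) {
        pending--;
        qlen--;
    }
    return qlen;
}
```

A caller of the second form writes `qlen = toy_drain_ret(pending, qlen);` instead of passing `&qlen`.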
     
  • The sctp_[rw]mem definitions should really be in protocol.c
    since that is where they are initialized. This also allows
    one to build a kernel without sysctl support.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • The SCTP Supported Extensions parameter is specified in Section 4.2.7
    of the ADD-IP draft (soon to be RFC). The parameter is
    encoded as:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    Parameter Type = 0x8008    |       Parameter Length        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | CHUNK TYPE 1  | CHUNK TYPE 2  | CHUNK TYPE 3  | CHUNK TYPE 4  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    It contains a list of chunks that a particular SCTP extension
    uses. Current extensions supported are Partial Reliability
    (FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).

    When implementing new extensions (AUTH, PKT-DROP, etc.), new
    chunks need to be added to this parameter. Parameter processing
    would be modified to negotiate support for these new features.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
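
The layout in the diagram above can be sketched as a small encoder. This is an illustration only, not the code from the patch: toy_build_supported_ext() is a hypothetical helper, and it assumes the usual SCTP parameter convention that the Parameter Length covers the 4-byte header plus the chunk types but not the trailing padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PARAM_SUPPORTED_EXT 0x8008u

/*
 * Write a Supported Extensions parameter into 'buf'.
 * Returns the number of bytes written (including padding to a 4-byte
 * boundary), or 0 if the result would not fit.
 */
size_t toy_build_supported_ext(uint8_t *buf, size_t buflen,
                               const uint8_t *chunks, size_t nchunks)
{
    size_t len = 4 + nchunks;                /* header + one byte per chunk */
    size_t padded = (len + 3) & ~(size_t)3;  /* round up to multiple of 4 */

    if (len > 0xffff || padded > buflen)
        return 0;

    /* Type and length are 16-bit fields in network byte order. */
    buf[0] = (uint8_t)(TOY_PARAM_SUPPORTED_EXT >> 8);
    buf[1] = (uint8_t)(TOY_PARAM_SUPPORTED_EXT & 0xff);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)(len & 0xff);

    memcpy(buf + 4, chunks, nchunks);
    memset(buf + len, 0, padded - len);      /* PAD bytes are zero */
    return padded;
}
```

For the three chunk types named above (FWD-TSN 0xC0, ASCONF 0xC1, ASCONF-ACK 0x80) this produces a length field of 7 and a single PAD byte, for 8 bytes on the wire.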
     
  • This patch slightly cleans up the FIB rules framework. rules_list as a pointer
    on struct fib_rules_ops is useless. It is always assigned a static
    per-subsystem list in IPv4, IPv6 and DECnet.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • The call_netdev_notifiers routine can successfully be used in
    net/core/dev.c itself.

    This will save 6 lines of code and 62 ;) bytes of .text section.

    62 is rather small, but I have one more patch saving ~30 bytes
    from netns code (sent to Eric), so altogether they can save
    some more noticeable amount.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The dev_name_hash and the dev_index_hash are now both kmalloc-ed
    (and each element is properly initialized as usual) so I think
    it's worth consolidating this code making it look nicer (and
    saving 28 bytes of .text section ;) )

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This replaces the void * parameter with a struct net_device * which
    is what is actually required.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The HARD_TX_LOCK macro is a nice aggregation that could be used
    in other spots. Move it to netdevice.h.
    Also make sure the previously superfluous cpu argument is used.
    Thanks to DaveM for the suggestions.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Remove useless message. We get the right message from another
    subsystem.

    Signed-off-by: Milan Kocian
    Signed-off-by: David S. Miller

    Milan Kocian
     
  • The bulk of the CIPSO option parsing/processing in the cipso_v4_sock_getattr()
    and cipso_v4_skb_getattr() functions is identical, the only real difference
    being where the functions obtain the CIPSO option itself. This patch creates
    a new function, cipso_v4_getattr(), which contains the common CIPSO option
    parsing/processing code and modifies the existing functions to call this new
    helper function.

    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     
  • For the operations
    get-tx-csum
    get-sg
    get-tso
    get-ufo
    the default ethtool_op_xxx behavior is fine for all drivers, so we
    permit op==NULL to imply the default behavior.

    This provides a more uniform behavior across all drivers, eliminating
    ethtool(8) "ioctl not supported" errors on older drivers that had
    not been updated for the latest sub-ioctls.

    The ethtool_op_xxx() functions are left exported, in case anyone
    wishes to call them directly from a driver-private implementation --
    a not-uncommon case. Should an ethtool_op_xxx() helper remain unused
    for a while, except by net/core/ethtool.c, we can un-export it at a
    later date.

    [ Resolved conflicts with set/get value ethtool patch... -DaveM ]

    Signed-off-by: Jeff Garzik
    Signed-off-by: David S. Miller

    Jeff Garzik
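
The "op==NULL implies the default" rule can be sketched with a plain function-pointer table. All names here (struct toy_dev, toy_query_sg() and so on) are hypothetical and only model the dispatch pattern; they are not the ethtool core code.

```c
#include <stddef.h>

#define TOY_F_SG 0x1u

struct toy_dev;

/* Per-driver operations; a NULL entry means "use the default". */
struct toy_ethtool_ops {
    unsigned int (*get_sg)(const struct toy_dev *dev);
};

struct toy_dev {
    unsigned int features;
    const struct toy_ethtool_ops *ops;
};

/* Default behaviour: report the feature bit as it is. */
unsigned int toy_default_get_sg(const struct toy_dev *dev)
{
    return (dev->features & TOY_F_SG) != 0;
}

/* Example of a driver-private override. */
unsigned int toy_always_off(const struct toy_dev *dev)
{
    (void)dev;
    return 0;
}

/* Dispatcher: fall back to the default instead of failing when the
 * driver did not fill in the operation. */
unsigned int toy_query_sg(const struct toy_dev *dev)
{
    if (dev->ops && dev->ops->get_sg)
        return dev->ops->get_sg(dev);
    return toy_default_get_sg(dev);
}
```

With this shape an older driver that never set get_sg still answers the query, which is the uniformity the message describes.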
     
  • It's been a useless no-op for long enough in 2.6 so I figured it's time to
    remove it. The number of people that could object because they're
    maintaining unified 2.4 and 2.6 drivers is probably rather small.

    [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

    Signed-off-by: Ralf Baechle
    Signed-off-by: Jeff Garzik
    Signed-off-by: David S. Miller

    Ralf Baechle
     
  • There are a few TODO comments in the mac80211 sources regarding
    hardware offload for Michael MIC verification. Those items are,
    however, better handled in the driver instead of the stack, if
    any device requires such hand-holding.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • tx.mode must also be set for buffered frames. It is used in the tx handlers.

    Signed-off-by: Tomas Winkler
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Tomas Winkler
     
  • Stats are now available for device usage inside network_device

    Signed-off-by: Stephen Hemminger
    Acked-by: Johannes Berg
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Johannes Berg noticed that in __ieee80211_tx_prepare() we try to get the
    STA from addr1 of the ieee80211 header when the radiotap header is actually
    still at the front of the packet. This patch defers doing that until the
    radiotap header is gone.

    Signed-off-by: Andy Green
    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    warmcat
     
  • Work-around for broken APs that use a non-zero key index for WEP
    pairwise keys. With this patch, WEP encryption only is exempt from
    providing a zero key index.

    Signed-off-by: Volker Braun
    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Volker Braun
     
  • The TKIP mixing code was added for the benefit of Intel's ipw3945
    chipset but that code ended up not using it. We have previously
    identified many problems with this code and it crystallized that
    library functions for mixing are likely to handle this in much
    more generality and might allow b43 to take advantage of hardware
    acceleration for TKIP.

    Due to these reasons, remove the TKIP mixing for hardware
    accelerated crypto operations.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Buesch
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This patch makes the mac80211/driver interface rely only on the
    IEEE80211_TXCTL_DO_NOT_ENCRYPT flag to signal to the driver whether
    a frame should be encrypted or not. Since mac80211 internally no
    longer relies on HW_KEY_IDX_INVALID either, this removes it, changes
    the key index to be a u8 in all places and makes the full range of
    the value available to drivers.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • No existing drivers use this callback, hence there's no telling
    how it might be used. In fact, it is unlikely to be of much use
    as-is because the default key index isn't something that the
    driver can do much with without knowing which interface it was
    for etc. And if it needs the key index for the transmitted frame,
    it can get it by keeping a reference to the key_conf structure
    and looking it up by hw_key_idx.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This patch reworks the various hardware crypto related
    flags to make them more local, i.e. put them with each
    key or each packet instead of into the hw struct.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This patch removes all mention of the atheros turbo modes that
    can't possibly work properly anyway since in some places we don't
    check for them when we should.

    I have no idea what the iwlwifi drivers were doing with these but
    it can't possibly have been correct.

    Cc: Zhu Yi
    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • During receive processing, we select the key long before using it and
    because there's no locking it is possible that we kfree() the key
    after having selected it but before using it for crypto operations.
    Obviously, this is bad.

    Secondly, during transmit processing, there are two possible races: We
    have a similar race between select_key() and using it for encryption,
    but we also have a race here between select_key() and hardware
    encryption (both when a key is removed.)

    This patch solves these issues by using RCU: when a key is to be freed,
    we first remove the pointer from the appropriate places (sdata->keys,
    sdata->default_key, sta->key) using rcu_assign_pointer() and then
    synchronize_rcu(). Then, we can safely kfree() the key and remove it
    from the hardware. There's a window here where the hardware may still
    be using it for decryption, but we can't work around that without having
    two hardware callbacks, one to disable the key for RX and one to disable
    it for TX; but the worst thing that will happen is that we receive a
    packet decrypted that we don't find a key for any more and then drop it.

    When we add a key, we first need to upload it to the hardware and then,
    using rcu_assign_pointer() again, link it into our structures.

    In the code using keys (TX/RX paths) we use rcu_dereference() to get the
    key and enclose the whole tx/rx section in a rcu_read_lock() ...
    rcu_read_unlock() block. Because we've uploaded the key to hardware
    before linking it into internal structures, we can guarantee that it is
    valid once we get into tx().

    One possible race condition remains, however: when we have hardware
    acceleration enabled and the driver shuts down the queues, we end up
    queueing the frame. If now somebody removes the key, the key will be
    removed from hwaccel and then the driver will be asked to encrypt the
    frame with a key index that has been removed. Hence, drivers will need
    to be aware that the hw_key_index they are passed might not be valid
    under all circumstances. Most drivers will, however, simply ignore that
    condition and encrypt the frame with the selected key anyway; this
    only results in a frame being encrypted with a wrong key or dropped
    (rightfully) because the key was not valid. There isn't much we can
    do about it unless we want to walk the pending frame queue every time
    a key is removed and remove all frames that used it.

    This race condition, however, will most likely be solved once we add
    multiqueue support to mac80211 because then frames will be queued
    further up the stack instead of after being processed.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
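
The ordering described above (upload, then publish; unlink, then wait, then free) can be sketched in userspace. This is a toy model, not RCU and not the mac80211 code: a global reader counter stands in for rcu_read_lock()/rcu_read_unlock(), a busy-wait stands in for synchronize_rcu(), and every name below is hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct toy_key {
    int hw_idx;
};

static _Atomic(struct toy_key *) toy_current_key; /* published pointer */
static atomic_int toy_active_readers;             /* toy read-side counter */

/* Reader: enter the read section, fetch the pointer once, use it, leave. */
int toy_use_key(void)
{
    int idx = -1;
    struct toy_key *k;

    atomic_fetch_add(&toy_active_readers, 1);     /* ~ rcu_read_lock() */
    k = atomic_load(&toy_current_key);            /* ~ rcu_dereference() */
    if (k)
        idx = k->hw_idx;
    atomic_fetch_sub(&toy_active_readers, 1);     /* ~ rcu_read_unlock() */
    return idx;
}

/* Add: fully prepare the key first, publish the pointer last.
 * Assumes no key is currently installed. */
int toy_add_key(int hw_idx)
{
    struct toy_key *k = malloc(sizeof(*k));

    if (!k)
        return -1;
    k->hw_idx = hw_idx;                  /* hardware upload would go here */
    atomic_store(&toy_current_key, k);   /* ~ rcu_assign_pointer() */
    return 0;
}

/* Delete: unlink first, wait for readers that may still hold the old
 * pointer, and only then free it. */
void toy_del_key(void)
{
    struct toy_key *old =
        atomic_exchange(&toy_current_key, (struct toy_key *)NULL);

    while (atomic_load(&toy_active_readers) != 0)
        ;                                /* crude stand-in for synchronize_rcu() */
    free(old);
}
```

Real RCU avoids the shared counter on the read side; the sketch keeps it only to make the "no reader can still see the old pointer" condition explicit.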
     
  • Kalle Valo noticed that QoS frames are sent with an invalid QoS control
    field; this is because we increase the header length but neither
    initialise the space nor actually have enough space in the header
    structure for the QoS control field.

    This patch fixes it by treating the QoS field specially and appending it
    explicitly, initialising it to zero.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • mac80211 never calls wireless_spy_update so these aren't
    useful.

    Signed-off-by: Johannes Berg
    Acked-by: Michael Wu
    Signed-off-by: John W. Linville
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • On loaded/big hosts, rt_check_expire() is of little use, because it
    generally breaks out of its main loop because of a jiffies change.

    It can take a long time (read : timer invocations) to actually
    scan the whole hash table, freeing unused entries.

    Converting it to use a workqueue instead of softirq is a nice
    move because we can allow rt_check_expire() to do the scan
    it is supposed to do, without hogging the CPU.

    This has an impact on the average number of entries in cache,
    reducing RAM usage. The cache is also more responsive to parameter
    changes (/proc/sys/net/ipv4/route/gc_timeout and
    /proc/sys/net/ipv4/route/gc_interval).

    Note: Maybe the default value of gc_interval (60 seconds)
    is too high, since this means we actually need 5 (300/60)
    invocations to scan the whole table.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
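
The arithmetic in the note above can be sketched as follows. It assumes that each invocation scans a fixed share of the hash table, sized so that the whole table is covered once per gc_timeout, and that the 300 in "300/60" is the gc_timeout in seconds. Both helpers are hypothetical; the patch itself may compute its per-run goal differently.

```c
/* Buckets to scan per invocation so that the whole table is covered
 * once per gc_timeout (rounded up). */
unsigned int toy_scan_goal(unsigned int table_size,
                           unsigned int gc_interval,
                           unsigned int gc_timeout)
{
    unsigned long long work;

    if (gc_timeout == 0 || gc_interval >= gc_timeout)
        return table_size;               /* scan everything in one run */

    work = (unsigned long long)table_size * gc_interval;
    return (unsigned int)((work + gc_timeout - 1) / gc_timeout);
}

/* Number of invocations needed to visit every bucket once. */
unsigned int toy_runs_for_full_scan(unsigned int table_size,
                                    unsigned int goal)
{
    if (goal == 0)
        return 0;
    return (table_size + goal - 1) / goal;
}
```

For a 1024-bucket table with gc_interval = 60 and gc_timeout = 300, the goal is 205 buckets per run and the full scan takes 5 runs, matching the figure in the note.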
     
  • This patch adds support for UWB keys to rfkill;
    support for this was requested by Inaky.

    Signed-off-by: Ivo van Doorn
    Signed-off-by: David S. Miller

    Ivo van Doorn
     
  • As Dmitry pointed out earlier, rfkill-input.c
    doesn't support irda because there are no users
    and we shouldn't add unrequired KEY_ defines.

    However, RFKILL_TYPE_IRDA was defined in the
    rfkill.h header file and would confuse people
    about whether it is implemented or not.

    This patch removes IRDA support completely,
    so it can be added whenever a driver wants the
    feature.

    Signed-off-by: Ivo van Doorn
    Signed-off-by: David S. Miller

    Ivo van Doorn
     
  • The problem: proc_net files remember which network namespace they are
    against but do not hold a reference count (as that would pin
    the network namespace). So we currently have a small window where
    the reference count on a network namespace may be incremented when opening
    a /proc file when it has already gone to zero.

    To fix this introduce maybe_get_net and get_proc_net.

    maybe_get_net increments the network namespace reference count only if it is
    greater than zero, ensuring we don't increment a reference count after it
    has gone to zero.

    get_proc_net handles all of the magic to go from a proc inode to the network
    namespace instance and call maybe_get_net on it.

    PROC_NET, the old accessor, is removed so that we don't get confused and use
    the wrong helper function.

    Then I fix up the callers to use get_proc_net and handle the case
    where get_proc_net returns NULL. In that case I return -ENXIO because
    effectively the network namespace has already gone away so the files
    we are trying to access don't exist anymore.

    Signed-off-by: Eric W. Biederman
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric W. Biederman
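
The "increment only if greater than zero" rule can be sketched with C11 atomics. struct toy_net and toy_maybe_get() are hypothetical userspace stand-ins for illustration; they are not the kernel's maybe_get_net().

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_net {
    atomic_int count;
};

/*
 * Take a reference only if the object is still alive.
 * Returns 'net' on success, or NULL if the count had already reached zero.
 */
struct toy_net *toy_maybe_get(struct toy_net *net)
{
    int old = atomic_load(&net->count);

    while (old > 0) {
        /* Succeeds only if nobody changed the count in the meantime;
         * on failure 'old' is reloaded and the check is repeated. */
        if (atomic_compare_exchange_weak(&net->count, &old, old + 1))
            return net;
    }
    return NULL;
}
```

A caller that gets NULL back treats the namespace as gone, which corresponds to the -ENXIO case described above.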
     
  • Change L2T (length to time) macros, in all rate based schedulers, to
    call a common function qdisc_l2t() that does the rate table lookup.
    This function handles the case where the packet size lookup falls beyond
    the end of the rate table, which often occurs with TSO enabled.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
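
A lookup that tolerates oversized packets can be sketched like this. The 256-slot table, the cell_log shift and the way out-of-range slots are extrapolated from the last entry are all assumptions made for this illustration; toy_l2t() is a hypothetical name and may not match qdisc_l2t() in detail.

```c
#include <stdint.h>

#define TOY_RTAB_SLOTS 256u

struct toy_rate_table {
    unsigned int cell_log;          /* packet length is shifted by this */
    uint32_t data[TOY_RTAB_SLOTS];  /* transmit time per slot */
};

/* Length-to-time lookup that does not index past the end of the table. */
uint32_t toy_l2t(const struct toy_rate_table *rtab, unsigned int pktlen)
{
    unsigned int slot = pktlen >> rtab->cell_log;

    if (slot >= TOY_RTAB_SLOTS) {
        /* Beyond the table: count whole multiples of the table using
         * the last entry and add the entry for the remainder. */
        return rtab->data[TOY_RTAB_SLOTS - 1] * (slot / TOY_RTAB_SLOTS)
             + rtab->data[slot % TOY_RTAB_SLOTS];
    }
    return rtab->data[slot];
}
```

Without the range check, a large TSO packet would read past the end of data[].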
     
  • This patch makes the following needlessly global variables static:
    - sctp_memory_pressure
    - sctp_memory_allocated
    - sctp_sockets_allocated

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • sctp_addto_param() can become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • This change allows the generic attribute interface to be used within
    the netfilter subsystem where this flag was initially introduced.

    The byte-order flag is as yet unused; its intended use is to
    allow automatic byte order conversions for all atomic types.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
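
How flag bits can share a 16-bit attribute type field is sketched below. The bit positions are assumptions modelled on the NLA_F_NESTED and NLA_F_NET_BYTEORDER definitions in present-day linux/netlink.h; the TOY_ macros and helper functions are hypothetical and the code in this patch may differ.

```c
#include <stdint.h>

/* Top two bits of the 16-bit type field carry flags (assumed layout). */
#define TOY_NLA_F_NESTED        (1u << 15)
#define TOY_NLA_F_NET_BYTEORDER (1u << 14)
#define TOY_NLA_TYPE_MASK       (0xffffu & ~(TOY_NLA_F_NESTED | TOY_NLA_F_NET_BYTEORDER))

/* Attribute type with the flag bits stripped. */
unsigned int toy_nla_type(uint16_t raw)
{
    return raw & TOY_NLA_TYPE_MASK;
}

/* Non-zero if the attribute payload is itself a list of attributes. */
int toy_nla_is_nested(uint16_t raw)
{
    return (raw & TOY_NLA_F_NESTED) != 0;
}

/* Non-zero if the payload is marked as being in network byte order. */
int toy_nla_is_net_byteorder(uint16_t raw)
{
    return (raw & TOY_NLA_F_NET_BYTEORDER) != 0;
}
```

Code that compares attribute types has to mask first, otherwise a flagged attribute would look like an unknown type.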
     
  • Requested by Johannes Berg.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When the periodic IP route cache flush is done (every 600 seconds on
    default configuration), some hosts suffer a lot and eventually trigger
    the "soft lockup" message.

    dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
    eventually freeing some (less than 1%) of them, while holding the
    dst_lock spinlock for the whole scan.

    Then it rearms a timer to redo the full thing 1/10 s later...
    The slowdown can last one minute or so, depending on how active the
    TCP sessions are.

    This second version of the patch converts the processing from a softirq
    based one to a workqueue.

    Even if the list of entries in garbage_list is huge, host is still
    responsive to softirqs and can make progress.

    Instead of resetting gc timer to 0.1 second if one entry was freed in a
    gc run, we do this if more than 10% of entries were freed.

    Before patch :

    Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
    Aug 16 06:21:37 SRV1 kernel:
    Aug 16 06:21:37 SRV1 kernel: Call Trace:
    Aug 16 06:21:37 SRV1 kernel: [] wake_up_process+0x10/0x20
    Aug 16 06:21:37 SRV1 kernel: [] softlockup_tick+0xe9/0x110
    Aug 16 06:21:37 SRV1 kernel: [] dst_run_gc+0x0/0x140
    Aug 16 06:21:37 SRV1 kernel: [] run_local_timers+0x13/0x20
    Aug 16 06:21:37 SRV1 kernel: [] update_process_times+0x57/0x90
    Aug 16 06:21:37 SRV1 kernel: [] smp_local_timer_interrupt+0x34/0x60
    Aug 16 06:21:37 SRV1 kernel: [] smp_apic_timer_interrupt+0x5c/0x80
    Aug 16 06:21:37 SRV1 kernel: [] apic_timer_interrupt+0x66/0x70
    Aug 16 06:21:37 SRV1 kernel: [] dst_run_gc+0x53/0x140
    Aug 16 06:21:37 SRV1 kernel: [] dst_run_gc+0x46/0x140
    Aug 16 06:21:37 SRV1 kernel: [] run_timer_softirq+0x148/0x1c0
    Aug 16 06:21:37 SRV1 kernel: [] __do_softirq+0x6c/0xe0
    Aug 16 06:21:37 SRV1 kernel: [] call_softirq+0x1c/0x30
    Aug 16 06:21:37 SRV1 kernel: [] do_softirq+0x34/0x90
    Aug 16 06:21:37 SRV1 kernel: [] local_bh_enable_ip+0x3f/0x60
    Aug 16 06:21:37 SRV1 kernel: [] _spin_unlock_bh+0x13/0x20
    Aug 16 06:21:37 SRV1 kernel: [] rt_garbage_collect+0x1d8/0x320
    Aug 16 06:21:37 SRV1 kernel: [] dst_alloc+0x1d/0xa0
    Aug 16 06:21:37 SRV1 kernel: [] __ip_route_output_key+0x573/0x800
    Aug 16 06:21:37 SRV1 kernel: [] sock_common_recvmsg+0x32/0x50
    Aug 16 06:21:37 SRV1 kernel: [] ip_route_output_flow+0x1c/0x60
    Aug 16 06:21:37 SRV1 kernel: [] tcp_v4_connect+0x150/0x610
    Aug 16 06:21:37 SRV1 kernel: [] inet_bind_bucket_create+0x17/0x60
    Aug 16 06:21:37 SRV1 kernel: [] inet_stream_connect+0xa6/0x2c0
    Aug 16 06:21:37 SRV1 kernel: [] _spin_lock_bh+0x11/0x30
    Aug 16 06:21:37 SRV1 kernel: [] lock_sock_nested+0xcf/0xe0
    Aug 16 06:21:37 SRV1 kernel: [] _spin_lock_bh+0x11/0x30
    Aug 16 06:21:37 SRV1 kernel: [] sys_connect+0x71/0xa0
    Aug 16 06:21:37 SRV1 kernel: [] tcp_setsockopt+0x1f/0x30
    Aug 16 06:21:37 SRV1 kernel: [] sock_common_setsockopt+0xf/0x20
    Aug 16 06:21:37 SRV1 kernel: [] sys_setsockopt+0x9d/0xc0
    Aug 16 06:21:37 SRV1 kernel: [] sys_ioctl+0x5e/0x80
    Aug 16 06:21:37 SRV1 kernel: [] system_call+0x7e/0x83

    After patch : (RT_CACHE_DEBUG set to 2 to get following traces)

    dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
    dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
    dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
    dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
    dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
    dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us
    dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us
    dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us
    dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us
    dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us
    dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us
    dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us

    I successfully reduced the IP route cache of many hosts by a factor of four
    thanks to this patch. Previously, I had to disable "ip route flush cache"
    to avoid crashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
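
The rearm rule described above (go back to the short delay only when more than 10% of the entries were freed) can be sketched as a small policy function. The doubling back-off used for unproductive runs is an assumption made for this illustration, as are all names; the message does not spell out how the patch grows the delay.

```c
/*
 * Choose the delay before the next gc run.
 *   scanned: entries examined in this run
 *   freed:   entries actually freed in this run
 *   cur:     delay that led to this run
 *   min/max: bounds on the delay
 */
unsigned int toy_next_gc_delay(unsigned int scanned, unsigned int freed,
                               unsigned int cur,
                               unsigned int min, unsigned int max)
{
    /* Productive run (strictly more than 10% freed): come back soon. */
    if (scanned > 0 && (unsigned long long)freed * 10 > scanned)
        return min;

    /* Otherwise back off (assumed: doubling), capped at max. */
    if (cur > max / 2)
        return max;
    cur *= 2;
    return cur < min ? min : cur;
}
```

Compared with resetting the timer whenever a single entry is freed, this keeps a mostly idle garbage list from being rescanned every tenth of a second.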
     
  • We will undo this once it is actually used.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Until we support multiple network namespaces with netfilter only allow
    netfilter configuration in the initial network namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman