16 Jun, 2009

3 commits

  • Patch establishes a dummy afiucv-device to make sure af_iucv is
    notified as iucv-bus device about suspend/resume.

    The PM freeze callback severs all iucv paths of connected af_iucv sockets.
    The PM thaw/restore callback switches the state of all previously connected
    sockets to IUCV_DISCONN.

    Signed-off-by: Ursula Braun
    Signed-off-by: Martin Schwidefsky

    Ursula Braun
     
  • Patch calls the PM callback functions of iucv-bus devices, which are
    responsible for removal of their established iucv paths.

    The PM freeze callback for the first iucv-bus device disables all iucv
    interrupts except the connection severed interrupt.
    The PM freeze callback for the last iucv-bus device shuts down iucv.

    The PM thaw callback for the first iucv-bus device re-enables iucv
    if it has been shut down during freeze. If freezing has been interrupted,
    it re-enables iucv interrupts according to the needs of iucv-exploiters.

    The PM restore callback for the first iucv-bus device re-enables iucv.

    Signed-off-by: Ursula Braun
    Signed-off-by: Martin Schwidefsky

    Ursula Braun
     
  • To guarantee a proper cleanup, patch adds a reboot notifier to
    the iucv base code, which disables iucv interrupts, shuts down
    established iucv paths, and removes iucv declarations for z/VM.

    The iucv-API functions need added checks for whether
    iucv-buffers removed at reboot time are still declared.

    Signed-off-by: Ursula Braun
    Signed-off-by: Martin Schwidefsky

    Ursula Braun
     

15 Jun, 2009

4 commits

  • Conflicts:
    Documentation/feature-removal-schedule.txt
    drivers/scsi/fcoe/fcoe.c
    net/core/drop_monitor.c
    net/core/net-traces.c

    David S. Miller
     
  • Use TICKS instead of US in the macro names, giving PSCHED_TICKS2NS and
    PSCHED_NS2TICKS (as PSCHED_TICKS_PER_SEC already does), to avoid misleading names.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
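
The renamed macros can be sketched in userspace C as below. This is a minimal illustration, not the kernel header: the shift value of 10 and the PSCHED_SHIFT name are assumptions made for the example, since the commit text only names the three macros.

```c
#include <stdint.h>

/* Assumption for illustration: one tick is 2^10 ns. The commit text
 * does not state the shift, and PSCHED_SHIFT is a hypothetical name. */
#define PSCHED_SHIFT 10

/* Convert scheduler ticks to nanoseconds and back. */
#define PSCHED_TICKS2NS(x) ((int64_t)(x) << PSCHED_SHIFT)
#define PSCHED_NS2TICKS(x) ((x) >> PSCHED_SHIFT)

/* Ticks per second, derived from the same conversion. */
#define PSCHED_TICKS_PER_SEC PSCHED_NS2TICKS(1000000000LL)
```

With a power-of-two shift a tick is only approximately a microsecond, which is why a name containing US can mislead.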
     
  • During trie_rebalance(), resize(), inflate(), and halve() RCU-free
    tnodes before updating their parents. This relies on RCU delaying the
    real destruction, but RCU readers that start after call_rcu() and before
    the parent update could access freed memory.

    This is currently prevented with preempt_disable() on the update side,
    but that is not safe (except perhaps with classic RCU), and it conflicts
    with the GFP_KERNEL memory allocations made from these functions.

    This patch explicitly delays freeing of tnodes by adding them to a
    list, which is flushed after the update is finished.

    Reported-by: Yan Zheng
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
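
The deferred-free idea described above can be sketched in userspace C. This is a simplified model under stated assumptions: the struct layout, the function names, and the tnodes_freed counter are chosen for illustration, and plain free() stands in for the point where the kernel would hand the node to RCU.

```c
#include <stdlib.h>

/* Simplified node: only the fields needed for the sketch. */
struct tnode {
    struct tnode *free_next; /* link in the deferred-free list */
    int key;
};

static struct tnode *tnode_free_head; /* nodes waiting to be freed */
static int tnodes_freed;              /* demo counter, not in the kernel */

static struct tnode *tnode_new(int key)
{
    struct tnode *tn = calloc(1, sizeof(*tn));

    if (tn)
        tn->key = key;
    return tn;
}

/* Queue a node for freeing instead of freeing it right away. */
static void tnode_free_safe(struct tnode *tn)
{
    tn->free_next = tnode_free_head;
    tnode_free_head = tn;
}

/* Called once the parent pointers have been updated. */
static void tnode_free_flush(void)
{
    struct tnode *tn;

    while ((tn = tnode_free_head) != NULL) {
        tnode_free_head = tn->free_next;
        free(tn); /* stand-in for the RCU-deferred free */
        tnodes_freed++;
    }
}
```

In this model a rebalance would call tnode_free_safe() for every replaced node and tnode_free_flush() only after the new subtree is linked in, so no queued node is released while the old parent still points at it.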
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits)
    trivial: remove the trivial patch monkey's name from SubmittingPatches
    trivial: Fix a typo in comment of addrconf_dad_start()
    trivial: usb: fix missing space typo in doc
    trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug
    trivial: Remove the hyphen from git commands
    trivial: fix ETIMEOUT -> ETIMEDOUT typos
    trivial: Kconfig: .ko is normally not included in module names
    trivial: SubmittingPatches: fix typo
    trivial: Documentation/dell_rbu.txt: fix typos
    trivial: Fix Pavel's address in MAINTAINERS
    trivial: ftrace:fix description of trace directory
    trivial: unnecessary (void*) cast removal in sound/oss/msnd.c
    trivial: input/misc: Fix typo in Kconfig
    trivial: fix grammo in bus_for_each_dev() kerneldoc
    trivial: rbtree.txt: fix rb_entry() parameters in sample code
    trivial: spelling fix in ppc code comments
    trivial: fix typo in bio_alloc kernel doc
    trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt
    trivial: Miscellaneous documentation typo fixes
    trivial: fix typo milisecond/millisecond for documentation and source comments.
    ...

    Linus Torvalds
     

14 Jun, 2009

5 commits

  • Since the rewrite of the RFKILL subsystem it is no longer sufficient to just
    select RFKILL; a proper "depends on" rule must be added instead.

    Based on a report by Alexander Beregalov

    Signed-off-by: Marcel Holtmann

    Marcel Holtmann
     
  • IPv4:
    - make PIM register vifs netns local
    - set the netns when a PIM register vif is created
    - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

    IPv6:
    - make PIM register vifs netns local
    - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

    Signed-off-by: Tom Goff
    Signed-off-by: David S. Miller

    Tom Goff
     
  • Remove the statements about ARP cache size, as this config option does
    not affect it. The cache size is controlled by neigh_table gc thresholds.

    Also remove the experimental and obsolete markings, as the API originally
    intended for ARP caching is useful for implementing ARP-like protocols
    (e.g. NHRP) in user space and has been there for a long enough time.

    Signed-off-by: Timo Teras
    Signed-off-by: David S. Miller

    Timo Teräs
     
  • For the sake of power-saving enthusiasts, use a deferrable timer to fire
    rt_check_expire().

    As the cache equilibrium of some big routers depends on garbage collection
    being done in time, we take into account the elapsed time between two
    rt_check_expire() invocations to adjust the number of slots we have to
    check.

    Based on an initial idea and patch from Tero Kristo

    Signed-off-by: Eric Dumazet
    Signed-off-by: Tero Kristo
    Signed-off-by: David S. Miller

    Eric Dumazet
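
One way to scale the scan by elapsed time is sketched below. The function name, parameter names, and exact arithmetic are assumptions for illustration and are not taken from the patch itself.

```c
#include <stdint.h>

/*
 * Return how many hash slots to scan, given the time elapsed since the
 * previous run. A full gc_timeout period maps to the whole table
 * (2^hash_log slots); a deferred timer that fires late scans
 * proportionally more, capped at the table size.
 */
static unsigned int gc_scan_goal(unsigned long elapsed,
                                 unsigned long gc_timeout,
                                 unsigned int hash_log)
{
    uint64_t slots = (uint64_t)1 << hash_log;
    uint64_t goal = (uint64_t)elapsed << hash_log;

    if (gc_timeout > 1)
        goal /= gc_timeout;
    if (goal > slots)
        goal = slots;
    return (unsigned int)goal;
}
```

With a 1024-slot table and a timeout of 300 ticks, a run that arrives after 150 ticks scans half the table, and a run delayed beyond 300 ticks scans all of it.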
     
  • David S. Miller
     

13 Jun, 2009

11 commits

  • Signed-off-by: Joe Perches
    Signed-off-by: Patrick McHardy

    Joe Perches
     
  • This patch improves ctnetlink event reliability if one broadcast
    listener has set the NETLINK_BROADCAST_ERROR socket option.

    The logic is the following: if an event delivery fails, we keep
    the undelivered events in the missed event cache. Once the next
    packet arrives, we add the new events (if any) to the missed
    events in the cache and we try a new delivery, and so on. Thus,
    if ctnetlink fails to deliver an event, we try to deliver them
    once we see a new packet. Therefore, we may lose state
    transitions but the userspace process gets in sync at some point.

    In the worst case, if no events were delivered to userspace, we make
    sure that destroy events are successfully delivered. Basically,
    if ctnetlink fails to deliver the destroy event, we remove the
    conntrack entry from the hashes and insert it in the dying
    list, which contains inactive entries. Then, the conntrack timer
    is added with an extra grace timeout of random32() % 15 seconds
    to trigger the event again (this grace timeout is tunable via
    /proc). The use of a limited random timeout value allows
    distributing the "destroy" resends, thus avoiding accumulating
    lots of "destroy" events at the same time. Events may be
    delivered out of order, but we can identify them by means of
    the tuple plus the conntrack ID.

    The maximum number of conntrack entries (active or inactive) is
    still handled by nf_conntrack_max. Thus, we may start dropping
    packets at some point if we accumulate a lot of inactive conntrack
    entries that did not successfully report the destroy event to
    userspace.

    During my stress tests, which consisted of setting a very small buffer
    of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
    flag, and generating lots of very small connections, I noticed
    very few destroy entries on the fly waiting to be resent.

    A simple way to test this patch consists of creating a lot of
    entries, setting a very small Netlink buffer in conntrackd (+ a patch
    which is not in the git tree to set the BROADCAST_ERROR flag)
    and invoking `conntrack -F'.

    For expectations, no changes are introduced in this patch.
    Currently, event delivery is only done for new expectations (no
    events from expectation expiration, removal and confirmation).
    In that case, they need a per-expectation event cache to implement
    the same idea that is exposed in this patch.

    This patch can be useful to provide reliable flow-accounting. We
    still have to add a new conntrack extension to store the creation
    and destroy time.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
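
The retry logic for the missed event cache can be sketched as follows. This is a userspace model under stated assumptions: the struct, the bit values, and both listener functions are hypothetical stand-ins, not the ctnetlink API.

```c
#include <stdbool.h>

/* Per-connection cache of events that could not be delivered yet. */
struct event_cache {
    unsigned long missed; /* bitmask of undelivered events */
};

typedef bool (*deliver_fn)(unsigned long events);

static unsigned long last_delivered; /* demo only: what the listener saw */

/* Hypothetical listener whose receive buffer is full. */
static bool listener_overrun(unsigned long events)
{
    (void)events;
    return false;
}

/* Hypothetical listener that accepts the events. */
static bool listener_ok(unsigned long events)
{
    last_delivered = events;
    return true;
}

/*
 * Merge new events with the missed ones and try one delivery.
 * On failure everything stays cached for the next packet.
 */
static bool deliver_cached_events(struct event_cache *c,
                                  unsigned long new_events,
                                  deliver_fn deliver)
{
    unsigned long pending = c->missed | new_events;

    if (pending == 0)
        return true; /* nothing to report */
    if (deliver(pending)) {
        c->missed = 0;
        return true;
    }
    c->missed = pending;
    return false;
}
```

Because events are merged into a bitmask, intermediate state transitions can be lost, which matches the trade-off described above: userspace gets back in sync at some point rather than seeing every transition.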
     
  • This patch moves the helper destruction to a function that lives
    in nf_conntrack_helper.c. This new function is used in the patch
    to add ctnetlink reliable event delivery.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • This patch reworks the per-cpu event caching to use the conntrack
    extension infrastructure.

    The main drawback is that we consume more memory per conntrack
    if event delivery is enabled. This patch is required by the
    reliable event delivery patch that follows.

    BTW, this patch allows you to enable/disable event delivery via
    /proc/sys/net/netfilter/nf_conntrack_events at runtime, although
    you can still disable event caching as a compile-time option.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • Use mod_timer_pending() instead of an atomic sequence of del_timer()/
    add_timer(). mod_timer_pending() does not rearm an inactive timer,
    so we don't need the conntrack lock anymore to make sure we don't
    accidentally rearm a timer of a conntrack which is in the process
    of being destroyed.

    With this change, we don't need to take the global lock at all anymore;
    counter updates can be performed under the per-conntrack lock.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
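
The difference between the two timer operations can be modelled in a few lines of userspace C. The toy_ names and the struct are hypothetical; this only models the rearm semantics described above, not the kernel timer implementation.

```c
#include <stdbool.h>

/* Toy timer: just a pending flag and an expiry value. */
struct toy_timer {
    bool pending;
    unsigned long expires;
};

/* Like mod_timer(): always (re)arms; returns whether it was pending. */
static bool toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
    bool was_pending = t->pending;

    t->pending = true;
    t->expires = expires;
    return was_pending;
}

/* Like mod_timer_pending(): updates only a timer that is still pending. */
static bool toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
    if (!t->pending)
        return false; /* inactive timers are left alone */
    t->expires = expires;
    return true;
}

/* Like del_timer(): deactivates; returns whether it was pending. */
static bool toy_del_timer(struct toy_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}
```

Once the destroy path has deleted the timer, a concurrent refresh through toy_mod_timer_pending() is a no-op, so no extra lock is needed to keep a dying entry from being rearmed.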
     
  • Fix regression introduced by 17625274 "netfilter: sysctl support of
    logger choice":

    BUG: sleeping function called from invalid context at /mnt/s390test/linux-2.6-tip/arch/s390/include/asm/uaccess.h:234
    in_atomic(): 1, irqs_disabled(): 0, pid: 3245, name: sysctl
    CPU: 1 Not tainted 2.6.30-rc8-tipjun10-02053-g39ae214 #1
    Process sysctl (pid: 3245, task: 000000007f675da0, ksp: 000000007eb17cf0)
    0000000000000000 000000007eb17be8 0000000000000002 0000000000000000
    000000007eb17c88 000000007eb17c00 000000007eb17c00 0000000000048156
    00000000003e2de8 000000007f676118 000000007eb17f10 0000000000000000
    0000000000000000 000000007eb17be8 000000000000000d 000000007eb17c58
    00000000003e2050 000000000001635c 000000007eb17be8 000000007eb17c30
    Call Trace:
    (show_trace+0x13a/0x148)
    __might_sleep+0x13a/0x164
    proc_dostring+0x134/0x22c
    nf_log_proc_dostring+0xfc/0x188
    proc_sys_call_handler+0xf6/0x118
    proc_sys_read+0x26/0x34
    vfs_read+0xac/0x158
    SyS_read+0x56/0x88
    sysc_noemu+0x10/0x16

    Use the nf_log_mutex instead of RCU to fix this.

    Reported-and-tested-by: Maran Pakkirisamy
    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively.

    0 (NETDEV_TX_OK) is not changed to keep the noise down, except in the very few cases
    where it's in direct proximity to one of the other values.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
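
The effect of the conversion can be illustrated with a toy transmit routine. The numeric values are the ones named in the commit text; the ring structure and function are hypothetical and only show how the named constants read at a return site.

```c
/* Values as stated in the commit message. */
#define NETDEV_TX_OK      0
#define NETDEV_TX_BUSY    1
#define NETDEV_TX_LOCKED (-1)

/* Hypothetical transmit ring for the example. */
struct toy_ring {
    int free_slots;
    int locked;
};

static int toy_start_xmit(struct toy_ring *r)
{
    if (r->locked)
        return NETDEV_TX_LOCKED; /* was: return -1; */
    if (r->free_slots == 0)
        return NETDEV_TX_BUSY;   /* was: return 1; */
    r->free_slots--;
    return NETDEV_TX_OK;         /* was: return 0; */
}
```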
     
  • Fix up ATM drivers that return an errno value to qdisc_restart(), causing
    qdisc_restart() to print a warning and requeue/retransmit the skb.

    - lec: condition can only be remedied by userspace, until that retransmissions

    Compile tested only.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Masatake YAMATO
    Signed-off-by: Jiri Kosina

    Masatake YAMATO
     
  • .ko is normally not included in Kconfig help, make it consistent.

    Signed-off-by: Pavel Machek
    Signed-off-by: Jiri Kosina

    Pavel Machek
     
  • Signed-off-by: Martin Olsson
    Signed-off-by: Jiri Kosina

    Martin Olsson
     

12 Jun, 2009

8 commits


11 Jun, 2009

9 commits

  • Replace the last occurrence of tcp_lock by the per-conntrack lock.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Patrick McHardy
     
  • David S. Miller
     
  • The current code errors out the INCOMPLETE neigh entry skb queue only from
    the timer, once the maximum number of probes has been attempted without a reply.
    This also causes the transition to the FAILED state.

    However, the neigh entry can also be updated via Netlink to report that the
    address is unavailable. Currently, neigh_update() just stops the timers and
    leaves the pending skbs unreleased. As a result, the cleanup code in
    the timer callback is never called, which also prevents proper garbage collection.

    This fixes neigh_update() to process the pending skb queue immediately if an
    INCOMPLETE -> FAILED state transition occurs due to a Netlink request.

    Signed-off-by: Timo Teras
    Signed-off-by: David S. Miller

    Timo Teras
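
The fixed behaviour can be sketched as a small state machine. The toy_ types and counters below are hypothetical and stand in for the real neighbour entry and its skb queue.

```c
enum toy_state { TOY_INCOMPLETE, TOY_REACHABLE, TOY_FAILED };

/* Toy neighbour entry: counters stand in for the pending skb queue. */
struct toy_neigh {
    enum toy_state state;
    int queued;  /* packets waiting for address resolution */
    int errored; /* packets reported back as undeliverable */
};

/* State update as it might arrive through a Netlink request. */
static void toy_neigh_update(struct toy_neigh *n, enum toy_state new_state)
{
    enum toy_state old_state = n->state;

    n->state = new_state;
    /* On INCOMPLETE -> FAILED, error out the queue right away
     * instead of relying on a timer that has been stopped. */
    if (old_state == TOY_INCOMPLETE && new_state == TOY_FAILED) {
        n->errored += n->queued;
        n->queued = 0;
    }
}
```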
     
  • One of the problems with sock memory accounting is that it uses
    a pair of sock_hold()/sock_put() for each transmitted packet.

    This slows down bidirectional flows because the receive path
    also needs to take a refcount on socket and might use a different
    cpu than transmit path or transmit completion path. So these
    two atomic operations also trigger cache line bounces.

    We can see this in tx or tx/rx workloads (media gateways for example),
    where sock_wfree() can be in top five functions in profiles.

    We use this sock_hold()/sock_put() so that sock freeing
    is delayed until all tx packets are completed.

    As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
    by one unit at init time, until sk_free() is called.
    Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
    to decrement the initial offset and atomically check if any packets
    are in flight.

    skb_set_owner_w() doesn't call sock_hold() anymore.

    sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
    reached 0 to perform the final freeing.

    The drawback is that a skb->truesize error could lead to unfreeable sockets or,
    even worse, to prematurely calling __sk_free() on a live socket.

    Nice speedups on SMP: tbench, for example, goes from 2691 MB/s to 2711 MB/s
    on my 8 cpu dev machine, even though tbench was not really hitting the sk_refcnt
    contention point. 5% speedup on a UDP transmit workload (depends
    on the number of flows), lowering TX completion cpu usage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
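
The offset-by-one accounting can be modelled with C11 atomics. This is a userspace sketch under stated assumptions: the toy_ names are hypothetical, and setting a flag stands in for the final free that the kernel performs in __sk_free().

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_sock {
    atomic_int wmem_alloc; /* bytes in flight, plus the initial offset */
    bool freed;            /* set when the final release happens */
};

static void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1); /* offset of one until sk_free */
    sk->freed = false;
}

static void toy_release(struct toy_sock *sk)
{
    sk->freed = true; /* stand-in for the real freeing */
}

/* Charge a packet to the socket; no reference count is taken. */
static void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if this was the last user. */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        toy_release(sk);
}

/* Owner drops the socket: remove the offset, free if nothing in flight. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        toy_release(sk);
}
```

Whichever of toy_sk_free() and the last toy_wfree() brings the counter to zero performs the release, so a single atomic operation per packet replaces the hold/put pair. The sketch also shows the drawback noted above: if the truesize passed to toy_wfree() differs from what was charged, the counter never reaches zero or reaches it too early.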
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • David S. Miller
     
  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
    Revert "x86, bts: reenable ptrace branch trace support"
    tracing: do not translate event helper macros in print format
    ftrace/documentation: fix typo in function grapher name
    tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
    tracing: add protection around module events unload
    tracing: add trace_seq_vprint interface
    tracing: fix the block trace points print size
    tracing/events: convert block trace points to TRACE_EVENT()
    ring-buffer: fix ret in rb_add_time_stamp
    ring-buffer: pass in lockdep class key for reader_lock
    tracing: add annotation to what type of stack trace is recorded
    tracing: fix multiple use of __print_flags and __print_symbolic
    tracing/events: fix output format of user stack
    tracing/events: fix output format of kernel stack
    tracing/trace_stack: fix the number of entries in the header
    ring-buffer: discard timestamps that are at the start of the buffer
    ring-buffer: try to discard unneeded timestamps
    ring-buffer: fix bug in ring_buffer_discard_commit
    ftrace: do not profile functions when disabled
    tracing: make trace pipe recognize latency format flag
    ...

    Linus Torvalds
     
  • rfkill currently requires a global lock within the
    rfkill_register() function, and holds that lock over
    calls to the set_block() methods. This means that we
    cannot hold a lock around rfkill_register() that we
    also require in set_block(), directly or indirectly.
    Fix cfg80211 to register rfkill outside the block
    locked by its global lock. Much of what cfg80211 does
    in the locked block doesn't need to be locked anyway.

    Reported-by: Vasanthakumar Thiagarajan
    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg