25 Jun, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
    bnx2: Fix the behavior of ethtool when ONBOOT=no
    qla3xxx: Don't sleep while holding lock.
    qla3xxx: Give the PHY time to come out of reset.
    ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
    net: Move rx skb_orphan call to where needed
    ipv6: Use correct data types for ICMPv6 type and code
    net: let KS8842 driver depend on HAS_IOMEM
    can: let SJA1000 driver depend on HAS_IOMEM
    netxen: fix firmware init handshake
    netxen: fix build with without CONFIG_PM
    netfilter: xt_rateest: fix comparison with self
    netfilter: xt_quota: fix incomplete initialization
    netfilter: nf_log: fix direct userspace memory access in proc handler
    netfilter: fix some sparse endianess warnings
    netfilter: nf_conntrack: fix conntrack lookup race
    netfilter: nf_conntrack: fix confirmation race condition
    netfilter: nf_conntrack: death_by_timeout() fix

    Linus Torvalds
     

24 Jun, 2009

2 commits

  • When route caching is disabled (rt_caching returns false), we still use route
    cache entries that are created and passed into rt_intern_hash once. These
    routes need to be made usable for the one call path that holds a reference to
    them, and they need to be reclaimed when they're finished with their use. To be
    made usable, they need to be associated with a neighbor table entry (which they
    currently are not), otherwise iproute_finish2 just discards the packet, since we
    don't know which L2 peer to send the packet to. To do this binding, we need to
    follow the path a bit higher up in rt_intern_hash, which calls
    arp_bind_neighbour, but not assign the route entry to the hash table.
    Currently, if caching is off, we simply assign the route to the rp pointer and
    return success. This patch associates us with a neighbor entry first.

    Secondly, we need to make sure that any single use routes like this are known to
    the garbage collector when caching is off. If caching is off, and we try to
    hash in a route, it will leak when its refcount reaches zero. To avoid this,
    this patch calls rt_free on the route cache entry passed into rt_intern_hash.
    This places us on the gc list for the route cache garbage collector, so that
    when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
    suggestion).

    I've tested this on a local system here, and with these patches in place, I'm
    able to maintain routed connectivity to remote systems, even if I set
    /proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
    return false.

    Signed-off-by: Neil Horman
    Reported-by: Jarek Poplawski
    Reported-by: Maxime Bizon
    Signed-off-by: David S. Miller

    Neil Horman
     
  • In order to get the tun driver to account packets, we need to be
    able to receive packets with destructors set. To be on the safe
    side, I added an skb_orphan call for all protocols by default since
    some of them (IP in particular) cannot handle receiving packets
    with destructors properly.

    Now it seems that at least one protocol (CAN) expects to be able
    to pass skb->sk through the rx path without getting clobbered.

    So this patch attempts to fix this properly by moving the skb_orphan
    call to where it's actually needed. In particular, I've added it
    to skb_set_owner_[rw] which is what most users of skb->destructor
    call.

    This is actually an improvement for tun too since it means that
    we only give back the amount charged to the socket when the skb
    is passed to another socket that will also be charged accordingly.

    Signed-off-by: Herbert Xu
    Tested-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Herbert Xu
     

23 Jun, 2009

4 commits

  • Change all the code that deals directly with ICMPv6 type and code
    values to use u8 instead of a signed int as that's the actual data
    type.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
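
As a rough illustration of why an unsigned 8-bit type fits here: the ICMPv6 type and code are each a single octet on the wire, and informational messages are the ones with the high bit of the type set (RFC 4443). The sketch below is a standalone toy, not kernel code; the `toy_` names and the header layout struct are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* High bit of the type marks an informational message (RFC 4443). */
#define TOY_ICMPV6_INFOMSG_MASK 0x80

/* Hypothetical fixed part of an ICMPv6 header: type and code are one octet each. */
struct toy_icmp6hdr {
    uint8_t  type;
    uint8_t  code;
    uint16_t cksum;
};

/* With uint8_t the full 0..255 range is representable and no sign
 * extension can occur when the value is widened or compared. */
static bool toy_icmpv6_is_info(uint8_t type)
{
    return (type & TOY_ICMPV6_INFOMSG_MASK) != 0;
}
```

For example, type 128 (echo request) is informational, while type 1 (destination unreachable) is an error message.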
     
  • * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
    SUNRPC: Fix the TCP server's send buffer accounting
    nfsd41: Backchannel: minorversion support for the back channel
    nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
    nfsd41: Remove ip address collision detection case
    nfsd: optimise the starting of zero threads when none are running.
    nfsd: don't take nfsd_mutex twice when setting number of threads.
    nfsd41: sanity check client drc maxreqs
    nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
    NFS: kill off complicated macro 'PROC'
    sunrpc: potential memory leak in function rdma_read_xdr
    nfsd: minor nfsd_vfs_write cleanup
    nfsd: Pull write-gathering code out of nfsd_vfs_write
    nfsd: track last inode only in use_wgather case
    sunrpc: align cache_clean work's timer
    nfsd: Use write gathering only with NFSv2
    NFSv4: kill off complicated macro 'PROC'
    NFSv4: do exact check about attribute specified
    knfsd: remove unreported filehandle stats counters
    knfsd: fix reply cache memory corruption
    knfsd: reply cache cleanups
    ...

    Linus Torvalds
     
  • * 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits)
    nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh()
    nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
    nfs: remove unnecessary NFS_INO_INVALID_ACL checks
    NFS: More "sloppy" parsing problems
    NFS: Invalid mount option values should always fail, even with "sloppy"
    NFS: Remove unused XDR decoder functions
    NFS: Update MNT and MNT3 reply decoding functions
    NFS: add XDR decoder for mountd version 3 auth-flavor lists
    NFS: add new file handle decoders to in-kernel mountd client
    NFS: Add separate mountd status code decoders for each mountd version
    NFS: remove unused function in fs/nfs/mount_clnt.c
    NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
    NFS: Clean up MNT program definitions
    lockd: Don't bother with RPC ping for NSM upcalls
    lockd: Update NSM state from SM_MON replies
    NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
    NFS: Return error code from nfs_callback_up() to user space
    NFS: Do not display the setting of the "intr" mount option
    NFS: add support for splice writes
    nfs41: Backchannel: CB_SEQUENCE validation
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (43 commits)
    via-velocity: Fix velocity driver unmapping incorrect size.
    mlx4_en: Remove redundant refill code on RX
    mlx4_en: Removed redundant check on lso header size
    mlx4_en: Cancel port_up check in transmit function
    mlx4_en: using stop/start_all_queues
    mlx4_en: Removed redundant skb->len check
    mlx4_en: Counting all the dropped packets on the TX side
    usbnet cdc_subset: fix issues talking to PXA gadgets
    Net: qla3xxx, remove sleeping in atomic
    ipv4: fix NULL pointer + success return in route lookup path
    isdn: clean up documentation index
    cfg80211: validate station settings
    cfg80211: allow setting station parameters in mesh
    cfg80211: allow adding/deleting stations on mesh
    ath5k: fix beacon_int handling
    MAINTAINERS: Fix Atheros pattern paths
    ath9k: restore PS mode, before we put the chip into FULL SLEEP state.
    ath9k: wait for beacon frame along with CAB
    acer-wmi: fix rfkill conversion
    ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling
    ...

    Linus Torvalds
     

22 Jun, 2009

7 commits

  • As noticed by Török Edwin:

    Compiling the kernel with clang has shown this warning:

    net/netfilter/xt_rateest.c:69:16: warning: self-comparison always results in a
    constant value
    ret &= pps2 == pps2;
    ^

    Looking at the code:

        if (info->flags & XT_RATEEST_MATCH_BPS)
                ret &= bps1 == bps2;
        if (info->flags & XT_RATEEST_MATCH_PPS)
                ret &= pps2 == pps2;

    Judging from the MATCH_BPS case it seems to be a typo, with the intention of
    comparing pps1 with pps2.

    http://bugzilla.kernel.org/show_bug.cgi?id=13535

    Signed-off-by: Patrick McHardy

    Patrick McHardy
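
The intended comparison described in the message above can be sketched as a standalone function. This is a minimal illustration, not the code from xt_rateest.c: the flag values and the function name are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not copied from the kernel header. */
#define TOY_MATCH_BPS 0x1u
#define TOY_MATCH_PPS 0x2u

/* Returns true when every rate selected by flags is equal on both sides. */
static bool toy_rates_equal(unsigned int flags,
                            uint32_t bps1, uint32_t pps1,
                            uint32_t bps2, uint32_t pps2)
{
    bool ret = true;

    if (flags & TOY_MATCH_BPS)
        ret &= bps1 == bps2;
    if (flags & TOY_MATCH_PPS)
        ret &= pps1 == pps2;   /* the typo compared pps2 with itself */
    return ret;
}
```

With the self-comparison, the PPS branch could never clear `ret`, so two different packet rates would still be reported as equal.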
     
  • Commit v2.6.29-rc5-872-gacc738f ("xtables: avoid pointer to self")
    forgot to copy the initial quota value supplied by iptables into the
    private structure, thus counting from whatever was in the memory
    kmalloc returned.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Jan Engelhardt
     
  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • net/netfilter/xt_NFQUEUE.c:46:9: warning: incorrect type in assignment (different base types)
    net/netfilter/xt_NFQUEUE.c:46:9: expected unsigned int [unsigned] [usertype] ipaddr
    net/netfilter/xt_NFQUEUE.c:46:9: got restricted unsigned int
    net/netfilter/xt_NFQUEUE.c:68:10: warning: incorrect type in assignment (different base types)
    net/netfilter/xt_NFQUEUE.c:68:10: expected unsigned int [unsigned]
    net/netfilter/xt_NFQUEUE.c:68:10: got restricted unsigned int
    net/netfilter/xt_NFQUEUE.c:69:10: warning: incorrect type in assignment (different base types)
    net/netfilter/xt_NFQUEUE.c:69:10: expected unsigned int [unsigned]
    net/netfilter/xt_NFQUEUE.c:69:10: got restricted unsigned int
    net/netfilter/xt_NFQUEUE.c:70:10: warning: incorrect type in assignment (different base types)
    net/netfilter/xt_NFQUEUE.c:70:10: expected unsigned int [unsigned]
    net/netfilter/xt_NFQUEUE.c:70:10: got restricted unsigned int
    net/netfilter/xt_NFQUEUE.c:71:10: warning: incorrect type in assignment (different base types)
    net/netfilter/xt_NFQUEUE.c:71:10: expected unsigned int [unsigned]
    net/netfilter/xt_NFQUEUE.c:71:10: got restricted unsigned int

    net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types)
    net/netfilter/xt_cluster.c:20:55: expected unsigned int
    net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip
    net/netfilter/xt_cluster.c:20:55: warning: incorrect type in return expression (different base types)
    net/netfilter/xt_cluster.c:20:55: expected unsigned int
    net/netfilter/xt_cluster.c:20:55: got restricted unsigned int const [usertype] ip

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • The RCU protected conntrack hash lookup only checks whether the entry
    has a refcount of zero to decide whether it is stale. This is not
    sufficient: entries are explicitly removed while there is at least
    one reference left, possibly more. Explicitly check whether the entry
    has been marked as dying to fix this.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • New connection tracking entries are inserted into the hash before they
    are fully set up, namely the CONFIRMED bit is not set and the timer not
    started yet. This can theoretically lead to a race with timer, which
    would set the timeout value to a relative value, most likely already in
    the past.

    Perform hash insertion as the final step to fix this.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • death_by_timeout() might delete a conntrack from hash list
    and insert it in dying list.

    nf_ct_delete_from_lists(ct);
    nf_ct_insert_dying_list(ct);

    I believe a (lockless) reader could *catch* ct while doing a lookup
    and miss the end of its chain.
    (nulls lookup algo must check the null value at the end of lookup and
    should restart if the null value is not the expected one.
    cf Documentation/RCU/rculist_nulls.txt for details)

    We need to change nf_conntrack_init_net() and use a different "null" value,
    guaranteed not to be used in regular lists. Choose very large values, since
    the hash table uses [0..size-1] null values.

    Signed-off-by: Eric Dumazet
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Eric Dumazet
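
The end-of-chain check described above can be modelled in a small userspace toy. It assumes the usual "nulls" encoding, where a chain ends in an odd value that carries a list identifier instead of NULL. All `toy_` names and the `TOY_DYING_NULLS` value are hypothetical. The sketch is single-threaded and only shows how a reader can tell that its walk ended on a different list than the one it started on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A chain link is either a node pointer (low bit 0) or a "nulls"
 * marker (low bit 1) whose upper bits identify the list. */
struct toy_node {
    uintptr_t next;
    int key;
};

/* Illustrative large marker for the dying list, outside [0..size-1]. */
#define TOY_DYING_NULLS (1UL << 30)

static uintptr_t toy_make_nulls(unsigned long v) { return ((uintptr_t)v << 1) | 1; }
static bool toy_is_nulls(uintptr_t p)            { return (p & 1) != 0; }
static unsigned long toy_nulls_value(uintptr_t p) { return (unsigned long)(p >> 1); }

/* Walk one chain. Returns the matching node, or NULL with *end_value
 * set to the nulls value found at the end of the walk. */
static struct toy_node *toy_walk_chain(uintptr_t first, int key,
                                       unsigned long *end_value)
{
    uintptr_t p = first;

    while (!toy_is_nulls(p)) {
        struct toy_node *n = (struct toy_node *)p;
        if (n->key == key)
            return n;
        p = n->next;
    }
    *end_value = toy_nulls_value(p);
    return NULL;
}

/* A miss is only trustworthy if the walk ended on the bucket it started in;
 * otherwise a lockless reader should restart the lookup. */
static bool toy_walk_ended_in_bucket(unsigned long end_value, unsigned long bucket)
{
    return end_value == bucket;
}
```

If the dying list reused a value in [0..size-1], a reader that followed a moved entry onto that list could mistake its end marker for the end of a regular bucket and report a false miss.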
     

21 Jun, 2009

1 commit


20 Jun, 2009

2 commits

  • David S. Miller
     
  • Don't drop route if we're not caching

    I recently got a report of an oops on a route lookup. Maxime was
    testing what would happen if route caching was turned off (doing so by
    making rt_caching always return 0), and found that it triggered an oops. I
    looked at it and found that the problem stemmed from the fact that the route
    lookup routines were returning success from their lookup paths (which is good),
    but never set the **rp pointer to anything (which is bad). This happens because
    in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
    This almost emulates silent success. What we should be doing is assigning *rp =
    rt and _not_ dropping the route. This way, during slow path lookups, when we
    create a new route cache entry, we don't immediately discard it, rather we just
    don't add it into the cache hash table, but we let this one lookup use it for
    the purpose of this route request. Maxime has tested and reports it prevents
    the oops. There is still a subsequent routing issue that I'm looking into
    further, but I'm confident that, even if it's related to this same path, this
    patch makes sense to take.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
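
The contract described in this message (a successful return must leave *rp pointing at a usable route, whether or not the route was hashed) can be sketched with a toy model. None of this is the kernel's rt_intern_hash; the struct, the single-slot "table", and the `toy_` names are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a route cache entry. */
struct toy_route {
    int hashed;   /* 1 once the entry has been placed in the cache table */
};

/* Toy version of interning a freshly built route.
 * table_slot models the hash bucket the route would go into. */
static int toy_intern_route(struct toy_route **table_slot,
                            struct toy_route *rt,
                            struct toy_route **rp,
                            int caching)
{
    if (!caching) {
        /* Do not hash the entry, but still hand it to the caller so
         * this one lookup can use it. Dropping it here and returning 0
         * would report success with *rp left unset. */
        *rp = rt;
        return 0;
    }

    *table_slot = rt;
    rt->hashed = 1;
    *rp = rt;
    return 0;
}
```

In both branches a return value of 0 now implies that `*rp` is valid, which is the property the oops report showed was missing.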
     

19 Jun, 2009

12 commits

  • When I disallowed interfering with stations on non-AP interfaces,
    I not only forgot mesh but also managed interfaces which need
    this for the authorized flag. Let's actually validate everything
    properly.

    This fixes an nl80211 regression introduced by that restriction,
    under which wpa_supplicant -Dnl80211 could not properly connect.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • Mesh Point interfaces can also set parameters, for example plink_open is
    used to manually establish peer links from user-space (currently via
    iw). Add Mesh Point to the check in nl80211_set_station.

    Signed-off-by: Andrey Yurovsky
    Signed-off-by: John W. Linville

    Andrey Yurovsky
     
  • Commit b2a151a288 added a check that prevents adding or deleting
    stations on non-AP interfaces. Adding and deleting stations is
    supported for Mesh Point interfaces, so add Mesh Point to that check as
    well.

    Signed-off-by: Andrey Yurovsky
    Signed-off-by: John W. Linville

    Andrey Yurovsky
     
  • This information allows userspace to implement a hybrid policy where
    it can store the rfkill soft-blocked state in platform non-volatile
    storage if available, and if not then file-based storage can be used.

    Some users prefer platform non-volatile storage because of the behaviour
    when dual-booting multiple versions of Linux, or if the rfkill setting
    is changed in the BIOS setting screens, or if the BIOS responds to
    wireless-toggle hotkeys itself before the relevant platform driver has
    been loaded.

    Signed-off-by: Alan Jenkins
    Acked-by: Henrique de Moraes Holschuh
    Signed-off-by: John W. Linville

    Alan Jenkins
     
  • The setting of the "persistent" flag is also made more explicit using
    a new rfkill_init_sw_state() function, instead of special-casing
    rfkill_set_sw_state() when it is called before registration.

    Suspend is a bit of a corner case so we try to get away without adding
    another hack to rfkill-input - it's going to be removed soon.
    If the state does change over suspend, users will simply have to prod
    rfkill-input twice in order to toggle the state.

    Userspace policy agents will be able to implement a more consistent user
    experience. For example, they can avoid the above problem if they
    toggle devices individually. Then there would be no "global state"
    to get out of sync.

    Currently there are only two rfkill drivers with persistent soft-blocked
    state. thinkpad-acpi already checks the software state on resume.
    eeepc-laptop will require modification.

    Signed-off-by: Alan Jenkins
    CC: Marcel Holtmann
    Acked-by: Henrique de Moraes Holschuh
    Signed-off-by: John W. Linville

    Alan Jenkins
     
  • If we return after fiddling with the state, userspace will see the
    wrong state and rfkill_set_sw_state() won't work until the next call to
    rfkill_set_block(). At the moment rfkill_set_block() will always be
    called from rfkill_resume(), but this will change in future.

    Also, presumably the point of this test is to avoid bothering devices
    which may be suspended. If we don't want to call set_block(), we
    probably don't want to call query() either :-).

    Signed-off-by: Alan Jenkins
    Signed-off-by: John W. Linville

    Alan Jenkins
     
  • Use print_hex_dump_bytes instead of self-written dumping function
    for outputting packet dumps.

    Signed-off-by: Dmitry Eremin-Solenikov
    Signed-off-by: David S. Miller

    Dmitry Baryshkov
     
  • If the iucv message limit for a communication path is exceeded,
    sendmsg() returns -EAGAIN instead of -EPIPE.
    The calling application can then handle this error situation,
    e.g. to try again after waiting some time.

    For blocking sockets, sendmsg() waits up to the socket timeout
    before returning -EAGAIN. For the new wait condition, a macro
    has been introduced and the iucv_sock_wait_state() has been
    refactored to this macro.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
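
From the application side, handling the -EAGAIN case "by trying again after waiting some time" could look roughly like the sketch below. It is generic userspace C rather than anything AF_IUCV-specific: the sender is passed in as a callback so the retry logic can be shown without a real socket, and `toy_send_with_retry`, `toy_flaky_path`, and `toy_flaky_send` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* A send-like callback: returns bytes sent, or -1 with errno set. */
typedef long (*toy_send_fn)(void *ctx, const void *buf, size_t len);

/* Retry while the sender reports EAGAIN, up to max_tries attempts.
 * wait_hook (may be NULL) is called between attempts, e.g. to sleep. */
static long toy_send_with_retry(toy_send_fn send_once, void *ctx,
                                const void *buf, size_t len,
                                int max_tries, void (*wait_hook)(void))
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        long n = send_once(ctx, buf, len);

        if (n >= 0 || errno != EAGAIN)
            return n;            /* success, or an error we do not retry */
        if (wait_hook)
            wait_hook();
    }
    errno = EAGAIN;              /* still over the limit after all tries */
    return -1;
}

/* Toy sender that fails with EAGAIN a fixed number of times. */
struct toy_flaky_path {
    int failures_left;
};

static long toy_flaky_send(void *ctx, const void *buf, size_t len)
{
    struct toy_flaky_path *p = ctx;

    (void)buf;
    if (p->failures_left > 0) {
        p->failures_left--;
        errno = EAGAIN;
        return -1;
    }
    return (long)len;
}
```

A real caller would wrap sendmsg() in the callback and pass a hook that sleeps or polls for writability; other errors are returned unchanged.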
     
  • Change the if condition to exit sendmsg() if the socket is not connected.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hendrik Brueckner
     
  • Currently, the sunrpc server is refusing to allow us to process new RPC
    calls if the TCP send buffer is 2/3 full, even if we do actually have
    enough free space to guarantee that we can send another request.
    The following patch fixes svc_tcp_has_wspace() so that we only stop
    processing requests if we know that the socket buffer cannot possibly fit
    another reply.

    It also fixes the tcp write_space() callback so that we only clear the
    SOCK_NOSPACE flag when the TCP send buffer is less than 2/3 full.
    This should ensure that the send window will grow as per the standard TCP
    socket code.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
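
The two conditions described above can be written down as plain arithmetic. This is a sketch under stated assumptions, not the actual svc_tcp_has_wspace() or write_space() code: the parameter names (`sndbuf`, `queued`, `reserved`, `max_reply`) and the `toy_` function names are hypothetical, and the real accounting in the sunrpc server may differ in detail.

```c
#include <stdbool.h>

/* Keep processing requests as long as the free part of the send buffer
 * can hold what is already promised plus one more maximum-size reply. */
static bool toy_has_wspace(long sndbuf, long queued, long reserved, long max_reply)
{
    long free_space = sndbuf - queued;

    return free_space >= reserved + max_reply;
}

/* Only clear the "no space" indication once the buffer has drained to
 * less than two thirds full (integer form of queued / sndbuf < 2/3). */
static bool toy_may_clear_nospace(long sndbuf, long queued)
{
    return queued * 3 < sndbuf * 2;
}
```

For instance, with a 300-byte buffer that has 250 bytes queued, a 40-byte reply still fits even though the buffer is more than 2/3 full, which is the case the old check refused.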
     
  • Conflicts:
    fs/nfs/client.c
    fs/nfs/super.c

    Trond Myklebust
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
    netxen: fix tx ring accounting
    netxen: fix detection of cut-thru firmware mode
    forcedeth: fix dma api mismatches
    atm: sk_wmem_alloc initial value is one
    net: correct off-by-one write allocations reports
    via-velocity : fix no link detection on boot
    Net / e100: Fix suspend of devices that cannot be power managed
    TI DaVinci EMAC : Fix rmmod error
    net: group address list and its count
    ipv4: Fix fib_trie rebalancing, part 2
    pkt_sched: Update drops stats in act_police
    sky2: version 1.23
    sky2: add GRO support
    sky2: skb recycling
    sky2: reduce default transmit ring
    sky2: receive counter update
    sky2: fix shutdown synchronization
    sky2: PCI irq issues
    sky2: more receive shutdown
    sky2: turn off pause during shutdown
    ...

    Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck

    Linus Torvalds
     

18 Jun, 2009

11 commits

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    This broke net/atm since this protocol assumed a null
    initial value. This patch makes necessary changes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take into account this offset when reporting
    sk_wmem_alloc to user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
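
The offset mentioned in the two sk_wmem_alloc messages above can be pictured with a toy counter that starts at one instead of zero. The struct and function names are hypothetical, and a plain int stands in for the kernel's atomic counter.

```c
/* Toy socket with a write-memory counter biased by one. */
struct toy_sock {
    int wmem_alloc;
};

/* The counter starts at 1, so the socket itself holds one unit. */
static void toy_sock_init(struct toy_sock *sk)
{
    sk->wmem_alloc = 1;
}

/* Charge bytes queued for transmission to the socket. */
static void toy_sock_charge(struct toy_sock *sk, int bytes)
{
    sk->wmem_alloc += bytes;
}

/* Value to report to users: subtract the initial bias so an idle
 * socket shows 0 queued bytes rather than 1. */
static int toy_wmem_alloc_get(const struct toy_sock *sk)
{
    return sk->wmem_alloc - 1;
}
```

Code that assumed an initial value of zero, or that printed the raw counter, would be off by one after the change to the initial value.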
     
  • This patch is inspired by a patch recently posted by Johannes Berg. Basically, what
    my patch does is group the list and the count of addresses into a newly introduced
    structure, netdev_hw_addr_list. This brings us two benefits:
    1) struct net_device becomes a bit nicer.
    2) In the future it will be possible to operate on lists independently
    of netdevices (by exporting the right functions).
    I wanted to introduce this patch before I post a multicast lists conversion.

    Signed-off-by: Jiri Pirko

    drivers/net/bnx2.c              |    4 +-
    drivers/net/e1000/e1000_main.c  |    4 +-
    drivers/net/ixgbe/ixgbe_main.c  |    6 +-
    drivers/net/mv643xx_eth.c       |    2 +-
    drivers/net/niu.c               |    4 +-
    drivers/net/virtio_net.c        |   10 ++--
    drivers/s390/net/qeth_l2_main.c |    2 +-
    include/linux/netdevice.h       |   17 +++--
    net/core/dev.c                  |  130 ++++++++++++++++++--------------------
    9 files changed, 89 insertions(+), 90 deletions(-)

    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • My previous patch, which explicitly delays freeing of tnodes by adding
    them to a list and flushing them after the update is finished, isn't
    strict enough. It makes an exception for tnodes without a parent, assuming
    they are newly created and so not yet visible to the read side.

    But the top tnode doesn't have a parent either, so we have to drop
    all exceptions (at least until a better way is found). Additionally, we
    need to move the RCU assignment of this node before flushing, so the
    return type of the trie_rebalance() function is changed.

    Reported-by: Yan Zheng
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Action police statistics could be misleading because drops are not
    shown when expected.

    With feedback from: Jamal Hadi Salim

    Reported-by: Pawel Staszewski
    Signed-off-by: Jarek Poplawski
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
    value. The places where skb is expanded add an offset which would
    change this flag into an invalid pointer (or offset).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Looking at the crash in log_martians(), one suspect is that the check for
    mac header being set is not correct. The value of mac_header defaults to
    0 on allocation, therefore skb_mac_header_was_set will always be true on
    platforms using NET_SKBUFF_USES_OFFSET.

    Signed-off-by: Stephen Hemminger
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Stephen Hemminger
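
The two mac_header messages above both hinge on the sentinel value used when headers are stored as offsets. The toy below assumes ~0U is the "unset" marker, as the first message suggests. It shows why a zero-initialised field looks "set", and why an offset adjustment should skip the sentinel. The `toy_` names are hypothetical and this is not the skbuff code itself.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAC_HEADER_UNSET (~0U)

/* Toy buffer descriptor storing the MAC header as an offset. */
struct toy_skb {
    unsigned int mac_header;
};

static bool toy_mac_header_was_set(const struct toy_skb *skb)
{
    return skb->mac_header != TOY_MAC_HEADER_UNSET;
}

/* Zeroing alone leaves mac_header == 0, which the check reads as "set". */
static void toy_skb_init_zeroed(struct toy_skb *skb)
{
    memset(skb, 0, sizeof(*skb));
}

/* Initialising to the sentinel makes the check meaningful. */
static void toy_skb_init_unset(struct toy_skb *skb)
{
    memset(skb, 0, sizeof(*skb));
    skb->mac_header = TOY_MAC_HEADER_UNSET;
}

/* When the buffer grows by off bytes at the front, adjust the offset
 * only if it is real; adding to the sentinel would turn it into a
 * bogus but valid-looking offset. */
static void toy_skb_shift_headers(struct toy_skb *skb, unsigned int off)
{
    if (toy_mac_header_was_set(skb))
        skb->mac_header += off;
}
```

After `toy_skb_init_unset()`, shifting leaves the field at the sentinel, so later checks still report the header as not set.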
     
  • Trond Myklebust
     
  • The 'rq_received' member of 'struct rpc_rqst' is used to track when we
    have received a reply to our request. With v4.1, the backchannel
    can now accept callback requests over the existing connection. Rename
    this field to make it clear that it is only used for tracking reply bytes
    and not all bytes received on the connection.

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy

    Ricardo Labiaga
     
  • Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies
    can both use the same code path. A client needs the rpc_xprt in order
    to reply to a callback.

    Signed-off-by: Rahul Iyer
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy

    Rahul Iyer
     
  • Signed-off-by: Benny Halevy

    Benny Halevy