11 Aug, 2011

5 commits

  • As rt_iif represents the input device even for packets
    coming from loopback with an output route, it is not a unique
    key specific to input routes. rt_route_iif now has that role
    (it was fl.iif in 2.6.38), so it is better to change the checks
    in some places to save CPU cycles and to restore 2.6.38 semantics.

    compare_keys:
    - input routes: only rt_route_iif matters, rt_iif is same
    - output routes: only rt_oif matters, rt_iif is not
    used for matching in __ip_route_output_key
    - now we are back to 2.6.38 state

    ip_route_input_common:
    - matching rt_route_iif implies input route
    - compared to 2.6.38 we eliminated one rth->fl.oif check
    because it was not needed even for 2.6.38

    compare_hash_inputs:
    The change here is the only one that is not an optimization; it
    affects only output routes. I assume I'm restoring
    the original intention to ignore oif, it was using fl.iif
    - now we are back to 2.6.38 state

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Free the locally allocated table and newinfo as done in adjacent error
    handling code.

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • Call cipso_v4_doi_putdef in the case of the failure of the allocation of
    entry. Reverse the order of the error handling code at the end of the
    function and insert more labels in order to reduce the number of
    unnecessary calls to kfree.

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
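
The reverse-order, multi-label error handling described in the entry above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patched function: `struct doi_def`, `doi_get()`, `doi_put()` and `add_mapping()` are hypothetical stand-ins, with `doi_put()` playing the role that cipso_v4_doi_putdef plays in the commit message and `free()` standing in for kfree.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a DOI definition. */
struct doi_def { int refcount; };

static struct doi_def *doi_get(struct doi_def *d) { d->refcount++; return d; }
static void doi_put(struct doi_def *d) { d->refcount--; }

/*
 * Take a reference, then make two allocations. Each failure jumps to the
 * label that releases only what has been acquired so far; the labels are
 * listed in the reverse order of acquisition.
 * fail_at forces a failure at step 1 or 2 for demonstration; 0 means success.
 */
static int add_mapping(struct doi_def *def, int fail_at)
{
	int ret = -ENOMEM;
	void *entry, *addrmap;

	doi_get(def);

	entry = (fail_at == 1) ? NULL : malloc(32);
	if (entry == NULL)
		goto out_put;          /* nothing to free yet, only drop the ref */

	addrmap = (fail_at == 2) ? NULL : malloc(32);
	if (addrmap == NULL)
		goto out_free_entry;   /* free entry, then drop the ref */

	/*
	 * Success. A real caller would keep these objects; the sketch
	 * releases everything so that the demonstration does not leak.
	 */
	free(addrmap);
	free(entry);
	doi_put(def);
	return 0;

out_free_entry:
	free(entry);
out_put:
	doi_put(def);
	return ret;
}
```

With separate labels, the first failure path never calls `free()` on a pointer that was not allocated, which is the "unnecessary calls to kfree" point made in the message.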
     
  • This patch corrects an erroneous update of credential's gid with uid
    introduced in commit 257b5358b32f17 since 2.6.36.

    Signed-off-by: Tim Chen
    Acked-by: Eric Dumazet
    Reviewed-by: James Morris
    Signed-off-by: David S. Miller

    Tim Chen
     
  • With gcc 4.4.3, warnings are emitted for a possibly uninitialized use
    of ecn_ok.

    This can happen if cookie_check_timestamp() returns due to not having
    seen a timestamp. Defaulting to ecn off seems like a reasonable thing
    to do in this case, so initialize ecn_ok to false.

    Signed-off-by: Mike Waychison
    Signed-off-by: David S. Miller

    Mike Waychison
     

10 Aug, 2011

2 commits

  • commit 07bd8df5df4369487812bf85a237322ff3569b77
    (sch_sfq: fix peek() implementation) changed sfq to use generic
    peek helper.

    This makes HFSC complain about a non-work-conserving child qdisc, if
    prio with sfq child is used within hfsc:

    hfsc peeks into prio qdisc, which will then peek into sfq.
    returned skb is stashed in sch->gso_skb.

    Next, hfsc tries to dequeue from prio, but prio will call sfq dequeue
    directly, which may return NULL instead of previously peeked-at skb.

    Have prio call qdisc_dequeue_peeked, so sfq->dequeue() is
    not called in this case.

    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
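
The peek/dequeue mismatch described above can be modelled with a toy queue. This is a sketch under assumptions, not kernel code: `struct toy_qdisc` and the `toy_*` functions are hypothetical, and the `peeked` field only plays the role that the message assigns to sch->gso_skb.

```c
#include <stddef.h>

/* Toy child queue: a fixed array plus a cursor. All names are hypothetical. */
struct toy_qdisc {
	int items[4];
	int head, len;
	int *peeked;      /* stash, playing the role of sch->gso_skb */
};

/* The child's own dequeue: pops the next item, or NULL when empty. */
static int *toy_dequeue(struct toy_qdisc *q)
{
	if (q->head == q->len)
		return NULL;
	return &q->items[q->head++];
}

/* Generic peek: dequeue once and remember the result in the stash. */
static int *toy_peek(struct toy_qdisc *q)
{
	if (q->peeked == NULL)
		q->peeked = toy_dequeue(q);
	return q->peeked;
}

/* Peek-aware dequeue: hand back the stashed item first, if there is one. */
static int *toy_dequeue_peeked(struct toy_qdisc *q)
{
	int *item = q->peeked;

	if (item != NULL) {
		q->peeked = NULL;
		return item;
	}
	return toy_dequeue(q);
}
```

After a `toy_peek()` on a queue holding a single item, calling `toy_dequeue()` directly returns NULL because the item already sits in the stash, while `toy_dequeue_peeked()` returns the item that was peeked. That is the behaviour difference the fix relies on.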
     
  • This ensures the neighbor entries associated with the bridge
    dev are flushed, also invalidating the associated cached L2 headers.

    This means we can br_add_if/br_del_if ports to implement hand-over and
    not wind up with bridge packets going out with stale MAC.

    This means we can also change MAC of port device and also not wind
    up with bridge packets going out with stale MAC.

    This builds on Stephen Hemminger's patch, also handling the br_del_if
    case and the port MAC change case.

    Cc: Stephen Hemminger
    Signed-off-by: Andrei Warkentin
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Andrei Warkentin
     

08 Aug, 2011

6 commits

  • Make sure skb dst has reference when moving to
    another context. Currently, I don't see protocols that can
    hit it when sending broadcasts/multicasts to loopback using
    noref dsts, so it is just a precaution.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • The raw sockets can provide source address for
    routing but their privileges are not considered. We
    can provide non-local source address, make sure the
    FLOWI_FLAG_ANYSRC flag is set if socket has privileges
    for this, i.e. based on hdrincl (IP_HDRINCL) and
    transparent flags.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • TCP in some cases uses different global (raw) socket
    to send RST and ACK. The transparent flag is not set there.
    Currently, it is a problem for rerouting after the previous
    change.

    Fix it by simplifying the checks in ip_route_me_harder
    and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe
    because the initial routing allowed this source address to
    be used and now we just have to make sure the packet is rerouted.

    As a side effect this also allows rerouting for normal
    raw sockets that use spoofed source addresses which was not possible
    even before we eliminated the ip_route_input call.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • IP_PKTOPTIONS is broken for 32-bit applications running
    in COMPAT mode on 64-bit kernels.

    This happens because msghdr's msg_flags field is always
    set to zero. When running in COMPAT mode this should be
    set to MSG_CMSG_COMPAT instead.

    Signed-off-by: Tiberiu Szocs-Mihai
    Signed-off-by: Daniel Baluta
    Signed-off-by: David S. Miller

    Daniel Baluta
     
  • compare_keys and ip_route_input_common rely on
    rt_oif for distinguishing of input and output routes
    with same keys values. But sometimes the input route has
    also same hash chain (keyed by iif != 0) with the output
    routes (keyed by orig_oif=0). Problem visible if running
    with small number of rhash_entries.

    Fix them to use rt_route_iif instead. This way, an
    input route cannot be returned to users that request
    an output route.

    The patch fixes the ip_rt_bug errors that were
    reported in ip_local_out context, mostly for 255.255.255.255
    destinations.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • NF_STOLEN means skb was already freed

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

07 Aug, 2011

1 commit

  • Computers have become a lot faster since we compromised on the
    partial MD4 hash which we use currently for performance reasons.

    MD5 is a much safer choice, and is in line with both RFC1948 and
    other ISS generators (OpenBSD, Solaris, etc.)

    Furthermore, only having 24-bits of the sequence number be truly
    unpredictable is a very serious limitation. So the periodic
    regeneration and 8-bit counter have been removed. We compute and
    use a full 32-bit sequence number.

    For ipv6, DCCP was found to use a 32-bit truncated initial sequence
    number (it needs 43-bits) and that is fixed here as well.

    Reported-by: Dan Kaminsky
    Tested-by: Willy Tarreau
    Signed-off-by: David S. Miller

    David S. Miller
     

05 Aug, 2011

4 commits

  • When support for binding to 'mapped INADDR_ANY (::ffff:0.0.0.0)' was added
    in 0f8d3c7ac3693d7b6c731bf2159273a59bf70e12, the rest of the code
    wasn't told, so now it's possible to bind an IPv6 datagram socket to
    ::ffff:0.0.0.0, connect it to another IPv4 address and it will all
    work except for getsockname(), which does not return the local address
    as expected.

    To give getsockname() something to work with check for 'mapped INADDR_ANY'
    when connecting and update the in-core source addresses appropriately.

    Signed-off-by: Max Matveev
    Signed-off-by: David S. Miller

    Max Matveev
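
For readers unfamiliar with the address in question, a userspace test for the IPv4-mapped wildcard might look like the sketch below. The helper name `is_mapped_inaddr_any()` is hypothetical and this is not the check the kernel patch uses; it only shows the byte layout of the address (ten zero bytes, two 0xff bytes, then four zero bytes).

```c
#include <netinet/in.h>
#include <string.h>

/* Returns nonzero if addr is the IPv4-mapped wildcard address. */
static int is_mapped_inaddr_any(const struct in6_addr *addr)
{
	static const unsigned char mapped_any[16] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff, 0, 0, 0, 0
	};

	return memcmp(addr->s6_addr, mapped_any, sizeof(mapped_any)) == 0;
}
```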
     
  • The sendmmsg() introduced by commit 228e548e "net: Add sendmmsg socket system
    call" is capable of sending to multiple different destination addresses.

    SMACK is using destination's address for checking sendmsg() permission.
    However, security_socket_sendmsg() is called only once even if multiple
    different destination addresses are passed to sendmmsg().

    Therefore, we need to call security_socket_sendmsg() for each destination
    address rather than only the first destination address.

    Since calling security_socket_sendmsg() every time when only single destination
    address was passed to sendmmsg() is a waste of time, omit calling
    security_socket_sendmsg() unless destination address of previous datagram and
    that of current datagram differs.

    Signed-off-by: Tetsuo Handa
    Acked-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Tetsuo Handa
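
The "check only when the destination changes" idea from the entry above can be sketched as follows. Everything here is an assumption made for illustration: `struct last_dest`, `check_dest()` and `security_check_stub()` are hypothetical names, and the stub merely counts calls where the kernel would invoke security_socket_sendmsg().

```c
#include <stddef.h>
#include <string.h>

#define ADDR_MAX 128

/* Remembers the destination of the previous datagram. Hypothetical type. */
struct last_dest {
	unsigned char addr[ADDR_MAX];
	size_t len;
	int valid;
};

static int checks_run;   /* counts how often the stub hook was invoked */

/* Stub standing in for a per-message security hook; it always allows. */
static int security_check_stub(const void *addr, size_t len)
{
	(void)addr;
	(void)len;
	checks_run++;
	return 0;
}

/*
 * Run the check only if this destination differs from the previous one.
 * Returns 0 if sending may proceed, nonzero otherwise.
 */
static int check_dest(struct last_dest *last, const void *addr, size_t len)
{
	int err;

	if (len > ADDR_MAX)
		return -1;
	if (last->valid && last->len == len &&
	    memcmp(last->addr, addr, len) == 0)
		return 0;             /* same destination: already checked */

	err = security_check_stub(addr, len);
	if (err)
		return err;

	/* Remember the destination that has just been approved. */
	memcpy(last->addr, addr, len);
	last->len = len;
	last->valid = 1;
	return 0;
}
```

Only the immediately preceding destination is remembered, so a batch that alternates between two addresses is checked on every message, while a batch sent to a single address is checked once.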
     
  • To limit the amount of time we can spend in sendmmsg, cap the
    number of elements to UIO_MAXIOV (currently 1024).

    For error handling an application using sendmmsg needs to retry at
    the first unsent message, so capping is simpler and requires less
    application logic than returning EINVAL.

    Signed-off-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Anton Blanchard
     
  • sendmmsg uses a similar error return strategy as recvmmsg but it
    turns out to be a confusing way to communicate errors.

    The current code stores the error code away and returns it on the next
    sendmmsg call. This means a call with completely valid arguments could
    get an error from a previous call.

    Change things so we only return an error if no datagrams could be sent.
    If fewer than the requested number of messages were sent, the application
    must retry starting at the first failed one and if the problem is
    persistent the error will be returned.

    This matches the behaviour of other syscalls like read/write - it
    is not an error if fewer than the requested number of elements are sent.

    Signed-off-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Anton Blanchard
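
From the application side, the return convention described in the two sendmmsg entries above (short counts are not errors, an error is reported only when nothing was sent, and one call handles at most UIO_MAXIOV messages) suggests a retry loop like the sketch below. To keep the example self-contained it does not call sendmmsg() directly: `batch_send_fn`, `send_all()` and the demo `capped_send()` are hypothetical names, and in a real program the callback would wrap the system call.

```c
#include <errno.h>

/*
 * Hypothetical batch sender following the convention described above:
 * it tries to send messages [first, first + count) and returns how many
 * were sent, or -1 with errno set if none could be sent.
 */
typedef int (*batch_send_fn)(void *ctx, unsigned int first, unsigned int count);

/*
 * Send all `total` messages, resuming at the first unsent one after every
 * short count. Returns 0 on success, or -1 with errno set once a call
 * makes no progress at all.
 */
static int send_all(batch_send_fn send_batch, void *ctx, unsigned int total)
{
	unsigned int done = 0;

	while (done < total) {
		int n = send_batch(ctx, done, total - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;   /* interrupted before sending: retry */
			return -1;      /* persistent error at message `done` */
		}
		if (n == 0) {
			errno = EIO;    /* defensive: avoid looping forever */
			return -1;
		}
		done += (unsigned int)n;
	}
	return 0;
}

/* Demo sender: pretends that each call sends at most `cap` messages. */
struct capped_sender { unsigned int cap; unsigned int calls; };

static int capped_send(void *ctx, unsigned int first, unsigned int count)
{
	struct capped_sender *s = ctx;

	(void)first;
	s->calls++;
	return (int)(count < s->cap ? count : s->cap);
}
```

With a cap of 1024, sending 2500 messages through `send_all()` takes three calls (1024, 1024 and 452 messages). The caller needs no special case for the cap, which is the simplification the capping entry argues for.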
     

04 Aug, 2011

1 commit


03 Aug, 2011

2 commits


02 Aug, 2011

4 commits


01 Aug, 2011

2 commits

  • commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals)
    forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a
    packet (from another flow) was dropped, leading to various problems.

    With help from Michal Soltys and Michal Pokrywka, who did a bisection.

    Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372
    Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945

    Reported-by: Lucas Bocchi
    Reported-and-bisected-by: Michal Pokrywka
    Signed-off-by: Eric Dumazet
    CC: Michal Soltys
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Convert array index from the loop bound to the loop index.

    A simplified version of the semantic patch that fixes this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    expression e1,e2,ar;
    @@

    for(e1 = 0; e1 < e2; e1++) { }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
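
The class of bug named in the entry above (indexing with the loop bound instead of the loop index) looks like this in plain C. The function `clear_slots()` is a hypothetical example, not code from the patched driver.

```c
#include <stddef.h>

/* Zero the first n slots of table. */
static void clear_slots(int *table, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Buggy form: table[n] = 0; hits the same out-of-range slot
		 * on every iteration and never touches slots 0..n-1. */
		table[i] = 0;   /* fixed form: index with the loop variable */
	}
}
```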
     

29 Jul, 2011

3 commits


28 Jul, 2011

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
    tg3: Remove 5719 jumbo frames and TSO blocks
    tg3: Break larger frags into 4k chunks for 5719
    tg3: Add tx BD budgeting code
    tg3: Consolidate code that calls tg3_tx_set_bd()
    tg3: Add partial fragment unmapping code
    tg3: Generalize tg3_skb_error_unmap()
    tg3: Remove short DMA check for 1st fragment
    tg3: Simplify tx bd assignments
    tg3: Reintroduce tg3_tx_ring_info
    ASIX: Use only 11 bits of header for data size
    ASIX: Simplify condition in rx_fixup()
    Fix cdc-phonet build
    bonding: reduce noise during init
    bonding: fix string comparison errors
    net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
    net: add IFF_SKB_TX_SHARED flag to priv_flags
    net: sock_sendmsg_nosec() is static
    forcedeth: fix vlans
    gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
    gro: Only reset frag0 when skb can be pulled
    ...

    Linus Torvalds
     
  • After the last patch, we are left in a state in which only drivers calling
    ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
    hardware call ether_setup for their net_devices and don't hold any state in
    their skbs). There are a handful of drivers that violate this assumption, of
    course, and need to be fixed up. This patch identifies those drivers, and marks
    them as not being able to support the safe transmission of skbs by clearing the
    IFF_TX_SKB_SHARING flag in priv_flags.

    Signed-off-by: Neil Horman
    CC: Karsten Keil
    CC: "David S. Miller"
    CC: Jay Vosburgh
    CC: Andy Gospodarek
    CC: Patrick McHardy
    CC: Krzysztof Halasa
    CC: "John W. Linville"
    CC: Greg Kroah-Hartman
    CC: Marcel Holtmann
    CC: Johannes Berg
    Signed-off-by: David S. Miller

    Neil Horman
     
  • Pktgen attempts to transmit shared skbs to net devices, which can't be used by
    some drivers as they keep state information in skbs. This patch adds a flag
    marking drivers as being able to handle shared skbs in their tx path. Drivers
    are defaulted to being unable to do so, but calling ether_setup enables this
    flag, as 90% of the drivers calling ether_setup touch real hardware and can
    handle shared skbs. A subsequent patch will audit drivers to ensure that the
    flag is set properly

    Signed-off-by: Neil Horman
    Reported-by: Jiri Pirko
    CC: Robert Olsson
    CC: Eric Dumazet
    CC: Alexey Dobriyan
    CC: David S. Miller
    Signed-off-by: David S. Miller

    Neil Horman
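
The opt-in scheme described in the two entries above (off by default, switched on by ether_setup, cleared again by drivers that keep state in skbs) can be modelled in a few lines. This is a toy model under stated assumptions: `struct toy_netdev`, the `toy_*` functions and the bit value of `TOY_TX_SKB_SHARING` are all hypothetical and are not taken from the kernel headers.

```c
/* Hypothetical flag bit; the real flag is IFF_TX_SKB_SHARING in priv_flags. */
#define TOY_TX_SKB_SHARING 0x1u

struct toy_netdev { unsigned int priv_flags; };

/* Default state after allocation: shared buffers are not allowed. */
static void toy_alloc_setup(struct toy_netdev *dev)
{
	dev->priv_flags = 0;
}

/* Ethernet-style setup opts the device in. */
static void toy_ether_setup(struct toy_netdev *dev)
{
	dev->priv_flags |= TOY_TX_SKB_SHARING;
}

/* A driver that keeps per-buffer state opts back out after the setup. */
static void toy_stateful_driver_setup(struct toy_netdev *dev)
{
	toy_ether_setup(dev);
	dev->priv_flags &= ~TOY_TX_SKB_SHARING;
}

/* What a traffic generator would ask before sending shared buffers. */
static int toy_can_share(const struct toy_netdev *dev)
{
	return (dev->priv_flags & TOY_TX_SKB_SHARING) != 0;
}
```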
     
  • Signed-off-by: Eric Dumazet
    CC: Anton Blanchard
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David S. Miller
     
  • * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
    NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
    nfs: don't use d_move in nfs_async_rename_done
    RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
    SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
    SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
    SUNRPC: Support dynamic slot allocation for TCP connections
    SUNRPC: Clean up the slot table allocation
    SUNRPC: Initalise the struct xprt upon allocation
    SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
    pnfs: simplify pnfs files module autoloading
    nfs: document nfsv4 sillyrename issues
    NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
    SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
    SUNRPC: sunrpc should not explicitly depend on NFS config options
    NFS: Clean up - simplify the switch to read/write-through-MDS
    NFS: Move the pnfs write code into pnfs.c
    NFS: Move the pnfs read code into pnfs.c
    NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
    NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
    NFS: Cache rpc_ops in struct nfs_pageio_descriptor
    ...

    Linus Torvalds
     

27 Jul, 2011

4 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    merge fchmod() and fchmodat() guts, kill ancient broken kludge
    xfs: fix misspelled S_IS...()
    xfs: get rid of open-coded S_ISREG(), etc.
    vfs: document locking requirements for d_move, __d_move and d_materialise_unique
    omfs: fix (mode & S_IFDIR) abuse
    btrfs: S_ISREG(mode) is not mode & S_IFREG...
    ima: fmode_t misspelled as mode_t...
    pci-label.c: size_t misspelled as mode_t
    jffs2: S_ISLNK(mode & S_IFMT) is pointless
    snd_msnd ->mode is fmode_t, not mode_t
    v9fs_iop_get_acl: get rid of unused variable
    vfs: dont chain pipe/anon/socket on superblock s_inodes list
    Documentation: Exporting: update description of d_splice_alias
    fs: add missing unlock in default_llseek()

    Linus Torvalds
     
  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • …wireless-next-2.6 into for-davem

    John W. Linville
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    ceph: document unlocked d_parent accesses
    ceph: explicitly reference rename old_dentry parent dir in request
    ceph: document locking for ceph_set_dentry_offset
    ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
    ceph: protect d_parent access in ceph_d_revalidate
    ceph: protect access to d_parent
    ceph: handle racing calls to ceph_init_dentry
    ceph: set dir complete frag after adding capability
    rbd: set blk_queue request sizes to object size
    ceph: set up readahead size when rsize is not passed
    rbd: cancel watch request when releasing the device
    ceph: ignore lease mask
    ceph: fix ceph_lookup_open intent usage
    ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
    ceph: fix bad parent_inode calc in ceph_lookup_open
    ceph: avoid carrying Fw cap during write into page cache
    libceph: don't time out osd requests that haven't been received
    ceph: report f_bfree based on kb_avail rather than diffing.
    ceph: only queue capsnap if caps are dirty
    ceph: fix snap writeback when racing with writes
    ...

    Linus Torvalds