31 Oct, 2010

9 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    isdn: mISDN: socket: fix information leak to userland
    netdev: can: Change mail address of Hans J. Koch
    pcnet_cs: add new_id
    net: Truncate recvfrom and sendto length to INT_MAX.
    RDS: Let rds_message_alloc_sgs() return NULL
    RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
    RDS: Clean up error handling in rds_cmsg_rdma_args
    RDS: Return -EINVAL if rds_rdma_pages returns an error
    net: fix rds_iovec page count overflow
    can: pch_can: fix section mismatch warning by using a whitelisted name
    can: pch_can: fix sparse warning
    netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe
    ip_gre: fix fallback tunnel setup
    vmxnet: trivial annotation of protocol constant
    vmxnet3: remove unnecessary byteswapping in BAR writing macros
    ipv6/udp: report SndbufErrors and RcvbufErrors
    phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr

    Linus Torvalds
     
  • Signed-off-by: Linus Torvalds
    Signed-off-by: David S. Miller

    Linus Torvalds
     
  • Even with the previous fix, we still are reading the iovecs once
    to determine SGs needed, and then again later on. Preallocating
    space for sg lists as part of rds_message seemed like a good idea
    but it might be better to not do this. While working to redo that
    code, this patch attempts to protect against userspace rewriting
    the rds_iovec array between the first and second accesses.

    The consequences of this would be either a too-small or too-large
    sg list array. Too large is not an issue. This patch changes all
    callers of message_alloc_sgs to handle running out of preallocated
    sgs, and fail gracefully.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
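
The "fail gracefully" behaviour described above can be sketched in user space. This is only an illustration under assumed names: `struct message`, `struct sg_entry` and this `message_alloc_sgs()` are hypothetical stand-ins, not the kernel's RDS definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's RDS types. */
struct sg_entry {
    void *page;
    unsigned int len;
};

struct message {
    struct sg_entry *sgs;  /* array preallocated with the message */
    size_t total_sgs;      /* capacity fixed when the message was built */
    size_t used_sgs;       /* entries handed out so far (<= total_sgs) */
};

/*
 * Hand out nents entries from the preallocated array. Return NULL when
 * the request would run past what was reserved, so the caller can fail
 * the operation instead of writing beyond the array.
 */
struct sg_entry *message_alloc_sgs(struct message *m, size_t nents)
{
    if (nents == 0 || nents > m->total_sgs - m->used_sgs)
        return NULL;

    struct sg_entry *first = &m->sgs[m->used_sgs];
    m->used_sgs += nents;
    return first;
}
```

A caller that sized the array from a first read of user data and then sees NULL on the second pass would treat it as an error rather than continue.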
     
  • Change rds_rdma_pages to take a passed-in rds_iovec array instead
    of doing copy_from_user itself.

    Change rds_cmsg_rdma_args to copy rds_iovec array once only. This
    eliminates the possibility of userspace changing it after our
    sanity checks.

    Implement stack-based storage for small numbers of iovecs, based
    on net/socket.c, to save an alloc in the extremely common case.

    Although this patch reduces iovec copies in cmsg_rdma_args to 1,
    we still do another one in rds_rdma_extra_size. Getting rid of
    that one will be trickier, so it'll be a separate patch.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
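
A rough user-space sketch of the "copy once, use stack storage for small counts" idea follows. The names `struct iovec_desc`, `SMALL_IOVECS`, `snapshot_iovecs()` and `release_iovecs()` are assumptions made for illustration, and plain `memcpy()` stands in for the kernel's copy from userspace.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_IOVECS 8  /* illustrative threshold, not the kernel's value */

/* Hypothetical descriptor: an address and a byte count. */
struct iovec_desc {
    uint64_t addr;
    uint64_t bytes;
};

/*
 * Take one private snapshot of n descriptors. Small arrays land in the
 * caller's stack buffer; larger ones fall back to the heap. All later
 * checks and uses should read the snapshot, never src again.
 * Returns the snapshot, or NULL if the heap allocation fails.
 */
struct iovec_desc *snapshot_iovecs(const struct iovec_desc *src, size_t n,
                                   struct iovec_desc *stack_buf,
                                   size_t stack_cap)
{
    struct iovec_desc *dst = stack_buf;

    if (n > stack_cap) {
        /* calloc() rejects n * size overflow for us. */
        dst = calloc(n, sizeof(*dst));
        if (dst == NULL)
            return NULL;
    }
    memcpy(dst, src, n * sizeof(*dst));
    return dst;
}

/* Free the snapshot only if it did not fit in the stack buffer. */
void release_iovecs(struct iovec_desc *copy, struct iovec_desc *stack_buf)
{
    if (copy != stack_buf)
        free(copy);
}
```

Because validation and use both read the same snapshot, a later change to the source array cannot invalidate an earlier sanity check.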
     
  • We don't need to set ret = 0 at the end -- it's initialized to 0.

    Also, don't increment s_send_rdma stat if we're exiting with an
    error.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • rds_cmsg_rdma_args would still return success even if rds_rdma_pages
    returned an error (or overflowed).

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • As reported by Thomas Pollet, the rdma page counting can overflow. We
    get the rdma sizes in 64-bit unsigned entities, but then limit it to
    UINT_MAX bytes and shift them down to pages (so with a possible "+1" for
    an unaligned address).

    So each individual page count fits comfortably in an 'unsigned int' (not
    even close to overflowing into signed), but as they are added up, they
    might end up resulting in a signed return value. Which would be wrong.

    Catch the case of tot_pages turning negative, and return the appropriate
    error code.

    Reported-by: Thomas Pollet
    Signed-off-by: Linus Torvalds
    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Linus Torvalds
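
The overflow described above can be illustrated with a small user-space sketch. It assumes 4 KiB pages and a 32-bit `int`; `struct vec`, `pages_in_vec()` and `total_pages()` are hypothetical names. Note one deliberate difference: the commit message describes catching the total after it has turned negative, whereas this sketch checks before each addition.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical descriptor: 64-bit address and 64-bit byte count. */
struct vec {
    uint64_t addr;
    uint64_t bytes;
};

/* Pages spanned by [addr, addr + bytes); 0 means "reject this vec". */
static unsigned int pages_in_vec(const struct vec *v)
{
    if (v->bytes == 0 || v->bytes > UINT_MAX)
        return 0;
    if (v->addr + v->bytes < v->addr)  /* address range wraps around */
        return 0;

    uint64_t first = v->addr >> SKETCH_PAGE_SHIFT;
    uint64_t last = (v->addr + v->bytes - 1) >> SKETCH_PAGE_SHIFT;
    return (unsigned int)(last - first + 1);
}

/*
 * Sum the per-vec page counts. Each count fits easily in an unsigned
 * int, but the running total must also fit in the signed return type,
 * so refuse any addition that would push it past INT_MAX.
 */
int total_pages(const struct vec *vecs, unsigned int n)
{
    unsigned int tot = 0;

    for (unsigned int i = 0; i < n; i++) {
        unsigned int nr = pages_in_vec(&vecs[i]);

        if (nr == 0)
            return -EINVAL;
        if (nr > (unsigned int)INT_MAX - tot)
            return -EINVAL;
        tot += nr;
    }
    return (int)tot;
}
```

With these assumptions a single maximal vec spans 2^20 pages, so roughly two thousand of them are enough to exceed `INT_MAX`.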
     
  • Before making the fallback tunnel visible to lookups, we should make
    sure it is completely set up, that is, once ipgre_tunnel_init() has
    been called and the tstats per_cpu pointer allocated.

    Move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
    ipgre_fb_tunnel_init() to ipgre_init_net().

    Based on a patch from Pavel Emelyanov

    Reported-by: Pavel Emelyanov
    Signed-off-by: Eric Dumazet
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • commit a18135eb9389 (Add UDP_MIB_{SND,RCV}BUFERRORS handling.)
    forgot to make the necessary changes in net/ipv6/proc.c to report
    additional counters in /proc/net/snmp6

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Oct, 2010

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
    b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd
    mac80211: fix failure to check kmalloc return value in key_key_read
    libertas: Fix sd8686 firmware reload
    ath9k: Fix incorrect access of rate flags in RC
    netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
    stmmac: enable/disable rx/tx in the core with a single write.
    net: atarilance - flags should be unsigned long
    netxen: fix kdump
    pktgen: Limit how much data we copy onto the stack.
    net: Limit socket I/O iovec total length to INT_MAX.
    USB: gadget: fix ethernet gadget crash in gether_setup
    fib: Fix fib zone and its hash leak on namespace stop
    cxgb3: Fix panic in free_tx_desc()
    cxgb3: fix crash due to manipulating queues before registration
    8390: Don't oops on starting dev queue
    dccp ccid-2: Stop polling
    dccp: Refine the wait-for-ccid mechanism
    dccp: Extend CCID packet dequeueing interface
    dccp: Return-value convention of hc_tx_send_packet()
    igbvf: fix panic on load
    ...

    Linus Torvalds
     
  • David S. Miller
     
  • I noticed two small issues in mac80211/debugfs_key.c::key_key_read while
    reading through the code. Patch below.

    The key_key_read() function returns ssize_t and the value that's actually
    returned is the return value of simple_read_from_buffer() which also
    returns ssize_t, so let's hold the return value in an ssize_t local
    variable rather than an int one.

    Also, memory is allocated dynamically with kmalloc() which can fail, but
    the return value of kmalloc() is not checked, so we may end up operating
    on a null pointer further on. So check for a NULL return and bail out with
    -ENOMEM in that case.

    Signed-off-by: Jesper Juhl
    Signed-off-by: John W. Linville

    Jesper Juhl
     

29 Oct, 2010

10 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Otherwise error indications from ipv6_find_hdr() won't be noticed.

    This required making the protocol argument to extract_icmp6_fields()
    signed too.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    David S. Miller
     
  • A program that accidentally writes too much data to the pktgen file can overflow
    the kernel stack and oops the machine. This is only triggerable by root, so
    there's no security issue, but it's still an unfortunate bug.

    printk() won't print more than 1024 bytes in a single call anyway, so let's
    just never copy more than that much data. We're on a fairly shallow stack, so
    that should be safe even with CONFIG_4KSTACKS.

    Signed-off-by: Nelson Elhage
    Signed-off-by: David S. Miller

    Nelson Elhage
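
A minimal user-space sketch of bounding a copy into a fixed-size buffer is shown below. `DEBUG_COPY_MAX` and `bounded_copy()` are hypothetical names, and reserving one byte for a terminating NUL is a choice made for this sketch, not something the commit message states.

```c
#include <stddef.h>
#include <string.h>

#define DEBUG_COPY_MAX 1024  /* mirrors the 1024-byte limit mentioned above */

/*
 * Copy at most dst_size - 1 bytes of src into dst and NUL-terminate it,
 * however large count is. Returns the number of bytes actually copied.
 */
size_t bounded_copy(char *dst, size_t dst_size, const char *src, size_t count)
{
    if (dst_size == 0)
        return 0;

    size_t n = count < dst_size - 1 ? count : dst_size - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

The caller's buffer size, not the writer's `count`, decides how much stack is touched.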
     
  • This helps protect us from overflow issues down in the
    individual protocol sendmsg/recvmsg handlers. Once
    we hit INT_MAX we truncate out the rest of the iovec
    by setting the iov_len members to zero.

    This works because:

    1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
    writes are allowed and the application will just continue
    with another write to send the rest of the data.

    2) For datagram oriented sockets, where there must be a
    one-to-one correspondence between write() calls and
    packets on the wire, INT_MAX is going to be far larger
    than the packet size limit the protocol is going to
    check for and signal with -EMSGSIZE.

    Based upon a patch by Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
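
The truncation rule described above can be sketched in user space with the POSIX `struct iovec`. The function name `clamp_iovec_total()` is hypothetical, and the sketch assumes `size_t` is wider than `int`, as on typical 64-bit systems.

```c
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Walk the iovec array and keep the summed iov_len at or below INT_MAX.
 * The entry that crosses the limit is shortened; every entry after it
 * ends up with iov_len == 0. Returns the total length that was kept.
 */
int clamp_iovec_total(struct iovec *iov, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        size_t room = (size_t)INT_MAX - total;

        if (iov[i].iov_len > room)
            iov[i].iov_len = room;  /* 0 once the cap has been reached */
        total += iov[i].iov_len;
    }
    return (int)total;
}
```

Because the total can never exceed `INT_MAX`, handlers that store lengths in an `int` no longer see negative or wrapped values.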
     
  • When we stop a namespace we flush the table and free it, but the
    added fn_zone-s (and their hashes if grown) are leaked. They need to
    be freed. The trie code already releases all its data when flushing.

    Shame on us - this bug has existed since the very first
    make-fib-per-net patches in 2.6.27 :(

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • This updates CCID-2 to use the CCID dequeuing mechanism, converting from
    previous continuous-polling to a now event-driven mechanism.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This extends the packet dequeuing interface of dccp_write_xmit() to allow
    1. CCIDs to take care of timing when the next packet may be sent;
    2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

    The main purpose is to take CCID-2 out of its polling mode (when it is network-
    limited, it tries every millisecond to send, without interruption).

    The mode of operation for (2) is as follows:
    * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
    * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
    * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
    * dccp_write_xmit() returns without further action;
    * after some time the wait-condition for CCID becomes true,
    * that CCID schedules the tasklet,
    * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
    * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
    * packet is sent, and possibly more (since dccp_write_xmit() loops).

    Code reuse: the tasklet function calls dccp_write_xmit(), the timer function
    reduces to a wrapper around the same code.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch reorganises the return value convention of the CCID TX sending
    function, to permit more flexible schemes, as required by subsequent patches.

    Currently the convention is
    * values < 0 mean error,
    * a value == 0 means "send now", and
    * a value x > 0 means "send in x milliseconds".

    The patch provides symbolic constants and a function to interpret return values.

    In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
    corresponding to 65.535 seconds. This is possible since in CCID-3/4 the
    maximum possible inter-packet gap is fixed at t_mbi = 64 sec.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
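
The return-value convention listed above could be interpreted along the following lines. This is a sketch only: `enum tx_decision`, `TX_MAX_DELAY_MS` and `interpret_tx_rc()` are hypothetical names chosen for illustration and are not the symbolic constants or helper that the patch itself adds.

```c
/* Hypothetical names; the patch's own constants are not shown in the log. */
enum tx_decision {
    TX_ERROR,       /* rc < 0  */
    TX_SEND_NOW,    /* rc == 0 */
    TX_SEND_LATER   /* rc > 0: send in *delay_ms milliseconds */
};

#define TX_MAX_DELAY_MS 0xFFFF  /* 65.535 seconds, the cap described above */

/*
 * Classify a raw return code. For the "send later" case the delay is
 * written to *delay_ms, capped at TX_MAX_DELAY_MS.
 */
enum tx_decision interpret_tx_rc(int rc, unsigned int *delay_ms)
{
    *delay_ms = 0;

    if (rc < 0)
        return TX_ERROR;
    if (rc == 0)
        return TX_SEND_NOW;

    *delay_ms = rc > TX_MAX_DELAY_MS ? TX_MAX_DELAY_MS : (unsigned int)rc;
    return TX_SEND_LATER;
}
```

Keeping the interpretation in one function means callers never compare raw integers against magic values.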
     

28 Oct, 2010

18 commits

  • This patch ensures that a read(fd, NULL, 10) returns EFAULT on a 9p file.

    Signed-off-by: Sanchit Garg
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Sanchit Garg
     
  • SYNOPSIS
    size[4] Tfsync tag[2] fid[4] datasync[4]

    size[4] Rfsync tag[2]

    DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    file identified by fid to the disk device (or other permanent storage
    device) where that file resides.

    If the datasync flag is specified, data will be flushed, but modified
    metadata will not be flushed unless that metadata is needed in order to
    allow a subsequent data retrieval to be correctly handled.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
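
As a rough illustration of the wire layout in the synopsis above, the sketch below packs a Tfsync request into a byte buffer. It assumes the usual 9P framing, in which the message name in the synopsis occupies a single type byte after size[4] and all integers are little-endian. The numeric type code is supplied by the caller because the commit message does not state it; `put_le()` and `encode_tfsync()` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* size[4] + type[1] + tag[2] + fid[4] + datasync[4] */
#define TFSYNC_LEN 15

/* Store the low nbytes of v at p in little-endian order. */
static size_t put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return nbytes;
}

/*
 * Encode a Tfsync request into buf. Returns the number of bytes
 * written, or 0 if buf is too small.
 */
size_t encode_tfsync(uint8_t *buf, size_t cap, uint8_t type,
                     uint16_t tag, uint32_t fid, uint32_t datasync)
{
    if (cap < TFSYNC_LEN)
        return 0;

    size_t off = 0;
    off += put_le(buf + off, TFSYNC_LEN, 4);  /* size counts itself */
    off += put_le(buf + off, type, 1);
    off += put_le(buf + off, tag, 2);
    off += put_le(buf + off, fid, 4);
    off += put_le(buf + off, datasync, 4);
    return off;
}
```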
     
  • We need to return an error in case we fail to encode data in the protocol
    buffer. This patch also returns an error in case of a failed copy_from_user.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • If there is not enough space for the PDU on the VirtIO ring, the current
    code returns -EIO, propagating the error to the user.

    This patch introduces a wait_queue on the channel, and lets the process
    wait on this queue until the VirtIO ring frees up.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Synopsis

    size[4] TReadlink tag[2] fid[4]
    size[4] RReadlink tag[2] target[s]

    Description
    Readlink is used to return the contents of the symbolic link
    referred to by fid. The contents of the symbolic link are returned
    in the response.

    target[s] - Contents of the symbolic link referred by fid.

    Signed-off-by: M. Mohan Kumar
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
     
  • Synopsis

    size[4] TGetlock tag[2] fid[4] getlock[n]
    size[4] RGetlock tag[2] getlock[n]

    Description

    TGetlock is used to test for the existence of byte range posix locks on a file
    identified by the given fid. The reply contains a getlock structure. If the lock
    could be placed, it returns F_UNLCK in the type field of the getlock structure.
    Otherwise it returns the details of the conflicting lock in the getlock structure.

    getlock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK
      start[8] - Starting offset for lock
      length[8] - Number of bytes to check for the lock
                  If length is 0, check for lock in all bytes starting at the
                  location 'start' through to the end of file
      pid[4] - PID of the process that wants to take the lock; in a reply,
               the PID of the process that owns the conflicting lock
      client[4] - Client id of the system that owns the process which
                  has the conflicting lock

    Signed-off-by: M. Mohan Kumar
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
     
  • Synopsis

    size[4] TLock tag[2] fid[4] flock[n]
    size[4] RLock tag[2] status[1]

    Description

    Tlock is used to acquire/release byte range posix locks on a file
    identified by the given fid. The reply contains the status of the lock
    request.

    flock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
      flags[4] - Flags could be either of
        P9_LOCK_FLAGS_BLOCK - Blocked lock request; if a conflicting lock
                              exists, wait for that lock to be released.
        P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when the client is
                                trying to reclaim a lock after a server
                                restart (due to crash)
      start[8] - Starting offset for lock
      length[8] - Number of bytes to lock
                  If length is 0, lock all bytes starting at the location
                  'start' through to the end of file
      pid[4] - PID of the process that wants to take lock
      client_id[4] - Unique client id

    status[1] - Status of the lock request, can be
                P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
                P9_LOCK_GRACE(3)
      P9_LOCK_SUCCESS - Request was successful
      P9_LOCK_BLOCKED - A conflicting lock is held by another process
      P9_LOCK_ERROR - Error while processing the lock request
      P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
                      requests in this period (except locks with
                      P9_LOCK_FLAGS_RECLAIM flag set)

    Signed-off-by: M. Mohan Kumar
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
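
The flock structure and status codes listed above might be handled along these lines. This is a sketch under assumptions: the field widths and status values follow the commit message as written, integers are packed little-endian as is usual for 9P, and `struct flock_fields`, `put_le()`, `encode_flock()` and `lock_status_name()` are hypothetical names. The numeric values of the lock types and flags are not given in the message, so they are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Status codes exactly as listed in the commit message. */
enum {
    P9_LOCK_SUCCESS = 0,
    P9_LOCK_BLOCKED = 1,
    P9_LOCK_ERROR   = 2,
    P9_LOCK_GRACE   = 3
};

/* Hypothetical in-memory form of the flock structure. */
struct flock_fields {
    uint8_t  type;       /* type[1]      */
    uint32_t flags;      /* flags[4]     */
    uint64_t start;      /* start[8]     */
    uint64_t length;     /* length[8]    */
    uint32_t pid;        /* pid[4]       */
    uint32_t client_id;  /* client_id[4] */
};

#define FLOCK_WIRE_LEN (1 + 4 + 8 + 8 + 4 + 4)  /* 29 bytes */

/* Store the low nbytes of v at p in little-endian order. */
static size_t put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return nbytes;
}

/* Pack the flock fields in the listed order; 0 if buf is too small. */
size_t encode_flock(uint8_t *buf, size_t cap, const struct flock_fields *f)
{
    if (cap < FLOCK_WIRE_LEN)
        return 0;

    size_t off = 0;
    off += put_le(buf + off, f->type, 1);
    off += put_le(buf + off, f->flags, 4);
    off += put_le(buf + off, f->start, 8);
    off += put_le(buf + off, f->length, 8);
    off += put_le(buf + off, f->pid, 4);
    off += put_le(buf + off, f->client_id, 4);
    return off;
}

/* Map the status[1] byte of the reply to a short description. */
const char *lock_status_name(uint8_t status)
{
    switch (status) {
    case P9_LOCK_SUCCESS: return "success";
    case P9_LOCK_BLOCKED: return "blocked";
    case P9_LOCK_ERROR:   return "error";
    case P9_LOCK_GRACE:   return "grace period";
    default:              return "unknown";
    }
}
```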
     
  • SYNOPSIS
    size[4] Tfsync tag[2] fid[4]

    size[4] Rfsync tag[2]

    DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    file identified by fid to the disk device (or other permanent storage
    device) where that file resides.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    jvrao
     
  • Signed-off-by: Arun R Bharadwaj
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Arun R Bharadwaj
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
    ehea: Fixing statistics
    bonding: Fix lockdep warning after bond_vlan_rx_register()
    tunnels: Fix tunnels change rcu protection
    caif-u5500: Build config for CAIF shared mem driver
    caif-u5500: CAIF shared memory mailbox interface
    caif-u5500: CAIF shared memory transport protocol
    caif-u5500: Adding shared memory include
    drivers/isdn: delete double assignment
    drivers/net/typhoon.c: delete double assignment
    drivers/net/sb1000.c: delete double assignment
    qlcnic: define valid vlan id range
    qlcnic: reduce rx ring size
    qlcnic: fix mac learning
    ehea: fix use after free
    inetpeer: __rcu annotations
    fib_rules: __rcu annotates ctarget
    tunnels: add __rcu annotations
    net: add __rcu annotations to protocol
    ipv4: add __rcu annotations to routes.c
    qlge: bugfix: Restoring the vlan setting.
    ...

    Linus Torvalds
     
  • After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
    was introduced into the SIOCCHGTUNNEL code.

    The tunnel is first unlinked, then its addresses change, then it is linked
    back, probably into another bucket. But while the parms are being changed,
    the hash table is unlocked to readers and they can look up the wrong tunnel.

    Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
    (gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
    94767632 (ip6tnl: get rid of ip6_tnl_lock).

    The quick fix is to wait for quiescent state to pass after unlinking,
    but if it is inappropriate I can invent something better, just let me
    know.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Commit 651b52254fc061f02d965524e71de4333a009a5a added DS Parameter Set
    information into Probe Request frames that are transmitted on 2.4 GHz
    band, but it failed to increment local->scan_ies_len to cover this new
    information. This variable needs to be updated to match the maximum IE
    data length so that the extra buffer need gets reduced from the driver
    limit.

    Signed-off-by: Jouni Malinen
    Signed-off-by: John W. Linville

    Jouni Malinen
     
  • Adds __rcu annotations to inetpeer
    (struct inet_peer)->avl_left
    (struct inet_peer)->avl_right

    This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
    since we now use the more self-documenting rcu_assign_pointer().

    Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
    all cases where we don't need a memory barrier.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Adds __rcu annotation to (struct fib_rule)->ctarget

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add __rcu annotations to:
    (struct ip_tunnel)->prl
    (struct ip_tunnel_prl_entry)->next
    (struct xfrm_tunnel)->next
    struct xfrm_tunnel *tunnel4_handlers
    struct xfrm_tunnel *tunnel64_handlers

    And use appropriate rcu primitives to reduce sparse warnings if
    CONFIG_SPARSE_RCU_POINTER=y

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add __rcu annotations to:
    struct net_protocol *inet_protos
    struct net_protocol *inet6_protos

    And use appropriate casts to reduce sparse warnings if
    CONFIG_SPARSE_RCU_POINTER=y

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet