13 Jun, 2014

1 commit

  • Pull networking updates from David Miller:

    1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

    2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

    3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

    4) BPF now has a "random" opcode, from Chema Gonzalez.

    5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

    6) Support TCP fastopen over ipv6, from Daniel Lee.

    7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

    8) Support software TSO in fec driver too, from Nimrod Andy.

    9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

    10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

    11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

    12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

    13) Support busy polling in SCTP, from Neal Horman.

    14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

    15) Bridge promisc mode handling improvements from Vlad Yasevich.

    16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
    rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
    tcp: fixing TLP's FIN recovery
    net: fec: Add software TSO support
    net: fec: Add Scatter/gather support
    net: fec: Increase buffer descriptor entry number
    net: fec: Factorize feature setting
    net: fec: Enable IP header hardware checksum
    net: fec: Factorize the .xmit transmit function
    bridge: fix compile error when compiling without IPv6 support
    bridge: fix smatch warning / potential null pointer dereference
    via-rhine: fix full-duplex with autoneg disable
    bnx2x: Enlarge the dorq threshold for VFs
    bnx2x: Check for UNDI in uncommon branch
    bnx2x: Fix 1G-baseT link
    bnx2x: Fix link for KR with swapped polarity lane
    sctp: Fix sk_ack_backlog wrap-around problem
    net/core: Add VF link state control policy
    net/fsl: xgmac_mdio is dependent on OF_MDIO
    net/fsl: Make xgmac_mdio read error message useful
    net_sched: drr: warn when qdisc is not work conserving
    ...

    Linus Torvalds
     

31 May, 2014

2 commits

  • This patch replaces a comma between expression statements by a semicolon.

    A simplified version of the semantic patch that performs this
    transformation is as follows:

    //
    @r@
    expression e1,e2,e;
    type T;
    identifier i;
    @@

    e1
    -,
    +;
    e2;
    //

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     
  • This patch replaces a comma between expression statements by a semicolon.

    A simplified version of the semantic patch that performs this
    transformation is as follows:

    //
    @r@
    expression e1,e2,e;
    type T;
    identifier i;
    @@

    e1
    -,
    +;
    e2;
    //

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     

19 May, 2014

1 commit


10 May, 2014

1 commit


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about its
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Apr, 2014

1 commit


19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

18 Jan, 2014

2 commits

  • Conflicts:
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
    net/ipv4/tcp_metrics.c

    Overlapping changes between the "don't create two tcp metrics objects
    with the same key" race fix in net and the addition of the destination
    address in the lookup key in net-next.

    Minor overlapping changes in bnx2x driver.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
    handling for rds. chpfirst is the result of __this_cpu_read(), so it is
    an absolute pointer and not __percpu. Therefore, __this_cpu_write()
    should not operate on chpfirst, but rather on cache->percpu->first, just
    like __this_cpu_read() did before.

    Cc: # 3.8+
    Signed-off-by: Gerald Schaefer

    Signed-off-by: David S. Miller

    Gerald Schaefer
     

15 Jan, 2014

1 commit


28 Dec, 2013

1 commit

  • Binding might result in a NULL device, which is dereferenced
    causing this BUG:

    [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 0000000000000974
    [ 1317.261847] IP: [] rds_ib_laddr_check+0x82/0x110
    [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0
    [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 1317.264179] Dumping ftrace buffer:
    [ 1317.264774] (ftrace buffer empty)
    [ 1317.265220] Modules linked in:
    [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4159
    [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000
    [ 1317.268399] RIP: 0010:[] [] rds_ib_laddr_check+0x82/0x110
    [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246
    [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000
    [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286
    [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000
    [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031
    [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:0000000000000000
    [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0
    [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
    [ 1317.270230] Stack:
    [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000
    [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160
    [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280
    [ 1317.270230] Call Trace:
    [ 1317.270230] [] ? rds_trans_get_preferred+0x42/0xa0
    [ 1317.270230] [] rds_trans_get_preferred+0x56/0xa0
    [ 1317.270230] [] rds_bind+0x73/0xf0
    [ 1317.270230] [] SYSC_bind+0x92/0xf0
    [ 1317.270230] [] ? context_tracking_user_exit+0xb8/0x1d0
    [ 1317.270230] [] ? trace_hardirqs_on+0xd/0x10
    [ 1317.270230] [] ? syscall_trace_enter+0x32/0x290
    [ 1317.270230] [] SyS_bind+0xe/0x10
    [ 1317.270230] [] tracesys+0xdd/0xe2
    [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 b8 74 09 00 00 01 74 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02
    [ 1317.270230] RIP [] rds_ib_laddr_check+0x82/0x110
    [ 1317.270230] RSP
    [ 1317.270230] CR2: 0000000000000974

    Signed-off-by: Sasha Levin
    Signed-off-by: David S. Miller

    Sasha Levin
     

04 Dec, 2013

1 commit

  • After congestion update on a local connection, when rds_ib_xmit returns
    fewer bytes than there are in the message, rds_send_xmit calls
    back rds_ib_xmit with an offset that causes BUG_ON(off % RDS_FRAG_SIZE)
    to trigger.

    For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
    the message contains 8240 bytes. rds_send_xmit thinks there is more to send
    and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
    =4048 bytes thus hitting the BUG_ON(off % RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

    The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f
    "rds: prevent BUG_ON triggering on congestion map updates" introduced
    this regression. That change was addressing the triggering of a different
    BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
        BUG_ON(ret != 0 &&
               conn->c_xmit_sg == rm->data.op_nents);
    This was the sequence it was going through:
    (rds_ib_xmit)
        /* Do not send cong updates to IB loopback */
        if (conn->c_loopback
            && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
            rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
            return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
        }
    rds_ib_xmit returns 8240
    rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
                        = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
    rds_ib_xmit returns 8240
    rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
    rds_ib_xmit returns 8240
    On this iteration this sequence causes the BUG_ON in rds_send_xmit:
        while (ret) {
            tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
            [tmp = 65536 - 57632 = 7904]
            conn->c_xmit_data_off += tmp;
            [c_xmit_data_off = 57632 + 7904 = 65536]
            ret -= tmp;
            [ret = 8240 - 7904 = 336]
            if (conn->c_xmit_data_off == sg->length) {
                conn->c_xmit_data_off = 0;
                sg++;
                conn->c_xmit_sg++;
                BUG_ON(ret != 0 &&
                       conn->c_xmit_sg == rm->data.op_nents);
                [c_xmit_sg = 1, rm->data.op_nents = 1]

    What the current fix does:
    Since the congestion update over loopback is not actually transmitted
    as a message, all that rds_ib_xmit needs to do is let the caller think
    the full message has been transmitted and not return partial bytes.
    It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
    And 64Kb+48 when page size is 64Kb.

    Reported-by: Josh Hunt
    Tested-by: Honggang Li
    Acked-by: Bang Nguyen
    Signed-off-by: Venkat Venkatsubra
    Signed-off-by: David S. Miller

    Venkat Venkatsubra
     

21 Nov, 2013

1 commit


20 Oct, 2013

3 commits

  • Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

    Each compilation unit gets its own secret now:
    ipv4/inet_hashtables.o
    ipv4/udp.o
    ipv6/inet6_hashtables.o
    ipv6/udp.o
    rds/connection.o

    The functions still get inlined into the hashing functions. In the fast
    path we have at most two (needed in ipv6) if (unlikely(...)).

    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • This duplicates a bit of code but lets us easily introduce
    separate secret keys later. The separate compilation units are
    ipv4/inet_hashtables.o, ipv4/udp.o and rds/connection.o.

    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

08 Mar, 2013

1 commit

  • For a NUL-terminated string, we must always be sure there is a '\0' at
    the end.

    Additional info:
    strncpy pads with zeroes to the end of the given buffer, and we
    should initialise every bit of memory that is going to be copied to userland.

    Signed-off-by: Chen Gang
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Chen Gang
     

06 Mar, 2013

1 commit

  • Pull networking fixes from David Miller:
    "A moderately sized pile of fixes, some specifically for merge window
    introduced regressions although others are for longer standing items
    and have been queued up for -stable.

    I'm kind of tired of all the RDS protocol bugs over the years, to be
    honest, it's way out of proportion to the number of people who
    actually use it.

    1) Fix missing range initialization in netfilter IPSET, from Jozsef
    Kadlecsik.

    2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
    Berg.

    3) Fix DMA syncing in SFC driver, from Ben Hutchings.

    4) Fix regression in BOND device MAC address setting, from Jiri
    Pirko.

    5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

    6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
    fix from Dmitry Kravkov.

    7) Missing cfgspace_lock initialization in BCMA driver.

    8) Validate parameter size for SCTP assoc stats getsockopt(), from
    Guenter Roeck.

    9) Fix SCTP association hangs, from Lee A Roberts.

    10) Fix jumbo frame handling in r8169, from Francois Romieu.

    11) Fix phy_device memory leak, from Petr Malat.

    12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
    Mehrtens.

    13) Missing socket refcount release in L2TP, from Guillaume Nault.

    14) sctp_endpoint_init should respect passed in gfp_t, rather than use
    GFP_KERNEL unconditionally. From Dan Carpenter.

    15) Add ASIX AX88179 USB driver, from Freddy Xin.

    16) Remove MAINTAINERS entries for drivers deleted during the merge
    window, from Cesar Eduardo Barros.

    17) RDS protocol can try to allocate huge amounts of memory, check
    that the user's request length makes sense, from Cong Wang.

    18) SCTP should use the provided KMALLOC_MAX_SIZE instead of its own,
    bogus, definition. From Cong Wang.

    19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
    from Frank Li. Also, fix a build error introduced in the merge
    window.

    20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

    21) Don't double count RTT measurements when we leave the TCP receive
    fast path, from Neal Cardwell."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
    tcp: fix double-counted receiver RTT when leaving receiver fast path
    CAIF: fix sparse warning for caif_usb
    rds: simplify a warning message
    net: fec: fix build error in no MXC platform
    net: ipv6: Don't purge default router if accept_ra=2
    net: fec: put tx to napi poll function to fix dead lock
    sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
    rds: limit the size allocated by rds_message_alloc()
    MAINTAINERS: remove eexpress
    MAINTAINERS: remove drivers/net/wan/cycx*
    MAINTAINERS: remove 3c505
    caif_dev: fix sparse warnings for caif_flow_cb
    ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
    sctp: use the passed in gfp flags instead GFP_KERNEL
    ipv[4|6]: correct dropwatch false positive in local_deliver_finish
    l2tp: Restore socket refcount when sendmsg succeeds
    net/phy: micrel: Disable asymmetric pause for KSZ9021
    bgmac: omit the fcs
    phy: Fix phy_device_free memory leak
    bnx2x: Fix KR2 work-around condition
    ...

    Linus Torvalds
     

05 Mar, 2013

2 commits

  • Cc: David S. Miller
    Cc: Venkat Venkatsubra
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Dave Jones reported the following bug:

    "When fed mangled socket data, rds will trust what userspace gives it,
    and tries to allocate enormous amounts of memory larger than what
    kmalloc can satisfy."

    WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0()
    Hardware name: GA-MA78GM-S2H
    Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s
    Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65
    Call Trace:
    [] warn_slowpath_common+0x75/0xa0
    [] warn_slowpath_null+0x1a/0x20
    [] __alloc_pages_nodemask+0xa0d/0xbe0
    [] ? native_sched_clock+0x26/0x90
    [] ? trace_hardirqs_off_caller+0x28/0xc0
    [] ? trace_hardirqs_off+0xd/0x10
    [] alloc_pages_current+0xb8/0x180
    [] __get_free_pages+0x2a/0x80
    [] kmalloc_order_trace+0x3e/0x1a0
    [] __kmalloc+0x2f5/0x3a0
    [] ? local_bh_enable_ip+0x7c/0xf0
    [] rds_message_alloc+0x23/0xb0 [rds]
    [] rds_sendmsg+0x2b1/0x990 [rds]
    [] ? trace_hardirqs_off+0xd/0x10
    [] sock_sendmsg+0xb0/0xe0
    [] ? get_lock_stats+0x22/0x70
    [] ? put_lock_stats.isra.23+0xe/0x40
    [] sys_sendto+0x130/0x180
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? _raw_spin_unlock_irq+0x3b/0x60
    [] ? sysret_check+0x1b/0x56
    [] ? trace_hardirqs_on_caller+0x115/0x1a0
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace eed6ae990d018c8b ]---

    Reported-by: Dave Jones
    Cc: Dave Jones
    Cc: David S. Miller
    Cc: Venkat Venkatsubra
    Signed-off-by: Cong Wang
    Acked-by: Venkat Venkatsubra
    Signed-off-by: David S. Miller

    Cong Wang
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were conceived
    differently from the list ones. The list iterator is:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

12 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Venkat Venkatsubra
    CC: "David S. Miller"
    Signed-off-by: Kees Cook
    Acked-by: Venkat Venkatsubra
    Acked-by: David S. Miller

    Kees Cook
     

27 Dec, 2012

2 commits


20 Nov, 2012

1 commit


10 Oct, 2012

1 commit

  • This is the revised patch for fixing rds-ping spinlock recursion
    according to Venkat's suggestions.

    The RDS ping/pong over TCP feature has been broken for years (2.6.39 to
    3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
    ping/pong, both of which need to lock "struct sock *sk". However, this
    lock is already held before the rds_tcp_data_ready() callback is
    triggered. As a result, we always face spinlock recursion, which
    results in a system panic.

    Given that RDS ping is only used to test the connectivity and not for
    serious performance measurements, we can queue the pong transmit to
    rds_wq as a delayed response.

    Reported-by: Dan Carpenter
    CC: Venkat Venkatsubra
    CC: David S. Miller
    CC: James Morris
    Signed-off-by: Jie Liu
    Signed-off-by: David S. Miller

    jeff.liu
     

23 Aug, 2012

1 commit


23 Jul, 2012

1 commit

  • Jay Fenlason (fenlason@redhat.com) found a bug:
    recvfrom() on an RDS socket can return the contents of random kernel
    memory to userspace if it is called with an address length larger than
    sizeof(struct sockaddr_in).
    rds_recvmsg() also fails to set the addr_len parameter properly before
    returning, but that's just a bug.
    There are also a number of cases where recvfrom() can return an entirely bogus
    address. Anything in rds_recvmsg() that returns a non-negative value but does
    not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
    at the end of the while(1) loop will return up to 128 bytes of kernel memory
    to userspace.

    I wrote two test programs to reproduce this bug; you will see that in
    rds_server, fromAddr will be overwritten and the following sock_fd will be
    destroyed.
    Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
    better to make the kernel copy the real length of the address to user space in
    such a case.

    How to run the test programs?
    I tested them on a 32-bit x86 system, 3.5.0-rc7.

    1 compile
    gcc -o rds_client rds_client.c
    gcc -o rds_server rds_server.c

    2 run ./rds_server on one console

    3 run ./rds_client on another console

    4 you will see something like:
    server is waiting to receive data...
    old socket fd=3
    server received data from client:data from client
    msg.msg_namelen=32
    new socket fd=-1067277685
    sendmsg()
    : Bad file descriptor

    /***************** rds_client.c ********************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int sock_fd;
        struct sockaddr_in serverAddr;
        struct sockaddr_in toAddr;
        char recvBuffer[128] = "data from client";
        struct msghdr msg;
        struct iovec iov;

        sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
        if (sock_fd < 0) {
            perror("create socket error\n");
            exit(1);
        }

        /* Local address the client binds to. */
        memset(&serverAddr, 0, sizeof(serverAddr));
        serverAddr.sin_family = AF_INET;
        serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        serverAddr.sin_port = htons(4001);

        if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
            perror("bind() error\n");
            close(sock_fd);
            exit(1);
        }

        /* Destination: the server listening on port 4000. */
        memset(&toAddr, 0, sizeof(toAddr));
        toAddr.sin_family = AF_INET;
        toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        toAddr.sin_port = htons(4000);
        msg.msg_name = &toAddr;
        msg.msg_namelen = sizeof(toAddr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;

        if (sendmsg(sock_fd, &msg, 0) == -1) {
            perror("sendto() error\n");
            close(sock_fd);
            exit(1);
        }

        printf("client send data:%s\n", recvBuffer);

        memset(recvBuffer, '\0', 128);

        /* Reuse the same header to receive the server's reply. */
        msg.msg_name = &toAddr;
        msg.msg_namelen = sizeof(toAddr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = 128;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;
        if (recvmsg(sock_fd, &msg, 0) == -1) {
            perror("recvmsg() error\n");
            close(sock_fd);
            exit(1);
        }

        printf("receive data from server:%s\n", recvBuffer);

        close(sock_fd);

        return 0;
    }

    /***************** rds_server.c ********************/

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        struct sockaddr_in fromAddr;
        int sock_fd;
        struct sockaddr_in serverAddr;
        unsigned int addrLen;
        char recvBuffer[128];
        struct msghdr msg;
        struct iovec iov;

        sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
        if (sock_fd < 0) {
            perror("create socket error\n");
            exit(1);
        }

        memset(&serverAddr, 0, sizeof(serverAddr));
        serverAddr.sin_family = AF_INET;
        serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
        serverAddr.sin_port = htons(4000);
        if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
            perror("bind error\n");
            close(sock_fd);
            exit(1);
        }

        printf("server is waiting to receive data...\n");
        msg.msg_name = &fromAddr;

        /*
         * I add 16 to sizeof(fromAddr), ie 32,
         * and pay attention to the definition of fromAddr,
         * recvmsg() will overwrite sock_fd,
         * since kernel will copy 32 bytes to userspace.
         *
         * If you just use sizeof(fromAddr), it works fine.
         */
        msg.msg_namelen = sizeof(fromAddr) + 16;
        /* msg.msg_namelen = sizeof(fromAddr); */
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_iov->iov_base = recvBuffer;
        msg.msg_iov->iov_len = 128;
        msg.msg_control = 0;
        msg.msg_controllen = 0;
        msg.msg_flags = 0;

        while (1) {
            printf("old socket fd=%d\n", sock_fd);
            if (recvmsg(sock_fd, &msg, 0) == -1) {
                perror("recvmsg() error\n");
                close(sock_fd);
                exit(1);
            }
            printf("server received data from client:%s\n", recvBuffer);
            printf("msg.msg_namelen=%d\n", msg.msg_namelen);
            printf("new socket fd=%d\n", sock_fd);
            strcat(recvBuffer, "--data from server");
            if (sendmsg(sock_fd, &msg, 0) == -1) {
                perror("sendmsg()\n");
                close(sock_fd);
                exit(1);
            }
        }

        close(sock_fd);
        return 0;
    }

    Signed-off-by: Weiping Pan
    Signed-off-by: David S. Miller

    Weiping Pan
     

11 Jul, 2012

1 commit


30 May, 2012

1 commit

  • RDS code assumes that the struct ib_device dma_device member, which is a
    pointer, points to a struct device embedded in a struct pci_dev.

    This is not the case for ehca, for example, which is a OF driver, and
    makes dma_device point to a struct device embedded in a struct
    platform_device.

    This will make the system crash when rds_rdma is loaded in a system
    with ehca, since it will try to access the bus member of a non-existent
    struct pci_dev.

    The only reason rds_rdma uses the struct pci_dev is to get the NUMA node
    the device is attached to. Using dev_to_node for that is much better,
    since it won't assume which bus the infiniband is attached to.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: dledford@redhat.com
    Cc: Jes.Sorensen@redhat.com
    Cc: Venkat Venkatsubra
    Acked-by: Venkat Venkatsubra
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     

22 Apr, 2012

1 commit


21 Apr, 2012

2 commits

  • This results in code with less boiler plate that is a bit easier
    to read.

    Additionally stops us from using compatibility code in the sysctl
    core, hastening the day when the compatibility code can be removed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This makes it clearer which sysctls are relative to your current network
    namespace.

    This makes it a little less error prone by not exposing sysctls for the
    initial network namespace in other namespaces.

    This is the same way we handle all of our other network interfaces to
    userspace and I can't honestly remember why we didn't do this for
    sysctls right from the start.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

23 Mar, 2012

1 commit

  • We should be using the gfp flags the caller specified here, instead of
    GFP_KERNEL. I think this might be a bugfix, depending on the value of
    "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in
    rds_sendmsg(). Otherwise, it's just a cleanup.

    Signed-off-by: Dan Carpenter
    Acked-by: Venkat Venkatsubra
    Signed-off-by: David S. Miller

    Dan Carpenter
     

22 Mar, 2012

1 commit

  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

21 Mar, 2012

2 commits

  • Pull trivial tree from Jiri Kosina:
    "It's indeed trivial -- mostly documentation updates and a bunch of
    typo fixes from Masanari.

    There are also several linux/version.h include removals from Jesper."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
    kcore: fix spelling in read_kcore() comment
    constify struct pci_dev * in obvious cases
    Revert "char: Fix typo in viotape.c"
    init: fix wording error in mm_init comment
    usb: gadget: Kconfig: fix typo for 'different'
    Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
    writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
    writeback: fix typo in the writeback_control comment
    Documentation: Fix multiple typo in Documentation
    tpm_tis: fix tis_lock with respect to RCU
    Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
    Doc: Update numastat.txt
    qla4xxx: Add missing spaces to error messages
    compiler.h: Fix typo
    security: struct security_operations kerneldoc fix
    Documentation: broken URL in libata.tmpl
    Documentation: broken URL in filesystems.tmpl
    mtd: simplify return logic in do_map_probe()
    mm: fix comment typo of truncate_inode_pages_range
    power: bq27x00: Fix typos in comment
    ...

    Linus Torvalds
     
  • no socket layer outputs a message for this error and neither should rds.

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones