24 Aug, 2018

1 commit

  • commit caa21e19e08d7a1445116a93f7ab4e187ebbbadb upstream.

    Invoking shutdown for a socket in state SMC_LISTEN does not make
    sense. Nevertheless, programs like syzbot fuzzing the kernel may
    try to do this. For SMC this means a socket refcounting problem.
    This patch makes sure a shutdown call for an SMC socket in state
    SMC_LISTEN simply returns with -ENOTCONN.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
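
The early return described in the commit above can be sketched in user-space C. This is a minimal illustration, not the kernel code: `sketch_sock`, `sketch_state` and `sketch_shutdown` are hypothetical names, the state set is reduced, and the reference count is a plain integer standing in for the real socket refcount.

```c
#include <errno.h>

/* Hypothetical, reduced set of socket states; the kernel defines more. */
enum sketch_state { SKETCH_ACTIVE, SKETCH_LISTEN, SKETCH_CLOSED };

struct sketch_sock {
	enum sketch_state state;
	int refcnt;	/* stands in for the socket reference count */
};

/* Returns 0 on success or a negative errno value, as kernel code does. */
int sketch_shutdown(struct sketch_sock *sk, int how)
{
	(void)how;	/* SHUT_RD / SHUT_WR / SHUT_RDWR handling is omitted */

	/* A listening (or closed) socket has no connection to shut down.
	 * Bail out before any reference is taken or dropped. */
	if (sk->state != SKETCH_ACTIVE)
		return -ENOTCONN;

	sk->refcnt++;			/* hold the socket while shutting down */
	sk->state = SKETCH_CLOSED;
	sk->refcnt--;			/* drop the reference again */
	return 0;
}
```

Because the state check comes first, a shutdown on a listening socket leaves the reference count untouched, which is the property the fix is after.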
     

21 Jun, 2018

1 commit

  • [ Upstream commit bda27ff5c4526f80a7620a94ecfe8dca153e3696 ]

    The sendpage() call grabs the sock lock before calling the default
    implementation - which tries to grab it once again.

    Signed-off-by: Stefan Raspl
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Stefan Raspl
     

30 May, 2018

2 commits

  • [ Upstream commit c9f4c6cf53bfafb639386a4c094929f13f573e04 ]

    smc allocates a certain number of CQ entries for used RoCE devices. For
    mlx5 devices the chosen constant number results in a large allocation
    causing this warning:

    [13355.124656] WARNING: CPU: 3 PID: 16535 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x2be/0x10c0
    [13355.124657] Modules linked in: smc_diag(O) smc(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter mlx5_ib ib_core sunrpc mlx5_core s390_trng rng_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common ptp pps_core eadm_sch dm_multipath dm_mod vhost_net tun vhost tap sch_fq_codel kvm ip_tables x_tables autofs4 [last unloaded: smc]
    [13355.124672] CPU: 3 PID: 16535 Comm: kworker/3:0 Tainted: G O 4.14.0uschi #1
    [13355.124673] Hardware name: IBM 3906 M04 704 (LPAR)
    [13355.124675] Workqueue: events smc_listen_work [smc]
    [13355.124677] task: 00000000e2f22100 task.stack: 0000000084720000
    [13355.124678] Krnl PSW : 0704c00180000000 000000000029da76 (__alloc_pages_nodemask+0x2be/0x10c0)
    [13355.124681] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    [13355.124682] Krnl GPRS: 0000000000000000 00550e00014080c0 0000000000000000 0000000000000001
    [13355.124684] 000000000029d8b6 00000000f3bfd710 0000000000000000 00000000014080c0
    [13355.124685] 0000000000000009 00000000ec277a00 0000000000200000 0000000000000000
    [13355.124686] 0000000000000000 00000000000001ff 000000000029d8b6 0000000084723720
    [13355.124708] Krnl Code: 000000000029da6a: a7110200 tmll %r1,512
    000000000029da6e: a774ff29 brc 7,29d8c0
    #000000000029da72: a7f40001 brc 15,29da74
    >000000000029da76: a7f4ff25 brc 15,29d8c0
    000000000029da7a: a7380000 lhi %r3,0
    000000000029da7e: a7f4fef1 brc 15,29d860
    000000000029da82: 5820f0c4 l %r2,196(%r15)
    000000000029da86: a53e0048 llilh %r3,72
    [13355.124720] Call Trace:
    [13355.124722] ([] __alloc_pages_nodemask+0xfe/0x10c0)
    [13355.124724] [] s390_dma_alloc+0x6e/0x148
    [13355.124733] [] mlx5_dma_zalloc_coherent_node+0x8e/0xe0 [mlx5_core]
    [13355.124740] [] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core]
    [13355.124744] [] mlx5_ib_create_cq+0x558/0x898 [mlx5_ib]
    [13355.124749] [] ib_create_cq+0x48/0x88 [ib_core]
    [13355.124751] [] smc_ib_setup_per_ibdev+0x52/0x118 [smc]
    [13355.124753] [] smc_conn_create+0x65e/0x728 [smc]
    [13355.124755] [] smc_listen_work+0x2d2/0x540 [smc]
    [13355.124756] [] process_one_work+0x1be/0x440
    [13355.124758] [] worker_thread+0x58/0x458
    [13355.124759] [] kthread+0x14e/0x168
    [13355.124760] [] kernel_thread_starter+0x6/0xc
    [13355.124762] [] kernel_thread_starter+0x0/0xc
    [13355.124762] Last Breaking-Event-Address:
    [13355.124764] [] __alloc_pages_nodemask+0x2ba/0x10c0
    [13355.124764] ---[ end trace 34be38b581c0b585 ]---

    This patch reduces the smc constant for the maximum number of allocated
    completion queue entries SMC_MAX_CQE by 2 to avoid high round up values
    in the mlx5 code, and reduces the number of allocated completion queue
    entries even more, if the final allocation for an mlx5 device hits the
    MAX_ORDER limit.

    Reported-by: Ihnken Menssen
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     
  • [ Upstream commit 2be922f31606f114119f48de3207d122a90e7357 ]

    The CONFIRM LINK reply message must contain the link_id sent
    by the server. And set the link_id explicitly when
    initializing the link.

    Signed-off-by: Karsten Graul
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Karsten Graul
     

25 May, 2018

1 commit

  • [ Upstream commit d49baa7e12ee70c0a7b821d088a770c94c02e494 ]

    It's possible to crash the kernel in several different ways by sending
    messages to the SMC_PNETID generic netlink family that are missing the
    expected attributes:

    - Missing SMC_PNETID_NAME => null pointer dereference when comparing
    names.
    - Missing SMC_PNETID_ETHNAME => null pointer dereference accessing
    smc_pnetentry::ndev.
    - Missing SMC_PNETID_IBNAME => null pointer dereference accessing
    smc_pnetentry::smcibdev.
    - Missing SMC_PNETID_IBPORT => out of bounds array access to
    smc_ib_device::pattr[-1].

    Fix it by validating that all expected attributes are present and that
    SMC_PNETID_IBPORT is nonzero.

    Reported-by: syzbot+5cd61039dc9b8bfa6e47@syzkaller.appspotmail.com
    Fixes: 6812baabf24d ("smc: establish pnet table management")
    Cc: # v4.11+
    Signed-off-by: Eric Biggers
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
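
The validation described in the commit above amounts to rejecting a request unless every attribute is present and the port number is nonzero. A minimal user-space sketch, assuming a hypothetical `sketch_pnet_attrs` struct in which a NULL pointer models a missing netlink attribute (the real code works on the generic netlink attribute array instead):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical parsed request: a NULL member means the attribute was
 * missing from the message. */
struct sketch_pnet_attrs {
	const char *name;		/* models SMC_PNETID_NAME */
	const char *ethname;		/* models SMC_PNETID_ETHNAME */
	const char *ibname;		/* models SMC_PNETID_IBNAME */
	const unsigned char *ibport;	/* models SMC_PNETID_IBPORT */
};

/* Returns 0 if the request may be processed, -EINVAL otherwise. */
int sketch_pnet_validate(const struct sketch_pnet_attrs *a)
{
	/* Every attribute must be present before anything is dereferenced. */
	if (!a->name || !a->ethname || !a->ibname || !a->ibport)
		return -EINVAL;

	/* Ports are 1-based; port 0 would lead to an index of -1. */
	if (*a->ibport == 0)
		return -EINVAL;

	return 0;
}
```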
     

29 Apr, 2018

1 commit

  • [ Upstream commit 1255fcb2a655f05e02f3a74675a6d6525f187afd ]

    Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
    crashes, because
    commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
    releases the internal clcsock in smc_close_active() and sets smc->clcsock
    to NULL.
    For SHUT_RD the smc_close_active() call is removed.
    For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
    clcsock is already released.

    Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
    Signed-off-by: Ursula Braun
    Reported-by: Stephen Hemminger
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     

15 Mar, 2018

1 commit

  • commit a5dcb73b96a9d21431048bdaac02d9e96f386da3 upstream.

    When sock_create_kern(..., a) returns an error, 'a' might not be a valid
    pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and
    a->sk->sk_rcvbuf; dereferencing it anyway caused the following crash:

    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410
    RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746
    RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020
    RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0
    FS: 0000000001afb880(0000) GS:ffff8801db200000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __sock_create+0x4d4/0x850 net/socket.c:1285
    sock_create net/socket.c:1325 [inline]
    SYSC_socketpair net/socket.c:1409 [inline]
    SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366
    do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x26/0x9b
    RIP: 0033:0x4404b9
    RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b
    RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031
    R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff
    R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
    Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48
    b8
    00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 3c 02
    00
    0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00
    RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8

    Fixes: cd6851f30386 smc: remote memory buffers (RMBs)
    Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com
    Signed-off-by: Davide Caratti
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
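
The pattern behind the fix above is to check the return code of the creation call before touching the returned pointer. A minimal user-space sketch under stated assumptions: `sketch_sock_create`, the `sketch_*` types and `SKETCH_BUF_MIN_SIZE` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_BUF_MIN_SIZE 16384	/* assumed lower bound for buffer sizes */

struct sketch_sk { int sk_sndbuf; int sk_rcvbuf; };
struct sketch_socket { struct sketch_sk *sk; };

/* Hypothetical creator: on failure it returns a negative errno and
 * leaves *res untouched, so *res must not be used in that case. */
static int sketch_sock_create(int fail, struct sketch_socket **res)
{
	static struct sketch_sk sk = { 8192, 65536 };
	static struct sketch_socket sock = { &sk };

	if (fail)
		return -ENOMEM;
	*res = &sock;
	return 0;
}

static int sketch_max(int a, int b) { return a > b ? a : b; }

/* Reads the buffer sizes only after creation is known to have succeeded. */
int sketch_smc_create(int fail, int *sndbuf, int *rcvbuf)
{
	struct sketch_socket *clcsock = NULL;
	int rc = sketch_sock_create(fail, &clcsock);

	if (rc)
		return rc;	/* clcsock is not valid here; do not dereference */

	*sndbuf = sketch_max(clcsock->sk->sk_sndbuf, SKETCH_BUF_MIN_SIZE);
	*rcvbuf = sketch_max(clcsock->sk->sk_rcvbuf, SKETCH_BUF_MIN_SIZE);
	return 0;
}
```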
     

14 Dec, 2017

1 commit

  • [ Upstream commit 4e1061f4a2bba1669c7297455c73ddafbebf2b12 ]

    Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
    merged handling of SMC receive and send buffers. It introduced sk_buf_size
    as merged start value for size determination. But since sk_buf_size is not
    used at all, sk_sndbuf is erroneously used as start for rmb creation.
    This patch makes sure sk_buf_size is really used as intended, and
    sk_rcvbuf is used as start value for rmb creation.

    Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
    Signed-off-by: Ursula Braun
    Reviewed-by: Hans Wippel
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

22 Sep, 2017

9 commits

  • Usually socket closing is delayed if there is still data available in
    the send buffer to be transmitted. If a process is killed, the delay
    should be avoided.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The number of outstanding work requests is limited. If all work
    requests are in use, tx processing is postponed to another scheduling
    of the tx worker. Switch to a delayed worker to have a gap for tx
    completion queue events before the next retry.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • An out-of-sync condition can only be detected by the client.
    If the server receives a CLC DECLINE message indicating an out-of-sync
    condition for the link groups, the server must clean up the out-of-sync
    link group.
    There is no need for an extra third parameter in smc_clc_send_decline().

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Client link group creation always follows the server link group creation.
    If the peer creates a new server link group, the client has to create a
    new client link group. If the peer reuses a server link group for a new
    connection, the client has to reuse its client link group as well. This
    patch introduces a longer delay for client link group removal to make
    sure this link group still exists, once the peer decides to reuse a
    server link group. This avoids out-of-sync conditions for link groups.
    If already scheduled, modify the delay.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The solicited flag is meaningful for the receive completion queue.
    Ask for next work completion of any type on the send queue.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • smc_pnet_fill_entry() uses dev_get_by_name() adding a refcount to ndev.
    The following smc_pnet_enter() has to reduce the refcount if the entry
    to be added exists already in the pnet table.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
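
The refcount rule from the commit above can be illustrated with a small user-space model. All names here (`sketch_ndev`, `sketch_pnet_table`, `sketch_pnet_enter`) are hypothetical; the caller is assumed to have already taken one reference on the device, as a lookup by name would, and the enter step gives that reference back whenever the entry is not stored.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_TABLE_MAX 4

struct sketch_ndev {
	const char *name;
	int refcnt;	/* stands in for the net device reference count */
};

struct sketch_pnet_table {
	struct sketch_ndev *entries[SKETCH_TABLE_MAX];
	int count;
};

/* The caller holds one reference on ndev. On success the table keeps it;
 * on any failure the reference is dropped here so it cannot leak. */
int sketch_pnet_enter(struct sketch_pnet_table *t, struct sketch_ndev *ndev)
{
	int i;

	for (i = 0; i < t->count; i++) {
		if (strcmp(t->entries[i]->name, ndev->name) == 0) {
			ndev->refcnt--;		/* duplicate: give the reference back */
			return -EEXIST;
		}
	}
	if (t->count == SKETCH_TABLE_MAX) {
		ndev->refcnt--;			/* table full: give the reference back */
		return -ENOSPC;
	}
	t->entries[t->count++] = ndev;		/* table now owns the reference */
	return 0;
}
```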
     
  • smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
    protection by an RCU read lock.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • The SMC receive function currently lacks a timeout check under the
    condition that no data were received and no data are available. This
    patch adds such a check.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
     
  • In the infiniband part, SMC currently uses get_netdev which calls
    dev_hold on the returned net device. However, the SMC code never calls
    dev_put on that net device resulting in a wrong reference count.

    This patch adds a dev_put after the usage of the net device to fix the
    issue.

    Signed-off-by: Hans Wippel
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Hans Wippel
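
The balanced hold/put pattern from the commit above, sketched in user-space C. The helpers `sketch_get_netdev` and `sketch_dev_put` are hypothetical models of a lookup that takes a reference and of the matching release; they are not the kernel functions themselves.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct sketch_netdev {
	int refcnt;		/* stands in for the device reference count */
	unsigned char mac[6];
};

/* Models a lookup that returns the device with an extra reference held. */
static struct sketch_netdev *sketch_get_netdev(struct sketch_netdev *dev)
{
	if (dev)
		dev->refcnt++;
	return dev;
}

/* Models the matching release of that reference. */
static void sketch_dev_put(struct sketch_netdev *dev)
{
	dev->refcnt--;
}

/* Copies the MAC address and releases the reference taken by the lookup. */
int sketch_fill_mac(struct sketch_netdev *dev, unsigned char out[6])
{
	struct sketch_netdev *ndev = sketch_get_netdev(dev);

	if (!ndev)
		return -ENODEV;
	memcpy(out, ndev->mac, sizeof(ndev->mac));
	sketch_dev_put(ndev);	/* without this the count stays one too high */
	return 0;
}
```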
     

30 Jul, 2017

10 commits


17 May, 2017

2 commits

  • The driver explicitly bypasses APIs to register all memory once a
    connection is made, and thus allows remote access to memory.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Leon Romanovsky
    Acked-by: Ursula Braun
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Currently, SMC enables remote access to physical memory when a user
    has successfully configured and established an SMC-connection until ten
    minutes after the last SMC connection is closed. Because this is considered
    a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
    such a case.

    This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
    This improves user awareness, but does not remove the security risk itself.

    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     

11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

02 May, 2017

2 commits

  • rdma_ah_attr can now be either ib or roce, allowing
    core components to use one type or the other and also
    to define attributes unique to a specific type. struct
    ib_ah is also initialized with the type when it's first
    created. This ensures that calls such as modify_ah
    don't modify the type of the address handle attribute.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     
  • Modify core and driver components to use accessor functions
    introduced to access individual fields of rdma_ah_attr.

    Reviewed-by: Ira Weiny
    Reviewed-by: Don Hiatt
    Reviewed-by: Sean Hefty
    Reviewed-by: Niranjana Vishwanathapura
    Signed-off-by: Dasaratharaman Chandramouli
    Signed-off-by: Doug Ledford

    Dasaratharaman Chandramouli
     

23 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

12 Apr, 2017

5 commits

  • smc specifies IB_SEND_INLINE for IB_WR_SEND ib_post_send calls, but
    provides a mapped buffer to be sent. This is inconsistent, since
    IB_SEND_INLINE works without mapped buffer. Problem has not been
    detected in the past, because tests had been limited to Connect X3 cards
    from Mellanox, whose mlx4 driver just ignored the IB_SEND_INLINE flag.
    For now, the IB_SEND_INLINE flag is removed.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Make sure sockets never accepted are removed cleanly.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • unhash is already called in sock_put_work. Remove the second call.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • State SMC_CLOSED should be reached only if ConnClosed has been sent to
    the peer. If ConnClosed is received from the peer, a socket with
    shutdown SHUT_WR done switches erroneously to state SMC_CLOSED, which
    means the peer socket is dangling. The local SMC socket is supposed to
    switch to state APPFINCLOSEWAIT to make sure smc_close_final() is called
    during socket close.

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • Several state changes occur during SMC socket closing. Currently
    state changes triggered locally occur in process context with
    lock_sock() taken while state changes triggered by peer occur in
    tasklet context with bh_lock_sock() taken. bh_lock_sock() does not
    wait till a lock_sock() task in process context is finished. This
    may lead to races in socket state transitions resulting in dangling
    SMC-sockets, or it may lead to duplicate SMC socket freeing.
    This patch introduces a closing worker to run all state changes under
    lock_sock().

    Signed-off-by: Ursula Braun
    Reviewed-by: Thomas Richter
    Reported-by: Dave Jones
    Signed-off-by: David S. Miller

    Ursula Braun