15 May, 2019

1 commit

  • To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
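
    The parameter change described above can be sketched in user space as
    follows. This is a minimal illustration, not the kernel code:
    SKETCH_FOLL_WRITE and both helper names are hypothetical stand-ins
    introduced here, and the flag value is local to the sketch.

```c
#include <stdbool.h>

/* Stand-in for the kernel's FOLL_WRITE flag (assumption: defined locally
 * so the sketch builds outside the kernel). */
#define SKETCH_FOLL_WRITE 0x01u

/* Map the legacy boolean "write" argument onto the flags argument. */
static unsigned int sketch_write_to_gup_flags(int write)
{
    return write ? SKETCH_FOLL_WRITE : 0u;
}

/* What a callee checks once it receives flags instead of a boolean. */
static bool sketch_gup_wants_write(unsigned int gup_flags)
{
    return (gup_flags & SKETCH_FOLL_WRITE) != 0;
}
```

    A call site that used to pass 1 or 0 would pass the result of the
    mapping instead, which leaves room for further flag bits later.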
     

06 May, 2019

1 commit


02 May, 2019

1 commit

  • While the endianness is being handled correctly, as indicated by the comment
    above the offending line, sparse was unhappy with the missing annotation
    because be64_to_cpu() expects a __be64 argument. To fix the annotation,
    all involved variables are changed to a consistent __le64 and the
    conversion to uint64_t is delayed to the call to rds_cong_map_updated().

    Signed-off-by: Nicholas Mc Guire
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Nicholas Mc Guire
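
    The idea of keeping a value in an explicitly annotated wire type and
    converting only at the point of use can be sketched in portable C.
    The type sketch_le64 and the function sketch_le64_to_cpu() are
    hypothetical names for this illustration; the kernel uses sparse
    annotations rather than a wrapper struct.

```c
#include <stdint.h>

/* Hypothetical wire type: eight bytes holding a little-endian value.
 * Wrapping the bytes in a struct keeps it from mixing with uint64_t. */
typedef struct {
    uint8_t b[8];
} sketch_le64;

/* Convert to host order at the last moment, independent of host endianness. */
static uint64_t sketch_le64_to_cpu(sketch_le64 v)
{
    uint64_t out = 0;

    /* Walk from the most significant byte (index 7) down to index 0. */
    for (int i = 7; i >= 0; i--)
        out = (out << 8) | v.b[i];
    return out;
}
```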
     

25 Apr, 2019

1 commit

  • Before commit 490ea5967b0d ("RDS: IB: move FMR code to its own file"),
    when the dirty_count of the 8K pool was greater than 9/10 of its max_items,
    the 1M pool was used, and vice versa. Commit 490ea5967b0d ("RDS: IB: move
    FMR code to its own file") removed this switching. With it removed, we
    run the following test.

    Server:
    rds-stress -r 1.1.1.16 -D 1M

    Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M

    The following output appears.
    "
    connecting to 1.1.1.16:4000
    negotiated options, tasks will start in 2 seconds
    Starting up..header from 1.1.1.166:4001 to id 4001 bogus
    ..
    tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us
    cpu %
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
    ...
    "
    So this switching between the 8K and 1M pools is added back.

    Fixes: commit 490ea5967b0d ("RDS: IB: move FMR code to its own file")
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
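
    The 9/10 rule described above can be sketched as follows. The struct,
    enum and function names are hypothetical and only mirror the
    description in the message; they are not the net/rds identifiers.

```c
enum sketch_pool { SKETCH_POOL_8K, SKETCH_POOL_1M };

struct sketch_pool_state {
    unsigned int dirty_count;
    unsigned int max_items;
};

/* True when dirty_count is strictly greater than 9/10 of max_items.
 * Cross-multiplying avoids integer-division rounding. */
static int sketch_pool_nearly_full(const struct sketch_pool_state *p)
{
    return (unsigned long long)p->dirty_count * 10u >
           (unsigned long long)p->max_items * 9u;
}

/* Use the preferred pool unless it is nearly full, else use the other one. */
static enum sketch_pool sketch_pick_pool(enum sketch_pool preferred,
                                         const struct sketch_pool_state *p8k,
                                         const struct sketch_pool_state *p1m)
{
    if (preferred == SKETCH_POOL_8K)
        return sketch_pool_nearly_full(p8k) ? SKETCH_POOL_1M : SKETCH_POOL_8K;
    return sketch_pool_nearly_full(p1m) ? SKETCH_POOL_8K : SKETCH_POOL_1M;
}
```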
     

13 Apr, 2019

1 commit

  • syzbot is reporting uninitialized value at rds_connect() [1] and
    rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
    these functions expect that it is safe to access sockaddr->family field
    in order to determine minimal address length for validation.

    [1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
    [2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f

    Reported-by: syzbot
    Reported-by: syzbot
    Signed-off-by: Tetsuo Handa
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Tetsuo Handa
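
    A minimal user-space sketch of the kind of length check this implies:
    reject the address before touching the family field when the supplied
    length does not cover it. The function name sketch_addr_family() is
    hypothetical, and the exact check used in the kernel may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Return the address family, or -EINVAL when addr_len is too short
 * for the sa_family field to have been supplied by the caller. */
static int sketch_addr_family(const struct sockaddr *addr, size_t addr_len)
{
    /* Offset of the first byte past sa_family; sizeof does not
     * dereference addr, so this is safe even for a NULL pointer. */
    const size_t need = offsetof(struct sockaddr, sa_family) +
                        sizeof(addr->sa_family);

    if (addr == NULL || addr_len < need)
        return -EINVAL;
    return (int)addr->sa_family;
}
```

    With this ordering, a caller passing a length of 0 gets an error
    instead of having uninitialized memory interpreted as a family.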
     

29 Mar, 2019

1 commit

  • When a net namespace is cleaned up, rds_tcp_exit_net() calls
    rds_tcp_kill_sock(). If t_sock is NULL, it does not call
    rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
    the connection, and the worker cp_conn_w is not stopped. Afterwards the net is freed in
    net_drop_ns(), while cp_conn_w's rds_connect_worker() still calls rds_tcp_conn_path_connect()
    and references 'net', which has already been freed.

    In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() sets t_sock = sock before
    sock->ops->connect, but if connect() fails, it calls
    rds_tcp_restore_callbacks() and sets t_sock = NULL. If connect always
    fails, rds_connect_worker() keeps trying to reconnect, so
    rds_tcp_kill_sock() never cancels the worker cp_conn_w or frees the
    connections.

    Therefore, the condition !tc->t_sock is not needed on the
    cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock path, because tc->t_sock is always
    NULL in that case, and there is no other path to cancel cp_conn_w and free the
    connection. This patch fixes that.

    rds_tcp_kill_sock():
    ...
    if (net != c_net || !tc->t_sock)
    ...
    Acked-by: Santosh Shilimkar

    ==================================================================
    BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
    net/ipv4/af_inet.c:340
    Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721

    CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
    Hardware name: linux,dummy-virt (DT)
    Workqueue: krdsd rds_connect_worker
    Call trace:
    dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
    show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x120/0x188 lib/dump_stack.c:113
    print_address_description+0x68/0x278 mm/kasan/report.c:253
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x21c/0x348 mm/kasan/report.c:409
    __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
    inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
    __sock_create+0x4f8/0x770 net/socket.c:1276
    sock_create_kern+0x50/0x68 net/socket.c:1322
    rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
    rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
    process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
    worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
    kthread+0x2f0/0x378 kernel/kthread.c:255
    ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117

    Allocated by task 687:
    save_stack mm/kasan/kasan.c:448 [inline]
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
    slab_post_alloc_hook mm/slab.h:444 [inline]
    slab_alloc_node mm/slub.c:2705 [inline]
    slab_alloc mm/slub.c:2713 [inline]
    kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
    kmem_cache_zalloc include/linux/slab.h:697 [inline]
    net_alloc net/core/net_namespace.c:384 [inline]
    copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
    create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
    unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
    ksys_unshare+0x340/0x628 kernel/fork.c:2577
    __do_sys_unshare kernel/fork.c:2645 [inline]
    __se_sys_unshare kernel/fork.c:2643 [inline]
    __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
    invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
    el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
    el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
    el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960

    Freed by task 264:
    save_stack mm/kasan/kasan.c:448 [inline]
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
    kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
    slab_free_hook mm/slub.c:1370 [inline]
    slab_free_freelist_hook mm/slub.c:1397 [inline]
    slab_free mm/slub.c:2952 [inline]
    kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
    net_free net/core/net_namespace.c:400 [inline]
    net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
    net_drop_ns net/core/net_namespace.c:406 [inline]
    cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
    process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
    worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
    kthread+0x2f0/0x378 kernel/kthread.c:255
    ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117

    The buggy address belongs to the object at ffff8003496a3f80
    which belongs to the cache net_namespace of size 7872
    The buggy address is located 1796 bytes inside of
    7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
    The buggy address belongs to the page:
    page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
    index:0x0 compound_mapcount: 0
    flags: 0xffffe0000008100(slab|head)
    raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
    raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
    Reported-by: Hulk Robot
    Signed-off-by: Mao Wenan
    Signed-off-by: David S. Miller

    Mao Wenan
     

10 Mar, 2019

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a slightly more active cycle than normal with ongoing
    core changes and quite a lot of collected driver updates.

    - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

    - A new data transfer mode for HFI1 giving higher performance

    - Significant functional and bug fix update to the mlx5
    On-Demand-Paging MR feature

    - A chip hang reset recovery system for hns

    - Change mm->pinned_vm to an atomic64

    - Update bnxt_re to support a new 57500 chip

    - A sane netlink 'rdma link add' method for creating rxe devices and
    fixing the various unregistration race conditions in rxe's
    unregister flow

    - Allow looking up objects by an ID over netlink

    - Various reworking of the core to driver interface:
    - drivers should not assume umem SGLs are in PAGE_SIZE chunks
    - ucontext is accessed via udata not other means
    - start to make the core code responsible for object memory
    allocation
    - drivers should convert struct device to struct ib_device via a
    helper
    - drivers have more tools to avoid use after unregister problems"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
    net/mlx5: ODP support for XRC transport is not enabled by default in FW
    IB/hfi1: Close race condition on user context disable and close
    RDMA/umem: Revert broken 'off by one' fix
    RDMA/umem: minor bug fix in error handling path
    RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
    cxgb4: kfree mhp after the debug print
    IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
    IB/rdmavt: Fix loopback send with invalidate ordering
    IB/iser: Fix dma_nents type definition
    IB/mlx5: Set correct write permissions for implicit ODP MR
    bnxt_re: Clean cq for kernel consumers only
    RDMA/uverbs: Don't do double free of allocated PD
    RDMA: Handle ucontext allocations by IB/core
    RDMA/core: Fix a WARN() message
    bnxt_re: fix the regression due to changes in alloc_pbl
    IB/mlx4: Increase the timeout for CM cache
    IB/core: Abort page fault handler silently during owning process exit
    IB/mlx5: Validate correct PD before prefetch MR
    IB/mlx5: Protect against prefetch of invalid MR
    RDMA/uverbs: Store PR pointer before it is overwritten
    ...

    Linus Torvalds
     

09 Feb, 2019

1 commit


05 Feb, 2019

7 commits

  • For RDMA transports, RDS TOS is an extension of IB QoS (Annex A13)
    to provide clients the ability to segregate traffic flows for
    different types of data. RDMA CM abstracts it for ULPs using
    rdma_set_service_type(). Internally, each traffic flow is
    represented by a connection with all of its independent resources
    like that of a normal connection, and is differentiated by
    service type. In other words, there can be multiple qp connections
    between an IP pair and each supports a unique service type.

    The feature has been added from RDSv4.1 onwards and supports
    rolling upgrades. RDMA connection metadata also carries the tos
    information to set up SL on end to end context. The original
    code was developed by Bang Nguyen in the downstream kernel back in
    the 2.6.32 kernel days and it has evolved over time.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • RDMA transport maps user tos to the underlying virtual lanes (VL)
    for IB or DSCP values. The RDMA CM transport abstracts that for
    RDS. The TCP transport makes use of the default priority 0 and maps
    all user tos values to it.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • RDS Service type (TOS) is user-defined and needs to be configured
    via RDS IOCTL interface. It must be set before initiating any
    traffic and once set the TOS can not be changed. All out-going
    traffic from the socket will be associated with its TOS.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • For legacy protocol version incompatibility with non-Linux RDS, the
    consumer reject reason is being used to convey it to the peer. But the
    choice of reject reason value as '1' was really poor.

    Anyway, for interoperability reasons with shipping products,
    it needs to be supported. For any future versions, a properly
    encoded reject reason should be used.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • Mark RDSv3.1 as compat version and add v4.1 version macros.
    Subsequent patches enable the TOS (Type of Service) feature, which is
    tied to v4.1 for RDMA transport.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • Linux 5.0-rc5

    Needed to merge the include/uapi changes so we have an up to date
    single-tree for these files. Patches already posted are also expected to
    need this for dependencies.

    Jason Gunthorpe
     
  • Keeping single line wrapper functions is not useful. Hence remove the
    ib_sg_dma_address() and ib_sg_dma_len() functions. This patch does not
    change any functionality.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Jason Gunthorpe

    Bart Van Assche
     

04 Feb, 2019

3 commits

  • Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
    socket timestamp options.
    These are the y2038 safe versions of the SO_TIMESTAMP_OLD
    and SO_TIMESTAMPNS_OLD for all architectures.

    Note that the format of scm_timestamping.ts[0] is not changed
    in this patch.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: jejb@parisc-linux.org
    Cc: ralf@linux-mips.org
    Cc: rth@twiddle.net
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-rdma@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Deepa Dinamani
     
  • As part of y2038 solution, all internal uses of
    struct timeval are replaced by struct __kernel_old_timeval
    and struct compat_timeval by struct old_timeval32.
    Make socket timestamps use these new types.

    This is mainly to be able to verify that the kernel build
    is y2038 safe when such non y2038 safe types are not
    supported anymore.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: isdn@linux-pingi.de
    Signed-off-by: David S. Miller

    Deepa Dinamani
     
  • SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
    way they are currently defined, are not y2038 safe.
    Subsequent patches in the series add new y2038 safe versions
    of these options which provide 64 bit timestamps on all
    architectures uniformly.
    Hence, rename existing options with OLD tag suffixes.

    Also note that kernel will not use the untagged SO_TIMESTAMP*
    and SCM_TIMESTAMP* options internally anymore.

    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Cc: deller@gmx.de
    Cc: dhowells@redhat.com
    Cc: jejb@parisc-linux.org
    Cc: ralf@linux-mips.org
    Cc: rth@twiddle.net
    Cc: linux-afs@lists.infradead.org
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-rdma@vger.kernel.org
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: David S. Miller

    Deepa Dinamani
     

01 Feb, 2019

1 commit

  • syzbot was able to catch a bug in rds [1]

    The issue here is that the socket might be found in a hash table
    but its refcount has already been set to 0 by another cpu.

    We need to use refcount_inc_not_zero() to be safe here.

    [1]

    refcount_t: increment on 0; use-after-free.
    WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
    WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
    panic+0x2cb/0x65c kernel/panic.c:214
    __warn.cold+0x20/0x48 kernel/panic.c:571
    report_bug+0x263/0x2b0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    fixup_bug arch/x86/kernel/traps.c:173 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
    do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
    RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
    RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
    Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
    RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
    RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
    RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
    R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
    R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
    sock_hold include/net/sock.h:647 [inline]
    rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
    rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
    rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
    rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
    rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
    rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xdd/0x130 net/socket.c:631
    __sys_sendto+0x387/0x5f0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto net/socket.c:1796 [inline]
    __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
    do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x458089
    Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
    RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
    RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
    R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

    Fixes: cc4dfb7f70a3 ("rds: fix two RCU related problems")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Cc: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Eric Dumazet
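
    The "increment unless already zero" pattern named above can be sketched
    with C11 atomics. This is a user-space illustration under that
    assumption, not the kernel's refcount_inc_not_zero() implementation;
    sketch_ref_inc_not_zero() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count != 0).
 * Returns false when the count already dropped to zero, in which case
 * the caller must treat the lookup as a miss. */
static bool sketch_ref_inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure the current value is reloaded into old,
         * so the zero check is repeated before the next attempt. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

    A plain increment would revive an object whose last reference was
    just dropped; the compare-and-swap loop never moves the count off zero.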
     

07 Jan, 2019

1 commit


04 Jan, 2019

1 commit

  • Pull networking fixes from David Miller:
    "Several fixes here. Basically split down the line between newly
    introduced regressions and long existing problems:

    1) Double free in tipc_enable_bearer(), from Cong Wang.

    2) Many fixes to nf_conncount, from Florian Westphal.

    3) op->get_regs_len() can throw an error, check it, from Yunsheng
    Lin.

    4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
    driver, from Scott Wood.

    5) Infinite loop in fib_empty_table(), from Yue Haibing.

    6) Use after free in ax25_fillin_cb(), from Cong Wang.

    7) Fix socket locking in nr_find_socket(), also from Cong Wang.

    8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

    9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

    10) Fix ptr_ring wrap during queue swap, from Cong Wang.

    11) Missing shutdown callback in hinic driver, from Xue Chaojing.

    12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
    Brivio.

    13) BPF out of bounds speculation fixes from Daniel Borkmann"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
    ipv6: Consider sk_bound_dev_if when binding a socket to an address
    ipv6: Fix dump of specific table with strict checking
    bpf: add various test cases to selftests
    bpf: prevent out of bounds speculation on pointer arithmetic
    bpf: fix check_map_access smin_value test when pointer contains offset
    bpf: restrict unknown scalars of mixed signed bounds for unprivileged
    bpf: restrict stack pointer arithmetic for unprivileged
    bpf: restrict map value pointer arithmetic for unprivileged
    bpf: enable access to ax register also from verifier rewrite
    bpf: move tmp variable into ax register in interpreter
    bpf: move {prev_,}insn_idx into verifier env
    isdn: fix kernel-infoleak in capi_unlocked_ioctl
    ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
    net/hamradio/6pack: use mod_timer() to rearm timers
    net-next/hinic:add shutdown callback
    net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
    ip: validate header length on virtual device xmit
    tap: call skb_probe_transport_header after setting skb->dev
    ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
    net: rds: remove unnecessary NULL check
    ...

    Linus Torvalds
     

02 Jan, 2019

1 commit


29 Dec, 2018

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "This has been a fairly typical cycle, with the usual sorts of driver
    updates. Several series continue to come through which improve and
    modernize various parts of the core code, and we finally are starting
    to get the uAPI command interface cleaned up.

    - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
    mlx5, qib, rxe, usnic

    - Rework the entire syscall flow for uverbs to be able to run over
    ioctl(). Finally getting past the historic bad choice to use
    write() for command execution

    - More functional coverage with the mlx5 'devx' user API

    - Start of the HFI1 series for 'TID RDMA'

    - SRQ support in the hns driver

    - Support for new IBTA defined 2x lane widths

    - A big series to consolidate all the driver function pointers into a
    big struct and have drivers provide a 'static const' version of the
    struct instead of open coding initialization

    - New 'advise_mr' uAPI to control device caching/loading of page
    tables

    - Support for inline data in SRPT

    - Modernize how umad uses the driver core and creates cdev's and
    sysfs files

    - First steps toward removing 'uobject' from the view of the drivers"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
    RDMA/srpt: Use kmem_cache_free() instead of kfree()
    RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
    IB/uverbs: Signedness bug in UVERBS_HANDLER()
    IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
    IB/umad: Start using dev_groups of class
    IB/umad: Use class_groups and let core create class file
    IB/umad: Refactor code to use cdev_device_add()
    IB/umad: Avoid destroying device while it is accessed
    IB/umad: Simplify and avoid dynamic allocation of class
    IB/mlx5: Fix wrong error unwind
    IB/mlx4: Remove set but not used variable 'pd'
    RDMA/iwcm: Don't copy past the end of dev_name() string
    IB/mlx5: Fix long EEH recover time with NVMe offloads
    IB/mlx5: Simplify netdev unbinding
    IB/core: Move query port to ioctl
    RDMA/nldev: Expose port_cap_flags2
    IB/core: uverbs copy to struct or zero helper
    IB/rxe: Reuse code which sets port state
    IB/rxe: Make counters thread safe
    IB/mlx5: Use the correct commands for UMEM and UCTX allocation
    ...

    Linus Torvalds
     

20 Dec, 2018

3 commits

  • >> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer

    Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs")
    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    David S. Miller
     
  • per comment from Leon in rdma mailing list
    https://lkml.org/lkml/2018/10/31/312 :

    Please don't forget to remove user triggered WARN_ON.
    https://lwn.net/Articles/769365/
    "Greg Kroah-Hartman raised the problem of core kernel API code that will
    use WARN_ON_ONCE() to complain about bad usage; that will not generate
    the desired result if WARN_ON_ONCE() is configured to crash the machine.
    He was told that the code should just call pr_warn() instead, and that
    the called function should return an error in such situations. It was
    generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
    triggered from user space need to be fixed."

    In addition, harden rds_sendmsg to detect and overcome issues with
    an invalid sg count and fail the sendmsg.

    Suggested-by: Leon Romanovsky
    Acked-by: Santosh Shilimkar
    Signed-off-by: shamir rabinovitch
    Signed-off-by: David S. Miller

    shamir rabinovitch
     
  • A redundant copy_from_user in the rds_sendmsg system call exposes rds
    to an issue where rds_rdma_extra_size walks the rds iovec and
    calculates the number of pages (sgs) it needs to add to the tail of the
    rds message, and later rds_cmsg_rdma_args copies the rds iovec again,
    recalculates the same number and gets a different result, causing a
    WARN_ON in rds_message_alloc_sgs.

    Fix this by doing the copy_from_user only once per rds_sendmsg
    system call.

    When the issue occurs, the dump below is seen:

    WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x244/0x39d lib/dump_stack.c:113
    panic+0x2ad/0x55c kernel/panic.c:188
    __warn.cold.8+0x20/0x45 kernel/panic.c:540
    report_bug+0x254/0x2d0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
    RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
    Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
    RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
    RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
    RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
    RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
    R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
    R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
    rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
    rds_cmsg_send net/rds/send.c:971 [inline]
    rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:632
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
    __sys_sendmsg+0x11d/0x280 net/socket.c:2155
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x44a859
    Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
    RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
    RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
    R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
    Kernel Offset: disabled
    Rebooting in 86400 seconds..

    Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
    Acked-by: Santosh Shilimkar
    Signed-off-by: shamir rabinovitch
    Signed-off-by: David S. Miller

    shamir rabinovitch
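
    The "copy once, then size from the copy" approach can be sketched in
    user space. All names here (sketch_iovec, sketch_snapshot(),
    sketch_total_pages()) and the 4096-byte page size are assumptions for
    illustration; ordinary memory stands in for user memory.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ul

struct sketch_iovec {
    unsigned long addr;
    unsigned long bytes;
};

/* Number of pages touched by one entry (0 for an empty entry). */
static unsigned long sketch_pages_for(const struct sketch_iovec *v)
{
    if (v->bytes == 0)
        return 0;
    return (v->addr + v->bytes - 1) / SKETCH_PAGE_SIZE -
           v->addr / SKETCH_PAGE_SIZE + 1;
}

/* Total pages over a vector; always called on the private snapshot. */
static unsigned long sketch_total_pages(const struct sketch_iovec *vec, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += sketch_pages_for(&vec[i]);
    return total;
}

/* Copy the caller-controlled vector exactly once. Returns 0 on success,
 * -1 if the destination is too small. */
static int sketch_snapshot(struct sketch_iovec *dst, size_t cap,
                           const struct sketch_iovec *src, size_t n)
{
    if (n > cap)
        return -1;
    memcpy(dst, src, n * sizeof(*dst));
    return 0;
}
```

    Because both the sizing pass and the later use read the snapshot, a
    concurrent change to the original vector cannot make the two disagree.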
     

12 Dec, 2018

1 commit


13 Oct, 2018

1 commit


11 Oct, 2018

1 commit

  • In rds_send_mprds_hash(), if the calculated hash value is non-zero and
    the MPRDS connections are not yet up, it will wait. But it should not
    wait if the send is non-blocking. In this case, it should just use the
    base c_path for sending the message.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

26 Sep, 2018

1 commit


24 Sep, 2018

1 commit


22 Sep, 2018

1 commit

  • Clang warns when two declarations' section attributes don't match.

    net/rds/ib_stats.c:40:1: warning: section does not match previous declaration [-Wsection]
    DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
    ^
    ./include/linux/percpu-defs.h:142:2: note: expanded from macro 'DEFINE_PER_CPU_SHARED_ALIGNED'
    DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
    ^
    ./include/linux/percpu-defs.h:93:9: note: expanded from macro 'DEFINE_PER_CPU_SECTION'
    extern __PCPU_ATTRS(sec) __typeof__(type) name; \
    ^
    ./include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
    __percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
    ^
    net/rds/ib.h:446:1: note: previous attribute is here
    DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
    ^
    ./include/linux/percpu-defs.h:111:2: note: expanded from macro 'DECLARE_PER_CPU'
    DECLARE_PER_CPU_SECTION(type, name, "")
    ^
    ./include/linux/percpu-defs.h:87:9: note: expanded from macro 'DECLARE_PER_CPU_SECTION'
    extern __PCPU_ATTRS(sec) __typeof__(type) name
    ^
    ./include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
    __percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
    ^
    1 warning generated.

    The initial declaration was added in commit ec16227e1414 ("RDS/IB:
    Infiniband transport") and the cache-aligned definition was added in
    commit e6babe4cc4ce ("RDS/IB: Stats and sysctls") right after. The
    declaration in net/rds/ib.h probably should have been updated at that
    point, which is what this patch does.

    Link: https://github.com/ClangBuiltLinux/linux/issues/114
    Signed-off-by: Nathan Chancellor
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Nathan Chancellor
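
The kind of agreement Clang checks for can be sketched outside the kernel. This is a minimal illustration assuming an ELF target with GCC or Clang; the variable, the function, and the section name `.data.sketch_stats` are hypothetical and are not the per-CPU macros from percpu-defs.h.

```c
/* Declaration, as a shared header would carry it. The section name is hypothetical. */
extern int stats_counter __attribute__((section(".data.sketch_stats")));

/*
 * Definition using the same section attribute as the declaration.
 * If the declaration above named a different section (or none), Clang
 * would report a -Wsection mismatch like the one quoted in the commit.
 */
int stats_counter __attribute__((section(".data.sketch_stats"))) = 0;

/* Increment the counter and return its new value. */
int bump_stats(void)
{
    return ++stats_counter;
}
```

Keeping the attribute in one macro shared by declaration and definition, as the kernel's per-CPU macros do, avoids this class of mismatch.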
     

17 Sep, 2018

1 commit

  • The function rds_inc_init() is on the receive path. Using memset()
    in rds_inc_init() makes it faster.
    Test results:

    Before:
    1) + 24.950 us | rds_inc_init [rds]();
    After:
    1) + 10.990 us | rds_inc_init [rds]();

    Acked-by: Santosh Shilimkar
    Signed-off-by: Zhu Yanjun
    Signed-off-by: David S. Miller

    Zhu Yanjun
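
The commit message only states that memset() is used; the sketch below shows one way such a change could look. `struct incoming_sketch` and both functions are hypothetical, and the field-by-field "before" version is an assumption, not the original kernel code.

```c
#include <string.h>

/* Hypothetical stand-in for a receive-side message header (not struct rds_incoming). */
struct incoming_sketch {
    int                refcount;
    unsigned long      rx_jiffies;
    unsigned int       rdma_cookie;
    unsigned long long rx_trace[4];
};

/* Assumed "before" shape: each field cleared individually. */
void inc_init_fieldwise(struct incoming_sketch *inc)
{
    inc->refcount = 1;
    inc->rx_jiffies = 0;
    inc->rdma_cookie = 0;
    for (int i = 0; i < 4; i++)
        inc->rx_trace[i] = 0;
}

/* "After" shape: zero the whole object once, then set the non-zero fields. */
void inc_init_memset(struct incoming_sketch *inc)
{
    memset(inc, 0, sizeof(*inc));
    inc->refcount = 1;
}
```

Both functions leave every field in the same state; the memset() variant also clears padding bytes and lets the compiler emit one bulk store.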
     

13 Sep, 2018

1 commit


12 Sep, 2018

1 commit

  • When an rds sock is bound, it is inserted into the bind_hash_table,
    which is protected by RCU. But when an rds sock is released, it is
    freed immediately after being removed from this hash table, without
    respecting the RCU grace period. This could cause a use-after-free,
    as reported by syzbot.

    Mark the rds sock with SOCK_RCU_FREE before inserting it into the
    bind_hash_table, so that it is always freed after an RCU grace
    period.

    The other problem is in rds_find_bound(): the rds sock could be
    freed between rhashtable_lookup_fast() and rds_sock_addref(),
    so we need to extend the RCU read lock protection in rds_find_bound()
    to close this race condition.

    Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
    Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Signed-off-by: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Cong Wang
     

05 Sep, 2018

1 commit


02 Sep, 2018

1 commit

  • rds is the last in-kernel user of the old do_gettimeofday()
    function. Convert it over to ktime_get_real() to make it
    work more like the generic socket timestamps, and to let
    us kill off do_gettimeofday().

    A follow-up patch will have to change the user space interface
    to deal better with 32-bit tasks, which may use an incompatible
    layout for 'struct timespec'.

    Signed-off-by: Arnd Bergmann
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Arnd Bergmann
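
A nanosecond wall-clock value, such as the kind ktime_get_real() yields in the kernel, has to be split into seconds and a sub-second part before it fits a 'struct timespec'. The user-space sketch below shows that split. `ns_to_timespec_sketch()` is a hypothetical helper, not the kernel's own conversion routine, and it assumes a non-negative input.

```c
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/*
 * Split a non-negative nanosecond count since the epoch into the
 * seconds/nanoseconds pair used by struct timespec.
 */
struct timespec ns_to_timespec_sketch(int64_t ns)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(ns / SKETCH_NSEC_PER_SEC);
    ts.tv_nsec = (long)(ns % SKETCH_NSEC_PER_SEC);
    return ts;
}
```

The width of `tv_sec` and `tv_nsec` depends on the ABI, which is the layout concern the commit message raises for 32-bit tasks.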
     

01 Sep, 2018

1 commit

  • The prompt "The RDS Protocol" (RDS) is not very helpful, and it is
    easily confused with Radio Data System (which we may want to support
    in the kernel, too).

    Signed-off-by: Pavel Machek
    Acked-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Pavel Machek
     

28 Aug, 2018

1 commit

  • For IPv4, the newly introduced rdma_read_gids() is used to read the
    SGID/DGID of the connection; it also returns the correct GIDs for the
    RoCE transport.

    For IPv6, rdma_read_gids() is used as well. It was introduced for the
    following reasons:

    rdma_addr_get_dgid() returns the MAC address instead of the DGID for
    client-side RoCE connections.
    rdma_addr_get_sgid() does not return the correct SGID for RoCE with
    IPv6 and when more than one IP address is assigned to the netdevice.

    The rdma_cm module therefore provides the transport-agnostic
    rdma_read_gids() API.

    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     

22 Aug, 2018

1 commit