12 Apr, 2014

1 commit

  • Pull 9p changes from Eric Van Hensbergen:
    "A bunch of updates and cleanup within the transport layer,
    particularly with a focus on RDMA"

    * tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9pnet_rdma: check token type before int conversion
    9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together.
    9pnet: p9_client->conn field is unused. Remove it.
    9P: Get rid of REQ_STATUS_FLSH
    9pnet_rdma: add cancelled()
    9pnet_rdma: update request status during send
    9P: Add cancelled() to the transport functions.
    net: Mark function as static in 9p/client.c
    9P: Add memory barriers to protect request fields over cb/rpc threads handoff

    Linus Torvalds
     

26 Mar, 2014

8 commits

  • When parsing options, make sure we have found a proper token before
    doing a numeric conversion.

    Without this check, the current code will end up following random
    pointers that just happened to be on the stack when this function was
    called, because match_token() will not touch the 'args' list unless a
    valid token is found.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
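
The check described in this entry can be sketched in userspace C. This is a minimal illustration, not the kernel patch: `match_opt()`, `parse_one_opt()`, the option names, and the `port=`/`sq=` prefixes are all hypothetical stand-ins. The point it shows is that the matcher writes the argument only on a successful match, so the caller must test the token before converting anything.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; the names are illustrative only. */
enum opt_token { OPT_PORT, OPT_SQ_DEPTH, OPT_ERR };

struct opt_pattern {
    enum opt_token token;
    const char *prefix;
};

static const struct opt_pattern patterns[] = {
    { OPT_PORT, "port=" },
    { OPT_SQ_DEPTH, "sq=" },
};

/*
 * Mirrors the behaviour described for match_token(): *arg is written only
 * when a pattern matches. On OPT_ERR the caller's variable is untouched.
 */
static enum opt_token match_opt(const char *s, const char **arg)
{
    for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        size_t n = strlen(patterns[i].prefix);

        if (strncmp(s, patterns[i].prefix, n) == 0) {
            *arg = s + n;
            return patterns[i].token;
        }
    }
    return OPT_ERR;
}

/* Returns 0 on success or for an ignored unknown option, -EINVAL on a bad number. */
static int parse_one_opt(const char *s, int *port, int *sq_depth)
{
    /* NULL only keeps this sketch well-defined; in the kernel the
     * argument list is uninitialised stack memory at this point. */
    const char *arg = NULL;
    enum opt_token token;
    char *end;
    long v;

    assert(s && port && sq_depth);
    token = match_opt(s, &arg);
    if (token == OPT_ERR)
        return 0;               /* check the token before any conversion */

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno || end == arg || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return -EINVAL;

    if (token == OPT_PORT)
        *port = (int)v;
    else
        *sq_depth = (int)v;
    return 0;
}
```

Calling `parse_one_opt("bogus", &port, &sq)` returns 0 and leaves both outputs alone, because the conversion is never reached for an unmatched token.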
     
  • There is no point in allocating these structs separately.
    Changing this makes the code a little simpler and saves a few bytes of
    memory.

    Reported-by: Herve Vico
    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
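
As a rough illustration of allocating the two structs together, one can embed the connection state in the transport struct so a single allocation and a single free cover both. The struct and function names below are hypothetical, heavily reduced models, not the kernel definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-ins for the two structs. */
struct conn_model {
    int err;
};

struct trans_fd_model {
    int rd_fd;
    int wr_fd;
    struct conn_model conn;   /* embedded, so one allocation covers both */
};

/* Allocates the transport and its connection state in one block. */
static struct trans_fd_model *trans_fd_open(int rd_fd, int wr_fd)
{
    struct trans_fd_model *ts;

    assert(rd_fd >= 0 && wr_fd >= 0);
    ts = calloc(1, sizeof(*ts));
    if (!ts)
        return NULL;
    ts->rd_fd = rd_fd;
    ts->wr_fd = wr_fd;
    return ts;
}

/* A single free releases the connection state as well. */
static void trans_fd_close(struct trans_fd_model *ts)
{
    free(ts);
}
```

Compared with two separate allocations, this removes one failure path and one pointer field, which is where the small memory saving comes from.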
     
  • This request state is mostly useless, and properly implementing it
    for RDMA would require an extra lock to be taken in handle_recv()
    and in rdma_cancel() to avoid this race:

    handle_recv()              rdma_cancel()
    .                          .
    .                          if req->state == SENT
    req->state = RCVD          .
    .                              req->state = FLSH

    So just get rid of it.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • Take into account posted recv buffers that will never receive their
    reply.

    The RDMA code posts a recv buffer for each request that it sends.
    When a request is flushed, it is possible that this request will
    never receive a reply, and that one recv buffer will stay unused on
    the recv queue.

    If this scenario happens several times, the recv queue can fill up,
    leaving the 9pnet_rdma module unable to send new requests.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • This will be needed by the flush logic.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • And move transport-specific code out of net/9p/client.c

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • Mark function as static in net/9p/client.c because it is not used
    outside this file.

    This eliminates the following warning in net/9p/client.c:
    net/9p/client.c:207:18: warning: no previous prototype for ‘p9_fcall_alloc’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Eric Van Hensbergen

    Rashika
     
  • We need barriers to guarantee this pattern works as intended:
    [w] req->rc, 1          [r] req->status, 1
    wmb                     rmb
    [w] req->status, 1      [r] req->rc

    Where the wmb ensures that rc gets written before status,
    and the rmb ensures that if you observe status == 1, rc is the new value.

    Signed-off-by: Dominique Martinet
    Signed-off-by: Eric Van Hensbergen

    Dominique Martinet
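
The write/read pairing above can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions: `struct req_model`, `publish_reply()` and `try_consume_reply()` are hypothetical names, and the release and acquire fences stand in for the kernel's wmb and rmb rather than reproducing them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request model: rc is the payload, status the "ready" flag. */
struct req_model {
    const char *rc;
    atomic_int status;      /* 0 = pending, 1 = reply ready */
};

/* Callback side: write rc first, then publish status. */
static void publish_reply(struct req_model *r, const char *reply)
{
    assert(r && reply);
    r->rc = reply;                                        /* [w] req->rc */
    atomic_thread_fence(memory_order_release);            /* stands in for wmb */
    atomic_store_explicit(&r->status, 1,
                          memory_order_relaxed);          /* [w] req->status */
}

/* RPC side: returns the reply once status == 1 is observed, else NULL. */
static const char *try_consume_reply(struct req_model *r)
{
    assert(r);
    if (atomic_load_explicit(&r->status,
                             memory_order_relaxed) != 1)  /* [r] req->status */
        return NULL;
    atomic_thread_fence(memory_order_acquire);            /* stands in for rmb */
    return r->rc;                                         /* [r] req->rc */
}
```

A reader that sees `status == 1` and then passes the acquire fence is guaranteed to see the `rc` value written before the release fence, which is the property the commit message asks for.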
     

11 Feb, 2014

1 commit

  • The 9p-virtio transport does zero copy on things larger than 1024 bytes
    in size. It accomplishes this by returning the physical addresses of
    pages to the virtio-pci device. At present, the translation is usually a
    bit shift.

    That approach produces an invalid page address when we read/write to
    vmalloc buffers, such as those used for Linux kernel modules. Any
    attempt to load a Linux kernel module from 9p-virtio produces the
    following stack.

    [] p9_virtio_zc_request+0x45e/0x510
    [] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
    [] p9_client_read+0x15d/0x240
    [] v9fs_fid_readn+0x50/0xa0
    [] v9fs_file_readn+0x10/0x20
    [] v9fs_file_read+0x37/0x70
    [] vfs_read+0x9b/0x160
    [] kernel_read+0x41/0x60
    [] copy_module_from_fd.isra.34+0xfb/0x180

    Subsequently, QEMU will die printing:

    qemu-system-x86_64: virtio: trying to map MMIO memory

    This patch enables 9p-virtio to correctly handle this case. This not
    only enables us to load Linux kernel modules off virtfs, but also
    enables ZFS file-based vdevs on virtfs to be used without killing QEMU.

    Special thanks to both Avi Kivity and Alexander Graf for their
    interpretation of QEMU backtraces. Without their guidance, tracking down
    this bug would have taken much longer. Also, special thanks to Linus
    Torvalds for his insightful explanation of why this should use
    is_vmalloc_addr() instead of is_vmalloc_or_module_addr():

    https://lkml.org/lkml/2014/2/8/272

    Signed-off-by: Richard Yao
    Signed-off-by: David S. Miller

    Richard Yao
     

10 Feb, 2014

1 commit

  • Mark function as static in net/9p/client.c because it is not used
    outside this file.

    This eliminates the following warning in net/9p/client.c:
    net/9p/client.c:207:18: warning: no previous prototype for ‘p9_fcall_alloc’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David S. Miller

    Rashika Kheria
     

24 Nov, 2013

1 commit


15 Nov, 2013

1 commit

  • Pull virtio updates from Rusty Russell:
    "Nothing really exciting: some groundwork for changing virtio endian,
    and some robustness fixes for broken virtio devices, plus minor
    tweaks"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio_scsi: verify if queue is broken after virtqueue_get_buf()
    x86, asmlinkage, lguest: Pass in globals into assembler statement
    virtio: mmio: fix signature checking for BE guests
    virtio_ring: adapt to notify() returning bool
    virtio_net: verify if queue is broken after virtqueue_get_buf()
    virtio_console: verify if queue is broken after virtqueue_get_buf()
    virtio_blk: verify if queue is broken after virtqueue_get_buf()
    virtio_ring: add new function virtqueue_is_broken()
    virtio_test: verify if virtqueue_kick() succeeded
    virtio_net: verify if virtqueue_kick() succeeded
    virtio_ring: let virtqueue_{kick()/notify()} return a bool
    virtio_ring: change host notification API
    virtio_config: remove virtio_config_val
    virtio: use size-based config accessors.
    virtio_config: introduce size-based accessors.
    virtio_ring: plug kmemleak false positive.
    virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM

    Linus Torvalds
     

25 Oct, 2013

1 commit


17 Oct, 2013

1 commit


12 Sep, 2013

1 commit

  • Pull 9p updates from Eric Van Hensbergen:
    "Minor 9p fixes and tweaks for 3.12 merge window

    The first fixes a namespace issue which causes a kernel NULL pointer
    dereference, the second fixes uevent handling to work better with
    udev, and the third switches some code to use strlcpy instead of
    strncpy in order to be safer.

    All changes have been baking in for-next for at least 2 weeks"

    * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: avoid accessing utsname after namespace has been torn down
    9p: send uevent after adding/removing mount_tag attribute
    fs: 9p: use strlcpy instead of strncpy

    Linus Torvalds
     

26 Aug, 2013

2 commits

  • During trinity fuzzing in a kvmtool guest, I stumbled across the
    following:

    Unable to handle kernel NULL pointer dereference at virtual address 00000004
    PC is at v9fs_file_do_lock+0xc8/0x1a0
    LR is at v9fs_file_do_lock+0x48/0x1a0
    [] (v9fs_file_do_lock+0xc8/0x1a0) from [] (locks_remove_flock+0x8c/0x124)
    [] (locks_remove_flock+0x8c/0x124) from [] (__fput+0x58/0x1e4)
    [] (__fput+0x58/0x1e4) from [] (task_work_run+0xac/0xe8)
    [] (task_work_run+0xac/0xe8) from [] (do_exit+0x6bc/0x8d8)
    [] (do_exit+0x6bc/0x8d8) from [] (do_group_exit+0x3c/0xb0)
    [] (do_group_exit+0x3c/0xb0) from [] (__wake_up_parent+0x0/0x18)

    I believe this is due to an attempt to access utsname()->nodename, after
    exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
    as NULL and causing the above dereference.

    A similar issue was fixed for lockd in 9a1b6bf818e7 ("LOCKD: Don't call
    utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
    something similar for 9pfs.

    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Cc: Trond Myklebust
    Signed-off-by: Will Deacon
    Signed-off-by: Eric Van Hensbergen

    Will Deacon
     
  • This driver adds an attribute to the existing virtio device, so a CHANGE
    event is required in order for udev rules to make use of it. The ADD event
    happens before this driver is probed and, unlike a more typical driver
    such as a block device, there isn't a higher level device to watch for.

    Signed-off-by: Michael Marineau
    Signed-off-by: Eric Van Hensbergen

    Michael Marineau
     

31 Jul, 2013

1 commit


25 Jul, 2013

1 commit

  • This patch gets rid of the following warning:

    net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]
    static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

    The rdma_cancelled function is not called anywhere in the kernel.

    Signed-off-by: Andi Shyti
    Signed-off-by: David S. Miller

    Andi Shyti
     

14 Jul, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Just a bunch of small fixes and tidy ups:

    1) Finish the "busy_poll" renames, from Eliezer Tamir.

    2) Fix RCU stalls in IFB driver, from Ding Tianhong.

    3) Linearize buffers properly in tun/macvtap zerocopy code.

    4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

    5) Spinlock used before init in alx driver, from Maarten Lankhorst.

    6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
    Kravkov.

    7) Dummy and ifb driver load failure paths can oops, fixes from Tan
    Xiaojun and Ding Tianhong.

    8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

    9) Account all TCP retransmits in SNMP stats properly, from Yuchung
    Cheng.

    10) atl1e and via-rhine do not handle DMA mapping failures properly,
    from Neil Horman.

    11) Various equal-cost multipath route fixes in ipv6 from Hannes
    Frederic Sowa"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
    ipv6: only static routes qualify for equal cost multipathing
    via-rhine: fix dma mapping errors
    atl1e: fix dma mapping warnings
    tcp: account all retransmit failures
    usb/net/r815x: fix cast to restricted __le32
    usb/net/r8152: fix integer overflow in expression
    net: access page->private by using page_private
    net: strict_strtoul is obsolete, use kstrtoul instead
    drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
    net/usb: add relative mii functions for r815x
    net/tipc: use %*phC to dump small buffers in hex form
    qlcnic: Adding Maintainers.
    gre: Fix MTU sizing check for gretap tunnels
    pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
    pkt_sched: sch_qfq: improve efficiency of make_eligible
    gso: Update tunnel segmentation to support Tx checksum offload
    inet: fix spacing in assignment
    ifb: fix oops when loading the ifb failed
    ...

    Linus Torvalds
     

12 Jul, 2013

2 commits

  • p9_release_pages() would attempt to dereference one value past the end of
    pages[]. This would cause the following crashes:

    [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
    [ 6293.174146] IP: [] p9_release_pages+0x3b/0x60
    [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
    [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 6293.180060] Modules linked in:
    [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954
    [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
    [ 6293.180060] RIP: 0010:[] [] p9_release_pages+0x3b/0x60
    [ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202
    [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
    [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
    [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
    [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
    [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
    [ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
    [ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
    [ 6293.256784] Stack:
    [ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
    [ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
    [ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
    [ 6293.256784] Call Trace:
    [ 6293.256784] [] p9_virtio_zc_request+0x598/0x630
    [ 6293.256784] [] ? wake_up_bit+0x40/0x40
    [ 6293.256784] [] p9_client_zc_rpc+0x111/0x3a0
    [ 6293.256784] [] ? sched_clock_cpu+0x108/0x120
    [ 6293.256784] [] p9_client_read+0xe1/0x2c0
    [ 6293.256784] [] v9fs_file_read+0x90/0xc0
    [ 6293.256784] [] vfs_read+0xc3/0x130
    [ 6293.256784] [] ? trace_hardirqs_on+0xd/0x10
    [ 6293.256784] [] SyS_read+0x62/0xa0
    [ 6293.256784] [] tracesys+0xdd/0xe2
    [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
    [ 6293.256784] RIP [] p9_release_pages+0x3b/0x60
    [ 6293.256784] RSP
    [ 6293.256784] CR2: ffff8807c96f3000
    [ 6293.256784] ---[ end trace 50822ee72cd360fc ]---

    Signed-off-by: Sasha Levin
    Signed-off-by: David S. Miller

    Sasha Levin
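
The class of bug described here, reading `pages[i]` before confirming that `i` is in range, can be avoided by testing the bound first. The following is a userspace sketch with hypothetical names (`struct fake_page`, `fake_put_page()`, `release_pages_bounded()`); it illustrates the loop shape and is not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted page. */
struct fake_page {
    int refcount;
};

static void fake_put_page(struct fake_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* Drops one reference on each non-NULL entry; returns how many were released. */
static int release_pages_bounded(struct fake_page **pages, int nr_pages)
{
    int released = 0;

    /* The bound is tested before pages[i] is read, so the loop never
     * touches the slot one past the end of the array. */
    for (int i = 0; i < nr_pages; i++) {
        if (pages[i] == NULL)
            continue;
        fake_put_page(pages[i]);
        released++;
    }
    return released;
}
```

With `nr_pages == 0` the loop body never runs, so even the first slot is left unread.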
     
  • …inux/kernel/git/ericvh/v9fs

    Pull second round of 9p patches from Eric Van Hensbergen:
    "Several of these patches were rebased in order to correct style
    issues. Only stylistic changes were made versus the patches which
    were in linux-next for two weeks. The rebases have been in linux-next
    for 3 days and have passed my regressions.

    The bulk of these are RDMA fixes and improvements. There are also some
    additions on the extended attributes front to support some additional
    namespaces and a new option for TCP to force allocation of mount
    requests from a privileged port"

    * tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
    9P: Add cancelled() to the transport functions.
    9P/RDMA: count posted buffers without a pending request
    9P/RDMA: Improve error handling in rdma_request
    9P/RDMA: Do not free req->rc in error handling in rdma_request()
    9P/RDMA: Use a semaphore to protect the RQ
    9P/RDMA: Protect against duplicate replies
    9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
    9pnet: refactor struct p9_fcall alloc code
    9P/RDMA: rdma_request() needs not allocate req->rc
    9P: Fix fcall allocation for rdma
    fs/9p: xattr: add trusted and security namespaces
    net/9p: add privport option to 9p tcp transport

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • …inux/kernel/git/ericvh/v9fs

    Pull 9p update from Eric Van Hensbergen:
    "Grab bag of little fixes and enhancements:
    - optional security enhancements
    - fix path coverage in MAINTAINERS
    - switch to using most used protocol and transport as default
    - clean up buffer dumps in trace code

    Held off on RDMA patches as they need to be cleaned up a bit, but will
    try to get them cleaned, checked, and pushed by mid-week"

    * tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: Add rest of 9p files to MAINTAINERS entry
    9p: trace: use %*ph to dump buffer
    net/9p: Handle error in zero copy request correctly for 9p2000.u
    net/9p: Use virtio transpart as the default transport
    net/9p: Make 9P2000.L the default protocol for 9p file system

    Linus Torvalds
     

08 Jul, 2013

11 commits

  • RDMA needs to post a buffer for each incoming reply.
    Hence it needs to keep count of these and needs to be
    aware of whether a flushed request has received a reply
    or not.

    This patch adds the cancelled() callback to the transport modules.
    It is called when RFLUSH has been received and the corresponding
    request will never receive a reply.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • In rdma_request():

    If an error occurs between posting the recv and the send,
    there will be a reply context posted without a pending
    request.
    Since there is no way to "un-post" it, we remember it and
    skip post_recv() for the next request.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
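
The bookkeeping described in this entry can be sketched as a counter of recv buffers that were posted but have no pending request. The names below (`struct rdma_model`, `note_orphaned_recv()`, `prepare_recv()`) are hypothetical, and the sketch is single-threaded; real transport code would need the counter to be safe against concurrent updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the transport state; field names are illustrative. */
struct rdma_model {
    int excess_rc;   /* recv buffers posted with no pending request */
    int posted;      /* total recv buffers handed to the receive queue */
};

/* Record that a recv buffer was posted but its request never went out. */
static void note_orphaned_recv(struct rdma_model *m)
{
    m->excess_rc++;
}

/* Returns true if a new recv buffer was posted, false if an orphan was reused. */
static bool prepare_recv(struct rdma_model *m)
{
    assert(m->excess_rc >= 0);
    if (m->excess_rc > 0) {
        m->excess_rc--;      /* an earlier buffer is still waiting: skip the post */
        return false;
    }
    m->posted++;             /* stands in for the real post_recv() */
    return true;
}
```

Because a posted buffer cannot be withdrawn, reusing it for the next request keeps the number of outstanding buffers equal to the number of replies still expected.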
     
  • Most importantly:
    - do not free the recv context (rpl_context) after a successful post_recv()
    - but do free the send context (c) after a failed send.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • rdma_request() should never be in charge of freeing rc.

    When an error occurs:
    * Either the rc buffer has been recv_post()'ed,
      in which case kfree()'ing it is certainly a bad idea.
    * Or it has not, and in that case req->rc still points to it,
      hence it need not be freed.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • The current code keeps track of the number of buffers posted in the RQ,
    and will prevent it from overflowing. But it does so by simply dropping
    post requests (and leaking memory in the process).
    When this happens there will actually be too few buffers posted, and
    soon the 9P server will complain about 'RNR retry counter exceeded'
    errors.

    Instead, use a semaphore, and block until the RQ is ready for another
    buffer to be posted.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
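
The "block instead of drop" idea can be sketched with a POSIX counting semaphore initialised to the queue depth. This is a userspace analogy with hypothetical names (`struct rq_model`, `rq_post_recv()`, `rq_recv_done()`); in the kernel the equivalent primitives would be a `struct semaphore` with `down()` and `up()`.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Hypothetical receive-queue model: the semaphore counts free slots. */
struct rq_model {
    sem_t slots;
};

static int rq_init(struct rq_model *rq, unsigned int rq_depth)
{
    return sem_init(&rq->slots, 0, rq_depth);
}

/* Blocks until a slot is free instead of silently dropping the post. */
static int rq_post_recv(struct rq_model *rq)
{
    int ret;

    do {
        ret = sem_wait(&rq->slots);
    } while (ret == -1 && errno == EINTR);   /* retry if interrupted by a signal */
    return ret;
}

/* Called when a posted buffer has been consumed by an incoming reply. */
static int rq_recv_done(struct rq_model *rq)
{
    return sem_post(&rq->slots);
}

/* Helper for inspection: number of slots currently free. */
static int rq_free_slots(struct rq_model *rq)
{
    int value = 0;
    int ret = sem_getvalue(&rq->slots, &value);

    assert(ret == 0);
    (void)ret;
    return value;
}
```

With a depth of 2, a third `rq_post_recv()` would sleep until `rq_recv_done()` runs, so the queue never overflows and no post is lost.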
     
  • A well-behaved server would not send the reply to a request twice.
    But if it ever happens...
    This additional check prevents the kernel from leaking memory,
    and possibly nastier consequences, in that unlikely event.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
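
A duplicate-reply guard of this kind can be sketched as a check that the request has no reply attached yet. `struct req_model` and `attach_reply()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request model: rc points at the reply once one has arrived. */
struct req_model {
    const void *rc;
};

/* Returns 0 if the reply was attached, -1 if the request already had one. */
static int attach_reply(struct req_model *req, const void *reply)
{
    assert(req && reply);
    if (req->rc != NULL)
        return -1;      /* duplicate: the caller still owns 'reply' and can discard it */
    req->rc = reply;
    return 0;
}
```

Without the check, the second reply would overwrite `req->rc` and the first buffer would no longer be reachable, which is the leak the entry refers to.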
     
  • The current value is too low to get good performance.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • p9_tag_alloc() takes care of that.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
     
  • The current code assumes that when a request in the request array
    does have a tc, it also has a rc.

    This is normally true, but not always: when using RDMA, req->rc
    will temporarily be set to NULL after the request has been sent.
    That is usually OK though, as when the reply arrives, req->rc will be
    reassigned to a sane value before the request is recycled.

    But there is a catch: if the request is flushed, the reply will never
    arrive, and req->rc will be NULL, but not req->tc.

    This patch fixes p9_tag_alloc to take this into account.

    Signed-off-by: Simon Derr
    Signed-off-by: Eric Van Hensbergen

    Simon Derr
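
The situation described above, a recycled request that still has a tc but no rc, suggests checking each buffer on its own rather than inferring one from the other. The sketch below uses hypothetical names (`struct req_model`, `req_ensure_buffers()`) and plain `malloc()` in place of the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request model with separate transmit and receive buffers. */
struct req_model {
    void *tc;
    void *rc;
};

/* Makes sure both buffers exist. Returns 0 on success, -1 on allocation failure. */
static int req_ensure_buffers(struct req_model *req, size_t msize)
{
    assert(req && msize > 0);
    if (!req->tc)
        req->tc = malloc(msize);
    if (!req->rc)               /* checked on its own: tc may exist without rc */
        req->rc = malloc(msize);
    if (!req->tc || !req->rc) {
        free(req->tc);
        free(req->rc);
        req->tc = NULL;
        req->rc = NULL;
        return -1;
    }
    return 0;
}
```

A request whose `rc` was cleared keeps its existing `tc` and only gets a fresh `rc`, so nothing is leaked and nothing is left NULL.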
     
  • If the privport option is specified, the tcp transport binds the local
    address to a reserved port before connecting to the 9p server.

    In some cases when 9P AUTH cannot be implemented, this is better than
    nothing.

    Signed-off-by: Jim Garlick
    Signed-off-by: Eric Van Hensbergen

    Jim Garlick
     

11 Jun, 2013

1 commit


29 May, 2013

1 commit

  • For a zero copy request, the error will be encoded in the user space
    buffer, so copy the error code correctly using copy_from_user. Here we
    use the extra bytes we allocate for a zero copy request. If the total
    error details are more than P9_ZC_HDR_SZ - 7 bytes, we return -EFAULT.
    The patch also avoids a memory allocation in the error path.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     

28 May, 2013

3 commits