13 Apr, 2018

1 commit

  • When a topology subscription is created, we may encounter (or KASAN
    may provoke) a failure to create a corresponding service instance in
    the binding table. Instead of letting the tipc_nametbl_subscribe()
    report the failure back to the caller, the function just makes a warning
    printout and returns, without incrementing the subscription reference
    counter as expected by the caller.

    This makes the caller believe that the subscription was successful, so
    it will at a later moment try to unsubscribe the item. This involves
    a sub_put() call. Since the reference counter never was incremented
    in the first place, we get a premature delete of the subscription item,
    followed by a "use-after-free" warning.

    We fix this by adding a return value to tipc_nametbl_subscribe() and
    make the caller aware of the failure to subscribe.

    This bug seems to always have been around, but this fix only applies
    back to the commit shown below. Given the low risk of this happening
    we believe this to be sufficient.

    Fixes: commit 218527fe27ad ("tipc: replace name table service range
    array with rb tree")
    Reported-by: syzbot+aa245f26d42b8305d157@syzkaller.appspotmail.com

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
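
The fix described above can be illustrated with a small userspace sketch. It is not the kernel code: struct sub, nametbl_subscribe(), sub_subscribe() and sub_unsubscribe() are hypothetical stand-ins, and the 'fail' parameter merely simulates the failure to create a service instance. The point is only that the subscribe step reports failure, so the caller never later drops a reference that was never taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tipc_subscription. */
struct sub {
	int refcnt;
};

/* Stand-in for the binding table insert; 'fail' simulates the
 * failure to create a service instance. */
static bool nametbl_subscribe(struct sub *s, bool fail)
{
	if (fail)
		return false;	/* no reference taken on failure */
	s->refcnt++;		/* the table now holds a reference */
	return true;
}

/* Returns NULL on failure, so the caller never tries to
 * unsubscribe something that was not subscribed. */
static struct sub *sub_subscribe(bool fail)
{
	struct sub *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->refcnt = 1;		/* creator's reference */
	if (!nametbl_subscribe(s, fail)) {
		free(s);
		return NULL;
	}
	return s;
}

/* Drops the table's reference; only valid after a successful subscribe. */
static void sub_unsubscribe(struct sub *s)
{
	assert(s->refcnt > 1);	/* the table's reference must exist */
	s->refcnt--;
}
```

With the failure propagated, a caller that receives NULL simply skips the later unsubscribe step.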
     

04 Apr, 2018

1 commit

  • When an item of struct tipc_subscription is created, we fail to
    initialize the two lists aggregated into the struct. This has so far
    never been a problem, since the items are just added to a root
    object by list_add(), which does not require the addee list to be
    pre-initialized. However, syzbot is provoking situations where this
    addition fails, whereupon the attempted removal of the item from
    the list causes a crash.

    This problem seems to always have been around, despite that the code
    for creating this object was rewritten in commit 242e82cc95f6 ("tipc:
    collapse subscription creation functions"), which is still in net-next.

    We fix this for that commit by initializing the two lists properly.

    Fixes: 242e82cc95f6 ("tipc: collapse subscription creation functions")
    Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
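
To see why the initialization matters, here is a minimal userspace sketch of a circular doubly linked list in the style of the kernel's list API. The helpers are simplified re-implementations written for this note, and struct sub with its two members is a hypothetical reduction of the real subscription struct. Once a list member points to itself, removing it is harmless even if it was never added to a root list.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace version of a circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink the entry and make it point to itself again. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical, reduced subscription with its two list members. */
struct sub {
	struct list_head service_list;
	struct list_head sub_list;
};

static void sub_init(struct sub *s)
{
	INIT_LIST_HEAD(&s->service_list);
	INIT_LIST_HEAD(&s->sub_list);
}
```

Without sub_init(), the removal would follow whatever garbage the next and prev pointers happened to contain.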
     

17 Feb, 2018

9 commits

  • We rename struct tipc_server to struct tipc_topsrv. This reflects its now
    specialized role as topology server. Accordingly, we change or add function
    prefixes to make it clearer which functionality those belong to.

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In order to narrow the interface and dependencies between the topology
    server and the subscription/binding table functionality we move struct
    tipc_server inside the file server.c. This requires some code
    adaptations in other files, but those are mostly minor.

    The most important change is that we have to move the start/stop
    functions for the topology server to server.c, where they logically
    belong anyway.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Since we now have removed struct tipc_subscriber from the code, and
    only struct tipc_subscription remains, there is no longer need for long
    and awkward prefixes to distinguish between their pertaining functions.

    We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is
    a purely cosmetic change.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • After the previous changes it becomes logical to collapse the two-level
    creation of subscription instances into one. We do that here.

    We also rename the creation and deletion functions for more consistency.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Because of the requirement for total distribution transparency, users
    send subscriptions and receive topology events in their own host format.
    It is up to the topology server to determine this format and do the
    correct conversions to and from its own host format when needed.

    Until now, this has been handled in a rather non-transparent way inside
    the topology server and subscriber code, leading to unnecessary
    complexity when creating subscriptions and issuing events.

    We now improve this situation by adding two new macros, tipc_sub_read()
    and tipc_evt_write(). Both macros calculate the need for
    conversion internally before performing their respective operations.
    Hence, all handling of such conversions becomes transparent to the rest
    of the code.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The message transmission and reception in the topology server is more
    generic than is currently necessary. By basing the functionality on the
    fact that we only send items of type struct tipc_event and always
    receive items of struct tipc_subscr we can make several simplifications,
    and also get rid of some unnecessary dynamic memory allocations.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • It is unnecessary to keep two structures, struct tipc_conn and struct
    tipc_subscriber, with a one-to-one relationship and still with different
    life cycles. The fact that the two often run in different contexts, and
    still may access each other via direct pointers constitutes an additional
    hazard, something we have experienced on several occasions, and still
    see happening.

    We have identified at least two remaining problems that are easier to
    fix if we simplify the topology server data structure somewhat.

    - When there is a race between a subscription up/down event and a
    timeout event, it is fully possible that the former might be delivered
    after the latter, leading to confusion for the receiver.

    - The function tipc_subscrp_timeout() is executing in interrupt context,
    while the following call chain is at least theoretically possible:
    tipc_subscrp_timeout()
    tipc_subscrp_send_event()
    tipc_conn_sendmsg()
    conn_put()
    tipc_conn_kref_release()
    sock_release(sock)

    I.e., we end up calling a function that might try to sleep in
    interrupt context. To eliminate this, we need to ensure that the
    tipc_conn structure and the socket, as well as the subscription
    instances, only are deleted in work queue context, i.e., after the
    timeout event really has been sent out.

    We now remove this unnecessary complexity, by merging data and
    functionality of the subscriber structure into struct tipc_conn
    and the associated file server.c. We thereafter add a spinlock and
    a new 'inactive' state to the subscription structure. Using those,
    both problems described above can be easily solved.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Interaction between the functionality in server.c and subscr.c is
    done via function pointers installed in struct server. This makes
    the code harder to follow, and doesn't serve any obvious purpose.

    Here, we replace the function pointers with direct function calls.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The socket handling in the topology server is unnecessarily generic.
    It is prepared to handle SOCK_RDM, SOCK_DGRAM and SOCK_STREAM
    type sockets, as well as the only socket type which is really used,
    SOCK_SEQPACKET.

    We now remove this redundant code to make the code more readable.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
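
One commit in the group above introduces tipc_sub_read() and tipc_evt_write() to hide host-format conversion. The sketch below shows the general idea in portable userspace C, not the actual macros: the filter constants use the values of TIPC_SUB_PORTS, TIPC_SUB_SERVICE and TIPC_SUB_CANCEL, while struct subscr, swab32(), sub_needs_swap() and sub_read() are simplified, hypothetical names. A request is assumed to come from a host with the opposite byte order when none of the known filter bits appear where a same-order sender would have put them.

```c
#include <assert.h>
#include <stdint.h>

#define SUB_PORTS   0x01u
#define SUB_SERVICE 0x02u
#define SUB_CANCEL  0x04u
#define FILTER_MASK (SUB_PORTS | SUB_SERVICE | SUB_CANCEL)

/* Hypothetical, reduced subscription request as sent by a user. */
struct subscr {
	uint32_t lower;
	uint32_t upper;
	uint32_t timeout;
	uint32_t filter;
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* No known filter bit visible: the sender used the other byte order. */
static int sub_needs_swap(const struct subscr *s)
{
	return !(s->filter & FILTER_MASK);
}

/* Read one field of the request in the reader's own host format. */
static uint32_t sub_read(const struct subscr *s, uint32_t field)
{
	return sub_needs_swap(s) ? swab32(field) : field;
}
```

Callers then read fields as sub_read(&req, req.lower) and never test the byte order themselves.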
     

17 Jan, 2018

1 commit

  • We have identified a race condition during reception of socket
    events and messages in the topology server.

    - The function tipc_close_conn() is releasing the corresponding
    struct tipc_subscriber instance without considering that there
    may still be items in the receive work queue. When those are
    scheduled, in the function tipc_receive_from_work(), they are
    using the subscriber pointer stored in struct tipc_conn, without
    first checking if this is valid or not. This will sometimes
    lead to crashes, as the next call of tipc_conn_recvmsg() will
    access the now deleted item.
    We fix this by making the usage of this pointer conditional on
    whether the connection is active or not. I.e., we check the condition
    test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg().

    - Since the two functions may be running on different cores, the
    condition test described above is not enough. tipc_close_conn()
    may come in between and delete the subscriber item after the condition
    test is done, but before tipc_conn_recvmsg() is finished. This
    happens less frequently than the problem described above, but leads
    to the same symptoms.

    We fix this by using the existing sk_callback_lock for mutual
    exclusion in the two functions. In addition, we have to move
    a call to tipc_conn_terminate() outside the mentioned lock to
    avoid deadlock.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
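
The two-step fix above (check the connected state, and do so under a lock shared with the close path) can be sketched in userspace as follows. This is an illustration under assumptions: struct conn, its fields and all function names are hypothetical, and a C11 atomic_flag spinlock stands in for sk_callback_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced connection object. */
struct conn {
	atomic_flag lock;	/* stands in for sk_callback_lock */
	bool connected;		/* stands in for the CF_CONNECTED bit */
	int *subscriber;	/* state that the close path releases */
};

static void conn_lock(struct conn *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void conn_unlock(struct conn *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void conn_init(struct conn *c, int *subscriber)
{
	atomic_flag_clear(&c->lock);
	c->connected = true;
	c->subscriber = subscriber;
}

/* Close path: drops the subscriber while holding the lock. */
static void conn_close(struct conn *c)
{
	conn_lock(c);
	c->connected = false;
	c->subscriber = NULL;
	conn_unlock(c);
}

/* Receive path: the check and the use happen under the same lock,
 * so conn_close() cannot slip in between them. */
static int conn_recv(struct conn *c, int msg)
{
	int ret = -1;

	conn_lock(c);
	if (c->connected) {
		*c->subscriber += msg;
		ret = 0;
	}
	conn_unlock(c);
	return ret;
}
```

Checking the flag without the lock would only narrow the race window, which is the second problem the commit message describes.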
     

10 Jan, 2018

2 commits

  • When a member joins a group, it also indicates a binding scope. This
    makes it possible to create both node local groups, invisible to other
    nodes, as well as cluster global groups, visible everywhere.

    In order to avoid that different members end up having permanently
    differing views of group size and membership, we must inhibit locally
    and globally bound members from joining the same group.

    We do this by using the binding scope as an additional separator between
    groups. I.e., a member must ignore all membership events from sockets
    using a different scope than itself, and all lookups for message
    destinations must require an exact match between the message's lookup
    scope and the potential target's binding scope.

    Apart from making it possible to create local groups using the same
    identity on different nodes, a side effect of this is that it now also
    becomes possible to create a cluster global group with the same identity
    across the same nodes, without interfering with the local groups.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Currently, when a user is subscribing for binding table publications,
    he will receive a PUBLISH event for all already existing matching items
    in the binding table.

    However, a group socket making a subscription doesn't need this initial
    status update from the binding table, because it has already scanned it
    during the join operation. Worse, the multiplicatory effect of issuing
    mutual events for dozens or hundreds of group members within a short time
    frame puts a heavy load on the topology server, with the end result that
    scale out operations on a big group tend to take much longer than needed.

    We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
    subscriptions, so that this initial avalanche of events is suppressed.
    This change, along with the previous commit, significantly improves the
    range and speed of group scale out operations.

    We keep the new option internal for the tipc driver, at least for now.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
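
The effect of the new filter option can be pictured with a small sketch. Everything here is an assumption made for illustration: the numeric value of SUB_NO_STATUS, struct publ, and report_existing() are hypothetical and do not mirror the driver's data structures. The function counts the initial PUBLISH events a new subscription would trigger for already existing matching items.

```c
#include <assert.h>
#include <stdint.h>

#define SUB_NO_STATUS 0x80u	/* hypothetical value for this sketch */

/* Hypothetical, reduced binding table entry. */
struct publ {
	uint32_t type;
	uint32_t instance;
};

/* Number of initial events issued when a subscription for 'type'
 * is created against a table with 'n' existing publications. */
static unsigned int report_existing(const struct publ *tbl, unsigned int n,
				    uint32_t type, uint32_t filter)
{
	unsigned int i, events = 0;

	if (filter & SUB_NO_STATUS)
		return 0;	/* subscriber already knows the status */
	for (i = 0; i < n; i++)
		if (tbl[i].type == type)
			events++;
	return events;
}
```

A group member that has already scanned the table during join would set the flag and start receiving events only for later changes.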
     

01 Nov, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Jon Maloy
    Cc: Ying Xue
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Cc: tipc-discussion@lists.sourceforge.net
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
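
The pattern behind this conversion is that the callback receives a pointer to the timer itself and recovers the enclosing object from it. Below is a userspace sketch of that idea, assuming a heavily reduced struct timer_list and a stand-in timer_setup(); struct sub and sub_timeout() are hypothetical. In the kernel, from_timer() wraps the container_of() step shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in; the kernel's struct timer_list has more fields. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Recover a pointer to the enclosing struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for timer_setup(); 'flags' is accepted but unused here. */
static void timer_setup(struct timer_list *t,
			void (*fn)(struct timer_list *), unsigned int flags)
{
	(void)flags;
	t->function = fn;
}

/* Hypothetical object that embeds its timer. */
struct sub {
	int timeouts;
	struct timer_list timer;
};

/* The callback gets the timer pointer, not an opaque data value. */
static void sub_timeout(struct timer_list *t)
{
	struct sub *s = container_of(t, struct sub, timer);

	s->timeouts++;
}
```

No separate data argument has to be stored or cast, because the timer's address already identifies its owner.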
     

23 Aug, 2017

2 commits

  • No matter whether a request is inserted into workqueue as a work item
    to cancel a subscription or to delete a subscription's subscriber
    asynchronously, the work items may be executed in different workers.
    As a result, a request raised prior to another request is not
    guaranteed to be handled before the latter. If the latter
    request is executed before the former, the error below
    may occur:

    [ 656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117
    [ 656.184487] general protection fault: 0000 [#1] SMP
    [ 656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel]
    [ 656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6
    [ 656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc]
    [ 656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000
    [ 656.190157] RIP: 0010:spin_bug+0xdd/0xf0
    [ 656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202
    [ 656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000
    [ 656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08
    [ 656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b
    [ 656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b
    [ 656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0
    [ 656.196101] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
    [ 656.197172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0
    [ 656.198873] Call Trace:
    [ 656.199210] do_raw_spin_lock+0x66/0xa0
    [ 656.199735] _raw_spin_lock_bh+0x19/0x20
    [ 656.200258] tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc]
    [ 656.200990] tipc_subscrb_rcv_cb+0x45/0x260 [tipc]
    [ 656.201632] tipc_receive_from_sock+0xaf/0x100 [tipc]
    [ 656.202299] tipc_recv_work+0x2b/0x60 [tipc]
    [ 656.202872] process_one_work+0x157/0x420
    [ 656.203404] worker_thread+0x69/0x4c0
    [ 656.203898] kthread+0x138/0x170
    [ 656.204328] ? process_one_work+0x420/0x420
    [ 656.204889] ? kthread_create_on_node+0x40/0x40
    [ 656.205527] ret_from_fork+0x29/0x40
    [ 656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f
    [ 656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8
    [ 656.209798] ---[ end trace e2a800e6eb0770be ]---

    In the above scenario, the request to delete the subscriber was
    performed earlier than the request to cancel a subscription, although
    the latter was issued before the former, which means
    tipc_subscrb_delete() was called before tipc_subscrp_cancel(). As a
    result, when tipc_subscrb_subscrp_delete(), called by
    tipc_subscrp_cancel(), was executed to cancel a subscription, the
    subscription's subscriber refcnt had already been decreased to 1. In
    tipc_subscrp_delete() the subscriber was then freed because its refcnt
    dropped to zero, but the subscriber's lock still had to be released
    afterwards. As a consequence, a panic occurred.

    By contrast, if we increase subscriber's refcnt before
    tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(),
    the panic issue can be avoided.

    Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete")
    Reported-by: Parthasarathy Bhuvaragan
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     
  • In commit 139bb36f754a ("tipc: advance the time of deleting
    subscription from subscriber->subscrp_list"), we delete the
    subscription from the subscriber's list and from the nametable
    unconditionally. This leads to the following bug if the timer
    running tipc_subscrp_timeout() on another CPU accesses the
    subscription list after the subscription delete request.

    [39.570] general protection fault: 0000 [#1] SMP
    ::
    [39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000
    [39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc]
    [39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282
    [39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101
    [39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948
    [39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8
    [39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948
    [39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600
    [39.581] FS: 0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
    [39.582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0
    [39.584] Call Trace:
    [39.584]
    [39.585] call_timer_fn+0x3d/0x180
    [39.585] ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc]
    [39.586] run_timer_softirq+0x168/0x1f0
    [39.586] ? sched_clock_cpu+0x16/0xc0
    [39.587] __do_softirq+0x9b/0x2de
    [39.587] irq_exit+0x60/0x70
    [39.588] smp_apic_timer_interrupt+0x3d/0x50
    [39.588] apic_timer_interrupt+0x86/0x90
    [39.589] RIP: 0010:default_idle+0x20/0xf0
    [39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
    [39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000
    [39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000
    [39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000
    [39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000
    [39.595]
    ::
    [39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90
    [39.604] ---[ end trace 79ce94b7216cb459 ]---

    Fixes: 139bb36f754a ("tipc: advance the time of deleting subscription from subscriber->subscrp_list")
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

29 Mar, 2017

2 commits

  • When a new subscription object is inserted into name_seq->subscriptions
    list, it's under name_seq->lock protection; when a subscription is
    deleted from the list, it's also under the same lock protection;
    similarly, when accessing a subscription by going through subscriptions
    list, the entire process is also protected by the name_seq->lock.

    Therefore, if subscription refcount is increased before it's inserted
    into subscriptions list, and its refcount is decreased after it's
    deleted from the list, it will be unnecessary to hold refcount at all
    before accessing subscription object which is obtained by going through
    subscriptions list under name_seq->lock protection.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • After a subscription object is created, it's inserted into its
    subscriber subscrp_list list under subscriber lock protection,
    similarly, before it's destroyed, it should be first removed from
    its subscriber->subscrp_list. Since the subscription list is
    accessed with subscriber lock, all the subscriptions are valid
    during the lock duration. Hence in tipc_subscrb_subscrp_delete(), we
    remove subscription get/put and the extra subscriber unlock/lock.

    After this change, the subscriptions refcount cleanup is very simple
    and does not access any lock.

    Acked-by: Jon Maloy
    Signed-off-by: Ying Xue
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Ying Xue
     

23 Mar, 2017

1 commit

  • Until now, tipc_nametbl_unsubscribe() is called at subscriptions
    reference count cleanup. Usually the subscriptions cleanup is
    called at subscription timeout or at subscription cancel or at
    subscriber delete.

    We have ignored the possibility of this being called from other
    locations, which causes deadlock as we try to grab the
    tn->nametbl_lock while holding it already.

    CPU1:                              CPU2:
    ----------                         ----------------
    tipc_nametbl_publish
    spin_lock_bh(&tn->nametbl_lock)
    tipc_nametbl_insert_publ
    tipc_nameseq_insert_publ
    tipc_subscrp_report_overlap
    tipc_subscrp_get
    tipc_subscrp_send_event
                                       tipc_close_conn
                                       tipc_subscrb_release_cb
                                       tipc_subscrb_delete
                                       tipc_subscrp_put
    tipc_subscrp_put
    tipc_subscrp_kref_release
    tipc_nametbl_unsubscribe
    spin_lock_bh(&tn->nametbl_lock)
    <>

    CPU1:                              CPU2:
    ----------                         ----------------
    tipc_nametbl_stop
    spin_lock_bh(&tn->nametbl_lock)
    tipc_purge_publications
    tipc_nameseq_remove_publ
    tipc_subscrp_report_overlap
    tipc_subscrp_get
    tipc_subscrp_send_event
                                       tipc_close_conn
                                       tipc_subscrb_release_cb
                                       tipc_subscrb_delete
                                       tipc_subscrp_put
    tipc_subscrp_put
    tipc_subscrp_kref_release
    tipc_nametbl_unsubscribe
    spin_lock_bh(&tn->nametbl_lock)
    <>

    In this commit, we advance the calling of tipc_nametbl_unsubscribe()
    from the refcount cleanup to the intended callers.

    Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete")
    Reported-by: John Thompson
    Acked-by: Jon Maloy
    Signed-off-by: Ying Xue
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Ying Xue
     

25 Jan, 2017

1 commit

  • Until now, the subscribers keep track of the subscriptions using
    reference count at subscriber level. At subscription cancel or
    subscriber delete, we delete the subscription only if the timer
    was pending for the subscription. This approach is incorrect as:
    1. del_timer() is not SMP safe: on CPU0 the check for a pending
    timer may return true while CPU1 runs the timer callback,
    thereby deleting the subscription. Thus when CPU0 is scheduled,
    it deletes an invalid subscription.
    2. We export tipc_subscrp_report_overlap(), which accesses the
    subscription pointer multiple times. Meanwhile the subscription
    timer can expire thereby freeing the subscription and we might
    continue to access the subscription pointer leading to memory
    violations.

    In this commit, we introduce subscription refcount to avoid deleting
    an invalid subscription.

    Reported-and-Tested-by: John Thompson
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
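
A per-subscription reference count addresses the second problem in particular: a user of the pointer pins the object for as long as it works with it. The following userspace sketch shows the mechanism only. struct sub, sub_get(), sub_put(), timeout() and report_overlap() are hypothetical, and a 'freed' flag replaces the real release so the outcome can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced subscription. */
struct sub {
	int refcnt;
	bool freed;	/* set instead of freeing, so it can be inspected */
};

static void sub_get(struct sub *s)
{
	assert(!s->freed);
	s->refcnt++;
}

static void sub_put(struct sub *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0)
		s->freed = true;	/* last reference gone: release */
}

/* Models a timer expiry dropping the timer's reference. */
static void timeout(struct sub *s)
{
	sub_put(s);
}

/* Models a function that touches the subscription several times:
 * it holds its own reference while doing so. */
static void report_overlap(struct sub *s, void (*timer_fires)(struct sub *))
{
	sub_get(s);
	timer_fires(s);		/* expiry in the middle of our work */
	assert(!s->freed);	/* still valid thanks to our reference */
	sub_put(s);
}
```

The object is released exactly once, by whichever holder drops the last reference.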
     

29 Apr, 2016

1 commit


15 Apr, 2016

1 commit

  • Until now, the requests sent to topology server are queued
    to a workqueue by the generic server framework.
    These messages are processed by worker threads and trigger the
    registered callbacks.
    To reduce latency on uniprocessor systems, explicit rescheduling
    is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
    messages.

    This implementation on SMP systems leads to a subscriber refcnt
    error as described below:
    When a worker thread yields by calling cond_resched() in an SMP
    system, a new worker is created on another CPU to process the
    pending workitem. Sometimes the sleeping thread wakes up before
    the new thread finishes execution.
    This breaks the assumption on ordering and being single threaded.
    The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.

    If the first thread was processing subscription create and the
    second thread processing close(), the close request will free
    the subscriber and the create request oops as follows:

    [31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
    [31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97
    [31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc]
    [...]
    [31.228377] Call Trace:
    [31.228377] [] dump_stack+0x4d/0x72
    [31.228377] [] __warn+0xd1/0xf0
    [31.228377] [] warn_slowpath_null+0x1d/0x20
    [31.228377] [] tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
    [31.228377] [] tipc_receive_from_sock+0xd4/0x130 [tipc]
    [31.228377] [] tipc_recv_work+0x2b/0x50 [tipc]
    [31.228377] [] process_one_work+0x145/0x3d0
    [31.246554] ---[ end trace c3882c9baa05a4fd ]---
    [31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266
    [31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428
    [31.249323] IP: [] spin_dump+0x5c/0xe0
    [31.249323] PGD 0
    [31.249323] Oops: 0000 [#1] SMP

    In this commit, we
    - rename tipc_conn_shutdown() to tipc_conn_release().
    - move connection release callback execution from tipc_close_conn()
    to a new function tipc_sock_release(), which is executed before
    we free the connection.
    Thus we release the subscriber during connection release procedure
    rather than connection shutdown procedure.

    Signed-off-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

09 Mar, 2016

1 commit


07 Mar, 2016

1 commit

  • commit 4d5cfcba2f6e ('tipc: fix connection abort during subscription
    cancel'), removes the check for a valid subscription before calling
    tipc_nametbl_subscribe().

    This will lead to a nullptr exception when we process a
    subscription cancel request. For a cancel request, a null
    subscription is passed to tipc_nametbl_subscribe() resulting
    in exception.

    In this commit, we call tipc_nametbl_subscribe() only for
    a valid subscription.

    Fixes: 4d5cfcba2f6e ('tipc: fix connection abort during subscription cancel')
    Reported-by: Anders Widell
    Signed-off-by: Parthasarathy Bhuvaragan
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

06 Feb, 2016

9 commits

  • Until now, we create timers even for the subscription requests
    with timeout = TIPC_WAIT_FOREVER.
    This can be improved by avoiding timer creation when the timeout
    is set to TIPC_WAIT_FOREVER.

    In this commit, we introduce a check to create timers only
    when timeout != TIPC_WAIT_FOREVER.

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, during subscription creation the mod_timer() &
    tipc_subscrb_get() are called after releasing the subscriber
    spin lock.

    In an SMP system when performing a subscription creation, if the
    subscription timeout occurs simultaneously (the timer is
    scheduled to run on another CPU) then the timer thread
    might decrement the subscriber's refcount before the create
    thread increments the refcount.

    This can be simulated by creating subscription with timeout=0 and
    sometimes the timeout occurs before the create request is complete.
    This leads to the following message:
    [30.702949] BUG: spinlock bad magic on CPU#1, kworker/u8:3/87
    [30.703834] general protection fault: 0000 [#1] SMP
    [30.704826] CPU: 1 PID: 87 Comm: kworker/u8:3 Not tainted 4.4.0-rc8+ #18
    [30.704826] Workqueue: tipc_rcv tipc_recv_work [tipc]
    [30.704826] task: ffff88003f878600 ti: ffff88003fae0000 task.ti: ffff88003fae0000
    [30.704826] RIP: 0010:[] [] spin_dump+0x5c/0xe0
    [...]
    [30.704826] Call Trace:
    [30.704826] [] spin_bug+0x26/0x30
    [30.704826] [] do_raw_spin_lock+0xe5/0x120
    [30.704826] [] _raw_spin_lock_bh+0x19/0x20
    [30.704826] [] tipc_subscrb_rcv_cb+0x1d0/0x330 [tipc]
    [30.704826] [] tipc_receive_from_sock+0xc1/0x150 [tipc]
    [30.704826] [] tipc_recv_work+0x3f/0x80 [tipc]
    [30.704826] [] process_one_work+0x149/0x3c0
    [30.704826] [] worker_thread+0x66/0x460
    [30.704826] [] ? process_one_work+0x3c0/0x3c0
    [30.704826] [] ? process_one_work+0x3c0/0x3c0
    [30.704826] [] kthread+0xed/0x110
    [30.704826] [] ? kthread_create_on_node+0x190/0x190
    [30.704826] [] ret_from_fork+0x3f/0x70

    In this commit,
    1. we remove the check for the return code for mod_timer()
    2. we protect tipc_subscrb_get() using the subscriber spin lock.
    We increment the subscriber's refcount as soon as we add the
    subscription to subscriber's subscription list.

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, while creating a subscription the subscriber lock
    protects only the subscriber's subscription list and not the
    nametable. The call to tipc_nametbl_subscribe() is outside
    the lock. However, at subscription timeout and cancel both
    the subscriber's subscription list and the nametable are
    protected by the subscriber lock.

    This asymmetric locking mechanism leads to the following problem:
    In an SMP system, the timer can fire on another core before
    the create request is complete.
    When the timer thread calls tipc_nametbl_unsubscribe() before create
    thread calls tipc_nametbl_subscribe(), we get a nullptr exception.

    This can be simulated by creating subscription with timeout=0 and
    sometimes the timeout occurs before the create request is complete.

    The following is the oops:
    [57.569661] BUG: unable to handle kernel NULL pointer dereference at (null)
    [57.577498] IP: [] tipc_nametbl_unsubscribe+0x8a/0x120 [tipc]
    [57.584820] PGD 0
    [57.586834] Oops: 0002 [#1] SMP
    [57.685506] CPU: 14 PID: 10077 Comm: kworker/u40:1 Tainted: P OENX 3.12.48-52.27.1. 9688.1.PTF-default #1
    [57.703637] Workqueue: tipc_rcv tipc_recv_work [tipc]
    [57.708697] task: ffff88064c7f00c0 ti: ffff880629ef4000 task.ti: ffff880629ef4000
    [57.716181] RIP: 0010:[] [] tipc_nametbl_unsubscribe+0x8a/ 0x120 [tipc]
    [...]
    [57.812327] Call Trace:
    [57.814806] [] tipc_subscrp_delete+0x37/0x90 [tipc]
    [57.821357] [] tipc_subscrp_timeout+0x3f/0x70 [tipc]
    [57.827982] [] call_timer_fn+0x31/0x100
    [57.833490] [] run_timer_softirq+0x1f9/0x2b0
    [57.839414] [] __do_softirq+0xe5/0x230
    [57.844827] [] call_softirq+0x1c/0x30
    [57.850150] [] do_softirq+0x55/0x90
    [57.855285] [] irq_exit+0x95/0xa0
    [57.860290] [] smp_apic_timer_interrupt+0x45/0x60
    [57.866644] [] apic_timer_interrupt+0x6d/0x80
    [57.872686] [] tipc_subscrb_rcv_cb+0x2a5/0x3f0 [tipc]
    [57.879425] [] tipc_receive_from_sock+0x9f/0x100 [tipc]
    [57.886324] [] tipc_recv_work+0x26/0x60 [tipc]
    [57.892463] [] process_one_work+0x172/0x420
    [57.898309] [] worker_thread+0x11a/0x3c0
    [57.903871] [] kthread+0xb4/0xc0
    [57.908751] [] ret_from_fork+0x58/0x90

    In this commit, we do the following at subscription creation:
    1. set the subscription's subscriber pointer before performing
    tipc_nametbl_subscribe(), as this value is required further in
    the call chain, e.g. by tipc_subscrp_send_event().
    2. move tipc_nametbl_subscribe() under the scope of subscriber lock

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, the subscriber's endianness for a subscription
    create/cancel request is determined as:
    swap = !(s->filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE))
    The checks are performed only for port/service subscriptions.

    The swap calculation is incorrect if the filter in the subscription
    cancellation request is set to TIPC_SUB_CANCEL (it's a malformed
    cancel request, as the corresponding subscription create filter
    is missing).
    Thus, the check if the request is for cancellation fails and the
    request is treated as a subscription create request. The
    subscription creation fails as the request is illegal, which
    terminates this connection.

    In this commit we determine the endianness by including
    TIPC_SUB_CANCEL, which will set swap correctly and the
    request is processed as a cancellation request.
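
    The effect of the extra filter bit can be sketched in user-space C as
    below. This is an illustration only, not the kernel code: the filter
    values are assumed to mirror the TIPC user API header, and swap32(),
    needs_swap_old(), needs_swap_new() and is_cancel() are hypothetical
    helper names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Filter bits, assumed to match the values in the TIPC user API header. */
#define TIPC_SUB_PORTS   0x01
#define TIPC_SUB_SERVICE 0x02
#define TIPC_SUB_CANCEL  0x04

/* Byte-swap a 32-bit value (hypothetical stand-in for a kernel swab helper). */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Old rule: only the port/service bits prove that no swap is needed. */
static bool needs_swap_old(uint32_t filter)
{
	return !(filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE));
}

/* New rule: a filter carrying only TIPC_SUB_CANCEL is recognised as well. */
static bool needs_swap_new(uint32_t filter)
{
	return !(filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE | TIPC_SUB_CANCEL));
}

/* Report whether the request is seen as a cancellation after conversion. */
static bool is_cancel(uint32_t filter, bool swap)
{
	uint32_t f = swap ? swap32(filter) : filter;

	return (f & TIPC_SUB_CANCEL) != 0;
}
```

    With a filter equal to TIPC_SUB_CANCEL, the old rule requests a swap,
    the cancel bit moves to the top byte, and the request is no longer
    seen as a cancellation. The new rule leaves the filter untouched.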

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • In 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
    to events")', we terminate the connection if the subscription
    creation fails.
    In the same commit, the subscription creation result was based on
    the value of the subscription pointer (set in the function) instead of
    the return code.

    Unfortunately, the same function also handles subscription
    cancellation requests. For a subscription cancellation request,
    the subscription pointer cannot be set. Thus the connection is
    terminated during a cancellation request.

    In this commit, we move the subscription cancel check outside
    of tipc_subscrp_create(). Hence,
    - tipc_subscrp_create() will create a subscription
    - tipc_subscrb_rcv_cb() will subscribe or cancel a subscription.

    Fixes: 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")'

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • In this commit, we split tipc_subscrp_create() into two:
    1. tipc_subscrp_create() creates a subscription
    2. A new function tipc_subscrp_subscribe() adds the
    subscription to the subscriber subscription list,
    activates the subscription timer and subscribes to
    the nametable updates.

    In future commits, the purpose of tipc_subscrb_rcv_cb() will
    be to either subscribe or cancel a subscription.

    There is no functional change in this commit.

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, struct tipc_subscription has had duplicate fields for
    type, upper and lower (as members of struct tipc_name_seq):
    1. as member seq in struct tipc_subscription
    2. as member seq in struct tipc_subscr, which is contained
    in struct tipc_event
    The former contains the type, upper and lower values in network
    byte order and the latter contains the intact copy of the request.
    The struct tipc_subscription contains a field swap to
    determine whether the request needs network byte order conversion.
    Thus by using swap, we can convert the request when
    required instead of duplicating it.

    In this commit,
    1. we remove the references to these elements as members of
    struct tipc_subscription and replace them with elements
    from struct tipc_subscr.
    2. provide new functions to convert the user request into
    network byte order.
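
    Converting on demand instead of storing a second copy might look
    roughly like the sketch below. The struct and function names
    (name_seq, swap32(), convert32(), convert_seq()) are hypothetical
    stand-ins chosen for this example and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the type/lower/upper triple of a request. */
struct name_seq {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
};

/* Byte-swap a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Convert one field, swapping bytes only when the swap flag is set. */
static uint32_t convert32(uint32_t in, int swap)
{
	return swap ? swap32(in) : in;
}

/* Convert the stored request on demand; the original stays intact. */
static void convert_seq(const struct name_seq *in, int swap,
			struct name_seq *out)
{
	out->type = convert32(in->type, swap);
	out->lower = convert32(in->lower, swap);
	out->upper = convert32(in->upper, swap);
}
```

    Only the unmodified request and the swap flag need to be stored; the
    converted values are computed at the point of use.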

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, struct tipc_subscription has duplicate timeout and filter
    attributes present:
    1. directly as members of struct tipc_subscription
    2. in struct tipc_subscr, which is contained in struct tipc_event

    In this commit, we remove the references to these elements as
    members of struct tipc_subscription and replace them with elements
    from struct tipc_subscr.

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     
  • Until now, during subscription creation we set sub->timeout by
    converting the timeout request value in milliseconds to jiffies.
    This is followed by setting the timeout value in the timer if
    sub->timeout != TIPC_WAIT_FOREVER.

    For a subscription create request with a timeout value of
    TIPC_WAIT_FOREVER, msecs_to_jiffies(TIPC_WAIT_FOREVER)
    returns MAX_JIFFY_OFFSET (0xfffffffe). This is not equal to
    TIPC_WAIT_FOREVER (0xffffffff), so the check never detects an
    infinite timeout.

    In this commit, we remove this check.
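
    Why the comparison can never match is shown in the simplified model
    below. It is a sketch under stated assumptions: ms_to_ticks() is a
    hypothetical stand-in for msecs_to_jiffies() that maps one
    millisecond to one tick and clamps at the maximum offset, and the
    two constants use the values quoted in this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the commit message; names are local to this sketch. */
#define WAIT_FOREVER 0xffffffffu /* models TIPC_WAIT_FOREVER */
#define MAX_OFFSET   0xfffffffeu /* models MAX_JIFFY_OFFSET */

/* Hypothetical, simplified conversion: 1 ms = 1 tick, clamped at the top. */
static uint32_t ms_to_ticks(uint32_t ms)
{
	return ms > MAX_OFFSET ? MAX_OFFSET : ms;
}

/* Models the removed check: compare the converted value to the sentinel. */
static bool timer_armed_old(uint32_t timeout_ms)
{
	return ms_to_ticks(timeout_ms) != WAIT_FOREVER;
}
```

    Because the converted value is clamped below the sentinel,
    timer_armed_old() is true for every input, including WAIT_FOREVER.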

    Acked-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

30 Jan, 2016

1 commit

  • In 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
    to events")', we terminate the connection if the subscription
    creation fails.
    In the same commit, the subscription creation result was based on
    the value of the subscription pointer (set in the function) instead
    of the return code.

    Unfortunately, the same function, tipc_subscrp_create(), also handles
    subscription cancel requests. For a subscription cancellation request,
    the subscription pointer cannot be set. Thus if a subscriber has
    several subscriptions and cancels any of them, the connection is
    terminated.

    In this commit, we terminate the connection based on the return value
    of tipc_subscrp_create().
    Fixes: commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")
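
    The difference between the two failure checks can be illustrated
    with a small model. All names below (req_kind, handle_request(),
    terminate_by_pointer(), terminate_by_rc()) are hypothetical and only
    mimic the behaviour described above; this is not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

enum req_kind { REQ_CREATE, REQ_CANCEL, REQ_MALFORMED };

struct subscription { int id; };

/*
 * Hypothetical handler serving both create and cancel requests.
 * Returns 0 on success and -1 on failure; *sub is set only on create.
 */
static int handle_request(enum req_kind kind, struct subscription *slot,
			  struct subscription **sub)
{
	*sub = NULL;
	switch (kind) {
	case REQ_CREATE:
		*sub = slot;
		return 0;
	case REQ_CANCEL:
		return 0; /* success, but no subscription is returned */
	default:
		return -1;
	}
}

/* Old decision: terminate whenever no subscription pointer came back. */
static bool terminate_by_pointer(enum req_kind kind)
{
	struct subscription slot = { 0 }, *sub;

	handle_request(kind, &slot, &sub);
	return sub == NULL;
}

/* New decision: terminate only when the handler reports an error. */
static bool terminate_by_rc(enum req_kind kind)
{
	struct subscription slot = { 0 }, *sub;

	return handle_request(kind, &slot, &sub) != 0;
}
```

    For REQ_CANCEL the pointer-based check asks for termination although
    the request succeeded, while the return-code check does not.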

    Reviewed-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

05 May, 2015

4 commits

  • Currently the subscriber's lock protects not only the subscriber's
    subscription list but also all subscriptions linked into the list.
    However, as the members of a subscription are never changed after
    they are initialized, it is unnecessary to protect the subscription
    under the subscriber's lock. Using the lock to protect only the
    subscriber's subscription list not only makes the locking policy
    simpler, but also helps to avoid a deadlock which may happen if
    creating a subscription fails.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • At present the subscriber's lock is used to protect the subscription
    list of the subscriber as well as the subscriptions linked into the
    list. While one or all subscriptions are deleted by iterating over
    the list, the subscriber's lock must be held. Meanwhile, as deletion
    of a subscription may happen in the subscription timer's handler,
    the lock must be grabbed in that function as well. When a
    subscription's timer is terminated with del_timer_sync() during the
    above iteration, the subscriber's lock has to be temporarily
    released, otherwise a deadlock may occur. However, the temporary
    release may cause a double free of a subscription, as the
    subscription is not disconnected from the subscription list.

    If a reference counter is introduced to the subscriber, the
    subscription's timer can be stopped asynchronously with del_timer().
    As a result, not only is the issue fixed, but the relevant code also
    becomes more readable and understandable.
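
    The reference counting idea can be sketched as follows. This is a
    single-threaded user-space illustration with hypothetical names
    (subscriber_new(), subscriber_get(), subscriber_put()); it is not
    thread-safe and does not reproduce the kernel implementation. Each
    holder, such as a pending timer, takes a reference, and the object
    is freed only when the last reference is dropped.

```c
#include <stdlib.h>

/* Hypothetical subscriber object with a plain reference counter. */
struct subscriber {
	int refcnt;
	int *freed; /* set to 1 on release, for demonstration only */
};

/* Allocate a subscriber holding one initial reference. */
static struct subscriber *subscriber_new(int *freed)
{
	struct subscriber *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->refcnt = 1;
	s->freed = freed;
	return s;
}

/* Take an extra reference, e.g. on behalf of a pending timer. */
static void subscriber_get(struct subscriber *s)
{
	s->refcnt++;
}

/* Drop a reference; the last holder frees the object. */
static void subscriber_put(struct subscriber *s)
{
	if (--s->refcnt == 0) {
		*s->freed = 1;
		free(s);
	}
}
```

    A timer handler that still holds a reference can run after the list
    owner has dropped its own, so the owner does not have to wait for
    the handler while holding the lock.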

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Introducing a new function makes the purpose of the
    tipc_subscrb_connect_cb callback routine clearer.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • When a topology server accepts a connection request from a client,
    it allocates a connection instance and a tipc_subscriber structure
    object. The former is used to communicate with the client, and the
    latter is treated as a subscriber which manages all subscription
    events requested by the same client. When a topology server receives
    a request to subscribe to name services from a client through the
    connection, it creates a tipc_subscription structure instance, which
    is seen as a subscription recording which name services are
    subscribed to. In order to manage all subscriptions from the same
    client, the topology server links them into the subscrp_list of the
    subscriber.

    So subscriber and subscription have completely different meanings,
    but the function names associated with them are so confusing that we
    cannot easily tell which functions operate on a subscriber and which
    on a subscription. We eliminate the confusion by renaming them.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

28 Feb, 2015

1 commit

  • If a subscription request is sent to a topology server
    connection, and any error occurs (malformed request, OOM
    or limit reached) while processing this request, TIPC should
    terminate the subscriber connection. While doing so, it tries
    to access fields in an already freed (or never allocated)
    subscription element, leading to a NULL pointer dereference.
    We fix this by removing the subscr_terminate function and
    terminating the connection immediately upon any subscription
    failure.

    Signed-off-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne