31 Aug, 2016

36 commits

  • Both the ETMv3 and ETMv4 drivers declare an 'enum etm_addr_type',
    creating redundancy.

    This patch removes the enumeration from the driver files and adds
    it to a common header.

    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Poirier
     
  • With commit [1], address range filter information is now found
    in struct hw_perf_event::addr_filters. As such, pass the event
    itself to the coresight_source::enable/disable() functions so that
    both the event attributes and the filters are accessible for configuration.

    [1] 'commit 375637bc5249 ("perf/core: Introduce address range filtering")'

    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Poirier
     
  • The ETM registers are classified into 2 categories: trace and management.
    The core power domain contains most of the trace unit logic, including
    all the trace registers (except TRCOSLAR and TRCOSLSR). The debug power
    domain contains the external debugger interface, including all management
    registers.

    This patch adds a CoreSight unit-specific function, coresight_simple_func,
    which can be used for ETM trace registers by providing an ETM-specific
    read function that performs an SMP cross call to ensure the trace core is
    powered up before the register is accessed.
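
    The cross-call idea can be modelled in user space as below. This is a
    sketch under stated assumptions, not the driver code: run_on_cpu() stands
    in for the kernel's smp_call_function_single(), and core_powered[],
    trace_reg[] and etm_read_trace_reg() are hypothetical names for a
    per-CPU power flag, the register contents and the ETM-specific read helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical model: whether each CPU's core power domain, which holds
 * the ETM trace registers, is currently powered up. */
static bool core_powered[NR_CPUS];

/* Hypothetical trace register contents, one per CPU. */
static const uint32_t trace_reg[NR_CPUS] = { 0x11, 0x22, 0x33, 0x44 };

struct read_req {
    int cpu;
    uint32_t val;
    bool ok;
};

/* Stand-in for smp_call_function_single(): code running on the target
 * CPU implies that CPU is awake, so its core power domain is up. */
static void run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
    core_powered[cpu] = true;
    fn(arg);
}

/* Runs "on" the target CPU; the access only succeeds when powered. */
static void read_trace_reg_local(void *arg)
{
    struct read_req *req = arg;

    req->ok = core_powered[req->cpu];
    req->val = req->ok ? trace_reg[req->cpu] : 0;
}

/* ETM-specific read helper: always route the access through the cross
 * call instead of touching the register from the calling CPU. */
static bool etm_read_trace_reg(int cpu, uint32_t *val)
{
    struct read_req req = { .cpu = cpu };

    run_on_cpu(cpu, read_trace_reg_local, &req);
    *val = req.val;
    return req.ok;
}
```

    In this model the read always executes on the target CPU, so it never
    reaches a trace register whose core power domain is off; the real driver
    reads a memory-mapped register at that point.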

    Cc: Mathieu Poirier
    Signed-off-by: Sudeep Holla
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     
  • The CoreSight ETMv4 architecture provides a way to request that power
    to the trace unit be retained. This might help collect traces without
    the need to disable CPU power management (entering/exiting deeper
    idle states).

    The Trace PowerDown Control Register (TRCPDCR) provides a powerup
    request bit which, when set, requests the system to retain power to
    the trace unit and emulate the powerdown request.

    Typically, a trace unit drives a signal to the power controller to
    request that the trace unit core power domain be powered up. However,
    if the trace unit and the CPU are in the same power domain, the
    implementation might combine the trace unit power-up status with a
    signal from the CPU.

    This patch requests that power to the trace unit be retained while it
    is active and released while it is inactive. Note that this change only
    makes the request; the actual behaviour depends on the implementation.
    However, it matches the behaviour expected, with respect to CPU power
    states, when an external debugger is connected.
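
    A minimal sketch of the request/release pattern, assuming the powerup
    request is bit 3 of TRCPDCR (named TRCPDCR_PU in the Linux ETMv4
    driver headers). The register is simulated by a plain variable, and the
    helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: TRCPDCR.PU (powerup request) is bit 3. */
#define TRCPDCR_PU (1u << 3)

/* Simulated TRCPDCR; the real register is memory-mapped. */
static uint32_t trcpdcr;

/* Ask the system to keep the trace unit powered while tracing. */
static void etm4_request_powerup(void)
{
    trcpdcr |= TRCPDCR_PU;
}

/* Drop the request once the trace unit is no longer active. */
static void etm4_release_powerup(void)
{
    trcpdcr &= ~TRCPDCR_PU;
}

static bool etm4_powerup_requested(void)
{
    return (trcpdcr & TRCPDCR_PU) != 0;
}
```

    The read-modify-write keeps the other TRCPDCR bits intact; whether power
    is actually retained is still up to the implementation.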

    Cc: Mathieu Poirier
    Signed-off-by: Sudeep Holla
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     
  • The kfree() function tests whether its argument is NULL and then
    returns immediately. Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.
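
    The same simplification in user-space C, where ISO C guarantees that
    free(NULL) does nothing, just as kfree(NULL) returns immediately.
    struct buf and buf_release() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical container whose buffer may or may not be allocated. */
struct buf {
    char *data;
};

static void buf_release(struct buf *b)
{
    /* Previously: if (b->data) free(b->data);
     * The test is redundant: free(NULL) is a no-op, as is kfree(NULL). */
    free(b->data);
    b->data = NULL;
}
```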

    Signed-off-by: Markus Elfring
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Markus Elfring
     
  • Remove duplicated include.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Wei Yongjun
     
  • Each coresight device prepares a description for coresight_register()
    in struct coresight_desc. Once the device is registered, the description
    is useless and can be freed. The coresight_desc is small enough (48 bytes
    on 64-bit) to be allocated on the stack. Hence use an automatic variable
    to avoid a needless dynamic allocation and wasted memory (which would
    only be freed when the device is destroyed).
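
    A simplified sketch of the pattern, with hypothetical stand-ins for
    struct coresight_desc and coresight_register(): the registration
    function copies what it needs, so the description can live on the
    caller's stack and disappear when the probe function returns.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the CoreSight structures. */
struct desc {
    int type;
    int nr_outport;
    const char *name;
};

struct device_obj {
    int type;
    int nr_outport;
};

/* Copies the fields it needs out of the description, so the caller's
 * description is no longer needed once this returns. */
static struct device_obj *register_device(const struct desc *d)
{
    struct device_obj *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    dev->type = d->type;
    dev->nr_outport = d->nr_outport;
    return dev;
}

static struct device_obj *probe(void)
{
    /* Automatic variable: no heap allocation that would otherwise
     * stay around until the device is destroyed. */
    struct desc d = { .type = 1, .nr_outport = 2, .name = "etf0" };

    return register_device(&d);
}
```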

    Cc: Mathieu Poirier
    Cc: Pratik Patel
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • of_node_put() needs to be called once we have finished using a device
    node obtained from of_parse_phandle().

    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Mathieu Poirier
    Signed-off-by: Peter Chen
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Peter Chen
     
  • Signed-off-by: Olivier Schonken
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Olivier Schonken
     
  • It is mandatory to enable a coresight block's power domain before
    trying to access management registers. Otherwise the transaction
    simply stalls, leading to a system hang.

    Signed-off-by: Mathieu Poirier
    Reviewed-by: Sudeep Holla
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Poirier
     
  • Depending on when CoreSight devices are discovered, it is possible
    that some IP blocks reference devices that have not yet been
    added to the bus. The end result is missing nodes in the
    CoreSight topology even when the devices are present and properly
    initialised.

    This patch solves the problem by asking the driver core to
    try initialising the device at a later time when the children
    of a CoreSight node are missing.
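
    A simplified model of the idea: if any child named in the topology is
    not yet on the bus, the probe returns -EPROBE_DEFER so the driver core
    retries it later. The bus registry and function names are hypothetical;
    517 is the kernel's value for EPROBE_DEFER.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value from the kernel's include/linux/errno.h. */
#define EPROBE_DEFER 517

/* Hypothetical registry of devices already added to the bus. */
static const char *bus_devices[8];
static size_t nr_bus_devices;

static bool on_bus(const char *name)
{
    for (size_t i = 0; i < nr_bus_devices; i++)
        if (strcmp(bus_devices[i], name) == 0)
            return true;
    return false;
}

/* If any child is missing, ask for a later retry instead of
 * registering a partial topology. */
static int probe_with_children(const char *const *children, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!on_bus(children[i]))
            return -EPROBE_DEFER;
    return 0;
}
```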

    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Poirier
     
  • When we encounter a timeout waiting for a status change via
    coresight_timeout, the caller always prints the offset that
    was polled. This is pretty much useless, as it doesn't specify
    the bit position we were waiting for, and one needs to look up
    the TRM to figure out what was wrong. This patch changes all
    such error messages to print something more meaningful.
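
    A sketch of the idea with a simulated status register: a simplified,
    hypothetical coresight_timeout()-style poll, and a caller whose error
    message names the condition and bit it waited for rather than only the
    register offset. The bit position and message text are illustrative.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simulated status register. */
static uint32_t status_reg;

/* Hypothetical bit position of a "ready" flag in the status register. */
#define STS_READY_BIT 2

/* Wait until bit `pos` of *reg equals `value`, giving up after
 * `tries` reads. */
static int wait_for_bit(const volatile uint32_t *reg, int pos, int value, int tries)
{
    for (int i = 0; i < tries; i++)
        if ((int)((*reg >> pos) & 1u) == value)
            return 0;
    return -ETIMEDOUT;
}

static int wait_ready(void)
{
    int ret = wait_for_bit(&status_reg, STS_READY_BIT, 1, 100);

    if (ret)
        /* Say which condition and bit timed out, not just the offset. */
        fprintf(stderr, "timeout while waiting for Ready (STS bit %d)\n",
                STS_READY_BIT);
    return ret;
}
```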

    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • Use the defined symbol rather than hardcoding the value to
    check whether the TMC buffer is full.
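
    A minimal illustration, assuming the full flag is bit 0 of the TMC
    status register; the TMC_STS_FULL name follows the driver's naming
    style but is used here as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "buffer full" is bit 0 of the TMC STS register. */
#define TMC_STS_FULL (1u << 0)

static bool tmc_buffer_full(uint32_t sts)
{
    /* Before: return sts & 0x1; the magic number hid the meaning. */
    return (sts & TMC_STS_FULL) != 0;
}
```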

    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • This patch cleans up the peripheral id table for different ETMv4
    implementations.

    As per the Cortex-A53 TRM, the ETM has the following ID values:

    Register        Value  Offset
    Peripheral ID0  0x5D   0xFE0
    Peripheral ID1  0xB9   0xFE4
    Peripheral ID2  0x4B   0xFE8
    Peripheral ID3  0x00   0xFEC

    where PID2 has the following format:

    [7:4] Revision
    [3] JEDEC 0b1 res1. Indicates a JEP106 identity code is used
    [2:0] DES_1 0b011 ARM Limited. This is bits[6:4] of JEP106 ID code

    The existing table entry checks only bits [1:0], which is not
    sufficient. Fix it to match bits [3:0], just like the other
    entries do. While at it, correct the comment for A57 and the A53 entry.
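
    The matching can be checked with the values above. This sketch assumes
    the AMBA convention that a device matches a table entry when
    (periphid & mask) == id, with periphid built from the low bytes of
    PID0..PID3; the struct and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of an AMBA id-table entry. */
struct id_entry {
    uint32_t id;
    uint32_t mask;
};

/* Compose the 32-bit peripheral ID from PID0..PID3 (offsets 0xFE0..0xFEC). */
static uint32_t periphid(uint8_t pid0, uint8_t pid1, uint8_t pid2, uint8_t pid3)
{
    return (uint32_t)pid0 | (uint32_t)pid1 << 8 |
           (uint32_t)pid2 << 16 | (uint32_t)pid3 << 24;
}

/* Cortex-A53 ETM: match PID0, PID1 and PID2[3:0] (JEDEC + DES_1),
 * ignoring the revision field in PID2[7:4]. */
static const struct id_entry a53_etm = { .id = 0x000bb95d, .mask = 0x000fffff };

static bool id_matches(const struct id_entry *e, uint32_t pid)
{
    return (pid & e->mask) == e->id;
}
```

    With mask 0x000fffff, periphid(0x5D, 0xB9, 0x4B, 0x00) == 0x004BB95D
    matches the entry, as does any other revision in PID2[7:4], while a
    PID2 whose low nibble differs from 0xB does not.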

    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • At present, the ETF or ETR hands out the entire device
    buffer, even if less trace data (or none at all) is
    available. This patch limits the trace data handed out to
    the trace data actually collected.
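
    A minimal sketch of how the amount of valid data can be derived, under
    the assumption that the driver knows the buffer base and size, the
    hardware write pointer, and whether the buffer wrapped (the full
    status). struct tmc_snapshot and trace_len() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a TMC circular buffer after tracing stops. */
struct tmc_snapshot {
    uintptr_t base;   /* start of the trace buffer */
    uintptr_t rwp;    /* hardware write pointer */
    size_t size;      /* buffer size in bytes */
    bool full;        /* buffer wrapped at least once */
};

/* Bytes of valid trace: the whole buffer if it wrapped, otherwise only
 * what was written since base. */
static size_t trace_len(const struct tmc_snapshot *s)
{
    if (s->full)
        return s->size;
    return (size_t)(s->rwp - s->base);
}
```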

    Cc: mathieu.poirier@linaro.org
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • This is a cleanup patch.

    coresight_device->conns holds an array referring to the devices
    connected to the OUT ports of a component. Sinks, e.g. ETR, do not
    have an OUT port (nr_outport = 0), as they stream the trace to
    memory via AXI.

    In coresight_register() we do:

    conns = kcalloc(csdev->nr_outport, sizeof(*conns), GFP_KERNEL);
    if (!conns) {
            ret = -ENOMEM;
            goto err_kzalloc_conns;
    }

    For ETR, since the total size requested from kcalloc is zero, the return
    value is ZERO_SIZE_PTR (!= NULL). Hence csdev->conns = ZERO_SIZE_PTR,
    which cannot later be verified to be a valid pointer. The code which
    accesses csdev->conns is bounded by the csdev->nr_outport check,
    hence we don't try to dereference ZERO_SIZE_PTR. This patch cleans
    up the csdev->conns initialisation to make sure we initialise it
    properly (i.e. either NULL or a valid conns array).
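
    The same initialisation rule sketched in user space with calloc(), whose
    result for a zero-sized request is likewise implementation-defined (NULL
    or a unique pointer). struct conn and init_conns() are hypothetical
    stand-ins for the coresight_register() code.

```c
#include <stdlib.h>

/* Hypothetical connection record. */
struct conn {
    int outport;
};

/* Only allocate when there are OUT ports, so *conns is either NULL or
 * a valid array, never a pointer that merely looks valid. */
static int init_conns(struct conn **conns, int nr_outport)
{
    *conns = NULL;
    if (nr_outport == 0)
        return 0;                 /* sinks such as ETR: keep NULL */
    *conns = calloc((size_t)nr_outport, sizeof(**conns));
    return *conns ? 0 : -1;       /* -ENOMEM in the kernel */
}
```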

    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • This patch cleans up the error handling path for tmc_probe
    as a side effect of the removal of the spurious dma_free_coherent().

    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed")
    removed the static allocation of the buffer for trace data in ETR mode in
    tmc_probe. However, it failed to remove the dma_free_coherent() call in
    tmc_probe's error path, taken when the probe fails for other reasons. This
    patch gets rid of the incorrect dma_free_coherent() call.

    Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed")
    Cc: Mathieu Poirier
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Suzuki K Poulose
     
  • etm4_trace_id is not guaranteed to be executed on the CPU whose ETM is
    being accessed. This leads to an exception similar to the one below if
    that CPU is in a deeper idle state. So it must be executed on the CPU
    whose ETM is being accessed.

    Unhandled fault: synchronous external abort (0x96000210) at 0xffff000008db4040
    Internal error: : 96000210 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 5 PID: 5979 Comm: etm.sh Not tainted 4.7.0-rc3 #159
    Hardware name: ARM Juno development board (r2) (DT)
    task: ffff80096dd34b00 ti: ffff80096dfe4000 task.ti: ffff80096dfe4000
    PC is at etm4_trace_id+0x5c/0x90
    LR is at etm4_trace_id+0x3c/0x90
    Call trace:
    etm4_trace_id+0x5c/0x90
    coresight_id_match+0x78/0xa8
    bus_for_each_dev+0x60/0xa0
    coresight_enable+0xc0/0x1b8
    enable_source_store+0x3c/0x70
    dev_attr_store+0x18/0x28
    sysfs_kf_write+0x48/0x58
    kernfs_fop_write+0x14c/0x1e0
    __vfs_write+0x1c/0x100
    vfs_write+0xa0/0x1b8
    SyS_write+0x44/0xa0
    el0_svc_naked+0x24/0x28

    However, TRCTRACEIDR is not guaranteed to hold the previously programmed
    trace ID if the CPU enters a deeper idle state. Further, the trace ID that
    is computed in etm4_init_trace_id is programmed into TRCTRACEIDR only in
    etm4_enable_hw, which happens much later in the sequence, after
    coresight_id_match is executed from enable_source_store.

    This patch simplifies etm4_trace_id by returning the stashed trace id
    value similar to etm4_cpu_id.
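
    A sketch of the stashed-ID approach with hypothetical names: the ID is
    computed once and later returned without touching TRCTRACEIDR, so the
    call is safe from any CPU. The 0x10 + 2 * cpu scheme is an illustrative
    assumption.

```c
#include <stdint.h>

/* Hypothetical per-CPU driver data. */
struct etm_drvdata {
    int cpu;
    uint8_t trcid;   /* computed at init, written to TRCTRACEIDR at enable */
};

static void init_trace_id(struct etm_drvdata *d)
{
    /* Illustrative scheme: a distinct ID per CPU. */
    d->trcid = (uint8_t)(0x10 + 2 * d->cpu);
}

/* No register access, so the traced CPU may be idle. */
static int etm_trace_id(const struct etm_drvdata *d)
{
    return d->trcid;
}
```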

    Cc: Mathieu Poirier
    Signed-off-by: Sudeep Holla
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     
  • The CoreSight STM device allows direct mapping of the channel regions to
    userspace for zero-copy writing. To support this ability, the STM
    framework provides a hook, 'mmio_addr'; this patch implements that
    hook for CoreSight STM.

    This patch also adds an item to 'channel_space' to save the physical
    base address of the channel region, which the mmap operation needs to know.

    Signed-off-by: Chunyan Zhang
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Chunyan Zhang
     
  • If the addition of the CoreSight devices gets deferred, there is a
    window before child_name is populated by of_get_coresight_platform_data
    from the respective component driver's probe, and an attempt to access
    it from coresight_orphan_match results in a kernel NULL pointer
    dereference, as below:

    Unable to handle kernel NULL pointer dereference at virtual address 0x0
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 1038 Comm: kworker/0:1 Not tainted 4.7.0-rc3 #124
    Hardware name: ARM Juno development board (r2) (DT)
    Workqueue: events amba_deferred_retry_func
    PC is at strcmp+0x1c/0x160
    LR is at coresight_orphan_match+0x7c/0xd0
    Call trace:
    strcmp+0x1c/0x160
    bus_for_each_dev+0x60/0xa0
    coresight_register+0x264/0x2e0
    tmc_probe+0x130/0x310
    amba_probe+0xd4/0x1c8
    driver_probe_device+0x22c/0x418
    __device_attach_driver+0xbc/0x158
    bus_for_each_drv+0x58/0x98
    __device_attach+0xc4/0x160
    device_initial_probe+0x10/0x18
    bus_probe_device+0x94/0xa0
    device_add+0x344/0x580
    amba_device_try_add+0x194/0x238
    amba_deferred_retry_func+0x48/0xd0
    process_one_work+0x118/0x378
    worker_thread+0x48/0x498
    kthread+0xd0/0xe8
    ret_from_fork+0x10/0x40

    This patch adds a check for a non-NULL conn->child_name before
    accessing it.
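
    A simplified sketch of the check, with a hypothetical connection record;
    the point is that child_name is tested before strcmp() dereferences it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified connection record. */
struct connection {
    const char *child_name;   /* may still be NULL while probing is deferred */
};

static bool orphan_match(const struct connection *conn, const char *dev_name)
{
    /* Short-circuit: strcmp() only runs on a non-NULL child_name. */
    return conn->child_name && strcmp(conn->child_name, dev_name) == 0;
}
```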

    Cc: Mathieu Poirier
    Signed-off-by: Sudeep Holla
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

    Sudeep Holla
     
  • Reports for available memory should use the si_mem_available() value.
    The previous freeram value does not include available page cache memory.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     
  • lockdep reports possible circular locking dependency when udev is used
    for memory onlining:

    systemd-udevd/3996 is trying to acquire lock:
    ((memory_chain).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x4e/0xc0

    but task is already holding lock:
    (&dm_device.ha_region_mutex){+.+.+.}, at: [] hv_memory_notifier+0x5e/0xc0 [hv_balloon]
    ...

    which is probably a false positive, because we take and release
    ha_region_mutex from the memory notifier chain depending on the arg. No real
    deadlocks have been reported so far (though I'm not really sure about
    preemptible kernels...), but we don't really need to hold the mutex
    for so long. We use it to protect ha_region_list (and its members) and the
    num_pages_onlined counter. None of these operations requires us to sleep
    and nothing is slow, so switch to using a spinlock with interrupts disabled.

    While at it, replace list_for_each with list_for_each_entry, as we actually
    need the entries in all these cases, and drop meaningless list_empty() checks.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • With the recently introduced in-kernel memory onlining
    (MEMORY_HOTPLUG_DEFAULT_ONLINE), there is no point in waiting for pages
    to come online in the driver, so we can get rid of the waiting.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • I'm observing the following hot add requests from the WS2012 host:

    hot_add_req: start_pfn = 0x108200 count = 330752
    hot_add_req: start_pfn = 0x158e00 count = 193536
    hot_add_req: start_pfn = 0x188400 count = 239616

    As the host doesn't specify hot add regions, we try to create a
    128MB-aligned region covering the first request: we create the 0x108000 -
    0x160000 region and we add 0x108000 - 0x158e00 memory. The second request
    passes the pfn_covered() check, so we enlarge the region to 0x108000 -
    0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the
    third request, as it starts at 0x188400, so there is a 0x200 gap which is
    not covered. As the end of our region is now 0x190000, it again passes the
    pfn_covered() check, where we just adjust covered_end_pfn and make it
    0x188400 instead of 0x188200. This means that we'll try to online
    0x188200-0x188400 pages, but these pages were never assigned to us and we
    crash.

    We can't react to such requests by creating new hot add regions, as it may
    happen that the whole suggested range falls into the previously identified
    128MB-aligned area, so we'd end up adding nothing or creating intersecting
    regions, and our current logic doesn't allow that. Instead, create a list of
    such 'gaps' and check for them in the page online callback.
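
    A sketch of the gap check using the PFNs from the example above. The
    structures are hypothetical and use a fixed array instead of the
    driver's list; ranges are half-open [start, end).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gap: PFNs inside the region never given to us by the host. */
struct gap {
    unsigned long start_pfn, end_pfn;
};

struct ha_region {
    unsigned long start_pfn, end_pfn;
    const struct gap *gaps;
    size_t nr_gaps;
};

/* Page-online check: online only PFNs inside the region and outside
 * every recorded gap. */
static bool pfn_may_online(const struct ha_region *r, unsigned long pfn)
{
    if (pfn < r->start_pfn || pfn >= r->end_pfn)
        return false;
    for (size_t i = 0; i < r->nr_gaps; i++)
        if (pfn >= r->gaps[i].start_pfn && pfn < r->gaps[i].end_pfn)
            return false;
    return true;
}
```

    With a gap recorded for 0x188200 - 0x188400, PFN 0x188300 is skipped
    while 0x188100 and 0x188400 can still be onlined.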

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Windows 2012 (non-R2) does not specify a hot add region in hot add requests,
    and the logic in hot_add_req() tries to find a 128MB-aligned region
    covering the request. It may also happen that the host's requests are not
    128MB-aligned and the created ha_region will start before the first
    specified PFN. We can't online these non-present pages, but we don't
    remember the real start of the region.

    This is a regression introduced by the commit 5abbbb75d733 ("Drivers: hv:
    hv_balloon: don't lose memory when onlining order is not natural"). While
    the idea of keeping the 'moving window' was wrong (as there is no guarantee
    that hot add requests come ordered) we should still keep track of
    covered_start_pfn. This is not a revert, the logic is different.
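
    A sketch of why covered_start_pfn matters, assuming 4 KiB pages (so
    128 MB is 0x8000 PFNs) and hypothetical structure and helper names: the
    region start is rounded down, but only PFNs from the first requested
    PFN onwards are backed by the host.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages, so 128 MB = 0x8000 PFNs. */
#define HA_CHUNK_PFNS 0x8000UL

/* Hypothetical, simplified hot-add region. */
struct ha_region {
    unsigned long start_pfn;          /* 128 MB-aligned region start */
    unsigned long covered_start_pfn;  /* first PFN the host actually gave */
    unsigned long covered_end_pfn;    /* one past the last given PFN */
};

static void ha_region_init(struct ha_region *r, unsigned long req_start,
                           unsigned long count)
{
    r->start_pfn = req_start & ~(HA_CHUNK_PFNS - 1);   /* round down */
    r->covered_start_pfn = req_start;
    r->covered_end_pfn = req_start + count;
}

/* Only PFNs in [covered_start_pfn, covered_end_pfn) may be onlined. */
static bool pfn_backed(const struct ha_region *r, unsigned long pfn)
{
    return pfn >= r->covered_start_pfn && pfn < r->covered_end_pfn;
}
```

    For the first request from the previous example (start_pfn 0x108200,
    count 330752), the region starts at 0x108000, yet PFNs 0x108000 -
    0x1081ff are not backed and must not be onlined.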

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • The KVP daemon does fork()/exec() (via popen()), so we need to close our fds
    to avoid sharing them with child processes. The immediate consequence of
    not doing so that I see is SELinux complaining about 'ip' trying to access
    '/dev/vmbus/hv_kvp'.
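
    One way to get this behaviour is to open descriptors with O_CLOEXEC so
    they are closed automatically across exec(). This is a sketch under that
    assumption, not necessarily what the daemon does; the helper names are
    hypothetical and any path can be passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open a descriptor that a popen()ed child will not inherit. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : (flags & FD_CLOEXEC) != 0;
}
```

    Setting the flag at open() time, rather than with a later
    fcntl(fd, F_SETFD, FD_CLOEXEC), leaves no window in which another
    thread could fork and exec with the descriptor still inheritable.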

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • On Hyper-V, performance-critical channels use the monitor
    mechanism to signal the host when the guest posts messages
    for the host. This mechanism minimizes hypervisor intercepts
    and also makes the host more efficient, in that each time the
    host is woken up, it processes a batch of messages as opposed to
    just one. The goal here is to improve throughput, at the expense
    of increased latency.
    Implement a mechanism to let the client driver decide whether latency
    is important.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • The current delay between retries is unnecessarily high and is negatively
    affecting the time it takes to boot the system.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • For synthetic NIC channels, enable explicit signaling policy as netvsc wants to
    explicitly control when the host is to be signaled.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • There is a rare race when we remove an entry from the global list
    hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
    percpu_channel_deq() -> list_del(): if at this time vmbus_on_event() ->
    process_chn_event() -> pcpu_relid2channel() is trying to query the list,
    we can get a kernel fault.

    Similarly, we also have the issue in the code path: vmbus_process_offer() ->
    percpu_channel_enq().

    We can resolve the issue by disabling the tasklet when updating the list.

    The patch also moves vmbus_release_relid() to a later place where
    the channel has been removed from the per-cpu and the global lists.

    Reported-by: Rolf Neugebauer
    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • Background: the registration protocol between userspace daemons and the
    Hyper-V utilities drivers has two steps:
    1) the daemon writes its own version to the kernel
    2) the kernel reads it and replies with the module version
    At this point we consider the handshake procedure completed and we
    call hv_poll_channel(), transitioning the utility device to the
    HVUTIL_READY state. From then on we're ready to handle messages from
    the kernel.

    When hvutil_transport is in HVUTIL_TRANSPORT_CHARDEV mode, we have a
    single buffer for the outgoing message. hvutil_transport_send() puts the
    message into this buffer and, until the buffer is cleared by hvt_op_read(),
    returns -EFAULT to all subsequent calls. The host-guest protocol
    guarantees there is no more than one request at a time and we will not
    get new requests until we reply to the previous one, so this single
    message buffer is enough.

    Now to the race. When we finish the negotiation procedure and send the
    kernel module version to userspace with hvutil_transport_send(), it goes
    into the above-mentioned buffer. If the daemon is slow to read it from
    there, we can get a collision when a request from the host arrives: we
    won't be able to put anything into the buffer, so the request will be
    lost. To solve the issue we need to know when the negotiation is really
    done (when the version message has been read by the daemon) and
    transition to the HVUTIL_READY state after that happens. Implement a
    callback on read to support this. Old-style netlink communication is not
    affected by the change: we don't really know when those messages are
    delivered, but there is no single message buffer there.
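
    The handshake can be modelled with a single-slot buffer and a one-shot
    read hook. This is a simplified user-space sketch; struct transport,
    transport_send(), transport_read() and mark_ready() are hypothetical
    names, not the hvutil_transport API.

```c
#include <stddef.h>
#include <string.h>

enum util_state { UTIL_NEGOTIATING, UTIL_READY };

/* Hypothetical single-slot transport modelled on the chardev mode. */
struct transport {
    char buf[64];
    size_t len;                  /* 0 means the slot is free */
    void (*on_read)(void *ctx);  /* one-shot hook run after the daemon reads */
    void *ctx;
};

static int transport_send(struct transport *t, const char *msg)
{
    size_t n = strlen(msg);

    if (t->len || n == 0 || n > sizeof(t->buf))
        return -1;               /* slot busy: this message would be lost */
    memcpy(t->buf, msg, n);
    t->len = n;
    return 0;
}

/* Daemon-side read: empties the slot, then fires the hook once. */
static size_t transport_read(struct transport *t, char *out, size_t cap)
{
    size_t n = t->len < cap ? t->len : cap;
    void (*cb)(void *) = t->on_read;

    memcpy(out, t->buf, n);
    t->len = 0;                  /* any unread remainder is dropped here */
    t->on_read = NULL;
    if (cb)
        cb(t->ctx);
    return n;
}

/* Hook: negotiation is done only once the version reply was consumed. */
static void mark_ready(void *ctx)
{
    *(enum util_state *)ctx = UTIL_READY;
}
```

    While the version reply sits unread, the state stays UTIL_NEGOTIATING
    and a second send fails; after the daemon's read the hook moves the
    state to UTIL_READY and the slot is free for host requests.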

    Reported-by: Barry Davis
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • vmbus_teardown_gpadl() can result in an infinite wait when it is called
    on the 5-second timeout in vmbus_open(). The issue is caused by the fact
    that the gpadl teardown operation will never succeed for an opened
    channel, and the timeout isn't always enough. As a guest, we can always
    trust the host to respond to our request (and there is nothing we can do
    if it doesn't).

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • In some cases create_gpadl_header() allocates submessages but we never
    free them.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We use messagecount only once in vmbus_establish_gpadl() to check if
    it is safe to iterate through the submsglist. We can just initialize
    the list header in all cases in create_gpadl_header() instead.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • When we crash from NMI context (e.g. after NMI injection from host when
    'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit

    kernel BUG at mm/vmalloc.c:1530!

    as vfree() is not allowed there. While the issue could be solved with an
    in_nmi() check instead, I opted to skip vfree() on all sorts of crashes to
    reduce the amount of work that could cause further crashes. We don't
    really need to free anything on a crash.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     

30 Aug, 2016

4 commits

  • Joe Perches points out [1] that this pattern isn't currently safe.
    This driver doesn't really need the zeroing semantic anyway;
    by restructuring the code slightly we can initialize all the
    fields of the structure up front instead.

    [1] https://lkml.kernel.org/r/1469729491.3998.58.camel@perches.com

    Signed-off-by: Chris Metcalf
    Signed-off-by: Greg Kroah-Hartman

    Chris Metcalf
     
  • Replace the explicit computation of the vma page count with a call to
    vma_pages().
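
    For reference, vma_pages() computes (vm_end - vm_start) >> PAGE_SHIFT.
    The sketch below mirrors that with a hypothetical minimal VMA structure
    and assumes 4 KiB pages.

```c
/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12

/* Hypothetical minimal stand-in for struct vm_area_struct. */
struct vma {
    unsigned long vm_start, vm_end;
};

/* Same computation as the kernel helper vma_pages(). */
static unsigned long vma_pages(const struct vma *v)
{
    return (v->vm_end - v->vm_start) >> PAGE_SHIFT;
}
```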

    Signed-off-by: Muhammad Falak R Wani
    Signed-off-by: Greg Kroah-Hartman

    Muhammad Falak R Wani
     
  • v3.20 doesn't exist. heartbeat_enable was actually added in v4.4.

    Signed-off-by: Alexandre Belloni
    Signed-off-by: Greg Kroah-Hartman

    Alexandre Belloni
     
  • When building with W=1, the __scif_rma_destroy_tcw function
    causes a harmless warning about an argument variable that is
    modified but not used:

    drivers/misc/mic/scif/scif_dma.c: In function ‘__scif_rma_destroy_tcw’:
    drivers/misc/mic/scif/scif_dma.c:118:27: error: parameter ‘ep’ set but not used [-Werror=unused-but-set-parameter]

    In this case, we can just remove the argument, since all callers
    are in the same file.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann