26 May, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Mostly tooling and PMU driver fixes, but also a number of late updates
    such as the reworking of the call-chain size limiting logic to make
    call-graph recording more robust, plus tooling side changes for the
    new 'backwards ring-buffer' extension to the perf ring-buffer"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
    perf record: Read from backward ring buffer
    perf record: Rename variable to make code clear
    perf record: Prevent reading invalid data in record__mmap_read
    perf evlist: Add API to pause/resume
    perf trace: Use the ptr->name beautifier as default for "filename" args
    perf trace: Use the fd->name beautifier as default for "fd" args
    perf report: Add srcline_from/to branch sort keys
    perf evsel: Record fd into perf_mmap
    perf evsel: Add overwrite attribute and check write_backward
    perf tools: Set buildid dir under symfs when --symfs is provided
    perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
    perf annotate: Sort list of recognised instructions
    perf annotate: Fix identification of ARM blt and bls instructions
    perf tools: Fix usage of max_stack sysctl
    perf callchain: Stop validating callchains by the max_stack sysctl
    perf trace: Fix exit_group() formatting
    perf top: Use machine->kptr_restrict_warned
    perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
    perf machine: Do not bail out if not managing to read ref reloc symbol
    perf/x86/intel/p4: Trival indentation fix, remove space
    ...

    Linus Torvalds
     

25 May, 2016

1 commit


24 May, 2016

1 commit

  • Pull drm updates from Dave Airlie:
    "Here's the main drm pull request for 4.7, it's been a busy one, and
    I've been a bit more distracted in real life this merge window. Lots
    more ARM drivers, not sure if it'll ever end. I think I've at least
    one more coming the next merge window.

    But changes are all over the place, support for AMD Polaris GPUs is in
    here, some missing GM108 support for nouveau (found in some Lenovos),
    a bunch of MST and skylake fixes.

    I've also noticed a few fixes from Arnd in my inbox, that I'll try and
    get in asap, but I didn't think they should hold this up.

    New drivers:
    - Hisilicon kirin display driver
    - Mediatek MT8173 display driver
    - ARC PGU - bitstreamer on Synopsys ARC SDP boards
    - Allwinner A13 initial RGB output driver
    - Analogix driver for DisplayPort IP found in exynos and rockchip

    DRM Core:
    - UAPI headers fixes and C++ safety
    - DRM connector reference counting
    - DisplayID mode parsing for Dell 5K monitors
    - Removal of struct_mutex from drivers
    - Connector registration cleanups
    - MST robustness fixes
    - MAINTAINERS updates
    - Lockless GEM object freeing
    - Generic fbdev deferred IO support

    panel:
    - Support for a bunch of new panels

    i915:
    - VBT refactoring
    - PLL computation cleanups
    - DSI support for BXT
    - Color manager support
    - More atomic patches
    - GEM improvements
    - GuC fw loading fixes
    - DP detection fixes
    - SKL GPU hang fixes
    - Lots of BXT fixes

    radeon/amdgpu:
    - Initial Polaris support
    - GPUVM/Scheduler/Clock/Power improvements
    - ASYNC pageflip support
    - New mesa feature support

    nouveau:
    - GM108 support
    - Power sensor support improvements
    - GR init + ucode fixes.
    - Use GPU provided topology information

    vmwgfx:
    - Add host messaging support

    gma500:
    - Some cleanups and fixes

    atmel:
    - Bridge support
    - Async atomic commit support

    fsl-dcu:
    - Timing controller for LCD support
    - Pixel clock polarity support

    rcar-du:
    - Misc fixes

    exynos:
    - Pipeline clock support
    - Exynos5433 SoC support
    - HW trigger mode support
    - export HDMI_PHY clock
    - DECON5433 fixes
    - Use generic prime functions
    - use DMA mapping APIs

    rockchip:
    - Lots of little fixes

    vc4:
    - Render node support
    - Gamma ramp support
    - DPI output support

    msm:
    - Mostly cleanups and fixes
    - Conversion to generic struct fence

    etnaviv:
    - Fix for prime buffer handling
    - Allow hangcheck to be coalesced with other wakeups

    tegra:
    - Gamma table size fix"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
    drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
    drm/edid: move displayid validation to it's own function.
    drm/displayid: Iterate over all DisplayID blocks
    drm/edid: move displayid tiled block parsing into separate function.
    drm: Nuke ->vblank_disable_allowed
    drm/vmwgfx: Report vmwgfx version to vmware.log
    drm/vmwgfx: Add VMWare host messaging capability
    drm/vmwgfx: Kill some lockdep warnings
    drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
    drm/nouveau/core: recognise GM108 chipsets
    drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
    drm/nouveau/gr/gk104-: share implementation of ppc exception init
    drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
    drm/nouveau/bios/pll: check BIT table version before trying to parse it
    drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
    drm/nouveau/volt/gk104: round up in gk104_volt_set
    drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
    drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
    drm/nouveau/fb/gf100-: allocate mmu debug buffers
    drm/nouveau/fb: allow chipset-specific actions for oneinit()
    ...

    Linus Torvalds
     

21 May, 2016

2 commits

  • The binary GCD algorithm is based on the following facts:
    1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2)
    2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
    3. If a and b are both odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

    Even on x86 machines with reasonable division hardware, the binary
    algorithm runs about 25% faster (80% of the execution time) than the
    division-based Euclidean algorithm.

    On platforms like Alpha and ARMv6 where division is a function call to
    emulation code, it's even more significant.

    There are two variants of the code here, depending on whether a fast
    __ffs (find least significant set bit) instruction is available. This
    allows the unpredictable branches in the bit-at-a-time shifting loop to
    be eliminated.

    If fast __ffs is not available, the "even/odd" GCD variant is used.
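
    As a rough illustration of how the three facts combine, here is a
    minimal even/odd sketch. The name binary_gcd_sketch is made up for
    this example; it is not the kernel's lib/gcd.c code and it is not
    tuned for speed.

```c
/*
 * Illustrative even/odd binary GCD built directly on facts 1-3.
 * binary_gcd_sketch is a hypothetical name used only for this example.
 */
unsigned long binary_gcd_sketch(unsigned long a, unsigned long b)
{
	unsigned int shift = 0;

	/* gcd(x, 0) = x, so one zero operand returns the other one. */
	if (a == 0 || b == 0)
		return a | b;

	/* Fact 1: while both are even, factor out a common 2. */
	while (((a | b) & 1) == 0) {
		a >>= 1;
		b >>= 1;
		shift++;
	}

	/* Fact 2: remaining factors of 2 in a are not shared with b. */
	while ((a & 1) == 0)
		a >>= 1;

	for (;;) {
		/* Fact 2 again, this time for b. */
		while ((b & 1) == 0)
			b >>= 1;

		/* Both odd now. Keep a <= b, then apply fact 3: the
		 * difference is even and is halved on the next pass. */
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;
		if (b == 0)
			return a << shift;	/* restore the common 2s */
	}
}
```

    For example, binary_gcd_sketch(12, 18) strips one common factor of
    2, reduces (3, 9) to 3 by subtraction, and returns 3 << 1 = 6.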

    I use the following code to benchmark:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define swap(a, b) \
    do { \
    a ^= b; \
    b ^= a; \
    a ^= b; \
    } while (0)

    unsigned long gcd0(unsigned long a, unsigned long b)
    {
    unsigned long r;

    if (a < b) {
    swap(a, b);
    }

    if (b == 0)
    return a;

    while ((r = a % b) != 0) {
    a = b;
    b = r;
    }

    return b;
    }

    unsigned long gcd1(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd2(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    unsigned long gcd3(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);
    if (b == 1)
    return r & -r;

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == 1)
    return r & -r;
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd4(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;
    if (b == r)
    return r;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == r)
    return r;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
    gcd0, gcd1, gcd2, gcd3, gcd4,
    };

    #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

    #if defined(__x86_64__)

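    #define rdtscll(val) do { \
    unsigned long __a,__d; \
    __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
    (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
    } while(0)

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
    unsigned long a, unsigned long b, unsigned long *res)
    {
    unsigned long long start, end;
    unsigned long long ret;
    unsigned long gcd_res;

    rdtscll(start);
    gcd_res = gcd(a, b);
    rdtscll(end);

    if (end >= start)
    ret = end - start;
    else
    ret = ~0ULL - start + 1 + end;

    *res = gcd_res;
    return ret;
    }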

    #else

    static inline struct timespec read_time(void)
    {
    struct timespec time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
    return time;
    }

    static inline unsigned long long diff_time(struct timespec start, struct timespec end)
    {
    struct timespec temp;

    if ((end.tv_nsec - start.tv_nsec) < 0) {
    temp.tv_sec = end.tv_sec - start.tv_sec - 1;
    temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
    } else {
    temp.tv_sec = end.tv_sec - start.tv_sec;
    temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }

    return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
    }

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
    unsigned long a, unsigned long b, unsigned long *res)
    {
    struct timespec start, end;
    unsigned long gcd_res;

    start = read_time();
    gcd_res = gcd(a, b);
    end = read_time();

    *res = gcd_res;
    return diff_time(start, end);
    }

    #endif

    static inline unsigned long get_rand()
    {
    if (sizeof(long) == 8)
    return (unsigned long)rand() << 32 | rand();
    else
    return rand();
    }

    int main(int argc, char **argv)
    {
    unsigned int seed = time(0);
    int loops = 100;
    int repeats = 1000;
    unsigned long (*res)[TEST_ENTRIES];
    unsigned long long elapsed[TEST_ENTRIES];
    int i, j, k;

    for (;;) {
    int opt = getopt(argc, argv, "n:r:s:");
    /* End condition always first */
    if (opt == -1)
    break;

    switch (opt) {
    case 'n':
    loops = atoi(optarg);
    break;
    case 'r':
    repeats = atoi(optarg);
    break;
    case 's':
    seed = strtoul(optarg, NULL, 10);
    break;
    default:
    /* You won't actually get here. */
    break;
    }
    }

    res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
    memset(elapsed, 0, sizeof(elapsed));

    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    /* Do we have args? */
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    unsigned long long min_elapsed[TEST_ENTRIES];
    for (k = 0; k < repeats; k++) {
    for (i = 0; i < TEST_ENTRIES; i++) {
    unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
    if (k == 0 || min_elapsed[i] > tmp)
    min_elapsed[i] = tmp;
    }
    }
    for (i = 0; i < TEST_ENTRIES; i++)
    elapsed[i] += min_elapsed[i];
    }

    for (i = 0; i < TEST_ENTRIES; i++)
    printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

    k = 0;
    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    for (i = 1; i < TEST_ENTRIES; i++) {
    if (res[j][i] != res[j][0])
    break;
    }
    if (i < TEST_ENTRIES) {
    if (k == 0) {
    k = 1;
    fprintf(stderr, "Error:\n");
    }
    fprintf(stderr, "gcd(%lu, %lu): ", a, b);
    for (i = 0; i < TEST_ENTRIES; i++)
    fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
    }
    }

    if (k == 0)
    fprintf(stderr, "PASS\n");

    free(res);

    return 0;
    }

    Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 10174
    gcd1: elapsed 2120
    gcd2: elapsed 2902
    gcd3: elapsed 2039
    gcd4: elapsed 2812
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9309
    gcd1: elapsed 2280
    gcd2: elapsed 2822
    gcd3: elapsed 2217
    gcd4: elapsed 2710
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9589
    gcd1: elapsed 2098
    gcd2: elapsed 2815
    gcd3: elapsed 2030
    gcd4: elapsed 2718
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9914
    gcd1: elapsed 2309
    gcd2: elapsed 2779
    gcd3: elapsed 2228
    gcd4: elapsed 2709
    PASS

    [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
    Signed-off-by: Zhaoxiu Zeng
    Signed-off-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaoxiu Zeng
     
  • Define HAVE_EXIT_THREAD for archs which want to do something in
    exit_thread. For others, let's define exit_thread as an empty inline.

    This is a cleanup before we change the prototype of exit_thread to
    accept a task parameter.
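
    The opt-in pattern described here can be sketched in plain C as
    follows. This is a userspace illustration only; do_exit_sketch and
    exit_thread_calls are hypothetical names, and the real declarations
    live in kernel headers.

```c
/*
 * An architecture that needs to do work in exit_thread() would define
 * HAVE_EXIT_THREAD in its own header and supply the function. Every
 * other architecture gets the empty inline, which compiles to nothing.
 */
#ifdef HAVE_EXIT_THREAD
void exit_thread(void);			/* provided by the architecture */
#else
static inline void exit_thread(void)
{
}
#endif

/* Hypothetical counter so the example has an observable effect. */
static int exit_thread_calls;

/* Hypothetical stand-in for the generic caller of exit_thread(). */
void do_exit_sketch(void)
{
	exit_thread();			/* same call site for every arch */
	exit_thread_calls++;
}
```

    The call site stays identical whether or not the architecture
    opts in, which is what makes a later prototype change mechanical.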

    [akpm@linux-foundation.org: fix mips]
    Signed-off-by: Jiri Slaby
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Steven Miao
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

20 May, 2016

4 commits

  • Merge updates from Andrew Morton:

    - fsnotify fix

    - poll() timeout fix

    - a few scripts/ tweaks

    - debugobjects updates

    - the (small) ocfs2 queue

    - Minor fixes to kernel/padata.c

    - Maybe half of the MM queue

    * emailed patches from Andrew Morton: (117 commits)
    mm, page_alloc: restore the original nodemask if the fast path allocation failed
    mm, page_alloc: uninline the bad page part of check_new_page()
    mm, page_alloc: don't duplicate code in free_pcp_prepare
    mm, page_alloc: defer debugging checks of pages allocated from the PCP
    mm, page_alloc: defer debugging checks of freed pages until a PCP drain
    cpuset: use static key better and convert to new API
    mm, page_alloc: inline pageblock lookup in page free fast paths
    mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
    mm, page_alloc: pull out side effects from free_pages_check
    mm, page_alloc: un-inline the bad part of free_pages_check
    mm, page_alloc: check multiple page fields with a single branch
    mm, page_alloc: remove field from alloc_context
    mm, page_alloc: avoid looking up the first zone in a zonelist twice
    mm, page_alloc: shortcut watermark checks for order-0 pages
    mm, page_alloc: reduce cost of fair zone allocation policy retry
    mm, page_alloc: shorten the page allocator fast path
    mm, page_alloc: check once if a zone has isolated pageblocks
    mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
    mm, page_alloc: simplify last cpupid reset
    mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
    ...

    Linus Torvalds
     
  • I've just discovered that the useful-sounding has_transparent_hugepage()
    is actually an architecture-dependent minefield: on some arches it only
    builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
    not, but on some of those (arm and arm64) it then gives the wrong
    answer; and on mips alone it's marked __init, which would crash if
    called later (but so far it has not been called later).

    Straighten this out: make it available to all configs, with a sensible
    default in asm-generic/pgtable.h, removing its definitions from those
    arches (arc, arm, arm64, sparc, tile) which are served by the default,
    adding #define has_transparent_hugepage has_transparent_hugepage to
    those (mips, powerpc, s390, x86) which need to override the default at
    runtime, and removing the __init from mips (but maybe that kind of code
    should be avoided after init: set a static variable the first time it's
    called).
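
    A minimal sketch of the override-or-default arrangement, written
    as ordinary C so it can be compiled outside the kernel.
    ARCH_OVERRIDES_THP, arch_cpu_supports_large_pages and
    thp_available_sketch are illustrative names, not kernel symbols;
    only the "#define has_transparent_hugepage has_transparent_hugepage"
    idiom is taken from the text above.

```c
/* What an overriding arch header might contain (hypothetical). */
#ifdef ARCH_OVERRIDES_THP
static inline int arch_cpu_supports_large_pages(void)
{
	return 1;	/* stand-in for a real runtime CPU feature check */
}

static inline int has_transparent_hugepage(void)
{
	return arch_cpu_supports_large_pages();
}
/* Defining the name to itself tells the generic header to back off. */
#define has_transparent_hugepage has_transparent_hugepage
#endif

/* Generic fallback: only used when no arch defined the name. */
#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
#else
#define has_transparent_hugepage() 0
#endif
#endif

/* Callers can use the name in every configuration. */
int thp_available_sketch(void)
{
	return has_transparent_hugepage();
}
```

    Built with neither macro defined, thp_available_sketch() returns 0;
    building with -DCONFIG_TRANSPARENT_HUGEPAGE or -DARCH_OVERRIDES_THP
    makes it return 1.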

    Signed-off-by: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc: Andres Lagar-Cavilla
    Cc: Yang Shi
    Cc: Ning Qu
    Cc: Mel Gorman
    Cc: Konstantin Khlebnikov
    Acked-by: David S. Miller
    Acked-by: Vineet Gupta [arch/arc]
    Acked-by: Gerald Schaefer [arch/s390]
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Pull dmaengine updates from Vinod Koul:
    "This time round the update brings in following changes:

    - new tegra driver for ADMA device

    - support for Xilinx AXI Direct Memory Access Engine and Xilinx AXI
    Central Direct Memory Access Engine and few updates to this driver

    - new cyclic capability to sun6i and few updates

    - slave-sg support in bcm2835

    - updates to many drivers like designware, hsu, mv_xor, pxa, edma,
    qcom_hidma & bam"

    * tag 'dmaengine-4.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (84 commits)
    dmaengine: ioatdma: disable relaxed ordering for ioatdma
    dmaengine: of_dma: approximate an average distribution
    dmaengine: core: Use IS_ENABLED() instead of checking for built-in or module
    dmaengine: edma: Re-evaluate errors when ccerr is triggered w/o error event
    dmaengine: qcom_hidma: add support for object hierarchy
    dmaengine: qcom_hidma: add debugfs hooks
    dmaengine: qcom_hidma: implement lower level hardware interface
    dmaengine: vdma: Add clock support
    Documentation: DT: vdma: Add clock support for dmas
    dmaengine: vdma: Add config structure to differentiate dmas
    MAINTAINERS: Update Tegra DMA maintainers
    dmaengine: tegra-adma: Add support for Tegra210 ADMA
    Documentation: DT: Add binding documentation for NVIDIA ADMA
    dmaengine: vdma: Add Support for Xilinx AXI Central Direct Memory Access Engine
    Documentation: DT: vdma: update binding doc for AXI CDMA
    dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine
    Documentation: DT: vdma: update binding doc for AXI DMA
    dmaengine: vdma: Rename xilinx_vdma_ prefix to xilinx_dma
    dmaengine: slave means at least one of DMA_SLAVE, DMA_CYCLIC
    dmaengine: mv_xor: Allow selecting mv_xor for mvebu only compatible SoC
    ...

    Linus Torvalds
     
  • Pull ARC updates from Vineet Gupta:
    "We have a relatively big changeset for ARC for 4.7.

    The highlight is support for EZChip (now Mellanox) NPS-400 network
    processor, a 400-Gb throughput C-programmable packet processor based
    on ARC700 cores from Synopsys. See

    http://www.mellanox.com/related-docs/prod_npu/PB_NPS-400.pdf

    Also present are irqchip and clocksource drivers for NPS as agreed
    with respective maintainers to go via ARC tree due to an soc header
    dependency. I have the needed ACKs from Jason, Marc, Daniel. You
    might run into a trivial merge conflict in drivers/irqchip/*

    This EZChip platform support required some deep changes in ARC
    architecture code and was also an opportunity to clean up past sins
    (legacy irq domains, missing irq domain lookup, hard coded timer irqs...)

    Summary:

    - Support for EZChip (now Mellanox) NPS-400 Network processor based
    on ARC700

    - NPS interrupt controller and clocksource drivers

    - ARC timers probed off DT

    - ARC irqchips switching to linear domain (upgrade from legacy
    domains)"

    * tag 'arc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (37 commits)
    arc: axs103_smp: Fix CPU frequency to 100MHz for dual-core
    arc: axs10x: Add DT bindings for I2S PLL Clock
    ARC: pae: STRICT_MM_TYPECHECKS was broken
    ARC: Add eznps platform to Kconfig and Makefile
    ARC: [plat-eznps] Use dedicated COMMAND_LINE_SIZE
    ARC: [plat-eznps] Use dedicated cpu_relax()
    ARC: [plat-eznps] Use dedicated identity auxiliary register.
    ARC: [plat-eznps] Use dedicated SMP barriers
    ARC: [plat-eznps] Use dedicated atomic/bitops/cmpxchg
    ARC: [plat-eznps] Use dedicated user stack top
    ARC: [plat-eznps] Add eznps platform
    ARC: [plat-eznps] Add eznps board defconfig and dts
    ARC: Mark secondary cpu online only after all HW setup is done
    ARC: rwlock: disable interrupts in !LLSC variant
    ARC: Make vmalloc size configurable
    ARC: clean out UAPI byteorder.h clean off Kconfig symbol
    irqchip: add nps Internal and external irqchips
    clocksource: Add NPS400 timers driver
    soc: Support for EZchip SoC
    Documentation: Add EZchip vendor to binding list
    ...

    Linus Torvalds
     

18 May, 2016

2 commits

  • The most recent release of AXS103 [v1.1] is proven to work
    at 100 MHz in dual-core mode, so this change makes use of that.
    To do so we:
    * Update axc003_idu.dtsi to state the CPU clock frequency actually used
    * Remove the clock override in AXS platform code for dual-core HW

    Note we're still leaving a hack for clock "downgrade" on early boot
    for quad-core hardware.

    Also note this change will break functionality of AXS103 v1.0 hardware.
    That means all users of AXS103 __must__ upgrade their boards with the
    most recent firmware.

    Signed-off-by: Alexey Brodkin
    Signed-off-by: Vineet Gupta

    Alexey Brodkin
     
  • Pull GPIO updates from Linus Walleij:
    "This is the bulk of GPIO changes for kernel cycle v4.7:

    Core infrastructural changes:

    - Support for natively single-ended GPIO driver stages.

    This means that if the hardware has registers to configure open
    drain or open source configuration, we use that rather than (as we
    did before) try to emulate it by switching the line to an input to
    get high impedance.

    This is also documented thoroughly in Documentation/gpio/driver.txt
    for those of you who did not understand one word of what I just
    wrote.

    - Start to do away with the unnecessarily complex and unintelligible
    ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another
    evolutionary artifact from the time when the GPIO subsystem was
    unmaintained.

    Archs can now just select GPIOLIB and be done with it, cleanups to
    arches will trickle in for the next kernel. Some minor archs ACKed
    the changes immediately so these are included in this pull request.

    - Advancing the use of the data pointer inside the GPIO device for
    storing driver data by switching the PowerPC, Super-H, Unicore and
    a few other subarches or subsystem drivers in ALSA SoC, Input,
    serial, SSB, staging etc to use it.

    - The initialization now reads the input/output state of the GPIO
    lines, so that each GPIO descriptor knows - if this callback is
    implemented - whether the line is input or output. This also
    reflects nicely in userspace "lsgpio".

    - It is now possible to name GPIO producer names, line names, from
    the device tree. (Platform data has been supported for a while).
    I bet we will get a similar mechanism for ACPI one of these days.
    This makes it possible to get sensible producer names for e.g.
    GPIO rails in "lsgpio" in userspace.

    New drivers:

    - New driver for the Loongson1.

    - The XLP driver now supports Broadcom Vulcan ARM64.

    - The IT87 driver now supports IT8620 and IT8628.

    - The PCA953X driver now supports Galileo Gen2.

    Driver improvements:

    - MCP23S08 was switched to use the gpiolib irqchip helpers and now
    also supports level-triggered interrupts.

    - 74x164 and RCAR now support the .set_multiple() callback

    - AMDPT was converted to use generic GPIO.

    - TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
    support the new single ended callback for open drain and in some
    cases open source.

    - Implement the .get_direction() callback for a few more drivers like
    PL061, Xgene.

    Cleanups:

    - Paul Gortmaker combed through the drivers and de-modularized those
    that are not really modules.

    - Move the GPIO poweroff DT bindings to the power subdir where they
    belong.

    - Rename gpio-generic.c to gpio-mmio.c, which is much more to the
    point. That's what it is handling, nothing more, nothing less"

    * tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits)
    MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
    gpio: zevio: make it explicitly non-modular
    gpio: timberdale: make it explicitly non-modular
    gpio: stmpe: make it explicitly non-modular
    gpio: sodaville: make it explicitly non-modular
    pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error
    gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms
    gpio: dt-bindings: add wd,mbl-gpio bindings
    gpio: of: make it possible to name GPIO lines
    gpio: make gpiod_to_irq() return negative for NO_IRQ
    gpio: xgene: implement .get_direction()
    gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver
    gpio: tegra: Implement gpio_get_direction callback
    gpio: set up initial state from .get_direction()
    gpio: rename gpio-generic.c into gpio-mmio.c
    gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case
    gpio: dwapb: add gpio-signaled acpi event support
    gpio: dwapb: convert device node to fwnode
    gpio: dwapb: remove name from dwapb_port_property
    gpio/qoriq: select IRQ_DOMAIN
    ...

    Linus Torvalds
     

17 May, 2016

2 commits

  • This makes perf_callchain_{user,kernel}() receive the max stack
    as context for the perf_callchain_entry, instead of accessing
    the global sysctl_perf_event_max_stack.
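
    The idea of carrying the limit alongside the entry, rather than
    reading a global, can be sketched like this. All names here
    (callchain_ctx_sketch, callchain_store_sketch, SKETCH_MAX_FRAMES,
    and so on) are hypothetical and do not reproduce the kernel's
    actual structures.

```c
#include <stdint.h>

#define SKETCH_MAX_FRAMES 8	/* storage size of the toy entry */

/* Toy callchain buffer: a count followed by instruction pointers. */
struct callchain_entry_sketch {
	uint64_t nr;
	uint64_t ip[SKETCH_MAX_FRAMES];
};

/* Context handed to the unwinder: the entry plus its own limit. */
struct callchain_ctx_sketch {
	struct callchain_entry_sketch *entry;
	uint32_t max_stack;
};

/* Store one frame; returns -1 once the per-context limit is hit. */
static int callchain_store_sketch(struct callchain_ctx_sketch *ctx,
				  uint64_t ip)
{
	if (ctx->entry->nr >= ctx->max_stack ||
	    ctx->entry->nr >= SKETCH_MAX_FRAMES)
		return -1;
	ctx->entry->ip[ctx->entry->nr++] = ip;
	return 0;
}

/* Walk a fake stack and stop as soon as the context says so. */
unsigned int record_callchain_sketch(struct callchain_entry_sketch *entry,
				     const uint64_t *frames,
				     unsigned int nframes,
				     uint32_t max_stack)
{
	struct callchain_ctx_sketch ctx = {
		.entry = entry,
		.max_stack = max_stack,
	};
	unsigned int i;

	entry->nr = 0;
	for (i = 0; i < nframes; i++)
		if (callchain_store_sketch(&ctx, frames[i]))
			break;
	return (unsigned int)entry->nr;
}
```

    Because the limit travels with the context, two callers can record
    callchains with different depths without touching shared state.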

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • …arc-processors/linux into drm-next

    Please pull this mini-series that allows ARC PGU to use
    dedicated memory location as framebuffer backing storage.

    * 'topic-arcpgu-updates' of https://github.com/foss-for-synopsys-dwc-arc-processors/linux:
    ARC: [axs10x] Specify reserved memory for frame buffer
    drm/arcpgu: use dedicated memory area for frame buffer

    Dave Airlie
     

13 May, 2016

2 commits


09 May, 2016

25 commits

  • This commit should come last, since only now is the eznps platform
    in a state in which it can actually be used.
    Signed-off-by: Noam Camus

    Noam Camus
     
  • The default 256 bytes sometimes is just not enough.
    We usually provide earlycon=... and console=... and ip=...
    All this and more may need more room.

    Signed-off-by: Noam Camus
    Acked-by: Vineet Gupta

    Noam Camus
     
  • Since the CTOP is SMT hardware multi-threaded, we need to hint
    the HW that now is a very good time to do a hardware
    thread context switch. This is done by issuing the schd.rw
    instruction (binary coded here so as to not require a specific
    revision of GCC to build the kernel).
    schd.rw means that the thread becomes eligible for execution by
    the thread scheduler after all pending read/write
    transactions have completed.

    cpu_relax_lowlatency() is implemented with barrier(),
    since with the current semantics of cpu_relax() it may take a
    while until the yielded CPU gets back.

    Signed-off-by: Noam Camus
    Cc: Peter Zijlstra
    Acked-by: Vineet Gupta

    Tal Zilcer
     
  • With the generic "identity" register the number of CPUs is limited
    to 256 (8 bit).
    We use our alternative AUX register GLOBAL_ID (12 bit) instead.
    Now we can support up to 4096 CPUs.
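
    The limits quoted here follow directly from the width of the id
    field. A small sketch of that arithmetic (max_cpus_for_id_bits and
    cpu_id_from_reg are hypothetical helpers; the register layout is
    assumed to keep the id in the low bits):

```c
/* Number of distinct CPU ids that a field of 'bits' bits can encode. */
unsigned int max_cpus_for_id_bits(unsigned int bits)
{
	return 1u << bits;
}

/* Extract a CPU id held in the low 'bits' bits of a register value. */
unsigned int cpu_id_from_reg(unsigned int reg, unsigned int bits)
{
	return reg & (max_cpus_for_id_bits(bits) - 1);
}
```

    An 8-bit field gives 1 << 8 = 256 ids and a 12-bit field gives
    1 << 12 = 4096, matching the numbers in the commit message.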

    Signed-off-by: Noam Camus

    Noam Camus
     
  • The NPS device has 256 cores, each with 16 HW threads (SMT).
    We use the EZchip dedicated ISA to trigger the HW scheduler of the
    core that the current HW thread belongs to.
    This scheduling makes sure that data beyond the barrier is available
    to all HW threads in the core, and by that to all in the device (4K).

    Signed-off-by: Noam Camus
    Cc: Peter Zijlstra

    Noam Camus
     
  • We need our own implementations since we lack LLSC support.
    Our extended ISA provides an optimized solution for all the 32-bit
    operations we see in these three headers.
    Signed-off-by: Noam Camus

    Noam Camus
     
  • NPS uses a special mapping right below TASK_SIZE.
    Hence we need to lower STACK_TOP so that the user stack won't
    overlap the NPS special mapping.

    Signed-off-by: Noam Camus
    Acked-by: Vineet Gupta

    Noam Camus
     
  • This platform includes the following boards:
    Hardware Emulator (HE)
    Simulator based upon nSIM.

    Signed-off-by: Noam Camus

    Noam Camus
     
  • Adding default configuration file and DTS file

    Signed-off-by: Noam Camus

    Noam Camus
     
  • In SMP setup, the master loops over each_present_cpu calling cpu_up().
    For ARC it returns as soon as the new cpu's status becomes online.
    However, the secondary may still be doing HW initialization
    at the machine or platform hook level.

    So turn secondary online only after all HW setup is done.
    Signed-off-by: Noam Camus
    Acked-by: Vineet Gupta

    Noam Camus
     
  • If we hold the rwlock and an interrupt occurs, we may
    end up spinning on it forever during softirq.
    Note that this lock is an internal lock,
    and since the lock is free to be used from any context,
    it needs to be IRQ-safe.

    Below is an example of an interrupt taken while
    nl_table_lock was holding its rw->lock_mutex, after which we spun
    on it forever.

    The concept for the fix was taken from SPARC.

    [2015-05-12 19:16:12] Stack Trace:
    [2015-05-12 19:16:12] arc_unwind_core+0xb8/0x11c
    [2015-05-12 19:16:12] dump_stack+0x68/0xac
    [2015-05-12 19:16:12] _raw_read_lock+0xa8/0xac
    [2015-05-12 19:16:12] netlink_broadcast_filtered+0x56/0x35c
    [2015-05-12 19:16:12] nlmsg_notify+0x42/0xa4
    [2015-05-12 19:16:13] neigh_update+0x1fe/0x44c
    [2015-05-12 19:16:13] neigh_event_ns+0x40/0xa4
    [2015-05-12 19:16:13] arp_process+0x46e/0x5a8
    [2015-05-12 19:16:13] __netif_receive_skb_core+0x358/0x500
    [2015-05-12 19:16:13] process_backlog+0x92/0x154
    [2015-05-12 19:16:13] net_rx_action+0xb8/0x188
    [2015-05-12 19:16:13] __do_softirq+0xda/0x1d8
    [2015-05-12 19:16:14] irq_exit+0x8a/0x8c
    [2015-05-12 19:16:14] arch_do_IRQ+0x6c/0xa8
    [2015-05-12 19:16:14] handle_interrupt_level1+0xe4/0xf0
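
The hazard described above can be illustrated with a toy single-CPU model. This is only a sketch under stated assumptions: the struct and function names are hypothetical and are not the ARC implementation; the comments merely point at the kernel's local_irq_save()/local_irq_restore() as the kind of masking an IRQ-safe lock would rely on.

```c
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical, not kernel API. */
struct sim_cpu {
    bool irqs_enabled;  /* models the CPU interrupt-enable state */
    bool inner_locked;  /* models the rwlock's internal lock word */
};

/*
 * Model an interrupt arriving while the internal lock may be held.
 * Returns true if the handler would spin forever on that lock.
 */
static bool irq_arrives(const struct sim_cpu *cpu)
{
    if (!cpu->irqs_enabled)
        return false;         /* masked: handler runs only after unlock */
    return cpu->inner_locked; /* handler preempts the holder and spins */
}

/* Take the internal lock, optionally masking interrupts first. */
static bool read_lock_then_irq(bool irq_safe)
{
    struct sim_cpu cpu = { .irqs_enabled = true, .inner_locked = false };
    bool saved = cpu.irqs_enabled;
    bool deadlock;

    if (irq_safe)
        cpu.irqs_enabled = false; /* in the spirit of local_irq_save() */
    cpu.inner_locked = true;

    deadlock = irq_arrives(&cpu);

    cpu.inner_locked = false;
    cpu.irqs_enabled = saved;     /* in the spirit of local_irq_restore() */
    return deadlock;
}
```

In this model, read_lock_then_irq(false) reports the lockup seen in the trace above, while read_lock_then_irq(true) defers the handler until the lock has been dropped.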

    Signed-off-by: Noam Camus
    Acked-by: Peter Zijlstra

    Noam Camus
     
  • On ARC, lower 2G of address space is translated and used for
    - user vaddr space (region 0 to 5)
    - unused kernel-user gutter (region 6)
    - kernel vaddr space (region 7)

    where each region simply represents 256MB of address space.

    The kernel vaddr space of 256MB is used to implement vmalloc and modules.
    So far this was enough, but not on an EZChip system with 4K CPUs (given
    that the per-cpu mechanism uses vmalloc for allocating chunks).

    So allow VMALLOC_SIZE to be configurable by expanding down into the unused
    kernel-user gutter region which at default 256M was excessive anyways.
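
The region arithmetic above can be sketched in plain C. This is an illustration only: the function names and the size-in-MB parameter are assumptions made for the example and are not taken from the patch.

```c
#include <stdint.h>

#define REGION_SIZE 0x10000000u /* 256MB per region, as stated above */
#define NUM_REGIONS 8u          /* regions 0..7 cover the lower 2G */

/* Start address of region n (n in 0..7). */
static uint32_t region_start(unsigned int n)
{
    return n * REGION_SIZE;
}

/* End of the translated space, i.e. the end of region 7 (2G). */
static uint32_t kvaddr_end(void)
{
    return NUM_REGIONS * REGION_SIZE;
}

/* Hypothetical knob: kernel vaddr size in MB, growing down into the gutter. */
static uint32_t kvaddr_start(uint32_t kvaddr_size_mb)
{
    return kvaddr_end() - (kvaddr_size_mb << 20);
}

/* What is left of the gutter between the end of region 5 and kernel vaddr. */
static uint32_t gutter_size(uint32_t kvaddr_size_mb)
{
    return kvaddr_start(kvaddr_size_mb) - region_start(6);
}
```

With the default of 256MB the kernel vaddr space starts at 0x70000000 and the gutter is a full region; at 384MB it starts at 0x68000000 and the gutter shrinks to 128MB.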

    Also use _BITUL() to fix a build error since PGDIR_SIZE cannot use "1UL"
    as called from assembly code in mm/tlbex.S
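
The reason a literal "1UL" breaks in assembly is that the assembler does not understand the C integer suffix. A minimal sketch of the macro pattern that works around this follows; the SKETCH_ names and the shift value are hypothetical and only mirror the idea behind _BITUL(), they are not the kernel's definitions.

```c
/* Hypothetical re-creation of the pattern; SKETCH_ avoids clashing with real headers. */
#ifdef __ASSEMBLY__
#define SKETCH_AC(x, suffix)  x                     /* assembler: bare constant */
#else
#define SKETCH_AC_(x, suffix) (x##suffix)
#define SKETCH_AC(x, suffix)  SKETCH_AC_(x, suffix) /* C: typed constant, e.g. 1UL */
#endif

/* Bit n as an unsigned long in C, and as a plain shift in assembly. */
#define SKETCH_BITUL(n) (SKETCH_AC(1, UL) << (n))

#define SKETCH_PGDIR_SHIFT 21 /* example value only */
#define SKETCH_PGDIR_SIZE  SKETCH_BITUL(SKETCH_PGDIR_SHIFT)
```

The same header can then be included from both C and assembly sources, with only the C side seeing the UL suffix.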

    Signed-off-by: Noam Camus
    [vgupta: rewrote changelog, debugged bootup crash due to int vs. hex]
    Acked-by: Vineet Gupta

    Noam Camus
     
  • UAPI header should not use Kconfig items

    Use __BIG_ENDIAN__ defined as a compiler intrinsic

    Signed-off-by: Noam Camus
    [vgupta: fix changelog]
    Signed-off-by: Vineet Gupta

    Noam Camus
     
  • There are no more users of this - so RIP!

    Signed-off-by: Alexey Brodkin
    [vgupta: update changelog]
    Signed-off-by: Vineet Gupta

    Alexey Brodkin
     
  • We no longer use it and instead a real clk device such as fixed-clk
    instance is fed to timers etc.

    Signed-off-by: Alexey Brodkin
    [vgupta: broken out of a bigger patch, rewrote changelog]
    Signed-off-by: Vineet Gupta

    Alexey Brodkin
     
  • UARTs usually have fixed clock so we're switching to use of
    constant values instead of something derived from core clock
    frequency.

    Among other things this will allow us to get rid of
    arc_{get|set}_core_freq() and switch to generic clock
    framework later on.

    Acked-by: Christian Ruppert
    Signed-off-by: Alexey Brodkin
    Signed-off-by: Vineet Gupta

    Alexey Brodkin
     
  • Now that timers are probed from DT, we don't need the legacy domain.

    This however requires mapping to be called explicitly for the IRQs which
    still can't (and probably never will) be probed from DT, such as IPI and
    SOFTIRQ.

    Acked-by: Marc Zyngier
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • The primary interrupt handler arch_do_IRQ() was passing hwirq as linux
    virq to core code. This was fragile and only worked so far because we
    only had legacy/linear domains.
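
To make the hwirq versus virq distinction concrete, here is a toy user-space model of a linear domain. It is a sketch only: the sketch_ names are hypothetical stand-ins, loosely in the spirit of the kernel's irq_create_mapping() and irq_find_mapping(), and it does not reproduce the actual patch.

```c
#define SKETCH_NR_HWIRQ 32u

/* Toy linear domain: index is the hardware IRQ, value is the Linux virq (0 = unmapped). */
struct sketch_domain {
    unsigned int virq_of[SKETCH_NR_HWIRQ];
    unsigned int next_virq;
};

static void sketch_domain_init(struct sketch_domain *d)
{
    unsigned int i;

    for (i = 0; i < SKETCH_NR_HWIRQ; i++)
        d->virq_of[i] = 0;
    d->next_virq = 1; /* virq 0 is reserved to mean "no mapping" */
}

/* Allocate a virq for hwirq on first use; return the existing one afterwards. */
static unsigned int sketch_create_mapping(struct sketch_domain *d, unsigned int hwirq)
{
    if (hwirq >= SKETCH_NR_HWIRQ)
        return 0;
    if (d->virq_of[hwirq] == 0)
        d->virq_of[hwirq] = d->next_virq++;
    return d->virq_of[hwirq];
}

/* What a primary handler should do: translate hwirq before calling core code. */
static unsigned int sketch_find_mapping(const struct sketch_domain *d, unsigned int hwirq)
{
    return hwirq < SKETCH_NR_HWIRQ ? d->virq_of[hwirq] : 0;
}
```

In this model hwirq 16 can end up as virq 1, so passing the raw hardware number to core code would dispatch the wrong (or no) handler once the two numbering spaces diverge.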

    This came out of a rant by Marc Zyngier.
    http://lists.infradead.org/pipermail/linux-snps-arc/2015-December/000298.html

    Cc: Marc Zyngier
    Cc: Thomas Gleixner
    Cc: Noam Camus
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • This will be needed for switching to linear irq domain as
    irq_create_mapping() called by intr code needs the IRQ numbers
    in addition to existing usage in mcip.c for requesting the irq

    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • - Remove explicit clocksource setup and let it be done by OF framework
    by defining CLOCKSOURCE_OF_DECLARE() for various timers

    - This allows multiple clocksources to be potentially registered
    simultaneously: previously we could only do one, as all of them had
    the same arc_counter_setup() routine for registration

    - Setup routines also ensure that the underlying timer actually exists.

    - Remove some of the panic() calls if the underlying timer is NOT detected,
    as a fallback clocksource might still be available
    1. If GFRC doesn't exist, jiffies clocksource gets registered anyways
    2. If RTC doesn't exist, TIMER1 can take over (as it is always
    present)
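
The fallback behaviour listed above can be modelled roughly as follows. This is a sketch under assumptions: the enum, struct and function names are hypothetical, and the real setup routines are driven per timer node by the OF framework rather than by a single selection function.

```c
#include <stdbool.h>

enum sketch_clocksource {
    SKETCH_CS_JIFFIES,
    SKETCH_CS_TIMER1,
    SKETCH_CS_RTC,
    SKETCH_CS_GFRC
};

/* Which optional counters this (hypothetical) core actually has. */
struct sketch_hw {
    bool has_gfrc;
    bool has_rtc;
};

/* Setup succeeds only when the timer behind 'cs' exists; no panic on failure. */
static bool sketch_setup(const struct sketch_hw *hw, enum sketch_clocksource cs)
{
    switch (cs) {
    case SKETCH_CS_GFRC:
        return hw->has_gfrc;
    case SKETCH_CS_RTC:
        return hw->has_rtc;
    case SKETCH_CS_TIMER1:  /* always present */
    case SKETCH_CS_JIFFIES: /* software fallback, always available */
        return true;
    }
    return false;
}

/* Return the clocksource that ends up in use when 'wanted' is requested. */
static enum sketch_clocksource sketch_pick(const struct sketch_hw *hw,
                                           enum sketch_clocksource wanted)
{
    if (sketch_setup(hw, wanted))
        return wanted;
    return wanted == SKETCH_CS_RTC ? SKETCH_CS_TIMER1 : SKETCH_CS_JIFFIES;
}
```

The point of the model is that a missing optional counter degrades the clocksource instead of stopping the boot.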

    Cc: Daniel Lezcano
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • - timer frequency is derived from DT (no longer rely on top level
    DT "clock-frequency" probed early and exported by asm/clk.h)

    - TIMER0_IRQ need not be exported across arch code, confined to intc as
    it is property of same

    - Any failure in clockevent setup is treated as fatal and the system
    panic()s, as there is no generic fallback (unlike clocksource where
    a jiffies based soft clocksource always exists)

    Acked-by: Daniel Lezcano
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • ARC timers have historically been probed directly.
    As a precursor to probing timers through DT, introduce these bindings.
    Note that to keep the series bisectable, these bindings are not yet
    used in code.

    Cc: Daniel Lezcano
    Cc: devicetree@vger.kernel.org
    Acked-by: Rob Herring
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • This allows us to introduce timers in DT in next commit

    The core clk frequency hack in AXS103 platform is also extended,
    where the core clk feeding into timers is updated in-place in FDT.

    Cc: Daniel Lezcano
    Cc: Rob Herring
    Cc: devicetree@vger.kernel.org
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • This is again for future changes to use common DTSI for timers which
    refer to @core_intc

    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • ... and add them to plat-sim DTS.

    This allows for future change to introduce timers in DT in single place

    Signed-off-by: Vineet Gupta

    Vineet Gupta