15 Apr, 2015

1 commit

  • We have recently had an example of someone wanting to use a 90kHz timer
    for the software delay loop.

    udelay() needs to have at least microsecond resolution to give drivers
    a delay mechanism with a reasonable chance of delaying for the period
    they requested to within a 50% margin of error, especially for small
    delays.

    Discussion about the udelay() accuracy can be found at:
    https://lkml.org/lkml/2011/1/9/37

    Reject timers which are unable to supply this level of resolution.
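    As a rough illustration of the rule above, the sketch below computes a
    timer's tick period in nanoseconds and rejects anything coarser than
    1 µs. The helper names and the use of truncating division are
    assumptions for illustration, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: nanoseconds per timer tick (truncating division). */
static uint64_t timer_resolution_ns(uint64_t freq_hz)
{
    return NSEC_PER_SEC / freq_hz;
}

/* Hypothetical check: accept only timers that tick at least once per
 * microsecond, which is the resolution udelay() needs. */
static bool timer_ok_for_udelay(uint64_t freq_hz)
{
    if (freq_hz == 0)
        return false;
    return timer_resolution_ns(freq_hz) <= 1000;
}
```

    With this rule a 90kHz timer (about 11111 ns per tick) is rejected,
    while a 1MHz or faster timer is accepted.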

    Acked-by: Nicolas Pitre
    Signed-off-by: Russell King

    Russell King
     

30 Mar, 2015

1 commit

  • This moves all fixup snippets to the .text.fixup section, which is
    a special section that gets emitted along with the .text section
    for each input object file, i.e., the snippets are kept much closer
    to the code they refer to, which helps prevent linker failure on
    large kernels.

    Acked-by: Nicolas Pitre
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Russell King

    Ard Biesheuvel
     

16 Jan, 2015

1 commit

  • This code was restored with commit 080fc66fb5 ("ARM: Bring back ARMv3 IO
    and user access code") because the RiscPC memory bus does not understand
    half-word load/stores. However only the IO code needed restoring since
    the alternative user access code contains no half-word accesses, is
    already used when CONFIG_PREEMPT is set and runs faster on a StrongARM.

    Signed-off-by: Nicolas Pitre
    Signed-off-by: Russell King

    Nicolas Pitre
     

28 Nov, 2014

3 commits

  • The memory copy functions (memcpy, __copy_from_user, __copy_to_user)
    never had unwinding annotations added. Currently, when one of these
    functions accesses an invalid pointer, the backtrace shown stops at
    that function or at some completely unrelated function.
    Add unwinding annotations in the hope of getting a more useful backtrace
    in the following cases:
    1. die on accessing an invalid pointer in these functions
    2. kprobe trapped at any instruction within these functions
    3. interrupted at any instruction within these functions

    Signed-off-by: Lin Yongting
    Signed-off-by: Russell King

    Lin Yongting
     
  • The memmove function never had unwinding annotations added.
    Currently, when memmove accesses an invalid pointer, the backtrace
    shown stops at memmove or at some completely unrelated function.
    Add unwinding annotations in the hope of getting a more useful
    backtrace in the following cases:
    1. die on accessing an invalid pointer in memmove
    2. kprobe trapped at any instruction within memmove
    3. interrupted at any instruction within memmove

    Signed-off-by: Lin Yongting
    Signed-off-by: Russell King

    Lin Yongting
     
  • The __memzero function never had unwinding annotations added.
    Currently, when __memzero accesses an invalid pointer, the backtrace
    shown stops at __memzero or at some completely unrelated function.
    Add unwinding annotations in the hope of getting a more useful
    backtrace in the following cases:
    1. die on accessing an invalid pointer in __memzero
    2. kprobe trapped at any instruction within __memzero
    3. interrupted at any instruction within __memzero

    Signed-off-by: Lin Yongting
    Signed-off-by: Russell King

    Lin Yongting
     

21 Nov, 2014

1 commit

  • The memset function never had unwinding annotations added.
    Currently, when memset accesses a NULL pointer, the backtrace shown
    stops at memset or at some completely unrelated function. Add
    unwinding annotations in the hope of getting a more useful backtrace
    when memset accesses a NULL pointer, hits a kprobe, or is interrupted.

    Signed-off-by: Lin Yongting
    Signed-off-by: Russell King

    Lin Yongting
     

13 Sep, 2014

1 commit

  • Commit e38361d ('ARM: 8091/2: add get_user() support for 8 byte types')
    broke the V7 BE get_user call when the target variable is 64 bits wide
    but '*ptr' is 32 bits or smaller. e38361d changed the type of __r2 from
    'register unsigned long' to 'register typeof(x) __r2 asm("r2")', i.e.
    before the change __r2 was still 32 bits wide even when the target
    variable was 64 bits wide. After e38361d, for a 64-bit target variable
    __r2 also becomes 64 bits wide and occupies two registers, r2 and r3.
    The issue in the BE case is that r3 holds the least significant word of
    __r2 and r2 holds the most significant word. But __get_user_4 still
    copies its result into r2 (the most significant word of __r2).
    Subsequent code copies __r2 into x, but in the situation described it
    picks up only garbage from r3.

    Special __get_user_64t_(124) functions are introduced. They are similar
    to the corresponding __get_user_(124) functions, but store the result in
    r3 (the lsw of a 64-bit __r2 in a BE image). These functions are used by
    the get_user macro when the image is BE and the target variable is
    64 bits wide.

    Also renamed __get_user_lo8 to __get_user_32t_8 for consistent naming
    across all cases.
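    The register-pair problem can be modelled in plain C. The sketch below
    treats a 64-bit __r2 on a big-endian image as the pair {r2, r3}, with
    r2 holding the most significant word, and shows why a 32-bit result
    must land in r3. The struct and function names are hypothetical; this
    models the register layout only and is not kernel code.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit __r2 living in the register pair r2:r3. */
struct regpair {
    uint32_t r2;
    uint32_t r3;
};

/* On a big-endian image, r2 holds the most significant word (msw)
 * and r3 the least significant word (lsw). */
static uint64_t regpair_value_be(struct regpair p)
{
    return ((uint64_t)p.r2 << 32) | p.r3;
}

/* get_user narrows __r2 to the type of *ptr (32 bits here), which keeps
 * only the lsw, i.e. r3 on big-endian. */
static uint32_t narrow_to_u32(struct regpair p)
{
    return (uint32_t)regpair_value_be(p);
}

/* Broken case: the 32-bit accessor writes its result to r2, so r3 still
 * holds whatever it contained before (garbage). */
static struct regpair get_user_4_result(uint32_t loaded, uint32_t stale_r3)
{
    struct regpair p = { .r2 = loaded, .r3 = stale_r3 };
    return p;
}

/* Fixed case, modelled on __get_user_64t_4: the result goes to r3;
 * r2 is left untouched because the narrowing discards it anyway. */
static struct regpair get_user_64t_4_result(uint32_t loaded,
                                            uint32_t untouched_r2)
{
    struct regpair p = { .r2 = untouched_r2, .r3 = loaded };
    return p;
}
```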

    Signed-off-by: Victor Kamensky
    Suggested-by: Daniel Thompson
    Reviewed-by: Daniel Thompson
    Signed-off-by: Russell King

    Victor Kamensky
     

09 Aug, 2014

1 commit

  • Pull ARM SoC platform changes from Olof Johansson:
    "This is the bulk of new SoC enablement and other platform changes for
    3.17:

    - Samsung S5PV210 has been converted to DT and multiplatform
    - Clock drivers and bindings for some of the lower-end i.MX 1/2
    platforms
    - Kirkwood, one of the popular Marvell platforms, is folded into the
    mvebu platform code, removing mach-kirkwood
    - Hwmod data for TI AM43xx and DRA7 platforms
    - More additions of Renesas shmobile platform support
    - Removal of plat-samsung contents that can be removed with S5PV210
    being multiplatform/DT-enabled and the other two old platforms
    being removed

    New platforms (most with only basic support right now):

    - Hisilicon X5HD2 settop box chipset is introduced
    - Mediatek MT6589 (mobile chipset) is introduced
    - Broadcom BCM7xxx settop box chipset is introduced

    + as usual a lot of other pieces all over the platform code"

    * tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
    ARM: hisi: remove smp from machine descriptor
    power: reset: move hisilicon reboot code
    ARM: dts: Add hix5hd2-dkb dts file.
    ARM: debug: Rename Hi3716 to HIX5HD2
    ARM: hisi: enable hix5hd2 SoC
    ARM: hisi: add ARCH_HISI
    MAINTAINERS: add entry for Broadcom ARM STB architecture
    ARM: brcmstb: select GISB arbiter and interrupt drivers
    ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
    ARM: configs: enable SMP in bcm_defconfig
    ARM: add SMP support for Broadcom mobile SoCs
    Documentation: arm: misc updates to Marvell EBU SoC status
    Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
    ARM: mvebu: fix build without platforms selected
    ARM: mvebu: add cpuidle support for Armada 38x
    ARM: mvebu: add cpuidle support for Armada 370
    cpuidle: mvebu: add Armada 38x support
    cpuidle: mvebu: add Armada 370 support
    cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
    ARM: mvebu: export the SCU address
    ...

    Linus Torvalds
     

18 Jul, 2014

2 commits

  • Recent contributions, including to DRM and binder, introduce 64-bit
    values in their interfaces. A common motivation for this is to allow
    the same ABI for 32- and 64-bit userspaces (and therefore also a shared
    ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to
    avoid gotchas like having to use copy_from_user().

    This feature is already implemented on x86-32 and the majority of other
    32-bit architectures. The current list of get_user_8 holdout
    architectures is: arm, avr32, blackfin, m32r, metag, microblaze,
    mn10300, sh.

    Credit:

    My name sits rather uneasily at the top of this patch. The v1 and
    v2 versions of the patch were written by Rob Clark and to produce v4
    I mostly copied code from Russell King and H. Peter Anvin. However I
    have mangled the patch sufficiently that *blame* is rightfully mine
    even if credit should be more widely shared.

    Changelog:

    v5: updated to use the ret macro (requested by Russell King)
    v4: remove an inlined add on big endian systems (spotted by Russell King),
    used __ARMEB__ rather than BIG_ENDIAN (to match rest of file),
    cleared r3 on EFAULT during __get_user_8.
    v3: fix a couple of checkpatch issues
    v2: pass correct size to check_uaccess, and better handling of narrowing
    double word read with __get_user_xb() (Russell King's suggestion)
    v1: original

    Signed-off-by: Rob Clark
    Signed-off-by: Daniel Thompson
    Signed-off-by: Russell King

    Daniel Thompson
     
  • ARMv6 and greater introduced a new instruction ("bx") which can be used
    to return from function calls. Recent CPUs perform better when the
    "bx lr" instruction is used rather than the "mov pc, lr" instruction,
    and this sequence is strongly recommended to be used by the ARM
    architecture manual (section A.4.1.1).

    We provide a new macro "ret" with all its variants for the condition
    code which will resolve to the appropriate instruction.

    Rather than doing this piecemeal and missing some instances, change all
    the "mov pc" instances to use the new macro, with the exception of
    the "movs" instruction and the kprobes code. This allows us to detect
    the "mov pc, lr" case and fix it up - and also gives us the possibility
    of deploying this for other registers depending on the CPU selection.

    Reported-by: Will Deacon
    Tested-by: Stephen Warren # Tegra Jetson TK1
    Tested-by: Robert Jarzmik # mioa701_bootresume.S
    Tested-by: Andrew Lunn # Kirkwood
    Tested-by: Shawn Guo
    Tested-by: Tony Lindgren # OMAPs
    Tested-by: Gregory CLEMENT # Armada XP, 375, 385
    Acked-by: Sekhar Nori # DaVinci
    Acked-by: Christoffer Dall # kvm/hyp
    Acked-by: Haojian Zhuang # PXA3xx
    Acked-by: Stefano Stabellini # Xen
    Tested-by: Uwe Kleine-König # ARMv7M
    Tested-by: Simon Horman # Shmobile
    Signed-off-by: Russell King

    Russell King
     

17 Jun, 2014

1 commit

  • In case there are several possible delay timers, choose the one with the
    highest resolution. This code relies on the fact that secondary CPUs have
    not yet been brought online when register_current_timer_delay() is
    called. This is ensured by implementing calibration_delay_done().
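    A minimal sketch of the selection rule, assuming each candidate timer
    is described only by its frequency: keep the candidate with the finest
    resolution (fewest nanoseconds per tick). The struct and function names
    are hypothetical.

```c
#include <stddef.h>

#define NSEC_PER_SEC 1000000000UL

/* Hypothetical description of a candidate delay timer. */
struct delay_timer_candidate {
    const char *name;
    unsigned long freq; /* Hz */
};

/* Return the candidate with the highest resolution, i.e. the smallest
 * tick period, or NULL if none is usable. Candidates with freq == 0 are
 * skipped. */
static const struct delay_timer_candidate *
pick_delay_timer(const struct delay_timer_candidate *c, size_t n)
{
    const struct delay_timer_candidate *best = NULL;
    unsigned long best_res = 0;

    for (size_t i = 0; i < n; i++) {
        if (c[i].freq == 0)
            continue;
        unsigned long res = NSEC_PER_SEC / c[i].freq;
        if (best == NULL || res < best_res) {
            best = &c[i];
            best_res = res;
        }
    }
    return best;
}
```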

    Signed-off-by: Peter De Schrijver
    Acked-by: Russell King
    Signed-off-by: Stephen Warren

    Peter De Schrijver
     

25 Feb, 2014

2 commits

  • Renames the logical shift macros 'push' and 'pull', defined in
    arch/arm/include/asm/assembler.h, to 'lspush' and 'lspull'.
    This eliminates the name conflict between the 'push' logical shift
    macro and the 'push' instruction mnemonic, and allows assembler.h
    to be included in .S files that use the 'push' instruction.

    Suggested-by: Will Deacon
    Signed-off-by: Victor Kamensky
    Acked-by: Nicolas Pitre
    Signed-off-by: Russell King

    Victor Kamensky
     
  • After a bunch of benchmarking on the interaction between dmb and pldw,
    it turns out that issuing the pldw *after* the dmb instruction can
    give modest performance gains (~3% atomic_add_return improvement on a
    dual A15).

    This patch adds prefetchw invocations to our barriered atomic operations
    including cmpxchg, test_and_xxx and futexes.
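    In C terms, the pattern amounts to a leading barrier, then a
    prefetch-for-write on the target line, then the exclusive
    read-modify-write. The sketch below uses GCC/Clang's
    __builtin_prefetch with the write hint (second argument 1), which the
    compiler may lower to PLDW on ARMv7 cores with the multiprocessing
    extensions; C11 fences and a relaxed atomic stand in for the kernel's
    dmb and LDREX/STREX loop. It illustrates the ordering only and is not
    the kernel's atomic_add_return.

```c
#include <stdatomic.h>

/* Sketch of a fully ordered add-and-return: barrier, prefetch-for-write,
 * relaxed read-modify-write, barrier. Placing the prefetch after the
 * leading barrier mirrors the dmb-then-pldw sequence described above. */
static int sketch_atomic_add_return(int i, atomic_int *v)
{
    atomic_thread_fence(memory_order_seq_cst);      /* leading dmb */
    __builtin_prefetch((const void *)v, 1);         /* write hint */
    int result = atomic_fetch_add_explicit(v, i, memory_order_relaxed) + i;
    atomic_thread_fence(memory_order_seq_cst);      /* trailing dmb */
    return result;
}
```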

    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon
     

29 Dec, 2013

2 commits

  • Enable the compiler intrinsic for byte swapping on arch ARM. This
    allows the compiler to detect and be able to optimize out byte
    swappings, and has a very modest benefit on vmlinux size (Linaro gcc
    4.8):

    text     data    bss      dec       hex     filename
    2840310  123932  61960    3026202   2e2d1a  vmlinux-lart #orig
    2840152  123932  61960    3026044   2e2c7c  vmlinux-lart #builtin-bswap

    6473120  314840  5616016  12403976  bd4508  vmlinux-mxs #orig
    6472586  314848  5616016  12403450  bd42fa  vmlinux-mxs #builtin-bswap

    7419872  318372  379556   8117800   7bde28  vmlinux-imx_v6_v7 #orig
    7419170  318364  379556   8117090   7bdb62  vmlinux-imx_v6_v7 #builtin-bswap
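    With the intrinsic enabled, byte swaps can be expressed through GCC's
    __builtin_bswap32 (and its 16/64-bit variants), whose semantics the
    compiler knows, so it can fold constant swaps and cancel back-to-back
    swaps; on ARMv6 and later it can typically emit a single REV
    instruction. The sketch compares it with the open-coded shift-and-mask
    form; the function names are illustrative.

```c
#include <stdint.h>

/* Open-coded byte swap: the compiler has to pattern-match this. */
static uint32_t swab32_open_coded(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}

/* Builtin byte swap (GCC, Clang): semantics known to the optimizer. */
static uint32_t swab32_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}
```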

    Signed-off-by: Kim Phillips
    Reviewed-by: Nicolas Pitre
    Acked-by: David Woodhouse
    Signed-off-by: Russell King

    Kim Phillips
     
  • We don't need the offset for the first function name in each backtrace
    entry; this needlessly consumes screen space. This is virtually always
    the first or second instruction in the called function.

    Also, recognise stmfd instructions which include r10 as a valid stack
    saving instruction, and when dumping the registers, dump six registers
    per line rather than five, and fix the wrapping.

    Signed-off-by: Russell King

    Russell King
     

01 Dec, 2013

1 commit

  • Currently mx53 (CortexA8) running at 1GHz reports:
    Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

    Tom Evans verified that alignments of 0x0 and 0x8 run the two
    instructions of __loop_delay in one clock cycle (1 clock/loop), while
    alignments of 0x4 and 0xc take 3 clocks to run the loop twice
    (1.5 clocks/loop).

    The original object code looks like this:

    00000010 :
    10: e3e01000 mvn r1, #0
    14: e51f201c ldr r2, [pc, #-28] ; 0
    18: e5922000 ldr r2, [r2]
    1c: e0800921 add r0, r0, r1, lsr #18
    20: e1a00720 lsr r0, r0, #14
    24: e0822b21 add r2, r2, r1, lsr #22
    28: e1a02522 lsr r2, r2, #10
    2c: e0000092 mul r0, r2, r0
    30: e0800d21 add r0, r0, r1, lsr #26
    34: e1b00320 lsrs r0, r0, #6
    38: 01a0f00e moveq pc, lr

    0000003c :
    3c: e2500001 subs r0, r0, #1
    40: 8afffffe bhi 3c
    44: e1a0f00e mov pc, lr

    After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

    00000010 :
    10: e3e01000 mvn r1, #0
    14: e51f201c ldr r2, [pc, #-28] ; 0
    18: e5922000 ldr r2, [r2]
    1c: e0800921 add r0, r0, r1, lsr #18
    20: e1a00720 lsr r0, r0, #14
    24: e0822b21 add r2, r2, r1, lsr #22
    28: e1a02522 lsr r2, r2, #10
    2c: e0000092 mul r0, r2, r0
    30: e0800d21 add r0, r0, r1, lsr #26
    34: e1b00320 lsrs r0, r0, #6
    38: 01a0f00e moveq pc, lr
    3c: e320f000 nop {0}

    00000040 :
    40: e2500001 subs r0, r0, #1
    44: 8afffffe bhi 40
    48: e1a0f00e mov pc, lr
    4c: e320f000 nop {0}

    With this change, the mx53 now reports:
    Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

    Some more test results:

    On mx31 (ARM1136) running at 532 MHz, before the patch:
    Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

    On mx31 (ARM1136) running at 532 MHz after the patch:
    Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

    Also tested on mx6 (CortexA9) and on mx27 (ARM926), both of which show
    the same BogoMIPS value before and after this patch.
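    The BogoMIPS figures follow directly from lpj (BogoMIPS = lpj * HZ /
    500000). The sketch below mirrors the integer-only arithmetic the
    kernel uses when printing the calibration result, assuming HZ=100,
    which matches all four lpj/BogoMIPS pairs quoted above; the helper name
    is illustrative.

```c
#include <stdio.h>

#define HZ 100 /* assumption: consistent with the lpj/BogoMIPS pairs above */

/* Format lpj as "<int>.<2 digits> BogoMIPS (lpj=N)" using integer
 * arithmetic only. */
static void format_bogomips(unsigned long lpj, char *buf, size_t len)
{
    snprintf(buf, len, "%lu.%02lu BogoMIPS (lpj=%lu)",
             lpj / (500000 / HZ), (lpj / (5000 / HZ)) % 100, lpj);
}
```

    The ratio of the mx53 lpj values, 4980736 / 3317760, is about 1.50,
    which agrees with the loop dropping from 1.5 to 1 clock per iteration.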

    Reported-by: Tom Evans
    Suggested-by: Tom Evans
    Signed-off-by: Fabio Estevam
    Signed-off-by: Russell King

    Fabio Estevam
     

21 Nov, 2013

1 commit

  • Uwe reported a build failure when targeting a NOMMU platform with my
    recent prefetch changes:

    arch/arm/lib/changebit.S: Assembler messages:
    arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
    not allowed for the current base architecture

    This is due to use of the .arch_extension mp directive immediately prior
    to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
    nothing if !CONFIG_SMP, gas will still choke on the directive.

    This patch fixes the issue by only emitting the sequence (including the
    directive) if CONFIG_SMP=y.

    Tested-by: Uwe Kleine-König
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon
     

14 Nov, 2013

1 commit

  • Pull ARM updates from Russell King:
    "Included in this series are:

    1. BE8 (modern big endian) changes for ARM from Ben Dooks
    2. big.Little support from Nicolas Pitre and Dave Martin
    3. support for LPAE systems with all system memory above 4GB
    4. Perf updates from Will Deacon
    5. Additional prefetching and other performance improvements from Will.
    6. Neon-optimised AES implementation from Ard.
    7. A number of smaller fixes scattered around the place.

    There is a rather horrid merge conflict in tools/perf - I was never
    notified of the conflict because it originally occurred between Will's
    tree and other stuff. Consequently I have a resolution which Will
    forwarded me, which I'll forward on immediately after sending this
    mail.

    The other notable thing is I'm expecting some build breakage in the
    crypto stuff on ARM only with Ard's AES patches. These were merged
    into a stable git branch which others had already pulled, so there's
    little I can do about this. The problem is caused because these
    patches have a dependency on some code in the crypto git tree - I
    tried requesting a branch I can pull to resolve these, and all I got
    each time from the crypto people was "we'll revert our patches then"
    which would only make things worse since I still don't have the
    dependent patches. I've no idea what's going on there or how to
    resolve that, and since I can't split these patches from the rest of
    this pull request, I'm rather stuck with pushing this as-is or
    reverting Ard's patches.

    Since it should "come out in the wash" I've left them in - the only
    build problems they seem to cause at the moment are with randconfigs,
    and since it's a new feature anyway. However, if by -rc1 the
    dependencies aren't in, I think it'd be best to revert Ard's patches"

    I resolved the perf conflict roughly as per the patch sent by Russell,
    but there may be some differences. Any errors are likely mine. Let's
    see how the crypto issues work out.

    * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
    ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
    ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
    ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
    ARM: 7871/1: amba: Extend number of IRQS
    ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
    ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
    ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
    ARM: 7878/1: nommu: Implement dummy early_paging_init()
    ARM: 7876/1: clear Thumb-2 IT state on exception handling
    ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
    ARM: footbridge: fix build warnings for netwinder
    ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
    ARM: fix misplaced arch_virt_to_idmap()
    ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
    ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
    ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
    ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
    ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
    ARM: 7862/1: pcpu: replace __get_cpu_var_uses
    ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
    ...

    Linus Torvalds
     

12 Nov, 2013

1 commit


29 Oct, 2013

1 commit

  • The memory pinning code in uaccess_with_memcpy.c does not check
    for HugeTLB or THP pmds, and will enter an infinite loop should
    a __copy_to_user or __clear_user occur against a huge page.

    This patch adds detection code for huge pages to pin_page_for_write.
    As this code can be executed in a fast path it refers to the actual
    pmds rather than the vma. If a HugeTLB or THP is found (they have
    the same pmd representation on ARM), the page table spinlock is
    taken to prevent modification whilst the page is pinned.

    On ARM, huge pages are only represented as pmds, thus no huge pud
    checks are performed. (For huge puds one would lock the page table
    in a similar manner as in the pmd case).

    Two helper functions are introduced; pmd_thp_or_huge will check
    whether or not a page is huge or transparent huge (which have the
    same pmd layout on ARM), and pmd_hugewillfault will detect whether
    or not a page fault will occur on write to the page.

    Running the following test (with the chunking from read_zero
    removed):
    $ dd if=/dev/zero of=/dev/null bs=10M count=1024
    Gave: 2.3 GB/s backed by normal pages,
    2.9 GB/s backed by huge pages,
    5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

    After some discussion, it was decided not to adopt the HPAGE_MASK,
    as this would have a significant detrimental effect on the overall
    system latency due to page_table_lock being held for too long.
    This could be revisited if split huge page locks are adopted.

    Signed-off-by: Steve Capper
    Reviewed-by: Nicolas Pitre
    Signed-off-by: Russell King

    Steven Capper
     

30 Sep, 2013

1 commit

  • The cost of changing a cacheline from shared to exclusive state can be
    significant, especially when this is triggered by an exclusive store,
    since it may result in having to retry the transaction.

    This patch prefixes our atomic bitops implementation with prefetchw,
    to try and grab the line in exclusive state from the start. The testop
    macro is left alone, since the barrier semantics limit the usefulness
    of prefetching data.
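    A sketch of the same idea for a non-test bitop such as set_bit():
    prefetch the target word for writing so the line is, ideally, fetched
    in exclusive state before the read-modify-write instead of being
    fetched shared and then upgraded. It uses GCC/Clang's
    __builtin_prefetch with the write hint and a relaxed C11 atomic OR,
    matching set_bit()'s lack of ordering guarantees; the function name and
    BITS_PER_LONG macro here are hypothetical, and test-style ops are
    deliberately not modelled, as the text above explains.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit `nr` in the bitmap at `addr`, prefetching the
 * containing word for writing first. */
static void sketch_set_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_ulong *word = addr + nr / BITS_PER_LONG;
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);

    __builtin_prefetch((const void *)word, 1); /* write hint */
    atomic_fetch_or_explicit(word, mask, memory_order_relaxed);
}
```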

    Acked-by: Nicolas Pitre
    Signed-off-by: Will Deacon

    Will Deacon
     

17 Sep, 2013

1 commit

  • The Shark machine sub-architecture (also known as DNARD, the
    DIGITAL Network Appliance Reference Design) lacks a maintainer
    able to apply and test patches to modernize the architecture.

    It is suspected that the current kernel, while it compiles,
    does not even boot on this machine. The listed maintainer has
    expressed that he will not be able to spend any time on the
    maintenance for the coming year.

    So let's delete it from the kernel for now. It can always be
    resurrected with git revert if maintenance is resumed.

    As the VIA82c505 PCI adapter was only used by this
    architecture, that gets deleted too.

    Cc: arm@kernel.org
    Cc: Alexander Schulz
    Signed-off-by: Linus Walleij

    Linus Walleij
     

09 Sep, 2013

1 commit

  • Commit 0195659 introduced a NEON accelerated version of the xor_blocks()
    function, but it needs the changes in this patch to allow it to be built
    as a module rather than statically into the kernel.

    This patch creates a separate module, xor-neon.ko, which exports the
    inner NEON xor_blocks() functions that the regular xor.ko depends on
    when it is built with CONFIG_KERNEL_MODE_NEON=y.

    Reported-by: Josh Boyer
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Russell King

    Ard Biesheuvel
     

23 Jul, 2013

1 commit

  • Comments from Ard Biesheuvel:

    I have included two use cases that I have been using, XOR and RAID-6
    checksumming. The former gets a 60% performance boost on the NEON, the
    latter over 400%.

    ARM: add support for kernel mode NEON

    Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the
    previous version to de-emphasize the VFP part as VFP code that needs
    software assistance is not supported currently.)

    Introduces and the Kconfig symbol KERNEL_MODE_NEON. This
    has been aligned with Catalin for arm64, so any NEON code that does
    not use assembly but intrinsics or the GCC vectorizer (such as my
    examples) can potentially be shared between arm and arm64 archs.

    ARM: move VFP init to an earlier boot stage

    This is needed so the NEON is enabled when the XOR and RAID-6 algo
    boot time benchmarks are run.

    ARM: be strict about FP exceptions in kernel mode

    This adds a check to vfp_support_entry() to flag unsupported uses of
    the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as
    a bug, this is because of their potentially intermittent nature.
    Exceptions caused by the fact that kernel_neon_begin has not been
    called are just routed through the undef handler.

    ARM: crypto: add NEON accelerated XOR implementation

    This is the xor_blocks() implementation built with -ftree-vectorize,
    60% faster than optimized ARM code. It calls in_interrupt() to check
    whether the NEON flavor can be used: this should really not be
    necessary, but due to xor_blocks()'s quite generic nature, there is no
    telling how exactly people may be using it in the real world.
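    The XOR routine relies on the compiler, rather than hand-written NEON,
    to vectorize a plain C loop. A minimal sketch of such a loop follows;
    the function name and signature are loosely modelled on the
    two-source xor_blocks() case and are assumptions, and the
    in_interrupt() check is omitted. Built with -O2 -ftree-vectorize and
    NEON enabled, GCC can turn the loop body into NEON instructions.

```c
#include <stddef.h>

/* p1[i] ^= p2[i] over `bytes` bytes. A simple counted loop like this is a
 * good vectorizer candidate; `restrict` promises that p1 and p2 do not
 * overlap. */
static void sketch_xor_2(size_t bytes, unsigned long *restrict p1,
                         const unsigned long *restrict p2)
{
    size_t words = bytes / sizeof(unsigned long);

    for (size_t i = 0; i < words; i++)
        p1[i] ^= p2[i];
}
```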

    lib/raid6: add ARM-NEON accelerated syndrome calculation

    This is a port of the RAID-6 checksumming code in altivec.uc to NEON
    intrinsics. It is about 4x faster than the sequential code.

    Russell King
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    shows the nasty type of bugs that can be created with improper use
    of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    Note that some harmless section mismatch warnings may result, since
    notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
    and are flagged as __cpuinit -- so if we remove the __cpuinit from
    the arch specific callers, we will also get section mismatch warnings.
    As an intermediate step, we intend to turn the linux/init.h cpuinit
    related content into no-ops as early as possible, since that will get
    rid of these warnings. In any case, they are temporary and harmless.

    This removes all the ARM uses of the __cpuinit macros from C code,
    and all __CPUINIT from assembly code. The two ".previous" section
    statements that were paired off against __CPUINIT
    (aka .section ".cpuinit.text") are removed as well.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: Russell King
    Cc: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

09 Jul, 2013

1 commit


03 Apr, 2013

1 commit

  • Commit 70264367a243 ("ARM: 7653/2: do not scale loops_per_jiffy when
    using a constant delay clock") fixed a problem with our timer-based
    delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly
    by the timer delay ops.

    This patch fixes the problem in a more elegant way by keeping a private
    ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy
    and therefore not subject to scaling. The loop-based delay continues to
    use loops_per_jiffy directly, as it should.
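    A sketch of the idea, assuming a constant-frequency timer and HZ=100:
    the timer path computes its own ticks_per_jiffy once at registration
    and converts microseconds to ticks from that, so cpufreq scaling of
    loops_per_jiffy cannot leak into timer-based delays. All names other
    than loops_per_jiffy and ticks_per_jiffy are hypothetical.

```c
#include <stdint.h>

#define HZ 100 /* assumption for illustration */

/* Hypothetical delay-ops state owned by the timer path. */
struct sketch_delay_ops {
    unsigned long ticks_per_jiffy;
};

static unsigned long loops_per_jiffy; /* loop path only; cpufreq scales it */

static void sketch_register_timer(struct sketch_delay_ops *ops,
                                  unsigned long freq_hz)
{
    ops->ticks_per_jiffy = freq_hz / HZ;
}

/* usecs -> ticks = usecs * ticks_per_jiffy * HZ / 1e6, with a 64-bit
 * intermediate to avoid overflow. */
static uint64_t sketch_usecs_to_ticks(const struct sketch_delay_ops *ops,
                                      unsigned long usecs)
{
    return (uint64_t)usecs * ops->ticks_per_jiffy * HZ / 1000000u;
}

/* Hypothetical cpufreq hook: rescales only the loop-based calibration. */
static void sketch_cpufreq_scale(unsigned long old_khz, unsigned long new_khz)
{
    loops_per_jiffy =
        (unsigned long)((uint64_t)loops_per_jiffy * new_khz / old_khz);
}
```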

    Acked-by: Nicolas Pitre
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon
     

12 Mar, 2013

1 commit

  • Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
    recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
    with the memset return value. However the memset itself became broken
    by that patch for misaligned pointers.

    This fixes the above by branching over the entry code from the
    misaligned fixup code to avoid reloading the original pointer.

    Also, because the function entry alignment is wrong in the Thumb mode
    compilation, that fixup code is moved to the end.

    While at it, the entry instructions are slightly reworked to help dual
    issue pipelines.

    Signed-off-by: Nicolas Pitre
    Tested-by: Alexander Holler
    Signed-off-by: Russell King

    Nicolas Pitre
     

08 Mar, 2013

1 commit

  • Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
    assumptions about the implementation of memset and similar functions.
    The current ARM optimized memset code does not return the value of
    its first argument, as is usually expected from standard implementations.

    For instance in the following function:

    void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
    {
            memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
            waiter->magic = waiter;
            INIT_LIST_HEAD(&waiter->list);
    }

    compiled as:

    800554d0 :
    800554d0: e92d4008 push {r3, lr}
    800554d4: e1a00001 mov r0, r1
    800554d8: e3a02010 mov r2, #16 ; 0x10
    800554dc: e3a01011 mov r1, #17 ; 0x11
    800554e0: eb04426e bl 80165ea0
    800554e4: e1a03000 mov r3, r0
    800554e8: e583000c str r0, [r3, #12]
    800554ec: e5830000 str r0, [r3]
    800554f0: e5830004 str r0, [r3, #4]
    800554f4: e8bd8008 pop {r3, pc}

    GCC assumes memset returns the value of pointer 'waiter' in register r0,
    causing register and memory corruption.
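    The contract GCC relies on is the standard one: memset() returns its
    first argument. A byte-at-a-time reference version in C makes the
    requirement explicit; it is only a sketch with a hypothetical name, not
    the optimized ARM routine, which meets the same contract by doing its
    work through ip and leaving the original pointer in r0.

```c
#include <stddef.h>

/* Reference memset: the store loop uses a separate cursor so the original
 * pointer can be returned unchanged, as callers (and the compiler) may
 * assume. */
static void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}
```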

    This patch fixes the return value of the assembly version of memset.
    It adds a 'mov' instruction and merges an additional load+store into
    existing load/store instructions.
    For ease of review, here is a breakdown of the patch into 4 simple steps:

    Step 1
    ======
    Perform the following substitutions:
    ip -> r8, then
    r0 -> ip,
    and insert 'mov ip, r0' as the first statement of the function.
    At this point, we have a memset() implementation returning the proper result,
    but corrupting r8 on some paths (the ones that were using ip).

    Step 2
    ======
    Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

    save r8:
    - str lr, [sp, #-4]!
    + stmfd sp!, {r8, lr}

    and restore r8 on both exit paths:
    - ldmeqfd sp!, {pc} @ Now
    Reviewed-by: Nicolas Pitre
    Signed-off-by: Dirk Behme
    Signed-off-by: Russell King

    Ivan Djelic
     

21 Feb, 2013

1 commit

  • When udelay() is implemented using an architected timer, it is wrong
    to scale loops_per_jiffy when changing the CPU clock frequency since
    the timer clock remains constant.

    The lpj should probably become an implementation detail relevant to
    the CPU loop based delay routine only and more confined to it. In the
    meantime this is the minimal fix needed to have expected delays with
    the timer based implementation when cpufreq is also in use.

    Reported-by: Viresh Kumar
    Signed-off-by: Nicolas Pitre
    Tested-by: Viresh Kumar
    Acked-by: Liviu Dudau
    Cc: stable@vger.kernel.org
    Signed-off-by: Russell King

    Nicolas Pitre
     

10 Oct, 2012

1 commit

  • read_current_timer is used by get_cycles since "ARM: 7538/1: delay:
    add registration mechanism for delay timer sources", and get_cycles
    can be used by device drivers in loadable modules, so it has to
    be exported.

    Without this patch, building imote2_defconfig fails with

    ERROR: "read_current_timer" [crypto/tcrypt.ko] undefined!

    Signed-off-by: Arnd Bergmann
    Cc: Stephen Boyd
    Cc: Jonathan Austin
    Cc: Will Deacon
    Cc: Russell King

    Arnd Bergmann
     

05 Oct, 2012

1 commit


27 Sep, 2012

1 commit

  • The current timer-based delay loop relies on the architected timer to
    initiate the switch away from the polling-based implementation. This is
    unfortunate for platforms without the architected timers but with a
    suitable delay source (that is, constant frequency, always powered-up
    and ticking as long as the CPUs are online).

    This patch introduces a registration mechanism for the delay timer
    (which provides an unconditional read_current_timer implementation) and
    updates the architected timer code to use the new interface.
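    The registration mechanism is essentially a function-pointer
    indirection with a fallback. The sketch below mirrors that shape; the
    struct layout (a counter read hook plus a frequency), the helper names
    and the demo counter are assumptions for illustration, not the kernel
    interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical delay-timer descriptor: counter read hook and its
 * constant frequency. */
struct sketch_delay_timer {
    unsigned long (*read_current_timer)(void);
    unsigned long freq; /* Hz */
};

static const struct sketch_delay_timer *registered_timer;

/* Any platform with a suitable constant-rate counter can register it;
 * the architected timer becomes just one such client. */
static void sketch_register_delay_timer(const struct sketch_delay_timer *t)
{
    registered_timer = t;
}

/* Returns true and stores the counter value if a timer is registered;
 * otherwise the caller must fall back to the calibrated loop. */
static bool sketch_read_current_timer(unsigned long *val)
{
    if (registered_timer == NULL)
        return false;
    *val = registered_timer->read_current_timer();
    return true;
}

/* Demo counter standing in for a real hardware counter read. */
static unsigned long demo_counter_read(void)
{
    return 12345;
}
```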

    Reviewed-by: Stephen Boyd
    Signed-off-by: Jonathan Austin
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Jonathan Austin
     

10 Sep, 2012

2 commits

  • The delay functions may be called by some platforms after switching to
    the timer-based delay loop but before calibration. In this case, the
    initial loops_per_jiffy may not be suitable for the timer (although a
    compromise may be achievable) and delay times may be considered too
    inaccurate.

    This patch updates loops_per_jiffy when switching to the timer-based
    delay loop so that delays are consistent prior to calibration.

    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon
     
  • The {get,put}_user macros don't perform range checking on the provided
    __user address when !CPU_HAS_DOMAINS.

    This patch reworks the out-of-line assembly accessors to check the user
    address against a specified limit, returning -EFAULT if it is out of
    range.
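    The check amounts to verifying that the whole access [addr, addr + size)
    lies below the limit without letting addr + size wrap around. A C
    sketch follows; the helper name is hypothetical, the limit is assumed
    here to be the first invalid address, and the real check lives in the
    assembly accessors.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Return 0 if `size` bytes at `addr` lie entirely below `limit`, else
 * -EFAULT. Comparing size against limit - addr avoids computing
 * addr + size, which could overflow. */
static int sketch_check_uaccess(uintptr_t addr, size_t size, uintptr_t limit)
{
    if (addr > limit || size > limit - addr)
        return -EFAULT;
    return 0;
}
```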

    [will: changed get_user register allocation to match put_user]
    [rmk: fixed building on older ARM architectures]

    Reported-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Cc: stable@vger.kernel.org
    Signed-off-by: Russell King

    Russell King
     

13 Aug, 2012

1 commit

  • This partially reverts 357c9c1f07d4546bc3fbc0fd1044d96b114d14ed
    (ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs).

    Although we only support StrongARM on the RiscPC, we need to keep the
    ARMv3 user access code for this platform because the bus does not
    understand half-word load/stores.

    Reported-by: Arnd Bergmann
    Signed-off-by: Russell King

    Russell King
     

31 Jul, 2012

1 commit


28 Jul, 2012

1 commit


10 Jul, 2012

1 commit

  • This patch allows a timer-based delay implementation to be selected by
    switching the delay routines over to use get_cycles, which is
    implemented in terms of read_current_timer. This further allows us to
    skip the loop calibration and have a consistent delay function in the
    face of core frequency scaling.

    To avoid the pain of dealing with memory-mapped counters, this
    implementation uses the co-processor interface to the architected timers
    when they are available. The previous loop-based implementation is
    kept around for CPUs without the architected timers and we retain both
    the maximum delay (2ms) and the corresponding conversion factors for
    determining the number of loops required for a given interval. Since the
    indirection of the timer routines will only work when called from C,
    the sa1100 sleep routines are modified to branch to the loop-based delay
    functions directly.
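    A sketch of a get_cycles()-style busy-wait: convert the requested
    interval to counter ticks once, then spin until the free-running
    counter has advanced that far, using unsigned subtraction so a counter
    wrap is handled. The counter is simulated so the code runs anywhere;
    the type, function names and simulated counter are assumptions, not
    the kernel's implementation.

```c
#include <stdint.h>

typedef uint32_t sketch_cycles_t;

/* Simulated free-running counter that advances one tick per read,
 * standing in for read_current_timer(). */
static sketch_cycles_t sim_counter;

static sketch_cycles_t sketch_get_cycles(void)
{
    return sim_counter++;
}

/* Busy-wait for `cycles` ticks. (now - start) is unsigned, so it stays
 * correct when the counter wraps. */
static void sketch_delay_cycles(sketch_cycles_t cycles)
{
    sketch_cycles_t start = sketch_get_cycles();

    while ((sketch_cycles_t)(sketch_get_cycles() - start) < cycles)
        ; /* spin */
}

/* usecs -> ticks for a constant-frequency counter; no lpj involved. */
static sketch_cycles_t sketch_usecs_to_cycles(unsigned long usecs,
                                              unsigned long freq_hz)
{
    return (sketch_cycles_t)((uint64_t)usecs * freq_hz / 1000000u);
}
```

    For example, the 2ms maximum delay on a 24MHz counter corresponds to
    48000 ticks.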

    Tested-by: Shinya Kuribayashi
    Reviewed-by: Stephen Boyd
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon