11 Aug, 2015

4 commits

  • commit 80f420842ff42ad61f84584716d74ef635f13892 upstream.

    ARCompact/ARCv2 ISA provides that any instruction which deals with a
    bitpos/count operand (ASL, LSL, BSET, BCLR, BMSK ...) will only consider
    the lower 5 bits, i.e. auto-clamp the pos to 0-31.

    ARC Linux bitops exploited this fact by NOT explicitly masking out upper
    bits for @nr operand in general, saving a bunch of AND/BMSK instructions
    in generated code around bitops.

    While this micro-optimization has worked well over the years, it is NOT
    safe, as shifting a number by a value greater than or equal to its native
    width is "undefined" per the C spec.

    So as it turns out, EZChip ran into this eventually, in their massive
    multi-core SMP build with 64 cpus. There was a test_bit() inside a loop
    from 63 to 0 and gcc was weirdly optimizing away the first iteration
    (so it was really adhering to the standard by implementing undefined
    behaviour vs. removing all the iterations which were phony, i.e.
    (1 << [63..32])).

    | for i = 63 to 0
    | X = ( 1 << i )
    | if X == 0
    | continue

    So fix the code to do the explicit masking at the expense of generating
    additional instructions. Fortunately, this can be mitigated to a large
    extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold
    masking into shift operation itself. It is currently not enabled in ARC
    gcc backend, but could be done after a bit of testing.
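
The explicit masking described above can be sketched in portable C as follows.
This is only an illustration, not the kernel's actual test_bit(): the name
test_bit_masked() and the 32-bit word layout are assumptions made for the
example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's test_bit()): tests bit nr in an
 * array of 32-bit words. The shift count is masked to 0-31 explicitly,
 * so the C code no longer relies on the hardware clamping the operand.
 */
static int test_bit_masked(unsigned int nr, const uint32_t *addr)
{
    const uint32_t *word = addr + (nr >> 5);     /* select the 32-bit word */
    uint32_t mask = UINT32_C(1) << (nr & 0x1f);  /* shift count is always < 32 */

    return (*word & mask) != 0;
}
```

With the mask in place a loop counting from 63 down to 0 is well defined for
every iteration, at the cost of one extra AND per call unless the compiler can
fold it into the shift.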

    Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")

    Reported-by: Noam Camus
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     
  • commit 04e2eee4b02edcafce96c9c37b31b1a3318291a4 upstream.

    No semantic changes!

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     
  • commit f51e2f1911122879eefefa4c592dea8bf794b39c upstream.

    Currently instruction_pointer() returns pt_regs->ret, so the return value
    is of type "long", which implicitly means "signed long".

    That's perfectly fine when dealing with 32-bit values, but if the return
    value of instruction_pointer() gets assigned to a 64-bit variable, sign
    extension may happen.

    And in at least one real use case this already happens.
    In perf_prepare_sample() the return value of perf_instruction_pointer()
    (which is an alias for instruction_pointer() in case of ARC) is assigned
    to (struct perf_sample_data)->ip (whose type is "u64").

    If the instruction pointer points into a user-space application, which
    in case of ARC lies below 0x8000_0000, "ip" gets set properly with the
    upper 32 bits cleared. But if the instruction pointer points into kernel
    address space, which starts at 0x8000_0000, then "ip" is set with the
    upper 32 bits all set (eight leading "f"-s). I.e. if
    instruction_pointer() returns 0x8100_0000, "ip" will be assigned
    0xffff_ffff__8100_0000, which is obviously wrong.

    In particular that issue broke the output of perf, because perf was unable
    to associate addresses like 0xffff_ffff__8100_0000 with anything from
    /proc/kallsyms.

    That's what we used to see:
    ----------->8----------
    6.27% ls [unknown] [k] 0xffffffff8046c5cc
    2.96% ls libuClibc-0.9.34-git.so [.] memcpy
    2.25% ls libuClibc-0.9.34-git.so [.] memset
    1.66% ls [unknown] [k] 0xffffffff80666536
    1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6
    1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472
    ----------->8----------

    With that change perf output looks much better now:
    ----------->8----------
    8.21% ls [kernel.kallsyms] [k] memset
    3.52% ls libuClibc-0.9.34-git.so [.] memcpy
    2.11% ls libuClibc-0.9.34-git.so [.] malloc
    1.88% ls libuClibc-0.9.34-git.so [.] memset
    1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
    1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu
    ----------->8----------
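
The sign extension described above can be reproduced on any host with a
minimal sketch. The typedefs arc_long and arc_ulong are hypothetical stand-ins
for a 32-bit "long" and "unsigned long", and both helper names are made up for
the example; this is not the kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "long" and "unsigned long" on a 32-bit ARC. */
typedef int32_t  arc_long;
typedef uint32_t arc_ulong;

/*
 * Signed variant: widening a negative 32-bit value to 64 bits copies the
 * sign bit into the upper half. (The uint32_t -> int32_t conversion is
 * implementation-defined; mainstream compilers wrap in two's complement.)
 */
static uint64_t ip_via_signed(uint32_t ret)
{
    arc_long ip = (arc_long)ret;
    return (uint64_t)ip;
}

/* Unsigned variant: widening zero-extends, so the upper half stays clear. */
static uint64_t ip_via_unsigned(uint32_t ret)
{
    arc_ulong ip = ret;
    return (uint64_t)ip;
}
```

For a kernel address such as 0x8100_0000 the signed variant yields
0xffff_ffff__8100_0000 while the unsigned variant keeps the address intact;
user-space addresses below 0x8000_0000 are unaffected either way.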

    Signed-off-by: Alexey Brodkin
    Cc: arc-linux-dev@synopsys.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Alexey Brodkin
     
  • commit 97709069214eb75312c14946803b9da4d3814203 upstream.

    ARC kernels have historically been built with -O3, despite the top level
    Makefile defaulting to -O2. This worked only because the arch Makefile
    happened to be included AFTER the top level Makefile assigned -O2.

    An upstream fix to the top level Makefile, a1c48bb160f ("Makefile: Fix
    unrecognized cross-compiler command line options"), changed the ordering,
    making ARC -O3 defunct.

    Fix that by NOT relying on any ordering whatsoever and using the proper
    arch override facility now present in kbuild (ARCH_*FLAGS).

    Depends-on: ("kbuild: Allow arch Makefiles to override {cpp,ld,c}flags")
    Suggested-by: Michal Marek
    Cc: Geert Uytterhoeven
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     

22 Jul, 2015

3 commits

  • commit 7002f77541f877a5590615ceb3da32b114f14b62 upstream.

    The static arc_pmu in arch/arc/kernel/perf_event.c is never initialized
    because it is shadowed by a local variable of the same name in
    arc_pmu_device_probe().
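
The shadowing pattern can be illustrated with a reduced sketch. Apart from the
name arc_pmu, everything here (struct pmu_state, the probe_* functions, the
storage object) is hypothetical and only mirrors the shape of the bug.

```c
#include <stddef.h>

struct pmu_state { int n_counters; };   /* hypothetical, heavily reduced */

static struct pmu_state storage;
static struct pmu_state *arc_pmu;       /* file-scope pointer, starts NULL */

/* Buggy shape: the local declaration hides the file-scope arc_pmu. */
static void probe_shadowed(void)
{
    struct pmu_state *arc_pmu = &storage;   /* shadows the static above */

    arc_pmu->n_counters = 4;                /* file-scope arc_pmu stays NULL */
}

/* Fixed shape: assign to the file-scope variable instead of redeclaring. */
static void probe_fixed(void)
{
    arc_pmu = &storage;
    arc_pmu->n_counters = 4;
}

static int global_is_initialized(void)
{
    return arc_pmu != NULL;
}
```

Building with gcc's -Wshadow flags the redeclaration in probe_shadowed().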

    Signed-off-by: Max Filippov
    Fixes: 03c94fcf954d "ARC: perf: make @arc_pmu static global"
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Max Filippov
     
  • commit d57f727264f1425a94689bafc7e99e502cb135b5 upstream.

    When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
    away some of the desired LDs.

    | do {
    | new = old = *ipi_data_ptr;
    | new |= 1U << msg;
    | } while (cmpxchg(ipi_data_ptr, old, new) != old);

    was compiled to the code below:

    | 8015cef8: ld r2,[r4,0]
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     
  • commit 2576c28e3f623ed401db7e6197241865328620ef upstream.

    - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers.
      Since ARCv2 only provides load/load, store/store and all/all, we need
      the full barrier.

    - LLOCK/SCOND based atomics, bitops and cmpxchg which return modified
      values were lacking the explicit smp barriers.

    - Non LLOCK/SCOND variants don't need the explicit barriers since that
      is implicitly provided by the spin locks used to implement the
      critical section (the spin lock barriers in turn are also fixed in
      this commit, as explained above).
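
The ACQUIRE/RELEASE pairing that a spin lock needs can be sketched with C11
atomics. This is not the ARC implementation (which, per the text above, has to
fall back to full barriers); the sketch_* names are hypothetical and only show
where each ordering belongs.

```c
#include <stdatomic.h>

typedef struct { atomic_flag locked; } sketch_spinlock_t;
#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* ACQUIRE on lock: accesses in the critical section cannot move above it. */
static void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin until the previous owner clears the flag */
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* RELEASE on unlock: accesses in the critical section cannot move below it. */
static void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

On an architecture without dedicated acquire/release barriers, both orderings
would have to be implemented with the stronger all/all barrier.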

    Cc: Paul E. McKenney
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Vineet Gupta
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     

11 May, 2015

2 commits


10 May, 2015

1 commit


24 Apr, 2015

1 commit

  • Pull ARC updates from Vineet Gupta:

    - perf fixes/improvements

    - misc cleanups

    * tag 'arc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: perf: don't add code for impossible case
    ARC: perf: Rename DT binding to not confuse with power mgmt
    ARC: perf: add user space attribution in callchains
    ARC: perf: Add kernel callchain support
    ARC: perf: support cache hit/miss ratio
    ARC: perf: Add some comments/debug stuff
    ARC: perf: make @arc_pmu static global
    ARC: mem init spring cleaning - No functional changes
    ARC: Fix RTT boot printing
    ARC: fold __builtin_constant_p() into test_bit()
    ARC: rename unhandled exception handler
    ARC: cosmetic: Remove unused ECR bitfield masks
    ARC: Fix WRITE_BCR
    ARC: [nsimosci] Update defconfig
    arc: copy_thread(): rename 'arg' argument to 'kthread_arg'

    Linus Torvalds
     

20 Apr, 2015

7 commits


17 Apr, 2015

1 commit


16 Apr, 2015

1 commit

  • Pull exec domain removal from Richard Weinberger:
    "This series removes execution domain support from Linux.

    The idea behind exec domains was to support different ABIs. The
    feature was never complete nor stable. Let's rip it out and make the
    kernel signal handling code less complicated"

    * 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
    arm64: Removed unused variable
    sparc: Fix execution domain removal
    Remove rest of exec domains.
    arch: Remove exec_domain from remaining archs
    arc: Remove signal translation and exec_domain
    xtensa: Remove signal translation and exec_domain
    xtensa: Autogenerate offsets in struct thread_info
    x86: Remove signal translation and exec_domain
    unicore32: Remove signal translation and exec_domain
    um: Remove signal translation and exec_domain
    tile: Remove signal translation and exec_domain
    sparc: Remove signal translation and exec_domain
    sh: Remove signal translation and exec_domain
    s390: Remove signal translation and exec_domain
    mn10300: Remove signal translation and exec_domain
    microblaze: Remove signal translation and exec_domain
    m68k: Remove signal translation and exec_domain
    m32r: Remove signal translation and exec_domain
    m32r: Autogenerate offsets in struct thread_info
    frv: Remove signal translation and exec_domain
    ...

    Linus Torvalds
     

15 Apr, 2015

2 commits

  • Pull vfs update from Al Viro:
    "Part one:

    - struct filename-related cleanups

    - saner iov_iter_init() replacements (and switching the syscalls to
    use of those)

    - ntfs switch to ->write_iter() (Anton)

    - aio cleanups and splitting iocb into common and async parts
    (Christoph)

    - assorted fixes (me, bfields, Andrew Elble)

    There's a lot more, including the completion of switchover to
    ->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
    race fixes, etc, but that goes after #for-davem merge. David has
    pulled it, and once it's in I'll send the next vfs pull request"

    * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
    sg_start_req(): use import_iovec()
    sg_start_req(): make sure that there's not too many elements in iovec
    blk_rq_map_user(): use import_single_range()
    sg_io(): use import_iovec()
    process_vm_access: switch to {compat_,}import_iovec()
    switch keyctl_instantiate_key_common() to iov_iter
    switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
    aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
    vmsplice_to_user(): switch to import_iovec()
    kill aio_setup_single_vector()
    aio: simplify arguments of aio_setup_..._rw()
    aio: lift iov_iter_init() into aio_setup_..._rw()
    lift iov_iter into {compat_,}do_readv_writev()
    NFS: fix BUG() crash in notify_change() with patch to chown_common()
    dcache: return -ESTALE not -EBUSY on distributed fs race
    NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
    VFS: Add iov_iter_fault_in_multipages_readable()
    drop bogus check in file_open_root()
    switch security_inode_getattr() to struct path *
    constify tomoyo_realpath_from_path()
    ...

    Linus Torvalds
     
  • Pull trivial tree from Jiri Kosina:
    "Usual trivial tree updates. Nothing outstanding -- mostly printk()
    and comment fixes and unused identifier removals"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    goldfish: goldfish_tty_probe() is not using 'i' any more
    powerpc: Fix comment in smu.h
    qla2xxx: Fix printks in ql_log message
    lib: correct link to the original source for div64_u64
    si2168, tda10071, m88ds3103: Fix firmware wording
    usb: storage: Fix printk in isd200_log_config()
    qla2xxx: Fix printk in qla25xx_setup_mode
    init/main: fix reset_device comment
    ipwireless: missing assignment
    goldfish: remove unreachable line of code
    coredump: Fix do_coredump() comment
    stacktrace.h: remove duplicate declaration task_struct
    smpboot.h: Remove unused function prototype
    treewide: Fix typo in printk messages
    treewide: Fix typo in printk messages
    mod_devicetable: fix comment for match_flags

    Linus Torvalds
     

13 Apr, 2015

8 commits


12 Apr, 2015

1 commit


31 Mar, 2015

1 commit


26 Mar, 2015

2 commits

  • A malicious signal handler / restorer can DoS the system by fudging the
    user regs saved on stack, causing weird things such as sigreturn returning
    to a user mode PC while the cpu state is still kernel mode.

    Ensure that in the sigreturn path status32 always has the U bit set; any
    other bogosity (garbage PC etc.) will be taken care of by the normal user
    mode exception mechanisms.
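
A minimal sketch of the idea, assuming a reduced register struct: after the
saved registers are copied back from the untrusted user stack, the user-mode
bit is forced on. The struct, the function name and the mask value (bit 7)
are assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Assumed bit position for illustration only. */
#define SKETCH_STATUS_U_MASK (UINT32_C(1) << 7)

struct sketch_regs {
    uint32_t ret;        /* PC to return to */
    uint32_t status32;   /* processor status, includes the user-mode bit */
};

/*
 * Restore registers from a copy the signal handler may have tampered with,
 * then force user mode so sigreturn can never resume in kernel mode.
 */
static void sketch_restore_user_regs(struct sketch_regs *regs,
                                     const struct sketch_regs *from_user)
{
    *regs = *from_user;                       /* untrusted values */
    regs->status32 |= SKETCH_STATUS_U_MASK;   /* always return to user mode */
}
```

A bogus PC is deliberately left alone: it simply faults in user mode and is
handled by the usual exception path.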

    Reproducer signal handler:

    void handle_sig(int signo, siginfo_t *info, void *context)
    {
        ucontext_t *uc = context;
        struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

        regs->scratch.status32 = 0;
    }

    Before the fix, the kernel would go off into the weeds like below:

    --------->8-----------
    [ARCLinux]$ ./signal-test
    Path: /signal-test
    CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
    task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000

    [ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
    [EFA ]: 0x00000010
    [BLINK ]: 0x2007c1ee
    [ERET ]: 0x10698
    [STAT32]: 0x00000000 :
    --------->8-----------

    Reported-by: Alexey Brodkin
    Cc:
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     
  • The regfile provided to SA_SIGINFO signal handler as ucontext was off by
    one due to pt_regs gutter cleanups in 2013.

    Before handling signal, user pt_regs are copied onto user_regs_struct and copied
    back later. Both structs are binary compatible. This was all fine until
    commit 2fa919045b72 (ARC: pt_regs update #2) which removed the empty stack slot
    at top of pt_regs (corresponding to first pad) and made the corresponding
    fixup in struct user_regs_struct (the pad in there was moved out of
    @scratch - not removed altogether as it is part of ptrace ABI)

    struct user_regs_struct {
    +       long pad;
            struct {
    -               long pad;
                    long bta, lp_start, lp_end,....
            } scratch;
            ...
    }

    This meant that user_regs_struct was now off by 1 reg w.r.t. pt_regs, so
    the signal code needs to use user_regs_struct.scratch as the counterpart
    of pt_regs, which is what this commit does.

    This problem was hidden for 2 years, because both save/restore, despite
    using wrong location, were using the same location. Only an interim
    inspection (reproducer below) exposed the issue.
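
The off-by-one can be shown with two reduced structs. The sketch_* types and
the save_* helpers are hypothetical and keep only three registers; they mirror
the layout change described above rather than the kernel's real definitions.

```c
#include <stddef.h>
#include <string.h>

/* Reduced pt_regs: the leading pad slot has been removed. */
struct sketch_pt_regs {
    long bta, lp_start, lp_end;
};

/* Reduced user_regs_struct: the pad moved out of scratch but still exists. */
struct sketch_user_regs {
    long pad;                             /* kept for the ptrace ABI */
    struct {
        long bta, lp_start, lp_end;
    } scratch;
};

/* Old behaviour: copy onto the start of the struct, shifting every reg by one. */
static void save_wrong(struct sketch_user_regs *u, const struct sketch_pt_regs *r)
{
    memcpy(u, r, sizeof(*r));
}

/* Fixed behaviour: copy onto scratch, which has the same layout as pt_regs. */
static void save_fixed(struct sketch_user_regs *u, const struct sketch_pt_regs *r)
{
    memcpy(&u->scratch, r, sizeof(*r));
}
```

A restore that uses the same wrong offset undoes the shift again, which is why
only code that inspects the registers in between notices the problem.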

    #include <signal.h>
    #include <stdio.h>
    #include <ucontext.h>

    void handle_segv(int signo, siginfo_t *info, void *context)
    {
        ucontext_t *uc = context;
        struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

        /* r8 and r9 should read back as 8 and 9 if the layout is right */
        printf("regs %x %x\n", regs->scratch.r8, regs->scratch.r9);
    }

    int main()
    {
        struct sigaction sa;

        sa.sa_sigaction = handle_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* load known values into r7-r10 before faulting */
        asm volatile(
        "mov r7, 7 \n"
        "mov r8, 8 \n"
        "mov r9, 9 \n"
        "mov r10, 10 \n"
        :::"r7","r8","r9","r10");

        /* invalid write to trigger SIGSEGV */
        *((unsigned int*)0x10) = 0;
    }

    Fixes: 2fa919045b72ec892e "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs"
    CC:
    Signed-off-by: Vineet Gupta

    Vineet Gupta
     

07 Mar, 2015

1 commit


27 Feb, 2015

4 commits


19 Feb, 2015

1 commit

  • Pull dmaengine updates from Vinod Koul:
    "This update brings:

    - the big cleanup by Maxime for device control and slave
    capabilities. This makes the API much cleaner.

    - new IMG MDC driver by Andrew

    - new Renesas R-Car Gen2 DMA Controller driver by Laurent along with
    a bunch of fixes on rcar drivers

    - odd fixes and updates spread over drivers"

    * 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (130 commits)
    dmaengine: pl330: add DMA_PAUSE feature
    dmaengine: pl330: improve pl330_tx_status() function
    dmaengine: rcar-dmac: Disable channel 0 when using IOMMU
    dmaengine: rcar-dmac: Work around descriptor mode IOMMU errata
    dmaengine: rcar-dmac: Allocate hardware descriptors with DMAC device
    dmaengine: rcar-dmac: Fix oops due to unintialized list in error ISR
    dmaengine: rcar-dmac: Fix spinlock issues in interrupt
    dmaenegine: edma: fix sparse warnings
    dmaengine: rcar-dmac: Fix uninitialized variable usage
    dmaengine: shdmac: extend PM methods
    dmaengine: shdmac: use SET_RUNTIME_PM_OPS()
    dmaengine: pl330: fix bug that cause start the same descs in cyclic
    dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers
    dmaengine: at_xdmac: simplify channel configuration stuff
    dmaengine: at_xdmac: introduce save_cc field
    dmaengine: at_xdmac: wait for in-progress transaction to complete after pausing a channel
    ioat: fail self-test if wait_for_completion times out
    dmaengine: dw: define DW_DMA_MAX_NR_MASTERS
    dmaengine: dw: amend description of dma_dev field
    dmatest: move src_off, dst_off, len inside loop
    ...

    Linus Torvalds