06 May, 2015

11 commits

  • Hwpoison injection via debugfs:hwpoison/corrupt-pfn takes a refcount of
    the target page. But current code doesn't release it if the target page
    is not supposed to be injected, which results in memory leak. This patch
    simply adds the refcount releasing code.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Dean Nelson
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Hidetoshi Seto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
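
The leak described above follows a common pattern: a reference is taken up front and an early-return path forgets to drop it. A minimal userspace sketch of the fixed shape, assuming hypothetical names (`toy_page`, `toy_inject`, `eligible`) that are not taken from the kernel source:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a reference count. */
struct toy_page {
    int refcount;
    bool eligible;   /* does the injection filter accept this page? */
    bool poisoned;
};

static void toy_get_page(struct toy_page *p)
{
    p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* Returns 0 whether the page was injected or silently skipped. */
static int toy_inject(struct toy_page *p)
{
    toy_get_page(p);            /* reference taken before filtering */
    if (!p->eligible) {
        toy_put_page(p);        /* the release the leaky version omitted */
        return 0;
    }
    p->poisoned = true;         /* in this toy, the handler keeps the reference */
    return 0;
}
```

In this sketch a skipped page ends with the same refcount it started with, while an injected page keeps the extra reference for the error handler to consume.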
     
  • If multiple soft offline events hit one free page/hugepage concurrently,
    soft_offline_page() can handle the free page/hugepage multiple times,
    which increments the num_poisoned_pages counter more than once. This
    patch fixes this wrong counting by checking TestSetPageHWPoison for normal
    pages and by checking the return value of dequeue_hwpoisoned_huge_page()
    for hugepages.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Dean Nelson
    Cc: Andi Kleen
    Cc: [3.14+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
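
The counting fix relies on an atomic test-and-set: only the caller that actually flips the flag may bump the counter. A rough userspace sketch using C11 atomics, where `toy_page`, `toy_num_poisoned` and `toy_soft_offline_free_page` are illustrative names and not kernel symbols:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy counterpart of a global poisoned-page counter. */
static atomic_long toy_num_poisoned;

struct toy_page {
    atomic_flag hwpoison;   /* initialise with ATOMIC_FLAG_INIT */
};

/* Returns true only for the one caller that set the flag. */
static bool toy_soft_offline_free_page(struct toy_page *p)
{
    assert(p);
    /* atomic_flag_test_and_set() returns the previous value. */
    if (atomic_flag_test_and_set(&p->hwpoison))
        return false;                       /* someone else already handled it */
    atomic_fetch_add(&toy_num_poisoned, 1); /* counted exactly once */
    return true;
}
```

Calling the function twice on the same page leaves the counter at 1, which is the behaviour the patch aims for with concurrent events.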
     
  • This is a basic driver for the ultra-low-power Abracon AB x80x series of RTC
    chips. In particular, it supports the supersets AB0805 and AB1805.
    It allows reading and writing the time, and enables the supercapacitor/
    battery charger.

    [arnd@arndb.de: abx805 depends on i2c]
    [alexandre.belloni@free-electrons.com: rename buffer from date to buf in abx80x_rtc_read_time()]
    Signed-off-by: Philippe De Muyter
    Cc: Alessandro Zummo
    Signed-off-by: Alexandre Belloni
    Signed-off-by: Arnd Bergmann
    Cc: Paul Bolle
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philippe De Muyter
     
  • Document the bindings for abracon,abx80x and related compatibles.

    Signed-off-by: Alexandre Belloni
    Cc: Philippe De Muyter
    Cc: Alessandro Zummo
    Cc: Arnd Bergmann
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Belloni
     
  • The documentation shows a need for gcc > 4.9.2, but it's really >=. The
    Kconfig entries don't show the required versions, so add them. Correct a
    latter/later typo too. Also mention that gcc 5 is required to catch out of
    bounds accesses to global and stack variables.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Currently memory_failure() calls shake_page() to sweep pages out from
    pcplists only when the victim page is 4kB LRU page or thp head page.
    But we should do this for a thp tail page too.

    Consider that a memory error hits a thp tail page whose head page is on
    a pcplist when memory_failure() runs. Then, the current kernel skips
    the shake_page() part, so hwpoison_user_mappings() returns without calling
    split_huge_page() or try_to_unmap() because PageLRU of the thp head is
    still cleared due to the skip of shake_page().

    As a result, me_huge_page() runs for the thp, which is broken behavior.

    One effect is a leak of the thp. And another is to fail to isolate the
    memory error, so later access to the error address causes another MCE,
    which kills the processes which used the thp.

    This patch fixes this problem by calling shake_page() for thp tail case.

    Fixes: 385de35722c9 ("thp: allow a hwpoisoned head page to be put back to LRU")
    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Andi Kleen
    Acked-by: Dean Nelson
    Cc: Andrea Arcangeli
    Cc: Hidetoshi Seto
    Cc: Jin Dongming
    Cc: [3.4+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • The file lib/find_last_bit.c was no longer used and supposed to be
    deleted by commit 8f6f19dd51 ("lib: move find_last_bit to
    lib/find_next_bit.c") but that delete didn't happen. This gets rid of
    it.

    Signed-off-by: Yury Norov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • Add myself (Jacek Anaszewski) as a co-maintainer for the LED subsystem.

    Signed-off-by: Jacek Anaszewski
    Acked-by: Bryan Wu
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Sergey Senozhatsky has contributed/reviewed to zram for a long time. He
    is really helpful for maintaining zram so I want for him to continue
    helping me as Designated Reviewer unless he hates it.

    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Revert commit c72c6160d967ed26a0b136dbab337f821d233509

    It was intended to be a cosmetic change without any functional change
    and was part of a bigger change:

    http://lkml.iu.edu/hypermail/linux/kernel/1503.1/01818.html

    Sergey Senozhatsky
    Cc: Linus Torvalds
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a build problem with bcm63xx and yet another fix to the
    memzero_explicit function to ensure that the memset is not elided"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    hwrng: bcm63xx - Fix driver compilation
    lib: make memzero_explicit more robust against dead store elimination

    Linus Torvalds
     

05 May, 2015

1 commit

  • Pull media fixes from Mauro Carvalho Chehab:
    "Three driver fixes:

    - fix for omap4, fixing a regression due to a subsystem API that got
    removed for 4.1 (commit efde234674d9);

    - fix for one of the formats supported by the Marvell ccic driver;

    - fix rcar_vin driver that, when stopping abnormally, the driver
    can't return from wait_for_completion"

    * tag 'media/v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
    [media] v4l: omap4iss: Replace outdated OMAP4 control pad API with syscon
    [media] media: soc_camera: rcar_vin: Fix wait_for_completion
    [media] marvell-ccic: fix Y'CbCr ordering

    Linus Torvalds
     

04 May, 2015

10 commits

  • - s/clk_didsable_unprepare/clk_disable_unprepare
    - s/prov/priv
    - s/error/ret (bcm63xx_rng_probe)

    Fixes: 6229c16060fe ("hwrng: bcm63xx - make use of devm_hwrng_register")
    Signed-off-by: Álvaro Fernández Rojas
    Acked-by: Florian Fainelli
    Signed-off-by: Herbert Xu

    Álvaro Fernández Rojas
     
  • In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
    of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
    case LTO would decide to inline memzero_explicit() and eventually
    find out it could be eliminated as dead store.

    While using barrier() works well for the case of gcc, recent efforts
    from LLVMLinux people suggest to use llvm as an alternative to gcc,
    and there, Stephan found in a simple stand-alone user space example
    that llvm could nevertheless optimize and thus eliminate the memset().
    A similar issue has been observed in the referenced llvm bug report,
    which is regarded as not-a-bug.

    Based on some experiments, icc is a bit special on its own, while it
    doesn't seem to eliminate the memset(), it could do so with an own
    implementation, and then result in similar findings as with llvm.

    The fix in this patch now works for all three compilers (also tested
    with more aggressive optimization levels). Arguably, in the current
    kernel tree it's more of a theoretical issue, but imho, it's better
    to be pedantic about it.

    It's clearly visible with gcc/llvm though, with the below code: if we
    would have used barrier() only here, llvm would have omitted clearing,
    not so with barrier_data() variant:

    static inline void memzero_explicit(void *s, size_t count)
    {
        memset(s, 0, count);
        barrier_data(s);
    }

    int main(void)
    {
        char buff[20];
        memzero_explicit(buff, sizeof(buff));
        return 0;
    }

    $ gcc -O2 test.c
    $ gdb a.out
    (gdb) disassemble main
    Dump of assembler code for function main:
    0x0000000000400400 <+0>: lea -0x28(%rsp),%rax
    0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp)
    0x000000000040040e <+14>: movq $0x0,-0x20(%rsp)
    0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp)
    0x000000000040041f <+31>: xor %eax,%eax
    0x0000000000400421 <+33>: retq
    End of assembler dump.

    $ clang -O2 test.c
    $ gdb a.out
    (gdb) disassemble main
    Dump of assembler code for function main:
    0x00000000004004f0 <+0>: xorps %xmm0,%xmm0
    0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp)
    0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp)
    0x0000000000400500 <+16>: lea -0x18(%rsp),%rax
    0x0000000000400505 <+21>: xor %eax,%eax
    0x0000000000400507 <+23>: retq
    End of assembler dump.

    As gcc, clang, but also icc defines __GNUC__, it's sufficient to define
    this in compiler-gcc.h only to be picked up. For a fallback or otherwise
    unsupported compiler, we define it as a barrier. Similarly, for ecc which
    does not support gcc inline asm.

    Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
    Reported-by: Stephan Mueller
    Tested-by: Stephan Mueller
    Signed-off-by: Daniel Borkmann
    Cc: Theodore Ts'o
    Cc: Stephan Mueller
    Cc: Hannes Frederic Sowa
    Cc: mancha security
    Cc: Mark Charlebois
    Cc: Behan Webster
    Signed-off-by: Herbert Xu

    Daniel Borkmann
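
The test program in the message calls barrier_data() without showing what it expands to. A minimal userspace sketch, assuming a compiler that accepts GNU C extended asm; the `toy_` names are illustrative, and the macro body is an empty asm statement that takes the pointer as an input and clobbers memory, so the compiler has to assume the zeroed buffer is still read:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Empty asm that "uses" ptr and clobbers memory. The compiler cannot
 * prove the stores to *ptr are dead, so the preceding memset() stays.
 */
#define toy_barrier_data(ptr) \
    __asm__ __volatile__("" : : "r"(ptr) : "memory")

static inline void toy_memzero_explicit(void *s, size_t count)
{
    assert(s != NULL);
    memset(s, 0, count);
    toy_barrier_data(s);
}
```

Whether the stores survive optimisation can only be confirmed by inspecting the generated code, as the disassembly in the message does; the sketch itself only guarantees that the buffer is zero afterwards.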
     
  • Linus Torvalds
     
  • Pull ext4 fixes from Ted Ts'o:
    "Some miscellaneous bug fixes and some final on-disk and ABI changes
    for ext4 encryption which provide better security and performance"

    * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix growing of tiny filesystems
    ext4: move check under lock scope to close a race.
    ext4: fix data corruption caused by unwritten and delayed extents
    ext4 crypto: remove duplicated encryption mode definitions
    ext4 crypto: do not select from EXT4_FS_ENCRYPTION
    ext4 crypto: add padding to filenames before encrypting
    ext4 crypto: simplify and speed up filename encryption

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "One intel fix, one rockchip fix, and a bunch of radeon fixes for some
    regressions from audio rework and vm stability"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    drm/i915/chv: Implement WaDisableShadowRegForCpd
    drm/radeon: fix userptr return value checking (v2)
    drm/radeon: check new address before removing old one
    drm/radeon: reset BOs address after clearing it.
    drm/radeon: fix lockup when BOs aren't part of the VM on release
    drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    drm/radeon: adjust pll when audio is not enabled
    drm/radeon: only enable audio streams if the monitor supports it
    drm/radeon: only mark audio as connected if the monitor supports it (v3)
    drm/radeon/audio: don't enable packets until the end
    drm/radeon: drop dce6_dp_enable
    drm/radeon: fix ordering of AVI packet setup
    drm/radeon: Use drm_calloc_ab for CS relocs
    drm/rockchip: fix error check when getting irq
    MAINTAINERS: add entry for Rockchip drm drivers

    Linus Torvalds
     
  • Just a single intel fix
    * tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
    drm/i915/chv: Implement WaDisableShadowRegForCpd

    Dave Airlie
     
  • one fix and maintainers update
    * 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
    drm/rockchip: fix error check when getting irq
    MAINTAINERS: add entry for Rockchip drm drivers

    Dave Airlie
     
  • Pull SCSI fixes from James Bottomley:
    "This is three logical fixes (as 5 patches).

    The 3ware class of drivers were causing an oops with multiqueue by
    tearing down the command mappings after completing the command (where
    the variables in the command used to tear down the mapping were
    no-longer valid). There's also a fix for the qnap iscsi target which
    was choking on us sending it commands that were too long and a fix for
    the reworked aha1542 allocating GFP_KERNEL under a lock"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    3w-9xxx: fix command completion race
    3w-xxxx: fix command completion race
    3w-sas: fix command completion race
    aha1542: Allocate memory before taking a lock
    SCSI: add 1024 max sectors black list flag

    Linus Torvalds
     
  • Pull slave dmaengine fixes from Vinod Koul:
    "Here are the fixes in dmaengine subsystem for rc2:

    - privatecnt fix for slave dma request API by Christopher

    - warn fix for PM ifdef in usb-dmac by Geert

    - fix hardware dependency for xgene by Jean"

    * 'next' of git://git.infradead.org/users/vkoul/slave-dma:
    dmaengine: increment privatecnt when using dma_get_any_slave_channel
    dmaengine: xgene: Set hardware dependency
    dmaengine: usb-dmac: Protect PM-only functions to kill warning

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    - build fix for SMP=n in book3s_xics.c
    - fix for Daniel's pci_controller_ops on powernv.
    - revert the TM syscall abort patch for now.
    - CPU affinity fix from Nathan.
    - two EEH fixes from Gavin.
    - fix for CR corruption from Sam.
    - selftest build fix.

    * tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
    powerpc/powernv: Restore non-volatile CRs after nap
    powerpc/eeh: Delay probing EEH device during hotplug
    powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
    powerpc/pseries: Correct cpu affinity for dlpar added cpus
    selftests/powerpc: Fix the pmu install rule
    Revert "powerpc/tm: Abort syscalls in active transactions"
    powerpc/powernv: Fix early pci_controller_ops loading.
    powerpc/kvm: Fix SMP=n build error in book3s_xics.c

    Linus Torvalds
     

03 May, 2015

3 commits

  • The estimate of necessary transaction credits in ext4_flex_group_add()
    is too pessimistic. It reserves credit for sb, resize inode, and resize
    inode dindirect block for each group added in a flex group although they
    are always the same block and thus it is enough to account them only
    once. Also the number of modified GDT blocks is overestimated since we
    fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

    Make the estimation more precise. That reduces the number of requested
    credits enough that we can grow a 20 MB filesystem (which has a 1 MB
    journal, 79 reserved GDT blocks, and flex group size 16 by default).

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Eric Sandeen

    Jan Kara
     
  • fallocate() checks that the file is extent-based and returns
    EOPNOTSUPP in case it is not. Other tasks can convert from and to
    indirect and extent so it's safe to check only after grabbing
    the inode mutex.

    Signed-off-by: Davide Italiano
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Davide Italiano
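
The race is a check-then-act problem: a property tested before taking the lock can change before the lock is held. A small userspace sketch of the fixed ordering, assuming POSIX threads; `toy_inode`, `extent_based` and `toy_fallocate` are hypothetical names, not ext4 code:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_inode {
    pthread_mutex_t lock;
    bool extent_based;   /* may be flipped by another thread holding lock */
    long size;
};

static int toy_fallocate(struct toy_inode *inode, long len)
{
    int ret = 0;

    assert(inode);
    pthread_mutex_lock(&inode->lock);
    /* The format check happens under the lock, so it cannot go stale. */
    if (!inode->extent_based) {
        ret = -EOPNOTSUPP;
        goto out;
    }
    inode->size += len;
out:
    pthread_mutex_unlock(&inode->lock);
    return ret;
}
```

Any code that converts the inode format would take the same lock, which is what makes the check and the allocation a single atomic step in this model.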
     
  • Currently it is possible to lose a whole file system block worth of data
    when we hit the specific interaction with unwritten and delayed extents
    in the extent status tree.

    The problem is that when we insert delayed extent into extent status
    tree the only way to get rid of it is when we write out delayed buffer.
    However there is a limitation in the extent status tree implementation
    so that when inserting unwritten extent should there be even a single
    delayed block the whole unwritten extent would be marked as delayed.

    At this point, there is no way to get rid of the delayed extents,
    because there are no delayed buffers to write out. So when we write
    into said unwritten extent we will convert it to written, but it still
    remains delayed.

    When we try to write into that block later ext4_da_map_blocks() will set
    the buffer new and delayed and map it to invalid block which causes
    the rest of the block to be zeroed, losing already written data.

    For now we can fix this by simply not allowing to set delayed status on
    written extent in the extent status tree. Also add WARN_ON() to make
    sure that we notice if this happens in the future.

    This problem can be easily reproduced by running the following xfs_io.

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
    -c "falloc 0 131072" \
    -c "pwrite -S 0xbb 65536 2048" \
    -c "fsync" /mnt/test/fff

    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

    In theory this can also be reproduced at random by running fsx,
    but it's not very reliable, though on machines with bigger page size
    (like ppc) this can be seen more often (especially xfstest generic/127)

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Lukas Czerner
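
The rule stated in the message, that a written extent must never carry the delayed status, can be modelled with a few flag bits. This is only a toy illustration: the flag names and `toy_es_status` are assumptions, and the real extent status tree code is more involved:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative status bits, not the kernel's definitions. */
enum {
    TOY_ES_WRITTEN   = 1u << 0,
    TOY_ES_UNWRITTEN = 1u << 1,
    TOY_ES_DELAYED   = 1u << 2,
};

/*
 * Compute the status to store for an extent being inserted.
 * overlaps_delayed says whether the range covers any delayed block.
 */
static unsigned toy_es_status(unsigned status, bool overlaps_delayed)
{
    /* Only a non-written extent may inherit the delayed flag. */
    if (!(status & TOY_ES_WRITTEN) && overlaps_delayed)
        status |= TOY_ES_DELAYED;
    /* Sanity check mirroring the idea of the added WARN_ON(). */
    assert(!((status & TOY_ES_WRITTEN) && (status & TOY_ES_DELAYED)));
    return status;
}
```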
     

02 May, 2015

10 commits

  • This patch removes duplicated encryption modes which were already in
    ext4.h. They were duplicated from commit 3edc18d and commit f542fb.

    Cc: Theodore Ts'o
    Cc: Michael Halcrow
    Cc: Andreas Dilger
    Signed-off-by: Chanho Park
    Signed-off-by: Theodore Ts'o

    Chanho Park
     
  • This patch adds a tristate EXT4_ENCRYPTION to do the selections
    for EXT4_FS_ENCRYPTION because selecting from a bool causes all
    the selected options to be built-in, even if EXT4 itself is a
    module.

    Signed-off-by: Herbert Xu
    Signed-off-by: Theodore Ts'o

    Herbert Xu
     
  • Pull networking fixes from David Miller:

    1) Receive packet length needs to be adjusted by 2 on RX to accommodate
    the two padding bytes in altera_tse driver. From Vlastimil Setka.

    2) If rx frame is dropped due to out of memory in macb driver, we leave
    the receive ring descriptors in an undefined state. From Punnaiah
    Choudary Kalluri

    3) Some netlink subsystems erroneously signal NLM_F_MULTI. That is
    only for dumps. Fix from Nicolas Dichtel.

    4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via
    the ipv4_mtu() helper. From Herbert Xu.

    5) Fix null deref in bridge netfilter, and miscalculated lengths in
    jump/goto nf_tables verdicts. From Florian Westphal.

    6) Unhash ping sockets properly.

    7) Software implementation of BPF divide did 64/32 rather than 64/64
    bit divide. The JITs got it right. Fix from Alexei Starovoitov.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
    ipv4: Missing sk_nulls_node_init() in ping_unhash().
    net: fec: Fix RGMII-ID mode
    net/mlx4_en: Schedule napi when RX buffers allocation fails
    netxen_nic: use spin_[un]lock_bh around tx_clean_lock
    net/mlx4_core: Fix unaligned accesses
    mlx4_en: Use correct loop cursor in error path.
    cxgb4: Fix MC1 memory offset calculation
    bnx2x: Delay during kdump load
    net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
    net: dsa: Fix scope of eeprom-length property
    net: macb: Fix race condition in driver when Rx frame is dropped
    hv_netvsc: Fix a bug in netvsc_start_xmit()
    altera_tse: Correct rx packet length
    mlx4: Fix tx ring affinity_mask creation
    tipc: fix problem with parallel link synchronization mechanism
    tipc: remove wrong use of NLM_F_MULTI
    bridge/nl: remove wrong use of NLM_F_MULTI
    bridge/mdb: remove wrong use of NLM_F_MULTI
    net: sched: act_connmark: don't zap skb->nfct
    trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
    ...

    Linus Torvalds
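
Item 7 above is easy to see with plain integers: a 64/32 divide silently drops the upper half of the divisor. A short sketch, where both function names are made up for illustration and do not correspond to the BPF interpreter's code:

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: the divisor is truncated to 32 bits before dividing. */
static uint64_t toy_div_64_by_32(uint64_t dst, uint64_t src)
{
    uint32_t divisor = (uint32_t)src;   /* upper 32 bits are lost */

    assert(divisor != 0);
    return dst / divisor;
}

/* Fixed shape: a full 64-bit by 64-bit unsigned divide. */
static uint64_t toy_div_64_by_64(uint64_t dst, uint64_t src)
{
    assert(src != 0);
    return dst / src;
}
```

With dst = 2^40 and src = 2^32 + 2, the full divide yields 255, whereas the truncated version divides by 2 and yields 2^39. The two agree whenever the divisor fits in 32 bits, which is why such a bug can go unnoticed.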
     
  • Here the "other side" refers to the guest or host.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Stefan Hajnoczi
     
  • With my job change kernel work will be "own time"; I'm keeping lguest
    and modules (and the virtio standards work), but virtio kernel has to
    go.

    This makes it clear that Michael is in charge. He's good, but having
    me watch over his shoulder won't help.

    Good luck Michael!

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Pull Ceph RBD fix from Sage Weil.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: end I/O the entire obj_request on error

    Linus Torvalds
     
  • If we don't do that, then the poison value is left in the ->pprev
    backlink.

    This can cause crashes if we do a disconnect, followed by a connect().

    Tested-by: Linus Torvalds
    Reported-by: Wen Xu
    Signed-off-by: David S. Miller

    David S. Miller
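
The problem with a leftover poison value can be sketched with a simplified hash-list node. After deletion the back-pointer holds a poison marker, so an "is it hashed?" test that looks for NULL gives the wrong answer unless the node is re-initialised. All names here are hypothetical, and the toy uses plain NULL-terminated lists, not the kernel's nulls lists:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    struct toy_node **pprev;   /* address of the pointer that points at us */
};

/* Arbitrary non-NULL marker left behind by a plain delete. */
#define TOY_POISON ((struct toy_node **)0x200)

static bool toy_unhashed(const struct toy_node *n)
{
    return n->pprev == NULL;
}

static void toy_add_head(struct toy_node **head, struct toy_node *n)
{
    n->next = *head;
    if (*head)
        (*head)->pprev = &n->next;
    *head = n;
    n->pprev = head;
}

static void toy_del(struct toy_node *n)
{
    assert(n->pprev != NULL && n->pprev != TOY_POISON);
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->pprev = TOY_POISON;     /* node now looks "hashed" to toy_unhashed() */
}

static void toy_unhash(struct toy_node *n)
{
    if (toy_unhashed(n))
        return;
    toy_del(n);
    n->next = NULL;
    n->pprev = NULL;           /* re-initialise so a later unhash is a no-op */
}
```

Without the two re-initialising assignments, a second `toy_unhash()` call would dereference the poison pointer, which is the toy analogue of the crash on disconnect followed by connect().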
     
  • When we end I/O struct request with error, we need to pass
    obj_request->length as @nr_bytes so that the entire obj_request worth
    of bytes is completed. Otherwise block layer ends up confused and we
    trip on

    rbd_assert(more ^ (which == img_request->obj_request_count));

    in rbd_img_obj_callback() due to more being true no matter what. We
    already do it in most cases but we are missing some, in particular
    those where we don't even get a chance to submit any obj_requests, due
    to an early -ENOMEM for example.

    A number of obj_request->xferred assignments seem to be redundant but
    I haven't touched any of obj_request->xferred stuff to keep this small
    and isolated.

    Cc: Alex Elder
    Cc: stable@vger.kernel.org # 3.10+
    Reported-by: Shawn Edwards
    Reviewed-by: Sage Weil
    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • This obscures the length of the filenames, to decrease the amount of
    information leakage. By default, we pad the filenames to the next 4
    byte boundaries. This costs nothing, since the directory entries are
    aligned to 4 byte boundaries anyway. Filenames can also be padded to
    8, 16, or 32 bytes, which will consume more directory space.

    Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
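
Padding a name to the next 4, 8, 16 or 32 byte boundary is a round-up to a power of two. A minimal sketch, assuming a hypothetical helper `toy_padded_len`; any additional rules the real implementation applies, such as a minimum length, are not modelled:

```c
#include <assert.h>
#include <stddef.h>

/* Round name_len up to the next multiple of pad (pad is 4, 8, 16 or 32). */
static size_t toy_padded_len(size_t name_len, size_t pad)
{
    assert(pad == 4 || pad == 8 || pad == 16 || pad == 32);
    /* Works because pad is a power of two. */
    return (name_len + pad - 1) & ~(pad - 1);
}
```

For example, a 5-byte name pads to 8 bytes with the default of 4 and to 32 bytes with the largest setting, so larger padding hides more about the original length at the cost of directory space.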
     
  • Avoid using SHA-1 when calculating the user-visible filename when the
    encryption key is available, and avoid decrypting lots of filenames
    when searching for a directory entry in a directory block.

    Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

01 May, 2015

5 commits

  • Pull btrfs fixes from Chris Mason:
    "A few more btrfs fixes.

    These range from corners Filipe found in the new free space cache
    writeback to a grab bag of fixes from the list"

    * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent
    Btrfs: fill ->last_trans for delayed inode in btrfs_fill_inode.
    btrfs: unlock i_mutex after attempting to delete subvolume during send
    btrfs: check io_ctl_prepare_pages return in __btrfs_write_out_cache
    btrfs: fix race on ENOMEM in alloc_extent_buffer
    btrfs: handle ENOMEM in btrfs_alloc_tree_block
    Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
    Btrfs: don't check for delalloc_bytes in cache_save_setup
    Btrfs: fix deadlock when starting writeback of bg caches
    Btrfs: fix race between start dirty bg cache writeout and bg deletion

    Linus Torvalds
     
  • Pull arm64 fixes from Will Deacon:
    "Not too much here, but we've addressed a couple of nasty issues in the
    dma-mapping code as well as adding the halfword and byte variants of
    load_acquire/store_release following on from the CSD locking bug that
    you fixed in the core.

    - fix perf devicetree warnings at probe time

    - fix memory leak in __dma_free()

    - ensure DMA buffers are always zeroed

    - show IRQ trigger in /proc/interrupts (for parity with ARM)

    - implement byte and halfword access for smp_{load_acquire,store_release}"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: perf: Fix the pmu node name in warning message
    arm64: perf: don't warn about missing interrupt-affinity property for PPIs
    arm64: add missing PAGE_ALIGN() to __dma_free()
    arm64: dma-mapping: always clear allocated buffers
    ARM64: Enable CONFIG_GENERIC_IRQ_SHOW_LEVEL
    arm64: add missing data types in smp_load_acquire/smp_store_release

    Linus Torvalds
     
  • Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
    and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
    use non-volatile condition registers (cr2, cr3 and cr4) early in the system
    reset interrupt handler (system_reset_pSeries()) before it has been determined
    if state loss has occurred. If state loss has not occurred, control returns via
    the power7_wakeup_noloss() path which does not restore those condition
    registers, leaving them corrupted.

    Fix this by restoring the condition registers in the power7_wakeup_noloss()
    case.

    This is apparent when running a KVM guest on hardware that does not
    support winkle or sleep and the guest makes use of secondary threads. In
    practice this means Power7 machines, though some early unreleased Power8
    machines may also be susceptible.

    The secondary CPUs are taken off line before the guest is started and
    they call pnv_smp_cpu_kill_self(). This checks support for sleep
    states (in this case there is no support) and power7_nap() is called.

    When the CPU is woken, power7_nap() returns and because the CPU is
    still off line, the main while loop executes again. The sleep states
    support test is executed again, but because the tested values cannot
    have changed, the compiler has optimized the test away and instead we
    rely on the result of the first test, which has been left in cr3
    and/or cr4. With the result overwritten, the wrong branch is taken and
    power7_winkle() is called on a CPU that does not support it, leading
    to it stalling.

    Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
    Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
    [mpe: Massage change log a bit more]
    Signed-off-by: Sam Bobroff
    Signed-off-by: Michael Ellerman

    Sam Bobroff
     
  • Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
    devices in early stage, which is reasonable to pSeries platform.
    However, it's wrong for PowerNV platform because the PE# isn't
    determined until the resources (IO and MMIO) are assigned to
    PE in hotplug case. So we have to delay probing EEH devices
    for PowerNV platform until the PE# is assigned.

    Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn")
    Signed-off-by: Gavin Shan
    Signed-off-by: Michael Ellerman

    Gavin Shan
     
  • When asserting reset in pcibios_set_pcie_reset_state(), the PE
    is enforced to (hardware) frozen state in order to drop unexpected
    PCI transactions (except PCI config read/write) automatically by
    hardware during reset, which would cause recursive EEH error.
    However, the (software) frozen state EEH_PE_ISOLATED is missed.
    When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED
    is set in PE state retrieval backend. Unfortunately, nobody (the
    reset handler or the EEH recovery functionality in host) will clear
    EEH_PE_ISOLATED when the PE has been passed through to guest.

    The patch sets and clears EEH_PE_ISOLATED properly during reset
    in function pcibios_set_pcie_reset_state() to fix the issue.

    Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
    Reported-by: Carol L. Soto
    Signed-off-by: Gavin Shan
    Tested-by: Carol L. Soto
    Signed-off-by: Michael Ellerman

    Gavin Shan