Eric Lee / smarc-fsl-linux-kernel

11 May, 2016

2 commits

8cb17b5ed irqchip: Add LPC32xx interrupt controller driver ... Browse Code »

The change adds improved support of NXP LPC32xx MIC, SIC1 and SIC2
interrupt controllers.

This is a list of new features in comparison to the legacy driver:
* irq types are taken from device tree settings, no more need to
hardcode them,
* old driver is based on irq_domain_add_legacy, which causes problems
with handling MIC hardware interrupt 0 produced by SIC1,
* there is one driver for MIC, SIC1 and SIC2, no more need to handle
them separately, e.g. have two separate handlers for SIC1 and SIC2,
* the driver does not have any dependencies on hardcoded register
offsets,
* the driver is much simpler for maintenance,
* SPARSE_IRQS option is supported.

Legacy LPC32xx interrupt controller driver was broken since commit
76ba59f8366f ("genirq: Add irq_domain-aware core IRQ handler"), which
requires a private interrupt handler, otherwise any SIC1 generated
interrupt (mapped to MIC hwirq 0) breaks the kernel with the message
"unexpected IRQ trap at vector 00".

The change disables compilation of a legacy driver found at
arch/arm/mach-lpc32xx/irq.c, the file will be removed in a separate
commit.

Fixes: 76ba59f8366f ("genirq: Add irq_domain-aware core IRQ handler")
Tested-by: Sylvain Lemieux
Signed-off-by: Vladimir Zapolskiy
Signed-off-by: Marc Zyngier

Vladimir Zapolskiy
2016-05-11 17:12:11 +0800
f86c4fbd9 irqchip/gic: Ensure ordering between read of INTACK and shared data ... Browse Code »

When an IPI is generated by a CPU, the pattern looks roughly like:

smp_wmb();

On the receiving CPU we rely on the fact that, once we've taken the
interrupt, then the freshly written shared data must be visible to us.
Put another way, the CPU isn't going to speculate taking an interrupt.

Unfortunately, this assumption turns out to be broken.

Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
to read some shared_data. Before CPUx has done anything, a random
peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
CPUy then takes the IRQ and starts executing the entry code, heading
towards gic_handle_irq. Furthermore, let's assume that a bunch of the
previous interrupts handled by CPUy were SGIs, so the branch predictor
kicks in and speculates that irqnr will be
Signed-off-by: Marc Zyngier

Will Deacon
2016-05-11 17:11:51 +0800

04 May, 2016

2 commits

b8f3ebe63 irqchip: Add Layerscape SCFG MSI controller support ... Browse Code »

Some kind of Freescale Layerscape SoC provides a MSI
implementation which uses two SCFG registers MSIIR and
MSIR to support 32 MSI interrupts for each PCIe controller.
The patch is to support it.

Signed-off-by: Minghuan Lian
Tested-by: Alexander Stein
Acked-by: Marc Zyngier
Signed-off-by: Marc Zyngier

Minghuan Lian
2016-05-04 16:58:04 +0800
5e79cb29d dt/bindings: Add bindings for Layerscape SCFG MSI ... Browse Code »

Some Layerscape SoCs use a simple MSI controller implementation.
It contains only two SCFG register to trigger and describe a
group 32 MSI interrupts. The patch adds bindings to describe
the controller.

Signed-off-by: Minghuan Lian
Acked-by: Rob Herring
Signed-off-by: Marc Zyngier

Minghuan Lian
2016-05-04 16:54:21 +0800

02 May, 2016

7 commits

287e9357a DT/arm,gic-v3: Documment PPI partition support ... Browse Code »

Add a decription of the PPI partitioning support.

Signed-off-by: Marc Zyngier
Acked-by: Rob Herring
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: Jason Cooper
Cc: Will Deacon
Link: http://lkml.kernel.org/r/1460365075-7316-6-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner

Marc Zyngier
2016-05-02 19:42:51 +0800
e3825ba1a irqchip/gic-v3: Add support for partitioned PPIs ... Browse Code »

Plug the partitioning layer into the GICv3 PPI code, parsing the
DT and building the partition affinities and providing the generic
code with partition data and callbacks.

Signed-off-by: Marc Zyngier
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: Jason Cooper
Cc: Will Deacon
Cc: Rob Herring
Link: http://lkml.kernel.org/r/1460365075-7316-5-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner

Marc Zyngier
2016-05-02 19:42:51 +0800
9e2c986cb irqchip: Add per-cpu interrupt partitioning library ... Browse Code »

We've unfortunately started seeing a situation where percpu interrupts
are partitioned in the system: one arbitrary set of CPUs has an
interrupt connected to a type of device, while another disjoint
set of CPUs has the same interrupt connected to another type of device.

This makes it impossible to have a device driver requesting this interrupt
using the current percpu-interrupt abstraction, as the same interrupt number
is now potentially claimed by at least two drivers, and we forbid interrupt
sharing on per-cpu interrupt.

A solution to this is to turn things upside down. Let's assume that our
system describes all the possible partitions for a given interrupt, and
give each of them a unique identifier. It is then possible to create
a namespace where the affinity identifier itself is a form of interrupt
number. At this point, it becomes easy to implement a set of partitions
as a cascaded irqchip, each affinity identifier being the HW irq.

This allows us to keep a number of nice properties:
- Each partition results in a separate percpu-interrupt (with a restrictied
affinity), which keeps drivers happy.
- Because the underlying interrupt is still per-cpu, the overhead of
the indirection can be kept pretty minimal.
- The core code can ignore most of that crap.

For that purpose, we implement a small library that deals with some of
the boilerplate code, relying on platform-specific drivers to provide
a description of the affinity sets and a set of callbacks.

Signed-off-by: Marc Zyngier
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: Jason Cooper
Cc: Will Deacon
Cc: Rob Herring
Link: http://lkml.kernel.org/r/1460365075-7316-4-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner

Marc Zyngier
2016-05-02 19:42:51 +0800
222df54fd genirq: Allow the affinity of a percpu interrupt to be set/retrieved ... Browse Code »

In order to prepare the genirq layer for the concept of partitionned
percpu interrupts, let's allow an affinity to be associated with
such an interrupt. We introduce:

- irq_set_percpu_devid_partition: flag an interrupt as a percpu-devid
interrupt, and associate it with an affinity
- irq_get_percpu_devid_partition: allow the affinity of that interrupt
to be retrieved.

This will allow a driver to discover which CPUs the per-cpu interrupt
can actually fire on.

Signed-off-by: Marc Zyngier
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: Jason Cooper
Cc: Will Deacon
Cc: Rob Herring
Link: http://lkml.kernel.org/r/1460365075-7316-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner

Marc Zyngier
2016-05-02 19:42:51 +0800
651e8b54a irqdomain: Allow domain matching on irq_fwspec ... Browse Code »

When iterating over the irq domain list, we try to match a domain
either by calling a match() function or by comparing a number
of fields passed as parameters.

Both approaches are a bit restrictive:
- match() is DT specific and only takes a device node
- the fallback case only deals with the fwnode_handle

It would be useful if we had a per-domain function that would
actually perform the matching check on the whole of the
irq_fwspec structure. This would allow for a domain to triage
matching attempts that need to extend beyond the fwnode.

Let's introduce irq_find_matching_fwspec(), which takes a full
blown irq_fwspec structure, and call into a select() function
implemented by the irqdomain. irq_find_matching_fwnode() is
made a wrapper around irq_find_matching_fwspec in order to
preserve compatibility.

Signed-off-by: Marc Zyngier
Cc: Mark Rutland
Cc: devicetree@vger.kernel.org
Cc: Jason Cooper
Cc: Will Deacon
Cc: Rob Herring
Link: http://lkml.kernel.org/r/1460365075-7316-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner

Marc Zyngier
2016-05-02 19:42:50 +0800
7cec18a39 genirq: Add error code reporting to irq_{reserve,destroy}_ipi ... Browse Code »

Make these functions return appropriate error codes when something goes
wrong.

Previously irq_destroy_ipi returned void making it impossible to notify
the caller if the request could not be fulfilled. Patch 1 in the series
added another condition in which this could fail in addition to the
existing ones. irq_reserve_ipi returned an unsigned int meaning it could
only return 0 on failure and give the caller no indication as to why the
request failed.

As time goes on there are likely to be further conditions added in which
these functions can fail. These APIs and the IPI IRQ domain are new in
4.6 and the number of existing call sites are low, changing the API now
has little impact on the code, while making it easier for these
functions to grow over time.

Signed-off-by: Matt Redfearn
Cc: linux-mips@linux-mips.org
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: ralf@linux-mips.org
Cc: Qais Yousef
Cc: lisa.parratt@imgtec.com
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1461568464-31701-2-git-send-email-matt.redfearn@imgtec.com
Signed-off-by: Thomas Gleixner

Matt Redfearn
2016-05-02 19:42:50 +0800
01292cea0 genirq: Make irq_destroy_ipi take a cpumask of IPIs to destroy ... Browse Code »

Previously irq_destroy_ipi() would destroy IPIs to all CPUs that were
configured by irq_reserve_ipi(). This change makes it possible to
destroy just a subset of the IPIs. This may be useful to remove IPIs to
CPUs that have been hot removed so that the IRQ numbers allocated within
the IPI domain can be re-used.

The original behaviour is restored by passing the complete mask that the
IPI was created with.

There are currently no users of this function that would break from the
API change.

Signed-off-by: Matt Redfearn
Cc: linux-mips@linux-mips.org
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: ralf@linux-mips.org
Cc: Qais Yousef
Cc: lisa.parratt@imgtec.com
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1461568464-31701-1-git-send-email-matt.redfearn@imgtec.com
Signed-off-by: Thomas Gleixner

Matt Redfearn
2016-05-02 19:42:50 +0800

28 Apr, 2016

5 commits

b75a2bf89 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »

Pull workqueue fix from Tejun Heo:
"So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time. AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.

A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item. Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item - where "after" means that the execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.

Unfortunately, we were missing a smp_mb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.

Lots of kudos to Roman for debugging the problem. The patch for
-stable is the minimal one. For v3.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix ghost PENDING flag while doing MQ IO

Linus Torvalds
2016-04-28 03:03:59 +0800
763cfc86e Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup fixes from Tejun Heo:
"Two patches to fix a deadlock which can be easily triggered if memcg
charge moving is used.

This bug was introduced while converting threadgroup locking to a
global percpu_rwsem and is caused by cgroup controller task migration
path depending on the ability to create new kthreads. cpuset had a
similar issue which was fixed by performing heavy-lifting operations
asynchronous to task migration. The two patches fix the same issue in
memcg in a similar way. The first patch makes the mechanism generic
and the second relocates memcg charge moving outside the migration
path.

Given that we don't want to perform heavy operations while
writelocking threadgroup lock anyway, moving them out of the way is a
desirable solution. One thing to note is that the problem was
difficult to debug because lockdep couldn't figure out the deadlock
condition. Looking into how to improve that"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback

Linus Torvalds
2016-04-28 02:41:14 +0800
3118e5f96 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux ... Browse Code »

Pull i2c fixes from Wolfram Sang:
"I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
patches"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
i2c: ismt: Add Intel DNV PCI ID
i2c: xlp9xx: add support for Broadcom Vulcan
i2c: rk3x: add support for rk3228

Linus Torvalds
2016-04-28 02:34:45 +0800
24131a61e Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc ... Browse Code »

Pull ARC fixes from Vineet Gupta:

- lockdep now works for ARCv2 builds

- enable DT reserved-memory binding (for forthcoming HDMI driver)

* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: add support for reserved memory defined by device tree
ARC: support generic per-device coherent dma mem
Documentation: dt: arc: fix spelling mistakes
ARCv2: Enable LOCKDEP

Linus Torvalds
2016-04-28 00:46:21 +0800
508fea71c Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 ... Browse Code »

Pull arch/nios2 fix from Ley Foon Tan:
"memset: use the right constraint modifier for the %4 output operand"

* tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: memset: use the right constraint modifier for the %4 output operand

Linus Torvalds
2016-04-28 00:33:24 +0800

27 Apr, 2016

8 commits

9453203bf Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/… ... Browse Code »

…linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
"Fix regression caused by hotkey enabling value in toshiba_acpi"

* tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
toshiba_acpi: Fix regression caused by hotkey enabling value

Linus Torvalds
2016-04-27 23:57:11 +0800
1b10cb21d ARC: add support for reserved memory defined by device tree ... Browse Code »

Enable reserved memory initialization from device tree.

Signed-off-by: Alexey Brodkin
Cc: Grant Likely
Cc: Marek Szyprowski
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta

Alexey Brodkin
2016-04-27 19:36:56 +0800
32ed9a0e0 ARC: support generic per-device coherent dma mem ... Browse Code »

Signed-off-by: Alexey Brodkin
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta

Alexey Brodkin
2016-04-27 19:36:55 +0800
a8950e49b nios2: memset: use the right constraint modifier for the %4 output operand ... Browse Code »

Depending on the size of the area to be memset'ed, the nios2 memset implementation
either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
rather than 1-byte stores to speed up memset.

However, we discovered that on our nios2 platform, memset() was not properly setting the
buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:

0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...

Which is obviously incorrect. Our investigation has revealed that the problem lies in the
incorrect constraints used in the inline assembly.

The following piece of assembly, from the nios2 memset implementation, is supposed to
create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:

/* fill8 %3, %5 (c & 0xff) */
" slli %4, %5, 8\n"
" or %4, %4, %5\n"
" slli %3, %4, 16\n"
" or %3, %3, %4\n"

However, depending on the compiler and optimization level, this code might be compiled as:

34: 280a923a slli r5,r5,8
38: 294ab03a or r5,r5,r5
3c: 2808943a slli r4,r5,16
40: 2148b03a or r4,r4,r5

This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.

%4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from
using the same register for an output operand (%4) and input operand (%5). By using the
constraint modifier '&', we indicate that the register should be used for output only. With this
change, we get the following assembly output:

34: 2810923a slli r8,r5,8
38: 4150b03a or r8,r8,r5
3c: 400e943a slli r7,r8,16
40: 3a0eb03a or r7,r7,r8

Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.

It is worth mentioning the observed consequence of this bug: we were hitting the kernel
BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
after the initialization was finding some bits to be set to 0, which didn't make sense since the
bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
bitmap was in fact initialized to 0xff00ff00.

Thanks to Marek Vasut for his help and feedback.

Signed-off-by: Romain Perier
Acked-by: Marek Vasut
Acked-by: Ley Foon Tan

Romain Perier
2016-04-27 16:35:55 +0800
f28f20da7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig
Gallak.

2) Bug fixes for the new macsec facility (missing kmalloc NULL checks,
missing locking around netdev list traversal, etc.) from Sabrina
Dubroca.

3) Fix handling of host routes on ifdown in ipv6, from David Ahern.

4) Fix double-fdput in bpf verifier. From Jann Horn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net: ipv6: Delete host routes on an ifdown
Revert "ipv6: Revert optional address flusing on ifdown."
net/mlx4_en: fix spurious timestamping callbacks
net: dummy: remove note about being Y by default
cxgbi: fix uninitialized flowi6
ipv6: Revert optional address flusing on ifdown.
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5: Add pci shutdown callback
net/mlx5_core: Remove static from local variable
net/mlx5e: Use vport MTU rather than physical port MTU
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
net/mlx5_core: Add ConnectX-5 to list of supported devices
net/mlx5e: Fix MLX5E_100BASE_T define
net/mlx5_core: Fix soft lockup in steering error flow
qlcnic: Update version to 5.3.64
net: stmmac: socfpga: Remove re-registration of reset controller
macsec: fix netlink attribute validation
macsec: add missing macsec prefix in uapi
...

Linus Torvalds
2016-04-27 07:25:51 +0800
91ea692f8 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc ... Browse Code »

Pull ARM SoC fixes from Arnd Bergmann:
"Here are the latest bug fixes for ARM SoCs, mostly addressing recent
regressions. Changes are across several platforms, so I'm listing
every change separately here.

Regressions since 4.5:

- A correction of the psci firmware DT binding, to prevent users from
relying on unintended semantics

- Actually getting the newly merged clock driver for some OMAP
platforms to work

- A revert of patches for the Qualcomm BAM, these need to be reworked
for 4.7 to avoid breaking boards other than the one they were
intended for

- A correction for the I2C device nodes on the Socionext Uniphier
platform

- i.MX SDHCI was broken for non-DT platforms due to a change with the
setting of the DMA mask

- A revert of a patch that accidentally added a nonexisting clock on
the Rensas "Porter" board

- A couple of OMAP fixes that are all related to suspend after the
power domain changes for dra7

- On Mediatek, revert part of the power domain initialization changes
that broke mt8173-evb

Fixes for older bugs:

- Workaround for an "external abort" in the omap34xx suspend/resume
code.

- The USB1/eSATA should not be listed as an excon device on
am57xx-beagle-x15 (broken since v4.0)

- A v4.5 regression in the TI AM33xx and AM43XX DT specifying
incorrect DMA request lines for the GPMC

- The jiffies calibration on Renesas platforms was incorrect for some
modern CPU cores.

- A hardware errata woraround for clockdomains on TI DRA7"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems
arm64: dts: uniphier: fix I2C nodes of PH1-LD20
ARM: shmobile: timer: Fix preset_lpj leading to too short delays
Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins"
ARM: dts: r8a7791: Don't disable referenced optional clocks
Revert "ARM: OMAP: Catch callers of revision information prior to it being populated"
ARM: OMAP3: Fix external abort on 36xx waking from off mode idle
ARM: dts: am57xx-beagle-x15: remove extcon_usb1
ARM: dts: am437x: Fix GPMC dma properties
ARM: dts: am33xx: Fix GPMC dma properties
Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators"
ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask
ARM: DRA7: clockdomain: Implement timer workaround for errata i874
ARM: OMAP: Catch callers of revision information prior to it being populated
ARM: dts: dra7: Correct clock tree for sys_32k_ck
ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap
ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen
Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node"
Revert "dts: msm8974: Add blsp2_bam dma node"
ARM: dts: Add clocks for dm814x ADPLL

Linus Torvalds
2016-04-27 07:17:01 +0800
8ead9dd54 devpts: more pty driver interface cleanups ... Browse Code »

This is more prep-work for the upcoming pty changes. Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode. That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code. Not only is the end result smaller,
it's simpler and easier to understand.

The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.

That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".

The new code just does

dentry->d_fsdata = NULL;

in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously. Don't do something complex and subtle when the
obvious straightforward approach will do.

The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.

Cc: Eric Biederman
Cc: Peter Anvin
Cc: Andy Lutomirski
Cc: Al Viro
Cc: Peter Hurley
Cc: Serge Hallyn
Cc: Willy Tarreau
Cc: Aurelien Jarno
Cc: Alan Cox
Cc: Jann Horn
Cc: Greg KH
Cc: Jiri Slaby
Cc: Florian Weimer
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-27 06:47:32 +0800
8358b02bf bpf: fix double-fdput in replace_map_fd_with_map_ptr() ... Browse Code »

When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode
references a non-map file descriptor as a map file descriptor, the error
handling code called fdput() twice instead of once (in __bpf_map_get() and
in replace_map_fd_with_map_ptr()). If the file descriptor table of the
current task is shared, this causes f_count to be decremented too much,
allowing the struct file to be freed while it is still in use
(use-after-free). This can be exploited to gain root privileges by an
unprivileged user.

This bug was introduced in
commit 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only
exploitable since
commit 1be7f75d1668 ("bpf: enable non-root eBPF programs") because
previously, CAP_SYS_ADMIN was required to reach the vulnerable code.

(posted publicly according to request by maintainer)

Signed-off-by: Jann Horn
Signed-off-by: Linus Torvalds
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Jann Horn
2016-04-27 05:37:21 +0800

26 Apr, 2016

12 commits

38bd10c44 net: ipv6: Delete host routes on an ifdown ... Browse Code »

It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similar to IPv4. As always the devil is in the
details and the IPv6 stack as too many behavioral differences from IPv4
making the simple idea more complicated than it needs to be.

The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:

1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
rt6_fill_node while handling a route dump request.

2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
fib6_del

3. Panic in fib6_purge_rt because rt6i_ref count is not 1.

The root cause of all these is references related to the host route for
an address that is retained.

So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be re-generated an up there is
no longer a need for the l3mdev fix up. On the 'admin up' side move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once versus on UP and CHANGE events.

All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered this patch takes the conservative approach and evicts
all host routes rather than trying to determine when it can be re-used
and when it can not. That can be a later optimizaton if desired.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-04-26 23:48:26 +0800
6a923934c Revert "ipv6: Revert optional address flusing on ifdown." ... Browse Code »

This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61.

Ok, this puts the feature back. I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.

Signed-off-by: David S. Miller

David S. Miller
2016-04-26 23:47:41 +0800
346c09f80 workqueue: fix ghost PENDING flag while doing MQ IO ... Browse Code »

The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
with the following backtrace:

[ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[ 601.347574] Tainted: G O 4.4.5-1-storage+ #6
[ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000
[ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[ 601.350965] Call Trace:
[ 601.351203] [] ? bit_wait+0x60/0x60
[ 601.351444] [] schedule+0x35/0x80
[ 601.351709] [] schedule_timeout+0x192/0x230
[ 601.351958] [] ? blk_flush_plug_list+0xc7/0x220
[ 601.352208] [] ? ktime_get+0x37/0xa0
[ 601.352446] [] ? bit_wait+0x60/0x60
[ 601.352688] [] io_schedule_timeout+0xa4/0x110
[ 601.352951] [] ? _raw_spin_unlock_irqrestore+0xe/0x10
[ 601.353196] [] bit_wait_io+0x1b/0x70
[ 601.353440] [] __wait_on_bit+0x5d/0x90
[ 601.353689] [] wait_on_page_bit+0xc0/0xd0
[ 601.353958] [] ? autoremove_wake_function+0x40/0x40
[ 601.354200] [] __filemap_fdatawait_range+0xe4/0x140
[ 601.354441] [] filemap_fdatawait_range+0x14/0x30
[ 601.354688] [] filemap_write_and_wait_range+0x3f/0x70
[ 601.354932] [] blkdev_fsync+0x1b/0x50
[ 601.355193] [] vfs_fsync_range+0x49/0xa0
[ 601.355432] [] blkdev_write_iter+0xca/0x100
[ 601.355679] [] __vfs_write+0xaa/0xe0
[ 601.355925] [] vfs_write+0xa9/0x1a0
[ 601.356164] [] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

queue_mode = MQ
submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
0 1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
ffff8838038e2400
...

During debug it became clear that stalled request is always inserted in
the rq_list from the following path:

save_stack_trace_tsk + 34
blk_mq_insert_requests + 231
blk_mq_flush_plug_list + 281
blk_flush_plug_list + 199
wait_on_page_bit + 192
__filemap_fdatawait_range + 228
filemap_fdatawait_range + 20
filemap_write_and_wait_range + 63
blkdev_fsync + 27
vfs_fsync_range + 73
blkdev_write_iter + 202
__vfs_write + 170
vfs_write + 169
kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

CPU#0 CPU#1
---------------------------------- -------------------------------
reqeust A inserted
STORE hctx->ctx_map[0] bit marked
kblockd_schedule...() returns 1

request B inserted
STORE hctx->ctx_map[1] bit marked
kblockd_schedule...() returns 0
*** WORK PENDING bit is cleared ***
flush_busy_ctxs() is executed, but
bit 1, set by CPU#1, is not observed

As a result request B pended forever.

This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier , which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen
Cc: Gioh Kim
Cc: Michael Wang
Cc: Tejun Heo
Cc: Jens Axboe
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo

Roman Pen
2016-04-26 23:23:22 +0800
978fa4362 drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems ... Browse Code »

Currently ARM CPUs DT bindings allows different enable-method value for
PSCI based systems. On ARM 64-bit this property is required and must be
"psci" while on ARM 32-bit systems this property is optional and must
be "arm,psci" if present.

However, "arm,psci" has always been the compatible string for the PSCI
node, and was never intended to be the enable-method. So this is a bug
in the binding and not a deliberate attempt at specifying 32-bit
differently.

This is problematic if 32-bit OS is run on 64-bit system which has
"psci" as enable-method rather than the expected "arm,psci".

So let's unify the value into "psci" and remove support for "arm,psci"
before it finds any users.

Reported-by: Soby Mathew
Cc: Rob Herring
Acked-by: Mark Rutland
Acked-by: Lorenzo Pieralisi
Signed-off-by: Sudeep Holla
Signed-off-by: Arnd Bergmann

Sudeep Holla
2016-04-26 18:46:08 +0800
fc96256c9 net/mlx4_en: fix spurious timestamping callbacks ... Browse Code »

When multiple skb are TX-completed in a row, we might incorrectly keep
a timestamp of a prior skb and cause extra work.

Fixes: ec693d47010e8 ("net/mlx4_en: Add HW timestamping (TS) support")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Reviewed-by: Eran Ben Elisha
Signed-off-by: David S. Miller

Eric Dumazet
2016-04-26 13:13:18 +0800
9f5db5350 net: dummy: remove note about being Y by default ... Browse Code »

Signed-off-by: Ivan Babrou
Signed-off-by: David S. Miller

Ivan Babrou
2016-04-26 13:11:55 +0800
3d6d30d60 cxgbi: fix uninitialized flowi6 ... Browse Code »

ip6_route_output looks into different fields in the passed flowi6 structure,
yet cxgbi passes garbage in nearly all those fields. Zero the structure out
first.

Fixes: fc8d0590d9142 ("libcxgbi: Add ipv6 api to driver")
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2016-04-26 04:20:49 +0800
264a0ae16 memcg: relocate charge moving from ->attach to ->post_attach ... Browse Code »

Hello,

So, this ended up a lot simpler than I originally expected. I tested
it lightly and it seems to work fine. Petr, can you please test these
two patches w/o the lru drain drop patch and see whether the problem
is gone?

Thanks.
------ 8< ------
If charge moving is used, memcg performs relabeling of the affected
pages from its ->attach callback which is called under both
cgroup_threadgroup_rwsem and thus can't create new kthreads. This is
fragile as various operations may depend on workqueues making forward
progress which relies on the ability to create new kthreads.

There's no reason to perform charge moving from ->attach which is deep
in the task migration path. Move it to ->post_attach which is called
after the actual migration is finished and cgroup_threadgroup_rwsem is
dropped.

* move_charge_struct->mm is added and ->can_attach is now responsible
for pinning and recording the target mm. mem_cgroup_clear_mc() is
updated accordingly. This also simplifies mem_cgroup_move_task().

* mem_cgroup_move_task() is now called from ->post_attach instead of
->attach.

Signed-off-by: Tejun Heo
Cc: Johannes Weiner
Acked-by: Michal Hocko
Debugged-and-tested-by: Petr Mladek
Reported-by: Cyril Hrubis
Reported-by: Johannes Weiner
Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
Cc: # 4.4+

Tejun Heo
2016-04-26 03:45:14 +0800
5cf1cacb4 cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback ... Browse Code »

Since e93ad19d0564 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write(). This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.

memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function. This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.

The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex. This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished. cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem. We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.

Signed-off-by: Tejun Heo
Cc: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: # 4.4+ prerequisite for the next patch

Tejun Heo
2016-04-26 03:45:14 +0800
841645b5f ipv6: Revert optional address flusing on ifdown. ... Browse Code »

This reverts the following three commits:

70af921db6f8835f4b11c65731116560adb00c14
799977d9aafbf0ca0b9c39b04cbfb16db71302c9
f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac

The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller

David S. Miller
2016-04-26 03:33:55 +0800
a30b8f81d toshiba_acpi: Fix regression caused by hotkey enabling value ... Browse Code »

Commit 52cbae0127ad ("toshiba_acpi: Change default Hotkey enabling value")
changed the hotkeys enabling value, as it was the same value Windows uses,
however, it turns out that the value tells the EC that the driver will now
take care of the hardware events like the physical RFKill switch or the
pointing device toggle button.

This patch reverts such commit by changing the default hotkey enabling
value to 0x09, which enables hotkey events only, making the hardware
buttons working again.

Fixes bugs 113331 and 114941.

Signed-off-by: Azael Avalos
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart

Azael Avalos
2016-04-26 01:31:59 +0800
bcc981e9e Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto fixes from Herbert Xu:
"This fixes a couple of regressions in the talitos driver that were
introduced back in 4.3.

The first bug causes a crash when the driver's AEAD functionality is
used while the second bug prevents its AEAD feature from working once
you get past the first bug"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()

Linus Torvalds
2016-04-26 00:32:45 +0800

25 Apr, 2016

4 commits

004cb62ef Merge tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux… ... Browse Code »

…/kernel/git/tmlind/linux-omap into fixes

Enable dm814x and dra62x clock driver. This branch has a dependency
to the clk-ti branch from the Linux clk tree for the ADPLL clock driver.
Otherwise things won't keep booting properly when we flip over to use
the clock driver instead of fixed clocks set up by the bootloader.

* tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Add clocks for dm814x ADPLL

Kevin Hilman
2016-04-25 23:55:17 +0800
da67e68c0 Documentation: dt: arc: fix spelling mistakes ... Browse Code »

Signed-off-by: Eric Engestrom
Signed-off-by: Vineet Gupta

Eric Engestrom
2016-04-25 16:51:09 +0800
391a20333 ipv4/fib: don't warn when primary address is missing if in_dev is dead ... Browse Code »

After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids warning if the in_dev is dead

Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2016-04-25 11:26:29 +0800
02da2d721 Linux 4.6-rc5 Browse Code »

Linus Torvalds
2016-04-25 07:17:05 +0800