Eric Lee / smarc-fsl-linux-kernel

13 Dec, 2013

37 commits

6bd364d82 KEYS: fix uninitialized persistent_keyring_register_sem

We run into this bug:
[ 2736.063245] Unable to handle kernel paging request for data at address 0x00000000
[ 2736.063293] Faulting instruction address: 0xc00000000037efb0
[ 2736.063300] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2736.063303] SMP NR_CPUS=2048 NUMA pSeries
[ 2736.063310] Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6table_security ip6table_raw ip6t_REJECT iptable_nat nf_nat_ipv4 iptable_mangle iptable_security iptable_raw ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ebtable_filter ebtables ip6table_filter iptable_filter ip_tables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack ip6_tables ibmveth pseries_rng nx_crypto nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc xfs libcrc32c dm_service_time sd_mod crc_t10dif crct10dif_common ibmvfc scsi_transport_fc scsi_tgt dm_mirror dm_region_hash dm_log dm_multipath dm_mod
[ 2736.063383] CPU: 1 PID: 7128 Comm: ssh Not tainted 3.10.0-48.el7.ppc64 #1
[ 2736.063389] task: c000000131930120 ti: c0000001319a0000 task.ti: c0000001319a0000
[ 2736.063394] NIP: c00000000037efb0 LR: c0000000006c40f8 CTR: 0000000000000000
[ 2736.063399] REGS: c0000001319a3870 TRAP: 0300 Not tainted (3.10.0-48.el7.ppc64)
[ 2736.063403] MSR: 8000000000009032 CR: 28824242 XER: 20000000
[ 2736.063415] SOFTE: 0
[ 2736.063418] CFAR: c00000000000908c
[ 2736.063421] DAR: 0000000000000000, DSISR: 40000000
[ 2736.063425]
GPR00: c0000000006c40f8 c0000001319a3af0 c000000001074788 c0000001319a3bf0
GPR04: 0000000000000000 0000000000000000 0000000000000020 000000000000000a
GPR08: fffffffe00000002 00000000ffff0000 0000000080000001 c000000000924888
GPR12: 0000000028824248 c000000007e00400 00001fffffa0f998 0000000000000000
GPR16: 0000000000000022 00001fffffa0f998 0000010022e92470 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000000f4a828 00003ffffe527108 0000000000000000
GPR28: c000000000f4a730 c000000000f4a828 0000000000000000 c0000001319a3bf0
[ 2736.063498] NIP [c00000000037efb0] .__list_add+0x30/0x110
[ 2736.063504] LR [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264
[ 2736.063508] PACATMSCRATCH [800000000280f032]
[ 2736.063511] Call Trace:
[ 2736.063516] [c0000001319a3af0] [c0000001319a3b80] 0xc0000001319a3b80 (unreliable)
[ 2736.063523] [c0000001319a3b80] [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264
[ 2736.063530] [c0000001319a3c50] [c0000000006c1bb0] .down_write+0x70/0x78
[ 2736.063536] [c0000001319a3cd0] [c0000000002e5ffc] .keyctl_get_persistent+0x20c/0x320
[ 2736.063542] [c0000001319a3dc0] [c0000000002e2388] .SyS_keyctl+0x238/0x260
[ 2736.063548] [c0000001319a3e30] [c000000000009e7c] syscall_exit+0x0/0x7c
[ 2736.063553] Instruction dump:
[ 2736.063556] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7cbd2b78 7c9e2378 7c7f1b78 f8010010
[ 2736.063566] f821ff71 e8a50008 7fa52040 40de00c0 7fbd2840 40de0094 7fbff040
[ 2736.063579] ---[ end trace 2708241785538296 ]---

It's caused by uninitialized persistent_keyring_register_sem.

The bug was introduced by commit f36f8c75, which contains two typos:
CONFIG_KEYS_KERBEROS_CACHE should be CONFIG_PERSISTENT_KEYRINGS and
krb_cache_register_sem should be persistent_keyring_register_sem.
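
The faulting address 0x00000000 inside __list_add is consistent with a semaphore whose wait list was never initialized. The following is a minimal user-space sketch of that failure mode, not the kernel code: struct rwsem_sketch and every function name here are hypothetical stand-ins for the kernel's rw_semaphore and list helpers.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical, reduced model of a read/write semaphore. */
struct rwsem_sketch {
	long count;
	struct list_head wait_list;
};

/* What a static initializer is expected to arrange: an empty list
 * whose head points back at itself. */
static void init_rwsem_sketch(struct rwsem_sketch *sem)
{
	sem->count = 0;
	sem->wait_list.next = &sem->wait_list;
	sem->wait_list.prev = &sem->wait_list;
}

/* Returns 1 if a waiter can be queued safely, 0 if queuing would
 * dereference a NULL pointer (an all-zero, uninitialized head). */
static int can_queue_waiter(const struct rwsem_sketch *sem)
{
	return sem->wait_list.next != NULL && sem->wait_list.prev != NULL;
}

/* Queue a waiter at the tail of the list, as a writer that has to
 * sleep would. */
static void list_add_tail_sketch(struct list_head *new_node,
				 struct list_head *head)
{
	struct list_head *prev = head->prev;

	new_node->next = head;
	new_node->prev = prev;
	prev->next = new_node;	/* NULL dereference if head is all zeroes */
	head->prev = new_node;
}
```

With a zero-filled semaphore, can_queue_waiter() reports 0 and list_add_tail_sketch() would write through a NULL pointer; after init_rwsem_sketch() the same insertion is safe.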

Signed-off-by: Xiao Guangrong
Signed-off-by: David Howells

Xiao Guangrong
2013-12-13 23:59:11 +0800
f46a3cbbe KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y

Always remove generated SYSTEM_TRUSTED_KEYRING files while doing make mrproper.

Signed-off-by: Kirill Tkhai
Signed-off-by: David Howells

Kirill Tkhai
2013-12-13 23:59:11 +0800
d7ec435fd X.509: Fix certificate gathering

Fix the gathering of certificates from both the source tree and the build tree
to correctly calculate the pathnames of all the certificates.

The problem was that if the default generated cert, signing_key.x509, didn't
exist then it would not have a path attached and if it did, it would have a
path attached.

This means that the contents of kernel/.x509.list would change between the
first compilation in a directory and the second. After the second it would
remain stable because the signing_key.x509 file exists.

The consequence was that the kernel would get relinked unconditionally on the
second recompilation. The second recompilation would also show something like
this:

    X.509 certificate list changed
      CERTS   kernel/x509_certificate_list
      - Including cert /home/torvalds/v2.6/linux/signing_key.x509
      AS      kernel/system_certificates.o
      LD      kernel/built-in.o

which is why the relink would happen.

Unfortunately, it isn't a simple matter of just sticking a path on the front
of the filename of the certificate in the build directory as make can't then
work out how to build it.

So the path has to be prepended to the name for sorting and duplicate
elimination and then removed for the make rule if it is in the build tree.

Reported-by: Linus Torvalds
Signed-off-by: David Howells

David Howells
2013-12-13 23:28:14 +0800
8d2763770 Merge branch 'akpm' (fixes from Andrew)

Merge patches from Andrew Morton:
"13 fixes"

* emailed patches from Andrew Morton:
mm: memcg: do not allow task about to OOM kill to bypass the limit
mm: memcg: fix race condition between memcg teardown and swapin
thp: move preallocated PTE page table on move_huge_pmd()
mfd/rtc: s5m: fix register updating by adding regmap for RTC
rtc: s5m: enable IRQ wake during suspend
rtc: s5m: limit endless loop waiting for register update
rtc: s5m: fix unsuccesful IRQ request during probe
drivers/rtc/rtc-s5m.c: fix info->rtc assignment
include/linux/kernel.h: make might_fault() a nop for !MMU
drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap
procfs: also fix proc_reg_get_unmapped_area() for !MMU case
mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
include/linux/hugetlb.h: make isolate_huge_page() an inline

Linus Torvalds
2013-12-13 10:22:10 +0800
1f14c1ac1 mm: memcg: do not allow task about to OOM kill to bypass the limit

Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more
gracefully") allowed tasks that already entered a memcg OOM condition to
bypass the memcg limit on subsequent allocation attempts hoping this
would expedite finishing the page fault and executing the kill.

David Rientjes is worried that this breaks memcg isolation guarantees.
Since there is no evidence that the bypass actually speeds up fault
processing, just change it so that these subsequent charge attempts fail
outright. The notable exception is __GFP_NOFAIL charges, which are
required to bypass the limit regardless.
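
The decision described above can be sketched as a small stand-alone function. This is an illustration only: the enum, the flag value GFP_NOFAIL_SKETCH and the function name are hypothetical and do not mirror the real memcg charge path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_NOFAIL flag bit. */
#define GFP_NOFAIL_SKETCH 0x1u

enum charge_result {
	CHARGE_OK,	/* charge fits under the limit */
	CHARGE_BYPASS,	/* limit exceeded but the charge is let through */
	CHARGE_FAIL,	/* charge attempt fails outright */
	CHARGE_RETRY	/* caller should reclaim or enter OOM handling */
};

/* Assumed model of one charge attempt against a memcg limit. */
static enum charge_result try_charge_sketch(unsigned int gfp_flags,
					    bool task_in_memcg_oom,
					    bool under_limit)
{
	if (under_limit)
		return CHARGE_OK;

	/* __GFP_NOFAIL charges must bypass the limit regardless. */
	if (gfp_flags & GFP_NOFAIL_SKETCH)
		return CHARGE_BYPASS;

	/* A task already in a memcg OOM condition used to bypass the
	 * limit here; in this sketch it now fails outright. */
	if (task_in_memcg_oom)
		return CHARGE_FAIL;

	return CHARGE_RETRY;
}
```

The ordering matters in this sketch: the __GFP_NOFAIL test comes before the OOM test so that such charges keep bypassing the limit even for a task that is already in the OOM state.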

Signed-off-by: Johannes Weiner
Reported-by: David Rientjes
Acked-by: Michal Hocko
Acked-by: David Rientjes
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2013-12-13 10:19:26 +0800
96f1c58d8 mm: memcg: fix race condition between memcg teardown and swapin

There is a race condition between a memcg being torn down and a swapin
triggered from a different memcg of a page that was recorded to belong
to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension). The
result is unreclaimable pages pointing to dead memcgs, which can lead to
anything from endless loops in later memcg teardown (the page is charged
to all hierarchical parents but is not on any LRU list) to crashes from
following the dangling memcg pointer.

Memcgs with tasks in them can not be torn down and usually charges don't
show up in memcgs without tasks. Swapin with the CONFIG_MEMCG_SWAP
extension is the notable exception because it charges the cgroup that
was recorded as owner during swapout, which may be empty and in the
process of being torn down when a task in another memcg triggers the
swapin:

  teardown:                 swapin:

                            lookup_swap_cgroup_id()
                            rcu_read_lock()
                            mem_cgroup_lookup()
                            css_tryget()
                            rcu_read_unlock()
  disable css_tryget()
  call_rcu()
    offline_css()
      reparent_charges()
                            res_counter_charge() (hierarchical!)
                            css_put()
                              css_free()
                            pc->mem_cgroup = dead memcg
                            add page to dead lru

Add a final reparenting step into css_free() to make sure any such raced
charges are moved out of the memcg before it's finally freed.

In the longer term it would be cleaner to have the css_tryget() and the
res_counter charge under the same RCU lock section so that the charge
reparenting is deferred until the last charge whose tryget succeeded is
visible. But this will require more invasive changes that will be
harder to evaluate and backport into stable, so better defer them to a
separate change set.

Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: David Rientjes
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2013-12-13 10:19:26 +0800
3592806cf thp: move preallocated PTE page table on move_huge_pmd()

Andrey Wagin reported a crash on VM_BUG_ON() in pgtable_pmd_page_dtor() with
the following backtrace:

free_pgd_range+0x2bf/0x410
free_pgtables+0xce/0x120
unmap_region+0xe0/0x120
do_munmap+0x249/0x360
move_vma+0x144/0x270
SyS_mremap+0x3b9/0x510
system_call_fastpath+0x16/0x1b

The crash can be reproduced with this test case:

    #define _GNU_SOURCE
    #include <sys/mman.h>	/* mmap(), mremap() */

    #define MB (1024 * 1024UL)
    #define GB (1024 * MB)

    int main(int argc, char **argv)
    {
            char *p;
            int i;

            /* Map 10 MB of anonymous memory at a fixed 1 GB address. */
            p = mmap((void *) GB, 10 * MB, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
            /* Touch every page so the range is populated. */
            for (i = 0; i < 10 * MB; i += 4096)
                    p[i] = 1;
            /* Move the whole mapping to a fixed 2 GB address. */
            mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED | MREMAP_MAYMOVE, 2 * GB);
            return 0;
    }

Due to the split PMD lock, we now store preallocated PTE tables for THP
pages per PMD table. This means we need to move them to the other PMD
table if the huge PMD is moved there.

Signed-off-by: Kirill A. Shutemov
Reported-by: Andrey Vagin
Tested-by: Andrey Vagin
Reviewed-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2013-12-13 10:19:26 +0800
3e1e4a5f3 mfd/rtc: s5m: fix register updating by adding regmap for RTC

Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and
add new regmap for RTC.

On the S5M8767A, registers were not properly updated and read because the
RTC used the same regmap as the PMIC. This could be observed in various
hangs, e.g. an infinite loop while waiting for the UDR field to change.

On this chip family the RTC has a different I2C address than the PMIC, so
an additional regmap is needed.

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Kyungmin Park
Reviewed-by: Mark Brown
Acked-by: Sangbeom Kim
Cc: Samuel Ortiz
Cc: Lee Jones
Cc: Liam Girdwood
Cc: Alessandro Zummo
Cc: Marek Szyprowski
Cc: Geert Uytterhoeven
Cc: Kyungmin Park
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Krzysztof Kozlowski
2013-12-13 10:19:26 +0800
222ead7fd rtc: s5m: enable IRQ wake during suspend

Add PM suspend/resume ops to rtc-s5m driver and enable IRQ wake during
suspend so the RTC would act like a wake up source. This allows waking
up from suspend to RAM on RTC alarm interrupt.

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Kyungmin Park
Cc: Mark Brown
Acked-by: Sangbeom Kim
Cc: Samuel Ortiz
Cc: Lee Jones
Cc: Liam Girdwood
Cc: Alessandro Zummo
Cc: Marek Szyprowski
Cc: Geert Uytterhoeven
Cc: Kyungmin Park
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Krzysztof Kozlowski
2013-12-13 10:19:26 +0800
d73238d4a rtc: s5m: limit endless loop waiting for register update

After setting the alarm or time, the driver waits for the UDR register to
be cleared, indicating that the register data has been transferred.

Limit the endless loop to only 5 retries.
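
A bounded polling loop of this kind might look like the sketch below. It models the register in user space: struct fake_rtc, read_udr_sketch(), the mask value and the choice of -ETIMEDOUT are all assumptions for illustration, not the driver's actual code.

```c
#include <errno.h>

#define UDR_READ_RETRY_CNT_SKETCH 5	/* upper bound on register reads */
#define UDR_MASK_SKETCH 0x1u		/* hypothetical "update pending" bit */

/* Hypothetical register model: the pending bit stays set for the next
 * polls_until_clear reads, then reads back as zero. */
struct fake_rtc {
	unsigned int polls_until_clear;
	unsigned int reads;
};

static unsigned int read_udr_sketch(struct fake_rtc *rtc)
{
	rtc->reads++;
	if (rtc->polls_until_clear > 0) {
		rtc->polls_until_clear--;
		return UDR_MASK_SKETCH;
	}
	return 0;
}

/* Poll until the pending bit clears, but give up after a fixed number
 * of reads instead of spinning forever. */
static int wait_for_udr_update_sketch(struct fake_rtc *rtc)
{
	int retry = UDR_READ_RETRY_CNT_SKETCH;
	unsigned int data;

	do {
		data = read_udr_sketch(rtc);
	} while (--retry && (data & UDR_MASK_SKETCH));

	return (data & UDR_MASK_SKETCH) ? -ETIMEDOUT : 0;
}
```

In this sketch the loop performs at most five reads; a register that never clears produces an error return rather than a hang.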

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Kyungmin Park
Reviewed-by: Mark Brown
Acked-by: Sangbeom Kim
Cc: Samuel Ortiz
Cc: Lee Jones
Cc: Liam Girdwood
Cc: Alessandro Zummo
Cc: Marek Szyprowski
Cc: Geert Uytterhoeven
Cc: Kyungmin Park
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Krzysztof Kozlowski
2013-12-13 10:19:26 +0800
7b003be82 rtc: s5m: fix unsuccesful IRQ request during probe

Probe failed for rtc-s5m:

s5m-rtc s5m-rtc: Failed to request alarm IRQ: 12: -22
s5m-rtc: probe of s5m-rtc failed with error -22

Fix rtc-s5m interrupt request by using regmap_irq_get_virq() for mapping
the IRQ.

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Kyungmin Park
Reviewed-by: Mark Brown
Acked-by: Sangbeom Kim
Cc: Samuel Ortiz
Cc: Lee Jones
Cc: Liam Girdwood
Cc: Alessandro Zummo
Cc: Marek Szyprowski
Cc: Geert Uytterhoeven
Cc: Kyungmin Park
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Krzysztof Kozlowski
2013-12-13 10:19:26 +0800
5ccb7d718 drivers/rtc/rtc-s5m.c: fix info->rtc assignment

Fix this warning:

drivers/rtc/rtc-s5m.c: In function `s5m_rtc_probe':
drivers/rtc/rtc-s5m.c:545: warning: assignment from incompatible pointer type

struct s5m_rtc_info.rtc has type "struct regmap *", while
struct sec_pmic_dev.rtc has type "struct i2c_client *".

Probably the author wanted to assign "struct sec_pmic_dev.regmap", which
has the correct type.

Also, as "rtc" doesn't make much sense as a name for a regmap, rename it
to "regmap".

Signed-off-by: Geert Uytterhoeven
Cc: Sangbeom Kim
Cc: Sachin Kamat
Tested-by: Krzysztof Kozlowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2013-12-13 10:19:26 +0800
386e79066 include/linux/kernel.h: make might_fault() a nop for !MMU

The machine cannot fault if !MMU, so make might_fault() a nop for !MMU.

This fixes the build error below if
!CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y):

arch/arm/kernel/built-in.o: In function `arch_ptrace':
arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault'
arch/arm/kernel/built-in.o: In function `restore_sigframe':
arch/arm/kernel/signal.c:173: undefined reference to `might_fault'
...
arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow
make: *** [vmlinux] Error 1
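
The general shape of such a fix is a header that only refers to a real function when its definition will actually be built, and falls back to an empty inline otherwise. The sketch below is a user-space model under that assumption; CONFIG_MMU_SKETCH, CONFIG_DEBUG_SKETCH and copy_byte_sketch() are hypothetical names, and the real kernel header may be structured differently.

```c
/* Counts how often the debug check ran; only the "MMU + debug"
 * variant below ever increments it. */
static int fault_checks_run;

#if defined(CONFIG_MMU_SKETCH) && defined(CONFIG_DEBUG_SKETCH)
/* Model of the checking variant. In a kernel this would live in a
 * separate source file, which is why a build that omits that file
 * must not reference it. */
static void might_fault(void)
{
	fault_checks_run++;
}
#else
/* No MMU (or no debug option): nothing can fault, so the annotation
 * compiles away and no external symbol is needed. */
static inline void might_fault(void) { }
#endif

/* A caller annotated with might_fault(), as user-copy helpers are. */
static int copy_byte_sketch(char *dst, const char *src)
{
	might_fault();
	*dst = *src;
	return 0;
}
```

Built without either macro defined, the call in copy_byte_sketch() resolves to the empty inline, so the link step has no might_fault symbol to look for.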

Signed-off-by: Axel Lin
Acked-by: Michael S. Tsirkin
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Axel Lin
2013-12-13 10:19:26 +0800
eb3c22728 drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap

Update month and day of month to the alarm month/day instead of current
day/month when setting the RTC alarm mask.

Signed-off-by: Linus Pizunski
Signed-off-by: Nicolas Ferre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Pizunski
2013-12-13 10:19:26 +0800
ae5758a1a procfs: also fix proc_reg_get_unmapped_area() for !MMU case

Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on
MMU-present architectures"), as its title says, took care of only the
MMU case, leaving the !MMU side still in the regressed state (returning
-EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).

From the fad1a86e25e0 changelog:

"Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file, which
causes regression of mmap on /proc/vmcore.

To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation
in the procfs file, is not defined"

Signed-off-by: Jan Beulich
Cc: HATAYAMA Daisuke
Cc: Alexey Dobriyan
Cc: David S. Miller
Cc: [3.12.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2013-12-13 10:19:26 +0800
a0d8b00a3 mm: memcg: do not declare OOM from __GFP_NOFAIL allocations

Commit 84235de394d9 ("fs: buffer: move allocation failure loop into the
allocator") started recognizing __GFP_NOFAIL in memory cgroups but
forgot to disable the OOM killer.

Any task that does not fail allocation will also not enter the OOM
completion path. So don't declare an OOM state in this case or it'll be
leaked and the task will be able to bypass the limit until the next
userspace-triggered page fault cleans up the OOM state.

Reported-by: William Dauchy
Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: David Rientjes
Cc: [3.12.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2013-12-13 10:19:26 +0800
f40386a4e include/linux/hugetlb.h: make isolate_huge_page() an inline

With CONFIG_HUGETLBFS=n:

mm/migrate.c: In function `do_move_page_to_node_array':
include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value]
#define isolate_huge_page(p, l) false
^
mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page'
isolate_huge_page(page, &pagelist);
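
The warning arises because a call whose result is ignored expands to the bare statement "false;". A stub written as an inline function avoids that while keeping the same return value. The sketch below illustrates the idea with hypothetical names (isolate_huge_page_sketch, move_pages_sketch) and opaque types; it is not the kernel header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque placeholders for the kernel types used in the signature. */
struct page;
struct list_head;

/* Macro-style stub that triggers the warning when the result is
 * unused, shown here only as a comment:
 *
 *     #define isolate_huge_page(p, l) false
 */

/* Inline stub: still always reports "nothing isolated", but a call
 * with an ignored result is an ordinary function call, not a
 * statement with no effect. */
static inline bool isolate_huge_page_sketch(struct page *page,
					    struct list_head *list)
{
	(void)page;
	(void)list;
	return false;
}

/* Caller that ignores the result, like the call site in the warning. */
static int move_pages_sketch(struct page *page, struct list_head *pagelist)
{
	isolate_huge_page_sketch(page, pagelist);
	return 0;
}
```

An inline stub also type-checks its arguments, which the macro form does not.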

Reported-by: Borislav Petkov
Tested-by: Borislav Petkov
Signed-off-by: Naoya Horiguchi
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2013-12-13 10:19:25 +0800
54fb723cc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Four security fixes for KVM on x86. Thanks to Andrew Honig and Lars
Bull from Google for reporting them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)
KVM: Improve create VCPU parameter (CVE-2013-4587)

Linus Torvalds
2013-12-13 07:46:06 +0800
ea1e61cbb Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Another week, another batch of fixes.

Again, OMAP regressions due to move to DT is the bulk of the changes
here, but this should be the last of it for 3.13. There are also a
handful of OMAP hwmod changes (power management, reset handling) for
USB on OMAP3 that fixes some longish-standing bugs around USB resets.

There are a couple of other changes that also add up line count a bit:
One is a long-standing bug with the keyboard layout on one of the PXA
platforms. The other is a fix for highbank that moves their
power-off/reset button handling to be done in-kernel since relying on
userspace to handle it was fragile and awkward"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: sun6i: dt: Fix interrupt trigger types
ARM: sun7i: dt: Fix interrupt trigger types
MAINTAINERS: merge IMX6 entry into IMX
ARM: tegra: add missing break to fuse initialization code
ARM: pxa: prevent PXA270 occasional reboot freezes
ARM: pxa: tosa: fix keys mapping
ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected
ARM: OMAP2+: hwmod: Fix usage of invalid iclk / oclk when clock node is not present
ARM: OMAP3: hwmod data: Don't prevent RESET of USB Host module
ARM: OMAP2+: hwmod: Fix SOFTRESET logic
ARM: OMAP4+: hwmod data: Don't prevent RESET of USB Host module
ARM: dts: Fix booting for secure omaps
ARM: OMAP2+: Fix the machine entry for am3517
ARM: dts: Fix missing entries for am3517
ARM: OMAP2+: Fix overwriting hwmod data with data from device tree
ARM: davinci: Fix McASP mem resource names
ARM: highbank: handle soft poweroff and reset key events
ARM: davinci: fix number of resources passed to davinci_gpio_register()
gpio: davinci: fix check for unbanked gpio

Linus Torvalds
2013-12-13 07:45:03 +0800
e09f67f14 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes. It was rebased this morning, but
I was just fixing signed-off-by tags with the wrong email"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix access_ok() check in btrfs_ioctl_send()
Btrfs: make sure we cleanup all reloc roots if error happens
Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation
Btrfs: fix an oops when doing balance relocation
Btrfs: don't miss skinny extent items on delayed ref head contention
btrfs: call mnt_drop_write after interrupted subvol deletion
Btrfs: don't clear the default compression type

Linus Torvalds
2013-12-13 07:25:10 +0800
c9111b4df Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linux

Pull nfsd reply cache bugfix from Bruce Fields:
"One bugfix for nfsd crashes"

* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
nfsd: when reusing an existing repcache entry, unhash it first

Linus Torvalds
2013-12-13 07:24:32 +0800
17d68b763 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

A guest can cause a BUG_ON() leading to a host kernel crash.
When the guest writes to the ICR to request an IPI while in x2apic
mode, the destination is read from ICR2, which is a register that the
guest can control.

kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
cluster id. A BUG_ON is triggered, which guards against an
out-of-bounds access to map->logical_map and prevents anything really
unsafe from occurring.

The logic in the code is correct from a real HW point of view. The problem
is that KVM supports only one cluster with ID 0 in clustered mode, but
the code that has the bug does not take this into account.

Reported-by: Lars Bull
Cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov
Signed-off-by: Paolo Bonzini

Gleb Natapov
2013-12-13 05:46:18 +0800
fda4e2e85 KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
potential to corrupt kernel memory if userspace provides an address that
is at the end of a page. This patch converts those functions to use
kvm_write_guest_cached and kvm_read_guest_cached. It also checks the
vapic_address specified by userspace during ioctl processing and returns
an error to userspace if the address is not a valid GPA.

This is generally not guest triggerable, because the required write is
done by firmware that runs before the guest. Also, it only affects AMD
processors and oldish Intel that do not have the FlexPriority feature
(unless you disable FlexPriority, of course; then newer processors are
also affected).

Fixes: b93463aa59d6 ('KVM: Accelerated apic support')

Reported-by: Andrew Honig
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig
Signed-off-by: Paolo Bonzini

Andy Honig
2013-12-13 05:39:46 +0800
b963a22e6 KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)

Under guest controllable circumstances apic_get_tmcct will execute a
divide by zero and cause a crash. If the guest cpuid supports
tsc deadline timers and performs the following sequence of requests
the host will crash.
- Set the mode to periodic
- Set the TMICT to 0
- Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
- Set the TMICT to non-zero.
Then the lapic_timer.period will be 0, but the TMICT will not be. If the
guest then reads from the TMCCT then the host will perform a divide by 0.

This patch ensures that if the lapic_timer.period is 0, then the division
does not occur.
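
The guard amounts to checking the divisor before it is used. The following is a simplified sketch of a "current count" computation with that check in place; the function name, its parameters and the choice to return 0 for a zero period are assumptions for illustration rather than the lapic code itself.

```c
#include <stdint.h>

/* Compute a timer's current count from the time remaining until the
 * next expiry. period_ns is guest-influenced in the scenario above, so
 * it must be validated before it is used as a divisor. */
static uint32_t timer_current_count_sketch(uint64_t remaining_ns,
					   uint64_t period_ns,
					   uint32_t ns_per_tick)
{
	/* With a zero period (or tick length) there is nothing to
	 * count down; report 0 instead of dividing by zero. */
	if (period_ns == 0 || ns_per_tick == 0)
		return 0;

	return (uint32_t)((remaining_ns % period_ns) / ns_per_tick);
}
```

For example, timer_current_count_sketch(1050, 1000, 10) yields 5, while any call with a zero period yields 0 without performing the modulo.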

Reported-by: Andrew Honig
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig
Signed-off-by: Paolo Bonzini

Andy Honig
2013-12-13 05:39:45 +0800
338c7dbad KVM: Improve create VCPU parameter (CVE-2013-4587)

In multiple functions the vcpu_id is used as an offset into a bitfield. A
malicious user could specify a vcpu_id greater than 255 in order to set or
clear bits in kernel memory. This could be used to elevate privileges in the
kernel. This patch verifies that the vcpu_id provided is less than 255.
The api documentation already specifies that the vcpu_id must be less than
max_vcpus, but this is currently not checked.
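
A range check of this kind has to happen before the id is used as a bit index. The sketch below shows the pattern in isolation; MAX_VCPUS_SKETCH, struct vm_sketch and create_vcpu_sketch() are hypothetical names, and the limit of 255 is taken from the text above rather than from the KVM sources.

```c
#include <errno.h>
#include <limits.h>

#define MAX_VCPUS_SKETCH 255
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical VM state holding one bit per possible vcpu id. */
struct vm_sketch {
	unsigned long vcpu_bitmap[(MAX_VCPUS_SKETCH + BITS_PER_LONG_SKETCH - 1) /
				  BITS_PER_LONG_SKETCH];
};

/* Reject out-of-range ids before touching the bitmap, so a caller
 * cannot set bits outside the array. */
static int create_vcpu_sketch(struct vm_sketch *vm, unsigned int id)
{
	if (id >= MAX_VCPUS_SKETCH)
		return -EINVAL;

	vm->vcpu_bitmap[id / BITS_PER_LONG_SKETCH] |=
		1UL << (id % BITS_PER_LONG_SKETCH);
	return 0;
}
```

Taking the id as an unsigned value means a single comparison covers both "too large" and what would otherwise be negative inputs.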

Reported-by: Andrew Honig
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig
Signed-off-by: Paolo Bonzini

Andy Honig
2013-12-13 05:39:33 +0800
2208f6513 Merge tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Still a slightly higher number of changes than wished, but they are all
good regression and/or device-specific fixes. Majority of commits are
for HD-audio, an HDMI ctl index fix that hits old graphics boards,
regression fixes for AD codecs and a few quirks.

Other than that, two major fixes are included: a 64bit ABI fix for
compress offload, and 64bit dma_addr_t truncation fix, which had hit
on PAE kernels"

* tag 'sound-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add static DAC/pin mapping for AD1986A codec
ALSA: hda - One more Dell headset detection quirk
ALSA: hda - hdmi: Fix IEC958 ctl indexes for some simple HDMI devices
ALSA: hda - Mute all aamix inputs as default
ALSA: compress: Fix 64bit ABI incompatibility
ALSA: memalloc.h - fix wrong truncation of dma_addr_t
ALSA: hda - Another Dell headset detection quirk
ALSA: hda - A Dell headset detection quirk
ALSA: hda - Remove quirk for Dell Vostro 131
ALSA: usb-audio: fix uninitialized variable compile warning
ALSA: hda - fix mic issues on Acer Aspire E-572

Linus Torvalds
2013-12-13 05:14:25 +0800
ea4ebd1cb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"A fix for recent sysfs breakage in serio subsystem plus a fixup to
adxl34x driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: adxl34x - Fix bug in definition of ADXL346_2D_ORIENT
Input: serio - fix sysfs layout

Linus Torvalds
2013-12-13 05:13:47 +0800
846f29a6a Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"A dvb core deadlock fix, a couple videobuf2 fixes and a series of media
driver fixes"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (30 commits)
[media] videobuf2-dma-sg: fix possible memory leak
[media] vb2: regression fix: always set length field.
[media] mt9p031: Include linux/of.h header
[media] rtl2830: add parent for I2C adapter
[media] media: marvell-ccic: use devm to release clk
[media] ths7303: Declare as static a private function
[media] em28xx-video: Swap release order to avoid lock nesting
[media] usbtv: Add support for PAL video source
[media] media_tree: Fix spelling errors
[media] videobuf2: Add support for file access mode flags for DMABUF exporting
[media] radio-shark2: Mark shark_resume_leds() inline to kill compiler warning
[media] radio-shark: Mark shark_resume_leds() inline to kill compiler warning
[media] af9035: unlock on error in af9035_i2c_master_xfer()
[media] af9033: fix broken I2C
[media] v4l: omap3isp: Don't check for missing get_fmt op on remote subdev
[media] af9035: fix broken I2C and USB I/O
[media] wm8775: fix broken audio routing
[media] marvell-ccic: drop resource free in driver remove
[media] tef6862/radio-tea5764: actually assign clamp result
[media] cx231xx: use after free on error path in probe
...

Linus Torvalds
2013-12-13 03:06:13 +0800
86b581f6f Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix HIH-6130 driver to work with BeagleBone"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: HIH-6130: Support I2C bus drivers without I2C_FUNC_SMBUS_QUICK

Linus Torvalds
2013-12-13 03:05:19 +0800
c8469441c Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull hwmon fixes from Jean Delvare.

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: Prevent some divide by zeros in FAN_TO_REG()
hwmon: (w83l768ng) Fix fan speed control range
hwmon: (w83l786ng) Fix fan speed control mode setting and reporting
hwmon: (lm90) Unregister hwmon device if interrupt setup fails

Linus Torvalds
2013-12-13 03:03:57 +0800
11ec50cae word-at-a-time: provide generic big-endian zero_bytemask implementation

Whilst architectures may be able to do better than this (which they can,
by simply defining their own macro), this is a generic stab at a
zero_bytemask implementation for the asm-generic, big-endian
word-at-a-time implementation.

On arm64, a clz instruction is used to implement the fls efficiently.
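
One way to build such a mask from a "find last set" operation is sketched below. It assumes a 64-bit word in big-endian byte order, an input mask whose most significant set bit marks the first zero byte, and the GCC/Clang builtin __builtin_clzll; the function names are hypothetical and the kernel's own macro may differ.

```c
#include <stdint.h>

/* Index (0..63) of the most significant set bit. The mask must be
 * non-zero; __builtin_clzll is a GCC/Clang builtin that typically
 * compiles to a single count-leading-zeros instruction. */
static inline unsigned int fls64_sketch(uint64_t mask)
{
	return 63u - (unsigned int)__builtin_clzll(mask);
}

/* Given a mask whose highest set bit is the top bit of the first zero
 * byte, return a mask covering only the bytes before that zero byte
 * (the more significant bytes on a big-endian CPU). */
static inline uint64_t zero_bytemask_be_sketch(uint64_t mask)
{
	return ~(uint64_t)1 << fls64_sketch(mask);
}
```

For a word holding the bytes 'a', 'b', '\0' from the most significant end, the marker bit is bit 47 and the result is 0xFFFF000000000000, which selects exactly the two string bytes.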

Signed-off-by: Will Deacon
Signed-off-by: Linus Torvalds

Will Deacon
2013-12-13 02:39:01 +0800
a5c21dcef dcache: allow word-at-a-time name hashing with big-endian CPUs

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper-bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.
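
The endianness dependence can be illustrated with two small helpers that build a mask for the first cnt bytes of a 64-bit word. This is a sketch under stated assumptions: the function names are hypothetical, both variants are shown side by side instead of being selected at build time, and cnt is assumed to be in the range 0..7 (a shift by 64 bits would be undefined in C).

```c
#include <stdint.h>

/* Little-endian: the first bytes of the string sit in the low-order
 * bits, so keep the low cnt bytes. */
static inline uint64_t bytemask_from_count_le_sketch(unsigned int cnt)
{
	return ~(~(uint64_t)0 << (cnt * 8));
}

/* Big-endian: the first bytes of the string sit in the high-order
 * bits, so keep the high cnt bytes. */
static inline uint64_t bytemask_from_count_be_sketch(unsigned int cnt)
{
	return ~(~(uint64_t)0 >> (cnt * 8));
}
```

Applying the mask to the final, partial word before hashing discards whatever bytes happen to follow the end of the string.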

Cc: Al Viro
Signed-off-by: Will Deacon
Signed-off-by: Linus Torvalds

Will Deacon
2013-12-13 02:39:01 +0800
319720f53 Merge tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio

Pull iommu fixes from Alex Williamson:
"arm/smmu driver updates via Will Deacon fixing locking around page
table walks and a couple other issues"

* tag 'iommu-fixes-for-v3.13-rc4' of git://github.com/awilliam/linux-vfio:
iommu/arm-smmu: fix error return code in arm_smmu_device_dt_probe()
iommu/arm-smmu: remove potential NULL dereference on mapping path
iommu/arm-smmu: use mutex instead of spinlock for locking page tables

Linus Torvalds
2013-12-13 02:20:58 +0800
5dec682c7 Merge tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull misc keyrings fixes from David Howells:
"These break down into five sets:

- A patch to error handling in the big_key type for huge payloads.
If the payload is larger than the "low limit" and the backing store
allocation fails, then big_key_instantiate() doesn't clear the
payload pointers in the key, assuming them to have been previously
cleared - but only one of them is.

Unfortunately, the garbage collector still calls big_key_destroy()
when it sees one of the pointers with a weird value in it (and not
NULL) which it then tries to clean up.

- Three patches to fix the keyring type:

* A patch to fix the hash function to correctly divide keyrings off
from keys in the topology of the tree inside the associative
array. This is only a problem if searching through nested
keyrings - and only if the hash function incorrectly puts a
keyring outside of the 0 branch of the root node.

* A patch to fix keyrings' use of the associative array. The
__key_link_begin() function initially passes a NULL key pointer
to assoc_array_insert() on the basis that it's holding a place in
the tree whilst it does more allocation and stuff.

This is only a problem when a node contains 16 keys that match at
that level and we want to add an also matching 17th. This should
easily be manufactured with a keyring full of keyrings (without
chucking any other sort of key into the mix) - except for (a)
above which makes it on average adding the 65th keyring.

* A patch to fix searching down through nested keyrings, where any
keyring in the set has more than 16 keyrings and none of the
first keyrings we look through has a match (before the tree
iteration needs to step to a more distal node).

Test in keyutils test suite:

http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1

- A patch to fix the big_key type's use of a shmem file as its
backing store causing audit messages and LSM check failures. This
is done by setting S_PRIVATE on the file to avoid LSM checks on the
file (access to the shmem file goes through the keyctl() interface
and so is gated by the LSM that way).

This isn't normally a problem if a key is used by the context that
generated it - and it's currently only used by libkrb5.

Test in keyutils test suite:

http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e

- A patch to add a generated file to .gitignore.

- A patch to fix the alignment of the system certificate data such
that it works on s390. As I understand it, on the S390 arch,
symbols must be 2-byte aligned because loading the address discards
the least-significant bit"

* tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
KEYS: correct alignment of system_certificate_list content in assembly file
Ignore generated file kernel/x509_certificate_list
security: shmem: implement kernel private shmem inodes
KEYS: Fix searching of nested keyrings
KEYS: Fix multiple key add into associative array
KEYS: Fix the keyring hash function
KEYS: Pre-clear struct key on allocation
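
The big_key item in the pull message above explains why "KEYS: Pre-clear struct key on allocation" matters: a destructor that frees every non-NULL payload pointer is only safe if allocation zeroed those pointers. The following is a minimal userspace sketch of that idea, not the kernel code. The names toy_key, toy_key_alloc(), toy_instantiate() and toy_destroy() are hypothetical, and calloc() stands in for a zeroing allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a key with two payload pointers. */
struct toy_key {
    void *payload[2];
};

/* Pre-clear on allocation: every payload pointer starts out NULL
 * (assuming all-bits-zero is a null pointer, as on common platforms). */
static struct toy_key *toy_key_alloc(void)
{
    return calloc(1, sizeof(struct toy_key));
}

/* On failure this returns early and never touches the payload pointers,
 * mirroring an instantiate path that assumes they were already cleared. */
static int toy_instantiate(struct toy_key *key, bool backing_store_fails)
{
    if (backing_store_fails)
        return -ENOMEM;

    key->payload[0] = malloc(16);
    key->payload[1] = malloc(16);
    if (!key->payload[0] || !key->payload[1]) {
        free(key->payload[0]);
        free(key->payload[1]);
        key->payload[0] = key->payload[1] = NULL;
        return -ENOMEM;
    }
    return 0;
}

/* The destructor cleans up whatever the pointers hold; free(NULL) is a
 * no-op, so a pre-cleared key that failed to instantiate is harmless. */
static void toy_destroy(struct toy_key *key)
{
    free(key->payload[0]);
    free(key->payload[1]);
    free(key);
}
```

Had toy_key_alloc() used malloc() instead, a failed toy_instantiate() would leave indeterminate pointers for toy_destroy() to free, which is the shape of the problem the pull message describes.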

Linus Torvalds
2013-12-13 02:15:24 +0800
48a2f0b27 Merge tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:

- fix for buffer overrun in agfl with growfs on v4 superblock

- return EINVAL if requested discard length is less than a block

- fix possible memory corruption in xfs_attrlist_by_handle()

* tag 'xfs-for-linus-v3.13-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: growfs overruns AGFL buffer on V4 filesystems
xfs: don't perform discard if the given range length is less than block size
xfs: underflow bug in xfs_attrlist_by_handle()

Linus Torvalds
2013-12-13 02:14:13 +0800
5cdec2d83 futex: move user address verification up to common code

When debugging the read-only hugepage case, I was confused by the fact
that get_futex_key() did an access_ok() only for the non-shared futex
case, since the user address checking really isn't in any way specific
to the private key handling.

Now, it turns out that the shared key handling does effectively do the
equivalent checks inside get_user_pages_fast() (it doesn't actually
check the address range on x86, but does check the page protections for
being a user page). So it wasn't actually a bug, but the fact that we
treat the address differently for private and shared futexes threw me
for a loop.

Just move the check up, so that it gets done for both cases. Also, use
the 'rw' parameter for the type, even if it doesn't actually matter any
more (it's a historical artifact of the old racy i386 "page faults from
kernel space don't check write protections").
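
The restructuring described above can be sketched in userspace as follows. This is a toy model under stated assumptions, not the kernel's get_futex_key(): toy_access_ok(), TOY_USER_LIMIT, and both toy_get_key_*() functions are hypothetical, and the shared path simply returns a constant where the real code goes through get_user_pages_fast().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound of the "user" address range in this model. */
#define TOY_USER_LIMIT ((uintptr_t)0x7fffffff)

enum toy_key_kind { TOY_KEY_PRIVATE = 1, TOY_KEY_SHARED = 2 };

/* Toy range check: the whole [addr, addr + len) span must stay in range. */
static bool toy_access_ok(uintptr_t addr, size_t len)
{
    return addr <= TOY_USER_LIMIT && len <= TOY_USER_LIMIT - addr;
}

/* Before: only the private (non-shared) path verified the address. */
static int toy_get_key_old(uintptr_t uaddr, bool shared)
{
    if (!shared) {
        if (!toy_access_ok(uaddr, sizeof(uint32_t)))
            return -EFAULT;
        return TOY_KEY_PRIVATE;
    }
    /* In this model the shared path performs no check of its own. */
    return TOY_KEY_SHARED;
}

/* After: the check sits in common code and covers both cases. */
static int toy_get_key_new(uintptr_t uaddr, bool shared)
{
    if (!toy_access_ok(uaddr, sizeof(uint32_t)))
        return -EFAULT;
    return shared ? TOY_KEY_SHARED : TOY_KEY_PRIVATE;
}
```

In this model an out-of-range address passed with shared set slips through toy_get_key_old() but is rejected with -EFAULT by toy_get_key_new(). As the commit message notes, the real shared path was not actually unchecked, so the model only illustrates the asymmetry that the change removes.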

Cc: Thomas Gleixner
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-12-13 01:53:51 +0800
f12d5bfce futex: fix handling of read-only-mapped hugepages

The hugepage code had the exact same bug that regular pages had in
commit 7485d0d3758e ("futexes: Remove rw parameter from
get_futex_key()").

The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0b1: "thp: update futex compound knowledge")
remained broken.

Found by Dave Jones and his trinity tool.

Reported-and-tested-by: Dave Jones
Cc: stable@kernel.org # v2.6.38+
Acked-by: Thomas Gleixner
Cc: Mel Gorman
Cc: Darren Hart
Cc: Andrea Arcangeli
Cc: Oleg Nesterov
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-12-13 01:38:42 +0800

12 Dec, 2013

3 commits

700ff4f09 Btrfs: fix access_ok() check in btrfs_ioctl_send()

The closing parenthesis is in the wrong place. We want to check
"sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
"sizeof(*arg->clone_sources * arg->clone_sources_count)".
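
The effect of the misplaced parenthesis can be reproduced in plain C. The sketch below assumes a hypothetical struct send_args that models only the two fields named above, with 64-bit element and count types chosen for illustration. Because the operand of sizeof is not evaluated, the pointer is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ioctl argument; only the two fields
 * mentioned in the commit message are modelled. */
struct send_args {
    uint64_t *clone_sources;
    uint64_t clone_sources_count;
};

/* Buggy form: sizeof is applied to the product expression, so the result
 * is just the size of the product's type, whatever the count is. */
static size_t buggy_len(const struct send_args *arg)
{
    assert(arg != NULL);
    return sizeof(*arg->clone_sources * arg->clone_sources_count);
}

/* Fixed form: element size multiplied by the element count. */
static size_t fixed_len(const struct send_args *arg)
{
    assert(arg != NULL);
    return sizeof(*arg->clone_sources) * arg->clone_sources_count;
}
```

With a count of 4, buggy_len() yields sizeof(uint64_t) while fixed_len() yields four times that, so a range check built on the buggy length covers only the first element.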

Signed-off-by: Dan Carpenter
Reviewed-by: Jie Liu
Signed-off-by: Chris Mason
cc: stable@vger.kernel.org

Dan Carpenter
2013-12-12 23:13:02 +0800
467bb1d27 Btrfs: make sure we cleanup all reloc roots if error happens

I hit an oops when merging reloc roots failed. The reason is that
new reloc roots may be added, so we must make sure we clean up
all reloc roots.

Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Wang Shilong
2013-12-12 23:12:51 +0800
664637486 Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation

The quota tree and UUID tree are only COWed; they cannot be snapshotted.

Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Wang Shilong
2013-12-12 23:12:36 +0800