Doug / smarc-fsl-linux-kernel | Embedian Git Server

27 Dec, 2011

21 commits

1a214246c KVM: make checks stricter in coalesced_mmio_in_range() ... Browse Code »

My testing version of Smatch complains that addr and len come from
the user and they can wrap. The path is:
-> kvm_vm_ioctl()
-> kvm_vm_ioctl_unregister_coalesced_mmio()
-> coalesced_mmio_in_range()

I don't know what the implications are of wrapping here, but we may
as well fix it, if only to silence the warning.

Signed-off-by: Dan Carpenter
Signed-off-by: Marcelo Tosatti

Dan Carpenter
2011-12-27 17:17:07 +0800
3f2e5260f KVM: x86: Simplify kvm timer handler ... Browse Code »

The vcpu reference of a kvm_timer can't become NULL while the timer is
valid, so drop this redundant test. This also makes it pointless to
carry a separate __kvm_timer_fn, fold it into kvm_timer_fn.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti

Jan Kiszka
2011-12-27 17:17:05 +0800
b297e672e KVM: Fix include dependency for mmu_notifier ... Browse Code »

The kvm_host struct can include an mmu_notifier struct but mmu_notifier.h is
not included directly.

Signed-off-by: Eric B Munson
Signed-off-by: Avi Kivity

Eric B Munson
2011-12-27 17:17:04 +0800
a30f47cb1 KVM: MMU: improve write flooding detected ... Browse Code »

Detecting write-flooding does not work well, when we handle page written, if
the last speculative spte is not accessed, we treat the page is
write-flooding, however, we can speculative spte on many path, such as pte
prefetch, page synced, that means the last speculative spte may be not point
to the written page and the written page can be accessed via other sptes, so
depends on the Accessed bit of the last speculative spte is not enough

Instead of detected page accessed, we can detect whether the spte is accessed
after it is written, if the spte is not accessed but it is written frequently,
we treat is not a page table or it not used for a long time

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:17:02 +0800
5d9ca30e9 KVM: MMU: fix detecting misaligned accessed ... Browse Code »

Sometimes, we only modify the last one byte of a pte to update status bit,
for example, clear_bit is used to clear r/w bit in linux kernel and 'andb'
instruction is used in this function, in this case, kvm_mmu_pte_write will
treat it as misaligned access, and the shadow page table is zapped

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:17:01 +0800
889e5cbce KVM: MMU: split kvm_mmu_pte_write function ... Browse Code »

kvm_mmu_pte_write is too long, we split it for better readable

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:59 +0800
f8734352c KVM: MMU: remove unnecessary kvm_mmu_free_some_pages ... Browse Code »

In kvm_mmu_pte_write, we do not need to alloc shadow page, so calling
kvm_mmu_free_some_pages is really unnecessary

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:58 +0800
f57f2ef58 KVM: MMU: fast prefetch spte on invlpg path ... Browse Code »

Fast prefetch spte for the unsync shadow page on invlpg path

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:56 +0800
505aef8f3 KVM: MMU: cleanup FNAME(invlpg) ... Browse Code »

Directly Use mmu_page_zap_pte to zap spte in FNAME(invlpg), also remove the
same code between FNAME(invlpg) and FNAME(sync_page)

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:54 +0800
d01f8d5e0 KVM: MMU: do not mark accessed bit on pte write path ... Browse Code »

In current code, the accessed bit is always set when page fault occurred,
do not need to set it on pte write path

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:53 +0800
6f6fbe98c KVM: x86: cleanup port-in/port-out emulated ... Browse Code »

Remove the same code between emulator_pio_in_emulated and
emulator_pio_out_emulated

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:51 +0800
1cb3f3ae5 KVM: x86: retry non-page-table writing instructions ... Browse Code »

If the emulation is caused by #PF and it is non-page_table writing instruction,
it means the VM-EXIT is caused by shadow page protected, we can zap the shadow
page and retry this instruction directly

The idea is from Avi

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:50 +0800
d5ae7ce83 KVM: x86: tag the instructions which are used to write page table ... Browse Code »

The idea is from Avi:
| tag instructions that are typically used to modify the page tables, and
| drop shadow if any other instruction is used.
| The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg,
| and cmpxchg8b.

This patch is used to tag the instructions and in the later path, shadow page
is dropped if it is written by other instructions

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:48 +0800
f759e2b4c KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write ... Browse Code »

kvm_mmu_pte_write is unsafe since we need to alloc pte_list_desc in the
function when spte is prefetched, unfortunately, we can not know how many
spte need to be prefetched on this path, that means we can use out of the
free pte_list_desc object in the cache, and BUG_ON() is triggered, also some
path does not fill the cache, such as INS instruction emulated that does not
trigger page fault

Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity

Xiao Guangrong
2011-12-27 17:16:47 +0800
51cfe38ea KVM: nVMX: Fix warning-causing idt-vectoring-info behavior ... Browse Code »

When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled "Correct handling of interrupt injection".

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
and when L2 tried to run this interrupt's handler, it got a page fault - so
it returns the original interrupt vector in idt_vectoring_info. The problem
is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
like we wished to, because the VMX spec guarantees that idt_vectoring_info
and exit_reason_external_interrupt can never happen together. This is not
just specified in the spec - a KVM L1 actually prints a kernel warning
"unexpected, valid vectoring info" if we violate this guarantee, and some
users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: First, complete L1's injection into L2 (i.e., enter L2,
injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
Most of this is already in the code - the only change we need is to remain
in L2 (and not exit to L1) in this case.

Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that
although we do enter L2 first, it will exit immediately after processing its
injection, allowing us to promptly inject to L1.

Note how we test vmcs12->idt_vectoring_info_field; This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
of this field. This was explained in patch 25 ("Correct handling of idt
vectoring info") of the original nVMX patch series.

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.

Signed-off-by: Nadav Har'El
Signed-off-by: Avi Kivity

Nadav Har'El
2011-12-27 17:16:45 +0800
d6185f20a KVM: nVMX: Add KVM_REQ_IMMEDIATE_EXIT ... Browse Code »

This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
This bit requests that when next entering the guest, we should run it only
for as little as possible, and exit again.

We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
to continue running so it can inject an event to it, we unfortunately cannot
just pretend to have run L2 for a little while - We must really launch L2,
otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
will be lost. So the existing code runs L2 in this case.
But L2 could potentially run for a long time until it exits, and the
injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
to request that L2 will be entered, as necessary, but will exit as soon as
possible after entry.

Our implementation of this request uses smp_send_reschedule() to send a
self-IPI, with interrupts disabled. The interrupts remain disabled until the
guest is entered, and then, after the entry is complete (often including
processing an injection and jumping to the relevant handler), the physical
interrupt is noticed and causes an exit.

On recent Intel processors, we could have achieved the same goal by using
MTF instead of a self-IPI. Another technique worth considering in the future
is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
slightly improve performance by avoiding the useless interrupt handler
which ends up being called when smp_send_reschedule() is used.

Signed-off-by: Nadav Har'El
Signed-off-by: Avi Kivity

Nadav Har'El
2011-12-27 17:16:43 +0800
371de6e4e drm/i915: Disable RC6 on Sandybridge by default ... Browse Code »

RC6 fails again.

> I found my system freeze mostly during starting up X and KDE. Sometimes it
> works for some minutes, sometimes it freezes immediatly. When the freeze
> happens, everything is dead (even the reset button does not work, I need to
> power cycle).

> I disabled RC6, and my system runs wonderfully.

> The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
> GB of RAM and UEFI firmware.

Reported-by: Kai Krakow
Signed-off-by: Keith Packard
Signed-off-by: Linus Torvalds

Keith Packard
2011-12-27 13:07:27 +0800
ebbd857e6 drm/i915: Disable semaphores by default on SNB ... Browse Code »

Semaphores still cause problems on some machines:

> From Udo Steinberg:
>
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
> text scroll in an xterm, such as when extracting a tar archive. Such as this
> one (note the timestamps):
>
> I can reproduce it fairly easily with something
> as simple as:
>
> while true; do dmesg; done

This patch turns them off on SNB while leaving them on for IVB.

Reported-by: Udo Steinberg
Cc: Daniel Vetter
Cc: Eugeni Dodonov
Signed-off-by: Keith Packard
Signed-off-by: Linus Torvalds

Keith Packard
2011-12-27 13:07:26 +0800
7f54492fb Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: e500: include linux/export.h
KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
KVM: PPC: protect use of kvmppc_h_pr
KVM: PPC: move compute_tlbie_rb to book3s_64 common header
KVM: Don't automatically expose the TSC deadline timer in cpuid
KVM: Device assignment permission checks
KVM: Remove ability to assign a device without iommu support
KVM: x86: Prevent starting PIT timers in the absence of irqchip support

Linus Torvalds
2011-12-27 05:17:00 +0800
6fd8fb7f5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 ... Browse Code »

post 3.2-rc7 pull request

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
MAINTAINERS: firewire git URL update

Linus Torvalds
2011-12-27 04:46:17 +0800
6d4b9e38d vfs: fix handling of lock allocation failure in lease-break case ... Browse Code »

Bruce Fields notes that commit 778fc546f749 ("locks: fix tracking of
inprogress lease breaks") introduced a possible error pointer
dereference on failure to allocate memory. locks_conflict() will
dereference the passed-in new lease lock structure that may be an error pointer.

This means an open (without O_NONBLOCK set) on a file with a lease
applied (generally only done when Samba or nfsd (with v4) is running)
could crash if a kmalloc() fails.

So instead of playing games with IS_ERROR() all over the place, just
check the allocation failure early. That makes the code more
straightforward, and avoids this possible bad pointer dereference.

Based-on-patch-by: J. Bruce Fields
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-12-27 02:25:26 +0800

26 Dec, 2011

6 commits

fae9dbb4b KVM: PPC: e500: include linux/export.h ... Browse Code »

This is required for THIS_MODULE. We recently stopped acquiring
it via some other header.

Signed-off-by: Scott Wood
Signed-off-by: Alexander Graf

Scott Wood
2011-12-26 19:28:03 +0800
251da0389 KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N ... Browse Code »

Currently kvmppc_start_thread() tries to wake other SMT threads via
xics_wake_cpu(). Unfortunately xics_wake_cpu only exists when
CONFIG_SMP=Y so when compiling with CONFIG_SMP=N we get:

arch/powerpc/kvm/built-in.o: In function `.kvmppc_start_thread':
book3s_hv.c:(.text+0xa1e0): undefined reference to `.xics_wake_cpu'

The following should be fine since kvmppc_start_thread() shouldn't
called to start non-zero threads when SMP=N since threads_per_core=1.

Signed-off-by: Michael Neuling
Signed-off-by: Alexander Graf

Michael Neuling
2011-12-26 19:28:02 +0800
96f38d728 KVM: PPC: protect use of kvmppc_h_pr ... Browse Code »

kvmppc_h_pr is only available if CONFIG_KVM_BOOK3S_64_PR.

Signed-off-by: Andreas Schwab
Signed-off-by: Alexander Graf

Andreas Schwab
2011-12-26 19:28:01 +0800
36cc66d63 KVM: PPC: move compute_tlbie_rb to book3s_64 common header ... Browse Code »

compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

Signed-off-by: Andreas Schwab
Signed-off-by: Alexander Graf

Andreas Schwab
2011-12-26 19:28:00 +0800
4d25a066b KVM: Don't automatically expose the TSC deadline timer in cpuid ... Browse Code »

Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.

This is broken in several ways:
- if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
deadline timer feature, a guest that uses the feature will break
- live migration to older host kernels that don't support the TSC deadline
timer will cause the feature to be pulled from under the guest's feet;
breaking it
- guests that are broken wrt the feature will fail.

Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.

Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]

Reported-by: Alexey Zaytsev
Tested-by: Alexey Zaytsev
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2011-12-26 19:27:44 +0800
3d27e23b1 KVM: Device assignment permission checks ... Browse Code »

Only allow KVM device assignment to attach to devices which:

- Are not bridges
- Have BAR resources (assume others are special devices)
- The user has permissions to use

Assigning a bridge is a configuration error, it's not supported, and
typically doesn't result in the behavior the user is expecting anyway.
Devices without BAR resources are typically chipset components that
also don't have host drivers. We don't want users to hold such devices
captive or cause system problems by fencing them off into an iommu
domain. We determine "permission to use" by testing whether the user
has access to the PCI sysfs resource files. By default a normal user
will not have access to these files, so it provides a good indication
that an administration agent has granted the user access to the device.

[Yang Bai: add missing #include]
[avi: fix comment style]

Signed-off-by: Alex Williamson
Signed-off-by: Yang Bai
Signed-off-by: Marcelo Tosatti

Alex Williamson
2011-12-26 01:03:54 +0800

25 Dec, 2011

4 commits

423873736 KVM: Remove ability to assign a device without iommu support ... Browse Code »

This option has no users and it exposes a security hole that we
can allow devices to be assigned without iommu protection. Make
KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

Signed-off-by: Alex Williamson
Signed-off-by: Marcelo Tosatti

Alex Williamson
2011-12-25 23:13:31 +0800
0924ab2cf KVM: x86: Prevent starting PIT timers in the absence of irqchip support ... Browse Code »

User space may create the PIT and forgets about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
IP: [] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
[] pit_do_work+0x51/0xd0 [kvm]
[] process_one_work+0x111/0x4d0
[] worker_thread+0x152/0x340
[] kthread+0x7e/0x90
[] kernel_thread_helper+0x4/0x10

Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti

Jan Kiszka
2011-12-25 23:13:18 +0800
2ca526bf4 MAINTAINERS: firewire git URL update ... Browse Code »

Signed-off-by: Stefan Richter

Stefan Richter
2011-12-25 21:05:05 +0800
4962516b2 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux ... Browse Code »

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: fix incorrect VRAM size check in vmw_kms_fb_create()
drm/radeon/kms: bail on BTC parts if MC ucode is missing

Linus Torvalds
2011-12-25 05:34:44 +0800

24 Dec, 2011

9 commits

5f0a6e2d5 Linux 3.2-rc7 Browse Code »

Linus Torvalds
2011-12-24 13:51:06 +0800
a22681fab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Fix race between CPU hotplug and lglocks

Linus Torvalds
2011-12-24 13:47:28 +0800
6d451c578 Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux ... Browse Code »

for linus: writeback reason binary tracing format fix

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: show writeback reason with __print_symbolic

Linus Torvalds
2011-12-24 12:25:36 +0800
71448c1f4 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig: adapt update-po-config to new UML layout

Linus Torvalds
2011-12-24 07:01:24 +0800
4d18de944 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media ... Browse Code »

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes

Linus Torvalds
2011-12-24 06:59:08 +0800
827fa4c76 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: call d_instantiate after all ops are setup
Btrfs: fix worker lock misuse in find_worker

Linus Torvalds
2011-12-24 06:58:39 +0800
5d219c6b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().

Linus Torvalds
2011-12-24 06:58:14 +0800
155d4551b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: xt_connbytes: handle negation correctly
net: relax rcvbuf limits
rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()
net: introduce DST_NOPEER dst flag
mqprio: Avoid panic if no options are provided
bridge: provide a mtu() method for fake_dst_ops

Linus Torvalds
2011-12-24 06:57:55 +0800
6350323ad Merge branch 'nf' of git://1984.lsi.us.es/net Browse Code »

David S. Miller
2011-12-24 03:29:20 +0800