Eric Lee / smarc-fsl-linux-kernel

14 Dec, 2011

11 commits

216284c35 block, cfq: fix race condition in cic creation path and tighten locking ... Browse Code »

cfq_get_io_context() would fail if multiple tasks race to insert cic's
for the same association. This patch restructures
cfq_get_io_context() such that slow path insertion race is handled
properly.

Note that the restructuring also makes cfq_get_io_context() called
under queue_lock and performs both ioc and cfqd insertions while
holding both ioc and queue locks. This is part of on-going locking
tightening and will be used to simplify synchronization rules.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:38 +0800
dc86900e0 block, cfq: move ioc ioprio/cgroup changed handling to cic ... Browse Code »

ioprio/cgroup change was handled by marking the changed state in ioc
and, on the following access to the ioc, performing RCU-protected
iteration through all cic's grabbing the matching queue_lock.

This patch moves the changed state to each cic. When ioprio or cgroup
changes, the respective bit is set on all cic's of the ioc and when
each of those cic (not ioc) is accessed, change is applied for that
specific ioc-queue pair.

This also fixes the following two race conditions between setting and
clearing of changed states.

* Missing barrier between assign/load of ioprio and ioprio_changed
allowed applying old ioprio.

* Change requests could happen between application of change and
clearing of changed variables.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:38 +0800
283287a52 block, cfq: misc updates to cfq_io_context ... Browse Code »

Make the following changes to prepare for ioc/cic management cleanup.

* Add cic->q so that ioc can determine the associated queue without
querying cfq. This will eventually replace ->key.

* Factor out cfq_release_cic() from cic_free_func(). This function
assumes that the caller handled locking.

* Rename __cfq_exit_single_io_context() to cfq_exit_cic() and make it
take only @cic.

* Restructure cfq_cic_link() for future updates.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:38 +0800
09ac46c42 block: misc updates to blk_get_queue() ... Browse Code »

* blk_get_queue() is peculiar in that it returns 0 on success and 1 on
failure instead of 0 / -errno or boolean. Update it such that it
returns %true on success and %false on failure.

* Make sure the caller checks for the return value.

* Separate out __blk_get_queue() which doesn't check whether @q is
dead and put it in blk.h. This will be used later.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:38 +0800
6e736be7f block: make ioc get/put interface more conventional and fix race on alloction ... Browse Code »

Ignoring copy_io() during fork, io_context can be allocated from two
places - current_io_context() and set_task_ioprio(). The former is
always called from local task while the latter can be called from
different task. The synchornization between them are peculiar and
dubious.

* current_io_context() doesn't grab task_lock() and assumes that if it
saw %NULL ->io_context, it would stay that way until allocation and
assignment is complete. It has smp_wmb() between alloc/init and
assignment.

* set_task_ioprio() grabs task_lock() for assignment and does
smp_read_barrier_depends() between "ioc = task->io_context" and "if
(ioc)". Unfortunately, this doesn't achieve anything - the latter
is not a dependent load of the former. ie, if ioc itself were being
dereferenced "ioc->xxx", it would mean something (not sure what tho)
but as the code currently stands, the dependent read barrier is
noop.

As only one of the the two test-assignment sequences is task_lock()
protected, the task_lock() can't do much about race between the two.
Nothing prevents current_io_context() and set_task_ioprio() allocating
its own ioc for the same task and overwriting the other's.

Also, set_task_ioprio() can race with exiting task and create a new
ioc after exit_io_context() is finished.

ioc get/put doesn't have any reason to be complex. The only hot path
is accessing the existing ioc of %current, which is simple to achieve
given that ->io_context is never destroyed as long as the task is
alive. All other paths can happily go through task_lock() like all
other task sub structures without impacting anything.

This patch updates ioc get/put so that it becomes more conventional.

* alloc_io_context() is replaced with get_task_io_context(). This is
the only interface which can acquire access to ioc of another task.
On return, the caller has an explicit reference to the object which
should be put using put_io_context() afterwards.

* The functionality of current_io_context() remains the same but when
creating a new ioc, it shares the code path with
get_task_io_context() and always goes through task_lock().

* get_io_context() now means incrementing ref on an ioc which the
caller already has access to (be that an explicit refcnt or implicit
%current one).

* PF_EXITING inhibits creation of new io_context and once
exit_io_context() is finished, it's guaranteed that both ioc
acquisition functions return %NULL.

* All users are updated. Most are trivial but
smp_read_barrier_depends() removal from cfq_get_io_context() needs a
bit of explanation. I suppose the original intention was to ensure
ioc->ioprio is visible when set_task_ioprio() allocates new
io_context and installs it; however, this wouldn't have worked
because set_task_ioprio() doesn't have wmb between init and install.
There are other problems with this which will be fixed in another
patch.

* While at it, use NUMA_NO_NODE instead of -1 for wildcard node
specification.

-v2: Vivek spotted contamination from debug patch. Removed.

Signed-off-by: Tejun Heo
Cc: Vivek Goyal
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:38 +0800
42ec57a8f block: misc ioc cleanups ... Browse Code »

* int return from put_io_context() wasn't used by anybody. Make it
return void like other put functions and docbook-fy the function
comment.

* Reorder dummy declarations for !CONFIG_BLOCK case a bit.

* Make alloc_ioc_context() use __GFP_ZERO allocation, take init out of
if block and drop 0'ing.

* Docbook-fy current_io_context() comment.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800
a73f730d0 block, cfq: move cfqd->cic_index to q->id ... Browse Code »

cfq allocates per-queue id using ida and uses it to index cic radix
tree from io_context. Move it to q->id and allocate on queue init and
free on queue release. This simplifies cfq a bit and will allow for
further improvements of io context life-cycle management.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800
8ba61435d block: add missing blk_queue_dead() checks ... Browse Code »

blk_insert_cloned_request(), blk_execute_rq_nowait() and
blk_flush_plug_list() either didn't check whether the queue was dead
or did it without holding queue_lock. Update them so that dead state
is checked while holding queue_lock.

AFAICS, this plugs all holes (requeue doesn't matter as the request is
transitioning atomically from in_flight to queued).

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800
481a7d647 block: fix drain_all condition in blk_drain_queue() ... Browse Code »

When trying to drain all requests, blk_drain_queue() checked only
q->rq.count[]; however, this only tracks REQ_ALLOCED requests. This
patch updates blk_drain_queue() such that it looks at all the counters
and queues so that request_queue is actually empty on completion.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800
34f6055c8 block: add blk_queue_dead() ... Browse Code »

There are a number of QUEUE_FLAG_DEAD tests. Add blk_queue_dead()
macro and use it.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800
1ba64edef block, sx8: kill blk_insert_request() ... Browse Code »

The only user left for blk_insert_request() is sx8 and it can be
trivially switched to use blk_execute_rq_nowait() - special requests
aren't included in io stat and sx8 doesn't use block layer tagging.
Switch sx8 and kill blk_insert_requeset().

This patch doesn't introduce any functional difference.

Only compile tested.

Signed-off-by: Tejun Heo
Acked-by: Jeff Garzik
Signed-off-by: Jens Axboe

Tejun Heo
2011-12-14 07:33:37 +0800

10 Dec, 2011

9 commits

dc47ce90c Linux 3.2-rc5 Browse Code »

Linus Torvalds
2011-12-10 07:09:32 +0800
8def5f51b Merge git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

* git://git.samba.org/sfrench/cifs-2.6:
cifs: check for NULL last_entry before calling cifs_save_resume_key
cifs: attempt to freeze while looping on a receive attempt
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions

Linus Torvalds
2011-12-10 06:45:44 +0800
a776878d6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Calling __pa() with an ioremap()ed address is invalid
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
x86/intel_mid: Kconfig select fix
x86/intel_mid: Fix the Kconfig for MID selection

Linus Torvalds
2011-12-10 06:45:12 +0800
e2f4e0bc2 Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6 ... Browse Code »

* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
spi/gpio: fix section mismatch warning
spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
spi/nuc900: Include linux/module.h
spi/ath79: fix compile error due to missing include

Linus Torvalds
2011-12-10 06:41:50 +0800
af209e0ae Merge branch 'for-linus' of git://neil.brown.name/md ... Browse Code »

* 'for-linus' of git://neil.brown.name/md:
md: raid5 crash during degradation
md/raid5: never wait for bad-block acks on failed device.
md: ensure new badblocks are handled promptly.
md: bad blocks shouldn't cause a Blocked status on a Faulty device.
md: take a reference to mddev during sysfs access.
md: refine interpretation of "hold_active == UNTIL_IOCTL".
md/lock: ensure updates to page_attrs are properly locked.

Linus Torvalds
2011-12-10 00:18:08 +0800
53523d526 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: use new generic {enable,disable}_percpu_irq() routines
drivers/net/ethernet/tile: use skb_frag_page() API
asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
arch/tile: fix double-free bug in homecache_free_pages()
arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.

Linus Torvalds
2011-12-10 00:08:57 +0800
592d44a5f Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu ... Browse Code »

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
MAINTAINERS: Update amd-iommu F: patterns
iommu/amd: Fix typo in kernel-parameters.txt
iommu/msm: Fix compile error in mach-msm/devices-iommu.c
Fix comparison using wrong pointer variable in dma debug code

Linus Torvalds
2011-12-10 00:08:14 +0800
3ab345fc4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix lost speaker volume controls
ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
ALSA: hda/realtek - Don't create extra controls with channel suffix
ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
ASoC: Provide a more complete DMA driver stub
ASoC: Remove references to corgi and spitz from machine driver document
ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
ASoC: uda1380: Return proper error in uda1380_modinit failure path
ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
ASoC: Mark WM8994 ADC muxes as virtual
ALSA: hda/realtek - Fix Oops in alc_mux_select()
ALSA: sis7019 - give slow codecs more time to reset

Linus Torvalds
2011-12-10 00:07:42 +0800
975e32c28 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do no try to schedule task events if there are none
lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
perf header: Use event_name() to get an event name
perf stat: Failure with "Operation not supported"

Linus Torvalds
2011-12-10 00:07:24 +0800

09 Dec, 2011

20 commits

031af165b sys_getppid: add missing rcu_dereference ... Browse Code »

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.

Signed-off-by: Mandeep Singh Baines
Cc: Thomas Gleixner
Cc: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mandeep Singh Baines
2011-12-09 23:50:29 +0800
1cee22b7f rapidio/tsi721: modify PCIe capability settings ... Browse Code »

Modify initialization of PCIe capability registers in Tsi721 mport driver:
- change Completion Timeout value to avoid unexpected data transfer
aborts during intensive traffic.
- replace hardcoded offset of PCIe capability block by making it use the
common function.

This patch is applicable to kernel versions starting from 3.2-rc1.

Signed-off-by: Alexandre Bounine
Cc: Matt Porter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-12-09 23:50:29 +0800
b439e66f0 rapidio/tsi721: fix mailbox resource reporting ... Browse Code »

Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox
resources has to be properly reported to allow use of all available
mailboxes (initial version reports only MBOX0).

This patch is applicable to kernel versions staring from 3.2-rc1.

Signed-off-by: Alexandre Bounine
Cc: Matt Porter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-12-09 23:50:29 +0800
ceb963981 rapidio/tsi721: switch to dma_zalloc_coherent ... Browse Code »

Replace the pair dma_alloc_coherent()+memset() with the new
dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2

Signed-off-by: Alexandre Bounine
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexandre Bounine
2011-12-09 23:50:29 +0800
2a95ea6c0 procfs: do not overflow get_{idle,iowait}_time for nohz ... Browse Code »

Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless. We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t. This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int. Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load. The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko
Tested-by: Artem S. Tashkinov
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-12-09 23:50:29 +0800
1368edf06 mm: vmalloc: check for page allocation failure before vmlist insertion ... Browse Code »
1

Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised. Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area. In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().

This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range. It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.

Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.

Signed-off-by: Mel Gorman
Reported-by: Luciano Chavez
Tested-by: Luciano Chavez
Reviewed-by: Rik van Riel
Acked-by: David Rientjes
Cc: [3.1.x+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2011-12-09 23:50:29 +0800
d02156388 mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks ... Browse Code »
1

setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

IP: [] setup_zone_migrate_reserve+0xcd/0x180
*pdpt = 0000000000000000 *pde = f000ff53f000ff53
Oops: 0000 [#1] SMP
Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
EIP: 0060:[] EFLAGS: 00010006 CPU: 0
EIP is at setup_zone_migrate_reserve+0xcd/0x180
EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
Call Trace:
[] __setup_per_zone_wmarks+0xec/0x160
[] setup_per_zone_wmarks+0xf/0x20
[] init_per_zone_wmark_min+0x27/0x86
[] do_one_initcall+0x2b/0x160
[] kernel_init+0xbe/0x157
[] kernel_thread_helper+0x6/0xd
Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
EIP: [] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
CR2: 00000000000012b4

We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.

The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").

Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.

Signed-off-by: Michal Hocko
Signed-off-by: Mel Gorman
Tested-by: Dang Bo
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Arve Hjnnevg
Cc: KOSAKI Motohiro
Cc: John Stultz
Cc: Dave Hansen
Cc: [3.0+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-12-09 23:50:28 +0800
09761333e mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages ... Browse Code »

Avoid unlocking and unlocked page if we failed to lock it.

Signed-off-by: Hillf Danton
Cc: Naoya Horiguchi
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hillf Danton
2011-12-09 23:50:28 +0800
58a84aa92 thp: set compound tail page _count to zero ... Browse Code »
1

Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times. But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized. So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.

kernel BUG at include/linux/mm.h:386!
invalid opcode: 0000 [#1] SMP
Call Trace:
gup_pud_range+0xb8/0x19d
get_user_pages_fast+0xcb/0x192
? trace_hardirqs_off+0xd/0xf
hva_to_pfn+0x119/0x2f2
gfn_to_pfn_memslot+0x2c/0x2e
kvm_iommu_map_pages+0xfd/0x1c1
kvm_iommu_map_memslots+0x7c/0xbd
kvm_iommu_map_guest+0xaa/0xbf
kvm_vm_ioctl_assigned_device+0x2ef/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP gup_huge_pud+0xf2/0x159

Signed-off-by: Youquan Song
Reviewed-by: Andrea Arcangeli
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Youquan Song
2011-12-09 23:50:28 +0800
b6999b191 thp: add compound tail page _mapcount when mapped ... Browse Code »
87

With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd. If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.

So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.

This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1

kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
put_page+0x15/0x37
kvm_release_pfn_clean+0x31/0x36
kvm_iommu_put_pages+0x94/0xb1
kvm_iommu_unmap_memslots+0x80/0xb6
kvm_assign_device+0xba/0x117
kvm_vm_ioctl_assigned_device+0x301/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song
Reviewed-by: Andrea Arcangeli
Cc: Andi Kleen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Youquan Song
2011-12-09 23:50:28 +0800
09dc3cf93 printk: avoid double lock acquire ... Browse Code »

Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock. Avoid this.

Reported-by: Andrea Arcangeli
Signed-off-by: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2011-12-09 23:50:28 +0800
c193c82f0 memcg: update maintainers ... Browse Code »

More players joined to memory cgroup developments and Johannes' great work
changed internal design of memory cgroup dramatically. And he will do
more works. Michal Hokko did many bug fixes and know memory cgroup very
well. Daisuke Nishimura helped us very much but he seems busy now.
Thanks to his works.

Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Acked-by: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2011-12-09 23:50:28 +0800
2dbcd05f1 drivers/rtc/rtc-s3c.c: fix driver clock enable/disable balance issues ... Browse Code »

If an error occurs after the clock is enabled, the enable/disable state
can become unbalanced.

Signed-off-by: Jonghwan Choi
Cc: Alessandro Zummo
Acked-by: Kukjin Kim
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonghwan Choi
2011-12-09 23:50:28 +0800
1de8ad43d CREDITS: update Kees's expired fingerprint and fix details ... Browse Code »

Small clean-up for my CREDITS entry; the GPG fingerprint was not up to
date, so I fixed other details at the same time too.

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2011-12-09 23:50:28 +0800
1dfb059b9 thp: reduce khugepaged freezing latency ... Browse Code »

khugepaged can sometimes cause suspend to fail, requiring that the user
retry the suspend operation.

Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups. A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.

khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.

Reported-by: Jiri Slaby
Signed-off-by: Andrea Arcangeli
Tested-by: Jiri Slaby
Cc: Tejun Heo
Cc: Oleg Nesterov
Cc: "Srivatsa S. Bhat"
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-12-09 23:50:28 +0800
b53fc7c29 fs/proc/meminfo.c: fix compilation error ... Browse Code »
1

Fix the error message "directives may not be used inside a macro argument"
which appears when the kernel is compiled for the cris architecture.

Signed-off-by: Claudio Scordino
Cc: Andrea Arcangeli
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Claudio Scordino
2011-12-09 23:50:28 +0800
83aeeada7 vmscan: use atomic-long for shrinker batching ... Browse Code »

Use atomic-long operations instead of looping around cmpxchg().

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2011-12-09 23:50:27 +0800
635697c66 vmscan: fix initial shrinker size handling ... Browse Code »
43

A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock. For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause
really big pressure. Let's skip such shrinkers instead.

Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.

Signed-off-by: Konstantin Khlebnikov
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2011-12-09 23:50:27 +0800
b2c163913 MAINTAINERS: Update amd-iommu F: patterns ... Browse Code »

Commit 29b68415e335 ("x86: amd_iommu: move to drivers/iommu/")
moved the files, update the patterns.

CC: Ohad Ben-Cohen
CC: Joerg Roedel

Signed-off-by: Joe Perches
Signed-off-by: Joerg Roedel

Joe Perches
2011-12-09 22:20:23 +0800
e8c710628 x86, efi: Calling __pa() with an ioremap()ed address is invalid ... Browse Code »

If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:

BUG: unable to handle kernel paging request at f7f22280
IP: [] reserve_ram_pages_type+0x89/0x210
[...]

Call Trace:
[] ? page_is_ram+0x1a/0x40
[] reserve_memtype+0xdf/0x2f0
[] set_memory_uc+0x49/0xa0
[] efi_enter_virtual_mode+0x1c2/0x3aa
[] start_kernel+0x291/0x2f2
[] ? loglevel+0x1b/0x1b
[] i386_start_kernel+0xbf/0xc8

A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,

https://bugzilla.redhat.com/show_bug.cgi?id=748516

Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.

Signed-off-by: Matt Fleming
Cc: H. Peter Anvin
Cc: Matthew Garrett
Cc: Zhang Rui
Cc: Huang Ying
Cc: Linus Torvalds
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar

Matt Fleming
2011-12-09 15:32:26 +0800