Eric Lee / smarc-fsl-linux-kernel

13 Dec, 2011

5 commits

e7f626db8 ARM: 7207/1: Use generic ARM instruction set condition code checks for nwfpe.

This patch changes the nwfpe implementation to use the new generic
ARM instruction set condition code checks, rather than a local
implementation. It also removes the existing condition code checking,
which has been used for the generic support (in kernel/opcodes.{ch}).

This code has not been tested beyond building, linking and booting.

Signed-off-by: Leif Lindholm
Reviewed-by: Will Deacon
Signed-off-by: Russell King

Leif Lindholm
2011-12-13 16:52:02 +0800
0c9030dea ARM: 7206/1: Add generic ARM instruction set condition code checks.

This patch breaks the ARM condition checking code out of nwfpe/fpopcode.{ch}
into a standalone file for opcode operations. It also modifies the code
somewhat for coding style adherence, and adds some temporary variables for
increased readability.

Signed-off-by: Leif Lindholm
Reviewed-by: Will Deacon
Signed-off-by: Russell King
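
As a rough illustration of what a generic condition code check does, the sketch below decodes the 4-bit condition field (bits 31:28 of an ARM opcode) against the N, Z, C and V flags. The name arm_cond_passes() is hypothetical, the switch-based decoding is a choice made for this example rather than the code in kernel/opcodes.c, and treating condition 0xF as "always" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* PSR flag bits at their architectural positions. */
#define PSR_N (1u << 31)
#define PSR_Z (1u << 30)
#define PSR_C (1u << 29)
#define PSR_V (1u << 28)

/*
 * Hypothetical helper, not the kernel's function: returns true when an
 * ARM instruction with this opcode would execute under these flags.
 */
static bool arm_cond_passes(uint32_t opcode, uint32_t psr)
{
	bool n = psr & PSR_N, z = psr & PSR_Z;
	bool c = psr & PSR_C, v = psr & PSR_V;
	bool pass;

	/* Bits 3:1 of the condition field select the base test. */
	switch ((opcode >> 29) & 0x7) {
	case 0: pass = z; break;              /* EQ / NE */
	case 1: pass = c; break;              /* CS / CC */
	case 2: pass = n; break;              /* MI / PL */
	case 3: pass = v; break;              /* VS / VC */
	case 4: pass = c && !z; break;        /* HI / LS */
	case 5: pass = (n == v); break;       /* GE / LT */
	case 6: pass = !z && (n == v); break; /* GT / LE */
	default: return true;                 /* AL; 0xF assumed "always" */
	}

	/* Bit 0 of the condition field inverts the base test. */
	return ((opcode >> 28) & 1) ? !pass : pass;
}
```

For example, an opcode whose top nibble is 0x0 (EQ) passes only when Z is set, while 0xE (AL) passes regardless of the flags.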

Leif Lindholm
2011-12-13 16:52:02 +0800
9904f7933 ARM: 7200/1: activate TCM on the Integrator

Some Integrator core modules have TCM memory, so let's turn it on
if it's there.

Signed-off-by: Linus Walleij
Signed-off-by: Russell King

Linus Walleij
2011-12-13 16:52:02 +0800
90b9222ec ARM: 7199/2: only look for TCM on ARMv5 and later

The Integrator AP/CP can have a varying set of core modules, some
(like ARM920T) are so old that trying to read the TCM status register
with CP15 will make them hang. So we need to make sure that we are
running on v5 or later in order to be able to activate this for
the Integrator. (The Integrator with CM926EJ-S has 32+32 kb of TCM
memory.)

Signed-off-by: Linus Walleij
Signed-off-by: Russell King

Linus Walleij
2011-12-13 16:52:02 +0800
958cab0fb ARM: Allow Kconfig to control the definition of NR_BANKS

Move the sizing of NR_BANKS to a Kconfig control instead of selecting
it in a header file depending on platform selection. This allows new
additions to its dependencies to be handled more gracefully.

Signed-off-by: Russell King

Russell King
2011-12-13 16:52:02 +0800

11 Dec, 2011

2 commits

b4244738d ARM: 7202/1: Add Cortex-A7 proc info

This patch adds processor info for ARM Ltd. Cortex-A7.

A7 is architecturally identical to A15 so it shares the
same SMP initialization code and hwcaps.

Tested-by: Will Deacon
Signed-off-by: Pawel Moll
Signed-off-by: Russell King

Pawel Moll
2011-12-11 16:36:21 +0800
786a76746 ARM: 7201/1: add EDAC atomic_scrub function

Add support for architecture specific EDAC atomic_scrub to ARM. Only ARMv6+
is implemented as ldrex/strex instructions are needed. Supporting EDAC on
ARMv5 or earlier is unlikely at this point anyway.

Signed-off-by: Rob Herring
Signed-off-by: Russell King

Rob Herring
2011-12-11 16:35:50 +0800

06 Dec, 2011

5 commits

8878a539f ARM: 7178/1: fault.c: Port OOM changes into do_page_fault

Commit d065bd810b6deb67d4897a14bfe21f8eb526ba99
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525d393d48a7d59f870b3bc061a30ccdb
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to ARM.

Without these changes, my ARM board encounters many hang and livelock
scenarios.
After applying this patch, OOM feature performance improves according to
my testing.

Signed-off-by: Kautuk Consul
Signed-off-by: Russell King

Kautuk Consul
2011-12-06 19:15:26 +0800
df0e74da6 ARM: 7173/1: Add optimised swahb32() byteswap helper for v6 and above

ARMv6 and later processors have the REV16 instruction, which swaps
the bytes within each halfword of a register value.

This is already used to implement swab16(), but since the native
operation performed by REV16 is actually swahb32(), this patch
renames the existing swab16() helper accordingly and defines
__arch_swab16() in terms of it. This allows calls to both swab16()
and swahb32() to be optimised.

The compiler's generated code might improve someday, but as of
4.5.2 the code generated for pure C implementing these 16-bit
byteswaps remains pessimal.

swahb32() is useful for converting 32-bit Thumb instructions
between integer and memory representation on BE8 platforms (among
other uses).

Signed-off-by: Dave Martin
Reviewed-by: Nicolas Pitre
Signed-off-by: Russell King
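
The relationship between the two helpers can be sketched in portable C. These are stand-ins written for illustration, not the ARM implementation from the patch (which uses the REV16 instruction): swahb32() swaps the bytes inside each halfword, and a 16-bit byte swap is the same operation truncated to the low halfword.

```c
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of x (what REV16 computes). */
static uint32_t swahb32(uint32_t x)
{
	return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* A 16-bit byte swap is swahb32() truncated to the low halfword. */
static uint16_t swab16(uint16_t x)
{
	return (uint16_t)swahb32(x);
}
```

With these definitions, swahb32(0x11223344) yields 0x22114433 and swab16(0x1234) yields 0x3412.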

Dave Martin
2011-12-06 19:15:26 +0800
7dbaa4667 ARM: 7169/1: topdown mmap support

Similar to other architectures, this adds topdown mmap support in user
process address space allocation policy. This allows mmap sizes greater
than 2GB. This support is largely copied from MIPS and the generic
implementations.

The address space randomization is moved into arch_pick_mmap_layout.

Tested on V-Express with ubuntu and a mmap test from here:
https://bugs.launchpad.net/bugs/861296

Signed-off-by: Rob Herring
Acked-by: Nicolas Pitre
Signed-off-by: Russell King

Rob Herring
2011-12-06 19:15:25 +0800
d22759ed5 ARM: 7193/1: Fix machine_is_xxx() naming for eSata SheevaPlug and QNAP TS-209

The eSata SheevaPlug and QNAP TS-209 devices were removed from
mach-types due to naming mismatches between machine_is_xxx(), CONFIG_XXX
and MACH_TYPE_XXX.

This patch fixes those mismatches and adds the devices back into
mach-types.

Acked-by: Nicolas Pitre
Signed-off-by: Jon Medhurst
Signed-off-by: Russell King

Jon Medhurst (Tixy)
2011-12-06 19:14:01 +0800
023bfa3dc ARM: 7140/1: remove NR_IRQS dependency for ARM-specific HARDIRQ_BITS definition

As a first step towards removing NR_IRQS, remove the ARM customization
of HARDIRQ_BITS based on NR_IRQS.

The generic code already has a default value of 10 for HARDIRQ_BITS,
which is the max used on ARM, so let's just remove the NR_IRQS based
customization and use the generic default.

Signed-off-by: Kevin Hilman
Acked-by: Nicolas Pitre
Signed-off-by: Russell King

Kevin Hilman
2011-12-06 19:14:01 +0800

02 Dec, 2011

5 commits

5611cc457 Linux 3.2-rc4

Linus Torvalds
2011-12-02 06:56:01 +0800
0a4ebed78 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
ocfs2: avoid unaligned access to dqc_bitmap
ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
ocfs2: honor O_(D)SYNC flag in fallocate
ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
ocfs2: send correct UUID to cleancache initialization
ocfs2: Commit transactions in error cases -v2
ocfs2: make direntry invalid when deleting it
fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
ocfs2: Avoid livelock in ocfs2_readpage()
ocfs2: serialize unaligned aio
ocfs2: Implement llseek()
ocfs2: Fix ocfs2_page_mkwrite()
ocfs2: Add comment about orphan scanning
ocfs2: Clean up messages in the fs
ocfs2/cluster: Cluster up now includes network connections too
ocfs2/cluster: Add new function o2net_fill_node_map()
ocfs2/cluster: Fix output in file elapsed_time_in_ms
ocfs2/dlm: dlmlock_remote() needs to account for remastery
ocfs2/dlm: Take inflight reference count for remotely mastered resources too
ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
...

Linus Torvalds
2011-12-02 06:55:34 +0800
939255798 ocfs2: avoid unaligned access to dqc_bitmap

The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These
are wrapper macros for ext2_*_bit(), which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
addresses correctly).

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using other wrapper functions for
ext2_*_bit(). The code is taken from fs/ext4/mballoc.c, which also needs to
handle unaligned bitmap access.

Signed-off-by: Akinobu Mita
Acked-by: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Joel Becker
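
The idea behind such a wrapper can be sketched as follows: round the bitmap address down to an unsigned long boundary and fold the bytes that were skipped into the bit index, so the word-based bit operation still reaches the same bit. The helper name correct_addr_and_bit() is hypothetical and this is an illustration of the technique, not the code from the patch; it assumes little-endian (ext2-style) bit numbering, where bit n lives in byte n / 8.

```c
#include <stdint.h>

/*
 * Hypothetical helper: align addr down to an unsigned long boundary and
 * compensate by advancing *bit by 8 bits for every byte stepped back.
 * Assumes little-endian bit numbering (bit n is in byte n / 8).
 */
static void *correct_addr_and_bit(int *bit, void *addr)
{
	uintptr_t misalign = (uintptr_t)addr & (sizeof(unsigned long) - 1);

	*bit += (int)(misalign * 8);
	return (void *)((uintptr_t)addr - misalign);
}
```

A caller would pass the corrected address and bit index to the underlying ext2_*_bit() style operation instead of the original pair.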

Akinobu Mita
2011-12-02 06:39:32 +0800
3b120ab76 Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
ARM: 7182/1: ARM cpu topology: fix warning
ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
ARM: 7177/1: GIC: avoid skipping non-existent PPIs in irq_start calculation
ARM: 7176/1: cpu_pm: register GIC PM notifier only once
ARM: 7175/1: add subname parameter to mfp_set_groupg callers
ARM: 7174/1: Fix build error in kprobes test code on Thumb2 kernels
ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
ARM: 7170/2: fix compilation breakage in entry-armv.S
ARM: 7168/1: use cache type functions for arch_get_unmapped_area
ARM: perf: check that we have a platform device when reserving PMU
ARM: 7166/1: Use PMD_SHIFT instead of PGDIR_SHIFT in dma-consistent.c
ARM: 7165/2: PL330: Fix typo in _prepare_ccr()
ARM: 7163/2: PL330: Only register usable channels
ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds
ARM: 7161/1: errata: no automatic store buffer drain
ARM: perf: initialise used_mask for fake PMU during validation
ARM: PMU: remove pmu_init declaration
ARM: PMU: re-export release_pmu symbol to modules

Linus Torvalds
2011-12-02 03:53:54 +0800
b930c2641 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix meta data raid-repair merge problem
Btrfs: skip allocation attempt from empty cluster
Btrfs: skip block groups without enough space for a cluster
Btrfs: start search for new cluster at the beginning
Btrfs: reset cluster's max_size when creating bitmap
Btrfs: initialize new bitmaps' list
Btrfs: fix oops when calling statfs on readonly device
Btrfs: Don't error on resizing FS to same size
Btrfs: fix deadlock on metadata reservation when evicting a inode
Fix URL of btrfs-progs git repository in docs
btrfs scrub: handle -ENOMEM from init_ipath()

Linus Torvalds
2011-12-02 00:28:53 +0800

01 Dec, 2011

18 commits

f4a8e6563 Btrfs: fix meta data raid-repair merge problem

Commit 4a54c8c16 introduced raid-repair, killing the individual
readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
disk-io.c.

The raid-repair commit had logic to disable raid-repair, if
readpage_io_failed_hook is set. Thus, the readahead commit effectively
disabled raid-repair for meta data.

This commit changes the logic to always attempt raid-repair when needed and
call the readpage_io_failed_hook in case raid-repair fails. This is much
more straightforward and should have been like that from the beginning.

Signed-off-by: Jan Schmidt
Reported-by: Stefan Behrens
Signed-off-by: Chris Mason

Jan Schmidt
2011-12-01 22:30:36 +0800
11d814a20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Fix RCU lockdep splats
IB/ipoib: Prevent hung task or softlockup processing multicast response
IB/qib: Fix over-scheduling of QSFP work
RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2
RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic
IB/qib: Don't use schedule_work()

Linus Torvalds
2011-12-01 08:25:02 +0800
c290b2f2b Merge branch 'dt-for-linus' of git://sources.calxeda.com/kernel/linux

* 'dt-for-linus' of git://sources.calxeda.com/kernel/linux:
of: Add Silicon Image vendor prefix
of/irq: of_irq_init: add check for parent equal to child node

Linus Torvalds
2011-12-01 08:24:43 +0800
d6e92d360 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: twl: fix twl4030 support for smps regulators
regulator: fix use after free bug
regulator: aat2870: Fix the logic of checking if no id is matched in aat2870_get_regulator

Linus Torvalds
2011-12-01 08:24:24 +0800
cd5b49bce Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (45 commits)
ARM: ux500: update defconfig
ARM: u300: update defconfig
ARM: at91: enable additional boards in existing soc defconfig files
ARM: at91: refresh soc defconfig files for 3.2
ARM: at91: rename defconfig files appropriately
ARM: OMAP2+: Fix Compilation error when omap_l3_noc built as module
ARM: OMAP2+: Remove empty io.h
ARM: OMAP2: select ARM_AMBA if OMAP3_EMU is defined
ARM: OMAP: smartreflex: fix IRQ handling bug
ARM: OMAP: PM: only register TWL with voltage layer when device is present
ARM: OMAP: hwmod: Fix the addr space, irq, dma count APIs
arm: mx28: fix bit operation in clock setting
ARM: imx: export imx_ioremap
ARM: imx/mm-imx3: conditionally compile i.MX31 and i.MX35 code
ARM: mx5: Fix checkpatch warnings in cpu-imx5.c
MAINTAINERS: Add missing directory
ARM: imx: drop 'ARCH_MX31' and 'ARCH_MX35'
ARM: imx6q: move clock register map to machine_desc.map_io
ARM: pxa168/gplugd: add the correct SSP device
ARM: Update mach-types to fix mxs build breakage
...

Linus Torvalds
2011-12-01 08:23:59 +0800
4cbd6b167 ARM: 7182/1: ARM cpu topology: fix warning

kernel/sched.c:7354:2: warning: initialization from incompatible pointer type

Align the cpu_coregroup_mask() prototype with the sched_domain_mask_f
typedef: use int cpu instead of unsigned int cpu.

Signed-off-by: Vincent Guittot
Signed-off-by: Russell King
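
The warning comes from storing a function in a function-pointer slot whose parameter type differs from the function's own. The sketch below shows the matching case with stand-in types: struct cpumask is reduced to one word, and mask_fn, core_masks, coregroup_mask and topology_level are names invented for this example. Only the shape of the callback (taking a plain int cpu) follows the commit message.

```c
#include <stddef.h>

struct cpumask { unsigned long bits; };   /* stand-in type */

/* Callback type shaped like sched_domain_mask_f: takes a signed int. */
typedef const struct cpumask *(*mask_fn)(int cpu);

/* Two hypothetical core groups: CPUs 0-1 and CPUs 2-3. */
static const struct cpumask core_masks[2] = { { 0x3 }, { 0xc } };

/* Matches mask_fn exactly; an 'unsigned int cpu' parameter would not. */
static const struct cpumask *coregroup_mask(int cpu)
{
	return &core_masks[cpu / 2];
}

struct topology_level { mask_fn mask; };

/* Initialising the slot with a matching prototype produces no warning. */
static const struct topology_level level = { coregroup_mask };
```

Declaring coregroup_mask() with an unsigned int parameter instead would make the initialiser an incompatible pointer type, which is the class of warning quoted above.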

Vincent Guittot
2011-12-01 07:55:21 +0800
b5bed7fe8 ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below

The SWP instruction is deprecated on ARMv6 and with ARMv7 it will be
UNDEFINED when CONFIG_SWP_EMULATE is selected. In this case, probing a
SWP instruction will cause an oops when the kprobes emulation code
executes an undefined instruction.

As the SWP instruction should be rare or non-existent in kernels for
ARMv6 and later, we can simply avoid these problems by not allowing
probing of these.

Reported-by: Leif Lindholm
Tested-by: Leif Lindholm
Acked-by: Nicolas Pitre
Signed-off-by: Jon Medhurst
Signed-off-by: Russell King

Jon Medhurst (Tixy)
2011-12-01 07:54:54 +0800
14383c295 ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction

There is a kprobes testcase for the instruction "strd r2, [r3], r4".
This has unpredictable behaviour as it uses r3 for register writeback
addressing and also stores it to memory.

On a Cortex-A9, this testcase would fail because the instruction writes
the updated value of r3 to memory, whereas the kprobes emulation code
writes the original value.

Fix this by changing the testcase to use r5 instead of r3.

Reported-by: Leif Lindholm
Tested-by: Leif Lindholm
Acked-by: Nicolas Pitre
Signed-off-by: Jon Medhurst
Signed-off-by: Russell King

Jon Medhurst (Tixy)
2011-12-01 07:54:53 +0800
be064d113 Btrfs: skip allocation attempt from empty cluster

If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
425d83156 Btrfs: skip block groups without enough space for a cluster

We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
1b22bad77 Btrfs: start search for new cluster at the beginning

Instead of starting at zero (offset is always zero), request a cluster
starting at search_start, that denotes the beginning of the current
block group.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
b78d09bce Btrfs: reset cluster's max_size when creating bitmap

The field that indicates the size of the largest contiguous chunk of
free space in the cluster is not initialized when setting up bitmaps,
it's only increased when we find a larger contiguous chunk. We end up
retaining a larger value than appropriate for highly-fragmented
clusters, which may cause pointless searches for large contiguous
groups, and even cause clusters that do not meet the density
requirements to be set up.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
f2d0f6765 Btrfs: initialize new bitmaps' list

We're failing to create clusters with bitmaps because
setup_cluster_no_bitmap checks that the list is empty before inserting
the bitmap entry in the list for setup_cluster_bitmap, but the list
field is only initialized when it is restored from the on-disk free
space cache, or when it is written out to disk.

Besides a potential race condition due to the multiple use of the list
field, filesystem performance severely degrades over time: as we use
up all non-bitmap free extents, the try-to-set-up-cluster dance is
done at every metadata block allocation. For every block group, we
fail to set up a cluster, and after failing on them all up to twice,
we fall back to the much slower unclustered allocation.

To make matters worse, before the unclustered allocation, we try to
create new block groups until we reach the 1% threshold, which
introduces additional bitmaps and thus block groups that we'll iterate
over at each metadata block request.

Alexandre Oliva
2011-12-01 01:46:06 +0800
b772a86ea Btrfs: fix oops when calling statfs on readonly device

To reproduce this bug:

# dd if=/dev/zero of=img bs=1M count=256
# mkfs.btrfs img
# losetup -r /dev/loop1 img
# mount /dev/loop1 /mnt
OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
devices:

# df -h /dev/loop1
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop1      250M   28K  238M   1% /mnt

Signed-off-by: Li Zefan

Li Zefan
2011-12-01 01:46:05 +0800
ece7d20e8 Btrfs: Don't error on resizing FS to same size

It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed. User app GParted trips
over this error. Allow it by bypassing the shrink or grow operation.

Signed-off-by: Mike Fleetwood

Mike Fleetwood
2011-12-01 01:46:04 +0800
aa38a711a Btrfs: fix deadlock on metadata reservation when evicting a inode

When I ran the xfstests, I found the test tasks were blocked on meta-data
reservation.

By debugging, I found the cause of this bug:

    start transaction
         |
         v
    reserve meta-data space
         |
         v
    flush delay allocation -> iput inode -> evict inode
         ^                                       |
         |                                       v
    wait for delay allocation flush

Miao Xie
2011-12-01 01:46:03 +0800
b52f75a59 Fix URL of btrfs-progs git repository in docs

The location of the btrfs-progs repository has been changed.
This patch updates the documentation accordingly.

Signed-off-by: Arnd Hannemann

Arnd Hannemann
2011-12-01 01:46:02 +0800
26bdef541 btrfs scrub: handle -ENOMEM from init_ipath()

init_ipath() can return an ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter
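
The error-pointer convention behind this fix can be sketched in user-space C. ERR_PTR(), IS_ERR() and PTR_ERR() below are simplified stand-ins for the kernel helpers of the same names, while alloc_path() and use_path() are hypothetical functions invented for the example (alloc_path() plays the role of init_ipath(), and a size of 0 is used to simulate allocation failure).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers. */
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns ERR_PTR(-ENOMEM) instead of NULL. */
static void *alloc_path(size_t size)
{
	void *p = size ? malloc(size) : NULL;

	return p ? p : ERR_PTR(-ENOMEM);
}

/* The caller must test IS_ERR(), not NULL, and propagate the errno. */
static int use_path(size_t size)
{
	void *path = alloc_path(size);

	if (IS_ERR(path))
		return (int)PTR_ERR(path);
	free(path);
	return 0;
}
```

A NULL check alone would miss the failure here, because an error pointer is non-NULL; that is the kind of case the commit title refers to.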

Dan Carpenter
2011-12-01 01:46:01 +0800

30 Nov, 2011

5 commits

a493f1a24 Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next

Roland Dreier
2011-11-30 10:01:53 +0800
8cd792037 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: Update comments describing device power management callbacks
PM / Sleep: Update documentation related to system wakeup
PM / Runtime: Make documentation follow the new behavior of irq_safe
PM / Sleep: Correct inaccurate information in devices.txt
PM / Domains: Document how PM domains are used by the PM core
PM / Hibernate: Do not leak memory in error/test code paths

Linus Torvalds
2011-11-30 06:43:22 +0800
580da35a3 IB: Fix RCU lockdep splats

Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France
Signed-off-by: Eric Dumazet
Cc: David Miller
Signed-off-by: Roland Dreier

Eric Dumazet
2011-11-30 05:37:11 +0800
3874397c0 IB/ipoib: Prevent hung task or softlockup processing multicast response

The following can occur with ipoib when processing a multicast response:

BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
Modules linked in: ...
CPU 0:
Modules linked in: ...
Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
RIP: 0010:[] [] _spin_unlock_irqrestore+0x17/0x20
RSP: 0018:ffff8802119ed860 EFLAGS: 00000246
RAX: 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
[] ? apic_timer_interrupt+0xe/0x20
[] ? apic_timer_interrupt+0xe/0x20
[] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
[] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
[] ? dev_hard_start_xmit+0x2c8/0x3f0
[] ? sch_direct_xmit+0x15a/0x1c0
[] ? dev_queue_xmit+0x388/0x4d0
[] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
[] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
[] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
[] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
[] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
[] ? join_handler+0xeb/0x200 [ib_sa]
[] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
[] ? recv_handler+0x3c/0x70 [ib_sa]
[] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
[] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
[] ? worker_thread+0x170/0x2a0
[] ? autoremove_wake_function+0x0/0x40
[] ? worker_thread+0x0/0x2a0
[] ? kthread+0x96/0xa0
[] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

    ah = ipoib_create_ah(dev, priv->pd, &av);
    if (!ah) {
        ipoib_warn(priv, "ib_address_create failed\n");
    } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

    if (!mcast->ah) {
        if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
            skb_queue_tail(&mcast->pkt_queue, skb);
        else {
            ++dev->stats.tx_dropped;
            dev_kfree_skb_any(skb);
        }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocations in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets. Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa
Reviewed-by: Gary Leshner
Signed-off-by: Mike Marciniszyn
Signed-off-by: Roland Dreier

Mike Marciniszyn
2011-11-30 05:20:02 +0800
57db53b07 Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux

* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: avoid potential NULL dereference or corruption
slub: use irqsafe_cpu_cmpxchg for put_cpu_partial
slub: move discard_slab out of node lock
slub: use correct parameter to add a page to partial list tail

Linus Torvalds
2011-11-30 03:13:22 +0800