Eric Lee / smarc-fsl-linux-kernel

08 Aug, 2011

1 commit

329551484 fix rcu annotations noise in cred.h ... Browse Code »

task->cred is declared as __rcu, and access to other tasks' ->cred is,
indeed, protected. Access to current->cred does not need rcu_dereference()
at all, since only the task itself can change its ->cred. sparse, of
course, has no way of knowing that...

Add force-cast in current_cred(), make current_fsuid() et.al. use it.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2011-08-08 04:42:35 +0800

07 Aug, 2011

9 commits

c2f340a69 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd ... Browse Code »

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
ore: Make ore its own module
exofs: Rename raid engine from exofs/ios.c => ore
exofs: ios: Move to a per inode components & device-table
exofs: Move exofs specific osd operations out of ios.c
exofs: Add offset/length to exofs_get_io_state
exofs: Fix truncate for the raid-groups case
exofs: Small cleanup of exofs_fill_super
exofs: BUG: Avoid sbi realloc
exofs: Remove pnfs-osd private definitions
nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4

Linus Torvalds
2011-08-07 13:56:03 +0800
3ddcd0569 vfs: optimize inode cache access patterns ... Browse Code »

The inode structure layout is largely random, and some of the vfs paths
really do care. The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible. The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture. So there's more tuning
to be done.

Signed-off-by: Linus Torvalds

Linus Torvalds
2011-08-07 13:53:23 +0800
830c0f0ed vfs: renumber DCACHE_xyz flags, remove some stale ones ... Browse Code »

Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.

And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.

Signed-off-by: Linus Torvalds

Linus Torvalds
2011-08-07 13:52:40 +0800
7cd4767e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Compute protocol sequence numbers and fragment IDs using MD5.
crypto: Move md5_transform to lib/md5.c

Linus Torvalds
2011-08-07 13:12:37 +0800
8ff660ab8 exofs: Rename raid engine from exofs/ios.c => ore ... Browse Code »

ORE stands for "Objects Raid Engine"

This patch is a mechanical rename of everything that was in ios.c
and its API declaration to an ore.c and an osd_ore.h header. The ore
engine will later be used by the pnfs objects layout driver.

* File ios.c => ore.c

* Declaration of types and API are moved from exofs.h to a new
osd_ore.h

* All used types are prefixed by ore_ from their exofs_ name.

* Shift includes from exofs.h to osd_ore.h so osd_ore.h is
independent, include it from exofs.h.

Other than a pure rename there are no other changes. Next patch
will move the ore into it's own module and will export the API
to be used by exofs and later the layout driver

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2011-08-07 10:36:18 +0800
6e5714eaf net: Compute protocol sequence numbers and fragment IDs using MD5. ... Browse Code »

Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky
Tested-by: Willy Tarreau
Signed-off-by: David S. Miller

David S. Miller
2011-08-07 09:33:19 +0800
bc0b96b54 crypto: Move md5_transform to lib/md5.c ... Browse Code »

We are going to use this for TCP/IP sequence number and fragment ID
generation.

Signed-off-by: David S. Miller

David S. Miller
2011-08-07 09:32:45 +0800
2560540b7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (38 commits)
acer-wmi: support Lenovo ideapad S205 wifi switch
acerhdf.c: spaces in aliased changed to *
platform-drivers-x86: ideapad-laptop: add missing ideapad_input_exit in ideapad_acpi_add error path
x86 driver: fix typo in TDP override enabling
Platform: fix samsung-laptop DMI identification for N150/N210/220/N230
dell-wmi: Add keys for Dell XPS L502X
platform-drivers-x86: samsung-q10: make dmi_check_callback return 1
Platform: Samsung Q10 backlight driver
platform-drivers-x86: intel_scu_ipc: convert to DEFINE_PCI_DEVICE_TABLE
platform-drivers-x86: intel_rar_register: convert to DEFINE_PCI_DEVICE_TABLE
platform-drivers-x86: intel_menlow: add missing return AE_OK for intel_menlow_register_sensor()
platform-drivers-x86: intel_mid_thermal: fix memory leak
platform-drivers-x86: msi-wmi: add missing sparse_keymap_free in msi_wmi_init error path
Samsung Laptop platform driver: support N510
asus-wmi: add uwb rfkill support
asus-wmi: add gps rfkill support
asus-wmi: add CWAP support and clarify the meaning of WAPF bits
asus-wmi: return proper value in store_cpufv()
asus-wmi: check for temp1 presence
asus-wmi: add thermal sensor
...

Linus Torvalds
2011-08-07 04:26:15 +0800
1eb19a12b lib/sha1: use the git implementation of SHA-1 ... Browse Code »

For ChromiumOS, we use SHA-1 to verify the integrity of the root
filesystem. The speed of the kernel sha-1 implementation has a major
impact on our boot performance.

To improve boot performance, we investigated using the heavily optimized
sha-1 implementation used in git. With the git sha-1 implementation, we
see a 11.7% improvement in boot time.

10 reboots, remove slowest/fastest.

Before:

Mean: 6.58 seconds Stdev: 0.14

After (with git sha-1, this patch):

Mean: 5.89 seconds Stdev: 0.07

The other cool thing about the git SHA-1 implementation is that it only
needs 64 bytes of stack for the workspace while the original kernel
implementation needed 320 bytes.

Signed-off-by: Mandeep Singh Baines
Cc: Ramsay Jones
Cc: Nicolas Pitre
Cc: Herbert Xu
Cc: David S. Miller
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Linus Torvalds

Mandeep Singh Baines
2011-08-07 02:26:52 +0800

06 Aug, 2011

3 commits

33009557b Add KEY_MICMUTE and enable it on Lenovo X220 ... Browse Code »

I suspect that this works on T410.

Signed-off-by: Andy Lutomirski
Signed-off-by: Matthew Garrett

Andy Lutomirski
2011-08-06 02:45:41 +0800
de96355c1 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 ... Browse Code »

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (55 commits)
Revert "drm/i915: Try enabling RC6 by default (again)"
drm/radeon: Extended DDC Probing for ECS A740GM-M DVI-D Connector
drm/radeon: Log Subsystem Vendor and Device Information
drm/radeon: Extended DDC Probing for Connectors with Improperly Wired DDC Lines (here: Asus M2A-VM HDMI)
drm: Separate EDID Header Check from EDID Block Check
drm: Add NULL check about irq functions
drm: Fix irq install error handling
drm/radeon: fix potential NULL dereference in drivers/gpu/drm/radeon/atom.c
drm/radeon: clean reg header files
drm/debugfs: Initialise empty variable
drm/radeon/kms: add thermal chip quirk for asus 9600xt
drm/radeon: off by one in check_reg() functions
drm/radeon/kms: fix version comment due to merge timing
drm/i915: allow cache sharing policy control
drm/i915/hdmi: HDMI source product description infoframe support
drm/i915/hdmi: split infoframe setting from infoframe type code
drm: track CEA version number if present
drm/i915: Try enabling RC6 by default (again)
Revert "drm/i915/dp: Zero the DPCD data before connection probe"
drm/i915/dp: wait for previous AUX channel activity to clear
...

Linus Torvalds
2011-08-06 00:44:38 +0800
07d952dc6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets
mlx4: decreasing ref count when removing mac
net: Fix security_socket_sendmsg() bypass problem.
net: Cap number of elements for sendmmsg
net: sendmmsg should only return an error if no messages were sent
ixgbe: fix PHY link setup for 82599
ixgbe: fix __ixgbe_notify_dca() bail out code
igb: fix WOL on second port of i350 device
e1000e: minor re-order of #include files
e1000e: remove unnecessary check for NULL pointer
intel drivers: repair missing flush operations
macb: restore wrap bit when performing underrun cleanup
cdc_ncm: fix endianness problem.
irda: use PCI_VENDOR_ID_*
mlx4: Fixing Ethernet unicast packet steering
net: fix NULL dereferences in check_peer_redir()
bnx2x: Clear MDIO access warning during first driver load
bnx2x: Fix BCM578xx MAC test
bnx2x: Fix BCM54618se invalid link indication
bnx2x: Fix BCM84833 link
...

Linus Torvalds
2011-08-06 00:42:01 +0800

05 Aug, 2011

4 commits

24f0eed26 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
get rid of boilerplate switches in posix_acl.h
fix block device fallout from ->fsync() changes

Linus Torvalds
2011-08-05 10:44:40 +0800
7f3bf7cd3 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx ... Browse Code »

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmaengine: use DEFINE_IDR for static initialization
ioat: fix xor_idx_to_desc
Avoid section type conflict in dma/ioat/dma_v3.c
ioat: Adding PCI IDs for IOAT devices on SandyBridge platforms

Linus Torvalds
2011-08-05 10:43:43 +0800
655b16128 nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4 ... Browse Code »

exofs file system wants to use pnfs_osd_xdr.h file instead of
redefining pnfs-objects types in it's private "pnfs.h" headr.

Before we do the switch we must make sure pnfs_osd_xdr.h is
compilable also under NFS versions smaller than 4.1. Since now
it is needed regardless of version, by the exofs code.

nfs4_string is not the only nfs4 type out in the global scope.

Ack-by: Trond Myklebust
Signed-off-by: Boaz Harrosh

Boaz Harrosh
2011-08-05 03:35:08 +0800
53d1e658d Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 ... Browse Code »

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
Revert "dt: add of_alias_scan and of_alias_get_id"
dt: remove of_alias_get_id() reference

Linus Torvalds
2011-08-05 00:37:07 +0800

04 Aug, 2011

20 commits

051963d48 drm: Separate EDID Header Check from EDID Block Check ... Browse Code »
1

Provides function drm_edid_header_is_valid() for EDID header check
and replaces EDID header check part of function drm_edid_block_valid()
by a call of drm_edid_header_is_valid().
This is a prerequisite to extend DDC probing, e. g. in function
radeon_ddc_probe() for Radeon devices, by a central EDID header check.

Tested for kernel 2.6.35, 2.6.38 and 3.0

Cc:
Signed-off-by: Thomas Reim
Reviewed-by: Alex Deucher
Acked-by: Stephen Michaels
Signed-off-by: Dave Airlie

Thomas Reim
2011-08-04 21:39:35 +0800
0b576372e Merge branch 'drm-intel-next' of ssh://master.kernel.org/pub/scm/linux/kernel/gi… ... Browse Code »

…t/keithp/linux-2.6 into drm-fixes

* 'drm-intel-next' of ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6: (42 commits)
drm/i915: allow cache sharing policy control
drm/i915/hdmi: HDMI source product description infoframe support
drm/i915/hdmi: split infoframe setting from infoframe type code
drm: track CEA version number if present
drm/i915: Try enabling RC6 by default (again)
Revert "drm/i915/dp: Zero the DPCD data before connection probe"
drm/i915/dp: wait for previous AUX channel activity to clear
drm/i915: don't use uninitialized EDID bpc values when picking pipe bpp
drm/i915/pch: Save/restore PCH_PORT_HOTPLUG across suspend
drm/i915: apply phase pointer override on SNB+ too
drm/i915: Add quirk to disable SSC on Sony Vaio Y2
drm/i915: provide more error output when mode sets fail
drm/i915: add GPU max frequency control file
i915: add Dell OptiPlex FX170 to intel_no_lvds
drm/i915: Ignore GPU wedged errors while pinning scanout buffers
drm/i915/hdmi: send AVI info frames on ILK+ as well
drm/i915: fix CB tuning check for ILK+
drm/i915: Flush other plane register writes
drm/i915: flush plane control changes on ILK+ as well
drm/i915: apply timing generator bug workaround on CPT and PPT
...

Dave Airlie
2011-08-04 21:22:24 +0800
fe55c1844 Revert "dt: add of_alias_scan and of_alias_get_id" ... Browse Code »

This reverts commit 750f463a749e28464151ad26938d11b07b1c43cb.

of_alias_* still needs work to be generalized for 'promtree' dt
platforms, and to no implicitly create entries for available ids.

Signed-off-by: Grant Likely

Grant Likely
2011-08-04 18:26:24 +0800
35e51fe82 Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 ... Browse Code »

* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle: stop depending on pm_idle
x86 idle: move mwait_idle_with_hints() to where it is used
cpuidle: replace xen access to x86 pm_idle and default_idle
cpuidle: create bootparam "cpuidle.off=1"
mrst_pmu: driver for Intel Moorestown Power Management Unit

Linus Torvalds
2011-08-04 15:54:15 +0800
c0c770e61 Merge branch 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 ... Browse Code »

* 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI, APEI, EINJ Param support is disabled by default
APEI GHES: 32-bit buildfix
ACPI: APEI build fix
ACPI, APEI, GHES: Add hardware memory error recovery support
HWPoison: add memory_failure_queue()
ACPI, APEI, GHES, Error records content based throttle
ACPI, APEI, GHES, printk support for recoverable error via NMI
lib, Make gen_pool memory allocator lockless
lib, Add lock-less NULL terminated single list
Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
ACPI, APEI, Add WHEA _OSC support
ACPI, APEI, Add APEI bit support in generic _OSC call
ACPI, APEI, GHES, Support disable GHES at boot time
ACPI, APEI, GHES, Prevent GHES to be built as module
ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
ACPI, APEI, Add apei_exec_run_optional
ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
ACPI, APEI, ERST, Fix erst-dbg long record reading issue
ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled

Linus Torvalds
2011-08-04 15:53:27 +0800
27665ffa2 Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6 ... Browse Code »

* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
dt: add of_alias_scan and of_alias_get_id

Linus Torvalds
2011-08-04 09:10:30 +0800
ebec9a7bf drm: track CEA version number if present ... Browse Code »

Drivers need to know the CEA version number in addition to other display
info (like whether the display is an HDMI sink) before enabling certain
features. So track the CEA version number in the display info
structure.

Signed-off-by: Jesse Barnes
Signed-off-by: Keith Packard

Jesse Barnes
2011-08-04 08:43:10 +0800
e504f3fdd tmpfs radix_tree: locate_item to speed up swapoff ... Browse Code »

We have already acknowledged that swapoff of a tmpfs file is slower than
it was before conversion to the generic radix_tree: a little slower
there will be acceptable, if the hotter paths are faster.

But it was a shock to find swapoff of a 500MB file 20 times slower on my
laptop, taking 10 minutes; and at that rate it significantly slows down
my testing.

Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before; and
more realistic tests on other machines don't fare as badly.

I've tried a number of things to improve it, including tagging the swap
entries, then doing lookup by tag: I'd expected that to halve the time,
but in practice it's erratic, and often counter-productive.

The only change I've so far found to make a consistent improvement, is
to short-circuit the way we go back and forth, gang lookup packing
entries into the array supplied, then shmem scanning that array for the
target entry. Scanning in place doubles the speed, so it's now only
twice as slow as before (or three times slower when the PROVEs are on).

So, add radix_tree_locate_item() as an expedient, once-off,
single-caller hack to do the lookup directly in place. #ifdef it on
CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
applicability as save space in other configurations. And, sadly,
#include sched.h for cond_resched().

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:24 +0800
69f07ec93 tmpfs: use kmemdup for short symlinks ... Browse Code »

But we've not yet removed the old swp_entry_t i_direct[16] from
shmem_inode_info. That's because it was still being shared with the
inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode
size), and use kmemdup() for short symlinks, say, those up to 128 bytes.

I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
rather than shmem_evict_inode(), where we usually do such freeing? I
guess it doesn't matter, and I'm not into NUMA mpol testing right now.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Reviewed-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:24 +0800
aa3b18955 tmpfs: convert mem_cgroup shmem to radix-swap ... Browse Code »

Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
had to move swappage to filecache with GFP_NOWAIT.

Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
moving its call out from shmem_add_to_page_cache() to two of thats three
callers. But leave it doing mem_cgroup_uncharge_cache_page() on error:
although asymmetrical, it's easier for all 3 callers to handle.

These two changes would also be appropriate if anyone were to start
using shmem_read_mapping_page_gfp() with GFP_NOWAIT.

Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
radix_tree_exceptional_entry() to get what it needs for itself.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:24 +0800
41ffe5d5c tmpfs: miscellaneous trivial cleanups ... Browse Code »

While it's at its least, make a number of boring nitpicky cleanups to
shmem.c, mostly for consistency of variable naming. Things like "swap"
instead of "entry", "pgoff_t index" instead of "unsigned long idx".

And since everything else here is prefixed "shmem_", better change
init_tmpfs() to shmem_init().

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:23 +0800
285b2c4fd tmpfs: demolish old swap vector support ... Browse Code »

The maximum size of a shmem/tmpfs file has been limited by the maximum
size of its triple-indirect swap vector. With 4kB page size, maximum
filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
that on a 64-bit kernel. (With 8kB page size, maximum filesize was just
over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)

It's a shame that tmpfs should be more restrictive than ramfs, and this
limitation has now been noticed. Add another level to the swap vector?
No, it became obscure and hard to maintain, once I complicated it to
make use of highmem pages nine years ago: better choose another way.

Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
tmpfs would never have invented its own peculiar radix tree: we would
have fitted swap entries into the common radix tree instead, in much the
same way as we fit swap entries into page tables.

And why should each file have a separate radix tree for its pages and
for its swap entries? The swap entries are required precisely where and
when the pages are not. We want to put them together in a single radix
tree: which can then avoid much of the locking which was needed to
prevent them from being exchanged underneath us.

This also avoids the waste of memory devoted to swap vectors, first in
the shmem_inode itself, then at least two more pages once a file grew
beyond 16 data pages (pages accounted by df and du, but not by memcg).
Allocated upfront, to avoid allocation when under swapping pressure, but
pure waste when CONFIG_SWAP is not set - I have never spattered around
the ifdefs to prevent that, preferring this move to sharing the common
radix tree instead.

There are three downsides to sharing the radix tree. One, that it binds
tmpfs more tightly to the rest of mm, either requiring knowledge of swap
entries in radix tree there, or duplication of its code here in shmem.c.
I believe that the simplications and memory savings (and probable higher
performance, not yet measured) justify that.

Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
nodes that cannot be freed under memory pressure - whereas before it was
the less precious highmem swap vector pages that could not be freed.
I'm hoping that 64-bit has now been accessible for long enough, that the
highmem argument has grown much less persuasive.

Three, that swapoff is slower than it used to be on tmpfs files, since
it's using a simple generic mechanism not tailored to it: I find this
noticeable, and shall want to improve, but maybe nobody else will
notice.

So... now remove most of the old swap vector code from shmem.c. But,
for the moment, keep the simple i_direct vector of 16 pages, with simple
accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
to help mark where swap needs to be handled in subsequent patches.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:23 +0800
a2c16d6cb mm: let swap use exceptional entries ... Browse Code »

If swap entries are to be stored along with struct page pointers in a
radix tree, they need to be distinguished as exceptional entries.

Most of the handling of swap entries in radix tree will be contained in
shmem.c, but a few functions in filemap.c's common code need to check
for their appearance: find_get_page(), find_lock_page(),
find_get_pages() and find_get_pages_contig().

So as not to slow their fast paths, tuck those checks inside the
existing checks for unlikely radix_tree_deref_slot(); except for
find_lock_page(), where it is an added test. And make it a BUG in
find_get_pages_tag(), which is not applied to tmpfs files.

A part of the reason for eliminating shmem_readpage() earlier, was to
minimize the places where common code would need to allow for swap
entries.

The swp_entry_t known to swapfile.c must be massaged into a slightly
different form when stored in the radix tree, just as it gets massaged
into a pte_t when stored in page tables.

In an i386 kernel this limits its information (type and page offset) to
30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum
swapfile size of 128GB. Which is less than the 512GB we previously
allowed with X86_PAE (where the swap entry can occupy the entire upper
32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and
there's not a new limitation on 64-bit (where swap filesize is already
limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is
probably still enough swap for a 64GB 32-bit machine.

Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
enforce filesize limit in read_swap_header(), just as for ptes.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:22 +0800
6328650bb radix_tree: exceptional entries and indices ... Browse Code »

A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
peculiar swap vector, instead keeping a file's swap entries in the same
radix tree as its struct page pointers: thus saving memory, and
simplifying its code and locking.

This patch:

The radix_tree is used by several subsystems for different purposes. A
major use is to store the struct page pointers of a file's pagecache for
memory management. But what if mm wanted to store something other than
page pointers there too?

The low bit of a radix_tree entry is already used to denote an indirect
pointer, for internal use, and the unlikely radix_tree_deref_retry()
case.

Define the next bit as denoting an exceptional entry, and supply inline
functions radix_tree_exception() to return non-0 in either unlikely
case, and radix_tree_exceptional_entry() to return non-0 in the second
case.

If a subsystem already uses radix_tree with that bit set, no problem: it
does not affect internal workings at all, but is defined for the
convenience of those storing well-aligned pointers in the radix_tree.

The radix_tree_gang_lookups have an implicit assumption that the caller
can deduce the offset of each entry returned e.g. by the page->index of
a struct page. But that may not be feasible for some kinds of item to
be stored there.

radix_tree_gang_lookup_slot() allow for an optional indices argument,
output array in which to return those offsets. The same could be added
to other radix_tree_gang_lookups, but for now keep it to the only one
for which we need it.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2011-08-04 08:25:22 +0800
5d6f921b4 drivers/video/backlight/aat2870_bl.c: fix setting max_current ... Browse Code »

- Current implementation tests wrong value for setting
aat2870_bl->max_current.

- In the current implementation, we cannot differentiate between 2 cases:

a) if pdata->max_current is not set , or

b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0).

Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in
aat2870_brightness() accordingly.

Signed-off-by: Axel Lin
Cc: Richard Purdie
Cc: Samuel Ortiz
Tested-by: Jin Park
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Axel Lin
2011-08-04 08:25:22 +0800
3dab1bce8 mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE ... Browse Code »

__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes.
It's supposed to be passed through from the page allocator to
zone_statistics(), but it never gets there as gfp_allowed_mask is not
wide enough and masks out the flag early in the allocation path.

The result is an accounting glitch where successful NUMA allocations
by-agent are not properly attributed as local.

Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE.

Signed-off-by: Johannes Weiner
Acked-by: Andi Kleen
Reviewed-by: Minchan Kim
Acked-by: Mel Gorman
Reviewed-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-08-04 08:25:21 +0800
88eca0207 ida: simplified functions for id allocation ... Browse Code »

The current hyper-optimized functions are overkill if you simply want to
allocate an id for a device. Create versions which use an internal
lock.

In followup patches, numerous drivers are converted to use this
interface.

Thanks to Tejun for feedback.

Signed-off-by: Rusty Russell
Acked-by: Tejun Heo
Acked-by: Jonathan Cameron
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2011-08-04 08:25:20 +0800
dd48c085c fault-injection: add ability to export fault_attr in arbitrary directory ... Browse Code »

init_fault_attr_dentries() is used to export fault_attr via debugfs.
But it can only export it in debugfs root directory.

Per Forlin is working on mmc_fail_request which adds support to inject
data errors after a completed host transfer in MMC subsystem.

The fault_attr for mmc_fail_request should be defined per mmc host and
export it in debugfs directory per mmc host like
/sys/kernel/debug/mmc0/mmc_fail_request.

init_fault_attr_dentries() doesn't help for mmc_fail_request. So this
introduces fault_create_debugfs_attr() which is able to create a
directory in the arbitrary directory and replace
init_fault_attr_dentries().

[akpm@linux-foundation.org: extraneous semicolon, per Randy]
Signed-off-by: Akinobu Mita
Tested-by: Per Forlin
Cc: Jens Axboe
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Matt Mackall
Cc: Randy Dunlap
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2011-08-04 08:25:20 +0800
a0bfa1373 cpuidle: stop depending on pm_idle ... Browse Code »

cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.

Architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

my_arch_cpu_idle()
...
if(cpuidle_call_idle())
pm_idle();

cc: Kevin Hilman
cc: Paul Mundt
cc: x86@kernel.org
Acked-by: H. Peter Anvin
Signed-off-by: Len Brown

Len Brown
2011-08-04 07:06:37 +0800
d91ee5863 cpuidle: replace xen access to x86 pm_idle and default_idle ... Browse Code »

When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables. While it parses the idle tables
for the hypervisor's beneift, it uses HLT for its own idle.

Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.

cc: xen-devel@lists.xensource.com
Tested-by: Konrad Rzeszutek Wilk
Acked-by: H. Peter Anvin
Signed-off-by: Len Brown

Len Brown
2011-08-04 07:06:36 +0800

03 Aug, 2011

3 commits

d0e323b47 Merge branch 'apei' into apei-release ... Browse Code »

Some trivial conflicts due to other various merges
adding to the end of common lists sooner than this one.

arch/ia64/Kconfig
arch/powerpc/Kconfig
arch/x86/Kconfig
lib/Kconfig
lib/Makefile

Signed-off-by: Len Brown

Len Brown
2011-08-03 23:30:42 +0800
a7e09d450 ACPI: APEI build fix ... Browse Code »

as GHES is optional...

When # CONFIG_ACPI_APEI_GHES is not set:

(.init.text+0x4c22): undefined reference to `ghes_disable'

Reported-by: Randy Dunlap
Acked-by: Randy Dunlap
Signed-off-by: Len Brown

Len Brown
2011-08-03 23:15:59 +0800
ea8f5fb8a HWPoison: add memory_failure_queue() ... Browse Code »

memory_failure() is the entry point for HWPoison memory error
recovery. It must be called in process context. But commonly
hardware memory errors are notified via MCE or NMI, so some delayed
execution mechanism must be used. In MCE handler, a work queue + ring
buffer mechanism is used.

In addition to MCE, now APEI (ACPI Platform Error Interface) GHES
(Generic Hardware Error Source) can be used to report memory errors
too. To add support to APEI GHES memory recovery, a mechanism similar
to that of MCE is implemented. memory_failure_queue() is the new
entry point that can be called in IRQ context. The next step is to
make MCE handler uses this interface too.

Signed-off-by: Huang Ying
Cc: Andi Kleen
Cc: Wu Fengguang
Cc: Andrew Morton
Signed-off-by: Len Brown

Huang Ying
2011-08-03 23:15:58 +0800