Eric Lee / smarc-fsl-linux-kernel

10 Jan, 2009

9 commits

3d14bdad4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86: fix section mismatch warnings in mcheck/mce_amd_64.c
x86: offer frame pointers in all build modes
x86: remove duplicated #include's
x86: k8 numa register active regions later
x86: update Alan Cox's email addresses
x86: rename all fields of mpc_table mpc_X to X
x86: rename all fields of mpc_oemtable oem_X to X
x86: rename all fields of mpc_bus mpc_X to X
x86: rename all fields of mpc_cpu mpc_X to X
x86: rename all fields of mpc_intsrc mpc_X to X
x86: rename all fields of mpc_lintsrc mpc_X to X
x86: rename all fields of mpc_iopic mpc_X to X
x86: irqinit_64.c init_ISA_irqs should be static
Documentation/x86/boot.txt: payload length was changed to payload_length
x86: setup_percpu.c fix style problems
x86: irqinit_64.c fix style problems
x86: irqinit_32.c fix style problems
x86: i8259.c fix style problems
x86: irq_32.c fix style problems
x86: ioport.c fix style problems
...

Linus Torvalds
2009-01-10 22:13:09 +0800
c4be0c1dc filesystem freeze: add error handling of write_super_lockfs/unlockfs

Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests. So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many cases, a commercial filesystem (e.g. VxFS) has the freeze feature,
which is used to take a consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
or the snapshot.
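
The four steps above could be driven from userspace roughly as follows. This is a minimal sketch, not the patch's own code: it assumes the FIFREEZE and FITHAW ioctls exposed by <linux/fs.h> in later kernels (this commit message does not name the ioctls), and both backup_with_freeze() and its snapshot callback are hypothetical names standing in for the storage-side step.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW (assumed ioctl names) */

/*
 * Hypothetical helper: freeze the filesystem mounted at `mountpoint`,
 * run the storage-side snapshot step, then thaw again.
 * Returns 0 on success, -1 on failure.
 */
int backup_with_freeze(const char *mountpoint, int (*snapshot)(void))
{
    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Step 1: freeze; new write requests are suspended. */
    if (ioctl(fd, FIFREEZE, 0) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Step 2: split the replication volume or create the snapshot. */
    int rc = snapshot ? snapshot() : 0;

    /* Step 3: always thaw, even if the snapshot step failed. */
    if (ioctl(fd, FITHAW, 0) < 0)
        rc = -1;

    close(fd);
    /* Step 4 (reading the backup from the snapshot) happens elsewhere. */
    return rc;
}
```

Freezing requires administrative privileges, and a frozen filesystem blocks writers until it is thawed, so the thaw is attempted unconditionally.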

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Renamed write_super_lockfs and unlockfs of the super block operation to
freeze_fs and unfreeze_fs to avoid confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.
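
To illustrate why the return type matters, here is a small userspace sketch of the changed interface. The struct super_ops_sketch type and the demo_* functions are hypothetical stand-ins, not the kernel's struct super_operations; only the freeze_fs/unfreeze_fs names and the "int" return type come from the description above.

```c
#include <errno.h>
#include <stddef.h>

struct super_block;                 /* opaque in this sketch */

/* Reduced, hypothetical stand-in for the super block operations. */
struct super_ops_sketch {
    int (*freeze_fs)(struct super_block *sb);    /* was: void write_super_lockfs */
    int (*unfreeze_fs)(struct super_block *sb);  /* was: void unlockfs */
};

/* Hypothetical flag simulating a journal flush failure. */
static int demo_journal_error;

/* A journaling filesystem can now report a failed freeze... */
static int demo_freeze_fs(struct super_block *sb)
{
    (void)sb;
    return demo_journal_error ? -EIO : 0;
}

/* ...while unfreeze always returns 0, as described for ext3 and friends. */
static int demo_unfreeze_fs(struct super_block *sb)
{
    (void)sb;
    return 0;
}

/* The caller propagates the error instead of silently ignoring it. */
int freeze_sketch(const struct super_ops_sketch *ops, struct super_block *sb)
{
    if (!ops->freeze_fs)
        return 0;
    return ops->freeze_fs(sb);
}
```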

Signed-off-by: Takashi Sato
Signed-off-by: Masayuki Hamaguchi
Cc: Christoph Hellwig
Cc: Dave Kleikamp
Cc: Dave Chinner
Cc: Alasdair G Kergon
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Takashi Sato
2009-01-10 08:54:42 +0800
31aeb6c81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
MAINTAINERS: squashfs entry
Squashfs: documentation
Squashfs: initrd support
Squashfs: Kconfig entry
Squashfs: Makefiles
Squashfs: header files
Squashfs: block operations
Squashfs: cache operations
Squashfs: uid/gid lookup operations
Squashfs: fragment block operations
Squashfs: export operations
Squashfs: super block operations
Squashfs: symlink operations
Squashfs: regular file operations
Squashfs: directory readdir operations
Squashfs: directory lookup operations
Squashfs: inode operations

Linus Torvalds
2009-01-10 07:18:49 +0800
c40f6f8bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu

* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
NOMMU: Support XIP on initramfs
NOMMU: Teach kobjsize() about VMA regions.
FLAT: Don't attempt to expand the userspace stack to fill the space allocated
FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
NOMMU: Improve procfs output using per-MM VMAs
NOMMU: Make mmap allocation page trimming behaviour configurable.
NOMMU: Make VMAs per MM as for MMU-mode linux
NOMMU: Delete askedalloc and realalloc variables
NOMMU: Rename ARM's struct vm_region
NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()

Linus Torvalds
2009-01-10 06:00:58 +0800
7d671f3e7 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] update documentation for hvc_iucv kernel parameter.
[S390] hvc_iucv: Special handling of IUCV HVC devices
[S390] hvc_iucv: Refactor console and device initialization
[S390] hvc_iucv: Update function documentation
[S390] hvc_iucv: Limit rate of outgoing IUCV messages
[S390] hvc_iucv: Change IUCV term id and use one device as default
[S390] Use unsigned long long for u64 on 64bit.
[S390] qdio: fix broken pointer in case of CONFIG_DEBUG_FS is disabled
[S390] vdso: compile fix
[S390] remove code for oldselect system call
[S390] types: add/fix types.h include in header files
[S390] dasd: add device attribute to disable blocking on lost paths
[S390] dasd: send change uevents for dasd block devices
[S390] tape block: fix dependencies
[S390] asm-s390/posix_types.h: drop __USE_ALL usage
[S390] gettimeofday.S: removed duplicated #includes
[S390] ptrace: no extern declarations for userspace

Linus Torvalds
2009-01-10 05:56:06 +0800
73d59314e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (864 commits)
Btrfs: explicitly mark the tree log root for writeback
Btrfs: Drop the hardware crc32c asm code
Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook
Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies
Btrfs: tree logging checksum fixes
Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents
Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation
Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code
Btrfs: drop EXPORT symbols from extent_io.c
Btrfs: Fix checkpatch.pl warnings
Btrfs: Fix free block discard calls down to the block layer
Btrfs: avoid orphan inode caused by log replay
Btrfs: avoid potential super block corruption
Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_super
Btrfs: fix a memory leak in btrfs_get_sb
Btrfs: Fix typo in clear_state_cb
Btrfs: Fix memset length in btrfs_file_write
Btrfs: update directory's size when creating subvol/snapshot
Btrfs: add permission checks to the ioctls
...

Linus Torvalds
2009-01-10 05:01:38 +0800
7c51d57e9 Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6: (67 commits)
[MTD] [MAPS] Fix printk format warning in nettel.c
[MTD] [NAND] add cmdline parsing (mtdparts=) support to cafe_nand
[MTD] CFI: remove major/minor version check for command set 0x0002
[MTD] [NAND] ndfc driver
[MTD] [TESTS] Fix some size_t printk format warnings
[MTD] LPDDR Makefile and KConfig
[MTD] LPDDR extended physmap driver to support LPDDR flash
[MTD] LPDDR added new pfow_base parameter
[MTD] LPDDR Command set driver
[MTD] LPDDR PFOW definition
[MTD] LPDDR QINFO records definitions
[MTD] LPDDR qinfo probing.
[MTD] [NAND] pxa3xx: convert from ns to clock ticks more accurately
[MTD] [NAND] pxa3xx: fix non-page-aligned reads
[MTD] [NAND] fix nandsim sched.h references
[MTD] [NAND] alauda: use USB API functions rather than constants
[MTD] struct device - replace bus_id with dev_name(), dev_set_name()
[MTD] fix m25p80 64-bit divisions
[MTD] fix dataflash 64-bit divisions
[MTD] [NAND] Set the fsl elbc ECCM according the settings in bootloader.
...

Fixed up trivial debug conflicts in drivers/mtd/devices/{m25p80.c,mtd_dataflash.c}

Linus Torvalds
2009-01-10 04:37:15 +0800
a3a798c88 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (94 commits)
ACPICA: hide private headers
ACPICA: create acpica/ directory
ACPI: fix build warning
ACPI : Use RSDT instead of XSDT by adding boot option of "acpi=rsdt"
ACPI: Avoid array address overflow when _CST MWAIT hint bits are set
fujitsu-laptop: Simplify SBLL/SBL2 backlight handling
fujitsu-laptop: Add BL power, LED control and radio state information
ACPICA: delete utcache.c
ACPICA: delete acdisasm.h
ACPICA: Update version to 20081204.
ACPICA: FADT: Update error msgs for consistency
ACPICA: FADT: set acpi_gbl_use_default_register_widths to TRUE by default
ACPICA: FADT parsing changes and fixes
ACPICA: Add ACPI_MUTEX_TYPE configuration option
ACPICA: Fixes for various ACPI data tables
ACPICA: Restructure includes into public/private
ACPI: remove private acpica headers from driver files
ACPI: reboot.c: use new acpi_reset interface
ACPICA: New: acpi_reset interface - write to reset register
ACPICA: Move all public H/W interfaces to new hwxface
...

Linus Torvalds
2009-01-10 03:55:14 +0800
d9e8a3a5b Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
ioat: fix self test for multi-channel case
dmaengine: bump initcall level to arch_initcall
dmaengine: advertise all channels on a device to dma_filter_fn
dmaengine: use idr for registering dma device numbers
dmaengine: add a release for dma class devices and dependent infrastructure
ioat: do not perform removal actions at shutdown
iop-adma: enable module removal
iop-adma: kill debug BUG_ON
iop-adma: let devm do its job, don't duplicate free
dmaengine: kill enum dma_state_client
dmaengine: remove 'bigref' infrastructure
dmaengine: kill struct dma_client and supporting infrastructure
dmaengine: replace dma_async_client_register with dmaengine_get
atmel-mci: convert to dma_request_channel and down-level dma_slave
dmatest: convert to dma_request_channel
dmaengine: introduce dma_request_channel and private channels
net_dma: convert to dma_find_channel
dmaengine: provide a common 'issue_pending_all' implementation
dmaengine: centralize channel allocation, introduce dma_find_channel
dmaengine: up-level reference counting to the module level
...

Linus Torvalds
2009-01-10 03:52:14 +0800

09 Jan, 2009

28 commits

555d61d65 [S390] update documentation for hvc_iucv kernel parameter.

Signed-off-by: Hendrik Brueckner
Signed-off-by: Martin Schwidefsky

Hendrik Brueckner
2009-01-09 19:15:10 +0800
b2576e1d4 Merge branch 'linus' into release

Len Brown
2009-01-09 16:39:43 +0800
3cc8a5f4b Merge branch 'suspend' into release

Len Brown
2009-01-09 16:38:15 +0800
237889bf0 ACPI : Use RSDT instead of XSDT by adding boot option of "acpi=rsdt"

On some boxes both the RSDT and the XSDT exist. Unfortunately, the
following errors sometimes occur when the XSDT is used:
a. 32/64X address mismatch
b. The 32/64X FACS address mismatch

In such cases the boot option "acpi=rsdt" is provided so that the RSDT
is tried instead of the XSDT when the system can't work well.

http://bugzilla.kernel.org/show_bug.cgi?id=8246

Signed-off-by: Zhao Yakui
cc:Thomas Renninger
Signed-off-by: Len Brown

Zhao Yakui
2009-01-09 14:41:58 +0800
2150edc6c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
...

Linus Torvalds
2009-01-09 09:14:59 +0800
1df2d017f Merge branch 'docs-next' of git://git.lwn.net/linux-2.6

* 'docs-next' of git://git.lwn.net/linux-2.6:
Fix a typo in the development process document.
Document handling of bad memory
Document RCU and unloadable modules

Linus Torvalds
2009-01-09 07:52:13 +0800
d5b524327 Fix a typo in the development process document.

Reported-by: Aníbal Monsalve Salazar
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-01-09 07:32:13 +0800
022992ee5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
regulator: fix kernel-doc warnings
regulator: catch some registration errors
regulator: Add basic DocBook manual
regulator: Fix some kerneldoc rendering issues
regulator: Add missing kerneldoc
regulator: Clean up kerneldoc warnings
regulator: Remove extraneous kerneldoc annotations
regulator: init/link earlier
regulator: move set_machine_constraints after regulator device initialization
regulator: da903x: make da903x_is_enabled return 0 or 1
regulator: da903x: add '\n' to error messages
regulator: sysfs attribute reduction (v2)
regulator: code shrink (v2)
regulator: improved mode error checks
regulator: enable/disable refcounting
regulator: struct device - replace bus_id with dev_name(), dev_set_name()

Linus Torvalds
2009-01-09 06:51:11 +0800
9fe5817f1 regulator: Add basic DocBook manual

Add a basic DocBook manual for the regulator API. This is much more
skeletal than the existing text documentation, the main benefit is to
provide a skeleton for automatic generation of a manual based on the
kerneldoc for the API.

Since large portions of the text are lifted from the existing text format
documentation written by Liam Girdwood much of the credit belongs to
him.

Signed-off-by: Mark Brown
Signed-off-by: Liam Girdwood

Mark Brown
2009-01-09 04:10:34 +0800
7ad68e2f9 regulator: sysfs attribute reduction (v2)

Clean up the sysfs interface to regulators by only exposing the
attributes that can be properly displayed. For example: when a
particular regulator method is needed to display the value, only
create that attribute when that method exists.

This cleaned-up interface is much more comprehensible. Most
regulators only support a subset of the possible methods, so
often more than half the attributes would be meaningless. Many
"not defined" values are no longer necessary. (But handling
of out-of-range values still looks a bit iffy.)

Documentation is updated to reflect that few of the attributes
are *always* present, and to briefly explain why a regulator may
not have a given attribute.

This adds object code, about a dozen bytes more than was removed
by the preceding patch, but saves a bunch of per-regulator data
associated with the now-removed attributes. So there's a net
reduction in memory footprint.

Signed-off-by: David Brownell
Signed-off-by: Liam Girdwood

David Brownell
2009-01-09 04:10:30 +0800
85da1fb54 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (53 commits)
serial: Add driver for the Cell Network Processor serial port NWP device
powerpc: enable dynamic ftrace
powerpc/cell: Fix the prototype of create_vma_map()
powerpc/mm: Make clear_fixmap() actually work
powerpc/kdump: Use ppc_save_regs() in crash_setup_regs()
powerpc: Export cacheable_memzero as its now used in a driver
powerpc: Fix missing semicolons in mmu_decl.h
powerpc/pasemi: local_irq_save uses an unsigned long
powerpc/cell: Fix some u64 vs. long types
powerpc/cell: Use correct types in beat files
powerpc: Use correct type in prom_init.c
powerpc: Remove unnecessary casts
mtd/ps3vram: Use _PAGE_NO_CACHE in memory ioremap
mtd/ps3vram: Use msleep in waits
mtd/ps3vram: Use proper kernel types
mtd/ps3vram: Cleanup ps3vram driver messages
mtd/ps3vram: Remove ps3vram debug routines
mtd/ps3vram: Add modalias support to the ps3vram driver
mtd/ps3vram: Add ps3vram driver for accessing video RAM as MTD
powerpc: Fix iseries drivers build failure without CONFIG_VIOPATH
...

Linus Torvalds
2009-01-09 01:10:16 +0800
73ac36ea1 fix similar typos to successfull

While reviewing ocfs2 code, I found 2 instances of the typo "successfull".
Running grep "successfull " on the kernel tree turned up 22 instances in
total -- great minds always think alike :)

This patch fixes all the similar typos. Thanks to Randy for the ack and comments.

Signed-off-by: Coly Li
Acked-by: Randy Dunlap
Acked-by: Roland Dreier
Cc: Jeremy Kerr
Cc: Jeff Garzik
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Theodore Ts'o
Cc: Mark Fasheh
Cc: Vlad Yasevich
Cc: Sridhar Samudrala
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Coly Li
2009-01-09 00:31:15 +0800
4037014e3 w1: send status messages after command processing

Send completion status of the commands to the userspace. Message and
protocol are described in the documentation.

Signed-off-by: Evgeniy Polyakov
Cc: Paul Alfille
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Evgeniy Polyakov
2009-01-09 00:31:14 +0800
f89735c4e w1: added w1 reset command

Add a command that resets the bus.

Signed-off-by: Evgeniy Polyakov
Cc: Paul Alfille
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Evgeniy Polyakov
2009-01-09 00:31:14 +0800
e4e056aa3 w1: documentation update

Signed-off-by: Evgeniy Polyakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Evgeniy Polyakov
2009-01-09 00:31:13 +0800
a5fd9139f w1: add 1-wire master driver for i.MX27 / i.MX31

This patch adds support for the 1-wire master interface for i.MX27 and
i.MX31.

Signed-off-by: Luotao Fu
Signed-off-by: Sascha Hauer
Signed-off-by: Evgeniy Polyakov
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sascha Hauer
2009-01-09 00:31:13 +0800
999cd8a45 cgroups: add a per-subsystem hierarchy_mutex

These patches introduce new locking/refcount support for cgroups to
reduce the need for subsystems to call cgroup_lock(). This will
ultimately allow the atomicity of cgroup_rmdir() (which was removed
recently) to be restored.

These three patches give:

1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
use to prevent changes to its own cgroup tree

2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
memory controller

3/3 - introduce a css_tryget() function similar to the one recently
proposed by Kamezawa, but avoiding spurious refcount failures in
the event of a race between a css_tryget() and an unsuccessful
cgroup_rmdir()

Future patches will likely involve:

- using hierarchy mutex in place of cgroup_lock() in more subsystems
where appropriate

- restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

This patch:

Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
the hierarchy observed by that subsystem. It is taken by the cgroup
subsystem (in addition to cgroup_mutex) for the following operations:

- linking a cgroup into that subsystem's cgroup tree
- unlinking a cgroup from that subsystem's cgroup tree
- moving the subsystem to/from a hierarchy (including across the
bind() callback)

Thus if the subsystem holds its own hierarchy_mutex, it can safely
traverse its own hierarchy.

Signed-off-by: Paul Menage
Tested-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2009-01-09 00:31:10 +0800
03f3c4336 memcg: fix swap accounting leak

Fix swapin charge operation of memcg.

Now, memcg hooks the swap-out operation and checks whether SwapCache is really
unused. That check depends on the contents of struct page, i.e. if
PageAnon(page) && page_mapped(page), the page is recognized as
still-in-use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

(Page fault with WRITE)
try_charge() (charge += PAGESIZE)
commit_charge() (Check page_cgroup is used or not..)
reuse_swap_page()
-> delete_from_swapcache()
-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
......
New charge is uncharged soon....
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

try_charge() (usage += PAGESIZE)
reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set)
commit_charge() (If page_cgroup is not marked as PCG_USED,
add new charge.)
Accounting will be correct.
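
The effect of the reordering can be checked with a toy userspace model. This is a sketch under stated assumptions, not kernel code: struct memcg_model, PAGE_SZ and the fault_*_order() helpers are hypothetical, and the three functions only mimic the counter and PCG_USED transitions described in the two sequences above.

```c
#include <stdbool.h>

#define PAGE_SZ 4096L   /* hypothetical page size for the model */

/* Toy model: one usage counter and the PCG_USED flag of one page. */
struct memcg_model {
    long usage;
    bool pcg_used;
};

/* Pre-charge the group for the faulting page. */
static void try_charge(struct memcg_model *m)
{
    m->usage += PAGE_SZ;
}

/* Keep the pre-charge if the page is not yet accounted, else drop it. */
static void commit_charge(struct memcg_model *m)
{
    if (m->pcg_used)
        m->usage -= PAGE_SZ;
    else
        m->pcg_used = true;
}

/* Swap cache deletion uncharges a page marked used but not yet mapped. */
static void reuse_swap_page(struct memcg_model *m)
{
    if (m->pcg_used) {
        m->usage -= PAGE_SZ;
        m->pcg_used = false;
    }
}

/* Old order: the fresh charge is uncharged again, so usage ends at 0. */
long fault_old_order(void)
{
    struct memcg_model m = { 0, false };
    try_charge(&m);
    commit_charge(&m);
    reuse_swap_page(&m);
    return m.usage;
}

/* New order: commit happens last, so the in-use page stays charged. */
long fault_new_order(void)
{
    struct memcg_model m = { 0, false };
    try_charge(&m);
    reuse_swap_page(&m);
    commit_charge(&m);
    return m.usage;
}
```

In the model the old order leaves a mapped page with no charge at all, which is the leak; the new order ends with exactly one page charged.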

Changelog (v2) -> (v3)
- fixed invalid charge to swp_entry==0.
- updated documentation.
Changelog (v1) -> (v2)
- fixed comment.

[nishimura@mxp.nes.nec.co.jp: swap accounting leak doc fix]
Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Tested-by: Balbir Singh
Cc: Hugh Dickins
Cc: Daisuke Nishimura
Signed-off-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:10 +0800
9836d8919 memcg: explain details and test document

Documentation for implementation details and how to test.

Just an example. feel free to modify, add, remove lines.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:09 +0800
a7885eb8a memcg: swappiness

Currently, /proc/sys/vm/swappiness can change the swappiness ratio for global
reclaim. However, memcg reclaim has no tuning parameter of its own.

In general, the optimal swappiness depends on the workload (e.g. an HPC
workload needs lower swappiness than others).

Per-cgroup swappiness therefore improves administrator tunability.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: KOSAKI Motohiro
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-01-09 00:31:08 +0800
7f016ee8b memcg: show reclaim stat

Add the following five fields to memory.stat file:

- inactive_ratio
- recent_rotated_anon
- recent_rotated_file
- recent_scanned_anon
- recent_scanned_file

Acked-by: Rik van Riel
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: KOSAKI Motohiro
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-01-09 00:31:08 +0800
52bc0d821 memcg: memory cgroup hierarchy documentation

Documentation updates for hierarchy support

Signed-off-by: Balbir Singh
Cc: YAMAMOTO Takashi
Cc: Paul Menage
Cc: Li Zefan
Cc: David Rientjes
Cc: Pavel Emelianov
Cc: Dhaval Giani
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2009-01-09 00:31:05 +0800
8c7c6e34a memcg: mem+swap controller core

This patch implements a per-cgroup limit on memory+swap usage. Although
SwapCache exists, double counting of swap-cache and swap-entry is
avoided.

Mem+Swap controller works as following.
- memory usage is limited by memory.limit_in_bytes.
- memory + swap usage is limited by memory.memsw_limit_in_bytes.

This has following benefits.
- A user can limit total resource usage of mem+swap.

Without this, because memory resource controller doesn't take care of
usage of swap, a process can exhaust all the swap (by memory leak.)
We can avoid this case.

And Swap is shared resource but it cannot be reclaimed (goes back to memory)
until it's used. This characteristic can be trouble when the memory
is divided into some parts by cpuset or memcg.
Assume group A and group B.
After some application executes, the system can be..

Group A -- very large free memory space but occupy 99% of swap.
Group B -- under memory shortage but cannot use swap...it's nearly full.

Ability to set appropriate swap limit for each group is required.

Some may wonder "why not swap but mem+swap?"

- The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
mem+swap.

In other words, when we want to limit the usage of swap without affecting
global LRU, mem+swap limit is better than just limiting swap.

Accounting target information is stored in swap_cgroup which is
per swap entry record.

Charge is done as following.
map
- charge page and memsw.

unmap
- uncharge page/memsw if not SwapCache.

swap-out (__delete_from_swap_cache)
- uncharge page
- record mem_cgroup information to swap_cgroup.

swap-in (do_swap_page)
- charged as page and memsw.
record in swap_cgroup is cleared.
memsw accounting is decremented.

swap-free (swap_free())
- if swap entry is freed, memsw is uncharged by PAGE_SIZE.
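
The charge rules above can be traced with a small userspace model. This is an illustrative sketch only: struct memsw_model, PAGE_SZ and the model_* helpers are hypothetical names, and each helper applies just the counter changes listed for the corresponding event.

```c
#include <stdbool.h>

#define PAGE_SZ 4096L   /* hypothetical page size for the model */

/* Toy model of one page's accounting in one group. */
struct memsw_model {
    long mem;            /* memory usage (limited by memory.limit_in_bytes) */
    long memsw;          /* memory + swap usage */
    bool swap_recorded;  /* owner recorded in swap_cgroup */
};

/* map: charge page and memsw. */
static void model_map(struct memsw_model *m)
{
    m->mem += PAGE_SZ;
    m->memsw += PAGE_SZ;
}

/* unmap of a page that is not SwapCache: uncharge page and memsw. */
static void model_unmap(struct memsw_model *m)
{
    m->mem -= PAGE_SZ;
    m->memsw -= PAGE_SZ;
}

/* swap-out: uncharge the page only and remember the owner. */
static void model_swap_out(struct memsw_model *m)
{
    m->mem -= PAGE_SZ;
    m->swap_recorded = true;
}

/* swap-in: charge page and memsw, clear the record, drop the swap share. */
static void model_swap_in(struct memsw_model *m)
{
    m->mem += PAGE_SZ;
    m->memsw += PAGE_SZ;
    m->swap_recorded = false;
    m->memsw -= PAGE_SZ;
}

/* swap-free: a freed, recorded swap entry uncharges memsw by one page. */
static void model_swap_free(struct memsw_model *m)
{
    if (m->swap_recorded) {
        m->memsw -= PAGE_SZ;
        m->swap_recorded = false;
    }
}
```

Swapping a page out and back in moves its charge between memory and swap while memsw stays constant, which is the reason given above for limiting mem+swap rather than swap alone.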

Some people work in never-swap environments and consider swap
something bad. For them, this mem+swap controller extension is just
overhead. The overhead can be avoided by a config or boot option.
(See Kconfig; the details are not in this patch.)

TODO:
- maybe more optimization can be done in the swap-in path. (but not very safe.)
But we just do simple accounting at this stage.

[nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
[hugh@veritas.com: memswap controller core swapcache fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Daisuke Nishimura
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:05 +0800
c077719be memcg: mem+swap controller Kconfig

Config and control variable for mem+swap controller.

This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP
(memory resource controller swap extension.)

For accounting swap, it's obvious that we have to use additional memory to
remember "who uses swap". This adds more overhead. So, it's better to
offer "choice" to users. This patch adds 2 choices.

This patch adds 2 parameters to enable swap extension or not.
- CONFIG
- boot option

Reviewed-by: Daisuke Nishimura
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:05 +0800
d13d14430 memcg: handle swap caches

SwapCache support for memory resource controller (memcg)

Before mem+swap controller, memcg itself should handle SwapCache in proper
way. This is cut-out from it.

In current memcg, SwapCache is just leaked and the user can create tons of
SwapCache. This is a leak of account and should be handled.

SwapCache accounting is done as following.

charge (anon)
- charged when it's mapped.
(because of readahead, charge at add_to_swap_cache() is not sane)
uncharge (anon)
- uncharged when it's dropped from swapcache and fully unmapped.
means it's not uncharged at unmap.
Note: delete from swap cache at swap-in is done after rmap information
is established.
charge (shmem)
- charged at swap-in. this prevents charge at add_to_page_cache().

uncharge (shmem)
- uncharged when it's dropped from swapcache and not on shmem's
radix-tree.

at migration, check against 'old page' is modified to handle shmem.

Comparing to the old version discussed (and caused troubles), we have
advantages of
- PCG_USED bit.
- simple migrating handling.

So, situation is much easier than several months ago, maybe.

[hugh@veritas.com: memcg: handle swap caches build fix]
Reviewed-by: Daisuke Nishimura
Tested-by: Daisuke Nishimura
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:05 +0800
c1e862c1f memcg: new force_empty to free pages under group

By memcg-move-all-accounts-to-parent-at-rmdir.patch, there is no leak of
memory usage and force_empty is removed.

This patch adds "force_empty" again, in reasonable manner.

memory.force_empty file works when

#echo 0 (or some) > memory.force_empty
and have following function.

1. only works when there are no tasks in this cgroup.
2. frees all pages under this cgroup as far as possible.
3. pages which cannot be freed are moved up to the parent.
4. Then, memcg will be empty after the above echo returns.

This is much better behavior than the old "force_empty", which just forgot
all accounts. This patch also checks signal_pending(), so the above "echo"
can be stopped by "Ctrl-C".

[akpm@linux-foundation.org: cleanup]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:04 +0800
f817ed485 memcg: move all accounting to parent at rmdir()

This patch provides a function to move account information of a page
between mem_cgroups and rewrite force_empty to make use of this.

This moving of page_cgroup is done under
- lru_lock of source/destination mem_cgroup is held.
- lock_page_cgroup() is held.

Then, a routine which touches pc->mem_cgroup without lock_page_cgroup()
should confirm pc->mem_cgroup is still valid or not. Typical code can be
following.

(while page is not under lock_page())
mem = pc->mem_cgroup;
mz = page_cgroup_zoneinfo(pc);
spin_lock_irqsave(&mz->lru_lock, flags);
if (pc->mem_cgroup == mem)
...../* some list handling */
spin_unlock_irqrestore(&mz->lru_lock, flags);

Of course, better way is
lock_page_cgroup(pc);
....
unlock_page_cgroup(pc);

But you should confirm the nest of lock and avoid deadlock.

If you handle a page_cgroup from mem_cgroup's LRU under mz->lru_lock,
you don't have to worry about what pc->mem_cgroup points to.
Moved pages are added to the head of the LRU, not to the tail.

Expected users of this routine are:
- force_empty (rmdir)
- moving tasks between cgroup (for moving account information.)
- hierarchy (maybe useful.)

force_empty (rmdir) uses this move_account and moves pages to its parent.
This "move" will not cause OOM (I added an "oom" parameter to try_charge().)

If the parent is busy (not enough memory), force_empty calls try_to_free_page()
and reduces usage.

Purpose of this behavior is
- Fix "forget all" behavior of force_empty and avoid leak of accounting.
- By "moving first, free if necessary", keep pages on memory as much as
possible.

Adding a switch to change behavior of force_empty to
- free first, move if necessary
- free all, if there is mlocked/busy pages, return -EBUSY.
is under consideration. (I'll add it if someone requests.)

This patch also removes memory.force_empty file, a brutal debug-only interface.

Reviewed-by: Daisuke Nishimura
Tested-by: Daisuke Nishimura
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:04 +0800
18e7f1f0d cgroups: documentation updates

- remove 'releasable' since it has been moved to the debug subsys.
- update lock requirements of subsys callbacks.

Signed-off-by: Li Zefan
Cc: Paul Menage
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-01-09 00:31:01 +0800

08 Jan, 2009

3 commits

dd8632a12 NOMMU: Make mmap allocation page trimming behaviour configurable.

NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
the nearest power-of-2 number of pages. Currently it then discards the excess
pages back to the page allocator, making that memory available for use by other
things. This can, however, cause a greater amount of fragmentation.

To counter this, a sysctl is added in order to fine-tune the trimming
behaviour. The default behaviour remains to trim pages aggressively, but
trimming can either be disabled completely or set to a higher page-granular
watermark in order to have finer-grained control.
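
The trimming decision can be sketched as a small pure function. This is an
illustration rather than the kernel code: struct trim_plan, plan_trim() and
the trim_threshold parameter are hypothetical names, and the sketch assumes
the watermark semantics "0 disables trimming, 1 trims any excess, N trims
only when at least N excess pages exist".

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one allocation, all values in pages. */
struct trim_plan {
	unsigned long allocated;	/* power-of-2 size handed out */
	unsigned long trimmed;		/* excess returned to the allocator */
	unsigned long kept;		/* pages the mapping retains */
};

/* Smallest power of two that is >= n (n must be non-zero). */
static unsigned long round_up_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Decide how many pages to give back for a mapping that needs
 * needed_pages. trim_threshold is the hypothetical sysctl value.
 */
struct trim_plan plan_trim(unsigned long needed_pages,
			   unsigned long trim_threshold)
{
	struct trim_plan plan;
	unsigned long excess;

	assert(needed_pages > 0);
	plan.allocated = round_up_pow2(needed_pages);
	excess = plan.allocated - needed_pages;

	if (trim_threshold != 0 && excess >= trim_threshold)
		plan.trimmed = excess;	/* enough slack to be worth trimming */
	else
		plan.trimmed = 0;	/* disabled, or below the watermark */

	plan.kept = plan.allocated - plan.trimmed;
	return plan;
}
```

For a 5-page request, a threshold of 1 trims the 3 excess pages of the
8-page allocation, while a threshold of 0 or 4 keeps all 8 pages together
and so avoids splitting the block.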

vm region vm_top bits taken from an earlier patch by David Howells.

Signed-off-by: Paul Mundt
Signed-off-by: David Howells
Tested-by: Mike Frysinger

Paul Mundt
2009-01-08 20:04:47 +0800
8feae1311 NOMMU: Make VMAs per MM as for MMU-mode linux

Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

(1) In SYSV SHM where nattch for a segment does not reflect the number of
shmat's (and forks) done.

(2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
that a VMA might be shared and already have its vm_mm assigned to another
process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared. The vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

(1) Regions are now allocated with alloc_pages() rather than kmalloc() and
with no recourse to __GFP_COMP, so the pages are not composite. Instead,
each page has a reference on it held by the region. Anything else that is
interested in such a page will have to get a reference on it to retain it.
When the pages are released due to unmapping, each page is passed to
put_page() and will be freed when the page usage count reaches zero.

(2) Excess pages are trimmed after an allocation as the allocation must be
made as a power-of-2 quantity of pages.

(3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
end up with overlapping VMAs within the tree, the VMA struct address is
appended to the sort key.

(4) Non-anonymous VMAs are now added to the backing inode's prio list.

(5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
the backing region. The VMA and region structs will be split if
necessary.

(6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
segment instead of all the attachments at that address. Multiple
shmat()'s return the same address under NOMMU-mode instead of different
virtual addresses as under MMU-mode.

(7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

(8) /proc/maps is now the global list of mapped regions, and may list bits
that aren't actually mapped anywhere.

(9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
of RAM currently allocated by mmap to hold mappable regions that can't be
mapped directly. These are copies of the backing device or file if not
anonymous.
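
The per-page reference rule from item (1) can be sketched with a toy model.
Everything here is hypothetical (struct page_sketch, struct region_sketch
and the function names are not kernel APIs): the region takes one reference
on each page, any other user takes its own, and a page only becomes free
when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_sketch {
	int refcount;	/* number of holders */
	int in_use;	/* 1 while allocated, 0 once freed */
};

/* Hypothetical stand-in for a mapped region backed by whole pages. */
struct region_sketch {
	struct page_sketch *pages;
	size_t nr_pages;
};

void get_page_sketch(struct page_sketch *p)
{
	assert(p->in_use);
	p->refcount++;
}

/* Drop one reference; the page is freed when the count reaches zero. */
void put_page_sketch(struct page_sketch *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->in_use = 0;
}

/* The region holds one reference on each of its pages. */
void region_init(struct region_sketch *r, struct page_sketch *pages,
		 size_t nr_pages)
{
	size_t i;

	r->pages = pages;
	r->nr_pages = nr_pages;
	for (i = 0; i < nr_pages; i++) {
		pages[i].refcount = 1;
		pages[i].in_use = 1;
	}
}

/* Unmapping: every page is put individually, not as one compound unit. */
void region_release(struct region_sketch *r)
{
	size_t i;

	for (i = 0; i < r->nr_pages; i++)
		put_page_sketch(&r->pages[i]);
	r->nr_pages = 0;
}
```

If another user still holds a reference to one page when the region is
released, only that page survives; the rest are freed immediately.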

These changes make NOMMU mode more similar to MMU mode. The downside is that
NOMMU mode now requires some extra memory to track things compared with NOMMU
without this patch (VMAs are no longer shared, and there are now region structs).
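
Item (3) in the list above says the VMA struct address is appended to the
sort key so that overlapping VMAs can coexist in one tree. A minimal
comparator showing the idea follows. It is a sketch under assumptions:
struct vma_sketch and vma_cmp() are hypothetical, and ordering by start and
then end before falling back to the address is a guess at a reasonable key,
not a statement of what the patch uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: only the range matters here. */
struct vma_sketch {
	unsigned long start;
	unsigned long end;
};

/*
 * Total order over VMAs. Two distinct VMAs never compare equal, even
 * when they cover exactly the same range, because the struct address
 * is used as the final tie-breaker.
 */
int vma_cmp(const struct vma_sketch *a, const struct vma_sketch *b)
{
	uintptr_t pa = (uintptr_t)a;
	uintptr_t pb = (uintptr_t)b;

	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	if (a->end != b->end)
		return a->end < b->end ? -1 : 1;
	if (pa != pb)
		return pa < pb ? -1 : 1;	/* same range, different VMA */
	return 0;				/* the very same VMA */
}
```

Casting to uintptr_t avoids comparing unrelated pointers directly with <,
which C leaves undefined for objects that are not part of the same array.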

Signed-off-by: David Howells
Tested-by: Mike Frysinger
Acked-by: Paul Mundt

David Howells
2009-01-08 20:04:47 +0800
24f030175 Merge commit 'origin/master' into next

Benjamin Herrenschmidt
2009-01-08 13:24:38 +0800