07 Mar, 2010
16 commits
-
free_hot_page() is just a wrapper around free_hot_cold_page() with
parameter 'cold = 0'. After adding a clear comment for
free_hot_cold_page(), it is reasonable to remove a level of call.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Cc: Americo Wang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move the call to trace_mm_page_free_direct() from free_hot_page() to
free_hot_cold_page(). This is clearer and places it close to kmemcheck_free_shadow(),
as is done in __free_pages_ok().
Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
trace_mm_page_free_direct() is called in __free_pages(). But it
is called again in free_hot_page() if order == 0, which produces duplicate
records in the trace file for the mm_page_free_direct event, as shown below:

K-PID CPU# TIMESTAMP FUNCTION
gnome-terminal-1567 [000] 4415.246466: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.246468: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.246506: mm_page_alloc: page=ffffea0003db9f40 pfn=1155800 order=0 migratetype=0 gfp_flags=GFP_KERNEL
gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0

This patch removes the first call and adds a call to
trace_mm_page_free_direct() in __free_pages_ok().
Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Commit cf40bd16fd ("lockdep: annotate reclaim context") introduced reclaim
context annotation, but it didn't annotate zone reclaim. This patch does
so.

The point is that commit cf40bd16fd annotates __alloc_pages_direct_reclaim, but
zone reclaim doesn't use __alloc_pages_direct_reclaim.

The current call graph is:
__alloc_pages_nodemask
  get_page_from_freelist
    zone_reclaim()
  __alloc_pages_slowpath
    __alloc_pages_direct_reclaim
      try_to_free_pages

In fact, if zone_reclaim_mode=1, the VM never calls
__alloc_pages_direct_reclaim under usual VM pressure.
Signed-off-by: KOSAKI Motohiro
Reviewed-by: Minchan Kim
Acked-by: Nick Piggin
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
get_scan_ratio() should contain all scan-ratio related calculations,
so this patch moves some calculations into get_scan_ratio().
Signed-off-by: KOSAKI Motohiro
Reviewed-by: Rik van Riel
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Kswapd checks that a zone has sufficient pages free via zone_watermark_ok().
If any zone doesn't have enough pages, we set all_zones_ok to zero.
!all_zones_ok makes kswapd retry rather than sleeping.

I think the watermark check before shrink_zone() is pointless. Only after
kswapd has tried to shrink the zone is the check meaningful.

Move the check to after the call to shrink_zone().
[akpm@linux-foundation.org: fix comment, layout]
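The effect of the ordering can be illustrated with a toy model. The following Python sketch is not kernel code: the zone dictionaries, their keys, and the `balance_pass` helper are all hypothetical, and the stand-in for shrink_zone() simply reclaims everything reclaimable. It only shows why a check made before shrinking can request a retry that the shrink has already made unnecessary.

```python
def balance_pass(zones, check_before_shrink):
    """One toy balancing pass; returns True if kswapd could go to sleep.

    Each zone is a dict with hypothetical keys 'free', 'watermark' and
    'reclaimable'. Nothing here mirrors real kernel data structures.
    """
    all_zones_ok = True
    for zone in zones:
        # Old ordering: judge the zone before any reclaim has happened.
        if check_before_shrink and zone["free"] < zone["watermark"]:
            all_zones_ok = False
        # Stand-in for shrink_zone(): reclaim whatever is reclaimable.
        zone["free"] += zone["reclaimable"]
        zone["reclaimable"] = 0
        # New ordering: judge the zone after reclaim has been attempted.
        if not check_before_shrink and zone["free"] < zone["watermark"]:
            all_zones_ok = False
    return all_zones_ok
```

In this model, a zone that starts below its watermark but can be fully refilled by reclaim makes the first ordering report "not ok", while the second ordering reports "ok".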
Signed-off-by: Minchan Kim
Reviewed-by: KOSAKI Motohiro
Cc: Mel Gorman
Cc: Rik van Riel
Reviewed-by: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make sure the compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in
3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
fetching rlimits") or ACCESS_ONCE if not applicable.
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently, mlock_vma_pages_range() only returns len or 0, so the current
error handling in mmap_region() is needlessly complex.

This patch simplifies it and makes it consistent with the brk() code.
Signed-off-by: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Lee Schermerhorn
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently, mlock_vma_pages_range() never returns a negative value, so we
can remove some pointless error checks.
Signed-off-by: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Lee Schermerhorn
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
A frequent question from users about memory management is how many
swap entries are used by each process. This information will also give some
hints to the oom-killer.

Although we can count the number of swapents per process by scanning
/proc//smaps, this is very slow and not suitable for ordinary process
information tools such as 'ps' or 'top'. (ps and top are already
slow enough.)

This patch adds a counter of swapents to mm_counter and updates it at each
swap event. The information is exported via the /proc//status file as:

[kamezawa@bluextal memory]$ cat /proc/self/status
Name: cat
State: R (running)
Tgid: 2910
Pid: 2910
PPid: 2823
TracerPid: 0
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 256
Groups: 500
VmPeak: 82696 kB
VmSize: 82696 kB
VmLck: 0 kB
VmHWM: 432 kB
VmRSS: 432 kB
VmData: 172 kB
VmStk: 84 kB
VmExe: 48 kB
VmLib: 1568 kB
VmPTE: 40 kB
VmSwap: 0 kB
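A tool in the style of 'ps' or 'top' could read the new field directly instead of scanning smaps. A minimal sketch in Python, assuming the `VmSwap: <n> kB` layout shown in the sample output above; the helper names `parse_vm_swap_kb` and `read_vm_swap_kb` are hypothetical and not part of the patch:

```python
def parse_vm_swap_kb(status_text):
    """Return the VmSwap value in kB from status-file text, or None.

    Hypothetical helper for illustration; assumes the 'VmSwap: <n> kB'
    layout shown in the sample output above.
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            fields = line.split()
            # fields looks like ['VmSwap:', '0', 'kB']
            return int(fields[1])
    return None  # field not present in this status text


def read_vm_swap_kb(pid="self"):
    """Read VmSwap for a process from its status file under /proc."""
    with open("/proc/%s/status" % pid) as f:
        return parse_vm_swap_kb(f.read())
```

Returning None when the field is missing lets a caller distinguish "no swap used" (0) from "this kernel does not export the field".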
Reviewed-by: Minchan Kim
Reviewed-by: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Per-mm stats are shared among threads by nature and can be a
cache-miss point in the page fault path.

This patch adds a per-thread cache for mm_counter. RSS values are
counted in a struct in task_struct and synchronized with the mm's counters at
certain events.

In this patch, the event is the number of calls to handle_mm_fault:
the per-thread value is added to the mm every 64 calls.

A rough estimate from a small benchmark with parallel threads (2 threads) shows:
[before]
4.5 cache-miss/faults
[after]
4.0 cache-miss/faults
Anyway, the most contended object is mmap_sem if the number of threads grows.
[akpm@linux-foundation.org: coding-style fixes]
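The batching idea described above can be sketched in a few lines. This is a single-threaded Python model under stated assumptions, not the kernel implementation: the class names `SharedCounter` and `ThreadCache` are hypothetical, there is no locking or atomics, and only the threshold of 64 events comes from the commit message.

```python
SYNC_THRESHOLD = 64  # from the commit message: fold in every 64 calls


class SharedCounter:
    """Stands in for the per-mm counter that all threads share."""

    def __init__(self):
        self.value = 0
        self.updates = 0  # how many times the shared object was written

    def add(self, delta):
        self.value += delta
        self.updates += 1


class ThreadCache:
    """Stands in for the per-task cache; names here are hypothetical."""

    def __init__(self, shared):
        self.shared = shared
        self.pending = 0
        self.events = 0

    def account(self, delta):
        # Cheap path: touch only thread-local state.
        self.pending += delta

    def on_fault(self):
        # Models "the event is the number of calls to handle_mm_fault".
        self.events += 1
        if self.events >= SYNC_THRESHOLD:
            self.sync()

    def sync(self):
        # Fold the locally accumulated delta into the shared counter.
        if self.pending:
            self.shared.add(self.pending)
        self.pending = 0
        self.events = 0
```

With this model, 128 faults that each account one page write to the shared counter only twice, which is the kind of reduction in shared-object traffic the patch is after. The trade-off is that the shared value can lag behind by whatever is still pending in each thread.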
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Presently, the per-mm statistics counters are defined by macros in sched.h.
This patch modifies them to
- be defined in mm.h as inline functions
- use an array instead of macro-generated names.

This reduces the size of a future patch that modifies the
implementation of the per-mm counters.
Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Replace open-coded loop with for_each_set_bit().
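For readers unfamiliar with the helper, it visits the index of every set bit in a bitmap. A rough Python analogue follows; it is not the kernel macro (which takes a loop variable, a bitmap address and a size), and representing the bitmap as a single integer is an assumption made for illustration.

```python
def for_each_set_bit(word, size):
    """Yield the index of every set bit in the low `size` bits of `word`."""
    for bit in range(size):
        if (word >> bit) & 1:
            yield bit


# The open-coded form that such a helper replaces looks like this:
def set_bits_open_coded(word, size):
    result = []
    bit = 0
    while bit < size:
        if word & (1 << bit):
            result.append(bit)
        bit += 1
    return result
```

Both forms visit the same indices; the helper just moves the test-and-skip logic out of every call site.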
Signed-off-by: Akinobu Mita
Acked-by: Roland Dreier
Cc: Sean Hefty
Cc: Hal Rosenstock
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Rename for_each_bit to for_each_set_bit in the kernel source tree, to
permit for_each_clear_bit(), should that ever be added.

The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.
[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan
Suggested-by: Andrew Morton
Signed-off-by: Akinobu Mita
Cc: "David S. Miller"
Cc: Russell King
Cc: David Woodhouse
Cc: Artem Bityutskiy
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Use of get_irq_chip_data() et al. requires including linux/irq.h
Signed-off-by: David S. Miller
Cc: Richard Röjfors
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We managed to lose O_DIRECTORY testing due to a stupid typo in commit
1f36f774b2 ("Switch !O_CREAT case to use of do_last()")
Reported-by: Walter Sheets
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
06 Mar, 2010
24 commits
-
* 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
SLUB: Fix per-cpu merge conflict
failslab: add ability to filter slab caches
slab: fix regression in touched logic
dma kmalloc handling fixes
slub: remove impossible condition
slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
SLUB: Make slub statistics use this_cpu_inc
SLUB: this_cpu: Remove slub kmem_cache fields
SLUB: Get rid of dynamic DMA kmalloc cache allocation
SLUB: Use this_cpu operations in slub
-
* 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
NFS: Clean up nfs_sync_mapping
NFS: Simplify nfs_wb_page()
NFS: Replace __nfs_write_mapping with sync_inode()
NFS: Simplify nfs_wb_page_cancel()
NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
NFS: Reduce the number of unnecessary COMMIT calls
NFS: Add a count of the number of unstable writes carried by an inode
NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
nfs41 fix NFS4ERR_CLID_INUSE for exchange id
NFS: Fix an allocation-under-spinlock bug
SUNRPC: Handle EINVAL error returns from the TCP connect operation
NFSv4.1: Various fixes to the sequence flag error handling
nfs4: renewd renew operations should take/put a client reference
nfs41: renewd sequence operations should take/put client reference
nfs: prevent backlogging of renewd requests
nfs: kill renewd before clearing client minor version
NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
NFS: Improve NFS iostat byte count accuracy for writes
...
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: Add hardlink support to .u extension
9P2010.L handshake: .L protocol negotiation
9P2010.L handshake: Remove "dotu" variable
9P2010.L handshake: Add mount option
9P2010.L handshake: Add VFS flags
net/9p: Handle mount errors correctly.
net/9p: Remove MAX_9P_CHAN limit
net/9p: Add multi channel support.
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
-
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
KVM: VMX: Update instruction length on intercepted BP
KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
KVM: Fix segment descriptor loading
KVM: Fix load_guest_segment_descriptor() to inject page fault
KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
KVM: Convert kvm->requests_lock to raw_spinlock_t
KVM: Convert i8254/i8259 locks to raw_spinlocks
KVM: x86 emulator: disallow opcode 82 in 64-bit mode
KVM: x86 emulator: code style cleanup
KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
KVM: x86 emulator: Add LOCK prefix validity checking
KVM: x86 emulator: Check CPL level during privilege instruction emulation
KVM: x86 emulator: Fix popf emulation
KVM: x86 emulator: Check IOPL level during io instruction emulation
KVM: x86 emulator: fix memory access during x86 emulation
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: Add group9 instruction decoding
KVM: x86 emulator: Add group8 instruction decoding
KVM: do not store wqh in irqfd
...

Trivial conflicts in Documentation/feature-removal-schedule.txt
-
For regular files and directories we put the link
count in the extension field in a tagged string format.
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen
-
This patch adds 9P2010.L protocol negotiation with the server
Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen
-
Remove the 'dotu' variable and make everything dependent
on the 'proto_version' field.
Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen
-
Add new V9FS mount option to specify protocol version
This patch adds a new mount option to specify protocol version.
With this option it is possible to use "-o version=" switch to
specify 9P protocol version to use. Valid options for version
are:
9p2000
9p2000.u
9p2010.L
Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen
-
Add 9P2000.u and 9P2010.L protocol flags to V9FS VFS
This patch adds 9P2000.u and 9P2010.L protocol flags into V9FS VFS side code
and removes the single flag used for 'extended'.
Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen
-
With this patch we have
# mount -t 9p -o trans=virtio virtio2 /mnt/
# mount -t 9p -o trans=virtio virtio2 /mnt/
mount: virtio2 already mounted or /mnt/ busy
mount: according to mtab, virtio2 is already mounted on /mnt
# mount -t 9p -o trans=virtio virtio3 /mnt/ -o debug=0xfff
mount: special device virtio3 does not exist
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen
-
Use a list to track the channels instead of a statically
allocated array.
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen
-
This is needed for supporting multiple mount points.
We can find the device names to use with mount by checking the
/sys/devices/virtio-pci/virtio*/device file.
If the device file has the value 9, then that virtio device can
be used for mounting.

ex:
# cat /sys/devices/virtio-pci/virtio1/device
9

now we can mount using
# mount -t 9p -o trans=virtio virtio1 /mnt/
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen
-
Signed-off-by: Trond Myklebust
-
Remove the redundant call to filemap_write_and_wait().
Signed-off-by: Trond Myklebust
-
Signed-off-by: Trond Myklebust
-
Now that we have correct COMMIT semantics in writeback_single_inode, we can
reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.

With that done, we can eliminate nfs_write_mapping() altogether.
Signed-off-by: Trond Myklebust
-
In all cases we should be able to just remove the request and call
cancel_dirty_page().
Signed-off-by: Trond Myklebust
-
Since nfs_scan_list() doesn't wait for locked pages, we have a race in
which it is possible to end up with an inode that needs to send a COMMIT,
but which does not have the I_DIRTY_DATASYNC flag set.
Signed-off-by: Trond Myklebust
-
Signed-off-by: Trond Myklebust
Acked-by: Peter Zijlstra
Acked-by: Wu Fengguang
-
If the caller is doing a non-blocking flush, and there are still writebacks
pending on the wire, we can usually defer the COMMIT call until those
writes are done.

Also ensure that we honour the wbc->nonblocking flag.
Signed-off-by: Trond Myklebust
-
In order to know when we should do opportunistic commits of the unstable
writes, when the VM is doing a background flush, we add a field to count
the number of unstable writes.
Signed-off-by: Trond Myklebust
-
The sole purpose of nfs_write_inode is to commit unstable writes, so
move it into fs/nfs/write.c, and make nfs_commit_inode static.
Signed-off-by: Trond Myklebust