07 Mar, 2010

7 commits

  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    Interface firmware_map_add was not called explicitly. Remove it and add
    function firmware_map_add_hotplug as the hotplug interface of memmap.

    Each memory entry has a memmap in sysfs. When we hot-add new memory, sysfs
    does not export a memmap entry for it, so we add a call from function
    add_memory to function firmware_map_add_hotplug.

    Add a new function add_sysfs_fw_map_entry() to create a memmap entry; it
    is called when initializing memmap and when hot-adding memory.

    [akpm@linux-foundation.org: un-kerneldoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
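
As a rough illustration of the memmap layout described in the commit above, the following user-space sketch reads every memmap directory and returns its start, end and type values. The base path /sys/firmware/memmap and the helper name read_memmap_entries are assumptions made for this example; the commit text itself only names the three files.

```python
from pathlib import Path


def read_memmap_entries(base="/sys/firmware/memmap"):
    """Return a sorted list of (start, end, type) tuples, one per memmap directory.

    `base` is an assumed location; pass another directory to test the parser.
    """
    entries = []
    root = Path(base)
    if not root.is_dir():
        return entries  # memmap is not exported on this system
    for entry in root.iterdir():
        try:
            # start and end hold hexadecimal addresses such as 0x100000
            start = int((entry / "start").read_text().strip(), 16)
            end = int((entry / "end").read_text().strip(), 16)
            kind = (entry / "type").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        entries.append((start, end, kind))
    return sorted(entries)
```

After a successful memory hot-add, a range that was previously missing would be expected to show up in the returned list.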
     
  • commit e815af95 ("change all_unreclaimable zone member to flags") changed
    the all_unreclaimable member to a bit flag. But it had an undesirable side
    effect: free_one_page() is one of the hottest paths in the Linux kernel, and
    adding atomic ops to it can reduce kernel performance a bit.

    Thus, this patch partially reverts that commit. At the least,
    all_unreclaimable shouldn't share a memory word with other zone flags.

    [akpm@linux-foundation.org: fix patch interaction]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Wu Fengguang
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • free_hot_page() is just a wrapper around free_hot_cold_page() with
    parameter 'cold = 0'. After adding a clear comment for
    free_hot_cold_page(), it is reasonable to remove a level of call.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Li Hong
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Li Ming Chun
    Cc: KOSAKI Motohiro
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Hong
     
  • A frequent question from users about memory management is how many
    swap entries each process uses. This information will also give some
    hints to the oom-killer.

    Although we can count the number of swapents per process by scanning
    /proc/<pid>/smaps, this is very slow and not suitable for ordinary process
    information tools that work like 'ps' or 'top'. (ps and top are already
    slow enough.)

    This patch adds a counter of swapents to mm_counter and updates it at each
    swap event. The information is exported via the /proc/<pid>/status file as

    [kamezawa@bluextal memory]$ cat /proc/self/status
    Name: cat
    State: R (running)
    Tgid: 2910
    Pid: 2910
    PPid: 2823
    TracerPid: 0
    Uid: 500 500 500 500
    Gid: 500 500 500 500
    FDSize: 256
    Groups: 500
    VmPeak: 82696 kB
    VmSize: 82696 kB
    VmLck: 0 kB
    VmHWM: 432 kB
    VmRSS: 432 kB
    VmData: 172 kB
    VmStk: 84 kB
    VmExe: 48 kB
    VmLib: 1568 kB
    VmPTE: 40 kB
    VmSwap: 0 kB

    Reviewed-by: Minchan Kim
    Reviewed-by: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
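
A monitoring tool could pick up the new VmSwap line from the status file shown in the commit above without scanning smaps. The sketch below is one possible way to do that; the function names vm_swap_kb and read_vm_swap_kb are invented for this example.

```python
def vm_swap_kb(status_text):
    """Return the VmSwap value in kB from status-file text, or None if absent."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "VmSwap":
            fields = value.split()  # e.g. ["0", "kB"]
            if fields and fields[0].isdigit():
                return int(fields[0])
    return None  # kernel without the VmSwap field, or malformed input


def read_vm_swap_kb(pid="self"):
    """Read /proc/<pid>/status and return its VmSwap value in kB, or None."""
    with open(f"/proc/{pid}/status") as f:
        return vm_swap_kb(f.read())
```

Keeping the parser separate from the file access makes it easy to check against captured output such as the listing above.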
     
  • Per-mm statistics are shared among all threads of a process, so
    updating them can be a cache-miss point in the page fault path.

    This patch adds a per-thread cache for mm_counter. The RSS value is
    counted into a struct in task_struct and synchronized with the mm's
    counter at certain events.

    In this patch, the event is the number of calls to handle_mm_fault.
    The per-thread value is added to the mm every 64 calls.

    A rough estimate with a small benchmark on parallel threads (2 threads) shows
    [before]
    4.5 cache-miss/faults
    [after]
    4.0 cache-miss/faults
    Anyway, the most contended object is mmap_sem if the number of threads grows.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
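
The batching idea in the commit above can be modelled in a few lines. This is a single-threaded toy model, not the kernel code: the class names MmCounters and TaskRssCache are invented here, and the only detail taken from the commit text is the threshold of 64 fault-handler calls.

```python
SYNC_THRESHOLD = 64  # from the commit text: fold into the mm every 64 calls


class MmCounters:
    """Stand-in for the counters shared by all threads of a process."""

    def __init__(self):
        self.rss = 0


class TaskRssCache:
    """Stand-in for the per-thread cache (hypothetical name)."""

    def __init__(self, mm):
        self.mm = mm
        self.pending = 0  # RSS delta not yet folded into the shared counter
        self.events = 0   # fault-handler calls since the last sync

    def add_rss(self, delta):
        # Cheap path: touches only thread-local state.
        self.pending += delta

    def on_fault(self):
        # Models one call to the fault handler.
        self.events += 1
        if self.events >= SYNC_THRESHOLD:
            self.sync()

    def sync(self):
        # Expensive path: the only place the shared counter is written.
        self.mm.rss += self.pending
        self.pending = 0
        self.events = 0
```

The trade-off is that the shared value can lag behind by whatever is still pending in each thread, in exchange for fewer writes to the shared cache line.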
     
  • Presently, the per-mm statistics counters are defined by macros in sched.h.

    This patch changes them to
    - be defined in mm.h as inline functions
    - use an array instead of macro-based name creation.

    This reduces the size of a future patch that modifies the
    implementation of the per-mm counters.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Rename for_each_bit to for_each_set_bit in the kernel source tree, to
    permit for_each_clear_bit(), should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
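
To make the naming in the commit above concrete, here is a small Python model of what iterating over set bits means, with a clear-bit counterpart next to it. It models the semantics only and is not the kernel macro; the bitmap is represented as a plain integer, and for_each_clear_bit is included only as the hypothetical counterpart the commit mentions.

```python
def for_each_set_bit(bitmap, size):
    """Yield the indices of set bits in `bitmap` below `size`, in ascending order."""
    bitmap &= (1 << size) - 1          # ignore bits at or above `size`
    while bitmap:
        lowest = bitmap & -bitmap      # isolate the lowest set bit
        yield lowest.bit_length() - 1  # convert it to a bit index
        bitmap ^= lowest               # clear it and continue


def for_each_clear_bit(bitmap, size):
    """Yield the indices of clear bits below `size` (hypothetical counterpart)."""
    return for_each_set_bit(~bitmap, size)
```

With the old name for_each_bit, it would not be obvious which of these two iterations is meant.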
     

06 Mar, 2010

20 commits

  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    SLUB: Fix per-cpu merge conflict
    failslab: add ability to filter slab caches
    slab: fix regression in touched logic
    dma kmalloc handling fixes
    slub: remove impossible condition
    slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
    SLUB: Make slub statistics use this_cpu_inc
    SLUB: this_cpu: Remove slub kmem_cache fields
    SLUB: Get rid of dynamic DMA kmalloc cache allocation
    SLUB: Use this_cpu operations in slub

    Linus Torvalds
     
  • * 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
    NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
    NFS: Clean up nfs_sync_mapping
    NFS: Simplify nfs_wb_page()
    NFS: Replace __nfs_write_mapping with sync_inode()
    NFS: Simplify nfs_wb_page_cancel()
    NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
    NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
    NFS: Reduce the number of unnecessary COMMIT calls
    NFS: Add a count of the number of unstable writes carried by an inode
    NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
    nfs41 fix NFS4ERR_CLID_INUSE for exchange id
    NFS: Fix an allocation-under-spinlock bug
    SUNRPC: Handle EINVAL error returns from the TCP connect operation
    NFSv4.1: Various fixes to the sequence flag error handling
    nfs4: renewd renew operations should take/put a client reference
    nfs41: renewd sequence operations should take/put client reference
    nfs: prevent backlogging of renewd requests
    nfs: kill renewd before clearing client minor version
    NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
    NFS: Improve NFS iostat byte count accuracy for writes
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: Add hardlink support to .u extension
    9P2010.L handshake: .L protocol negotiation
    9P2010.L handshake: Remove "dotu" variable
    9P2010.L handshake: Add mount option
    9P2010.L handshake: Add VFS flags
    net/9p: Handle mount errors correctly.
    net/9p: Remove MAX_9P_CHAN limit
    net/9p: Add multi channel support.

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • * 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
    KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
    KVM: VMX: Update instruction length on intercepted BP
    KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
    KVM: Fix segment descriptor loading
    KVM: Fix load_guest_segment_descriptor() to inject page fault
    KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
    KVM: Convert kvm->requests_lock to raw_spinlock_t
    KVM: Convert i8254/i8259 locks to raw_spinlocks
    KVM: x86 emulator: disallow opcode 82 in 64-bit mode
    KVM: x86 emulator: code style cleanup
    KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
    KVM: x86 emulator: Add LOCK prefix validity checking
    KVM: x86 emulator: Check CPL level during privilege instruction emulation
    KVM: x86 emulator: Fix popf emulation
    KVM: x86 emulator: Check IOPL level during io instruction emulation
    KVM: x86 emulator: fix memory access during x86 emulation
    KVM: x86 emulator: Add Virtual-8086 mode of emulation
    KVM: x86 emulator: Add group9 instruction decoding
    KVM: x86 emulator: Add group8 instruction decoding
    KVM: do not store wqh in irqfd
    ...

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     
  • Remove the 'dotu' variable and make everything depend
    on the 'proto_version' field.

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • Add new V9FS mount option to specify protocol version

    This patch adds a new mount option to specify protocol version.
    With this option it is possible to use "-o version=" switch to
    specify 9P protocol version to use. Valid options for version
    are:
    9p2000
    9p2000.u
    9p2010.L

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
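
The option handling described in the commit above amounts to accepting one of three version strings. The sketch below shows that check in user-space Python; it is not the kernel's option parser, and the name parse_9p_version and the choice to return None when no version is given are assumptions of this example.

```python
# The three values listed in the commit message.
VALID_VERSIONS = ("9p2000", "9p2000.u", "9p2010.L")


def parse_9p_version(options):
    """Return the version named in a comma-separated mount option string.

    Returns None when no version= option is present and raises ValueError
    for a version that is not in the list above.
    """
    version = None
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if sep and key.strip() == "version":
            value = value.strip()
            if value not in VALID_VERSIONS:
                raise ValueError(f"unknown 9P protocol version: {value!r}")
            version = value
    return version
```

For example, the option string "trans=virtio,version=9p2010.L" selects the .L protocol, while a string without version= leaves the choice to whatever default applies.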
     
  • Use a list to track the channels instead of a statically
    allocated array.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • This is needed for supporting multiple mount points.

    We can find out the device names to be used with mount by checking the

    /sys/devices/virtio-pci/virtio*/device file.

    If the device file has the value 9, then the specific virtio device can
    be used for mounting.

    ex:
    # cat /sys/devices/virtio-pci/virtio1/device
    9

    Now we can mount using
    # mount -t 9p -o trans=virtio virtio1 /mnt/

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
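
The manual check in the commit above (look for a device file containing 9) can be scripted. This sketch follows the path pattern and the plain value "9" exactly as given in the commit message; the function name find_9p_virtio_devices is invented here, and other kernels may lay out or format these files differently.

```python
import glob
import os


def find_9p_virtio_devices(pattern="/sys/devices/virtio-pci/virtio*/device"):
    """Return the names of virtio devices whose device file contains 9."""
    names = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # device disappeared or is not readable
        if value == "9":
            # The directory name (e.g. virtio1) is what mount expects.
            names.append(os.path.basename(os.path.dirname(path)))
    return names
```

Each returned name could then be passed to mount in place of virtio1 in the example above.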
     
  • Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Now that we have correct COMMIT semantics in writeback_single_inode, we can
    reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
    call to filemap_write_and_wait(), which doesn't need to hold the
    inode->i_mutex.

    With that done, we can eliminate nfs_write_mapping() altogether.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • In all cases we should be able to just remove the request and call
    cancel_dirty_page().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • In order to know when we should do opportunistic commits of the unstable
    writes, when the VM is doing a background flush, we add a field to count
    the number of unstable writes.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The sole purpose of nfs_write_inode is to commit unstable writes, so
    move it into fs/nfs/write.c, and make nfs_commit_inode static.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • * 'write_inode2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    pass writeback_control to ->write_inode
    make sure data is on disk before calling ->write_inode

    Linus Torvalds
     
  • * 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Issue at least one memory barrier in stop_machine_text_poke()
    perf probe: Correct probe syntax on command line help
    perf probe: Add lazy line matching support
    perf probe: Show more lines after last line
    perf probe: Check function address range strictly in line finder
    perf probe: Use libdw callback routines
    perf probe: Use elfutils-libdw for analyzing debuginfo
    perf probe: Rename probe finder functions
    perf probe: Fix bugs in line range finder
    perf probe: Update perf probe document
    perf probe: Do not show --line option without dwarf support
    kprobes: Add documents of jump optimization
    kprobes/x86: Support kprobes jump optimization on x86
    x86: Add text_poke_smp for SMP cross modifying code
    kprobes/x86: Cleanup save/restore registers
    kprobes/x86: Boost probes when reentering
    kprobes: Jump optimization sysctl interface
    kprobes: Introduce kprobes jump optimization
    kprobes: Introduce generic insn_slot framework
    kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (36 commits)
    ext4: fix up rb_root initializations to use RB_ROOT
    ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl
    ext4: Fix the NULL reference in double_down_write_data_sem()
    ext4: Fix insertion point of extent in mext_insert_across_blocks()
    ext4: consolidate in_range() definitions
    ext4: cleanup to use ext4_grp_offs_to_block()
    ext4: cleanup to use ext4_group_first_block_no()
    ext4: Release page references acquired in ext4_da_block_invalidatepages
    ext4: Fix ext4_quota_write cross block boundary behaviour
    ext4: Convert BUG_ON checks to use ext4_error() instead
    ext4: Use direct_IO_no_locking in ext4 dio read
    ext4: use ext4_get_block_write in buffer write
    ext4: mechanical rename some of the direct I/O get_block's identifiers
    ext4: make "offset" consistent in ext4_check_dir_entry()
    ext4: Handle non empty on-disk orphan link
    ext4: explicitly remove inode from orphan list after failed direct io
    ext4: fix error handling in migrate
    ext4: deprecate obsoleted mount options
    ext4: Fix fencepost error in choosing group vs file preallocation.
    jbd2: clean up an assertion in jbd2_journal_commit_transaction()
    ...

    Linus Torvalds
     
  • This gives the filesystem more information about the writeback that
    is happening. Trond requested this for the NFS unstable write handling,
    and other filesystems might benefit from this too by being able to
    distinguish between the different callers in more detail.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

13 commits

  • Just use 0 / -EDQUOT directly - that's what it translates to anyway.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straightforward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate we currently only call vfs_dq_init for the sys_truncate case
    because open already takes care of it for ftruncate and open(O_TRUNC) - the
    new code causes an additional vfs_dq_init for those which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the drop dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_drop helper to __dquot_drop
    and vfs_dq_drop to dquot_drop to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the transfer dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_transfer helper to __dquot_transfer
    and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
    and make the new dquot_transfer return a normal negative errno value
    which all callers expect.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the alloc_inode and free_inode dquot operations - they are
    always called from the filesystem and if a filesystem really needs
    its own (which none currently does) it can just call into its
    own routine directly.

    Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
    call the lowlevel dquot_alloc_inode / dquot_free_inode routines
    directly, which now lose the number argument which is always 1.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the alloc_space, free_space, reserve_space, claim_space and
    release_rsv dquot operations - they are always called from the filesystem
    and if a filesystem really needs its own (which none currently does)
    it can just call into its own routine directly.

    Move shared logic into the common __dquot_alloc_space,
    dquot_claim_space_nodirty and __dquot_free_space low-level methods,
    and rationalize the wrappers around them to move as much code as
    possible into the common block for CONFIG_QUOTA vs not. Also rename
    all these helpers to be named dquot_* instead of vfs_dq_*.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • The current quota transfer interface supports only uid/gid.
    This patch extends the interface to support various quota types.
    The goal is accomplished without changes to the most frequently used
    vfs_dq_transfer() function.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • - remove hardcoded USRQUOTA/GRPQUOTA flags
    - convert int to bool for appropriate functions

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • Currently sync_quota_sb does a lot of sync and truncate actions that only
    apply to "VFS" style quotas and are actively harmful for the sync
    performance in XFS. Move them into vfs_quota_sync and add a wait parameter
    to ->quota_sync to tell if we need them or not.

    My audit of the GFS2 code says it's also not needed given the way GFS2
    implements quotas, but I'd be happy if this can get a detailed review.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently Q_XQUOTASYNC calls into the quota_sync method, but XFS does something
    entirely different in it than the rest of the filesystems. xfs_quota which
    calls Q_XQUOTASYNC expects an asynchronous data writeout to flush delayed
    allocations, while the "VFS" quota support wants to flush changes to the quota
    file.

    So make Q_XQUOTASYNC call into the writeback code directly and make the
    quota_sync method optional, as XFS doesn't need it in the sense expected
    by the rest of the quota code.

    GFS2 was using limited XFS-style quota and has a quota_sync method fitting
    neither the style used by vfs_quota_sync nor xfs_fs_quota_sync. I left it
    in for now as per discussion with Steve it expects to be called from the
    sync path this way.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Since we implemented a generic reserved space management interface,
    it is possible to account reserved space even when quota
    is not active (similar to i_blocks/i_bytes).

    Without this patch, the following testcase results in massive complaints
    from WARN_ON in dquot_claim_space()

    TEST_CASE:
    mount /dev/sdb /mnt -oquota
    dd if=/dev/zero of=/mnt/test bs=1M count=1
    quotaon /mnt
    # fs_reserved_space == 1Mb
    # quota_reserved_space == 0, because quota was disabled
    dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1
    # fs_reserved_space == 2Mb
    # quota_reserved_space == 1Mb
    sync # ->dquot_claim_space() -> WARN_ON

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • CONFIG_BUFFER_DEBUG seems to have been removed from the documentation
    somewhere around 2.4.15 and seemingly hasn't been available for even
    longer. It is, however, still referenced in one place from the jbd
    code (one is a copy of the other header). Time to clean it up.

    Signed-off-by: Christoph Egger
    Signed-off-by: Jan Kara

    Christoph Egger