07 Mar, 2010

26 commits

  • The old anon_vma code can lead to scalability issues with heavily forking
    workloads. Specifically, each anon_vma will be shared between the parent
    process and all its child processes.

    In a workload with 1000 child processes and a VMA with 1000 anonymous
    pages per process that get COWed, this leads to a system with a million
    anonymous pages in the same anon_vma, each of which is mapped in just one
    of the 1000 processes. However, the current rmap code needs to walk them
    all, leading to O(N) scanning complexity for each page.

    This can result in systems where one CPU is walking the page tables of
    1000 processes in page_referenced_one, while all other CPUs are stuck on
    the anon_vma lock. This leads to catastrophic failure for a benchmark
    like AIM7, where the total number of processes can reach in the tens of
    thousands. Real workloads are still a factor 10 less process intensive
    than AIM7, but they are catching up.

    This patch changes the way anon_vmas and VMAs are linked, which allows us
    to associate multiple anon_vmas with a VMA. At fork time, each child
    process gets its own anon_vmas, in which its COWed pages will be
    instantiated. The parents' anon_vma is also linked to the VMA, because
    non-COWed pages could be present in any of the children.

    This reduces rmap scanning complexity to O(1) for the pages of the 1000
    child processes, with O(N) complexity for at most 1/N pages in the system.
    This reduces the average scanning cost in heavily forking workloads from
    O(N) to 2.

    The only real complexity in this patch stems from the fact that linking a
    VMA to anon_vmas now involves memory allocations. This means vma_adjust
    can fail, if it needs to attach a VMA to anon_vma structures. This in
    turn means error handling needs to be added to the calling functions.

    A second source of complexity is that, because there can be multiple
    anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
    "the" anon_vma lock. To prevent the rmap code from walking up an
    incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
    flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
    to make sure it is impossible to compile a kernel that needs both symbolic
    values for the same bitflag.

    Some test results:

    Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
    box with 16GB RAM and not quite enough IO), the system ends up running
    >99% in system time, with every CPU on the same anon_vma lock in the
    pageout code.

    With these changes, AIM7 hits the cross-over point around 29.7k users.
    This happens with ~99% IO wait time, there never seems to be any spike in
    system time. The anon_vma lock contention appears to be resolved.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Cc: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • mm/memcontrol.c:2548:32: warning: Using plain integer as NULL pointer

    Signed-off-by: Thiago Farina
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Farina
     
  • It was tolerable until Eric went and added 8388608.

    Cc: Eric Paris
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

    POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
    a 16K read will be carried out in 4 _sync_ 1-page reads.

    In other places, ra_pages==0 means
    - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
    - some IO error happened
    where multi-page read IO won't help or should be avoided.

    POSIX_FADV_RANDOM actually want a different semantics: to disable the
    *heuristic* readahead algorithm, and to use a dumb one which faithfully
    submit read IO for whatever application requests.

    So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

    Note that the random hint is not likely to help random reads performance
    noticeably. And it may be too permissive on huge request size (its IO
    size is not limited by read_ahead_kb).

    In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
    (NFS read) performance of the application increased by 313%!

    Tested-by: Quentin Barnes
    Signed-off-by: Wu Fengguang
    Cc: Nick Piggin
    Cc: Andi Kleen
    Cc: Steven Whitehouse
    Cc: David Howells
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: [2.6.33.x]
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • We'll introduce FMODE_RANDOM which will be runtime modified. So protect
    all runtime modification to f_mode with f_lock to avoid races.

    Signed-off-by: Wu Fengguang
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: [2.6.33.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • commit 01b1ae63c2 ("memcg: simple migration handling") removed
    mem_cgroup_uncharge_cache_page() call from migrate_page_copy. Local
    variable `anon' is now unused.

    Signed-off-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Currently, do_migrate_pages() have very long comment and this is not
    indent properly. I often misunderstand it is function starting commnents
    and confused it.

    this patch fixes it.

    note: this patch doesn't break 80 column rule. I guess original
    author intended this indentaion, but an accident corrupted it.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    Interface firmware_map_add was not called explicitly. Remove it and add
    function firmware_map_add_hotplug as hotplug interface of memmap.

    Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
    does not export memmap entry for it. We add a call in function add_memory
    to function firmware_map_add_hotplug.

    Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
    will be called when initialize memmap and hot-add memory.

    [akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • Strangely, current mbind() doesn't merge vma with neighbor vma although it's possible.
    Unfortunately, many vma can reduce performance...

    This patch fixes it.

    reproduced program
    ----------------------------------------------------------------
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    static unsigned long pagesize;

    int main(int argc, char** argv)
    {
    void* addr;
    int ch;
    int node;
    struct bitmask *nmask = numa_allocate_nodemask();
    int err;
    int node_set = 0;
    char buf[128];

    while ((ch = getopt(argc, argv, "n:")) != -1){
    switch (ch){
    case 'n':
    node = strtol(optarg, NULL, 0);
    numa_bitmask_setbit(nmask, node);
    node_set = 1;
    break;
    default:
    ;
    }
    }
    argc -= optind;
    argv += optind;

    if (!node_set)
    numa_bitmask_setbit(nmask, 0);

    pagesize = getpagesize();

    addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
    MAP_ANON|MAP_PRIVATE, 0, 0);
    if (addr == MAP_FAILED)
    perror("mmap "), exit(1);

    fprintf(stderr, "pid = %d \n" "addr = %p\n", getpid(), addr);

    /* make page populate */
    memset(addr, 0, pagesize*3);

    /* first mbind */
    err = mbind(addr+pagesize, pagesize, MPOL_BIND, nmask->maskp,
    nmask->size, MPOL_MF_MOVE_ALL);
    if (err)
    error("mbind1 ");

    /* second mbind */
    err = mbind(addr, pagesize*3, MPOL_DEFAULT, NULL, 0, 0);
    if (err)
    error("mbind2 ");

    sprintf(buf, "cat /proc/%d/maps", getpid());
    system(buf);

    return 0;
    }
    ----------------------------------------------------------------

    result without this patch

    addr = 0x7fe26ef09000
    [snip]
    7fe26ef09000-7fe26ef0a000 rw-p 00000000 00:00 0
    7fe26ef0a000-7fe26ef0b000 rw-p 00000000 00:00 0
    7fe26ef0b000-7fe26ef0c000 rw-p 00000000 00:00 0
    7fe26ef0c000-7fe26ef0d000 rw-p 00000000 00:00 0

    => 0x7fe26ef09000-0x7fe26ef0c000 have three vmas.

    result with this patch

    addr = 0x7fc9ebc76000
    [snip]
    7fc9ebc76000-7fc9ebc7a000 rw-p 00000000 00:00 0
    7fffbe690000-7fffbe6a5000 rw-p 00000000 00:00 0 [stack]

    => 0x7fc9ebc76000-0x7fc9ebc7a000 have only one vma.

    [minchan.kim@gmail.com: fix file offset passed to vma_merge()]
    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • commit e815af95 ("change all_unreclaimable zone member to flags") changed
    all_unreclaimable member to bit flag. But it had an undesireble side
    effect. free_one_page() is one of most hot path in linux kernel and
    increasing atomic ops in it can reduce kernel performance a bit.

    Thus, this patch revert such commit partially. at least
    all_unreclaimable shouldn't share memory word with other zone flags.

    [akpm@linux-foundation.org: fix patch interaction]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Wu Fengguang
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • free_hot_page() is just a wrapper around free_hot_cold_page() with
    parameter 'cold = 0'. After adding a clear comment for
    free_hot_cold_page(), it is reasonable to remove a level of call.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Li Hong
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Li Ming Chun
    Cc: KOSAKI Motohiro
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Hong
     
  • Move a call of trace_mm_page_free_direct() from free_hot_page() to
    free_hot_cold_page(). It is clearer and close to kmemcheck_free_shadow(),
    as it is done in function __free_pages_ok().

    Signed-off-by: Li Hong
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Li Ming Chun
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Hong
     
  • trace_mm_page_free_direct() is called in function __free_pages(). But it
    is called again in free_hot_page() if order == 0 and produce duplicate
    records in trace file for mm_page_free_direct event. As below:

    K-PID CPU# TIMESTAMP FUNCTION
    gnome-terminal-1567 [000] 4415.246466: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
    gnome-terminal-1567 [000] 4415.246468: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
    gnome-terminal-1567 [000] 4415.246506: mm_page_alloc: page=ffffea0003db9f40 pfn=1155800 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
    gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0

    This patch removes the first call and adds a call to
    trace_mm_page_free_direct() in __free_pages_ok().

    Signed-off-by: Li Hong
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Li Ming Chun
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Hong
     
  • Commit cf40bd16fd ("lockdep: annotate reclaim context") introduced reclaim
    context annotation. But it didn't annotate zone reclaim. This patch do
    it.

    The point is, commit cf40bd16fd annotate __alloc_pages_direct_reclaim but
    zone-reclaim doesn't use __alloc_pages_direct_reclaim.

    current call graph is

    __alloc_pages_nodemask
    get_page_from_freelist
    zone_reclaim()
    __alloc_pages_slowpath
    __alloc_pages_direct_reclaim
    try_to_free_pages

    Actually, if zone_reclaim_mode=1, VM never call
    __alloc_pages_direct_reclaim in usual VM pressure.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Minchan Kim
    Acked-by: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • The get_scan_ratio() should have all scan-ratio related calculations.
    Thus, this patch move some calculation into get_scan_ratio.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Kswapd checks that zone has sufficient pages free via zone_watermark_ok().

    If any zone doesn't have enough pages, we set all_zones_ok to zero.
    !all_zone_ok makes kswapd retry rather than sleeping.

    I think the watermark check before shrink_zone() is pointless. Only after
    kswapd has tried to shrink the zone is the check meaningful.

    Move the check to after the call to shrink_zone().

    [akpm@linux-foundation.org: fix comment, layout]
    Signed-off-by: Minchan Kim
    Reviewed-by: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Rik van Riel
    Reviewed-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in
    3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
    fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Currently, mlock_vma_pages_range() only return len or 0. then current
    error handling of mmap_region() is meaningless complex.

    This patch makes simplify and makes consist with brk() code.

    Signed-off-by: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Currently, mlock_vma_pages_range() never return negative value. Then, we
    can remove some worthless error check.

    Signed-off-by: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • A frequent questions from users about memory management is what numbers of
    swap ents are user for processes. And this information will give some
    hints to oom-killer.

    Besides we can count the number of swapents per a process by scanning
    /proc//smaps, this is very slow and not good for usual process
    information handler which works like 'ps' or 'top'. (ps or top is now
    enough slow..)

    This patch adds a counter of swapents to mm_counter and update is at each
    swap events. Information is exported via /proc//status file as

    [kamezawa@bluextal memory]$ cat /proc/self/status
    Name: cat
    State: R (running)
    Tgid: 2910
    Pid: 2910
    PPid: 2823
    TracerPid: 0
    Uid: 500 500 500 500
    Gid: 500 500 500 500
    FDSize: 256
    Groups: 500
    VmPeak: 82696 kB
    VmSize: 82696 kB
    VmLck: 0 kB
    VmHWM: 432 kB
    VmRSS: 432 kB
    VmData: 172 kB
    VmStk: 84 kB
    VmExe: 48 kB
    VmLib: 1568 kB
    VmPTE: 40 kB
    VmSwap: 0 kB
    Reviewed-by: Minchan Kim
    Reviewed-by: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Considering the nature of per mm stats, it's the shared object among
    threads and can be a cache-miss point in the page fault path.

    This patch adds per-thread cache for mm_counter. RSS value will be
    counted into a struct in task_struct and synchronized with mm's one at
    events.

    Now, in this patch, the event is the number of calls to handle_mm_fault.
    Per-thread value is added to mm at each 64 calls.

    rough estimation with small benchmark on parallel thread (2threads) shows
    [before]
    4.5 cache-miss/faults
    [after]
    4.0 cache-miss/faults
    Anyway, the most contended object is mmap_sem if the number of threads grows.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, per-mm statistics counter is defined by macro in sched.h

    This patch modifies it to
    - defined in mm.h as inlinf functions
    - use array instead of macro's name creation.

    This patch is for reducing patch size in future patch to modify
    implementation of per-mm counter.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Replace open-coded loop with for_each_set_bit().

    Signed-off-by: Akinobu Mita
    Acked-by: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Rename for_each_bit to for_each_set_bit in the kernel source tree. To
    permit for_each_clear_bit(), should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use of get_irq_chip_data() et al. requires including linux/irq.h

    Signed-off-by: David S. Miller
    Cc: Richard Röjfors
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
     
  • We managed to lose O_DIRECTORY testing due to a stupid typo in commit
    1f36f774b2 ("Switch !O_CREAT case to use of do_last()")

    Reported-by: Walter Sheets
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

06 Mar, 2010

14 commits

  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    SLUB: Fix per-cpu merge conflict
    failslab: add ability to filter slab caches
    slab: fix regression in touched logic
    dma kmalloc handling fixes
    slub: remove impossible condition
    slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
    SLUB: Make slub statistics use this_cpu_inc
    SLUB: this_cpu: Remove slub kmem_cache fields
    SLUB: Get rid of dynamic DMA kmalloc cache allocation
    SLUB: Use this_cpu operations in slub

    Linus Torvalds
     
  • * 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
    NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
    NFS: Clean up nfs_sync_mapping
    NFS: Simplify nfs_wb_page()
    NFS: Replace __nfs_write_mapping with sync_inode()
    NFS: Simplify nfs_wb_page_cancel()
    NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
    NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
    NFS: Reduce the number of unnecessary COMMIT calls
    NFS: Add a count of the number of unstable writes carried by an inode
    NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
    nfs41 fix NFS4ERR_CLID_INUSE for exchange id
    NFS: Fix an allocation-under-spinlock bug
    SUNRPC: Handle EINVAL error returns from the TCP connect operation
    NFSv4.1: Various fixes to the sequence flag error handling
    nfs4: renewd renew operations should take/put a client reference
    nfs41: renewd sequence operations should take/put client reference
    nfs: prevent backlogging of renewd requests
    nfs: kill renewd before clearing client minor version
    NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
    NFS: Improve NFS iostat byte count accuracy for writes
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: Add hardlink support to .u extension
    9P2010.L handshake: .L protocol negotiation
    9P2010.L handshake: Remove "dotu" variable
    9P2010.L handshake: Add mount option
    9P2010.L handshake: Add VFS flags
    net/9p: Handle mount errors correctly.
    net/9p: Remove MAX_9P_CHAN limit
    net/9p: Add multi channel support.

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
    quota: stop using QUOTA_OK / NO_QUOTA
    dquot: cleanup dquot initialize routine
    dquot: move dquot initialization responsibility into the filesystem
    dquot: cleanup dquot drop routine
    dquot: move dquot drop responsibility into the filesystem
    dquot: cleanup dquot transfer routine
    dquot: move dquot transfer responsibility into the filesystem
    dquot: cleanup inode allocation / freeing routines
    dquot: cleanup space allocation / freeing routines
    ext3: add writepage sanity checks
    ext3: Truncate allocated blocks if direct IO write fails to update i_size
    quota: Properly invalidate caches even for filesystems with blocksize < pagesize
    quota: generalize quota transfer interface
    quota: sb_quota state flags cleanup
    jbd: Delay discarding buffers in journal_unmap_buffer
    ext3: quota_write cross block boundary behaviour
    quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
    quota: split out compat_sys_quotactl support from quota.c
    quota: split out netlink notification support from quota.c
    quota: remove invalid optimization from quota_sync_all
    ...

    Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

    Linus Torvalds
     
  • * 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
    KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
    KVM: VMX: Update instruction length on intercepted BP
    KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
    KVM: Fix segment descriptor loading
    KVM: Fix load_guest_segment_descriptor() to inject page fault
    KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
    KVM: Convert kvm->requests_lock to raw_spinlock_t
    KVM: Convert i8254/i8259 locks to raw_spinlocks
    KVM: x86 emulator: disallow opcode 82 in 64-bit mode
    KVM: x86 emulator: code style cleanup
    KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
    KVM: x86 emulator: Add LOCK prefix validity checking
    KVM: x86 emulator: Check CPL level during privilege instruction emulation
    KVM: x86 emulator: Fix popf emulation
    KVM: x86 emulator: Check IOPL level during io instruction emulation
    KVM: x86 emulator: fix memory access during x86 emulation
    KVM: x86 emulator: Add Virtual-8086 mode of emulation
    KVM: x86 emulator: Add group9 instruction decoding
    KVM: x86 emulator: Add group8 instruction decoding
    KVM: do not store wqh in irqfd
    ...

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     
  • For regular file and directories we put the link
    count in th extension field in a tagged string format.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • This patch adds 9P2010.L protocol negotiation with the server

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • Removes 'dotu' variable and make everything dependent
    on 'proto_version' field.

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • Add new mount V9FS mount option to specify protocol version

    This patch adds a new mount option to specify protocol version.
    With this option it is possible to use "-o version=" switch to
    specify 9P protocol version to use. Valid options for version
    are:
    9p2000
    9p2000.u
    9p2010.L

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • Add 9P2000.u and 9P2010.L protocol flags to V9FS VFS

    This patch adds 9P2000.u and 9P2010.L protocol flags into V9FS VFS side code
    and removes the single flag used for 'extended'.

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • With this patch we have

    # mount -t 9p -o trans=virtio virtio2 /mnt/
    # mount -t 9p -o trans=virtio virtio2 /mnt/
    mount: virtio2 already mounted or /mnt/ busy
    mount: according to mtab, virtio2 is already mounted on /mnt
    # mount -t 9p -o trans=virtio virtio3 /mnt/ -o debug=0xfff
    mount: special device virtio3 does not exist

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Use a list to track the channel instead of statically
    allocated array

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • This is needed for supporting multiple mount points.

    We can find out the device names to be used with mount by checking

    /sys/devices/virtio-pci/virtio*/device file

    if the device file have value 9 then the specific virtio device can
    be used for mounting.

    ex:
    #cat /sys/devices/virtio-pci/virtio1/device
    9

    now we can mount using
    # mount -t 9p -o trans=virtio virtio1 /mnt/

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Trond Myklebust