07 Mar, 2010

35 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
    [LogFS] Change magic number
    [LogFS] Remove h_version field
    [LogFS] Check feature flags
    [LogFS] Only write journal if dirty
    [LogFS] Fix bdev erases
    [LogFS] Silence gcc
    [LogFS] Prevent 64bit divisions in hash_index
    [LogFS] Plug memory leak on error paths
    [LogFS] Add MAINTAINERS entry
    [LogFS] add new flash file system

    Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
    fs/logfs/inode.c introduced by write_inode() being changed to use
    `writeback_control' by commit a9185b41a4f84971b930c519f0c63bd450c4810d
    ("pass writeback_control to ->write_inode")

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
    dm raid1: fix deadlock when suspending failed device
    dm: eliminate some holes data structures
    dm ioctl: introduce flag indicating uevent was generated
    dm: free dm_io before bio_endio not after
    dm table: remove unused dm_get_device range parameters
    dm ioctl: only issue uevent on resume if state changed
    dm raid1: always return error if all legs fail
    dm mpath: refactor pg_init
    dm mpath: wait for pg_init completion when suspending
    dm mpath: hold io until all pg_inits completed
    dm mpath: avoid storing private suspended state
    dm: document when snapshot has finished merging
    dm table: remove dm_get from dm_table_get_md
    dm mpath: skip activate_path for failed paths
    dm mpath: pass struct pgpath to pg init done

    Linus Torvalds
     
  • * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
    nfsd4: fix minor memory leak
    svcrpc: treat uid's as unsigned
    nfsd: ensure sockets are closed on error
    Revert "sunrpc: move the close processing after do recvfrom method"
    Revert "sunrpc: fix peername failed on closed listener"
    sunrpc: remove unnecessary svc_xprt_put
    NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
    xfs_export_operations.commit_metadata
    commit_metadata export operation replacing nfsd_sync_dir
    lockd: don't clear sm_monitored on nsm_reboot_lookup
    lockd: release reference to nsm_handle in nlm_host_rebooted
    nfsd: Use vfs_fsync_range() in nfsd_commit
    NFSD: Create PF_INET6 listener in write_ports
    SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
    SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
    NFSD: Support AF_INET6 in svc_addsock() function
    SUNRPC: Use rpc_pton() in ip_map_parse()
    nfsd: 4.1 has an rfc number
    nfsd41: Create the recovery entry for the NFSv4.1 client
    nfsd: use vfs_fsync for non-directories
    ...

    Linus Torvalds
     
  • Most of the GPIO expanders controlled by the pca953x driver are able to
    report changes on the input pins through an *INT pin.

    This patch implements the irq_chip functionality (edge detection only).

    The driver has been tested on an Arcom Zeus.

    [akpm@linux-foundation.org: the compiler does inlining for us nowadays]
    Signed-off-by: Marc Zyngier
    Cc: Eric Miao
    Cc: Haojian Zhuang
    Cc: David Brownell
    Cc: Nate Case
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marc Zyngier
     
  • gpio_request() without initial configuration of the GPIO is normally
    useless, so introduce gpio_request_one() together with GPIOF_ flags for
    input/output direction and initial output level.

    Also introduce gpio_{request,free}_array() for multiple GPIOs.
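    A user-space model may help show what the combined call does. The bit
    layout of the GPIOF_* values (bit 0 for direction, bit 1 for initial
    level) is our assumption, and mock_gpio_line, mock_gpio_request_one() and
    mock_gpio_request_array() are hypothetical stand-ins, not the kernel
    functions. The array helper is assumed to release already-claimed lines
    when a later request fails.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag layout: bit 0 = direction, bit 1 = initial output level. */
#define GPIOF_DIR_OUT   (0 << 0)
#define GPIOF_DIR_IN    (1 << 0)
#define GPIOF_INIT_LOW  (0 << 1)
#define GPIOF_INIT_HIGH (1 << 1)

#define GPIOF_IN            (GPIOF_DIR_IN)
#define GPIOF_OUT_INIT_LOW  (GPIOF_DIR_OUT | GPIOF_INIT_LOW)
#define GPIOF_OUT_INIT_HIGH (GPIOF_DIR_OUT | GPIOF_INIT_HIGH)

/* Hypothetical stand-in for one GPIO line. */
struct mock_gpio_line {
    int requested;
    int is_input;
    int value;
};

/* Hypothetical array entry: which line, and how to configure it. */
struct mock_gpio {
    struct mock_gpio_line *line;
    unsigned long flags;
};

/* Claim a line and set its direction (and output level) in one call. */
static int mock_gpio_request_one(struct mock_gpio_line *line, unsigned long flags)
{
    assert(line != NULL);
    if (line->requested)
        return -1;                              /* already claimed */
    line->requested = 1;
    line->is_input = (flags & GPIOF_DIR_IN) != 0;
    if (!line->is_input)
        line->value = (flags & GPIOF_INIT_HIGH) != 0;
    return 0;
}

static void mock_gpio_free(struct mock_gpio_line *line)
{
    line->requested = 0;
}

/* Claim every entry; on failure, release the ones already claimed. */
static int mock_gpio_request_array(const struct mock_gpio *array, size_t num)
{
    size_t i;

    for (i = 0; i < num; i++) {
        if (mock_gpio_request_one(array[i].line, array[i].flags) != 0) {
            while (i--)
                mock_gpio_free(array[i].line);
            return -1;
        }
    }
    return 0;
}
```

    Encoding direction and initial level in one flags word removes the window
    in which a requested output line sits in an undefined state.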

    Signed-off-by: Eric Miao
    Cc: David Brownell
    Cc: Ben Nizette
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Miao
     
  • linux/i2c/pca953x.h is a very bare include file. Fix the check for
    multiple inclusion of linux/i2c/pca953x.h, and add the includes it
    depends on to the header file.

    Signed-off-by: Olof Johansson
    Acked-by: Wolfram Sang
    Acked-by: Jean Delvare
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • Add the MAX7300-I2C variant of the MAX7301-SPI version. Both chips share
    the same core logic, so the in-kernel SPI driver is refactored to split
    out its generic part. The I2C- and SPI-specific functions are then
    wrapped into separate drivers that use the generic part.

    Signed-off-by: Wolfram Sang
    Cc: Juergen Beisert
    Cc: David Brownell
    Cc: Jean Delvare
    Cc: Anton Vorontsov
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfram Sang
     
  • The driver for the mc13783 rtc needs to know if the TODA irq is pending.

    Instead of tracking in the rtc driver whether the irq is enabled, provide
    that information, too.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • In the source file, group these functions together.

    The mc13783 header file provides fallback implementations for the old
    names to prevent build failures. When all users of the old names are
    fixed to use the new names these can go away.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Cc: Paul Gortmaker
    Cc: Valentin Longchamp
    Cc: Sascha Hauer
    Cc: Samuel Ortiz
    Cc: Dmitry Torokhov
    Cc: Luotao Fu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • Pass mm->flags as a coredump parameter for consistency.

    ---
            if (mm->core_state || !get_dumpable(mm)) {  <- (1)
                    up_write(&mm->mmap_sem);
                    put_cred(cred);
                    goto fail;
            }

    [...]
            if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */  <- (2)
                    flag = O_EXCL;          /* Stop rewrite attacks */
                    cred->fsuid = 0;        /* Dump root private */
            }
    ---

    Since the dumpable bits are not protected by a lock, they may change
    between (1) and (2).

    To solve this issue, this patch copies mm->flags to
    coredump_params.mm_flags at the beginning of do_coredump() and uses it
    instead of get_dumpable() while dumping core.

    This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
    dump_filter bits in mm->flags.

    [akpm@linux-foundation.org: fix merge]
    Signed-off-by: Masami Hiramatsu
    Acked-by: Roland McGrath
    Cc: Hidehiro Kawai
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • The current ELF dumper implementation can produce broken corefiles if
    the number of program headers exceeds 65535. This number is determined
    by the number of vmas the process has, and some extreme programs may use
    more than 65535 vmas. (If you google max_map_count, you can find some
    users facing this problem.) Such programs can never generate correct
    coredumps.

    This patch implements ``extended numbering'', which uses the sh_info
    field of the first section header instead of the e_phnum field in order
    to represent up to 4294967295 vmas.

    This is supported by
    AMD64-ABI(http://www.x86-64.org/documentation.html) and
    Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
    Of course, we are preparing patches for gdb and binutils.
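    The encoding can be sketched with the ELF types from <elf.h>. PN_XNUM
    (0xffff) is the escape value of the extended-numbering convention;
    set_phnum() and get_phnum() are hypothetical helpers, and setting e_shnum
    to 1 so that section header 0 exists is a simplification of what a real
    core writer must emit.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

#ifndef PN_XNUM
#define PN_XNUM 0xffff   /* e_phnum escape: real count lives in sh_info */
#endif

/* Writer side: record a program header count that may exceed 16 bits. */
static void set_phnum(Elf64_Ehdr *eh, Elf64_Shdr *sh0, uint32_t segs)
{
    assert(eh != NULL && sh0 != NULL);
    memset(sh0, 0, sizeof(*sh0));
    if (segs >= PN_XNUM) {
        eh->e_phnum = PN_XNUM;     /* "look in section header 0" */
        eh->e_shnum = 1;           /* section header 0 must exist */
        sh0->sh_info = segs;       /* 32-bit field: up to 4294967295 */
    } else {
        eh->e_phnum = (Elf64_Half)segs;
        eh->e_shnum = 0;
    }
}

/* Reader side: recover the real count. */
static uint32_t get_phnum(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
    return eh->e_phnum == PN_XNUM ? sh0->sh_info : eh->e_phnum;
}
```

    A reader that does not know the convention sees e_phnum == 0xffff, which
    is why matching gdb and binutils support is needed.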

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
    macro to hide _multiline_ logic in functions. This patch removes the
    #ifdef and replaces ELF_CORE_EXTRA_* with corresponding functions. For
    architectures not implementing ELF_CORE_EXTRA_*, we use weak functions in
    order to limit the scope of the modification.

    This cleanup is for my next patches, but I think this cleanup itself is
    worth doing regardless of my final purpose.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • My next patch will replace ELF_CORE_EXTRA_* macros with functions, putting
    them into other newly created *.c files. Each of those files would then
    need its own dump_write(), and the copies in each binfmt_*.c and in
    elfcore.c would have to be identical. So, this patch moves them into a
    header file together with dump_seek(). Also, the patch deletes the
    confusing DUMP_WRITE macros in each file.

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • And bring them back to 4-bit mode during resume.

    Signed-off-by: Daniel Drake
    Signed-off-by: Nicolas Pitre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Drake
     
  • This patch series provides the core changes needed to allow SDIO cards to
    remain powered and active while the host system is suspended, and let them
    wake up the host system when needed. At the moment this is used to
    implement wake-on-LAN with SDIO wireless cards. Patches to add that
    support to the libertas driver will be posted separately.

    This patch:

    Some SDIO cards have the ability to keep on running autonomously when the
    host system is suspended, and wake it up when needed. This however
    requires that the host controller preserve power to the card, and
    configure itself appropriately for wake-up.

    There are, however, four layers of abstraction involved: the host
    controller driver, the MMC core code, the SDIO card management code, and
    the actual SDIO function driver. To keep things simple and manageable,
    host drivers must advertise their PM capabilities with a feature bitmask;
    function drivers can then query and set those features from their suspend
    method. Each layer in the suspend call chain is then expected to act upon
    those bits accordingly.
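    A minimal sketch of that handshake, assuming two capability bits named
    after the "keep power" and "wake on SDIO IRQ" features described above.
    host_model, get_host_pm_caps(), set_host_pm_flags() and func_suspend() are
    hypothetical stand-ins for the host driver, the core and a function
    driver's suspend method; the bit values are illustrative.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative capability bits. */
#define PM_KEEP_POWER     (1u << 0)
#define PM_WAKE_SDIO_IRQ  (1u << 1)

/* Hypothetical host state: what it can do, and what was requested. */
struct host_model {
    unsigned int pm_caps;   /* advertised by the host driver */
    unsigned int pm_flags;  /* requested by function drivers */
};

static unsigned int get_host_pm_caps(const struct host_model *h)
{
    return h->pm_caps;
}

/* Reject requests for features the host did not advertise. */
static int set_host_pm_flags(struct host_model *h, unsigned int flags)
{
    if (flags & ~h->pm_caps)
        return -EINVAL;
    h->pm_flags |= flags;
    return 0;
}

/* Function driver suspend: stay powered and armed for wake-up only if
 * the host supports both; otherwise the card will be powered down. */
static int func_suspend(struct host_model *h)
{
    unsigned int want = PM_KEEP_POWER | PM_WAKE_SDIO_IRQ;

    if ((get_host_pm_caps(h) & want) != want)
        return -ENOSYS;
    return set_host_pm_flags(h, want);
}
```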

    [akpm@linux-foundation.org: fix typo in comment]
    Signed-off-by: Nicolas Pitre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Pitre
     
  • Some SDIO cards expect byte transfers not to exceed the configured block
    transfer size. Add a quirk to that effect.

    Patches to make use of this quirk will be sent separately.
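    One way such a quirk could enter a byte-mode size calculation is sketched
    below. The 512-byte ceiling is the SDIO byte-mode limit; sdio_model,
    max_byte_size() and the field names are hypothetical, and the exact set of
    limits combined here is an assumption.

```c
#include <assert.h>

/* Hypothetical view of the limits that apply to one SDIO function. */
struct sdio_model {
    unsigned int host_max_seg_size;
    unsigned int host_max_blk_size;
    unsigned int func_cur_blksize;   /* currently configured block size */
    unsigned int func_max_blksize;   /* largest block size supported */
    int quirk_blksz_for_byte_mode;   /* models the new quirk */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Largest transfer that may be issued in byte (non-block) mode. */
static unsigned int max_byte_size(const struct sdio_model *m)
{
    unsigned int mval = min_u(m->host_max_seg_size, m->host_max_blk_size);

    if (m->quirk_blksz_for_byte_mode)
        mval = min_u(mval, m->func_cur_blksize);  /* respect configured size */
    else
        mval = min_u(mval, m->func_max_blksize);
    return min_u(mval, 512u);                     /* byte-mode maximum */
}
```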

    Signed-off-by: Bing Zhao
    Signed-off-by: Nicolas Pitre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bing Zhao
     
  • The function name must be followed by a space, hyphen, space, and a short
    description.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     
  • smp: Fix documentation.

    Fix documentation in include/linux/smp.h: smp_processor_id()

    Signed-off-by: Rakib Mullick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • The macro any_online_node() is prone to producing sparse warnings due to
    the local symbol 'node'. Since all the in-tree users are really
    requesting the first online node (the mask argument is either
    NODE_MASK_ALL or node_online_map), just use the first_online_node macro
    and remove the any_online_node macro, which is then left with no users.
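    Why the replacement is safe can be seen with a bitmask model:
    intersecting the mask with the online map changes nothing when the mask
    is NODE_MASK_ALL or node_online_map itself, so the result is always the
    first online node. first_node_model() and any_online_node_model() are
    hypothetical, and the single 64-bit mask is an assumption.

```c
#include <assert.h>

#define MAX_NUMNODES_MODEL 64   /* assumption: nodes fit in one 64-bit word */

/* Lowest set node in the mask, or MAX_NUMNODES_MODEL if the mask is empty. */
static int first_node_model(unsigned long long mask)
{
    int node;

    for (node = 0; node < MAX_NUMNODES_MODEL; node++)
        if (mask & (1ULL << node))
            return node;
    return MAX_NUMNODES_MODEL;
}

/* What any_online_node(mask) computed: the first node in mask that is online. */
static int any_online_node_model(unsigned long long mask, unsigned long long online)
{
    return first_node_model(mask & online);
}
```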

    Signed-off-by: H Hartley Sweeten
    Acked-by: David Rientjes
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Acked-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Dave Hansen
    Cc: Milton Miller
    Cc: Nathan Fontenot
    Cc: Geoff Levand
    Cc: Grant Likely
    Cc: J. Bruce Fields
    Cc: Neil Brown
    Cc: Trond Myklebust
    Cc: David S. Miller
    Cc: Benny Halevy
    Cc: Chuck Lever
    Cc: Ricardo Labiaga
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Depending on CONFIG_SMP, the num_*_cpus() functions return unsigned or
    signed values. Let them always return unsigned values to avoid strange
    casts.

    Fixes at least one warning:

    kernel/kprobes.c: In function 'register_kretprobe':
    kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast
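    The warning comes from the kernel's type-checked min()/max() helpers,
    which compare the addresses of two temporaries so that mixing types is
    diagnosed. The sketch below reproduces that trick with GNU C extensions
    (statement expressions and __typeof__); checked_max() and
    num_possible_cpus_model() are hypothetical names, and the expression in
    pick_maxactive() is illustrative, not the one in kprobes.c.

```c
#include <assert.h>

/* Type-checked max(): comparing &_max1 with &_max2 makes the compiler
 * warn ("comparison of distinct pointer types") if x and y differ in type. */
#define checked_max(x, y) ({            \
        __typeof__(x) _max1 = (x);      \
        __typeof__(y) _max2 = (y);      \
        (void) (&_max1 == &_max2);      \
        _max1 > _max2 ? _max1 : _max2; })

/* Hypothetical UP-style definition: 1U keeps the type unsigned int,
 * matching an SMP build. Writing plain 1 here would trigger the warning. */
#define num_possible_cpus_model() 1U

static unsigned int pick_maxactive(void)
{
    /* Both operands are unsigned int, so the check is silent. */
    return checked_max(10U, 2 * num_possible_cpus_model());
}
```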

    Signed-off-by: Heiko Carstens
    Cc: Heiko Carstens
    Cc: Ananth N Mavinakayanahalli
    Cc: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • __GFP_NOFAIL was deprecated in dab48dab, so add a comment that no new
    users should be added.

    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The VM currently assumes that an inactive, mapped and referenced file page
    is in use and promotes it to the active list.

    However, every mapped file page starts out like this and thus a problem
    arises when workloads create a stream of such pages that are used only for
    a short time. By flooding the active list with those pages, the VM
    quickly gets into trouble finding eligible reclaim candidates. The result
    is long allocation latencies and eviction of the wrong pages.

    This patch reuses the PG_referenced page flag (used for unmapped file
    pages) to implement a usage detection that scales with the speed of LRU
    list cycling (i.e. memory pressure).

    If the scanner encounters such a page, the flag is set and the page is
    cycled again on the inactive list. Only if it comes back with another
    page table reference is it activated. Otherwise it is reclaimed as 'not
    recently used cache'.

    This effectively changes the minimum lifetime of a used-once mapped file
    page from a full memory cycle to an inactive list cycle, which allows it
    to occur in linear streams without affecting the stable working set of the
    system.
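    A simplified model of the decision described above, assuming a single
    boolean for "referenced by a page table since the last scan". page_model,
    scan_mapped_file_page() and the verdict names are hypothetical, and the
    real reclaim code also handles anonymous and dirty pages differently.

```c
#include <assert.h>
#include <stdbool.h>

enum page_verdict { VERDICT_RECLAIM, VERDICT_KEEP, VERDICT_ACTIVATE };

/* Hypothetical page: only the state this heuristic needs. */
struct page_model {
    bool pg_referenced;   /* the reused PG_referenced flag */
};

/* Called when the inactive-list scanner reaches a mapped file page. */
static enum page_verdict scan_mapped_file_page(struct page_model *p, bool pte_referenced)
{
    bool was_flagged = p->pg_referenced;   /* test and clear */
    p->pg_referenced = false;

    if (!pte_referenced)
        return VERDICT_RECLAIM;            /* not recently used cache */
    if (was_flagged)
        return VERDICT_ACTIVATE;           /* referenced on two laps in a row */
    p->pg_referenced = true;               /* first reference: one more lap */
    return VERDICT_KEEP;
}
```

    A page streamed through once is seen with a reference only on its first
    lap, so it is kept once and then reclaimed instead of being promoted.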

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: Minchan Kim
    Cc: OSAKI Motohiro
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There are quite a few GFP_KERNEL memory allocations made during
    suspend/hibernation and resume that may cause the system to hang, because
    the I/O operations they depend on cannot be completed due to the
    underlying devices being suspended.

    Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
    gfp_allowed_mask before suspend/hibernation and restoring the original
    values of these bits in gfp_allowed_mask during the subsequent resume.
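    A sketch of the save/clear/restore pattern on a global mask. The bit
    values and the names gfp_allowed_mask_model, restrict_for_suspend() and
    restore_after_resume() are hypothetical; the real __GFP_* constants are
    defined in include/linux/gfp.h. Only the two cleared bits are restored,
    so other bits changed in the meantime are left alone.

```c
#include <assert.h>

typedef unsigned int gfp_model_t;

/* Illustrative values only. */
#define GFP_WAIT_BIT (1u << 0)
#define GFP_IO_BIT   (1u << 1)
#define GFP_FS_BIT   (1u << 2)
#define SUSPEND_CLEARED_BITS (GFP_IO_BIT | GFP_FS_BIT)

static gfp_model_t gfp_allowed_mask_model = GFP_WAIT_BIT | GFP_IO_BIT | GFP_FS_BIT;

/* Every allocation's flags are filtered through the global mask. */
static gfp_model_t effective_gfp(gfp_model_t requested)
{
    return requested & gfp_allowed_mask_model;
}

/* Before suspend/hibernation: forbid I/O and FS reclaim, return the old mask. */
static gfp_model_t restrict_for_suspend(void)
{
    gfp_model_t saved = gfp_allowed_mask_model;

    gfp_allowed_mask_model &= ~SUSPEND_CLEARED_BITS;
    return saved;
}

/* During resume: put back the original values of just those bits. */
static void restore_after_resume(gfp_model_t saved)
{
    gfp_allowed_mask_model = (gfp_allowed_mask_model & ~SUSPEND_CLEARED_BITS)
                           | (saved & SUSPEND_CLEARED_BITS);
}
```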

    [akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
    Signed-off-by: Rafael J. Wysocki
    Reported-by: Maxim Levitsky
    Cc: Sebastian Ott
    Cc: Benjamin Herrenschmidt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • When a VMA is in an inconsistent state during setup or teardown, the worst
    that can happen is that the rmap code will not be able to find the page.

    The mapping is in the process of being torn down (PTEs just got
    invalidated by munmap), or set up (no PTEs have been instantiated yet).

    It is also impossible for the rmap code to follow a pointer to an already
    freed VMA, because the rmap code holds the anon_vma->lock, which the VMA
    teardown code needs to take before the VMA is removed from the anon_vma
    chain.

    Hence, we should not need the VM_LOCK_RMAP locking at all.

    Signed-off-by: Rik van Riel
    Cc: Nick Piggin
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • When the parent process breaks the COW on a page, both the original page,
    which is mapped in the child, and the new page, which is mapped in the
    parent, end up in the same anon_vma. Generally this won't be a problem,
    but for some workloads it could preserve the O(N) rmap scanning
    complexity.

    A simple fix is to ensure that, when a page mapped in the child is
    reused in do_wp_page because we are already its exclusive owner, the
    page gets moved to the child's own exclusive anon_vma.

    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • The old anon_vma code can lead to scalability issues with heavily forking
    workloads. Specifically, each anon_vma will be shared between the parent
    process and all its child processes.

    In a workload with 1000 child processes and a VMA with 1000 anonymous
    pages per process that get COWed, this leads to a system with a million
    anonymous pages in the same anon_vma, each of which is mapped in just one
    of the 1000 processes. However, the current rmap code needs to walk them
    all, leading to O(N) scanning complexity for each page.

    This can result in systems where one CPU is walking the page tables of
    1000 processes in page_referenced_one, while all other CPUs are stuck on
    the anon_vma lock. This leads to catastrophic failure for a benchmark
    like AIM7, where the total number of processes can reach the tens of
    thousands. Real workloads are still a factor of 10 less process
    intensive than AIM7, but they are catching up.

    This patch changes the way anon_vmas and VMAs are linked, which allows us
    to associate multiple anon_vmas with a VMA. At fork time, each child
    process gets its own anon_vmas, in which its COWed pages will be
    instantiated. The parent's anon_vma is also linked to the VMA, because
    non-COWed pages could be present in any of the children.

    This reduces rmap scanning complexity to O(1) for the pages of the 1000
    child processes, with O(N) complexity for at most 1/N pages in the system.
    This reduces the average scanning cost in heavily forking workloads from
    O(N) to 2.
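    One way to read the "O(N) to 2" figure, assuming a page costs one
    anon_vma walk when it lives only in a child's own anon_vma and N walks
    when it sits in the shared parent anon_vma, with at most a fraction
    f <= 1/N of pages in the second group:

```latex
\bar{C} = (1 - f)\cdot 1 + f \cdot N = 1 + f\,(N - 1) \le 1 + \frac{N - 1}{N} = 2 - \frac{1}{N} < 2
```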

    The only real complexity in this patch stems from the fact that linking a
    VMA to anon_vmas now involves memory allocations. This means vma_adjust
    can fail, if it needs to attach a VMA to anon_vma structures. This in
    turn means error handling needs to be added to the calling functions.

    A second source of complexity is that, because there can be multiple
    anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
    "the" anon_vma lock. To prevent the rmap code from walking up an
    incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
    flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
    to make sure it is impossible to compile a kernel that needs both symbolic
    values for the same bitflag.

    Some test results:

    Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
    box with 16GB RAM and not quite enough IO), the system ends up running
    >99% in system time, with every CPU on the same anon_vma lock in the
    pageout code.

    With these changes, AIM7 hits the cross-over point around 29.7k users.
    This happens with ~99% IO wait time; there never seems to be any spike in
    system time. The anon_vma lock contention appears to be resolved.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Cc: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • It was tolerable until Eric went and added 8388608.

    Cc: Eric Paris
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

    POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
    a 16K read will be carried out in 4 _sync_ 1-page reads.

    In other places, ra_pages==0 means
    - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
    - some IO error happened
    where multi-page read IO won't help or should be avoided.

    POSIX_FADV_RANDOM actually wants different semantics: to disable the
    *heuristic* readahead algorithm, and to use a dumb one which faithfully
    submits read IO for whatever the application requests.

    So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

    Note that the random hint is not likely to improve random read
    performance noticeably. And it may be too permissive for huge request
    sizes (its IO size is not limited by read_ahead_kb).

    In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
    (NFS read) performance of the application increased by 313%!
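    From user space the hint is given with posix_fadvise(). The sketch below
    applies POSIX_FADV_RANDOM to the whole file (offset 0 and length 0 mean
    "to the end of the file") before a positioned read; advise_random() and
    read_at_random() are hypothetical helper names. Note that
    posix_fadvise() returns an error number instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask for random-access behaviour on the whole file.
 * Returns 0 on success or an error number. */
static int advise_random(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}

/* Read len bytes at offset off after giving the hint.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_at_random(int fd, void *buf, size_t len, off_t off)
{
    if (advise_random(fd) != 0)
        return -1;
    return pread(fd, buf, len, off);
}
```

    Since FMODE_RANDOM is kept in the open file's mode, the hint applies to
    this open file (and descriptors duplicated from it), not to other opens
    of the same file.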

    Tested-by: Quentin Barnes
    Signed-off-by: Wu Fengguang
    Cc: Nick Piggin
    Cc: Andi Kleen
    Cc: Steven Whitehouse
    Cc: David Howells
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Cc: Chuck Lever
    Cc: [2.6.33.x]
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    The interface firmware_map_add() is not called explicitly anywhere.
    Remove it and add the function firmware_map_add_hotplug() as the hotplug
    interface of memmap.

    Each memory entry has a memmap in sysfs. When we hot-add new memory,
    however, sysfs does not export a memmap entry for it, so add a call to
    firmware_map_add_hotplug() in add_memory().

    Add a new function add_sysfs_fw_map_entry() to create a memmap entry; it
    is called both when memmap is initialized and when memory is hot-added.

    [akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • commit e815af95 ("change all_unreclaimable zone member to flags") changed
    the all_unreclaimable member to a bit flag. But it had an undesirable side
    effect: free_one_page() is one of the hottest paths in the linux kernel,
    and increasing atomic ops in it can reduce kernel performance a bit.

    Thus, this patch partially reverts that commit. At least,
    all_unreclaimable shouldn't share a memory word with the other zone flags.

    [akpm@linux-foundation.org: fix patch interaction]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Wu Fengguang
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • free_hot_page() is just a wrapper around free_hot_cold_page() with
    parameter 'cold = 0'. After adding a clear comment for
    free_hot_cold_page(), it is reasonable to remove a level of call.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Li Hong
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Larry Woodman
    Cc: Peter Zijlstra
    Cc: Li Ming Chun
    Cc: KOSAKI Motohiro
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Hong
     
  • A frequent question from users about memory management is how many swap
    entries are used by each process. This information will also give some
    hints to the oom-killer.

    Although we can count the number of swap entries per process by scanning
    /proc/<pid>/smaps, this is very slow and not good for usual process
    information handlers that work like 'ps' or 'top' (ps and top are
    already slow enough).

    This patch adds a counter of swap entries to mm_counter and updates it at
    each swap event. The information is exported via the /proc/<pid>/status
    file as follows:

    [kamezawa@bluextal memory]$ cat /proc/self/status
    Name: cat
    State: R (running)
    Tgid: 2910
    Pid: 2910
    PPid: 2823
    TracerPid: 0
    Uid: 500 500 500 500
    Gid: 500 500 500 500
    FDSize: 256
    Groups: 500
    VmPeak: 82696 kB
    VmSize: 82696 kB
    VmLck: 0 kB
    VmHWM: 432 kB
    VmRSS: 432 kB
    VmData: 172 kB
    VmStk: 84 kB
    VmExe: 48 kB
    VmLib: 1568 kB
    VmPTE: 40 kB
    VmSwap: 0 kB
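
    A tool such as 'ps' or 'top' can then pick up the new field cheaply. The
    sketch below scans a status stream for the VmSwap: line;
    read_vmswap_kb() is a hypothetical helper, and returning -1 when the
    field is missing (on kernels without this counter) is our choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the VmSwap value in kB from a /proc/<pid>/status stream,
 * or -1 if the field is absent. */
static long read_vmswap_kb(FILE *status)
{
    char line[256];
    long kb;

    while (fgets(line, sizeof line, status) != NULL) {
        if (strncmp(line, "VmSwap:", 7) == 0 && sscanf(line + 7, "%ld", &kb) == 1)
            return kb;
    }
    return -1;
}
```

    On a live system the stream would come from
    fopen("/proc/self/status", "r").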
    Reviewed-by: Minchan Kim
    Reviewed-by: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Per-mm stats are by nature shared among threads, so they can be a
    cache-miss point in the page fault path.

    This patch adds a per-thread cache for mm_counter. RSS values are counted
    into a struct in task_struct and synchronized with the mm's counters at
    certain events.

    In this patch, the event is the number of calls to handle_mm_fault: the
    per-thread values are added to the mm every 64 calls.

    A rough estimate with a small benchmark using two parallel threads shows
    [before]
    4.5 cache-miss/faults
    [after]
    4.0 cache-miss/faults
    In any case, the most contended object is mmap_sem as the number of
    threads grows.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, the per-mm statistics counters are defined by macros in
    sched.h.

    This patch modifies them to be
    - defined in mm.h as inline functions
    - stored in an array instead of using macro-generated names.

    This patch is for reducing the size of a future patch that modifies the
    implementation of the per-mm counters.
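    An array-indexed version of such counters might look like this. The
    enum names, mm_model and the helper functions are hypothetical and only
    mirror the shape described above: one enum entry per counter and small
    inline accessors instead of per-name macros.

```c
#include <assert.h>

/* Hypothetical counter indices: adding a counter means adding an entry. */
enum { CNT_FILEPAGES, CNT_ANONPAGES, CNT_SWAPENTS, NR_CNT };

struct mm_model { long count[NR_CNT]; };

static inline long get_counter(const struct mm_model *mm, int member)
{
    assert(member >= 0 && member < NR_CNT);
    return mm->count[member];
}

static inline void add_counter(struct mm_model *mm, int member, long value)
{
    assert(member >= 0 && member < NR_CNT);
    mm->count[member] += value;
}

static inline void inc_counter(struct mm_model *mm, int member) { add_counter(mm, member, 1); }
static inline void dec_counter(struct mm_model *mm, int member) { add_counter(mm, member, -1); }

/* RSS as the sum of file-backed and anonymous pages. */
static long rss_pages(const struct mm_model *mm)
{
    return get_counter(mm, CNT_FILEPAGES) + get_counter(mm, CNT_ANONPAGES);
}
```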

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Rename for_each_bit to for_each_set_bit in the kernel source tree, to
    permit a for_each_clear_bit(), should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.
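    The iteration itself is simple; below is a single-word sketch.
    for_each_set_bit_model(), next_set_bit() and the compatibility alias are
    hypothetical, and unlike the kernel macro (which walks a bitmap of a
    given size) this version only covers one unsigned long.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_ULONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the next set bit at or after 'start', or BITS_PER_ULONG if
 * there is none (a one-word stand-in for a find-next-bit helper). */
static unsigned int next_set_bit(unsigned long word, unsigned int start)
{
    for (; start < BITS_PER_ULONG; start++)
        if (word & (1UL << start))
            return start;
    return BITS_PER_ULONG;
}

/* Visit the index of every set bit in 'word', lowest first. */
#define for_each_set_bit_model(bit, word)              \
    for ((bit) = next_set_bit((word), 0);              \
         (bit) < BITS_PER_ULONG;                       \
         (bit) = next_set_bit((word), (bit) + 1))

/* Temporary alias so old callers keep compiling during the rename. */
#define for_each_bit_model(bit, word) for_each_set_bit_model(bit, word)
```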

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

06 Mar, 2010

5 commits

  • Eliminate a 4-byte hole in 'struct dm_io_memory' by moving 'offset' above
    the 'ptr' to which it applies (size reduced from 24 to 16 bytes). As a
    consequence, a 4-byte hole is also eliminated in 'struct dm_io_request'
    (size reduced from 56 to 48 bytes).

    Eliminate all six 4-byte holes and one cache line in 'struct dm_snapshot'
    (size reduced from 392 to 368 bytes).
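    The effect of the reordering is easy to check with sizeof. The two
    structs below only mimic the shape of 'struct dm_io_memory' (an
    enum-sized tag, a pointer-sized member, an unsigned offset) and are
    hypothetical; the sizes in the comments assume an LP64 ABI such as
    x86-64. Tools such as pahole report these holes directly from debug info.

```c
#include <assert.h>
#include <stddef.h>

/* Before: 4 (tag) + 4 (hole) + 8 (ptr) + 4 (offset) + 4 (tail padding) = 24 on LP64. */
struct io_memory_before {
    int type;
    void *ptr;
    unsigned int offset;
};

/* After: the offset fills the former hole: 4 + 4 + 8 = 16 on LP64. */
struct io_memory_after {
    int type;
    unsigned int offset;
    void *ptr;
};
```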

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to
    indicate that a uevent was actually generated. This tells the userspace
    caller that it may need to wait for the event to be processed.

    Signed-off-by: Peter Rajnoha
    Signed-off-by: Alasdair G Kergon

    Peter Rajnoha
     
  • Remove the unused parameters (start and len) of dm_get_device()
    and fix the callers.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Alasdair G Kergon

    Nikanth Karthikesan
     
  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    SLUB: Fix per-cpu merge conflict
    failslab: add ability to filter slab caches
    slab: fix regression in touched logic
    dma kmalloc handling fixes
    slub: remove impossible condition
    slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
    SLUB: Make slub statistics use this_cpu_inc
    SLUB: this_cpu: Remove slub kmem_cache fields
    SLUB: Get rid of dynamic DMA kmalloc cache allocation
    SLUB: Use this_cpu operations in slub

    Linus Torvalds
     
  • * 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
    NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
    NFS: Clean up nfs_sync_mapping
    NFS: Simplify nfs_wb_page()
    NFS: Replace __nfs_write_mapping with sync_inode()
    NFS: Simplify nfs_wb_page_cancel()
    NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
    NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
    NFS: Reduce the number of unnecessary COMMIT calls
    NFS: Add a count of the number of unstable writes carried by an inode
    NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
    nfs41 fix NFS4ERR_CLID_INUSE for exchange id
    NFS: Fix an allocation-under-spinlock bug
    SUNRPC: Handle EINVAL error returns from the TCP connect operation
    NFSv4.1: Various fixes to the sequence flag error handling
    nfs4: renewd renew operations should take/put a client reference
    nfs41: renewd sequence operations should take/put client reference
    nfs: prevent backlogging of renewd requests
    nfs: kill renewd before clearing client minor version
    NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
    NFS: Improve NFS iostat byte count accuracy for writes
    ...

    Linus Torvalds