14 Jan, 2011

36 commits

  • PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes. We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next instead, even more easily than the
    mapcount.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
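
    A minimal userspace sketch of the idea above: a reserved _mapcount value
    stands in for a dedicated page flag. The struct, the helper names and the
    -1 "not mapped" baseline are assumptions made for illustration, not the
    kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified page descriptor (not the kernel's struct page). */
struct page_sketch {
    int mapcount;   /* assumed baseline: -1 means "not mapped" */
};

/* Reserved value that marks a page as belonging to the buddy allocator. */
#define SKETCH_BUDDY_MAPCOUNT (-2)

bool page_is_buddy(const struct page_sketch *p)
{
    return p->mapcount == SKETCH_BUDDY_MAPCOUNT;
}

void set_page_buddy(struct page_sketch *p)
{
    p->mapcount = SKETCH_BUDDY_MAPCOUNT;
}

void clear_page_buddy(struct page_sketch *p)
{
    p->mapcount = -1;   /* back to the assumed "not mapped" baseline */
}
```

    Encoding the state in a field that is otherwise unused for free pages is
    what frees a bit in page->flags in this model.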
     
  • Add hugepage stat information to /proc/vmstat and /proc/meminfo.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • We'd like to be able to oom_score_adj a process up/down as it
    enters/leaves the foreground. Currently, it is not possible to oom_adj
    down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
    oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
    or its inherited value at fork. Assuming the thread that has forked it
    has oom_score_adj of 0, each process could decrease it back from 0 upon
    activation unless a CAP_SYS_RESOURCE thread elevated it to something
    higher.

    Alternatives considered:

    * a setuid binary
    * a daemon with CAP_SYS_RESOURCE

    Since you don't want all processes to be able to reduce their oom_adj, a
    setuid or daemon implementation would be complex. The alternatives also
    have much higher overhead.

    This patch updated from original patch based on feedback from David
    Rientjes.

    Signed-off-by: Mandeep Singh Baines
    Acked-by: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
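
    The rule described above can be modelled in userspace roughly as follows.
    This is only a sketch: struct oom_state, set_oom_score_adj() and the
    boolean capability argument are hypothetical names, and range checking of
    the value is left out.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical model of the per-process OOM state; not the kernel's struct. */
struct oom_state {
    int oom_score_adj;      /* current value */
    int oom_score_adj_min;  /* lowest value an unprivileged write may set */
};

/*
 * Model of a write to oom_score_adj. Returns 0 on success, or -EACCES when
 * an unprivileged caller tries to go below the recorded floor.
 */
int set_oom_score_adj(struct oom_state *s, int new_adj, bool has_cap_sys_resource)
{
    if (new_adj < s->oom_score_adj_min && !has_cap_sys_resource)
        return -EACCES;

    s->oom_score_adj = new_adj;

    /* A privileged write moves the floor to the value it set. */
    if (has_cap_sys_resource)
        s->oom_score_adj_min = new_adj;

    return 0;
}
```

    In this model a task forked with a floor of 0 can raise its value and
    later lower it back to 0, but not below, without CAP_SYS_RESOURCE.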
     
  • Currently there is no way to find out whether a process has locked its
    pages in memory, or which of its memory regions are locked.

    Add a new field "Locked" to export this information via the smaps file.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Balbir Singh
    Acked-by: Wu Fengguang
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
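
    A small sketch of how a userspace tool might consume the new field. It
    assumes the line follows the usual smaps layout ("Locked: <n> kB"); the
    helper names are hypothetical.

```c
#include <stdio.h>

/*
 * Parse one smaps-style line of the assumed form "Locked:   <n> kB".
 * Returns 1 and stores the value in *kb on a match, 0 otherwise.
 */
int parse_locked_kb(const char *line, unsigned long *kb)
{
    return sscanf(line, "Locked: %lu kB", kb) == 1;
}

/*
 * Sum the "Locked" values of every mapping listed in an smaps-style file.
 * Returns the total in kB, or -1 if the file cannot be opened.
 */
long total_locked_kb(const char *path)
{
    char line[512];
    unsigned long kb, total = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (parse_locked_kb(line, &kb))
            total += kb;
    }
    fclose(f);
    return (long)total;
}
```

    Calling total_locked_kb("/proc/self/smaps") would then give the locked
    total for the calling process, under the layout assumption above.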
     
  • Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to
    eliminate code duplication.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hai Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hai Shan
     
  • Use correct function name, remove incorrect apostrophe

    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
    usually set to LONG_MAX. The logic in wb_writeback() then calls
    __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we
    easily end up with non-positive nr_to_write after the function returns, if
    the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment.

    When nr_to_write is
    Signed-off-by: Wu Fengguang
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jan Engelhardt
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Background writeback is easily livelockable in a loop in wb_writeback() by
    a process continuously re-dirtying pages (or continuously appending to a
    file). This is in fact intended as the target of background writeback is
    to write dirty pages it can find as long as we are over
    dirty_background_threshold.

    But the above behavior gets inconvenient at times because no other work
    queued in the flusher thread's queue gets processed. In particular, since
    e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1)
    can hang forever waiting for flusher thread to do the work.

    Generally, when a flusher thread has some work queued, someone submitted
    the work to achieve a goal more specific than what background writeback
    does. Moreover, by working on the specific work, we also reduce the
    amount of dirty pages, which is exactly the target of background
    writeout. So it makes sense to give specific work priority over generic
    page cleaning.

    Thus we interrupt background writeback if there is some other work to do.
    We return to the background writeback after completing all the queued
    work.

    This may delay the writeback of expired inodes for a while; however, the
    expired inodes will eventually be flushed to disk as long as the other
    work does not livelock.

    [fengguang.wu@intel.com: update comment]
    Signed-off-by: Jan Kara
    Signed-off-by: Wu Fengguang
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jan Engelhardt
    Cc: Jens Axboe

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
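
    The interruption described above can be illustrated with a toy model.
    Everything here (struct flusher, its fields, the chunked loop) is a
    hypothetical simplification, not the code in wb_writeback().

```c
/* Hypothetical model of a flusher thread's state; names are illustrative. */
struct flusher {
    long dirty_pages;        /* pages currently dirty */
    long background_thresh;  /* background writeback target */
    int  queued_work;        /* explicit work items waiting in the queue */
};

/*
 * Write back in chunks while over the background threshold, but stop as
 * soon as other work is queued. Returns the number of pages written.
 */
long background_writeback(struct flusher *f, long chunk)
{
    long written = 0;

    if (chunk <= 0)
        return 0;

    while (f->dirty_pages > f->background_thresh) {
        /* Specific work takes priority over generic page cleaning. */
        if (f->queued_work > 0)
            break;

        long n = f->dirty_pages < chunk ? f->dirty_pages : chunk;
        f->dirty_pages -= n;
        written += n;
    }
    return written;
}
```

    With nothing queued the loop runs until the threshold is met; with a
    queued item it returns immediately so the caller can service the queue
    and come back to background writeback afterwards.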
     
  • This tracks when balance_dirty_pages() tries to wake up the flusher thread
    for background writeback (if it was not started already).

    Suggested-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Jan Engelhardt
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Check whether background writeback is needed after finishing each work.

    When the bdi flusher thread finishes doing some work, check whether any
    kind of background writeback needs to be done (either because
    dirty_background_ratio is exceeded or because we need to start flushing
    old inodes). If so, just do background writeback.

    This way, bdi_start_background_writeback() just needs to wake up the
    flusher thread. It will do background writeback as soon as there is no
    other work.

    This is a preparatory patch for the next patch which stops background
    writeback as soon as there is other work to do.

    Signed-off-by: Jan Kara
    Signed-off-by: Wu Fengguang
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Jan Engelhardt
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
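
    As a rough illustration of the ordering described above, the decision the
    flusher makes after each piece of work could be modelled like this. The
    enum and function names are hypothetical and the inputs are simplified to
    plain numbers.

```c
#include <stdbool.h>

/* Hypothetical set of things the flusher can do next. */
enum next_action { DO_QUEUED_WORK, DO_BACKGROUND, GO_TO_SLEEP };

/* Background writeback is wanted when over threshold or old inodes exist. */
bool background_needed(long dirty_pages, long background_thresh,
                       bool has_old_inodes)
{
    return dirty_pages > background_thresh || has_old_inodes;
}

/* Queued work always comes first; background writeback fills the gaps. */
enum next_action pick_next(int queued_work, long dirty_pages,
                           long background_thresh, bool has_old_inodes)
{
    if (queued_work > 0)
        return DO_QUEUED_WORK;
    if (background_needed(dirty_pages, background_thresh, has_old_inodes))
        return DO_BACKGROUND;
    return GO_TO_SLEEP;
}
```

    Because the check runs whenever the queue is empty, waking the thread is
    enough to get background writeback started in this model.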
     
  • Stephen Rothwell reports that the vfs merge broke the build of ecryptfs.
    The breakage comes from commit 66cb76666d69 ("sanitize ecryptfs
    ->mount()") which was obviously not even build tested. Tssk, tssk, Al.

    This is the minimal build fixup for the situation, although I don't have
    a filesystem to actually test it with.

    Reported-by: Stephen Rothwell
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Commit c0204fd2b8fe047b18b67e07e1bf2a03691240cd (NFS: Clean up
    nfs4_proc_create()) broke NFSv3 exclusive open by removing the code
    that passes the O_EXCL flag down to nfs3_proc_create(). This patch
    reverts that offending hunk from the original commit.

    Reported-by: Nick Bowler
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org [2.6.37]
    Tested-by: Nick Bowler
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
    fs: add documentation on fallocate hole punching
    Gfs2: fail if we try to use hole punch
    Btrfs: fail if we try to use hole punch
    Ext4: fail if we try to use hole punch
    Ocfs2: handle hole punching via fallocate properly
    XFS: handle hole punching via fallocate properly
    fs: add hole punching to fallocate
    vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
    fix signedness mess in rw_verify_area() on 64bit architectures
    fs: fix kernel-doc for dcache::prepend_path
    fs: fix kernel-doc for dcache::d_validate
    sanitize ecryptfs ->mount()
    switch afs
    move internal-only parts of ncpfs headers to fs/ncpfs
    switch ncpfs
    switch 9p
    pass default dentry_operations to mount_pseudo()
    switch hostfs
    switch affs
    switch configfs
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup when trying to mount inexistent image
    net/ceph: make ceph_msgr_wq non-reentrant
    ceph: fsc->*_wq's aren't used in memory reclaim path
    ceph: Always free allocated memory in osdmap_decode()
    ceph: Makefile: Remove unnessary code
    ceph: associate requests with opening sessions
    ceph: drop redundant r_mds field
    ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
    ceph: add dir_layout to inode

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     
  • Generate unique inode numbers for all entries in the cram file system.
    For files which do not contain data (device nodes, fifos and sockets)
    the offset of the directory entry inside the cramfs plus 1 will be used as
    the inode number.

    The + 1 makes it possible to distinguish between files which contain no
    data and files which have data; the latter have an inode value whose
    lower two bits are always 0.

    It also reimplements the behavior of setting the size and the number of
    blocks to 0 for special files, which is the right value for empty files,
    devices, fifos and sockets.

    As a small benefit it will also be more compatible with older mkcramfs,
    because it will never use cramfs_inode->offset to create an inode
    number for special files.

    [akpm@linux-foundation.org: trivial comment fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Stefani Seibold
    Cc: Al Viro
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefani Seibold
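
    The numbering scheme can be sketched as below. Two points are assumptions
    that the message does not spell out: that the inode number of a file with
    data is its data offset (kept in 4-byte units) shifted left by two, and
    that directory entry offsets are 4-byte aligned. The function name is
    hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the description above.
 *   data_offset:   start of the file's data, in 4-byte units (assumed)
 *   dirent_offset: byte offset of the directory entry inside the image
 *   has_data:      true for entries that carry data
 */
unsigned long cramfs_ino_sketch(uint32_t data_offset, uint32_t dirent_offset,
                                bool has_data)
{
    if (has_data)
        return (unsigned long)data_offset << 2;   /* low two bits are 0 */

    return (unsigned long)dirent_offset + 1;      /* low bits are nonzero */
}
```

    Under these assumptions the two ranges cannot collide: one always has the
    low two bits clear, the other always has bit 0 set.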
     
  • aio_run_iocbs() is not used at all, so get rid of it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     
  • 'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecessary.

    Signed-off-by: Namhyung Kim
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Commit 66fa12c571d3 ("ieee1394: remove the old IEEE 1394 driver stack")
    eliminated the only user of cdev_index(). So it can be removed too.

    Signed-off-by: Namhyung Kim
    Cc: Stefan Richter
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Commit 34aacb2920 ("procfs: Use generic_file_llseek in /proc/kcore") broke
    seeking on /proc/kcore. This changes it back to use default_llseek in
    order to restore the original behavior.

    The problem with generic_file_llseek is that it only allows seeks up to
    inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file
    offset values in the /proc/kcore PT_LOAD segments may exceed or start
    beyond that offset value.

    A similar revert was made for /proc/vmcore.

    Signed-off-by: Dave Anderson
    Acked-by: Frederic Weisbecker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Anderson
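
    To make the difference concrete, here is a toy model of the two seek
    policies. It is not the kernel implementation of either function; the
    names and the fixed limit are assumptions taken from the 2GB-1 figure
    quoted above.

```c
#include <errno.h>

/* 2GB-1, the procfs limit quoted above. */
#define SKETCH_MAXBYTES 0x7fffffffLL

/* Model of a seek that enforces a maximum file size. */
long long seek_with_limit(long long offset)
{
    if (offset < 0 || offset > SKETCH_MAXBYTES)
        return -EINVAL;
    return offset;
}

/* Model of a seek that only rejects negative offsets. */
long long seek_without_limit(long long offset)
{
    if (offset < 0)
        return -EINVAL;
    return offset;
}
```

    A segment whose file offset lies at 4GB is unreachable through the first
    function but reachable through the second.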
     
  • Filename is supposed to match procfile name for random junk.

    Add __init while I'm at it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • For the common case where a proc entry is being removed and nobody is in
    the process of using it, save a LOCK/UNLOCK pair.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a PageSlab() check before adding the _mapcount value to /kpagecount.
    page->_mapcount is in a union with the SLAB structure so for pages
    controlled by SLAB, page_mapcount() returns nonsense.

    Signed-off-by: Petr Holasek
    Cc: Wu Fengguang
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
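
    A simplified model of why the check is needed: when two users share
    storage through a union, reading the field through the wrong view gives a
    meaningless number. The struct layout and names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page model; the union mirrors the sharing described above. */
struct page_model {
    bool is_slab;
    union {
        int32_t  mapcount;    /* meaningful for ordinary pages only */
        uint32_t slab_data;   /* stands in for allocator bookkeeping */
    } u;
};

/* Value to report for a page: 0 for slab pages, the map count otherwise. */
uint64_t reported_count(const struct page_model *p)
{
    if (p->is_slab)
        return 0;
    return (uint64_t)p->u.mapcount;
}
```

    Without the is_slab test the slab bookkeeping bits would be reported as
    if they were a map count.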
     
  • single_open()'s third argument is for copying into seq_file->private. Use
    that, rather than open-coding it.

    Signed-off-by: Jovi Zhang
    Acked-by: David Rientjes
    Acked-by: Alexey Dobriyan
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jovi Zhang
     
  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • For string without format specifiers, use seq_puts().
    For seq_printf("\n"), use seq_putc('\n').

    text data bss dec hex filename
    61866 488 112 62466 f402 fs/proc/proc.o
    61729 488 112 62329 f379 fs/proc/proc.o
    ----------------------------------------------------
    -139

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • /proc/*/statm code needlessly truncates data from unsigned long to int.
    One needs only 8+ TB of RAM to make truncation visible.

    Signed-off-by: Alexey Dobriyan
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
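
    The 8 TB figure can be checked with a little arithmetic, assuming 4 KB
    pages and a 32-bit int (both assumptions, though common ones). The helper
    names are illustrative.

```c
#include <limits.h>
#include <stdbool.h>

/* Number of whole pages in `bytes` for a given page size. */
unsigned long long pages_for(unsigned long long bytes, unsigned long page_size)
{
    return bytes / page_size;
}

/* True if a page count no longer fits in a signed int. */
bool overflows_int(unsigned long long pages)
{
    return pages > (unsigned long long)INT_MAX;
}
```

    8 TB is 2^43 bytes, which is 2^31 pages of 4 KB: exactly one more than
    the largest value a 32-bit int can hold.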
     
  • Use temporary lr for struct latency_record for improved readability and
    fewer columns used. Removed trailing space from output.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • A call to va_start() must always be followed by a call to va_end() in the
    same function. In fs/reiserfs/prints.c::print_block() this is not always
    the case. If 'bh' is NULL we'll return without calling va_end().

    One could add a call to va_end() before the 'return' statement, but it's
    nicer to just move the call to va_start() after the test for 'bh' being
    NULL.

    Signed-off-by: Jesper Juhl
    Acked-by: Edward Shishkin
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
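
    The pattern chosen here, testing for NULL before va_start(), looks like
    this in a standalone example. The function is a hypothetical stand-in and
    has nothing to do with reiserfs beyond the shape of the early return.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical variadic function: sums `count` int arguments.
 * Returns -1 when `bh` is NULL.
 */
int sum_block_args(const char *bh, int count, ...)
{
    va_list args;
    int sum = 0;

    /* Bail out before va_start(), so no return path can skip va_end(). */
    if (bh == NULL)
        return -1;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(args, int);
    va_end(args);

    return sum;
}
```

    With the check placed first, every path that calls va_start() also
    reaches va_end().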
     
  • 'struct befs_disk_data_stream' is huge (~144 bytes) and it's being passed
    by value in fs/befs/endian.h::cpu_to_fsrun().

    It would be better to pass a pointer.

    Signed-off-by: Jesper Juhl
    Cc: Will Dyson
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
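
    For illustration, the two calling conventions side by side. The struct
    below is a made-up large structure, not the befs one.

```c
#include <stdint.h>

/* Hypothetical large structure; the layout is illustrative only. */
struct big_stream {
    uint64_t blocks[16];
    uint64_t size;
};

/* By value: the whole struct is copied for every call. */
uint64_t stream_size_by_value(struct big_stream s)
{
    return s.size;
}

/* By pointer: only an address is passed; const documents read-only use. */
uint64_t stream_size_by_pointer(const struct big_stream *s)
{
    return s->size;
}
```

    Both return the same result; the second avoids copying more than a
    hundred bytes onto the stack per call.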
     
  • Send the events the wakeup refers to, so that epoll, and even the new poll
    code in fs/select.c can avoid wakeups if the events do not match the
    requested set.

    Signed-off-by: Davide Libenzi
    Acked-by: David S. Miller
    Acked-by: Eric Dumazet
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
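
    The filtering this enables can be sketched as a simple mask test. The
    event bit values and the function name are invented for the example; a
    key of 0 is assumed to mean that the waker supplied no hint.

```c
#include <stdbool.h>

/* Illustrative event bits; values chosen for this sketch only. */
#define EV_IN  0x1u
#define EV_OUT 0x4u

/*
 * Decide whether a waiter should be woken.
 *   requested: events the waiter asked for
 *   key:       events the wakeup refers to, or 0 if no hint was given
 */
bool should_wake(unsigned int requested, unsigned int key)
{
    if (key == 0)
        return true;                 /* no hint: wake conservatively */
    return (key & requested) != 0;   /* wake only on overlapping events */
}
```

    A waiter interested only in EV_IN is left asleep when the wakeup carries
    just EV_OUT.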
     
  • This cleans up a few bits in binfmt_elf.c and binfmts.h:

    - the hasvdso field in struct linux_binfmt is unused, so remove it and
    the only initialization of it

    - the elf_map CPP symbol is not defined anywhere in the kernel, so
    remove an unnecessary #ifndef elf_map

    - reduce excessive indentation in elf_format's initializer

    - add missing spaces, remove extraneous spaces

    No functional changes, but tested on x86 (32 and 64 bit), powerpc (32 and
    64 bit), sparc64, arm, and alpha.

    Signed-off-by: Mikael Pettersson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikael Pettersson
     
  • On a 16TB machine, max_user_watches has an integer overflow. Convert it
    to use a long and handle the associated fallout.

    Signed-off-by: Robin Holt
    Cc: "Eric W. Biederman"
    Acked-by: Davide Libenzi
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
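
    A back-of-the-envelope check of the overflow. The 1/25 memory share and
    the 160-byte per-watch cost are assumptions picked for illustration, and
    long long is used so that the sketch behaves the same on 32-bit hosts.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Watch limit for a machine with `ram_bytes` of memory, assuming 1/25 of
 * memory may be used and each watch costs `item_cost` bytes.
 */
long long watch_limit(unsigned long long ram_bytes, unsigned long item_cost)
{
    return (long long)(ram_bytes / 25 / item_cost);
}

/* True if the value can be stored in an int without overflow. */
bool fits_in_int(long long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

    Under these assumptions a 16 TB machine yields a limit of roughly 4.4
    billion, which is above the largest 32-bit int, while a 1 GB machine
    stays comfortably within range.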
     
  • On some architectures __kernel_suseconds_t is int. On these archs struct
    timeval has padding bytes at the end. This struct is copied to userspace
    with these padding bytes uninitialized. This leads to leaking of contents
    of kernel stack memory.

    This bug was added with v2.6.27-rc5-286-gb773ad4.

    [akpm@linux-foundation.org: avoid the memset on architectures which don't need it]
    Signed-off-by: Vasiliy Kulikov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
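
    The general remedy for this class of leak is to clear the whole structure
    before filling it in. The sketch below uses a made-up struct that mimics
    the layout problem; it is not the patched kernel code.

```c
#include <string.h>

/* Hypothetical layout mimicking a timeval whose microseconds are an int. */
struct tv_sketch {
    long sec;
    int  usec;   /* where long is 8 bytes, 4 padding bytes typically follow */
};

/* Fill `out` so that no stale bytes survive, padding included. */
void fill_tv(struct tv_sketch *out, long sec, int usec)
{
    memset(out, 0, sizeof(*out));   /* clears any padding bytes as well */
    out->sec = sec;
    out->usec = usec;
}
```

    Copying sizeof(struct tv_sketch) bytes out of such a buffer then exposes
    only zeros in the padding rather than whatever was on the stack.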
     
  • pr_warning_ratelimited() doesn't exist.

    Also include printk.h, which defines these things.

    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Jan, 2011

4 commits