23 Mar, 2011

17 commits

  • printk()s without a priority level default to KERN_WARNING. To reduce
    noise at KERN_WARNING, this patch sets the priority level appropriately
    for unleveled printk()s. This should be useful to folks who look at
    dmesg warnings closely.

    Signed-off-by: Mandeep Singh Baines
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
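
    As a rough userland sketch of the rule described above (not kernel code):
    a message that starts with a "<N>" prefix carries its own level, and
    anything else falls back to the default. The helper name and the
    SKETCH_DEFAULT_LOGLEVEL constant are illustrative assumptions; the value 4
    is the number behind KERN_WARNING.

```c
#include <stddef.h>

/* Illustrative assumption: 4 is the numeric value behind KERN_WARNING ("<4>"). */
#define SKETCH_DEFAULT_LOGLEVEL 4

/*
 * Return the level a message would be logged at: the digit from a
 * leading "<N>" prefix (N in 0..7) if present, otherwise the default.
 * Hypothetical helper for illustration only; not a kernel function.
 */
int effective_loglevel(const char *msg)
{
	if (msg != NULL && msg[0] == '<' &&
	    msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return SKETCH_DEFAULT_LOGLEVEL;
}
```

    An unleveled message such as "mounted" therefore lands at warning level,
    which is the noise this patch is trying to reduce.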
     
  • Now that the mere act of _looking_ at /proc/$pid/smaps will not destroy
    transparent huge pages, tell how much of the VMA is actually mapped with
    them.

    This way, we can make sure that we're getting THPs where we
    expect to see them.

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
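
    To check from userspace that THPs show up where expected, the new field
    can be summed across VMAs. The sketch below assumes the field is printed
    as "AnonHugePages:" with a value in kB, and parses smaps-formatted text
    held in a string; the helper name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum every "AnonHugePages:" value (in kB) found in smaps-formatted text.
 * Hypothetical helper for illustration; in practice the text would be read
 * from /proc/<pid>/smaps.
 */
unsigned long sum_anon_huge_kb(const char *text)
{
	unsigned long total = 0;
	const char *line = text;

	while (line != NULL && *line != '\0') {
		unsigned long kb;

		/* The literal prefix must match at the start of the line. */
		if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1)
			total += kb;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;		/* step past the newline */
	}
	return total;
}
```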
     
  • This adds code to explicitly detect and handle pmd_trans_huge() pmds. It
    then passes HPAGE_SIZE units in to the smaps_pte_entry() function instead
    of PAGE_SIZE.

    This means that using /proc/$pid/smaps will no longer cause THPs to be
    broken down into small pages.

    Signed-off-by: Dave Hansen
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Acked-by: Andrea Arcangeli
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Add an argument to the new smaps_pte_entry() function to let it account in
    things other than PAGE_SIZE units. I changed all of the PAGE_SIZE sites,
    even though not all of them can be reached for transparent huge pages,
    just so this will continue to work without changes as THPs are improved.

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Acked-by: David Rientjes
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
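
    The idea of accounting in caller-supplied units can be sketched in
    userland as follows. This is not the kernel function: the struct, the
    helper, and the SKETCH_* sizes (4 KiB and 2 MiB, typical for x86-64) are
    assumptions made for illustration.

```c
/* Assumed sizes for illustration (typical x86-64 values). */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_HPAGE_SIZE  (2UL * 1024 * 1024)

/* Simplified stand-in for the kernel's per-VMA statistics. */
struct sketch_stats {
	unsigned long resident;	/* bytes accounted as resident */
};

/*
 * Account one mapped entry of `size` bytes. Because the size is a
 * parameter, the same routine serves small pages and huge pages.
 */
void sketch_account_entry(struct sketch_stats *st, unsigned long size)
{
	st->resident += size;
}
```

    With these sizes, one call with SKETCH_HPAGE_SIZE accounts the same
    amount as 512 calls with SKETCH_PAGE_SIZE, so nothing has to be split
    just to be counted.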
     
  • We will use smaps_pte_entry() in a moment to handle both small and
    transparent large pages. But, we must break it out of smaps_pte_range()
    first.

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Acked-by: David Rientjes
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will
    unconditionally split any transparent huge pages it runs in to. In
    practice, that means that anyone doing a

    cat /proc/$pid/smaps

    will unconditionally break down every huge page in the process and depend
    on khugepaged to re-collapse it later. This is fairly suboptimal.

    This patch changes that behavior. It teaches each ->pmd_entry handler
    (there are five) that they must break down the THPs themselves. Also, the
    _generic_ code will never break down a THP unless a ->pte_entry handler is
    actually set.

    This means that the ->pmd_entry handlers can now choose to deal with THPs
    without breaking them down.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Reviewed-by: Eric B Munson
    Tested-by: Eric B Munson
    Cc: Michael J Wolf
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
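
    The new division of labour can be modelled in a few lines of userland C.
    This is a simplified sketch, not the kernel walker: every sketch_* name
    is hypothetical, and a pmd is reduced to a single "huge" flag.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_pmd {
	bool huge;	/* true while the pmd maps a transparent huge page */
};

struct sketch_walk {
	/* Either handler may be NULL, as with the mm_walk callbacks. */
	int (*pmd_entry)(struct sketch_pmd *pmd);
	int (*pte_entry)(struct sketch_pmd *pmd);
};

/* Stand-in for splitting a huge page into small pages. */
static void sketch_split(struct sketch_pmd *pmd)
{
	pmd->huge = false;
}

/*
 * Generic walker step following the new rule: the pmd handler sees the
 * pmd untouched, and the generic code splits only when a pte handler
 * is going to run.
 */
int sketch_walk_pmd(struct sketch_pmd *pmd, const struct sketch_walk *walk)
{
	int err = 0;

	if (walk->pmd_entry != NULL)
		err = walk->pmd_entry(pmd);
	if (err != 0 || walk->pte_entry == NULL)
		return err;

	if (pmd->huge)
		sketch_split(pmd);
	return walk->pte_entry(pmd);
}

/* A pmd handler that copes with huge pages and leaves them intact. */
int sketch_thp_aware_pmd_entry(struct sketch_pmd *pmd)
{
	(void)pmd;
	return 0;
}

/* A pte handler that does nothing; present only to trigger the split path. */
int sketch_noop_pte_entry(struct sketch_pmd *pmd)
{
	(void)pmd;
	return 0;
}
```

    A walk with only sketch_thp_aware_pmd_entry set leaves the huge page
    alone; a walk with a pte handர set splits it first.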
     
  • This patch series changes remove_from_page_cache()'s page ref counting
    rule. Page cache ref count is decreased in delete_from_page_cache(). So
    we don't need to decrease the page reference in callers.

    Signed-off-by: Minchan Kim
    Cc: William Irwin
    Acked-by: Hugh Dickins
    Acked-by: Mel Gorman
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Johannes Weiner
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This function basically does:

    remove_from_page_cache(old);
    page_cache_release(old);
    add_to_page_cache_locked(new);

    Except it does this atomically, so there's no possibility for the "add" to
    fail because of a race.

    If memory cgroups are enabled, then the memory cgroup charge is also moved
    from the old page to the new.

    This function is currently used by fuse to move pages into the page cache
    on read, instead of copying the page contents.

    [minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()]
    Signed-off-by: Miklos Szeredi
    Acked-by: Rik van Riel
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    [net/9p]: Introduce basic flow-control for VirtIO transport.
    9p: use the updated offset given by generic_write_checks
    [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
    [net/9p] Set the condition just before waking up.
    [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
    fs/9p: Add v9fs_dentry2v9ses
    fs/9p: Attach writeback_fid on first open with WR flag
    fs/9p: Open writeback fid in O_SYNC mode
    fs/9p: Use truncate_setsize instead of vmtruncate
    net/9p: Fix compile warning
    net/9p: Convert the in the 9p rpc call path to GFP_NOFS
    fs/9p: Fix race in initializing writeback fid

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: use watch/notify for changes in rbd header
    libceph: add lingering request and watch/notify event framework
    rbd: update email address in Documentation
    ceph: rename dentry_release -> d_release, fix comment
    ceph: add request to the tail of unsafe write list
    ceph: remove request from unsafe list if it is canceled/timed out
    ceph: move readahead default to fs/ceph from libceph
    ceph: add ino32 mount option
    ceph: update common header files
    ceph: remove debugfs debug cruft
    libceph: fix osd request queuing on osdmap updates
    ceph: preserve I_COMPLETE across rename
    libceph: Fix base64-decoding when input ends in newline.

    Linus Torvalds
     
  • Without this fix, even if a file is opened in O_APPEND mode, data will be
    written at current file position instead of end of file.

    Signed-off-by: M. Mohan Kumar
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
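
    The expected O_APPEND behaviour can be checked from userspace with a
    small helper like the one below. It is a sketch only; the function name
    is hypothetical and the file at `path` is assumed to exist already.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open `path` with O_APPEND, deliberately seek to offset 0, write `len`
 * bytes, and return the resulting file size (or -1 on error). With
 * correct O_APPEND handling the data lands at the end of the file, so
 * the size grows by `len` and the existing contents are preserved.
 */
off_t append_after_seek(const char *path, const void *buf, size_t len)
{
	off_t size = -1;
	int fd = open(path, O_WRONLY | O_APPEND);

	if (fd < 0)
		return -1;
	if (lseek(fd, 0, SEEK_SET) == 0 &&
	    write(fd, buf, len) == (ssize_t)len)
		size = lseek(fd, 0, SEEK_END);
	close(fd);
	return size;
}
```

    On a file system with the bug described above, the write would instead
    overwrite the start of the file and the size would not grow as expected.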
     
  • Add the new static inline helper and use it.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • We don't need writeback fid if we are only doing O_RDONLY open

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
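
    The condition amounts to a check on the access mode of the open flags.
    A minimal sketch, with a hypothetical helper name:

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * A writeback fid is only worth attaching when the file is opened for
 * writing, i.e. when the access mode is anything other than O_RDONLY.
 * Hypothetical helper for illustration; not the 9p client code.
 */
bool sketch_needs_writeback_fid(int open_flags)
{
	return (open_flags & O_ACCMODE) != O_RDONLY;
}
```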
     
  • Older versions of the protocol don't support the tsyncfs operation,
    so for them force an O_SYNC flag on the server.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Convert vmtruncate usage to truncate_setsize. We also write back all
    dirty pages before doing 9p operations and call truncate_setsize on
    success. This ensures that we continue sanely if the truncate fails on
    the server. The disadvantage is that we now write back content that
    gets thrown away later as part of the truncate.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • When two processes open the same file we can end up with both of them
    allocating the writeback_fid. Add a new mutex which can be used
    for synchronizing v9fs_inode member values.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: make fuse_dentry_revalidate() RCU aware
    fuse: make fuse_permission() RCU aware
    fuse: wakeup pollers on connection release/abort
    fuse: reduce size of struct fuse_request

    Linus Torvalds
     

22 Mar, 2011

9 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs: (23 commits)
    xfs: don't name variables "panic"
    xfs: factor agf counter updates into a helper
    xfs: clean up the xfs_alloc_compute_aligned calling convention
    xfs: kill support/debug.[ch]
    xfs: Convert remaining cmn_err() callers to new API
    xfs: convert the quota debug prints to new API
    xfs: rename xfs_cmn_err_fsblock_zero()
    xfs: convert xfs_fs_cmn_err to new error logging API
    xfs: kill xfs_fs_mount_cmn_err() macro
    xfs: kill xfs_fs_repair_cmn_err() macro
    xfs: convert xfs_cmn_err to xfs_alert_tag
    xfs: Convert xlog_warn to new logging interface
    xfs: Convert linux-2.6/ files to new logging interface
    xfs: introduce new logging API.
    xfs: zero proper structure size for geometry calls
    xfs: enable delaylog by default
    xfs: more sensible inode refcounting for ialloc
    xfs: stop using xfs_trans_iget in the RT allocator
    xfs: check if device support discard in xfs_ioc_trim()
    xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
    ...

    Linus Torvalds
     
  • /sys/fs is a somewhat strange way to tweak what could more
    obviously be tuned with a mount option.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Luck, Tony
     
  • Just for consistency's sake. Fix obsolete comment too.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • In sync_write_wait(), we assume that the newest request is at the
    tail of unsafe write list. We should maintain the semantics here.

    Signed-off-by: Henry C Chang
    Signed-off-by: Sage Weil

    Henry C Chang
     
  • This fixes the list corruption warning like this:

    ------------[ cut here ]------------
    WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
    Hardware name: X8DTU
    list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
    Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
    Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
    Call Trace:
    [] warn_slowpath_common+0x7c/0x94
    [] warn_slowpath_fmt+0x41/0x43
    [] __list_add+0x68/0x81
    [] ceph_aio_write+0x614/0x8a2 [ceph]
    [] do_sync_write+0xe8/0x125
    [] ? autoremove_wake_function+0x0/0x39
    [] ? selinux_file_permission+0x5c/0xb3
    [] ? security_file_permission+0x16/0x18
    [] vfs_write+0xae/0x10b
    [] sys_pwrite64+0x5a/0x76
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 08573eb9f07ff6f4 ]---

    Signed-off-by: Henry C Chang
    Signed-off-by: Sage Weil

    Henry C Chang
     
  • Signed-off-by: Sage Weil

    Sage Weil
     
  • The ino32 mount option forces the ceph fs to report 32 bit
    ino values. This is useful for 64 bit kernels with 32 bit userspace.

    Signed-off-by: Yehuda Sadeh

    Yehuda Sadeh
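
    One way to squeeze a 64-bit inode number into 32 bits is to XOR its two
    halves together. The sketch below is an assumption about how such a
    mapping could look, not a copy of the ceph client: both the folding
    scheme and the non-zero fallback value are illustrative.

```c
#include <stdint.h>

/*
 * Fold a 64-bit inode number into 32 bits by XOR-ing its two halves.
 * Hypothetical sketch: the folding scheme and the non-zero fallback
 * value are assumptions, not necessarily what the ceph client does.
 */
uint32_t sketch_ino64_to_ino32(uint64_t ino)
{
	uint32_t folded = (uint32_t)(ino & 0xffffffffu) ^ (uint32_t)(ino >> 32);

	/* Inode number 0 is commonly treated as "no inode", so avoid it. */
	return folded != 0 ? folded : 1;
}
```

    Any such folding can map distinct 64-bit values to the same 32-bit
    value, which is the price of keeping 32 bit userspace working.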
     
  • Whoops!

    Signed-off-by: Sage Weil

    Sage Weil
     
  • lookup_mnt() is only used in the core fs routines now, so it doesn't need to
    be globally declared anymore. It isn't exported to modules at the moment, so
    nothing that can be modularised seems to be using it.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

21 Mar, 2011

14 commits

  • Only bail out of fuse_dentry_revalidate() on LOOKUP_RCU when blocking
    is actually necessary.

    CC: Nick Piggin
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Only bail out of fuse_permission() on IPERM_FLAG_RCU when blocking is
    actually necessary.

    CC: Nick Piggin
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • If a fuse dev connection is broken, wake up any
    processes that are blocking, in a poll system call,
    on one of the files in the now defunct filesystem.

    Signed-off-by: Miklos Szeredi

    Bryan Green
     
  • Reduce the size of struct fuse_request by removing cuse_init_out from
    the request structure and allocating it dynamically instead.

    CC: Tejun Heo
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The usage of find_first_zero_bit() in bfs_create() is wrong for two
    reasons.

    The bitmap size argument to find_first_zero_bit() is info->si_lasti but
    the correct bitmap size is info->si_lasti + 1 as info->si_lasti is the
    last valid index in info->si_imap bitmap.

    Another problem is that it is impossible to detect that info->si_imap
    bitmap is full because there is an off-by-one bug in the return value
    check for find_first_zero_bit(). If no zero bits exist in info->si_imap,
    find_first_zero_bit() returns info->si_lasti. But the check can't catch
    it due to the off-by-one.

    Signed-off-by: Akinobu Mita
    Acked-by: "Tigran A. Aivazian"
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Akinobu Mita
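
    Both problems are easy to reproduce with a userland stand-in for the
    bitmap search. The sketch_* helpers below are hypothetical and only
    mirror the convention stated above: the search returns the size it was
    given when no zero bit is found.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userland stand-in for the kernel helper: return the index of the
 * first zero bit in the first `size` bits of `map`, or `size` when
 * every one of those bits is set.
 */
unsigned long sketch_find_first_zero_bit(const unsigned long *map,
					 unsigned long size)
{
	unsigned long i;

	for (i = 0; i < size; i++) {
		unsigned long word = map[i / SKETCH_BITS_PER_LONG];

		if (!(word & (1UL << (i % SKETCH_BITS_PER_LONG))))
			return i;
	}
	return size;
}

/*
 * Pick a free inode number in [0, lasti]. The bitmap holds lasti + 1
 * bits, so that is the size to search, and a result of lasti + 1 or
 * more means the bitmap is full. Returns -1 when no inode is free.
 */
long sketch_alloc_ino(const unsigned long *imap, unsigned long lasti)
{
	unsigned long size = lasti + 1;
	unsigned long ino = sketch_find_first_zero_bit(imap, size);

	return ino >= size ? -1 : (long)ino;
}
```

    Searching only lasti bits returns lasti both when bit lasti is free and
    when the bitmap is full, so the two cases cannot be told apart; searching
    lasti + 1 bits removes the ambiguity.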
     
  • dentry_open() requires callers to pass a valid vfsmount.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Al Viro

    Tetsuo Handa
     
  • In this case nobody can open the slave point, so it is better to return
    from devpts_pty_new().

    We no longer need to check the error code from d_find_alias() in
    devpts_pty_kill(), because the dentry always exists.

    Signed-off-by: Andrey Vagin
    Signed-off-by: Al Viro

    Andrey Vagin
     
  • These should be spin_unlock() instead of spin_lock(). It's a typo.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Al Viro

    Dan Carpenter
     
  • Move kfree() of i_private out of ->unlink() and into ->evict_inode()

    Signed-off-by: Tony Luck
    Signed-off-by: Al Viro

    Tony Luck
     
  • It is frequently useful to sync a single file system, instead of all
    mounted file systems via sync(2):

    - On machines with many mounts, it is not at all uncommon for some of
    them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
    those and may never get to the one you do care about (e.g., /).
    - Some applications write lots of data to the file system and then
    want to make sure it is flushed to disk. Calling fsync(2) on each
    file introduces unnecessary ordering constraints that result in a large
    amount of sub-optimal writeback/flush/commit behavior by the file
    system.

    There are currently two ways (that I know of) to sync a single super_block:

    - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
    mapping, which isn't usually desirable, and doesn't work for non-block
    file systems.
    - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
    current implementation. Relying on this little-known side effect for
    something like data safety sounds foolish.

    Both of these approaches require root privileges, which some applications
    do not have (nor should they need?) given that sync(2) is an unprivileged
    operation.

    This patch introduces a new system call syncfs(2) that takes an fd and
    syncs only the file system it references. Maybe someday we can

    $ sync /some/path

    and not get

    sync: ignoring all arguments

    The syscall is motivated by comments by Al and Christoph at the last LSF.
    syncfs(2) seems like an appropriate name given statfs(2).

    A similar ioctl was also proposed a while back, see
    http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
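
    From userspace, the call could be used roughly as follows. This sketch
    assumes a C library that exposes a syncfs() wrapper when _GNU_SOURCE is
    defined; the helper name sync_one_fs is hypothetical.

```c
#define _GNU_SOURCE	/* syncfs() is a Linux-specific extension */
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush only the file system that contains `path`.
 * Returns 0 on success, -1 on failure.
 */
int sync_one_fs(const char *path)
{
	int ret;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	ret = syncfs(fd);
	close(fd);
	return ret;
}
```

    Any open fd on the target file system will do, so a hung mount elsewhere
    on the machine does not get in the way, and no root privileges are
    needed.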
     
  • Hi,

    I was backporting the coredump-over-pipe feature and noticed this small
    typo. I wish I had something bigger to contribute...

    From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
    From: Holger Hans Peter Freyther
    Date: Thu, 24 Feb 2011 17:42:50 +0100
    Subject: [PATCH] fs: Fix a small typo in the comment

    The function is called umh_pipe_setup not uhm_pipe_setup.

    Signed-off-by: Holger Hans Peter Freyther
    Signed-off-by: Al Viro

    Holger Hans Peter Freyther
     
  • Fixed coding style issue.

    Signed-off-by: David Jenni
    Signed-off-by: Al Viro

    David Jenni
     
  • Signed-off-by: Ben Hutchings
    Signed-off-by: Al Viro

    Ben Hutchings
     
  • Remove the leftover from the commit 8ff3e8e85fa6 ("select:
    switch select() and poll() over to hrtimers").

    Signed-off-by: Namhyung Kim
    Acked-by: Arjan van de Ven
    Signed-off-by: Al Viro

    Namhyung Kim