28 Sep, 2009

1 commit


24 Sep, 2009

2 commits

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h, however
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

21 Sep, 2009

1 commit


16 Sep, 2009

1 commit

  • Enable removing of corrupted pages through truncation
    for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
    These should cover most server needs.

    I chose the set of migration aware file systems for this
    for now, assuming they have been especially audited.
    But in general it should be safe for all file systems
    on the data area that support read/write and truncate.

    Caveat: the hardware error handler does not take i_mutex
    for now before calling the truncate function. Is that ok?

    Cc: tytso@mit.edu
    Cc: hch@infradead.org
    Cc: mfasheh@suse.com
    Cc: aia21@cantab.net
    Cc: hugh.dickins@tiscali.co.uk
    Cc: swhiteho@redhat.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     

15 Sep, 2009

1 commit

  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Seperate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds
     

14 Sep, 2009

2 commits

  • Reported-by: Daniel Walker
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard.
    The only difference between the two is that blkdev_issue_discard needs to
    send a barrier discard request and blk_ioctl_discard a non-barrier one,
    and blk_ioctl_discard needs to wait on the request. To facilitate this,
    add a flags argument to blkdev_issue_discard to control both aspects of the
    behaviour. This will be very useful later on for using the waiting
    functionality for other callers.

    Based on an earlier patch from Matthew Wilcox.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
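
The flags argument described in the entry above can be pictured with a small userspace sketch. This is only an illustration under stated assumptions: the flag names, the result struct and `sketch_issue_discard()` are hypothetical and are not the kernel's definitions. It shows one entry point serving both callers, with one bit selecting barrier semantics and another selecting whether the caller waits.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
enum sketch_discard_flags {
    SKETCH_DISCARD_WAIT    = 1 << 0, /* caller blocks until the request completes */
    SKETCH_DISCARD_BARRIER = 1 << 1  /* request is issued as a barrier discard */
};

/* What the single entry point decided to do for a given call. */
struct sketch_discard_result {
    bool barrier;
    bool waited;
};

/*
 * One function replaces two near-duplicate code paths: the flags select
 * barrier vs. non-barrier and waiting vs. not waiting independently.
 * No I/O is performed here; the function only reports the decisions.
 */
static struct sketch_discard_result
sketch_issue_discard(unsigned long long sector, unsigned long long nr_sects,
                     unsigned int flags)
{
    struct sketch_discard_result res;

    (void)sector;   /* unused in this model */
    (void)nr_sects; /* unused in this model */
    res.barrier = (flags & SKETCH_DISCARD_BARRIER) != 0;
    res.waited = (flags & SKETCH_DISCARD_WAIT) != 0;
    return res;
}
```

In this model an ioctl-style caller would pass only the wait bit, while a caller that needs ordering would pass only the barrier bit.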
     

09 Sep, 2009

2 commits

  • The /sys/fs/gfs2//lock_module/id file has been unused for
    some time now, so we can remove it. We still accept the mount option
    though, as userspace still sends that.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • There is a potential race in the inode deallocation code if two
    nodes try to deallocate the same inode at the same time. Most of
    the issue is solved by the iopen locking. There is still a small
    window which is not covered by the iopen lock. This patch fixes
    that and also makes the deallocation code more robust in the face of
    any errors in the rgrp bitmaps, or erroneous iopen callbacks from
    other nodes.

    This does introduce one extra disk read, but that is generally not
    an issue since it's the same block that must be written to later
    in the deallocation process. The total disk accesses therefore stay
    the same.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 Aug, 2009

3 commits

  • The inum structure used throughout GFS2 has two fields. One,
    no_addr, is the disk block number of the inode in question and
    is used everywhere as the inode number. The other, no_formal_ino,
    is used only as the generation number for NFS.

    Historically the no_formal_ino field was set using a complicated
    system of one global and one per-node file containing inode numbers
    in order to ensure that each no_formal_ino was unique. Also this
    code made no provision for what would happen when eventually the
    (64 bit) numbers ran out. Now I know that is pretty unlikely to
    happen given the large space of numbers, but it is possible
    nevertheless.

    The only guarantee required for no_formal_ino is that, for any
    single inode, the same number doesn't get reused too quickly.

    We already have a generation number which is kept in the inode
    and initialised from a counter in the resource group (almost
    no overhead, since we have to touch the resource group anyway
    in order to allocate an inode in the first place). Aside from
    ensuring that we never use the value 0 in the no_formal_ino
    field, we can use that counter directly.

    As a result of that change, we lose about 200 lines of code and
    also gain about 10 creates/sec on the postmark benchmark (on
    my test machine).

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
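
A minimal sketch of the idea in the entry above, assuming a per-resource-group counter that is handed out directly and never yields 0. The struct and function names are hypothetical and this is a userspace model, not the GFS2 code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a resource group holding a generation counter. */
struct sketch_rgrp {
    uint64_t igeneration;
};

/*
 * Hand out the next generation number for use as no_formal_ino.
 * The value 0 is skipped, including after the 64-bit counter wraps.
 */
static uint64_t sketch_next_formal_ino(struct sketch_rgrp *rgd)
{
    uint64_t gen = rgd->igeneration++;

    if (gen == 0)
        gen = rgd->igeneration++;
    return gen;
}
```

Because the counter lives in a structure that is already touched during inode allocation, the model needs no separate global or per-node state.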
     
  • Use the more conventional name for the extended attribute
    support code. Update all the places which care.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This has been on my list for some time. We need to change the way
    in which we handle extended attributes to allow faster file creation
    times (by reducing the number of transactions required) and the
    extended attribute code is the main obstacle to this.

    In addition to that, the VFS provides a way to demultiplex the xattr
    calls which we ought to be using, rather than rolling our own. This
    patch changes the GFS2 code to use that VFS feature and as a result
    the code shrinks by a couple of hundred lines or so, and becomes
    easier to read.

    I'm planning on doing further clean up work in this area, but this
    patch is a good start. The cleaned up code also uses the more usual
    "xattr" shorthand. I plan to eliminate the use of "eattr" eventually
    and in the meantime it serves as a flag as to which bits of the code
    have been updated.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Aug, 2009

2 commits

  • This patch adds "-o errors=panic" and "-o errors=withdraw" to the
    gfs2 mount options. The "errors=withdraw" option is the
    current behaviour, meaning to withdraw from the file system if a
    non-serious gfs2 error occurs. The new "errors=panic" option
    tells gfs2 to force a kernel panic if a non-serious gfs2 file
    system error occurs. This may be useful, for example, where
    fabric-level fencing is used that has no way to reboot (such as
    fence_scsi).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
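
As a rough illustration of the two option values named in the entry above, the following userspace sketch maps an option string to a mode, defaulting to withdraw when no option is given. The enum and function names are hypothetical; the real mount option parser is not shown in this log.

```c
#include <string.h>

/* Hypothetical modes for this sketch. */
enum sketch_errors_mode {
    SKETCH_ERRORS_WITHDRAW, /* current behaviour: withdraw from the fs */
    SKETCH_ERRORS_PANIC,    /* new behaviour: force a kernel panic */
    SKETCH_ERRORS_INVALID   /* unrecognised option string */
};

/* Map a single mount option string to a mode; NULL means "not specified". */
static enum sketch_errors_mode sketch_parse_errors(const char *opt)
{
    if (opt == NULL)
        return SKETCH_ERRORS_WITHDRAW; /* default keeps existing behaviour */
    if (strcmp(opt, "errors=withdraw") == 0)
        return SKETCH_ERRORS_WITHDRAW;
    if (strcmp(opt, "errors=panic") == 0)
        return SKETCH_ERRORS_PANIC;
    return SKETCH_ERRORS_INVALID;
}
```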
     
  • Also a gfs2_glock_dq() is required here.

    Signed-off-by: Roel Kluin
    Signed-off-by: Steven Whitehouse

    Roel Kluin
     

18 Aug, 2009

1 commit

  • This patch addresses the same problem that Benjamin Marzinski fixed in commit
    b94a170e96dc416828af9d350ae2e34b70ae7347.

    Quotation of the original problem:

    ---cut here---
    When a file is deleted from a gfs2 filesystem on one node, a dcache
    entry for it may still exist on other nodes in the cluster. If this
    happens, gfs2 will be unable to free this file on disk. Because of this,
    it's possible to have a gfs2 filesystem with no files on it and no free
    space. With this patch, when a node receives a callback notifying it
    that the file is being deleted on another node, it schedules a new
    workqueue thread to remove the file's dcache entry.
    ---end cut---

    After applying Benjamin's patch, I think there is still a case in which the
    on-disk inode remains even when "no space" is hit. When d_prune_aliases()
    runs against the inode, there may be one or more dentries (aliases) with a
    reference count > 0. Those dentries are not pruned, and even after their
    reference count later drops to 0 they can still be cached in memory.
    Unfortunately, no further callback arrives, so things return to the state
    before the callback ran. The on-disk inode therefore remains until the
    in-memory inode is removed for some other reason (shrinking the inode cache,
    unmounting the volume, ...).

    This patch removes those dentries when their reference count drops to 0 and
    the inode has been deleted by a remote node. To implement this,
    gfs2_dentry_delete() is added as dentry_operations.d_delete. The function
    returns true when the inode has been deleted by a remote node. In dput(),
    gfs2_dentry_delete() is called and, since it returns true, the dentry is
    unhashed from the dcache and then removed. When all dentries are removed,
    the in-memory inode is removed so that the on-disk inode is freed.

    Signed-off-by: Wengang Wang
    Signed-off-by: Steven Whitehouse

    Wengang Wang
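
The mechanism in the entry above can be modelled in userspace roughly as follows. This is a simplified sketch, not the VFS or GFS2 code: the structs, the `deleted_remotely` field and the `sketch_` functions are hypothetical stand-ins for the dentry, the inode state and the d_delete hook consulted on the last dput().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced inode and dentry. */
struct sketch_inode {
    bool deleted_remotely; /* set when another node deleted the file */
};

struct sketch_dentry {
    int refcount;
    bool hashed; /* true while the dentry is still cached */
    struct sketch_inode *inode;
};

/* Model of the d_delete hook: true means "do not keep this dentry cached". */
static bool sketch_dentry_delete(const struct sketch_dentry *d)
{
    return d->inode != NULL && d->inode->deleted_remotely;
}

/* Model of dput(): on the last reference, ask the hook whether to unhash. */
static void sketch_dput(struct sketch_dentry *d)
{
    if (--d->refcount > 0)
        return; /* still in use elsewhere */
    if (sketch_dentry_delete(d))
        d->hashed = false; /* dropped from the cache instead of lingering */
}
```

In this model a dentry whose inode was not deleted remotely stays cached after its last reference is dropped, which matches the ordinary caching behaviour the entry describes.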
     

17 Aug, 2009

5 commits

  • This adds a link from the per-gfs2 sb sysfs directory to
    the block device upon which the filesystem is mounted. The
    link is called "device", strangely enough :-)

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • One fewer assert, one more place we can recover gracefully
    if there is an error.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • A little while back, block allocation was given some improved
    error handling which meant that -EIO was returned in the case
    of there being a problem in the resource group data. In addition
    a message is printed explaining what went wrong and how to fix it.
    This extends that error handling so that it also covers inode
    allocation.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • With each uevent, we now always include the journal ID. We
    can't call it JID since that is already in use by some of
    the individual events relating to recovery, so we use
    JOURNALID instead. We don't send the JOURNALID for spectator
    mounts, since there isn't one.

    Also the ADD event now has both RDONLY and SPECTATOR information
    to match that of the ONLINE event.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
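
A small sketch of the rule in the entry above: emit a JOURNALID variable with every event unless the mount is a spectator. The function name, the buffer handling and the use of an unsigned journal ID are assumptions made for illustration; the kernel's uevent helpers are not used here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the JOURNALID variable into buf.
 * Returns the number of characters written, or 0 for spectator mounts,
 * which have no journal and therefore get no variable.
 */
static int sketch_format_journalid(char *buf, size_t len, bool spectator,
                                   unsigned int jid)
{
    if (spectator) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "JOURNALID=%u", jid);
}
```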
     
  • We already have an offline uevent (used when a withdraw occurs)
    but no online uevent. This adds an online uevent so that userspace
    will be able to detect a successful mount by means other than
    not receiving a remove event after the add & recovery (change)
    uevents.

    It has also been added to the remount path as well - we can't use
    a change uevent there as older GFS2 userspace acts on change uevents
    according to the state that it thinks the fs is in, so we can't
    easily add any new ones.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

14 Aug, 2009

1 commit


30 Jul, 2009

7 commits

  • When a file is deleted from a gfs2 filesystem on one node, a dcache
    entry for it may still exist on other nodes in the cluster. If this
    happens, gfs2 will be unable to free this file on disk. Because of this,
    it's possible to have a gfs2 filesystem with no files on it and no free
    space. With this patch, when a node receives a callback notifying it
    that the file is being deleted on another node, it schedules a new
    workqueue thread to remove the file's dcache entry.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • Since both linked and unlinked inodes are counted by rgd->rd_dinodes, it
    makes no sense to count them with the used data blocks (first check that
    I changed), it makes sense to count them with the linked inodes (second
    check), and it makes no sense to care if there are more unlinked inodes
    than linked ones. This fixes these errors.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • GFS2 was placing far too many glocks on the reclaim list that were not good
    candidates for freeing up from cache. These locks would sit there and
    repeatedly get scanned to see if they could be reclaimed, wasting a lot
    of time when there was memory pressure. This fix does more checks on the
    locks to see if they are actually likely to be removable from cache.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • When searching for unlinked, but still allocated inodes during block
    allocation, avoid the block relating to the inode that is doing the
    allocation. This fixes a hang caused when an unlinked, but still
    open, inode tries to allocate some more blocks and lands up
    finding itself during the search for deallocatable inodes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • It is possible for gfs2_shrink_glock_memory() to check a glock for
    demotion that's in the process of being freed by gfs2_glock_put(). In
    this case, gfs2_shrink_glock_memory() will acquire a new reference to
    this glock, and then try to free the glock itself when it drops the
    reference. To solve this, gfs2_shrink_glock_memory() just needs to check
    if the glock is in the process of being freed, and if so skip it without
    ever unlocking the lru_lock.

    Signed-off-by: Benjamin Marzinski
    Acked-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • GFS2 wasn't syncing its statfs info on grows. This causes a problem
    when you grow the filesystem on multiple nodes. GFS2 would calculate
    the new space based on the resource groups (which are always current),
    and then assume that the filesystem had grown from the existing
    statfs size. If you grew the filesystem on two different nodes in a
    short time, the second node wouldn't see the statfs size change from the
    first node, and would assume that it was grown by a larger amount than
    it was. When all these changes were synced out, the total filesystem
    size would be incorrect (the first grow would be counted twice).

    This patch makes GFS2 read in the statfs changes from disk before
    a grow, and write them out after the grow, while the master statfs inode
    is locked.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
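
The double counting described in the entry above can be checked with a little arithmetic. The sketch below is a userspace model with made-up block counts and a hypothetical function name: each grow computes its delta as the resource group total minus the statfs total it can see, and the `sync` flag selects whether the second node sees the first node's change.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model two grows performed on different nodes in quick succession and
 * return the statfs total once all changes have been written out.
 */
static uint64_t sketch_total_after_two_grows(uint64_t initial, uint64_t grow_a,
                                             uint64_t grow_b, bool sync)
{
    uint64_t master = initial; /* statfs total on disk */
    uint64_t rgrps = initial;  /* resource group total, always current */
    uint64_t delta_a, delta_b;

    rgrps += grow_a;
    delta_a = rgrps - master;  /* node A measures against the master total */
    if (sync)
        master += delta_a;     /* written out while the master is locked */

    rgrps += grow_b;
    delta_b = rgrps - master;  /* node B measures against what it can see */
    if (sync)
        master += delta_b;
    else
        master += delta_a + delta_b; /* both stale deltas land later */

    return master;
}
```

With an initial size of 100 blocks and grows of 50 and 30, the unsynced path ends at 230 (the first grow counted twice) while the synced path ends at the expected 180.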
     
  • This patch removes some of the special cases that the shrinker
    was trying to deal with. As a result we leave fewer items on
    the list and none at all which cannot be demoted. This makes
    the list scanning more efficient and solves some issues seen
    with large numbers of inodes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

13 Jul, 2009

1 commit

  • If TRACE_INCLUDE_FILE is defined,
    will be included and compiled, otherwise it will be

    So TRACE_SYSTEM should be defined outside of #if protection,
    just like TRACE_INCLUDE_FILE.

    Imagine this scenario:

    #include
    -> TRACE_SYSTEM == foo
    ...
    #include
    -> TRACE_SYSTEM == bar
    ...
    #define CREATE_TRACE_POINTS
    #include
    -> TRACE_SYSTEM == bar !!!

    and then bar.h will be included and compiled.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

19 Jun, 2009

1 commit

  • Follow-up to "block: enable by default support for large devices
    and files on 32-bit archs".

    Rename CONFIG_LBD to CONFIG_LBDAF to:
    - allow update of existing [def]configs for "default y" change
    - reflect that it is used also for large files support nowadays

    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
     

12 Jun, 2009

5 commits

  • It is not required here.

    Signed-off-by: Steven Whitehouse
    Cc: Christoph Hellwig

    Steven Whitehouse
     
  • This patch adds the ability to trace various aspects of the GFS2
    filesystem. The trace points are divided into three groups,
    glocks, logging and bmap. These points have been chosen because
    they allow inspection of the major internal functions of GFS2
    and they are also generic enough that they are unlikely to need
    any major changes as the filesystem evolves.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Acked-by: Steven Whitehouse
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

10 Jun, 2009

2 commits


05 Jun, 2009

1 commit

  • This patch uses sget() to get a reference to the
    existing gfs2 sb when mounting the gfs2meta filesystem
    (in fact that's just another mount of the gfs2
    filesystem with a different root and this interface
    is for backward compatibility).

    Signed-off-by: Steven Whitehouse
    Reported-by: Benjamin Marzinski
    Tested-by: Benjamin Marzinski
    Cc: Christoph Hellwig

    Steven Whitehouse
     

03 Jun, 2009

1 commit