15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
      most filesystems override the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to the "account everything" approach
    and keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not, in fact, account
    everything).
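
    As a rough sketch (call sites vary per object; the cache shown here is
    just one example), the conversion adds SLAB_ACCOUNT where a slab cache
    is created, or GFP_KERNEL_ACCOUNT at kmalloc sites:

        /* objects from this cache are charged to the allocating memcg */
        cred_jar = kmem_cache_create("cred_jar", sizeof(struct cred), 0,
                        SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);

        /* a one-off allocation charged to the allocating memcg */
        fdt = kmalloc(sizeof(*fdt), GFP_KERNEL_ACCOUNT);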

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

15 Dec, 2015

1 commit

  • Before this patch, multi-block reservation structures were allocated
    from a special slab. This patch folds the structure into the gfs2_inode
    structure. The disadvantage is that the gfs2_inode needs more memory,
    even when a file is opened read-only. The advantages are: (a) we don't
    need the special slab, nor the extra time it takes to allocate and
    deallocate from it; (b) we no longer need to worry about whether the
    structure exists for things like quota management; and (c) we can
    remove the calls to get_write_access and put_write_access, since we
    know the structure will exist.
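
    In outline (a sketch only; the field layout is abridged), the change
    swaps a pointer to a separately allocated reservation for an embedded
    one:

        struct gfs2_inode {
                /* ... other fields ... */
                struct gfs2_blkreserv i_res;    /* was a pointer to a
                                                   slab-allocated struct;
                                                   now always present */
        };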

    Signed-off-by: Bob Peterson

    Bob Peterson
     

24 Nov, 2015

1 commit

  • This patch basically reverts the majority of patch 5407e24.
    That patch eliminated the gfs2_qadata structure in favor of just
    using the reservations structure. The problem with doing that is that
    it increases the size of the reservations structure. That is not an
    issue until it comes time to fold the reservations structure into the
    inode in memory so we know it's always there. By separating out the
    quota structure again, we avoid punishing non-quota users by making
    all the in-core inodes bigger and requiring more slab space. This patch
    creates a new slab area to allocate the quota structures from, so they
    are managed a little more sanely.
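
    A sketch of the new slab (the exact create arguments are an
    assumption):

        gfs2_qadata_cachep = kmem_cache_create("gfs2_qadata",
                                               sizeof(struct gfs2_qadata),
                                               0, 0, NULL);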

    Signed-off-by: Bob Peterson

    Bob Peterson
     

30 Oct, 2015

1 commit

  • Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
    gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
    gl_lockref). Remove that define to make the references to gl_lockref.lock more
    obvious.
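
    For illustration, the removed define and the resulting explicit usage:

        /* removed: */
        #define gl_spin gl_lockref.lock

        /* call sites now spell out the lockref's spinlock: */
        spin_lock(&gl->gl_lockref.lock);
        /* ... critical section ... */
        spin_unlock(&gl->gl_lockref.lock);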

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

17 Nov, 2014

1 commit

  • The current gfs2 freezing code is considerably more complicated than it
    should be because it doesn't use the vfs freezing code on any node except
    the one that begins the freeze. This is because it needs to acquire a
    cluster glock before calling the vfs code to prevent a deadlock, and
    without the new freeze_super and thaw_super hooks, that was impossible. To
    deal with the issue, gfs2 had to do some hacky locking tricks to make sure
    that a frozen node couldn't be holding on to a lock it needed in order to
    perform the unfreeze ioctl.

    This patch makes use of the new hooks to simplify the gfs2 locking code. Now,
    all the nodes in the cluster freeze and thaw in exactly the same way. Every
    node in the cluster caches the freeze glock in the shared state. The new
    freeze_super hook allows the freezing node to grab this freeze glock in
    the exclusive state without first calling the vfs freeze_super function.
    All the nodes in the cluster see this lock change, and call the vfs
    freeze_super function. The vfs locking code guarantees that the nodes can't
    get stuck holding the glocks necessary to unfreeze the system. To
    unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
    glock. Again, all the nodes notice this, reacquire the glock in shared mode
    and call the vfs thaw_super function.
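
    In outline, the new hooks slot into the super_operations table (a
    sketch; the handler names follow gfs2, bodies abridged):

        static const struct super_operations gfs2_super_ops = {
                /* ... */
                .freeze_super = gfs2_freeze,    /* take the freeze glock EX,
                                                   then call freeze_super() */
                .thaw_super   = gfs2_unfreeze,  /* drop the glock; the other
                                                   nodes then thaw */
        };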

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

07 Mar, 2014

2 commits

  • Add pr_fmt, remove embedded "GFS2: " prefixes.
    This now consistently emits lower case "gfs2: " for each message.
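
    For instance, the prefix now comes from a single pr_fmt define instead
    of being embedded in every format string:

        #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

        pr_warn("example message\n");   /* emits: gfs2: example message */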

    Other miscellanea around these changes:

    o Add missing newlines
    o Coalesce formats
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: Steven Whitehouse

    Joe Perches
     
    - All printk(KERN_foo ...) calls converted to pr_foo().
    - Messages updated to fit in 80 columns.
    - fs_macros converted as well.
    - fs_printk removed.

    Signed-off-by: Fabian Frederick
    Signed-off-by: Steven Whitehouse

    Fabian Frederick
     

15 Jan, 2014

1 commit

  • Prior to this patch, GFS2 kept all the quotas for each
    super block in a single linked list. This is rather slow
    when there are large numbers of quotas.

    This patch introduces a hlist_bl based hash table, similar
    to the one used for glocks. The initial look up of the quota
    is now lockless in the case where it is already cached,
    although we still have to take the per quota spinlock in
    order to bump the ref count. Either way though, this is a
    big improvement on what was there before.

    The qd_lock and the per-super-block list are preserved for the time
    being. However, since the list is no longer used for its original role,
    it should be possible in due course to shrink the number of items on it
    and to remove the requirement to take qd_lock in qd_get.
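
    A sketch of the lockless lookup (the hash head, predicate, and ref
    count names here are illustrative, not the patch's exact code):

        struct hlist_bl_head *head = &qd_hash_table[hash];
        struct gfs2_quota_data *qd;
        struct hlist_bl_node *h;

        rcu_read_lock();
        hlist_bl_for_each_entry_rcu(qd, h, head, qd_hlist) {
                if (qd_match(qd, qid)) {
                        spin_lock(&qd->qd_lock); /* per-quota lock, taken */
                        qd->qd_count++;          /* only to bump the ref */
                        spin_unlock(&qd->qd_lock);
                        break;
                }
        }
        rcu_read_unlock();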

    Signed-off-by: Steven Whitehouse
    Cc: Abhijith Das
    Cc: Paul E. McKenney

    Steven Whitehouse
     

04 Nov, 2013

1 commit

  • By using the generic list_lru code, we can now separate the
    per sb quota list locking from the lru locking. The lru
    lock is made into the inner-most lock.

    As a result of this new lock order, we may occasionally see items
    on the per-sb quota list which are "dead", so the two places where
    we traverse that list have been updated to take account of that.

    As a result of this patch, the gfs2 quota shrinker is now
    NUMA zone aware, and we are also laying the foundations for
    further improvements in due course.
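
    The generic API keeps the lru handling self-contained; roughly (the
    item and callback names are illustrative):

        list_lru_init(&gfs2_qd_lru);                /* at module init */

        list_lru_add(&gfs2_qd_lru, &qd->qd_lru);    /* item becomes idle */

        /* shrinker: walk under the inner lru lock, isolating victims */
        freed = list_lru_walk(&gfs2_qd_lru, gfs2_qd_isolate,
                              &dispose, nr_to_scan);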

    Signed-off-by: Steven Whitehouse
    Signed-off-by: Abhijith Das
    Tested-by: Abhijith Das
    Cc: Dave Chinner

    Steven Whitehouse
     

11 Sep, 2013

1 commit

  • Convert the filesystem shrinkers to use the new API, and standardise some
    of the behaviours of the shrinkers at the same time. For example,
    nr_to_scan means the number of objects to scan, not the number of objects
    to free.
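
    The new API splits counting from scanning; the gfs2 glock shrinker,
    for example, ends up with roughly this shape (a sketch):

        static unsigned long gfs2_glock_shrink_count(struct shrinker *shrink,
                                                     struct shrink_control *sc)
        {
                return vfs_pressure_ratio(atomic_read(&lru_count));
        }

        static unsigned long gfs2_glock_shrink_scan(struct shrinker *shrink,
                                                    struct shrink_control *sc)
        {
                if (!(sc->gfp_mask & __GFP_FS))
                        return SHRINK_STOP;
                return gfs2_scan_glock_lru(sc->nr_to_scan);
        }

        static struct shrinker glock_shrinker = {
                .count_objects = gfs2_glock_shrink_count,
                .scan_objects  = gfs2_glock_shrink_scan,
                .seeks         = DEFAULT_SEEKS,
        };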

    I refactored the CIFS idmap shrinker a little - it really needs to be
    broken up into a shrinker per tree and keep an item count with the tree
    root so that we don't need to walk the tree every time the shrinker needs
    to count the number of objects in the tree (i.e. all the time under
    memory pressure).

    [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
    [assorted fixes folded in]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Acked-by: Artem Bityutskiy
    Acked-by: Jan Kara
    Acked-by: Steven Whitehouse
    Cc: Adrian Hunter
    Cc: "Theodore Ts'o"
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     

19 Aug, 2013

1 commit


06 Jun, 2012

1 commit


24 Apr, 2012

2 commits

  • Prior to this patch, we have two ways of sending i/o to the log.
    One of those is used when we need to allocate both the data
    to be written itself and also a buffer head to submit it. This
    is done via sb_getblk and friends. This is used mostly for writing
    log headers.

    The other method is used when writing blocks which have some
    in-place counterpart. This is the case for all the metadata
    blocks which are journalled and, when journalled data is in use,
    for unescaped journalled data blocks.

    This patch replaces both of those methods, and about half a
    dozen separate i/o submission points, with a single i/o
    submission function. We also go direct to bio rather than
    using buffer heads, since this allows us to build i/o
    requests of the maximum size for the block device in
    question. It also reduces the memory required for flushing
    the log, which can be very useful in low memory situations.
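
    A sketch of the bio-based submission (using the bio API of that era;
    the completion handler name is illustrative):

        struct bio *bio = bio_alloc(GFP_NOIO, nrvecs);

        bio->bi_sector = blkno * (sb->s_blocksize >> 9);
        bio->bi_bdev = sb->s_bdev;
        bio->bi_end_io = gfs2_end_log_write;

        if (bio_add_page(bio, page, size, offset) == 0) {
                /* bio is full for this device: submit, start a new one */
                submit_bio(WRITE, bio);
        }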

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch changes block reservations to use slab storage.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

08 Mar, 2012

1 commit

    In order to ensure that we've got enough buffer heads for flushing
    the journal, the original code used __GFP_NOFAIL when performing
    this allocation. Here we dispense with that in favour of using a
    mempool. This should improve efficiency in low memory conditions:
    since flushing the journal is a good way to get memory back, we
    don't want to be spinning while waiting on memory allocations. The
    buffers which are allocated via this mempool are fairly short lived,
    so we'll recycle them pretty quickly.

    Although there are other memory allocations which occur during the
    journal flush process, this is the one which can potentially require
    the most memory, so it is the most important one to fix.

    The amount of memory reserved is fixed, and we should not need to
    scale it when there are a greater number of filesystems in use.
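
    A sketch of such a pool built over the existing buffer head allocator
    (the pool size and callback names are assumptions):

        static void *gfs2_bh_alloc(gfp_t mask, void *data)
        {
                return alloc_buffer_head(mask);
        }

        static void gfs2_bh_free(void *ptr, void *data)
        {
                free_buffer_head(ptr);
        }

        bh_pool = mempool_create(1024, gfs2_bh_alloc, gfs2_bh_free, NULL);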

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

11 Jan, 2012

1 commit

  • This new method of managing recovery is an alternative to
    the previous approach of using the userland gfs_controld.

    - use dlm slot numbers to assign journal ids
    - use dlm recovery callbacks to initiate journal recovery
    - use a dlm lock to determine the first node to mount fs
    - use a dlm lock to track journals that need recovery
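
    The dlm-side recovery callbacks have roughly this shape (handler
    names follow fs/gfs2/lock_dlm.c; bodies omitted):

        static const struct dlm_lockspace_ops gdlm_lockspace_ops = {
                .recover_prep = gdlm_recover_prep, /* pause locking activity */
                .recover_slot = gdlm_recover_slot, /* note a journal to recover */
                .recover_done = gdlm_recover_done, /* kick off recovery work */
        };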

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

22 Nov, 2011

1 commit

  • This patch separates the code pertaining to allocations into two
    parts: quota-related information and block reservations.
    This patch also moves all the block reservation structure allocations to
    function gfs2_inplace_reserve to simplify the code, and moves
    the frees to function gfs2_inplace_release.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

15 Jul, 2011

1 commit

  • This patch adds a cache for the hash table to the directory code
    in order to help simplify the way in which the hash table is
    accessed. This is intended to be a first step towards introducing
    some performance improvements in the directory code.

    There are two follow-ups that I'm hoping to see fairly shortly. One
    is to simplify the hash table reading code now that we always read the
    complete hash table, whether we want one entry or all of them. The
    other is to introduce readahead on the heads of the hash chains
    which are referred to from the table.

    The hash table is a maximum of 128k in size, so it is not worth trying
    to read it in small chunks.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

27 May, 2011

1 commit

  • * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    gfs2: Drop __TIME__ usage
    isdn/diva: Drop __TIME__ usage
    atm: Drop __TIME__ usage
    dlm: Drop __TIME__ usage
    wan/pc300: Drop __TIME__ usage
    parport: Drop __TIME__ usage
    hdlcdrv: Drop __TIME__ usage
    baycom: Drop __TIME__ usage
    pmcraid: Drop __DATE__ usage
    edac: Drop __DATE__ usage
    rio: Drop __DATE__ usage
    scsi/wd33c93: Drop __TIME__ usage
    scsi/in2000: Drop __TIME__ usage
    aacraid: Drop __TIME__ usage
    media/cx231xx: Drop __TIME__ usage
    media/radio-maxiradio: Drop __TIME__ usage
    nozomi: Drop __TIME__ usage
    cyclades: Drop __TIME__ usage

    Linus Torvalds
     

26 May, 2011

1 commit

  • The kernel already prints its build timestamp during boot, no need to
    repeat it in random drivers and produce different object files each
    time.

    Cc: Steven Whitehouse
    Cc: cluster-devel@redhat.com
    Signed-off-by: Michal Marek

    Michal Marek
     

20 Apr, 2011

1 commit

  • The GLF_LRU flag introduced in the previous patch can be
    used to check whether a glock is on the lru list when a new
    holder is queued and, if so, to remove it without first
    having to take the lru_lock.

    The main purpose of this patch, however, is to optimise the
    glocks left over when an inode at the end of its life is being
    evicted. Previously such glocks were left with the GLF_LFLUSH
    flag set, so that when reclaimed, each one required a log flush.
    This patch resets the GLF_LFLUSH flag when there is nothing
    left to flush thus preventing later log flushes as glocks are
    reused or demoted.

    In order to do this, we need to keep track of the number of
    revokes which are outstanding, and also to clear the GLF_LFLUSH
    bit after a log commit when only revokes have been processed.
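
    In outline (the counter name follows gfs2; the surrounding logic is
    abridged):

        atomic_inc(&gl->gl_revokes);   /* a revoke for this glock queued */

        /* after a log commit has written out the revoke: */
        if (atomic_dec_return(&gl->gl_revokes) == 0)
                clear_bit(GLF_LFLUSH, &gl->gl_flags);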

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

16 Mar, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Don't use _raw version of RCU dereference
    GFS2: Adding missing unlock_page()
    GFS2: Update to AIL list locking
    GFS2: introduce AIL lock
    GFS2: fix block allocation check for fallocate
    GFS2: Optimize glock multiple-dequeue code
    GFS2: Remove potential race in flock code
    GFS2: Fix glock deallocation race
    GFS2: quota allows exceeding hard limit
    GFS2: deallocation performance patch
    GFS2: panics on quotacheck update
    GFS2: Improve cluster mmap scalability
    GFS2: Fix glock queue trace point
    GFS2: Post-VFS scale update for RCU path walk
    GFS2: Use RCU for glock hash table

    Linus Torvalds
     

24 Feb, 2011

1 commit

  • Michael Leun reported that running parallel opens on a fuse filesystem
    can trigger a "kernel BUG at mm/truncate.c:475".

    Gurudas Pai reported the same bug on NFS.

    The reason is, unmap_mapping_range() is not prepared for more than
    one concurrent invocation per inode. For example:

    thread1: going through a big range, stops in the middle of a vma and
    stores the restart address in vm_truncate_count.

    thread2: comes in with a small (e.g. single page) unmap request on
    the same vma, somewhere before restart_address, finds that the
    vma was already unmapped up to the restart address and happily
    returns without doing anything.

    Another scenario would be two big unmap requests, both having to
    restart the unmapping and each one setting vm_truncate_count to its
    own value. This could go on forever without any of them being able to
    finish.

    Truncate and hole punching already serialize with i_mutex. Other
    callers of unmap_mapping_range() do not, and it's difficult to get
    i_mutex protection for all callers. In particular ->d_revalidate(),
    which calls invalidate_inode_pages2_range() in fuse, may be called
    with or without i_mutex.

    This patch adds a new mutex to 'struct address_space' to prevent
    running multiple concurrent unmap_mapping_range() on the same mapping.
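
    The shape of the fix (the field name here is an assumption):

        /* in struct address_space: */
        struct mutex unmap_mutex;      /* serializes unmap_mapping_range() */

        void unmap_mapping_range(struct address_space *mapping,
                                 loff_t const holebegin,
                                 loff_t const holelen, int even_cows)
        {
                mutex_lock(&mapping->unmap_mutex);
                /* ... existing vm_truncate_count/restart logic ... */
                mutex_unlock(&mapping->unmap_mutex);
        }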

    [ We'll hopefully get rid of all this with the upcoming mm
    preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
    lockbreak" patch in particular. But that is for 2.6.39 ]

    Signed-off-by: Miklos Szeredi
    Reported-by: Michael Leun
    Reported-by: Gurudas Pai
    Tested-by: Gurudas Pai
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

17 Feb, 2011

1 commit

  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

21 Jan, 2011

1 commit

  • This has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).
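
    A sketch of the resulting lookup (field names are illustrative):

        struct gfs2_glock *gl;
        struct hlist_bl_node *pos;

        rcu_read_lock();
        hlist_bl_for_each_entry_rcu(gl, pos, head, gl_list) {
                if (gl->gl_name.ln_number == number &&
                    atomic_inc_not_zero(&gl->gl_ref)) {
                        /* got a reference without the hash chain lock */
                        break;
                }
        }
        rcu_read_unlock();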

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: remove in_workqueue_context()
    workqueue: Clarify that schedule_on_each_cpu is synchronous
    memory_hotplug: drop spurious calls to flush_scheduled_work()
    shpchp: update workqueue usage
    pciehp: update workqueue usage
    isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr()
    workqueue: add and use WQ_MEM_RECLAIM flag
    workqueue: fix HIGHPRI handling in keep_working()
    workqueue: add queue_work and activate_work trace points
    workqueue: prepare for more tracepoints
    workqueue: implement flush[_delayed]_work_sync()
    workqueue: factor out start_flush_work()
    workqueue: cleanup flush/cancel functions
    workqueue: implement alloc_ordered_workqueue()

    Fix up trivial conflict in fs/gfs2/main.c as per Tejun

    Linus Torvalds
     

11 Oct, 2010

1 commit

  • Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
    WQ_RESCUER as internal and replace all external WQ_RESCUER usages to
    WQ_MEM_RECLAIM.

    This makes the API users express the intent of the workqueue instead
    of indicating the internal mechanism used to guarantee forward
    progress. This is also to make it cleaner to add more semantics to
    WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim
    workqueues can be made highpri.

    This patch doesn't introduce any functional change.
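
    For example (the queue name is illustrative):

        /* the flag now names the intent, not the rescuer mechanism */
        wq = alloc_workqueue("gfs_recovery", WQ_MEM_RECLAIM, 0);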

    Signed-off-by: Tejun Heo
    Cc: Jeff Garzik
    Cc: Dave Chinner
    Cc: Steven Whitehouse

    Tejun Heo
     

20 Sep, 2010

2 commits

  • Rather than calculating the qstrs for . and .. each time
    we need them, it's better to keep a constant version of
    these and just refer to them when required.
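
    A sketch of the constant versions (initializers abridged; the hash
    field would be filled in at init time):

        static struct qstr gfs2_qdot = { .name = ".", .len = 1 };
        static struct qstr gfs2_qdotdot = { .name = "..", .len = 2 };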

    Signed-off-by: Steven Whitehouse
    Reviewed-by: Christoph Hellwig

    Steven Whitehouse
     
  • The recovery workqueue can be freezable since
    we want it to finish what it is doing if the system is to
    be frozen (although why you'd want to freeze a cluster node
    is beyond me since it will result in it being ejected from
    the cluster). It does still make sense for single node
    GFS2 filesystems though.

    The glock workqueue will benefit from being able to run more
    work items concurrently. A test running postmark shows
    improved performance and multi-threaded workloads are likely
    to benefit even more. It needs to be high priority because
    the latency directly affects the latency of filesystem glock
    operations.

    The delete workqueue is similar to the recovery workqueue in
    that it must not get blocked by memory allocations, and may
    run for a long time.

    Potentially other GFS2 threads might also be converted to
    workqueues, but I'll leave that for a later patch.
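
    Roughly, that reasoning translates into flags like these (the
    max_active values are assumptions):

        gfs_recovery_wq = alloc_workqueue("gfs_recovery",
                                WQ_MEM_RECLAIM | WQ_FREEZABLE, 0);

        glock_workqueue = alloc_workqueue("glock_workqueue",
                                WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);

        gfs2_delete_workqueue = alloc_workqueue("delete_workqueue",
                                WQ_MEM_RECLAIM, 0);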

    Signed-off-by: Steven Whitehouse
    Acked-by: Tejun Heo

    Steven Whitehouse
     

23 Jul, 2010

1 commit

  • Workqueue can now handle high concurrency. Convert gfs to use
    workqueue instead of slow-work.

    * Steven pointed out that the recovery path might be run from the
    allocation path, and thus requires a forward progress guarantee
    without memory allocation. Create and use gfs_recovery_wq with a
    rescuer. Please note that forward progress wasn't guaranteed with
    slow-work.

    * Updated to use non-reentrant workqueue.

    Signed-off-by: Tejun Heo
    Acked-by: Steven Whitehouse

    Tejun Heo
     

29 Mar, 2010

1 commit


01 Mar, 2010

1 commit

  • Since the start of GFS2, an "extra" inode has been used to store
    the metadata belonging to each inode. The only reason for using
    this inode was to have an extra address space, the other fields
    were unused. This means that the memory usage was rather inefficient.

    The reason for keeping each inode's metadata in a separate address
    space is that when glocks are requested on remote nodes, we need to
    be able to efficiently locate the data and metadata relating to
    that glock (inode) in order to sync it, or to sync and invalidate
    it (depending on the remotely requested lock mode).

    This patch adds a new type of glock which, in addition to its
    normal fields, has an address space. This applies to all inode
    and rgrp glocks (but to no other glock types, which remain as
    before). As a result, we no longer need to have the second
    inode.
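
    In outline (the struct name is assumed; the point is that the glock
    and its address space are allocated as one object):

        struct gfs2_glock_aspace {
                struct gfs2_glock glock;
                struct address_space mapping; /* inode and rgrp glocks only */
        };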

    This results in three major improvements:
    1. A saving of approx 25% of memory used in caching inodes
    2. A removal of the circular dependency between inodes and glocks
    3. No confusion between "normal" and "metadata" inodes in super.c

    Although the first of these is the more immediately apparent, the
    second is just as important as it now enables a number of clean
    ups at umount time. Those will be the subject of future patches.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Nov, 2009

1 commit


19 May, 2009

1 commit

  • This patch fixes a race condition where we can receive recovery
    requests part way through processing a umount. This was causing
    problems since the recovery thread had already gone away.

    Looking in more detail at the recovery code, it was really trying
    to implement a slight variation on a work queue, and that happens to
    align nicely with the recently introduced slow-work subsystem. As a
    result I've updated the code to use slow-work, rather than its own
    home-grown variety of work queue.

    When using the wait_on_bit() function, I noticed that the wait function
    that was supplied as an argument was appearing in the WCHAN field, so
    I've updated the function names in order to produce more meaningful
    output.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Mar, 2009

2 commits

  • This is the big patch that I've been working on for some time
    now. There are many reasons for wanting to make this change
    such as:
    o Reducing overhead by eliminating duplicated fields between structures
    o Simplification of the code (reduces the code size by a fair bit)
    o The locking interface is now the DLM interface itself as proposed
    some time ago.
    o Fewer lookups of glocks when processing replies from the DLM
    o Fewer memory allocations/deallocations for each glock
    o Scope to do further optimisations in the future (but this patch is
    more than big enough for now!)

    Please note that (a) this patch relates to the lock_dlm module and
    not the DLM itself, which is still a separate module; and (b) that
    we retain the ability to build GFS2 as a standalone single node
    filesystem without requiring the DLM.

    This patch needs a lot of testing, hence my keeping it until I restarted
    my -git tree after the last merge window. That way, this has the maximum
    exposure before it is merged. This is (modulo a few minor bug fixes) the
    same patch that I've been posting on and off for the last three months,
    and it has passed a number of different tests so far.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Deallocation of gfs2_quota_data objects now happens on-demand through a
    shrinker instead of routinely deallocating through the quotad daemon.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

05 Jan, 2009

3 commits

  • This patch removes the two daemons, gfs2_scand and gfs2_glockd
    and replaces them with a shrinker which is called from the VM.

    The net result is that GFS2 responds better when there is memory
    pressure, since it shrinks the glock cache at the same rate
    as the VFS shrinks the dcache and icache. There are no longer
    any time based criteria for shrinking glocks, they are kept
    until such time as the VM asks for more memory and then we
    demote just as many glocks as required.

    There are potential future changes to this code, including the
    possibility of sorting the glocks which are to be written back
    into inode number order, to get a better I/O ordering. It would
    be very useful to have an elevator based workqueue implementation
    for this, as that would automatically deal with the read I/O cases
    at the same time.

    This patch is my answer to Andrew Morton's remark, made during
    the initial review of GFS2, asking why GFS2 needs so many kernel
    threads, the answer being that it doesn't :-) This patch is a
    net loss of about 200 lines of code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Following on from the recent clean up of gfs2_quotad, this patch moves
    the processing of "truncate in progress" inodes from the glock workqueue
    into gfs2_quotad. This fixes a hang due to the "truncate in progress"
    processing requiring glocks in order to complete.

    It might seem odd to use gfs2_quotad for this particular item, but
    we have to use a pre-existing thread since creating a thread implies
    a GFP_KERNEL memory allocation which is not allowed from the glock
    workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
    may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
    both scheduled for removal at some (hopefully not too distant) future
    point. That leaves only gfs2_quotad whose workload is generally fairly
    light and is easily adapted for this extra task.

    Also, as a result of this change, it opens the way for a future patch to
    make the reading of the inode's information asynchronous with respect to
    the glock workqueue, which is another improvement that has been on the list
    for some time now.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch is a clean up of gfs2_quotad prior to giving it an
    extra job to do in addition to the current portfolio of updating
    the quota and statfs information from time to time.

    As a result it has been moved into quota.c, allowing one of the
    functions it calls to be made static. The clean up also allows
    the two existing functions to have separate timeouts and to
    coexist with its future role of dealing with the "truncate in
    progress" inode flag.

    The (pointless) setting of gfs2_quotad_secs is removed since we
    arrange to only wake up quotad when one of the two timers expires.

    In addition the struct gfs2_quota_data is moved into a slab cache,
    mainly for easier debugging. It should also be possible to use
    a shrinker in the future, rather than the current scheme of scanning
    the quota data entries from time to time.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse