09 Jul, 2007

33 commits

  • This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
    to the way that the on-disk format works, older filesystems will just
    appear to have this field set to zero. When mounted by an older version
    of GFS2, the filesystem will simply ignore the extra fields so that
    it will again appear to have whole-second resolution, so it's
    trivially backward compatible.
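
    The compatibility argument above can be sketched roughly as follows.
    This is only an illustration: struct demo_dinode, its field names and
    both reader functions are hypothetical, not the real GFS2 on-disk
    layout, and endianness conversion is left out.

```c
#include <stdint.h>

/* Hypothetical on-disk fragment; names and layout are illustrative only. */
struct demo_dinode {
    uint64_t mtime_sec;   /* whole seconds, used by old and new code */
    uint32_t mtime_nsec;  /* new field; reads as 0 on older filesystems */
    uint32_t pad;         /* assumed spare space, always zero here */
};

struct demo_time {
    uint64_t sec;
    uint32_t nsec;
};

/* New reader: picks up the nanosecond field (0 if it was never written). */
struct demo_time demo_read_new(const struct demo_dinode *d)
{
    struct demo_time t = { d->mtime_sec, d->mtime_nsec };
    return t;
}

/* Old reader: knows nothing about the extra field and ignores it. */
struct demo_time demo_read_old(const struct demo_dinode *d)
{
    struct demo_time t = { d->mtime_sec, 0 };
    return t;
}
```

    An inode written by old code reads back with nsec == 0 under the new
    reader, and an inode written by new code reads back with whole-second
    resolution under the old reader.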

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes some sign issues which were accidentally introduced
    into the quota & statfs code during the endianness annotation process.
    Also included is a general clean up which moves all of the _host
    structures out of gfs2_ondisk.h (where they should not have been to
    start with) and into the places where they are actually used (often only
    one place). Also those _host structures which are not required any more
    are removed entirely (which is the eventual plan for all of them).

    The conversion routines from ondisk.c are also moved into the places
    where they are actually used, which for almost every one, was just one
    single place, so all those are now static functions. This also cleans up
    the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

    The net result is a reduction of about 100 lines of code, many functions
    now marked static plus the bug fixes as mentioned above. For good
    measure I ran the code through sparse after making these changes to
    check that there are no warnings generated.

    This fixes Red Hat bz #239686

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is a patch for the first three issues of RHBZ #238162

    The first issue is that when you allocate a new page for a file, it will not
    start off uptodate. This makes sense, since you haven't written anything to that
    part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the
    buffers are uptodate. The solution to this is to mark the buffers uptodate in
    gfs2_commit_write(), after they have been zeroed out and have the data written
    into them. I'm pretty confident with this fix, although it's not completely
    obvious that there is no problem with marking the buffers uptodate here.

    The second issue is simply that you can try to pin a data buffer that is already
    on the incore log, and thus, already pinned. This patch checks to see if this
    buffer is already on the log, and exits databuf_lo_add() if it is, just like
    buf_lo_add() does.

    The third issue is that gfs2_log_flush() doesn't do its block accounting
    correctly. Both metadata and journaled data are logged, but gfs2_log_flush()
    only compares the number of metadata blocks with the number of blocks to commit
    to the ondisk journal. This patch also counts the journaled data blocks.
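
    The second and third issues can be sketched in a few lines. This is a
    simplified model under assumptions, not the kernel code: struct
    demo_buf, struct demo_log and both functions are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a buffer and the incore log. */
struct demo_buf {
    bool on_log;   /* already on the incore log, and therefore pinned */
    bool is_data;  /* journaled data rather than metadata */
};

struct demo_log {
    unsigned num_buf;      /* metadata blocks on the log */
    unsigned num_databuf;  /* journaled data blocks on the log */
};

/* Add a buffer to the log once; return false if it was already there. */
bool demo_log_add(struct demo_log *log, struct demo_buf *b)
{
    if (b->on_log)
        return false;      /* already pinned: exit early, do not pin twice */
    b->on_log = true;
    if (b->is_data)
        log->num_databuf++;
    else
        log->num_buf++;
    return true;
}

/* Blocks to commit must count both metadata and journaled data. */
unsigned demo_blocks_to_commit(const struct demo_log *log)
{
    return log->num_buf + log->num_databuf;
}
```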

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This patch clears the user_data of active sockets as part of cleanup.
    This prevents any late-arriving data from trying to add jobs to the work
    queue while we are tidying up.

    Signed-off-by: Patrick Caulfield
    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     
  • The number of blocks which we reserve in the log at the start of each
    transaction needs to depend upon the block size since the overhead is
    related to the number of "pointers" which can be fitted into a single
    block.
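
    As a rough illustration of why the reservation depends on the block
    size, consider the sketch below. The header size, the 64-bit pointer
    width and the formula itself are assumptions made for the example;
    the patch's actual calculation is not shown in this message.

```c
#include <stdint.h>

/* Assumed size of the header at the start of a descriptor block. */
#define DEMO_HEADER_SIZE 32u

/* How many 64-bit block "pointers" fit in one block after the header. */
unsigned demo_ptrs_per_block(unsigned block_size)
{
    return (block_size - DEMO_HEADER_SIZE) / (unsigned)sizeof(uint64_t);
}

/* Blocks to reserve: the payload plus the descriptor blocks that index it. */
unsigned demo_log_reserve(unsigned nblocks, unsigned block_size)
{
    unsigned ptrs = demo_ptrs_per_block(block_size);
    unsigned descriptors = (nblocks + ptrs - 1) / ptrs;  /* round up */

    return nblocks + descriptors;
}
```

    With these assumed numbers, 100 payload blocks need 102 log blocks at
    a 512-byte block size but only 101 at 4096 bytes.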

    This relates to Red Hat bz #240435

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes a bug where gfs2 was writing updated quota usage
    information to the wrong location in the quota file.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • Display the initial value of the "protocol" config value in configfs.
    The default value has always been 0 in the past anyway, so it's always
    appeared to be correct.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Add a new debugfs file that dumps a compact list of mastered locks.
    This will be used by a userland daemon to collect state for deadlock
    detection.

    Also, for the existing function that prints all lock state, lock the rsb
    before going through the lock lists since they can be changing in the
    course of normal dlm activity.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Add a function that can be used through libdlm by a system daemon to cancel
    another process's deadlocked lock. A completion ast with EDEADLK is returned
    to the process waiting for the lock.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Various fixes related to the new timeout feature:
    - add_timeout() missed setting TIMEWARN flag on lkb's when the
    TIMEOUT flag was already set
    - clear_proc_locks should remove a dead process's locks from the
    timeout list
    - the end-of-life calculation for user locks needs to consider that
    ETIMEDOUT is equivalent to -DLM_ECANCEL
    - make initial default timewarn_cs config value visible in configfs
    - change bit position of TIMEOUT_CANCEL flag so it's not copied to
    a remote master node
    - set timestamp on remote lkb's so a lock dump will display the time
    they've been waiting

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • A one liner fix which got missed from the earlier patches.

    Signed-off-by: Steven Whitehouse
    Cc: Fabio Massimo Di Nitto
    Cc: David Teigland

    Steven Whitehouse
     
  • Commit 2e8701a15cd6f7c95e74d6660615a69b09e453ef breaks the libgfs2 build:

    gcc -Wall -I/usr/src/ubuntu/mypkgs/rhcluster/cluster/config -DHELPER_PROGRAM
    -D_FILE_OFFSET_BITS=64 -DGFS2_RELEASE_NAME=\"2.0\" -ggdb -I/usr/include
    -I../include -I../libgfs2 -c -o gfs2hex.o gfs2hex.c
    In file included from hexedit.h:22,
    from gfs2hex.c:27:
    /usr/include/linux/gfs2_ondisk.h:505: error: expected specifier-qualifier-list
    before ‘u32’
    make[2]: *** [gfs2hex.o] Error 1
    make[2]: Leaving directory `/usr/src/ubuntu/mypkgs/rhcluster/cluster/gfs2/edit'
    make[1]: *** [all] Error 2
    make[1]: Leaving directory `/usr/src/ubuntu/mypkgs/rhcluster/cluster/gfs2'
    make: *** [gfs2] Error 2

    Signed-off-by: Fabio Massimo Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     
  • In the rush to get the previous patch set sent, a compilation bug I fixed
    shortly before sending somehow got clobbered, probably by a missed quilt
    refresh or something.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Joining the lockspace should wait for the initial round of inter-node
    config checks to complete before returning. This way, if there's a
    configuration mismatch between the joining node and the existing nodes,
    the join can fail and return an error to the application.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Fix the error path when exiting new_lockspace(). It was kfree'ing the
    lockspace struct at the end, but that's only valid if it exits before
    kobject_register occurred. After kobject_register we have to let the
    kobject do the freeing.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • When conversion deadlock is detected, cancel the conversion and return
    EDEADLK to the application. This is a new default behavior where before
    the dlm would allow the deadlock to exist indefinitely.

    The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the
    dlm from performing conversion deadlock detection/cancelation on it.
    The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the
    dlm to demote the granted mode of the lock being converted if it gets into
    a conversion deadlock.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Change the user/kernel device interface used by libdlm:
    - Add ability for userspace to check the version of the interface. libdlm
    can now adapt to different versions of the kernel interface.
    - Increase the size of the flags passed in a lock request so all possible
    flags can be used from userspace.
    - Add an opaque "xid" value for each lock. This "transaction id" will be
    used later to associate locks with each other during deadlock detection.
    - Add a "timeout" value for each lock. This is used along with the
    DLM_LKF_TIMEOUT flag.

    Also, remove a fragment of unused code in device_read().

    This patch requires updating libdlm which is backward compatible with
    older kernels.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
    flag is set, then the request/conversion will be canceled after waiting
    the specified number of centiseconds (specified per lock). This feature
    is only available for locks requested through libdlm (can be enabled for
    kernel dlm users if there's a use for it.)

    If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
    a warning message will be sent to userspace (using genetlink) after a
    request/conversion has been waiting for a given number of centiseconds
    (configurable per node). The time warnings will be used in the future
    to do deadlock detection in userspace.
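
    Since both limits are given in centiseconds, the expiry test reduces
    to a unit conversion and a comparison. The following is a minimal
    sketch with hypothetical names and microsecond timestamps; it is not
    the dlm's own timeout code.

```c
#include <stdbool.h>
#include <stdint.h>

/* One centisecond is 10 milliseconds, i.e. 10000 microseconds. */
#define DEMO_US_PER_CS 10000ull

/* True once a request has waited at least timeout_cs centiseconds. */
bool demo_wait_expired(uint64_t start_us, uint64_t now_us, uint32_t timeout_cs)
{
    uint64_t waited_us = now_us - start_us;

    return waited_us >= (uint64_t)timeout_cs * DEMO_US_PER_CS;
}
```

    The same comparison could serve either limit: a per-lock timeout
    would lead to a cancel, a per-node warning time to a message.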

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Don't let dlm_scand run during recovery since it may try to do a resource
    directory removal while the directory nodes are changing.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • This problem was originally reported against GFS6.1, but the same issue exists
    in upstream DLM. This patch keeps the rsb iterator assigning under the rsbtbl
    list lock. Each time we process an rsb we grab a reference to it to make sure
    it is not freed out from underneath us, and then put it when we get the next rsb
    in the list or move onto another list.

    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Whitehouse

    Josef Bacik
     
  • This patch fixes an error in the quota code where a 'struct
    gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a
    'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from
    fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion.

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     
  • This patch cleans up the inode number handling code. The main difference
    is that instead of looking up the inodes using a struct gfs2_inum_host
    we now use just the no_addr member of this structure. The tests relating
    to no_formal_ino can then be done by the calling code. This has
    advantages in that we want to do different things in different code
    paths if the no_formal_ino doesn't match. In the NFS patch we want to
    return -ESTALE, but in the ->lookup() path, it's a bug in the fs if the
    no_formal_ino doesn't match and thus we can withdraw in this case.

    In order to later fix bz #201012, we need to be able to look up an inode
    without knowing no_formal_ino, as the only information that is known to
    us is the on-disk location of the inode in question.

    This patch will also help us to fix bz #236099 at a later date by
    cleaning up a lot of the code in that area.

    There are no user visible changes as a result of this patch and there
    are no changes to the on-disk format either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch removes the completion (which is rather large) from struct
    gdlm_lock in favour of using the wait_on_bit() functions. We don't need
    to add any extra fields to the structure to do this, so we save 32 bytes
    (on x86_64) per structure. This adds up to quite a lot when we may
    potentially have millions of these lock structures.

    Signed-off-by: Steven Whitehouse
    Acked-by: David Teigland

    Steven Whitehouse
     
  • This addendum patch 2 corrects three things:

    1. It fixes a stupid mistake in the previous addendum that broke gfs2.
    Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
    2. It fixes a problem that Dave Teigland pointed out regarding the
    external declarations in ops_address.h being in the wrong place.
    3. It recasts a couple more %llu printks to (unsigned long long)
    as requested by Steve Whitehouse.
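
    The cast in item 3 matters because a 64-bit type is not
    "unsigned long long" on every architecture, so passing it straight to
    %llu can trigger format warnings. A userspace sketch of the pattern,
    using snprintf in place of printk and a hypothetical helper name:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit block number portably; returns snprintf's result. */
int demo_format_block(char *buf, size_t len, uint64_t blkno)
{
    /* The explicit cast makes the argument match %llu everywhere. */
    return snprintf(buf, len, "block %llu", (unsigned long long)blkno);
}
```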

    I would have loved to put this all in one revised patch, but there was
    a rush to get some patches for RHEL5. Therefore, the previous patches
    were applied to the git tree "as is" and therefore, I'm posting another
    addendum. Sorry.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
     
  • Use zero_user_page() instead of open-coding it.

    Signed-off-by: Nate Diller
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton

    Nate Diller
     
  • To avoid code redundancy, I separated out the operational "guts" into
    a new function called read_rindex_entry. Then I made two functions:
    the closer-to-original gfs2_ri_update (without the special condition
    checks) and gfs2_ri_update_special that's designed with that condition
    in mind. (I don't like the name, but if you have a suggestion, I'm
    all ears).

    Oh, and there's an added benefit: we don't need all the ugly gotos
    anymore. ;)

    This patch has been tested with gfs2_fsck_hellfire (which runs for
    three and a half hours, btw).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
     
  • This is another revision of my gfs2 kernel patch that allows
    gfs2_grow to function properly.

    Steve Whitehouse expressed some concerns about the previous
    patch and I restructured it based on his comments.
    The previous patch was doing the statfs_change at file close time,
    under its own transaction. The current patch does the statfs_change
    inside the gfs2_commit_write function, which keeps it under the
    umbrella of the inode transaction.

    I can't call ri_update to re-read the rindex file during the
    transaction because the transaction may have outstanding unwritten
    buffers attached to the rgrps that would be otherwise blown away.
    So instead, I created a new function, gfs2_ri_total, that will
    re-read the rindex file just to total the file system space
    for the sake of the statfs_change. The ri_update will happen
    later, when gfs2 realizes the version number has changed, as it
    happened before my patch.

    Since the statfs_change is happening at write_commit time and there
    may be multiple writes to the rindex file for one grow operation,
    one consequence of this restructuring is that instead of getting
    one kernel message to indicate the change, you may see several.
    For example, before when you did a gfs2_grow, you'd get a single
    message like:

    GFS2: File system extended by 247876 blocks (968MB)

    Now you get something like:

    GFS2: File system extended by 207896 blocks (812MB)
    GFS2: File system extended by 39980 blocks (156MB)

    This version has also been successfully run against the hours-long
    "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck
    while interjecting file system damage. It does this repeatedly
    under a variety of Resource Group conditions.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Robert Peterson
     
  • Fix two races in fs/dlm/config.c:

    (1) Grab the configfs subsystem semaphore before calling
    config_group_find_obj() in get_space(). This solves a potential race
    between get_space() and concurrent mkdir(2) or rmdir(2).

    (2) Grab a reference on the found config_item _while_ holding the configfs
    subsystem semaphore in get_comm(), and not after it. This solves a
    potential race between get_comm() and concurrent rmdir(2).

    Signed-off-by: Satyam Sharma
    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    Satyam Sharma
     
  • Fix for bz #231910
    When filemap_fdatawrite() is called on the inode mapping in data=ordered mode,
    it will add the glock to the log. In inode_go_sync(), if you do the
    gfs2_log_flush() before this, then after the filemap_fdatawrite() call the
    glock and its associated data buffers will be on the log again. This means you can
    demote a lock from exclusive, without having it flushed from the log. The
    attached patch simply moves the gfs2_log_flush up to after the
    filemap_fdatawrite() call.

    Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but
    that caused me to trip the following assert.

    GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed
    GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61

    It appears that gfs2_log_flush() puts some of the glock's buffers in the busy
    state and the filemap_fdatawrite() call is necessary to flush them. This makes
    me worry slightly that a related problem could happen because of moving the
    gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that
    gfs2_ail_empty_gl() would catch that case as well.

    Signed-off-by: Benjamin E. Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • Woo-hoo. I'm sure somebody will report a "this doesn't compile, and
    I have a new root exploit" five minutes after release, but it still
    feels good ;)

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
    qd65xx: fix PIO mode selection
    sis5513: adding PCI-ID

    Linus Torvalds
     
  • Commit 1c710c896eb461895d3c399e15bb5f20b39c9073 added the utimensat()
    system call, but didn't correctly check the writability of the target
    when the target was a file descriptor rather than a filename.

    We cannot use vfs_permission(MAY_WRITE) for that case, and need to
    simply check whether the file descriptor is writable. The oops from
    using the wrong function was noticed and narrowed down by Markus
    Trippelsdorf.
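
    A userspace analogue of "check whether the file descriptor is
    writable" is to inspect the access mode the descriptor was opened
    with. The helper below is a hypothetical illustration built on
    fcntl(F_GETFL); it is not the kernel-side check from this commit.

```c
#include <fcntl.h>
#include <stdbool.h>

/* True if fd is open with write access (O_WRONLY or O_RDWR). */
bool demo_fd_is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    int mode;

    if (flags == -1)
        return false;          /* invalid descriptor */
    mode = flags & O_ACCMODE;  /* isolate the access mode bits */
    return mode == O_WRONLY || mode == O_RDWR;
}
```

    The answer depends only on how the descriptor was opened, not on the
    current permissions of the path it was opened from.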

    Cc: Ulrich Drepper
    Cc: Markus Trippelsdorf
    Cc: Andrew Morton
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Fix a post-2.6.21 regression.

    read_cache_page_async() has two invocations of mark_page_accessed() which will
    launch pages right onto the active list.

    Remove the first one, keeping the latter one. This avoids marking unwanted
    pages active (in the retry loop).

    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Jul, 2007

6 commits

  • PIO4 is the maximum PIO mode supported by a driver. Using "255" as a max_mode
    argument to ide_get_best_pio_mode() could result in wrong timings being used
    by a driver (for "pio" equal to 5) or OOPS (for "pio" values > 5 && < 255).
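
    The failure mode can be pictured as an unclamped index into a
    per-mode timing table. In the sketch below the table and helper names
    are hypothetical; the values are the standard minimum PIO cycle times
    for modes 0 to 4, not any driver's actual timing data.

```c
/* Highest PIO mode this hypothetical driver has timings for. */
#define DEMO_MAX_PIO 4u

/* Minimum cycle time in nanoseconds for PIO modes 0..4. */
static const unsigned demo_cycle_ns[DEMO_MAX_PIO + 1] = {
    600, 383, 240, 180, 120
};

/* Limit a requested mode to what the caller says it supports. */
unsigned demo_clamp_pio(unsigned pio, unsigned max_mode)
{
    return pio > max_mode ? max_mode : pio;
}

/* Safe lookup: the index can never run past the end of the table. */
unsigned demo_pio_cycle_ns(unsigned pio)
{
    return demo_cycle_ns[demo_clamp_pio(pio, DEMO_MAX_PIO)];
}
```

    With a max_mode of 255 the clamp lets mode 5 and above through
    unchanged, so a table with five entries would be read out of bounds.
    Passing 4 keeps every lookup inside the table.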

    Signed-off-by: Bartlomiej Zolnierkiewicz
    Acked-by: Sergei Shtylyov
    Reviewed-by: Alan Cox

    Bartlomiej Zolnierkiewicz
     
  • The SiS966 has one additional PCI-ID 1180.

    If the chipset is using this PCI-ID, the primary channel is connected to the
    first PATA-port. The secondary channel is connected to SATA-ports in IDE
    emulation mode. The legacy IO-ports are used.

    Including the PCI-ID in pata_sis is not sufficient, because the legacy
    driver in drivers/ide is initialized before pata_sis.

    Signed-off-by: Uwe Koziolek
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Uwe Koziolek
     
  • The dependency of DLM on SYSFS got lost in
    commit 6ed7257b46709e87d79ac2b6b819b7e0c9184998 resulting in the
    following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n:

    ...
    LD .tmp_vmlinux1
    fs/built-in.o: In function `dlm_lockspace_init':
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys'
    fs/built-in.o: In function `configfs_init':
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys'
    make[1]: *** [.tmp_vmlinux1] Error 1

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • The printk level in this printk is bogus, as the previous printk
    didn't have a terminating \n resulting in ..

    Intel E7520/7320/7525 detected.Disabling irq balancing and affinity

    It also never printed a \n at all in the case where we didn't do
    the quirk.

    Change it to only make noise if it actually does something useful.

    Signed-off-by: Dave Jones
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • This patch fixes the following 2.6.22 regression with CONFIG_KALLSYMS=n:

    ...
    CC arch/m32r/kernel/traps.o
    In file included from /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/arch/m32r/kernel/traps.c:14:
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_name':
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: 'ERANGE' undeclared (first use in this function)
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: (Each undeclared identifier is reported only once
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: for each function it appears in.)
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_attrs':
    /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:71: error: 'ERANGE' undeclared (first use in this function)
    make[2]: *** [arch/m32r/kernel/traps.o] Error 1

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • When cleaning up HIDP sessions, we currently close the ACL connection
    before deregistering the input device. Closing the ACL connection
    schedules a workqueue to remove the associated objects from sysfs, but
    the input device still refers to them -- and if the workqueue happens to
    run before the input device removal, the kernel will oops when trying to
    look up PHYSDEVPATH for the removed input device.

    Fix this by deregistering the input device before closing the
    connections.

    Signed-off-by: David Woodhouse
    Acked-by: Marcel Holtmann
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

07 Jul, 2007

1 commit

  • kmem_cache_open is static. EXPORT_SYMBOL was leftover from some earlier
    time period where kmem_cache_open was usable outside of slub.

    (Fixes powerpc build error)

    Signed-off-by: Christoph Lameter
    Cc: Johannes Berg
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter