02 Dec, 2011

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
    ocfs2: avoid unaligned access to dqc_bitmap
    ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
    ocfs2: honor O_(D)SYNC flag in fallocate
    ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
    ocfs2: send correct UUID to cleancache initialization
    ocfs2: Commit transactions in error cases -v2
    ocfs2: make direntry invalid when deleting it
    fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
    ocfs2: Avoid livelock in ocfs2_readpage()
    ocfs2: serialize unaligned aio
    ocfs2: Implement llseek()
    ocfs2: Fix ocfs2_page_mkwrite()
    ocfs2: Add comment about orphan scanning
    ocfs2: Clean up messages in the fs
    ocfs2/cluster: Cluster up now includes network connections too
    ocfs2/cluster: Add new function o2net_fill_node_map()
    ocfs2/cluster: Fix output in file elapsed_time_in_ms
    ocfs2/dlm: dlmlock_remote() needs to account for remastery
    ocfs2/dlm: Take inflight reference count for remotely mastered resources too
    ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
    ...

    Linus Torvalds
     

02 Nov, 2011

1 commit


01 Jun, 2011

1 commit

  • ocfs2 cannot currently mount a device that is readonly at the media
    ("hard readonly"). Fix the broken places.
    See details: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322

    [ Description edited -- Joel ]

    Signed-off-by: Tiger Yang
    Reviewed-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Tiger Yang
     

29 Mar, 2011

1 commit


07 Mar, 2011

1 commit

  • mlog_exit is used to record the exit status of a function.
    But because it has been added to so many functions, enabling it
    fills the system logs quickly and causes too much I/O.
    In practice, no one can enable it on a production system, or even
    on a test system.

    This patch removes or replaces it:
    1. If all the error paths already use mlog_errno, it is simply removed.
    Otherwise, it is replaced by mlog_errno.
    2. If it is used to print a return value, it is replaced with
    mlog(0,...).
    mlog_exit_ptr is changed to mlog(0.
    All those mlog(0,...) will be replaced with trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

21 Feb, 2011

1 commit

  • ENTRY is used to record the entry of a function.
    But because it has been added to so many functions, enabling it
    fills the system logs quickly and causes too much I/O.
    In practice, no one can enable it on a production system, or even
    on a test system.

    So mlog_entry_void is simply removed, and mlog_entry(...) is
    replaced with mlog(0,...), which will in turn be replaced by
    trace events later.

    Signed-off-by: Tao Ma

    Tao Ma
     

20 Feb, 2011

1 commit

  • Patch makes use of the hrtimer to track times in ocfs2 lock stats.

    The patch is a bit involved to ensure no additional impact on the memory
    footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems.

    A related change was to modify the unit of the max wait time from nanosec to
    microsec allowing us to track max time larger than 4 secs. This change
    necessitated the bumping of the output version in the debugfs file,
    locking_state, from 2 to 3.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
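
The arithmetic behind the unit change can be sketched as follows. This is only an illustration, assuming the maximum wait time is kept in a 32-bit field (which the 4-second limit suggests but the message does not state); `max_trackable_secs` and `update_max_wait_us` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Longest wait, in whole seconds, that fits in a 32-bit counter
 * when each tick is ns_per_tick nanoseconds long. */
uint64_t max_trackable_secs(uint64_t ns_per_tick)
{
    assert(ns_per_tick > 0);
    return ((uint64_t)UINT32_MAX * ns_per_tick) / NSEC_PER_SEC;
}

/* Hypothetical helper: fold a new wait sample (in nanoseconds) into a
 * 32-bit maximum that is stored in microseconds. */
uint32_t update_max_wait_us(uint32_t cur_max_us, uint64_t wait_ns)
{
    uint64_t us = wait_ns / NSEC_PER_USEC;

    if (us > UINT32_MAX)
        us = UINT32_MAX;            /* saturate rather than wrap around */
    return us > cur_max_us ? (uint32_t)us : cur_max_us;
}
```

With one tick per nanosecond the counter tops out after about 4 seconds; with one tick per microsecond the same 32 bits cover roughly 4294 seconds.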
     

11 Sep, 2010

1 commit

  • Track negative dentries by recording the generation number of the parent
    directory in d_fsdata. The generation number for the parent directory is
    recorded in the inode_info, which increments every time the lock on the
    directory is dropped.

    If the generation numbers of the parent directory and the negative dentry
    match, there is no need to perform the revalidate; otherwise a revalidate
    is forced. This improves performance in situations where nodes look for
    the same non-existent file multiple times.

    Thanks Mark for explaining the DLM sequence.

    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: Joel Becker

    Goldwyn Rodrigues
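
The generation check described above can be sketched roughly as below. The structures and function names are hypothetical stand-ins (the real code keeps the counter in the ocfs2 inode info and the recorded value in d_fsdata); this only illustrates the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parent directory's per-inode state. */
struct dir_state_sketch {
    unsigned long lock_gen;     /* bumped each time the directory lock is dropped */
};

/* Hypothetical stand-in for what a negative dentry remembers. */
struct neg_dentry_sketch {
    unsigned long parent_gen;   /* directory generation when the lookup failed */
};

void sketch_dir_lock_dropped(struct dir_state_sketch *dir)
{
    assert(dir != NULL);
    dir->lock_gen++;
}

void sketch_record_negative(struct neg_dentry_sketch *d,
                            const struct dir_state_sketch *dir)
{
    assert(d != NULL && dir != NULL);
    d->parent_gen = dir->lock_gen;
}

/* True when the cached "no such file" answer can be trusted as-is;
 * false means a full revalidate is required. */
bool sketch_negative_still_valid(const struct neg_dentry_sketch *d,
                                 const struct dir_state_sketch *dir)
{
    assert(d != NULL && dir != NULL);
    return d->parent_gen == dir->lock_gen;
}
```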
     

20 Jul, 2010

1 commit


22 May, 2010

1 commit


08 Mar, 2010

1 commit


28 Feb, 2010

1 commit


27 Feb, 2010

3 commits

  • Inside the stackglue, the locking protocol structure is hanging off of
    the ocfs2_cluster_connection. This takes it one step further; the locking
    protocol is passed into ocfs2_cluster_connect(). Now different cluster
    connections can have different locking protocols with distinct asts.
    Note that all locking protocols have to keep their maximum protocol
    version in lock-step.

    With the protocol structure set in ocfs2_cluster_connect(), there is no
    need for the stackglue to have a static pointer to a specific protocol
    structure. We can change initialization to only pass in the maximum
    protocol version.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We're going to want it in the ast functions, so we convert union
    ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • The stackglue ast and bast functions tried to maintain the fiction that
    their arguments were void pointers. In reality, stack_user.c had to
    know that the argument was an ocfs2_lock_res in order to get the status
    off of the lksb. That's ugly.

    This changes stackglue to always pass the lksb as the argument to ast
    and bast functions. The caller can always use container_of() to get the
    ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
    zero. They still get back the lockres in their ast. stackglue gets
    cleaner, and now can use the lksb itself.

    Signed-off-by: Joel Becker

    Joel Becker
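
The container_of() pattern mentioned above can be illustrated in plain userspace C. The structures here are simplified, hypothetical stand-ins for the lksb and the lock resource; the macro is a userspace equivalent of the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-in for the lock status block. */
struct lksb_sketch {
    int status;
};

/* Simplified, hypothetical stand-in for a lock resource. */
struct lock_res_sketch {
    int level;
    struct lksb_sketch lksb;    /* embedded, so the parent can be recovered */
};

/* An ast callback receives only the lksb and recovers its lock resource. */
struct lock_res_sketch *sketch_lockres_from_lksb(struct lksb_sketch *lksb)
{
    assert(lksb != NULL);
    return container_of(lksb, struct lock_res_sketch, lksb);
}
```

Because the lksb is embedded in the lock resource, the callback needs no extra back-pointer: the address arithmetic alone gets the caller back to its own structure.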
     

09 Feb, 2010

1 commit

  • In particular, several occurrences of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

04 Feb, 2010

1 commit


03 Feb, 2010

4 commits

  • During blocked lock processing, we should consider the possibility that the
    lock is no longer blocking.

    Joel Becker assisted in fixing this issue.

    Reported-by: David Teigland
    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
     
  • During upconvert, if the master were to send a BAST, dlmglue will detect the
    upconversion in progress and send a cancel convert to the master. Upon receiving
    the AST for the cancel convert, it will re-process the lock resource to determine
    whether it needs downconverting. Say the upconvert was from PR to EX and the BAST
    was for EX. After the cancel convert, it will need to downconvert to NL.

    However, if the node was originally upconverting from NL to EX, then there would
    be no reason to downconvert (assuming the same message sequence).

    This patch makes dlmglue consider the possibility that the current lock level
    is already compatible and that downconverting is not required.

    Joel Becker assisted in fixing this issue.

    Fixes ossbz#1178
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178

    Reported-by: Coly Li
    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
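
The decision described above can be sketched as follows. The enum values and function names are illustrative assumptions, not the dlmglue code; the sketch only encodes the rule that a held level needs downconverting when it is higher than the highest level compatible with the blocking request.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative lock levels, ordered from weakest to strongest. */
enum lvl_sketch { LVL_NL = 0, LVL_PR = 1, LVL_EX = 2 };

/* Highest level we may keep while another node holds `blocking`:
 * EX tolerates only NL, PR tolerates other PR holders, NL tolerates anything. */
enum lvl_sketch sketch_highest_compat(enum lvl_sketch blocking)
{
    assert(blocking >= LVL_NL && blocking <= LVL_EX);
    if (blocking == LVL_EX)
        return LVL_NL;
    if (blocking == LVL_PR)
        return LVL_PR;
    return LVL_EX;
}

/* After a cancelled upconvert, decide whether the level currently held
 * still conflicts with the blocking request. */
bool sketch_needs_downconvert(enum lvl_sketch current, enum lvl_sketch blocking)
{
    return current > sketch_highest_compat(blocking);
}
```

In the two scenarios from the message, a node holding PR and blocked by EX must downconvert to NL, while a node still at NL (its NL to EX upconvert was cancelled) has nothing to downconvert.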
     
  • There is a possibility of a livelock in __ocfs2_cluster_lock(). If a node were
    to get an ast for an upconvert request, followed immediately by a bast,
    there is a small window where the fs may downconvert the lock before the
    process requesting the upconvert is able to take the lock.

    This patch adds a new flag to indicate that the upconvert is still in
    progress and that the dc thread should not downconvert it right now.

    Wengang Wang and Joel Becker
    contributed heavily to this patch.

    Reported-by: David Teigland
    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
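
A minimal sketch of the flag idea follows. The flag and function names are hypothetical (the message does not name the new flag), and real code would manipulate the flags under the lock resource's spinlock; this only shows how the downconvert thread is held off until the requester has taken the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names; the commit message does not name the new flag. */
#define LR_BLOCKED_SKETCH             0x1u  /* a bast asked us to give the lock up */
#define LR_UPCONVERT_FINISHING_SKETCH 0x2u  /* ast seen, requester not yet holding */

struct lockres_flags_sketch {
    unsigned int flags;
};

/* ast for an upconvert: open the window in which the lock must be kept. */
void sketch_upconvert_ast(struct lockres_flags_sketch *lr)
{
    assert(lr != NULL);
    lr->flags |= LR_UPCONVERT_FINISHING_SKETCH;
}

/* bast from the master: another node wants a conflicting level. */
void sketch_bast(struct lockres_flags_sketch *lr)
{
    assert(lr != NULL);
    lr->flags |= LR_BLOCKED_SKETCH;
}

/* The process that asked for the upconvert now holds the lock. */
void sketch_requester_took_lock(struct lockres_flags_sketch *lr)
{
    assert(lr != NULL);
    lr->flags &= ~LR_UPCONVERT_FINISHING_SKETCH;
}

/* The downconvert thread acts only once the window has closed. */
bool sketch_dc_may_downconvert(const struct lockres_flags_sketch *lr)
{
    assert(lr != NULL);
    return (lr->flags & LR_BLOCKED_SKETCH) &&
           !(lr->flags & LR_UPCONVERT_FINISHING_SKETCH);
}
```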
     
  • During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to
    be downconverted.

    Signed-off-by: Wengang Wang
    Acked-by: Sunil Mushran
    Acked-by: Mark Fasheh
    Signed-off-by: Joel Becker

    Wengang Wang
     

26 Jan, 2010

1 commit


04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping",
    "access", "default", "reasonable", "[con]currently", "temperature",
    "channel", "[un]used", "application", "example", "hierarchy", "therefore",
    "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

23 Sep, 2009

3 commits


05 Sep, 2009

2 commits

  • The next step in divorcing metadata I/O management from struct inode is
    to pass struct ocfs2_caching_info to the journal functions. Thus the
    journal locks a metadata cache with the cache io_lock function. It also
    can compare ci_last_trans and ci_created_trans directly.

    This is a large patch because of all the places we change
    ocfs2_journal_access..(handle, inode, ...) to
    ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

    Signed-off-by: Joel Becker

    Joel Becker
     
  • We are really passing the inode into the ocfs2_read/write_blocks()
    functions to get at the metadata cache. This commit passes the cache
    directly into the metadata block functions, divorcing them from the
    inode.

    Signed-off-by: Joel Becker

    Joel Becker
     

23 Jun, 2009

4 commits

  • Add lockdep support to OCFS2. The support also covers all of the cluster
    locks except for open locks, journal locks, and local quotafile locks. These
    are special because they are acquired for a node, not for a particular process,
    and lockdep cannot deal with this type of locking.

    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     
  • Local and Hard-RO mounts do not need orphan scanning.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
     
  • We don't access the LVB in our ocfs2_*_lock_res_init() functions.

    Since the LVB can become invalid during some cluster recovery
    operations, the dlmglue must be able to handle an uninitialized
    LVB.

    For the orphan scan lock, we initialize an uninitialized LVB with our
    scan sequence number plus one. This starts a normal orphan scan
    cycle.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Joel Becker

    Sunil Mushran
     
  • The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
    the DLM cannot reconstruct its state. Clients of the DLM need to know
    this.

    ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
    track of the state. This is not a standard behavior, but ocfs2 has
    always relied on it. Thus, an o2dlm LVB is always "valid".

    ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
    fs/dlm loses track of an LVB's state, it sets a flag
    (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
    the LVB may be garbage or merely stale.

    ocfs2 doesn't want to try to guess at the validity of the stale LVB.
    Instead, it should be checking the VALNOTVALID flag. As this is the
    'standard' way of treating LVBs, we will promote this behavior.

    We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
    the LVB is valid. o2dlm will always return valid, while fs/dlm will
    check VALNOTVALID.

    Signed-off-by: Joel Becker
    Acked-by: Mark Fasheh

    Joel Becker
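
The shape of such a validity check can be sketched as below. Everything here is a simplified assumption: the structures, the per-stack operations table, and the flag value are illustrative, and only the behaviour (o2dlm always valid, fs/dlm checks the VALNOTVALID bit) comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value for the "LVB not valid" status bit. */
#define SKETCH_SBF_VALNOTVALID 0x02u

/* Hypothetical stand-in for a lock status block. */
struct lksb_flags_sketch {
    unsigned int sb_flags;
};

/* Hypothetical per-stack operations table. */
struct stack_ops_sketch {
    int (*lvb_valid)(const struct lksb_flags_sketch *lksb);
};

/* o2dlm zeroes the LVB when it loses state, so it is always "valid". */
int sketch_o2dlm_lvb_valid(const struct lksb_flags_sketch *lksb)
{
    (void)lksb;
    return 1;
}

/* fs/dlm reports lost state through a flag on the lksb. */
int sketch_fsdlm_lvb_valid(const struct lksb_flags_sketch *lksb)
{
    return !(lksb->sb_flags & SKETCH_SBF_VALNOTVALID);
}

/* Stack glue entry point: non-zero when the LVB contents can be trusted. */
int sketch_dlm_lvb_valid(const struct stack_ops_sketch *ops,
                         const struct lksb_flags_sketch *lksb)
{
    assert(ops != NULL && ops->lvb_valid != NULL && lksb != NULL);
    return ops->lvb_valid(lksb);
}
```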
     

04 Jun, 2009

1 commit

  • When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
    before moving the dentry to the orphan directory. Other nodes that have
    this dentry in cache have a PR on the same dentry lock. When the EX is
    requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
    during downconvert. The inode is finally deleted when the last node to iput
    the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

    A problem arises if a node is forced to free dentry locks because of memory
    pressure. If this happens, the node will no longer get downconvert
    notifications for the dentries that have been unlinked on another node.
    If that node is also actively using the corresponding inode and
    happens to be the one performing the last iput on that inode, it will fail
    to delete the inode as it will not have the MAYBE_ORPHANED flag set.

    This patch fixes this shortcoming by introducing a periodic scan of the
    orphan directories to delete such inodes. Care has been taken to distribute
    the workload across the cluster so that no one node has to perform the task
    all the time.

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Joel Becker

    Srinivas Eeda
     

04 Apr, 2009

1 commit

  • For NFS exporting, ocfs2_get_dentry() returns the dentry for a file handle.
    ocfs2_get_dentry() may read from disk when the inode is not in memory,
    without taking any cross-cluster lock. This can lead to the file system
    loading a stale inode.

    This patch fixes the problem.

    When the inode is not in memory, we take the cluster lock (PR) on the
    alloc inode from which the inode in question was allocated (this makes
    the node on which the deletion was done sync the alloc inode) before
    reading out the inode itself. We then check the bitmap in the group the
    inode in question was allocated from to see if its bit is clear. If it is
    clear, the inode is stale. If the bit is set, we then check the generation,
    as the existing code does.

    We have to read the inode in question from disk first to learn its alloc
    slot and alloc bit. If it is not stale, we read it using ocfs2_iget().
    The second read should then come from cache.

    We also have to add a per-superblock nfs_sync_lock to cover the lock on the
    alloc inode and the lock on the inode in question, because ocfs2_get_dentry()
    and ocfs2_delete_inode() take them in reverse order. nfs_sync_lock is taken
    in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(), so
    that multiple ocfs2_delete_inode() calls can run concurrently in the normal case.

    [mfasheh@suse.com: build warning fixes and comment cleanups]
    Signed-off-by: Wengang Wang
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    wengang wang
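
The two-step staleness check described above (allocation bit first, then generation) can be sketched as follows. The names and the byte-wise bitmap layout are assumptions made for illustration; the cluster locking that surrounds the check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fh_check_sketch { FH_STALE_SKETCH = 0, FH_OK_SKETCH = 1 };

/* Test one bit in a byte-addressed bitmap (layout is illustrative). */
int sketch_bitmap_test(const unsigned char *bitmap, unsigned int bit)
{
    assert(bitmap != NULL);
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/* A file handle is usable only if the inode is still allocated in its
 * group bitmap and its on-disk generation matches the handle's. */
enum fh_check_sketch sketch_check_handle(const unsigned char *group_bitmap,
                                         unsigned int alloc_bit,
                                         uint32_t disk_generation,
                                         uint32_t fh_generation)
{
    if (!sketch_bitmap_test(group_bitmap, alloc_bit))
        return FH_STALE_SKETCH;     /* bit clear: inode was freed */
    if (disk_generation != fh_generation)
        return FH_STALE_SKETCH;     /* slot reused by a newer inode */
    return FH_OK_SKETCH;
}
```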
     

27 Feb, 2009

1 commit


03 Feb, 2009

1 commit

  • When two nodes holding PR locks on a resource concurrently attempt to
    upconvert the locks to EX, the master sends a BAST to one of the nodes. This
    message tells that node to first cancel its upconvert request and then
    downconvert to NL. Only when this lock has been downconverted to NL
    can the master upconvert the first node's lock to EX.

    While the fs was doing the cancel convert, it was forgetting to wake up the
    dc thread after a successful cancel, leading to a deadlock.

    Reported-and-Tested-by: David Teigland
    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     

09 Jan, 2009

1 commit

  • While reviewing ocfs2 code, I found 2 instances of the typo "successfull".
    A grep for "successfull " in the kernel tree turned up 22 instances in
    total -- great minds always think alike :)

    This patch fixes all the similar typos. Thanks to Randy for the ack and comments.

    Signed-off-by: Coly Li
    Acked-by: Randy Dunlap
    Acked-by: Roland Dreier
    Cc: Jeremy Kerr
    Cc: Jeff Garzik
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Theodore Ts'o
    Cc: Mark Fasheh
    Cc: Vlad Yasevich
    Cc: Sridhar Samudrala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Coly Li
     

06 Jan, 2009

3 commits

  • dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb().
    This is pointless, however, as ocfs2_dlm_lvb() returns void *.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to
    match ocfs2_read_blocks(). The quota code, converting from
    ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in
    ocfs2_read_quota_block(). Unfortunately, the prototype of
    ocfs2_read_quota_block() matches the old prototype of ocfs2_bread().

    The problem is that ocfs2_bread() returned the buffer head, and callers
    assumed that a NULL pointer was indicative of error. It wasn't. This
    is why ocfs2_bread() took an int *err argument as well.

    The new prototype of ocfs2_read_virt_blocks() avoids this error handling
    confusion. Let's change ocfs2_read_quota_block() to match.

    Signed-off-by: Joel Becker
    Acked-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Joel Becker
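
The difference between the two prototype styles can be illustrated with a small sketch. The function below is hypothetical and does no real I/O; it only shows the newer convention, where the status is the return value and the buffer comes back through an out-parameter, so a caller never has to infer an error from a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head. */
struct buf_sketch {
    unsigned long blkno;
};

/* New-style prototype: returns 0 or a negative errno, and hands the
 * buffer back through *out. `slot` is caller-provided storage that
 * stands in for a real block cache. */
int sketch_read_block(unsigned long blkno, unsigned long nblocks,
                      struct buf_sketch *slot, struct buf_sketch **out)
{
    assert(slot != NULL && out != NULL);
    *out = NULL;
    if (blkno >= nblocks)
        return -EINVAL;             /* the error is explicit in the return value */
    slot->blkno = blkno;
    *out = slot;
    return 0;
}
```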
     
  • For each quota type, each node has a local quota file. In this file it stores
    the changes users have made to disk usage via this node. Once in a while this
    information is synced to the global file (and thus with other nodes) so that
    limits enforcement works at least approximately.

    Global quota files contain all the information about usage and limits. It's
    mostly handled by the generic VFS code (which implements a trie of structures
    inside a quota file). We only have to provide functions to convert structures
    from the on-disk format to the in-memory one. We also have to provide wrappers for
    various quota functions starting transactions and acquiring necessary cluster
    locks before the actual IO is really started.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara