06 Jan, 2009

40 commits

  • Add quota calls for allocation and freeing of inodes and space, also update
    estimates on number of needed credits for a transaction. Move out inode
    allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
    outside of a transaction.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • For each quota type, each node has a local quota file. In this file it stores
    changes users have made to disk usage via this node. Once in a while this
    information is synced to the global file (and thus with other nodes) so that
    limits enforcement works at least approximately.

    Global quota files contain all the information about usage and limits. It's
    mostly handled by the generic VFS code (which implements a trie of structures
    inside a quota file). We only have to provide functions to convert structures
    from on-disk format to in-memory one. We also have to provide wrappers for
    various quota functions starting transactions and acquiring necessary cluster
    locks before the actual IO is really started.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
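
The local/global split described above can be pictured with a small userspace model. This is only a sketch: the structure and function names are hypothetical, and the real code performs the sync inside a transaction while holding cluster locks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one global usage record shared by the cluster and
 * one per-node record of changes that have not been synced yet. */
struct global_usage { int64_t cur_space; };
struct local_usage  { int64_t delta_space; };

/* Record a change made through this node; only local state is touched. */
static void local_charge(struct local_usage *l, int64_t bytes)
{
    assert(l);
    l->delta_space += bytes;
}

/* Periodic sync: fold the local delta into the global record and reset it.
 * The real filesystem would take a cluster lock around this step. */
static void sync_to_global(struct local_usage *l, struct global_usage *g)
{
    assert(l && g);
    g->cur_space += l->delta_space;
    l->delta_space = 0;
}
```

Between two syncs the global record lags behind the true usage, which is why enforcement is only approximate.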
     
  • Mark system files as not subject to quota accounting. This prevents
    possible recursions into quota code and thus deadlocks.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • OCFS2 can easily support nested transactions. We just have to
    take care not to spoil statistics or acquire the semaphore unnecessarily.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • OCFS2 needs to scan all active dquots once in a while and sync quota
    information among cluster nodes. Provide a helper function for it so
    that it does not have to reimplement internally a list which VFS
    already has. Moreover this function is probably going to be useful
    for other clustered filesystems if they decide to use VFS quotas.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
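
A helper of this kind is essentially a callback-driven walk over a list the quota core already maintains. The following is a simplified sketch with hypothetical types and names; it ignores the locking and reference counting a kernel implementation would need.

```c
#include <stddef.h>

/* Hypothetical stand-in for an in-memory quota structure. */
struct dquot_model {
    int id;
    int active;                 /* nonzero while the dquot is in use */
    long long dirty_delta;      /* changes not yet synced to the cluster */
    struct dquot_model *next;   /* list owned by the generic quota code */
};

typedef int (*dquot_visit_fn)(struct dquot_model *dq, void *priv);

/* Call fn on every active entry. Stops early and returns fn's value
 * if the callback reports an error (a negative number). */
static int scan_active(struct dquot_model *head, dquot_visit_fn fn, void *priv)
{
    for (struct dquot_model *dq = head; dq; dq = dq->next) {
        if (!dq->active)
            continue;
        int ret = fn(dq, priv);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Example callback: add up unsynced changes across active dquots. */
static int sum_deltas(struct dquot_model *dq, void *priv)
{
    *(long long *)priv += dq->dirty_delta;
    return 0;
}
```

The filesystem supplies only the callback, so it does not need a private copy of the list.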
     
  • OCFS2 needs to peek whether quota structure is already in memory so
    that it can avoid expensive cluster locking in that case. Similarly
    when freeing dquots, it checks whether it is the last quota structure
    user or not. Finally, it needs to get reference to dquot structure for
    specified id and quota type when recovering quota file after crash.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Increase reported version number of quota support since quota core has changed
    significantly. Also remove __DQUOT_NUM_VERSION__ since nobody uses it.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Quota in a clustered environment needs to synchronize quota information
    among cluster nodes. This means we have to occasionally update some
    information in dquot from disk / network. On the other hand we have to
    be careful not to overwrite changes the administrator made via SETQUOTA.
    So indicate in dquot->dq_flags which entries have been set by SETQUOTA;
    the quota format can clear these flags once it has properly propagated
    the changes.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
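
The flagging scheme can be sketched as follows. The flag names and the structure are hypothetical and chosen for illustration only; the commit message just says that per-entry flags live in dquot->dq_flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits marking entries changed by the administrator. */
#define SET_BLIMITS (1u << 0)
#define SET_ILIMITS (1u << 1)

struct dquot_model {
    unsigned int flags;
    int64_t bhardlimit;
    int64_t ihardlimit;
};

/* Administrator sets a block limit: remember that it is a local change. */
static void setquota_blimit(struct dquot_model *dq, int64_t limit)
{
    dq->bhardlimit = limit;
    dq->flags |= SET_BLIMITS;
}

/* Refresh from the shared copy, skipping entries the admin just set. */
static void refresh_from_global(struct dquot_model *dq,
                                const struct dquot_model *global)
{
    assert(dq && global);
    if (!(dq->flags & SET_BLIMITS))
        dq->bhardlimit = global->bhardlimit;
    if (!(dq->flags & SET_ILIMITS))
        dq->ihardlimit = global->ihardlimit;
}

/* Write flagged entries out, then clear the markers. */
static void propagate_to_global(struct dquot_model *dq,
                                struct dquot_model *global)
{
    assert(dq && global);
    if (dq->flags & SET_BLIMITS)
        global->bhardlimit = dq->bhardlimit;
    if (dq->flags & SET_ILIMITS)
        global->ihardlimit = dq->ihardlimit;
    dq->flags &= ~(SET_BLIMITS | SET_ILIMITS);
}
```

Until the markers are cleared, a refresh from disk or network leaves the administrator's values alone.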
     
  • For clustered filesystems, it can happen that space / inode usage goes
    negative temporarily (because one node is allocating while another node
    is freeing and they are not completely in sync). So let quota code
    allow this and change qsize_t to a signed type so that we don't
    underflow the variables.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
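
The underflow concern is easy to reproduce in isolation. In this sketch the two typedefs are hypothetical stand-ins for an unsigned and a signed 64-bit usage counter; they are not the kernel's definitions.

```c
#include <stdint.h>

typedef uint64_t uqsize_t;  /* hypothetical unsigned usage counter */
typedef int64_t  sqsize_t;  /* hypothetical signed usage counter */

/* A node applies a "free" before it has seen the matching allocation. */
static uqsize_t free_unsigned(uqsize_t usage, uint64_t freed)
{
    return usage - freed;   /* wraps around modulo 2^64 */
}

static sqsize_t free_signed(sqsize_t usage, int64_t freed)
{
    return usage - freed;   /* may be temporarily negative */
}
```

With the unsigned type, freeing 4096 bytes from a usage of 0 yields a value close to 2^64, which would look like an enormous overrun. With the signed type the result is -4096, and the later allocation brings it back to 0.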
     
  • Coming quota support for OCFS2 is going to need quite a bit
    of additional per-sb quota information. Moreover having fs.h
    include all the types needed for this structure would be a
    pain in the a**. So remove the union from mem_dqinfo and add
    a private pointer for filesystem's use.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • There is going to be a new version of quota format having 64-bit
    quota limits and a new quota format for OCFS2. They are both
    going to use the same tree structure as VFSv0 quota format. So
    split out tree handling into a separate file and make size of
    leaf blocks, amount of space usable in each block (needed for
    checksumming) and structures contained in them configurable
    so that the code can be shared.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Since these include files are used only by implementation of quota formats,
    there's no need to have them in include/linux/.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • If filesystem can handle quota files as system files hidden from users, we can
    skip a lot of cache invalidation, syncing, inode flags setting etc. when
    turning quotas on, off and quota_sync. Allow filesystem to indicate that it is
    hiding quota files from users by DQUOT_QUOTA_SYS_FILE flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Split DQUOT_USR_ENABLED (and DQUOT_GRP_ENABLED) into DQUOT_USR_USAGE_ENABLED
    and DQUOT_USR_LIMITS_ENABLED. This way we are able to separately enable /
    disable whether we should:
    1) ignore quotas completely
    2) just keep up-to-date information about usage
    3) actually enforce quota limits

    This is going to be useful when quota is treated as filesystem metadata - we
    then want to keep quota information up to date all the time and just enable /
    disable limits enforcement.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
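
The three resulting modes can be sketched as below. The bit values and helper names are assumptions made for illustration; only the split into a usage flag and a limits flag comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical bit values for the two user-quota flags. */
#define USR_USAGE_ENABLED  (1u << 0)
#define USR_LIMITS_ENABLED (1u << 1)

enum quota_mode { QUOTA_IGNORED, QUOTA_ACCOUNTING_ONLY, QUOTA_ENFORCED };

static enum quota_mode get_quota_mode(unsigned int flags)
{
    if (!(flags & USR_USAGE_ENABLED))
        return QUOTA_IGNORED;
    if (!(flags & USR_LIMITS_ENABLED))
        return QUOTA_ACCOUNTING_ONLY;
    return QUOTA_ENFORCED;
}

/* Returns true if the allocation may proceed. Usage is updated whenever
 * it is tracked; the limit is checked only when enforcement is on.
 * A limit of 0 means "no limit" in this sketch. */
static bool charge(unsigned int flags, long long *usage,
                   long long limit, long long bytes)
{
    enum quota_mode m = get_quota_mode(flags);

    if (m == QUOTA_IGNORED)
        return true;
    if (m == QUOTA_ENFORCED && limit && *usage + bytes > limit)
        return false;
    *usage += bytes;
    return true;
}
```

In accounting-only mode the counters stay accurate, so enforcement can be switched on later without rescanning the filesystem.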
     
  • Up to now, DQUOT_USR_SUSPENDED behaved like a state - i.e., either quota
    was enabled or suspended or none. Now allowed states are 0, ENABLED,
    ENABLED | SUSPENDED. This will be useful later when we implement separate
    enabling of quota usage tracking and limits enforcement because we need to
    keep track of a state which has been suspended.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • Checks like
    Signed-off-by: Mark Fasheh

    Jan Kara
     
  • So far quota was fine with quota block limits and inode limits/numbers in
    a 32-bit type. Now, with the rapid increase in storage sizes, requests are
    coming in to handle quota limits above 4TB / more than 2^32 inodes.
    So bump up sizes of types in mem_dqblk structure to 64-bits to be able to
    handle this. Also update inode allocation / checking functions to use qsize_t
    and make global structure keep quota limits in bytes so that things are
    consistent.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
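
The 4TB figure follows from simple arithmetic if limits are counted in 1 KiB quota blocks, which is an assumption of this sketch: 2^32 blocks of 2^10 bytes is 2^42 bytes. The helper names are hypothetical.

```c
#include <stdint.h>

#define QUOTA_BLOCK_SIZE 1024u  /* assumption: limits counted in 1 KiB blocks */

/* First byte count that no longer fits when the block count is 32-bit. */
static uint64_t limit_ceiling_bytes_32bit(void)
{
    return ((uint64_t)UINT32_MAX + 1) * QUOTA_BLOCK_SIZE;
}

/* What a 32-bit block field would keep of a larger block count. */
static uint32_t store_in_32bit(uint64_t blocks)
{
    return (uint32_t)blocks;    /* high bits are silently dropped */
}

/* Keeping the limit in bytes in a 64-bit field avoids the cap. */
static uint64_t blocks_to_bytes(uint64_t blocks)
{
    return blocks * QUOTA_BLOCK_SIZE;
}
```

A limit of exactly 2^32 blocks would be stored as 0 in the 32-bit field, while the 64-bit byte count represents it as 2^42.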
     
  • Some filesystems would like to keep private information together with each
    dquot. Add callbacks alloc_dquot and destroy_dquot allowing a filesystem to
    allocate larger dquots from its private slab, in a fashion similar to how
    we currently allocate inodes.

    Signed-off-by: Jan Kara
    Signed-off-by: Mark Fasheh

    Jan Kara
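
The embedding pattern these callbacks enable can be sketched in userspace C. Everything except the callback names alloc_dquot and destroy_dquot is hypothetical here, including the callback signatures, and calloc()/free() stand in for a slab cache.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the generic quota structure. */
struct dquot_model { unsigned int id; int type; };

/* Simplified operations table with the two new callbacks. */
struct dquot_ops_model {
    struct dquot_model *(*alloc_dquot)(int type);
    void (*destroy_dquot)(struct dquot_model *dq);
};

/* Hypothetical filesystem dquot: generic part plus private data. */
struct myfs_dquot {
    struct dquot_model base;    /* embedded generic part */
    long long unsynced_space;   /* filesystem-private field */
};

/* Recover the containing structure from a pointer to the generic part. */
#define MYFS_DQUOT(dq) \
    ((struct myfs_dquot *)((char *)(dq) - offsetof(struct myfs_dquot, base)))

static struct dquot_model *myfs_alloc_dquot(int type)
{
    struct myfs_dquot *mdq = calloc(1, sizeof(*mdq));

    if (!mdq)
        return NULL;
    mdq->base.type = type;
    return &mdq->base;
}

static void myfs_destroy_dquot(struct dquot_model *dq)
{
    free(MYFS_DQUOT(dq));
}

static const struct dquot_ops_model myfs_dquot_ops = {
    .alloc_dquot = myfs_alloc_dquot,
    .destroy_dquot = myfs_destroy_dquot,
};
```

Generic code only ever sees the embedded part, while the filesystem reaches its own fields through the MYFS_DQUOT() conversion.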
     
  • During an xattr set, when we move an xattr which was stored in inode to the
    outside bucket, we have to delete it and it will use the old value of
    xis->not_found. xis->not_found is removed by ocfs2_calc_xattr_set_need
    though, so we must restore it.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • When we extend one xattr's value to a large size, the old value size might
    be smaller than the size of a value root. In those cases, we still need to
    guess the metadata allocation.

    Reported-by: Tiger Yang
    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • JBD2 is fully backwards compatible with JBD and it's been tested enough with
    Ocfs2 that we can clean this code up now.

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Now that we've centralized the ocfs2_read_virt_blocks() code, let's use
    it in ocfs2_read_dir_block().

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2_read_dir_block() function really maps an inode's virtual
    blocks to physical ones before calling ocfs2_read_blocks(). Let's
    extract that to common code, because other places might want to do that.

    Other than the block number being virtual, ocfs2_read_virt_blocks()
    takes the same arguments as ocfs2_read_blocks(). It converts those
    virtual block numbers to physical before calling ocfs2_read_blocks()
    directly. If the blocks asked for are discontiguous, this can mean
    multiple calls to ocfs2_read_blocks(), but this is mostly hidden from
    the caller.

    Like ocfs2_read_blocks(), the caller can pass in an existing
    buffer_head. This is usually done to pick up some readahead I/O.
    ocfs2_read_virt_blocks() checks the buffer_head's block number
    against the extent map - it must match.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add an optional validation hook to ocfs2_read_blocks(). Now the
    validation function is only called when a block was actually read off of
    disk. It is not called when the buffer was in cache.

    We add a buffer state bit BH_NeedsValidate to flag these buffers. It
    must always be one higher than the last JBD2 buffer state bit.

    The dinode, dirblock, extent_block, and xattr_block validators are
    lifted to this scheme directly. The group_descriptor validator needs to
    be split into two pieces. The first part only needs the gd buffer and
    is passed to ocfs2_read_block(). The second part requires the dinode as
    well, and is called every time. It's only 3 compares, so it's tiny.
    This also allows us to clean up the non-fatal gd check used by resize.c.
    It now has no magic argument.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We weren't consistently checking xattr blocks after we read them.
    Most places checked the signature, but none checked xb_blkno or
    xb_fs_signature. Create a toplevel ocfs2_read_xattr_block() that does
    the read and the validation.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We have ocfs2_bread() as a vestige of the original ext-based dir code.
    It's only used by directories, though. Turn it into
    ocfs2_read_dir_block(), with a prototype matching the other metadata
    read functions. It's set up to validate dirblocks when the time comes.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We weren't consistently checking extent blocks after we read them.
    Most places checked the signature, but none checked h_blkno or
    h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does
    the read and the validation.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Random places in the code would check a group descriptor bh to see if it
    was valid. The previous commit unified descriptor block reads,
    validating all block reads in the same place. Thus, these checks are no
    longer necessary. Rather than eliminate them, however, we change them
    to BUG_ON() checks. This ensures the assumptions remain true. All of
    the code paths to these checks have been audited to ensure they come
    from a validated descriptor read.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • We have a clean call for validating group descriptors, but every place
    that wants one always does a read_block()+validate() call pair. Create
    a toplevel ocfs2_read_group_descriptor() that does the right
    thing. This allows us to leverage the single call point later for
    fancier handling. We also add validation of gd->bg_generation against
    the superblock and gd->bg_blkno against the block we thought we read.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Currently the validation of group descriptors is directly duplicated so
    that one version can error the filesystem and the other (resize) can
    just report the problem. Consolidate to one function that takes a
    boolean. Wrap that function with the old call for the old users.

    This is in preparation for lifting the read+validate step into a
    single function.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Random places in the code would check a dinode bh to see if it was
    valid. Not only did they do different levels of validation, they
    handled errors in different ways.

    The previous commit unified inode block reads, validating all block
    reads in the same place. Thus, these haphazard checks are no longer
    necessary. Rather than eliminate them, however, we change them to
    BUG_ON() checks. This ensures the assumptions remain true. All of the
    code paths to these checks have been audited to ensure they come from a
    validated inode read.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The ocfs2 code currently reads inodes off disk with a simple
    ocfs2_read_block() call. Each place that does this has a different set
    of sanity checks it performs. Some check only the signature. A couple
    validate the block number (the block read vs di->i_blkno). A couple
    others check for VALID_FL. Only one place validates i_fs_generation. A
    couple check nothing. Even when an error is found, they don't all do
    the same thing.

    We wrap inode reading into ocfs2_read_inode_block(). This will validate
    all the above fields, going readonly if they are invalid (they never
    should be). ocfs2_read_inode_block_full() is provided for the places
    that want to pass read_block flags. Every caller is passing a struct
    inode with a valid ip_blkno, so we don't need a separate blkno argument
    either.

    We will remove the validation checks from the rest of the code in a
    later commit, as they are no longer necessary.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
    and mount options "acl" to enable acls in Ocfs2.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • We need to get the parent directory's ACLs and let the new child inherit them.
    To do this, we add additional calculations for data/metadata allocation.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • This function is used to update acl xattrs during file mode changes.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang