06 Jun, 2013

1 commit


10 May, 2013

1 commit

  • Pull btrfs update from Chris Mason:
    "These are mostly fixes. The biggest exceptions are Josef's skinny
    extents and Jan Schmidt's code to rebuild our quota indexes if they
    get out of sync (or you enable quotas on an existing filesystem).

    The skinny extents are off by default because they are a new variation
    on the extent allocation tree format. btrfstune -x enables them, and
    the new format makes the extent allocation tree about 30% smaller.

    I rebased this a few days ago to rework Dave Sterba's crc checks on
    the super block, but almost all of these go back to rc6, since I
    thought 3.9 was due any minute.

    The biggest missing fix is the tracepoint bug that was hit late in
    3.9. I ran into problems with that in overnight testing and I'm still
    tracking it down. I'll definitely have that fixed for rc2."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
    Btrfs: allow superblock mismatch from older mkfs
    btrfs: enhance superblock checks
    btrfs: fix misleading variable name for flags
    btrfs: use unsigned long type for extent state bits
    Btrfs: improve the loop of scrub_stripe
    btrfs: read entire device info under lock
    btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
    btrfs: handle errors returned from get_tree_block_key
    btrfs: make static code static & remove dead code
    Btrfs: deal with errors in write_dev_supers
    Btrfs: remove almost all of the BUG()'s from tree-log.c
    Btrfs: deal with free space cache errors while replaying log
    Btrfs: automatic rescan after "quota enable" command
    Btrfs: rescan for qgroups
    Btrfs: split btrfs_qgroup_account_ref into four functions
    Btrfs: allocate new chunks if the space is not enough for global rsv
    Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
    btrfs: move leak debug code to functions
    Btrfs: return free space in cow error path
    Btrfs: set UUID in root_item for created trees
    ...

    Linus Torvalds
     

09 May, 2013

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "This patch-set includes the following major enhancement patches.
    - introduce a new global lock scheme
    - add tracepoints on several major functions
    - fix the overall cleaning process focused on victim selection
    - apply the block plugging to merge IOs as much as possible
    - enhance management of free nids and its list
    - enhance the readahead mode for node pages
    - address several critical deadlock conditions
    - reduce lock_page calls

    The other minor bug fixes and enhancements are as follows.
    - calculation mistakes: overflow
    - bio types: READ, READA, and READ_SYNC
    - fix the recovery flow, data races, and null pointer errors"

    * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: cover free_nid management with spin_lock
    f2fs: optimize scan_nat_page()
    f2fs: code cleanup for scan_nat_page() and build_free_nids()
    f2fs: bugfix for alloc_nid_failed()
    f2fs: recover when journal contains deleted files
    f2fs: continue to mount after failing recovery
    f2fs: avoid deadlock during evict after f2fs_gc
    f2fs: modify the number of issued pages to merge IOs
    f2fs: remove useless #include as we're now using sysfs as debug entry.
    f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
    f2fs: check truncation of mapping after lock_page
    f2fs: enhance alloc_nid and build_free_nids flows
    f2fs: add a tracepoint on f2fs_new_inode
    f2fs: check nid == 0 in add_free_nid
    f2fs: add REQ_META about metadata requests for submit
    f2fs: give a chance to merge IOs by IO scheduler
    f2fs: avoid frequent background GC
    f2fs: add tracepoints to debug checkpoint request
    f2fs: add tracepoints for write page operations
    f2fs: add tracepoints to debug the block allocation
    ...

    Linus Torvalds
     

07 May, 2013

1 commit


04 May, 2013

1 commit

  • Pull nfsd changes from J Bruce Fields:
    "Highlights include:

    - Some more DRC cleanup and performance work from Jeff Layton

    - A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
    server using credentials from Active Directory often fail due to
    limitations of the svcgssd upcall interface. This replacement
    lifts those limitations. The existing upcall is still supported
    for backwards compatibility.

    - More NFSv4.1 support: at this point, a user with a current
    client who upgrades from 4.0 to 4.1 should see no regressions. In
    theory we do everything a 4.1 server is required to do. Patches
    for a couple of minor exceptions are ready for 3.11, and with those
    and some more testing I'd like to turn 4.1 on by default in 3.11."

    Fix up semantic conflict as per Stephen Rothwell and linux-next:

    Commit 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS
    authentication") adds two new users of "PDE(inode)->data", but we're
    supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad87
    ("procfs: new helper - PDE_DATA(inode)").

    The old PDE() macro is no longer available since commit c30480b92cf4
    ("proc: Make the PROC_I() and PDE() macros internal to procfs")

    * 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
    NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
    NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
    nfsd: make symbol nfsd_reply_cache_shrinker static
    svcauth_gss: fix error return code in rsc_parse()
    nfsd4: don't remap EISDIR errors in rename
    svcrpc: fix gss-proxy to respect user namespaces
    SUNRPC: gssp_procedures[] can be static
    SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
    nfsd4: better error return to indicate SSV non-support
    nfsd: fix EXDEV checking in rename
    SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
    SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
    SUNRPC: conditionally return endtime from import_sec_context
    SUNRPC: allow disabling idle timeout
    SUNRPC: attempt AF_LOCAL connect on setup
    nfsd: Decode and send 64bit time values
    nfsd4: put_client_renew_locked can be static
    nfsd4: remove unused macro
    nfsd4: remove some useless code
    nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
    ...

    Linus Torvalds
     

03 May, 2013

1 commit

  • Pull xfs update from Ben Myers:
    "For 3.10-rc1 we have a number of bug fixes and cleanups and a
    currently experimental feature from David Chinner, CRC protection for
    metadata. CRCs are enabled by using mkfs.xfs to create a filesystem
    with the feature bits set.

    - numerous fixes for speculative preallocation
    - don't verify buffers on IO errors
    - rename of random32 to prandom32
    - refactoring/rearrangement in xfs_bmap.c
    - removal of unused m_inode_shrink in struct xfs_mount
    - fix error handling of xfs_bufs and readahead
    - quota driven preallocation throttling
    - fix WARN_ON in xfs_vm_releasepage
    - add ratelimited printk for different alert levels
    - fix spurious forced shutdowns due to freed Extent Free Intents
    - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
    - remove some obsoleted comments
    - (experimental) CRC support for metadata"
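    As a rough illustration of what CRC protection for a metadata block involves, the sketch below computes a CRC-32C (Castagnoli) checksum, the polynomial XFS uses for its v5 metadata, over a block with its checksum field treated as zero, stores it, and later verifies it. The bitwise CRC routine is the standard one; the block layout and the `crc_offset` parameter are hypothetical and do not correspond to any real XFS on-disk structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (reflected polynomial 0x82F63B78): slow but compact. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Hypothetical layout: a 4-byte checksum stored at crc_offset inside the
 * block. The checksum covers the whole block with that field zeroed.
 */
static uint32_t block_cksum(uint8_t *block, size_t len, size_t crc_offset)
{
    uint8_t saved[4];
    uint32_t crc;

    memcpy(saved, block + crc_offset, 4);
    memset(block + crc_offset, 0, 4);
    crc = crc32c(block, len);
    memcpy(block + crc_offset, saved, 4);   /* restore the stored value */
    return crc;
}

/* Compute and store the checksum (host byte order, for brevity). */
static void block_set_cksum(uint8_t *block, size_t len, size_t crc_offset)
{
    uint32_t crc = block_cksum(block, len, crc_offset);
    memcpy(block + crc_offset, &crc, 4);
}

/* Returns nonzero if the stored checksum matches the block contents. */
static int block_verify_cksum(uint8_t *block, size_t len, size_t crc_offset)
{
    uint32_t stored;

    memcpy(&stored, block + crc_offset, 4);
    return stored == block_cksum(block, len, crc_offset);
}
```

    A single flipped bit anywhere outside the checksum field makes block_verify_cksum() fail, which is the property a file system relies on to reject corrupted metadata at read time instead of acting on it.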

    * tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
    xfs: fix da node magic number mismatches
    xfs: Remote attr validation fixes and optimisations
    xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
    xfs: add metadata CRC documentation
    xfs: implement extended feature masks
    xfs: add CRC checks to the superblock
    xfs: buffer type overruns blf_flags field
    xfs: add buffer types to directory and attribute buffers
    xfs: add CRC protection to remote attributes
    xfs: split remote attribute code out
    xfs: add CRCs to attr leaf blocks
    xfs: add CRCs to dir2/da node blocks
    xfs: shortform directory offsets change for dir3 format
    xfs: add CRC checking to dir2 leaf blocks
    xfs: add CRC checking to dir2 data blocks
    xfs: add CRC checking to dir2 free blocks
    xfs: add CRC checks to block format directory blocks
    xfs: add CRC checks to remote symlinks
    xfs: split out symlink code into it's own file.
    xfs: add version 3 inode format with CRCs
    ...

    Linus Torvalds
     

01 May, 2013

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Mostly performance and bug fixes, plus some cleanups. The one new
    feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
    allows installation of a hidden inode designed for boot loaders."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4: fix type-widening bug in inode table readahead code
    ext4: add check for inodes_count overflow in new resize ioctl
    ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
    ext4: fix online resizing for ext3-compat file systems
    jbd2: trace when lock_buffer in do_get_write_access takes a long time
    ext4: mark metadata blocks using bh flags
    buffer: add BH_Prio and BH_Meta flags
    ext4: mark all metadata I/O with REQ_META
    ext4: fix readdir error in case inline_data+^dir_index.
    ext4: fix readdir error in the case of inline_data+dir_index
    jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
    ext4: mext_insert_extents should update extent block checksum
    ext4: move quota initialization out of inode allocation transaction
    ext4: reserve xattr index for Rich ACL support
    jbd2: reduce journal_head size
    ext4: clear buffer_uninit flag when submitting IO
    ext4: use io_end for multiple bios
    ext4: make ext4_bio_write_page() use BH_Async_Write flags
    ext4: Use kstrtoul() instead of parse_strtoul()
    ext4: defragmentation code cleanup
    ...

    Linus Torvalds
     

30 Apr, 2013

1 commit


28 Apr, 2013

1 commit


26 Apr, 2013

1 commit

  • The main advantage of this new upcall mechanism is that it can handle
    big tickets as seen in Kerberos implementations where tickets carry
    authorization data like the MS-PAC buffer with AD or the Posix Authorization
    Data being discussed in the IETF krbwg working group.

    The Gssproxy program is used to perform the accept_sec_context call on the
    kernel's behalf. The code is changed to also pass the input buffer straight
    to the upcall mechanism to avoid allocating and copying many pages, as tokens
    can be as big as 64KiB (potentially more in future).

    Signed-off-by: Simo Sorce
    [bfields: containerization, negotiation api]
    Signed-off-by: J. Bruce Fields

    Simo Sorce
     

10 Apr, 2013

1 commit

  • Currently in an ENOSPC condition, when writing into unwritten space or
    punching a hole, we might need to split the extent and grow the extent tree.
    However, since we cannot allocate any new metadata blocks, we'll have to
    zero out the unwritten part of the extent or the punched-out part of the
    extent, or in the worst case return ENOSPC even though the user does not
    actually allocate any space.

    Also in delalloc path we do reserve metadata and data blocks for the
    time we're going to write out, however metadata block reservation is
    very tricky especially since we expect that logical connectivity implies
    physical connectivity, however that might not be the case and hence we
    might end up allocating more metadata blocks than previously reserved.
    So in future, metadata reservation checks should be removed since we cannot
    ensure that we do not under-reserve.

    And this is where reserved space comes into the picture. When mounting
    the file system we slice off a little bit of the file system space (2%
    or 4096 clusters, whichever is smaller) which can be then used for the
    cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.

    The number of reserved clusters can be set via sysfs; however, it can
    never be bigger than the number of free clusters in the file system.
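    The sizing rule and the sysfs limit above can be sketched as two small helpers. This is a hedged model rather than the ext4 code: both function names are hypothetical, 2% is computed as an integer division by 50, and returning -EINVAL for a request larger than the current free-cluster count is an assumption about how the limit is enforced.

```c
#include <errno.h>
#include <stdint.h>

/* Default reservation: 2% of the file system's clusters, capped at 4096. */
static uint64_t default_resv_clusters(uint64_t total_clusters)
{
    uint64_t resv = total_clusters / 50;   /* 2% */
    return resv < 4096 ? resv : 4096;
}

/*
 * Hypothetical sysfs store handler: accept a new reservation only if it
 * does not exceed the clusters that are currently free.
 */
static int set_resv_clusters(uint64_t *resv, uint64_t requested,
                             uint64_t free_clusters)
{
    if (requested > free_clusters)
        return -EINVAL;
    *resv = requested;
    return 0;
}
```

    With this rule a small file system of 1000 clusters reserves 20 of them, while anything above 204800 clusters hits the 4096-cluster cap.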

    Note that this patch fixes the failure of xfstest 274 as expected.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Carlos Maiolino

    Lukas Czerner
     

09 Apr, 2013

1 commit

  • Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
    associated attributes (like i_blocks, i_size, i_flags, ...) from the
    specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
    typically used to store a boot loader in a secure part of the
    filesystem, where it can't be changed by a normal user by accident.
    The data blocks of the previous boot loader will be associated with
    the given inode.

    This userspace program is a simple example of the usage:

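    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Fallback if the installed headers predate the ioctl (see fs/ext4/ext4.h). */
    #ifndef EXT4_IOC_SWAP_BOOT
    #define EXT4_IOC_SWAP_BOOT _IO('f', 17)
    #endif

    int main(int argc, char *argv[])
    {
        int fd;
        int err;

        if (argc != 2) {
            fprintf(stderr, "usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
            exit(1);
        }

        fd = open(argv[1], O_WRONLY);
        if (fd < 0) {
            perror("open");
            exit(1);
        }

        /* Swap this file's blocks and attributes with boot loader inode #5. */
        err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
        if (err < 0) {
            perror("ioctl");
            close(fd);
            exit(1);
        }

        close(fd);
        exit(0);
    }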

    [ Modified by Theodore Ts'o to fix a number of bugs in the original code.]

    Signed-off-by: Dr. Tilmann Bubeck
    Signed-off-by: "Theodore Ts'o"

    Dr. Tilmann Bubeck
     

03 Apr, 2013

1 commit


27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit

  • The following set of operations on an NFS client and server will cause
    an ESTALE error on the client:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

    Obviously, we should not be getting an ESTALE error back there since the
    inode still exists on the server. The problem is that the lookup code
    will call d_revalidate on the dentry that "." refers to, because NFS has
    FS_REVAL_DOT set.

    nfs_lookup_revalidate will see that the parent directory has changed and
    will try to reverify the dentry by redoing a LOOKUP. That of course
    fails, so the lookup code returns ESTALE.

    The problem here is that d_revalidate is really a bad fit for this case.
    What we really want to know at this point is whether the inode is still
    good or not, but we don't really care what name it goes by or whether
    the dcache is still valid.

    Add a new d_op->d_weak_revalidate operation and have complete_walk call
    that instead of d_revalidate. The intent there is to allow for a
    "weaker" d_revalidate that just checks to see whether the inode is still
    good. This also gives us an opportunity to kill off the FS_REVAL_DOT
    special casing.
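    The difference between the two checks can be modelled in userspace. In this hypothetical sketch the "server" is a tiny table of directory entries plus a set of live inode numbers; strong_revalidate() stands in for d_revalidate redoing a LOOKUP by name, and weak_revalidate() stands in for d_weak_revalidate checking only that the inode is still good. None of these names are kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy server state: name -> inode number, plus which inodes still exist. */
struct entry { const char *name; unsigned long ino; };

struct server {
    struct entry dir[8];
    int nentries;
    bool live[64];              /* live[ino] is true if the inode exists */
};

/* Client-side cached dentry: the name it was found under and its inode. */
struct cdentry { const char *name; unsigned long ino; };

/* Redo the LOOKUP by name and require it to return the same inode. */
static int strong_revalidate(const struct server *s, const struct cdentry *d)
{
    for (int i = 0; i < s->nentries; i++)
        if (strcmp(s->dir[i].name, d->name) == 0)
            return s->dir[i].ino == d->ino ? 0 : -ESTALE;
    return -ESTALE;             /* the name is gone, so the lookup fails */
}

/* Only ask whether the inode itself still exists (think GETATTR). */
static int weak_revalidate(const struct server *s, const struct cdentry *d)
{
    return (d->ino < 64 && s->live[d->ino]) ? 0 : -ESTALE;
}

/* server# mv OLD NEW_NAME: same inode, new name. */
static void server_rename(struct server *s, const char *old,
                          const char *new_name)
{
    for (int i = 0; i < s->nentries; i++)
        if (strcmp(s->dir[i].name, old) == 0)
            s->dir[i].name = new_name;
}
```

    Replaying the scenario above (create "a", cache it on the client, rename it to "a.bak" on the server) makes the strong check return -ESTALE for ".", while the weak check still succeeds because the inode is alive.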

    [AV: changed method name, added note in porting, fixed confusion re
    having it possibly called from RCU mode (it won't be)]

    Cc: NeilBrown
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

04 Jan, 2013

1 commit


21 Dec, 2012

7 commits

  • Pull VFS update from Al Viro:
    "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
    misc stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
    vfs: make lremovexattr retry once on ESTALE error
    vfs: make removexattr retry once on ESTALE
    vfs: make llistxattr retry once on ESTALE error
    vfs: make listxattr retry once on ESTALE error
    vfs: make lgetxattr retry once on ESTALE
    vfs: make getxattr retry once on an ESTALE error
    vfs: allow lsetxattr() to retry once on ESTALE errors
    vfs: allow setxattr to retry once on ESTALE errors
    vfs: allow utimensat() calls to retry once on an ESTALE error
    vfs: fix user_statfs to retry once on ESTALE errors
    vfs: make fchownat retry once on ESTALE errors
    vfs: make fchmodat retry once on ESTALE errors
    vfs: have chroot retry once on ESTALE error
    vfs: have chdir retry lookup and call once on ESTALE error
    vfs: have faccessat retry once on an ESTALE error
    vfs: have do_sys_truncate retry once on an ESTALE error
    vfs: fix renameat to retry on ESTALE errors
    vfs: make do_unlinkat retry once on ESTALE errors
    vfs: make do_rmdir retry once on ESTALE errors
    vfs: add a flags argument to user_path_parent
    ...
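    The "retry once on ESTALE" commits in this list share one control-flow pattern: run the lookup and the operation, and if the result is ESTALE, repeat both exactly once with revalidation of the cached path forced. Below is a hedged userspace sketch of that pattern, loosely modelled on the kernel's retry_estale() and LOOKUP_REVAL; LOOKUP_REVAL_FLAG, do_path_op(), the callback type and the two example operations are hypothetical stand-ins.

```c
#include <errno.h>

#define LOOKUP_REVAL_FLAG 0x1u   /* hypothetical: "don't trust cached dentries" */

/* A path operation: returns 0 or a negative errno. */
typedef int (*path_op_fn)(const char *path, unsigned int flags, void *ctx);

/* Retry only on ESTALE, and only if revalidation was not already forced. */
static int should_retry_estale(int error, unsigned int flags)
{
    return error == -ESTALE && !(flags & LOOKUP_REVAL_FLAG);
}

static int do_path_op(const char *path, path_op_fn op, void *ctx)
{
    unsigned int flags = 0;
    int error;

retry:
    error = op(path, flags, ctx);
    if (should_retry_estale(error, flags)) {
        flags |= LOOKUP_REVAL_FLAG;      /* second and final attempt */
        goto retry;
    }
    return error;
}

/* Example op: stale only while cached dentries are trusted. */
static int stale_cache_op(const char *path, unsigned int flags, void *ctx)
{
    (void)path;
    ++*(int *)ctx;                       /* count attempts */
    return (flags & LOOKUP_REVAL_FLAG) ? 0 : -ESTALE;
}

/* Example op: the object really is gone, so every attempt is stale. */
static int really_gone_op(const char *path, unsigned int flags, void *ctx)
{
    (void)path;
    (void)flags;
    ++*(int *)ctx;
    return -ESTALE;
}
```

    Either way the operation runs at most twice, so a file that has truly vanished on the server still reports ESTALE instead of looping.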

    Linus Torvalds
     
  • …/linux-fs into for-linus

    Al Viro
     
  • Removed vmtruncate

    Signed-off-by: Marco Stornelli
    Signed-off-by: Al Viro

    Marco Stornelli
     
  • Pull nfsd update from Bruce Fields:
    "Included this time:

    - more nfsd containerization work from Stanislav Kinsbursky: we're
    not quite there yet, but should be by 3.9.

    - NFSv4.1 progress: implementation of basic backchannel security
    negotiation and the mandatory BACKCHANNEL_CTL operation. See

    http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

    for remaining TODO's

    - Fixes for some bugs that could be triggered by unusual compounds.
    Our xdr code wasn't designed with v4 compounds in mind, and it
    shows. A more thorough rewrite is still a todo.

    - If you've ever seen "RPC: multiple fragments per record not
    supported" logged while using some sort of odd userland NFS client,
    that should now be fixed.

    - Further work from Jeff Layton on our mechanism for storing
    information about NFSv4 clients across reboots.

    - Further work from Bryan Schumaker on his fault-injection mechanism
    (which allows us to discard selective NFSv4 state, to exercise
    rarely-taken recovery code paths in the client.)

    - The usual mix of miscellaneous bugs and cleanup.

    Thanks to everyone who tested or contributed this cycle."

    * 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
    nfsd4: don't leave freed stateid hashed
    nfsd4: free_stateid can use the current stateid
    nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
    nfsd: warn on odd reply state in nfsd_vfs_read
    nfsd4: fix oops on unusual readlike compound
    nfsd4: disable zero-copy on non-final read ops
    svcrpc: fix some printks
    NFSD: Correct the size calculation in fault_inject_write
    NFSD: Pass correct buffer size to rpc_ntop
    nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
    nfsd: simplify service shutdown
    nfsd: replace boolean nfsd_up flag by users counter
    nfsd: simplify NFSv4 state init and shutdown
    nfsd: introduce helpers for generic resources init and shutdown
    nfsd: make NFSd service structure allocated per net
    nfsd: make NFSd service boot time per-net
    nfsd: per-net NFSd up flag introduced
    nfsd: move per-net startup code to separated function
    nfsd: pass net to __write_ports() and down
    nfsd: pass net to nfsd_set_nrthreads()
    ...

    Linus Torvalds
     
  • Provide a proper invalidation method rather than relying on the netfs retiring
    the cookie it has and getting a new one. The problem with this is that it isn't
    easy for the netfs to make sure that it has completed/cancelled all its
    outstanding storage and retrieval operations on the cookie it is retiring.

    Instead, have the cache provide an invalidation method that will cancel or wait
    for all currently outstanding operations before invalidating the cache, and
    will cause new operations to queue up behind that. Whilst invalidation is in
    progress, some requests will be rejected until the cache can stack a barrier on
    the operation queue to cause new operations to be deferred behind it.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the state management of internal fscache operations and the accounting of
    what operations are in what states.

    This is done by:

    (1) Give struct fscache_operation an enum variable that directly represents the
    state it's currently in, rather than spreading this knowledge over a bunch
    of flags, who's processing the operation at the moment and whether it is
    queued or not.

    This makes it easier to write assertions to check the state at various
    points and to prevent invalid state transitions.

    (2) Add an 'operation complete' state and supply a function to indicate the
    completion of an operation (fscache_op_complete()) and make things call
    it. The final call to fscache_put_operation() can then check that an op is
    in the appropriate state (complete or cancelled).

    (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
    govern the state of an object:

    (a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

    (b) The ->n_in_progress is simply the number of operations that have been
    taken off of the object's pending queue for the purposes of being
    run. This is decremented by fscache_op_complete() only.

    (c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress. It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

    fscache_put_operation() and fscache_operation_gc() no longer try to
    clean up ->n_exclusive and ->n_in_progress. That was leading to double
    decrements against fscache_cancel_op().

    fscache_cancel_op() no longer decrements ->n_ops. That was leading to
    double decrements against fscache_put_operation().

    fscache_submit_exclusive_op() now decides whether it has to queue an op
    based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
    will persist in being true even after all preceding operations have been
    cancelled or completed. Furthermore, if an object is active and there are
    runnable ops against it, there must be at least one op running.

    (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
    provide a function to record completion of the pages as they complete.

    When n_pages reaches 0, the operation is deemed to be complete and
    fscache_op_complete() is called.

    Add calls to fscache_retrieval_complete() anywhere we've finished with a
    page we've been given to read or allocate for. This includes places where
    we just return pages to the netfs for reading from the server and where
    accessing the cache fails and we discard the proposed netfs page.
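    The counter rules in (1) to (3) can be checked with a small userspace model. The sketch below is hypothetical and far simpler than fs/fscache/operation.c: the state and function names are invented, there is no locking, and an op is reduced to its state plus an exclusive bit. It does follow the division of labour described above: only the put decrements n_ops, only completion decrements n_in_progress, both completion and cancellation decrement n_exclusive, and exclusive ops queue on n_in_progress rather than n_ops.

```c
#include <stdbool.h>

enum op_state { OP_PENDING, OP_IN_PROGRESS, OP_COMPLETE, OP_CANCELLED, OP_DEAD };

struct model_object { int n_ops, n_in_progress, n_exclusive; };
struct model_op { enum op_state state; bool exclusive; };

/* An exclusive op has to wait only if something is actually running. */
static bool must_queue_exclusive(const struct model_object *obj)
{
    return obj->n_in_progress > 0;
}

static void op_submit(struct model_object *obj, struct model_op *op, bool excl)
{
    op->state = OP_PENDING;
    op->exclusive = excl;
    obj->n_ops++;
    if (excl)
        obj->n_exclusive++;
}

/* Taken off the pending queue to be run. */
static void op_start(struct model_object *obj, struct model_op *op)
{
    if (op->state != OP_PENDING)
        return;
    op->state = OP_IN_PROGRESS;
    obj->n_in_progress++;
}

static void op_complete(struct model_object *obj, struct model_op *op)
{
    if (op->state != OP_IN_PROGRESS)
        return;
    op->state = OP_COMPLETE;
    obj->n_in_progress--;
    if (op->exclusive)
        obj->n_exclusive--;
}

/* Cancel a pending op: adjusts n_exclusive but never n_ops. */
static void op_cancel(struct model_object *obj, struct model_op *op)
{
    if (op->state != OP_PENDING)
        return;
    op->state = OP_CANCELLED;
    if (op->exclusive)
        obj->n_exclusive--;
}

/* Final put: refused unless complete or cancelled; the only n_ops decrement. */
static bool op_put(struct model_object *obj, struct model_op *op)
{
    if (op->state != OP_COMPLETE && op->state != OP_CANCELLED)
        return false;
    op->state = OP_DEAD;
    obj->n_ops--;
    return true;
}
```

    In this model, once a non-exclusive op completes but has not yet been put, n_ops is still nonzero while n_in_progress is zero, which is exactly the case where queueing an exclusive op on n_ops > 0 would have been wrong.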

    The bugs in the unfixed state management manifest themselves as oopses like the
    following where the operation completion gets out of sync with return of the
    cookie by the netfs. This is possible because the cache unlocks and returns
    all the netfs pages before recording its completion - which means that there's
    nothing to stop the netfs discarding them and returning the cookie.

    FS-Cache: Cookie 'NFS.fh' still has outstanding reads
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/cookie.c:519!
    invalid opcode: 0000 [#1] SMP
    CPU 1
    Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

    Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
    RIP: 0010:[] [] __fscache_relinquish_cookie+0x170/0x343 [fscache]
    RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
    RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
    RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
    RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
    R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
    FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
    Stack:
    ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
    ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
    ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
    Call Trace:
    [] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
    [] nfs_clear_inode+0x3c/0x41 [nfs]
    [] nfs4_evict_inode+0x2f/0x33 [nfs]
    [] evict+0xa1/0x15c
    [] dispose_list+0x2c/0x38
    [] prune_icache_sb+0x28c/0x29b
    [] prune_super+0xd5/0x140
    [] shrink_slab+0x102/0x1ab
    [] balance_pgdat+0x2f2/0x595
    [] ? process_timeout+0xb/0xb
    [] kswapd+0x270/0x289
    [] ? __init_waitqueue_head+0x46/0x46
    [] ? balance_pgdat+0x595/0x595
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? finish_task_switch+0x45/0xc0
    [] ? retint_restore_args+0xe/0xe
    [] ? __init_kthread_worker+0x53/0x53
    [] ? gs_change+0xb/0xb

    Signed-off-by: David Howells

    David Howells
     
  • Pull new F2FS filesystem from Jaegeuk Kim:
    "Introduce a new file system, Flash-Friendly File System (F2FS), to
    Linux 3.8.

    Highlights:
    - Add initial f2fs source codes
    - Fix an endian conversion bug
    - Fix build failures on random configs
    - Fix the power-off-recovery routine
    - Minor cleanup, coding style, and typos patches"

    From the Kconfig help text:

    F2FS is based on Log-structured File System (LFS), which supports
    versatile "flash-friendly" features. The design has been focused on
    addressing the fundamental issues in LFS, which are snowball effect
    of wandering tree and high cleaning overhead.

    Since flash-based storages show different characteristics according to
    the internal geometry or flash memory management schemes aka FTL, F2FS
    and tools support various parameters not only for configuring on-disk
    layout, but also for selecting allocation and cleaning algorithms.

    and there's an article by Neil Brown about it on lwn.net:

    http://lwn.net/Articles/518988/

    * tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
    f2fs: fix tracking parent inode number
    f2fs: cleanup the f2fs_bio_alloc routine
    f2fs: introduce accessor to retrieve number of dentry slots
    f2fs: remove redundant call to f2fs_put_page in delete entry
    f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
    f2fs: rewrite f2fs_bio_alloc to make it simpler
    f2fs: fix a typo in f2fs documentation
    f2fs: remove unused variable
    f2fs: move error condition for mkdir at proper place
    f2fs: remove unneeded initialization
    f2fs: check read only condition before beginning write out
    f2fs: remove unneeded memset from init_once
    f2fs: show error in case of invalid mount arguments
    f2fs: fix the compiler warning for uninitialized use of variable
    f2fs: resolve build failures
    f2fs: adjust kernel coding style
    f2fs: fix endian conversion bugs reported by sparse
    f2fs: remove unneeded version.h header file from f2fs.h
    f2fs: update the f2fs document
    f2fs: update Kconfig and Makefile
    ...

    Linus Torvalds
     

18 Dec, 2012

5 commits

  • Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • [akpm@linux-foundation.org: tweak documentation]
    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • It is currently impossible to examine the state of seccomp for a given
    process. While attaching with gdb and attempting "call
    prctl(PR_GET_SECCOMP,...)" will work in some situations, it is not
    reliable. If the process is in seccomp mode 1, this query will kill the
    process (prctl not allowed), if the process is in mode 2 with prctl not
    allowed, it will similarly be killed, and in weird cases, if prctl is
    filtered to return errno 0, it can look like seccomp is disabled.

    When reviewing the state of running processes, there should be a way to
    externally examine the seccomp mode. ("Did this build of Chrome end up
    using seccomp?" "Did my distro ship ssh with seccomp enabled?")

    This adds the "Seccomp" line to /proc/$pid/status.
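    From userspace the new field can be read like any other line of /proc/$pid/status. A minimal C sketch, assuming only that the line has the form "Seccomp:" followed by whitespace and the mode number (0 = disabled, 1 = strict, 2 = filter); both helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the mode from the text of a status file; -1 if the line is absent. */
static int parse_seccomp_mode(const char *status)
{
    const char *p = status;

    while (p && *p) {
        if (strncmp(p, "Seccomp:", 8) == 0)
            return (int)strtol(p + 8, NULL, 10);   /* strtol skips the tab */
        p = strchr(p, '\n');
        if (p)
            p++;                                   /* start of next line */
    }
    return -1;
}

/* Read /proc/<pid>/status (pid <= 0 means "self") and return the mode. */
static int read_seccomp_mode(int pid)
{
    char path[64], buf[8192];
    size_t n;
    FILE *f;

    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%d/status", pid);
    else
        snprintf(path, sizeof path, "/proc/self/status");

    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_seccomp_mode(buf);
}
```

    Unlike the gdb approach, reading the file never runs code inside the target process, so a process in mode 1 or 2 cannot be killed by the query.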

    Signed-off-by: Kees Cook
    Reviewed-by: Cyrill Gorcunov
    Cc: Andrea Arcangeli
    Cc: James Morris
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • During c/r sessions we've found that there is no way at the moment to
    fetch some VMA associated flags, such as mlock() and madvise().

    This leads us to a problem -- we don't know if we should call for mlock()
    and/or madvise() after restore on the vma area we're bringing back to
    life.

    This patch introduces a new field into "smaps" output called VmFlags,
    where all set flags associated with the particular VMA are shown as
    two-letter mnemonics.

    [ Strictly speaking for c/r we only need mlock/madvise bits but it has been
    said that providing just a few flags looks somehow inconsistent. So all
    flags are here now. ]

    This feature is also made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
    other applications may start to use these fields.

    The data is encoded in a somewhat awkward two-letter mnemonic form, to
    encourage userspace to be prepared for fields being added or removed in
    the future.
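    A consumer written in that spirit treats VmFlags as a space-separated list of two-letter tokens, tests for the ones it needs, and skips any it does not recognise. In this hedged sketch the helper name is hypothetical; "lo" (pages locked in memory) and "sr"/"rr" (sequential/random read advice) are mnemonics listed for this field in Documentation/filesystems/proc.txt.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the mnemonic `flag` appears as a whole token in a smaps
 * "VmFlags:" line. Unknown tokens are simply skipped.
 */
static bool vma_has_flag(const char *line, const char *flag)
{
    const char *p;

    if (strncmp(line, "VmFlags:", 8) != 0)
        return false;
    p = line + 8;

    for (;;) {
        size_t len;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n')
            return false;
        len = strcspn(p, " \t\n");          /* length of this token */
        if (len == strlen(flag) && strncmp(p, flag, len) == 0)
            return true;
        p += len;
    }
}
```

    A restore tool could, for example, call mlock() on a restored area whose VmFlags line contained "lo", and pass MADV_SEQUENTIAL or MADV_RANDOM to madvise() for "sr" or "rr".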

    [a.p.zijlstra@chello.nl: props to use for_each_set_bit]
    [sfr@canb.auug.org.au: props to use array instead of struct]
    [akpm@linux-foundation.org: overall redesign and simplification]
    [akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()]
    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Peter Zijlstra
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them
    as they are (when the tz=UTC mount option is used). However, in some cases
    it is useful to be able to specify the time stamp offset explicitly (e.g.
    when the time zone of a connected camera differs from that of the
    computer, or when the HW clock is in UTC and thus sys_tz.minuteswest == 0).

    So provide a mount option time_offset= which allows the user to specify
    the offset, in minutes, that should be applied to time stamps on the
    filesystem.
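
    A minimal Python sketch of the conversion, assuming time_offset= means
    "on-disk local time minus UTC, in minutes" (e.g. 120 for a camera set to
    UTC+2), while sys_tz.minuteswest counts minutes west of UTC. The sign
    convention for time_offset= and the function names are assumptions made
    for illustration.

```python
SECS_PER_MIN = 60


def fat_local_to_unix(local_seconds, time_offset=None, minuteswest=0):
    """Convert a FAT time stamp (seconds counted in the volume's local time)
    to UTC-based Unix time.

    time_offset: minutes east of UTC given via time_offset= (assumed sign).
    minuteswest: sys_tz.minuteswest, used when no time_offset= was given.
    """
    if time_offset is not None:
        return local_seconds - time_offset * SECS_PER_MIN
    return local_seconds + minuteswest * SECS_PER_MIN


def unix_to_fat_local(unix_seconds, time_offset=None, minuteswest=0):
    """Inverse of fat_local_to_unix(), used when writing time stamps."""
    if time_offset is not None:
        return unix_seconds + time_offset * SECS_PER_MIN
    return unix_seconds - minuteswest * SECS_PER_MIN
```

    With a UTC hardware clock (minuteswest == 0) the old behaviour leaves
    camera time stamps untouched; passing the camera's own offset fixes them
    regardless of the host's time zone.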

    akpm: this code would work incorrectly when used via `mount -o remount',
    because cached inodes would not be updated. But fatfs's fat_remount() is
    basically a no-op anyway.

    Signed-off-by: Jan Kara
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

17 Dec, 2012

1 commit

  • Pull ext4 update from Ted Ts'o:
    "There are two major features for this merge window. The first is
    inline data, which allows small files or directories to be stored in
    the in-inode extended attribute area. (This requires that the file
    system use inodes which are at least 256 bytes; 128-byte
    inodes do not have any room for in-inode xattrs.)

    The second new feature is SEEK_HOLE/SEEK_DATA support. This is
    enabled by the extent status tree patches, and this infrastructure
    will be used to further optimize ext4 in the future.

    Beyond that, we have the usual collection of code cleanups and bug
    fixes."
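
    A short Python sketch of how userspace can use SEEK_DATA/SEEK_HOLE
    through lseek() to enumerate the data regions of a possibly sparse file.
    The helper name is hypothetical. As we understand it, on filesystems
    without native support the generic fallback reports the whole file as a
    single data region, so the loop still terminates correctly there.

```python
import errno
import os


def data_segments(fd):
    """Return a list of (start, end) byte ranges that contain data."""
    size = os.fstat(fd).st_size
    segments = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)  # next byte of data
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains up to EOF
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        segments.append((start, end))
        pos = end
    return segments
```

    A backup or copy tool can then read only the returned ranges and
    recreate the holes with a seek, instead of reading long runs of zeroes.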

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
    ext4: zero out inline data using memset() instead of empty_zero_page
    ext4: ensure Inode flags consistency are checked at build time
    ext4: Remove CONFIG_EXT4_FS_XATTR
    ext4: remove unused variable from ext4_ext_in_cache()
    ext4: remove redundant initialization in ext4_fill_super()
    ext4: remove redundant code in ext4_alloc_inode()
    ext4: use sync_inode_metadata() when syncing inode metadata
    ext4: enable ext4 inline support
    ext4: let fallocate handle inline data correctly
    ext4: let ext4_truncate handle inline data correctly
    ext4: evict inline data out if we need to store xattr in inode
    ext4: let fiemap work with inline data
    ext4: let ext4_rename handle inline dir
    ext4: let empty_dir handle inline dir
    ext4: let ext4_delete_entry() handle inline data
    ext4: make ext4_delete_entry generic
    ext4: let ext4_find_entry handle inline data
    ext4: create a new function search_dir
    ext4: let ext4_readdir handle inline data
    ext4: let add_dir_entry handle inline data properly
    ...

    Linus Torvalds
     

15 Dec, 2012

1 commit

  • Pull x86 EFI update from Peter Anvin:
    "EFI tree, from Matt Fleming. Most of the patches are the new efivarfs
    filesystem by Matt Garrett & co. The balance is support for EFI
    wallclock in the absence of a hardware-specific driver, and various
    fixes and cleanups."

    * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    efivarfs: Make efivarfs_fill_super() static
    x86, efi: Check table header length in efi_bgrt_init()
    efivarfs: Use query_variable_info() to limit kmalloc()
    efivarfs: Fix return value of efivarfs_file_write()
    efivarfs: Return a consistent error when efivarfs_get_inode() fails
    efivarfs: Make 'datasize' unsigned long
    efivarfs: Add unique magic number
    efivarfs: Replace magic number with sizeof(attributes)
    efivarfs: Return an error if we fail to read a variable
    efi: Clarify GUID length calculations
    efivarfs: Implement exclusive access for {get,set}_variable
    efivarfs: efivarfs_fill_super() ensure we clean up correctly on error
    efivarfs: efivarfs_fill_super() ensure we free our temporary name
    efivarfs: efivarfs_fill_super() fix inode reference counts
    efivarfs: efivarfs_create() ensure we drop our reference on inode on error
    efivarfs: efivarfs_file_read ensure we free data in error paths
    x86-64/efi: Use EFI to deal with platform wall clock (again)
    x86/kernel: remove tboot 1:1 page table creation code
    x86, efi: 1:1 pagetable mapping for virtual EFI calls
    x86, mm: Include the entire kernel memory map in trampoline_pgd
    ...

    Linus Torvalds
     

13 Dec, 2012

1 commit

  • Pull xfs update from Ben Myers:
    "There is plenty going on, including the cleanup of xfssyncd, metadata
    verifiers, CRC infrastructure for the log, tracking of inodes with
    speculative allocation, a cleanup of xfs_fs_subr.c, fixes for
    XFS_IOC_ZERO_RANGE, an important fix related to log replay (only
    update the last_sync_lsn when a transaction completes), a fix for
    deadlock on AGF buffers, documentation and comment updates, and a few
    more cleanups and fixes.

    Details:
    - remove the xfssyncd mess
    - only update the last_sync_lsn when a transaction completes
    - zero allocation_args on the kernel stack
    - fix AGF/alloc workqueue deadlock
    - silence uninitialised f.file warning
    - Update inode alloc comments
    - Update mount options documentation
    - report projid32bit feature in geometry call
    - speculative preallocation inode tracking
    - fix attr tree double split corruption
    - fix broken error handling in xfs_vm_writepage
    - drop buffer io reference when a bad bio is built
    - add more attribute tree trace points
    - growfs infrastructure changes for 3.8
    - fs/xfs/xfs_fs_subr.c die die die
    - add CRC infrastructure
    - add CRC checks to the log
    - Remove description of nodelaylog mount option from xfs.txt
    - inode allocation should use unmapped buffers
    - byte range granularity for XFS_IOC_ZERO_RANGE
    - fix direct IO nested transaction deadlock
    - fix stray dquot unlock when reclaiming dquots
    - fix sparse reported log CRC endian issue"

    Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch
    having been applied twice (commits eaef854335ce and 1375cb65e87b: "xfs:
    growfs: don't read garbage for new secondary superblocks") with later
    updates to the affected code in the XFS tree.

    * tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits)
    xfs: fix sparse reported log CRC endian issue
    xfs: fix stray dquot unlock when reclaiming dquots
    xfs: fix direct IO nested transaction deadlock.
    xfs: byte range granularity for XFS_IOC_ZERO_RANGE
    xfs: inode allocation should use unmapped buffers.
    xfs: Remove the description of nodelaylog mount option from xfs.txt
    xfs: add CRC checks to the log
    xfs: add CRC infrastructure
    xfs: convert buffer verifiers to an ops structure.
    xfs: connect up write verifiers to new buffers
    xfs: add pre-write metadata buffer verifier callbacks
    xfs: add buffer pre-write callback
    xfs: Add verifiers to dir2 data readahead.
    xfs: add xfs_da_node verification
    xfs: factor and verify attr leaf reads
    xfs: factor dir2 leaf read
    xfs: factor out dir2 data block reading
    xfs: factor dir2 free block reading
    xfs: verify dir2 block format buffers
    xfs: factor dir2 block read operations
    ...

    Linus Torvalds
     

11 Dec, 2012

4 commits

  • In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while
    the f2fs documentation says it is 929. Fix this inconsistency.

    Signed-off-by: Huajun Li

    Huajun Li
     
  • I moved f2fs-tools.git to kernel.org.
    I also added a new mailing list, linux-f2fs-devel@lists.sourceforge.net.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This adds a document describing the mount options, proc entries, usage, and
    design of Flash-Friendly File System, namely F2FS.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Ted has sent out an RFC about removing this feature. Eric and Jan
    confirmed that both Red Hat and SUSE enable this feature in all their
    products. David also said that "As far as I know, it's enabled in all
    Android kernels that use ext4." So it seems OK for us.

    What's more, inline data depends on xattr for its implementation, and
    to be frank, I haven't run any tests with inline data enabled while
    xattr is disabled. So I think we should add inline data and remove this
    config option in the same release.

    [ The savings if you disable CONFIG_EXT4_FS_XATTR are only 27k, which
    isn't much in the grand scheme of things. Since no one seems to be
    testing this configuration except for some automated compile farms, on
    balance we are better off removing this config option, so that it is
    effectively always enabled. -- tytso ]

    Cc: David Brown
    Cc: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Tao Ma
    Signed-off-by: "Theodore Ts'o"

    Tao Ma
     

27 Nov, 2012

1 commit


26 Nov, 2012

1 commit

  • Our server rejects compounds containing more than one write operation.
    It's unclear whether this is really permitted by the spec; with 4.0,
    it's possibly OK, with 4.1 (which has clearer limits on compound
    parameters), it's probably not OK. No client that we're aware of has
    ever done this, but in theory it could be useful.

    The source of the limitation: we need an array of iovecs to pass to the
    write operation. In the worst case that array of iovecs could have
    hundreds of elements (the maximum rwsize divided by the page size), so
    it's too big to put on the stack, or in each compound op. So we instead
    keep a single such array in the compound argument.

    We fill in that array at the time we decode the xdr operation.

    But we decode every op in the compound before executing any of them. So
    once we've used that array we can't decode another write.

    If we instead delay filling in that array till the time we actually
    perform the write, we can reuse it.

    Another option might be to switch to decoding compound ops one at a
    time. I considered doing that, but it has a number of other side
    effects, and I'd rather fix just this one problem for now.
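
    A toy Python model of the fix described above, not nfsd code: every op
    is decoded up front but keeps only a reference to its raw payload, and
    the single shared iovec array is filled just before each write executes,
    so a second write in the same compound can reuse it. All names and the
    4096-byte page size are illustrative assumptions.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for splitting payloads


@dataclass
class WriteOp:
    offset: int
    payload: bytes  # raw bytes from the XDR buffer; not split at decode time


@dataclass
class Compound:
    ops: list
    iov: list = field(default_factory=list)  # one shared scratch array


def fill_iov(iov, payload):
    """Split one write's payload into page-sized chunks in the shared array."""
    iov.clear()
    for start in range(0, len(payload), PAGE_SIZE):
        iov.append(payload[start:start + PAGE_SIZE])
    return iov


def execute(compound, file):
    """Run each op in order, filling the shared iov only at execution time."""
    for op in compound.ops:
        data = b"".join(fill_iov(compound.iov, op.payload))
        end = op.offset + len(data)
        if len(file) < end:
            file.extend(b"\0" * (end - len(file)))
        file[op.offset:end] = data
    return file
```

    Had fill_iov() run during decoding instead, the second write would have
    overwritten the array before the first write used it, which is the
    limitation this patch removes.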

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

17 Nov, 2012

1 commit

  • This is mostly a revert of 01dc52ebdf47 ("oom: remove deprecated oom_adj")
    from Davidlohr Bueso.

    It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
    kernels. A value written to it is simply scaled linearly into the
    /proc/pid/oom_score_adj range.

    The major difference is that its scheduled removal is no longer included
    in Documentation/feature-removal-schedule.txt. We do warn users with a
    single printk, though, to suggest the more powerful and supported
    /proc/pid/oom_score_adj interface.
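
    A Python sketch of the linear scaling between the two interfaces. To our
    understanding it mirrors the kernel's integer arithmetic, with the
    constants assumed to match include/uapi/linux/oom.h (OOM_DISABLE -17,
    OOM_ADJUST_MAX 15, OOM_SCORE_ADJ_MAX 1000); the function names are
    hypothetical.

```python
OOM_DISABLE = -17
OOM_ADJUST_MIN = -17
OOM_ADJUST_MAX = 15
OOM_SCORE_ADJ_MAX = 1000


def _div_trunc(a, b):
    # C integer division truncates toward zero; Python's // floors.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def oom_adj_to_score_adj(oom_adj):
    """Value stored in oom_score_adj when oom_adj is written."""
    if not OOM_ADJUST_MIN <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj out of range")
    if oom_adj == OOM_ADJUST_MAX:  # the top of the old range maps to the top
        return OOM_SCORE_ADJ_MAX
    return _div_trunc(oom_adj * OOM_SCORE_ADJ_MAX, -OOM_DISABLE)


def score_adj_to_oom_adj(score_adj):
    """Value reported when oom_adj is read back."""
    if score_adj == OOM_SCORE_ADJ_MAX:
        return OOM_ADJUST_MAX
    return _div_trunc(score_adj * -OOM_DISABLE, OOM_SCORE_ADJ_MAX)
```

    The mapping is lossy: writing 10 stores 588, which reads back as 9.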

    Reported-by: Artem S. Tashkinov
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     

11 Nov, 2012

1 commit


08 Nov, 2012

1 commit