04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
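
The flag check described in this commit can be sketched in userspace C. This is a minimal illustration, not the kernel code: `mapping_sketch`, `mapping_set_exiting_sketch()` and `evict_page_sketch()` are hypothetical names, a pthread mutex stands in for the tree lock, and a single pointer stands in for a radix tree slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct address_space. */
struct mapping_sketch {
    pthread_mutex_t tree_lock; /* protects 'exiting' and 'slot' */
    bool exiting;              /* models the AS_EXITING flag */
    void *slot;                /* one page cache slot: page, shadow entry, or NULL */
};

/* Inode freeing side: set the flag under the tree lock before the final truncate. */
static void mapping_set_exiting_sketch(struct mapping_sketch *m)
{
    assert(m != NULL);
    pthread_mutex_lock(&m->tree_lock);
    m->exiting = true;
    pthread_mutex_unlock(&m->tree_lock);
}

/*
 * Reclaim side: evict the page in the slot. A shadow entry is left behind
 * only if the mapping is not exiting; otherwise the slot is simply cleared.
 * Returns true if the shadow entry was stored.
 */
static bool evict_page_sketch(struct mapping_sketch *m, void *shadow)
{
    bool stored;

    assert(m != NULL);
    pthread_mutex_lock(&m->tree_lock);
    stored = !m->exiting;
    m->slot = stored ? shadow : NULL;
    pthread_mutex_unlock(&m->tree_lock);
    return stored;
}
```

Because both sides take the same lock, reclaim either sees the flag and leaves the slot empty, or stores its shadow entry before the flag is set, in which case the final truncate that follows still finds and removes it.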
     

31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_vec series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

24 Jan, 2014

1 commit

  • In get_mapping_page(), after calling find_or_create_page(), the return
    value should be checked.

    This patch was posted earlier:
    http://www.spinics.net/lists/linux-fsdevel/msg66948.html but has not
    been applied yet.

    Signed-off-by: Younger Liu
    Cc: Younger Liu
    Cc: Vyacheslav Dubeyko
    Reviewed-by: Prasad Joshi
    Cc: Jörn Engel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Younger Liu
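
The kind of check this commit adds can be sketched as follows. This is a userspace illustration under stated assumptions: `find_or_create_page_sketch()` is a hypothetical stand-in that can be told to fail, and `get_mapping_page_sketch()` is not the LogFS function, only a caller that refuses to use a NULL result.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page cache page. */
struct page_sketch {
    unsigned long index;
};

/*
 * Hypothetical stand-in for find_or_create_page(): returns NULL when the
 * page cannot be allocated (forced here through 'simulate_oom').
 */
static struct page_sketch *find_or_create_page_sketch(unsigned long index,
                                                      int simulate_oom)
{
    struct page_sketch *page;

    if (simulate_oom)
        return NULL;
    page = malloc(sizeof(*page));
    if (page)
        page->index = index;
    return page;
}

/* Caller that checks the return value before touching the page. */
static int get_mapping_page_sketch(unsigned long index, int simulate_oom,
                                   struct page_sketch **out)
{
    struct page_sketch *page;

    assert(out != NULL);
    page = find_or_create_page_sketch(index, simulate_oom);
    if (!page)
        return -ENOMEM; /* report the failure instead of dereferencing NULL */
    *out = page;
    return 0;
}
```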
     

25 Nov, 2013

1 commit


24 Nov, 2013

3 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     
  • With immutable biovecs we don't want code accessing bi_io_vec directly -
    the uses this patch changes weren't incorrect since they all own the
    bio, but it makes the code harder to audit for no good reason - also,
    this will help with multipage bvecs later.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: Jaegeuk Kim
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust

    Kent Overstreet
     
  • It was being open coded in a few places.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Neil Brown
    Cc: Chris Mason
    Acked-by: NeilBrown

    Kent Overstreet
     

03 Jul, 2013

1 commit

  • Pull ext4 update from Ted Ts'o:
    "Lots of bug fixes, cleanups and optimizations. In the bug fixes
    category, of note is a fix for on-line resizing file systems where the
    block size is smaller than the page size (i.e., file systems with 1k blocks
    on x86, or more interestingly file systems with 4k blocks on Power or
    ia64 systems.)

    In the cleanup category, the ext4's punch hole implementation was
    significantly improved by Lukas Czerner, and now supports bigalloc
    file systems. In addition, Jan Kara significantly cleaned up the
    write submission code path. We also improved error checking and added
    a few sanity checks.

    In the optimizations category, two major optimizations deserve
    mention. The first is that ext4_writepages() is now used for
    nodelalloc and ext3 compatibility mode. This allows writes to be
    submitted much more efficiently as a single bio request, instead of
    being sent as individual 4k writes into the block layer (which then
    relied on the elevator code to coalesce the requests in the block
    queue). Secondly, the extent cache shrink mechanism, which was
    introduced in 3.9, no longer has a scalability bottleneck caused by the
    i_es_lru spinlock. Other optimizations include some changes to reduce
    CPU usage and to avoid issuing empty commits unnecessarily."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
    ext4: optimize starting extent in ext4_ext_rm_leaf()
    jbd2: invalidate handle if jbd2_journal_restart() fails
    ext4: translate flag bits to strings in tracepoints
    ext4: fix up error handling for mpage_map_and_submit_extent()
    jbd2: fix theoretical race in jbd2__journal_restart
    ext4: only zero partial blocks in ext4_zero_partial_blocks()
    ext4: check error return from ext4_write_inline_data_end()
    ext4: delete unnecessary C statements
    ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
    jbd2: move superblock checksum calculation to jbd2_write_superblock()
    ext4: pass inode pointer instead of file pointer to punch hole
    ext4: improve free space calculation for inline_data
    ext4: reduce object size when !CONFIG_PRINTK
    ext4: improve extent cache shrink mechanism to avoid to burn CPU time
    ext4: implement error handling of ext4_mb_new_preallocation()
    ext4: fix corruption when online resizing a fs with 1K block size
    ext4: delete unused variables
    ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
    jbd2: remove debug dependency on debug_fs and update Kconfig help text
    jbd2: use a single printk for jbd_debug()
    ...

    Linus Torvalds
     

29 Jun, 2013

1 commit


22 May, 2013

1 commit

  • Currently there is no way to truncate a partial page where the end
    truncate point is not at the end of the page. This is because it was not
    needed, and the existing functionality was enough for the file system
    truncate operation to work properly. However, more file systems now
    support the punch hole feature, and it can benefit from mm supporting
    truncating a page just up to a certain point.

    Specifically, with this functionality truncate_inode_pages_range() can
    be changed so it supports truncating a partial page at the end of the
    range (currently it will BUG_ON() if 'end' is not at the end of the
    page).

    This commit changes the invalidatepage() address space operation
    prototype to accept the range to be invalidated and updates all of its
    instances.

    We also change block_invalidatepage() in the same way and actually
    make use of the new length argument to implement range invalidation.

    Actual file system implementations will follow, except for the file
    systems where the changes are really simple and should not change the
    behaviour in any way. An implementation of truncate_page_range(), which
    will be able to accept page-unaligned ranges, will follow as well.

    Signed-off-by: Lukas Czerner
    Cc: Andrew Morton
    Cc: Hugh Dickins

    Lukas Czerner
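
To illustrate what an offset-plus-length invalidation range means for a single page, here is a small sketch that clips an inclusive byte range to one page. The helper name, the struct, and the 4096-byte page size are assumptions made for the example; they are not taken from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size for the example */

/* The part of one page covered by an invalidation range. */
struct page_range_sketch {
    unsigned long offset; /* first byte within the page */
    unsigned long length; /* number of bytes; 0 means no overlap */
};

/* Clip the inclusive byte range [lstart, lend] to the page at 'index'. */
static struct page_range_sketch clip_to_page_sketch(unsigned long index,
                                                    unsigned long lstart,
                                                    unsigned long lend)
{
    unsigned long page_start = index * SKETCH_PAGE_SIZE;
    unsigned long page_end = page_start + SKETCH_PAGE_SIZE - 1;
    unsigned long from, to;
    struct page_range_sketch r = { 0, 0 };

    assert(lstart <= lend);
    if (lend < page_start || lstart > page_end)
        return r; /* range does not touch this page */

    from = lstart > page_start ? lstart : page_start;
    to = lend < page_end ? lend : page_end;
    r.offset = from - page_start;
    r.length = to - from + 1;
    return r;
}
```

For the byte range 100 to 5000, the first page is covered from offset 100 to its end, and the second page only for its first 905 bytes. The second case is the "partial page at the end of the range" that the commit message refers to.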
     

24 Mar, 2013

1 commit

  • For immutable bvecs, all bi_idx usage needs to be audited - so here
    we're removing all the unnecessary uses.

    Most of these are places where it was being initialized on a bio that
    was just allocated, a few others are conversions to standard macros.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe

    Kent Overstreet
     

04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as its module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesystem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
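
The prefixing described in this commit can be sketched as a string-building helper. This is a userspace illustration only: `fs_module_alias_sketch()` is a hypothetical name, and the sketch just shows how a user-supplied filesystem type is confined to the "fs-" namespace before any module lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the module alias for a filesystem type. Only the first 'len' bytes
 * of 'fstype' are used. Returns 0 on success, -1 if 'buf' is too small.
 */
static int fs_module_alias_sketch(const char *fstype, size_t len,
                                  char *buf, size_t bufsize)
{
    int n;

    assert(fstype != NULL && buf != NULL);
    n = snprintf(buf, bufsize, "fs-%.*s", (int)len, fstype);
    if (n < 0 || (size_t)n >= bufsize)
        return -1; /* alias would be truncated */
    return 0;
}
```

With this scheme a request for the type "autofs" looks up "fs-autofs", so a module whose file name differs from the filesystem name can still be found through an alias, and a string that names a non-filesystem module no longer matches anything.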
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: Joern Engel
    CC: Prasad Joshi
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

21 Dec, 2012

1 commit


19 Nov, 2012

1 commit


03 Oct, 2012

3 commits

  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user namespace to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

21 Sep, 2012

1 commit


27 Aug, 2012

1 commit

  • Pull LogFS bugfixes from Prasad Joshi:

    - "logfs: query block device for number of pages to send with bio"

    This BUG was found when LogFS was used on KVM. The patch fixes
    the problem by asking the underlying block device for the number
    of pages to send with each BIO.

    - "logfs: maintain the ordering of meta-inode destruction"

    LogFS maintains file system meta-data in special inodes. These
    inodes are related to each other, therefore they must be
    destroyed in a proper order.

    - "logfs: initialize the number of iovecs in bio"

    LogFS used to panic when it was created on an encrypted LVM
    volume. The patch fixes the problem by properly initializing
    the BIO.

    Plus a couple more:
    - logfs: create a pagecache page if it is not present
    - logfs: destroy the reserved inodes while unmounting

    * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
    logfs: query block device for number of pages to send with bio
    logfs: maintain the ordering of meta-inode destruction
    logfs: create a pagecache page if it is not present
    logfs: initialize the number of iovecs in bio
    logfs: destroy the reserved inodes while unmounting

    Linus Torvalds
     

23 Jul, 2012

3 commits

  • The block device driver puts a limit on maximum number of pages that
    can be sent with the bio. Not all block devices can handle
    BIO_MAX_PAGES number of pages in bio. Specifically the virtio-blk
    driver limits it to 126. When the LogFS file system was exercised in
    KVM, the following bug from do_virtblk_request() was observed:

    static void do_virtblk_request(struct request_queue *q)
    {
            ....
            ....
            while ((req = blk_peek_request(q)) != NULL) {
                    BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
                    ....
                    ....
            }
            ....
    }

    The patch fixes the problem by querying the block device driver for the
    maximum number of pages allowed in a bio and then using at most that
    many pages during submit_bio.

    Signed-off-by: Prasad Joshi

    Prasad Joshi
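
The fix described above amounts to capping each bio at what the device reports rather than always filling it to the global maximum. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the value 256 for the global limit is an assumption made for the example.

```c
#include <assert.h>

#define SKETCH_BIO_MAX_PAGES 256U /* assumed global upper bound */

/* Pages to put in one bio: the device limit, capped by the global maximum. */
static unsigned int pages_per_bio_sketch(unsigned int device_max)
{
    assert(device_max > 0);
    return device_max < SKETCH_BIO_MAX_PAGES ? device_max
                                             : SKETCH_BIO_MAX_PAGES;
}

/* Number of bios needed to submit 'nr_pages' pages under that limit. */
static unsigned int count_bios_sketch(unsigned int nr_pages,
                                      unsigned int device_max)
{
    unsigned int per_bio = pages_per_bio_sketch(device_max);
    unsigned int bios = 0;

    while (nr_pages > 0) {
        unsigned int n = nr_pages < per_bio ? nr_pages : per_bio;

        nr_pages -= n; /* these n pages go into one bio */
        bios++;
    }
    return bios;
}
```

With a device limit of 126 pages, a 300-page write is split into three bios, none of which exceeds what the driver can map.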
     
  • LogFS does not use a specialized area to maintain the inodes. The
    inode information is kept in a specialized file called the inode file.
    Similarly, the segment information is kept in a segment file. Since
    the segment file also has an inode which is kept in the inode file,
    the inode for the segment file must be evicted before the inode for the
    inode file. The change fixes the following BUG during unmount:

    Pid: 2057, comm: umount Not tainted 3.5.0-rc6+ #25 Bochs Bochs
    RIP: 0010:[] [] move_page_to_btree+0x32/0x1f0 [logfs]
    Process umount (pid: 2057, threadinfo ...)
    Call Trace:
    [] ? find_get_pages+0x2a/0x180
    [] logfs_invalidatepage+0x85/0x90 [logfs]
    [] truncate_inode_page+0xb1/0xd0
    [] truncate_inode_pages_range+0x15f/0x490
    [] ? printk+0x78/0x7a
    [] truncate_inode_pages+0x15/0x20
    [] logfs_evict_inode+0x6c/0x190 [logfs]
    [] ? _raw_spin_unlock+0x2b/0x40
    [] evict+0xa7/0x1b0
    [] dispose_list+0x3e/0x60
    [] evict_inodes+0xf4/0x110
    [] generic_shutdown_super+0x53/0xf0
    [] logfs_kill_sb+0x52/0xf0 [logfs]
    [] deactivate_locked_super+0x45/0x80
    [] deactivate_super+0x4a/0x70
    [] mntput_no_expire+0xde/0x140
    [] sys_umount+0x6f/0x3a0
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 45f7752082cefafd ]---

    Signed-off-by: Prasad Joshi

    Prasad Joshi
     
  • While writing the partial journal entries we assumed that the page
    associated with the journal would always be locatable. This incorrect
    assumption resulted in the following BUG:

    kernel BUG at /home/benixon/WD_SMR/kernels/linux-3.3.7-logfs/fs/logfs/journal.c:569!
    EIP is at logfs_write_area+0xb6/0x109 [logfs]
    EAX: 00000000 EBX: 00000000 ECX: ef6efea4 EDX: 00000000
    ESI: 001b9000 EDI: f009e000 EBP: c3c13f14 ESP: c3c13ef0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process sync (pid: 1799, ti=c3c12000 task=f07825b0 task.ti=c3c12000)
    Stack:
    01001000 c3c13f26 781b9000 00000000 f009e000 f7286000 f1f83400 f8445071
    f1f83400 c3c13f30 f8445ae9 c3c13f20 0000100a 000ee000 f009e000 00000001
    c3c13f5c f8445d17 c05eb0ee 00000000 f1f83400 ef718000 f009e25c ea9c3d80
    Call Trace:
    [] ? account_shadow+0x16d/0x16d [logfs]
    [] logfs_write_je+0x2a/0x44 [logfs]
    [] logfs_write_anchor+0x114/0x228 [logfs]
    [] ? empty+0x5/0x5
    [] logfs_sync_fs+0x1e/0x31 [logfs]
    [] __sync_filesystem+0x5d/0x6f
    [] sync_one_sb+0x15/0x17
    [] iterate_supers+0x59/0x9a
    [] ? __sync_filesystem+0x6f/0x6f
    [] sys_sync+0x29/0x4f
    [] sysenter_do_call+0x12/0x28
    EIP: [] logfs_write_area+0xb6/0x109 [logfs] SS:ESP 0068:c3c13ef0
    ---[ end trace ef6e9ef52601a945 ]---

    The fix is to create the pagecache page if it is not locatable.

    Reported-and-tested-by: Benixon Dhas
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     

14 Jul, 2012

3 commits

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • A boolean "does it have to be exclusive?" flag is passed instead;
    local filesystems should just ignore it - the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

06 May, 2012

1 commit

  • After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode(),
    which better describes what the function really does: set the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

02 Apr, 2012

2 commits

  • This fixes the following crash when a LogFS file system, created on an
    encrypted LVM volume, was mounted:

    [ 526.548034] BUG: unable to handle kernel NULL pointer dereference at
    [ 526.550106] IP: [] memcpy+0xb/0x120
    [ 526.551008] PGD bd60067 PUD 1778d067 PMD 0
    [ 526.551783] Oops: 0000 [#1] SMP

    Pid: 2043, comm: mount
    RIP: 0010:[] [] memcpy+0xb/0x120
    Call Trace:
    kcryptd_io_read+0xdb/0x100
    crypt_map+0xfd/0x190
    __map_bio+0x48/0x150
    __split_and_process_bio+0x51b/0x630
    dm_request+0x138/0x230
    generic_make_request+0xca/0x100
    submit_bio+0x87/0x110
    sync_request+0xdd/0x120 [logfs]
    bdev_readpage+0x2e/0x70 [logfs]
    do_read_cache_page+0x82/0x180
    logfs_mount+0x2ad/0x770 [logfs]
    mount_fs+0x47/0x1c0
    vfs_kern_mount+0x72/0x110
    do_kern_mount+0x54/0x110
    do_mount+0x520/0x7f0
    sys_mount+0x90/0xe0

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42292
    Reported-by: Witold Baryluk
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     
  • We were assuming that the evict_inode() would never be called on
    reserved inodes. However, (after the commit 8e22c1a4e logfs: get rid
    of magical inodes) while unmounting the file system, in put_super, we
    call iput() on all of the reserved inodes.

    The following simple test used to cause a kernel panic on LogFS:

    1. Mount a LogFS file system on /mnt

    2. Create a file
    $ touch /mnt/a

    3. Try to unmount the FS
    $ umount /mnt

    The simple fix would be to drop the assumption and properly destroy
    the reserved inodes.

    Signed-off-by: Prasad Joshi

    Prasad Joshi
     

22 Mar, 2012

1 commit

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     

21 Mar, 2012

3 commits


20 Mar, 2012

1 commit


02 Feb, 2012

1 commit

  • This patch fixes merge conflict resolution breakage introduced by merge
    d3712b9dfcf4 ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream").

    The commit changed 'mtd_can_have_bb()' function and made it always
    return zero, which is incorrect. Instead, we need it to return whether
    the underlying flash device can have bad eraseblocks or not. UBI needs
    this information because it affects how it handles the underlying flash.
    E.g., if the underlying flash is NOR, it cannot have bad blocks and any
    write or erase error is fatal, and all we can do is to switch to R/O
    mode. We do not need to reserve a pool of good eraseblocks for bad
    eraseblocks handling, and so on.

    This patch also removes 'mtd_can_have_bb()' invocations from Logfs to
    ensure correct Logfs behavior.

    I've tested that with this patch UBI works on top of NOR and NAND
    flashes emulated by mtdram and nandsim respectively.

    This patch is based on patch from Linus Torvalds.

    Signed-off-by: Artem Bityutskiy
    Acked-by: Jörn Engel
    Acked-by: Prasad Joshi
    Acked-by: Brian Norris
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
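
The behaviour this commit asks for, reporting whether the device can have bad eraseblocks at all while still treating a missing method as "all blocks good", can be sketched as follows. The struct and the function names are hypothetical stand-ins with a `_sketch` suffix, not the MTD API itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MTD device with an optional bad-block method. */
struct mtd_sketch {
    int (*block_isbad)(const struct mtd_sketch *mtd, unsigned long ofs);
};

/* Non-zero only if the device provides a bad-block check at all. */
static int can_have_bb_sketch(const struct mtd_sketch *mtd)
{
    assert(mtd != NULL);
    return mtd->block_isbad != NULL;
}

/* A missing method means "all blocks are good", so report 0. */
static int block_isbad_sketch(const struct mtd_sketch *mtd, unsigned long ofs)
{
    assert(mtd != NULL);
    if (!mtd->block_isbad)
        return 0;
    return mtd->block_isbad(mtd, ofs);
}

/* Example NAND-like method: pretend the eraseblock at 0x20000 is bad. */
static int nand_isbad_sketch(const struct mtd_sketch *mtd, unsigned long ofs)
{
    (void)mtd;
    return ofs == 0x20000UL;
}
```

A NOR-like device leaves the method NULL, so a caller such as UBI can tell from the first helper that it does not need to reserve eraseblocks for bad-block handling.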
     

01 Feb, 2012

1 commit

  • There are a few important bug fixes for LogFS

    * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
    Logfs: Allow NULL block_isbad() methods
    logfs: Grow inode in delete path
    logfs: Free areas before calling generic_shutdown_super()
    logfs: remove useless BUG_ON
    MAINTAINERS: Add Prasad Joshi in LogFS maintiners
    logfs: Propagate page parameter to __logfs_write_inode
    logfs: set superblock shutdown flag after generic sb shutdown
    logfs: take write mutex lock during fsync and sync
    logfs: Prevent memory corruption
    logfs: update page reference count for pined pages

    Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
    "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL
    block_isbad() methods" clashing with the abstraction changes in the
    commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
    d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".

    This resolution takes the semantics from commit f2933e86ad93, and just
    makes mtd_block_isbad() return zero (false) if the 'block_isbad'
    function is NULL. But that also means that now "mtd_can_have_bb()"
    always returns 0.

    Now, "mtd_block_markbad()" will obviously return an error if the
    low-level driver doesn't support bad blocks, so this is somewhat
    non-symmetric, but it actually makes sense if a NULL "block_isbad"
    function is considered to mean "I assume that all my blocks are always
    good".

    Linus Torvalds
     

28 Jan, 2012

2 commits