15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

07 Jan, 2016

1 commit


07 Aug, 2014

1 commit


13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write caused an implied,
    unconditional syncfs(). This seems pretty stupid; it is certainly not
    documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
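
The narrower policy described in this message (sync only when going from read-write to read-only) can be sketched in user space as follows. This is a minimal model, not kernel code: `struct fs_state`, `flush_dirty_data()` and `remount()` are hypothetical names, with `flush_dirty_data()` standing in for sync_filesystem().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mounted file system. */
struct fs_state {
	bool read_only;   /* current mount mode */
	int sync_calls;   /* counts how often dirty data was flushed */
};

/* Hypothetical stand-in for sync_filesystem(). */
static void flush_dirty_data(struct fs_state *fs)
{
	fs->sync_calls++;
}

/* Remount handler that syncs only on the read-write -> read-only transition. */
static void remount(struct fs_state *fs, bool want_read_only)
{
	if (!fs->read_only && want_read_only)
		flush_dirty_data(fs);
	fs->read_only = want_read_only;
}
```

With this model, a no-op remount of a read-write file system leaves `sync_calls` unchanged, while a remount to read-only increments it once.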
     

20 Nov, 2013

1 commit

  • The decompressor interface and code were written from
    the point of view of single-threaded operation. In doing
    so it mixed a lot of single-threaded implementation specific
    aspects into the decompressor code and elsewhere which makes it
    difficult to seamlessly support multiple different decompressor
    implementations.

    This patch does the following:

    1. It removes compressor_options parsing from the decompressor
    init() function. This allows the decompressor init() function
    to be dynamically called to instantiate multiple decompressors,
    without the compressor options needing to be read and parsed each
    time.

    2. It moves threading and all sleeping operations out of the
    decompressors. In doing so, it makes the decompressors
    non-blocking wrappers which only deal with interfacing with
    the decompressor implementation.

    3. It splits decompressor.[ch] into decompressor generic functions
    in decompressor.[ch], and moves the single threaded
    decompressor implementation into decompressor_single.c.

    The result of this patch is that Squashfs should now be able to
    support multiple decompressors by adding new decompressor_xxx.c
    files with specialised implementations of the functions in
    decompressor_single.c.

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     

11 Mar, 2013

1 commit


03 Oct, 2012

1 commit

  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed
    RCU-freed inodes are flushed before we destroy the related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

29 Mar, 2012

1 commit

  • Pull squashfs updates from Phillip Lougher:
    "Add an extra mount time sanity check, plus some code cleanups and bug
    fixes."

    * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
    Squashfs: add mount time sanity check for block_size and block_log match
    Squashfs: fix f_pos check in get_dir_index_using_offset
    Squashfs: get rid of obsolete definitions in header file
    Squashfs: remove redundant length initialisation in squashfs_lookup
    Squashfs: remove redundant length initialisation in squashfs_readdir
    Squashfs: update comment removing reference to zlib only
    Squashfs: use define instead of constant

    Linus Torvalds
     

21 Mar, 2012

1 commit


10 Mar, 2012

1 commit

  • Squashfs currently has a sanity check for block_size less than or
    equal to the maximum block_size (1 Mbyte). This catches some
    superblock corruption, but obviously with a block_size maximum
    of 1 Mbyte there are 7 correct values (4K, 8K, 16K, 32K, etc.) and
    a lot of incorrect values which are not caught by this check.

    The Squashfs superblock, however, has both a block_size and
    a block_log (2^block_log == block_size). Checking that the block_size
    matches the block_log is a much more robust check. Corruption of the
    superblock is unlikely to produce values which match, and it also
    ensures the block_size is an exact power of two.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
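
The consistency check described in this message can be sketched as a small standalone function. This is an illustrative sketch rather than the code from the patch: `block_size_matches_log()` and `MAX_BLOCK_LOG` are hypothetical names, and the bound of 20 simply encodes the 1 Mbyte maximum mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound from the commit message: 1 Mbyte, i.e. 2^20 (name is hypothetical). */
#define MAX_BLOCK_LOG 20

/*
 * Return true when block_size is exactly 2^block_log.
 * block_log is range-checked first so the shift is well defined.
 */
static bool block_size_matches_log(uint32_t block_size, uint16_t block_log)
{
	if (block_log > MAX_BLOCK_LOG)
		return false;
	return block_size == ((uint32_t)1 << block_log);
}
```

A superblock with block_size 131072 and block_log 17 passes; the same block_size paired with block_log 16, or a block_size that is not a power of two, is rejected.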
     

14 Jan, 2012

1 commit


04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

03 Jan, 2012

1 commit


03 Nov, 2011

1 commit

  • This commit adds an option to set the device block size used to 4K.

    By default Squashfs sets the device block size (sb_min_blocksize) to 1K
    or the smallest block size supported by the block device (if larger).
    This, because blocks are packed together and unaligned in Squashfs,
    should reduce latency.

    This, however, gives poor performance on MTD NAND devices where
    the optimal I/O size is 4K (even though the devices can support
    smaller block sizes).

    Using a 4K device block size may also improve overall I/O
    performance for some file access patterns (e.g. sequential
    accesses of files in filesystem order) on all media.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
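
The block size selection described in this message can be modelled as below. This is a sketch under stated assumptions: `pick_devblk_size()` and the two constants are hypothetical names, and the function only mirrors the rule given above (use the requested size, or the device's smallest supported size if that is larger).

```c
#include <stdint.h>

#define DEVBLK_SIZE_DEFAULT 1024u  /* default 1K device block size */
#define DEVBLK_SIZE_4K      4096u  /* size selected by the new option */

/*
 * Pick the device block size: the requested size (1K or 4K), raised to
 * the smallest block size the device supports when that is larger.
 */
static uint32_t pick_devblk_size(uint32_t device_min, int use_4k)
{
	uint32_t wanted = use_4k ? DEVBLK_SIZE_4K : DEVBLK_SIZE_DEFAULT;

	return device_min > wanted ? device_min : wanted;
}
```

For a device whose smallest block size is 512 bytes, the default yields 1024 and the 4K option yields 4096; a device that only supports 4096-byte blocks gets 4096 either way.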
     

29 May, 2011

1 commit


26 May, 2011

7 commits


01 Mar, 2011

2 commits

  • Squashfs_get_sb() to squashfs_mount() conversion (commit 152a0836)
    results in a line over 80 characters.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Extend decompressor framework to handle compression options stored in
    the filesystem. These options can be used by the relevant decompressor
    at initialisation time to over-ride defaults.

    The presence of compression options in the filesystem is indicated by
    the COMP_OPT filesystem flag. If present the data is read from the
    filesystem and passed to the decompressor init function. The decompressor
    init function signature has been extended to take this data.

    Also update the init function signature in the zlib, lzo and xz
    decompressor wrappers.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
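
The flow described in this message (check the COMP_OPT flag, then hand any stored options to the decompressor init function) can be sketched as follows. Everything here is hypothetical and for illustration only: the flag bit value, the `comp_opts` layout, the default, and the function names are assumptions, not the Squashfs definitions, and byte order handling is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLAG_COMP_OPT     0x0400u  /* hypothetical bit for the COMP_OPT flag */
#define DEFAULT_DICT_SIZE 8192u    /* hypothetical built-in default */

struct comp_opts { uint32_t dict_size; };    /* hypothetical stored options */
struct decomp_state { uint32_t dict_size; }; /* hypothetical decompressor state */

/* Init takes the option data; a NULL pointer means "keep the defaults". */
static int decomp_init(struct decomp_state *st, const void *opts, size_t len)
{
	struct comp_opts parsed;

	st->dict_size = DEFAULT_DICT_SIZE;
	if (opts == NULL)
		return 0;
	if (len < sizeof(parsed))
		return -1;               /* truncated option block */
	memcpy(&parsed, opts, sizeof(parsed));
	st->dict_size = parsed.dict_size;    /* stored option overrides the default */
	return 0;
}

/* Pass option data to init only when the filesystem flag says it is present. */
static int setup_decompressor(struct decomp_state *st, uint16_t fs_flags,
			      const void *opts_on_disk, size_t len)
{
	if (fs_flags & FLAG_COMP_OPT)
		return decomp_init(st, opts_on_disk, len);
	return decomp_init(st, NULL, 0);
}
```

When the flag is clear, the stored bytes are never consulted and the default applies; when it is set, the stored value replaces the default, and a short option block is reported as an error.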
     

07 Jan, 2011

1 commit

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

29 Oct, 2010

1 commit


05 Oct, 2010

2 commits

  • The BKL is only used in put_super and fill_super, which are both protected
    by the superblock's s_umount rw_semaphore. Therefore it is safe to remove
    the BKL entirely.

    Signed-off-by: Arnd Bergmann
    Cc: Phillip Lougher

    Arnd Bergmann
     
  • This patch is a preparation necessary to remove the BKL from do_new_mount().
    It explicitly adds calls to lock_kernel()/unlock_kernel() around
    get_sb/fill_super operations for filesystems that still use the BKL.

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    do_kern_mount() is already called without the BKL when mounting the rootfs
    and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
    from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
    through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
    afs_mntpt_follow_link(). Both of the latter functions are actually the
    filesystem's follow_link inode operation. vfs_kern_mount() is calling the specified
    get_sb function and lets the filesystem do its job by calling the given
    fill_super function.

    Therefore I think it is safe to push down the BKL from the VFS to the
    low-level filesystems' get_sb/fill_super operations.

    [arnd: do not add the BKL to those file systems that already
    don't use it elsewhere]

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann
    Cc: Matthew Wilcox
    Cc: Christoph Hellwig

    Jan Blunck
     

18 May, 2010

3 commits


25 Apr, 2010

2 commits


21 Jan, 2010

2 commits


22 Sep, 2009

1 commit


13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

12 Jun, 2009

1 commit

  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

13 May, 2009

1 commit


03 Apr, 2009

1 commit