14 Apr, 2014

1 commit

  • Some versions of gcc even warn about it:

    mm/shmem.c: In function ‘shmem_file_aio_read’:
    mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function

    If the loop is aborted during the first iteration by one of the two
    first break statements, error will be uninitialized.

    Introduced by commit 6e58e79db8a1 ("introduce copy_page_to_iter, kill
    loop over iovec in generic_file_aio_read()").

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencie between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

12 Apr, 2014

1 commit


08 Apr, 2014

2 commits

  • mem_cgroup_newpage_charge is used only for charging anonymous memory so
    it is better to rename it to mem_cgroup_charge_anon.

    mem_cgroup_cache_charge is used for file backed memory so rename it to
    mem_cgroup_charge_file.

    Signed-off-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • In shmem/tmpfs, we also use the generic filemap_map_pages, seems the
    additional checking is not worth a separate version of map_pages for it.

    Signed-off-by: Ning Qu
    Acked-by: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ning Qu
     

04 Apr, 2014

2 commits

  • shmem mappings already contain exceptional entries where swap slot
    information is remembered.

    To be able to store eviction information for regular page cache, prepare
    every site dealing with the radix trees directly to handle entries other
    than pages.

    The common lookup functions will filter out non-page entries and return
    NULL for page cache holes, just as before. But provide a raw version of
    the API which returns non-page entries as well, and switch shmem over to
    use it.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Page cache radix tree slots are usually stabilized by the page lock, but
    shmem's swap cookies have no such thing. Because the overall truncation
    loop is lockless, the swap entry is currently confirmed by a tree lookup
    and then deleted by another tree lookup under the same tree lock region.

    Use radix_tree_delete_item() instead, which does the verification and
    deletion with only one lookup. This also allows removing the
    delete-only special case from shmem_radix_tree_replace().

    Signed-off-by: Johannes Weiner
    Reviewed-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

02 Apr, 2014

2 commits


29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

26 Jan, 2014

1 commit


24 Jan, 2014

1 commit

  • Most of the VM_BUG_ON assertions are performed on a page. Usually, when
    one of these assertions fails we'll get a BUG_ON with a call stack and
    the registers.

    I've recently noticed based on the requests to add a small piece of code
    that dumps the page to various VM_BUG_ON sites that the page dump is
    quite useful to people debugging issues in mm.

    This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
    VM_BUG_ON() does, also dumps the page before executing the actual
    BUG_ON.

    [akpm@linux-foundation.org: fix up includes]
    Signed-off-by: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

02 Dec, 2013

1 commit

  • We have a problem where the big_key key storage implementation uses a
    shmem backed inode to hold the key contents. Because of this detail of
    implementation LSM checks are being done between processes trying to
    read the keys and the tmpfs backed inode. The LSM checks are already
    being handled on the key interface level and should not be enforced at
    the inode level (since the inode is an implementation detail, not a
    part of the security model)

    This patch implements a new function shmem_kernel_file_setup() which
    returns the equivalent to shmem_file_setup() only the underlying inode
    has S_PRIVATE set. This means that all LSM checks for the inode in
    question are skipped. It should only be used for kernel internal
    operations where the inode is not exposed to userspace without proper
    LSM checking. It is possible that some other users of
    shmem_file_setup() should use the new interface, but this has not been
    explored.

    Reproducing this bug is a little bit difficult. The steps I used on
    Fedora are:

    (1) Turn off selinux enforcing:

    setenforce 0

    (2) Create a huge key

    k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`

    (3) Access the key in another context:

    runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null

    (4) Examine the audit logs:

    ausearch -m AVC -i --subject httpd_t | audit2allow

    If the last command's output includes a line that looks like:

    allow httpd_t user_tmpfs_t:file { open read };

    There was an inode check between httpd and the tmpfs filesystem. With
    this patch no such denial will be seen. (NOTE! you should clear your
    audit log if you have tested for this previously)

    (Please return you box to enforcing)

    Signed-off-by: Eric Paris
    Signed-off-by: David Howells
    cc: Hugh Dickins
    cc: linux-mm@kvack.org

    Eric Paris
     

12 Sep, 2013

2 commits

  • Conditionally call the appropriate fs_init function and fill_super
    functions. Add a use once guard to shmem_init() to simply succeed on a
    second call.

    (Note that IS_ENABLED() is a compile time constant so dead code
    elimination removes unused function calls when CONFIG_TMPFS is disabled.)

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
    one such possible user), the following race can happen:

    radix_tree_preload()
    ...
    radix_tree_insert()
    radix_tree_node_alloc()
    if (rtp->nr) {
    ret = rtp->nodes[rtp->nr - 1];

    ...
    radix_tree_preload()
    ...
    radix_tree_insert()
    radix_tree_node_alloc()
    if (rtp->nr) {
    ret = rtp->nodes[rtp->nr - 1];

    And we give out one radix tree node twice. That clearly results in radix
    tree corruption with different results (usually OOPS) depending on which
    two users of radix tree race.

    We fix the problem by making radix_tree_node_alloc() always allocate fresh
    radix tree nodes when in interrupt. Using preloading when in interrupt
    doesn't make sense since all the allocations have to be atomic anyway and
    we cannot steal nodes from process-context users because some users rely
    on radix_tree_insert() succeeding after radix_tree_preload().
    in_interrupt() check is somewhat ugly but we cannot simply key off passed
    gfp_mask as that is acquired from root_gfp_mask() and thus the same for
    all preload users.

    Another part of the fix is to avoid node preallocation in
    radix_tree_preload() when passed gfp_mask doesn't allow waiting. Again,
    preallocation in such case doesn't make sense and when preallocation would
    happen in interrupt we could possibly leak some allocated nodes. However,
    some users of radix_tree_preload() require following radix_tree_insert()
    to succeed. To avoid unexpected effects for these users,
    radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
    and we provide a new function radix_tree_maybe_preload() for those users
    which get different gfp mask from different call sites and which are
    prepared to handle radix_tree_insert() failure.

    Signed-off-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

04 Sep, 2013

1 commit


25 Aug, 2013

1 commit


05 Aug, 2013

1 commit

  • Commit 46a1c2c7ae53 ("vfs: export lseek_execute() to modules") broke the
    tmpfs SEEK_DATA/SEEK_HOLE implementation, because vfs_setpos() converts
    the carefully prepared -ENXIO to -EINVAL. Other filesystems avoid it in
    error cases: do the same in tmpfs.

    Signed-off-by: Hugh Dickins
    Cc: Jie Liu
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

04 Jul, 2013

1 commit

  • Pull security subsystem updates from James Morris:
    "In this update, Smack learns to love IPv6 and to mount a filesystem
    with a transmutable hierarchy (i.e. security labels are inherited
    from parent directory upon creation rather than creating process).

    The rest of the changes are maintenance"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (37 commits)
    tpm/tpm_i2c_infineon: Remove unused header file
    tpm: tpm_i2c_infinion: Don't modify i2c_client->driver
    evm: audit integrity metadata failures
    integrity: move integrity_audit_msg()
    evm: calculate HMAC after initializing posix acl on tmpfs
    maintainers: add Dmitry Kasatkin
    Smack: Fix the bug smackcipso can't set CIPSO correctly
    Smack: Fix possible NULL pointer dereference at smk_netlbl_mls()
    Smack: Add smkfstransmute mount option
    Smack: Improve access check performance
    Smack: Local IPv6 port based controls
    tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
    maintainers: Remove Kent from maintainers
    tpm: move TPM_DIGEST_SIZE defintion
    tpm_tis: missing platform_driver_unregister() on error in init_tis()
    security: clarify cap_inode_getsecctx description
    apparmor: no need to delay vfree()
    apparmor: fix fully qualified name parsing
    apparmor: fix setprocattr arg processing for onexec
    apparmor: localize getting the security context to a few macros
    ...

    Linus Torvalds
     

03 Jul, 2013

1 commit

  • For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
    SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
    matter in lseek_execute() to update the current file offset
    to the desired offset if it is valid, ceph also does the
    simliar things at ceph_llseek().

    To reduce the duplications, this patch make lseek_execute()
    public accessible so that we can call it directly from the
    underlying file systems.

    Thanks Dave Chinner for this suggestion.

    [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

    v2->v1:
    - Add kernel-doc comments for lseek_execute()
    - Call lseek_execute() in ceph->llseek()

    Signed-off-by: Jie Liu
    Cc: Dave Chinner
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: Ben Myers
    Cc: Ted Tso
    Cc: Hugh Dickins
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Sage Weil
    Signed-off-by: Al Viro

    Jie Liu
     

29 Jun, 2013

1 commit


20 Jun, 2013

1 commit

  • Included in the EVM hmac calculation is the i_mode. Any changes to
    the i_mode need to be reflected in the hmac. shmem_mknod() currently
    calls generic_acl_init(), which modifies the i_mode, after calling
    security_inode_init_security(). This patch reverses the order in
    which they are called.

    Reported-by: Sven Vermeulen
    Signed-off-by: Mimi Zohar
    Acked-by: Hugh Dickins

    Mimi Zohar
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

30 Apr, 2013

1 commit


02 Mar, 2013

1 commit


27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

3 commits

  • This patch is a follow up on below patch:

    [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
    commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63

    Signed-off-by: Namjae Jeon
    Signed-off-by: Vivek Trivedi
    Acked-by: Steven Whitehouse
    Acked-by: Sage Weil
    Signed-off-by: Al Viro

    Namjae Jeon
     
  • Note that provided ->d_dname() reproduces what we used to get for
    those guys in e.g. /proc/self/maps; it might be a good idea to change
    that to something less ugly, but for now let's keep the existing
    user-visible behaviour

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull user namespace and namespace infrastructure changes from Eric W Biederman:
    "This set of changes starts with a few small enhnacements to the user
    namespace. reboot support, allowing more arbitrary mappings, and
    support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
    user namespace root.

    I do my best to document that if you care about limiting your
    unprivileged users that when you have the user namespace support
    enabled you will need to enable memory control groups.

    There is a minor bug fix to prevent overflowing the stack if someone
    creates way too many user namespaces.

    The bulk of the changes are a continuation of the kuid/kgid push down
    work through the filesystems. These changes make using uids and gids
    typesafe which ensures that these filesystems are safe to use when
    multiple user namespaces are in use. The filesystems converted for
    3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
    changes for these filesystems were a little more involved so I split
    the changes into smaller hopefully obviously correct changes.

    XFS is the only filesystem that remains. I was hoping I could get
    that in this release so that user namespace support would be enabled
    with an allyesconfig or an allmodconfig but it looks like the xfs
    changes need another couple of days before it they are ready."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
    cifs: Enable building with user namespaces enabled.
    cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
    cifs: Convert struct cifs_sb_info to use kuids and kgids
    cifs: Modify struct smb_vol to use kuids and kgids
    cifs: Convert struct cifsFileInfo to use a kuid
    cifs: Convert struct cifs_fattr to use kuid and kgids
    cifs: Convert struct tcon_link to use a kuid.
    cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
    cifs: Convert from a kuid before printing current_fsuid
    cifs: Use kuids and kgids SID to uid/gid mapping
    cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
    cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
    cifs: Override unmappable incoming uids and gids
    nfsd: Enable building with user namespaces enabled.
    nfsd: Properly compare and initialize kuids and kgids
    nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
    nfsd: Modify nfsd4_cb_sec to use kuids and kgids
    nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
    nfsd: Convert nfsxdr to use kuids and kgids
    nfsd: Convert nfs3xdr to use kuids and kgids
    ...

    Linus Torvalds
     

24 Feb, 2013

3 commits

  • Fix several mempolicy leaks in the tmpfs mount logic. These leaks are
    slow - on the order of one object leaked per mount attempt.

    Leak 1 (umount doesn't free mpol allocated in mount):
    while true; do
    mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
    umount /mnt
    done

    Leak 2 (errors parsing remount options will leak mpol):
    mount -t tmpfs -o size=100M nodev /mnt
    while true; do
    mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
    done
    umount /mnt

    Leak 3 (multiple mpol per mount leak mpol):
    while true; do
    mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
    umount /mnt
    done

    This patch fixes all of the above. I could have broken the patch into
    three pieces but is seemed easier to review as one.

    [akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
    Signed-off-by: Greg Thelen
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
    option is not specified in the remount request. A new policy can be
    specified if mpol=M is given.

    Before this patch remounting an mpol bound tmpfs without specifying
    mpol= mount option in the remount request would set the filesystem's
    mempolicy object to a freed mempolicy object.

    To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
    # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
    # panic here

    Panic:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
    mpol_shared_policy_init+0xa5/0x160
    shmem_get_inode+0x209/0x270
    shmem_mknod+0x3e/0xf0
    shmem_create+0x18/0x20
    vfs_create+0xb5/0x130
    do_last+0x9a1/0xea0
    path_openat+0xb3/0x4d0
    do_filp_open+0x42/0xa0
    do_sys_open+0xfe/0x1e0
    compat_sys_open+0x1b/0x20
    cstar_dispatch+0x7/0x1f

    Non-debug kernels will not crash immediately because referencing the
    dangling mpol will not cause a fault. Instead the filesystem will
    reference a freed mempolicy object, which will cause unpredictable
    behavior.

    The problem boils down to a dropped mpol reference below if
    shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */

    This patch avoids the crash by not releasing the mempolicy if
    shmem_parse_options() doesn't create a new mpol.

    How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
    not look back further.

    Signed-off-by: Greg Thelen
    Acked-by: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • In shmem_find_get_pages_and_swap(), use the faster radix tree iterator
    construct from commit 78c1d78488a3 ("radix-tree: introduce bit-optimized
    iterator").

    Signed-off-by: Johannes Weiner
    Acked-by: Hugh Dickins
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

23 Feb, 2013

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Allocating a file structure in function get_empty_filp() might fail because
    of several reasons:
    - not enough memory for file structures
    - operation is not allowed
    - user is over its limit

    Currently the function returns NULL in all cases and we loose the exact
    reason of the error. All callers of get_empty_filp() assume that the function
    can fail with ENFILE only.

    Return error through pointer. Change all callers to preserve this error code.

    [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
    (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

    Signed-off-by: Anatol Pomozov
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Al Viro

    Anatol Pomozov
     
  • Signed-off-by: Al Viro

    Al Viro
     

27 Jan, 2013

1 commit

  • There is no backing store to tmpfs and file creation rules are the
    same as for any other filesystem so it is semantically safe to allow
    unprivileged users to mount it. ramfs is safe for the same reasons so
    allow either flavor of tmpfs to be mounted by a user namespace root
    user.

    The memory control group successfully limits how much memory tmpfs can
    consume on any system that cares about a user namespace root using
    tmpfs to exhaust memory the memory control group can be deployed.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

03 Jan, 2013

1 commit


18 Dec, 2012

1 commit


13 Dec, 2012

1 commit

  • Revert 3.5's commit f21f8062201f ("tmpfs: revert SEEK_DATA and
    SEEK_HOLE") to reinstate 4fb5ef089b28 ("tmpfs: support SEEK_DATA and
    SEEK_HOLE"), with the intervening additional arg to
    generic_file_llseek_size().

    In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
    SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
    it on tmpfs, so let's join the party.

    It's quite easy for tmpfs to scan the radix_tree to support llseek's new
    SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
    still on my mind (in particular, the !PageUptodate-ness of pages
    fallocated but still unwritten).

    [akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
    Signed-off-by: Hugh Dickins
    Cc: Dave Chinner
    Cc: Jaegeuk Hanse
    Cc: "Theodore Ts'o"
    Cc: Zheng Liu
    Cc: Jeff liu
    Cc: Paul Eggert
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Cc: Andi Kleen
    Cc: Andreas Dilger
    Cc: Marco Stornelli
    Cc: Chris Mason
    Cc: Sunil Mushran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

07 Dec, 2012

1 commit

  • This fixes a regression in 3.7-rc, which has since gone into stable.

    Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount
    imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
    refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
    on expecting alloc_page_vma() to drop the refcount it had acquired.
    This deserves a rework: but for now fix the leak in shmem_alloc_page().

    Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
    the same refcounting there as in shmem_alloc_page(), delete its onstack
    mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
    those were invented to let swapin_readahead() make an unknown number of
    calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e,
    alloc_pages_vma() has kept refcount in balance, so now no problem.

    Reported-and-tested-by: Tommi Rantala
    Signed-off-by: Mel Gorman
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Mel Gorman