27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

24 Feb, 2013

2 commits

  • do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE
    multiple, however there was no equivalent code in mm_populate(), which
    caused issues.

    This could be fixed by introduced the same rounding in mm_populate(),
    however I think it's preferable to make do_mmap_pgoff() return populate
    as a size rather than as a boolean, so we don't have to duplicate the
    size rounding logic in mm_populate().

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
    with MCL_FUTURE in effect), we want to populate the pages within the
    newly created vmas. This may take a while as we may have to read pages
    from disk, so ideally we want to do this outside of the write-locked
    mmap_sem region.

    This change introduces mm_populate(), which is used to defer populating
    such mappings until after the mmap_sem write lock has been released.
    This is implemented as a generalization of the former do_mlock_pages(),
    which accomplished the same task but was using during mlock() /
    mlockall().

    Signed-off-by: Michel Lespinasse
    Reported-by: Andy Lutomirski
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

23 Feb, 2013

2 commits

  • Allocating a file structure in function get_empty_filp() might fail because
    of several reasons:
    - not enough memory for file structures
    - operation is not allowed
    - user is over its limit

    Currently the function returns NULL in all cases and we loose the exact
    reason of the error. All callers of get_empty_filp() assume that the function
    can fail with ENFILE only.

    Return error through pointer. Change all callers to preserve this error code.

    [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
    (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

    Signed-off-by: Anatol Pomozov
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Al Viro

    Anatol Pomozov
     
  • Signed-off-by: Al Viro

    Al Viro
     

12 Dec, 2012

1 commit

  • There was some desire in large applications using MAP_HUGETLB or
    SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
    others. This is useful together with NUMA policy: use 2MB interleaving
    on some mappings, but 1GB on local mappings.

    This patch extends the IPC/SHM syscall interfaces slightly to allow
    specifying the page size.

    It borrows some upper bits in the existing flag arguments and allows
    encoding the log of the desired page size in addition to the *_HUGETLB
    flag. When 0 is specified the default size is used, this makes the
    change fully compatible.

    Extending the internal hugetlb code to handle this is straight forward.
    Instead of a single mount it just keeps an array of them and selects the
    right mount based on the specified page size. When no page size is
    specified it uses the mount of the default page size.

    The change is not visible in /proc/mounts because internal mounts don't
    appear there. It also has very little overhead: the additional mounts
    just consume a super block, but not more memory when not used.

    I also exported the new flags to the user headers (they were previously
    under __KERNEL__). Right now only symbols for x86 and some other
    architecture for 1GB and 2MB are defined. The interface should already
    work for all other architectures though. Only architectures that define
    multiple hugetlb sizes actually need it (that is currently x86, tile,
    powerpc). However tile and powerpc have user configurable hugetlb
    sizes, so it's not easy to add defines. A program on those
    architectures would need to query sysfs and use the appropiate log2.

    [akpm@linux-foundation.org: cleanups]
    [rientjes@google.com: fix build]
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Andi Kleen
    Cc: Michael Kerrisk
    Acked-by: Rik van Riel
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hillf Danton
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

07 Sep, 2012

1 commit

  • - Store the ipc owner and creator with a kuid
    - Store the ipc group and the crators group with a kgid.
    - Add error handling to ipc_update_perms, allowing it to
    fail if the uids and gids can not be converted to kuids
    or kgids.
    - Modify the proc files to display the ipc creator and
    owner in the user namespace of the opener of the proc file.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

31 Jul, 2012

1 commit

  • If the SHMLBA definition for a native task differs from the definition for
    a compat task, the do_shmat() function would need to handle both.

    This patch introduces COMPAT_SHMLBA, which is used by the compat shmat
    syscall when calling the ipc code and allows architectures such as AArch64
    (where the native SHMLBA is 64k but the compat (AArch32) definition is
    16k) to provide the correct semantics for compat IPC system calls.

    Cc: David S. Miller
    Cc: Chris Zankel
    Cc: Arnd Bergmann
    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

08 Jun, 2012

1 commit

  • Commit 17cf28afea2a ("mm/fs: remove truncate_range") removed the
    truncate_range inode operation in favour of the fallocate file
    operation.

    When using SYSV IPC shared memory segments, calling madvise with the
    MADV_REMOVE advice on an area of shared memory will attempt to invoke
    the .fallocate function for the shm_file_operations, which is NULL and
    therefore returns -EOPNOTSUPP to userspace. The previous behaviour
    would inherit the inode_operations from the underlying tmpfs file and
    invoke truncate_range there.

    This patch restores the previous behaviour by wrapping the underlying
    fallocate function in shm_fallocate, as we do for fsync.

    [hughd@google.com: use -ENOTSUPP in shm_fallocate()]
    Signed-off-by: Will Deacon
    Acked-by: Hugh Dickins
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

01 Jun, 2012

2 commits


22 Mar, 2012

1 commit

  • When calling shmget() with SHM_HUGETLB, shmget aligns the request size to
    PAGE_SIZE, but this is not sufficient.

    Modify hugetlb_file_setup() to align requests to the huge page size, and
    to accept an address argument so that all alignment checks can be
    performed in hugetlb_file_setup(), rather than in its callers. Change
    newseg() and mmap_pgoff() to match the new prototype and eliminate a now
    redundant alignment check.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Steven Truelove
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Truelove
     

24 Jan, 2012

2 commits

  • Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in
    find_get_pages") correctly fixed an infinite loop; but left a problem
    that find_get_pages() on shmem would return 0 (appearing to callers to
    mean end of tree) when it meets a run of nr_pages swap entries.

    The only uses of find_get_pages() on shmem are via pagevec_lookup(),
    called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
    scan_mapping_unevictable_pages(). The first is already commented, and
    not worth worrying about; but the second can leave pages on the
    Unevictable list after an unusual sequence of swapping and locking.

    Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
    swap) instead of pagevec_lookup().

    But I don't want to contaminate vmscan.c with shmem internals, nor
    shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
    shmem.c, renaming it shmem_unlock_mapping(); and rename
    check_move_unevictable_page() to check_move_unevictable_pages(), looping
    down an array of pages, oftentimes under the same lock.

    Leave out the "rotate unevictable list" block: that's a leftover from
    when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
    handling involved looking at pages at tail of LRU.

    Was there significance to the sequence first ClearPageUnevictable, then
    test page_evictable, then SetPageUnevictable here? I think not, we're
    under LRU lock, and have no barriers between those.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Shaohua Li
    Cc: Eric Dumazet
    Cc: Johannes Weiner
    Cc: Michel Lespinasse
    Cc: [back to 3.1 but will need respins]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
    evictable again once the shared memory is unlocked. It does this with
    pagevec_lookup()s across the whole object (which might occupy most of
    memory), and takes 300ms to unlock 7GB here. A cond_resched() every
    PAGEVEC_SIZE pages would be good.

    However, KOSAKI-san points out that this is called under shmem.c's
    info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
    There is no strong reason for that: we need to take these pages off the
    unevictable list soonish, but those locks are not required for it.

    So move the call to scan_mapping_unevictable_pages() from shmem.c's
    unlock handling up to shm.c's unlock handling. Remove the recently
    added barrier, not needed now we have spin_unlock() before the scan.

    Use get_file(), with subsequent fput(), to make sure we have a reference
    to mapping throughout scan_mapping_unevictable_pages(): that's something
    that was previously guaranteed by the shm_lock().

    Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
    time, and we lazily discover them to be Unevictable later, so it serves
    no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
    pages still on pagevec are not marked Unevictable.

    The original code avoided redundant rescans by checking VM_LOCKED flag
    at its level: now avoid them by checking shp's SHM_LOCKED.

    The original code called scan_mapping_unevictable_pages() on a locked
    area at shm_destroy() time: perhaps we once had accounting cross-checks
    which required that, but not now, so skip the overhead and just let
    inode eviction deal with them.

    Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
    under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
    more as comment than to save space; comment them used for SHM_UNLOCK.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Shaohua Li
    Cc: Eric Dumazet
    Cc: Johannes Weiner
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

05 Aug, 2011

1 commit

  • This isn't really critical any more, since other patches (commit
    298507d4d2cf: "shm: optimize exit_shm()") have caused us to not actually
    need to touch the rw_mutex unless there are actual shm segments
    associated with the namespace, but we really should do tne shm_init_ns()
    earlier than we do now.

    This, together with commit 288d5abec831 ("Boot up with usermodehelper
    disabled") will mean that we really do initialize the initial ipc
    namespace data structure before we run any tasks.

    Tested-by: Marc Zyngier
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Aug, 2011

2 commits

  • We may optimistically check .in_use == 0 without holding the rw_mutex:
    it's the common case, and if it's zero, there certainly won't be any
    segments associated with us.

    After taking the lock, the idr_for_each() will do the right thing, so we
    could now drop the re-check inside the lock without any real cost. But
    it won't hurt.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Commit 4c677e2eefdb ("shm: optimize locking and ipc_namespace getting")
    introduced a copy-paste bug. Due to the bug cycle optimizations were
    disabled.

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

31 Jul, 2011

2 commits

  • shm_lock() does a lookup of shm segment in shm_ids(ns).ipcs_idr, which
    is redundant as we already know shmid_kernel address. An actual lock is
    also not required for reads until we really want to destroy the segment.

    exit_shm() and shm_destroy_orphaned() may avoid the loop by checking
    whether there is at least one segment in current ipc_namespace.

    The check of nsproxy and ipc_ns against NULL is redundant as exit_shm()
    is called from do_exit() before the call to exit_notify(), so the
    dereferencing current->nsproxy->ipc_ns is guaranteed to be safe.

    Reported-by: Oleg Nesterov
    Signed-off-by: Vasiliy Kulikov
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • shm_try_destroy_orphaned() and shm_try_destroy_current() didn't handle
    the case of separate PID namespaces, but a single IPC namespace. If
    there are tasks with the same PID values using the same shmem object,
    the wrong destroy decision could be reached.

    On shm segment creation store the pointer to the creator task in
    shmid_kernel->shm_creator field and zero it on task exit. Then
    use the ->shm_creator insread of shm_cprid in both functions. As
    shmid_kernel object is already locked at this stage, no additional
    locking is needed.

    Signed-off-by: Vasiliy Kulikov
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

27 Jul, 2011

1 commit

  • Add support for the shm_rmid_forced sysctl. If set to 1, all shared
    memory objects in current ipc namespace will be automatically forced to
    use IPC_RMID.

    The POSIX way of handling shmem allows one to create shm objects and
    call shmdt(), leaving shm object associated with no process, thus
    consuming memory not counted via rlimits.

    With shm_rmid_forced=1 the shared memory object is counted at least for
    one process, so OOM killer may effectively kill the fat process holding
    the shared memory.

    It obviously breaks POSIX - some programs relying on the feature would
    stop working. So set shm_rmid_forced=1 only if you're sure nobody uses
    "orphaned" memory. Use shm_rmid_forced=0 by default for compatability
    reasons.

    The feature was previously impemented in -ow as a configure option.

    [akpm@linux-foundation.org: fix documentation, per Randy]
    [akpm@linux-foundation.org: fix warning]
    [akpm@linux-foundation.org: readability/conventionality tweaks]
    [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
    Signed-off-by: Vasiliy Kulikov
    Cc: Randy Dunlap
    Cc: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Daniel Lezcano
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Alan Cox
    Cc: Solar Designer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

27 May, 2011

1 commit

  • The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
    'unsigned int'. This patch fixes such misuse.

    Signed-off-by: KOSAKI Motohiro
    [ Changed to use a typedef - we'll extend it to cover more cases
    later, since there has been discussion about making it a 64-bit
    type.. - Linus ]
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

30 Oct, 2010

1 commit


28 Oct, 2010

1 commit

  • The kernel currently provides no functionality to analyze the RSS and swap
    space usage of each individual sysvipc shared memory segment.

    This patch adds this info for each existing shm segment by extending the
    output of /proc/sysvipc/shm by two columns for RSS and swap.

    Since shmctl(SHM_INFO) already provides a similiar calculation (it
    currently sums up all RSS/swap info for all segments), I did split out a
    static function which is now used by the /proc/sysvipc/shm output and
    shmctl(SHM_INFO).

    SAP products (esp. the SAP Netweaver ABAP Kernel) uses lots of big shared
    memory segments (we often have Linux systems with >= 16GB shm usage).
    Sometimes we get customer reports about "slow" system responses and while
    looking into their configurations we often find massive swapping activity
    on the system. With this patch it's now easy to see from the command line
    if and which shm segments gets swapped out (and how much) and can more
    easily give recommendations for system tuning. Without the patch it's
    currently not possible to do such shm analysis at all.

    Also...

    Add some spaces in front of the "size" field for 64bit kernels to get the
    columns correct if you cat the contents of the file. In
    sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
    format, which is defined like this:

    #if BITS_PER_LONG
    Cc: Manfred Spraul
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Helge Deller
     

15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

28 May, 2010

1 commit


13 Mar, 2010

1 commit

  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in
    3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
    fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

17 Jan, 2010

1 commit

  • Commit c4caa778157dbbf04116f0ac2111e389b5cd7a29 ("file
    ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()")
    broke SYSV SHM for NOMMU by taking away the pointer to
    shm_get_unmapped_area() from shm_file_operations.

    Put it back conditionally on CONFIG_MMU=n.

    file->f_ops->get_unmapped_area() is used to find out the base address for a
    mapping of a mappable chardev device or mappable memory-based file (such as a
    ramfs file). It needs to be called prior to file->f_ops->mmap() being called.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Dec, 2009

3 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • There are 2 groups of alloc_file() callers:
    * ones that are followed by ima_counts_get
    * ones giving non-regular files
    So let's pull that ima_counts_get() into alloc_file();
    it's a no-op in case of non-regular files.

    Signed-off-by: Al Viro

    Al Viro
     
  • ... and have the caller grab both mnt and dentry; kill
    leak in infiniband, while we are at it.

    Signed-off-by: Al Viro

    Al Viro
     

16 Dec, 2009

1 commit

  • We have apparently had a memory leak since
    7ca7e564e049d8b350ec9d958ff25eaa24226352 "ipc: store ipcs into IDRs" in
    2007. The idr of which 3 exist for each ipc namespace is never freed.

    This patch simply frees them when the ipcns is freed. I don't believe any
    idr_remove() are done from rcu (and could therefore be delayed until after
    this idr_destroy()), so the patch should be safe. Some quick testing
    showed no harm, and the memory leak fixed.

    Caught by kmemleak.

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

11 Dec, 2009

1 commit


28 Sep, 2009

1 commit


22 Sep, 2009

1 commit

  • This patchset adds a flag to mmap that allows the user to request that an
    anonymous mapping be backed with huge pages. This mapping will borrow
    functionality from the huge page shm code to create a file on the kernel
    internal mount and use it to approximate an anonymous mapping. The
    MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
    both flags being preset.

    A new flag is necessary because there is no other way to hook into huge
    pages without creating a file on a hugetlbfs mount which wouldn't be
    MAP_ANONYMOUS.

    To userspace, this mapping will behave just like an anonymous mapping
    because the file is not accessible outside of the kernel.

    This patchset is meant to simplify the programming model. Presently there
    is a large chunk of boiler platecode, contained in libhugetlbfs, required
    to create private, hugepage backed mappings. This patch set would allow
    use of hugepages without linking to libhugetlbfs or having hugetblfs
    mounted.

    Unification of the VM code would provide these same benefits, but it has
    been resisted each time that it has been suggested for several reasons: it
    would break PAGE_SIZE assumptions across the kernel, it makes page-table
    abstractions really expensive, and it does not provide any benefit on
    architectures that do not support huge pages, incurring fast path
    penalties without providing any benefit on these architectures.

    This patch:

    There are two means of creating mappings backed by huge pages:

    1. mmap() a file created on hugetlbfs
    2. Use shm which creates a file on an internal mount which essentially
    maps it MAP_SHARED

    The internal mount is only used for shared mappings but there is very
    little that stops it being used for private mappings. This patch extends
    hugetlbfs_file_setup() to deal with the creation of files that will be
    mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
    used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.

    Signed-off-by: Eric Munson
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Adam Litke
    Cc: David Gibson
    Cc: Lee Schermerhorn
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     

15 Sep, 2009

1 commit

  • My 353d5c30c666580347515da609dd74a2b8e9b828 "mm: fix hugetlb bug due to
    user_shm_unlock call" broke the CONFIG_SYSVIPC !CONFIG_MMU build of both
    2.6.31 and 2.6.30.6: "undefined reference to `user_shm_unlock'".

    gcc didn't understand my comment! so couldn't figure out to optimize
    away user_shm_unlock() from the error path in the hugetlb-less case, as
    it does elsewhere. Help it to do so, in a language it understands.

    Reported-by: Mike Frysinger
    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

25 Aug, 2009

1 commit

  • 2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
    user_shm_lock() calls in hugetlb_file_setup() but left the
    user_shm_unlock call in shm_destroy().

    In detail:
    Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
    is not called in hugetlb_file_setup(). However, user_shm_unlock() is
    called in any case in shm_destroy() and in the following
    atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
    up->__count gets zero, also cleanup_user_struct() is scheduled.

    Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
    However, the ref counter up->__count gets unexpectedly non-positive and
    the corresponding structs are freed even though there are live
    references to them, resulting in a kernel oops after a lots of
    shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.

    Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
    time of shm_destroy() may give a different answer from at the time
    of hugetlb_file_setup(). And fixed newseg()'s no_id error path,
    which has missed user_shm_unlock() ever since it came in 2.6.9.

    Reported-by: Stefan Huber
    Signed-off-by: Hugh Dickins
    Tested-by: Stefan Huber
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

11 Jun, 2009

1 commit