06 Jan, 2009

5 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    inotify: fix type errors in interfaces
    fix breakage in reiserfs_new_inode()
    fix the treatment of jfs special inodes
    vfs: remove duplicate code in get_fs_type()
    add a vfs_fsync helper
    sys_execve and sys_uselib do not call into fsnotify
    zero i_uid/i_gid on inode allocation
    inode->i_op is never NULL
    ntfs: don't NULL i_op
    isofs check for NULL ->i_op in root directory is dead code
    affs: do not zero ->i_op
    kill suid bit only for regular files
    vfs: lseek(fd, 0, SEEK_CUR) race condition

    Linus Torvalds
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Fsync currently has a fdatawrite/fdatawait pair around the method call,
    and a mutex_lock/unlock of the inode mutex. All callers of fsync have
    to duplicate this, but we have a few and most of them don't quite get
    it right. This patch adds a new vfs_fsync that takes care of this.
    It's a little more complicated than usual as ->fsync might get a NULL file
    pointer and just a dentry from nfsd, but otherwise gets a file and we
    want to take the mapping and file operations from it when it is there.

    Notes on the fsync callers:

    - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
    lower file
    - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
    file, and returning 0 when ->fsync was missing
    - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
    taking i_mutex. Given that shared memory doesn't have disk backing,
    not doing anything in fsync seems fine, and I left it out of
    the vfs_fsync conversion for now. In that case, though, we might just
    not pass it through to the lower file at all and instead call the no-op
    simple_sync_file directly.

    [and now actually export vfs_fsync]
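
    The sequence described above can be sketched as a small userspace model.
    Everything here is a hypothetical stand-in (the model_* types, the trace
    buffer, the MODEL_EINVAL value); it is not the kernel code. The exact
    nesting of the write/lock/fsync/unlock/wait steps and the error returned
    when ->fsync is missing are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EINVAL 22   /* assumed error code for a missing ->fsync */

/* Hypothetical stand-ins for the kernel's mapping, file_operations,
 * struct file and struct dentry. */
struct model_mapping { int unused; };
struct model_fops { int (*fsync)(void *file, void *dentry, int datasync); };
struct model_file { struct model_mapping *mapping; const struct model_fops *fop; };
struct model_dentry { struct model_mapping *mapping; const struct model_fops *fop; };

static char trace[16];      /* records the order of the steps */
static size_t trace_len;

static void step(char c)
{
	if (trace_len + 1 < sizeof(trace)) {
		trace[trace_len++] = c;
		trace[trace_len] = '\0';
	}
}

static int model_fdatawrite(struct model_mapping *m) { (void)m; step('W'); return 0; }
static int model_fdatawait(struct model_mapping *m) { (void)m; step('w'); return 0; }
static void model_lock(struct model_mapping *m) { (void)m; step('L'); }
static void model_unlock(struct model_mapping *m) { (void)m; step('U'); }

/* An ->fsync method that only records that it ran. */
static int demo_fsync(void *file, void *dentry, int datasync)
{
	(void)file; (void)dentry; (void)datasync;
	step('F');
	return 0;
}

static int model_vfs_fsync(struct model_file *file, struct model_dentry *dentry,
			   int datasync)
{
	struct model_mapping *mapping;
	const struct model_fops *fop;
	int ret, err;

	assert(file || dentry);

	/* Prefer the file when there is one; otherwise use the dentry
	 * (the nfsd case mentioned above). */
	if (file) {
		mapping = file->mapping;
		fop = file->fop;
	} else {
		mapping = dentry->mapping;
		fop = dentry->fop;
	}

	/* Unlike the old coda behaviour, a missing method is an error. */
	if (!fop || !fop->fsync)
		return -MODEL_EINVAL;

	ret = model_fdatawrite(mapping);          /* start writeback */
	model_lock(mapping);
	err = fop->fsync(file, dentry, datasync); /* method call under the lock */
	if (!ret)
		ret = err;
	model_unlock(mapping);
	err = model_fdatawait(mapping);           /* wait for writeback */
	if (!ret)
		ret = err;
	return ret;
}
```

    Calling model_vfs_fsync() with a file whose fop points at demo_fsync
    leaves "WLFUw" in the trace buffer; the first error seen is the one
    reported.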

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We used to have a rather schizophrenic set of checks for NULL ->i_op even
    though it had been eliminated years ago. You'd need to go out of your
    way to set it to NULL explicitly _and_ a bunch of code would die on
    such inodes anyway. After killing two remaining places that still
    did that bogosity, all that crap can go away.

    Signed-off-by: Al Viro

    Al Viro
     
  • We don't have to do it because it is useless for non-regular files.
    In fact, a block device may trigger this path without dentry->d_inode->i_mutex.

    (akpm: concerns were expressed (by me) about S_ISDIR inodes)

    Signed-off-by: Dmitri Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dmitri Monakhov
     

05 Jan, 2009

2 commits

  • With the write_begin/write_end aops, page_symlink was broken because it
    could no longer pass a GFP_NOFS type mask into the point where the
    allocations happened. They are done in write_begin, which would always
    assume that the filesystem can be entered from reclaim. This bug could
    cause filesystem deadlocks.

    The funny thing with having a gfp_t mask there is that it doesn't really
    allow the caller to arbitrarily tinker with the context in which it can be
    called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
    take the page lock. The only thing any callers care about is __GFP_FS
    anyway, so turn that into a single flag.

    Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
    this flag in their write_begin function. Change __grab_cache_page to
    accept a nofs argument as well, to honour that flag (while we're there,
    change the name to grab_cache_page_write_begin which is more instructive
    and does away with random leading underscores).

    This is really a more flexible way to go in the end anyway -- if a
    filesystem happens to want any extra allocations aside from the pagecache
    ones in its write_begin function, it may now use GFP_KERNEL (rather than
    GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
    random example).
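
    A minimal sketch of how the flag could turn into an allocation mask,
    written as a userspace model. The MODEL_* bit values and the
    write_begin_gfp() helper are hypothetical; only the idea (drop the FS bit
    from the mapping's mask when the NOFS flag is passed) comes from the
    description above.

```c
#include <assert.h>

/* Illustrative bit values, not the kernel's definitions. */
#define MODEL_GFP_WAIT      0x1u
#define MODEL_GFP_IO        0x2u
#define MODEL_GFP_FS        0x4u
#define MODEL_GFP_KERNEL    (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS      (MODEL_GFP_WAIT | MODEL_GFP_IO)

#define MODEL_AOP_FLAG_NOFS 0x4u

/* Compute the mask used for the pagecache allocation in write_begin. */
static unsigned int write_begin_gfp(unsigned int mapping_gfp,
				    unsigned int aop_flags)
{
	unsigned int notmask = 0;

	/* This model only knows about the three bits defined above. */
	assert((mapping_gfp & ~MODEL_GFP_KERNEL) == 0);

	/* The caller cannot re-enter the filesystem: strip the FS bit. */
	if (aop_flags & MODEL_AOP_FLAG_NOFS)
		notmask = MODEL_GFP_FS;

	return mapping_gfp & ~notmask;
}
```

    With a MODEL_GFP_KERNEL mapping mask, passing MODEL_AOP_FLAG_NOFS yields
    MODEL_GFP_NOFS, and passing no flags leaves the mask unchanged.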

    [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
    [kosaki.motohiro@jp.fujitsu.com: fix fuse]
    Signed-off-by: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: [2.6.28.x]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    [ Cleaned up the calling convention: just pass in the AOP flags
    untouched to the grab_cache_page_write_begin() function. That
    just simplifies everybody, and may even allow future expansion of the
    logic. - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The flush_cache_vmap in vmap_page_range() is called with the end of the
    range twice. The following patch fixes this for me.
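
    The class of bug can be illustrated with a small userspace model. It
    assumes the mapping loop advances its address variable until it reaches
    the end of the range, so reusing that variable afterwards yields
    (end, end). All names here are hypothetical and this is not the kernel
    function.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL

struct flush_args { unsigned long start, end; };

/* Walk the range page by page and return the final cursor position. */
static unsigned long map_pages(unsigned long addr, unsigned long end)
{
	assert(addr % MODEL_PAGE_SIZE == 0);
	assert(end % MODEL_PAGE_SIZE == 0);
	assert(addr <= end);

	while (addr != end)
		addr += MODEL_PAGE_SIZE;
	return addr;
}

/* Buggy shape: the cursor is reused as the start of the flush range. */
static struct flush_args buggy_flush_args(unsigned long addr, unsigned long end)
{
	struct flush_args f;

	addr = map_pages(addr, end);   /* addr == end from here on */
	f.start = addr;
	f.end = end;
	return f;
}

/* Fixed shape: remember the start before the loop modifies the cursor. */
static struct flush_args fixed_flush_args(unsigned long addr, unsigned long end)
{
	unsigned long start = addr;
	struct flush_args f;

	(void)map_pages(addr, end);
	f.start = start;
	f.end = end;
	return f;
}
```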

    Signed-off-by: Adam Lackorzynski
    Cc: Nick Piggin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Lackorzynski
     

01 Jan, 2009

2 commits

  • Impact: Use new API

    Convert kernel mm functions to use struct cpumask.

    We skip include/linux/percpu.h and mm/allocpercpu.c, which are in flux.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Reviewed-by: Christoph Lameter

    Rusty Russell
     
  • Impact: Remove obsolete API usage

    any_online_cpu() is a good name, but it takes a cpumask_t, not a
    pointer.

    There are several places where any_online_cpu() doesn't really want a
    mask arg at all. Replace all callers with cpumask_any() and
    cpumask_any_and().
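
    The intent of the replacement can be sketched with plain bitmasks. This
    is a userspace model with hypothetical names: a mask is an unsigned long,
    MODEL_NR_CPUS is an arbitrary limit, and "any" is modelled as "first set
    bit". It is not the kernel's struct cpumask API.

```c
#include <assert.h>

#define MODEL_NR_CPUS 32

/* Return the index of some set bit, or MODEL_NR_CPUS if the mask is empty. */
static int model_cpumask_any(unsigned long mask)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if ((mask >> cpu) & 1UL)
			return cpu;
	return MODEL_NR_CPUS;
}

/* Pick a CPU that is set in both masks, e.g. "wanted" and "online". */
static int model_cpumask_any_and(unsigned long a, unsigned long b)
{
	return model_cpumask_any(a & b);
}

/* Usage: CPUs 2 and 3 wanted, CPUs 1 and 3 online, so CPU 3 is picked. */
static void model_example(void)
{
	assert(model_cpumask_any_and(0xCUL, 0xAUL) == 3);
	assert(model_cpumask_any(0UL) == MODEL_NR_CPUS);
}
```

    A caller that has no particular mask in mind would pass only the online
    mask to model_cpumask_any(), which is the case the description above
    calls out.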

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     

31 Dec, 2008

4 commits

  • Conflicts:

    arch/x86/kernel/io_apic.c

    Rusty Russell
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: avoid leaking caches or refcounts on sysfs error
    slab: Fix comment on #endif
    slab: remove GFP_THISNODE clearing from alloc_slabmgmt()
    slub: Add might_sleep_if() to slab_alloc()
    SLUB: failslab support
    slub: Fix incorrect use of loose
    slab: Update the kmem_cache_create documentation regarding the name parameter
    slub: make early_kmem_cache_node_alloc void
    slab: unsigned slabp->inuse cannot be less than 0
    slub - fix get_object_page comment
    SLUB: Replace __builtin_return_address(0) with _RET_IP_.
    SLUB: cleanup - define macros instead of hardcoded numbers

    Linus Torvalds
     
  • * 'for-2.6.29' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    bio: get rid of bio_vec clearing
    bounce: don't rely on a zeroed bio_vec list
    cciss: simplify parameters to deregister_disk function
    cfq-iosched: fix race between exiting queue and exiting task
    loop: Do not call loop_unplug for not configured loop device.
    loop: Flush possible running bios when loop device is released.
    alpha: remove dead BIO_VMERGE_BOUNDARY
    Get rid of CONFIG_LSF
    block: make blk_softirq_init() static
    block: use min_not_zero in blk_queue_stack_limits
    block: add one-hit cache for disk partition lookup
    cfq-iosched: remove limit of dispatch depth of max 4 times quantum
    nbd: tell the block layer that it is not a rotational device
    block: get rid of elevator_t typedef
    aio: make the lookup_ioctx() lockless
    bio: add support for inlining a number of bio_vecs inside the bio
    bio: allow individual slabs in the bio_set
    bio: move the slab pointer inside the bio_set
    bio: only mempool back the largest bio_vec slab cache
    block: don't use plugging on SSD devices
    ...

    Linus Torvalds
     
  • * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    stacktrace: provide save_stack_trace_tsk() weak alias
    rcu: provide RCU options on non-preempt architectures too
    printk: fix discarding message when recursion_bug
    futex: clean up futex_(un)lock_pi fault handling
    "Tree RCU": scalable classic RCU implementation
    futex: rename field in futex_q to clarify single waiter semantics
    x86/swiotlb: add default swiotlb_arch_range_needs_mapping
    x86/swiotlb: add default physbus conversion
    x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
    x86: add swiotlb allocation functions
    swiotlb: consolidate swiotlb info message printing
    swiotlb: support bouncing of HighMem pages
    swiotlb: factor out copy to/from device
    swiotlb: add arch hook to force mapping
    swiotlb: allow architectures to override physbusphys conversions
    swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
    rcu: fix rcutorture behavior during reboot
    resources: skip sanity check of busy resources
    swiotlb: move some definitions to header
    swiotlb: allow architectures to override swiotlb pool allocation
    ...

    Fix up trivial conflicts in
    arch/x86/kernel/Makefile
    arch/x86/mm/init_32.c
    include/linux/hardirq.h
    as per Ingo's suggestions.

    Linus Torvalds
     

30 Dec, 2008

1 commit


29 Dec, 2008

9 commits

  • Conflicts:

    mm/slub.c

    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • Pekka Enberg
     
  • If a slab cache is mergeable and the sysfs alias cannot be added, the
    target cache shall have its refcount decremented. kmem_cache_create()
    will return NULL, so if kmem_cache_destroy() is ever called on the target
    cache, it will never be freed if the refcount has been leaked.

    Likewise, if a slab cache is not mergeable and the sysfs link cannot be
    added, the new cache shall be removed from the slab_caches list.
    kmem_cache_create() will return NULL, so it will be impossible to call
    kmem_cache_destroy() on it.

    Both of these operations require slub_lock since refcount of all slab
    caches and slab_caches are protected by the lock.

    In the mergeable case, it would be better to restore objsize and offset
    back to their original values, but this could race with another merge
    since slub_lock was dropped.

    Cc: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     
  • Commit 6cb062296f73e74768cca2f3eaf90deac54de02d ("Categorize GFP flags")
    left one call-site in alloc_slabmgmt() to clear GFP_THISNODE instead of
    GFP_CONSTRAINT_MASK. Unfortunately, that ends up clearing __GFP_NOWARN
    and __GFP_NORETRY as well which is not what we want. As the only caller
    of alloc_slabmgmt() already clears GFP_CONSTRAINT_MASK before passing
    local_flags to it, we can just remove the clearing of GFP_THISNODE.
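
    The effect can be shown with a little bit arithmetic. The sketch below
    assumes, consistent with the description above, that GFP_THISNODE is a
    composite of a "this node" bit plus the NOWARN and NORETRY bits, while
    the constraint mask holds only placement bits. The bit values and M_*
    names are made up for the model.

```c
#include <assert.h>

/* Illustrative bit values, not the kernel's definitions. */
#define M_GFP_NOWARN          0x01u
#define M_GFP_NORETRY         0x02u
#define M_GFP_THISNODE_BIT    0x04u
#define M_GFP_HARDWALL        0x08u

/* Composite flag: placement bit plus two behaviour bits (assumed). */
#define M_GFP_THISNODE        (M_GFP_THISNODE_BIT | M_GFP_NOWARN | M_GFP_NORETRY)
/* Placement constraints only (assumed). */
#define M_GFP_CONSTRAINT_MASK (M_GFP_HARDWALL | M_GFP_THISNODE_BIT)

/* What the leftover call-site did: clear the whole composite. */
static unsigned int clear_thisnode_composite(unsigned int flags)
{
	return flags & ~M_GFP_THISNODE;
}

/* What the caller already does: clear only the placement constraints. */
static unsigned int clear_constraint_mask(unsigned int flags)
{
	return flags & ~M_GFP_CONSTRAINT_MASK;
}

/* Usage: a caller that asked for "no warning, no retry" loses both
 * requests in the first case and keeps them in the second. */
static void model_example(void)
{
	unsigned int flags = M_GFP_NOWARN | M_GFP_NORETRY | M_GFP_THISNODE_BIT;

	assert(clear_thisnode_composite(flags) == 0);
	assert(clear_constraint_mask(flags) == (M_GFP_NOWARN | M_GFP_NORETRY));
}
```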

    This patch should fix spurious page allocation failure warnings on the
    mempool_alloc() path. See the following URL for the original discussion
    of the bug:

    http://lkml.org/lkml/2008/10/27/100

    Acked-by: Christoph Lameter
    Reported-by: Miklos Szeredi
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • Currently SLUB doesn't warn about __GFP_WAIT. Add it into slab_alloc().

    Acked-by: Christoph Lameter
    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Pekka Enberg

    OGAWA Hirofumi
     
  • Currently fault-injection capability for SLAB allocator is only
    available to SLAB. This patch makes it available to SLUB, too.

    [penberg@cs.helsinki.fi: unify slab and slub implementations]
    Cc: Christoph Lameter
    Cc: Matt Mackall
    Signed-off-by: Akinobu Mita
    Signed-off-by: Pekka Enberg

    Akinobu Mita
     
  • __blk_queue_bounce() relies on a zeroed bio_vec list, since it looks
    up arbitrary indexes in the allocated bio. The block layer only
    guarantees that added entries are valid, so clear memory after alloc.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
    sched, trace: update trace_sched_wakeup()
    tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
    Revert "x86: disable X86_PTRACE_BTS"
    ring-buffer: prevent false positive warning
    ring-buffer: fix dangling commit race
    ftrace: enable format arguments checking
    x86, bts: memory accounting
    x86, bts: add fork and exit handling
    ftrace: introduce tracing_reset_online_cpus() helper
    tracing: fix warnings in kernel/trace/trace_sched_switch.c
    tracing: fix warning in kernel/trace/trace.c
    tracing/ring-buffer: remove unused ring_buffer size
    trace: fix task state printout
    ftrace: add not to regex on filtering functions
    trace: better use of stack_trace_enabled for boot up code
    trace: add a way to enable or disable the stack tracer
    x86: entry_64 - introduce FTRACE_ frame macro v2
    tracing/ftrace: add the printk-msg-only option
    tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
    x86, bts: correctly report invalid bts records
    ...

    Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
    being already partly merged by the SH merge.

    Linus Torvalds
     
  • * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (246 commits)
    x86: traps.c replace #if CONFIG_X86_32 with #ifdef CONFIG_X86_32
    x86: PAT: fix address types in track_pfn_vma_new()
    x86: prioritize the FPU traps for the error code
    x86: PAT: pfnmap documentation update changes
    x86: PAT: move track untrack pfnmap stubs to asm-generic
    x86: PAT: remove follow_pfnmap_pte in favor of follow_phys
    x86: PAT: modify follow_phys to return phys_addr prot and return value
    x86: PAT: clarify is_linear_pfn_mapping() interface
    x86: ia32_signal: remove unnecessary declaration
    x86: common.c boot_cpu_stack and boot_exception_stacks should be static
    x86: fix intel x86_64 llc_shared_map/cpu_llc_id anomolies
    x86: fix warning in arch/x86/kernel/microcode_amd.c
    x86: ia32.h: remove unused struct sigfram32 and rt_sigframe32
    x86: asm-offset_64: use rt_sigframe_ia32
    x86: sigframe.h: include headers for dependency
    x86: traps.c declare functions before they get used
    x86: PAT: update documentation to cover pgprot and remap_pfn related changes - v3
    x86: PAT: add pgprot_writecombine() interface for drivers - v3
    x86: PAT: change pgprot_noncached to uc_minus instead of strong uc - v3
    x86: PAT: implement track/untrack of pfnmap regions for x86 - v3
    ...

    Linus Torvalds
     

25 Dec, 2008

2 commits


23 Dec, 2008

1 commit

  • …86/debug', 'x86/defconfig', 'x86/detect-hyper', 'x86/doc', 'x86/dumpstack', 'x86/early-printk', 'x86/fpu', 'x86/idle', 'x86/io', 'x86/memory-corruption-check', 'x86/microcode', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/pat2', 'x86/pci-ioapic-boot-irq-quirks', 'x86/ptrace', 'x86/quirks', 'x86/reboot', 'x86/setup-memory', 'x86/signal', 'x86/sparse-fixes', 'x86/time', 'x86/uv' and 'x86/xen' into x86/core

    Ingo Molnar
     

20 Dec, 2008

4 commits


19 Dec, 2008

4 commits

  • Conflicts:
    include/linux/ftrace.h

    Ingo Molnar
     
  • Impact: Introduces new hooks, which are currently null.

    Introduce generic hooks in remap_pfn_range and vm_insert_pfn and
    corresponding copy and free routines with reserve and free tracking.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    venkatesh.pallipadi@intel.com
     
  • Impact: New currently unused interface.

    Add a generic interface to follow pfn in a pfnmap vma range. This is used by
    one of the subsequent x86 PAT related patches to keep track of memory types
    for vma regions across vma copy and free.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    venkatesh.pallipadi@intel.com
     
  • Impact: Code transformation, new functions added should have no effect.

    Drivers use mmap followed by pgprot_* and remap_pfn_range or vm_insert_pfn,
    in order to export reserved memory to userspace. Currently, such mappings are
    not tracked and hence not kept consistent with other mappings (/dev/mem,
    pci resource, ioremap) for the same memory, that may exist in the system.

    The following patchset adds x86 PAT attribute tracking and untracking for
    pfnmap related APIs.

    First three patches in the patchset are changing the generic mm code to fit
    in this tracking. Last four patches are x86 specific to make things work
    with x86 PAT code. The patchset also introduces pgprot_writecombine interface,
    which gives writecombine mapping when enabled, falling back to
    pgprot_noncached otherwise.

    This patch:

    While working on x86 PAT, we faced some hurdles with tracking
    remap_pfn_range() regions, as we do not have any information to say
    whether that PFNMAP mapping is linear for the entire vma range or
    it is smaller granularity regions within the vma.

    A simple solution to this is to use vm_pgoff as an indicator for
    linear mapping over the vma region. Currently, remap_pfn_range
    only sets vm_pgoff for COW mappings. Below patch changes the
    logic and sets the vm_pgoff irrespective of COW. This will still not
    be enough for the case where pfn is zero (vma region mapped to
    physical address zero). But, for all the other cases, we can look at
    pfnmap VMAs and say whether the mapping is for the entire vma region
    or not.
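
    A rough userspace model of the indicator described above. The model_vma
    structure, the flag value and both helpers are hypothetical. The model
    also assumes that vm_pgoff is only recorded when the remapped range
    covers the whole vma; it is meant to show the idea and the pfn-zero
    blind spot, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_VM_PFNMAP 0x1u

struct model_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_pgoff;
	unsigned int vm_flags;
};

/* Record the starting pfn in vm_pgoff when the whole vma is remapped,
 * regardless of whether the mapping is COW. */
static void model_remap_pfn_range(struct model_vma *vma, unsigned long addr,
				  unsigned long size, unsigned long pfn)
{
	assert(vma != NULL);

	vma->vm_flags |= MODEL_VM_PFNMAP;
	if (addr == vma->vm_start && addr + size == vma->vm_end)
		vma->vm_pgoff = pfn;
}

/* A pfnmap vma with a non-zero vm_pgoff is taken to be linearly mapped. */
static int model_is_linear_pfn_mapping(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_PFNMAP) && vma->vm_pgoff != 0;
}
```

    A vma fully remapped from pfn 0x100 is reported as linear; one remapped
    only in part is not. A vma fully remapped from pfn 0 is also reported as
    not linear, which is the remaining gap mentioned above.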

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    venkatesh.pallipadi@intel.com
     

17 Dec, 2008

2 commits

  • Impact: cleanup, code robustization

    The __swp_...() macros silently relied upon which bits are used for
    _PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
    our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
    were reported that could have been avoided if these macros properly
    used the symbolic constants. Since, as pointed out earlier, for Xen
    Dom0 support mainline likewise will need to eliminate the conflict
    between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
    adjustments, plus it introduces a mechanism to check consistency
    between MAX_SWAPFILES_SHIFT and the actual encoding macros.

    This also fixes a latent bug in that x86-64 used a 6-bit mask in
    __swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
    seemingly unrelated) linux/swap.h, this would have resulted in a
    collision with _PAGE_FILE.
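
    The idea of deriving the encoding from symbolic constants, with a
    build-time consistency check, can be sketched as follows. The bit layout
    (type field starting at bit 1, a flag at bit 6, offset above it) and all
    M_* names are assumptions for this model, not the x86 definitions.

```c
#include <assert.h>

#define M_MAX_SWAPFILES_SHIFT 5

#define M_FILE_BIT      6                       /* flag bit that must stay clear */
#define M_TYPE_SHIFT    1
#define M_TYPE_BITS     M_MAX_SWAPFILES_SHIFT   /* derived, not hard-coded */
#define M_TYPE_MASK     ((1UL << M_TYPE_BITS) - 1)
#define M_OFFSET_SHIFT  (M_FILE_BIT + 1)

/* Build-time check: the type field must end below the flag bit. Raising
 * M_MAX_SWAPFILES_SHIFT past 5 makes the array size negative and breaks
 * the build instead of silently colliding with the flag. */
typedef char m_swp_layout_ok[(M_TYPE_SHIFT + M_TYPE_BITS <= M_FILE_BIT) ? 1 : -1];

static unsigned long m_swp_entry(unsigned long type, unsigned long offset)
{
	assert(type <= M_TYPE_MASK);
	return (type << M_TYPE_SHIFT) | (offset << M_OFFSET_SHIFT);
}

static unsigned long m_swp_type(unsigned long entry)
{
	return (entry >> M_TYPE_SHIFT) & M_TYPE_MASK;
}

static unsigned long m_swp_offset(unsigned long entry)
{
	return entry >> M_OFFSET_SHIFT;
}
```

    Encoding type 31 with offset 1234 and decoding it again returns the same
    pair, and the flag bit stays clear in the encoded value.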

    Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
    pgoff_to_pte() calculations.

    Signed-off-by: Jan Beulich
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Commit 80bba1290ab5122c60cdb73332b26d288dc8aedd removed one necessary
    variable initialization. As a result, the following warning appeared:

    CC mm/migrate.o
    mm/migrate.c: In function 'sys_move_pages':
    mm/migrate.c:1001: warning: 'err' may be used uninitialized in this function

    Worse, if find_vma() failed, the kernel read uninitialized
    memory.

    Signed-off-by: KOSAKI Motohiro
    Cc: Brice Goglin
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

16 Dec, 2008

1 commit

  • The kmem_cache_create() function in the slob allocator passes the SLAB
    flags as GFP flags to the slob_alloc() function. The patch changes this
    call to pass GFP_KERNEL as the other allocators seem to do.

    Signed-off-by: Catalin Marinas
    Acked-by: Matt Mackall
    Cc: Cyrill Gorcunov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

13 Dec, 2008

1 commit

  • …t_scnprintf to take pointers.

    Impact: change calling convention of existing cpumask APIs

    Most cpumask functions started with cpus_: these have been replaced by
    cpumask_ ones which take struct cpumask pointers as expected.

    These four functions don't have good replacement names; fortunately
    they're rarely used, so we just change them over.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: paulus@samba.org
    Cc: mingo@redhat.com
    Cc: tony.luck@intel.com
    Cc: ralf@linux-mips.org
    Cc: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: cl@linux-foundation.org
    Cc: srostedt@redhat.com

    Rusty Russell
     

11 Dec, 2008

2 commits

  • Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
    to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use
    less stack exposing a bug in slub's list_locations() -
    kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
    beyond the end of page provided.

    The 100 slop which list_locations() allows at end of page looks roughly
    enough for all the other stuff it might print after the symbol before
    it checks again: break out KSYM_SYMBOL_LEN earlier than before.

    Latencytop and ftrace are using KSYM_NAME_LEN buffers where they
    need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
    where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
    them.

    [akpm@linux-foundation.org: ftrace.h needs module.h]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: Miles Lane
    Acked-by: Pekka Enberg
    Acked-by: Steven Rostedt
    Acked-by: Frederic Weisbecker
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
    gets the page address from user-space and puts the corresponding status
    back while holding the mmap_sem for read. There is no need to hold
    mmap_sem there while some page-faults may occur.

    This patch adds a temporary address and status buffer so as to only
    hold mmap_sem while working on these kernel buffers. This is
    implemented by extracting do_pages_stat_array() out of do_pages_stat().
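
    The chunked approach can be sketched in userspace. Here "user memory" is
    just another array, the lock is a flag plus a counter, and the status
    computed per page is made up; MODEL_CHUNK and all function names are
    hypothetical. The point is only that copies to and from user buffers
    happen outside the locked region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CHUNK 4   /* size of the temporary kernel-side buffers */

static int lock_held;
static int lock_acquisitions;

/* Stand-in for a user copy: it may fault, so it must not run under the lock. */
static void copy_user(void *dst, const void *src, size_t n)
{
	assert(!lock_held);
	memcpy(dst, src, n);
}

/* Work on kernel buffers only, holding the lock for the whole chunk. */
static void model_pages_stat_array(const unsigned long *addrs, int *status,
				   size_t n)
{
	size_t i;

	lock_held = 1;
	lock_acquisitions++;
	for (i = 0; i < n; i++)
		status[i] = (int)(addrs[i] % 2);   /* made-up per-page status */
	lock_held = 0;
}

static void model_pages_stat(const unsigned long *user_addrs, int *user_status,
			     size_t nr)
{
	unsigned long addrs[MODEL_CHUNK];
	int status[MODEL_CHUNK];

	while (nr > 0) {
		size_t n = nr < MODEL_CHUNK ? nr : MODEL_CHUNK;

		copy_user(addrs, user_addrs, n * sizeof(*addrs));     /* unlocked */
		model_pages_stat_array(addrs, status, n);             /* locked */
		copy_user(user_status, status, n * sizeof(*status));  /* unlocked */

		user_addrs += n;
		user_status += n;
		nr -= n;
	}
}
```

    For six addresses and a chunk size of four, the lock is taken twice and
    never held across a user copy.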

    Signed-off-by: Brice Goglin
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin