12 Jun, 2009

38 commits

  • It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

    A nice bonus is that we get complete livelock avoidance, and write_supers()
    is now only used for periodic writeback of superblocks.

    sync_blockdevs() introduced a couple of patches ago is gone now.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • __fsync_super() does the same thing as fsync_super(). So change the only
    caller to use fsync_super() and make __fsync_super() static. This removes
    an unnecessarily duplicated call to sync_blockdev() and prepares the ground
    for the changes to __fsync_super() in the following patches.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Remove the unused s_async_list in the superblock, a leftover of the
    broken async inode deletion code that leaked into mainline. Having this
    in the middle of the sync/unmount path is not helpful for the following
    cleanups.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
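
    The fast path described in this entry can be sketched in userspace C as
    follows. This is only an illustration of the idea, not the kernel code:
    struct mnt_sketch, struct file_sketch and every sketch_* function are
    hypothetical names, and the real functions handle more cases than shown
    here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct vfsmount and struct file. */
struct mnt_sketch {
    bool readonly;
    long writers;
};

struct file_sketch {
    struct mnt_sketch *mnt;
    bool open_for_write;
};

/* Full path: has to check whether the mount may be written to at all. */
static int sketch_want_write(struct mnt_sketch *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Fast path: the caller guarantees the mount already holds a write count. */
static int sketch_clone_write(struct mnt_sketch *m)
{
    assert(m->writers > 0);
    m->writers++;
    return 0;
}

/* A file opened for write already pins a write count on its mount. */
static int sketch_want_write_file(struct file_sketch *f)
{
    if (f->open_for_write)
        return sketch_clone_write(f->mnt);
    return sketch_want_write(f->mnt);
}

static void sketch_drop_write(struct mnt_sketch *m)
{
    assert(m->writers > 0);
    m->writers--;
}
```

    Taking the file alone, as the [AV] note asks, keeps the "already held for
    write" assumption tied to the one mount for which it is known to hold.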
     
  • This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
    basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
    A microbenchmark yes, but it exercises some important paths in the mm.

    Before:
    avg = 501.9
    std = 14.7773

    After:
    avg = 462.286
    std = 5.46106

    (50 runs of each, stddev gives a reasonable confidence, but there is quite
    a bit of variation there still)

    It does this by replacing the complex per-cpu locking and counter-cache
    with a percpu counter in struct vfsmount. This makes the code
    much simpler, and avoids spinlocks (although the msync is still pretty
    costly, unfortunately). It results in about 900 bytes smaller code too. It
    does increase the size of a vfsmount, however.

    It should also give a speedup on large systems if CPUs are frequently operating
    on different mounts (because the existing scheme has to operate on an atomic in
    the struct vfsmount when switching between mounts). But I'm most interested in
    the single threaded path performance for the moment.

    [AV: minor cleanup]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
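
    A minimal userspace sketch of the per-CPU counter idea, assuming a fixed
    CPU count and passing the CPU number explicitly. NR_CPUS_SKETCH,
    struct mnt_counter_sketch and the sketch_* functions are hypothetical and
    do not reproduce the kernel implementation.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the writer accounting kept per mount. */
struct mnt_counter_sketch {
    long writers[NR_CPUS_SKETCH];   /* one slot per CPU; a slot may go negative */
};

/* Fast path: touch only the slot of the CPU we are running on. */
static void sketch_inc_writers(struct mnt_counter_sketch *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    m->writers[cpu]++;
}

static void sketch_dec_writers(struct mnt_counter_sketch *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    m->writers[cpu]--;
}

/* Slow path: only the sum over all slots is meaningful. */
static long sketch_count_writers(const struct mnt_counter_sketch *m)
{
    long sum = 0;

    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += m->writers[i];
    return sum;
}
```

    An increment on one CPU and the matching decrement on another leave
    individual slots off balance, which is why readers have to sum all of
    them.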
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • New field: nd->root. When pathname resolution wants to know the root,
    check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
    copy current->fs->root there. After path_walk() is finished, we check
    if we'd got a cached value in nd->root and drop it. Before calling
    path_walk() we should either set nd->root.mnt to NULL *or* copy (and
    pin down) some path to nd->root. In the latter case we won't be
    looking at current->fs->root at all.

    Signed-off-by: Al Viro

    Al Viro
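
    The lazy caching rule above can be modelled roughly like this. The types
    and functions are hypothetical userspace stand-ins (a bare reference
    count replaces pinning a real path), not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pinned (mnt, dentry) pair. */
struct root_sketch {
    int refs;
};

/* Hypothetical stand-in for struct nameidata; NULL means "not cached yet". */
struct nd_sketch {
    struct root_sketch *root;
};

/* Return the cached root, or pin and cache the current root on first use. */
static struct root_sketch *sketch_get_root(struct nd_sketch *nd,
                                           struct root_sketch *current_root)
{
    if (!nd->root) {
        current_root->refs++;
        nd->root = current_root;
    }
    return nd->root;
}

/* After the walk: drop whatever ended up cached. */
static void sketch_walk_done(struct nd_sketch *nd)
{
    if (nd->root) {
        nd->root->refs--;
        nd->root = NULL;
    }
}
```

    A caller that pre-fills nd->root with a pinned path before the walk never
    reaches the branch that consults the current root.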
     
  • This patch adds an -oexpose_privroot option to allow access to the privroot.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Al Viro

    Jeff Mahoney
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify:
    fsnotify: allow groups to set freeing_mark to null
    inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD
    dnotify: do not bother to lock entry->lock when reading mask
    dnotify: do not use ?true:false when assigning to a bool
    fsnotify: move events should indicate the event was on a child
    inotify: reimplement inotify using fsnotify
    fsnotify: handle filesystem unmounts with fsnotify marks
    fsnotify: fsnotify marks on inodes pin them in core
    fsnotify: allow groups to add private data to events
    fsnotify: add correlations between events
    fsnotify: include pathnames with entries when possible
    fsnotify: generic notification queue and waitq
    dnotify: reimplement dnotify using fsnotify
    fsnotify: parent event notification
    fsnotify: add marks to inodes so groups can interpret how to handle those inodes
    fsnotify: unified filesystem notification backend

    Linus Torvalds
     
  • * 'for-linus' of git://linux-arm.org/linux-2.6:
    kmemleak: Add the corresponding MAINTAINERS entry
    kmemleak: Simple testing module for kmemleak
    kmemleak: Enable the building of the memory leak detector
    kmemleak: Remove some of the kmemleak false positives
    kmemleak: Add modules support
    kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
    kmemleak: Add the vmalloc memory allocation/freeing hooks
    kmemleak: Add the slub memory allocation/freeing hooks
    kmemleak: Add the slob memory allocation/freeing hooks
    kmemleak: Add the slab memory allocation/freeing hooks
    kmemleak: Add documentation on the memory leak detector
    kmemleak: Add the base support

    Manual conflict resolution (with the slab/earlyboot changes) in:
    drivers/char/vt.c
    init/main.c
    mm/slab.c

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
    perf_counter: Turn off by default
    perf_counter: Add counter->id to the throttle event
    perf_counter: Better align code
    perf_counter: Rename L2 to LL cache
    perf_counter: Standardize event names
    perf_counter: Rename enums
    perf_counter tools: Clean up u64 usage
    perf_counter: Rename perf_counter_limit sysctl
    perf_counter: More paranoia settings
    perf_counter: powerpc: Implement generalized cache events for POWER processors
    perf_counters: powerpc: Add support for POWER7 processors
    perf_counter: Accurate period data
    perf_counter: Introduce struct for sample data
    perf_counter tools: Normalize data using per sample period data
    perf_counter: Annotate exit ctx recursion
    perf_counter tools: Propagate signals properly
    perf_counter tools: Small frequency related fixes
    perf_counter: More aggressive frequency adjustment
    perf_counter/x86: Fix the model number of Intel Core2 processors
    perf_counter, x86: Correct some event and umask values for Intel processors
    ...

    Linus Torvalds
     
  • …/git/penberg/slab-2.6

    * 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    vgacon: use slab allocator instead of the bootmem allocator
    irq: use kcalloc() instead of the bootmem allocator
    sched: use slab in cpupri_init()
    sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()
    memcg: don't use bootmem allocator in setup code
    irq/cpumask: make memoryless node zero happy
    x86: remove some alloc_bootmem_cpumask_var calling
    vt: use kzalloc() instead of the bootmem allocator
    sched: use kzalloc() instead of the bootmem allocator
    init: introduce mm_init()
    vmalloc: use kzalloc() instead of alloc_bootmem()
    slab: setup allocators earlier in the boot sequence
    bootmem: fix slab fallback on numa
    bootmem: use slab if bootmem is no longer available

    Linus Torvalds
     
  • fsnotify tells its listeners explicitly when an event happened on the given
    inode versus on the child of the given inode (see __fsnotify_parent).
    However, the semantics of fsnotify_move() are such that we deliver events
    directly to the two parent directories in question (old_dir and new_dir)
    without using the __fsnotify_parent() call. fsnotify should be
    adding FS_EVENT_ON_CHILD for the notifications to these parents.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Reimplement inotify_user using fsnotify. This should be feature for feature
    exactly the same as the original inotify_user. This does not make any changes
    to the in kernel inotify feature used by audit. Those patches (and the eventual
    removal of in kernel inotify) will come after the new inotify_user proves to be
    working correctly.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • When an fs is unmounted with an fsnotify mark entry attached to one of its
    inodes we need to destroy that mark entry and we also (like inotify) send
    an unmount event.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • inotify needs per group information attached to events. This patch allows
    groups to attach private information and implements a callback so that
    information can be freed when an event is being destroyed.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • Standard inotify events include a correlation cookie that ties together
    the two events generated by a dentry move. This patch includes the same
    behaviour in fsnotify events. It is needed so that inotify userspace can
    be implemented on top of fsnotify.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
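
    From userspace, the correlation cookie is visible through the inotify(7)
    interface. The sketch below renames a file inside a watched directory and
    compares the cookies of the IN_MOVED_FROM and IN_MOVED_TO events. The
    function name and the scratch path are made up for the example, and it
    assumes a Linux system with a writable /tmp.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Rename a file inside a watched temporary directory and compare the cookies
 * of the two resulting events. Returns 1 if they match, 0 if they do not,
 * and -1 if the setup failed.
 */
static int rename_cookies_match(void)
{
    char dir[] = "/tmp/cookie-sketch-XXXXXX";   /* hypothetical scratch path */
    char from[PATH_MAX], to[PATH_MAX];
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    uint32_t from_cookie = 0, to_cookie = 0;
    ssize_t len;
    int fd, result = -1;
    FILE *f;

    if (!mkdtemp(dir))
        return -1;
    snprintf(from, sizeof(from), "%s/a", dir);
    snprintf(to, sizeof(to), "%s/b", dir);

    fd = inotify_init1(IN_NONBLOCK);
    if (fd < 0)
        goto out;
    f = fopen(from, "w");
    if (!f)
        goto out;
    fclose(f);
    if (inotify_add_watch(fd, dir, IN_MOVED_FROM | IN_MOVED_TO) < 0)
        goto out;
    if (rename(from, to) != 0)
        goto out;

    /* Both events are queued by the time rename() returns. */
    len = read(fd, buf, sizeof(buf));
    if (len <= 0)
        goto out;
    for (char *p = buf; p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;

        if (ev->mask & IN_MOVED_FROM)
            from_cookie = ev->cookie;
        if (ev->mask & IN_MOVED_TO)
            to_cookie = ev->cookie;
        p += sizeof(*ev) + ev->len;
    }
    result = (from_cookie != 0 && from_cookie == to_cookie);
out:
    if (fd >= 0)
        close(fd);
    unlink(from);
    unlink(to);
    rmdir(dir);
    return result;
}
```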
     
  • When inotify wants to send events to a directory about a child it includes
    the name of the original file. This patch collects that filename and makes
    it available for notification.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • inotify needs to do async notification in which event information is stored
    on a queue until the listener is ready to receive it. This patch
    implements a generic notification queue for inotify (and later fanotify) to
    store events to be sent at a later time.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
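
    The "store now, deliver later" idea can be illustrated with a small FIFO.
    This is a generic ring buffer with hypothetical names and a fixed
    capacity; it says nothing about how the kernel queue is actually laid out
    or how it handles overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP_SKETCH 8   /* hypothetical capacity */

struct event_sketch {
    uint32_t mask;
};

struct queue_sketch {
    struct event_sketch ev[QUEUE_CAP_SKETCH];
    size_t head;    /* index of the oldest queued event */
    size_t count;   /* number of queued events */
};

/* Producer side: called when something happens on a watched object. */
static bool sketch_enqueue(struct queue_sketch *q, struct event_sketch e)
{
    if (q->count == QUEUE_CAP_SKETCH)
        return false;   /* full; the caller decides what to do */
    q->ev[(q->head + q->count) % QUEUE_CAP_SKETCH] = e;
    q->count++;
    return true;
}

/* Consumer side: called when the listener is ready to read. */
static bool sketch_dequeue(struct queue_sketch *q, struct event_sketch *out)
{
    if (q->count == 0)
        return false;
    *out = q->ev[q->head];
    q->head = (q->head + 1) % QUEUE_CAP_SKETCH;
    q->count--;
    return true;
}
```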
     
  • Reimplement dnotify using fsnotify.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • inotify and dnotify both use a similar parent notification mechanism. We
    add a generic parent notification mechanism to fsnotify for both of these
    to use. This new mechanism also adds the dentry flag optimization which
    exists for inotify to dnotify.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • This patch creates a way for fsnotify groups to attach marks to inodes.
    These marks have little meaning to the generic fsnotify infrastructure
    and thus their meaning should be interpreted by the group that attached
    them to the inode's list.

    dnotify and inotify will make use of these markings to indicate which
    inodes are of interest to their respective groups. But this implementation
    has the useful property that in the future other listeners could actually
    use the marks for the exact opposite reason, aka to indicate which inodes
    it had NO interest in.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
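
    One way to picture marks is as a per-inode list in which every entry
    belongs to a group and carries that group's event mask. The sketch below
    uses hypothetical types and event bits, and it hard-codes the "deliver if
    the mask matches" interpretation that dnotify and inotify would use; a
    different group could read its marks the opposite way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ACCESS 0x1u   /* hypothetical event bits */
#define SKETCH_MODIFY 0x2u

struct group_sketch {
    int delivered;   /* number of events this group has been handed */
};

struct mark_sketch {
    struct group_sketch *group;
    uint32_t mask;               /* events this group cares about here */
    struct mark_sketch *next;
};

struct inode_sketch {
    uint32_t all_events_mask;    /* union of the masks of all marks */
    struct mark_sketch *marks;
};

static void sketch_add_mark(struct inode_sketch *inode, struct mark_sketch *mark)
{
    mark->next = inode->marks;
    inode->marks = mark;
    inode->all_events_mask |= mark->mask;
}

static void sketch_notify(struct inode_sketch *inode, uint32_t event)
{
    if (!(inode->all_events_mask & event))
        return;   /* cheap reject: no mark on this inode wants the event */
    for (struct mark_sketch *m = inode->marks; m; m = m->next)
        if (m->mask & event)
            m->group->delivered++;
}
```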
     
  • fsnotify is a backend for filesystem notification. fsnotify does
    not provide any userspace interface but does provide the basis
    needed for other notification schemes such as dnotify. fsnotify
    can be extended to be the backend for inotify or the upcoming
    fanotify. fsnotify provides a mechanism for "groups" to register for
    some set of filesystem events and to then deliver those events to
    those groups for processing.

    fsnotify has a number of benefits, the first being actually shrinking the size
    of an inode. Before fsnotify, to support both dnotify and inotify, an inode had

    unsigned long i_dnotify_mask; /* Directory notify events */
    struct dnotify_struct *i_dnotify; /* for directory notifications */
    struct list_head inotify_watches; /* watches on this inode */
    struct mutex inotify_mutex; /* protects the watches list */

    But with fsnotify this same functionality (and more) is done with just

    __u32 i_fsnotify_mask; /* all events for this inode */
    struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */

    That's right, inotify, dnotify, and fanotify all in 64 bits. We used that
    much space just in inotify_watches alone, before this patch set.

    fsnotify object lifetime and locking is MUCH better than what we have today.
    inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of
    what's been busted since inception. inotify needs to know internal semantics
    of superblock destruction and unmounting to function. The inode pinning and
    vfs contortions are horrible.

    No fsnotify implementers do allocation under locks. This means things like
    f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to
    GFP_NOFS can be reverted. There are no longer any allocation rules when using
    or implementing your own fsnotify listener.

    fsnotify paves the way for fanotify. In brief fanotify is a notification
    mechanism that delivers the listener both an 'event' and an open file descriptor
    to the object in question. This means that fanotify is pathname agnostic.
    Some on lkml may not care for the original companies or users that pushed for
    TALPA, but fanotify was designed with flexibility and input for other users in
    mind. The readahead group expressed interest in fanotify as it could be used
    to profile disk access on boot without breaking the audit system. The desktop
    search groups have also expressed interest in fanotify as it solves a number
    of the race conditions and problems present with managing inotify when more
    than a limited number of specific files are of interest. fanotify can provide
    for a userspace access control system which makes it a clean interface for AV
    vendors to hook without trying to do binary patching on the syscall table,
    LSM, and everywhere else they do their things today. With this patch series
    fanotify can be implemented in less than 1200 lines of easy to review code.
    Almost all of which is the socket based user interface.

    This patch series builds fsnotify to the point that it can implement
    dnotify and inotify_user. Patches exist and will be sent soon after
    acceptance to finish the in kernel inotify conversion (audit) and implement
    fanotify.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (266 commits)
    sh: Tie sparseirq in to Kconfig.
    sh: Wire up sys_rt_tgsigqueueinfo.
    sh: Fix sys_pwritev() syscall table entry for sh32.
    sh: Fix sh4a llsc-based cmpxchg()
    sh: sh7724: Add JPU support
    sh: sh7724: INTC setting update
    sh: sh7722 clock framework rewrite
    sh: sh7366 clock framework rewrite
    sh: sh7343 clock framework rewrite
    sh: sh7724 clock framework rewrite V3
    sh: sh7723 clock framework rewrite V2
    sh: add enable()/disable()/set_rate() to div6 code
    sh: add AP325RXA mode pin configuration
    sh: add Migo-R mode pin configuration
    sh: sh7722 mode pin definitions
    sh: sh7724 mode pin comments
    sh: sh7723 mode pin V2
    sh: rework mode pin code
    sh: clock div6 helper code
    sh: clock div4 frequency table offset fix
    ...

    Linus Torvalds
     
  • * 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
    KVM: Prevent overflow in largepages calculation
    KVM: Disable large pages on misaligned memory slots
    KVM: Add VT-x machine check support
    KVM: VMX: Rename rmode.active to rmode.vm86_active
    KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
    KVM: Disable CR8 intercept if tpr patching is active
    KVM: Do not migrate pending software interrupts.
    KVM: inject NMI after IRET from a previous NMI, not before.
    KVM: Always request IRQ/NMI window if an interrupt is pending
    KVM: Do not re-execute INTn instruction.
    KVM: skip_emulated_instruction() decode instruction if size is not known
    KVM: Remove irq_pending bitmap
    KVM: Do not allow interrupt injection from userspace if there is a pending event.
    KVM: Unprotect a page if #PF happens during NMI injection.
    KVM: s390: Verify memory in kvm run
    KVM: s390: Sanity check on validity intercept
    KVM: s390: Unlink vcpu on destroy - v2
    KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
    KVM: s390: use hrtimer for clock wakeup from idle - v2
    KVM: s390: Fix memory slot versus run - v3
    ...

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (44 commits)
    nommu: Provide mmap_min_addr definition.
    TOMOYO: Add description of lists and structures.
    TOMOYO: Remove unused field.
    integrity: ima audit dentry_open failure
    TOMOYO: Remove unused parameter.
    security: use mmap_min_addr indepedently of security models
    TOMOYO: Simplify policy reader.
    TOMOYO: Remove redundant markers.
    SELinux: define audit permissions for audit tree netlink messages
    TOMOYO: Remove unused mutex.
    tomoyo: avoid get+put of task_struct
    smack: Remove redundant initialization.
    integrity: nfsd imbalance bug fix
    rootplug: Remove redundant initialization.
    smack: do not beyond ARRAY_SIZE of data
    integrity: move ima_counts_get
    integrity: path_check update
    IMA: Add __init notation to ima functions
    IMA: Minimal IMA policy and boot param for TCB IMA policy
    selinux: remove obsolete read buffer limit from sel_read_bool
    ...

    Linus Torvalds
     
  • * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits)
    ide-tape: fix debug call
    alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC
    ide-dma: don't reset request fields on dma_timeout_retry()
    ide: drop rq->data handling from ide_map_sg()
    ide-atapi: kill unused fields and callbacks
    ide-tape: simplify read/write functions
    ide-tape: use byte size instead of sectors on rw issue functions
    ide-tape: unify r/w init paths
    ide-tape: kill idetape_bh
    ide-tape: use standard data transfer mechanism
    ide-tape: use single continuous buffer
    ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len
    ide-tape,floppy: fix failed command completion after request sense
    ide-pm: don't abuse rq->data
    ide-cd,atapi: use bio for internal commands
    ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer
    ide-cd: convert to using generic sense request
    ide: add helpers for preparing sense requests
    ide-cd: don't abuse rq->buffer
    ide-atapi: don't abuse rq->buffer
    ...

    Linus Torvalds
     
  • Now that we set up the slab allocator earlier, we can get rid of some
    alloc_bootmem_cpumask_var() calls in boot code.

    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Signed-off-by: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Yinghai Lu
     
  • There are allocations for which the main pointer cannot be found but
    they are not memory leaks. This patch fixes some of them. For more
    information on false positives, see Documentation/kmemleak.txt.

    Signed-off-by: Catalin Marinas

    Catalin Marinas
     
  • This patch adds the callbacks to kmemleak_(alloc|free) functions from
    the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to
    avoid recursive calls to kmemleak when it allocates its own data
    structures.

    Signed-off-by: Catalin Marinas
    Reviewed-by: Pekka Enberg

    Catalin Marinas
     
  • This patch adds the base support for the kernel memory leak
    detector. It traces the memory allocation/freeing in a way similar to
    Boehm's conservative garbage collector, the difference being that
    the unreferenced objects are not freed but only shown in
    /sys/kernel/debug/kmemleak. Enabling this feature introduces an
    overhead to memory allocations.

    Signed-off-by: Catalin Marinas
    Cc: Ingo Molnar
    Acked-by: Pekka Enberg
    Cc: Andrew Morton
    Reviewed-by: Paul E. McKenney

    Catalin Marinas
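
    The conservative-scan idea can be illustrated in a few lines of userspace
    C: every word in a region is treated as a possible pointer, and any
    tracked object whose address range contains such a value counts as
    referenced. This is a deliberately reduced sketch with hypothetical
    names. It scans a single root region only and does not follow references
    from one object to the next, which a full scanner would also do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one tracked allocation. */
struct tracked_sketch {
    uintptr_t start;
    size_t size;
    int referenced;
};

/*
 * Scan len bytes at mem word by word. Any word whose value falls inside a
 * tracked object, including interior pointers, marks that object referenced.
 */
static void sketch_scan(const void *mem, size_t len,
                        struct tracked_sketch *objs, size_t nobjs)
{
    const uintptr_t *word = mem;

    for (size_t i = 0; i < len / sizeof(*word); i++)
        for (size_t j = 0; j < nobjs; j++)
            if (word[i] >= objs[j].start &&
                word[i] - objs[j].start < objs[j].size)
                objs[j].referenced = 1;
}

/* Objects never seen during the scan are reported, not freed. */
static size_t sketch_count_unreferenced(const struct tracked_sketch *objs,
                                        size_t nobjs)
{
    size_t n = 0;

    for (size_t j = 0; j < nobjs; j++)
        if (!objs[j].referenced)
            n++;
    return n;
}
```

    Because any matching bit pattern counts as a reference, a scanner of this
    kind can miss real leaks but does not need type information.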
     

11 Jun, 2009

2 commits

  • * serial-from-alan: (79 commits)
    moxa: prevent opening unavailable ports
    imx: serial: use tty_encode_baud_rate to set true rate
    imx: serial: add IrDA support to serial driver
    imx: serial: use rational library function
    lib: isolate rational fractions helper function
    imx: serial: handle initialisation failure correctly
    imx: serial: be sure to stop xmit upon shutdown
    imx: serial: notify higher layers in case xmit IRQ was not called
    imx: serial: fix one bit field type
    imx: serial: fix whitespaces (no changes in functionality)
    tty: use prepare/finish_wait
    tty: remove sleep_on
    sierra: driver interface blacklisting
    sierra: driver urb handling improvements
    tty: resolve some sierra breakage
    timbuart: Fix the termios logic
    serial: Added Timberdale UART driver
    tty: Add URL for ttydev queue
    devpts: unregister the file system on error
    tty: Untangle termios and mm mutex dependencies
    ...

    Linus Torvalds
     
  • Conflicts:
    arch/x86/kernel/irqinit.c
    arch/x86/kernel/irqinit_64.c
    arch/x86/kernel/traps.c
    arch/x86/mm/fault.c
    include/linux/sched.h
    kernel/exit.c

    Ingo Molnar