19 Aug, 2010

8 commits

  • Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
    various consts applied to its pointers.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: brlock vfsmount_lock
    fs: scale files_lock
    lglock: introduce special lglock and brlock spin locks
    tty: fix fu_list abuse
    fs: cleanup files_lock locking
    fs: remove extra lookup in __lookup_hash
    fs: fs_struct rwlock to spinlock
    apparmor: use task path helpers
    fs: dentry allocation consolidation
    fs: fix do_lookup false negative
    mbcache: Limit the maximum number of cache entries
    hostfs ->follow_link() braino
    hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
    remove SWRITE* I/O types
    kill BH_Ordered flag
    vfs: update ctime when changing the file's permission by setfacl
    cramfs: only unlock new inodes
    fix reiserfs_evict_inode end_writeback second call

    Linus Torvalds
     
  • This fixes a build breakage introduced by commit 4c2ef25fe0b8 ("mmc: fix
    all hangs related to mmc/sd card insert/removal during suspend/resume")

    Cc: David Brownell
    Cc: Alan Stern
    Cc: linux-mmc@vger.kernel.org
    Cc: Andrew Morton
    Signed-off-by: Uwe Kleine-König
    Acked-by: Kukjin Kim
    Acked-by: Maxim Levitsky
    Acked-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools: Fix build on POSIX shells
    latencytop: Fix kconfig dependency warnings
    perf annotate tui: Fix exit and RIGHT keys handling
    tracing: Sanitize value returned from write(trace_marker, "...", len)
    tracing/events: Convert format output to seq_file
    tracing: Extend recordmcount to better support Blackfin mcount
    tracing: Fix ring_buffer_read_page reading out of page boundary
    tracing: Fix an unallocated memory access in function_graph

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
    ALSA: hda - Fix ALC680 base model capture
    ASoC: Remove DSP mode support for WM8776
    ALSA: hda - Add quirk for Dell Vostro 1220
    ALSA: riptide - Fix detection / load of firmware files

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68knommu: include sched.h in ColdFire/SPI driver
    m68knommu: formatting of pointers in printk()
    m68knommu: arch/m68k/include/asm/ide.h fix for nommu

    Linus Torvalds
     
  • * 'for-linus' of git://neil.brown.name/md:
    md raid-1/10 Fix bio_rw bit manipulations again
    md: provide appropriate return value for spare_active functions.
    md: Notify sysfs when RAID1/5/10 disk is In_sync.
    Update recovery_offset even when external metadata is used.

    Linus Torvalds
     
  • * 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
    spi.h: missing kernel-doc notation, please fix
    of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
    of: Fix missing includes
    ata: update for of_device to platform_device replacement
    microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
    microblaze: Fix of/address: Merge all of the bus translation code
    booting-without-of: Remove nonexistent chapters from TOC, fix numbering

    Linus Torvalds
     

18 Aug, 2010

32 commits

  • Takashi Iwai
     
  • Takashi Iwai
     
  • With some hardware combinations, the PCM interrupts are acknowledged
    before the period boundary from the emu10k1 chip. The midlevel PCM code
    gets confused and the playback stream is interrupted.

    It seems that shifting the interrupt processing by 2 samples is enough
    to fix this issue. This default value does not harm other,
    non-affected hardware.

    More information: Kernel bugzilla bug#16300

    [A compile warning fixed by tiwai]

    Signed-off-by: Jaroslav Kysela
    Cc:
    Signed-off-by: Takashi Iwai

    Jaroslav Kysela
     
  • fs: brlock vfsmount_lock

    Use a brlock for the vfsmount lock. It must be taken for write whenever
    modifying the mount hash or associated fields, and may be taken for read when
    performing mount hash lookups.

    A new lock is added for the mnt-id allocator, so it doesn't need to take
    the heavy vfsmount write-lock.

    The number of atomics should remain the same for fastpath rlock cases, though
    code would be slightly slower due to per-cpu access. Scalability will not be
    much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
    in the way. However, path lookups crossing mountpoints should be one case where
    scalability is improved (they currently require the global lock).

    The slowpath is slower due to the use of brlock. On a 64 core, 64 socket, 32 node
    Altix system (high latency to remote nodes), a simple umount microbenchmark
    (mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
    patch and 7.1s after it, about 5% slower.

    Cc: Al Viro
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: scale files_lock

    Improve scalability of files_lock by adding per-cpu, per-sb files lists,
    protected with an lglock. The lglock provides fast access to the per-cpu lists
    to add and remove files. It also provides a snapshot of all the per-cpu lists
    (although this is very slow).

    One difficulty with this approach is that a file can be removed from the list
    by another CPU. We must track which per-cpu list the file is on with a new
    variable in the file struct (packed into a hole on 64-bit archs). Scalability
    could suffer if files are frequently removed from a different CPU's list.

    However, loads with frequent removal of files imply a short interval between
    adding and removing the files, and the scheduler attempts to avoid moving
    processes too far away. Also, even in the case of cross-CPU removal, the
    hardware has much more opportunity to parallelise cacheline transfers with N
    cachelines than with 1.

    A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
    degenerates to contending on a single lock, which is no worse than before. When
    more than one CPU is allocating files, even if they are always freed by
    different CPUs, there will be more parallelism than the single-lock case.
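    A minimal userspace sketch of the per-CPU list idea, assuming four simulated
    CPUs and a C11 test-and-set spinlock standing in for the lglock. All names
    (demo_file, file_list_add, file_list_del, NR_CPUS_DEMO) are hypothetical, not
    the kernel's. The point it shows is the extra per-file cpu field: the remover
    locks the list the file was added to, which may belong to another CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_DEMO 4  /* hypothetical number of simulated CPUs */

/* Test-and-set spinlock standing in for a kernel spinlock. */
typedef struct { atomic_int held; } demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct demo_file {
    struct demo_file *prev, *next;
    int cpu;  /* which per-CPU list the file was added to */
};

struct percpu_file_list {
    demo_spinlock lock[NR_CPUS_DEMO];
    struct demo_file head[NR_CPUS_DEMO];  /* circular list sentinels */
};

static void file_list_init(struct percpu_file_list *fl)
{
    for (int i = 0; i < NR_CPUS_DEMO; i++) {
        atomic_init(&fl->lock[i].held, 0);
        fl->head[i].prev = fl->head[i].next = &fl->head[i];
    }
}

/* Add to the adding CPU's list; only that CPU's lock is taken. */
static void file_list_add(struct percpu_file_list *fl, struct demo_file *f, int this_cpu)
{
    demo_lock(&fl->lock[this_cpu]);
    f->cpu = this_cpu;
    f->next = fl->head[this_cpu].next;
    f->prev = &fl->head[this_cpu];
    f->next->prev = f;
    fl->head[this_cpu].next = f;
    demo_unlock(&fl->lock[this_cpu]);
}

/* Remove from the list recorded in f->cpu, whichever CPU is calling. */
static void file_list_del(struct percpu_file_list *fl, struct demo_file *f)
{
    int cpu = f->cpu;
    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    demo_lock(&fl->lock[cpu]);
    f->prev->next = f->next;
    f->next->prev = f->prev;
    f->prev = f->next = f;
    demo_unlock(&fl->lock[cpu]);
}

/* Count one CPU's list (unlocked; for single-threaded checks only). */
static size_t file_list_count(const struct percpu_file_list *fl, int cpu)
{
    size_t n = 0;
    for (const struct demo_file *f = fl->head[cpu].next; f != &fl->head[cpu]; f = f->next)
        n++;
    return n;
}
```

    Removing a file from its own CPU's list touches only that CPU's lock; a
    cross-CPU removal contends only with users of that one list, not with every
    file open and close in the system.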

    Testing results:

    On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
    to remove the file, the number of times it is removed by the same CPU that
    added it, and the number of times it is removed by the same node that added it.

    Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
    kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
    dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

    So a file is removed by the same CPU that added it over 90% of the time, and
    by a CPU on the same node over 95% of the time.

    Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                  throughput
    2.6.34-rc2    24.5
    +patch        24.9

    (in %)        us     sys    idle   IO wait
    2.6.34-rc2    51.25  28.25  17.25  3.25
    +patch        53.75  18.5   19     8.75

    So significantly less CPU time is spent in kernel code, with higher idle time
    and slightly higher throughput.

    Single threaded performance difference was within the noise of microbenchmarks.
    That is not to say a penalty does not exist: the code is larger and requires more
    memory accesses, so it will be slightly slower.

    Cc: linux-kernel@vger.kernel.org
    Cc: Tim Chen
    Cc: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • lglock: introduce special lglock and brlock spin locks

    This patch introduces "local-global" locks (lglocks). These can be used to:

    - Provide fast exclusive access to per-CPU data, with exclusive access to
    another CPU's data allowed but possibly subject to contention, and to provide
    very slow exclusive access to all per-CPU data.
    - Or to provide very fast and scalable read serialisation, and to provide
    very slow exclusive serialisation of data (not necessarily per-CPU data).

    Brlocks are also implemented as a short-hand notation for the latter use
    case.
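    A userspace sketch of the brlock case, assuming CPU numbers passed in
    explicitly and C11 atomics instead of kernel spinlocks; the demo_br_* names
    are hypothetical. Readers take only their own CPU's lock; a writer takes every
    CPU's lock in a fixed order, which is what makes the write side "very slow". A
    kernel implementation must also keep the task on one CPU between lock and
    unlock (for example by disabling preemption), which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_DEMO 4  /* hypothetical number of simulated CPUs */

typedef struct { atomic_int held; } demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* A "big reader" lock: one spinlock per CPU. */
struct demo_brlock {
    demo_spinlock cpu_lock[NR_CPUS_DEMO];
};

static void demo_br_init(struct demo_brlock *br)
{
    for (int i = 0; i < NR_CPUS_DEMO; i++)
        atomic_init(&br->cpu_lock[i].held, 0);
}

/* Read side: only the local CPU's lock, so readers on different CPUs
 * never write to a shared cacheline. */
static void demo_br_read_lock(struct demo_brlock *br, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS_DEMO);
    demo_lock(&br->cpu_lock[this_cpu]);
}

static void demo_br_read_unlock(struct demo_brlock *br, int this_cpu)
{
    demo_unlock(&br->cpu_lock[this_cpu]);
}

/* Write side: take every CPU's lock in ascending order (a fixed order,
 * so two writers cannot deadlock). Cost grows with the CPU count. */
static void demo_br_write_lock(struct demo_brlock *br)
{
    for (int i = 0; i < NR_CPUS_DEMO; i++)
        demo_lock(&br->cpu_lock[i]);
}

static void demo_br_write_unlock(struct demo_brlock *br)
{
    for (int i = NR_CPUS_DEMO - 1; i >= 0; i--)
        demo_unlock(&br->cpu_lock[i]);
}

/* Probe used by the example: is this CPU's lock currently held? */
static int demo_br_cpu_locked(struct demo_brlock *br, int cpu)
{
    return atomic_load(&br->cpu_lock[cpu].held);
}
```

    A writer that finds a reader holding CPU 2's lock spins at that slot until
    the reader leaves; readers on other CPUs are unaffected until the writer
    reaches their slot.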

    Thanks to Paul for the local/global naming convention.

    Cc: linux-kernel@vger.kernel.org
    Cc: Al Viro
    Cc: "Paul E. McKenney"
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • tty: fix fu_list abuse

    tty code abuses fu_list, which causes a bug in remount,ro handling.

    If a tty device node is opened on a filesystem and the last link to the inode
    is then removed, the filesystem will be allowed to be remounted read-only. This
    is because fs_may_remount_ro does not find the zero-link tty inode on the file
    sb list (because the tty code incorrectly removed it to use for its own
    purpose). This can result in a filesystem with errors after it is marked
    "clean".

    Taking the idea from Christoph's initial patch, allocate a tty private struct
    at file->private_data and put our required list fields in there, linking
    file and tty. This makes tty nodes behave the same way as other device nodes,
    avoids meddling with the vfs, and avoids this bug.

    The error handling is not trivial in the tty code, so for this bugfix, I take
    the simple approach of using __GFP_NOFAIL and not worrying about memory errors.
    This is not a problem because our allocator doesn't fail small allocs as a rule
    anyway. So proper error handling is left as an exercise for tty hackers.

    [ Arguably filesystem's device inode would ideally be divorced from the
    driver's pseudo inode when it is opened, but in practice it's not clear whether
    that will ever be worth implementing. ]

    Cc: linux-kernel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Alan Cox
    Cc: Greg Kroah-Hartman
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: cleanup files_lock locking

    Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
    manipulate the per-sb files list; unexport the files_lock spinlock.

    Cc: linux-kernel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Alan Cox
    Acked-by: Andi Kleen
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: remove extra lookup in __lookup_hash

    Optimize lookup for create operations, where finding no dentry is often the
    common case. In cases where it is not, such as unlink, the added overhead
    is much smaller than the overhead removed.

    Also, move comments about __d_lookup raciness to the __d_lookup call site.
    d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
    vein, add kerneldoc comments to __d_lookup and clean up some of the comments:

    - We are interested in how the RCU lookup works here, particularly with
    renames. Make that explicit, and point to the document where it is explained
    in more detail.
    - RCU is pretty standard now, and macros make implementations pretty mindless.
    If we want to know about RCU barrier details, we look in RCU code.
    - Delete some boring legacy comments because we don't care much about how the
    code used to work, more about the interesting parts of how it works now. So
    comments about lazy LRU may be interesting, but would better be done in the
    LRU or refcount management code.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: fs_struct rwlock to spinlock

    struct fs_struct.lock is an rwlock with the read-side used to protect root and
    pwd members while taking references to them. Taking a reference to a path
    typically requires just 2 atomic ops, so the critical section is very small.
    Parallel read-side operations would have cacheline contention on the lock, the
    dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
    real parallelism increase.

    Replace it with a spinlock to avoid one or two atomic operations in the
    typical path lookup fastpath.
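    A userspace sketch of the locking pattern described above, assuming C11
    atomics in place of kernel spinlocks and reference counts; demo_fs_struct,
    demo_get_fs_root and the other names are hypothetical stand-ins for the
    kernel's fs_struct and path helpers. It shows how little work sits inside
    the critical section.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int held; } demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Stand-ins for struct vfsmount and struct dentry: only a refcount. */
struct demo_mnt { atomic_int count; };
struct demo_dentry { atomic_int count; };
struct demo_path { struct demo_mnt *mnt; struct demo_dentry *dentry; };

struct demo_fs_struct {
    demo_spinlock lock;  /* formerly an rwlock */
    struct demo_path root, pwd;
};

/* Pinning a path costs two atomic increments. */
static void demo_path_get(const struct demo_path *p)
{
    atomic_fetch_add(&p->mnt->count, 1);
    atomic_fetch_add(&p->dentry->count, 1);
}

/* Copy and pin fs->root. The critical section is a struct copy plus two
 * atomics, too short for an rwlock's shared read side to pay off. */
static void demo_get_fs_root(struct demo_fs_struct *fs, struct demo_path *out)
{
    demo_lock(&fs->lock);
    assert(fs->root.mnt && fs->root.dentry);
    *out = fs->root;
    demo_path_get(out);
    demo_unlock(&fs->lock);
}
```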

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • apparmor: use task path helpers

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: dentry allocation consolidation

    There are two duplicate copies of the dentry allocation code in path lookup.
    Consolidate them into a single function.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: fix do_lookup false negative

    In do_lookup, if we initially find no dentry, we take the directory i_mutex and
    re-check the lookup. If we find a dentry there, then we revalidate it if
    needed. However if that revalidate asks for the dentry to be invalidated, we
    return -ENOENT from do_lookup. What should happen instead is an attempt to
    allocate and lookup a new dentry.

    This is probably not noticed because it is rare. It is only reached if a
    concurrent create races in first (in which case, the dentry probably won't be
    invalidated anyway), or if the racy __d_lookup has failed due to a
    false-negative (which is very rare).

    Fix this by removing code and having it use the normal reval path.
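    A toy model of the control-flow change, not the real do_lookup:
    demo_revalidate and demo_real_lookup are hypothetical stand-ins for
    ->d_revalidate and the filesystem lookup. The buggy shape turns a failed
    revalidation into -ENOENT; the fixed shape treats it like a cache miss and
    runs the normal lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached directory entry. */
struct demo_dentry {
    bool valid;  /* what a revalidate callback would report */
    int ino;
};

/* Hypothetical stand-in for ->d_revalidate. */
static bool demo_revalidate(const struct demo_dentry *d)
{
    return d->valid;
}

/* Hypothetical stand-in for asking the filesystem; fs_ino < 0 means "no such name". */
static int demo_real_lookup(struct demo_dentry *fresh, int fs_ino)
{
    if (fs_ino < 0)
        return -ENOENT;
    fresh->valid = true;
    fresh->ino = fs_ino;
    return 0;
}

/* Old shape: a cached dentry that fails revalidation becomes -ENOENT. */
static int demo_lookup_buggy(const struct demo_dentry *cached, struct demo_dentry *fresh,
                             int fs_ino, int *ino)
{
    if (cached != NULL) {
        if (!demo_revalidate(cached))
            return -ENOENT;  /* false negative: the name may well exist */
        *ino = cached->ino;
        return 0;
    }
    int err = demo_real_lookup(fresh, fs_ino);
    if (err == 0)
        *ino = fresh->ino;
    return err;
}

/* Fixed shape: an invalidated dentry is treated like a cache miss. */
static int demo_lookup_fixed(const struct demo_dentry *cached, struct demo_dentry *fresh,
                             int fs_ino, int *ino)
{
    if (cached != NULL && demo_revalidate(cached)) {
        *ino = cached->ino;
        return 0;
    }
    int err = demo_real_lookup(fresh, fs_ino);
    if (err == 0)
        *ino = fresh->ino;
    return err;
}
```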

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • Limit the maximum number of mb_cache entries depending on the number of
    hash buckets: if the only limit to the number of cache entries is the
    available memory the hash chains can grow very long, taking a long time
    to search.
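    A minimal sketch of the idea, not mbcache's code: the entry cap is derived
    from the number of hash buckets, so the average chain stays short no matter
    how much memory is free. The factor of 16 and the oldest-first eviction are
    assumptions made for the example; all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_CHAIN_FACTOR 16  /* hypothetical: about 16 entries per bucket */

struct demo_entry {
    unsigned key;
    struct demo_entry *hash_next;  /* next entry in the same bucket */
    struct demo_entry *lru_next;   /* next-newer entry in insertion order */
};

struct demo_cache {
    struct demo_entry **buckets;
    unsigned nbuckets, count, max_entries;
    struct demo_entry *lru_head, *lru_tail;  /* oldest, newest */
};

static int demo_cache_init(struct demo_cache *c, unsigned nbuckets)
{
    assert(nbuckets > 0);
    c->buckets = calloc(nbuckets, sizeof(*c->buckets));
    if (c->buckets == NULL)
        return -1;
    c->nbuckets = nbuckets;
    c->count = 0;
    /* The cap scales with the bucket count, bounding the average chain length. */
    c->max_entries = nbuckets * DEMO_CHAIN_FACTOR;
    c->lru_head = c->lru_tail = NULL;
    return 0;
}

/* Unlink e from its (singly linked) hash chain. */
static void demo_unhash(struct demo_cache *c, struct demo_entry *e)
{
    struct demo_entry **pp = &c->buckets[e->key % c->nbuckets];
    while (*pp != e)
        pp = &(*pp)->hash_next;
    *pp = e->hash_next;
}

/* Insert a key, evicting the oldest entry first if the cache is full. */
static int demo_cache_insert(struct demo_cache *c, unsigned key)
{
    if (c->count >= c->max_entries) {
        struct demo_entry *old = c->lru_head;
        c->lru_head = old->lru_next;
        if (c->lru_head == NULL)
            c->lru_tail = NULL;
        demo_unhash(c, old);
        free(old);
        c->count--;
    }
    struct demo_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->key = key;
    e->hash_next = c->buckets[key % c->nbuckets];
    c->buckets[key % c->nbuckets] = e;
    e->lru_next = NULL;
    if (c->lru_tail != NULL)
        c->lru_tail->lru_next = e;
    else
        c->lru_head = e;
    c->lru_tail = e;
    c->count++;
    return 0;
}
```

    With 4 buckets the cache never holds more than 64 entries, so after 1000
    inserts only the newest 64 keys remain.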

    At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • we want the assignment to err done inside the if () to be
    visible after it, so (re)declaring err inside if () body
    is wrong.
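    The bug class in miniature (hypothetical function names): a second
    declaration inside the if () body creates a new variable that shadows the
    outer err, so the assignment vanishes when the block ends. GCC's -Wshadow
    warning reports this pattern.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical operation that can fail. */
static int demo_step(int fail)
{
    return fail ? -EIO : 0;
}

/* Buggy: "int err" inside the if () body declares a new variable that
 * shadows the outer one, so the error never reaches the return. */
static int demo_buggy(int fail)
{
    int err = 0;
    if (fail) {
        int err = demo_step(fail);
        (void)err;
    }
    return err;  /* always 0 */
}

/* Fixed: assign to the outer err so it stays visible after the if (). */
static int demo_fixed(int fail)
{
    int err = 0;
    if (fail)
        err = demo_step(fail);
    return err;
}
```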

    Signed-off-by: Al Viro

    Al Viro
     
    ... not harmless in this case - we have a string at the end of the buffer
    already.
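    The general difference between the two calls, as a sketch: strncpy zero-pads
    every remaining byte up to the size limit, so it overwrites anything already
    stored at the end of the buffer, while strlcpy stops after writing one
    terminating NUL. demo_strlcpy below is a hypothetical reimplementation of the
    usual strlcpy semantics, since not every C library provides strlcpy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reimplementation of the usual strlcpy semantics: copy at
 * most size - 1 bytes, write one terminating NUL when size > 0, touch
 * nothing past it, and return strlen(src) so truncation can be detected. */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```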

    Signed-off-by: Al Viro

    Al Viro
     
  • …/linux-2.6 into perf/urgent

    Ingo Molnar
     
  • commit 7b6d91daee5cac6402186ff224c3af39d79f4a0e changed the behaviour
    of a few variables in raid1 and raid10 from flags to bit-sets, but
    left them as type 'bool' so they did not work.

    Change them (back) to unsigned long.
    (historical note: see 1ef04fefe2241087d9db7e9615c3f11b516e36cf)
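    Why a bool cannot carry a bit-set: converting any nonzero value to bool
    yields 1, so the tested bit's position is lost when the flag is later ORed
    back into a request. DEMO_RW_SYNC and the helper names are hypothetical; the
    real code uses the block layer's own flag bits.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_RW_SYNC (1UL << 4)  /* hypothetical flag bit */

/* Buggy: the masked bit is squashed to true (1) on conversion to bool,
 * so ORing it back sets bit 0 instead of the sync bit. */
static unsigned long demo_clone_flags_buggy(unsigned long rw)
{
    bool do_sync = rw & DEMO_RW_SYNC;
    return 0UL | do_sync;
}

/* Fixed: keep the masked value in an unsigned long, preserving the bit. */
static unsigned long demo_clone_flags_fixed(unsigned long rw)
{
    unsigned long do_sync = rw & DEMO_RW_SYNC;
    return 0UL | do_sync;
}
```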

    Signed-off-by: NeilBrown
    Reported-by: Jiri Slaby and many others

    NeilBrown
     
  • These flags aren't real I/O types, but tell ll_rw_block to always
    lock the buffer instead of giving up on a failed trylock.

    Instead add a new write_dirty_buffer helper that implements this semantic
    and use it from the existing SWRITE* callers. Note that the ll_rw_block
    code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
    this patch fixes.

    In the ufs code, clean up the helper that used to call ll_rw_block
    to mirror sync_dirty_buffer, which is the function it implements for
    compound buffers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • generic_acl_set didn't update the ctime of the file when its permission was
    changed.

    Steps to reproduce:
    # touch aaa
    # stat -c %Z aaa
    1275289822
    # setfacl -m 'u::x,g::x,o::x' aaa
    # stat -c %Z aaa
    1275289822

    CC: Al Viro
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Commit 77b8a75f5bb introduced a warning at fs/inode.c:692 unlock_new_inode(),
    caused by unlock_new_inode() being called on existing inodes as well.

    This patch changes setup_inode() to only call unlock_new_inode() for I_NEW
    inodes.
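    A toy model of the guard, with a hypothetical DEMO_I_NEW bit and an assert
    standing in for the kernel's warning: only an inode that is still marked new
    is unlocked; an inode found already in the cache is left alone.

```c
#include <assert.h>

#define DEMO_I_NEW 0x8u  /* hypothetical "freshly allocated" state bit */

struct demo_inode {
    unsigned i_state;
    int unlocks;  /* counts calls, for the example */
};

/* Like unlock_new_inode(): only valid on an inode still marked new.
 * The assert stands in for the kernel's warning. */
static void demo_unlock_new_inode(struct demo_inode *inode)
{
    assert(inode->i_state & DEMO_I_NEW);
    inode->i_state &= ~DEMO_I_NEW;
    inode->unlocks++;
}

/* Fixed setup path: a cached, already-initialised inode is left alone. */
static void demo_setup_inode(struct demo_inode *inode)
{
    if (inode->i_state & DEMO_I_NEW) {
        /* ... initialisation of the new inode would go here ... */
        demo_unlock_new_inode(inode);
    }
}
```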

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Al Viro

    Alexander Shishkin
     
    reiserfs_evict_inode calls end_writeback twice, hitting the
    kernel BUG at fs/inode.c:298 because inode->i_state is already I_CLEAR.

    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Al Viro

    Sergey Senozhatsky
     
  • Added comments in kernel-doc notation for previously added struct fields.

    Signed-off-by: Ernst Schwab
    Acked-by: Randy Dunlap
    Signed-off-by: Grant Likely

    Ernst Schwab
     
  • Using the coldfire qspi driver, I get the following error:

    drivers/spi/coldfire_qspi.c: In function 'mcfqspi_irq_handler':
    drivers/spi/coldfire_qspi.c:166: error: 'TASK_NORMAL' undeclared (first use in this function)
    drivers/spi/coldfire_qspi.c:166: error: (Each undeclared identifier is reported only once

    It is solved by adding the following include to coldfire_qspi.c:

    #include <linux/sched.h>

    Fix suggested by Jate Sujjavanich

    Signed-off-by: Greg Ungerer

    Greg Ungerer
     
  • arch/m68knommu/kernel/process.c: formatting of pointers in printk()

    Use %p instead of %08x in printk().

    Signed-off-by: Kulikov Vasiliy
    Signed-off-by: Greg Ungerer

    Kulikov Vasiliy
     
    The arch/m68k/include/asm/ide.h header produces errors when the IDE driver is compiled for my 523x uClinux system under kernel. The header redefines some operators that are not defined in the arch/m68k/include/asm/io_no.h header. There are no separate mmio and iospace defines.

    Signed-off-by: Jate Sujjavanich
    Acked-by: Geert Uytterhoeven
    Signed-off-by: Greg Ungerer

    Jate Sujjavanich
     
  • md_check_recovery expects ->spare_active to return 'true' if any
    spares were activated, but none of them do, so the consequent change
    in 'degraded' is not notified through sysfs.

    So count the number of spares activated, subtract it from 'degraded'
    just once, and return it.
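    A sketch of the contract described above, with hypothetical names and a
    simplified device model: count the spares that become In_sync, adjust
    degraded once, and return the count so a caller can treat a nonzero result
    as "something changed" and notify sysfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct demo_rdev {
    bool faulty;
    bool in_sync;
    bool recovered;  /* recovery onto this spare has completed */
};

/* Activate recovered spares, subtract the total from *degraded in one
 * update, and return how many were activated (nonzero means "changed"). */
static int demo_spare_active(struct demo_rdev *disks, int n, int *degraded)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        struct demo_rdev *r = &disks[i];
        if (!r->faulty && !r->in_sync && r->recovered) {
            r->in_sync = true;
            count++;
        }
    }
    *degraded -= count;
    return count;
}
```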

    Reported-by: Adrian Drzewiecki
    Signed-off-by: NeilBrown

    NeilBrown
     
  • When RAID1 is done syncing disks, it'll update the state
    of synced rdevs to In_sync. But it neglected to notify
    sysfs that the attribute changed. So any programs that
    are waiting for an rdev's state to change will not be
    woken.

    (raid5/raid10 added by neilb)

    Signed-off-by: Adrian Drzewiecki
    Signed-off-by: NeilBrown

    Adrian Drzewiecki
     
    The update of ->recovery_offset in sync_sbs is appropriate even when external
    metadata is in use. However, sync_sbs is only called when native
    metadata is used.

    So move that update into the top of md_update_sb (which is the only
    caller of sync_sbs), before the test on ->external.

    This moves the update out of ->write_lock protection, but those fields
    only need ->reconfig_mutex protection which they still have.

    Also move the test on ->persistent up to where ->external is set as
    for metadata update purposes they are the same.

    Clear MD_CHANGE_DEVS and MD_CHANGE_CLEAN as they can only be confusing
    if ->external is set or ->persistent isn't.

    Finally, move the update of ->utime down, as it is only relevant (like
    the ->events update) for native metadata.

    Signed-off-by: NeilBrown
    Reported-by: "Kwolek, Adam"

    NeilBrown
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    AppArmor: fix task_setrlimit prototype

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
    vt,console,kdb: preserve console_blanked while in kdb
    vt: fix regression warnings from KMS merge
    arm,kgdb: fix GDB_MAX_REGS no longer used
    kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
    kdb: fix compile error without CONFIG_KALLSYMS

    Linus Torvalds