11 Sep, 2009

4 commits

  • This gets rid of pdflush for bdi writeout and kupdated style cleaning.
    pdflush writeout suffers from lack of locality and also requires more
    threads to handle the same workload, since it has to work in a
    non-blocking fashion against each queue. This also introduces lumpy
    behaviour and potential request starvation, since pdflush can be starved
    for queue access if others are accessing it. A sample ffsb workload that
    does random writes to files is about 8% faster here on a simple SATA drive
    during the benchmark phase. File layout also seems a LOT smoother in
    vmstat:

     r  b   swpd   free   buff  cache   si   so    bi     bo   in   cs us sy id wa
     0  1      0 608848   2652 375372    0    0     0  71024  604   24  1 10 48 42
     0  1      0 549644   2712 433736    0    0     0  60692  505   27  1  8 48 44
     1  0      0 476928   2784 505192    0    0     4  29540  553   24  0  9 53 37
     0  1      0 457972   2808 524008    0    0     0  54876  331   16  0  4 38 58
     0  1      0 366128   2928 614284    0    0     4  92168  710   58  0 13 53 34
     0  1      0 295092   3000 684140    0    0     0  62924  572   23  0  9 53 37
     0  1      0 236592   3064 741704    0    0     4  58256  523   17  0  8 48 44
     0  1      0 165608   3132 811464    0    0     0  57460  560   21  0  8 54 38
     0  1      0 102952   3200 873164    0    0     4  74748  540   29  1 10 48 41
     0  1      0  48604   3252 926472    0    0     0  53248  469   29  0  7 47 45

    where vanilla tends to fluctuate a lot in the creation phase:

     r  b   swpd   free   buff  cache   si   so    bi     bo   in   cs us sy id wa
     1  1      0 678716   5792 303380    0    0     0  74064  565   50  1 11 52 36
     1  0      0 662488   5864 319396    0    0     4    352  302  329  0  2 47 51
     0  1      0 599312   5924 381468    0    0     0  78164  516   55  0  9 51 40
     0  1      0 519952   6008 459516    0    0     4  78156  622   56  1 11 52 37
     1  1      0 436640   6092 541632    0    0     0  82244  622   54  0 11 48 41
     0  1      0 436640   6092 541660    0    0     0      8  152   39  0  0 51 49
     0  1      0 332224   6200 644252    0    0     4 102800  728   46  1 13 49 36
     1  0      0 274492   6260 701056    0    0     4  12328  459   49  0  7 50 43
     0  1      0 211220   6324 763356    0    0     0 106940  515   37  1 10 51 39
     1  0      0 160412   6376 813468    0    0     0   8224  415   43  0  6 49 45
     1  1      0  85980   6452 886556    0    0     4 113516  575   39  1 11 54 34
     0  2      0  85968   6452 886620    0    0     0   1640  158  211  0  0 46 54
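
    To put a number on "smooth" versus "fluctuate", the bo column (blocks
    written out per second) of the two traces above can be summarized. The C
    sketch below copies those samples verbatim and computes each trace's
    mean, range, and squared coefficient of variation (population variance
    divided by the squared mean); the squared form is used only so the
    example needs no sqrt() and no -lm. The helper names are illustrative,
    not kernel code.

```c
#include <stddef.h>

/* bo column (blocks written out per second) copied from the two vmstat traces */
static const long bo_per_bdi[] = {71024, 60692, 29540, 54876, 92168,
                                  62924, 58256, 57460, 74748, 53248};
static const long bo_vanilla[] = {74064, 352, 78164, 78156, 82244, 8,
                                  102800, 12328, 106940, 8224, 113516, 1640};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

struct bo_stats {
    double mean;
    double cv2;     /* population variance / mean^2 (squared coefficient of variation) */
    long min, max;
};

/* Summarize n >= 1 samples. */
static struct bo_stats bo_summary(const long *v, size_t n)
{
    struct bo_stats s = {0.0, 0.0, v[0], v[0]};
    double sum = 0.0, sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += (double)v[i];
        if (v[i] < s.min)
            s.min = v[i];
        if (v[i] > s.max)
            s.max = v[i];
    }
    s.mean = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = (double)v[i] - s.mean;
        sq += d * d;
    }
    s.cv2 = (sq / (double)n) / (s.mean * s.mean);
    return s;
}
```

    With these samples the per-bdi trace has a squared coefficient of
    variation of about 0.06, against about 0.65 for the vanilla trace, which
    matches the visual impression of a much steadier write-out rate.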

    A 10-disk test with btrfs performs 26% faster with per-bdi flushing. An
    SSD-based writeback test on XFS performs over 20% better as well, with
    the throughput being very stable around 1GB/sec, where pdflush only
    manages 750MB/sec and fluctuates wildly while doing so. Random buffered
    writes to many files behave a lot better as well, as do random mmap'ed
    writes.

    A separate thread is added to sync the super blocks. In the long term,
    adding sync_supers_bdi() functionality could get rid of this thread again.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This is a first step at introducing per-bdi flusher threads. We should
    have no change in behaviour, although sb_has_dirty_inodes() is now
    ridiculously expensive, as there's no easy way to answer that question.
    Not a huge problem, since it'll be deleted in subsequent patches.
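
    A simplified model shows why the question becomes expensive, assuming
    (as the per-bdi direction of this series implies) that dirty inodes are
    now kept on lists owned by each backing device (bdi) rather than by each
    super_block. Answering "does this super_block have dirty inodes?" then
    means walking every bdi's list and comparing each inode's owner. The
    struct and function names are hypothetical, and each bdi is given a
    single dirty list for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

struct super_block_model { int id; };          /* hypothetical stand-in */

struct inode_model {
    const struct super_block_model *sb;         /* owning super_block */
    struct inode_model *next_dirty;             /* next inode on the bdi's dirty list */
};

struct bdi_model {
    struct inode_model *dirty;                  /* per-bdi dirty list */
    struct bdi_model *next;                     /* global list of all bdis */
};

/* Cost grows with the total number of dirty inodes on ALL bdis:
 * there is no per-super_block list left to consult directly. */
static bool sb_has_dirty_inodes_model(const struct bdi_model *all_bdis,
                                      const struct super_block_model *sb)
{
    for (const struct bdi_model *bdi = all_bdis; bdi; bdi = bdi->next)
        for (const struct inode_model *i = bdi->dirty; i; i = i->next_dirty)
            if (i->sb == sb)
                return true;
    return false;
}
```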

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This adds two new exported functions:

    - writeback_inodes_sb(), which only attempts to write back dirty inodes on
    this super_block, for WB_SYNC_NONE writeout.
    - sync_inodes_sb(), which writes out all dirty inodes on this super_block
    and also waits for the IO to complete.
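
    The difference between the two calls can be pictured with a toy model in
    which "writing back" an inode just changes its state. The names and the
    three-state enum are hypothetical; the sketch only captures the
    documented contrast: writeback_inodes_sb() starts writeout of this
    super_block's dirty inodes without waiting (WB_SYNC_NONE), while
    sync_inodes_sb() also waits until that IO has completed.

```c
#include <stddef.h>

enum inode_state_model { CLEAN, DIRTY, UNDER_WRITEBACK };

struct inode_model { int sb_id; enum inode_state_model state; };

/* Toy WB_SYNC_NONE pass: start IO on this sb's dirty inodes, do not wait. */
static size_t writeback_inodes_sb_model(struct inode_model *v, size_t n, int sb_id)
{
    size_t started = 0;

    for (size_t i = 0; i < n; i++)
        if (v[i].sb_id == sb_id && v[i].state == DIRTY) {
            v[i].state = UNDER_WRITEBACK;   /* IO submitted, still in flight */
            started++;
        }
    return started;
}

/* Toy integrity sync: start IO, then wait for all of this sb's IO to finish. */
static void sync_inodes_sb_model(struct inode_model *v, size_t n, int sb_id)
{
    writeback_inodes_sb_model(v, n, sb_id);
    for (size_t i = 0; i < n; i++)
        if (v[i].sb_id == sb_id && v[i].state == UNDER_WRITEBACK)
            v[i].state = CLEAN;             /* "waiting" modeled as completion */
}
```

    Inodes belonging to other super_blocks are left untouched by both calls.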

    Acked-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Commit b8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea ("dm log: remove incorrect
    field from userspace table output") added a call to strstr() with a
    single-character "needle" string parameter.

    Unfortunately some versions of gcc replace such calls to strstr() with calls
    to strchr() behind our back. This causes linking errors if strchr() is
    defined as an inline function (e.g. on m68k):

    | WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined!

    Avoid this by explicitly calling strchr() instead.
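
    For a one-character needle the two calls are interchangeable, which is
    what makes both gcc's rewrite and the manual fix safe: strstr(s, " ")
    and strchr(s, ' ') both return a pointer to the first space in s, or
    NULL if there is none. The field-splitting helper below is only an
    illustration with a hypothetical name; it does not reproduce the
    dm-log-userspace code.

```c
#include <string.h>

/* Return a pointer just past the first ' ' in s, or NULL if s has no space.
 * strchr() with a char is used directly instead of strstr(s, " "); both
 * locate the same character. */
static const char *after_first_space(const char *s)
{
    const char *p = strchr(s, ' ');
    return p ? p + 1 : NULL;
}
```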

    Signed-off-by: Geert Uytterhoeven
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

10 Sep, 2009

3 commits

  • * lookup-permissions-cleanup:
    jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
    ext[234]: move over to 'check_acl' permission model
    shmfs: use 'check_acl' instead of 'permission'
    Make 'check_acl()' a first-class filesystem op
    Simplify exec_permission_lite(), part 3
    Simplify exec_permission_lite() further
    Simplify exec_permission_lite() logic
    Do not call 'ima_path_check()' for each path component

    Linus Torvalds
     
  • In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
    the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT.

    Here is a small test case. (Yes, there are other, useful PT_INTERP
    which have only .text and no .data/.bss.)

    ----- ptinterp.S
    _start: .globl _start
    nop
    int3
    -----
    $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
    $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
    $ ./hello
    Segmentation fault # during execve() itself

    After applying the patch:
    $ ./hello
    Trace trap # user-mode execution after execve() finishes

    If the ELF headers are actually self-inconsistent, then dying is fine.
    But having no PROT_WRITE segment is perfectly normal and correct if
    there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
    suggested checking for PROT_WRITE in the bss logic. I think it makes
    most sense to simply apply the bss logic only when there is bss.

    This patch looks less trivial than it is due to some reindentation.
    It just moves the "if (last_bss > elf_bss) {" test up to include the
    partial-page bss logic as well as the more-pages bss logic.
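
    The effect of hoisting the test can be modeled outside the kernel. The
    sketch below uses a hypothetical, simplified per-segment struct and an
    assumed 4096-byte page size; elf_bss is the end of the file-backed bytes
    and last_bss the end of the segment in memory, following the names in
    the description. With the check hoisted, a segment with
    p_memsz == p_filesz (no bss) gets neither the partial-page zeroing that
    padzero() performs nor any extra anonymous pages.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for the model */

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct load_seg { uint64_t vaddr, filesz, memsz; };

struct bss_plan {
    int zero_partial_page;  /* would the tail of the last file page be zeroed? */
    uint64_t anon_bytes;    /* whole anonymous pages needed past the file pages */
};

static uint64_t page_up(uint64_t x)
{
    return (x + MODEL_PAGE_SIZE - 1) & ~(uint64_t)(MODEL_PAGE_SIZE - 1);
}

static struct bss_plan plan_bss(const struct load_seg *s)
{
    struct bss_plan plan = {0, 0};
    uint64_t elf_bss = s->vaddr + s->filesz;    /* end of file-backed bytes */
    uint64_t last_bss = s->vaddr + s->memsz;    /* end of the segment in memory */

    if (last_bss > elf_bss) {                   /* only when there really is bss */
        plan.zero_partial_page = (elf_bss % MODEL_PAGE_SIZE) != 0;
        if (page_up(last_bss) > page_up(elf_bss))
            plan.anon_bytes = page_up(last_bss) - page_up(elf_bss);
    }
    return plan;
}
```

    Without the hoisted test, the text-only case would still see an
    elf_bss of 0x1200, which is not page-aligned, and so would attempt the
    zeroing write that faults on a read-only mapping.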

    Reported-by: John Reiser
    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Linus Torvalds
     

09 Sep, 2009

11 commits

  • Andy Whitcroft reported an oops in aoe triggered by use of an
    incorrectly initialised request_queue object:

    [ 2645.959090] kobject '' (ffff880059ca22c0): tried to add
    an uninitialized object, something is seriously wrong.
    [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
    [ 2645.959107] Call Trace:
    [ 2645.959139] [] kobject_add+0x5f/0x70
    [ 2645.959151] [] blk_register_queue+0x8b/0xf0
    [ 2645.959155] [] add_disk+0x8f/0x160
    [ 2645.959161] [] aoeblk_gdalloc+0x164/0x1c0 [aoe]

    The request queue of an aoe device is not used but can be allocated in
    code that does not sleep.

    Bruno bisected this regression down to

    cd43e26f071524647e660706b784ebcbefbd2e44

    block: Expose stacked device queues in sysfs

    "This seems to generate /sys/block/$device/queue and its contents for
    everyone who is using queues, not just for those queues that have a
    non-NULL queue->request_fn."

    Addresses http://bugs.launchpad.net/bugs/410198
    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942

    Note that embedding a queue inside another object has always been
    an illegal construct, since the queues are reference counted and
    must persist until the last reference is dropped. So aoe was
    always buggy in this respect (Jens).
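
    The lifetime rule can be illustrated with a small userspace model: a
    reference-counted object must be allocated on its own and freed only
    when its last reference is dropped, so it may outlive whatever created
    it. If it were embedded in the owning device structure, freeing the
    device would free the queue even while sysfs (or anyone else) still held
    a reference. The type and function names below are hypothetical; the
    kernel uses kobject reference counting rather than this code.

```c
#include <stdlib.h>

/* Hypothetical refcounted queue: allocated separately, never embedded. */
struct queue_model {
    int refcount;
};

static int queues_freed;    /* counts frees so the lifetime can be observed */

static struct queue_model *queue_alloc(void)
{
    struct queue_model *q = malloc(sizeof(*q));
    if (q)
        q->refcount = 1;    /* reference owned by the creator */
    return q;
}

static void queue_get(struct queue_model *q) { q->refcount++; }

static void queue_put(struct queue_model *q)
{
    if (--q->refcount == 0) {   /* last reference: only now is freeing safe */
        free(q);
        queues_freed++;
    }
}

/* The owning device holds a pointer (and one reference), not an embedded queue. */
struct device_model {
    struct queue_model *queue;
};

static void device_destroy(struct device_model *dev)
{
    queue_put(dev->queue);      /* drop the device's reference only */
    dev->queue = NULL;
}
```

    The count is a plain int because the sketch is single-threaded; kernel
    reference counts are atomic.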

    Signed-off-by: Ed Cashin
    Cc: Andy Whitcroft
    Cc: "Rafael J. Wysocki"
    Cc: Bruno Premont
    Cc: Martin K. Petersen
    Cc: Andrew Morton
    Signed-off-by: Jens Axboe

    Ed Cashin
     
  • Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
    when switching from graphics mode to the text console, or when
    suspending (which does the same thing). With netconsole, the oops
    turned out to be

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
    IP: [] i915_driver_irq_handler+0x26b/0xd20 [i915]

    and it's due to the i915_gem.c code doing drm_irq_uninstall() after
    having done i915_gem_idle(). And the i915_gem_idle() path will do

    i915_gem_idle() ->
    i915_gem_cleanup_ringbuffer() ->
    i915_gem_cleanup_hws() ->
    dev_priv->hw_status_page = NULL;

    but if an i915 interrupt comes in after this stage, it may want to
    access that hw_status_page, and gets the above NULL pointer dereference.

    And since the NULL pointer dereference happens from within an interrupt,
    and with the screen still in graphics mode, the common end result is
    simply a silently hung machine.

    Fix it by simply uninstalling the irq handler before idling rather than
    after. Fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=13819
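
    The ordering bug reduces to a general teardown rule: stop the source of
    asynchronous callbacks before freeing the state those callbacks read.
    The model below uses hypothetical names; an "interrupt" is just a
    function call that reaches the handler only while a handler is
    installed, idle_model() plays the role of i915_gem_idle() clearing
    hw_status_page, and the race is modeled as an interrupt arriving between
    the two teardown steps.

```c
#include <stdbool.h>
#include <stddef.h>

struct gpu_model {
    bool irq_installed;
    const unsigned *hw_status_page;     /* NULL once idled, as in the report */
};

/* Stand-in for the IRQ handler, which reads hw_status_page. Returns false
 * where the real handler would have hit the NULL pointer dereference. */
static bool irq_handler_model(const struct gpu_model *g, unsigned *out)
{
    if (g->hw_status_page == NULL)
        return false;
    *out = g->hw_status_page[0];
    return true;
}

/* An interrupt arriving; returns false if it would have crashed the machine. */
static bool raise_irq_model(const struct gpu_model *g, unsigned *out)
{
    if (!g->irq_installed)
        return true;                    /* no handler runs, nothing to crash */
    return irq_handler_model(g, out);
}

static void idle_model(struct gpu_model *g)          { g->hw_status_page = NULL; }
static void irq_uninstall_model(struct gpu_model *g) { g->irq_installed = false; }

/* Fixed order: uninstall the handler first, then idle. */
static void teardown_fixed(struct gpu_model *g)
{
    irq_uninstall_model(g);
    idle_model(g);
}
```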

    Reported-and-tested-by: Reinette Chatre
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This avoids an indirect call in the VFS for each path component lookup.

    Well, at least as long as you own the directory in question, and the ACL
    check is unnecessary.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Don't implement per-filesystem 'extX_permission()' functions that have
    to be called for every path component operation, and instead just expose
    the actual ACL checking so that the VFS layer can now do it for us.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • shmfs wants purely standard POSIX ACL semantics, so we can use the new
    generic VFS layer POSIX ACL checking rather than cooking our own
    'permission()' function.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This is stage one in flattening out the callchains for the common
    permission testing. Rather than have most filesystems implement their
    own inode->i_op->permission function that just calls back down to the
    VFS layer's 'generic_permission()' with the per-filesystem ACL checking
    function, the filesystem can just expose its 'check_acl' function
    directly, and let the VFS layer do everything for it.

    This is all just preparatory - no filesystem actually enables this yet.
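
    The control flow can be sketched under stated simplifications: a
    filesystem either supplies a full permission() hook, or only a
    check_acl() hook that the generic mode-bit check consults for
    non-owners. The model is loosely patterned on generic_permission() but
    is not the kernel code: struct and function names are hypothetical,
    capabilities are ignored, and check_acl() is assumed to return -EAGAIN
    when there is no ACL so the caller falls back to the group bits. The
    owner case never reaches the ACL callback at all, which is the
    per-component indirect call the conversions above avoid.

```c
#include <errno.h>
#include <stddef.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

struct inode_model;

struct inode_ops_model {
    int (*permission)(struct inode_model *, int mask);  /* full override, optional */
    int (*check_acl)(struct inode_model *, int mask);   /* ACL hook, optional */
};

struct inode_model {
    unsigned uid, gid;
    unsigned mode;                      /* rwxrwxrwx permission bits */
    const struct inode_ops_model *ops;
};

static int acl_calls;                   /* how often the ACL hook ran */

/* Example ACL hook: pretend there is no ACL, so defer to the mode bits. */
static int no_acl_check(struct inode_model *inode, int mask)
{
    (void)inode;
    (void)mask;
    acl_calls++;
    return -EAGAIN;
}

/* Simplified generic check: owner bits, else ACL (if any), else group/other. */
static int generic_permission_model(struct inode_model *inode, int mask,
                                    unsigned uid, unsigned gid)
{
    unsigned mode = inode->mode;

    if (uid == inode->uid) {
        mode >>= 6;                     /* owner: no ACL call at all */
    } else {
        if (inode->ops->check_acl && (mode & 070)) {
            int err = inode->ops->check_acl(inode, mask);
            if (err != -EAGAIN)
                return err;             /* the ACL gave a definite answer */
        }
        if (gid == inode->gid)
            mode >>= 3;
    }
    return ((unsigned)mask & ~mode & 7) == 0 ? 0 : -EACCES;
}

/* VFS-side dispatch: use the fs hook if present, else do it generically. */
static int inode_permission_model(struct inode_model *inode, int mask,
                                  unsigned uid, unsigned gid)
{
    if (inode->ops->permission)
        return inode->ops->permission(inode, mask);
    return generic_permission_model(inode, mask, uid, gid);
}
```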

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Don't call down to the generic inode_permission() function just to
    call the inode-specific permission function - just do it directly.

    The generic inode_permission() code does things like checking MAY_WRITE
    and devcgroup_inode_permission(), neither of which is relevant for the
    light pathname walk permission checks (we always do just MAY_EXEC, and
    the inode is never a special device).

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This function is only called for path components that are already known
    to be directories (they have a '->lookup' method). So don't bother
    doing that whole S_ISDIR() testing; the whole point of the 'lite()'
    version is that we know that we are looking at a directory component,
    and that we're only checking name lookup permission.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Instead of returning EAGAIN and having the caller do something
    special for that case, just do the special case directly.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Not only is per-component path lookup a supremely timing-critical path,
    but it's hopefully some day going to be lockless for the common case,
    and ima can't do that.

    Plus the integrity code doesn't even care about non-regular files, so it
    was always a total waste of time and effort.

    Acked-by: Serge Hallyn
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • eDP is an exclusive connector too; also add the missing crtc_mask
    setting for TV.

    This fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=14139

    Signed-off-by: Zhenyu Wang
    Reported-and-tested-by: Carlos R. Mafra
    Signed-off-by: Linus Torvalds

    Zhenyu Wang
     

08 Sep, 2009

5 commits


07 Sep, 2009

2 commits

  • This adds some rv350+ registers for LTE/GTE discard,
    and enables the rv515 two-sided stencil register.
    It also disables the DEPTHXY_OFFSET register, which
    can be used to work around the CS checker.
    It moves rs690 to the proper place in rs600 and uses the correct
    table on rs600.

    Signed-off-by: Dave Airlie

    Dave Airlie
     
  • - As ima_counts_put() may be called after the inode has been freed,
    verify that the inode is not NULL before dereferencing it.

    - Maintain the IMA file counters in may_open() properly, decrementing
    any counter increments on subsequent errors.
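
    The second point is the usual error-unwinding pattern: every counter
    bumped on the way in must be undone on each later failure path. A
    minimal sketch with hypothetical names (the real may_open() and IMA
    counters differ) rolls back through a single exit label, and the put
    side tolerates a NULL inode as the first point requires.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-inode open counters, loosely modeled on IMA's. */
struct counted_inode { int readers, writers; };

static void counts_get(struct counted_inode *inode, int want_write)
{
    if (want_write)
        inode->writers++;
    else
        inode->readers++;
}

/* Tolerates NULL, as the first fix requires. */
static void counts_put(struct counted_inode *inode, int want_write)
{
    if (!inode)
        return;
    if (want_write)
        inode->writers--;
    else
        inode->readers--;
}

/* Hypothetical open path: later_step_fails simulates an error after counting. */
static int open_model(struct counted_inode *inode, int want_write,
                      int later_step_fails)
{
    int err = 0;

    counts_get(inode, want_write);
    if (later_step_fails) {
        err = -EACCES;
        goto out_put;           /* undo the increment before returning */
    }
    return 0;

out_put:
    counts_put(inode, want_write);
    return err;
}
```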

    Reported-by: Ciprian Docan
    Reported-by: J.R. Okajima
    Signed-off-by: Mimi Zohar
    Acked-by: Eric Paris

    Mimi Zohar
     

06 Sep, 2009

15 commits