11 Sep, 2009

18 commits

  • Splice should update the modification and access times on regular
    files just like read and write. Not updating mtime will confuse
    backup tools, etc...

    This patch only adds the time updates for regular files. For pipes
    and other special files that splice touches, the need for updating the
    times is less clear. Let's discuss and fix that separately.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Jens Axboe

    Miklos Szeredi
     
  • Return 0 if we successfully marked this iopoll structure as ours for
    scheduling, instead of 1.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's not currently used, as pointed out by
    Gui Jianfeng. We already check the
    wait_request flag to allow an idling queue priority allocation access,
    so we don't need this extra flag.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We already have interrupts disabled at that point, so use the
    __raise_softirq_irqoff() variant.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's not exported, I doubt we'll have a reason to change this...

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Not sure why they happened in the first place, probably some bad
    terminal setting.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This borrows some code from NAPI and implements a polled completion
    mode for block devices. The idea is the same as NAPI - instead of
    doing the command completion when the irq occurs, schedule a dedicated
    softirq in the hopes that we will complete more IO when the iopoll
    handler is invoked. Devices have a budget of commands assigned, and will
    stay in polled mode as long as they continue to consume their budget
    from the iopoll softirq handler. If they do not, the device is set back
    to interrupt completion mode.

    This patch holds the core bits for blk-iopoll, device driver support
    sold separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
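
The budget mechanism described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct poll_dev`, `dev_irq()` and `dev_poll()` are hypothetical names invented for illustration and are not the blk-iopoll API. The model follows the convention that claiming a device for polling returns 0 on success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device with NAPI-style polled completion. */
struct poll_dev {
    int pending;   /* completions waiting on the device */
    int budget;    /* maximum completions handled per poll pass */
    bool polled;   /* true in polled mode, false in interrupt mode */
};

/*
 * "Interrupt" path: do not complete anything here, just claim the
 * device for polling. Returns 0 if this call claimed it, nonzero if
 * it was already claimed.
 */
static int dev_irq(struct poll_dev *d)
{
    if (d->polled)
        return 1;
    d->polled = true;
    return 0;
}

/*
 * One pass of the poll handler: complete up to 'budget' commands.
 * If the device did not use its whole budget, switch it back to
 * interrupt mode. Returns the number of commands completed.
 */
static int dev_poll(struct poll_dev *d)
{
    assert(d->polled && d->budget > 0);

    int done = d->pending < d->budget ? d->pending : d->budget;

    d->pending -= done;
    if (done < d->budget)
        d->polled = false;
    return done;
}
```

With 10 pending commands and a budget of 4, this model stays in polled mode for two full passes and drops back to interrupt mode on the third pass, which completes only 2 commands.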
     
  • Instead of just checking whether this device uses block layer
    tagging, we can improve the detection by looking at the maximum
    queue depth it has reached. If that crosses 4, then deem it a
    queuing device.

    This is important on high IOPS devices, since plugging hurts
    the performance there (it can be as much as 10-15% of the sys
    time).

    Signed-off-by: Jens Axboe

    Jens Axboe
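
A minimal sketch of the detection idea above, assuming that "crosses 4" means "strictly greater than 4". The `depth_tracker` type and its helpers are hypothetical names, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a device counts as "queuing" once its depth exceeds 4. */
#define QUEUING_DEPTH_THRESHOLD 4

struct depth_tracker {
    int in_flight;   /* requests currently outstanding at the device */
    int max_depth;   /* highest in_flight value observed so far */
};

/* Record a dispatched request and update the high-water mark. */
static void tracker_dispatch(struct depth_tracker *t)
{
    t->in_flight++;
    if (t->in_flight > t->max_depth)
        t->max_depth = t->in_flight;
}

/* Record a completed request. */
static void tracker_complete(struct depth_tracker *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}

/* The decision depends on the peak depth, not on the current depth. */
static bool tracker_is_queuing(const struct depth_tracker *t)
{
    return t->max_depth > QUEUING_DEPTH_THRESHOLD;
}
```

Because the decision uses the high-water mark, a device that once reached a depth of 5 keeps being treated as a queuing device in this model even after it drains.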
     
  • Get rid of any functions that test for these bits and make callers
    use bio_rw_flagged() directly. Then it is at least directly apparent
    what variable and flag they check.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Makes for a saner interface, instead of returning the bit position.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Whenever a block device changes its read-only attribute,
    notify userspace about it.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     
  • o Get rid of busy_rt_queues infrastructure. Looks like it is redundant.

    o Once an RT queue gets a request it will preempt any of the BE or IDLE queues
    immediately. Otherwise this queue will be put on the service tree and the scheduler
    will select this queue before any BE or IDLE queue anyway. Hence
    there appears to be no need to keep track of how many busy RT queues are
    currently on the service tree.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • To lessen the impact of async IO on sync IO, let the device drain of
    any async IO in progress when switching to a sync cfqq that has idling
    enabled.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Update scsi_io_completion() such that it only fails requests till the
    next error boundary and retries the leftover. This enables the block layer
    to merge requests with different failfast settings and still behave
    correctly on errors. Allow merging of requests with different failfast
    settings.

    As SCSI is currently the only subsystem which follows failfast status,
    there's no need to worry about other block drivers for now.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Failfast has characteristics different from other attributes. When issuing,
    executing and successfully completing requests, failfast doesn't make
    any difference. It only affects how a request is handled on failure.
    Allowing requests with different failfast settings to be merged causes
    normal IOs to fail prematurely, while not allowing it has performance
    penalties, as failfast is used for read aheads which are likely to be
    located near in-flight or to-be-issued normal IOs.

    This patch introduces the concept of 'mixed merge'. A request is a
    mixed merge if it is a merge of segments which require different
    handling on failure. Currently the only mixable attributes are
    failfast ones (or lack thereof).

    When a bio with different failfast settings is added to an existing
    request or requests of different failfast settings are merged, the
    merged request is marked mixed. Each bio carries failfast settings
    and the request always tracks failfast state of the first bio. When
    the request fails, blk_rq_err_bytes() can be used to determine how
    many bytes can be safely failed without crossing into an area which
    requires further retrials.

    This allows request merging regardless of failfast settings while
    keeping the failure handling correct.

    This patch only implements mixed merge but doesn't enable it. The
    next one will update SCSI to make use of mixed merge.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Signed-off-by: Jens Axboe

    Tejun Heo
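
The error-boundary idea behind blk_rq_err_bytes() can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: `struct demo_seg` and `demo_err_bytes()` are hypothetical names, and each array element stands in for one bio of a merged request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bio inside a merged request. */
struct demo_seg {
    unsigned int failfast;   /* failfast bits carried by this segment */
    unsigned int bytes;      /* payload size of this segment */
};

/*
 * Count how many bytes at the front of the request share the failure
 * handling of the first segment. The request tracks the failfast state
 * of its first segment; the walk stops at the first segment that lacks
 * one of those bits, because that segment needs further retries.
 */
static unsigned int demo_err_bytes(const struct demo_seg *segs, size_t n)
{
    unsigned int ff, bytes = 0;

    assert(segs != NULL || n == 0);
    if (n == 0)
        return 0;

    ff = segs[0].failfast;
    for (size_t i = 0; i < n; i++) {
        if ((segs[i].failfast & ff) != ff)
            break;
        bytes += segs[i].bytes;
    }
    return bytes;
}
```

For a request made of two failfast read-ahead segments followed by a normal segment, only the first two segments are failed in this model, and the normal segment is left for a retry.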
     
  • bio and request use the same set of failfast bits. This patch makes
    the following changes to simplify things.

    * enumify BIO_RW* bits and reorder bits such that BIO_RW_FAILFAST_*
    bits coincide with __REQ_FAILFAST_* bits.

    * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
    but the matching is useless anyway. init_request_from_bio() is
    responsible for setting FAILFAST bits on FS requests and non-FS
    requests never use BIO_RW_AHEAD. Drop the code and comment from
    blk_rq_bio_prep().

    * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
    simplify FAILFAST flags handling in init_request_from_bio().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
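
Why coinciding bit positions simplify the flag handling can be shown with a short sketch. All `DEMO_*` names and bit positions below are hypothetical and chosen only for illustration; they are not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical bio-side bit positions. */
enum demo_bio_rw_bits {
    DEMO_BIO_RW = 0,
    DEMO_BIO_RW_FAILFAST_DEV,
    DEMO_BIO_RW_FAILFAST_TRANSPORT,
    DEMO_BIO_RW_FAILFAST_DRIVER,
    DEMO_BIO_RW_AHEAD,
};

/* Hypothetical request-side bit positions, kept in the same order. */
enum demo_req_bits {
    DEMO_REQ_RW_BIT = 0,
    DEMO_REQ_FAILFAST_DEV_BIT,
    DEMO_REQ_FAILFAST_TRANSPORT_BIT,
    DEMO_REQ_FAILFAST_DRIVER_BIT,
};

#define DEMO_REQ_FAILFAST_DEV       (1u << DEMO_REQ_FAILFAST_DEV_BIT)
#define DEMO_REQ_FAILFAST_TRANSPORT (1u << DEMO_REQ_FAILFAST_TRANSPORT_BIT)
#define DEMO_REQ_FAILFAST_DRIVER    (1u << DEMO_REQ_FAILFAST_DRIVER_BIT)

/* OR of all failfast bits. */
#define DEMO_REQ_FAILFAST_MASK \
    (DEMO_REQ_FAILFAST_DEV | DEMO_REQ_FAILFAST_TRANSPORT | \
     DEMO_REQ_FAILFAST_DRIVER)

/* Fail the build if the two layouts ever drift apart. */
_Static_assert((int)DEMO_BIO_RW_FAILFAST_DEV ==
               (int)DEMO_REQ_FAILFAST_DEV_BIT, "DEV bits differ");
_Static_assert((int)DEMO_BIO_RW_FAILFAST_TRANSPORT ==
               (int)DEMO_REQ_FAILFAST_TRANSPORT_BIT, "TRANSPORT bits differ");
_Static_assert((int)DEMO_BIO_RW_FAILFAST_DRIVER ==
               (int)DEMO_REQ_FAILFAST_DRIVER_BIT, "DRIVER bits differ");

/* With matching layouts, copying the failfast flags is a single mask. */
static unsigned int demo_req_failfast_from_bio(unsigned int bio_rw)
{
    return bio_rw & DEMO_REQ_FAILFAST_MASK;
}
```

Non-failfast bits such as the read-ahead bit fall outside the mask in this sketch, so they are dropped rather than copied.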
     
  • Commit b8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea ("dm log: remove incorrect
    field from userspace table output") added a call to strstr() with a
    single-character "needle" string parameter.

    Unfortunately some versions of gcc replace such calls to strstr() by calls
    to strchr() behind our back. This causes linking errors if strchr() is
    defined as an inline function in (e.g. on m68k):

    | WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined!

    Avoid this by explicitly calling strchr() instead.

    Signed-off-by: Geert Uytterhoeven
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
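
For a single-character needle the two calls find the same position, which is why the substitution is safe. A minimal standard C sketch follows; the helper names are hypothetical, and the character searched for is a parameter because the commit text does not name the one used in dm-log-userspace.

```c
#include <assert.h>
#include <string.h>

/*
 * Search for one character using strstr() with a one-character needle,
 * the pattern some compilers rewrite into strchr().
 */
static const char *find_via_strstr(const char *hay, char c)
{
    /* strstr(s, "") and strchr(s, '\0') differ, so exclude NUL. */
    assert(c != '\0');

    char needle[2] = { c, '\0' };

    return strstr(hay, needle);
}

/* The explicit form: call strchr() directly. */
static const char *find_via_strchr(const char *hay, char c)
{
    assert(c != '\0');
    return strchr(hay, c);
}
```

Both helpers return a pointer to the first occurrence of the character, or NULL when it is absent.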
     

10 Sep, 2009

3 commits

  • * lookup-permissions-cleanup:
    jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
    ext[234]: move over to 'check_acl' permission model
    shmfs: use 'check_acl' instead of 'permission'
    Make 'check_acl()' a first-class filesystem op
    Simplify exec_permission_lite(), part 3
    Simplify exec_permission_lite() further
    Simplify exec_permission_lite() logic
    Do not call 'ima_path_check()' for each path component

    Linus Torvalds
     
  • In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
    the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT.

    Here is a small test case. (Yes, there are other, useful PT_INTERP
    which have only .text and no .data/.bss.)

    ----- ptinterp.S
    _start: .globl _start
    nop
    int3
    -----
    $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
    $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
    $ ./hello
    Segmentation fault # during execve() itself

    After applying the patch:
    $ ./hello
    Trace trap # user-mode execution after execve() finishes

    If the ELF headers are actually self-inconsistent, then dying is fine.
    But having no PROT_WRITE segment is perfectly normal and correct if
    there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
    suggested checking for PROT_WRITE in the bss logic. I think it makes
    most sense to simply apply the bss logic only when there is bss.

    This patch looks less trivial than it is due to some reindentation.
    It just moves the "if (last_bss > elf_bss) {" test up to include the
    partial-page bss logic as well as the more-pages bss logic.

    Reported-by: John Reiser
    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
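
The "only apply the bss logic when there is bss" condition can be modelled outside the kernel. The sketch below assumes a simplified segment description; `struct demo_load` and `demo_has_bss()` are hypothetical names and do not reproduce load_elf_interp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct demo_load {
    unsigned long vaddr;    /* p_vaddr */
    unsigned long filesz;   /* p_filesz: bytes backed by the file */
    unsigned long memsz;    /* p_memsz: bytes occupied in memory */
};

/*
 * elf_bss is the highest address backed by file contents and last_bss
 * is the highest address occupied in memory. There is bss to zero only
 * when last_bss lies beyond elf_bss, that is, when some segment has
 * p_memsz > p_filesz past the end of the file-backed data.
 */
static bool demo_has_bss(const struct demo_load *segs, size_t n)
{
    unsigned long elf_bss = 0, last_bss = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long file_end = segs[i].vaddr + segs[i].filesz;
        unsigned long mem_end = segs[i].vaddr + segs[i].memsz;

        assert(segs[i].memsz >= segs[i].filesz);
        if (file_end > elf_bss)
            elf_bss = file_end;
        if (mem_end > last_bss)
            last_bss = mem_end;
    }
    return last_bss > elf_bss;
}
```

A text-only interpreter like the ptinterp test case has a single segment with equal file and memory sizes, so this check reports no bss and no zeroing would be attempted.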
     
  • Linus Torvalds
     

09 Sep, 2009

11 commits

  • Andy Whitcroft reported an oops in aoe triggered by use of an
    incorrectly initialised request_queue object:

    [ 2645.959090] kobject '' (ffff880059ca22c0): tried to add
    an uninitialized object, something is seriously wrong.
    [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
    [ 2645.959107] Call Trace:
    [ 2645.959139] [] kobject_add+0x5f/0x70
    [ 2645.959151] [] blk_register_queue+0x8b/0xf0
    [ 2645.959155] [] add_disk+0x8f/0x160
    [ 2645.959161] [] aoeblk_gdalloc+0x164/0x1c0 [aoe]

    The request queue of an aoe device is not used but can be allocated in
    code that does not sleep.

    Bruno bisected this regression down to

    cd43e26f071524647e660706b784ebcbefbd2e44

    block: Expose stacked device queues in sysfs

    "This seems to generate /sys/block/$device/queue and its contents for
    everyone who is using queues, not just for those queues that have a
    non-NULL queue->request_fn."

    Addresses http://bugs.launchpad.net/bugs/410198
    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942

    Note that embedding a queue inside another object has always been
    an illegal construct, since the queues are reference counted and
    must persist until the last reference is dropped. So aoe was
    always buggy in this respect (Jens).

    Signed-off-by: Ed Cashin
    Cc: Andy Whitcroft
    Cc: "Rafael J. Wysocki"
    Cc: Bruno Premont
    Cc: Martin K. Petersen
    Cc: Andrew Morton
    Signed-off-by: Jens Axboe

    Ed Cashin
     
  • Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
    when switching from graphics mode to the text console, or when
    suspending (which does the same thing). With netconsole, the oops
    turned out to be

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
    IP: [] i915_driver_irq_handler+0x26b/0xd20 [i915]

    and it's due to the i915_gem.c code doing drm_irq_uninstall() after
    having done i915_gem_idle(). And the i915_gem_idle() path will do

    i915_gem_idle() ->
    i915_gem_cleanup_ringbuffer() ->
    i915_gem_cleanup_hws() ->
    dev_priv->hw_status_page = NULL;

    but if an i915 interrupt comes in after this stage, it may want to
    access that hw_status_page, and gets the above NULL pointer dereference.

    And since the NULL pointer dereference happens from within an interrupt,
    and with the screen still in graphics mode, the common end result is
    simply a silently hung machine.

    Fix it by simply uninstalling the irq handler before idling rather than
    after. Fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=13819

    Reported-and-tested-by: Reinette Chatre
    Acked-by: Jesse Barnes
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This avoids an indirect call in the VFS for each path component lookup.

    Well, at least as long as you own the directory in question, and the ACL
    check is unnecessary.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Don't implement per-filesystem 'extX_permission()' functions that have
    to be called for every path component operation, and instead just expose
    the actual ACL checking so that the VFS layer can now do it for us.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • shmfs wants purely standard POSIX ACL semantics, so we can use the new
    generic VFS layer POSIX ACL checking rather than cooking our own
    'permission()' function.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This is stage one in flattening out the callchains for the common
    permission testing. Rather than have most filesystems implement their
    own inode->i_op->permission function that just calls back down to the
    VFS layer's 'generic_permission()' with the per-filesystem ACL checking
    function, the filesystem can just expose its 'check_acl' function
    directly, and let the VFS layer do everything for it.

    This is all just preparatory - no filesystem actually enables this yet.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
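
The flattened call chain can be sketched with a callback stored on the inode. This is a simplified user-space model under stated assumptions: every `demo_*` name is hypothetical, group handling is omitted, and -EAGAIN is used here to mean "no ACL applies, fall back to the mode bits".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAY_EXEC  1
#define DEMO_MAY_WRITE 2
#define DEMO_MAY_READ  4

struct demo_inode;
typedef int (*demo_check_acl_t)(const struct demo_inode *inode, int mask);

/* Hypothetical inode: the filesystem exposes only its ACL check. */
struct demo_inode {
    unsigned int mode;            /* classic rwxrwxrwx bits */
    unsigned int uid;             /* owner */
    demo_check_acl_t check_acl;   /* NULL when there is no ACL support */
};

/* Example callbacks a filesystem might provide. */
static int demo_acl_none(const struct demo_inode *inode, int mask)
{
    (void)inode; (void)mask;
    return -EAGAIN;   /* no ACL on this inode */
}

static int demo_acl_deny(const struct demo_inode *inode, int mask)
{
    (void)inode; (void)mask;
    return -EACCES;   /* the ACL forbids the access */
}

/*
 * Generic check done by the "VFS" side. The owner is decided from the
 * mode bits alone, so the indirect ACL call is skipped entirely.
 */
static int demo_permission(const struct demo_inode *inode,
                           unsigned int uid, int mask)
{
    unsigned int mode = inode->mode;

    assert((mask & ~7) == 0);
    if (uid == inode->uid) {
        mode >>= 6;
    } else if (inode->check_acl) {
        int err = inode->check_acl(inode, mask);

        if (err != -EAGAIN)
            return err;
    }
    return (((unsigned int)mask & ~mode & 7u) == 0) ? 0 : -EACCES;
}
```

In this model a directory with mode 0755 grants its owner every access without consulting the callback, while other users are checked against the ACL first and the "other" bits second.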
     
  • Don't call down to the generic inode_permission() function just to
    call the inode-specific permission function - just do it directly.

    The generic inode_permission() code does things like checking MAY_WRITE
    and devcgroup_inode_permission(), neither of which are relevant for the
    light pathname walk permission checks (we always do just MAY_EXEC, and
    the inode is never a special device).

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This function is only called for path components that are already known
    to be directories (they have a '->lookup' method). So don't bother
    doing that whole S_ISDIR() testing, the whole point of the 'lite()'
    version is that we know that we are looking at a directory component,
    and that we're only checking name lookup permission.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Instead of returning EAGAIN and having the caller do something
    special for that case, just do the special case directly.

    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Not only is that a supremely timing-critical path, but it's hopefully
    some day going to be lockless for the common case, and ima can't do
    that.

    Plus the integrity code doesn't even care about non-regular files, so it
    was always a total waste of time and effort.

    Acked-by: Serge Hallyn
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • eDP is an exclusive connector too; also add the missing crtc_mask
    setting for TV.

    This fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=14139

    Signed-off-by: Zhenyu Wang
    Reported-and-tested-by: Carlos R. Mafra
    Signed-off-by: Linus Torvalds

    Zhenyu Wang
     

08 Sep, 2009

5 commits


07 Sep, 2009

2 commits

  • This adds some rv350+ registers for LTE/GTE discard,
    and enables the rv515 two sided stencil register.
    It also disables the DEPTHXY_OFFSET register which
    can be used to work around the CS checker.
    Moves rs690 to its proper place in rs600 and uses the correct
    table on rs600.

    Signed-off-by: Dave Airlie

    Dave Airlie
     
  • - As ima_counts_put() may be called after the inode has been freed,
    verify that the inode is not NULL, before dereferencing it.

    - Maintain the IMA file counters in may_open() properly, decrementing
    any counter increments on subsequent errors.

    Reported-by: Ciprian Docan
    Reported-by: J.R. Okajima
    Signed-off-by: Mimi Zohar
    Acked-by: Eric Paris

    Mimi Zohar
     

06 Sep, 2009

1 commit

  • Reported by Michael Guntsche

    --------------------
    Commit
    38bddf04bcfe661fbdab94888c3b72c32f6873b3 gianfar: gfar_remove needs to call unregister_netdev()

    breaks the build of the gianfar driver because "dev" is undefined in
    this function. To quickly test rc9 I changed this to priv->ndev but I do
    not know if this is the correct one.
    --------------------

    Signed-off-by: David S. Miller

    David S. Miller