01 Nov, 2011

14 commits

  • memchr_inv() is mainly used to check whether the whole buffer is filled
    with just a specified byte.

    The function name and prototype are stolen from logfs and the
    implementation is from SLUB.

    Signed-off-by: Akinobu Mita
    Acked-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Cc: Matt Mackall
    Acked-by: Joern Engel
    Cc: Marcin Slusarz
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • When direct reclaim encounters a dirty page, it gets recycled around the
    LRU for another cycle. This patch marks the page PageReclaim similar to
    deactivate_page() so that the page gets reclaimed almost immediately after
    the page gets cleaned. This is to avoid reclaiming clean pages that are
    younger than a dirty page encountered at the end of the LRU that might
    have been something like a use-once page.

    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Wu Fengguang
    Cc: Jan Kara
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Alex Elder
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Testing from the XFS folk revealed that there is still too much I/O from
    the end of the LRU in kswapd. Previously it was considered acceptable by
    VM people for a small number of pages to be written back from reclaim with
    testing generally showing about 0.3% of pages reclaimed were written back
    (higher if memory was low). That writing back a small number of pages is
    ok has been heavily disputed for quite some time and Dave Chinner
    explained it well;

    It doesn't have to be a very high number to be a problem. IO
    is orders of magnitude slower than the CPU time it takes to
    flush a page, so the cost of making a bad flush decision is
    very high. And single page writeback from the LRU is almost
    always a bad flush decision.

    To complicate matters, filesystems respond very differently to requests
    from reclaim according to Christoph Hellwig;

    xfs tries to write it back if the requester is kswapd
    ext4 ignores the request if it's a delayed allocation
    btrfs ignores the request

    As a result, each filesystem has different performance characteristics
    when under memory pressure and there are many pages being dirtied. In
    some cases, the request is ignored entirely so the VM cannot depend on the
    IO being dispatched.

    The objective of this series is to reduce writing of filesystem-backed
    pages from reclaim, play nicely with writeback that is already in progress
    and throttle reclaim appropriately when writeback pages are encountered.
    The assumption is that the flushers will always write pages faster than if
    reclaim issues the IO.

    A secondary goal is to avoid the problem whereby direct reclaim splices
    two potentially deep call stacks together.

    There is a potential new problem as reclaim has less control over how long
    before a page in a particularly zone or container is cleaned and direct
    reclaimers depend on kswapd or flusher threads to do the necessary work.
    However, as filesystems sometimes ignore direct reclaim requests already,
    it is not expected to be a serious issue.

    Patch 1 disables writeback of filesystem pages from direct reclaim
    entirely. Anonymous pages are still written.

    Patch 2 removes dead code in lumpy reclaim as it is no longer able
    to synchronously write pages. This hurts lumpy reclaim but
    there is an expectation that compaction is used for hugepage
    allocations these days and lumpy reclaim's days are numbered.

    Patches 3-4 add warnings to XFS and ext4 if called from
    direct reclaim. With patch 1, this "never happens" and is
    intended to catch regressions in this logic in the future.

    Patch 5 disables writeback of filesystem pages from kswapd unless
    the priority is raised to the point where kswapd is considered
    to be in trouble.

    Patch 6 throttles reclaimers if too many dirty pages are being
    encountered and the zones or backing devices are congested.

    Patch 7 invalidates dirty pages found at the end of the LRU so they
    are reclaimed quickly after being written back rather than
    waiting for a reclaimer to find them

    I consider this series to be orthogonal to the writeback work but it is
    worth noting that the writeback work affects the viability of patch 8 in
    particular.

    I tested this on ext4 and xfs using fs_mark, a simple writeback test based
    on dd and a micro benchmark that does a streaming write to a large mapping
    (exercises use-once LRU logic) followed by streaming writes to a mix of
    anonymous and file-backed mappings. The command line for fs_mark when
    botted with 512M looked something like

    ./fs_mark -d /tmp/fsmark-2676 -D 100 -N 150 -n 150 -L 25 -t 1 -S0 -s 10485760

    The number of files was adjusted depending on the amount of available
    memory so that the files created was about 3xRAM. For multiple threads,
    the -d switch is specified multiple times.

    The test machine is x86-64 with an older generation of AMD processor with
    4 cores. The underlying storage was 4 disks configured as RAID-0 as this
    was the best configuration of storage I had available. Swap is on a
    separate disk. Dirty ratio was tuned to 40% instead of the default of
    20%.

    Testing was run with and without monitors to both verify that the patches
    were operating as expected and that any performance gain was real and not
    due to interference from monitors.

    Here is a summary of results based on testing XFS.

    512M1P-xfs Files/s mean 32.69 ( 0.00%) 34.44 ( 5.08%)
    512M1P-xfs Elapsed Time fsmark 51.41 48.29
    512M1P-xfs Elapsed Time simple-wb 114.09 108.61
    512M1P-xfs Elapsed Time mmap-strm 113.46 109.34
    512M1P-xfs Kswapd efficiency fsmark 62% 63%
    512M1P-xfs Kswapd efficiency simple-wb 56% 61%
    512M1P-xfs Kswapd efficiency mmap-strm 44% 42%
    512M-xfs Files/s mean 30.78 ( 0.00%) 35.94 (14.36%)
    512M-xfs Elapsed Time fsmark 56.08 48.90
    512M-xfs Elapsed Time simple-wb 112.22 98.13
    512M-xfs Elapsed Time mmap-strm 219.15 196.67
    512M-xfs Kswapd efficiency fsmark 54% 56%
    512M-xfs Kswapd efficiency simple-wb 54% 55%
    512M-xfs Kswapd efficiency mmap-strm 45% 44%
    512M-4X-xfs Files/s mean 30.31 ( 0.00%) 33.33 ( 9.06%)
    512M-4X-xfs Elapsed Time fsmark 63.26 55.88
    512M-4X-xfs Elapsed Time simple-wb 100.90 90.25
    512M-4X-xfs Elapsed Time mmap-strm 261.73 255.38
    512M-4X-xfs Kswapd efficiency fsmark 49% 50%
    512M-4X-xfs Kswapd efficiency simple-wb 54% 56%
    512M-4X-xfs Kswapd efficiency mmap-strm 37% 36%
    512M-16X-xfs Files/s mean 60.89 ( 0.00%) 65.22 ( 6.64%)
    512M-16X-xfs Elapsed Time fsmark 67.47 58.25
    512M-16X-xfs Elapsed Time simple-wb 103.22 90.89
    512M-16X-xfs Elapsed Time mmap-strm 237.09 198.82
    512M-16X-xfs Kswapd efficiency fsmark 45% 46%
    512M-16X-xfs Kswapd efficiency simple-wb 53% 55%
    512M-16X-xfs Kswapd efficiency mmap-strm 33% 33%

    Up until 512-4X, the FSmark improvements were statistically significant.
    For the 4X and 16X tests the results were within standard deviations but
    just barely. The time to completion for all tests is improved which is an
    important result. In general, kswapd efficiency is not affected by
    skipping dirty pages.

    1024M1P-xfs Files/s mean 39.09 ( 0.00%) 41.15 ( 5.01%)
    1024M1P-xfs Elapsed Time fsmark 84.14 80.41
    1024M1P-xfs Elapsed Time simple-wb 210.77 184.78
    1024M1P-xfs Elapsed Time mmap-strm 162.00 160.34
    1024M1P-xfs Kswapd efficiency fsmark 69% 75%
    1024M1P-xfs Kswapd efficiency simple-wb 71% 77%
    1024M1P-xfs Kswapd efficiency mmap-strm 43% 44%
    1024M-xfs Files/s mean 35.45 ( 0.00%) 37.00 ( 4.19%)
    1024M-xfs Elapsed Time fsmark 94.59 91.00
    1024M-xfs Elapsed Time simple-wb 229.84 195.08
    1024M-xfs Elapsed Time mmap-strm 405.38 440.29
    1024M-xfs Kswapd efficiency fsmark 79% 71%
    1024M-xfs Kswapd efficiency simple-wb 74% 74%
    1024M-xfs Kswapd efficiency mmap-strm 39% 42%
    1024M-4X-xfs Files/s mean 32.63 ( 0.00%) 35.05 ( 6.90%)
    1024M-4X-xfs Elapsed Time fsmark 103.33 97.74
    1024M-4X-xfs Elapsed Time simple-wb 204.48 178.57
    1024M-4X-xfs Elapsed Time mmap-strm 528.38 511.88
    1024M-4X-xfs Kswapd efficiency fsmark 81% 70%
    1024M-4X-xfs Kswapd efficiency simple-wb 73% 72%
    1024M-4X-xfs Kswapd efficiency mmap-strm 39% 38%
    1024M-16X-xfs Files/s mean 42.65 ( 0.00%) 42.97 ( 0.74%)
    1024M-16X-xfs Elapsed Time fsmark 103.11 99.11
    1024M-16X-xfs Elapsed Time simple-wb 200.83 178.24
    1024M-16X-xfs Elapsed Time mmap-strm 397.35 459.82
    1024M-16X-xfs Kswapd efficiency fsmark 84% 69%
    1024M-16X-xfs Kswapd efficiency simple-wb 74% 73%
    1024M-16X-xfs Kswapd efficiency mmap-strm 39% 40%

    All FSMark tests up to 16X had statistically significant improvements.
    For the most part, tests are completing faster with the exception of the
    streaming writes to a mixture of anonymous and file-backed mappings which
    were slower in two cases

    In the cases where the mmap-strm tests were slower, there was more
    swapping due to dirty pages being skipped. The number of additional pages
    swapped is almost identical to the fewer number of pages written from
    reclaim. In other words, roughly the same number of pages were reclaimed
    but swapping was slower. As the test is a bit unrealistic and stresses
    memory heavily, the small shift is acceptable.

    4608M1P-xfs Files/s mean 29.75 ( 0.00%) 30.96 ( 3.91%)
    4608M1P-xfs Elapsed Time fsmark 512.01 492.15
    4608M1P-xfs Elapsed Time simple-wb 618.18 566.24
    4608M1P-xfs Elapsed Time mmap-strm 488.05 465.07
    4608M1P-xfs Kswapd efficiency fsmark 93% 86%
    4608M1P-xfs Kswapd efficiency simple-wb 88% 84%
    4608M1P-xfs Kswapd efficiency mmap-strm 46% 45%
    4608M-xfs Files/s mean 27.60 ( 0.00%) 28.85 ( 4.33%)
    4608M-xfs Elapsed Time fsmark 555.96 532.34
    4608M-xfs Elapsed Time simple-wb 659.72 571.85
    4608M-xfs Elapsed Time mmap-strm 1082.57 1146.38
    4608M-xfs Kswapd efficiency fsmark 89% 91%
    4608M-xfs Kswapd efficiency simple-wb 88% 82%
    4608M-xfs Kswapd efficiency mmap-strm 48% 46%
    4608M-4X-xfs Files/s mean 26.00 ( 0.00%) 27.47 ( 5.35%)
    4608M-4X-xfs Elapsed Time fsmark 592.91 564.00
    4608M-4X-xfs Elapsed Time simple-wb 616.65 575.07
    4608M-4X-xfs Elapsed Time mmap-strm 1773.02 1631.53
    4608M-4X-xfs Kswapd efficiency fsmark 90% 94%
    4608M-4X-xfs Kswapd efficiency simple-wb 87% 82%
    4608M-4X-xfs Kswapd efficiency mmap-strm 43% 43%
    4608M-16X-xfs Files/s mean 26.07 ( 0.00%) 26.42 ( 1.32%)
    4608M-16X-xfs Elapsed Time fsmark 602.69 585.78
    4608M-16X-xfs Elapsed Time simple-wb 606.60 573.81
    4608M-16X-xfs Elapsed Time mmap-strm 1549.75 1441.86
    4608M-16X-xfs Kswapd efficiency fsmark 98% 98%
    4608M-16X-xfs Kswapd efficiency simple-wb 88% 82%
    4608M-16X-xfs Kswapd efficiency mmap-strm 44% 42%

    Unlike the other tests, the fsmark results are not statistically
    significant but the min and max times are both improved and for the most
    part, tests completed faster.

    There are other indications that this is an improvement as well. For
    example, in the vast majority of cases, there were fewer pages scanned by
    direct reclaim implying in many cases that stalls due to direct reclaim
    are reduced. KSwapd is scanning more due to skipping dirty pages which is
    unfortunate but the CPU usage is still acceptable

    In an earlier set of tests, I used blktrace and in almost all cases
    throughput throughout the entire test was higher. However, I ended up
    discarding those results as recording blktrace data was too heavy for my
    liking.

    On a laptop, I plugged in a USB stick and ran a similar tests of tests
    using it as backing storage. A desktop environment was running and for
    the entire duration of the tests, firefox and gnome terminal were
    launching and exiting to vaguely simulate a user.

    1024M-xfs Files/s mean 0.41 ( 0.00%) 0.44 ( 6.82%)
    1024M-xfs Elapsed Time fsmark 2053.52 1641.03
    1024M-xfs Elapsed Time simple-wb 1229.53 768.05
    1024M-xfs Elapsed Time mmap-strm 4126.44 4597.03
    1024M-xfs Kswapd efficiency fsmark 84% 85%
    1024M-xfs Kswapd efficiency simple-wb 92% 81%
    1024M-xfs Kswapd efficiency mmap-strm 60% 51%
    1024M-xfs Avg wait ms fsmark 5404.53 4473.87
    1024M-xfs Avg wait ms simple-wb 2541.35 1453.54
    1024M-xfs Avg wait ms mmap-strm 3400.25 3852.53

    The mmap-strm results were hurt because firefox launching had a tendency
    to push the test out of memory. On the postive side, firefox launched
    marginally faster with the patches applied. Time to completion for many
    tests was faster but more importantly - the "Avg wait" time as measured by
    iostat was far lower implying the system would be more responsive. It was
    also the case that "Avg wait ms" on the root filesystem was lower. I
    tested it manually and while the system felt slightly more responsive
    while copying data to a USB stick, it was marginal enough that it could be
    my imagination.

    This patch: do not writeback filesystem pages in direct reclaim.

    When kswapd is failing to keep zones above the min watermark, a process
    will enter direct reclaim in the same manner kswapd does. If a dirty page
    is encountered during the scan, this page is written to backing storage
    using mapping->writepage.

    This causes two problems. First, it can result in very deep call stacks,
    particularly if the target storage or filesystem are complex. Some
    filesystems ignore write requests from direct reclaim as a result. The
    second is that a single-page flush is inefficient in terms of IO. While
    there is an expectation that the elevator will merge requests, this does
    not always happen. Quoting Christoph Hellwig;

    The elevator has a relatively small window it can operate on,
    and can never fix up a bad large scale writeback pattern.

    This patch prevents direct reclaim writing back filesystem pages by
    checking if current is kswapd. Anonymous pages are still written to swap
    as there is not the equivalent of a flusher thread for anonymous pages.
    If the dirty pages cannot be written back, they are placed back on the LRU
    lists. There is now a direct dependency on dirty page balancing to
    prevent too many pages in the system being dirtied which would prevent
    reclaim making forward progress.

    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Cc: Johannes Weiner
    Cc: Wu Fengguang
    Cc: Jan Kara
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Alex Elder
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Add comments to explain the page statistics field in the mm_struct.

    [akpm@linux-foundation.org: add missing ;]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Some kernel components pin user space memory (infiniband and perf) (by
    increasing the page count) and account that memory as "mlocked".

    The difference between mlocking and pinning is:

    A. mlocked pages are marked with PG_mlocked and are exempt from
    swapping. Page migration may move them around though.
    They are kept on a special LRU list.

    B. Pinned pages cannot be moved because something needs to
    directly access physical memory. They may not be on any
    LRU list.

    I recently saw an mlockalled process where mm->locked_vm became
    bigger than the virtual size of the process (!) because some
    memory was accounted for twice:

    Once when the page was mlocked and once when the Infiniband
    layer increased the refcount because it needt to pin the RDMA
    memory.

    This patch introduces a separate counter for pinned pages and
    accounts them seperately.

    Signed-off-by: Christoph Lameter
    Cc: Mike Marciniszyn
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • test_set_oom_score_adj() was introduced in 72788c385604 ("oom: replace
    PF_OOM_ORIGIN with toggling oom_score_adj") to temporarily elevate
    current's oom_score_adj for ksm and swapoff without requiring an
    additional per-process flag.

    Using that function to both set oom_score_adj to OOM_SCORE_ADJ_MAX and
    then reinstate the previous value is racy since it's possible that
    userspace can set the value to something else itself before the old value
    is reinstated. That results in userspace setting current's oom_score_adj
    to a different value and then the kernel immediately setting it back to
    its previous value without notification.

    To fix this, a new compare_swap_oom_score_adj() function is introduced
    with the same semantics as the compare and swap CAS instruction, or
    CMPXCHG on x86. It is used to reinstate the previous value of
    oom_score_adj if and only if the present value is the same as the old
    value.

    Signed-off-by: David Rientjes
    Cc: Oleg Nesterov
    Cc: Ying Han
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • This removes mm->oom_disable_count entirely since it's unnecessary and
    currently buggy. The counter was intended to be per-process but it's
    currently decremented in the exit path for each thread that exits, causing
    it to underflow.

    The count was originally intended to prevent oom killing threads that
    share memory with threads that cannot be killed since it doesn't lead to
    future memory freeing. The counter could be fixed to represent all
    threads sharing the same mm, but it's better to remove the count since:

    - it is possible that the OOM_DISABLE thread sharing memory with the
    victim is waiting on that thread to exit and will actually cause
    future memory freeing, and

    - there is no guarantee that a thread is disabled from oom killing just
    because another thread sharing its mm is oom disabled.

    Signed-off-by: David Rientjes
    Reported-by: Oleg Nesterov
    Reviewed-by: Oleg Nesterov
    Cc: Ying Han
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • In __zone_reclaim case, we don't want to shrink mapped page. Nonetheless,
    we have isolated mapped page and re-add it into LRU's head. It's
    unnecessary CPU overhead and makes LRU churning.

    Of course, when we isolate the page, the page might be mapped but when we
    try to migrate the page, the page would be not mapped. So it could be
    migrated. But race is rare and although it happens, it's no big deal.

    Signed-off-by: Minchan Kim
    Acked-by: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • In async mode, compaction doesn't migrate dirty or writeback pages. So,
    it's meaningless to pick the page and re-add it to lru list.

    Of course, when we isolate the page in compaction, the page might be dirty
    or writeback but when we try to migrate the page, the page would be not
    dirty, writeback. So it could be migrated. But it's very unlikely as
    isolate and migration cycle is much faster than writeout.

    So, this patch helps cpu overhead and prevent unnecessary LRU churning.

    Signed-off-by: Minchan Kim
    Acked-by: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Reviewed-by: Michal Hocko
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Change ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally,
    macro isn't recommended as it's type-unsafe and making debugging harder as
    symbol cannot be passed throught to the debugger.

    Quote from Johannes
    " Hmm, it would probably be cleaner to fully convert the isolation mode
    into independent flags. INACTIVE, ACTIVE, BOTH is currently a
    tri-state among flags, which is a bit ugly."

    This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

    Signed-off-by: Minchan Kim
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     
  • x86_64 allnoconfig:

    In file included from arch/x86/kernel/pci-dma.c:3:
    include/linux/dmar.h:248: warning: 'struct acpi_dmar_header' declared inside parameter list
    include/linux/dmar.h:248: warning: its scope is only this definition or declaration, which is probably not what you want

    Cc: Suresh Siddha
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Commit 5fd75a7850b5 (dma-mapping: remove unnecessary sync_single_range_*
    in dma_map_ops) unified not only the dma_map_ops but also the
    corresponding debug_dma_sync_* calls. This led to spurious WARN()ings
    like the following because the DMA debug code was no longer able to detect
    the DMA buffer base address without the separate offset parameter:

    WARNING: at lib/dma-debug.c:911 check_sync+0xce/0x446()
    firewire_ohci 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000cedaa400] [size=1024 bytes]
    Call Trace: ...
    [] check_sync+0xce/0x446
    [] debug_dma_sync_single_for_device+0x39/0x3b
    [] ohci_queue_iso+0x4f3/0x77d [firewire_ohci]
    ...

    To fix this, unshare the sync_single_* and sync_single_range_*
    implementations so that we are able to call the correct debug_dma_sync_*
    functions.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Clemens Ladisch
    Cc: FUJITA Tomonori
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Clemens Ladisch
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
    vlan: allow nested vlan_do_receive()
    ipv6: fix route lookup in addrconf_prefix_rcv()
    bonding: eliminate bond_close race conditions
    qlcnic: fix beacon and LED test.
    qlcnic: Updated License file
    qlcnic: updated reset sequence
    qlcnic: reset loopback mode if promiscous mode setting fails.
    qlcnic: skip IDC ack check in fw reset path.
    i825xx: Fix incorrect dependency for BVME6000_NET
    ipv6: fix route error binding peer in func icmp6_dst_alloc
    ipv6: fix error propagation in ip6_ufo_append_data()
    stmmac: update normal descriptor structure (v2)
    stmmac: fix NULL pointer dereference in capabilities fixup (v2)
    stmmac: fix a bug while checking the HW cap reg (v2)
    be2net: Changing MAC Address of a VF was broken.
    be2net: Refactored be_cmds.c file.
    bnx2x: update driver version to 1.70.30-0
    bnx2x: use FW 7.0.29.0
    bnx2x: Enable changing speed when port type is PORT_DA
    bnx2x: Fix 54618se LED behavior
    ...

    Linus Torvalds
     

31 Oct, 2011

4 commits

  • * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
    i2c: Functions for byte-swapped smbus_write/read_word_data
    i2c-algo-pca: Return standard fault codes
    i2c-algo-bit: Return standard fault codes
    i2c-algo-bit: Be verbose on bus testing failure
    i2c-algo-bit: Let user test buses without failing
    i2c/scx200_acb: Fix section mismatch warning in scx200_pci_drv
    i2c: I2C_ELEKTOR should depend on HAS_IOPORT

    Linus Torvalds
     
  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
    iommu/core: Remove global iommu_ops and register_iommu
    iommu/msm: Use bus_set_iommu instead of register_iommu
    iommu/omap: Use bus_set_iommu instead of register_iommu
    iommu/vt-d: Use bus_set_iommu instead of register_iommu
    iommu/amd: Use bus_set_iommu instead of register_iommu
    iommu/core: Use bus->iommu_ops in the iommu-api
    iommu/core: Convert iommu_found to iommu_present
    iommu/core: Add bus_type parameter to iommu_domain_alloc
    Driver core: Add iommu_ops to bus_type
    iommu/core: Define iommu_ops and register_iommu only with CONFIG_IOMMU_API
    iommu/amd: Fix wrong shift direction
    iommu/omap: always provide iommu debug code
    iommu/core: let drivers know if an iommu fault handler isn't installed
    iommu/core: export iommu_set_fault_handler()
    iommu/omap: Fix build error with !IOMMU_SUPPORT
    iommu/omap: Migrate to the generic fault report mechanism
    iommu/core: Add fault reporting mechanism
    iommu/core: Use PAGE_SIZE instead of hard-coded value
    iommu/core: use the existing IS_ALIGNED macro
    iommu/msm: ->unmap() should return order of unmapped page
    ...

    Fixup trivial conflicts in drivers/iommu/Makefile: "move omap iommu to
    dedicated iommu folder" vs "Rename the DMAR and INTR_REMAP config
    options" just happened to touch lines next to each other.

    Linus Torvalds
     
  • * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits)
    KVM: SVM: Keep intercepting task switching with NPT enabled
    KVM: s390: implement sigp external call
    KVM: s390: fix register setting
    KVM: s390: fix return value of kvm_arch_init_vm
    KVM: s390: check cpu_id prior to using it
    KVM: emulate lapic tsc deadline timer for guest
    x86: TSC deadline definitions
    KVM: Fix simultaneous NMIs
    KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
    KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
    KVM: x86 emulator: streamline decode of segment registers
    KVM: x86 emulator: simplify OpMem64 decode
    KVM: x86 emulator: switch src decode to decode_operand()
    KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
    KVM: x86 emulator: switch OpImmUByte decode to decode_imm()
    KVM: x86 emulator: free up some flag bits near src, dst
    KVM: x86 emulator: switch src2 to generic decode_operand()
    KVM: x86 emulator: expand decode flags to 64 bits
    KVM: x86 emulator: split dst decode to a generic decode_operand()
    KVM: x86 emulator: move memop, memopp into emulation context
    ...

    Linus Torvalds
     
  • * 'fbdev-next' of git://github.com/schandinat/linux-2.6: (270 commits)
    video: platinumfb: Add __devexit_p at necessary place
    drivers/video: fsl-diu-fb: merge diu_pool into fsl_diu_data
    drivers/video: fsl-diu-fb: merge diu_hw into fsl_diu_data
    drivers/video: fsl-diu-fb: only DIU modes 0 and 1 are supported
    drivers/video: fsl-diu-fb: remove unused panel operating mode support
    drivers/video: fsl-diu-fb: use an enum for the AOI index
    drivers/video: fsl-diu-fb: add several new video modes
    drivers/video: fsl-diu-fb: remove broken screen blanking support
    drivers/video: fsl-diu-fb: move some definitions out of the header file
    drivers/video: fsl-diu-fb: fix some ioctls
    video: da8xx-fb: Increased resolution configuration of revised LCDC IP
    OMAPDSS: picodlp: add missing #include
    fb: fix au1100fb bitrot.
    mx3fb: fix NULL pointer dereference in screen blanking.
    video: irq: Remove IRQF_DISABLED
    smscufx: change edid data to u8 instead of char
    OMAPDSS: DISPC: zorder support for DSS overlays
    OMAPDSS: DISPC: VIDEO3 pipeline support
    OMAPDSS/OMAP_VOUT: Fix incorrect OMAP3-alpha compatibility setting
    video/omap: fix build dependencies
    ...

    Fix up conflicts in:
    - drivers/staging/xgifb/XGI_main_26.c
    Changes to XGIfb_pan_var()
    - drivers/video/omap/{lcd_apollon.c,lcd_ldp.c,lcd_overo.c}
    Removed (or in the case of apollon.c, merged into the generic
    DSS panel in drivers/video/omap2/displays/panel-generic-dpi.c)

    Linus Torvalds
     

30 Oct, 2011

3 commits


29 Oct, 2011

12 commits

  • * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
    ARM: mark empty gpio.h files empty
    gpio: Fix ARM versatile-express build failure
    of: include errno.h

    Linus Torvalds
     
  • * 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
    drivercore: Add helper macro for platform_driver boilerplate
    spi: irq: Remove IRQF_DISABLED
    OMAP: SPI: Fix the trying to free nonexistent resource error
    spi/spi-ep93xx: add module.h include
    spi/tegra: fix compilation error in spi-tegra.c
    spi: spi-dw: fix all sparse warnings
    spi/spi-pl022: Call pl022_dma_remove(pl022) only if enable_dma is true
    spi/spi-pl022: calculate_effective_freq() must set rate t allocate more sg than required.
    spi/spi-pl022: Use GFP_ATOMIC for allocation from tasklet
    spi/spi-pl022: Resolve formatting issues

    Linus Torvalds
     
  • * 'gpio/next' of git://git.secretlab.ca/git/linux-2.6:
    h8300: Move gpio.h to gpio-internal.h
    gpio: pl061: add DT binding support
    gpio: fix build error in include/asm-generic/gpio.h
    gpiolib: Ensure struct gpio is always defined
    irq: Add EXPORT_SYMBOL_GPL to function of irq generic-chip
    gpio-ml-ioh: Use NUMA_NO_NODE not GFP_KERNEL
    gpio-pch: Use NUMA_NO_NODE not GFP_KERNEL
    gpio: langwell: ensure alternate function is cleared
    gpio-pch: Support interrupt function
    gpio-pch: Save register value in suspend()
    gpio-pch: modify gpio_nums and mask
    gpio-pch: support ML7223 IOH n-Bus
    gpio-pch: add spinlock in suspend/resume processing
    gpio-pch: Delete invalid "restore" code in suspend()
    gpio-ml-ioh: Fix suspend/resume issue
    gpio-ml-ioh: Support interrupt function
    gpio-ml-ioh: Delete unnecessary code
    gpio/mxc: add chained_irq_enter/exit() to mx3_gpio_irq_handler()
    gpio/nomadik: use genirq core to track enablement
    gpio/nomadik: disable clocks when unused

    Linus Torvalds
     
  • When compiling ath6kl for beagleboard (omap2plus_defconfig plus
    CONFIG_ATH6KL, CONFIG_OF disable) with current linux-next compilation
    fails:

    include/linux/of.h:269: error: 'ENOSYS' undeclared (first use in this function)
    include/linux/of.h:276: error: 'ENOSYS' undeclared (first use in this function)
    include/linux/of.h:289: error: 'ENOSYS' undeclared (first use in this function)

    Fix this by including errno.h from of.h.

    Signed-off-by: Kalle Valo
    Acked-by: Geert Uytterhoeven
    Signed-off-by: Grant Likely

    Kalle Valo
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
    [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
    [SCSI] ipr: Fix BUG on adapter dump timeout
    [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
    [SCSI] hpsa: change confusing message to be more clear
    [SCSI] iscsi class: fix vlan configuration
    [SCSI] qla4xxx: fix data alignment and use nl helpers
    [SCSI] iscsi class: fix link local mispelling
    [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
    [SCSI] aacraid: use lower snprintf() limit
    [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
    [SCSI] lpfc 8.3.27: T10 additions for SLI4
    [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
    [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
    [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
    [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
    [SCSI] megaraid_sas: Changelog and version update
    [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
    [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
    [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
    [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://ceph.newdream.net/git/ceph-client:
    libceph: fix double-free of page vector
    ceph: fix 32-bit ino numbers
    libceph: force resend of osd requests if we skip an osdmap
    ceph: use kernel DNS resolver
    ceph: fix ceph_monc_init memory leak
    ceph: let the set_layout ioctl set single traits
    Revert "ceph: don't truncate dirty pages in invalidate work thread"
    ceph: replace leading spaces with tabs
    libceph: warn on msg allocation failures
    libceph: don't complain on msgpool alloc failures
    libceph: always preallocate mon connection
    libceph: create messenger with client
    ceph: document ioctls
    ceph: implement (optional) max read size
    ceph: rename rsize -> rasize
    ceph: make readpages fully async

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (549 commits)
    ALSA: hda - Fix ADC input-amp handling for Cx20549 codec
    ALSA: hda - Keep EAPD turned on for old Conexant chips
    ALSA: hda/realtek - Fix missing volume controls with ALC260
    ASoC: wm8940: Properly set codec->dapm.bias_level
    ALSA: hda - Fix pin-config for ASUS W90V
    ALSA: hda - Fix surround/CLFE headphone and speaker pins order
    ALSA: hda - Fix typo
    ALSA: Update the sound git tree URL
    ALSA: HDA: Add new revision for ALC662
    ASoC: max98095: Convert codec->hw_write to snd_soc_write
    ASoC: keep pointer to resource so it can be freed
    ASoC: sgtl5000: Fix wrong mask in some snd_soc_update_bits calls
    ASoC: wm8996: Fix wrong mask for setting WM8996_AIF_CLOCKING_2
    ASoC: da7210: Add support for line out and DAC
    ASoC: da7210: Add support for DAPM
    ALSA: hda/realtek - Fix DAC assignments of multiple speakers
    ASoC: Use SGTL5000_LINREG_VDDD_MASK instead of hardcoded mask value
    ASoC: Set sgtl5000->ldo in ldo_regulator_register
    ASoC: wm8996: Use SND_SOC_DAPM_AIF_OUT for AIF2 Capture
    ASoC: wm8994: Use SND_SOC_DAPM_AIF_OUT for AIF3 Capture
    ...

    Linus Torvalds
     
  • * 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
    PCI: Clean-up MPS debug output
    pci: Clamp pcie_set_readrq() when using "performance" settings
    PCI: enable MPS "performance" setting to properly handle bridge MPS
    PCI: Workaround for Intel MPS errata
    PCI: Add support for PASID capability
    PCI: Add implementation for PRI capability
    PCI: Export ATS functions to modules
    PCI: Move ATS implementation into own file
    PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
    PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
    PCI / PM: Extend PME polling to all PCI devices
    PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
    PCI: Make pci_setup_bridge() non-static for use by arch code
    x86: constify PCI raw ops structures
    PCI: Add quirk for known incorrect MPSS
    PCI: Add Solarflare vendor ID and SFC4000 device IDs

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (83 commits)
    mmc: fix compile error when CONFIG_BLOCK is not enabled
    mmc: core: Cleanup eMMC4.5 conditionals
    mmc: omap_hsmmc: if multiblock reads are broken, disable them
    mmc: core: add workaround for controllers with broken multiblock reads
    mmc: core: Prevent too long response times for suspend
    mmc: recognise SDIO cards with SDIO_CCCR_REV 3.00
    mmc: sd: Handle SD3.0 cards not supporting UHS-I bus speed mode
    mmc: core: support HPI send command
    mmc: core: Add cache control for eMMC4.5 device
    mmc: core: Modify the timeout value for writing power class
    mmc: core: new discard feature support at eMMC v4.5
    mmc: core: mmc sanitize feature support for v4.5
    mmc: dw_mmc: modify DATA register offset
    mmc: sdhci-pci: add flag for devices that can support runtime PM
    mmc: omap_hsmmc: ensure pbias configuration is always done
    mmc: core: Add Power Off Notify Feature eMMC 4.5
    mmc: sdhci-s3c: fix potential NULL dereference
    mmc: replace printk with appropriate display macro
    mmc: core: Add default timeout value for CMD6
    mmc: sdhci-pci: add runtime pm support
    ...

    Linus Torvalds
     
  • …git-cur/linux-2.6-arm

    * 'devel-stable' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: (178 commits)
    ARM: 7139/1: fix compilation with CONFIG_ARM_ATAG_DTB_COMPAT and large TEXT_OFFSET
    ARM: gic, local timers: use the request_percpu_irq() interface
    ARM: gic: consolidate PPI handling
    ARM: switch from NO_MACH_MEMORY_H to NEED_MACH_MEMORY_H
    ARM: mach-s5p64x0: remove mach/memory.h
    ARM: mach-s3c64xx: remove mach/memory.h
    ARM: plat-mxc: remove mach/memory.h
    ARM: mach-prima2: remove mach/memory.h
    ARM: mach-zynq: remove mach/memory.h
    ARM: mach-bcmring: remove mach/memory.h
    ARM: mach-davinci: remove mach/memory.h
    ARM: mach-pxa: remove mach/memory.h
    ARM: mach-ixp4xx: remove mach/memory.h
    ARM: mach-h720x: remove mach/memory.h
    ARM: mach-vt8500: remove mach/memory.h
    ARM: mach-s5pc100: remove mach/memory.h
    ARM: mach-tegra: remove mach/memory.h
    ARM: plat-tcc: remove mach/memory.h
    ARM: mach-mmp: remove mach/memory.h
    ARM: mach-cns3xxx: remove mach/memory.h
    ...

    Fix up mostly pretty trivial conflicts in:
    - arch/arm/Kconfig
    - arch/arm/include/asm/localtimer.h
    - arch/arm/kernel/Makefile
    - arch/arm/mach-shmobile/board-ap4evb.c
    - arch/arm/mach-u300/core.c
    - arch/arm/mm/dma-mapping.c
    - arch/arm/mm/proc-v7.S
    - arch/arm/plat-omap/Kconfig
    largely due to some CONFIG option renaming (ie CONFIG_PM_SLEEP ->
    CONFIG_ARM_CPU_SUSPEND for the arm-specific suspend code etc) and
    addition of NEED_MACH_MEMORY_H next to HAVE_IDE.

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
    leases: fix write-open/read-lease race
    nfs: drop unnecessary locking in llseek
    ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
    vfs: add generic_file_llseek_size
    vfs: do (nearly) lockless generic_file_llseek
    direct-io: merge direct_io_walker into __blockdev_direct_IO
    direct-io: inline the complete submission path
    direct-io: separate map_bh from dio
    direct-io: use a slab cache for struct dio
    direct-io: rearrange fields in dio/dio_submit to avoid holes
    direct-io: fix a wrong comment
    direct-io: separate fields only used in the submission path from struct dio
    vfs: fix spinning prevention in prune_icache_sb
    vfs: add a comment to inode_permission()
    vfs: pass all mask flags check_acl and posix_acl_permission
    vfs: add hex format for MAY_* flag values
    vfs: indicate that the permission functions take all the MAY_* flags
    compat: sync compat_stats with statfs.
    vfs: add "device" tag to /proc/self/mountstats
    cleanup: vfs: small comment fix for block_invalidatepage
    ...

    Fix up trivial conflict in fs/gfs2/file.c (llseek changes)

    Linus Torvalds
     
  • * '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6: (52 commits)
    Fix build break when freezer not configured
    Add definition for share encryption
    CIFS: Make cifs_push_locks send as many locks at once as possible
    CIFS: Send as many mandatory unlock ranges at once as possible
    CIFS: Implement caching mechanism for posix brlocks
    CIFS: Implement caching mechanism for mandatory brlocks
    CIFS: Fix DFS handling in cifs_get_file_info
    CIFS: Fix error handling in cifs_readv_complete
    [CIFS] Fixup trivial checkpatch warning
    [CIFS] Show nostrictsync and noperm mount options in /proc/mounts
    cifs, freezer: add wait_event_freezekillable and have cifs use it
    cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters
    cifs: tune bdi.ra_pages in accordance with the rsize
    cifs: allow for larger rsize= options and change defaults
    cifs: convert cifs_readpages to use async reads
    cifs: add cifs_async_readv
    cifs: fix protocol definition for READ_RSP
    cifs: add a callback function to receive the rest of the frame
    cifs: break out 3rd receive phase into separate function
    cifs: find mid earlier in receive codepath
    ...

    Linus Torvalds
     

28 Oct, 2011

7 commits

  • Add a generic_file_llseek variant to the VFS that allows passing in
    the maximum file size of the file system, instead of always
    using maxbytes from the superblock.

    This can be used to eliminate some cut'n'paste seek code in ext4.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • The i_mutex lock use of generic _file_llseek hurts. Independent processes
    accessing the same file synchronize over a single lock, even though
    they have no need for synchronization at all.

    Under high utilization this can cause llseek to scale very poorly on larger
    systems.

    This patch does some rethinking of the llseek locking model:

    First the 64bit f_pos is not necessarily atomic without locks
    on 32bit systems. This can already cause races with read() today.
    This was discussed on linux-kernel in the past and deemed acceptable.
    The patch does not change that.

    Let's look at the different seek variants:

    SEEK_SET: Doesn't really need any locking.
    If there's a race one writer wins, the other loses.

    For 32bit the non atomic update races against read()
    stay the same. Without a lock they can also happen
    against write() now. The read() race was deemed
    acceptable in past discussions, and I think if it's
    ok for read it's ok for write too.

    => Don't need a lock.

    SEEK_END: This behaves like SEEK_SET plus it reads
    the maximum size too. Reading the maximum size would have the
    32bit atomic problem. But luckily we already have a way to read
    the maximum size without locking (i_size_read), so we
    can just use that instead.

    Without i_mutex there is no synchronization with write() anymore,
    however since the write() update is atomic on 64bit it just behaves
    like another racy SEEK_SET. On non atomic 32bit it's the same
    as SEEK_SET.

    => Don't need a lock, but need to use i_size_read()

    SEEK_CUR: This has a read-modify-write race window
    on the same file. One could argue that any application
    doing unsynchronized seeks on the same file is already broken.
    But for the sake of not adding a regression here I'm
    using the file->f_lock to synchronize this. Using this
    lock is much better than the inode mutex because it doesn't
    synchronize between processes.

    => So still need a lock, but can use a f_lock.

    This patch implements this new scheme in generic_file_llseek.
    I dropped generic_file_llseek_unlocked and changed all callers.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • We are going to add more flags and having them in hex format
    make it simpler

    Acked-by: J. Bruce Fields
    Acked-by: David Howells
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Christoph Hellwig

    Aneesh Kumar K.V
     
  • * 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (290 commits)
    Revert "drm/ttm: add a way to bo_wait for either the last read or last write"
    Revert "drm/radeon/kms: add a new gem_wait ioctl with read/write flags"
    vmwgfx: Don't pass unused arguments to do_dirty functions
    vmwgfx: Emulate depth 32 framebuffers
    drm/radeon: Lower the severity of the radeon lockup messages.
    drm/i915/dp: Fix eDP on PCH DP on CPT/PPT
    drm/i915/dp: Introduce is_cpu_edp()
    drm/i915: use correct SPD type value
    drm/i915: fix ILK+ infoframe support
    drm/i915: add DP test request handling
    drm/i915: read full receiver capability field during DP hot plug
    drm/i915/dp: Remove eDP special cases from bandwidth checks
    drm/i915/dp: Fix the math in intel_dp_link_required
    drm/i915/panel: Always record the backlight level again (but cleverly)
    i915: Move i915_read/write out of line
    drm/i915: remove transcoder PLL mashing from mode_set per specs
    drm/i915: if transcoder disable fails, say which
    drm/i915: set watermarks for third pipe on IVB
    drm/i915: export a CPT mode set verification function
    drm/i915: fix transcoder PLL select masking
    ...

    Linus Torvalds
     
  • * 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, random: Verify RDRAND functionality and allow it to be disabled
    x86, random: Architectural inlines to get random integers with RDRAND
    random: Add support for architectural random hooks

    Fix up trivial conflicts in drivers/char/random.c: the architectural
    random hooks touched "get_random_int()" that was simplified to use MD5
    and not do the keyptr thing any more (see commit 6e5714eaf77d: "net:
    Compute protocol sequence numbers and fragment IDs using MD5").

    Linus Torvalds
     
  • fs/cifs/transport.c: In function 'wait_for_response':
    fs/cifs/transport.c:328: error: implicit declaration of function 'wait_event_freezekillable'

    Caused by commit f06ac72e9291 ("cifs, freezer: add
    wait_event_freezekillable and have cifs use it"). In this config,
    CONFIG_FREEZER is not set.

    Reviewed-by: Shirish Pargaonkar
    CC: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     
  • This reverts commit dfadbbdb57b3f2bb33e14f129a43047c6f0caefa.

    Further upstream discussion between Marek and Thomas decided this wasn't
    fully baked and needed further work, so revert it before it hits mainline.

    Signed-off-by: Dave Airlie

    Dave Airlie