17 Jun, 2009

25 commits

  • Fix allocation of page cache/slab objects on disallowed nodes when memory
    spread is set, by updating tasks' mems_allowed as soon as their cpuset's
    mems is changed.

    In order to update tasks' mems_allowed in time, we must modify the memory
    policy code, because the memory policy was originally applied only in the
    task's own context. With this patch, one task directly manipulates another
    task's mems_allowed, and we use alloc_lock in the task_struct to protect
    the task's mems_allowed and memory policy.

    In the fast path, however, we do not take the lock, because adding a lock
    there may cause a performance regression. Without the lock, the task might
    see an empty nodemask while its cpuset's mems_allowed is being changed to a
    non-overlapping set. To avoid this, we set all newly allowed nodes first and
    only then clear the newly disallowed ones.
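
    As an illustration only (a standalone userspace sketch with a plain bitmask
    standing in for nodemask_t, not the kernel code), the two-step update looks
    like this:

    #include <stdio.h>

    /* Toy nodemask: one bit per node. */
    typedef unsigned long nodemask;

    /*
     * Step 1 ORs in the new nodes, step 2 clears the old ones.  At no point
     * is the mask empty, so a racing reader sees the old mask, the union,
     * or the new mask, but never "no allowed nodes".
     */
    static void update_nodemask(nodemask *mask, nodemask newmask)
    {
            *mask |= newmask;       /* set all newly allowed nodes    */
            *mask &= newmask;       /* clear newly disallowed nodes   */
    }

    int main(void)
    {
            nodemask mems_allowed = 0x3;            /* nodes 0-1         */

            update_nodemask(&mems_allowed, 0xc);    /* move to nodes 2-3 */
            printf("mems_allowed = 0x%lx\n", mems_allowed);
            return 0;
    }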

    [lee.schermerhorn@hp.com:
    The rework of mpol_new() to extract the adjusting of the node mask to
    apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
    with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
    allocation. Fix this by adding the check for MPOL_PREFERRED and empty
    node mask to mpol_new_mempolicy().

    Remove the now unneeded 'nodes = NULL' from mpol_new().

    Note that mpol_new_mempolicy() is always called with a non-NULL
    'nodes' parameter now that it has been removed from mpol_new().
    Therefore, we don't need to test nodes for NULL before testing it for
    'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to
    verify this assumption.]
    [lee.schermerhorn@hp.com:

    I don't think the function name 'mpol_new_mempolicy' is descriptive
    enough to differentiate it from mpol_new().

    This function applies the cpuset context, usually constraining the nodes
    to those allowed by the cpuset. However, when the 'RELATIVE_NODES' flag
    is set, it also translates the nodes. So I settled on
    'mpol_set_nodemask()', because the comment block for mpol_new() mentions
    that we need to call this function to "set nodes".

    Some additional minor line length, whitespace and typo cleanup.]
    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • Fix the bug that the kernel did not spread page cache/slab objects evenly
    over all the allowed nodes when the spread flags were set, by updating
    tasks' page/slab spread flags in time.

    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • The kernel still allocates page cache on the old nodes after a task's
    cpuset mems are modified while 'memory_spread_page' is set, and it does
    not spread the page cache evenly over all the nodes the faulting task is
    allowed to use after memory_spread_page is set. This is caused by the
    task's stale mems_allowed and flags: the current kernel does not update
    them unless some function invokes cpuset_update_task_memory_state(),
    which is sometimes too late. We must update the tasks' mems_allowed and
    flags in time.

    Slab has the same problem.

    The following patches fix this bug by updating tasks' mems_allowed and
    spread flags as soon as their cpuset's mems or spread flags are changed.

    This patch:

    Extract a function from cpuset_update_task_memory_state(). It will be
    used later to update tasks' page/slab spread flags after their cpuset's
    flags are changed.

    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • get_dirty_limits() calls clip_bdi_dirty_limit() and task_dirty_limit()
    with variable pbdi_dirty as one of the arguments. This variable is an
    unsigned long * but both functions expect it to be a long *. This causes
    the following sparse warnings:

    warning: incorrect type in argument 3 (different signedness)
    expected long *pbdi_dirty
    got unsigned long *pbdi_dirty
    warning: incorrect type in argument 2 (different signedness)
    expected long *pdirty
    got unsigned long *pbdi_dirty

    Fix the warnings by changing the long * to unsigned long * in both
    functions.
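
    For illustration (a minimal standalone example of this class of warning,
    with made-up names; not the page-writeback code itself):

    #include <stdio.h>

    /*
     * The callers hold "unsigned long" dirty counts, so the helper must take
     * "unsigned long *"; declaring the parameter as "long *" is what makes
     * sparse complain about different signedness.
     */
    static void clip_dirty(unsigned long *pdirty, unsigned long limit)
    {
            if (*pdirty > limit)
                    *pdirty = limit;
    }

    int main(void)
    {
            unsigned long bdi_dirty = 150;

            clip_dirty(&bdi_dirty, 100);
            printf("%lu\n", bdi_dirty);
            return 0;
    }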

    Signed-off-by: H Hartley Sweeten
    Cc: Johannes Weiner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Commit 33c120ed2843090e2bd316de1588b8bf8b96cbde ("more aggressively use
    lumpy reclaim") increased how aggressive lumpy reclaim was by isolating
    both active and inactive pages for asynchronous lumpy reclaim on
    costly-high-order pages and for cheap-high-order pages when memory pressure
    is high. However, if the system is under heavy pressure and there are dirty
    pages, asynchronous IO may not be sufficient to reclaim a suitable page in
    time.

    This patch causes the caller to enter synchronous lumpy reclaim for
    costly-high-order pages and for cheap-high-order pages when under memory
    pressure.

    Minchan.kim@gmail.com said:

    Andy added synchronous lumpy reclaim with
    c661b078fd62abe06fd11fab4ac5e4eeafe26b6d. At that time, lumpy reclaim was
    not aggressive; his intention was to cover only high-order users (above
    PAGE_ALLOC_COSTLY_ORDER).

    After some time, Rik added aggressive lumpy reclaim with
    33c120ed2843090e2bd316de1588b8bf8b96cbde. His intention was to do lumpy
    reclaim whenever high-order allocations have trouble getting a small set
    of contiguous pages.

    So we also have to add synchronous pageout for small sets of contiguous
    pages.

    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Acked-by: Peter Zijlstra
    Cc: Rik van Riel
    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Minchan Kim
    Reviewed-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Move more documentation for get_user_pages_fast into the new kerneldoc comment.
    Add some comments for get_user_pages as well.

    Also, move get_user_pages_fast declaration up to get_user_pages. It wasn't
    there initially because it was once a static inline function.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Nick Piggin
    Cc: Andy Grover
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Now that we do readahead for sequential mmap reads, here is a simple
    evaluation of the impacts, and one further optimization.

    It's an NFS-root debian desktop system, readahead size = 60 pages.
    The numbers are grabbed after a fresh boot into console.

    approach   pgmajfault   RA miss ratio   mmap IO count   avg IO size (pages)
    A          383          31.6%           383             11
    B          225          32.4%           390             11
    C          224          32.6%           307             13

    case A: mmap sync/async readahead disabled
    case B: mmap sync/async readahead enabled, with enforced full async readahead size
    case C: mmap sync/async readahead enabled, with enforced full sync/async readahead size
    or:
    A = vanilla 2.6.30-rc1
    B = A plus mmap readahead
    C = B plus this patch

    The numbers show that
    - there are good possibilities for random mmap reads to trigger readahead
    - 'pgmajfault' is reduced by 1/3, due to the _async_ nature of readahead
    - case C can further reduce IO count by 1/4
    - readahead miss ratios are not quite affected

    The theory is:
    - readahead is _good_ for clustered random reads, and can perform
      _better_ than readaround because it can be _async_;
    - the async readahead size is guaranteed to be larger than the readaround
      size, and being _async_ it will mostly behave better.
    However, for B:
    - the sync readahead size could be smaller than the readaround size, and
      hence may make things worse by producing more, smaller IOs,
    which is fixed by this patch.

    Final conclusion:
    - mmap readahead reduced major faults by 1/3 with no obvious overheads;
    - mmap io can be further reduced by 1/4 with this patch.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Introduce page cache context based readahead algorithm.
    This is to better support concurrent read streams in general.

    RATIONALE
    ---------
    The current readahead algorithm detects interleaved reads in a _passive_ way.
    Given a sequence of interleaved streams 1,1001,2,1002,3,4,1003,5,1004,1005,6,...
    By checking for (offset == prev_offset + 1), it will discover the sequentialness
    between 3,4 and between 1004,1005, and start doing sequential readahead for the
    individual streams since page 4 and page 1005.

    The context readahead algorithm guarantees to discover the sequentialness no
    matter how the streams are interleaved. For the above example, it will start
    sequential readahead since page 2 and 1002.

    The trick is to poke for page @offset-1 in the page cache when there are no
    other clues about the sequentialness of request @offset: if the current
    request belongs to a sequential stream, that stream must have accessed page
    @offset-1 recently, and the page will still be cached now. So if page
    @offset-1 is there, we can take request @offset as a sequential access.
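
    A toy model of that probe (standalone C, with an array standing in for the
    page cache; the real code does a radix tree lookup, so this is purely
    illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    #define CACHE_SLOTS 2048

    /* Toy "page cache": remembers which offsets were recently read. */
    static bool cached[CACHE_SLOTS];

    /*
     * Context readahead probe: with no other clue about @offset, treat the
     * read as sequential if page @offset-1 is still present in the cache.
     */
    static bool looks_sequential(unsigned long offset)
    {
            return offset > 0 && offset - 1 < CACHE_SLOTS && cached[offset - 1];
    }

    static void read_page(unsigned long offset)
    {
            printf("read %4lu -> %s\n", offset,
                   looks_sequential(offset) ? "sequential" : "random");
            if (offset < CACHE_SLOTS)
                    cached[offset] = true;
    }

    int main(void)
    {
            /* Two interleaved sequential streams, as in the example above. */
            unsigned long reads[] = { 1, 1001, 2, 1002, 3, 1003 };

            for (unsigned int i = 0; i < sizeof(reads) / sizeof(reads[0]); i++)
                    read_page(reads[i]);
            return 0;
    }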

    BENEFICIARIES
    -------------
    - strictly interleaved reads i.e. 1,1001,2,1002,3,1003,...
    the current readahead will take them as silly random reads;
    the context readahead will take them as two sequential streams.

    - cooperative IO processes i.e. NFS and SCST
    They create a thread pool, farming off (sequential) IO requests to different
    threads which will be performing interleaved IO.

    It was not easy (or even possible) to reliably tell from file->f_ra that all
    those cooperative processes are working on the same sequential stream, since
    they will have different file->f_ra instances. And NFSD's file->f_ra is
    particularly unusable, since its file objects are dynamically created for
    each request. nfsd does have code trying to restore the f_ra bits, but it is
    not satisfactory.

    The new scheme is to detect the sequential pattern via looking up the page
    cache, which provides one single and consistent view of the pages recently
    accessed. That makes sequential detection for cooperative processes possible.

    USER REPORT
    -----------
    Vladislav recommends the addition of context readahead as a result of his SCST
    benchmarks. It leads to 6%~40% performance gains in various cases and achieves
    equal performance in others. http://lkml.org/lkml/2009/3/19/239

    OVERHEADS
    ---------
    In theory, it introduces one extra page cache lookup per random read. However,
    the benchmark below shows context readahead to be slightly faster, which is a
    bit puzzling.

    Randomly reading 200MB amount of data on a sparse file, repeat 20 times for
    each block size. The average throughputs are:

                          original ra     context ra      gain
    4K random reads:      65.561MB/s      65.648MB/s      +0.1%
    16K random reads:     124.767MB/s     124.951MB/s     +0.1%
    64K random reads:     162.123MB/s     162.278MB/s     +0.1%

    Cc: Jens Axboe
    Cc: Jeff Moyer
    Tested-by: Vladislav Bolkhovitin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Split all readahead cases, and move the random one to bottom.

    No behavior changes.

    This is to prepare for the introduction of context readahead, and make it
    easy for inserting accounting/tracing points for each case.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The counterpart of radix_tree_next_hole(). To be used by context readahead.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Mmap read-around now shares the same code style and data structure with
    readahead code.

    This also removes do_page_cache_readahead(). Its last user, mmap
    read-around, has been changed to call ra_submit().

    The no-readahead-if-congested logic is dropped along the way. Users will be
    pretty sensitive to slow loading of executables, so it's unfavorable to
    disable mmap read-around on a congested queue.

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Nick Piggin
    Signed-off-by: Fengguang Wu
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • We need this in one particular case and two more general ones.

    Now we do async readahead for sequential mmap reads, and do it with the
    help of PG_readahead. For normal reads, PG_readahead is the sufficient
    condition to do a sequential readahead. But unfortunately, for mmap
    reads, there is a tiny nuisance:

    [11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:4503599627370495, ra=0+4-3) = 4
    [11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
    [11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
    [11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~

    An unfavorably small readahead. The original dumb read-around size could
    be more efficient.

    That happened because ld-linux.so does a read(832) in L1 before mmap(),
    which triggers a 4-page readahead, with the second page tagged
    PG_readahead.

    L0: open("/lib/libc.so.6", O_RDONLY) = 3
    L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
    L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
    L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
    L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
    L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
    L7: close(3) = 0

    In general, the PG_readahead flag will also be hit in these cases:

    - sequential reads

    - clustered random reads

    A full readahead size is desirable in both cases.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Auto-detect sequential mmap reads and do readahead for them.

    The sequential mmap readahead will be triggered when
    - sync readahead: it's a major fault and (prev_offset == offset-1);
    - async readahead: minor fault on PG_readahead page with valid readahead state.

    The benefits of doing readahead instead of read-around:
    - less I/O wait thanks to async readahead
    - double real I/O size and no more cache hits

    The single stream case is improved a little.
    For 100,000 sequential mmap reads:

                                           user    system  cpu     total
    (1-1) plain -mm, 128KB readaround:     3.224   2.554   48.40%  11.838
    (1-2) plain -mm, 256KB readaround:     3.170   2.392   46.20%  11.976
    (2)   patched -mm, 128KB readahead:    3.117   2.448   47.33%  11.607

    The patched kernel (2) has the smallest total time, since it has no cache hit
    overheads and less I/O block time (thanks to async readahead). Here the I/O
    size makes little difference, since there's only a single stream.

    Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is 128KB,
    since half of the read-around pages will be readahead cache hits.

    This is going to make _real_ differences for _concurrent_ IO streams.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • This shouldn't really change behavior all that much, but the single rather
    complex function with read-ahead inside a loop etc is broken up into more
    manageable pieces.

    The behaviour is also less subtle, with the read-ahead being done up-front
    rather than inside some subtle loop and thus avoiding the now unnecessary
    extra state variables (ie "did_readaround" is gone).

    Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
    the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
    which case the original code would jump directly to no_cached_page reading.

    Cc: Pavel Levshin
    Cc:
    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The readahead call scheme is error-prone in that it expects the call sites
    to check for async readahead after doing a sync one. I.e.

    if (!page)
            page_cache_sync_readahead();
    page = find_get_page();
    if (page && PageReadahead(page))
            page_cache_async_readahead();

    This is because PG_readahead could be set by a sync readahead for the
    _current_ newly faulted-in page, and the readahead code simply expects one
    more callback on the same page to start the async readahead. If the
    caller fails to do so, it will miss the PG_readahead bits and will never
    be able to start an async readahead.

    Eliminate this insane constraint by piggy-backing the async part into the
    current readahead window.

    Now if an async readahead should be started immediately after a sync one,
    the readahead logic itself will do it. So the following code becomes
    valid: (the 'else' in particular)

    if (!page)
            page_cache_sync_readahead();
    else if (PageReadahead(page))
            page_cache_async_readahead();

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Make sure the interleaved readahead size is larger than the request size.
    This also makes the readahead window grow more quickly.

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • (hit_readahead_marker != 0) means the page at @offset is present, so we
    can search for a non-present page starting from @offset+1.

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Just in case someone aggressively sets a huge readahead size.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Impact: code simplification.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • This makes the performance impact of a possible mmap_miss wraparound
    temporary and tolerable: i.e. MMAP_LOTSAMISS=100 extra readarounds.

    Otherwise, if mmap_miss ever wraps around to negative, it takes INT_MAX
    cache misses to bring it back to a normal state. During that time mmap
    readaround will be _enabled_ for whatever wild random workload is running.
    That's an almost permanent performance impact.
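
    The idea, sketched in plain C (hypothetical helper and limit names; not
    necessarily the exact kernel change):

    #include <stdio.h>

    #define MMAP_LOTSAMISS  100
    #define MMAP_MISS_MAX   (MMAP_LOTSAMISS * 10)

    /*
     * Saturate the miss counter instead of letting it grow without bound:
     * once the workload turns sequential again, only a bounded number of
     * extra readarounds is needed to get back below MMAP_LOTSAMISS.
     */
    static void account_mmap_miss(unsigned int *mmap_miss, int hit)
    {
            if (hit) {
                    if (*mmap_miss > 0)
                            (*mmap_miss)--;
            } else if (*mmap_miss < MMAP_MISS_MAX) {
                    (*mmap_miss)++;
            }
    }

    int main(void)
    {
            unsigned int mmap_miss = 0;

            for (int i = 0; i < 1000000; i++)
                    account_mmap_miss(&mmap_miss, 0);   /* long miss streak */
            printf("mmap_miss saturates at %u\n", mmap_miss);
            return 0;
    }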

    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • * create mm/init-mm.c, move init_mm there
    * remove INIT_MM, initialize init_mm with C99 initializer
    * unexport init_mm on all arches:

    init_mm is already unexported on x86.

    One strange place is some OMAP driver (drivers/video/omap/) which
    won't build as a module, but already wants the get_vm_area() export.
    Somebody should look there.
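
    In miniature, the change of style is (a generic example, not the actual
    init_mm definition):

    #include <stdio.h>

    struct mm_like {
            unsigned long start;
            unsigned long end;
            int users;
    };

    /*
     * Instead of a macro such as INIT_MM(name) expanding to a positional
     * brace list, the object is defined once with a C99 designated
     * initializer: field order no longer matters and unnamed fields
     * default to zero.
     */
    static struct mm_like init_mm_example = {
            .users = 1,
    };

    int main(void)
    {
            printf("users = %d\n", init_mm_example.users);
            return 0;
    }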

    [akpm@linux-foundation.org: add missing #includes]
    Signed-off-by: Alexey Dobriyan
    Cc: Mike Frysinger
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484

    Peer reported:
    | The bug is introduced from kernel 2.6.27, if E820 table reserve the memory
    | above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000
    | (reserved)), system will report Int 6 error and hang up. The bug is caused by
    | the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit
    | variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6
    | error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this
    | bug.
    |======
    |static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
    |                                  const char *type,
    |                                  struct firmware_map_entry *entry)

    This only happens when CONFIG_PHYS_ADDR_T_64BIT is not set.

    It turns out we need to pass u64 instead of resource_size_t there.
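
    The truncation is easy to see in a standalone example (here a 32-bit
    typedef stands in for resource_size_t without CONFIG_PHYS_ADDR_T_64BIT):

    #include <inttypes.h>
    #include <stdio.h>

    /* resource_size_t on a 32-bit kernel without 64-bit phys addresses */
    typedef uint32_t resource_size_32;

    int main(void)
    {
            uint64_t end = 0x120000000ULL;              /* E820 end above 4G */
            resource_size_32 truncated = (resource_size_32)end;

            /* The high bits are lost, so a sanity check such as
             * BUG_ON(start > end) can trigger on perfectly valid input. */
            printf("full: 0x%" PRIx64 ", truncated: 0x%" PRIx32 "\n",
                   end, truncated);
            return 0;
    }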

    [akpm@linux-foundation.org: add comment]
    Reported-and-tested-by: Peer Chen
    Signed-off-by: Yinghai Lu
    Cc: Ingo Molnar
    Acked-by: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • Do not take the size of a pointer to determine the size of the pointed-to
    type.
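
    The bug class in a nutshell (illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct item {
            int id;
            char name[64];
    };

    int main(void)
    {
            struct item *ok  = malloc(sizeof(*ok));    /* size of the struct        */
            struct item *bad = malloc(sizeof(bad));    /* size of a pointer: 4 or 8 */

            if (!ok || !bad)
                    return 1;
            memset(ok, 0, sizeof(*ok));                /* safe */
            /* memset(bad, 0, sizeof(*bad)) would overrun the tiny allocation */
            printf("sizeof(*ok) = %zu, sizeof(bad) = %zu\n",
                   sizeof(*ok), sizeof(bad));
            free(ok);
            free(bad);
            return 0;
    }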

    Signed-off-by: Roel Kluin
    Acked-by: Anton Vorontsov
    Cc: David Brownell
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Acked-by: Grant Likely
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     
  • PIT_TICK_RATE is currently defined in four architectures, but in three
    different places. While linux/timex.h is not the perfect place for it, it
    is still a reasonable replacement for those drivers that traditionally use
    asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.

    Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
    This is unlikely to make a difference, and probably can only improve
    accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE
    a few years ago, after which every existing instance was getting changed
    to 1193182. According to the specification, it should be
    1193181.818181...
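
    (That value is the PC's 14.31818 MHz base oscillator, i.e. four times the
    315/88 MHz NTSC colour subcarrier, divided by 12: 4 * (315/88) / 12 MHz =
    315/264 MHz = 1193181.8181... Hz, which rounds to 1193182.)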

    Signed-off-by: Arnd Bergmann
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Len Brown
    Cc: john stultz
    Cc: Dmitry Torokhov
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

16 Jun, 2009

10 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix compile error in arch/ia64/mm/extable.c

    Linus Torvalds
     
  • …kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: Logic to move non pinned timers
    timers: /proc/sys sysctl hook to enable timer migration
    timers: Identifying the existing pinned timers
    timers: Framework for identifying pinned timers
    timers: allow deferrable timers for intervals tv2-tv5 to be deferred

    Fix up conflicts in kernel/sched.c and kernel/timer.c manually

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-clockevents' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clockevent: export register_device and delta2ns
    clockevents: tick_broadcast_device can become static

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: prevent selection of low resolution clocksourse also for nohz=on
    clocksource: sanity check sysfs clocksource changes

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ntp: fix comment typos
    ntp: adjust SHIFT_PLL to improve NTP convergence

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits)
    pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US
    ipv4: Fix fib_trie rebalancing
    Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver
    Bluetooth: Fix Kconfig issue with RFKILL integration
    PIM-SM: namespace changes
    ipv4: update ARPD help text
    net: use a deferred timer in rt_check_expire
    ieee802154: fix kconfig bool/tristate muckup
    bonding: initialization rework
    bonding: use is_zero_ether_addr
    bonding: network device names are case sensative
    bonding: elminate bad refcount code
    bonding: fix style issues
    bonding: fix destructor
    bonding: remove bonding read/write semaphore
    bonding: initialize before registration
    bonding: bond_create always called with default parameters
    x_tables: Convert printk to pr_err
    netfilter: conntrack: optional reliable conntrack event delivery
    list_nulls: add hlist_nulls_add_head and hlist_nulls_del
    ...

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
    powerpc: Fix bug in move of altivec code to vector.S
    powerpc: Add support for swiotlb on 32-bit
    powerpc/spufs: Remove unused error path
    powerpc: Fix warning when printing a resource_size_t
    powerpc/xmon: Remove unused variable in xmon.c
    powerpc/pseries: Fix warnings when printing resource_size_t
    powerpc: Shield code specific to 64-bit server processors
    powerpc: Separate PACA fields for server CPUs
    powerpc: Split exception handling out of head_64.S
    powerpc: Introduce CONFIG_PPC_BOOK3S
    powerpc: Move VMX and VSX asm code to vector.S
    powerpc: Set init_bootmem_done on NUMA platforms as well
    powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
    powerpc/mm: Fix some SMP issues with MMU context handling
    powerpc: Add PTRACE_SINGLEBLOCK support
    fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
    powerpc/virtex: Add ml510 reference design device tree
    powerpc/virtex: Add Xilinx ML510 reference design support
    powerpc/virtex: refactor intc driver and add support for i8259 cascading
    powerpc/virtex: Add support for Xilinx PCI host bridge
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
    regulator/max1586: fix V3 gain calculation integer overflow
    regulator/max1586: support increased V3 voltage range
    regulator: lp3971 - fix driver link error when built-in.
    LP3971 PMIC regulator driver (updated and combined version)
    regulator: remove driver_data direct access of struct device
    regulator: Set MODULE_ALIAS for regulator drivers
    regulator: Support list_voltage for fixed voltage regulator
    regulator: Move regulator drivers to subsys_initcall()
    regulator: build fix for powerpc - renamed show_state
    regulator: add userspace-consumer driver
    Maxim 1586 regulator driver

    Linus Torvalds
     
  • ad6561dffa17f17bb68d7207d422c26c381c4313 ("module: trim exception table on init
    free.") put a bogus trim_init_extable() function into ia64 which didn't compile.

    Signed-off-by: Rusty Russell
    Signed-off-by: Tony Luck

    Rusty Russell
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
    nilfs2: support contiguous lookup of blocks
    nilfs2: add sync_page method to page caches of meta data
    nilfs2: use device's backing_dev_info for btree node caches
    nilfs2: return EBUSY against delete request on snapshot
    nilfs2: modify list of unsupported features in caveats
    nilfs2: enable sync_page method
    nilfs2: set bio unplug flag for the last bio in segment
    nilfs2: allow future expansion of metadata read out via get info ioctl
    NILFS2: Pagecache usage optimization on NILFS2
    nilfs2: remove nilfs_btree_operations from btree mapping
    nilfs2: remove nilfs_direct_operations from direct mapping
    nilfs2: remove bmap pointer operations
    nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
    nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
    nilfs2: move get block functions in bmap.c into btree codes
    nilfs2: remove nilfs_bmap_delete_block
    nilfs2: remove nilfs_bmap_put_block
    nilfs2: remove header file for segment list operations
    nilfs2: eliminate removal list of segments
    nilfs2: add sufile function that can modify multiple segment usages
    ...

    Linus Torvalds
     

15 Jun, 2009

5 commits

  • On Thu, May 28, 2009 at 10:59 AM, Mark Brown wrote:
    > On Thu, May 28, 2009 at 07:15:16AM +0200, Philipp Zabel wrote:
    >> The V3 regulator can be configured with an external resistor
    >> connected to the feedback pin (R24 in the data sheet) to
    >> increase the voltage range.
    >>
    >> For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum
    >> V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency.
    >>
    >> Signed-off-by: Philipp Zabel
    >
    > Looks good.
    >
    > Acked-by: Mark Brown

    Thanks, but it turns out I hit a 32 bit integer overflow in
    the gain calculation. I'd like to mend that with the following
    patch. Now max_uV could be increased up to 4.294 V, enough to
    charge LiPo cells.

    Signed-off-by: Philipp Zabel
    Acked-by: Robert Jarzmik
    Signed-off-by: Liam Girdwood

    Philipp Zabel
     
  • The V3 regulator can be configured with an external resistor
    connected to the feedback pin (R24 in the data sheet) to
    increase the voltage range.

    For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum
    V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency.

    Signed-off-by: Philipp Zabel
    Acked-by: Mark Brown
    Acked-by: Robert Jarzmik
    Signed-off-by: Liam Girdwood

    Philipp Zabel
     
  • `lp3971_i2c_remove' referenced in section `.data' of drivers/built-in.o:
    defined in discarded section `.devexit.text' of drivers/built-in.o

    Signed-off-by: Liam Girdwood

    Liam Girdwood
     
  • This patch adds regulator drivers for the National Semiconductor LP3971
    PMIC. The LP3971 PMIC controller has 3 DC/DC voltage converters and 5
    low drop-out (LDO) regulators, and is controlled over an I2C interface.

    Reviewed-by: Kyungmin Park
    Signed-off-by: Marek Szyprowski
    Acked-by: Mark Brown
    Signed-off-by: Liam Girdwood

    Marek Szyprowski
     
  • In the near future, the driver core will no longer allow direct access
    to the driver_data pointer in struct device. Instead, the functions
    dev_get_drvdata() and dev_set_drvdata() should be used. These functions
    have been around since the beginning, so are backwards compatible with
    all older kernel versions.
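
    Roughly, the conversion looks like this in a hypothetical platform driver
    (an illustrative fragment, not any specific driver):

    #include <linux/device.h>
    #include <linux/platform_device.h>
    #include <linux/slab.h>

    struct my_state {
            int value;
    };

    static int my_probe(struct platform_device *pdev)
    {
            struct my_state *state = kzalloc(sizeof(*state), GFP_KERNEL);

            if (!state)
                    return -ENOMEM;
            /* was: pdev->dev.driver_data = state; */
            dev_set_drvdata(&pdev->dev, state);
            return 0;
    }

    static int my_remove(struct platform_device *pdev)
    {
            /* was: struct my_state *state = pdev->dev.driver_data; */
            struct my_state *state = dev_get_drvdata(&pdev->dev);

            kfree(state);
            return 0;
    }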

    Cc: Mark Brown
    Cc: Liam Girdwood
    Signed-off-by: Greg Kroah-Hartman
    Acked-by: Mark Brown
    Signed-off-by: Liam Girdwood

    Greg Kroah-Hartman