16 Jun, 2011

40 commits

  • Fix format and spelling.

    Signed-off-by: Jörg Sommer
    Acked-by: Paul Menage
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Jörg Sommer
     
  • According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
    mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
    is /sys/fs/cgroup. Hence, this should be used in the documentation.

    Signed-off-by: Jörg Sommer
    Acked-by: Paul Menage
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Jörg Sommer
     
  • Instead of listing the architectures that are supported by
    kmemleak in Documentation/kmemleak.txt, just refer people to
    the list of supported architectures in lib/Kconfig.debug so
    that Documentation/kmemleak.txt does not need more updates
    for this.

    Signed-off-by: Maxin B. John
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Maxin B. John
     
  • This patch updates the incomplete documentation concerning the printk
    extended format specifiers.

    Signed-off-by: Andrew Murray
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Andrew Murray
     
  • …el/git/tip/linux-2.6-tip

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Check if lowest_mask is initialized in find_lowest_rq()
    sched: Fix need_resched() when checking peempt

    Linus Torvalds
     
  • Fix several security issues in Alpha-specific syscalls. Untested, but
    mostly trivial.

    1. Signedness issue in osf_getdomainname allows copying out-of-bounds
    kernel memory to userland.

    2. Signedness issue in osf_sysinfo allows copying large amounts of
    kernel memory to userland.

    3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy
    size, allowing copying large amounts of kernel memory to userland.

    4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows
    privilege escalation via writing return value of sys_wait4 to kernel
    memory.
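
    The signedness problems in items 1 and 2 share a common shape, which
    can be sketched in user space. This is an illustration only, not the
    Alpha code: the function names and the 32-byte limit are hypothetical.

```c
#include <stddef.h>

/* Hypothetical size of the kernel-side buffer; not taken from the patch. */
#define NAME_MAX_LEN 32

/*
 * Buggy pattern: the length is signed and only the upper bound is checked.
 * A negative length passes the check and becomes huge once it is
 * converted to size_t for the copy.
 */
size_t copy_len_buggy(int len)
{
    if (len > NAME_MAX_LEN)
        len = NAME_MAX_LEN;
    return (size_t)len;
}

/*
 * Safer pattern: treat the length as unsigned before clamping, so a
 * negative input is clamped like any other oversized request.
 */
size_t copy_len_fixed(int len)
{
    size_t n = (size_t)(unsigned int)len;

    if (n > NAME_MAX_LEN)
        n = NAME_MAX_LEN;
    return n;
}
```

    With len == -1 the first variant returns a value far above the buffer
    size, while the second returns NAME_MAX_LEN.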

    Signed-off-by: Dan Rosenberg
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
     
  • Fixes this warning:

    drivers/misc/apds990x.c: At top level:
    drivers/misc/apds990x.c:613: warning: `apds990x_chip_on' defined but not used

    Signed-off-by: Geert Uytterhoeven
    Cc: Samu Onkalo
    Cc: Jonathan Cameron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Andrea Righi reported a case where an exiting task can race against
    ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily
    triggering a NULL pointer dereference in ksmd.

    ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

    CPU 1 (__ksm_exit)          CPU 2 (scan_get_next_rmap_item)
                                list_empty() is false
    lock                        slot == &ksm_mm_head
    list_del(slot->mm_list)
    (list now empty)
    unlock
                                lock
                                slot = list_entry(slot->mm_list.next)
                                (list is empty, so slot is still ksm_mm_head)
                                unlock
                                slot->mm == NULL ... Oops

    Close this race by revalidating that the new slot is not simply the list
    head again.
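
    The revalidation idea can be sketched with a small user-space list.
    This is a sketch under assumptions, not the ksm.c change: the types
    and helper names below are hypothetical, and locking is left out.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical slot type: the list head itself has mm == NULL. */
struct slot {
    struct list_node link;   /* must stay the first member */
    void *mm;
};

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/*
 * Advance from `cur` to the next slot. Returns NULL when the next entry
 * is the list head again (the list was emptied in the meantime), instead
 * of handing back the head, whose mm is NULL.
 */
struct slot *next_slot_checked(struct slot *head, struct slot *cur)
{
    struct slot *next = (struct slot *)cur->link.next;

    if (next == head)
        return NULL;
    return next;
}
```

    A caller that gets NULL back knows there is nothing left to scan and
    never touches slot->mm of the list head.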

    Andrea's test case:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define BUFSIZE getpagesize()

    int main(int argc, char **argv)
    {
            void *ptr;

            if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
                    perror("posix_memalign");
                    exit(1);
            }
            if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
                    perror("madvise");
                    exit(1);
            }
            *(char *)NULL = 0;

            return 0;
    }

    Reported-by: Andrea Righi
    Tested-by: Andrea Righi
    Cc: Andrea Arcangeli
    Signed-off-by: Hugh Dickins
    Signed-off-by: Chris Wright
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • RTC_CLASS is changed to bool, so 'm' is invalid.

    Signed-off-by: Wanlong Gao
    Acked-by: Mike Frysinger
    Acked-by: Wolfram Sang
    Acked-by: Hans-Christian Egtvedt
    Acked-by: Benjamin Herrenschmidt
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
     
  • If dmi_get_system_info() returns NULL, pch_uart_init_port() will
    dereference a NULL pointer.

    This oops was observed on an Atom-based board which has no BIOS, but
    a bootloader which doesn't provide DMI data.
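
    The general pattern of guarding such a lookup can be sketched as
    follows. The helper, the board name and the function names are
    hypothetical stand-ins, not the pch_uart code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a firmware lookup such as dmi_get_system_info(): it
 * returns NULL when no board name is available. Hypothetical helper.
 */
static const char *board_name_or_null(int have_dmi)
{
    return have_dmi ? "example-board" : NULL;
}

/* Return 1 if the board name contains `needle`, 0 otherwise. */
int board_matches(int have_dmi, const char *needle)
{
    const char *name = board_name_or_null(have_dmi);

    /* Check for NULL first: strstr(NULL, ...) is undefined behaviour. */
    if (name == NULL)
        return 0;
    return strstr(name, needle) != NULL;
}
```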

    Signed-off-by: Alexander Stein
    Cc: Greg KH
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Stein
     
  • When interrupts are delayed due to interrupt masking or due to other
    interrupts being serviced the HPET periodic-emulation would fail. This
    happened because given an interval t and a time for the current interrupt
    m we would compute the next time as t + m. This works until we are
    delayed for > t, in which case we would be writing a new value which is in
    fact in the past.

    This can be solved by computing the next time instead as (k * t) + m where
    k is large enough to be in the future. The exact computation of k is
    described in a comment to the code.

    More detail:

    Assuming an interval of 5 between each expected interrupt we have a normal
    case of

    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t5: interrupt, read t5 from comparator, set next interrupt t5 + 5
    t10: interrupt, read t10 from comparator, set next interrupt t10 + 5
    ...

    So, what happens when the interrupt is serviced too late?

    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t11: delayed interrupt serviced, read t5 from comparator, set next
    interrupt t5 + 5, which is in the past!
    ... counter loops ...
    t10: Much much later, get the next interrupt.

    This can happen either because we have interrupts masked for too long
    (some stupid driver goes on a printk rampage) or just because we are
    pushing the limits of the interval (too small a period), or both most
    probably.

    My solution is to read the main counter as well and set the next interrupt
    to occur at the right interval, for example:

    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t11: delayed interrupt serviced, read t5 from comparator, set next
    interrupt t15 as t10 has been missed.
    t15: back on track.
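
    The "(k * t) + m" idea can be sketched as below. The exact
    computation of k in the driver is described in its code comment; the
    formula here is an assumption chosen to reproduce the t5 -> t15
    example, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Compute the next comparator value for an emulated periodic timer.
 *   m   - comparator value of the interrupt being serviced
 *   t   - period in counter ticks (must be non-zero)
 *   now - current main counter value, at or after m
 * Returns m + k * t with the smallest k >= 1 that lands after `now`.
 * Unsigned subtraction keeps `now - m` correct across one counter wrap.
 */
uint32_t next_comparator(uint32_t m, uint32_t t, uint32_t now)
{
    uint32_t k = (now - m) / t + 1;

    return m + k * t;
}
```

    For the delayed case above, next_comparator(5, 5, 11) gives 15, and
    for an interrupt serviced on time, next_comparator(5, 5, 5) gives 10.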

    Signed-off-by: Nils Carlson
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Clemens Ladisch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nils Carlson
     
  • Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the
    ns_cgroup but it forgot to remove the related doc in
    feature-removal-schedule.txt.

    Signed-off-by: WANG Cong
    Cc: Daniel Lezcano
    Cc: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • Asynchronous compaction is used when promoting to huge pages. This is all
    very nice but if there are a number of processes compacting memory, a
    large number of pages can be isolated. An "asynchronous" process can
    stall for long periods of time as a result with a user reporting that
    firefox can stall for 10s of seconds. This patch aborts asynchronous
    compaction if too many pages are isolated as it's better to fail a
    hugepage promotion than stall a process.

    [minchan.kim@gmail.com: return COMPACT_PARTIAL for abort]
    Reported-and-tested-by: Ury Stankevich
    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • It is unsafe to run page_count during the physical pfn scan because
    compound_head could trip on a dangling pointer when reading
    page->first_page if the compound page is being freed by another CPU.

    [mgorman@suse.de: split out patch]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Mel Gorman
    Reviewed-by: Michal Hocko
    Reviewed-by: Minchan Kim

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Compaction works with two scanners, a migration and a free scanner. When
    the scanners cross over, migration within the zone is complete. The
    location of the scanner is recorded on each cycle to avoid excessive
    scanning.

    When a zone is small and mostly reserved, it's very easy for the migration
    scanner to be close to the end of the zone. Then the following situation
    can occur:

    o migration scanner isolates some pages near the end of the zone
    o free scanner starts at the end of the zone but finds that the
      migration scanner is already there
    o free scanner gets reinitialised for the next cycle as
      cc->migrate_pfn + pageblock_nr_pages
      moving the free scanner into the next zone
    o migration scanner moves into the next zone

    When this happens, NR_ISOLATED accounting goes haywire because some of the
    accounting happens against the wrong zone. One zone's counter remains
    positive while the other goes negative even though the overall global
    count is accurate. This was reported on X86-32 with !SMP because !SMP
    allows the negative counters to be visible. The fact that it is the bug
    should theoretically be possible there.

    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • fragmentation_index() returns -1000 when the allocation might succeed.
    This doesn't match the comment and code in compaction_suitable(). I
    thought compaction_suitable should return COMPACT_PARTIAL in -1000
    case, because in this case allocation could succeed depending on
    watermarks.

    The impact of this is that compaction starts and compact_finished() is
    called which rechecks the watermarks and the free lists. It should have
    the same result in that compaction should not start but is more expensive.

    Acked-by: Mel Gorman
    Signed-off-by: Shaohua Li
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Pages isolated for migration are accounted with the vmstat counters
    NR_ISOLATE_[ANON|FILE]. Callers of migrate_pages() are expected to
    increment these counters when pages are isolated from the LRU. Once the
    pages have been migrated, they are put back on the LRU or freed and the
    isolated count is decremented.

    Memory failure is not properly accounting for pages it isolates causing
    the NR_ISOLATED counters to be negative. On SMP builds, this goes
    unnoticed as negative counters are treated as 0 due to expected per-cpu
    drift. On UP builds, the counter is treated by too_many_isolated() as a
    large value causing processes to enter D state during page reclaim or
    compaction. This patch accounts for pages isolated by memory failure
    correctly.
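
    Why a negative counter is harmless on SMP but fatal on UP can be
    sketched as follows. This is a simplified model, not the vmstat
    code: the function names and the comparison against the inactive
    count are assumptions made for illustration.

```c
/* SMP-style read as described above: a negative counter is treated as 0. */
unsigned long read_counter_clamped(long raw)
{
    return raw < 0 ? 0UL : (unsigned long)raw;
}

/* UP-style read: the raw value is used, so -1 becomes a huge number. */
unsigned long read_counter_raw(long raw)
{
    return (unsigned long)raw;
}

/* Simplified throttle check: stall when more is isolated than inactive. */
int too_many_isolated_model(unsigned long isolated, unsigned long inactive)
{
    return isolated > inactive;
}
```

    With a raw counter of -1 and 1000 inactive pages, the raw read makes
    the check report "too many isolated" indefinitely, while the clamped
    read does not.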

    [mel@csn.ul.ie: rewrote changelog]
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Minchan Kim
    Cc: Andi Kleen
    Acked-by: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • CONFIG_CONSTRUCTORS controls support for running constructor functions at
    kernel init time. According to commit b99b87f70c7785ab ("kernel:
    constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
    CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
    and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
    CONFIG_GCOV_KERNEL select it, so that the normal case of
    CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

    Observed in the short list of =y values in a minimal kernel configuration.

    Signed-off-by: Josh Triplett
    Acked-by: WANG Cong
    Acked-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • I shall maintain the legacy eeprom driver, until we finally get rid of it.

    Signed-off-by: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     
  • Based on Michal Hocko's comment.

    We are not draining per cpu cached charges during soft limit reclaim
    because background reclaim doesn't care about charges. It tries to free
    some memory and charges will not give any.

    Cached charges might influence only selection of the biggest soft limit
    offender but as the call is done only after the selection has been already
    done it makes no change.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • For performance, memory cgroup caches some "charge" from res_counter in
    a per-cpu cache. This works well, but because it is a cache, it needs to
    be flushed in some cases. Typical cases are

    1. when someone hits the limit.

    2. when rmdir() is called and charges need to be 0.

    But case 1 has a problem.

    Recently, with large SMP machines, we see many kworker runs because of
    flushing memcg's cache. The bad part of the implementation is that the
    drain code is called even if a cpu's cache holds charges for a memcg
    unrelated to the memcg which hit the limit.

    This patch
    A) checks whether the percpu cache contains useful data.
    B) checks that no other asynchronous percpu draining is running.
    C) doesn't call the local cpu callback.

    (*) This patch avoids changing the calling condition with hard-limit.

    When I run "cat 1Gfile > /dev/null" under 300M limit memcg,

    [Before]
    13767 kamezawa 20 0 98.6m 424 416 D 10.0 0.0 0:00.61 cat
    58 root 20 0 0 0 0 S 0.6 0.0 0:00.09 kworker/2:1
    60 root 20 0 0 0 0 S 0.6 0.0 0:00.08 kworker/4:1
    4 root 20 0 0 0 0 S 0.3 0.0 0:00.02 kworker/0:0
    57 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/1:1
    61 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/5:1
    62 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/6:1
    63 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/7:1

    [After]
    2676 root 20 0 98.6m 416 416 D 9.3 0.0 0:00.87 cat
    2626 kamezawa 20 0 15192 1312 920 R 0.3 0.0 0:00.28 top
    1 root 20 0 19384 1496 1204 S 0.0 0.0 0:00.66 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
    4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0

    [akpm@linux-foundation.org: make percpu_charge_mutex static, tweak comments]
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Reviewed-by: Michal Hocko
    Tested-by: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Hierarchical reclaim doesn't swap out if memsw and resource limits are
    the same (memsw_is_minimum == true) because we would hit mem+swap limit
    anyway (during hard limit reclaim).

    If it comes to the soft limit we shouldn't consider memsw_is_minimum at
    all because it doesn't make much sense. Either the soft limit is below
    the hard limit and then we cannot hit mem+swap limit or the direct reclaim
    takes precedence.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Acked-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The following crash was reported:

    > Call Trace:
    > [] mem_cgroup_from_task+0x15/0x17
    > [] __mem_cgroup_try_charge+0x148/0x4b4
    > [] ? need_resched+0x23/0x2d
    > [] ? preempt_schedule+0x46/0x4f
    > [] mem_cgroup_charge_common+0x9a/0xce
    > [] mem_cgroup_newpage_charge+0x5d/0x5f
    > [] khugepaged+0x5da/0xfaf
    > [] ? __init_waitqueue_head+0x4b/0x4b
    > [] ? add_mm_counter.constprop.5+0x13/0x13
    > [] kthread+0xa8/0xb0
    > [] ? sub_preempt_count+0xa1/0xb4
    > [] kernel_thread_helper+0x4/0x10
    > [] ? retint_restore_args+0x13/0x13
    > [] ? __init_kthread_worker+0x5a/0x5a

    What happens is that khugepaged tries to charge a huge page against an mm
    whose last possible owner has already exited, and the memory controller
    crashes when the stale mm->owner is used to look up the cgroup to charge.

    mm->owner has never been set to NULL with the last owner going away, but
    nobody cared until khugepaged came along.

    Even then it wasn't a problem because the final mmput() on an mm was
    forced to acquire and release mmap_sem in write-mode, preventing an
    exiting owner from going away while the mmap_sem was held, and until
    "692e0b3 mm: thp: optimize memcg charge in khugepaged", the memory
    cgroup charge was protected by mmap_sem in read-mode.

    Instead of going back to relying on the mmap_sem to enforce lifetime of a
    task, this patch ensures that mm->owner is properly set to NULL when the
    last possible owner is exiting, which the memory controller can handle
    just fine.

    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Johannes Weiner
    Reported-by: Hugh Dickins
    Reported-by: Dave Jones
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Commit 21a3c9646873 ("memcg: allocate memory cgroup structures in local
    nodes") makes page_cgroup allocation NUMA aware. But that caused a
    problem: https://bugzilla.kernel.org/show_bug.cgi?id=36192.

    The problem was getting a NID from invalid struct pages, which were not
    initialized because they were out-of-node, i.e. out of [node_start_pfn,
    node_end_pfn).

    Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn. But
    this may scan a pfn which is not on any node and can access memmap which
    is not initialized.

    This makes page_cgroup_init() for SPARSEMEM node aware and removes the
    code that gets the nid from page->flags. (Then, we'll always use a valid
    NID.)

    [akpm@linux-foundation.org: try to fix up comments]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Commit 406eb0c9ba76 ("memcg: add memory.numastat api for numa
    statistics") adds memory.numa_stat file for memory cgroup. But the file
    permissions are wrong.

    [kamezawa@bluextal linux-2.6]$ ls -l /cgroup/memory/A/memory.numa_stat
    ---------- 1 root root 0 Jun 9 18:36 /cgroup/memory/A/memory.numa_stat

    This patch fixes the permission as

    [root@bluextal kamezawa]# ls -l /cgroup/memory/A/memory.numa_stat
    -r--r--r-- 1 root root 0 Jun 10 16:49 /cgroup/memory/A/memory.numa_stat

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • It seems that when a config option does not depend on the menuconfig,
    it messes up the display of the remaining config options, even if it
    is a hidden one.

    Signed-off-by: Eric Miao
    Cc: Richard Purdie
    Cc: Valdis Kletnieks
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Miao
     
  • When 1GB hugepages are allocated on a system, free(1) reports less
    available memory than what really is installed in the box. Also, if the
    total size of hugepages allocated on a system is over half of the total
    memory size, CommitLimit becomes a negative number.

    The problem is that gigantic hugepages (order > MAX_ORDER) can only be
    allocated at boot with bootmem, thus their frames are not accounted to
    'totalram_pages'. However, they are accounted to hugetlb_total_pages().

    What happens to turn CommitLimit into a negative number is this
    calculation, in fs/proc/meminfo.c:

            allowed = ((totalram_pages - hugetlb_total_pages())
                    * sysctl_overcommit_ratio / 100) + total_swap_pages;

    A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
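
    A worked example of the arithmetic, as a user-space sketch. The
    function name and the machine sizes are assumptions: 8 GiB of RAM
    with 4 KiB pages (2097152 pages), five 1GB hugepages (1310720
    pages), an overcommit ratio of 50 and no swap.

```c
/*
 * Same arithmetic as the 'allowed' expression quoted above, using signed
 * longs so that the underflow shows up as a negative result.
 */
long commit_allowed(long totalram_pages, long hugetlb_pages,
                    long overcommit_ratio, long swap_pages)
{
    return (totalram_pages - hugetlb_pages) * overcommit_ratio / 100
           + swap_pages;
}
```

    If the gigantic pages are missing from totalram_pages, the call is
    commit_allowed(786432, 1310720, 50, 0), which yields -262144 pages.
    If they were counted, commit_allowed(2097152, 1310720, 50, 0) would
    yield 393216 pages.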

    Also, every vm statistic which depends on 'totalram_pages' will render
    confusing values, as if system were 'missing' some part of its memory.

    Impact of this bug:

    When gigantic hugepages are allocated and sysctl_overcommit_memory ==
    OVERCOMMIT_NEVER, __vm_enough_memory() goes through the mentioned
    'allowed' calculation and might end up mistakenly returning -ENOMEM,
    thus forcing the system to start reclaiming pages earlier than it
    usually would, and this could have a detrimental impact on overall
    system performance, depending on the workload.

    Besides the aforementioned scenario, I can only think of this causing
    annoyances with memory reports from /proc/meminfo and free(1).

    [akpm@linux-foundation.org: standardize comment layout]
    Reported-by: Russ Anderson
    Signed-off-by: Rafael Aquini
    Acked-by: Russ Anderson
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • During memory hotplug we refresh zonelists when we online a page in a new
    zone. It means that the node's zonelist is not initialized until pages
    are onlined. So for example, "nid" passed by MEM_GOING_ONLINE notifier
    will point to NODE_DATA(nid) which has no zone fallback list. Moreover,
    if we hot-add cpu-only nodes, alloc_pages() will do no fallback.

    This patch makes a zonelist when a new pgdata is available.

    Note: in production, at fujitsu, memory should be onlined before cpu
    and our server didn't have any memory-less nodes and had no problems.

    But recent changes in MEM_GOING_ONLINE+page_cgroup
    will access the uninitialized zonelist of the node.
    Anyway, there are memory-less nodes and we need some care.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
    calculation as it appears when booting every core on setups with
    'ignore_loglevel' which dmesg people scan for possible issues. As the
    message doesn't show very useful information to the widest audience of
    kernel boot message gazers, it should be removed.

    Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical
    bogoMIPS intermittent calculation failure").

    Signed-off-by: Borislav Petkov
    Cc: Andrew Worsley
    Cc: Phil Carmody
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • On m68k (which doesn't support generic hardirqs yet):

    drivers/w1/masters/ds1wm.c: In function `ds1wm_probe':
    drivers/w1/masters/ds1wm.c: error: implicit declaration of function `irq_set_irq_type'

    Signed-off-by: Geert Uytterhoeven
    Cc: Evgeniy Polyakov
    Cc: Jean-François Dagenais
    Cc: Matt Reimer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Signed-off-by: Nicolas Kaiser
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Kaiser
     
  • Add maintainers for the videobuf2 V4L2 driver framework.

    Signed-off-by: Pawel Osciak
    Acked-by: Marek Szyprowski
    Cc: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pawel Osciak
     
  • Commit 4440673a95e6 ("leds: provide helper to register "leds-gpio"
    devices") broke the display of the NEW_LEDS menu as it didn't depend on
    NEW_LEDS and so made "LED drivers" and "LED Triggers" appear at the same
    level as "LED Support" instead of below it as it was before 4440673a.

    Moving LEDS_GPIO_REGISTER out of the menuconfig NEW_LEDS fixes this
    unintended side effect.

    Reported-by: Axel Lin
    Signed-off-by: Uwe Kleine-König
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • We call led_classdev_unregister/led_classdev_register in
    asic3_led_remove/asic3_led_probe, thus make LEDS_ASIC3 depend on
    LEDS_CLASS.

    This patch fixes below build error if LEDS_CLASS is not configured.

    LD .tmp_vmlinux1
    drivers/built-in.o: In function `asic3_led_remove':
    clkdev.c:(.devexit.text+0x1860): undefined reference to `led_classdev_unregister'
    drivers/built-in.o: In function `asic3_led_probe':
    clkdev.c:(.devinit.text+0xcee8): undefined reference to `led_classdev_register'
    make: *** [.tmp_vmlinux1] Error 1

    Signed-off-by: Axel Lin
    Cc: Paul Parsons
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • Update my email address. Email to the old address will start bouncing
    soon.

    Signed-off-by: Balbir Singh
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • The "hostname" tool falls back to setting the hostname to "localhost" if
    /etc/hostname does not exist. Distribution init scripts have the same
    fallback. However, if userspace never calls sethostname, such as when
    booting with init=/bin/sh, or otherwise booting a minimal system without
    the usual init scripts, the default hostname of "(none)" remains,
    unhelpfully appearing in various places such as prompts ("root@(none):~#")
    and logs. Furthermore, "(none)" doesn't typically resolve to anything
    useful.

    Make the default hostname configurable. This removes the need for the
    standard fallback, provides a useful default for systems that never call
    sethostname, and makes minimal systems that much more useful with less
    configuration. Distributions could choose to use "localhost" here to
    avoid the fallback, while embedded systems may wish to use a specific
    target hostname.

    Signed-off-by: Josh Triplett
    Acked-by: Linus Torvalds
    Acked-by: David Miller
    Cc: Serge Hallyn
    Cc: Kel Modderman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
    CHECKER case, otherwise various users of them become syntactically
    invalid.

    Signed-off-by: Dr. David Alan Gilbert
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dr. David Alan Gilbert
     
  • Commit 56de7263fcf3 ("mm: compaction: direct compact when a high-order
    allocation fails") introduced a check for cc->order == -1 in
    compact_finished. We should continue compacting in that case because
    the request came from userspace and there is no particular order to
    compact for. A similar check has been added by 82478fb7 (mm: compaction:
    prevent division-by-zero during user-requested compaction) for
    compaction_suitable.

    The check is, however, done after zone_watermark_ok which uses order as a
    right hand argument for shifts. Not only is the watermark check pointless
    if we can break out without it, but it also uses 1 << -1 which is not well
    defined (at least by the C standard). Let's move the -1 check above
    zone_watermark_ok.
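
    The ordering issue can be sketched in isolation. This is a
    simplified model and not compact_finished() itself: the helper, the
    enum values and the watermark rule are hypothetical.

```c
/* Hypothetical stand-in for a watermark test that shifts by `order`. */
static int watermark_ok(unsigned long free_pages, int order)
{
    /* Shifting by a negative count is undefined, so order must be >= 0. */
    return free_pages >= (1UL << order);
}

enum { KEEP_COMPACTING = 0, COMPACTION_DONE = 1 };

/*
 * order == -1 means "compact everything" (a userspace request), so it
 * is tested before anything that uses order as a shift count.
 */
int compaction_finished(unsigned long free_pages, int order)
{
    if (order == -1)
        return KEEP_COMPACTING;
    if (!watermark_ok(free_pages, order))
        return KEEP_COMPACTING;
    return COMPACTION_DONE;
}
```

    With the -1 test first, watermark_ok() is only ever reached with a
    non-negative shift count.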

    [minchan.kim@gmail.com: caught compaction_suitable]
    Signed-off-by: Michal Hocko
    Cc: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Running a ktest.pl test, I hit the following bug on x86_32:

    ------------[ cut here ]------------
    WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()
    Hardware name:
    Modules linked in:
    Pid: 93, comm: sh Not tainted 2.6.39-test+ #1
    Call Trace:
    [] warn_slowpath_common+0x7c/0x91
    [] ? __kunmap_atomic+0x64/0xc1
    [] ? __kunmap_atomic+0x64/0xc1
    [] warn_slowpath_null+0x22/0x24
    [] __kunmap_atomic+0x64/0xc1
    [] unmap_vmas+0x43a/0x4e0
    [] exit_mmap+0x91/0xd2
    [] mmput+0x43/0xad
    [] exit_mm+0x111/0x119
    [] do_exit+0x1ff/0x5fa
    [] ? set_current_blocked+0x3c/0x40
    [] ? sigprocmask+0x7e/0x8e
    [] do_group_exit+0x65/0x88
    [] sys_exit_group+0x18/0x1c
    [] sysenter_do_call+0x12/0x38
    ---[ end trace 8055f74ea3c0eb62 ]---

    Running a ktest.pl git bisect, found the culprit: commit e303297e6c3a
    ("mm: extended batches for generic mmu_gather")

    But although this was the commit triggering the bug, it was not the one
    originally responsible for the bug. That was commit d16dfc550f53 ("mm:
    mmu_gather rework").

    The code in zap_pte_range() has something that looks like the following:

    pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    do {
            [...]
    } while (pte++, addr += PAGE_SIZE, addr != end);
    pte_unmap_unlock(pte - 1, ptl);

    The pte starts off pointing at the first element in the page table
    directory that was returned by the pte_offset_map_lock(). When it's done
    with the page, pte will be pointing to anything between the next entry and
    the first entry of the next page inclusive. By doing a pte - 1, this puts
    the pte back onto the original page, which is all that pte_unmap_unlock()
    needs.

    In most archs (64 bit), this is not an issue as the pte is ignored in the
    pte_unmap_unlock(). But on 32 bit archs, where things may be kmapped, it
    is essential that the pte passed to pte_unmap_unlock() resides on the same
    page that was given by pte_offset_map_lock().

    The problem came in d16dfc55 ("mm: mmu_gather rework") where it introduced
    a "break;" from the while loop. This alone did not seem to easily trigger
    the bug. But the modifications made by e303297e6 caused that "break;" to
    be hit on the first iteration, before the pte++.

    The pte not being incremented will now cause pte_unmap_unlock(pte - 1) to
    be pointing to the previous page. This will cause the wrong page to be
    unmapped, and also trigger the warning above.

    The simple solution is to just save the pointer given by
    pte_offset_map_lock() and use it in the unlock.
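
    The effect of the early "break;" on "pte - 1" can be modelled in
    user space. This is a sketch under assumptions: a plain int array
    stands in for the page tables, ENTRIES_PER_PAGE is deliberately
    tiny, and all names are hypothetical. The start index must be
    greater than 0 so that "pte - 1" stays inside the array.

```c
#include <stddef.h>

#define ENTRIES_PER_PAGE 4   /* tiny value for illustration only */

/* Index of the "page" that entry `p` of `table` lives in. */
static size_t page_of(const int *table, const int *p)
{
    return (size_t)(p - table) / ENTRIES_PER_PAGE;
}

/*
 * Walk the entries of one page starting at index `first`, leaving the
 * loop before the cursor is incremented when `stop_early` is set.
 * Returns the page that "cursor - 1" refers to afterwards.
 */
size_t unmap_page_buggy(const int *table, size_t first, int stop_early)
{
    const int *pte = table + first;
    size_t n = ENTRIES_PER_PAGE - first % ENTRIES_PER_PAGE;

    do {
        if (stop_early)
            break;
    } while (pte++, --n != 0);
    return page_of(table, pte - 1);
}

/* Same walk, but the starting pointer is saved and used for the unmap. */
size_t unmap_page_fixed(const int *table, size_t first, int stop_early)
{
    const int *start_pte = table + first;
    const int *pte = start_pte;
    size_t n = ENTRIES_PER_PAGE - first % ENTRIES_PER_PAGE;

    do {
        if (stop_early)
            break;
    } while (pte++, --n != 0);
    return page_of(table, start_pte);
}
```

    Starting at the first entry of page 1 and breaking immediately, the
    buggy variant reports page 0, while the fixed variant reports page 1
    regardless of where the loop stops.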

    Signed-off-by: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: KAMEZAWA Hiroyuki
    Acked-by: Hugh Dickins
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Fix the wrong `if' condition for the check if the requested timer is
    available.

    The bitmap avail is used to store if a timer is used already. test_bit()
    is used to check if the requested timer is available. If a bit in the
    avail bitmap is set it means that the timer is available.

    The runtime effect would be that allocating a specific timer always fails
    (versus telling cs5535_mfgpt_alloc_timer to allocate the first available
    timer, which works).
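
    The corrected sense of the check can be sketched as follows. This
    is a user-space model, not the cs5535-mfgpt driver: the timer count,
    the single-word bitmap and the function names are assumptions.

```c
#define MAX_TIMERS 8   /* hypothetical timer count */

/* A set bit in `avail` means the timer is free, as described above. */
static int test_bit_ul(int nr, unsigned long avail)
{
    return (int)((avail >> nr) & 1UL);
}

/*
 * Try to claim a specific timer. Returns the timer number on success
 * or -1 if it is out of range or already in use.
 */
int alloc_specific_timer(unsigned long *avail, int timer)
{
    if (timer < 0 || timer >= MAX_TIMERS)
        return -1;
    /* Fail when the bit is clear, i.e. the timer is NOT available. */
    if (!test_bit_ul(timer, *avail))
        return -1;
    *avail &= ~(1UL << timer);   /* mark the timer as used */
    return timer;
}
```

    With the condition inverted, the first request for any free timer
    would be rejected, which matches the runtime effect described above.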

    Signed-off-by: Christian Gmeiner
    Acked-by: Andres Salomon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Gmeiner