28 Apr, 2008

5 commits

  • __ZONE_COUNT was used to compensate because MAX_NR_ZONES was not
    available to the #ifdefs. Export MAX_NR_ZONES via the new mechanism and
    get rid of __ZONE_COUNT.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • NR_PAGEFLAGS specifies the number of page flags we are using. From that we
    can calculate the number of leftover bits that can be used for the zone, the
    node (and maybe the section id). There is no longer any need for
    FLAGS_RESERVED if we use NR_PAGEFLAGS.

    Use the new methods to make NR_PAGEFLAGS available via the preprocessor.
    NR_PAGEFLAGS is used to calculate field boundaries in the page flags fields.
    These field widths have to be available to the preprocessor.
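
    As a rough illustration, the sketch below packs a node id and a zone id
    into the bits left over above the page flags, loosely following the
    top-down field layout used in include/linux/mm.h. All DEMO_* names and
    widths are hypothetical; the point is that the overflow check in #if only
    works because every operand is a preprocessor macro.

```c
#include <stdint.h>

/* Hypothetical values for illustration; in the kernel NR_PAGEFLAGS is the
 * generated constant and the field widths depend on the configuration. */
#define DEMO_BITS_PER_LONG 64
#define DEMO_NR_PAGEFLAGS  20
#define DEMO_NODES_WIDTH   6
#define DEMO_ZONES_WIDTH   2

/* Bits left over above the page flags. */
#define DEMO_FREE_BITS (DEMO_BITS_PER_LONG - DEMO_NR_PAGEFLAGS)

/* Only possible because every operand is a macro, not an enum constant. */
#if DEMO_NODES_WIDTH + DEMO_ZONES_WIDTH > DEMO_FREE_BITS
#error "node and zone ids do not fit next to the page flags"
#endif

/* Pack the fields downward from the most significant bit. */
#define DEMO_NODES_PGOFF (DEMO_BITS_PER_LONG - DEMO_NODES_WIDTH)
#define DEMO_ZONES_PGOFF (DEMO_NODES_PGOFF - DEMO_ZONES_WIDTH)

#define DEMO_FIELD_MASK(width) ((UINT64_C(1) << (width)) - 1)

uint64_t demo_set_page_links(uint64_t flags, unsigned int nid, unsigned int zone)
{
	/* Clear the old node and zone ids, leaving the low flag bits alone. */
	flags &= ~(DEMO_FIELD_MASK(DEMO_NODES_WIDTH) << DEMO_NODES_PGOFF);
	flags &= ~(DEMO_FIELD_MASK(DEMO_ZONES_WIDTH) << DEMO_ZONES_PGOFF);
	flags |= ((uint64_t)nid & DEMO_FIELD_MASK(DEMO_NODES_WIDTH)) << DEMO_NODES_PGOFF;
	flags |= ((uint64_t)zone & DEMO_FIELD_MASK(DEMO_ZONES_WIDTH)) << DEMO_ZONES_PGOFF;
	return flags;
}

unsigned int demo_page_to_nid(uint64_t flags)
{
	return (unsigned int)((flags >> DEMO_NODES_PGOFF) & DEMO_FIELD_MASK(DEMO_NODES_WIDTH));
}

unsigned int demo_page_zonenum(uint64_t flags)
{
	return (unsigned int)((flags >> DEMO_ZONES_PGOFF) & DEMO_FIELD_MASK(DEMO_ZONES_WIDTH));
}
```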

    Signed-off-by: Christoph Lameter
    Cc: David Miller
    Cc: Andy Whitcroft
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Using enums creates constants that are not available to the preprocessor
    when building the kernel (e.g. MAX_NR_ZONES).

    Arch code already has a way to export calculated constants to the
    preprocessor through the asm-offsets.c file. Generate something similar for
    the core kernel through kbuild.
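
    The limitation is easy to reproduce outside the kernel. In this sketch
    (all names hypothetical) an enum constant silently evaluates to 0 inside
    #if, while the same value exposed as a plain macro, which is what a
    generated header would provide, behaves as expected. It does not
    reproduce the kbuild/asm-offsets generation step itself.

```c
/* Hypothetical enum standing in for enum zone_type / MAX_NR_ZONES. */
enum demo_zone_type {
	DEMO_ZONE_DMA,
	DEMO_ZONE_NORMAL,
	DEMO_ZONE_HIGHMEM,
	DEMO_MAX_NR_ZONES	/* == 3, but only the compiler knows that */
};

/* In #if, an identifier that is not a macro evaluates to 0: this is 0 > 2. */
#if DEMO_MAX_NR_ZONES > 2
#define DEMO_ENUM_SEEN 1
#else
#define DEMO_ENUM_SEEN 0
#endif

/* What a generated header would provide: the same value as a macro. */
#define DEMO_MAX_NR_ZONES_GENERATED 3

#if DEMO_MAX_NR_ZONES_GENERATED > 2
#define DEMO_MACRO_SEEN 1
#else
#define DEMO_MACRO_SEEN 0
#endif

/* The compiler proper can cross-check the generated value against the enum. */
_Static_assert(DEMO_MAX_NR_ZONES == DEMO_MAX_NR_ZONES_GENERATED,
	       "generated constant out of sync with the enum");

int demo_enum_seen_by_cpp(void) { return DEMO_ENUM_SEEN; }
int demo_macro_seen_by_cpp(void) { return DEMO_MACRO_SEEN; }
```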

    Signed-off-by: Sam Ravnborg
    Signed-off-by: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The MPOL_BIND policy creates a zonelist that is used for allocations
    controlled by that mempolicy. As the per-node zonelist is already being
    filtered based on a zone id, this patch adds a version of __alloc_pages() that
    takes a nodemask for further filtering. This eliminates the need for
    MPOL_BIND to create a custom zonelist.

    A benefit of this is that allocations using MPOL_BIND now use the
    local node's distance-ordered zonelist instead of a custom node-id-ordered
    zonelist. I.e., pages will be allocated from the closest allowed node with
    available memory.
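
    A minimal sketch of the filtering idea, assuming heavily simplified
    hypothetical types (demo_zone, demo_zonelist and a 64-bit
    demo_nodemask_t) rather than the kernel's real structures: walk the local
    node's distance-ordered zonelist and take the first zone whose node is in
    the policy's nodemask and that still has free pages.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ZONES 8

/* Hypothetical, much-simplified stand-ins for the kernel structures. */
struct demo_zone {
	int node;		/* node this zone belongs to (assumed < 64) */
	unsigned long free;	/* free pages available */
};

struct demo_zonelist {
	/* NULL-terminated, ordered by distance from the local node */
	struct demo_zone *zones[DEMO_MAX_ZONES];
};

typedef uint64_t demo_nodemask_t;	/* bit n set => node n allowed */

/* Return the closest zone on an allowed node with free pages.
 * A NULL nodemask means "no policy restriction". */
struct demo_zone *demo_first_allowed_zone(const struct demo_zonelist *zl,
					  const demo_nodemask_t *nodes)
{
	for (size_t i = 0; i < DEMO_MAX_ZONES && zl->zones[i]; i++) {
		struct demo_zone *z = zl->zones[i];

		if (nodes && !(*nodes & (UINT64_C(1) << z->node)))
			continue;	/* filtered out by the nodemask */
		if (z->free > 0)
			return z;
	}
	return NULL;
}
```

    With MPOL_BIND no custom zonelist is built; the same per-node list is
    reused and the nodemask is applied during the walk.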

    [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Filtering zonelists requires very frequent use of zone_idx(). This is costly
    as it involves a lookup of another structure and a subtraction operation. As
    the zone_idx is often required, it should be quickly accessible. The node idx
    could also be stored here if it were found that accessing zone->node is
    significant, which may be the case on workloads where nodemasks are heavily
    used.

    This patch introduces a struct zoneref to store a zone pointer and a zone
    index. The zonelist then consists of an array of these struct zonerefs which
    are looked up as necessary. Helpers are given for accessing the zone index as
    well as the node index.
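
    A simplified sketch of the idea, with hypothetical demo_* types:
    computing zone_idx() means reaching the owning node's zone array and
    subtracting pointers, whereas a zoneref carries the index next to the
    pointer, so a filter over the zonelist only reads the cached integer.
    The helper names loosely mirror the ones the patch describes but are
    not the kernel's definitions.

```c
#include <stddef.h>

#define DEMO_MAX_ZONEREFS 8

/* Hypothetical stand-in; the real struct zone has many more fields. */
struct demo_zone {
	int node_id;
};

/* Zone pointer plus its cached index, so filtering needs no lookups. */
struct demo_zoneref {
	struct demo_zone *zone;
	int zone_idx;
};

struct demo_zonelist {
	struct demo_zoneref refs[DEMO_MAX_ZONEREFS];	/* ends at a NULL zone */
};

static inline struct demo_zone *demo_zonelist_zone(const struct demo_zoneref *zr)
{
	return zr->zone;
}

static inline int demo_zonelist_zone_idx(const struct demo_zoneref *zr)
{
	return zr->zone_idx;
}

static inline int demo_zonelist_node_idx(const struct demo_zoneref *zr)
{
	return zr->zone->node_id;
}

/* First entry whose zone index is at or below highest_zoneidx. */
const struct demo_zoneref *
demo_first_zone_at_or_below(const struct demo_zonelist *zl, int highest_zoneidx)
{
	for (size_t i = 0; i < DEMO_MAX_ZONEREFS && zl->refs[i].zone; i++)
		if (demo_zonelist_zone_idx(&zl->refs[i]) <= highest_zoneidx)
			return &zl->refs[i];
	return NULL;
}
```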

    [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
    [hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
    [hugh@veritas.com: just return do_try_to_free_pages]
    [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Apr, 2008

2 commits

  • The SIE instruction on s390 uses the second half of the page table page to
    virtualize the storage keys of a guest. This patch offers the s390_enable_sie
    function, which reorganizes the page tables of a single-threaded process to
    reserve space in the page table:
    s390_enable_sie makes sure that the process is single threaded and then uses
    dup_mm to create a new mm with reorganized page tables. The old mm is freed
    and the process now has a page status extended field after every page table.

    Code that wants to exploit pgstes should select CONFIG_PGSTE.

    This patch has a small common code hit, namely making dup_mm non-static.

    Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
    review feedback. Now we do have the prototype for dup_mm in
    include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() now
    calls task_lock() to prevent a race against ptrace modification of mm_users.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Carsten Otte
    Acked-by: Andrew Morton
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • Arrgghhh...

    Sorry about that, I'd been sure I'd folded that one, but it actually got
    lost. Please apply; without it, execve() is broken.

    Signed-off-by: Al Viro
    Tested-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Al Viro
     

26 Apr, 2008

3 commits


25 Apr, 2008

8 commits

  • * let unshare_files() give caller the displaced files_struct
    * don't bother with grabbing reference only to drop it in the
    caller if it hadn't been shared in the first place
    * in that form unshare_files() is trivially implemented via
    unshare_fd(), so we eliminate the duplicate logic in fork.c
    * reset_files_struct() is only ever called for current;
    it will break the system if somebody ever calls it for anything
    else (we can't modify ->files of somebody else). Lose the
    task_struct * argument.
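
    A userspace model of the calling convention described above, with
    hypothetical demo_* types and a global demo_current standing in for
    current: the displaced table is handed back only when a private copy was
    actually made, and the caller either drops that reference on success or
    reinstalls the table on failure.

```c
#include <stdlib.h>

/* Hypothetical, much-simplified stand-ins for files_struct / task_struct. */
struct demo_files {
	int count;		/* tasks sharing this descriptor table */
};

struct demo_task {
	struct demo_files *files;
};

/* Stand-in for "current". */
struct demo_task *demo_current;

/* Give current a private table if its table is shared. On success the old
 * table comes back through *displaced, or NULL if nothing was shared, in
 * which case no reference is grabbed or dropped. */
int demo_unshare_files(struct demo_files **displaced)
{
	struct demo_files *old = demo_current->files;
	struct demo_files *copy;

	*displaced = NULL;
	if (old->count == 1)
		return 0;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;	/* may fail: do this before anything irreversible */
	copy->count = 1;
	demo_current->files = copy;
	*displaced = old;	/* caller inherits current's old reference */
	return 0;
}

/* Drop a reference, freeing the table when the last user goes away. */
void demo_put_files(struct demo_files *files)
{
	if (--files->count == 0)
		free(files);
}

/* Failure path: reinstall the displaced table. No task argument, since
 * only current's ->files may be modified. */
void demo_reset_files(struct demo_files *displaced)
{
	struct demo_files *private_copy = demo_current->files;

	demo_current->files = displaced;
	demo_put_files(private_copy);
}
```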

    Signed-off-by: Al Viro

    Al Viro
     
  • * unshare_files() can fail; doing it after irreversible actions is wrong
    and de_thread() is certainly irreversible.
    * since we do it unconditionally anyway, we might as well do it in do_execve()
    and save ourselves the PITA in binfmt handlers, etc.
    * while we are at it, binfmt_som actually leaked files_struct on failure.

    As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
    become unexported.

    Signed-off-by: Al Viro

    Al Viro
     
  • updating current->files requires task_lock

    Signed-off-by: Al Viro

    Al Viro
     
  • There is no guarantee that there is physical RAM below 4GB, and in
    fact many boxes have none there.

    Signed-off-by: David S. Miller
    Signed-off-by: Ingo Molnar

    David Miller
     
  • fix __aggregate_redistribute_shares() related lockup reported by
    David S. Miller.

    The problem this code tries to solve is 'accurately' calculating the 'fair'
    share of the group weight for each cpu. The current code falls back to a global
    group rebalance in case the sched_domain's span it looks at has no shares, but
    does have tasks.

    It gets stuck here because it's inherently racy: if someone
    steals the last task after we compute the agg->rq_weight, but before we
    rebalance, we'll never get out of the loop.

    We could of course go fix that, but while looking at this issue I found that
    this 'fallback' wasn't nearly as rare as I'd hoped it to be. In fact it's quite
    common, and given it walks the whole machine, that's very bad.

    The new approach is simple (why didn't I think of it before?): we set the
    aggregate shares to the full task group weight, and each larger sched domain
    that encounters aggregate shares larger than the weight clips them (it
    already re-distributes anyway).

    This nicely converges to the desired global picture where the sum of all
    shares equals the task group weight.
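
    A toy model of just the clipping rule, not the scheduler's actual code
    (the function and its arguments are hypothetical): each bottom-level
    domain starts at the full group weight, and a larger domain combining
    them clips the sum back to that weight instead of falling back to a
    machine-wide rebalance.

```c
#include <stddef.h>

/* Combine child-domain shares for one task group, clipped to its weight. */
unsigned long demo_aggregate_shares(const unsigned long *child_shares,
				    size_t nr_children, unsigned long tg_weight)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr_children; i++)
		sum += child_shares[i];

	/* No global fallback walk: simply never exceed the group weight. */
	return sum > tg_weight ? tg_weight : sum;
}
```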

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • David Miller reported:

    |--------------->
    the following commit:

    | commit 27ec4407790d075c325e1f4da0a19c56953cce23
    | Author: Ingo Molnar
    | Date: Thu Feb 28 21:00:21 2008 +0100
    |
    | sched: make cpu_clock() globally synchronous
    |
    | Alexey Zaytsev reported (and bisected) that the introduction of
    | cpu_clock() in printk made the timestamps jump back and forth.
    |
    | Make cpu_clock() more reliable while still keeping it fast when it's
    | called frequently.
    |
    | Signed-off-by: Ingo Molnar

    causes watchdog triggers when a cpu exits NOHZ state after it has been
    there for >= the soft lockup threshold. For example, here are some
    messages from a 128-cpu Niagara2 box:

    [ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
    [ 168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
    [ 168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
    [ 168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
    [ 169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
    [ 169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
    [ 169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
    [ 169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
    [ 169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
    [ 169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
    [ 169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
    [ 169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
    [ 169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
    [ 169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
    [ 169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
    [ 169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
    [ 169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
    [ 169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]

    It's simple enough to trigger this by doing a 10 minute sleep after a
    fresh bootup then starting a parallel kernel build.

    I suspect this might be reintroducing a problem we've had and fixed
    before, see the thread:

    http://marc.info/?l=linux-kernel&m=119546414004065&w=2

    Ingo Molnar
     
  • Regression caused by 434d53b00d6bb7be0a1d3dcc0d0d5df6c042e164

    Signed-off-by: Mike Travis
    Signed-off-by: Tony Luck

    Mike Travis
     
  • A recent change prevents SGI Altix from booting.
    This patch fixes the problem.

    The regression was introduced in commit 434d53b00d6bb7be0a1d3dcc0d0d5df6c042e164

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     

24 Apr, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    iwlwifi: Fix built-in compilation of iwlcore
    net: Unexport move_addr_to_{kernel,user}
    rt2x00: Select LEDS_CLASS.
    iwlwifi: Select LEDS_CLASS.
    leds: Do not guard NEW_LEDS with HAS_IOMEM
    [IPSEC]: Fix catch-22 with algorithm IDs above 31
    time: Export set_normalized_timespec.
    tcp: Make use of before macro in tcp_input.c
    hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c
    [NETNS]: Remove empty ->init callback.
    [DCCP]: Convert do_gettimeofday() to getnstimeofday().
    [NETNS]: Don't initialize err variable twice.
    [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop.
    [IPV4]: Convert do_gettimeofday() to getnstimeofday().
    [IPV4]: Make icmp_sk_init() static.
    [IPV6]: Make struct ip6_prohibit_entry_template static.
    tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c
    [NET]: Expose netdevice dev_id through sysfs
    skbuff: fix missing kernel-doc notation
    [ROSE]: Fix soft lockup wrt. rose_node_list_lock

    Linus Torvalds
     

23 Apr, 2008

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct()
    [PATCH] proc_readfd_common() race fix
    [PATCH] double-free of inode on alloc_file() failure exit in create_write_pipe()
    [PATCH] teach seq_file to discard entries
    [PATCH] umount_tree() will unhash everything itself
    [PATCH] get rid of more nameidata passing in namespace.c
    [PATCH] switch a bunch of LSM hooks from nameidata to path
    [PATCH] lock exclusively in collect_mounts() and drop_collected_mounts()
    [PATCH] move a bunch of declarations to fs/internal.h

    Linus Torvalds
     
  • The only reason to have separated __...() for those was to keep them inlined
    for local users in exit.c. Since Alexey removed the inline on those, there's
    no reason whatsoever to keep them around; just collapse them into the normal variants.

    Signed-off-by: Al Viro

    Al Viro
     
  • Add missing kernel-doc in kernel/sched.c:

    Warning(linux-2.6.25-git3//kernel/sched.c:7044): No description found for parameter 'span'
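
    For reference, kernel-doc expects one "@name:" line per parameter after
    the function's summary line; a missing one produces the warning above.
    The function below is hypothetical and only shows the comment shape.

```c
/**
 * demo_count_cpus - count the CPUs covered by a domain span
 * @span: bitmask of CPUs in the span (hypothetical representation)
 *
 * Dropping the @span line above is what makes scripts/kernel-doc report
 * "No description found for parameter 'span'".
 */
int demo_count_cpus(unsigned long span)
{
	int n = 0;

	while (span) {
		span &= span - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```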

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

22 Apr, 2008

8 commits

  • Sorry, I have just realized that set_normalized_timespec() (used in
    timespec_sub()) is not exported, so linking will fail because of it.
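
    A userspace re-creation of what the helper does, using hypothetical
    demo_* names to avoid clashing with libc: the raw nanosecond difference
    can go negative, so timespec_sub() relies on set_normalized_timespec()
    to fold it back into [0, 1e9). Since timespec_sub() is, as far as we can
    tell, a header inline, every module that uses it needs the out-of-line
    helper at link time, hence the export.

```c
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Fold nsec into [0, DEMO_NSEC_PER_SEC), carrying into sec. */
void demo_set_normalized_timespec(struct timespec *ts, time_t sec, long nsec)
{
	while (nsec >= DEMO_NSEC_PER_SEC) {
		nsec -= DEMO_NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += DEMO_NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = nsec;
}

/* lhs - rhs; the per-field difference may leave tv_nsec negative. */
struct timespec demo_timespec_sub(struct timespec lhs, struct timespec rhs)
{
	struct timespec delta;

	demo_set_normalized_timespec(&delta, lhs.tv_sec - rhs.tv_sec,
				     lhs.tv_nsec - rhs.tv_nsec);
	return delta;
}
```

    For example, 5 s + 100 ns minus 3 s + 200 ns yields 1 s + 999999900 ns.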

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
    DOC: A couple corrections and clarifications in USB doc.
    Generate a slightly more informative error msg for bad HZ
    fix typo "is" -> "if" in Makefile
    ext*: spelling fix prefered -> preferred
    DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
    KEYS: Fix the comment to match the file name in rxrpc-type.h.
    RAID: remove trailing space from printk line
    DMA engine: typo fixes
    Remove unused MAX_NODES_SHIFT
    MAINTAINERS: Clarify access to OCFS2 development mailing list.
    V4L: Storage class should be before const qualifier (sn9c102)
    V4L: Storage class should be before const qualifier
    sonypi: Storage class should be before const qualifier
    intel_menlow: Storage class should be before const qualifier
    DVB: Storage class should be before const qualifier
    arm: Storage class should be before const qualifier
    ALSA: Storage class should be before const qualifier
    acpi: Storage class should be before const qualifier
    firmware_sample_driver.c: fix coding style
    MAINTAINERS: Add ati_remote2 driver
    ...

    Fixed up trivial conflicts in firmware_sample_driver.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (42 commits)
    PCI: Change PCI subsystem MAINTAINER
    PCI: pci-iommu-iotlb-flushing-speedup
    PCI: pci_setup_bridge() mustn't be __devinit
    PCI: pci_bus_size_cardbus() mustn't be __devinit
    PCI: pci_scan_device() mustn't be __devinit
    PCI: pci_alloc_child_bus() mustn't be __devinit
    PCI: replace remaining __FUNCTION__ occurrences
    PCI: Hotplug: fakephp: Return success, not ENODEV, when bus rescan is triggered
    PCI: Hotplug: Fix leaks in IBM Hot Plug Controller Driver - ibmphp_init_devno()
    PCI: clean up resource alignment management
    PCI: aerdrv_acpi.c: remove unneeded NULL check
    PCI: Update VIA CX700 quirk
    PCI: Expose PCI VPD through sysfs
    PCI: iommu: iotlb flushing
    PCI: simplify quirk debug output
    PCI: iova RB tree setup tweak
    PCI: parisc: use generic pci_enable_resources()
    PCI: ppc: use generic pci_enable_resources()
    PCI: powerpc: use generic pci_enable_resources()
    PCI: ia64: use generic pci_enable_resources()
    ...

    Linus Torvalds
     
  • This adds support for PTRACE_GETSIGINFO and PTRACE_SETSIGINFO in
    compat_ptrace_request. It relies on existing arch definitions for
    copy_siginfo_to_user32 and copy_siginfo_from_user32.

    On powerpc, this fixes a longstanding regression of 32-bit ptrace
    calls on 64-bit kernels vs native calls (64-bit calls or 32-bit
    kernels). This can be seen in a 32-bit call using PTRACE_GETSIGINFO
    to examine e.g. siginfo_t.si_addr from a signal that sets it.
    (This was broken as of 2.6.24 and, I presume, many or all prior versions.)
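
    The sketch below shows the native call pattern described above: a tracee
    faults on a hypothetical unmapped address and the tracer reads si_addr
    with PTRACE_GETSIGINFO. It cannot show the compat bug by itself; building
    it as a 32-bit binary on a 64-bit kernel should presumably go through
    compat_ptrace_request(). It returns -1 where tracing is not permitted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Address the child deliberately writes to (hypothetical choice; anything
 * below mmap_min_addr should fault). */
#define DEMO_FAULT_ADDR ((void *)0x10)

/* Returns 0 and stores the fault's si_addr, or -1 if tracing fails. */
int demo_fetch_fault_addr(void **addr_out)
{
	int status, ret = -1;
	siginfo_t si;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
			_exit(1);
		*(volatile int *)DEMO_FAULT_ADDR = 0;	/* raises SIGSEGV */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	/* A traced child stops at signal delivery, before SIGSEGV kills it. */
	if (WIFSTOPPED(status)) {
		if (WSTOPSIG(status) == SIGSEGV &&
		    ptrace(PTRACE_GETSIGINFO, pid, NULL, &si) == 0) {
			*addr_out = si.si_addr;
			ret = 0;
		}
		kill(pid, SIGKILL);	/* don't let the stopped tracee run on */
		waitpid(pid, NULL, 0);
	}
	return ret;
}
```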

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
    hrtimer: optimize the softirq time optimization
    hrtimer: reduce calls to hrtimer_get_softirq_time()
    clockevents: fix typo in tick-broadcast.c
    jiffies: add time_is_after_jiffies and others which compare with jiffies

    Linus Torvalds
     
  • * 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
    Deprecate the asm/semaphore.h files in feature-removal-schedule.
    Convert asm/semaphore.h users to linux/semaphore.h
    security: Remove unnecessary inclusions of asm/semaphore.h
    lib: Remove unnecessary inclusions of asm/semaphore.h
    kernel: Remove unnecessary inclusions of asm/semaphore.h
    include: Remove unnecessary inclusions of asm/semaphore.h
    fs: Remove unnecessary inclusions of asm/semaphore.h
    drivers: Remove unnecessary inclusions of asm/semaphore.h
    net: Remove unnecessary inclusions of asm/semaphore.h
    arch: Remove unnecessary inclusions of asm/semaphore.h

    Linus Torvalds
     
  • …linux-2.6-sched-devel

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
    sched: build fix
    sched: better rt-group documentation
    sched: features fix
    sched: /debug/sched_features
    sched: add SCHED_FEAT_DEADLINE
    sched: debug: show a weight tree
    sched: fair: weight calculations
    sched: fair-group: de-couple load-balancing from the rb-trees
    sched: fair-group scheduling vs latency
    sched: rt-group: optimize dequeue_rt_stack
    sched: debug: add some debug code to handle the full hierarchy
    sched: fair-group: SMP-nice for group scheduling
    sched, cpuset: customize sched domains, core
    sched, cpuset: customize sched domains, docs
    sched: prepatory code movement
    sched: rt: multi level group constraints
    sched: task_group hierarchy
    sched: fix the task_group hierarchy for UID grouping
    sched: allow the group scheduler to have multiple levels
    sched: mix tasks and groups
    ...

    Linus Torvalds
     
  • These are small cleanups all over the tree.

    Trivial style and comment changes to
    fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c

    Signed-off-by: Pavel Machek
    Signed-off-by: Jesper Juhl

    Pavel Machek
     

21 Apr, 2008

4 commits

  • The previous optimization did not take into account the case where a
    clock provides its own softirq_get_time() function.

    Check for the availability of the clock's get-time function first and
    then check whether we need to retrieve the time for both clocks via
    hrtimer_softirq_gettime(), to avoid a double evaluation of time in that
    case as well.
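
    A simplified model of the resulting control flow, using hypothetical
    demo_* names rather than the real hrtimer structures: a base with
    nothing pending needs no time at all, a clock with its own time source
    uses it, and the shared read happens at most once per run.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of per-clock timer bases. */
struct demo_clock_base {
	bool has_pending;			/* any timer queued here? */
	long (*get_softirq_time)(void);		/* optional clock-provided time */
	long softirq_time;
};

int demo_shared_reads;	/* how often the shared (expensive) read ran */

/* Stand-in for the shared read: one call fills in every base that does
 * not supply its own time. */
static void demo_read_shared_time(struct demo_clock_base *bases, int n)
{
	demo_shared_reads++;
	for (int i = 0; i < n; i++)
		if (!bases[i].get_softirq_time)
			bases[i].softirq_time = 1000;	/* fake timestamp */
}

/* Example clock that supplies its own time. */
long demo_own_clock(void)
{
	return 42;
}

void demo_run_queues(struct demo_clock_base *bases, int n)
{
	bool shared_read = false;

	for (int i = 0; i < n; i++) {
		struct demo_clock_base *base = &bases[i];

		if (!base->has_pending)
			continue;	/* nothing to expire, no time needed */
		if (base->get_softirq_time) {
			base->softirq_time = base->get_softirq_time();
		} else if (!shared_read) {
			demo_read_shared_time(bases, n);
			shared_read = true;
		}
		/* ... expire this base's timers against base->softirq_time ... */
	}
}
```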

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more
    often than it needs to. This can cause frequent contention on systems with
    large numbers of processors/cores.

    With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if
    there is a pending timer in one of the hrtimer bases, and only once.

    This also combines hrtimer_run_queues() and the inline run_hrtimer_queue()
    into one function.

    [ tglx@linutronix.de: coding style ]

    Signed-off-by: Dimitri Sivanich
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Dimitri Sivanich
     
  • braodcast -> broadcast

    Signed-off-by: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Glauber Costa
     
  • Done per Linus' request and suggestions. Linus has explained that
    better than I'll be able to explain:

    On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote:
    > Actually, before we go any further, there might be a less intrusive
    > alternative: add just a couple of flags to the resource flags field (we
    > still have something like 8 unused bits on 32-bit), and use those to
    > implement a generic "resource_alignment()" routine.
    >
    > Two flags would do it:
    >
    > - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device
    > resources)
    >
    > - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources
    > during probing)
    >
    > and then the case of both flags zero (or both bits set) would actually be
    > "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we
    > actually allocate the resource (so that we don't use the "start" field as
    > alignment incorrectly when it no longer indicates alignment).
    >
    > That wouldn't be totally generic, but it would have the nice property of
    > automatically at least add sanity checking for that whole "res->start has
    > the odd meaning of 'alignment' during probing" and remove the need for a
    > new field, and it would allow us to have a generic "resource_alignment()"
    > routine that just gets a resource pointer.
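
    A sketch of the generic helper Linus describes, on a hypothetical
    demo_resource type with flag values chosen for illustration only:
    SIZEALIGN returns the resource size, STARTALIGN returns the start field,
    and any other combination is reported as invalid with 0. Clearing
    STARTALIGN once an address is assigned keeps "start" from being misread
    as an alignment.

```c
#include <stdint.h>

typedef uint64_t demo_resource_size_t;

/* Hypothetical bit values; the real IORESOURCE_* flags live in <linux/ioport.h>. */
#define DEMO_IORESOURCE_SIZEALIGN  0x00020000UL	/* size indicates alignment */
#define DEMO_IORESOURCE_STARTALIGN 0x00040000UL	/* start field is alignment */

struct demo_resource {
	demo_resource_size_t start, end;
	unsigned long flags;
};

demo_resource_size_t demo_resource_alignment(const struct demo_resource *res)
{
	switch (res->flags & (DEMO_IORESOURCE_SIZEALIGN | DEMO_IORESOURCE_STARTALIGN)) {
	case DEMO_IORESOURCE_SIZEALIGN:
		return res->end - res->start + 1;
	case DEMO_IORESOURCE_STARTALIGN:
		return res->start;
	default:
		return 0;	/* neither or both set: invalid */
	}
}

/* Once a real address is assigned, "start" no longer means alignment. */
void demo_resource_allocated(struct demo_resource *res,
			     demo_resource_size_t start, demo_resource_size_t size)
{
	res->start = start;
	res->end = start + size - 1;
	res->flags &= ~DEMO_IORESOURCE_STARTALIGN;
}
```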

    Additionally, I removed the IORESOURCE_BUS_HAS_VGA flag, which had been unused for ages.

    Signed-off-by: Ivan Kokshaysky
    Cc: Linus Torvalds
    Cc: Gary Hade
    Signed-off-by: Greg Kroah-Hartman

    Ivan Kokshaysky
     

20 Apr, 2008

6 commits