27 Jul, 2008

2 commits

  • long overdue...

    Signed-off-by: Al Viro

    Al Viro
     
  • Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
    architecture does:

    This enables us to cleanly fix the Calgary IOMMU issue that some devices
    are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

    I think that per-device dma_mapping_ops support would be also helpful for
    KVM people to support PCI passthrough but Andi thinks that this makes it
    difficult to support the PCI passthrough (see the above thread). So I
    CC'ed this to KVM camp. Comments are appreciated.

    A pointer to dma_mapping_ops to struct dev_archdata is added. If the
    pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
    NULL, the system-wide dma_ops pointer is used as before.

    If it's useful for KVM people, I plan to implement a mechanism to register
    a hook called when a new pci (or dma capable) device is created (it works
    with hot plugging). It enables IOMMUs to set up an appropriate
    dma_mapping_ops per device.

    The major obstacle is that dma_mapping_error doesn't take a pointer to the
    device unlike other DMA operations. So x86 can't have dma_mapping_ops per
    device. Note all the POWER IOMMUs use the same dma_mapping_error function
    so this is not a problem for POWER but x86 IOMMUs use different
    dma_mapping_error functions.

    The first patch adds the device argument to dma_mapping_error. The patch
    is trivial but large since it touches lots of drivers and dma-mapping.h in
    all the architecture.

    This patch:

    dma_mapping_error() doesn't take a pointer to the device unlike other DMA
    operations. So we can't have dma_mapping_ops per device.

    Note that POWER already has dma_mapping_ops per device but all the POWER
    IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
    argument.

    [akpm@linux-foundation.org: fix sge]
    [akpm@linux-foundation.org: fix svc_rdma]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix bnx2x]
    [akpm@linux-foundation.org: fix s2io]
    [akpm@linux-foundation.org: fix pasemi_mac]
    [akpm@linux-foundation.org: fix sdhci]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix sparc]
    [akpm@linux-foundation.org: fix ibmvscsi]
    Signed-off-by: FUJITA Tomonori
    Cc: Muli Ben-Yehuda
    Cc: Andi Kleen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     

26 Jul, 2008

4 commits

  • * git://git.infradead.org/~dwmw2/random-2.6:
    remove dummy asm/kvm.h files
    firmware: create firmware binaries during 'make modules'.

    Linus Torvalds
     
  • This patch removes the dummy asm/kvm.h files on architectures not (yet)
    supporting KVM and uses the same conditional headers installation as
    already used for a.out.h .

    Also removed are superfluous install rules in the s390 and x86 Kbuild
    files (they are already in Kbuild.asm).

    Signed-off-by: Adrian Bunk
    Acked-by: Sam Ravnborg
    Signed-off-by: David Woodhouse

    Adrian Bunk
     
  • This patch contains the following cleanups for the asm/ptrace.h
    userspace headers:

    - include/asm-generic/Kbuild.asm already lists ptrace.h, remove
    the superfluous listings in the Kbuild files of the following
    architectures:
    - cris
    - frv
    - powerpc
    - x86
    - don't expose function prototypes and macros to userspace:
    - arm
    - blackfin
    - cris
    - mn10300
    - parisc
    - remove #ifdef CONFIG_'s around #define's:
    - blackfin
    - m68knommu
    - sh: AFAIK __SH5__ should work in both kernel and userspace,
    no need to leak CONFIG_SUPERH64 to userspace
    - xtensa: cosmetical change to remove empty
    #ifndef __ASSEMBLY__ #else #endif
    from the userspace headers

    Not changed by this patch is the fact that the following architectures
    have a different struct pt_regs depending on CONFIG_ variables:
    - h8300
    - m68knommu
    - mips

    This does not work in userspace.

    Signed-off-by: Adrian Bunk
    Cc:
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Acked-by: Greg Ungerer
    Acked-by: Paul Mundt
    Acked-by: Grant Grundler
    Acked-by: Jesper Nilsson
    Acked-by: Chris Zankel
    Acked-by: David Howells
    Acked-by: Paul Mackerras
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • We duplicate alloc/free_thread_info defines on many platforms (the
    majority uses __get_free_pages/free_pages). This patch defines common
    defines and removes these duplicated defines.
    __HAVE_ARCH_THREAD_INFO_ALLOCATOR is introduced for platforms that do
    something different.

    Signed-off-by: FUJITA Tomonori
    Acked-by: Russell King
    Cc: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     

25 Jul, 2008

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits)
    ide: use proper printk() KERN_* levels in ide-probe.c
    ide: fix for EATA SCSI HBA in ATA emulating mode
    ide: remove stale comments from drivers/ide/Makefile
    ide: enable local IRQs in all handlers for TASKFILE_NO_DATA data phase
    ide-scsi: remove kmalloced struct request
    ht6560b: remove old history
    ht6560b: update email address
    ide-cd: fix oops when using growisofs
    gayle: release resources on ide_host_add() failure
    palm_bk3710: add UltraDMA/100 support
    ide: trivial sparse annotations
    ide: ide-tape.c sparse annotations and unaligned access removal
    ide: drop 'name' parameter from ->init_chipset method
    ide: prefix messages from IDE PCI host drivers by driver name
    it821x: remove DECLARE_ITE_DEV() macro
    it8213: remove DECLARE_ITE_DEV() macro
    ide: include PCI device name in messages from IDE PCI host drivers
    ide: remove for some archs
    ide-generic: remove ide_default_{io_base,irq}() inlines (take 3)
    ide-generic: is no longer needed on ppc32
    ...

    Linus Torvalds
     
  • * Remove include from ( includes
    which is enough).

    * Remove for alpha/blackfin/h8300/ia64/m32r/sh/x86/xtensa
    (this leaves us with arm/frv/m68k/mips/mn10300/parisc/powerpc/sparc[64]).

    There should be no functional changes caused by this patch.

    Signed-off-by: Bartlomiej Zolnierkiewicz

    Bartlomiej Zolnierkiewicz
     
  • * Now that ide_hwif_t instances are allocated dynamically
    the difference between MAX_HWIFS == 2 and MAX_HWIFS == 10
    is ~100 bytes (x86-32) so use MAX_HWIFS == 10 on all archs
    except these ones that use MAX_HWIFS == 1.

    * Define MAX_HWIFS in instead of .

    [ Please note that avr32/cris/v850 have no
    and alpha/ia64/sh always define CONFIG_IDE_MAX_HWIFS. ]

    Signed-off-by: Bartlomiej Zolnierkiewicz

    Bartlomiej Zolnierkiewicz
     
  • * Add missing include.

    While at it:

    * Remove needless ide_default_{irq,io_base}() inlines.

    Cc: Chris Zankel
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Bartlomiej Zolnierkiewicz
     
  • * 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
    Remove __DECLARE_SEMAPHORE_GENERIC
    Remove asm/semaphore.h
    Remove use of asm/semaphore.h
    Add missing semaphore.h includes
    Remove mention of semaphores from kernel-locking

    Linus Torvalds
     
  • On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
    boundary. For example:

    u64 val = PAGE_ALIGN(size);

    always returns a value < 4GB even if size is greater than 4GB.

    The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
    example):

    #define PAGE_SHIFT 12
    #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
    #define PAGE_MASK (~(PAGE_SIZE-1))
    ...
    #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)

    The "~" is performed on a 32-bit value, so everything in "and" with
    PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
    Using the ALIGN() macro seems to be the right way, because it uses
    typeof(addr) for the mask.

    Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
    include/linux/mm.h.

    See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

    [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
    [akpm@linux-foundation.org: fix v850]
    [akpm@linux-foundation.org: fix powerpc]
    [akpm@linux-foundation.org: fix arm]
    [akpm@linux-foundation.org: fix mips]
    [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
    [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
    [akpm@linux-foundation.org: fix powerpc]
    Signed-off-by: Andrea Righi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
     

24 Jul, 2008

1 commit


15 May, 2008

1 commit

  • I noticed this because alpha was broken due to the recent commit commit
    bdc807871d58285737d50dc6163d0feb72cb0dc2 ("avoid overflows in
    kernel/time.c"). Most arches do something like this in their
    asm/param.h:

    #ifdef __KERNEL__
    # define HZ CONFIG_HZ
    #else
    # define HZ 100
    #endif

    A few arches though (namely alpha/h8300/um/v850/xtensa) either do no set
    HZ at all for !__KERNEL__, or they set it wrongly. This should bring all
    arches in line by setting up HZ for userspace.

    Without this currently perl 5.10 doesn't build on alpha:

    perl.c: In function 'perl_construct':
    perl.c:388: error: 'CONFIG_HZ' undeclared (first use in this function)
    -> http://buildd.debian.org/fetch.cgi?pkg=perl;ver=5.10.0-10;arch=alpha;stamp=1210252894

    Signed-off-by: Mike Frysinger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Yoshinori Sato
    Cc: Jeff Dike
    Cc: Chris Zankel
    Cc: maximilian attems
    Signed-off-by: Andrew Morton
    [ HZ on alpha is 1024 for historical reasons. - Linus ]
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

03 May, 2008

1 commit


29 Apr, 2008

1 commit

  • Unaligned access is ok for the following arches:
    cris, m68k, mn10300, powerpc, s390, x86

    Arches that use the memmove implementation for native endian, and
    the byteshifting for the opposite endianness.
    h8300, m32r, xtensa

    Packed struct for native endian, byteshifting for other endian:
    alpha, blackfin, ia64, parisc, sparc, sparc64, mips, sh

    m86knommu is generic_be for Coldfire, otherwise unaligned access is ok.

    frv, arm chooses endianness based on compiler settings, uses the byteshifting
    versions. Remove the unaligned trap handler from frv as it is now unused.

    v850 is le, uses the byteshifting versions for both be and le.

    Remove the now unused asm-generic implementation.

    Signed-off-by: Harvey Harrison
    Acked-by: David S. Miller
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     

28 Apr, 2008

1 commit

  • s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
    model (which is more dynamic than most). Instead, they had proposed to
    implement it with an additional path through vm_normal_page(), using a bit in
    the pte to determine whether or not the page should be refcounted:

    vm_normal_page()
    {
    ...
    if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
    if (vma->vm_flags & VM_MIXEDMAP) {
    #ifdef s390
    if (!mixedmap_refcount_pte(pte))
    return NULL;
    #else
    if (!pfn_valid(pfn))
    return NULL;
    #endif
    goto out;
    }
    ...
    }

    This is fine, however if we are allowed to use a bit in the pte to determine
    refcountedness, we can use that to _completely_ replace all the vma based
    schemes. So instead of adding more cases to the already complex vma-based
    scheme, we can have a clearly seperate and simple pte-based scheme (and get
    slightly better code generation in the process):

    vm_normal_page()
    {
    #ifdef s390
    if (!mixedmap_refcount_pte(pte))
    return NULL;
    return pte_page(pte);
    #else
    ...
    #endif
    }

    And finally, we may rather make this concept usable by any architecture rather
    than making it s390 only, so implement a new type of pte state for this.
    Unfortunately the old vma based code must stay, because some architectures may
    not be able to spare pte bits. This makes vm_normal_page a little bit more
    ugly than we would like, but the 2 cases are clearly seperate.

    So introduce a pte_special pte state, and use it in mm/memory.c. It is
    currently a noop for all architectures, so this doesn't actually result in any
    compiled code changes to mm/memory.o.

    BTW:
    I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
    The reason is that, regardless of where vm_normal_page is actually
    implemented, the *abstraction* is still exactly the same. Also, while it
    depends on whether the architecture has pte_special or not, that is the
    only two possible cases, and it really isn't an arch specific function --
    the role of the arch code should be to provide primitive functions and
    accessors with which to build the core code; pte_special does that. We do
    not want architectures to know or care about vm_normal_page itself, and
    we definitely don't want them being able to invent something new there
    out of sight of mm/ code. If we made vm_normal_page an arch function, then
    we have to make vm_insert_mixed (next patch) an arch function too. So I
    don't think moving it to arch code fundamentally improves any abstractions,
    while it does practically make the code more difficult to follow, for both
    mm and arch developers, and easier to misuse.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Acked-by: Carsten Otte
    Cc: Jared Hulbert
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Apr, 2008

1 commit

  • Semaphores are no longer performance-critical, so a generic C
    implementation is better for maintainability, debuggability and
    extensibility. Thanks to Peter Zijlstra for fixing the lockdep
    warning. Thanks to Harvey Harrison for pointing out that the
    unlikely() was unnecessary.

    Signed-off-by: Matthew Wilcox
    Acked-by: Ingo Molnar

    Matthew Wilcox
     

03 Apr, 2008

1 commit

  • Currently include/linux/kvm.h is not considered by make headers_install,
    because Kbuild cannot handle " unifdef-$(CONFIG_FOO) += foo.h. This problem
    was introduced by

    commit fb56dbb31c4738a3918db81fd24da732ce3b4ae6
    Author: Avi Kivity
    Date: Sun Dec 2 10:50:06 2007 +0200

    KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM

    Currently, make headers_check barfs due to , which
    includes, not existing. Rather than add a zillion s, export kvm.
    only if the arch actually supports it.

    Signed-off-by: Avi Kivity

    which makes this an 2.6.25 regression.

    One way of solving the issue is to enhance Kbuild, but Avi and David conviced
    me, that changing headers_install is not the way to go. This patch changes
    the definition for linux/kvm.h to unifdef-y.

    If  unifdef-y is used for linux/kvm.h "make headers_check" will fail on all
    architectures without asm/kvm.h. Therefore, this patch also provides
    asm/kvm.h on all architectures.

    Signed-off-by: Christian Borntraeger
    Acked-by: Avi Kivity
    Cc: Sam Ravnborg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Borntraeger
     

14 Feb, 2008

12 commits


09 Feb, 2008

5 commits

  • In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/arch/xtensa/kernel/syscall.c:39:
    include2/asm/unistd.h:681: error: 'sys_timerfd' undeclared here (not in a function)

    Signed-off-by: Adrian Bunk
    Cc: Christian Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Background: I've implemented 1K/2K page tables for s390. These sub-page
    page tables are required to properly support the s390 virtualization
    instruction with KVM. The SIE instruction requires that the page tables
    have 256 page table entries (pte) followed by 256 page status table entries
    (pgste). The pgstes are only required if the process is using the SIE
    instruction. The pgstes are updated by the hardware and by the hypervisor
    for a number of reasons, one of them is dirty and reference bit tracking.
    To avoid wasting memory the standard pte table allocation should return
    1K/2K (31/64 bit) and 2K/4K if the process is using SIE.

    Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
    the s390 version for pte_alloc_one cannot return a pointer to a struct
    page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
    cannot return a pointer to a pte either, since that would require more than
    32 bit for the return value of pte_alloc_one (and the pte * would not be
    accessible since its not kmapped).

    Solution: The only solution I found to this dilemma is a new typedef: a
    pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
    later patch. For everybody else it will be a (struct page *). The
    additional problem with the initialization of the ptl lock and the
    NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
    a destructor pgtable_page_dtor. The page table allocation and free
    functions need to call these two whenever a page table page is allocated or
    freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
    To get the pgtable_t back from a pmd entry that has been installed with
    pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
    call in free_pte_range and apply_to_pte_range.

    Signed-off-by: Martin Schwidefsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     
  • When the conversion factor between jiffies and milli- or microseconds is
    not a single multiply or divide, as for the case of HZ == 300, we currently
    do a multiply followed by a divide. The intervening result, however, is
    subject to overflows, especially since the fraction is not simplified (for
    HZ == 300, we multiply by 300 and divide by 1000).

    This is exposed to the user when passing a large timeout to poll(), for
    example.

    This patch replaces the multiply-divide with a reciprocal multiplication on
    32-bit platforms. When the input is an unsigned long, there is no portable
    way to do this on 64-bit platforms there is no portable way to do this
    since it requires a 128-bit intermediate result (which gcc does support on
    64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
    since the output is a 32-bit integer in the cases affected, just simplify
    the multiply-divide (*3/10 instead of *300/1000).

    The reciprocal multiply used can have off-by-one errors in the upper half
    of the valid output range. This could be avoided at the expense of having
    to deal with a potential 65-bit intermediate result. Since the intent is
    to avoid overflow problems and most of the other time conversions are only
    semiexact, the off-by-one errors were considered an acceptable tradeoff.

    At Ralf Baechle's suggestion, this version uses a Perl script to compute
    the necessary constants. We already have dependencies on Perl for kernel
    compiles. This does, however, require the Perl module Math::BigInt, which
    is included in the standard Perl distribution starting with version 5.8.0.
    In order to support older versions of Perl, include a table of canned
    constants in the script itself, and structure the script so that
    Math::BigInt isn't required if pulling values from said table.

    Running the script requires that the HZ value is available from the
    Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
    architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
    m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
    sh64 architectures, since Paul Mundt has dealt with those separately in the
    sh tree.

    Signed-off-by: H. Peter Anvin
    Cc: Ralf Baechle ,
    Cc: Sam Ravnborg ,
    Cc: Paul Mundt ,
    Cc: Richard Henderson ,
    Cc: Michael Starvik ,
    Cc: David Howells ,
    Cc: Yoshinori Sato ,
    Cc: Hirokazu Takata ,
    Cc: Geert Uytterhoeven ,
    Cc: Roman Zippel ,
    Cc: William L. Irwin ,
    Cc: Chris Zankel ,
    Cc: H. Peter Anvin ,
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     
  • Some arches (like alpha and ia64) already have a clean posix_types.h header.
    This brings all the others in line by removing all references to __GLIBC__
    (and some undocumented __USE_ALL).

    Signed-off-by: Mike Frysinger
    Acked-by: Ingo Molnar
    Cc: Ulrich Drepper
    Cc: Roland McGrath
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • Move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're
    required whether or not A.OUT format is available.

    Signed-off-by: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

08 Feb, 2008

2 commits


06 Feb, 2008

1 commit

  • (with Martin Schwidefsky )

    The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
    first argument. The free functions do not get the mm_struct argument. This
    is 1) asymmetrical and 2) to do mm related page table allocations the mm
    argument is needed on the free function as well.

    [kamalesh@linux.vnet.ibm.com: i386 fix]
    [akpm@linux-foundation.org: coding-syle fixes]
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Martin Schwidefsky
    Cc:
    Signed-off-by: Kamalesh Babulal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

01 Feb, 2008

1 commit

  • A userspace program may wish to set the mark for each packets its send
    without using the netfilter MARK target. Changing the mark can be used
    for mark based routing without netfilter or for packet filtering.

    It requires CAP_NET_ADMIN capability.

    Signed-off-by: Laszlo Attila Toth
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Laszlo Attila Toth