13 Dec, 2007

1 commit

  • * ide_xfer_verbose() fixups:
    - beautify returned mode names
    - fix PIO5 reporting
    - make it return 'const char *'

    * Change printk() level from KERN_DEBUG to KERN_INFO in ide_find_dma_mode().

    * Add ide_id_dma_bug() helper based on ide_dma_verbose() to check for invalid
    DMA info in identify block.

    * Use ide_id_dma_bug() in ide_tune_dma() and ide_driveid_update().

    As a result, DMA won't be tuned (or will be disabled after tuning) if the
    device reports inconsistent information about the enabled DMA mode
    (ide_dma_verbose() performs the same checks while the IDE device is probed
    by the ide-{cd,disk} device drivers).

    * Remove no longer needed ide_dma_verbose().

    This patch should fix the following problem with out-of-sync IDE messages
    reported by Nick Warne:

    hdd: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cachehdd:
    skipping word 93 validity check
    , UDMA(66)

    and later debugged by Mark Lord to be caused by:

    ide_dma_verbose()
    printk( ... "2048kB Cache");
    eighty_ninty_three()
    printk(KERN_DEBUG "%s: skipping word 93 validity check\n");
    ide_dma_verbose()
    printk(", UDMA(66)");

    Please note that, as a result, the ide-{cd,disk} device drivers won't report
    the DMA speed used. This is intentional, since the DMA mode in use is now
    always reported by the IDE core code.

    v2:
    * fixes suggested by Randy:
    - use KERN_CONT for printk()-s in ide-{cd,disk}.c
    - don't remove argument name from ide_xfer_verbose() declaration

    v3:
    * Remove incorrect check for (id->field_valid & 1) from ide_id_dma_bug()
    (spotted by Sergei).

    * "XFER SLOW" -> "PIO SLOW" in ide_xfer_verbose() (suggested by Sergei).

    * Fix ide_find_dma_mode() to report the correct mode ('mode' after being
    limited by 'req_mode').

    Cc: Sergei Shtylyov
    Cc: Nick Warne
    Cc: Mark Lord
    Cc: Randy Dunlap
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Bartlomiej Zolnierkiewicz
     

12 Dec, 2007

1 commit

  • - Add comments to functions that require the caller to hold q->lock
    - Add __videobuf_mmap_free, which doesn't take q->lock, for use within videobuf
    - Add locking to videobuf_mmap_free
    - Fix linux/drivers/media/common/saa7146_video.c, which was holding the lock
    around videobuf_read_stop
    - Add locking to functions that operate on a queue
    - Add videobuf_stop to take care of stopping in both the read and stream case

    TODO: bttv still has an unsafe call to videobuf_queue_is_busy

    Signed-off-by: Brandon Philips
    Signed-off-by: Mauro Carvalho Chehab

    Brandon Philips
     

11 Dec, 2007

3 commits

  • * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
    [MIPS] Malta: Enable tickless and highres timers.
    [MIPS] Bigsur: Enable tickless and highres timers.
    qemu: do not enable IP7 blindly
    [MIPS] Alchemy: Fix Au1x SD controller IRQ
    [MIPS] Don't byteswap writes to display when running bigendian

    Linus Torvalds
     
  • The esp_reset_cleanup() function is called with the host lock held and
    invokes starget_for_each_device() which wants to take it too. Here is a
    fix along the lines of shost_for_each_device()/__shost_for_each_device()
    adding a __starget_for_each_device() counterpart which assumes the lock
    has already been taken.

    Eventually, I think the driver should get modified so that more work is
    done as a softirq rather than in the interrupt context, but for now it
    fixes a bug that causes the spinlock debugger to fire.

    While at it, it fixes a small number of cosmetic problems with
    starget_for_each_device() too.

    Signed-off-by: Maciej W. Rozycki
    Acked-by: David S. Miller
    Cc: James Bottomley
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maciej W. Rozycki
     
  • Some places where CLOCK_TICK_RATE may be used incorrectly:

    arch/arm/mach-mx3/time.c:125: __raw_writel((v / CLOCK_TICK_RATE) - 1, MXC_GPT_GPTPR);
    drivers/watchdog/davinci_wdt.c:103: timer_margin = (((u64)heartbeat * CLOCK_TICK_RATE) & 0xffffffff);
    drivers/watchdog/davinci_wdt.c:105: timer_margin = (((u64)heartbeat * CLOCK_TICK_RATE) >> 32);
    drivers/watchdog/ks8695_wdt.c:64: unsigned long tval = wdt_time * CLOCK_TICK_RATE;

    I'm not sure whether this definition is used there, but adding parentheses
    should be good anyway.

    Signed-off-by: Roel Kluin
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     

09 Dec, 2007

1 commit


08 Dec, 2007

5 commits

  • Make some IOSAPIC functions static and remove one that is unused.

    Signed-off-by: Simon Horman
    Signed-off-by: Tony Luck

    Simon Horman
     
  • Add new hash for balance-xor and 802.3ad modes. Originally
    submitted by "Glenn Griffin"; modified by
    Jay Vosburgh to move setting of the hash policy out of line, tweak the
    documentation update and bump the version to 3.2.2.

    Glenn's original comment follows:

    Included is a patch for a new xmit_hash_policy for the bonding driver
    that selects slaves based on MAC and IP information. This is a middle
    ground between what currently exists in the layer2 only policy and the
    layer3+4 policy. This policy strives to be fully 802.3ad compliant by
    transmitting every packet of any particular flow over the same link.
    As documented, the layer3+4 policy is not fully compliant in extreme
    cases such as IP fragmentation, so this policy is a nice compromise
    for environments that require full compliance but want more than the
    layer2-only policy.

    Signed-off-by: "Glenn Griffin"
    Signed-off-by: Jay Vosburgh
    Signed-off-by: Jeff Garzik

    Jay Vosburgh
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
    [AVR32] Fix wrong pt_regs in critical exception handler
    [AVR32] Fix copy_to_user_page() breakage
    [AVR32] Follow the rules when dealing with the OCD system
    [AVR32] Clean up OCD register usage
    [AVR32] Implement irqflags trace and lockdep support
    [AVR32] Implement stacktrace support
    [AVR32] Kconfig: Use def_bool instead of bool + default
    [AVR32] Fix invalid status register bit definitions in asm/ptrace.h
    [AVR32] Add TIF_RESTORE_SIGMASK to the work masks

    Linus Torvalds
     
  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [AF_RXRPC]: Add a missing goto
    [VLAN]: Lost rtnl_unlock() in vlan_ioctl()
    [SCTP]: Fix the bind_addr info during migration.
    [SCTP]: Add bind hash locking to the migrate code
    [IPV4]: Remove prototype of ip_rt_advice
    [IPv4]: Reply net unreachable ICMP message
    [IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network
    [BRIDGE]: Section fix.
    [NIU]: Fix link LED handling.

    Linus Torvalds
     
  • * 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
    leds: Fix led trigger locking bugs

    Linus Torvalds
     

07 Dec, 2007

15 commits

  • The current implementation of copy_to_user_page() gives "vaddr" to the
    cache instruction when trying to sync the icache with the dcache. If
    vaddr does not exist in the TLB, the CPU will silently abort the
    operation, which may result in the caches staying out of sync.

    To fix this, pass the "dst" parameter to flush_icache_range() instead
    -- we know this is valid because we just wrote to it.

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • The current debug trap handling code does a number of things that are
    illegal according to the AVR32 Architecture manual. Most importantly,
    it may try to schedule from Debug Mode, thus clearing the D bit, which
    can lead to "undefined behaviour".

    It seems like this works in most cases, but several people have
    observed somewhat unstable behaviour when debugging programs,
    including soft lockups. So there's definitely something which is not
    right with the existing code.

    The new code will never schedule from Debug mode, it will always exit
    Debug mode with a "retd" instruction, and if something not running in
    Debug mode needs to do something debug-related (like doing a single
    step), it will enter debug mode through a "breakpoint" instruction.
    The monitor code will then return directly to user space, bypassing
    its own saved registers if necessary (since we don't actually care
    about the trapped context, only the one that came before).

    This adds three instructions to the common exception handling code,
    including one branch. It does not touch super-hot paths like the TLB
    miss handler.

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • Generate a new set of OCD register definitions in asm/ocd.h and rename
    __mfdr() and __mtdr() to ocd_read() and ocd_write() respectively.

    The bitfield definitions are a lot more complete now, and they are
    entirely based on bit numbers, not masks. This is because OCD
    registers are frequently accessed from assembly code, where bit
    numbers are a lot more useful (can be fed directly to sbr, bfins,
    etc.)

    Bitfields that consist of more than one bit have two definitions:
    _START, which indicates the number of the first bit, and _SIZE, which
    indicates the number of bits. These directly correspond to the
    parameters taken by the bfextu, bfexts and bfins instructions.

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • The 'H' bit is bit 29, while the 'R' bit doesn't exist. Luckily, we
    don't actually use any of the bits in question.

    Also update show_regs() to show the Debug Mask and Debug state bits.

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • We really need to check TIF_RESTORE_SIGMASK before returning to
    userspace. The existing code does not necessarily do this.

    Define the work masks as a bitwise OR of the respective flags instead
    of a hardcoded hex value to make it easier to spot errors like this in
    the future.

    Signed-off-by: Haavard Skinnemoen

    Haavard Skinnemoen
     
  • During accept/migrate the code attempts to copy the addresses from
    the parent endpoint to the new endpoint. However, if the parent
    was bound to a wildcard address, then we end up pointlessly copying
    all of the current addresses on the system.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • ip_rt_advice() is gone, so there is no need to keep its prototype and debug message.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Convert part of the led trigger core from rw spinlocks to rw
    semaphores. We're calling functions which can sleep from invalid
    contexts otherwise. Fixes bug #9264.

    Signed-off-by: Richard Purdie

    Richard Purdie
     
  • * 'merge' of master.kernel.org:/pub/scm/linux/kernel/git/paulus/powerpc:
    [POWERPC] virtex bug fix: Use canonical value for AC97 interrupt xparams
    [POWERPC] Update defconfigs
    [POWERPC] PS3: Update ps3_defconfig
    [POWERPC] Update iseries_defconfig
    [POWERPC] Fix hardware IRQ time accounting problem.

    Linus Torvalds
     
  • * 'for-2.6.24' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc:
    [POWERPC] Fix swapper_pg_dir size when CONFIG_PTE_64BIT=y on FSL_BOOKE

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
    [PARISC] lba_pci: pci_claim_resources disabled expansion roms
    [PARISC] print more than one character at a time for pdc console
    [PARISC] Update parisc-linux MAINTAINERS entries
    [PARISC] timer interrupt should not be IRQ_DISABLED
    Revert "[PARISC] import necessary bits of libgcc.a"

    Linus Torvalds
     
  • The size of swapper_pg_dir is 8k instead of 4k when using 64-bit PTEs
    (CONFIG_PTE_64BIT).

    This was reported by Cedric Hombourger

    Signed-off-by: Kumar Gala

    Kumar Gala
     
  • There's really no reason not to print more than one character at a
    time to the PDC console... Booting is measurably speedier, and now I don't
    have to watch individual characters get drawn.

    Signed-off-by: Kyle McMartin

    Kyle McMartin
     
  • Do what commits f3e8d1da389fe2e514e31f6e93c690c8e1243849 and
    9d360ab4a7568a8d177280f651a8a772ae52b9b9 failed to achieve: actually
    convert the Alchemy code to irq_cpu.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: Ralf Baechle

    Sergei Shtylyov
     

06 Dec, 2007

10 commits

  • The commit fa13a5a1f25f671d084d8884be96fc48d9b68275 (sched: restore
    deterministic CPU accounting on powerpc) unconditionally calls
    update_process_tick() in system context. In the deterministic
    accounting case this is the correct thing to do. However, in the
    non-deterministic accounting case we must not do this, since it
    artificially inflates the time accounted as hardware IRQ time.

    Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
    checks in time.h into one for neatness.

    Signed-off-by: Tony Breeds
    Signed-off-by: Paul Mackerras

    Tony Breeds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    futex: correctly return -EFAULT not -EINVAL
    lockdep: in_range() fix
    lockdep: fix debug_show_all_locks()
    sched: style cleanups
    futex: fix for futex_wait signal stack corruption

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
    VM/Security: add security hook to do_brk
    Security: round mmap hint address above mmap_min_addr
    security: protect from stack expansion into low vm addresses
    Security: allow capable check to permit mmap or low vm space
    SELinux: detect dead booleans
    SELinux: do not clear f_op when removing entries

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    [LRO]: fix lro_gen_skb() alignment
    [TCP]: NAGLE_PUSH seems to be a wrong way around
    [TCP]: Move prior_in_flight collect to more robust place
    [TCP] FRTO: Use of existing funcs make code more obvious & robust
    [IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
    [ROSE]: Trivial compilation CONFIG_INET=n case
    [IPVS]: Fix sched registration race when checking for name collision.
    [IPVS]: Don't leak sysctl tables if the scheduler registration fails.

    Linus Torvalds
     
  • Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
    Switch to usual scheme:
    * PDE is created with refcount 1
    * every de_get does +1
    * every de_put() and remove_proc_entry() do -1
    * once refcount reaches 0, PDE is freed.

    This elegantly fixes at least the two following races (both observed)
    without introducing new locks, abusing old locks, or spreading
    lock_kernel():

    1) PDE leak

    remove_proc_entry                        de_put
    -----------------                        ------
                                             [refcnt = 1]
    if (atomic_read(&de->count) == 0)
                                             if (atomic_dec_and_test(&de->count))
                                                 if (de->deleted)
                                                     /* also not taken! */
                                                     free_proc_entry(de);
    else
        de->deleted = 1;
                                             [refcount=0, deleted=1]

    2) use after free

    remove_proc_entry                        de_put
    -----------------                        ------
                                             [refcnt = 1]

                                             if (atomic_dec_and_test(&de->count))
    if (atomic_read(&de->count) == 0)
        free_proc_entry(de);
                                                 /* boom! */
                                                 if (de->deleted)
                                                     free_proc_entry(de);

    BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
    printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
    Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
    EIP: 0060:[] EFLAGS: 00210097 CPU: 1
    EIP is at strnlen+0x6/0x18
    EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
    ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
    Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
    c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
    f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
    Call Trace:
    [] vsnprintf+0x2ad/0x49b
    [] vscnprintf+0x14/0x1f
    [] vprintk+0xc5/0x2f9
    [] handle_fasteoi_irq+0x0/0xab
    [] do_IRQ+0x9f/0xb7
    [] preempt_schedule_irq+0x3f/0x5b
    [] need_resched+0x1f/0x21
    [] printk+0x1b/0x1f
    [] de_put+0x3d/0x50
    [] proc_delete_inode+0x38/0x41
    [] proc_delete_inode+0x0/0x41
    [] generic_delete_inode+0x5e/0xc6
    [] iput+0x60/0x62
    [] d_kill+0x2d/0x46
    [] dput+0xdc/0xe4
    [] __fput+0xb0/0xcd
    [] filp_close+0x48/0x4f
    [] sys_close+0x67/0xa5
    [] sysenter_past_esp+0x5f/0x85
    =======================
    Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
    EIP: [] strnlen+0x6/0x18 SS:ESP 0068:f380be44

    Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
    module is already pinned and remove_proc_entry() can't happen => nobody
    can mark PDE deleted.

    The dummy proc root in the netns code is not given refcount 1. AFAICS, we
    never take a reference to it; it exists only for proper /proc/net removal.
    I double-checked that CLONE_NETNS continues to work.

    Patch survives many hours of modprobe/rmmod/cat loops without new bugs
    which can be attributed to refcounting.

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Before we start committing a transaction, we call
    __journal_clean_checkpoint_list() to clean up the transaction's
    written-back buffers.

    If this call happens to remove all of them (and there were already some
    buffers), __journal_remove_checkpoint() will decide to free the transaction
    because it isn't (yet) a committing transaction and soon we fail some
    assertion - the transaction really isn't ready to be freed :).

    We change the check in __journal_remove_checkpoint() to free only a
    transaction in T_FINISHED state. The locking there is subtle though (as
    everywhere in JBD ;(). We use j_list_lock to protect the check and a
    subsequent call to __journal_drop_transaction() and do the same at the end
    of journal_commit_transaction(), which is the only place where a transaction
    can get to T_FINISHED state.

    Probably I'm too paranoid here and such locking is not really necessary:
    checkpoint lists are processed only from log_do_checkpoint(), where a
    transaction must already be committed to be processed, or from
    __journal_clean_checkpoint_list(), which kjournald itself calls, so the
    transaction cannot change state there either. Better to be safe in case
    something changes in the future...

    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Remove some bloated code and try to get these pin_req arrays built at compile time

    - move these static definitions to the Blackfin board file
    - add pin_req array to struct bfin5xx_spi_master
    - tested on BF537/BF548 with SPI flash

    Signed-off-by: Bryan Wu
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Wu
     
  • Move cs_chg_udelay handling (specific to this driver) to cs_deactive(), fixing
    a bug when an SPI LCD driver needs a delay after cs_deactive.

    Fix bug reported by Cameron Barfield
    https://blackfin.uclinux.org/gf/project/uclinux-dist/forum/?action=ForumBrowse&forum_id=39&_forum_action=ForumMessageBrowse&thread_id=23630&feedback=Message%20replied.

    Cc: Cameron Barfield
    Signed-off-by: Bryan Wu
    Signed-off-by: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Wu
     
  • Use new Blackfin portmux interface, add error handling.

    Signed-off-by: Michael Hennerich
    Signed-off-by: Bryan Wu
    Signed-off-by: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Hennerich
     
  • Initial BF54x SPI support

    - support BF54x SPI0
    - clean up some code (whitespace etc)
    - will support multiports in the future
    - start using portmux calls

    Signed-off-by: Bryan Wu
    Signed-off-by: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Wu
     

05 Dec, 2007

4 commits

  • David Holmes found a bug in the -rt tree with respect to
    pthread_cond_timedwait. After trying his test program on the latest git
    from mainline, I found the bug was there too. The bug his test program
    showed was that if one were to do a "Ctrl-Z" on a process that was in
    pthread_cond_timedwait, and then do a "bg" on that process, it would
    return with "-ETIMEDOUT", but early. That is, the timer would go off
    early.

    Looking into this, I found the source of the problem. And it is a rather
    nasty bug at that.

    Here's the relevant code from kernel/futex.c (not in file order):

    [...]
    asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                              struct timespec __user *utime, u32 __user *uaddr2,
                              u32 val3)
    {
        struct timespec ts;
        ktime_t t, *tp = NULL;
        u32 val2 = 0;
        int cmd = op & FUTEX_CMD_MASK;

        if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
            if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                return -EFAULT;
            if (!timespec_valid(&ts))
                return -EINVAL;

            t = timespec_to_ktime(ts);
            if (cmd == FUTEX_WAIT)
                t = ktime_add(ktime_get(), t);
            tp = &t;
        }
        [...]
        return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
    }

    [...]

    long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                  u32 __user *uaddr2, u32 val2, u32 val3)
    {
        int ret;
        int cmd = op & FUTEX_CMD_MASK;
        struct rw_semaphore *fshared = NULL;

        if (!(op & FUTEX_PRIVATE_FLAG))
            fshared = &current->mm->mmap_sem;

        switch (cmd) {
        case FUTEX_WAIT:
            ret = futex_wait(uaddr, fshared, val, timeout);

    [...]

    static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                          u32 val, ktime_t *abs_time)
    {
        [...]
        struct restart_block *restart;
        restart = &current_thread_info()->restart_block;
        restart->fn = futex_wait_restart;
        restart->arg0 = (unsigned long)uaddr;
        restart->arg1 = (unsigned long)val;
        restart->arg2 = (unsigned long)abs_time;
        restart->arg3 = 0;
        if (fshared)
            restart->arg3 |= ARG3_SHARED;
        return -ERESTART_RESTARTBLOCK;
        [...]

    static long futex_wait_restart(struct restart_block *restart)
    {
        u32 __user *uaddr = (u32 __user *)restart->arg0;
        u32 val = (u32)restart->arg1;
        ktime_t *abs_time = (ktime_t *)restart->arg2;
        struct rw_semaphore *fshared = NULL;

        restart->fn = do_no_restart_syscall;
        if (restart->arg3 & ARG3_SHARED)
            fshared = &current->mm->mmap_sem;
        return (long)futex_wait(uaddr, fshared, val, abs_time);
    }

    So when futex_wait is interrupted by a signal, we break out of the
    hrtimer code and set up or return from the signal. This code does not
    return back to userspace, so we set up a RESTARTBLOCK. The bug here is
    that we save "abs_time", which is a pointer to the stack variable
    "ktime_t t" from sys_futex.

    This returns and unwinds the stack before we get to call our signal
    handler. On return from the signal we go to futex_wait_restart, where we
    update all the parameters for futex_wait and call it. But here we have a
    problem: abs_time is no longer valid.

    I verified this with print statements, and sure enough, what abs_time
    was set to ends up being garbage when we get to futex_wait_restart.

    My solution (with input from Linus Torvalds) was to add unions to the
    restart_block to allow system calls to use the restart with specific
    parameters. This way the futex code now saves the time in a 64-bit value
    in the restart block instead of keeping it on the stack.

    Note: I'm a bit nervous about adding "linux/types.h" and using u32 and
    u64 in thread_info.h, when there's an #ifdef __KERNEL__ just below that.
    Not sure what that is there for. If this turns out to be a problem, I've
    tested this using "unsigned int" for u32 and "unsigned long long" for
    u64 and it worked just the same. I'm using u32 and u64 just to be
    consistent with what the futex code uses.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Acked-by: Linus Torvalds

    Steven Rostedt
     
  • Add a field to the lro_mgr struct so that drivers can specify how much
    padding is required to align layer 3 headers when a packet is copied
    into a freshly allocated skb by inet_lro.c:lro_gen_skb(). Without
    padding, skbs generated by LRO will cause alignment warnings on
    architectures which require strict alignment (seen on sparc64).

    Myri10GE is updated to use this field.

    Signed-off-by: Andrew Gallatin
    Signed-off-by: David S. Miller

    Andrew Gallatin
     
  • If mmap_min_addr is set and a process attempts to mmap (not fixed) with a
    non-null hint address less than mmap_min_addr, the mapping will fail the
    security checks. Since this is just a hint address, this patch will round
    such a hint address above mmap_min_addr.

    gcj was found to try to be very frugal with vm usage and give hint addresses
    in the 8k-32k range. Without this patch all such programs failed and with
    the patch they happily get a higher address.

    This patch is wrapped in CONFIG_SECURITY since mmap_min_addr doesn't exist
    without it and no security check would be possible anyway. So we should
    not bother compiling in this rounding when it would just be a waste of
    time.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     
  • Lately I've got this nice badness on mdio bus removal:

    Device 'e0103120:06' does not have a release() function, it is broken and must be fixed.
    ------------[ cut here ]------------
    Badness at drivers/base/core.c:107
    NIP: c015c1a8 LR: c015c1a8 CTR: c0157488
    REGS: c34bdcf0 TRAP: 0700 Not tainted (2.6.23-rc5-g9ebadfbb-dirty)
    MSR: 00029032 CR: 24088422 XER: 00000000
    ...
    [c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable)
    [c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc
    [c34bddd0] [c01365f0] kref_put+0x54/0x6c
    [c34bdde0] [c013543c] kobject_put+0x24/0x34
    [c34bddf0] [c015c384] put_device+0x1c/0x2c
    [c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58
    ...

    Nothing is actually broken, though; the device subsystem core just
    expects a different "pattern" of resource management.

    This patch implements the PHY device's release function, which gets
    rid of this badness.

    It also fixes a small hidden bug; hopefully no others were introduced. ;-)

    Signed-off-by: Anton Vorontsov
    Acked-by: Andy Fleming
    Signed-off-by: Jeff Garzik

    Anton Vorontsov