31 Mar, 2009

11 commits

  • This makes the includes more explicit, and is preparation for moving
    md_k.h to drivers/md/md.h

    Remove include/raid/md.h as its only remaining use was to #include
    other files.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • The extern function definitions are kernel-internal definitions, so
    they belong in md_k.h

    The MD_*_VERSION values could reasonably go in a number of places,
    but md_u.h seems most reasonable.

    This leaves almost nothing in md.h. It will go soon.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • .. as they are part of the user-space interface.
    Also move MdpMinorShift into there so we can remove duplication.

    Lastly move mdp_major in. It is less obviously part of the user-space
    interface, but do_mounts_md.c uses it, and it is acting a bit like
    user-space.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Move the headers with the local structures for the disciplines and
    bitmap.h into drivers/md/ so that they are more easily grepable for
    hacking and not far away. md.h is left where it is for now as there
    are some uses from the outside.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: NeilBrown

    Christoph Hellwig
     
  • Use the -y variables instead of the old -objs so we can easily add
    conditional objects to the modules. Also always use += to add
    subobjects to avoid problems when placing additional objects in
    some place in the file.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: NeilBrown

    Christoph Hellwig
     
  • MAJOR_NR was only required for magic in linux/blk.h in 2.4 or earlier
    kernels, so no need to keep it around.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: NeilBrown

    Christoph Hellwig
     
  • md: Add support for data integrity to MD

    If all subdevices support the same protection format the MD device is
    flagged as integrity capable.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: NeilBrown

    Martin K. Petersen
     
  • When we add some spares to an array and start recovery, and we have
    a bitmap which is stored 'internally' on all devices, we call
    bitmap_write_all to make sure the bitmap is correct on the new
    device(s).
    However that doesn't work as write_sb_page only writes to
    'In_sync' devices, and devices undergoing recovery are not
    'In_sync' until recovery finishes.

    So extend write_sb_page (actually next_active_rdev) to include devices
    that are under recovery.

    Signed-off-by: NeilBrown

    NeilBrown
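
The change described above can be pictured as widening the test that selects which devices receive bitmap pages. The sketch below is a toy model, not the kernel code: `struct toy_rdev`, `old_target`, `new_target` and `count_targets` are hypothetical names, and using "has a slot in the array" (`raid_disk >= 0`) as the marker for a device under recovery is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a member device. */
struct toy_rdev {
    int raid_disk;   /* slot in the array, -1 if not assigned (assumption) */
    bool in_sync;    /* fully recovered */
    bool faulty;     /* failed device */
};

/* Old rule: only fully in-sync, working devices get bitmap writes. */
static bool old_target(const struct toy_rdev *r)
{
    return r->in_sync && !r->faulty;
}

/* New rule (assumed form): any working device that occupies a slot,
 * which also covers devices that are still recovering. */
static bool new_target(const struct toy_rdev *r)
{
    return r->raid_disk >= 0 && !r->faulty;
}

/* Count the devices a given rule would write the bitmap to. */
static size_t count_targets(const struct toy_rdev *devs, size_t n,
                            bool (*rule)(const struct toy_rdev *))
{
    size_t hits = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rule(&devs[i]))
            hits++;
    return hits;
}
```

With one in-sync device, one recovering device, one unassigned spare and one faulty device, the old rule selects a single device while the new rule also selects the recovering one.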
     
  • It is safe to clear a bit from the write-intent bitmap for a raid1
    if we know the data has been written to all devices, which is
    what the current test does.

    But it is not always safe to update the 'events_cleared' counter in
    that case. This is because one request could complete successfully
    after some other request has partially failed.

    So simply disable the clearing and updating of events_cleared whenever
    the array is degraded. This might end up not clearing some bits that
    could safely be cleared, but it is the safest approach.

    Note that the bug fixed here did not risk corrupting data by letting
    the array get out-of-sync. Rather it meant that when a device is
    removed and re-added to the array, it might incorrectly require a full
    recovery rather than just recovering based on the bitmap.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • md currently insists that the chunk size used for write-intent
    bitmaps (the amount of data that corresponds to one chunk)
    be at least one page.

    The reason for this restriction is lost in the mists of time,
    but a review of the code (and a vague memory) suggests that the only
    problem would be related to resync. Resync tries very hard to
    work in multiples of a page, but also needs to sync with units
    of a bitmap_chunk too.

    This connection comes out in the bitmap_start_sync call.

    So change bitmap_start_sync to always work in multiples of a page.
    If the bitmap chunk size is less than one page, we flag multiple
    chunks as 'syncing' and generally make them all appear to the
    resync routines like one chunk.

    All other code either already works with data ranges that could
    span multiple chunks, or explicitly only cares about a single chunk.

    Signed-off-by: Neil Brown

    NeilBrown
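
The idea of making several small chunks look like one page-sized unit to resync can be sketched as a loop that keeps consuming chunks until at least a page of sectors is covered. This is a toy model under stated assumptions: `struct toy_bitmap`, `start_sync_one` and `start_sync` are hypothetical names, `PAGE_SECTORS` assumes 4096-byte pages and 512-byte sectors, and the model only reports whether any covered chunk is dirty instead of flagging chunks as 'syncing'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SECTORS 8u  /* assumption: 4096-byte page / 512-byte sectors */

/* Hypothetical bitmap: one dirty flag per chunk. */
struct toy_bitmap {
    const bool *dirty;
    size_t nchunks;
    unsigned chunk_sectors;  /* may be smaller than PAGE_SECTORS */
};

/* Look at the single chunk containing 'offset'. Reports how many sectors
 * remain in that chunk and whether the chunk needs syncing. */
static bool start_sync_one(const struct toy_bitmap *bm, unsigned long offset,
                           unsigned *blocks)
{
    size_t chunk;

    assert(bm->chunk_sectors > 0);
    chunk = offset / bm->chunk_sectors;
    *blocks = bm->chunk_sectors - (unsigned)(offset % bm->chunk_sectors);
    return chunk < bm->nchunks && bm->dirty[chunk];
}

/* Page-granular wrapper: walk chunk by chunk until at least one page of
 * sectors is covered, and report "needs sync" if any chunk in it does. */
static bool start_sync(const struct toy_bitmap *bm, unsigned long offset,
                       unsigned *blocks)
{
    bool need = false;

    *blocks = 0;
    while (*blocks < PAGE_SECTORS) {
        unsigned step;

        if (start_sync_one(bm, offset, &step))
            need = true;
        offset += step;
        *blocks += step;
    }
    return need;
}
```

With 2-sector chunks, one call covers four chunks and reports 8 sectors; with chunks of a page or larger, the loop runs once and behaves like the single-chunk lookup.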
     
  • There are two problems with is_mddev_idle.

    1/ sync_io is 'atomic_t' and hence 'int'. curr_events and all the
    rest are 'long'.
    So if sync_io were to wrap on a 64bit host, the value of
    curr_events would go very negative suddenly, and take a very
    long time to return to positive.

    So do all calculations as 'int'. That gives us plenty of precision
    for what we need.

    2/ To initialise rdev->last_events we simply call is_mddev_idle, on
    the assumption that it will make sure that last_events is in a
    suitable range. It used to do this, but now it does not.
    So now we need to be more explicit about initialisation.

    Signed-off-by: NeilBrown

    NeilBrown
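
The first problem can be illustrated with plain integer arithmetic. The sketch below is not the md code: `events_delta_wide` and `events_delta_narrow` are hypothetical helpers, the "total" counter stands in for the per-disk sector statistics and the "sync" counter for `sync_io`. It shows that mixing a 32-bit counter into 64-bit arithmetic turns a wrap of the counter into a spurious jump of 2^32, while doing everything modulo 2^32 keeps the difference small.

```c
#include <assert.h>
#include <stdint.h>

/* Wide arithmetic: the 32-bit sync counter is widened before subtracting,
 * so a wrap of that counter appears as a jump of 2^32 in the result. */
static int64_t events_delta_wide(int64_t total_prev, int32_t sync_prev,
                                 int64_t total_now, int32_t sync_now)
{
    int64_t prev = total_prev - sync_prev;
    int64_t now = total_now - sync_now;

    assert(total_now >= total_prev);  /* counters only grow in this model */
    return now - prev;
}

/* Narrow arithmetic: every step is done modulo 2^32 (unsigned, to avoid
 * signed overflow), so a wrap of the sync counter cancels out. */
static int32_t events_delta_narrow(int64_t total_prev, int32_t sync_prev,
                                   int64_t total_now, int32_t sync_now)
{
    uint32_t prev = (uint32_t)total_prev - (uint32_t)sync_prev;
    uint32_t now = (uint32_t)total_now - (uint32_t)sync_now;

    assert(total_now >= total_prev);
    return (int32_t)(now - prev);
}
```

For example, if 8 sectors of resync I/O push the sync counter from `INT32_MAX - 3` past the wrap point to `INT32_MIN + 4`, the wide version reports a delta of 4294967296 non-sync events, while the narrow version reports 0.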
     

10 Mar, 2009

7 commits


09 Mar, 2009

20 commits

  • btrfs_tree_locked was being used to make sure a given extent_buffer was
    properly locked in a few places. But, it wasn't correct for UP compiled
    kernels.

    This switches it to using assert_spin_locked instead, and renames it to
    btrfs_assert_tree_locked to better reflect how it was really being used.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Frans Pop reported the crash below when running an s390 kernel under Hercules:

    Kernel BUG at 000738b4 verbose debug info unavailable!
    fixpoint divide exception: 0009 #1! SMP
    Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx
    cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot
    dm_mod dasd_eckd_mod dasd_mod
    CPU: 0 Not tainted 2.6.27.19 #13
    Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18)
    Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
    Krnl GPRS: 00000000 000007d0 7fffffff fffff830
    00000000 ffffffff 00000002 0f9ed9b8
    00000000 00008ca0 00000000 0f9ed9b8
    0f9edda4 8007386e 0f4f7ec8 0f4f7e98
    Krnl Code: 800738aa: a71807d0 lhi %r1,2000
    800738ae: 8c200001 srdl %r2,1
    800738b2: 1d21 dr %r2,%r1
    >800738b4: 5810d10e l %r1,270(%r13)
    800738b8: 1823 lr %r2,%r3
    800738ba: 4130f060 la %r3,96(%r15)
    800738be: 0de1 basr %r14,%r1
    800738c0: 5800f060 l %r0,96(%r15)
    Call Trace:
    ( ! blocking_notifier_call_chain+0x1e/0x2c)
    ! do_exit+0x106/0x7c0
    ! do_group_exit+0x7a/0xb4
    ! SyS_exit_group+0x1e/0x30
    ! sysc_do_restart+0x12/0x16
    ! 0x77e7e924

    Reason for this is that cpu time accounting usually only happens from
    interrupt context, but acct_update_integrals gets also called from
    process context with interrupts enabled.

    So in acct_update_integrals we may end up with the following scenario:

    Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt
    happens which updates accounting values. This causes acct_timexpd to be
    greater than the former stime + utime. The subsequent calculation of

    dtime = cputime_sub(time, tsk->acct_timexpd);

    will be negative and the division performed by

    cputime_to_jiffies(dtime)

    will generate an exception since the result won't fit into a 32 bit
    register.

    In order to fix this just always disable interrupts while accessing any
    of the accounting values.

    Reported by: Frans Pop
    Tested by: Frans Pop
    Cc: stable@kernel.org
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Linus Torvalds

    Heiko Carstens
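
The interleaving described above can be modelled in a single thread by calling a stand-in "interrupt" between the two reads. This is only a sketch of the ordering problem, not the kernel code: `struct toy_task`, `toy_irq`, `delta_racy` and `delta_irqs_off` are hypothetical names, and the assumption that the interrupt both advances `utime` and refreshes `acct_timexpd` is made for illustration.

```c
#include <assert.h>

/* Hypothetical subset of the per-task accounting fields. */
struct toy_task {
    long utime;
    long stime;
    long acct_timexpd;  /* utime + stime as of the last accounting update */
};

/* Stand-in for the accounting interrupt: charges 10 units of user time
 * and records the new total (assumed behaviour). */
static void toy_irq(struct toy_task *t)
{
    t->utime += 10;
    t->acct_timexpd = t->utime + t->stime;
}

/* Racy order: the interrupt lands between the two reads, so the second
 * read sees a value newer than the first and the delta goes negative. */
static long delta_racy(struct toy_task *t, void (*irq)(struct toy_task *))
{
    long time;

    assert(t);
    time = t->utime + t->stime;
    if (irq)
        irq(t);
    return time - t->acct_timexpd;
}

/* Interrupts held off: both reads happen before the interrupt is
 * delivered, so the delta is consistent and never negative here. */
static long delta_irqs_off(struct toy_task *t, void (*irq)(struct toy_task *))
{
    long time, delta;

    assert(t);
    time = t->utime + t->stime;
    delta = time - t->acct_timexpd;
    if (irq)
        irq(t);
    return delta;
}
```

Starting from `utime = 100`, `stime = 50`, `acct_timexpd = 150`, the racy order yields a delta of -10, while the order with interrupts held off yields 0.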
     
  • Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y

    We now need to call irq_to_desc_alloc_cpu() before
    set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
    kmalloc available).

    So do it as we use interrupts instead. Also means we only alloc for
    irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     
  • Impact: fix lguest boot crash on modern Intel machines

    The code in early_init_intel does:

    if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
            u64 misc_enable;

            rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);

    And that rdmsr faults (not allowed from non-0 PL). We can get around
    this by mugging the family ID part of the cpuid. 5 seems like a good
    number.

    Of course, this is a hack (how very lguest!). We could just indicate
    that we don't support MSRs, or implement lguest_rdmst.

    Reported-by: Patrick McHardy
    Signed-off-by: Rusty Russell
    Tested-by: Patrick McHardy

    Rusty Russell
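
The family-ID trick can be sketched as a mask on the EAX value returned by CPUID leaf 1, where bits 11:8 hold the base family and bits 7:4 the base model. The helpers below (`force_family_5`, `decode_family`, `decode_model`, `would_read_misc_enable`) are hypothetical names for illustration, and the decoding follows the usual x86 convention of adding the extended fields only for certain base families. It is a sketch of the idea, not the lguest implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Overwrite the base family field (bits 11:8) of CPUID.1:EAX with 5. */
static uint32_t force_family_5(uint32_t eax)
{
    eax &= 0xFFFFF0FFu;
    eax |= 0x00000500u;
    assert(((eax >> 8) & 0xF) == 5);
    return eax;
}

/* Family: base family, plus extended family (bits 27:20) when base is 0xF. */
static unsigned decode_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;

    if (family == 0xF)
        family += (eax >> 20) & 0xFF;
    return family;
}

/* Model: base model, with the extended model (bits 19:16) as the high
 * nibble when the family is 6 or higher. */
static unsigned decode_model(uint32_t eax)
{
    unsigned model = (eax >> 4) & 0xF;

    if (decode_family(eax) >= 6)
        model += ((eax >> 16) & 0xF) << 4;
    return model;
}

/* Mirrors the condition quoted from early_init_intel above. */
static bool would_read_misc_enable(uint32_t eax)
{
    unsigned family = decode_family(eax);

    return family > 6 || (family == 6 && decode_model(eax) >= 0xd);
}
```

For an EAX of `0x000106A5` (family 6, model 0x1A) the quoted condition is true, and after `force_family_5` the value becomes `0x000105A5`, which decodes as family 5 and no longer takes the rdmsr path.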
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
    mmc: fix data timeout for SEND_EXT_CSD

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: increment quiescent state counter in ksoftirqd()

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, pebs: correct qualifier passed to ds_write_config() from ds_request_pebs()
    x86, bts: remove bad warning
    x86: add Dell XPS710 reboot quirk
    x86, math-emu: fix init_fpu for task != current
    x86: EFI: Back efi_ioremap with init_memory_mapping instead of FIX_MAP
    x86: fix DMI on EFI

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
    [WATCHDOG] orion5x_wdt.c: 'ORION5X_TCLK' undeclared
    [WATCHDOG] gef_wdt.c: fsl_get_sys_freq() failure not noticed
    [WATCHDOG] ks8695_wdt.c: 'CLOCK_TICK_RATE' undeclared
    [WATCHDOG] rc32434_wdt: fix sections
    [WATCHDOG] rc32434_wdt: fix watchdog driver

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix ext4_free_inode() vs. ext4_claim_inode() race

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (28 commits)
    Blackfin arch: SPI_MMC is now mainlined MMC_SPI
    Blackfin arch: disable legacy /proc/scsi/ support by default
    Blackfin arch: remove duplicated ANOMALY_05000448 ifdef check
    Blackfin arch: add stubs for anomalies 447 and 448
    Blackfin arch: cleanup bfin_sport.h header and export it to userspace
    Blackfin arch: fix bug - gdb signull case make trunk kernel panic frequently
    Blackfin arch: remove spurious dash when dcache is off
    Blackfin arch: mark init_pda as __init as only __init funcs all it
    Blackfin arch: fix bug - On bf548-ezkit, ethernet fails to work after wakeup from "mem"
    Blackfin arch: Random read/write errors are a bad thing
    Blackfin arch: update default kernel config, select KSZ8893M driver for BF518
    Blackfin arch: Fix bug - KGDB single step into the middle of a 4 bytes instruction on bf561 after soft bp is hit
    Blackfin arch: Fix bug - make ksz8893m driver available when bfin_mac is enabled
    Blackfin arch: make sure people do not set the kernel load address too high
    Blackfin arch: fix bug - The SPORT_HYS bit is not set for BF561 0.5
    Blackfin arch: update anomaly sheets to match latest public info
    Blackfin arch: Fix BUG - kernel fails to build in pm.c when allow wakeup fromi standby by GPIO
    Blackfin arch: PM_BFIN_WAKE_GP: update help
    Blackfin arch: fix bug - kgdb fails to continue after setting breakpoint on bf561-ezkit kernel with smp patch
    Blackfin arch: Enable Write Back Cache on all Blackfin Boards
    ...

    Linus Torvalds
     
  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
    dmatest: fix use after free in dmatest_exit
    ipu_idmac: fix spinlock type
    iop-adma, mv_xor: fix mem leak on self-test setup failure
    fsldma: fix off by one in dma_halt
    I/OAT: fail self-test if callback test reaches timeout
    I/OAT: update driver version and copyright dates
    I/OAT: list usage cleanup
    I/OAT: set tcp_dma_copybreak to 256k for I/OAT ver.3
    I/OAT: cancel watchdog before dma remove
    I/OAT: fail initialization on zero channels detection
    I/OAT: do not set DCACTRL_CMPL_WRITE_ENABLE for I/OAT ver.3
    I/OAT: add verification for proper APICID_TAG_MAP setting by BIOS
    dmaengine: update kerneldoc

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
    ata: add CFA specific identify data words
    remove stale comment from
    AT91: initialize Compact Flash on AT91SAM9263 cpu
    ide: add at91_ide driver
    ide: allow to wrap interrupt handler
    ide-iops: fix odd-length ATAPI PIO transfers
    ide: NULL noise: drivers/ide/ide-*.c
    ide: expiry() returns int, negative expiry() return values won't be noticed

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
    libata: Don't trust current capacity values in identify words 57-58
    libata: make sure port is thawed when skipping resets
    sata_nv: fix module parameter description
    ahci: Add the Device IDs for MCP89 and remove IDs of MCP7B to/from ahci.c
    libata: don't use on-stack sense buffer
    libata: align ap->sector_buf
    libata: fix dma_unmap_sg misuse
    libata: change drive ready wait after hard reset to 5s

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    Squashfs: frag_size should be signed, as it can hold an error result
    Squashfs: fix documentation typo, Cramfs filesystem limit is 256 MiB
    Squashfs: Fix oops when reading fsfuzzer corrupted filesystems

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    smack: fixes for unlabeled host support

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: serio - fix protocol number for TouchIT213

    Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix PCI DMA flag propagation on SN (Altix) with PICs

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: fix missing bio back/front segment size setting in blk_recount_segments()
    loop: don't increment p->offset with (size_t) -EINVAL
    cciss: remove 30 second initial timeout on controller reset
    Fix kernel NULL pointer dereference in xen-blkfront

    Linus Torvalds
     
  • * 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: hda - Fix headphone-detect regression with multiple HP jacks
    ALSA: hda - Fix typos in slave controls in patch_sigmatel.c

    Linus Torvalds
     
  • This is a build fix required after "x86-64: seccomp: fix 32/64 syscall
    hole" (commit 5b1017404aea6d2e552e991b3fd814d839e9cd67). MIPS doesn't
    have the issue that was fixed for x86-64 by that patch.

    This also doesn't solve the N32 issue which is that N32 seccomp processes
    will be treated as non-compat processes thus only have access to N64
    syscalls.

    Signed-off-by: Ralf Baechle
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     

08 Mar, 2009

2 commits