16 Jun, 2011

1 commit


15 Jun, 2011

2 commits

  • Commit a26ac2455ffcf3 (rcu: move TREE_RCU from softirq to kthread)
    introduced a performance regression. In an AIM7 test, this commit degraded
    performance by about 40%.

    The commit runs RCU callbacks in a kthread instead of softirq. We observed
    a high rate of context switches caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per second,
    caused by RCU's per-CPU kthreads. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
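
The "more than 64k context switches per second" figure in the message above is consistent with roughly one kthread wakeup per CPU per scheduler tick. The following is a minimal sketch of that arithmetic, not kernel code: the function name is hypothetical, and the one-wakeup-per-tick model is a simplifying assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): estimates how many
 * kthread wakeups per second occur if every CPU wakes its per-CPU
 * kthread once per scheduler tick. This is a simplified model.
 */
uint64_t example_kthread_wakeups_per_sec(uint32_t nr_cpus, uint32_t hz)
{
    assert(nr_cpus > 0 && hz > 0);
    return (uint64_t)nr_cpus * hz;
}
```

With the values quoted in the message (64 CPUs, HZ of 1000), the helper returns 64000, which matches the reported order of magnitude.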
     
  • Make the functions creating the kthreads wake them up. Leverage the
    fact that the per-node and boost kthreads can run anywhere, thus
    dispensing with the need to wake them up once the incoming CPU has
    gone fully online.

    Signed-off-by: Paul E. McKenney
    Tested-by: Daniel J Blueman

    Paul E. McKenney
     

06 Jun, 2011

5 commits


05 Jun, 2011

4 commits

  • * 'for-linus' of git://android.git.kernel.org/kernel/tegra:
    ARM: Tegra: Harmony: Fix conflicting GPIO numbering

    Linus Torvalds
     
  • Currently, both the WM8903 and TPS6586x chips attempt to register with
    gpiolib using the same GPIO numbers. This causes the audio driver to
    fail to initialize.

    To solve this, add a define to board-harmony.h for the TPS6586x, and make
    board-harmony-power.c use this define, instead of directly referencing
    TEGRA_NR_GPIOS.

    This fixes a regression introduced by commit
    6f168f2fa60f87e85e0df25e87e2372f22f5eb7c.
    ARM: tegra: harmony: initialize the TPS65862 PMIC

    Signed-off-by: Stephen Warren
    Signed-off-by: Colin Cross

    Stephen Warren
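
The fix described above amounts to giving each off-SoC chip its own GPIO number range instead of letting two chips start at the same base. A minimal sketch of that idea follows. Every name and count in it is an illustrative assumption, not the actual contents of board-harmony.h.

```c
#include <assert.h>
#include <stdbool.h>

/* All names and values here are hypothetical, chosen for illustration. */
#define EXAMPLE_SOC_NR_GPIOS   224  /* assumed number of on-SoC GPIOs */
#define EXAMPLE_PMIC_NR_GPIOS  4    /* assumed number of PMIC GPIOs */

/* Each external chip gets a base stacked after the previous range. */
#define EXAMPLE_GPIO_PMIC(x)   (EXAMPLE_SOC_NR_GPIOS + (x))
#define EXAMPLE_GPIO_CODEC(x)  (EXAMPLE_GPIO_PMIC(EXAMPLE_PMIC_NR_GPIOS) + (x))

/* True if [base_a, base_a + n_a) and [base_b, base_b + n_b) overlap. */
bool example_gpio_ranges_overlap(int base_a, int n_a, int base_b, int n_b)
{
    assert(n_a > 0 && n_b > 0);
    return base_a < base_b + n_b && base_b < base_a + n_a;
}
```

If both chips used the SoC GPIO count directly as their base, the overlap check would report a conflict. With the stacked defines the two ranges are disjoint.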
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
    btrfs: fix uninitialized variable warning
    btrfs: add helper for fs_info->closing
    Btrfs: add mount -o inode_cache
    btrfs: scrub: add explicit plugging
    btrfs: use btrfs_ino to access inode number
    Btrfs: don't save the inode cache if we are deleting this root
    btrfs: false BUG_ON when degraded
    Btrfs: don't save the inode cache in non-FS roots
    Btrfs: make sure we don't overflow the free space cache crc page
    Btrfs: fix uninit variable in the delayed inode code
    btrfs: scrub: don't reuse bios and pages
    Btrfs: leave spinning on lookup and map the leaf
    Btrfs: check for duplicate entries in the free space cache
    Btrfs: don't try to allocate from a block group that doesn't have enough space
    Btrfs: don't always do readahead
    Btrfs: try not to sleep as much when doing slow caching
    Btrfs: kill BTRFS_I(inode)->block_group
    Btrfs: don't look at the extent buffer level 3 times in a row
    Btrfs: map the node block when looking for readahead targets
    Btrfs: set range_start to the right start in count_range_bits
    ...

    Linus Torvalds
     
  • Improve detection of MAX6642 by reading nonexistent registers (0x04, 0x06
    and 0xff). Reading those registers returns the previously read value.

    Signed-off-by: Per Dalen
    [guenter.roeck@ericsson.com: added second set of register reads]
    Signed-off-by: Guenter Roeck

    Per Dalén
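
The detection trick above can be sketched in user space: read a known register, then read the nonexistent ones and check that each read echoes the previous value. This is only an illustration under stated assumptions. The function names, the fake chip model, and the choice of known register are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register read callback: returns the value read, or a negative error. */
typedef int (*example_read_fn)(void *ctx, uint8_t reg);

/* Hypothetical probe: do reads of missing registers echo the last value? */
bool example_looks_like_echoing_chip(example_read_fn read_reg, void *ctx,
                                     uint8_t known_reg)
{
    static const uint8_t missing[] = { 0x04, 0x06, 0xff };
    int prev;

    assert(read_reg != NULL);
    prev = read_reg(ctx, known_reg);
    if (prev < 0)
        return false;
    for (size_t i = 0; i < sizeof(missing) / sizeof(missing[0]); i++) {
        if (read_reg(ctx, missing[i]) != prev)
            return false;
    }
    return true;
}

/* Simulated device used only to exercise the probe. */
struct example_fake_chip {
    bool echoes;  /* true: missing registers return the last value read */
    int last;     /* last value returned on the bus */
};

int example_fake_read(void *ctx, uint8_t reg)
{
    struct example_fake_chip *chip = ctx;

    if (reg == 0x04 || reg == 0x06 || reg == 0xff)
        return chip->echoes ? chip->last : 0x00;
    chip->last = 0x40 + (reg & 0x0f);
    return chip->last;
}
```

A chip that does not echo fails the probe on the first mismatching read.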
     

04 Jun, 2011

24 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] Fix oops caused by queue refcounting failure

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
    tg3: Fix tg3_skb_error_unmap()
    net: tracepoint of net_dev_xmit sees freed skb and causes panic
    drivers/net/can/flexcan.c: add missing clk_put
    net: dm9000: Get the chip in a known good state before enabling interrupts
    drivers/net/davinci_emac.c: add missing clk_put
    af-packet: Add flag to distinguish VID 0 from no-vlan.
    caif: Fix race when conditionally taking rtnl lock
    usbnet/cdc_ncm: add missing .reset_resume hook
    vlan: fix typo in vlan_dev_hard_start_xmit()
    net/ipv4: Check for mistakenly passed in non-IPv4 address
    iwl4965: correctly validate temperature value
    bluetooth l2cap: fix locking in l2cap_global_chan_by_psm
    ath9k: fix two more bugs in tx power
    cfg80211: don't drop p2p probe responses
    Revert "net: fix section mismatches"
    drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run()
    sctp: stop pending timers and purge queues when peer restart asoc
    drivers/net: ks8842 Fix crash on received packet when in PIO mode.
    ip_options_compile: properly handle unaligned pointer
    iwlagn: fix incorrect PCI subsystem id for 6150 devices
    ...

    Linus Torvalds
     
  • With Linus' tree, today's linux-next build (powerpc ppc64_defconfig)
    produced this warning:

    fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
    fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
    uninitialized in this function

    Introduced by commit 16cdcec736cd ("btrfs: implement delayed inode items
    operation").

    This fixes a bug in btrfs_update_inode(): if the value returned from
    btrfs_delayed_update_inode is nonzero garbage, the inode stat data are not
    updated and several call paths may hit a BUG_ON or fail with a strange
    error code.

    Reported-by: Stephen Rothwell
    Signed-off-by: David Sterba

    David Sterba
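
The class of bug behind the warning above is a return variable that is only assigned on some paths. A minimal sketch of the usual remedy, initializing it at declaration, is shown below. The function names are hypothetical stand-ins and do not reproduce the btrfs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker: returns 0 on success, -1 on a bad argument. */
int example_do_update(int *counter)
{
    if (counter == NULL)
        return -1;
    *counter += 1;
    return 0;
}

/* Hypothetical caller with a path that skips the worker entirely. */
int example_update_if_needed(int needs_update, int *counter)
{
    /*
     * Without this initializer, the path that skips the branch below
     * would return an indeterminate value, which a caller could
     * misread as an error.
     */
    int ret = 0;

    if (needs_update)
        ret = example_do_update(counter);

    assert(ret <= 0);  /* success is 0, errors are negative */
    return ret;
}
```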
     
  • Wrap the check of the filesystem 'closing' flag in a helper and add a few
    missing memory barriers.

    Signed-off-by: David Sterba

    David Sterba
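
A helper of the kind described above keeps the ordering rule in one place, so that call sites cannot forget it. The sketch below is a user-space analogy using C11 atomics. The kernel uses its own barrier primitives, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-filesystem state. */
struct example_fs_info {
    atomic_int closing;
};

/* Readers go through the helper instead of reading the flag directly. */
bool example_fs_closing(struct example_fs_info *info)
{
    assert(info != NULL);
    /* Sequentially consistent load: stands in for "barrier, then read". */
    return atomic_load(&info->closing) != 0;
}

/* Writer side: publish the flag with a sequentially consistent store. */
void example_fs_set_closing(struct example_fs_info *info)
{
    assert(info != NULL);
    atomic_store(&info->closing, 1);
}
```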
     
  • This makes the inode map cache default to off until we
    fix the overflow problem when the free space crcs don't fit
    inside a single page.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • With the removal of the implicit plugging, scrub ends up doing more and
    smaller I/O than necessary. This patch adds explicit plugging per chunk.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • commit 4cb5300bc ("Btrfs: add mount -o auto_defrag") accesses inode
    number directly while it should use the helper with the new inode
    number allocator.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • With xfstest 254 I can panic the box every time with the inode number caching
    stuff on. This is because we clean the inodes out when we delete the subvolume,
    but then we write out the inode cache which adds an inode to the subvolume inode
    tree, and then when it gets evicted again the root gets added back on the dead
    roots list and is deleted again, so we have a double free. To stop this from
    happening just return 0 if refs is 0 (and we're not the tree root since tree
    root always has refs of 0). With this fix 254 no longer panics. Thanks,

    Signed-off-by: Josef Bacik
    Tested-by: David Sterba
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • In degraded mode, the struct btrfs_device of a missing device does not have
    device->name set. A kstrdup of NULL correctly returns NULL. Don't
    BUG in this case.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
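
The point of the message above is that a NULL result from duplicating a NULL string is not an allocation failure. A minimal user-space sketch of that distinction follows. The function names are hypothetical, and malloc stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* NULL-tolerant duplicate: a NULL input yields NULL, not an error. */
char *example_dup_or_null(const char *s)
{
    size_t n;
    char *p;

    if (s == NULL)
        return NULL;
    n = strlen(s) + 1;
    p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Returns 0 on success and -1 only on a real allocation failure. */
int example_copy_device_name(const char *src, char **dst)
{
    assert(dst != NULL);
    *dst = example_dup_or_null(src);
    if (src != NULL && *dst == NULL)
        return -1;  /* the source had a name but the copy failed */
    return 0;       /* includes the "no name to copy" case */
}
```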
     
  • This adds extra checks to make sure the inode map we are caching really
    belongs to a FS root instead of a special relocation tree. It
    prevents crashes during balancing operations.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
     
  • The free space cache uses only one page for crcs right now,
    which means we can't have a cache file bigger than the
    crcs we can fit in the first page. This adds a check to
    enforce that restriction.

    Signed-off-by: Chris Mason

    Chris Mason
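
The restriction above can be illustrated with a size check. The sketch assumes one 32-bit crc per page of the cache file and ignores any other data stored in the first page. Both assumptions, and all names, are illustrative and not taken from the btrfs source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* How many 32-bit crcs fit in a single page of the given size. */
size_t example_max_crcs_per_page(size_t page_size)
{
    assert(page_size % sizeof(uint32_t) == 0);
    return page_size / sizeof(uint32_t);
}

/* Hypothetical check: can a cache of num_pages be covered by one crc page? */
bool example_cache_fits_crc_page(size_t num_pages, size_t page_size)
{
    return num_pages <= example_max_crcs_per_page(page_size);
}
```

Under these assumptions a 4096-byte page holds 1024 crcs, so a cache file of more than 1024 pages would be rejected.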
     
  • The nitems counter needs to start at zero

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The current scrub implementation reuses bios and pages as often as possible,
    allocating them only on start and releasing them when finished. This leads
    to more problems with the block layer than it's worth. The elevator gets
    confused when there are more pages added to the bio than bi_size suggests.
    This patch completely rips out the reuse of bios and pages and allocates
    them freshly for each submit.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: Use hlist_entry() for io_context.cic_list.first
    cfq-iosched: Remove bogus check in queue_fail path
    xen/blkback: potential null dereference in error handling
    xen/blkback: don't call vbd_size() if bd_disk is NULL
    block: blkdev_get() should access ->bd_disk only after success
    CFQ: Fix typo and remove unnecessary semicolon
    block: remove unwanted semicolons
    Revert "block: Remove extra discard_alignment from hd_struct."
    nbd: adjust 'max_part' according to part_shift
    nbd: limit module parameters to a sane value
    nbd: pass MSG_* flags to kernel_recvmsg()
    block: improve the bio_add_page() and bio_add_pc_page() descriptions

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
    Blackfin: strncpy: fix handling of zero lengths

    Linus Torvalds
     
  • * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    asm-generic/unistd.h: support sendmmsg syscall
    tile: enable CONFIG_BUGVERBOSE

    Linus Torvalds
     
  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: fix-up free space earlier
    UBIFS: intialize LPT earlier
    UBIFS: assert no fixup when writing a node
    UBIFS: fix clean znode counter corruption in error cases
    UBIFS: fix memory leak on error path
    UBIFS: fix shrinker object count reports
    UBIFS: fix recovery broken by the previous recovery fix
    UBIFS: amend ubifs_recover_leb interface
    UBIFS: introduce a "grouped" journal head flag
    UBIFS: supress false error messages

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-ktest:
    ktest: Ignore unset values of the minconfig in config_bisect
    ktest: Fix result of rebooting the kernel
    ktest: Fix off-by-one in config bisect result

    Linus Torvalds
     
  • …nel/git/lethal/sh-2.6

    * 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    ARM: mach-shmobile: add DMAC clock definitions on SH7372
    ARM: arch-shmobile: support SDHI card detection on mackerel, using a GPIO
    sh_mobile_meram: MERAM platform data for LCDC

    Linus Torvalds
     
  • * 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    dmaengine: shdma: fix a regression: initialise DMA channels for memcpy
    dmaengine: shdma: Fix up fallout from runtime PM changes.
    Revert "clocksource: sh_cmt: Runtime PM support"
    Revert "clocksource: sh_tmu: Runtime PM support"
    sh: Fix up asm-generic/ptrace.h fallout.
    sh64: Move from P1SEG to CAC_ADDR for consistent sync.
    sh64: asm/pgtable.h needs asm/mmu.h
    sh: asm/tlb.h needs linux/swap.h
    sh: mark DMA slave ID 0 as invalid
    sh: Update shmin to reflect PIO dependency.
    sh: arch/sh/kernel/process_32.c needs linux/prefetch.h.
    sh: add MMCIF runtime PM support on ecovec
    sh: switch ap325rxa to dynamically manage the platform camera

    Linus Torvalds
     
  • This reverts commit ed0bd2333cffc3d856db9beb829543c1dfc00982.

    Since we reverted the TTY API change, we should revert the ASoC update
    to it too.

    Cc: Mark Brown
    Cc: Liam Girdwood
    Cc: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This reverts commit b1c43f82c5aa265442f82dba31ce985ebb7aa71c.

    It was broken in so many ways, and results in random odd pty issues.

    It re-introduced the buggy schedule_work() in flush_to_ldisc() that can
    cause endless work-loops (see commit a5660b41af6a: "tty: fix endless
    work loop when the buffer fills up").

    It also used an "unsigned int" return value for the ->receive_buf()
    function, but then made multiple functions return a negative error code,
    and didn't actually check for the error in the caller.

    And it didn't actually work at all. BenH bisected down odd tty behavior
    to it:
    "It looks like the patch is causing some major malfunctions of the X
    server for me, possibly related to PTYs. For example, cat'ing a
    large file in a gnome terminal hangs the kernel for -minutes- in a
    loop of what looks like flush_to_ldisc/workqueue code, (some ftrace
    data in the quoted bits further down).

    ...

    Some more data: It -looks- like what happens is that the
    flush_to_ldisc work queue entry constantly re-queues itself (because
    the PTY is full ?) and the workqueue thread will basically loop
    forever calling it without ever scheduling, thus starving the consumer
    process that could have emptied the PTY."

    which is pretty much exactly the problem we fixed in a5660b41af6a.

    Milton Miller pointed out the 'unsigned int' issue.

    Reported-by: Benjamin Herrenschmidt
    Reported-by: Milton Miller
    Cc: Stefan Bigler
    Cc: Toby Gray
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
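
The 'unsigned int' problem mentioned above is easy to reproduce outside the kernel: a negative error code returned through an unsigned type becomes a large positive number, which a caller can mistake for a byte count. The sketch below uses hypothetical function names and is not the tty code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callee with the problematic unsigned return type. */
unsigned int example_receive_buf_unsigned(int fail)
{
    if (fail)
        return -EINVAL;  /* silently converted to a large unsigned value */
    return 16;           /* pretend 16 bytes were consumed */
}

/* A caller that treats any positive result as "bytes consumed". */
int example_looks_like_progress(unsigned int ret)
{
    return ret > 0;
}

/* Same callee with a signed return type: the error stays negative. */
int example_receive_buf_signed(int fail)
{
    return fail ? -EINVAL : 16;
}

/* Small self-check of the two claims made in the comments above. */
void example_receive_buf_selfcheck(void)
{
    assert(example_looks_like_progress(example_receive_buf_unsigned(1)));
    assert(example_receive_buf_signed(1) < 0);
}
```

With the unsigned signature the failure is indistinguishable from progress unless the caller knows to cast the value back, which is the kind of unchecked error the revert message describes.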
     
  • …wireless-2.6 into for-davem

    John W. Linville
     
  • CM6206: Turn off de-emphasis channel status bit in S/PDIF output.

    Signed-off-by: Eric Lammerts
    Signed-off-by: Takashi Iwai

    Eric Lammerts
     

03 Jun, 2011

4 commits

  • The free space fixup is currently initiated during mount after the call to
    ubifs_write_master() which results in a write to PEBs; this has been observed
    with the patch 'assert no fixup when writing a node' applied.

    Move the free space fixup on mount to before the calls to
    ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
    assertions with the previously mentioned patch applied.

    Artem: tweaked the patch a bit

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • The current 'mount_ubifs()' implementation does not initialize the LPT until
    the master node is marked dirty. Move the LPT initialization to before marking
    the master node dirty. This is a preparation for the next patch which will move
    the free-space-fixup check to before marking the master node dirty, because we
    have to fix-up the free space before doing any writes.

    Artem: massaged the patch and commit message.

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • The current free space fixup can result in some writing to the UBI volume
    when the space_fixup flag is set.

    To catch instances where UBIFS is writing to the NAND while the space_fixup
    flag is set, add an assert to ubifs_write_node().

    Artem: tweaked the patch, added similar assertion to the write buffer
    write path.

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • UBIFS maintains per-filesystem and global clean znode counters
    ('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
    correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.

    However, in case of failures during commit the counters were corrupted. E.g.,
    if a failure happens in the middle of 'write_index()', then some nodes in the
    commit list ('c->cnext') are marked as clean, and some are marked as dirty. And
    'ubifs_destroy_tnc_subtree()' does not return the correct count, so we
    end up with a non-zero 'c->clean_zn_cnt' when unmounting. This means that if we
    have two file-systems and one of them fails, and we unmount it,
    'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
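
The invariant at stake above is that the global counter equals the sum of the per-filesystem counters, so unmounting one file-system must remove exactly its own share. The sketch below is a single-threaded illustration with hypothetical names. It does not reproduce the UBIFS implementation or its locking.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for the global 'ubifs_clean_zn_cnt' counter. */
long example_global_clean_cnt;

/* Stands in for the per-filesystem 'c->clean_zn_cnt' counter. */
struct example_fs {
    long clean_cnt;
};

/* Account n newly clean nodes in both the local and global counters. */
void example_mark_clean(struct example_fs *fs, long n)
{
    assert(fs != NULL && n >= 0);
    fs->clean_cnt += n;
    example_global_clean_cnt += n;
}

/* Unmount: drop every clean node this file-system still accounts for. */
void example_unmount(struct example_fs *fs)
{
    assert(fs != NULL);
    example_global_clean_cnt -= fs->clean_cnt;
    fs->clean_cnt = 0;
}
```

If the per-filesystem count were wrong at unmount time, the global counter would keep a stale remainder, which is how a failure in one file-system could mislead a shrinker shared with another.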