02 Aug, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
    ext4: prevent memory leaks from ext4_mb_init_backend() on error path
    ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
    ext4: use ext4_msg() instead of printk in mballoc
    ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
    ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
    ext4: use the correct error exit path in ext4_init_inode_table()
    ext4: add missing kfree() on error return path in add_new_gdb()
    ext4: change umode_t in tracepoint headers to be an explicit __u16
    ext4: fix races in ext4_sync_parent()
    ext4: Fix overflow caused by missing cast in ext4_fallocate()
    ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
    ext4: simplify parameters of reserve_backup_gdb()
    ext4: simplify parameters of add_new_gdb()
    ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
    ext4: simplify journal handling in setup_new_group_blocks()
    ext4: let setup_new_group_blocks() set multiple bits at a time
    ext4: fix a typo in ext4_group_extend()
    ext4: let ext4_group_add_blocks() handle 0 blocks quickly
    ext4: let ext4_group_add_blocks() return an error code
    ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
    ...

    Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify
    the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
    function for the new simplified calling convention, while commit
    dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to
    indirect.c") moved the function to another file.

    Linus Torvalds
     

31 Jul, 2011

1 commit


30 Jul, 2011

1 commit


27 Jul, 2011

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
    ext3.txt: update the links in the section "useful links" to the latest ones
    ext3: Fix data corruption in inodes with journalled data
    ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
    ext3: Fix compilation with -DDX_DEBUG
    quota: Remove unused declaration
    jbd: Use WRITE_SYNC in journal checkpoint.
    jbd: Fix oops in journal_remove_journal_head()
    ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
    ext3/ioctl.c: silence sparse warnings about different address spaces
    ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
    ext3: Improve truncate error handling
    ext3: use proper little-endian bitops
    ext2: include fs.h into ext2_fs.h
    ext3: Fix oops in ext3_try_to_allocate_with_rsv()
    jbd: fix a bug of leaking jh->b_jcount
    jbd: remove dependency on __GFP_NOFAIL
    ext3: Convert ext3 to new truncate calling convention
    jbd: Add fixed tracepoints
    ext3: Add fixed tracepoints

    Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
    new fixed tracepoints.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
    mm: properly reflect task dirty limits in dirty_exceeded logic
    writeback: don't busy retry writeback on new/freeing inodes
    writeback: scale IO chunk size up to half device bandwidth
    writeback: trace global_dirty_state
    writeback: introduce max-pause and pass-good dirty limits
    writeback: introduce smoothed global dirty limit
    writeback: consolidate variable names in balance_dirty_pages()
    writeback: show bdi write bandwidth in debugfs
    writeback: bdi write bandwidth estimation
    writeback: account per-bdi accumulated written pages
    writeback: make writeback_control.nr_to_write straight
    writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
    writeback: trace event writeback_queue_io
    writeback: trace event writeback_single_inode
    writeback: remove .nonblocking and .encountered_congestion
    writeback: remove writeback_control.more_io
    writeback: skip balance_dirty_pages() for in-memory fs
    writeback: add bdi_dirty_limit() kernel-doc
    writeback: avoid extra sync work at enqueue time
    writeback: elevate queue_io() into wb_writeback()
    ...

    Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c

    Linus Torvalds
     

26 Jul, 2011

1 commit

  • When CONFIG_FUNCTION_TRACER is disabled, compilation fails as follows:
    CC arch/x86/xen/setup.o
    In file included from arch/x86/include/asm/xen/hypercall.h:42,
    from arch/x86/xen/setup.c:19:
    include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list
    include/trace/events/xen.h:31: warning: its scope is only this definition or declaration, which is probably not what you want
    include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list
    include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list
    include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list
    [...]
    arch/x86/xen/trace.c:5: error: '__HYPERVISOR_set_trap_table' undeclared here (not in a function)
    arch/x86/xen/trace.c:5: error: array index in initializer not of integer type
    arch/x86/xen/trace.c:5: error: (near initialization for 'xen_hypercall_names')
    arch/x86/xen/trace.c:6: error: '__HYPERVISOR_mmu_update' undeclared here (not in a function)
    arch/x86/xen/trace.c:6: error: array index in initializer not of integer type
    arch/x86/xen/trace.c:6: error: (near initialization for 'xen_hypercall_names')

    Fix this by making sure struct multicall_entry has a declaration in
    scope at all times, and don't bother compiling xen/trace.c when tracing
    is disabled.

    Reported-by: Randy Dunlap
    Signed-off-by: Jeremy Fitzhardinge

    Jeremy Fitzhardinge
     

25 Jul, 2011

1 commit

  • * 'upstream/xen-tracing2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    xen/trace: use class for multicall trace
    xen/trace: convert mmu events to use DECLARE_EVENT_CLASS()/DEFINE_EVENT()
    xen/multicall: move *idx fields to start of mc_buffer
    xen/multicall: special-case singleton hypercalls
    xen/multicalls: add unlikely around slowpath in __xen_mc_entry()
    xen/multicalls: disable MC_DEBUG
    xen/mmu: tune pgtable alloc/release
    xen/mmu: use extend_args for more mmuext updates
    xen/trace: add tlb flush tracepoints
    xen/trace: add segment desc tracing
    xen/trace: add xen_pgd_(un)pin tracepoints
    xen/trace: add ptpage alloc/release tracepoints
    xen/trace: add mmu tracepoints
    xen/trace: add multicall tracing
    xen/trace: set up tracepoint skeleton
    xen/multicalls: remove debugfs stats
    trace/xen: add skeleton for Xen trace events

    Linus Torvalds
     

24 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (297 commits)
    ALSA: asihpi - Replace with snd_ctl_boolean_mono_info()
    ALSA: asihpi - HPI version 4.08
    ALSA: asihpi - Add volume mute controls
    ALSA: asihpi - Control name updates
    ALSA: asihpi - Use size_t for sizeof result
    ALSA: asihpi - Explicitly include mutex.h
    ALSA: asihpi - Add new node and message defines
    ALSA: asihpi - Make local function static
    ALSA: asihpi - Fix minor typos and spelling
    ALSA: asihpi - Remove unused structures, macros and functions
    ALSA: asihpi - Remove spurious adapter index check
    ALSA: asihpi - Revise snd_pcm_debug_name, get rid of DEBUG_NAME macro
    ALSA: asihpi - DSP code loader API now independent of OS
    ALSA: asihpi - Remove controlex structs and associated special data transfer code
    ALSA: asihpi - Increase request and response buffer sizes
    ALSA: asihpi - Give more meaningful name to hpi request message type
    ALSA: usb-audio - Add quirk for Roland / BOSS BR-800
    ALSA: hda - Remove a superfluous argument of via_auto_init_output()
    ALSA: hda - Fix indep-HP path (de-)activation for VT1708* codecs
    ALSA: hda - Add documentation for codec-specific mixer controls
    ...

    Linus Torvalds
     

23 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
    vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
    isofs: Remove global fs lock
    jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
    fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
    mm/truncate.c: fix build for CONFIG_BLOCK not enabled
    fs:update the NOTE of the file_operations structure
    Remove dead code in dget_parent()
    AFS: Fix silly characters in a comment
    switch d_add_ci() to d_splice_alias() in "found negative" case as well
    simplify gfs2_lookup()
    jfs_lookup(): don't bother with . or ..
    get rid of useless dget_parent() in btrfs rename() and link()
    get rid of useless dget_parent() in fs/btrfs/ioctl.c
    fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
    drivers: fix up various ->llseek() implementations
    fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
    Ext4: handle SEEK_HOLE/SEEK_DATA generically
    Btrfs: implement our own ->llseek
    fs: add SEEK_HOLE and SEEK_DATA flags
    reiserfs: make reiserfs default to barrier=flush
    ...

    Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
    shrinker callout for the inode cache, that clashed with the xfs code to
    start the periodic workers later.

    Linus Torvalds
     

22 Jul, 2011

1 commit


20 Jul, 2011

1 commit


19 Jul, 2011

9 commits


11 Jul, 2011

3 commits


10 Jul, 2011

2 commits

  • Add trace event balance_dirty_state for showing the global dirty page
    counts and thresholds at each global_dirty_limits() invocation. This
    will cover the callers throttle_vm_writeout(), over_bground_thresh()
    and each balance_dirty_pages() loop.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • Pass struct wb_writeback_work all the way down to writeback_sb_inodes(),
    and initialize the struct writeback_control there.

    struct writeback_control is basically designed to control writeback of a
    single file, but we keep abusing it for writing multiple files in
    writeback_sb_inodes() and its callers.

    This immediately cleans things up: e.g. wbc.nr_to_write vs
    work->nr_pages suddenly starts to make sense, and instead of saving and
    restoring pages_skipped in writeback_sb_inodes() it can always start with
    a clean zero value.

    It also makes a neat IO pattern change: large dirty files are now
    written in the full 4MB writeback chunk size, rather than whatever
    quota remained in wbc->nr_to_write.
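
The IO pattern change described above can be illustrated with a toy model. This is only a sketch, not kernel code: the 4 KiB page size, the function names, and the example file sizes are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
CHUNK_PAGES = 4 * 1024 * 1024 // PAGE_SIZE  # the 4MB chunk expressed in pages


def shared_quota(dirty_pages_per_file, quota=CHUNK_PAGES):
    """Toy model of the old pattern: one budget is consumed across files."""
    written = []
    for dirty in dirty_pages_per_file:
        n = min(dirty, quota)
        written.append(n)
        quota -= n  # later files only see what is left
    return written


def per_file_chunk(dirty_pages_per_file, chunk=CHUNK_PAGES):
    """Toy model of the new pattern: every file starts with a full chunk."""
    return [min(dirty, chunk) for dirty in dirty_pages_per_file]


files = [300, 5000]  # hypothetical dirty page counts: a small and a large file
print(shared_quota(files))    # [300, 724]
print(per_file_chunk(files))  # [300, 1024]
```

In this model the large file gets only the 724 leftover pages under the shared budget, but a full 1024-page chunk when each file starts fresh.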

    Acked-by: Jan Kara
    Proposed-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     

06 Jul, 2011

2 commits


25 Jun, 2011

2 commits

  • This commit adds fixed tracepoints for jbd. They are based on the fixed
    tracepoints for jbd2; however, those for collecting statistics are
    missing, since I think they will require a more intrusive patch and so
    should have their own commit, if someone decides they are needed. There
    are also new tracepoints in __journal_drop_transaction() and
    journal_update_superblock().

    The list of jbd tracepoints:

    jbd_checkpoint
    jbd_start_commit
    jbd_commit_locking
    jbd_commit_flushing
    jbd_commit_logging
    jbd_drop_transaction
    jbd_end_commit
    jbd_do_submit_data
    jbd_cleanup_journal_tail
    jbd_update_superblock_end

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
     
  • This commit adds fixed tracepoints to the ext3 code. It is based on the
    ext4 tracepoints; however, due to the differences between the two file
    systems, some tracepoints are missing (those for delalloc and for the
    multi-block allocator) and some are ext3 specific (for reservation
    windows).

    Here is a list:

    ext3_free_inode
    ext3_request_inode
    ext3_allocate_inode
    ext3_evict_inode
    ext3_drop_inode
    ext3_mark_inode_dirty
    ext3_write_begin
    ext3_ordered_write_end
    ext3_writeback_write_end
    ext3_journalled_write_end
    ext3_ordered_writepage
    ext3_writeback_writepage
    ext3_journalled_writepage
    ext3_readpage
    ext3_releasepage
    ext3_invalidatepage
    ext3_discard_blocks
    ext3_request_blocks
    ext3_allocate_blocks
    ext3_free_blocks
    ext3_sync_file_enter
    ext3_sync_file_exit
    ext3_sync_fs
    ext3_rsv_window_add
    ext3_discard_reservation
    ext3_alloc_new_reservation
    ext3_reserved
    ext3_forget
    ext3_read_block_bitmap
    ext3_direct_IO_enter
    ext3_direct_IO_exit
    ext3_unlink_enter
    ext3_unlink_exit
    ext3_truncate_enter
    ext3_truncate_exit
    ext3_get_blocks_enter
    ext3_get_blocks_exit
    ext3_load_inode

    Signed-off-by: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Jan Kara

    Lukas Czerner
     

22 Jun, 2011

3 commits

  • This patch adds 2 tracepoints to get the status of a socket receive queue
    and related parameters.

    One tracepoint is added to sock_queue_rcv_skb. It records the rcvbuf size
    and its usage. The other tracepoint is added to __sk_mem_schedule and
    it records the memory limits for sockets and the current usage.

    By using these tracepoints we can find out in detail why the kernel
    dropped a packet.

    Signed-off-by: Satoru Moriya
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Satoru Moriya
     
  • This patch adds a tracepoint to __udp_queue_rcv_skb to get the
    return value of ip_queue_rcv_skb. It indicates why the kernel drops
    a packet at this point.

    ip_queue_rcv_skb returns the following values in the packet drop case:

    rcvbuf is full : -ENOMEM
    sk_filter returns error : -EINVAL, -EACCES, -ENOMEM, etc.
    __sk_mem_schedule returns error: -ENOBUFS
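
The table above can be turned into a small lookup for reading the traced return value. This is a hedged sketch: the names DROP_CAUSES and drop_causes are made up for illustration, and the errno constants use Python's spellings (EACCES, ENOBUFS). Since -ENOMEM is listed under two causes, the lookup returns every candidate instead of guessing.

```python
import errno

# Causes as listed in the commit message; ENOMEM appears under two of them.
DROP_CAUSES = {
    errno.ENOMEM: ["rcvbuf is full", "sk_filter returned an error"],
    errno.EINVAL: ["sk_filter returned an error"],
    errno.EACCES: ["sk_filter returned an error"],
    errno.ENOBUFS: ["__sk_mem_schedule returned an error"],
}


def drop_causes(rc):
    """Map a traced return value to the candidate drop causes."""
    if rc >= 0:
        return []  # not a drop
    # The message ends its sk_filter list with "etc.", so other codes exist.
    return DROP_CAUSES.get(-rc, ["unknown (sk_filter may return other errors)"])


print(drop_causes(-errno.ENOBUFS))
print(drop_causes(-errno.ENOMEM))
```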

    Signed-off-by: Satoru Moriya
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Satoru Moriya
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: Fix oops in jbd2_journal_remove_journal_head()
    jbd2: Remove obsolete parameters in the comments for some jbd2 functions
    ext4: fixed tracepoints cleanup
    ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
    ext4: Fix max file size and logical block counting of extent format file
    ext4: correct comments for ext4_free_blocks()

    Linus Torvalds
     

19 Jun, 2011

1 commit


16 Jun, 2011

2 commits

  • While testing for a memcg-aware swap token, I observed that the swap token
    was often grabbed by an intermittently running process (eg init, auditd)
    that never released it.

    Why?

    Some processes (eg init, auditd, audispd) wake up when a process exits.
    And when an exiting process leaves the swap token without an owner, the
    first process to page in can get it. Thus such intermittently running
    processes often get the token.

    And currently, swap token priority is only decreased in the page fault
    path. So if the process sleeps immediately after grabbing the swap token,
    its priority is never decreased. That's obviously undesirable.

    This patch implements very poor (and lightweight) priority aging. It only
    affects the above corner case and doesn't change the performance of
    swap-heavy workloads (eg multi process qsbench load).

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • This is useful for observing swap token activity.

    example output:

    zsh-1845 [000] 598.962716: update_swap_token_priority:
    mm=ffff88015eaf7700 old_prio=1 new_prio=0
    memtoy-1830 [001] 602.033900: update_swap_token_priority:
    mm=ffff880037a45880 old_prio=947 new_prio=949
    memtoy-1830 [000] 602.041509: update_swap_token_priority:
    mm=ffff880037a45880 old_prio=949 new_prio=951
    memtoy-1830 [000] 602.051959: update_swap_token_priority:
    mm=ffff880037a45880 old_prio=951 new_prio=953
    memtoy-1830 [000] 602.052188: update_swap_token_priority:
    mm=ffff880037a45880 old_prio=953 new_prio=955
    memtoy-1830 [001] 602.427184: put_swap_token:
    token_mm=ffff880037a45880
    zsh-1789 [000] 602.427281: replace_swap_token:
    old_token_mm= (null) old_prio=0 new_token_mm=ffff88015eaf7018
    new_prio=2
    zsh-1789 [001] 602.433456: update_swap_token_priority:
    mm=ffff88015eaf7018 old_prio=2 new_prio=4
    zsh-1789 [000] 602.437613: update_swap_token_priority:
    mm=ffff88015eaf7018 old_prio=4 new_prio=6
    zsh-1789 [000] 602.443924: update_swap_token_priority:
    mm=ffff88015eaf7018 old_prio=6 new_prio=8
    zsh-1789 [000] 602.451873: update_swap_token_priority:
    mm=ffff88015eaf7018 old_prio=8 new_prio=10
    zsh-1789 [001] 602.462639: update_swap_token_priority:
    mm=ffff88015eaf7018 old_prio=10 new_prio=12
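
Output like the above can be post-processed with a short script. The following is a sketch only: the regular expression is inferred from the sample lines shown here (including the wrap after the event name), not from any documented format, and the name priority_updates is hypothetical.

```python
import re

# A few events copied from the example output, wrapped as shown there.
SAMPLE = """\
zsh-1845 [000] 598.962716: update_swap_token_priority:
mm=ffff88015eaf7700 old_prio=1 new_prio=0
memtoy-1830 [001] 602.033900: update_swap_token_priority:
mm=ffff880037a45880 old_prio=947 new_prio=949
memtoy-1830 [000] 602.041509: update_swap_token_priority:
mm=ffff880037a45880 old_prio=949 new_prio=951
"""

# \s+ also matches newlines, so wrapped events are handled.
EVENT = re.compile(
    r"(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"update_swap_token_priority:\s+"
    r"mm=(?P<mm>[0-9a-f]+)\s+old_prio=(?P<old>\d+)\s+new_prio=(?P<new>\d+)"
)


def priority_updates(text):
    """Return (task, pid, mm, old_prio, new_prio) for each priority update."""
    return [
        (m["task"], int(m["pid"]), m["mm"], int(m["old"]), int(m["new"]))
        for m in EVENT.finditer(text)
    ]


for task, pid, mm, old, new in priority_updates(SAMPLE):
    print(f"{task}-{pid} mm={mm} prio {old} -> {new}")
```

Other event types in the output (put_swap_token, replace_swap_token) are ignored by this pattern and would need their own expressions.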

    Signed-off-by: KOSAKI Motohiro
    Acked-by: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

15 Jun, 2011

1 commit

  • Commit a26ac2455ffcf3 (rcu: move TREE_RCU from softirq to kthread)
    introduced a performance regression. In an AIM7 test, this commit degraded
    performance by about 40%.

    The commit runs rcu callbacks in a kthread instead of softirq. We observed
    a high rate of context switches caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per second
    caused by RCU's per-CPU kthread. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.
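
The 64k figure follows from simple arithmetic, sketched here under the assumption (implied but not stated above) that each per-CPU kthread is woken about once per scheduler tick:

```python
def kthread_wakeups_per_second(cpus, hz):
    # Assumption: one wakeup of the per-CPU kthread per tick on every CPU.
    return cpus * hz


print(kthread_wakeups_per_second(64, 1000))  # 64000
```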

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
     

08 Jun, 2011

4 commits

  • Note that it adds a little overhead to account for the moved/enqueued
    inodes from b_dirty to b_io. The "moved" accounting may be later used to
    limit the number of inodes that can be moved in one shot, in order to
    keep spinlock hold time under control.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • It is valuable to know how the dirty inodes are iterated and their IO size.

    "writeback_single_inode: bdi 8:0: ino=134246746 state=I_DIRTY_SYNC|I_SYNC age=414 index=0 to_write=1024 wrote=0"

    - "state" reflects inode->i_state at the end of writeback_single_inode()
    - "index" reflects mapping->writeback_index after the ->writepages() call
    - "to_write" is the wbc->nr_to_write at entrance of writeback_single_inode()
    - "wrote" is the number of pages actually written
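
A line in this format can be split into named fields for further analysis. This is a minimal sketch that assumes the layout of the single sample line above; the function name parse_single_inode is made up for illustration.

```python
import re

LINE = ("writeback_single_inode: bdi 8:0: ino=134246746 "
        "state=I_DIRTY_SYNC|I_SYNC age=414 index=0 to_write=1024 wrote=0")


def parse_single_inode(line):
    """Split one writeback_single_inode trace line into a dict."""
    event, rest = line.split(": ", 1)
    m = re.match(r"bdi (\S+): (.*)", rest)
    if m is None:
        raise ValueError("unexpected line format: %r" % line)
    # The remainder is a run of whitespace-separated key=value pairs.
    fields = dict(pair.split("=", 1) for pair in m.group(2).split())
    return {
        "event": event,
        "bdi": m.group(1),
        "ino": int(fields["ino"]),
        "state": fields["state"].split("|"),  # flags are joined with '|'
        "age": int(fields["age"]),
        "index": int(fields["index"]),
        "to_write": int(fields["to_write"]),
        "wrote": int(fields["wrote"]),
    }


print(parse_single_inode(LINE))
```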

    v2: add trace event writeback_single_inode_requeue as proposed by Dave.

    CC: Dave Chinner
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • Remove two unused struct writeback_control fields:

    .encountered_congestion (completely unused)
    .nonblocking (never set, checked/shown in XFS,NFS/btrfs)

    The .for_background check in nfs_write_inode() is also removed btw,
    as .for_background implies WB_SYNC_NONE.

    Reviewed-by: Jan Kara
    Proposed-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • When wbc.more_io was first introduced, it indicated whether there was
    at least one superblock whose s_more_io contained more IO work. Now with
    the per-bdi writeback, it can be replaced with a simple b_more_io test.

    Acked-by: Jan Kara
    Acked-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Signed-off-by: Wu Fengguang

    Wu Fengguang