22 Oct, 2010

5 commits

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
    sched: Export account_system_vtime()
    sched: Call tick_check_idle before __irq_enter
    sched: Remove irq time from available CPU power
    sched: Do not account irq time to current task
    x86: Add IRQ_TIME_ACCOUNTING
    sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
    sched: Add a PF flag for ksoftirqd identification
    sched: Consolidate account_system_vtime extern declaration
    sched: Fix softirq time accounting
    sched: Drop group_capacity to 1 only if local group has extra capacity
    sched: Force balancing on newidle balance if local group has capacity
    sched: Set group_imb only a task can be pulled from the busiest cpu
    sched: Do not consider SCHED_IDLE tasks to be cache hot
    sched: Drop all load weight manipulation for RT tasks
    sched: Create special class for stop/migrate work
    sched: Unindent labels
    sched: Comment updates: fix default latency and granularity numbers
    tracing/sched: Add sched_pi_setprio tracepoint
    sched: Give CPU bound RT tasks preference
    sched: Try not to migrate higher priority RT tasks
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits)
    tracing: Fix compile issue for trace_sched_wakeup.c
    [S390] hardirq: remove pointless header file includes
    [IA64] Move local_softirq_pending() definition
    perf, powerpc: Fix power_pmu_event_init to not use event->ctx
    ftrace: Remove recursion between recordmcount and scripts/mod/empty
    jump_label: Add COND_STMT(), reducer wrappery
    perf: Optimize sw events
    perf: Use jump_labels to optimize the scheduler hooks
    jump_label: Add atomic_t interface
    jump_label: Use more consistent naming
    perf, hw_breakpoint: Fix crash in hw_breakpoint creation
    perf: Find task before event alloc
    perf: Fix task refcount bugs
    perf: Fix group moving
    irq_work: Add generic hardirq context callbacks
    perf_events: Fix transaction recovery in group_sched_in()
    perf_events: Fix bogus AMD64 generic TLB events
    perf_events: Fix bogus context time tracking
    tracing: Remove parent recording in latency tracer graph options
    tracing: Use one prologue for the preempt irqs off tracer function tracers
    ...

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits)
    sched: fix RCU lockdep splat from task_group()
    rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value
    sched: suppress RCU lockdep splat in task_fork_fair
    net: suppress RCU lockdep false positive in sock_update_classid
    rcu: move check from rcu_dereference_bh to rcu_read_lock_bh_held
    rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter
    rcu: Add tracing data to support queueing models
    rcu: fix sparse errors in rcutorture.c
    rcu: only one evaluation of arg in rcu_dereference_check() unless sparse
    kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC
    rcu: fix _oddness handling of verbose stall warnings
    rcu: performance fixes to TINY_PREEMPT_RCU callback checking
    rcu: upgrade stallwarn.txt documentation for CPU-bound RT processes
    vhost: add __rcu annotations
    rcu: add comment stating that list_empty() applies to RCU-protected lists
    rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU
    rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU
    rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods
    rcu: document ways of stalling updates in low-memory situations
    rcu: repair code-duplication FIXMEs
    ...

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (26 commits)
    selinux: include vmalloc.h for vmalloc_user
    secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not set
    selinux: implement mmap on /selinux/policy
    SELinux: allow userspace to read policy back out of the kernel
    SELinux: drop useless (and incorrect) AVTAB_MAX_SIZE
    SELinux: deterministic ordering of range transition rules
    kernel: roundup should only reference arguments once
    kernel: rounddown helper function
    secmark: export secctx, drop secmark in procfs
    conntrack: export lsm context rather than internal secid via netlink
    security: secid_to_secctx returns len when data is NULL
    secmark: make secmark object handling generic
    secmark: do not return early if there was no error
    AppArmor: Ensure the size of the copy is < the buffer allocated to hold it
    TOMOYO: Print URL information before panic().
    security: remove unused parameter from security_task_setscheduler()
    tpm: change 'tpm_suspend_pcr' to be module parameter
    selinux: fix up style problem on /selinux/status
    selinux: change to new flag variable
    selinux: really fix dependency causing parallel compile failure.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (22 commits)
    ceph: do not carry i_lock for readdir from dcache
    fs/ceph/xattr.c: Use kmemdup
    rbd: passing wrong variable to bvec_kunmap_irq()
    rbd: null vs ERR_PTR
    ceph: fix num_pages_free accounting in pagelist
    ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl.
    ceph: don't crash when passed bad mount options
    ceph: fix debugfs warnings
    block: rbd: removing unnecessary test
    block: rbd: fixed may leaks
    ceph: switch from BKL to lock_flocks()
    ceph: preallocate flock state without locks held
    ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor
    ceph: use mapping->nrpages to determine if mapping is empty
    ceph: only invalidate on check_caps if we actually have pages
    ceph: do not hide .snap in root directory
    rbd: introduce rados block device (rbd), based on libceph
    ceph: factor out libceph from Ceph file system
    ceph-rbd: osdc support for osd call and rollback operations
    ceph: messenger and osdc changes for rbd
    ...

    Linus Torvalds
     

21 Oct, 2010

10 commits

    When CONFIG_NF_CONNTRACK_SECMARK is not set we accidentally attempt to use
    the secmark field of struct nf_conn. The problem is that when that config
    option isn't set, the field doesn't exist. Whoops. Wrap the incorrect usage
    in a check for that config option.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
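
    A minimal sketch of the pattern this fix applies. It uses a hypothetical
    struct and a hypothetical CONFIG_DEMO_SECMARK macro rather than the real
    struct nf_conn: a field that only exists under a config option may only be
    touched inside the same #ifdef, so the code builds whether or not the
    option is set.

```c
#include <stdint.h>

/* Hypothetical stand-in for a struct whose field only exists when a
 * config option is enabled (modelled loosely on nf_conn's secmark). */
struct conn {
	int state;
#ifdef CONFIG_DEMO_SECMARK
	uint32_t secmark;
#endif
};

/* Every access to the optional field sits behind the same #ifdef,
 * so this compiles with and without CONFIG_DEMO_SECMARK defined. */
static uint32_t conn_get_secmark(const struct conn *c)
{
#ifdef CONFIG_DEMO_SECMARK
	return c->secmark;
#else
	(void)c;
	return 0;	/* no field, no label */
#endif
}
```

    Without the guard inside conn_get_secmark(), a build with the option
    disabled would fail to compile, which is the class of breakage the commit
    describes.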
     
  • The current secmark code exports a secmark= field which just indicates if
    there is special labeling on a packet or not. We drop this field as it
    isn't particularly useful and instead export a new field secctx= which is
    the actual human readable text label.

    Signed-off-by: Eric Paris
    Acked-by: Patrick McHardy
    Signed-off-by: James Morris

    Eric Paris
     
    The conntrack code can export the internal secid to userspace. These are
    dynamic, can change on LSM changes, and have no meaning in userspace. We
    should instead be sending LSM contexts to userspace. This patch sends
    the secctx (rather than the secid) to userspace over the netlink socket. We
    use a new field CTA_SECCTX and stop using the old CTA_SECMARK field, since
    it did not send particularly useful information.

    Signed-off-by: Eric Paris
    Reviewed-by: Paul Moore
    Acked-by: Patrick McHardy
    Signed-off-by: James Morris

    Eric Paris
     
    Right now secmark has lots of direct SELinux calls. Use LSM calls
    throughout and remove all SELinux-specific knowledge. The only
    SELinux-specific knowledge we leave is the mode, and the only point of
    keeping it is to make sure that other LSMs at least test this generic code
    before they assume it works. (They may also have to make changes if they
    do not represent labels as strings.)

    Signed-off-by: Eric Paris
    Acked-by: Paul Moore
    Acked-by: Patrick McHardy
    Signed-off-by: James Morris

    Eric Paris
     
    Commit 4a5a5c73 attempted to pass decent error messages back to userspace
    for netfilter errors. In xt_SECMARK.c, however, the patch got it wrong: it
    returned early on 0 (i.e. no error) and never finished setting up secmark.
    This results in a kernel BUG if you use SECMARK.

    Signed-off-by: Eric Paris
    Acked-by: Paul Moore
    Signed-off-by: James Morris

    Eric Paris
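
    The bug class is easy to reproduce outside the kernel. This sketch uses
    hypothetical names and does not reproduce the actual xt_SECMARK.c code; it
    only shows how testing "err <= 0" instead of a non-zero err turns the
    success path into an early return, so the remaining setup never runs.

```c
#include <errno.h>

struct target_info {
	int setup_done;
};

/* Hypothetical helper: 0 on success, negative errno on failure. */
static int lookup_label(int valid)
{
	return valid ? 0 : -EINVAL;
}

/* Buggy shape: "err <= 0" is also true on success, so we bail out early. */
static int checkentry_buggy(struct target_info *info, int valid)
{
	int err = lookup_label(valid);

	if (err <= 0)
		return err;
	info->setup_done = 1;	/* never reached */
	return 0;
}

/* Fixed shape: only a real (non-zero) error ends setup early. */
static int checkentry_fixed(struct target_info *info, int valid)
{
	int err = lookup_label(valid);

	if (err)
		return err;
	info->setup_done = 1;
	return 0;
}
```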
     
  • Decrement the free page counter when removing a page from the free_list.

    Signed-off-by: Sage Weil

    Sage Weil
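
    A small sketch of the invariant being restored, assuming a hypothetical
    singly linked free list that keeps its own counter (the struct and
    function names are illustrative, not the ceph pagelist code): every
    removal from the list must be paired with a decrement, or the counter
    drifts above the real list length.

```c
#include <stddef.h>

struct page_node {
	struct page_node *next;
};

/* Hypothetical free list; num_pages_free must equal the list length. */
struct free_list {
	struct page_node *head;
	size_t num_pages_free;
};

static void free_list_push(struct free_list *fl, struct page_node *p)
{
	p->next = fl->head;
	fl->head = p;
	fl->num_pages_free++;
}

/* Removing a page must also decrement the counter. */
static struct page_node *free_list_pop(struct free_list *fl)
{
	struct page_node *p = fl->head;

	if (!p)
		return NULL;
	fl->head = p->next;
	fl->num_pages_free--;
	return p;
}
```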
     
  • This only happened when parse_extra_token was not passed
    to ceph_parse_option() (hence, only happened in rbd).

    Signed-off-by: Yehuda Sadeh

    Yehuda Sadeh
     
  • These facilitate preallocation of pages so that we can encode into the pagelist
    in an atomic context.

    Signed-off-by: Greg Farnum
    Signed-off-by: Sage Weil

    Greg Farnum
     
  • The rados block device (rbd), based on osdblk, creates a block device
    that is backed by objects stored in the Ceph distributed object storage
    cluster. Each device consists of a single metadata object and data
    striped over many data objects.

    The rbd driver supports read-only snapshots.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
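
    To make "data striped over many data objects" concrete, here is a sketch
    that assumes fixed-size data objects of 2^obj_order bytes. obj_order and
    map_offset() are illustrative assumptions, not the rbd driver's code. With
    that layout, a byte offset in the block device maps to an object index and
    an offset inside that object with a shift and a mask.

```c
#include <stdint.h>

/* Where a device byte offset lands, assuming 2^obj_order-byte objects. */
struct stripe_pos {
	uint64_t obj_index;	/* which data object */
	uint64_t obj_offset;	/* byte offset inside that object */
};

static struct stripe_pos map_offset(uint64_t dev_off, unsigned int obj_order)
{
	struct stripe_pos pos;

	pos.obj_index = dev_off >> obj_order;
	pos.obj_offset = dev_off & ((UINT64_C(1) << obj_order) - 1);
	return pos;
}
```

    For example, with obj_order = 22 (4 MiB objects), offset 5 MiB falls in
    object 1 at offset 1 MiB.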
     
  • This factors out protocol and low-level storage parts of ceph into a
    separate libceph module living in net/ceph and include/linux/ceph. This
    is mostly a matter of moving files around. However, a few key pieces
    of the interface change as well:

    - ceph_client becomes ceph_fs_client and ceph_client, where the latter
    captures the mon and osd clients, and the fs_client gets the mds client
    and file system specific pieces.
    - Mount option parsing and debugfs setup are correspondingly split into
    two pieces.
    - The mon client gets a generic handler callback for otherwise unknown
    messages (mds map, in this case).
    - The basic supported/required feature bits can be expanded (and are by
    ceph_fs_client).

    No functional change, aside from some subtle error handling cases that got
    cleaned up in the refactoring process.

    Signed-off-by: Sage Weil

    Yehuda Sadeh
     

19 Oct, 2010

1 commit

  • Peter Zijlstra found a bug in the way softirq time is accounted in
    VIRT_CPU_ACCOUNTING on this thread:

    http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

    The problem is that softirq processing uses local_bh_disable() internally.
    Later in the flow, there is no way to tell whether a softirq is being
    processed or bh has merely been disabled. So a hardirq that arrives while
    bh is disabled results in time being wrongly accounted as softirq.

    Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
    as well, because account_system_time() in normal tick-based accounting
    also uses softirq_count, which is set even outside softirq whenever bh is
    disabled.

    Peter also suggested a solution: use 2*SOFTIRQ_OFFSET as the irq count
    for local_bh_{disable,enable} and just SOFTIRQ_OFFSET during softirq
    processing. This patch does that and adds an API, in_serving_softirq(),
    which returns whether we are currently processing a softirq.

    It also changes one of the usages of softirq_count in
    net/sched/cls_cgroup.c to in_serving_softirq.

    It looks like many usages of in_softirq really want in_serving_softirq.
    Those changes can be made individually, on a case-by-case basis.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Venkatesh Pallipadi
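
    A minimal userspace model of the counting scheme described above,
    assuming the mainline layout where the softirq count occupies bits 8-15 of
    preempt_count (SOFTIRQ_SHIFT = 8). The macro names mirror the kernel's,
    but the global counter and the *_model() helpers are simplified stand-ins,
    not the kernel implementation. Disabling bh adds 2*SOFTIRQ_OFFSET and
    serving a softirq adds SOFTIRQ_OFFSET, so the lowest bit of the softirq
    field alone says whether a softirq is really being served.

```c
#include <stdbool.h>

#define SOFTIRQ_SHIFT		8
#define SOFTIRQ_BITS		8
#define SOFTIRQ_OFFSET		(1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK		(((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
/* bh disabling counts in steps of two, softirq serving in steps of one */
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

/* Simplified stand-in for the per-task preempt_count. */
static unsigned long preempt_count_model;

static unsigned long softirq_count(void)
{
	return preempt_count_model & SOFTIRQ_MASK;
}

/* True if bh is disabled OR a softirq is running (the ambiguous test). */
static bool in_softirq(void)
{
	return softirq_count() != 0;
}

/* True only while a softirq handler is actually being served. */
static bool in_serving_softirq(void)
{
	return (softirq_count() & SOFTIRQ_OFFSET) != 0;
}

static void local_bh_disable_model(void) { preempt_count_model += SOFTIRQ_DISABLE_OFFSET; }
static void local_bh_enable_model(void)  { preempt_count_model -= SOFTIRQ_DISABLE_OFFSET; }
static void softirq_enter_model(void)    { preempt_count_model += SOFTIRQ_OFFSET; }
static void softirq_exit_model(void)     { preempt_count_model -= SOFTIRQ_OFFSET; }
```

    Because bh-disable increments are always even multiples of
    SOFTIRQ_OFFSET, nesting local_bh_disable() inside a running softirq still
    leaves in_serving_softirq() true.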
     

16 Oct, 2010

1 commit

    Don't try to "optimize" rds_page_copy_user() by using kmap_atomic() and
    the unsafe atomic user mode accessor functions. It's actually slower
    than the straightforward code on any reasonably modern CPU.

    Back when the code was written (although probably not by the time it was
    actually merged), 32-bit x86 may have been the dominant
    architecture. And there kmap_atomic() can be a lot faster than kmap()
    (unless you have very good locality, in which case the virtual address
    caching by kmap() can overcome all the downsides).

    These days x86-64 may not be more populous yet, but it's getting there
    (and if you care about performance, it's definitely already there:
    you'd have upgraded your CPUs in the last few years). And on
    x86-64, the non-kmap_atomic() version is faster, simply because the code
    is simpler and doesn't have the "re-try page fault" case.

    People with old hardware are not likely to care about RDS anyway, and
    the optimization for the 32-bit case is simply buggy, since it doesn't
    verify the user addresses properly.

    Reported-by: Dan Rosenberg
    Acked-by: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Oct, 2010

2 commits

    Several other ethtool functions potentially leave heap memory uncleared
    when drivers do not fill it in. Some interfaces appear safe (eeprom, etc.),
    in that the sizes are well controlled. In some situations (e.g. unchecked
    error conditions), parts of the heap buffer remain unchanged before being
    copied back to userspace. Note that these are less of an issue, since they
    all require CAP_NET_ADMIN.

    Cc: stable@kernel.org
    Signed-off-by: Kees Cook
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Kees Cook
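
    A userspace sketch of the underlying pattern, assuming a hypothetical
    driver callback that may fill only part of the buffer it is handed.
    Allocating the buffer zeroed (calloc() here, the userspace analogue of
    kzalloc()) guarantees that any bytes the callback leaves untouched are
    zeros rather than stale heap contents when the buffer is copied back to
    the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver hook: deliberately fills only the first half. */
static size_t demo_driver_fill(unsigned char *buf, size_t len)
{
	size_t n = len / 2;

	memset(buf, 0xAB, n);
	return n;
}

/* Zeroed allocation: bytes the driver skips cannot leak old heap data. */
static unsigned char *get_reply(size_t len)
{
	unsigned char *buf = calloc(len, 1);

	if (!buf)
		return NULL;
	demo_driver_fill(buf, len);
	return buf;	/* caller copies it out, then free()s it */
}
```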
     
    Stanse found that mpc_push() frees skb and then dereferences it. This is
    a typo: new_skb should be dereferenced there.

    Signed-off-by: Jiri Slaby
    Cc: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Slaby
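
    A hypothetical sketch of the bug class, not the mpc_push() code itself:
    once a helper has replaced a buffer and freed the original, only the new
    pointer may be dereferenced. struct buf and grow() are illustrative
    stand-ins for struct sk_buff and the copy-and-free step.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	size_t len;
};

/* On success the original is freed; only the returned pointer is valid. */
static struct buf *grow(struct buf *old, size_t extra)
{
	struct buf *new_buf = malloc(sizeof(*new_buf));

	if (!new_buf)
		return NULL;		/* 'old' is still valid on failure */
	new_buf->len = old->len + extra;
	free(old);			/* 'old' must not be touched after this */
	return new_buf;
}

static size_t push(struct buf *b, size_t extra)
{
	struct buf *new_b = grow(b, extra);
	size_t len;

	if (!new_b) {
		free(b);
		return 0;
	}
	/* Reading b->len here would be a use-after-free; use new_b. */
	len = new_b->len;
	free(new_b);
	return len;
}
```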
     

09 Oct, 2010

2 commits


08 Oct, 2010

2 commits

  • Conflicts:
    arch/x86/kernel/module.c

    Merge reason: Resolve the conflict, pick up fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • > ===================================================
    > [ INFO: suspicious rcu_dereference_check() usage. ]
    > ---------------------------------------------------
    > include/linux/cgroup.h:542 invoked rcu_dereference_check() without protection!
    >
    > other info that might help us debug this:
    >
    >
    > rcu_scheduler_active = 1, debug_locks = 0
    > 1 lock held by swapper/1:
    > #0: (net_mutex){+.+.+.}, at: []
    > register_pernet_subsys+0x1f/0x47
    >
    > stack backtrace:
    > Pid: 1, comm: swapper Not tainted 2.6.35.4-28.fc14.x86_64 #1
    > Call Trace:
    > [] lockdep_rcu_dereference+0xaa/0xb3
    > [] sock_update_classid+0x7c/0xa2
    > [] sk_alloc+0x6b/0x77
    > [] __netlink_create+0x37/0xab
    > [] ? rtnetlink_rcv+0x0/0x2d
    > [] netlink_kernel_create+0x74/0x19d
    > [] ? __mutex_lock_common+0x339/0x35b
    > [] rtnetlink_net_init+0x2e/0x48
    > [] ops_init+0xe9/0xff
    > [] register_pernet_operations+0xab/0x130
    > [] register_pernet_subsys+0x2e/0x47
    > [] rtnetlink_init+0x53/0x102
    > [] netlink_proto_init+0x126/0x143
    > [] ? netlink_proto_init+0x0/0x143
    > [] do_one_initcall+0x72/0x186
    > [] kernel_init+0x23b/0x2c9
    > [] kernel_thread_helper+0x4/0x10
    > [] ? restore_args+0x0/0x30
    > [] ? kernel_init+0x0/0x2c9
    > [] ? kernel_thread_helper+0x0/0x10

    The sock_update_classid() function calls task_cls_classid(current),
    but the calling task cannot go away, so there is no danger of
    the associated structures disappearing. Insert an RCU read-side
    critical section to suppress the false positive.

    Reported-by: Subrata Modak
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

07 Oct, 2010

6 commits


06 Oct, 2010

1 commit

    caif_connect() might dereference a netdevice after calling dev_put() on it.

    It also doesn't check the return value of dev_get_by_index() and could
    dereference a NULL pointer.

    Fix it, using RCU to avoid taking a reference.

    Signed-off-by: Eric Dumazet
    CC: Sjur Braendeland
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2010

3 commits

    skb_headroom() is unsigned, so "skb_headroom(skb) + toff" is also
    unsigned and can't be less than zero. This test was added in 66d50d25
    ("u32: negative offset fix"), which was supposed to fix a regression.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
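
    Why the test is dead code, shown with hypothetical helper names (this
    illustrates the arithmetic, not the change that was applied to cls_u32).
    Adding a negative int to an unsigned int converts the int to unsigned, so
    the sum wraps to a large value and "< 0" can never be true. Doing the sum
    in a wider signed type is one way to express the intended check.

```c
#include <stdbool.h>

/* headroom + toff is evaluated in unsigned arithmetic: always false. */
static bool offset_before_head_broken(unsigned int headroom, int toff)
{
	return headroom + toff < 0;
}

/* One way to express the intended check: sum in a wider signed type. */
static bool offset_before_head(unsigned int headroom, int toff)
{
	return (long long)headroom + toff < 0;
}
```

    Compilers flag the first form with a warning like "comparison of unsigned
    expression < 0 is always false" when extra warnings are enabled.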
     
    L2CAP doesn't permit changing values such as MTU, FCS and TxWindow while
    the connection is alive; they can only be set before the
    connection/configuration process. Changing them afterwards can lead to
    bugs in L2CAP operation.

    Signed-off-by: Gustavo F. Padovan

    Gustavo F. Padovan
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    vlan: dont drop packets from unknown vlans in promiscuous mode
    Phonet: Correct header retrieval after pskb_may_pull
    um: Proper Fix for f25c80a4: remove duplicate structure field initialization
    ip_gre: Fix dependencies wrt. ipv6.
    net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
    iwl3945: queue the right work if the scan needs to be aborted
    mac80211: fix use-after-free

    Linus Torvalds
     

04 Oct, 2010

6 commits

  • The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
    array and attempts to ensure that only a supported hmac entry is
    returned. The current code fails to do this properly - if the last id
    in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
    id integer remains set after exiting the loop, and the address of an
    out-of-bounds entry will be returned and subsequently used in the parent
    function, causing potentially ugly memory corruption. This patch resets
    the id integer to 0 on encountering an invalid id so that NULL will be
    returned after finishing the loop if no valid ids are found.

    Signed-off-by: Dan Rosenberg
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Dan Rosenberg
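
    A standalone sketch of the corrected loop, with a hypothetical four-entry
    name table and DEMO_HMAC_ID_MAX standing in for SCTP_AUTH_HMAC_ID_MAX (the
    real table and types differ). A NULL name marks an unsupported id.
    Resetting id on every rejected entry means a bad final entry cannot leave
    an out-of-range index behind after the loop.

```c
#include <stddef.h>

#define DEMO_HMAC_ID_MAX 3	/* hypothetical stand-in for SCTP_AUTH_HMAC_ID_MAX */

/* Hypothetical table indexed by id; NULL means "not supported". */
static const char *const hmac_names[DEMO_HMAC_ID_MAX + 1] = {
	NULL, "sha1", NULL, "sha256",
};

/* Return the first supported entry from the peer's list, or NULL. */
static const char *get_hmac(const unsigned short *ids, size_t n)
{
	unsigned short id = 0;

	for (size_t i = 0; i < n; i++) {
		id = ids[i];
		/* range check first, so hmac_names[id] is never out of bounds */
		if (id > DEMO_HMAC_ID_MAX || !hmac_names[id]) {
			id = 0;		/* forget the rejected id */
			continue;
		}
		break;
	}
	return id ? hmac_names[id] : NULL;
}
```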
     
  • Two user-controlled allocations in SCTP are subsequently dereferenced as
    sockaddr structs, without checking if the dereferenced struct members fall
    beyond the end of the allocated chunk. There doesn't appear to be any
    information leakage here based on how these members are used and
    additional checking, but it's still worth fixing.

    [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
    Signed-off-by: Dan Rosenberg
    Acked-by: Vlad Yasevich
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
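
    A generic sketch of the kind of check described, not the SCTP code paths
    themselves: before reading a member of a caller-sized buffer as a struct
    sockaddr, confirm that the buffer extends past the end of that member.
    read_family() is a hypothetical helper; offsetof() keeps the check
    portable across sockaddr layouts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Is the buffer long enough to contain sa_family in full? */
static bool sa_family_in_bounds(size_t buflen)
{
	return buflen >= offsetof(struct sockaddr, sa_family) +
			 sizeof(((struct sockaddr *)0)->sa_family);
}

/* Hypothetical accessor: returns -1 instead of reading past the end. */
static int read_family(const void *buf, size_t buflen)
{
	if (!sa_family_in_bounds(buflen))
		return -1;
	return ((const struct sockaddr *)buf)->sa_family;
}
```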
     
  • A recent patch to allow IGMPv2 responses to IGMPv3 queries
    bypasses length checks for valid query lengths, incorrectly
    resets the v2_seen timer, and does not support IGMPv1.

    The following patch responds with a v2 report as required
    by IGMPv2 while correcting the other problems introduced
    by the patch.

    Signed-Off-By: David L Stevens

    Signed-off-by: David S. Miller

    David Stevens
     
  • This reverts commit e81963b180ac502fda0326edf059b1e29cdef1a2.

    LRO is now deprecated in favour of GRO, and only a few drivers use it,
    so it is desirable to build it as a module in distribution kernels.

    The original change to prevent building it as a module was made in an
    attempt to avoid the case where some dependents are set to y and some
    to m, and INET_LRO can be set to m rather than y. However, the
    Kconfig system will reliably set INET_LRO=y in this case.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
    This patch fixes the condition (3rd arg) passed to sk_wait_event() in
    sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
    causes the following soft lockup in tcp_sendmsg() when the global TCP
    memory pool has been exhausted.

    >>> snip <<<

    localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
    localhost kernel: CPU 3:
    localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
    localhost kernel:
    localhost kernel: Call Trace:
    localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
    localhost kernel: [] autoremove_wake_function+0x0/0x40
    localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
    localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
    localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
    localhost kernel: [] autoremove_wake_function+0x0/0x40
    localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
    localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
    localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
    localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83

    >>> snip <<<

    What is happening is that the sk_wait_event() condition passed from
    sk_stream_wait_memory() evaluates to true in the case of global TCP memory
    exhaustion. This is because both sk_stream_memory_free() and vm_wait are
    true, which causes sk_wait_event() to *not* call schedule_timeout().
    Hence sk_stream_wait_memory() returns immediately to the caller without
    sleeping. The caller then retries the allocation, which fails again and
    again calls sk_stream_wait_memory(), and so on.

    [ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088
    ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]

    Signed-off-by: Nagendra Singh Tomar
    Signed-off-by: David S. Miller

    Nagendra Tomar
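
    The wake-up predicate can be modelled in isolation. This sketch uses a
    hypothetical struct whose booleans stand in for sk->sk_err,
    SEND_SHUTDOWN, sk_stream_memory_free() and vm_wait. sk_wait_event() only
    calls schedule_timeout() when the condition is false. The "fixed" form,
    which treats a pending vm_wait as a reason to sleep, is our reading of the
    change and should be checked against the actual patch.

```c
#include <stdbool.h>

/* Simplified inputs to the sk_wait_event() condition (hypothetical). */
struct wait_state {
	bool sk_err;		/* socket error pending */
	bool send_shutdown;	/* SEND_SHUTDOWN set */
	bool memory_free;	/* sk_stream_memory_free(sk) */
	long vm_wait;		/* non-zero: global pool exhausted, back off */
};

/* Buggy: true under global exhaustion, so the caller never sleeps. */
static bool wake_cond_buggy(const struct wait_state *s)
{
	return s->sk_err || s->send_shutdown || (s->memory_free && s->vm_wait);
}

/* Fixed: a pending vm_wait keeps the condition false, so we sleep. */
static bool wake_cond_fixed(const struct wait_state *s)
{
	return s->sk_err || s->send_shutdown || (s->memory_free && !s->vm_wait);
}
```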
     
  • Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

01 Oct, 2010

1 commit

  • Roger Luethi noticed packets for unknown VLANs getting silently dropped
    even in promiscuous mode.

    Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common()
    before drops.

    As suggested by Patrick, mark such packets to have skb->pkt_type set to
    PACKET_OTHERHOST to make sure they are dropped by IP stack.

    Reported-by: Roger Luethi
    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
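
    A sketch of the receive decision described above, using a hypothetical
    packet struct and an enum standing in for skb->pkt_type and
    PACKET_OTHERHOST. A packet for an unknown VLAN is kept when the device is
    promiscuous, but marked as addressed to another host so the IP stack will
    still discard it.

```c
#include <stdbool.h>

enum demo_pkt_type { PKT_HOST, PKT_OTHERHOST };	/* hypothetical stand-ins */

struct pkt {
	enum demo_pkt_type type;
};

/* Returns false if the packet should be dropped. */
static bool vlan_rx_accept(struct pkt *p, bool vlan_known, bool promisc)
{
	if (vlan_known)
		return true;		/* normal delivery */
	if (!promisc)
		return false;		/* unknown VLAN, not promiscuous: drop */
	p->type = PKT_OTHERHOST;	/* keep for sniffers, not for IP */
	return true;
}
```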