23 Mar, 2011

33 commits

  • list_del() leaves poison in the prev and next pointers. The next
    list_empty() will compare those poisons, and say the list isn't empty.
    Any list operations that assume the node is on a list because of such a
    check will be fooled into dereferencing poison. One needs to INIT the
    node after the del, and fortunately there's already a wrapper for that -
    list_del_init().

    Some of the dels are followed by deallocations, so can be ignored, and one
    can be merged with an add to make a move. Apart from that, I erred on the
    side of caution in making nodes list_empty()-queriable.

    Signed-off-by: Phil Carmody
    Reviewed-by: Paul Menage
    Cc: Li Zefan
    Acked-by: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
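
    The poison problem described above can be reproduced in a small userspace sketch. The helpers below are simplified stand-ins for the kernel's list API, and the DEMO_POISON values and node_looks_empty_after() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for <linux/list.h>; not the kernel's code. */
struct list_head { struct list_head *next, *prev; };

/* Illustrative poison values (assumption: the real constants differ). */
#define DEMO_POISON1 ((struct list_head *)0x100)
#define DEMO_POISON2 ((struct list_head *)0x200)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* A node counts as "empty" only when it points back at itself. */
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void unlink_entry(struct list_head *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
}

/* Unlinks the node and leaves poison in its pointers. */
static void list_del(struct list_head *e)
{
    unlink_entry(e);
    e->next = DEMO_POISON1;
    e->prev = DEMO_POISON2;
}

/* Unlinks the node and re-initializes it, so list_empty() is true again. */
static void list_del_init(struct list_head *e)
{
    unlink_entry(e);
    INIT_LIST_HEAD(e);
}

/* Removes a node with the chosen helper and reports list_empty(node). */
static bool node_looks_empty_after(bool use_del_init)
{
    struct list_head head, node;

    INIT_LIST_HEAD(&head);
    INIT_LIST_HEAD(&node);
    list_add(&node, &head);
    if (use_del_init)
        list_del_init(&node);
    else
        list_del(&node);
    return list_empty(&node);
}
```

    After list_del() the node compares poison against its own address, so it does not look empty; after list_del_init() it does.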
     
  • The oom killer naturally defers killing anything if it finds an eligible
    task that is already exiting and has yet to detach its ->mm. This avoids
    unnecessarily killing tasks when one is already in the exit path and may
    free enough memory that the oom killer is no longer needed. This is
    detected by PF_EXITING since threads that have already detached their ->mm
    are no longer considered at all.

    The problem with always deferring when a thread is PF_EXITING, however, is
    that it may never actually exit when being traced, specifically if another
    task is tracing it with PTRACE_O_TRACEEXIT. The oom killer does not want
    to defer in this case since there is no guarantee that thread will ever
    exit without intervention.

    This patch will now only defer the oom killer when a thread is PF_EXITING
    and no ptracer has stopped its progress in the exit path. It also ensures
    that a child is sacrificed for the chosen parent only if it has a
    different ->mm as the comment implies: this ensures that the thread group
    leader is always targeted appropriately.

    Signed-off-by: David Rientjes
    Reported-by: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Andrey Vagin
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • We shouldn't defer oom killing if a thread has already detached its ->mm
    and still has TIF_MEMDIE set. Memory needs to be freed, so find and kill
    other threads that pin the same ->mm or find another task to kill.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • This patch prevents unnecessary oom kills or kernel panics by reverting
    two commits:

    495789a5 (oom: make oom_score to per-process value)
    cef1d352 (oom: multi threaded process coredump don't make deadlock)

    First, 495789a5 (oom: make oom_score to per-process value) ignores the
    fact that all threads in a thread group do not necessarily exit at the
    same time.

    It is imperative that select_bad_process() detect threads that are in the
    exit path, specifically those with PF_EXITING set, to prevent needlessly
    killing additional tasks. If a process is oom killed and the thread group
    leader exits, select_bad_process() cannot detect the other threads that
    are PF_EXITING by iterating over only processes. Thus, it currently
    chooses another task unnecessarily for oom kill or panics the machine when
    nothing else is eligible.

    By iterating over threads instead, it is possible to detect threads that
    are exiting and nominate them for oom kill so they get access to memory
    reserves.

    Second, cef1d352 (oom: multi threaded process coredump don't make
    deadlock) erroneously avoids making the oom killer a no-op when an
    eligible thread other than current is found to be exiting. We want to
    detect this situation so that we may allow that exiting thread time to
    exit and free its memory; if it is able to exit on its own, that should
    free memory so current is no longer oom. If it is not able to exit on its
    own, the oom killer will nominate it for oom kill which, in this case,
    only means it will get access to memory reserves.

    Without this change, it is easy for the oom killer to unnecessarily target
    tasks when all threads of a victim don't exit before the thread group
    leader or, in the worst case, panic the machine.

    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Andrey Vagin
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • If an administrator tries to swapon a file backed by NFS, the inode mutex is
    taken (as it is for any swapfile), but the file is later identified as a bad
    swapfile due to the lack of bmap, and swapon tries to clean up. During
    cleanup, an attempt is made to close the file with inode->i_mutex still held.
    Closing an NFS file syncs it, which tries to acquire the inode mutex, leading
    to deadlock. If lockdep is enabled, the following appears on the console:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.38-rc8-autobuild #1
    ---------------------------------------------
    swapon/2192 is trying to acquire lock:
    (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c

    but task is already holding lock:
    (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

    other info that might help us debug this:
    1 lock held by swapon/2192:
    #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

    stack backtrace:
    Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1
    Call Trace:
    __lock_acquire+0x2eb/0x1623
    find_get_pages_tag+0x14a/0x174
    pagevec_lookup_tag+0x25/0x2e
    vfs_fsync_range+0x47/0x7c
    lock_acquire+0xd3/0x100
    vfs_fsync_range+0x47/0x7c
    nfs_flush_one+0x0/0xdf [nfs]
    mutex_lock_nested+0x40/0x2b1
    vfs_fsync_range+0x47/0x7c
    vfs_fsync_range+0x47/0x7c
    vfs_fsync+0x1c/0x1e
    nfs_file_flush+0x64/0x69 [nfs]
    filp_close+0x43/0x72
    sys_swapon+0xa39/0xae7
    sysret_check+0x2e/0x69
    system_call_fastpath+0x16/0x1b

    This patch releases the mutex if it's held before calling filp_close()
    so swapon fails as expected without deadlock when the swapfile is backed
    by NFS. If accepted for 2.6.39, it should also be considered a -stable
    candidate for 2.6.38 and 2.6.37.

    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: [2.6.37+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • syncfs() is duplicating name_to_handle_at() due to a merging mistake.

    Cc: Sage Weil
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Add statistics for this_cmpxchg_double failures
    slub: Add missing irq restore for the OOM path

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    [net/9p]: Introduce basic flow-control for VirtIO transport.
    9p: use the updated offset given by generic_write_checks
    [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
    [net/9p] Set the condition just before waking up.
    [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
    fs/9p: Add v9fs_dentry2v9ses
    fs/9p: Attach writeback_fid on first open with WR flag
    fs/9p: Open writeback fid in O_SYNC mode
    fs/9p: Use truncate_setsize instead of vmtruncate
    net/9p: Fix compile warning
    net/9p: Convert the in the 9p rpc call path to GFP_NOFS
    fs/9p: Fix race in initializing writeback fid

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: use watch/notify for changes in rbd header
    libceph: add lingering request and watch/notify event framework
    rbd: update email address in Documentation
    ceph: rename dentry_release -> d_release, fix comment
    ceph: add request to the tail of unsafe write list
    ceph: remove request from unsafe list if it is canceled/timed out
    ceph: move readahead default to fs/ceph from libceph
    ceph: add ino32 mount option
    ceph: update common header files
    ceph: remove debugfs debug cruft
    libceph: fix osd request queuing on osdmap updates
    ceph: preserve I_COMPLETE across rename
    libceph: Fix base64-decoding when input ends in newline.

    Linus Torvalds
     
  • Using delayed-work for tty flip buffers ends up causing us to wait for
    the next tick to complete some actions. That's usually not all that
    noticeable, but for certain latency-critical workloads it ends up being
    totally unacceptable.

    As an extreme case of this, passing a token back-and-forth over a pty
    will take two ticks per iteration, so even just a thousand iterations
    will take 8 seconds assuming a common 250Hz configuration.

    Avoiding the whole delayed work issue brings that ping-pong test-case
    down to 0.009s on my machine.

    In more practical terms, this latency has been a performance problem for
    things like dive computer simulators (simulating the serial interface
    using the ptys) and for other environments (Alan mentions a CP/M emulator).

    Reported-by: Jef Driesen
    Acked-by: Greg KH
    Acked-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
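
    The 8-second figure above follows from simple tick arithmetic. As a rough sketch (pingpong_ms() is a hypothetical helper, not kernel code), the worst-case delay is the iteration count times the ticks per iteration, divided by the tick rate:

```c
#include <assert.h>

/*
 * Time in milliseconds spent waiting on timer ticks for a ping-pong
 * test, assuming every iteration waits for ticks_per_iter full ticks.
 */
static long pingpong_ms(long iterations, long ticks_per_iter, long hz)
{
    return iterations * ticks_per_iter * 1000 / hz;
}
```

    With 1000 iterations, two ticks each, and HZ=250 (a 4 ms tick), this gives 8000 ms, which matches the 8 seconds quoted in the message.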
     
  • Recent zerocopy work in the 9P VirtIO transport maps and pins
    user buffers into kernel memory for the server to work on them.
    Since the user process can initiate this kind of pinning with a simple
    read/write call, thousands of IO threads initiated by the user process can
    hog the system resources and could result in denial of service.

    This patch introduces flow control to avoid that extreme scenario.

    The ceiling limit to avoid denial of service attacks is set relatively
    high (nr_free_pagecache_pages()/4) so that it won't interfere with
    regular usage, but can step in in extreme cases to limit a total system
    hang. Since we don't have a global structure to accommodate this variable,
    I chose the virtio_chan as the home for this.

    Signed-off-by: Venkateswararao Jujjuri
    Reviewed-by: Badari Pulavarty
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
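
    The idea of a per-channel ceiling on pinned pages can be sketched as a simple admission check. This is a single-threaded illustration only: struct demo_chan and the demo_* helpers are hypothetical names, and the real patch makes the caller wait instead of returning a failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel accounting. */
struct demo_chan {
    unsigned long pinned;    /* pages currently pinned */
    unsigned long max_pages; /* ceiling on pinned pages */
};

/* The ceiling is a quarter of the free page count, as in the message. */
static void demo_chan_init(struct demo_chan *c, unsigned long free_pages)
{
    c->pinned = 0;
    c->max_pages = free_pages / 4;
}

/* Admit the request only if it keeps the total under the ceiling. */
static bool demo_try_pin(struct demo_chan *c, unsigned long pages)
{
    if (c->pinned + pages > c->max_pages)
        return false; /* a real caller would sleep until pages are released */
    c->pinned += pages;
    return true;
}

/* Release pages once the request completes. */
static void demo_unpin(struct demo_chan *c, unsigned long pages)
{
    c->pinned -= pages;
}

/* Counts how many of `attempts` equal-sized requests are admitted. */
static int demo_admitted_requests(unsigned long free_pages,
                                  unsigned long pages, int attempts)
{
    struct demo_chan c;
    int admitted = 0;

    demo_chan_init(&c, free_pages);
    for (int i = 0; i < attempts; i++)
        if (demo_try_pin(&c, pages))
            admitted++;
    return admitted;
}
```

    With 400 free pages the ceiling is 100, so only three 30-page requests fit until demo_unpin() returns some pages.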
     
  • Without this fix, even if a file is opened in O_APPEND mode, data will be
    written at current file position instead of end of file.

    Signed-off-by: M. Mohan Kumar
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
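
    For reference, the O_APPEND behaviour that applications expect can be checked from userspace. The sketch below runs against a local temporary file (the /tmp path and append_lands_at_eof() are assumptions for illustration; it does not exercise 9p): even after seeking to offset 0, a write through an O_APPEND descriptor should land at end of file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if a write through an O_APPEND descriptor lands at end of
 * file despite a prior seek to offset 0, 0 if it does not, and -1 if
 * the temporary file could not be set up or read back.
 */
static int append_lands_at_eof(void)
{
    char path[] = "/tmp/append_demo_XXXXXX";
    char buf[16] = {0};
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "abc", 3) != 3) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    /* Reopen in append mode, move the file position, then write. */
    fd = open(path, O_WRONLY | O_APPEND);
    if (fd >= 0) {
        lseek(fd, 0, SEEK_SET);
        (void)!write(fd, "de", 2);
        close(fd);
    }

    /* Read the file back and compare against the appended result. */
    fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n >= 0)
            ok = (n == 5 && memcmp(buf, "abcde", 5) == 0);
        close(fd);
    }
    unlink(path);
    return ok;
}
```

    A filesystem that writes at the current file position instead would leave "dec" in the file, and the function would return 0.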
     
  • Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Given that spurious wake-ups are common, we need to move the
    condition setting right next to the wake_up(). After setting the condition
    to req->status = REQ_STATUS_RCVD, spurious wakeups may cause the
    virtqueue to go back on the free list for someone else to use.
    This may result in a kernel panic while releasing the pinned pages
    in p9_release_req_pages().

    Also rearranged the while loop in req_done() for better readability.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • A process may wait to get space on the VirtIO ring to send a transaction to
    the VirtFS server. Current code just does a conditional wake_up() which
    means only one process will be woken up even if multiple processes
    are waiting.

    This fix makes the wake_up unconditional. Hence we won't have any
    processes waiting forever.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Add the new static inline and use the same

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • We don't need writeback fid if we are only doing O_RDONLY open

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Older versions of the protocol don't support the tsyncfs operation.
    So for them, force an O_SYNC flag on the server.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Convert vmtruncate usage to truncate_setsize. We also write back
    all dirty pages before doing 9p operations and call truncate_setsize on success.
    This ensures that we continue sanely after a failed truncate on the server. The
    disadvantage is that we now write back content that gets
    thrown away later as part of the truncate.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Without this we can cause reclaim allocation in writepage.

    [ 3433.448430] =================================
    [ 3433.449117] [ INFO: inconsistent lock state ]
    [ 3433.449117] 2.6.38-rc5+ #84
    [ 3433.449117] ---------------------------------
    [ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
    [ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [ 3433.449117] (iprune_sem){+++++-}, at: [] shrink_icache_memory+0x45/0x2b1
    [ 3433.449117] {RECLAIM_FS-ON-W} state was registered at:
    [ 3433.449117] [] mark_held_locks+0x52/0x70
    [ 3433.449117] [] lockdep_trace_alloc+0x85/0x9f
    [ 3433.449117] [] slab_pre_alloc_hook+0x18/0x3c
    [ 3433.449117] [] kmem_cache_alloc+0x23/0xa2
    [ 3433.449117] [] idr_pre_get+0x2d/0x6f
    [ 3433.449117] [] p9_idpool_get+0x30/0xae
    [ 3433.449117] [] p9_client_rpc+0xd7/0x9b0
    [ 3433.449117] [] p9_client_clunk+0x88/0xdb
    [ 3433.449117] [] v9fs_evict_inode+0x3c/0x48
    [ 3433.449117] [] evict+0x1f/0x87
    [ 3433.449117] [] dispose_list+0x47/0xe3
    [ 3433.449117] [] evict_inodes+0x138/0x14f
    [ 3433.449117] [] generic_shutdown_super+0x57/0xe8
    [ 3433.449117] [] kill_anon_super+0x11/0x50
    [ 3433.449117] [] v9fs_kill_super+0x49/0xab
    [ 3433.449117] [] deactivate_locked_super+0x21/0x46
    [ 3433.449117] [] deactivate_super+0x40/0x44
    [ 3433.449117] [] mntput_no_expire+0x100/0x109
    [ 3433.449117] [] sys_umount+0x2f1/0x31c
    [ 3433.449117] [] system_call_fastpath+0x16/0x1b
    [ 3433.449117] irq event stamp: 192941
    [ 3433.449117] hardirqs last enabled at (192941): [] _raw_spin_unlock_irq+0x2b/0x30
    [ 3433.449117] hardirqs last disabled at (192940): [] shrink_inactive_list+0x290/0x2f5
    [ 3433.449117] softirqs last enabled at (188470): [] __do_softirq+0x133/0x152
    [ 3433.449117] softirqs last disabled at (188455): [] call_softirq+0x1c/0x28
    [ 3433.449117]
    [ 3433.449117] other info that might help us debug this:
    [ 3433.449117] 1 lock held by kswapd0/505:
    [ 3433.449117] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x38/0x15f
    [ 3433.449117]
    [ 3433.449117] stack backtrace:
    [ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84
    [ 3433.449117] Call Trace:
    [ 3433.449117] [] ? valid_state+0x17e/0x191
    [ 3433.449117] [] ? save_stack_trace+0x28/0x45
    [ 3433.449117] [] ? check_usage_forwards+0x0/0x87
    [ 3433.449117] [] ? mark_lock+0x113/0x22c
    [ 3433.449117] [] ? __lock_acquire+0x37a/0xcf7
    [ 3433.449117] [] ? mark_lock+0x2d/0x22c
    [ 3433.449117] [] ? __lock_acquire+0x392/0xcf7
    [ 3433.449117] [] ? determine_dirtyable_memory+0x15/0x28
    [ 3433.449117] [] ? lock_acquire+0x57/0x6d
    [ 3433.449117] [] ? shrink_icache_memory+0x45/0x2b1
    [ 3433.449117] [] ? down_read+0x47/0x5c
    [ 3433.449117] [] ? shrink_icache_memory+0x45/0x2b1
    [ 3433.449117] [] ? shrink_icache_memory+0x45/0x2b1
    [ 3433.449117] [] ? shrink_slab+0xdb/0x15f
    [ 3433.449117] [] ? kswapd+0x574/0x96a
    [ 3433.449117] [] ? kswapd+0x0/0x96a
    [ 3433.449117] [] ? kthread+0x7d/0x85
    [ 3433.449117] [] ? kernel_thread_helper+0x4/0x10
    [ 3433.449117] [] ? restore_args+0x0/0x30
    [ 3433.449117] [] ? kthread+0x0/0x85
    [ 3433.449117] [] ? kernel_thread_helper+0x0/0x10

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • When two processes open the same file we can end up with both of them
    allocating the writeback_fid. Add a new mutex which can be used
    for synchronizing v9fs_inode member values.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Add some statistics for debugging.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • OOM path is missing the irq restore in the CONFIG_CMPXCHG_LOCAL case.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Send notifications when we change the rbd header (e.g. create a snapshot)
    and wait for such notifications. This allows synchronizing the snapshot
    creation between different rbd clients/tools.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     
  • Lingering requests are requests that are sent to the OSD normally but
    tracked also after we get a successful request. This keeps the OSD
    connection open and resends the original request if the object moves to
    another OSD. The OSD can then send notification messages back to us
    if another client initiates a notify.

    This framework will be used by RBD so that the client gets notification
    when a snapshot is created by another node or tool.

    Signed-off-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Yehuda Sadeh
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: make fuse_dentry_revalidate() RCU aware
    fuse: make fuse_permission() RCU aware
    fuse: wakeup pollers on connection release/abort
    fuse: reduce size of struct fuse_request

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    xen: update mask_rw_pte after kernel page tables init changes
    xen: set max_pfn_mapped to the last pfn mapped
    x86: Cleanup highmap after brk is concluded

    Fix up trivial conflict (added header file includes) in
    arch/x86/mm/init_64.c

    Linus Torvalds
     
  • * 'next-samsung' of git://git.fluff.org/bjdooks/linux:
    ARM: H1940/RX1950: Change default LED triggers
    ARM: S3C2442: RX1950: Add support for LED blinking
    ARM: S3C2442: RX1950: Retain LEDs state in suspend
    ARM: S3C2410: H1940: Fix lcd_power_set function
    ARM: S3C2410: H1940: Add battery support
    ARM: S3C2410: H1940: Use leds-gpio driver for LEDs managing
    ARM: S3C2410: H1940: Make h1940-bluetooth.c compile again
    ARM: S3C2410: H1940: Add keys device

    Linus Torvalds
     
  • * 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linux:
    i2c-pxa2xx: Don't clear isr bits too early
    i2c-pxa2xx: Fix register offsets
    i2c-pxa2xx: pass of_node from platform driver to adapter and publish
    i2c-pxa2xx: check timeout correctly
    i2c-pxa2xx: add support for shared IRQ handler
    i2c-pxa2xx: Add PCI support for PXA I2C controller
    ARM: pxa2xx: reorganize I2C files
    i2c-pxa2xx: use dynamic register layout
    i2c-mxs: set controller to pio queue mode after reset
    i2c-eg20t: support new device OKI SEMICONDUCTOR ML7213 IOH
    i2c/busses: Add support for Diolan U2C-12 USB-I2C adapter

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Dont define useless label in the !CONFIG_CMPXCHG_LOCAL case
    slab,rcu: don't assume the size of struct rcu_head
    slub,rcu: don't assume the size of struct rcu_head
    slub: automatically reserve bytes at the end of slab
    Lockless (and preemptless) fastpaths for slub
    slub: Get rid of slab_free_hook_irq()
    slub: min_partial needs to be in first cacheline
    slub: fix ksize() build error
    slub: fix kmemcheck calls to match ksize() hints
    Revert "slab: Fix missing DEBUG_SLAB last user"
    mm: Remove support for kmem_cache_name()

    Linus Torvalds
     
  • Ensure that we kill discard requests after logical block provisioning
    has been disabled in sysfs.

    Signed-off-by: Martin K. Petersen
    Reported-by: Geert Uytterhoeven
    Reviewed-by: Jeff Moyer
    Signed-off-by: Linus Torvalds

    Martin K. Petersen
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
    IPVS: Use global mutex in ip_vs_app.c
    ipvs: fix a typo in __ip_vs_control_init()
    veth: Fix the byte counters
    net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
    macvlan: Fix use after free of struct macvlan_port.
    net: fix incorrect spelling in drop monitor protocol
    can: c_can: Do basic c_can configuration _before_ enabling the interrupts
    net/appletalk: fix atalk_release use after free
    ipx: fix ipx_release()
    snmp: SNMP_UPD_PO_STATS_BH() always called from softirq
    l2tp: fix possible oops on l2tp_eth module unload
    xfrm: Fix initialize repl field of struct xfrm_state
    netfilter: ipt_CLUSTERIP: fix buffer overflow
    netfilter: xtables: fix reentrancy
    netfilter: ipset: fix checking the type revision at create command
    netfilter: ipset: fix address ranges at hash:*port* types
    niu: Rename NIU parent platform device name to fix conflict.
    r8169: fix a bug in rtl8169_init_phy()
    bonding: fix a typo in a comment
    ftmac100: use resource_size()
    ...

    Linus Torvalds
     

22 Mar, 2011

7 commits

  • As part of the work to make IPVS network namespace aware
    __ip_vs_app_mutex was replaced by a per-namespace lock,
    ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.

    Unfortunately this implementation results in ipvs->app_key residing
    in non-static storage which at the very least causes a lockdep warning.

    This patch takes the rather heavy-handed approach of reinstating
    __ip_vs_app_mutex which will cover access to the ipvs->list_head
    of all network namespaces.

    [ 12.610000] IPVS: Creating netns size=2456 id=0
    [ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
    [ 12.640000] BUG: key ffff880003bbf1a0 not in .data!
    [ 12.640000] ------------[ cut here ]------------
    [ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
    [ 12.640000] Hardware name: Bochs
    [ 12.640000] Pid: 1, comm: swapper Tainted: G W 2.6.38-kexec-06330-g69b7efe-dirty #122
    [ 12.650000] Call Trace:
    [ 12.650000] [] warn_slowpath_common+0x75/0xb0
    [ 12.650000] [] warn_slowpath_null+0x15/0x20
    [ 12.650000] [] lockdep_init_map+0x37b/0x570
    [ 12.650000] [] ? trace_hardirqs_on+0xd/0x10
    [ 12.650000] [] debug_mutex_init+0x38/0x50
    [ 12.650000] [] __mutex_init+0x5c/0x70
    [ 12.650000] [] __ip_vs_app_init+0x64/0x86
    [ 12.660000] [] ? ip_vs_init+0x0/0xff
    [ 12.660000] [] T.620+0x43/0x170
    [ 12.660000] [] ? register_pernet_subsys+0x1a/0x40
    [ 12.660000] [] ? ip_vs_init+0x0/0xff
    [ 12.660000] [] ? ip_vs_init+0x0/0xff
    [ 12.660000] [] register_pernet_operations+0x57/0xb0
    [ 12.660000] [] ? ip_vs_init+0x0/0xff
    [ 12.670000] [] register_pernet_subsys+0x29/0x40
    [ 12.670000] [] ip_vs_app_init+0x10/0x12
    [ 12.670000] [] ip_vs_init+0x4c/0xff
    [ 12.670000] [] do_one_initcall+0x7a/0x12e
    [ 12.670000] [] kernel_init+0x13e/0x1c2
    [ 12.670000] [] kernel_thread_helper+0x4/0x10
    [ 12.670000] [] ? restore_args+0x0/0x30
    [ 12.680000] [] ? kernel_init+0x0/0x1c2
    [ 12.680000] [] ? kernel_thread_helper+0x0/0x10

    Signed-off-by: Simon Horman
    Cc: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Julian Anastasov
    Cc: Hans Schillstrom
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Reported-by: Ingo Molnar
    Signed-off-by: Eric Dumazet
    Cc: Simon Horman
    Cc: Julian Anastasov
    Acked-by: Simon Horman
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Commit 44540960 "veth: move loopback logic to common location" introduced
    a bug in the packet counters. I don't understand why that happened as it
    is not explained in the comments and the mtu check in dev_forward_skb
    retains the assumption that skb->len is the total length of the packet.

    I just measured this empirically by setting up a veth pair between two
    noop network namespaces and attempting a telnet connection between
    the two. I saw three packets in each direction and the byte counters were
    exactly 14*3 = 42 bytes high in each direction. I got the actual
    packet lengths with tcpdump.

    So remove the extra ETH_HLEN from the veth byte count totals.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
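
    The 42-byte discrepancy measured above is three Ethernet headers. A minimal sketch of the two ways of counting, assuming the buggy path added ETH_HLEN on top of a length that already included the header (the packet lengths and demo_* names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header length in bytes */

/* Sums packet lengths that already include the Ethernet header. */
static unsigned long demo_count_bytes(const unsigned int *lens, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += lens[i];
    return total;
}

/* Same sum, but adds ETH_HLEN again for every packet. */
static unsigned long demo_count_bytes_extra_hlen(const unsigned int *lens,
                                                 size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += lens[i] + ETH_HLEN;
    return total;
}

/* Difference between the two counters for three sample packets. */
static unsigned long demo_overcount(void)
{
    /* Hypothetical packet lengths; only the count of packets matters. */
    const unsigned int lens[3] = { 74, 66, 66 };

    return demo_count_bytes_extra_hlen(lens, 3) - demo_count_bytes(lens, 3);
}
```

    The difference is independent of the packet sizes: it is always ETH_HLEN times the number of packets, here 14 * 3 = 42.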
     
  • When I was fixing issues with unregistering tables under /proc/sys/net/ipv6/neigh
    by adding a mount point it appears I missed a critical ordering issue, in the
    ipv6 initialization. I had not realized that ipv6_sysctl_register is called
    at the very end of the ipv6 initialization and in particular after we call
    neigh_sysctl_register from ndisc_init.

    "neigh" needs to be initialized in ipv6_static_sysctl_register which is
    the first ipv6 table to be initialized, and definitely before ndisc_init.
    This removes the weirdness of duplicate tables while still providing a
    "neigh" mount point which prevents races in sysctl unregistering.

    This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
    Reported-by: sunkan@zappa.cx
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • When the macvlan driver was extended to call unregister_netdevice_queue
    in 23289a37e2b127dfc4de1313fba15bb4c9f0cd5b, a use after free of struct
    macvlan_port was introduced. The code in dellink relied on unregister_netdevice
    actually unregistering the net device so it would be safe to free macvlan_port.

    Since unregister_netdevice_queue can just queue up the unregister instead of
    performing the unregister immediately, we free the macvlan_port too soon, and
    then the code in macvlan_stop removes the MAC address from the set of MAC
    addresses to listen for and uses memory that has already been freed.

    To fix this add a reference count to track when it is safe to free the macvlan_port
    and move the call of macvlan_port_destroy into macvlan_uninit which is guaranteed
    to be called after the final macvlan_port_close.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
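
    The reference-counting approach described above can be sketched in userspace as follows. struct demo_port and the demo_* helpers are hypothetical stand-ins, not the driver's actual code: the port is released only when the last device using it has been torn down, no matter when the unregister was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct macvlan_port. */
struct demo_port {
    int count;   /* devices still using the port */
    bool *freed; /* set to true when the port is released */
};

static struct demo_port *demo_port_create(bool *freed)
{
    struct demo_port *port = malloc(sizeof(*port));

    if (!port)
        return NULL;
    port->count = 0;
    port->freed = freed;
    *freed = false;
    return port;
}

/* Each device takes a reference when it starts using the port. */
static void demo_dev_attach(struct demo_port *port)
{
    port->count++;
}

/* Per-device teardown: the last one out releases the port. */
static void demo_dev_uninit(struct demo_port *port)
{
    if (--port->count == 0) {
        *port->freed = true;
        free(port);
    }
}

/* Attaches `devices`, tears down `teardowns`, reports if the port is gone. */
static bool demo_port_freed_after(int devices, int teardowns)
{
    bool freed;
    struct demo_port *port = demo_port_create(&freed);

    if (!port)
        return false;
    for (int i = 0; i < devices; i++)
        demo_dev_attach(port);
    for (int i = 0; i < teardowns && i < devices; i++)
        demo_dev_uninit(port);
    if (!freed)
        free(port); /* avoid leaking in the demo itself */
    return freed;
}
```

    With two devices attached, the port survives the first teardown and is released only on the second.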
     
  • It was pointed out to me recently that my spelling could be better :)

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
     
  • I ran into some trouble while testing the SocketCAN driver for the BOSCH
    C_CAN controller. The interface is not correctly initialized, if I put
    some CAN traffic on the line, _while_ the interface is being started
    (which means: the interface doesn't come up correctly, if there's some RX
    traffic while doing 'ifconfig can0 up').

    The current implementation enables the controller interrupts _before_
    doing the basic c_can configuration. I think, this should be done the
    other way round.

    The patch below fixes things for me.

    Signed-off-by: Jan Altenberg
    Acked-by: Kurt Van Dijck
    Acked-by: Wolfgang Grandegger
    Signed-off-by: David S. Miller

    Jan Altenberg