25 Mar, 2010

7 commits

  • cpu_relax() is documented in volatile-considered-harmful.txt to be a
    memory barrier. However, everyone with the exception of Blackfin and
    possibly ia64 defines cpu_relax() to be a compiler barrier.

    Make the documentation reflect the general consensus.

    Linus sayeth:

    : I don't think it was ever the intention that it would be seen as anything
    : but a compiler barrier, although it is obviously implied that it might
    : well perform some per-architecture actions that have "memory barrier-like"
    : semantics.
    :
    : After all, the whole and only point of the "cpu_relax()" thing is to tell
    : the CPU that we're busy-looping on some event.
    :
    : And that "event" might be (and often is) about reading the same memory
    : location over and over until it changes to what we want it to be. So it's
    : quite possible that on various architectures the "cpu_relax()" could be
    : about making sure that such a tight loop on loads doesn't starve cache
    : transactions, for example - and as such look a bit like a memory barrier
    : from a CPU standpoint.
    :
    : But it's not meant to have any kind of architectural memory ordering
    : semantics as far as the kernel is concerned - those must come from other
    : sources.

    Signed-off-by: Russell King
    Cc:
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • fs/binfmt_aout.c: In function `aout_core_dump':
    fs/binfmt_aout.c:125: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast
    include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int'
    fs/binfmt_aout.c:132: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast
    include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int'

    due to dump_write() expecting a user void *. Fold casts into the
    START_DATA/START_STACK macros and shut up the warnings.

    Signed-off-by: Borislav Petkov
    Cc: Daisuke HATAYAMA
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • scripts/kernel-doc erroneously says:

    Warning(include/linux/skbuff.h:410): Excess struct/union/enum/typedef member 'cb' description in 'sk_buff'

    on this line in struct sk_buff:
    char cb[48] __aligned(8);

    due to treating the last field as the struct member name, so teach
    kernel-doc to ignore __aligned(x) in structs.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • This was introduced by v2.6.34-rc1~38:

    4c014e8 (rtc/mc13783: protect rtc {,un}registration by mc13783 lock)

    Signed-off-by: Uwe Kleine-König
    Reported-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • There was a potential null deref introduced in c62b1a3b31b5 ("memcg: use
    generic percpu instead of private implementation").

    Signed-off-by: Dan Carpenter
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
    commit e6a1105b ("cgroups: subsystem module loading interface") and commit
    c50cc752 ("sched, cgroups: Fix module export") together result in module.h
    being included twice.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
    In commit 02491447 ("memcg: move charges of anonymous swap"), I tried to
    disable the move charge feature in the no-MMU case by enclosing all the
    related functions with "#ifdef CONFIG_MMU", but the commit places these
    ifdefs in the wrong place (it seems they were mangled while some fixes
    were being handled).

    This patch fixes it up.

    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
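
As a rough illustration of the cpu_relax() entry above, here is a userspace sketch of a busy-wait loop in C11. The names cpu_relax_sketch() and spin_wait_for_flag() are hypothetical, not kernel APIs, and the inline asm assumes GCC or Clang. The relax step is only a compiler barrier (plus an x86 PAUSE hint, roughly what the kernel's x86 implementation issues); any ordering against the CPU that sets the flag comes from the acquire load, in line with Linus's point that such semantics "must come from other sources".

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for cpu_relax(): a compiler barrier,
 * plus a spin-loop hint on x86. It provides NO inter-CPU ordering. */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("rep; nop" ::: "memory"); /* PAUSE hint + compiler barrier */
#else
    __asm__ __volatile__("" ::: "memory");         /* compiler barrier only */
#endif
}

/* Spin until *flag becomes nonzero or max_spins iterations elapse.
 * Ordering with the writer comes from the acquire load, not from
 * cpu_relax_sketch(). Returns true if the flag was observed set. */
static bool spin_wait_for_flag(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
        cpu_relax_sketch();
    }
    return false;
}
```

The bounded max_spins parameter only makes the sketch testable from a single thread; a real waiter would keep looping until the flag changes.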
     

23 Mar, 2010

8 commits

  • Commit 45575f5a426c ("ppc64 sys_ipc breakage in 2.6.34-rc2") fixed the
    definition of the sys_ipc() helper, but didn't fix the prototype in

    Reported-and-tested-by: Andreas Schwab
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio: console: Check if port is valid in resize_console
    virtio: console: Generate a kobject CHANGE event on adding 'name' attribute

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (38 commits)
    ip_gre: include route header_len in max_headroom calculation
    if_tunnel.h: add missing ams/byteorder.h include
    ipv4: Don't drop redirected route cache entry unless PTMU actually expired
    net: suppress lockdep-RCU false positive in FIB trie.
    Bluetooth: Fix kernel crash on L2CAP stress tests
    Bluetooth: Convert debug files to actually use debugfs instead of sysfs
    Bluetooth: Fix potential bad memory access with sysfs files
    netfilter: ctnetlink: fix reliable event delivery if message building fails
    netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
    NET_DMA: free skbs periodically
    netlink: fix unaligned access in nla_get_be64()
    tcp: Fix tcp_mark_head_lost() with packets == 0
    net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
    KS8695: update ksp->next_rx_desc_read at the end of rx loop
    igb: Add support for 82576 ET2 Quad Port Server Adapter
    ixgbevf: Message formatting cleanups
    ixgbevf: Shorten up delay timer for watchdog task
    ixgbevf: Fix VF Stats accounting after reset
    ixgbe: Set IXGBE_RSC_CB(skb)->DMA field to zero after unmapping the address
    ixgbe: fix for real_num_tx_queues update issue
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    edac, mce: Filter out invalid values

    Linus Torvalds
     
  • alloc_skb() can return NULL.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • It seems clear from the surrounding code that xpermits is allowed to be
    NULL here.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
    I chased down a failure on ppc64 on 2.6.34-rc2 where an application that
    uses shared memory was getting a SEGV.

    Commit baed7fc9b580bd3fb8252ff1d9b36eaf1f86b670 ("Add generic sys_ipc
    wrapper") changed the second argument from an unsigned long to an int.
    When we call shmget, the system call wrappers for sys_ipc sign-extend
    second (i.e. the size), which truncates it. It took a while to
    track down because the call succeeds and strace shows the untruncated
    size :)

    The patch below changes second from an int to an unsigned long which
    fixes shmget on ppc64 (and I assume s390, sparc64 and mips64).

    Signed-off-by: Anton Blanchard
    --

    I assume the function prototypes for the other IPC methods would cause us
    to sign or zero extend second where appropriate (avoiding any security
    issues). Come to think of it, the syscall wrappers for each method should do
    that for us as well.
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Commit 3f6da3905398826d85731247e7fbcf53400c18bd
    (perf: Rework and fix the arch CPU-hotplug hooks) broke suspend to
    RAM on my HP nx6325 (and most likely on other AMD-based boxes too)
    by allowing amd_pmu_cpu_offline() to be executed for CPUs that are
    going offline as part of the suspend process. The problem is that
    cpuhw->amd_nb may be NULL already, so the function should make sure
    it's not NULL before accessing the object pointed to by it.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
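
The ppc64 shmget failure described above comes down to a 64-bit size passing through a 32-bit int parameter. The sketch below models that with hypothetical helpers, using uint64_t and int32_t to stand in for unsigned long and int on a 64-bit target; it does not reproduce the actual sys_ipc wrappers. Narrowing an out-of-range value to int32_t is implementation-defined in C; GCC and Clang wrap modulo 2^32, which is what the comments assume.

```c
#include <stdint.h>

/* Hypothetical model of the buggy path: "second" declared as int.
 * The size is truncated to 32 bits, then sign-extended when widened
 * back to 64 bits (assumes GCC/Clang modulo-2^32 narrowing). */
static uint64_t through_int(uint64_t size)
{
    int32_t second = (int32_t)size;     /* buggy: int parameter */
    return (uint64_t)(int64_t)second;   /* sign-extended back to 64 bits */
}

/* Hypothetical model of the fixed path: "second" as unsigned long. */
static uint64_t through_ulong(uint64_t size)
{
    uint64_t second = size;             /* fixed: full-width parameter */
    return second;
}
```

A 4 GiB + 4 KiB request (0x100001000) shrinks to 4 KiB through the int path, and a 2 GiB request (0x80000000) comes back sign-extended to 0xFFFFFFFF80000000, while the unsigned long path preserves both.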
     

22 Mar, 2010

8 commits

  • Print the CPU associated with the error only when the field is valid.

    Cc: # .32.x .33.x
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • The console port could have been hot-unplugged. Check if it is valid
    before working on it.

    Signed-off-by: Amit Shah
    Signed-off-by: Michael S. Tsirkin

    Amit Shah
     
  • When the host lets us know what 'name' a port is assigned, we create the
    sysfs 'name' attribute. Generate a 'change' event after this so that
    udev wakes up and acts on the rules for virtio-ports (currently there's
    only one rule that creates a symlink from the 'name' to the actual char
    device).

    Signed-off-by: Amit Shah
    Signed-off-by: Michael S. Tsirkin

    Amit Shah
     
    Taking the route's header_len into account, and updating the GRE
    device's needed_headroom, gives a better hint of the upper bound on the
    required headroom. This is useful if the GRE traffic is xfrm'ed.

    Signed-off-by: Timo Teras
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Timo Teräs
     
    When compiling a userspace application that includes if_tunnel.h and
    uses the GRE_* defines, you will get an undefined reference to
    __cpu_to_be16.

    Fix this by adding the missing #include

    Cc: stable@kernel.org
    Signed-off-by: Paulius Zaleckas
    Signed-off-by: David S. Miller

    Paulius Zaleckas
     
  • TCP sessions over IPv4 can get stuck if routers between endpoints
    do not fragment packets but implement PMTU instead, and we are using
    those routers because of an ICMP redirect.

    Setup is as follows:

           MTU1    MTU2   MTU1
        A--------B------C------D

    with MTU1 > MTU2. A and D are endpoints, B and C are routers. B and C
    implement PMTU and drop packets larger than MTU2 (for example because
    DF is set on all packets). TCP sessions are initiated between A and D.
    There is packet loss between A and D, causing frequent TCP
    retransmits.

    After the number of retransmits on a TCP session reaches tcp_retries1,
    tcp calls dst_negative_advice() prior to each retransmit. This causes
    the route cache entries for the peer to be deleted in
    ipv4_negative_advice() if the Path MTU is set.

    If the outstanding data on an affected TCP session is larger than
    MTU2, packets sent from the endpoints will be dropped by B or C, and
    ICMP NEEDFRAG will be returned. A and D receive NEEDFRAG messages and
    update PMTU.

    Before the next retransmit, tcp will again call dst_negative_advice(),
    causing the route cache entry (with correct PMTU) to be deleted. The
    retransmitted packet will be larger than MTU2, causing it to be
    dropped again.

    This sequence repeats until the TCP session aborts or is terminated.

    The problem is fixed by removing redirected route cache entries in
    ipv4_negative_advice() only if the PMTU has expired.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • David S. Miller
     
  • Allow fib_find_node() to be called either under rcu_read_lock()
    protection or with RTNL held.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Paul E. McKenney
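
The ipv4_negative_advice() change described above can be modelled as a change in when a cached route is dropped. This is a hypothetical sketch, not the kernel code: struct route_sketch stands in for a route cache entry, pmtu_expires == 0 means no PMTU has been learned, and ticks_after_eq() mimics the wraparound-safe comparison done by the kernel's time_after_eq(). Only the PMTU condition for a redirected entry is modelled.

```c
#include <stdbool.h>

/* Wraparound-safe "a is at or after b" for unsigned tick counters,
 * modelled on the kernel's time_after_eq(). */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Hypothetical stand-in for a redirected route cache entry.
 * pmtu_expires == 0 means no PMTU has been learned. */
struct route_sketch {
    unsigned long pmtu_expires;
};

/* Old behaviour as described in the commit: drop whenever a PMTU is set. */
static bool drop_old(const struct route_sketch *rt, unsigned long now)
{
    (void)now;
    return rt->pmtu_expires != 0;
}

/* Fixed behaviour as described: drop only once the PMTU has expired. */
static bool drop_new(const struct route_sketch *rt, unsigned long now)
{
    return rt->pmtu_expires != 0 && ticks_after_eq(now, rt->pmtu_expires);
}
```

With the old predicate, a PMTU freshly learned from an ICMP NEEDFRAG is discarded on the next retransmit; with the new one it survives until it expires.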
     

21 Mar, 2010

7 commits

    Add a very simple check that the req buffer has enough space to fit
    the configuration parameters. This should be enough to reject packets
    whose configuration size exceeds the req buffer.

    Crash trace:

    [ 6069.659393] Unable to handle kernel paging request at virtual address 02000205
    [ 6069.673034] Internal error: Oops: 805 [#1] PREEMPT
    ...
    [ 6069.727172] PC is at l2cap_add_conf_opt+0x70/0xf0 [l2cap]
    [ 6069.732604] LR is at l2cap_recv_frame+0x1350/0x2e78 [l2cap]
    ...
    [ 6070.030303] Backtrace:
    [ 6070.032806] [] (l2cap_add_conf_opt+0x0/0xf0 [l2cap]) from
    [] (l2cap_recv_frame+0x1350/0x2e78 [l2cap])
    [ 6070.043823] r8:dc5d3100 r7:df2a91d6 r6:00000001 r5:df2a8000 r4:00000200
    [ 6070.050659] [] (l2cap_recv_frame+0x0/0x2e78 [l2cap]) from
    [] (l2cap_recv_acldata+0x2bc/0x350 [l2cap])
    [ 6070.061798] [] (l2cap_recv_acldata+0x0/0x350 [l2cap]) from
    [] (hci_rx_task+0x244/0x478 [bluetooth])
    [ 6070.072631] r6:dc647700 r5:00000001 r4:df2ab740
    [ 6070.077362] [] (hci_rx_task+0x0/0x478 [bluetooth]) from
    [] (tasklet_action+0x78/0xd8)
    [ 6070.087005] [] (tasklet_action+0x0/0xd8) from []

    Signed-off-by: Andrei Emeltchenko
    Acked-by: Gustavo F. Padovan
    Signed-off-by: Marcel Holtmann

    Andrei Emeltchenko
     
    Some of the debug files ended up wrongly in sysfs, because at that point
    in time debugfs didn't exist. Convert these files to use debugfs and
    also seq_file. This patch converts all of these files at once and then
    removes the exported symbol for the Bluetooth sysfs class.

    Signed-off-by: Marcel Holtmann

    Marcel Holtmann
     
    When creating a high number of Bluetooth sockets (L2CAP, SCO
    and RFCOMM) it is possible to scribble repeatedly on arbitrary
    pages of memory. Ensure that the content of these sysfs files is
    always less than one page, even if this means truncating it. The
    files in question are scheduled to be moved over to debugfs in
    the future anyway.

    Based on initial patches from Neil Brown and Linus Torvalds

    Reported-by: Neil Brown
    Signed-off-by: Marcel Holtmann

    Marcel Holtmann
     
  • David S. Miller
     
    This patch fixes a bug that allows events to be lost when the reliable
    event delivery mode is used, i.e. if the NETLINK_BROADCAST_SEND_ERROR
    and NETLINK_RECV_NO_ENOBUFS socket options are set.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Currently, ENOBUFS errors are reported to the socket via
    netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
    that should not happen. This fixes this problem and it changes the
    prototype of netlink_set_err() to return the number of sockets that
    have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
    value is used in the next patch in this bugfix series.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
    Under NET_DMA, data transfer can grind to a halt when userland issues a
    large read on a socket with a high RCVLOWAT (e.g., 512 KB for both).
    This appears to be because the NET_DMA design queues up lots of memcpy
    operations, but doesn't issue or wait for them (and thus free the
    associated skbs) until it is time for tcp_recvmsg() to return.
    The socket hangs when its TCP window goes to zero before enough data is
    available to satisfy the read.

    Periodically issue asynchronous memcpy operations, and free skbs for ones
    that have completed, to prevent sockets from going into zero-window mode.

    Signed-off-by: Steven J. Magnani
    Signed-off-by: David S. Miller

    Steven J. Magnani
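
The L2CAP fix above adds a check that configuration data fits in the req buffer before it is written. A minimal sketch of that kind of bounds-checked type/length/value append follows; struct conf_buf and conf_add_opt() are hypothetical and do not match the real l2cap_add_conf_opt() signature or option layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded buffer for type/length/value options. */
struct conf_buf {
    uint8_t *data;
    size_t size;   /* capacity of data[] */
    size_t used;   /* bytes written so far; invariant: used <= size */
};

/* Append one option (1-byte type, 1-byte length, then len bytes).
 * Returns false, writing nothing, if the option would not fit. */
static bool conf_add_opt(struct conf_buf *b, uint8_t type,
                         uint8_t len, const void *val)
{
    size_t need = 2u + len;

    if (need > b->size - b->used)   /* no underflow since used <= size */
        return false;
    b->data[b->used] = type;
    b->data[b->used + 1] = len;
    memcpy(b->data + b->used + 2, val, len);
    b->used += need;
    return true;
}
```

Comparing against the remaining space (size - used) rather than computing used + need keeps the check free of overflow, as long as used never exceeds size.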
     

20 Mar, 2010

10 commits

    This patch fixes an unaligned access in nla_get_be64() that was
    introduced by myself in a17c859849402315613a0015ac8fbf101acf0cc1.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
    A packet is marked as lost when packets == 0, although nothing should be
    done in that case. In some cases this causes a packet to be retransmitted
    too early during recovery. This small patch fixes the issue by returning
    immediately.

    Signed-off-by: Lennart Schulte
    Signed-off-by: Arnd Hannemann
    Signed-off-by: David S. Miller

    Lennart Schulte
     
    mfc_parent of cache entries is used to index into the vif_table and is
    initialised from mfcctl->mfcc_parent. This can take values up to 2^16-1,
    while the vif_table has only MAXVIFS (32) entries. The same problem
    affects ip6mr.

    Refuse invalid values to fix a potential out-of-bounds access. Unlike
    the other validity checks, this is checked in ipmr_mfc_add() instead of
    the setsockopt handler since it's unused in the delete path and might be
    uninitialized.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • There is no need to adjust the next rx descriptor after each packet,
    so do it only once at the end of the routine.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Yegor Yefremov

    Yegor Yefremov
     
  • Signed-off-by: Carolyn Wyborny
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Carolyn Wyborny
     
  • Clean up some text output formatting.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Greg Rose
     
    Recovery from a PF reset works better when the delay until the
    watchdog task executes is shortened.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Greg Rose
     
  • The counters in the 82599 Virtual Function are not clear on read. They
    accumulate to the maximum value and then roll over. They are also not
    cleared when the VF executes a soft reset, so it is possible they are
    non-zero when the driver loads and starts. This has all been accounted
    for in the code that keeps the stats up to date but there is one case
    that is not. When the PF driver is reset the counters in the VF are
    all reset to zero. This adds an additional accounting overhead into
    the VF driver when the PF is reset under its feet. This patch adds
    additional counters that are used by the VF driver to accumulate and
    save stats after a PF reset has been detected. Prior to this patch,
    displaying the stats in the VF after the PF had been reset would show
    bogus data.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Greg Rose
     
    As per Simon Horman's feedback, set IXGBE_RSC_CB(skb)->dma to zero
    after unmapping the HWRSC DMA address to avoid a double free.

    Signed-off-by: Mallikarjuna R Chilakala
    Acked-by: Peter P Waskiewicz Jr
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Mallikarjuna R Chilakala
     
    Currently netdev_features_change() is called before the FCoE Tx queue
    setup is done, so this patch moves the netdev_features_change() call
    to after the Tx queue setup in ixgbe_init_interrupt_scheme(), so that
    real_num_tx_queues is updated correctly on each FCoE enable or disable.

    This allows the additional FCoE queues to be updated correctly in the
    VLAN driver, so that it selects the correct queue.

    Signed-off-by: Vasu Dev
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Vasu Dev
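
The nla_get_be64() fix above concerns reading a 64-bit value from a netlink attribute payload, which is only guaranteed 4-byte alignment (NLA_ALIGNTO is 4), so a direct 64-bit load may be misaligned. The sketch below shows two alignment-safe ways to read such a value in plain C; load64_raw() and load_be64() are hypothetical helpers, not the netlink API, and the offset of 1 in the usage is chosen only to force misalignment. Dereferencing a misaligned uint64_t pointer instead is undefined behaviour in ISO C and can trap on strict-alignment CPUs.

```c
#include <stdint.h>
#include <string.h>

/* Alignment-safe raw load: memcpy into an aligned local. The bytes
 * keep their wire order, much like a __be64 return value would. */
static uint64_t load64_raw(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Decode a big-endian 64-bit value byte by byte; independent of both
 * pointer alignment and host endianness. */
static uint64_t load_be64(const void *p)
{
    const uint8_t *b = p;
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}
```

load64_raw() is the closer analogue of returning a __be64, since it performs no byte swapping; load_be64() additionally converts to a host-order integer.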