15 Mar, 2011

1 commit

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: NFSROOT should default to "proto=udp"
    nfs4: remove duplicated #include
    NFSv4: nfs4_state_mark_reclaim_nograce() should be static
    NFSv4: Fix the setlk error handler
    NFSv4.1: Fix the handling of the SEQUENCE status bits
    NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
    NFSv4.1 reclaim complete must wait for completion
    NFSv4: remove duplicate clientid in struct nfs_client
    NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
    sunrpc: Propagate errors from xs_bind() through xs_create_sock()
    (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
    nfs: fix compilation warning
    nfs: add kmalloc return value check in decode_and_add_ds
    SUNRPC: Remove resource leak in svc_rdma_send_error()
    nfs: close NFSv4 COMMIT vs. CLOSE race
    SUNRPC: Close a race in __rpc_wait_for_completion_task()

    Linus Torvalds
     

11 Mar, 2011

6 commits

  • Add necessary alias to autoload ip6ip6 tunnel module.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • David S. Miller
     
  • When configs BRIDGE=y and IPV6=m, this build error occurs:

    br_multicast.c:(.text+0xa3341): undefined reference to `ipv6_dev_get_saddr'

    BRIDGE_IGMP_SNOOPING is boolean; if it were tristate, then adding

        depends on IPV6 || IPV6=n

    to BRIDGE_IGMP_SNOOPING would be a good fix. As it is currently,
    making BRIDGE depend on the IPV6 config works.

    Reported-by: Patrick Schaaf
    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
    error, but it currently returns 0 if xs_bind() fails.

    Signed-off-by: Ben Hutchings
    Cc: stable@kernel.org [v2.6.37]
    Signed-off-by: Trond Myklebust

    Ben Hutchings
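
    The ERR_PTR convention can be illustrated in userspace. This is a
    minimal sketch modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
    helpers, assuming a flat address space in which the top MAX_ERRNO
    addresses are never valid objects. fake_bind() and fake_create_sock()
    are hypothetical stand-ins for xs_bind() and xs_create_sock(), not the
    SUNRPC code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR idiom: a negative errno travels
 * in a pointer return value, encoded in the top MAX_ERRNO addresses. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_sock { int fd; };

/* Hypothetical stand-in for xs_bind(): 0 on success or a negative errno. */
static int fake_bind(bool fail)
{
	return fail ? -EADDRINUSE : 0;
}

/* Hypothetical stand-in for xs_create_sock(). Returning a bare 0 here on
 * bind failure would yield NULL, which IS_ERR() does not flag, so the
 * caller would carry on as if a socket had been created. */
static struct fake_sock *fake_create_sock(bool bind_fails)
{
	struct fake_sock *sock;
	int err = fake_bind(bind_fails);

	if (err < 0)
		return ERR_PTR(err);	/* propagate the bind error */
	sock = malloc(sizeof(*sock));
	if (!sock)
		return ERR_PTR(-ENOMEM);
	sock->fd = -1;
	return sock;
}
```

    A caller checks IS_ERR() on the result and, if set, recovers the errno
    with PTR_ERR(); here that yields -EADDRINUSE for a failed bind.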
     
  • We leak the memory allocated to 'ctxt' when we return after
    'ib_dma_mapping_error()' returns !=0.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Trond Myklebust

    Jesper Juhl
     
  • Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous. This means that when we
    exit, we want all references to the rpc_task to be gone, and we want
    any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that the
    callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that
    rpciod calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases between
    the waiting process and rpciod.
    The common case, where nobody is waiting for completion, is optimised by
    checking whether the RPC_TASK_ASYNC flag is cleared and/or whether the
    rpc_task reference count is 1: in those cases we skip grabbing the spin
    lock and immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a new
    helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
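
    The ordering argument above can be modeled in userspace. This is a
    toy sketch, assuming pthreads, with a mutex standing in for the
    waitqueue spin lock and a worker thread standing in for rpciod; all
    toy_* names are hypothetical, and the refcount == 1 fast path is not
    modeled. Because the worker marks completion and drops its reference
    in one critical section, the waiter's put is always the last one, so
    the release callback runs synchronously in the waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_task {
	pthread_mutex_t lock;	/* plays the role of the waitqueue spin lock */
	pthread_cond_t done;
	int refcount;		/* one ref for the waiter, one for the worker */
	bool completed;
	bool released;
	pthread_t releaser;	/* thread that ran the release callback */
};

static void toy_task_init(struct toy_task *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->done, NULL);
	t->refcount = 2;
	t->completed = false;
	t->released = false;
}

static void toy_release(struct toy_task *t)
{
	t->released = true;
	t->releaser = pthread_self();
}

/* Worker side: complete and drop the reference atomically w.r.t. waiters. */
static void *toy_worker(void *arg)
{
	struct toy_task *t = arg;
	bool last;

	pthread_mutex_lock(&t->lock);
	t->completed = true;
	last = (--t->refcount == 0);
	pthread_cond_broadcast(&t->done);
	pthread_mutex_unlock(&t->lock);
	if (last)
		toy_release(t);
	return NULL;
}

/* Waiter side: wait for completion, then put its own reference. */
static void toy_wait_and_put(struct toy_task *t)
{
	bool last;

	pthread_mutex_lock(&t->lock);
	while (!t->completed)
		pthread_cond_wait(&t->done, &t->lock);
	last = (--t->refcount == 0);
	pthread_mutex_unlock(&t->lock);
	if (last)
		toy_release(t);
}
```

    Build with -pthread. If the worker instead decremented after
    unlocking, the waiter could put first and the release would then run
    in the worker thread, which is the race being closed.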
     

10 Mar, 2011

4 commits

  • Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462

    In commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 ("ipv6: Always
    clone offlink routes.") we forced the kernel to always clone offlink
    routes.

    The reason we do that is to make sure we never bind an inetpeer to a
    prefixed route.

    The logic turned on here has existed in the tree for many years,
    but was always off due to a protecting CPP define. So perhaps
    it's no surprise that there is a logic bug here.

    The problem is that we cannot clone a route that is already a
    host route (i.e. has DST_HOST set): if we do, an identical
    entry already exists in the routing tree, and therefore the
    ip6_rt_ins() call is going to fail.

    This sets off a series of failures and high CPU usage, because when
    ip6_rt_ins() fails we loop retrying this operation a few times in
    order to handle a race between two threads trying to clone and insert
    the same host route at the same time.

    Fix this by simply using the route as-is when DST_HOST is set.

    Reported-by: slash@ac.auone-net.jp
    Reported-by: Ernst Sjöstrand
    Signed-off-by: David S. Miller

    David S. Miller
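
    The fix can be sketched as follows. This is a minimal toy model, not
    the ipv6 routing code: toy_route and toy_route_for_host() are
    hypothetical, and the DST_HOST value is illustrative. A route that is
    already a host route is returned as-is instead of being cloned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DST_HOST 0x0001	/* illustrative value: route covers a single host */

struct toy_route {
	unsigned int flags;
	struct toy_route *parent;	/* route this one was cloned from */
};

/* Return a host route for the lookup result. Cloning a route that is
 * already a host route would duplicate an entry in the tree, so the
 * insert would fail and the caller would keep retrying. */
static struct toy_route *toy_route_for_host(struct toy_route *rt)
{
	struct toy_route *clone;

	if (rt->flags & DST_HOST)
		return rt;	/* use the route as-is */

	clone = malloc(sizeof(*clone));
	if (!clone)
		return NULL;
	clone->flags = rt->flags | DST_HOST;
	clone->parent = rt;
	return clone;
}
```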
     
    Since commit a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c, any process with
    CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
    that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE, as modules are
    limited to /lib/modules/**. However, CAP_NET_ADMIN shouldn't allow
    anybody to load modules unrelated to networking.

    This patch restricts the ability to autoload modules to netdev modules
    with explicit aliases. This fixes CVE-2011-1019.

    Arnd Bergmann suggested leaving the old pre-v2.6.32 behavior of loading
    netdev modules by name (without any prefix) untouched for processes
    with CAP_SYS_MODULE, to maintain compatibility with network scripts
    that autoload netdev modules by aliases like "eth0" and "wlan0".

    Currently there are only three users of the feature in the upstream
    kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: fffffff800001000
    CapEff: fffffff800001000
    CapBnd: fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0 Link encap:IPv6-in-IPv4
    NOARP MTU:1480 Metric:1

    root@albatros:~# lsmod | grep sit
    sit 10457 0
    tunnel4 2957 1 sit

    For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: ffffffffffffffff
    CapEff: ffffffffffffffff
    CapBnd: ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs 745319 0

    Reference: https://lkml.org/lkml/2011/2/24/203

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Michael Tokarev
    Acked-by: David S. Miller
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    Vasiliy Kulikov
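
    The resulting policy can be sketched as a function that lists which
    module names would be requested for an interface name. This is a
    hypothetical helper that only builds strings; it assumes the
    "netdev-<ifname>" alias form (e.g. netdev-sit0) that modules opt into,
    and it does not call request_module() or check real capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fill out[] with the module names that would be requested, in the order
 * they would be tried, and return how many were written. CAP_NET_ADMIN
 * only gets the prefixed alias; the bare name (the pre-v2.6.32
 * behavior) is kept for CAP_SYS_MODULE. */
static int toy_netdev_autoload_names(const char *ifname, bool net_admin,
				     bool sys_module, char out[2][64])
{
	int n = 0;

	if (net_admin)
		snprintf(out[n++], sizeof(out[0]), "netdev-%s", ifname);
	if (sys_module)
		snprintf(out[n++], sizeof(out[0]), "%s", ifname);
	return n;
}
```

    This matches the transcript above: with only CAP_NET_ADMIN,
    "ifconfig xfs" can only ask for netdev-xfs, which no module provides.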
     
    The units printed by show_results() in pktgen were wrong: the results
    are in usec, but they were labeled as nsec.

    Reported-by: Jong-won Lee
    Signed-off-by: Daniel Turull
    Signed-off-by: David S. Miller

    Daniel Turull
     
    In the usual case ifa_address == ifa_local, but in the case where
    SIOCSIFDSTADDR sets the destination address on a point-to-point
    link, ifa_address gets set to that destination address.

    Therefore we should use ifa_local when we want the local interface
    address.

    There were two cases where the selection was done incorrectly:

    1) When devinet_ioctl() does matching, it checks ifa_address even
    though gifconf correctly reported ifa_local to the user.

    2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
    ifa_address instead of ifa_local.

    Reported-by: Julian Anastasov
    Signed-off-by: David S. Miller

    David S. Miller
     

09 Mar, 2011

1 commit

    I recently had this bug halt reported to me:

    kernel BUG at net/rds/send.c:329!
    Oops: Exception in kernel mode, sig: 5 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
    ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
    dm_mod [last unloaded: scsi_wait_scan]
    NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
    REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64)
    MSR: 8000000000029032 CR: 44000022 XER: 00000000
    TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
    GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
    GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
    GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
    GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
    GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
    GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
    GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
    GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
    NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
    LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
    Call Trace:
    [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
    (unreliable)
    [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
    [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
    [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
    [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
    Instruction dump:
    4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
    7d094a78 7d290074 7929d182 394a0020 40e2ff68 4bffffa4 39200000
    Kernel panic - not syncing: Fatal exception
    Call Trace:
    [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
    [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
    [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
    [c000000175cab750] [c000000000030000] ._exception+0x110/0x220
    [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

    Signed-off-by: David S. Miller

    Neil Horman
     

08 Mar, 2011

2 commits

    The unix_dgram_recvmsg and unix_stream_recvmsg routines in
    net/unix/af_unix.c use mutex_lock(&u->readlock) calls to serialize
    read operations of multiple threads on a single socket. This implies
    that, if all n threads of a process block in an AF_UNIX recv call
    trying to read data from the same socket, one of these threads will
    be sleeping in state TASK_INTERRUPTIBLE and all the others in state
    TASK_UNINTERRUPTIBLE.

    Suppose a particular signal is supposed to be handled by a signal
    handler defined by the process, and none of these threads is blocking
    the signal. The complete_signal routine in kernel/signal.c will then
    select the 'first' such thread it happens to encounter when deciding
    which thread to notify. If this is one of the TASK_UNINTERRUPTIBLE
    threads, the signal won't be handled until the one thread not blocking
    on the u->readlock mutex is woken up because some data to process has
    arrived (if this ever happens).

    The included patch fixes this by changing mutex_lock to
    mutex_lock_interruptible and handling possible error returns in the
    same way interruptions are handled by the actual receive code.

    Signed-off-by: Rainer Weikusat
    Signed-off-by: David S. Miller

    Rainer Weikusat
     
  • exthdrs_core.c and addrconf_core.c in net/ipv6/ contain bits which
    must be made available even if IPv6 is disabled.

    net/ipv6/Makefile already correctly includes them if CONFIG_IPV6=n
    but net/Makefile prevents entering the subdirectory.

    Signed-off-by: Thomas Graf
    Acked-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Thomas Graf
     

06 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: no .snap inside of snapped namespace
    libceph: fix msgr standby handling
    libceph: fix msgr keepalive flag
    libceph: fix msgr backoff
    libceph: retry after authorization failure
    libceph: fix handling of short returns from get_user_pages
    ceph: do not clear I_COMPLETE from d_release
    ceph: do not set I_COMPLETE
    Revert "ceph: keep reference to parent inode on ceph_dentry"

    Linus Torvalds
     

05 Mar, 2011

3 commits

  • The standby logic used to be pretty dependent on the work requeueing
    behavior that changed when we switched to WQ_NON_REENTRANT. It was also
    very fragile.

    Restructure things so that:
    - We clear WRITE_PENDING when we set STANDBY. This ensures we will
    requeue work when we wake up later.
    - con_work backs off if STANDBY is set. There is nothing to do if we are
    in standby.
    - clear_standby() helper is called by both con_send() and con_keepalive(),
    the two actions that can wake us up again. Move the connect_seq++
    logic here.

    Signed-off-by: Sage Weil

    Sage Weil
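
    The restructured standby rules can be sketched with plain bit flags.
    This is a toy model, not libceph: toy_con and the toy_* helpers are
    hypothetical, and the flag values are made up. It shows entering
    standby clearing WRITE_PENDING, con_work backing off while in
    standby, and the send path waking the connection through a shared
    clear_standby() step that bumps connect_seq.

```c
#include <assert.h>

#define WRITE_PENDING	(1u << 0)	/* illustrative bit values */
#define STANDBY		(1u << 1)

struct toy_con {
	unsigned int flags;
	unsigned int connect_seq;
	int work_runs;	/* times con_work did real work */
	int queued;	/* times work was queued */
};

static void toy_queue_work(struct toy_con *con) { con->queued++; }

static void toy_enter_standby(struct toy_con *con)
{
	con->flags &= ~WRITE_PENDING;	/* so a later send re-queues work */
	con->flags |= STANDBY;
}

/* Shared by the send and keepalive paths, the two wake-up events. */
static void toy_clear_standby(struct toy_con *con)
{
	if (con->flags & STANDBY) {
		con->flags &= ~STANDBY;
		con->connect_seq++;
	}
}

static void toy_con_send(struct toy_con *con)
{
	toy_clear_standby(con);
	if (!(con->flags & WRITE_PENDING)) {
		con->flags |= WRITE_PENDING;
		toy_queue_work(con);
	}
}

static void toy_con_work(struct toy_con *con)
{
	if (con->flags & STANDBY)
		return;	/* nothing to do while in standby */
	con->work_runs++;
}
```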
     
  • There was some broken keepalive code using a dead variable. Shift to using
    the proper bit flag.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • With commit f363e45f we replaced a bunch of hacky workqueue mutual
    exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is
    that the exponential backoff breaks in certain cases:

    * con_work attempts to connect.
    * we get an immediate failure, and the socket state change handler queues
    immediate work.
    * con_work calls con_fault, we decide to back off, but can't queue delayed
    work.

    In this case, we add a BACKOFF bit to make con_work reschedule delayed work
    next time it runs (which should be immediately).

    Signed-off-by: Sage Weil

    Sage Weil
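
    The BACKOFF scenario can be modeled with a work item that may be
    queued only once at a time, like a pending delayed work item. This is
    a hypothetical sketch; toy_queue() only mimics the "already queued,
    refuse" behavior, and the flag value is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define BACKOFF (1u << 0)	/* illustrative bit value */

struct toy_con {
	unsigned int flags;
	bool pending;		/* work item already queued? */
	unsigned long delay;	/* delay of the pending item, 0 = immediate */
};

/* Refuses to queue if the item is already pending. */
static bool toy_queue(struct toy_con *con, unsigned long delay)
{
	if (con->pending)
		return false;
	con->pending = true;
	con->delay = delay;
	return true;
}

/* Fault path: try to schedule a delayed retry; if immediate work is
 * already queued (e.g. by the socket state change handler), remember to
 * back off on the next run instead. */
static void toy_con_fault(struct toy_con *con, unsigned long backoff)
{
	if (!toy_queue(con, backoff))
		con->flags |= BACKOFF;
}

/* Worker entry point: runs the pending item. */
static void toy_con_work(struct toy_con *con, unsigned long backoff)
{
	con->pending = false;
	if (con->flags & BACKOFF) {
		con->flags &= ~BACKOFF;
		toy_queue(con, backoff);	/* reschedule as delayed work */
		return;
	}
	/* normal connect/read/write processing would go here */
}
```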
     

04 Mar, 2011

5 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
    MAINTAINERS: Add Andy Gospodarek as co-maintainer.
    r8169: disable ASPM
    RxRPC: Fix v1 keys
    AF_RXRPC: Handle receiving ACKALL packets
    cnic: Fix lost interrupt on bnx2x
    cnic: Prevent status block race conditions with hardware
    net: dcbnl: check correct ops in dcbnl_ieee_set()
    e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
    igb: fix sparse warning
    e1000: fix sparse warning
    netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
    dccp: fix oops on Reset after close
    ipvs: fix dst_lock locking on dest update
    davinci_emac: Add Carrier Link OK check in Davinci RX Handler
    bnx2x: update driver version to 1.62.00-6
    bnx2x: properly calculate lro_mss
    bnx2x: perform statistics "action" before state transition.
    bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
    bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
    bnx2x: Fix nvram test for single port devices.
    ...

    Linus Torvalds
     
  • When a DNS resolver key is instantiated with an error indication, attempts to
    read that key will result in an oops because user_read() is expecting there to
    be a payload - and there isn't one [CVE-2011-1076].

    Give the DNS resolver key its own read handler that returns the error cached in
    key->type_data.x[0] as an error rather than crashing.

    Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
    amount of data it prints, since the data is not necessarily NUL-terminated.

    The buggy code was added in:

    commit 4a2d789267e00b5a1175ecd2ddefcc78b83fbf09
    Author: Wang Lei
    Date: Wed Aug 11 09:37:58 2010 +0100
    Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]

    This can trivially be reproduced by any user with the following program
    compiled with -lkeyutils:

    #include <err.h>
    #include <keyutils.h>

    static char payload[] = "#dnserror=6";

    int main(void)
    {
        key_serial_t key;

        key = add_key("dns_resolver", "a", payload, sizeof(payload),
                      KEY_SPEC_SESSION_KEYRING);
        if (key == -1)
            err(1, "add_key");
        if (keyctl_read(key, NULL, 0) == -1)
            err(1, "read_key");
        return 0;
    }

    What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:

    dns-break: read_key: No such device or address

    but instead the kernel oopses.

    This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands
    as both of those cut the data down below the NUL termination that must be
    included in the data. Without it, dns_resolver_instantiate() will return
    -EINVAL and the key will not be instantiated such that it can be read.

    The oops looks like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: [] user_read+0x4f/0x8f
    PGD 3bdf8067 PUD 385b9067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
    CPU 0
    Modules linked in:

    Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY
    RIP: 0010:[] [] user_read+0x4f/0x8f
    RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246
    RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
    RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
    R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
    FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
    Stack:
    ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
    ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
    00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
    Call Trace:
    [] keyctl_read_key+0xac/0xcf
    [] sys_keyctl+0x75/0xb6
    [] system_call_fastpath+0x16/0x1b
    Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
    RIP [] user_read+0x4f/0x8f
    RSP
    CR2: 0000000000000010

    Signed-off-by: David Howells
    Acked-by: Jeff Layton
    cc: Wang Lei
    Signed-off-by: James Morris

    David Howells
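
    Both parts of the fix can be sketched in userspace. This is a toy
    model: toy_key and toy_dns_read() are hypothetical (in the kernel the
    cached error lives in key->type_data.x[0]). The read handler reports
    the cached error instead of dereferencing a missing payload, and the
    logging helper bounds the printed length with "%.*s" because the
    data is not necessarily NUL-terminated.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Either a payload or a cached negative error, never both. */
struct toy_key {
	const char *payload;	/* NULL when the lookup failed */
	size_t datalen;
	long cached_error;	/* e.g. -ENXIO, or 0 */
};

static ssize_t toy_dns_read(const struct toy_key *key, char *buf,
			    size_t buflen)
{
	if (key->cached_error)
		return key->cached_error;	/* no payload to read */
	if (buf && buflen > 0)
		memcpy(buf, key->payload,
		       buflen < key->datalen ? buflen : key->datalen);
	return (ssize_t)key->datalen;
}

/* Print at most len bytes of possibly unterminated data. */
static void toy_log_data(const char *data, size_t len)
{
	printf("instantiate '%.*s'\n", (int)len, data);
}
```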
     
  • If we mark the connection CLOSED we will give up trying to reconnect to
    this server instance. That is appropriate for things like a protocol
    version mismatch that won't change until the server is restarted, at which
    point we'll get a new addr and reconnect. An authorization failure like
    this is probably due to the server not properly rotating its secret keys,
    however, and should be treated as transient so that the normal backoff and
    retry behavior kicks in.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • get_user_pages() can return fewer pages than we ask for. We were returning
    a bogus pointer/error code in that case. Instead, loop until we get all
    the pages we want or get an error we can return to the caller.

    Signed-off-by: Sage Weil

    Sage Weil
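
    The retry loop can be sketched against a hypothetical page grabber
    that, like get_user_pages(), may return fewer pages than requested or
    a negative errno. toy_get_pages() and toy_get_all_pages() are
    illustrative only; treating a zero return as -EFAULT is a defensive
    choice of this sketch to avoid looping forever.

```c
#include <assert.h>
#include <errno.h>

/* Returns at most `chunk` pages per call, or -EFAULT at page fail_at. */
static int toy_get_pages(int start, int nr, int chunk, int fail_at,
			 int *pages)
{
	int i, n = nr < chunk ? nr : chunk;

	if (start == fail_at)
		return -EFAULT;
	for (i = 0; i < n; i++)
		pages[i] = start + i;	/* record which page went where */
	return n;
}

/* Keep calling until every page is pinned or an error is reported. */
static int toy_get_all_pages(int num_pages, int chunk, int fail_at,
			     int *pages)
{
	int got = 0, rc;

	while (got < num_pages) {
		rc = toy_get_pages(got, num_pages - got, chunk, fail_at,
				   pages + got);
		if (rc < 0)
			return rc;	/* real code would also drop pages[0..got) */
		if (rc == 0)
			return -EFAULT;
		got += rc;
	}
	return got;
}
```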
     

03 Mar, 2011

3 commits


02 Mar, 2011

3 commits

  • Like many other places, we have to check that the array index is
    within allowed limits, or otherwise, a kernel oops and other nastiness
    can ensue when we access memory beyond the end of the array.

    [ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000
    [ 5954.120014] IP: __find_logger+0x6f/0xa0
    [ 5954.123979] nf_log_bind_pf+0x2b/0x70
    [ 5954.123979] nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log]
    [ 5954.123979] nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink]
    ...

    The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
    was decoupled from nf_log_register.

    Reported-by: Miguel Di Ciurcio Filho,
    via irc.freenode.net/#netfilter
    Signed-off-by: Jan Engelhardt
    Signed-off-by: Patrick McHardy

    Jan Engelhardt
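
    The kind of check described above can be sketched as follows. The
    table size, toy_loggers[] and toy_log_bind_pf() are hypothetical; the
    point is that a protocol family arriving from userspace over netlink
    is validated against the array bound before it is used as an index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define TOY_NUMPROTO 13	/* illustrative table size */

struct toy_logger { const char *name; };

static const struct toy_logger *toy_loggers[TOY_NUMPROTO];

/* Reject an out-of-range protocol family before indexing the table. */
static int toy_log_bind_pf(unsigned int pf, const struct toy_logger *logger)
{
	if (pf >= ARRAY_SIZE(toy_loggers))
		return -EINVAL;
	toy_loggers[pf] = logger;
	return 0;
}
```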
     
  • This fixes a bug in the order of dccp_rcv_state_process() that still permitted
    reception even after closing the socket. A Reset after close thus causes a NULL
    pointer dereference by not preventing operations on an already torn-down socket.

    dccp_v4_do_rcv()
    |
    | state other than OPEN
    v
    dccp_rcv_state_process()
    |
    | DCCP_PKT_RESET
    v
    dccp_rcv_reset()
    |
    v
    dccp_time_wait()

    WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
    Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
    [] (unwind_backtrace+0x0/0xec) from [] (warn_slowpath_common)
    [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_n)
    [] (warn_slowpath_null+0x1c/0x24) from [] (__inet_twsk_hashd)
    [] (__inet_twsk_hashdance+0x48/0x128) from [] (dccp_time_wai)
    [] (dccp_time_wait+0x40/0xc8) from [] (dccp_rcv_state_proces)
    [] (dccp_rcv_state_process+0x120/0x538) from [] (dccp_v4_do_)
    [] (dccp_v4_do_rcv+0x11c/0x14c) from [] (release_sock+0xac/0)
    [] (release_sock+0xac/0x110) from [] (dccp_close+0x28c/0x380)
    [] (dccp_close+0x28c/0x380) from [] (inet_release+0x64/0x70)

    The fix is to test the socket state first. Receiving a packet in Closed state
    now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.

    Reported-and-tested-by: Johan Hovold
    Cc: stable@kernel.org
    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
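
    The reordered check can be sketched as a toy state machine. The enum
    names and toy_rcv_state_process() are hypothetical, not the DCCP
    code; Reset Code 3 is "No Connection" in RFC 4340. The sketch assumes
    the usual rule that a Reset is never sent in reply to a Reset.

```c
#include <assert.h>

enum toy_state { TOY_CLOSED, TOY_REQUESTING, TOY_OPEN, TOY_CLOSING };
enum toy_pkt { TOY_PKT_DATA, TOY_PKT_RESET };

#define TOY_RESET_NO_CONNECTION 3	/* RFC 4340 Reset Code 3 */

struct toy_sock {
	enum toy_state state;
	int reset_sent;	/* reset code sent in reply, 0 = none */
	int handled;	/* packets passed to per-state processing */
};

/* Test the socket state before dispatching on the packet type, so
 * nothing reaches the handlers of an already torn-down socket. */
static void toy_rcv_state_process(struct toy_sock *sk, enum toy_pkt type)
{
	if (sk->state == TOY_CLOSED) {
		if (type != TOY_PKT_RESET)
			sk->reset_sent = TOY_RESET_NO_CONNECTION;
		return;	/* drop the packet */
	}
	/* per-state / per-packet processing would go here */
	sk->handled++;
}
```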
     
    Fix dst_lock usage in __ip_vs_update_dest(). We need _bh locking
    because the destination is updated in user context; without it,
    frequent destination updates can cause lockups. Problem reported by
    Simon Kirby. The bug was introduced in 2.6.37 by the "ipvs: changes
    for local real server" change.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Hans Schillstrom
    Signed-off-by: Simon Horman

    Julian Anastasov
     

01 Mar, 2011

1 commit

    netlink_dump() may fail, but nobody handles its error.
    It generates output data when a previous portion has been returned to
    user space. This mechanism is used when the data does not all fit in
    one skb. If we enter netlink_recvmsg() and there is no skb in the
    receive queue, netlink_dump() will not be executed. So if
    netlink_dump() fails once, new data never appears and the reader
    sleeps forever.

    netlink_dump() is called from two places:

    1. from netlink_sendmsg->...->netlink_dump_start().
    Here we can report the error directly, and it will be returned
    by sendmsg().

    2. from netlink_recvmsg
    Here we can't report the error directly, because we have a portion of
    valid output data and call netlink_dump() to prepare the next portion.
    If netlink_dump() fails, the socket will be marked with an error and
    the next recvmsg will fail.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
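
    Case 2 can be sketched as a toy socket that latches a deferred error.
    toy_nlsock, toy_dump() and toy_recvmsg() are hypothetical; the sketch
    assumes the common convention of storing a positive errno in sk_err
    and returning it negated (and cleared) on the next receive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_nlsock {
	int sk_err;		/* pending error, reported on the next recv */
	bool dump_fails;	/* test hook: make the next dump step fail */
	int queued;		/* messages waiting in the receive queue */
};

/* Produce the next portion of dump output. */
static int toy_dump(struct toy_nlsock *sk)
{
	if (sk->dump_fails)
		return -ENOBUFS;
	sk->queued++;
	return 0;
}

/* Hand out one queued message, then prepare the next portion. The
 * current portion is valid, so a dump failure is latched in sk_err
 * rather than returned now. */
static int toy_recvmsg(struct toy_nlsock *sk)
{
	int err;

	if (sk->sk_err) {
		err = -sk->sk_err;
		sk->sk_err = 0;
		return err;
	}
	if (sk->queued == 0)
		return -EAGAIN;	/* a blocking socket would sleep here */
	sk->queued--;
	err = toy_dump(sk);
	if (err)
		sk->sk_err = -err;
	return 0;
}
```

    Without the latched error, the third call would find an empty queue
    and a blocking reader would sleep forever.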
     

26 Feb, 2011

3 commits

    An addr_type of 0 means that the type should be adopted from the
    entries of from_dev rather than from the addr_type argument of
    __hw_addr_del_multiple(). Unfortunately this is not what happens:
    addr_type is always used. Fix this by implementing the intended and
    documented behavior.

    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
     
  • With slab poisoning enabled, I see the following oops:

    Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6b73
    ...
    NIP [c0000000006bc61c] .rxrpc_destroy+0x44/0x104
    LR [c0000000006bc618] .rxrpc_destroy+0x40/0x104
    Call Trace:
    [c0000000feb2bc00] [c0000000006bc618] .rxrpc_destroy+0x40/0x104 (unreliable)
    [c0000000feb2bc90] [c000000000349b2c] .key_cleanup+0x1a8/0x20c
    [c0000000feb2bd40] [c0000000000a2920] .process_one_work+0x2f4/0x4d0
    [c0000000feb2be00] [c0000000000a2d50] .worker_thread+0x254/0x468
    [c0000000feb2bec0] [c0000000000a868c] .kthread+0xbc/0xc8
    [c0000000feb2bf90] [c000000000020e00] .kernel_thread+0x54/0x70

    We aren't initialising token->next, but the code in destroy_context relies
    on the list being NULL terminated. Use kzalloc to zero out all the fields.

    Signed-off-by: Anton Blanchard
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Anton Blanchard
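
    The fix can be sketched with calloc() in the role of kzalloc(). The
    toy_token type and helpers are hypothetical; the sketch assumes, as on
    all mainstream platforms, that an all-zero pointer is NULL, so a fresh
    token is already a NULL-terminated list that the destroy loop can
    walk safely.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_token {
	struct toy_token *next;	/* list must end in NULL */
	int kind;
};

/* Zeroed allocation: ->next starts out NULL. */
static struct toy_token *toy_token_alloc(int kind)
{
	struct toy_token *t = calloc(1, sizeof(*t));

	if (t)
		t->kind = kind;
	return t;
}

/* Walk and free the list, relying on the NULL terminator. Returns the
 * number of tokens freed. */
static int toy_destroy(struct toy_token *t)
{
	int n = 0;

	while (t) {
		struct toy_token *next = t->next;

		free(t);
		t = next;
		n++;
	}
	return n;
}
```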
     
    Before this patch, issuing these commands:

    fd = open("/proc/sys/net/ipv6/route/flush")
    unshare(CLONE_NEWNET)
    write(fd, "stuff")

    would flush the newly created net, not the original one.

    The equivalent ipv4 code is correct (stores the net inside ->extra1).
    Acked-by: Daniel Lezcano

    Signed-off-by: David S. Miller

    Lucian Adrian Grijincu
     

24 Feb, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
    Added support for usb ethernet (0x0fe6, 0x9700)
    r8169: fix RTL8168DP power off issue.
    r8169: correct settings of rtl8102e.
    r8169: fix incorrect args to oob notify.
    DM9000B: Fix PHY power for network down/up
    DM9000B: Fix reg_save after spin_lock in dm9000_timeout
    net_sched: long word align struct qdisc_skb_cb data
    sfc: lower stack usage in efx_ethtool_self_test
    bridge: Use IPv6 link-local address for multicast listener queries
    bridge: Fix MLD queries' ethernet source address
    bridge: Allow mcast snooping for transient link local addresses too
    ipv6: Add IPv6 multicast address flag defines
    bridge: Add missing ntohs()s for MLDv2 report parsing
    bridge: Fix IPv6 multicast snooping by correcting offset in MLDv2 report
    bridge: Fix IPv6 multicast snooping by storing correct protocol type
    p54pci: update receive dma buffers before and after processing
    fix cfg80211_wext_siwfreq lock ordering...
    rt2x00: Fix WPA TKIP Michael MIC failures.
    ath5k: Fix fast channel switching
    tcp: undo_retrans counter fixes
    ...

    Linus Torvalds
     

23 Feb, 2011

6 commits

  • David S. Miller
     
  • Currently the bridge multicast snooping feature periodically issues
    IPv6 general multicast listener queries to sense the absence of a
    listener.

    For this, it uses :: as its source address - however RFC 2710 requires:
    "To be valid, the Query message MUST come from a link-local IPv6 Source
    Address". Current Linux kernel versions seem to follow this requirement
    and ignore our bogus MLD queries.

    With this commit, a link-local address from the bridge interface is
    used to issue the MLD query, so that other Linux devices in the network
    that are multicast listeners respond with an MLD response (which was
    not the case before).

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     
    Map the IPv6 header's destination multicast address to an ethernet
    source address instead of the MLD query's multicast address.

    For instance, for a general MLD query (multicast address in the MLD
    query set to ::), this would wrongly be mapped to 33:33:00:00:00:00,
    although an MLD query's destination MAC should always be
    33:33:00:00:00:01, which matches the IPv6 header's multicast
    destination ff02::1.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
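
    The mapping behind these MAC addresses (RFC 2464: 33:33 followed by
    the low 32 bits of the IPv6 multicast address) can be sketched as a
    standalone function. The kernel's own helper for this is
    ipv6_eth_mc_map(); toy_ipv6_mc_to_mac() below is an illustrative
    re-creation, not that code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 33:33 + last four bytes of the IPv6 multicast address. */
static void toy_ipv6_mc_to_mac(const uint8_t addr[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &addr[12], 4);
}
```

    Feeding it ff02::1 gives 33:33:00:00:00:01, while feeding it the
    unspecified address :: gives the wrong 33:33:00:00:00:00.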
     
    Currently the multicast bridge snooping support is not active for
    link-local multicast. I assume this has been done to leave
    important multicast data untouched, like IPv6 Neighbor Discovery.

    In larger, bridged, local networks it could however be desirable to
    optimize, for instance, local multicast audio/video streaming too.

    With the transient flag in IPv6 multicast addresses we have an easy
    way to optimize such multimedia traffic without tampering with the
    high priority multicast data from well-known addresses.

    This patch alters the multicast bridge snooping for IPv6, to take
    effect for transient multicast addresses instead of non-link-local
    addresses.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
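
    The transient test can be sketched on raw address bytes. In an IPv6
    multicast address the first byte is 0xff and the high nibble of the
    second byte holds the flags 0,R,P,T, so T is bit 0x10 of byte 1. The
    toy_* helpers are hypothetical, not the bridge code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool toy_is_multicast(const uint8_t a[16]) { return a[0] == 0xff; }
static bool toy_is_transient(const uint8_t a[16]) { return a[1] & 0x10; }

/* Snoop transient groups only; well-known groups such as ff02::1 are
 * left alone regardless of scope. */
static bool toy_should_snoop(const uint8_t a[16])
{
	return toy_is_multicast(a) && toy_is_transient(a);
}
```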
     
    The nsrcs field is 2 bytes wide, therefore we need to call ntohs()
    before using it.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     
    We actually want a pointer to grec_nsrcs and not to the following
    field. Otherwise we can get very high values for *nsrcs, as the first two
    bytes of the IPv6 multicast address are being used instead, leading to
    a failing pskb_may_pull() which results in MLDv2 reports not being
    parsed.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
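
    The last two fixes together can be sketched as reading the source
    count from a raw MLDv2 multicast address record (RFC 3810). The
    struct below mirrors the layout of the kernel's struct mld2_grec;
    toy_grec_nsrcs() is a hypothetical helper. It takes the offset of
    grec_nsrcs itself rather than of the following grec_mca field, and
    converts the big-endian value with ntohs().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naturally aligned, so there is no padding between fields. */
struct toy_mld2_grec {
	uint8_t grec_type;
	uint8_t grec_auxwords;
	uint16_t grec_nsrcs;	/* number of sources, network byte order */
	uint8_t grec_mca[16];	/* multicast address */
	/* followed by grec_nsrcs 16-byte source addresses */
};

static unsigned int toy_grec_nsrcs(const uint8_t *rec)
{
	uint16_t raw;

	memcpy(&raw, rec + offsetof(struct toy_mld2_grec, grec_nsrcs),
	       sizeof(raw));
	return ntohs(raw);
}
```

    Reading two bytes at the grec_mca offset instead would, for an
    ff02::/16 group, yield 0xff02 (65282) sources, which explains the
    failing pskb_may_pull().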