07 Apr, 2012

1 commit

  • Pull networking updates from David Miller:

    1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

    2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

    3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

    4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

    5) IPv6 multicast code uses wrong loop index, from RongQing Li.

    6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

    7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

    8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

    9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

    10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

    11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't. Let the natural sequence
    generated by the splice paths, and the associated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

    12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

    13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers. Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    its downstream can't take data any more.

    14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
    changes, re-add. From Marc Kleine-Budde.

    15) Fix link flaps in ixgbe, from Eric W. Multanen.

    16) Descriptor writeback fixes in e1000e from Matthew Vick.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
    net: fix a race in sock_queue_err_skb()
    netlink: fix races after skb queueing
    doc, net: Update ndo_start_xmit return type and values
    doc, net: Remove instruction to set net_device::trans_start
    doc, net: Update netdev operation names
    doc, net: Update documentation of synchronisation for TX multiqueue
    doc, net: Remove obsolete reference to dev->poll
    ethtool: Remove exception to the requirement of holding RTNL lock
    MAINTAINERS: update for Marvell Ethernet drivers
    bonding: properly unset current_arp_slave on slave link up
    phonet: Check input from user before allocating
    tcp: tcp_sendpages() should call tcp_push() once
    ipv6: fix array index in ip6_mc_add_src()
    mlx4: allocate just enough pages instead of always 4 pages
    stmmac: re-add IFF_UNICAST_FLT for dwmac1000
    bnx2x: Clear MDC/MDIO warning message
    bnx2x: Fix BCM57711+BCM84823 link issue
    bnx2x: Clear BCM84833 LED after fan failure
    bnx2x: Fix BCM84833 PHY FW version presentation
    bnx2x: Fix link issue for BCM8727 boards.
    ...

    Linus Torvalds
     

06 Apr, 2012

6 commits

  • As soon as an skb is queued into socket error queue, another thread
    can consume it, so we are not allowed to reference skb anymore, or risk
    use after free.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • As soon as an skb is queued into socket receive_queue, another thread
    can consume it, so we are not allowed to reference skb anymore, or risk
    use after free.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
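
The two fixes above follow the same pattern: read whatever is still needed from the skb before queueing it, and never touch it afterwards. A minimal userspace sketch of that pattern follows. The `demo_*` types and functions are hypothetical stand-ins, not kernel APIs, and the concurrent consumer is simulated by draining the queue immediately after the enqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only a length is modelled. */
struct demo_skb {
    size_t len;
    struct demo_skb *next;
};

/* Hypothetical queue; not the kernel's sk_buff_head. */
struct demo_queue {
    struct demo_skb *head;
};

/* Hands ownership of skb to the queue. A consumer may free it right away. */
void demo_enqueue(struct demo_queue *q, struct demo_skb *skb)
{
    assert(q != NULL && skb != NULL);
    skb->next = q->head;
    q->head = skb;
}

/* Consumer side: dequeue and free everything, as another thread could. */
void demo_consume_all(struct demo_queue *q)
{
    while (q->head) {
        struct demo_skb *skb = q->head;

        q->head = skb->next;
        free(skb);
    }
}

/* Safe pattern: snapshot the needed field first, then give the skb away. */
size_t demo_queue_and_report(struct demo_queue *q, struct demo_skb *skb)
{
    size_t len = skb->len;   /* read while we still own skb */

    demo_enqueue(q, skb);    /* ownership transferred here */
    demo_consume_all(q);     /* simulates a concurrent consumer freeing it */
    return len;              /* uses the snapshot, never skb->len */
}
```

Returning `skb->len` after `demo_enqueue()` would read freed memory in this simulation, which is the use after free both commit messages describe.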
     
  • A phonet packet is limited to USHRT_MAX bytes, but this is never checked during
    tx, which means that the user can specify any size they wish and the kernel
    will attempt to allocate that size.

    In the good case, this leads to the following warning, but it may also cause
    the kernel to invoke the OOM killer and kill a random task on the server.

    [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
    [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46
    [ 8921.756672] Call Trace:
    [ 8921.758185] [] warn_slowpath_common+0x87/0xb0
    [ 8921.762868] [] warn_slowpath_null+0x15/0x20
    [ 8921.765399] [] __alloc_pages_slowpath+0x65/0x730
    [ 8921.769226] [] ? zone_watermark_ok+0x1a/0x20
    [ 8921.771686] [] ? get_page_from_freelist+0x625/0x660
    [ 8921.773919] [] __alloc_pages_nodemask+0x1f8/0x240
    [ 8921.776248] [] kmalloc_large_node+0x70/0xc0
    [ 8921.778294] [] __kmalloc_node_track_caller+0x34/0x1c0
    [ 8921.780847] [] ? sock_alloc_send_pskb+0xbc/0x260
    [ 8921.783179] [] __alloc_skb+0x75/0x170
    [ 8921.784971] [] sock_alloc_send_pskb+0xbc/0x260
    [ 8921.787111] [] ? release_sock+0x7e/0x90
    [ 8921.788973] [] sock_alloc_send_skb+0x10/0x20
    [ 8921.791052] [] pep_sendmsg+0x60/0x380
    [ 8921.792931] [] ? pn_socket_bind+0x156/0x180
    [ 8921.794917] [] ? pn_socket_autobind+0x3f/0x90
    [ 8921.797053] [] pn_socket_sendmsg+0x4f/0x70
    [ 8921.798992] [] sock_aio_write+0x187/0x1b0
    [ 8921.801395] [] ? sub_preempt_count+0xae/0xf0
    [ 8921.803501] [] ? __lock_acquire+0x42c/0x4b0
    [ 8921.805505] [] ? __sock_recv_ts_and_drops+0x140/0x140
    [ 8921.807860] [] do_sync_readv_writev+0xbc/0x110
    [ 8921.809986] [] ? might_fault+0x97/0xa0
    [ 8921.811998] [] ? security_file_permission+0x1e/0x90
    [ 8921.814595] [] do_readv_writev+0xe2/0x1e0
    [ 8921.816702] [] ? do_setitimer+0x1ac/0x200
    [ 8921.818819] [] ? get_parent_ip+0x11/0x50
    [ 8921.820863] [] ? sub_preempt_count+0xae/0xf0
    [ 8921.823318] [] vfs_writev+0x46/0x60
    [ 8921.825219] [] sys_writev+0x4f/0xb0
    [ 8921.827127] [] system_call_fastpath+0x16/0x1b
    [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

    Signed-off-by: Sasha Levin
    Acked-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    Sasha Levin
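
The check described above can be sketched as a length test performed before any allocation. This is an illustration only: `demo_check_send_len()` is a hypothetical helper, and the choice of `-EMSGSIZE` as the error code is an assumption, not something the commit message states.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: reject a send request whose payload cannot fit in
 * a phonet packet, so that no oversized buffer is ever requested.
 * The error code is an assumption made for this sketch.
 */
int demo_check_send_len(size_t len)
{
    if (len > USHRT_MAX)
        return -EMSGSIZE;
    return 0;
}
```

A caller would run this test first and only go on to allocate the buffer when it returns 0.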
     
  • commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
    a regression for splice() calls using SPLICE_F_MORE.

    We need to call tcp_push() at the end of the last page processed in
    tcp_sendpages(), or else transmits can be deferred and future sends
    stall.

    Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
    with different semantics.

    For all sendpage() providers, it's a transparent change. Only
    sock_sendpage() and tcp_sendpages() can differentiate the two different
    flags provided by pipe_to_sendpage().

    Reported-by: Tom Herbert
    Cc: Nandita Dukkipati
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Yuchung Cheng
    Cc: H.K. Jerry Chu
    Cc: Maciej Żenczykowski
    Cc: Mahesh Bandewar
    Cc: Ilpo Järvinen
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
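
One way to picture the two flags is the sketch below. It assumes that the splice side marks every page except the last one of a call, and that the TCP side pushes only when that mark is absent. The `DEMO_*` values and both helpers are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
/* Illustrative flag values for this sketch only. */
#define DEMO_MSG_MORE             0x1
#define DEMO_MSG_SENDPAGE_NOTLAST 0x2
#define DEMO_SPLICE_F_MORE        0x4

/* Flags the splice side would attach to one page (hypothetical helper). */
int demo_page_flags(int splice_flags, int is_last_page)
{
    int flags = 0;

    if (splice_flags & DEMO_SPLICE_F_MORE)
        flags |= DEMO_MSG_MORE;             /* user promised more data later */
    if (!is_last_page)
        flags |= DEMO_MSG_SENDPAGE_NOTLAST; /* more pages follow in this call */
    return flags;
}

/* TCP side: push once, after the last page of the call (hypothetical). */
int demo_should_push(int flags)
{
    return !(flags & DEMO_MSG_SENDPAGE_NOTLAST);
}
```

In this model a splice() call with SPLICE_F_MORE still reaches the push step on its last page, carrying the MSG_MORE hint, while intermediate pages never trigger a push.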
     
  • Merge batch of fixes from Andrew Morton:
    "The simple_open() cleanup was held back while I waited for laggards to
    merge things.

    I still need to send a few checkpoint/restore patches. I've been
    wobbly about merging them because I'm wobbly about the overall
    prospects for success of the project. But after speaking with Pavel
    at the LSF conference, it sounds like they're further toward
    completion than I feared - apparently davem is at the "has stopped
    complaining" stage regarding the net changes. So I need to go back
    and re-review those patches and their (lengthy) discussion."

    * emailed from Andrew Morton: (16 patches)
    memcg swap: use mem_cgroup_uncharge_swap fix
    backlight: add driver for DA9052/53 PMIC v1
    C6X: use set_current_blocked() and block_sigmask()
    MAINTAINERS: add entry for sparse checker
    MAINTAINERS: fix REMOTEPROC F: typo
    alpha: use set_current_blocked() and block_sigmask()
    simple_open: automatically convert to simple_open()
    scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
    libfs: add simple_open()
    hugetlbfs: remove unregister_filesystem() when initializing module
    drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
    fs/xattr.c:setxattr(): improve handling of allocation failures
    fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
    fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
    sysrq: use SEND_SIG_FORCED instead of force_sig()
    proc: fix mount -t proc -o AAA

    Linus Torvalds
     
  • Many users of debugfs copy the implementation of default_open() when
    they want to support a custom read/write function op. This leads to a
    proliferation of the default_open() implementation across the entire
    tree.

    Now that the common implementation has been consolidated into libfs we
    can replace all the users of this function with simple_open().

    This replacement was done with the following semantic patch:

    @ open @
    identifier open_f != simple_open;
    identifier i, f;
    @@
    -int open_f(struct inode *i, struct file *f)
    -{
    (
    -if (i->i_private)
    -f->private_data = i->i_private;
    |
    -f->private_data = i->i_private;
    )
    -return 0;
    -}

    @ has_open depends on open @
    identifier fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    -.open = open_f,
    +.open = simple_open,
    ...
    };

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: Al Viro
    Cc: Julia Lawall
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
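
The function bodies removed by the semantic patch above all have the same shape, so the shared helper can be sketched as follows. The `demo_inode` and `demo_file` structs are minimal userspace stand-ins introduced for this example, and `demo_simple_open()` mirrors only the pattern shown in the patch rather than quoting the libfs source.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structs; only the fields used here. */
struct demo_inode {
    void *i_private;
};

struct demo_file {
    void *private_data;
};

/* Copy the inode's private pointer into the file, if one is set. */
int demo_simple_open(struct demo_inode *inode, struct demo_file *file)
{
    assert(inode != NULL && file != NULL);
    if (inode->i_private)
        file->private_data = inode->i_private;
    return 0;
}
```

A debugfs user with a custom read or write op would then point `.open` at the shared helper instead of carrying its own copy.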
     

05 Apr, 2012

2 commits

  • Convert array index from the loop bound to the loop index.

    Also remove the (void) cast on the ip6_mc_del1_src() return value. It
    is unnecessary: ip6_mc_del1_src() is not marked __must_check or with a
    similar attribute, so no compiler warning appears when the cast is
    removed.

    v2: enrich the commit header

    Signed-off-by: RongQing.Li
    Signed-off-by: David S. Miller

    RongQing.Li
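
The index mix-up is easiest to see in a rollback loop. The sketch below is a simplified model rather than the IPv6 multicast code: `demo_refs`, `demo_add1()` and `demo_del1()` are hypothetical, and a failure is forced by passing an out-of-range source id.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 8

/* Reference count per source id; a hypothetical model of a source list. */
int demo_refs[DEMO_SLOTS];

int demo_add1(int src)
{
    if (src < 0 || src >= DEMO_SLOTS)
        return -1;
    demo_refs[src]++;
    return 0;
}

void demo_del1(int src)
{
    assert(src >= 0 && src < DEMO_SLOTS);
    demo_refs[src]--;
}

/* Add every source, undoing the ones already added if any add fails. */
int demo_add_all(const int *srcs, size_t count)
{
    size_t i, j;
    int err = 0;

    for (i = 0; i < count; i++) {
        err = demo_add1(srcs[i]);
        if (err)
            break;
    }
    if (err) {
        /*
         * Roll back entries 0..i-1. Indexing with i (the loop bound)
         * instead of j would hit the failed entry on every pass and
         * leave the earlier entries in place.
         */
        for (j = 0; j < i; j++)
            demo_del1(srcs[j]);
    }
    return err;
}
```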
     
  • getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
    an error if the user provides fewer bytes than the size of struct
    sctp_event_subscribe.

    Struct sctp_event_subscribe needs to be extended by a u8 for every
    new event or notification type that is added.

    This obviously makes getsockopt fail for binaries that are compiled
    against older versions of the header, which do not contain all
    event types.

    This patch changes getsockopt behaviour to no longer return an error
    if not enough bytes are being provided by the user. Instead, it
    returns as much of sctp_event_subscribe as fits into the provided buffer.

    As a result, users now see exactly the event types they were aware of
    at compile time.

    The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.

    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Thomas Graf
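
The truncating behaviour described above amounts to copying the smaller of the two sizes. The sketch below assumes a made-up three-field struct; `demo_event_subscribe` and `demo_get_events()` are hypothetical, and details such as rejecting a zero-length buffer are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subscription struct: one byte per event type, grown over time. */
struct demo_event_subscribe {
    unsigned char data_io_event;
    unsigned char association_event;
    unsigned char newer_event;   /* pretend a later release added this */
};

/* Copy at most user_len bytes; return the number of bytes written. */
size_t demo_get_events(const struct demo_event_subscribe *cur,
                       void *user_buf, size_t user_len)
{
    size_t n = user_len < sizeof(*cur) ? user_len : sizeof(*cur);

    assert(cur != NULL && user_buf != NULL);
    memcpy(user_buf, cur, n);
    return n;
}
```

A binary built against the older two-field layout passes a two-byte buffer and gets exactly those two fields back instead of an error.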
     

04 Apr, 2012

7 commits

  • We have to decrement the conntrack counter if we fail to access the
    zone extension.

    Reported-by: Tetsuo Handa
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • The error path misses putting the timeout object. This patch adds
    new function xt_ct_tg_timeout_put() to put the timeout object.

    Reported-by: Tetsuo Handa
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Reported-by: Tetsuo Handa
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • David S. Miller
     
  • The function is renamed to make it a little more clear what it does.
    It is not added to any .h because it is not for general consumption, only for
    bpf internal use (and so by the jits).

    Signed-off-by: Jan Seiffert
    Acked-by: Eric Dumazet

    Signed-off-by: David S. Miller

    Jan Seiffert
     
  • vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
    time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

    The call to tcp_push() at the end of do_tcp_sendpages() forces an
    immediate xmit when the pipe is not already filled, and tso_fragment() tries
    to split these skbs into MSS multiples.

    4096 bytes are usually split into an skb with 2 MSS and a remaining
    sub-MSS skb (assuming MTU=1500).

    This makes slow start suboptimal because many small frames are sent to
    qdisc/driver layers instead of big ones (constrained by cwnd and packets
    in flight of course)

    In fact, applications using sendmsg() (adding an additional memory copy)
    instead of vmsplice()/splice()/sendfile() are a bit faster because of
    this anomaly, especially if serving small files in environments with
    large initial [c]wnd.

    Call tcp_push() only if MSG_MORE is not set in the flags parameter.

    This bit is automatically set by splice() internals for every page
    except the last, or on all pages if the user specified the SPLICE_F_MORE
    splice() flag.

    In some workloads, this can reduce number of sent logical packets by an
    order of magnitude, making zero-copy TCP actually faster than
    one-copy :)

    Reported-by: Tom Herbert
    Cc: Nandita Dukkipati
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Yuchung Cheng
    Cc: H.K. Jerry Chu
    Cc: Maciej Żenczykowski
    Cc: Mahesh Bandewar
    Cc: Ilpo Järvinen
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
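
The arithmetic behind the message above can be checked with a small model. It assumes an MSS of 1448 bytes (MTU 1500 minus 40 bytes of IP and TCP headers and 12 bytes of TCP timestamps), and it assumes each push yields at most one skb holding the whole MSS multiples plus one sub-MSS skb, ignoring cwnd and TSO size limits. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Wire segments needed to carry `bytes` of payload with the given MSS. */
size_t demo_segments(size_t bytes, size_t mss)
{
    assert(mss > 0);
    return (bytes + mss - 1) / mss;
}

/*
 * Logical packets produced by one push in this simplified model: one skb
 * for the whole MSS multiples (if any) plus one for the sub-MSS remainder
 * (if any).
 */
size_t demo_logical_packets(size_t bytes, size_t mss)
{
    assert(mss > 0);
    return (bytes / mss ? 1 : 0) + (bytes % mss ? 1 : 0);
}
```

Under these assumptions a single 4096-byte page gives 3 wire segments and 2 logical packets, so 16 pages pushed one at a time give 32 logical packets, while one push of all 65536 bytes gives 2.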
     
  • Commit f04565ddf52 (dev: use name hash for dev_seq_ops) added a second
    regression, as some devices are missing from /proc/net/dev if many
    devices are defined.

    When the seq_file buffer is filled, the last ->next/show() method is
    canceled (the pos value is reverted to its value prior to the ->next() call).

    The problem is that, after the above commit, we don't restart the lookup at
    the right position in the ->start() method.

    Fix this by removing the internal 'pos' pointer added in that commit, since
    we need to use the 'loff_t *pos' provided by the seq_file layer.

    This also reverts commit 5cac98dd0 (net: Fix corruption
    in /proc/*/net/dev_mcast), since it's not needed anymore.

    Reported-by: Ben Greear
    Signed-off-by: Eric Dumazet
    Cc: Mihai Maruseac
    Tested-by: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
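
The idea of deriving the current entry purely from the position supplied by the caller can be sketched as below. This is a simplified model, not the kernel's iterator: `demo_table` and `demo_entry_at()` are hypothetical, and the lookup rescans from the start on every call rather than encoding a bucket and offset in the position.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUCKETS    4
#define DEMO_PER_BUCKET 3

/* Hypothetical hash table of device ids; -1 marks an empty slot. */
const int demo_table[DEMO_BUCKETS][DEMO_PER_BUCKET] = {
    { 10, 11, -1 },
    { -1, -1, -1 },
    { 20, -1, -1 },
    { 30, 31, 32 },
};

/*
 * Return the entry at logical position pos, or -1 past the end. Nothing
 * is remembered between calls, so if the caller rewinds pos after a full
 * buffer, the next lookup lands on the right entry.
 */
int demo_entry_at(long pos)
{
    long seen = 0;
    size_t b, s;

    assert(pos >= 0);
    for (b = 0; b < DEMO_BUCKETS; b++) {
        for (s = 0; s < DEMO_PER_BUCKET; s++) {
            if (demo_table[b][s] < 0)
                continue;
            if (seen == pos)
                return demo_table[b][s];
            seen++;
        }
    }
    return -1;
}
```

A private cursor that has already advanced past an entry cannot follow such a rewind, which is how entries went missing from large dumps.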
     

03 Apr, 2012

2 commits

  • If CONFIG_NF_CONNTRACK_TIMEOUT=n we get the following warning:

    CC [M] net/netfilter/xt_CT.o
    net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’:
    net/netfilter/xt_CT.c:284: warning: label ‘err4’ defined but not used

    Reported-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Pull networking fixes from David Miller:

    1) Provide device string properly for USB i2400m wimax devices, also
    don't OOPS when providing firmware string. From Phil Sutter.

    2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu.

    3) Add another device ID to USB zaurus driver, from Guan Xin.

    4) Loop index start in pool vector iterator is wrong causing MAC to not
    get configured in bnx2x driver, fix from Dmitry Kravkov.

    5) EQL driver assumes HZ=100, fix from Eric Dumazet.

    6) Now that skb_add_rx_frag() can specify the truesize increment
    separately, do so in f_phonet and cdc_phonet, also from Eric
    Dumazet.

    7) virtio_net accidentally uses net_ratelimit() not only on the kernel
    warning but also the statistic bump, fix from Rick Jones.

    8) ip_route_input_mc() uses fixed init_net namespace, oops, use
    dev_net(dev) instead. Fix from Benjamin LaHaise.

    9) dev_forward_skb() needs to clear the incoming interface index of the
    SKB so that it looks like a new incoming packet, also from Benjamin
    LaHaise.

    10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of
    5GHZ, fix from Stanislav Yakovlev.

    11) Missing kmalloc() return value checks in orinoco, from Santosh
    Nayak.

    12) ath9k doesn't check for HT capabilities in the right way, it is
    checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from
    Sujith Manoharan.

    13) Fix x86 BPF JIT emission of 16-bit immediate field of AND
    instructions, from Feiran Zhuang.

    14) Avoid infinite loop in GARP code when registering sysfs entries.
    From David Ward.

    15) rose protocol uses memcpy instead of memcmp in a device address
    comparison, oops. Fix from Daniel Borkmann.

    16) Fix build of lpc_eth due to dev_hw_addr_random() interface being
    renamed to eth_hw_addr_random(). From Roland Stigge.

    17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way
    that ipv4 does. Fix from Shmulik Ladkani.

    18) via-rhine has an inverted bit test, causing suspend/resume
    regressions. Fix from Andreas Mohr.

    19) RIONET assumes 4K page size, fix from Akinobu Mita.

    20) Initialization of imask register in sky2 is buggy, because bits are
    "or'd" into an uninitialized local variable. Fix from Lino
    Sanfilippo.

    21) Fix FCOE checksum offload handling, from Yi Zou.

    22) Fix VLAN processing regression in e1000, from Jiri Pirko.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
    sky2: dont overwrite settings for PHY Quick link
    tg3: Fix 5717 serdes powerdown problem
    net: usb: cdc_eem: fix mtu
    net: sh_eth: fix endian check for architecture independent
    usb/rtl8150 : Remove duplicated definitions
    rionet: fix page allocation order of rionet_active
    via-rhine: fix wait-bit inversion.
    ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
    net: lpc_eth: Fix rename of dev_hw_addr_random
    net/netfilter/nfnetlink_acct.c: use linux/atomic.h
    rose_dev: fix memcpy-bug in rose_set_mac_address
    Fix non TBI PHY access; a bad merge undid bug fix in a previous commit.
    net/garp: avoid infinite loop if attribute already exists
    x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND
    bonding: emit event when bonding changes MAC
    mac80211: fix oper channel timestamp updation
    ath9k: Use HW HT capabilites properly
    MAINTAINERS: adding maintainer for ipw2x00
    net: orinoco: add error handling for failed kmalloc().
    net/wireless: ipw2x00: fix a typo in wiphy struct initilization
    ...

    Linus Torvalds
     

02 Apr, 2012

5 commits

  • In IPv4, if an RTA_IIF attribute is specified within an RTM_GETROUTE
    message, then a route is searched as if a packet was received on the
    specified 'iif' interface.

    However in IPv6, RTA_IIF is not interpreted in the same way:
    'inet6_rtm_getroute()' always calls 'ip6_route_output()', regardless of the
    RTA_IIF attribute.

    As a result, in IPv6 there's no way to use RTM_GETROUTE in order to look
    for a route as if a packet was received on a specific interface.

    Fix 'inet6_rtm_getroute()' so that RTA_IIF is interpreted as "lookup a
    route as if a packet was received on the specified interface", similar
    to IPv4's 'inet_rtm_getroute()' interpretation.

    Reported-by: Ami Koren
    Signed-off-by: Shmulik Ladkani
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • There's no known problem here, but this is one of only two non-arch files
    in the kernel which use asm/atomic.h instead of linux/atomic.h.

    Acked-by: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • If both addresses are equal, nothing needs to be done. If the device is down,
    then we simply copy the new address to dev->dev_addr. If the device is up,
    then we add another loopback device with the new address, and if that does
    not fail, we remove the loopback device with the old address. And only
    then, we update the dev->dev_addr.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    danborkmann@iogearbox.net
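
The first step above, detecting that the old and new addresses are equal, needs a comparison rather than a copy. The sketch below covers only the equal case and the device-down case. `DEMO_ADDR_LEN` and `demo_set_addr()` are hypothetical, and the 5-byte length is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ADDR_LEN 5   /* illustrative address length for this sketch */

/*
 * Update dev_addr only when new_addr differs from it.
 * Returns 0 if nothing had to be done, 1 if the address was replaced.
 * Calling memcpy() where memcmp() is intended would overwrite dev_addr
 * before the comparison and make the test meaningless.
 */
int demo_set_addr(unsigned char *dev_addr, const unsigned char *new_addr)
{
    assert(dev_addr != NULL && new_addr != NULL);
    if (memcmp(dev_addr, new_addr, DEMO_ADDR_LEN) == 0)
        return 0;
    memcpy(dev_addr, new_addr, DEMO_ADDR_LEN);
    return 1;
}
```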
     
  • An infinite loop occurred if garp_attr_create was called with the values
    of an existing attribute. This might happen if a previous leave request
    for the attribute has not yet been followed by a PDU transmission (or,
    if the application previously issued a join request for the attribute
    and is now issuing another one, without having issued a leave request).

    If garp_attr_create finds an existing attribute having the same values,
    return the address to it. Its state will then get updated (i.e., if it
    was in a leaving state, it will move into a non-leaving state and not
    get deleted during the next PDU transmission).

    To accomplish this fix, collapse garp_attr_insert into garp_attr_create
    (which is its only caller).

    Thanks to Jorge Boncompte [DTI2] for contributing to
    this fix.

    Signed-off-by: David Ward
    Acked-by: Jorge Boncompte [DTI2]
    Signed-off-by: David S. Miller

    David Ward
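
A create-or-return-existing lookup of the kind described above can be sketched with a plain binary search tree. This is an assumption-laden illustration: `demo_attr` and `demo_attr_create()` are hypothetical, an integer key stands in for the attribute values, and no rebalancing is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute node keyed by a single integer. */
struct demo_attr {
    int key;
    struct demo_attr *left;
    struct demo_attr *right;
};

/* Return the attribute with this key, creating it if it does not exist. */
struct demo_attr *demo_attr_create(struct demo_attr **root, int key)
{
    struct demo_attr **p = root;
    struct demo_attr *attr;

    assert(root != NULL);
    while (*p) {
        if (key < (*p)->key)
            p = &(*p)->left;
        else if (key > (*p)->key)
            p = &(*p)->right;
        else
            return *p;   /* equal key: hand back the existing node */
    }
    attr = calloc(1, sizeof(*attr));
    if (!attr)
        return NULL;
    attr->key = key;
    *p = attr;
    return attr;
}
```

Without the equal-key branch the walk would neither descend nor terminate when the key is already present, which matches the infinite loop the message reports.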
     
  • David S. Miller
     

30 Mar, 2012

2 commits

  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32-bit address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     
  • Pull nfsd changes from Bruce Fields:

    Highlights:
    - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
    moving us closer to a complete 4.1 implementation.
    - Bernd Schubert fixed a long-standing problem with readdir cookies on
    ext2/3/4.
    - Jeff Layton performed a long-overdue overhaul of the server reboot
    recovery code which will allow us to deprecate the current code (a
    rather unusual user of the vfs), and give us some needed flexibility
    for further improvements.
    - Like the client, we now support numeric uid's and gid's in the
    auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.

    Plus miscellaneous bugfixes and cleanup.

    Thanks to everyone!

    There are also some delegation fixes waiting on vfs review that I
    suppose will have to wait for 3.5. With that done I think we'll finally
    turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
    symbolic as it's been on by default in distro's for a while).

    And the list of 4.1 todo's should be achievable for 3.5 as well:

    http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

    though we may still want a bit more experience with it before turning it
    on by default.

    * 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
    nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
    nfsd4: use auth_unix unconditionally on backchannel
    nfsd: fix NULL pointer dereference in cld_pipe_downcall
    nfsd4: memory corruption in numeric_name_to_id()
    sunrpc: skip portmap calls on sessions backchannel
    nfsd4: allow numeric idmapping
    nfsd: don't allow legacy client tracker init for anything but init_net
    nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
    nfsd: add the infrastructure to handle the cld upcall
    nfsd: add a header describing upcall to nfsdcld
    nfsd: add a per-net-namespace struct for nfsd
    sunrpc: create nfsd dir in rpc_pipefs
    nfsd: add nfsd4_client_tracking_ops struct and a way to set it
    nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
    NFSD: Fix nfs4_verifier memory alignment
    NFSD: Fix warnings when NFSD_DEBUG is not defined
    nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
    nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
    ext4: return 32/64-bit dir name hash according to usage type
    fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
    ...

    Linus Torvalds
     

29 Mar, 2012

6 commits

  • Pull NFS client bugfixes for Linux 3.4 from Trond Myklebust

    Highlights include:
    - Fix infinite loops in the mount code
    - Fix a userspace buffer overflow in __nfs4_get_acl_uncached
    - Fix a memory leak due to a double reference count in rpcb_getport_async()

    Signed-off-by: Trond Myklebust

    * tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
    NFSv4.1: Fix layoutcommit error handling
    NFSv4: Fix two infinite loops in the mount code
    SUNRPC: Use the already looked-up xprt in rpcb_getport_async()
    NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit
    Fix length of buffer copied in __nfs4_get_acl_uncached

    Linus Torvalds
     
  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it..

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Whenever the station informs the AP that it is about to leave the
    operating channel, the timestamp should be recorded. It is handled
    in scan resume but not in scan start. Fix that.

    Signed-off-by: Rajkumar Manoharan
    Signed-off-by: John W. Linville

    Rajkumar Manoharan
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

    Signed-off-by: David Howells

    David Howells
     
  • Pull Ceph updates for 3.4-rc1 from Sage Weil:
    "Alex has been busy. There are a range of rbd and libceph cleanups,
    especially surrounding device setup and teardown, and a few critical
    fixes in that code. There are more cleanups in the messenger code,
    virtual xattrs, a fix for CRC calculation/checks, and lots of other
    miscellaneous stuff.

    There's a patch from Amon Ott to make inos behave a bit better on
    32-bit boxes, some decode check fixes from Xi Wang, and network
    throttling fix from Jim Schutt, and a couple RBD fixes from Josh
    Durgin.

    No new functionality, just a lot of cleanup and bug fixing."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits)
    rbd: move snap_rwsem to the device, rename to header_rwsem
    ceph: fix three bugs, two in ceph_vxattrcb_file_layout()
    libceph: isolate kmap() call in write_partial_msg_pages()
    libceph: rename "page_shift" variable to something sensible
    libceph: get rid of zero_page_address
    libceph: only call kernel_sendpage() via helper
    libceph: use kernel_sendpage() for sending zeroes
    libceph: fix inverted crc option logic
    libceph: some simple changes
    libceph: small refactor in write_partial_kvec()
    libceph: do crc calculations outside loop
    libceph: separate CRC calculation from byte swapping
    libceph: use "do" in CRC-related Boolean variables
    ceph: ensure Boolean options support both senses
    libceph: a few small changes
    libceph: make ceph_tcp_connect() return int
    libceph: encapsulate some messenger cleanup code
    libceph: make ceph_msgr_wq private
    libceph: encapsulate connection kvec operations
    libceph: move prepare_write_banner()
    ...

    Linus Torvalds
     
  • Pull 9p changes for the 3.4 merge window from Eric Van Hensbergen.

    * tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: statfs should not override server f_type
    net/9p: handle flushed Tclunk/Tremove
    net/9p: don't allow Tflush to be interrupted

    Linus Torvalds
     

28 Mar, 2012

6 commits

  • While investigating another bug, I found that the code on the incoming path
    in __netif_receive_skb will only set skb->skb_iif if it is already 0. When
    dev_forward_skb() is used in the case of interfaces like veth, skb_iif may
    already have been set. Making dev_forward_skb() cause the packet to look
    like a newly received packet would seem to be the correct behaviour here,
    as otherwise the wrong incoming interface can be reported for such a packet.
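
    The effect can be sketched with a small userspace model. It assumes the
    fix amounts to clearing skb_iif when the packet is forwarded. The toy_*
    names are hypothetical and only the skb_iif field mirrors the real
    sk_buff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one sk_buff field discussed here. */
struct toy_skb {
    int skb_iif;    /* 0 means "no incoming interface recorded yet" */
};

/* Receive path: record the interface only if none is recorded. */
void toy_receive(struct toy_skb *skb, int ifindex)
{
    assert(skb != NULL);
    if (!skb->skb_iif)
        skb->skb_iif = ifindex;
}

/* Forward to a peer device; optionally make the packet look new. */
void toy_forward(struct toy_skb *skb, int reset_iif)
{
    assert(skb != NULL);
    if (reset_iif)
        skb->skb_iif = 0;
}

/* Receive on one device, forward, receive on the peer; report skb_iif. */
int toy_iif_after_forward(int first_ifindex, int peer_ifindex, int reset_iif)
{
    struct toy_skb skb = { 0 };

    toy_receive(&skb, first_ifindex);
    toy_forward(&skb, reset_iif);
    toy_receive(&skb, peer_ifindex);
    return skb.skb_iif;
}
```

    Without the reset, the second receive keeps the stale index of the first
    device. With it, the peer's index is recorded.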

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     
  • When using multicast over a local bridge feeding a number of LXC guests
    using veth, the LXC guests are unable to get a response from other guests
    when pinging 224.0.0.1. Multicast packets did not appear to be getting
    delivered to the network namespaces of the guest hosts, and further
    inspection showed that the incoming route was pointing to the loopback
    device of the host, not the guest. This led to the wrong network namespace
    being picked up by sockets (like ICMP). Fix this by using the correct
    network namespace when creating the inbound route entry.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     
  • David S. Miller
     
  • Pull networking fixes from David Miller:
    1) Name string overrun fix in gianfar driver from Joe Perches.

    2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El

    3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso.

    4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso.

    5) Add a parameter to skb_add_rx_frag() so we can fix the truesize
    adjustments in the drivers that use it. The individual drivers
    aren't fixed by this commit, but will be dealt with using follow-on
    commits. From Eric Dumazet.

    6) Add some device IDs to qmi_wwan driver, from Andrew Bird.

    7) Fix a potential rcu_read_lock() imbalance in rt6_fill_node(). From
    Eric Dumazet.
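
    As a sketch of why item 5 adds a parameter: the bytes of payload in a
    fragment and the memory that fragment pins are different quantities. The
    model below uses hypothetical toy_* names and assumes the helper adds the
    payload size to len and data_len and the caller-supplied truesize to
    truesize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three sk_buff counters involved. */
struct toy_skb {
    unsigned int len;       /* total payload bytes */
    unsigned int data_len;  /* payload bytes held in fragments */
    unsigned int truesize;  /* memory charged for this buffer */
};

/* Attach a fragment carrying `size` bytes in a `truesize`-byte buffer. */
void toy_add_rx_frag(struct toy_skb *skb, unsigned int size,
                     unsigned int truesize)
{
    assert(skb != NULL);
    assert(size <= truesize);   /* payload cannot exceed its buffer */

    skb->len += size;
    skb->data_len += size;
    skb->truesize += truesize;  /* charge the buffer, not just the payload */
}
```

    A 100-byte frame received into a 2048-byte buffer would then be charged
    2048 bytes rather than 100.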

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()
    net: add a truesize parameter to skb_add_rx_frag()
    gianfar: Fix possible overrun and simplify interrupt name field creation
    USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces
    USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces
    USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces
    qlcnic: Bug fix for LRO
    netfilter: nf_conntrack: permanently attach timeout policy to conntrack
    netfilter: xt_CT: fix assignation of the generic protocol tracker
    netfilter: xt_CT: missing rcu_read_lock section in timeout assignment
    netfilter: cttimeout: fix dependency with l4protocol conntrack module
    netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6
    vhost: fix release path lockdep checks
    vhost: don't forget to schedule()
    tools/virtio: stub out strong barriers
    tools/virtio: add linux/hrtimer.h stub
    tools/virtio: add linux/module.h stub

    Linus Torvalds
     
  • Commit f2c31e32b378 (net: fix NULL dereferences in check_peer_redir() )
    added a regression in rt6_fill_node(), leading to rcu_read_lock()
    imbalance.

    That's because NLA_PUT() can jump to the nla_put_failure label.

    Fix this by using nla_put()
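
    The hazard can be sketched in plain C. This is a userspace model, not the
    kernel code: TOY_PUT() imitates a macro that hides a goto, and a counter
    stands in for the RCU read-side nesting depth. All toy_* names are
    hypothetical.

```c
#include <assert.h>

static int toy_lock_depth;  /* stands in for RCU read-side nesting */

static void toy_read_lock(void)
{
    toy_lock_depth++;
}

static void toy_read_unlock(void)
{
    assert(toy_lock_depth > 0);
    toy_lock_depth--;
}

/* Stand-in for an attribute put: negative return when there is no room. */
static int toy_put(int room)
{
    return room > 0 ? 0 : -1;
}

/* Macro with a hidden jump, in the style described above. */
#define TOY_PUT(room) \
    do { if (toy_put(room) < 0) goto put_failure; } while (0)

/* Buggy shape: the hidden goto skips the unlock on failure. */
int toy_fill_buggy(int room)
{
    toy_read_lock();
    TOY_PUT(room);          /* may jump straight to put_failure */
    toy_read_unlock();
    return 0;

put_failure:
    return -1;              /* the lock is still held here */
}

/* Fixed shape: call the function, unlock, then check the result. */
int toy_fill_fixed(int room)
{
    int err;

    toy_read_lock();
    err = toy_put(room);
    toy_read_unlock();      /* reached on success and on failure */
    if (err < 0)
        goto put_failure;
    return 0;

put_failure:
    return -1;
}

/* Current nesting depth; nonzero after a call means an imbalance. */
int toy_depth(void)
{
    return toy_lock_depth;
}
```

    In the fixed shape the error is still reported through the same label,
    but only after the read-side section has been closed.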

    Many thanks to Ben Greear for his help

    Reported-by: Ben Greear
    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Tested-by: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • rpcb_getport_async() was looking up the rpc_xprt (reference++) and then
    later looking it up again (reference++) to pass through the
    rpcbind_args. The xprt reference would only be dropped once, when we were
    done with the rpcbind_args (reference--). This leaves an extra
    reference to the transport that would never go away.
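
    The accounting can be sketched with a toy reference counter. The names
    are hypothetical, and the "fixed" function shows one plausible shape of
    the fix (a single lookup whose reference is handed on), not necessarily
    the one the patch takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted transport. */
struct toy_xprt {
    int refcount;
};

static struct toy_xprt *toy_xprt_get(struct toy_xprt *x)
{
    assert(x != NULL);
    x->refcount++;
    return x;
}

static void toy_xprt_put(struct toy_xprt *x)
{
    assert(x != NULL && x->refcount > 0);
    x->refcount--;
}

/* Buggy shape: two lookups, one release. Returns the final count. */
int toy_getport_buggy(struct toy_xprt *x)
{
    struct toy_xprt *local = toy_xprt_get(x);     /* first lookup */
    struct toy_xprt *for_args = toy_xprt_get(x);  /* second lookup */

    (void)local;                /* this reference is never released */
    toy_xprt_put(for_args);     /* released when the args are done */
    return x->refcount;
}

/* Fixed shape: one lookup, and the same reference is handed on. */
int toy_getport_fixed(struct toy_xprt *x)
{
    struct toy_xprt *held = toy_xprt_get(x);

    toy_xprt_put(held);         /* released when the args are done */
    return x->refcount;
}
```

    Starting from a count of 1, the buggy shape ends at 2 after every call,
    so the transport can never be freed. The fixed shape returns to 1.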

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

27 Mar, 2012

3 commits

  • It is possible that we will arm the tid_rx->reorder_timer after
    del_timer_sync() in ___ieee80211_stop_rx_ba_session(). We need to stop
    the timer after the RCU grace period finishes, so move it to
    ieee80211_free_tid_rx(). The timer will not be armed again, as
    rcu_dereference(sta->ampdu_mlme.tid_rx[tid]) will return NULL.
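
    The ordering problem can be sketched sequentially. This is a
    single-threaded model with hypothetical names: the "in-flight reader"
    stands for an RX path that fetched the tid_rx pointer before it was
    cleared and can therefore still re-arm the timer until the grace period
    ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-TID RX state and its timer. */
struct toy_tid_rx {
    int timer_armed;
};

/* A reader that already holds the pointer may still re-arm the timer. */
static void toy_inflight_reader(struct toy_tid_rx *t)
{
    assert(t != NULL);
    t->timer_armed = 1;
}

static void toy_del_timer(struct toy_tid_rx *t)
{
    assert(t != NULL);
    t->timer_armed = 0;
}

/* Returns 1 if the timer is still armed when the object would be freed. */
int toy_armed_at_free(int stop_timer_after_grace_period)
{
    struct toy_tid_rx t = { 1 };

    if (!stop_timer_after_grace_period) {
        toy_del_timer(&t);          /* old order: stop the timer first */
        toy_inflight_reader(&t);    /* reader re-arms before it drains */
    } else {
        toy_inflight_reader(&t);    /* grace period: readers drain */
        toy_del_timer(&t);          /* new order: stop just before free */
    }
    return t.timer_armed;
}
```

    With the old order the object reaches the free step with an active timer,
    which matches the kind of condition the warning below reports.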

    Debug objects detected the problem with the following warning:
    ODEBUG: free active (active state 0) object type: timer_list hint: sta_rx_agg_reorder_timer_expired+0x0/0xf0 [mac80211]

    Bug report (with all warning messages):
    https://bugzilla.redhat.com/show_bug.cgi?id=804007

    Reported-by: "jan p. springer"
    Cc: stable@vger.kernel.org
    Signed-off-by: Stanislaw Gruszka
    Signed-off-by: John W. Linville

    Stanislaw Gruszka
     
  • The on-oper-channel optimization was reverted,
    so remove the outdated comment as well.

    Signed-off-by: Eliad Peller
    Signed-off-by: John W. Linville

    Eliad Peller
     
  • The station_info struct had demanded dBm signal values, but the
    cfg80211 wireless extensions implementation was also accepting
    "unspecified" (i.e. RSSI) unit values while the nl80211 code was
    completely unaware of them. Resolve this by formally allowing the
    "unspecified" units while making nl80211 ignore them.
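
    A minimal sketch of the resulting policy, using hypothetical names rather
    than the real cfg80211 definitions: the wireless-extensions-style
    consumer accepts both units, while the nl80211-style consumer reports a
    value only when it is in dBm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unit tags; not the kernel's enum. */
enum toy_signal_unit {
    TOY_SIGNAL_NONE,
    TOY_SIGNAL_DBM,
    TOY_SIGNAL_UNSPEC   /* unitless, RSSI-like value */
};

struct toy_station_info {
    enum toy_signal_unit unit;
    int signal;
};

/* nl80211-like consumer: only dBm values are reported. */
int toy_nl_report_signal(const struct toy_station_info *si, int *out)
{
    assert(si != NULL && out != NULL);
    if (si->unit != TOY_SIGNAL_DBM)
        return 0;       /* silently ignore other units */
    *out = si->signal;
    return 1;
}

/* wext-like consumer: dBm and unspecified values are both accepted. */
int toy_wext_report_signal(const struct toy_station_info *si, int *out)
{
    assert(si != NULL && out != NULL);
    if (si->unit == TOY_SIGNAL_NONE)
        return 0;
    *out = si->signal;
    return 1;
}
```

    Each function returns 1 and stores the value in *out when it reports the
    signal, and returns 0 when it skips it.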

    Signed-off-by: John W. Linville
    Reviewed-by: Johannes Berg

    John W. Linville