14 Sep, 2013

1 commit

  • Pull aio changes from Ben LaHaise:
    "First off, sorry for this pull request being late in the merge window.
    Al had raised a couple of concerns about 2 items in the series below.
    I addressed the first issue (the race introduced by Gu's use of
    mm_populate()), but he has not provided any further details on how he
    wants to rework the anon_inode.c changes (which were sent out months
    ago but have yet to be commented on).

    The bulk of the changes have been sitting in the -next tree for a few
    months, with all the issues raised being addressed"

    * git://git.kvack.org/~bcrl/aio-next: (22 commits)
    aio: rcu_read_lock protection for new rcu_dereference calls
    aio: fix race in ring buffer page lookup introduced by page migration support
    aio: fix rcu sparse warnings introduced by ioctx table lookup patch
    aio: remove unnecessary debugging from aio_free_ring()
    aio: table lookup: verify ctx pointer
    staging/lustre: kiocb->ki_left is removed
    aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
    aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
    aio: convert the ioctx list to table lookup v3
    aio: double aio_max_nr in calculations
    aio: Kill ki_dtor
    aio: Kill ki_users
    aio: Kill unneeded kiocb members
    aio: Kill aio_rw_vect_retry()
    aio: Don't use ctx->tail unnecessarily
    aio: io_cancel() no longer returns the io_event
    aio: percpu ioctx refcount
    aio: percpu reqs_available
    aio: reqs_active -> reqs_available
    aio: fix build when migration is disabled
    ...

    Linus Torvalds
     

13 Sep, 2013

5 commits

  • After the last architecture switched to generic hard irqs the config
    options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
    for !CONFIG_GENERIC_HARDIRQS can be removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Merge more patches from Andrew Morton:
    "The rest of MM. Plus one misc cleanup"

    * emailed patches from Andrew Morton: (35 commits)
    mm/Kconfig: add MMU dependency for MIGRATION.
    kernel: replace strict_strto*() with kstrto*()
    mm, thp: count thp_fault_fallback anytime thp fault fails
    thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
    thp: do_huge_pmd_anonymous_page() cleanup
    thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
    mm: cleanup add_to_page_cache_locked()
    thp: account anon transparent huge pages into NR_ANON_PAGES
    truncate: drop 'oldsize' truncate_pagecache() parameter
    mm: make lru_add_drain_all() selective
    memcg: document cgroup dirty/writeback memory statistics
    memcg: add per cgroup writeback pages accounting
    memcg: check for proper lock held in mem_cgroup_update_page_stat
    memcg: remove MEMCG_NR_FILE_MAPPED
    memcg: reduce function dereference
    memcg: avoid overflow caused by PAGE_ALIGN
    memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
    memcg: correct RESOURCE_MAX to ULLONG_MAX
    mm: memcg: do not trap chargers with full callstack on OOM
    mm: memcg: rework and document OOM waiting and wakeup
    ...

    Linus Torvalds
     
  • RESOURCE_MAX is far too general a name; change it to RES_COUNTER_MAX.

    Signed-off-by: Sha Zhengju
    Signed-off-by: Qiang Huang
    Acked-by: Michal Hocko
    Cc: Daisuke Nishimura
    Cc: Jeff Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sha Zhengju
     
  • Pull vfs pile 4 from Al Viro:
    "list_lru pile, mostly"

    This came out of Andrew's pile, Al ended up doing the merge work so that
    Andrew didn't have to.

    Additionally, a few fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
    super: fix for destroy lrus
    list_lru: dynamically adjust node arrays
    shrinker: Kill old ->shrink API.
    shrinker: convert remaining shrinkers to count/scan API
    staging/lustre/libcfs: cleanup linux-mem.h
    staging/lustre/ptlrpc: convert to new shrinker API
    staging/lustre/obdclass: convert lu_object shrinker to count/scan API
    staging/lustre/ldlm: convert to shrinkers to count/scan API
    hugepage: convert huge zero page shrinker to new shrinker API
    i915: bail out earlier when shrinker cannot acquire mutex
    drivers: convert shrinkers to new count/scan API
    fs: convert fs shrinkers to new scan/count API
    xfs: fix dquot isolation hang
    xfs-convert-dquot-cache-lru-to-list_lru-fix
    xfs: convert dquot cache lru to list_lru
    xfs: rework buffer dispose list tracking
    xfs-convert-buftarg-lru-to-generic-code-fix
    xfs: convert buftarg LRU to generic code
    fs: convert inode and dentry shrinking to be node aware
    vmscan: per-node deferred work
    ...

    Linus Torvalds
     
  • Pull NFS client bugfixes (part 2) from Trond Myklebust:
    "Bugfixes:
    - Fix a few credential reference leaks resulting from the
    SP4_MACH_CRED NFSv4.1 state protection code.
    - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into
    the intended 64 byte structure.
    - Fix a long standing XDR issue with FREE_STATEID
    - Fix a potential WARN_ON spamming issue
    - Fix a missing dprintk() kuid conversion

    New features:
    - Enable the NFSv4.1 state protection support for the WRITE and
    COMMIT operations"

    * tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: No, I did not intend to create a 256KiB hashtable
    sunrpc: Add missing kuids conversion for printing
    NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE
    NFSv4.1: sp4_mach_cred: no need to ref count creds
    NFSv4.1: fix SECINFO* use of put_rpccred
    NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
    NFSv4.1 fix decode_free_stateid

    Linus Torvalds
     

12 Sep, 2013

15 commits

  • Fix the declaration of the gss_auth_hash_table so that it creates
    a 16 bucket hashtable, as I had intended.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • m68k/allmodconfig:

    net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’:
    net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but
    argument 2 has type ‘kuid_t’

    commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and
    kgids to uids and gids for printing") forgot to convert one instance.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Trond Myklebust

    Geert Uytterhoeven
     
  • Merge first patch-bomb from Andrew Morton:
    - Some pidns/fork/exec tweaks
    - OCFS2 updates
    - Most of MM - there remain quite a few memcg parts which depend on
    pending core cgroups changes. Which might have been already merged -
    I'll check tomorrow...
    - Various misc stuff all over the place
    - A few block bits which I never got around to sending to Jens -
    relatively minor things.
    - MAINTAINERS maintenance
    - A small number of lib/ updates
    - checkpatch updates
    - epoll
    - firmware/dmi-scan
    - Some kprobes work for S390
    - drivers/rtc updates
    - hfsplus feature work
    - vmcore feature work
    - rbtree upgrades
    - AOE updates
    - pktcdvd cleanups
    - PPS
    - memstick
    - w1
    - New "initmpfs" feature, which does the obvious
    - More IPC work from Davidlohr.

    * emailed patches from Andrew Morton: (303 commits)
    lz4: fix compression/decompression signedness mismatch
    ipc: drop ipc_lock_check
    ipc, shm: drop shm_lock_check
    ipc: drop ipc_lock_by_ptr
    ipc, shm: guard against non-existant vma in shmdt(2)
    ipc: document general ipc locking scheme
    ipc,msg: drop msg_unlock
    ipc: rename ids->rw_mutex
    ipc,shm: shorten critical region for shmat
    ipc,shm: cleanup do_shmat pasta
    ipc,shm: shorten critical region for shmctl
    ipc,shm: make shmctl_nolock lockless
    ipc,shm: introduce shmctl_nolock
    ipc: drop ipcctl_pre_down
    ipc,shm: shorten critical region in shmctl_down
    ipc,shm: introduce lockless functions to obtain the ipc object
    initmpfs: use initramfs if rootfstype= or root= specified
    initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled
    initmpfs: move rootfs code from fs/ramfs/ to init/
    initmpfs: move bdi setup from init_rootfs to init_ramfs
    ...

    Linus Torvalds
     
  • I found the following patterns, which lead to interesting findings:

    grep -r "ret.*|=.*__put_user" *
    grep -r "ret.*|=.*__get_user" *
    grep -r "ret.*|=.*__copy" *

    The __put_user() calls are in compat_ioctl.c, ptrace compat and signal
    compat. Since those appear in compat code, we can probably expect the
    kernel addresses not to be reachable in the lower 32-bit range, so I
    think they might not be exploitable.

    For the "__get_user" cases, I don't think those are exploitable: the worst
    that can happen is that the kernel will copy kernel memory into in-kernel
    buffers, and will fail immediately afterward.

    The alpha csum_partial_copy_from_user() seems to be missing the
    access_ok() check entirely. The fix is inspired by x86. This could
    lead to an information leak on alpha. I also noticed that many architectures
    map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
    wonder if the latter is performing the access checks on every
    architecture.

    Signed-off-by: Mathieu Desnoyers
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Jens Axboe
    Cc: Oleg Nesterov
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • Pull networking fixes from David Miller:

    1) Brown paper bag fix in HTB scheduler, class options set incorrectly
    due to a typo. Fix from Vimalkumar.

    2) It's possible for the ipv6 FIB garbage collector to run before all
    the necessary data structures are set up during init; defer the
    notifier registration to avoid this problem. Fix from Michal Kubecek.

    3) New i40e ethernet driver from the Intel folks.

    4) Add new qmi wwan device IDs, from Bjørn Mork.

    5) Doorbell lock in bnx2x driver is not initialized properly in some
    configurations, fix from Ariel Elior.

    6) Revert an ipv6 packet option padding change that broke standardized
    ipv6 implementation test suites. From Jiri Pirko.

    7) Fix synchronization of ARP information in bonding layer, from
    Nikolay Aleksandrov.

    8) Fix missing error return resulting in illegal memory accesses in
    openvswitch, from Daniel Borkmann.

    9) SCTP doesn't signal poll events properly due to mistaken operator
    precedence, fix also from Daniel Borkmann.

    10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which
    essentially disables caching of TX queue in sockets :-/ Fix from
    Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
    net_sched: htb: fix a typo in htb_change_class()
    net: qmi_wwan: add new Qualcomm devices
    ipv6: don't call fib6_run_gc() until routing is ready
    net: tilegx driver: avoid compiler warning
    fib6_rules: fix indentation
    irda: vlsi_ir: Remove casting the return value which is a void pointer
    irda: donauboe: Remove casting the return value which is a void pointer
    net: fix multiqueue selection
    net: sctp: fix smatch warning in sctp_send_asconf_del_ip
    net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
    net: fib: fib6_add: fix potential NULL pointer dereference
    net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
    bcm63xx_enet: remove deprecated IRQF_DISABLED
    net: korina: remove deprecated IRQF_DISABLED
    macvlan: Move skb_clone check closer to call
    qlcnic: Fix warning reported by kbuild test robot.
    bonding: fix bond_arp_rcv setting and arp validate desync state
    bonding: fix store_arp_validate race with mode change
    ipv6/exthdrs: accept tlv which includes only padding
    bnx2x: avoid atomic allocations during initialization
    ...

    Linus Torvalds
     
  • Fix a typo added in commit 56b765b79 ("htb: improved accuracy at high
    rates")

    cbuffer should not be a copy of buffer.

    Signed-off-by: Vimalkumar
    Signed-off-by: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Cc: Jiri Pirko
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vimalkumar
     
  • When loading the ipv6 module, ndisc_init() is called before
    ip6_route_init(). As the former registers a handler calling
    fib6_run_gc(), this opens a window to run the garbage collector
    before necessary data structures are initialized. If a network
    device is initialized in this window, adding a MAC address to it
    triggers a NETDEV_CHANGEADDR event, leading to a crash in
    fib6_clean_all().

    Take the event handler registration out of ndisc_init() into a
    separate function ndisc_late_init() and move it after
    ip6_route_init().

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • This change just removes two tabs from the source file.

    Signed-off-by: Stefan Tomanek
    Signed-off-by: David S. Miller

    Stefan Tomanek
     
  • commit 416186fbf8c5b4e4465 ("net: Split core bits of netdev_pick_tx
    into __netdev_pick_tx") added a bug that disables caching of queue
    index in the socket.

    This is the source of packet reorders for TCP flows, and
    again this is happening more often when using FQ pacing.

    Old code was doing

    if (queue_index != old_index)
    sk_tx_queue_set(sk, queue_index);

    Alexander renamed the variables but forgot to change sk_tx_queue_set()
    2nd parameter.

    if (queue_index != new_index)
    sk_tx_queue_set(sk, queue_index);

    This means we store -1 over and over in sk->sk_tx_queue_mapping

    Signed-off-by: Eric Dumazet
    Cc: Alexander Duyck
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This was originally reported in [1] and posted by Neil Horman [2], who said:

    Fix up a missed null pointer check in the asconf code. If we don't find
    a local address, but we pass in an address length of more than 1, we may
    dereference a NULL laddr pointer. Currently this can't happen, as the only
    users of the function pass in the value 1 as the addrcnt parameter, but
    it's not a hot path, and it doesn't hurt to check for NULL should that ever
    be the case.

    The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered
    from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1
    while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR
    so that we do *not* find a single address in the association's bind address
    list that is not in the packed array of addresses. If this happens when we
    have an established association with ASCONF-capable peers, then we could get
    a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1
    and call later sctp_make_asconf_update_ip() with NULL laddr.

    BUT: this actually won't happen as sctp_bindx_rem() will catch such a case
    and return with an error earlier. As this is incredibly unintuitive and error
    prone, add a check to catch at least future bugs here. As Neil says, it's not
    a hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the
    single-homed host").

    [1] http://www.spinics.net/lists/linux-sctp/msg02132.html
    [2] http://www.spinics.net/lists/linux-sctp/msg02133.html

    Reported-by: Dan Carpenter
    Signed-off-by: Neil Horman
    Signed-off-by: Daniel Borkmann
    Cc: Michio Honda
    Acked-By: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • If we do not add parentheses around ...

    mask |= POLLERR |
    sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0;

    ... then this condition always evaluates to true as POLLERR is
    defined as 8 and binary or'd with whatever result comes out of
    sock_flag(). Hence instead of (X | Y) ? A : B, transform it into
    X | (Y ? A : B). Unfortunately, commit 8facd5fb73 ("net: fix
    smatch warnings inside datagram_poll") forgot about SCTP. :-(

    Introduced by 7d4c04fc170 ("net: add option to enable error queue
    packets waking select").

    Signed-off-by: Daniel Borkmann
    Cc: Jacob Keller
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Acked-by: Jacob Keller
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
    with an error in fn = fib6_add_1(), then error codes are encoded into
    the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
    write the error code into err and jump to out, hence enter the if(err)
    condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:

    if (pn != fn && pn->leaf == rt)
    ...
    if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
    ...

    Since pn is NULL and fn is e.g. ERR_PTR(-ENOENT), then pn != fn
    evaluates to true and causes a NULL-pointer dereference on further
    checks on pn. Fix it, by setting both NULL in error case, so that
    pn != fn already evaluates to false and no further dereference
    takes place.

    This was first correctly implemented in 4a287eba2 ("IPv6 routing,
    NLM_F_* flag support: REPLACE and EXCL flags support, warn about
    missing CREATE flag"), but the bug was later introduced by
    188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()").

    Signed-off-by: Daniel Borkmann
    Cc: Lin Ming
    Cc: Matti Vaittinen
    Cc: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Acked-by: Matti Vaittinen
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In function __parse_flow_nlattrs(), we check for condition
    (type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
    not return from this function as in other checks. It seems this
    has been forgotten, as otherwise, we could access beyond the
    memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
    Hence, a maliciously prepared nla_type from user space could access
    beyond this upper limit.

    Introduced by 03f0d916a ("openvswitch: Mega flow implementation").

    Signed-off-by: Daniel Borkmann
    Cc: Andy Zhou
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In rfc4942 and rfc2460 I cannot find anything which would imply that
    packets which have only padding in tlv should be dropped.

    Current behaviour breaks TAHI Test v6LC.1.2.6.

    Problem was introduced in:
    9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check"

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Pull 9p updates from Eric Van Hensbergen:
    "Minor 9p fixes and tweaks for 3.12 merge window

    The first fixes a namespace issue which causes a kernel NULL pointer
    dereference, the second fixes uevent handling to work better with
    udev, and the third switches some code to use strlcpy instead of
    strncpy in order to be safer.

    All changes have been baking in for-next for at least 2 weeks"

    * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: avoid accessing utsname after namespace has been torn down
    9p: send uevent after adding/removing mount_tag attribute
    fs: 9p: use strlcpy instead of strncpy

    Linus Torvalds
     

11 Sep, 2013

2 commits

  • Pull nfsd updates from Bruce Fields:
    "This was a very quiet cycle! Just a few bugfixes and some cleanup"

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
    rpc: let xdr layer allocate gssproxy receieve pages
    rpc: fix huge kmalloc's in gss-proxy
    rpc: comment on linux_cred encoding, treat all as unsigned
    rpc: clean up decoding of gssproxy linux creds
    svcrpc: remove unused rq_resused
    nfsd4: nfsd4_create_clid_dir prints uninitialized data
    nfsd4: fix leak of inode reference on delegation failure
    Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
    sunrpc: prepare NFS for 2038
    nfsd4: fix setlease error return
    nfsd: nfs4_file_get_access: need to be more careful with O_RDWR

    Linus Torvalds
     
  • Convert the remaining couple of random shrinkers in the tree to the new
    API.

    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Cc: Chuck Lever
    Cc: J. Bruce Fields
    Cc: Trond Myklebust
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     

10 Sep, 2013

2 commits

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
    such as lease loss due to a network partition, where doing so may
    result in data corruption. Add a kernel parameter to control
    choice of legacy behaviour or not.
    - Performance improvements when 2 processes are writing to the same
    file.
    - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
    - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
    NFS clients from being able to manipulate our lease and file
    locking state.
    - Allow sharing of RPCSEC_GSS caches between different rpc clients.
    - Fix the broken NFSv4 security auto-negotiation between client and
    server.
    - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
    - Add a tracepoint framework for debugging NFSv4 state recovery
    issues.
    - Add tracing to the generic NFS layer.
    - Add tracing for the SUNRPC socket connection state.
    - Clean up the rpc_pipefs mount/umount event management.
    - Merge more patches from Chuck in preparation for NFSv4 migration
    support"

    * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
    NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
    NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
    NFSv4: Allow security autonegotiation for submounts
    NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
    NFSv4: Fix security auto-negotiation
    NFS: Clean up nfs_parse_security_flavors()
    NFS: Clean up the auth flavour array mess
    NFSv4.1 Use MDS auth flavor for data server connection
    NFS: Don't check lock owner compatability unless file is locked (part 2)
    NFS: Don't check lock owner compatibility in writes unless file is locked
    nfs4: Map NFS4ERR_WRONG_CRED to EPERM
    nfs4.1: Add SP4_MACH_CRED write and commit support
    nfs4.1: Add SP4_MACH_CRED stateid support
    nfs4.1: Add SP4_MACH_CRED secinfo support
    nfs4.1: Add SP4_MACH_CRED cleanup support
    nfs4.1: Add state protection handler
    nfs4.1: Minimal SP4_MACH_CRED implementation
    SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
    SUNRPC: Add an identifier for struct rpc_clnt
    SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
    ...

    Linus Torvalds
     
  • Pull ceph updates from Sage Weil:
    "This includes both the first pile of Ceph patches (which I sent to
    torvalds@vger, sigh) and a few new patches that add support for
    fscache for Ceph. That includes a few fscache core fixes that David
    Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski
    for putting this feature together)

    This first batch of patches (included here) had (has) several
    important RBD bug fixes, hole punch support, several different
    cleanups in the page cache interactions, improvements in the truncate
    code (new truncate mutex to avoid shenanigans with i_mutex), and a
    series of fixes in the synchronous striping read/write code.

    On top of that is a random collection of small fixes all across the
    tree (error code checks and error path cleanup, obsolete wq flags,
    etc)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
    ceph: use d_invalidate() to invalidate aliases
    ceph: remove ceph_lookup_inode()
    ceph: trivial buildbot warnings fix
    ceph: Do not do invalidate if the filesystem is mounted nofsc
    ceph: page still marked private_2
    ceph: ceph_readpage_to_fscache didn't check if marked
    ceph: clean PgPrivate2 on returning from readpages
    ceph: use fscache as a local presisent cache
    fscache: Netfs function for cleanup post readpages
    FS-Cache: Fix heading in documentation
    CacheFiles: Implement interface to check cache consistency
    FS-Cache: Add interface to check consistency of a cached object
    rbd: fix null dereference in dout
    rbd: fix buffer size for writes to images with snapshots
    libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
    rbd: fix I/O error propagation for reads
    ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
    ceph: allow sync_read/write return partial successed size of read/write.
    ceph: fix bugs about handling short-read for sync read mode.
    ceph: remove useless variable revoked_rdcache
    ...

    Linus Torvalds
     

08 Sep, 2013

2 commits

  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "A quick set of fixes, some to deal with fallout from yesterday's
    net-next merge.

    1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled,
    from Dmitry Kravkov.

    2) Fix a bnx2x regression caused by one of Dave Jones's mistaken
    braces changes, from Eilon Greenstein.

    3) Add some protective filtering in the netlink tap code, from Daniel
    Borkmann.

    4) Fix TCP congestion window growth regression after timeouts, from
    Yuchung Cheng.

    5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from
    Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    tcp: properly increase rcv_ssthresh for ofo packets
    net: add documentation for BQL helpers
    mlx5: remove unused MLX5_DEBUG param in Kconfig
    bnx2x: Restore a call to config_init
    bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
    tcp: fix no cwnd growth after timeout
    net: netlink: filter particular protocols from analyzers

    Linus Torvalds
     

07 Sep, 2013

6 commits

  • TCP receive window handling is multi staged.

    A socket has a memory budget, static or dynamic, in sk_rcvbuf.

    Because we do not really know how this memory budget translates to
    a TCP window (payload), TCP announces a small initial window
    (about 20 MSS).

    When a packet is received, we increase TCP rcv_win depending
    on the payload/truesize ratio of this packet. Good citizen
    packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

    This heuristic takes place in tcp_grow_window()

    The problem is that we currently call tcp_grow_window() only for
    in-order packets.

    This means that reorders or packet losses stop proper growth of
    rcv_win, and senders are unable to benefit from fast recovery,
    or proper reordering level detection.

    Really, a packet being stored in OFO queue is not a bad citizen.
    It should be part of the game as in-order packets.

    In our traces, we very often see that the sender is limited by small
    Linux receive windows, even if Linux hosts use autotuning (DRS) and
    should allow rcv_win to grow to ~3MB.

    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Commit 0f7cc9a3 "tcp: increase throughput when reordering is high"
    only allows cwnd to increase in Open state. This mistakenly disables
    slow start after timeout (CA_Loss). Moreover cwnd won't grow if the
    state moves from Disorder to Open later in tcp_fastretrans_alert().

    Therefore the correct logic should be to allow cwnd to grow as long
    as the data is received in order in Open, Loss, or even Disorder state.

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • Fix finer-grained control and let only a whitelist of allowed netlink
    protocols pass, in our case related to networking. If later on, other
    subsystems decide they want to add their protocol as well to the list
    of allowed protocols they shall simply add it. While at it, we also
    need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can
    not pick it up (as it's not filled out).

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Patches for Ceph FS-Cache support

    Milosz Tanski
     
  • Pull trivial tree from Jiri Kosina:
    "The usual trivial updates all over the tree -- mostly typo fixes and
    documentation updates"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
    doc: Documentation/cputopology.txt fix typo
    treewide: Convert retrun typos to return
    Fix comment typo for init_cma_reserved_pageblock
    Documentation/trace: Correcting and extending tracepoint documentation
    mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
    power: Documentation: Update s2ram link
    doc: fix a typo in Documentation/00-INDEX
    Documentation/printk-formats.txt: No casts needed for u64/s64
    doc: Fix typo "is is" in Documentations
    treewide: Fix printks with 0x%#
    zram: doc fixes
    Documentation/kmemcheck: update kmemcheck documentation
    doc: documentation/hwspinlock.txt fix typo
    PM / Hibernate: add section for resume options
    doc: filesystems : Fix typo in Documentations/filesystems
    scsi/megaraid fixed several typos in comments
    ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
    treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
    page_isolation: Fix a comment typo in test_pages_isolated()
    doc: fix a typo about irq affinity
    ...

    Linus Torvalds
     
  • Pull HID updates from Jiri Kosina:
    "Highlights:

    - conversion of HID subsystem to use devm-based resource management,
    from Benjamin Tissoires

    - i2c-hid support for DT bindings, from Benjamin Tissoires

    - much improved support for Win8-multitouch devices, from Benjamin
    Tissoires

    - cleanup of core code using common hidinput_input_event(), from
    David Herrmann

    - fix for bug in implement() access to the bit stream (causing oops)
    that has been present in the code for ages, but devices that are
    able to trigger it have started to appear only now, from Jiri
    Kosina

    - fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896,
    CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially
    crafted malicious HW devices plugged into the system), from Kees
    Cook

    - hidraw oops fix, from Manoj Chourasia

    - various smaller fixes here and there, support for a bunch of new
    devices by various contributors"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits)
    HID: MAINTAINERS: add roccat drivers
    HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup
    HID: hid-sensor-hub: move to devm_kzalloc
    HID: hid-sensor-hub: fix indentation accross the code
    HID: move HID_REPORT_TYPES closer to the report-definitions
    HID: check for NULL field when setting values
    HID: picolcd_core: validate output report details
    HID: sensor-hub: validate feature report details
    HID: ntrig: validate feature report details
    HID: pantherlord: validate output report details
    HID: hid-wiimote: print small buffers via %*phC
    HID: uhid: improve uhid example client
    HID: Correct the USB IDs for the new Macbook Air 6
    HID: wiimote: add support for Guitar-Hero guitars
    HID: wiimote: add support for Guitar-Hero drums
    Input: introduce BTN/ABS bits for drums and guitars
    HID: battery: don't do DMA from stack
    HID: roccat: add support for KonePureOptical v2
    HID: picolcd: Prevent NULL pointer dereference on _remove()
    HID: usbhid: quirk for N-Trig DuoSense Touch Screen
    ...

    Linus Torvalds
     

06 Sep, 2013

7 commits

  • In theory the linux cred in a gssproxy reply can include up to
    NGROUPS_MAX data, 256K of data. In the common case we expect it to be
    shorter. So do as the nfsv3 ACL code does and let the xdr code allocate
    the pages as they come in, instead of allocating a lot of pages that
    won't typically be used.

    Tested-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • A gssproxy reply can include up to NGROUPS_MAX gids, which will
    take up more than a page. We therefore need to allocate an array of
    pages to hold the reply instead of trying to allocate a single huge
    buffer.

    Tested-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
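
The sizes quoted in the two gssproxy commits above can be checked with a little arithmetic. The sketch below assumes NGROUPS_MAX is 65536 (its value on Linux), that each gid takes 4 bytes, and that pages are 4 KiB; the helper names are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, see the text above: Linux NGROUPS_MAX and a 4 KiB page. */
#define SKETCH_NGROUPS_MAX 65536u
#define SKETCH_PAGE_SIZE   4096u

/* Bytes needed to hold ngroups gids of 32 bits each. */
size_t gid_bytes(size_t ngroups)
{
    return ngroups * sizeof(uint32_t);
}

/* Number of pages needed to hold the given byte count, rounding up. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions `gid_bytes(SKETCH_NGROUPS_MAX)` is 262144 bytes (256K) and `pages_needed(262144, SKETCH_PAGE_SIZE)` is 64, which is why a single page is not enough for the worst case while the common case needs far fewer pages.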
     
  • The encoding of linux creds is a bit confusing.

    Also: I think in practice it doesn't really matter whether we treat any
    of these things as signed or unsigned, but unsigned seems more
    straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups
    overflow check.

    Tested-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • We can use the normal coding infrastructure here.

    Two minor behavior changes:

    - we're assuming no wasted space at the end of the linux cred.
    That seems to match gss-proxy's behavior, and I can't see why
    it would need to do differently in the future.

    - NGROUPS_MAX check added: note groups_alloc doesn't do this,
    this is the caller's responsibility.

    Tested-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Pull networking changes from David Miller:
    "Noteworthy changes this time around:

    1) Multicast rejoin support for team driver, from Jiri Pirko.

    2) Centralize and simplify TCP RTT measurement handling in order to
    reduce the impact of bad RTO seeding from SYN/ACKs. Also, when
    both timestamps and local RTT measurements are available prefer
    the latter because there are broken middleware devices which
    scramble the timestamp.

    From Yuchung Cheng.

    3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
    memory consumed to queue up unsent user data. From Eric Dumazet.

    4) Add a "physical port ID" abstraction for network devices, from
    Jiri Pirko.

    5) Add a "suppress" operation to influence fib_rules lookups, from
    Stefan Tomanek.

    6) Add a networking development FAQ, from Paul Gortmaker.

    7) Extend the information provided by tcp_probe and add ipv6 support,
    from Daniel Borkmann.

    8) Use RCU locking more extensively in openvswitch data paths, from
    Pravin B Shelar.

    9) Add SCTP support to openvswitch, from Joe Stringer.

    10) Add EF10 chip support to SFC driver, from Ben Hutchings.

    11) Add new SYNPROXY netfilter target, from Patrick McHardy.

    12) Compute a rate approximation for sending in TCP sockets, and use
    this to more intelligently coalesce TSO frames. Furthermore, add
    a new packet scheduler which takes advantage of this estimate when
    available. From Eric Dumazet.

    13) Allow AF_PACKET fanouts with random selection, from Daniel
    Borkmann.

    14) Add ipv6 support to vxlan driver, from Cong Wang"

    Resolved conflicts as per discussion.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
    openvswitch: Fix alignment of struct sw_flow_key.
    netfilter: Fix build errors with xt_socket.c
    tcp: Add missing braces to do_tcp_setsockopt
    caif: Add missing braces to multiline if in cfctrl_linkup_request
    bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
    vxlan: Fix kernel panic on device delete.
    net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
    net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
    icplus: Use netif_running to determine device state
    ethernet/arc/arc_emac: Fix huge delays in large file copies
    tuntap: orphan frags before trying to set tx timestamp
    tuntap: purge socket error queue on detach
    qlcnic: use standard NAPI weights
    ipv6:introduce function to find route for redirect
    bnx2x: VF RSS support - VF side
    bnx2x: VF RSS support - PF side
    vxlan: Notify drivers for listening UDP port changes
    net: usbnet: update addr_assign_type if appropriate
    driver/net: enic: update enic maintainers and driver
    driver/net: enic: Exposing symbols for Cisco's low latency driver
    ...

    Linus Torvalds
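
Item 3 of the networking pull above adds the TCP_NOTSENT_LOWAT socket option. A minimal userspace sketch of setting it follows; the wrapper name `set_notsent_lowat` is hypothetical, and the fallback definition of 25 is the value from the Linux uapi headers, used only when the libc headers do not provide the constant.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25  /* Linux uapi value; fallback for older libc headers */
#endif

/*
 * Hypothetical wrapper: ask the kernel to limit the amount of unsent
 * data queued on a TCP socket to roughly `bytes`.
 * Returns 0 on success, -1 on error with errno set (as setsockopt does).
 */
int set_notsent_lowat(int fd, unsigned int bytes)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                      &bytes, sizeof(bytes));
}
```

A caller would typically invoke this once after creating the socket, for example `set_notsent_lowat(fd, 16384)`, and check the return value, since kernels older than this merge do not know the option.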
     
  • sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
    However, this breaks on the m68k architecture where long is 32 bit in
    size but 16 bit aligned by default. This aligns to the size of a long to
    ensure that we can always do comparisons in full long-sized chunks. It
    also adds an additional build check to catch any reduction in alignment.

    CC: Andy Zhou
    Reported-by: Fengguang Wu
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Jesse Gross
    Signed-off-by: David S. Miller

    Jesse Gross
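
The alignment fix described above can be illustrated in userspace C. The struct below is a hypothetical stand-in for sw_flow_key with made-up fields, and the sketch relies on the GCC/Clang `aligned` attribute and C11 `_Static_assert`; the kernel itself uses its own macros for both.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for struct sw_flow_key. Aligning to
 * sizeof(long) rather than __alignof__(long) matters on targets
 * such as m68k, where long is 4 bytes but only 2-byte aligned.
 */
struct flow_key_sketch {
    unsigned short a;
    unsigned char  b[6];
    unsigned int   c;
} __attribute__((aligned(sizeof(long))));

/* Build-time checks: fail compilation if the alignment is ever reduced. */
_Static_assert(_Alignof(struct flow_key_sketch) % sizeof(long) == 0,
               "key must be aligned to the size of a long");
_Static_assert(sizeof(struct flow_key_sketch) % sizeof(long) == 0,
               "key size must be a whole number of longs");

/* Alignment of the key type, in bytes. */
size_t flow_key_alignment(void)
{
    return _Alignof(struct flow_key_sketch);
}

/* Number of long-sized chunks that cover the whole key. */
size_t flow_key_words(void)
{
    return sizeof(struct flow_key_sketch) / sizeof(long);
}
```

Because the attribute also rounds the struct size up to a multiple of the alignment, a comparison loop can walk the key in `flow_key_words()` long-sized steps without a trailing partial chunk.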
     
  • Conflicts:
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    net/bridge/br_multicast.c
    net/ipv6/sit.c

    The conflicts were minor:

    1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

    2) br_multicast.c had an overlap between computing max_delay using
    msecs_to_jiffies and turning MLDV2_MRC() into an inline function
    with a name using lowercase instead of uppercase letters.

    3) stmmac had two overlapping changes, one which conditionally allocated
    and hooked up a dma_cfg based upon the presence of the pbl OF property,
    and another one handling store-and-forward DMA mode. The latter
    should not go into the new of_find_property() basic block.

    Signed-off-by: David S. Miller

    David S. Miller