12 Oct, 2014

1 commit

  • Pull file locking related changes from Jeff Layton:
    "This release is a little more busy for file locking changes than the
    last:

    - a set of patches from Kinglong Mee to fix the lockowner handling in
    knfsd
    - a pile of cleanups to the internal file lease API. This should get
    us a bit closer to allowing for setlease methods that can block.

    There are some dependencies between mine and Bruce's trees this cycle,
    and I based my tree on top of the requisite patches in Bruce's tree"

    * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
    locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
    locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
    locks: set fl_owner for leases to filp instead of current->files
    locks: give lm_break a return value
    locks: __break_lease cleanup in preparation of allowing direct removal of leases
    locks: remove i_have_this_lease check from __break_lease
    locks: move freeing of leases outside of i_lock
    locks: move i_lock acquisition into generic_*_lease handlers
    locks: define a lm_setup handler for leases
    locks: plumb a "priv" pointer into the setlease routines
    nfsd: don't keep a pointer to the lease in nfs4_file
    locks: clean up vfs_setlease kerneldoc comments
    locks: generic_delete_lease doesn't need a file_lock at all
    nfsd: fix potential lease memory leak in nfs4_setlease
    locks: close potential race in lease_get_mtime
    security: make security_file_set_fowner, f_setown and __f_setown void return
    locks: consolidate "nolease" routines
    locks: remove lock_may_read and lock_may_write
    lockd: rip out deferred lock handling from testlock codepath
    NFSD: Get reference of lockowner when coping file_lock
    ...

    Linus Torvalds
     

24 Sep, 2014

1 commit


10 Sep, 2014

3 commits


06 Sep, 2014

2 commits

  • This patch fix spelling typo found in DocBook/networking.xml.
    It is because the neworking.xml is generated from comments
    in the source, I have to fix typo in comments within the source.

    Signed-off-by: Masanari Iida
    Acked-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Masanari Iida
     
  • The timestamping API has separate bits for generating and reporting
    timestamps. A software timestamp should only be reported for a packet
    when the packet has the relevant generation flag (SKBTX_..) set
    and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.

    The second check was accidentally removed. Reinstitute the original
    behavior.

    Tested:
    Without this patch, Documentation/networking/txtimestamp reports
    timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
    After the patch, it only reports them when the flag is set.

    Fixes: f24b9be5957b ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

07 Aug, 2014

1 commit

  • sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
    stack can store SKBTX_SHARED_FRAG in it.

    Also first argument (struct sock *) can be const.

    Signed-off-by: Eric Dumazet
    Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")
    Cc: Willem de Bruijn
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Aug, 2014

4 commits

  • Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
    in the send() call is acknowledged. It implements the feature for TCP.

    The timestamp is generated when the TCP socket cumulative ACK is moved
    beyond the tracked seqno for the first time. The feature ignores SACK
    and FACK, because those acknowledge the specific byte, but not
    necessarily the entire contents of the buffer up to that byte.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Kernel transmit latency is often incurred in the packet scheduler.
    Introduce a new timestamp on transmission just before entering the
    scheduler. When data travels through multiple devices (bonding,
    tunneling, ...) each device will export an individual timestamp.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • sk_flags is reaching its limit. New timestamping options will not fit.
    Move all of them into a new field sk->sk_tsflags.

    Added benefit is that this removes boilerplate code to convert between
    SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.

    SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
    timestamp logic (netstamp_needed). That can be simplified and this
    last key removed, but will leave that for a separate patch.

    Signed-off-by: Willem de Bruijn

    ----

    The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
    though that scatters tstamp fields throughout the struct.
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Applications that request kernel tx timestamps with SO_TIMESTAMPING
    read timestamps as recvmsg() ancillary data. The response is defined
    implicitly as timespec[3].

    1) define struct scm_timestamping explicitly and

    2) add support for new tstamp types. On tx, scm_timestamping always
    accompanies a sock_extended_err. Define previously unused field
    ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
    define the existing behavior.

    The reception path is not modified. On rx, no struct similar to
    sock_extended_err is passed along with SCM_TIMESTAMPING.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

30 Jul, 2014

1 commit

  • The SO_TIMESTAMPING API defines three types of timestamps: software,
    hardware in raw format (hwtstamp) and hardware converted to system
    format (syststamp). The last has been deprecated in favor of combining
    hwtstamp with a PTP clock driver. There are no active users in the
    kernel.

    The option was device driver dependent. If set, but without hardware
    support, the correct behavior is to return zero in the relevant field
    in the SCM_TIMESTAMPING ancillary message. Without device drivers
    implementing the option, this field is effectively always zero.

    Remove the internal plumbing to dissuage new drivers from implementing
    the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
    avoid breaking existing applications that request the timestamp.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

17 Apr, 2014

1 commit

  • Make sys_recv a first class citizen by using the SYSCALL_DEFINEx
    macro. Besides being cleaner this will also generate meta data
    for the system call so tracing tools like ftrace or LTTng can
    resolve this system call.

    Signed-off-by: Jan Glauber
    Signed-off-by: David S. Miller

    Jan Glauber
     

02 Apr, 2014

1 commit

  • This commit fixes a build error reported by Fengguang, that is
    triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set:

    ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined!

    The fix is to introduce its own file for the PTP BPF classifier,
    so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select
    it independently from each other. IXP4xx driver on ARM needs to
    select it as well since it does not seem to select PTP_1588_CLOCK
    or similar that would pull it in automatically.

    This also allows for hiding all of the internals of the BPF PTP
    program inside that file, and only exporting relevant API bits
    to drivers.

    This patch also adds a kdoc documentation of ptp_classify_raw()
    API to make it clear that it can return PTP_CLASS_* defines. Also,
    the BPF program has been translated into bpf_asm code, so that it
    can be more easily read and altered (extensively documented in [1]).

    In the kernel tree under tools/net/ we have bpf_asm and bpf_dbg
    tools, so the commented program can simply be translated via
    `./bpf_asm -c prog` where prog is a file that contains the
    commented code. This makes it easily readable/verifiable and when
    there's a need to change something, jump offsets etc do not need
    to be replaced manually which can be very error prone. Instead,
    a newly translated version via bpf_asm can simply replace the old
    code. I have checked opcode diffs before/after and it's the very
    same filter.

    [1] Documentation/networking/filter.txt

    Fixes: 164d8c666521 ("net: ptp: do not reimplement PTP/BPF classifier")
    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Cc: Richard Cochran
    Cc: Jiri Benc
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Mar, 2014

1 commit


14 Mar, 2014

1 commit

  • Pull networking fixes from David Miller:
    "I know this is a bit more than you want to see, and I've told the
    wireless folks under no uncertain terms that they must severely scale
    back the extent of the fixes they are submitting this late in the
    game.

    Anyways:

    1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which
    is the correct implementation, like it should. Instead it does
    something like a NAPI poll operation. This leads to crashes.

    From Neil Horman and Arnd Bergmann.

    2) Segmentation of SKBs requires proper socket orphaning of the
    fragments, otherwise we might access stale state released by the
    release callbacks.

    This is a 5 patch fix, but the initial patches are giving
    variables and such significantly clearer names such that the
    actual fix itself at the end looks trivial.

    From Michael S. Tsirkin.

    3) TCP control block release can deadlock if invoked from a timer on
    an already "owned" socket. Fix from Eric Dumazet.

    4) In the bridge multicast code, we must validate that the
    destination address of general queries is the link local all-nodes
    multicast address. From Linus Lüssing.

    5) The x86 BPF JIT support for negative offsets puts the parameter
    for the helper function call in the wrong register. Fix from
    Alexei Starovoitov.

    6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the
    r8169 driver is incorrect. Fix from Hayes Wang.

    7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see
    if a packet is a GSO frame, but that's not the correct test. It
    should use skb_is_gso(skb) instead. Fix from Wei Liu.

    8) Negative msg->msg_namelen values should generate an error, from
    Matthew Leach.

    9) at86rf230 can deadlock because it takes the same lock from it's
    ISR and it's hard_start_xmit method, without disabling interrupts
    in the latter. Fix from Alexander Aring.

    10) The FEC driver's restart doesn't perform operations in the correct
    order, so promiscuous settings can get lost. Fix from Stefan
    Wahren.

    11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann.

    12) Reference count and memory leak fixes in TIPC from Ying Xue and
    Erik Hugne.

    13) Forced eviction in inet_frag_evictor() must strictly make sure all
    frags are deleted, otherwise module unload (f.e. 6lowpan) can
    crash. Fix from Florian Westphal.

    14) Remove assumptions in AF_UNIX's use of csum_partial() (which it
    uses as a hash function), which breaks on PowerPC. From Anton
    Blanchard.

    The main gist of the issue is that csum_partial() is defined only
    as a value that, once folded (f.e. via csum_fold()) produces a
    correct 16-bit checksum. It is legitimate, therefore, for
    csum_partial() to produce two different 32-bit values over the
    same data if their respective alignments are different.

    15) Fix endiannes bug in MAC address handling of ibmveth driver, also
    from Anton Blanchard.

    16) Error checks for ipv6 exthdrs offload registration are reversed,
    from Anton Nayshtut.

    17) Externally triggered ipv6 addrconf routes should count against the
    garbage collection threshold. Fix from Sabrina Dubroca.

    18) The PCI shutdown handler added to the bnx2 driver can wedge the
    chip if it was not brought up earlier already, which in particular
    causes the firmware to shut down the PHY. Fix from Michael Chan.

    19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as
    currently coded it can and does trigger in legitimate situations.
    From Eric Dumazet.

    20) BNA driver fails to build on ARM because of a too large udelay()
    call, fix from Ben Hutchings.

    21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix
    from Eric Dumazet.

    22) The vlan passthrough ops added in the previous release causes a
    regression in source MAC address setting of outgoing headers in
    some circumstances. Fix from Peter Boström"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
    ipv6: Avoid unnecessary temporary addresses being generated
    eth: fec: Fix lost promiscuous mode after reconnecting cable
    bonding: set correct vlan id for alb xmit path
    at86rf230: fix lockdep splats
    net/mlx4_en: Deregister multicast vxlan steering rules when going down
    vmxnet3: fix building without CONFIG_PCI_MSI
    MAINTAINERS: add networking selftests to NETWORKING
    net: socket: error on a negative msg_namelen
    MAINTAINERS: Add tools/net to NETWORKING [GENERAL]
    packet: doc: Spelling s/than/that/
    net/mlx4_core: Load the IB driver when the device supports IBoE
    net/mlx4_en: Handle vxlan steering rules for mac address changes
    net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
    xen-netback: use skb_is_gso in xenvif_start_xmit
    r8169: fix the incorrect tx descriptor version
    tools/net/Makefile: Define PACKAGE to fix build problems
    x86: bpf_jit: support negative offsets
    bridge: multicast: enable snooping on general queries only
    bridge: multicast: add sanity check for general query destination
    tcp: tcp_release_cb() should release socket ownership
    ...

    Linus Torvalds
     

13 Mar, 2014

1 commit

  • When copying in a struct msghdr from the user, if the user has set the
    msg_namelen parameter to a negative value it gets clamped to a valid
    size due to a comparison between signed and unsigned values.

    Ensure the syscall errors when the user passes in a negative value.

    Signed-off-by: Matthew Leach
    Signed-off-by: David S. Miller

    Matthew Leach
     

10 Mar, 2014

1 commit


14 Feb, 2014

1 commit


11 Dec, 2013

1 commit

  • This patch makes socketpair() use error paths which do not
    rely on heavy-weight call to sys_close(): it's better to try
    to push the file descriptor to userspace before installing
    the socket file to the file descriptor, so that errors are
    catched earlier and being easier to handle.

    Using sys_close() seems to be the exception, while writing the
    file descriptor before installing it look like it's more or less
    the norm: eg. except for code used in init/, error handling
    involve fput() and put_unused_fd(), but not sys_close().

    This make socketpair() usage of sys_close() quite unusual.
    So it deserves to be replaced by the common pattern relying on
    fput() and put_unused_fd() just like, for example, the one used
    in pipe(2) or recvmsg(2).

    Three distinct error paths are still needed since calling
    fput() on file structure returned by sock_alloc_file() will
    implicitly call sock_release() on the associated socket
    structure.

    Cc: David S. Miller
    Cc: Al Viro
    Signed-off-by: Yann Droneaud
    Link: http://marc.info/?i=1385979146-13825-1-git-send-email-ydroneaud@opteya.com
    Signed-off-by: David S. Miller

    Yann Droneaud
     

06 Dec, 2013

1 commit


30 Nov, 2013

1 commit

  • If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
    original code that would lead to memory corruption in the kernel if you
    had audit configured. If you didn't have audit configured it was
    harmless.

    There are some programs such as beta versions of Ruby which use too
    large of a buffer and returning an error code breaks them. We should
    clamp the ->msg_namelen value instead.

    Fixes: 1661bf364ae9 ("net: heap overflow in __audit_sockaddr()")
    Reported-by: Eric Wong
    Signed-off-by: Dan Carpenter
    Tested-by: Eric Wong
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Dan Carpenter
     

21 Nov, 2013

2 commits


20 Nov, 2013

1 commit


19 Nov, 2013

2 commits


04 Oct, 2013

1 commit

  • We need to cap ->msg_namelen or it leads to a buffer overflow when we
    to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to
    exploit this bug.

    The call tree is:
    ___sys_recvmsg()
    move_addr_to_user()
    audit_sockaddr()
    __audit_sockaddr()

    Reported-by: Jüri Aedla
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     

14 Sep, 2013

1 commit

  • Pull aio changes from Ben LaHaise:
    "First off, sorry for this pull request being late in the merge window.
    Al had raised a couple of concerns about 2 items in the series below.
    I addressed the first issue (the race introduced by Gu's use of
    mm_populate()), but he has not provided any further details on how he
    wants to rework the anon_inode.c changes (which were sent out months
    ago but have yet to be commented on).

    The bulk of the changes have been sitting in the -next tree for a few
    months, with all the issues raised being addressed"

    * git://git.kvack.org/~bcrl/aio-next: (22 commits)
    aio: rcu_read_lock protection for new rcu_dereference calls
    aio: fix race in ring buffer page lookup introduced by page migration support
    aio: fix rcu sparse warnings introduced by ioctx table lookup patch
    aio: remove unnecessary debugging from aio_free_ring()
    aio: table lookup: verify ctx pointer
    staging/lustre: kiocb->ki_left is removed
    aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
    aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
    aio: convert the ioctx list to table lookup v3
    aio: double aio_max_nr in calculations
    aio: Kill ki_dtor
    aio: Kill ki_users
    aio: Kill unneeded kiocb members
    aio: Kill aio_rw_vect_retry()
    aio: Don't use ctx->tail unnecessarily
    aio: io_cancel() no longer returns the io_event
    aio: percpu ioctx refcount
    aio: percpu reqs_available
    aio: reqs_active -> reqs_available
    aio: fix build when migration is disabled
    ...

    Linus Torvalds
     

12 Sep, 2013

1 commit

  • I found the following pattern that leads in to interesting findings:

    grep -r "ret.*|=.*__put_user" *
    grep -r "ret.*|=.*__get_user" *
    grep -r "ret.*|=.*__copy" *

    The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat,
    since those appear in compat code, we could probably expect the kernel
    addresses not to be reachable in the lower 32-bit range, so I think they
    might not be exploitable.

    For the "__get_user" cases, I don't think those are exploitable: the worse
    that can happen is that the kernel will copy kernel memory into in-kernel
    buffers, and will fail immediately afterward.

    The alpha csum_partial_copy_from_user() seems to be missing the
    access_ok() check entirely. The fix is inspired from x86. This could
    lead to information leak on alpha. I also noticed that many architectures
    map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
    wonder if the latter is performing the access checks on every
    architectures.

    Signed-off-by: Mathieu Desnoyers
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Jens Axboe
    Cc: Oleg Nesterov
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

02 Aug, 2013

1 commit


30 Jul, 2013

2 commits

  • sock_aio_dtor() is dead code - and stuff that does need to do cleanup
    can simply do it before calling aio_complete().

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Cc: Theodore Ts'o
    Signed-off-by: Benjamin LaHaise

    Kent Overstreet
     
  • This code doesn't serve any purpose anymore, since the aio retry
    infrastructure has been removed.

    This change should be safe because aio_read/write are also used for
    synchronous IO, and called from do_sync_read()/do_sync_write() - and
    there's no looping done in the sync case (the read and write syscalls).

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Signed-off-by: Benjamin LaHaise

    Kent Overstreet
     

11 Jul, 2013

2 commits


09 Jul, 2013

1 commit

  • Rename functions in include/net/ll_poll.h to busy wait.
    Clarify documentation about expected power use increase.
    Rename POLL_LL to POLL_BUSY_LOOP.
    Add need_resched() testing to poll/select busy loops.

    Note, that in select and poll can_busy_poll is dynamic and is
    updated continuously to reflect the existence of supported
    sockets with valid queue information.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

26 Jun, 2013

1 commit

  • select/poll busy-poll support.

    Split sysctl value into two separate ones, one for read and one for poll.
    updated Documentation/sysctl/net.txt

    Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
    sk_poll_ll if possible. sock_poll sets this flag in its return value
    to indicate to select/poll when a socket that can busy poll is found.

    When poll/select have nothing to report, call the low-level
    sock_poll again until we are out of time or we find something.

    Once the system call finds something, it stops setting POLL_LL, so it can
    return the result to the user ASAP.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

18 Jun, 2013

2 commits