24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

25 Jul, 2020

1 commit

  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

20 Jul, 2020

2 commits


27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handler just pass through the data to one of the common handlers
    a lot of the changes are mechnical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

04 Jan, 2020

1 commit


29 Oct, 2019

1 commit


05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not write to the free
    software foundation inc 51 franklin st fifth floor boston ma 02110
    1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 246 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190530000436.674189849@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

22 May, 2019

1 commit

  • Pull SPDX update from Greg KH:
    "Here is a series of patches that add SPDX tags to different kernel
    files, based on two different things:

    - SPDX entries are added to a bunch of files that we missed a year
    ago that do not have any license information at all.

    These were either missed because the tool saw the MODULE_LICENSE()
    tag, or some EXPORT_SYMBOL tags, and got confused and thought the
    file had a real license, or the files have been added since the
    last big sweep, or they were Makefile/Kconfig files, which we
    didn't touch last time.

    - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
    tools can determine the license text in the file itself. Where this
    happens, the license text is removed, in order to cut down on the
    700+ different ways we have in the kernel today, in a quest to get
    rid of all of these.

    These patches have been out for review on the linux-spdx@vger mailing
    list, and while they were created by automatic tools, they were
    hand-verified by a bunch of different people, all whom names are on
    the patches are reviewers.

    The reason for these "large" patches is if we were to continue to
    progress at the current rate of change in the kernel, adding license
    tags to individual files in different subsystems, we would be finished
    in about 10 years at the earliest.

    There will be more series of these types of patches coming over the
    next few weeks as the tools and reviewers crunch through the more
    "odd" variants of how to say "GPLv2" that developers have come up with
    over the years, combined with other fun oddities (GPL + a BSD
    disclaimer?) that are being unearthed, with the goal for the whole
    kernel to be cleaned up.

    These diffstats are not small, 3840 files are touched, over 10k lines
    removed in just 24 patches"

    * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
    ...

    Linus Torvalds
     

21 May, 2019

1 commit


20 May, 2019

1 commit


28 Apr, 2019

1 commit

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Feb, 2019

1 commit

  • clang warns about overflowing the data[] member in the struct pnpipehdr:

    net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
    if (hdr->data[4] == PEP_IND_READY)
    ^ ~
    include/net/phonet/pep.h:66:3: note: array 'data' declared here
    u8 data[1];

    Using a flexible array member at the end of the struct avoids the
    warning, but since we cannot have a flexible array member inside
    of the union, each index now has to be moved back by one, which
    makes it a little uglier.

    Signed-off-by: Arnd Bergmann
    Acked-by: Rémi Denis-Courmont
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Jun, 2018

1 commit

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     

26 May, 2018

2 commits


16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net converted over, and seq_{open,release}_net are
    removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

28 Mar, 2018

1 commit


28 Feb, 2018

1 commit


13 Feb, 2018

1 commit

  • Changes since v1:
    Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

    Before:
    All these functions either return a negative error indicator,
    or store length of sockaddr into "int *socklen" parameter
    and return zero on success.

    "int *socklen" parameter is awkward. For example, if caller does not
    care, it still needs to provide on-stack storage for the value
    it does not need.

    None of the many FOO_getname() functions of various protocols
    ever used old value of *socklen. They always just overwrite it.

    This change drops this parameter, and makes all these functions, on success,
    return length of sockaddr. It's always >= 0 and can be differentiated
    from an error.

    Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

    rpc_sockname() lost "int buflen" parameter, since its only use was
    to be passed to kernel_getsockname() as &buflen and subsequently
    not used in any way.

    Userspace API is not changed.

    text data bss dec hex filename
    30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
    30108109 2633612 873672 33615393 200ee21 vmlinux.o

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-bluetooth@vger.kernel.org
    CC: linux-decnet-user@lists.sourceforge.net
    CC: linux-wireless@vger.kernel.org
    CC: linux-rdma@vger.kernel.org
    CC: linux-sctp@vger.kernel.org
    CC: linux-nfs@vger.kernel.org
    CC: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But they keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

    2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

    3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

    4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

    5) Add eBPF based queue selection to tun, from Jason Wang.

    6) Lockless qdisc support, from John Fastabend.

    7) SCTP stream interleave support, from Xin Long.

    8) Smoother TCP receive autotuning, from Eric Dumazet.

    9) Lots of erspan tunneling enhancements, from William Tu.

    10) Add true function call support to BPF, from Alexei Starovoitov.

    11) Add explicit support for GRO HW offloading, from Michael Chan.

    12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

    13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

    14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

    15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

    16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

    17) Add resource abstration to devlink, from Arkadi Sharshevsky.

    18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

    19) Avoid locking in act_csum, from Davide Caratti.

    20) devinet_ioctl() simplifications from Al viro.

    21) More TCP bpf improvements from Lawrence Brakmo.

    22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
    tls: Add support for encryption using async offload accelerator
    ip6mr: fix stale iterator
    net/sched: kconfig: Remove blank help texts
    openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
    tcp_nv: fix potential integer overflow in tcpnv_acked
    r8169: fix RTL8168EP take too long to complete driver initialization.
    qmi_wwan: Add support for Quectel EP06
    rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
    ipmr: Fix ptrdiff_t print formatting
    ibmvnic: Wait for device response when changing MAC
    qlcnic: fix deadlock bug
    tcp: release sk_frag.page in tcp_disconnect
    ipv4: Get the address of interface correctly.
    net_sched: gen_estimator: fix lockdep splat
    net: macb: Handle HRESP error
    net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
    ipv6: addrconf: break critical section in addrconf_verify_rtnl()
    ipv6: change route cache aging logic
    i40e/i40evf: Update DESC_NEEDED value to reflect larger value
    bnxt_en: cleanup DIM work on device shutdown
    ...

    Linus Torvalds
     

17 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    - if (de->proc_fops)
    - inode->i_fop = de->proc_fops;
    + if (de->proc_fops) {
    + if (S_ISREG(inode->i_mode))
    + inode->i_fop = &proc_reg_file_ops;
    + else
    + inode->i_fop = de->proc_fops;
    + }

    VFS stopped pinning module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

05 Dec, 2017

1 commit


28 Nov, 2017

1 commit


14 Nov, 2017

1 commit


04 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Oct, 2017

2 commits


10 Aug, 2017

1 commit

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect, we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

01 Jul, 2017

2 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to absense of a _hint()
    version of refcount API. If the hint() version must
    be used, we might need to revisit API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

08 Jun, 2017

1 commit

  • Network devices can allocate reasources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally does also a free_netdev().

    This creates a problem for the logic in register_netdevice().
    Because all callers of register_netdevice() manage the freeing
    of the netdev, and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Apr, 2017

1 commit

  • Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
    for doit functions that call it directly.

    This is the first step to using extended error reporting in rtnetlink.
    >From here individual subsystems can be updated to set netlink_ext_ack as
    needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Apr, 2017

1 commit


10 Mar, 2017

1 commit

  • Lockdep issues a circular dependency warning when AFS issues an operation
    through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

    The theory lockdep comes up with is as follows:

    (1) If the pagefault handler decides it needs to read pages from AFS, it
    calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
    creating a call requires the socket lock:

    mmap_sem must be taken before sk_lock-AF_RXRPC

    (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
    binds the underlying UDP socket whilst holding its socket lock.
    inet_bind() takes its own socket lock:

    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

    (3) Reading from a TCP socket into a userspace buffer might cause a fault
    and thus cause the kernel to take the mmap_sem, but the TCP socket is
    locked whilst doing this:

    sk_lock-AF_INET must be taken before mmap_sem

    However, lockdep's theory is wrong in this instance because it deals only
    with lock classes and not individual locks. The AF_INET lock in (2) isn't
    really equivalent to the AF_INET lock in (3) as the former deals with a
    socket entirely internal to the kernel that never sees userspace. This is
    a limitation in the design of lockdep.

    Fix the general case by:

    (1) Double up all the locking keys used in sockets so that one set are
    used if the socket is created by userspace and the other set is used
    if the socket is created by the kernel.

    (2) Store the kern parameter passed to sk_alloc() in a variable in the
    sock struct (sk_kern_sock). This informs sock_lock_init(),
    sock_init_data() and sk_clone_lock() as to the lock keys to be used.

    Note that the child created by sk_clone_lock() inherits the parent's
    kern setting.

    (3) Add a 'kern' parameter to ->accept() that is analogous to the one
    passed in to ->create() that distinguishes whether kernel_accept() or
    sys_accept4() was the caller and can be passed to sk_alloc().

    Note that a lot of accept functions merely dequeue an already
    allocated socket. I haven't touched these as the new socket already
    exists before we get the parameter.

    Note also that there are a couple of places where I've made the accepted
    socket unconditionally kernel-based:

    irda_accept()
    rds_rcp_accept_one()
    tcp_accept_from_sock()

    because they follow a sock_create_kern() and accept off of that.

    Whilst creating this, I noticed that lustre and ocfs don't create sockets
    through sock_create_kern() and thus they aren't marked as for-kernel,
    though they appear to be internal. I wonder if these should do that so
    that they use the new set of lock keys.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

02 Mar, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan