12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But the moment we place the SKB onto the socket receive queue, it
    can be consumed and freed. So this skb->len access is potentially
    a read of freed memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no implementation of this callback actually uses the
    length argument. And since nobody cared about its value, lots of
    call sites pass in arbitrary values such as '0' and even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Mar, 2014

1 commit

  • Some applications didn't expect that recvmsg() on a non-blocking
    socket could return -EINTR. This possibility was added as a side effect
    of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
    unix recv routines").

    To hit this bug, you need to be a bit unlucky, as the u->readlock
    mutex is usually held for very small periods.

    Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
    Signed-off-by: Eric Dumazet
    Cc: Rainer Weikusat
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Mar, 2014

1 commit

  • The unix socket code is using the result of csum_partial to
    hash into a lookup table:

    unix_hash_fold(csum_partial(sunaddr, len, 0));

    csum_partial is only guaranteed to produce something that can be
    folded into a checksum, as its prototype explains:

    * returns a 32-bit number suitable for feeding into itself
    * or csum_tcpudp_magic

    The 32bit value should not be used directly.

    Depending on the alignment, the ppc64 csum_partial will return
    different 32bit partial checksums that will fold into the same
    16bit checksum.

    This difference causes the following testcase (courtesy of
    Gustavo) to sometimes fail:

    #include <stdio.h>
    #include <sys/socket.h>

    int main()
    {
        int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

        int i = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);

        /* Address length 2 (family only) makes the kernel autobind. */
        struct sockaddr addr;
        addr.sa_family = AF_LOCAL;
        bind(fd, &addr, 2);

        listen(fd, 128);

        /* Read back the autobound name so we can connect to it. */
        struct sockaddr_storage ss;
        socklen_t sslen = (socklen_t)sizeof(ss);
        getsockname(fd, (struct sockaddr*)&ss, &sslen);

        fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

        if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
            perror(NULL);
            return 1;
        }
        printf("OK\n");
        return 0;
    }

    As suggested by davem, fix this by using csum_fold to fold the
    partial 32bit checksum into a 16bit checksum before using it.

    Signed-off-by: Anton Blanchard
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Anton Blanchard
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

19 Dec, 2013

1 commit


18 Dec, 2013

1 commit

  • This is similar to the set_peek_off patch where calling bind while the
    socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
    spew after a while.

    This is also the last place that did a straightforward mutex_lock(), so
    there shouldn't be any more of these patches.

    Signed-off-by: Sasha Levin
    Signed-off-by: David S. Miller

    Sasha Levin
     

11 Dec, 2013

1 commit

  • unix_dgram_recvmsg() will hold the readlock of the socket until recv
    is complete.

    At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang
    until unix_dgram_recvmsg() completes (which can take a while) without
    allowing us to break out of it, triggering a hung task spew.

    Instead, allow set_peek_off to fail; this way userspace will not hang.

    Signed-off-by: Sasha Levin
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Sasha Levin
     

07 Dec, 2013

1 commit


21 Nov, 2013

1 commit


20 Oct, 2013

1 commit

  • In the case of credentials passing in unix stream sockets (dgram
    sockets seem not affected), we get a rather sparse race after
    commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

    We have a stream server on receiver side that requests credential
    passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
    on each spawned/accepted socket on server side to 1 first (as it's
    not inherited), it can happen that in the time between accept() and
    setsockopt() we get interrupted, the sender is being scheduled and
    continues with passing data to our receiver. At that time SO_PASSCRED
    is neither set on sender nor receiver side, hence in cmsg's
    SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
    (== overflow{u,g}id) instead of what we actually would like to see.

    On the sender side, here nc -U, the tests in maybe_add_creds()
    invoked through unix_stream_sendmsg() would fail, as at that exact
    time, as mentioned, the sender has neither SO_PASSCRED on his side
    nor sees it on the server side, and we have a valid 'other' socket
    in place. Thus, sender believes it would just look like a normal
    connection, not needing/requesting SO_PASSCRED at that time.

    As reverting 16e5726 would not be an option due to the significant
    performance regression reported when having creds always passed,
    one way/trade-off to prevent that would be to set SO_PASSCRED on
    the listener socket and allow inheriting these flags to the spawned
    socket on server side in accept(). It seems also logical to do so
    if we'd tell the listener socket to pass those flags onwards, and
    would fix the race.

    Before, strace:

    recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
    msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
    cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
    msg_flags=0}, 0) = 5

    After, strace:

    recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
    msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
    cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
    msg_flags=0}, 0) = 5

    Signed-off-by: Daniel Borkmann
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Oct, 2013

1 commit

  • When filling the netlink message we fail to wipe the pad field
    and therefore leak one byte of heap memory to userland. Fix this by
    setting pad to 0.

    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     

12 Aug, 2013

1 commit

  • Commit e370a723632 ("af_unix: improve STREAM behavior with fragmented
    memory") added a bug on large send() because the
    skb_copy_datagram_from_iovec() call always starts from the beginning
    of the iovec.

    We must instead use the @sent variable to properly skip the
    already processed part.

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Aug, 2013

2 commits

  • Adding paged frags skbs to af_unix sockets introduced a performance
    regression on large sends because of additional page allocations, even
    if each skb could carry at least 100% more payload than before.

    We can instruct sock_alloc_send_pskb() to attempt high order
    allocations.

    Most of the time, it does a single page allocation instead of 8.

    I added an additional parameter to sock_alloc_send_pskb() to
    let other users opt in to this new feature in follow-up patches.

    Tested:

    Before patch :

    $ netperf -t STREAM_STREAM
    STREAM STREAM TEST
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    2304 212992 212992 10.00 46861.15

    After patch :

    $ netperf -t STREAM_STREAM
    STREAM STREAM TEST
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    2304 212992 212992 10.00 57981.11

    Signed-off-by: Eric Dumazet
    Cc: David Rientjes
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • unix_stream_sendmsg() currently uses order-2 allocations,
    and we had numerous reports this can fail.

    The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
    not helping.

    This patch extends the work done in commit eb6a24816b247c
    ("af_unix: reduce high order page allocations") for
    datagram sockets.

    This opens the possibility of zero copy IO (splice() and
    friends).

    The trick is to not use skb_pull() anymore in recvmsg() path,
    and instead add a @consumed field in UNIXCB() to track amount
    of already read payload in the skb.

    There is a performance regression for large sends
    because of extra page allocations that will be addressed
    in a follow-up patch, allowing sock_alloc_send_pskb()
    to attempt high order page allocations.

    Signed-off-by: Eric Dumazet
    Cc: David Rientjes
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

12 May, 2013

1 commit

  • Avoid waking up every thread sleeping in read call on an AF_UNIX
    socket during suspend and resume by calling a freezable blocking
    call. Previous patches modified the freezer to avoid sending
    wakeups to threads that are blocked in freezable blocking calls.

    This call was selected to be converted to a freezable call because
    it doesn't hold any locks or release any resources when interrupted
    that might be needed by another freezing task or a kernel driver
    during suspend, and is a common site where idle userspace tasks are
    blocked.

    Acked-by: Tejun Heo
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

02 May, 2013

1 commit

  • Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
    uses 64bit instructions to manipulate them.
    If the 64bit word includes any atomic_t or spinlock_t, we can lose
    critical concurrent changes.

    This is happening in af_unix, where unix_sk(sk)->gc_candidate/
    gc_maybe_cycle/lock share the same 64bit word.

    This leads to fatal deadlock, as one/several cpus spin forever
    on a spinlock that will never be available again.

    A safer way would be to use a long to store flags.
    This way we are sure the compiler/arch won't do bad things.

    As we own unix_gc_lock spinlock when clearing or setting bits,
    we can use the non atomic __set_bit()/__clear_bit().

    recursion_level can share the same 64bit location with the spinlock,
    as it is set only with this spinlock held.

    [1] bug fixed in gcc-4.8.0 :
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

    Reported-by: Ambrose Feinstein
    Signed-off-by: Eric Dumazet
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Apr, 2013

2 commits


23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Apr, 2013

2 commits

  • Now that uids and gids are completely encapsulated in kuid_t
    and kgid_t we no longer need to pass struct cred which allowed
    us to test both the uid and the user namespace for equality.

    Passing struct cred potentially allows us to pass the entire group
    list as BSD does but I don't believe the cost of cache line misses
    justifies retaining code for a future potential application.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Conflicts:
    drivers/nfc/microread/mei.c
    net/netfilter/nfnetlink_queue_core.c

    Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
    some cleanups are going to go on-top.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Apr, 2013

2 commits

  • It was reported that the following LSB test case failed
    https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
    were not coalescing unix stream messages when the application was
    expecting us to.

    The problem was that the first send was before the socket was accepted
    and thus sock->sk_socket was NULL in maybe_add_creds, and the second
    send after the socket was accepted had a non-NULL value for sk->socket
    and thus we could tell the credentials were not needed so we did not
    bother.

    The unnecessary credentials on the first message cause
    unix_stream_recvmsg to start verifying that all messages had the same
    credentials before coalescing and then the coalescing failed because
    the second message had no credentials.

    Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
    long standing pessimization which would fail to coalesce messages when
    reading from a unix stream socket if the senders were different even if
    we did not care about their credentials.

    I have tested this and verified that in the LSB test case mentioned
    above the messages do coalesce now, while they were failing to
    coalesce without this change.

    Reported-by: Karel Srot
    Reported-by: Ding Tianhong
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This reverts commit 14134f6584212d585b310ce95428014b653dfaf6.

    The problem that the above patch was meant to address is that af_unix
    messages are not being coalesced because we are sending unnecessary
    credentials. Not sending credentials in maybe_add_creds totally
    breaks unconnected unix domain sockets that wish to send credentials
    to other sockets.

    In practice this breaks some versions of udev because they receive a
    message and the sending uid is bogus so they drop the message.

    Reported-by: Sven Joachim
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

03 Apr, 2013

1 commit

  • Commit 7d4c04fc170087119727119074e72445f2bb192b ("net: add option to
    enable error queue packets waking select") has an issue due to operator
    precedence causing the bit-wise OR to bind to the sock_flag() call
    instead of the result of the ternary conditional. This fixes the *_poll
    functions to work properly. The old code results in "mask |= POLLPRI"
    instead of what was intended, which is to only include POLLPRI when the
    socket option is enabled.

    Signed-off-by: Jacob Keller
    Signed-off-by: David S. Miller

    Jacob Keller
     

01 Apr, 2013

1 commit

  • Currently, when a socket receives something on the error queue it only wakes up
    the socket on select if it is in the "read" list, that is the socket has
    something to read. It is useful also to wake the socket if it is in the error
    list, which would enable software to wait on error queue packets without waking
    up for regular data on the socket. The main use case is for receiving
    timestamped transmit packets which return the timestamp to the socket via the
    error queue. This enables an application to select on the socket for the error
    queue only instead of for the regular traffic.

    -v2-
    * Added the SO_SELECT_ERR_QUEUE socket option to every architecture specific file
    * Modified every socket poll function that checks error queue

    Signed-off-by: Jacob Keller
    Cc: Jeffrey Kirsher
    Cc: Richard Cochran
    Cc: Matthew Vick
    Signed-off-by: David S. Miller

    Keller, Jacob E
     

27 Mar, 2013

1 commit

  • SCM_CREDENTIALS should apply to write() syscalls only if either the
    source or the destination socket asserted SOCK_PASSCRED. The original
    implementation in maybe_add_creds is wrong, and breaks several LSB
    testcases (e.g. /tset/LSB.os/network/recvfrom/T.recvfrom).

    Originally-authored-by: Karel Srot
    Signed-off-by: Ding Tianhong
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    dingtianhong
     

26 Mar, 2013

1 commit

  • As reported by Jan, and others over the past few years, there is a
    race condition caused by unix_release setting the sock->sk pointer
    to NULL before properly marking the socket as dead/orphaned. This
    can cause a problem with the LSM hook security_unix_may_send() if
    there is another socket attempting to write to this partially
    released socket in between when sock->sk is set to NULL and it is
    marked as dead/orphaned. This patch fixes this by only setting
    sock->sk to NULL after the socket has been marked as dead; I also
    take the opportunity to make unix_release_sock() a void function
    as it only ever returned 0/success.

    Dave, I think this one should go on the -stable pile.

    Special thanks to Jan for coming up with a reproducer for this
    problem.

    Reported-by: Jan Stancek
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. The list iterator is:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that are directly under /proc/net; it is not a general
    function for removing the proc entries of a netns. If we
    want to remove proc entries under /proc/net/stat/, we
    still need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove.
    We can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. We can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

10 Jan, 2013

1 commit


19 Nov, 2012

1 commit

  • In preparation for supporting the creation of network namespaces
    by unprivileged users, modify all of the per net sysctl exports
    and refuse to allow them to unprivileged users.

    This makes it safe for unprivileged users in general to access
    per net sysctls, and allows sysctls to be exported to unprivileged
    users on an individual basis as they are deemed safe.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

24 Oct, 2012

1 commit


18 Sep, 2012

1 commit


11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Sep, 2012

1 commit