27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

23 Apr, 2015

1 commit

  • Pull audit fixes from Paul Moore:
    "Seven audit patches for v4.1, all bug fixes.

    The largest, and perhaps most significant commit helps resolve some
    memory pressure issues related to the inode cache and audit, there are
    also a few small commits which help resolve some timing issues with
    the audit log queue, and the rest fall into the always popular "code
    clean-up" category.

    In general, nothing really substantial, just a nice set of maintenance
    patches"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: Remove condition which always evaluates to false
    audit: reduce mmap_sem hold for mm->exe_file
    audit: consolidate handling of mm->exe_file
    audit: code clean up
    audit: don't reset working wait time accidentally with auditd
    audit: don't lose set wait time on first successful call to audit_log_start()
    audit: move the tree pruning to a dedicated thread

    Linus Torvalds
     

16 Apr, 2015

1 commit


14 Mar, 2015

1 commit

  • Commit 3e1d0bb6224f019893d1c498cc3327559d183674 ("audit: Convert int limit
    uses to u32") converted int values to u32, so a few conditions now always
    evaluate to false.

    These warnings were emitted during compilation:

    kernel/audit.c: In function ‘audit_set_enabled’:
    kernel/audit.c:347:2: warning: comparison of unsigned expression < 0 is always
    false [-Wtype-limits]
    if (state < AUDIT_OFF || state > AUDIT_LOCKED)
    ^
    kernel/audit.c: In function ‘audit_receive_msg’:
    kernel/audit.c:880:9: warning: comparison of unsigned expression < 0 is
    always false [-Wtype-limits]
    if (s.backlog_wait_time < 0 ||

    The following patch removes those unnecessary conditions.

    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul Moore

    Pranith Kumar
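
    The warning quoted above can be reproduced in ordinary C. What follows is
    a minimal userspace sketch, not the kernel code: every demo_* name and
    constant value is a hypothetical stand-in. It shows why only the upper
    bound needs checking once the value is unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the audit state constants. */
enum { DEMO_AUDIT_OFF = 0, DEMO_AUDIT_ON = 1, DEMO_AUDIT_LOCKED = 2 };

/* The lower bound is zero, which no unsigned value can fall below. */
static_assert(DEMO_AUDIT_OFF == 0, "lower bound must be zero");

/*
 * With an unsigned argument, "state < DEMO_AUDIT_OFF" can never be true,
 * so the range check reduces to the upper bound alone.
 */
bool demo_state_is_valid(uint32_t state)
{
    return state <= DEMO_AUDIT_LOCKED;
}

/*
 * A negative int converted to uint32_t wraps to a very large value, so it
 * is still rejected by the upper-bound check.
 */
bool demo_state_is_valid_from_int(int raw)
{
    return demo_state_is_valid((uint32_t)raw);
}
```

    Under these assumptions, a caller passing -1 through the int wrapper is
    rejected by the same single comparison that rejects 3.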
     

24 Feb, 2015

5 commits


31 Dec, 2014

1 commit

  • Pull networking fixes from David Miller:

    1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.

    2) Fix receive checksum handling in enic driver, from Govindarajulu
    Varadarajan.

    3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
    Herbert Xu. Also, add code to detect drivers that have this mistake
    in the future.

    4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.

    5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
    input path, from Nicolas Dichtel.

    6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.

    7) Fix double SKB free in vxlan driver, also from Pravin.

    8) When we scrub a packet, which happens when we are switching the
    context of the packet (namespace, etc.), we should reset the
    secmark. From Thomas Graf.

    9) ->ndo_gso_check() needs to do more than return true/false, it also
    has to allow the driver to clear netdev feature bits in order for
    the caller to be able to proceed properly. From Jesse Gross.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
    genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
    netlink/genetlink: pass network namespace to bind/unbind
    ne2k-pci: Add pci_disable_device in error handling
    bonding: change error message to debug message in __bond_release_one()
    genetlink: pass multicast bind/unbind to families
    netlink: call unbind when releasing socket
    netlink: update listeners directly when removing socket
    genetlink: pass only network namespace to genl_has_listeners()
    netlink: rename netlink_unbind() to netlink_undo_bind()
    net: Generalize ndo_gso_check to ndo_features_check
    net: incorrect use of init_completion fixup
    neigh: remove next ptr from struct neigh_table
    net: xilinx: Remove unnecessary temac_property in the driver
    net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
    net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
    openvswitch: fix odd_ptr_err.cocci warnings
    Bluetooth: Fix accepting connections when not using mgmt
    Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
    brcmfmac: Do not crash if platform data is not populated
    ipw2200: select CFG80211_WEXT
    ...

    Linus Torvalds
     

27 Dec, 2014

1 commit

  • Netlink families can exist in multiple namespaces, and for the most
    part multicast subscriptions are per network namespace. Thus it only
    makes sense to have bind/unbind notifications per network namespace.

    To achieve this, pass the network namespace of a given client socket
    to the bind/unbind functions.

    Also do this in generic netlink, and there also make sure that any
    bind for multicast groups that only exist in init_net is rejected.
    This isn't really a problem if it is accepted since a client in a
    different namespace will never receive any notifications from such
    a group, but it can confuse the family if not rejected (it's also
    possible to silently (without telling the family) accept it, but it
    would also have to be ignored on unbind so families that take any
    kind of action on bind/unbind won't do unnecessary work for invalid
    clients like that).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

24 Dec, 2014

1 commit

  • Pull audit fixes from Paul Moore:
    "Four patches to fix various problems with the audit subsystem, all are
    fairly small and straightforward.

    One patch fixes a problem where we weren't using the correct gfp
    allocation flags (GFP_KERNEL regardless of context, oops), one patch
    fixes a problem with old userspace tools (this was broken for a
    while), one patch fixes a problem where we weren't recording pathnames
    correctly, and one fixes a problem with PID based filters.

    In general I don't think there is anything controversial with this
    patchset, and it fixes some rather unfortunate bugs; the allocation
    flag one can be particularly scary looking for users"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: restore AUDIT_LOGINUID unset ABI
    audit: correctly record file names with different path name types
    audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb
    audit: don't attempt to lookup PIDs when changing PID filtering audit rules

    Linus Torvalds
     

20 Dec, 2014

1 commit

  • Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
    audit_log_end(), which can come from any context (including atomic contexts
    that must not sleep), GFP_KERNEL can't be used. Since the audit_buffer knows
    what context it should use, pass that down and use that.

    See: https://lkml.org/lkml/2014/12/16/542

    BUG: sleeping function called from invalid context at mm/slab.c:2849
    in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
    2 locks held by sulogin/885:
    #0: (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x28/0x8b
    #1: (tty_files_lock){+.+.+.}, at: [] selinux_bprm_committing_creds+0x55/0x22b
    CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
    Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
    ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
    ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
    0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
    Call Trace:
    [] dump_stack+0x50/0xa8
    [] ___might_sleep+0x1b6/0x1be
    [] __might_sleep+0x119/0x128
    [] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
    [] kmem_cache_alloc+0x43/0x1c9
    [] __alloc_skb+0x42/0x1a3
    [] skb_copy+0x3e/0xa3
    [] audit_log_end+0x83/0x100
    [] ? avc_audit_pre_callback+0x103/0x103
    [] common_lsm_audit+0x441/0x450
    [] slow_avc_audit+0x63/0x67
    [] avc_has_perm+0xca/0xe3
    [] inode_has_perm+0x5a/0x65
    [] selinux_bprm_committing_creds+0x98/0x22b
    [] security_bprm_committing_creds+0xe/0x10
    [] install_exec_creds+0xe/0x79
    [] load_elf_binary+0xe36/0x10d7
    [] search_binary_handler+0x81/0x18c
    [] do_execveat_common.isra.31+0x4e3/0x7b7
    [] do_execve+0x1f/0x21
    [] SyS_execve+0x25/0x29
    [] stub_execve+0x69/0xa0

    Cc: stable@vger.kernel.org #v3.16-rc1
    Reported-by: Valdis Kletnieks
    Signed-off-by: Richard Guy Briggs
    Tested-by: Valdis Kletnieks
    Signed-off-by: Paul Moore

    Richard Guy Briggs
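
    The idea of passing the allocation context down can be modelled outside
    the kernel. This is a simplified userspace sketch under stated
    assumptions: demo_gfp_t, the DEMO_GFP_* values and both functions are
    hypothetical and only mimic the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation-context flags; the values are illustrative. */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_ATOMIC 0x1u /* caller must not sleep */
#define DEMO_GFP_KERNEL 0x2u /* caller may sleep */

static_assert(DEMO_GFP_ATOMIC != DEMO_GFP_KERNEL, "flags must differ");

/* The buffer records the context it was created in. */
struct demo_audit_buffer {
    demo_gfp_t gfp_mask;
};

/* Before: the flags were hard-coded regardless of the caller's context. */
demo_gfp_t demo_multicast_copy_flags_buggy(const struct demo_audit_buffer *ab)
{
    (void)ab; /* the recorded context is ignored */
    return DEMO_GFP_KERNEL;
}

/* After: reuse the context the buffer already knows about. */
demo_gfp_t demo_multicast_copy_flags(const struct demo_audit_buffer *ab)
{
    assert(ab != NULL);
    return ab->gfp_mask;
}
```

    With a buffer created in an atomic context, the first function would
    request a sleeping allocation while the second keeps the atomic flags.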
     

14 Dec, 2014

1 commit

  • Pull audit updates from Paul Moore:
    "Two small patches from the audit next branch; only one of which has
    any real significant code changes, the other is simply a MAINTAINERS
    update for audit.

    The single code patch is pretty small and rather straightforward, it
    changes the audit "version" number reported to userspace from an
    integer to a bitmap which is used to indicate the functionality of the
    running kernel. This really doesn't have much impact on the kernel,
    but it will make life easier for the audit userspace folks.

    Thankfully we were still on a version number which allowed us to do
    this without breaking userspace"

    * 'upstream' of git://git.infradead.org/users/pcmoore/audit:
    audit: convert status version to a feature bitmap
    audit: add Paul Moore to the MAINTAINERS entry

    Linus Torvalds
     

10 Dec, 2014

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle are:

    - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

    This instruments might_sleep() checks to catch places that nest
    blocking primitives - such as mutex usage in a wait loop. Such
    bugs can result in hard to debug races/hangs.

    Another category of invalid nesting that this facility will detect
    is the calling of blocking functions from within schedule() ->
    sched_submit_work() -> blk_schedule_flush_plug().

    There's some potential for false positives (if secondary blocking
    primitives themselves are not ready yet for this facility), but the
    kernel will warn once about such bugs per bootup, so the warning
    isn't much of a nuisance.

    This feature comes with a number of fixes, for problems uncovered
    with it, so no messages are expected normally.

    - Another round of sched/numa optimizations and refinements, for
    CONFIG_NUMA_BALANCING=y.

    - Another round of sched/dl fixes and refinements.

    Plus various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
    sched: Add missing rcu protection to wake_up_all_idle_cpus
    sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
    sched/numa: Init numa balancing fields of init_task
    sched/deadline: Remove unnecessary definitions in cpudeadline.h
    sched/cpupri: Remove unnecessary definitions in cpupri.h
    sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
    sched/fair: Fix stale overloaded status in the busiest group finding logic
    sched: Move p->nr_cpus_allowed check to select_task_rq()
    sched/completion: Document when to use wait_for_completion_io_*()
    sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
    sched/fair: Kill task_struct::numa_entry and numa_group::task_list
    sched: Refactor task_struct to use numa_faults instead of numa_* pointers
    sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
    sched/deadline: Reschedule from switched_from_dl() after a successful pull
    sched/deadline: Push task away if the deadline is equal to curr during wakeup
    sched/deadline: Add deadline rq status print
    sched/deadline: Fix artificial overrun introduced by yield_task_dl()
    sched/rt: Clean up check_preempt_equal_prio()
    sched/core: Use dl_bw_of() under rcu_read_lock_sched()
    sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
    ...

    Linus Torvalds
     
  • Paul Moore
     

18 Nov, 2014

1 commit

  • The version field defined in the audit status structure was found to have
    limitations in terms of its expressibility of features supported. This is
    distinct from the get/set features call to be able to command those features
    that are present.

    Converting this field from a version number to a feature bitmap will allow
    distributions to selectively backport and support certain features and will
    allow upstream to be able to deprecate features in the future. It will allow
    userspace clients to first query the kernel for which features are actually
    present and supported. Currently, EINVAL is returned rather than EOPNOTSUP,
    which isn't helpful in determining if there was an error in the command, or if
    it simply isn't supported yet. Past features are not represented by this
    bitmap, but their use may be converted to EOPNOTSUP if needed in the future.

    Since "version" is too generic to convert with a #define, use a union in the
    struct status, introducing the member "feature_bitmap" unionized with
    "version".

    Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP*
    counterparts, leaving the former for backwards compatibility.

    Signed-off-by: Richard Guy Briggs
    [PM: minor whitespace tweaks]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
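
    The union described above can be sketched in standalone C. The layout
    and names below (demo_status, DEMO_FEATURE_BITMAP_*, DEMO_VERSION_*) are
    hypothetical and simplified; they only illustrate how a field can be
    read either as the old version number or as a feature bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, one bit per feature. */
#define DEMO_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001u
#define DEMO_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002u

/* Old version-style names kept as aliases for backwards compatibility. */
#define DEMO_VERSION_BACKLOG_LIMIT     DEMO_FEATURE_BITMAP_BACKLOG_LIMIT
#define DEMO_VERSION_BACKLOG_WAIT_TIME DEMO_FEATURE_BITMAP_BACKLOG_WAIT_TIME

static_assert(DEMO_VERSION_BACKLOG_WAIT_TIME == 2u,
              "old version number 2 is also a single valid bit");

struct demo_status {
    uint32_t mask;
    union {
        uint32_t version;        /* legacy name for the same storage */
        uint32_t feature_bitmap; /* one bit per supported feature */
    };
};

/* True when every bit in 'feature' is set in the status bitmap. */
bool demo_has_feature(const struct demo_status *s, uint32_t feature)
{
    return (s->feature_bitmap & feature) == feature;
}
```

    A client written against the old field name still compiles and reads the
    same 32 bits, while newer clients can test individual bits.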
     

14 Nov, 2014

1 commit

  • Pull audit fixes from Paul Moore:
    "After he sent the initial audit pull request for 3.18, Eric asked me
    to take over the management of the audit tree, hence this pull request
    to fix a couple of problems with audit.

    As you can see below, the changes are minimal: adding some whitespace
    to a string so userspace parses it correctly, and fixing a problem
    with audit's usage of fsnotify that was causing audit watch rules to
    be lost. Neither of these patches were very controversial on the
    mailing lists and they fix real problems, getting them into 3.18 would
    be a good thing"

    * 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit:
    audit: keep inode pinned
    audit: AUDIT_FEATURE_CHANGE message format missing delimiting space

    Linus Torvalds
     

04 Nov, 2014

1 commit

  • The kauditd_thread wait loop is a bit iffy; it has a number of problems:

    - calls try_to_freeze() before schedule(); you typically want the
    thread to re-evaluate the sleep condition when unfreezing, also
    freeze_task() issues a wakeup.

    - it unconditionally does the {add,remove}_wait_queue(), even when the
    sleep condition is false.

    Use wait_event_freezable() that does the right thing.

    Reported-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Eric Paris
    Cc: oleg@redhat.com
    Cc: Eric Paris
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Oct, 2014

1 commit


20 Oct, 2014

1 commit

  • Pull audit updates from Eric Paris:
    "So this change across a whole bunch of arches really solves one basic
    problem. We want to audit when seccomp is killing a process. seccomp
    hooks in before the audit syscall entry code. audit_syscall_entry
    took as an argument the arch of the given syscall. Since the arch is
    part of what makes a syscall number meaningful it's an important part
    of the record, but it isn't available when seccomp shoots the
    syscall...

    For most arches we have a better way to get the arch (syscall_get_arch()).
    So the solution was twofold: Implement syscall_get_arch() everywhere
    there is audit which didn't have it. Use syscall_get_arch() in the
    seccomp audit code. Having syscall_get_arch() everywhere meant it was
    a useless flag on the stack and we could get rid of it for the typical
    syscall entry.

    The other changes inside the audit system aren't grand, fixed some
    records that had invalid spaces. Better locking around the task comm
    field. Removing some dead functions and structs. Make some things
    static. Really minor stuff"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: rename audit_log_remove_rule to disambiguate for trees
    audit: cull redundancy in audit_rule_change
    audit: WARN if audit_rule_change called illegally
    audit: put rule existence check in canonical order
    next: openrisc: Fix build
    audit: get comm using lock to avoid race in string printing
    audit: remove open_arg() function that is never used
    audit: correct AUDIT_GET_FEATURE return message type
    audit: set nlmsg_len for multicast messages.
    audit: use union for audit_field values since they are mutually exclusive
    audit: invalid op= values for rules
    audit: use atomic_t to simplify audit_serial()
    kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
    audit: reduce scope of audit_log_fcaps
    audit: reduce scope of audit_net_id
    audit: arm64: Remove the audit arch argument to audit_syscall_entry
    arm64: audit: Add audit hook in syscall_trace_enter/exit()
    audit: x86: drop arch from __audit_syscall_entry() interface
    sparc: implement is_32bit_task
    sparc: properly conditionalize use of TIF_32BIT
    ...

    Linus Torvalds
     

24 Sep, 2014

7 commits

  • When task->comm is passed directly to audit_log_untrustedstring() without
    getting a copy or using the task_lock, there is a race that could happen that
    would output a NUL (\0) in the output string that would effectively truncate
    the rest of the report text after the comm= field in the audit, losing fields.

    Use get_task_comm() to get a copy while acquiring the task_lock to prevent
    this and to prevent the result from being a mixture of old and new values of
    comm.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
     
  • When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
    should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
    audit_feature. The current reply is a message tagged as an AUDIT_GET
    type with a struct audit_feature.

    This appears to have been a cut-and-paste-eo in commit b0fed40.

    Reported-by: Steve Grubb
    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
     
  • Report:
    Looking at your example code in
    http://people.redhat.com/rbriggs/audit-multicast-listen/audit-multicast-listen.c,
    it seems that nlmsg_len field in the received messages is supposed to
    contain the length of the header + payload, but it is always set to the
    size of the header only, i.e. 16. The example program works, because
    the printf format specifies the minimum width, not "precision", so it
    simply prints out the payload until the first zero byte. This isn't too
    much of a problem, but precludes the use of recvmmsg, iiuc?

    (gdb) p *(struct nlmsghdr*)nlh
    $14 = {nlmsg_len = 16, nlmsg_type = 1100, nlmsg_flags = 0, nlmsg_seq = 0, nlmsg_pid = 9910}

    The only time nlmsg_len would have been updated was at audit_buffer_alloc()
    inside audit_log_start() and never updated after. It should arguably be done
    in audit_log_vformat(), but would be more efficient in audit_log_end().

    Reported-by: Zbigniew Jędrzejewski-Szmek
    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
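
    The length rule in the report (header plus payload) can be shown with a
    small standalone sketch. The struct below is a hypothetical mirror of
    the 16-byte netlink header layout, and demo_finish_message() is an
    invented helper; alignment padding is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the netlink message header layout. */
struct demo_nlmsghdr {
    uint32_t nlmsg_len;   /* length of header plus payload */
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

#define DEMO_HDRLEN ((uint32_t)sizeof(struct demo_nlmsghdr))

static_assert(sizeof(struct demo_nlmsghdr) == 16, "header is 16 bytes");

/*
 * Write a header followed by the payload into buf and record the full
 * message length in nlmsg_len. Returns that length, or 0 if buf is too
 * small to hold the header and payload.
 */
uint32_t demo_finish_message(unsigned char *buf, size_t buf_size,
                             const char *payload, size_t payload_len)
{
    struct demo_nlmsghdr hdr;

    if (buf_size < DEMO_HDRLEN || payload_len > buf_size - DEMO_HDRLEN)
        return 0;

    memset(&hdr, 0, sizeof(hdr));
    hdr.nlmsg_type = 1100; /* same type value as in the gdb dump above */
    hdr.nlmsg_len = DEMO_HDRLEN + (uint32_t)payload_len;

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + DEMO_HDRLEN, payload, payload_len);
    return hdr.nlmsg_len;
}
```

    A receiver that trusts nlmsg_len can then locate the end of the payload
    without scanning for a zero byte.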
     
  • Since there is already a primitive to do this operation in the atomic_t, use it
    to simplify audit_serial().

    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
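
    As a userspace analogue, a serial counter built on an atomic primitive
    might look like the sketch below. It uses C11 <stdatomic.h> rather than
    the kernel's atomic_t, and demo_audit_serial() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>

static_assert(sizeof(unsigned int) >= 4, "serial needs at least 32 bits");

/* Shared counter; starts at zero so the first serial handed out is 1. */
static atomic_uint demo_serial = 0;

/*
 * Return a new serial number. atomic_fetch_add() yields the previous
 * value, so adding 1 gives the value this caller just claimed. No
 * separate lock is needed around the increment.
 */
unsigned int demo_audit_serial(void)
{
    return atomic_fetch_add(&demo_serial, 1u) + 1u;
}
```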
     
  • Use kernel.h definition.

    Cc: Eric Paris
    Cc: Andrew Morton
    Signed-off-by: Fabian Frederick
    Signed-off-by: Richard Guy Briggs

    Fabian Frederick
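
    For readers unfamiliar with the idiom being replaced, here is a
    standalone sketch. DEMO_ARRAY_SIZE and the table are hypothetical; the
    kernel's own macro additionally checks at compile time that its argument
    is really an array, which this simplified version does not.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified element-count macro; only valid for true arrays. */
#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical lookup table. */
static const char *const demo_names[] = { "alpha", "beta", "gamma" };

static_assert(DEMO_ARRAY_SIZE(demo_names) == 3, "three entries expected");

/* Number of entries, computed from the array itself. */
size_t demo_name_count(void)
{
    return DEMO_ARRAY_SIZE(demo_names);
}

/* Bounds-checked lookup; returns NULL when the index is out of range. */
const char *demo_name(size_t i)
{
    return i < DEMO_ARRAY_SIZE(demo_names) ? demo_names[i] : NULL;
}
```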
     
  • audit_log_fcaps() isn't used outside kernel/audit.c. Reduce its scope.

    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
     
  • audit_net_id isn't used outside kernel/audit.c. Reduce its scope.

    Signed-off-by: Richard Guy Briggs

    Richard Guy Briggs
     

24 Jul, 2014

1 commit

  • This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
    plus fixing it a different way...

    We found, when trying to run an application from an application which
    had dropped privs that the kernel does security checks on undefined
    capability bits. This was ESPECIALLY difficult to debug as those
    undefined bits are hidden from /proc/$PID/status.

    Consider a root application which drops all capabilities from ALL 4
    capability sets. We assume, since the application is going to set
    eff/perm/inh from an array that it will clear not only the defined caps
    less than CAP_LAST_CAP, but also the higher 28ish bits which are
    undefined future capabilities.

    The BSET gets cleared differently. Instead it is cleared one bit at a
    time. The problem here is that in security/commoncap.c::cap_task_prctl()
    we actually check the validity of a capability being read. So any task
    which attempts to 'read all things set in bset' followed by 'unset all
    things set in bset' will not even attempt to unset the undefined bits
    higher than CAP_LAST_CAP.

    So the 'parent' will look something like:
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: ffffffc000000000

    All of this 'should' be fine. Given that these are undefined bits that
    aren't supposed to have anything to do with permissions. But they do...

    So let's now consider a task which cleared the eff/perm/inh completely
    and cleared all of the valid caps in the bset (but not the invalid caps
    it couldn't read out of the kernel). We know that this is exactly what
    the libcap-ng library does and what the go capabilities library does.
    They both leave you in that above situation if you try to clear all of
    your capabilities from all 4 sets. If that root task calls execve()
    the child task will pick up all caps not blocked by the bset. The bset
    however does not block bits higher than CAP_LAST_CAP. So now the child
    task has bits in eff which are not in the parent. These are
    'meaningless' undefined bits, but still bits which the parent doesn't
    have.

    The problem is now in cred_cap_issubset() (or any operation which does a
    subset test) as the child, while a subset for valid cap bits, is not a
    subset for invalid cap bits! So now we set during commit creds that
    the child is not dumpable. Given it is 'more priv' than its parent. It
    also means the parent cannot ptrace the child and other stupidity.

    The solution here:
    1) stop hiding capability bits in status
    This makes debugging easier!

    2) stop giving any task undefined capability bits. It's simple: if you
    don't put those invalid bits in CAP_FULL_SET you won't get them in init
    and you won't get them in any other task either.
    This fixes the cap_issubset() tests and resulting fallout (which
    made the init task in a docker container untraceable among other
    things)

    3) mask out undefined bits when sys_capset() is called as it might use
    ~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
    This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

    4) mask out undefined bits when we read a file capability off of disk as
    again likely all bits are set in the xattr for forward/backward
    compatibility.
    This lets 'setcap all+pe /bin/bash; /bin/bash' run

    Signed-off-by: Eric Paris
    Reviewed-by: Kees Cook
    Cc: Andrew Vagin
    Cc: Andrew G. Morgan
    Cc: Serge E. Hallyn
    Cc: Kees Cook
    Cc: Steve Grubb
    Cc: Dan Walsh
    Cc: stable@vger.kernel.org
    Signed-off-by: James Morris

    Eric Paris
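
    The subset problem and the masking fix can be illustrated with 64-bit
    masks in plain C. This is a sketch under stated assumptions: the demo_*
    helpers are hypothetical, and DEMO_CAP_LAST_CAP is set to 37 only
    because that matches the CapBnd value shown in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed highest defined capability number (illustrative). */
#define DEMO_CAP_LAST_CAP 37

/* Bits 0..DEMO_CAP_LAST_CAP set; everything above is undefined. */
#define DEMO_CAP_VALID_MASK ((UINT64_C(1) << (DEMO_CAP_LAST_CAP + 1)) - 1)

/* The undefined bits are exactly the CapBnd value quoted above. */
static_assert(~DEMO_CAP_VALID_MASK == UINT64_C(0xffffffc000000000),
              "undefined bits should match the CapBnd example");

/* True when every bit set in 'a' is also set in 'set'. */
bool demo_cap_issubset(uint64_t a, uint64_t set)
{
    return (a & ~set) == 0;
}

/* Drop any bits above the last defined capability. */
uint64_t demo_cap_drop_invalid(uint64_t caps)
{
    return caps & DEMO_CAP_VALID_MASK;
}
```

    A child holding only undefined bits fails the subset test against a
    parent with an empty set, but passes once those bits are masked out.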
     

13 Jun, 2014

1 commit

  • Pull networking updates from David Miller:

    1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

    2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

    3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

    4) BPF now has a "random" opcode, from Chema Gonzalez.

    5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

    6) Support TCP fastopen over ipv6, from Daniel Lee.

    7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

    8) Support software TSO in fec driver too, from Nimrod Andy.

    9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

    10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

    11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

    12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

    13) Support busy polling in SCTP, from Neil Horman.

    14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

    15) Bridge promisc mode handling improvements from Vlad Yasevich.

    16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
    rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
    tcp: fixing TLP's FIN recovery
    net: fec: Add software TSO support
    net: fec: Add Scatter/gather support
    net: fec: Increase buffer descriptor entry number
    net: fec: Factorize feature setting
    net: fec: Enable IP header hardware checksum
    net: fec: Factorize the .xmit transmit function
    bridge: fix compile error when compiling without IPv6 support
    bridge: fix smatch warning / potential null pointer dereference
    via-rhine: fix full-duplex with autoneg disable
    bnx2x: Enlarge the dorq threshold for VFs
    bnx2x: Check for UNDI in uncommon branch
    bnx2x: Fix 1G-baseT link
    bnx2x: Fix link for KR with swapped polarity lane
    sctp: Fix sk_ack_backlog wrap-around problem
    net/core: Add VF link state control policy
    net/fsl: xgmac_mdio is dependent on OF_MDIO
    net/fsl: Make xgmac_mdio read error message useful
    net_sched: drr: warn when qdisc is not work conserving
    ...

    Linus Torvalds
     

07 Jun, 2014

1 commit


13 May, 2014

1 commit

  • Conflicts:
    drivers/net/ethernet/altera/altera_sgdma.c
    net/netlink/af_netlink.c
    net/sched/cls_api.c
    net/sched/sch_api.c

    The netlink conflict dealt with moving to netlink_capable() and
    netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
    in non-init namespaces. These were simple transformations from
    netlink_capable to netlink_ns_capable.

    The Altera driver conflict was simply code removal overlapping some
    void pointer cast cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Apr, 2014

1 commit

  • It is possible, by passing a netlink socket to a more privileged
    executable and then fooling that executable into writing data to the
    socket that happens to be a valid netlink message, to make it do something
    that the privileged executable did not intend to do.

    To keep this from happening, replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
    which act the same as the previous calls except that they also verify that
    the opener of the socket had the desired permissions.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

23 Apr, 2014

3 commits

  • Test first to see if there are any userspace multicast listeners bound to the
    socket before starting the multicast send work.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Richard Guy Briggs
     
  • Add a netlink multicast socket with one group to kaudit for "best-effort"
    delivery to read-only userspace clients such as systemd, in addition to the
    existing bidirectional unicast auditd userspace client.

    Currently, auditd is intended to use the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
    capabilities, but actually uses CAP_NET_ADMIN. The CAP_AUDIT_READ capability
    is added for use by read-only AUDIT_NLGRP_READLOG netlink multicast group
    clients to the kaudit subsystem.

    This will safely give access to services such as systemd to consume audit logs
    while ensuring write access remains restricted for integrity.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Richard Guy Briggs
     
  • Register a netlink per-protocol bind function for audit to check userspace
    process capabilities before allowing a multicast group connection.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Richard Guy Briggs
     

13 Apr, 2014

1 commit

  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • It is possible to configure your PAM stack to refuse login if audit
    messages (about the login) were unable to be sent. This is common in
    many distros and thus normal configuration of many containers. The PAM
    modules determine if audit is enabled/disabled in the kernel based on
    the return value from sending an audit message on the netlink socket.
    If userspace gets back ECONNREFUSED it believes audit is disabled in the
    kernel. If it gets any other error it refuses to let the login
    proceed.

    Just about ever since the introduction of namespaces the kernel audit
    subsystem has returned EPERM if the task sending a message was not in
    the init user or pid namespace. So many forms of containers have never
    worked if audit was enabled in the kernel.

    BUT if the container was not in init_net then the kernel network code
    would send ECONNREFUSED (instead of the audit code sending EPERM). Thus
    by pure accident/dumb luck/bug if an admin configured the PAM stack to
    reject all logins that didn't talk to audit, but then ran the login
    utility in the non-init_net namespace, it would work!! Clearly this was
    a bug, but it is a bug some people expected.

    With the introduction of network namespace support in 3.14-rc1 the two
    bugs stopped cancelling each other out. Now, containers in the
    non-init_net namespace refused to let users log in (just like PAM was
    configured!) Obviously some people were not happy that what used to let
    users log in, now didn't!

    This fix is kinda hacky. We return ECONNREFUSED for all non-init
    relevant namespaces. That means that not only will the old broken
    non-init_net setups continue to work, now the broken non-init_pid or
    non-init_user setups will 'work'. They don't really work, since audit
    isn't logging things. But it's what most users want.

    In 3.15 we should have patches to support not only the non-init_net
    (3.14) namespace but also the non-init_pid and non-init_user namespace.
    So all will be right in the world. This just opens the doors wide open
    on 3.14 and hopefully makes users happy, if not the audit system...

    Reported-by: Andre Tomt
    Reported-by: Adam Richter
    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds

    Conflicts:
    kernel/audit.c

    Eric Paris
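
The interaction described in the commit message above can be sketched as two small C functions. This is a model under stated assumptions, not the kernel or PAM source: `demo_kernel_errno_after_fix` and `demo_decide_login` are hypothetical names, and the namespace membership checks are reduced to boolean flags.

```c
#include <errno.h>
#include <stdbool.h>

enum demo_login_decision {
    DEMO_LOGIN_ALLOW,           /* audit message was accepted */
    DEMO_LOGIN_ALLOW_AUDIT_OFF, /* userspace believes audit is disabled */
    DEMO_LOGIN_REFUSE           /* any other error blocks the login */
};

/* Model of the behaviour after the fix: a sender outside any of the
 * initial net, pid or user namespaces gets ECONNREFUSED, not EPERM.
 * Returns 0 when the message would be accepted. */
int demo_kernel_errno_after_fix(bool in_init_net, bool in_init_pid,
                                bool in_init_user)
{
    if (!in_init_net || !in_init_pid || !in_init_user)
        return ECONNREFUSED;
    return 0;
}

/* Model of the userspace rule from the commit message: ECONNREFUSED is
 * read as "audit is disabled"; every other error refuses the login. */
enum demo_login_decision demo_decide_login(int send_errno)
{
    if (send_errno == 0)
        return DEMO_LOGIN_ALLOW;
    if (send_errno == ECONNREFUSED)
        return DEMO_LOGIN_ALLOW_AUDIT_OFF;
    return DEMO_LOGIN_REFUSE;
}
```

Feeding `EPERM` into `demo_decide_login` gives `DEMO_LOGIN_REFUSE`, which is the failure containers hit before the fix. With the fix, a sender in a non-init namespace gets `ECONNREFUSED`, so the login proceeds even though nothing is actually logged.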
     

31 Mar, 2014

1 commit

  • It is possible to configure your PAM stack to refuse login if audit
    messages (about the login) were unable to be sent. This is common in
    many distros and thus normal configuration of many containers. The PAM
    modules determine if audit is enabled/disabled in the kernel based on
    the return value from sending an audit message on the netlink socket.
    If userspace gets back ECONNREFUSED it believes audit is disabled in the
    kernel. If it gets any other error it refuses to let the login
    proceed.

    Just about ever since the introduction of namespaces the kernel audit
    subsystem has returned EPERM if the task sending a message was not in
    the init user or pid namespace. So many forms of containers have never
    worked if audit was enabled in the kernel.

    BUT if the container was not in init_net then the kernel network code
    would send ECONNREFUSED (instead of the audit code sending EPERM). Thus
    by pure accident/dumb luck/bug if an admin configured the PAM stack to
    reject all logins that didn't talk to audit, but then ran the login
    utility in the non-init_net namespace, it would work!! Clearly this was
    a bug, but it is a bug some people expected.

    With the introduction of network namespace support in 3.14-rc1 the two
    bugs stopped cancelling each other out. Now, containers in the
    non-init_net namespace refused to let users log in (just like PAM was
    configured!) Obviously some people were not happy that what used to let
    users log in, now didn't!

    This fix is kinda hacky. We return ECONNREFUSED for all non-init
    relevant namespaces. That means that not only will the old broken
    non-init_net setups continue to work, now the broken non-init_pid or
    non-init_user setups will 'work'. They don't really work, since audit
    isn't logging things. But it's what most users want.

    In 3.15 we should have patches to support not only the non-init_net
    (3.14) namespace but also the non-init_pid and non-init_user namespace.
    So all will be right in the world. This just opens the doors wide open
    on 3.14 and hopefully makes users happy, if not the audit system...

    Reported-by: Andre Tomt
    Reported-by: Adam Richter
    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds

    Eric Paris
     

25 Mar, 2014

1 commit

  • This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

    The rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before storing a pointer to that structure.
    And in the case of the NULL pointer, there is no structure to initialize.
    So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL).

    Signed-off-by: Monam Agarwal
    Signed-off-by: Eric Paris

    Monam Agarwal
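
The reasoning in the commit above can be illustrated with a userspace analogue built on C11 atomics. This is a sketch, not the kernel macros: all `demo_*` names are hypothetical, a release store stands in for the ordering that rcu_assign_pointer() provides, and a relaxed store stands in for RCU_INIT_POINTER().

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_cfg {
    int value;
};

/* Shared pointer that readers follow. */
static _Atomic(struct demo_cfg *) demo_global;

/* Analogue of rcu_assign_pointer(): the release store orders the
 * initialisation of *cfg before the pointer becomes visible. */
void demo_publish(struct demo_cfg *cfg, int value)
{
    cfg->value = value; /* initialise the structure first */
    atomic_store_explicit(&demo_global, cfg, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(x, NULL): NULL points at no structure,
 * so there is no initialisation to order and a relaxed store is enough. */
void demo_clear(void)
{
    atomic_store_explicit(&demo_global, (struct demo_cfg *)NULL,
                          memory_order_relaxed);
}

/* Reader side. Acquire is used here as a simple, stronger stand-in for
 * the dependency ordering that the kernel's rcu_dereference() relies on. */
struct demo_cfg *demo_read(void)
{
    return atomic_load_explicit(&demo_global, memory_order_acquire);
}
```

Only `demo_publish` pays for ordering, because only it exposes freshly written fields to readers. `demo_clear` exposes nothing, which is why the conversion in the commit is safe.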