20 Jan, 2011
9 commits
-
This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.Configurable parameters,
struct tc_mqprio_qopt {
__u8 num_tc;
__u8 prio_tc_map[TC_BITMASK + 1];
__u8 hw;
__u16 count[TC_MAX_QUEUE];
__u16 offset[TC_MAX_QUEUE];
};Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.Signed-off-by: John Fastabend
Signed-off-by: David S. Miller -
This patch provides a mechanism for lower layer devices to
steer traffic using skb->priority to tx queues. This allows
for hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock. While reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.By using select_queue for this drivers need to be updated for
each and every traffic type and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally if admins are expected to build a
qdisc and filter rules to steer traffic this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
Also this approach incurs all the overhead of a qdisc with filters.With the mechanism in this patch users can set skb priority using
expected methods ie setsockopt() or the stack can set the priority
directly. Then the skb will be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case with
single traffic class and all queues in this class everything
works as is until the LLD enables multiple tcs.To steer the skb we mask out the lower 4 bits of the priority
and allow the hardware to configure upto 15 distinct classes
of traffic. This is expected to be sufficient for most applications
at any rate it is more then the 8021Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.This in conjunction with a userspace application such as
lldpad can be used to implement 8021Q transmission selection
algorithms one of these algorithms being the extended transmission
selection algorithm currently being used for DCB.Signed-off-by: John Fastabend
Signed-off-by: David S. Miller -
If a rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the chenges
are made to all the interfaces belonging to the specified group.Signed-off-by: Vlad Dogaru
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.Signed-off-by: Vlad Dogaru
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Clean up some unused macros in net/*.
1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. never be used since introduced to kernel.
e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.Signed-off-by: Shan Wei
Acked-by: Sjur Braendeland
Signed-off-by: David S. Miller -
Update vxge driver version to 2.5.2
Signed-off-by: Jon Mason
Signed-off-by: David S. Miller -
To reduce the possibility of losing an interrupt in the handler due to a
race between an interrupt processing and disable/enable of interrupts,
enable MSIX one shot.Also, add support for adaptive interrupt coalesing
Signed-off-by: Jon Mason
Signed-off-by: Masroor Vettuparambil
Signed-off-by: David S. Miller -
The firmware PXE EPROM version detection is failing due to passing the
wrong parameter into firmware query function. Also, the version
printing function has an extraneous newline.Signed-off-by: Jon Mason
Signed-off-by: Sivakumar Subramani
Signed-off-by: David S. Miller -
Reorder the commands to be in the inverse order of their allocations
(instead of the random order they appear to be in), propagate return
code on errors from pci_request_region and register_netdev, reduce the
config_dev_cnt and total_dev_cnt counters on remove, and return the
correct error code for vdev->vpaths kzalloc failures. Also, prevent
leaking of vdev->vpaths memory and netdev in vxge_probe error path due
to freeing for these not occurring in vxge_device_unregister.Signed-off-by: Jon Mason
Signed-off-by: Sivakumar Subramani
Signed-off-by: David S. Miller
19 Jan, 2011
31 commits
-
Packet filter (BPF) doesnt need to disable softirqs, being fully
re-entrant and lock-less.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txtBut the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.This short program demonstrates the problem on SOCK_DGRAM.
int main(void) {
int i, j, ret;
int sv[2];
struct pollfd fds[2];
char *message = "Hello world!";
char buffer[64];
struct sock_filter ins[32] = {{0,},};
struct sock_fprog filter;socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
for (i = 0 ; i < 2 ; i++) {
fds[i].fd = sv[i];
fds[i].events = POLLIN;
fds[i].revents = 0;
}for(j = 1 ; j < 13 ; j++) {
/* Set a socket filter to truncate the message */
memset(ins, 0, sizeof(ins));
ins[0].code = BPF_RET|BPF_K;
ins[0].k = j;
filter.len = 1;
filter.filter = ins;
setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));/* send a message */
send(sv[0], message, strlen(message) + 1, 0);/* The filter should let the message pass but truncated. */
poll(fds, 2, 0);/* Receive the truncated message*/
ret = recv(sv[1], buffer, 64, 0);
printf("received %d bytes, expected %d\n", ret, j);
}for (i = 0 ; i < 2 ; i++)
close(sv[i]);return 0;
}Signed-off-by: Alban Crequy
Reviewed-by: Ian Molton
Signed-off-by: David S. Miller -
Just stumbled upon the issue while looking for another bug.
The code looks correct, the indentation is not.
Signed-off-by: Anton Vorontsov
Signed-off-by: David S. Miller -
sh_irda can not use RX/TX in same time,
but this driver didn't return to RX mode when TX error occurred.
This patch care xmit error case to solve this issue.Signed-off-by: Kuninori Morimoto
Signed-off-by: David S. Miller -
In netif_skb_features() we return only the features that are valid for vlans
if we have a vlan packet. However, we should not mask out NETIF_F_HW_VLAN_TX
since it enables transmission of vlan tags and is obviously valid.Reported-by: Eric Dumazet
Signed-off-by: Jesse Gross
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
- tx_fixup() can be called from either timer callback or from xmit()
in usbnet, so spinlock is added to avoid concurrency-related problem.
- minor correction due to checkpatch warning for some line over 80
chars after previous patch was applied.Signed-off-by: Alexey Orishko
Signed-off-by: David S. Miller -
In drivers/net/ns83820.c::ns83820_init_one() we dynamically allocate
memory via alloc_etherdev(). We then call PRIV() on the returned storage
which is 'return netdev_priv()'. netdev_priv() takes the pointer it is
passed and adds 'ALIGN(sizeof(struct net_device), NETDEV_ALIGN)' to it and
returns it. Then we test the resulting pointer for NULL, which it is
unlikely to be at this point, and later dereference it. This will go bad
if alloc_etherdev() actually returned NULL.This patch reworks the code slightly so that we test for a NULL pointer
(and return -ENOMEM) directly after calling alloc_etherdev().Signed-off-by: Jesper Juhl
Signed-off-by: Benjamin LaHaise
Signed-off-by: David S. Miller -
When a network namespace is created (via CLONE_NEWNET), the loopback
interface is automatically added to the new namespace, triggering a
printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.This is problematic for applications which use CLONE_NEWNET as
part of a sandbox, like Chromium's suid sandbox or recent versions of
vsftpd. On a busy machine, it can lead to thousands of useless
"lo: Disabled Privacy Extensions" messages appearing in dmesg.It's easy enough to check the status of privacy extensions via the
use_tempaddr sysctl, so just removing the printk seems like the most
sensible solution.Signed-off-by: Romain Francoise
Signed-off-by: David S. Miller -
Update bnx2x version to 1.62.00-4
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Fix AER settings for BCM57712 to allow accessing all device addresses range in CL45 MDC/MDIO
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Fix BCM84823 LED behavior which may show on some systems
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Device may show incorrect duplex mode for devices with external PHY
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Improve microcode loading verification before proceeding to next stage
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
LED on BCM57712+BCM8727 systems requires different settings
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Common init used to be called by the driver when the first port comes up, mainly to reset and reload external PHY microcode.
However, in case management driver is active on the other port, traffic would halted. So limit the common init to be done only once after POR.Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
Enable controlling BCM8073 PN polarity swap through nvm configuration, which is required in certain systems
Signed-off-by: Yaniv Rosner
Signed-off-by: Eilon Greenstein
Signed-off-by: David S. Miller -
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (lm93) Add support for LM94 -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Validate cpu early in perf_event_alloc()
perf: Find_get_context: fix the per-cpu-counter check
perf: Fix contexted inheritance -
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Clear irqstack thread_info
x86: Make relocatable kernel work with new binutils -
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (26 commits)
MIPS: Malta: enable Cirrus FB console
MIPS: add CONFIG_VIRTUALIZATION for virtio support
MIPS: Implement __read_mostly
MIPS: ath79: add common WMAC device for AR913X based boards
MIPS: ath79: Add initial support for the Atheros AP81 reference board
MIPS: ath79: add common SPI controller device
SPI: Add SPI controller driver for the Atheros AR71XX/AR724X/AR913X SoCs
MIPS: ath79: add common GPIO buttons device
MIPS: ath79: add common watchdog device
MIPS: ath79: add common GPIO LEDs device
MIPS: ath79: add initial support for the Atheros PB44 reference board
MIPS: ath79: utilize the MIPS multi-machine support
MIPS: ath79: add GPIOLIB support
MIPS: Add initial support for the Atheros AR71XX/AR724X/AR931X SoCs
MIPS: jump label: Add MIPS support.
MIPS: Use WARN() in uasm for better diagnostics.
MIPS: Optimize TLB handlers for Octeon CPUs
MIPS: Add LDX and LWX instructions to uasm.
MIPS: Use BBIT instructions in TLB handlers
MIPS: Declare uasm bbit0 and bbit1 functions.
... -
This patch adds basic support for LM94 to the LM93 driver. LM94 specific
sensors and features are not supported.Signed-off-by: Guenter Roeck
Acked-by: Jean Delvare -
When read valid tx/rx chains from EEPROM, there is a bug to use the
tx chain value for both tx and rx, the result of this cause low
receive throughput on 1x2 devices becuase rx will only utilize single
chain instead of two chainsSigned-off-by: Wey-Yi Guy
Signed-off-by: John W. Linville -
ath5k_reset must be called with sc->lock. Since the tx queue
watchdog runs in a workqueue and accesses sc, it's appropriate
to just take the lock over the whole function.Signed-off-by: Bob Copeland
Signed-off-by: John W. Linville -
Starting from perf_event_alloc()->perf_init_event(), the kernel
assumes that event->cpu is either -1 or the valid CPU number.Change perf_event_alloc() to validate this argument early. This
also means we can remove the similar check in
find_get_context().Signed-off-by: Oleg Nesterov
Acked-by: Peter Zijlstra
Cc: Alan Stern
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Prasad
Cc: Roland McGrath
Cc: gregkh@suse.de
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
If task == NULL, find_get_context() should always check that cpu
is correct.Afaics, the bug was introduced by 38a81da2 "perf events: Clean
up pid passing", but even before that commit "&& cpu != -1" was
not exactly right, -ESRCH from find_task_by_vpid() is not
accurate.Signed-off-by: Oleg Nesterov
Acked-by: Peter Zijlstra
Cc: Alan Stern
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Prasad
Cc: Roland McGrath
Cc: gregkh@suse.de
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
While most users of a physical Malta board are using the serial port
as the console, a lot of QEMU users would prefer to interact with a
graphical console. Enable the Cirrus FB support in the Malta default
configuration to make that possible. Note that the default console will
still be the serial port, users have to pass "console=tty0" to the
kernel to use the Cirrus FB.Signed-off-by: Aurelien Jarno
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2001/
Signed-off-by: Ralf Baechle -
Add CONFIG_VIRTUALIZATION to the MIPS architecture and include the
the virtio code there. Used to enable the virtio drivers under QEMU.Signed-off-by: Aurelien Jarno
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2002/
Signed-off-by: Ralf Baechle -
Just do what everyone else is doing by placing __read_mostly things in
the .data.read_mostly section.mips_io_port_base can not be read-only (const) and writable
(__read_mostly) at the same time. One of them has to go, so I chose
to eliminate the __read_mostly. It will still get stuck in a portion
of memory that is not adjacent to things that are written, and thus
not be on a dirty cache line, for whatever that is worth.Signed-off-by: David Daney
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/1702/
Signed-off-by: Ralf Baechle