18 Dec, 2011

3 commits


13 Dec, 2011

2 commits

  • Modify the algorithm to build the source hashing hash table to add
    extra slots for destinations with higher weight. This has the effect
    of allowing an IPVS SH user to give more connections to hosts that
    have been configured to have a higher weight.

    The Kconfig change is needed because the size of the hash table
    becomes more important if you use the weights in the manner this
    patch allows. Someone might conceivably need to increase the size of
    that table to accommodate their configuration, so it is handy to be
    able to do that through the regular configuration system instead of
    editing the source.

    Signed-off-by: Michael Maxim
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Michael Maxim
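
As a rough userspace illustration of the weighted table fill described above (not the kernel code itself), the sketch below repeats each destination `weight` times before moving to the next one. The names `SH_TAB_SIZE`, `struct dest`, `sh_assign` and `sh_count` are hypothetical, and the table size is deliberately tiny.

```c
#include <stddef.h>

#define SH_TAB_SIZE 8   /* hypothetical size; the patch makes the real one a Kconfig option */

struct dest {
    const char *name;
    int weight;
};

/*
 * Fill the table round-robin, giving each destination `weight` consecutive
 * slots per round so that heavier destinations own more of the table.
 * In this sketch a weight of 0 or 1 still yields one slot per round.
 */
static void sh_assign(const struct dest *tab[SH_TAB_SIZE],
                      const struct dest *dests, size_t n)
{
    size_t d = 0;       /* index of the destination being placed */
    int filled = 0;     /* slots it has received in the current round */

    for (size_t i = 0; i < SH_TAB_SIZE; i++) {
        if (n == 0) {
            tab[i] = NULL;
            continue;
        }
        tab[i] = &dests[d];
        if (++filled >= dests[d].weight) {
            d = (d + 1) % n;
            filled = 0;
        }
    }
}

/* Count how many slots point at a given destination. */
static size_t sh_count(const struct dest *tab[SH_TAB_SIZE],
                       const struct dest *who)
{
    size_t c = 0;

    for (size_t i = 0; i < SH_TAB_SIZE; i++)
        if (tab[i] == who)
            c++;
    return c;
}
```

With two destinations of weight 1 and 3, the heavier one ends up owning three quarters of the slots, which is why a larger table gives finer-grained weighting.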
     
  • This is not merged with the ipv4 match into xt_rpfilter.c
    to avoid ipv6 module dependency issues.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

05 Dec, 2011

5 commits

  • Like rt6_lookup, but allows the caller to pass in a flowi6 structure.
    Will be used by the upcoming ipv6 netfilter reverse path filter
    match.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • This tries to do the same thing as fib_validate_source(), but differs
    in several aspects.

    The most important difference is that the reverse path filter built into
    fib_validate_source uses the oif as iif when performing the reverse
    lookup. We do not do this, as the oif is not yet known by the time the
    PREROUTING hook is invoked.

    We can't wait until the FORWARD chain because by the time FORWARD is
    invoked the ipv4 forward path may have already sent icmp messages in
    response to to-be-discarded-via-rpfilter packets.

    To avoid such an additional lookup in PREROUTING, Patrick McHardy
    suggested attaching the path information directly in the match
    (i.e., just do what the standard ipv4 path does a bit earlier in PREROUTING).

    This works, but it also has a few caveats. Most importantly, when using
    marks in PREROUTING to re-route traffic based on the nfmark, -m rpfilter
    would have to be used after the nfmark has been set; otherwise the nfmark
    would have no effect (because the route is already attached).

    Another problem would be interaction with -j TPROXY, as this target sets an
    nfmark and uses ACCEPT instead of continue, i.e. such a version of
    -m rpfilter cannot be used for the initial to-be-intercepted packets.

    In case it turns out that the oif is required, we can add Patrick's
    suggestion with a new match option (e.g. --rpf-use-oif) to keep ruleset
    compatibility.

    Another difference from the current builtin ipv4 rpfilter is that packets
    subject to ipsec transformation are not automatically excluded. If you
    want this, simply combine -m rpfilter with the policy match.

    Packets arriving on loopback interfaces always match.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
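
The core idea of a reverse path check can be sketched in userspace as follows: look up the packet's source address as if it were a destination, and accept the packet only if the resulting route points back out of the interface it arrived on. This is a toy model, not the match's implementation; `struct route`, `reverse_lookup`, `rpfilter_match` and `LOOPBACK_IFINDEX` are hypothetical names, and the routing table is a flat array with longest-prefix matching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOOPBACK_IFINDEX 1   /* hypothetical interface index for loopback */

struct route {
    uint32_t prefix;   /* network prefix, host byte order */
    int plen;          /* prefix length, 0..32 */
    int ifindex;       /* interface this route sends packets out of */
};

/* Longest-prefix match of `saddr` against a flat table. */
static const struct route *reverse_lookup(const struct route *tbl, size_t n,
                                          uint32_t saddr)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = tbl[i].plen
            ? (uint32_t)(UINT32_MAX << (32 - tbl[i].plen))
            : 0;

        if ((saddr & mask) == (tbl[i].prefix & mask) &&
            (!best || tbl[i].plen > best->plen))
            best = &tbl[i];
    }
    return best;
}

/* True if a packet from `saddr` arriving on `in_ifindex` passes the check. */
static bool rpfilter_match(const struct route *tbl, size_t n,
                           uint32_t saddr, int in_ifindex)
{
    const struct route *rt;

    if (in_ifindex == LOOPBACK_IFINDEX)
        return true;            /* loopback traffic always matches */

    rt = reverse_lookup(tbl, n, saddr);
    return rt && rt->ifindex == in_ifindex;
}
```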
     
  • The reverse path filter module will use fib_lookup.

    If CONFIG_IP_MULTIPLE_TABLES is not set, fib_lookup is
    only a static inline helper that calls fib_table_lookup,
    so export that too.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
    offset to 0.

    This permits better filling of the pages on small to medium tcp writes.

    "tbench 16" results on my dev server (2x4x2 machine) :

    Before : 3072 MB/s
    After : 3146 MB/s (2.4 % gain)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We discovered that the TCP stack could retransmit misaligned skbs if a
    malicious peer acknowledged a sub-MSS frame. This can currently happen
    only if the output interface is not SG enabled: if SG is enabled, tcp
    builds headless skbs (all payload is included in fragments), so the tcp
    trimming process only removes parts of skb fragments and headers stay
    aligned.

    Some arches can't handle misalignments, so force a head reallocation and
    shrink headroom to MAX_TCP_HEADER.

    Don't care about misalignments on x86 and PPC (or other arches setting
    NET_IP_ALIGN to 0).

    This patch introduces __pskb_copy(), which can specify the headroom of
    the new head, and pskb_copy() becomes a wrapper on top of __pskb_copy().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Dec, 2011

15 commits

  • The advantage of kcalloc is that it prevents integer overflows which could
    result from the multiplication of the number of elements and the size, and
    it is also a bit nicer to read.

    The semantic patch that makes this change is available
    in https://lkml.org/lkml/2011/11/25/107

    Signed-off-by: Thomas Meyer
    Signed-off-by: David S. Miller

    Thomas Meyer
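
To make the overflow argument concrete, here is a small userspace stand-in for a kcalloc-style allocator. It is only a sketch of the idea (check the multiplication before allocating), not the kernel's implementation; `checked_calloc` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a zeroed array of n elements of `size` bytes each.
 * Returns NULL instead of allocating a too-small buffer when
 * n * size would wrap around SIZE_MAX.
 */
static void *checked_calloc(size_t n, size_t size)
{
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would overflow */

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);
    return p;
}
```

An open-coded `malloc(n * size)` with the same arguments would silently wrap and return a buffer far smaller than the caller expects, which is the bug class these patches rule out.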
     
  • The advantage of kcalloc is that it prevents integer overflows which could
    result from the multiplication of the number of elements and the size, and
    it is also a bit nicer to read.

    The semantic patch that makes this change is available
    in https://lkml.org/lkml/2011/11/25/107

    Signed-off-by: Thomas Meyer
    Signed-off-by: David S. Miller

    Thomas Meyer
     
  • The advantage of kcalloc is that it prevents integer overflows which could
    result from the multiplication of the number of elements and the size, and
    it is also a bit nicer to read.

    The semantic patch that makes this change is available
    in https://lkml.org/lkml/2011/11/25/107

    Signed-off-by: Thomas Meyer
    Signed-off-by: David S. Miller

    Thomas Meyer
     
  • The advantage of kcalloc is that it prevents integer overflows which could
    result from the multiplication of the number of elements and the size, and
    it is also a bit nicer to read.

    The semantic patch that makes this change is available
    in https://lkml.org/lkml/2011/11/25/107

    Signed-off-by: Thomas Meyer
    Signed-off-by: David S. Miller

    Thomas Meyer
     
  • Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
    linux machines to their limits.

    Don't call conn_request() if the TCP flags include FIN in addition to SYN.

    Reported-by: Denys Fedoryshchenko
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
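
The check amounts to a simple flag test on segments that reach a listening socket. The sketch below is a userspace illustration under that reading; `FLAG_FIN`, `FLAG_SYN` and `listen_should_process_syn` are hypothetical names (the bit values are the standard TCP header flag bits).

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_FIN 0x01   /* FIN bit in the TCP flags byte */
#define FLAG_SYN 0x02   /* SYN bit in the TCP flags byte */

/*
 * For a socket in LISTEN state: true if the segment should be handed
 * to connection-request processing, false if it should be dropped.
 */
static bool listen_should_process_syn(uint8_t flags)
{
    if (!(flags & FLAG_SYN))
        return false;           /* not a connection request at all */
    if (flags & FLAG_FIN)
        return false;           /* SYN+FIN: drop before doing any work */
    return true;
}
```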
     
  • David S. Miller
     
  • It's only used in net/ipv6/route.c and the NULL device check is
    superfluous for all of the existing call sites.

    Just expand the __ndisc_lookup_errno() call at each location.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • 1) x == NULL --> !x
    2) x != NULL --> x
    3) (x&BIT) --> (x & BIT)
    4) (BIT1|BIT2) --> (BIT1 | BIT2)
    5) proper argument and struct member alignment

    Signed-off-by: David S. Miller

    David S. Miller
     
  • 1) x == NULL --> !x
    2) x != NULL --> x
    3) if() --> if ()
    4) while() --> while ()
    5) (x & BIT) == 0 --> !(x & BIT)
    6) (x&BIT) --> (x & BIT)
    7) x=y --> x = y
    8) (BIT1|BIT2) --> (BIT1 | BIT2)
    9) if ((x & BIT)) --> if (x & BIT)
    10) proper argument and struct member alignment

    Signed-off-by: David S. Miller

    David S. Miller
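
For readers unfamiliar with these cleanup rules, the following made-up function is written in the resulting style, with the old spelling of each line shown in a comment. The struct, flag names and function are hypothetical and exist only to show the before and after forms.

```c
#include <stddef.h>

#define FLAG_A 0x1
#define FLAG_B 0x2

struct item {
    unsigned int flags;
    struct item *next;
};

/* Count list entries that have neither FLAG_A nor FLAG_B set. */
static int count_unflagged(const struct item *x)
{
    int n = 0;                                  /* was: int n=0; */

    while (x) {                                 /* was: while(x != NULL) */
        if (!(x->flags & (FLAG_A | FLAG_B)))    /* was: if((x->flags&(FLAG_A|FLAG_B)) == 0) */
            n++;
        x = x->next;                            /* was: x=x->next; */
    }
    return n;
}
```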
     
  • Open vSwitch is a multilayer Ethernet switch targeted at virtualized
    environments. In addition to supporting a variety of features
    expected in a traditional hardware switch, it enables fine-grained
    programmatic extension and flow-based control of the network.
    This control is useful in a wide variety of applications but is
    particularly important in multi-server virtualization deployments,
    which are often characterized by highly dynamic endpoints and the need
    to maintain logical abstractions for multiple tenants.

    The Open vSwitch datapath provides an in-kernel fast path for packet
    forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
    which is able to accept configuration from a variety of sources and
    translate it into packet processing rules.

    See http://openvswitch.org for more information and userspace
    utilities.

    Signed-off-by: Jesse Gross

    Jesse Gross
     
  • While parsing through IPv6 extension headers, fragment headers are
    skipped, making them invisible to the caller. This reports the
    fragment offset of the last header in order to make it possible to
    determine whether the packet is fragmented and, if so, whether it is
    a first or last fragment.

    Signed-off-by: Jesse Gross

    Jesse Gross
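
A simplified userspace walk over an extension header chain shows how reporting the fragment field lets the caller classify the packet. This is a sketch, not the kernel function: `skip_exthdrs`, `is_first_fragment` and `is_last_fragment` are hypothetical names, only hop-by-hop, routing, destination-options and fragment headers are handled, and the walk is assumed to stop at the fragment header of a non-first fragment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NH_HOPOPTS = 0, NH_ROUTING = 43, NH_FRAGMENT = 44, NH_DSTOPTS = 60 };

/*
 * Walk extension headers starting at buf[0], whose type is *nexthdr.
 * Returns the offset where the walk stopped, or -1 if the buffer is
 * truncated. *frag_off receives the 16-bit fragment field in host
 * order (13-bit offset, 2 reserved bits, M flag), or 0 if none was seen.
 */
static int skip_exthdrs(const uint8_t *buf, size_t len,
                        uint8_t *nexthdr, uint16_t *frag_off)
{
    size_t off = 0;

    *frag_off = 0;
    for (;;) {
        size_t hdrlen;

        switch (*nexthdr) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_FRAGMENT:
        case NH_DSTOPTS:
            break;
        default:
            return (int)off;    /* reached a non-extension header */
        }

        if (len - off < 8)
            return -1;

        if (*nexthdr == NH_FRAGMENT) {
            *frag_off = (uint16_t)(buf[off + 2] << 8 | buf[off + 3]);
            if (*frag_off & 0xFFF8)
                return (int)off;    /* non-first fragment: stop here */
            hdrlen = 8;
        } else {
            hdrlen = ((size_t)buf[off + 1] + 1) * 8;
        }

        if (hdrlen > len - off)
            return -1;

        *nexthdr = buf[off];
        off += hdrlen;
    }
}

/* Offset zero with "more fragments" set. */
static bool is_first_fragment(uint16_t frag_off)
{
    return !(frag_off & 0xFFF8) && (frag_off & 0x0001);
}

/* Non-zero offset with "more fragments" clear. */
static bool is_last_fragment(uint16_t frag_off)
{
    return (frag_off & 0xFFF8) && !(frag_off & 0x0001);
}
```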
     
  • Open vSwitch needs this function for vlan handling.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • This adds rcu_dereference_genl and genl_dereference, which are genl
    variants of the RTNL functions to enforce proper locking with lockdep
    and sparse.

    Signed-off-by: Jesse Gross

    Jesse Gross
     
  • Open vSwitch uses genl_mutex locking to protect datapath data
    structures such as the flow table and flow actions. The following
    patch adds lockdep_genl_is_held(), which is used for rcu annotation
    to prove locking.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • Open vSwitch uses the Generic Netlink interface for communication
    between userspace and the kernel module. genl_notify() is used
    for sending notifications back to userspace.

    genl_notify() is analogous to rtnl_notify() but uses genl_sock
    instead of rtnl.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     

03 Dec, 2011

4 commits


02 Dec, 2011

11 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
    netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS
    ipv4: flush route cache after change accept_local
    sch_red: fix red_change
    Revert "udp: remove redundant variable"
    bridge: master device stuck in no-carrier state forever when in user-stp mode
    ipv4: Perform peer validation on cached route lookup.
    net/core: fix rollback handler in register_netdevice_notifier
    sch_red: fix red_calc_qavg_from_idle_time
    bonding: only use primary address for ARP
    ipv4: fix lockdep splat in rt_cache_seq_show
    sch_teql: fix lockdep splat
    net: fec: Select the FEC driver by default for i.MX SoCs
    isdn: avoid copying too long drvid
    isdn: make sure strings are null terminated
    netlabel: Fix build problems when IPv6 is not enabled
    sctp: better integer overflow check in sctp_auth_create_key()
    sctp: integer overflow in sctp_auth_create_key()
    ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.
    net: Fix corruption in /proc/*/net/dev_mcast
    mac80211: fix race between the AGG SM and the Tx data path
    ...

    Linus Torvalds
     
  • firewalld in Fedora 16 needs this.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Reported-by: Thomas Jarosch
    Signed-off-by: David S. Miller

    David S. Miller
     
  • After resetting ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
    we should flush the route cache; otherwise the host will continue to
    receive packets with a local source address, which should be dropped.

    Signed-off-by: Weiping Pan
    Signed-off-by: David S. Miller

    Peter Pan(潘卫平)
     
  • On Wednesday, 30 November 2011 at 14:36 -0800, Stephen Hemminger wrote:

    > (Almost) nobody uses RED because they can't figure it out.
    > According to Wikipedia, VJ says that:
    > "there are not one, but two bugs in classic RED."

    RED is useful for high throughput routers; I doubt many linux machines
    act as such devices.

    I was considering adding Adaptive RED (Sally Floyd, Ramakrishna
    Gummadi, Scott Shenker), August 2001.

    In this version, maxp is dynamic (from 1% to 50%), and the user only has
    to set min_th (target average queue size)
    (max_th and wq (burst in linux RED) are automatically set up).

    By the way, it seems we have a small bug in red_change():

        if (skb_queue_empty(&sch->q))
                red_end_of_idle_period(&q->parms);

    First, if the queue is empty, we should call
    red_start_of_idle_period(&q->parms);

    Second, since we no longer use sch->q, but q->qdisc, the test is
    meaningless.

    Oh well...

    [PATCH] sch_red: fix red_change()

    Now that RED is classful, we must check q->qdisc->q.qlen, and if the
    queue is empty, we start an idle period, not end it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
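
The corrected logic can be modelled in userspace as below. The structures are heavily reduced stand-ins for the kernel ones (a boolean replaces the idle-start timestamp), and `red_change_tail` is a hypothetical name for the last step of the reconfiguration path.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel structures. */
struct red_parms {
    bool idle;                  /* models "an idle period is running" */
};

struct qdisc_queue {
    unsigned int qlen;
};

struct child_qdisc {
    struct qdisc_queue q;
};

struct red_sched_data {
    struct red_parms parms;
    struct child_qdisc *qdisc;  /* RED is classful: packets live in the child */
};

static void red_start_of_idle_period(struct red_parms *p)
{
    p->idle = true;
}

/* After new parameters are applied: an empty child queue starts an idle period. */
static void red_change_tail(struct red_sched_data *q)
{
    if (!q->qdisc->q.qlen)
        red_start_of_idle_period(&q->parms);
}
```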
     
  • Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
    ocfs2: avoid unaligned access to dqc_bitmap
    ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
    ocfs2: honor O_(D)SYNC flag in fallocate
    ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
    ocfs2: send correct UUID to cleancache initialization
    ocfs2: Commit transactions in error cases -v2
    ocfs2: make direntry invalid when deleting it
    fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
    ocfs2: Avoid livelock in ocfs2_readpage()
    ocfs2: serialize unaligned aio
    ocfs2: Implement llseek()
    ocfs2: Fix ocfs2_page_mkwrite()
    ocfs2: Add comment about orphan scanning
    ocfs2: Clean up messages in the fs
    ocfs2/cluster: Cluster up now includes network connections too
    ocfs2/cluster: Add new function o2net_fill_node_map()
    ocfs2/cluster: Fix output in file elapsed_time_in_ms
    ocfs2/dlm: dlmlock_remote() needs to account for remastery
    ocfs2/dlm: Take inflight reference count for remotely mastered resources too
    ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
    ...

    Linus Torvalds
     
  • The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
    but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(),
    ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These
    are wrapper macros for ext2_*_bit(), which need to take an unsigned long
    aligned address (though some architectures are able to handle unaligned
    addresses correctly).

    So some 64-bit architectures may not be able to access the dqc_bitmap
    correctly.

    This avoids such unaligned access by using another set of wrapper
    functions for ext2_*_bit(). The code is taken from fs/ext4/mballoc.c,
    which also needs to handle unaligned bitmap access.

    Signed-off-by: Akinobu Mita
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Joel Becker

    Akinobu Mita
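
The usual trick for this kind of wrapper is to round the address down to an `unsigned long` boundary and add the skipped bytes to the bit number. The userspace sketch below shows that adjustment; `correct_addr_and_bit` and `set_bit_le_unaligned` are hypothetical names, and a byte-wise store stands in for the word-wide little-endian bit operation.

```c
#include <stdint.h>

/*
 * Round `addr` down to an unsigned long boundary and advance *bit by
 * the number of bits skipped, so (aligned address, *bit) names the
 * same bit as (addr, original bit) in little-endian bit numbering.
 */
static void *correct_addr_and_bit(void *addr, unsigned int *bit)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t misalign = a & (sizeof(unsigned long) - 1);

    *bit += (unsigned int)(misalign * 8);
    return (void *)(a - misalign);
}

/* Set a bit in a bitmap that may start at a non-long-aligned address. */
static void set_bit_le_unaligned(unsigned int bit, void *addr)
{
    unsigned char *base = correct_addr_and_bit(addr, &bit);

    base[bit / 8] |= (unsigned char)(1u << (bit % 8));
}
```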
     
  • * 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
    ARM: 7182/1: ARM cpu topology: fix warning
    ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
    ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
    ARM: 7177/1: GIC: avoid skipping non-existent PPIs in irq_start calculation
    ARM: 7176/1: cpu_pm: register GIC PM notifier only once
    ARM: 7175/1: add subname parameter to mfp_set_groupg callers
    ARM: 7174/1: Fix build error in kprobes test code on Thumb2 kernels
    ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
    ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
    ARM: 7170/2: fix compilation breakage in entry-armv.S
    ARM: 7168/1: use cache type functions for arch_get_unmapped_area
    ARM: perf: check that we have a platform device when reserving PMU
    ARM: 7166/1: Use PMD_SHIFT instead of PGDIR_SHIFT in dma-consistent.c
    ARM: 7165/2: PL330: Fix typo in _prepare_ccr()
    ARM: 7163/2: PL330: Only register usable channels
    ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds
    ARM: 7161/1: errata: no automatic store buffer drain
    ARM: perf: initialise used_mask for fake PMU during validation
    ARM: PMU: remove pmu_init declaration
    ARM: PMU: re-export release_pmu symbol to modules

    Linus Torvalds
     
  • Commit 1386be55e32a3c5d8ef4a2b243c530a7b664c02c ("dccp: fix
    auto-loading of dccp(_probe)") fixed a bug but created a new
    compiler warning:

    net/dccp/probe.c: In function ‘dccpprobe_init’:
    net/dccp/probe.c:166:2: warning: the omitted middle operand in ?: will always be ‘true’, suggest explicit middle operand [-Wparentheses]

    try_then_request_module() is built for situations where the
    "existence" test is some lookup function that returns a non-NULL
    object on success, and with a reference count of some kind held.

    Here we're looking for a success return of zero from the jprobe
    registry.

    Instead of fighting the way try_then_request_module() works, simply
    open code what we want to happen in a local helper function.

    Signed-off-by: David S. Miller

    David S. Miller
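
The open-coded pattern is: try the registration, and only if it fails, request the module and try once more. The userspace sketch below models that flow with stubs; `register_probe`, `request_module_stub`, `setup_probe` and the two counters are hypothetical stand-ins, not kernel APIs.

```c
/* Stand-ins for kernel state: whether the needed module is loaded. */
static int module_loaded;
static int load_requests;

/* Registration succeeds (returns 0) only once the module is present. */
static int register_probe(void)
{
    return module_loaded ? 0 : -1;
}

/* Stub for a module load request; here it always succeeds. */
static void request_module_stub(const char *name)
{
    (void)name;
    load_requests++;
    module_loaded = 1;
}

/* Try to register; on failure load the module and retry once. */
static int setup_probe(void)
{
    int ret = register_probe();

    if (ret) {
        request_module_stub("dccp");
        ret = register_probe();
    }
    return ret;
}
```

Because success here is a zero return rather than a non-NULL object, the explicit `if (ret)` reads more naturally than forcing the test through a `?:`-based macro.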