20 Jul, 2008

1 commit


17 Jul, 2008

1 commit

  • tcp_enter_memory_pressure calls NET_INC_STATS, but has no way to
    obtain the net.

    I decided to add an sk argument rather than the net itself, so that
    all the required sock_net(sk) calls are factored into the
    enter_memory_pressure callback itself.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
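A user-space sketch of the pattern, modeled loosely on tcp_enter_memory_pressure(): the callback receives the socket and derives the namespace through a sock_net()-style accessor. All `mock_` types, fields and functions are hypothetical stand-ins, not kernel definitions.

```c
/* Hypothetical stand-ins for struct net and struct sock. */
struct mock_net {
    unsigned long memory_pressure_events; /* plays the role of a MIB counter */
};

struct mock_sock {
    struct mock_net *net; /* namespace this socket belongs to */
};

/* Accessor in the spirit of sock_net(sk): the one place that finds the namespace. */
static struct mock_net *mock_sock_net(const struct mock_sock *sk)
{
    return sk->net;
}

static int memory_pressure; /* global flag, like tcp_memory_pressure */

/* Taking sk lets the callback do the namespace lookup itself,
 * so callers do not each have to compute the net. */
static void mock_enter_memory_pressure(struct mock_sock *sk)
{
    if (!memory_pressure) {
        /* counterpart of NET_INC_STATS(sock_net(sk), ...) */
        mock_sock_net(sk)->memory_pressure_events++;
        memory_pressure = 1;
    }
}
```

Calling it twice bumps the per-namespace counter only once, because the global flag is already set after the first call.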
     

06 Jul, 2008

1 commit


17 Jun, 2008

1 commit


05 May, 2008

1 commit


26 Mar, 2008

2 commits


01 Mar, 2008

1 commit


08 Feb, 2008

1 commit

  • The same alignment requirement was removed from the IP route cache in the past.

    This alignment actually has a bad effect on 32-bit uniprocessor arches,
    since sizeof(dn_rt_hash_bucket) is forced to 8 bytes instead of 4.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
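A sketch of why forcing alignment doubles the bucket size, assuming a 32-bit uniprocessor build where the chain pointer is 4 bytes and the spinlock compiles away to nothing. A `uint32_t` stands in for the pointer; the struct names are hypothetical and `__aligned__` is the GCC/Clang attribute.

```c
#include <stdint.h>

/* Bucket as laid out without an alignment request: just the 4-byte chain. */
struct bucket_plain {
    uint32_t chain; /* stand-in for a 32-bit struct dn_route pointer */
};

/* Same bucket with an explicit 8-byte alignment requirement. */
struct bucket_aligned8 {
    uint32_t chain;
} __attribute__((__aligned__(8)));
```

Because a struct's size is rounded up to a multiple of its alignment, every bucket, and therefore the whole hash table, doubles in size.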
     

29 Jan, 2008

18 commits

  • Remove struct net from the fib_rules_register(unregister)/notify_change
    paths and reduce code size a bit.

    add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65)
    function                    old     new   delta
    notify_rule_change          273     280      +7
    trie_show_stats             471     475      +4
    fn_trie_delete              473     477      +4
    fib_rules_unregister        144     148      +4
    fib4_rule_compare           119     123      +4
    resize                     2842    2845      +3
    fn_trie_select_default      515     518      +3
    inet_sk_rebuild_header      836     838      +2
    fib_trie_seq_show           764     766      +2
    __devinet_sysctl_register   276     278      +2
    fn_trie_lookup             1124    1123      -1
    ip_fib_check_default        133     131      -2
    devinet_conf_sysctl         223     221      -2
    snmp_fold_field             126     123      -3
    fn_trie_insert             2091    2086      -5
    inet_create                 876     870      -6
    fib4_rules_init             197     191      -6
    fib_sync_down               452     444      -8
    inet_gso_send_check         334     325      -9
    fib_create_info            3003    2991     -12
    fib_nl_delrule              568     553     -15
    fib_nl_newrule              883     852     -31

    Signed-off-by: Denis V. Lunev
    Acked-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Denis V. Lunev
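The header line of the bloat-o-meter output can be cross-checked against the per-function deltas. This small program (not part of the patch) recomputes it, classifying each function as grown or shrunk by the sign of its delta.

```c
#include <stddef.h>

/* Per-function deltas (new - old, in bytes) copied from the table above. */
static const int deltas[] = {
    7, 4, 4, 4, 4, 3, 3, 2, 2, 2,                      /* grew */
    -1, -2, -2, -3, -5, -6, -6, -8, -9, -12, -15, -31  /* shrank */
};

struct summary {
    int grow, shrink; /* number of functions that grew / shrank */
    int up, down;     /* total bytes added / removed */
};

/* Recomputes "grow/shrink: G/S up/down: U/D" from the individual deltas. */
static struct summary summarize(const int *d, size_t n)
{
    struct summary s = { 0, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (d[i] > 0) {
            s.grow++;
            s.up += d[i];
        } else if (d[i] < 0) {
            s.shrink++;
            s.down += d[i];
        }
    }
    return s;
}
```

The net change reported in parentheses is simply `up + down`.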
     
  • The backward link from FIB rules operations to the network namespace
    will allow the API to be simplified a bit.

    Signed-off-by: Denis V. Lunev
    Acked-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Create a specific helper for netlink kernel socket disposal. This just
    makes the code look better and provides groundwork for proper disposal
    inside a namespace.

    Signed-off-by: Denis V. Lunev
    Tested-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • The garbage collection function receives the dst_ops structure as a
    parameter. This is useful for the next incoming patchset because it
    will need the dst_ops (there will be several instances) and the
    network namespace pointer (contained in the dst_ops).

    The protocols which do not take care of namespaces will not be
    impacted by this change (except for the function signature); they
    just ignore the parameter.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Daniel Lezcano
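A sketch of the new callback shape, using hypothetical `mock_` types modeled on struct dst_ops: the gc hook receives the ops instance, which in turn carries the namespace, while a protocol that does not care about namespaces keeps working by ignoring the argument.

```c
#include <stddef.h>

struct mock_net { int id; };

/* Hypothetical ops table: there may be several instances, each tied to a namespace. */
struct mock_dst_ops {
    int (*gc)(struct mock_dst_ops *ops);
    struct mock_net *net;
    int entries; /* cached routes owned by this instance */
};

/* Namespace-aware gc: reaches its namespace through the ops it is handed. */
static int ns_aware_gc(struct mock_dst_ops *ops)
{
    ops->entries = 0; /* pretend to flush this instance's cache */
    return ops->net->id;
}

/* Namespace-unaware gc: only the signature changed; the argument is ignored. */
static int legacy_gc(struct mock_dst_ops *ops)
{
    (void)ops;
    return 0;
}

/* Caller side: the ops pointer is simply passed through to the hook. */
static int run_gc(struct mock_dst_ops *ops)
{
    return ops->gc(ops);
}
```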
     
  • fib_rules_ops contains operations and the list of configured rules. ops will
    become per-namespace soon, so we need them to be known in the default_pref
    callback.

    Acked-by: Benjamin Thery
    Acked-by: Daniel Lezcano
    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • The patch extends the fib rules API in order to pass the
    network namespace pointer. That will allow the different tables to be
    accessed from a namespace-relative object. As usual, a pointer to the
    init_net variable is passed as the parameter so we don't break
    networking.

    Acked-by: Benjamin Thery
    Acked-by: Daniel Lezcano
    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • DECnet has two places to patch. The first one is
    the net/decnet table itself, and it is patched just like
    other subsystems in the first patch in this series.

    The second place is a bit more complex: it is the
    net/decnet/conf/xxx entries, similar to those in
    ipv4/devinet.c and ipv6/addrconf.c. This code is made similar
    to the ipv[46] code.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • I'm actually surprised at how much was involved. At first glance it
    appears that the neighbour table data structures are already split by
    network device, so all that should be needed is to modify the user
    interface commands to filter the set of neighbours by the network
    namespace of their devices.

    However, a couple of things turned up while I was reading through the
    code. The proxy neighbour table allows entries with no network
    device, and the neighbour parms are per network device (except for the
    defaults), so they now need a per-network-namespace default.

    So I gave the two structures (which surprised me) their very own
    network namespace parameter, updated the relevant lookup and destroy
    routines with a network namespace parameter, and modified the code
    that interacts with users to filter out neighbour table entries for
    devices of other namespaces.

    I'm a little concerned that we can modify and display the global table
    configuration from all network namespaces. But this appears good
    enough for now.

    I keep thinking that modifying the neighbour table to have per network
    namespace instances of each table type would be cleaner. The hash
    table is already dynamically sized, so it is not a limiter. The
    default parameter would be straightforward to take care of. However,
    when I look at how the network table is built and used, I still find
    some assumptions that there is only a single neighbour table for each
    type of table in the kernel: the netlink operations, neigh_seq_start,
    and the non-core network users that call neigh_lookup. So while it
    might be doable, it would require more refactoring than my current
    approach of just doing a little extra filtering in the code.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Eric W. Biederman
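A sketch of the filtering approach under assumed, hypothetical types: entries stay in one global table, an entry's namespace comes from its device, and a device-less proxy entry carries its own namespace pointer. User-facing walks skip entries that belong to other namespaces.

```c
#include <stddef.h>

struct mock_net { int id; };
struct mock_dev { struct mock_net *net; };

/* Hypothetical neighbour entry: normal entries have a device; proxy
 * entries may have none, so they carry their own namespace pointer. */
struct mock_neigh {
    struct mock_dev *dev; /* NULL for a device-less proxy entry */
    struct mock_net *net; /* consulted only when dev == NULL */
};

/* Namespace of an entry: its device's namespace, else its own. */
static struct mock_net *neigh_net(const struct mock_neigh *n)
{
    return n->dev ? n->dev->net : n->net;
}

/* Dump-style walk over the single global table that filters out entries
 * of other namespaces; returns how many entries `net` would see. */
static size_t count_visible(const struct mock_neigh *tbl, size_t n,
                            const struct mock_net *net)
{
    size_t visible = 0;

    for (size_t i = 0; i < n; i++)
        if (neigh_net(&tbl[i]) == net)
            visible++;
    return visible;
}
```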
     
  • The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter
    options when disabled and provides defaults (M) that should allow
    running a distribution firewall without further thought.

    Defaults to 'y' to avoid breaking current configurations.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • My previous patch made the wait flag take the opposite value to what
    it should be. This patch fixes that.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch converts all callers of xfrm_lookup that used an
    explicit value of 1 to indicate blocking to use the new flag
    XFRM_LOOKUP_WAIT.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
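A sketch of replacing the literal 1 with a named flag bit, together with the kind of inverted test that the earlier "opposite value" fix refers to. `LOOKUP_WAIT` and both functions are hypothetical; the value `1 << 0` is chosen in this sketch so that old callers passing 1 keep their meaning.

```c
/* Named flag bit standing in for XFRM_LOOKUP_WAIT (value assumed here). */
enum {
    LOOKUP_WAIT = 1 << 0,
};

/* Hypothetical lookup: returns 1 if it would block waiting for a state. */
static int lookup_would_wait(int flags)
{
    /* Correct test: wait only when the caller asked for it. */
    return (flags & LOOKUP_WAIT) != 0;
}

/* Inverted variant: blocking callers would not wait and
 * non-blocking callers would. */
static int lookup_would_wait_inverted(int flags)
{
    return !(flags & LOOKUP_WAIT);
}
```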
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • After this patch, none of the netlink callbacks support anything
    except the initial network namespace, but the rtnetlink infrastructure
    now handles multiple network namespaces.

    Changes from v2:
    - IPv6 addrlabel processing

    Changes from v1:
    - no need for special rtnl_unlock handling
    - fixed IPv6 ndisc

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Before I can enable rtnetlink to work in all network namespaces, I need
    to be certain that something won't break. So this patch deliberately
    disables all of the rtnetlink methods in everything except the
    initial network namespace. After the methods have been audited, this
    extra check can be disabled.

    Changes from v1:
    - added IPv6 addrlabel protection

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller
    Signed-off-by: Herbert Xu

    Denis V. Lunev
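The temporary guard described above can be sketched as a check at the top of each handler. `init_net_mock` stands in for init_net, the handler is hypothetical, and the choice of -EINVAL as the error code is an assumption of this sketch.

```c
#include <errno.h>

struct mock_net { int id; };

static struct mock_net init_net_mock = { 0 }; /* stand-in for init_net */

/* Hypothetical rtnetlink handler: refuse requests from any namespace
 * other than the initial one until the method has been audited. */
static int mock_rtnl_doit(struct mock_net *net)
{
    if (net != &init_net_mock)
        return -EINVAL;
    return 0; /* normal request processing would happen here */
}
```

Once a method has been audited, removing the pointer comparison is all it takes to open it up to other namespaces.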
     
  • We have a number of copies of dst_discard scattered around the place
    which all do the same thing, namely free a packet on the input or
    output paths.

    This patch deletes all of them except dst_discard and points all the
    users to it.

    The only non-trivial bit is decnet, where it returns an error.
    However, conceptually this is identical to the blackhole functions
    used in IPv4 and IPv6 which do not return errors. So they should
    either all return errors or all return zero. For now I've stuck with
    the majority and picked zero as the return value.

    It doesn't really matter in practice, since few, if any, drivers would
    react differently to a zero return value versus NET_RX_DROP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
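A user-space sketch of the consolidation: one discard routine frees the packet and returns 0, and both the input and output hooks of a blackhole route point at it. The `mock_` types and the `blackhole` object are hypothetical; `free()` stands in for kfree_skb().

```c
#include <stdlib.h>

/* Hypothetical packet and route-like object with input/output hooks. */
struct mock_skb { int len; };

struct mock_dst {
    int (*input)(struct mock_skb *skb);
    int (*output)(struct mock_skb *skb);
};

static int dropped; /* counts discarded packets for the example */

/* The single shared discard routine: free the packet, report 0. */
static int mock_dst_discard(struct mock_skb *skb)
{
    free(skb);
    dropped++;
    return 0;
}

/* Both directions share one function instead of per-protocol copies. */
static const struct mock_dst blackhole = {
    .input  = mock_dst_discard,
    .output = mock_dst_discard,
};
```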
     
  • A lot of code in the kernel initializes timer->function
    and timer->data together with calling init_timer(timer). There
    is already a helper for this. Use it in the networking code.

    The patch is HUGE, but makes the code 130 lines shorter
    (98 insertions(+), 228 deletions(-)).

    Signed-off-by: Pavel Emelyanov
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Pavel Emelyanov
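The kernel helper of this era is setup_timer(timer, function, data). The mock below mirrors the 2.6-era timer fields (a callback taking an unsigned long cookie) in user space, to show that the open-coded three-step pattern and the one-line helper leave the timer in the same state. The `mock_` names are hypothetical.

```c
/* User-space mock of the timer fields relevant here. */
struct mock_timer {
    void (*function)(unsigned long);
    unsigned long data;
    int initialized;
};

static void mock_init_timer(struct mock_timer *t)
{
    t->initialized = 1;
}

/* Helper in the style of setup_timer(): init plus function/data in one call. */
static void mock_setup_timer(struct mock_timer *t,
                             void (*fn)(unsigned long), unsigned long data)
{
    mock_init_timer(t);
    t->function = fn;
    t->data = data;
}

static unsigned long last_data;

/* Example expiry handler: records the cookie it was called with. */
static void on_expire(unsigned long data)
{
    last_data = data;
}
```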
     

11 Jan, 2008

1 commit

  • In dn_rt_cache_get_next(), there is no need to guard seq->private with
    rcu_dereference(), since seq is private to the thread running this
    function. Reading seq->private once (as guaranteed by rcu_dereference())
    or several times, if the compiler really is dumb enough, won't change
    the result.

    But we are missing the real spots where rcu_dereference() is needed,
    both in dn_rt_cache_get_first() and dn_rt_cache_get_next().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
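A user-space analogue using C11 atomics, assuming a single writer: every load of a shared chain link (the head and each next pointer) is an ordered load, while a thread-private cursor such as seq->private needs none. `memory_order_acquire` is used as a portable, somewhat stronger stand-in for rcu_dereference(); the names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int val;
    _Atomic(struct node *) next; /* chain link read by lockless readers */
};

static _Atomic(struct node *) bucket; /* shared chain head, initially NULL */

/* Single writer: fill in the node first, then publish it with release
 * ordering (the role rcu_assign_pointer() plays in the kernel). */
static void push(struct node *n, int val)
{
    n->val = val;
    atomic_init(&n->next, atomic_load_explicit(&bucket, memory_order_relaxed));
    atomic_store_explicit(&bucket, n, memory_order_release);
}

/* Reader: the head and every ->next are shared, so each gets an ordered
 * load; these are the spots where the kernel code needs rcu_dereference(). */
static int sum_chain(void)
{
    int sum = 0;
    struct node *n = atomic_load_explicit(&bucket, memory_order_acquire);

    while (n != NULL) {
        sum += n->val;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return sum;
}
```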
     

30 Nov, 2007

1 commit

  • As far as I can see from the err variable initialization,
    the dn_nl_deladdr() routine was designed to report errors
    like "EADDRNOTAVAIL" and probably "ENODEV".

    But the code sets err to 0 after the first nlmsg_parse
    and carries on, returning 0 in every case.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Steven Whitehouse
    Signed-off-by: Herbert Xu

    Pavel Emelyanov
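A sketch of the bug pattern and one possible fix, using hypothetical functions rather than the actual patch: the default error is overwritten by the parse result, so a failed lookup still returns 0; setting the intended error after a successful parse restores it.

```c
#include <errno.h>

/* Hypothetical parser: 0 on success, negative errno on failure. */
static int mock_parse(int ok)
{
    return ok ? 0 : -EINVAL;
}

/* Buggy shape: err starts as -EADDRNOTAVAIL but is overwritten by the
 * parse result, so "address not found" is reported as success. */
static int deladdr_buggy(int parse_ok, int found)
{
    int err = -EADDRNOTAVAIL;

    err = mock_parse(parse_ok);
    if (err < 0)
        return err;
    if (!found)
        return err; /* err is 0 here: the error is lost */
    return 0;
}

/* Fixed shape: set the intended error only after parsing succeeds. */
static int deladdr_fixed(int parse_ok, int found)
{
    int err = mock_parse(parse_ok);

    if (err < 0)
        return err;
    err = -EADDRNOTAVAIL;
    if (!found)
        return err;
    return 0;
}
```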
     

11 Nov, 2007

2 commits


07 Nov, 2007

1 commit

  • sysfs keeps references to module parameters via /sys/module/*/parameters,
    so marking them as __initdata can't work.

    Steps to reproduce:

    modprobe decnet
    cat /sys/module/decnet/parameters/addr

    BUG: unable to handle kernel paging request at virtual address f88cd410
    printing eip: c043dfd1 *pdpt = 0000000000004001 *pde = 0000000004408067 *pte = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
    Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6)
    EIP: 0060:[] EFLAGS: 00210286 CPU: 1
    EIP is at param_get_int+0x6/0x20
    EAX: c5c87000 EBX: 00000000 ECX: 000080d0 EDX: f88cd410
    ESI: f8a108f8 EDI: c5c87000 EBP: 00000000 ESP: c5c97f00
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
    Stack: 00000000 f8a108f8 c5c87000 c043db6b f8a108f1 00000124 c043de1a c043db2f
    f88cd410 ffffffff c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078
    c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 00001000
    Call Trace:
    [] param_array_get+0x3c/0x62
    [] param_array_set+0x0/0xdf
    [] param_array_get+0x0/0x62
    [] param_attr_show+0x15/0x2d
    [] param_attr_show+0x0/0x2d
    [] module_attr_show+0x1a/0x1e
    [] sysfs_read_file+0x7c/0xd9
    [] sysfs_read_file+0x0/0xd9
    [] vfs_read+0x88/0x134
    [] do_page_fault+0x0/0x7d5
    [] sys_read+0x41/0x67
    [] sysenter_past_esp+0x6b/0xc1
    =======================
    Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 12 c7 44 24 04 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c
    EIP: [] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

01 Nov, 2007

1 commit

  • Finally, the zero_it argument can be completely removed from
    the callers and from the function prototype.

    Also, fix the checkpatch.pl warnings about using
    assignments inside if statements.

    This patch is rather big, and it is part of the previous one.
    I split it out in the hope of making the patches more readable.
    I hope this particular split helps.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

16 Oct, 2007

1 commit


11 Oct, 2007

6 commits

  • This patch makes the processing of netlink user -> kernel messages
    synchronous. This change was inspired by a talk with Alexey Kuznetsov
    about the current netlink message processing. He says that he was
    badly wrong when he introduced asynchronous user -> kernel
    communication.

    netlink_unicast is the only path for sending a message to the kernel
    netlink socket. Unfortunately, it is also used to send data to the
    user.

    Before this change, the user message was attached to the socket queue
    and sk->sk_data_ready was called. The process was blocked until all
    pending messages were processed. The bad thing is that this processing
    could occur in an arbitrary process context.

    This patch changes the nlk->data_ready callback to take a single skb
    and forces packet processing right inside netlink_unicast.

    The kernel -> user path in netlink_unicast remains untouched.

    EINTR processing in netlink_run_queue was changed. It forces an
    rtnl_lock drop, but the process remains in the loop until the message
    is fully processed. So there is no need for these kludges now.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
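A minimal sketch of synchronous delivery with hypothetical `mock_` names: the kernel socket's input callback runs in the sender's context before the unicast call returns, instead of the message being queued and handled after a data_ready wakeup.

```c
#include <stddef.h>

/* Hypothetical kernel-side socket whose input callback handles one message. */
struct mock_ksock {
    void (*input)(struct mock_ksock *sk, const char *msg);
    size_t handled;
};

/* Example input handler: just counts the messages it processed. */
static void count_input(struct mock_ksock *sk, const char *msg)
{
    (void)msg;
    sk->handled++;
}

/* Synchronous user -> kernel delivery: the message is fully processed
 * by the time this call returns; nothing is left on a queue. */
static int mock_unicast_to_kernel(struct mock_ksock *sk, const char *msg)
{
    sk->input(sk, msg);
    return 0;
}
```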
     
  • Just switch to the consolidated code.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Add an inline helper for the common usage of hardware header creation,
    and fix a bug in IPv6 mcast that assumed a negative return is an errno.
    A negative return from hard_header means not enough space was
    available (i.e. -N bytes).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
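A sketch of the return convention described above, with hypothetical functions: the builder returns the header length on success or the negated length when the headroom is too small, and the caller treats any negative value as "header not built" rather than propagating it as an errno. The 14-byte length is the Ethernet header size.

```c
#include <string.h>

/* Hypothetical header builder: header length on success, -(length)
 * when the headroom is too small. The negative value is not an errno. */
static int build_hdr(unsigned char *buf, int headroom, int hdr_len)
{
    if (headroom < hdr_len)
        return -hdr_len;
    memset(buf, 0, (size_t)hdr_len); /* placeholder header contents */
    return hdr_len;
}

/* Caller: maps any negative result to its own failure code (-1 here)
 * instead of handing the value up as though it were an errno. */
static int frame_len(unsigned char *buf, int headroom, int payload)
{
    int hlen = build_hdr(buf, headroom, 14);

    if (hlen < 0)
        return -1;
    return hlen + payload;
}
```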
     
  • This patch makes loopback_dev per network namespace. It adds
    code to create a different loopback device for each network
    namespace and code to free a loopback device
    when a network namespace exits.

    This patch modifies all users of loopback_dev so they
    access it as init_net.loopback_dev, keeping all of the
    code compiling and working. A later pass will be needed to
    update the users to use something other than the initial network
    namespace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
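A sketch of the per-namespace loopback device with hypothetical `mock_` types: namespace init allocates a device and stores it in the namespace, and namespace exit frees it. Code that still refers to the initial namespace's device would read `init_net_mock.loopback_dev`, mirroring init_net.loopback_dev.

```c
#include <errno.h>
#include <stdlib.h>

struct mock_net;

struct mock_dev {
    struct mock_net *nd_net; /* namespace that owns the device */
};

/* Each namespace holds a pointer to its own loopback device,
 * replacing a single static loopback_dev. */
struct mock_net {
    struct mock_dev *loopback_dev;
};

static struct mock_net init_net_mock; /* stand-in for init_net */

/* Namespace init: allocate this namespace's loopback device. */
static int loopback_net_init(struct mock_net *net)
{
    struct mock_dev *dev = calloc(1, sizeof(*dev));

    if (dev == NULL)
        return -ENOMEM;
    dev->nd_net = net;
    net->loopback_dev = dev;
    return 0;
}

/* Namespace exit: free this namespace's loopback device. */
static void loopback_net_exit(struct mock_net *net)
{
    free(net->loopback_dev);
    net->loopback_dev = NULL;
}
```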
     
  • This patch replaces all occurrences of the static variable
    loopback_dev with a pointer loopback_dev. That provides the
    mindless, trivial, uninteresting part of the change for the dynamic
    allocation of the loopback device.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Acked-By: Kirill Korotaev
    Acked-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • Signed-off-by: Denis Cheng

    Denis Cheng