11 Dec, 2012

1 commit

  • ip_check_defrag() might be called from af_packet within the
    RX path where shared SKBs are used, so it must not modify
    the input SKB before it has unshared it for defragmentation.
    Use skb_copy_bits() to get the IP header and only pull in
    everything later.

    The same is true for the other caller in macvlan as it is
    called from dev->rx_handler which can also get a shared SKB.

    Reported-by: Eric Leblond
    Cc: stable@vger.kernel.org
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
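A minimal user-space sketch of the pattern this commit describes. It assumes a read-only buffer stands in for a shared SKB: the header is copied out and inspected, and the input is never modified. The names shared_pkt, copy_bits() and is_ipv4_fragment() are hypothetical. copy_bits() only plays the role of skb_copy_bits(), and the fragment test mirrors the IP_MF | IP_OFFSET check on frag_off.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a possibly shared packet: callers may read it,
 * but must not advance `data` or change `len` (no "pull"). */
struct shared_pkt {
    const uint8_t *data;
    size_t len;
};

/* Plays the role of skb_copy_bits(): copy n bytes at off into `to`,
 * leaving the packet itself untouched. Returns 0 on success, -1 if short. */
static int copy_bits(const struct shared_pkt *pkt, size_t off, void *to, size_t n)
{
    if (off > pkt->len || n > pkt->len - off)
        return -1;
    memcpy(to, pkt->data + off, n);
    return 0;
}

/* Returns 1 for an IPv4 fragment, 0 for a whole datagram, -1 if the header
 * is truncated or not a valid IPv4 header. Only a local copy is inspected. */
static int is_ipv4_fragment(const struct shared_pkt *pkt)
{
    uint8_t hdr[20];               /* fixed part of the IPv4 header */
    uint16_t frag_off;

    if (copy_bits(pkt, 0, hdr, sizeof(hdr)) < 0)
        return -1;
    if ((hdr[0] >> 4) != 4 || (hdr[0] & 0x0f) < 5)
        return -1;                 /* version must be 4, IHL at least 5 */

    /* Bytes 6-7 (network order): 3 flag bits + 13-bit fragment offset. */
    frag_off = (uint16_t)((hdr[6] << 8) | hdr[7]);
    return (frag_off & 0x3fff) != 0;   /* MF (0x2000) or offset (0x1fff) set */
}
```

A packet with only DF set is not a fragment. A header truncated below 20 bytes is reported as malformed rather than being pulled.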
     

10 Dec, 2012

4 commits

  • Add logic to verify that a port comparison byte code operation
    actually has the second inet_diag_bc_op from which we read the port
    for such operations.

    Previously the code blindly referenced op[1] without first checking
    whether a second inet_diag_bc_op struct could fit there. So a
    malicious user could make the kernel read 4 bytes beyond the end of
    the bytecode array by claiming to have a whole port comparison byte
    code (2 inet_diag_bc_op structs) when in fact the bytecode was not
    long enough to hold both.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • Add logic to check the address family of the user-supplied conditional
    and the address family of the connection entry. We now do not do
    prefix matching of addresses from different address families (AF_INET
    vs AF_INET6), except for the previously existing support for having an
    IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
    maintains as-is).

    This change is needed for two reasons:

    (1) The addresses are different lengths, so comparing a 128-bit IPv6
    prefix match condition to a 32-bit IPv4 connection address can cause
    us to unwittingly walk off the end of the IPv4 address and read
    garbage or oops.

    (2) The IPv4 and IPv6 address spaces are semantically distinct, so a
    simple bit-wise comparison of the prefixes is not meaningful, and
    would lead to bogus results (except for the IPv4-mapped IPv6 case,
    which this commit maintains).

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
    operations.

    Previously we did not validate the inet_diag_hostcond, address family,
    address length, and prefix length. So a malicious user could make the
    kernel read beyond the end of the bytecode array by claiming to have a
    whole inet_diag_hostcond when the bytecode was not long enough to
    contain a whole inet_diag_hostcond of the given address family. Or
    they could make the kernel read up to about 27 bytes beyond the end of
    a connection address by passing a prefix length that exceeded the
    length of addresses of the given family.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
    instantiated for IPv4 traffic and in the SYN-RECV state were actually
    created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
    means that for such connections inet6_rsk(req) returns a pointer to a
    random spot in memory up to roughly 64KB beyond the end of the
    request_sock.

    With this bug, for a server using AF_INET6 TCP sockets and serving
    IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
    inet_diag_fill_req() causing an oops or the export to user space of 16
    bytes of kernel memory as a garbage IPv6 address, depending on where
    the garbage inet6_rsk(req) pointed.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
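The bounds checks described in the first and third commits above can be sketched as follows. This is not the kernel's validator. struct bc_op only mirrors the 4-byte layout of struct inet_diag_bc_op (code, yes, no), and port_op_fits() and prefix_len_ok() are hypothetical helpers that show the arithmetic:

- a port comparison must have room for two ops;
- a prefix length may not exceed 32 bits for AF_INET or 128 bits for AF_INET6.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Mirrors the layout of struct inet_diag_bc_op: code, yes, no. */
struct bc_op {
    unsigned char code;
    unsigned char yes;
    unsigned short no;
};

/* A port comparison is the opcode followed by a second op whose `no`
 * field holds the port, so both must lie inside the remaining bytecode.
 * `remaining` counts bytes from the current op to the end of the array. */
static bool port_op_fits(size_t remaining)
{
    return remaining >= 2 * sizeof(struct bc_op);
}

/* A prefix longer than the address would make the matcher read past the
 * end of the connection's address. Unknown families are rejected. */
static bool prefix_len_ok(int family, unsigned int prefix_len)
{
    switch (family) {
    case AF_INET:
        return prefix_len <= 32;
    case AF_INET6:
        return prefix_len <= 128;
    default:
        return false;
    }
}
```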
     

08 Dec, 2012

2 commits

  • commit 2e71a6f8084e (net: gro: selective flush of packets) added
    a bug for skbs using frag_list. This part of the GRO stack is rarely
    used, as it needs skbs that do not use a page fragment for their
    skb->head.

    Most drivers do use a page fragment, but some of them use GFP_KERNEL
    allocations for the initial fill of their RX ring buffer.

    napi_gro_flush() overwrites skb->prev, which was used for these skbs
    to point to the last skb in frag_list.

    Fix this by using a separate field in struct napi_gro_cb to point to
    the last fragment.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • If a SYN-ACK partially acks SYN-data, the client retransmits the
    remaining data with tcp_retransmit_skb(). This increments loss recovery
    state variables like tp->retrans_out in the Open state. If loss recovery
    happens before the retransmission is acked, it triggers the WARN_ON
    check in tcp_fastretrans_alert(). For example: the client sends
    SYN-data, gets a SYN-ACK acking only the ISN, retransmits the data,
    sends another 4 data packets and gets 3 dupacks.

    Since the retransmission is not caused by a network drop, it should not
    update the recovery state variables. Further, the server may return a
    smaller MSS than the cached MSS used for SYN-data, so the retransmission
    needs a loop. Otherwise some data will not be retransmitted until a
    timeout or other loss recovery events.

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
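The need for a loop in the SYN-data commit above can be sketched in isolation. The sketch assumes the remaining SYN-data must be resent in segments no larger than the MSS the server returned. retransmit_in_mss_chunks() and its emit callback are hypothetical; they do not model tcp_retransmit_skb() or any recovery state.

```c
#include <stddef.h>

/* Hypothetical hook that would queue one segment for transmission. */
typedef void (*emit_fn)(size_t offset, size_t len, void *ctx);

/* Resend `remaining` bytes starting at `offset` in segments of at most
 * `mss` bytes. Returns the number of segments produced (0 if mss == 0).
 * `emit` may be NULL when only the segment count is wanted. */
static size_t retransmit_in_mss_chunks(size_t offset, size_t remaining,
                                       size_t mss, emit_fn emit, void *ctx)
{
    size_t segs = 0;

    if (mss == 0)
        return 0;
    while (remaining > 0) {
        size_t len = remaining < mss ? remaining : mss;

        if (emit)
            emit(offset, len, ctx);
        offset += len;
        remaining -= len;
        segs++;
    }
    return segs;
}
```

For example, suppose 1400 bytes of SYN-data were sized for a cached MSS of 1460, and the server returns an MSS of 536. Resending them takes three segments (536 + 536 + 328). A single retransmission would leave 864 bytes waiting for a timeout.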
     

02 Dec, 2012

1 commit

  • Recent network changes allowed high-order pages to be used
    for skb fragments.

    This uncovered a bug in do_tcp_sendpages() which was assuming its caller
    provided an array of order-0 page pointers.

    We only have to deal with a single page in this function, and its order
    is irrelevant.

    Reported-by: Willy Tarreau
    Tested-by: Willy Tarreau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Dec, 2012

1 commit


29 Nov, 2012

6 commits

  • Two small openvswitch fixes from Jesse Gross.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • An interface name overflow fix in netfilter via Pablo Neira Ayuso.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Clean up the memory we allocated earlier in irttp_open_tsap() when we hit
    this error path. The leak goes back to at least 1da177e4
    ("Linux-2.6.12-rc2").

    Discovered with Trinity (the syscall fuzzer).

    Signed-off-by: Tommi Rantala
    Signed-off-by: David S. Miller

    Tommi Rantala
     
  • The calculation of RTTVAR involves the subtraction of two unsigned
    numbers, which may cause a rollover and result in very high values of
    RTTVAR when RTT > SRTT. With this patch it is possible to set
    RTOmin = 1 to get the minimum of RTO at 4 times the clock granularity.

    Change Notes:

    v2)
    * Replaced abs() with abs64() and long with __s64; changed the patch
    description.

    Signed-off-by: Christian Schoch
    CC: Vlad Yasevich
    CC: Sridhar Samudrala
    CC: Neil Horman
    CC: linux-sctp@vger.kernel.org
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Schoch Christian
     
  • Consider the following program, that sets the second argument to the
    sendto() syscall incorrectly:

    #include <string.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd;
        struct sockaddr_in sa;

        fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
        if (fd < 0)
            return 1;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = inet_addr("127.0.0.1");
        sa.sin_port = htons(11111);

        sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

        return 0;
    }

    We get -ENOMEM:

    $ strace -e sendto ./demo
    sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)

    Propagate the error code from sctp_user_addto_chunk(), so that we will
    tell user space what actually went wrong:

    $ strace -e sendto ./demo
    sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)

    Noticed while running Trinity (the syscall fuzzer).

    Signed-off-by: Tommi Rantala
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Tommi Rantala
     
  • Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
    reproducible e.g. with the sendto() syscall by passing an invalid
    user space pointer in the second argument:

    #include <string.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd;
        struct sockaddr_in sa;

        fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
        if (fd < 0)
            return 1;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = inet_addr("127.0.0.1");
        sa.sin_port = htons(11111);

        sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

        return 0;
    }

    As far as I can tell, the leak has been around since ~2003.

    Signed-off-by: Tommi Rantala
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Tommi Rantala
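The RTTVAR rollover from the SCTP commit above can be reproduced in plain C. The sketch assumes the RFC 4960 update RTTVAR = (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|, with RTO.Beta = 1/4 implemented as a right shift by 2. The function names are hypothetical, and int64_t plus an explicit sign test stand in for the patch's __s64 and abs64().

```c
#include <stdint.h>

#define RTO_BETA_SHIFT 2   /* RTO.Beta = 1/4 */

/* Buggy: srtt - rtt is computed in unsigned arithmetic, so when rtt > srtt
 * it wraps to a huge value before any absolute value could help. */
static uint32_t rttvar_update_buggy(uint32_t rttvar, uint32_t srtt, uint32_t rtt)
{
    uint32_t diff = srtt - rtt;    /* wraps when rtt > srtt */

    return rttvar - (rttvar >> RTO_BETA_SHIFT) + (diff >> RTO_BETA_SHIFT);
}

/* Fixed: widen to signed 64-bit before subtracting, then take |d|. */
static uint32_t rttvar_update_fixed(uint32_t rttvar, uint32_t srtt, uint32_t rtt)
{
    int64_t d = (int64_t)srtt - (int64_t)rtt;
    uint32_t diff = (uint32_t)(d < 0 ? -d : d);

    return rttvar - (rttvar >> RTO_BETA_SHIFT) + (diff >> RTO_BETA_SHIFT);
}
```

Take RTTVAR = 100, SRTT = 200 and R' = 300. The fixed update keeps RTTVAR at 100, while the unsigned version adds roughly 2^30. When R' < SRTT, both versions agree.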
     

27 Nov, 2012

4 commits

  • Names of pimreg devices are built using the following format:

    char name[IFNAMSIZ]; // IFNAMSIZ == 16

    sprintf(name, "pimreg%u", mrt->id);

    We must therefore limit mrt->id to 9 decimal digits
    or risk a buffer overflow and a crash.

    Restrict table identifiers to the [0 ... 999999999] interval.

    Reported-by: Chen Gang
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • inet_getpeer_v4() can return NULL under OOM conditions, and while
    inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
    crash.

    This code path now uses the same idiom as the others from:
    1d861aa4b3fb08822055345f480850205ffe6170 ("inet: Minimize use of
    cached route inetpeer.").

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • Set the rx_ifindex to pass the correct interface index in the case of a
    message timeout detection. Usually the rx_ifindex value is set at receive
    time, but when no CAN frame has been received, the RX_TIMEOUT notification
    did not contain a valid value.

    Cc: linux-stable
    Reported-by: Andre Naujoks
    Signed-off-by: Oliver Hartkopp
    Signed-off-by: Marc Kleine-Budde

    Oliver Hartkopp
     
  • Felix Liao reported that when an interface is set DOWN
    while another interface is executing a ROC, the warning
    in ieee80211_start_next_roc() (about the first item on
    the list having started already) triggers.

    This is because ieee80211_roc_purge() calls it even if
    it never actually changed the list of ROC items. To fix
    this, simply remove the function call. If it is needed
    then it will be done by the ieee80211_sw_roc_work()
    function when the ROC item that is being removed while
    active is cleaned up.

    Cc: stable@vger.kernel.org
    Reported-by: Felix Liao
    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
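The pimreg sizing argument above is simple arithmetic:

- "pimreg" is 6 characters, and 9 decimal digits bring it to 15;
- the terminating NUL then fills all 16 bytes of IFNAMSIZ;
- the largest 32-bit id (4294967295, 10 digits) would need 17 bytes.

The sketch below checks this with snprintf(). pimreg_name_len() and mrt_id_valid() are hypothetical helpers, and IFNAMSIZ is defined locally with the value quoted in the commit.

```c
#include <stdint.h>
#include <stdio.h>

#define IFNAMSIZ   16          /* value quoted in the commit message */
#define MAX_MRT_ID 999999999u  /* upper bound enforced by the fix */

/* Characters needed for "pimreg<id>", excluding the terminating NUL.
 * snprintf() with a NULL buffer and size 0 only computes the length. */
static int pimreg_name_len(uint32_t id)
{
    return snprintf(NULL, 0, "pimreg%u", (unsigned int)id);
}

/* Accept only table ids whose device name fits in IFNAMSIZ bytes. */
static int mrt_id_valid(uint32_t id)
{
    return id <= MAX_MRT_ID;
}
```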
     

25 Nov, 2012

1 commit


23 Nov, 2012

3 commits

  • Starting from 3.6 we cache output routes for
    multicasts only when using a route to 224/4. For local receivers
    we can set the RTCF_LOCAL flag depending on the membership, but
    in that case we use maddr and saddr, which are not caching
    keys as before. Additionally, we cannot use the same place to
    cache routes that differ in RTCF_LOCAL flag value.

    Fix it by caching only RTCF_MULTICAST entries
    without RTCF_LOCAL (send-only, no loopback). As a side effect,
    we avoid an unneeded fnhe lookup when not caching, because
    multicasts are not redirected and they do not learn PMTU.

    Thanks to Maxime Bizon for showing the caching
    problems in __mkroute_output for 3.6 kernels: a different
    RTCF_LOCAL flag in the cache can lead to a wrong ip_mc_output or
    ip_output call, and the visible problem is that traffic cannot
    reach local receivers via loopback.

    Reported-by: Maxime Bizon
    Tested-by: Maxime Bizon
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains two Netfilter fixes:

    * Fix buffer overflow in the name of the timeout policy object
    in the cttimeout infrastructure, from Florian Westphal.

    * Fix a bug in the hash set in case that IP ranges are
    specified, from Jozsef Kadlecsik.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Steffen Klassert says:

    ====================
    This pull request is intended for 3.7 and contains a single patch to
    fix the IPsec gc threshold value for ipv4.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Nov, 2012

3 commits

  • Chen Gang reports:
    the length of nla_data(cda[CTA_TIMEOUT_NAME]) is not limited on the server side.

    And indeed, it's used in a strcpy to a fixed-size buffer.

    Fortunately, nfnetlink users need CAP_NET_ADMIN.

    Reported-by: Chen Gang
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Due to the missing initialization when adding/deleting entries, when
    a plain_ip,port,net element was the object, multiple elements were
    added/deleted instead. The bug came from the missing dangling
    default initialization.

    The error-prone default initialization is corrected in all hash:* types.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jozsef Kadlecsik
     
  • …wireless into for-davem

    John W. Linville says:

    ====================
    This is a batch of fixes intended for 3.7...

    Included are two pulls. Regarding the mac80211 tree, Johannes says:

    "Please pull my mac80211.git tree (see below) to get two more fixes for
    3.7. Both fix regressions introduced *before* this cycle that weren't
    noticed until now, one for IBSS not cleaning up properly and the other
    to add back the "wireless" sysfs directory for Fedora's startup scripts."

    Regarding the iwlwifi tree, Johannes says:

    "Please also pull my iwlwifi.git tree, I have two fixes: one to remove a
    spurious warning that can actually trigger in legitimate situations, and
    the other to fix a regression from when monitor mode was changed to use
    the "sniffer" firmware mode."

    Also included is an nfc tree pull. Samuel says:

    "We mostly have pn533 fixes here, 2 memory leaks and an early unlocking fix.
    Moreover, we also have an LLCP adapter linked list insertion fix."

    On top of that, a few more bits... Albert Pool adds a USB ID
    to rtlwifi. Bing Zhao provides two mwifiex fixes -- one to fix
    a system hang during a command timeout, and the other to properly
    report a suspend error to the MMC core. Finally, Sujith Manoharan
    fixes a thinko that would trigger an ath9k hang during device reset.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    John W. Linville
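For the cttimeout report above, a bounded copy is the usual remedy: measure the attribute, reject it if it cannot fit with its NUL, and only then copy. The sketch is hypothetical and does not reproduce the actual Netfilter patch. TIMEOUT_NAME_MAX (32 here) and set_timeout_name() are assumed names for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TIMEOUT_NAME_MAX 32   /* assumed size of the fixed name buffer */

struct timeout_obj {
    char name[TIMEOUT_NAME_MAX];
};

/* attr/attr_len model an untrusted netlink attribute payload, which may or
 * may not be NUL-terminated. Returns 0 on success, -1 if the name is too
 * long; on failure the object is left unchanged. */
static int set_timeout_name(struct timeout_obj *t, const char *attr, size_t attr_len)
{
    const char *nul = memchr(attr, '\0', attr_len);
    size_t n = nul ? (size_t)(nul - attr) : attr_len;

    if (n >= sizeof(t->name))
        return -1;              /* would overflow, unlike a blind strcpy() */
    memcpy(t->name, attr, n);
    t->name[n] = '\0';
    return 0;
}
```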
     

21 Nov, 2012

1 commit


20 Nov, 2012

2 commits


17 Nov, 2012

6 commits

  • commit 35b2a113cb0298d4f9a1263338b456094a414057 broke (at least)
    Fedora's networking scripts, which check for the existence of the
    wireless directory. As the files aren't used, add back the directory
    but not the files. Also do it both for drivers based on the old
    wireless extensions and for cfg80211, regardless of whether the
    compat code for wext is built into cfg80211 or not.

    Cc: stable@vger.kernel.org [3.6]
    Reported-by: Dave Airlie
    Reported-by: Bill Nottingham
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • John W. Linville says:

    ====================
    This batch of fixes is intended for the 3.7 stream...

    This includes a pull of the Bluetooth tree. Gustavo says:

    "A few important fixes to go into 3.7. There is a new hw support by Marcos
    Chaparro. Johan added a memory leak fix and an hci device index list fix.
    Also Marcel fixed a race condition in the device setup that was preventing
    the bt monitor from working properly. Last, Paulo Sérgio added a fix to the
    error status when pairing for LE fails. This was preventing userspace from
    handling the failure properly."

    Regarding the mac80211 pull, Johannes says:

    "I have a locking fix for some SKB queues, a variable initialization to
    avoid crashes in a certain failure case, another free_txskb fix from
    Felix and another fix from him to avoid calling a stopped driver, a fix
    for a (very unlikely) memory leak and a fix to not send null data
    packets when resuming while not associated."

    Regarding the iwlwifi pull, Johannes says:

    "Two more fixes for iwlwifi ... one to use ieee80211_free_txskb(), and
    one to check DMA mapping errors, please pull."

    On top of that, Johannes also included a wireless regulatory fix
    to allow 40 MHz on channels 12 and 13 in world roaming mode. Also,
    Hauke Mehrtens fixes a #ifdef typo in brcmfmac.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In commit c445477d74ab3779, which adds aRFS to the kernel, the CPU
    selected for RFS is not set correctly when the CPU is changing.
    This is causing out-of-order (OOO) packets and probably other issues.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Included fixes are:
    - update the client entry status flags when using the "early client
    detection". This makes Distributed AP isolation work correctly;
    - transfer the client entry status flags when recovering the translation
    table from another node. This makes Distributed AP isolation work
    correctly;
    - prevent the "early client detection mechanism" from adding clients
    belonging to other backbone nodes in the same LAN. Adding them breaks
    connectivity when this mechanism is used together with the Bridge Loop
    Avoidance;
    - process broadcast packets with the Bridge Loop Avoidance before any
    other component. BLA can possibly drop the packets based on the source
    address. This makes the "early client detection mechanism" work
    correctly when used with BLA.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • order-5 allocations can fail with current kernels; we should
    try vmalloc() as well.

    Reported-by: Julien Tinnes
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • …wireless into for-davem

    John W. Linville
     

16 Nov, 2012

5 commits

  • The logic in the BLA mechanism may decide to drop broadcast packets
    because the node may still be in the setup phase. For this reason,
    further broadcast processing like the early client detection mechanism
    must be done only after the BLA check.

    This patch moves the BLA invocation before any other broadcast
    processing.

    This was introduced by 30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
    ("batman-adv: detect not yet announced clients")

    Reported-by: Glen Page
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • The "early client detection" mechanism must not add clients belonging
    to other backbone nodes. Such clients must be reached by directly
    using the LAN instead of the mesh.

    This was introduced by 30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
    ("batman-adv: detect not yet announced clients")

    Reported-by: Glen Page
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • When a TT response with the full table is sent, the client flags
    should be sent as well. This patch fixes the flags assignment when
    populating the tt_response to send back.

    This was introduced by 30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
    ("batman-adv: detect not yet announced clients")

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • Flags carried by a change_entry have to be always copied into the
    client entry as they may contain important attributes (e.g.
    TT_CLIENT_WIFI).

    For instance, a client added by means of the "early detection
    mechanism" has no flag set at the beginning, so its flags must be
    updated once the proper ADD event is received.

    This was introduced by 30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
    ("batman-adv: detect not yet announced clients")

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • The check (ha->addr == dev->dev_addr) is always true because
    dev_addr_init() sets this. Correct the check to behave properly on
    address removal.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
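The dev_addr_init() commit above does not spell out the corrected condition. The sketch below therefore only illustrates the underlying pitfall, under stated assumptions. When one pointer is initialized to alias another, comparing the two pointers says nothing about the address being removed, whereas comparing bytes with memcmp() does. The types and both helpers are hypothetical simplifications, not the kernel's struct netdev_hw_addr or its removal code.

```c
#include <stdbool.h>
#include <string.h>

#define ADDR_LEN 6

struct hw_addr_entry {
    unsigned char addr[ADDR_LEN];
    int refcount;
};

struct dev_model {
    struct hw_addr_entry *first;  /* first entry of the address list */
    unsigned char *dev_addr;      /* made to alias first->addr at init */
};

/* Pointer identity: true for every call once dev_addr aliases first->addr,
 * regardless of which address the caller wants to remove. */
static bool first_is_target_by_pointer(const struct dev_model *dev,
                                       const unsigned char *addr)
{
    (void)addr;
    return dev->first->addr == dev->dev_addr && dev->first->refcount == 1;
}

/* Content comparison: true only if the address being removed matches
 * the bytes stored in the first entry. */
static bool first_is_target_by_bytes(const struct dev_model *dev,
                                     const unsigned char *addr)
{
    return memcmp(dev->first->addr, addr, ADDR_LEN) == 0 &&
           dev->first->refcount == 1;
}
```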