17 Jul, 2007

3 commits

  • Add TTY input auditing, used to audit system administrator's actions. This is
    required by various security standards such as DCID 6/3 and PCI to provide
    non-repudiation of administrator's actions and to allow a review of past
    actions if the administrator seems to overstep their duties or if the system
    becomes misconfigured for unknown reasons. These requirements do not make it
    necessary to audit TTY output as well.

    Compared to a user-space keylogger, this approach records TTY input using the
    audit subsystem, correlated with other audit events, and it is completely
    transparent to the user-space application (e.g. the console ioctls still
    work).

    TTY input auditing works on a higher level than auditing all system calls
    within the session, which would produce an overwhelming amount of mostly
    useless audit events.

    Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs
    by a process with the attribute is sent to the audit subsystem by the kernel.
    The audit netlink interface is extended to allow modifying the audit_tty
    attribute, and to allow sending explanatory audit events from user-space (for
    example, a shell might send an event containing the final command, after the
    interactive command-line editing and history expansion is performed, which
    might be difficult to decipher from the TTY input alone).

    Because the "audit_tty" attribute is inherited across fork (), it would be set
    e.g. for sshd restarted within an audited session. To prevent this, the
    audit_tty attribute is cleared when a process with no open TTY file
    descriptors (e.g. after daemon startup) opens a TTY.

    See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
    more detailed rationale document for an older version of this patch.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Miloslav Trmac
    Cc: Al Viro
    Cc: Alan Cox
    Cc: Paul Fulghum
    Cc: Casey Schaufler
    Cc: Steve Grubb
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miloslav Trmac
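
The inheritance and clearing rule described in this commit can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: struct toy_proc, toy_fork, toy_close_all_ttys and toy_open_tty are hypothetical names invented for illustration, and the kernel keeps this state in its own structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state; not the kernel's actual layout. */
struct toy_proc {
    bool audit_tty;   /* the inherited attribute */
    int  open_ttys;   /* number of open TTY file descriptors */
};

/* fork () copies both the attribute and the open descriptors. */
struct toy_proc toy_fork(const struct toy_proc *parent)
{
    return *parent;
}

/* A daemon typically closes its terminal descriptors during startup. */
void toy_close_all_ttys(struct toy_proc *p)
{
    p->open_ttys = 0;
}

/* Opening a TTY while holding none clears the attribute. */
void toy_open_tty(struct toy_proc *p)
{
    assert(p->open_ttys >= 0);
    if (p->open_ttys == 0)
        p->audit_tty = false;
    p->open_ttys++;
}
```

In this model a shell forked inside the audited session keeps the attribute when it opens another TTY, while a restarted daemon that closed its terminals loses it on its next TTY open.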
     
  • Part two in the O_CLOEXEC saga: adding support for file descriptors received
    through Unix domain sockets.

    The patch is once again pretty minimal: it introduces a new flag for recvmsg
    and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
    is not used otherwise but the networking people will know better.

    This new flag is not recognized by recvfrom and recv. These functions cannot
    be used for that purpose and the asymmetry this introduces is not worse than
    the already existing MSG_CMSG_COMPAT situations.

    The patch must be applied on the patch which introduced O_CLOEXEC. It has to
    remove static from the new get_unused_fd_flags function but since scm.c cannot
    live in a module the function still does not have to be exported.

    Here's a test program to make sure the code works. It's so much longer than
    the actual patch...

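    #include <errno.h>
    #include <error.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>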

    #ifndef O_CLOEXEC
    # define O_CLOEXEC 02000000
    #endif
    #ifndef MSG_CMSG_CLOEXEC
    # define MSG_CMSG_CLOEXEC 0x40000000
    #endif

    int
    main (int argc, char *argv[])
    {
      if (argc > 1)
        {
          int fd = atol (argv[1]);
          printf ("child: fd = %d\n", fd);
          if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
            {
              puts ("file descriptor valid in child");
              return 1;
            }
          return 0;
        }

      struct sockaddr_un sun;
      strcpy (sun.sun_path, "./testsocket");
      sun.sun_family = AF_UNIX;

      char databuf[] = "hello";
      struct iovec iov[1];
      iov[0].iov_base = databuf;
      iov[0].iov_len = sizeof (databuf);

      union
      {
        struct cmsghdr hdr;
        char bytes[CMSG_SPACE (sizeof (int))];
      } buf;
      struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                            .msg_control = buf.bytes,
                            .msg_controllen = sizeof (buf) };
      struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);

      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN (sizeof (int));

      msg.msg_controllen = cmsg->cmsg_len;

      pid_t child = fork ();
      if (child == -1)
        error (1, errno, "fork");
      if (child == 0)
        {
          int sock = socket (PF_UNIX, SOCK_STREAM, 0);
          if (sock < 0)
            error (1, errno, "socket");

          if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
            error (1, errno, "bind");
          if (listen (sock, SOMAXCONN) < 0)
            error (1, errno, "listen");

          int conn = accept (sock, NULL, NULL);
          if (conn == -1)
            error (1, errno, "accept");

          *(int *) CMSG_DATA (cmsg) = sock;
          if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
            error (1, errno, "sendmsg");

          return 0;
        }

      /* For a test suite this should be more robust like a
         barrier in shared memory. */
      sleep (1);

      int sock = socket (PF_UNIX, SOCK_STREAM, 0);
      if (sock < 0)
        error (1, errno, "socket");

      if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
        error (1, errno, "connect");
      unlink (sun.sun_path);

      *(int *) CMSG_DATA (cmsg) = -1;

      if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
        error (1, errno, "recvmsg");

      int fd = *(int *) CMSG_DATA (cmsg);
      if (fd == -1)
        error (1, 0, "no descriptor received");

      char fdname[20];
      snprintf (fdname, sizeof (fdname), "%d", fd);
      execl ("/proc/self/exe", argv[0], fdname, NULL);
      puts ("execl failed");
      return 1;
    }

    [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Ulrich Drepper
    Cc: Ingo Molnar
    Cc: Michael Buesch
    Cc: Michael Kerrisk
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Recent breakage..

    net/sunrpc/auth_gss/auth_gss.c:1002: warning: implicit declaration of function 'lock_kernel'
    net/sunrpc/auth_gss/auth_gss.c:1004: warning: implicit declaration of function 'unlock_kernel'

    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

16 Jul, 2007

3 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
    [TCP]: Verify the presence of RETRANS bit when leaving FRTO
    [IPV6]: Call inet6addr_chain notifiers on link down
    [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
    [NET_SCHED]: act_api: qdisc internal reclassify support
    [NET_SCHED]: sch_dsmark: act_api support
    [NET_SCHED]: sch_atm: act_api support
    [NET_SCHED]: sch_atm: Lindent
    [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
    [IPV4]: Cleanup call to __neigh_lookup()
    [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
    [NETFILTER]: nf_conntrack: UDPLITE support
    [NETFILTER]: nf_conntrack: mark protocols __read_mostly
    [NETFILTER]: x_tables: add connlimit match
    [NETFILTER]: Lower *tables printk severity
    [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
    [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
    [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
    [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
    [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
    [AF_IUCV]: Add lock when updating accept_q
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: fix a race condition bug in umount which caused a segfault
    9p: re-enable mount time debug option
    9p: cache meta-data when cache=loose
    net/9p: set error to EREMOTEIO if trans->write returns zero
    net/9p: change net/9p module name to 9pnet
    9p: Reorganization of 9p file system code

    Linus Torvalds
     
  • Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

15 Jul, 2007

34 commits

    For a yet unknown reason, something cleared the SACKED_RETRANS bit
    underneath FRTO.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Currently if the link is brought down via ip link or ifconfig down,
    the inet6addr_chain notifiers are not called even though all
    the addresses are removed from the interface. This caused SCTP
    to add duplicate addresses to its list.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
    remove the old code. The config option will be kept around to select
    the equivalent NET_CLS_ACT options for a short time to allow easier
    upgrades.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
    it to the qdisc, which could handle it internally or ignore it. With
    NET_CLS_ACT however, tc_classify starts over at the first classifier
    and never returns it to the qdisc. This makes it impossible to support
    qdisc-internal reclassification, which in turn makes it impossible to
    remove the old NET_CLS_POLICE code without breaking compatibility since
    we have two qdiscs (CBQ and ATM) that support this.

    This patch adds a tc_classify_compat function that handles
    reclassification the old way and changes CBQ and ATM to use it.

    This again is of course not fully backwards compatible with the previous
    NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
    compatibility *and* support qdisc internal reclassification with
    NET_CLS_ACT, but this seems like the better choice over keeping the two
    incompatible options around forever.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Handle act_api classification results.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Handle act_api classification results.

    The ATM scheduler behaves slightly differently from other schedulers
    in that it only handles policer results for successful classifications;
    this behaviour is retained for the act_api case.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • From: Dmitry Butskoy

    Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747

    Problem Description:

    It concerns the ability to obtain MSG_ERRQUEUE messages from udp and raw
    sockets, both connected and unconnected.

    There is a small typo in the net/ipv6/icmp.c code that prevents such messages
    from being delivered to the errqueue of the corresponding raw socket when the
    socket is CONNECTED. The typo is a swap of the local and remote addresses.

    Consider the __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket
    is looked up the usual way, the call is something like:

    sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);

    where "daddr" is a destination address of the incoming packet (IOW our local
    address), "saddr" is a source address of the incoming packet (the remote end).

    But when the raw socket is looked up for some icmp error report, in
    net/ipv6/icmp.c:icmpv6_notify(), daddr/saddr are obtained from the echoed
    fragment of the "bad" packet, i.e. "daddr" is the original destination
    address of that packet, "saddr" is our local address. Hence,
    icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr"
    ...

    Steps to reproduce:

    Create some raw socket, connect it to an address, and cause some error
    situation: e.g. set ttl=1 where the remote address is more than 1 hop away.
    Set IPV6_RECVERR.
    Then send something and wait for the error (e.g. poll() with POLLERR|POLLIN).
    You should receive a "time exceeded" icmp message (because of "ttl=1"), but the
    socket does not receive it.

    If you do not connect your raw socket, you will receive MSG_ERRQUEUE
    successfully. (The reason is that for an unconnected socket there are no
    actual checks for local/remote addresses).

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dmitry Butskoy
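
The address swap described in this commit can be shown with a toy lookup table. This is a sketch under stated assumptions, not the kernel code: addresses are plain integers, and struct toy_sock, toy_lookup, find_for_incoming and find_for_icmp_error are hypothetical names. Only the argument order (local address first, then remote) follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy connected socket; addresses are small integers, not real IPv6. */
struct toy_sock {
    int local;
    int remote;
};

/* Same argument order as described above: local address, then remote. */
const struct toy_sock *toy_lookup(const struct toy_sock *tbl, size_t n,
                                  int loc_addr, int rmt_addr)
{
    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].local == loc_addr && tbl[i].remote == rmt_addr)
            return &tbl[i];
    return NULL;
}

/* Normal receive path: the packet's destination is us, its source the peer. */
const struct toy_sock *find_for_incoming(const struct toy_sock *tbl, size_t n,
                                         int saddr, int daddr)
{
    return toy_lookup(tbl, n, daddr, saddr);
}

/* ICMP error path: the echoed packet was sent by us, so its source is the
   local address and its destination is the peer. */
const struct toy_sock *find_for_icmp_error(const struct toy_sock *tbl, size_t n,
                                           int saddr, int daddr)
{
    return toy_lookup(tbl, n, saddr, daddr);
}
```

Feeding the echoed packet's addresses through find_for_incoming reproduces the reported symptom in this model: the connected socket is not found, because local and remote are compared the wrong way round.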
     
  • Back in the times of Linux 2.2, negative values for the creat parameter
    of __neigh_lookup() had a particular meaning, but no longer, so we
    should pass 1 instead.

    Signed-off-by: Jean Delvare
    Signed-off-by: David S. Miller

    Jean Delvare
     
    As noticed by Ranko Zivojnovic, calling qdisc_run
    from the timer handler can result in deadlock:

    > CPU#0
    >
    > qdisc_watchdog() fires and gets dev->queue_lock
    > qdisc_run()...qdisc_restart()...
    > -> releases dev->queue_lock and enters dev_hard_start_xmit()
    >
    > CPU#1
    >
    > tc del qdisc dev ...
    > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
    > -> grabs dev->queue_lock ...
    >
    > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
    > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
    > holding dev->queue_lock
    >
    > CPU#0
    >
    > dev_hard_start_xmit() returns ...
    > -> wants to get dev->queue_lock(!)
    >
    > DEADLOCK!

    The entire optimization is a bit questionable IMO: it moves potentially
    large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ,
    which kind of defeats the separation of them.

    Signed-off-by: Patrick McHardy
    Acked-by: Ranko Zivojnovic
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Also remove two unnecessary EXPORT_SYMBOLs and move the
    nf_conntrack_l3proto_ipv4 declaration to the correct file.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • ipt_connlimit has been sitting in POM-NG for a long time.
    Here is a new shiny xt_connlimit with:

    * xtables'ified
    * will request the layer3 module
    (previously it hotdropped every packet when it was not loaded)
    * fixed: there was a deadlock in case of an OOM condition
    * support for any layer4 protocol (e.g. UDP/SCTP)
    * using jhash, as suggested by Eric Dumazet
    * ipv6 support

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Jan Engelhardt
     
  • Lower ip6tables, arptables and ebtables printk severity similar to
    Dan Aloni's patch for iptables.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The conntrack assigned to locally generated ICMP error is usually the one
    assigned to the original packet which has caused the error. But if
    the original packet is handled as invalid by nf_conntrack, no conntrack
    is assigned to the original packet. Then nf_ct_attach() cannot assign
    any conntrack to the ICMP error packet. In that case the current
    nf_conntrack_icmp assigns appropriate conntrack to it. But the current
    code mistakes the direction of the packet. As a result, NAT code mistakes
    the address to be mangled.

    To fix the bug, this changes nf_conntrack_icmp not to assign a conntrack
    to such an ICMP error. In fact, no address needs to be mangled
    in this case.

    Spotted by Jordan Russell.

    Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Yasuyuki Kozakai
     
    nf_ct_get_tuple() requires the offset to the transport header, which burdens
    callers such as the icmp[v6] l4proto modules. This introduces a new function
    to simplify them.

    Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Yasuyuki Kozakai
     
    The icmp[v6] l4proto modules parse headers in ICMP[v6] errors to get the tuple,
    but they have to find the offset to the transport protocol header before that.
    Their processing is almost the same as prepare() of the l3proto modules.
    This makes prepare() more generic to simplify the icmp[v6] l4proto modules
    later.

    Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Yasuyuki Kozakai
     
  • Signed-off-by: Yasuyuki Kozakai
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Yasuyuki Kozakai
     
  • Add ethtool utility function to set or clear IPV6_CSUM feature flag.
    Modify tg3.c and bnx2.c to use this function when doing ethtool -K
    to change tx checksum.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
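
A helper that sets or clears a single feature flag, as this commit describes, might look roughly like the sketch below. The names struct toy_netdev, TOY_F_IPV6_CSUM and toy_set_tx_ipv6_csum, as well as the bit position, are assumptions made for illustration and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_IPV6_CSUM (1u << 4)   /* hypothetical bit position */

/* Hypothetical device with a feature bitmask. */
struct toy_netdev {
    uint32_t features;
};

/* Set the flag when enable is non-zero, clear it otherwise.
   Other feature bits are left untouched. */
int toy_set_tx_ipv6_csum(struct toy_netdev *dev, uint32_t enable)
{
    assert(dev != NULL);
    if (enable)
        dev->features |= TOY_F_IPV6_CSUM;
    else
        dev->features &= ~TOY_F_IPV6_CSUM;
    return 0;
}
```

A driver would presumably call such a helper from its ethtool tx-checksum handler instead of open-coding the bit manipulation.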
     
  • The accept_queue of an af_iucv socket will be corrupted, if
    adding and deleting of entries in this queue occurs at the
    same time (connect request from one client, while accept call
    is processed for another client).
    Solution: add locking when updating accept_q

    Signed-off-by: Ursula Braun
    Acked-by: Frank Pavlic
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • An iucv deadlock may occur, where one CPU is spinning on the
    iucv_table_lock for iucv_tasklet_fn(), while another CPU is holding
    the iucv_table_lock for an iucv_path_connect() and is waiting for
    the first CPU in an smp_call_function.
    Solution: replace spin_lock in iucv_tasklet_fn by spin_trylock and
    reschedule tasklet in case of non-granted lock.

    Signed-off-by: Ursula Braun
    Acked-by: Frank Pavlic
    Signed-off-by: David S. Miller

    Ursula Braun
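
The trylock-and-reschedule pattern from this commit can be sketched in user space with a C11 atomic_flag standing in for the spinlock. All names here (table_lock, toy_trylock, toy_unlock, toy_tasklet_fn and the two counters) are hypothetical; the kernel code uses its own spinlock and tasklet primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag table_lock = ATOMIC_FLAG_INIT;   /* stand-in for iucv_table_lock */
int work_done;     /* how many times the tasklet body ran */
int reschedules;   /* how many times the tasklet backed off */

/* Try to take the lock without waiting; true on success. */
bool toy_trylock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set(lock);
}

void toy_unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Returns true if the work ran, false if it was put off for later. */
bool toy_tasklet_fn(void)
{
    if (!toy_trylock(&table_lock)) {
        /* Do not spin: the lock holder may be waiting for this CPU.
           The kernel code would reschedule the tasklet at this point. */
        reschedules++;
        return false;
    }
    work_done++;
    toy_unlock(&table_lock);
    return true;
}
```

Because the function never waits for the lock, the CPU running it stays free to answer the cross-CPU call that the lock holder is waiting on, which is what breaks the cycle.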
     
  • Signed-off-by: Jennifer Hunt
    Signed-off-by: Ursula Braun
    Acked-by: Frank Pavlic
    Signed-off-by: David S. Miller

    Jennifer Hunt
     
  • This patch makes the needlessly global __inet_twsk_kill() static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • David S. Miller
     
  • Sangtae noticed the ssthresh got missed.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
    Fix the sizeof(ETH_ALEN) misuse introduced by my rtnl_link patches.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
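
Why sizeof(ETH_ALEN) is a bug: ETH_ALEN is an integer constant (6), so sizeof yields the size of an int rather than the length of a MAC address. The commit message does not show the affected call, so the memcpy below is an assumption, and copy_mac_buggy and copy_mac_fixed are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6   /* length of an Ethernet MAC address in bytes */

/* sizeof applied to an int constant gives sizeof(int), not 6. */
_Static_assert(sizeof(ETH_ALEN) == sizeof(int), "ETH_ALEN is an int constant");

/* Buggy variant: copies sizeof(int) bytes, typically 4 instead of 6. */
size_t copy_mac_buggy(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(ETH_ALEN));
    return sizeof(ETH_ALEN);
}

/* Fixed variant: copies the full address. */
size_t copy_mac_fixed(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, ETH_ALEN);
    return ETH_ALEN;
}
```

On common platforms the buggy variant silently handles only the first four bytes of the address, which is easy to miss in testing when the remaining bytes happen to match.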
     
    Add macvlan driver, which allows creating virtual ethernet devices
    based on MAC address.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The set_multicast_list function may be called without holding the rtnl
    mutex, resulting in races when changing the underlying device's promiscuous
    and allmulti state. Use the change_rx_mode hook, which is always invoked
    under the rtnl.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The method drivers currently use to synchronize multicast lists is not
    very pretty:

    - walk the multicast list
    - search each entry on a copy of the previous list
    - if new add to lower device
    - walk the copy of the previous list
    - search each entry on the current list
    - if removed delete from lower device
    - copy entire list

    This patch adds a new field to struct dev_addr_list to store the
    synchronization state and adds two helper functions for synchronization
    and cleanup.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
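
The idea of storing a synchronization state per address entry, instead of keeping a copy of the previous list, can be sketched as follows. This is a simplified model, not the kernel implementation: struct toy_addr, struct toy_dev, the in_use field and toy_sync are hypothetical, and the real code works on struct dev_addr_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8

/* One entry of the upper device's multicast list (toy model). */
struct toy_addr {
    int  addr;     /* stand-in for a multicast MAC address */
    bool in_use;   /* still wanted on the upper device (assumption) */
    bool synced;   /* already pushed to the lower device */
};

/* The lower device's filter, as a small unordered array. */
struct toy_dev {
    int    addrs[TOY_MAX];
    size_t count;
};

static void lower_add(struct toy_dev *d, int a)
{
    assert(d->count < TOY_MAX);
    d->addrs[d->count++] = a;
}

static void lower_del(struct toy_dev *d, int a)
{
    for (size_t i = 0; i < d->count; i++)
        if (d->addrs[i] == a) {
            d->addrs[i] = d->addrs[--d->count];
            return;
        }
}

bool lower_has(const struct toy_dev *d, int a)
{
    for (size_t i = 0; i < d->count; i++)
        if (d->addrs[i] == a)
            return true;
    return false;
}

/* One pass over the upper list; no copy of the previous list is needed. */
void toy_sync(struct toy_addr *list, size_t n, struct toy_dev *lower)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_addr *e = &list[i];
        if (e->in_use && !e->synced) {
            lower_add(lower, e->addr);      /* new entry */
            e->synced = true;
        } else if (!e->in_use && e->synced) {
            lower_del(lower, e->addr);      /* removed entry */
            e->synced = false;
        }
    }
}
```

Calling toy_sync repeatedly is idempotent in this model: entries already marked as synced are skipped, so only additions and removals since the last call touch the lower device.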
     
  • Currently the set_multicast_list (and set_rx_mode) callbacks are
    responsible for configuring the device according to the IFF_PROMISC,
    IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in
    case of set_rx_mode).

    These callbacks can be invoked from BH context without the rtnl_mutex
    by dev_mc_add/dev_mc_delete, which makes reading the device flags and
    promiscuous/allmulti count racy. For real hardware drivers that just
    commit all changes to the hardware this is not a real problem since
    the stack guarantees to call them for every change, so at least the
    final call will not race and commit the correct configuration to the
    hardware.

    For software devices that want to synchronize promiscuous and multicast
    state to an underlying device however this can cause corruption of the
    underlying device's flags or promisc/allmulti counts.

    When the software device is concurrently put in promiscuous or allmulti
    mode while set_multicast_list is invoked from bottom half context, the
    device might synchronize the change to the underlying device without
    holding the rtnl_mutex, which races with concurrent changes to the
    underlying device.

    Add a dev->change_rx_flags hook that is invoked when any of the flags
    that affect rx filtering change (under the rtnl_mutex), which allows
    drivers to perform synchronization immediately and only synchronize
    the address lists in set_multicast_list/set_rx_mode.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Subject: [patch] net/input: fix net/rfkill/rfkill-input.c bug on 64-bit systems

    this recent commit:

    commit cf4328cd949c2086091c62c5685f1580fe9b55e4
    Author: Ivo van Doorn
    Date: Mon May 7 00:34:20 2007 -0700

    [NET]: rfkill: add support for input key to control wireless radio

    added this 64-bit bug:

    ....
    unsigned int flags;

    spin_lock_irqsave(&task->lock, flags);
    ....

    irq 'flags' must be unsigned long, not unsigned int. The -rt tree has
    strict checks about this on 64-bit so this triggered a build failure.

    Signed-off-by: Ingo Molnar
    Signed-off-by: David S. Miller

    Ingo Molnar
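
One way to see why the width of 'flags' matters on 64-bit systems: a value of type unsigned long does not always survive being stored in an unsigned int. The sketch below is a user-space illustration only. toy_fits_unsigned_int is a hypothetical name, and the sketch does not model what spin_lock_irqsave actually stores in 'flags'.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true if a saved-state word survives a round trip through
   unsigned int. Where unsigned long is wider than unsigned int (for
   example on LP64 targets), values above UINT_MAX are truncated. */
bool toy_fits_unsigned_int(unsigned long saved)
{
    unsigned int narrow = (unsigned int) saved;   /* the too-narrow type */
    return (unsigned long) narrow == saved;
}
```

On a 32-bit build both types usually have the same width and the round trip always succeeds, which may be why a mistake of this kind goes unnoticed until a 64-bit build checks the type strictly.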
     
  • umounting partitions after heavy activity would sometimes trigger a
    segmentation violation. This fix appears to remove that problem.
    Fix originally provided by Latchesar Ionkov.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
     
  • If trans->write returns 0, p9_write_work goes through the error path, but
    sets the error code to zero.

    This patch sets the error code to EREMOTEIO if trans->write returns zero.

    Signed-off-by: Latchesar Ionkov

    Latchesar Ionkov
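
The error handling described in this commit can be modelled in a few lines. This is a sketch with hypothetical names (toy_write_fn, toy_write_work and the three stub transports); it only shows the mapping of a zero return value to -EREMOTEIO, not the actual p9_write_work code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EREMOTEIO
# define EREMOTEIO 121   /* Linux value; fallback for other hosts */
#endif

/* Hypothetical transport write hook: bytes written, 0, or negative errno. */
typedef int (*toy_write_fn)(const void *buf, int len);

/* Returns bytes written (> 0) or a negative errno, never 0. */
int toy_write_work(toy_write_fn write_fn, const void *buf, int len)
{
    assert(write_fn != NULL);
    int err = write_fn(buf, len);
    if (err == 0)
        err = -EREMOTEIO;   /* a zero-length write is treated as an error */
    return err;
}

/* Stub transports used to exercise the three cases. */
int toy_write_all(const void *buf, int len)  { (void) buf; return len; }
int toy_write_none(const void *buf, int len) { (void) buf; (void) len; return 0; }
int toy_write_fail(const void *buf, int len) { (void) buf; (void) len; return -EIO; }
```

With this mapping the caller's error path always sees a non-zero error code, so a write that makes no progress is no longer reported as success.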