08 Dec, 2010

7 commits

  • If inotify_init is unable to allocate a new file for the new inotify
    group we leak the new group. This patch drops the reference on the
    group on file allocation failure.

    Reported-by: Vegard Nossum
    cc: stable@kernel.org
    Signed-off-by: Eric Paris

    Eric Paris
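
The error path described in this commit can be modelled in userspace C. This is only a sketch under stated assumptions, not the kernel code: `struct group`, `group_new()`, `group_put()`, `alloc_file_stub()` and `init_sketch()` are hypothetical stand-ins for the reference-counted group and the file allocation step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted notification group. */
struct group {
    int refcount;
};

static int live_groups; /* groups allocated and not yet freed */

static struct group *group_new(void)
{
    struct group *g = malloc(sizeof(*g));
    if (!g)
        return NULL;
    g->refcount = 1;
    live_groups++;
    return g;
}

static void group_put(struct group *g)
{
    if (--g->refcount == 0) {
        live_groups--;
        free(g);
    }
}

/* Hypothetical file allocation step; fails when 'fail' is nonzero. */
static int alloc_file_stub(int fail)
{
    return fail ? -1 : 3; /* 3 is an arbitrary fake descriptor */
}

/* Returns a fake descriptor and stores the group in *out, or -1 on failure. */
static int init_sketch(int fail_alloc, struct group **out)
{
    struct group *g = group_new();
    int fd;

    if (!g)
        return -1;

    fd = alloc_file_stub(fail_alloc);
    if (fd < 0) {
        group_put(g); /* drop our reference so the group is not leaked */
        return -1;
    }
    *out = g;
    return fd;
}

static void example_usage(void)
{
    struct group *g = NULL;

    assert(init_sketch(1, &g) == -1 && live_groups == 0); /* failure path */
    assert(init_sketch(0, &g) == 3 && live_groups == 1);  /* success path */
    group_put(g);
    assert(live_groups == 0);
}
```

Without the `group_put()` call on the failure path, `live_groups` would stay at 1 after a failed allocation, which is the leak the message describes.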
     
  • When fanotify_release() is called, there may still be processes waiting for
    access permission. Currently only processes for which an event has already been
    queued into the group's access list will be woken up. Processes for which no
    event has been queued will continue to sleep and thus cause a deadlock when
    fsnotify_put_group() is called.
    Furthermore there is a race allowing further processes to be waiting on the
    access wait queue after wake_up (if they arrive before clear_marks_by_group()
    is called).
    This patch corrects this by setting a flag to inform processes that the group
    is about to be destroyed and thus not to wait for access permission.

    [additional changelog from eparis]
    Let's think about the 4 relevant code paths from the PoV of the
    'operator', 'listener', 'responder' and 'closer', where the operator is the
    process doing an action (like open/read) which could require permission.
    Listener is the task (or in this case thread) slated with reading from
    the fanotify file descriptor. The 'responder' is the thread responsible
    for responding to access requests. 'Closer' is the thread attempting to
    close the fanotify file descriptor.

    The 'operator' is going to end up in:
    fanotify_handle_event()
    get_response_from_access()
    (THIS BLOCKS WAITING ON USERSPACE)

    The 'listener' interesting code path
    fanotify_read()
    copy_event_to_user()
    prepare_for_access_response()
    (THIS CREATES AN fanotify_response_event)

    The 'responder' code path:
    fanotify_write()
    process_access_response()
    (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')

    The 'closer':
    fanotify_release()
    (SUPPOSED TO CLEAN UP THE REST OF THIS MESS)

    What we have today is that in the closer we remove all of the
    fanotify_response_events and set a bit so no more response events are
    ever created in prepare_for_access_response().

    The bug is that we never wake all of the operators up and tell them to
    move along. You fix that in fanotify_get_response_from_access(). You
    also fix other operators which haven't gotten there yet. So I agree
    that's a good fix.
    [/additional changelog from eparis]

    [remove additional changes to minimize patch size]
    [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • In mark_remove_from_mask() we destroy marks that have their event mask cleared.
    Thus we should not allow the creation of those marks in the first place.
    With this patch we check if the mask given by the user is 0 in case of FAN_MARK_ADD.
    If so we return an error. Same for FAN_MARK_REMOVE since this does not have any
    effect.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
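
The check described in this commit can be sketched as a small validation helper. This is an illustration only: `check_mark_mask()` and the `SKETCH_MARK_*` constants are hypothetical names chosen for this example, and the real syscall validates more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define SKETCH_MARK_ADD    0x1u
#define SKETCH_MARK_REMOVE 0x2u

/*
 * Reject an add or remove request whose event mask is empty:
 * adding such a mark would create one that is destroyed again,
 * and removing nothing has no effect.
 */
static int check_mark_mask(unsigned int flags, uint64_t mask)
{
    if ((flags & (SKETCH_MARK_ADD | SKETCH_MARK_REMOVE)) && mask == 0)
        return -EINVAL;
    return 0;
}

static void example_usage(void)
{
    assert(check_mark_mask(SKETCH_MARK_ADD, 0) == -EINVAL);
    assert(check_mark_mask(SKETCH_MARK_ADD, 0x20) == 0);
}
```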
     
  • If adding a mount or inode mark failed, fanotify_free_mark() is called explicitly.
    But at this time the mark has already been put into the destroy list of the
    fsnotify_mark kernel thread. If the thread is too slow it will try to decrease
    the reference of a mark that has already been freed by fanotify_free_mark().
    (If it's fast enough it will only decrease the mark's ref counter from 2 to 1 - note
    that the counter has been increased to 2 in add_mark() - which has practically no
    effect.)

    This patch fixes the ref counting by not calling free_mark() explicitly, but
    decreasing the ref counter and relying on the fsnotify_mark thread to clean up in
    case adding the mark has failed.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm()
    is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission
    checks, so a user can still disable permission checks by setting this flag
    in an open() call.
    This patch corrects this by unsetting the flag before fsnotify_perm is called.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • fanotify has decided to be careful about alignment and packing
    rather than rely on __attribute__((packed)) for multiarch support.
    Since this attribute isn't doing anything on fanotify_response we just
    drop it. This does not break API/ABI.

    Suggested-by: Tvrtko Ursulin
    Signed-off-by: Eric Paris

    Eric Paris
     
  • If no event was sent to userspace we cannot expect userspace to respond to
    permissions requests. Today such requests just hang forever. This patch will
    deny any permissions event which was unable to be sent to userspace.

    Reported-by: Tvrtko Ursulin
    Signed-off-by: Eric Paris

    Eric Paris
     

30 Nov, 2010

14 commits

  • Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc: Use call_rcu_sched() for pagetables

    Linus Torvalds
     
  • PowerPC relies on IRQ-disable to guard against RCU quiescent states,
    use the appropriate RCU call version.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Benjamin Herrenschmidt

    Peter Zijlstra
     
  • This reverts commit e0fdace10e75dac67d906213b780ff1b1a4cc360.

    On-list discussion seems to suggest that the robustness fixes for printk
    make this unnecessary and DaveM has also agreed in person at Kernel Summit
    and on list.

    The main problem with this code is that once we hit a lockdep splat we always
    keep oops_in_progress set. With KMS, the console layer uses oops_in_progress
    to decide when it should show the oops instead of X. This causes problems
    around suspend/resume time: a userspace resume can cause a console switch
    away from X, but only if oops_in_progress is set (which is what we want
    if an oops actually is in progress, but not because we had a lockdep splat
    2 days prior).

    Cc: David S Miller
    Cc: Ingo Molnar
    Signed-off-by: Dave Airlie
    Signed-off-by: Linus Torvalds

    Dave Airlie
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    tpm: Autodetect itpm devices

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    af_unix: limit recursion level
    pch_gbe driver: The wrong of initializer entry
    pch_gbe dreiver: chang author
    ucc_geth: fix ucc halt problem in half duplex mode
    inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
    ehea: Add some info messages and fix an issue
    hso: fix disable_net
    NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty
    cxgb4vf: fix setting unicast/multicast addresses ...
    net, ppp: Report correct error code if unit allocation failed
    DECnet: don't leak uninitialized stack byte
    au1000_eth: fix invalid address accessing the MAC enable register
    dccp: fix error in updating the GAR
    tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
    netns: Don't leak others' openreq-s in proc
    Net: ceph: Makefile: Remove unnessary code
    vhost/net: fix rcu check usage
    econet: fix CVE-2010-3848
    econet: fix CVE-2010-3850
    econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
    ...

    Linus Torvalds
     
  • …/git/tmlind/linux-omap-2.6

    * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
    OMAP2+: PM/serial: hold console semaphore while OMAP UARTs are disabled
    OMAP: UART: don't resume UARTs that are not enabled.

    Linus Torvalds
     
  • Some Lenovos have TPMs that require a quirk to function correctly. This can
    be autodetected by checking whether the device has a _HID of INTC0102. This
    is an invalid PNPid, and as such is discarded by the pnp layer - however
    it's still present in the ACPI code, so we can pull it out that way. This
    means that the quirk won't be automatically applied on non-ACPI systems,
    but without ACPI we don't have any way to identify the chip anyway so I
    don't think that's a great concern.

    Signed-off-by: Matthew Garrett
    Acked-by: Rajiv Andrade
    Tested-by: Jiri Kosina
    Tested-by: Andy Isaacson
    Signed-off-by: James Morris

    Matthew Garrett
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
    Btrfs: don't use migrate page without CONFIG_MIGRATION
    Btrfs: deal with DIO bios that span more than one ordered extent
    Btrfs: setup blank root and fs_info for mount time
    Btrfs: fix fiemap
    Btrfs - fix race between btrfs_get_sb() and umount
    Btrfs: update inode ctime when using links
    Btrfs: make sure new inode size is ok in fallocate
    Btrfs: fix typo in fallocate to make it honor actual size
    Btrfs: avoid NULL pointer deref in try_release_extent_buffer
    Btrfs: make btrfs_add_nondir take parent inode as an argument
    Btrfs: hold i_mutex when calling btrfs_log_dentry_safe
    Btrfs: use dget_parent where we can UPDATED
    Btrfs: fix more ESTALE problems with NFS
    Btrfs: handle NFS lookups properly
    btrfs: make 1-bit signed fileds unsigned
    btrfs: Show device attr correctly for symlinks
    btrfs: Set file size correctly in file clone
    btrfs: Check if dest_offset is block-size aligned before cloning file
    Btrfs: handle the space_cache option properly
    btrfs: Fix early enospc because 'unused' calculated with wrong sign.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    EDAC: Fix typos in Documentation/edac.txt
    EDAC, MCE: Fix edac_init_mce_inject error handling
    EDAC: Remove deprecated kbuild goal definitions

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
    GFS2: Userland expects quota limit/warn/usage in 512b blocks

    Linus Torvalds
     
  • It's easy to eat all kernel memory and trigger NMI watchdog, using an
    exploit program that queues unix sockets on top of others.

    lkml ref : http://lkml.org/lkml/2010/11/25/8

    This mechanism is used in applications, one choice we have is to have a
    recursion limit.

    Other limits might be needed as well (if we queue other types of files),
    since the passfd mechanism is currently limited by socket receive queue
    sizes only.

    Add a recursion_level to unix socket, allowing up to 4 levels.

    Each time we send a unix socket through sendfd mechanism, we copy its
    recursion level (plus one) to receiver. This recursion level is cleared
    when socket receive queue is emptied.

    Reported-by: Марк Коренберг
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
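
The recursion limit described in this commit can be modelled roughly as follows. This is a userspace sketch, not the kernel implementation: `struct sock_sketch`, `send_fd_sketch()` and `drain_sketch()` are hypothetical names, and keeping the maximum level on the receiver is an assumption of this sketch. Only the limit of 4 levels and the "level plus one" rule come from the message.

```c
#include <assert.h>

#define MAX_RECURSION_LEVEL 4 /* limit stated in the commit message */

/* Hypothetical model of a unix socket with a nesting depth. */
struct sock_sketch {
    int recursion_level;
    int queued; /* sockets waiting in the receive queue */
};

/* Queue 'passed' on 'receiver'. Returns 0 on success, -1 if nested too deep. */
static int send_fd_sketch(struct sock_sketch *receiver,
                          const struct sock_sketch *passed)
{
    int level = passed->recursion_level + 1;

    if (level > MAX_RECURSION_LEVEL)
        return -1;
    if (level > receiver->recursion_level)
        receiver->recursion_level = level;
    receiver->queued++;
    return 0;
}

/* Emptying the receive queue clears the recursion level. */
static void drain_sketch(struct sock_sketch *s)
{
    s->queued = 0;
    s->recursion_level = 0;
}

static void example_usage(void)
{
    struct sock_sketch s[6] = { { 0, 0 } };
    int i;

    /* Build a chain: s[0] queued on s[1], s[1] on s[2], and so on. */
    for (i = 0; i < 4; i++)
        assert(send_fd_sketch(&s[i + 1], &s[i]) == 0);
    assert(s[4].recursion_level == 4);

    /* A fifth level of nesting is refused. */
    assert(send_fd_sketch(&s[5], &s[4]) == -1);

    /* Once s[4] is drained it can be passed again. */
    drain_sketch(&s[4]);
    assert(send_fd_sketch(&s[5], &s[4]) == 0);
}
```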
     
  • The wrong initializer entry was corrected.

    Signed-off-by: Toshiharu Okada
    Reported-by: Dr. David Alan Gilbert
    Signed-off-by: David S. Miller

    Toshiharu Okada
     
  • This driver's AUTHOR was changed from "Masayuki Ohtake" to "Toshiharu Okada".
    The Kconfig was also updated, renaming "Topcliff" to "EG20T".

    Signed-off-by: Toshiharu Okada
    Signed-off-by: David S. Miller

    Toshiharu Okada
     

29 Nov, 2010

19 commits

  • Fixes compile error

    Signed-off-by: Chris Mason

    Chris Mason
     
  • In commit 58933c64(ucc_geth: Fix the wrong the Rx/Tx FIFO size),
    the UCC_GETH_UTFTT_INIT is set to 512 based on the recommendation
    of the QE Reference Manual. But that will sometimes cause tx halt
    while working in half duplex mode.

    According to errata draft QE_GENERAL-A003(High Tx Virtual FIFO
    threshold size can cause UCC to halt), setting UTFTT less than
    [(UTFS x (M - 8)/M) - 128] will prevent this from happening
    (M is the minimum buffer size).

    The patch changes UTFTT back to 256.

    Signed-off-by: Li Yang
    Cc: Jean-Denis Boyer
    Cc: Andreas Schmitz
    Cc: Anton Vorontsov
    Signed-off-by: David S. Miller

    Yang Li
     
  • inet sockets corresponding to passive connections are added to the bind hash
    using __inet_inherit_port(). These sockets are later removed from the bind
    hash using __inet_put_port(). These two functions are not exactly symmetrical.
    __inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
    __inet_inherit_port() does not increment them. This results in both of these
    going negative.

    This patch fixes this by calling inet_bind_hash() from __inet_inherit_port(),
    which does the right thing.

    'bsockets' and 'num_owners' were introduced by commit a9d8f9110d7e953c
    (inet: Allowing more than 64k connections and heavily optimize bind(0))

    Signed-off-by: Nagendra Singh Tomar
    Acked-by: Eric Dumazet
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Nagendra Tomar
     
  • This patch adds some debug information about ehea not being able to
    allocate enough spaces. Also it correctly updates the amount of available
    skb.

    Signed-off-by: Breno Leitao
    Signed-off-by: David S. Miller

    Breno Leitao
     
  • The new DIO bio splitting code has problems when the bio
    spans more than one ordered extent. This will happen as the
    generic DIO code merges our get_blocks calls together into
    a bigger single bio.

    This fixes things by walking forward in the ordered extent
    code finding all the overlapping ordered extents and completing them
    all at once.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This avoids some include-file hell, and the function isn't really
    important enough to be inlined anyway.

    Reported-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • And in particular, use it in 'pipe_fcntl()'.

    The other pipe functions do not need to use the 'careful' version, since
    they are only ever called for things that are already known to be pipes.

    The normal read/write/ioctl functions are called through the file
    operations structures, so if a file isn't a pipe, they'd never get
    called. But pipe_fcntl() is special, and called directly from the
    generic fcntl code, and needs to use the same careful function that the
    splice code is using.

    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Dave Jones
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • .. and change it to take the 'file' pointer instead of an inode, since
    that's what all users want anyway.

    The renaming is preparatory to exporting it to other users. The old
    'pipe_info()' name was too generic and is already used elsewhere, so
    before making the function public we need to use a more specific name.

    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Dave Jones
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Fix the software context switch counter
    perf, x86: Fixup Kconfig deps
    x86, perf, nmi: Disable perf if counters are not accessible
    perf: Fix inherit vs. context rotation bug

    Linus Torvalds
     
  • * 'fwnet' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
    firewire: net: throttle TX queue before running out of tlabels
    firewire: net: replace lists by counters
    firewire: net: fix memory leaks
    firewire: net: count stats.tx_packets and stats.tx_bytes

    Linus Torvalds
     
  • The HSO driver incorrectly creates a serial device instead of a net
    device when disable_net is set. It shouldn't create anything for the
    network interface.

    Signed-off-by: Filip Aben
    Reported-by: Piotr Isajew
    Reported-by: Johan Hovold
    Signed-off-by: David S. Miller

    Filip Aben
     
  • We register lapb when tty is created, but unregister it only when the
    device is UP. So move the lapb_unregister to x25_asy_close_tty after
    the device is down.

    The old behaviour causes ldisc switching to fail on every second attempt:
    we record that the device is unused, so we reuse it the second time,
    but the lapb layer still has it registered, so the registration
    fails.

    Signed-off-by: Jiri Slaby
    Reported-by: Sergey Lapin
    Cc: Andrew Hendry
    Tested-by: Sergey Lapin
    Tested-by: Mikhail Ulyanov
    Signed-off-by: David S. Miller

    Jiri Slaby
     
  • We were truncating the number of unicast and multicast MAC addresses
    supported. Additionally, we were incorrectly computing the MAC Address
    hash (a "1 << N" where we needed a "1ULL << N").

    Signed-off-by: Casey Leedom
    Signed-off-by: David S. Miller

    Casey Leedom
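
The "1 << N" versus "1ULL << N" distinction mentioned in this commit can be shown in isolation. In C the literal `1` has type `int`, so shifting it by 32 or more bits is undefined on platforms with a 32-bit `int`, and the upper half of a 64-bit hash filter can never be set that way. The helper names below are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Return a 64-bit value with only bit 'n' (0..63) set. */
static uint64_t hash_bit(unsigned int n)
{
    return 1ULL << n; /* 1ULL keeps the shift in 64-bit arithmetic */
}

/* Add hash bucket 'n' to a 64-bit filter. */
static uint64_t add_to_filter(uint64_t filter, unsigned int n)
{
    return filter | hash_bit(n);
}

static void example_usage(void)
{
    assert(hash_bit(0) == 1);
    assert(hash_bit(40) == 0x10000000000ULL);
    assert(add_to_filter(hash_bit(3), 40) == 0x10000000008ULL);
}
```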
     
  • Allocating a unit from the idr might return several error codes,
    not only -EAGAIN, so the code should not be changed but returned
    as is. At the same time, the unit release procedure should be invoked
    only if the device is unregistering.

    Signed-off-by: Cyrill Gorcunov
    CC: Paul Mackerras
    Signed-off-by: David S. Miller

    Cyrill Gorcunov
     
  • A single uninitialized padding byte is leaked to userspace.

    Signed-off-by: Dan Rosenberg
    CC: stable
    Signed-off-by: David S. Miller

    Dan Rosenberg
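
For readers unfamiliar with this class of bug, here is a generic sketch of how a padding byte can carry stale data and one common way to avoid it. It is not the DECnet code and not necessarily the change this commit made: `struct reply_sketch` and `fill_reply()` are hypothetical, and the sketch assumes the usual compiler behaviour of leaving padding untouched when members are assigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reply structure. On common ABIs the compiler inserts
 * padding bytes between 'kind' and 'value' to align 'value'.
 */
struct reply_sketch {
    unsigned char kind;
    unsigned int  value;
};

/* Fill 'out' so that every byte, padding included, has a defined value. */
static void fill_reply(struct reply_sketch *out, unsigned char kind,
                       unsigned int value)
{
    memset(out, 0, sizeof(*out)); /* clears padding bytes too */
    out->kind = kind;
    out->value = value;
}

static void example_usage(void)
{
    struct reply_sketch r;

    memset(&r, 0xAA, sizeof(r)); /* simulate stale stack contents */
    fill_reply(&r, 7, 42);
    assert(r.kind == 7 && r.value == 42);
}
```

If the `memset()` in `fill_reply()` were omitted, copying the whole structure to another address space would also copy whatever happened to be in the padding.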
     
  • "aup->enable" already holds the address pointing to the MAC enable
    register. The bug was introduced by commit d0e7cb:

    "au1000-eth: remove volatiles, switch to I/O accessors".

    CC: Florian Fainelli
    Signed-off-by: Wolfgang Grandegger
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Wolfgang Grandegger
     
  • This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
    the current implementation does not track the greatest received value -
    lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
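
The idea of tracking the greatest received value can be sketched as below. This is an illustration, not the kernel code: `delta48()` and `update_gar()` are hypothetical helpers, and the sketch assumes acknowledgment numbers are 48-bit values that wrap around, compared by their signed circular distance.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Signed circular distance from a to b in 48-bit sequence space. */
static int64_t delta48(uint64_t a, uint64_t b)
{
    uint64_t d = (b - a) & SEQ48_MASK;

    if (d & (UINT64_C(1) << 47))
        return (int64_t)d - (INT64_C(1) << 48);
    return (int64_t)d;
}

/* Keep the greater of the stored value and the newly received one. */
static uint64_t update_gar(uint64_t gar, uint64_t ackno)
{
    return delta48(gar, ackno) > 0 ? ackno : gar;
}

static void example_usage(void)
{
    assert(update_gar(100, 110) == 110); /* newer value replaces */
    assert(update_gar(100, 90) == 100);  /* lower value is ignored */
}
```

An unconditional assignment in place of `update_gar()` would reproduce the behaviour the message describes, where a lower value erases a higher one.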
     
  • David S. Miller
     
  • tcp_win_from_space() does the following:

    if (sysctl_tcp_adv_win_scale <= 0)
        return space >> (-sysctl_tcp_adv_win_scale);
    else
        return space - (space >> sysctl_tcp_adv_win_scale);

    "space" is int.

    As per C99 6.5.7 (3) shifting int for 32 or more bits is
    undefined behaviour.

    Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
    space >> 32 equals space and function returns 0.

    Which means we busyloop in tcp_fixup_rcvbuf().

    Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

    Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

    Steps to reproduce:

    echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
    wget www.kernel.org
    [softlockup]

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
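
The arithmetic and the range restriction from this commit can be tried out in userspace. This is a sketch only: the scale is passed as a parameter instead of being read from a sysctl, and `scale_is_valid()` and `win_from_space()` are hypothetical names. The sketch also assumes `space` is non-negative, since right-shifting a negative `int` is implementation-defined.

```c
#include <assert.h>

#define ADV_WIN_SCALE_MIN (-31)
#define ADV_WIN_SCALE_MAX 31

/* Range check matching the [-31, 31] restriction in the message. */
static int scale_is_valid(int scale)
{
    return scale >= ADV_WIN_SCALE_MIN && scale <= ADV_WIN_SCALE_MAX;
}

/*
 * Same arithmetic as the snippet quoted in the message.
 * The caller must ensure scale_is_valid(scale) and space >= 0,
 * otherwise the shift count may reach 32 and the result is undefined.
 */
static int win_from_space(int space, int scale)
{
    if (scale <= 0)
        return space >> (-scale);
    else
        return space - (space >> scale);
}

static void example_usage(void)
{
    assert(!scale_is_valid(32));             /* the value from the bug report */
    assert(win_from_space(1000, 1) == 500);  /* 1000 - (1000 >> 1) */
    assert(win_from_space(1000, 2) == 750);  /* 1000 - (1000 >> 2) */
}
```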