11 Mar, 2011

1 commit

  • Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous. This means that when we
    exit, we want all references to the rpc_task to be gone, and we want
    any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that the
    callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that
    rpciod calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases between
    the waiting process and rpciod.
    The common case where nobody is waiting for completion is optimised by
    checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
    reference count is 1: in those cases we skip grabbing the spin lock
    and immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a new
    helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
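
The ordering rule described in the commit above can be modelled in user space. The following is a minimal sketch, not the kernel implementation: `struct task_sketch`, `task_init()`, `task_put()` and the `slow_puts` counter are hypothetical names, and a C11 `atomic_flag` stands in for the waitqueue spin lock. It only illustrates the fast path (flag cleared or reference count of 1) versus the locked slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rpc_task; not the kernel layout. */
struct task_sketch {
    atomic_int  refcount;   /* references currently held on the task */
    bool        is_async;   /* models the RPC_TASK_ASYNC flag */
    atomic_flag waitq_lock; /* models the existing waitqueue spin lock */
    int         slow_puts;  /* how many puts took the locked path */
    bool        released;   /* set once the release callback has run */
};

static void task_init(struct task_sketch *t, int refs, bool is_async)
{
    assert(refs > 0);
    atomic_init(&t->refcount, refs);
    t->is_async = is_async;
    atomic_flag_clear(&t->waitq_lock);
    t->slow_puts = 0;
    t->released = false;
}

/* Drop one reference; run the release step when the last one goes away. */
static void task_put(struct task_sketch *t)
{
    bool last;

    /* Fast path: nobody can be waiting on this task, so skip the lock. */
    if (!t->is_async || atomic_load(&t->refcount) == 1) {
        if (atomic_fetch_sub(&t->refcount, 1) == 1)
            t->released = true;
        return;
    }

    /* Slow path: order this put against a waiter via the shared lock. */
    while (atomic_flag_test_and_set(&t->waitq_lock))
        ; /* spin until the lock is free */
    t->slow_puts++;
    last = atomic_fetch_sub(&t->refcount, 1) == 1;
    atomic_flag_clear(&t->waitq_lock);

    if (last)
        t->released = true;
}
```

With two references on an async task, the first `task_put()` takes the locked path and the second takes the fast path and releases the task.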
     

08 Mar, 2011

2 commits


07 Mar, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: hda - Don't set to D3 in Cirrus errata init verbs
    ALSA: hda - add new Fermi 5xx codec IDs to snd-hda
    ASoC: WM8994: Ensure late enable events are processed for the ADCs
    ASoC: WM8994: Don't disable the AIF[1|2]CLK_ENA unconditionaly
    ASoC: Fix WM9081 platform data initialisation
    ALSA: hda - Fix unable to record issue on ASUS N82JV
    ALSA: HDA: Realtek: Fixup jack detection to input subsystem

    Linus Torvalds
     
  • If a virtio-console device gets unplugged while a port is open, a
    subsequent close() call on the port accesses vqs to free up buffers.
    This can lead to a crash.

    The buffers are already freed up as a result of the call to
    unplug_ports() from virtcons_remove(). The fix is to simply not access
    vq information if port->portdev is NULL.

    Reported-by: juzhang
    CC: stable@kernel.org
    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Amit Shah
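
The guard described in the commit above amounts to an early return when the port no longer has a parent device. A minimal sketch, assuming hypothetical types (`struct port_sketch`, `struct portdev_sketch`) and a hypothetical `discard_port_data_sketch()` helper rather than the real virtio-console structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver structures are much larger. */
struct portdev_sketch { int id; };

struct port_sketch {
    struct portdev_sketch *portdev; /* NULL once the device is unplugged */
    int queued_bufs;                /* buffers still sitting in the queue */
};

/* Returns how many queued buffers were discarded on close. */
static int discard_port_data_sketch(struct port_sketch *port)
{
    int n;

    assert(port != NULL);

    /* Device already gone: its buffers were freed at unplug time,
     * so there is no queue state left that is safe to touch. */
    if (!port->portdev)
        return 0;

    n = port->queued_bufs;
    port->queued_bufs = 0;
    return n;
}
```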
     

06 Mar, 2011

2 commits


05 Mar, 2011

22 commits

  • Pass down the correct node for a transparent hugepage allocation. Most
    callers continue to use the current node; the khugepaged daemon, however,
    now uses the node that the first page to be collapsed resided on instead.
    This ensures that khugepaged does not mess up local memory for an
    existing process which uses local policy.

    The choice of node is somewhat primitive currently: it just uses the
    node of the first page in the pmd range. An alternative would be to
    look at multiple pages and use the most popular node. I used the
    simplest variant for now which should work well enough for the case of
    all pages being on the same node.

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Andrea Arcangeli
    Signed-off-by: Andi Kleen
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
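
The two node-selection strategies mentioned in the commit above (node of the first page versus the most popular node) can be compared with a small sketch. Everything here is illustrative: `MAX_NODES_SKETCH`, `first_page_node()` and `most_popular_node()` are hypothetical names, and the input is simply an array holding the node id of each page in the range.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES_SKETCH 8 /* assumed upper bound on node ids */

/* Strategy used by the commit: take the node of the first page. */
static int first_page_node(const int *page_nodes, size_t n)
{
    assert(page_nodes != NULL || n == 0);
    return n ? page_nodes[0] : -1;
}

/* Alternative mentioned in the commit: the node holding the most pages.
 * On a tie, the node that reached the winning count first is kept. */
static int most_popular_node(const int *page_nodes, size_t n)
{
    size_t count[MAX_NODES_SKETCH] = { 0 };
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        int node = page_nodes[i];

        if (node < 0 || node >= MAX_NODES_SKETCH)
            continue; /* ignore ids outside the assumed range */
        count[node]++;
        if (best < 0 || count[node] > count[best])
            best = node;
    }
    return best;
}
```

When every page sits on the same node both functions agree, which is the case the commit message says the simple variant handles well.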
     
  • This makes a difference for LOCAL policy, where the node cannot be
    determined from the policy itself, but has to be gotten from the original
    page.

    Acked-by: Andrea Arcangeli
    Signed-off-by: Andi Kleen
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Add an alloc_page_vma_node() helper that allows passing the "local" node
    in. Used in a follow-on patch.

    Acked-by: Andrea Arcangeli
    Signed-off-by: Andi Kleen
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Currently alloc_pages_vma() always uses the local node as policy node for
    the LOCAL policy. Pass this node down as an argument instead.

    No behaviour change from this patch, but it will be needed for follow-on
    patches.

    Acked-by: Andrea Arcangeli
    Signed-off-by: Andi Kleen
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Signed-off-by: Axel Lin
    Cc: Haavard Skinnemoen
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • Add maintainer of Samsung Mobile machine support. Currently, Aquila,
    Goni, Universal (C210), and Nuri board are supported.

    Signed-off-by: Kyungmin Park
    Cc: Joe Perches
    Cc: "David S. Miller"
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyungmin Park
     
  • This driver causes hard lockups when the active clock source is jiffies.

    The reason is that it loops with interrupts disabled, waiting for a
    timestamp to be reached by polling getnstimeofday(). With a jiffies
    clocksource, however, when that code runs on the CPU which is
    responsible for updating jiffies, we loop forever simply because
    the timer interrupt cannot update jiffies. So both UP and SMP can be
    affected.

    There is no easy fix for that problem so make it depend on BROKEN for
    now.

    Signed-off-by: Thomas Gleixner
    Cc: Alexander Gordeev
    Cc: Rodolfo Giometti
    Cc: john stultz
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The device table is required to load modules based on modaliases.

    Signed-off-by: Axel Lin
    Cc: Shubhrajyoti D
    Cc: Christoph Mair
    Cc: Jonathan Cameron
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

    [akpm@linux-foundation.org: avoid multiple return points]
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: David Rientjes
    Cc: Miao Xie
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
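
The fix described in the commit above (release the lock on the failure path, through a single return point) follows a common C pattern. A minimal user-space sketch, where `lock_depth`, `lock_sketch()`, `unlock_sketch()` and `update_sketch()` are hypothetical stand-ins and a counter models the mutex:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int lock_depth; /* models the mutex: 1 while held, 0 otherwise */

static void lock_sketch(void)   { lock_depth++; }
static void unlock_sketch(void) { lock_depth--; }

/* Every exit goes through the 'out' label, so the lock is always dropped. */
static int update_sketch(bool simulate_alloc_failure)
{
    int ret = 0;
    void *trial;

    lock_sketch();
    assert(lock_depth == 1);

    trial = simulate_alloc_failure ? NULL : malloc(64);
    if (!trial) {
        ret = -ENOMEM;
        goto out; /* an early 'return' here would leak the lock */
    }

    /* ... validate and apply the trial configuration here ... */
    free(trial);
out:
    unlock_sketch();
    return ret;
}
```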
     
  • Fix s3c_rtc_setaie() prototype to eliminate the following compile
    warning:

    drivers/rtc/rtc-s3c.c:383: warning: initialization from incompatible pointer type

    (akpm: the rtc_class_ops.alarm_irq_enable() handler is being passed two
    arguments where it expects just one, presumably with undesired effects)

    Signed-off-by: Axel Lin
    Cc: Alessandro Zummo
    Cc: Ben Dooks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
    Blackfin: iflush: update anomaly 05000491 workaround
    Blackfin: outs[lwb]: make sure count is greater than 0

    Linus Torvalds
     
  • …nel/git/lethal/sh-2.6

    * 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    ARM: mach-shmobile: mackerel: modify LCDC clock divider value
    ARM: mach-shmobile: ap4evb: modify LCDC clock divider value
    ARM: mach-shmobile: mackerel: fixup memory initialize for zboot
    ARM: mach-shmobile: ap4evb: fixup memory initialize for zboot
    ARM: mach-shmobile: Add sh73a0 MIPI-CSI and CEU clocks
    ARM: mach-shmobile: AG5EVM MIPI-DSI LCD reset delay fix

    Linus Torvalds
     
  • * 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Change __nosave_XXX symbols to long
    sh: Flush executable pages in copy_user_highpage
    sh: Ensure ST40-300 BogoMIPS value is consistent
    sh: sh7750: Fix incompatible pointer type
    sh: sh7750: move machtypes.h to include/generated

    Linus Torvalds
     
  • * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm/nouveau: allocate kernel's notifier object at end of block

    Linus Torvalds
     
  • The "bad_page()" page allocator sanity check was reported to trigger
    recently (call chain as follows):

    bad_page+0x69/0x91
    free_hot_cold_page+0x81/0x144
    skb_release_data+0x5f/0x98
    __kfree_skb+0x11/0x1a
    tcp_ack+0x6a3/0x1868
    tcp_rcv_established+0x7a6/0x8b9
    tcp_v4_do_rcv+0x2a/0x2fa
    tcp_v4_rcv+0x9a2/0x9f6
    do_timer+0x2df/0x52c
    ip_local_deliver+0x19d/0x263
    ip_rcv+0x539/0x57c
    netif_receive_skb+0x470/0x49f
    :virtio_net:virtnet_poll+0x46b/0x5c5
    net_rx_action+0xac/0x1b3
    __do_softirq+0x89/0x133
    call_softirq+0x1c/0x28
    do_softirq+0x2c/0x7d
    do_IRQ+0xec/0xf5
    default_idle+0x0/0x50
    ret_from_intr+0x0/0xa
    default_idle+0x29/0x50
    cpu_idle+0x95/0xb8
    start_kernel+0x220/0x225
    _sinittext+0x22f/0x236

    It occurs because an skb with a fraglist was freed from the tcp
    retransmit queue when it was acked, but a page on that fraglist had
    PG_Slab set (indicating it was allocated from the Slab allocator), which
    means the free path above can't safely free it via put_page.

    We tracked this back to an nfsv4 setacl operation, in which the nfs code
    attempted to convert the passed-in buffer to an array of pages in
    __nfs4_proc_set_acl, which gets used by the skb->frags list in
    xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer
    to a page struct via virt_to_page, but the vfs allocates the buffer via
    kmalloc, meaning the PG_slab bit is set. We can't create a buffer with
    kmalloc and free it later in the tcp ack path with put_page, so we need
    to either:

    1) ensure that when we create the list of pages, no page struct has
    PG_Slab set

    or

    2) not use a page list to send this data

    Given that these buffers can be multiple pages and arbitrarily sized, I
    think (1) is the right way to go. I've written the below patch to
    allocate a page from the buddy allocator directly and copy the data over
    to it. This ensures that we have a put_page free-able page for every
    entry that winds up on an skb frag list, so it can be safely freed when
    the frame is acked. We do a put_page on each entry after the
    rpc_call_sync call so as to drop our own reference count to the page,
    leaving only the ref count taken by tcp_sendpages. This way the data
    will be properly freed when the ack comes in.

    Successfully tested by myself to solve the above oops.

    Note, as this is the result of a setacl operation that exceeded a page
    of data, I think this amounts to a local DoS triggerable by an
    unprivileged user, so I'm CCing security on this as well.

    Signed-off-by: Neil Horman
    CC: Trond Myklebust
    CC: security@kernel.org
    CC: Jeff Layton
    Signed-off-by: Linus Torvalds

    Neil Horman
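
Option (1) from the commit above, copying the caller's buffer into separately allocated pages, can be sketched in user space. This is an illustration under assumptions, not the NFS code: `PAGE_SZ_SKETCH` is an assumed page size, `calloc()`/`free()` stand in for page allocation and put_page, and `buf_to_pages_sketch()` and `put_pages_sketch()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ_SKETCH 4096u /* assumed page size */

/* Copy 'len' bytes from 'buf' into freshly allocated page-sized chunks.
 * Returns the number of pages filled, or -1 on failure (nothing leaked). */
static int buf_to_pages_sketch(const void *buf, size_t len,
                               void **pages, size_t max_pages)
{
    const unsigned char *src = buf;
    size_t npages = 0;

    assert(pages != NULL);

    while (len > 0) {
        size_t chunk = len < PAGE_SZ_SKETCH ? len : PAGE_SZ_SKETCH;

        if (npages == max_pages)
            goto fail;
        pages[npages] = calloc(1, PAGE_SZ_SKETCH);
        if (!pages[npages])
            goto fail;
        memcpy(pages[npages], src, chunk);
        npages++;
        src += chunk;
        len -= chunk;
    }
    return (int)npages;

fail:
    while (npages > 0)
        free(pages[--npages]);
    return -1;
}

/* Drop the caller's hold on each page once the data has been handed off. */
static void put_pages_sketch(void **pages, int n)
{
    for (int i = 0; i < n; i++)
        free(pages[i]);
}
```

The point of the copy is that every entry in the page array comes from the same allocator that the eventual free path expects, regardless of how the original buffer was allocated.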
     
  • Otherwise you can do things like

    # mkdir .snap/foo
    # cd .snap/foo/.snap
    # ls

    Signed-off-by: Sage Weil

    Sage Weil
     
  • The standby logic used to be pretty dependent on the work requeueing
    behavior that changed when we switched to WQ_NON_REENTRANT. It was also
    very fragile.

    Restructure things so that:
    - We clear WRITE_PENDING when we set STANDBY. This ensures we will
    requeue work when we wake up later.
    - con_work backs off if STANDBY is set. There is nothing to do if we are
    in standby.
    - clear_standby() helper is called by both con_send() and con_keepalive(),
    the two actions that can wake us up again. Move the connect_seq++
    logic here.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • There was some broken keepalive code using a dead variable. Shift to using
    the proper bit flag.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • With commit f363e45f we replaced a bunch of hacky workqueue mutual
    exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is
    that the exponential backoff breaks in certain cases:

    * con_work attempts to connect.
    * we get an immediate failure, and the socket state change handler queues
    immediate work.
    * con_work calls con_fault, we decide to back off, but can't queue delayed
    work.

    In this case, we add a BACKOFF bit to make con_work reschedule delayed work
    next time it runs (which should be immediately).

    Signed-off-by: Sage Weil

    Sage Weil
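
The BACKOFF bit described in the commit above can be modelled as a small state machine. This is a simplified sketch, not the messenger code: `struct conn_sketch`, `queue_work_sketch()`, `con_fault_sketch()`, `con_work_sketch()` and the delay constants are hypothetical, and a single `work_pending` flag stands in for the workqueue refusing to queue work that is already pending.

```c
#include <assert.h>
#include <stdbool.h>

enum { BACKOFF_SKETCH = 1u << 0 };

#define BASE_DELAY_SKETCH 1u  /* assumed initial delay, arbitrary units */
#define MAX_DELAY_SKETCH  64u /* assumed cap on the delay */

struct conn_sketch {
    unsigned flags;
    unsigned delay;        /* current backoff delay */
    bool     work_pending; /* work is already queued */
    unsigned queued_delay; /* delay of the pending work, 0 = immediate */
};

/* Queueing fails if work is already pending, as in the scenario above. */
static bool queue_work_sketch(struct conn_sketch *c, unsigned delay)
{
    if (c->work_pending)
        return false;
    c->work_pending = true;
    c->queued_delay = delay;
    return true;
}

/* On a fault, grow the delay; if delayed work cannot be queued, set BACKOFF. */
static void con_fault_sketch(struct conn_sketch *c)
{
    if (c->delay == 0)
        c->delay = BASE_DELAY_SKETCH;
    else if (c->delay * 2 <= MAX_DELAY_SKETCH)
        c->delay *= 2;
    else
        c->delay = MAX_DELAY_SKETCH;

    if (!queue_work_sketch(c, c->delay))
        c->flags |= BACKOFF_SKETCH;
}

/* The next run sees BACKOFF and requeues itself with the delay. */
static void con_work_sketch(struct conn_sketch *c)
{
    assert(c->work_pending);
    c->work_pending = false; /* this run consumes the pending work */

    if (c->flags & BACKOFF_SKETCH) {
        c->flags &= ~BACKOFF_SKETCH;
        queue_work_sketch(c, c->delay);
        return;
    }
    /* ... a connection attempt would go here ... */
}
```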
     
  • Signed-off-by: Dave Kleikamp
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • They are only used inside kernel/ptrace.c, and have been for a long
    time. We don't want to go back to the bad-old-days when architectures
    did things on their own, so make them static and private.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Mar, 2011

11 commits

  • Recent feedback from design says we need three NOPs in the hardware loop.

    Signed-off-by: Mike Frysinger

    Mike Frysinger
     
  • Some devices will use the outs* funcs with a length of zero, so make sure
    we do not write any data in that case.

    Reported-by: Gilbert Inho
    Signed-off-by: Mike Frysinger

    Mike Frysinger
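
The zero-length case from the commit above is easy to get wrong when the loop body runs before the count is checked. A minimal C sketch of a string-output helper that writes nothing when the count is zero; `outsb_sketch()` is a hypothetical name and an ordinary variable stands in for the I/O port:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write 'count' bytes from 'buf' to 'port'; returns the number written.
 * The count is tested before the first write, so zero means no access. */
static size_t outsb_sketch(volatile uint8_t *port, const uint8_t *buf,
                           size_t count)
{
    size_t written = 0;

    assert(buf != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        *port = buf[i];
        written++;
    }
    return written;
}
```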
     
  • The mackerel WVGA LCDC panel expects a 33.3MHz dot-clock,
    but the current dot-clock was 50.0MHz.
    This patch modifies the clock divider value.

    Signed-off-by: Makoto Ueda
    Signed-off-by: Kuninori Morimoto
    Signed-off-by: Paul Mundt

    Kuninori Morimoto
     
  • The ap4evb WVGA LCDC panel expects a 33.3MHz dot-clock,
    but the current dot-clock was 50.0MHz.
    This patch modifies the clock divider value.

    Signed-off-by: Makoto Ueda
    Signed-off-by: Kuninori Morimoto
    Signed-off-by: Paul Mundt

    Kuninori Morimoto
     
  • The nv30/nv40 3d driver is about to start using DMA_FENCE from the 3D
    object which, it turns out, doesn't like its DMA object to not be
    aligned to a 4KiB boundary.

    Signed-off-by: Ben Skeggs
    Signed-off-by: Dave Airlie

    Ben Skeggs
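
The 4KiB alignment requirement mentioned in the commit above comes down to standard power-of-two rounding. The sketch below is generic arithmetic, not the nouveau allocator: `align_up_sketch()`, `align_down_sketch()` and `place_at_end_sketch()` are hypothetical helpers, and placing the object at the highest aligned offset that still fits is only an assumption about what "end of block" could mean.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_4K_SKETCH 4096u

/* Round x up to the next multiple of a (a must be a power of two). */
static uint32_t align_up_sketch(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Round x down to the previous multiple of a (a must be a power of two). */
static uint32_t align_down_sketch(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return x & ~(a - 1);
}

/* Highest 4KiB-aligned offset at which 'size' bytes still fit in the block. */
static uint32_t place_at_end_sketch(uint32_t block_size, uint32_t size)
{
    assert(size <= block_size);
    return align_down_sketch(block_size - size, ALIGN_4K_SKETCH);
}
```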
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
    MAINTAINERS: Add Andy Gospodarek as co-maintainer.
    r8169: disable ASPM
    RxRPC: Fix v1 keys
    AF_RXRPC: Handle receiving ACKALL packets
    cnic: Fix lost interrupt on bnx2x
    cnic: Prevent status block race conditions with hardware
    net: dcbnl: check correct ops in dcbnl_ieee_set()
    e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
    igb: fix sparse warning
    e1000: fix sparse warning
    netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
    dccp: fix oops on Reset after close
    ipvs: fix dst_lock locking on dest update
    davinci_emac: Add Carrier Link OK check in Davinci RX Handler
    bnx2x: update driver version to 1.62.00-6
    bnx2x: properly calculate lro_mss
    bnx2x: perform statistics "action" before state transition.
    bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
    bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
    bnx2x: Fix nvram test for single port devices.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: kill loop_mutex
    blktrace: Remove blk_fill_rwbs_rq.
    block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()
    block: add @force_kblockd to __blk_run_queue()
    block: fix kernel-doc format for blkdev_issue_zeroout
    blk-throttle: Do not use kblockd workqueue for throtl work

    Linus Torvalds
     
  • * 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    hfs: fix rename() over non-empty directory
    udf: fix i_nlink limit
    fix reiserfs mkdir() breakage
    exofs: i_nlink races in rename()
    nilfs2: i_nlink races in rename()
    minix: i_nlink races in rename()
    ufs: i_nlink races in rename()
    sysv: i_nlink races in rename()

    Linus Torvalds
     
  • When a DNS resolver key is instantiated with an error indication, attempts to
    read that key will result in an oops because user_read() is expecting there to
    be a payload - and there isn't one [CVE-2011-1076].

    Give the DNS resolver key its own read handler that returns the error cached in
    key->type_data.x[0] as an error rather than crashing.

    Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
    amount of data it prints, since the data is not necessarily NUL-terminated.

    The buggy code was added in:

    commit 4a2d789267e00b5a1175ecd2ddefcc78b83fbf09
    Author: Wang Lei
    Date: Wed Aug 11 09:37:58 2010 +0100
    Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]

    This can trivially be reproduced by any user with the following program
    compiled with -lkeyutils:

    #include <stdlib.h>
    #include <keyutils.h>
    #include <err.h>
    static char payload[] = "#dnserror=6";
    int main()
    {
            key_serial_t key;
            key = add_key("dns_resolver", "a", payload, sizeof(payload),
                          KEY_SPEC_SESSION_KEYRING);
            if (key == -1)
                    err(1, "add_key");
            if (keyctl_read(key, NULL, 0) == -1)
                    err(1, "read_key");
            return 0;
    }

    What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:

    dns-break: read_key: No such device or address

    but instead the kernel oopses.

    This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands
    as both of those cut the data down below the NUL termination that must be
    included in the data. Without this dns_resolver_instantiate() will return
    -EINVAL and the key will not be instantiated such that it can be read.

    The oops looks like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: [] user_read+0x4f/0x8f
    PGD 3bdf8067 PUD 385b9067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
    CPU 0
    Modules linked in:

    Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY
    RIP: 0010:[] [] user_read+0x4f/0x8f
    RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246
    RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
    RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
    R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
    FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
    Stack:
    ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
    ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
    00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
    Call Trace:
    [] keyctl_read_key+0xac/0xcf
    [] sys_keyctl+0x75/0xb6
    [] system_call_fastpath+0x16/0x1b
    Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
    RIP [] user_read+0x4f/0x8f
    RSP
    CR2: 0000000000000010

    Signed-off-by: David Howells
    Acked-by: Jeff Layton
    cc: Wang Lei
    Signed-off-by: James Morris

    David Howells
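
The read handler described in the commit above checks for a cached error before touching the payload. A minimal user-space model, assuming a hypothetical `struct key_sketch` and `dns_read_sketch()`; here the cached error is stored as a positive errno value and returned negated, which is a choice of this sketch rather than a statement about the kernel's representation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key; not the kernel's struct key. */
struct key_sketch {
    int         cached_error; /* 0 if the key carries a real payload */
    const char *payload;      /* NULL when instantiated with an error */
    size_t      payload_len;
};

/* Returns the payload length, or a negative errno for an error key. */
static long dns_read_sketch(const struct key_sketch *key,
                            char *buf, size_t buflen)
{
    assert(key != NULL);

    /* Error keys have no payload: report the cached error instead of
     * dereferencing a payload pointer that was never set. */
    if (key->cached_error)
        return -key->cached_error;

    /* Copy out only if the caller supplied a large enough buffer. */
    if (buf && buflen >= key->payload_len)
        memcpy(buf, key->payload, key->payload_len);
    return (long)key->payload_len;
}
```

In the reproducer's terms, a key created from "#dnserror=6" would take the first branch, so the read reports ENXIO rather than reaching the payload copy.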
     
  • If we mark the connection CLOSED we will give up trying to reconnect to
    this server instance. That is appropriate for things like a protocol
    version mismatch that won't change until the server is restarted, at which
    point we'll get a new addr and reconnect. An authorization failure like
    this is probably due to the server not properly rotating its secret keys,
    however, and should be treated as transient so that the normal backoff and
    retry behavior kicks in.

    Signed-off-by: Sage Weil

    Sage Weil