31 Aug, 2013

1 commit

  • Serge Hallyn writes:

    > Since commit af4b8a83add95ef40716401395b44a1b579965f4 it's been
    > possible to get into a situation where a pidns reaper is
    > , reparented to host pid 1, but never reaped. How to
    > reproduce this is documented at
    >
    > https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
    > (and see
    > https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
    > In short, run repeated starts of a container whose init is
    >
    > Process.exit(0);
    >
    > sysrq-t when such a task is playing zombie shows:
    >
    > [ 131.132978] init x ffff88011fc14580 0 2084 2039 0x00000000
    > [ 131.132978] ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
    > [ 131.132978] ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
    > [ 131.132978] ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
    > [ 131.132978] Call Trace:
    > [ 131.132978] [] schedule+0x29/0x70
    > [ 131.132978] [] do_exit+0x6e1/0xa40
    > [ 131.132978] [] ? signal_wake_up_state+0x1e/0x30
    > [ 131.132978] [] do_group_exit+0x3f/0xa0
    > [ 131.132978] [] SyS_exit_group+0x14/0x20
    > [ 131.132978] [] tracesys+0xe1/0xe6
    >
    > Further debugging showed that every time this happened, zap_pid_ns_processes()
    > started with nr_hashed being 3, while we were expecting it to drop to 2.
    > Any time it didn't happen, nr_hashed was 1 or 2. So the reaper was
    > waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
    > if nr_hashed hits 1.

    The issue is that when the thread group leader of an init process
    exits before the other threads of the init process, then when the init
    process finally exits, it is a secondary thread that sleeps in
    zap_pid_ns_processes, waiting to be woken when the number of hashed
    pids drops to two. This case waits forever, because free_pid only
    sends a wakeup when the number of hashed pids drops to one.

    To correct this, the simple strategy of sending a possibly unnecessary
    wakeup when the number of hashed pids drops to 2 is adopted.

    Sending one extraneous wakeup is relatively harmless: at worst we
    waste a little CPU time in the rare case when a pid namespace
    approaches exiting.

    We can detect, race-free in free_pid, the case when the pid namespace
    drops to just two hashed pids.
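    The wake-up rule can be modelled in a few lines. This is a minimal Python sketch, not kernel code: the function names are hypothetical, and it assumes, per the report above, that a reaper which is a secondary thread waits for nr_hashed to reach 2, while a reaper that is the thread group leader waits for 1.

```python
# Hypothetical model of the wake-up decision free_pid() makes while a
# pid namespace empties out. Not kernel code; names are illustrative.


def wake_before_fix(nr_hashed: int) -> bool:
    # Old rule: wake the reaper only when one hashed pid remains.
    return nr_hashed == 1


def wake_after_fix(nr_hashed: int) -> bool:
    # New rule: also wake at two, accepting a possibly extra wakeup.
    return nr_hashed in (1, 2)


def reaper_is_woken(start: int, target: int, should_wake) -> bool:
    """Free hashed pids one at a time, from `start` down to `target`.

    The reaper sleeping in zap_pid_ns_processes() re-checks its condition
    only when woken, so it makes progress only if a wakeup is sent at the
    moment nr_hashed reaches `target`. Earlier wakeups would just make it
    re-check and sleep again, so they are ignored here.
    """
    nr_hashed = start
    while nr_hashed > target:
        nr_hashed -= 1  # one free_pid() call
        if nr_hashed == target and should_wake(nr_hashed):
            return True
    return False


# Secondary-thread reaper waiting for 2 (the hang in the report):
print(reaper_is_woken(3, 2, wake_before_fix))  # False: sleeps forever
print(reaper_is_woken(3, 2, wake_after_fix))   # True
```

    For a leader reaper waiting for 1, the extra wakeup at 2 only costs one re-check before it sleeps again.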

    Dereferencing pid_ns->child_reaper with the pidmap_lock held is safe
    without the tasklist_lock because it is guaranteed that detach_pid
    will be called on the child_reaper before it is freed, and detach_pid
    calls __change_pid, which calls free_pid, which takes the
    pidmap_lock. __change_pid only calls free_pid if this is the last
    use of the pid. For a thread that is not the thread group leader,
    the thread's pid will only ever have one user, because a thread's pid
    is not allowed to be the pid of a process, a process group, or a
    session. For a thread that is the thread group leader, all of the
    other threads of that process will be reaped before the thread group
    leader is allowed to be reaped, ensuring there will only be one user
    of the thread's pid as a process pid. Furthermore, because the thread
    is the init process of a pid namespace, all of the other processes in
    the pid namespace will already have been freed, so the pid will not
    be used as a session pid or a process group pid by any other running
    process.
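    The "last use" rule the argument relies on can be sketched as follows. This is a hypothetical Python model of struct pid with one user list per role (pid, process group, session); the class and function names are assumptions, and all locking is omitted.

```python
from enum import Enum


class PidType(Enum):
    PID = "pid"    # the task's own pid
    PGID = "pgid"  # used as a process group id
    SID = "sid"    # used as a session id


class Pid:
    """Hypothetical stand-in for struct pid: tasks attached per role."""

    def __init__(self, nr: int):
        self.nr = nr
        self.users = {kind: [] for kind in PidType}
        self.freed = False


def attach_pid(task: str, kind: PidType, pid: Pid) -> None:
    pid.users[kind].append(task)


def detach_pid(task: str, kind: PidType, pid: Pid) -> None:
    # Like __change_pid() as described above: the pid is handed to
    # free_pid() only when no task uses it in any role any more.
    pid.users[kind].remove(task)
    if not any(pid.users.values()):
        pid.freed = True  # stands in for free_pid(), which takes pidmap_lock


# A non-leader thread's pid has exactly one user, so one detach frees it.
thread_pid = Pid(101)
attach_pid("worker", PidType.PID, thread_pid)
detach_pid("worker", PidType.PID, thread_pid)
print(thread_pid.freed)  # True
```

    An init process that is also its own process group and session leader keeps its pid alive until the last of those roles is detached.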

    CC: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Tested-by: Serge Hallyn
    Reported-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

29 Aug, 2013

1 commit

  • Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
    over the net namespace. The principle here is if you create or have
    capabilities over it you can mount it, otherwise you get to live with
    what other people have mounted.

    Instead of testing this with a straightforward ns_capable call,
    perform this check the long and torturous way with kobject helpers;
    this keeps direct knowledge of namespaces out of sysfs and preserves
    the existing sysfs abstractions.
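    The rule itself (not the kobject plumbing) can be modelled like this. A minimal Python sketch with hypothetical names; it simplifies ns_capable() to "the caller holds the capability in its own user namespace, and that namespace is the target or one of its ancestors", leaving out the kernel's extra case for the owner of a child namespace.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserNS:
    name: str
    parent: Optional["UserNS"] = None


@dataclass
class Cred:
    user_ns: UserNS                         # namespace the caller lives in
    caps: set = field(default_factory=set)  # effective caps in user_ns


@dataclass
class NetNS:
    owner: UserNS  # user namespace that owns this network namespace


def ns_capable(cred: Cred, target: UserNS, cap: str) -> bool:
    # Simplified: walk from the target up toward the initial namespace.
    ns = target
    while ns is not None:
        if ns is cred.user_ns:
            return cap in cred.caps
        ns = ns.parent
    return False


def may_mount_sysfs(cred: Cred, net_ns: NetNS) -> bool:
    # Mount only with CAP_SYS_ADMIN over the network namespace.
    return ns_capable(cred, net_ns.owner, "CAP_SYS_ADMIN")


init_ns = UserNS("init")
container_ns = UserNS("container", parent=init_ns)
container_root = Cred(container_ns, {"CAP_SYS_ADMIN"})
print(may_mount_sysfs(container_root, NetNS(container_ns)))  # True
print(may_mount_sysfs(container_root, NetNS(init_ns)))       # False
```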

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Aug, 2013

4 commits

  • Rely on the fact that another flavor of the filesystem is already
    mounted and do not rely on state in the user namespace.

    Verify that the mounted filesystem is not covered in any significant
    way. I would love to verify that the previously mounted filesystem
    has no mounts on top, but at least the directories
    /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ exist explicitly
    for other filesystems to mount on top of.

    Refactor the test into a function named fs_fully_visible and call that
    function from the mount routines of proc and sysfs. This makes the
    test local to the filesystems involved and its results current as of
    when the mounts take place, removing a weird threading of the user
    namespace, the mount namespace and the filesystems themselves.
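    One reading of "not covered in any significant way" is that every submount of the earlier mount sits on an empty directory, as binfmt_misc and cgroup mounts do. The Python sketch below models fs_fully_visible under that assumption; the data classes are hypothetical and say nothing about how the kernel represents mounts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mount:
    fstype: str
    mountpoint: str = "/"
    # Whether the directory this mount covers was empty (True for
    # directories such as /proc/sys/fs/binfmt_misc).
    covers_empty_dir: bool = True
    children: List["Mount"] = field(default_factory=list)


def fs_fully_visible(mounts: List[Mount], fstype: str) -> bool:
    """True if some existing mount of `fstype` hides nothing of interest."""
    for mnt in mounts:
        if mnt.fstype != fstype:
            continue
        # Submounts are acceptable only where they cover empty directories.
        if all(child.covers_empty_dir for child in mnt.children):
            return True
    return False


binfmt = Mount("binfmt_misc", "/proc/sys/fs/binfmt_misc")
proc = Mount("proc", "/proc", children=[binfmt])
print(fs_fully_visible([proc], "proc"))  # True
```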

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Don't copy bind mounts of /proc//ns/mnt between namespaces.
    These files hold references to a mount namespace and copying them
    between namespaces could result in a reference counting loop.

    The current mnt_ns_loop test prevents loops on the assumption that
    mounts don't cross between namespaces. Unfortunately, both unsharing a
    mount namespace and shared subtrees can cause mounts to propagate
    between mount namespaces.

    Two flags, CL_COPY_UNBINDABLE and CL_COPY_MNT_NS_FILE, are added to
    control this behavior, and CL_COPY_ALL is redefined as both of them.
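    The flag split can be sketched as below. The bit values, the Mount fields and the flattened copy_tree are illustrative Python assumptions, not the kernel's definitions; only the flag names and the CL_COPY_ALL relationship come from the text above.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import List


class CopyFlags(IntFlag):
    # Bit values are illustrative, not the kernel's.
    CL_COPY_UNBINDABLE = 0x1
    CL_COPY_MNT_NS_FILE = 0x2
    CL_COPY_ALL = CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE


@dataclass
class Mount:
    path: str
    unbindable: bool = False
    # True for a bind mount of a mount namespace file such as
    # /proc/self/ns/mnt, which pins a reference to that namespace.
    mnt_ns_file: bool = False


def copy_tree(mounts: List[Mount], flags: CopyFlags) -> List[Mount]:
    """Return the mounts a copy would keep (flattened, hypothetical model)."""
    kept = []
    for m in mounts:
        if m.unbindable and not flags & CopyFlags.CL_COPY_UNBINDABLE:
            continue
        if m.mnt_ns_file and not flags & CopyFlags.CL_COPY_MNT_NS_FILE:
            continue  # copying it could create a reference-counting loop
        kept.append(m)
    return kept


mounts = [Mount("/"), Mount("/run/ns", mnt_ns_file=True),
          Mount("/private", unbindable=True)]
print([m.path for m in copy_tree(mounts, CopyFlags.CL_COPY_UNBINDABLE)])
# ['/', '/private']
```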

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • It seems GCC generates better code that way, so I changed that statement.
    Btw, both forms have the same semantics, so I'm sending this patch for performance reasons.

    Acked-by: Serge E. Hallyn
    Signed-off-by: Raphael S.Carvalho
    Signed-off-by: Eric W. Biederman

    Raphael S.Carvalho
     
  • Don't allow mounting the proc filesystem unless the caller has
    CAP_SYS_ADMIN rights over the pid namespace. The principle here is if
    you create or have capabilities over it you can mount it, otherwise
    you get to live with what other people have mounted.

    Andy pointed out that this is needed to prevent users in a user
    namespace from remounting proc and specifying different hidepid and gid
    options on already existing proc mounts.
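    Why the check matters can be sketched as follows. This Python model assumes, as Andy's report implies, that hidepid and gid options belong to the pid namespace, so a new proc mount's options affect every existing proc mount of that namespace. The names are hypothetical, and ns_capable() is simplified to the same ancestor walk used for the sysfs rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserNS:
    parent: Optional["UserNS"] = None


@dataclass
class PidNS:
    owner: UserNS
    # Options shared by every proc mount of this pid namespace.
    options: Dict[str, int] = field(default_factory=dict)


def has_cap_over(caller_ns: UserNS, caller_has_cap: bool, target: UserNS) -> bool:
    # Simplified ns_capable(): the caller's namespace must be the target or
    # one of its ancestors, and the caller must hold the capability there.
    ns: Optional[UserNS] = target
    while ns is not None:
        if ns is caller_ns:
            return caller_has_cap
        ns = ns.parent
    return False


def mount_proc(caller_ns: UserNS, caller_has_cap: bool,
               pid_ns: PidNS, options: Dict[str, int]) -> bool:
    if not has_cap_over(caller_ns, caller_has_cap, pid_ns.owner):
        return False  # EPERM: no CAP_SYS_ADMIN over this pid namespace
    pid_ns.options.update(options)  # also changes the existing mounts
    return True


host = UserNS()
child = UserNS(parent=host)
host_pids = PidNS(owner=host)
mount_proc(host, True, host_pids, {"hidepid": 2})
print(mount_proc(child, True, host_pids, {"hidepid": 0}))  # False
print(host_pids.options)  # {'hidepid': 2}
```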

    Cc: stable@vger.kernel.org
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

25 Jul, 2013

1 commit

  • When creating a less privileged mount namespace, or propagating mounts
    from a more privileged to a less privileged mount namespace, lock the
    submounts so they may not be unmounted individually in the child mount
    namespace, revealing what is under them.

    This enforces the reasonable expectation that it is not possible to
    see under a mount point. Most of the time mounts are on empty
    directories and revealing that does not matter; however, I have seen an
    occasional sloppy configuration where there were interesting things
    concealed under a mount point that probably should not be revealed.

    Expirable submounts are not locked because they will eventually
    unmount automatically so whatever is under them already needs
    to be safe for unprivileged users to access.

    From a practical standpoint these restrictions do not appear to be
    significant for unprivileged users of the mount namespace. Recursive
    bind mounts and pivot_root continue to work, and mounts that are
    created in a mount namespace may be unmounted there. All of which
    means that the common idiom of keeping a directory of interesting
    files and using pivot_root to throw everything else away continues to
    work just fine.
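    The locking rule reads naturally as a tree copy. A minimal Python sketch under stated assumptions: the Mount fields and function names are hypothetical, submounts of the copied tree are locked unless expirable (per the text above), and whether the kernel also flags the top of the copied tree is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mount:
    path: str
    expirable: bool = False  # e.g. an automount that times out on its own
    locked: bool = False
    children: List["Mount"] = field(default_factory=list)


def copy_to_less_privileged(mnt: Mount, is_top: bool = True) -> Mount:
    """Copy a mount tree into a less privileged mount namespace."""
    copy = Mount(mnt.path, mnt.expirable)
    # Lock submounts so they cannot be unmounted individually, which would
    # reveal what lies under them. Expirable mounts go away by themselves,
    # so what they cover must already be safe to expose.
    copy.locked = not is_top and not mnt.expirable
    copy.children = [copy_to_less_privileged(c, is_top=False)
                     for c in mnt.children]
    return copy


def may_umount(mnt: Mount) -> bool:
    return not mnt.locked


host = Mount("/", children=[Mount("/home"), Mount("/mnt/auto", expirable=True)])
child_root = copy_to_less_privileged(host)
print([may_umount(m) for m in child_root.children])  # [False, True]
```

    Mounts created inside the child namespace start out unlocked, so they can still be unmounted there.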

    Acked-by: Serge Hallyn
    Acked-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

15 Jul, 2013

4 commits

  • Linus Torvalds
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     
  • In the -rt kernel (mrg), we hit the following dump:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] kmem_cache_alloc_node+0x51/0x180
    PGD a2d39067 PUD b1641067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    CPU 3
    Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
    RIP: 0010:[] [] kmem_cache_alloc_node+0x51/0x180
    RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213
    RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
    RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
    RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
    R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
    R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
    FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
    Stack:
    ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
    0000000001200011 0000000001200011 0000000000000000 0000000000000000
    00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
    Call Trace:
    [] ? current_has_perm+0x68/0x80
    [] copy_process+0xdd/0x15b0
    [] ? rt_up_read+0x25/0x30
    [] do_fork+0x5a/0x360
    [] ? migrate_enable+0xeb/0x220
    [] sys_clone+0x28/0x30
    [] stub_clone+0x13/0x20
    [] ? system_call_fastpath+0x16/0x1b
    Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
    RIP [] kmem_cache_alloc_node+0x51/0x180
    RSP
    CR2: 0000000000000000
    ---[ end trace 0000000000000002 ]---

    Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
    with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
    disable migration. But the SLUB code is relatively lockless, and the
    spin_locks there are raw_spin_locks (not converted to mutexes), thus I
    believe this bug can happen in mainline without -rt features. The -rt
    patch is just good at triggering mainline bugs ;-)

    Anyway, looking at where this crashed, it seems that the page variable
    can be NULL when passed to the node_match() function (which does not
    check if it is NULL). When this happens we get the above panic.

    As page is only used in slab_alloc() to check if the node matches, if
    it's NULL I'm assuming that we can say it doesn't and call the
    __slab_alloc() code. Is this a correct assumption?
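    The proposed check (the shortlog above titles it "slub: Check for page NULL before doing the node_match check") can be modelled as follows. This is a Python sketch, not SLUB code: Page, alloc_path and its return strings are hypothetical; only node_match(), __slab_alloc() and the idea of treating a NULL page as a node mismatch come from the message.

```python
from dataclasses import dataclass
from typing import Optional

NUMA_NO_NODE = -1  # caller does not care which node


@dataclass
class Page:
    nid: int  # node the slab page lives on


def node_match(page: Page, node: int) -> bool:
    # Like the function described above, this does not check for None.
    return node == NUMA_NO_NODE or page.nid == node


def alloc_path(obj: Optional[object], page: Optional[Page], node: int) -> str:
    """Pick the allocation path (hypothetical model of the fixed check)."""
    # Treat a missing page like a node mismatch: take the slow path
    # instead of dereferencing it.
    if obj is None or page is None or not node_match(page, node):
        return "__slab_alloc"
    return "fast path"


print(alloc_path(object(), None, 1))  # __slab_alloc
```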

    Acked-by: Christoph Lameter
    Signed-off-by: Steven Rostedt
    Signed-off-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Pull more vfs stuff from Al Viro:
    "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
    making simple_lookup() usable for filesystems with non-NULL s_d_op,
    which allows us to get rid of quite a bit of ugliness"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    sunrpc: now we can just set ->s_d_op
    cgroup: we can use simple_lookup() now
    efivarfs: we can use simple_lookup() now
    make simple_lookup() usable for filesystems that set ->s_d_op
    configfs: don't open-code d_alloc_name()
    __rpc_lookup_create_exclusive: pass string instead of qstr
    rpc_create_*_dir: don't bother with qstr
    llist: llist_add() can use llist_add_batch()
    llist: fix/simplify llist_add() and llist_add_batch()
    fput: turn "list_head delayed_fput_list" into llist_head
    fs/file_table.c:fput(): add comment
    Safer ABI for O_TMPFILE

    Linus Torvalds
     

14 Jul, 2013

24 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...

    Signed-off-by: Al Viro

    Al Viro
     
  • just pass the name

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull x86 platform driver updates from Matthew Garrett:
    "Nothing overly exciting here - a couple of new drivers that don't do a
    great deal, along with some miscellaneous fixes and a couple of small
    feature enablement patches"

    * 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
    x86 platform drivers: fix gpio leak
    toshiba_acpi: Add dependency on SERIO_I8042
    asus-nb-wmi: set wapf=4 for ASUSTeK COMPUTER INC. 1015E/U
    Add trivial driver to disable Intel Smart Connect
    Add support driver for Intel Rapid Start Technology
    hp-wmi: add supports for POST code error
    asus-wmi: control wlan-led only if wapf == 4
    drivers/platform/x86/intel_ips: Convert to module_pci_driver
    asus-nb-wmi: ignore ALS notification key code
    asus-wmi: append newline to messages
    x86: asus-laptop: fix invalid point access
    x86: msi-laptop: fix memleak
    amilo-rfkill: Add dependency on SERIO_I8042
    dell-laptop: fix error return code in dell_init()
    hp-wmi: Enable hotkeys on some systems

    Linus Torvalds
     
  • Pull second round of input updates from Dmitry Torokhov:
    "An update to Elantech driver to support hardware v7, fix to the new
    cyttsp4 driver to use proper addressing, ads7846 device tree support
    and nspire-keypad got a small cleanup."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: nspire-keypad - replace magic offset with define
    Input: elantech - fix for newer hardware versions (v7)
    Input: cyttsp4 - use 16bit address for I2C/SPI communication
    Input: ads7846 - add device tree bindings
    Input: ads7846 - make sure we do not change platform data

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "Just a bunch of small fixes and tidy ups:

    1) Finish the "busy_poll" renames, from Eliezer Tamir.

    2) Fix RCU stalls in IFB driver, from Ding Tianhong.

    3) Linearize buffers properly in tun/macvtap zerocopy code.

    4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

    5) Spinlock used before init in alx driver, from Maarten Lankhorst.

    6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
    Kravkov.

    7) Dummy and ifb driver load failure paths can oops, fixes from Tan
    Xiaojun and Ding Tianhong.

    8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

    9) Account all TCP retransmits in SNMP stats properly, from Yuchung
    Cheng.

    10) atl1e and via-rhine do not handle DMA mapping failures properly,
    from Neil Horman.

    11) Various equal-cost multipath route fixes in ipv6 from Hannes
    Frederic Sowa"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
    ipv6: only static routes qualify for equal cost multipathing
    via-rhine: fix dma mapping errors
    atl1e: fix dma mapping warnings
    tcp: account all retransmit failures
    usb/net/r815x: fix cast to restricted __le32
    usb/net/r8152: fix integer overflow in expression
    net: access page->private by using page_private
    net: strict_strtoul is obsolete, use kstrtoul instead
    drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
    net/usb: add relative mii functions for r815x
    net/tipc: use %*phC to dump small buffers in hex form
    qlcnic: Adding Maintainers.
    gre: Fix MTU sizing check for gretap tunnels
    pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
    pkt_sched: sch_qfq: improve efficiency of make_eligible
    gso: Update tunnel segmentation to support Tx checksum offload
    inet: fix spacing in assignment
    ifb: fix oops when loading the ifb failed
    ...

    Linus Torvalds
     
  • Pull final round of SCSI updates from James Bottomley:
    "This is the remaining set of SCSI patches for the merge window. It's
    mostly driver updates (scsi_debug, qla2xxx, storvsc, mpt3sas). There
    are also several bug fixes in fcoe, libfc, and megaraid_sas. We also
    have a couple of core changes to try to make device destruction more
    deterministic"

    * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (46 commits)
    [SCSI] scsi constants: command, sense key + additional sense strings
    fcoe: Reduce number of sparse warnings
    fcoe: Stop fc_rport_priv structure leak
    libfcoe: Fix meaningless log statement
    libfc: Differentiate echange timer cancellation debug statements
    libfc: Remove extra space in fc_exch_timer_cancel definition
    fcoe: fix the link error status block sparse warnings
    fcoe: Fix smatch warning in fcoe_fdmi_info function
    libfc: Reject PLOGI from nodes with incompatible role
    [SCSI] enable destruction of blocked devices which fail LUN scanning
    [SCSI] Fix race between starved list and device removal
    [SCSI] megaraid_sas: fix a bug for 64 bit arches
    [SCSI] scsi_debug: reduce duplication between prot_verify_read and prot_verify_write
    [SCSI] scsi_debug: simplify offset calculation for dif_storep
    [SCSI] scsi_debug: invalidate protection info for unmapped region
    [SCSI] scsi_debug: fix NULL pointer dereference with parameters dif=0 dix=1
    [SCSI] scsi_debug: fix incorrectly nested kmap_atomic()
    [SCSI] scsi_debug: fix invalid address passed to kunmap_atomic()
    [SCSI] mpt3sas: Bump driver version to v02.100.00.00
    [SCSI] mpt3sas: when async scanning is enabled then while scanning, devices are removed but their transport layer entries are not removed
    ...

    Linus Torvalds
     
  • Pull scheduler fix from Thomas Gleixner:
    "Fix a potential deadlock versus hrtimers"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix HRTICK

    Linus Torvalds
     
  • Pull irq updates from Thomas Gleixner:
    - core fix for missing round up in the generic irq chip implementation
    - new irq chip for MOXA SoCs
    - a few fixes and cleanups in the irqchip drivers
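    The round-up fix is plain integer arithmetic. A Python sketch, assuming (from the shortlog entry "Use DIV_ROUND_UP to calculate numchips") that the chip count was previously a truncating division; the numbers are illustrative.

```python
def div_round_up(n: int, d: int) -> int:
    # Same arithmetic as the kernel's DIV_ROUND_UP(n, d) macro.
    return (n + d - 1) // d


nr_irqs, irqs_per_chip = 40, 32
print(nr_irqs // irqs_per_chip)              # 1: leaves 8 irqs without a chip
print(div_round_up(nr_irqs, irqs_per_chip))  # 2
```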

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip: Add support for MOXA ART SoCs
    genirq: generic chip: Use DIV_ROUND_UP to calculate numchips
    irqchip: nvic: Fix wrong num_ct argument for irq_alloc_domain_generic_chips()
    irqchip: sun4i: Staticize sun4i_irq_ack()
    irqchip: vt8500: Staticize local symbols

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    - watchdog fixes for full dynticks
    - improved debug output for full dynticks
    - remove an obsolete full dynticks check
    - two ARM SoC clocksource drivers for sharing across SoCs
    - tick broadcast fix for CPU hotplug

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick: broadcast: Check broadcast mode on CPU hotplug
    clocksource: arm_global_timer: Add ARM global timer support
    clocksource: Add Marvell Orion SoC timer
    nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs
    watchdog: Boot-disable by default on full dynticks
    watchdog: Rename confusing state variable
    watchdog: Register / unregister watchdog kthreads on sysctl control
    nohz: Warn if the machine can not perform nohz_full

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    - fix for do_div() abuse on x86
    - locking fix in perf core
    - a pile of (build) fixes and cleanups in perf tools

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    perf/x86: Fix incorrect use of do_div() in NMI warning
    perf: Fix perf_lock_task_context() vs RCU
    perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
    perf: Clone child context from parent context pmu
    perf script: Fix broken include in Context.xs
    perf tools: Fix -ldw/-lelf link test when static linking
    perf tools: Revert regression in configuration of Python support
    perf tools: Fix perf version generation
    perf stat: Fix per-socket output bug for uncore events
    perf symbols: Fix vdso list searching
    perf evsel: Fix missing increment in sample parsing
    perf tools: Update symbol_conf.nr_events when processing attribute events
    perf tools: Fix new_term() missing free on error path
    perf tools: Fix parse_events_terms() segfault on error path
    perf evsel: Fix count parameter to read call in event_format__new
    perf tools: fix a typo of a Power7 event name
    perf tools: Fix -x/--exclude-other option for report command
    perf evlist: Enhance perf_evlist__start_workload()
    perf record: Remove -f/--force option
    perf record: Remove -A/--append option
    ...

    Linus Torvalds
     
  • Pull core locking updates from Thomas Gleixner:
    "Header cleanup as requested by Linus"

    (This is the "don't include support for ww_mutex in a header file that
    everybody wants, when almost nobody wants the ww part" change)

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    mutex: Move ww_mutex definitions to ww_mutex.h

    Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "This is our first set of fixes from arm-soc for 3.11.
    - A handful of build and warning fixes from Arnd
    - A collection of OMAP fixes
    - defconfig updates to make the default configs more useful for real
    use (and testing) out of the box on hardware

    And a couple of other small fixes. Some of these have been recently
    applied but it's normally how we deal with fixes, with less bake time
    in -next needed"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits)
    arm: multi_v7_defconfig: Tweaks for omap and sunxi
    arm: multi_v7_defconfig: add i.MX options and NFS root
    ARM: omap2: add select of TI_PRIV_EDMA
    ARM: exynos: select PM_GENERIC_DOMAINS only when used
    ARM: ixp4xx: avoid circular header dependency
    ARM: OMAP: omap_common_late_init may be unused
    ARM: sti: move DEBUG_STI_UART into alphabetical order
    ARM: OMAP: build mach-omap code only if needed
    ARM: zynq: use DT_MACHINE_START
    ARM: omap5: omap5 has SCU and TWD
    ARM: OMAP2+: omap2plus_defconfig: Enable appended DTB support
    ARM: OMAP2+: Enable TI_EDMA in omap2plus_defconfig
    ARM: OMAP2+: omap2plus_defconfig: enable DRA752 thermal support by default
    ARM: OMAP2+: omap2plus_defconfig: enable TI bandgap driver
    ARM: OMAP2+: devices: remove duplicated include from devices.c
    ARM: OMAP3: igep0020: Set DSS pins in correct mux mode.
    ARM: OMAP2+: N900: enable N900-specific drivers even if device tree is enabled
    ARM: OMAP2+: Cocci spatch "ptr_ret.spatch"
    ARM: OMAP2+: Remove obsolete Makefile line
    ARM: OMAP5: Enable Cortex A15 errata 798181
    ...

    Linus Torvalds
     
  • Pull ARM fixes from Russell King:
    "A few fixes for ARM, mostly just one liners with the exception of the
    missing section specification. We decided not to rely on .previous to
    fix this but to explicitly state the section we want the code to be
    in."

    * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
    ARM: 7778/1: smp_twd: twd_update_frequency need be run on all online CPUs
    ARM: 7782/1: Kconfig: Let ARM_ERRATA_364296 not depend on CONFIG_SMP
    ARM: mm: fix boot on SA1110 Assabet
    ARM: 7781/1: mmu: Add debug_ll_io_init() mappings to early mappings
    ARM: 7780/1: add missing linker section markup to head-common.S

    Linus Torvalds
     
  • Pull MIPS updates from Ralf Baechle:
    "MIPS updates:

    - All the things that didn't make 3.10.
    - Removes the Windriver PPMC platform. Nobody will miss it.
    - Remove a workaround from kernel/irq/irqdomain.c which was there
    exclusively for MIPS. Patch by Grant Likely.
    - More small improvements for the SEAD 3 platform
    - Improvements on the BMIPS / SMP support for the BCM63xx series.
    - Various cleanups of dead leftovers.
    - Platform support for the Cavium Octeon-based EdgeRouter Lite.

    Two large KVM patchsets didn't make it for this pull request because
    their respective authors are vacationing"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
    MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
    MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
    MIPS: SEAD3: Disable L2 cache on SEAD-3.
    MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
    MIPS: BCM63xx: Add SMP support to prom.c
    MIPS: define write{b,w,l,q}_relaxed
    MIPS: Expose missing pci_io{map,unmap} declarations
    MIPS: Malta: Update GCMP detection.
    Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
    MIPS: APSP: Remove
    SSB: Kconfig: Amend SSB_EMBEDDED dependencies
    MIPS: microMIPS: Fix improper definition of ISA exception bit.
    MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
    MIPS: Declare emulate_load_store_microMIPS as a static function.
    MIPS: Fix typos and cleanup comment
    MIPS: Cleanup indentation and whitespace
    MIPS: BMIPS: support booting from physical CPU other than 0
    MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
    MIPS: GIC: Fix gic_set_affinity infinite loop
    MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
    ...

    Linus Torvalds
     
  • Pull watchdog updates from Wim Van Sebroeck:
    - lots of devm_ conversions and cleanup
    - platform_set_drvdata cleanups
    - s3c2410: dev_err/dev_info + dev_pm_ops
    - watchdog_core: don't try to stop device if not running fix
    - wdrtas: use print_hex_dump
    - xilinx cleanups
    - orion_wdt fixes
    - softdog cleanup
    - hpwdt: check on UEFI bits
    - deletion of mpcore_wdt driver
    - addition of broadcom BCM2835 watchdog timer driver
    - addition of MEN A21 watchdog devices

    * git://www.linux-watchdog.org/linux-watchdog: (38 commits)
    watchdog: hpwdt: Add check for UEFI bits
    watchdog: softdog: remove replaceable ping operation
    watchdog: New watchdog driver for MEN A21 watchdogs
    Watchdog: fix clearing of the watchdog interrupt
    Watchdog: allow orion_wdt to be built for Dove
    watchdog: Add Broadcom BCM2835 watchdog timer driver
    watchdog: delete mpcore_wdt driver
    watchdog: xilinx: Setup the origin compatible string
    watchdog: xilinx: Fix driver header
    watchdog: wdrtas: don't use custom version of print_hex_dump
    watchdog: core: don't try to stop device if not running
    watchdog: jz4740: Pass device to clk_get
    watchdog: twl4030: Remove redundant platform_set_drvdata()
    watchdog: mpcore: Remove redundant platform_set_drvdata()
    watchdog: da9055: use platform_{get,set}_drvdata()
    watchdog: da9052: use platform_{get,set}_drvdata()
    watchdog: cpwd: use platform_{get,set}_drvdata()
    watchdog: s3c2410_wdt: convert s3c2410wdt to dev_pm_ops
    watchdog: s3c2410_wdt: use dev_err()/dev_info() instead of pr_err()/pr_info()
    watchdog: wm831x: use platform_{get,set}_drvdata()
    ...

    Linus Torvalds
     
  • Pull InfiniBand/RDMA changes from Roland Dreier:
    - AF_IB (native IB addressing) for CMA from Sean Hefty
    - new mlx5 driver for Mellanox Connect-IB adapters (including post
    merge request fixes)
    - SRP fixes from Bart Van Assche (including fix to first merge request)
    - qib HW driver updates
    - resurrection of ocrdma HW driver development
    - uverbs conversion to create fds with O_CLOEXEC set
    - other small changes and fixes

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
    mlx5: Return -EFAULT instead of -EPERM
    IB/qib: Log all SDMA errors unconditionally
    IB/qib: Fix module-level leak
    mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
    IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
    IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
    mlx5_core: Fixes for sparse warnings
    IB/mlx5: Make profile[] static in main.c
    mlx5: Fix parameter type of health_handler_t
    mlx5: Add driver for Mellanox Connect-IB adapters
    IB/core: Add reserved values to enums for low-level driver use
    IB/srp: Bump driver version and release date
    IB/srp: Make HCA completion vector configurable
    IB/srp: Maintain a single connection per I_T nexus
    IB/srp: Fail I/O fast if target offline
    IB/srp: Skip host settle delay
    IB/srp: Avoid skipping srp_reset_host() after a transport error
    IB/srp: Fix remove_one crash due to resource exhaustion
    IB/qib: New transmitter tunning settings for Dell 1.1 backplane
    IB/core: Fix error return code in add_port()
    ...

    Linus Torvalds
     
  • Pull media updates from Mauro Carvalho Chehab:
    "This series contains:
    - new i2c video drivers: ml86v7667 (video decoder),
    ths8200 (video encoder)
    - a new video driver for EasyCap cards based on Fushicai USBTV007
    - Improved support for OF and embedded systems, with V4L2 async
    initialization and better support for clocks
    - API cleanups on the ioctls used by the v4l2 debug tool
    - Lots of cleanups
    - As usual, several driver improvements and new cards additions
    - Revert two changesets that change the minimal symbol rate for
    stb0899, as requested by Manu
    - Update MAINTAINERS and other files to point to my new e-mail"

    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (378 commits)
    MAINTAINERS & ABI: Update to point to my new email
    [media] stb0899: restore minimal rate to 5Mbauds
    [media] exynos4-is: Correct colorspace handling at FIMC-LITE
    [media] exynos4-is: Set valid initial format on FIMC.n subdevs
    [media] exynos4-is: Set valid initial format on FIMC-IS-ISP subdev pads
    [media] exynos4-is: Fix format propagation on FIMC-IS-ISP subdev
    [media] exynos4-is: Set valid initial format at FIMC-LITE
    [media] exynos4-is: Fix format propagation on FIMC-LITE.n subdevs
    [media] MAINTAINERS: Update S5P/Exynos FIMC driver entry
    [media] Documentation: Update driver's directory in video4linux/fimc.txt
    [media] exynos4-is: Change fimc-is firmware file names
    [media] exynos4-is: Add support for Exynos5250 MIPI-CSIS
    [media] exynos4-is: Add Exynos5250 SoC support to fimc-lite driver
    [media] exynos4-is: Drop drvdata handling in fimc-lite for non-dt platforms
    [media] media: i2c: tvp514x: remove manual setting of subdev name
    [media] media: i2c: tvp7002: remove manual setting of subdev name
    [media] mem2mem: set missing v4l2_dev pointer
    [media] wl128x: add missing struct v4l2_device
    [media] tvp514x: Fix init seqeunce
    [media] saa7134: Fix sparse warnings by adding __user annotation
    ...

    Linus Torvalds
     
  • Pull more xfs updates from Ben Myers:
    "Here are a fix for xfs_fsr, a cleanup in bulkstat, a cleanup in
    xfs_open_by_handle, updated mount options documentation, a cleanup in
    xfs_bmapi_write, a fix for the size of dquot log reservations, a fix
    for sgid inheritance when acls are in use, a fix for cleaning up
    quotainfo structures, and some more of the work which allows group and
    project quotas to be used together.

    We had a few more in this last quota category that we might have liked
    to get in, but it looks like there are still a few items that need to be
    addressed.

    - fix for xfs_fsr returning -EINVAL
    - cleanup in xfs_bulkstat
    - cleanup in xfs_open_by_handle
    - update mount options documentation
    - clean up local format handling in xfs_bmapi_write
    - fix dquot log reservations which were too small
    - fix sgid inheritance for subdirectories when default acls are in use
    - add project quota fields to various structures
    - fix teardown of quotainfo structures when quotas are turned off"

    * tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs:
    xfs: Fix the logic check for all quotas being turned off
    xfs: Add pquota fields where gquota is used.
    xfs: fix sgid inheritance for subdirectories inheriting default acls [V3]
    xfs: dquot log reservations are too small
    xfs: remove local fork format handling from xfs_bmapi_write()
    xfs: update mount options documentation
    xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
    xfs: clean up unused codes at xfs_bulkstat()
    xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "Fixes for 4 cifs bugs, including a reconnect problem, a problem
    parsing responses to SMB2 open requests, and nlink being set incorrectly
    for some servers which don't report it properly on the wire. Also
    improves data integrity on reconnect with a series from Pavel that adds
    durable handle support for SMB2."

    * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Fix a deadlock when a file is reopened
    CIFS: Reopen the file if reconnect durable handle failed
    [CIFS] Fix minor endian error in durable handle patch series
    CIFS: Reconnect durable handles for SMB2
    CIFS: Make SMB2_open use cifs_open_parms struct
    CIFS: Introduce cifs_open_parms struct
    CIFS: Request durable open for SMB2 opens
    CIFS: Simplify SMB2 create context handling
    CIFS: Simplify SMB2_open code path
    CIFS: Respect create_options in smb2_open_file
    CIFS: Fix lease context buffer parsing
    [CIFS] use sensible file nlink values if unprovided
    Limit allocation of crypto mechanisms to dialect which requires

    Linus Torvalds
     

13 Jul, 2013

5 commits

  • llist_add(new, head) can simply use llist_add_batch(new, new, head);
    there is no need to duplicate the code.

    This obviously uninlines llist_add(), and to me this is a win. But we
    can make llist_add_batch() inline if this is desirable; in that case
    gcc can notice that new_first == new_last if the caller is llist_add().

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: "Eric W. Biederman"
    Cc: David Howells
    Cc: Huang Ying
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • 1. This is mostly theoretical, but llist_add*() need ACCESS_ONCE().

    Otherwise it is not guaranteed that the first cmpxchg() uses the
    same value for old_entry and new_last->next.

    2. These helpers cache the result of cmpxchg() and read the initial
    value of head->first before the main loop. I do not think this
    makes sense. In the likely case cmpxchg() succeeds; otherwise
    it doesn't hurt to reload head->first.

    I think it would be better to simplify the code and just read
    ->first before each cmpxchg().
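
    A user-space sketch of the loop shape argued for above, again using C11
    atomics as a stand-in for the kernel primitives (an assumption).
    cmpxchg_node() is a hypothetical helper that mimics the kernel cmpxchg()
    convention of returning the previous value. atomic_load_explicit() plays
    the role of ACCESS_ONCE(): a single read of head->first feeds both
    new_last->next and the expected value, and the head is re-read on every
    retry rather than reusing the value returned by a failed exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct llist_node {
	struct llist_node *next;
};

struct llist_head {
	_Atomic(struct llist_node *) first;
};

/* Hypothetical kernel-style cmpxchg(): returns the value *p held before the call. */
static struct llist_node *cmpxchg_node(_Atomic(struct llist_node *) *p,
				       struct llist_node *old,
				       struct llist_node *new)
{
	/* On failure C11 stores the current value into 'old'; on success it is unchanged. */
	atomic_compare_exchange_strong(p, &old, new);
	return old;
}

static bool llist_add_batch(struct llist_node *new_first,
			    struct llist_node *new_last,
			    struct llist_head *head)
{
	struct llist_node *first;

	do {
		/*
		 * One load supplies both the link and the expected value, so the
		 * compiler cannot give cmpxchg a different head->first than the one
		 * stored in new_last->next. Re-read on every iteration.
		 */
		first = atomic_load_explicit(&head->first, memory_order_relaxed);
		new_last->next = first;
	} while (cmpxchg_node(&head->first, first, new_first) != first);

	return first == NULL;
}
```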

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: "Eric W. Biederman"
    Cc: David Howells
    Cc: Huang Ying
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • fput() and delayed_fput() can use llist and avoid the locking.

    This is an unlikely path, so the change is not expected to improve
    performance, but this way the code looks simpler.
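
    The pattern behind this change, sketched in user space with C11 atomics.
    Everything here is an assumption for illustration: struct fake_file,
    defer_release() and drain_deferred() are hypothetical stand-ins for
    struct file, the fput() producer side and delayed_fput(). Producers push
    with a lock-free add whose return value says whether the list was empty,
    and the consumer detaches the whole list with one atomic exchange, so
    neither side needs a spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct llist_node {
	struct llist_node *next;
};

struct llist_head {
	_Atomic(struct llist_node *) first;
};

/* Hypothetical stand-in for struct file: only what the sketch needs. */
struct fake_file {
	int id;
	bool released;
	struct llist_node fu_llist;
};

#define file_from_node(n) \
	((struct fake_file *)((char *)(n) - offsetof(struct fake_file, fu_llist)))

/* Lock-free push; returns true if the list was empty beforehand. */
static bool llist_add(struct llist_node *new, struct llist_head *head)
{
	struct llist_node *first = atomic_load(&head->first);

	do {
		new->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, new));
	return first == NULL;
}

/* Detach the entire list in one atomic exchange. */
static struct llist_node *llist_del_all(struct llist_head *head)
{
	return atomic_exchange(&head->first, NULL);
}

/* Producer (fput() analogue): true means "schedule the drain now". */
static bool defer_release(struct fake_file *f, struct llist_head *pending)
{
	return llist_add(&f->fu_llist, pending);
}

/* Consumer (delayed_fput() analogue): release everything queued so far. */
static int drain_deferred(struct llist_head *pending)
{
	struct llist_node *node = llist_del_all(pending);
	struct llist_node *next;
	int n = 0;

	for (; node; node = next) {
		next = node->next;	/* read before the entry is released */
		file_from_node(node)->released = true;
		n++;
	}
	return n;
}
```

    Using the "list was empty" result as the signal to schedule the drain is one
    way to queue the work once per batch; the sketch simply returns it to the caller.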

    Signed-off-by: Oleg Nesterov
    Suggested-by: Andrew Morton
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: "Eric W. Biederman"
    Cc: David Howells
    Cc: Huang Ying
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • A missed update to "fput: task_work_add() can fail if the caller has
    passed exit_task_work()".

    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: David Howells
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Andrew Morton
     
  • [suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
    that will fail on old kernels in a lot more cases than what I came up with.
    And make sure O_CREAT doesn't get there...
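
    Seen from user space, folding O_DIRECTORY and a write mode into the flag
    means a kernel that does not know O_TMPFILE sees a directory-only,
    read-write open and rejects it (per open(2): EISDIR for an existing
    directory, ENOTDIR for a non-directory) instead of quietly opening
    something. The sketch below assumes glibc with _GNU_SOURCE, where
    O_TMPFILE includes O_DIRECTORY and must be paired with O_RDWR or O_WRONLY.
    open_unnamed_tmpfile() is a hypothetical helper that falls back to
    mkstemp() plus unlink() when the unnamed open fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Open an unnamed read-write temporary file in dir; -1 with errno on error. */
static int open_unnamed_tmpfile(const char *dir)
{
	int fd;
	char path[4096];

#ifdef O_TMPFILE
	fd = open(dir, O_TMPFILE | O_RDWR, 0600);
	if (fd >= 0)
		return fd;
	/*
	 * Typical failures: EISDIR when the running kernel predates O_TMPFILE,
	 * EOPNOTSUPP when the filesystem lacks support. This sketch falls back
	 * on any error rather than distinguishing them.
	 */
#endif
	/* Fallback: create a named file, then unlink it so it has no name. */
	if (snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir) >= (int)sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	return fd;
}
```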

    Signed-off-by: Al Viro

    Al Viro