30 Nov, 2010

1 commit

  • This reverts commit e0fdace10e75dac67d906213b780ff1b1a4cc360.

    On-list discussion seems to suggest that the robustness fixes for printk
    make this unnecessary and DaveM has also agreed in person at Kernel Summit
    and on list.

    The main problem with this code is that once we hit a lockdep splat,
    oops_in_progress stays set forever. The console layer uses oops_in_progress
    with KMS to decide when it should show the oops instead of X, so this causes
    problems around suspend/resume time: a userspace resume can cause a console
    switch away from X, but only if oops_in_progress is set (which is what we
    want if an oops actually is in progress, but not because we had a lockdep
    splat 2 days prior).

    Cc: David S Miller
    Cc: Ingo Molnar
    Signed-off-by: Dave Airlie
    Signed-off-by: Linus Torvalds

    Dave Airlie
     

12 Nov, 2010

1 commit

  • Salman Qazi describes the following radix-tree bug:

    In the following case, we can get a deadlock:

    0. The radix tree contains two items, one has the index 0.
    1. The reader (in this case find_get_pages) takes the rcu_read_lock.
    2. The reader acquires slot(s) for item(s) including the index 0 item.
    3. The non-zero index item is deleted, and as a consequence the other item is
    moved to the root of the tree. The place where it used to be is queued for
    deletion after the readers finish.
    3b. The zero item is deleted, removing it from the direct slot, it remains in
    the rcu-delayed indirect node.
    4. The reader looks at the index 0 slot, and finds that the page has 0 ref
    count
    5. The reader looks at it again, hoping that the item will either be freed or
    the ref count will increase. This never happens, as the slot it is looking
    at will never be updated. Also, this slot can never be reclaimed because
    the reader is holding rcu_read_lock and is in an infinite loop.

    The fix is to turn the existing "indirect" pointer case, which already
    requires a slot lookup retry, into a general "retry the lookup" bit.

    Signed-off-by: Nick Piggin
    Reported-by: Salman Qazi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

28 Oct, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
    DMAENGINE: move COH901318 to arch_initcall
    dma: imx-dma: fix signedness bug
    dma/timberdale: simplify conditional
    ste_dma40: remove channel_type
    ste_dma40: remove enum for endianess
    ste_dma40: remove TIM_FOR_LINK option
    ste_dma40: move mode_opt to separate config
    ste_dma40: move channel mode to a separate field
    ste_dma40: move priority to separate field
    ste_dma40: add variable to indicate valid dma_cfg
    async_tx: make async_tx channel switching opt-in
    move async raid6 test to lib/Kconfig.debug
    dmaengine: Add Freescale i.MX1/21/27 DMA driver
    intel_mid_dma: change the slave interface
    intel_mid_dma: fix the WARN_ONs
    intel_mid_dma: Add sg list support to DMA driver
    intel_mid_dma: Allow DMAC2 to share interrupt
    intel_mid_dma: Allow IRQ sharing
    intel_mid_dma: Add runtime PM support
    DMAENGINE: define a dummy filter function for ste_dma40
    ...

    Linus Torvalds
     

27 Oct, 2010

17 commits

  • Add idr/ida to kernel-api docbook.
    Fix typos and kernel-doc notation.

    Signed-off-by: Randy Dunlap
    Acked-by: Tejun Heo
    Cc: Naohiro Aota
    Cc: Jiri Kosina
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    arch/tile: convert a BUG_ON to BUILD_BUG_ON
    arch/tile: make ptrace() work properly for TILE-Gx COMPAT mode
    arch/tile: support new info op generated by compiler
    arch/tile: minor whitespace/naming changes for string support files
    arch/tile: enable single-step support for TILE-Gx
    arch/tile: parameterize system PLs to support KVM port
    arch/tile: add Tilera's header as an open-source header
    arch/tile: Bomb C99 comments to C89 comments in tile's
    arch/tile: prevent corrupt top frame from causing backtracer runaway
    arch/tile: various top-level Makefile cleanups
    arch/tile: change lower bound on syscall error return to -4095
    arch/tile: properly export __mb_incoherent for modules
    arch/tile: provide a definition of MAP_STACK
    kmemleak: add TILE to the list of supported architectures.
    char: hvc: check for error case
    arch/tile: Add a warning if we try to allocate too much vmalloc memory.
    arch/tile: update some comments to clarify register usage.
    arch/tile: use better "punctuation" for VMSPLIT_3_5G and friends
    arch/tile: Use
    tile: replace some BUG_ON checks with BUILD_BUG_ON checks

    Linus Torvalds
     
  • The current implementation of div64_u64 for 32bit systems returns an
    approximately correct result when the divisor exceeds 32bits. Since doing
    64bit division using 32bit hardware is a long since solved problem we just
    use one of the existing proven methods.

    Additionally, add a div64_s64 function to correctly handle doing signed
    64bit division.

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=616105

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Oleg Nesterov
    Cc: Ben Woodard
    Cc: Jeremy Fitzhardinge
    Cc: Mark Grondona
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Behlendorf
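
A minimal userspace sketch of exact 64-bit division built only from shifts, compares and subtractions, which a 32-bit target can always provide. It uses plain shift-and-subtract long division, which is not necessarily the method this patch adopted; the names sketch_div64_u64 and sketch_div64_s64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact unsigned 64-by-64 division via binary long division. */
uint64_t sketch_div64_u64(uint64_t dividend, uint64_t divisor)
{
    uint64_t quot = 0;
    uint64_t rem = 0;

    assert(divisor != 0);

    for (int bit = 63; bit >= 0; bit--) {
        /* Remember the bit that the shift below pushes out of rem. */
        uint64_t carry = rem >> 63;

        rem = (rem << 1) | ((dividend >> bit) & 1);
        if (carry || rem >= divisor) {
            /* Unsigned wraparound yields the right remainder here. */
            rem -= divisor;
            quot |= (uint64_t)1 << bit;
        }
    }
    return quot;
}

/* Signed variant: divide magnitudes, then restore the sign.
 * The result truncates toward zero. INT64_MIN / -1 is not representable. */
int64_t sketch_div64_s64(int64_t dividend, int64_t divisor)
{
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t a = dividend < 0 ? 0 - (uint64_t)dividend : (uint64_t)dividend;
    uint64_t b = divisor < 0 ? 0 - (uint64_t)divisor : (uint64_t)divisor;
    uint64_t q = sketch_div64_u64(a, b);

    if ((dividend < 0) != (divisor < 0))
        q = 0 - q;
    return (int64_t)q;
}
```

The interesting inputs are divisors wider than 32 bits, which is where the message says the old code was only approximately correct; here every quotient bit is decided exactly.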
     
  • Use new variable 'len' to make code more readable.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • this_cpu_ptr() avoids an array lookup and can use the percpu offset of the
    local cpu directly.

    Signed-off-by: Christoph Lameter
    Cc: Eric Dumazet
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Improve 'lib_sort()' test and check that:
    o 'cmp()' is called only for elements which were present in the original list,
    i.e., the 'a' and 'b' parameters are valid
    o the resulting (sorted) list consists only of the original elements
    o introduce "poison" fields to make sure data around 'struct list_head' field
    are not corrupted.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • This patch unifies 'list_sort_test()' messages a bit and makes sure all of
    them start with the "list_sort_test:" prefix to make it obvious for users
    where the messages come from.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • The 'lib_sort()' test does not free memory if it fails, and it makes the
    kernel panic if it cannot allocate memory. This patch fixes the problem.

    This patch also changes several small things:
    o use 'list_add()' helper instead of adding manually
    o introduce temporary 'el1' variable to avoid ugly and unreadable
    "if" statement
    o make 'head' a stack variable instead of 'kmalloc()'ed, which
    simplifies code a bit

    Overall, this patch is of clean-up type.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • Instead of using its own pseudo-random generator, use the generic Linux
    'random32()' function. Presumably, this should improve test coverage.

    At the same time, do the following changes:
    o Use shorter macro name for test list length
    o Do not use strange 'l_h' name for 'struct list_head' element,
    use 'list', because it is the traditional name and thus makes the
    code more obvious and readable.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • I do not see any reason to use KERN_WARN for normal messages and
    KERN_EMERG for error messages in the lib_sort testing routine. Let's use
    more reasonable KERN_NORM and KERN_ERR levels.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • While hunting a non-existing bug in 'list_sort()', I've improved the
    'list_sort_test()' function which tests the 'list_sort()' library call.
    Although in the end I found the bug in my own code, not in 'list_sort()', I
    think my clean-ups and improvements are worth merging because they make
    the test function better.

    This patch:

    Make the self-tests selectable via Kconfig rather than by manual enabling
    of DEBUG_LIST_SORT.

    Signed-off-by: Artem Bityutskiy
    Cc: Don Mullis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
     
  • All percpu counters are linked to a global list on initialization and
    removed from it on destruction. The list is walked during CPU up/down.
    If a percpu counter is freed without being properly destroyed, the system
    will oops only on the next CPU up/down making it pretty nasty to track
    down. This patch adds debugobj support for percpu counters so that such
    problems can be found easily.

    As percpu counters don't make sense on stack and can't be statically
    initialized, debugobj support is pretty simple. It's initialized and
    activated on counter initialization, and deactivated and destroyed on
    counter destruction. With this patch applied, the bug fixed by commit
    602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must
    percpu_counter_destroy) triggers the following warning on tmpfs unmount
    and the system won't oops on the next cpu up/down operation.

    ------------[ cut here ]------------
    WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70()
    Hardware name: Bochs
    ODEBUG: free active (active state 0) object type: percpu_counter
    Modules linked in:
    Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5
    Call Trace:
    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_fmt+0x46/0x50
    [] debug_print_object+0x5c/0x70
    [] debug_check_no_obj_freed+0x125/0x210
    [] kfree+0xb3/0x2f0
    [] shmem_put_super+0x1d/0x30
    [] generic_shutdown_super+0x56/0xe0
    [] kill_anon_super+0x16/0x60
    [] kill_litter_super+0x27/0x30
    [] deactivate_locked_super+0x45/0x60
    [] deactivate_super+0x4a/0x70
    [] mntput_no_expire+0x86/0xe0
    [] sys_umount+0x6f/0x360
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace cce2a341ba3611a7 ]---

    Signed-off-by: Tejun Heo
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Despite the idr_pre_get() kernel-doc, there are some cases where you can
    call idr_pre_get() from within locked regions. Add a description for such
    cases.

    See also: http://lkml.org/lkml/2010/9/16/462

    [akpm@linux-foundation.org: cleanups, grammatical fixes]
    Signed-off-by: Naohiro Aota
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naohiro Aota
     
  • Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • scnprintf() should return 0 if @size is == 0. Update the comment for it,
    as @size is unsigned.

    Signed-off-by: Changli Gao
    Cc: Ingo Molnar
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Changli Gao
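
A userspace sketch of the return-value rule described above: report the number of characters actually stored, which is 0 when the buffer size is 0. The wrapper name sketch_scnprintf is hypothetical and is built on the standard vsnprintf(); it is an illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the number of characters written to buf, excluding the
 * terminating NUL. Unlike snprintf(), never returns more than fits. */
int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    assert(buf != NULL || size == 0);

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (n < 0)
        return 0;               /* formatting error: nothing usable */
    if ((size_t)n < size)
        return n;               /* everything fit */
    if (size != 0)
        return (int)(size - 1); /* truncated: size - 1 chars stored */
    return 0;                   /* size == 0: nothing was stored */
}
```

Because size is unsigned, the only special case below 1 is exactly 0, which is why the comment fix matters.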
     
  • It might be nicer to align the output.

    For instance, ACPI messages sometimes have "(null)" pointers.

    $ dmesg | grep "(null)" -A 1 -B 1
    [ 0.198733] ACPI: Dynamic OEM Table Load:
    [ 0.198745] ACPI: SSDT (null) 00239 (v02 PmRef Cpu0Ist 00003000 INTL 20051117)
    [ 0.199294] ACPI: SSDT 7f596e10 001C7 (v02 PmRef Cpu0Cst 00003001 INTL 20051117)
    [ 0.200708] ACPI: Dynamic OEM Table Load:
    [ 0.200721] ACPI: SSDT (null) 001C7 (v02 PmRef Cpu0Cst 00003001 INTL 20051117)
    [ 0.201950] ACPI: SSDT 7f597f10 000D0 (v02 PmRef Cpu1Ist 00003000 INTL 20051117)
    [ 0.203386] ACPI: Dynamic OEM Table Load:
    [ 0.203398] ACPI: SSDT (null) 000D0 (v02 PmRef Cpu1Ist 00003000 INTL 20051117)
    [ 0.203871] ACPI: SSDT 7f595f10 00083 (v02 PmRef Cpu1Cst 00003000 INTL 20051117)
    [ 0.205301] ACPI: Dynamic OEM Table Load:
    [ 0.205315] ACPI: SSDT (null) 00083 (v02 PmRef Cpu1Cst 00003000 INTL 20051117)

    [akpm@linux-foundation.org: add code comment]
    Signed-off-by: Joe Perches
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
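
A small userspace sketch of the alignment idea: print "(null)" padded to the same width a non-NULL pointer would occupy, so columns line up. The helper name sketch_format_ptr, the width of two hex digits per pointer byte, and right alignment are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats ptr into buf as zero-padded hex, or as a padded "(null)".
 * Returns the number of characters the full output needs. */
int sketch_format_ptr(char *buf, size_t size, const void *ptr)
{
    /* Two hex digits per byte of a pointer. */
    int width = (int)(2 * sizeof(void *));

    assert(buf != NULL);

    if (ptr == NULL)
        return snprintf(buf, size, "%*s", width, "(null)");

    return snprintf(buf, size, "%0*llx", width,
                    (unsigned long long)(uintptr_t)ptr);
}
```

With this, a NULL and a non-NULL pointer produce strings of equal length, so the fields that follow them in a log line start in the same column.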
     
  • WARNING: at lib/list_debug.c:26 __list_add+0x3f/0x81()
    Hardware name: Express5800/B120a [N8400-085]
    list_add corruption. next->prev should be prev (ffffffff81a7ea00), but was dead000000200200. (next=ffff88080b872d58).
    Modules linked in: aoe ipt_MASQUERADE iptable_nat nf_nat autofs4 sunrpc bridge 8021q garp stp llc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath kvm_intel kvm uinput lpfc scsi_transport_fc igb ioatdma scsi_tgt i2c_i801 i2c_core dca iTCO_wdt iTCO_vendor_support pcspkr shpchp megaraid_sas [last unloaded: aoe]
    Pid: 54, comm: events/3 Tainted: G W 2.6.34-vanilla1 #1
    Call Trace:
    [] warn_slowpath_common+0x7c/0x94
    [] warn_slowpath_fmt+0x41/0x43
    [] __list_add+0x3f/0x81
    [] __percpu_counter_init+0x59/0x6b
    [] bdi_init+0x118/0x17e
    [] blk_alloc_queue_node+0x79/0x143
    [] blk_alloc_queue+0x11/0x13
    [] aoeblk_gdalloc+0x8e/0x1c9 [aoe]
    [] aoecmd_sleepwork+0x25/0xa8 [aoe]
    [] worker_thread+0x1a9/0x237
    [] ? aoecmd_sleepwork+0x0/0xa8 [aoe]
    [] ? autoremove_wake_function+0x0/0x39
    [] ? worker_thread+0x0/0x237
    [] kthread+0x7f/0x87
    [] kernel_thread_helper+0x4/0x10
    [] ? kthread+0x0/0x87
    [] ? kernel_thread_helper+0x0/0x10

    It's because there is no initialization code for a list_head contained in
    the struct backing_dev_info under CONFIG_HOTPLUG_CPU, and the bug comes up
    when block device drivers calling blk_alloc_queue() are used. In my
    case, I hit it by using aoe.

    Signed-off-by: Masanori Itoh
    Cc: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masanori ITOH
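
A simplified userspace sketch of why an uninitialized list node trips a debug check like the one in the warning above. The struct and function names are hypothetical stand-ins, not the kernel's list API, and the check is only modeled on the "next->prev should be prev" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty circular list points at itself in both directions. */
void list_node_init(struct list_node *n)
{
    assert(n != NULL);
    n->next = n;
    n->prev = n;
}

/* Inserts entry right after head, but only if head's links are sane.
 * Returns false instead of corrupting memory when they are not. */
bool list_add_checked(struct list_node *entry, struct list_node *head)
{
    struct list_node *prev = head;
    struct list_node *next = head->next;

    /* Same idea as the warning: next->prev should be prev. */
    if (next == NULL || next->prev != prev)
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

A head that went through list_node_init() accepts the insert; a head whose pointers were never set (zeroed or stale) is rejected by the check, which is the situation the report describes.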
     

25 Oct, 2010

2 commits

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Update broken web addresses in arch directory.
    Update broken web addresses in the kernel.
    Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
    Revert "Fix typo: configuation => configuration" partially
    ida: document IDA_BITMAP_LONGS calculation
    ext2: fix a typo on comment in ext2/inode.c
    drivers/scsi: Remove unnecessary casts of private_data
    drivers/s390: Remove unnecessary casts of private_data
    net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
    drivers/infiniband: Remove unnecessary casts of private_data
    drivers/gpu/drm: Remove unnecessary casts of private_data
    kernel/pm_qos_params.c: Remove unnecessary casts of private_data
    fs/ecryptfs: Remove unnecessary casts of private_data
    fs/seq_file.c: Remove unnecessary casts of private_data
    arm: uengine.c: remove C99 comments
    arm: scoop.c: remove C99 comments
    Fix typo configue => configure in comments
    Fix typo: configuation => configuration
    Fix typo interrest[ing|ed] => interest[ing|ed]
    Fix various typos of valid in comments
    ...

    Fix up trivial conflicts in:
    drivers/char/ipmi/ipmi_si_intf.c
    drivers/usb/gadget/rndis.c
    net/irda/irnet/irnet_ppp.c

    Linus Torvalds
     
  • Conflicts:
    include/linux/percpu.h
    mm/percpu.c

    Pekka Enberg
     

23 Oct, 2010

7 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (31 commits)
    driver core: Display error codes when class suspend fails
    Driver core: Add section count to memory_block struct
    Driver core: Add mutex for adding/removing memory blocks
    Driver core: Move find_memory_block routine
    hpilo: Despecificate driver from iLO generation
    driver core: Convert link_mem_sections to use find_memory_block_hinted.
    driver core: Introduce find_memory_block_hinted which utilizes kset_find_obj_hinted.
    kobject: Introduce kset_find_obj_hinted.
    driver core: fix build for CONFIG_BLOCK not enabled
    driver-core: base: change to new flag variable
    sysfs: only access bin file vm_ops with the active lock
    sysfs: Fail bin file mmap if vma close is implemented.
    FW_LOADER: fix kconfig dependency warning on HOTPLUG
    uio: Statically allocate uio_class and use class .dev_attrs.
    uio: Support 2^MINOR_BITS minors
    uio: Cleanup irq handling.
    uio: Don't clear driver data
    uio: Fix lack of locking in init_uio_class
    SYSFS: Allow boot time switching between deprecated and modern sysfs layout
    driver core: remove CONFIG_SYSFS_DEPRECATED_V2 but keep it for block devices
    ...

    Linus Torvalds
     
  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     
  • * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    BKL: introduce CONFIG_BKL.
    dabusb: remove the BKL
    sunrpc: remove the big kernel lock
    init/main.c: remove BKL notations
    blktrace: remove the big kernel lock
    rtmutex-tester: make it build without BKL
    dvb-core: kill the big kernel lock
    dvb/bt8xx: kill the big kernel lock
    tlclk: remove big kernel lock
    fix rawctl compat ioctls breakage on amd64 and itanic
    uml: kill big kernel lock
    parisc: remove big kernel lock
    cris: autoconvert trivial BKL users
    alpha: kill big kernel lock
    isapnp: BKL removal
    s390/block: kill the big kernel lock
    hpet: kill BKL, add compat_ioctl

    Linus Torvalds
     
  • One call chain getting to kset_find_obj is:
    link_mem_sections()
    find_mem_section()
    kset_find_obj()

    This is done during boot. The memory sections were added in a linearly
    increasing order and link_mem_sections tends to utilize them in that
    same linear order.

    Introduce a kset_find_obj_hinted which is passed the result of the
    previous kset_find_obj which it uses for a quick "is the next object
    our desired object" check before falling back to the old behavior.

    Signed-off-by: Robin Holt
    To: Robert P. J. Day
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Kroah-Hartman

    Robin Holt
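
A userspace sketch of the hinted lookup idea: first check whether the object right after the previous result is the one we want, and only then fall back to a full scan. The types and function names below are hypothetical and use a plain singly linked list rather than a kset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct named_obj {
    const char *name;
    struct named_obj *next;
};

/* Full linear scan: the fallback ("old behavior"). */
struct named_obj *find_obj(struct named_obj *head, const char *name)
{
    assert(name != NULL);
    for (struct named_obj *o = head; o != NULL; o = o->next)
        if (strcmp(o->name, name) == 0)
            return o;
    return NULL;
}

/* hint is the result of the previous lookup, or NULL if there is none. */
struct named_obj *find_obj_hinted(struct named_obj *head, const char *name,
                                  struct named_obj *hint)
{
    assert(name != NULL);
    /* Quick check: is the next object our desired object? */
    if (hint != NULL && hint->next != NULL &&
        strcmp(hint->next->name, name) == 0)
        return hint->next;

    return find_obj(head, name);
}
```

When lookups arrive in the same order the objects were added, as the message describes for memory sections at boot, each hinted lookup costs one comparison instead of a scan from the start.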
     
  • With the ddebug_query= boot parameter available, it makes sense to set up
    dynamic debug as soon as possible.

    I expect sysfs files cannot be set up via an arch_initcall, because
    this one runs even before fs_initcall. Therefore I split the
    dynamic_debug_init function into an early one and a later one that
    provides the /sys/../dynamic_debug/control file.

    Possibly dynamic_debug can be initialized even earlier, not sure whether
    this still makes sense then. I picked up arch_initcall as it covers
    quite a lot already.

    Dynamic debug needs to allocate memory, therefore it's not easily possible to
    set it up even before the command line gets parsed.
    Therefore the boot param query string is stored in a temp string which is
    applied when dynamic debug gets set up.

    This has been tested with ddebug_query="file ec.c +p"
    and I could retrieve pr_debug() messages early at boot during ACPI setup:
    ACPI: EC: Look up EC in DSDT
    ACPI: EC: ---> status = 0x08
    ACPI: EC: transaction start
    ACPI: EC: interrupt
    ACPI: EC: ---> status = 0x08
    ACPI: EC: status = 0x00
    ACPI: EC: transaction start
    ACPI: EC:
    Acked-by: jbaron@redhat.com
    Acked-by: Pekka Enberg
    CC: linux-acpi@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Thomas Renninger
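
A userspace sketch of the two-phase pattern described above: the command-line handler only copies the query into a fixed buffer, because no allocator is available yet, and a later init step applies it. All names and the 256-byte limit are hypothetical; applying the query is reduced to a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_QUERY_MAX 256 /* hypothetical size limit */

static char saved_query[SKETCH_QUERY_MAX];
static char applied_query[SKETCH_QUERY_MAX];

/* Phase 1: runs during command-line parsing. No allocation, just a copy. */
bool save_boot_query(const char *arg)
{
    assert(arg != NULL);
    if (strlen(arg) >= SKETCH_QUERY_MAX)
        return false; /* too long to store */
    strcpy(saved_query, arg);
    return true;
}

/* Phase 2: runs once the subsystem is set up. Applies the stored
 * query, if there is one, and clears it. Returns true if applied. */
bool apply_saved_query(void)
{
    if (saved_query[0] == '\0')
        return false;
    /* Stand-in for parsing and enabling the matching debug sites. */
    snprintf(applied_query, sizeof applied_query, "%s", saved_query);
    saved_query[0] = '\0';
    return true;
}

const char *last_applied_query(void)
{
    return applied_query;
}
```

In this sketch, saving the tested query "file ec.c +p" and then running the second phase applies it exactly once.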
     
  • Dynamic debug lacks the ability to enable debug messages at boot time.
    One could patch initramfs or service startup scripts to write to
    /sys/../dynamic_debug/control, but this sucks.

    This patch makes it possible to pass a query in the same format one can
    write to /sys/../dynamic_debug/control via boot param.
    When dynamic debug gets initialized, this query will automatically be
    applied.

    Signed-off-by: Thomas Renninger
    Acked-by: jbaron@redhat.com
    Acked-by: Pekka Enberg
    Signed-off-by: Greg Kroah-Hartman

    Thomas Renninger
     
  • The parsing and applying of dynamic debug strings is not only useful for
    /sys/../dynamic_debug/control write access, but can also be used for
    boot parameter parsing.
    The boot parameter is introduced in a follow up patch.

    Signed-off-by: Thomas Renninger
    Acked-by: jbaron@redhat.com
    Acked-by: Pekka Enberg
    Signed-off-by: Greg Kroah-Hartman

    Thomas Renninger
     

22 Oct, 2010

2 commits

  • …it/konrad/swiotlb-2.6

    * 'stable/swiotlb-0.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6:
    swiotlb: Use page alignment for early buffer allocation
    swiotlb: make io_tlb_overflow static

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits)
    tracing: Fix compile issue for trace_sched_wakeup.c
    [S390] hardirq: remove pointless header file includes
    [IA64] Move local_softirq_pending() definition
    perf, powerpc: Fix power_pmu_event_init to not use event->ctx
    ftrace: Remove recursion between recordmcount and scripts/mod/empty
    jump_label: Add COND_STMT(), reducer wrappery
    perf: Optimize sw events
    perf: Use jump_labels to optimize the scheduler hooks
    jump_label: Add atomic_t interface
    jump_label: Use more consistent naming
    perf, hw_breakpoint: Fix crash in hw_breakpoint creation
    perf: Find task before event alloc
    perf: Fix task refcount bugs
    perf: Fix group moving
    irq_work: Add generic hardirq context callbacks
    perf_events: Fix transaction recovery in group_sched_in()
    perf_events: Fix bogus AMD64 generic TLB events
    perf_events: Fix bogus context time tracking
    tracing: Remove parent recording in latency tracer graph options
    tracing: Use one prologue for the preempt irqs off tracer function tracers
    ...

    Linus Torvalds
     

21 Oct, 2010

1 commit

  • With all the patches we have queued in the BKL removal tree, only a
    few dozen modules are left that actually rely on the BKL, and even
    among those there is a lot of low-hanging fruit. We need to decide what
    to do about them; this patch illustrates one of the options:

    Every user of the BKL is marked as 'depends on BKL' in Kconfig,
    and the CONFIG_BKL becomes a user-visible option. If it gets
    disabled, no BKL using module can be built any more and the BKL
    code itself is compiled out.

    The one exception is file locking, which is practically always
    enabled and does a 'select BKL' instead. This effectively forces
    CONFIG_BKL to be enabled until we have solved the fs/lockd
    mess and can apply the patch that removes the BKL from fs/locks.c.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

19 Oct, 2010

1 commit


15 Oct, 2010

2 commits

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
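
A simplified sketch of the selection rules listed in the semantic patch comments, written as an ordinary C decision function. The struct, the enum and the strict top-to-bottom ordering are assumptions made for illustration; the real patch expresses this as interdependent coccinelle rules, not as code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum llseek_choice {
    KEEP_EXISTING,  /* .llseek already set, leave it alone */
    NO_LLSEEK,
    SEQ_LSEEK,
    DEFAULT_LLSEEK,
    NOOP_LLSEEK
};

/* Hypothetical summary of what was detected about one file_operations. */
struct fops_facts {
    bool has_llseek;             /* .llseek is already assigned */
    bool open_is_nonseekable;    /* open ends up in nonseekable_open */
    bool read_is_seq_read;       /* .read is seq_read */
    bool has_readdir;            /* .readdir is present */
    bool read_or_write_uses_pos; /* read or write touches the offset */
};

enum llseek_choice choose_llseek(const struct fops_facts *f)
{
    assert(f != NULL);

    if (f->has_llseek)
        return KEEP_EXISTING;
    if (f->open_is_nonseekable)
        return NO_LLSEEK;
    if (f->read_is_seq_read)
        return SEQ_LSEEK;
    if (f->has_readdir || f->read_or_write_uses_pos)
        return DEFAULT_LLSEEK;
    /* Offset is provably ignored: keep seeks succeeding as before. */
    return NOOP_LLSEEK;
}
```

The last branch reflects the reasoning in the message: noop_llseek preserves the current behavior of not returning an error from a seek when the offset is never used.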
     
  • All the necessary functionality was already there; we just need
    to make it possible to select the config option.

    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

12 Oct, 2010

2 commits

  • We could call free_bootmem_late() if swiotlb is not used, and
    it will shrink to page alignment.

    So allocate them with page alignment from the start, to avoid losing two pages.

    before patch:
    [ 0.000000] memblock_x86_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
    [ 0.000000] memblock_x86_reserve_range: [00d7e7ef40, 00d7e9ef40] swiotlb list
    [ 0.000000] memblock_x86_reserve_range: [00d7e3ef40, 00d7e7ef40] swiotlb orig_ad
    [ 0.000000] memblock_x86_reserve_range: [000008a000, 0000092000] swiotlb overflo

    after patch will get
    [ 0.000000] memblock_x86_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
    [ 0.000000] memblock_x86_reserve_range: [00d7e7e000, 00d7e9e000] swiotlb list
    [ 0.000000] memblock_x86_reserve_range: [00d7e3e000, 00d7e7e000] swiotlb orig_ad
    [ 0.000000] memblock_x86_reserve_range: [000008a000, 0000092000] swiotlb overflo

    Signed-off-by: Yinghai Lu
    Acked-by: FUJITA Tomonori
    Cc: Becky Bruce
    Signed-off-by: Konrad Rzeszutek Wilk

    Yinghai Lu
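
A short sketch of the page-alignment arithmetic behind the before/after log lines, assuming 4 KiB pages. The helper names are hypothetical, and this only shows the address math, not the allocator call the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumes 4 KiB pages */

/* Rounds a size up to the next multiple of the page size. */
uint64_t page_align_up(uint64_t size)
{
    assert(size <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
    return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Rounds an address down to the start of its page. */
uint64_t page_align_down(uint64_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

bool is_page_aligned(uint64_t addr)
{
    return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

In the log above, the "swiotlb list" range starts at 00d7e7ef40 before the patch, which is not on a page boundary, and at 00d7e7e000 afterwards, which is.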
     
  • We don't need to export io_tlb_overflow_buffer. I'll remove
    io_tlb_overflow_buffer completely in the long term though.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Konrad Rzeszutek Wilk

    FUJITA Tomonori
     

08 Oct, 2010

2 commits


07 Oct, 2010

1 commit