31 Jul, 2012

40 commits

  • Rewrite the existing cpu-notifier-error-inject module to use the new
    debugfs-based framework.

    This change removes the cpu_up_prepare_error and cpu_down_prepare_error
    module parameters, which were used to specify the error code to be
    injected. We could keep these module parameters for backward compatibility
    via module_param_cb, but that seems overkill for this module.

    This provides the ability to inject artificial errors into CPU notifier
    chain callbacks. It is controlled through the debugfs interface under
    /sys/kernel/debug/notifier-error-inject/cpu

    To make the notifier call chain fail when a given event is notified,
    write the error code to "actions/<notifier event>/error".

    Example1: inject CPU offline error (-1 == -EPERM)

    # cd /sys/kernel/debug/notifier-error-inject/cpu
    # echo -1 > actions/CPU_DOWN_PREPARE/error
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    bash: echo: write error: Operation not permitted

    Example2: inject CPU online error (-2 == -ENOENT)

    # cd /sys/kernel/debug/notifier-error-inject/cpu
    # echo -2 > actions/CPU_UP_PREPARE/error
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    bash: echo: write error: No such file or directory

    Signed-off-by: Akinobu Mita
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Greg KH
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patchset provides kernel modules that can be used to test the error
    handling of notifier call chain failures by injecting artificial errors
    into the following notifier chain callbacks.

    * CPU notifier
    * PM notifier
    * memory hotplug notifier
    * powerpc pSeries reconfig notifier

    Example: Inject CPU offline error (-1 == -EPERM)

    # cd /sys/kernel/debug/notifier-error-inject/cpu
    # echo -1 > actions/CPU_DOWN_PREPARE/error
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    bash: echo: write error: Operation not permitted

    The patchset also adds cpu and memory hotplug tests to
    tools/testing/selftests. These tests first do simple online and offline
    tests and then do fault injection tests if the notifier error injection
    module is available.

    This patch:

    The notifier error injection provides the ability to inject artificial
    errors into specified notifier chain callbacks. It is useful for testing
    the error handling of notifier call chain failures.

    This adds common basic functions to define which types of events can be
    failed and to initialize the debugfs interface that controls what error
    code should be returned and which event should be failed.
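
    A minimal userspace sketch of the idea follows. It is not the module's
    code: the structure, the function, and the event numbers are
    hypothetical, and the real callback lives in the kernel and reads the
    error codes from debugfs files.

```c
#include <stddef.h>

/* Hypothetical event numbers; the kernel's real values differ. */
enum { EV_UP_PREPARE = 1, EV_DOWN_PREPARE = 2 };

/* One entry per event that may be made to fail. */
struct inject_action {
	unsigned long event;
	const char *name;	/* would name the directory under actions/ */
	int error;		/* 0 = do not fail, otherwise a negative errno */
};

static struct inject_action cpu_actions[] = {
	{ EV_UP_PREPARE,   "CPU_UP_PREPARE",   0 },
	{ EV_DOWN_PREPARE, "CPU_DOWN_PREPARE", 0 },
};

/* Return the error configured for this event, or 0 if none is set. */
static int inject_callback(const struct inject_action *acts, size_t n,
			   unsigned long event)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (acts[i].event == event)
			return acts[i].error;
	return 0;
}
```

    Writing -1 to an action's error file corresponds to setting the
    matching entry's error field to -1 in this sketch.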

    Signed-off-by: Akinobu Mita
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Greg KH
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • When we restore file descriptors we would like them to look exactly as
    they were at dumping time.

    With the help of fcntl this is almost possible; the missing piece is the
    file owner's UIDs.

    To be able to read their values, F_GETOWNER_UIDS is introduced.

    This option is valid if and only if CONFIG_CHECKPOINT_RESTORE is turned
    on; otherwise -EINVAL is returned.
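
    A hedged usage sketch from userspace. The wrapper name is hypothetical,
    the fallback value 17 is the asm-generic definition and is an assumption
    for other architectures, and the two-element uid_t buffer reflects my
    reading of the interface rather than anything stated above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef F_GETOWNER_UIDS
#define F_GETOWNER_UIDS 17	/* asm-generic value; assumed, libc may lack it */
#endif

/*
 * Ask the kernel for the UIDs recorded for the file's owner.
 * Returns 0 on success or a negative errno value, e.g. -EINVAL
 * when the kernel was built without CONFIG_CHECKPOINT_RESTORE.
 */
static int get_owner_uids(int fd, uid_t uids[2])
{
	if (fcntl(fd, F_GETOWNER_UIDS, uids) == -1)
		return -errno;
	return 0;
}
```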

    Signed-off-by: Cyrill Gorcunov
    Acked-by: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • When the requested range is outside of the root range, the logic in
    __reserve_region_with_split will cause an infinite recursion which will
    overflow the stack, as seen in the warning below.

    This particular stack overflow was caused by requesting the
    (100000000-107ffffff) range while the root range was (0-ffffffff). In
    this case __request_resource would return the whole root range as
    conflict range (i.e. 0-ffffffff). Then, the logic in
    __reserve_region_with_split would continue the recursion requesting the
    new range as (conflict->end+1, end) which incidentally in this case
    equals the originally requested range.

    This patch aborts looking for a usable range when the request does not
    intersect with the root range. When the request partially overlaps with
    the root range, it adjusts the request to fall in the root range and then
    continues with the new request.

    When the request is modified or aborted, errors and a stack trace are
    logged to allow catching the errors in the upper layers.
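
    The abort-or-adjust rule can be sketched in userspace as follows. The
    type and function names are hypothetical and the logging is omitted;
    this only illustrates the range arithmetic, not the kernel patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* inclusive bounds */
};

/*
 * Clip *req so that it falls inside root. Returns false when the two
 * ranges do not intersect, in which case the caller should give up
 * instead of recursing.
 */
static bool clip_to_root(const struct range *root, struct range *req)
{
	if (req->start > root->end || req->end < root->start)
		return false;
	if (req->start < root->start)
		req->start = root->start;
	if (req->end > root->end)
		req->end = root->end;
	return true;
}
```

    With the ranges from the report, a root of 0-ffffffff and a request of
    100000000-107ffffff do not intersect, so the sketch returns false.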

    [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
    [ 5.975150] Modules linked in:
    [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
    [ 5.985324] Call Trace:
    [ 5.987759] [] ? console_unlock+0x17b/0x18d
    [ 5.992891] [] warn_slowpath_common+0x48/0x5d
    [ 5.998194] [] ? sub_preempt_count+0x63/0x89
    [ 6.003412] [] warn_slowpath_null+0xf/0x13
    [ 6.008453] [] sub_preempt_count+0x63/0x89
    [ 6.013499] [] _raw_spin_unlock+0x27/0x3f
    [ 6.018453] [] add_partial+0x36/0x3b
    [ 6.022973] [] deactivate_slab+0x96/0xb4
    [ 6.027842] [] __slab_alloc.isra.54.constprop.63+0x204/0x241
    [ 6.034456] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.039842] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.045232] [] kmem_cache_alloc_trace+0x51/0xb0
    [ 6.050710] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.056100] [] kzalloc.constprop.5+0x29/0x38
    [ 6.061320] [] __reserve_region_with_split+0x1c/0xd1
    [ 6.067230] [] __reserve_region_with_split+0xc6/0xd1
    ...
    [ 7.179057] [] __reserve_region_with_split+0xc6/0xd1
    [ 7.184970] [] reserve_region_with_split+0x30/0x42
    [ 7.190709] [] e820_reserve_resources_late+0xd1/0xe9
    [ 7.196623] [] pcibios_resource_survey+0x23/0x2a
    [ 7.202184] [] pcibios_init+0x23/0x35
    [ 7.206789] [] pci_subsys_init+0x3f/0x44
    [ 7.211659] [] do_one_initcall+0x72/0x122
    [ 7.216615] [] ? pci_legacy_init+0x3d/0x3d
    [ 7.221659] [] kernel_init+0xa6/0x118
    [ 7.226265] [] ? start_kernel+0x334/0x334
    [ 7.231223] [] kernel_thread_helper+0x6/0x10

    Signed-off-by: Octavian Purdila
    Signed-off-by: Ram Pai
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Octavian Purdila
     
  • Convert init_sync_kiocb() from a nasty macro into a nice C function. The
    struct assignment trick takes care of zeroing all unmentioned fields.
    Shrinks fs/read_write.o's .text from 9857 bytes to 9714.

    Also demacroize is_sync_kiocb() and aio_ring_avail(). The latter fixes an
    arg-referenced-multiple-times hand grenade.
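
    To illustrate the hazard with a generic example (not the kernel's
    actual definitions; all names here are hypothetical): a macro that
    mentions an argument twice evaluates it twice, while an inline function
    evaluates it once.

```c
/* Macro form: `nr` appears twice, so its expression runs twice. */
#define RING_AVAIL_MACRO(head, tail, nr) \
	(((head) + (nr) - 1 - (tail)) % (nr))

/* Function form: each argument is evaluated exactly once. */
static inline unsigned ring_avail(unsigned head, unsigned tail, unsigned nr)
{
	return (head + nr - 1 - tail) % nr;
}

/* Helper with a side effect so the evaluations can be counted. */
static unsigned calls;

static unsigned next_nr(void)
{
	calls++;
	return 8;
}
```

    Passing next_nr() as the last argument bumps the counter twice through
    the macro and once through the function, with the same result value.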

    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Acked-by: Jeff Moyer
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Support the caching of large files.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182

    Signed-off-by: Justin Lecher
    Signed-off-by: Suresh Jayaraman
    Tested-by: Suresh Jayaraman
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Justin Lecher
     
  • We should return PTR_ERR if the call to the device_create function fails.
    Without this patch we instead return the value from a successful call to
    cdev_add if the call to device_create fails.
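
    The pattern can be sketched in userspace like this. ERR_PTR(), IS_ERR()
    and PTR_ERR() are re-implemented here to mirror the kernel helpers, and
    the fake_* functions and register_source() are hypothetical stand-ins
    for the driver code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for cdev_add() and device_create(). */
static int fake_cdev_add(void) { return 0; }

static void *fake_device_create(int fail)
{
	static int dev;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&dev;
}

static int register_source(int fail_create)
{
	void *dev;
	int err;

	err = fake_cdev_add();
	if (err)
		return err;

	dev = fake_device_create(fail_create);
	if (IS_ERR(dev)) {
		/* Without this assignment err would still be 0 here. */
		err = (int)PTR_ERR(dev);
		return err;
	}
	return 0;
}
```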

    Signed-off-by: Emil Goode
    Acked-by: Devendra Naga
    Cc: Alexander Gordeev
    Cc: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Emil Goode
     
  • Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44621

    Reported-by:
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • register_sysctl_table() is a strange function, as it makes internal
    allocations (a header) to register a sysctl_table. This header is a
    handle to the table that is created, and can be used to unregister the
    table. But if the table is permanent and never unregistered, the header
    acts the same as a static variable.

    Unfortunately, this allocation of memory that is never expected to be
    freed fools kmemleak into thinking that we have leaked memory. For those
    sysctl tables that are never unregistered, and have no pointer referencing
    them, kmemleak will think that these are memory leaks:

    unreferenced object 0xffff880079fb9d40 (size 192):
    comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x73/0x98
    [] kmemleak_alloc_recursive.constprop.42+0x16/0x18
    [] __kmalloc+0x107/0x153
    [] kzalloc.constprop.8+0xe/0x10
    [] __register_sysctl_paths+0xe1/0x160
    [] register_sysctl_paths+0x1b/0x1d
    [] register_sysctl_table+0x18/0x1a
    [] sysctl_init+0x10/0x14
    [] proc_sys_init+0x2f/0x31
    [] proc_root_init+0xa5/0xa7
    [] start_kernel+0x3d0/0x40a
    [] x86_64_start_reservations+0xae/0xb2
    [] x86_64_start_kernel+0x102/0x111
    [] 0xffffffffffffffff

    The sysctl_base_table used by sysctl itself is one such table: it is
    registered and never unregistered.

    Use kmemleak_not_leak() to suppress the kmemleak false positive.

    Signed-off-by: Steven Rostedt
    Acked-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Rather than #define the options manually in the architecture code, add
    Kconfig options for them and select them there instead. This also allows
    us to select the compat IPC version parsing automatically for platforms
    using the old compat IPC interface.

    Reported-by: Andrew Morton
    Signed-off-by: Will Deacon
    Cc: Arnd Bergmann
    Cc: Chris Metcalf
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • The msgsnd and msgrcv system calls use size_t to represent the size of the
    message being transferred. POSIX states that values of msgsz greater than
    SSIZE_MAX cause the result to be implementation-defined. On Linux, this
    equates to returning -EINVAL if (long) msgsz < 0.

    For compat tasks where !CONFIG_ARCH_WANT_OLD_COMPAT_IPC and compat_size_t
    is smaller than size_t, negative size values passed from userspace will be
    interpreted as positive values by do_msg{rcv,snd} and will fail to exit
    early with -EINVAL.

    This patch changes the compat prototypes for msg{rcv,snd} so that the
    message size is represented as a compat_ssize_t, which we cast to the
    native ssize_t type for the core IPC code.
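
    The difference between the two prototypes comes down to zero extension
    versus sign extension. The sketch below assumes a 32-bit compat task on
    a 64-bit kernel and uses fixed-width types in place of compat_size_t,
    compat_ssize_t and long; the function names are hypothetical.

```c
#include <stdint.h>

/* Old prototype: the 32-bit value is treated as unsigned and zero-extended. */
static int64_t widen_as_compat_size_t(uint32_t msgsz)
{
	return (int64_t)(uint64_t)msgsz;
}

/*
 * New prototype: the same bits are treated as signed and sign-extended.
 * The uint32_t to int32_t conversion is implementation-defined in C;
 * common compilers wrap as shown.
 */
static int64_t widen_as_compat_ssize_t(uint32_t msgsz)
{
	return (int64_t)(int32_t)msgsz;
}

/* The core check: reject with -EINVAL when (long) msgsz < 0. */
static int core_rejects(int64_t msgsz)
{
	return msgsz < 0;
}
```

    A compat task passing 0xffffffff (that is, -1) slips past the check when
    widened the old way and is rejected when widened the new way.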

    Cc: Arnd Bergmann
    Acked-by: Chris Metcalf
    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • Commit 48b25c43e6ee ("ipc: provide generic compat versions of IPC
    syscalls") added a new ARCH_WANT_OLD_COMPAT_IPC config option for
    architectures to select if their compat target requires the old IPC
    syscall interface.

    For architectures (such as AArch64) that do not require the internal
    calling conventions provided by this option, but have a compat target
    where the C library passes the IPC_64 flag explicitly,
    compat_ipc_parse_version no longer strips out the flag before calling
    the native system call implementation, resulting in unknown SHM/IPC
    commands and -EINVAL being returned to userspace.

    This patch separates the selection of the internal calling conventions
    for the IPC syscalls from the version parsing, allowing architectures to
    select __ARCH_WANT_COMPAT_IPC_PARSE_VERSION if they want to use version
    parsing whilst retaining the newer syscall calling conventions.

    Acked-by: Chris Metcalf
    Cc: Arnd Bergmann
    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • If the SHMLBA definition for a native task differs from the definition for
    a compat task, the do_shmat() function would need to handle both.

    This patch introduces COMPAT_SHMLBA, which is used by the compat shmat
    syscall when calling the ipc code and allows architectures such as AArch64
    (where the native SHMLBA is 64k but the compat (AArch32) definition is
    16k) to provide the correct semantics for compat IPC system calls.

    Cc: David S. Miller
    Cc: Chris Zankel
    Cc: Arnd Bergmann
    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • The last line of vmcoreinfo note does not end with \n. Parsing all the
    lines in note becomes easier if all lines end with \n instead of trying to
    special case the last line.

    I know at least one tool, vmcore-dmesg in kexec-tools tree which made the
    assumption that all lines end with \n. I think it is a good idea to fix
    it.

    Signed-off-by: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Atsushi Kumagai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • The function dup_task() may fail at the following function calls in the
    following order.

    0) alloc_task_struct_node()
    1) alloc_thread_info_node()
    2) arch_dup_task_struct()

    An error at 0) needs no cleanup; the function can just return. But an
    error at 1) requires releasing the task_struct allocated by 0) before
    returning. Likewise, an error at 2) requires releasing the task_struct
    and thread_info allocated by 0) and 1).

    The existing error handling calls free_task_struct() and
    free_thread_info() which do not only release task_struct and thread_info,
    but also call architecture specific arch_release_task_struct() and
    arch_release_thread_info().

    The problem is that task_struct and thread_info are not fully initialized
    yet at this point, but arch_release_task_struct() and
    arch_release_thread_info() are called with them.

    For example, x86 defines its own arch_release_task_struct() that releases
    a task_xstate. If alloc_thread_info_node() fails in dup_task(),
    arch_release_task_struct() is called with task_struct which is just
    allocated and filled with garbage in this error handling.

    This actually happened with tools/testing/fault-injection/failcmd.sh

    # env FAILCMD_TYPE=fail_page_alloc \
    ./tools/testing/fault-injection/failcmd.sh --times=100 \
    --min-order=0 --ignore-gfp-wait=0 \
    -- make -C tools/testing/selftests/ run_tests

    In order to fix this issue, make free_{task_struct,thread_info}() not
    call arch_release_{task_struct,thread_info}(), and call
    arch_release_{task_struct,thread_info}() explicitly where needed.

    arch_release_task_struct() and arch_release_thread_info() are defined as
    empty by default. So this change only affects the architectures which
    implement their own arch_release_task_struct() or
    arch_release_thread_info() as listed below.

    arch_release_task_struct(): x86, sh
    arch_release_thread_info(): mn10300, tile
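
    The resulting split between "free the memory" and "run the arch release
    hook" can be sketched in userspace as follows. Every name here is
    hypothetical, and malloc()/free() stand in for the kernel allocators.

```c
#include <stdlib.h>

struct task {
	void *thread_info;
	void *arch_state;
};

static int arch_release_calls;	/* counts hook invocations for the demo */

/* Arch hook: only valid on a fully initialised task. */
static void arch_release_task(struct task *t)
{
	arch_release_calls++;
	free(t->arch_state);
}

/* Memory-only release: safe on a half-built task. */
static void free_task_raw(struct task *t)
{
	free(t);
}

/* fail_step selects which of the three steps fails (3 = none). */
static struct task *dup_task_sketch(int fail_step)
{
	struct task *t = fail_step == 0 ? NULL : malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->thread_info = fail_step == 1 ? NULL : malloc(64);
	if (!t->thread_info)
		goto free_task;
	t->arch_state = fail_step == 2 ? NULL : malloc(64);
	if (!t->arch_state)
		goto free_ti;
	return t;

free_ti:
	free(t->thread_info);
free_task:
	free_task_raw(t);	/* no arch hook: arch_state was never set up */
	return NULL;
}

/* Normal teardown calls the arch hook explicitly, then frees memory. */
static void release_task_sketch(struct task *t)
{
	arch_release_task(t);
	free(t->thread_info);
	free_task_raw(t);
}
```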

    Signed-off-by: Akinobu Mita
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Paul Mundt
    Cc: Chris Metcalf
    Cc: Salman Qazi
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • To make way for "fork: fix error handling in dup_task()", which fixes the
    errors more completely.

    Cc: Salman Qazi
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The current code can be replaced by vma_pages(). So use it to simplify
    the code.

    [akpm@linux-foundation.org: initialise `len' at its definition site]
    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • __mem_open(), which is called by both the /proc/<pid>/environ and
    /proc/<pid>/mem ->open() handlers, will allow the use of negative
    offsets. /proc/<pid>/mem has negative offsets but not
    /proc/<pid>/environ.

    Clean this up by moving the 'force FMODE_UNSIGNED_OFFSET flag' to
    mem_open() to allow negative offsets only on /proc/<pid>/mem.

    Signed-off-by: Djalal Harouni
    Cc: Oleg Nesterov
    Cc: Brad Spengler
    Acked-by: Kees Cook
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Djalal Harouni
     
  • Currently the following offset and environment address range check in
    environ_read() of /proc/<pid>/environ is buggy:

    int this_len = mm->env_end - (mm->env_start + src);
    if (this_len <= 0)
            break;

    Large or negative offsets on /proc/<pid>/environ converted to 'unsigned
    long' may pass this check since '(mm->env_start + src)' can overflow and
    'this_len' will be positive.

    This can make /proc/<pid>/environ act like /proc/<pid>/mem since
    (mm->env_start + src) will point to and read from another VMA.

    There are two fixes here plus some code cleaning:

    1) Fix the overflow by checking if the offset that was converted to
    unsigned long will always point to the [mm->env_start, mm->env_end]
    address range.

    2) Remove the truncation that was made to the result of the check,
    storing the result in 'int this_len' will alter its value and we can
    not depend on it.
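
    The overflow and a wrap-safe alternative can be sketched in userspace
    as follows. This is an illustration rather than the patch: the function
    names are hypothetical, and the fixed check shown (comparing the offset
    against the region length) is one way to express fix 1).

```c
#include <stdbool.h>

/* Old logic: the sum may wrap and the result is truncated to int. */
static bool old_check_passes(unsigned long env_start, unsigned long env_end,
			     unsigned long src)
{
	int this_len = (int)(env_end - (env_start + src));

	return this_len > 0;
}

/* Wrap-safe logic: compare the offset with the region length instead. */
static bool new_check_passes(unsigned long env_start, unsigned long env_end,
			     unsigned long src)
{
	return env_start < env_end && src < env_end - env_start;
}
```

    For a region of 0x7ffd0000-0x7ffd1000 and an offset of -4096 converted
    to unsigned long, the sum wraps to 0x7ffcf000, so the old check sees a
    positive length of 0x2000 and passes, while the new check rejects it.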

    For kernels that have commit b409e578d ("proc: clean up
    /proc/<pid>/environ handling") which adds the appropriate ptrace check
    and saves the 'mm' at ->open() time, this is not a security issue.

    This patch is taken from the grsecurity patch since it was just made
    available.

    Signed-off-by: Djalal Harouni
    Cc: Oleg Nesterov
    Cc: Brad Spengler
    Acked-by: Kees Cook
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Djalal Harouni
     
  • In commit 898b374af6f7 ("exec: replace call_usermodehelper_pipe with use
    of umh init function and resolve limit"), the core limits recursive
    check value was changed from 0 to 1, but the corresponding comments were
    not updated.

    Signed-off-by: Jovi Zhang
    Cc: Oleg Nesterov
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jovi Zhang
     
  • The system deadlocks (at least since 2.6.10) when a
    call_usermodehelper(UMH_WAIT_EXEC) request triggers a
    call_usermodehelper(UMH_WAIT_PROC) request.

    This is because "khelper thread is waiting for the worker thread at
    wait_for_completion() in do_fork() since the worker thread was created
    with CLONE_VFORK flag" and "the worker thread cannot call complete()
    because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
    thread cannot start processing UMH_WAIT_PROC request because the khelper
    thread is waiting for the worker thread at wait_for_completion() in
    do_fork()".

    The easiest way to observe this deadlock is to use a corrupted
    /sbin/hotplug binary (as shown below).

    # : > /tmp/dummy
    # chmod 755 /tmp/dummy
    # echo /tmp/dummy > /proc/sys/kernel/hotplug
    # modprobe whatever

    call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
    kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
    module. do_execve("/tmp/dummy") triggers a call to
    request_module("binfmt-0000") from search_binary_handler() which in turn
    calls call_usermodehelper(UMH_WAIT_PROC).

    In order to avoid the deadlock, as a for-now and easy-to-backport
    solution, do not try to call wait_for_completion() in
    call_usermodehelper_exec() if the worker thread was created by the
    khelper thread with the CLONE_VFORK flag. A future, more fundamental
    solution might be to replace the singleton khelper thread with a
    workqueue so that recursive calls up to the max_active dependency loop
    can be handled without deadlock.

    [akpm@linux-foundation.org: add comment to kmod_thread_locker]
    Signed-off-by: Tetsuo Handa
    Cc: Arjan van de Ven
    Acked-by: Rusty Russell
    Cc: Tejun Heo
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • This function's interface is, uh, subtle. Attempt to apologise for it.

    Cc: WANG Cong
    Cc: Cyrill Gorcunov
    Cc: Kees Cook
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Alan Cox
    Cc: Oleg Nesterov
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Nearly identical shortname parsing is performed in fat_search_long() and
    __fat_readdir(). Extract this code into a function that may be called by
    both.

    Signed-off-by: Steven J. Magnani
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven J. Magnani
     
  • Simplify code by providing accessor functions for the directory entry
    start cluster fields.

    Signed-off-by: Steven J. Magnani
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven J. Magnani
     
  • Use -ENOMEM return value instead of -EINVAL when kzalloc() fails.

    Signed-off-by: Namjae Jeon
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namjae Jeon
     
  • Add omitted comments for different structures in driver implementation.

    Signed-off-by: Vyacheslav Dubeyko
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Add omitted comments for structures in nilfs2_fs.h.

    Signed-off-by: Vyacheslav Dubeyko
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:

    chcp D ffff88013870f3d0 0 1325 1324 0x00000004
    ...
    Call Trace:
    nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
    wake_up_bit+0x20/0x20
    copy_from_user+0x18/0x30 [nilfs2]
    nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
    nilfs_ioctl+0x252/0x61a [nilfs2]
    do_page_fault+0x311/0x34c
    get_unmapped_area+0x132/0x14e
    do_vfs_ioctl+0x44b/0x490
    __set_task_blocked+0x5a/0x61
    vm_mmap_pgoff+0x76/0x87
    __set_current_blocked+0x30/0x4a
    sys_ioctl+0x4b/0x6f
    system_call_fastpath+0x16/0x1b
    thaw D ffff88013870d890 0 1352 1351 0x00000004
    ...
    Call Trace:
    rwsem_down_failed_common+0xdb/0x10f
    call_rwsem_down_write_failed+0x13/0x20
    down_write+0x25/0x27
    thaw_super+0x13/0x9e
    do_vfs_ioctl+0x1f5/0x490
    vm_mmap_pgoff+0x76/0x87
    sys_ioctl+0x4b/0x6f
    filp_close+0x64/0x6c
    system_call_fastpath+0x16/0x1b

    where the thaw ioctl deadlocked at thaw_super() when called while chcp was
    waiting at nilfs_transaction_begin() called from
    nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible.

    This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
    read mode and then waits for unfreezing in nilfs_transaction_begin(),
    whereas thaw_super() locks sb->s_umount in write mode. The locking of
    sb->s_umount here was intended to make snapshot mounts and the downgrade
    of snapshots to checkpoints exclusive.

    This fixes the deadlock issue by replacing the sb->s_umount usage in
    nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
    mounts.

    Signed-off-by: Ryusuke Konishi
    Cc: Fernando Luis Vazquez Cao
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The checkpoint deletion ioctl (rmcp ioctl) can break snapshots because
    it is not fully exclusive with the checkpoint mode change ioctl (chcp
    ioctl).

    The rmcp ioctl first tests if the specified checkpoint is a snapshot or
    not within nilfs_cpfile_delete_checkpoint function, and then calls
    nilfs_cpfile_delete_checkpoints function to actually invalidate the
    checkpoint only if it's not a snapshot. However, the checkpoint can be
    changed into a snapshot by the chcp ioctl between these two operations.
    In that case, calling nilfs_cpfile_delete_checkpoints() wrongly
    invalidates the snapshot, which leads to snapshot list corruption and
    snapshot count mismatch.

    This fixes the issue by changing nilfs_cpfile_delete_checkpoints() so
    that it rechecks whether the target checkpoints are snapshots.

    This second check is exclusive with the chcp operation since it is
    protected by an existing semaphore.

    Signed-off-by: Ryusuke Konishi
    Cc: Fernando Luis Vazquez Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove
    references to them in the NTFS code. Noticed while cleaning up the
    fsfreeze mess.

    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fernando Luis Vazquez Cao
     
  • Add omitted comment for ns_mount_state field of the_nilfs structure.

    Signed-off-by: Vyacheslav Dubeyko
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • On minix2 and minix3 max_size is usually 7fffffff, and the check in
    question prohibits creation of the last block, which ends right before
    7fffffff, due to downward rounding during the division. Fix it by using
    multiplication instead.
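
    A small sketch of the rounding problem. The function names are
    hypothetical, and a block size of 1024 bytes is an assumption made for
    the arithmetic; the kernel code uses its own constants.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 1024u	/* assumed block size for this example */

/* Division form: max_size / block size rounds down, losing the last block. */
static bool old_rejects(uint32_t block, uint32_t max_size)
{
	return block >= max_size / SKETCH_BLOCK_SIZE;
}

/* Multiplication form: compare the block's byte offset with max_size. */
static bool new_rejects(uint32_t block, uint32_t max_size)
{
	return (uint64_t)block * SKETCH_BLOCK_SIZE >= max_size;
}
```

    With max_size 7fffffff, block 2097151 starts at byte 7ffffc00, which is
    below the limit. The division form rejects it because 7fffffff / 1024
    rounds down to 2097151, while the multiplication form accepts it.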

    [akpm@linux-foundation.org: fix up code layout, use local `sb']
    Signed-off-by: Vladimir Serbinenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Serbinenko
     
  • Set the of_match_table for this driver so that devices can be described
    in the device tree.

    Signed-off-by: Nick Bowler
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Bowler
     
  • The owner member is supposed to be set to the module implementing the
    device driver, i.e., THIS_MODULE. This enables the appropriate module
    link in sysfs.

    Signed-off-by: Nick Bowler
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Bowler
     
  • Freeing will trigger when driver unloads, so using devm_kfree() is not
    needed.

    Signed-off-by: Devendra Naga
    Cc: Alessandro Zummo
    Cc: Ashish Jangam
    Cc: David Dajun Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Devendra Naga
     
  • Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • This allows automatic driver loading for all supported device types.

    Signed-off-by: Uwe Kleine-König
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • Fixes the following checkpatch warnings:

    WARNING: Use #include instead of
    WARNING: Use #include instead of

    Signed-off-by: Sachin Kamat
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sachin Kamat
     
  • When the driver detects that the clock time is invalid, it attempts to
    write a sane time into the hardware. We currently assume that everything
    is OK if those writes succeeded. But it is better to re-read the time
    from the hardware to ensure that the new settings got there OK.

    Cc: Devendra Naga
    Cc: Alessandro Zummo
    Cc: Anatolij Gustschin
    Cc: Andreas Dumberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • r9701_get_datetime() calls rtc_valid_tm() and returns the value returned
    by rtc_valid_tm(), which can be used in the `if', so calling
    rtc_valid_tm() a second time is not required.

    Signed-off-by: Devendra Naga
    Cc: Alessandro Zummo
    Cc: Anatolij Gustschin
    Cc: Andreas Dumberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Devendra Naga