01 May, 2013

40 commits

  • Use platform_device_put() instead of platform_device_unregister() if
    platform_device_add() fails, and also check the return value of
    platform_device_add_data().

    Signed-off-by: Wei Yongjun
    Cc: Evgeniy Polyakov
    Cc: Greg KH
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yongjun
     
  • r592_pm_ops is not exported, so it can be made static. Also, use
    CONFIG_PM_SLEEP to remove unnecessary ifdefs.

    Signed-off-by: Jingoo Han
    Cc: Maxim Levitsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
     
  • Signed-off-by: liguang
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    liguang
     
  • Signed-off-by: liguang
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    liguang
     
  • drivers/pps/kc.c:37:1: sparse: symbol 'pps_kc_hardpps_lock' was not declared. Should it be static?
    drivers/pps/kc.c:39:19: sparse: symbol 'pps_kc_hardpps_dev' was not declared. Should it be static?
    drivers/pps/kc.c:40:5: sparse: symbol 'pps_kc_hardpps_mode' was not declared. Should it be static?

    Signed-off-by: Fengguang Wu
    Cc: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Hide CONFIG_PPS_DEBUG and CONFIG_NTP_PPS when CONFIG_PPS is not selected,
    so that we are not prompted for these configuration options if CONFIG_PPS
    is not set.

    Signed-off-by: Florian Fainelli
    Cc: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • Signed-off-by: Mihnea Dobrescu-Balaur
    Cc: Ed Cashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mihnea Dobrescu-Balaur
     
  • Raise the default max request size for nbd to 128KB (from 127KB) so that it
    is 4KB aligned. This patch also allows the max request size to be increased
    (via /sys/block/nbd/queue/max_sectors_kb) to 32MB.

    The patch makes nbd network traffic more efficient by:
    - reducing request fragmentation (4KB alignment)
    - reducing the number of requests (fewer round trips, less network overhead)

    Especially in high-latency networks, a larger request size can make a
    dramatic difference in performance.

    Signed-off-by: Paul Clements
    Signed-off-by: Michal Belczyk
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Belczyk
     
  • Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, so that the
    definition of PID_MAP_ENTRIES can be simplified by using BITS_PER_PAGE.

    [akpm@linux-foundation.org: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined]
    Signed-off-by: Raphael S.Carvalho
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raphael S.Carvalho
     
  • find_next_offset() searches the respective pid bitmap (page) for an
    available "cleared bit" and returns its offset if one is found; otherwise
    it returns a value equal to BITS_PER_PAGE.

    If find_next_offset() finds no available bit, there is no point in
    calling mk_pid(); doing so only wastes CPU cycles.

    Therefore, I think it is better to call mk_pid() only after the check
    (offset < BITS_PER_PAGE) succeeds. Moreover, if (offset <
    BITS_PER_PAGE) fails, mk_pid() would be called again afterwards anyway.
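
    A userspace sketch of the reordering, assuming a single 4KB bitmap page.
    find_next_offset(), mk_pid() and alloc_from_page() below are simplified,
    hypothetical stand-ins, not the kernel's alloc_pidmap() code: the PID
    number is computed only once the offset is known to be in range.

    ```c
    #include <limits.h>
    #include <stdint.h>

    #define PAGE_SIZE_BYTES 4096
    #define BITS_PER_PAGE (PAGE_SIZE_BYTES * CHAR_BIT)

    /* Hypothetical: first clear bit at or after off, else BITS_PER_PAGE. */
    static int find_next_offset(const uint8_t *map, int off)
    {
        for (; off < BITS_PER_PAGE; off++)
            if (!(map[off / CHAR_BIT] & (1u << (off % CHAR_BIT))))
                return off;
        return BITS_PER_PAGE;
    }

    /* Hypothetical: turn (page index, offset) into a PID number. */
    static int mk_pid(int page_index, int offset)
    {
        return page_index * BITS_PER_PAGE + offset;
    }

    /* Returns the allocated PID, or -1 if this page is full. */
    static int alloc_from_page(uint8_t *map, int page_index, int start)
    {
        int offset = find_next_offset(map, start);

        /* Check the bound first; mk_pid() is only worth calling on success. */
        if (offset >= BITS_PER_PAGE)
            return -1;
        map[offset / CHAR_BIT] |= (uint8_t)(1u << (offset % CHAR_BIT));
        return mk_pid(page_index, offset);
    }
    ```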

    [akpm@linux-foundation.org: simplify code]
    Signed-off-by: Raphael S. Carvalho
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raphael S. Carvalho
     
  • Signed-off-by: Davidlohr Bueso
    Reviewed-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Account for the rbtree having at least 2**bh(v)-1 internal nodes, where
    bh(v) is the black height of node v.

    While this can be seen as a consequence of other checks, Michel states
    that it nicely sums up what the other properties are for.
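
    The check can be sketched in plain C over a hypothetical minimal node
    type (not the kernel's struct rb_node): compute the black height of the
    root and verify that the tree holds at least (1 << bh) - 1 nodes. Black
    height here counts black nodes on the leftmost path, root included.

    ```c
    #include <stddef.h>

    enum color { RED, BLACK };

    struct node {                      /* hypothetical minimal node */
        enum color color;
        struct node *left, *right;
    };

    /* Black nodes on the leftmost root-to-leaf path (root included).
     * In a valid red-black tree every root-to-leaf path gives the same value. */
    static int black_height(const struct node *n)
    {
        int h = 0;

        for (; n; n = n->left)
            if (n->color == BLACK)
                h++;
        return h;
    }

    static int count_nodes(const struct node *n)
    {
        return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
    }

    /* Summary property: black height bh implies at least 2^bh - 1 nodes. */
    static int check_min_size(const struct node *root)
    {
        return count_nodes(root) >= (1 << black_height(root)) - 1;
    }
    ```

    A tree that fails this check necessarily violates one of the other
    red-black properties, which is why the check summarizes them.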

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Simplify the logic of variable assignments.

    [akpm@linux-foundation.org: replace min_t with min, remove unneeded casts]
    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Reviewed-by: Simon Horman
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • The types of the following local variables:

    - ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment()

    - r in vmcoreinfo_append_str()

    are wrong, so fix them.

    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Cc: Simon Horman
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • threadgroup_lock() takes signal->cred_guard_mutex to ensure that
    thread_group_leader() is stable. This doesn't look nice: the scope of
    this lock in do_execve() is huge.

    And as Dave pointed out, this can lead to a deadlock, because we have the
    following dependencies:

    do_execve: cred_guard_mutex -> i_mutex
    cgroup_mount: i_mutex -> cgroup_mutex
    attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

    Change de_thread() to take threadgroup_change_begin() around the
    switch-the-leader code and change threadgroup_lock() to avoid
    ->cred_guard_mutex.

    Note that de_thread() can't sleep with ->group_rwsem held; this can
    obviously deadlock with the exiting leader if the writer is active, so it
    does threadgroup_change_end() before schedule().
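
    The deadlock is a cycle in the lock-order graph. As a toy illustration
    (not how lockdep is implemented), the three dependencies above can be
    encoded as edges and checked for a cycle with a depth-first search. The
    enum names are hypothetical labels for the three locks.

    ```c
    #include <stdbool.h>

    enum lock { CRED_GUARD_MUTEX, I_MUTEX, CGROUP_MUTEX, NLOCKS };

    /* edge[a][b] == true means "a is held while acquiring b". */
    static bool edge[NLOCKS][NLOCKS];

    /* state: 0 = unvisited, 1 = on the DFS stack, 2 = finished */
    static bool dfs(int u, int *state)
    {
        state[u] = 1;
        for (int v = 0; v < NLOCKS; v++) {
            if (!edge[u][v])
                continue;
            if (state[v] == 1)
                return true;            /* back edge: lock-order cycle */
            if (state[v] == 0 && dfs(v, state))
                return true;
        }
        state[u] = 2;
        return false;
    }

    static bool has_lock_cycle(void)
    {
        int state[NLOCKS] = {0};

        for (int u = 0; u < NLOCKS; u++)
            if (state[u] == 0 && dfs(u, state))
                return true;
        return false;
    }
    ```

    Dropping the cgroup_mutex -> cred_guard_mutex edge, which is the effect
    of making threadgroup_lock() avoid ->cred_guard_mutex, breaks the cycle.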

    Reported-by: Dave Jones
    Acked-by: Tejun Heo
    Acked-by: Li Zefan
    Signed-off-by: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • set_task_comm() does memset() + wmb() before strlcpy(). This buys
    nothing and, to add to the confusion, the comment is wrong.

    - We do not need memset() to be "safe from non-terminating string
    reads": the final char is always zero and we never change it.

    - wmb() is paired with nothing, so it cannot prevent a reader from
    printing a mixture of old and new data unless the reader takes the lock.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently, a write to a procfs file will return the number of bytes
    successfully written. If the actual string is longer than this, the
    remainder of the string will not be written and userspace will
    complete the operation by issuing additional write()s.

    Hence

    $ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

    results in

    $ cat /proc/$$/comm
    pqrs

    since the final four bytes were written with a second write(), because
    TASK_COMM_LEN == 16. This is obviously an undesired result and not
    equivalent to prctl(PR_SET_NAME). The implementation should not need to
    know the definition of TASK_COMM_LEN.

    This patch truncates the string to the first TASK_COMM_LEN - 1 bytes
    (leaving room for the terminating NUL) and returns the full length of the
    input as the number of bytes written, so the second write() is suppressed.

    $ cat /proc/$$/comm
    abcdefghijklmno
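
    The fixed write semantics can be modelled in userspace. This is a sketch
    assuming TASK_COMM_LEN == 16 as stated above; comm_write() is a
    hypothetical stand-in for the procfs handler that works on a plain buffer
    instead of copying from userspace.

    ```c
    #include <stddef.h>
    #include <string.h>

    #define TASK_COMM_LEN 16

    /* Copy at most TASK_COMM_LEN - 1 bytes, always NUL-terminate, and report
     * the whole input as consumed so the caller issues no second write(). */
    static size_t comm_write(char comm[TASK_COMM_LEN], const char *buf, size_t count)
    {
        size_t maxlen = TASK_COMM_LEN - 1;
        size_t n = count < maxlen ? count : maxlen;

        memset(comm, 0, TASK_COMM_LEN);
        memcpy(comm, buf, n);
        return count;
    }
    ```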

    Signed-off-by: David Rientjes
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
    wait_event-like loop. This is not needed and in fact is not strictly
    correct; we can/should do this only once after we change pipe->writers.
    We could even check whether it becomes zero.

    Change this code to use wait_event_interruptible(); this can also
    help to make this wait freezable.

    With this patch we check pipe->readers without pipe_lock(), which is
    fine. Once we see pipe->readers == 1 we know that the handler
    decremented the counter, and this is all we need.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE; move this into
    zap_threads(), which is called by do_coredump().

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Based on a discussion with Mandeep.

    Change dump_write(), dump_seek() and do_coredump() to check
    signal_pending() and abort if it is true. dump_seek() does this only
    before f_op->llseek(); otherwise it relies on dump_write().

    We need this change to ensure that the coredump won't delay suspend, and
    to ensure it reacts to SIGKILL "quickly enough", since a core dump can
    take a lot of time. In particular, this can help the oom-killer.

    We add a new trivial helper, dump_interrupted(), to add the comments and
    to simplify the potential freezer changes. Perhaps it will have more
    callers.

    Ideally it should do try_to_freeze(), but then we need unpleasant
    changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
    change dump_write() to restart if f_op->write() fails because of
    freezing(). We need to handle the short writes, we need to clear
    TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
    it to check PF_DUMPCORE). And if a buggy f_op->write() sets
    TIF_SIGPENDING, we cannot distinguish this case from the race with
    freeze_task() + __thaw_task().

    So we simply accept the fact that the freezer can truncate a core dump,
    but at least you can reliably suspend. Hopefully we can tolerate this
    unlikely case; the necessary complications are not worth the trouble.
    But if we decide to make the coredumping freezable later we can do this on
    top of this change.
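
    A userspace sketch of the abort-on-signal pattern. The `pending` flag
    stands in for signal_pending(), and the in-memory `struct sink` replaces
    f_op->write(); both are hypothetical, and dump_interrupted() and
    dump_write() here only mimic the names used above.

    ```c
    #include <signal.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Stand-in for signal_pending(current); a signal handler might set it. */
    static volatile sig_atomic_t pending;

    static bool dump_interrupted(void)
    {
        return pending != 0;
    }

    struct sink { char data[256]; size_t len; };   /* hypothetical target */

    /* Write in chunks, checking for a pending signal before each chunk.
     * Returns true only if every byte was written. */
    static bool dump_write(struct sink *out, const char *buf, size_t len, size_t chunk)
    {
        if (chunk == 0)
            return false;
        while (len > 0) {
            size_t n = len < chunk ? len : chunk;

            if (dump_interrupted() || out->len + n > sizeof(out->data))
                return false;
            memcpy(out->data + out->len, buf, n);
            out->len += n;
            buf += n;
            len -= n;
        }
        return true;
    }
    ```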

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that the coredumping process can be SIGKILL'ed, the setting of
    ->group_exit_code in do_coredump() can race with complete_signal() and
    SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
    SIGKILL | 0x80.

    But the main problem is that it is not clear to me what we should do if
    binfmt->core_dump() succeeds but SIGKILL was sent; that is why this patch
    comes as a separate change.

    This patch adds 0x80 if ->core_dump() succeeds and the process was not
    killed. But perhaps we can (should?) re-set ->group_exit_code changed by
    SIGKILL back to "siginfo->si_signo |= 0x80" when core_dumped == T.
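
    The 0x80 bit is the same core-dump flag that wait() reports alongside the
    signal number in the low 7 bits. A sketch decoding hand-built status words
    with the standard <sys/wait.h> macros; killed_status() is a hypothetical
    helper, and WCOREDUMP is not POSIX, hence the fallback definition.

    ```c
    #include <signal.h>
    #include <stdbool.h>
    #include <sys/wait.h>

    #ifndef WCOREDUMP                 /* not POSIX; glibc provides it by default */
    #define WCOREDUMP(status) ((status) & 0x80)
    #endif

    /* Status a parent would see for a child killed by signo,
     * optionally with the 0x80 core-dump bit set. */
    static int killed_status(int signo, bool core_dumped)
    {
        return signo | (core_dumped ? 0x80 : 0);
    }
    ```

    A status of SIGKILL | 0x80, as mentioned above, would therefore claim a
    core dump for a process that was actually killed mid-dump.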

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • prepare_signal() blesses SIGKILL sent to the dumping process, but this
    signal can be "lost" anyway. The problem is that complete_signal() sees
    SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
    dumping process is single-threaded (so the target is always "correct"),
    the group-wide SIGKILL is not recorded in task->pending and thus
    __fatal_signal_pending() won't be true. A multi-threaded case has even
    more problems.

    And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
    right to me. This coredumping process is not exiting yet, it can do a lot
    of work dumping the core.

    With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT; we set
    signal->group_exit_task instead. This makes signal_group_exit() true and
    thus should equally close the races with exit/exec/stop, but it allows the
    dumping thread to be killed reliably.

    Notes:
    - It is not clear what we should do with ->group_exit_code
    if the dumper was killed, see the next change.

    - we need more (hopefully straightforward) changes to ensure
    that SIGKILL actually interrupts the coredump. Basically we
    need to check __fatal_signal_pending() in dump_write() and
    dump_seek().

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There are 2 well known and ancient problems with coredump/signals, and a
    lot of related bug reports:

    - do_coredump() clears TIF_SIGPENDING but of course this can't help
    if, say, SIGCHLD comes after that.

    In this case the coredump can fail unexpectedly. See, for example, the
    wait_for_dump_helper()->signal_pending() check, though there are other
    reasons as well.

    - At the same time, dumping a huge core on slow media can take a
    lot of time/resources, and there is no way to kill the coredumping
    task reliably. In particular, this is not oom_kill-friendly.

    This patch tries to fix the first problem and prepares for the next
    changes.

    We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
    that this process dumps the core. prepare_signal() checks this flag and
    nacks any signal except SIGKILL.

    Note that this check tries to be conservative; in the long term we should
    probably treat the SIGNAL_GROUP_EXIT case equally, but this needs more
    discussion. See marc.info/?l=linux-kernel&m=120508897917439

    Notes:
    - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
    The patch assumes that dump_write/etc paths should never
    call it, but we can change it as well.

    - There is another source of TIF_SIGPENDING, freezer. This
    will be addressed separately.
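
    The filtering rule can be sketched as a predicate. The flag value and
    struct below are hypothetical placeholders, not the kernel's definitions;
    only the rule itself (while the group dumps core, drop every signal
    except SIGKILL) comes from the description above.

    ```c
    #include <signal.h>
    #include <stdbool.h>

    #define SIGNAL_GROUP_COREDUMP 0x1u            /* hypothetical value */

    struct signal_state { unsigned int flags; };  /* hypothetical stand-in */

    /* Returns true if the signal should be queued, false if it is dropped. */
    static bool prepare_signal_sketch(const struct signal_state *sig, int signo)
    {
        if (sig->flags & SIGNAL_GROUP_COREDUMP)
            return signo == SIGKILL;   /* only SIGKILL may interrupt a dump */
        return true;
    }
    ```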

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • call_usermodehelper_fns() suffers from the caller not being able to
    determine whether the cleanup function was called when it returns
    -ENOMEM. Nobody is using it anymore, so let's remove it.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from the caller not being able to determine whether the cleanup
    function was called. Even though the cleanup pointer is NULL at these call
    sites, convert them to use the separate call_usermodehelper_setup() +
    call_usermodehelper_exec() functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
    calling call_usermodehelper_fns(). If there is an OOM in this last
    function, the cleanup function may not be called; in that case we would
    miss a call to key_put().

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Acked-by: David Howells
    Acked-by: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
    calling call_usermodehelper_fns(). If the latter returns -ENOMEM, the
    cleanup function may not have been called; in that case we would not
    free argv and module_name.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • call_usermodehelper_setup() + call_usermodehelper_exec() need to be
    called instead of call_usermodehelper_fns() when the cleanup function
    must be called even if an ENOMEM error occurs. With
    call_usermodehelper_fns(), the caller can't tell whether the cleanup
    function was called or not.
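
    The ownership rule behind the setup/exec split can be modelled in
    userspace. Everything below is hypothetical: umh_setup() and umh_exec()
    only mimic the contract described above (if setup fails, the caller still
    owns its data and frees it; once setup succeeds, exec invokes the cleanup
    callback on every path). They are not the kernel's functions.

    ```c
    #include <errno.h>
    #include <stdlib.h>

    struct umh_info {                       /* hypothetical */
        void (*cleanup)(struct umh_info *info);
        void *data;
    };

    static struct umh_info *umh_setup(void (*cleanup)(struct umh_info *), void *data)
    {
        struct umh_info *info = malloc(sizeof(*info));

        if (!info)
            return NULL;                    /* cleanup NOT called: caller owns data */
        info->cleanup = cleanup;
        info->data = data;
        return info;
    }

    /* Always runs cleanup and frees info, whether or not "execution" fails. */
    static int umh_exec(struct umh_info *info, int simulate_error)
    {
        int ret = simulate_error ? -ENOMEM : 0;

        if (info->cleanup)
            info->cleanup(info);
        free(info);
        return ret;
    }

    static int cleanups;                    /* counts cleanup invocations */

    static void free_data(struct umh_info *info)
    {
        free(info->data);
        cleanups++;
    }

    /* Caller pattern: free data ourselves on setup failure, else hand it off. */
    static int run_helper(int simulate_error)
    {
        void *data = malloc(16);
        struct umh_info *info;

        if (!data)
            return -ENOMEM;
        info = umh_setup(free_data, data);
        if (!info) {
            free(data);
            return -ENOMEM;
        }
        return umh_exec(info, simulate_error);
    }
    ```

    Every outcome now has exactly one owner for the data, which is the
    ambiguity the _fns variant could not resolve.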

    [akpm@linux-foundation.org: export call_usermodehelper_setup() to modules]
    Signed-off-by: Lucas De Marchi
    Reviewed-by: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • * Dump signals from process-wide and per-thread queues with
    different sizes of buffers.
    * Check error paths for buffers with restricted permissions: part of the
    buffer or the whole buffer is read-only.
    * Try to get a nonexistent signal.

    Signed-off-by: Andrew Vagin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: David Howells
    Cc: Dave Jones
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Pavel Emelyanov
    Cc: Linus Torvalds
    Cc: Pedro Alves
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

    This request is used to retrieve information about pending signals
    starting with the specified sequence number. siginfo_t structures are
    copied from the child into the buffer starting at "data".

    The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.

    struct ptrace_peeksiginfo_args {
        u64 off;    /* from which siginfo to start */
        u32 flags;
        s32 nr;     /* how many siginfos to take */
    };

    "nr" has type "s32" because ptrace() returns "long", which has 32 bits on
    i386, and negative values are used for errors.

    Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
    signals from the process-wide queue. If this flag is not set, signals are
    read from a per-thread queue.

    The request PTRACE_PEEKSIGINFO returns the number of dumped signals. If a
    signal with the specified sequence number doesn't exist, ptrace returns
    zero. The request returns an error if no signal has been dumped.

    Errors:
    EINVAL - one or more specified flags are not supported or nr is negative
    EFAULT - buf or addr is outside your accessible address space.

    The resulting siginfo contains the kernel part of si_code, which is
    usually stripped, but it is required for queuing the same siginfo back
    when restoring pending signals.

    This functionality is required for checkpointing pending signals. Pedro
    Alves suggested using it in "gdb" to peek at pending signals. gdb already
    uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
    dequeued. This functionality allows gdb to look at the pending signals
    which were not reported yet.

    The prototype of this code was developed by Oleg Nesterov.
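
    A userspace sketch of calling the request from a tracer, using the
    argument layout described above. The local struct name, PEEK_SHARED, and
    peek_pending() are hypothetical; the request number 0x4209 is only used as
    a fallback if the C library headers do not define PTRACE_PEEKSIGINFO.

    ```c
    #include <signal.h>
    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    #ifndef PTRACE_PEEKSIGINFO
    #define PTRACE_PEEKSIGINFO 0x4209
    #endif

    /* Same layout as struct ptrace_peeksiginfo_args: off, flags, nr. */
    struct peeksiginfo_args {
        uint64_t off;    /* sequence number of the first siginfo to read */
        uint32_t flags;  /* 0 = per-thread queue, PEEK_SHARED = process-wide */
        int32_t nr;      /* maximum number of siginfos to copy */
    };

    #define PEEK_SHARED 1u   /* assumed to match PTRACE_PEEKSIGINFO_SHARED */

    /* Peek at up to nr pending signals of a stopped tracee without dequeuing
     * them. Returns the number of entries copied, or -1 with errno set. */
    static long peek_pending(pid_t pid, int shared, siginfo_t *buf, int32_t nr)
    {
        struct peeksiginfo_args args = {
            .off = 0,
            .flags = shared ? PEEK_SHARED : 0,
            .nr = nr,
        };

        return ptrace(PTRACE_PEEKSIGINFO, pid, &args, buf);
    }
    ```

    The call only succeeds against a tracee that is stopped under ptrace;
    for any other pid it fails (typically with ESRCH).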

    Signed-off-by: Andrew Vagin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: David Howells
    Cc: Dave Jones
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Pavel Emelyanov
    Cc: Linus Torvalds
    Cc: Pedro Alves
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • __hfsplus_ext_write_extent() suppresses errors coming from
    hfs_brec_find(). The patch implements error code propagation.

    Signed-off-by: Alexey Khoroshilov
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
     
  • Use a more current logging style.

    Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    hfsplus now uses "hfsplus: " for all messages.
    Coalesce formats.
    Prefix debugging messages too.
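
    In the kernel, pr_info() and friends expand pr_fmt(fmt), so defining
    pr_fmt before the includes prefixes every message at compile time through
    string-literal concatenation. A userspace sketch of the same mechanism;
    my_pr_info() is hypothetical and KBUILD_MODNAME is hard-coded here,
    whereas Kbuild normally supplies it.

    ```c
    #include <stdio.h>

    #define KBUILD_MODNAME "hfsplus"             /* supplied by Kbuild in-kernel */
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

    /* Hypothetical analogue of pr_info(): formats into out instead of the log. */
    #define my_pr_info(out, size, fmt, ...) \
        snprintf(out, size, pr_fmt(fmt), ##__VA_ARGS__)
    ```

    For example, my_pr_info(line, sizeof(line), "x=%d", 5) stores
    "hfsplus: x=5" in line.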

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use a more current logging style.

    Rename macro and uses.
    Add do {} while (0) to macro.
    Add DBG_ to macro.
    Add and use hfs_dbg_cont variant where appropriate.
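
    Wrapping a macro body in do {} while (0) makes it behave like a single
    statement, so it composes safely with an unbraced if/else. A userspace
    sketch of a flag-gated debug macro that takes the category without its
    DBG_ prefix; DBG_MASK, DBG_EXTENT, DBG_BNODE_REFS, and dbg_count are
    hypothetical, and the real hfs_dbg() may differ.

    ```c
    #include <stdio.h>

    #define DBG_BNODE_REFS 0x1u              /* hypothetical debug category */
    #define DBG_EXTENT     0x2u              /* hypothetical debug category */
    #define DBG_MASK       DBG_EXTENT        /* categories enabled at build time */

    static int dbg_count;                    /* counts messages actually emitted */

    #define hfs_dbg(flg, fmt, ...)                              \
        do {                                                    \
            if (DBG_##flg & DBG_MASK) {                         \
                printf("hfs: " fmt, ##__VA_ARGS__);             \
                dbg_count++;                                    \
            }                                                   \
        } while (0)
    ```

    Without the do {} while (0) wrapper, the trailing semicolon after
    hfs_dbg(...) in an "if (...) hfs_dbg(...); else ..." would detach the
    else and fail to compile.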

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
    (1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
    (2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

    [akpm@linux-foundation.org: make the workaround more explicit]
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • hfs_find_init() may fail with ENOMEM, but there are places where the
    returned value is not checked. The consequences can be very unpleasant,
    e.g. kfree() of an uninitialized pointer and inappropriate mutex unlocking.

    The patch adds checks for errors in hfs_find_init().

    Found by Linux Driver Verification project (linuxtesting.org).
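
    The pattern the patch enforces is to check the init call's result before
    using or tearing down the state it was supposed to set up. A userspace
    sketch with hypothetical find_init()/find_exit() helpers, loosely modelled
    on the description (an allocation plus a lock that init takes and exit
    releases); the lock is a plain flag rather than a real mutex.

    ```c
    #include <errno.h>
    #include <stdlib.h>

    struct tree { int locked; };                 /* hypothetical */
    struct find_data { struct tree *tree; void *key; };

    static int find_init(struct tree *t, struct find_data *fd, int fail)
    {
        fd->key = fail ? NULL : malloc(64);      /* 'fail' simulates ENOMEM */
        if (!fd->key)
            return -ENOMEM;                      /* nothing set up, lock not taken */
        fd->tree = t;
        t->locked = 1;                           /* stands in for mutex_lock() */
        return 0;
    }

    static void find_exit(struct find_data *fd)
    {
        fd->tree->locked = 0;                    /* stands in for mutex_unlock() */
        free(fd->key);
    }

    static int lookup(struct tree *t, int fail)
    {
        struct find_data fd;
        int err = find_init(t, &fd, fail);

        if (err)
            return err;  /* skip find_exit(): fd.tree is unset, lock not held */
        /* ... search the tree while holding the lock ... */
        find_exit(&fd);
        return 0;
    }
    ```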

    Signed-off-by: Alexey Khoroshilov
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
     
  • page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
    unneeded test.

    This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
    error: we previously assumed 'inode' could be null (see line 195)".

    Reported-by: Dan Carpenter
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Change test_bit(PG_locked, &page->flags) to PageLocked().

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • …river's internal error or metadata corruption

    The NILFS2 driver remounts itself in RO mode when it discovers metadata
    corruption (for example, a broken bmap). But usually, file system
    operations have already taken place before the remount in RO mode.

    As a result, the NILFS2 driver can be in RO mode while dirty pages remain
    in modified inodes' address spaces. The flush kernel thread then keeps
    trying to flush these dirty pages in RO mode indefinitely. Side effects
    include: (1) the flush kernel thread occupies 50% - 99% of CPU time;
    (2) the system can't be shut down without manually switching the power
    off.

    SYMPTOMS:
    (1) System log contains error message: "Remounting filesystem read-only".
    (2) The flush kernel thread occupies 50% - 99% of CPU time.
    (3) The system can't be shut down without manually switching the power off.

    REPRODUCTION PATH:
    (1) Create a volume group named "unencrypted" using the vgcreate utility.
    (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

    ----------------[BEGIN SCRIPT]--------------------
    #!/bin/bash

    VG=unencrypted
    #apt-get install nilfs-tools darcs
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------

    (3) Try to shutdown the system.

    REPRODUCIBILITY: 100%

    FIX:

    This patch checks the mount state of the NILFS2 driver in the
    nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page()
    methods. If the RO mount state is detected, all dirty pages are simply
    discarded and a warning message is written to the system log.

    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
    Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
    Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
    Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
    Cc: Elmer Zhang <freeboy6716@gmail.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Vyacheslav Dubeyko