27 Jul, 2011

34 commits

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
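
The description above is cut off, but it names the helper atomic_inc_not_zero(). That helper increments a counter only if the counter is not already zero. A minimal user-space sketch of that behaviour with C11 <stdatomic.h> follows. inc_not_zero() is a hypothetical name, and this is a compare-and-swap illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of atomic_inc_not_zero():
 * increment *v unless it is zero; return true if the increment happened.
 */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;   /* value was zero: leave it alone */
}
```

The loop is needed because another thread may change the value between the load and the exchange. A failed compare-exchange reloads `old`, so a concurrent drop to zero is noticed before any increment.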
     
  • acct_arg_size() takes ->page_table_lock around add_mm_counter() if
    !SPLIT_RSS_COUNTING. This is not needed after commit 172703b08cd0 ("mm:
    delete non-atomic mm counter implementation").

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Cc: Dave Hansen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If CONFIG_MODULES=n, it makes no sense to retry the list of binary format
    handlers, because request_module() cannot modify the list.

    Signed-off-by: Tetsuo Handa
    Cc: Richard Weinberger
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Currently, search_binary_handler() tries to load a binary loader module
    using request_module() if a loader for the requested program is not yet
    loaded. But a second attempt at request_module() does not affect the
    result of search_binary_handler().

    If request_module() triggers recursion, calling request_module() twice
    causes 2 to the power of MAX_KMOD_CONCURRENT (= 50) repetitions. That is
    not an infinite loop, but it takes long enough that users consider it a
    hang.

    Therefore, this patch stops calling request_module() twice, reducing this
    to 1 to the power of MAX_KMOD_CONCURRENT repetitions in case of recursion.

    Signed-off-by: Tetsuo Handa
    Reported-by: Richard Weinberger
    Tested-by: Richard Weinberger
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
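
The following sketch makes that repetition count concrete. It is a hypothetical model, not kernel code. It counts the innermost calls when every nesting level issues a fixed number of load attempts until a depth limit is reached. leaf_calls() and its parameters are illustrative names.

```c
#include <assert.h>

/*
 * Hypothetical model: each level issues `attempts` load requests, and each
 * request recurses one level deeper until `depth` reaches zero (the limit).
 * Returns the number of paths that reach the limit.
 */
static unsigned long long leaf_calls(unsigned int depth, unsigned int attempts)
{
    unsigned long long total = 0;

    if (depth == 0)
        return 1;               /* limit reached: recursion stops here */
    for (unsigned int i = 0; i < attempts; i++)
        total += leaf_calls(depth - 1, attempts);
    return total;
}
```

With depth 50 and two attempts this gives 2^50 = 1,125,899,906,842,624 paths, which is why the retry looks like a hang. With a single attempt it collapses to one path that is 50 levels deep.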
     
  • Commit a8bef8ff6ea1 ("mm: migration: avoid race between
    shift_arg_pages() and rmap_walk() during migration by not migrating
    temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS
    and VM_STACK_INCOMPLETE_SETUP do not overlap. The check can be done at
    compile time, so BUILD_BUG_ON() is more appropriate.

    Signed-off-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
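
A compile-time overlap check of this kind can be sketched with C11 static_assert, which plays roughly the role of BUILD_BUG_ON() in the kernel. The EX_* flag values and the EX_BUILD_BUG_ON() macro are hypothetical placeholders, not the real VM_* definitions.

```c
#include <assert.h>   /* provides static_assert in C11 */

/* Hypothetical flag values, chosen for illustration only. */
#define EX_VM_GROWSDOWN   0x00000100UL
#define EX_VM_ACCOUNT     0x00100000UL
#define EX_VM_SEQ_READ    0x00008000UL
#define EX_VM_RAND_READ   0x00010000UL

#define EX_VM_STACK_FLAGS             (EX_VM_GROWSDOWN | EX_VM_ACCOUNT)
#define EX_VM_STACK_INCOMPLETE_SETUP  (EX_VM_RAND_READ | EX_VM_SEQ_READ)

/* BUILD_BUG_ON-style macro: compilation fails if cond is true. */
#define EX_BUILD_BUG_ON(cond) static_assert(!(cond), #cond)

/* Evaluated by the compiler; nothing is left to check at run time. */
EX_BUILD_BUG_ON(EX_VM_STACK_FLAGS & EX_VM_STACK_INCOMPLETE_SETUP);
```

If the two masks ever shared a bit, the build would fail at that line instead of the kernel hitting BUG() at run time.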
     
  • If an inode's mode permits opening /proc/PID/io and the resulting file
    descriptor is kept across execve() of a setuid or similar binary, the
    ptrace_may_access() check tries to prevent using this fd against the
    task with escalated privileges.

    Unfortunately, there is a race in the check against execve(). If
    execve() is processed after the ptrace check, but before the actual io
    information gathering, io statistics will be gathered from the
    privileged process. At least in theory this might lead to gathering
    sensitive information (like ssh/ftp password length) that wouldn't be
    available otherwise.

    Holding task->signal->cred_guard_mutex while gathering the io
    information should protect against the race.

    The lock order matches the one in ptrace_attach(): cred_guard_mutex
    first, then lock_task_sighand().

    Signed-off-by: Vasiliy Kulikov
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
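
The pattern described above can be sketched in user space with POSIX mutexes: check access and read the statistics under one outer lock that exec also takes. struct ex_task, its fields and ex_read_io() are hypothetical stand-ins. The `accessible` flag replaces ptrace_may_access(), and `siglock` stands in for lock_task_sighand().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the task fields named above; not kernel types. */
struct ex_task {
    pthread_mutex_t cred_guard_mutex; /* outer lock: also held across "exec" */
    pthread_mutex_t siglock;          /* inner lock, like lock_task_sighand() */
    int accessible;                   /* stand-in for ptrace_may_access() */
    unsigned long read_bytes;         /* the statistic being reported */
};

/*
 * Do the permission check and the read under the same outer lock, so an
 * "exec" that changes credentials cannot slip in between them.
 * Lock order: cred_guard_mutex first, then siglock.
 */
static int ex_read_io(struct ex_task *t, unsigned long *out)
{
    int ret = -EACCES;

    pthread_mutex_lock(&t->cred_guard_mutex);
    if (t->accessible) {
        pthread_mutex_lock(&t->siglock);
        *out = t->read_bytes;
        pthread_mutex_unlock(&t->siglock);
        ret = 0;
    }
    pthread_mutex_unlock(&t->cred_guard_mutex);
    return ret;
}
```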
     
  • Change the return value to ENOENT. This value is then returned when
    opening a proc entry that has been removed, for example
    open("/proc/bus/pci/XX/YY") while the corresponding device is being
    hot-removed.

    Signed-off-by: Daisuke Ogino
    Cc: Jesse Barnes
    Acked-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Ogino
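
In user space, a program that races with hot-removal can then recognize the vanished entry by its errno. The following minimal POSIX sketch only shows how a caller might interpret ENOENT. ex_entry_gone() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if opening path failed because the entry does not exist
 * (ENOENT), 0 if it opened, and -1 for any other error.
 */
static int ex_entry_gone(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == ENOENT ? 1 : -1;
}
```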
     
  • do_coredump() assumes that if format_corename() fails it should return
    -ENOMEM. This is not true; for example, cn_print_exe_file() can propagate
    the error from d_path. Even if it were true, this would be too fragile.
    Change the code to check "ispipe < 0".

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Jiri Slaby
    Reviewed-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
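
The following sketch shows the pattern the fix moves to: propagate whatever negative errno the formatter returns instead of assuming -ENOMEM. ex_format_corename() and ex_do_coredump() are hypothetical stand-ins. The `fail_with` parameter only simulates an error, such as one propagated from d_path().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical formatter: returns a negative errno on failure, otherwise
 * 1 if the pattern names a pipe ('|' prefix) and 0 for a plain file.
 */
static int ex_format_corename(const char *pattern, int fail_with)
{
    if (fail_with)
        return -fail_with;      /* simulated error, e.g. from d_path() */
    return pattern[0] == '|';
}

/* Pass the formatter's real error up instead of assuming -ENOMEM. */
static int ex_do_coredump(const char *pattern, int fail_with)
{
    int ispipe = ex_format_corename(pattern, fail_with);

    if (ispipe < 0)
        return ispipe;
    /* ... write the dump to a pipe (ispipe == 1) or a file (ispipe == 0) ... */
    return 0;
}
```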
     
  • Change every occurrence of / in comm and hostname to !. If the process
    changes its name to contain /, the core is not dumped (unless a matching
    directory tree happens to exist). The same happens with a hostname like
    myhost/3. Fix this behaviour by reusing the escape loop from %E
    (extracted into a separate function).

    Now, with both comm == myprocess/1 and hostname == myhost/1, the core is
    dumped as follows (with kernel.core_pattern='core.%p.%e.%h'):
    core.2349.myprocess!1.myhost!1

    Signed-off-by: Jiri Slaby
    Cc: Alan Cox
    Cc: Al Viro
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
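
The escaping step can be sketched as a small loop. ex_escape_slashes() and ex_core_name() are hypothetical helpers, not the function the patch extracts. ex_core_name() hard-codes the core.%p.%e.%h pattern from the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Replace every '/' with '!' in place, so that a comm or hostname such as
 * "myprocess/1" cannot add a directory component to the core file name.
 */
static void ex_escape_slashes(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == '/')
            *s = '!';
}

/* Build "core.<pid>.<comm>.<host>" with both fields escaped. */
static void ex_core_name(char *buf, size_t len, int pid,
                         const char *comm, const char *host)
{
    char c[64], h[64];

    snprintf(c, sizeof c, "%s", comm);   /* bounded copies we may modify */
    snprintf(h, sizeof h, "%s", host);
    ex_escape_slashes(c);
    ex_escape_slashes(h);
    snprintf(buf, len, "core.%d.%s.%s", pid, c, h);
}
```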
     
  • If we don't know the file corresponding to the binary (i.e. exe_file is
    unknown), use "task->comm (path unknown)" instead of simple "(unknown)"
    as suggested by ak.

    The fallback is the same as %e except it will append "(path unknown)".

    Signed-off-by: Jiri Slaby
    Cc: Alan Cox
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
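
The fallback can be sketched with snprintf(). ex_exe_name() is a hypothetical helper, and passing a NULL `exe_path` models the case where exe_file is unknown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Use the executable path when it is known; otherwise fall back to the
 * task's comm followed by " (path unknown)".
 * Returns what snprintf() returns (the untruncated length).
 */
static int ex_exe_name(char *buf, size_t len, const char *exe_path,
                       const char *comm)
{
    if (exe_path != NULL)
        return snprintf(buf, len, "%s", exe_path);
    return snprintf(buf, len, "%s (path unknown)", comm);
}
```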
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    ceph: document unlocked d_parent accesses
    ceph: explicitly reference rename old_dentry parent dir in request
    ceph: document locking for ceph_set_dentry_offset
    ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
    ceph: protect d_parent access in ceph_d_revalidate
    ceph: protect access to d_parent
    ceph: handle racing calls to ceph_init_dentry
    ceph: set dir complete frag after adding capability
    rbd: set blk_queue request sizes to object size
    ceph: set up readahead size when rsize is not passed
    rbd: cancel watch request when releasing the device
    ceph: ignore lease mask
    ceph: fix ceph_lookup_open intent usage
    ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
    ceph: fix bad parent_inode calc in ceph_lookup_open
    ceph: avoid carrying Fw cap during write into page cache
    libceph: don't time out osd requests that haven't been received
    ceph: report f_bfree based on kb_avail rather than diffing.
    ceph: only queue capsnap if caps are dirty
    ceph: fix snap writeback when racing with writes
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
    ext3.txt: update the links in the section "useful links" to the latest ones
    ext3: Fix data corruption in inodes with journalled data
    ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
    ext3: Fix compilation with -DDX_DEBUG
    quota: Remove unused declaration
    jbd: Use WRITE_SYNC in journal checkpoint.
    jbd: Fix oops in journal_remove_journal_head()
    ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
    ext3/ioctl.c: silence sparse warnings about different address spaces
    ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
    ext3: Improve truncate error handling
    ext3: use proper little-endian bitops
    ext2: include fs.h into ext2_fs.h
    ext3: Fix oops in ext3_try_to_allocate_with_rsv()
    jbd: fix a bug of leaking jh->b_jcount
    jbd: remove dependency on __GFP_NOFAIL
    ext3: Convert ext3 to new truncate calling convention
    jbd: Add fixed tracepoints
    ext3: Add fixed tracepoints

    Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
    new fixed tracepoints.

    Linus Torvalds
     
  • For the most part we don't care about racing with rename when directing
    MDS requests; either the old or new parent is fine. Document that, and
    do some minor cleanup.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We carry a pin on the parent directory for the rename source and dest
    dentries. For the source it's r_locked_dir; we need to explicitly
    reference the old_dentry parent as well, since the dentry's d_parent may
    change between when the request was created and pinned and when it is
    freed.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Have the caller pass in a safely obtained reference to the parent
    directory for calculating a dentry's hash value.

    While we're here, simplify the flow through ceph_encode_fh() so that there
    is a single exit point and cleanup.

    Also fix a bug with the dentry hash calculation: calculate the hash for the
    dentry we were given, not its parent.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Protect d_parent with d_lock. Carry a reference. Simplify the flow so
    that there is a single exit point and cleanup.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • d_parent is protected by d_lock: use it when looking up a dentry's parent
    directory inode. Also take a reference and drop it in the caller to avoid
    a use-after-free.

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
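
The pattern described in the two commits above can be sketched with a simplified user-space model:
- read d_parent under d_lock;
- pin the parent's inode before unlocking;
- let the caller drop the reference.

All types and functions here are hypothetical. A mutex stands in for the d_lock spinlock, and a NULL d_parent stands in for a root dentry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified, hypothetical models of an inode and a dentry. */
struct ex_inode {
    atomic_int i_count;                /* reference count */
};

struct ex_dentry {
    pthread_mutex_t d_lock;            /* protects d_parent */
    struct ex_dentry *d_parent;        /* NULL here means "root" */
    struct ex_inode *d_inode;
};

static void ex_ihold(struct ex_inode *inode)
{
    atomic_fetch_add(&inode->i_count, 1);
}

static void ex_iput(struct ex_inode *inode)
{
    atomic_fetch_sub(&inode->i_count, 1);   /* a real iput() may also free */
}

/*
 * Look up the parent directory's inode under d_lock and take a reference
 * before dropping the lock, so a concurrent rename cannot free it under us.
 * The caller owns the reference and must ex_iput() it.
 */
static struct ex_inode *ex_get_parent_inode(struct ex_dentry *d)
{
    struct ex_inode *dir = NULL;

    pthread_mutex_lock(&d->d_lock);
    if (d->d_parent != NULL && d->d_parent->d_inode != NULL) {
        dir = d->d_parent->d_inode;
        ex_ihold(dir);
    }
    pthread_mutex_unlock(&d->d_lock);
    return dir;
}
```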
     
  • The ->lookup() and prepopulate_readdir() callers are working with unhashed
    dentries, so we don't have to worry. The export.c callers, though, need
    to initialize something they got back from d_obtain_alias() and are
    potentially racing with other callers. Make sure we don't return unless
    the dentry is properly initialized (by us or someone else).

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
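
Initialization can be made race-safe so that every caller returns with the dentry initialized, whoever did the work. One way is to allocate outside the lock and install the data under the lock only if it is still missing. This is a hypothetical user-space sketch (struct ex_fs_dentry, ex_init_dentry()), not the ceph code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-dentry private data and a minimal dentry model. */
struct ex_dentry_info {
    long offset;
};

struct ex_fs_dentry {
    pthread_mutex_t d_lock;            /* protects d_fsdata */
    struct ex_dentry_info *d_fsdata;   /* NULL until initialized */
};

/*
 * Allocate outside the lock, then install under d_lock only if nobody else
 * has. On a 0 return the dentry is initialized, by us or by a racing caller.
 */
static int ex_init_dentry(struct ex_fs_dentry *d)
{
    struct ex_dentry_info *di = calloc(1, sizeof(*di));

    if (di == NULL)
        return -ENOMEM;

    pthread_mutex_lock(&d->d_lock);
    if (d->d_fsdata == NULL) {
        d->d_fsdata = di;   /* we won the race */
        di = NULL;
    }
    pthread_mutex_unlock(&d->d_lock);

    free(di);   /* lost the race: discard our copy (free(NULL) is a no-op) */
    return 0;
}
```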
     
  • Currently ceph_add_cap clears the complete bit if we are newly issued the
    FILE_SHARED cap, which is normally the case for a newly issued cap on a
    new directory. That means we clear the just-set bit. Move the check that
    sets the flag to after the cap is added/updated.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This should improve the default read performance, as without it
    readahead is practically disabled.

    Signed-off-by: Yehuda Sadeh

    Yehuda Sadeh
     
  • The lease mask is no longer used (and it changed a while back). Instead,
    use a non-zero duration to indicate that there is a lease being issued.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We weren't properly calling lookup_instantiate_filp when setting up the
    lookup intent, which could lead to file leakage on errors. So:

    - use a separate helper for the hidden snapdir translation, immediately
      following the MDS request
    - use ceph_finish_lookup for the final dentry/return value dance in the
      exit path
    - call lookup_instantiate_filp on success

    Reported-by: Al Viro
    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • We only need to put these on the directory unsafe list if they have
    side effects that fsync(2) should flush out.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
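
The flag test can be sketched as a one-line predicate. ex_open_needs_dir_unsafe() is a hypothetical name. The choice of O_CREAT and O_TRUNC comes from this commit's shortlog entry, "ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Only opens that create or truncate have side effects that fsync(2) should
 * flush, so only those go on the directory's unsafe list.
 */
static bool ex_open_needs_dir_unsafe(int flags)
{
    return (flags & (O_CREAT | O_TRUNC)) != 0;
}
```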
     
  • We were always getting NULL here because the intent file f_dentry is always
    NULL at this point, which means we were always passing NULL to
    ceph_mdsc_do_request. In reality, this was fine, since this isn't
    currently ever a write operation that needs to get strung on the dir's
    unsafe list.

    Use the dir explicitly, and only pass it if this open has side-effects that
    a dir fsync should flush.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • The generic_file_aio_write call may block on balance_dirty_pages while we
    flush data to the OSDs. If we hold a reference to the FILE_WR cap during
    that interval, revocation by the MDS (e.g., to do a stat(2)) may be very
    slow.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Reviewed-by: Yehuda Sadeh
    Signed-off-by: Greg Farnum

    Greg Farnum
     
  • We used to go into this branch if i_wrbuffer_ref_head was non-zero. This
    was an ancient check from before we were careful about dealing with all
    kinds of caps (and not just dirty pages). It is cleaner to only queue a
    capsnap if there is an actual dirty cap. If we are racing with...
    something...we will end up here with ci->i_wrbuffer_refs but no dirty
    caps.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • There are two problems that come up when we try to queue a capsnap while a
    write is in progress:

    - The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap
      with dirty == 0. That will crash later in __ceph_flush_snaps(). Or
      on the FILE_WR cap if a write is in progress.
    - We may not have i_head_snapc set, which causes problems pretty quickly.
      Look to the snaprealm in this case.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This saves us a word of memory per file.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • This allows us to force IO through the sync path which you normally only
    get when multiple clients are reading/writing to the same file or by
    mounting with -o sync. Among other things, this lets test programs verify
    correctness with a single mount.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: Cleanup: check return codes of crypto api calls
    CIFS: Fix oops while mounting with prefixpath
    [CIFS] Redundant null check after dereference
    cifs: use cifs_dirent in cifs_save_resume_key
    cifs: use cifs_dirent to replace cifs_get_name_from_search_buf
    cifs: introduce cifs_dirent
    cifs: cleanup cifs_filldir

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
    mm: properly reflect task dirty limits in dirty_exceeded logic
    writeback: don't busy retry writeback on new/freeing inodes
    writeback: scale IO chunk size up to half device bandwidth
    writeback: trace global_dirty_state
    writeback: introduce max-pause and pass-good dirty limits
    writeback: introduce smoothed global dirty limit
    writeback: consolidate variable names in balance_dirty_pages()
    writeback: show bdi write bandwidth in debugfs
    writeback: bdi write bandwidth estimation
    writeback: account per-bdi accumulated written pages
    writeback: make writeback_control.nr_to_write straight
    writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
    writeback: trace event writeback_queue_io
    writeback: trace event writeback_single_inode
    writeback: remove .nonblocking and .encountered_congestion
    writeback: remove writeback_control.more_io
    writeback: skip balance_dirty_pages() for in-memory fs
    writeback: add bdi_dirty_limit() kernel-doc
    writeback: avoid extra sync work at enqueue time
    writeback: elevate queue_io() into wb_writeback()
    ...

    Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c

    Linus Torvalds
     

26 Jul, 2011

6 commits

  • Commit 4e34e719e457 ("fs: take the ACL checks to common code") removed
    the use of the 'acl' variable in v9fs_iop_get_acl(), but left the
    variable definition around. Remove it to get rid of the warning:

    fs/9p/acl.c: In function ‘v9fs_iop_get_acl’:
    fs/9p/acl.c:101:20: warning: unused variable ‘acl’

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    Squashfs: Make ZLIB compression support optional
    Squashfs: Update documentation for XZ and add squashfs-tools devel tree

    Linus Torvalds
     
  • * 'for-3.1' of git://linux-nfs.org/~bfields/linux:
    nfsd: don't break lease on CLAIM_DELEGATE_CUR
    locks: rename lock-manager ops
    nfsd4: update nfsv4.1 implementation notes
    nfsd: turn on reply cache for NFSv4
    nfsd4: call nfsd4_release_compoundargs from pc_release
    nfsd41: Deny new lock before RECLAIM_COMPLETE done
    fs: locks: remove init_once
    nfsd41: check the size of request
    nfsd41: error out when client sets maxreq_sz or maxresp_sz too small
    nfsd4: fix file leak on open_downgrade
    nfsd4: remember to put RW access on stateid destruction
    NFSD: Added TEST_STATEID operation
    NFSD: added FREE_STATEID operation
    svcrpc: fix list-corrupting race on nfsd shutdown
    rpc: allow autoloading of gss mechanisms
    svcauth_unix.c: quiet sparse noise
    svcsock.c: include sunrpc.h to quiet sparse noise
    nfsd: Remove deprecated nfsctl system call and related code.
    NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

    Fix up trivial conflicts in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     
  • Commit e77819e57f08 ("vfs: move ACL cache lookup into generic code")
    didn't take the FS_POSIX_ACL config variable into account: when that is
    not set, ACLs go away and the cache helper functions do not exist,
    causing compile errors like

    fs/namei.c: In function 'check_acl':
    fs/namei.c:191:10: error: implicit declaration of function 'negative_cached_acl'
    fs/namei.c:196:2: error: implicit declaration of function 'get_cached_acl'
    fs/namei.c:196:6: warning: assignment makes pointer from integer without a cast
    fs/namei.c:212:11: error: implicit declaration of function 'set_cached_acl'

    Reported-by: Markus Trippelsdorf
    Acked-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
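
One way to keep generic code building when a config option removes the helpers it calls is to guard those calls with the same #ifdef. The sketch below uses a hypothetical EX_CONFIG_FS_POSIX_ACL macro and made-up types. It illustrates the technique and is not necessarily how the actual fix was written. In this sketch, -EAGAIN means "no ACL decision, fall back to the mode bits".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * EX_CONFIG_FS_POSIX_ACL is a hypothetical stand-in for CONFIG_FS_POSIX_ACL.
 * Leave it undefined to build the configuration without ACL support.
 */
/* #define EX_CONFIG_FS_POSIX_ACL 1 */

struct ex_acl {
    int allowed;                     /* permission bits granted by the ACL */
};

struct ex_inode {
    struct ex_acl *cached_acl;
};

#ifdef EX_CONFIG_FS_POSIX_ACL
/* The cache helper only exists when ACL support is built in. */
static struct ex_acl *ex_get_cached_acl(struct ex_inode *inode)
{
    return inode->cached_acl;
}
#endif

/* Generic code guards its use of the helper with the same #ifdef. */
static int ex_check_acl(struct ex_inode *inode, int mask)
{
#ifdef EX_CONFIG_FS_POSIX_ACL
    struct ex_acl *acl = ex_get_cached_acl(inode);

    if (acl != NULL)
        return (acl->allowed & mask) == mask ? 0 : -EACCES;
#else
    (void)inode;
    (void)mask;
#endif
    return -EAGAIN;
}
```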
     
  • * Merge akpm patch series: (122 commits)
    drivers/connector/cn_proc.c: remove unused local
    Documentation/SubmitChecklist: add RCU debug config options
    reiserfs: use hweight_long()
    reiserfs: use proper little-endian bitops
    pnpacpi: register disabled resources
    drivers/rtc/rtc-tegra.c: properly initialize spinlock
    drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
    drivers/rtc: add support for Qualcomm PMIC8xxx RTC
    drivers/rtc/rtc-s3c.c: support clock gating
    drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
    init: skip calibration delay if previously done
    misc/eeprom: add eeprom access driver for digsy_mtc board
    misc/eeprom: add driver for microwire 93xx46 EEPROMs
    checkpatch.pl: update $logFunctions
    checkpatch: make utf-8 test --strict
    checkpatch.pl: add ability to ignore various messages
    checkpatch: add a "prefer __aligned" check
    checkpatch: validate signature styles and To: and Cc: lines
    checkpatch: add __rcu as a sparse modifier
    checkpatch: suggest using min_t or max_t
    ...

    Did this as a merge because of (trivial) conflicts in
    - Documentation/feature-removal-schedule.txt
    - arch/xtensa/include/asm/uaccess.h
    that were just easier to fix up in the merge than in the patch series.

    Linus Torvalds
     
  • Use hweight_long() to count free bits in the bitmap.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
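
Counting free bits with a population count can be sketched as follows. __builtin_popcountl() is a GCC/Clang builtin used here in place of the kernel's hweight_long(), and ex_count_free_bits() is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Count zero (free) bits in a bitmap of nwords unsigned longs:
 * total bits minus the set bits reported by popcount.
 */
static size_t ex_count_free_bits(const unsigned long *map, size_t nwords)
{
    size_t used = 0;

    for (size_t i = 0; i < nwords; i++)
        used += (size_t)__builtin_popcountl(map[i]);
    return nwords * sizeof(unsigned long) * CHAR_BIT - used;
}
```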