13 Dec, 2014

4 commits

  • Pull trivial tree update from Jiri Kosina:
    "Usual stuff: documentation updates, printk() fixes, etc"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    intel_ips: fix a type in error message
    cpufreq: cpufreq-dt: Move newline to end of error message
    ps3rom: fix error return code
    treewide: fix typo in printk and Kconfig
    ARM: dts: bcm63138: change "interupts" to "interrupts"
    Replace mentions of "list_struct" to "list_head"
    kernel: trace: fix printk message
    scsi: mpt2sas: fix ioctl in comment
    zbud, zswap: change module author email
    clocksource: Fix 'clcoksource' typo in comment
    arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
    gpio: msm-v1: make boolean argument more obvious
    usb: Fix typo in usb-serial-simple.c
    PCI: Fix comment typo 'COMFIG_PM_OPS'
    powerpc: Fix comment typo 'CONIFG_8xx'
    powerpc: Fix comment typos 'CONFiG_ALTIVEC'
    clk: st: Spelling s/stucture/structure/
    isci: Spelling s/stucture/structure/
    usb: gadget: zero: Spelling s/infrastucture/infrastructure/
    treewide: Fix company name in module descriptions
    ...

    Linus Torvalds
     
  • Pull UBI/UBIFS updates from Artem Bityutskiy:
    "This includes the following UBI/UBIFS changes:
    - UBI debug messages now include the UBI device number. This change
    is responsible for the big diffstat since it touched every
    debugging print statement.
    - An Xattr bug-fix which fixes SELinux support
    - Several error path fixes in UBI/UBIFS"

    * tag 'upstream-3.19-rc1' of git://git.infradead.org/linux-ubifs:
    UBI: Fix invalid vfree()
    UBI: Fix double free after do_sync_erase()
    UBIFS: fix a couple bugs in UBIFS xattr length calculation
    UBI: vtbl: Use ubi_eba_atomic_leb_change()
    UBI: Extend UBI layer debug/messaging capabilities
    UBIFS: fix budget leak in error path

    Linus Torvalds
     
  • Pull xfs update from Dave Chinner:
    "There's relatively little change in this update; it is mainly bug
    fixes, cleanups and more of the on-going libxfs restructuring and
    on-disk format header consolidation work.

    Details:
    - more on-disk format header consolidation
    - move some structures shared with userspace to libxfs
    - new per-mount workqueue to fix deadlocks between nested loop
    mounted filesystems
    - various bug fixes for ENOSPC, stats, quota off and preallocation
    - a bunch of compiler warning fixes for set-but-unused variables
    - various code cleanups"

    * tag 'xfs-for-linus-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (24 commits)
    xfs: split metadata and log buffer completion to separate workqueues
    xfs: fix set-but-unused warnings
    xfs: move type conversion functions to xfs_dir.h
    xfs: move ftype conversion functions to libxfs
    xfs: lobotomise xfs_trans_read_buf_map()
    xfs: active inodes stat is broken
    xfs: cleanup xfs_bmse_merge returns
    xfs: cleanup xfs_bmse_shift_one goto mess
    xfs: fix premature enospc on inode allocation
    xfs: overflow in xfs_iomap_eof_align_last_fsb
    xfs: fix simple_return.cocci warning in xfs_bmse_shift_one
    xfs: fix simple_return.cocci warning in xfs_file_readdir
    libxfs: fix simple_return.cocci warnings
    xfs: remove unnecessary null checks
    xfs: merge xfs_inum.h into xfs_format.h
    xfs: move most of xfs_sb.h to xfs_format.h
    xfs: merge xfs_ag.h into xfs_format.h
    xfs: move acl structures to xfs_format.h
    xfs: merge xfs_dinode.h into xfs_format.h
    xfs: catch invalid negative blknos in _xfs_buf_find()
    ...

    Linus Torvalds
     
  • Pull ext4 updates from Ted Ts'o:
    "Lots of bugs fixes, including Zheng and Jan's extent status shrinker
    fixes, which should improve CPU utilization and avoid potential soft
    lockups under heavy memory pressure, and Eric Whitney's bigalloc fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
    ext4: ext4_da_convert_inline_data_to_extent drop locked page after error
    ext4: fix suboptimal seek_{data,hole} extents traversial
    ext4: ext4_inline_data_fiemap should respect callers argument
    ext4: prevent fsreentrance deadlock for inline_data
    ext4: forbid journal_async_commit in data=ordered mode
    jbd2: remove unnecessary NULL check before iput()
    ext4: Remove an unnecessary check for NULL before iput()
    ext4: remove unneeded code in ext4_unlink
    ext4: don't count external journal blocks as overhead
    ext4: remove never taken branch from ext4_ext_shift_path_extents()
    ext4: create nojournal_checksum mount option
    ext4: update comments regarding ext4_delete_inode()
    ext4: cleanup GFP flags inside resize path
    ext4: introduce aging to extent status tree
    ext4: cleanup flag definitions for extent status tree
    ext4: limit number of scanned extents in status tree shrinker
    ext4: move handling of list of shrinkable inodes into extent status code
    ext4: change LRU to round-robin in extent status tree shrinker
    ext4: cache extent hole in extent status tree for ext4_da_map_blocks()
    ext4: fix block reservation for bigalloc filesystems
    ...

    Linus Torvalds
     

12 Dec, 2014

2 commits

  • Pull MIPS updates from Ralf Baechle:
    "This is an unusually large pull request for MIPS - in parts because
    lots of patches missed the 3.18 deadline but primarily because some
    folks opened the flood gates.

    - Replace the MIPS-specific phys_t with the generic phys_addr_t.
    - Improvements to the backtrace code used by oprofile.
    - Better backtraces on SMP systems.
    - Cleanups for the Octeon platform code.
    - Cleanups and fixes for the Loongson platform code.
    - Cleanups and fixes to the firmware library.
    - Switch ATH79 platform to use the firmware library.
    - Grand overhaul of the SEAD3 and Malta interrupt code.
    - Move the GIC interrupt code to drivers/irqchip
    - Lots of GIC cleanups and updates to the GIC code to use modern IRQ
    infrastructures and features of the kernel.
    - OF documentation updates for the GIC bindings
    - Move GIC clocksource driver to drivers/clocksource
    - Merge GIC clocksource driver with clockevent driver.
    - Further updates to bring the GIC clocksource driver up to date.
    - R3000 TLB code cleanups
    - Improvements to the Loongson 3 platform code.
    - Convert pr_warning to pr_warn.
    - Merge a bunch of small lantiq and ralink fixes that have been
    staged/lingering inside the openwrt tree for a while.
    - Update archhelp for IP22/IP32
    - Fix a number of issues for Loongson 1B.
    - New clocksource and clockevent driver for Loongson 1B.
    - Further work on clk handling for Loongson 1B.
    - Platform work for Broadcom BMIPS.
    - Error handling cleanups for TurboChannel.
    - Fixes and optimization to the microMIPS support.
    - Option to disable the FTLB.
    - Dump more relevant information on machine check exception
    - Change binfmt to allow arch to examine PT_*PROC headers
    - Support for new style FPU register model in O32
    - VDSO randomization.
    - BCM47xx cleanups
    - BCM47xx reimplement the way the kernel accesses NVRAM information.
    - Random cleanups
    - Add support for ATH25 platforms
    - Remove pointless locking code in some PCI platforms.
    - Some improvements to EVA support
    - Minor Alchemy cleanup"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (185 commits)
    MIPS: Add MFHC0 and MTHC0 instructions to uasm.
    MIPS: Cosmetic cleanups of page table headers.
    MIPS: Add CP0 macros for extended EntryLo registers
    MIPS: Remove now unused definition of phys_t.
    MIPS: Replace use of phys_t with phys_addr_t.
    MIPS: Replace MIPS-specific 64BIT_PHYS_ADDR with generic PHYS_ADDR_T_64BIT
    PCMCIA: Alchemy Don't select 64BIT_PHYS_ADDR in Kconfig.
    MIPS: lib: memset: Clean up some MIPS{EL,EB} ifdefery
    MIPS: iomap: Use __mem_{read,write}{b,w,l} for MMIO
    MIPS: fix indentation.
    MAINTAINERS: Add entry for BMIPS multiplatform kernel
    MIPS: Enable VDSO randomization
    MIPS: Remove a temporary hack for debugging cache flushes in SMTC configuration
    MIPS: Remove declaration of obsolete arch_init_clk_ops()
    MIPS: atomic.h: Reformat to fit in 79 columns
    MIPS: Apply `.insn' to fixup labels throughout
    MIPS: Fix microMIPS LL/SC immediate offsets
    MIPS: Kconfig: Only allow 32-bit microMIPS builds
    MIPS: signal.c: Fix an invalid cast in ISA mode bit handling
    MIPS: mm: Only build one microassembler that is suitable
    ...

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask for the CPU on which a packet was
    received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

34 commits

  • Merge first patchbomb from Andrew Morton:
    - a few minor cifs fixes
    - dma-debug updates
    - ocfs2
    - slab
    - about half of MM
    - procfs
    - kernel/exit.c
    - panic.c tweaks
    - printk updates
    - lib/ updates
    - checkpatch updates
    - fs/binfmt updates
    - the drivers/rtc tree
    - nilfs
    - kmod fixes
    - more kernel/exit.c
    - various other misc tweaks and fixes

    * emailed patches from Andrew Morton: (190 commits)
    exit: pidns: fix/update the comments in zap_pid_ns_processes()
    exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
    exit: exit_notify: re-use "dead" list to autoreap current
    exit: reparent: call forget_original_parent() under tasklist_lock
    exit: reparent: avoid find_new_reaper() if no children
    exit: reparent: introduce find_alive_thread()
    exit: reparent: introduce find_child_reaper()
    exit: reparent: document the ->has_child_subreaper checks
    exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
    exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
    exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
    exit: proc: don't try to flush /proc/tgid/task/tgid
    exit: release_task: fix the comment about group leader accounting
    exit: wait: drop tasklist_lock before psig->c* accounting
    exit: wait: don't use zombie->real_parent
    exit: wait: cleanup the ptrace_reparented() checks
    usermodehelper: kill the kmod_thread_locker logic
    usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
    fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
    nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
    ...

    Linus Torvalds
     
  • proc_flush_task_mnt() always tries to flush task/pid, but this is
    pointless if we reap the leader. d_invalidate() is recursive, and
    if nothing else the next d_hash_and_lookup(tgid) should fail anyway.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: "Eric W. Biederman"
    Cc: Rik van Riel
    Cc: Sterling Alexander
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Relying on the sign (after casting to int) of the difference of two
    quantities for comparison is usually wrong. For example, should a-b
    turn out to be 2^31, the return value of cmp(a,b) is -2^31; but that
    would also be the return value from cmp(b, a). So a compares less than
    b and b compares less than a. One can also easily find three values
    a,b,c such that a compares less than b, b compares less than c, but a
    does not compare less than c.
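
    The failure mode can be reproduced outside the kernel. The sketch below
    is a Python model of C's 32-bit arithmetic, not the hfs code itself; the
    helper names (to_int32, cmp_by_difference, cmp_by_relation) are
    hypothetical and exist only for this illustration.

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit value, modelling a C cast to int
    on a two's-complement platform."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def cmp_by_difference(a, b):
    """Fragile pattern: use the sign of (int)(a - b) as the comparison result."""
    return to_int32(a - b)


def cmp_by_relation(a, b):
    """Robust pattern: compare the values directly and return -1, 0 or 1."""
    return (a > b) - (a < b)


# Two unsigned 32-bit values whose difference is exactly 2^31.
a, b = 2**31, 0
print(cmp_by_difference(a, b), cmp_by_difference(b, a))  # same sign both ways
print(cmp_by_relation(a, b), cmp_by_relation(b, a))      # opposite signs

# Three values that break transitivity under the fragile pattern.
x, y, z = 0x00000000, 0x60000000, 0xC0000000
print(cmp_by_difference(x, y) < 0,   # x "less than" y
      cmp_by_difference(y, z) < 0,   # y "less than" z
      cmp_by_difference(x, z) < 0)   # but x is not "less than" z
```

    Comparing the operands directly avoids the wraparound, because no
    intermediate difference is ever formed.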

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Vyacheslav Dubeyko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
    ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
    of insert_inode_locked(), and a bug in the check for dead inodes needs
    to be fixed.

    If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
    insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
    the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

    If nilfs_iget() is called before nilfs_new_inode() calls
    insert_inode_locked4(), it will create an in-core inode and read its
    data from the on-disk inode. But, nilfs_iget() will find i_nlink equals
    zero and fail at nilfs_read_inode_common(), which will lead it to call
    iget_failed() and cleanly fail.

    However, this sanity check doesn't work as expected for reused on-disk
    inodes because they leave a non-zero value in i_mode field and it
    hinders the test of i_nlink. This patch also fixes the issue by
    removing the test on i_mode that nilfs2 doesn't need.

    Signed-off-by: Ryusuke Konishi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The iput() function tests whether its argument is NULL and then returns
    immediately. Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.

    Signed-off-by: Markus Elfring
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • This patch removes filemap_write_and_wait_range() from nilfs_sync_file(),
    because it triggers a data segment construction by calling
    nilfs_writepages() with WB_SYNC_ALL. A data segment construction does not
    remove the inode from the i_dirty list and it does not clear the
    NILFS_I_DIRTY flag. Therefore nilfs_inode_dirty() still returns true,
    which leads to an unnecessary duplicate segment construction in
    nilfs_sync_file().

    A call to filemap_write_and_wait_range() is not needed, because NILFS2
    does not rely on the generic writeback mechanisms. Instead it implements
    its own mechanism to collect all dirty pages and write them into segments.
    It is more efficient to initiate the segment construction directly in
    nilfs_sync_file() without the detour over filemap_write_and_wait_range().

    Additionally the lock of i_mutex is not needed, because all code blocks
    that are protected by i_mutex are also protected by a NILFS transaction:

    Function                 i_mutex    nilfs_transaction
    ------------------------------------------------------
    nilfs_ioctl_setflags:    yes        yes
    nilfs_fiemap:            yes        no
    nilfs_write_begin:       yes        yes
    nilfs_write_end:         yes        yes
    nilfs_lookup:            yes        no
    nilfs_create:            yes        yes
    nilfs_link:              yes        yes
    nilfs_mknod:             yes        yes
    nilfs_symlink:           yes        yes
    nilfs_mkdir:             yes        yes
    nilfs_unlink:            yes        yes
    nilfs_rmdir:             yes        yes
    nilfs_rename:            yes        yes
    nilfs_setattr:           yes        yes

    For nilfs_lookup() i_mutex is held for the parent directory, to protect it
    from modification. The segment construction does not modify directory
    inodes, so no lock is needed.

    nilfs_fiemap() reads the block layout on the disk, by using
    nilfs_bmap_lookup_contig(). This is already protected by bmap->b_sem.

    Signed-off-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Rohner
     
  • If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error
    return value is then (in most cases) just overwritten before we return.
    This can result in reporting success to userspace even though an error
    occurred.

    This bug was introduced by commit 2e54eb96e2c8 ("BKL: Remove BKL from
    ncpfs"). Propagate the errors correctly.

    Coverity id: 1226925.

    Fixes: 2e54eb96e2c80 ("BKL: Remove BKL from ncpfs")
    Signed-off-by: Jan Kara
    Cc: Petr Vandrovec
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • vma_dump_size() is called several times by the core dumper and is
    supposed to return the same value for the same vma. But vma_dump_size()
    could return different values for the same vma.

    The known problem case is concurrent shared memory removal. If a vma is
    used for shared memory and that shared memory is removed between
    writing the program header and dumping the vma memory, this will result
    in a dump file which is internally inconsistent.

    To fix the problem, we set a baseline by computing the dump size once,
    store it in vma_filesz, and always use that stored vma dump size.
    Consistency with reality is not actually guaranteed, but that is
    tolerable since the dump is fully consistent with the baseline.

    Signed-off-by: Jungseung Lee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jungseung Lee
     
  • GFP_USER means "honour cpuset nodes-allowed beancounting". These are
    regular old kernel objects and there seems no reason to give them this
    treatment.

    Acked-by: Mike Frysinger
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Clean up various coding style issues that checkpatch complains about.
    No functional changes here.

    Signed-off-by: Mike Frysinger
    Cc: Al Viro
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • When trying to develop a custom format handler, the errors returned all
    effectively get bucketed as EINVAL with no kernel messages. The other
    errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a
    bad handler is rejected, the developer has to walk the dense code and
    try to guess where it went wrong. Needing to dive into kernel code is
    itself a fairly high barrier for a lot of people.

    To improve this situation, let's deploy extensive pr_debug markers at
    logical parse points, and add comments to the dense parsing logic. This
    lets you see exactly where the parsing aborts, the string the kernel
    received (useful when dealing with shell code), how it translated the
    buffers to binary data, and how it will apply the mask at runtime.

    Some example output:
    $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register
    $ dmesg
    binfmt_misc: register: received 92 bytes
    binfmt_misc: register: delim: 0x3a {:}
    binfmt_misc: register: name: {qemu-foo}
    binfmt_misc: register: type: M (magic)
    binfmt_misc: register: offset: 0x0
    binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\
    binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00.
    binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff
    binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00
    binfmt_misc: register: mask[raw]: 00 .
    binfmt_misc: register: magic/mask length: 8
    binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF....
    binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........
    binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF....
    binfmt_misc: register: interpreter: {/usr/bin/qemu-foo}
    binfmt_misc: register: flag: P (preserve argv0)
    binfmt_misc: register: flag: O (open binary)
    binfmt_misc: register: flag: C (preserve creds)

    The [raw] lines show us exactly what was received from userspace. The
    lines after that show us how the kernel has decoded things.
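
    The decode and mask steps visible in that log can be modelled in a few
    lines. This is a Python sketch of the transformation only, not the
    kernel's parser; decode_escapes and apply_mask are hypothetical names,
    and only the \xHH escape form used in the example is handled.

```python
import re


def decode_escapes(raw):
    """Turn literal '\\xHH' sequences into single bytes; any other
    character is passed through unchanged."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\\x([0-9a-fA-F]{2})", raw):
        out += raw[pos:match.start()].encode("latin-1")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += raw[pos:].encode("latin-1")
    return bytes(out)


def apply_mask(magic, mask):
    """AND each magic byte with the corresponding mask byte."""
    return bytes(m & k for m, k in zip(magic, mask))


def hexdump(data):
    return " ".join(f"{byte:02x}" for byte in data)


# The same magic and mask strings as in the example registration above.
magic = decode_escapes(r"\x7fELF\xAD\xAD\x01\x00")
mask = decode_escapes(r"\xff\xff\xff\xff\xff\x00\xff\x00")
masked = apply_mask(magic, mask)

print("magic[decoded]:", hexdump(magic))
print("mask[decoded]: ", hexdump(mask))
print("magic[masked]: ", hexdump(masked))
```

    The sixth byte of the mask is 00, which is why the second 0xad in the
    magic is cleared in the masked result.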

    Signed-off-by: Mike Frysinger
    Cc: Al Viro
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • This patch replaces calls to get_unused_fd() with equivalent calls to
    get_unused_fd_flags(0) to preserve current behavior for existing code.

    In a further patch, get_unused_fd() will be removed so that new code
    starts using get_unused_fd_flags(), with the hope that O_CLOEXEC could
    be used, either by default or chosen by userspace.

    Signed-off-by: Yann Droneaud
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
     
  • This patch replaces calls to get_unused_fd() with equivalent calls to
    get_unused_fd_flags(0) to preserve current behavior for existing code.

    In a further patch, get_unused_fd() will be removed so that new code
    starts using get_unused_fd_flags(), with the hope that O_CLOEXEC could
    be used, either by default or chosen by userspace.

    Signed-off-by: Yann Droneaud
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
     
  • p->ptrace != 0 means that release_task(p) was not called, so pid_alive()
    buys nothing and we can remove this check. Other callers already use it
    directly without additional checks.

    Note: with or without this patch ptrace_parent() can return a pointer to
    a freed task; this will be explained/fixed later.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_state() does seq_printf() under rcu_read_lock(), but this is only
    needed for task_tgid_nr_ns() and task_numa_group_id(). We can calculate
    tgid/ngid and drop rcu lock.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. The usage of fdt looks very ugly, it can't be NULL if ->files is
    not NULL. We can use "unsigned int max_fds" instead.

    2. This also allows moving seq_printf(max_fds) outside of task_lock()
    and joining it with the previous seq_printf(). See also the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_state() reads cred->group_info under task_lock() because long ago
    it was task_struct->group_info and it was actually protected by
    task->alloc_lock. Today this task_unlock() after rcu_read_unlock() just
    adds confusion, so move task_unlock() up.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Better to use the existing macros than to rewrite them.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Dichtel
     
  • proc_register() error paths are leaking inodes and directory refcounts.

    Signed-off-by: Debabrata Banerjee
    Cc: Alexander Viro
    Acked-by: Nicolas Dichtel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Debabrata Banerjee
     
  • When a lot of netdevices are created, one of the bottlenecks is the
    creation of proc entries. This series aims to accelerate this part.

    The current implementation for the directories in /proc uses a singly
    linked list. This is slow when handling directories with large numbers of
    entries (e.g. netdevice-related entries when lots of tunnels are opened).

    This patch replaces this linked list with a red-black tree.

    Here are some numbers:

    dummy30000.batch contains 30 000 times 'link add type dummy'.

    Before the patch:
    $ time ip -b dummy30000.batch
    real 2m31.950s
    user 0m0.440s
    sys 2m21.440s
    $ time rmmod dummy
    real 1m35.764s
    user 0m0.000s
    sys 1m24.088s

    After the patch:
    $ time ip -b dummy30000.batch
    real 2m0.874s
    user 0m0.448s
    sys 1m49.720s
    $ time rmmod dummy
    real 1m13.988s
    user 0m0.000s
    sys 1m1.008s
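
    To put those timings in relative terms, the wall-clock ("real") figures
    can be compared directly. The snippet below is only arithmetic on the
    numbers quoted above; to_seconds and reduction are hypothetical helper
    names.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m31.950s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def reduction(before, after):
    """Relative reduction in elapsed time, in percent."""
    b, a = to_seconds(before), to_seconds(after)
    return 100.0 * (b - a) / b


# "real" times quoted above for creating and removing 30 000 dummy devices.
create = reduction("2m31.950s", "2m0.874s")
remove = reduction("1m35.764s", "1m13.988s")

print(f"ip -b dummy30000.batch: {create:.1f}% less elapsed time")
print(f"rmmod dummy:            {remove:.1f}% less elapsed time")
```

    Both operations come out at a little over a fifth less elapsed time in
    this particular measurement.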

    The idea of improving this part was suggested by Thierry Herbelot.

    [akpm@linux-foundation.org: initialise proc_root.subdir at compile time]
    Signed-off-by: Nicolas Dichtel
    Acked-by: David S. Miller
    Cc: Thierry Herbelot
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Dichtel
     
  • Like the small zero page, the huge zero page should not be accounted as
    a normal page in the smaps report.

    For small pages we rely on vm_normal_page() to filter out the zero page,
    but vm_normal_page() is not designed to handle pmds. We only get here due
    to a hackish cast of pmd to pte in smaps_pte_range() -- the pte and pmd
    formats are not necessarily compatible on each and every architecture.

    Let's add a separate codepath to handle pmds. follow_trans_huge_pmd()
    will detect the huge zero page for us.

    We need a pmd_dirty() helper to do this properly. The patch adds it
    to THP-enabled architectures which don't yet have one.

    [akpm@linux-foundation.org: use do_div to fix 32-bit build]
    Signed-off-by: "Kirill A. Shutemov"
    Reported-by: Fengguang Wu
    Tested-by: Fengwei Yin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • In one place we assign the major number we found to ret. That assignment is
    then never used and actually doesn't make any sense given how the code is
    currently structured (the assignment comes from pre-git times). Just
    remove it.

    Coverity id: 1226852.

    Signed-off-by: Jan Kara
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • In commit 1faf289454b9 ("ocfs2_dlm: disallow a domain join if node maps
    mismatch") we introduced a new earlier NULL check so this one is not
    needed. Also static checkers complain because we dereference it first
    and then check for NULL.

    Signed-off-by: Dan Carpenter
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • "inode" isn't NULL here, and also we dereference it on the previous line
    so static checkers get annoyed.

    Signed-off-by: Dan Carpenter
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • Do not set the filesystem readonly if the storage link is down. In this
    case, metadata is not corrupted and only -EIO is returned. And if it is
    indeed corrupted metadata, it has already called ocfs2_error() in
    ocfs2_validate_inode_block().

    Signed-off-by: Yiwen Jiang
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • ocfs2_readpages() uses a nonblocking flag to avoid page lock inversion.
    This can trigger a cluster hang because the
    OCFS2_LOCK_UPCONVERT_FINISHING flag is not cleared if the nonblocking
    lock cannot be granted at once. The flag prevents the dc thread from
    downconverting, so other nodes can never acquire this lockres.

    So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving the
    ast if the nonblocking lock has already returned.

    Signed-off-by: joyce.xue
    Reviewed-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • Error handling is broken when creation of the debugfs root in ocfs2_init()
    fails. Although an error code is set, we fail to exit ocfs2_init() with an
    error, and thus initialization ends with success. Later, when mounting a
    filesystem, ocfs2 debugfs entries end up being created in the root of the
    debugfs filesystem, which is confusing.

    Fix the error handling to bail out.

    Coverity id: 1227009.

    Signed-off-by: Jan Kara
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • File size is not a good indication that the file needs to be synced.
    An example where this breaks is:
    1. Open the file with O_SYNC|O_RDWR
    2. Read a small portion of the file (say 64 bytes)
    3. Lseek to the start of the file
    4. Write 64 bytes

    If the node crashes, the write is not on disk because it was not
    committed to the journal, and another node that reads the file after
    recovery reads stale data (even though the write on the writing node
    was successful).
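
    The four steps above can be replayed from user space with a small POSIX
    sketch. The helper names, the 128-byte starting size and the path argument
    are assumptions made for illustration. The sketch only shows that the
    overwrite leaves the file size unchanged; it does not reproduce the crash
    or the journal behavior.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum { CHUNK = 64, FILE_SIZE = 128 };	/* illustrative sizes */

/* Create (or truncate) `path` and fill it with FILE_SIZE bytes of 'a'. */
static int make_file(const char *path)
{
	char fill[FILE_SIZE];
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

	if (fd < 0)
		return -1;
	memset(fill, 'a', sizeof(fill));
	if (write(fd, fill, sizeof(fill)) != (ssize_t)sizeof(fill)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Replay steps 1-4 and return the file size afterwards, or -1 on failure. */
long replay_sync_overwrite(const char *path)
{
	char buf[CHUNK];
	off_t size = -1;
	int fd;

	if (make_file(path) < 0)
		return -1;
	fd = open(path, O_SYNC | O_RDWR);			/* step 1 */
	if (fd < 0)
		return -1;
	if (read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&	/* step 2 */
	    lseek(fd, 0, SEEK_SET) == 0) {			/* step 3 */
		memset(buf, 'x', sizeof(buf));
		if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) /* step 4 */
			size = lseek(fd, 0, SEEK_END);
	}
	close(fd);
	return (long)size;
}
```

    The returned size equals the starting size, so a check that only compares
    file sizes would not notice that data was written.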

    Signed-off-by: Goldwyn Rodrigues
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • Setting nn_persistent_error to -ENOTCONN stops reconnection, since the
    "stop" condition in o2net_start_connect() will be true.

    stop = (nn->nn_sc ||
            (nn->nn_persistent_error &&
             (nn->nn_persistent_error != -ENOTCONN || timeout == 0)));

    As a result, the connection is never established if the first connection
    request is lost.

    Set nn_persistent_error to 0 when the connect expires to fix this. With
    this change, dlm will not be woken up when the connect expires. This is OK
    because dlm depends on the network and can do nothing in this case if
    woken up. Let it wait there until the network recovers and the connection
    is built again.
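
    The quoted condition can be checked in isolation with a small sketch.
    connect_should_stop() and its plain parameters are hypothetical stand-ins
    for the nn fields; only the boolean expression is taken from the text
    above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Mirrors the quoted "stop" expression.
 *   has_sc           stands in for nn->nn_sc being non-NULL
 *   persistent_error stands in for nn->nn_persistent_error
 *   timeout          stands in for the local timeout value
 */
bool connect_should_stop(bool has_sc, int persistent_error, int timeout)
{
	return has_sc ||
	       (persistent_error &&
		(persistent_error != -ENOTCONN || timeout == 0));
}
```

    Evaluating it with persistent_error set to -ENOTCONN and timeout equal to
    0 yields true, which matches the stuck reconnect described above, while a
    persistent_error of 0 yields false and lets the connect attempt proceed.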

    Signed-off-by: Junxiao Bi
    Reviewed-by: Srinivas Eeda
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Node A sends a master query request to node B, which is the master. At
    this time the lockres happens to be on the purge list.
    dlm_master_request_handler gets the dlm spinlock, finds the resource and
    releases the dlm spinlock. Right at this point, dlm_thread on this node
    could purge the lockres. dlm_master_request_handler can then acquire the
    lockres spinlock and reply to node A that node B is the master, even
    though the lockres on node B is purged.

    This scenario makes node A falsely think node B is the master, which is
    inconsistent. Further, if another node C tries to master the same
    resource, every node will respond that it is not the master. Node C then
    masters the resource and sends an assert master to all nodes. This makes
    node A crash with the following message.

    dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current
    owner is 10!

    Signed-off-by: Srinivas Eeda
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Wengang Wang
    Tested-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srinivas Eeda
     
  • Report the return value of o2hb_do_disk_heartbeat() as part of the
    ML_HEARTBEAT message so that we know whether a heartbeat actually happened
    or not. This also puts the assigned but otherwise unused 'ret' variable to
    use.

    Coverity id: 1227053.

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • 'args' are always set for ocfs2_read_locked_inode() and brelse() checks
    whether bh is NULL. So the test (args && bh) is unnecessary (plus the
    args part is really confusing anyway). Remove it.

    Coverity id: 1128856.

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • ocfs2_get_xattr_nolock() checks whether the inode has any extended
    attributes (OCFS2_HAS_XATTR_FL). If not, it just sets 'ret' to -ENODATA
    but continues with checking inline and external attributes anyway (which
    is pointless, although harmless). Just return immediately when we know
    there are no extended attributes in the inode.
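
    A minimal sketch of the early return, assuming a simplified inode. The
    names sketch_inode, SKETCH_HAS_XATTR_FL, search_attrs() and
    get_xattr_sketch() are hypothetical and only illustrate the control flow.

```c
#include <errno.h>

#define SKETCH_HAS_XATTR_FL 0x1u	/* hypothetical flag bit */

struct sketch_inode {
	unsigned int flags;
	int lookups;	/* counts how many attribute searches were attempted */
};

/* Stand-in for the inline and external attribute searches. */
static int search_attrs(struct sketch_inode *inode)
{
	inode->lookups++;
	return 0;
}

int get_xattr_sketch(struct sketch_inode *inode)
{
	if (!(inode->flags & SKETCH_HAS_XATTR_FL))
		return -ENODATA;	/* no attributes, so skip the searches */
	return search_attrs(inode);
}
```

    An inode without the flag returns -ENODATA and leaves the lookup counter
    untouched.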

    Coverity id: 1226906.

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The ->si_slots[] array is allocated in ocfs2_init_slot_info(); it has
    "->max_slots" number of elements, so this test should be >= instead of >.

    Static checker work. Compile tested only.
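
    The off-by-one can be illustrated with a generic bounds check.
    slot_index_ok() is a hypothetical helper, not the ocfs2 function: for an
    array of max_slots elements the valid indices are 0 to max_slots - 1, so
    the rejecting test has to be >=.

```c
#include <stdbool.h>

/* Returns true when `slot` is a valid index into an array of max_slots. */
bool slot_index_ok(unsigned int max_slots, unsigned int slot)
{
	if (slot >= max_slots)	/* '>' would wrongly accept slot == max_slots */
		return false;
	return true;
}
```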

    Signed-off-by: Dan Carpenter
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter