03 May, 2013

2 commits

  • Pull xfs update from Ben Myers:
    "For 3.10-rc1 we have a number of bug fixes and cleanups and a
    currently experimental feature from David Chinner, CRCs protection for
    metadata. CRCs are enabled by using mkfs.xfs to create a filesystem
    with the feature bits set.

    - numerous fixes for speculative preallocation
    - don't verify buffers on IO errors
    - rename of random32 to prandom32
    - refactoring/rearrangement in xfs_bmap.c
    - removal of unused m_inode_shrink in struct xfs_mount
    - fix error handling of xfs_bufs and readahead
    - quota driven preallocation throttling
    - fix WARN_ON in xfs_vm_releasepage
    - add ratelimited printk for different alert levels
    - fix spurious forced shutdowns due to freed Extent Free Intents
    - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
    - remove some obsoleted comments
    - (experimental) CRC support for metadata"

    * tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
    xfs: fix da node magic number mismatches
    xfs: Remote attr validation fixes and optimisations
    xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
    xfs: add metadata CRC documentation
    xfs: implement extended feature masks
    xfs: add CRC checks to the superblock
    xfs: buffer type overruns blf_flags field
    xfs: add buffer types to directory and attribute buffers
    xfs: add CRC protection to remote attributes
    xfs: split remote attribute code out
    xfs: add CRCs to attr leaf blocks
    xfs: add CRCs to dir2/da node blocks
    xfs: shortform directory offsets change for dir3 format
    xfs: add CRC checking to dir2 leaf blocks
    xfs: add CRC checking to dir2 data blocks
    xfs: add CRC checking to dir2 free blocks
    xfs: add CRC checks to block format directory blocks
    xfs: add CRC checks to remote symlinks
    xfs: split out symlink code into it's own file.
    xfs: add version 3 inode format with CRCs
    ...

    Linus Torvalds
     
  • Pull powerpc update from Benjamin Herrenschmidt:
    "The main highlights this time around are:

    - A pile of addition POWER8 bits and nits, such as updated
    performance counter support (Michael Ellerman), new branch history
    buffer support (Anshuman Khandual), base support for the new PCI
    host bridge when not using the hypervisor (Gavin Shan) and other
    random related bits and fixes from various contributors.

    - Some rework of our page table format by Aneesh Kumar which fixes a
    thing or two and paves the way for THP support. THP itself will
    not make it this time around however.

    - More Freescale updates, including Altivec support on the new e6500
    cores, new PCI controller support, and a pile of new boards support
    and updates.

    - The usual batch of trivial cleanups & fixes"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
    powerpc: Fix build error for book3e
    powerpc: Context switch the new EBB SPRs
    powerpc: Turn on the EBB H/FSCR bits
    powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
    powerpc: Setup BHRB instructions facility in HFSCR for POWER8
    powerpc: Fix interrupt range check on debug exception
    powerpc: Update tlbie/tlbiel as per ISA doc
    powerpc: Print page size info during boot
    powerpc: print both base and actual page size on hash failure
    powerpc: Fix hpte_decode to use the correct decoding for page sizes
    powerpc: Decode the pte-lp-encoding bits correctly.
    powerpc: Use encode avpn where we need only avpn values
    powerpc: Reduce PTE table memory wastage
    powerpc: Move the pte free routines from common header
    powerpc: Reduce the PTE_INDEX_SIZE
    powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
    powerpc: New hugepage directory format
    powerpc: Don't truncate pgd_index wrongly
    powerpc: Don't hard code the size of pte page
    powerpc: Save DAR and DSISR in pt_regs on MCE
    ...

    Linus Torvalds
     

02 May, 2013

19 commits

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Pull x86/efi changes from Peter Anvin:
    "The bulk of these changes are cleaning up the efivars handling and
    breaking it up into a tree of files. There are a number of fixes as
    well.

    The entire changeset is pretty big, but most of it is code movement.

    Several of these commits are quite new; the history got very messed up
    due to a mismerge with the urgent changes for rc8 which completely
    broke IA64, and so Ingo requested that we rebase it to straighten it
    out."

    * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi: remove "kfree(NULL)"
    efi: locking fix in efivar_entry_set_safe()
    efi, pstore: Read data from variable store before memcpy()
    efi, pstore: Remove entry from list when erasing
    efi, pstore: Initialise 'entry' before iterating
    efi: split efisubsystem from efivars
    efivarfs: Move to fs/efivarfs
    efivars: Move pstore code into the new EFI directory
    efivars: efivar_entry API
    efivars: Keep a private global pointer to efivars
    efi: move utf16 string functions to efi.h
    x86, efi: Make efi_memblock_x86_reserve_range more readable
    efivarfs: convert to use simple_open()

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Move non-public declarations and definitions from linux/proc_fs.h to
    fs/proc/internal.h.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Make the PROC_I() and PDE() macros internal to procfs. This means making
    PDE_DATA() out of line. This could be made more optimal by storing
    PDE()->data into inode->i_private.

    Also provide a __PDE_DATA() that is inline and internal to procfs.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Supply a function (proc_remove()) to remove a proc entry (and any subtree
    rooted there) by proc_dir_entry pointer rather than by name and (optionally)
    root dir entry pointer. This allows us to eliminate all remaining pde->name
    accesses outside of procfs.

    Signed-off-by: David Howells
    Acked-by: Grant Likely
    cc: linux-acpi@vger.kernel.org
    cc: openipmi-developer@lists.sourceforge.net
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-pci@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Don't access the proc_dir_entry in ReiserFS's r_open(), r_start() r_show()
    procfs interface functions.

    ReiserFS stores the ->show() method pointer in PDE->data and the super_block
    pointer in PDE->parent->data. This isn't changing.

    Currently, ReiserFS passes the PDE pointer into seq_file::private from
    r_open() so that r_start() and r_show() can then access it. Instead, use
    seq_open_private() to allocate a two-pointer struct that's passed through
    seq_file::private and put the ->show() method and the sb pointers in there.

    Signed-off-by: David Howells
    cc: reiserfs-devel@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Supply an accessor function for getting the private data from the parent
    proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

    ReiserFS, for instance, stores the super_block pointer in the proc directory
    it makes for that super_block, and a pointer to the respective seq_file show
    function in each of the proc files in that directory.

    This allows a reduction in the number of file_operations structs, open
    functions and seq_operations structs required. The problem otherwise is that
    each show function requires two pieces of data but only has storage for one
    per PDE (and this has no release function).

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Jerry Chuang
    cc: Maxim Mikityanskiy
    cc: YAMANE Toshiaki
    cc: linux-wireless@vger.kernel.org
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    Signed-off-by: Al Viro

    David Howells
     
  • Add proc_mkdir_data() to allow procfs directories to be created that are
    annotated at the time of creation with private data rather than doing this
    post-creation. This means no access is then required to the proc_dir_entry
    struct to set this.

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Neela Syam Kolli
    cc: Jerry Chuang
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    cc: linux-wireless@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h.

    Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as
    they're internal to procfs.

    Signed-off-by: David Howells
    Acked-by: Greg Kroah-Hartman
    Acked-by: Grant Likely
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-arch@vger.kernel.org
    cc: Greg Kroah-Hartman
    cc: Jri Slaby
    Signed-off-by: Al Viro

    David Howells
     
  • Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Move proc_fd() to fs/proc/fd.h.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Uninline pid_delete_dentry() as it's only used by three function pointers.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Supply accessor functions to set attributes in proc_dir_entry structs.

    The following are supplied: proc_set_size() and proc_set_user().

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    cc: linuxppc-dev@lists.ozlabs.org
    cc: linux-media@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: linux-wireless@vger.kernel.org
    cc: linux-pci@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configurating various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to mdoule_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     
  • Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • - optimise the calcuation for the number of blocks in a remote
    xattr.
    - check attribute length against MAX_XATTR_SIZE, not MAXPATHLEN
    - whitespace fixes

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     

01 May, 2013

19 commits

  • Pull ext4 updates from Ted Ts'o:
    "Mostly performance and bug fixes, plus some cleanups. The one new
    feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
    allows installation of a hidden inode designed for boot loaders."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
    ext4: fix type-widening bug in inode table readahead code
    ext4: add check for inodes_count overflow in new resize ioctl
    ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
    ext4: fix online resizing for ext3-compat file systems
    jbd2: trace when lock_buffer in do_get_write_access takes a long time
    ext4: mark metadata blocks using bh flags
    buffer: add BH_Prio and BH_Meta flags
    ext4: mark all metadata I/O with REQ_META
    ext4: fix readdir error in case inline_data+^dir_index.
    ext4: fix readdir error in the case of inline_data+dir_index
    jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
    ext4: mext_insert_extents should update extent block checksum
    ext4: move quota initialization out of inode allocation transaction
    ext4: reserve xattr index for Rich ACL support
    jbd2: reduce journal_head size
    ext4: clear buffer_uninit flag when submitting IO
    ext4: use io_end for multiple bios
    ext4: make ext4_bio_write_page() use BH_Async_Write flags
    ext4: Use kstrtoul() instead of parse_strtoul()
    ext4: defragmentation code cleanup
    ...

    Linus Torvalds
     
  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     
  • Merge third batch of fixes from Andrew Morton:
    "Most of the rest. I still have two large patchsets against AIO and
    IPC, but they're a bit stuck behind other trees and I'm about to
    vanish for six days.

    - random fixlets
    - inotify
    - more of the MM queue
    - show_stack() cleanups
    - DMI update
    - kthread/workqueue things
    - compat cleanups
    - epoll udpates
    - binfmt updates
    - nilfs2
    - hfs
    - hfsplus
    - ptrace
    - kmod
    - coredump
    - kexec
    - rbtree
    - pids
    - pidns
    - pps
    - semaphore tweaks
    - some w1 patches
    - relay updates
    - core Kconfig changes
    - sysrq tweaks"

    * emailed patches from Andrew Morton : (109 commits)
    Documentation/sysrq: fix inconstistent help message of sysrq key
    ethernet/emac/sysrq: fix inconstistent help message of sysrq key
    sparc/sysrq: fix inconstistent help message of sysrq key
    powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
    ARM/etm/sysrq: fix inconstistent help message of sysrq key
    power/sysrq: fix inconstistent help message of sysrq key
    kgdb/sysrq: fix inconstistent help message of sysrq key
    lib/decompress.c: fix initconst
    notifier-error-inject: fix module names in Kconfig
    kernel/sys.c: make prctl(PR_SET_MM) generally available
    UAPI: remove empty Kbuild files
    menuconfig: print more info for symbol without prompts
    init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
    kconfig menu: move Virtualization drivers near other virtualization options
    Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
    relay: use macro PAGE_ALIGN instead of FIX_SIZE
    kernel/relay.c: move FIX_SIZE macro into relay.c
    kernel/relay.c: remove unused function argument actor
    drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
    drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
    ...

    Linus Torvalds
     
  • threadgroup_lock() takes signal->cred_guard_mutex to ensure that
    thread_group_leader() is stable. This doesn't look nice, the scope of
    this lock in do_execve() is huge.

    And as Dave pointed out this can lead to deadlock, we have the
    following dependencies:

    do_execve: cred_guard_mutex -> i_mutex
    cgroup_mount: i_mutex -> cgroup_mutex
    attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

    Change de_thread() to take threadgroup_change_begin() around the
    switch-the-leader code and change threadgroup_lock() to avoid
    ->cred_guard_mutex.

    Note that de_thread() can't sleep with ->group_rwsem held, this can
    obviously deadlock with the exiting leader if the writer is active, so it
    does threadgroup_change_end() before schedule().

    Reported-by: Dave Jones
    Acked-by: Tejun Heo
    Acked-by: Li Zefan
    Signed-off-by: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • set_task_comm() does memset() + wmb() before strlcpy(). This buys
    nothing and to add to the confusion, the comment is wrong.

    - We do not need memset() to be "safe from non-terminating string
    reads", the final char is always zero and we never change it.

    - wmb() is paired with nothing, it cannot prevent from printing
    the mixture of the old/new data unless the reader takes the lock.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently, a write to a procfs file will return the number of bytes
    successfully written. If the actual string is longer than this, the
    remainder of the string will not be be written and userspace will
    complete the operation by issuing additional write()s.

    Hence

    $ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

    results in

    $ cat /proc/$$/comm
    pqrs

    since the final four bytes were written with a second write() since
    TASK_COMM_LEN == 16. This is obviously an undesired result and not
    equivalent to prctl(PR_SET_NAME). The implementation should not need to
    know the definition of TASK_COMM_LEN.

    This patch truncates the string to the first TASK_COMM_LEN bytes and
    returns the bytes written as the length of the string written so the
    second write() is suppressed.

    $ cat /proc/$$/comm
    abcdefghijklmno

    Signed-off-by: David Rientjes
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
    wait_event-like loop. This is not needed and in fact this is not
    strictly correct, we can/should do this only once after we change
    pipe->writers. We could even check if it becomes zero.

    Change this code to use use wait_event_interruptible(), this can also
    help to make this wait freezable.

    With this patch we check pipe->readers without pipe_lock(), this is
    fine. Once we see pipe->readers == 1 we know that the handler
    decremented the counter, this is all we need.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
    zap_threads() called by do_coredump().

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • By discussion with Mandeep.

    Change dump_write(), dump_seek() and do_coredump() to check
    signal_pending() and abort if it is true. dump_seek() does this only
    before f_op->llseek(), otherwise it relies on dump_write().

    We need this change to ensure that the coredump won't delay suspend, and
    to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
    lot of time. In particular this can help oom-killer.

    We add the new trivial helper, dump_interrupted() to add the comments and
    to simplify the potential freezer changes. Perhaps it will have more
    callers.

    Ideally it should do try_to_freeze() but then we need the unpleasant
    changes in dump_write() and wait_for_dump_helpers(). It is not trivial to
    change dump_write() to restart if f_op->write() fails because of
    freezing(). We need to handle the short writes, we need to clear
    TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
    it to check PF_DUMPCORE). And if the buggy f_op->write() sets
    TIF_SIGPENDING we can not distinguish this case from the race with
    freeze_task() + __thaw_task().

    So we simply accept the fact that the freezer can truncate a core-dump but
    at least you can reliably suspend. Hopefully we can tolerate this
    unlikely case and the necessary complications doesn't worth a trouble.
    But if we decide to make the coredumping freezable later we can do this on
    top of this change.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mandeep Singh Baines
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that the coredumping process can be SIGKILL'ed, the setting of
    ->group_exit_code in do_coredump() can race with complete_signal() and
    SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
    SIGKILL | 0x80.

    But the main problem is that it is not clear to me what should we do if
    binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
    comes as a separate change.

    This patch adds 0x80 if ->core_dump() succeeds and the process was not
    killed. But perhaps we can (should?) re-set ->group_exit_code changed by
    SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • prepare_signal() blesses SIGKILL sent to the dumping process but this
    signal can be "lost" anyway. The problems is, complete_signal() sees
    SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the
    dumping process is single-threaded (so the target is always "correct"),
    the group-wide SIGKILL is not recorded in task->pending and thus
    __fatal_signal_pending() won't be true. A multi-threaded case has even
    more problems.

    And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
    right to me. This coredumping process is not exiting yet, it can do a lot
    of work dumping the core.

    With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
    signal->group_exit_task instead. This makes signal_group_exit() true and
    thus this should equally close the races with exit/exec/stop but allows to
    kill the dumping thread reliably.

    Notes:
    - It is not clear what should we do with ->group_exit_code
    if the dumper was killed, see the next change.

    - we need more (hopefully straightforward) changes to ensure
    that SIGKILL actually interrupts the coredump. Basically we
    need to check __fatal_signal_pending() in dump_write() and
    dump_seek().

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • There are 2 well known and ancient problems with coredump/signals, and a
    lot of related bug reports:

    - do_coredump() clears TIF_SIGPENDING but of course this can't help
    if, say, SIGCHLD comes after that.

    In this case the coredump can fail unexpectedly. See for example
    wait_for_dump_helper()->signal_pending() check but there are other
    reasons.

    - At the same time, dumping a huge core on the slow media can take a
    lot of time/resources and there is no way to kill the coredumping
    task reliably. In particular this is not oom_kill-friendly.

    This patch tries to fix the 1st problem, and makes the preparation for the
    next changes.

    We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
    that this process dumps the core. prepare_signal() checks this flag and
    nacks any signal except SIGKILL.

    Note that this check tries to be conservative, in the long term we should
    probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
    discussion. See marc.info/?l=linux-kernel&m=120508897917439

    Notes:
    - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
    The patch assumes that dump_write/etc paths should never
    call it, but we can change it as well.

    - There is another source of TIF_SIGPENDING, freezer. This
    will be addressed separately.

    Signed-off-by: Oleg Nesterov
    Tested-by: Mandeep Singh Baines
    Cc: Ingo Molnar
    Cc: Neil Horman
    Cc: "Rafael J. Wysocki"
    Cc: Roland McGrath
    Cc: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • These are the only users of call_usermodehelper_fns(). This function
    suffers from not being able to determine if the cleanup is called. Even
    if in this places the cleanup pointer is NULL, convert them to use the
    separate call_usermodehelper_setup() + call_usermodehelper_exec()
    functions so we can remove the _fns variant.

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • __hfsplus_ext_write_extent() suppresses errors coming from
    hfs_brec_find(). The patch implements error code propagation.

    Signed-off-by: Alexey Khoroshilov
    Reviewed-by: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
     
  • Use a more current logging style.

    Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    hfsplus now uses "hfsplus: " for all messages.
    Coalesce formats.
    Prefix debugging messages too.

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use a more current logging style.

    Rename macro and uses.
    Add do {} while (0) to macro.
    Add DBG_ to macro.
    Add and use hfs_dbg_cont variant where appropriate.

    Signed-off-by: Joe Perches
    Cc: Vyacheslav Dubeyko
    Cc: Hin-Tak Leung
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
    (1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
    (2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

    [akpm@linux-foundation.org: make the workaround more explicit]
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko