01 Jul, 2011

8 commits

  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
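
The "variable period" arithmetic can be sketched in userspace C. This is only an illustration, assuming a 40-bit counter (the width implied by the 2^40 figure above); `PMC_WIDTH`, `events_until_overflow()`, `first_period()` and `second_period()` are hypothetical names, not kernel or KVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 40-bit counter, matching the 2^40 figure in the message. */
#define PMC_WIDTH 40
#define PMC_MASK  ((UINT64_C(1) << PMC_WIDTH) - 1)

/* Number of events until a counter holding `value` wraps around. */
static uint64_t events_until_overflow(uint64_t value)
{
    return (PMC_MASK + 1) - (value & PMC_MASK);
}

/* Events until the first overflow when the counter is programmed to -X. */
static uint64_t first_period(uint64_t x)
{
    assert(x > 0 && x <= PMC_MASK);
    return events_until_overflow((uint64_t)0 - x);
}

/* After the overflow the counter holds 0, so the next period is the full range. */
static uint64_t second_period(void)
{
    return events_until_overflow(0);
}
```

Under these assumptions the first period is X and the second is the full counter range, which is why a free-running counter cannot provide a fixed sample period without being reprogrammed.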
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
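
A minimal userspace sketch of the idea, assuming a heavily simplified mock of `struct perf_event`: the real overflow handler takes more arguments, and `mock_create_counter()`, `mock_fire()`, `my_overflow()` and `struct my_counter` are hypothetical names used only for illustration. Only the `overflow_handler_context` field name comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-in for the kernel's struct perf_event; only the field named
 * in the commit message is modelled. */
struct perf_event;
typedef void (*overflow_handler_t)(struct perf_event *event);

struct perf_event {
    overflow_handler_t overflow_handler;
    void *overflow_handler_context;   /* caller-supplied, stored at creation */
};

/* Hypothetical caller-side state serviced by a single callback. */
struct my_counter {
    unsigned long overflows;
};

/* Mock of the creation path: stores handler and context in the event. */
static void mock_create_counter(struct perf_event *event,
                                overflow_handler_t handler, void *context)
{
    event->overflow_handler = handler;
    event->overflow_handler_context = context;
}

/* One callback for many events: the context replaces a lookup table. */
static void my_overflow(struct perf_event *event)
{
    struct my_counter *c = event->overflow_handler_context;

    assert(c != NULL);
    c->overflows++;
}

/* Mock of an overflow being delivered to the event's handler. */
static void mock_fire(struct perf_event *event)
{
    event->overflow_handler(event);
}
```

The same `my_overflow()` can be registered for any number of events, each with its own context pointer, which is the scaling problem the message describes.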
     
  • Add a NODE level to the generic cache events, which is used to measure
    local vs. remote memory accesses. As with all other cache events, an
    ACCESS is HIT+MISS; if there is no way to distinguish between reads
    and writes, count reads only, and so on.

    The below needs filling out for !x86 (which I filled out with
    unsupported events).

    I'm fairly sure ARM can leave it like that since it doesn't strike me as
    an architecture that even has NUMA support. SH might have something since
    it does appear to have some NUMA bits.

    Sparc64, PowerPC and MIPS certainly want a good look there since they
    clearly are NUMA capable.

    Signed-off-by: Peter Zijlstra
    Cc: David Miller
    Cc: Anton Blanchard
    Cc: David Daney
    Cc: Deng-Cheng Zhu
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Robert Richter
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This patch improves the code managing the extra shared registers
    used for offcore_response events on Intel Nehalem/Westmere. The
    idea is to use static allocation instead of dynamic allocation.
    This greatly simplifies the get and put constraint routines for
    those events.

    The patch also renames per_core to shared_regs because the same
    data structure gets used whether or not HT is on. When HT is
    off, those events still need coordination because they use
    an extra MSR that has to be shared within an event group.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Since only samples call perf_output_sample(), it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Reorder perf_event_context to remove 8 bytes of 64-bit alignment padding,
    shrinking its size to 192 bytes, which allows it to fit into a smaller
    slab and use one fewer cache line.

    Signed-off-by: Richard Kennedy
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1307460819.1950.5.camel@castor.rsk
    Signed-off-by: Ingo Molnar

    Richard Kennedy
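
The effect of field ordering on padding can be sketched with two hypothetical layouts. These are not the real perf_event_context fields; the structure and field names below are assumptions chosen only to show the mechanism, and the saving depends on the ABI's alignment of 64-bit types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, not the real perf_event_context fields. */
struct ctx_padded {
    uint32_t nr_events;   /* 4 bytes, then padding where uint64_t is 8-aligned */
    uint64_t time;
    uint32_t nr_active;   /* 4 bytes, then tail padding */
};

struct ctx_reordered {
    uint64_t time;
    uint32_t nr_events;   /* the two 32-bit fields now share one 8-byte slot */
    uint32_t nr_active;
};

/* Bytes saved by the reordering on the current ABI (0 if there is none). */
static size_t bytes_saved(void)
{
    assert(sizeof(struct ctx_reordered) <= sizeof(struct ctx_padded));
    return sizeof(struct ctx_padded) - sizeof(struct ctx_reordered);
}
```

On a typical 64-bit ABI the first layout carries two 4-byte holes that the second one closes; tools such as pahole report the same information for real kernel structures.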
     
  • Merge reason: Pick up the latest fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Jun, 2011

8 commits

  • Under heavy memory and filesystem load, users observe the assertion
    mapping->nrpages == 0 in end_writeback() trigger. This can be caused by
    page reclaim reclaiming the last page from a mapping in the following
    race:

    CPU0                                CPU1
      ...
      shrink_page_list()
        __remove_mapping()
          __delete_from_page_cache()
            radix_tree_delete()
                                        evict_inode()
                                          truncate_inode_pages()
                                            truncate_inode_pages_range()
                                              pagevec_lookup() - finds nothing
                                          end_writeback()
                                            mapping->nrpages != 0 -> BUG
            page->mapping = NULL
            mapping->nrpages--

    Fix the problem by doing a reliable check of mapping->nrpages under
    mapping->tree_lock in end_writeback().

    Analyzed by Jay, lost in LKML, and dug out
    by Miklos Szeredi.

    Cc: Jay
    Cc: Miklos Szeredi
    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • This is required for tilegx to be able to use the compat unistd.h header
    where compat_sys_sendmmsg() is now mentioned.

    Signed-off-by: Chris Metcalf
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
    is unsuited to tmpfs, because it inserts a page into pagecache before
    calling the filesystem's ->readpage: tmpfs may have pages in swapcache
    which only it knows how to locate and switch to filecache.

    At present tmpfs provides a ->readpage method, and copes with this by
    copying pages; but soon we can simplify it by removing its ->readpage.
    Provide shmem_read_mapping_page_gfp() now, ready for that transition.

    Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
    with shmem_read_mapping_page() inline for the common mapping_gfp case.

    (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
    read_mapping_page functions use the mapping's ->readpage, and the
    read_cache_page functions use the supplied filler, so I think
    read_cache_page_gfp was slightly misnamed.)

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • 2.6.35's new truncate convention gave tmpfs the opportunity to control
    its file truncation, no longer enforced from outside by vmtruncate().
    We shall want to build upon that, to handle pagecache and swap together.

    Slightly redefine the ->truncate_range interface: let it now be called
    between the unmap_mapping_range()s, with the filesystem responsible for
    doing the truncate_inode_pages_range() from it - just as the filesystem
    is nowadays responsible for doing that from its ->setattr.

    Let's rename shmem_notify_change() to shmem_setattr(). Instead of
    calling the generic truncate_setsize(), bring that code in so we can
    call shmem_truncate_range() - which will later be updated to perform its
    own variant of truncate_inode_pages_range().

    Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
    now that the COW's unmap_mapping_range() comes after ->truncate_range,
    there is no need to call it a third time.

    Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
    that i915_gem_object_truncate() can call it explicitly in future; get
    this patch in first, then update drm/i915 once this is available (until
    then, i915 will just be doing the truncate_inode_pages() twice).

    Though introduced five years ago, no other filesystem is implementing
    ->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
    expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
    whereupon ->truncate_range can be removed from inode_operations -
    shmem_truncate_range() will help i915 across that transition too.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Before adding any more global entry points into shmem.c, gather such
    prototypes into shmem_fs.h. Remove mm's own declarations from swap.h,
    but for now leave the ones in mm.h: because shmem_file_setup() and
    shmem_zero_setup() are called from various places, and we should not
    force other subsystems to update immediately.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix 'make htmldocs' warnings:

    Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
    Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
    Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'

    Signed-off-by: Vitaliy Ivanov
    Acked-by: Grant Likely
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Vitaliy Ivanov
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
    mmc: queue: bring discard_granularity/alignment into line with SCSI
    mmc: queue: append partition subname to queue thread name
    mmc: core: make erase timeout calculation allow for gated clock
    mmc: block: switch card to User Data Area when removing the block driver
    mmc: sdio: reset card during power_restore
    mmc: cb710: fix #ifdef HAVE_EFFICIENT_UNALIGNED_ACCESS
    mmc: sdhi: DMA slave ID 0 is invalid
    mmc: tmio: fix regression in TMIO_MMC_WRPROTECT_DISABLE handling
    mmc: omap_hsmmc: use original sg_len for dma_unmap_sg
    mmc: omap_hsmmc: fix ocr mask usage
    mmc: sdio: fix runtime PM path during driver removal
    mmc: Add PCI fixup quirks for Ricoh 1180:e823 reader
    mmc: sdhi: fix module unloading
    mmc: of_mmc_spi: add NO_IRQ define to of_mmc_spi.c
    mmc: vub300: fix null dereferences in error handling

    Linus Torvalds
     
  • Commit 21a3c96 uses node_start/end_pfn(nid) to detect the start/end
    of nodes. However, these macros are not defined in linux/mmzone.h but
    in /arch/???/include/mmzone.h, which is included only under
    CONFIG_NEED_MULTIPLE_NODES=y.

    Then, we see
    mm/page_cgroup.c: In function 'page_cgroup_init':
    mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
    mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'

    So, fixing page_cgroup.c is an idea...

    But node_start_pfn()/node_end_pfn() are very generic macros and
    should be implemented in the same manner for all archs.
    (m32r has a different implementation...)

    This patch removes the definitions of node_start/end_pfn() in each arch
    and defines a unified one in linux/mmzone.h. It is no longer under
    CONFIG_NEED_MULTIPLE_NODES.

    A result of macro expansion is here (mm/page_cgroup.c)

    for !NUMA
    start_pfn = ((&contig_page_data)->node_start_pfn);
    end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

    for NUMA (x86-64)
    start_pfn = ((node_data[nid])->node_start_pfn);
    end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

    Changelog:
    - fixed to avoid using "nid" twice in node_end_pfn() macro.

    Reported-and-acked-by: Randy Dunlap
    Reported-and-tested-by: Ingo Molnar
    Acked-by: Mel Gorman
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
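
The changelog note about not using "nid" twice can be illustrated with a userspace sketch. It assumes a mock `pg_data_t`, a mock `NODE_DATA()` and made-up node sizes; `next_nid()` is a hypothetical helper whose only purpose is to make argument evaluation visible. The macro relies on GNU C statement expressions, as the expansion shown above does.

```c
#include <assert.h>

/* Mock of the two pg_data_t fields the macros use. */
typedef struct {
    unsigned long node_start_pfn;
    unsigned long node_spanned_pages;
} pg_data_t;

/* Made-up node layout for illustration. */
static pg_data_t node_data[2] = {
    { 0,    1000 },
    { 1000, 500  },
};

/* Mock NODE_DATA(); in the kernel this is provided per configuration. */
#define NODE_DATA(nid) (&node_data[(nid)])

#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)

/* GNU C statement expression: `nid` is evaluated exactly once. */
#define node_end_pfn(nid) ({                                      \
    pg_data_t *__pgdat = NODE_DATA(nid);                          \
    __pgdat->node_start_pfn + __pgdat->node_spanned_pages;        \
})

/* Helper with a visible side effect, used to count evaluations. */
static int nid_calls;
static int next_nid(void)
{
    assert(nid_calls < 2);   /* stay within the mock node_data[] array */
    return nid_calls++;
}
```

A naive definition such as `NODE_DATA(nid)->node_start_pfn + NODE_DATA(nid)->node_spanned_pages` would call `next_nid()` twice and mix fields from two different nodes.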
     

25 Jun, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: Remove unneeded version.h includes from sound/
    ASoC: pxa-ssp: Correct check for stream presence
    ASoC: imx: add missing module informations
    ASoC: imx: Remove unused Kconfig SND_MXC_SOC_SSI entry
    ALSA: HDA: Pinfix quirk for HP Z200 Workstation
    ALSA: VIA HDA: Create a master amplifier control for VT1718S.
    ALSA: VIA HDA: Mute/unmute mixer conncted to Headphone for VT1718S.
    ALSA: VIA HDA: Modify initial verbs list for VT1718S.
    ALSA: hda - Remove ALC268 model override for CPR2000
    ALSA: HDA: Remove quirk for an HP device
    ASoC: Remove unused and about to be broken SND_SOC_CUSTOM I/O bus

    Linus Torvalds
     

24 Jun, 2011

2 commits


23 Jun, 2011

1 commit

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PCI / PM: Block races between runtime PM and system sleep
    PM / Domains: Update documentation
    PM / Runtime: Handle clocks correctly if CONFIG_PM_RUNTIME is unset
    PM: Fix async resume following suspend failure
    PM: Free memory bitmaps if opening /dev/snapshot fails
    PM: Rename dev_pm_info.in_suspend to is_prepared
    PM: Update documentation regarding sysdevs
    PM / Runtime: Update doc: usage count no longer incremented across system PM

    Linus Torvalds
     

22 Jun, 2011

4 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: Fix decode_secinfo_maxsz
    NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
    NFSv4.1: Fix some issues with pnfs_generic_pg_test
    NFSv4.1: file layout must consider pg_bsize for coalescing
    pnfs-obj: No longer needed to take an extra ref at add_device
    SUNRPC: Ensure the RPC client only quits on fatal signals
    NFSv4: Fix a readdir regression
    nfs4.1: mark layout as bad on error path in _pnfs_return_layout
    nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
    NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
    NFS: (d)printks should use %zd for ssize_t arguments
    NFSv4.1: fix break condition in pnfs_find_lseg
    nfs4.1: fix several problems with _pnfs_return_layout
    NFSv4.1: allow zero fh array in filelayout decode layout
    NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
    NFSv4.1: Fix a refcounting issue in the pNFS device id cache
    NFSv4.1: deprecate headerpadsz in CREATE_SESSION
    NFS41: do not update isize if inode needs layoutcommit
    NLM: Don't hang forever on NLM unlock requests
    NFS: fix umount of pnfs filesystems

    Linus Torvalds
     
  • The PM core doesn't handle suspend failures correctly when it comes to
    asynchronously suspended devices. These devices are moved onto the
    dpm_suspended_list as soon as the corresponding async thread is
    started up, and they remain on the list even if they fail to suspend
    or the sleep transition is cancelled before they get suspended. As a
    result, when the PM core unwinds the transition, it tries to resume
    the devices even though they were never suspended.

    This patch (as1474) fixes the problem by adding a new "is_suspended"
    flag to dev_pm_info. Devices are resumed only if the flag is set.

    [rjw:
    * Moved the dev->power.is_suspended check into device_resume(),
    because we need to complete dev->power.completion and clear
    dev->power.is_prepared too for devices whose
    dev->power.is_suspended flags are unset.
    * Fixed __device_suspend() to avoid setting dev->power.is_suspended
    if async_error is different from zero.]

    Signed-off-by: Alan Stern
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Alan Stern
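
A simplified model of the fix, assuming a mock device structure and a synchronous loop in place of the PM core's lists and async threads. Only the `is_suspended` flag name comes from the message; `struct mock_device`, `suspend_error`, `resumed`, `mock_suspend_all()` and `mock_resume_all()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the relevant per-device PM state. */
struct mock_device {
    bool is_suspended;   /* set only once suspend actually succeeded */
    int  suspend_error;  /* hypothetical: non-zero makes suspend fail */
    int  resumed;        /* counts resume callbacks run (illustration only) */
};

/* Suspend devices in order and stop at the first failure.
 * Returns the number of devices that were queued (attempted). */
static size_t mock_suspend_all(struct mock_device *devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (devs[i].suspend_error)
            return i + 1;            /* queued, but never suspended */
        devs[i].is_suspended = true;
    }
    return n;
}

/* Unwind: walk every queued device, resume only those really suspended. */
static void mock_resume_all(struct mock_device *devs, size_t queued)
{
    size_t i;

    assert(devs != NULL || queued == 0);
    for (i = 0; i < queued; i++) {
        if (!devs[i].is_suspended)
            continue;                /* never suspended: skip the callback */
        devs[i].resumed++;
        devs[i].is_suspended = false;
    }
}
```

In this sketch the device that failed to suspend is still on the queued list during unwinding, but its resume callback is skipped because the flag was never set.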
     
  • This patch (as1473) renames the "in_suspend" field in struct
    dev_pm_info to "is_prepared", in preparation for an upcoming change.
    The new name is more descriptive of what the field really means.

    Signed-off-by: Alan Stern
    Signed-off-by: Rafael J. Wysocki
    Cc: stable@kernel.org

    Alan Stern
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: Fix oops in jbd2_journal_remove_journal_head()
    jbd2: Remove obsolete parameters in the comments for some jbd2 functions
    ext4: fixed tracepoints cleanup
    ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
    ext4: Fix max file size and logical block counting of extent format file
    ext4: correct comments for ext4_free_blocks()

    Linus Torvalds
     

21 Jun, 2011

5 commits

  • Commit 13e12d14e2dc ("vfs: reorganize 'struct inode' layout a bit")
    moved things around a bit and changed i_state to be unsigned int instead
    of unsigned long. That was to help structure layout for the 64-bit case,
    and shrink 'struct inode' a bit (admittedly that only happened when
    spinlock debugging was on and i_flags didn't pack with i_lock).

    However, Meelis Roos reports that this results in unaligned exceptions
    on sparc, and it turns out that the bit-locking primitives that we use
    for the I_NEW bit want to use the bitops. Which want 'unsigned long',
    not 'unsigned int'.

    We really should fix the bit locking code to not have that kind of
    requirement, but that's a much bigger change. So for now, revert that
    field back to 'unsigned long' (but keep the other re-ordering changes
    from the commit that caused this).

    Andi points out that we have played games with this in 'struct page', so
    it's solvable with other hacks too, but since right now the struct inode
    size advantage only happens with some rare config options, it's not
    worth fighting.

    It _would_ be worth fixing the bitlocking code, though. Especially
    since there is no type safety in the bitlocking code (this never caused
    any warnings, and worked fine on x86-64, because the bitlocks take a
    'void *' and x86-64 doesn't care that deeply about alignment). So it's
    currently a very easy problem to trigger by mistake and never notice.

    Reported-by: Meelis Roos
    Cc: Andi Kleen
    Cc: David Miller
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
    nfsd4: fix break_lease flags on nfsd open
    nfsd: link returns nfserr_delay when breaking lease
    nfsd: v4 support requires CRYPTO
    nfsd: fix dependency of nfsd on auth_rpcgss

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
    pxa168_eth: fix race in transmit path.
    ipv4, ping: Remove duplicate icmp.h include
    netxen: fix race in skb->len access
    sgi-xp: fix a use after free
    hp100: fix an skb->len race
    netpoll: copy dev name of slaves to struct netpoll
    ipv4: fix multicast losses
    r8169: fix static initializers.
    inet_diag: fix inet_diag_bc_audit()
    gigaset: call module_put before restart of if_open()
    farsync: add module_put to error path in fst_open()
    net: rfs: enable RFS before first data packet is received
    fs_enet: fix freescale FCC ethernet dp buffer alignment
    netdev: bfin_mac: fix memory leak when freeing dma descriptors
    vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
    caif: Bugfix - XOFF removed channel from caif-mux
    tun: teach the tun/tap driver to support netpoll
    dp83640: drop PHY status frames in the driver.
    dp83640: fix phy status frame event parsing
    phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
    fix comment in generic_permission()
    kill obsolete comment for follow_down()
    proc_sys_permission() is OK in RCU mode
    reiserfs_permission() doesn't need to bail out in RCU mode
    proc_fd_permission() is doesn't need to bail out in RCU mode
    nilfs2_permission() doesn't need to bail out in RCU mode
    logfs doesn't need ->permission() at all
    coda_ioctl_permission() is safe in RCU mode
    cifs_permission() doesn't need to bail out in RCU mode
    bad_inode_permission() is safe from RCU mode
    ubifs: dereferencing an ERR_PTR in ubifs_mount()

    Linus Torvalds
     
  • Otherwise we end up overflowing the rpc buffer size on the receive end.

    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Benny Halevy
     

20 Jun, 2011

4 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: sh_keysc - 8x8 MODE_6 fix
    Input: omap-keypad - add missing input_sync()
    Input: evdev - try to wake up readers only if we have full packet
    Input: properly assign return value of clamp() macro.

    Linus Torvalds
     
  • inode_permission() calls devcgroup_inode_permission() and almost all such
    calls are _not_ for device nodes; let's at least keep the common path
    straight...

    Signed-off-by: Al Viro

    Al Viro
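
The shape of such an inlined wrapper can be sketched in userspace C. This is an assumption about the general pattern, not the kernel's code: `mock_devcgroup_check()`, `mock_devcgroup_inode_permission()` and `slow_path_calls` are hypothetical, and the out-of-line check here always allows access.

```c
#define _XOPEN_SOURCE 700   /* for the S_IF* constants under strict ISO modes */
#include <assert.h>
#include <sys/stat.h>

static int slow_path_calls;   /* instrumentation for this sketch only */

/* Stand-in for the out-of-line device cgroup check (always allows here). */
static int mock_devcgroup_check(mode_t mode, int mask)
{
    (void)mask;
    assert(S_ISBLK(mode) || S_ISCHR(mode));   /* only device nodes get here */
    slow_path_calls++;
    return 0;
}

/* Inlined wrapper: non-device inodes return without a function call. */
static inline int mock_devcgroup_inode_permission(mode_t mode, int mask)
{
    if (!S_ISBLK(mode) && !S_ISCHR(mode))
        return 0;
    return mock_devcgroup_check(mode, mask);
}
```

Regular files and directories, which make up almost all permission checks, never leave the inlined test in this arrangement.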
     
  • Add REQ_SECURE flag to REQ_COMMON_MASK so that
    init_request_from_bio() can pass it to @req->cmd_flags.

    Signed-off-by: Namhyung Kim
    Acked-by: Adrian Hunter
    Cc: stable@kernel.org # 2.6.36 and newer
    Signed-off-by: Jens Axboe

    Namhyung Kim
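
Why the mask matters can be shown with a small sketch. The bit positions below are made up, `REQ_PRIVATE` is a hypothetical flag, and `inherit_flags()` is a stand-in for the flag copy that the message attributes to init_request_from_bio(), not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical bit positions; the kernel's real values differ. */
#define REQ_WRITE    (1u << 0)
#define REQ_DISCARD  (1u << 1)
#define REQ_SECURE   (1u << 2)
#define REQ_PRIVATE  (1u << 3)   /* hypothetical bio-only flag, never copied */

#define REQ_ALL_KNOWN (REQ_WRITE | REQ_DISCARD | REQ_SECURE | REQ_PRIVATE)

/* The common mask before and after the change, reduced to these mock flags. */
#define REQ_COMMON_MASK_OLD (REQ_WRITE | REQ_DISCARD)
#define REQ_COMMON_MASK_NEW (REQ_WRITE | REQ_DISCARD | REQ_SECURE)

/* Copy only the flags selected by `mask` from the bio into the request. */
static unsigned int inherit_flags(unsigned int cmd_flags,
                                  unsigned int bio_flags, unsigned int mask)
{
    assert((bio_flags & ~REQ_ALL_KNOWN) == 0);
    return cmd_flags | (bio_flags & mask);
}
```

With the old mask a secure discard silently degrades to a plain discard by the time it reaches the request; with the new mask the flag survives the copy.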
     
  • …-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tools/perf: Fix static build of perf tool
    tracing: Fix regression in printk_formats file

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: Make watchdog robust vs. interruption
    timerfd: Fix wakeup of processes when timer is cancelled on clock change

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, MAINTAINERS: Add x86 MCE people
    x86, efi: Do not reserve boot services regions within reserved areas

    Linus Torvalds
     

19 Jun, 2011

2 commits


18 Jun, 2011

3 commits

  • According to the data sheet for G4, AP4 and AG5, KEYSC MODE_6 is 8x8 keys.
    Bump up MAXKEYS to 64 too.

    Signed-off-by: Magnus Damm
    Reviewed-by: Simon Horman
    Signed-off-by: Dmitry Torokhov

    Magnus Damm
     
  • * 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
    gpio: add GPIOF_ values regardless on kconfig settings
    gpio: include linux/gpio.h where needed
    gpio/omap4: Fix missing interrupts during device wakeup due to IOPAD.

    * 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
    spi/bfin_spi: fix handling of default bits per word setting

    Linus Torvalds
     
  • ____call_usermodehelper() now erases any credentials set by the
    subprocess_info::init() function. The problem is that commit
    17f60a7da150 ("capabilites: allow the application of capability limits
    to usermode helpers") creates and commits new credentials with
    prepare_kernel_cred() after the call to the init() function. This wipes
    all keyrings after umh_keys_init() is called.

    The best way to deal with this is to put the init() call just prior to
    the commit_creds() call, and pass the cred pointer to init(). That
    means that umh_keys_init() and suchlike can modify the credentials
    _before_ they are published and potentially in use by the rest of the
    system.

    Without this fix, request_key() cannot work: it is prevented from passing
    the session keyring it set up with the authorisation token to
    /sbin/request-key, and so the latter can't assume the authority to
    instantiate the key. This causes the in-kernel DNS resolver to fail
    with ENOKEY unconditionally.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
     

17 Jun, 2011

2 commits

  • kdump (the 2nd kernel) sometimes hangs due to a pending IPI from the
    1st kernel. The kernel panics because the IPI arrives before
    call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
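
The ordering problem can be modelled in userspace C. This sketch assumes a single zero-initialised list head in place of the per-CPU call_single_queue; every `mock_` name is hypothetical, and the assertion stands in for the NULL pointer dereference described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Zero-initialised, like the queue before its init function has run. */
static struct list_head mock_call_single_queue;

/* Stand-in for call_function_init(): make the list head point to itself. */
static void mock_call_function_init(void)
{
    mock_call_single_queue.next = &mock_call_single_queue;
    mock_call_single_queue.prev = &mock_call_single_queue;
}

/* True when an IPI handler could safely walk the queue. */
static bool mock_queue_usable(void)
{
    return mock_call_single_queue.next != NULL;
}

/* Mock IPI handler: returns the number of queued entries it handled. */
static int mock_ipi_handler(void)
{
    int handled = 0;
    struct list_head *pos;

    assert(mock_queue_usable());  /* the kernel would dereference NULL here */
    for (pos = mock_call_single_queue.next;
         pos != &mock_call_single_queue; pos = pos->next)
        handled++;
    return handled;
}
```

Calling the init function before anything can deliver an interrupt makes a stale IPI harmless: the handler finds an empty, well-formed list instead of NULL pointers.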
     
  • David S. Miller