28 Jul, 2011

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
    tg3: Remove 5719 jumbo frames and TSO blocks
    tg3: Break larger frags into 4k chunks for 5719
    tg3: Add tx BD budgeting code
    tg3: Consolidate code that calls tg3_tx_set_bd()
    tg3: Add partial fragment unmapping code
    tg3: Generalize tg3_skb_error_unmap()
    tg3: Remove short DMA check for 1st fragment
    tg3: Simplify tx bd assignments
    tg3: Reintroduce tg3_tx_ring_info
    ASIX: Use only 11 bits of header for data size
    ASIX: Simplify condition in rx_fixup()
    Fix cdc-phonet build
    bonding: reduce noise during init
    bonding: fix string comparison errors
    net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
    net: add IFF_SKB_TX_SHARED flag to priv_flags
    net: sock_sendmsg_nosec() is static
    forcedeth: fix vlans
    gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
    gro: Only reset frag0 when skb can be pulled
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://neil.brown.name/md: (75 commits)
    md/raid10: handle further errors during fix_read_error better.
    md/raid10: Handle read errors during recovery better.
    md/raid10: simplify read error handling during recovery.
    md/raid10: record bad blocks due to write errors during resync/recovery.
    md/raid10: attempt to fix read errors during resync/check
    md/raid10: Handle write errors by updating badblock log.
    md/raid10: clear bad-block record when write succeeds.
    md/raid10: avoid writing to known bad blocks on known bad drives.
    md/raid10 record bad blocks as needed during recovery.
    md/raid10: avoid reading known bad blocks during resync/recovery.
    md/raid10 - avoid reading from known bad blocks - part 3
    md/raid10: avoid reading from known bad blocks - part 2
    md/raid10: avoid reading from known bad blocks - part 1
    md/raid10: Split handle_read_error out from raid10d.
    md/raid10: simplify/reindent some loops.
    md/raid5: Clear bad blocks on successful write.
    md/raid5. Don't write to known bad block on doubtful devices.
    md/raid5: write errors should be recorded as bad blocks if possible.
    md/raid5: use bad-block log to improve handling of uncorrectable read errors.
    md/raid5: avoid reading from known bad blocks.
    ...

    Linus Torvalds
     
  • Pktgen attempts to transmit shared skbs to net devices, which can't be used by
    some drivers as they keep state information in skbs. This patch adds a flag
    marking drivers as being able to handle shared skbs in their tx path. Drivers
    are defaulted to being unable to do so, but calling ether_setup enables this
    flag, as 90% of the drivers calling ether_setup touch real hardware and can
    handle shared skbs. A subsequent patch will audit drivers to ensure that the
    flag is set properly.

    Signed-off-by: Neil Horman
    Reported-by: Jiri Pirko
    CC: Robert Olsson
    CC: Eric Dumazet
    CC: Alexey Dobriyan
    CC: David S. Miller
    Signed-off-by: David S. Miller

    Neil Horman
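
The opt-in scheme described above can be sketched in userspace C. This is a minimal model, not kernel code: `struct demo_net_device`, the `demo_*` helpers, and the flag value `DEMO_IFF_TX_SKB_SHARING` are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real value lives in the kernel headers. */
#define DEMO_IFF_TX_SKB_SHARING 0x1u

/* Minimal stand-in for a network device: only the private flags matter here. */
struct demo_net_device {
    unsigned int priv_flags;
};

/* Generic setup: devices start out unable to handle shared skbs. */
void demo_netdev_init(struct demo_net_device *dev)
{
    assert(dev != NULL);
    dev->priv_flags = 0;
}

/* Ethernet-style setup opts in, as the message says ether_setup does. */
void demo_ether_setup(struct demo_net_device *dev)
{
    demo_netdev_init(dev);
    dev->priv_flags |= DEMO_IFF_TX_SKB_SHARING;
}

/* A driver that keeps state in the skb clears the flag again after setup. */
void demo_driver_keeps_skb_state(struct demo_net_device *dev)
{
    dev->priv_flags &= ~DEMO_IFF_TX_SKB_SHARING;
}

/* A pktgen-like sender checks the flag before reusing one skb for many sends. */
bool demo_can_send_shared(const struct demo_net_device *dev)
{
    return (dev->priv_flags & DEMO_IFF_TX_SKB_SHARING) != 0;
}
```

In this model a sender would call `demo_can_send_shared()` and fall back to one skb per transmit when it returns false.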
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (54 commits)
    tpm_nsc: Fix bug when loading multiple TPM drivers
    tpm: Move tpm_tis_reenable_interrupts out of CONFIG_PNP block
    tpm: Fix compilation warning when CONFIG_PNP is not defined
    TOMOYO: Update kernel-doc.
    tpm: Fix a typo
    tpm_tis: Probing function for Intel iTPM bug
    tpm_tis: Fix the probing for interrupts
    tpm_tis: Delay ACPI S3 suspend while the TPM is busy
    tpm_tis: Re-enable interrupts upon (S3) resume
    tpm: Fix display of data in pubek sysfs entry
    tpm_tis: Add timeouts sysfs entry
    tpm: Adjust interface timeouts if they are too small
    tpm: Use interface timeouts returned from the TPM
    tpm_tis: Introduce durations sysfs entry
    tpm: Adjust the durations if they are too small
    tpm: Use durations returned from TPM
    TOMOYO: Enable conditional ACL.
    TOMOYO: Allow using argv[]/envp[] of execve() as conditions.
    TOMOYO: Allow using executable's realpath and symlink's target as conditions.
    TOMOYO: Allow using owner/group etc. of file objects as conditions.
    ...

    Fix up trivial conflict in security/tomoyo/realpath.c

    Linus Torvalds
     
  • Space must have been allocated when array was created.
    A feature flag is set when the badblock list is non-empty, to
    ensure old kernels don't load and trust the whole device.

    We only update the on-disk badblocklist when it has changed.
    If the badblocklist (or other metadata) is stored on a bad block, we
    don't cope very well.

    If metadata has no room for bad block, flag bad-blocks as disabled,
    and do the same for 0.90 metadata.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
    NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
    nfs: don't use d_move in nfs_async_rename_done
    RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
    SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
    SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
    SUNRPC: Support dynamic slot allocation for TCP connections
    SUNRPC: Clean up the slot table allocation
    SUNRPC: Initalise the struct xprt upon allocation
    SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
    pnfs: simplify pnfs files module autoloading
    nfs: document nfsv4 sillyrename issues
    NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
    SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
    SUNRPC: sunrpc should not explicitly depend on NFS config options
    NFS: Clean up - simplify the switch to read/write-through-MDS
    NFS: Move the pnfs write code into pnfs.c
    NFS: Move the pnfs read code into pnfs.c
    NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
    NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
    NFS: Cache rpc_ops in struct nfs_pageio_descriptor
    ...

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
    kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
    iscsi-target: Add iSCSI fabric support for target v4.1
    iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
    iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
    iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi

    Linus Torvalds
     
  • Since __proc_create() appends the name it is given to the end of the PDE
    structure that it allocates, there isn't a need to store a name pointer.
    Instead we can just replace the name pointer with a terminal char array of
    _unspecified_ length. The compiler will simply append the string to statically
    defined variables of PDE type overlapping any hole at the end of the structure
    and, unlike specifying an explicitly _zero_ length array, won't give a warning
    if you try to statically initialise it with a string of more than zero length.

    Also, whilst we're at it:

    (1) Move namelen to end just prior to name and reduce it to a single byte
    (name shouldn't be longer than NAME_MAX).

    (2) Move pde_unload_lock two places further on so that if it's four bytes in
    size on a 64-bit machine, it won't cause an unused hole in the PDE struct.

    Signed-off-by: David Howells
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    David Howells
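
A trailing array of unspecified length is what C99 calls a flexible array member. The sketch below shows the single-allocation pattern with a one-byte length field. `struct demo_entry` and `demo_entry_create()` are hypothetical and much smaller than the real PDE structure; the 255-byte limit is an assumption that matches what fits in an `unsigned char`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down entry; not the kernel's PDE layout. */
struct demo_entry {
    unsigned int mode;
    unsigned char namelen;  /* one byte is enough while names stay <= 255 bytes */
    char name[];            /* flexible array member: storage follows the struct */
};

/* Allocate the struct and its name in one block, so no name pointer is needed. */
struct demo_entry *demo_entry_create(const char *name, unsigned int mode)
{
    size_t len;
    struct demo_entry *e;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > 255)
        return NULL;                 /* would not fit the single-byte length */

    e = malloc(sizeof(*e) + len + 1);
    if (e == NULL)
        return NULL;

    e->mode = mode;
    e->namelen = (unsigned char)len;
    memcpy(e->name, name, len + 1);  /* copy including the terminating NUL */
    return e;
}
```

The caller releases the whole entry, name included, with a single `free()`.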
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (22 commits)
    ALSA: hda - Cirrus Logic CS421x support
    ALSA: Make pcm.h self-contained
    ALSA: hda - Allow codec-specific set_power_state ops
    ALSA: hda - Add post_suspend patch ops
    ALSA: hda - Make CONFIG_SND_HDA_POWER_SAVE depending on CONFIG_PM
    ALSA: hda - Make sure mute led reflects master mute state
    ALSA: hda - Fix invalid mute led state on resume of IDT codecs
    ASoC: Revert "ASoC: SAMSUNG: Add I2S0 internal dma driver"
    ALSA: hda - Add support of the 4 internal speakers on certain HP laptops
    ALSA: Make snd_pcm_debug_name usable outside pcm_lib
    ALSA: hda - Fix DAC filling for multi-connection pins in Realtek parser
    ASoC: dapm - Add methods to retrieve snd_card and soc_card from dapm context.
    ASoC: SAMSUNG: Add I2S0 internal dma driver
    ASoC: SAMSUNG: Modify I2S driver to support idma
    ASoC: davinci: add missing break statement
    ASoC: davinci: fix codec start and stop functions
    ASoC: dapm - add DAPM macro for external enum widgets
    ASoC: Acknowledge WM8962 interrupts before acting on them
    ASoC: sgtl5000: guide user when regulator support is needed
    ASoC: sgtl5000: refactor registering internal ldo
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (53 commits)
    Input: synaptics - fix reporting of min coordinates
    Input: tegra-kbc - enable key autorepeat
    Input: kxtj9 - fix locking typo in kxtj9_set_poll()
    Input: kxtj9 - fix bug in probe()
    Input: intel-mid-touch - remove pointless checking for variable 'found'
    Input: hp_sdc - staticize hp_sdc_kicker()
    Input: pmic8xxx-keypad - fix a leak of the IRQ during init failure
    Input: cy8ctmg110_ts - set reset_pin and irq_pin from platform data
    Input: cy8ctmg110_ts - constify i2c_device_id table
    Input: cy8ctmg110_ts - fix checking return value of i2c_master_send
    Input: lifebook - make dmi callback functions return 1
    Input: atkbd - make dmi callback functions return 1
    Input: gpio_keys - switch to using SIMPLE_DEV_PM_OPS
    Input: gpio_keys - add support for device-tree platform data
    Input: aiptek - remove double define
    Input: synaptics - set minimum coordinates as reported by firmware
    Input: synaptics - process button bits in AGM packets
    Input: synaptics - rename set_slot to be more descriptive
    Input: synaptics - fuzz position for touchpad with reduced filtering
    Input: synaptics - set resolution for MT_POSITION_X/Y axes
    ...

    Linus Torvalds
     

27 Jul, 2011

30 commits

  • Currently skb_gro_header_slow unconditionally resets frag0 and
    frag0_len. However, when we can't pull on the skb this leaves
    the GRO fields in an inconsistent state.

    This patch fixes this by only resetting those fields after the
    pskb_may_pull test.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
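
The ordering fix can be modelled outside the kernel as follows. This is a sketch under stated assumptions: `struct demo_gro_skb`, `demo_may_pull()` and `demo_gro_header_slow()` are hypothetical stand-ins for the skb GRO state, pskb_may_pull and skb_gro_header_slow, and the pull itself is reduced to a length check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the GRO state kept alongside an skb. */
struct demo_gro_skb {
    unsigned char *data;         /* linear data area */
    size_t avail;                /* bytes that a pull could make linear */
    const unsigned char *frag0;  /* fast-path pointer into the first fragment */
    size_t frag0_len;            /* bytes reachable through frag0 */
};

/* Stand-in for the pull test: succeeds only if enough bytes exist. */
int demo_may_pull(const struct demo_gro_skb *skb, size_t hlen)
{
    return hlen <= skb->avail;
}

/* Slow-path header access: drop the frag0 fast path only after the pull test passes. */
unsigned char *demo_gro_header_slow(struct demo_gro_skb *skb, size_t hlen)
{
    assert(skb != NULL);
    if (!demo_may_pull(skb, hlen))
        return NULL;             /* failure: frag0 and frag0_len stay untouched */

    skb->frag0 = NULL;
    skb->frag0_len = 0;
    return skb->data;
}
```

With the reset placed after the test, a failed pull leaves the fast-path fields exactly as they were.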
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    merge fchmod() and fchmodat() guts, kill ancient broken kludge
    xfs: fix misspelled S_IS...()
    xfs: get rid of open-coded S_ISREG(), etc.
    vfs: document locking requirements for d_move, __d_move and d_materialise_unique
    omfs: fix (mode & S_IFDIR) abuse
    btrfs: S_ISREG(mode) is not mode & S_IFREG...
    ima: fmode_t misspelled as mode_t...
    pci-label.c: size_t misspelled as mode_t
    jffs2: S_ISLNK(mode & S_IFMT) is pointless
    snd_msnd ->mode is fmode_t, not mode_t
    v9fs_iop_get_acl: get rid of unused variable
    vfs: dont chain pipe/anon/socket on superblock s_inodes list
    Documentation: Exporting: update description of d_splice_alias
    fs: add missing unlock in default_llseek()

    Linus Torvalds
     
  • * 'next/devel2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: (47 commits)
    OMAP: Add debugfs node to show the summary of all clocks
    OMAP2+: hwmod: Follow the recommended PRCM module enable sequence
    OMAP2+: clock: allow per-SoC clock init code to prevent clockdomain calls from clock code
    OMAP2+: clockdomain: Add per clkdm lock to prevent concurrent state programming
    OMAP2+: PM: idle clkdms only if already in idle
    OMAP2+: clockdomain: add clkdm_in_hwsup()
    OMAP2+: clockdomain: Add 2 APIs to control clockdomain from hwmod framework
    OMAP: clockdomain: Remove redundant call to pwrdm_wait_transition()
    OMAP4: hwmod: Introduce the module control in hwmod control
    OMAP4: cm: Add two new APIs for modulemode control
    OMAP4: hwmod data: Add modulemode entry in omap_hwmod structure
    OMAP4: hwmod data: Add PRM context register offset
    OMAP4: prm: Remove deprecated functions
    OMAP4: prm: Replace warm reset API with the offset based version
    OMAP4: hwmod: Replace RSTCTRL absolute address with offset macros
    OMAP: hwmod: Wait the idle status to be disabled
    OMAP4: hwmod: Replace CLKCTRL absolute address with offset macros
    OMAP2+: hwmod: Init clkdm field at boot time
    OMAP4: hwmod data: Add clock domain attribute
    OMAP4: clock data: Add missing divider selection for auxclks
    ...

    Linus Torvalds
     
  • * 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
    ARM: Consolidate the clkdev header files
    ARM: set vga memory base at run-time
    ARM: convert PCI defines to variables
    ARM: pci: make pcibios_assign_all_busses use pci_has_flag
    ARM: remove unnecessary mach/hardware.h includes
    pci: move microblaze and powerpc pci flag functions into asm-generic
    powerpc: rename ppc_pci_*_flags to pci_*_flags

    Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h

    Linus Torvalds
     
  • * 'next/fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: (24 commits)
    ASoC: omap: McBSP: fix build breakage on OMAP1
    OMAP: hwmod: fix the i2c-reset timeout during bootup
    I2C: OMAP2+: add correct functionality flags to all omap2plus i2c dev_attr
    I2C: OMAP2+: Tag all OMAP2+ hwmod defintions with I2C IP revision
    I2C: OMAP1/OMAP2+: create omap I2C functionality flags for each cpu_... test
    I2C: OMAP2+: Introduce I2C IP versioning constants
    I2C: OMAP2+: increase omap_i2c_dev_attr flags from u8 to u32
    I2C: OMAP2+: Set hwmod flags to only allow 16-bit accesses to i2c
    OMAP4: hwmod data: Change DSS main_clk scheme
    OMAP4: powerdomain data: Remove unsupported MPU powerdomain state
    OMAP4: clock data: Keep GPMC clocks always enabled and hardware managed
    OMAP4: powerdomain data: Fix core mem states and missing cefuse flag
    OMAP2+: PM: Initialise sleep_switch to a non-valid value
    OMAP4: hwmod data: Modify DSS opt clocks
    OMAP4: iommu: fix clock name
    omap: iovmm: s/sg_dma_len(sg)/sg->length/
    omap: iommu: fix pte programming
    arm: omap3: cm-t35: fix slow path warning
    arm: omap3: cm-t35: minor comments fixes
    omap: ZOOM: QUART: Request reset GPIO
    ...

    Linus Torvalds
     
  • Only a few core funcs need to be implemented for SMP systems, so allow the
    arches to override them while getting the rest for free.

    At least, this is enough to allow the Blackfin SMP port to use things.

    Signed-off-by: Mike Frysinger
    Cc: Arun Sharma
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • Since arches are expected to implement this guy, add a common version for
    people the same way as atomic_clear_mask is handled.

    Signed-off-by: Mike Frysinger
    Cc: Arun Sharma
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • The atomic helpers are supposed to take an atomic_t pointer, not a random
    unsigned long pointer. So convert atomic_clear_mask over.

    While we're here, also add some nice documentation to the func.

    Signed-off-by: Mike Frysinger
    Cc: Arun Sharma
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
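
As a userspace illustration of a mask-clearing helper that takes a dedicated atomic type rather than a bare integer pointer, here is a minimal sketch using C11 atomics. `demo_atomic_t` and the `demo_*` functions are hypothetical; the argument order is an assumption, and the kernel's own implementation is architecture-specific.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for an atomic counter type, built on C11 atomics. */
typedef struct {
    atomic_uint counter;
} demo_atomic_t;

/* Atomically clear every bit of *v that is set in mask. */
void demo_atomic_clear_mask(unsigned int mask, demo_atomic_t *v)
{
    assert(v != NULL);
    atomic_fetch_and(&v->counter, ~mask);
}

/* Read the current value. */
unsigned int demo_atomic_read(demo_atomic_t *v)
{
    return atomic_load(&v->counter);
}
```

Because the parameter is a `demo_atomic_t *`, passing a plain `unsigned long *` draws a compiler diagnostic, which is the type-safety point the commit message makes.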
     
  • We already declared inc/dec helpers, so we don't need to call the
    atomic_{add,sub}_return funcs directly.

    Signed-off-by: Mike Frysinger
    Cc: Arun Sharma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • This clarifies the differences between and

    Signed-off-by: Arun Sharma
    Suggested-by: Mike Frysinger
    Cc: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • After changing all consumers of atomics to include <linux/atomic.h>, we
    ran into some compile time errors due to this dependency chain:

    linux/atomic.h
    -> asm/atomic.h
    -> asm-generic/atomic-long.h

    where atomic-long.h could use funcs defined later in linux/atomic.h
    without a prototype. This patch moves the code that includes
    asm-generic/atomic*.h to linux/atomic.h.

    Archs that need <asm-generic/atomic64.h> need to select
    CONFIG_GENERIC_ATOMIC64 from now on (some of them used to include it
    unconditionally).

    Compile tested on i386 and x86_64 with allnoconfig.

    Signed-off-by: Arun Sharma
    Cc: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • This is in preparation for more generic atomic primitives based on
    __atomic_add_unless.

    Signed-off-by: Arun Sharma
    Signed-off-by: Hans-Christian Egtvedt
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     
  • The majority of architectures implement ext2 atomic bitops as
    test_and_{set,clear}_bit() without spinlock.

    This adds this type of generic implementation in ext2-atomic-setbit.h and
    uses it wherever possible.

    Signed-off-by: Akinobu Mita
    Suggested-by: Andreas Dilger
    Suggested-by: Arnd Bergmann
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
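
A lock-free bitop of the kind described can be sketched with C11 atomics. The `demo_*` names are hypothetical, the lock argument is kept only to show that it goes unused, and the sketch numbers bits in native word order, ignoring the little-endian bit numbering that an on-disk format would need.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr and return its previous value (0 or 1). */
int demo_test_and_set_bit(size_t nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_WORD);
    unsigned long old;

    assert(addr != NULL);
    old = atomic_fetch_or(&addr[nr / DEMO_BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically clear bit nr and return its previous value (0 or 1). */
int demo_test_and_clear_bit(size_t nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_WORD);
    unsigned long old;

    assert(addr != NULL);
    old = atomic_fetch_and(&addr[nr / DEMO_BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}

/* ext2-style wrappers: the lock parameter is accepted but never taken. */
int demo_ext2_set_bit_atomic(void *lock, size_t nr, atomic_ulong *addr)
{
    (void)lock;
    return demo_test_and_set_bit(nr, addr);
}

int demo_ext2_clear_bit_atomic(void *lock, size_t nr, atomic_ulong *addr)
{
    (void)lock;
    return demo_test_and_clear_bit(nr, addr);
}
```

The read-modify-write is a single atomic operation, so no spinlock is needed to make the test and the update indivisible.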
     
  • Use debugfs_remove_recursive() to simplify initialization and
    deinitialization of fault injection debugfs files.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • should_fail_srandom() does not exist.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The size of the dump is currently set using the RECORD_SIZE macro which
    is set to a page size. This patch makes the record size a module
    parameter and allows it to be set through platform data as well to allow
    larger dumps if needed.

    Signed-off-by: Sergiu Iordache
    Acked-by: Marco Stornelli
    Cc: "Ahmed S. Darwish"
    Cc: Artem Bityutskiy
    Cc: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergiu Iordache
     
  • The platform driver currently allows setting the mem_size and
    mem_address.

    Since dump_oops is also a module parameter it would be more consistent if
    it could be set through platform data as well.

    Signed-off-by: Sergiu Iordache
    Acked-by: Marco Stornelli
    Cc: "Ahmed S. Darwish"
    Cc: Artem Bityutskiy
    Cc: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergiu Iordache
     
  • Don't force output if you intend to reboot immediately.

    In this patch, I'm disabling the functionality enabled by
    vc->vc_panic_force_write if panic_timeout < 0 (i.e. no timeout).
    vc_panic_force_write is only enabled for fb video consoles if the
    FBINFO_CAN_FORCE_OUTPUT flag is set.

    For our application, we're using ram_oops to preserve the panic in
    memory. We want to reliably, and as fast as possible, machine_restart.
    The vc_panic_force_write flag results in a bunch of graphics driver code
    being invoked, which slows down restart and decreases reliability. Since
    we're already storing the panic in RAM and are going to reboot
    immediately, there is no benefit in mode switching back to the vc in
    order to display the panic output. The log buffer will get flushed by
    the console_unblank() call so remote management consoles should see all
    output.

    Signed-off-by: Mandeep Singh Baines
    Cc: Huang Ying
    Cc: Andi Kleen
    Cc: Hugh Dickins
    Cc: Olaf Hering
    Cc: Jesse Barnes
    Cc: Dave Airlie
    Cc: Greg Kroah-Hartman
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • git grep shows there are no users in tree, so we can remove them safely.

    Signed-off-by: WANG Cong
    Acked-by: FUJITA Tomonori
    Acked-by: Jiri Slaby
    Acked-by: Vinod Koul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Add support for the shm_rmid_forced sysctl. If set to 1, all shared
    memory objects in current ipc namespace will be automatically forced to
    use IPC_RMID.

    The POSIX way of handling shmem allows one to create shm objects and
    call shmdt(), leaving shm object associated with no process, thus
    consuming memory not counted via rlimits.

    With shm_rmid_forced=1 the shared memory object is counted at least for
    one process, so OOM killer may effectively kill the fat process holding
    the shared memory.

    It obviously breaks POSIX - some programs relying on the feature would
    stop working. So set shm_rmid_forced=1 only if you're sure nobody uses
    "orphaned" memory. Use shm_rmid_forced=0 by default for compatibility
    reasons.

    The feature was previously implemented in -ow as a configure option.

    [akpm@linux-foundation.org: fix documentation, per Randy]
    [akpm@linux-foundation.org: fix warning]
    [akpm@linux-foundation.org: readability/conventionality tweaks]
    [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
    Signed-off-by: Vasiliy Kulikov
    Cc: Randy Dunlap
    Cc: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Daniel Lezcano
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Alan Cox
    Cc: Solar Designer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
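
The effect of the setting on segment lifetime can be modelled as a small state machine. This is a toy model under stated assumptions, not the kernel's logic: `struct demo_shm_segment` and the `demo_shm_*` helpers are hypothetical, and only explicit detach is modelled (process exit is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one shared memory segment. */
struct demo_shm_segment {
    int nattch;        /* number of current attachments */
    bool marked_rmid;  /* someone explicitly requested removal */
    bool destroyed;    /* memory has been released */
};

/* Attach one more user to a live segment. */
void demo_shm_attach(struct demo_shm_segment *seg)
{
    assert(seg != NULL && !seg->destroyed);
    seg->nattch++;
}

/*
 * Detach one user. When the last user goes away, the segment is released if
 * removal was requested explicitly or if the forced-removal setting is on.
 */
void demo_shm_detach(struct demo_shm_segment *seg, bool rmid_forced)
{
    assert(seg != NULL && seg->nattch > 0);
    seg->nattch--;
    if (seg->nattch == 0 && (seg->marked_rmid || rmid_forced))
        seg->destroyed = true;
}
```

With `rmid_forced` false, a segment whose last user detaches without marking it for removal stays allocated; that is the orphaned memory the message warns about.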
     
  • cpumask_var_t has one notable difference from cpumask_t. Add the
    explanation.

    Signed-off-by: KOSAKI Motohiro
    Cc: Thiago Farina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • [ This patch has already been accepted as commit 0ac0c0d0f837 but later
    reverted (commit 35926ff5fba8) because it introduced arch specific
    __node_random which was defined only for x86 code so it broke other
    archs. This is a followup without any arch specific code. Other than
    that there are no functional changes.]

    Some workloads that create a large number of small files tend to assign
    too many pages to node 0 (multi-node systems). Part of the reason is
    that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
    at node 0 for newly created tasks.

    This patch changes the rotor to be initialized to a random node number
    of the cpuset.

    [akpm@linux-foundation.org: fix layout]
    [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
    [mhocko@suse.cz: Make it arch independent]
    [akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
    Signed-off-by: Jack Steiner
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Michal Hocko
    Reviewed-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Paul Menage
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: David Rientjes
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Jack Steiner
    Cc: KOSAKI Motohiro
    Cc: Lee Schermerhorn
    Cc: Michal Hocko
    Cc: Paul Menage
    Cc: Pekka Enberg
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
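
A round-robin rotor with a randomized starting point can be sketched as below. The node mask is a plain bitmask and the caller supplies the random value, so the functions stay deterministic; all `demo_*` names and the 32-node limit are assumptions for illustration.

```c
#include <assert.h>

#define DEMO_MAX_NODES 32

/* Count the nodes present in the mask. */
int demo_node_weight(unsigned int mask)
{
    int w = 0;

    for (int n = 0; n < DEMO_MAX_NODES; n++)
        if (mask & (1u << n))
            w++;
    return w;
}

/* Initial rotor: the (r mod weight)-th allowed node, r being a random value. */
int demo_rotor_init(unsigned int mask, unsigned int r)
{
    int w = demo_node_weight(mask);
    unsigned int target;

    assert(w > 0);
    target = r % (unsigned int)w;
    for (int n = 0; n < DEMO_MAX_NODES; n++) {
        if (mask & (1u << n)) {
            if (target == 0)
                return n;
            target--;
        }
    }
    return -1;  /* not reached when the mask is non-empty */
}

/* Advance the rotor to the next allowed node after cur, wrapping around. */
int demo_rotor_next(unsigned int mask, int cur)
{
    for (int i = 1; i <= DEMO_MAX_NODES; i++) {
        int n = (cur + i) % DEMO_MAX_NODES;

        if (mask & (1u << n))
            return n;
    }
    return -1;  /* empty mask */
}
```

Starting each new task at a random allowed node spreads the first allocations across the set instead of piling them onto the lowest-numbered node.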
     
  • The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim
    in...") says it adds scanning stats to the memory.stat file. But it doesn't,
    because we felt we needed to reach a consensus on such new APIs.

    This patch is an attempt to add memory.scan_stat. It shows
    - the number of scanned pages (total, anon, file)
    - the number of rotated pages (total, anon, file)
    - the number of freed pages (total, anon, file)
    - the elapsed time (including sleep/pause time)

    for both direct and soft reclaim.

    The biggest difference from Ying's original one is that this file
    can be reset by a write, as in

    # echo 0 > ...../memory.scan_stat

    Example of output is here. This is a result after make -j 6 kernel
    under 300M limit.

    [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
    [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
    scanned_pages_by_limit 9471864
    scanned_anon_pages_by_limit 6640629
    scanned_file_pages_by_limit 2831235
    rotated_pages_by_limit 4243974
    rotated_anon_pages_by_limit 3971968
    rotated_file_pages_by_limit 272006
    freed_pages_by_limit 2318492
    freed_anon_pages_by_limit 962052
    freed_file_pages_by_limit 1356440
    elapsed_ns_by_limit 351386416101
    scanned_pages_by_system 0
    scanned_anon_pages_by_system 0
    scanned_file_pages_by_system 0
    rotated_pages_by_system 0
    rotated_anon_pages_by_system 0
    rotated_file_pages_by_system 0
    freed_pages_by_system 0
    freed_anon_pages_by_system 0
    freed_file_pages_by_system 0
    elapsed_ns_by_system 0
    scanned_pages_by_limit_under_hierarchy 9471864
    scanned_anon_pages_by_limit_under_hierarchy 6640629
    scanned_file_pages_by_limit_under_hierarchy 2831235
    rotated_pages_by_limit_under_hierarchy 4243974
    rotated_anon_pages_by_limit_under_hierarchy 3971968
    rotated_file_pages_by_limit_under_hierarchy 272006
    freed_pages_by_limit_under_hierarchy 2318492
    freed_anon_pages_by_limit_under_hierarchy 962052
    freed_file_pages_by_limit_under_hierarchy 1356440
    elapsed_ns_by_limit_under_hierarchy 351386416101
    scanned_pages_by_system_under_hierarchy 0
    scanned_anon_pages_by_system_under_hierarchy 0
    scanned_file_pages_by_system_under_hierarchy 0
    rotated_pages_by_system_under_hierarchy 0
    rotated_anon_pages_by_system_under_hierarchy 0
    rotated_file_pages_by_system_under_hierarchy 0
    freed_pages_by_system_under_hierarchy 0
    freed_anon_pages_by_system_under_hierarchy 0
    freed_file_pages_by_system_under_hierarchy 0
    elapsed_ns_by_system_under_hierarchy 0

    total_xxxx is for hierarchy management.

    This will be useful for further memcg development and needs to be
    developed before we do some complicated rework on LRU/softlimit
    management.

    This patch adds a new struct memcg_scanrecord into scan_control struct.
    sc->nr_scanned et al. are not designed for exporting information. For
    example, nr_scanned is reset frequently and incremented by 2 when scanning
    mapped pages.

    To avoid complexity, I added a new param in scan_control which is for
    exporting scanning score.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Cc: Ying Han
    Cc: Andrew Bresticker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In mm/memcontrol.c, there are many LRU stat functions, such as:

    mem_cgroup_zone_nr_lru_pages
    mem_cgroup_node_nr_file_lru_pages
    mem_cgroup_nr_file_lru_pages
    mem_cgroup_node_nr_anon_lru_pages
    mem_cgroup_nr_anon_lru_pages
    mem_cgroup_node_nr_unevictable_lru_pages
    mem_cgroup_nr_unevictable_lru_pages
    mem_cgroup_node_nr_lru_pages
    mem_cgroup_nr_lru_pages
    mem_cgroup_get_local_zonestat

    Some of them are under #ifdef MAX_NUMNODES >1 and others are not.
    This seems bad. This patch consolidates all functions into

    mem_cgroup_zone_nr_lru_pages()
    mem_cgroup_node_nr_lru_pages()
    mem_cgroup_nr_lru_pages()

    For these functions, "which LRU?" information is passed by a mask.

    example:
    mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))

    I also added some macros: ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.

    example:
    mem_cgroup_nr_lru_pages(mem, ALL_LRU)

    BTW, considering the NUMA memory placement of the counters, this patch
    seems to be better.

    Currently, when we gather all LRU information, we scan in the following order:
    for_each_lru -> for_each_node -> for_each_zone.

    This means we touch cache lines on different nodes in turn.

    After this patch, we scan
    for_each_node -> for_each_zone -> for_each_lru(mask)

    so we gather information in the same cache line at once.

    [akpm@linux-foundation.org: fix warnings, build error]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Michal Hocko
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
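
The mask-based interface and the node -> zone -> lru loop order can be illustrated with a small self-contained model. Everything here is hypothetical: the `demo_*` names, the fixed 2x2 topology, and the macro definitions are assumptions that mirror the message, not the kernel's declarations.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BIT(n) (1u << (n))

enum demo_lru_list {
    DEMO_LRU_INACTIVE_ANON,
    DEMO_LRU_ACTIVE_ANON,
    DEMO_LRU_INACTIVE_FILE,
    DEMO_LRU_ACTIVE_FILE,
    DEMO_LRU_UNEVICTABLE,
    DEMO_NR_LRU_LISTS
};

#define DEMO_ALL_LRU_ANON \
    (DEMO_BIT(DEMO_LRU_INACTIVE_ANON) | DEMO_BIT(DEMO_LRU_ACTIVE_ANON))
#define DEMO_ALL_LRU_FILE \
    (DEMO_BIT(DEMO_LRU_INACTIVE_FILE) | DEMO_BIT(DEMO_LRU_ACTIVE_FILE))
#define DEMO_ALL_LRU (DEMO_BIT(DEMO_NR_LRU_LISTS) - 1u)

#define DEMO_NR_NODES 2
#define DEMO_NR_ZONES 2

/* Per-group counters laid out node-major, so one zone's lists sit together. */
struct demo_memcg {
    unsigned long count[DEMO_NR_NODES][DEMO_NR_ZONES][DEMO_NR_LRU_LISTS];
};

/* Innermost level: sum the selected LRU lists of one zone. */
unsigned long demo_zone_nr_lru_pages(const struct demo_memcg *m, int nid,
                                     int zid, unsigned int lru_mask)
{
    unsigned long total = 0;

    assert(m != NULL);
    for (int l = 0; l < DEMO_NR_LRU_LISTS; l++)
        if (lru_mask & DEMO_BIT(l))
            total += m->count[nid][zid][l];
    return total;
}

/* Middle level: all zones of one node. */
unsigned long demo_node_nr_lru_pages(const struct demo_memcg *m, int nid,
                                     unsigned int lru_mask)
{
    unsigned long total = 0;

    for (int z = 0; z < DEMO_NR_ZONES; z++)
        total += demo_zone_nr_lru_pages(m, nid, z, lru_mask);
    return total;
}

/* Outermost level: all nodes. */
unsigned long demo_nr_lru_pages(const struct demo_memcg *m, unsigned int lru_mask)
{
    unsigned long total = 0;

    for (int n = 0; n < DEMO_NR_NODES; n++)
        total += demo_node_nr_lru_pages(m, n, lru_mask);
    return total;
}
```

One function per level with a mask parameter replaces the per-list variants, and the innermost loop walks counters that are adjacent in memory.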
     
  • Each memory cgroup has a 'swappiness' value which can be accessed by
    get_swappiness(memcg). The major user is try_to_free_mem_cgroup_pages()
    and swappiness is passed by argument. It's propagated by scan_control.

    get_swappiness() is a static function but some planned updates will need
    to get swappiness from files other than memcontrol.c. This patch exports
    get_swappiness() as mem_cgroup_swappiness(). With this, we can remove the
    swappiness argument from try_to_free... and drop swappiness from
    scan_control. Only memcg uses it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Michal Hocko
    Cc: Ying Han
    Cc: Shaohua Li
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
    ceph: document unlocked d_parent accesses
    ceph: explicitly reference rename old_dentry parent dir in request
    ceph: document locking for ceph_set_dentry_offset
    ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
    ceph: protect d_parent access in ceph_d_revalidate
    ceph: protect access to d_parent
    ceph: handle racing calls to ceph_init_dentry
    ceph: set dir complete frag after adding capability
    rbd: set blk_queue request sizes to object size
    ceph: set up readahead size when rsize is not passed
    rbd: cancel watch request when releasing the device
    ceph: ignore lease mask
    ceph: fix ceph_lookup_open intent usage
    ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
    ceph: fix bad parent_inode calc in ceph_lookup_open
    ceph: avoid carrying Fw cap during write into page cache
    libceph: don't time out osd requests that haven't been received
    ceph: report f_bfree based on kb_avail rather than diffing.
    ceph: only queue capsnap if caps are dirty
    ceph: fix snap writeback when racing with writes
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
    ext3.txt: update the links in the section "useful links" to the latest ones
    ext3: Fix data corruption in inodes with journalled data
    ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
    ext3: Fix compilation with -DDX_DEBUG
    quota: Remove unused declaration
    jbd: Use WRITE_SYNC in journal checkpoint.
    jbd: Fix oops in journal_remove_journal_head()
    ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
    ext3/ioctl.c: silence sparse warnings about different address spaces
    ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
    ext3: Improve truncate error handling
    ext3: use proper little-endian bitops
    ext2: include fs.h into ext2_fs.h
    ext3: Fix oops in ext3_try_to_allocate_with_rsv()
    jbd: fix a bug of leaking jh->b_jcount
    jbd: remove dependency on __GFP_NOFAIL
    ext3: Convert ext3 to new truncate calling convention
    jbd: Add fixed tracepoints
    ext3: Add fixed tracepoints

    Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
    new fixed tracepoints.

    Linus Torvalds
     
  • Keep track of when an outgoing message is ACKed (i.e., the server fully
    received it and, presumably, queued it for processing). Time out OSD
    requests only if it's been too long since they've been received.

    This prevents timeouts and connection thrashing when the OSDs are simply
    busy and are throttling the requests they read off the network.

    Reviewed-by: Yehuda Sadeh
    Signed-off-by: Sage Weil

    Sage Weil
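
The timeout rule can be sketched as a predicate over a per-request ACK timestamp. This is a simplified model with hypothetical names (`struct demo_osd_request`, `demo_request_*`) and integer seconds in place of whatever clock the real client uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request bookkeeping. */
struct demo_osd_request {
    bool acked;     /* has the server confirmed it fully received the message? */
    long ack_time;  /* time of that ACK in seconds; valid only if acked */
};

/* A freshly sent request has not been acknowledged yet. */
void demo_request_init(struct demo_osd_request *req)
{
    assert(req != NULL);
    req->acked = false;
    req->ack_time = 0;
}

/* Record the moment the outgoing message was ACKed. */
void demo_request_acked(struct demo_osd_request *req, long now)
{
    assert(req != NULL);
    req->acked = true;
    req->ack_time = now;
}

/* Time out only requests the server received more than timeout seconds ago. */
bool demo_request_timed_out(const struct demo_osd_request *req, long now,
                            long timeout)
{
    assert(req != NULL && timeout >= 0);
    if (!req->acked)
        return false;  /* still queued or throttled on the wire: keep waiting */
    return now - req->ack_time > timeout;
}
```

A request that a busy server has not yet read off the network never trips the timer in this model, which is the thrashing case the message describes.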
     
  • * 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, olpc-xo15-sci: Enable EC wakeup capability
    x86, olpc: Fix dependency on POWER_SUPPLY
    x86, olpc: Add XO-1.5 SCI driver
    x86, olpc: Add XO-1 RTC driver
    x86, olpc-xo1-sci: Propagate power supply/battery events
    x86, olpc-xo1-sci: Add lid switch functionality
    x86, olpc-xo1-sci: Add GPE handler and ebook switch functionality
    x86, olpc: EC SCI wakeup mask functionality
    x86, olpc: Add XO-1 SCI driver and power button control
    x86, olpc: Add XO-1 suspend/resume support
    x86, olpc: Rename olpc-xo1 to olpc-xo1-pm
    x86, olpc: Move CS5536-related constants to cs5535.h
    x86, olpc: Add missing elements to device tree

    Linus Torvalds