20 Oct, 2008

40 commits

  • Some pages that are not on the LRU can be mapped, and they are not worth
    accounting (because we can't shrink them and would need dirty code to
    handle the special case). We'd like to make use of the usual
    objrmap/radix-tree protocol and don't want to account pages that are
    outside the VM's control.

    When special_mapping_fault() is called, page->mapping tends to be NULL
    and the page is charged as an anonymous page. insert_page() also handles
    some special pages from drivers.

    This patch avoids accounting such special pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch tries to make page->mapping NULL before
    mem_cgroup_uncharge_cache_page() is called.

    "page->mapping == NULL" is a good check for "whether the page is still
    in the radix-tree or not". This patch also adds a BUG_ON() to
    mem_cgroup_uncharge_cache_page().

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While page-cache charge/uncharge is done under page_lock(), swap-cache
    charge/uncharge isn't. (An anonymous page is charged when it's newly
    allocated.)

    This patch moves do_swap_page()'s charge() call under the lock. I don't
    see any serious problem *now*, but this fix should help avoid unnecessary
    racy states in the future.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Since we introduced RCU for the read side, spin_lock is used only for
    updates. But we always hold cgroup_lock() when updating, so spin_lock()
    is not needed.

    Additional cleanup:
    1) include linux/rcupdate.h explicitly
    2) remove unused variable cur_devcgroup in devcgroup_update_access()

    Signed-off-by: Lai Jiangshan
    Acked-by: "Serge E. Hallyn"
    Cc: Paul Menage
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This saves 40 bytes on my x86_32 box.

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • The choice of real/dummy declaration for cgroup_mm_owner_callbacks()
    shouldn't be based on CONFIG_MM_OWNER, but on CONFIG_CGROUPS. Otherwise
    kernel/exit.c fails to compile when something other than a cgroups
    controller selects CONFIG_MM_OWNER

    Signed-off-by: Paul Menage
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Rather than pre-generating the entire text for the "tasks" file each
    time the file is opened, we instead just generate/update the array of
    process ids and use a seq_file to report these to userspace. All open
    file handles on the same "tasks" file can share a pid array, which may
    be updated any time that no thread is actively reading the array. By
    sharing the array, the potential for userspace to DoS the system by
    opening many handles on the same "tasks" file is removed.

    [Based on a patch by Lai Jiangshan, extended to use seq_file]

    Signed-off-by: Paul Menage
    Reviewed-by: Lai Jiangshan
    Cc: Serge Hallyn
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • put_css_set_taskexit may be called while find_css_set is running on
    another CPU, and the following race can occur:

    put_css_set_taskexit side              find_css_set side

                                           |
    atomic_dec_and_test(&kref->refcount)   |
        /* kref->refcount = 0 */           |
    ....................................................................
                                           | read_lock(&css_set_lock)
                                           | find_existing_css_set
                                           | get_css_set
                                           | read_unlock(&css_set_lock);
    ....................................................................
    __release_css_set                      |
    ....................................................................
                                           | /* use a released css_set */
                                           |

    [The same applies to put_css_set, but in the current code every
    put_css_set call is made inside the cgroup mutex critical region, as is
    find_css_set.]

    [akpm@linux-foundation.org: repair comments]
    [menage@google.com: eliminate race in css_set refcounting]
    Signed-off-by: Lai Jiangshan
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
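
The css_set race described in the commit above is an instance of a general pattern: a lookup that takes a reference under a lock can race with a final put that drops the count to zero outside that lock. The following is a minimal user-space sketch of one common way to close such a window, using C11 atomics and a toy spinlock. It is not the kernel code from this commit, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a css_set. */
struct demo_set {
    atomic_int refcount;
    bool released;          /* set once the last reference is gone */
};

/* Toy spinlock standing in for the lock that protects lookups. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&demo_lock))
        ;                   /* spin until the flag is clear */
}

static void demo_lock_release(void)
{
    atomic_flag_clear(&demo_lock);
}

/* Decrement *v unless it equals `unless`; return true if decremented. */
static bool demo_dec_unless(atomic_int *v, int unless)
{
    int cur = atomic_load(v);

    while (cur != unless) {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &cur, cur - 1))
            return true;
    }
    return false;
}

/* Drop a reference; the final drop happens only under the lookup lock. */
void demo_put(struct demo_set *s)
{
    assert(s != NULL);

    if (demo_dec_unless(&s->refcount, 1))
        return;             /* fast path: not the last reference */

    demo_lock_acquire();
    if (atomic_fetch_sub(&s->refcount, 1) == 1)
        s->released = true; /* a real implementation would unlink and free here */
    demo_lock_release();
}

/* Lookup side: take a reference while holding the same lock. */
void demo_get_locked(struct demo_set *s)
{
    assert(s != NULL);

    demo_lock_acquire();
    atomic_fetch_add(&s->refcount, 1);
    demo_lock_release();
}
```

Because the count can only reach zero while the lock is held, a lookup that holds the same lock never observes an object whose last reference is being dropped concurrently.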
     
  • A corrupted extent for the extent file itself may try to get an impossible
    extent, causing a deadlock if I see it correctly.

    Check the inode number after the first_blocks checks and fail if it's the
    extent file, as according to the spec the extent file should have no
    extent for itself.

    Signed-off-by: Eric Sesterhenn
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sesterhenn
     
  • hfsplus: O_LARGEFILE checking is missing

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8490

    From: Alan Cox
    Reported-by: didier
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • A very large directory with many read failures (either due to storage
    problems, or due to invalid size & blocks from corruption) will generate a
    printk storm as the filesystem continues to try to read all the blocks.
    This flood of messages can tie up the box until it is complete - which may
    be a very long time, especially for very large corrupted values.

    This is fixed by only reporting the corruption once each time we try to
    read the directory.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Cc: Eugene Teo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
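
The "report once per directory read" idea from the commit above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `demo_scan_dir`, the `block_ok` array, and the message counter are hypothetical stand-ins for the filesystem's block reads and its printk() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical directory scan: block_ok[i] says whether block i was readable.
 * Returns the number of failed blocks; *messages counts emitted log lines.
 */
int demo_scan_dir(const bool *block_ok, size_t nblocks, int *messages)
{
    bool reported = false;  /* reset on every scan of the directory */
    int failed = 0;

    assert(block_ok != NULL && messages != NULL);

    for (size_t i = 0; i < nblocks; i++) {
        if (block_ok[i])
            continue;
        failed++;
        if (!reported) {
            (*messages)++;  /* stand-in for a single printk() */
            reported = true;
        }
    }
    return failed;
}
```

Every failed block is still counted, but only the first failure in a given scan produces a message, so the log volume no longer grows with the number of bad blocks.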
     
  • For blocksize < pagesize we need to remove blocks that got allocated in
    block_write_begin() if we fail with ENOSPC for later blocks.
    block_write_begin() internally does this if it allocated page locally.
    This makes sure we don't have blocks outside inode.i_size during ENOSPC.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • This fixes a bug where readdir() would return a directory entry twice
    if there was a hash collision in a hash tree indexed directory.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Eugene Dashevsky
    Signed-off-by: Mike Snitzer
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eugene Dashevsky
     
  • In ordered mode, if a file data buffer being dirtied exists in the
    committing transaction, we write the buffer to the disk, move it from the
    committing transaction to the running transaction, then dirty it. But we
    must not remove the buffer from the committing transaction when the
    buffer couldn't be written out; otherwise the error would be missed and
    the committing transaction would not abort.

    This patch adds an error check before removing the buffer from the
    committing transaction.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • If the journal doesn't abort when it gets an IO error in file data blocks,
    the file data corruption will spread silently. Because most applications
    and commands do buffered writes without fsync(), they don't notice the IO
    error. It's scary for mission critical systems. On the other hand, if the
    journal aborts whenever it gets an IO error in file data blocks, the
    system will easily become inoperable. So this patch introduces a
    filesystem option to determine whether it aborts the journal or just
    calls printk() when it gets an IO error in file data.

    If you mount an ext3 fs with the data_err=abort option, it aborts on a
    file data write error. If you mount it with data_err=ignore, it doesn't
    abort and just calls printk(). data_err=ignore is the default.

    Signed-off-by: Hidehiro Kawai
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • We could run into an ENOSPC error on ext3 even when there are free blocks
    on the filesystem.

    The problem is triggered when the goal block group has 0 free blocks and
    the remaining block groups are skipped due to the check of
    "free_blocks < windowsz/2". The current code can fall back to
    non-reservation allocation to prevent early ENOSPC after examining all the
    block groups with reservation on, but this code was bypassed if the
    reservation window was already turned off, which is true in this case.

    This patch fixes two issues:
    1) We don't need to turn off block reservation if the goal block group has
    0 free blocks left; we should continue searching the rest of the block
    groups.

    The intention of the current code is to turn off block reservation if the
    goal allocation group has a few (some) free blocks left (not enough to
    make the desired reservation window), so as to try to allocate in the goal
    block group and get better locality. But if the goal block group has 0
    free blocks, it should leave block reservation on and continue searching
    the next block groups, rather than turning off block reservation
    completely.

    2) We don't need to check the window size if block reservation is off.

    The problem was originally found and fixed in ext4.

    Signed-off-by: Mingming Cao
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • When you try to resize an ext3 fs and run out of reserved gdt blocks,
    you get an error that doesn't actually tell you what went wrong. It just
    says that the gdb it picked is not correct, which is the case since you
    don't have any reserved gdt blocks left. This patch adds a check to make
    sure you have reserved gdt blocks to use, and if not, prints out a more
    relevant error.

    Signed-off-by: Josef Bacik
    Cc:
    Cc: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     
  • Currently, original metadata buffers are dirtied when they are unfiled,
    whether the journal has aborted or not. Eventually these buffers will be
    written back to the filesystem by pdflush. This means some metadata
    buffers are written to the filesystem without journaling if the journal
    aborts. So if a journal abort and a system crash happen at the same
    time, the filesystem would be left in an inconsistent state. Additionally,
    replaying journaled metadata can partly overwrite the latest metadata on
    the filesystem, because if the journal aborts, journaled metadata are
    preserved and replayed during the next mount so as not to lose
    uncheckpointed metadata. This would also break the consistency of the
    filesystem.

    This patch prevents original metadata buffers from being dirtied on abort
    by clearing BH_JBDDirty flag from those buffers. Thus, no metadata
    buffers are written to the filesystem without journaling.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • If we fail to write metadata buffers to the journal space but succeed
    in writing the commit record, stale data can be written back to the
    filesystem as metadata in the recovery phase.

    To avoid this, when we fail to write out metadata buffers, abort the
    journal before writing the commit record.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • The phone_device array is covered by the phone_lock mutex in all cases and
    request_module no longer needs the BKL so we can remove the only remaining
    instance of the BKL from phonedev.

    Signed-off-by: Richard Holden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Holden
     
  • Change lock_kernel()/unlock_kernel() to local fb mutex. Each frame buffer
    instance has its own mutex.

    The one line try_to_load() function is unrolled to request_module() in two
    places for readability.

    [righi.andrea@gmail.com: fb: fix NULL pointer BUG dereference in fb_open()]
    Signed-off-by: Krzysztof Helt
    Signed-off-by: Andrea Righi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Helt
     
  • Framebuffer is heavily BKL dependent at the moment so just wrap the ioctl
    handler in the driver as we push down.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alan Cox
    Cc: Krzysztof Helt
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • We can get the following oops from gpio_get_value_cansleep() when a GPIO
    controller doesn't provide a get() callback:

    Unable to handle kernel paging request for instruction fetch
    Faulting instruction address: 0x00000000
    Oops: Kernel access of bad area, sig: 11 [#1]
    [...]
    NIP [00000000] 0x0
    LR [c0182fb0] gpio_get_value_cansleep+0x40/0x50
    Call Trace:
    [c7b79e80] [c0183f28] gpio_value_show+0x5c/0x94
    [c7b79ea0] [c01a584c] dev_attr_show+0x30/0x7c
    [c7b79eb0] [c00d6b48] fill_read_buffer+0x68/0xe0
    [c7b79ed0] [c00d6c54] sysfs_read_file+0x94/0xbc
    [c7b79ef0] [c008f24c] vfs_read+0xb4/0x16c
    [c7b79f10] [c008f580] sys_read+0x4c/0x90
    [c7b79f40] [c0013a14] ret_from_syscall+0x0/0x38

    It's OK to request the value of *any* GPIO; most GPIOs are bidirectional,
    so configuring them as outputs just enables an output driver and doesn't
    disable the input logic.

    So the problem is that gpio_get_value_cansleep() isn't making the same
    sanity check that gpio_get_value() does: making sure this GPIO isn't one
    of the atypical "no input logic" cases.

    Reported-by: Anton Vorontsov
    Signed-off-by: David Brownell
    Cc: [2.6.27.x, 2.6.26.x, 2.6.25.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
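
The missing sanity check described in the commit above amounts to testing an optional callback before calling through it. A minimal sketch, assuming a stripped-down controller descriptor; `demo_gpio_chip` and the `demo_` functions are hypothetical and not the gpiolib API, and returning 0 for the "no input logic" case is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a GPIO controller descriptor. */
struct demo_gpio_chip {
    int (*get)(struct demo_gpio_chip *chip, unsigned offset);  /* may be NULL */
};

/* Read a GPIO value, tolerating controllers with no get() callback. */
int demo_gpio_get_value(struct demo_gpio_chip *chip, unsigned offset)
{
    assert(chip != NULL);
    /* Assumption: report 0 when there is no input logic to read. */
    return chip->get ? chip->get(chip, offset) : 0;
}

/* Example callback: pretend every even-numbered line reads high. */
int demo_even_high(struct demo_gpio_chip *chip, unsigned offset)
{
    (void)chip;
    return (offset % 2 == 0) ? 1 : 0;
}
```

Without the NULL test, a controller that leaves `get` unset would cause a jump to address 0, which matches the "Faulting instruction address: 0x00000000" line in the oops quoted above.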
     
  • gpiolib can export GPIOs to userspace via sysfs. This patch modifies the
    gpio_value_show() so that any non-zero value is explicitly printed as "1",
    rather than whatever numerical value the lower-level driver returns.

    Signed-off-by: Steve Falco
    Signed-off-by: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven A. Falco
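
The normalization described in the commit above can be sketched with the double-negation idiom, which maps any non-zero integer to 1. `demo_format_gpio_value` is a hypothetical user-space helper, not the actual gpio_value_show() code:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a raw driver reading as "0\n" or "1\n".
 * Returns the number of characters written, excluding the terminator.
 */
int demo_format_gpio_value(char *buf, size_t len, int raw)
{
    assert(buf != NULL && len >= 3);
    /* !!raw is 1 for any non-zero raw value and 0 otherwise. */
    return snprintf(buf, len, "%d\n", !!raw);
}
```

A driver that returns a masked register bit such as 0x40 would therefore still show up as "1" to userspace.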
     
  • Teach rtc-cmos about the second bank of registers found on most modern x86
    systems, giving access to 128 bytes more NVRAM.

    This version only sees that extra NVRAM when both register banks are
    provided as part of *one* PNP resource. Since BIOS on some systems
    presents them using two IO resources, and nothing merges them, this can't
    always show all the NVRAM. (We're supposed to be able to use PNP id
    PNP0b01 too, but BIOS tables don't often seem to use that particular
    option.)

    Signed-off-by: David Brownell
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • In function sn_sal_switch_to_asynch(): drivers/serial/sn_console.c:713:

    HZ * SN_SAL_UART_FIFO_DEPTH / SN_SAL_UART_FIFO_SPEED_CPS;

    After preprocessing (see defines in patch) this becomes HZ * 16 / 9600 / 10
    (associativity from left to right), not equivalent to HZ * 16 / 960.

    Looks-obviously-right-to: Tony Luck
    Cc: Jes Sorensen
    Acked-by: Pat Gefre
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    roel kluin
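
The precedence problem in the commit above can be reproduced with a small C sketch. The macro names and the tick rate of 1000 are assumptions for illustration; only the 16 and 9600 / 10 values come from the commit message.

```c
#define DEMO_HZ 1000                 /* assumption: tick rate for the demo */
#define DEMO_FIFO_DEPTH 16
#define DEMO_SPEED_BARE 9600 / 10    /* no parentheses around the expression */
#define DEMO_SPEED_PAREN (9600 / 10)

int demo_timeout_bare(void)
{
    /* Expands to DEMO_HZ * 16 / 9600 / 10, evaluated left to right. */
    return DEMO_HZ * DEMO_FIFO_DEPTH / DEMO_SPEED_BARE;
}

int demo_timeout_paren(void)
{
    /* Expands to DEMO_HZ * 16 / (9600 / 10), i.e. DEMO_HZ * 16 / 960. */
    return DEMO_HZ * DEMO_FIFO_DEPTH / DEMO_SPEED_PAREN;
}
```

With the assumed tick rate of 1000, the unparenthesized form truncates to 0 in integer arithmetic, while the parenthesized form yields 16.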
     
  • This patch makes the needlessly global probe_serial_gsc() static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • readl/writel are not expected to accept iomap return value. Replace
    bogus mapping by standard ioremap.

    Signed-off-by: Jiri Slaby
    Cc:
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • The read fail ratio is sensitive to the delay between the first byte
    written and the first byte read; apparently the sensors cannot be rushed.
    Increasing the minimum wait time, without changing the total wait time,
    improves the fail ratio from an 8% chance that any of the sensors fails in
    one read, down to 0.4%, on a Macbook Air. On a Macbook Pro 3,1, the
    effect is even more apparent. By reducing the number of status polls, the
    ratio is further improved to below 0.1%. Finally, increasing the total
    wait time brings the fail ratio down to virtually zero.

    Signed-off-by: Henrik Rydberg
    Tested-by: Bob McElrath
    Cc: Nicolas Boichat
    Cc: "Mark M. Hoffman"
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • Add temperature sensor support for Macbook Pro 3.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • Adds temperature sensor support for the Macbook Pro 4.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • dmi_system_id.driver_data is already void*.

    Cc: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This patch adds accelerometer, backlight and temperature sensor support
    for the Macbook Air.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • On some recent Macbooks, the package length for the light sensors ALV0 and
    ALV1 has changed from 6 to 10. This patch allows for a variable package
    length encompassing both variants.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • The time to wait for a status change while reading or writing to the SMC
    ports is a balance between read reliability and system performance. The
    current setting yields roughly three errors in a thousand when
    simultaneously reading three different temperature values on a Macbook
    Air. This patch increases the setting to a value yielding roughly one
    error in ten thousand, with no noticeable system performance degradation.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • On many Macbooks since mid 2007, the Pro, C2D and Air models, applesmc
    fails to read some or all SMC ports. This problem has various effects,
    such as flooded logfiles, malfunctioning temperature sensors,
    accelerometers failing to initialize, and difficulties getting backlight
    functionality to work properly.

    The root of the problem seems to be the command protocol. The current
    code sends out a command byte, then repeatedly polls for an ack before
    continuing to send or receive data. From experiments leading to this
    patch, it seems the command protocol either never quite worked or has
    changed, so that one now sends a command byte, waits a little bit, polls
    for an ack, and if that fails, repeats the whole thing by sending the
    command byte again.

    This patch implements a send_command function according to the new
    interpretation of the protocol, and should work also for earlier models.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
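
The revised command protocol described in the commit above (send, wait, poll, and resend on failure) can be sketched against a simulated device. This is not the applesmc driver code: `demo_smc`, its fields, and the retry limit are hypothetical, and the delay is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated device: acks once it has seen acks_after writes. */
struct demo_smc {
    int acks_after;
    int writes;
};

static void demo_write_command(struct demo_smc *dev)
{
    dev->writes++;          /* stand-in for writing the command byte */
}

static bool demo_poll_ack(const struct demo_smc *dev)
{
    return dev->writes >= dev->acks_after;
}

/* Send, wait, poll; on failure resend the command byte.
 * Returns 0 on success and -1 once max_tries attempts are exhausted. */
int demo_send_command(struct demo_smc *dev, int max_tries)
{
    assert(dev != NULL);

    for (int i = 0; i < max_tries; i++) {
        demo_write_command(dev);
        /* A real driver would wait briefly here before polling. */
        if (demo_poll_ack(dev))
            return 0;
    }
    return -1;
}
```

The difference from the older scheme is that the command byte is rewritten on every attempt, instead of being sent once and followed by repeated polling.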
     
  • At one single place in the code, the specified number of bytes to read and
    the actual number of bytes read differ by one. This one-liner patch fixes
    that inconsistency.

    Signed-off-by: Henrik Rydberg
    Cc: Nicolas Boichat
    Cc: Riki Oktarianto
    Cc: Mark M. Hoffman
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Rydberg
     
  • Adds therm-min/max/crit-alarm callbacks, sensor-device-attribute
    declarations, and refs to those new decls in the macro used to initialize
    the therm_group (of sysfs files)

    The thermistors use voltage channels to measure; so they don't have a
    fault-alarm, but unlike the other voltages, they do have an overtemp,
    which we call crit (by convention).

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Jim Cromie
    Cc: Jean Delvare
    Cc: "Mark M. Hoffman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Cromie
     
  • The temp and vin status register values may be set by chip specifications,
    set again by the BIOS, or set by a previously loaded instance of this
    driver. Debug output nicely displays modprobe init=\d actions.

    Signed-off-by: Jim Cromie
    Cc: Jean Delvare
    Cc: "Mark M. Hoffman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Cromie