08 Oct, 2016

1 commit

  • Using fls_long() to compute the align value causes a doubled alignment
    requirement for __get_vm_area_node() when the size parameter is a power
    of 2 and VM_IOREMAP is set in the flags parameter; for example,
    size=0x10000 -> fls_long(0x10000)=17 -> align=0x20000.

    get_count_order_long() is implemented and can be used instead of
    fls_long() to fix the bug; for example,
    size=0x10000 -> get_count_order_long(0x10000)=16 -> align=0x10000.
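
    For reference, a minimal sketch of the helper's semantics (a rounded-up
    order that does not overshoot exact powers of two), assuming an
    fls_long()-style function is available; the in-tree definition may differ
    in detail:

    static inline int get_count_order_long(unsigned long l)
    {
            if (l == 0UL)
                    return -1;
            if (l & (l - 1UL))              /* not a power of two: round up */
                    return (int)fls_long(l);
            return (int)fls_long(l) - 1;    /* exact power of two */
    }

    With this, get_count_order_long(0x10000) is 16 rather than the 17
    returned by fls_long(0x10000), so the align value is no longer doubled.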

    [akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/]
    [zijun_hu@zoho.com: fixes]
    Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com
    [akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()]
    [akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()]
    [zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope]
    Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com
    Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com
    Signed-off-by: zijun_hu
    Cc: Tejun Heo
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: David Rientjes
    Signed-off-by: zijun_hu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
     

10 May, 2016

1 commit

  • Some code waits for a metadata update by:

    1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
    2. setting MD_CHANGE_PENDING and waking the management thread
    3. waiting for MD_CHANGE_PENDING to be cleared

    If the first two are done without locking, the code in md_update_sb()
    which checks if it needs to repeat might test if an update is needed
    before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
    in the wait returning early.

    So make sure all places that set MD_CHANGE_PENDING do so atomically, and
    introduce bit_clear_unless() (suggested by Neil) for the purpose.
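
    A sketch of the intended semantics, written here as a plain function
    rather than the macro form used in the kernel: atomically clear the given
    bits unless any of the "unless" bits are set, and report whether the
    clear happened.

    static inline bool bit_clear_unless(unsigned long *ptr,
                                        unsigned long clear, unsigned long unless)
    {
            unsigned long old, new;

            do {
                    old = READ_ONCE(*ptr);
                    if (old & unless)       /* e.g. MD_CHANGE_DEVS/CLEAN set again */
                            return false;   /* keep MD_CHANGE_PENDING */
                    new = old & ~clear;
            } while (cmpxchg(ptr, old, new) != old);

            return true;
    }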

    Cc: Martin Kepplinger
    Cc: Andrew Morton
    Cc: Denys Vlasenko
    Cc: Sasha Levin
    Cc:
    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

10 Dec, 2015

1 commit

  • ROL on a 32 bit integer with a shift of 32 or more is undefined and the
    result is arch-dependent. Avoid this by handling the trivial case of
    rotating by 0 correctly.

    The trivial solution of checking if shift is 0 breaks gcc's detection
    of this code as a ROL instruction, which is unacceptable.

    This bug was reported and fixed in GCC
    (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

    The standard rotate idiom,

    (x << n) | (x >> (32 - n))

    is recognized by gcc (for concreteness, I discuss only the case that x
    is a uint32_t here).

    However, this is portable C only for n in the range 0 < n < 32. For n
    == 0, we get x >> 32 which gives undefined behaviour according to the
    C standard (6.5.7, Bitwise shift operators). To portably support n ==
    0, one has to write the rotate as something like

    (x << n) | (x >> ((-n) & 31))

    And this is apparently not recognized by gcc.

    Note that older GCC versions do not recognize this form and will generate
    a slower rotate than a native ROL.
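
    A sketch of the resulting rotate helper, using the portable
    complement-shift form quoted above, so a rotate by 0 stays well defined
    while gcc can still recognize the idiom as a ROL:

    static inline __u32 rol32(__u32 word, unsigned int shift)
    {
            /* (-shift) & 31 maps shift==0 to a shift of 0, never 32 */
            return (word << shift) | (word >> ((-shift) & 31));
    }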

    Acked-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

07 Nov, 2015

2 commits

  • Months back this was discussed; see https://lkml.org/lkml/2015/1/18/289
    The result was that the 64-bit version is "likely fine", "valuable" and
    "correct". The discussion then petered out, but since there are possible
    users, let's add it.

    Signed-off-by: Martin Kepplinger
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: George Spelvin
    Cc: Rasmus Villemoes
    Cc: Maxime Coquelin
    Cc: Denys Vlasenko
    Cc: Yury Norov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Kepplinger
     
  • It is often overlooked that sign_extend32(), despite its name, is also
    safe to use for 16- and 8-bit types. This should help prevent sign
    extension from being reimplemented by hand in some other way.
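
    A sketch of what such a helper looks like (shown here with stdint types
    for clarity; the kernel uses its own __u32/__s32 types). The caller
    passes the index of the sign bit, which is why it also works for
    narrower fields:

    static inline int32_t sign_extend32(uint32_t value, int index)
    {
            uint8_t shift = 31 - index;
            return (int32_t)(value << shift) >> shift;
    }

    /* e.g. sign_extend32(0x8123, 15) yields the same value as (int16_t)0x8123 */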

    Signed-off-by: Martin Kepplinger
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: George Spelvin
    Cc: Rasmus Villemoes
    Cc: Maxime Coquelin
    Cc: Denys Vlasenko
    Cc: Yury Norov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Kepplinger
     

05 Aug, 2015

1 commit

  • With this config:

    http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

    gcc-4.7.2 generates many copies of these tiny functions:

    bitmap_weight (55 copies):
    55 push %rbp
    48 89 e5 mov %rsp,%rbp
    e8 3f 3a 8b 00 callq __bitmap_weight
    5d pop %rbp
    c3 retq

    hweight_long (23 copies):
    55 push %rbp
    e8 b5 65 8e 00 callq __sw_hweight64
    48 89 e5 mov %rsp,%rbp
    5d pop %rbp
    c3 retq

    See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

    This patch fixes this via s/inline/__always_inline/.

    While at it, replace two "__inline__" with the usual "inline"
    (the rest of the source file uses the latter).

    text data bss dec filename
    86971357 17195880 36659200 140826437 vmlinux.before
    86971120 17195912 36659200 140826232 vmlinux
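
    The shape of the change, sketched for one of the affected helpers
    (forcing inlining so gcc does not emit standalone copies of the trivial
    wrapper):

    /* before: static inline unsigned long hweight_long(unsigned long w) */
    static __always_inline unsigned long hweight_long(unsigned long w)
    {
            return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
    }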

    Signed-off-by: Denys Vlasenko
    Cc: Andrew Morton
    Cc: David Rientjes
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thomas Graf
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1438697716-28121-1-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

17 Apr, 2015

1 commit

  • This patchset reworks the find_bit function family to achieve better
    performance and decrease text size. All of the rework is done in patch 1;
    patches 2 and 3 are about code moving and renaming.

    It was boot-tested on x86_64 and MIPS (big-endian) machines.
    Performance tests were run in userspace with code like this:

    /* addr[] is filled from /dev/urandom */
    start = clock();
    while (ret < nbits)
            ret = find_next_bit(addr, nbits, ret + 1);

    end = clock();
    printf("%ld\t", (unsigned long) end - start);

    On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are (for
    find_next_bit, nbits is 8M; for find_first_bit, 80K):

    find_next_bit: find_first_bit:
    new current new current
    26932 43151 14777 14925
    26947 43182 14521 15423
    26507 43824 15053 14705
    27329 43759 14473 14777
    26895 43367 14847 15023
    26990 43693 15103 15163
    26775 43299 15067 15232
    27282 42752 14544 15121
    27504 43088 14644 14858
    26761 43856 14699 15193
    26692 43075 14781 14681
    27137 42969 14451 15061
    ... ...

    The find_next_bit performance gain is 35-40%; for find_first_bit there is
    no measurable difference.

    On ARM machines, there is an arch-specific implementation of find_bit.

    Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
    helpful discussions.

    This patch (of 3):

    The new implementation takes less space in the source file (see the
    diffstat) and in the object file: for me it is 710 vs 453 bytes of text.
    It also shows better performance.

    The find_last_bit description is fixed due to an obvious typo.
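
    A rough sketch of the word-at-a-time search the rework is built around
    (illustrative only; names such as BITMAP_FIRST_WORD_MASK follow the
    kernel's bitmap conventions, and the in-tree helper also covers the
    zero-bit and little-endian variants):

    unsigned long find_next_bit(const unsigned long *addr, unsigned long nbits,
                                unsigned long start)
    {
            unsigned long tmp;

            if (start >= nbits)
                    return nbits;

            /* Mask off bits below @start in the first word. */
            tmp = addr[start / BITS_PER_LONG] & BITMAP_FIRST_WORD_MASK(start);
            start = round_down(start, BITS_PER_LONG);

            while (!tmp) {
                    start += BITS_PER_LONG;
                    if (start >= nbits)
                            return nbits;
                    tmp = addr[start / BITS_PER_LONG];
            }

            return min(start + __ffs(tmp), nbits);
    }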

    [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     

16 Nov, 2014

1 commit

  • On some 32-bit architectures, including x86, GENMASK(31, 0) returns 0
    instead of the expected ~0UL.

    The same happens on some 64-bit architectures with GENMASK_ULL(63, 0).

    This is due to an overflow in the shift operand: 1 << 32 for GENMASK,
    1 << 64 for GENMASK_ULL.
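
    One way to avoid the overflow, along the lines of the fix: build the mask
    from two shifts that are each strictly smaller than the type width, so
    GENMASK(31, 0) and GENMASK_ULL(63, 0) yield all-ones without ever
    shifting by 32 or 64:

    #define GENMASK(h, l) \
            (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

    #define GENMASK_ULL(h, l) \
            (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))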

    Reported-by: Eric Paire
    Suggested-by: Rasmus Villemoes
    Signed-off-by: Maxime Coquelin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: # v3.13+
    Cc: linux@rasmusvillemoes.dk
    Cc: gong.chen@linux.intel.com
    Cc: John Sullivan
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Theodore Ts'o
    Fixes: 10ef6b0dffe4 ("bitops: Introduce a more generic BITMASK macro")
    Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.com
    Signed-off-by: Ingo Molnar

    Maxime COQUELIN
     

13 Aug, 2014

1 commit

  • Its been a while and there are no in-tree users left, so remove the
    deprecated barriers.

    Signed-off-by: Peter Zijlstra
    Cc: Chen, Gong
    Cc: Jacob Pan
    Cc: Joe Perches
    Cc: John Sullivan
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Srinivas Pandruvada
    Cc: Theodore Ts'o
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Apr, 2014

1 commit

  • Since the smp_mb__{before,after}*() ops are fundamentally dependent on
    how an arch can implement atomics, it doesn't make sense to have 3
    variants of them. They must all be the same.

    Furthermore, the 3 variants suggest they're only valid for those 3
    atomic ops, while we have many more where they could be applied.

    So move away from
    smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}() and reduce the
    interface to just the two: smp_mb__{before,after}_atomic().

    This patch prepares the way by introducing default implementations in
    asm-generic/barrier.h that default to a full barrier and providing
    __deprecated inlines for the previous 6 barriers if they're not
    provided by the arch.

    This should allow for a mostly painless transition (lots of deprecation
    warnings in the interim).
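
    A sketch of what the transition helpers look like in asm-generic (a
    full-barrier default plus a deprecated wrapper for one of the old names);
    details may differ from the actual patch:

    #ifndef smp_mb__before_atomic
    #define smp_mb__before_atomic()  smp_mb()
    #endif

    #ifndef smp_mb__after_atomic
    #define smp_mb__after_atomic()   smp_mb()
    #endif

    /* old name kept as a deprecated alias during the transition */
    static inline void __deprecated smp_mb__before_clear_bit(void)
    {
            smp_mb__before_atomic();
    }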

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-wr59327qdyi9mbzn6x937s4e@git.kernel.org
    Cc: Arnd Bergmann
    Cc: "Chen, Gong"
    Cc: John Sullivan
    Cc: Linus Torvalds
    Cc: Mauro Carvalho Chehab
    Cc: Srinivas Pandruvada
    Cc: "Theodore Ts'o"
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Mar, 2014

1 commit

  • Use cmpxchg() to atomically set i_flags instead of clearing out the
    S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
    EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
    where an immutable file has the immutable flag cleared for a brief
    window of time.
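
    A sketch of the cmpxchg() loop this describes, written as a plain
    function: the masked flag bits are replaced in a single atomic step, so
    no reader can observe the window where the immutable flag has been
    cleared but not yet re-set.

    static inline unsigned int set_mask_bits(unsigned int *ptr,
                                             unsigned int mask, unsigned int bits)
    {
            unsigned int old, new;

            do {
                    old = READ_ONCE(*ptr);
                    new = (old & ~mask) | bits;
            } while (cmpxchg(ptr, old, new) != old);

            return new;
    }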

    Reported-by: John Sullivan
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     

14 Nov, 2013

1 commit

  • Pull ACPI and power management updates from Rafael J Wysocki:

    - New power capping framework and the Intel Running Average Power
    Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.

    - Addition of the in-kernel switching feature to the arm_big_little
    cpufreq driver from Viresh Kumar and Nicolas Pitre.

    - cpufreq support for iMac G5 from Aaro Koskinen.

    - Baytrail processors support for intel_pstate from Dirk Brandewie.

    - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.

    - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.

    - ACPI power management support for the I2C and SPI bus types from Mika
    Westerberg and Lv Zheng.

    - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
    Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.

    - cpufreq drivers updates (mostly fixes and cleanups) from Viresh
    Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
    Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.

    - intel_pstate updates from Dirk Brandewie and Adrian Huang.

    - ACPICA update to version 20130927 including fixes and cleanups and
    some reduction of divergences between the ACPICA code in the kernel
    and ACPICA upstream in order to improve the automatic ACPICA patch
    generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
    Bhat, Bjorn Helgaas, David E Box.

    - ACPI IPMI driver fixes and cleanups from Lv Zheng.

    - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
    Yanfei, Rafael J Wysocki.

    - Conversion of the ACPI AC driver to the platform bus type and
    multiple driver fixes and cleanups related to ACPI from Zhang Rui.

    - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
    Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.

    - Fixes and cleanups and new blacklist entries related to the ACPI
    video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
    Kirill Tkhai.

    - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.

    - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
    Bartlomiej Zolnierkiewicz, Prarit Bhargava.

    - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.

    - Operating Performance Points (OPP) core updates from Nishanth Menon.

    - Runtime power management core fix from Rafael J Wysocki and update
    from Ulf Hansson.

    - Hibernation fixes from Aaron Lu and Rafael J Wysocki.

    - Device suspend/resume lockup detection mechanism from Benoit Goby.

    - Removal of unused proc directories created for various ACPI drivers
    from Lan Tianyu.

    - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
    handler from Heikki Krogerus and Jarkko Nikula.

    - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.

    - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
    Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
    Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
    Liu Chuansheng.

    - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
    Jean-Christophe Plagniol-Villard.

    * tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
    cpufreq: conservative: fix requested_freq reduction issue
    ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
    PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
    ACPI / event: remove unneeded NULL pointer check
    Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
    ACPI / video: Quirk initial backlight level 0
    ACPI / video: Fix initial level validity test
    intel_pstate: skip the driver if ACPI has power mgmt option
    PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
    ACPI / hotplug: Do not execute "insert in progress" _OST
    ACPI / hotplug: Carry out PCI root eject directly
    ACPI / hotplug: Merge device hot-removal routines
    ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
    ACPI / hotplug: Simplify device ejection routines
    ACPI / hotplug: Fix handle_root_bridge_removal()
    ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
    ACPI / scan: Start matching drivers after trying scan handlers
    ACPI: Remove acpi_pci_slot_init() headers from internal.h
    ACPI / blacklist: fix name of ThinkPad Edge E530
    PowerCap: Fix build error with option -Werror=format-security
    ...

    Conflicts:
    arch/arm/mach-omap2/opp.c
    drivers/Kconfig
    drivers/spi/spi.c

    Linus Torvalds
     

22 Oct, 2013

1 commit

  • GENMASK is used to create a contiguous bitmask ([hi:lo]). It is
    implemented twice in the current kernel: once in the EDAC driver and
    once in the SiS/XGI FB driver. Move it to a more generic place for
    other users.
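
    Usage is straightforward; for example (illustrative names and values):

    #define MY_FIELD_MASK   GENMASK(21, 12)   /* bits [21:12] set: 0x003ff000 */
    #define MY_FLAG_MASK    GENMASK(7, 7)     /* a single bit works too */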

    Signed-off-by: Chen, Gong
    Cc: Borislav Petkov
    Cc: Thomas Winischhofer
    Cc: Jean-Christophe Plagniol-Villard
    Cc: Tomi Valkeinen
    Acked-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Tony Luck

    Chen, Gong
     

17 Oct, 2013

1 commit

  • Add a BIT(x) equivalent for the unsigned long long type, BIT_ULL(x).
    Also add BIT_ULL_MASK and BIT_ULL_WORD.
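
    A sketch of the new macros, mirroring the existing
    BIT()/BIT_MASK()/BIT_WORD() definitions but using unsigned long long
    arithmetic:

    #define BIT_ULL(nr)          (1ULL << (nr))
    #define BIT_ULL_MASK(nr)     (1ULL << ((nr) % BITS_PER_LONG_LONG))
    #define BIT_ULL_WORD(nr)     ((nr) / BITS_PER_LONG_LONG)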

    Suggested-by: Joe Perches
    Signed-off-by: Srinivas Pandruvada
    Signed-off-by: Jacob Pan
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     

24 Mar, 2012

3 commits

  • Introduce for_each_clear_bit() and for_each_clear_bit_from(). They are
    similar to for_each_set_bit() and for_each_set_bit_from(), but they
    iterate over all the cleared bits in a memory region.
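
    A sketch of the new iterators, built on the existing
    find_first_zero_bit()/find_next_zero_bit() helpers the same way
    for_each_set_bit() is built on find_first_bit()/find_next_bit():

    #define for_each_clear_bit(bit, addr, size) \
            for ((bit) = find_first_zero_bit((addr), (size));       \
                 (bit) < (size);                                    \
                 (bit) = find_next_zero_bit((addr), (size), (bit) + 1))

    #define for_each_clear_bit_from(bit, addr, size) \
            for ((bit) = find_next_zero_bit((addr), (size), (bit)); \
                 (bit) < (size);                                    \
                 (bit) = find_next_zero_bit((addr), (size), (bit) + 1))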

    Signed-off-by: Akinobu Mita
    Cc: Robert Richter
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Stefano Panella
    Cc: David Vrabel
    Cc: Sergei Shtylyov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Remove for_each_set_bit_cont() after confirming that no one uses
    for_each_set_bit_cont() anymore.

    [sfr@canb.auug.org.au: regmap: cope with bitops API change]
    Signed-off-by: Akinobu Mita
    Signed-off-by: Stephen Rothwell
    Cc: Robert Richter
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Mark Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This renames for_each_set_bit_cont() to for_each_set_bit_from() because
    it is analogous to list_for_each_entry_from() in list.h rather than
    list_for_each_entry_continue().

    This doesn't remove for_each_set_bit_cont() for now.

    Signed-off-by: Akinobu Mita
    Cc: Robert Richter
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

19 Feb, 2012

1 commit


16 Feb, 2012

1 commit

  • Use a standard ror64() instead of a hand-written one.
    There is no standard ror64, so create it.

    The difference is that the shift value is an "unsigned int" rather than a
    uint64_t (for which there is no reason). gcc then starts to emit native
    ROR instructions, which it currently does not for some reason. This
    should make the code faster.

    Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
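
    A sketch of the helper, following the same pattern as the existing
    rol64()/ror32() helpers (valid for 0 < shift < 64, like the rest of the
    family at the time):

    static inline __u64 ror64(__u64 word, unsigned int shift)
    {
            return (word >> shift) | (word << (64 - shift));
    }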

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     

06 Dec, 2011

1 commit

  • This patch introduces x86 perf scheduler code helper functions. We
    need this to later add more complex functionality to support
    overlapping counter constraints (next patch).

    The algorithm is modified so that the range of weight values is now
    generated from the constraints. There shouldn't be other functional
    changes.

    The scheduler is controlled through helper functions: there are
    functions to initialize it, traverse the event list, find unused
    counters, etc. The scheduler keeps its own state.

    V3:
    * Added macro for_each_set_bit_cont().
    * Changed the function interfaces of perf_sched_find_counter() and
    perf_sched_next_event() to use bool as the return value.
    * Added some comments to make code better understandable.

    V4:
    * Fix broken event assignment if weight of the first event is not
    wmin (perf_sched_init()).
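
    A sketch of the for_each_set_bit_cont() macro mentioned in the V3 notes:
    unlike for_each_set_bit(), it continues from the caller-supplied value of
    bit rather than starting at zero.

    #define for_each_set_bit_cont(bit, addr, size) \
            for ((bit) = find_next_bit((addr), (size), (bit)); \
                 (bit) < (size);                               \
                 (bit) = find_next_bit((addr), (size), (bit) + 1))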

    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com
    Signed-off-by: Ingo Molnar

    Robert Richter
     

27 May, 2011

2 commits

  • After the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
    CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are no
    longer used to test for the existence of the find bitops.

    Signed-off-by: Akinobu Mita
    Acked-by: Greg Ungerer
    Cc: Arnd Bergmann
    Cc: Russell King
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The style that we normally use in asm-generic is to test the macro itself
    for existence, so in asm-generic, do:

    #ifndef find_next_zero_bit_le
    extern unsigned long find_next_zero_bit_le(const void *addr,
                    unsigned long size, unsigned long offset);
    #endif

    and in the architectures, write

    static inline unsigned long find_next_zero_bit_le(const void *addr,
                    unsigned long size, unsigned long offset)
    #define find_next_zero_bit_le find_next_zero_bit_le

    This adds the #ifndef for each of the find bitops in the generic header
    and source files.

    Suggested-by: Arnd Bergmann
    Signed-off-by: Akinobu Mita
    Acked-by: Russell King
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

16 Nov, 2010

1 commit


10 Oct, 2010

2 commits

  • If CONFIG_GENERIC_FIND_NEXT_BIT is enabled, find_next_bit() and
    find_next_zero_bit() are doubly declared in asm-generic/bitops/find.h
    and linux/bitops.h.

    asm/bitops.h includes asm-generic/bitops/find.h if and only if the
    architecture enables CONFIG_GENERIC_FIND_NEXT_BIT. And asm/bitops.h
    is included by linux/bitops.h.

    So we can just remove the extern declarations of find_next_bit() and
    find_next_zero_bit() in linux/bitops.h.

    Also we can remove unneeded #ifndef CONFIG_GENERIC_FIND_NEXT_BIT in
    asm-generic/bitops/find.h.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Arnd Bergmann

    Akinobu Mita
     
  • asm-generic/bitops/find.h has the extern declarations of find_next_bit()
    and find_next_zero_bit() and the macro definitions of find_first_bit()
    and find_first_zero_bit(). It is only usable by the architectures which
    enable CONFIG_GENERIC_FIND_NEXT_BIT and disable
    CONFIG_GENERIC_FIND_FIRST_BIT.

    x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and
    CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include
    asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern
    declarations of find_first_bit and find_first_zero_bit() are put in
    linux/bitops.h.

    This makes asm-generic/bitops/find.h usable by these architectures,
    so use it there. This change is also needed for the forthcoming cleanup
    of duplicated extern declarations.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Arnd Bergmann
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Cc: Chris Metcalf

    Akinobu Mita
     

19 May, 2010

1 commit


05 May, 2010

1 commit

  • Fix function prototype visibility issues when compiling for non-x86
    architectures. Tested with crosstool
    (ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc
    targets.

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
     

07 Apr, 2010

2 commits


07 Mar, 2010

1 commit

  • Rename for_each_bit to for_each_set_bit in the kernel source tree, to
    permit for_each_clear_bit(), should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

04 Feb, 2010

1 commit


29 Jan, 2010

1 commit


23 Apr, 2009

1 commit

  • Find the first set bit in a 64-bit word. This is required in order
    to fix a bug in GFS2, but I think it should be a generic function
    in case there are future users.
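
    Assuming the helper is the __ffs64()-style function this describes, a
    sketch of how it can be built on the existing word-sized __ffs(); on
    64-bit architectures it degenerates to a plain __ffs():

    /* Undefined if @word is 0, like __ffs(). */
    static inline unsigned long __ffs64(u64 word)
    {
    #if BITS_PER_LONG == 32
            if (((u32)word) == 0UL)
                    return __ffs((u32)(word >> 32)) + 32;
    #endif
            return __ffs((unsigned long)word);
    }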

    Signed-off-by: Steven Whitehouse
    Reviewed-by: Christoph Lameter
    Reviewed-by: Willy Tarreau

    Steven Whitehouse
     

01 Jan, 2009

1 commit


29 Apr, 2008

2 commits

  • The mapsize optimizations which were moved from x86 to the generic
    code in commit 64970b68d2b3ed32b964b0b30b1b98518fde388e increased the
    binary size on non-x86 architectures.

    Looking into the real effects of the "optimizations" it turned out
    that they are not used in find_next_bit() and find_next_zero_bit().

    The ones in find_first_bit() and find_first_zero_bit() are used in a
    couple of places but none of them is a real hot path.

    Remove the "optimizations" all together and call the library functions
    unconditionally.

    Boot-tested on x86 and compile tested on every cross compiler I have.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • BITS_PER_LONG is a signed value (32 or 64)

    DIV_ROUND_UP(nr, BITS_PER_LONG) performs signed arithmetic if "nr" is signed too.

    Converting BITS_TO_LONGS(nr) to DIV_ROUND_UP(nr, BITS_PER_BYTE *
    sizeof(long)) makes sure the compiler can perform a right shift instead
    of an expensive integer divide, even if "nr" is a signed value.

    Applying this patch saves 141 bytes on x86 when
    CONFIG_CC_OPTIMIZE_FOR_SIZE=y and speeds up bitmap operations.
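
    The shape of the change, sketched:

    /* before: BITS_PER_LONG is a plain signed constant (32 or 64) */
    #define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_LONG)

    /* after: sizeof() yields an unsigned value, so the division of a
     * possibly-signed nr becomes unsigned and compiles to a shift */
    #define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))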

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

27 Apr, 2008

3 commits

  • Avoid a call to find_first_bit if the bitmap size is known at
    compile time and small enough to fit in a single long integer.
    Modeled after an optimization in the original x86_64-specific
    code.
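
    A sketch of the pattern (names such as __find_first_bit are illustrative
    here): when the size is a compile-time constant no larger than one long,
    the search reduces to __ffs() on a single word, with a sentinel bit
    providing the "not found" return value of size.

    static __always_inline unsigned long
    find_first_bit(const unsigned long *addr, unsigned long size)
    {
            if (__builtin_constant_p(size) && size < BITS_PER_LONG)
                    /* The sentinel bit makes __ffs() return @size when
                     * no bit below @size is set. */
                    return __ffs(*addr | (1UL << size));

            return __find_first_bit(addr, size);    /* out-of-line library call */
    }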

    Signed-off-by: Alexander van Heukelum
    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     
  • Generic versions of __find_first_bit and __find_first_zero_bit
    are introduced as simplified versions of __find_next_bit and
    __find_next_zero_bit. Their compilation and use are guarded by
    a new config variable GENERIC_FIND_FIRST_BIT.

    The generic versions of find_first_bit and find_first_zero_bit
    are implemented in terms of the newly introduced __find_first_bit
    and __find_first_zero_bit.

    This patch does not remove the i386-specific implementation,
    but it does switch i386 to use the generic functions by setting
    GENERIC_FIND_FIRST_BIT=y for X86_32.

    Signed-off-by: Alexander van Heukelum
    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     
  • This moves an optimization for searching constant-sized small
    bitmaps from x86_64-specific to generic code.

    On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
    changes with this applied. I have observed only four places where this
    optimization avoids a call into find_next_bit:

    In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
    and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
    In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

    On x86_64, 52 locations are optimized with a minimal increase in
    code size:

    Current #testing defconfig:
    146 x bsf, 27 x find_next_*bit
    text data bss dec hex filename
    5392637 846592 724424 6963653 6a41c5 vmlinux

    After removing the x86_64 specific optimization for find_next_*bit:
    94 x bsf, 79 x find_next_*bit
    text data bss dec hex filename
    5392358 846592 724424 6963374 6a40ae vmlinux

    After this patch (making the optimization generic):
    146 x bsf, 27 x find_next_*bit
    text data bss dec hex filename
    5392396 846592 724424 6963412 6a40d4 vmlinux

    [ tglx@linutronix.de: build fixes ]

    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     

29 Mar, 2008

1 commit