[clue-tech] 2.6 kernel compile options make Intel CPU slow

Jim Ockers ockers at ockers.net
Wed Aug 4 14:31:40 MDT 2010


Hi Dave,

David L. Anselmi wrote:
> Jim Ockers wrote:
>   
>> It seems that the slowness (as indicated by the 100GB "dd if=/dev/zero
>> of=/dev/null" test) occurs on a system with Intel P4 CPU whenever
>> "hyperthreading" is enabled in the BIOS.
>>     
>
> Did you ever say which 2.6 version you're using?  Can you experiment with other versions?  Can you 
> narrow down your kernel config choices based on those related to hyperthreading?
>
> This sounds like a bug or "not quite finished" feature.
>   
We are using the CentOS 5.2 kernel, which is 2.6.18-92.1.22.el5, more or 
less. There is a CONFIG_X86_HT option, but it is not clear where in "make 
menuconfig" it comes from. It goes away if CONFIG_SMP is unset.

There is also a CONFIG_SCHED_SMT option, which seems like a good idea to 
set whenever CONFIG_X86_HT is set.
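
For anyone who wants to check these three options on a running system 
without going through "make menuconfig", here is a minimal sketch. It 
assumes a modern Python 3 interpreter (not the Python that ships with 
CentOS 5.2) and that the distribution installs the kernel config as 
/boot/config-<release>. The function name parse_kernel_config is only 
an illustration.

```python
import platform
from pathlib import Path

# The three options discussed in this thread.
OPTIONS = ("CONFIG_SMP", "CONFIG_X86_HT", "CONFIG_SCHED_SMT")


def parse_kernel_config(text, options=OPTIONS):
    """Return {option: value}; options that are not enabled map to "unset"."""
    found = {name: "unset" for name in options}
    for line in text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines and blank lines leave the default.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name in found:
            found[name] = value
    return found


if __name__ == "__main__":
    # Assumed location of the installed kernel config.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        for name, value in parse_kernel_config(path.read_text()).items():
            print(f"{name}={value}")
    else:
        print(f"no config found at {path}")
```

Running this on each candidate kernel makes it easy to compare a "slow" 
build against a "fast" one for just the hyperthreading-related settings.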
>   
>> So my new questions for the group are:
>>
>> 1. Is the 100GB dd test flawed? Or is it indicating some actual slowness
>> in the memory/CPU bandwidth? Or maybe some kernel inefficiency? Recall
>> that these are all "fast" with the 2.4 Linux kernel.
>>     
>
> Perhaps this test demonstrates some contention between multiple "hyperthreads".  If 2.4 doesn't 
> support hyperthreading it wouldn't show up.  If you're exercising a shared resource it would make 
> sense that handing that resource off to a thread that you're not measuring would impact your results.
>
> This article looks interesting (don't know if it helps you) http://linuxgazette.net/103/pramode.html
>   
That was in fact VERY interesting, thank you. I learned a lot and also 
confirmed a lot from reading it.
>   
>> 2. Does "fastness/slowness" depend on the system workload? Would
>> hyperthreading be more efficient in an interactive desktop application
>> than for a background server type application with LAMP? Or does it not
>> matter?
>>     
>
> Obviously.  Didn't you say that dd is slow on a core duo but the desktop is faster?  So dd might be 
> good for finding an unexpected bottleneck, but it might not represent your performance concerns very 
> well.
>
>   
Yes, exactly.
>> 3. What's up with all the Intel systems (except for Xeon) seeming slow
>> but the AMD systems are fast?
>>     
>
> Uh...  The AMDs don't hyperthread (at least not the same way that Intel does).
>   
The AMD systems all seem to be true SMP, as does the P4 Xeon, and all 
of those are "fast."
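
One way to tell true SMP apart from hyperthreading on a given box is to 
compare the "siblings" and "cpu cores" fields in /proc/cpuinfo. The 
sketch below assumes an x86 kernel that reports both fields and treats 
"more siblings than cores" as hyperthreading being active. The function 
names are hypothetical, and a non-SMP kernel may not report these 
fields at all, in which case the check returns False.

```python
import os


def parse_cpuinfo(text):
    """Return a list of dicts, one per logical CPU block in /proc/cpuinfo."""
    cpus = []
    # Logical CPUs are separated by blank lines.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def hyperthreading_active(text):
    """True if any logical CPU reports more siblings than physical cores."""
    for cpu in parse_cpuinfo(text):
        # Missing fields default to 1, so a kernel that omits them reads as "no HT".
        siblings = int(cpu.get("siblings", 1))
        cores = int(cpu.get("cpu cores", 1))
        if siblings > cores:
            return True
    return False


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("hyperthreading active:", hyperthreading_active(f.read()))
```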
>   
>> 4. What might be a better test than this dd test to expose system
>> performance issues?
>>     
>
> You might look at the O'Reilly performance tuning book (but it may be somewhat dated).  Why do you 
> think you have "performance issues?"  You have performance.  It varies on different hardware. 
> Measure it and find the hardware that works best for you.  If you're writing your own code, it may 
> have performance issues (you used an O(^n) algorithm rather than an O(1) algorithm) but you'd use a 
> profiler to find those (but they might be worse on different kinds of hardware).
>   
Thanks for the suggestions; I will investigate those further!

>> 5. Should we leave hyperthreading on or turn it off? We want to try to
>> use the same kernel for everything if possible (One Kernel To Rule Them
>> All..)
>>     
>
> Depends how many workloads you have and how closely they perform across your kernel and hardware 
> choices.  Seems like you're in for lots of measuring (unless you can understand how kernel configs 
> affect your workloads).
>
>   
>> Recall that the origin of this dd test was to expose the "invisible"
>> thermal throttling that Intel CPUs use to protect themselves from
>> overheating.
>>     
>
> Which has nothing to do with "performance issues" but rather cooling issues, right?  I assume you're 
> trying to determine what cooling is needed for various configurations.  It doesn't sound like "dd is 
> too slow" is something you need to worry about, except that it interferes with your ability to 
> detect cooling problems.
>   
The problem is that when the cooling fails (a failed fan or a clogged 
heat sink), we take a significant performance hit, so we need to know 
right away when that happens, and we don't always get good information 
about the cooling. The first thing we see is slow system performance, 
but it manifests itself in a variety of weird and non-obvious ways. We 
have determined that we need maximum performance and nothing less is OK, 
so we were trying to detect failures right away.
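
The detection idea can be sketched as "time a fixed copy, then compare 
against a known-good baseline for the same kernel and hardware". This 
is only an illustration, not the exact test we ran: it assumes Python 3 
and copies in 512-byte blocks to mirror dd's default block size (the 
original command gave no bs= value). The names measure_throughput and 
is_throttled and the 0.8 tolerance are made up for the example.

```python
import time


def measure_throughput(total_bytes=1 << 30, block_size=512,
                       src="/dev/zero", dst="/dev/null"):
    """Copy total_bytes from src to dst and return the rate in bytes/second."""
    copied = 0
    start = time.perf_counter()
    # Unbuffered I/O so each block is one read() and one write() syscall,
    # which is roughly what dd does.
    with open(src, "rb", buffering=0) as fin, \
            open(dst, "wb", buffering=0) as fout:
        while copied < total_bytes:
            chunk = fin.read(min(block_size, total_bytes - copied))
            if not chunk:
                break  # source ran out early
            fout.write(chunk)
            copied += len(chunk)
    elapsed = time.perf_counter() - start
    return copied / elapsed if elapsed > 0 else float("inf")


def is_throttled(rate, baseline, tolerance=0.8):
    """Flag a run whose rate falls below tolerance * baseline."""
    return rate < tolerance * baseline
```

The baseline has to be recorded per kernel configuration, since this 
thread shows the same copy running at very different speeds with and 
without hyperthreading; a single fixed threshold across configurations 
would raise false alarms.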
> So you can use a 2.4 kernel for cooling measurements.
>   
A 2.6 kernel with SMP turned off behaves the same, scheduler-wise, as 
the 2.4 kernel (which I'm sure was not an SMP kernel either) on the 
hyperthreading P4.
> Can you measure/detect CPU temperature (which ought to relate somehow to throttling)?
>   
Yes, but not reliably enough on our 2004-vintage hardware.
> Can you measure power consumption (which ought to decrease when throttling)?
>   
Good idea, but unfortunately not on our old hardware.
> Do you understand your real workloads well enough to configure for them?
>   
We will investigate this further.
> Sorry I don't have any silver bullets but I hope that helps.
>   
It does, thank you very much!

Jim.
