Phenomenon
The TSC clocksource is disabled in the guest. From the guest boot log:
[ 0.000004] tsc: Detected 1999.999 MHz processor
Steps
- Run lscpu | grep tsc in the guest to confirm that the CPU supports TSC.
- Check with dmidecode that the VM is not running on IBM's Summit2.
- Run lscpu | grep constant_tsc in both guest and host to confirm that the flag is missing only in the guest.
- Change the kernel cmdline and add tsc=reliable if needed (check the current value with cat /proc/cmdline).
- Check the CPU vendor with lscpu to see whether an Intel CPU is in use.
- Remove acpi from the libvirt XML. Note that this makes all hot-plug operations fail.
- Switch to a different AMD CPU: the tsc clocksource still could not be found.
A Linux kernel running on a CPU whose vendor is not Intel will hit this issue.
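The flag checks above can also be scripted. The following is a minimal sketch in C, assuming a Linux guest with the usual /proc/cpuinfo layout and the standard sysfs clocksource path; the function names are made up for illustration. It matches whole flag tokens, which matters because a plain grep tsc also matches constant_tsc and rdtscp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` is a whole whitespace-separated token in `line`. */
int flags_line_has(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL);
    size_t flag_len = strlen(flag);
    if (flag_len == 0)
        return 0;

    const char *p = line;
    while (*p) {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t tok_len = strcspn(p, " \t\n"); /* length of the next token */
        if (tok_len == flag_len && strncmp(p, flag, flag_len) == 0)
            return 1;
        p += tok_len;
    }
    return 0;
}

/* Check the first "flags" line of a cpuinfo-style file.
 * Returns 1 if the flag is present, 0 if absent, -1 if the file cannot be read. */
int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[8192]; /* the flags line is long; 8 KiB is an assumed upper bound */
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = flags_line_has(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}

/* Read the active clocksource name into buf. Returns 0 on success, -1 on error. */
int read_current_clocksource(char *buf, size_t len)
{
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return 0;
}
```

Calling cpuinfo_has_flag("/proc/cpuinfo", "constant_tsc") in the guest and on the host, then read_current_clocksource() in the guest, reproduces the manual checks.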
Resolution
method 1:
Add tsc=reliable to the guest OS Linux kernel cmdline.
method 2:
Remove acpi from the libvirt XML with virsh edit.
More information
constant_tsc is disabled for x86 guests in code; see the discussion: https://lore.kernel.org/qemu-devel/20210426185155.GA32439@fuller.cnet/t/
However, the Linux guest kernel still lets Intel CPUs use the tsc clocksource, and as a result Intel and AMD CPUs behave differently, as described below.
go through kernel code
Code snippet 0
1 | /* |
The code defines a static integer variable tsc_unstable. The static keyword means that this variable is only visible within the file it is defined in, limiting its scope to tsc.c. This is a common practice when you want to keep a variable private to a particular file and prevent it from being accessed directly from other files.
The __read_mostly attribute is a Linux kernel annotation for variables that are read often but seldom written. The kernel places such variables in a dedicated data section so that they do not share cache lines with frequently written data.
The variable tsc_unstable is used to indicate whether the TSC is unstable. The comment above the variable declaration explains that the TSC can be unstable due to CPU frequency scaling (cpufreq) or due to unsynchronized TSCs across different cores or processors.
In terms of improvements, the code is clear and concise. However, it might be helpful to expand the comment to explain what the implications of an unstable TSC are, and how the tsc_unstable variable is used in the system's logic.
Code snippet 1
1 | static int __init tsc_setup(char *str) |
The __init keyword before the function name is a macro used in the Linux kernel to indicate that this function is used only at initialization time. The memory for this function is freed after the initialization is complete to conserve the kernel’s memory footprint.
Inside the function, there are several if statements that compare the input string str to different string literals using the strcmp and strncmp functions. strcmp compares two strings and returns 0 if they are identical, while strncmp compares up to a specified number of characters from two strings.
If str is “reliable”, it sets the tsc_clocksource_reliable variable to 1. If str starts with “noirqtime”, it sets the no_sched_irq_time variable to 1. If str is “unstable”, it calls the mark_tsc_unstable function with “boot parameter” as an argument. If str is “nowatchdog”, it sets the no_tsc_watchdog variable to 1.
Each of these variables or functions presumably controls a different aspect of the TSC’s behavior. For example, tsc_clocksource_reliable might indicate whether the TSC is a reliable source of time, and no_sched_irq_time might control whether the scheduler uses the TSC for interrupt timing.
In terms of improvements, the function is quite clear and concise. However, adding comments to explain the purpose of each variable and what each string argument represents would improve readability. It would also be beneficial to add error handling to account for the case where str does not match any of the expected values.
So adding tsc=reliable to the kernel cmdline lets the guest use the tsc clocksource.
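The option parsing described above can be modelled in userspace. This is only a sketch that follows the prose, not the kernel source: struct tsc_opts, tsc_setup_model and apply_cmdline are hypothetical names, and the struct fields stand in for the kernel variables.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for the kernel variables named in the text. */
struct tsc_opts {
    int clocksource_reliable; /* models tsc_clocksource_reliable */
    int no_sched_irq_time;    /* models no_sched_irq_time */
    int marked_unstable;      /* models the effect of mark_tsc_unstable() */
    int no_watchdog;          /* models no_tsc_watchdog */
};

/* Apply one "tsc=" value. Returns 1 if it was recognised, 0 otherwise. */
int tsc_setup_model(const char *str, struct tsc_opts *o)
{
    assert(str != NULL && o != NULL);
    if (strcmp(str, "reliable") == 0) {
        o->clocksource_reliable = 1;
        return 1;
    }
    if (strncmp(str, "noirqtime", 9) == 0) { /* prefix match, as in the text */
        o->no_sched_irq_time = 1;
        return 1;
    }
    if (strcmp(str, "unstable") == 0) {
        o->marked_unstable = 1;
        return 1;
    }
    if (strcmp(str, "nowatchdog") == 0) {
        o->no_watchdog = 1;
        return 1;
    }
    return 0; /* unknown value: reported to the caller instead of ignored */
}

/* Scan a /proc/cmdline-style string and apply every "tsc=" token.
 * Returns the number of recognised tsc= options. */
int apply_cmdline(const char *cmdline, struct tsc_opts *o)
{
    assert(cmdline != NULL && o != NULL);
    int recognised = 0;
    const char *p = cmdline;

    while (*p) {
        p += strspn(p, " \t\n");          /* skip separators */
        size_t len = strcspn(p, " \t\n"); /* length of this token */
        if (len > 4 && len - 4 < 32 && strncmp(p, "tsc=", 4) == 0) {
            char value[32];
            memcpy(value, p + 4, len - 4);
            value[len - 4] = '\0';
            recognised += tsc_setup_model(value, o);
        }
        p += len;
    }
    return recognised;
}
```

Feeding it the contents of /proc/cmdline shows whether tsc=reliable is already in effect for the running guest.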
Code snippet 2
1 | static void tsc_cs_mark_unstable(struct clocksource *cs) |
The function first checks if the tsc_unstable variable is already set to 1. If it is, the function immediately returns, as the TSC has already been marked as unstable. This is a common pattern in C programming known as a “guard clause”, which is used to exit a function early when certain conditions are met.
If tsc_unstable is not set to 1, the function proceeds to mark the TSC as unstable. It does this by setting tsc_unstable to 1, and then calling two functions: clear_sched_clock_stable and disable_sched_clock_irqtime. These functions presumably perform some cleanup or configuration changes related to the TSC becoming unstable.
Finally, the function logs a message using the pr_info macro, which is a kernel print function that outputs a message to the system log. The message indicates that the TSC has been marked as unstable due to the clocksource watchdog.
In terms of improvements, the function is clear and concise. Short comments explaining what clear_sched_clock_stable and disable_sched_clock_irqtime do would improve readability. Both helpers return void, so there is no error to handle at the call sites.
Code snippet 3
1 | /* |
The function begins by checking whether the boot CPU lacks the TSC feature or the TSC has already been marked unstable. If either condition is true, the function immediately returns 1, indicating that the TSC is unsynchronized.
Next, if the kernel is built for symmetric multiprocessing (SMP), the function checks whether the Advanced Programmable Interrupt Controller (APIC) is clustered. If it is, the function returns 1, again indicating that the TSC is unsynchronized.
The function then checks whether the boot CPU has the constant TSC feature or the TSC clocksource is marked reliable. If either condition is true, the function returns 0, indicating that the TSC is synchronized.
Finally, the function checks the CPU vendor. If the vendor is not Intel and the system has more than one possible CPU, the function returns 1, indicating that the TSC is unsynchronized. Otherwise it returns 0, indicating that the TSC is synchronized. This last check is why a guest with more than one CPU on a non-Intel vendor, without constant_tsc, loses the tsc clocksource unless tsc=reliable is set.
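The decision sequence can be written down as a small userspace model. This is a sketch, not the kernel function: struct tsc_env and its fields are hypothetical stand-ins for the kernel state, and the first check is assumed to fire when the TSC feature is missing.

```c
#include <assert.h>
#include <stddef.h>

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical snapshot of the state the kernel function inspects. */
struct tsc_env {
    int has_tsc;              /* boot CPU has the TSC feature */
    int tsc_unstable;         /* TSC already marked unstable */
    int smp;                  /* kernel built for SMP */
    int apic_clustered;       /* clustered APIC box */
    int has_constant_tsc;     /* constant_tsc CPU flag */
    int clocksource_reliable; /* tsc=reliable was given */
    enum cpu_vendor vendor;
    int possible_cpus;
};

/* Returns 1 if the TSC is treated as unsynchronized, 0 otherwise. */
int unsynchronized_tsc_model(const struct tsc_env *e)
{
    assert(e != NULL);
    if (!e->has_tsc || e->tsc_unstable)
        return 1;
    if (e->smp && e->apic_clustered)
        return 1;
    if (e->has_constant_tsc || e->clocksource_reliable)
        return 0; /* tsc=reliable short-circuits the vendor check below */
    if (e->vendor != VENDOR_INTEL && e->possible_cpus > 1)
        return 1; /* non-Intel multi-CPU systems are assumed unsynchronized */
    return 0;
}
```

With has_constant_tsc cleared and four possible CPUs, the model returns 1 for VENDOR_AMD and 0 for VENDOR_INTEL; setting clocksource_reliable makes both return 0, which matches method 1 in the Resolution section.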
More practice
SystemTap
Because of the issue above, I spent some more time checking, with SystemTap, whether the TSC values seen by the guest differ from those on the host CPU.
observe rdtsc result
- The TSC values differ: both the average value and the standard deviation are different.
- The value from the guest OS is not stable when compared with the host.
- During live migration, the TSC value is smaller than usual (I think this is because live migration has downtime, so anything using the TSC has to tolerate it).
- So, judging from this small test, it is not a good idea to rely on the guest TSC, which is not as precise as it is on the host.
data from my test
The first version of the script measures the average value and the standard deviation.
in guest:
TSC mean: 2000170717.800000, TSC std dev: 255861.233545
in guest during live migration:
TSC mean: 1990107194.600000, TSC std dev: 71113321.983521
Samples from host:
TSC mean: 2000087563.600000, TSC std dev: 16626.290598
The TSC average value is lower than normal during migration.
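For reference, the mean and standard deviation of a set of TSC deltas can be computed as sketched below. This is not the script that produced the numbers above; it assumes the sample standard deviation (dividing by n - 1), and sqrt_newton is a small helper used only to keep the sketch free of libm (sqrt from math.h would do the same job).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Square root by Newton's method; returns 0 for non-positive input. */
double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 100; i++) {
        double next = 0.5 * (guess + x / guess);
        if (next == guess)
            break; /* converged to machine precision */
        guess = next;
    }
    return guess;
}

/* Compute mean and sample standard deviation of n TSC deltas.
 * Returns 0 on success, -1 if fewer than two samples are given. */
int tsc_stats(const uint64_t *samples, size_t n, double *mean, double *stddev)
{
    assert(samples != NULL && mean != NULL && stddev != NULL);
    if (n < 2)
        return -1;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];
    double m = sum / (double)n;

    double sq = 0.0; /* sum of squared deviations from the mean */
    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - m;
        sq += d * d;
    }

    *mean = m;
    *stddev = sqrt_newton(sq / (double)(n - 1));
    return 0;
}
```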
Then I changed the script to check for abnormal samples:
Sample 54, TSC diff: 1998893560, Time diff: 1000069363 ns
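One way to flag such a sample is to convert each TSC delta and time delta into an effective rate and compare it with a nominal rate. The sketch below assumes this approach; it is not the check the script above used, and the nominal rate and tolerance are parameters the caller has to choose.

```c
#include <assert.h>
#include <stdint.h>

/* Effective TSC rate in Hz observed over one sample window. */
double effective_tsc_hz(uint64_t tsc_diff, uint64_t time_diff_ns)
{
    assert(time_diff_ns > 0);
    return (double)tsc_diff * 1e9 / (double)time_diff_ns;
}

/* Returns 1 if the observed rate deviates from nominal_hz by more than
 * rel_tol (a fraction, e.g. 5e-4 for 500 ppm), 0 otherwise. */
int is_abnormal_sample(uint64_t tsc_diff, uint64_t time_diff_ns,
                       double nominal_hz, double rel_tol)
{
    double hz = effective_tsc_hz(tsc_diff, time_diff_ns);
    double dev = (hz - nominal_hz) / nominal_hz;

    if (dev < 0.0)
        dev = -dev; /* absolute relative deviation */
    return dev > rel_tol;
}
```

Taking the guest mean reported above (2000170717.8) as the nominal rate and an assumed tolerance of 500 ppm, sample 54 is flagged, because the time delta is slightly longer than one second while the TSC delta is lower than the mean.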
just paste my test code:
1 |
|
more links for reading
TSC
Time Stamp Counter (TSC)
All 80x86 microprocessors include a CLK input pin, which receives the clock signal of an external oscillator. Starting with the Pentium, 80x86 microprocessors sport a counter that is increased at each clock signal, and is accessible through the TSC register which can be read by means of the rdtsc assembly instruction. When using this register the kernel has to take into consideration the frequency of the clock signal: if, for instance, the clock ticks at 1 GHz, the TSC is increased once every nanosecond. Linux may take advantage of this register to get much more accurate time measurements.
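The frequency dependence can be made concrete with a small conversion helper. This is an illustrative sketch with a made-up function name, assuming a fixed, known TSC frequency below about 18 GHz so the intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC tick count to nanoseconds for a TSC running at tsc_hz.
 * Whole seconds and the remainder are converted separately so that
 * ticks * 1e9 never has to fit in 64 bits. The result is truncated. */
uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_hz)
{
    assert(tsc_hz > 0);
    const uint64_t ns_per_sec = 1000000000ULL;
    uint64_t whole_secs = ticks / tsc_hz;
    uint64_t rem_ticks = ticks % tsc_hz;

    return whole_secs * ns_per_sec + rem_ticks * ns_per_sec / tsc_hz;
}
```

At 1 GHz one tick maps to one nanosecond; at the 1999.999 MHz reported in the boot log, 1999999000 ticks correspond to one second.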