2024-02-04

Linux guest could not access tsc clocksource

Phenomenon

The guest kernel log shows the TSC being marked unstable:

[ 0.000004] tsc: Detected 1999.999 MHz processor
[ 0.251920] tsc: Marking TSC unstable due to TSCs unsynchronized

Steps

Run lscpu | grep tsc in the guest to confirm that the CPU supports TSC.
Use dmidecode to check that the VM is not running on IBM's Summit2 hardware.
Run lscpu | grep constant_tsc in both the guest and the host to confirm that the flag is missing only in the guest.
Check cat /proc/cmdline and, if needed, add tsc=reliable to the kernel command line.
Check the CPU vendor with lscpu to see whether it is an Intel CPU.
Remove acpi from the libvirt XML. Note that this makes all hot-plug operations fail.

Switching to a different AMD CPU does not help: the tsc clocksource still cannot be found.

A Linux guest whose CPU vendor is not Intel will hit this issue.
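
The flag checks in the steps above can also be scripted. The following is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout in which a "flags" line lists CPU features separated by spaces. The helper names flags_has and cpuinfo_has_flag are hypothetical. It matches whole tokens, so a search for tsc does not accidentally match constant_tsc, as a plain grep tsc would.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in `flags`. */
int flags_has(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n' || p[n] == '\t';
        if (starts && ends)
            return 1;
        p += 1; /* keep searching after a partial match such as constant_tsc */
    }
    return 0;
}

/* Return 1 if the first "flags" line of `path` contains `name`,
 * 0 if it does not, and -1 if the file cannot be opened. */
int cpuinfo_has_flag(const char *path, const char *name)
{
    char line[8192];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon)
                found = flags_has(colon + 1, name);
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_has_flag("/proc/cpuinfo", "constant_tsc") in both the guest and the host mirrors the third step.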

Resolution

method 1:

Add tsc=reliable to the guest's Linux kernel command line.

method 2:

Remove acpi from the libvirt XML with virsh edit.
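
After applying either method, the active clocksource can be confirmed by reading the kernel's sysfs entry. This is a minimal sketch, assuming the standard path /sys/devices/system/clocksource/clocksource0/current_clocksource. The helper names read_sysfs_line and current_clocksource_is_tsc are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define CURRENT_CLOCKSOURCE \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of `path` into `buf` without the trailing newline.
 * Returns 0 on success and -1 if the file cannot be opened or is empty. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Return 1 if the running kernel currently uses the tsc clocksource,
 * 0 if it uses another one, and -1 if the sysfs file is unavailable. */
int current_clocksource_is_tsc(void)
{
    char name[64];

    if (read_sysfs_line(CURRENT_CLOCKSOURCE, name, sizeof name) != 0)
        return -1;
    return strcmp(name, "tsc") == 0;
}
```

The sibling file available_clocksource in the same directory lists the clocksources the kernel has registered, and can be read with the same helper.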

More information

constant_tsc is disabled for x86 guests in code; see the discussion: https://lore.kernel.org/qemu-devel/20210426185155.GA32439@fuller.cnet/t/

However, the Linux guest kernel still lets Intel CPUs use the TSC, which leads to the following difference between Intel and AMD CPUs.

go through kernel code

Code snippet 0

/*
 * TSC can be unstable due to cpufreq or due to unsynced TSCs
 */

static int __read_mostly tsc_unstable;

The code defines a static integer variable tsc_unstable. The static keyword means that this variable is only visible within the file it’s defined in, limiting its scope to tsc.c. This is a common practice when you want to keep the variable private to a particular file and prevent it from being accessed directly from other files.

The __read_mostly annotation is a Linux kernel macro that marks the variable as read often but seldom written. The kernel groups such variables in a dedicated data section so that they do not share cache lines with frequently written data.

The variable tsc_unstable is used to indicate whether the TSC is unstable. The comment above the variable declaration explains that the TSC can be unstable due to CPU frequency scaling (cpufreq) or due to unsynchronized TSCs across different cores or processors.

In terms of improvements, the code is quite clear and concise. However, it might be helpful to expand the comment to explain what the implications of an unstable TSC are, and how the tsc_unstable variable is used in the system’s logic.

Code snippet 1

static int __init tsc_setup(char *str)
{
    if (!strcmp(str, "reliable"))
        tsc_clocksource_reliable = 1;
    if (!strncmp(str, "noirqtime", 9))
        no_sched_irq_time = 1;
    if (!strcmp(str, "unstable"))
        mark_tsc_unstable("boot parameter");
    if (!strcmp(str, "nowatchdog"))
        no_tsc_watchdog = 1;
    return 1;
}

The __init keyword before the function name is a macro used in the Linux kernel to indicate that this function is used only at initialization time. The memory for this function is freed after the initialization is complete to conserve the kernel’s memory footprint.

Inside the function, there are several if statements that compare the input string str to different string literals using the strcmp and strncmp functions. strcmp compares two strings and returns 0 if they are identical, while strncmp compares up to a specified number of characters from two strings.

If str is “reliable”, it sets the tsc_clocksource_reliable variable to 1. If str starts with “noirqtime”, it sets the no_sched_irq_time variable to 1. If str is “unstable”, it calls the mark_tsc_unstable function with “boot parameter” as an argument. If str is “nowatchdog”, it sets the no_tsc_watchdog variable to 1.
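
The difference between the exact strcmp match and the strncmp prefix match can be seen in a small userspace model. This is only a sketch: struct tsc_opts and parse_tsc_option are hypothetical stand-ins for the kernel's globals and for tsc_setup, and the "unstable" case just sets a flag instead of calling mark_tsc_unstable.

```c
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's global flags. */
struct tsc_opts {
    int reliable;
    int noirqtime;
    int unstable;
    int nowatchdog;
};

/* Mirror the comparisons in tsc_setup() for the value given after "tsc=". */
void parse_tsc_option(const char *str, struct tsc_opts *o)
{
    if (!strcmp(str, "reliable"))       /* exact match only */
        o->reliable = 1;
    if (!strncmp(str, "noirqtime", 9))  /* any string with this prefix */
        o->noirqtime = 1;
    if (!strcmp(str, "unstable"))
        o->unstable = 1;
    if (!strcmp(str, "nowatchdog"))
        o->nowatchdog = 1;
}
```

In this model, "reliable_extra" leaves every flag at 0, while "noirqtime_extra" still sets noirqtime because only the first 9 characters are compared.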

Each of these variables or functions presumably controls a different aspect of the TSC’s behavior. For example, tsc_clocksource_reliable might indicate whether the TSC is a reliable source of time, and no_sched_irq_time might control whether the scheduler uses the TSC for interrupt timing.

In terms of improvements, the function is quite clear and concise. However, adding comments to explain the purpose of each variable and what each string argument represents would improve readability. It would also be beneficial to add error handling to account for the case where str does not match any of the expected values.

So adding tsc=reliable to the kernel command line sets tsc_clocksource_reliable, which lets the guest keep using the tsc clocksource.

Code snippet 2

static void tsc_cs_mark_unstable(struct clocksource *cs)
{
    if (tsc_unstable)
        return;
 
    tsc_unstable = 1;
    if (using_native_sched_clock())
        clear_sched_clock_stable();
    disable_sched_clock_irqtime();
    pr_info("Marking TSC unstable due to clocksource watchdog\n");
}

The function first checks if the tsc_unstable variable is already set to 1. If it is, the function immediately returns, as the TSC has already been marked as unstable. This is a common pattern in C programming known as a “guard clause”, which is used to exit a function early when certain conditions are met.

If tsc_unstable is not set to 1, the function proceeds to mark the TSC as unstable. It sets tsc_unstable to 1, calls clear_sched_clock_stable only when using_native_sched_clock() is true, and then always calls disable_sched_clock_irqtime. These functions presumably perform some cleanup or configuration changes related to the TSC becoming unstable.

Finally, the function logs a message using the pr_info macro, which is a kernel print function that outputs a message to the system log. The message indicates that the TSC has been marked as unstable due to the clocksource watchdog.
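
The guard clause makes the function idempotent: calling it again has no further effect. A minimal userspace sketch of that pattern follows. The names model_mark_unstable, model_tsc_unstable and model_messages are hypothetical and used only for illustration, and the kernel's scheduler calls are omitted.

```c
#include <stdio.h>

static int model_tsc_unstable;   /* stands in for the kernel's tsc_unstable */
static int model_messages;       /* counts how many times a message was logged */

/* Mark the modelled TSC unstable once; later calls return early. */
void model_mark_unstable(const char *reason)
{
    if (model_tsc_unstable)      /* guard clause: already marked */
        return;

    model_tsc_unstable = 1;
    model_messages++;
    printf("Marking TSC unstable due to %s\n", reason);
}
```

Calling model_mark_unstable twice prints the message only once, which is why the kernel log contains a single "Marking TSC unstable" line.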

In terms of improvements, the function is quite clear and concise. However, adding comments to explain the purpose of the clear_sched_clock_stable and disable_sched_clock_irqtime functions would improve readability. It would also be beneficial to add error handling to account for any potential issues that could occur when these functions are called.

Code snippet 3

/*
 * Make an educated guess if the TSC is trustworthy and synchronized
 * over all CPUs.
 */
int unsynchronized_tsc(void)
{
    if (!boot_cpu_has(X86_FEATURE_TSC) || tsc_unstable)
        return 1;
 
#ifdef CONFIG_SMP
    if (apic_is_clustered_box())
        return 1;
#endif
 
    if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
        return 0;
 
    if (tsc_clocksource_reliable)
        return 0;
    /*
     * Intel systems are normally all synchronized.
     * Exceptions must mark TSC as unstable:
     */
    if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
        /* assume multi socket systems are not synchronized: */
        if (num_possible_cpus() > 1)
            return 1;
    }
 
    return 0;
}

The function begins by checking whether the boot CPU lacks the TSC feature or whether the TSC has already been marked unstable. If either of these conditions is true, the function immediately returns 1, indicating that the TSC is unsynchronized.

Next, if the system is configured for symmetric multiprocessing (SMP), the function checks if the Advanced Programmable Interrupt Controller (APIC) is clustered. If it is, the function returns 1, again indicating that the TSC is unsynchronized.

The function then checks if the boot CPU has the constant TSC feature or if the TSC clocksource is reliable. If either of these conditions is true, the function returns 0, indicating that the TSC is synchronized.

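Finally, the function checks if the CPU vendor is not Intel. If it is not, and the system has more than one possible CPU, the function returns 1, indicating that the TSC is unsynchronized. If none of the previous conditions are met, the function returns 0, indicating that the TSC is synchronized. This last branch is the one a non-Intel guest takes: without constant_tsc and without tsc=reliable, a guest with more than one CPU is reported as unsynchronized, whereas an Intel guest falls through and returns 0.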
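
The decision order described above can be condensed into a small userspace model, which makes the Intel versus AMD difference easy to try out. This is a sketch under stated assumptions: struct tsc_env and model_unsynchronized_tsc are hypothetical names, and each field simply stands in for the kernel check named in its comment.

```c
#include <stdbool.h>

/* Hypothetical inputs that model the checks made by unsynchronized_tsc(). */
struct tsc_env {
    bool has_tsc;            /* boot_cpu_has(X86_FEATURE_TSC) */
    bool tsc_unstable;       /* TSC already marked unstable */
    bool clustered_apic;     /* apic_is_clustered_box() on SMP builds */
    bool constant_tsc;       /* boot_cpu_has(X86_FEATURE_CONSTANT_TSC) */
    bool cmdline_reliable;   /* tsc=reliable was given */
    bool vendor_intel;       /* x86_vendor == X86_VENDOR_INTEL */
    unsigned int possible_cpus;
};

/* Return 1 if the modelled kernel would treat the TSC as unsynchronized. */
int model_unsynchronized_tsc(const struct tsc_env *e)
{
    if (!e->has_tsc || e->tsc_unstable)
        return 1;
    if (e->clustered_apic)
        return 1;
    if (e->constant_tsc)
        return 0;
    if (e->cmdline_reliable)
        return 0;
    /* Non-Intel systems with more than one CPU are assumed unsynchronized. */
    if (!e->vendor_intel && e->possible_cpus > 1)
        return 1;
    return 0;
}
```

With has_tsc set, constant_tsc clear and four possible CPUs, the model returns 1 for a non-Intel vendor and 0 for Intel. Setting cmdline_reliable on the non-Intel case flips the result to 0, which matches method 1 in the Resolution section.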

More practice

SystemTap

Because of the issue above, I spent some more time checking whether the TSC values seen by the guest differ from those on the host CPU, using SystemTap.

observe rdtsc

result

The TSC values, their mean, and their standard deviation differ between guest and host.

The values from the guest OS are less stable than those from the host.

During live migration, the TSC delta is smaller than usual (I think this is because live migration has downtime, so the TSC has to be adjusted to tolerate it).

So, judging from this small test, it is not a good idea to rely on the TSC in a guest, where it is not as precise as it is on the host.

data from my test

The first version of the script measures the mean and the standard deviation:

in guest:

TSC mean: 2000170717.800000, TSC std dev: 255861.233545
Time mean: 1000101683.030000, Time std dev: 162898.956256
TSC mean: 2000158595.200000, TSC std dev: 340159.343486
Time mean: 1000092746.020000, Time std dev: 170311.019513
TSC mean: 2000116749.600000, TSC std dev: 96417.905701
Time mean: 1000076448.860000, Time std dev: 102460.953979

in guest during live migration:

TSC mean: 1990107194.600000, TSC std dev: 71113321.983521
Time mean: 1000129868.770000, Time std dev: 340417.298586
TSC mean: 1993829457.200000, TSC std dev: 47439246.502752
Time mean: 1001882162.230000, Time std dev: 16929541.16989

Samples from host:

TSC mean: 2000087563.600000, TSC std dev: 16626.290598
Time mean: 1000065114.610000, Time std dev: 8341.142215
TSC mean: 2000084499.400000, TSC std dev: 4760.334824
Time mean: 1000063541.340000, Time std dev: 2447.439965
TSC mean: 2000083391.800000, TSC std dev: 11786.744451
Time mean: 1000062911.800000, Time std dev: 5922.538748

The TSC mean is lower than normal during migration.

I then changed the script to print every sample so that abnormal ones can be spotted:

Sample 54, TSC diff: 1998893560, Time diff: 1000069363 ns
Sample 55, TSC diff: 1998887140, Time diff: 1000065570 ns
Sample 56, TSC diff: 1999007020, Time diff: 1000130480 ns
Sample 57, TSC diff: 1535293100, Time diff: 1000090774 ns
Sample 58, TSC diff: 1998899520, Time diff: 1000073836 ns
Sample 59, TSC diff: 2000588260, Time diff: 1001300447 ns
Sample 60, TSC diff: 1998899540, Time diff: 1000072444 ns
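
Dividing each TSC delta by the elapsed time turns these samples into an effective frequency, which makes the outlier easier to spot. The sketch below assumes the nominal 1999.999 MHz reported in the boot log and an arbitrary 1% tolerance. The helper names effective_mhz and is_abnormal are hypothetical.

```c
#include <stdint.h>

/* Effective TSC frequency in MHz: cycles per nanosecond times 1000. */
double effective_mhz(uint64_t tsc_diff, uint64_t time_ns)
{
    return (double)tsc_diff * 1000.0 / (double)time_ns;
}

/* Return 1 if the sample deviates from nominal_mhz by more than
 * `tolerance` (a fraction, for example 0.01 for 1%). */
int is_abnormal(uint64_t tsc_diff, uint64_t time_ns,
                double nominal_mhz, double tolerance)
{
    double delta = effective_mhz(tsc_diff, time_ns) - nominal_mhz;

    if (delta < 0)
        delta = -delta;
    return delta > nominal_mhz * tolerance;
}
```

Sample 57 works out to roughly 1535 MHz, about 23% below nominal, so it is flagged. Sample 59 has a larger TSC delta than its neighbours but also a longer elapsed time, so its effective frequency stays within the tolerance.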

Here is my test code (pow and sqrt require linking with -lm):


#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <time.h>
#include <math.h>
 
#define SAMPLES 100
 
uint64_t rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
 
double calc_std_dev(uint64_t *data, double mean){
    double sum = 0.0;
    for(int i = 0; i < SAMPLES; i++){
        sum += pow(data[i] - mean, 2);
    }
    return sqrt(sum / SAMPLES);
}
 
int main(){
    struct timespec start, end;
    uint64_t tsc_start, tsc_end;
    uint64_t tsc_diffs[SAMPLES], time_diffs[SAMPLES];
    double tsc_sum = 0.0, time_sum = 0.0;
 
    for(int i = 0; i < SAMPLES; i++){
        clock_gettime(CLOCK_MONOTONIC, &start);
        tsc_start = rdtsc();
 
        sleep(1);
 
        tsc_end = rdtsc();
        clock_gettime(CLOCK_MONOTONIC, &end);
 
        tsc_diffs[i] = tsc_end - tsc_start;
        time_diffs[i] = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
 
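        /* Cast explicitly: uint64_t is not guaranteed to be unsigned long long. */
        printf("Sample %d, TSC diff: %llu, Time diff: %llu ns\n", i, (unsigned long long)tsc_diffs[i], (unsigned long long)time_diffs[i]);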
 
        tsc_sum += tsc_diffs[i];
        time_sum += time_diffs[i];
    }
 
    double tsc_mean = tsc_sum / SAMPLES;
    double time_mean = time_sum / SAMPLES;
 
    double tsc_std_dev = calc_std_dev(tsc_diffs, tsc_mean);
    double time_std_dev = calc_std_dev(time_diffs, time_mean);
 
    printf("TSC mean: %f, TSC std dev: %f\n", tsc_mean, tsc_std_dev);
    printf("Time mean: %f, Time std dev: %f\n", time_mean, time_std_dev);
 
    return 0;
}
