std::chrono::clock, hardware clock and cycle count

2021-12-23 00:00:00 time cpu benchmarking c++ chrono

std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

If that is the case, because the way a computer counts cycles will never be as precise as an atomic clock, a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than a real second, causing long-run differences in time measurements between the computer clock and, say, GPS.

Question 2: Is that correct?

Some hardware has varying frequencies (for example idle and turbo modes). In that case, the number of cycles per second would vary.

Question 3: Does the "cycle count" measured by CPUs and GPUs vary with the hardware frequency? If yes, how does std::chrono deal with it? If not, what does a cycle correspond to (that is, what is the "fundamental" time)? Is there a way to access the conversion at compile time? Is there a way to access the conversion at runtime?

Recommended answer

Counting cycles, yes, but cycles of what?

On a modern x86, the time source used by the kernel (internally, and for clock_gettime and other system calls) is typically a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++.)
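As a rough illustration of that counter, the sketch below reads the TSC with __rdtsc() and estimates how fast it ticks by comparing it against std::chrono::steady_clock over a short busy-wait. The helper names read_counter and estimate_counter_hz are made up for this example, and the fallback to steady_clock ticks on non-x86 targets is only there so the sketch still compiles everywhere.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang
#define SKETCH_HAVE_TSC 1
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>     // __rdtsc() on MSVC
#define SKETCH_HAVE_TSC 1
#else
#define SKETCH_HAVE_TSC 0
#endif

// Hypothetical helper: raw counter value.
// On x86 this is the TSC; elsewhere it falls back to steady_clock ticks.
inline std::uint64_t read_counter() {
#if SKETCH_HAVE_TSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: estimate counter ticks per second by busy-waiting
// for `window` as measured by steady_clock. This is an estimate only.
inline double estimate_counter_hz(std::chrono::milliseconds window) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = read_counter();
    while (clock::now() - t0 < window) {
        // spin until the window has elapsed
    }
    const std::uint64_t c1 = read_counter();
    const auto t1 = clock::now();
    const std::chrono::duration<double> elapsed = t1 - t0;
    return static_cast<double>(c1 - c0) / elapsed.count();
}
```

On a machine with a fixed-frequency TSC, calling estimate_counter_hz(std::chrono::milliseconds(50)) repeatedly should give roughly the same figure whether the core is idle or in turbo, which is the "reference cycles" behaviour described above.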

Normal std::chrono implementations will use an OS-provided function like clock_gettime on Unix. (On Linux, this can run purely in user space, using code + scale factor data in a VDSO page mapped by the kernel into every process's address space. Low-overhead time sources are nice. Avoiding a user->kernel->user round trip helps a lot with Meltdown + Spectre mitigations enabled.)
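On the compile-time versus runtime part of the question: what std::chrono exposes at compile time is each clock's tick unit, Clock::period, a std::ratio giving seconds per tick of that clock's duration type. It is not a hardware cycle-to-time factor, and it does not have to match the real resolution of the underlying counter. A minimal sketch, with tick_seconds and ticks_to_ns as made-up helper names:

```cpp
#include <chrono>

// Seconds per tick of Clock's duration type, taken from Clock::period.
// This is a compile-time constant of the library, not a hardware measurement.
template <class Clock>
constexpr double tick_seconds() {
    return static_cast<double>(Clock::period::num) /
           static_cast<double>(Clock::period::den);
}

// Convert a raw tick count of Clock into nanoseconds.
template <class Clock>
std::chrono::nanoseconds ticks_to_ns(typename Clock::rep ticks) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        typename Clock::duration(ticks));
}
```

For example, tick_seconds<std::chrono::steady_clock>() can be used in a static_assert or a constexpr variable, while the conversion from hardware counter values to these ticks happens inside the OS and the library at runtime.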

When profiling a tight loop that's not memory bound, you might want to use actual core clock cycles, so the result is insensitive to the actual speed of the current core. (And you don't have to worry about ramping up the CPU to max turbo, etc.) For example, use perf stat ./a.out or perf record ./a.out. For a worked case, see Can x86's MOV really be "free"? Why can't I reproduce this at all?

Some systems didn't / don't have a wall-clock-equivalent counter built right into the CPU, so either the OS would maintain a time in RAM that it updates on timer interrupts, or time-query functions would read the time from a separate chip.

(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)

All of these clock frequencies are ultimately derived from a crystal oscillator on the motherboard. But the scale factors used to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.
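To make the "adjustable scale factor" idea concrete, here is a toy model: time is a base timestamp plus the cycles elapsed since that base, multiplied by a nanoseconds-per-cycle factor that a synchronization daemon can nudge. The CycleToTime struct and its members are hypothetical, and it uses floating point for readability; real kernels use fixed-point arithmetic and more careful slewing.

```cpp
#include <cstdint>

// Hypothetical toy model of a cycle-count-to-time conversion.
struct CycleToTime {
    std::uint64_t base_cycles;  // counter value at the last adjustment
    std::uint64_t base_ns;      // time in nanoseconds at base_cycles
    double ns_per_cycle;        // current scale factor

    // Extrapolate time from a counter reading taken at or after base_cycles.
    std::uint64_t to_ns(std::uint64_t cycles) const {
        return base_ns + static_cast<std::uint64_t>(
                             static_cast<double>(cycles - base_cycles) *
                             ns_per_cycle);
    }

    // Change the rate by `ppm` parts per million, starting at now_cycles.
    // Rebasing first keeps reported time continuous across the adjustment.
    void adjust_rate(std::uint64_t now_cycles, double ppm) {
        base_ns = to_ns(now_cycles);
        base_cycles = now_cycles;
        ns_per_cycle *= 1.0 + ppm * 1e-6;
    }
};
```

In this model, a crystal that runs slightly fast or slow is compensated by calling adjust_rate with a small correction, so the extrapolated time tracks the reference without any jump.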
