KyberSlash: division timings depending on secrets in Kyber software

For KyberSlash1, what is the problematic code?

Here's the critical line in the Kyber reference software:

t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;

In context, KYBER_Q is a public integer 3329, but t is secret. This line doubles t, adds 1664, divides by 3329, and extracts the bottom bit of the result.

As another example, here's the critical line in another library:

t = (int) ((((((int) (a[8 * i + j])) << 1) + (KyberParams.paramsQ / 2)) / KyberParams.paramsQ) & 1);

What is the problem with this code?

In many environments, the time taken by division depends on the inputs to the division: in this case, on the secret t (or the secret a[8 * i + j] in the second example). This creates data flow from secrets to the time taken by Kyber.

For KyberSlash2, what is the problematic code?

Here are the critical lines for KyberSlash2 in the Kyber reference software (not adjacent to each other):

t[j] = ((((uint16_t)u << 4) + KYBER_Q/2)/KYBER_Q) & 15;

t[j] = ((((uint32_t)u << 5) + KYBER_Q/2)/KYBER_Q) & 31;

t[k]  = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff;

t[k]  = ((((uint32_t)t[k] << 10) + KYBER_Q/2)/ KYBER_Q) & 0x3ff;

Similar lines appear in various other Kyber libraries.

What is the structural difference between KyberSlash1 and KyberSlash2?

The KyberSlash1 division is called from decryption in Kyber's underlying PKE (e.g., indcpa_dec in the Kyber reference software uses poly_tomsg). The KyberSlash2 divisions are called from encryption in Kyber's underlying PKE (e.g., indcpa_enc in the Kyber reference software uses poly_compress and polyvec_compress). Decapsulation in the Kyber KEM calls both decryption and encryption in the underlying PKE, so both of these use secret inputs.

Why are secret-dependent timings important?

Timing attacks often successfully work backwards from timings to secrets.

Are these particular dependencies exploitable?

See the analysis in the KyberSlash paper. The KyberSlash demos demonstrate exploitability in some situations. Even if you're in a different situation, patch now; don't wait to see whether further exploits are demonstrated.

If a Kyber key is used only once then everything is okay?

Not necessarily. This does block the current demos, but there are previous examples of side channels that were eventually shown to allow single-trace attacks, and the same might happen to KyberSlash. Patch now.

Does a division in C, C++, Go, Rust, etc. always turn into a division instruction in machine language?

Not necessarily. There are some important combinations of CPUs, compilers, and compiler options that will convert divisions with public denominators into multiplications. For example, on Intel/AMD CPUs, gcc -Os (size optimization) will produce division instructions, but gcc -O3 (time optimization) will produce multiplication instructions. On RISC-V CPUs, gcc -Os and gcc -O3 will both produce division instructions. Matt Godbolt's Compiler Explorer is an easy way to see what different compilers will do.

There are also environments where the ABI does not guarantee the presence of division instructions, so a division is converted into a call to a "soft division" function. In the KyberSlash1 demo for a Raspberry Pi 2, gcc -Os ends up calling an internal divsi3 function; that function uses branches depending on the inputs. The KyberSlash2 demo instead targets a hardware division instruction.

Does a division instruction in machine language really vary in timing across the small integers used in Kyber?

Not necessarily, but experiments on various CPUs have shown variations. For example, on AMD Zen 2, division slows down by a cycle when the numerator reaches 8192. As another example, on the SiFive U74 RISC-V CPU, division speeds up by 62 cycles when the numerator reaches 2048, then slows down by a cycle when the numerator reaches 4096, then slows down by a cycle when the numerator reaches 8192.

For the KyberSlash1 code sequence, the numerator for attacker-chosen ciphertexts can be any even number between 1664 and 8320. The numerator for legitimate ciphertexts tends to be close to 1664, 4993, or 8320. For the KyberSlash2 code sequences, the numerators have a wider range, generally producing larger timing variations.

Can timing attacks really work with timing variations as small as one or two cycles?

Yes. For example, CacheBleed demonstrated exploitability of cache-bank conflicts. Each cache-bank conflict creates a single-cycle delay.

The KyberSlash2 demo also targets such small variations. In the KyberSlash1 demo, the timing variations are larger, above 20 cycles.

If the compiler turns divisions into multiplications then everything is okay?

Not necessarily. There are CPUs where multiplications take variable time, often in undocumented ways. This is a general issue for lattice-based cryptography, and for various other forms of cryptography, including elliptic-curve cryptography; see, e.g., the warning about the IBM PowerPC RS64 IV in the Curve25519 paper.

Are there tools that scan code for divisions?

Yes, although they are not necessarily comprehensive. For example, saferewrite automatically reports warning-div and warning-mul for any secret divisions and secret multiplications detected in machine code. There are more tools that check for secret branches and secret array indices. See also the KyberSlash paper.

What does "Slash" mean in KyberSlash?

Mathematicians and programmers typically write division using a slash symbol.

Does KyberSlash have a logo?

No. The obvious choice of logo would be the name Kyber in a circle with a slash through it to represent division, but this could be misinterpreted.


Version: This is version 2024.06.28 of the "FAQ" web page.