Rendered at 00:24:30 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
jaen 1 days ago [-]
Wow, that's pretty cool. Translating (almost) arbitrary floating point programs into weird integer programs while also preserving equivalence under non-strict floating point semantics? Mathematics can be surprisingly wonderful.
measurablefunc 1 days ago [-]
> if f and g are algebraically equivalent programs then FPSan(f) and FPSan(g) produce identical results when given identical inputs
Ok, but we want the other direction. If FPSan(f) & FPSan(g) produce identical results for identical inputs then we want to conclude that f & g are also equivalent. If g is an "optimized" version of f then this would allow checking equivalence but that's not what they are proving or maybe they are but it looks like the converse is contingent on an unproven conjecture.
simonreiff 1 days ago [-]
Right. Put differently, we have that FPSan() is a well-defined function, so [ f = g ] => [ FPSan(f) = FPSan(g)], but we need to show that FPSan() is injective, i.e., [ FPSan(f) = FPSan(g) ] => [ f = g ]. I confess I haven't looked very closely but it should not be so hard. We can prove injectibility in the alternative by analyzing ker(FPSan()), the set of all inputs in the domain of functions mapping to the identity element in the co-domain. If the kernel is trivial and only consists of the identity map, the injectibility is established, but I am not immediately seeing the proof. Fun!
amluto 21 hours ago [-]
It seems to me that one could maliciously construct a failure. For example:
phi(1.0) * 2^32 = 0
So:
1.0 + … + 1.0 (2^32 terms added)
Will turn into zero in the embedding. (I bet other, dramatically smaller zeros could be found by other operations. phi^-1(2^16) could be a good starting point, but you don’t necessarily need a shorter one — see below.)
Now you find a floating point expression tree that has only 1.0, 0, and -1.0 at the leaves and generates this spurious zero. (For example, 1.0 + 1.0, squared five times.)
Now you maliciously transform a program by adding one of these spurious zero expressions somewhere. Am I missing something?
More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely, but I feel like it would be mildly surprising if it does much in the setting of trying to prove something without any probability of error. And it seems a bit unfortunate that no choice of the scrambling constants has any effect on the expressions that start with 1.0 and use only multiplication and addition to get to zero.
Also, how does floating point infinity fit in? It seems like it doesn’t act very infinite in the integer embedding.
(I could be totally wrong here. I only read the definitions twice, and I didn’t try to write anything down.)
goucher 16 hours ago [-]
Original author here.
Yes, this is correct, and indeed injectivity is impossible from a counting argument: for example, there are only (2^32)^(2^32) unary functions in this finite ring, but infinitely many polynomials with integer coefficients, so we cannot hope to distinguish them all.
> More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely
Exactly this. One issue with the bitcast embedding (other than not mapping {-1.0, 0.0, 1.0} to {-1, 0, 1}) is that 'round numbers' such as 42.0 have floating-point representations (such as 0x42280000) that end in very many trailing zero bits. This is particularly problematic because multiplying two such numbers gives 0.
In general, FPSan isn't designed for an adversarial regime. There are ways that it could be hardened in this direction: for example, instead of using the ring of integers modulo 2^32, choose a prime-power modulus using a large prime derived from a cryptographic hash of the source code of the two programs that you're trying to compare. But I'm not confident that even that guarantees adversarial robustness: for instance, there may be some clever way to efficiently implement a nonzero polynomial that vanishes at every small integer (say, every integer with absolute value < 10^100) and that would break this hardened variant.
The motivation behind FPSan is to protect against accidental implementation bugs: for example, someone writes a complicated fused kernel that improves the runtime of model inference, but makes an honest mistake whilst doing so. Bitwise floating-point matching is too strict; approximate matching is too messy (what should the error tolerances be?); symbolic methods are too unscalable.
amluto 9 hours ago [-]
> But I'm not confident that even that guarantees adversarial robustness: for instance, there may be some clever way to efficiently implement a nonzero polynomial that vanishes at every small integer (say, every integer with absolute value < 10^100) and that would break this hardened variant.
That’s a lot of zeros.
I wonder if there’s a construction with the property that, for all pairs of circuits with size below some threshold (with size appropriately defined), then, if the circuits are not algebraically equal, then, under randomization of whatever parameters are randomized, their embedded versions are, with high probability, not equal and differ in at least a constant fraction of inputs.
At the very least, I imagine that most ML kernels have polynomial-ish order far below 100 in a sense where e^x is defined to have low polynomial-ish order. sin and cos might be worse due to actually having infinite zeros, but maybe pi isn’t constructible and no one can find those zeros.
Ok, but we want the other direction. If FPSan(f) & FPSan(g) produce identical results for identical inputs then we want to conclude that f & g are also equivalent. If g is an "optimized" version of f then this would allow checking equivalence but that's not what they are proving or maybe they are but it looks like the converse is contingent on an unproven conjecture.
Now you find a floating point expression tree that has only 1.0, 0, and -1.0 at the leaves and generates this spurious zero. (For example, 1.0 + 1.0, squared five times.)
Now you maliciously transform a program by adding one of these spurious zero expressions somewhere. Am I missing something?
More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely, but I feel like it would be mildly surprising if it does much in the setting of trying to prove something without any probability of error. And it seems a bit unfortunate that no choice of the scrambling constants has any effect on the expressions that start with 1.0 and use only multiplication and addition to get to zero.
Also, how does floating point infinity fit in? It seems like it doesn’t act very infinite in the integer embedding.
(I could be totally wrong here. I only read the definitions twice, and I didn’t try to write anything down.)
Yes, this is correct, and indeed injectivity is impossible from a counting argument: for example, there are only (2^32)^(2^32) unary functions in this finite ring, but infinitely many polynomials with integer coefficients, so we cannot hope to distinguish them all.
> More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely
Exactly this. One issue with the bitcast embedding (other than not mapping {-1.0, 0.0, 1.0} to {-1, 0, 1}) is that 'round numbers' such as 42.0 have floating-point representations (such as 0x42280000) that end in very many trailing zero bits. This is particularly problematic because multiplying two such numbers gives 0.
In general, FPSan isn't designed for an adversarial regime. There are ways that it could be hardened in this direction: for example, instead of using the ring of integers modulo 2^32, choose a prime-power modulus using a large prime derived from a cryptographic hash of the source code of the two programs that you're trying to compare. But I'm not confident that even that guarantees adversarial robustness: for instance, there may be some clever way to efficiently implement a nonzero polynomial that vanishes at every small integer (say, every integer with absolute value < 10^100) and that would break this hardened variant.
The motivation behind FPSan is to protect against accidental implementation bugs: for example, someone writes a complicated fused kernel that improves the runtime of model inference, but makes an honest mistake whilst doing so. Bitwise floating-point matching is too strict; approximate matching is too messy (what should the error tolerances be?); symbolic methods are too unscalable.
That’s a lot of zeros.
I wonder if there’s a construction with the property that, for all pairs of circuits with size below some threshold (with size appropriately defined), then, if the circuits are not algebraically equal, then, under randomization of whatever parameters are randomized, their embedded versions are, with high probability, not equal and differ in at least a constant fraction of inputs.
At the very least, I imagine that most ML kernels have polynomial-ish order far below 100 in a sense where e^x is defined to have low polynomial-ish order. sin and cos might be worse due to actually having infinite zeros, but maybe pi isn’t constructible and no one can find those zeros.