Discussion:
Curve25519 (ECDH) portable C implementation
(too old to reply)
xmath
2006-08-18 00:33:51 UTC
Permalink
I've made a simple re-implementation of djb's curve25519
(http://cr.yp.to/ecdh.html) in plain C, using 64-bit arithmetic.

http://cds.xs4all.nl:8081/ecdh/curve25519_i64.tgz -- sources
http://cds.xs4all.nl:8081/ecdh/curve25519_i64 -- sources (expanded)
http://cds.xs4all.nl:8081/ecdh/curve25519_i64.txt -- some timings

While it's certainly not going to set new speed records, it still
performs quite well compared to generic-field implementations. For
example a GMP-implementation, even when using the low-level mpn-calls,
is still much slower than this C implementation, even when doing 64-bit
math on a 32-bit machine. The good performance and code simplicity seem
largely due to hardcoding the field (p=2^255-19).

The code should be portable to any platform with int64_t. It's also
definitely much more readable than the x86 asm version :-)

- xmath
Tom St Denis
2006-08-18 00:51:13 UTC
Permalink
Post by xmath
I've made a simple re-implementation of djb's curve25519
(http://cr.yp.to/ecdh.html) in plain C, using 64-bit arithmetic.
http://cds.xs4all.nl:8081/ecdh/curve25519_i64.tgz -- sources
http://cds.xs4all.nl:8081/ecdh/curve25519_i64 -- sources (expanded)
http://cds.xs4all.nl:8081/ecdh/curve25519_i64.txt -- some timings
Cool. Why are the AMD64 timings in 32-bit mode though? Need a shell?

Tom
xmath
2006-08-18 09:15:46 UTC
Permalink
Post by Tom St Denis
Cool. Why are the AMD64 timings in 32-bit mode though?
The person who did those tests was running a 32-bit kernel
Post by Tom St Denis
Need a shell?
That would be useful :-)

- xmath
xmath
2006-08-18 11:27:17 UTC
Permalink
Also, does anyone know of a reasonably simple (not necessarily
efficient) way to do point addition on this curve using only
x-coordinates but without also having (the x-coord of) the difference
of the points? Or, in the alternative, a simple way to confirm that
P+Q=R given x-coords of P, Q, R?

The curve is y^2 = x^3 + (4*121665+2) x^2 + x over GF(2^255-19)

- xmath
daniel bleichenbacher
2006-08-18 13:22:50 UTC
Permalink
Post by xmath
Also, does anyone know of a reasonably simple (not necessarily
efficient) way to do point addition on this curve using only
x-coordinates but without also having (the x-coord of) the difference
of the points?
I don't know of any such algorithm.
Post by xmath
Or, in the alternative, a simple way to confirm that
P+Q=R given x-coords of P, Q, R?
That is possible. See Richard Crandall. US Patent Nr. 6285760, 2000.
Post by xmath
The curve is y^2 = x^3 + (4*121665+2) x^2 + x over GF(2^255-19)
- xmath
xmath
2006-08-18 13:44:38 UTC
Permalink
Post by daniel bleichenbacher
Post by xmath
Or, in the alternative, a simple way to confirm that
P+Q=R given x-coords of P, Q, R?
That is possible. See Richard Crandall. US Patent Nr. 6285760, 2000.
I was interested until my eyes hit the word "patent" :-)

Luckily, I just realized I actually have the y-coord of one of the
points, which makes the whole thing quite trivial... using the identity
(A + Px + Qx + Rx)(Px - Qx)^2 = (Py - Qy)^2 and the fact that I can
quickly calculate the square of an y-coord (from the x-coord), this
produces Py from Px,Qx,Rx,1/Qy with a small number of multiplications.

Now let's see if calculating Rx from Px,Qx,1/Qy is somehow also
possible...

- xmath
xmath
2006-08-18 13:59:51 UTC
Permalink
I'll explain the context of this: I'm trying to work out how to build
a signature system on top of curve25519. This requires calculating sP
+ tG where P is the public key and G the base point. I know both
coords of the (hardcoded) base point, but only the x-coord of P.

Doing it the naive way would require adding two unrelated points at the
end. Using a 2-dimensional addition chain can produce the answer
straight from P, G, P-G but that leaves the issue of obtaining P-G.

The "obvious" way would be by recovering the y-coord of P, and doing an
affine point subtraction. However this requires use of a square root,
and GF(2^255-19) is not a convenient field for that.

Alternatively, the signer includes P-G with the public key (it can
calculate P=xG and P-G=(x-1)G at the same time), but then the verifier
needs to confirm that the given "P-G" genuinely is P-G.

So anyway, I've figured out how to verify P-G, but calculating P-G at
the recipient end would still be nicer, since it would make public keys
only 32 bytes instead of 64 bytes.

- xmath
daniel bleichenbacher
2006-08-18 18:04:29 UTC
Permalink
Post by xmath
Post by daniel bleichenbacher
Post by xmath
Or, in the alternative, a simple way to confirm that
P+Q=R given x-coords of P, Q, R?
That is possible. See Richard Crandall. US Patent Nr. 6285760, 2000.
I was interested until my eyes hit the word "patent" :-)
Luckily, I just realized I actually have the y-coord of one of the
points, which makes the whole thing quite trivial... using the identity
(A + Px + Qx + Rx)(Px - Qx)^2 = (Py - Qy)^2 and the fact that I can
quickly calculate the square of an y-coord (from the x-coord), this
produces Py from Px,Qx,Rx,1/Qy with a small number of multiplications.
That is a nice idea.
Post by xmath
Now let's see if calculating Rx from Px,Qx,1/Qy is somehow also
possible...
Are you trying to avoid modular inversion?
If you assume that you know Qy then you would get 1/Py with your method
above. Obviously since Py^2 is known you can compute Py as 1/Py * Py^2
without
modular inversion. And using symmetry it is also possible to compute Ry
the
same way you compute Py.
Post by xmath
- xmath
xmath
2006-08-18 18:32:36 UTC
Permalink
Post by daniel bleichenbacher
Post by xmath
Now let's see if calculating Rx from Px,Qx,1/Qy is somehow also
possible...
Are you trying to avoid modular inversion?
Somewhat. I'm mostly just trying to avoid square roots.
Post by daniel bleichenbacher
If you assume that you know Qy then you would get 1/Py with your method
above. Obviously since Py^2 is known you can compute Py as 1/Py * Py^2
without modular inversion.
That calculation requires Rx as input however. I'm trying to produce
it as output :-)

Note that I don't really mind doing precomputation on Q, such as
calculating 1/Qy. (Q here is the base point).

That is, I can do: confirm R = P + Q given Rx, Px, Qx, Qy by
calculating Py from these and confirming (Px,Py) is actually a point on
the curve. (I'm pretty sure that confirms R=P+Q)

However I'm wondering if I can also do: calculate R = P + Q given Px,
Qx, Qy, producing Rx.

- xmath
D. J. Bernstein
2006-08-19 14:05:49 UTC
Permalink
Thanks for the implementation.
Post by xmath
However I'm wondering if I can also do: calculate R = P + Q given Px,
Qx, Qy, producing Rx.
If you can easily obtain Rx from Px,Qx,Qy then you can also easily
obtain Py. Say the curve equation is by^2 = x^3 + ax^2 + x; your desired
Rx value is b(Py-Qy)^2/(Px-Qx)^2 - a - Qx - Px; add Px, add Qx, add a,
multiply by (Px-Qx)^2, subtract bQy^2, subtract bPy^2 (which is easily
computed as Px^3 + aPx^2 + Px), and divide by 2bQy, to obtain Py.

This means that computing your desired Rx (plus a few multiplications
and a division) is at least as hard as mapping Px to (Px,Py). So you
can't expect something significantly faster than a square root.

On the bright side, there's an easy straight-line formula from Atkin for
the square root of u modulo a prime p in 5+8Z: it's uv(2uv^2-1) where
v = (2u)^((p-5)/8). So it's easy to expand Px into P. There's no need
for an extra bit to select between +-P; the signer can simply negate his
multiplier modulo the group order.

---D. J. Bernstein, Professor, Mathematics, Statistics,
and Computer Science, University of Illinois at Chicago
xmath
2006-08-19 17:03:22 UTC
Permalink
Post by D. J. Bernstein
Thanks for the implementation.
I'm still updating it, and also working on xP+yQ. I've turned the
recursive definition of your diffchain paper into a concrete sequential
algorithm:

http://cds.xs4all.nl:8081/ecdh/d2chain.txt -- notes
http://cds.xs4all.nl:8081/ecdh/d2chain.pl -- a small test in perl

I've also made the initialization phase of the algorithm simpler:
instead of the first round being non-standard to work from [P, Q, P-Q],
I have the first round the same as all others. However, the
pre-initialization to allow this requires one extra add. The
pre-first-round state is (1,1) for odd-odd (this is the add), (0,0) for
even-even, and (0,1) or (1,0) for the third. It causes the odd-odd pair
of the first round to become a redundant (1,1) + (0,0) -> (1,1).

I think the simplicity is worth the extra add.
Post by D. J. Bernstein
If you can easily obtain Rx from Px,Qx,Qy then you can also easily
obtain Py. [...] This means that computing your desired Rx is at least
as hard as mapping Px to (Px,Py). So you can't expect something
significantly faster than a square root.
Oh duh, I should have seen that one :-)
Post by D. J. Bernstein
On the bright side, there's an easy straight-line formula from Atkin for
the square root of u modulo a prime p in 5+8Z: it's uv(2uv^2-1) where
v = (2u)^((p-5)/8). So it's easy to expand Px into P.
Excellent!

- xmath
Tom St Denis
2006-08-19 23:15:18 UTC
Permalink
Post by xmath
I've made a simple re-implementation of djb's curve25519
(http://cr.yp.to/ecdh.html) in plain C, using 64-bit arithmetic.
http://cds.xs4all.nl:8081/ecdh/curve25519_i64.tgz -- sources
http://cds.xs4all.nl:8081/ecdh/curve25519_i64 -- sources (expanded)
http://cds.xs4all.nl:8081/ecdh/curve25519_i64.txt -- some timings
I got "0.263 ms" on my Opteron 285 [2.6Ghz] with GCC 4.1.1 in 64-bit
mode. 683800 cycles, I assume per point mul. For reference, LTC gets
583,723 cycles per P-256 point mul when the base has been pre-computed
[e.g. as you would for DH encrypt, sign and verify].

Tom

Loading...