pkg / nettle: Commit 194c1564
Authored Jun 10, 2021 by Magnus Holmgren
Committed by Ritesh Raj Sarraf, Jul 26, 2021
Import Debian changes 3.7.3-1
Parents: 5a259d5c 67fbaf5f
Pipeline #280006 passed with stages in 21 seconds
Changes: 56
ChangeLog
2021-05-22 Niels Möller <nisse@lysator.liu.se>
* configure.ac: Bump package version, to 3.7.3.
(LIBNETTLE_MINOR): Bump minor number, to 8.4.
(LIBHOGWEED_MINOR): Bump minor number, to 6.4.
2021-05-17 Niels Möller <nisse@lysator.liu.se>
* rsa-decrypt-tr.c (rsa_decrypt_tr): Check up-front that input is
in range.
* rsa-sec-decrypt.c (rsa_sec_decrypt): Likewise.
* rsa-decrypt.c (rsa_decrypt): Likewise.
* testsuite/rsa-encrypt-test.c (test_main): Add tests with input > n.
2021-05-14 Niels Möller <nisse@lysator.liu.se>
* rsa-sign-tr.c (rsa_sec_blind): Delete mn argument.
(_rsa_sec_compute_root_tr): Delete mn argument, instead require
that input size matches key size. Rearrange use of temporary
storage, to support in-place operation, x == m. Update all
callers.
* rsa-decrypt-tr.c (rsa_decrypt_tr): Make zero-padded copy of
input, for calling _rsa_sec_compute_root_tr.
* rsa-sec-decrypt.c (rsa_sec_decrypt): Likewise.
* testsuite/rsa-encrypt-test.c (test_main): Test calling all of
rsa_decrypt, rsa_decrypt_tr, and rsa_sec_decrypt with zero input.
2021-05-06 Niels Möller <nisse@lysator.liu.se>
* pkcs1-sec-decrypt.c (_pkcs1_sec_decrypt): Check that message
length is valid, for given key size.
* testsuite/rsa-sec-decrypt-test.c (test_main): Add test cases for
calls to rsa_sec_decrypt specifying a too large message length.
2021-03-21 Niels Möller <nisse@lysator.liu.se>
* NEWS: NEWS entries for 3.7.2.
2021-03-17 Niels Möller <nisse@lysator.liu.se>
* configure.ac: Bump package version, to 3.7.2.
(LIBNETTLE_MINOR): Bump minor number, to 8.3.
(LIBHOGWEED_MINOR): Bump minor number, to 6.3.
2021-03-13 Niels Möller <nisse@lysator.liu.se>
* gostdsa-vko.c (gostdsa_vko): Use ecc_mod_mul_canonical to
compute the scalar used for ecc multiplication.
* eddsa-hash.c (_eddsa_hash): Ensure result is canonically
reduced. Two of the three call sites need that.
* ecc-gostdsa-verify.c (ecc_gostdsa_verify): Use ecc_mod_mul_canonical
to compute the scalars used for ecc multiplication.
* ecc-ecdsa-sign.c (ecc_ecdsa_sign): Ensure s output is reduced to
canonical range.
* ecc-ecdsa-verify.c (ecc_ecdsa_verify): Use ecc_mod_mul_canonical
to compute the scalars used for ecc multiplication.
* testsuite/ecdsa-verify-test.c (test_main): Add test case that
triggers an assert on 64-bit platforms, without above fix.
* testsuite/ecdsa-sign-test.c (test_main): Test case generating
the same signature.
2021-03-13 Niels Möller <nisse@lysator.liu.se>
* eddsa-verify.c (equal_h): Use ecc_mod_mul_canonical.
2021-03-11 Niels Möller <nisse@lysator.liu.se>
* ecc-mod-arith.c (ecc_mod_mul_canonical, ecc_mod_sqr_canonical):
New functions.
* ecc-internal.h: Declare and document new functions.
* curve448-eh-to-x.c (curve448_eh_to_x): Use ecc_mod_sqr_canonical.
* curve25519-eh-to-x.c (curve25519_eh_to_x): Use ecc_mod_mul_canonical.
* ecc-eh-to-a.c (ecc_eh_to_a): Likewise.
* ecc-j-to-a.c (ecc_j_to_a): Likewise.
* ecc-mul-m.c (ecc_mul_m): Likewise.
2021-02-17 Niels Möller <nisse@lysator.liu.se>
* Released Nettle-3.7.1.
2021-02-15 Niels Möller <nisse@lysator.liu.se>
* examples/nettle-openssl.c (nettle_openssl_arcfour128): Deleted
glue to openssl arcfour.
(openssl_arcfour128_set_encrypt_key)
(openssl_arcfour128_set_decrypt_key): Deleted.
* nettle-internal.h: Deleted declaration.
* examples/nettle-benchmark.c (aeads): Delete benchmarking.
2021-02-13 Niels Möller <nisse@lysator.liu.se>
* configure.ac: Bump package version, to 3.7.1.
(LIBNETTLE_MINOR): Bump minor number, to 8.2.
(LIBHOGWEED_MINOR): Bump minor number, to 6.2.
2021-02-10 Niels Möller <nisse@lysator.liu.se>
* chacha-crypt.c (_nettle_chacha_crypt_4core): Fix for the case
that counter increment should be 3 (129 <= message length <= 192).
(_nettle_chacha_crypt32_4core): Likewise.
* testsuite/chacha-test.c (test_chacha_rounds): New function, for
tests with non-standard round count. Extracted from _test_chacha.
(_test_chacha): Deleted rounds argument. Reorganized crypt/crypt32
handling. When testing message prefixes of varying length, also
encrypt the remainder of the message, to catch errors in counter
value update.
(test_main): Add a few tests with large messages (16 blocks, 1024
octets), to improve test coverage for _nettle_chacha_crypt_4core
and _nettle_chacha_crypt32_4core.
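The counter fix in the 2021-02-10 entry comes down to block arithmetic: a ChaCha block is 64 octets, so a message of 129 to 192 octets spans three blocks and the block counter has to advance by 3. The sketch below only illustrates that arithmetic; `chacha_blocks_for_length` is a hypothetical helper, not a function from chacha-crypt.c.

```c
#include <stddef.h>
#include <stdint.h>

#define CHACHA_BLOCK_SIZE 64

/* Hypothetical helper (not part of Nettle): number of 64-octet ChaCha
   blocks needed to cover `length` octets, which is also the amount by
   which the block counter must advance after processing them. */
uint32_t
chacha_blocks_for_length(size_t length)
{
  return (uint32_t) ((length + CHACHA_BLOCK_SIZE - 1) / CHACHA_BLOCK_SIZE);
}
```

Lengths 129 through 192 all map to 3 here, which is the case the ChangeLog entry names.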
2021-01-25 Niels Möller <nisse@lysator.liu.se>
* arm/neon/salsa20-core-internal.asm: Deleted file. This ARM Neon
implementation reportedly gave a speedup of 45% on Cortex A9,
compared to the C implementation, when it was added back in 2013.
That appears to no longer be the case with more recent processors
and compilers. And it's even significantly slower than the C
implementation on some platforms, including the Raspberry Pi 4.
With the introduction of salsa20-2core.asm, performance of this
function is also less important.
* arm/neon/chacha-core-internal.asm: Deleted file, for analogous reasons.
* arm/fat/salsa20-core-internal-2.asm: Deleted file.
* arm/fat/chacha-core-internal-2.asm: Deleted file.
* fat-arm.c (_nettle_salsa20_core, _nettle_chacha_core): Delete fat setup.
2021-01-31 Niels Möller <nisse@lysator.liu.se>
New variants, contributed by Nicolas Mora.
* pbkdf2-hmac-sha384.c (pbkdf2_hmac_sha384): New file and function.
* pbkdf2-hmac-sha512.c (pbkdf2_hmac_sha512): New file and function.
* testsuite/pbkdf2-test.c (test_main): Corresponding tests.
2021-01-20 Niels Möller <nisse@lysator.liu.se>
* ecc-ecdsa-verify.c (ecc_ecdsa_verify): Fix corner case with
all-zero hash. Reported by Guido Vranken.
* testsuite/ecdsa-verify-test.c: Add corresponding test case.
2021-01-10 Niels Möller <nisse@lysator.liu.se>
* fat-ppc.c: Don't use __GLIBC_PREREQ in the same preprocessor
conditional as defined(__GLIBC_PREREQ), but move to a nested #if
conditional. Fixes compile error on OpenBSD/powerpc64, reported by
Jasper Lievisse Adriaanse.
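The 2021-01-10 entry reflects a general preprocessor rule: the whole `#if` expression is parsed even when `defined(...)` is false, so calling a function-like macro that does not exist is a syntax error. A minimal sketch of the nested form follows. The version numbers and the `HAVE_GLIBC_2_16` macro are assumptions chosen for illustration, not taken from fat-ppc.c.

```c
/* Any libc header pulls in the glibc feature macros where they exist. */
#include <stdlib.h>

/* Problematic form on non-glibc systems: the undefined identifier is
   replaced by 0 and the remaining "(2, 16)" no longer parses:
     #if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 16)
   The nested form below only evaluates the macro once it is known to
   be defined. */
#if defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 16)
#  define HAVE_GLIBC_2_16 1   /* hypothetical feature macro */
# endif
#endif

#ifndef HAVE_GLIBC_2_16
# define HAVE_GLIBC_2_16 0
#endif

/* Reports which branch the preprocessor took. */
int
have_glibc_2_16(void)
{
  return HAVE_GLIBC_2_16;
}
```

On a system without glibc the inner `#if` is skipped as a whole group, so the macro invocation is never evaluated.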
2021-01-04 Niels Möller <nisse@lysator.liu.se>
* Released Nettle-3.7.
...
...
Makefile.in
...
@@ -131,7 +131,7 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 nettle-meta-aeads.c nettle-meta-armors.c \
 		 nettle-meta-ciphers.c nettle-meta-hashes.c nettle-meta-macs.c \
 		 pbkdf2.c pbkdf2-hmac-gosthash94.c pbkdf2-hmac-sha1.c \
-		 pbkdf2-hmac-sha256.c \
+		 pbkdf2-hmac-sha256.c pbkdf2-hmac-sha384.c pbkdf2-hmac-sha512.c \
 		 poly1305-aes.c poly1305-internal.c \
 		 realloc.c \
 		 ripemd160.c ripemd160-compress.c ripemd160-meta.c \
...
NEWS
NEWS for the Nettle 3.7.3 release
This is a bugfix release, fixing bugs that could make the RSA
decryption functions crash on invalid inputs.
Upgrading to the new version is strongly recommended. For
applications that want to support older versions of Nettle,
the bug can be worked around by adding a check that the RSA
ciphertext is in the range 0 < ciphertext < n, before
attempting to decrypt it.
Thanks to Paul Schaub and Justus Winter for reporting these
problems.
The new version is intended to be fully source and binary
compatible with Nettle-3.6. The shared library names are
libnettle.so.8.4 and libhogweed.so.6.4, with sonames
libnettle.so.8 and libhogweed.so.6.
Bug fixes:
* Fix crash for zero input to rsa_sec_decrypt and
rsa_decrypt_tr. Potential denial of service vector.
* Ensure that all of rsa_decrypt_tr and rsa_sec_decrypt return
failure for out of range inputs, instead of either crashing,
or silently reducing input modulo n. Potential denial of
service vector.
* Ensure that rsa_decrypt returns failure for out of range
inputs, instead of silently reducing input modulo n.
* Ensure that rsa_sec_decrypt returns failure if the message
size is too large for the given key. Unlike the other bugs,
this would typically be triggered by invalid local
configuration, rather than by processing untrusted remote
data.
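The workaround described above for older Nettle versions is a range check, 0 < ciphertext < n, done before calling the decryption function. A minimal sketch of that check on unsigned big-endian octet strings is shown here. `rsa_ciphertext_in_range` and its helper are hypothetical names, and the byte-string representation is an assumption made so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Skip leading zero octets of an unsigned big-endian number. */
static void
strip_leading_zeros(const uint8_t **p, size_t *len)
{
  while (*len > 0 && **p == 0)
    {
      (*p)++;
      (*len)--;
    }
}

/* Hypothetical helper (not part of Nettle): returns 1 if 0 < c < n,
   otherwise 0. Both values are unsigned big-endian octet strings. */
int
rsa_ciphertext_in_range(const uint8_t *c, size_t c_len,
                        const uint8_t *n, size_t n_len)
{
  size_t i;

  strip_leading_zeros(&c, &c_len);
  strip_leading_zeros(&n, &n_len);

  if (c_len == 0)       /* c == 0 */
    return 0;
  if (c_len != n_len)   /* the number of significant octets decides */
    return c_len < n_len;
  for (i = 0; i < c_len; i++)
    if (c[i] != n[i])
      return c[i] < n[i];
  return 0;             /* c == n */
}
```

The comparison is not constant-time, which should be acceptable because both the ciphertext and the modulus are public values. For callers that already hold the ciphertext and modulus as GMP integers, the same test can be written as `mpz_sgn(c) > 0 && mpz_cmp(c, n) < 0`.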
NEWS for the Nettle 3.7.2 release
This is a bugfix release, fixing a bug in ECDSA signature
verification that could lead to a denial of service attack
(via an assertion failure) or possibly incorrect results. It
also fixes a few related problems where scalars are required
to be canonically reduced modulo the ECC group order, but in
fact may be slightly larger.
Upgrading to the new version is strongly recommended.
Even when no assert is triggered in ecdsa_verify, ECC point
multiplication may get invalid intermediate values as input,
and produce incorrect results. It's trivial to construct
alleged signatures that result in invalid intermediate values.
It appears difficult to construct an alleged signature that
makes the function misbehave in such a way that an invalid
signature is accepted as valid, but such attacks can't be
ruled out without further analysis.
Thanks to Guido Vranken for setting up the fuzzer tests that
uncovered this problem.
The new version is intended to be fully source and binary
compatible with Nettle-3.6. The shared library names are
libnettle.so.8.3 and libhogweed.so.6.3, with sonames
libnettle.so.8 and libhogweed.so.6.
Bug fixes:
* Fixed bug in ecdsa_verify, and added a corresponding test
case.
* Similar fixes to ecc_gostdsa_verify and gostdsa_vko.
* Similar fixes to eddsa signatures. The problem is less severe
for these curves, because (i) the potentially out of range
value is derived from output of a hash function, making it
harder for the attacker to hit the narrow range of
problematic values, and (ii) the ecc operations are
inherently more robust, and my current understanding is that
unless the corresponding assert is hit, the verify
operation should complete with a correct result.
* Fix to ecdsa_sign, which with a very low probability could
return out of range signature values, which would be
rejected immediately by a verifier.
NEWS for the Nettle 3.7.1 release
This is primarily a bug fix release, fixing a couple of
problems found in Nettle-3.7.
The new version is intended to be fully source and binary
compatible with Nettle-3.6. The shared library names are
libnettle.so.8.2 and libhogweed.so.6.2, with sonames
libnettle.so.8 and libhogweed.so.6.
Bug fixes:
* Fix bug in chacha counter update logic. The problem affected
ppc64 and ppc64el, with the new altivec assembly code
enabled. Reported by Andreas Metzler, after breakage in
GnuTLS tests on ppc64.
* Support for big-endian ARM platforms has been restored.
Fixes contributed by Michael Weiser.
* Fix build problem on OpenBSD/powerpc64, reported by Jasper
Lievisse Adriaanse.
* Fix corner case bug in ECDSA verify; it would produce an
incorrect result in the unlikely case of an all-zero
message hash. Reported by Guido Vranken.
New features:
* Support for pbkdf2_hmac_sha384 and pbkdf2_hmac_sha512,
contributed by Nicolas Mora.
Miscellaneous:
* Poorly performing ARM Neon code for doing single-block
Salsa20 and Chacha has been deleted. The code to do two or
three blocks in parallel, introduced in Nettle-3.7, is
unchanged.
NEWS for the Nettle 3.7 release
This release adds one new feature, the bcrypt password hashing
...
...
arm/README
...
@@ -70,12 +70,24 @@ If data is to be processed with bit operations only, endianness can be ignored
 because byte-swapping on load and store will cancel each other out. Shifts
 however have to be inverted. See arm/memxor.asm for an example.

-3. vld1.8
+3. v{ld,st}1.{8,32}

 NEON's vld instruction can be used to produce endianness-neutral code. vld1.8
 will load a byte sequence into a register regardless of memory endianness. This
 can be used to process byte sequences. See arm/neon/umac-nh.asm for example.

+In the same fashion, vst1.8 can be used to do a little-endian store. See
+arm/neon/salsa and chacha routines for examples.
+
+NOTE: vst1.x (at least on the Allwinner A20 Cortex-A7 implementation) seems to
+interfere with itself on subsequent calls, slowing it down. This can be avoided
+by putting calculations or loads in between two vst1.x stores.
+
+Similarly, vld1.32 is used in chacha and salsa routines where 32-bit operands
+are stored in host-endianness in RAM but need to be loaded sequentially without
+the distortion introduced by vldm/vstm. Consecutive vld1.x instructions do not
+seem to suffer from slowdown similar to vst1.x.
+
 4. vldm/vstm

 Care has to be taken when using vldm/vstm because they have two non-obvious
...
...
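The README hunk above describes byte-wise loads and stores as a way to get the same result on little- and big-endian hosts. The same idea can be sketched in portable C, as an illustration of the principle rather than a translation of the NEON code. `load_le32` and `store_le32` are hypothetical helper names.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four octets in little-endian order.
   Only byte accesses are used, so the result does not depend on the
   byte order of the host. */
uint32_t
load_le32(const uint8_t *p)
{
  return (uint32_t) p[0]
    | ((uint32_t) p[1] << 8)
    | ((uint32_t) p[2] << 16)
    | ((uint32_t) p[3] << 24);
}

/* Write a 32-bit value as four octets in little-endian order, again
   independent of host byte order. */
void
store_le32(uint8_t *p, uint32_t x)
{
  p[0] = (uint8_t) (x & 0xff);
  p[1] = (uint8_t) ((x >> 8) & 0xff);
  p[2] = (uint8_t) ((x >> 16) & 0xff);
  p[3] = (uint8_t) ((x >> 24) & 0xff);
}
```

A word-sized load through a `uint32_t` pointer would instead depend on host byte order, which is the kind of dependency the byte-wise instructions avoid.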
arm/neon/chacha-3core.asm
...
@@ -36,6 +36,7 @@ ifelse(`
 define(`DST', `r0')
 define(`SRC', `r1')
 define(`ROUNDS', `r2')
+define(`SRCp32', `r3')

 C State, X, Y and Z representing consecutive blocks
 define(`X0', `q0')
...
...
@@ -64,10 +65,13 @@ define(`T3', `q7')
 	C _chacha_3core(uint32_t *dst, const uint32_t *src, unsigned rounds)

 PROLOGUE(_nettle_chacha_3core)
-	vldm	SRC, {X0,X1,X2,X3}
+	C loads using vld1.32 to be endianness-neutral wrt consecutive 32-bit words
+	add	SRCp32, SRC, #32
+	vld1.32	{X0,X1}, [SRC]
+	vld1.32	{X2,X3}, [SRCp32]
 	vpush	{q4,q5,q6,q7}
 	adr	r12, .Lcount1
-	vld1.64	{Z3}, [r12]
+	vld1.32	{Z3}, [r12]

 	vadd.i64	Y3, X3, Z3	C Increment 64-bit counter
 	vadd.i64	Z3, Y3, Z3
...
@@ -213,33 +217,49 @@ PROLOGUE(_nettle_chacha_3core)
 	vadd.i32	Y3, Y3, T2
 	vadd.i32	Z3, Z3, T3

-	vldm	SRC, {T0,T1,T2,T3}
+	vld1.32	{T0,T1}, [SRC]
 	vadd.i32	X0, X0, T0
 	vadd.i32	X1, X1, T1
+
+	C vst1.8 because caller expects results little-endian
+	C interleave loads, calculations and stores to save cycles on stores
+	C use vstm when little-endian for some additional speedup
+IF_BE(`	vst1.8	{X0,X1}, [DST]!')
+
+	vld1.32	{T2,T3}, [SRCp32]
 	vadd.i32	X2, X2, T2
 	vadd.i32	X3, X3, T3
-	vstmia	DST!, {X0,X1,X2,X3}
+IF_BE(`	vst1.8	{X2,X3}, [DST]!')
+IF_LE(`	vstmia	DST!, {X0,X1,X2,X3}')

 	vadd.i32	Y0, Y0, T0
 	vadd.i32	Y1, Y1, T1
+IF_BE(`	vst1.8	{Y0,Y1}, [DST]!')
+
 	vadd.i32	Y2, Y2, T2
-	vstmia	DST!, {Y0,Y1,Y2,Y3}
+IF_BE(`	vst1.8	{Y2,Y3}, [DST]!')
+IF_LE(`	vstmia	DST!, {Y0,Y1,Y2,Y3}')

 	vadd.i32	Z0, Z0, T0
 	vadd.i32	Z1, Z1, T1
+IF_BE(`	vst1.8	{Z0,Z1}, [DST]!')
+
 	vadd.i32	Z2, Z2, T2

 	vpop	{q4,q5,q6,q7}

-	vstm	DST, {Z0,Z1,Z2,Z3}
+IF_BE(`	vst1.8	{Z2,Z3}, [DST]')
+IF_LE(`	vstm	DST, {Z0,Z1,Z2,Z3}')
 	bx	lr
 EPILOGUE(_nettle_chacha_3core)

 PROLOGUE(_nettle_chacha_3core32)
-	vldm	SRC, {X0,X1,X2,X3}
+	add	SRCp32, SRC, #32
+	vld1.32	{X0,X1}, [SRC]
+	vld1.32	{X2,X3}, [SRCp32]
 	vpush	{q4,q5,q6,q7}
 	adr	r12, .Lcount1
-	vld1.64	{Z3}, [r12]
+	vld1.32	{Z3}, [r12]

 	vadd.i32	Y3, X3, Z3	C Increment 32-bit counter
 	vadd.i32	Z3, Y3, Z3
...
...
arm/neon/chacha-core-internal.asm
deleted 100644 → 0
C arm/neon/chacha-core-internal.asm

ifelse(`
   Copyright (C) 2013, 2015 Niels Möller

   This file is part of GNU Nettle.

   GNU Nettle is free software: you can redistribute it and/or
   modify it under the terms of either:

     * the GNU Lesser General Public License as published by the Free
       Software Foundation; either version 3 of the License, or (at your
       option) any later version.

   or

     * the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

   or both in parallel, as here.

   GNU Nettle is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program. If
   not, see http://www.gnu.org/licenses/.
')

	.file "chacha-core-internal.asm"
	.fpu	neon

define(`DST', `r0')
define(`SRC', `r1')
define(`ROUNDS', `r2')

define(`X0', `q0')
define(`X1', `q1')
define(`X2', `q2')
define(`X3', `q3')
define(`T0', `q8')
define(`S0', `q12')
define(`S1', `q13')
define(`S2', `q14')
define(`S3', `q15')

define(`QROUND', `
	C x0 += x1, x3 ^= x0, x3 lrot 16
	C x2 += x3, x1 ^= x2, x1 lrot 12
	C x0 += x1, x3 ^= x0, x3 lrot 8
	C x2 += x3, x1 ^= x2, x1 lrot 7

	vadd.i32	$1, $1, $2
	veor		$4, $4, $1
	vshl.i32	T0, $4, #16
	vshr.u32	$4, $4, #16
	veor		$4, $4, T0

	vadd.i32	$3, $3, $4
	veor		$2, $2, $3
	vshl.i32	T0, $2, #12
	vshr.u32	$2, $2, #20
	veor		$2, $2, T0

	vadd.i32	$1, $1, $2
	veor		$4, $4, $1
	vshl.i32	T0, $4, #8
	vshr.u32	$4, $4, #24
	veor		$4, $4, T0

	vadd.i32	$3, $3, $4
	veor		$2, $2, $3
	vshl.i32	T0, $2, #7
	vshr.u32	$2, $2, #25
	veor		$2, $2, T0
')

	.text
	.align 4
	C _chacha_core(uint32_t *dst, const uint32_t *src, unsigned rounds)

PROLOGUE(_nettle_chacha_core)
	vldm	SRC, {X0,X1,X2,X3}

	vmov	S0, X0
	vmov	S1, X1
	vmov	S2, X2
	vmov	S3, X3

	C Input rows little-endian:
	C	 0  1  2  3	X0
	C	 4  5  6  7	X1
	C	 8  9 10 11	X2
	C	12 13 14 15	X3

	C Input rows big-endian:
	C	 1  0  3  2	X0
	C	 5  4  7  6	X1
	C	 9  8 11 10	X2
	C	13 12 15 14	X3
	C even and odd columns switched because
	C vldm loads consecutive doublewords and
	C switches words inside them to make them BE

.Loop:
	QROUND(X0, X1, X2, X3)

	C In little-endian rotate rows, to get
	C	 0  1  2  3
	C	 5  6  7  4  >>> 3
	C	10 11  8  9  >>> 2
	C	15 12 13 14  >>> 1

	C In big-endian rotate rows, to get
	C	 1  0  3  2
	C	 6  5  4  7  >>> 1
	C	11 10  9  8  >>> 2
	C	12 15 14 13  >>> 3
	C different number of elements needs to be
	C extracted on BE because of different column order
IF_LE(`	vext.32	X1, X1, X1, #1')
IF_BE(`	vext.32	X1, X1, X1, #3')
	vext.32	X2, X2, X2, #2
IF_LE(`	vext.32	X3, X3, X3, #3')
IF_BE(`	vext.32	X3, X3, X3, #1')

	QROUND(X0, X1, X2, X3)

	subs	ROUNDS, ROUNDS, #2
	C Inverse rotation
IF_LE(`	vext.32	X1, X1, X1, #3')
IF_BE(`	vext.32	X1, X1, X1, #1')
	vext.32	X2, X2, X2, #2
IF_LE(`	vext.32	X3, X3, X3, #1')
IF_BE(`	vext.32	X3, X3, X3, #3')

	bhi	.Loop

	vadd.u32	X0, X0, S0
	vadd.u32	X1, X1, S1
	vadd.u32	X2, X2, S2
	vadd.u32	X3, X3, S3

	C caller expects result little-endian
IF_BE(`	vrev32.u8	X0, X0
	vrev32.u8	X1, X1
	vrev32.u8	X2, X2
	vrev32.u8	X3, X3')

	vstm	DST, {X0,X1,X2,X3}
	bx	lr
EPILOGUE(_nettle_chacha_core)

divert(-1)
define chachastate
p/x $q0.u32
p/x $q1.u32
p/x $q2.u32
p/x $q3.u32
end
arm/neon/salsa20-2core.asm
...
@@ -36,6 +36,7 @@ ifelse(`
 define(`DST', `r0')
 define(`SRC', `r1')
 define(`ROUNDS', `r2')
+define(`SRCp32', `r3')

 C State, even elements in X, odd elements in Y
 define(`X0', `q0')
...
...
@@ -58,11 +59,14 @@ define(`T3', `q15')
 	C _salsa20_2core(uint32_t *dst, const uint32_t *src, unsigned rounds)

 PROLOGUE(_nettle_salsa20_2core)
-	vldm	SRC, {X0,X1,X2,X3}
+	C loads using vld1.32 to be endianness-neutral wrt consecutive 32-bit words
+	add	SRCp32, SRC, #32
+	vld1.32	{X0,X1}, [SRC]
+	vld1.32	{X2,X3}, [SRCp32]
 	adr	r12, .Lcount1

 	vmov	Y3, X0
-	vld1.64	{Y1}, [r12]
+	vld1.32	{Y1}, [r12]
 	vmov	Y0, X1
 	vadd.i64	Y1, Y1, X2	C Increment counter
 	vmov	Y2, X3
...
...
@@ -180,7 +184,8 @@ C Inverse swaps and transpositions
 	vswp	D1REG(Y0), D1REG(Y2)
 	vswp	D1REG(Y1), D1REG(Y3)

-	vldm	SRC, {T0,T1,T2,T3}
+	vld1.32	{T0,T1}, [SRC]
+	vld1.32	{T2,T3}, [SRCp32]

 	vtrn.32	X0, Y3
 	vtrn.32	X1, Y0
...
@@ -190,17 +195,26 @@ C Inverse swaps and transpositions
 	C Add in the original context
 	vadd.i32	X0, X0, T0
 	vadd.i32	X1, X1, T1
+
+C vst1.8 because caller expects results little-endian
+C interleave loads, calculations and stores to save cycles on stores
+C use vstm when little-endian for some additional speedup
+IF_BE(`	vst1.8	{X0,X1}, [DST]!')
+
 	vadd.i32	X2, X2, T2
 	vadd.i32	X3, X3, T3
+IF_BE(`	vst1.8	{X2,X3}, [DST]!')
+IF_LE(`	vstmia	DST!, {X0,X1,X2,X3}')

-	vstmia	DST!, {X0,X1,X2,X3}
-	vld1.64	{X0}, [r12]
+	vld1.32	{X0}, [r12]
 	vadd.i32	T0, T0, Y3
 	vadd.i64	T2, T2, X0
 	vadd.i32	T1, T1, Y0
+IF_BE(`	vst1.8	{T0,T1}, [DST]!')
+
 	vadd.i32	T2, T2, Y1
 	vadd.i32	T3, T3, Y2

-	vstm	DST, {T0,T1,T2,T3}
+IF_BE(`	vst1.8	{T2,T3}, [DST]')
+IF_LE(`	vstm	DST, {T0,T1,T2,T3}')
 	bx	lr
 EPILOGUE(_nettle_salsa20_2core)
arm/neon/salsa20-core-internal.asm
deleted 100644 → 0