Show HN: Rscrypto, pure-Rust crypto with industry leading public benches

(github.com)

32 points | by LoadingALIAS 1 day ago

5 comments

dave_universetf 1 day ago
The readme has strong LLM smells. Was the code written by an LLM as well?
What is your experience with cryptographic engineering, in particular avoiding common implementation pitfalls that bite first-time implementers of cryptographic primitives?
Are the primitives tested against Wycheproof vectors, and proofed against the common implementation mistakes they document?
- tux3 1 day ago
  Yeah, spot on. This is what the code looks like: https://github.com/loadingalias/rscrypto/blob/4e24772a54fef3...
  Look at these section comments that LLMs love ("// ─── Rotation helpers ────")
  Now you sometimes see these section comments in legacy codebases that have very long files. What you don't see people use is U+2500 BOX DRAWINGS LIGHT HORIZONTAL Unicode characters padded out just right to look pretty. We humans have regular keyboards, but these AIs are trained to output emojis and pretty Unicode.
  - LoadingALIAS 23 hours ago
    The documentation and to a large extent commenting, auditing, and almost every markdown file was likely generated with an LLM. Do not mistake that for competence or quality.
    This is a pre-v1 codebase. I'm looking for bench-methodology failures; I'm looking for API issues and/or code smells. I'm looking for ASM/SIMD weak points and/or testing issues.
    Over time, as I have the capacity, I will almost certainly clean up anything that's just not necessary. Having said that, if something feels clean and it was done by an LLM in my harness/workflow - I'm 100% happy to leave it.
    Please, dig into the code. Let me know what you see. Thanks.
- LoadingALIAS 23 hours ago
  Yeah, all fair questions.
  To address the LLM question - almost all MD files in the codebase were built around the codebase by an LLM. I simply don't have the time; this project is a side project and not my main squeeze. This is also a pre-v1 codebase; I will have time soon enough to address anything overly 'LLM' flavored.
  My experience covers nearly two decades in one way or another. Having said that, I've never felt like I had the time, nor the need, for rscrypto. The last year was different; I genuinely needed this myself for my actual work. I have worked on rscrypto part-time for a year. This isn't like a whimsical LLM codebase or some vibe-coded junk.
  I use LLMs in my workflows every single day and have for the better part of two years; I gain more trust in them almost weekly, too. I feel like there isn't an engineer on Earth who can say otherwise, and if there is... I'd probably argue with them in favor of integrating LLMs into their tooling in some way.
  Finally, the actual important question... not all primitives are tested against Wycheproof vectors yet. RSA - yes; the whole crate, not yet. Again, it's just a time thing. I've used official RFC/NIST vectors, RustCrypto/oracle differential tests, proptests, fuzz corpus replay, Miri where applicable, and backend-vs-portable equivalence tests to cover the rest of the codebase.
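The thread doesn't show what rscrypto's backend-vs-portable equivalence tests look like. As a rough sketch of the idea, assuming CRC-32 (IEEE, reflected polynomial 0xEDB88320) as the primitive: a bit-at-a-time reference path, a table-driven path standing in for an accelerated backend, and a differential loop over pseudo-random inputs. All function names are hypothetical, not rscrypto's API.

```rust
/// Reference path: bit-at-a-time CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32_portable(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // If the low bit is 1, fold in the polynomial after shifting.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Builds the 256-entry lookup table used by the "accelerated" stand-in.
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for i in 0..256u32 {
        let mut c = i;
        for _ in 0..8 {
            let mask = (c & 1).wrapping_neg();
            c = (c >> 1) ^ (0xEDB8_8320 & mask);
        }
        table[i as usize] = c;
    }
    table
}

/// Stand-in for a faster backend: byte-at-a-time table lookup.
pub fn crc32_table(data: &[u8]) -> u32 {
    let table = make_table();
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        let idx = ((crc ^ byte as u32) & 0xFF) as usize;
        crc = (crc >> 8) ^ table[idx];
    }
    !crc
}

/// Tiny xorshift64 PRNG so the differential test needs no external crates.
fn xorshift64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Differential check: the backend must match the portable path byte-for-byte.
pub fn differential_check(rounds: usize) -> bool {
    let mut seed = 0x9E37_79B9_7F4A_7C15u64;
    for _ in 0..rounds {
        let len = (xorshift64(&mut seed) % 300) as usize;
        let input: Vec<u8> = (0..len).map(|_| xorshift64(&mut seed) as u8).collect();
        if crc32_portable(&input) != crc32_table(&input) {
            return false;
        }
    }
    true
}
```

A real harness would typically also replay fuzz corpora and deliberately hit lengths around every block and dispatch boundary, not only random lengths.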
  Also, “proofed” is too strong a word for test vectors, IMO. Wycheproof is regression evidence against known bug classes, not a proof of cryptographic correctness.
  Nevertheless, it's a valid point, and it's been in my backlog for about a month.
sevenoftwelve 1 day ago
Hi @LoadingAlias,
> Constant-time MAC, AEAD, and signature verification.
That sounds suspiciously incomplete to me.
Which cryptographic algorithms in the library are currently not implemented in constant time?
Where did the speedup come from? How were these optimizations achieved?
What motivated you to write the library? Why not contribute to existing rust crypto libraries instead? How is the work financed?
What peer review strategy are you following with the library? Who else but yourself has verified this code?
- sevenoftwelve 1 day ago
  Why do the different sha2 variants not share code? This seems like a lot of opportunities for small mistakes/discrepancies; especially considering the many architectures.
  Was any of this generated using AI?
  - LoadingALIAS 18 hours ago
    The SHA2 variants DO share the compression layer where I felt it mattered:
    - SHA-224 uses the SHA-256 compression kernels w/ different IV/output truncation.
    - SHA-384 and SHA-512/256 use the SHA-512 compression kernels w/ different IV/output truncation.
    There IS some duplicated wrapper/finalization/state code per public type, and I agree that is probably the first place where small discrepancies/mistakes can creep in over time. I appreciate you pointing it out; I've added it to the backlog and will look it over as soon as possible. The reason it exists today is more about keeping monomorphized public types simple. I’m not religious about it; if I can reduce that wrapper duplication w/o making the dispatch/type story worse, I should - and I will.
    The guardrail is that SHA2 has official vectors + differential/proptest coverage against the sha2 crate for one-shot and streaming paths.
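To illustrate the kind of sharing described above (this is not rscrypto's code): per FIPS 180-4, SHA-224 runs the same compression function as SHA-256 and differs only in its initial hash value and in keeping 7 of the 8 output words. The sketch is a plain portable one-shot implementation with no streaming API or SIMD; `digest`, `sha224`, `sha256`, and `hex` are illustrative names.

```rust
// SHA-224 and SHA-256 share one compression function; only the IV and the
// number of output words differ (FIPS 180-4). Illustrative sketch only.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const SHA256_IV: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];
const SHA224_IV: [u32; 8] = [
    0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939, 0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4,
];

/// The shared kernel: absorbs one 64-byte block into the 8-word state.
fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    let mut w = [0u32; 64];
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..64 {
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        h = g;
        g = f;
        f = e;
        e = d.wrapping_add(t1);
        d = c;
        c = b;
        b = a;
        a = t1.wrapping_add(s0.wrapping_add(maj));
    }
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
        *s = s.wrapping_add(v);
    }
}

/// Shared driver: pad, compress every block, then truncate to `out_words`.
fn digest(iv: [u32; 8], msg: &[u8], out_words: usize) -> Vec<u8> {
    let mut state = iv;
    let mut padded = msg.to_vec();
    padded.push(0x80);
    while padded.len() % 64 != 56 {
        padded.push(0);
    }
    padded.extend_from_slice(&((msg.len() as u64) * 8).to_be_bytes());
    for chunk in padded.chunks_exact(64) {
        let mut block = [0u8; 64];
        block.copy_from_slice(chunk);
        compress(&mut state, &block);
    }
    state[..out_words].iter().flat_map(|w| w.to_be_bytes()).collect()
}

pub fn sha256(msg: &[u8]) -> Vec<u8> {
    digest(SHA256_IV, msg, 8)
}

/// SHA-224: same kernel, different IV, output truncated to 7 words (28 bytes).
pub fn sha224(msg: &[u8]) -> Vec<u8> {
    digest(SHA224_IV, msg, 7)
}

pub fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Because both public functions funnel into one `digest` driver, a padding, length-encoding, or truncation bug would show up in both variants at once, which is the kind of wrapper consolidation discussed above.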
    Yes, I use an LLM daily and have for a few years now. It's used as an assistant during parts of the project, especially for drafting, refactoring passes, test scaffolding, and review prompts. I use an LLM to write markdown files for the public - it's not something I'm great at. I do not treat generated code as trusted... in fact, it's the exact opposite. It has to compile, pass vectors/differentials/fuzz/Miri where applicable, and survive manual review. Also, this is crypto, the tests are not decoration; they are the bar before code counts. I know that our industry is drowning in vibe-coded nonsense; this is not that. This is like a year of my life... and maintaining it for many years to come.
    A final point I wanted to leave... this is pre-v1. The point of sharing today was to get people to dig into it and find the problems. If there are other issues, inefficiencies, or smells you find - please, share them. Thank you!
- CodesInChaos 1 day ago
  "Constant-time signature verification" stands out, since unlike signature creation, verification doesn't involve secrets, and thus doesn't require constant-time in most threat models.
  - LoadingALIAS 18 hours ago
    [dead]
- LoadingALIAS 23 hours ago
  Hey! Thank you for taking a second. Really, I appreciate it. So... fair criticism. The constant-time line is too compressed and should probably be replaced w/ some kind of matrix.
  I'd ask you to give me a few hours. I'm not able to devote the time these comments deserve right now. I'm nearly home; give me a bit, please.
  Thanks!
- LoadingALIAS 19 hours ago
  Okay, I finally have a second to breathe. Sorry for the delayed response - life and all that.
  I am not claiming the entire crate is constant-time, and if the README reads that way, that is my mistake. My intended claim is MUCH narrower... secret-bearing compare/open/verify/private-op paths avoid secret-dependent early exits where it matters.
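A minimal sketch of what avoiding a secret-dependent early exit means for tag verification, assuming tags are byte slices and tag length is public. It's hypothetical, not rscrypto's implementation; production code usually leans on a crate such as `subtle`, because the optimizer is free to rewrite a naive loop like this one.

```rust
use std::hint::black_box;

/// Compares two MAC tags without stopping at the first mismatching byte.
/// Tag length is treated as public, so a length mismatch may return immediately.
pub fn tags_equal(expected: &[u8], received: &[u8]) -> bool {
    if expected.len() != received.len() {
        return false; // public shape check, not a secret-dependent branch
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(received.iter()) {
        // Accumulate every difference; never break out of the loop.
        diff |= a ^ b;
    }
    // black_box discourages (but does not guarantee against) the optimizer
    // turning the accumulator back into an early-exit comparison.
    black_box(diff) == 0
}

/// Opaque failure: the error carries no detail about *why* verification failed.
#[derive(Debug, PartialEq)]
pub struct VerifyError;

pub fn verify_tag(expected: &[u8], received: &[u8]) -> Result<(), VerifyError> {
    if tags_equal(expected, received) {
        Ok(())
    } else {
        Err(VerifyError)
    }
}
```

The length check returns early, which is fine under the usual model where tag length is public; the byte loop is the part that must not stop at the first mismatch.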
  NO global constant-time claim for:
  - parsers/importers/DER/PHC decoding
  - algo/profile negotiation
  - keygen and OS randomness paths
  - public RSA verification/encryption work
  - hashes/checksums/fast hashes as whole APIs
  - length/shape rejection before a primitive boundary
  - Argon2d/scrypt as blanket CT primitives
  With respect to AEAD/MAC verification, the important pieces are full tag comparison and opaque failure. For RSA private ops, the relevant pieces are blinding, fixed-window exponentiation with constant-time table selection, public fault checks, failure accumulation, output clearing, and the release-mode leakage regression gate in CI (rsa.yaml). That is evidence, not proof.
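Fixed-window exponentiation with constant-time table selection can be illustrated with a toy over a 64-bit modulus. That modulus size is an assumption for readability: real RSA uses multi-limb Montgomery arithmetic, and `%` on `u128` is not itself constant-time. Every window does the same number of squarings and exactly one multiplication, and the table read touches every entry, so neither the operation sequence nor the access pattern depends on exponent bits. All names are hypothetical.

```rust
const WINDOW_BITS: u32 = 4;
const TABLE_LEN: usize = 1 << WINDOW_BITS;

/// Toy modular multiply. NOTE: `%` on u128 is not constant-time on real CPUs.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    (((a as u128) * (b as u128)) % (m as u128)) as u64
}

/// Reads every table entry and keeps only the one at `idx`, so the memory
/// access pattern does not depend on the (secret) exponent digit.
fn ct_select(table: &[u64; TABLE_LEN], idx: u64) -> u64 {
    let mut out = 0u64;
    for (i, &entry) in table.iter().enumerate() {
        let diff = (i as u64) ^ idx;
        // mask is all ones when diff == 0, otherwise all zeros
        let mask = ((diff | diff.wrapping_neg()) >> 63).wrapping_sub(1);
        out |= entry & mask;
    }
    out
}

/// Processes all 64 exponent bits in 4-bit windows: always 4 squarings and
/// one table multiply per window, whatever the exponent's value.
pub fn modpow_fixed_window(base: u64, exp: u64, m: u64) -> u64 {
    assert!(m > 0, "modulus must be non-zero");
    let mut table = [0u64; TABLE_LEN];
    table[0] = 1 % m;
    for i in 1..TABLE_LEN {
        table[i] = mul_mod(table[i - 1], base % m, m); // table[i] = base^i mod m
    }
    let mut acc = 1 % m;
    for window in (0..64 / WINDOW_BITS).rev() {
        for _ in 0..WINDOW_BITS {
            acc = mul_mod(acc, acc, m);
        }
        let digit = (exp >> (window * WINDOW_BITS)) & (TABLE_LEN as u64 - 1);
        acc = mul_mod(acc, ct_select(&table, digit), m);
    }
    acc
}
```

Blinding, fault checks, failure accumulation, and output clearing from the list above are not shown here.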
  The speedups are not from one trick. This is about a year of work, w/ some general planning before that. The main sources are:
  - arch dispatch with portable Rust as the reference path
  - hardware AES/SHA/PMULL/CLMUL/CRC/etc. where available and measurably better
  - tuned per-size dispatch tables instead of one backend for every length
  - fused one-shot paths for small HMAC/HKDF/PBKDF2 cases
  - reusable scratch APIs to avoid repeated allocation, especially RSA
  - backend-specific kernels for SHA-2/SHA-3/BLAKE3/AEAD/checksums
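The thread doesn't spell out what a per-size dispatch table looks like, so here's a hedged sketch of the general shape: an ordered list of `(max_len, kernel)` entries checked front to back, with a portable kernel as the reference. The XOR-fold kernels and the 63-byte threshold are made up for illustration; real thresholds would come from benchmarks on each target.

```rust
/// A kernel maps an input to its result; every kernel must agree with the portable one.
type Kernel = fn(&[u8]) -> u8;

/// Portable reference: XOR of all bytes, one byte at a time.
fn xor_fold_bytewise(data: &[u8]) -> u8 {
    data.iter().fold(0, |acc, &b| acc ^ b)
}

/// Hypothetical "bulk" kernel: XOR 8 bytes at a time as a u64, then fold the lanes.
fn xor_fold_words(data: &[u8]) -> u8 {
    let mut chunks = data.chunks_exact(8);
    let mut acc = 0u64;
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        acc ^= u64::from_le_bytes(word);
    }
    let lanes = acc.to_le_bytes().iter().fold(0u8, |a, &b| a ^ b);
    lanes ^ xor_fold_bytewise(chunks.remainder())
}

/// (max input length, kernel), checked in order; the last entry covers everything.
/// The 63-byte cutoff is a made-up placeholder, not a measured value.
const DISPATCH: [(usize, Kernel); 2] = [
    (63, xor_fold_bytewise),
    (usize::MAX, xor_fold_words),
];

pub fn xor_fold(data: &[u8]) -> u8 {
    for &(max_len, kernel) in DISPATCH.iter() {
        if data.len() <= max_len {
            return kernel(data);
        }
    }
    unreachable!("last dispatch entry covers all lengths")
}
```

Keeping the table as data makes it easy to retune per target and to differential-test every length on both sides of each threshold.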
  It is not uniformly faster... but my God, it's close. Crucially, the README.md/OVERVIEW.md call out the losses too: small AEAD overhead, some X25519/RSA verify rows, PBKDF2-SHA256 at iters=1, and platform-sensitive SHA-3/SHAKE behavior... I'm also having some trouble w/ the macOS BLAKE3 perf. It's just been elusive af. The `benchmark_results/OVERVIEW.md` is the clearest source for the raw shape of the wins and losses.
  My motivation was straightforward - I needed this. My company’s lead product benefits a LOT from removing C libs/FFI, reducing external deps, avoiding competing types, having a unified no_std/WASM story, and making checksums faster. I had worked on https://crates.io/crates/crc-fast previously and wanted to push that kind of direction much further... but it just wasn't going to happen. I contributed the no-std/wasm compat there and then realized... I need to do this myself; that's when I really started working out the details. I'd already been exploring it for a while at that point and was tackling Blake3 w/o C-libs head-on for months.
  I did consider contributing more to existing Rust crypto libraries, but this was not a small patch series. The shape I wanted was a single pure-Rust primitive stack with small feature-selected/leaf builds, no mandatory C/OpenSSL/system-lib dependency, no_std support, portable fallbacks, and cross arch dispatch built in. The existing Rust crypto ecosystem is important and I use/compare against it heavily; rscrypto is exploring a different packaging/performance/control point.
  This is 100% self-funded right now. If/when the OSS side of my startup is ready, rscrypto may become company-maintained, but it will remain open source forever. I cannot afford to start a FIPS validation process yet. I have a backlog, and the first item on it (at least before sharing this today) is deciding how to structure things for FIPS, in case an opportunity for a subsidized audit presents itself.
  No formal third-party audit yet, either. I guess I should have been more explicit about that? Current review evidence is purely the public source, official vectors, RSA Wycheproof (I will likely expand Wycheproof coverage when I get the time), NIST CAVP subset coverage, differential tests against established crates/libs where possible, proptests, fuzzer/corpus replay, Miri, and the RSA CI gate in '.github/workflows/rsa.yaml'.
  I’m posting it publicly because I want serious review before pretending it has had one. I know review matters; I simply cannot afford a proper external audit yet, or I would have done it already.
LoadingALIAS 1 day ago
I built rscrypto because crypto kept being where my Rust database stopped being portable: different stack on the server, different target story on WASM, different answer on RISC-V/POWER/IBM Z, and a different audit surface every time I added a primitive. The supply chain risk, given the landscape we're in today, was too high.
v0.3.1 is one feature-selected crate. Leaf features when you need one primitive (`sha2`, `rsa`, `aes-gcm`, `ed25519`, etc.) or `full` for the stack. Scope includes SHA-2/3, SHAKE, cSHAKE256, BLAKE2, BLAKE3, Ascon hash/XOF, XXH3, RapidHash, CRCs, HMAC, KMAC256, HKDF, PBKDF2, Argon2, scrypt, PHC strings, RSA, Ed25519, X25519, AES-128/256-GCM, AES-128/256-GCM-SIV, ChaCha20-Poly1305, XChaCha20-Poly1305, AEGIS-256, and Ascon-AEAD128.
The primitive stack has zero default deps and no C-libs or FFI. Optional `getrandom`, `serde`, and `rayon` features stay out until enabled.
The current bench evidence is across nine Linux runners (Intel Sapphire Rapids, Intel Ice Lake, AMD Zen4, AMD Zen5, Graviton3, Graviton4, IBM Z/s390x, IBM POWER10/ppc64le, RISE RISC-V) and my local Apple MBP M1.
Linux vs. fastest-external: 3,545 wins and 5,210 wins-or-ties out of 5,832 comparisons, 1.61x geomean.
MBP M1 vs fastest-external: 235 wins and 450 wins-or-ties out of 463 comparisons, 1.25x geomean.
BLAKE3 large inputs (`>=64 KiB`) show a 2.31x geomean improvement across Linux vs. the official `blake3` crate, and 1.80x on the MBP M1.
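The post reports wins, wins-or-ties, and a geometric mean without spelling out the formulas. A sketch under stated assumptions: per-row speedup is the fastest external time divided by rscrypto's time (so a value above 1.0 means rscrypto won that row), the geometric mean is exp of the mean of the natural logs, and a tie is a ratio within a tolerance `tie_tol` that is purely hypothetical here.

```rust
/// Speedup for one benchmark row: fastest external time divided by our time.
/// Greater than 1.0 means our implementation was faster on that row.
pub fn speedup(ours_ns: f64, externals_ns: &[f64]) -> f64 {
    let fastest = externals_ns.iter().cloned().fold(f64::INFINITY, f64::min);
    fastest / ours_ns
}

/// Geometric mean of ratios: exp(mean(ln r)).
pub fn geomean(ratios: &[f64]) -> f64 {
    let sum_ln: f64 = ratios.iter().map(|r| r.ln()).sum();
    (sum_ln / ratios.len() as f64).exp()
}

/// Returns (wins, wins_or_ties) given a hypothetical tie tolerance, e.g. 0.02 for 2%.
pub fn tally(ratios: &[f64], tie_tol: f64) -> (usize, usize) {
    let wins = ratios.iter().filter(|&&r| r > 1.0 + tie_tol).count();
    let wins_or_ties = ratios.iter().filter(|&&r| r >= 1.0 - tie_tol).count();
    (wins, wins_or_ties)
}
```

A geometric mean is the usual choice for ratios because a 2x win and a 2x loss cancel to 1.0, which an arithmetic mean would not do.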
While it's not universally faster - it's incredibly close. Current weak spots include PBKDF2-SHA256 setup at `iters=1`, X25519 DH, RSA verification on Arm/RISC-V, small-message AEAD rows, MBP M1 BLAKE3 64 KiB rows, HMAC-SHA256 bulk pressure against `aws-lc-rs`, and SHA3-256 streaming on Apple Silicon. The `./benchmark_results/OVERVIEW.md` lists the losses next to the wins in more detail.
Trust, Testing, Etc: portable Rust is the byte-for-byte authority. SIMD/ASM paths are accelerators and are differentially tested against the portable path. MAC, AEAD, and signature comparisons are constant-time. Secret-bearing types zeroize on drop. I've got a pretty thorough Miri and fuzzer testing gate set up, too. The RSA impl has its own CI gate. Codecov coverage is 73.06%, fuzzing included.
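A minimal sketch of zeroize-on-drop for a hypothetical `SecretKey` type, using only the standard library. Volatile writes keep the compiler from eliding the stores as dead; in practice the `zeroize` crate is the common choice. This does not wipe copies left behind by moves or in registers, a known limit of the approach.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical secret-key wrapper that wipes its bytes when dropped.
pub struct SecretKey {
    bytes: [u8; 32],
}

impl SecretKey {
    pub fn new(bytes: [u8; 32]) -> Self {
        SecretKey { bytes }
    }

    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.bytes
    }

    /// Overwrites the key with zeros using volatile writes, so the stores
    /// are not optimized away just because the value is about to be freed.
    pub fn wipe(&mut self) {
        for b in self.bytes.iter_mut() {
            // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        // Keep later code from being reordered before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for SecretKey {
    fn drop(&mut self) {
        self.wipe();
    }
}
```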
This is not FIPS 140-3 validated, not a TLS stack, not a key store, and not third-party audited yet. I am genuinely interested in a third-party audit and would LOVE to plan for FIPS 140-3 validation, but it's just out of my reach right now.
The codebase/lib is obviously pre-v1 and I'm asking for public review while API changes are still relatively cheap.
Repo: https://github.com/loadingalias/rscrypto
Crate: https://crates.io/crates/rscrypto
Benches: https://github.com/loadingalias/rscrypto/blob/main/benchmark...
Migration Guides: https://github.com/loadingalias/rscrypto/blob/main/docs/migr...
Me: https://x.com/loadingalias
If you're testing, benching, etc. and happen to stumble across inconsistencies, vulnerabilities, etc. - please just reach out directly via 'X' or use GitHub's Vulnerability Reporting. There are a decent number of people already using the library.
Also, the 'fastest-external' competitors for perf comparisons are almost always one of the following: aws-lc-rs, ring, RustCrypto, dryoc, OpenSSL, Blake3 and/or one of the many 'crc-fast/fast-crc' crate variations. I benched these external crates against each other in the beginning to find the most performant before hunting inefficiencies and cutting out any external deps/C libs. So, if the benches show a 2x geomean over Blake3... that means it's over the fastest implementation of Blake3 I could find and bench publicly.
sevenoftwelve 15 hours ago
Hi there,
I am Karolin Varner, the person who designed the Rosenpass Protocol, which secures WireGuard against quantum attacks, and I am the managing director of Rosenpass e. V. I am well connected in the real-world cryptography scientific community and do cryptography daily.
Based on the responses from the author in this thread, I would strongly advise anyone against using this library for the following reasons:
- Lack of third-party reviews/existing review processes
- Somewhat evasive/defensive answers about LLM usage in this thread
- Lack of complete constant-time cryptography support
- The author's own insistence that this is pre-v1 code (if it's pre-v1 and thus not yet at quality, don't advertise it for use; this is cryptography). This claim is especially alarming since it's not found in the README, which does beg users to employ this library.
- Publishing this and then forcing everyone else to review this is extracting free labor from cryptographers. Code duplication in the code base extracts extra labor from cryptographers checking this. It's simply not good neighborly behavior.
The author claims themselves that the library is not fully constant-time secure; constant-time security is a basic guarantee all cryptography code MUST follow.
AUTHOR QUOTE:
```
  NO global constant-time claim for:
  
  - parsers/importers/DER/PHC decoding
  - algo/profile negotiation
  - keygen and OS randomness paths
  - public RSA verification/encryption work
  - hashes/checksums/fast hashes as whole APIs
  - length/shape rejection before a primitive boundary
  - Argon2d/scrypt as blanket CT primitives
```
Honestly, I am having a hard time making sense of this comment. A lot of these clearly should be constant time; not making them constant time is by definition insecure.
Given that this code is partially LLM-generated, I am dubious about whether every line of the code that is claimed to be constant-time secure is actually so.
It is also very confusing that the author claims constant-time security for MACs but not for (cryptographic) hashes. MACs (message authentication codes) are implemented in terms of hash functions, so how can the MAC be constant-time secure if the hash function is not? It makes no sense.
I suspect that the speedup reported might be in part due to the lack of constant-time security. Constant-time code comes with a performance penalty; if you are not doing constant time and the other people are, then of course your code will be faster.
It is particularly troubling that the author talks about "algo/profile negotiation"; this type of feature was a frequent source of dangerous vulnerabilities in SSL/TLS implementations (look up downgrade attacks). Also, for a library providing primitives, why is algo/profile negotiation even needed?
In short: If you want your projects to be secure, please do not use this. And please, dear author, do not publish half-baked crates in this way. It's disrespectful and steals our time as cryptographers.
- LoadingALIAS 6 hours ago
  Karolin, I’ll answer this directly, and try to stay on track, because several claims here are materially wrong.
  First things first, I am not claiming I've had any third-party audit; I'm not claiming I've been FIPS validated, or telling anyone “use this blindly in high-assurance production.” If that is your bar, then yes: do not use rscrypto yet. That is a fair warning... and it's plastered everywhere. I almost feel like you didn't read the codebase, the thread here, etc.
  Now, to your constant-time criticism - it's just not correct. It's wrong. It's misleading. It shouldn't come from someone leading Rosenpass.
  “Not a global constant-time claim” does not mean “secret-bearing cryptographic ops branch on secrets.” I've said this twice now - verbatim. It means I am refusing to make a fake blanket claim over APIs where constant-time behavior is either irrelevant, impossible, or the wrong security property.
  DER parsing is not supposed to be constant-time over DER structure. PHC string decoding is not supposed to be constant-time over ASCII syntax. OS randomness is not a constant-time primitive. RSA public verification/encryption operates on public inputs. Rejection of public length/shape before entering a primitive boundary is NOT a secret-dependent timing leak. Argon2d intentionally has data-dependent memory access; treating it as a blanket CT primitive would be wrong - it's a lie at that point.
  That list was a boundary, Karolin, not an admission of insecure primitives.
  The MAC/hash point is also mixing categories. A MAC verification claim is about the keyed construction and verification surface: no secret-dependent behavior in the keyed path that matters for the construction, opaque failure, and constant-time tag comparison. An unkeyed hash API as a whole does not get the same global claim because message length, streaming shape, finalization shape, feature-dispatched kernels, and non-crypto hash/checksum APIs are public-input machinery. “HMAC uses a hash” does not imply “the entire hash API must be globally constant-time under every possible use.”
  On “algo/profile negotiation”: this is not TLS-style negotiation. It is closed protocol identifier mapping for RSA profiles: JWT/COSE/TLS/X.509 identifiers are parsed into EXPLICIT supported profile enums, and unsupported/confused algos are rejected. If the word “negotiation” suggested downgrade-prone protocol behavior, I should fix the language, sure, but the feature is not an open negotiation mechanism.
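A sketch of what closed identifier mapping can look like (the enum and function names are hypothetical): each external identifier maps to exactly one supported profile or is rejected, with no fallback and no best-mutual-choice step. The JWT `alg` strings follow RFC 7518, and the COSE values follow RFC 8230 (PS*) and RFC 8812 (RS*).

```rust
/// Hypothetical closed set of RSA signature profiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RsaProfile {
    Pkcs1v15Sha256,
    Pkcs1v15Sha384,
    Pkcs1v15Sha512,
    PssSha256,
    PssSha384,
    PssSha512,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedAlg;

/// JWT/JWS "alg" header values. Anything not listed is rejected outright:
/// no fallback, no case folding, no "closest match".
pub fn from_jwt_alg(alg: &str) -> Result<RsaProfile, UnsupportedAlg> {
    match alg {
        "RS256" => Ok(RsaProfile::Pkcs1v15Sha256),
        "RS384" => Ok(RsaProfile::Pkcs1v15Sha384),
        "RS512" => Ok(RsaProfile::Pkcs1v15Sha512),
        "PS256" => Ok(RsaProfile::PssSha256),
        "PS384" => Ok(RsaProfile::PssSha384),
        "PS512" => Ok(RsaProfile::PssSha512),
        _ => Err(UnsupportedAlg), // e.g. "none", "HS256", "rs256"
    }
}

/// COSE algorithm identifiers for the same profiles.
pub fn from_cose_alg(id: i64) -> Result<RsaProfile, UnsupportedAlg> {
    match id {
        -257 => Ok(RsaProfile::Pkcs1v15Sha256),
        -258 => Ok(RsaProfile::Pkcs1v15Sha384),
        -259 => Ok(RsaProfile::Pkcs1v15Sha512),
        -37 => Ok(RsaProfile::PssSha256),
        -38 => Ok(RsaProfile::PssSha384),
        -39 => Ok(RsaProfile::PssSha512),
        _ => Err(UnsupportedAlg),
    }
}
```

A caller would typically compare the parsed profile against the one it already expects for a given key, rather than verifying under whatever profile the message claims.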
  With respect to the perf, if you believe a benchmark win comes from a specific missing constant-time property, name the primitive, input class, backend, and compared impl. “It might be faster because it is insecure” is not evidence... it's negativity and hyperbole.
  Also, every single line of this codebase is my responsibility. My usage of an LLM is not a substitute for review, tests, vectors, fuzzing, Miri, and/or a third-party audit. If you're insinuating that using an LLM today is a vulnerability report - I'm afraid you're falling behind the best engineers in the world and are sorely mistaken.
  Finally, and this is the one that really gets under my skin, which is likely your goal given your very public history of such... my putting code in public doesn't force cryptographers to review it. It invites scrutiny. If there is duplication that materially increases audit cost, point to it and I will remove or justify it. If there is a side channel, GIVE ME THE PATH and I will treat it as a security issue. But “do not publish until the community has reviewed it” is backwards: review cannot happen against code that is not visible. You're plastered all across the Internet championing this exact maxim, are you not? Did I not read an interview this morning where you went on about the importance of OSS crypto?
  Please, take some time to review the codebase or don't; that's up to you... but don't come in here and trash my work improving and unifying well known primitives. Also, this isn't a whimsical codebase vibe coded by Claude overnight. This is a year of my life reading, understanding, and improving the inefficiencies in primitives I've used for years.
  The entire point of a pre-v1 review by the community is the same exact thing OSS engineers/contributors have done for decades. I'm looking for the community to point out glaring issues, errors, API shapes, code smells, etc. You came in here with a bunch of incorrect claims and emotion. This codebase isn't perfect, but I assure you - it will be, and it will lead Rust by default because there simply will not be a better option.
  Happy to talk one on one, or we could put something together where others have a chance to chime in. Either way, have a good one.
chauhan_dhruvil 14 hours ago
interesting