Exetools  

  #121  
Old 03-18-2026, 15:58
cjack's Avatar
cjack cjack is offline
Family
 
Join Date: Jan 2002
Posts: 170
Rept. Given: 196
Rept. Rcvd 176 Times in 34 Posts
Thanks Given: 332
Thanks Rcvd at 219 Times in 64 Posts
cjack Reputation: 100-199 cjack Reputation: 100-199
After 74.5 hours of computation, we successfully extracted the private key for Encryptionizer Certificate #6!!!!!!!!

The collision was found by "Raziel" (RTX 5090) on March 17, 2026 at 22:06 UTC.

The solution is:

k = 0x0000002082DF1821D6E82CEE6880211228

Certificate: #6 of 12 (Encryptionizer / reg3.exe)
BasepointInit: 3254253397 (0xC1F7F755)
Checksum: 0xF4775F5E
Public key (ONB2):
Q.x = 3600264749883462755399490686438491
Q.y = 419754383946908414551514272523181
Curve: GF(2^113) Koblitz (y^2 + xy = x^3 + x^2 + 1)
Subgroup order: 5192296858534827627896703833467507


Statistics:

59 GPU workers (57x RTX 5090 + RTX 4070 + RTX 4070 Ti SUPER)
Peak speed: 32.78 billion iterations/second
210 million distinguished points collected
~7.84 × 10^15 effective iterations
Pollard's Rho with Frobenius canonicalization on GF(2^113) Koblitz curve
Architecture: Custom CUDA solver running on a distributed fleet coordinated by a central Python/FastAPI server. The solver uses table-free GF(2^113) arithmetic, Lopez-Dahab projective coordinates, and per-step Itoh-Tsujii affine normalization — achieving 96 registers, 0 spill on sm_120 (RTX 5090).
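
As a rough sanity check on the iteration count above, the textbook estimate for parallel Pollard's Rho is sqrt(pi * n / 2) group operations, divided by the square root of the equivalence-class size when points are canonicalized. The sketch below applies that standard estimate to the posted subgroup order. It is an illustration, not the solver's own accounting: the function name and the class sizes (113 for the Frobenius orbit, 226 once the negation map is added) are assumptions made for the example.

```python
import math

# Values taken from the post above.
SUBGROUP_ORDER = 5192296858534827627896703833467507
REPORTED_STEPS = 7.84e15          # "effective iterations"
REPORTED_DPS = 210e6              # distinguished points collected
M = 113                           # field extension degree

def expected_rho_steps(order, class_size=1):
    """Textbook expectation sqrt(pi * n / (2 * c)) for Pollard's Rho
    when each equivalence class of c points is collapsed to one."""
    return math.sqrt(math.pi * order / (2 * class_size))

plain = expected_rho_steps(SUBGROUP_ORDER)
with_frobenius = expected_rho_steps(SUBGROUP_ORDER, M)
with_negation = expected_rho_steps(SUBGROUP_ORDER, 2 * M)

# Average number of steps between distinguished points in this run.
steps_per_dp = REPORTED_STEPS / REPORTED_DPS

print(f"plain rho          : {plain:.3e}")
print(f"Frobenius only     : {with_frobenius:.3e}")
print(f"Frobenius+negation : {with_negation:.3e}")
print(f"steps per DP       : 2^{math.log2(steps_per_dp):.2f}")
```

Under these assumptions the Frobenius-only estimate lands in the same range as the ~7.84 × 10^15 reported, the negation map lowers it by a further factor of sqrt(2) (the 41% figure), and the distinguished-point rate works out to roughly one point per 2^25 steps.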

Special thanks to WhoCares whose independent audit identified a critical missing optimization (the negation map), which led to the development of v2.2.0 — our "GOLDEN" build that will deliver an additional 41% algorithmic speedup on future certificates. His RTX 4070 also contributed to the search alongside the main fleet.
We will be publishing the complete source code (GPU solver, CPU solver, and distributed server) on GitHub shortly. This includes the full v2.2.0-GOLDEN solver with negation map and fruitless cycle escape, ready for future ECDSA-113 targets.

The journey from a broken 3.5 G/s solver to a production-grade 32.78 G/s distributed system was quite a ride. Every bug caught made the code stronger. Every optimization attempt taught us something about the hardware.
  #122  
Old 03-18-2026, 16:00
WhoCares's Avatar
WhoCares WhoCares is offline
who cares
 
Join Date: Jan 2002
Location: Here
Posts: 468
Rept. Given: 11
Rept. Rcvd 32 Times in 25 Posts
Thanks Given: 69
Thanks Rcvd at 247 Times in 94 Posts
WhoCares Reputation: 32
Perhaps we'd better upgrade CUDA toolkit from 12.x to 13.1.

For learning purposes, I asked an AI to optimize the GPU kernel function pollard_kernel(), mainly targeting the NVIDIA GeForce RTX 5090.

The optimization goal was to reduce register usage from 96 registers to 64 registers. This increases SM occupancy, allowing the number of blocks that can run concurrently on a single SM to increase from 5 to 8, yielding a theoretical performance improvement of around one third.

The actual performance gain should be evaluated using NVIDIA Nsight Compute together with real benchmark data.

By leveraging the SMRS compiler feature introduced in NVIDIA CUDA Toolkit 13.0, spilled registers can be replaced with accesses to shared memory, making it possible to ultimately achieve the 64-register optimization target.
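
The 5-to-8 block figure can be reproduced with simple register arithmetic. This is only a back-of-the-envelope sketch: the 65,536-register file per SM, the 48-warp residency limit, and the 128-thread block size are assumptions (the thread does not state the launch configuration), and real allocation granularity may shift the numbers slightly.

```python
REGS_PER_SM = 65536        # assumed number of 32-bit registers per SM
MAX_WARPS_PER_SM = 48      # assumed resident-warp limit for sm_120
THREADS_PER_BLOCK = 128    # assumed launch configuration, not stated in the thread
WARP_SIZE = 32

def blocks_per_sm(regs_per_thread, threads_per_block=THREADS_PER_BLOCK):
    """Resident blocks per SM, limited by registers and by the warp cap."""
    by_registers = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_warps = MAX_WARPS_PER_SM // (threads_per_block // WARP_SIZE)
    return min(by_registers, by_warps)

for regs in (96, 64):
    blocks = blocks_per_sm(regs)
    warps = blocks * THREADS_PER_BLOCK // WARP_SIZE
    print(f"{regs} regs/thread -> {blocks} blocks/SM, "
          f"{warps}/{MAX_WARPS_PER_SM} warps resident")
```

Higher residency only helps if the kernel has stalls to hide, which is why the profiling step mentioned above still matters.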

Quote:
ptxas info : Compiling entry function '_Z14pollard_kernelP6worm_tP4dp_tPjjiyiyi' for 'sm_120'
ptxas info : Function properties for _Z14pollard_kernelP6worm_tP4dp_tPjjiyiyi
200 bytes stack frame, -36 bytes spill stores, -28 bytes spill loads
ptxas info : Used 64 registers, used 1 barriers, 200 bytes cumulative stack size, 7168 bytes smem

ptxas info : Compile time = 0.000 ms
ptxas info : Function properties for _Z10ec_canon_x4fe_t
0 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads
ptxas info : Function properties for _Z10ld_madd_z1RK4fe_tS1_S1_S1_RS_S2_S2_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z10reset_wormiyR14worm_context_tP6walk_t
0 bytes stack frame, 124 bytes spill stores, 120 bytes spill loads
ptxas info : Function properties for _Z12iterate_stepiyR14worm_context_tjP4dp_tPjiP6walk_ty
0 bytes stack frame, 32 bytes spill stores, 16 bytes spill loads
ptxas info : Function properties for _Z12prepare_stepR4fe_tS0_R4sc_tS2_
0 bytes stack frame, 0 bytes spill stores, 8 bytes spill loads
ptxas info : Function properties for _Z6fe_inv4fe_t
0 bytes stack frame, 0 bytes spill stores, 4 bytes spill loads
ptxas info : Function properties for _Z9record_dpRK4fe_tRK4sc_tS4_P4dp_tPji
0 bytes stack frame, 0 bytes spill stores, 8 bytes spill loads
tmpxft_00006c68_00000000-7_solver_fast.compute_120.cudafe1.cpp
[100%] Linking CUDA executable solver_fast.exe
[100%] Built target solver_fast
Attached Files
File Type: zip GPU_Register_Usage_Optimization_Walkthrough.zip (3.1 KB, 12 views)
__________________
AKA Solomon/blowfish.

Last edited by WhoCares; 03-18-2026 at 16:12.
  #123  
Old 03-18-2026, 18:55
cjack
Hey everyone,

After weeks of work, we've reached the finish line. The ECDLP on the Armadillo Koblitz curve over GF(2^113) has been solved, and we've successfully generated a working registration key for the original, unmodified target: Encryptionizer NEP.
It's a great transparent data encryption product for Windows (this is an old version).

The Target:
NetLib Encryptionizer Platform for Server, Version 2017.1216.31129.1

Download:
https://mega.nz/file/M8oRXBBK#OSW0ZKq963cY92MhHh9xAxOYHMtlIaKNMqUfo9DJpD4

Registration Data:

Serial Number: 7375-4869-9685-9586a
Name: exetools
Key: 111HTT-TGQ6JH-HVEZF7-U36QEA-UFJD05-QHM5RJ-YEP3DZ-TNZZ77-2UGEKK-KW8A1X-5M95WE


Built a keygen based on AKT sources and verified it against the original Armadillo v9.66 protection of Encryptionizer.
This was a team effort. Special thanks to WhoCares whose vulnerability report identified the missing negation map optimization, giving us a √2 algorithmic speedup that made the difference.

Cheers,
CrackerJack & Claude
  #124  
Old 03-18-2026, 19:08
cjack
Quote:
Originally Posted by WhoCares View Post
Perhaps we'd better upgrade CUDA toolkit from 12.x to 13.1.

For learning purposes, I asked an AI to optimize the GPU kernel function pollard_kernel(), mainly targeting the NVIDIA GeForce RTX 5090.

The optimization goal was to reduce register usage from 96 registers to 64 registers. This increases SM occupancy, allowing the number of blocks that can run concurrently on a single SM to increase from 5 to 8, yielding a theoretical performance improvement of around one third.

The actual performance gain should be evaluated using NVIDIA Nsight Compute together with real benchmark data.

By leveraging the SMRS compiler feature introduced in NVIDIA CUDA Toolkit 13.0, spilled registers can be replaced with accesses to shared memory, making it possible to ultimately achieve the 64-register optimization target.
Hey WhoCares,

Thanks for the detailed optimization work!
About the register optimization proposal — we actually went down this exact rabbit hole. Here's what we found on real silicon:

The kernel is throughput-bound, not latency-bound. We built and benchmarked an optimization called "Thunderstrike" (OPT-3) that used ILP to fuse operations and improve parallelism. Result on RTX 5090: 0% speedup. The ALU pipeline is already saturated by the 112 sequential GF(2^113) squarings in ec_canon_x (40% of step cost) and the 8 multiplications in fe_inv via Itoh-Tsujii (50% of step cost). More warps via higher occupancy just queue up behind the same ALU — there are no idle cycles to fill.
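
For readers who want to see where the "8 multiplications" comes from, here is a small reference model of Itoh-Tsujii inversion in GF(2^113). It is a sketch under stated assumptions, not the CUDA code: it uses a polynomial basis with the modulus x^113 + x^9 + 1 (the solver's internal representation is not shown in this thread), plain Python integers, and the addition chain 1, 2, 3, 6, 7, 14, 28, 56, 112, which is one common choice.

```python
M = 113
POLY = (1 << 113) | (1 << 9) | 1   # x^113 + x^9 + 1, assumed polynomial-basis modulus

def fe_mul(a, b):
    """Carry-less shift-and-add multiplication in GF(2^113), reducing as we go."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 113: subtract (XOR) the modulus
            a ^= POLY
    return r

def fe_inv(a):
    """Return (a^-1, operation counts) via a^(2^113 - 2) = (a^(2^112 - 1))^2."""
    counts = {"mul": 0, "sqr": 0}

    def sqr_n(x, n):
        for _ in range(n):
            x = fe_mul(x, x)
            counts["sqr"] += 1
        return x

    # beta[k] = a^(2^k - 1); beta[j + k] = beta[j]^(2^k) * beta[k]
    beta = {1: a}
    for j, k in [(1, 1), (2, 1), (3, 3), (6, 1), (7, 7), (14, 14), (28, 28), (56, 56)]:
        beta[j + k] = fe_mul(sqr_n(beta[j], k), beta[k])
        counts["mul"] += 1
    return sqr_n(beta[112], 1), counts

a = 0x123456789ABCDEF0123456789AB
inv, counts = fe_inv(a)
print(fe_mul(a, inv) == 1, counts)
```

Each chain step costs one multiplication, so the eight steps give the eight multiplications; the squarings add up to 112 including the final one.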

A few specific concerns with the 64-register approach:

fe_mul alone needs ~80 registers (table-free XOR accumulation across 113-bit field elements). Forcing 64 via __launch_bounds__ guarantees massive spills. Even with __noinline__ on hot functions like fe_inv and ec_canon_x, the call overhead and lost register context hurt throughput on the critical path.

CUDA 13.x / SMRS: we'd love to test it!

The ptxas output shows negative spill values (-36 bytes spill stores, -28 bytes spill loads). Negative spills are unusual and suggest the compiler is reporting redirected spills rather than actual elimination. Without real Nsight Compute profiling data, it's hard to confirm whether this translates to actual throughput gains.

What we WILL adopt from your proposals for the next certificate:

Single-pass DP retrieval (clean simplification, ~1-2%)
cudaOccupancyMaxActiveBlocksPerMultiprocessor for self-tuning grid size
Benchmarking L1 cache vs shared memory for the walk table (your bank conflict analysis was spot-on theoretically)
Bottom line: fe_inv and ec_canon_x consume 90% of the step cost and are algorithmically irreducible for Koblitz curve canonicalization. No amount of occupancy optimization can reduce these costs.
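
To make the canonicalization cost concrete, here is a minimal sketch of picking one representative from the Frobenius orbit of an x-coordinate. It is illustrative only: the polynomial-basis modulus x^113 + x^9 + 1 and the "numerically smallest value wins" rule are assumptions, since the thread does not say how ec_canon_x chooses its representative.

```python
M = 113
POLY = (1 << 113) | (1 << 9) | 1   # x^113 + x^9 + 1, assumed polynomial-basis modulus

def fe_mul(a, b):
    """Carry-less shift-and-add multiplication in GF(2^113)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def fe_sqr(a):
    """Squaring, i.e. one application of the Frobenius map x -> x^2."""
    return fe_mul(a, a)

def canon_x(x):
    """Walk the orbit x, x^2, x^4, ..., x^(2^112) and keep the smallest value.
    This needs M - 1 = 112 sequential squarings per call."""
    best = cur = x
    for _ in range(M - 1):
        cur = fe_sqr(cur)
        best = min(best, cur)
    return best

x = 0x1F2E3D4C5B6A79880123456789AB
# Every member of the same orbit maps to the same representative.
print(canon_x(x) == canon_x(fe_sqr(x)))
```

Because each squaring depends on the previous one, the 112 steps form a serial chain, which is consistent with the point above that extra occupancy does not shorten it.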
  #125  
Old 03-19-2026, 12:30
WhoCares
Armadillo parameter verification scripts, including:

Elliptic curve base point generation (based on code from mrexodia’s Armadillo Key Tool).
Solving for the basis transformation matrix between PB (Polynomial Basis) and ONB2 (Type-2 Optimal Normal Basis), and transforming coordinates between PB and ONB2.
Curve equation validation (trivial task).
Subgroup order verification (trivial task).
Public/private key consistency checks (trivial task).

https://github.com/z16166/ArmadilloEcdlpVerify (for fun)
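
As an illustration of the subgroup order check (this is not code from the repository above), the order of a Koblitz curve can be computed from a Lucas sequence: #E(GF(2^m)) = 2^m + 1 - V_m, with V_0 = 2, V_1 = mu and V_(k+1) = mu * V_k - 2 * V_(k-1), where mu = 1 for a = 1. The sketch assumes the curve is exactly y^2 + xy = x^3 + x^2 + 1 as quoted below, with cofactor 2; the function name is made up for the example, and a primality test on the result is left out.

```python
M = 113
POSTED_N = 5192296858534827627896703833467507   # subgroup order from the quoted post

def koblitz_curve_order(m, a=1):
    """Number of points on y^2 + xy = x^3 + a*x^2 + 1 over GF(2^m)."""
    mu = 1 if a == 1 else -1
    v_prev, v_cur = 2, mu                 # V_0, V_1
    for _ in range(m - 1):
        v_prev, v_cur = v_cur, mu * v_cur - 2 * v_prev
    return (1 << m) + 1 - v_cur

order = koblitz_curve_order(M)
trace = (1 << M) + 1 - order
candidate_n = order // 2                  # assumed cofactor 2 for a = 1

print("curve order        :", order)
print("order / 2          :", candidate_n)
print("matches posted n   :", candidate_n == POSTED_N)
```

If the last line prints False, either the curve parameters or the assumed cofactor differ from what this sketch takes for granted.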

Quote:
Originally Posted by cjack View Post
The solution is:

k = 0x0000002082DF1821D6E82CEE6880211228

Certificate: #6 of 12 (Encryptionizer / reg3.exe)
BasepointInit: 3254253397 (0xC1F7F755)
Checksum: 0xF4775F5E
Public key (ONB2):
Q.x = 3600264749883462755399490686438491
Q.y = 419754383946908414551514272523181
Curve: GF(2^113) Koblitz (y^2 + xy = x^3 + x^2 + 1)
Subgroup order: 5192296858534827627896703833467507

Last edited by WhoCares; 03-19-2026 at 13:15.
  #126  
Old 04-10-2026, 03:54
DARKER DARKER is offline
VIP
 
Join Date: Jul 2004
Location: Somewhere Over the Rainbow
Posts: 541
Rept. Given: 16
Rept. Rcvd 123 Times in 54 Posts
Thanks Given: 21
Thanks Rcvd at 1,038 Times in 262 Posts
DARKER Reputation: 100-199 DARKER Reputation: 100-199
Btw, project sources are already online:

GitHub: https://github.com/mbollini72/Bolero-ECDLP-Solver
  #127  
Old 04-11-2026, 05:34
JMP-JECXZ JMP-JECXZ is offline
Friend
 
Join Date: Mar 2017
Posts: 123
Rept. Given: 0
Rept. Rcvd 5 Times in 4 Posts
Thanks Given: 15
Thanks Rcvd at 150 Times in 69 Posts
JMP-JECXZ Reputation: 5
Will the givcurve project make a comeback?
Tags
bolero, ecdlp
