Advance Submission: Neural Net Optimizer

Advance Rewards Submission

We are delighted to share that our first Advance submission for our Neural Net Optimizer Challenge has been made public today (round 111)!

The algorithm, Nova Prime FGD, is now open for the community to review.

The evidence form is available here: Advance Evidence Form.
The code submission embodying the method described above is available here: Code Submission.

You are invited to explore the evidence and code and prepare to cast your vote in the token-weighted vote on the submission’s eligibility for Advance rewards. Voting will open at the beginning of the next round (round 112) and remain open until the end of that round.


Optimizer for Neural Network Training

The optimization algorithm is the engine behind modern artificial intelligence. Training a neural network means iteratively adjusting a model’s millions of parameters, guided by gradients, to minimize a loss function — and the optimizer is the algorithm that decides how those adjustments are made.

Optimizers capture the dynamics of learning itself — how fast to move, when to slow down, and how to escape bad regions of the loss landscape. Built on variants of Stochastic Gradient Descent (SGD), they efficiently navigate extraordinarily high-dimensional, non-convex loss surfaces to find parameters that generalize well to unseen data.
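To make the gradient-descent loop above concrete, here is a minimal sketch of the simplest SGD update on a toy quadratic loss (all names and values are illustrative, not from the submission):

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    # One plain SGD update: move each parameter against its gradient.
    return params - lr * grads

# Toy loss L(w) = w^2 has gradient 2w; repeated steps shrink w toward 0.
w = np.array([1.0])
for _ in range(100):
    w = sgd_step(w, 2.0 * w)
```

Practical optimizers layer momentum, adaptive step sizes, and other machinery on top of this basic update.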

Why It Matters

Training neural networks underpins virtually every modern AI system. Optimizers are crucial for:

  • Training at Scale: They must handle millions to billions of parameters, navigating loss landscapes where brute-force methods are completely impractical.
  • Powering Real-World AI: Optimizer speed and quality directly determine the cost, energy, and feasibility of training models for language understanding, autonomous driving, medical imaging, and scientific simulation.

Neural network training consumes entire data centers running continuously. Even marginal improvements in optimizer efficiency compound into enormous savings in compute, energy, and cost — and can unlock capabilities previously out of reach.

This submission represents a potential step forward in one of the most fundamental problems in computer science. Your participation in reviewing, discussing, and voting will help determine whether it qualifies for Advance Rewards. For more information on the Challenge, check out our technical paper.

Having looked through the actual code on the submitted branch, I don’t think the claims in the evidence form add up.

The two core methods, FGD and HUF-1, don’t run at all.

For FGD, the submitter’s own comment in mod.rs reads: “intentionally inert (not wired into the step path) … does not change runtime behaviour”, and the module is tagged #[allow(dead_code)]. Even setting that aside, there’s no fractional calculus in it. The math is just a standard exponential moving average of squared gradients, which is what Adam has always done.
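To spell out what that means: the update in question is, by this description, nothing more than Adam’s second-moment accumulator. A minimal sketch (function name and default are mine, not from the submission):

```python
def ema_sq_grad(v, g, beta2=0.999):
    # Standard Adam-style second-moment update:
    #   v_t = beta2 * v_{t-1} + (1 - beta2) * g^2
    # No fractional-order terms anywhere -- just an exponential
    # moving average of the squared gradient.
    return beta2 * v + (1 - beta2) * g * g
```

This is textbook Adam (Kingma & Ba, 2015), not a novel fractional-calculus method.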

For HUF-1, nova_prime_step, the function that would actually invoke the holographic machinery, is declared as an extern in mod.rs but never called anywhere in the file. A full search confirms it appears exactly once, only as its declaration. The “holographic prediction” itself is a two-point linear extrapolation dampened by 0.95. No wave interference, no phase, no amplitude.
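For clarity, a two-point linear extrapolation with a 0.95 damping factor is this one-liner (hypothetical names; a sketch of the pattern the review describes, not the submission’s code):

```python
def damped_extrapolation(x_prev, x_curr, damping=0.95):
    # "Predict" the next value by extending the line through the last
    # two points, scaled back by a damping factor:
    #   x_next = x_curr + damping * (x_curr - x_prev)
    return x_curr + damping * (x_curr - x_prev)
```

There is no interference, phase, or amplitude in such an update — it is a damped finite difference.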

What actually executes is one kernel: gas_fx_mega_fused_kernel. This is standard Adam with FP16 second moments. The step function contains comments saying phase control, gradient norm tracking, and sparse masking were all “disabled for speed.”
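For reference, “standard Adam with FP16 second moments” amounts to the following update (a generic NumPy sketch for illustration, not the submission’s kernel; all names and defaults are mine):

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Plain Adam, storing the second moment in FP16 as described.
    m = b1 * m + (1 - b1) * g
    v = (b2 * v.astype(np.float32) + (1 - b2) * g * g).astype(np.float16)
    m_hat = m / (1 - b1 ** t)                    # bias correction
    v_hat = v.astype(np.float32) / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```

Fusing this into one kernel and halving the second-moment precision is a well-known engineering optimization, not a new algorithm.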

I seriously doubt the claimed performance gains too, considering the entire submission is a mishmash of old code, dead code, and unimplemented functions. A 56% reduction in runtime alongside a 9% quality gain is hard to believe given the code that actually runs.

This is absolutely not something I would call an advance, or worthy of that title.