Are Benchmarkers Incentivized Enough to Explore New Algorithms and Hyperparameters?

I want to raise a concern about the current benchmarking dynamics in TIG.

From my perspective, many benchmarkers are naturally pushed toward a simple strategy: increase hardware capacity and run already-known algorithm + hyperparameter combinations that are known to be profitable. This is rational from the benchmarker’s point of view, but I am not sure it maximizes the long-term innovation value of TIG.

The core issue is the balance between exploration and exploitation.

Right now, exploitation is much safer:

  • use an already-proven algorithm;

  • copy or converge toward known hyperparameters;

  • scale compute;

  • optimize for predictable short-term rewards.

Exploration is much riskier:

  • test new algorithms that may fail;

  • spend compute on hyperparameter search;

  • compare many configurations across tracks;

  • discover good settings that others can quickly copy once visible;

  • receive no clear extra reward for being the first to do the hard tuning work.

This creates a possible problem: TIG may drift into a hardware race, where the main advantage becomes who can deploy more compute, not who can find better algorithms or better algorithm configurations.

I do not think hardware is useless. Compute is obviously necessary for solving challenges, securing the network, and producing benchmark data. But compute should serve algorithm discovery. If most compute is spent repeatedly running the same known algorithms with the same known parameters, then the system may not be extracting the maximum innovation value from the available hardware.

In my opinion, the more valuable direction for TIG would be to encourage benchmarkers to allocate more compute toward:

  • testing newly submitted algorithms earlier;

  • running systematic hyperparameter sweeps;

  • comparing algorithms under equal fuel or runtime budgets;

  • evaluating performance across different tracks;

  • discovering algorithm + parameter combinations that improve quality per unit of compute.

The problem is that the benchmarker who performs this exploration takes the cost and risk, while the result can often be copied by others. This creates a classic incentive mismatch: exploration is expensive and uncertain, while exploitation is safer and more immediately profitable.

Why this matters for TIG

TIG’s main value proposition is algorithmic innovation. Benchmarkers are not only miners; they are also the market mechanism that helps discover which algorithms are actually useful.

If benchmarkers mostly scale hardware instead of exploring the algorithmic search space, the signal becomes weaker.

A new algorithm may be strong, but if nobody tests it properly, or if nobody spends enough time tuning its hyperparameters, it may look bad or remain ignored. In that case, TIG could miss useful innovation simply because the incentives favor copying proven strategies over testing uncertain new ones.

Possible improvements

I am not suggesting that TIG should punish large benchmarkers or make compute less important. Instead, I think the protocol or ecosystem could add stronger incentives for exploration.

Possible ideas:

  1. Exploration rewards
    Reward benchmarkers who test newly added or low-adoption algorithms early, especially if they find competitive configurations.

  2. First-discovery credit for strong hyperparameter configurations
    If a benchmarker is the first to find a strong algorithm + hyperparameter combination, they could receive temporary recognition or reward before the configuration becomes widely copied.

  3. Temporary privacy for discovered parameters
    As another option, a benchmarker who first discovers an effective hyperparameter configuration could be allowed to keep those parameters hidden from public view for a limited period of time. For example, the parameters would become public only after a defined delay. This would create an additional reward for exploration: the benchmarker who spent compute and time finding a strong configuration gets a short window of advantage before others can copy it.

I believe this matters because TIG should not become only a hardware race. Hardware should be a tool for discovering better algorithms, not the final source of advantage by itself.

The goal should be: more compute, yes — but compute used intelligently across more algorithms, more tracks, and more hyperparameter combinations.

I would like to hear what other benchmarkers, innovators, and the TIG team think about this.

I fully agree with your suggestions. TIG is about innovation. Benchmarkers can and should become part of that innovation. Finding optimal hyperparameters can provide innovators with useful information about possible directions for improving algorithms.

Therefore, this hyperparameter tuning should be rewarded. For that reason, I would make the information about the selected hyperparameters private. The most innovative benchmarkers should be the ones rewarded for their work.

Thanks for raising this issue around hyperparameter selection. We have also been aware of this design feature, and have discussed internally whether the current approach is the right one. Thank you for being patient with us while we think this through.

I will briefly summarise the issue you raised.

Hyperparameter choices are currently public. This means that, as a benchmarker, you may not be strongly incentivised to spend your own compute searching for better hyperparameter configurations, because other benchmarkers can quickly copy any good settings you discover.

The concern is that this could lead to a situation where algorithm A is being widely adopted, while algorithm B might actually be more performant if run with a different set of hyperparameters. However, because nobody is sufficiently incentivised to find those hyperparameters, algorithm B may remain underused.

I will structure our thoughts as follows.

1. To what degree is this actually happening?

The first question is: how suboptimal are the hyperparameter choices currently being used?

We do not expect every algorithm to be run with perfectly optimal hyperparameters. In fact, we are not especially concerned with exact optimality, since highly tuned parameters may overfit to a specific setting. What matters more is that algorithms are being run with reasonably good hyperparameter choices, that is, choices within some neighbourhood of strong performance.

If an algorithm is well designed, it should hopefully not be too costly to find a good configuration, even if it is not globally optimal. So we are generally comfortable with algorithms being run on slightly suboptimal hyperparameters.

The more important question is whether this is materially affecting outcomes.

For example, can anyone show:

  • an algorithm currently being adopted in TIG that is being run with one set of hyperparameters, but performs significantly better with another;

  • or an algorithm that is not currently being adopted, but would become competitive if run with a particular hyperparameter configuration?

This kind of evidence would be very useful.

2. If this is happening, what should we do about it?

If we determine that poor hyperparameter choices are materially affecting benchmarking outcomes, then we will edit the protocol.

At the moment, we see two possible directions.

Option One: Delayed hyperparameter reveal

Benchmarkers’ chosen hyperparameters would not be immediately visible. Instead, they would be revealed only after the round concludes.

This would give benchmarkers a temporary advantage for discovering strong configurations, and would make hyperparameter exploration worthwhile.

However, this approach has some complications. For example, innovators may try to hide good hyperparameter choices inside their algorithms, which could give them an advantage when the algorithm is run. It could also make it harder for smaller benchmarkers to participate, and may add some complexity from a decentralisation perspective.

That said, these issues are not fatal, and this option may still be worth considering.
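
One possible way to implement a delayed reveal is a hash commitment: during the round only a digest of the hyperparameters is published, and the values themselves are revealed afterwards. The sketch below is a minimal illustration under that assumption. The function names, the salt, the JSON encoding, and the example hyperparameter names are all hypothetical and are not part of the TIG protocol.

```python
import hashlib
import json
import secrets


def commit(hyperparameters: dict, salt: bytes) -> str:
    """Return a digest that hides the hyperparameters until they are revealed."""
    # Canonical JSON (sorted keys, fixed separators) so equal configs hash equally.
    encoded = json.dumps(hyperparameters, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(salt + encoded).hexdigest()


def verify_reveal(commitment: str, hyperparameters: dict, salt: bytes) -> bool:
    """Check that the revealed hyperparameters match the earlier commitment."""
    return secrets.compare_digest(commitment, commit(hyperparameters, salt))


# During the round: only the commitment is published.
salt = secrets.token_bytes(16)  # a random salt stops others brute-forcing small configs
hp = {"population_size": 50, "iterations": 200}  # hypothetical hyperparameters
published = commit(hp, salt)

# After the round: the benchmarker reveals hp and salt, and anyone can check them.
print(verify_reveal(published, hp, salt))
print(verify_reveal(published, {"population_size": 51, "iterations": 200}, salt))
```

A commitment like this only shows that the values were fixed before the reveal. It does not show that those values were the ones actually used to produce a solution.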

Option Two: Shared hyperparameter pool

Benchmarkers would select hyperparameters from a shared pool. Participants could add new hyperparameter configurations to the pool for a small fee, and would then be rewarded based on the adoption or performance of the configurations they contributed.

This would create a more explicit incentive for hyperparameter discovery. In effect, it would add a new role to TIG: contributing useful algorithm-parameter configurations.

The downside is that this would add complexity to the protocol.
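
To give a feel for the bookkeeping involved, here is a minimal sketch of such a pool. Everything in it is an assumption made for illustration: the class and method names, the flat submission fee, the rule that new configurations only become selectable at the next round, and the choice to split rewards in proportion to how often each configuration is used.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HyperparameterPool:
    """Toy model of a shared pool; all names and rules here are assumptions."""
    submission_fee: float = 1.0
    active: dict = field(default_factory=dict)    # config_id -> contributor
    pending: dict = field(default_factory=dict)   # queued until the next round
    usage: Counter = field(default_factory=Counter)
    fees_collected: float = 0.0

    def submit(self, config_id: str, contributor: str) -> None:
        """Queue a new configuration; it is not selectable in the current round."""
        if config_id in self.active or config_id in self.pending:
            raise ValueError("configuration already in the pool")
        self.pending[config_id] = contributor
        self.fees_collected += self.submission_fee

    def start_round(self) -> None:
        """Activate queued configurations and reset adoption counts."""
        self.active.update(self.pending)
        self.pending.clear()
        self.usage.clear()

    def select(self, config_id: str) -> None:
        """Record that one benchmark was run with this configuration."""
        if config_id not in self.active:
            raise KeyError("configuration is not selectable this round")
        self.usage[config_id] += 1

    def adoption_rewards(self, budget: float) -> dict:
        """Split a reward budget between contributors in proportion to adoption."""
        total = sum(self.usage.values())
        rewards = Counter()
        for config_id, count in self.usage.items():
            rewards[self.active[config_id]] += budget * count / total
        return dict(rewards)


pool = HyperparameterPool()
pool.submit("algo_b/cfg1", "alice")
pool.submit("algo_b/cfg2", "bob")
pool.start_round()
for _ in range(3):
    pool.select("algo_b/cfg1")
pool.select("algo_b/cfg2")
rewards = pool.adoption_rewards(budget=100.0)
print(rewards)  # {'alice': 75.0, 'bob': 25.0}
```

Rewarding by performance instead of adoption would only change `adoption_rewards`; the rest of the bookkeeping would stay the same.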

What we would welcome from the community

We would be very interested in hearing:

  1. Evidence that significantly suboptimal hyperparameters are currently being used in benchmarking.

  2. Thoughts on the two possible options above: delayed hyperparameter reveal, and a shared hyperparameter pool.

  3. Any alternative mechanisms that could better incentivise exploration.

I’m a bit confused about the ultimate goal of benchmarking. As I understand it, benchmarking exists to test candidate algorithms (algo…) by investing computational resources (CPU/GPU) and to find the most efficient one. If the goal is to find the best algorithm, why not let the algorithm authors fix the hyperparameters (HPs) directly, or provide several HP combinations to choose from? Why do we still need benchmarkers to research and test HPs? Most testers are just ordinary people, not scholars. They generally do not know what an algorithm’s HPs mean, let alone how to adjust them; only the algorithm authors know how to tune them. As a benchmarker, I only know how to use my CPU/GPU resources efficiently: I allocate and schedule the benchmark’s MASTER and SLAVE sensibly, in cooperation with the whole TIG team, to find the best algorithm.

I have been a benchmarker for more than 18 months. Deep down, I believe this project has great goals and immense commercial value. That’s why I have invested a lot of money and effort into it, while also holding a large amount of $TIG. The entire TIG project is carried out through the collaboration of various roles: Innovators (algorithm authors), Benchmarkers, and Challenge Owners. Each has its own responsibilities. Do we really need an additional HP tuner? Shouldn’t this be the work of the Innovators?

My suggestion is that Innovators provide several fixed HP options when providing algorithms, and no longer allow customization of HP.

However, verification work is also indispensable… because it cannot be ruled out that some benchmarkers might modify the source code of the algorithm and compile the algorithm program themselves to achieve customization of the algorithm.

I want to clarify my point about hardware vs exploration.

I am not saying that hardware is not important. Obviously, benchmarkers need compute: without compute, it is impossible to find qualifiers, test algorithms, and provide a useful signal to the network. But the real question is how that hardware is used.

If a benchmarker simply increases capacity and runs the same already-known algorithm with the same already-known parameters, then after a certain point they are mostly getting more attempts and more stable statistics for an already-known configuration. This is useful for short-term mining, but it is not always useful for discovery.

Every parameter configuration has statistical noise. More instances help reduce that noise. But the returns diminish: the more runs you have already done with the same configuration, the less new information each additional run provides. At some point, it becomes clear enough how well a specific combination of algorithm + parameters + fuel/runtime budget performs. After that, adding more compute to the same configuration does less and less to tell us whether there is a stronger configuration nearby.
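
A rough way to quantify this, assuming the runs are independent and share the same per-run spread, is the standard error of the mean, which shrinks only with the square root of the number of runs. The per-run spread `sigma` below is an illustrative value, not a measured one.

```python
import math


def standard_error(sigma: float, n_runs: int) -> float:
    """Standard error of the mean quality after n_runs independent runs."""
    return sigma / math.sqrt(n_runs)


sigma = 1.0  # assumed per-run standard deviation of solution quality
for n_runs in (100, 400, 1600):
    print(n_runs, standard_error(sigma, n_runs))
# 100 0.1
# 400 0.05
# 1600 0.025
```

Under these assumptions, each halving of the noise costs four times as many runs of the same configuration.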

That is why, in my opinion, it is important not only to increase hardware capacity, but also to use that capacity to explore the parameter and algorithm space.

Hyperparameters can have a very strong impact on solution quality. They are not just minor technical settings. In some algorithms, parameters effectively change the search strategy itself: faster and shallower search vs deeper and slower search; more exploration vs more aggressive exploitation; different population sizes; different iteration counts; different limits; different penalties; and a different balance between speed and quality.

Because of this, two benchmarkers can use the same algorithm and even similar hardware, but get significantly different results only because they use different parameters. One configuration may quickly produce an average result, another may spend more time or fuel but find higher-quality solutions more often, and a third may only be strong on a specific track or under a specific budget.

So my point is not that “more hardware is useless.” On the contrary: more hardware is very useful if it is used as a search tool. For example:

  • running different hyperparameter variants;

  • comparing algorithms under similar fuel/runtime budgets;

  • quickly filtering out weak configurations;

  • allocating more compute to promising directions;

  • finding algorithm + parameter combinations that deliver more quality per unit of compute.
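
One standard way to combine "filter weak configurations quickly" with "give more compute to promising ones" is successive halving. The sketch below is only an illustration of that technique: `toy_evaluate` is a made-up stand-in for running real benchmark instances, and the single hyperparameter `iterations` is hypothetical.

```python
import random


def toy_evaluate(config, n_instances):
    """Hypothetical stand-in for benchmarking: mean quality over n_instances runs."""
    rng = random.Random(config["iterations"] * 1_000_003 + n_instances)
    true_quality = -abs(config["iterations"] - 120) / 100  # best at 120 by construction
    noise = sum(rng.uniform(-0.05, 0.05) for _ in range(n_instances)) / n_instances
    return true_quality + noise


def successive_halving(configs, evaluate, initial_budget=4, keep_fraction=0.5):
    """Score the survivors, keep the best fraction, double the budget, and repeat."""
    if not 0 < keep_fraction < 1:
        raise ValueError("keep_fraction must be strictly between 0 and 1")
    survivors = list(configs)
    budget = initial_budget
    while len(survivors) > 1:
        # Higher quality is better, so sort in descending order.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget), reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
        budget *= 2  # the remaining configurations get more instances next time
    return survivors[0]


candidates = [{"iterations": n} for n in (20, 60, 120, 200)]
best = successive_halving(candidates, toy_evaluate)
print(best)  # {'iterations': 120}
```

Weak configurations are dropped after only a few instances, so most of the compute ends up on the configurations that still look competitive.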

If all hardware is used only to repeat already-known profitable configurations, TIG risks gradually becoming a hardware race. In that scenario, the advantage goes not to the benchmarker who explores algorithms and parameters better, but to the benchmarker who can simply scale already-discovered settings.

For a benchmarker, this may be rational in the short term. But for TIG as a system for algorithmic innovation, it is not necessarily optimal.

In my view, the best outcome for the ecosystem happens when large amounts of compute are used not only for mining, but also for systematic exploration: testing new algorithms, exploring new parameters, and discovering more efficient configurations. These discoveries are what ultimately create value for the whole network.

As the developer of TIGPOOL and someone who has been benchmarking on TIG for more than two years, I would like to share my perspective on this topic.

I think we have reached a point where the current situation around hyperparameters is frustrating for almost everyone. Benchmarkers are increasingly hiding their best HPs, while others argue that everything should be public. Neither side is really satisfied, and the result is growing tension within the ecosystem.

Personally, I believe hyperparameters are necessary. They are an important part of algorithm optimization and often make the difference between an average benchmark and a great one. They allow benchmarkers to continue improving algorithms after they are released.

The reality is that innovation cycles have become extremely fast. Innovators are shipping new challenges and new algorithms at a pace that often leaves little time for extensive tuning. In practice, benchmarkers are the ones providing the compute resources needed to explore the search space and finish the optimization work. This exploration creates real value for the protocol.

The problem is that the benchmarker who discovers a valuable configuration currently bears all of the research cost, while everyone else can benefit from that discovery once it becomes public. This naturally encourages people to hide their HPs.

I don’t think hiding HPs for an entire round is a good solution either. It would give too much advantage to whoever discovers a strong configuration first and could prevent other benchmarkers from competing fairly during the round.

Instead, I would propose an alternative based on adoption.

For each challenge, a small percentage fee could be collected from benchmark rewards, similar in spirit to the existing precommit fees. These fees would be accumulated into an Adoption Pool.

At the end of the round, the Adoption Pool would be awarded to the benchmarker who first introduced the benchmark configuration (algorithm + hyperparameters) that achieved the highest adoption across the network.

This approach has several advantages:

  • No additional TIG emissions are required.
  • Innovation is rewarded without preventing copying.
  • Benchmarkers are incentivized to share discoveries rather than hide them.
  • The mechanism is objective because it relies on exact benchmark configurations rather than subjective notions of “similar” hyperparameters.
  • The ecosystem benefits from faster dissemination of good configurations.

If the most adopted configuration is simply the default benchmark provided by the Innovator, then the Adoption Pool could automatically be awarded to the Innovator.
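
As a rough sketch of how the winner could be determined for one challenge: the record layout, the function name, and the choice to measure adoption as the number of benchmarks using a configuration are all my own assumptions for illustration.

```python
from collections import Counter


def adoption_pool_winner(benchmarks, default_configs, innovators):
    """Return who receives the Adoption Pool for one challenge, or None.

    benchmarks: list of (timestamp, benchmarker, algorithm, hyperparameters_json)
    default_configs: algorithm -> the innovator's default hyperparameters_json
    innovators: algorithm -> innovator name
    """
    adoption = Counter()
    first_seen = {}
    for _timestamp, benchmarker, algorithm, hp in sorted(benchmarks):
        key = (algorithm, hp)  # exact match on algorithm + hyperparameters
        adoption[key] += 1
        first_seen.setdefault(key, benchmarker)  # earliest submitter gets the credit
    if not adoption:
        return None
    (algorithm, hp), _count = adoption.most_common(1)[0]
    if default_configs.get(algorithm) == hp:
        return innovators[algorithm]  # the default configuration goes to the Innovator
    return first_seen[(algorithm, hp)]


benchmarks = [
    (1, "alice", "algo_b", '{"k":208}'),
    (2, "bob", "algo_b", '{"k":208}'),
    (3, "carol", "algo_b", '{"k":208}'),
    (4, "dave", "algo_a", "null"),
]
defaults = {"algo_a": "null", "algo_b": "null"}
innovators = {"algo_a": "innovator_a", "algo_b": "innovator_b"}
winner = adoption_pool_winner(benchmarks, defaults, innovators)
print(winner)  # alice
```

Counting distinct benchmarkers instead of benchmarks would be an equally plausible definition of adoption and would only change the counting step.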

In my opinion, this would better align incentives across the ecosystem. Today, benchmarkers are rewarded for producing solutions. An adoption reward would additionally reward benchmarkers who discover and share improvements that become useful to everyone else.

Most importantly, it would transform disclosure from a disadvantage into an advantage, which seems much healthier for the long-term growth of the protocol.

See point 1 in my previous reply to this thread. Can you comment on that specifically? A couple of replies to things you said:

  1. “hyperparameters can have very strong impact on solution quality” → Agreed. If hyperparameters had no meaningful impact, this would not be an issue.

  2. “more hardware is useful” → Let’s recall the role of benchmarking in TIG: benchmarking creates a synthetic market for algorithms. The market signal identifies which algorithm is performing best under the protocol’s constraints, meaning highest solution quality within the allotted fuel budget. The important question is not whether every algorithm is being run with globally optimal hyperparameters. We do not need perfect hyperparameter optimality. The important question is whether hyperparameter choices are materially changing the outcome of the market signal.

  3. “compute are used not only for mining, but also for systematic exploration” → We agree that the compute needed to find reasonably good hyperparameter configurations has to come from somewhere in the protocol, and that this compute should be rewarded. However, we think that benchmarkers who don’t want to perform a hyperparameter search should still be able to participate in benchmarking.

Thanks for the input, xnico.

Two things:

  1. Your proposal sounds similar to what I suggested in Option Two. My view is that the “pool of hyperparameters” available for benchmarking should only be updated between rounds.

Otherwise, a benchmarker could immediately copy another benchmarker’s hyperparameters, add a tiny ε change, call it a new configuration, and earn adoption rewards from it. Having a one-round buffer would help avoid this problem.

  2. You mention benchmarkers hiding their HPs. Can you confirm whether you mean a case where a benchmarker misreports which HPs they used to obtain a solution?

If so, I think this is a separate issue, and one we intend to address through the update to method verification.

At first glance, rewarding the first “algorithm + exact hyperparameters” configuration with the highest adoption looks objective. But in practice, this mechanism can be easy to bypass.

Suppose benchmarker A spends compute and discovers a strong configuration:

{"window_k":208,"ils_rounds":120,"n_random_starts":4,"n_crossover_gen":12,"perturb_base_frac":6,"perturb_max_frac":5,"ils_restart_interval":10,"n_full_restarts":2,"core_half_dp":50}

This is not just a random set of numbers. It represents an entire search strategy: window size, ILS rounds, random starts, crossover, perturbation, restarts, DP core size, etc. This combination defines the balance between speed, search depth, exploration, and exploitation.

Now benchmarker B can take this base configuration and change one minor parameter:

window_k: 208 → 209

or

core_half_dp: 50 → 51

Formally, this becomes a different exact configuration. But in practice, it is essentially the same strategy. If the solution quality is almost unchanged, then the real discovery was still made by benchmarker A, not B.

The problem is that other benchmarkers may start using variant B not because it is actually better, but because a large benchmarker or leaderboard leader started using it. For others, that can look like a strong signal: if the leader is using this configuration, then it is probably worth copying. In that case, the Adoption Pool could go not to the person who did the real research work, but to someone who copied the base configuration, slightly changed the parameters, and gained adoption because of their size or leaderboard position.

This creates a risk of “parameter laundering”: someone else’s discovery can be turned into a new configuration through a minimal change that barely affects the result.

So a reward based only on exact configuration can be technically objective, but economically unfair. If an adoption reward is introduced, it should account not only for exact parameter matches, but also for who first discovered a strong region in the HP space.
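
As a rough illustration of what "region" could mean, the sketch below credits the earliest registered configuration that lies within a relative tolerance of a new one. The distance measure, the 5% tolerance, and the function names are my own assumptions, and picking the tolerance is itself a judgement call.

```python
def relative_distance(a: dict, b: dict) -> float:
    """Largest relative change in any hyperparameter between two configurations."""
    if a.keys() != b.keys():
        return float("inf")  # different parameter sets never count as the same region
    return max(
        (abs(a[k] - b[k]) / max(abs(a[k]), abs(b[k]), 1) for k in a),
        default=0.0,
    )


def credit_owner(new_config: dict, registry: list, tolerance: float = 0.05):
    """Return the earliest registrant whose config is within tolerance, else None."""
    for owner, known in registry:  # registry is ordered by discovery time
        if relative_distance(new_config, known) <= tolerance:
            return owner
    return None


base = {"window_k": 208, "ils_rounds": 120, "n_random_starts": 4,
        "n_crossover_gen": 12, "perturb_base_frac": 6, "perturb_max_frac": 5,
        "ils_restart_interval": 10, "n_full_restarts": 2, "core_half_dp": 50}
registry = [("A", base)]

print(credit_owner({**base, "window_k": 209}, registry))     # A
print(credit_owner({**base, "core_half_dp": 51}, registry))  # A
print(credit_owner({**base, "ils_rounds": 240}, registry))   # None
```

With a rule like this, the 208 → 209 and 50 → 51 variants would still be credited to benchmarker A, while a genuinely different setting such as doubling `ils_rounds` would count as a new discovery.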

It is also important to consider that finding strong HPs often requires a lot of time and compute. Sometimes a good combination is only found near the end of a round, or after a long period of testing. For that reason, a short hiding window, for example one week, may be too weak an incentive for real exploration.

Hi Haver - your concern is addressed by my idea that hyperparameters can only be selected from a “hyperparameter pool” which can only be added to at the start of each round. Therefore a benchmarker is incentivised to add new configurations to this pool since they will earn “hyperparameter adoption rewards” for a whole round.

Regarding the one-round buffer, my concern is that it simply delays the problem rather than solving it.

On challenges such as C008, we are already seeing new algorithms being released almost every round. More generally, innovation cycles have become very short, with meaningful algorithmic improvements appearing every one or two weeks. In that environment, a one-round delay does not fundamentally change the incentives. It only postpones the moment at which discoveries become public.

Regarding hyperparameters, yes, benchmarkers can currently submit null/default HPs while executing their benchmarks with tuned HPs internally.

This is possible because TIG verification does not replay the benchmark using the submitted hyperparameters.

The verifier reconstructs the challenge instance deterministically from the seed and nonce, then evaluates a provided solution against that reconstructed instance. It verifies two things only:

  1. Is the solution valid? (Was the compute actually done?)
  2. What is its quality? (Is the claimed quality genuine or faked?)

The verifier does not search for the solution itself, does not reproduce the benchmarking process, and does not verify that a given solution was obtained using the submitted hyperparameters.
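
To illustrate the shape of this check, here is a toy sketch built around a small knapsack-style instance. It is not TIG’s verifier: the instance generator, the function names, and the quality measure are all invented for illustration. The point is only that hyperparameters never appear as an input.

```python
import hashlib
import random


def build_instance(seed: str, nonce: int, n_items: int = 8):
    """Deterministically derive a toy knapsack instance from seed and nonce."""
    digest = hashlib.sha256(f"{seed}:{nonce}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    weights = [rng.randint(1, 20) for _ in range(n_items)]
    values = [rng.randint(1, 50) for _ in range(n_items)]
    capacity = sum(weights) // 2
    return weights, values, capacity


def verify(seed: str, nonce: int, solution: list):
    """Return (is_valid, quality). Hyperparameters are not an input at all."""
    weights, values, capacity = build_instance(seed, nonce)
    if len(set(solution)) != len(solution):
        return False, 0  # an item was selected twice
    if any(i < 0 or i >= len(weights) for i in solution):
        return False, 0  # an item index is out of range
    if sum(weights[i] for i in solution) > capacity:
        return False, 0  # the selection exceeds the capacity
    return True, sum(values[i] for i in solution)


# The same seed and nonce always rebuild the same instance.
print(build_instance("round-seed", 7) == build_instance("round-seed", 7))  # True
print(verify("round-seed", 7, []))      # (True, 0)
print(verify("round-seed", 7, [0, 0]))  # (False, 0)
```

Whatever search produced the solution, and with whatever settings, `verify` only sees the result, which is why it stays cheap compared with finding the solution.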

This is not a limitation of the verifier implementation; it is a consequence of the verification model itself. Verification is intentionally designed to be much cheaper than solution discovery.

For some challenges, replaying the full benchmarking process would be prohibitively expensive. On C002, for example, evaluating a single nonce can take around 15 minutes. Reproducing large numbers of benchmark executions with submitted hyperparameters would be impractical from both a computational and protocol perspective.

This is where the decentralized verification feature is needed; its purpose is to ensure that benchmarkers do not report fake HPs. Decentralized verification allows any other benchmarker to spend their own resources to verify a suspicious result. This verification work should be voluntary, not mandatory. Those who propose a verification may waste computing resources, but they may also receive rewards.

Conversely, if you report the correct HPs from the beginning, anyone who verifies your results will find nothing wrong, so everything is fair.

Hi xnico31 and Newman. As Newman says, benchmarkers hiding their HPs will be solved by decentralised verification. Assuming benchmarkers can no longer hide their HPs, what do you think of Option Two that I described above? Note that hyperparameters can only be added to the pool at the end of each round. If once per round is too slow, then maybe every 48 hours or so.

I prefer the second option, the hyperparameter pool scheme: at the beginning of each round, every benchmarker selects a parameter scheme from the pool and pays a portion of their reward as a fee. Meanwhile, hyperparameter explorers can submit the parameter schemes they have found before each round ends.

As for new algorithms being updated quickly, I do not think this is a problem, because a new algorithm is released two rounds before it goes live. A hyperparameter explorer therefore has up to 14 days of exploration and testing time, which is completely sufficient.