What is LLMO (Large Language Model Optimisation)?
LLMO uses large language models as optimisation engines that iteratively improve solutions to computational problems.
Large Language Model Optimisation (LLMO) refers to the use of large language models as optimisers or optimisation agents, rather than simply as text generators. The paper 'Large Language Models as Optimizers' (ICLR 2024) frames LLMs explicitly as optimisers and provides code and extensive experiments demonstrating this use case. Instead of writing rules or tuning hyperparameters, you prompt an LLM with a problem description, feed it historical decisions and rewards, and let it generate improved solutions iteratively.
Terminology varies across the literature. Some papers use "LLMO", others refer to "LLM-assisted optimizer" or "LLM-based agent". A recent survey contrasts parameter-driven LLM optimisation (improving model parameters) with LLM-based agent optimisation (using the model to solve complex tasks), positioning LLMO in the latter category. The core idea remains consistent: the model acts as the optimisation engine, not just a component in a larger system.
This approach differs fundamentally from using LLMs to generate text or answer questions. Here, the model's output is a candidate solution to an optimisation problem, and the system evaluates that solution, then feeds the result back into the next prompt. The model's weights are not updated along the way; its suggestions improve over successive iterations because each prompt carries a richer history of what has and has not worked.
How does LLMO work in practice?
LLMO architectures often iterate by prompting LLMs with historical decisions and rewards so the model can generate improved solutions without heavy hyperparameter tuning. You describe the problem, specify the format for a solution, and include the performance of previous attempts. The LLM then proposes a new solution, which you evaluate, and the cycle repeats.
Researchers proposed an LLM-assisted optimizer for neural architecture search that uses the CRISPE prompting framework and reported numerical experiments on NAS-Bench-201 with CIFAR-10 and CIFAR-100. CRISPE stands for Capacity and Role, Insight, Statement, Personality, and Experiment. This structured prompt tells the model what role it plays (an optimiser), what the task is (finding a neural architecture), what data it has (past architectures and their accuracy), and what format to return (an array-like solution). The model does not need to understand evolutionary computation or adversarial robustness; it simply follows the prompt and improves on past results.
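As a rough illustration, the five CRISPE fields can be assembled into a prompt with a plain template. The wording of each field below is invented for this sketch and is not the prompt used in the paper; `build_crispe_prompt` is a hypothetical helper, and the six-integer architecture encoding is an assumption made for the example.

```python
from typing import List, Tuple


def build_crispe_prompt(history: List[Tuple[List[int], float]]) -> str:
    """Fill a CRISPE-style template with past architectures and their accuracy."""
    past = "\n".join(f"{arch} -> accuracy {acc:.2f}" for arch, acc in history)
    # Illustrative field texts only (assumption), one entry per CRISPE component.
    sections = {
        "Capacity and Role": "You are an optimiser for neural architecture search.",
        "Insight": "Past architectures and their accuracy:\n" + past,
        "Statement": "Propose one new architecture likely to score higher.",
        "Personality": "Answer tersely, with no explanation.",
        "Experiment": "Return a single array of integers, e.g. [0, 1, 2, 3, 1, 0].",
    }
    return "\n\n".join(f"{name}: {text}" for name, text in sections.items())


prompt = build_crispe_prompt([([0, 1, 2, 3, 1, 0], 91.20), ([3, 3, 1, 0, 2, 1], 93.05)])
print(prompt)
```

Keeping the template in one function makes it easy to swap the history in and out between iterations while the role, task, and output format stay fixed.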
Google Research describes a hybrid approach for trip planning where an LLM suggests a qualitative itinerary and an optimisation step enforces quantitative, real-world constraints to produce a feasible plan. The LLM generates an initial list of activities tailored to the user's interests. A classical optimiser then checks opening hours, travel time, budget limits, and other hard constraints, adjusting the plan to make it feasible. This hybrid architecture combines the LLM's qualitative reasoning with the precision of traditional optimisation algorithms.
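The division of labour can be sketched as follows. This is not Google's implementation: it assumes the LLM has already returned a ranked list of activities, and it swaps in a simple greedy pass for the optimisation step. The `Activity` fields and `make_feasible` are hypothetical names, and travel time is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    name: str
    opens: int      # opening hour, 24h clock
    closes: int     # closing hour, 24h clock
    duration: int   # hours needed
    cost: float


def make_feasible(ranked: List[Activity], start: int, end: int, budget: float) -> List[Activity]:
    """Greedy pass: keep LLM-ranked activities that satisfy the hard constraints."""
    plan: List[Activity] = []
    clock, spent = start, 0.0
    for act in ranked:
        begin = max(clock, act.opens)          # wait for opening if needed
        finish = begin + act.duration
        fits_time = finish <= min(act.closes, end)
        fits_budget = spent + act.cost <= budget
        if fits_time and fits_budget:
            plan.append(act)
            clock, spent = finish, spent + act.cost
    return plan


# Stand-in for the LLM's qualitative suggestions, in order of preference.
suggested = [
    Activity("museum", 10, 17, 3, 20.0),
    Activity("harbour cruise", 9, 12, 2, 45.0),
    Activity("food market", 11, 20, 2, 15.0),
]
plan = make_feasible(suggested, start=9, end=18, budget=50.0)
print([a.name for a in plan])
```

In this toy run the cruise is dropped because it cannot finish before its closing time once the museum has been scheduled, while the two remaining activities fit both the day and the budget.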
The iterative loop is central. Each prompt includes the problem statement, the best solutions found so far, and their scores. The LLM reads this context and proposes a new solution. You evaluate it, add the result to the history, and prompt again. Over several iterations, the LLM's suggestions converge towards better solutions.
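A minimal sketch of this loop, under stated assumptions: the LLM call is replaced by `stub_propose`, a random perturbation of the best known solution, so the example runs without a model. `build_prompt`, `optimise`, and the toy `score` function are illustrative names and not taken from any of the cited papers.

```python
import random
from typing import Callable, List, Tuple

Solution = List[float]
History = List[Tuple[Solution, float]]  # (solution, score) pairs


def build_prompt(task: str, history: History, top_k: int = 3) -> str:
    """Assemble the problem statement plus the best solutions found so far."""
    best = sorted(history, key=lambda item: item[1], reverse=True)[:top_k]
    lines = [task, "Best solutions so far (higher score is better):"]
    lines += [f"solution={sol} score={val:.3f}" for sol, val in best]
    lines.append("Propose a new solution in the same list format.")
    return "\n".join(lines)


def stub_propose(prompt: str, history: History, rng: random.Random) -> Solution:
    """Stand-in for an LLM call (assumption): perturb the best known solution."""
    best_solution = max(history, key=lambda item: item[1])[0]
    return [x + rng.uniform(-0.5, 0.5) for x in best_solution]


def optimise(task: str, score: Callable[[Solution], float],
             propose: Callable[[str, History, random.Random], Solution],
             start: Solution, iterations: int = 20, seed: int = 0) -> History:
    rng = random.Random(seed)
    history: History = [(start, score(start))]
    for _ in range(iterations):
        prompt = build_prompt(task, history)            # context for this round
        candidate = propose(prompt, history, rng)       # the model proposes
        history.append((candidate, score(candidate)))   # evaluate and record
    return history


def score(solution: Solution) -> float:
    """Toy objective: best possible value is 0 at [1.0, 1.0]."""
    return -sum((x - 1.0) ** 2 for x in solution)


history = optimise("Maximise the score of a 2-element list.", score, stub_propose, [0.0, 0.0])
best_solution, best_score = max(history, key=lambda item: item[1])
print(best_solution, best_score)
```

With a real model, `propose` would send `prompt` to the LLM and parse the reply into a solution; everything else in the loop stays the same.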
Which problems and benchmarks show LLMO works?
Published benchmarks demonstrate LLMO's effectiveness on specific tasks. The neural architecture search experiments on NAS-Bench-201 used CIFAR-10 and CIFAR-100 datasets, comparing LLMO against six discrete meta-heuristic algorithms. The LLM-assisted optimizer matched or exceeded the performance of traditional methods without requiring manual tuning of crossover rates, mutation probabilities, or population sizes.
Another study applied LLMO to adversarial robustness neural architecture search (ARNAS), searching for architectures that maintain accuracy under adversarial attacks. The model iteratively proposed architectures, evaluated their robustness, and refined its suggestions based on the results. This task combines qualitative design choices (which layers to include, how to connect them) with quantitative performance metrics (accuracy under attack).
Network management problems also benefit from LLMO. One paper examined black-box network optimisation, where the objective function is unknown or too complex to model analytically. The LLM receives network configurations and their performance scores, then suggests improved configurations. The approach scales to large networks by decoupling action vectors into blocks of variables, allowing the model to process each block separately rather than handling all variables in a single prompt.
These results show LLMO works on problems where qualitative reasoning and iterative refinement matter. The benchmarks are narrow, however. The literature does not claim LLMO outperforms traditional optimisers across all domains, and empirical results remain concentrated on specific tasks like architecture search and network configuration.
How is LLMO different from traditional optimisers and from SEO?
Traditional black-box optimisers (Bayesian optimisation, evolutionary algorithms, simulated annealing) require careful hyperparameter tuning and domain-specific design. You must choose population size, mutation rate, acquisition function, or cooling schedule. LLMO replaces most of this with a prompt. The model reads the problem description and past results, then generates a new solution. There are no search operators to design and few numeric hyperparameters to tune, although the prompt itself becomes a design choice that affects results.
Classical optimisers also struggle with qualitative goals. If your objective is "find a neural architecture that is robust to adversarial attacks and interpretable", a traditional optimiser needs you to encode "interpretable" as a numeric score. LLMO can work with natural language descriptions of interpretability, combining them with numeric robustness scores in the same prompt.
The computational cost differs sharply. Traditional optimisers evaluate many candidate solutions in parallel, relying on fast objective functions. LLMO depends on a series of LLM inferences, each costing tokens and compute. One way to contain this cost is a suitable early stopping technique, which reduces the number of LLM inferences without performance degradation. In large-scale networks, decoupling action vectors into blocks of variables helps LLMO scale.
Search Engine Optimisation (SEO) is a completely separate concept. SEO improves a website's visibility in search engine results by optimising content, metadata, links, and technical performance. The objective is to rank higher for target keywords, the signals are crawlability, relevance, authority, and user experience, and the stakeholders are website owners, search engines, and users.
LLMO optimises solutions to computational problems by prompting a language model. SEO optimises web pages for search algorithms. Both names contain the word "optimisation", but the fields do not overlap. LLMO, as used here, has no connection to keyword research, backlink profiles, or page speed. If you are researching SEO, LLMO is not relevant. If you are researching optimisation algorithms, SEO is not relevant.
What are the main limitations, costs and scaling considerations for LLMO?
Because LLMO depends on a series of LLM inferences, its computational cost is a critical issue for real-world networks. Each iteration requires a prompt containing the problem description and historical results, which can run to thousands of tokens. The model then generates a solution, which you evaluate, and the cycle repeats. For problems requiring dozens of iterations, token costs accumulate quickly because the same history is sent again in every prompt.
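A back-of-the-envelope estimate shows how the token count grows. All numbers below are made up for illustration, and `total_tokens` is a hypothetical helper; it assumes the history grows by one entry per iteration up to a cap, and that every prompt repeats the full problem description.

```python
def total_tokens(iterations: int, base_prompt: int, per_entry: int,
                 output: int, max_entries: int) -> int:
    """Sum prompt and completion tokens over a run with a growing, capped history."""
    total = 0
    for step in range(iterations):
        entries = min(step, max_entries)  # history holds results of earlier steps
        total += base_prompt + entries * per_entry + output
    return total


# Hypothetical sizes: 400-token task description, 60 tokens per history entry,
# 80-token reply, history capped at the 20 most useful entries.
tokens = total_tokens(iterations=50, base_prompt=400, per_entry=60, output=80, max_entries=20)
print(tokens)
```

With these assumed sizes the run comes to 71,400 tokens, of which 47,400 are history entries re-sent in later prompts. Capping the history is therefore one of the more direct levers on cost.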
Convergence is not guaranteed. The model may plateau, propose solutions that violate constraints, or oscillate between similar candidates without improvement. Early stopping helps: monitor the improvement rate and halt when gains fall below a threshold. This cuts the number of inferences, and with a well-chosen threshold it need not reduce final performance.
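One simple way to express such a stopping rule is a patience window over the best score seen so far. This is a sketch, not a rule taken from the cited papers; `should_stop`, `patience`, and `min_gain` are hypothetical names, and the input is assumed to be the running best score after each iteration (so it never decreases).

```python
from typing import List


def should_stop(best_scores: List[float], patience: int = 5, min_gain: float = 1e-3) -> bool:
    """Stop when the best score gained less than min_gain over the last `patience` iterations."""
    if len(best_scores) <= patience:
        return False  # not enough history to judge yet
    return best_scores[-1] - best_scores[-1 - patience] < min_gain


plateaued = [0.1, 0.5, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
improving = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
print(should_stop(plateaued), should_stop(improving))
```

Called once per iteration, the check halts a run that has plateaued while letting a steadily improving run continue; `patience` and `min_gain` trade cost against the risk of stopping just before a late improvement.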
In large-scale problems, prompt length becomes a bottleneck. A network with thousands of variables cannot fit all of them into a single prompt without exceeding context limits or degrading the model's attention. Decoupling action vectors into blocks addresses this. You divide the problem into smaller sub-problems, prompt the LLM for each block separately, then combine the results. Each prompt stays short regardless of the total number of variables, which is what lets LLMO scale.
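The block idea can be sketched in a few lines. The example assumes blocks can be improved independently, which real problems with coupled variables may not allow, and it uses `stub_propose_block` in place of a per-block LLM call. All function names are hypothetical.

```python
from typing import Callable, List


def split_blocks(variables: List[float], block_size: int) -> List[List[float]]:
    """Cut the action vector into consecutive blocks of at most block_size variables."""
    return [variables[i:i + block_size] for i in range(0, len(variables), block_size)]


def optimise_by_blocks(variables: List[float], block_size: int,
                       propose_block: Callable[[List[float]], List[float]]) -> List[float]:
    """Request a proposal for each block separately, then concatenate the results."""
    result: List[float] = []
    for block in split_blocks(variables, block_size):
        new_block = propose_block(block)
        if len(new_block) != len(block):
            new_block = block  # reject a malformed proposal, keep the old values
        result.extend(new_block)
    return result


def stub_propose_block(block: List[float]) -> List[float]:
    """Stand-in for one LLM call per block (assumption): nudge each value upwards."""
    return [min(1.0, x + 0.1) for x in block]


updated = optimise_by_blocks([0.0] * 10, block_size=4, propose_block=stub_propose_block)
print(updated)
```

The length check matters in practice because a model can return a block of the wrong size; falling back to the previous values keeps the combined vector well-formed.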
To make LLMs practical in resource-constrained settings, the literature highlights three main compression strategies: knowledge distillation, model quantisation and model pruning. Knowledge distillation trains a smaller model to mimic a larger one, reducing inference cost. Model quantisation represents weights with fewer bits (for example, 8-bit integers instead of 32-bit floats), cutting memory and compute. Model pruning removes less important weights or neurons, shrinking the model without large accuracy loss.
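To make the quantisation idea concrete, here is a toy symmetric 8-bit scheme in plain Python. It is a sketch of the general principle only; production toolkits use more elaborate schemes (per-channel scales, calibration data), and `quantise_int8` and `dequantise` are hypothetical names.

```python
from typing import List, Tuple


def quantise_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric 8-bit quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # one float scale per tensor
    return [round(w / scale) for w in weights], scale


def dequantise(quantised: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in quantised]


weights = [0.50, -1.27, 0.03, 0.0]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
print(q, restored)
```

Each weight is stored as one small integer plus a shared scale, and the restored values differ from the originals by at most half a quantisation step. That rounding error is the source of the capability loss discussed next.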
These techniques matter for LLMO deployment. If you run LLMO on edge devices or in environments with limited compute, a compressed model may be necessary. The trade-off is between model capability and resource use. A heavily quantised model may struggle with complex prompts or long context, reducing optimisation quality.
When should you choose LLMO instead of a conventional optimiser?
Choose LLMO when your problem involves qualitative goals that are hard to encode as numeric objectives. If you need an architecture that is "interpretable", "robust to distribution shift", or "suitable for deployment on mobile devices", LLMO can work with those descriptions directly. A traditional optimiser requires you to translate each goal into a score, which may be difficult or subjective.
LLMO also suits problems where you want flexible reasoning and the ability to incorporate domain knowledge without writing custom code. You can add constraints, preferences, or context to the prompt, and the model adapts. This is faster than re-engineering a genetic algorithm or Bayesian optimiser.
Interpretability is another factor. LLMO generates solutions in a readable format (text, code, structured data), and you can ask the model to explain its reasoning. Traditional optimisers produce candidate solutions without explanation. If you need to understand why a particular solution was proposed, LLMO offers an advantage.
The cost of orchestration matters. LLMO requires API access to a capable LLM, token budget, and infrastructure to manage the iterative loop. If your organisation already uses LLMs for other tasks, adding LLMO is straightforward. If you do not, the setup cost may outweigh the benefit, especially for problems where traditional optimisers work well.
Operational readiness is critical. LLMO is less mature than classical optimisation methods. Reproducibility can be an issue: the same prompt may yield different solutions across runs, depending on the model's sampling settings. Benchmarking is harder because performance depends on prompt design, model version, and iteration count. Productionising LLMO requires careful monitoring, fallback strategies, and cost controls.
If your problem has a well-defined numeric objective, a fast evaluation function, and no qualitative constraints, a traditional optimiser is likely more efficient. If your problem involves natural language goals, complex trade-offs, or the need for interpretable solutions, LLMO is worth testing.
The field is young. The benchmarks published so far show promise on specific tasks, but broad claims about LLMO replacing conventional methods are premature. Treat LLMO as a tool for a particular class of problems, not a universal optimiser.
