Abstract

AI-Trader is not a “magic EA” but an open-source benchmark and infrastructure for pitting LLM-based agents against each other on real market data (NASDAQ-100, SSE-50, Bitwise10 crypto). The project provides fully autonomous agents (“zero human input”) and historical replay with anti-look-ahead controls, but it is not a ready-made trading system with proven, robust profitability.

A detailed analysis of the architecture, market model, data, and related research on LLM agents shows that using AI-Trader as-is to earn stable profits in fully automated mode is neither methodologically nor practically justified.


1. Context: What AI-Trader Is Really For

The project positions itself as “AI-Trader: Can AI Beat the Market? Live Trading Bench”, where multiple models (GPT, Claude, Qwen, etc.) autonomously trade NASDAQ-100, SSE-50 and major crypto assets, with results displayed in a live leaderboard and via historical replay.

Key points from the README:

  • fully autonomous decision-making with zero human intervention;
  • architecture based on MCP (Model Context Protocol): models trade by calling a fixed toolchain (price feeds, news, order execution, math);
  • multi-market coverage: NASDAQ-100, SSE-50, top crypto (Bitwise10);
  • historical replay with anti-look-ahead (agents only see information available at that historical moment);
  • live web dashboard with portfolio analytics and leaderboards.

Crucially, at the end of the README the team explicitly states that this is for research only, is not investment advice, and returns are not guaranteed.

Given your main question (“can this deliver stable automatic income?”), we must evaluate the project as infrastructure and a research testbed, not as a finished money-making trading robot.


2. AI-Trader Architecture: What Exactly You Get

2.1. Overall Structure

The README shows a clear project tree:

AI-Trader Bench/
├── main.py
├── agent/
│   ├── base_agent/
│   ├── base_agent_astock/
│   └── base_agent_crypto/
├── agent_tools/
├── data/
├── prompts/
├── frontend/
├── configs/
└── scripts/

Conceptually AI-Trader is:

  1. a trading environment (data + execution + market rules),
  2. a set of “agent wrappers” around LLM models,
  3. a web dashboard that reads logs and plots equity curves.

Core components:

  • Core System
    • main.py — main entry point;
    • agent/:
      • base_agent/ — base agent for US equities (daily and hourly);
      • base_agent_astock/ — base agents for Chinese A-shares (daily + hourly);
      • base_agent_crypto/ — base agent for crypto.
  • MCP Toolchain
    • tool_trade.py — order execution and portfolio updates;
    • tool_get_price_local.py — local price feed for US and A-shares;
    • tool_jina_search.py — news and information search;
    • tool_math.py — math utilities;
    • start_mcp_services.py — MCP service launcher.
  • Data System
    • US data: NASDAQ-100 daily prices (merged.jsonl, etc.);
    • A-shares: SSE-50 daily and hourly bars;
    • Crypto: Bitwise10 components and other major coins.

2.2. Markets, Timeframes, and Rules

Markets and initial capital:

  • US: NASDAQ-100, $10,000 initial capital;
  • China: SSE-50 (A-shares), ¥100,000 initial capital;
  • Crypto: Bitwise10 constituents, 50,000 USDT initial capital.

Timeframes:

  • daily for all markets;
  • hourly for US and A-shares;
  • crypto currently daily only.

Simulation rules (backtest / replay):

  • in competition mode, execution is typically tied to opening prices of the bar (daily/hourly);
  • data structures include separate “buy price” / “sell price” fields, i.e. a simplified bid/ask spread, but still based on bar-level data, not on order book microstructure.
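
The separate “buy price” / “sell price” fields can be pictured with a minimal sketch; the field and function names below are illustrative, not the project’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class Bar:
    # Simplified daily/hourly bar. Separate buy/sell prices mimic the
    # "buy price" / "sell price" fields described above; names are
    # illustrative, not AI-Trader's real data layout.
    open: float
    buy_price: float   # price an agent pays when buying on this bar
    sell_price: float  # price an agent receives when selling

def cash_delta(bar: Bar, side: str, qty: float) -> float:
    """Cash impact of a market order filled at this bar's prices."""
    if side == "buy":
        return -qty * bar.buy_price
    if side == "sell":
        return qty * bar.sell_price
    raise ValueError(f"unknown side: {side!r}")

bar = Bar(open=100.0, buy_price=100.05, sell_price=99.95)
# A 10-share round trip loses the full spread even with zero price move.
round_trip = cash_delta(bar, "buy", 10) + cash_delta(bar, "sell", 10)
```

Even this toy spread model shows why bar-level fills understate reality: the spread is fixed per bar, whereas live spreads widen exactly when you most need liquidity.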

Anti-look-ahead controls

AI-Trader emphasises that agents can only access data up to the current replay timestamp (prices, news, reports), helping to avoid blatant look-ahead bias.

This is a solid design choice for research, but it does not automatically guarantee realistic live performance.
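
Conceptually, such a replay filter reduces to restricting every data request to the current replay clock. A minimal sketch, assuming timestamped records (AI-Trader’s own implementation lives inside its data tools and may differ):

```python
from datetime import date

def visible(records, now):
    # Anti-look-ahead filter: an agent may only see records stamped at
    # or before the replay clock `now`. Illustrative sketch only.
    return [r for r in records if r["ts"] <= now]

news = [
    {"ts": date(2024, 1, 2), "headline": "earnings beat"},
    {"ts": date(2024, 1, 5), "headline": "guidance cut"},
]
# Replaying Jan 3: the Jan 5 headline must stay hidden from the agent.
seen = visible(news, date(2024, 1, 3))
```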


3. How an Agent Thinks and Trades (Conceptually)

Even without reading every line of code, the README and the related STOCKBENCH paper allow us to reconstruct the decision loop:

  1. Portfolio Overview
    The agent receives:
    • current positions and cash,
    • recent actions,
    • opening prices for the instrument universe,
    • brief market/benchmark summary.
  2. In-Depth Stock Analysis
    Via tools, the agent can request extra data:
    • fundamental metrics (market cap, P/E, dividend yield, etc.),
    • recent price history,
    • relevant news (through Jina search or similar).
  3. Decision Generation
    The LLM uses this information plus a carefully engineered prompt to generate actions:
    • increase/decrease positions,
    • enter new trades,
    • hold / rebalance.
      Decisions are converted into target weights / share quantities.
  4. Execution & Validation
    tool_trade validates orders:
    • cash constraints,
    • market rules (lot size, T+1 constraints on SSE-50, etc.),
    • updates the portfolio accordingly.
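
The four stages above can be condensed into a toy loop; `decide` stands in for the LLM call, and every name here is illustrative rather than the project’s real API:

```python
def run_step(decide, open_prices, portfolio):
    # 1. Portfolio overview: the state visible to the agent this step.
    overview = {"cash": portfolio["cash"],
                "positions": dict(portfolio["positions"]),
                "opens": open_prices}
    # 2.-3. The model (here a stub) turns the overview, plus any tool
    # calls it chooses to make, into a list of (ticker, qty) orders.
    orders = decide(overview)
    # 4. Execution & validation: reject buys the cash cannot cover.
    for ticker, qty in orders:
        cost = qty * open_prices[ticker]
        if cost <= portfolio["cash"]:
            portfolio["cash"] -= cost
            portfolio["positions"][ticker] = (
                portfolio["positions"].get(ticker, 0) + qty)
    return portfolio

# Stub "LLM" that always wants 5 shares of one ticker.
pf = run_step(lambda ov: [("AAPL", 5)],
              {"AAPL": 100.0},
              {"cash": 1000.0, "positions": {}})
```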

A key design principle: there is no hard-coded trading rule in the usual quant sense; the “strategy” emerges from the LLM’s reasoning + prompts + available tools.

From a quant / professional algotrader perspective this leads to several issues:

  • there is no explicit, parametric policy to inspect, test, and optimise;
  • decisions are stochastic and sensitive to prompt phrasing, temperature, and context;
  • there is no clear mapping to statistically-tested signals (factors, time-series models, etc.).

In other words, it’s a powerful research toy, but a very opaque core for a production trading engine.


4. Market Model Quality: Where “Paper Profits” Can Appear

4.1. Data Sources and Limitations

Based on the README:

  • US market: Alpha Vantage daily data, later merged into JSONL;
  • A-shares: Tushare + efinance daily and 60-minute bars;
  • Crypto: internally collected data for Bitwise10 and similar.

For a research benchmark this is fine. For live trading realism there are drawbacks:

  • bar-level data (daily/hourly) with no intraday microstructure;
  • quality and completeness of free/academic APIs may vary;
  • handling of corporate actions (splits, dividends) is not fully documented; community issues explicitly ask about this.

4.2. Order Execution and Transaction Costs

The documentation does not provide a clear, explicit model for:

  • commissions;
  • slippage;
  • partial fills;
  • liquidity constraints.

Community members raise these exact questions in GitHub Issues (limits on position size, trade frequency, short selling, margin, etc.), and there is no comprehensive public specification yet.

Practical implication:

  • if fees and slippage are omitted or oversimplified, any “paper” PnL is over-optimistic compared to real trading;
  • this effect is critical for higher-frequency behaviour (even hourly), where cumulative friction costs easily eat small edges.
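
A back-of-the-envelope sketch of that sign flip, with all figures (gross edge, fees, slippage) chosen purely for illustration:

```python
def net_growth(gross_edge_bps, fee_bps, slip_bps, round_trips):
    # Per-trade edge minus round-trip friction (fees + slippage paid on
    # both sides), compounded over many trades; inputs in basis points.
    per_trade = (gross_edge_bps - 2 * (fee_bps + slip_bps)) / 1e4
    equity = 1.0
    for _ in range(round_trips):
        equity *= 1.0 + per_trade
    return equity - 1.0

# A 10 bps gross edge with 3 bps fees and 4 bps slippage per side:
# 10 - 2 * (3 + 4) = -4 bps per round trip, so the "edge" is a loss.
after_costs = net_growth(10, 3, 4, 250)
zero_cost = net_growth(10, 0, 0, 250)
```

With zero friction the same edge compounds to a visibly positive return over 250 trades; with realistic friction it compounds to a loss, which is exactly why an unspecified cost model makes any benchmark PnL suspect.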

4.3. Risk Management

The README does not document any robust risk framework:

  • no explicit position limits (e.g., max % of NAV per ticker);
  • no explicit daily/monthly trade limits;
  • no global drawdown stops or coherent stop-loss policy.

Again, GitHub Issues show that users actively ask for this. The absence of a transparent and well-tested risk layer is a major red flag for any fully automated real-money deployment.

For a serious algo setup this is a deal-breaker: without hard risk constraints, any strategy, including an LLM agent, can eventually hit a catastrophic loss.
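
For contrast, even a minimal pre-trade risk layer of the kind the README does not document is short to express; the thresholds below (10% per ticker, 20% drawdown stop) are illustrative, not AI-Trader’s:

```python
def pre_trade_check(order_value, position_value, nav, peak_nav,
                    max_pos_frac=0.10, max_drawdown=0.20):
    # Two hard constraints missing from the documented stack:
    # a global drawdown stop and a per-ticker position limit.
    if nav <= peak_nav * (1.0 - max_drawdown):
        return False, "global drawdown stop: trading halted"
    if position_value + order_value > nav * max_pos_frac:
        return False, "per-ticker position limit exceeded"
    return True, "ok"

# An order pushing one ticker to 11% of NAV is rejected...
ok1, _ = pre_trade_check(order_value=900, position_value=200,
                         nav=10_000, peak_nav=10_000)
# ...and after a 21% drawdown all new trading stops.
ok2, _ = pre_trade_check(order_value=100, position_value=0,
                         nav=7_900, peak_nav=10_000)
```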


5. What Empirical Evidence and External Research Say

5.1. AI-Trader’s Own Results

The public dashboard (ai4trade.ai) shows:

  • comparative equity curves for different agents;
  • a leaderboard by return and drawdown;
  • portfolio breakdowns.

However:

  • without executing client-side JS we mostly see “Loading trading data…” — the raw historical curves are not directly visible in static view;
  • the README notes that runtime data are no longer stored in the repo but periodically uploaded to external storage, with history generally spanning months, not years.

So at this stage there is no easily verifiable, long-horizon track record for AI-Trader agents. That makes any claims about “stable profits” inherently weak.

5.2. Related LLM-Agent Benchmarks

HKUDS and related groups also publish STOCKBENCH, an academic benchmark closely aligned in design with AI-Trader.

Core findings from the STOCKBENCH paper (and similar works):

  • LLM agents trade in a realistic environment (prices, fundamentals, news) with strict anti-look-ahead;
  • average returns are modest, and a naive equal-weight buy-and-hold baseline over the universe delivers about 0.4% return over the test period with a max drawdown of ~−15.2%;
  • best-performing agents sometimes beat this baseline (e.g., ~1.9% return over a few months with lower drawdown), but:
    • performance gains are small,
    • results are unstable over time,
    • most agents fail to consistently outperform this trivial baseline.

The paper explicitly notes that most LLM agents do not reliably beat simple buy-and-hold on risk-adjusted metrics.

Other benchmarks (InvestorBench, Agent Market Arena, etc.) show a similar pattern:

  • LLM agents can be profitable in some windows;
  • their PnL is highly regime-dependent and sensitive to setup;
  • there is no solid evidence of robust, long-term alpha.

5.3. Broader Context: Active Strategies vs. the Market

It’s also important to place all this in the context of traditional active management:

  • SPIVA scorecards and research by S&P Dow Jones / Apollo show that around 80–90% of active equity funds underperform their benchmarks over 10–15-year horizons;
  • even in years where active managers “catch up” (e.g. volatile regimes, tariff shocks, etc.), the majority still fails to sustainably beat the index, and outperformance is concentrated in a small minority of funds.

In other words: even well-capitalized, regulated professional managers with deep teams rarely deliver long-term alpha. Expecting first-generation LLM agents (as in AI-Trader) to do so out of the box is extremely optimistic.


6. Can You Earn “Stable Profits” on Full Auto with AI-Trader?

Let’s break it down.

6.1. Scenario 1: Run AI-Trader “Out of the Box” with One of the Built-In Models

Answer: essentially no, for multiple reasons.

  1. No long-term statistical proof.
    Publicly available history is short (months, not years). Even if curves look good, it may simply be a lucky regime, not a robust edge.
  2. Simplified market model.
    Daily/hourly bar data, execution at open, no real order-book dynamics. Missing microstructure means that path-dependent risk (gaps, intraday spread spikes, flash moves) is under-represented.
  3. Unclear or missing transaction cost modelling.
    Without explicit fees and slippage, any apparent edge is overstated. For strategies that rebalance frequently, realistic friction costs can easily flip the sign of the PnL.
  4. No transparent risk framework.
    Lack of clear position limits, trade limits, or global drawdown controls means the system can blow up under adverse conditions.
  5. High stochasticity and prompt-sensitivity of LLM decisions.
    LLM agents are fragile: small changes in prompts or parameters can significantly alter behaviour. External research (e.g., STOCKBENCH) confirms that this leads to unstable and regime-dependent performance.

Practical conclusion:
Running AI-Trader as a “plug-and-play robo-advisor that steadily grows your account” is effectively donating capital to an experiment. It’s a cool scientific setup, not a responsible production trading solution.

6.2. Scenario 2: Use AI-Trader as Infrastructure and Build Your Own Strategy

Here the story is different and more positive.

AI-Trader provides:

  • a ready-made research playground (data + replay + basic execution);
  • a flexible way to plug in custom agents (CustomAgent) and even external MCP tools (e.g., your custom risk module, signal engine, news filter).

In this setup you can:

  1. Implement your own, statistically-validated signals (factor models, time-series, stat-arb, etc.) as a separate tool, and use the LLM only for:
    • interpreting news and fundamentals;
    • high-level risk overlays (reduce leverage before macro events, etc.);
    • producing natural-language rationales.
  2. Replace the execution model, commission handling, and risk management with something closer to your real infrastructure.
  3. Run long historical replays across multiple regimes (bull, bear, sideways, crisis), and test:
    • return stability;
    • parameter sensitivity;
    • tail-risk behaviour (worst days/months).
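
A toy sketch of that division of labour, with the statistical signal and the LLM gate both stubbed out (none of this is AI-Trader code; names are hypothetical):

```python
def momentum_signal(closes, lookback=5):
    # A classical, inspectable signal exposed as a separate tool:
    # sign of the trailing return over `lookback` bars.
    if len(closes) <= lookback:
        return 0.0
    r = closes[-1] / closes[-1 - lookback] - 1.0
    return 1.0 if r > 0 else (-1.0 if r < 0 else 0.0)

def final_position(signal, llm_regime):
    # The LLM only gates exposure (e.g. flatten ahead of macro events);
    # the edge itself comes from the statistical signal.
    return 0.0 if llm_regime == "risk_off" else signal

closes = [100, 101, 102, 103, 104, 105, 106]
pos = final_position(momentum_signal(closes), llm_regime="risk_on")
```

The point of the split is testability: the signal can be backtested and parameter-swept on its own, while the LLM overlay can be A/B-tested as a binary gate instead of an opaque end-to-end decision maker.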

If done properly, you can theoretically build a profitable and reasonably robust strategy on top of AI-Trader’s framework.

But in that case:

  • the core edge will be your quant logic,
  • AI-Trader will be just the environment and tooling,
  • and the usual statistical reality still holds: even good quants don’t guarantee long-term outperformance.

6.3. Operational and Economic Constraints

Even with a profitable strategy on the AI-Trader stack, you still face real-world hurdles:

  1. Broker Integration.
    AI-Trader does not natively connect to Interactive Brokers, Binance, Bybit, etc. You will need to:
    • build a bridge between tool_trade.py and broker APIs;
    • implement robust handling of network errors, partial fills, and reconciliation of positions.
  2. LLM API Costs.
    For daily/hourly trading this is manageable, but:
    • complex reasoning calls are not cheap;
    • at scale (many instruments, many agents) model costs may meaningfully eat into your alpha.
  3. Regulatory and compliance concerns.
    Fully autonomous systems with no human oversight are not how serious institutional desks operate. You will need proper monitoring, controls, and risk sign-offs.
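
One bridge concern from point 1, retry with reconciliation, can be sketched as follows; `place_order` and `reconcile` are hypothetical adapter callbacks, not part of any broker SDK:

```python
import time

def submit_with_retry(place_order, reconcile, order,
                      retries=3, backoff=0.5):
    # Retry transient failures with exponential backoff, then reconcile
    # the local portfolio against the broker's reported fill. Orders
    # that keep failing are escalated for manual review, not dropped.
    for attempt in range(retries):
        try:
            fill = place_order(order)
            return reconcile(order, fill)
        except ConnectionError:
            time.sleep(backoff * 2 ** attempt)
    raise RuntimeError("order still failing after retries; flag for review")

# Simulated flaky broker API: fails once, then fills.
state = {"calls": 0}
def flaky_broker(order):
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("gateway timeout")
    return {"filled_qty": order["qty"]}

fill = submit_with_retry(flaky_broker, lambda o, f: f,
                         {"qty": 10}, backoff=0.0)
```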

7. How a Professional Algotrader Should Use AI-Trader

Summarising the sensible use cases:

  1. Treat it as a research benchmark and sandbox, not a turnkey EA.
    Use it to compare different LLM agents, signals, and your own strategies on consistent datasets.
  2. Strictly separate benchmark PnL from real-world expectations.
    Any positive backtest in AI-Trader must be re-validated:
    • in an independent backtesting stack with realistic costs;
    • on out-of-sample windows;
    • under stress tests and resampling.
  3. Use LLMs as an overlay, not the entire brain.
    For example:
    • core trade signals from classical quant models;
    • LLM agents for:
      • news interpretation,
      • dynamic risk regime selection,
      • explainable trade rationales for clients/investors.
  4. Never go straight to full-auto live trading.
    Minimal responsible path:
    • historical replay / backtest;
    • live paper-trading with real-time data;
    • small real money with strict limits.

8. Final Verdict: Can AI-Trader Deliver Stable Auto-Profits?

Let’s answer plainly.

Given the current state of the project and the available evidence, using AI-Trader “out of the box” (with default LLM agents) to earn stable profits on the market in fully automated mode is neither methodologically sound nor practically safe.

Main reasons:

  1. No robust, long-horizon, independently verifiable track record.
    Existing results cover relatively short periods and are not sufficient to claim durable alpha.
  2. Simplified and partially documented market model.
    Bar-level data, lack of explicit slippage/fee modelling, and missing detailed risk framework all inflate “paper” performance and hide real-world failure modes.
  3. Macro evidence against stable alpha.
    Most active managers underperform their benchmarks over 10–15 years, even with large teams and capital. First-generation LLM agents in a simplified research environment are unlikely to beat these odds without heavy human-driven quant engineering.

At the same time:

  • AI-Trader is a very valuable research tool, especially if you:
    • develop your own EAs/algos,
    • want to compare them against LLM agents,
    • need a multi-market, reproducible testbed for content and R&D.

Used as a framework rather than a “black-box robot”, it can be part of a serious workflow: your signals + robust risk engine + AI-Trader’s environment + LLM assistance.

But the promise of “stable, fully automatic income just by running AI-Trader” belongs to marketing fantasies, not to professional quantitative trading.