TL;DR

Forezai · TradingAgents has launched a system where a committee of large language models makes paper-trading decisions. This development aims to test if AI-driven, multi-agent reasoning can outperform random choices in simulated markets. It extends prior research on parametric strategies’ failures and explores new AI collaboration approaches.

Forezai · TradingAgents is an operational fork of a multi-LLM committee system that makes paper-trading decisions in simulated markets. It evaluates whether collaborative reasoning among specialized AI agents can produce decisions that are at least no worse than a coin flip after fees.

The project is a fork of the open-source TradingAgents framework, which employs thirteen specialized LLM-based agents in distinct analytical roles such as market structure, news, fundamentals, and sentiment. These agents debate and synthesize their findings into a final trading recommendation. The system does not claim to predict markets; it aims to make its reasoning explicit through structured argumentation.

The new features include an autonomous daily execution loop, an auto-trader that maps AI ratings to paper orders, position management with exit rules, and a multi-broker abstraction supporting local, paper, and shadow modes. Additionally, a web dashboard provides real-time analytics, performance metrics, and audit logs, all running locally to ensure data privacy and control.

According to the project’s documentation, the system does not trade real money unless operators explicitly override safety restrictions, emphasizing its research focus rather than live trading. This setup allows researchers to explore AI decision-making processes in a controlled environment.

Introducing Forezai · TradingAgents — Thorsten Meyer AI

AGENTS

● ANNOUNCEMENT / MAY 2026

THORSTEN MEYER AI · FOREZAI · § 03

FOREZAI · 03

TRADINGAGENTS · LAUNCH

Research Series · Companion to Polybot Week 1-2 · 2026-05-17

Introducing Forezai · TradingAgents.
A committee of LLMs
decides paper-trades.

After two weeks of finding out most parametric strategies don’t work, the obvious next research question: can multi-agent LLM judgment do any better?

A fork of the open-source TradingAgents framework (TauricResearch): thirteen LLM agents in four stages — four parallel analysts · a bull-bear debate with research-manager arbitration · a three-voice risk team · a two-layer trader + portfolio-manager decision. The fork keeps the agent graph intact and adds the operational layer the upstream doesn’t ship: an autonomous loop · a multi-broker abstraction · a local web dashboard · Codex OAuth · MCP plug-ins · 520+ unit tests. The question is narrower than “do LLMs predict the market” — that prior is “no, with high confidence.” The narrower question is: when LLMs are structured into specialised adversarial roles, does the committee produce decisions at least no worse than a coin flip after fees? Honest priors before running: it might fail too. If it appears to work, the most likely explanation is variance.

Thorsten Meyer AI · Forezai · TradingAgents · Apache-2.0 fork · upstream cited · Companion piece · ~2,500 words · Polybot · § 03

This is not financial advice. Nothing in this announcement should be used to inform real trading decisions. The software described trades simulated money by default. If you reconfigure it to trade real money, you should expect to lose that money — regardless of how clever any individual agent’s reasoning looks. Algorithmic trading is zero-sum after fees and structurally hostile to part-time retail strategies.

13 agents

Specialised roles in four stages
Analysts · Debate · Risk · Decision

78% / -33%

Polybot prior: fleet win rate
combined with -33% bankroll

520+

Passing unit tests across engine,
services, HTTP routes (starting baseline)

€0 floor

LLM cost on Codex OAuth
(falls back to public API per token)

FOREZAI / TRADINGAGENTS · APACHE 2.0 FORK · UPSTREAM TAURIC RESEARCH · LANGGRAPH · 13 AGENTS / 4 STAGES · 4 PARALLEL ANALYSTS · BULL-BEAR DEBATE · 3-VOICE RISK TEAM · TRADER + PORTFOLIO MANAGER · 5-TIER FINAL RATING · ALPACA PAPER + LOCAL + SHADOW · LIVE ENDPOINTS HARD-REFUSED · FASTAPI + REACT VIA CDN · CODEX OAUTH · MCP PLUG-IN REGISTRY · 520+ UNIT TESTS · POLYBOT WEEK 1: 21 EXPERIMENTS · WEEK 2: -33% BANKROLL · 78% FLEET WIN RATE · HONEST RESEARCH, NOT EDGE

FIG. 01 — THE 13-AGENT COMMITTEE

Thirteen specialised roles · four stages · biases made to argue in public

The architecture forces the system to articulate its reasoning rather than relying on what a single context window happens to recall

Stage 1 · Four analysts in parallel · 4 agents

Market

Structure, ranges, regime indicators

News + Insider

News flow, filings, insider activity

Fundamentals

Balance sheet, earnings, ratios

Social Sentiment

Social-media tone, retail signal

↓

Stage 2 · Bull-bear debate + research-manager arbitration · 3 agents

Bull researcher

Argues upside thesis from analyst reports

Bear researcher

Argues downside thesis from same reports

Research manager

Arbitrates · writes single synthesis

↓

Stage 3 · Three-voice risk team · 3 agents

Aggressive

Looks for upside · accepts variance

Conservative

Looks for downside · protects capital

Neutral

Balances · forces downside articulation

↓

Stage 4 · Two-layer decision · 2 agents

Trader

Three-tier proposal · buy / hold / sell

Portfolio manager

Five-tier rating + price target + horizon · sees arguments only, never raw data

The portfolio manager only sees the arguments, never the raw data — which forces the committee to make its reasoning explicit rather than relying on a single context window’s recall. The upstream framework ships the agent graph; it does not ship the operational machinery to run that graph on autopilot, observe its results honestly, store them for later inspection, or prevent the operator from accidentally trading real money. That gap is what the Forezai fork fills.
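The "arguments only, never raw data" hand-off can be pictured as a type boundary. The following is a minimal sketch, not the fork's actual code: the class and function names (`AnalystReport`, `PortfolioManagerInput`, `build_pm_input`) are hypothetical, and the LLM calls themselves are left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AnalystReport:
    """Hypothetical stage-1 output: raw inputs plus a short written report."""
    role: str        # e.g. "market", "news", "fundamentals", "sentiment"
    raw_data: dict   # prices, filings, posts: stays inside the analyst stage
    summary: str     # the structured report that later stages argue from


@dataclass(frozen=True)
class PortfolioManagerInput:
    """Hypothetical stage-4 input: written arguments only, no raw data field."""
    research_synthesis: str          # research manager's arbitration text
    risk_arguments: tuple[str, ...]  # aggressive / conservative / neutral
    trader_proposal: str             # "buy", "hold" or "sell"


def build_pm_input(research_synthesis, risk_arguments, trader_proposal):
    """Assemble what the portfolio manager is allowed to read."""
    if trader_proposal not in {"buy", "hold", "sell"}:
        raise ValueError(f"unexpected trader proposal: {trader_proposal!r}")
    return PortfolioManagerInput(
        research_synthesis=research_synthesis,
        risk_arguments=tuple(risk_arguments),
        trader_proposal=trader_proposal,
    )
```

Because `PortfolioManagerInput` has no field that can carry raw data, anything the final stage relies on has to have been written down as an argument by an earlier stage.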

FIG. 02 — THE POLYBOT PRIOR · WHY THIS IS A DIFFERENT BET

Two weeks of paper-trading prediction markets · the trap underneath the headline numbers

25 experiments · 78% fleet-wide win rate · -33% bankroll · most parametric strategies are structurally negative-expectation when measured honestly

The flattering number

78%

Fleet-wide win rate · week 2

“You can win four out of five trades and still go broke, because the one loss is bigger than the four wins put together.” Win rate without P&L context is a mechanical illusion.

The honest number

−33%

Fleet bankroll · week 2 close

The strongest possible demonstration of the trap. A parametric trading strategy that looks compelling in a backtest will almost always fail to survive a fresh sample. Most “edges” are mechanical artefacts.

Week 1: 21 parallel strategy experiments · early winners mostly mechanical illusions · exactly one strategy (a fair-value taker on BTC) showed the mathematical signature of real edge over a few hundred settled trades. Week 2: same fair-value strategy with more data collapsed. A separate mid-week hypothesis (market-making) also failed cleanly. Fleet ended week 2 at roughly negative thirty-three percent of bankroll. The honest research finding wasn’t on the winning side — it was on the losing side. Adding more parameters to Polybot wouldn’t change that. TradingAgents is asking a separable question.
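The win-rate trap is simple arithmetic: expected profit per trade is the win rate times the average win, minus the loss rate times the average loss. A minimal sketch follows; the average win and loss sizes are illustrative assumptions, not Polybot's recorded figures.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade; avg_loss is a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_loss_to_win_ratio(win_rate: float) -> float:
    """Largest avg_loss / avg_win ratio at which expectancy is still zero."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return win_rate / (1.0 - win_rate)


# At a 78% win rate, losses may be at most about 3.5x the size of wins.
ratio = breakeven_loss_to_win_ratio(0.78)

# Assumed sizes for illustration: wins of 1 unit, losses of 5 units.
per_trade = expectancy(0.78, avg_win=1.0, avg_loss=5.0)
print(f"break-even ratio: {ratio:.2f}, expectancy per trade: {per_trade:.2f}")
```

With these assumed sizes the strategy wins 78% of the time and still loses about a third of a unit per trade, which is the shape of result described above.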

FIG. 03 — WHAT THE FORK ADDS · THE OPERATIONAL LAYER

Six layers the upstream framework doesn’t ship

Same agent graph, intact. The fork makes it a research instrument rather than a tech demo.

01 · Loop

An autonomous loop

Scheduler · watchlist · auto-trader maps ratings to paper orders · allow-list filtering · per-ticker cooldowns · sector caps · cash checks · position manager evaluates open positions every 60s for TP / SL / max-hold. Append-only audit logs.
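A position manager's TP / SL / max-hold check might look like the sketch below. It assumes long-only positions, and the names (`Position`, `ExitRules`, `exit_reason`) and the order in which rules are checked are hypothetical, not taken from the fork.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Position:
    ticker: str
    entry_price: float
    opened_at: datetime


@dataclass(frozen=True)
class ExitRules:
    take_profit_pct: float   # e.g. 0.10 closes at +10%
    stop_loss_pct: float     # e.g. 0.05 closes at -5%
    max_hold: timedelta      # close regardless of price after this long


def exit_reason(pos: Position, last_price: float, now: datetime,
                rules: ExitRules) -> Optional[str]:
    """Return why a long position should close, or None to keep holding."""
    change = (last_price - pos.entry_price) / pos.entry_price
    if change <= -rules.stop_loss_pct:   # assumed: stop-loss is checked first
        return "stop_loss"
    if change >= rules.take_profit_pct:
        return "take_profit"
    if now - pos.opened_at >= rules.max_hold:
        return "max_hold"
    return None
```

A loop would call `exit_reason` for each open position on every tick (every 60 seconds in the description above) and write the returned reason to the audit log when it is not `None`.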

02 · Brokers

Multi-broker abstraction

Three modes: local Python broker (yfinance fills, JSON-persisted) · Alpaca paper-trading adapter · “shadow” mode running both in parallel with divergence view. Real Alpaca live endpoints are hard-refused at multiple layers.
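One of the "multiple layers" of refusal could be an allow-list check on the broker URL before any client is built. This is a sketch under that assumption; `require_paper_endpoint` and `LiveTradingRefused` are hypothetical names, and the fork's real guards may work differently.

```python
from urllib.parse import urlparse

# Alpaca's paper-trading API host. Everything else is refused.
PAPER_HOST = "paper-api.alpaca.markets"


class LiveTradingRefused(RuntimeError):
    """Raised when a configured endpoint is not the paper-trading host."""


def require_paper_endpoint(base_url: str) -> str:
    """Return base_url unchanged if it targets the paper host, else raise.

    An allow-list is used instead of a deny-list so that typos, proxies and
    unknown hosts are refused along with the live endpoint.
    """
    host = urlparse(base_url).hostname
    if host != PAPER_HOST:
        raise LiveTradingRefused(f"refusing non-paper endpoint: {host!r}")
    return base_url
```

Calling `require_paper_endpoint("https://api.alpaca.markets")` raises, while the paper URL passes through untouched.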

03 · Dashboard

A local web dashboard

FastAPI backend · React via CDN, no Node toolchain · SVG equity curve · rolling-peak drawdown · win-rate by rating / ticker / model · exit-reason breakdown · LLM cost vs realised P&L joined by run ID. Runs locally; nothing sent to a cloud service.
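Rolling-peak drawdown measures how far equity sits below its highest value so far. A minimal sketch, assuming a non-empty list of positive equity values; the function name is illustrative rather than the dashboard's own.

```python
def drawdown_series(equity: list[float]) -> list[float]:
    """Fractional drawdown from the running peak at each point (0.0 at a peak)."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    drawdowns = []
    for value in equity:
        peak = max(peak, value)              # highest equity seen so far
        drawdowns.append((value - peak) / peak)
    return drawdowns


curve = [100.0, 120.0, 90.0, 110.0, 67.0]    # made-up equity values
worst = min(drawdown_series(curve))
print(f"max drawdown: {worst:.1%}")
```

The worst value in the series is the maximum drawdown. It is measured from the peak of 120, not from the starting balance.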

04 · Codex

Codex OAuth

Runs the engine on a ChatGPT Pro subscription via the Codex backend. LLM cost floor effectively zero if you already have ChatGPT Pro. Token stored encrypted locally. Falls back to the regular OpenAI API if you’d rather pay per token.

05 · Alerts

Multi-channel alerts

Slack · Discord · SMTP email · configurable filter on rating events and order fills · append-only history kept locally. Webhook URLs masked in API responses so a screenshot can’t accidentally leak credentials.
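Masking a webhook URL can be as simple as keeping the scheme and host and dropping the secret-bearing path. A sketch of one way to do it; `mask_webhook_url` is a hypothetical helper, and the URL in the usage line is a made-up placeholder.

```python
from urllib.parse import urlsplit


def mask_webhook_url(url: str) -> str:
    """Keep scheme and host for recognisability; hide path, query and userinfo."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        return "***"                      # unparseable input: reveal nothing
    return f"{parts.scheme}://{parts.hostname}/***"


print(mask_webhook_url("https://hooks.slack.com/services/T000/B000/SECRET"))
```

The API response then shows which service an alert channel points at without exposing the token that makes the URL usable.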

06 · MCP

MCP plug-ins

Registry for adding Anthropic Model Context Protocol servers (Kensho · Aiera · FactSet · Morningstar · LSEG) as analyst tools. Plug-ins advertise category (fundamentals · news · market data · social) · probe endpoint tests credentials.

Honest-by-design touches: every generated report prepends “Research, not advice” and appends a footer with version, commit, provider, models used, run ID, and cost. Closed trades carry the same metadata. 520+ passing unit tests across engine, services, and HTTP routes. The intent: when the system loses money, the journal makes it impossible to pretend it didn’t.

FIG. 04 — HONEST PRIORS · BEFORE RUNNING THIS IN ANGER

Three priors stated before the data starts arriving

The bias of the project: when the data says no, the dashboard says no, the article says no

It might fail too. LLMs are not oracles, and a sophisticated framework around language-model outputs does not change the underlying error rate of the model. Sample is still everything. The framework’s outputs are subject to the same statistical noise as any prediction system over small samples.

Highest likelihood

If it appears to work, the most likely explanation is variance. The same trap that caught the first article’s candidate edge applies here. A high win rate over fifty trades means much less than it looks. Without out-of-sample confirmation, a flattering early sample tells you almost nothing about whether the system has real edge.

Second-most likely
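How little fifty trades say about a win rate can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval; the 39-of-50 sample is a hypothetical example, not a result from this system.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("need n > 0 and 0 <= wins <= n")
    p = wins / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


low, high = wilson_interval(wins=39, n=50)   # a 78% observed win rate
print(f"95% interval: {low:.1%} to {high:.1%}")
```

For 39 wins in 50 trades the interval spans roughly 65% to 87%, and even that only bounds the win rate. As the Polybot figures show, it says nothing about whether the wins outweigh the losses.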

If it appears to work for the right reasons — empirical win rate matches stated confidence, and alpha-versus-benchmark persists across non-overlapping samples — that would be a meaningful research finding. Whether that happens, I don’t know. The point of putting it in the open is that the data will say.

Genuinely open

This is explicitly not a launch announcement for a product anyone should connect a real brokerage account to. The Alpaca live endpoints are hard-refused at multiple layers in the code, and the design choice is deliberate. The right next step is data, not deployment. The bias of the whole project is straightforward: when the data says no, the dashboard says no, the article says no, and no one tries to retroactively rescue the thesis. That’s the contribution.

FIG. 05 — WEEK THREE · WHAT THE METHODOLOGY WILL MEASURE

Four concrete measurements before publishing findings

The hope: write the week-three article from a position of “here’s what the data says”. The fear: another candidate falsified at higher sample. Both outcomes are publishable.

M1 · Sample discipline

Small watchlist for a few weeks before publishing

A handful of tickers across two or three sectors. Long enough to gather sample, narrow enough to keep attention on what’s actually happening per agent. Avoid the noise of a 65-ticker autonomous loop until the smaller version has been read carefully.

M2 · Calibration view

Stated confidence vs. realised win rate

When the system says “75% confident”, do the trades actually win 75% of the time? Same measurement applied to Polybot’s fair-value model. If the model is systematically over-confident, that bias dominates everything downstream.
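A calibration view groups trades by stated confidence and compares each group's average stated confidence with its realised win rate. A minimal sketch, assuming each closed trade is recorded as a `(stated_confidence, won)` pair; the ten-point buckets and the function name are illustrative choices.

```python
from collections import defaultdict


def calibration_table(trades: list[tuple[float, bool]]) -> dict[int, dict]:
    """Map bucket floor in percent (0, 10, ..., 90) to count, stated, realised."""
    buckets = defaultdict(list)
    for confidence, won in trades:
        pct = round(confidence * 100)            # avoid float-division edge cases
        floor = min(pct // 10 * 10, 90)          # 100% falls into the 90 bucket
        buckets[floor].append((confidence, won))
    table = {}
    for floor, rows in sorted(buckets.items()):
        n = len(rows)
        table[floor] = {
            "n": n,
            "stated": sum(c for c, _ in rows) / n,
            "realised": sum(1 for _, w in rows if w) / n,
        }
    return table


# Made-up sample: four trades stated near 75%, of which two won.
sample = [(0.75, True), (0.75, False), (0.72, True), (0.78, False), (0.55, True)]
for floor, row in calibration_table(sample).items():
    print(floor, row)
```

A well-calibrated system shows `stated` and `realised` close together in every bucket once `n` is large enough. A persistent gap where `stated` exceeds `realised` is the over-confidence described above.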

M3 · Cost accounting

Cost per ticker · per rating · per profitable trade

With Codex OAuth the marginal LLM cost is effectively zero. With the public OpenAI API, each run is hundreds of agent turns. The honest question: does this scale economically if you ever did run it at real cost?
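Cost accounting reduces to joining LLM spend to closed trades by run ID and aggregating. The sketch below aggregates per ticker. It assumes one closed trade per run (otherwise a run's cost would be counted more than once), and the field names and sample values are made up for illustration.

```python
from collections import defaultdict


def cost_by_ticker(run_costs: dict[str, float], closed_trades: list[dict]) -> dict:
    """Per-ticker LLM cost, P&L net of that cost, and cost per winning trade."""
    totals = defaultdict(lambda: {"llm_cost": 0.0, "pnl": 0.0, "winners": 0})
    for trade in closed_trades:
        row = totals[trade["ticker"]]
        row["llm_cost"] += run_costs.get(trade["run_id"], 0.0)  # join on run ID
        row["pnl"] += trade["pnl"]
        row["winners"] += 1 if trade["pnl"] > 0 else 0
    report = {}
    for ticker, row in totals.items():
        winners = row["winners"]
        report[ticker] = {
            "llm_cost": row["llm_cost"],
            "net_pnl": row["pnl"] - row["llm_cost"],
            "cost_per_winner": row["llm_cost"] / winners if winners else None,
        }
    return report


costs = {"r1": 2.0, "r2": 3.0, "r3": 1.0}
trades = [
    {"run_id": "r1", "ticker": "AAA", "pnl": 10.0},
    {"run_id": "r2", "ticker": "AAA", "pnl": -4.0},
    {"run_id": "r3", "ticker": "BBB", "pnl": -1.0},
]
print(cost_by_ticker(costs, trades))
```

Setting every run cost to zero reproduces the Codex OAuth case. Substituting per-token API prices shows whether the same trades would still clear their own research cost.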

M4 · Non-overlapping windows

Alpha vs benchmark · out-of-sample

Not within-sample alpha — trivially inflatable. Hold out one period entirely, run the system on the next, then check whether the held-out result matches the in-sample stats. If they diverge sharply, the in-sample was curve-fit.
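The non-overlapping comparison can be sketched as per-window alpha over consecutive, disjoint slices of the return series. This assumes aligned per-period returns for the strategy and a benchmark; the function name and the sample numbers are illustrative only.

```python
def window_alphas(strategy: list[float], benchmark: list[float],
                  window: int) -> list[float]:
    """Mean excess return over the benchmark in each disjoint window.

    Trailing periods that do not fill a whole window are ignored.
    """
    if len(strategy) != len(benchmark):
        raise ValueError("return series must be the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    alphas = []
    for start in range(0, len(strategy) - window + 1, window):
        s = strategy[start:start + window]
        b = benchmark[start:start + window]
        alphas.append(sum(x - y for x, y in zip(s, b)) / window)
    return alphas


strat = [0.02, 0.01, -0.01, 0.00]    # made-up per-period returns
bench = [0.01, 0.01, 0.01, 0.01]
in_sample, held_out = window_alphas(strat, bench, window=2)
print(f"in-sample alpha {in_sample:+.3f}, held-out alpha {held_out:+.3f}")
```

In this made-up series the first window shows positive alpha and the second shows negative alpha, which is the sharp divergence that would flag the in-sample figure as curve-fit.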

Open under Apache-2.0 with upstream cited from every relevant surface. Not open: the operator’s running results, the specific watchlist, the per-agent prompt customisations, the alert channels, the trade journals — kept local for the same reason Polybot’s per-experiment data is kept local. Publishing exact configurations encourages people to copy them with real money, which is the opposite of what an honest research project should do. Summary findings will be published. Recipes will not.

The bet is on a different mechanism, not a different parameter setting. The point is not to find a money-printing AI. The point is to put honest measurements of these systems into the public record — so the next person looking at the space starts a step further along than the last.

Thorsten Meyer AI · Introducing Forezai · TradingAgents · § 03

Source dossier & project notes

Forezai · TradingAgents fork — Apache-2.0 license preserved · GitHub published under same author as Polybot · upstream cited from README · marketing pages · generated report footers · forezai.com
TradingAgents upstream — TauricResearch team · multi-agent stock-research framework on LangGraph · the agent graph is intact in the fork · upstream license requirements (notice file · attribution · patent-grant clauses) preserved if anyone forks again
Polybot · Week 1 — 21 parallel strategy experiments on Polymarket 5-minute Up/Down · early winners mostly mechanical illusions · one BTC fair-value taker showed mathematical signature of edge over a few hundred settled trades
Polybot · Week 2 — fair-value strategy collapsed at higher sample · market-making hypothesis failed cleanly · fleet ended at ~negative 33% of bankroll · 78% fleet-wide win rate combined with deeply negative P&L · “you can win four out of five trades and still go broke”
Architecture · Stage 1 — Four analysts in parallel: market structure · news + insider · fundamentals (balance sheet + earnings) · social-media sentiment · each writes a short structured report independently
Architecture · Stage 2 — Bull-bear debate: two researcher agents argue opposing theses from the analyst reports · a research-manager agent arbitrates and writes a single synthesis
Architecture · Stage 3 — Three-voice risk team: aggressive (upside, accepts variance) · conservative (downside, protects capital) · neutral (balances) · forces explicit articulation of downside before any order is proposed
Architecture · Stage 4 — Two-layer decision: trader agent produces three-tier proposal (buy / hold / sell) · portfolio-manager agent synthesises into final five-tier rating with price target and time horizon · PM sees arguments only, never raw data
Operational layer — Autonomous loop · multi-broker abstraction (local Python / Alpaca paper / shadow mode) · FastAPI + React-via-CDN web dashboard · Codex OAuth on ChatGPT Pro · multi-channel alerts (Slack · Discord · SMTP) · MCP plug-in registry (Kensho · Aiera · FactSet · Morningstar · LSEG)
Engineering baseline — 520+ passing unit tests across engine, services, HTTP routes · honest-by-design touches: “Research, not advice” prepended to every report · footer with version, commit, provider, models used, run ID, cost
Hard-refused live trading — Real Alpaca live endpoints hard-refused at multiple layers · operator must deliberately override the refusal in more than one place to actually risk real money
Week three methodology — Small watchlist · calibration view (stated confidence vs realised win rate) · cost accounting per ticker / rating / profitable trade · out-of-sample alpha across non-overlapping windows
Stance — Summary findings published · recipes kept local · the bias is straightforward: when the data says no, the dashboard says no, the article says no, no one tries to retroactively rescue the thesis

Colophon

Set in Source Serif 4 (display, italic accent), EB Garamond (body), IBM Plex Sans (UI labels), IBM Plex Mono (mastheads, ticker, stamps, plug-in tags). Paper-cool gray-cream #e6e7e4.

Chromatic register: structural-slate dominant (committee-architecture analysis), empirical-clay for the Polybot prior forensic, labor-rose for the cautionary “expect to lose money” stance and the regulatory disclaimers, transition-bronze for the week-three forward-shape methodology, alternative-sage for the open-source / honest-measurement positive signal.

Key frame:

FOREZAI / TRADINGAGENTS 13-AGENT COMMITTEE POLYBOT PRIOR OPERATIONAL LAYER EXPECT TO LOSE MONEY CALIBRATION VIEW BULL-BEAR DEBATE HARD-REFUSED LIVE APACHE 2.0 HONEST MEASUREMENTS

Implications for AI-Driven Market Decision-Making

This development matters because it tests whether a structured committee of specialized AI agents can produce consistent trading judgments in simulated environments. If successful, it could demonstrate a new approach to AI decision-making that moves beyond single-model predictions and rule-based strategies, potentially informing future research on AI collaboration and financial modeling. It also provides a platform to study how explicit reasoning and debate among AI agents influence trading outcomes, which is relevant for advancing explainability and robustness in AI systems used in finance.

Background on AI Trading Research and Frameworks

Earlier Polybot reports from Thorsten Meyer found that parametric trading strategies often fail on fresh samples despite promising early results, and traced many apparent edges to mechanical artefacts and overfitting. In response, this project turns to a more flexible approach: multi-agent systems in which different LLM agents argue and synthesize insights.

The original TradingAgents framework was designed to facilitate structured debate among LLMs, routing questions through specialized roles—analysts, debaters, risk managers, and portfolio synthesizers—to emulate more nuanced decision processes. The current release of Forezai extends this by operationalizing the framework, enabling automated, repeatable experiments in paper trading, with detailed logging and visualization tools to analyze AI reasoning.

“This system doesn’t claim to predict markets but aims to explore whether AI agents, working together through structured debate, can produce decisions that are at least no worse than random, in a simulated environment.”
— Thorsten Meyer

Limitations and Unknowns in AI Paper-Trading System

It remains unclear how well the AI committee’s decisions will perform over longer periods or in live markets, as the current setup is limited to simulated paper trades. The effectiveness of the structured debate approach compared to other AI methods or human traders has yet to be empirically established. Additionally, the impact of different agent roles and their biases on decision quality is still being studied, and the system’s robustness to market volatility is untested in real-world conditions.

Next Steps for Testing and Validating AI Trading Agents

Researchers plan to conduct extended experiments using the Forezai system, analyzing the performance of the AI committee over multiple market cycles. They will also explore variations in agent roles, decision thresholds, and data inputs to optimize decision quality. Future developments may include integrating real-time market data, refining the debate architecture, and publishing detailed performance metrics to assess the approach’s viability for broader AI trading applications.

Key Questions

Can this AI system trade with real money?

Not by default. The system is designed for simulated paper trading, and live Alpaca endpoints are hard-refused at multiple layers. An operator would have to deliberately override those restrictions in more than one place to trade real money, which the project advises against.

How does the AI committee make decisions?

The system routes data through specialized roles—analysts, debate agents, risk teams—that generate arguments and synthesize their insights into a final recommendation, emphasizing explicit reasoning rather than prediction.

What makes this approach different from traditional AI trading models?

Instead of relying on single models or explicit rules, it employs a multi-agent debate architecture that articulates reasoning through structured arguments, aiming to improve decision transparency and robustness.

What are the main challenges or limitations?

The system’s effectiveness in live markets remains unproven, and its performance over extended periods has not yet been validated. Additionally, the complexity of multi-agent reasoning may introduce unforeseen biases or errors.

Source: ThorstenMeyerAI.com

This content is for general information only and is not financial, tax or legal advice. Consult a qualified professional for decisions about your money.

Author

Startup Sofa Team
