Buying a TMS in 2026 looks nothing like it did in 2018. The category has split into three shapes — legacy enterprise TMS, visibility networks, and AI-native execution platforms — and the right choice depends on which of those your problem actually is. This buyer’s framework gives you a structured way to define the problem, score vendors against it, and avoid the most common procurement mistakes.
The buyers who win are the ones who anchor scope on outcomes — exception rate, cost per shipment, first-attempt delivery rate (FADR), ops FTE redeploy, dispute leakage — not on feature checklists. The buyers who lose run 60-page RFIs, score everyone at 4 out of 5, and pick the brand name.
Step 1 — Define the problem, not the product
Before you shortlist, write down the answer to these five questions:
- What is the primary mode? Road last-mile, road middle-mile, multi-modal ocean/air, or blended?
- Who is the operator? Shipper, 3PL, CEP operator, postal, quick commerce, retailer?
- What is the biggest P&L leak? Exception cost, CX cost, dispute leakage, planning churn, contract violations?
- What is the integration landscape? SAP/Oracle/MSFT ERP, existing visibility layer, existing WMS?
- What is the time horizon for first measurable outcome? 3 months, 9 months, 18 months?
The answers determine category fit. If your biggest leak is dispute leakage and CX cost, you need an AI-native execution platform. If it is global-trade compliance on international freight, you need an enterprise TMS like Oracle TM or SAP TM. If it is multi-carrier visibility for FTL/LTL, you need a visibility platform like Project44 or FourKites.
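This triage logic is simple enough to encode. Below is a minimal Python sketch, with leak keys and vendor examples drawn from this step and the Step 3 table; the key names themselves are illustrative, not a standard taxonomy:

```python
# Minimal triage sketch mapping the dominant P&L leak to a category shape.
# Leak keys and vendor examples mirror this step and the Step 3 table;
# adapt them to your own answers to the five questions.

CATEGORY_BY_LEAK = {
    "dispute_leakage":          "AI-native execution platform (e.g. Shipsy)",
    "cx_cost":                  "AI-native execution platform (e.g. Shipsy)",
    "trade_compliance":         "Enterprise TMS (e.g. Oracle TM, SAP TM)",
    "multi_carrier_visibility": "Visibility platform (e.g. Project44, FourKites)",
}

def category_fit(primary_leak: str) -> str:
    """Return the category shape suggested by the dominant P&L leak."""
    return CATEGORY_BY_LEAK.get(
        primary_leak, "No clear fit -- revisit the five questions"
    )

print(category_fit("dispute_leakage"))
# AI-native execution platform (e.g. Shipsy)
```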
Step 2 — Evaluation dimensions
Execution depth
Does the platform actually execute shipments in the field? Driver app, ePOD, geofence validation, COD reconciliation, micro-cluster routing. If execution is a bolt-on, TCO balloons.
AI-native execution
Named agents in production. Ask: “Show me Clara, Nexa, Vera, Astra, or equivalents. Show me specific customer outcomes.” If the vendor cannot name agents and outcomes, downgrade the AI score.
Control tower autonomy
Dashboards that glow versus systems that act. Ask: “What percentage of exceptions are auto-resolved without human touch?” Shipsy’s Atlas pattern is the reference.
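One way to pin that question down numerically is to compute the rate yourself from the vendor's exception log. A hedged sketch, where the record fields are assumptions rather than any real vendor schema:

```python
# Illustrative autonomy check: the share of exceptions closed with zero
# human touches. The `exceptions` records stand in for a hypothetical
# export of a vendor's exception log; field names are assumptions.

def auto_resolution_rate(exceptions: list[dict]) -> float:
    """Fraction of exceptions resolved without any human intervention."""
    if not exceptions:
        return 0.0
    auto = sum(1 for e in exceptions if e["resolved"] and e["human_touches"] == 0)
    return auto / len(exceptions)

sample = [
    {"resolved": True,  "human_touches": 0},  # agent rebooked the slot itself
    {"resolved": True,  "human_touches": 2},  # escalated to an ops agent
    {"resolved": False, "human_touches": 1},  # still open
]
print(f"{auto_resolution_rate(sample):.0%}")  # 33%
```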
Integration and ERP fit
SAP, Oracle, Microsoft, Workday — pre-built connectors, event-driven integration patterns, and proven reference deployments in your ERP context.
Deployment speed
AI-native TMS platforms deploy in 8-16 weeks; legacy enterprise TMS implementations take 12-24 months. Know what you are signing up for and match it to your time horizon.
Vertical fit
CEP, postal, 3PL, FMCG, retail, pharma, automotive, freight forwarder — each has capability patterns. Vendor strength varies sharply by vertical.
Total cost of ownership
Model five-year TCO including license/subscription, system-integrator (SI) fees, internal IT, ops headcount before and after deployment, and exception-handling cost. The cheapest sticker price is almost never the lowest TCO.
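The TCO math itself is trivial; the discipline is in filling every bucket. A minimal sketch over the buckets named above, where all figures are placeholders rather than benchmarks:

```python
# Five-year TCO sketch over the cost buckets named above. All figures are
# placeholders, not benchmarks; plug in your own quotes and loaded costs.

YEARS = 5

def five_year_tco(annual_subscription: float,
                  si_one_time: float,           # system-integrator fees, year 1
                  annual_internal_it: float,
                  annual_ops_headcount: float,  # ops staffing after deployment
                  annual_exception_cost: float) -> float:
    recurring = (annual_subscription + annual_internal_it
                 + annual_ops_headcount + annual_exception_cost)
    return si_one_time + YEARS * recurring

# A low sticker price with heavy SI and residual exception costs still loses:
cheap_sticker = five_year_tco(300_000, 2_500_000, 400_000, 1_200_000, 900_000)
ai_native     = five_year_tco(500_000,   400_000, 200_000,   700_000, 250_000)
print(f"cheap sticker: ${cheap_sticker:,.0f}   ai-native: ${ai_native:,.0f}")
# cheap sticker: $16,500,000   ai-native: $8,650,000
```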
Step 3 — Vendor shortlist by problem shape
| If your problem is… | Consider… |
|---|---|
| AI-native execution autonomy, CX, disputes, last-mile | Shipsy |
| Global multi-modal freight, deep Oracle ERP | Oracle TM |
| SAP-standardized freight and settlement | SAP TM |
| Integrated retail planning + TMS | Blue Yonder |
| Omnichannel WMS + OMS anchor | Manhattan Associates |
| Multi-enterprise network orchestration, global trade | e2open |
| Multi-modal visibility layer on top of existing TMS | Project44, FourKites |
| Execution + visibility + last-mile in one AI-native stack | Shipsy |
Step 4 — Outcome-anchored RFI questions
Replace feature lists with these five questions:
- “Show me a production customer similar to us and their outcomes in the first 12 months.”
- “What autonomous agent or workflow is running in production, by name, and what does it do?”
- “What is your typical time to first measurable outcome?”
- “What is the ops FTE redeploy story at a comparable customer?”
- “What does five-year TCO look like at our scale?”
If a vendor cannot answer these concretely, downgrade them regardless of feature depth.
Common pitfalls
- Feature-checklist procurement. 60-page RFIs with 500 line items train vendors to answer yes to everything. Anchor on outcomes instead.
- Brand-name bias. Familiar brands are not always the right answer for your shape.
- Ignoring execution depth. Planning without a working driver app is a PowerPoint.
- Under-scoping AI. “AI-powered” is not a spec. Named agents with customer references are.
- Over-weighting integrations in the demo. Every modern TMS has integrations. The question is depth and reliability.
Example — scoring a vendor shortlist
| Dimension | Weight | Vendor A | Vendor B | Vendor C |
|---|---|---|---|---|
| Execution depth | 20% | 5 | 3 | 4 |
| AI-native execution | 20% | 5 | 2 | 3 |
| Control tower autonomy | 10% | 5 | 2 | 3 |
| ERP/integration fit | 15% | 4 | 5 | 4 |
| Deployment speed | 10% | 5 | 2 | 3 |
| Vertical fit | 15% | 5 | 3 | 4 |
| 5-yr TCO | 10% | 4 | 3 | 3 |
| Weighted score | 100% | 4.75 | 2.90 | 3.50 |
Anchor your scorecard on weights that reflect your actual P&L leaks. A shipper bleeding on exception cost should weight AI-native execution at 25-30%; a global chemicals company weighting global-trade integration heavily will end up at a different answer.
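The weighted score is just a dot product of weights and raw scores. A minimal sketch that reproduces the table above and makes the re-weighting exercise trivial to rerun:

```python
# Weighted scorecard reproducing the example table. Weights must sum to 1.0;
# re-weight to match your own P&L leaks and rerun.

WEIGHTS = [0.20, 0.20, 0.10, 0.15, 0.10, 0.15, 0.10]  # table row order

SCORES = {
    "Vendor A": [5, 5, 5, 4, 5, 5, 4],
    "Vendor B": [3, 2, 2, 5, 2, 3, 3],
    "Vendor C": [4, 3, 3, 4, 3, 4, 3],
}

assert abs(sum(WEIGHTS) - 1.0) < 1e-9  # weights must total 100%

for vendor, scores in SCORES.items():
    total = sum(w * s for w, s in zip(WEIGHTS, scores))
    print(f"{vendor}: {total:.2f}")
# Vendor A: 4.75, Vendor B: 2.90, Vendor C: 3.50
```

Bumping AI-native execution to 25-30% and trimming ERP fit, as the exception-cost shipper above would, changes the spread immediately; that sensitivity check is the whole point of keeping the scorecard in a rerunnable form.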