How to Structure Product Data for AI Agents (2026 Guide)

How Do You Structure Product Data for AI Agents?

You need 12 core product attributes, JSON-LD Product schema, a feed in OpenAI commerce format or AP2/ACP/UCP/MCP, and active maintenance so AI agents can find, understand, and recommend each product.

To structure product data for AI agents in 2026, retailers need four things working together:

12 core attributes per product, complete on every SKU. Title, description, brand, GTIN, MPN, category (Google Product Taxonomy), price, sale price, availability, condition, image URL, and product URL. These are the minimum baseline. Most retailers shipping to Google Shopping have most of them; AI agents need all of them.
20-30+ enrichment attributes that go beyond Google Shopping. Material, dimensions, weight, color, size, age group, gender, ingredient list, intended use, occasion, certifications, sustainability claims, and a structured Q&A block. AI agents weight these heavily when picking between similar products.
JSON-LD Product schema in raw HTML on every product page. Server-side rendered, not client-injected. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do not execute JavaScript.
A feed in at least one of the agentic commerce formats: OpenAI's commerce feed specification (for ChatGPT Shopping), Google's UCP feed (Universal Commerce Protocol), Stripe's ACP integration (Agentic Commerce Protocol for checkout), or an MCP server (Model Context Protocol for real-time inventory and pricing).

This guide walks through each piece. The order matters: ship the 12 core attributes first, layer the enrichment attributes second, add schema third, then pick the feed format that matches where your buyers are. Skipping the first step makes the rest invisible.

Last updated: May 11, 2026. Sources cited inline.

The 12 Core Attributes Every AI Agent Needs

Title, description, brand, GTIN, MPN, category, price, sale price, availability, condition, image URL, product URL. Missing any one of these reduces a product's chance of being recommended.

AI shopping agents (ChatGPT Shopping, Perplexity Shopping, Google AI Mode, Amazon Rufus) rank products by query relevance, data completeness, and recency. Of the three, data completeness is the only one fully under a retailer's control. The 12 core attributes are the baseline test.

Attribute	What it does for AI agents	Common mistake
Title	Primary text match for the buyer's query. AI agents weight title heavily when deciding whether a product is relevant to a question.	Marketing-language titles ("Our Iconic Best-Seller") instead of "Brand + Product Type + Key Attribute + Size" titles ("Nike Air Max 90 Triple White, Men's Size 10").
Description	The primary text the AI agent quotes from when generating a recommendation. AI-friendly descriptions answer specific buyer questions in the first 200 words.	HTML-stripped marketing copy with no product specifics. AI agents need attributes mentioned in prose, not just in fields.
Brand	Entity linkage for the AI agent. The brand becomes a graph node the agent can reason about across multiple products and reviews.	Brand stored in title only ("Sony WH-1000XM5") and missing as a structured attribute. Agents need it as its own field.
GTIN (UPC/EAN/ISBN)	Globally unique product identifier. Allows the AI agent to match the same product across multiple retailers and aggregate reviews, prices, and availability.	Missing or invalid GTINs. Per GS1, 86% of product data feeds contain at least one invalid GTIN (GS1 US Data Quality Report, 2024).
MPN (Manufacturer Part Number)	Fallback identifier when GTIN is unavailable (private-label products, custom configurations).	Stored in title or SKU field instead of as MPN.
Google Product Category	Taxonomy that lets AI agents understand what category your product competes in. Maps to Google's full Product Taxonomy with ~6,000 nodes.	Using your own internal category name instead of mapping to Google's taxonomy. AI agents trained on Google data don't know your internal taxonomy.
Price	Filterable attribute. AI agents serve queries like "under $300" or "best value." Products without a clean price field get excluded from price-filtered queries.	Price stored as a string with currency symbol ("$199.99") instead of a numeric field plus separate currency code. ACP and UCP feeds require numeric.
Sale price	Separate field, not a replacement for price. AI agents surface deal context ("currently 30% off").	Overwriting the regular price field during a sale and losing the original anchor price.
Availability	Real-time inventory signal. Out-of-stock products that still rank in AI responses generate a bad user experience and reduce future trust in your catalog.	Stale availability values cached for 24+ hours. AI agents are starting to penalize feeds with stale availability signals.
Condition	"new", "refurbished", "used". Affects which buyer queries surface the product.	Missing on private-label or third-party reseller listings.
Image URL	Primary product image at 800x800 minimum, white background preferred. AI agents now route product images through visual classifiers; low-quality images hurt rankings.	Thumbnail-resolution images, watermarked images, or relative URLs that break when the AI agent rehosts the link.
Product URL	Canonical product page URL. Must be server-side rendered with the JSON-LD Product schema embedded.	Client-rendered pages where the body is empty when JavaScript is disabled. AI crawlers won't see the product.
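
Two of the mistakes in the table, invalid GTINs and prices stored as display strings, can be caught mechanically before a feed ships. The sketch below is a minimal illustration rather than a full validator. The helper names is_valid_gtin and split_price are hypothetical, the currency-symbol map is deliberately tiny, and the price parser assumes a period as the decimal separator.

```python
import re
from decimal import Decimal

# Illustrative symbol map only; a real feed would carry an explicit currency code.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}


def is_valid_gtin(gtin: str) -> bool:
    """Check length and the GS1 mod-10 check digit (GTIN-8/12/13/14)."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in gtin]
    body, check = digits[:-1], digits[-1]
    # Moving right to left across the body, weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def split_price(raw: str, default_currency: str = "USD") -> tuple:
    """Split a display string such as "$199.99" into (Decimal amount, currency code)."""
    currency = next(
        (code for symbol, code in CURRENCY_SYMBOLS.items() if symbol in raw),
        default_currency,
    )
    cleaned = re.sub(r"[^0-9.]", "", raw)  # assumes "." is the decimal separator
    if not cleaned:
        raise ValueError(f"no numeric price found in {raw!r}")
    return Decimal(cleaned), currency


print(is_valid_gtin("0194501842877"))
print(split_price("$199.99"))
```

A check like this only proves the number is well-formed. It does not prove the GTIN is registered to the product, which still needs a lookup against the manufacturer's or GS1's records.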

The 12 core attributes are necessary but not sufficient. The next section covers the enrichment attributes that separate products AI agents recommend from products AI agents skip.

The 20-30 Enrichment Attributes That Decide Whether You Get Recommended

Material, dimensions, color, size, age group, ingredients, certifications, intended use, Q&A pairs. AI agents weight these heavily when filtering between similar products.

Two products with identical core attributes can score very differently in AI agent recommendations based on what comes next. The enrichment attributes answer the questions buyers actually ask AI agents - questions like "is it dishwasher safe?", "does it run small?", "is it gluten-free?", "is it good for cold-weather camping?". Products that have those answers structured in the feed get cited. Products that bury them in a 2000-word marketing description don't.

The Princeton/Georgia Tech/Allen AI GEO paper (Aggarwal et al., SIGKDD 2024) tested nine optimization strategies across 10,000 queries and found that adding statistics, expert quotes, and cited sources boosted content visibility in generative engine responses by 30-40%. The product-data equivalent: structure your facts as attributes, not as prose buried in a description.

The high-leverage enrichment attributes, grouped by category:

Physical properties (always include if applicable):

Material - "100% organic cotton", "anodized aluminum", "vegan leather"
Dimensions - length, width, height, depth as separate numeric fields with units
Weight - numeric field with units (grams, ounces, pounds)
Color - normalized color name plus hex code; AI agents serve "navy blue" queries differently than "blue"
Size - structured size with size system (US, EU, UK, JP) and size type (regular, petite, big & tall)
Pattern - "solid", "striped", "floral" - filterable for fashion queries

Audience attributes:

Age group - newborn, infant, toddler, kids, adult (Google Merchant Center enum)
Gender - male, female, unisex (or omit entirely for gender-neutral items)
Intended use - "running", "everyday", "formal", "cold-weather camping"
Skill level - "beginner", "intermediate", "professional" for tools/equipment

Composition (for consumables and beauty):

Ingredient list - structured array, not prose
Allergens - structured list with allergen codes
Nutritional facts - structured numeric fields per FDA panel
Certifications - "USDA Organic", "Leaping Bunny", "Fair Trade", "B Corp" as structured tags

Performance and durability:

Warranty - duration in months/years as a numeric field plus warranty type
Care instructions - "machine washable", "dry clean only" as structured tags
Battery life / runtime - numeric where applicable
Water resistance rating - IPX rating

Sustainability and ethics (increasingly weighted by AI agents):

Recycled content percentage
Carbon footprint
Sourcing claims - "responsibly sourced", "made in [country]"
End-of-life options - "recyclable", "compostable"

Structured Q&A block - the single highest-leverage enrichment for AI agent retrieval. Five to ten common buyer questions with concise answers, each ~25-30 words, stored as structured pairs. AI agents quote directly from these answers when responding to natural-language queries. Sample questions to include for every product:

How does sizing run?
What's the return policy?
Is this dishwasher / microwave / freezer safe?
How long does it last?
What's included in the box?
How does this compare to similar products in [category]?

The pattern: every attribute the AI agent could filter on or quote from should exist as a structured field, not as a sentence buried in the description. Most retailers have this data internally (in PIM systems, product spec sheets, or warranty paperwork) but never expose it in the feed.
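
As a rough illustration of "structured field, not a sentence", the sketch below models a handful of enrichment attributes and a Q&A block as typed records and flags answers that run past the ~30-word budget mentioned above. The class and field names (EnrichmentRecord, weight_value, and so on) are assumptions made for this example and are not taken from any feed specification.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class EnrichmentRecord:
    # Field names are illustrative, not taken from any feed specification.
    material: str
    weight_value: float
    weight_unit: str
    care: List[str] = field(default_factory=list)
    certifications: List[str] = field(default_factory=list)
    qa: List[QAPair] = field(default_factory=list)


def overlong_answers(record: EnrichmentRecord, max_words: int = 30) -> List[str]:
    """Return the questions whose answers exceed the word budget."""
    return [pair.question for pair in record.qa if len(pair.answer.split()) > max_words]


record = EnrichmentRecord(
    material="100% organic cotton",
    weight_value=180.0,
    weight_unit="g",
    care=["machine washable"],
    certifications=["Fair Trade"],
    qa=[QAPair("How does sizing run?", "Runs true to size; order your usual size.")],
)
feed_row = asdict(record)  # nested dataclasses become plain dicts and lists
```

Keeping weight as a number plus a unit, and care instructions as a list of tags, is what makes the record filterable; the same facts written into a paragraph would have to be re-extracted by every consumer.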

JSON-LD Product Schema: The Required Embed

JSON-LD Product schema embedded in raw HTML (server-side rendered) on every product page is the minimum requirement for AI agent retrieval. Without it, AI crawlers see your page as undifferentiated text.

Schema.org markup is processed by both training-time crawlers and runtime retrieval systems. 45 million web domains use Schema.org structured data (Schema.org, 2024). For product pages, the JSON-LD Product schema is the canonical format.

A minimum-viable Product schema block:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Nike Air Max 90 Triple White",
  "description": "The Air Max 90 in Triple White is...",
  "brand": {
    "@type": "Brand",
    "name": "Nike"
  },
  "sku": "CN8490-100",
  "gtin13": "0194501842877",
  "mpn": "CN8490-100",
  "category": "Shoes > Athletic Shoes > Running Shoes",
  "image": "https://example.com/images/cn8490-100.jpg",
  "color": "White",
  "material": "Leather and mesh upper",
  "size": "10 US Men's",
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/nike-air-max-90-triple-white",
    "priceCurrency": "USD",
    "price": "129.99",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "seller": {
      "@type": "Organization",
      "name": "Example Retailer"
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "1248"
  }
}
</script>
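
One way to keep a block like this in the server-rendered HTML is to generate it from the catalog record inside the page template. The following is a minimal sketch under assumptions: the input keys (title, image_url, availability, and so on) are a hypothetical internal record shape, and only the in-stock and out-of-stock availability values are mapped.

```python
import json

# Feed-style values mapped to Schema.org enumeration URLs.
AVAILABILITY = {
    "in_stock": "https://schema.org/InStock",
    "out_of_stock": "https://schema.org/OutOfStock",
}
CONDITION = {
    "new": "https://schema.org/NewCondition",
    "refurbished": "https://schema.org/RefurbishedCondition",
    "used": "https://schema.org/UsedCondition",
}


def product_jsonld_tag(p: dict) -> str:
    """Render a <script> tag with Product JSON-LD for server-side templates."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["title"],
        "description": p["description"],
        "brand": {"@type": "Brand", "name": p["brand"]},
        "sku": p["sku"],
        "image": p["image_url"],
        "offers": {
            "@type": "Offer",
            "url": p["product_url"],
            "priceCurrency": p["currency"],
            "price": str(p["price"]),
            "availability": AVAILABILITY[p["availability"]],
            "itemCondition": CONDITION[p["condition"]],
        },
    }
    if p.get("gtin"):
        data["gtin13" if len(p["gtin"]) == 13 else "gtin"] = p["gtin"]
    if p.get("mpn"):
        data["mpn"] = p["mpn"]
    # Escape "</" so text such as "</script>" in a description cannot close the tag.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Because the tag is built from the same record that feeds the page, price and availability in the schema cannot drift from what the shopper sees, which is the usual failure when schema is maintained by hand.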

Critical implementation details that retailers get wrong:

Server-side rendered. GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. JSON-LD must be in the HTML response, not injected by client-side React after hydration. This requires SSR or a static export.
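One product entity per page. A page with variant selectors should ship a single ProductGroup whose hasVariant property lists the variants, rather than a separate, unconnected Product schema per color/size. (Schema.org defines hasVariant on ProductGroup, not on Product.)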
Use real attribute fields, not additionalProperty for everything. Schema.org has dedicated properties for color, material, size, weight, and dimensions (width, height, depth), and it models age group and gender through the audience property. Use them. Use additionalProperty only for truly custom attributes.
Validate. Run every product page through validator.schema.org and Google's Rich Results Test. Errors block recommendation eligibility.
Add complementary schemas where they fit. FAQPage for product Q&A blocks. HowTo for setup/installation pages. VideoObject for product demo videos.
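
For the variant case in the list above, a possible shape is one ProductGroup that carries each color/size combination under hasVariant. The sketch below builds that structure as a Python dict ready for json.dumps. The function name and the input keys (sku, color, size, in_stock) are hypothetical, and the variesBy values assume the group varies by size and color only.

```python
import json


def product_group_jsonld(group_id: str, name: str, variants: list) -> dict:
    """Build one ProductGroup whose hasVariant lists each size/color variant."""
    return {
        "@context": "https://schema.org",
        "@type": "ProductGroup",
        "productGroupID": group_id,
        "name": name,
        "variesBy": ["https://schema.org/size", "https://schema.org/color"],
        "hasVariant": [
            {
                "@type": "Product",
                "sku": v["sku"],
                "name": f'{name}, {v["color"]}, {v["size"]}',
                "color": v["color"],
                "size": v["size"],
                "offers": {
                    "@type": "Offer",
                    "price": str(v["price"]),
                    "priceCurrency": v["currency"],
                    "availability": "https://schema.org/InStock"
                    if v["in_stock"]
                    else "https://schema.org/OutOfStock",
                },
            }
            for v in variants
        ],
    }


variants = [
    {"sku": "CN8490-100-10", "color": "White", "size": "10 US Men's",
     "price": "129.99", "currency": "USD", "in_stock": True},
    {"sku": "CN8490-100-11", "color": "White", "size": "11 US Men's",
     "price": "129.99", "currency": "USD", "in_stock": False},
]
print(json.dumps(product_group_jsonld("CN8490", "Nike Air Max 90", variants), indent=2))
```

Each variant keeps its own sku and offer, so an agent can tell that size 11 is out of stock without discarding the whole product.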

The full Schema.org Product reference is at schema.org/Product. The Google-specific extensions are at developers.google.com/search/docs/appearance/structured-data/product.

The Four Feed Formats AI Agents Actually Use

OpenAI commerce feed (ChatGPT Shopping), Google UCP, Stripe ACP, and MCP server. Each has its own schema and update cadence. Most retailers should ship to all four within 90 days.

Beyond the on-page schema, AI shopping channels each ingest a structured feed. The four that matter in 2026:

OpenAI commerce feed specification (for ChatGPT Shopping). A JSON feed format with a strict schema for product objects. Required for products to be surfaced in ChatGPT Shopping recommendations. Update cadence: real-time push via webhook or hourly poll. Specification: see the ChatGPT product feed format reference. Shopify merchants are auto-enrolled; other platforms integrate through Stripe's Agentic Commerce Suite or infrastructure providers like Paz.ai.

UCP (Universal Commerce Protocol) feed (for Google AI Mode and other agentic channels). Google's protocol for agentic checkout, donated to the Linux Foundation in early 2026. Feed format extends the Google Merchant Center spec with agentic fields (agent permissions, checkout URLs, real-time pricing). Update cadence: continuous via API. See UCP reference.

ACP (Agentic Commerce Protocol) for checkout. Stripe + OpenAI's protocol for agentic checkout. Not a product feed per se but a checkout integration that AI agents call to complete purchases on behalf of buyers. Required for ChatGPT Shopping checkout, Perplexity Buy with Pro, and most other agentic checkout flows. See ACP reference.

MCP (Model Context Protocol) server. Anthropic's protocol (donated to the Linux Foundation in Feb 2026) that lets AI agents fetch real-time product data, inventory, and pricing via a standardized server interface. Increasingly used by Claude-based shopping agents and any AI system that needs live data instead of a cached feed. See MCP reference.

What to ship first: if your buyers are in ChatGPT, ship the OpenAI commerce feed. If they're using Google AI Mode, ship UCP. If you're seeing Claude or Perplexity traffic, an MCP server matters. Most retailers in 2026 need all four within 90 days because the AI traffic mix is shifting too fast to bet on one channel.

The Process: From Audit to AI-Ready in 90 Days

Audit current attribute completeness, fix the 12 core attributes first, layer enrichment attributes by SKU priority, add schema, ship feeds. Most retailers finish in 8-12 weeks.

A concrete 90-day implementation sequence:

Days 1-14: audit. Run your current product feed through a completeness check. For each SKU, score 0/1 on each of the 12 core attributes plus the top 10 enrichment attributes for your category. Identify SKUs missing more than 3 core attributes; these are blocking AI agent recommendation. Tools like Paz.ai's AI Readiness Report produce this audit in 30 seconds against your live catalog.
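
The 0/1 scoring described above is simple enough to prototype before buying tooling. The sketch below is one possible version, assuming each product is a dict keyed by hypothetical snake_case attribute names and that "missing" means absent, None, or blank. It covers only the 12 core attributes; the category-specific enrichment list would be added the same way.

```python
CORE_ATTRIBUTES = (
    "title", "description", "brand", "gtin", "mpn", "category",
    "price", "sale_price", "availability", "condition", "image_url", "product_url",
)


def score_product(product: dict) -> dict:
    """Score each core attribute 1 if present and non-empty, else 0."""
    scores = {}
    for attr in CORE_ATTRIBUTES:
        value = product.get(attr)
        present = value is not None and str(value).strip() != ""
        scores[attr] = 1 if present else 0
    return scores


def blocked_skus(catalog: list, max_missing: int = 3) -> list:
    """Return SKUs missing more than `max_missing` core attributes."""
    blocked = []
    for product in catalog:
        missing = len(CORE_ATTRIBUTES) - sum(score_product(product).values())
        if missing > max_missing:
            blocked.append(product["sku"])
    return blocked
```

Sorting the blocked list by revenue gives the work queue for the next phase, since top-selling SKUs are fixed first.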

Days 15-30: fix the 12 core attributes. Prioritize top-selling SKUs first. The typical fixes are brand attribute extraction, structured title patterns, missing GTINs (use MPN as fallback), and Google Product Category mapping. Automated enrichment tooling can apply 80-90% of these fixes per product in a single pass; manual review is required for the rest.

Days 31-60: layer enrichment attributes. Start with the highest-leverage 20 attributes for your category (material, dimensions, color, size, intended use, certifications, Q&A block). Source data from product spec sheets, supplier feeds, or LLM-generated extraction from existing descriptions. Every attribute added moves the product up in retrieval ranking.

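Days 61-75: add Schema.org JSON-LD to every product page. Server-side rendered. Validate with validator.schema.org and Google's Rich Results Test. Confirm the schema is present in the raw HTML served to a GPTBot user agent by running curl -A "GPTBot" -s https://yoursite.com/products/example | grep -E '"@type": *"Product"' (the pattern tolerates optional whitespace after the colon, which pretty-printed JSON-LD includes).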
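
A grep on the raw response is a quick smoke test, but it cannot tell whether the JSON-LD actually parses. The sketch below is a slightly stricter check using only the Python standard library: it reads the HTML without executing JavaScript, parses every application/ld+json block, and reports whether any of them declares a Product. The class and function names are hypothetical, the "GPTBot" user-agent string is a simplification of what the real crawler sends, and @graph wrappers are not handled.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks from raw HTML (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed block; a schema validator would report this


def has_product_schema(html: str) -> bool:
    """True if any JSON-LD block in the raw HTML has @type Product."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        for item in block if isinstance(block, list) else [block]:
            types = item.get("@type") if isinstance(item, dict) else None
            if types == "Product" or (isinstance(types, list) and "Product" in types):
                return True
    return False


def fetch_raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch a page the way a non-rendering crawler would see it."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling has_product_schema(fetch_raw_html(url)) across a sample of product URLs gives a pass rate for the schema rollout; pages that fail are either client-rendered or shipping JSON that does not parse.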

Days 76-90: ship feeds. Connect to OpenAI commerce feed, Google UCP, Stripe ACP, and (if applicable) an MCP server. Confirm each channel ingestion via the channel's own diagnostic dashboard.

Day 90 onward: monitor and maintain. Product data is not a one-time project. New products need the same treatment. Old products go stale (especially descriptions and Q&A). Set up a weekly audit cron. As of 2026, AI agents weight recency, so freshness signals matter for ongoing retrieval eligibility.
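
The recurring audit can start as a freshness check over per-field timestamps. The sketch below is an assumption-heavy illustration: it presumes the catalog stores a last-updated timestamp per field, and the thresholds (15 minutes for availability and price, 90 days for descriptions and Q&A) are placeholder values chosen to mirror the "minutes, not hours" and "at least quarterly" cadences in this guide's FAQ, not limits published by any channel.

```python
from datetime import datetime, timedelta, timezone

# Placeholder thresholds; tune them to each channel's documented expectations.
MAX_AGE = {
    "availability": timedelta(minutes=15),
    "price": timedelta(minutes=15),
    "description": timedelta(days=90),
    "qa": timedelta(days=90),
}


def stale_fields(last_updated: dict, now: datetime) -> list:
    """Return the fields that were never stamped or are older than their limit."""
    stale = []
    for field_name, limit in MAX_AGE.items():
        stamp = last_updated.get(field_name)
        if stamp is None or now - stamp > limit:
            stale.append(field_name)
    return stale


now = datetime.now(timezone.utc)
example = {
    "availability": now - timedelta(hours=25),
    "price": now - timedelta(minutes=5),
    "description": now - timedelta(days=10),
}
print(stale_fields(example, now))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check deterministic when it is run from a scheduled job and tested against fixed dates.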

Common Mistakes That Block AI Agent Recommendation

Marketing-language titles, client-rendered product pages, stale availability, missing GTINs, prose-only descriptions, and skipping schema are the six most common reasons products don't get recommended.

The top six mistakes I see when auditing retailer catalogs in 2026:

Marketing-language titles. "Our Iconic Best-Seller" is not a title. "Nike Air Max 90 Triple White, Men's Size 10" is a title. AI agents match the query to the title; marketing copy doesn't match buyer queries.
Client-rendered product pages. If your product page is empty when JavaScript is disabled, AI crawlers see nothing. Test with curl -A "GPTBot" -s https://yoursite.com/products/example and check the body text. If it's under 1000 characters, your pages are invisible to AI agents.
Stale availability. Inventory cached for 24 hours instead of synced in real-time. AI agents are starting to penalize feeds where recent recommendations turned out to be out-of-stock.
Missing GTINs. Per GS1, 86% of product data feeds contain at least one invalid GTIN. Missing GTINs prevent AI agents from cross-matching the same product across retailers (which means they can't aggregate reviews and prices, which means they're less confident recommending you).
Prose-only descriptions. A 2000-word description tells a human the product is great. It tells an AI agent nothing structured. Move every fact (material, dimensions, certifications, intended use) into a dedicated attribute. Keep prose for tone and brand voice.
Skipping schema entirely. A product page without JSON-LD Product schema is competing with one arm tied behind its back. Schema is the lowest-cost, highest-leverage on-page change available.
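
The first mistake in the list is the easiest to fix programmatically once brand, product type, and size exist as separate fields. The sketch below assembles a title in the "Brand + Product Type + Key Attribute + Size" pattern and flags titles that never name the brand. Both function names are hypothetical, and the brand check is a crude substring match, not a classifier for marketing language.

```python
def build_title(brand: str, product_type: str, key_attribute: str = "", size: str = "") -> str:
    """Assemble "Brand Product Type Key Attribute, Size" from structured fields."""
    head = " ".join(
        part.strip() for part in (brand, product_type, key_attribute) if part.strip()
    )
    size = size.strip()
    return f"{head}, {size}" if size else head


def title_names_brand(title: str, brand: str) -> bool:
    """Crude check: does the title mention the brand at all?"""
    brand = brand.strip().lower()
    return bool(brand) and brand in title.lower()


print(build_title("Nike", "Air Max 90", "Triple White", "Men's Size 10"))
print(title_names_brand("Our Iconic Best-Seller", "Nike"))
```

Generating titles from fields also guarantees that the brand in the title and the brand attribute cannot disagree.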

How Paz.ai Handles This

Paz.ai audits the catalog, identifies missing attributes, generates LLM suggestions per product, lets the merchant approve in bulk, and pushes optimized feeds to ChatGPT, Google, and other AI channels.

Paz.ai's optimization product runs against a connected catalog and produces suggested values for every attribute that is missing, malformed, or underspecified. Each suggestion is reviewable per product or in bulk; nothing writes to the source catalog without merchant approval. Specific transformations include:

Generate AI Description - rewrites descriptions so AI shopping agents can confidently surface the product, with attributes named in the prose.
Generate Tech Specs - extracts structured attribute-value pairs (dimensions, weight, materials, certifications) for the Google Merchant product_detail field.
Generate Highlights - produces 3-5 short bullet-style highlights for the Google Shopping product_highlight field.
Titles - rewrites titles to the "Brand + Product Type + Key Attributes" pattern.
Generate Product Category - maps to the closest valid Google Product Taxonomy node from ~6,000 options.
Generate Q&A - produces shopper Q&A pairs that AI agents quote directly when answering buyer questions.
Discover Product Brand - extracts brand from existing title or description when the brand attribute is missing.
Assign Age Group - classifies into the Google Merchant age_group enum (newborn, infant, toddler, kids, adult).
Fix Prices - strips currency symbols and formatting from price and sale_price fields so feeds validate.
Variant Images / Image Enrichment - distributes parent product images to matching variants (Shopify) or fetches additional images from Adobe Dynamic Media (Magento).

The platform then pushes the optimized catalog to Google Merchant Center, OpenAI commerce feed (ChatGPT Shopping), and other AI shopping channels. The full feature surface is at paz.ai/ai-commerce/core.

FAQ

What is the most important product attribute for AI agents?

GTIN (Global Trade Item Number) is the most important single attribute. It lets AI agents match the same product across multiple retailers and aggregate reviews and prices. Without GTIN, your product is a stranger to the AI agent's knowledge graph.

How many attributes should each product have?

At minimum, the 12 core attributes (title, description, brand, GTIN, MPN, category, price, sale price, availability, condition, image URL, product URL). High-performing AI-ready catalogs add 20-30 enrichment attributes per product. Paz.ai customers typically grow from 5-12 attributes to 47+ attributes per product.

Do I need JSON-LD schema if I already have a product feed?

Yes. Product feeds are ingested by specific channels (Google, OpenAI, Stripe). JSON-LD schema is read by AI crawlers (GPTBot, ClaudeBot, PerplexityBot) when they crawl your product pages directly. The two systems serve different retrieval paths. Most retailers need both.

How often should I update product data for AI agents?

Availability and pricing in real-time (within minutes, not hours). Descriptions, attributes, and Q&A blocks at least quarterly. AI agents in 2026 weight recency - GPT-5.3 retrieves only 6% of pages older than 30 days, down from 33% on GPT-5.2 (Ahrefs analysis cited by Passionfruit Labs, March 2026).

How long does it take to make a catalog AI-ready?

For most mid-market retailers (5,000-50,000 SKUs), a complete AI-readiness pass takes 8-12 weeks: 2 weeks audit, 2 weeks fix core attributes, 4 weeks layer enrichment attributes, 2 weeks add schema, 2 weeks ship feeds. Automation tooling like Paz.ai compresses the enrichment phase by 60-80%.

Does adding more attributes always improve AI ranking?

Up to a point. The first 12 core attributes are required. The next 20-30 enrichment attributes produce meaningful lift. Beyond ~50 attributes, additional fields show diminishing returns. Quality and consistency matter more than raw quantity past that threshold.

What if my product is a one-of-a-kind item without a GTIN?

Use MPN (Manufacturer Part Number) as the structured identifier. For truly unique items (handmade, vintage, custom), use the SKU as the identifier and add a clear note in the description that the item is one-of-a-kind. AI agents handle this case but rank it lower than GTIN-matched items.
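
The fallback order in this answer (GTIN, then MPN, then SKU) can be made explicit in the feed builder so that every product leaves with exactly one declared identifier. The sketch below is a minimal version; the function name and the dict keys are hypothetical.

```python
def pick_identifier(product: dict) -> tuple:
    """Return (identifier_type, value) using the GTIN > MPN > SKU fallback order."""
    for key in ("gtin", "mpn", "sku"):
        value = str(product.get(key) or "").strip()
        if value:
            return key, value
    raise ValueError("product has no GTIN, MPN, or SKU")


print(pick_identifier({"gtin": "", "mpn": "CN8490-100", "sku": "INT-0042"}))
```

Raising on a product with no identifier at all is deliberate: a record like that should be stopped at the feed builder instead of being shipped and silently ignored downstream.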

How AI-Ready Are Your Products?

Check how AI shopping agents evaluate any product page. Free score in 30 seconds with specific recommendations.
