How Do You Structure Product Data for AI Agents?
You need 12 core product attributes, JSON-LD Product schema, a feed in OpenAI commerce format or AP2/ACP/UCP/MCP, and active maintenance so AI agents can find, understand, and recommend each product.
To structure product data for AI agents in 2026, retailers need four things working together:
- 12 core attributes per product, complete on every SKU. Title, description, brand, GTIN, MPN, category (Google Product Taxonomy), price, sale price, availability, condition, image URL, and product URL. These are the minimum baseline. Most retailers shipping to Google Shopping have most of them; AI agents need all of them.
- 20-30+ enrichment attributes that go beyond Google Shopping. Material, dimensions, weight, color, size, age group, gender, ingredient list, intended use, occasion, certifications, sustainability claims, and a structured Q&A block. AI agents weight these heavily when picking between similar products.
- JSON-LD Product schema in raw HTML on every product page. Server-side rendered, not client-injected. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do not execute JavaScript.
- A feed in at least one of the agentic commerce formats: OpenAI's commerce feed specification (for ChatGPT Shopping), Google's UCP feed (Universal Commerce Protocol), Stripe's ACP integration (Agentic Commerce Protocol for checkout), or an MCP server (Model Context Protocol for real-time inventory and pricing).
This guide walks through each piece. The order matters: ship the 12 core attributes first, layer the enrichment attributes second, add schema third, then pick the feed format that matches where your buyers are. Skipping the first step makes the rest invisible.
Last updated: May 11, 2026. Sources cited inline.
The 12 Core Attributes Every AI Agent Needs
Title, description, brand, GTIN, MPN, category, price, sale price, availability, condition, image URL, product URL. Missing any one of these reduces a product's chance of being recommended.
AI shopping agents (ChatGPT Shopping, Perplexity Shopping, Google AI Mode, Amazon Rufus) rank products by query relevance, data completeness, and recency. Of the three, data completeness is the only one fully under a retailer's control. The 12 core attributes are the baseline test.
| Attribute | What it does for AI agents | Common mistake |
|---|---|---|
| Title | Primary text match for the buyer's query. AI agents weight title heavily when deciding whether a product is relevant to a question. | Marketing-language titles ("Our Iconic Best-Seller") instead of "Brand + Product Type + Key Attribute + Size" titles ("Nike Air Max 90 Triple White, Men's Size 10"). |
| Description | The primary text the AI agent quotes from when generating a recommendation. AI-friendly descriptions answer specific buyer questions in the first 200 words. | HTML-stripped marketing copy with no product specifics. AI agents need attributes mentioned in prose, not just in fields. |
| Brand | Entity linkage for the AI agent. The brand becomes a graph node the agent can reason about across multiple products and reviews. | Brand stored in title only ("Sony WH-1000XM5") and missing as a structured attribute. Agents need it as its own field. |
| GTIN (UPC/EAN/ISBN) | Globally unique product identifier. Allows the AI agent to match the same product across multiple retailers and aggregate reviews, prices, and availability. | Missing or invalid GTINs. Per GS1, 86% of product data feeds contain at least one invalid GTIN (GS1 US Data Quality Report, 2024). |
| MPN (Manufacturer Part Number) | Fallback identifier when GTIN is unavailable (private-label products, custom configurations). | Stored in title or SKU field instead of as MPN. |
| Google Product Category | Taxonomy that lets AI agents understand what category your product competes in. Maps to Google's full Product Taxonomy with ~6,000 nodes. | Using your own internal category name instead of mapping to Google's taxonomy. AI agents trained on Google data don't know your internal taxonomy. |
| Price | Filterable attribute. AI agents serve queries like "under $300" or "best value." Products without a clean price field get excluded from price-filtered queries. | Price stored as a string with currency symbol ("$199.99") instead of a numeric field plus separate currency code. ACP and UCP feeds require numeric. |
| Sale price | Separate field, not a replacement for price. AI agents surface deal context ("currently 30% off"). | Overwriting the regular price field during a sale and losing the original anchor price. |
| Availability | Real-time inventory signal. Out-of-stock products that still rank in AI responses generate a bad user experience and reduce future trust in your catalog. | Stale availability values cached for 24+ hours. AI agents are starting to penalize feeds with stale availability signals. |
| Condition | "new", "refurbished", "used". Affects which buyer queries surface the product. | Missing on private-label or third-party reseller listings. |
| Image URL | Primary product image at 800x800 minimum, white background preferred. AI agents now route product images through visual classifiers; low-quality images hurt rankings. | Thumbnail-resolution images, watermarked images, or relative URLs that break when the AI agent rehosts the link. |
| Product URL | Canonical product page URL. Must be server-side rendered with the JSON-LD Product schema embedded. | Client-rendered pages where the body is empty when JavaScript is disabled. AI crawlers won't see the product. |
The 12 core attributes are necessary but not sufficient. The next section covers the enrichment attributes that separate products AI agents recommend from products AI agents skip.
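Invalid GTINs are easy to catch before a feed ships, because the last digit of every GTIN is a check digit computed from the others. The sketch below applies the standard GS1 mod-10 check to GTIN-8, GTIN-12 (UPC), GTIN-13 (EAN), and GTIN-14 values. The function name is an illustration, not part of any feed tooling, and a passing check digit only rules out typos; it does not prove the GTIN is registered to your product.

```python
def is_valid_gtin(gtin: str) -> bool:
    """Return True if `gtin` has a valid length and a correct GS1 check digit."""
    if not (gtin.isascii() and gtin.isdigit()):
        return False
    if len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in gtin]
    body, check = digits[:-1], digits[-1]
    # Starting from the digit next to the check digit and moving left,
    # the weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


# Example: flag feed rows whose GTIN fails the check.
rows = [{"sku": "CN8490-100", "gtin": "0194501842877"}, {"sku": "X-1", "gtin": "12345"}]
invalid = [row["sku"] for row in rows if not is_valid_gtin(row["gtin"])]
```

Rows that fail can fall back to MPN plus brand until the GTIN is corrected at the source.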
The 20-30 Enrichment Attributes That Decide Whether You Get Recommended
Material, dimensions, color, size, age group, ingredients, certifications, intended use, Q&A pairs. AI agents weight these heavily when filtering between similar products.
Two products with identical core attributes can score very differently in AI agent recommendations based on what comes next. The enrichment attributes answer the questions buyers actually ask AI agents - questions like "is it dishwasher safe?", "does it run small?", "is it gluten-free?", "is it good for cold-weather camping?". Products that have those answers structured in the feed get cited. Products that bury them in a 2000-word marketing description don't.
The Princeton/Georgia Tech/Allen AI GEO paper (Aggarwal et al., SIGKDD 2024) tested nine optimization strategies across 10,000 queries and found that adding statistics, expert quotes, and cited sources boosted content visibility in generative engine responses by 30-40%. The product-data equivalent: structure your facts as attributes, not as prose buried in a description.
The high-leverage enrichment attributes, grouped by category:
Physical properties (always include if applicable):
- Material - "100% organic cotton", "anodized aluminum", "vegan leather"
- Dimensions - length, width, height, depth as separate numeric fields with units
- Weight - numeric field with units (grams, ounces, pounds)
- Color - normalized color name plus hex code; AI agents serve "navy blue" queries differently than "blue"
- Size - structured size with size system (US, EU, UK, JP) and size type (regular, petite, big & tall)
- Pattern - "solid", "striped", "floral" - filterable for fashion queries
Audience attributes:
- Age group - newborn, infant, toddler, kids, adult (Google Merchant Center enum)
- Gender - male, female, unisex (or omit entirely for gender-neutral items)
- Intended use - "running", "everyday", "formal", "cold-weather camping"
- Skill level - "beginner", "intermediate", "professional" for tools/equipment
Composition (for consumables and beauty):
- Ingredient list - structured array, not prose
- Allergens - structured list with allergen codes
- Nutritional facts - structured numeric fields per FDA panel
- Certifications - "USDA Organic", "Leaping Bunny", "Fair Trade", "B Corp" as structured tags
Performance and durability:
- Warranty - duration in months/years as a numeric field plus warranty type
- Care instructions - "machine washable", "dry clean only" as structured tags
- Battery life / runtime - numeric where applicable
- Water resistance rating - IPX rating
Sustainability and ethics (increasingly weighted by AI agents):
- Recycled content percentage
- Carbon footprint
- Sourcing claims - "responsibly sourced", "made in [country]"
- End-of-life options - "recyclable", "compostable"
Structured Q&A block - the single highest-leverage enrichment for AI agent retrieval. Five to ten common buyer questions with concise answers, each ~25-30 words, stored as structured pairs. AI agents quote directly from these answers when responding to natural-language queries. Sample questions to include for every product:
- How does sizing run?
- What's the return policy?
- Is this dishwasher / microwave / freezer safe?
- How long does it last?
- What's included in the box?
- How does this compare to similar products in [category]?
The pattern: every attribute the AI agent could filter on or quote from should exist as a structured field, not as a sentence buried in the description. Most retailers have this data internally (in PIM systems, product spec sheets, or warranty paperwork) but never expose it in the feed.
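One way to picture "structured field, not a sentence" is a typed record that sits next to the core attributes. The sketch below is an assumption about shape only: the class names, field names, and the 30-word limit are illustrative and do not come from any feed specification. It also shows a small check that a Q&A block stays within the five-to-ten pairs and short answers described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Enrichment:
    # Each fact gets its own field instead of living in the description.
    material: Optional[str] = None
    weight_grams: Optional[float] = None
    color_name: Optional[str] = None
    color_hex: Optional[str] = None
    certifications: List[str] = field(default_factory=list)
    qa: List[QAPair] = field(default_factory=list)


def qa_warnings(qa: List[QAPair], max_words: int = 30) -> List[str]:
    """Return human-readable warnings for a product's Q&A block."""
    warnings = []
    if not 5 <= len(qa) <= 10:
        warnings.append(f"expected 5-10 Q&A pairs, found {len(qa)}")
    for pair in qa:
        words = len(pair.answer.split())
        if words == 0:
            warnings.append(f"empty answer: {pair.question!r}")
        elif words > max_words:
            warnings.append(f"answer has {words} words (limit {max_words}): {pair.question!r}")
    return warnings
```

Running `qa_warnings` over the catalog gives a quick list of products whose Q&A block is missing, too thin, or too wordy to quote cleanly.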
JSON-LD Product Schema: The Required Embed
JSON-LD Product schema embedded in raw HTML (server-side rendered) on every product page is the minimum requirement for AI agent retrieval. Without it, AI crawlers see your page as undifferentiated text.
Schema.org markup is processed by both training-time crawlers and runtime retrieval systems. 45 million web domains use Schema.org structured data (Schema.org, 2024). For product pages, the JSON-LD Product schema is the canonical format.
A minimum-viable Product schema block:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Nike Air Max 90 Triple White",
  "description": "The Air Max 90 in Triple White is...",
  "brand": {
    "@type": "Brand",
    "name": "Nike"
  },
  "sku": "CN8490-100",
  "gtin13": "0194501842877",
  "mpn": "CN8490-100",
  "category": "Shoes > Athletic Shoes > Running Shoes",
  "image": "https://example.com/images/cn8490-100.jpg",
  "color": "White",
  "material": "Leather and mesh upper",
  "size": "10 US Men's",
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/nike-air-max-90-triple-white",
    "priceCurrency": "USD",
    "price": "129.99",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "seller": {
      "@type": "Organization",
      "name": "Example Retailer"
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "1248"
  }
}
</script>
Critical implementation details that retailers get wrong:
- Server-side rendered. GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. JSON-LD must be in the HTML response, not injected by client-side React after hydration. This requires SSR or a static export.
- One Product per page. A page with variant selectors should describe one product: a single ProductGroup whose `hasVariant` lists each color/size as a Product, rather than several unrelated top-level Product blocks. (In Schema.org, `hasVariant` is a property of ProductGroup.)
- Use real attribute fields, not `additionalProperty` for everything. Schema.org has dedicated fields for color, material, size, weight, dimensions, age group, gender. Use them. Use `additionalProperty` only for truly custom attributes.
- Validate. Run every product page through validator.schema.org and Google's Rich Results Test. Errors block recommendation eligibility.
- Add complementary schemas where they fit. `FAQPage` for product Q&A blocks. `HowTo` for setup/installation pages. `VideoObject` for product demo videos.
The full Schema.org Product reference is at schema.org/Product. The Google-specific extensions are at developers.google.com/search/docs/appearance/structured-data/product.
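To make "server-side rendered" concrete, here is a minimal sketch of producing the JSON-LD on the server, so the markup is part of the HTML response rather than injected after hydration. The input field names (`title`, `in_stock`, and so on) and the function names are assumptions for illustration; map them to whatever your catalog or template layer actually uses. The same helper also wraps a FAQPage built from product Q&A pairs.

```python
import json
from typing import Dict, List, Tuple


def product_jsonld(product: Dict) -> Dict:
    """Map a catalog record (hypothetical field names) to a Schema.org Product."""
    availability = "InStock" if product["in_stock"] else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["title"],
        "brand": {"@type": "Brand", "name": product["brand"]},
        "gtin13": product["gtin13"],
        "offers": {
            "@type": "Offer",
            "url": product["url"],
            "priceCurrency": product["currency"],
            "price": str(product["price"]),
            "availability": f"https://schema.org/{availability}",
        },
    }


def faq_jsonld(pairs: List[Tuple[str, str]]) -> Dict:
    """Build a Schema.org FAQPage from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(data: Dict) -> str:
    """Serialize JSON-LD into a script tag for the server-rendered HTML."""
    # Escape "</" so a value containing "</script>" cannot end the tag early;
    # "<\/" is still valid JSON and parses back to "</".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The returned string goes into the page template's head or body on the server. The escaping step matters mostly for descriptions and Q&A answers, which often contain user-supplied or supplier-supplied text.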
The Four Feed Formats AI Agents Actually Use
OpenAI commerce feed (ChatGPT Shopping), Google UCP, Stripe ACP, and MCP server. Each has its own schema and update cadence. Most retailers should ship to all four within 90 days.
Beyond the on-page schema, AI shopping channels each ingest a structured feed. The four that matter in 2026:
OpenAI commerce feed specification (for ChatGPT Shopping). A JSON feed format with a strict schema for product objects. Required for products to be surfaced in ChatGPT Shopping recommendations. Update cadence: real-time push via webhook or hourly poll. Specification: see the ChatGPT product feed format reference. Shopify merchants are auto-enrolled; other platforms integrate through Stripe's Agentic Commerce Suite or infrastructure providers like Paz.ai.
UCP (Universal Commerce Protocol) feed (for Google AI Mode and other agentic channels). Google's protocol for agentic checkout, donated to the Linux Foundation in early 2026. Feed format extends the Google Merchant Center spec with agentic fields (agent permissions, checkout URLs, real-time pricing). Update cadence: continuous via API. See UCP reference.
ACP (Agentic Commerce Protocol) for checkout. Stripe + OpenAI's protocol for agentic checkout. Not a product feed per se but a checkout integration that AI agents call to complete purchases on behalf of buyers. Required for ChatGPT Shopping checkout, Perplexity Buy with Pro, and most other agentic checkout flows. See ACP reference.
MCP (Model Context Protocol) server. Anthropic's protocol (donated to the Linux Foundation in Feb 2026) that lets AI agents fetch real-time product data, inventory, and pricing via a standardized server interface. Increasingly used by Claude-based shopping agents and any AI system that needs live data instead of a cached feed. See MCP reference.
What to ship first: if your buyers are in ChatGPT, ship the OpenAI commerce feed. If they're using Google AI Mode, ship UCP. If you're seeing Claude or Perplexity traffic, an MCP server matters. Most retailers in 2026 need all four within 90 days because the AI traffic mix is shifting too fast to bet on one channel.
The Process: From Audit to AI-Ready in 90 Days
Audit current attribute completeness, fix the 12 core attributes first, layer enrichment attributes by SKU priority, add schema, ship feeds. Most retailers finish in 8-12 weeks.
A concrete 90-day implementation sequence:
Days 1-14: audit. Run your current product feed through a completeness check. For each SKU, score 0/1 on each of the 12 core attributes plus the top 10 enrichment attributes for your category. Identify SKUs missing more than 3 core attributes; these are blocking AI agent recommendation. Tools like Paz.ai's AI Readiness Report produce this audit in 30 seconds against your live catalog.
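The 0/1 scoring described above fits in a few lines. This is a rough sketch, assuming the feed has already been loaded as a list of dictionaries; the key names and the `sku` field are illustrative. It scores the 12 core attributes only and flags any SKU missing more than three of them.

```python
from typing import Dict, List

CORE_ATTRIBUTES = [
    "title", "description", "brand", "gtin", "mpn", "category",
    "price", "sale_price", "availability", "condition", "image_url", "product_url",
]


def missing_core(product: Dict) -> List[str]:
    """List the core attributes that are absent or empty on one product."""
    return [name for name in CORE_ATTRIBUTES if product.get(name) in (None, "", [])]


def audit_catalog(catalog: List[Dict], max_missing: int = 3) -> Dict[str, Dict]:
    """Score each SKU 0-12 and mark it blocking if too many attributes are missing."""
    report = {}
    for product in catalog:
        missing = missing_core(product)
        report[product["sku"]] = {
            "score": len(CORE_ATTRIBUTES) - len(missing),
            "missing": missing,
            "blocking": len(missing) > max_missing,
        }
    return report
```

Sorting the report by score, then by sales volume, gives a reasonable work order for the next phase. Extending `CORE_ATTRIBUTES` with your category's top enrichment attributes covers the second half of the audit.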
Days 15-30: fix the 12 core attributes. Prioritize top-selling SKUs first. The usual fixes are brand attribute extraction, structured title patterns, missing GTINs (use MPN as fallback), and Google Product Category mapping. Automated enrichment tooling can handle 80-90% of these fixes per product in a single pass; manual review is required for the rest.
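One core-attribute fix that is simple to automate is the price field: a string like "$199.99" becomes a numeric amount plus a separate currency code. The sketch below is a simplified example, not a complete parser. It assumes US-style formatting (comma as thousands separator, dot as decimal point), and the symbol table is illustrative; "$" in particular is ambiguous across currencies, so real catalogs should take the currency from store settings where possible.

```python
from decimal import Decimal, InvalidOperation
from typing import Tuple

# Illustrative mapping only; extend or replace with your store's currency setting.
SYMBOL_TO_CODE = {"$": "USD", "€": "EUR", "£": "GBP"}


def normalize_price(raw: str, default_currency: str = "USD") -> Tuple[Decimal, str]:
    """Split a display price such as "$1,299.00" into (amount, currency code)."""
    text = raw.strip()
    currency = default_currency
    for symbol, code in SYMBOL_TO_CODE.items():
        if symbol in text:
            currency = code
            text = text.replace(symbol, "")
    # Assumes "1,299.00" style: commas are thousands separators.
    text = text.replace(",", "").strip()
    try:
        amount = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"unparseable price: {raw!r}")
    if not amount.is_finite() or amount < 0:
        raise ValueError(f"invalid price: {raw!r}")
    return amount, currency
```

`Decimal` is used instead of `float` so that 199.99 stays exactly 199.99 when it is written back to the feed. Values that raise `ValueError` are the ones to route to manual review.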
Days 31-60: layer enrichment attributes. Start with the highest-leverage 20 attributes for your category (material, dimensions, color, size, intended use, certifications, Q&A block). Source data from product spec sheets, supplier feeds, or LLM-generated extraction from existing descriptions. Every attribute added moves the product up in retrieval ranking.
Days 61-75: add Schema.org JSON-LD to every product page. Server-side rendered. Validate with validator.schema.org and Google's Rich Results Test. Confirm GPTBot can see the schema by running `curl -A "GPTBot" -s https://yoursite.com/products/example | grep -E '"@type": *"Product"'` (the pattern allows for optional whitespace after the colon, which pretty-printed JSON-LD usually has).
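The same check can be scripted across many URLs. The sketch below uses only the Python standard library: it parses the raw HTML (no JavaScript is executed, which is the point), collects every `application/ld+json` block, and reports the `@type` of each. The function and class names are illustrative, and the user-agent string is just the literal "GPTBot" from the curl command above, not the crawler's full header. It does not unpack `@graph` containers.

```python
import json
from html.parser import HTMLParser
from typing import List
from urllib.request import Request, urlopen


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def jsonld_types(html: str) -> List:
    """Return the @type of each top-level JSON-LD object found in raw HTML."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types = []
    for block in collector.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is invisible to parsers too
        for item in doc if isinstance(doc, list) else [doc]:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types


def fetch_raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch a page the way a non-rendering crawler would: one GET, no JavaScript."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A page passes when `"Product" in jsonld_types(fetch_raw_html(url))` is true. An empty list on a page that shows rich results in a browser usually means the schema is being injected client-side.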
Days 76-90: ship feeds. Connect to OpenAI commerce feed, Google UCP, Stripe ACP, and (if applicable) an MCP server. Confirm each channel ingestion via the channel's own diagnostic dashboard.
Day 90 onward: monitor and maintain. Product data is not a one-time project. New products need the same treatment. Old products go stale (especially descriptions and Q&A). Set up a weekly audit cron. As of 2026, AI agents weight recency, so freshness signals matter for ongoing retrieval eligibility.
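A recurring audit job can start with something as small as a staleness check on availability. The sketch below assumes each product record carries a timezone-aware `availability_checked_at` timestamp; that field name is hypothetical, and the 24-hour default mirrors the stale-availability threshold mentioned earlier in this guide rather than any published agent rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_skus(
    catalog: List[Dict],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return SKUs whose availability was last confirmed more than `max_age` ago."""
    now = now or datetime.now(timezone.utc)
    return [
        product["sku"]
        for product in catalog
        if now - product["availability_checked_at"] > max_age
    ]
```

The same pattern extends to descriptions and Q&A blocks by swapping the timestamp field and using a longer `max_age`.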
Common Mistakes That Block AI Agent Recommendation
Marketing-language titles, client-rendered product pages, stale availability, missing GTINs, prose-only descriptions, and skipping schema are the six most common reasons products don't get recommended.
The top six mistakes I see when auditing retailer catalogs in 2026:
- Marketing-language titles. "Our Iconic Best-Seller" is not a title. "Nike Air Max 90 Triple White, Men's Size 10" is a title. AI agents match the query to the title; marketing copy doesn't match buyer queries.
- Client-rendered product pages. If your product page is empty when JavaScript is disabled, AI crawlers see nothing. Test with `curl -A "GPTBot" -s https://yoursite.com/products/example` and check the body text. If it's under 1000 characters, your pages are invisible to AI agents.
- Stale availability. Inventory cached for 24 hours instead of synced in real-time. AI agents are starting to penalize feeds where recent recommendations turned out to be out-of-stock.
- Missing GTINs. Per GS1, 86% of product data feeds contain at least one invalid GTIN. Missing GTINs prevent AI agents from cross-matching the same product across retailers (which means they can't aggregate reviews and prices, which means they're less confident recommending you).
- Prose-only descriptions. A 2000-word description tells a human the product is great. It tells an AI agent nothing structured. Move every fact (material, dimensions, certifications, intended use) into a dedicated attribute. Keep prose for tone and brand voice.
- Skipping schema entirely. A product page without JSON-LD Product schema is competing with one arm tied behind its back. Schema is the lowest-cost, highest-leverage on-page change available.
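The "under 1000 characters" test for client-rendered pages can be approximated in code. The sketch below strips scripts, styles, and tags from raw HTML and counts what is left. It is a rough heuristic under stated assumptions: it counts text anywhere in the document (including the page title), it does not execute JavaScript, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside script, style, noscript, or template tags."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_length(html: str) -> int:
    """Approximate how many characters of readable text the raw HTML contains."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not inflate the count.
    return len(" ".join(" ".join(parser.chunks).split()))
```

Feed it the response body from the curl command in the list above. A large HTML payload with a very small visible-text count is the typical signature of a page that only renders in the browser.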
How Paz.ai Handles This
Paz.ai audits the catalog, identifies missing attributes, generates LLM suggestions per product, lets the merchant approve in bulk, and pushes optimized feeds to ChatGPT, Google, and other AI channels.
Paz.ai's optimization product runs against a connected catalog and produces suggested values for every attribute that is missing, malformed, or underspecified. Each suggestion is reviewable per product or in bulk; nothing writes to the source catalog without merchant approval. Specific transformations include:
- Generate AI Description - rewrites descriptions so AI shopping agents can confidently surface the product, with attributes named in the prose.
- Generate Tech Specs - extracts structured attribute-value pairs (dimensions, weight, materials, certifications) for the Google Merchant `product_detail` field.
- Generate Highlights - produces 3-5 short bullet-style highlights for the Google Shopping `product_highlight` field.
- Titles - rewrites titles to the "Brand + Product Type + Key Attributes" pattern.
- Generate Product Category - maps to the closest valid Google Product Taxonomy node from ~6,000 options.
- Generate Q&A - produces shopper Q&A pairs that AI agents quote directly when answering buyer questions.
- Discover Product Brand - extracts brand from existing title or description when the brand attribute is missing.
- Assign Age Group - classifies into the Google Merchant `age_group` enum (newborn, infant, toddler, kids, adult).
- Fix Prices - strips currency symbols and formatting from price and sale_price fields so feeds validate.
- Variant Images / Image Enrichment - distributes parent product images to matching variants (Shopify) or fetches additional images from Adobe Dynamic Media (Magento).
The platform then pushes the optimized catalog to Google Merchant Center, OpenAI commerce feed (ChatGPT Shopping), and other AI shopping channels. The full feature surface is at paz.ai/ai-commerce/core.
FAQ
What is the most important product attribute for AI agents?
How many attributes should each product have?
Do I need JSON-LD schema if I already have a product feed?
How often should I update product data for AI agents?
How long does it take to make a catalog AI-ready?
Does adding more attributes always improve AI ranking?
What if my product is a one-of-a-kind item without a GTIN?
Related Terms
Structured Product Data
Structured product data is machine-readable product information organized in standardized formats like Schema.org, enabling search engines and AI agents to understand and recommend products.
Product Data Enrichment
Product data enrichment is the process of enhancing raw product information with additional attributes, descriptions, and metadata to improve discoverability and conversions.
Product Schema Markup
Product schema markup is structured JSON-LD data embedded in a product page that tells search engines and AI systems what the product is, what it costs, whether it is in stock, and what buyers think of it.
ChatGPT Product Feed: Format and How to Submit
The ChatGPT product feed format is a JSONL-based spec by OpenAI that retailers use to syndicate catalogs into ChatGPT Shopping for AI-driven product discovery.
Product Feed Optimization for AI
Product feed optimization for AI is the practice of structuring and enhancing product data specifically for discovery and recommendation by AI shopping agents.
AI Catalog Management
AI catalog management uses artificial intelligence to automate product data creation, enrichment, categorization, and optimization across sales channels.
Generative Engine Optimization (GEO)
GEO is the practice of structuring digital content to maximize visibility in AI-generated responses from ChatGPT, Google AI, and Perplexity.
Agentic Commerce Protocol (ACP): How It Works in 2026
ACP is an open-source checkout protocol by Stripe and OpenAI that enables AI agents to complete purchases on behalf of consumers.
Universal Commerce Protocol (UCP)
UCP is an open standard by Google and Shopify that enables AI agents to handle the full commerce journey from discovery to post-purchase.
Model Context Protocol (MCP)
MCP is an open standard originally created by Anthropic that provides a universal way for AI agents to connect to external data sources in real time.