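Retrieval-Augmented Generation (RAG)

Core innovation

A three-stage pipeline (subgraph retrieval, template filling, LLM inference) reduces the factual error rate of generated listings from 21% to 4.8%.

Three-stage pipeline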

text

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Subgraph    │───▶│ Template    │───▶│ LLM         │
│ Retrieval   │    │ Filling     │    │ Inference   │
│             │    │ (Cypher-like│    │ (GPT/Gemini │
│             │    │  Triples)   │    │  with Hard  │
│             │    │             │    │  Constraint)│
└─────────────┘    └─────────────┘    └─────────────┘
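
The diagram can be read as three functions chained together. The sketch below is only an illustration of that composition: the `Triple` shape, `run_pipeline`, and the three stub stages are assumptions made for this example, not part of the original system.

```python
from typing import Callable, List, Tuple

# (relation, subject, object, weight): an assumed in-memory triple shape.
Triple = Tuple[str, str, str, float]


def run_pipeline(
    product_id: str,
    task: str,
    retrieve: Callable[[str, str], List[Triple]],
    fill: Callable[[str, List[Triple]], str],
    infer: Callable[[str], str],
) -> str:
    """Chain the three stages; each stage is passed in as a plain callable."""
    triples = retrieve(product_id, task)   # Stage 1: subgraph retrieval
    prompt = fill(task, triples)           # Stage 2: template filling
    return infer(prompt)                   # Stage 3: LLM inference


# Stub stages so the sketch runs without a database or an API key.
def fake_retrieve(product_id: str, task: str) -> List[Triple]:
    return [("MADE_OF", "Product", "304 Stainless Steel", 1.0)]


def fake_fill(task: str, triples: List[Triple]) -> str:
    facts = "\n".join(f"<{r}, {s}, {o}>" for r, s, o, _ in triples)
    return f"task={task}\n{facts}"


def fake_infer(prompt: str) -> str:
    return f"LLM saw {len(prompt.splitlines())} prompt lines"


result = run_pipeline("sku-1", "main_image", fake_retrieve, fake_fill, fake_infer)
print(result)
```

Passing the stages in as callables keeps each one replaceable in tests, which is one possible design and not necessarily how the production code is organized.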

Stage 1: Task-aware subgraph retrieval

Different tasks retrieve different subgraphs

text

Task                 Relation subset retrieved
─────────────────────────────────────────────────────────
Main image         → MADE_OF + HAS_SPEC (material + key specs)
A+ infographic     → HIGHLIGHTS + HAS_SPEC (selling points + specs)
Lifestyle image    → SUITABLE_FOR + Audience + Material
4-panel grid       → Top-4 SUITABLE_FOR
Bullet copy        → HAS_SPEC + HIGHLIGHTS + COMPLIES_WITH
Banned-word check  → COMPLIES_WITH (compliance only)
Video script       → SUITABLE_FOR + HIGHLIGHTS (scenes + actions)

Retrieval algorithm

python

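from typing import List


def retrieve_subgraph(product_id: str, task: str) -> List[Triple]:
    """
    Task-aware subgraph retrieval.

    `db`, `Triple`, `top_k` and `top_k_per_relation` are assumed to be
    defined elsewhere in the project.
    """
    # 1. Pick the relation types for this task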
    REL_TYPES = {
        'main_image': ['MADE_OF', 'HAS_SPEC'],
        'aplus':      ['HIGHLIGHTS', 'HAS_SPEC'],
        'lifestyle':  ['SUITABLE_FOR', 'MADE_OF'],
        'bullet':     ['HAS_SPEC', 'HIGHLIGHTS', 'COMPLIES_WITH'],
        'compliance': ['COMPLIES_WITH'],
        'video':      ['SUITABLE_FOR', 'HIGHLIGHTS'],
    }
    rel_types = REL_TYPES[task]

    # 2. SQL query (one "?" placeholder per relation type)
    triples = db.query("""
        SELECT s.name AS subject,
               r.rel_type,
               t.name AS object,
               t.attributes,
               r.weight,
               r.evidence
        FROM kg_relations r
        JOIN kg_entities s ON s.id = r.source_id
        JOIN kg_entities t ON t.id = r.target_id
        WHERE r.source_id = ?
          AND r.rel_type IN ({})
        ORDER BY r.rel_type, r.weight DESC
    """.format(','.join('?' * len(rel_types))),
       (product_id, *rel_types))

    # 3. Task-specific Top-K truncation
    if task == 'bullet':
        triples = top_k_per_relation(triples, k=5)  # at most 5 per relation type
    elif task == 'lifestyle':
        triples = top_k(triples, k=4)               # 4 in total

    return triples
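
The retrieval code calls `top_k` and `top_k_per_relation` without showing them. A minimal sketch of what they might look like follows; the row shape `(subject, rel_type, object, weight)` is an assumption, and the real helpers may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed row shape: (subject, rel_type, object, weight).
Row = Tuple[str, str, str, float]


def top_k(rows: List[Row], k: int) -> List[Row]:
    """Keep the k highest-weight rows overall."""
    return sorted(rows, key=lambda r: r[3], reverse=True)[:k]


def top_k_per_relation(rows: List[Row], k: int) -> List[Row]:
    """Keep the k highest-weight rows within each relation type."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    kept: List[Row] = []
    for rel_type in sorted(groups):
        kept.extend(top_k(groups[rel_type], k))
    return kept


rows = [
    ("Product", "HIGHLIGHTS", "Foldable", 1.15),
    ("Product", "HIGHLIGHTS", "Multi-tier", 1.05),
    ("Product", "HIGHLIGHTS", "304-grade", 1.00),
    ("Product", "COMPLIES_WITH", "Lead-Free", 1.00),
]
print(top_k_per_relation(rows, k=2))
```

In this sketch, a global `top_k` over mixed relation types lets a high-weight `MADE_OF` row occupy one of the k slots, so a per-relation cut may be the safer choice when every relation type must be represented.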

Retrieval examples

Retrieval result for the Bullet task

text

HAS_SPEC:
  Product → Size 60x30x80cm        (weight=1.00)
  Product → Load 8kg per tier      (weight=0.90)
  Product → Folded 60x30x8cm       (weight=0.85)
  Product → 3 Tiers                (weight=0.85)
  Product → Total Load 24kg        (weight=0.80)

HIGHLIGHTS:
  Product → Foldable               (weight=1.15) ← boosted by user feedback
  Product → Multi-tier             (weight=1.05)
  Product → 304-grade              (weight=1.00)
  Product → Anti-rust              (weight=0.92)

COMPLIES_WITH:
  Product → Lead-Free              (weight=1.00)
  Product → Food Contact Safe      (weight=1.00)

Retrieval result for the Lifestyle task

text

SUITABLE_FOR (Top-4):
  Product → Small Kitchen Apartment (weight=0.95)
  Product → Bathroom Storage         (weight=0.65)
  Product → Office Pantry            (weight=0.60)
  Product → Outdoor Camping          (weight=0.55)

MADE_OF:
  Product → 304 Stainless Steel      (weight=1.00)
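
To see the query from the retrieval algorithm return rows like these, it can be run against an in-memory SQLite database. This is a sketch only: the table definitions are guessed from the column names used in the query, and the IDs (`p1`, `m1`, `s1`, `s2`) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, inferred from the columns the query selects.
conn.executescript("""
    CREATE TABLE kg_entities (id TEXT PRIMARY KEY, name TEXT, attributes TEXT);
    CREATE TABLE kg_relations (
        source_id TEXT, target_id TEXT, rel_type TEXT, weight REAL, evidence TEXT
    );
""")
conn.executemany(
    "INSERT INTO kg_entities VALUES (?, ?, NULL)",
    [("p1", "Product"), ("m1", "304 Stainless Steel"),
     ("s1", "Small Kitchen Apartment"), ("s2", "Bathroom Storage")],
)
conn.executemany(
    "INSERT INTO kg_relations VALUES (?, ?, ?, ?, NULL)",
    [("p1", "m1", "MADE_OF", 1.00),
     ("p1", "s1", "SUITABLE_FOR", 0.95),
     ("p1", "s2", "SUITABLE_FOR", 0.65)],
)

rel_types = ["SUITABLE_FOR", "MADE_OF"]
placeholders = ",".join("?" * len(rel_types))  # one "?" per relation type
rows = conn.execute(
    f"""
    SELECT s.name, r.rel_type, t.name, r.weight
    FROM kg_relations r
    JOIN kg_entities s ON s.id = r.source_id
    JOIN kg_entities t ON t.id = r.target_id
    WHERE r.source_id = ? AND r.rel_type IN ({placeholders})
    ORDER BY r.rel_type, r.weight DESC
    """,
    ("p1", *rel_types),
).fetchall()

for row in rows:
    print(row)
```

Only the placeholder list is interpolated into the SQL text; the product ID and relation names are still bound as parameters, which keeps the query safe from injection.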

Stage 2: Template filling

Serialize as Cypher-like triples

text

<MADE_OF, Stainless Steel Kitchen Rack, 304 Stainless Steel>
<HAS_SPEC, Stainless Steel Kitchen Rack, Size 60x30x80cm>
<HAS_SPEC, Stainless Steel Kitchen Rack, Load 8kg per tier>
<HAS_SPEC, Stainless Steel Kitchen Rack, Folded Size 60x30x8cm>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Foldable [weight=1.15]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Multi-tier [weight=1.05]>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Lead-Free>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Food Contact Safe>
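
The serialized form above could be produced by a small formatter. In this sketch the `Triple` type and the rule that only `HIGHLIGHTS` and `SUITABLE_FOR` carry a `[weight=...]` suffix are assumptions read off the examples, not a documented specification.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    rel_type: str
    subject: str
    obj: str
    weight: float


# Assumption: only ranked relation types carry a weight annotation,
# matching the examples where HIGHLIGHTS shows [weight=...].
WEIGHTED_RELATIONS = {"HIGHLIGHTS", "SUITABLE_FOR"}


def serialize_triple(t: Triple) -> str:
    """Render one triple as <REL, subject, object [weight=w]>."""
    obj = t.obj
    if t.rel_type in WEIGHTED_RELATIONS:
        obj = f"{obj} [weight={t.weight:.2f}]"
    return f"<{t.rel_type}, {t.subject}, {obj}>"


def serialize_triples(triples: List[Triple]) -> str:
    """Join serialized triples, one per line, ready for prompt injection."""
    return "\n".join(serialize_triple(t) for t in triples)


name = "Stainless Steel Kitchen Rack"
print(serialize_triples([
    Triple("MADE_OF", name, "304 Stainless Steel", 1.00),
    Triple("HIGHLIGHTS", name, "Foldable", 1.15),
]))
```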

Inject into the System Prompt

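Bullet task System Prompt

text

You are a senior Amazon operations specialist with 10 years of experience.
Write an English listing that follows the A10, COSMO, and Rufus algorithms.

[Knowledge graph facts (output must be entailed by these)]
<MADE_OF, Stainless Steel Kitchen Rack, 304 Stainless Steel>
<HAS_SPEC, Stainless Steel Kitchen Rack, Size 60x30x80cm>
<HAS_SPEC, Stainless Steel Kitchen Rack, Load 8kg per tier>
<HAS_SPEC, Stainless Steel Kitchen Rack, Folded Size 60x30x8cm>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Foldable [weight=1.15]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Multi-tier [weight=1.05]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, 304-grade [weight=1.00]>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Lead-Free>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Food Contact Safe>

[Hard constraints]
1. Never generate descriptions that conflict with the triples (e.g. 316 / auto-folding / 1 m length)
2. Never use banned words (FDA approved, Antibacterial, #1, Best, ...)
3. Bullet order: usage scenario → key parameters → selling points → after-sales commitment → brand extension
4. Start each Bullet with a [STANDOUT PHRASE] to attract clicks
5. Order selling points by weight, descending; the highest-weight one goes into Bullet 1

Generate Title + 5-Point Bullets + Description.

Lifestyle image System Prompt

text

You are an expert in Amazon product photography prompts. Based on the knowledge graph below, generate a prompt for a Lifestyle scene image.

[Scene graph]
<SUITABLE_FOR, Product, Small Kitchen Apartment [weight=0.95]>
<MADE_OF, Product, 304 Stainless Steel>

[Hard constraints]
1. The scene must be realistic and believable; avoid implausible layouts
2. Adjust lighting to the scene (kitchen = morning natural light, outdoor = golden hour)
3. The material must be visible (304 stainless steel = brushed metallic sheen)
4. Place the product the way it is actually used (not floating, not tilted)
5. Do not add competitor logos or banned text

Generate an English prompt (for Gemini 3 Pro Image).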
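
Filling a prompt like the ones above is plain string substitution. The sketch below uses Python's `string.Template`; the abbreviated template text and the placeholder names `$facts` and `$banned` are assumptions for illustration, not the production template.

```python
from string import Template
from typing import List

# Abbreviated stand-in for the Bullet system prompt; the placeholder
# names ($facts, $banned) are assumptions for this sketch.
BULLET_TEMPLATE = Template(
    "Write an English Amazon listing.\n\n"
    "[Knowledge graph facts (output must be entailed by these)]\n"
    "$facts\n\n"
    "[Hard constraints]\n"
    "1. Never contradict the triples above.\n"
    "2. Never use banned words ($banned).\n"
)


def fill_system_prompt(facts: str, banned_words: List[str]) -> str:
    """Substitute serialized triples and banned words into the template."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which surfaces template/argument mismatches early.
    return BULLET_TEMPLATE.substitute(facts=facts, banned=", ".join(banned_words))


filled_system_prompt = fill_system_prompt(
    "<MADE_OF, Stainless Steel Kitchen Rack, 304 Stainless Steel>",
    ["FDA approved", "Antibacterial", "#1", "Best"],
)
print(filled_system_prompt)
```

`Template.substitute` is used here instead of `str.format` so that literal braces in the facts or constraints do not need escaping.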

Stage 3: LLM inference

Invocation

python

import openai

# Bullet generation
response = openai.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": filled_system_prompt},
        {"role": "user", "content": user_request}
    ],
    temperature=0.7,
    max_tokens=2048,
    response_format={"type": "json_object"}
)
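
Because the call requests a JSON object, the returned text still has to be parsed and checked before post-processing. With the OpenAI Python client the text would typically come from `response.choices[0].message.content`. The schema below (`title`, `bullets`, `description`) is an assumption, since the source does not specify the JSON keys.

```python
import json
from typing import Any, Dict

# Assumed output schema: the source does not specify the JSON keys.
REQUIRED_KEYS = ("title", "bullets", "description")


def parse_listing(raw: str) -> Dict[str, Any]:
    """Parse the model's JSON text and check the assumed schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(data["bullets"], list) or len(data["bullets"]) != 5:
        raise ValueError("expected exactly 5 bullets")
    return data


raw = json.dumps({
    "title": "Foldable 3-Tier Stainless Steel Kitchen Rack",
    "bullets": ["b1", "b2", "b3", "b4", "b5"],
    "description": "...",
})
listing = parse_listing(raw)
print(len(listing["bullets"]))
```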

Output post-processing

python

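def post_process(llm_output, subgraph):
    """
    Post-processing:
      1. Verify that the output is entailed by the triples
      2. Check for banned-word hits
      3. Check the Bullet order
    """
    # 1. Fact check
    facts = extract_claims(llm_output)
    for claim in facts:
        if not entailed_by(claim, subgraph):
            llm_output = regenerate_with_warning(claim, subgraph)

    # 2. Banned-word check
    for word in BANNED_WORDS:
        if word.lower() in llm_output.lower():
            llm_output = replace_or_regenerate(word, llm_output)

    # 3. Bullet order
    bullets = parse_bullets(llm_output)
    if not is_golden_order(bullets):
        bullets = reorder_to_golden(bullets)
        # Write the reordered bullets back; otherwise the reordering is lost.
        # `replace_bullets` is a hypothetical helper, not shown here.
        llm_output = replace_bullets(llm_output, bullets)

    return llm_output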
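
The banned-word step above uses a plain substring test, which would also flag "Best" inside a word such as "bestow". One possible refinement is a boundary-aware regex match, sketched below. The word list is taken from the prompt's hard constraints; `find_banned_words` and `_pattern` are hypothetical names.

```python
import re
from typing import List

# Illustrative list taken from the prompt's hard constraints.
BANNED_WORDS = ["FDA approved", "Antibacterial", "#1", "Best"]


def _pattern(phrase: str):
    """Match the phrase only when it is not embedded in a longer word."""
    # Lookarounds are used instead of \b because "#1" starts with a
    # non-word character, where \b would not behave as intended.
    # Spaces and hyphens between words are treated as interchangeable.
    body = r"[\s-]+".join(re.escape(part) for part in phrase.split())
    return re.compile(rf"(?<!\w){body}(?!\w)", re.IGNORECASE)


def find_banned_words(text: str, banned: List[str] = BANNED_WORDS) -> List[str]:
    """Return the banned phrases that occur in the text, in list order."""
    return [w for w in banned if _pattern(w).search(text)]


print(find_banned_words("Our #1 rack with FDA-approved coating"))
print(find_banned_words("We bestow a lead-free finish"))
```

The first call reports "FDA approved" and "#1"; the second reports nothing, whereas a substring test would have matched "Best" inside "bestow".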

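Empirical comparison

We ran a comparison test on 100 real cross-border e-commerce products:

Scenario: Bullet generation

Metric	Direct LLM	+ GraphRAG	Relative change
Factual error rate	21.0%	4.8%	↓ 77%
Banned-word hit rate	11.0%	0.6%	↓ 95%
Key selling point omission rate	18.5%	3.2%	↓ 83%
Bullet order compliance	62%	96%	↑ 55%
Length compliance	78%	94%	↑ 21%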
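
The last column is a relative change, not a difference in percentage points. A short check of that arithmetic, using only the figures from the table:

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100


rows = {
    "Factual error rate": (21.0, 4.8),
    "Banned-word hit rate": (11.0, 0.6),
    "Key selling point omission rate": (18.5, 3.2),
    "Bullet order compliance": (62.0, 96.0),
    "Length compliance": (78.0, 94.0),
}
for metric, (before, after) in rows.items():
    print(f"{metric}: {relative_change(before, after):+.0f}%")
```

For example, the factual error rate falls by 16.2 percentage points, which is 77% of the original 21.0%.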

Case comparison

Direct LLM output (first block) vs. + GraphRAG output (second block)

text

Title: Premium Stainless Steel Kitchen Rack with FDA Approved
        Antibacterial Coating - #1 Best Seller!

Bullet 1: [HEALTH & SAFETY] Built with FDA-approved
          antibacterial coating that kills 99.9% of germs.
Bullet 2: [DURABLE] Made of 316 marine-grade stainless steel.
Bullet 3: [SPACE SAVING] Auto-folds in 3 seconds with motorized
          mechanism.
...

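❌ Problems:
- "FDA approved" is a banned phrase
- "kills 99.9%" is a banned antibacterial claim
- "316" should be 304 (factual error)
- "motorized": the product has no motor (hallucination)
- "#1 Best Seller" is banned by the platform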

text

Title: Foldable 3-Tier Stainless Steel Kitchen Rack -
        304 Stainless, 60x30x80cm, 8kg Load Per Tier

Bullet 1: [USE ANYWHERE] Perfect for small kitchens, apartments,
          bathrooms, and outdoor camping. Folds flat to just
          8cm thick for hassle-free storage.

Bullet 2: [304 STAINLESS STEEL] Genuine 304 SUS construction.
          Lead-free certified and food-contact safe — durable
          enough for daily kitchen use.

Bullet 3: [FOLDABLE 3-TIER DESIGN] 60×30×80cm fully expanded,
          collapses to 60×30×8cm in seconds. No tools required.

Bullet 4: [24KG TOTAL CAPACITY] Three sturdy tiers each support
          8kg — combined 24kg total. Anti-rust coating ensures
          long-lasting performance.

Bullet 5: [1-YEAR WARRANTY] Backed by 24/7 customer service and
          full replacement guarantee. Buy with confidence.

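✅ Improvements:
- Strictly grounded in the GraphRAG triples (304 / 60×30×80cm / 8kg / foldable / lead-free)
- Avoids all banned words automatically (no FDA / antibacterial / #1)
- Golden Bullet order (scenario → parameters → selling points → after-sales)
- Selling points ordered by weight (Foldable=1.15 goes into Bullet 1)
- Length compliant (each Bullet within 200 characters)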
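
The "316 instead of 304" error in the direct-LLM output is the kind of conflict a narrow rule can catch before any model-based entailment check runs. The sketch below compares steel grades mentioned in the copy against the `MADE_OF` objects. The grade list and both function names are assumptions, and a real entailment check would need to cover far more than grades.

```python
import re
from typing import List, Set

# Assumed list of stainless grades worth checking; extend as needed.
KNOWN_GRADES = ("201", "304", "316", "430")
GRADE_RE = re.compile(r"(?<!\d)(" + "|".join(KNOWN_GRADES) + r")(?!\d)")


def grades_in(text: str) -> Set[str]:
    """Return every known steel grade mentioned in the text."""
    return set(GRADE_RE.findall(text))


def conflicting_grades(copy: str, made_of_objects: List[str]) -> Set[str]:
    """Grades claimed in the copy that no MADE_OF triple supports."""
    allowed: Set[str] = set()
    for obj in made_of_objects:
        allowed |= grades_in(obj)
    return grades_in(copy) - allowed


made_of = ["304 Stainless Steel"]
print(conflicting_grades("Made of 316 marine-grade stainless steel.", made_of))
print(conflicting_grades("Genuine 304 SUS construction.", made_of))
```

The digit lookarounds keep a number such as 3040 from matching, but a bare "316" used in another sense would still be flagged, so treat hits as candidates for review.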

GraphRAG for multimodal generation

Injecting graph facts into image-generation prompts

python

def graphrag_image_prompt(product_id, style):
    """
    Inject GraphRAG facts when building an image prompt.
    """
    triples = retrieve_subgraph(product_id, task='lifestyle' if style == 'lifestyle' else 'main_image')

    base_prompt = STYLE_TEMPLATES[style]
    facts = render_triples_for_image(triples)

    return f"""
    {base_prompt}

    Product context (must be visually consistent):
    {facts}

    Subject must visibly demonstrate:
    - Material: {get_entity('Material', triples)}
    - Key features: {get_top_features(triples, k=2)}

    Negative: avoid showing competing brands, logos, watermarks,
    or any items not consistent with above context.
    """

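`render_triples_for_image` is called in the function above but not shown. One way it might work is to map each relation type to a short phrase, as sketched below. The row shape and the wording in `PHRASES` are hypothetical.

```python
from typing import List, Tuple

# Assumed row shape: (subject, rel_type, object, weight).
Row = Tuple[str, str, str, float]

# Hypothetical phrasing per relation type; tune to the image model in use.
PHRASES = {
    "MADE_OF": "made of {obj}",
    "SUITABLE_FOR": "shown in use in: {obj}",
    "HAS_SPEC": "specification: {obj}",
    "HIGHLIGHTS": "key feature: {obj}",
}


def render_triples_for_image(triples: List[Row]) -> str:
    """Turn triples into short bullet lines an image model can follow."""
    lines = []
    for _subject, rel_type, obj, _weight in triples:
        template = PHRASES.get(rel_type)
        if template is None:
            continue  # skip relations with no phrase here, e.g. COMPLIES_WITH
        lines.append("- " + template.format(obj=obj))
    return "\n".join(lines)


facts = render_triples_for_image([
    ("Product", "SUITABLE_FOR", "Small Kitchen Apartment", 0.95),
    ("Product", "MADE_OF", "304 Stainless Steel", 1.00),
    ("Product", "COMPLIES_WITH", "Lead-Free", 1.00),
])
print(facts)
```
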
GraphRAG for video scripts

python

def graphrag_video_script(product_id):
    triples = retrieve_subgraph(product_id, task='video')

    # Scene: take the Top-1 SUITABLE_FOR relation
    scene = top_relation(triples, 'SUITABLE_FOR')

    # Actions: derived from the HIGHLIGHTS relations
    features = filter_by_relation(triples, 'HIGHLIGHTS')
    actions = features_to_actions(features)
    # e.g. Foldable → "fold and unfold demonstration"

    return generate_video_prompt(scene, actions)
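
A lookup table is one simple way to implement `features_to_actions`. In the sketch below only the Foldable mapping comes from the comment in the code above; the other entries, the fallback wording, and the choice to pass feature names (not full triples) are assumptions.

```python
from typing import Dict, List

# Only the Foldable entry comes from the text above; the other mappings
# and the fallback wording are hypothetical.
ACTION_MAP: Dict[str, str] = {
    "Foldable": "fold and unfold demonstration",
    "Multi-tier": "load items onto each tier",
    "Anti-rust": "wipe off water droplets",
}


def features_to_actions(features: List[str]) -> List[str]:
    """Map HIGHLIGHTS features to on-screen actions, preserving order."""
    return [ACTION_MAP.get(f, f"close-up of the {f.lower()} feature") for f in features]


print(features_to_actions(["Foldable", "304-grade"]))
```

Because the input order is preserved, features already sorted by weight yield actions in the same priority order.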

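Performance data

Stage	Latency	Notes
Subgraph retrieval	< 60 ms	SQLite indexed query
Template filling	< 10 ms	String concatenation
LLM inference (Bullet)	8-10 s	GPT-5.5
LLM inference (image)	12-16 s	Gemini 3 Pro Image
Post-processing	< 100 ms	Banned-word check + order validation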
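
Adding up the table shows that end-to-end latency is dominated by the LLM call. The calculation below uses the table's upper bounds for the non-LLM stages and assumes post-processing does not trigger a regeneration, which would add another LLM round trip.

```python
# Upper bounds (seconds) for the non-LLM stages, taken from the table above.
OVERHEAD_S = {
    "subgraph retrieval": 0.060,
    "template filling": 0.010,
    "post-processing": 0.100,
}

# LLM latency ranges (seconds) from the table above.
LLM_RANGE_S = {"bullet": (8.0, 10.0), "image": (12.0, 16.0)}

overhead = sum(OVERHEAD_S.values())
for task, (low, high) in LLM_RANGE_S.items():
    # The non-LLM share is largest when the LLM call is fastest.
    worst_share = overhead / (low + overhead) * 100
    print(f"{task}: up to {high + overhead:.2f} s end to end, "
          f"non-LLM share at most {worst_share:.1f}%")
```

Under these assumptions the non-LLM stages add at most 170 ms, so optimizing retrieval or templating further would barely change total latency.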

Next steps

💾 Storage design: complete SQLite + vector index implementation
⚡ One-click full plan: how GraphRAG is applied in the product
📊 Test report: measured GraphRAG accuracy data
