
Retrieval-Augmented Generation

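Core Innovation

A three-stage pipeline of subgraph retrieval, template filling, and LLM inference reduces the Listing factual error rate from 21% to 4.8%.

Three-Stage Pipeline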

text
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Subgraph     │───▶│ Template     │───▶│ LLM Inference│
│ Retrieval    │    │ Filling      │    │ (GPT/Gemini  │
│              │    │ (Cypher-like │    │  with Hard   │
│              │    │  Triples)    │    │  Constraint) │
└──────────────┘    └──────────────┘    └──────────────┘

Stage 1: Task-Aware Subgraph Retrieval

Different tasks retrieve different subgraphs

text
Task                  Relation subset retrieved
─────────────────────────────────────────────────────────
Main image          → MADE_OF + HAS_SPEC (material + key specs)
A+ infographic      → HIGHLIGHTS + HAS_SPEC (selling points + specs)
Lifestyle image     → SUITABLE_FOR + Audience + Material
4-panel grid        → Top-4 SUITABLE_FOR
Bullet copy         → HAS_SPEC + HIGHLIGHTS + COMPLIES_WITH
Banned-word check   → COMPLIES_WITH (compliance only)
Video script        → SUITABLE_FOR + HIGHLIGHTS (scenes + actions)

Retrieval algorithm

python
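from typing import List

# `db`, `Triple`, `top_k`, and `top_k_per_relation` are assumed to be defined elsewhere in the project.

def retrieve_subgraph(product_id: str, task: str) -> List[Triple]:
    """
    Task-aware subgraph retrieval.
    """
    # 1. Pick the relation types for this task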
    REL_TYPES = {
        'main_image': ['MADE_OF', 'HAS_SPEC'],
        'aplus':      ['HIGHLIGHTS', 'HAS_SPEC'],
        'lifestyle':  ['SUITABLE_FOR', 'MADE_OF'],
        'bullet':     ['HAS_SPEC', 'HIGHLIGHTS', 'COMPLIES_WITH'],
        'compliance': ['COMPLIES_WITH'],
        'video':      ['SUITABLE_FOR', 'HIGHLIGHTS'],
    }
    rel_types = REL_TYPES[task]

    # 2. SQL query (one `?` placeholder per relation type)
    triples = db.query("""
        SELECT s.name AS subject,
               r.rel_type,
               t.name AS object,
               t.attributes,
               r.weight,
               r.evidence
        FROM kg_relations r
        JOIN kg_entities s ON s.id = r.source_id
        JOIN kg_entities t ON t.id = r.target_id
        WHERE r.source_id = ?
          AND r.rel_type IN ({})
        ORDER BY r.rel_type, r.weight DESC
    """.format(','.join('?' * len(rel_types))),
       (product_id, *rel_types))

    # 3. Task-specific Top-K truncation
    if task == 'bullet':
        triples = top_k_per_relation(triples, k=5)  # at most 5 per relation type
    elif task == 'lifestyle':
        triples = top_k(triples, k=4)               # 4 in total

    return triples
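
The two truncation helpers called above are not shown in the source. A minimal sketch of how they might behave, assuming each row is a `Triple` with a numeric `weight` (the `Triple` shape here is hypothetical and only mirrors some of the columns selected by the SQL query):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """Hypothetical row shape; the real project type may differ."""
    subject: str
    rel_type: str
    object: str
    weight: float


def top_k(triples: List[Triple], k: int) -> List[Triple]:
    """Keep the k highest-weight triples across all relation types."""
    return sorted(triples, key=lambda t: t.weight, reverse=True)[:k]


def top_k_per_relation(triples: List[Triple], k: int) -> List[Triple]:
    """Keep the k highest-weight triples within each relation type."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        groups[t.rel_type].append(t)
    result: List[Triple] = []
    for rel_type in sorted(groups):  # deterministic output order
        result.extend(top_k(groups[rel_type], k))
    return result
```

Under this reading, a global `top_k` over the lifestyle rows lets a high-weight `MADE_OF` triple take one of the four slots. If four `SUITABLE_FOR` scenes plus the material are wanted, per-relation truncation may be the better fit.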

Retrieval example

text
HAS_SPEC:
  Product → Size 60x30x80cm        (weight=1.00)
  Product → Load 8kg per tier      (weight=0.90)
  Product → Folded 60x30x8cm       (weight=0.85)
  Product → 3 Tiers                (weight=0.85)
  Product → Total Load 24kg        (weight=0.80)

HIGHLIGHTS:
  Product → Foldable               (weight=1.15) ← boosted by user feedback
  Product → Multi-tier             (weight=1.05)
  Product → 304-grade              (weight=1.00)
  Product → Anti-rust              (weight=0.92)

COMPLIES_WITH:
  Product → Lead-Free              (weight=1.00)
  Product → Food Contact Safe      (weight=1.00)
text
SUITABLE_FOR (Top-4):
  Product → Small Kitchen Apartment (weight=0.95)
  Product → Bathroom Storage         (weight=0.65)
  Product → Office Pantry            (weight=0.60)
  Product → Outdoor Camping          (weight=0.55)

MADE_OF:
  Product → 304 Stainless Steel      (weight=1.00)

Stage 2: Template Filling

Serialize into Cypher-like triples

text
<MADE_OF, Stainless Steel Kitchen Rack, 304 Stainless Steel>
<HAS_SPEC, Stainless Steel Kitchen Rack, Size 60x30x80cm>
<HAS_SPEC, Stainless Steel Kitchen Rack, Load 8kg per tier>
<HAS_SPEC, Stainless Steel Kitchen Rack, Folded Size 60x30x8cm>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Foldable [weight=1.15]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Multi-tier [weight=1.05]>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Lead-Free>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Food Contact Safe>
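
The serialization step can be sketched as a small formatting function. This is an illustration only: the `Triple` shape and the `WEIGHTED_RELATIONS` set are assumptions inferred from the example above, where only `HIGHLIGHTS` and `SUITABLE_FOR` triples carry a `[weight=...]` suffix.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """Hypothetical row shape; the real project type may differ."""
    subject: str
    rel_type: str
    object: str
    weight: float


# Assumption: only these relation types expose their weight to the model.
WEIGHTED_RELATIONS = {"HIGHLIGHTS", "SUITABLE_FOR"}


def serialize_triple(t: Triple) -> str:
    """Render one triple as <REL, subject, object [weight=w]>."""
    obj = t.object
    if t.rel_type in WEIGHTED_RELATIONS:
        obj = f"{obj} [weight={t.weight:.2f}]"
    return f"<{t.rel_type}, {t.subject}, {obj}>"


def serialize_subgraph(triples: List[Triple]) -> str:
    """Join the serialized triples, one per line, ready for the prompt."""
    return "\n".join(serialize_triple(t) for t in triples)
```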

Inject into the system prompt

text
You are a senior Amazon operations specialist with 10 years of experience. Follow the three algorithms A10 / COSMO / Rufus
and write an English Listing.

[Knowledge graph facts (output must be entailed by these)]
<MADE_OF, Stainless Steel Kitchen Rack, 304 Stainless Steel>
<HAS_SPEC, Stainless Steel Kitchen Rack, Size 60x30x80cm>
<HAS_SPEC, Stainless Steel Kitchen Rack, Load 8kg per tier>
<HAS_SPEC, Stainless Steel Kitchen Rack, Folded Size 60x30x8cm>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Foldable [weight=1.15]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, Multi-tier [weight=1.05]>
<HIGHLIGHTS, Stainless Steel Kitchen Rack, 304-grade [weight=1.00]>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Lead-Free>
<COMPLIES_WITH, Stainless Steel Kitchen Rack, Food Contact Safe>

[Hard constraints]
1. Never generate descriptions that conflict with the triples (e.g. 316 / auto-folding / 1 m length)
2. Never use banned words (FDA approved, Antibacterial, #1, Best, ...)
3. Bullet order: use scenario → key parameters → selling points → after-sales commitment → brand extension
4. Start each Bullet with a [STANDOUT PHRASE] to attract clicks
5. Order selling points by descending weight; the highest-weight one goes into Bullet 1

Generate Title + 5-Point Bullets + Description.
text
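You are an expert in Amazon product photography prompts. Based on the knowledge graph below, generate a prompt for a Lifestyle scene image.

[Scene graph]
<SUITABLE_FOR, Product, Small Kitchen Apartment [weight=0.95]>
<MADE_OF, Product, 304 Stainless Steel>

[Hard constraints]
1. The scene must be realistic and believable; avoid implausible layouts
2. Adjust lighting to the scene (kitchen = morning natural light, outdoors = golden hour)
3. The material must be visible (304 stainless steel = brushed metallic sheen)
4. Place the product the way it is actually used (not floating, not tilted)
5. Do not add competitor logos or banned text

Generate an English prompt (for Gemini 3 Pro Image).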
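
Both prompts above share the same skeleton: a role line, a block of serialized triples, a numbered list of hard constraints, and a final request. A minimal sketch of the filling step using the standard library's `string.Template` follows; the template text and the `fill_prompt` name are illustrative assumptions, not the project's actual template.

```python
from string import Template
from typing import List

# Assumed skeleton, modeled on the two prompts shown above.
PROMPT_TEMPLATE = Template(
    "$role\n\n"
    "[Knowledge graph facts (output must be entailed by these)]\n"
    "$facts\n\n"
    "[Hard constraints]\n"
    "$constraints\n\n"
    "$request"
)


def fill_prompt(role: str, facts: List[str], constraints: List[str], request: str) -> str:
    """Fill the skeleton with serialized triples and numbered constraints."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, start=1))
    return PROMPT_TEMPLATE.substitute(
        role=role,
        facts="\n".join(facts),
        constraints=numbered,
        request=request,
    )
```

Because the facts are inserted as plain strings, this stage stays a cheap string operation.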

Stage 3: LLM Inference

Invocation

python
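import openai

# Bullet generation.
# `filled_system_prompt` and `user_request` come from Stage 2.
# Note: JSON mode expects the messages themselves to ask for JSON output.
response = openai.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": filled_system_prompt},
        {"role": "user", "content": user_request}
    ],
    temperature=0.7,
    max_tokens=2048,
    response_format={"type": "json_object"}
)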
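
Since the call requests a JSON object, the reply has to be parsed and checked before post-processing. The source does not specify the JSON schema, so the key names below (`title`, `bullets`, `description`) are assumptions chosen to match the "Title + 5-Point Bullets + Description" request. A minimal validation sketch:

```python
import json
from typing import Any, Dict

# Assumed schema; the source does not define the JSON keys.
REQUIRED_KEYS = {"title", "bullets", "description"}


def parse_listing(raw: str) -> Dict[str, Any]:
    """Parse the model's JSON reply and check its basic shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["bullets"], list) or len(data["bullets"]) != 5:
        raise ValueError("expected exactly 5 bullets")
    return data
```

With the OpenAI Python SDK, the raw string would typically come from `response.choices[0].message.content`.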

Output post-processing

python
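def post_process(llm_output, subgraph):
    """
    Post-processing:
      1. Verify that the output is entailed by the triples
      2. Check for banned-word hits
      3. Check the Bullet order
    """
    # 1. Fact check
    facts = extract_claims(llm_output)
    for claim in facts:
        if not entailed_by(claim, subgraph):
            llm_output = regenerate_with_warning(claim, subgraph)

    # 2. Banned-word check
    for word in BANNED_WORDS:
        if word.lower() in llm_output.lower():
            llm_output = replace_or_regenerate(word, llm_output)

    # 3. Bullet order
    bullets = parse_bullets(llm_output)
    if not is_golden_order(bullets):
        bullets = reorder_to_golden(bullets)
        # Write the reordered bullets back so the returned text reflects the new order.
        # `replace_bullets` is a hypothetical helper, not part of the source.
        llm_output = replace_bullets(llm_output, bullets)

    return llm_output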
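
The banned-word step above uses a plain substring test, which would also flag harmless words such as "bestowed" for the banned word "Best". One possible refinement, sketched here as an assumption rather than the project's implementation, matches whole words or phrases and tolerates a hyphen between the words of a phrase (so "FDA-approved" still counts as "FDA approved"). The `BANNED_WORDS` list reuses the four terms named in the system prompt.

```python
import re
from typing import List

# The four terms listed in the system prompt; the real list is longer.
BANNED_WORDS = ["FDA approved", "Antibacterial", "#1", "Best"]


def find_banned(text: str, banned: List[str] = BANNED_WORDS) -> List[str]:
    """Return the banned phrases that occur in text as whole words."""
    hits = []
    for phrase in banned:
        # Allow whitespace or hyphens between the words of a phrase.
        body = r"[\s-]+".join(re.escape(part) for part in phrase.split())
        # Lookarounds act as word boundaries and also work for '#1'.
        pattern = r"(?<!\w)" + body + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits
```

For example, `find_banned("A bestowed gift")` returns an empty list, while the substring test would report a hit.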

Empirical Comparison

We ran a comparison test on 100 real cross-border e-commerce products:

Scenario: Bullet generation

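| Metric | Direct LLM | + GraphRAG | Relative change |
| --- | --- | --- | --- |
| Factual error rate | 21.0% | 4.8% | ↓ 77% |
| Banned-word hit rate | 11.0% | 0.6% | ↓ 95% |
| Key selling point omission rate | 18.5% | 3.2% | ↓ 83% |
| Bullet order compliance | 62% | 96% | ↑ 55% |
| Output length compliance | 78% | 94% | ↑ 21% |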
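
The improvement figures are relative changes, not percentage-point differences. A short check that reproduces them from the two measured columns (the function name is illustrative):

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# (before, after) pairs taken from the comparison results above.
results = {
    "factual error rate": (21.0, 4.8),
    "banned-word hit rate": (11.0, 0.6),
    "key selling point omission rate": (18.5, 3.2),
    "bullet order compliance": (62.0, 96.0),
    "output length compliance": (78.0, 94.0),
}

for name, (before, after) in results.items():
    print(f"{name}: {relative_change(before, after):+.0f}%")
```

The factual error rate, for instance, falls by 16.2 percentage points, which is a 77% relative reduction from the 21.0% baseline.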

Case comparison

text
Title: Premium Stainless Steel Kitchen Rack with FDA Approved
        Antibacterial Coating - #1 Best Seller!

Bullet 1: [HEALTH & SAFETY] Built with FDA-approved
          antibacterial coating that kills 99.9% of germs.
Bullet 2: [DURABLE] Made of 316 marine-grade stainless steel.
Bullet 3: [SPACE SAVING] Auto-folds in 3 seconds with motorized
          mechanism.
...

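❌ Problems:
- "FDA approved" is a banned term
- "kills 99.9%" is a banned antibacterial claim
- "316" should be 304 (factual error)
- "motorized": the product has no motor (hallucination)
- "#1 Best Seller" is banned by the platform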
text
Title: Foldable 3-Tier Stainless Steel Kitchen Rack -
        304 Stainless, 60x30x80cm, 8kg Load Per Tier

Bullet 1: [USE ANYWHERE] Perfect for small kitchens, apartments,
          bathrooms, and outdoor camping. Folds flat to just
          8cm thick for hassle-free storage.

Bullet 2: [304 STAINLESS STEEL] Genuine 304 SUS construction.
          Lead-free certified and food-contact safe — durable
          enough for daily kitchen use.

Bullet 3: [FOLDABLE 3-TIER DESIGN] 60×30×80cm fully expanded,
          collapses to 60×30×8cm in seconds. No tools required.

Bullet 4: [24KG TOTAL CAPACITY] Three sturdy tiers each support
          8kg — combined 24kg total. Anti-rust coating ensures
          long-lasting performance.

Bullet 5: [1-YEAR WARRANTY] Backed by 24/7 customer service and
          full replacement guarantee. Buy with confidence.

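✅ Improvements:
- Strictly grounded in the GraphRAG triples (304 / 60×30×80cm / 8kg / foldable / lead-free)
- Avoids all banned words (no FDA / antibacterial / #1)
- Golden Bullet order (scenario → parameters → selling points → after-sales)
- Selling points ordered by weight (Foldable=1.15 goes into Bullet 1)
- Length-compliant (each Bullet within 200 characters)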

GraphRAG for Multimodal Generation

Injecting graph facts into image prompts

python
def graphrag_image_prompt(product_id, style):
    """
    Inject GraphRAG facts when building an image prompt.
    """
    triples = retrieve_subgraph(product_id, task='lifestyle' if style == 'lifestyle' else 'main_image')

    base_prompt = STYLE_TEMPLATES[style]
    facts = render_triples_for_image(triples)

    return f"""
    {base_prompt}

    Product context (must be visually consistent):
    {facts}

    Subject must visibly demonstrate:
    - Material: {get_entity('Material', triples)}
    - Key features: {get_top_features(triples, k=2)}

    Negative: avoid showing competing brands, logos, watermarks,
    or any items not consistent with above context.
    """

GraphRAG for video scripts

python
def graphrag_video_script(product_id):
    triples = retrieve_subgraph(product_id, task='video')

    # Scene: pick the Top-1 SUITABLE_FOR target
    scene = top_relation(triples, 'SUITABLE_FOR')

    # Actions: derived from the HIGHLIGHTS features
    features = filter_by_relation(triples, 'HIGHLIGHTS')
    actions = features_to_actions(features)
    # e.g. Foldable → "fold and unfold demonstration"

    return generate_video_prompt(scene, actions)
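
The source gives only one feature-to-action example (Foldable → "fold and unfold demonstration"). A minimal sketch of how `features_to_actions` might work is a lookup table with a generic fallback. It assumes features arrive as `(name, weight)` pairs, and every mapping other than the `Foldable` entry is hypothetical.

```python
from typing import Dict, List, Tuple

FEATURE_ACTIONS: Dict[str, str] = {
    "Foldable": "fold and unfold demonstration",        # from the source
    "Multi-tier": "load each tier with kitchen items",  # hypothetical mapping
}


def features_to_actions(features: List[Tuple[str, float]]) -> List[str]:
    """Map features to on-screen actions, highest weight first."""
    ordered = sorted(features, key=lambda f: f[1], reverse=True)
    return [
        # Unknown features fall back to a generic close-up shot.
        FEATURE_ACTIONS.get(name, f"close-up highlighting {name}")
        for name, _weight in ordered
    ]
```

Ordering by weight keeps the strongest selling point at the start of the script, in line with how weights drive Bullet ordering.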

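Performance Data

| Stage | Latency | Notes |
| --- | --- | --- |
| Subgraph retrieval | < 60 ms | SQLite indexed query |
| Template filling | < 10 ms | String concatenation |
| LLM inference (Bullet) | 8-10 s | GPT-5.5 |
| LLM inference (image) | 12-16 s | Gemini 3 Pro Image |
| Post-processing | < 100 ms | Banned-word check + order validation |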

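Open-sourced under the MIT License · Entry in the Software Application and Development category of the Chinese Collegiate Computing Competition (中国大学生计算机设计大赛)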