
    # Field descriptions constrain the LLM very strongly
    step_by_step_analysis: str = Field(description="""
    Detailed step-by-step analysis of the answer with at least 5 steps and at least 150 words.
    Pay special attention to the wording of the question to avoid being tricked.
    """)

The "at least 5 steps and at least 150 words" constraint keeps the LLM from rushing through its reasoning in two sentences and forces it to spell the analysis out.

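To see whether a model actually honors that constraint, a response can be spot-checked after the fact. The sketch below is an illustration only, not part of the original pipeline: `check_analysis` is a hypothetical helper, and it assumes steps are written as "1.", "2.", and so on, which is a heuristic rather than something the prompt guarantees.

```python
import re


def check_analysis(text: str, min_steps: int = 5, min_words: int = 150) -> dict:
    """Count numbered steps and words in a step_by_step_analysis string."""
    # Heuristic (assumption): a step marker is a 1-2 digit number followed by
    # ". " at the start of the text or after whitespace. Years such as
    # "2022." and decimals such as "4970.5" are not counted.
    steps = re.findall(r"(?:^|\s)(\d{1,2})\.\s", text)
    words = text.split()
    return {
        "steps": len(steps),
        "words": len(words),
        "ok": len(steps) >= min_steps and len(words) >= min_words,
    }


if __name__ == "__main__":
    print(check_analysis("1. Revenue is stated. 2. Done."))
```

A check like this only measures length and structure; it says nothing about whether the reasoning is correct.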
The core constraints of AnswerWithRAGContextNumberPrompt live in the Field description of step_by_step_analysis:


    step_by_step_analysis: str = Field(description="""
    Detailed step-by-step analysis with at least 5 steps and at least 150 words.
    **Strict Metric Matching Required:**
    1. Determine the precise concept the question's metric represents.
    2. Examine potential metrics in the context.
    3. Accept ONLY if: The context metric's meaning *exactly* matches the target metric.
       Synonyms are acceptable; conceptual differences are NOT.
    4. Reject (and use 'N/A') if:
       - The context metric covers more or less than the question's metric.
       - The context metric is a related concept but not the exact equivalent.
       - Answering requires calculation, derivation, or inference.
       - Aggregation Mismatch: question needs a single value but context offers only aggregated total
    5. No Guesswork: If any doubt exists about the metric's equivalence, default to N/A.
    """)


    final_answer: Union[float, int, Literal['N/A']] = Field(description="""
    Pay special attention to any mentions about whether metrics are reported in units, thousands, or millions:
    - Value from context: 4970.5 (in thousands $)
      Final answer: 4970500     ← convert to the actual value by adding three zeros

    Pay attention if value wrapped in parentheses, it means NEGATIVE:
    - Value from context: (2,124,837) CHF
      Final answer: -2124837    ← parentheses mean a negative number (financial statement convention)
    """)

These two rules cover conventions of financial statements that newcomers very often get wrong:

Thousands as the unit: "4,970.5 (in thousands)" in a financial statement means 4,970,500, not 4,970.5.

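The arithmetic behind both conventions is small enough to write down. The sketch below only illustrates what the Field description asks the model to do; in the pipeline described here the conversion is delegated to the LLM, and `normalize_reported_value` with its `unit` argument is a hypothetical helper that might be used to spot-check answers.

```python
# Multipliers for the reporting units mentioned in the Field description.
UNIT_MULTIPLIERS = {"units": 1, "thousands": 1_000, "millions": 1_000_000}


def normalize_reported_value(raw: str, unit: str = "units") -> float:
    """Convert a figure as printed in a report into its actual value."""
    text = raw.strip()
    # Accounting convention: a value wrapped in parentheses is negative.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Drop thousands separators before parsing.
    number = float(text.replace(",", ""))
    value = number * UNIT_MULTIPLIERS[unit]
    return -value if negative else value


if __name__ == "__main__":
    print(normalize_reported_value("4970.5", "thousands"))
    print(normalize_reported_value("(2,124,837)"))
```

The two calls reproduce the examples from the prompt: the first value is scaled by 1,000 and the second comes back negative.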

    step_by_step_analysis: str = Field(description="""
    Detailed step-by-step analysis with at least 5 steps and at least 150 words.
    Pay special attention to the wording of the question to avoid being tricked.
    Sometimes it seems that there is an answer in the context, but this might be
    not the requested value, but only a similar one.
    """)

This works together with the negative case in the Example (one of the most valuable parts of the prompt):


    example = r"""
    Question: "Did W. P. Carey Inc. announce any changes to its dividend policy?"

    Answer:
    {
      "step_by_step_analysis": "... 4. Consistent, incremental increases throughout the year, with explicit mentions of maintaining a 'steady and growing' dividend, indicates no changes to *policy*, though the *amount* increased as planned within the existing policy.",
      "final_answer": False    ← the dividend amount changed but the policy did not, so the answer is False
    }
    """

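This Example draws a key semantic distinction: **"a change in the dividend amount"** ≠ **"a change in dividend policy"**. Its role in the prompt is to show the LLM how fine-grained its distinctions should be, which is more direct than any prose description.
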
4.3 names questions: position titles vs. person names

names questions (plural) are for cases that list several names, for example "which executives changed positions".


    final_answer: Union[List[str], Literal["N/A"]] = Field(description="""
    If question asks about POSITIONS (e.g., changes in positions), return ONLY position titles,
    WITHOUT names or additional info. Appointments on new leadership positions also count as changes.
    If several changes related to position with same title, return title only once.
    Position title should always be in SINGULAR form.
    Example: ['Chief Technology Officer', 'Board Member', 'Chief Executive Officer']

    If question asks about NAMES, return ONLY full names exactly as in context.
    Example: ['Carly Kennedy', 'Brian Appelgate Jr.']
    """)

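The "return title only once" rule is something the model is asked to follow, but it can also be enforced defensively on the returned list. The following is a minimal sketch under that assumption; `dedupe_titles` is a hypothetical helper, and it does not attempt the singular-form rule.

```python
from typing import List


def dedupe_titles(titles: List[str]) -> List[str]:
    """Drop repeated position titles, keeping the first spelling and the original order."""
    seen = set()
    result = []
    for title in titles:
        cleaned = title.strip()
        key = cleaned.casefold()  # compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


if __name__ == "__main__":
    print(dedupe_titles(["Chief Executive Officer", "Board Member", "chief executive officer "]))
```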
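The two share the same base prompt framework but use different schemas, and the Field description for names questions adds rules for deciding whether to return position titles or person names.
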
4.5 The shared base prompt

The instruction part of every question type comes from the same base class:


    class AnswerWithRAGContextSharedPrompt:
        instruction = """
    You are a RAG (Retrieval-Augmented Generation) answering system.
    Your task is to answer the given question based only on information from
    the company's annual report, which is uploaded in the format of relevant
    pages extracted using RAG.

    Before giving a final answer, carefully think out loud and step by step.
    Pay special attention to the wording of the question.
    - Keep in mind that the content containing the answer may be worded
      differently than the question.
    - The question was autogenerated from a template, so it may be meaningless
      or not applicable to the given company.
    """

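The last sentence, "The question was autogenerated from a template, so it may be meaningless", is a hint specific to the competition setting: questions are generated from templates, and some do not apply to a given company (for example "what are the company's main products" when the company is a financial institution with no physical products). In that case the correct answer is N/A, not a forced answer.
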
5. Comparative questions: the Query Routing pattern

5.1 What is a comparative question

"Which company had higher revenue in 2022, 'Apple' or 'Microsoft'?"

    # src/questions_processing.py
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def process_comparative_question(self, question, companies, schema):
        # Step 1: use the LLM to split the comparative question into independent sub-questions
        rephrased_questions = self.openai_processor.get_rephrased_questions(
            original_question=question,
            companies=companies
        )
        # Input:  "Which had higher revenue, 'Apple' or 'Microsoft'?"
        # Output: {
        #     "Apple": "What was Apple's revenue in 2022?",
        #     "Microsoft": "What was Microsoft's revenue in 2022?"
        # }

        # Step 2: run an independent RAG question-answering pass per company, in parallel
        individual_answers = {}
        with ThreadPoolExecutor() as executor:
            futures = {
                executor.submit(self.get_answer_for_company, company, sub_question, "number"): company
                for company, sub_question in rephrased_questions.items()
            }
            for future in as_completed(futures):
                company, answer_dict = future.result()
                individual_answers[company] = answer_dict

        # Step 3: collect the per-company answers and call the LLM once more to compare them
        comparative_answer = self.openai_processor.get_answer_from_rag_context(
            question=question,                # the original comparative question
            rag_context=individual_answers,   # per-company answers serve as the context
            schema="comparative",
            model=self.answering_model
        )

        # Step 4: aggregate the reference pages of all companies
        # (the code that builds aggregated_references is not shown in this excerpt)
        comparative_answer["references"] = aggregated_references
        return comparative_answer

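The fan-out in Step 2 can be tried in isolation. The sketch below is a self-contained illustration of the same `ThreadPoolExecutor` pattern, not the project's code: `answer_for_company` is a stand-in for the per-company RAG call, and it simply returns the example figures used elsewhere in this article.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Tuple


def answer_for_company(company: str, sub_question: str) -> Tuple[str, dict]:
    """Stand-in for the per-company RAG call (hypothetical; returns canned values)."""
    canned = {"Apple": 394.3, "Microsoft": 198.3}
    return company, {"final_answer": canned.get(company, "N/A")}


def fan_out(sub_questions: Dict[str, str]) -> Dict[str, dict]:
    """Answer every sub-question in parallel and key the results by company."""
    answers = {}
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(answer_for_company, company, question)
            for company, question in sub_questions.items()
        ]
        # as_completed yields futures in finish order, so each task returns
        # its own company name to keep results attributable.
        for future in as_completed(futures):
            company, answer = future.result()
            answers[company] = answer
    return answers


if __name__ == "__main__":
    print(fan_out({
        "Apple": "What was Apple's revenue in 2022?",
        "Microsoft": "What was Microsoft's revenue in 2022?",
    }))
```

Returning the company name from the worker, as the original code does, avoids depending on completion order.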
5.3 Step one: the question decomposition prompt

Question decomposition is itself an LLM call, using RephrasedQuestionsPrompt:


    instruction = """
    You are a question rephrasing system.
    Your task is to break down a comparative question into individual questions
    for each company mentioned.

    Each output question must be:
    - self-contained (fully independent, with no reliance on the original question's context)
    - maintain the same intent and metric (keep the metric of the original question)
    - specific to the respective company (rewritten as a question about a single company)
    """

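The "self-contained" constraint is essential. Sending "Which had higher revenue?" to each company as-is would confuse the LLM; every rewritten sub-question has to stand completely on its own, for example "What was Apple's total revenue in fiscal year 2022?".
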
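One cheap sanity check on the decomposition output is to confirm that every company received a sub-question and that the sub-question names that company. This is a sketch under the assumption that the rephrasing call returns a `{company: question}` dict as in the example above; `validate_rephrased` is a hypothetical helper, and naming the company is only a rough proxy for being self-contained.

```python
from typing import Dict, List


def validate_rephrased(rephrased: Dict[str, str], companies: List[str]) -> List[str]:
    """Return a list of problems found in the rephrased sub-questions."""
    problems = []
    for company in companies:
        question = rephrased.get(company)
        if not question:
            problems.append(f"missing sub-question for {company}")
        elif company.lower() not in question.lower():
            # A sub-question that never names its company still depends on
            # the context of the original comparative question.
            problems.append(f"sub-question for {company} does not name the company")
    return problems


if __name__ == "__main__":
    print(validate_rephrased(
        {"Apple": "What was Apple's total revenue in fiscal year 2022?",
         "Microsoft": "Which had higher revenue?"},
        ["Apple", "Microsoft"],
    ))
```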

    user_prompt = """
    Here are the individual company answers:
    \"\"\"
    {context}    ← filled with e.g. {"Apple": {"final_answer": 394.3}, "Microsoft": {"final_answer": 198.3}}
    \"\"\"

    Here is the original comparative question: "{question}"
    """

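Filling this template is a plain `str.format` call with the per-company answers serialized as JSON. The sketch below assumes `json.dumps` is used for serialization, which the source does not confirm; `USER_PROMPT` and `render_user_prompt` are names introduced here for illustration.

```python
import json

# Same structure as the template above, without the explanatory annotation.
USER_PROMPT = '''
Here are the individual company answers:
"""
{context}
"""

Here is the original comparative question: "{question}"
'''


def render_user_prompt(individual_answers: dict, question: str) -> str:
    """Serialize the per-company answers and substitute them into the template."""
    # Braces inside the JSON are safe: format() only interprets the template,
    # not the values substituted into it.
    context = json.dumps(individual_answers, indent=2)
    return USER_PROMPT.format(context=context, question=question)


if __name__ == "__main__":
    print(render_user_prompt(
        {"Apple": {"final_answer": 394.3}, "Microsoft": {"final_answer": 198.3}},
        "Which company had higher revenue in 2022, 'Apple' or 'Microsoft'?",
    ))
```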
Dedicated comparison rules:


    instruction = """
    Important rules for comparison:
    - When the question asks to choose one company, return the company name
      EXACTLY as it appears in the original question
    - If a company's metric is in a different currency than what is asked,
      EXCLUDE that company from comparison
    - If all companies are excluded, return 'N/A'
    """

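For intuition, the same rules can be written as deterministic code. This is only a sketch of the logic the prompt asks the LLM to apply: the pipeline described here leaves the comparison to the model, and the `currency` field, the exclusion of "N/A" answers, and the `pick_highest` helper are all assumptions made for the example.

```python
from typing import Dict


def pick_highest(answers: Dict[str, dict], asked_currency: str) -> str:
    """Return the company with the largest value, or 'N/A' if none qualifies."""
    eligible = {
        company: answer["final_answer"]
        for company, answer in answers.items()
        # Rule: exclude companies reporting in a different currency.
        # Assumption: companies without a numeric answer are excluded too.
        if answer.get("currency") == asked_currency
        and answer.get("final_answer") != "N/A"
    }
    if not eligible:
        return "N/A"  # Rule: all companies excluded
    # Keys are the names as given in the question, so the name is returned unchanged.
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    print(pick_highest(
        {"Apple": {"final_answer": 394.3, "currency": "USD"},
         "Microsoft": {"final_answer": 198.3, "currency": "USD"},
         "ExampleCo": {"final_answer": 900.0, "currency": "CHF"}},
        asked_currency="USD",
    ))
```

ExampleCo is a made-up company: it has the largest number but is dropped because its currency differs from the one asked for.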

    class AnswerSchemaFixPrompt:
        system_prompt = """
    You are a JSON formatter.
    Your task is to format raw LLM response into a valid JSON object.
    Your answer should always start with '{' and end with '}'
    Your answer should contain only json string, without any preambles, comments, or triple backticks.
    """

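A prompt like this is typically used as a fallback: parse the raw response first, and only ask for a reformat when parsing fails. The sketch below shows that control flow under stated assumptions. `parse_json_with_fix` is a hypothetical helper, and `strip_wrappers` is a local stand-in for the LLM reformatting call so that the example runs offline.

```python
import json
from typing import Callable


def parse_json_with_fix(raw: str, fix: Callable[[str], str]) -> dict:
    """Parse raw as JSON; on failure, pass it through `fix` once and retry."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # In the real pipeline `fix` would be an LLM call using the
        # formatter prompt; a second failure is allowed to propagate.
        return json.loads(fix(raw))


def strip_wrappers(raw: str) -> str:
    """Offline stand-in for the fixer: keep the text from the first '{' to the last '}'."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return raw[start:end + 1]


if __name__ == "__main__":
    messy = 'Sure! Here is the answer:\n{"final_answer": 42}\nHope that helps.'
    print(parse_json_with_fix(messy, strip_wrappers))
```

Retrying only once keeps a persistently malformed response from looping forever.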

    def build_system_prompt(instruction, example, pydantic_schema=""):
        schema_block = (
            f"Your answer should be in JSON and strictly follow this schema:\n"
            f"```\n{pydantic_schema}\n```"
        )
        # Assemble: instruction + schema (optional) + example
        return instruction + schema_block + example

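The comment describes the schema as optional, so one possible variant skips the schema block when no schema is passed and separates the sections with blank lines. This is a sketch of that reading, not the project's implementation; `build_system_prompt_sketch` and its `schema_text` parameter are names introduced here.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal code fence


def build_system_prompt_sketch(instruction: str, example: str = "", schema_text: str = "") -> str:
    """Join instruction, optional schema block, and optional example with blank lines."""
    parts = [instruction.strip()]
    if schema_text:
        parts.append(
            "Your answer should be in JSON and strictly follow this schema:\n"
            f"{FENCE}\n{schema_text.strip()}\n{FENCE}"
        )
    if example:
        parts.append(example.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt_sketch(
        "Answer the question.",
        example="Example: ...",
        schema_text="final_answer: bool",
    ))
```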

    answer = processor.get_answer_from_rag_context(
        question="What is the operating margin for Tradition in 2022?",
        rag_context=rag_context,
        schema="number",
        model="gpt-4o-mini-2024-07-18"
    )


    # ✅ Correct approach: tell the LLM in the Field description to handle it itself
    final_answer: Union[float, int, Literal['N/A']] = Field(description="""
    Pay attention to mentions of thousands/millions and adjust accordingly.
    Value from context: 4970.5 (in thousands $) → Final answer: 4970500
    """)