#AI #Prompt #Thoughts #howtowriteprompt
**Table of Contents**
[[#Summary for this Blog]]
[[#Why Translation Task?]]
[[#Translation Prompt Idea]]
[[#The Prompt]]
[[#Comparison]]
[[#Evaluation from GPT-4]]
[[#Conclusion]]
---
## Summary for this Blog
<mark style="background: #BBFABBA6;">The blog discusses using pseudo code prompts to instruct large language models (LLMs) for translation tasks and compares their performance with standard prompts when working with models like GPT-4.</mark>
---
In this blog, I will explain how to use pseudo code prompts to instruct large language models (LLMs) to perform translation tasks. I will also highlight the differences between pseudo code prompts and standard prompts when working with LLMs like GPT-4.
## Why Translation Task?
You might wonder why I focus on translation tasks. The reason is that translation is incredibly useful in my daily workflow. As a non-native English speaker, I often find reading lengthy articles challenging. Using a large language model to translate these articles into Chinese, my first language, provides a quick and efficient way to understand the content.
## Translation Prompt Idea
Here is the idea behind my translation prompt. First, I want the LLM to assume the role of an experienced translator who has worked for well-known publications like *The Economist* and *Hacker News*. Second, I will manually define a word list of domain-specific terms that I don't want the LLM to translate, such as JSON, LLM, Claude, GPT, and so on. Third, I will establish a translation workflow for the LLM: receive the user's input, perform the translation, then conduct a self-review to check for grammatical errors and logical mistakes. This process will be executed three times. Finally, the LLM will print out its output.
## The Prompt
My prompt is as follows:
```Markdown
以下是伪代码提示词,请你理解,然后根据我发送给你的内容,按照提示词逻辑,执行翻译。
# (English gloss: "Below is a pseudo code prompt. Please understand it, then
# translate the content I send you, following the prompt's logic.")

class Translate:
    def __init__(self,
                 role="翻译官",  # "translator"
                 skill="擅长将英文准确翻译成中文,精准表达作者的含义",  # "skilled at translating English into accurate Chinese that faithfully conveys the author's meaning"
                 background="多次为《经济学人》,《Hacker News》等著名科技,经济杂志提供过翻译稿",  # "has repeatedly provided translations for renowned tech and economics publications such as The Economist and Hacker News"
                 vocabulary=["token", "json", "XML", "Claude", "GPT", "Gemini"]):
        self.role = role
        self.skill = skill
        self.background = background
        self.vocabulary = vocabulary  # Industry-specific terms that do not need translation

    def translate(self, original_text):
        translated_text = original_text  # Placeholder for translation logic
        return translated_text

    def reflect(self, translated_text, original_text):
        # Placeholder self-review: check grammar and logical consistency
        grammar_check = True
        meaningful_logical_check = True
        if not grammar_check or not meaningful_logical_check:
            corrected_translation = self.translate(original_text)
            return corrected_translation
        else:
            return translated_text

    def get_user_input(self):
        original_text = input("Please enter text to be translated: ")
        return original_text

# Example usage:
translator = Translate()
user_input = translator.get_user_input()
for epoch in range(3):
    translated_text = translator.translate(user_input)
    reflection = translator.reflect(translated_text, user_input)
print(reflection)
# get_user_input
"""
simulate and execute Translate class for following content:
A few prompting techniques have consistently helped improve performance across various models and tasks: n-shot prompts + in-context learning, chain-of-thought, and providing relevant resources.
The idea of in-context learning via n-shot prompts is to provide the LLM with a few examples that demonstrate the task and align outputs to our expectations. A few tips:
If n is too low, the model may over-anchor on those specific examples, hurting its ability to generalize. As a rule of thumb, aim for n ≥ 5. Don’t be afraid to go as high as a few dozen.
Examples should be representative of the expected input distribution. If you’re building a movie summarizer, include samples from different genres in roughly the proportion you expect to see in practice.
You don’t necessarily need to provide the full input-output pairs. In many cases, examples of desired outputs are sufficient.
If you are using an LLM that supports tool use, your n-shot examples should also use the tools you want the agent to use.
In chain-of-thought (CoT) prompting, we encourage the LLM to explain its thought process before returning the final answer. Think of it as providing the LLM with a sketchpad so it doesn’t have to do it all in memory. The original approach was to simply add the phrase “Let’s think step-by-step” as part of the instructions. However, we’ve found it helpful to make the CoT more specific, where adding specificity via an extra sentence or two often reduces hallucination rates significantly. For example, when asking an LLM to summarize a meeting transcript, we can be explicit about the steps, such as:
First, list the key decisions, follow-up items, and associated owners in a sketchpad.
Then, check that the details in the sketchpad are factually consistent with the transcript.
Finally, synthesize the key points into a concise summary.
"""
```
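If you want to run this workflow for real rather than only describe it to the model, the pseudo code maps naturally onto an API loop. Below is a minimal sketch assuming the openai Python SDK (v1-style client) and an `OPENAI_API_KEY` in the environment; the `chat` helper, the model name, and the prompt wording are my own illustrative choices, not part of the original prompt.
```python
# A runnable sketch of the translate-and-reflect workflow.
# Assumes: `pip install openai` (v1 SDK) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

ROLE = ("You are an experienced translator who has worked for publications "
        "such as The Economist. Translate English into accurate, natural Chinese.")
VOCABULARY = ["token", "json", "XML", "Claude", "GPT", "Gemini"]  # keep untranslated

def chat(system: str, user: str) -> str:
    """Single chat-completion call; the model name is an assumption."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def translate(original_text: str) -> str:
    terms = ", ".join(VOCABULARY)
    return chat(ROLE, f"Keep these terms untranslated: {terms}.\n\n"
                      f"Translate the following text into Chinese:\n{original_text}")

def reflect(translated_text: str, original_text: str) -> str:
    """Self-review pass: fix grammatical errors and logical mistakes."""
    return chat(ROLE,
                "Review this Chinese translation against the English original. "
                "Correct any grammatical errors or logical mistakes, and return "
                "only the corrected translation.\n\n"
                f"Original:\n{original_text}\n\nTranslation:\n{translated_text}")

if __name__ == "__main__":
    text = input("Please enter text to be translated: ")
    result = translate(text)
    for _ in range(3):  # three self-review rounds, as in the prompt
        result = reflect(result, text)
    print(result)
```
One deliberate difference from the pseudo code: instead of re-translating from scratch on every epoch, each reflection round here refines the previous output, which is closer to what the self-review step is meant to achieve.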
## Comparison
I will compare the performance of GPT-4 when given the pseudo code prompt versus a standard prompt.
**Result of the pseudo code prompt on GPT-4**
```Markdown
以下是将上述英文内容翻译成中文的结果:
一些提示技术始终有助于提升各种模型和任务的性能:n-shot提示+上下文学习、思维链和提供相关资源。
通过n-shot提示进行上下文学习的想法是向大型语言模型提供一些示例,这些示例展示了任务并使输出符合我们的期望。以下是一些提示:
如果n太低,模型可能会过度依赖这些特定示例,损害其泛化能力。一般来说,目标是n ≥ 5。不要害怕增加到几十个。
示例应该代表预期输入分布。如果您正在构建一个电影摘要器,请包括来自不同类型的样本,大致按照您预期在实践中看到的比例。
您不一定需要提供完整的输入-输出对。在许多情况下,只提供所需输出的示例就足够了。
如果您使用的大型语言模型支持工具使用,您的n-shot示例也应该使用您希望代理使用的工具。
在思维链(CoT)提示中,我们鼓励大型语言模型在返回最终答案前解释其思考过程。可以将其视为为大型语言模型提供一个草图本,这样它就不必全部在内存中完成。最初的方法是简单地添加短语“让我们一步步思考”作为指令的一部分。然而,我们发现通过增加一两句具体描述,使CoT更具体,通常能显著降低产生幻觉的比率。例如,当要求大型语言模型总结会议记录时,我们可以明确步骤,比如:
首先,在草图本上列出关键决策、后续事项及其负责人。
然后,检查草图本中的细节与会议记录事实上是否一致。
最后,将关键点综合成一份简洁的总结。
```
**Result of the standard prompt on GPT-4**
```Markdown
以下是一些在各种模型和任务中持续提高性能的提示技巧:n次提示+上下文学习、思路链条,以及提供相关资源。
通过n次提示的上下文学习的概念是为大型语言模型(LLM)提供几个示例,这些示例展示了任务并将输出与我们的期望对齐。一些建议:
如果n太低,模型可能会过度依赖这些特定示例,从而损害其泛化能力。一般来说,目标是n ≥ 5。不要害怕将数量增加到几十个。
示例应该代表预期的输入分布。如果你正在构建一个电影总结器,应包括来自不同类型的样本,大致按照你期望在实际中看到的比例。
你不一定需要提供完整的输入-输出对。在许多情况下,仅提供所需输出的示例就足够了。
如果你使用的是支持工具使用的LLM,你的n次示例也应该使用你希望代理使用的工具。
在思路链条(CoT)提示中,我们鼓励LLM在返回最终答案之前解释其思考过程。可以将其视为为LLM提供一个草图本,这样它就不必全部在记忆中完成。最初的方法是简单地在指令中添加“让我们一步步思考”的短语。然而,我们发现通过增加一两个具体的句子来使CoT更具体,通常可以显著降低产生错误信息的比率。例如,当要求LLM总结会议记录时,我们可以明确步骤:
首先,在草图本上列出关键决策、后续事项及其负责人。
然后,检查草图本中的细节是否与会议记录事实一致。
最后,将关键点综合成一个简洁的总结。
```
## Evaluation from GPT-4
I asked GPT-4 itself to evaluate the two translation results. The judging step can be scripted as a simple LLM-as-judge call; below is a minimal sketch, assuming the openai Python SDK, where the judge prompt wording is my illustration rather than the exact prompt I used.
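```python
# Minimal LLM-as-judge sketch; assumes the openai v1 SDK and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def judge(original: str, translation_a: str, translation_b: str) -> str:
    """Ask GPT-4 to compare two Chinese translations of the same English text."""
    prompt = (
        "Compare the two Chinese translations of the English text below on "
        "accuracy, fluency, and style, then say which one you prefer and why.\n\n"
        f"Original:\n{original}\n\n"
        f"Translation A:\n{translation_a}\n\n"
        f"Translation B:\n{translation_b}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```
GPT-4's verdict, rendered in English: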
```Markdown
Both Chinese translations stay very close to the original, but there are some subtle differences in expression and style. Here is my assessment:
### First translation:
- **Accuracy**: This translation accurately captures the technical terms and concepts of the original.
- **Fluency**: The text flows well, the wording is precise, and it reads fairly naturally.
- **Style and context**: The translation preserves the formal, instructional tone of the original, suiting readers from an academic or technical background.
### Second translation:
- **Accuracy**: Likewise, this translation conveys the original meaning accurately, with technical terms used appropriately.
- **Fluency**: This translation is equally fluent, but some of its expressions are more idiomatic; for example, its use of "一步步思考" ("think step by step") is closer to natural Chinese phrasing.
- **Style and context**: This translation adds a few explanatory touches that make the content easier to follow, especially for readers unfamiliar with the topic.
### Personal preference:
I lean toward the second translation. Although both are excellent, the second is slightly more intuitive and accessible, especially when explaining complex concepts and steps; the added detail makes the information easier to absorb. Its style is also slightly lighter, which may make for a better reading experience, particularly in long technical texts.
```
Surprisingly, GPT-4 preferred the translation generated from the standard prompt. I see several possible reasons for this:
1. **Simplicity of the Text**: The passage to be translated was too simple to showcase the strengths of the pseudo code prompt, so a standard prompt could handle it just as well.
2. **Quality of the Pseudo Code Prompt**: My pseudo code prompt may not be well optimized for translation tasks; there is certainly room to refine it for better results.
3. **Capability of GPT-4**: GPT-4 is an extremely advanced LLM that may simply not need a pseudo code prompt for a straightforward translation; a plain instruction is sufficient for such an intelligent model.
## Conclusion
If you are using an LLM like GPT-4 to translate short, easily understandable text, a standard prompt is a good choice: it is easy to write and does not require extensive testing. If you want to translate longer and more complex articles, however, a pseudo code prompt can be a valuable alternative for achieving the best results.