## 1. Introduction to Classical Chinese and the Challenges in Understanding It

Classical Chinese, as a vehicle of Chinese cultural heritage, encompasses rich linguistic and cultural knowledge. Because of its evolution over thousands of years, classical Chinese texts exhibit complex linguistic phenomena, including changes in characters, vocabulary, grammar, and phonetics; each historical period introduced new variations, making comprehension increasingly difficult for modern readers. Beyond language, understanding classical Chinese texts also requires knowledge of historical and cultural contexts: subtle references, varied names for people and places, and the frequent use of allusions further complicate interpretation. This dual challenge of language and culture poses significant difficulties for modern readers and remains a key issue in the study and digitization of classical texts today.

To assist in the organization of ancient Chinese texts, classical Chinese language education, and digital humanities research, we have developed a large language model (LLM) tailored specifically to classical Chinese: Literate Soul LLM. This presentation discusses the model's architecture, data processing techniques, base training, and fine-tuning, and explores its potential applications in education, tourism, and academia.
Name: Yunjia Zhang
Email: rachalzhangyunjia@gmail.com
## Directory Structure
├── Code
│ ├── Frontend
│ ├── Model Reasoning
│ └── Model Training
├── Dataset_example_with_code
│ ├── Ancient_Chinese
│ ├── Ancient_Modern_Chinese
│ ├── CodeForDataset
│ └── statistic.md
├── MaterialForDemo
│ ├── TranscriptForDemo.pdf
│ ├── YunjiaZhang_AMD_DEMO.mp4
│ ├── YunjiaZhang_AMDmi210project.pdf
│ └── YunjiaZhang_AMDmi210project.pptx
├── README
│ ├── LLM-as-a-Meta-Judge.png
│ ├── Last login- Tue Jul 2 07-21-36 2024 from 172.17.0.1.png
│ ├── Model- Establishment.png
│ ├── README_DPO_procedure.md
│ ├── README_EN.md
│ ├── README_ModelTraining_and_Reasoning.md
│ ├── README_frontend.md
│ ├── older_version_of_README
│ └── successful_conda -V.png
## README_ModelTraining_and_Reasoning (Documentation Section)

### Environment Setup

#### Install Python

sudo apt-get update
sudo apt-get install python3-pip
#### Open Administrator Privileges

sudo -i
#### Install PyTorch

You can find the installation command on the PyTorch official website: https://pytorch.org. If you plan to use a different LLM, download the corresponding dependencies as needed.
Due to limitations with vllm, the latest version of PyTorch cannot be installed, so we use the following command:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/rocm5.7
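To confirm that the ROCm build of PyTorch can see the MI210, you can run a quick check (a minimal sketch; on ROCm builds, PyTorch reports the GPU through the usual torch.cuda interface):

```python
# Quick sanity check for the ROCm PyTorch install on the AMD MI210.
import torch

print(torch.__version__)          # should report 2.2.2+rocm5.7 for the wheel above
print(torch.cuda.is_available())  # True if the MI210 is visible to PyTorch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```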
#### Install the Conda Environment

Find the appropriate version of Anaconda at https://repo.anaconda.com/archive/index.html and download it using the wget command.
wget -c https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
In the directory containing the downloaded installer, enter the command below to install, and keep pressing Enter until prompted to type "yes."
bash Anaconda3-2024.02-1-Linux-x86_64.sh
It may take a few minutes to extract and execute.
Then, use this command to check whether Conda is correctly installed.
conda -V
If you see a "source: not found" error on the AMD MI210, enter the following line:

export PATH=~/anaconda3/bin:$PATH

Then try conda -V again.
#### Install LLaMA-Factory

First, install git and clone the LLaMA-Factory repository.
sudo apt-get install git
git clone https://github.com/hiyouga/LLaMA-Factory.git
git pull
Create the Conda environment.
conda create -n llama_factory python=3.10
Activate the Conda environment.
conda activate llama_factory
If you encounter an error on the AMD MI210 asking you to run conda init, enter:
export PATH=~/anaconda3/bin:$PATH
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/aac/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/home/aac/anaconda3/etc/profile.d/conda.sh" ]; then
. "/home/aac/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/home/aac/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
After completion, activate the Conda environment.
conda activate llama_factory
Switch to the directory and install LLaMA Factory.
cd LLaMA-Factory
pip install -e .[metrics]
At this point, LLaMA-Factory installation is complete.
#### Install Django

pip install django
### Start Django

#### Create a Project with Django

After installing Django, we use the built-in scaffold tool django-admin to create a project:
django-admin startproject django_news
cd django_news
The generated project skeleton and the purpose of each file are as follows:
django_news
├── django_news // Global project file directory
│ ├── __init__.py
│ ├── settings.py // Global configuration
│ ├── urls.py // Global routing
│ └── wsgi.py // WSGI service interface (no need to worry about this for now)
└── manage.py // Project management script
#### Run the Django Project

We use manage.py to run the development server:
python manage.py runserver
#### Add the Custom App to the Global Configuration

In settings.py, add the news application to INSTALLED_APPS:
# ...
INSTALLED_APPS = [
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
"django.contrib.sessions",
"django.contrib.messages",
"django.contrib.staticfiles",
"news",
]
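For reference, the news app registered above is created separately (for example with python manage.py startapp news) and holds the views that the frontend calls. The sketch below is purely illustrative; the view name, request fields, and placeholder model call are assumptions, not the project's actual code:

```python
# news/views.py -- illustrative sketch only; the real project would call the
# fine-tuned model's inference backend instead of returning a placeholder.
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def translate(request):
    """Accept a classical Chinese passage and return a (placeholder) modern rendering."""
    payload = json.loads(request.body or "{}")
    classical_text = payload.get("text", "")
    return JsonResponse({"classical": classical_text, "modern": "<model output here>"})
```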
### Training the LLM

#### Choose an Appropriate Base LLM Model

Here, we select Qwen1.5-7B-Chat.
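As an optional sanity check before fine-tuning (not part of the original workflow), the base model can be loaded directly with Hugging Face transformers; this sketch assumes accelerate is installed for device_map="auto", and the prompt is an arbitrary example:

```python
# Load the Qwen1.5-7B-Chat base model from the Hugging Face Hub and run one
# short generation to confirm the checkpoint works on this machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "请将下面的文言文翻译成现代汉语:学而时习之,不亦说乎?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```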
#### Process the SFT Training Dataset

For the format of the fine-tuning dataset, please refer to: https://github.com/hiyouga/LLaMA-Factory/tree/main/data
#### Import the Dataset

Switch to the data directory, use scp to transfer the dataset to the server (optional), and modify the dataset_info.json file.
cd data
Use scp to transfer the dataset to the server. Example:
scp [local file path] [name@xx.xx.xx.xx]:[target file path]
You need to install vim first:
sudo apt-get install vim
Switch to the data directory and modify the dataset_info.json file.
cd data
vim dataset_info.json
Press i to enter insert mode in vim and modify the file.
Add the following to dataset_info.json:
"modernchinese_classicalchinese":{
"file_name":"modernchinese_classicalchinese.parquet",
"columns":{"prompt":"classical",
"response":"modern"
}
}
After editing, press Esc, then type :wq to save and exit vim.
modernchinese_classicalchinese is the name displayed in the LLaMA Factory interface, and modernchinese_classicalchinese.parquet is the file name of the dataset in the data folder. The columns field specifies the corresponding column names for prompt and response.
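For reference, the parquet file itself can be produced from a two-column table with pandas (a minimal sketch; pyarrow is assumed to be installed, the example sentence pair is a placeholder, and only the column names "classical" and "modern" come from dataset_info.json above):

```python
# Illustrative sketch: build modernchinese_classicalchinese.parquet with the two
# columns referenced in dataset_info.json. The example row is a placeholder.
import pandas as pd

pairs = pd.DataFrame(
    {
        "classical": ["学而时习之,不亦说乎?"],
        "modern": ["学了知识之后按时温习,不也是很愉快的吗?"],
    }
)
pairs.to_parquet("modernchinese_classicalchinese.parquet", index=False)
```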
Once completed, return to the parent directory.
cd ..
#### Set Specific Parameters in LLaMA Factory

Parameters in LLaMA Factory need to be set by yourself. It is recommended to check the PiSSA option for faster initialization. Set the learning rate to 5e-5 for optimal performance.
### Building the DPO Dataset with Meta-Rewarding

This procedure is inspired by Meta's paper "Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge": https://doi.org/10.48550/arXiv.2407.19594

Figure 1: Meta-Rewarding iterative training scheme.
We need three prompts to generate judgements.
(Use the prompts in Chinese; the English versions are provided for reading.)

### Prompt 1: Evaluation Prompt (used in data creation; see Figure 1, the Meta-Rewarding iterative training scheme)

This is an alternative Qwen2 judge prompt; the prompt is from AlpacaEval.
<|im_start|>system
You are a highly efficient assistant, who evaluates and selects the best large language models (LLMs) based on the quality of their responses to a given instruction. This process will be used to create a leaderboard reflecting the most accurate and human-preferred answers.
<|im_end|>
<|im_start|>user
I require a leaderboard for various large language models. I'll provide you with prompts given to these models and their corresponding outputs. Your task is to assess these responses, and select the model that produces the best output from a human perspective.

## Instruction

{
    "instruction": ""{instruction}"",
}

## Model Outputs

Here are the unordered outputs from the models. Each output is associated with a specific model, identified by a unique model identifier.

{
    {
        "model identifier": "m",
        "output": ""{output 1}""
    },
    {
        "model identifier": "M",
        "output": ""{output 2}""
    }
}

## Task

Evaluate the models based on the quality and relevance of their outputs, and select the model that generated the best output. Answer by providing the model identifier of the best model. We will use your output as the name of the best model, so make sure your output only contains one of the following model identifiers and nothing else (no quotes, no spaces, no new lines, ...): m or M.

## Best Model Identifier

<|im_end|>

The Chinese version of this prompt, which is the one actually used, follows.
我需要一个针对多种大型语言模型的排行榜。我将向你提供给这些模型的指令以及它们对应的输出结果。你的任务是评估这些响应,并从人类的角度选出产生最佳输出的模型。
## 指令
{
“instruction”: ““{instruction}””, }
## 模型输出
以下是来自各模型的未排序输出。每个输出都与一个特定模型相关联,该模型通过一个独特的模型标识符进行识别。
{
{
“model identifier”: “m”,
“output”: ““{output 1}”” },
{
“output”: ““{output 2}”” }
}
## 任务
根据输出的质量和相关性评估这些模型,并选择产生最佳输出的模型。回答时,请提供最佳模型的标识符。我们将使用你的输出作为最佳模型的名称,因此确保你的输出只包含以下模型标识符之一,且没有任何其他字符(无引号,无空格,无换行符等):m或M。
## 最佳模型标识符
请注意,上述模板中,"instruction" 和 "output" 都应被实际的指令和输出所替换,以便进行实际的评估。在给出评估时,应详细比较各模型的输出,考量其准确性、流畅性、相关性和对指令的遵循程度,从而选出最佳模型。<|im end|>
### Prompt 2: Judge Evaluation Prompt

Review the user's query and the corresponding response, assessing their quality according to the following grading system. This system is divided into three criteria:

- Information Coverage:
  - 0 points: Completely Missing: The response fails to provide any relevant information, completely failing to address the user's query.
  - 1 point: Incomplete: The response provides very little information, failing to touch on the core of the user's query, leaving the user in need of additional information for a satisfactory answer.
  - 2 points: Partially Covered: The response touches on some aspects of the user's query, but does not comprehensively address it, potentially omitting key details or information.
  - 3 points: Moderately Covered: The response is relatively comprehensive, covering multiple aspects of the user's query, but still has shortcomings, failing to fully satisfy all needs.
  - 4 points: Fully Covered: The response comprehensively addresses the user's query, encompassing most of the relevant information, allowing the user to gain a fairly complete understanding.
  - 5 points: Fully Covered with Extra: The response not only comprehensively answers the user's query but also provides additional relevant information, helping the user gain a deeper understanding of the topic.
- Conciseness:
  - 0 points: Redundant: The response contains a large amount of unnecessary content, making the information appear cluttered and difficult to extract useful information.
  - 1 point: Verbose: The response is lengthy, with overly wordy language that makes it difficult for the user to understand.
  - 2 points: Verbosity: Uses too many lengthy sentences or complex vocabulary, failing to clearly convey the core information.
  - 3 points: Reasonable: The response is reasonably expressed, without being overly lengthy, but there is still room for improvement to further simplify it.
  - 4 points: Concise: The response is concise and to the point, with clear information transmission, allowing the user to quickly grasp the key points.
  - 5 points: Precise: The response is precisely to the point, with concise wording, no redundancy, and an excellent reading experience.
- Language Fluency:
  - 0 points: Incoherent: The response is poorly organized in language, with unclear logic, making it difficult to understand.
  - 1 point: Stiff: The language expression appears stiff, lacking a natural flow, affecting the user's reading experience.
  - 2 points: Unnatural: Although understandable, the language use is not natural, potentially causing discomfort to the user in reading.
  - 3 points: Understandable: The response can be understood, but there is still room for improvement in fluency and naturalness, with some sentences possibly appearing stiff.
  - 4 points: Fluent: The language is used fluently, with reasonable sentence structure, providing a good reading experience for the user.
  - 5 points: Perfect: The response is fluent and natural in language, with no grammatical errors, demonstrating high-level writing ability, and is easily understood.
User: {query}
<response>{response}</response>
After examining the user’s instruction and the response:
- Briefly justify your total score, up to 100 words.
- Conclude with the score using the format: "Score: <total points>"
Remember to assess from the AI Assistant perspective, utilizing web search knowledge as necessary.
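When this judge prompt is used programmatically (as in the DPO-generation code later in this README), the final score line can be pulled out of the judge's reply with a small helper. This is a minimal sketch assuming the reply ends with "Score: <total points>" (or "分数:<总分>" for the Chinese version); the function name and regex are illustrative, not part of the original code:

```python
# Illustrative helper (not in the original code): extract the total score from a
# judge reply produced with Prompt 2, in either the English or Chinese format.
import re

def parse_judge_score(judge_reply: str):
    """Return the total score as an int, or None if no score line is found."""
    match = re.search(r"(?:Score|分数)\s*[::]\s*(\d+)", judge_reply)
    return int(match.group(1)) if match else None

print(parse_judge_score("回答全面、简洁且流畅。分数:14"))   # -> 14
print(parse_judge_score("Accurate and fluent. Score: 13"))  # -> 13
```

The Chinese version of this judge prompt, which is the one used in practice, follows.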
评估用户查询和响应质量
根据以下评分系统审查用户查询和对应响应的质量。此系统分为三个标准:
信息覆盖:
0分:完全缺失:响应未能提供任何相关信息,完全未能解决用户的问题。
1分:不完整:响应提供的信息很少,未能触及用户问题的核心,用户需要额外信息才能获得满意的答案。
2分:部分覆盖:响应涉及了用户问题的某些方面,但未全面解决,可能遗漏了关键细节或信息。
3分:适度覆盖:响应较为全面,涵盖了用户问题的多个方面,但仍存在不足,未能完全满足所有需求。
4分:全面覆盖:响应全面解决了用户的问题,包含了大部分相关信息,使用户能够获得较为完整理解。
5分:全面覆盖并额外:响应不仅全面解答了用户的问题,还提供了额外的相关信息,帮助用户更深入地理解主题。
简洁性:
0分:冗余:响应包含大量不必要的内容,使信息显得混乱,难以提取有用信息。
1分:啰嗦:响应内容繁多,语言表达过于冗长,使用户难以理解。
2分:冗长:使用了过多冗长的句子或复杂的词汇,未能清晰传达核心信息。
3分:合理:响应表达合理,未过于冗长,但仍有改进空间以进一步简化。
4分:简洁:响应简洁明了,信息传达清晰,使用户能够迅速抓住要点。
5分:精准:响应精准直达主题,用词精炼,无冗余,提供极佳的阅读体验。
语言流畅性:
0分:不连贯:响应的语言组织混乱,逻辑不清,难以理解。
1分:生硬:语言表达显得生硬,缺乏自然流畅感,影响用户的阅读体验。
2分:不自然:虽然可以理解,但语言使用不够自然,可能给用户带来阅读不适。
3分:可理解:响应可以被理解,但在流畅性和自然性上有改进空间,某些句子可能显得生硬。
4分:流利:语言使用流畅,句子结构合理,为用户提供良好的阅读体验。
5分:完美:响应语言流畅自然,无语法错误,展现出高水平的写作能力,易于理解。
用户:{query}
<response>{response}</response>
在审查用户指示和响应后:
简要说明总分理由,不超过100字。
以总分结束,格式为:“分数:<总分>”
请从AI助手的视角进行评估,必要时利用网络搜索知识。
### Prompt 3: Meta-Judge Evaluation Prompt

As a meta-judge, review the user's question, the model's response, and the two previous judgments. Determine which judgment aligns better with the scoring rubric below:

- 1 point for relevance.
- 2 points for substantial, but not complete, addressing of the question.
- 3 points for useful answers to basic elements.
- 4 points for clear, organized AI Assistant responses.
- 5 points for expert-level, engaging responses tailored to the question.

User: {user_query}
Response: {model_response}
Judgment A: {judgment_a}
Judgment B: {judgment_b}

After examining the original question, response, and both judgments:

- Explain which judgment is more accurate according to the original rubric and why. Consider factors such as adherence to the rubric, accuracy in evaluating the response, and consistency in applying the criteria.
- After review, explain which judgment is more accurate according to the rubric and conclude with: "Winner: [Judgment A | Judgment B]".
### 提示3:元法官评估
**提示内容:**
作为元法官,回顾用户的问题、模型的回答以及前两个判决。根据以下评分标准判断哪个判决更符合要求:
- 1分:相关性。
- 2分:实质上回答了问题,但不完全。
- 3分:对基本要素有实用的回答。
- 4分:清晰、有条理的AI助手回答。
- 5分:专家级、引人入胜的回答,针对问题定制。
- 用户: {用户查询}
回答: {模型回答}
判决A: {判决A}
判决B: {判决B}
- 在审查原始问题、回答和两个判决后:
- 解释哪个判决根据原始评分标准更准确,为什么。考虑的因素包括遵循评分标准、评估回答的准确性以及一致应用标准。
- 审查后,解释哪个判决更准确并以以下方式结束:“胜者:[判决A | 判决B]”。
### Use the Code to Automatically Apply Prompt 2 and Prompt 3 to Generate the DPO Dataset

This code structure allows you to adapt the input easily while ensuring the output is formatted for DPO usage.
import openai
import json

openai.api_base = 'https://api.openai.com/v1'  # change this to your vllm server address
openai.api_key = ''  # keep it empty

def generate_output(prompt):
    """Generate output from the model based on the provided prompt."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # vllm needs this to act as gpt-3.5-turbo even if the project uses Qwen
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message['content']

def collect_results(input_data):
    """Collect results by evaluating model outputs using judge and meta-judge prompts."""
    results = []
    for data in input_data:
        # Generate judge output using prompt2
        judge_prompt = f"Evaluate the following output: {data['output']}"
        judge_output = generate_output(judge_prompt)
        # Generate meta-judge output using prompt3
        meta_judge_prompt = f"Evaluate the judge's evaluation: {judge_output}"
        meta_judge_output = generate_output(meta_judge_prompt)
        results.append({
            'input': data['input'],
            'output': data['output'],
            'judge': judge_output,
            'meta_judge': meta_judge_output
        })
    return results

def save_results_to_json(results, filename='dpo_data.json'):
    """Save the collected results to a JSON file for DPO usage."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=4)

# Example input data format
# The input data should be a list of dictionaries, each containing 'input' and 'output' keys.
# Example:
input_data = [
    {'input': 'Input example 1', 'output': 'Model output 1'},
    {'input': 'Input example 2', 'output': 'Model output 2'},
    {'input': 'Input example 3', 'output': 'Model output 3'},
]

results = collect_results(input_data)
save_results_to_json(results)
You can also create a JSON file named input_data.json:
[
{"input": "Input example 1", "output": "Model output 1"},
{"input": "Input example 2", "output": "Model output 2"},
{"input": "Input example 3", "output": "Model output 3"}
]
and load the json file if you have a large amount of data:
with open('input_data.json', 'r', encoding='utf-8') as f:
    input_data = json.load(f)
Example Output:
[
{
"input": "古代汉语的语法特点是什么?",
"output": "古代汉语的语法特点包括主谓宾的固定顺序、丰富的词类变化以及使用助词和虚词来表达句子的意义和结构。",
"judge": "回答准确,涵盖了古代汉语的主要语法特点。",
"meta_judge": "评估有效,正确判断了输出的准确性和完整性。"
},
{
"input": "古代汉语的声调是如何发展的?",
"output": "古代汉语的声调经历了多次演变,从最初的平仄对立到后来的声调系统,影响了现代汉语的声调分类。",
"judge": "描述清晰,说明了声调发展的过程和影响。",
"meta_judge": "评估准确,充分理解了输出内容的历史背景。"
},
{
"input": "古代汉语与现代汉语有什么区别?",
"output": "古代汉语与现代汉语在词汇、语法、发音和使用场合等方面存在显著差异,古代汉语使用的词汇更加丰富,语法结构也更加灵活。",
"judge": "全面列举了区别,涵盖多个方面。",
"meta_judge": "评估充分,正确识别了输出内容的多维度特征。"
},
{
"input": "解释“子曰”的用法。",
"output": "“子曰”是古代汉语中用于引用孔子言论的常用表达,表示某个观点或思想是由孔子所提出。",
"judge": "解释准确,清楚地说明了用法。",
"meta_judge": "评估良好,确认了输出的语境和文化意义。"
},
{
"input": "古代汉语的文言文有哪些特点?",
"output": "文言文的特点包括简练、优雅,常用对仗、排比等修辞手法,且语法结构较为复杂,强调省略和暗示。",
"judge": "回答详尽,突出了文言文的主要特点。",
"meta_judge": "评估有效,准确反映了输出的细节和深度。"
},
{
"input": "古代汉语中常见的成语是如何形成的?",
"output": "古代汉语中的成语通常来源于历史故事、文学作品或民间传说,经过长期使用而形成固定的表达方式。",
"judge": "描述准确,解释了成语形成的多种来源。",
"meta_judge": "评估合理,清晰识别了输出内容的来源与演变。"
},
{
"input": "“道”字在古代汉语中的多重含义是什么?",
"output": "“道”在古代汉语中具有多重含义,包括道路、方法、道理,以及道家哲学中所指的自然法则和宇宙真理。",
"judge": "回答全面,涵盖了“道”字的多个重要含义。",
"meta_judge": "评估准确,成功识别了输出内容的复杂性和多义性。"
},
{
"input": "古代汉语中常见的修辞手法有哪些?",
"output": "常见的修辞手法包括比喻、拟人、排比、对仗和夸张,这些手法丰富了古代文人的表达。",
"judge": "列举了多种修辞手法,解释清晰。",
"meta_judge": "评估有效,准确判断了输出内容的丰富性。"
}
]
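To turn judged records like these into preference pairs for DPO training, each input needs at least two candidate outputs so that the judge scores can select a chosen and a rejected response. The sketch below is illustrative only: the pairing logic and output fields are assumptions, and the exact schema expected by LLaMA-Factory should be checked against its data README linked above.

```python
# Illustrative sketch: build chosen/rejected pairs from scored candidates per
# input. `score_with_judge` stands in for calling generate_output() with Prompt 2
# and parsing the "Score:" line; the output fields are an assumption, not the
# official LLaMA-Factory DPO format.
import json

def build_dpo_pairs(candidates, score_with_judge):
    """candidates: list of {'input': str, 'outputs': [str, ...]} with >= 2 outputs each."""
    pairs = []
    for item in candidates:
        ranked = sorted(item['outputs'], key=score_with_judge, reverse=True)
        pairs.append({
            'prompt': item['input'],
            'chosen': ranked[0],     # highest-scoring candidate
            'rejected': ranked[-1],  # lowest-scoring candidate
        })
    return pairs

if __name__ == '__main__':
    demo = [{'input': '解释“子曰”的用法。', 'outputs': ['候选回答一', '候选回答二,内容更完整。']}]
    print(json.dumps(build_dpo_pairs(demo, score_with_judge=len), ensure_ascii=False, indent=2))
```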
## README_frontend

### Development

Before starting development, you must create a new .env.local file at the project root and place your API key into it:
OPENAI_API_KEY=<your api key here>
BASE_URL=<your base url>
### Local Development

# 1. install nodejs and yarn first
# 2. config local env vars in `.env.local`
# 3. run
yarn install
yarn dev
### Deployment

#### Docker (Recommended)

docker pull yidadaa/chatgpt-next-web
docker run -d -p 3000:3000 \
-e OPENAI_API_KEY=sk-xxxx \
-e CODE=your-password \
yidadaa/chatgpt-next-web
To start the service behind a proxy:
docker run -d -p 3000:3000 \
-e OPENAI_API_KEY=sk-xxxx \
-e CODE=your-password \
-e PROXY_URL=http://localhost:7890 \
yidadaa/chatgpt-next-web
If the proxy requires a password, use:
-e PROXY_URL="http://127.0.0.1:7890 user pass"