Llama 2 Prompt Template
What’s the best-practice prompt template for the Llama 2 chat models?
Note that this applies only to the Llama 2 chat models. The base models have no prompt structure; they are raw models that have not been instruction-tuned.1
The answer is:
If you need newlines escaped, e.g. for use with curl or in the terminal:
<s>[INST] <<SYS>>\n{your_system_message}\n<</SYS>>\n\n{user_message_1} [/INST]
With regular newlines, e.g. for use with text-generation-webui:
<s>[INST] <<SYS>>
{your_system_message}
<</SYS>>

{user_message_1} [/INST]
Without a system message, it looks like this:
<s>[INST] {user_message_1} [/INST]
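As a minimal sketch, both single-turn templates above can be produced by one small Python function. The name build_prompt is hypothetical, and it assumes you send the prompt as raw text with <s> written literally, exactly as in the templates. Passing the result through json.dumps gives the escaped-newline form you would put inside a JSON body, for example in a curl request.

```python
import json
from typing import Optional


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Return a single-turn Llama 2 chat prompt as plain text (hypothetical helper)."""
    if system_message is None:
        # No system block: just wrap the user message in [INST] ... [/INST]
        return f"<s>[INST] {user_message} [/INST]"
    # System message sits inside <<SYS>> tags, followed by a blank line
    return (
        f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


if __name__ == "__main__":
    prompt = build_prompt("Hi, how are you?", "You are a helpful assistant.")
    print(prompt)              # regular newlines, e.g. for text-generation-webui
    print(json.dumps(prompt))  # JSON string with \n escapes, e.g. for a curl payload
```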
To append model responses and continue a conversation, it should look like this (escaped newlines):
<s>[INST] <<SYS>>\n{your_system_message}\n<</SYS>>\n\n{user_message_1} [/INST] {model_reply_1}</s><s>[INST] {user_message_2} [/INST]
With regular newlines, e.g. for use with text-generation-webui:
<s>[INST] <<SYS>>
{your_system_message}
<</SYS>>

{user_message_1} [/INST] {model_reply_1}</s><s>[INST] {user_message_2} [/INST]
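Here is a sketch of the multi-turn template in Python. build_conversation is a hypothetical helper that follows the template above: each completed user/model exchange is wrapped in <s> ... </s>, the system block is folded into the first user message only, and the newest user message is left open so the model replies next.

```python
from typing import List, Optional, Tuple


def build_conversation(
    exchanges: List[Tuple[str, str]],
    next_user_message: str,
    system_message: Optional[str] = None,
) -> str:
    """Build a multi-turn Llama 2 chat prompt as plain text (hypothetical helper).

    exchanges: completed (user_message, model_reply) pairs, oldest first.
    next_user_message: the new user message the model should answer.
    """
    user_turns = [user for user, _ in exchanges] + [next_user_message]
    replies = [reply for _, reply in exchanges]

    if system_message is not None:
        # The system block only appears once, at the start of the first user turn
        user_turns[0] = f"<<SYS>>\n{system_message}\n<</SYS>>\n\n{user_turns[0]}"

    parts = []
    # zip stops at the last completed exchange; each one is closed with </s>
    for user, reply in zip(user_turns, replies):
        parts.append(f"<s>[INST] {user} [/INST] {reply}</s>")
    # The pending user message opens a new <s> but has no closing </s>
    parts.append(f"<s>[INST] {user_turns[-1]} [/INST]")
    return "".join(parts)


if __name__ == "__main__":
    print(build_conversation([("Hi, how are you?", "Good thanks!")],
                             "Can you help me with this math program?",
                             "You are a helpful assistant."))
```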
Which end-of-sequence (EOS) token does Llama 2 use: {EOS} or </s>?
</s>
To confirm this for yourself, download the Llama 2 tokenizer.model file and run this Python script:
from sentencepiece import SentencePieceProcessor


def main():
    # Path to the Llama 2 tokenizer file (adjust if it lives elsewhere)
    model_path = 'tokenizer.model'
    sp_model = SentencePieceProcessor(model_file=model_path)
    # Convert the tokenizer's EOS token ID back into its text piece
    eos_symbol = sp_model.id_to_piece(sp_model.eos_id())
    print(f"End of sequence (EOS) symbol is: {eos_symbol}")


if __name__ == "__main__":
    main()
Is an end-of-sequence token used if there’s only a single message?
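I don’t think so, but I’m not certain. The templates above are consistent with this: </s> only follows a completed model reply, never the pending user message.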
Notes
On the Hugging Face quantized model pages, you’ll see a simple template:
System: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
User: {prompt}
Assistant:
Note: I’ve submitted a pull request to one of these repos, and its page is now being updated.
However, the Llama repo shows something different.2
It’s hard to tell from the code what the prompt looks like as plain text!
So here it is. I took their generation.py and modified the code to print the raw prompt text before it’s passed to the tokenizer.
Here’s the result, using the default system message, a first example user message, a hypothetical model response, and a second example user message:
[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\nHi, how are you? [/INST] Good thanks! \n[INST] Can you help me with this math program? [/INST]
Note that I initially missed the beginning-of-sequence (<s>) and end-of-sequence (</s>) tokens, which don’t appear in this printed text; I’ve since added them to the templates earlier in this post.
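For the curious, here is a simplified Python sketch of how, as I understand it, Meta’s generation.py splits a dialog before tokenizing it. The constant names B_INST, E_INST, B_SYS and E_SYS mirror that file, but dialog_segments is a hypothetical reimplementation for illustration, not the original code. Each text chunk is paired with flags saying whether BOS and EOS should be added; in the reference code those tokens are added as IDs by the tokenizer rather than written into the text, which is why they never show up when the text is printed.

```python
from typing import Dict, List, Tuple

# Tag constants as I understand them from Meta's generation.py
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def dialog_segments(dialog: List[Dict[str, str]]) -> List[Tuple[str, bool, bool]]:
    """Return (text, add_bos, add_eos) for each chunk that would go to the tokenizer.

    dialog: optional leading "system" message, then alternating
    "user"/"assistant" messages, ending with a "user" message.
    """
    if dialog[0]["role"] == "system":
        # Fold the system message into the first user message
        merged = {
            "role": dialog[1]["role"],
            "content": B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"],
        }
        dialog = [merged] + dialog[2:]

    segments = []
    # Completed user/assistant exchanges: BOS before and EOS after
    for prompt, answer in zip(dialog[::2], dialog[1::2]):
        text = f"{B_INST} {prompt['content'].strip()} {E_INST} {answer['content'].strip()} "
        segments.append((text, True, True))
    # Final user message: BOS before but no EOS, so the model replies next
    segments.append((f"{B_INST} {dialog[-1]['content'].strip()} {E_INST}", True, False))
    return segments


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, how are you?"},
        {"role": "assistant", "content": "Good thanks!"},
        {"role": "user", "content": "Can you help me with this math program?"},
    ]
    for text, bos, eos in dialog_segments(example):
        print(repr(text), "bos:", bos, "eos:", eos)
```

Printing just the text chunks, one per line, corresponds to the example above, including the trailing space after "Good thanks!" that comes from the f-string.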
The default system message is:
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
Acknowledgements
Thanks to mike-ravine, viniciusarruda and tmm1.