LLAMA 2 LONG AI – META
Meta quietly unveils Llama 2 Long, an AI model that beats GPT-3.5 Turbo and Claude 2 on some tasks. Llama 2 Long is a new model based on Meta’s open source Llama 2 that has undergone continual pretraining from Llama 2 with longer training sequences, on a dataset in which long texts are upsampled.
Meta’s newly elongated AI model outperforms some of the leading competition at generating responses to long user prompts, including OpenAI’s GPT-3.5 Turbo with its 16,000-token context window and Claude 2 with its 100,000-token context window.
Meta researchers took the original Llama 2 in its different parameter sizes (parameters are the values the algorithm can adjust on its own as it learns, and Llama 2 comes in 7 billion, 13 billion, 34 billion, and 70 billion parameter variants) and included more long-text data sources than the original Llama 2 training dataset.
Then, the researchers kept the original Llama 2 architecture the same and made only one necessary modification: a change to the positional encoding that is crucial for the model’s ability to attend over longer sequences.
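According to the accompanying research paper, this modification is a small change to the Rotary Position Embedding (RoPE) used by Llama 2: raising RoPE’s base frequency hyperparameter (reported as an increase from 10,000 to 500,000), which slows the per-position rotation so that distant tokens retain meaningful attention overlap. A minimal sketch of the idea, assuming the standard RoPE inverse-frequency formula (the function name and dimensions here are illustrative, not Meta’s code):

```python
def rope_inverse_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE: one rotation rate per pair of dimensions,
    # inv_freq_i = base^(-2i / head_dim) for i = 0 .. head_dim/2 - 1.
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Original Llama 2 uses base = 10,000; Llama 2 Long raises the base
# (the paper reports 500,000), shrinking the lowest frequencies and
# stretching the effective wavelength of the positional signal.
orig = rope_inverse_frequencies(128, base=10_000.0)
long_ctx = rope_inverse_frequencies(128, base=500_000.0)

# The slowest rotation becomes even slower, so far-apart positions
# are rotated less relative to one another.
print(f"lowest freq, base 10k:  {orig[-1]:.3e}")
print(f"lowest freq, base 500k: {long_ctx[-1]:.3e}")
```

Because only this scalar changes, the model weights and attention mechanism are untouched, which is why the rest of the architecture could stay identical.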
Using reinforcement learning from human feedback (RLHF), a common AI model training method in which the model is rewarded for correct answers under human oversight, along with synthetic data generated by Llama 2 Chat itself, the researchers were able to improve its performance on common LLM tasks including coding, math, language understanding, commonsense reasoning, and answering a human user’s prompted questions.