Microsoft’s phi-1.5: A 1.3 Billion Parameter Small Language Model


Just when you thought you’d heard everything there is to hear about Large Language Models (LLMs), Microsoft Research goes and does something that shakes the industry once again. In June 2023, Microsoft Research announced phi-1, a new language model for code, in a paper titled “Textbooks Are All You Need.” phi-1 is a 1.3B-parameter transformer-based model that was trained for 4 days on 8 A100 GPUs using a subset of publicly available, “textbook quality” web data.

It would appear that LLMs are shrinking.

Now, thanks to the efforts of Microsoft Research, you can meet phi-1.5, a transformer with 1.3B parameters. Where phi-1 was trained on high-quality, textbook-style code data, phi-1.5 reuses that data and augments it with a large volume of newly generated synthetic “textbook-like” data.
The phi-1.5 network was trained in under 8 days on 32 A100-40G GPUs. The purpose of phi-1.5 was to provide an open-source model that can play a role in the research community: a small, unrestricted model that lets you investigate the many safety concerns associated with LLMs.
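As a rough sanity check on the 1.3B figure, the parameter count can be estimated from the architecture reported for phi-1.5 (24 layers, hidden size 2048). The vocabulary size and the 4x MLP expansion below are assumptions based on the companion phi-1 model, not details stated in this article, so treat this as a back-of-the-envelope sketch:

```python
# Back-of-the-envelope parameter count for a GPT-style transformer.
# Layer count and hidden size follow the phi-1.5 report; vocab size
# and MLP expansion factor are assumptions for illustration.
vocab_size = 51_200
d_model = 2048
n_layers = 24

embedding = vocab_size * d_model       # token embedding matrix
attention = 4 * d_model * d_model      # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)      # up- and down-projections, 4x expansion
per_layer = attention + mlp

total = embedding + n_layers * per_layer
print(f"{total / 1e9:.2f}B parameters")  # roughly 1.3B
```

The estimate lands at about 1.31B, consistent with the advertised 1.3B parameters; biases and layer-norm weights add only a negligible fraction on top.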

Thanks to this synthetic data generation approach, phi-1.5 has been found to beat most comparable LLMs on tougher reasoning tasks, and its performance on natural language benchmarks is comparable to that of models five times its size.

Wow, that’s a lot of work, right?

The model’s training mix is fascinating. It compiles information from several sources: Python code snippets from StackOverflow, synthetic Python textbooks, and exercises generated with GPT-3.5-turbo-0301.

Toxic and biased output is one of the main problems with LLMs. Microsoft Research set out to address the ever-present problem of extremist and hateful material available online.

As can be seen in the figure below, the model trained on synthetic data shows a lower tendency to generate harmful content than other LLMs such as Falcon-7B and Llama 2-7B.

[Figure: safety scores of phi-1.5 compared with other small language models]

On three classes of benchmarks (common-sense reasoning, language skills, and multi-step reasoning), the figure below shows that phi-1.5 fared marginally better than comparable models such as Llama 2-7B, Llama-7B, and Falcon-RW-1.3B.

[Figure: phi-1.5 benchmark results]

Unlike information scraped from the internet, the data used to train phi-1.5 is more akin to what could be found in a textbook. To better understand the model’s limits, the researchers used ToxiGen to analyze how it handles harmful content, crafting 86 prompts and manually grading each response as “pass,” “fail,” or “did not understand.”

In the end, phi-1.5 passed 47 prompts, failed 34, and did not understand 4. When the models were evaluated with the HumanEval methodology, the generated responses show that phi-1.5 outperformed some better-known models.
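For context, HumanEval measures functional correctness: a generated completion counts as a pass only if the benchmark’s unit tests run against it without failing. The function and example task below are an illustrative sketch of that idea, not the real harness (which sandboxes execution and computes pass@k over many sampled completions):

```python
# Toy illustration of HumanEval-style functional-correctness scoring:
# a completion "passes" if the task's unit tests execute without raising.

def check_completion(prompt: str, completion: str, test_code: str) -> bool:
    """Return True if prompt + completion passes the unit tests."""
    program = prompt + completion + "\n" + test_code
    namespace = {}
    try:
        exec(program, namespace)  # run the completed program plus its tests
        return True
    except Exception:
        return False

# Hypothetical task: a prompt, two candidate completions, and hidden tests.
prompt = "def add(a, b):\n"
good = "    return a + b\n"
bad = "    return a - b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

print(check_completion(prompt, good, tests))  # True
print(check_completion(prompt, bad, tests))   # False
```

Scoring by execution rather than by text similarity is what makes HumanEval a meaningful test of whether a code model’s output actually works.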

The most important points to remember about phi-1.5 are as follows:

  • Uses a transformer architecture
  • Is an LLM focused on next-word (semantic) prediction tasks
  • Was trained on 30 billion tokens
  • Was trained on 32 A100-40G GPUs
  • Completed training in 8 days
