
Reinforcement Learning from Human Feedback (RLHF)

We fine-tuned our model using Reinforcement Learning from Human Feedback (RLHF). First, we trained a reward model to score the utterances the model generated while conversing with humans; this helped us identify and correct the model's errors and steer it toward more productive conversations. We then used RLHF to adjust the model's parameters so as to maximize the reward assigned by the reward model.
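The reward-maximization step above can be sketched with a toy policy-gradient update. This is a minimal illustration, not the actual training code: the candidate replies are invented, and the "reward model" here is a stand-in heuristic rather than a learned scorer trained on human preference labels.

```python
import math

def softmax(logits):
    """Convert raw preference scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical candidate replies for a single prompt.
candidates = ["ok", "Sure, here is how that works:", "I don't know"]

def reward_model(reply):
    # Toy stand-in for a learned reward model: prefers more
    # substantive (longer) replies.
    return len(reply.split())

# One REINFORCE-style step: raise each reply's logit in proportion
# to its probability times its advantage (reward minus baseline).
logits = [0.0, 0.0, 0.0]
lr = 0.1
probs = softmax(logits)
baseline = sum(p * reward_model(c) for p, c in zip(probs, candidates))
for i, c in enumerate(candidates):
    advantage = reward_model(c) - baseline
    logits[i] += lr * probs[i] * advantage

# The highest-reward candidate now has the largest logit.
best = candidates[max(range(len(logits)), key=lambda i: logits[i])]
```

In the real setting the policy is the language model itself and the update is typically a PPO-style optimization, but the principle is the same: shift probability mass toward responses the reward model scores highly.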
We trained an initial model using supervised fine-tuning: human AI trainers wrote responses, with model-written suggestions available to help them compose their replies.
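A single supervised fine-tuning example under this setup might look like the sketch below. The field names are our own illustration of the idea, not the actual dataset schema: the trainer sees a model-written suggestion but the target the model is trained on is the trainer's own response.

```python
# Hypothetical record: a trainer writes the target response, optionally
# starting from a model-written suggestion shown in the labeling tool.
sft_example = {
    "prompt": "Explain why the sky is blue.",
    "model_suggestion": "The sky is blue because of light scattering.",
    "trainer_response": (
        "Sunlight scatters off air molecules, and shorter (blue) "
        "wavelengths scatter the most, so the sky appears blue."
    ),
}

def to_training_pair(example):
    """Supervised fine-tuning learns to map prompt -> trainer response;
    the model suggestion is only an aid to the human, not a label."""
    return example["prompt"], example["trainer_response"]

prompt, target = to_training_pair(sft_example)
```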
We also converted the InstructGPT dataset into a dialogue format, expanding the range of inputs our model could learn from, and again applied RLHF to fine-tune the model's parameters: the reward model scored the quality of the model's conversations with human partners, letting us identify and fix mistakes and shape the model's outputs to be more relevant.
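The dialogue conversion can be sketched as turning instruction–response pairs (as in InstructGPT-style data) into role-tagged conversation turns. The "Human:"/"Assistant:" tags and the separator here are illustrative assumptions, not the actual format used:

```python
def to_dialogue(instruction, response):
    """Wrap one instruction/response pair as a two-turn dialogue.
    The role tags are an assumed format for illustration only."""
    return f"Human: {instruction}\nAssistant: {response}"

# Example instruction-following pairs, rewritten as dialogue turns.
pairs = [
    ("Summarize the article in one sentence.", "The article argues ..."),
    ("Translate 'bonjour' to English.", "Hello."),
]
dialogues = [to_dialogue(i, r) for i, r in pairs]
```

Framing single-turn instruction data this way lets the same examples serve as training input for a conversational model.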