In machine learning, a model has two types of parameters: those that are learned from data, and hyper-parameters, which are set by the user.
Hyper-parameters are like the settings on a camera; you can change them to get different results.
- Batch size: Batch size refers to the number of samples used in one iteration of training. The larger the batch size, the more memory is required, but the model may converge faster.
- Token: A token is a unit of text, such as a word or a piece of a word, that the model processes. For example, in the sentence “I am a robot,” “I,” “am,” “a,” and “robot” are all tokens.
- Learning rate: The learning rate is a hyper-parameter that controls the size of the updates made to the model’s weights at each training step. A higher learning rate means the model learns faster, but it may overshoot and converge to a less optimal solution.
- Epoch: Epoch refers to one complete pass through the entire dataset during training. For example, if you have 100 samples in your dataset and you set the batch size to 10, it will take 10 iterations (or “steps”) to complete one epoch.
- Prompt loss weight: Prompt loss weight is a parameter that balances the loss function during fine-tuning. It controls how much weight is given to the loss on the prompt tokens relative to the loss on the completion tokens during training.
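The batch size and epoch definitions above can be illustrated with a small sketch; the numbers below are the same hypothetical ones used in the epoch example (100 samples, batch size 10):

```python
# Hypothetical dataset matching the epoch example above.
num_samples = 100
batch_size = 10

# One epoch = one full pass over the dataset, so the number of
# iterations ("steps") per epoch is the dataset size divided by
# the batch size.
steps_per_epoch = num_samples // batch_size
print(steps_per_epoch)  # 10

# Naive whitespace tokenization of the token example above.
# (Real tokenizers split text into subword units, not just words.)
tokens = "I am a robot".split()
print(tokens)  # ['I', 'am', 'a', 'robot']
```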
When you use our plugin to fine-tune your model, these parameters are set automatically. You can view the hyper-parameters that have been set for each fine-tuning process under the Trainings tab. Please click on Hyper-parameters as shown below.
This opens a popup displaying the hyper-parameters that were used for that specific model.
It will look like this:
And below are the hyper-parameter settings of the GPT models that OpenAI trained. All models were trained for a total of 300 billion tokens.
In summary, hyper-parameter tuning is the process of systematically searching for the combination of hyper-parameters that gives a learning algorithm its best performance. Batch size, learning rate, number of epochs, and prompt loss weight are all examples of hyper-parameters that can be tuned.
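A common way to search systematically is a grid search: try every combination of candidate values and keep the best-scoring one. The sketch below is a minimal illustration; the candidate values and the `score` function are hypothetical stand-ins for actually fine-tuning and evaluating a model.

```python
import itertools

# Hypothetical candidate values for two hyper-parameters.
learning_rates = [1e-4, 1e-3, 1e-2]
batch_sizes = [8, 16, 32]

def score(lr, bs):
    # Stand-in for "fine-tune the model with these settings and
    # return its validation score". A real run would train and
    # evaluate here; this toy function just peaks at (1e-3, 16).
    return 1.0 - abs(lr - 1e-3) * 100 - abs(bs - 16) / 100

# Evaluate every combination and keep the best one.
best = max(itertools.product(learning_rates, batch_sizes),
           key=lambda cfg: score(*cfg))
print(best)  # (0.001, 16)
```

Grid search is exhaustive and simple, but the number of combinations grows multiplicatively with each hyper-parameter, which is why automated tuning (as done by the plugin) is often preferable.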