One of the most critical tasks in machine learning is finding the proper level of model complexity. If the model is too complex, it fits the training data extremely well but generalizes poorly to unseen data (overfitting). If the model is not complex enough, it cannot capture all the information in the data (underfitting).
In both machine learning and deep learning, model performance depends heavily on the hyperparameter values selected. The aim of hyperparameter exploration is therefore to search across a number of hyperparameter configurations and find the one that gives the best possible performance. This exploration is often manual and cumbersome, since the search space is vast and evaluating each configuration is expensive.
- What are hyperparameters?
- Training algorithm hyperparameters
- Methods used to find hyperparameters
1. What are hyperparameters?
There are two different types of parameters that make up machine learning models:
- Hyperparameters: variables that the user sets before training starts.
- Model parameters: parameters that are learned during model training.
Model parameters determine how input data is transformed into the desired output, whereas hyperparameters define the structure of the model and how it is trained. Most common learning algorithms have a set of hyperparameters that must be defined before training begins. Different training algorithms use different hyperparameters, and some, such as ordinary least squares, do not require any.
What is essential to understand is that hyperparameters can significantly change both the model's output and the time taken to train it. Picking the right hyperparameters to tune is therefore half of the solution; the other half is finding the values that suit the problem. This is the difference between a parameter and a hyperparameter: parameters are learned from the data, while hyperparameters are chosen beforehand.
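The distinction can be sketched with a tiny gradient-descent fit. This is a minimal illustration, not a reference implementation: the function name train_linear_model, the toy data, and the values chosen for LEARNING_RATE and NUM_EPOCHS are all assumptions made for the example.

```python
# Hyperparameters: chosen by the user before training starts.
LEARNING_RATE = 0.05
NUM_EPOCHS = 500


def train_linear_model(xs, ys, learning_rate, num_epochs):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(num_epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = train_linear_model(xs, ys, LEARNING_RATE, NUM_EPOCHS)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Here w and b are never set by hand; they come out of training. The learning rate and epoch count are never learned; changing them changes how quickly, and whether, the parameters reach good values.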
These are the hyperparameters related to network structure:
- Number of hidden layers: these are the layers between the input and output layers. Adding hidden layers usually improves accuracy up to a point, and that point depends on the problem.
- Dropout: a regularization technique that reduces overfitting and can improve validation accuracy. The dropout rate sets the percentage of neurons that are randomly dropped during each training pass.
- Weight initialization: the best initialization scheme depends on the activation function used in each layer. Initial weights must be set before the first forward pass.
- Activation functions: these introduce non-linearity into the model. They are necessary for deep learning models to learn nonlinear prediction boundaries.
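As a rough sketch, the four structural hyperparameters above can be collected in one configuration and used to build a small fully connected network. Everything here is hypothetical and chosen for illustration: the config keys, the helper names (init_weights, build_network, forward), and the values. Biases are left out to keep the example short, and He and Xavier initialization are used only as two well-known schemes.

```python
import math
import random

# Structural hyperparameters, fixed before training (values are illustrative).
config = {
    "input_size": 4,
    "hidden_layers": [8, 8],  # two hidden layers with 8 units each
    "output_size": 1,
    "activation": "relu",
    "dropout_rate": 0.5,      # fraction of hidden units dropped per training pass
    "weight_init": "he",      # He is commonly paired with ReLU, Xavier with tanh
}

ACTIVATIONS = {
    "relu": lambda z: max(0.0, z),
    "tanh": math.tanh,
}


def init_weights(fan_in, fan_out, scheme, rng):
    """Draw a fan_out x fan_in weight matrix from a scaled normal distribution."""
    if scheme == "he":
        std = math.sqrt(2.0 / fan_in)
    elif scheme == "xavier":
        std = math.sqrt(2.0 / (fan_in + fan_out))
    else:
        raise ValueError(f"unknown weight_init: {scheme}")
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def build_network(cfg, rng):
    """Create one weight matrix per layer; biases are omitted for brevity."""
    sizes = [cfg["input_size"], *cfg["hidden_layers"], cfg["output_size"]]
    return [init_weights(a, b, cfg["weight_init"], rng)
            for a, b in zip(sizes, sizes[1:])]


def forward(layers, x, cfg, rng, training):
    """Run one forward pass; dropout is applied only when training is True."""
    act = ACTIVATIONS[cfg["activation"]]
    keep = 1.0 - cfg["dropout_rate"]
    for i, weights in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(layers) - 1:  # hidden layers only
            x = [act(z) for z in x]
            if training and keep < 1.0:
                # Inverted dropout: zero units at random, rescale the survivors.
                x = [v / keep if rng.random() < keep else 0.0 for v in x]
    return x


rng = random.Random(0)
layers = build_network(config, rng)
output = forward(layers, [1.0, 0.5, -0.5, 2.0], config, rng, training=False)
print(len(layers), "weight matrices, output:", output)
```

Changing any entry in config changes the shape or behavior of the network without touching the training data, which is what makes these values hyperparameters rather than learned parameters.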
2. Training algorithm hyperparameters
- Learning rate: defines how quickly the parameters of a network are updated. A low learning rate slows learning but converges smoothly, while a high learning rate speeds learning but may fail to converge.
- Momentum: uses knowledge of the previous steps to help determine the direction of the next step. It helps prevent oscillations, and a typical value is between 0.5 and 0.9.
- Number of epochs: the number of times the whole training set is shown to the network during training. The number of epochs can be increased until validation accuracy starts to drop while training accuracy is still rising, which indicates overfitting.
- Batch size: the number of training samples given to the network before each parameter update. A common default is 32.
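The sketch below shows where each of these four hyperparameters enters a training loop, using mini-batch gradient descent with momentum on a one-variable linear model. The function name train_sgd, the toy data, and the specific values passed in are assumptions for illustration, not recommended settings.

```python
import random


def train_sgd(xs, ys, learning_rate, momentum, num_epochs, batch_size, seed=0):
    """Mini-batch SGD with momentum for the model y ~ w * x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    vel_w, vel_b = 0.0, 0.0  # velocity carried over from earlier steps
    indices = list(range(len(xs)))
    for _ in range(num_epochs):  # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error on this mini-batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Momentum blends the previous step direction with the new gradient.
            vel_w = momentum * vel_w - learning_rate * grad_w
            vel_b = momentum * vel_b - learning_rate * grad_b
            w += vel_w  # parameters are updated once per batch
            b += vel_b
    return w, b


# Toy data generated from y = 3x - 2 (an illustrative assumption).
xs = [i / 10 for i in range(64)]
ys = [3 * x - 2 for x in xs]

w, b = train_sgd(xs, ys, learning_rate=0.01, momentum=0.9,
                 num_epochs=200, batch_size=32)
print(f"w={w:.3f}, b={b:.3f}")
```

With 64 samples and a batch size of 32, each epoch performs two parameter updates. Halving the batch size would double the number of updates per epoch while making each gradient estimate noisier.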
3. Methods used to find hyperparameters
Hyperparameter optimization is done through the following methods:
- Manual search: hyperparameters were traditionally tuned by trial and error. This is still common practice: experienced operators guess the hyperparameter values that are likely to give high accuracy in deep learning models. It is simple and effective in experienced hands but is not a systematic approach, which is why better, more automated methods are continually sought.
- Grid search: a step up from manual tuning. It is a systematic procedure that tests multiple values for each hyperparameter by automatically training the model for each value. For example, a grid search can train the model with batch sizes from 10 to 100 in steps of 20. The model is trained once per value, and the batch size that yields the highest accuracy is selected.
- Random search: testing randomly sampled hyperparameter values can be more effective than both manual and grid search, particularly when only a few hyperparameters strongly affect performance. Rather than concentrating on areas presumed to be promising, it draws test values at random from the whole search space.
- Bayesian optimization: this technique builds a probabilistic model of how hyperparameter values relate to the performance of the trained model. It observes the result produced by each set of hyperparameter values and uses those observations to choose the next set to try, continuing until it has a list of promising hyperparameter sets.
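Grid search and random search from the list above can be sketched in a few lines. Note the main assumption: validation_loss is a hypothetical, synthetic stand-in for "train a model with these hyperparameters and measure its validation loss", with an arbitrary formula chosen so the example runs instantly. The function names and search ranges are also illustrative. Bayesian optimization is not shown because it additionally needs a surrogate model.

```python
import itertools
import math
import random


def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and scoring it.

    The formula is synthetic; its minimum is at learning_rate=0.01, batch_size=50.
    """
    return (math.log10(learning_rate) + 2) ** 2 + ((batch_size - 50) / 40) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(num_trials, seed=0):
    """Sample each hyperparameter independently from its whole range."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
            "batch_size": rng.randint(10, 100),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": list(range(10, 101, 20)),  # 10, 30, 50, 70, 90
}
grid_loss, grid_params = grid_search(grid)         # 3 * 5 = 15 evaluations
random_loss, random_params = random_search(num_trials=15)  # same budget

print("grid search best:  ", grid_params, grid_loss)
print("random search best:", random_params, random_loss)
```

Both searches use 15 evaluations here. The grid only ever tries 3 distinct learning rates and 5 distinct batch sizes, while random search tries up to 15 distinct values of each, which is one reason it can do better when a single hyperparameter dominates.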
In simple words, hyperparameters are the variables that determine the network structure and also how the network is trained. They are set before training the network.