Formal context

The formal context of MetaGen is outlined in Equation 1. The solver defines a problem \(P\) and uses the metaheuristic \(M\), created by the developer, to find an optimal solution \(S_{opt}\) (Equation (1a)). A problem \(P\) is composed of a domain \(D\) and a fitness function \(F\) (Equation (1b)). A domain is a set of variables and their corresponding value ranges, which the metaheuristic optimizes through the fitness function. A solution is an assignment of valid values to all variables within the domain. All potential solutions to a problem form the search space; the metaheuristic explores this space by generating and modifying candidate solutions and evaluating them with the fitness function.

Equation 1:

\[\begin{split}M(P) = S_{opt} \qquad (1a)\\ P = \langle D,F \rangle \qquad (1b)\end{split}\]
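The relation between \(M\), \(P\), \(D\) and \(F\) can be illustrated with a minimal Python sketch. It is not the MetaGen API: the `Problem` class, the `random_search` function and the choice to minimize the fitness are all assumptions made for illustration, and the domain is restricted to integer ranges to keep the example short.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Solution = Dict[str, int]


@dataclass
class Problem:
    """P = <D, F>: a domain plus a fitness function (hypothetical class)."""
    domain: Dict[str, Tuple[int, int]]  # variable name -> (Min, Max), integers only
    fitness: Callable[[Solution], float]


def random_search(problem: Problem, iterations: int = 200, seed: int = 0) -> Optional[Solution]:
    """A stand-in for M: sample random solutions and keep the lowest-fitness one.

    Minimization is an assumption of this sketch.
    """
    rng = random.Random(seed)
    best: Optional[Solution] = None
    best_fitness = float("inf")
    for _ in range(iterations):
        # A solution assigns a valid value to every variable of the domain.
        candidate = {name: rng.randint(lo, hi) for name, (lo, hi) in problem.domain.items()}
        value = problem.fitness(candidate)
        if value < best_fitness:
            best, best_fitness = candidate, value
    return best


# An integer variable x in [-10, 10] with fitness x + 5.
p1 = Problem(domain={"x": (-10, 10)}, fitness=lambda s: s["x"] + 5)
s_opt = random_search(p1)  # plays the role of M(P) = S_opt
```

Random search is used here only because it is the simplest procedure that fits the signature \(M(P) = S_{opt}\); any metaheuristic that takes a problem and returns a solution would occupy the same place.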

Domain and solution

The formal definition of a domain \(D\) is given by Equation (2a). It is a set of \(N\) variable definitions, \(Var_i \models Def^{T}_i\), where \(Var_i\) is the name of the variable and \(Def^{T}_i\) is its definition of type \(T\), as specified in Eq. (2b). There are six types of variables: \(INTEGER\) (\(I\)), \(REAL\) (\(R\)), \(CATEGORICAL\) (\(C\)), \(GROUP\) (\(G\)), \(DYNAMIC\) (\(D\)), and \(STATIC\) (\(S\)). The alias \(BASIC\) (\(B\)) stands for any of \(INTEGER\), \(REAL\), and \(CATEGORICAL\), as defined in Eq. (2c).

Equation 2:

\[\begin{split}D = \{Var_1 \models Def^{T}_1, Var_2 \models Def^{T}_2,...,Var_i \models Def^{T}_i,...,Var_{N} \models Def^{T}_{N}\} \qquad (2a)\\ T \in \{INTEGER(I), REAL(R), CATEGORICAL(C),\\ GROUP(G), DYNAMIC(D), STATIC(S)\} \qquad (2b)\\ BASIC (B) \in \{INTEGER(I), REAL(R), CATEGORICAL(C)\} \qquad (2c)\end{split}\]

Equation 3 gives the definitions supported by MetaGen. The \(INTEGER\) and \(REAL\) definitions represent a value taken from the integers \(\mathbb{Z}\) and the real numbers \(\mathbb{R}\), respectively, restricted to the range \([Min, Max]\), as shown in Eq. (3a). The \(CATEGORICAL\) definition represents a set of \(P\) unique labels, as described in Eq. (3b). The \(GROUP\) definition represents a set of \(Q\) elements, each defined by a basic definition \(Def^B\), as given in Eq. (3c). The \(DYNAMIC\) definition represents a sequence of \(BASIC\) or \(GROUP\) values whose length lies in the range \([LN_{min}, LN_{max}]\), as specified in Eq. (3d). Finally, the \(STATIC\) definition is a sequence of \(BASIC\) or \(GROUP\) values with a fixed length \(LN\), as specified in Eq. (3e).

Equation 3:

\[\begin{split}Def^{I|R} = \langle Min, Max \rangle \qquad (3a)\\ Def^{C} = \{L_1, L_2,...,L_i,...,L_{P}\} \qquad (3b)\\ Def^{G} = \{E_1 = Def^{B}_1, E_2 = Def^{B}_2,...,E_j = Def^{B}_j,...,E_{Q} = Def^{B}_{Q}\} \qquad (3c)\\ Def^{D} = \langle LN_{min}, LN_{max}, Def^{B|G} \rangle \qquad (3d)\\ Def^{S} = \langle LN, Def^{B|G} \rangle \qquad (3e)\end{split}\]
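To make the six definitions more concrete, the following Python sketch models each of them as a small data class and draws a random valid value from a definition. The class names, field names and the `sample` function are hypothetical and chosen for this illustration; they are not part of MetaGen.

```python
import random
from dataclasses import dataclass


@dataclass
class IntegerDef:       # Def^I = <Min, Max>
    min: int
    max: int


@dataclass
class RealDef:          # Def^R = <Min, Max>
    min: float
    max: float


@dataclass
class CategoricalDef:   # Def^C = {L_1, ..., L_P}
    labels: tuple


@dataclass
class GroupDef:         # Def^G = {E_1 = Def^B_1, ..., E_Q = Def^B_Q}
    elements: dict      # element name -> basic definition


@dataclass
class DynamicDef:       # Def^D = <LN_min, LN_max, Def^{B|G}>
    ln_min: int
    ln_max: int
    item: object


@dataclass
class StaticDef:        # Def^S = <LN, Def^{B|G}>
    ln: int
    item: object


def sample(definition, rng):
    """Draw one valid value for the given definition."""
    if isinstance(definition, IntegerDef):
        return rng.randint(definition.min, definition.max)
    if isinstance(definition, RealDef):
        return rng.uniform(definition.min, definition.max)
    if isinstance(definition, CategoricalDef):
        return rng.choice(definition.labels)
    if isinstance(definition, GroupDef):
        return {name: sample(d, rng) for name, d in definition.elements.items()}
    if isinstance(definition, DynamicDef):
        length = rng.randint(definition.ln_min, definition.ln_max)
        return [sample(definition.item, rng) for _ in range(length)]
    if isinstance(definition, StaticDef):
        return [sample(definition.item, rng) for _ in range(definition.ln)]
    raise TypeError(f"unsupported definition: {definition!r}")


# A DYNAMIC sequence of 2 to 10 GROUP values, each with three BASIC elements.
arch = DynamicDef(2, 10, GroupDef({
    "Neurons": IntegerDef(25, 300),
    "Activation": CategoricalDef(("relu", "sigmoid", "softmax", "tanh")),
    "Dropout": RealDef(0.0, 0.45),
}))
layers = sample(arch, random.Random(0))
```

The recursion in `sample` mirrors the nesting allowed by Eqs. (3c) to (3e): a \(GROUP\) contains only basic definitions, while \(DYNAMIC\) and \(STATIC\) wrap either a basic definition or a group.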

Table 1 collects example problems that support the formal definition. The problem \(P_1\) is composed of a domain with an \(INTEGER\) variable \(x\) that ranges over the interval \([-10, 10]\), and the function to be optimized is \(f(x)=x+5\). Similarly, the problem \(P_2\) has a domain consisting of a \(REAL\) variable \(x\) that ranges over the interval \([0.0, 1.0]\), and the objective function is \(f(x)=x^5\).

Sample problems

Table 1: Sample problems

| \(P_{ID}\) | Domain | Function |
| --- | --- | --- |
| \(P_1\) | \(x \models Def^{I} = \langle -10, 10\rangle\) | \(x+5\) |
| \(P_2\) | \(x \models Def^{R} = \langle 0.0, 1.0\rangle\) | \(x^2\) |
| \(P_3\) | \(Alpha \models Def^{R} = \langle 0.0001, 0.001\rangle\); \(Iterations \models Def^{I} = \langle 5, 200\rangle\); \(Loss \models Def^{C} = \{squared\:error, huber, epsilon\:insensitive\}\) | \(Regression(Alpha, Iterations, Loss)\) |
| \(P_4\) | \(Learning\;rate \models Def^{R} = \langle 0.0, 0.000001\rangle\); \(Ema \models Def^{C} = \{True, False\}\); \(Arch \models Def^{D} = \langle 2,10,\,Def^{G} = \{Neurons \models Def^{I} = \langle 25, 300\rangle, Activation \models Def^{C} = \{relu, sigmoid, softmax, tanh\}, Dropout \models Def^{R} = \langle 0.0, 0.45\rangle\}\rangle\) | \(LSTM(Learning\;rate, Ema, Arch)\) |

Problems \(P_3\) and \(P_4\) illustrate the main target of the MetaGen framework: hyperparameter optimization for machine learning models. Problem \(P_3\) illustrates the optimization of a linear regression model. The objective function is the performance of a model that has been trained with the \(Alpha\), \(Iterations\), and \(Loss\) hyperparameters. \(Alpha\) represents the regularization term that the model uses to prevent overfitting and is typically set to a value close to zero. The \(Iterations\) hyperparameter controls the number of times the linear model should be recalculated before reaching an error tolerance. Finally, the \(Loss\) hyperparameter sets the function used during training to measure the performance of the linear model in each iteration. The domain of problem \(P_3\) consists of the \(Alpha\) variable, which is a \(REAL\) value defined in the interval \([0.0001, 0.001]\), the \(Iterations\) variable, which is an \(INTEGER\) defined in the interval \([5, 200]\), and the \(Loss\) variable, which is a \(CATEGORICAL\) variable that can be set to \(squared\:error\), \(huber\) or \(epsilon\:insensitive\).
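The domain of \(P_3\) can be written down as plain data, together with a check that a candidate solution assigns a valid value to every variable. This is a sketch under stated assumptions: the tuple encoding, the `P3_DOMAIN` name and the `is_valid` helper are hypothetical and do not come from MetaGen, and the fitness function (training and scoring the regression model) is left out.

```python
# Each entry maps a variable name to (type, ...definition), an assumed encoding.
P3_DOMAIN = {
    "Alpha": ("REAL", 0.0001, 0.001),
    "Iterations": ("INTEGER", 5, 200),
    "Loss": ("CATEGORICAL", ("squared error", "huber", "epsilon insensitive")),
}


def is_number(value, kinds):
    """True for ints/floats of the given kinds, excluding booleans."""
    return isinstance(value, kinds) and not isinstance(value, bool)


def is_valid(solution, domain):
    """Check that the solution assigns a valid value to every domain variable."""
    if set(solution) != set(domain):
        return False
    for name, definition in domain.items():
        kind, value = definition[0], solution[name]
        if kind == "INTEGER":
            if not (is_number(value, int) and definition[1] <= value <= definition[2]):
                return False
        elif kind == "REAL":
            if not (is_number(value, (int, float)) and definition[1] <= value <= definition[2]):
                return False
        elif kind == "CATEGORICAL":
            if value not in definition[1]:
                return False
        else:
            raise ValueError(f"unknown variable type: {kind}")
    return True


candidate = {"Alpha": 0.0005, "Iterations": 100, "Loss": "huber"}
```

With this encoding, `is_valid(candidate, P3_DOMAIN)` accepts the candidate above, while a solution with `Alpha` outside \([0.0001, 0.001]\), a non-integer `Iterations`, or an unknown `Loss` label is rejected.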

The problem \(P_4\) represents a hyperparameter optimization problem for a Long Short-Term Memory (LSTM) deep learning architecture. The objective is to determine the best configuration of the network in terms of the number of layers and the optimal hyperparameters for each layer. There are two common properties that apply to all layers of the architecture: the optimizer, which is stochastic gradient descent, and the control of the optimizer, which is through the learning rate and exponential moving average (EMA) parameters. The stochastic gradient descent is an iterative method used to optimize the network’s performance, and the learning rate parameter mod