One of the main components of an organism are the proteins: macromolecules formed by chains of amino-acids involved in the elementary processes of cellular life. In other words, proteins are the pillars on which the vital functions of an individual's system are based on. A protein may be responsible for one or more biological functions. Determining which proteins are associated with a given function is a major breakthrough in the fields of pharmacology and modern medicine.
In order to represent the interactions that exist between proteins in an organism, Protein-Protein Interaction (PPI) networks have arisen. In these networks, each vertex corresponds to a specific protein, while the edges are the relationships established between proteins. Thanks to this representation, studying and analyzing how proteins behave becomes easier.
One of the main applications of PPI networks is the alignment of PPI networks of different species. This research process aims to identify those proteins that are homologous or equivalent between species. That is, suppose that a protein in an organism is involved in the immune response against certain pathogens. If it was known which protein in humans performs this same action, it would be benefitial to develop drugs or therapies against these pathogens. When aligning two PPI networks, there are two factors to take into account: topological and biological similarities. On the one hand, the topological similarity between two networks is determined by the percentage of coincidence between the two structures (the neighbors of each vertex, the number of edges, the edge distribution, etc.). On the other hand, biological similarity is obtained by calculating the number of common biological functions in which the proteins are involved. Therefore, the higher the number of common functions, the higher the percentage of biological similarity. Traditionally, these two similarities have always been considered in isolation (i.e., the alignment was calculated exclusively according to one of them) or together (approaching them as a single objective). However, it was later discovered that both similarities constituted conflicting objectives. That is, when the value of one objective increases, the value of the other decreases. For this reason, we propose a novel solution to the problem based on a multi-objective approach: MOMEA (Multi-Objective Mutation-based Evolutionary Algorithm).
Currently, most of the proposals available in the scientific literature are single-objective proposals, i.e., proposals that combine both similarities in a single objective function:
Fitness = wT + (1 - w)B,
where T is the topological similarity, B is the biological similarity, and w and (1 - w) are the weights (importances) assigned to each similarity. The main concern of this strategy is to determine the static weight that each similarity receives. This makes it a rather subjective task, since increasing the weight of one criterion automatically decreases the weight or importance of the other. For this reason, multi-objective algorithms are a very interesting approach for this problem. These algorithms analyze each objective independently without statically assigning weights, but rather they calculate them continuously as new solutions are generated.
Unlike single-objective algorithms where there is only a single optimal solution, in multi-objective algorithms there can be a multitude of potential solutions. To determine when a solution is better than another, the concept of dominance is used. We say that a solution a dominates another solution b, when a has higher quality in one objective than b, and higher or equal quality in the rest of objectives. In this particular problem, one solution dominates another, when it has higher topological similarity and higher or equal biological similarity, or vice versa.
MOMEA, or Multi-Objective Mutation-based Evolutionary Algorithm, uses as its main method an evolutionary algorithm that employs mutation operators to generate new solutions from a multi-objective approach. One of the main features that differentiates MOMEA from other PPI network aligners in the scientific literature is the use of problem-aware mutation operators. These operators focus on performing mutations within the current solution population that benefit and improve both topological and biological quality. The following diagrams display the main logic followed by these operators.
![]() |
![]() |
|---|---|
Biological similarity mutation operator. Given a solution, it iterates over each of its proteins. If for a given protein the mutation probability is satisfied, then the proteins of the second PPI network with which it has the highest degree of biological similarity are obtained. From among the candidates, one is randomly selected. Afterwards, the aligned proteins are swapped between the candidate protein and the initial protein. If the final biological quality is worse than the initial one, the swap is reversed. |
Topological similarity mutation operator. Given a solution, it iterates over each of its proteins. If for a given protein the mutation probability is satisfied, then its neighbors within the first PPI network are obtained. The proteins of the second PPI network are analyzed. If there are proteins with the same number of neighbors, they are stored as possible candidates. From the set of candidates, one protein is randomly selected. Then, its aligned protein is swapped with the aligned protein of the initial one (in addition, this is also done between the neighbors of both). If the mutated solution has worse quality than the original solution, the swap is reversed. |
MOMEA uses a population to store the solutions that are being mutated, as well as a set of solutions (archive) that is being updated as the number of evaluations increases. Initially, the archive solutions are initialized with random individuals. Then, the ranking and crowding operators of NSGA-II are used to classify the individuals in the archive by quality. Next, the elements of the population are initializated with the best individuals in the archive. Finally, the main MOMEA loop starts. As long as the number of evaluations does not reach its maximum value, we iterate over each individual of the population and perform the following actions:
Then, once every individual in the population has been mutated, the ranking and crowding operators are used on the archive to reorganize their current solutions. In addition, the individuals of the population are updated by eliminating the worst ones and replacing them with those whose quality is higher. Finally, the highest quality solutions are also stored in a third data structure that will contain only non-dominated and non-repeated solutions. These will be the solutions that MOMEA returns at the end of the loop.
MOMEA has been developed using C++ as programming language, therefore, it will be an indispensable requirement to have a C/C++ compiler installed. Moreover, it will also be necessary to have the Boost Library. To compile the source code, a MakeFile is provided, so using the make command is needed. Then, in order to be able to execute MOMEA properly, the following input parameters must be specified:
Usage example: ./MOMEA net1File.tab net2File.tab annotations1File.annos annotations2File.annos 11 150 1500000 0.001 output_prefix
This work is supported by several projects. Once these projects have been concluded, the MOMEA source code will be shared on this website.
This work was partially funded by INTERREG V-A Spain-Portugal (European Union), under the contract 0479_RED_FAROTIC_4_E (FAROTIC project). This work was also partially funded by the MCIU (Ministry of Science, Innovation and Universities, Spain), the AEI (State Research Agency, Spain), and the ERDF (European Regional Development Fund, EU), under the contract PID2019-107299GB-I00/AEI/10.13039/501100011033 (Multi-HPC-Bio project).