Development and assessment of protein loop modeling methods: Application to CDR loops in antibodies

Amelie Barozet

Abstract

This thesis deals with antibody modeling, in particular the modeling of the hypervariable loops found at the interface with the antigen. These protein loops are responsible for the specific recognition of the antigen and for the formation of the antibody-antigen complex with high affinity, both made possible by the great sequence variability and the plasticity of these protein fragments. Indeed, unlike more stable structural elements such as alpha helices or beta sheets, protein loops exhibit a flexibility that plays a crucial role in many biological processes. This manuscript starts by describing the analysis of structural changes in antibodies upon antigen binding, which constitutes the first contribution of this PhD research work. This study, based on the analysis of experimental structural data, shows that antibody conformational changes (occurring mainly in the loops) can be substantial and are not sufficiently accounted for. In particular, docking algorithms show poor results when dealing with highly flexible hypervariable loops in the antigen binding site. In this context, the PhD research work then focused more generally on protein loop modeling. These flexible protein regions represent a challenge for structural biology. Most experimental data related to protein structure are obtained through X-ray crystallography, which cannot correctly represent the flexible parts of a structure. Indeed, it provides a single structure, which is inappropriate for protein loops that adopt an ensemble of different conformations with various associated probabilities. As shown by multiple recent works, current methods cannot properly model protein loops. Protein loop modeling is usually performed in two steps. The first step, called sampling, must exhaustively generate the possible conformations of the loop, so as to represent this protein fragment globally. The second step, called scoring, consists in attributing a score to each of these sampled conformations. This score is meant to represent the energy differences between the models generated during the first step.
Sampling and scoring remain open problems. Indeed, methods developed so far in the field mostly focus on predicting a single stable conformation, which is not representative enough. The next two contributions of the PhD research work logically follow from this observation. The first one presents a method for exhaustive sampling, with a reinforcement learning component to speed up the generation of loop models. This robotics-inspired method uses a geometric representation that forbids steric clashes between atoms and uses protein fragments from a database built specially for this application. The second contribution is an in-depth analysis of the performance of several scoring methods on a set of flexible loops for which experimental data exist. Combining sampling and scoring allows the visualization of the energy landscapes implicitly modeled by these methods. The analysis of these energy landscapes makes it possible to precisely identify both the flaws of sampling and the limits of scoring methods. Finally, these methods were applied to an antibody with a hypervariable loop that changes conformation upon antigen binding. Results show that the methods previously studied and developed make it possible to model a consistent energy landscape for this flexible loop, identifying both known conformations. This suggests that these methods could be successfully applied to antibody design by predicting a loop's stability in one conformation or another and discarding loop sequences that are insufficiently stable or that adopt undesirable conformations. Although applied to antibodies, the research contributions presented in this work can readily be generalized to the analysis of protein loops in other systems, since the developed methods are not antibody-specific.

