Deep learning, text and intertextuality
One of the team’s areas of focus during the last wave was to propose statistical and convolutional models for processing texts; this work has been the subject of numerous publications posted on HAL and a collective work published by Honoré Champion in 2021: L’intelligence artificielle des textes. Des algorithmes à l’interprétation (Artificial Intelligence in Texts: From Algorithms to Interpretation).
Building on and going beyond this work, the team plans to expand its thinking in two directions.
On the one hand, it will develop models that complement the convolutional models considered so far, namely recurrent models capable of taking into account the syntagmatic axis, that is, the memory of textual units along the chain. Our initial experiments indicate that models based on self-attention mechanisms will be necessary, since the processes of reading and writing, and indeed the analyst’s interpretative processes, do not operate only from left to right (as convolution does) but in both directions, taking into account long-term dependencies between words (possible recursive reading, reworkings and deletions in manuscripts, reinterpretation as the text progresses, etc.).
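To make the contrast concrete, the sketch below (assuming PyTorch; the dimensions, layer names and example passage are purely illustrative, not the team’s actual architecture) sets a single self-attention layer, in which every token attends to every other token in both directions, against a one-dimensional convolution whose receptive field is a short, fixed left-to-right window.

```python
# Minimal sketch (PyTorch assumed): bidirectional self-attention vs. local convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim) -- embeddings of the textual units
        q, k, v = self.query(x), self.key(x), self.value(x)
        # The attention weights relate every position to every other position,
        # capturing long-distance dependencies regardless of direction.
        weights = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return weights @ v

# A convolution, by contrast, only mixes a small local window of neighbours.
conv = nn.Conv1d(in_channels=256, out_channels=256, kernel_size=5, padding=2)

x = torch.randn(1, 120, 256)                          # one passage of 120 tokens
attended = SelfAttention()(x)                         # global, bidirectional context
convolved = conv(x.transpose(1, 2)).transpose(1, 2)   # local, windowed context only
```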
On the other hand, the proposed methods and the modelling of textuality should enable us to objectify intertextuality. Drawing on the vast corpora at our disposal, a palimpsestic text should reveal in its depths (hence “deep”) the borrowings and imprints, the echoes and even the plagiarisms it carries. Classification and multichannel linguistic description (forms, lemmas, morphosyntax, syntax) thus apply not only to texts, authors or discursive genres, but also, within a single text, to paragraphs or sequences that may have been inspired by other sources.
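As a purely illustrative sketch of the multichannel idea (not the team’s published method), each passage can be described on several parallel channels (word forms, lemmas, part-of-speech tags) and compared channel by channel with the passages of another text; the similarity threshold and the helper names below are assumptions introduced only for the example.

```python
# Hedged sketch: comparing passages across parallel linguistic channels
# to surface candidate borrowings or echoes between two texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def channel_similarity(passages_a, passages_b):
    """passages_a / passages_b: lists of strings on one linguistic channel
    (e.g. all lemmas, or all POS tags) for each passage of texts A and B."""
    vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 3))
    matrix = vectorizer.fit_transform(passages_a + passages_b)
    a, b = matrix[: len(passages_a)], matrix[len(passages_a):]
    # similarity[i, j]: how close passage i of text A is to passage j of text B
    return cosine_similarity(a, b)

def flag_candidates(channels_a, channels_b, threshold=0.5):
    """channels_a / channels_b: one list of passages per channel, in the same order
    (forms, lemmas, POS, ...). A pair of passages scoring highly on several channels
    at once is a stronger intertextual candidate than a match on forms alone."""
    scores = sum(channel_similarity(a, b) for a, b in zip(channels_a, channels_b))
    scores /= len(channels_a)
    return [(i, j)
            for i in range(scores.shape[0])
            for j in range(scores.shape[1])
            if scores[i, j] >= threshold]
```

The design choice reflected here is that the comparison operates inside and across texts at the level of passages rather than whole works, which is what allows the classification to point to the specific paragraphs or sequences where an echo may lie.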