Logometry: Corpus, Processing, Models
Project Manager: Laurent Vanni
Permanent members
Brunet, Étienne – Kor Chahine, Irina – Lavigne, Frédéric – Magri, Véronique – Mayaffre, Damon – Poudat, Céline – Rojas, Minerva – Ruggia, Simona – Vanni, Laurent
Non-permanent members
Babault, Sophie – Beghini, Federica – Bouzereau, Camille – Chandelier, Marie – Haris, Sofiane – Kamagate, Karfa – Longrée, Dominique – Maciel, Carlos – Mahmoudi, Hadi – Maurer, Julia
Presentation
The team studies textual corpora (contemporary political speeches, media appearances, and ancient and modern literary works) using automatic or semi-automatic language-processing methods that draw on computer science, textual statistics and deep learning.
Working within the fields of discourse analysis, textual linguistics and corpus linguistics, the team has several main objectives: to reflect on discursivity and textuality; to reveal the internal organisation of texts; to propose a controlled description of their linguistic composition (recurring vocabulary, dominant grammatical tone, preferred syntactic structures); to establish textual typologies that take into account, in particular, the genre of the discourses considered, the conditions of enunciation, and the socio-historical positioning of the speaker; and, finally, to objectify reading or interpretative paths.
To do this, the team works with digital text corpora (which involves deciding how to capture, store, format, lemmatise and tag texts) and develops methods and tools for automatic or semi-automatic processing.
In line with the work of Étienne Brunet, the team favours a quantitative, statistical or mathematical approach to large corpora using the HYPERBASE logometry software. It seeks to improve the software's performance and to diversify its applications, particularly in the field of contemporary political discourse.