ViGramm – Visualizing Grammars across Space and Time
The project in a nutshell
Since the early 20th century, trained linguists have painstakingly collected thousands of sentences from little-known Romance varieties (so-called dialects). In recent decades, considerable effort has gone into digitizing and reviving existing collections of linguistic data (e.g. linguistic atlases), which are now freely available thanks to the open data revolution.
The main goal of ViGramm is to go one step further by turning digitized textual material into quantitative variables. These variables are geo- and time-referenced and are organized in open-access spreadsheets that provide a faithful yet abstract picture of grammatical systems.
ViGramm spreadsheets will allow researchers to model syntactic variation through statistical analysis and digital cartography.
ViGramm is a SOSI project (Suivi Ouvert des Sociétés et de leurs Interactions), a funding scheme supported by CNRS Sciences humaines & sociales.
The project team consists of Diego Pescarini (BCL), Anne Dagnac (CLLE), Stella Medori (LISA), and Xavier Bach (CLLE).
The project is made possible by a collaborative network of other European research centers that maintain our primary sources, such as the database of the Syntactic Atlas of Italy (ASIt – University of Padua), the Manzini & Savoia Archive (University of Florence), the AIS Reloaded project (University of Zurich), and the collection of maps published in the framework of the SyMiLa project based at the Université Toulouse Jean Jaurès.
Key figures
[last update: May 2026]
Number of data points: 1,396 towns and villages in Italy, France, Switzerland, Belgium, and Slovenia
Number of variables: 62 linguistic variables studied across five syntactic domains:
- 13 on object clitics
- 17 on subject clitics
- 16 on possessives
- 8 on interrogatives
- 8 on negation
About this website
All ViGramm datasheets are stored in the open-access repository Nakala. To retrieve them, enter ‘ViGramm’ in the query field and launch the search.
The ViGramm dataset is modular. To retrieve and merge the files that may be relevant for your research, please follow the instructions below.
All files are distributed under a CC BY-NC-SA license.
Understanding the structure of ViGramm datasets
1) Filename
Each .csv file has an id with the following structure:
- source_format_variable_yyyymmdd.csv
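Because the id encodes the source, format, variable and date, it can be used to pick out relevant files with a short script. The following is a minimal sketch, assuming that the source and format fields never contain underscores (the variable field may). The names `parse_file_id`, `select_files`, `FileId` and the sample filenames are hypothetical and only illustrate the pattern.

```python
from datetime import date, datetime
from typing import NamedTuple


class FileId(NamedTuple):
    source: str     # e.g. "ALF", "AIS", "ASIt"
    fmt: str        # "examples", "synthesis" or "sample"
    variable: str   # variable name (may itself contain underscores)
    released: date  # the yyyymmdd date at the end of the filename


def parse_file_id(filename: str) -> FileId:
    """Split a name of the form source_format_variable_yyyymmdd.csv.

    Assumption: source and format contain no underscores, so everything
    between the second field and the date is treated as the variable name.
    """
    if not filename.endswith(".csv"):
        raise ValueError(f"not a .csv file: {filename}")
    parts = filename[: -len(".csv")].split("_")
    if len(parts) < 4:
        raise ValueError(f"unexpected filename structure: {filename}")
    released = datetime.strptime(parts[-1], "%Y%m%d").date()
    return FileId(parts[0], parts[1], "_".join(parts[2:-1]), released)


def select_files(filenames, source=None, fmt=None):
    """Keep the filenames whose source and/or format match the request."""
    selected = []
    for name in filenames:
        fid = parse_file_id(name)
        if source is not None and fid.source != source:
            continue
        if fmt is not None and fid.fmt != fmt:
            continue
        selected.append(name)
    return selected


# Hypothetical filenames, for illustration only.
names = [
    "ASIt_examples_negation_20250301.csv",
    "ALF_synthesis_negation_20240115.csv",
    "ASIt_synthesis_negation_20250301.csv",
]
print(select_files(names, source="ASIt", fmt="synthesis"))
# → ['ASIt_synthesis_negation_20250301.csv']
```

Joining the middle fields back together keeps multi-word variable names intact, at the cost of assuming that the first two fields are single tokens.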
2) Source
Each .csv file contains data from a single source, e.g.:
- ALF: Linguistic Atlas of France
- AIS: Linguistic Atlas of Italy and Southern Switzerland
- ASIt: Syntactic Atlas of Italy
- MS: the dataset published in the three volumes of Manzini & Savoia (2005)
- SyMiLa: a survey of French dialects conducted by the University of Toulouse
- Daddipro: a survey of Occitan dialects conducted by the University of Nice
3) Format
ViGramm spreadsheets come in three formats:
- “examples”: each row corresponds to a single sentence/phrase. The columns indicate the place name of the locality where the example was collected, the discrete variable representing the phenomenon under study (e.g. 1 = presence of morpheme X; 0 = absence of that morpheme), and the factor(s) that may affect the distribution of the variable across syntactic environments, e.g. declarative vs interrogative clauses:
Variety, Variable, Factor
Language A, 1, declarative
Language B, 0, declarative
Language A, 1, interrogative
Language B, 1, interrogative
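Before further processing, an “examples” file can be read and checked with Python's standard csv module. This is a minimal sketch that assumes the three column names shown in the excerpt and binary 0/1 coding; files using other discrete codes would need a different `allowed` set. `read_examples` and `Example` are hypothetical names introduced here, and the file encoding for real files is an assumption.

```python
import csv
import io
from typing import NamedTuple


class Example(NamedTuple):
    variety: str   # place name of the locality
    variable: int  # discrete value, e.g. 1 = presence, 0 = absence
    factor: str    # e.g. "declarative" or "interrogative"


def read_examples(handle, allowed=(0, 1)) -> list[Example]:
    """Read an "examples" table with the columns Variety, Variable, Factor.

    skipinitialspace=True tolerates the ", " separators used in the
    excerpt above. Values outside `allowed` are rejected.
    """
    reader = csv.DictReader(handle, skipinitialspace=True)
    rows = []
    for row in reader:
        value = int(row["Variable"])
        if value not in allowed:
            raise ValueError(f"line {reader.line_num}: unexpected value {value}")
        rows.append(Example(row["Variety"], value, row["Factor"]))
    return rows


excerpt = """Variety, Variable, Factor
Language A, 1, declarative
Language B, 0, declarative
Language A, 1, interrogative
Language B, 1, interrogative
"""
examples = read_examples(io.StringIO(excerpt))
# For a real file (encoding assumed to be UTF-8):
# with open(path, newline="", encoding="utf-8") as handle:
#     examples = read_examples(handle)
```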
- “synthesis”: this type of file summarizes the information contained in the “examples” files (see above). Each line corresponds to a datapoint. Variables, which are continuous, are organized into columns, each corresponding to a factor:
Variety, Declarative, Interrogative
Language A, 1, 1
Language B, 0, 1
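The step from “examples” to “synthesis” can be pictured as a pivot: group the example rows by variety and factor, then average the values. Reading the continuous values as the proportion of examples coded 1 is an assumption; the text above only says that synthesis files summarize examples files, and the project may compute them differently. `summarize` and `to_synthesis_csv` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict


def summarize(examples_csv: str) -> dict:
    """Average the 0/1 values of an "examples" table per variety and factor.

    Returns {variety: {factor: proportion}}, where the proportion is the
    share of examples coded 1 (an assumed reading of "continuous" values).
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [total, count]
    for row in csv.DictReader(io.StringIO(examples_csv), skipinitialspace=True):
        cell = sums[row["Variety"]][row["Factor"]]
        cell[0] += int(row["Variable"])
        cell[1] += 1
    return {
        variety: {factor: total / count for factor, (total, count) in cells.items()}
        for variety, cells in sums.items()
    }


def to_synthesis_csv(summary: dict, factors: list) -> str:
    """Write one row per variety and one column per factor."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Variety", *(f.capitalize() for f in factors)])
    for variety in sorted(summary):
        cells = summary[variety]
        # format(x, "g") prints 1.0 as "1" and 0.5 as "0.5"; missing cells stay empty
        values = [format(cells[f], "g") if f in cells else "" for f in factors]
        writer.writerow([variety, *values])
    return out.getvalue()


excerpt = """Variety, Variable, Factor
Language A, 1, declarative
Language B, 0, declarative
Language A, 1, interrogative
Language B, 1, interrogative
"""
print(to_synthesis_csv(summarize(excerpt), ["declarative", "interrogative"]))
```

On the “examples” excerpt this reproduces the synthesis excerpt (without the spaces after the commas); a variety with both 0 and 1 examples for the same factor would get an intermediate value such as 0.5.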
- “sample”: this type of file contains information gathered from a selected sample of maps. Each line corresponds to a datapoint. Variables, which are discrete, are organized into columns, each corresponding to a factor:
Variety, Declarative, Interrogative
Datapoint A, 1, 1
Datapoint B, 0, 1
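Both “synthesis” and “sample” files are in a wide layout (one column per factor). Many statistical and mapping tools expect a long layout instead, with one observation per row. The following minimal sketch reshapes a wide table accordingly; `wide_to_long` is a hypothetical helper, and the identifier column is assumed to be named Variety, as in the excerpts.

```python
import csv
import io


def wide_to_long(wide_csv: str, id_column: str = "Variety", cast=int) -> list:
    """Turn a one-row-per-datapoint table into (datapoint, factor, value) rows.

    Use cast=int for "sample" files (discrete values) and cast=float for
    "synthesis" files (continuous values). Empty cells are skipped.
    """
    long_rows = []
    for row in csv.DictReader(io.StringIO(wide_csv), skipinitialspace=True):
        for column, value in row.items():
            if column == id_column or value in ("", None):
                continue  # skip the identifier column and empty cells
            long_rows.append((row[id_column], column, cast(value)))
    return long_rows


sample_excerpt = """Variety, Declarative, Interrogative
Datapoint A, 1, 1
Datapoint B, 0, 1
"""
for record in wide_to_long(sample_excerpt):
    print(record)
```

On the sample excerpt this yields four tuples, one per datapoint and factor, e.g. ('Datapoint B', 'Declarative', 0).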
4) Coordinates
To merge datasets and add coordinates, open a new Excel file and import the data from the relevant files (Data > New Query > From File). See the video tutorial below.
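For readers who prefer a script to the Excel route, the same merge can be sketched as a left join on the place name using only Python's standard library. The coordinates table and its column names (Variety, Latitude, Longitude) are hypothetical, and the numbers below are dummy values: the text above does not say where ViGramm coordinates are stored or how their columns are labeled. `merge_coordinates` is likewise a hypothetical helper.

```python
import csv
import io


def merge_coordinates(data_csv: str, coords_csv: str, key: str = "Variety"):
    """Left-join coordinate columns onto a synthesis or sample table.

    Matching uses the exact place name in `key`, so spelling variants across
    files will not match; unmatched names are returned for manual checking.
    """
    coords_reader = csv.DictReader(io.StringIO(coords_csv), skipinitialspace=True)
    extra = [f for f in coords_reader.fieldnames if f != key]
    coords = {row[key]: row for row in coords_reader}

    merged, unmatched = [], []
    for row in csv.DictReader(io.StringIO(data_csv), skipinitialspace=True):
        match = coords.get(row[key])
        if match is None:
            unmatched.append(row[key])
        for field in extra:
            # keep every data row; leave coordinates empty when there is no match
            row[field] = match[field] if match is not None else ""
        merged.append(row)
    return merged, unmatched


synthesis = """Variety, Declarative, Interrogative
Language A, 1, 1
Language B, 0, 1
"""
# Hypothetical coordinates file with dummy values, for illustration only.
coordinates = """Variety, Latitude, Longitude
Language A, 45.0, 11.0
"""
rows, missing = merge_coordinates(synthesis, coordinates)
print(missing)
# → ['Language B']
```

A left join keeps every datapoint even when no coordinates are found, which makes gaps visible instead of silently dropping rows.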