ViGramm – Visualizing Grammars across Space and Time

ViGramm spreadsheets will allow researchers to model syntactic variation through statistical analysis and digital cartography.

The project in a nutshell

Since the early 20th century, trained linguists have painstakingly collected thousands of sentences from little-known Romance varieties (so-called dialects). In recent decades, considerable effort has gone into digitizing and reviving existing collections of linguistic data (e.g. linguistic atlases), which are now freely available thanks to the open data revolution.

The main goal of ViGramm is to go a step further by turning digitized textual material into quantitative variables. The variables are geo- and time-referenced and are organized in open-access spreadsheets that provide a faithful, yet abstract, picture of grammatical systems.

ViGramm is funded under SOSI (Suivi Ouvert des Sociétés et de leurs Interactions), a funding scheme supported by CNRS Sciences humaines & sociales.

The project team consists of Diego Pescarini (BCL), Anne Dagnac (CLLE), Stella Medori (LISA), and Xavier Bach (CLLE).

The project is made possible by a collaborative network of other European research centers that maintain our primary sources, such as the database of the Syntactic Atlas of Italy (ASIt – University of Padua), the Manzini & Savoia Archive (University of Florence), the AIS Reloaded project (University of Zurich), and the collection of maps published in the framework of the Symila project, based at the Université Toulouse Jean Jaurès.

Relevant figures

[last update: May 2026]

Number of data points: 1,396 towns and villages in Italy, France, Switzerland, Belgium, and Slovenia

Number of variables: 62 linguistic variables studied across five syntactic domains:

    • 13 on object clitics
    • 17 on subject clitics
    • 16 on possessives
    • 8 on interrogatives
    • 8 on negation

 

About this website

All ViGramm datasheets are stored in the open-access repository Nakala. To retrieve them, enter ‘ViGramm’ in the query field and launch the search.

The ViGramm dataset is modular. To retrieve and merge the files that may be relevant for your research, please follow the instructions below.

All files are distributed under a CC BY-NC-SA license.

Understanding the structure of ViGramm datasets

1) Filename

Each .csv file has an id. The structure of the id is:

2) Source

Each .csv file contains metadata from a single source, e.g.

3) Format

ViGramm spreadsheets come in three formats:

Variety, Variable, Factor
Language A, 1, declarative
Language B, 0, declarative
Language A, 1, interrogative
Language B, 1, interrogative

Variety, Declarative, Interrogative
Language A, 1, 1
Language B, 0, 1

Variety, Declarative, Interrogative
Datapoint A, 1, 1
Datapoint B, 0, 1
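
The relationship between the first two sample layouts can be illustrated with a short script. This is only a minimal sketch using Python's standard library: the function name `long_to_wide` is hypothetical, and the input is the sample table shown above rather than an actual ViGramm file. It pivots the one-row-per-observation layout into the one-row-per-variety layout.

```python
import csv
import io

# Sample rows copied from the first layout shown above (not a real ViGramm file).
LONG_CSV = """Variety, Variable, Factor
Language A, 1, declarative
Language B, 0, declarative
Language A, 1, interrogative
Language B, 1, interrogative
"""


def long_to_wide(long_text):
    """Pivot long-format rows into one row per variety, one column per factor."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.DictReader(io.StringIO(long_text), skipinitialspace=True)
    wide = {}     # variety -> {factor: value}
    factors = []  # factor names, in order of first appearance
    for row in reader:
        factor = row["Factor"].strip()
        if factor not in factors:
            factors.append(factor)
        wide.setdefault(row["Variety"].strip(), {})[factor] = row["Variable"].strip()
    header = ["Variety"] + [f.capitalize() for f in factors]
    rows = [[variety] + [values.get(f, "") for f in factors]
            for variety, values in wide.items()]
    return [header] + rows


for line in long_to_wide(LONG_CSV):
    print(", ".join(line))
```

Run on the sample above, the script prints the same three lines as the second layout. Missing combinations of variety and factor are left as empty cells rather than being guessed.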

4) Coordinates

.csv files do not always contain coordinates. The coordinates of all ViGramm datapoints are stored in a separate .csv file and can be added to “examples” or “synthesis” files by using the “datapoint” column as the join key.
To merge datasets and add coordinates, open a new Excel file and import the data from the relevant files (Data > New Query > From File). See the video tutorial below:

Excel - Merge Data from Multiple Sheets Based on Key Column
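
For readers who prefer a script to Excel, the same merge on the “datapoint” column can be sketched in Python with the standard library. Everything beyond the “datapoint” key is an assumption made for illustration: the column names `latitude` and `longitude`, the function name `add_coordinates`, and all sample values are placeholders, not taken from the ViGramm files.

```python
import csv
import io

# Placeholder contents standing in for a data file and the coordinates file.
# Column names "latitude"/"longitude" and all values are assumptions.
DATA_CSV = """datapoint,Declarative,Interrogative
Datapoint A,1,1
Datapoint B,0,1
Datapoint C,1,0
"""
COORDS_CSV = """datapoint,latitude,longitude
Datapoint A,45.0,7.0
Datapoint B,46.0,11.0
"""


def add_coordinates(data_text, coords_text, key="datapoint"):
    """Left-join coordinate columns onto data rows using a shared key column."""
    # Index the coordinate rows by key for constant-time lookup.
    coords = {row[key]: row for row in csv.DictReader(io.StringIO(coords_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(data_text)):
        match = coords.get(row[key], {})
        # Rows without a match keep empty coordinate cells instead of being dropped.
        row["latitude"] = match.get("latitude", "")
        row["longitude"] = match.get("longitude", "")
        merged.append(row)
    return merged


merged = add_coordinates(DATA_CSV, COORDS_CSV)
for row in merged:
    print(row)
```

A left join is used here so that every row of the data file survives the merge; datapoints missing from the coordinates file show up with empty coordinate cells, which makes them easy to spot before mapping.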