ParlaMint I: Towards Comparable Parliamentary Corpora

Project website

The ParlaMint project focused on creating comparable, uniformly annotated corpora of parliamentary debates in Europe. The first stage of the project compiled 17 corpora. The second stage extended the time span of the corpora, added corpora for new countries and autonomous regions, provided a machine translation of the corpora into English, enriched the corpora with additional metadata, and improved their usability.

ParlaMint I (July 2020 – May 2021)

Tasks

Creating a multilingual set of uniformly annotated corpora of parliamentary proceedings dating from November 2019 to July 2020 (thus covering the COVID-19 pandemic period).
Creating a set of comparable multilingual reference corpora of parliamentary data from 2015 to October 2019.
Processing the corpora linguistically to add Universal Dependencies syntactic annotation as well as named entity annotation.
Making the corpora available through concordancers and Parlameter.
Building use cases in Political Sciences and Digital Humanities based on the corpus data.
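
To illustrate what the linguistic annotation task produces, the following is a minimal sketch that reads one sentence in CoNLL-U, the tab-separated format defined by Universal Dependencies. It assumes the annotated corpora are available in that format, and it assumes that named entity tags are stored as a `NER=` attribute in the MISC column; the sample sentence, the function names, and the `NER=` convention are illustrative assumptions, not taken from the project documentation.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sentence(block: str) -> List[Dict[str, str]]:
    """Turn one CoNLL-U sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.strip().splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        tokens.append(dict(zip(FIELDS, cols)))
    return tokens


def named_entity_tag(token: Dict[str, str]) -> str:
    """Return the NER tag from the MISC column, or "O" if none is present.

    Assumption: the tag is encoded as NER=<tag> among the |-separated
    MISC attributes. This is a hypothetical convention for this sketch.
    """
    for item in token["misc"].split("|"):
        if item.startswith("NER="):
            return item[len("NER="):]
    return "O"


# Hypothetical sample sentence, written by hand for this example.
SAMPLE = "\n".join([
    "# text = Parliament met in Ljubljana.",
    "1\tParliament\tparliament\tNOUN\t_\tNumber=Sing\t2\tnsubj\t_\tNER=O",
    "2\tmet\tmeet\tVERB\t_\tTense=Past\t0\troot\t_\tNER=O",
    "3\tin\tin\tADP\t_\t_\t4\tcase\t_\tNER=O",
    "4\tLjubljana\tLjubljana\tPROPN\t_\tNumber=Sing\t2\tobl\t_\tNER=B-LOC|SpaceAfter=No",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\tNER=O",
])

if __name__ == "__main__":
    for tok in parse_conllu_sentence(SAMPLE):
        print(tok["form"], tok["upos"], tok["head"], tok["deprel"],
              named_entity_tag(tok))
```

Each token carries its part of speech, the index of its syntactic head, the dependency relation to that head, and (under the stated assumption) a named entity tag, which is the kind of information the task above adds to the raw transcripts.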

ParlaMint I Work Plan (2020-2021)

WP 1: Testing the approach for four languages (Lead: Maciej Ogrodniczuk (IPI-PAN), Petya Osenova (IICT-BAS))

T1.1: Preparation of the reference parliamentary corpora
T1.2: Creation of COVID-19 parliamentary corpora
T1.3: Mounting of the corpora on the NoSketch Engine and KonText concordancers
T1.4: Preparation of guidelines and mini-grant procedure

WP 2: Extending the corpora and showcasing (Lead: Tomaž Erjavec (IJS))

T2.1: Adding additional corpora to the infrastructure
T2.2: Preparation of showcases
T2.3: Preparation of the documentation for usage by interested parties

The ParlaMint I project (2020-2021)

Created comparable corpora of parliamentary debates:
- Of 29 European countries and autonomous regions
- From 2015 until 2022
- Containing over 1 billion words
Created uniformly encoded corpora:
- Inclusion of rich metadata about 24,000 speakers
- Linguistically annotated

INZ Research Group

Andrej Pančur, PhD

Programme Funder