ParlaCAP – Comparing agenda setting across parliaments via the ParlaMint dataset
Summary
The ParlaCAP project leverages advanced natural language processing to analyse political agendas and sentiment in debates from 27 European national parliaments. Automatically coding the agendas of more than 7 million speeches in more than 20 languages has only recently become possible, thanks to significant advances in natural language processing and artificial intelligence that allow multilingual transformer models to produce highly consistent and accurate coding. By integrating the ParlaMint dataset with the Comparative Agendas Project’s coding scheme, the project will create a comprehensive, FAIR dataset for comparative political research, enhancing transparency and accountability in legislative discourse across Europe.
Challenge
Open Science project, Open Science Service, Cross-domain/Cross-RI
Parliaments are the cornerstone of democracy in Europe, ensuring the political representation of citizens. Despite this empirical relevance, parliamentary studies have often limited their scope to a single parliamentary body or to a small group of parliaments analysed in comparative perspective.
The main challenge of the ParlaCAP project is to bridge the gap between existing parliamentary research data and their use in political science research. It does so by integrating two key international, cross-disciplinary initiatives: the CLARIN ERIC ParlaMint project, which provides the texts of parliamentary debates from 27 European national parliaments, and the Comparative Agendas Project (CAP), which offers a coding scheme of 21 topics for tracking political agendas in parliamentary proceedings.
Solution
The project will apply the Comparative Agendas Project’s text-as-data methodology to the debates of all 27 parliaments, comprising more than 7 million speeches in more than 20 languages. It will automatically code the agenda topic of each speech and transform the ParlaMint corpora into a structured, tabular dataset that can be downloaded in full through CESSDA ERIC.
The project also aims to code the sentiment expressed in each speech and to cross-reference the data with the PartyFacts database of political party metadata and the V-Dem surveys on the state of democracy. This enriched, fully FAIR dataset will be suitable for quantitative research, allowing political scientists to analyse topic and sentiment coding across an unprecedented number of parliaments and thereby gain a comprehensive understanding of how political attention is distributed across policy areas. The dataset will be made available through research infrastructures (RIs) such as CESSDA, CLARIN, and DARIAH, together with a graphical user interface and an API for broader accessibility.
Scientific Impact
ParlaCAP will revolutionise comparative parliamentary studies by providing a robust dataset for tracking political agenda setting across European parliaments. Its open, FAIR data management approach will support a wide range of RIs and projects in the social sciences, while promoting transparency in political discourse and the accountability of legislative bodies. The findings will have societal relevance, fostering collaboration in political science and beyond.
Moreover, the engagement activities planned within the project will provide services and accompanying tutorials to raise awareness within the political studies community and to ensure that the project’s results are FAIR and available for further application and research by scientists across Social Sciences and Humanities (SSH) domains.
Open science added value
The new FAIR dataset will feature speech-level metadata on democracies, parties, speakers, topics, and sentiment, accompanied by both the original and the translated text of the debates as supporting information. Because the data are structured, the dataset will better serve the needs of the CESSDA ERIC infrastructure, the CAP infrastructure on agenda setting in political discourse, the MEDEM infrastructure on monitoring electoral democracies, and any other RIs and research agendas concerned with parliamentary debates that rely primarily on structured data analysis.
- Project type: OSCARS project
- Period: 1. 1. 2025 – 1. 1. 2027
- Funders: Funded by the European Union
- Lead Organisation: Jožef Stefan Institute
- Partner Organisations:
  - Jožef Stefan Institute
  - Institute for Contemporary History
  - University of Zagreb
  - Bulgarian Academy of Sciences
  - Polish Academy of Sciences
- Head: Nikola Ljubešić