Historical parliamentary corpora: the Carniolan provincial assembly records
Alenka Kavčič, Darja Fišer, Andrej Pančur, Matija Marolt, 2025
Parliamentary debates are an invaluable source of information for research in many fields, as they reflect the prevailing ideas and beliefs of their time. We have developed a corpus of Slovenian parliamentary records from 1861–1913, covering 694 sessions of the Carniolan Provincial Assembly and containing about 10 million words. This article describes the process of creating the first version of the corpus, which involved obtaining OCR-processed stenographic notes of the parliamentary sessions, analysing their content to extract metadata, and developing a linguistically processed, metadata-rich, and structurally encoded corpus in the Parla-CLARIN format. Transforming these records into more accessible and analysable resources will facilitate in-depth study of the historical development and change of political concepts and ideas. In addition, a brief analysis of the corpus vocabulary is provided, focusing on the use of the Slovenian and German languages. As language played a key role in shaping ethnic identity during this period, the results are also discussed in their historical context. To make the corpus accessible to the public, we have developed a web application that supports intuitive exploration and efficient searching of the corpus.
- Authors: Alenka Kavčič, Darja Fišer, Andrej Pančur, Matija Marolt
- Year: 2025
- Publisher: Oxford University Press
- Source: Digital Scholarship in the Humanities