Foma is a Russian-language corpus compiled from foma.ru, the online portal of the Russian Orthodox magazine Foma, excluding Q&A content. The data were collected in December 2025.
The source portal foma.ru is part of the Orthodox media project “Foma”, which positions itself as “Orthodox media for those who doubt” and aims to explain Orthodox faith and church life in modern, accessible language.
“Foma” describes its specialization as cultural and educational. Its typical focus includes religion, society, culture, history, and related public issues, while avoiding discussion of day-to-day secular politics.
<doc> metadata: doc.language, doc.author, doc.status, doc.title, doc.pubyear, doc.source, doc.url, doc.rubric, doc.text_id.
<s> (attribute: s.id).
The corpus is annotated within the Universal Dependencies (UD) framework using UDPipe for lemmatization and UPOS tagging, with morphological features in UD FEATS format.
The tagger and lemmatizer model is based on UD Russian SynTagRus.
For the tagset and annotation notes, see: corpora.fisun.org/corpus-pages/tagset.html.
Positional attributes: word, lemma, pos, morph, lc.
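
UD FEATS strings follow the standard `Name=Value|Name=Value` layout, with a single `_` for an empty feature set. Assuming the morph attribute carries such strings unchanged, a minimal Python sketch for unpacking one value from exported data could look like this. The function name `parse_feats` and the sample value are illustrative and not taken from the corpus.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a UD FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    # UD uses a single underscore to mark "no features".
    if not feats or feats == "_":
        return {}
    result = {}
    for item in feats.split("|"):
        # partition() tolerates a malformed item without "=" (value stays "").
        name, _, value = item.partition("=")
        result[name] = value
    return result


# Hypothetical value of the morph attribute for a noun token.
example = parse_feats("Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing")
print(example["Case"])
```

A dictionary makes it easy to filter exported tokens by a single feature (for example, case) without matching against the full FEATS string.
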
| Item | Count |
|---|---|
| Tokens | 29,890,878 |
| Words | 23,434,351 |
| Sentences (<s>) | 1,877,821 |
| Documents (<doc>) | 31,744 |
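
The counts in the table above allow a few derived figures, such as average sentence and document length. The sketch below only does arithmetic on those numbers; the variable names are illustrative.

```python
# Counts copied from the table above.
tokens = 29_890_878
words = 23_434_351
sentences = 1_877_821
documents = 31_744

tokens_per_sentence = tokens / sentences
tokens_per_document = tokens / documents
sentences_per_document = sentences / documents
word_share = words / tokens  # fraction of tokens counted as words

print(f"tokens per sentence:    {tokens_per_sentence:.2f}")
print(f"tokens per document:    {tokens_per_document:.1f}")
print(f"sentences per document: {sentences_per_document:.1f}")
print(f"share of word tokens:   {word_share:.1%}")
```

By these figures, an average sentence has roughly 16 tokens and an average document roughly 940 tokens.
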
| Attribute | Count |
|---|---|
| word | 687,581 |
| lemma | 374,911 |
| pos | 17 |
| morph | 553 |
| lc | 607,073 |
The corpus defines <doc> with 9 attributes and <s> with 1 attribute (s.id).
| Structure | Attribute | Distinct values |
|---|---|---|
| <doc> | doc.author | 529 |
| <doc> | doc.language | 1 |
| <doc> | doc.pubyear | 5,576 |
| <doc> | doc.rubric | 373 |
| <doc> | doc.source | 1 |
| <doc> | doc.status | 200 |
| <doc> | doc.text_id | 31,744 |
| <doc> | doc.title | 31,647 |
| <doc> | doc.url | 31,744 |
| <s> | s.id | 1 |
Fisun, Roman. 2025. Foma: Russian Orthodox Magazine Corpus. Compiled from foma.ru (data collected in December 2025; Q&A content excluded). Available at: https://corpora.fisun.org/ (corpus name: foma). Accessed: <YYYY-MM-DD>.
Access to this corpus is restricted (password-protected) and provided on an “as is” basis for research and educational use only. This service does not grant any licence or other rights to the underlying texts.
The corpus, including any excerpts, downloads, or derived copies of the original texts, is not freely distributable. Reproduction, redistribution, republication, mirroring, or making the content publicly available is prohibited unless you have explicit permission from the respective rights holders and/or the source website.
Copyright and any other rights in the original texts remain with the respective source website and/or their authors. Users are solely responsible for ensuring that any use complies with applicable law and the terms of the original source.
The maintainer makes no warranties regarding completeness, accuracy, fitness for a particular purpose, or continued availability of the service.
name.surname [at] ur.de