IslamRus

IslamRus (islamrus) is a multi-portal Russian corpus of Islamic web discourse. The data were collected in December 2025 and January 2026; the current index was compiled in January 2026. Public subcorpora are provided by genre (doc.genre) and by portal (doc.portal).

Corpus query interface (NoSketch Engine): https://noske.fisun.org/#dashboard?corpname=islamrus

Tagset and annotation notes: https://corpora.fisun.org/corpus-pages/tagset.html

Sources (portals)

IslamRus combines three analytical layers of Russian Islamic discourse: institutional communication (muftiate register), editorial media writing (news and commentary), and educational/advisory writing (didactic texts and Q&A). Portal labels correspond to doc.portal.

Portal coverage (current index)

Token coverage is intentionally uneven: the index is built around large media sources and then complemented with smaller educational and institutional components to maximize register diversity.

Portal            Tokens        Share
islamdag.ru       11,804,986    32.8%
islam.ru          11,223,197    31.2%
islam-today.ru     7,843,284    21.8%
umma.ru            3,413,729     9.5%
azan.ru            1,212,972     3.4%
muftiyatrd.ru        523,292     1.5%
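
The Share column can be recomputed from the token counts. The following is a minimal sketch; the names PORTAL_TOKENS and shares are illustrative and not part of any corpus tooling.

```python
# Token counts per portal, copied from the coverage table above.
PORTAL_TOKENS = {
    "islamdag.ru": 11_804_986,
    "islam.ru": 11_223_197,
    "islam-today.ru": 7_843_284,
    "umma.ru": 3_413_729,
    "azan.ru": 1_212_972,
    "muftiyatrd.ru": 523_292,
}


def shares(counts):
    """Return each entry's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for portal, pct in shares(PORTAL_TOKENS).items():
        print(f"{portal:<16}{pct:>5.1f}%")
```

Because each share is rounded independently, the listed percentages need not add up to exactly 100%.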

Reach (where available)

Audience figures are heterogeneous across portals. Where explicit portal metrics or widely used third-party estimates are available, they are listed here as contextual information.

islamdag.ru

islamdag.ru is treated as a high-coverage media-and-advisory source with a strong regional anchor in Dagestan. The portal explicitly frames its work as religious education and as protection from pseudo-religious movements linked to extremist ideology.

islam.ru

islam.ru is treated as a large-scale media portal in the current index. It contributes substantial news and editorial discourse and functions as one of the backbone sources for mainstream Russian-language Islamic public writing in IslamRus.

islam-today.ru

islam-today.ru is treated as a federal-scale media source combining news reporting with explanatory and advisory genres. In its positioning statement, the portal links its ideological basis to traditions historically characteristic of Muslims of Tatarstan and Russia.

umma.ru

umma.ru is treated as an author-centered educational source in IslamRus. The site describes itself as an educational project, states a 1999 launch, and identifies Shamil Alyautdinov as the author of materials.

azan.ru

azan.ru is treated as a structured educational and reference source. A programmatic portal text describes the project as “traditional Sunni” in creed, with madhhab-based jurisprudence, and presents it as an educational initiative.

muftiyatrd.ru

muftiyatrd.ru is treated as the institutional component of IslamRus. It represents the public communication of the Muftiate of the Republic of Dagestan and is used to capture a formal, administrative register (announcements, statements, organizational texts).

Public subcorpora

Public subcorpora are provided by genre (doc.genre) and by portal (doc.portal). Genre subcorpora support controlled comparisons between editorial discourse, news reporting, and Q&A interaction.
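
The same restriction can also be written directly into a query. The sketch below builds a CQL string that limits matches to documents with a given doc attribute value, using the "within <doc ... />" construct of the Sketch Engine query language. The helper name restrict_cql and the example lemma are hypothetical, and escaping dots in portal names assumes that attribute values are matched as regular expressions.

```python
def restrict_cql(query, attribute, value):
    """Limit a CQL query to <doc> structures whose attribute matches value.

    Dots are escaped on the assumption that the value is interpreted
    as a regular expression (relevant for portal labels such as umma.ru).
    """
    escaped = value.replace(".", "\\.")
    return f'{query} within <doc {attribute}="{escaped}" />'


if __name__ == "__main__":
    # Hypothetical example lemma; any CQL token expression can be used.
    print(restrict_cql('[lemma="намаз"]', "genre", "qa"))
    print(restrict_cql('[lemma="намаз"]', "portal", "umma.ru"))
```

The resulting string can be pasted into the CQL field of the concordance form.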

By genre

By portal

Document structure

Units and segmentation

<doc> metadata fields

Metadata is extracted from source pages and normalized where possible. Field availability differs across portals and sections.

Field          Meaning
doc.text_id    Internal unique identifier (assigned during compilation).
doc.url        Source URL of the document.
doc.portal     Source portal label (domain-based).
doc.genre      Genre label used for public subcorpora: core, news, qa.
doc.title      Title/headline (for Q&A sources, the question text may be mapped to title during normalization).
doc.author     Author string when present.
doc.status     Author role/status descriptor when available (values may be noisy due to heterogeneous sources).
doc.rubric     Rubric/section label as defined by the source site.
doc.pubdate    Publication date when available.
doc.pubyear    Publication year.
doc.language   Language label when available.
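
For working with exported data, the fields above can be modelled as a record with optional values. The following is a minimal sketch that assumes the metadata appear as attributes on an opening <doc ...> tag, as in a typical vertical file; the class DocMeta, the function parse_doc_tag, and the sample values are illustrative and not taken from the corpus.

```python
import re
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DocMeta:
    """Metadata fields of a <doc> structure; any field may be missing."""
    text_id: Optional[str] = None
    url: Optional[str] = None
    portal: Optional[str] = None
    genre: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    status: Optional[str] = None
    rubric: Optional[str] = None
    pubdate: Optional[str] = None
    pubyear: Optional[str] = None
    language: Optional[str] = None


# Matches name="value" pairs inside a tag (assumed attribute syntax).
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_doc_tag(line):
    """Read known attributes from a <doc ...> line; empty values become None."""
    known = {f.name for f in fields(DocMeta)}
    attrs = {k: v for k, v in ATTR_RE.findall(line) if k in known and v}
    return DocMeta(**attrs)


if __name__ == "__main__":
    # Hypothetical sample line, not an actual corpus record.
    sample = '<doc text_id="d000001" portal="umma.ru" genre="qa" pubyear="2015" author="">'
    print(parse_doc_tag(sample))
```

Treating empty and absent attributes alike reflects the note that field availability differs across portals and sections.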

Linguistic annotation

The corpus is annotated in the Universal Dependencies (UD) framework using UDPipe, which provides lemmatization, UPOS tagging, and UD morphological features (FEATS).

Tagset and annotation notes: https://corpora.fisun.org/corpus-pages/tagset.html
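
UD morphological features follow the standard Name=Value|Name=Value notation, with an underscore for an empty feature set. The sketch below splits such a string into a dictionary; how FEATS are exposed as corpus attributes is described on the tagset page, so the function name parse_feats and the sample string are illustrative only.

```python
def parse_feats(feats):
    """Convert a UD FEATS string such as 'Case=Nom|Number=Sing' to a dict."""
    if feats in ("", "_"):
        # "_" marks an empty feature set in UD notation.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


if __name__ == "__main__":
    print(parse_feats("Case=Nom|Gender=Masc|Number=Sing"))
```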

Size (current index)

Tokens               36,021,460
Documents (<doc>)        76,385
Sentences (<s>)       1,971,787
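
Average unit lengths follow directly from these totals. The following is a small sketch with illustrative variable names that derives mean document and sentence length.

```python
# Totals copied from the size figures above.
TOKENS = 36_021_460
DOCUMENTS = 76_385
SENTENCES = 1_971_787

tokens_per_document = TOKENS / DOCUMENTS
tokens_per_sentence = TOKENS / SENTENCES
sentences_per_document = SENTENCES / DOCUMENTS

if __name__ == "__main__":
    print(f"tokens per document:    {tokens_per_document:.1f}")
    print(f"tokens per sentence:    {tokens_per_sentence:.1f}")
    print(f"sentences per document: {sentences_per_document:.1f}")
```

These are corpus-wide means; given the uneven portal coverage, per-portal or per-genre values may differ considerably.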

Minimal corpus profile

IslamRus is dominated by editorial and educational discourse (genre_core), while news and Q&A form smaller but analytically distinct registers. In the current index, core accounts for 72.3% of tokens, news for 21.2%, and Q&A for 6.6%.

Subcorpus sizes (tokens)

Subcorpus     Tokens        Share
genre_core    26,035,268    72.3%
genre_news     7,622,303    21.2%
genre_qa       2,363,889     6.6%
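
The genre and portal breakdowns partition the same index, so both should add up to the corpus total. The following is a minimal consistency check with illustrative names, using the figures listed in this document.

```python
TOTAL_TOKENS = 36_021_460

# Token counts by portal and by genre, as listed in this document.
BY_PORTAL = {
    "islamdag.ru": 11_804_986,
    "islam.ru": 11_223_197,
    "islam-today.ru": 7_843_284,
    "umma.ru": 3_413_729,
    "azan.ru": 1_212_972,
    "muftiyatrd.ru": 523_292,
}
BY_GENRE = {
    "genre_core": 26_035_268,
    "genre_news": 7_622_303,
    "genre_qa": 2_363_889,
}


def covers_total(parts, total):
    """Return True if the parts add up exactly to the total."""
    return sum(parts.values()) == total


if __name__ == "__main__":
    print("portals cover total:", covers_total(BY_PORTAL, TOTAL_TOKENS))
    print("genres cover total: ", covers_total(BY_GENRE, TOTAL_TOKENS))
```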

How to cite

Fisun, Roman. 2026. IslamRus: Multi-portal Russian Islamic web discourse corpus. Compiled from islamdag.ru, islam.ru, islam-today.ru, umma.ru, azan.ru, muftiyatrd.ru (multi-genre; core sections, news, and Q&A; data collected in December 2025 and January 2026). Available at: https://corpora.fisun.org/ (corpus name: islamrus). Accessed: <YYYY-MM-DD>.

Software

Terms of use

Access to this corpus is restricted (password-protected) and provided on an “as is” basis for research and educational use only. This service does not grant any license or other rights to the underlying texts.

The corpus (including any excerpts, downloads, or derived copies of the original texts) is not freely distributable. Reproduction, redistribution, republication, mirroring, or making the content publicly available is prohibited unless you have explicit permission from the respective rights holders and/or the source websites.

Copyright and any other rights in the original texts remain with the respective source websites and/or their authors. Users are solely responsible for ensuring that any use complies with applicable law and the terms of the original sources.

The maintainer makes no warranties regarding completeness, accuracy, fitness for a particular purpose, or continued availability of the service.

Contact

Maintainer: roman.fisun@ur.de