4 Language
MARC: 041a
4.1 Complete Dataset Overview
Unique languages: 174
1126967 single-language entries (93.89%)
73294 multilingual entries (6.11%)
Unrecognized language: 53833 documents (4.49%)
Conversions from raw to preprocessed language entries
Download language-harmonized dataset
Language codes are from MARC; new custom abbreviations can be added in this table.
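The harmonization and the single-language/multilingual split summarized above can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: the function names `harmonize` and `share_summary` are hypothetical, the code table is a small excerpt of MARC language codes, and the sketch assumes that raw 041a values list several codes separated by `;`. Codes missing from the table are mapped to "Undetermined", which matches how unrecognized languages appear in the tables of this report.

```python
# Assumed excerpt of the MARC language code table; the real table is much longer
# and can be extended with custom abbreviations.
MARC_LANGUAGES = {
    "fin": "Finnish",
    "swe": "Swedish",
    "ger": "German",
    "lat": "Latin",
    "rus": "Russian",
    "eng": "English",
    "fre": "French",
    "und": "Undetermined",
}


def harmonize(raw, table=MARC_LANGUAGES):
    """Map a raw value such as 'fin;swe' to 'Finnish;Swedish' (hypothetical helper)."""
    codes = [c.strip().lower() for c in raw.split(";") if c.strip()]
    names = []
    for code in codes:
        # Unknown codes fall back to "Undetermined"
        name = table.get(code, "Undetermined")
        if name not in names:
            names.append(name)
    return ";".join(names) if names else "Undetermined"


def share_summary(harmonized):
    """Count single-language and multilingual entries and the multilingual share."""
    total = len(harmonized)
    multi = sum(1 for entry in harmonized if ";" in entry)
    pct = round(100 * multi / total, 2) if total else 0.0
    return {"single": total - multi, "multilingual": multi, "multilingual_pct": pct}


entries = [harmonize(raw) for raw in ["fin", "fin;swe", "swe", "xyz"]]
summary = share_summary(entries)
```

Adding a custom abbreviation then amounts to adding one more key to the lookup table, without touching the counting logic.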
4.2 Subset Analysis: 1809-1917
Unique languages (1809-1917): 59
68466 single-language entries (93.18%)
5013 multilingual entries (6.82%)
Unrecognized language (1809-1917): 884 documents (1.2%)
Download language-harmonized dataset (1809-1917)
4.2.1 Top languages for 1809-1917
Number of titles assigned to each language (top 10). For a complete list, see accepted languages (1809-1917).
Language | Entries (n) | Fraction of complete dataset (%) |
---|---|---|
Finnish | 36772 | 3.1 |
Swedish | 21176 | 1.8 |
German | 2504 | 0.2 |
Finnish;Swedish | 2397 | 0.2 |
Latin | 2386 | 0.2 |
Russian | 2082 | 0.2 |
English | 1215 | 0.1 |
Undetermined | 884 | 0.1 |
French | 669 | 0.1 |
Finnish;English | 299 | 0 |
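The percentages in the table appear to be computed against the complete dataset rather than the 1809-1917 subset. The short check below illustrates this, assuming that each total is the sum of the single-language and multilingual counts reported in sections 4.1 and 4.2. The helper `pct` is hypothetical.

```python
# Totals inferred from the counts reported above (an assumption of this sketch)
COMPLETE_TOTAL = 1126967 + 73294  # section 4.1: single + multilingual entries
SUBSET_TOTAL = 68466 + 5013       # section 4.2: single + multilingual entries


def pct(n, total, digits=1):
    """Percentage of n in total, rounded to the given number of digits."""
    return round(100 * n / total, digits)


finnish = 36772
finnish_of_complete = pct(finnish, COMPLETE_TOTAL)  # matches the table value
finnish_of_subset = pct(finnish, SUBSET_TOTAL)      # share within 1809-1917
```

Under these assumptions Finnish accounts for 3.1% of the complete dataset, as in the table, but for about 50.0% of the 1809-1917 subset.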
Title count per language (including multi-language documents; note the log10 scale):
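The per-language count described in this caption differs from the table above because a multilingual entry such as `Finnish;Swedish` contributes to both Finnish and Swedish. A minimal sketch of that counting, with a hypothetical helper `titles_per_language` and made-up sample entries, and assuming `;` separates the languages of one title:

```python
import math
from collections import Counter


def titles_per_language(harmonized):
    """Count titles per language, crediting every language of a multilingual entry."""
    counts = Counter()
    for entry in harmonized:
        # A set ensures a language repeated within one entry counts once per title
        for lang in {part.strip() for part in entry.split(";") if part.strip()}:
            counts[lang] += 1
    return counts


# Illustrative sample only, not data from the catalogue
sample = ["Finnish", "Finnish;Swedish", "Swedish", "Finnish;English"]
counts = titles_per_language(sample)

# Values as they would be placed on a log10 axis
log_counts = {lang: math.log10(n) for lang, n in counts.items()}
```

The log10 transform keeps rare languages visible next to Finnish and Swedish, whose counts are several orders of magnitude larger.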