4 Language

MARC: 041a

4.1 Complete Dataset Overview

Unique languages: 174

Unique primary languages: 151

1126967 single-language entries (93.89%)

73294 multilingual entries (6.11%)

Unrecognized language: 53833 documents (4.49%)
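
The single-language and multilingual shares above can be recomputed from the harmonized entries. The following is a minimal sketch, assuming that a multilingual entry is stored as language names joined by `;` (as in the `Finnish;Swedish` row of section 4.2.1). The function names `split_languages` and `language_shares` and the sample data are hypothetical and not taken from the report's own pipeline.

```python
def split_languages(entry):
    """Split one harmonized entry into its language names.

    Assumption: languages are separated by ';' (e.g. "Finnish;Swedish").
    """
    return [part.strip() for part in entry.split(";") if part.strip()]


def language_shares(entries):
    """Count single-language and multilingual entries and their shares (%)."""
    single = multi = 0
    for entry in entries:
        n_languages = len(split_languages(entry))
        if n_languages == 1:
            single += 1
        elif n_languages > 1:
            multi += 1
        # Entries with no language at all are ignored in this sketch.
    total = single + multi
    return {
        "single": single,
        "multi": multi,
        "single_pct": round(100 * single / total, 2),
        "multi_pct": round(100 * multi / total, 2),
    }


# Hypothetical sample, not real catalogue data.
sample = ["Finnish", "Swedish", "Finnish;Swedish", "Latin"]
shares = language_shares(sample)
print(shares)
```

Applied to the counts reported above, the same arithmetic gives 1126967 / (1126967 + 73294) = 93.89% single-language and 6.11% multilingual entries.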

Conversions from raw to preprocessed language entries

Download language harmonized dataset

Language codes are from MARC; new custom abbreviations can be added in this table.
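
As an illustration of how raw 041a codes might be turned into harmonized names, here is a minimal sketch. It assumes a lookup table from MARC language codes to names, a custom abbreviation table that is consulted first, a `;` separator between raw codes, and a fallback to `Undetermined` for codes that are not recognized. The excerpt in `MARC_CODES` uses a handful of real MARC codes. `CUSTOM_CODES`, its example entry, and the function `harmonize` are hypothetical and do not describe the report's actual conversion table.

```python
# Small excerpt of MARC language codes (the full MARC list is much longer).
MARC_CODES = {
    "fin": "Finnish",
    "swe": "Swedish",
    "ger": "German",
    "lat": "Latin",
    "rus": "Russian",
    "eng": "English",
    "fre": "French",
    "und": "Undetermined",
}

# Hypothetical custom abbreviation, standing in for entries added by hand.
CUSTOM_CODES = {"fi": "Finnish"}


def harmonize(raw):
    """Map a raw language field to ';'-joined language names.

    Assumptions: raw codes are separated by ';', matching is
    case-insensitive, and unknown codes become "Undetermined".
    """
    names = []
    for code in raw.split(";"):
        code = code.strip().lower()
        if not code:
            continue
        name = CUSTOM_CODES.get(code) or MARC_CODES.get(code, "Undetermined")
        if name not in names:  # avoid listing the same language twice
            names.append(name)
    return ";".join(names) if names else "Undetermined"


print(harmonize("fin;swe"))
print(harmonize("fi;eng"))
print(harmonize("xx"))
```

Checking the custom table before the MARC table lets a manually added abbreviation override or extend the standard codes without editing the standard list.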

4.2 Subset Analysis: 1809-1917

Unique languages (1809-1917): 59

68466 single-language entries (93.18%)

5013 multilingual entries (6.82%)

Unrecognized language (1809-1917): 884 documents (1.2%)

Download language harmonized dataset (1809-1917)

4.2.1 Top languages for 1809-1917

Number of titles assigned to each language (top 10). For a complete list, see accepted languages (1809-1917).

Language          Entries (n)   Fraction (%)
Finnish                 36772            3.1
Swedish                 21176            1.8
German                   2504            0.2
Finnish;Swedish          2397            0.2
Latin                    2386            0.2
Russian                  2082            0.2
English                  1215            0.1
Undetermined              884            0.1
French                    669            0.1
Finnish;English           299            0
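
The Fraction column can be checked against the entry counts. The sketch below assumes that the denominator is the complete dataset (1126967 + 73294 = 1200261 entries, from section 4.1) rather than the 1809-1917 subset. This is an inference from the numbers, not something the report states. Under that assumption the rounded values match the table, whereas against the subset total of 73479 entries Finnish alone would be about 50%. The helper name `fraction_pct` is hypothetical.

```python
TOTAL_ALL = 1126967 + 73294    # complete dataset (section 4.1)
TOTAL_SUBSET = 68466 + 5013    # 1809-1917 subset (section 4.2)

# Entry counts copied from the top-10 table.
counts = {
    "Finnish": 36772,
    "Swedish": 21176,
    "German": 2504,
    "Finnish;Swedish": 2397,
    "Latin": 2386,
    "Russian": 2082,
    "English": 1215,
    "Undetermined": 884,
    "French": 669,
    "Finnish;English": 299,
}


def fraction_pct(n, total):
    """Share of n in total, as a percentage rounded to one decimal."""
    return round(100 * n / total, 1)


for language, n in counts.items():
    print(language, n, fraction_pct(n, TOTAL_ALL), fraction_pct(n, TOTAL_SUBSET))
```

The last printed column shows what the fractions would be if they were expressed relative to the 1809-1917 subset instead.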

Title count per language (including multi-language documents; note the log10 scale):
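
The counts behind such a plot could be tallied roughly as follows. This is a sketch only, assuming that each multi-language entry contributes once to every language it lists and that languages are separated by `;`. The function name `titles_per_language` is hypothetical, and the input uses only rows from the top-10 table above, so the totals it produces understate the full per-language counts.

```python
import math
from collections import Counter


def titles_per_language(entry_counts):
    """Sum entry counts per individual language.

    entry_counts maps a harmonized entry (e.g. "Finnish;Swedish")
    to the number of titles carrying that entry.
    """
    totals = Counter()
    for entry, n in entry_counts.items():
        # A set ensures a language repeated within one entry counts once.
        for language in {p.strip() for p in entry.split(";") if p.strip()}:
            totals[language] += n
    return totals


# Subset of rows from the top-10 table (not the full list of entries).
rows = {
    "Finnish": 36772,
    "Swedish": 21176,
    "Finnish;Swedish": 2397,
    "English": 1215,
    "Finnish;English": 299,
}

totals = titles_per_language(rows)
for language, n in totals.most_common():
    # log10 compresses the large gap between common and rare languages.
    print(language, n, round(math.log10(n), 2))
```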