
2 Author’s info: name, dates, gender, profession
2.1 Author name Fennica
MARC: 100a
The summary tables in this section show which author names were accepted and which were discarded during cleaning. Counts of missing values in the original data show how complete the field is. Name variants and pseudonyms are also taken into account, because the same author can appear under several names. Together, these summaries describe the composition of the author name field and the difficulties involved in identifying authors.
2.1.1 Complete Dataset Overview
- Unique accepted entries in original data: 0
- Unique discarded entries in original data (excluding NA cases): 0
- Original documents with non-NA titles 0 / 1229782 (NaN%)
- Original documents with missing (NA) titles 0 / 1229782 documents (NaN%)
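Each completeness line above has the form k / N (p%), where p = 100 · k / N. The following is a minimal Python sketch of how such counts could be produced. It assumes that missing values are represented as `None`, and the function name `na_summary` is hypothetical. Because the percentage is only defined when N > 0, the sketch prints `NaN%` otherwise.

```python
def na_summary(values: list) -> dict[str, str]:
    """Summarise non-NA and NA counts as 'k / N (p%)' strings.

    Assumption: missing values are stored as None.
    """
    n = len(values)
    missing = sum(1 for v in values if v is None)
    present = n - missing

    def pct(k: int) -> str:
        # A share is undefined for an empty input, so report NaN% instead.
        return f"{100 * k / n:.1f}%" if n else "NaN%"

    return {
        "non_na": f"{present} / {n} ({pct(present)})",
        "na": f"{missing} / {n} ({pct(missing)})",
    }


print(na_summary(["Virtanen, Maria", None, "Korhonen, Aino", "Nieminen, Juhani"]))
```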
2.1.1.1 Authors
0 unique authors. These final names merge all name variants listed in the custom author synonym table and exclude known pseudonyms (see below). If several names for the same author still appear on this list, they should be added to the author synonym table.
0 documents have unambiguous author information (NaN%).
Author name conversions: non-trivial conversions from the original raw data to the final names.
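Here is a minimal sketch of the harmonisation described above. It maps raw 100a names through a synonym table, drops known pseudonyms, and records every conversion where the final name differs from the raw string. The synonym and pseudonym entries in the example are hypothetical placeholders and are not taken from the actual author synonym table. The whitespace normalisation step is also an assumption.

```python
def harmonize(raw_names, synonyms, pseudonyms):
    """Map raw author names to final names.

    Returns (final_names, conversions). Pseudonyms become None (excluded),
    and conversions maps each raw string to its differing final name.
    """
    final, conversions = [], {}
    for raw in raw_names:
        if raw is None:
            final.append(None)
            continue
        name = " ".join(raw.split())      # collapse stray whitespace (assumed step)
        name = synonyms.get(name, name)   # merge known variants into one name
        if name in pseudonyms:            # exclude known pseudonyms
            name = None
        if name is not None and name != raw:
            conversions[raw] = name       # non-trivial conversion
        final.append(name)
    return final, conversions


# Hypothetical synonym and pseudonym tables, for illustration only.
SYNONYMS = {"Virtanen, M.": "Virtanen, Maria"}
PSEUDONYMS = {"Nimimerkki X"}

names, conv = harmonize(
    ["Virtanen, M.", "Virtanen,  Maria", "Nimimerkki X", None, "Korhonen, Aino"],
    SYNONYMS,
    PSEUDONYMS,
)
unique_authors = {n for n in names if n is not None}
print(len(unique_authors), conv)
```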
2.1.1.2 Auxiliary files
2.1.2 Subset Analysis: 1809-1917
Unique discarded entries in original data (excluding NA cases): 0
Top-20 authors and their title counts for the period 1809-1917.
The accompanying plot ranks these authors by the number of unique titles published by each.
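The ranking counts unique titles per author within a publication-year window. The sketch below shows that computation under simple assumptions. Each record is assumed to be an `(author, title, year)` tuple, and records with a missing field are skipped. The function name `top_authors` and the alphabetical tie-breaking rule are hypothetical choices.

```python
from collections import defaultdict


def top_authors(records, start, end, n=20):
    """Return the n authors with the most unique titles in [start, end]."""
    titles = defaultdict(set)
    for author, title, year in records:
        if author is None or title is None or year is None:
            continue
        if start <= year <= end:
            titles[author].add(title)  # a set counts each title once
    # Sort by descending title count, then alphabetically to break ties.
    ranked = sorted(titles.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(author, len(ts)) for author, ts in ranked[:n]]


records = [
    ("A", "t1", 1850), ("A", "t2", 1860), ("A", "t1", 1870),
    ("B", "t3", 1900), ("B", "t4", 1920), ("C", "t5", 1800),
]
print(top_authors(records, 1809, 1917))
```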
3 Author’s name after integration
4 Author’s date
4.1 Author’s lifetime
MARC: 100d
This section summarizes the author lifetime field after an extensive cleaning process and separates the accepted years from the discarded ones. The accepted years are the cleaned and validated values. The discarded years document the problems encountered and the decisions made during cleaning.
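One way to split 100d values into accepted and discarded years is to accept only strings that parse as a four-digit birth year with an optional four-digit death year. The following sketch is hypothetical and does not reproduce the rules actually used in the cleaning. It strips trailing punctuation and requires the death year to fall after the birth year. It also applies an arbitrary plausibility limit of 120 years on lifespan.

```python
import re

# Hypothetical accepted form: "YYYY-YYYY" or "YYYY-" (author still living / unknown death).
LIFETIME = re.compile(r"^(\d{4})-(\d{4})?$")
MAX_LIFESPAN = 120  # assumed plausibility limit, not from the source


def parse_lifetime(raw: str):
    """Return (birth, death) for an accepted value, or None if discarded."""
    text = raw.strip().rstrip(".,")
    m = LIFETIME.match(text)
    if not m:
        return None
    birth = int(m.group(1))
    death = int(m.group(2)) if m.group(2) else None
    if death is not None and not (birth < death <= birth + MAX_LIFESPAN):
        return None
    return birth, death


def split_lifetimes(values):
    """Partition raw 100d strings into accepted and discarded lists."""
    accepted, discarded = [], []
    for v in values:
        (accepted if parse_lifetime(v) is not None else discarded).append(v)
    return accepted, discarded


print(split_lifetimes(["1804-1877.", "1950-", "1877-1804", "n. 1700"]))
```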
4.2 Complete Dataset Overview
The field has 0 / 1229782 (0%) missing values and 1229782 / 1229782 (100%) records with non-missing lifetime information.
4.3 Subset Analysis: 1809-1917
Author date accepted for 1809-1917
Author date discarded for 1809-1917

4.4 Gender
Gender information is not originally included in Fennica. We enriched the data by linking author names with gender information from various sources, including the HENKO project, the Genderize dataset, manual curation and search, and additional records provided by the National Library of Finland. The full list of names and genders can be found here.
In total N = 1229782 records. After enrichment, 939699 / 1229782 records (76.4%) have an assigned gender. Of these, 144916 (15.4%) have female names, 253985 (27.0%) male names and 48008 (5.1%) unisex names.
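In the figures above, the assigned share is taken over all N records, while the female, male and unisex percentages are taken over the records with an assigned gender (for example, 144916 / 939699 ≈ 15.4%). The sketch below reproduces that computation by looking up each author's first forename in a name-to-gender table. It assumes names in "Surname, Forenames" form. The lookup table and the function names are hypothetical and stand in for the combined HENKO, Genderize and curated lists.

```python
from collections import Counter


def first_forename(name):
    """Extract the first forename from 'Surname, Forenames'; None if absent."""
    if name is None or "," not in name:
        return None
    forenames = name.split(",", 1)[1].split()
    return forenames[0] if forenames else None


def gender_summary(names, lookup):
    """Count gender assignments; shares per gender are relative to assigned records."""
    genders = [lookup.get(first_forename(n)) for n in names]
    assigned = [g for g in genders if g is not None]
    counts = Counter(assigned)
    total = len(names)
    return {
        "n": total,
        "assigned": len(assigned),
        "assigned_pct": round(100 * len(assigned) / total, 1) if total else None,
        "by_gender": {
            g: (k, round(100 * k / len(assigned), 1)) for g, k in counts.items()
        },
    }


LOOKUP = {"Maria": "female", "Aino": "female", "Juhani": "male", "Kim": "unisex"}  # hypothetical
names = ["Virtanen, Maria", "Korhonen, Aino Liisa", "Nieminen, Juhani",
         "Mäkinen, Kim", "Unknown", None]
print(gender_summary(names, LOOKUP))
```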
4.4.1 Genres over Time (1600-1950)

4.4.2 Gender in 1809-1917
In total N = 66860 records. After enrichment, 42868 / 66860 records (64.1%) have an assigned gender. Of these, 1795 (4.2%) have female names, 9594 (22.4%) male names and 1862 (4.3%) unisex names.
4.4.2.1 Genres over Time (1809-1917)

4.5 Author’s profession
Author profession in Fennica is recorded in MARC field 700e. We enrich it with information from Kanto. In total N = 1229782 records. Before enrichment, 700e has 695926 / 1229782 (56.6%) missing values. After enrichment, 533856 / 1229782 records (43.4%) have an assigned profession, covering 33331 distinct profession values.
| Profession | English | Entries (n) | Share of all records (%) |
|---|---|---|---|
| kääntäjä | translator | 50485 | 4.1 |
| professori | professor | 38067 | 3.1 |
| kirjailija | author | 29630 | 2.4 |
| kirjoittaja | writer | 19745 | 1.6 |
| tutkija | researcher | 14686 | 1.2 |
| lukija | reader | 14060 | 1.1 |
| toimittaja | editor | 13069 | 1.1 |
| kääntäjä,kirjailija | translator, author | 9661 | 0.8 |
| kääntäjä,lukija | translator, reader | 9221 | 0.7 |
| kirjoittaja,kääntäjä | writer, translator | 5580 | 0.5 |
| dosentti | docent | 5398 | 0.4 |
| lehtori,opettaja | lecturer, teacher | 4811 | 0.4 |
| opettaja | teacher | 3754 | 0.3 |
| kuvittaja | illustrator | 3171 | 0.3 |
| lääkäri | physician | 3061 | 0.2 |
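The shares in this table are taken over all N records (for example, 50485 / 1229782 ≈ 4.1%). The sketch below shows a possible enrichment and tabulation step. It fills a missing 700e value from a Kanto-derived lookup and then counts each distinct profession string, so combined values such as "kääntäjä,kirjailija" are counted as their own category, as in the table. The `(author_id, profession)` record shape and the `kanto` mapping keyed by author identifier are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def enrich_professions(records, kanto):
    """Fill missing 700e professions from a Kanto lookup keyed by author id (assumed)."""
    enriched = []
    for author_id, profession in records:
        if profession is None:
            profession = kanto.get(author_id)  # stays None if Kanto has no entry
        enriched.append(profession)
    return enriched


def profession_table(professions, top=15):
    """Return (profession, count, share of all records in %) for the top values."""
    n = len(professions)
    counts = Counter(p for p in professions if p is not None)
    return [(p, k, round(100 * k / n, 1)) for p, k in counts.most_common(top)]


records = [(1, "kääntäjä"), (2, None), (3, None), (4, "professori"), (5, "kääntäjä")]
kanto = {2: "kirjailija"}  # hypothetical Kanto-derived lookup
professions = enrich_professions(records, kanto)
print(profession_table(professions))
```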