2 Author’s info
2.1 Author name Fennica
MARC: 100a
The summary tables in this section describe the integrity of the author name data by showing which author names were accepted and which were discarded. A count of missing values in the original dataset shows how complete the data is, and the information on name variants and pseudonyms shows how differences in authorship representation are handled. Together these summaries describe the composition of the dataset and the difficulties involved in identifying authors.
2.1.1 Complete Dataset Overview
- Unique accepted entries in original data: 202872
- Unique discarded entries in original data (excluding NA cases): 29
- Original documents with non-NA author names: 776971 / 1229782 (63.2%)
- Original documents with missing (NA) author names: 452811 / 1229782 (36.8%)
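As a quick arithmetic check, each percentage above is a document count divided by the 1229782 original documents, and the two counts sum to that total. The sketch below recomputes them; the helper name `coverage` is hypothetical and only illustrates the calculation, not the code used to produce this report.

```python
def coverage(non_na: int, total: int) -> tuple:
    """Return (% non-NA, % NA), each rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    share = 100 * non_na / total
    return round(share, 1), round(100 - share, 1)


# Counts reported above for the Fennica author field.
TOTAL_DOCS = 1229782
WITH_AUTHOR = 776971

present, missing = coverage(WITH_AUTHOR, TOTAL_DOCS)
print(f"non-NA: {WITH_AUTHOR} / {TOTAL_DOCS} ({present}%)")
print(f"NA: {TOTAL_DOCS - WITH_AUTHOR} / {TOTAL_DOCS} ({missing}%)")
```

The same helper applies unchanged to the Kanto counts in section 2.2.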
2.1.1.1 Authors
202872 unique authors. These final names merge all name variants listed in the custom author synonym table and exclude known pseudonyms (see below). If multiple names for the same author still appear on this list, they should be added to the author synonym table.
776971 documents have unambiguous author information (63%).
Author name conversions: non-trivial conversions from the original raw data to the final names.
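The name cleaning described above (merging variants through a synonym table, excluding pseudonyms, and counting unique accepted and discarded entries while skipping NA cases) can be sketched as follows. This is a minimal illustration, not the pipeline used for this report: the synonym entries, the pseudonym set, and the helper names `clean_name` and `summarize` are hypothetical, and treating a pseudonym as a discarded entry is an assumption.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical synonym table: raw name variant -> final (preferred) name.
SYNONYMS = {
    "Runeberg, J. L.": "Runeberg, Johan Ludvig",
    "Runeberg, Joh. Ludv.": "Runeberg, Johan Ludvig",
}

# Hypothetical set of known pseudonyms; these are excluded from the final names.
PSEUDONYMS = {"Pseudonym, Example"}


def clean_name(raw: Optional[str]) -> Optional[str]:
    """Map one raw MARC 100a value to a final name, or None if it is discarded."""
    if raw is None:
        return None
    # 100a values commonly end with a comma before subfield d; drop it.
    name = raw.strip().rstrip(",").strip()
    if not name or name in PSEUDONYMS:
        return None
    # Collapse known variants onto the preferred form.
    return SYNONYMS.get(name, name)


def summarize(raw_names: Iterable[Optional[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (unique accepted final names, unique discarded raw values), ignoring NA."""
    accepted: Set[str] = set()
    discarded: Set[str] = set()
    for raw in raw_names:
        if raw is None:
            continue  # NA cases are excluded from both counts
        final = clean_name(raw)
        if final is None:
            discarded.add(raw)
        else:
            accepted.add(final)
    return accepted, discarded


raw = ["Runeberg, J. L.,", "Runeberg, Johan Ludvig,", "Topelius, Zacharias,",
       "Pseudonym, Example", None]
accepted, discarded = summarize(raw)
print(len(accepted), len(discarded))
```

Because the two Runeberg variants collapse onto one final name, they count as a single accepted entry, which is why duplicates remaining on the final list point to missing synonym-table rows.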
2.1.1.2 Auxiliary files
2.1.2 Subset Analysis: 1809-1917
Unique discarded entries in original data (excluding NA cases): 17
Top-20 authors and their title counts for the period 1809-1917.
The accompanying plot shows these authors ranked by the number of unique titles each has published.
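The ranking behind this table and plot can be sketched as counting unique titles per author within the period. The record layout `(author, title, year)`, the inclusive period bounds, the alphabetical tie-break, and the function name `top_authors` are assumptions for illustration; the report does not state how ties or reprints are handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Tuple[Optional[str], Optional[str], Optional[int]]  # (author, title, year)


def top_authors(records: Iterable[Record], start: int = 1809, end: int = 1917,
                n: int = 20) -> List[Tuple[str, int]]:
    """Rank authors by the number of unique titles published in [start, end]."""
    titles: Dict[str, Set[str]] = defaultdict(set)
    for author, title, year in records:
        # Skip records with a missing author, title, or year.
        if author is None or title is None or year is None:
            continue
        if start <= year <= end:  # both ends of the period are inclusive
            titles[author].add(title)
    # Most titles first; ties broken alphabetically for a stable ordering.
    ranked = sorted(titles.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(author, len(ts)) for author, ts in ranked[:n]]


records = [
    ("Author A", "Title 1", 1850),
    ("Author A", "Title 1", 1851),  # same title again: counted once
    ("Author A", "Title 2", 1900),
    ("Author B", "Title 3", 1820),
    ("Author B", "Title 4", 1920),  # outside 1809-1917: ignored
    (None, "Title 5", 1850),        # NA author: ignored
]
print(top_authors(records, n=2))
```

Counting a set of titles rather than records keeps reprints of the same work from inflating an author's rank.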
2.2 Author name Kanto
The author information is enriched using the finto R package, which fills in missing values for author name, author dates and author profession.
- Kanto author names
- Unique accepted entries in Kanto: 111281
- Unique discarded entries in Kanto (excluding NA cases): 3
- Original documents with non-NA Kanto author names: 696612 / 1229782 (56.6%)
- Original documents with missing (NA) Kanto author names: 533170 / 1229782 (43.4%)
2.3 Author name combined
This is the combination of Fennica and Kanto author names.
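Since Kanto is used to fill in values missing from Fennica, one plausible way to combine the two sources is a per-document coalesce: keep the Fennica name and fall back to the Kanto name only where Fennica is NA. The precedence rule and the function name `combine_names` are assumptions for illustration; this section does not state how conflicts between the two sources are resolved.

```python
from typing import List, Optional, Sequence, Tuple


def combine_names(fennica: Sequence[Optional[str]],
                  kanto: Sequence[Optional[str]]) -> Tuple[List[Optional[str]], int]:
    """Prefer the Fennica name; use the Kanto name only where Fennica is NA.

    Returns the combined names and the number of documents filled from Kanto.
    """
    if len(fennica) != len(kanto):
        raise ValueError("both sequences must describe the same documents")
    combined: List[Optional[str]] = []
    filled = 0
    for f, k in zip(fennica, kanto):
        if f is None and k is not None:
            combined.append(k)  # Fennica missing: take the Kanto value
            filled += 1
        else:
            combined.append(f)  # keep Fennica (possibly None if both are NA)
    return combined, filled


fennica = ["Runeberg, Johan Ludvig", None, None]
kanto = ["Runeberg, J. L.", "Topelius, Zacharias", None]
combined, filled = combine_names(fennica, kanto)
print(combined, filled)
```

The same coalescing pattern could apply to the combined dates in section 2.6, again under the assumption that Fennica takes precedence.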
2.4 Author Date/Lifetime Fennica
MARC: 100d
The author lifetime section gives concise summaries of the results of an extensive cleaning process, listing the accepted and discarded years for each author. The accepted years are the cleaned and validated data, while the discarded years show the problems encountered and the decisions made during cleaning.
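The split into accepted and discarded years can be illustrated with a small parser for MARC 100d lifetime strings. The accepted formats ("1804-1877", "1804-1877." and an open-ended "1850-"), the plausible year range 1400-2030, and the names `parse_lifetime` and `split_years` are assumptions for illustration, not the cleaning rules actually used in this report.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical accepted form: four-digit birth year, hyphen, optional death year.
LIFETIME = re.compile(r"(\d{4})-(\d{4})?")

Lifetime = Tuple[int, Optional[int]]


def parse_lifetime(raw: Optional[str], min_year: int = 1400,
                   max_year: int = 2030) -> Optional[Lifetime]:
    """Return (birth, death) for an accepted 100d value, else None."""
    if raw is None:
        return None
    text = raw.strip().rstrip(".,").strip()  # drop trailing MARC punctuation
    m = LIFETIME.fullmatch(text)
    if m is None:
        return None  # unrecognized format
    birth = int(m.group(1))
    death = int(m.group(2)) if m.group(2) else None
    if not (min_year <= birth <= max_year):
        return None  # implausible birth year
    if death is not None and not (birth <= death <= max_year):
        return None  # death before birth or out of range
    return birth, death


def split_years(values: Iterable[Optional[str]]) -> Tuple[List[Lifetime], List[str]]:
    """Separate accepted (parsed) lifetimes from discarded raw values; NA is skipped."""
    accepted: List[Lifetime] = []
    discarded: List[str] = []
    for raw in values:
        if raw is None:
            continue
        parsed = parse_lifetime(raw)
        if parsed is None:
            discarded.append(raw)
        else:
            accepted.append(parsed)
    return accepted, discarded


accepted, discarded = split_years(["1804-1877.", "1850-", "1890-1809", "n. 1800", None])
print(accepted, discarded)
```

Keeping the discarded raw strings, rather than dropping them, is what makes it possible to report which values failed and why.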
2.4.1 Complete Dataset Overview
2.4.2 Subset Analysis: 1809-1917
2.5 Author Date/Lifetime Kanto
2.6 Author Date/Lifetime combined
This is the combination of both Fennica and Kanto dates.
2.7 Author Profession
The profession of the author is only available via Kanto.
Author professions accepted from Kanto
Author professions discarded from Kanto