2 Author’s name
MARC: 100a
The summary tables in this section list the accepted and the discarded author names, which gives a view of the dataset’s integrity. Counts of missing values in the original dataset document its completeness. Information on name variants and pseudonyms covers cases where the same author appears under more than one name.
2.1 Complete Dataset Overview
- Unique accepted entries in original data: 197995
- Unique discarded entries in original data (excluding NA cases): 33
- Original documents with non-NA titles: 757075 / 1200261 (63.1%)
- Original documents with missing (NA) titles: 443186 / 1200261 (36.9%)
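The completeness percentages above follow directly from the two document counts. A minimal sketch of that arithmetic is given here; the function name `completeness` is hypothetical and is not taken from the report's own pipeline, which is not shown in this section.

```python
def completeness(non_missing: int, total: int) -> dict:
    """Return missing/non-missing counts and their shares in percent.

    Hypothetical helper for illustration; the counts passed in below
    are the ones listed in the overview.
    """
    missing = total - non_missing
    return {
        "non_missing": non_missing,
        "missing": missing,
        # Shares rounded to one decimal, as in the overview list.
        "non_missing_pct": round(100 * non_missing / total, 1),
        "missing_pct": round(100 * missing / total, 1),
    }


summary = completeness(non_missing=757075, total=1200261)
print(summary)
```

The two counts sum to the total number of original documents, so the two percentages sum to 100%.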
2.1.2 Auxiliary files
2.2 Subset Analysis: 1809-1917
- Unique discarded entries in original data (excluding NA cases): 18
Top-20 authors and their title counts for the period 1809-1917.
The accompanying plot shows the number of unique titles published by each of these authors.
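One way such a ranking could be computed is sketched below. This is an illustration under stated assumptions, not the report's actual method: the function `top_authors`, the `(author, title, year)` record layout, and the toy records are all hypothetical.

```python
from collections import defaultdict


def top_authors(records, n=20, start=1809, end=1917):
    """Rank authors by number of unique titles within [start, end].

    `records` is an iterable of (author, title, year) tuples; this
    layout is an assumption made for the example. Records with a
    missing author or year are skipped.
    """
    titles_by_author = defaultdict(set)
    for author, title, year in records:
        if author is None or year is None:
            continue
        if start <= year <= end:
            # A set keeps each title once per author.
            titles_by_author[author].add(title)
    counts = [(author, len(titles)) for author, titles in titles_by_author.items()]
    # Sort by descending count, then by name so ties are ordered stably.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


# Toy data for illustration only; not taken from the dataset.
records = [
    ("Author A", "Title 1", 1850),
    ("Author A", "Title 2", 1860),
    ("Author A", "Title 2", 1861),  # repeated title, counted once
    ("Author B", "Title 3", 1900),
    (None, "Title 4", 1870),        # missing author, skipped
    ("Author C", "Title 5", 1950),  # outside the period, skipped
]
ranking = top_authors(records, n=2)
print(ranking)
```

Counting titles in a set means that reprints or later editions with an identical title string count once per author, which matches the "unique titles" metric named above.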