3 Publication time
MARC field: 008/07-14
This section summarizes the dataset’s publication years and shows how titles are distributed over time. Links to the accepted and discarded output tables are included for transparency, so that both the refined and the excluded data can be explored in detail.
Data harmonization was performed using the polish_years function, originally developed to clean and refine publication time data from the 362a field, which contained numerous symbols requiring conversion for readability and statistical analysis. Although publication years are now extracted from the much cleaner 008 field, the function performs effectively on this field as well, eliminating the need to develop a new one.
The polish_years function processes and harmonizes temporal data, splitting it into columns such as publication_year, publication_from, publication_till (for serials), and publication_decade (for visualization). Links to the converted data are provided below.
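The column split described above can be illustrated with a minimal Python sketch. It is not the polish_years implementation: the helper names `parse_year` and `split_008_dates` are hypothetical, and the mapping of Date 1 and Date 2 onto publication_from and publication_till is an assumption made for illustration. Only the character positions (008/07-10 for Date 1, 008/11-14 for Date 2) come from the MARC 21 layout.

```python
import re
from typing import Dict, Optional

YEAR_RE = re.compile(r"[0-9]{4}")

def parse_year(raw: str) -> Optional[int]:
    """Return the year as an int, or None (NA) if the slot is not four digits."""
    return int(raw) if YEAR_RE.fullmatch(raw) else None

def split_008_dates(field_008: str) -> Dict[str, Optional[int]]:
    """Split MARC 008/07-14 into the columns named in the text (hypothetical helper)."""
    date1 = parse_year(field_008[7:11])    # 008/07-10, Date 1
    date2 = parse_year(field_008[11:15])   # 008/11-14, Date 2
    return {
        "publication_year": date1,
        "publication_from": date1,   # assumption: Date 1 starts the range
        "publication_till": date2,   # assumption: Date 2 ends the range (serials)
        "publication_decade": None if date1 is None else (date1 // 10) * 10,
    }

# Shortened, made-up 008 strings: only positions 00-17 are filled in.
monograph = split_008_dates("850101s1897    fi ")
serial = split_008_dates("850101d18601899fi ")
```

In this sketch a blank or non-numeric slot simply becomes None, which corresponds to NA in the refined data.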
The dataset includes information on missing values, represented as NA in the refined data. Field 008 is empty in 12 rows. Discarded values, such as invalid entries coded as characters (e.g., “uuuu”, “||||”) or inconsistent data (e.g., years beyond the current year or mismatched date ranges), are excluded. This does not imply that the discarded values are incorrect; they are excluded because they cannot be used for statistical analysis.
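The discard rules listed above could be expressed roughly as follows. This is a sketch under stated assumptions, not the report's actual filtering code: `discard_reason` and its reason strings are hypothetical, and treating a blank or 9999 Date 2 as an open-ended range is an assumption based on the MARC convention for resources that are still being published.

```python
import re
from datetime import date
from typing import Optional

FOUR_DIGITS = re.compile(r"[0-9]{4}")

def discard_reason(date1: str, date2: str = "    ",
                   current_year: Optional[int] = None) -> Optional[str]:
    """Return why a date pair would be discarded, or None if it is usable."""
    if current_year is None:
        current_year = date.today().year
    if not date1.strip():
        return "missing"                      # reported as NA rather than as an error
    if not FOUR_DIGITS.fullmatch(date1):
        return "non-numeric"                  # e.g. "uuuu", "||||", "18uu"
    if int(date1) > current_year:
        return "future year"
    if date2.strip() and date2 != "9999":     # blank or 9999: no end year to check
        if not FOUR_DIGITS.fullmatch(date2):
            return "non-numeric"
        if int(date2) > current_year:
            return "future year"
        if int(date2) < int(date1):
            return "mismatched range"
    return None
```

The range check only makes sense where Date 2 is an end year; for statuses such as reprints, where Date 2 holds an earlier original date, it would have to be skipped.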
The summary also accounts for publication statuses (field 008/06). For an accurate temporal distribution, each status and its implications must be considered so that only dates that actually represent the publication year are selected.
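To make the role of the status code concrete, the following sketch reads the two date slots differently depending on 008/06. The code letters are the MARC 21 "type of date" codes and the labels follow the status table in this section. The function `interpret_dates` and the choice of which statuses treat Date 2 as an end year are assumptions for illustration, not the harmonization logic actually used.

```python
from typing import Dict, Optional

STATUS_LABELS = {
    "b": "No dates given; B.C. date involved",
    "c": "Continuing resource currently published",
    "d": "Continuing resource ceased publication",
    "e": "Detailed date",
    "i": "Inclusive dates of collection",
    "m": "Multiple dates",
    "n": "Dates unknown",
    "p": "Date of distribution etc",
    "q": "Questionable date",
    "r": "Reprint/reissue date and original date",
    "s": "Single known date/probable date",
    "t": "Publication date and copyright date",
    "u": "Continuing resource status unknown",
    "|": "No attempt to code",
}
NO_DATE_STATUSES = {"b", "n", "|"}             # no usable publication year
END_YEAR_STATUSES = {"c", "d", "u", "m", "i"}  # assumption: Date 2 is an end year

def year_or_none(raw: str) -> Optional[int]:
    """Convert a four-digit slot to int; anything else becomes None."""
    return int(raw) if len(raw) == 4 and raw.isascii() and raw.isdigit() else None

def interpret_dates(status: str, date1: str, date2: str) -> Dict[str, object]:
    """Pick the publication year and end year according to the status code."""
    label = STATUS_LABELS.get(status, "Unrecognized code")
    year = None if status in NO_DATE_STATUSES else year_or_none(date1)
    till = year_or_none(date2) if status in END_YEAR_STATUSES else None
    if till == 9999:  # 9999 marks a resource that is still running, not a real year
        till = None
    return {"status": label, "publication_year": year, "publication_till": till}
```

For example, with status "t" the second slot is a copyright date and with "r" it is the original date of a reprint, so in this sketch neither is used as an end of publication.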
3.1 Complete Data Overview
Publication year discarded. 801 records are discarded because the publication date is not coded, is unknown, or is ambiguous (for example, it contains non-numeric characters). The error list is intended for librarians’ use.
Download publication time harmonized dataset
Publication years are available for 1229112 documents (100%). The publication years span 11-2025.
3.1.1 Title count per decade (log values)
3.1.2 Publication status summaries
The summary of the publication status field helps explain how publication years are recorded. The harmonization process relied on this field because the dates it qualifies do not always directly signify the start or end of publication.
Publication Status | Entries (n) | Fraction (%)
---|---|---
Single known date/probable date | 1099762 | 89.4 |
Continuing resource ceased publication | 45120 | 3.7 |
Publication date and copyright date | 39650 | 3.2 |
Continuing resource currently published | 25331 | 2.1 |
Questionable date | 8393 | 0.7 |
Inclusive dates of collection | 4735 | 0.4 |
Reprint/reissue date and original date | 4112 | 0.3 |
Multiple dates | 1997 | 0.2 |
Continuing resource status unknown | 486 | 0 |
Date of distribution etc | 106 | 0 |
Detailed date | 82 | 0 |
No attempt to code | 66 | 0 |
Dates unknown | 27 | 0 |
No dates given; B.C. date involved | 4 | 0 |
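The Fraction column can be recomputed from the Entries column. The short sketch below uses only the counts from the table above and prints the shares with more decimals, which shows that the rows displayed as 0 are small but nonzero. The variable names are illustrative.

```python
# Entries (n) copied from the publication status table.
counts = {
    "Single known date/probable date": 1099762,
    "Continuing resource ceased publication": 45120,
    "Publication date and copyright date": 39650,
    "Continuing resource currently published": 25331,
    "Questionable date": 8393,
    "Inclusive dates of collection": 4735,
    "Reprint/reissue date and original date": 4112,
    "Multiple dates": 1997,
    "Continuing resource status unknown": 486,
    "Date of distribution etc": 106,
    "Detailed date": 82,
    "No attempt to code": 66,
    "Dates unknown": 27,
    "No dates given; B.C. date involved": 4,
}

total = sum(counts.values())
shares = {status: 100 * n / total for status, n in counts.items()}

for status, share in shares.items():
    # Three decimals keep the rare statuses visible.
    print(f"{status:<42} {share:7.3f}")
```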
3.2 Subset Analysis: 1809-1917
In this segment we concentrate on the so-called “long 19th century”: literary production during the years 1809-1917, when the Grand Duchy of Finland was an autonomous part of the Russian Empire.
Publication year conversions (1809-1917)
Publication year discarded (1809-1917)
Download publication time harmonized dataset (1809-1917)
3.2.1 Title count per decade
The plot of title counts per decade from 1809 to 1917 shows the trends and fluctuations in literary output over this period.
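The per-decade counts behind such a plot could be derived along these lines. This is a minimal sketch: `titles_per_decade` and `log10_counts` are hypothetical helpers, and the sample years are made up for illustration rather than taken from the dataset. The optional log10 step mirrors the log values used for the complete-data decade plot.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional

def titles_per_decade(years: Iterable[Optional[int]],
                      start: int = 1809, end: int = 1917) -> Dict[int, int]:
    """Count titles per decade, keeping only years inside [start, end]."""
    counts = Counter(
        (y // 10) * 10
        for y in years
        if y is not None and start <= y <= end
    )
    return dict(sorted(counts.items()))

def log10_counts(per_decade: Dict[int, int]) -> Dict[int, float]:
    """Log-transform the counts, as is common for widely varying decade totals."""
    return {decade: math.log10(n) for decade, n in per_decade.items()}

# Made-up publication years; None stands for NA in the refined data.
sample_years = [1808, 1809, 1815, 1815, 1899, 1917, 1918, None]
per_decade = titles_per_decade(sample_years)
```

Note that with these bounds the first and last bins are partial decades: the 1800 bin can only contain 1809, and the 1910 bin ends at 1917.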