3 Publication time

MARC field: 008/07-14

The publication years section summarizes the years of publication in the dataset and shows the temporal distribution of titles. Links to tables of the unique accepted and discarded values make the process transparent and allow detailed exploration of both the refined and the excluded data.

Data were harmonized with the polish_years function. It was originally developed to clean publication time data from the 362a field, which contains numerous symbols that must be converted before the data are readable and usable for statistical analysis. Publication years are now extracted from the much cleaner 008 field, but the function works well on this field too, so no new function was needed.

The polish_years function processes and harmonizes temporal data, splitting it into columns such as publication_year, publication_from, publication_till (for serials), and publication_decade (for visualization). Links to the converted data are provided below.
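
A minimal Python sketch of this column split follows. It assumes the standard MARC 21 layout of field 008 (publication status at 008/06, Date 1 at 008/07-10, Date 2 at 008/11-14) and treats the continuing-resource statuses c, d and u as serials. The function names are hypothetical, and the logic illustrates the idea rather than reproducing the actual polish_years code.

```python
from typing import Optional

# Hypothetical sketch: column names follow the text, but the rules below are
# illustrative assumptions, not the polish_years implementation.
SERIAL_STATUSES = {"c", "d", "u"}  # 008/06 codes for continuing resources


def parse_year(raw: str) -> Optional[int]:
    """Return a four-digit year, or None for placeholders such as 'uuuu'."""
    if len(raw) == 4 and raw.isascii() and raw.isdigit():
        return int(raw)
    return None


def split_publication_time(field008: str) -> dict:
    """Split 008/06-14 into publication_year/from/till/decade columns."""
    status = field008[6]                 # 008/06: publication status
    date1 = parse_year(field008[7:11])   # 008/07-10: Date 1
    date2 = parse_year(field008[11:15])  # 008/11-14: Date 2
    row = {
        # Assumption: Date 1 serves as publication_year for every status
        "publication_year": date1,
        "publication_from": None,
        "publication_till": None,
        # Decade used for plotting, e.g. 1995 -> 1990
        "publication_decade": date1 // 10 * 10 if date1 is not None else None,
    }
    if status in SERIAL_STATUSES:
        row["publication_from"] = date1
        # 9999 in Date 2 marks a serial that is still being published
        row["publication_till"] = None if date2 == 9999 else date2
    return row
```

For example, `split_publication_time("850101d19101944fi ")` returns 1910 as publication_year, publication_from and publication_decade, and 1944 as publication_till.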

Missing values are represented as NA in the refined data. Field 008 is empty in 12 records. Discarded values include invalid entries coded with placeholder characters (e.g., “uuuu”, “||||”) and inconsistent data (e.g., years beyond the current year or mismatched date ranges). Discarding a value does not imply that it is incorrect. It is excluded only because it cannot be used for statistical analysis.
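
The discard rules can be sketched as follows. This is a hypothetical Python illustration, not the project's code. It assumes that a usable year is exactly four ASCII digits and that anything later than the current calendar year is invalid. It also reads "mismatched date ranges" as an end year earlier than the start year.

```python
import datetime
from typing import Optional, Tuple

CURRENT_YEAR = datetime.date.today().year


def clean_year(raw: str) -> Optional[int]:
    """Return a usable year, or None if the value should be discarded."""
    raw = raw.strip()
    # Placeholders such as 'uuuu', '||||' or partial years like '19uu'
    if len(raw) != 4 or not (raw.isascii() and raw.isdigit()):
        return None
    year = int(raw)
    # Years beyond the current year cannot be publication years
    return year if year <= CURRENT_YEAR else None


def clean_range(raw_from: str, raw_till: str) -> Tuple[Optional[int], Optional[int]]:
    """Clean both ends of a range; discard the range if it ends before it starts."""
    start, end = clean_year(raw_from), clean_year(raw_till)
    if start is not None and end is not None and end < start:
        return None, None
    return start, end
```

Under these assumptions, `clean_year("uuuu")` and `clean_year("||||")` both return None, and `clean_range("1920", "1910")` discards both ends. A Date 2 of 9999, which marks a serial still being published, would also fall to the future-year rule here, so a real pipeline would need to handle that code before this check.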

The summary also accounts for publication status (field 008/06). The status determines what the two dates in 008/07-14 mean, so it must be considered carefully. Only dates that represent the publication year should be selected, not, for example, a copyright date or the date of an original edition.
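
To illustrate why the status matters, the sketch below picks dates according to the standard MARC 21 codes for 008/06. The function names are hypothetical. The grouping of codes into statuses that yield a single year and statuses that yield a range is an assumption made for illustration, not the rule set actually used in the harmonization.

```python
from typing import Optional

# 008/06 codes where only Date 1 is a publication year (assumed grouping).
SINGLE_YEAR = {
    "s",  # single known date/probable date
    "t",  # publication date and copyright date: Date 2 is the copyright year
    "r",  # reprint/reissue date and original date: Date 2 is the original year
    "p",  # date of distribution etc.: Date 2 is the production date
    "e",  # detailed date: Date 2 holds month/day, not a year
    "q",  # questionable date: Date 1 is the earliest possible year
}
# 008/06 codes where Date 1 and Date 2 delimit a span of years.
RANGE = {"c", "d", "u", "i", "k", "m"}


def as_year(raw: str) -> Optional[int]:
    """Return a four-digit year or None for blanks and placeholders."""
    return int(raw) if len(raw) == 4 and raw.isascii() and raw.isdigit() else None


def select_dates(status: str, date1: str, date2: str) -> dict:
    """Choose which 008 dates represent publication time for a given status."""
    if status in SINGLE_YEAR:
        return {"year": as_year(date1), "from": None, "till": None}
    if status in RANGE:
        start, end = as_year(date1), as_year(date2)
        # 9999 marks a continuing resource that is still published
        return {"year": start, "from": start, "till": None if end == 9999 else end}
    # b (no dates; B.C.), n (dates unknown), | (no attempt to code), or unknown code
    return {"year": None, "from": None, "till": None}
```

With this grouping, a record coded t with the dates 1995 and 1994 yields 1995 as its publication year, so the copyright year never enters the statistics.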

3.1 Complete Data Overview

Publication year conversions

Publication year discarded. 801 records were discarded because their publication date is not coded, is unknown, or is ambiguous (e.g., it contains non-numeric characters). The error list is intended for librarians.

Download publication time harmonized dataset

Publication years are available for 1229112 documents (100%). They range from 11 to 2025.

3.1.1 Title count per decade (log values)

3.1.2 Publication status summaries

The visualization of the publication status field shows how publication years are recorded. The harmonization process depended on the publication status field because the dates in 008/07-14 do not always mark the start or end of publication. Their meaning varies with the status.

Publication status (008/06)               Entries (n)   Fraction (%)
Single known date/probable date               1099762           89.4
Continuing resource ceased publication          45120            3.7
Publication date and copyright date             39650            3.2
Continuing resource currently published         25331            2.1
Questionable date                                8393            0.7
Inclusive dates of collection                    4735            0.4
Reprint/reissue date and original date           4112            0.3
Multiple dates                                   1997            0.2
Continuing resource status unknown                486            0.0
Date of distribution, etc.                        106            0.0
Detailed date                                      82            0.0
No attempt to code                                 66            0.0
Dates unknown                                      27            0.0
No dates given; B.C. date involved                  4            0.0
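
The Fraction (%) column matches what you get by dividing each count by the sum of all listed entries and rounding to one decimal place. The short Python check below assumes that denominator.

```python
# Entry counts copied from the publication status table
counts = {
    "Single known date/probable date": 1099762,
    "Continuing resource ceased publication": 45120,
    "Publication date and copyright date": 39650,
    "Continuing resource currently published": 25331,
    "Questionable date": 8393,
    "Inclusive dates of collection": 4735,
    "Reprint/reissue date and original date": 4112,
    "Multiple dates": 1997,
    "Continuing resource status unknown": 486,
    "Date of distribution, etc.": 106,
    "Detailed date": 82,
    "No attempt to code": 66,
    "Dates unknown": 27,
    "No dates given; B.C. date involved": 4,
}

total = sum(counts.values())  # sum of all listed entries: 1229871
fractions = {status: round(100 * n / total, 1) for status, n in counts.items()}

for status, pct in fractions.items():
    print(f"{status:<42}{counts[status]:>11}{pct:>15.1f}")
```

The five smallest categories, and "Continuing resource status unknown", all round to 0.0%.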

3.2 Subset Analysis: 1809-1917

In this section we concentrate on the so-called “long 19th century”: literary production in the years 1809-1917, when the Grand Duchy of Finland was an autonomous part of the Russian Empire.

Publication year conversions (1809-1917)

Publication year discarded (1809-1917)

Download publication time harmonized dataset (1809-1917)

3.2.1 Title count per decade

A plot of title counts per decade from 1809 to 1917 shows the trends and fluctuations in literary output over this period.
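
Counting titles per decade for the subset takes only a few lines of Python. In this sketch the function name is hypothetical and the input is assumed to be a list of already harmonized publication years. The log option uses base 10. That base is an assumption, because the section does not state which base the log-value plot uses.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def titles_per_decade(
    years: Iterable[int], start: int = 1809, end: int = 1917, log: bool = False
) -> Dict[int, float]:
    """Count titles per decade for years in [start, end], optionally as log10."""
    counts = Counter(year // 10 * 10 for year in years if start <= year <= end)
    if log:
        return {decade: math.log10(n) for decade, n in sorted(counts.items())}
    return dict(sorted(counts.items()))
```

Because the subset starts in 1809 and ends in 1917, the first bin (1800s) covers only one year and the last bin (1910s) only eight. Keep this in mind when comparing the edge decades with the rest of the plot.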