6 Genre/form 655
MARC: 655,a
6.2 Description
Field 655 (Genre/Form) contains genre and form descriptors recorded as free text in the bibliographic metadata. These entries are heterogeneous: multiple values may occur within a single record (typically separated by delimiters such as |), and labels vary in spelling, formatting, and level of standardization.
To enable consistent analysis, the field was harmonized using a controlled vocabulary:
Controlled vocabulary matching: extracted values were matched against a curated list derived from SLM – Suomalainen lajityyppi- ja muotosanasto, a Finnish genre and form thesaurus maintained within the Finto ontology service. This provides a standardized set of genre terms in Finnish.
Filtering of unmatched values: only values that could be directly matched to SLM labels were retained. All other entries were classified as unrecognized and excluded from the harmonized field. These discarded values are reported separately for transparency and quality assessment.
Reconstruction of harmonized field: validated labels were recombined into a semicolon-separated string (`harmonized`) per record. As SLM is a Finnish-language vocabulary, all retained genre labels are treated as Finnish, and no additional language assignment was required.
Missing data handling: records without valid matches were set to `NA`. The proportion of missing values reflects both absence of genre information and limitations in mapping to the controlled vocabulary.
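The steps above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce this report: the function name `harmonize_655`, the tiny `SLM_LABELS` set, the `|` input delimiter, and the use of `None` in place of `NA` are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the curated SLM label list; the real list is far larger.
SLM_LABELS = {"väitöskirjat", "äänikirjat", "muistelmat", "kartat"}


def harmonize_655(
    raw: Optional[str], vocabulary: set[str], sep: str = "|"
) -> tuple[Optional[str], list[str]]:
    """Return (harmonized, unrecognized) for one raw 655 value."""
    if raw is None:
        return None, []

    # Split the free-text field on the delimiter and drop empty fragments.
    values = [v.strip() for v in raw.split(sep)]
    values = [v for v in values if v]

    # Keep only exact matches against the controlled vocabulary;
    # everything else is set aside for separate reporting.
    accepted = [v for v in values if v in vocabulary]
    unrecognized = [v for v in values if v not in vocabulary]

    # Recombine accepted labels; records with no valid match become None (NA).
    harmonized = "; ".join(accepted) if accepted else None
    return harmonized, unrecognized
```

For example, `harmonize_655("väitöskirjat | kartat|Fiktio", SLM_LABELS)` keeps the two vocabulary labels as one semicolon-separated string and returns `Fiktio` in the unrecognized list, since the sketch matches labels exactly and case-sensitively.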
This workflow produces a standardized and reproducible representation of genre/form information, enabling reliable aggregation and comparison across records while preserving information about discarded and ambiguous cases.
6.3 Complete Dataset Overview
Unique genre/form: 1213
There are 1070036 missing values in the dataset, accounting for 81.18% of the total.
Unrecognized genre_655s lists the Genre/Form values that were discarded (0 in total).
| Genre/Form | Entries (n) | Fraction (%) |
|---|---|---|
| väitöskirjat | 75774 | 5.7 |
| äänikirjat | 32976 | 2.5 |
| muistelmat | 12565 | 1 |
| tiedotuslehdet | 9010 | 0.7 |
| kartat | 8323 | 0.6 |
| elämäkerrat | 4920 | 0.4 |
| oppaat | 4094 | 0.3 |
| sukukirjat | 3770 | 0.3 |
| asiakaslehdet | 3663 | 0.3 |
| henkilöstölehdet | 2751 | 0.2 |
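A summary of this kind could be computed roughly as follows. This is a sketch under stated assumptions, not the code behind the report: `summarize_genres` is a hypothetical helper, missing values are represented as `None`, and each harmonized string is counted as one category. The fraction is taken over all records, including missing ones, which appears consistent with the figures above (1070036 missing at 81.18% implies about 1.32 million records, of which 75774 is about 5.7%).

```python
from collections import Counter
from typing import Optional


def summarize_genres(harmonized: list[Optional[str]], n: int = 10) -> dict:
    """Summarize harmonized genre/form values: unique count, missing share, top-n rows."""
    total = len(harmonized)
    if total == 0:
        return {"unique": 0, "missing": 0, "missing_pct": 0.0, "rows": []}

    # Count each harmonized string as one category; None marks a missing value.
    counts = Counter(v for v in harmonized if v is not None)
    missing = total - sum(counts.values())

    # Fractions are relative to all records, missing ones included.
    rows = [
        (label, k, round(100 * k / total, 1))
        for label, k in counts.most_common(n)
    ]
    return {
        "unique": len(counts),
        "missing": missing,
        "missing_pct": round(100 * missing / total, 2),
        "rows": rows,
    }
```

Each row is an `(label, entries, fraction)` tuple, matching the three table columns.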
6.4 Subset Analysis: 1809-1917
Unique genre_655s (1809-1917): 376
There are 56002 missing values in the 1809-1917 subset, accounting for 76.66% of the subset total.
Download genre_655 harmonized dataset (1809-1917)
6.4.1 Top Genre/Form for 1809-1917
Accepted Genre/Form (1809-1917).
| Genre/Form | Entries (n) | Fraction (%) |
|---|---|---|
| väitöskirjat | 4853 | 6.6 |
| luettelot | 1476 | 2 |
| virret; arkkiveisut | 794 | 1.1 |
| esitteet | 743 | 1 |
| sanomalehdet | 564 | 0.8 |
| arkkiveisut | 492 | 0.7 |
| hartauskirjat | 297 | 0.4 |
| kokousjulkaisut | 295 | 0.4 |
| huumorilehdet; kausijulkaisut | 256 | 0.4 |
| uskonnolliset lehdet | 253 | 0.3 |