Scandinavian Working Papers in Economics

SSE Working Paper Series in Economic History,
Stockholm School of Economics

No 2026:2: Relational Databases and Machine Learning for Qualitative Big Data

Erik Lakomaa and Christoffer Friedl
Additional contact information
Erik Lakomaa: Institute for Economic and Business History Research, Postal: Stockholm School of Economics, P.O. Box 6501, SE-113 83 Stockholm, Sweden
Christoffer Friedl: Stockholm School of Economics, Postal: Stockholm School of Economics, P.O. Box 6501, SE-113 83 Stockholm, Sweden

Abstract: Recent advances in large-scale digitisation have created new opportunities for economic and business historians who work with substantial bodies of qualitative archival material. Although historical datasets of around 10,000 to 100,000 observations are modest compared to conventional big data, they present similar information processing challenges and make it possible to apply a wide range of machine learning techniques. In this paper, we show how relational databases can provide the necessary infrastructure for preparing, structuring, and analysing large qualitative historical datasets, and how they support the effective use of machine learning tools, including Large Language Models (LLMs). We draw on a research program that has collected more than 114,000 digitised documents from over 30 archives. Our relational database design enables us to structure unstructured sources, standardise metadata, link documents to events and actors, and create longitudinal datasets that can be used for supervised learning, topic modelling, document classification, and embedding-based similarity searches. We also assess the value and limitations of LLMs in historical research. LLMs can accelerate tasks such as document triage, entity recognition, thematic grouping, and preliminary coding. At the same time, they introduce risks related to hallucinations, opaque reasoning processes, and difficulties in tracing the evidentiary basis of outputs. We argue that relational databases reduce these risks by retaining document-level traceability, by making the full set of consulted sources transparent, and by preserving the epistemological chain from tentative AI suggestion to subsequent researcher validation, so that researchers can verify and reinterpret AI-assisted results.
Our contribution is an empirically grounded demonstration of how qualitative big data, relational databases, and machine learning methods can be combined to advance economic history, along with a discussion of the safeguards needed to ensure these tools are used responsibly.
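
The traceability argument in the abstract can be illustrated with a minimal sketch. The paper's actual schema is not given on this page, so every table name, column name, and sample row below is a hypothetical assumption chosen for illustration, using Python's standard `sqlite3` module. The sketch links documents to actors and stores each AI suggestion as a tentative row that a researcher must accept or reject, so that every accepted label can be traced back to a source document and a validator.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce document-level links
    conn.executescript("""
    CREATE TABLE documents (
        doc_id    INTEGER PRIMARY KEY,
        archive   TEXT NOT NULL,
        doc_date  TEXT,
        full_text TEXT
    );
    CREATE TABLE actors (
        actor_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE
    );
    -- Many-to-many link between documents and the actors they mention.
    CREATE TABLE document_actors (
        doc_id   INTEGER NOT NULL REFERENCES documents(doc_id),
        actor_id INTEGER NOT NULL REFERENCES actors(actor_id),
        PRIMARY KEY (doc_id, actor_id)
    );
    -- AI output is stored as a tentative suggestion, never as a final code.
    CREATE TABLE ai_suggestions (
        suggestion_id INTEGER PRIMARY KEY,
        doc_id        INTEGER NOT NULL REFERENCES documents(doc_id),
        model         TEXT NOT NULL,
        label         TEXT NOT NULL,
        status        TEXT NOT NULL DEFAULT 'tentative'
                      CHECK (status IN ('tentative', 'accepted', 'rejected')),
        validated_by  TEXT
    );
    """)
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        [
            (1, "Archive A", "1985-03-01", "Board minutes discussing the merger."),
            (2, "Archive B", "1986-07-15", "Letter about export credits."),
        ],
    )
    conn.execute("INSERT INTO actors VALUES (1, 'Example Bank')")
    conn.execute("INSERT INTO document_actors VALUES (1, 1)")
    conn.executemany(
        "INSERT INTO ai_suggestions (suggestion_id, doc_id, model, label) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "example-llm", "merger"), (2, 2, "example-llm", "merger")],
    )
    conn.commit()
    return conn


def validate(conn, suggestion_id, researcher, accept):
    """Record a researcher's decision on a tentative AI suggestion."""
    conn.execute(
        "UPDATE ai_suggestions SET status = ?, validated_by = ? "
        "WHERE suggestion_id = ?",
        ("accepted" if accept else "rejected", researcher, suggestion_id),
    )
    conn.commit()


def accepted_labels(conn):
    """Return accepted labels together with their source document and validator."""
    return conn.execute(
        "SELECT d.doc_id, d.archive, s.label, s.validated_by "
        "FROM ai_suggestions AS s JOIN documents AS d USING (doc_id) "
        "WHERE s.status = 'accepted' ORDER BY d.doc_id"
    ).fetchall()
```

Calling `validate(conn, 1, "researcher_1", True)` and then `accepted_labels(conn)` would return only the suggestion that a researcher has confirmed, alongside the archive it came from. Rejected suggestions stay in the table rather than being deleted, which is one plausible way to keep the chain of AI proposal and human judgement open to later reinterpretation.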

Keywords: Qualitative methods; Artificial intelligence; Machine Learning; Big data; Economic History

JEL-codes: A00; N00; N01

Language: English

23 pages, February 11, 2026

Questions (including download problems) about the papers in this series should be directed to Erik Lakomaa.
Report other problems with accessing this service to Sune Karlsson.

RePEc:hhs:haechi:2026_002