Abstract:
The reconstruction of news flows in early modern Europe is a research topic spanning media, time and space. Quantitative methods can support it through semi-automatic techniques for the large-scale study of intertextuality.
Sharing Brendan Dooley's view that “the first step in tracing news flows is to compare typical texts”, we are developing algorithms that automatically detect textual borrowings in order to reconstruct these flows. Our corpus comprises both handwritten and printed newsletters, which allows us to study interactions both within and between the two media. All texts are in Italian, yet the language is not standardised and varies greatly across sources. We therefore need fuzzy, language-independent comparison methods, which are in fact relevant to early modern texts in general.
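As an illustration of the kind of fuzzy, language-independent comparison meant here, the following minimal sketch measures the overlap between the character trigram sets of two passages (Jaccard similarity). Character n-grams need no dictionary or lemmatiser, so orthographic variants such as "Venetia" and "Venezia" still share most of their n-grams. The function names, the trigram size and the sample phrases are illustrative assumptions, not the authors' implementation.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, whitespace-normalised text.

    Hypothetical helper: n=3 (trigrams) is an assumed default.
    """
    normalised = " ".join(text.lower().split())
    padded = f" {normalised} "  # pad so word boundaries also form n-grams
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity (|A & B| / |A | B|) between the n-gram sets of two texts."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty texts are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Illustrative phrases only, not taken from the corpus.
    variant_a = "Di Venetia li 12 di Marzo"
    variant_b = "Di Venezia li 12 di marzo"
    unrelated = "Lettere da Roma sopra la guerra"
    print(ngram_jaccard(variant_a, variant_b) > ngram_jaccard(variant_a, unrelated))
```

A single changed letter affects only the few n-grams that contain it, which is why such measures tolerate non-standardised spelling better than exact word matching.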
Our techniques are mixed and tailored to each medium: OCRed texts extracted from printed sources are compared with manually constructed keyword graphs derived from handwritten sources. We use a multi-layer graph representation that records named entities, quantities and other meaningful information, linking them by agentivity or specification, with the aim of reconstructing the signature of each news item. The different representations are then compared using ad hoc techniques, among which vector and matrix similarity, string kernels originally developed for protein classification, and more traditional n-gram methods have proved useful.
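String kernels compare texts through the substrings they share. One well-known kernel from the protein-classification literature is the k-spectrum kernel, which counts every length-k substring of each sequence and takes the inner product of the two count vectors. The sketch below normalises that product into a cosine score between 0 and 1, which also makes it an instance of vector similarity. The abstract does not say which kernel or which k the project uses, so the function names and k=4 are assumptions for illustration only.

```python
import math
from collections import Counter


def spectrum(text: str, k: int = 4) -> Counter:
    """Count every contiguous substring of length k (the k-spectrum) of text.

    Hypothetical helper: k=4 is an assumed default.
    """
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 4) -> float:
    """Normalised k-spectrum kernel: cosine of the two k-mer count vectors."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Inner product over shared k-mers; a Counter returns 0 for missing keys.
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm = math.sqrt(sum(c * c for c in sa.values()) * sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0  # texts shorter than k score 0


if __name__ == "__main__":
    print(spectrum_kernel("gionse la nave da Candia", "giunse la nave di Candia"))
```

Because the kernel works on raw character sequences, it could in principle be applied both to OCR output and to keyword strings; OCR noise would probably call for lowercasing and whitespace normalisation beforehand.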
The research is mainly experimental and methodological, with a view to reusing and extending the methods developed.