Browsing by Author "Muñoz Cruces, Martín Alonso"
Now showing 1 - 1 of 1
Results Per Page
Sort Options
- ItemOutput-Linear enumeration for extensions of MSO(2025) Muñoz Cruces, Martín Alonso; Riveros Jaeger, Cristian; Pontificia Universidad Católica de Chile. Escuela de IngenieríaEnumeration algorithms for data extraction queries are especially relevant in modern day systems where data reaches extremely large sizes while sometimes having little-to-no organization. This thesis proposes an enumeration framework for queries for data extraction that can be modeled by different formal languages, and in some cases, over compressed data. This is done by the development of an enumeration data structure called Enumerable Compact Set, which provides a set of basic operations over sets, each implementable in constant time, that render a concise representation of the set of results which can then be enumerated efficiently. First, this thesis provides an algorithm for enumerating results of a query that has been modeled by a nested-word automata, over a nested document. This algorithm is built to work in a streaming setting, and requires time that is independent from the data size (constant in data-complexity) per input symbol. Second, we detail three algorithms for enumeration for queries that are modeled by contextfree grammars. We deal with the cases of unambiguous, rigid and determinable grammars, which require cubic, quadratic and linear time over the data, respectively. Third, we show enumeration over compressed documents via an algorithm that receives a regular-language query and a document that is concisely represented as a straight-line program. The enumeration can be done after linear time preprocessing in the size of thecompact representation. This thesis also serves as a proof of concept of the Annotated Automata model. This model properly subsumes Document Spanners efficient reductions, and allows the construction of algorithms that we deem to be more intuitive and easier to implement. Needless to say, all of our results are also applicable to Document Spanners.