Efficient Tree-Structured Categorical Retrieval

Djamal Belazzougui, Gregory Kucherov

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log(1 + o(1)) + logD + O(h)) + O() bits of space and O(p + t) query time, where n is the total length of the documents,the size of the alphabet used in the documents andis the total number of nodes in the category tree. Another solution uses n(log(1+o(1))+O(logD))+O()+O(Dlog n) bits of space and O(p+t logD) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time. 2012 ACM Subject Classification Theory of computation ! Pattern matching; Information systems ! Document representation; Information systems ! Information retrieval query processing; Theory of computation ! Data structures design and analysis.

Original languageEnglish
Title of host publication31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
EditorsInge Li Gortz, Oren Weimann
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959771498
DOIs
Publication statusPublished - 1 Jun 2020
Externally publishedYes
Event31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark
Duration: 17 Jun 202019 Jun 2020

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume161
ISSN (Print)1868-8969

Conference

Conference31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Country/TerritoryDenmark
CityCopenhagen
Period17/06/2019/06/20

Keywords

  • Category tree
  • Document retrieval
  • Pattern matching
  • Space-efficient data structures

Fingerprint

Dive into the research topics of 'Efficient Tree-Structured Categorical Retrieval'. Together they form a unique fingerprint.

Cite this