PML-TQ - Tool for Querying Treebanks

PML-TQ is a powerful open-source search tool for all kinds of linguistically annotated treebanks. It offers several client interfaces and two search backends (one based on an SQL database and one based on Perl and the TrEd toolkit). The tool works natively with treebanks encoded in the PML data format; conversion scripts are available for many established treebank formats.


Screenshots: PML-TQ in TrEd; PML-TQ in Opera

Getting Started

Search your local files:

Use the client-side PML-TQ search engine, which is part of the pmltq extension to the tree editor TrEd (see the Clients section below).

Register to search various treebanks using our server:

We host a PML-TQ search service for PDT 2.0 and various other treebanks, including Penn Treebank 3, Penn Chinese Treebank, Penn Arabic Treebank, and Tiger Corpus 1.0. To register, send an email to Jan Štěpánek; for some treebanks, you will need to obtain a license from the treebank distributor. The server is accessible from several clients, including modern web browsers and TrEd (see Clients).

Search any treebank on your own PML-TQ server:

Download and install the PML-TQ server (Linux, UNIX, Mac OS X) on your computer/server.


Documentation


Clients

Web Browser

Any web browser with good support for SVG rendering, CSS, and JavaScript can be used as a client to a PML-TQ server (we recommend Opera).

TrEd

A fully graphical client for PML-TQ with client-side searching capability is part of the tree editor TrEd (GPL-licensed software, available separately) as an extension called pmltq. Several other extensions provide PML schemas and visualization stylesheets for various treebanks.

To install this extension, start TrEd, select Setup > Manage Extensions > Get New Extensions, and select 'pmltq'. When done, press Shift+F3 to start the search. Select 'Treebank (server)' to search using a PML-TQ server, or 'Files (local)' to search local files using the client-side search engine built into TrEd.

Command-line

A simple text-based client called pmltq is included in the server package.


Server

This distribution contains a fast and efficient implementation of PML-TQ powered by an SQL database with a client-server architecture (HTTP client -> custom HTTP server -> CGI -> SQL database backend).
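
The first hop of this chain (HTTP client -> HTTP server) can be illustrated with a minimal Python sketch. It only constructs a request object and does not send it. The host and port, the "/query" path, the "query" parameter name, and the sample query string are all assumptions made for illustration; they are not taken from the PML-TQ documentation, so check the README in the distribution for the actual interface.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of a locally running PML-TQ HTTP server.
BASE_URL = "http://localhost:8082"


def build_query_request(base_url, query, path="/query"):
    """Build (but do not send) a POST request carrying a query string.

    The path and the 'query' form field are illustrative assumptions,
    not the documented PML-TQ server API.
    """
    body = urlencode({"query": query}).encode("utf-8")
    return Request(base_url.rstrip("/") + path, data=body, method="POST")


# Illustrative query text; adapt it to the treebank actually being searched.
request = build_query_request(BASE_URL, 'a-node [ afun = "Sb" ]')
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and the encoded body before pointing the client at a real server.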

The server is intended for searching large static data sets (complete treebanks). For individual files or small treebanks, up to roughly 10,000 trees (your mileage may vary), the client-side PML-TQ implementation in TrEd is usually sufficient.

Running a PML-TQ server requires either an Oracle or a PostgreSQL database, Perl >= 5.8.8, and several Perl modules installable from CPAN. The treebank must be encoded in or converted to the PML format.

The server has been tested on Linux with Oracle XE 10g and PostgreSQL (8.4beta).

Download

The current version is 0.7.10 (beta). This release is ready for testing, but some important parts of the documentation are still missing.

pmltq.tar.gz - PML-TQ distribution package

Directory structure

Subdirectories:
  config    - sample configuration files (must edit first!)
  contrib   - sample conversion scripts (e.g. for PDT 2.0)
  doc       - documentation
  libs      - Perl modules used by pmltq
  resources - PML schemas used by pmltq
  sql       - SQL scripts to init the database
  run       - unified server startup/shutdown script and configuration

Scripts:
  install_deps.sh - install modules required by the search server
  pmltq_http      - small HTTP server providing PML-TQ services
  pml2base.pl     - PML to SQL database conversion script
  pmltq           - command-line client for both the database and the Perl-driven query engine
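
As a quick sanity check after unpacking, the layout listed above can be verified with a short Python sketch. It assumes that the subdirectories and scripts sit directly under the unpacked top-level directory; the function name missing_entries is made up for this example.

```python
from pathlib import Path

# Names taken from the directory listing above.
EXPECTED_DIRS = ["config", "contrib", "doc", "libs", "resources", "sql", "run"]
EXPECTED_SCRIPTS = ["install_deps.sh", "pmltq_http", "pml2base.pl", "pmltq"]


def missing_entries(root):
    """Return the expected subdirectories and scripts not found under root.

    Missing directories are reported with a trailing slash, missing
    scripts by their plain file name.
    """
    root = Path(root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_SCRIPTS if not (root / name).is_file()]
    return missing
```

Calling missing_entries with the path of the unpacked package returns an empty list when everything listed above is present.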

Installation of PML-TQ Server

To run a PML-TQ server, you will first need to install an SQL database server. Oracle 10g or 11g and PostgreSQL 8.4.1 or later are fully supported.

Then carefully follow the instructions in the README file provided in the distribution and in the configuration scripts you will be asked to edit during the installation process. Since the individual steps of the server installation are still poorly documented, do not hesitate to ask the authors for guidance via e-mail.
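
Version minimums like these are easy to get wrong when compared as plain strings ("5.10.1" sorts before "5.8.8" alphabetically). The following Python sketch compares dotted version numbers numerically. It is a generic helper written for this example, not part of the PML-TQ distribution, and it deliberately ignores suffixes such as "beta".

```python
import re

# Minimum versions stated in this document.
MIN_PERL = "5.8.8"
MIN_POSTGRESQL = "8.4.1"


def parse_version(text):
    """Extract the first dotted number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Return True if version string found is at least minimum."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that "9.0" equals "9.0.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("5.10.1", MIN_PERL))
print(meets_minimum("8.4beta", MIN_POSTGRESQL))
```

Because the suffix is dropped, "8.4beta" is read as 8.4 and therefore counts as older than 8.4.1.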


Bibliography

Štěpánek Jan, Pajas Petr: Querying Diverse Treebanks in a Uniform Way, in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Copyright © European Language Resources Association (ELRA), Valletta, Malta, pp. 1828-1835, 2010

Pajas Petr, Štěpánek Jan: System for Querying Syntactically Annotated Corpora, in Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Copyright © Association for Computational Linguistics, Suntec, Singapore, pp. 33-36, 2009

Pajas Petr, Štěpánek Jan: Recent Advances in a Feature-Rich Framework for Treebank Annotation, in The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, Manchester, pp. 673-680, 2008


Authors

Copyright © 2008-2009 by Petr Pajas and Jan Štěpánek

Acknowledgement

The development of PML-TQ is part of the project "Integration of language resources for information extraction from natural texts", Information Society, Grant Agency of the Academy of Sciences of the Czech Republic, grant 1ET101120503.


License

This software is published under the GNU General Public License (GPL).