A suite of software components for building tools for annotating linguistic signals: time-series data that document any kind of linguistic behavior (e.g., audio, video). The internal data structures are based on annotation graphs.
Arithmetic Coding |
A Java package for arithmetic coding and PPM (adaptive variable-length n-gram language models for compression).
ComLinToo |
A set of Perl tools for computational linguistics, especially corpus handling and permutation statistics.
Attribute-Logic Engine (ALE) |
A freeware system for logic programming and for grammar parsing and generation.
A Lisp system for developing and displaying HPSG grammars.
Ellogon |
An LGPL-licensed, component-based natural language engineering platform written
in C, C++, Java, Tcl, Perl, and Python.
Emdros |
A text database engine for analyzed or annotated text.
FreeLing |
An open source suite of language analyzers.
GuiTAR |
A General Tool for Anaphora Resolution.
Heart of Gold |
Middleware for combining shallow and deep NLP components.
Leo |
A project to provide an architecture for defining XML specifications of grammars for different natural language parsing systems, along with tools for automatically converting grammars between those systems.
LKB |
A grammar and lexicon development environment for use with
constraint-based linguistic formalisms.
Mallet |
A Machine Learning for Language Toolkit, written in Java.
MinorThird |
A collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text.
Ngram Statistics Package |
Counts n-grams in text and measures their association with statistical tests.
A Python package intended to simplify the task of programming natural language systems.
nlpFarm |
A collection of NLP libraries, tools and demo applications.
Its current focus is mainly on parsing and dialogue systems.
SenseRelate |
Implements a word sense disambiguation algorithm using the relatedness measures of WordNet::Similarity.
Tiger API |
A library that lets Java programmers easily access the structure of any corpus encoded as a TIGER-XML file.
Web as Corpus Toolkit |
A collection of programs that can be used to create a (large) text corpus from a list of URLs.
Weka |
A collection of machine learning algorithms for data mining tasks.
Weta |
The Waikato Environment for Text Analysis.
WordNet::Similarity |
Provides measures of semantic relatedness using WordNet.