No longer maintained, but feel free to have a look.

Maui - Multi-purpose automatic topic indexing

Current repository: maui-indexer.googlecode.com

Maui extends the keyphrase indexing algorithm Kea and is a GNU GPL licensed library.

It performs the following tasks:

  • keyphrase extraction

  • automatic tagging

  • term assignment with a controlled vocabulary, thesaurus or a taxonomy

  • subject indexing

  • extracting most relevant concepts and entities from Wikipedia

It can also be used for terminology extraction and semi-automatic topic indexing.

About the name: In Māori mythology, Māui is a culture hero. He fished out the North Island of New Zealand with a hook made out of his jaw-bone.

KEA - Keyphrase extraction algorithm

I have extended Kea-3.0, the original version of the keyphrase extraction algorithm (designed for free indexing), into a new version, Kea-4.1 (also known as Kea++), that performs controlled indexing.

Given a document and a thesaurus or controlled vocabulary (Kea accepts any vocabulary in the SKOS format), Kea selects a list of phrases from this vocabulary describing the document's main topics. (See examples of Kea's performance on different domains).
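The idea of controlled indexing described above can be illustrated with a minimal sketch. This is not Kea's code or API: the class ControlledIndexingSketch and the methods tokenize and assignTerms are hypothetical names, the vocabulary is assumed to be a plain set of lowercase labels rather than a SKOS file, and candidates are ranked by raw frequency only, which is a deliberate simplification of whatever ranking Kea actually applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of controlled indexing; not part of Kea or Maui.
class ControlledIndexingSketch {

    /** Lowercases the text and splits it into alphanumeric word tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * Collects every word n-gram (up to maxLen words) that matches a
     * vocabulary term, counts the matches, and returns the k most
     * frequent terms. Ties are broken alphabetically.
     * Assumption: vocabulary entries are lowercase, space-separated labels.
     */
    static List<String> assignTerms(String document, Set<String> vocabulary,
                                    int maxLen, int k) {
        List<String> tokens = tokenize(document);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder ngram = new StringBuilder();
            for (int n = 0; n < maxLen && i + n < tokens.size(); n++) {
                if (n > 0) {
                    ngram.append(' ');
                }
                ngram.append(tokens.get(i + n));
                String candidate = ngram.toString();
                if (vocabulary.contains(candidate)) {
                    counts.merge(candidate, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        Set<String> vocabulary = new HashSet<>(Arrays.asList(
                "soil erosion", "crop yield", "cover crops", "irrigation"));
        String document = "Soil erosion reduces crop yield. Farmers fight "
                + "soil erosion with cover crops; crop yield recovers.";
        // Prints the two most frequent vocabulary terms found in the text.
        System.out.println(assignTerms(document, vocabulary, 3, 2));
    }
}
```

Because only phrases present in the vocabulary can be returned, the output is restricted to controlled terms. A free-indexing variant would instead rank arbitrary n-grams taken from the document itself.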

ELKB - Electronic Lexical Knowledge Base

Java package for accessing and exploring Roget's Thesaurus, originally developed by Mario Jarmasz, University of Ottawa. ELKB includes several NLP applications: detecting lexical chains in text, determining the semantic distance between words and phrases, clustering words based on their meaning, and solving a word quiz.

Other interesting projects: