An ontology is a knowledge representation framework that is machine readable. It facilitates logical relationships between classes and allows us to standardise - and formalise - vocabulary within a domain. The metadata contained within an ontology is valuable with research showing can address the challenges presented by unstructured text.
Unstructured text can be processed, mined, and empowered by Natural Language Processing (NLP) tools, yet majority of tools are not designed to consider ontologies.
Jabberwocky allows users to conduct various NLP tasks whilst easily manipulating ontologies. Here provides an explanation - with a working example - for the Jabberwocky toolkit.
Documentation is linked in the table below:
| function | description | documentation | |
|---|---|---|---|
| 1. | converter |
convert an excel to an ontology | docs |
| 2. | snatch metadata |
extract metadata from classes | docs |
| 3. | catch text |
annotate corpus with key terms/phrases | docs |
| 4. | rank terms |
rank terms in order of importance | docs |
| 5. | update entities |
update ontology with new classes and metadata | docs |
| 6. | ontology plotting |
plot an ontology via web or tree format | docs |
When combining these Jabberwocky functions, users can create an NLP workflow:

Scenario
You have curated unstructured text: user-generated posts from a social media platform (with permission of course, in this example I invented these fake conversations).
Your aim is to text mine the corpus and only have posts covering a particular topic (or set of topics). But you realise, although you know some words in this topic of yours, you may be missing some important semantics.
This is where ontologies are useful. Ontologies are a controlled set of vocabularies with metadata, for example: synonyms.
-
You decide to design an ontology in excel and then use
converterto make it into an ontology. -
Of all the words in the ontology, some are vital to your research, you can
snatch the metadatafrom the ontology to extract concepts of interest and their synonyms. -
With these concepts and synonyms, you can annotate your corpus (social media data) to
catchthe posts that had these terms - this is great for downstream analysis! BUT…you also output a version of the corpus without annotations as you are curious if anything of interest was here… -
With the version of “nothing annotated”, you proceed to investigate with
ranking terms, this statistical techique highlights “important” terms in context of frequency in the corpus. With this output, you’ve found new synonyms and perhaps even terms that highlight a new class for your ontology. -
Organising these new entities (classes or metadata), we can
updatethe ontology. -
Finally, you rerun (2)
snatchto update your concepts list, and rerun (3)catchfor more annotations: meaning a more fruitful output for your investigations.
Useful links
- GitHub repository for Jabberwocky
- GitHub repository for the test files, CelestialObject
- An introduction to ontologies and space being a practical example, the Space Ontology and a Beginners Guide to Ontologies
- Software for ontology editing, Protégé
- Annotation was developed based on spaCy’s PhraseMatcher
- Wikipedia article for TF-IDF
- Jabberwocky was published in JOSS, you can cite here:
@article{Pendleton2020,
doi = {10.21105/joss.02168},
url = {https://doi.org/10.21105/joss.02168},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {51},
pages = {2168},
author = {Samantha C. Pendleton and Georgios V. Gkoutos},
title = {Jabberwocky: an ontology-aware toolkit for manipulating text},
journal = {Journal of Open Source Software}
}
End of document