Key Words: Natural Language Processing (NLP), Software Infrastructure, Language Engineering (LE), Analysis Engines, Distributed Hash Tables (DHT), RESTful, Service Oriented Architecture (SOA), NLP Interchange Format (NIF).
Purpose: The purpose of this dissertation is to design a prototype NLP platform and to compare and contrast a selection of existing NLP systems and technologies. Assessment metrics are defined, and we give our reasons for why it is important to extend existing work and develop a new natural language processing system.
Contribution: Informed by existing theory, this dissertation aims to shed light on the techniques used in large-scale NLP systems and, at the same time, to produce a working prototype, SparkLE, of a design that takes advantage of both established and newer technologies such as distributed services. Implementing a new natural language processing system from scratch provided firsthand experience, and the resulting prototype offers a platform and framework for other researchers to build upon. The SparkLE NLP software infrastructure prototype has been published as a public open-source repository on GitHub, github.com/sparkle-git/SparkLE, so that it can serve as an example and a facilitator for the future development of NLP software infrastructures.