The ultimate goal of ELSSIE is to put complex information at scientists' fingertips so that they can carry out drug invention at a rapid pace.
To accomplish this, the solution needs to ingest multiple kinds of structured and unstructured content; store it as queryable structured data; understand it semantically and generate relationships by recognizing entities and concepts; interpret the stored data and offer graph query capabilities; provide an API for integration with external applications; and, finally, make it easy for scientists to search for information.
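The recognize, relate, and query steps listed above can be illustrated with a small sketch. It is not ELSSIE's implementation: the lexicon, the function names (`recognize_entities`, `build_graph`, `related`), and the co-occurrence rule for creating relationships are all assumptions chosen for illustration, and a plain dictionary lookup stands in for real NLP-based entity recognition.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical lexicon (an assumption, not ELSSIE data):
# surface form -> (canonical name, entity type).
ENTITY_LEXICON = {
    "aspirin": ("Aspirin", "Drug"),
    "acetylsalicylic acid": ("Aspirin", "Drug"),
    "cox-1": ("COX-1", "Protein"),
    "inflammation": ("Inflammation", "Condition"),
}


def recognize_entities(text):
    """Return the canonical entities whose surface forms occur in the text."""
    lowered = text.lower()
    return {
        canonical
        for surface, canonical in ENTITY_LEXICON.items()
        if surface in lowered
    }


def build_graph(documents):
    """Link every pair of entities that co-occur in the same document."""
    graph = defaultdict(set)
    for text in documents:
        entities = recognize_entities(text)
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def related(graph, name, entity_type=None):
    """Return names of entities linked to `name`, optionally filtered by type."""
    node = next((n for n in graph if n[0] == name), None)
    if node is None:
        return []
    return sorted(
        neighbor[0]
        for neighbor in graph[node]
        if entity_type is None or neighbor[1] == entity_type
    )
```

Given the two documents "Aspirin inhibits COX-1." and "Acetylsalicylic acid reduces inflammation.", calling `related(build_graph(docs), "Aspirin")` links the drug to both the protein and the condition, because the synonym is resolved to the same canonical entity before relationships are generated.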
ELSSIE as a final solution included the following components:
In summary, the ELSSIE project used Apache Spark, Apache Hadoop, Apache Cassandra, Apache Kafka, Apache Solr, and Apache GridGain, all built on AWS. It achieved several innovations: dynamically scaled Apache Spark and Hadoop clusters, an extension of QUERTZL using ANTLR parsers, and the use of LDA together with NLP to find entities in text along with their contextual meaning rather than their strict literal meaning.