Architecture
Both System One products and all our custom solutions share a common technological platform. Built from 2005 onward without legacy constraints, it is the result of a deep semantic web background. This includes industry firsts like Wikipedia³ as well as many production environments, with systems handling billions of items and requests per month. It has also followed modern web paradigms from the beginning, with independent components that scale horizontally on commodity hardware and are interconnected via lightweight web services.
Components:
Data Adaptors:
The System One platform is bundled with a wide range of data connectors that can tap structured and unstructured, internal and external information sources. Current ready-to-use adaptors include web search engines, file systems, mail, Wikipedia, an index of global news and social media sources, image and video consolidation, JAX, JRDF, JDBC for structured content, various proprietary data providers, corporate systems like SAP or Lotus Notes, and many more. Also included is a crawler framework that harvests web sources in a structured manner and is managed with a powerful visual XPath editor.
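To illustrate what XPath-driven harvesting means in practice, here is a minimal sketch in Python. It is not the System One crawler framework: the page snippet, the `RULES` mapping and the `harvest` function are hypothetical, and the standard library's ElementTree only supports a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML snippet standing in for a crawled page.
PAGE = """
<html>
  <body>
    <div class="article">
      <h1>Quarterly results</h1>
      <span class="author">Jane Doe</span>
    </div>
    <div class="article">
      <h1>New product line</h1>
      <span class="author">John Smith</span>
    </div>
  </body>
</html>
""".strip()

# Field name -> XPath expression, evaluated relative to each record node.
# A visual editor would typically produce a mapping of this kind.
RULES = {
    "title": "./h1",
    "author": "./span[@class='author']",
}


def harvest(page, record_path, rules):
    """Return one dict per node matching record_path, filled via the rules."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(record_path):
        record = {}
        for field, path in rules.items():
            match = node.find(path)
            has_text = match is not None and match.text
            record[field] = match.text.strip() if has_text else None
        records.append(record)
    return records


records = harvest(PAGE, ".//div[@class='article']", RULES)
```

Each matched `div` becomes one structured record, so adapting the crawler to a new source only means changing the XPath expressions, not the code.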
IO Gateway:
The scriptable, secure data gateway serves two main purposes: coordinating, timing and securing data gathering, and exposing all functionality of the platform as web services. On the input side, it manages scheduled jobs as well as real-time streams, checks data integrity and normalizes the already sanitized data. Output options range from more traditional XML-based SOA interfaces (such as JMS and SOAP) to very lightweight approaches (JSON). Besides being responsible for overall security, the IO gateway also houses a single integrated interface and UI to monitor, manage and manipulate all platform settings.
In-Memory Triple 'Blackboard':
This component is an enhancement of the original blackboard concept: all incoming metadata is stored here first. From then on, it is continuously enhanced by algorithms and agents, as well as by user feedback and interactions. For example, the metadata of an indexed document is stored initially. This triggers algorithms that relate it to, and cluster it with, existing content. Based on that, additional metadata such as people, categories and locations is extracted, and attention data is stored for recommendations. While the in-memory approach enables lightning-fast access, persistence options range from triple stores to semi-structured repositories and even traditional database systems.
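The blackboard pattern described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual System One component: the `Blackboard` class, the predicate names (`text`, `mentions`, `relatedTo`) and the two agents are hypothetical, and the "entity extraction" is a plain substring lookup.

```python
from collections import defaultdict


class Blackboard:
    """In-memory set of (subject, predicate, object) triples with triggers."""

    def __init__(self):
        self.triples = set()
        self.agents = defaultdict(list)  # predicate -> subscribed callbacks

    def subscribe(self, predicate, agent):
        self.agents[predicate].append(agent)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        if triple in self.triples:
            return  # already known, so agents are not triggered again
        self.triples.add(triple)
        for agent in list(self.agents.get(predicate, [])):
            agent(self, subject, obj)

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples as a sorted list; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])
        )


# Hypothetical lookup list standing in for real named entity recognition.
KNOWN_PEOPLE = ("Ada Lovelace", "Alan Turing")


def extract_people(board, doc, text):
    """Agent: adds a 'mentions' triple for each known person in the text."""
    for name in KNOWN_PEOPLE:
        if name in text:
            board.add(doc, "mentions", name)


def relate_documents(board, doc, person):
    """Agent: links documents that mention the same person."""
    for other, _, _ in board.query(predicate="mentions", obj=person):
        if other != doc:
            board.add(doc, "relatedTo", other)
            board.add(other, "relatedTo", doc)


board = Blackboard()
board.subscribe("text", extract_people)
board.subscribe("mentions", relate_documents)
board.add("doc1", "text", "Notes on Ada Lovelace")
board.add("doc2", "text", "Ada Lovelace and Alan Turing")
related = board.query(predicate="relatedTo")
```

Storing the initial document triple is the only explicit call. The extracted people and the links between the two documents appear as a side effect of agents reacting to new triples, which is the chain of enrichment the example in the text describes.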
Backend Indices:
While the blackboard holds all metadata, the content itself is stored in the backend indices. Here we rely on solid open-source technologies like Lucene (Solr) and can optionally also plug into existing client investments (e.g. Fast ESP). An index wrapper takes care of more advanced features like large-scale federation, score normalization and other distribution factors that improve performance and allow multi-billion-item indices. These back-ends are only used to store and retrieve documents, while most of the processing is done by separate algorithms and agents.
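Score normalization matters in federation because different back-ends score on different scales, so raw scores cannot be compared directly. The following Python sketch shows one common approach, min-max normalization per back-end before merging. It is an assumption for illustration, not necessarily what the index wrapper does; the back-end names, scores and function names are hypothetical.

```python
def min_max_normalize(hits):
    """Rescale (doc_id, score) pairs from one back-end into the range [0, 1]."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    if high == low:
        # All hits scored the same; treat them as equally good.
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - low) / (high - low)) for doc, score in hits]


def federated_search(results_per_backend, limit=10):
    """Merge per-back-end hit lists into one ranking on a common scale."""
    merged = {}
    for hits in results_per_backend.values():
        for doc, score in min_max_normalize(hits):
            # A document found in several back-ends keeps its best score.
            merged[doc] = max(score, merged.get(doc, 0.0))
    # Highest score first; ties are broken by document id for stable output.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))[:limit]


# Hypothetical raw results: the two back-ends use very different score ranges.
results = {
    "solr": [("a", 12.0), ("b", 8.0), ("c", 4.0)],
    "legacy": [("d", 3.0), ("b", 2.0), ("e", 1.0)],
}
ranked = federated_search(results)
```

Without normalization, every hit from the first back-end would outrank every hit from the second simply because its raw scores are larger.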
Algorithmic Framework:
The algorithmic framework is a language-independent collection of proprietary as well as open-source components in three main information retrieval areas. The first is vector models and clustering, which identify similarities in, and group, content or user profiles and allow content-based targeting. The second is named entity recognition for automated tagging of content with topics, names, locations etc., where we support purely statistical as well as taxonomical and ontological approaches. The third is recommendations, in which attention data and user behaviour are also used to enhance the accuracy of targeting and results.
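As a rough illustration of the vector model idea, the sketch below represents texts as term-frequency vectors and compares them with cosine similarity. It is a textbook simplification, not the platform's proprietary components: the function names and sample documents are hypothetical, and a production system would typically add weighting such as TF-IDF, stemming and stop-word handling.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of lower-cased term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return the id of the document whose vector is closest to the query."""
    query_vector = term_vector(query)
    return max(
        documents,
        key=lambda doc_id: cosine_similarity(
            query_vector, term_vector(documents[doc_id])
        ),
    )


# Hypothetical mini-corpus.
documents = {
    "news": "stock markets rally as markets recover",
    "sports": "local team wins football final",
    "tech": "new search engine indexes documents",
}
best = most_similar("markets recover", documents)
```

The same similarity measure works for user profiles if a profile is treated as the combined vector of the content a user has interacted with, which is one simple way to get from similarity to content-based targeting.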
Front-End Scripting Environment:
Also included is an external framework that allows the development of front-end applications as well as rapid prototyping in a combination of Java and JavaScript. Based on the Helma Object Publisher, it lets you wrap different services into final applications or APIs. Whether you want to enhance existing applications or build new ones, create widgets, mobile applications or custom APIs, you'll find a tightly integrated environment to do so. Of course, you can also skip this and stick with the environment of your choice.