Over the last year, as part of the new enterprise services that IBM has been pushing in its reinvention, Watson has become less of a "Jeopardy"-winning gimmick and more of a tool. It also remains IBM's proprietary creation.
What are the chances, then, of creating a natural-language machine learning system on the order of Watson, albeit with open source components? To some degree, this has already happened -- in part because Watson itself was built on top of existing open source work, and in part because others have been developing similar systems in parallel to Watson. Here's a look at four such projects.
DARPA DeepDive
The biggest name brand of the bunch, DARPA's DeepDive project isn't meant to emulate Watson's plain-language query system, but rather Watson's ability to improve its decision-making over time with human guidance.
Developed principally by Christopher Ré, a professor at the University of Wisconsin, the project is open source (Apache 2.0). According to EE Times, the main goal of DeepDive is to create an automated system for classifying unstructured data -- in one example case, categorizing articles in technical journals. Those planning to make use of DeepDive should be familiar with SQL and Python, but the system is already capable of extracting data from a wide variety of conventional sources, such as Web pages or PDF documents.
Apache UIMA
Unstructured Information Management Architecture (UIMA) is a standard for performing analysis on textual content. Watson used an implementation of UIMA, but you don't have to go through Watson to use UIMA. In fact, IBM's UIMA implementation was open-sourced and is now maintained by the Apache Software Foundation. It features support for multiple programming languages, with updates added periodically (most recently in October 2014).
Apache UIMA as it stands is a long way from being a full machine learning solution; it's only one part, albeit an important one, of the whole that IBM created. If you don't want to work with the bare bones, you can pick up one of its derivative projects, such as YodaQA, which leverages UIMA for its processing and uses Wikipedia as a primary data source.
OpenCog
OpenCog "aims to provide research scientists and software developers with a common platform to build and share artificial intelligence programs." Open-sourced under the GNU Affero General Public License, the project's ambition is to fuel nothing less than what its creators call "generally intelligent" systems: artificial intelligence with a broad, humanlike understanding of the world rather than a domain-centered specialty (such as being very good at chess but nothing else).
OpenCog's creators claim their framework is already in use in "natural language applications, both for research and by commercial corporations." That puts it a little further away from pie-in-the-sky AI concepts and closer to the practical Q&A domain inhabited by Watson.