Natural language processing (NLP), the field of AI that involves analyzing text for tasks such as summarization and generation, is a growing technology. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of technology leaders said their NLP budgets increased by at least 10% from 2020, while a third said their spending increased by more than 30%. Fortune Business Insights pegged the NLP market at $16.53 billion in 2020.
Against this backdrop, Deepset, the startup behind the open-source NLP framework Haystack, today announced that it has raised $14 million in a Series A investment led by GV with participation from Harpoon Ventures, System.One, Lunar Ventures and Acequia Capital. The capital injection arrived alongside Deepset Cloud, a new subscription product for building NLP-powered software.
“Guided by [our] belief in open source, the Deepset team has…contributed models and research results to the open source NLP community [for years],” Rusic told TechCrunch via email. “Haystack, the company’s flagship open source product, was born out of the experience, expertise, and know-how gained while creating NLP for large organizations and the need for an appropriate set of building blocks for scalable, API-driven NLP backend applications.”
CEO Milos Rusic co-founded Deepset with Malte Pietsch and Timo Möller in 2018. Pietsch and Möller – who have backgrounds in data science – came from Plista, an adtech startup, where they worked on products including an authoring tool for AI-based ads.
Haystack allows developers to create pipelines for NLP use cases. Originally created for search applications, the framework can power engines that answer specific questions (e.g., “Why do startups move to Berlin?”) or search through documents.
Haystack can also run “knowledge-based” searches that look for granular information on data-heavy websites or internal wikis. Rusic says Haystack has been used to automate risk management workflows at financial services companies, returning results for queries like “What’s the business outlook?” and “How has income changed over the last few years?” Other organizations, like Alcatel-Lucent Enterprise, have used Haystack to launch virtual assistants that recommend documents to field technicians.
According to Rusic, the goal with Haystack was to enable developers and product divisions to successfully and quickly build modern, API-driven NLP applications. He notes that while it’s often straightforward for a data science team to come up with a prototype, challenges can arise when transitioning from prototype to production. According to a 2019 Gartner survey, around 80% of AI projects, including NLP projects, never go into production.
“[With Haystack,] development teams…are equipped with all the components to create a complete NLP application and are guided by the appropriate workflows…Modern NLP moves very quickly and it is much easier to bridge the gap between cutting-edge research and real production-ready technologies thanks to open source,” said Rusic. “[Prebuilt NLP systems] are the basis [for Haystack] and often deliver great results in pipelines without additional training. Customization, if needed, happens with end users and experts providing feedback by testing and using new iterations of a [system] or a pipeline.”
But not all businesses choose – or want – to go the DIY route. For those who prefer a managed solution, there is the aforementioned Deepset Cloud, which supports customers throughout the NLP service lifecycle. The service starts with experimentation – that is, testing and evaluating an application, adapting it to a use case and building a proof of concept – and ends with labeling and monitoring the application in production.
“All NLP services that are developed [with Deepset Cloud] can be used in any end application, just by integrating an API,” Rusic said. “Examples of applications are NLP-based enterprise search (think ‘modern Google-like’ search) and knowledge management.”
With the new funding secured ($15.6 million in total), Deepset aims to translate its open source success, with thousands of organizations currently using Haystack, into increased revenue. Rusic says the 30-person company, based in Berlin, Germany, was self-funded and had broken even before raising its first funding round in 2021, and now has major corporate clients, including Airbus.
“[With the new funding,] we will continue to build the open-source Haystack NLP project, adding more features, which will make it even easier for NLP-savvy backend developers to build NLP services,” Rusic said. “[We’ll also] develop Deepset Cloud into a full-fledged enterprise software as a service for building language-aware applications. This will include enabling more flexible workflows, more accurate product lifecycle guidance, and offering essential and additional tools, such as labeling and data integrations.”