Linkup gives large language models (LLMs) licensed, legal access to premium content sources
If you have used ChatGPT Search or Perplexity, you know that AI chatbots become far more useful when they can search the web and show citations inline.
Web search can reduce so-called hallucinations, which occur when a generative AI model produces inaccurate information. It also improves results by adding timely information.
That's why Linkup, a French startup, is building an API that lets developers retrieve web content from premium, trusted sources and feed the results to a large language model (LLM) to improve its answers. AI developers often call this workflow retrieval-augmented generation (RAG).
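To make the RAG workflow concrete, here is a minimal Python sketch. It does not use Linkup's actual API, which the article doesn't describe: the `Document` type, the keyword-overlap `retrieve` function and the prompt template are all hypothetical stand-ins. A real system would query a search backend and then send the resulting prompt to an LLM; that call is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # publisher name (hypothetical, for illustration)
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase words with basic punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Toy retrieval: rank documents by how many words they share with the
    query and keep the top k that share at least one word."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokenize(d.text)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d.text)]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Augment the user's question with numbered, attributed sources so the
    LLM can ground its answer and cite where each fact came from."""
    context = "\n".join(
        f"[{i}] ({d.source}) {d.text}" for i, d in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {query}"
    )
```

For example, `build_prompt(q, retrieve(q, corpus))` produces the text that would be passed to a model from OpenAI or Mistral. The retrieval step is where a service like Linkup would plug in, supplying licensed content instead of a local list.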
More importantly, the future of web scraping is uncertain. Unless publishers and the companies scraping their pages have a financial agreement in place, these bots are taking content from the open web without paying for it, and many people are unhappy with that deal. This has led to increased regulatory scrutiny of AI training.
There are also high-profile lawsuits in the spotlight, such as the ongoing litigation between the New York Times and OpenAI, the maker of ChatGPT.
As a result, the rules around web scraping could change in the near future. That is why OpenAI has signed multi-year content licensing deals with prominent publishers, including AP, Axel Springer, Condé Nast, El País, the Financial Times, and Le Monde.
“We established the company at a time when OpenAI was negotiating agreements with news sources to enhance the responses of OpenAI models and their products, whether for training or inference purposes. And we thought, ‘OK, this is fantastic, because we finally have AI companies that pay their sources,’” Linkup co-founder and CEO Philippe Mizrahi told TechCrunch.
He explained that this is what led the founders to build a business connecting AI developers with content providers in a way that benefits both sides.
Right now, content publishers have to decide how to respond to generative AI's insatiable appetite for data. They can try to block web scrapers with a robots.txt file, a convention that isn't legally binding and that tells crawlers, including those run by AI companies, which parts of a site they may access.
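For illustration, here is how a well-behaved crawler might honor such a file, using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `may_crawl` helper and the `SomeOtherBot` user agent are hypothetical; `GPTBot` is the user agent OpenAI's web crawler identifies itself with. Nothing forces a scraper to run this check, which is exactly why robots.txt is not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: block OpenAI's GPTBot
# from the whole site, and keep every other crawler out of /drafts/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.
    Compliance is voluntary: this only reports what the file requests."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_crawl("GPTBot", "https://example.com/news/story")` returns `False`, while a hypothetical `SomeOtherBot` may fetch the same page but not anything under `/drafts/`.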
They can also sue AI companies they believe have infringed their copyright. Or they can let bots crawl their content without restriction (er, YOLO?).
Alternatively, they can try to license their content to AI developers and get paid for their intellectual property.
But there are thousands of tech companies using AI that lack OpenAI's scale and reach. At the same time, the web's greatest strength is its enormous number of content publishers, many of them small.
That also means a small publisher usually lacks the resources to file a lawsuit. And it means that moving millions of websites from a scraping model to a licensing model will be hard.
That's why Linkup isn't just a technical product. It also acts as a marketplace, an intermediary between content publishers and companies that want to supplement their LLMs' responses with web content.
Linkup signs content licensing agreements with publishers and integrates with their content management systems (CMS) so it can retrieve their content without scraping. Linkup pays its content partners based on how often Linkup customers access their content.
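The article doesn't describe Linkup's actual payout formula. As a purely hypothetical illustration of usage-based compensation, the Python sketch below assumes a fixed revenue pool split among publishers in proportion to how many times their content was accessed. The `split_payouts` function and the publisher names are invented for this example.

```python
from collections import Counter


def split_payouts(access_log: list[str], pool_cents: int) -> dict[str, int]:
    """Hypothetical pro-rata payout: each entry in access_log names the
    publisher whose content was accessed once. The pool (in integer cents)
    is split in proportion to access counts; cents lost to rounding go to
    the publishers with the largest remainders, so payouts sum to the pool."""
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return {}
    # (quotient, remainder) of each publisher's exact share
    shares = {pub: divmod(pool_cents * n, total) for pub, n in counts.items()}
    payouts = {pub: q for pub, (q, _) in shares.items()}
    leftover = pool_cents - sum(payouts.values())
    # Distribute leftover cents: largest remainder first, ties broken by name.
    for pub in sorted(shares, key=lambda p: (-shares[p][1], p))[:leftover]:
        payouts[pub] += 1
    return payouts
```

For example, if one publisher's content is accessed twice and another's once, a 1,000-cent pool is split 667/333.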
“Our primary objective is to target applications that are integrating AI into their own products,” Mizrahi said. “The typical scenario is that I develop an AI application using a model from OpenAI or Mistral. I build my own pipeline, but I need to supplement it with external information.”
OpenAI offers both a widely used application (ChatGPT) and LLMs that developers can access through an API (the GPT models). ChatGPT can browse the web, but web search is a ChatGPT feature: the GPT models can't do it on their own.
Mizrahi also gave an example he particularly likes: “One of our customers built an internal application for their sales team. On the one hand, they listed all the benefits of their own products. On the other hand, we provide them with up-to-date, high-quality information about their prospects. They feed both into a Mistral LLM, which then generates a sales pitch that sales reps can refer to during their conversations with leads.”
At first, Linkup chose to focus on business and company information. In addition to news websites, the startup draws on knowledge databases such as Statista and Xerfi.
Linkup isn't the only startup providing premium content to LLMs through licensing deals signed behind the scenes. Its most obvious competitor is ScalePost, a startup that works with Perplexity to speed up its licensing agreements with publishers.
A few months ago, Linkup raised a €3 million seed round ($3.2 million at current exchange rates) from Axeleo Capital, Motier Ventures, Seedcamp, and a hundred business angels. The startup currently has about 10 employees and plans to hire 10 more over the next year.