Decentralized OORT AI Data Ranks Among Top on Google Kaggle

OORT’s decentralized AI data ranks among the top on Google’s Kaggle, showcasing its growing influence in the AI and machine learning community.

OORT’s AI image dataset reached Kaggle’s first page in several categories, demonstrating the growing need for community-sourced, high-quality training data.

Decentralized AI solution provider OORT has achieved significant success on Google’s Kaggle platform with a dataset of AI training images.

Since its release in early April, the dataset’s Kaggle listing has risen to the top of several categories. Kaggle, owned by Google, is an online platform for data science and machine learning competitions, learning, and collaboration.

“A front-page Kaggle ranking is a strong social signal, indicating that the data set is engaging the right communities of data scientists, machine learning engineers, and practitioners,” Ramkumar Subramaniam, a core contributor at the crypto AI project OpenLedger, said.

OORT’s CEO and founder, Max Li, said that the company “saw encouraging engagement metrics that confirm the early demand and significance” of its training data collected using a decentralized architecture. He went on to say:

“The organic interest from the community, including active usage and contributions, demonstrates how decentralized, community-driven data pipelines like OORT’s can achieve rapid distribution and engagement without relying on centralized intermediaries.”

Li added that OORT plans to release more datasets in the coming months, including voice-command datasets for cars and smart homes and a dataset of deepfake videos designed to improve AI-powered media verification.

The first page across several categories

Earlier this month, Cointelegraph confirmed that the dataset had reached the first page of Kaggle’s General AI, Retail & Shopping, Manufacturing, and Engineering categories. It later lost those spots following two updates to the dataset, on May 6 and May 14, although the drop may not have been related to the updates.

OORT’s data set is on the first Kaggle page in the Engineering category. Source: Kaggle

Subramaniam acknowledged the accomplishment but cautioned: “It’s not a definitive indicator of real-world adoption or enterprise-grade quality.” He added, “The provenance and incentive layer behind the data set, not just the ranking, is what sets OORT’s data set apart,” explaining:

“Unlike centralized vendors that may rely on opaque pipelines, a transparent, token-incentivized system offers traceability, community curation, and the potential for continuous improvement, assuming the right governance is in place.”

Lex Sokolin, a partner at AI venture capital firm Generative Ventures, said he does not believe these results are difficult to replicate, but added that “it does show that crypto projects can use decentralized incentives to organize economically valuable activity.”

High-quality AI training data is a scarce resource

According to estimates released by the AI research group Epoch AI, the supply of human-generated text available for AI training is expected to run out in 2028. Market pressure is significant, and investors are increasingly brokering deals that give AI companies rights to copyrighted material.

For years, reports have warned about the growing scarcity of AI training data and how it could limit the field’s growth. Although synthetic (AI-generated) data is increasingly used with at least some success, human data is still generally considered superior, since higher-quality data produces better AI models.

Artists deliberately undermining training efforts are making things harder still, particularly for image data. Nightshade, a tool intended to prevent pictures from being used for AI training without consent, lets users “poison” their images so that models trained on them perform drastically worse.

Model performance per number of poisoned images. Source: TowardsDataScience

“An era where high-quality image data will become increasingly scarce is coming,” Subramaniam said. He also noted that the growing prevalence of image poisoning worsens this scarcity:

“With the rise of techniques like image cloaking and adversarial watermarking to poison AI training, open-source datasets face a dual challenge: quantity and trust.”

According to Subramaniam, verifiable, community-sourced, and incentivized datasets are “more valuable than ever” in this context. Such initiatives “can become not just alternatives, but pillars of AI alignment and provenance in the data economy,” he said.