Fact-checking, smart search, and more: Seven partner projects show what the future of open web search could look like


Just in time for the wrap-up of the 42-month EU project OpenWebSearch.EU, we present exciting use cases built on the Open Web Index developed in the project.


As a reminder, the OpenWebSearch.EU project was carried out by 14 partner organizations from the research and non-profit sectors and aimed to create the first European Open Web Index as the centerpiece of sovereign, structured access to the internet.

The OWI (Open Web Index) has been up and running since June 2025 and has crawled an impressive 1.3 petabytes of data to date. Over the course of the project, a total of 15 third-party partner projects were integrated through several open calls. Their goal was to conduct legal, technical, and commercial analyses and feasibility studies related to the Open Web Index, laying a solid foundation for the expansion of a European web data infrastructure.

Seven of the third-party partner projects (from Open Call 2) developed concrete technical applications based on the OWI. They demonstrate the range of possibilities that open up when web index data is openly accessible. Here is a brief summary of their results:

VERITAS project: Fact-checking the war in Ukraine with a RAG chatbot

The company DEXAI (Czechia) developed a retrieval-augmented generation (RAG) chatbot and a Chrome browser extension for real-time fact-checking, using statements about the war in Ukraine as test cases. The system filtered 30 days of OWI crawl data, extracted the news content, indexed it using embeddings, and used an established large language model (LLM) to generate source-backed, evidence-based answers to user queries. Users can highlight any text on a web page and receive an instant assessment based on verified news sources. The project shows that open web data enables domain-specific fact-checking tools that would otherwise depend on proprietary search APIs.
The full VERITAS story can be read here: https://openwebsearch.eu/results-veritas/
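
To make the retrieval step more concrete, here is a minimal sketch of how a RAG system can select evidence for a highlighted claim. It is not DEXAI's implementation: a simple bag-of-words vector stands in for a neural embedding model, the news snippets are made-up examples, and the final LLM call is omitted, so the sketch stops once the grounded prompt is assembled. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict) -> dict:
    """Indexing step: embed every news article once, up front."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def retrieve(claim: str, index: dict, k: int = 2) -> list:
    """Return the k (doc_id, similarity) pairs most similar to the highlighted claim."""
    query = embed(claim)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(claim: str, hits: list, docs: dict) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved sources (LLM call not shown)."""
    evidence = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return (
        "Assess the claim using only the sources below and cite them by id.\n"
        f"Claim: {claim}\nSources:\n{evidence}"
    )


if __name__ == "__main__":
    docs = {
        "news-1": "Officials confirmed the bridge was repaired and reopened on Monday.",
        "news-2": "The football season resumed after a short winter break.",
        "news-3": "Satellite images show damage to the bridge before the repairs.",
    }
    claim = "The bridge was reopened after repairs"
    print(build_prompt(claim, retrieve(claim, build_index(docs)), docs))
```

Restricting the model to cited, retrieved sources is what makes the answers checkable; a production system would swap `embed` for a real embedding model and a vector index.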

AKASE: The world’s arguments as a knowledge graph

The University of Groningen (Netherlands) constructed an argumentation knowledge graph from more than 105 million Open Web Index documents. The system automatically identifies argumentative content on websites (claims and premises), recognizes rhetorical fallacies, evaluates the logic of arguments, and records support, attack, and paraphrase relationships between them. Applications include a search engine that re-ranks results by argument quality and ArgsBase, a multi-agent deliberation platform that won the JTS Early Career Researcher Prize.
The full AKASE story is available here: https://openwebsearch.eu/akase-results/
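
At its core, such a knowledge graph is a set of argument nodes connected by typed edges. The following sketch shows one possible data structure and a simple way such a graph could feed into re-ranking. The scoring rule (a fixed bonus per supporting argument, a fixed penalty per attacking one) and the 50/50 mix of relevance and argument quality are illustrative assumptions, not the AKASE quality model; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    arg_id: str
    doc_id: str     # web page the argument was extracted from
    claim: str
    quality: float  # e.g. output of an argument-quality classifier, in [0, 1]


@dataclass
class ArgumentGraph:
    nodes: dict = field(default_factory=dict)  # arg_id -> Argument
    edges: list = field(default_factory=list)  # (source_id, target_id, relation)

    def add(self, arg: Argument) -> None:
        self.nodes[arg.arg_id] = arg

    def relate(self, src: str, dst: str, relation: str) -> None:
        if relation not in {"support", "attack", "paraphrase"}:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((src, dst, relation))

    def adjusted_quality(self, arg_id: str, bonus: float = 0.1) -> float:
        """Base quality, raised per incoming support edge and lowered per attack edge."""
        score = self.nodes[arg_id].quality
        for _, dst, rel in self.edges:
            if dst == arg_id and rel == "support":
                score += bonus
            elif dst == arg_id and rel == "attack":
                score -= bonus
        return max(0.0, min(1.0, score))


def rerank(results: list, graph: ArgumentGraph, alpha: float = 0.5) -> list:
    """results: (doc_id, relevance) pairs; mix relevance with the page's best argument quality."""
    def doc_quality(doc_id: str) -> float:
        scores = [graph.adjusted_quality(a.arg_id)
                  for a in graph.nodes.values() if a.doc_id == doc_id]
        return max(scores, default=0.0)

    return sorted(results,
                  key=lambda r: alpha * r[1] + (1 - alpha) * doc_quality(r[0]),
                  reverse=True)
```

With this rule, a page that is slightly less relevant but contains a well-supported argument can overtake a more relevant page whose main argument is weak or attacked.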

CIFFIL Service: Sharing search statistics between Dutch municipalities

Spinque (Netherlands) integrated the Common Index File Format (CIFF) into its search platform so that Dutch municipalities can easily exchange index statistics. Small municipal document collections – some with fewer than 10,000 documents – often suffer from poor search quality because they are simply too small to yield reliable term statistics (estimates of how often, and in how many documents, a term occurs in a collection). As a result, search results cannot be ranked effectively. By adopting statistics from larger municipalities via CIFF, smaller municipalities can significantly improve their ranking effectiveness.
Read the full CIFFIL story: https://openwebsearch.eu/ciffil-results/
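
Why borrowed statistics help can be seen in a standard ranking function such as BM25, whose inverse document frequency (IDF) component depends on the number of documents in the collection and on how many of them contain each query term. In the minimal sketch below, those two numbers come either from the local collection alone or from local statistics pooled with those of a larger collection. Simply summing document counts and document frequencies is an assumption made for illustration, not necessarily how Spinque combines them, and reading an actual CIFF file is out of scope: the external statistics are a plain dictionary with made-up numbers.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Statistics a collection could share: its size and per-term document frequencies."""
    return {"n_docs": len(docs), "df": dict(Counter(t for d in docs for t in set(d)))}


def pool_stats(*collections):
    """Combine statistics from several collections by summing counts (illustrative rule)."""
    df = Counter()
    for c in collections:
        df.update(c["df"])
    return {"n_docs": sum(c["n_docs"] for c in collections), "df": dict(df)}


def idf(term, stats):
    """BM25 IDF in the non-negative form that Lucene also uses."""
    n, df = stats["n_docs"], stats["df"].get(term, 0)
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def bm25(query, doc, stats, avgdl, k1=1.2, b=0.75):
    """Score a tokenized document; only the IDF part depends on the shared statistics."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if tf[term]:
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf(term, stats) * norm
    return score


if __name__ == "__main__":
    docs = [
        ["permit", "renewal", "form"],
        ["parking", "fees", "centre"],
        ["parking", "garage", "hours"],
    ]
    avgdl = sum(len(d) for d in docs) / len(docs)
    local = collection_stats(docs)
    external = {"n_docs": 10000, "df": {"parking": 50, "permit": 4000}}  # made-up numbers
    pooled = pool_stats(local, external)
    query = ["parking", "permit"]
    for name, stats in (("local", local), ("pooled", pooled)):
        print(name, [round(bm25(query, d, stats, avgdl), 3) for d in docs])
```

In this toy example, the three-document collection makes "permit" look rarer, and therefore more informative, than "parking"; with the pooled statistics the relationship reverses and the parking pages move ahead of the permit page.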

DTCommerce: Supporting retailers in their transition to digital

ZenLab (Slovenia) developed open-source tools that help brick-and-mortar retailers move into e-commerce. Starting from an Excel export of the company's ERP or accounting software, the system looks up the listed products on supplier websites to collect titles, descriptions, images, and manufacturer's suggested retail prices (MSRPs). It then improves the existing descriptions using AI and finally imports everything automatically into a WooCommerce online store via a WordPress plugin.
You can read the full DTCommerce story here: https://openwebsearch.eu/ditcommerce-results/
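
The heart of such a pipeline is a mapping from rows in an ERP export to store products. The sketch below assumes the Excel sheet has been saved as CSV with hypothetical columns `sku`, `ean`, `name`, and `price`, replaces the web lookup with a small in-memory table of made-up supplier data, and uses a trivial placeholder where the AI rewriting step would go. The output dictionaries use field names from the WooCommerce REST API product schema (`name`, `type`, `sku`, `regular_price`, `description`, `images`); the DTCommerce WordPress plugin may import products differently.

```python
import csv
import io

# Made-up supplier data standing in for what the web lookup would return, keyed by EAN.
SUPPLIER_DATA = {
    "0000000000001": {
        "title": "Highlighter, yellow",
        "description": "  Fluorescent   highlighter,\nchisel tip. ",
        "image": "https://example.com/img/highlighter-yellow.jpg",
        "msrp": "1.99",
    },
}


def improve_description(text: str) -> str:
    """Placeholder for the AI rewriting step; here it only normalizes whitespace."""
    return " ".join(text.split())


def to_product(row: dict, supplier: dict) -> dict:
    """Map one ERP row (plus any supplier data found) to a WooCommerce-style product payload."""
    info = supplier.get(row["ean"], {})
    product = {
        "name": info.get("title", row["name"]),
        "type": "simple",
        "sku": row["sku"],
        # Prices stay strings; fall back to the ERP price if no MSRP was found
        "regular_price": info.get("msrp", row["price"]),
        "description": improve_description(info.get("description", "")),
    }
    if "image" in info:
        product["images"] = [{"src": info["image"]}]
    return product


def convert(erp_csv: str, supplier: dict = SUPPLIER_DATA) -> list:
    """Parse the CSV export and build one product payload per row."""
    return [to_product(row, supplier) for row in csv.DictReader(io.StringIO(erp_csv))]
```

Products without supplier matches still get imported with their ERP name and price, so the shop can go live before every item has been enriched.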

OMMS: Open Maps as an alternative to established map apps

The E Foundation (France) used OpenWebSearch.EU's crawling tools to harvest structured business data (opening hours, contact information, FAQs) from websites linked to OpenStreetMap points of interest (POIs). This data is made available via an open-source POI server for mobile map applications. Starting in the Seattle metropolitan area and then expanding globally, the project team found that about 12% of POI-linked websites contain structured data that can be analyzed. In addition, the project identified two promising future directions for OpenWebSearch.EU: publishing POI relevance rankings (based on PageRank or similar metrics) to improve result ordering in open-data geocoders, and using backlink data as an open alternative to proprietary rating databases.
The full OMMS story can be read here: https://openwebsearch.eu/results-omms/
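
Many business websites embed such facts as schema.org markup, often in JSON-LD script blocks, which is one plausible form of the structured data mentioned above (the project's actual extraction may also cover other formats). Here is a minimal sketch, using only Python's standard library, that pulls opening hours and contact details out of JSON-LD on a fetched page; in OpenStreetMap, the page would typically be found through a POI's `website` tag. The function and class names are hypothetical, and nested `@graph` structures are not handled.

```python
import json
from html.parser import HTMLParser

WANTED = ("openingHours", "telephone", "email")  # schema.org property names


class JsonLdExtractor(HTMLParser):
    """Collects and parses the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside, self._buffer = True, []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is common in the wild; skip it


def business_facts(html: str) -> dict:
    """Return the first value found for each wanted property across all JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    facts = {}
    for block in parser.blocks:
        for item in block if isinstance(block, list) else [block]:
            if not isinstance(item, dict):
                continue
            for key in WANTED:
                if key in item and key not in facts:
                    facts[key] = item[key]
    return facts
```

Pages without such markup simply yield an empty result, which is how a share like the 12% reported above would show up in practice: most POI-linked sites offer nothing machine-readable to extract.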

FUN: Rethinking web crawling

The University of Pisa and the University of Glasgow (Italy/UK) proposed a paradigm shift in web crawling. Traditional crawlers rely on link-based heuristics such as PageRank to decide which pages to fetch next. FUN argues that in the age of AI, crawlers should instead use language models to assess the semantic quality of pages. The team developed four neural crawling strategies and tested them on 87 million pages from the ClueWeb22-B collection. For natural-language queries, the best strategy consistently outperformed PageRank in both crawling effectiveness and downstream retrieval quality, while remaining competitive for keyword queries.
The full FUN story is available here: https://openwebsearch.eu/fun-results/
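
The difference between the two approaches comes down to how the crawl frontier is prioritized. The following minimal best-first crawler keeps the frontier in a priority queue keyed by a quality score. Because a crawler cannot judge a page before downloading it, the sketch assumes each newly discovered URL inherits the score of the page that links to it, and a toy spam-word heuristic stands in for the language model; neither detail is taken from the four FUN strategies, and all names are hypothetical.

```python
import heapq


def best_first_crawl(seeds, links, fetch, quality, budget):
    """Fetch up to `budget` pages, always expanding the highest-priority URL first.

    links:   url -> list of outlinked urls (result of link extraction)
    fetch:   url -> page text (stand-in for an HTTP request)
    quality: page text -> score in [0, 1] (stand-in for a language-model assessor)
    """
    # heapq is a min-heap, so priorities are stored negated; seeds start at the top
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    crawled = []
    while frontier and len(crawled) < budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        # Unfetched links cannot be read yet, so they inherit their parent's quality
        parent_score = quality(fetch(url))
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-parent_score, link))
    return crawled


SPAM_WORDS = {"buy", "cheap", "click", "now"}


def toy_quality(text):
    """Share of words that are not spam markers (hypothetical heuristic)."""
    words = text.lower().split()
    return sum(w not in SPAM_WORDS for w in words) / len(words) if words else 0.0
```

Under a fixed budget, links found on low-quality pages sink in the queue, so the crawl spends its fetches on the neighborhood of pages the scorer rates highly, regardless of how many links point to them.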

TILDE: Trustworthy health search with fairness-conscious ranking

Know Center Research GmbH (Austria) built a health search system on the OWI that looks beyond pure relevance and also accounts for potential biases and the varying trustworthiness of sources. The system extracted medical content from around 200,000 health-related websites, normalized it against the clinical UMLS ontology, and implemented a hybrid retrieval engine combining entity-based and semantic search. Its distinguishing feature is a three-stage fairness pipeline: it enriches each search result with trustworthiness and neutrality attributes, re-ranks results to maximize fairness while preserving credibility and diversity of viewpoints, and checks its own outputs for stereotypes. The web interface lets users explore medical evidence through knowledge graph visualizations and faceted search.
The full TILDE story can be read here: https://openwebsearch.eu/tilde-results/
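
Of the three stages, the re-ranking step is the easiest to illustrate. The sketch below assumes the enrichment stage has already attached a trustworthiness score and a viewpoint label to each result, drops results below a credibility threshold, and then fills the ranking greedily, penalizing viewpoints that have already been shown. The weights, the threshold, and the greedy strategy are illustrative assumptions, not TILDE's actual fairness objective, and the stereotype check of the third stage is not covered.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    relevance: float  # score from the retrieval engine, in [0, 1]
    trust: float      # trustworthiness attribute from the enrichment stage, in [0, 1]
    viewpoint: str    # viewpoint or stance label from the enrichment stage


def fair_rerank(results, k, min_trust=0.3, w_trust=0.3, w_div=0.2):
    """Greedily build a top-k list balancing relevance, trust and viewpoint diversity."""
    pool = [r for r in results if r.trust >= min_trust]  # credibility floor
    ranking, shown = [], Counter()

    def gain(r):
        # Each earlier result with the same viewpoint lowers this result's gain by w_div
        return (1 - w_trust) * r.relevance + w_trust * r.trust - w_div * shown[r.viewpoint]

    while pool and len(ranking) < k:
        best = max(pool, key=gain)
        ranking.append(best)
        shown[best.viewpoint] += 1
        pool.remove(best)
    return ranking
```

With the defaults, a slightly less relevant result from an under-represented viewpoint can move ahead of a second result from the dominant one; setting `w_div` to 0 recovers a plain relevance-and-trust ordering.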

What happens next?

An Open Web Index enables applications that proprietary search cannot offer, and the research contributions directly inform how the infrastructure itself should evolve.
With the completion of all open calls, the OpenWebSearch.EU project has built a community that extends far beyond the core consortium. The code, data, models, and tools from these projects are predominantly open source and freely available, and contributions to the infrastructure will continue beyond the formal end of the project.