Our OpenWebSearch.EU project was recently featured in a German arte.tv report about European alternatives to Big Tech web services from overseas.

The video highlights our commitment to strengthening European digital sovereignty on the World Wide Web.
The report provides insights from Prof. Dr. Ir. Djoerd Hiemstra, Professor of Federated Search and Head of the Information Retrieval research group at Radboud University, one of the OpenWebSearch.EU consortium partners. Djoerd introduced the Open Web Index in its current state and the role it could play in creating powerful European search solutions.
Skip to minute 4:16 to hear Djoerd’s insights:

Alternatively, watch the video directly on Arte.tv: https://www.arte.tv/de/videos/121620-127-A/wo-bleibt-das-europaeische-google-oder-facebook/

Just in time for the wrap-up of the 42-month EU project OpenWebSearch.EU, we present exciting use cases built on the Open Web Index that was developed in the project.


As a reminder, the OpenWebSearch.EU project was implemented by 14 partner organizations from the research and non-profit sectors and aimed to create the first European Open Web Index as the centerpiece for sovereign structured access to the internet.

The OWI (Open Web Index) has been up and running since June 2025 and has crawled an impressive 1.3 petabytes of data to date. Over the course of the project, a total of 15 third-party partner projects were integrated through several open calls. The goal: to conduct legal, technical, and commercial analyses and feasibility studies related to the Open Web Index in order to lay a solid foundation for expanding a European web data infrastructure.

Seven of the third-party partner projects (those from Open Call 2) developed concrete technical applications based on the OWI. They demonstrate the range of possibilities that open up when web index data is openly accessible. Here is a brief summary of their promising results:

VERITAS project: Fact-checking the war in Ukraine with a RAG chatbot

The company DEXAI (Czechia) developed a retrieval-augmented generation (RAG) chatbot and a Chrome browser extension for real-time fact-checking, using statements about the war in Ukraine as test cases. The system filtered 30 days of OWI crawl data, extracted news content, indexed it using embeddings, and used an established LLM to generate evidence-based, source-backed responses to user queries. Users can highlight any text on a web page and receive an instant assessment based on verified news sources. The project shows that open web data enables domain-specific fact-checking tools that would otherwise depend on proprietary search APIs.
The full VERITAS story can be read here: https://openwebsearch.eu/results-veritas/
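To make the retrieval step of such a RAG pipeline more concrete, here is a minimal Python sketch. It is not the VERITAS code: a simple bag-of-words vector stands in for the neural embeddings the project used, the actual LLM call is left out, and all function names and example texts are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a neural embedding: lower-cased bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, str]]:
    """Return the k documents most similar to the query as (doc_id, text) pairs."""
    q = embed(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(claim: str, evidence: list[tuple[str, str]]) -> str:
    """Assemble an LLM prompt asking for a verdict that cites the retrieved sources by id."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in evidence)
    return (
        "Assess the claim using only the sources below and cite them by id.\n"
        f"Sources:\n{sources}\n"
        f"Claim: {claim}"
    )

# Hypothetical mini-collection standing in for filtered OWI news pages.
news = {
    "a": "grain exports resumed from odesa port",
    "b": "football results from the weekend",
    "c": "odesa port attacked overnight",
}
print(build_prompt("Odesa port grain exports resumed", retrieve("odesa port grain exports", news, k=2)))
```

Grounding the prompt in retrieved, citable sources is what lets the generated answer point back to evidence rather than rely on the model's memory alone.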

 

AKASE: The world’s arguments as a knowledge graph

The University of Groningen (Netherlands) constructed an argumentation knowledge graph from over 105 million web index documents. The system automatically identifies argumentative content (claims and premises) on websites, detects rhetorical fallacies, evaluates the logic of arguments, and records support, attack, and paraphrase relationships between them. Applications include a search engine that reranks results by argument quality and ArgsBase, a multi-agent deliberation platform that won the JTS Early Career Researcher Prize.
The full AKASE story is available here: https://openwebsearch.eu/akase-results/

 

CIFFIL Service: Sharing search statistics between Dutch municipalities

Spinque (Netherlands) integrated the Common Index File Format (CIFF) into its search platform so that Dutch municipalities can easily exchange index statistics. Small municipal document collections, some with fewer than 10,000 documents, often suffer from poor search quality because they are simply too small to yield accurate term frequency estimates (statistics on how frequent, and therefore how informative, certain terms are across a collection). As a result, search results cannot be ranked effectively. By adopting statistics from larger municipalities via CIFF, smaller municipalities can significantly improve their ranking effectiveness.
Read the full CIFFIL story: https://openwebsearch.eu/ciffil-results/
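A short Python sketch illustrates why borrowed statistics help. It uses the BM25 inverse document frequency (IDF) as an example of a collection statistic; the numbers are invented, reading the actual CIFF format is omitted, and pooling by simple addition is an assumption, not necessarily what the CIFFIL service does.

```python
import math
from collections import Counter

def bm25_idf(df: int, n_docs: int) -> float:
    """BM25 inverse document frequency (Lucene-style variant that is always positive)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)

def merge_stats(local, borrowed):
    """Pool (number of documents, document frequencies) from two collections.

    Assumption: the statistics are simply added; a real system might weight or
    replace them instead.
    """
    n_local, df_local = local
    n_borrowed, df_borrowed = borrowed
    return n_local + n_borrowed, Counter(df_local) + Counter(df_borrowed)

# Invented numbers: a tiny municipal collection and a larger one.
small = (3, {"permit": 2, "council": 3})
large = (1000, {"permit": 10, "council": 950})
n_pooled, df_pooled = merge_stats(small, large)

for term in ("permit", "council"):
    local_idf = bm25_idf(small[1][term], small[0])
    pooled_idf = bm25_idf(df_pooled[term], n_pooled)
    print(f"{term}: local IDF {local_idf:.3f}, pooled IDF {pooled_idf:.3f}")
```

With only three local documents, both terms receive low weights; with the pooled statistics, the rare term "permit" is weighted far more heavily than the near-ubiquitous "council", which is the kind of signal a ranking function needs.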

 

DTCommerce: Supporting retailers in their transition to digital

ZenLab (Slovenia) developed open-source tools that help brick-and-mortar retailers move into e-commerce. Starting from an Excel export of the company’s ERP or accounting software, the system searches for information on the products listed in it: it collects titles, descriptions, images, and MSRPs from supplier websites, improves existing descriptions using AI, and finally imports everything automatically into a WooCommerce online store via a WordPress plugin.
You can read the full DTCommerce story here: https://openwebsearch.eu/ditcommerce-results/

 

OMMS: Open Maps as an alternative to established Maps Apps

The E Foundation (France) used OpenWebSearch.EU’s crawling tools to harvest structured business data (opening hours, contact information, FAQs) from websites linked to OpenStreetMap points of interest (POIs). This data is made available to mobile map applications via an open-source POI server. Starting in the Seattle metropolitan area and then expanding globally, the project team found that about 12% of POI-linked websites contain structured data that can be analyzed. The project also identified two promising future directions for OpenWebSearch.EU: publishing POI relevance rankings (based on PageRank or similar metrics) to improve result sorting in open-data geocoders, and using backlink data as an open alternative to proprietary rating databases.
The full OMMS story can be read here: https://openwebsearch.eu/results-omms/
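Structured business data of this kind is often embedded in web pages as schema.org JSON-LD. The following Python sketch, using only the standard library, shows one way such fields could be pulled out of a page; the assumption that OMMS relied on JSON-LD, the chosen field names, and the example page are ours, not taken from the project.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks

def business_info(html: str) -> dict:
    """Return the first openingHours/telephone/email values found in JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    info = {}
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common on real websites
        for item in (data if isinstance(data, list) else [data]):
            if not isinstance(item, dict):
                continue
            for key in ("openingHours", "telephone", "email"):
                if key in item:
                    info.setdefault(key, item[key])
    return info

# Hypothetical page of a POI-linked business website.
page = ('<html><head><script type="application/ld+json">'
        '{"@type": "Bakery", "openingHours": "Mo-Fr 07:00-18:00"}'
        '</script></head><body></body></html>')
print(business_info(page))
```

Skipping malformed blocks instead of failing is a deliberate choice here, since only a fraction of websites carry usable structured data in the first place.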

 

FUN: Rethinking web crawling

The University of Pisa and the University of Glasgow (Italy/UK) proposed a paradigm shift in web crawling. Traditional crawlers use link-based heuristics such as PageRank to decide which pages to prioritize. FUN argues that in the age of AI, crawlers should instead use language models to assess the semantic quality of pages. The team developed four neural crawling strategies and tested them on 87 million pages from ClueWeb22-B. For natural-language queries, the best strategy consistently outperformed PageRank in both crawling effectiveness and downstream retrieval quality, while remaining competitive for keyword queries.
The full FUN story is available here: https://openwebsearch.eu/fun-results/
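Whichever signal is used, the crawler's behaviour largely comes down to how its frontier is ordered. The hypothetical Python sketch below shows a generic best-first crawler whose priority function can be swapped, for example a link-based heuristic or a quality estimate from a language model. It is not one of FUN's four strategies: the link graph and scores are invented, no model is included, and for simplicity a score is assumed to be known for every URL, whereas a real crawler has to estimate the priority of pages it has not downloaded yet.

```python
import heapq
from typing import Callable, Iterable

def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          priority: Callable[[str], float],
          budget: int) -> list[str]:
    """Best-first crawl: always fetch the highest-priority URL in the frontier next.

    fetch_links(url) stands in for downloading a page and extracting its outlinks;
    priority(url) is the pluggable strategy (link-based or model-based).
    """
    seeds = list(seeds)
    frontier = [(-priority(url), url) for url in seeds]  # heapq is a min-heap, so negate
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched = []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        fetched.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority(link), link))
    return fetched

# Hypothetical link graph and per-page scores (e.g. from a quality model).
graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
quality = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.8, "d": 0.1}
print(crawl(["seed"], lambda u: graph.get(u, []), lambda u: quality[u], budget=4))
```

Because the crawl budget is limited, the priority function decides which pages ever make it into the index, which is why changing it affects downstream retrieval quality.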

 

TILDE: Trustworthy health search with fairness-conscious ranking

Know Center Research GmbH (Austria) built a health search system on the OWI that takes potential biases and the varying trustworthiness of sources into account, not just the relevance of search results. The system extracted medical content from around 200,000 health-related websites, standardized it against the clinical UMLS ontology, and implemented a hybrid retrieval engine combining entity-based and semantic search. Its distinctive feature is a three-stage fairness pipeline: it enriches each search result with trustworthiness and neutrality attributes, ranks results to maximize fairness while preserving credibility and diversity of viewpoints, and checks its own outputs for stereotypes. The web interface lets users explore medical evidence through knowledge graph visualizations and faceted search.
The full TILDE story can be read here: https://openwebsearch.eu/tilde-results/

 

What happens next?

An Open Web Index enables applications that proprietary search cannot offer.
At the same time, these research contributions directly inform how the infrastructure itself should evolve.
With the completion of all open calls, the OpenWebSearch.EU project has built a community that extends far beyond the core consortium. Most of the code, data, models, and tools from these projects are open source and freely available, and the contributions to the infrastructure will continue beyond the formal end of the project.

“Imagine, our streets would have no names and our houses no readable house numbers. Just a cryptic code readable only by machines.”

Dr. Stefan Voigt, Chairman of the OSF Board, explains the mission of the Open Search Foundation and the goal of the Open Web Index project in an interview with the Polish blog HomoDigital.

He goes into more detail about the challenges and the great importance of the project in the current political, social and technological context.

“So one of the main challenges is to inspire people and computing providers to cooperate on this large but socially extremely relevant task and to jointly make public information publicly accessible and usable again.”

Dr. Stefan Voigt is optimistic about a possible paradigm shift away from the current market concentration of large tech companies on the Internet.

The full interview is available on HomoDigital (in Polish) here.

The annual International Open Search Symposium #ossym will take place in 2025 for the seventh consecutive year. From 8 to 10 October, #ossym25 invites the open search community to a three-day interdisciplinary forum, either on site in Helsinki, Finland, at this year’s event partner CSC – IT Center for Science, or online.

Interdisciplinary perspectives on classic web search and AI

As every year, the #ossym brings together experts from a wide range of fields such as computer science, law & regulation, ethics, business, politics and society. The seventh International Open Search Symposium provides a forum for innovative ideas on open and distributed web search and its use cases. The focus will be on artificial intelligence (AI), search applications and technologies, legal and ethical aspects of open web search, as well as topics relating to information exploitation/media literacy.

Keynotes on data governance and knowledge management

Viivi Lähteenoja, Chief Executive Officer at MyData Company, will provide input on the topic of data governance.

Harri Ketamo is founder and CEO of Headai and will speak on “The openness of knowledge data and its role in Future Search Solutions”.

Scientific sessions on the topic of “Architecture & Infrastructure”

Two scientific tracks on “Architecture & Infrastructure” will cover the extraction of structured data from the Open Web Index, data storage structures for the URL Frontier in OpenWebSearch.eu, and the extraction of geodata from semi-structured data with the help of LLMs. In addition, Common Crawl will offer insights into how well it covers Europe’s diverse languages and cultural content.

Application tracks on “Retrieval Augmented Generation & Large Language Models”

Talks will present and discuss decentralized approaches to accessing information via the browser-agentic web, as well as the fusion of retrieval, grammars and decision trees for text generation.

Search Engine Tracks remain an integral part of the #ossym conferences

The Search Engine Tracks, which focus on alternative search engines and their areas of application, are a popular #ossym tradition. Already confirmed this year are fragFinn.de and searchmysite.net.

Ethics, law and society

Non-technical topics such as societal interests and search solutions for special needs will round off the program and provide exciting food for thought.

Information and Registration

The Finnish Supercomputing Center CSC – IT Center for Science offers around 100 on-site seats.

All information on registration and tickets can be found at:
https://opensearchfoundation.org/en/events-osf/ossym25/

The Open Search Foundation e.V. is a European movement that creates the basis for independent, free and self-determined access to information on the Internet. In cooperation with research institutions, data centers and other partners, we are committed to a web search that benefits everyone.
True to the motto: “Together for a better net”.

Contact:
Open Search Foundation e.V. – OSF

With the vision to revolutionize web search on a European scale, the Open Search Foundation was one of the driving forces that kicked off the Open Web Search initiative in 2018.
Within this initiative, the eponymous OpenWebSearch.eu project was launched in 2022 together with 13 additional organizations from research and industry, uniting partners from 7 European countries.

Now, only 2.5 years later, in 2025, the consortium proudly presents the pilot of its common European federated Open Web Index: the OWI.
This achievement not only marks an important first cornerstone of European digital sovereignty, it also comes at a critical time, amid urgent calls for action in the face of rapidly progressing global AI developments.

Innovators & early adopters wanted

From June onward, commercial and scientific development teams of any size as well as interested individuals are welcome to access and make use of almost a petabyte of open web data under a general research license or – upon request – under a designated commercial license as well.

This is an active call for early adopters to pioneer innovative projects in vertical web search, argumentative search, LLM applications including RAG, and more.

“The OWI symbolizes a first step towards true European digital sovereignty and is a fundamental step in paving the way for a comprehensive open European AI landscape,”
says OpenWebSearch.eu’s Community Manager Ursula Gmelch, and she elaborates further:

“Our goal behind this initial pilot phase is to onboard a range of projects from diverse domains to get early feedback. We look forward to users confirming the quality and value of the current functionalities and/or helping us pivot in ways that allow real market demands to be met and further expanded upon.”

Kick-off event on 6 June

On 6 June from 10 am to 12 noon CEST, you can join the official kick-off event via Zoom.
Registration for the event is open at the following link:
https://cscfi.zoom.us/meeting/register/eATIpDQ5TZidh4Jzkim6FQ#/registration


 

Google’s decision to no longer explicitly rule out the use of its AI in weapons systems rightly raises questions about the value of voluntary commitments and principles made by companies.

The world is going crazy! That is the conclusion observers of the latest political developments are likely to reach. At the Munich Security Conference, it became clear that the anchors of stability of past decades, such as the transatlantic NATO alliance, could soon be a thing of the past. The world is in upheaval, and a small number of people are using their power to shape social and political change. Google’s change in its AI principles fits well into this picture: the company now allows its own artificial intelligence to be used in weapons systems, a use it previously excluded explicitly. In today’s technocratic world, the heads of the large tech giants are shaping political discourse. Elon Musk, for example, has secured Donald Trump’s trust through money and skilful manoeuvring. On behalf of the US president, he is now turning the American executive branch upside down and making decisions at breakneck speed that have serious consequences for people all over the world, for instance when he halted development aid overnight. Authors who speak of an ‘AI coup’ are not exaggerating.

Google’s decision to no longer explicitly exclude the use of its AI in weapons systems rightly raises the question of what voluntary promises and principles made by companies are actually worth. One thing is clear: Google is free to use its AI for the development and operation of weapons within the framework of applicable law. However, Google’s turnaround also makes it clear that companies are willing to throw ethical concerns overboard if they hope to reap economic benefits. Of course, this does not mean that ethical commitments by large corporations are purely a marketing measure. Many companies take their ethical and moral responsibility in the development of artificial intelligence seriously and set a good example. However, especially for sensitive new technologies that will undoubtedly transform our society, compliance with minimum ethical standards should not be left to commercial players to decide for themselves. Instead, ethical standards must be ensured across all sectors and companies through binding regulation. Whether the EU AI Act will prove to be a suitable means of achieving this remains to be seen.

The Musk case in the US already shows that if tech giants have too much power, no democratic system is safe from them. This applies not only to the USA, but also to Europe. The heavyweights of the digital world are already having a significant influence on legislative processes. Meta alone currently employs more than 40 lobbyists in Brussels. If Europeans want to prevent companies from ruthlessly pushing through their own interests, there is no way around more diversity in the digital space. In order to strengthen diversity and fairness in digital markets, European legislators have passed the Digital Markets Act (DMA). This regulatory offensive is an important building block, but is not enough on its own to protect Europe’s citizens, researchers and businesses from monopolists in the digital space. Rather, European solutions are needed that are fully available to business and science in order to keep Europe competitive and enable innovation.


 

In our new section ‘My opinion’, we share comments and opinions from the Open Search Foundation team. Today, Leopold Beer, research fellow in the PriDI project, comments on Google’s decision to make its own AI applications available for weapons development in the future.