2nd International Symposium on Open Search Technology

A review by Christian Geminn and Leon Pfeiffer, originally written in German for “MMR – Zeitschrift für IT-Recht und Digitalisierung”, a German law magazine

From 12 to 14 October 2020, the “2nd International Symposium on Open Search Technology (OSSYM 2020)” took place as a virtual conference. The symposium was organized by the European Organization for Nuclear Research (CERN), the “Cradle of the Web”, and the Open Search Foundation e.V. (OSF), founded in 2018. It was designed as an interdisciplinary conference on the topic of open Internet search (Open Search) and focused primarily on technical, legal, ethical and economic aspects.

While working at CERN, Tim Berners-Lee developed the World Wide Web in 1989; the first web page was dedicated to the WWW project itself. In addition, CERN may be considered a pioneer on the subject of openness. In this respect, CERN was a logical partner for OSF in hosting this series of conferences to explore and discuss an open Internet search.

The OSF is committed to a transparent, diverse and neutral Internet search and wants to lay the foundations of such a search infrastructure. The central pillar is the creation of an open web index, on which various search engine providers can base their services. To date, there are only four comprehensive Internet indexes in the world, operated by Google, Microsoft, Baidu and Yandex. These providers dictate the terms of access to their indexes and thus assume the role of gatekeepers of the search engine market. The goal of the event was to bring together international and interdisciplinary experts to discuss the implications and requirements of an open Web index and open Internet search.

The following is a brief presentation of the keynotes from the three-day event, which brought together over 90 registered participants from 16 countries.

The conference began with an introduction by OSF founder Stefan Voigt, DLR, who articulated three key principles of an open search ecosystem: collaborative, open, and publicly moderated. A welcome address was given by the host, Tim Smith, Head of Collaboration, Devices and Applications Group at CERN. Smith highlighted CERN’s role as a pioneer of the Web, which celebrated its 30th anniversary in 2019, and spoke about the ideals of the Web that were outlined back in its early years. Since then, however, he said, the Web has moved away from the ideal of openness and is dominated in many areas by a few central “gatekeepers”.

Pearse O’Donohue, Director Future Network, DG Connect, EU Commission, then gave a policy keynote in which he emphasized the importance of European digital sovereignty. A European cloud based on the Franco-German initiative GAIA-X, for example, is an important step towards sovereignty, O’Donohue pointed out. The same applies to other areas such as high-performance computing. Accessibility is always a fundamental element: access to knowledge should be oriented toward the common good rather than toward profit. He emphasized how well the idea of an open Web index fits into the EU Commission’s vision of Europe’s digital future, especially in view of the current monopolies in this area. He said it is important to remain consistent with European values and laws, especially in ensuring data security and data protection. A battle between value systems is taking place at a global level; its outcome will determine the future of the digital space, and it is important for Europe to assert itself and find its own way. In the process, attention must also be paid to the use of “green” IT in the sense of the European Green Deal. This applies all the more to electricity-intensive areas such as Internet search.

The keynote address on legal aspects was given by Olivia Tambou from the Université Paris-Dauphine. The starting point was the question of how legal experts can support the development of a free and open search infrastructure. She emphasized the importance of law by design. One problem, she said, is the classification of search engines into different roles (data controller, digital platform, technical intermediary, information society service), which leads to an overlap of obligations, for example, under liability law and data protection law. In addition, Tambou described a wide range of legal tools for regulating search engines, from contracts to self-regulation. The latter, however, has proven insufficient; hopes could be pinned on the future Digital Services Act. She identified three main challenges: the commitment to either a public law or private law approach to regulation; the need for a layered approach that takes into account EU law, national law, and international law; and the fact that European Digital Law is still emerging. The presentation concluded with a reference to the FAIR principles: Findable, Accessible, Interoperable, Reusable.

Arjen de Vries, Radboud University Nijmegen, then spoke on the technical challenges of open Internet search under the title “Searching, fast and slow – a tech perspective”. He began his remarks with a reference to the Report of the Investigation of Competition in Digital Markets of the Subcommittee on Antitrust, Commercial and Administrative Law of the Committee on the Judiciary of the U.S. House of Representatives, chaired by Jerrold Nadler. The report points to the high cost of operating a search engine. This is where the cooperative approach of the Open Search Foundation comes in, which distributes these costs across many shoulders. But he also pointed out that creating a Web index is only part of the work that needs to be done. Search engines today need not just an index, but also numerous accompanying functions such as snippets, verticals, knowledge graphs, instant answers or mobile applications.

The central question is how Google’s monopoly position in the search engine market can be broken. One particular challenge is to become the default search engine in operating systems such as Android. According to de Vries, the work should begin with the creation of a European Web Index. He also introduced the concept of “slow search”, in which the search engine initially delivers only rudimentary results, but later adds detailed results. Furthermore, he addressed “Human-Centric Search”. The concept behind this is that no one needs access to the entire Web, if only because not all languages represented on the Web are comprehensible to the individual. Instead, the search should focus on personal interests and hide uninteresting topics. The opportunity of an open Web index lies precisely in offering a genuine alternative to Google through human-centric search. The approach of decentralized data processing could also offer decisive advantages over Google. The presentation concluded with the recommendation to cooperate with national web archives and libraries in building an open web index.

In a presentation titled “Legal Open Standard Design for Legal Search Features”, Monica Palmirani, Bologna University, School of Law, addressed the tracing of the legislative process and related processes on the basis of digital sources, as well as search engine-supported legal research. The lawyer of the future should also be supported in the translation of legal documents by artificial intelligence. Machine learning should be used, for example, in the classification of legal documents, which must be digitally linked and enriched with meta-information (e.g., on the jurisdiction in which a legal document is valid). The LegalXML project (Advancing Standards for Legal Data Exchange) is working in this direction.

It is important to build a Dynamic Legal Information System, in which the link in a document to a legal regulation does not lead to the most current version of the regulation, but to the version applicable during the period of validity of the document. It is precisely this issue of temporal classification that is neglected by many legal databases, both at the governmental and commercial level. The legal search engines of the future must at least take temporal factors into account (temporal model) and be able to perform an implicit interpretation automatically.
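The temporal resolution described above can be illustrated with a minimal sketch. It is not taken from the talk or from LegalXML: the class `RegulationVersion`, the function `version_in_force`, and the sample dates and texts are all hypothetical, and the sketch assumes that each version of a regulation applies from its effective date until the next version takes effect.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegulationVersion:
    """One version of a regulation (hypothetical structure for illustration)."""
    effective_from: date  # first day on which this version applies
    text: str


def version_in_force(versions, on):
    """Return the version applicable on the date `on`, or None if no version applied yet."""
    ordered = sorted(versions, key=lambda v: v.effective_from)
    starts = [v.effective_from for v in ordered]
    # Number of versions whose effective date is on or before `on`.
    idx = bisect_right(starts, on)
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical history of a regulation with two versions.
history = [
    RegulationVersion(date(2010, 1, 1), "original wording"),
    RegulationVersion(date(2018, 5, 25), "amended wording"),
]

# A document dated 2015 that cites the regulation should resolve to the
# version applicable at that time, not to the most current one.
document_date = date(2015, 3, 1)
resolved = version_in_force(history, document_date)
print(resolved.text)
```

The lookup key is the date of the citing document rather than the current date, which is the difference between a point-in-time link and a link to the latest version.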

The keynote of the first session (Open Search Ecosystems) was given by Alexander Decker on the topic “Beyond Tech – Raising Awareness For The Open Search Foundation Through A Tailor-Made Communication Approach”. The talk opened with a quote from Google founders Brin and Page: “For this type of reason and historical experience with other media, we expect that advertising funded search engines will be inherently biased toward the advertisers and away from the needs of the consumers.” (The Anatomy of a Large-Scale Hypertextual Web Search Engine, (1998) 30 Computer Networks and ISDN Systems, 107, Appendix A). Decker went on to trace the problems encountered in raising public awareness of the basic problem and in building the organizational infrastructure of OSF e.V.

The first keynote of the second day of the event was given by Michael Völske of Bauhaus University Weimar. Under the title “Towards an Open Web Index: Lessons From the Past”, he addressed, among other things, the aggregation of the number of hits on web pages in comparison to their ranking as a necessary method for improving search. The search engine operator CLIQZ has made initial efforts in this area. Here, users collect locally which web pages they have requested in previous searches and pass this information on to the search service provider temporarily and in encrypted form during their search.

A public meeting of OSF’s Legal Section, which met in the context of the symposium, emphasized the importance of a two-pronged approach to providing legal support for the development of an open Web index. In addition to aspects of legal compliance, it is also necessary to work towards a constitutionally compatible technology design in which European constitutional values and principles (beyond minimum legal requirements) are expressed in the best possible way.

One of the central issues identified was the fact that the current legal framework is geared toward a convergence of web index operators and search engine operators; central rulings (on the right to be forgotten, for example) are concerned with the Google search engine. Furthermore, the development of an open web index is taking place in a highly volatile legal environment in which significant changes are to be expected in the medium term. In addition, there are clear problems in concretizing constitutional requirements into requirements that can be implemented by computer scientists and technicians, especially since many of these constitutional requirements are in competition with each other.

Further Information: The 3rd International Symposium on Open Search Technology is planned as a regular (non-virtual) meeting for fall 2021 at CERN in Geneva.

Dr. Christian Geminn is managing director of the Project Group for Constitutionally Compatible Technology Design (provet) at the Research Center for Information System Design (ITeG) at the University of Kassel and head of the Open Search Foundation’s Legal Section.

Leon Pfeiffer is a law student at the Ludwig-Maximilians-University of Munich and the Université Panthéon-Assas and a founding member of the OSF’s Legal Section.