Discuss ideas and approaches to open up the web search ecosystem!
Discuss ideas for an open web search ecosystem on the following topics:
- Search engine deployment
- Search engine evaluation
- Use of the web as a resource
The First International Workshop on Open Web Search (WOWS) aims to promote and discuss ideas and approaches to open up the web search ecosystem so that small research groups and young startups can leverage the web to foster an open and diverse search market.
Therefore, the workshop has two calls that support collaborative and open web search engines: (1) for scientific contributions, and (2) for open-source implementations.
The first call aims for scientific contributions to building collaborative search engines, including collaborative crawling, collaborative search engine deployment, collaborative search engine evaluation, and collaborative use of the web as a resource for researchers and innovators.
The second call aims to gather open-source prototypes and gain practical experience with collaborative, cooperative evaluation of search engines and their components using the TIREx Information Retrieval Evaluation Platform.
On behalf of the Organizers,
Sheikh Mastura Farzana, German Aerospace Center
Maik Fröbe, Friedrich-Schiller-Universität Jena
Gijs Hendriksen, Radboud University
Michael Granitzer, University of Passau
Djoerd Hiemstra, Radboud University
Martin Potthast, Leipzig University and ScaDS.AI
Saber Zerhoudi, University of Passau
Calls for participation
Call 1: Call for papers #wows2024
We seek research contributions that address elements of a traditional web search pipeline and that incorporate recent developments such as (open source) large language models to interface with retrieval systems. Specifically, our focus is on contributions that address the importance of an open web search pipeline and the creation of an open web index as a basis for developing search applications for specific purposes and communities. We are particularly interested in contributions along the pipeline for creating an open web index, i.e., crawling, preprocessing, enrichment, and indexing, as well as in contributions on serving (parts of) an open web index to advance information retrieval. The latter also includes new search paradigms and ethical, legal, and social aspects related to open web search. Topics include:
- Crawling for an Open Web Index, Collaborative crawling
- Web deployment of search engines
- Standards for search and Interoperability
- Large scale web data pre-processing components or pipelines
- Pre-processing and Enrichment
- Indexing and Search Architectures
- Open infrastructures for evaluation
- Open source search engines
- Open source replicability
- Ethical and legal aspects of web search
- Alternatives for query logs and click logs
- Vertical search engines
- Search engines for low-resource languages
- Energy efficiency of web search
- Standardisation and methods for index exchange and tokenisation
Call 2: Call for software #wows2024
We seek submissions of dockerized components of retrieval pipelines to the open TIRA/TIREx platform. Submitted components, such as query processors (e.g., query expansion, performance prediction), document processors (e.g., spam classification, document expansion/reduction), and re-rankers, can be combined into pipelines or reused outside of TIRA/TIREx in declarative PyTerrier pipelines. Because IR collections are static and submissions to TIRA/TIREx are immutable, each component needs to be executed only once on each collection, and post hoc experiments can directly use its cached and publicly available outputs in new pipelines. Where the dataset licenses allow it, we will make all outputs of the submitted software publicly available so that the community can build complex retrieval pipelines without re-executing many of their components. Experience with Docker is not a prerequisite for participation, and we are happy to help you dockerize your components (please feel free to contact us).
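The run-once idea can be illustrated with a minimal sketch in plain Python. It does not use the TIRA/TIREx or PyTerrier APIs; the names `OutputCache`, `expand_queries`, `TOY_QUERIES`, and the identifiers `query-expansion-v1` and `toy-collection` are hypothetical and serve only to show why an immutable component on a static collection can have its output cached and reused.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy data: collection id -> {query id: query text}.
TOY_QUERIES: Dict[str, Dict[str, str]] = {
    "toy-collection": {"q1": "open web index", "q2": "collaborative crawling"},
}

Output = Dict[str, str]


class OutputCache:
    """Stores one output per (component, collection) pair (illustrative only)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Output] = {}
        self.executions = 0  # counts how often a component actually ran

    def get_or_run(
        self,
        component_id: str,
        collection_id: str,
        component: Callable[[str], Output],
    ) -> Output:
        key = (component_id, collection_id)
        if key not in self._store:
            # First request: execute the component and keep its output.
            self._store[key] = component(collection_id)
            self.executions += 1
        # Later requests reuse the stored output without re-execution.
        return self._store[key]


def expand_queries(collection_id: str) -> Output:
    """Hypothetical query processor: appends a fixed term to each query."""
    queries = TOY_QUERIES[collection_id]
    return {qid: f"{text} search" for qid, text in queries.items()}


cache = OutputCache()
first = cache.get_or_run("query-expansion-v1", "toy-collection", expand_queries)
second = cache.get_or_run("query-expansion-v1", "toy-collection", expand_queries)
print(first["q1"])
print(cache.executions)
```

The cache key combines the component identifier with the collection identifier, which is only safe under the two assumptions stated above: the component never changes after submission, and the collection is static.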
Pre-Registration to Foster Collaborations
We intend to promote new collaborations among potential participants. To this end, we maintain a public, non-binding pre-registration list of ideas for components, together with the potential participants who have expressed interest in working on them, so that participants can coordinate and avoid overlapping work. If you have additional ideas for components that you would like to see on the list, or if you want to express interest in working on a component, please write us a short message. We hope the pre-registration also provides collaboration ideas as part of the ECIR’24 Collab-a-thon.
We have a set of baselines and a step-by-step tutorial to simplify submissions, including a screencast. We maintain a set of Jupyter Notebooks for submitted components that showcase how anyone can reuse components in declarative PyTerrier pipelines.
Participants who contribute a software submission are expected to submit a (short) notebook paper. Submission of notebook papers will open soon.