Discuss ideas and approaches to open up the web search ecosystem!
Discuss ideas for an open web search ecosystem under the following topics:
- Crawling
- Search engine deployment
- Search engine evaluation
- Use of the web as a resource
- Open-source prototypes
- Cooperative action
The First International Workshop on Open Web Search (WOWS) aims to promote and discuss ideas and approaches to open up the web search ecosystem so that small research groups and young startups can leverage the web to foster an open and diverse search market.
To this end, the workshop issues two calls that support collaborative and open web search engines: (1) a call for scientific contributions and (2) a call for open-source implementations.
The first call seeks scientific contributions to building collaborative search engines, including collaborative crawling, collaborative search engine deployment, collaborative search engine evaluation, and collaborative use of the web as a resource for researchers and innovators.
The second call aims to gather open-source prototypes and to gain practical experience with the cooperative evaluation of search engines and their components on TIREx, the Information Retrieval Experiment Platform.
On behalf of the Organisers,
Sheikh Mastura Farzana, German Aerospace Center
Maik Fröbe, Friedrich-Schiller-Universität Jena
Gijs Hendriksen, Radboud University
Michael Granitzer, University of Passau
Djoerd Hiemstra, Radboud University
Martin Potthast, Leipzig University and ScaDS.AI
Saber Zerhoudi, University of Passau
Tentative Programme
8:30 — 9:00
Welcome and coffee
9:00 — 10:00
Keynote
Evaluation of Information Access Systems in the Generative Era
Negar Arabzadehghahyazi (University of Waterloo)
10:00 — 10:30
Perspectives on Evaluating Diverse Open Web Search Applications with TIREx
Maik Fröbe (Friedrich-Schiller University Jena)
10:30 — 11:00
Coffee break
11:00 — 11:20
Web content control standards in times of Generative AI
Michael Dinzinger, Florian Heß and Michael Granitzer (University of Passau)
11:20 — 11:40
Efficiently Scoring the Health-relatedness of Web Pages
Ferdinand Schlatt (Friedrich-Schiller University Jena)
11:40 — 12:00
ORCAS-I intent predictor as component of TIRA
Daria Alexander, Wojciech Kusa and Arjen P. de Vries (Radboud University and TU Wien)
12:00 — 12:20
Embedding-based Query Spelling Correction
Ines Zelch, Gustav Lahmann and Matthias Hagen (Friedrich-Schiller University Jena)
12:20 — 12:30
Comparing Recall-Oriented Document Processing on TIREx: DocT5Query vs. the Corpus Graph
Sean MacAvaney (University of Glasgow)
12:30 — 13:30
Lunch
13:30 — 13:50
A Mastodon Corpus to Evaluate Federated Microblog Search
Matti Wiegmann, Jan Heinrich Reimer, Maximilian Ernst, Martin Potthast, Matthias Hagen and Benno Stein (Bauhaus-Universität Weimar, Friedrich-Schiller University Jena, Leipzig University)
13:50 — 14:10
QPPTK@TIREx: Simplified Query Performance Prediction for Ad-Hoc Retrieval Experiments
Oleg Zendel, Maik Fröbe and Guglielmo Faggioli (RMIT University, Friedrich-Schiller University Jena, and University of Padua)
14:10 — 14:30
Integrating Query Interpretation Components into the Information Retrieval Experiment Platform
Marcel Gohsen and Benno Stein (Bauhaus-Universität Weimar)
14:30 — 14:50
TU Dresden at WOWS 2024
Linda Erben, Maria Hampel, Malte-Christian Kuns, Vincent Melisch, Per Natzschka, Wilhelm Pertsch, Lina Razouk, Reiner Stolle, Robert Thomas Thoss, Tuan Giang Trinh, Julius Gonsior and Anja Reusch (TU Dresden)
14:50 — 15:00
Discussion/Preparation of Breakout Groups
15:00 — 15:30
Coffee break
15:30 — 16:30
Breakout groups
16:30 — 17:00
Reports of the Breakout groups and Closing
Calls for participation
Call 1: Call for papers #wows2024
We seek research contributions that address elements of a traditional web search pipeline and that also incorporate recent developments, such as (open-source) large language models that interface with retrieval systems. Specifically, we focus on contributions that address the importance of an open web search pipeline and of an open web index as a basis for developing search applications for specific purposes and communities. We are particularly interested in contributions along the pipeline for creating an open web index, i.e., crawling, preprocessing, enrichment, and indexing, as well as in serving (parts of) an open web index to advance information retrieval. The latter also includes new search paradigms and ethical, legal, and social aspects of open web search. Topics include:
- Crawling for an open web index; collaborative crawling
- Web deployment of search engines
- Standards for search and interoperability
- Large-scale web data preprocessing components or pipelines
- Preprocessing and enrichment
- Indexing and search architectures
- Open infrastructures for evaluation
- Open-source search engines
- Open-source replicability
- Ethical and legal aspects of web search
- Alternatives to query logs and click logs
- Vertical search engines
- Search engines for low-resource languages
- Energy efficiency of web search
- Standardisation and methods for index exchange and tokenisation
Submissions to WOWS should be 5 to 12 pages long. Reviewing will be single-blind. Please use the single-column CEUR-WS format, available at http://ceur-ws.org/Vol-XXX/CEURART.zip
Call 2: Call for software #wows2024
We seek submissions of dockerized components of retrieval pipelines to the open TIRA/TIREx platform. Submitted components, such as query processors (e.g., query expansion or query performance prediction), document processors (e.g., spam classification or document expansion/reduction), and re-rankers, can be combined into pipelines or reused outside of TIRA/TIREx in declarative PyTerrier pipelines. Because IR collections are static and submissions to TIRA/TIREx are immutable, each component needs to be executed only once on each collection; post hoc experiments can then directly reuse its cached, publicly available outputs in new pipelines. Wherever the dataset licenses allow it, we will make all outputs of the submitted software publicly available so that the community can build complex retrieval pipelines without re-executing their components. Experience with Docker is not a prerequisite for participation; we are happy to help you dockerize your components (please feel free to contact us).
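The execute-once-and-reuse principle behind the software call can be illustrated with a small, self-contained Python sketch. It does not use the TIRA/TIREx or PyTerrier APIs: the class `CachingPipeline`, the stage functions `lowercase` and `expand`, the `SYNONYMS` table, and the collection identifier are all hypothetical. The sketch assumes, as the call states, that collections are static and components immutable, so each stage's output can be stored under a key made of the collection and the chain of stages that produced it. Rerunning a pipeline, or running one that shares a prefix with an earlier pipeline, then reuses stored outputs instead of executing the stages again.

```python
from typing import Callable, Dict, List, Tuple

Records = List[dict]
Stage = Tuple[str, Callable[[Records], Records]]  # (hypothetical component name, component function)


class CachingPipeline:
    """Hypothetical sketch: runs each stage at most once per collection and stage chain."""

    def __init__(self) -> None:
        # Maps (collection id, names of all stages up to and including this one) to that stage's output.
        self.cache: Dict[Tuple[str, Tuple[str, ...]], Records] = {}
        self.executions = 0  # counts how often a stage function was actually called

    def run(self, collection_id: str, records: Records, stages: List[Stage]) -> Records:
        prefix: Tuple[str, ...] = ()
        for name, fn in stages:
            prefix += (name,)
            key = (collection_id, prefix)
            if key not in self.cache:
                # Assumes a static collection and immutable components, so the
                # output stored under this key can never change once computed.
                self.executions += 1
                self.cache[key] = fn(records)
            records = self.cache[key]
        return records


def lowercase(records: Records) -> Records:
    # Hypothetical query processor: lowercases each query.
    return [{**r, "query": r["query"].lower()} for r in records]


SYNONYMS = {"car": "automobile"}  # hypothetical toy expansion dictionary


def expand(records: Records) -> Records:
    # Hypothetical query expansion: appends known synonyms of query terms.
    expanded = []
    for r in records:
        terms = r["query"].split()
        extra = [SYNONYMS[t] for t in terms if t in SYNONYMS]
        expanded.append({**r, "query": " ".join(terms + extra)})
    return expanded


pipeline = CachingPipeline()
queries = [{"qid": "1", "query": "Electric Car"}]
stages = [("lowercase", lowercase), ("expand", expand)]
first = pipeline.run("toy-collection", queries, stages)
second = pipeline.run("toy-collection", queries, stages)  # served entirely from the cache
print(first, pipeline.executions)  # prints [{'qid': '1', 'query': 'electric car automobile'}] 2
```

The second run executes no stage at all, and a later pipeline consisting only of `lowercase` on the same collection would also be answered from the cache. A real platform would persist outputs rather than keep them in memory; the sketch only shows why static collections and immutable components make such reuse safe.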
Pre-Registration to Foster Collaborations
We intend to promote new collaborations among potential participants. Therefore, we maintain a public, non-binding pre-registration list that collects ideas for components together with the workshop participants who have expressed interest in working on them, so that participants can coordinate and avoid duplicating work. If you have further ideas for components that you would like to see on the list, or want to express interest in working on a component, please write us a short message. We hope the pre-registration also provides collaboration ideas for the ECIR’24 Collab-a-thon.
Submissions
To simplify submissions, we provide a set of baselines and a step-by-step tutorial, including a screencast. We also maintain a set of Jupyter Notebooks for the submitted components that show how anyone can reuse them in declarative PyTerrier pipelines.
Participants who contribute a software submission are expected to submit a (short) notebook paper. Submission of notebook papers will open soon.