28 March 2024

1st International Workshop on Open Web Search #wows2024

Discuss ideas and approaches to open up the web search ecosystem!

28 March 2024
Glasgow, United Kingdom
Deadline for both calls: 28 February 2024 (new, extended date)
Co-located with ECIR 2024

Discuss ideas for an open web search ecosystem under the topics of:

Crawling

Search engine deployment

Search engine evaluation

Use of the web as a resource

Open-source prototypes

Cooperative action

The First International Workshop on Open Web Search (WOWS) aims to promote and discuss ideas and approaches to open up the web search ecosystem so that small research groups and young startups can leverage the web to foster an open and diverse search market.

Therefore, the workshop has two calls that support collaborative and open web search engines: (1) for scientific contributions, and (2) for open-source implementations.

The first call seeks scientific contributions on building collaborative search engines, including collaborative crawling, collaborative search engine deployment, collaborative search engine evaluation, and collaborative use of the web as a resource for researchers and innovators.

The second call aims to gather open-source prototypes and gain practical experience with collaborative, cooperative evaluation of search engines and their components using the TIREx Information Retrieval Evaluation Platform.

On behalf of the Organisers,
Sheikh Mastura Farzana, German Aerospace Center
Maik Fröbe, Friedrich-Schiller-Universität Jena
Gijs Hendriksen, Radboud University
Michael Granitzer, University of Passau
Djoerd Hiemstra, Radboud University
Martin Potthast, Leipzig University and ScaDS.AI
Saber Zerhoudi, University of Passau

Tentative Programme

8:30 — 9:00
Welcome and coffee

9:00 — 10:00
Keynote
Evaluation of Information Access Systems in the Generative Era
Negar Arabzadehghahyazi (University of Waterloo)

10:00 — 10:30
Perspectives on Evaluating diverse Open Web Search Applications with TIREx
Maik Fröbe (Friedrich-Schiller University Jena)

10:30 — 11:00
Coffee break

11:00 — 11:20
Web content control standards in times of Generative AI
Michael Dinzinger, Florian Heß and Michael Granitzer (University of Passau)

11:20 — 11:40
Efficiently Scoring the Health-relatedness of Web Pages
Ferdinand Schlatt (Friedrich-Schiller University Jena)

11:40 — 12:00
ORCAS-I intent predictor as component of TIRA
Daria Alexander, Wojciech Kusa and Arjen P de Vries (Radboud University and TU Wien)

12:00 — 12:20
Embedding-based Query Spelling Correction
Ines Zelch, Gustav Lahmann and Matthias Hagen (Friedrich-Schiller University Jena)

12:20 — 12:30
Comparing Recall-Oriented Document Processing on TIREx: DocT5Query vs. the Corpus Graph
Sean MacAvaney (University of Glasgow)

12:30 — 13:30
Lunch

13:30 — 13:50
A Mastodon Corpus to Evaluate Federated Microblog Search
Matti Wiegmann, Jan Heinrich Reimer, Maximilian Ernst, Martin Potthast, Matthias Hagen and Benno Stein (Bauhaus-Universität Weimar, Friedrich-Schiller University Jena, Leipzig University)

13:50 — 14:10
QPPTK@TIREx: Simplified Query Performance Prediction for Ad-Hoc Retrieval Experiments
Oleg Zendel, Maik Fröbe and Guglielmo Faggioli (RMIT University, Friedrich-Schiller University Jena, and University of Padua)

14:10 — 14:30
Integrating Query Interpretation Components into the Information Retrieval Experiment Platform
Marcel Gohsen and Benno Stein (Bauhaus-Universität Weimar)

14:30 — 14:50
TU Dresden at WOWS 2024
Linda Erben, Maria Hampel, Malte-Christian Kuns, Vincent Melisch, Per Natzschka, Wilhelm Pertsch, Lina Razouk, Reiner Stolle, Robert Thomas Thoss, Tuan Giang Trinh, Julius Gonsior and Anja Reusch (TU Dresden)

14:50 — 15:00
Discussion/Preparation of Breakout Groups

15:00 — 15:30
Coffee break

15:30 — 16:30
Breakout groups

16:30 — 17:00
Reports of the Breakout groups and Closing

Calls for participation

Call 1: Call for papers #wows2024

We seek research contributions that address elements of a traditional web search pipeline and also incorporate recent developments such as (open-source) large language models to interface with retrieval systems. Specifically, our focus is on contributions that address the importance of an open web search pipeline and the creation of an open web index as a basis for developing search applications for specific purposes and communities. We are particularly interested in contributions along the pipeline for creating an open web index, i.e. crawling, preprocessing, enrichment and indexing, as well as in serving (parts of) an open web index to advance information retrieval. The latter also includes new search paradigms and ethical, legal and social aspects related to open web search. Topics include:

Crawling for an Open Web Index, Collaborative crawling
Web deployment of search engines
Standards for search and Interoperability
Large-scale web data pre-processing components or pipelines
Pre-processing and Enrichment
Indexing and Search Architectures
Open infrastructures for evaluation
Open source search engines
Open source replicability
Ethical and legal aspects of web search
Alternatives for query logs and click logs
Vertical search engines
Search engines for low-resource languages
Energy efficiency of web search
Standardisation and methods for index exchange and tokenisation

Submissions to WOWS should be 5 to 12 pages long. Reviewing will be single-blind. Please use the single-column CEUR-WS format available at http://ceur-ws.org/Vol-XXX/CEURART.zip

Paper submissions

Call 2: Call for software #wows2024

We seek submissions of dockerized components of retrieval pipelines to the open TIRA/TIREx platform. Submitted components, such as query processors (e.g., query expansion, performance prediction), document processors (e.g., spam classification, document expansion/reduction), and re-rankers, can be combined into pipelines or reused outside of TIRA/TIREx in declarative PyTerrier pipelines. Because IR collections are static and submissions to TIRA/TIREx are immutable, each component needs to be executed only once on each collection, and post hoc experiments can directly use its cached and publicly available outputs in new pipelines. Where the dataset licenses allow it, we will make all outputs of the submitted software publicly available so that the community can build complex retrieval pipelines without re-executing their components. Experience with Docker is not a prerequisite for participation, and we are happy to help you dockerize your components (please feel free to contact us).
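The "execute once, reuse the cached output" idea can be illustrated with a minimal Python sketch. It does not use the TIRA/TIREx or PyTerrier APIs; the `OutputCache` class, the `lowercase_queries` component, and the collection name `toy-collection` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "component" maps a list of queries to processed queries.
Component = Callable[[List[str]], List[str]]


class OutputCache:
    """Runs each (component, collection) pair at most once and reuses its output."""

    def __init__(self) -> None:
        self._outputs: Dict[Tuple[str, str], List[str]] = {}
        self.executions = 0  # number of real executions, kept for illustration

    def run(self, name: str, component: Component,
            collection: str, queries: List[str]) -> List[str]:
        key = (name, collection)
        if key not in self._outputs:
            # First request for this pair: execute the component and store its output.
            self._outputs[key] = component(queries)
            self.executions += 1
        # Every later request is served from the stored output.
        return self._outputs[key]


def lowercase_queries(queries: List[str]) -> List[str]:
    """Toy query processor (an assumption, not a submitted component)."""
    return [q.lower() for q in queries]


cache = OutputCache()
queries = ["Open Web Search", "TIREx"]
first = cache.run("lowercase", lowercase_queries, "toy-collection", queries)
second = cache.run("lowercase", lowercase_queries, "toy-collection", queries)
print(first, cache.executions)
```

The second call returns the stored result without running the component again. This relies on the same two properties named above: the collection does not change, and the component is immutable once submitted.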

Pre-Registration to Foster Collaborations

To promote new collaborations among potential participants, we maintain a public, non-binding pre-registration list. It collects ideas for components together with the potential participants who have expressed interest in working on them, so that participants can coordinate and avoid overlapping work. If you have additional ideas for components that you would like to see on the list, or want to express interest in working on a component, please write a short message. We hope the pre-registration also provides collaboration ideas for the ECIR’24 Collab-a-thon.

Submissions

We have a set of baselines and a step-by-step tutorial to simplify submissions, including a screencast. We maintain a set of Jupyter Notebooks for submitted components that showcase how anyone can reuse components in declarative PyTerrier pipelines.

Participants who contribute a software submission are expected to submit a (short) notebook paper. The submission for notebooks opens soon.

Software submissions

Main Topics

Topics of #wows2024 include but are not limited to

Crawling for an Open Web Index, Collaborative crawling
Web deployment of search engines
Standards for search and Interoperability
Large-scale web data pre-processing components or pipelines
Pre-processing and Enrichment
Indexing and Search Architectures
Open infrastructures for evaluation
Open source search engines
Open source replicability
Ethical and legal aspects of web search
Alternatives for query logs and click logs
Vertical search engines
Search engines for low-resource languages
Energy efficiency of web search
Standardisation and methods for index exchange and tokenisation

Important Dates

January 24, 2024 (optional):
Early-bird submissions of software and papers. You receive early notification, and accepted contributions get a free WOWS T-shirt.

February 28, 2024 (extended from February 14):
Submission deadline for software and papers

March 13, 2024:
Peer review notification

March 20, 2024:
Camera-ready papers submission

March 28, 2024:
Workshop (co-located with ECIR 2024 in Glasgow)

Program committee

Jan van Acken (Utrecht University)
Hui Fang (University of Delaware)
Noor Afshan Fathima (CERN)
Christian Guetl (TU Graz)
Claudia Hauff (Spotify)
Katja Mankinen (IT Center for Science, CSC)
Craig Macdonald (University of Glasgow)
Jan Heinrich Reimer (Friedrich-Schiller-Universität Jena)
Benno Stein (Bauhaus-Universität Weimar)
Nicola Tonellotto (University of Pisa)
Andrew Trotman (University of Otago)

Workshop Registration via ECIR