A growing body of literature on recommender systems uses synthetic data and/or simulation methods to understand the behavior of these systems. Such synthetic data serves many purposes: to preserve privacy in the underlying data set, to test algorithms over a range of data conditions, to synthesize unobservable attributes for algorithmic experimentation, to study the behavior of experimental methods under controlled conditions, to assess reinforcement learning algorithms, and to fulfil ethical requirements, among others. Despite the popularity of simulation methods, their assumptions, implementations, and applications vary widely. This workshop is intended to catalyze a discussion that identifies what is currently known about these methods that could inform best practices, and the open lines of research needed to advance simulation as a robust, reproducible, and useful experimental method. This will ultimately enable researchers and practitioners to apply simulation methods rigorously and robustly.
To that end, we solicit position papers from both academia and industry, including from researchers, practitioners, students, and others interested in participating in this discussion and helping shape the use of simulation methods and synthetic data in recommender systems research over the coming years. We are interested in a broad spectrum of perspectives, including those from computer and information sciences, business, ethics, and the social sciences.
The primary outcome of this workshop will be a report, jointly authored by the organizers and participants, documenting the group’s consensus on currently known best practices and laying out an agenda for further research over the next 3-5 years to fill the gaps where we currently lack the information needed to make methodological recommendations. We anticipate this report will address the following topics:
- What are use cases where simulation methods are particularly or uniquely useful for promoting research?
- What are use cases where simulation methods are ill-suited?
- What is currently known about how to effectively use simulations and synthetic data for recommender systems research, and what can be promoted as a current best practice?
- How should RecSys research using synthetic data or simulations be evaluated?
- What open questions need further research in order to identify good practices, evaluation criteria, etc. to improve the robustness, validity, and usefulness of simulation-based research methods?
While this workshop will follow the standard hybrid format during the RecSys conference, it will also have a significant asynchronous virtual component. Selected participants will be expected to take part in the following activities prior to the conference:
- Read the position papers submitted by the other participants,
- Attend at least two of the three one-hour Zoom calls for synchronous discussion of workshop topics,
- Participate in asynchronous discussion of workshop topics via Slack or email, and
- Assist in drafting a report on currently known best practices and open research questions for using simulation and synthetic data in recommender systems research.
We solicit short position papers (up to 5 pages excluding references) in the ACM manuscript paper style, which should be submitted via EasyChair. Participants should describe their goals and use cases for simulation methods, along with their perspective, experience, or open questions on how to use them effectively and rigorously to advance the state of knowledge in the field. Topics may include, but are not limited to, the following:
- What kinds of research questions and problem settings are simulation methods uniquely suited for?
- What are the advantages and possibilities of simulation methods compared to other research approaches?
- What limitations and pitfalls should researchers be aware of when using simulation?
- What have you found particularly promising or difficult in your own application of simulation to research and/or system development?
- What should the field be studying to improve the rigor and usefulness of simulation?
- What results so far shed light on the effective and appropriate use of simulation?
Authors are specifically asked not to include significant empirical results in their position papers, but rather to cite their published work or separate preprints that contain the details of empirical findings (the position paper should, of course, summarize those findings and their relevance to the authors’ arguments). The submission system provides a set of keywords, and we ask authors to select the appropriate ones to help us understand the space of submitted and accepted papers.
All submitted papers must:
- be written in English;
- contain author names, affiliations, and email addresses;
- be formatted according to the ACM submission guidelines (Word template, LaTeX template [use ‘manuscript’ option], Overleaf LaTeX template) with a font size no smaller than 9pt; and
- be submitted as a PDF (make sure that it can be viewed on any platform) formatted for US Letter size.
Selected position papers will not be published by the workshop and will be distributed only to workshop participants, unless the authors themselves distribute them elsewhere (e.g., on arXiv). Participant positions will be disseminated publicly through the jointly authored workshop report. Depending on the report's format and target venue, participants may also have the option of including their position papers as appendices. The workshop organizers will review and select papers to facilitate a robust discussion that covers a diverse set of topics and perspectives and is seeded with well-supported positions.
Submissions are due August 2, 2021; please see the workshop main page for other important dates.
To aid in reviewing and organizing submissions, authors should tag their submissions with any relevant keyphrases from the following list:
- Bias and fairness
- Consumer behavior / psychology
- Data augmentation
- Data sharing
- Deep learning
- Diversity and homogenization
- Evaluation methods / criteria
- Feedback loops
- Interpretability and transparency
- Privacy preservation
- Reinforcement learning / bandits
- Simulation assumptions
- Societal impacts