ACL 2019 Chairs Blog

Florence, Italy July 28 - August 2

Category: Chairs Blog

Report on the ACL 2019 Survey on Reviewing: Make Your Voice Heard!

The ACL survey on reviewing was conducted from May 6 to June 5, 2019 and received 422 responses. The survey was advertised twice on the ACL membership email distribution list as well as on social media. The ACL PC Chairs and the ACL Exec invite the membership to discuss these results both in the comments feature of this blog and during a town hall session in the last 45 minutes of the Business meeting at ACL 2019 in Florence (Tuesday, July 30, 17:30-19:00, Cavagnilia room). That portion of the meeting will include a presentation of a summary of these results, a panel of discussants, and an open mic for discussion from the membership.

This blog post contains just the executive summary. The full results, including charts and categorized comments, are in this pdf document.

This discussion is complementary to one that took place two years ago on the ACL 2017 PC Chairs blog. The topics here differ from those covered there, and we would like the comments to stay focused on what was asked in this survey.

Background

Purpose: The purpose of this survey was to inform the ACL leadership and the membership about members’ opinions on several aspects of the reviewing process, based on their experience as authors, reviewers, area chairs (AC), or program chairs (PC). The results of this survey will help guide decisions about these policies in the future.

Structure: The survey consisted of 26 questions organized as (i) respondents’ experience as reviewers, authors, area chairs, and program co-chairs, and then opinions on: (ii) author response, (iii) author discussion, (iv) meta-review, (v) structured review forms, (vi) review transparency, (vii) acceptance rates, (viii) timing of conference dates and review release, (ix) public review, (x) open comments, and finally, (xi) demographics. Participants were allowed to skip questions, so results are reported in percentages throughout. An option of “No opinion / not sure” was offered on most questions.

Discussion: The ACL Exec invites the membership to discuss these results both in the comments feature of this blog and during an open session in the Business meeting at ACL 2019 in Florence.

Summary of Results

Demographics: The vast majority of participants (95%) are either current or previous members of the ACL, and as far as we can tell from available membership statistics, the sample appears to be representative with respect to demographic factors like geographical affiliation, gender, role and academic background. 

The vast majority of participants have submitted an *ACL paper for review at least once (98%) or at least twice (95%) over the last 10 years.  However, there were a significant number of respondents with little experience reviewing (13% never, 8% one time, 8% two times). Furthermore, most respondents have not had first-hand experience with higher levels of reviewing.  74% have not been an AC, and only 16% have been an AC two or more times in the last ten years. The respondent pool consisted of 7% of people who have been program co-chairs for *ACL in that time period. Thus, most respondents are relatively inexperienced in the reviewing process.

Opinions: This is a brief summary; details and charts appear in the full documents that are linked below.

  • Author Response: A significant majority of survey respondents were in favor of author response as a whole, with 61% in favor compared to 27% opposed. Very few were strongly opposed.  Preference for author response was negatively correlated with reviewer experience.

 

  • Author Discussion: Opinions were nearly evenly split on author discussion, with 37% in favor, 32% opposed, and 30% with no opinion.  However, those who had experienced author discussion were more in favor of it than those who had not experienced it.  Those with experience as an Area Chair were more opposed.

 

  • Meta-Reviews: There was strong support for meta-reviews (only 9% opposed), with the majority (73%) agreeing that it is ok to have them only for borderline cases.  However, former PC chairs were split 50%-35% on the latter question. 

 

  • Review Form Structure: A significant majority (65%) supported a minimal amount of structure in review forms, although a minority (24%) preferred more structure. There was a clear trend that more experienced reviewers preferred less structured review forms.

 

  • Review Transparency: A clear majority (54%) preferred that reviews be released only to authors, but a notable minority (36%) preferred public release of reviews. Those with no experience reviewing were more often in favor of public release of reviews, while those with some experience reviewing were more often opposed. Public release of reviews was one area where there was a significant gap between genders, with only 26% of female respondents preferring reviews be publicly released, compared with about 40% for those who were male or preferred not to state. On the other hand, there were no regional differences in preferring public release, but a significant 20% of researchers from the Asia/Pacific region preferred that reviews not be released even to authors. Finally, those with program chair experience were even more strongly against public release of reviews, with only 14% agreeing. 18% were opposed to even releasing reviews to authors.

 

  • Post Review Discussion Transparency: There was significant disagreement over whether post-review discussion should be released to authors, with about half of those who responded with an opinion (47%) preferring that it not be, and the remaining half split between release to authors (24%) and public release (20%). As seen in similar questions, those with greater amounts of review experience preferred that post-review discussion be kept private, while those who had never reviewed strongly preferred that discussion be available to authors or the public.

 

  • Meta-Review Transparency: There was strong support for releasing meta-reviews to authors, with 83% agreeing. 33% were also in favor of releasing to the public. Similar trends were seen as above for reviewer experience.

 

  • Acceptance Rates: A near majority (47%) were in favor of keeping the status quo with respect to acceptance rates, while 32% were in favor of increasing them and 9% were in favor of decreasing them. A large majority of respondents (68%) preferred that acceptance rates not be decided beforehand, but rather post-hoc based on quality of papers. There was strong overall support (83%) for the status quo of conference publications remaining selective.

 

  • Timing of Review Release: The great majority said that the timing between review release and the next conference deadlines was at least somewhat important (83%-14%). People would prefer at least 2-3 weeks, and many (39%) said at least a month was preferable. Female respondents slightly preferred having more time between review release and the next deadline, and those who preferred not to state their gender particularly preferred longer reviewing cycles.

 

  • Public Review: Opinions about public review were mixed and tended to be strong, with 42% opposed and 32% in favor.  Of these, 37% held strong opinions. Those who had experience participating in conferences with public review tended to be in favor 50%-30%, while those who did not have such experience were opposed 27%-47%.  In general, support for public review tended to be inversely correlated with reviewing experience. Female respondents were less likely to support public review than male respondents, 19%-55% vs. 36%-39% respectively. However, female respondents were only half as likely as male respondents to have participated in a conference with public review, and the effects of gender and experience with public review may be conflated in these results.

 

Summary of Open Comments:  This is a summary of the main comments raised in the open comments section, as well as selected ideas put forward;  the interested reader can review the full, organized list of comments:

 

  • Author Response: Positive: ACs/PCs find it useful; reviewers find it useful; authors find it useful for clarifications, to repair poor reviews, and for fairness.   Neutral: perhaps should be used only in some cases or only sent to ACs; need better indication of when response considered. Negative: too much effort for too little effect; a stressful time-sink with little feedback; too much time for reviewers and authors with little ultimate payoff; reviewer discussion is more valuable; ACL author response not as thorough/long as other conferences/journals.

 

  • Author Discussion: Positive: If set up more like a journal model, then a good idea; the OpenReview style of discussion is a good model.  Neutral: a good idea, but only with a mechanism for conditional acceptance; would need to reduce the number of papers per reviewer; maybe for limited cases only.  Negative: too much work for reviewers/ACs/authors; not enough time between conferences; likely to be as ineffective as regular author response; only works for those with flexible schedules. Idea: a revise-and-resubmit model for the next *ACL conference where reviewers have access to prior reviews.

 

  • Meta-Reviews: Positive: Helpful to authors and PCs when reviews conflict; ensures ACs review all reviews; imposes discipline on the reviewing process; catches errors. Negative: more work for AC and benefit not clear. Idea: occasional message from the PC explaining decision.

 

  • Review Form Structure: Positive: Minimal structure (over no structure) is useful as an AC; minimal structure a good middle ground for experienced reviewers; ACL 2019’s form received many positive comments; can reduce bias and increase fairness and clarity.  Negative: frequent changes in format across conferences; too much structure increases reviewer workload and can reduce review coherence (NAACL 2018 in particular); minimal word counts are not a good idea.

 

  • Transparency (all three types): Positive:  Comments are similar to those about public review, summarized below. Negative: Public release of review discussion may make frank discussion difficult/unintentionally reveal reviewers’ identities.

 

  • Acceptance Rates / Conference Selectivity: Pro higher acceptance rates: Promote journals more and at the same time make conferences less selective; have different levels of selectivity within conferences; low acceptance rates lead to inefficiencies as papers are resubmitted multiple times, and quality problems since good papers are rejected and there is variance in quality.  Neutral: current acceptance rates seem to be striking a good balance; the medical system is not a good one; our rate is good; other fields’ are too low. Con higher acceptance rates: being selective at conferences is an important quality control; important for jobs; impractical logistically. Idea: focus higher acceptance rates on specific special topics.

 

  • Timing of Review Release: Positive: Having more time encourages meaningful revisions; it is important to demotivate resubmissions “as is”; reviewers need time to rest between reviewing assignments; timing of major conferences needs to be better spaced out. Negative: people should not feel they have to submit to every conference. Ideas: allow authors to see reviews before final accept/reject decision to give them more time to revise work; do not allow people to resubmit unaltered papers; prior reviews should travel with resubmitted papers.

 

  • Public Review: Positive: incentivizes better reviewing; a way to recruit reviewers; speeds up research; might improve review quality. Neutral: it is essential to retain double-blind reviewing. Negative: may hurt diversity; may discourage participation; may hurt early careers; may be a popularity contest; no filter on quality of comments; lack of quality on posted, unreviewed papers; too easy to game; too time-consuming. Ideas: support public discussion after publication; try public review once as an experiment.

 

  • General Comments: Concerns about reviewer preparation, review quality, and reviewing load;  concerns about diversity of acceptable content; comments on the bidding process; concerns about over-publishing (just noticeable difference). Idea: two week review cycle (suggested by Omer Levy).

 

Implications for Policy

 

These results can be divided broadly into three categories. The first is guidance for conference planning for the ACL Exec and Chapter Boards. There is general agreement about keeping the status quo in terms of acceptance rates, keeping conference presentations selective, and retaining gaps in time between conferences to allow for resubmission.

The second is to help inform Program Co-Chairs and conference organizing committees about the details of reviewing procedure: author response, meta-reviews, and their transparency. Program co-chairs can take this information into account when making decisions about reviewing, and the ACL Exec can use this information to help prioritize changes to reviewing software.

The third category is the potential for more radical changes to *ACL reviewing procedures. In particular, some people are interested in open review. This survey has canvassed the views of the membership about this topic. Currently, the views seem to be split, which suggests that any path to adoption most likely would need to first take place in some kind of laboratory setting, such as workshop reviewing or the like. Additionally, some respondents have suggested other changes to the reviewing process that were not explicitly asked about but which may be a useful way forward. These include: requiring authors to include prior reviews when re-submitting papers; allowing authors to see reviews before final decisions are made to allow more time for improving their work; and providing a method for public discussion of papers after acceptance and publication.

This survey does not address other pressing issues relating to reviewing that remain before *ACL, including the need for better automation of detection of reviewing matches, recruiting of reviewers, detection of conflicts of interest, and improvement of reviewing quality.

 

Definitions

The text of the original survey is in this pdf document.

Terms were defined as follows in the survey.

Author Response: In recent years, some *ACL conferences have had author responses, which provide a chance for authors to respond to reviewers to answer questions and provide clarifications before final decisions are made.

Author Discussion: A more comprehensive version of author response, some conferences have a discussion period where authors can interact with reviewers over an extended period of time. After initial reviews are released, authors may respond to the reviews point-by-point, and then the reviewers or ACs can ask additional follow-up questions or clarifications until the author discussion period is over. All this can be done in an anonymous fashion, preserving double-blind review.

Meta-Review: A meta-review is a review performed by ACs after the review process completes that summarizes the views of the reviewers, and also explains the reasoning of the ACs regarding why they reached their final decision. These meta-reviews potentially make the reasoning behind decisions clearer.

Structured Review Form:  Review forms require various levels of structure ranging from a simple score and free-form text box with which to enter reviews, to multiple text boxes requiring information about different aspects of the review.

Transparency: The decision about by whom information should be seen: program chairs and reviewers only, the authors of the papers, the public at large.

Acceptance Rates: Currently, acceptance rates for the major *ACL conferences are around 20-25%, and papers must be accepted via a competitive review process in order to be presented.  (Demonstrations, posters and some other methods of presenting materials tend to have a higher acceptance rate, but no questions were asked about these.)

Conference Selectivity: In other fields (e.g. medicine), most submitted papers or abstracts are given presentations at conferences, and other measures are used to indicate relative quality of papers (e.g. journal publications).  This question asked if conference publications should remain selective, not be selective with journals taking the role, or not be selective but with public reviews released. 

Timing of Release: When several conferences are held in sequence back-to-back, it is sometimes the case that reviews are released only shortly before the next major conference submission deadline. Questions about timing refer to this limited time span for revisions before the next opportunity to submit.

Public Review: Some conferences have introduced a review mechanism called public review in which, in addition to a program committee, the public is allowed to view and publicly post comments about papers during the review period.  Although in some conferences the authors’ names are publicly exposed and the reviews remain even for rejected papers, it is possible to change the format and anonymize the submissions and remove rejected papers after the review period.

This survey and report were prepared primarily by Graham Neubig and Marti Hearst in June-July 2019.

What’s new, different and challenging in ACL 2019?

Given the rise of AI, Natural Language Processing has become increasingly popular and almost all recent conferences have reported a record breaking number of submissions. Yet, never in the history of ACL have we seen such a dramatic growth: within just a single year, we have gone from 1544 submissions to 2906! This is illustrated in the following graph that shows the growth of ACL over the past 20 years in terms of the number of submissions, reviewers and (Senior) Area Chairs.

Review of such a large number of submissions requires a large, well-organised Program Committee. Extending the ACL 2018 practice, we created a structure similar to the conferences that have a Senior Program Committee alongside the Program Committee. For the Senior PC, we recruited a relatively large number of Senior Area Chairs (46, 2-4 to head each area) and Area Chairs (184, 3-15 per area). We also differentiated between their roles so that SACs assign papers to ACs and reviewers and make recommendations for their area, while ACs each manage a smaller set of papers within the area, lead discussions with reviewers, write meta-reviews and make initial recommendations. This structure also helps to compensate for the problem that our rapidly growing field is suffering from: the lack of experienced reviewers. As ACs focus on a smaller number of papers, they can pay more attention to the review process. As for reviewers, we simply have many of them this year: 2281 (ACL 2018 had 1610).

With such a huge number of submissions, every step of conference organisation (from the initial checking of submissions to decision making) takes longer than before. Knowing the timeline would be extremely tight, we looked into ways of improving efficiency. We wanted to improve efficiency in ways that would optimise the experience for authors and PC members. In particular, we reduced the number of deadlines requiring a short turn-around of 3 days (or less). Such deadlines at best are stressful for all, but often work poorly, given the diversity of work and life situations in the community (i.e. the great variation in times / days when people are actually available for conference-related work).

We implemented the following changes:

  • We dropped the paper bidding phase. This phase can take several days, and given the large number of submissions, reviewers find it increasingly time consuming. However, time considerations aside, we were also worried about the impact of reviewers choosing their favourite papers for review, as opposed to choosing to review papers that they are qualified to review (for an interesting blog post on the topic, see https://naacl2018.wordpress.com/2018/01/28/a-review-of-reviewer-assignment-methods/). Our plan was to rely on the Toronto Paper Matching System (TPMS) in allocating papers to reviewers. Unfortunately, this system didn’t prove as useful as we had hoped (it requires more extensive reviewer profiles for optimal performance than what we had available) and the work had to rely largely on manual effort. Our fantastic SACs did an outstanding job here, but this is clearly a task that needs better automated support.
  • Like NAACL 2019, we didn’t have an author response phase this year. Originally introduced as an improvement to the review process, author response has proven time-consuming (taking not only authors’ but also reviewers’ and chairs’ time) and not hugely impactful on a larger scale. For example, the following paper (due to appear in NAACL 2019) summarises relevant data from ACL 2018:

Does My Rebuttal Matter? Insights from a Major NLP Conference
Yang Gao, Steffen Eger, Ilia Kuznetsov, Iryna Gurevych and Yusuke Miyao

So, instead of author response, we decided to invest in promoting discussion within the PC and in ensuring that discussions, papers and reviews have the full attention of ACs.

  • Finally, in contrast with the elaborate review forms of some recent conferences, we adopted a much simpler, streamlined review form, adapted from EMNLP 2018 (many thanks to Julia Hockenmaier, David Chiang and Junichi Tsujii!). While encouraging thorough review, this form is less laborious for reviewers and more focused on highlighting the key points for decision making.

However, even with our time saving measures the conference schedule is still too tight, not only for us PC chairs but also for (S)ACs and reviewers. Interestingly, although ACL has grown significantly over the past 20 years, the schedule remains almost the same as it was back in 1999. In particular, the time between the submission deadline and the notification of acceptance is exactly the same (2 months) as it was in 1999, although the number of submissions has increased tenfold and the size and the complexity of the PC even more. It may be time to adopt the practice of related conferences (e.g., IJCAI, NeurIPS, SIGIR) and extend the schedule to allow for 3-4 months for this process. This could be critical for maintaining the quality of reviewing as the conference grows further.

The conference schedule is also impacted by the schedules for other conferences (e.g. this year NAACL and EMNLP-IJCNLP) and the ACL Guidelines and desire for preprints. We made a concerted effort with other conferences to avoid overlap in the review period (which otherwise shortened the available time for each conference). Overlapping review periods will either result in unhappy authors (when they cannot submit to all conferences) or chaos for PC chairs who struggle to manage multiple submissions and large numbers of withdrawn papers. Also reviewers may be less likely to review for multiple conferences at the same time. Even with no overlap this year, there is still a desire from authors for a longer period in between conferences, so that they can revise and resubmit papers based on the feedback from previous conferences, and rejection from one conference does not leave enough time for people to submit before the anonymity period of the next conference begins. In general, not just the dates of the conferences but also the number and scheduling of deadlines should be given community-wide attention going forward.

Statistics on submissions

ACL 2019 received as many as 2906 submissions by the submission deadline. This constitutes more than a 75% increase over ACL 2018 and is an all-time record for ACL-related conferences! The huge logistics involved in handling these submissions explain our long silence. However, we can now finally give you some basic statistics:

• Out of the 2906 submissions, 120 submissions have been withdrawn by authors and 92 have been desk-rejected due to issues such as dual submissions, plagiarism or submissions not conforming to the submission guidelines. (These numbers are likely to still change over the coming weeks as papers undergo review).

• The resulting 2694 valid submissions, including 1609 long and 1085 short papers, have been sent to review.

• Each paper has been assigned to one of 22 areas for review. Each area is headed by 2-4 Senior Area Chairs (SACs) who are in charge of the overall review process within their area. They are assisted by 3-15 Area Chairs (ACs) who look after a subset of the papers (15 on average). The different areas have 59-319 reviewers, depending on the number of submissions. In total, our Programme Committee includes 2256 people: 46 SACs, 184 ACs and 2026 reviewers (1903 are currently involved in reviewing).

• The following table shows, for each area, the number of submissions (long, short and total) that are currently undergoing review.
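As a quick consistency check on the counts quoted in the list above, the following minimal sketch redoes the arithmetic. The variable names are ours and purely illustrative; only the numbers come from this post.

```python
# Counts quoted in this post (variable names are illustrative, not official).
total_received = 2906
withdrawn = 120
desk_rejected = 92
long_papers = 1609
short_papers = 1085
sacs, acs, reviewers = 46, 184, 2026

# Valid submissions are those neither withdrawn nor desk-rejected.
valid = total_received - withdrawn - desk_rejected

# The long/short split should add up to the number of valid submissions.
assert valid == long_papers + short_papers

# Programme Committee size is the sum of SACs, ACs and reviewers.
pc_size = sacs + acs + reviewers

# Share of long papers among valid submissions.
long_share = long_papers / valid

print(valid, pc_size, f"{long_share:.1%}")
```

The long-paper share computed this way is just under 60%, which matches the rounded figure used in this post.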

Our 3 largest areas in terms of submissions are the same as in ACL 2018:

  • Information Extraction and Text Mining (9.2% of all valid submissions vs. 11.5% in ACL 2018 – note that the percentages are not fully comparable because this year’s conference features an additional area, Applications)
  • Machine Learning (8.2% vs. 7.4% in ACL 2018)
  • Machine Translation (7.7% vs. 8.3% in ACL 2018)

Dialogue and Interactive Systems is also among the top 5 areas in both conferences. However, Document Analysis, which was the 4th largest area last year, ranks only 16th this year, while Generation (which ranked 14th last year with 59 submissions) now ranks 5th with 156 submissions (the increase in submissions is much larger here than our overall growth rate!). Another surprise is Linguistic Theories, Cognitive Modeling and Psycholinguistics, which clearly grew in popularity: 24 submissions last year, 60 this year.
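To make the comparison with the overall growth rate concrete, here is a small sketch that computes relative growth from the submission counts quoted in this blog (1544 total submissions for ACL 2018 and 2906 for ACL 2019). The helper `growth` is a hypothetical name introduced only for illustration.

```python
def growth(previous: int, current: int) -> float:
    """Relative year-over-year growth, e.g. 0.5 means a 50% increase."""
    return (current - previous) / previous

# Submission counts as quoted in this blog.
overall = growth(1544, 2906)    # all submissions, ACL 2018 -> ACL 2019
generation = growth(59, 156)    # Generation area
linguistic = growth(24, 60)     # Linguistic Theories, Cognitive Modeling and Psycholinguistics

for name, rate in [("overall", overall),
                   ("Generation", generation),
                   ("Linguistic Theories", linguistic)]:
    print(f"{name}: {rate:.0%}")
```

Under these numbers, both areas grew well above the overall rate, which is roughly 88%.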

Submissions remain relatively evenly distributed across the different areas (see the pie chart below), in contrast with e.g. ACL 2017, where IE clearly dominated (23.4% of submissions).

Looking at the long and short papers, 60% of our submissions are long, while in ACL 2018 66% were. So short papers are more popular this year. Looking at the individual areas,

  • short papers are clearly more popular than long ones in only one of the areas: Applications.
  • the two types of paper are almost equally popular in Machine Translation and Tagging, Chunking, Syntax, Parsing,
  • areas that have the clearest preference for long papers (over 65% of submissions are long) include Machine Learning; Vision, Robotics, Multimodal Grounding, and Speech; and Dialogue and Interactive Systems.

Finally, regarding the geographical distribution of papers, we received papers from 61 countries. Considering the country of the corresponding author only (which is clearly a simplification), we looked at which countries produced most submissions. The first chart below shows the results for the top 20 countries, and the second zooms into the top 20 countries with fewer than 140 submissions.

As expected, we have the US and China in the lead. The UK and Germany rank third and fourth with 129 and 126 submissions, respectively, and are closely followed by Japan (120 submissions).

The chart differentiates between the long and short paper submissions. It shows that the top 5 countries, apart from Japan, have a clear preference for long papers. There is some variation among them, e.g. China produces relatively more long paper submissions (69% of all submissions) than the US (60% of all submissions). The countries among the top 20 where short paper submissions are more popular than long ones include Japan, India, Taiwan, Denmark and Spain. The countries that have the strongest preference for long papers are Singapore (84% of all submissions are long) and Israel (73%).

Let us know if you would like us to compute additional statistics.