CS3730: Computational Modeling of Discourse and Grounded Communication

Instructor: Malihe Alikhani

Time and Place: Wednesday 6-8:30 pm, Slack & Zoom

background image: “The art of conversation” by René Magritte

Overview

One of the main aspirations of AI is to design intelligent systems that can communicate with people naturally and effectively. Designing such systems poses challenges that encompass multiple academic disciplines. In this course, we will discuss recent advances that bring together tools and theories from natural language processing, computer vision, robotics, linguistics, and spatial and social cognition to address these challenges.

In the first half of the class, we will discuss computational models of discourse, machine learning approaches for analyzing and modeling multimodal web data, and crowdsourcing for creating robust and interpretable systems. In the second half, we will study how we can extend these models and practices to model language generation, reasoning, and cooperation in interactive systems. The course will cover traditional approaches to designing communicative systems, as exemplified by the work of Grosz and Sidner, as well as more recent work in dialogue, discourse, and evaluation, including statistical approaches to problems in the field. We will discuss open problems in computational discourse, conversational AI, and human-robot collaboration.

The class will mostly revolve around reading research papers in the area. I will give an overview of the research topic, present papers, and go over demos and code in the first half of each class. In the second half, students will present papers that they have signed up for and participate in class discussions about these papers.

Week 1 (Aug 19): Research in Computational Linguistics: Methodology, Ethics, and Implication

Schlangen, David. "Targeting the Benchmark: On Methodology in Current Natural Language Processing Research." arXiv preprint arXiv:2007.04792 (2020).
Pavlick, Ellie, and Tom Kwiatkowski. "Inherent disagreements in human textual inferences." Transactions of the Association for Computational Linguistics 7 (2019): 677-694.
Vitak, Jessica, Katie Shilton, and Zahra Ashktorab. "Beyond the Belmont principles: Ethical challenges, practices, and beliefs in the online data research community." Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 2016.
Related papers:
- Kilgarriff, Adam. "Language is never, ever, ever, random." Corpus linguistics and linguistic theory 1.2 (2005): 263-276.
- Bender, Emily M., and Batya Friedman. "Data statements for natural language processing: Toward mitigating system bias and enabling better science." Transactions of the Association for Computational Linguistics 6 (2018): 587-60
- Grosz, Barbara, and Candace L. Sidner. "Attention, intentions, and the structure of discourse." Computational linguistics (1986).
- Jha, Rohan, Charles Lovering, and Ellie Pavlick. "When does data augmentation help generalization in NLP?." arXiv preprint arXiv:2004.15012 (2020).
- Curry, Amanda Cercas, and Verena Rieser. "#MeToo Alexa: How conversational systems respond to sexual harassment." Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing. 2018.
- Schmaltz, Allen. "On the utility of lay summaries and AI safety disclosures: Toward robust, open research oversight." Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing. 2018.
- Can bots manipulate public?
Background readings:
- Keshav, Srinivasan. "How to read a paper." ACM SIGCOMM Computer Communication Review 37.3 (2007): 83-84.
- Smith, Alan Jay. "The task of the referee." IEEE Computer 23.4 (1990): 65-71.
- How to write a CS paper
- Crowdsourcing for NLP
- Baltrušaitis, Tadas, Chaitanya Ahuja, and Louis-Philippe Morency. "Multimodal machine learning: A survey and taxonomy." IEEE transactions on pattern analysis and machine intelligence 41.2 (2018): 423-443
- Newman, Mark EJ. "Power laws, Pareto distributions and Zipf's law." Contemporary physics 46.5 (2005): 323-351.

Week 2 (Aug 26): Discourse Coherence

Miltsakaki, Eleni, et al. "The Penn Discourse Treebank." LREC. 2004.
Pitler, Emily, Annie Louis, and Ani Nenkova. "Automatic sense prediction for implicit discourse relations in text." (2009).
Shi, Wei, and Vera Demberg. "On the need of cross-validation for discourse relation classification." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
Further reading:
- An overview from a computational perspective:
- Sections 1 and 2 of Webber, Egg and Kordoni, Discourse structure and language technology
- Discourse Coherence, Jurafsky & Martin
- Discourse Coherence, Andy Kehler, Chapter 11
- Sanders, Ted, and Wilbert Spooren. "Discourse and text structure." Handbook of cognitive linguistics (2007): 916-941.
- Asr, Fatemeh Torabi, and Vera Demberg. "Implicitness of discourse relations." Proceedings of COLING 2012. 2012.
- Miltsakaki, Eleni, et al. "Sense annotation in the penn discourse treebank." International Conference on Intelligent Text Processing and Computational Linguistics. Springer, Berlin, Heidelberg, 2008.
Code/data:

Week 3 (Sept 2): Learning inferences in text

Keith, Katherine A., David Jensen, and Brendan O'Connor. "Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates." arXiv preprint arXiv:2005.00649 (2020).
Scheffler, Tatjana, et al. "Annotating Shallow Discourse Relations in Twitter Conversations." Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. 2019.
Related papers:
- Kim, Najoung, et al. "Implicit Discourse Relation Classification: We Need to Talk about Evaluation." Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
- Xu, Yang, et al. "Using active learning to expand training data for implicit discourse relation recognition." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
Code/data:
- Active learning for discourse relation prediction
- Discourse relations in Twitter conversations

Week 4 (Sept 9): Applications: Machine Comprehension, Summarization and Argument Mining

Week 5 (Sept 16): Local coherence and co-reference resolution

Nguyen, Dat Tien, and Shafiq Joty. "A neural local coherence model." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017.
Clark, Kevin, and Christopher D. Manning. "Deep reinforcement learning for mention-ranking coreference models." arXiv preprint arXiv:1609.08667 (2016).
Related papers:
- Mesgar, Mohsen, and Michael Strube. "Lexical coherence graph modeling using word embeddings." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
- Section 15.1 of Eisenstein’s notes
- Lee, Kenton, et al. "End-to-end neural coreference resolution." arXiv preprint arXiv:1707.07045 (2017).
- Durrett, Greg, and Dan Klein. "A joint model for entity analysis: Coreference, typing, and linking." Transactions of the association for computational linguistics 2 (2014): 477-490.
- Lee, Kenton, Luheng He, and Luke Zettlemoyer. "Higher-order coreference resolution with coarse-to-fine inference." arXiv preprint arXiv:1804.05392 (2018).
- Hobbs, Jerry R. "Coherence and coreference." Cognitive science 3.1 (1979): 67-90.
Code/data:

Week 6 (Sept 23): Learning Inferences in Text and Imagery

Alikhani, Malihe, et al. "Cross-modal Coherence Modeling for Caption Generation." Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
Hessel, Jack, Lillian Lee, and David Mimno. "Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents." arXiv preprint arXiv:1904.07826 (2019).
Related papers:
- Thomas, Christopher, and Adriana Kovashka. "Predicting the Politics of an Image Using Webly Supervised Data." Advances in Neural Information Processing Systems. 2019.
- Kruk, Julia, et al. "Integrating text and image: determining multimodal document intent in instagram posts." arXiv preprint arXiv:1904.09073 (2019)
- Zhang, Mingda, Rebecca Hwa, and Adriana Kovashka. "Equal but not the same: Understanding the implicit relationship between persuasive images and text." arXiv preprint arXiv:1807.08205 (2018).
- Coherence in film and comics
  - McCloud, Scott, and A. D. Manning. "Understanding Comics: The invisible art." IEEE Transactions on Professional Communications 41.1 (1998): 66-69.
  - Cumming, Samuel, Gabriel Greenberg, and Rory Kelly. "Conventions of viewpoint coherence in film." (2017).
Code/data:
- Cross-modal coherence modeling
- Unsupervised discovery of multimodal links

Week 7 (Sept 30): Diagram Understanding

Alikhani, Malihe, and Matthew Stone. "Arrows are the verbs of diagrams." Proceedings of the 27th International Conference on Computational Linguistics. 2018.
Larkin, Jill H., and Herbert A. Simon. "Why a diagram is (sometimes) worth ten thousand words." Cognitive science 11.1 (1987): 65-100.

Student presentation

Week 8 (Oct 7): Language Grounding to Vision

Suhr, Alane, et al. "A corpus for reasoning about natural language grounded in photographs." arXiv preprint arXiv:1811.00491 (2018).
Li, Wei, et al. "Visual question answering with attention transfer and a cross-modal gating mechanism." Pattern Recognition Letters 133 (2020): 334-340
Code/data:
- A list of code and data available for grounded language learning
ACL and CVPR tutorials on language grounding:

Week 9 (Oct 14): From VQA to Visual Dialogue

Schlangen, David. "Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings." arXiv preprint arXiv:1908.11279 (2019).
Kim, Hyounghun, Hao Tan, and Mohit Bansal. "Modality-Balanced Models for Visual Dialogue." AAAI. 2020.
Code/data:
- GuessWhat, GuessWhich, PhotoBook, Meetup!: reference to code and data is available in section 6 of the ACL tutorial on achieving common ground in multimodal dialogue by Stone and Alikhani.
- Haber, Janosch, et al. "The photobook dataset: Building common ground through visually-grounded dialogue." arXiv preprint arXiv:1906.01530 (2019).

Week 10 (Oct 21): Human-Robot/Agent Dialogue

Henderson, James, Oliver Lemon, and Kallirroi Georgila. "Hybrid reinforcement/supervised learning of dialogue policies from fixed data sets." Computational Linguistics 34.4 (2008): 487-511.
Liu, Changsong, Rui Fang, and Joyce Chai. "Towards mediating shared perceptual basis in situated dialogue." Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2012.
Related papers:
- Gervits, Felix, et al. "It’s About Time: Turn-Entry Timing For Situated Human-Robot Dialogue." Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2020.
- Bonial, Claire, et al. "Augmenting Abstract Meaning Representation for human-robot dialogue." Proceedings of the First International Workshop on Designing Meaning Representations. 2019.
- Rieser, Verena, and Oliver Lemon. "Learning and evaluation of dialogue strategies for new applications: Empirical methods for optimization from small data sets." Computational Linguistics 37.1 (2011): 153-196.

Week 11 (Oct 28): Grounding in Communication: Science and Models

Clark, Herbert H., and Susan E. Brennan. "Grounding in communication." (1991)
Related papers:
- Clark, Herbert H. "Coordinating with each other in a material world." Discourse studies 7.4-5 (2005): 507-525.
- Traum, David. "Computational approaches to dialogue." The Routledge Handbook of Language and Dialogue. Taylor & Francis (2017): 143-161
- Zhang, Yayun, et al. "Seeking Meaning: Examining a Cross-situational Solution to Learn Action Verbs Using Human Simulation Paradigm."
- Matuszek, Cynthia. "Grounded Language Learning: Where Robotics and NLP Meet." IJCAI. 2018.

Week 12 (Nov 4): Achieving Common Ground in Multimodal Dialogue

Chai, Joyce Y., et al. "Collaborative language grounding toward situated human-robot dialogue." AI Magazine 37.4 (2016): 32-45.
Hough, Julian, and David Schlangen. "It's Not What You Do, It's How You Do It: Grounding Uncertainty for a Simple Robot." 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2017.
Related papers:
- Foster, Mary Ellen, et al. "The roles of haptic-ostensive referring expressions in cooperative, task-based human-robot dialogue." 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2008.

Week 13 (Nov 11): Applications: Assistive Technology, Tutoring Systems, Social Companions

Kragic, Danica, et al. "Interactive, Collaborative Robots: Challenges and Opportunities." IJCAI 2018
Mehlmann, Gregor, Kathrin Janowski, and Elisabeth André. "Modeling grounding for interactive social companions." KI-Künstliche Intelligenz 30.1 (2016): 45-52
Related papers:
- Diziol, Dejana, et al. "Using intelligent tutor technology to implement adaptive support for student collaboration." Educational Psychology Review 22.1 (2010): 89-102.
- Forbes-Riley, Kate, and Diane Litman. "Designing and evaluating a wizarded uncertainty-adaptive spoken dialogue tutoring system." Computer Speech & Language 25.1 (2011): 105-126.
- Novikova, Jekaterina, et al. "Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction." arXiv preprint arXiv:1706.02757 (2017).

Week 14 (Nov 18): Student Presentations

Course Requirements and Grading

Research paper presentations

Each student will be required to give one solo (given by the student alone) and one paired (given by a team of two students) research paper presentation. The class has a Slack workspace that can help you get in touch with your classmates easily. Students will be asked to rank the papers they prefer to present, and then will be assigned both a solo and paired paper to present on a specified date. Each class will include two such paper presentations.

A paper presentation should be about 20 minutes and will be followed by 15 minutes of class discussion led by the presenter(s). The presentation should first summarize the content of the paper, clearly presenting:
1) What problem or task does the paper study?
2) What is the motivation for this problem/task, i.e., why is it important?
3) What is the novel algorithm/approach to this problem that the paper proposes?
4) How is this method evaluated, i.e. what is the experimental methodology, what data is utilized, and what performance metrics are used?
5) What are the basic results and conclusions of the paper?

Finally, the presentation should conclude with the presenter's critique of the paper including:
1) Are there any reasons to question the motivation and importance of the problem studied?
2) Are there any limitations/weaknesses to the proposed approach to this problem?
3) Are there any limitations/weaknesses to the evaluation methodology and/or results?
4) What are some promising future research directions following up on this work?

For the paired presentation, the two presenters can decide how to divide the work of presentation among themselves.

Daily paper questions

Before each class in which papers are presented (by midnight each Tuesday before class), every student should submit two questions on Slack, one for each of the two papers scheduled to be presented in that class. To encourage everyone to attend the class and participate, I will start the discussion of each paper by randomly selecting three students to ask their questions in the class.

Weekly paper critiques

At the end of every week (by midnight Friday), each student should submit a "weekly paper critique" on Canvas. Pick one of the papers discussed in class during the week and write a half-page critique of it. Use the critique questions listed above for the oral presentations to guide your discussion, but there is no need to address all of these questions in a given critique. Submit a nicely formatted PDF in 11pt font. If you presented one of the papers during the week, choose a paper you did not present for your weekly critique.

Programming assignment

Three programming assignments are posted on Canvas. You are required to turn in one assignment (of your choice) by November 1.

Final research project

Final projects should ideally be done by teams of two students. Projects done by one or three students are possible on rare occasions with prior approval of the instructor. Each student is responsible for finding a partner for their final project. Feel free to post a message on Slack about your interests in order to find a partner with similar interests.

Submit a one-page project proposal by September 16 that briefly covers the first 4 questions for research paper presentations (see above). I will provide feedback on these proposals, but they will not be graded. Students are encouraged to discuss project proposals with me in office hours before submitting them. Groups are required to give a 6-minute presentation about their progress on the final project in the 7th week of the class. During the last two weeks of class, each team will be required to give a 20-minute presentation on the current state of their project, and lead a 15-minute discussion of their project. This will allow for two project presentations per class. Use the same format as the paired research paper presentations (see above), but you may have only preliminary actual results at that time to present. Submit the code to the GitHub repo for the class at midnight on December 1. Final project reports are due on Canvas at midnight on December 4. They should be formatted to meet the requirements of paper submission to the ACL conference as described here.

Final grade

The final grade will be computed as follows:

28% Solo Research Paper Presentation

10% Paired Research Paper Presentation

10% Daily Paper Questions

10% Weekly Paper Critiques

14% Programming assignment

28% Final Project Report
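
As a rough illustration of how these weights combine, the arithmetic can be sketched as below. This is only a sketch: the component names and the 0-100 score scale are assumptions made for the example, and the syllabus does not specify how individual components are scored.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
# The dictionary keys are hypothetical names chosen for this example.
WEIGHTS = {
    "solo_presentation": 28,
    "paired_presentation": 10,
    "daily_questions": 10,
    "weekly_critiques": 10,
    "programming_assignment": 14,
    "final_project_report": 28,
}


def final_grade(scores):
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps each component name to a score between 0 and 100
    (an assumed scale, not one stated in the syllabus).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# The listed weights account for the whole grade.
assert sum(WEIGHTS.values()) == 100

# Example: 90 on every component except 80 on the final project report.
example = {name: 90 for name in WEIGHTS}
example["final_project_report"] = 80
print(final_grade(example))
```

Because the solo presentation and the final project report each carry 28%, a ten-point change in either one moves the final grade by 2.8 points.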

Academic integrity
All assignment submissions must be the sole work of each individual student. Students may not read or copy another student's solutions or share their own solutions with other students. Students may not review solutions from students who have taken the course in previous years. Submissions that are substantively similar will be considered cheating by all students involved, and as such, students must be mindful not to post their code publicly. The use of books and online resources is allowed, but must be credited in submissions, and material may not be copied verbatim. Any use of electronics or other resources during an examination will be considered cheating.

If you have any doubts about whether a particular action may be construed as cheating, ask the instructor for clarification before you do it. The instructor will make the final determination of what is considered cheating.

Cheating in this course will result in a grade of F for the course and may be subject to further disciplinary action.

Using an open-source codebase is acceptable, but you must explicitly cite the source and follow the owner's citation guidelines if any exist. For any writing involved in the project, plagiarism is strictly prohibited. If you are unsure whether your work would be considered plagiarism, ask the instructor before submitting or presenting the work.

Students with disabilities

If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources and Services, 216 William Pitt Union, 412-648-7890 or 412-383-7355 (TTY) as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course. More info at www.drs.pitt.edu.

Audio/video recordings

To ensure the free and open discussion of ideas, students may not record classroom lectures, discussions, and/or activities without the advance written permission of the instructor, and any such recording properly approved in advance can be used solely for the student's own private use. Since this is a seminar class and meetings are all online, students are required to use their cameras during the class.

Copyrighted materials

All material provided through this web site is subject to copyright. This applies to class/recitation notes, slides, assignments, solutions, project descriptions, etc. You are allowed (and expected!) to use all the provided material for personal use. However, you are strictly prohibited from sharing the material with others in general and from posting the material on the Web or other file sharing venues in particular.

Diversity and Inclusion

The University of Pittsburgh does not tolerate any form of discrimination, harassment, or retaliation based on disability, race, color, religion, national origin, ancestry, genetic information, marital status, familial status, sex, age, sexual orientation, veteran status or gender identity or other factors as stated in the University’s Title IX policy. The University is committed to taking prompt action to end a hostile environment that interferes with the University’s mission. For more information about policies, procedures, and practices, see: http://diversity.pitt.edu/affirmative-action/policies-procedures-and-practices. I ask that everyone in the class strive to help ensure that other members of this class can learn in a supportive and respectful environment. If there are instances of the aforementioned issues, please contact the Title IX Coordinator, by calling 412-648-7860, or e-mailing titleixcoordinator@pitt.edu. Reports can also be filed online: https://www.diversity.pitt.edu/make-report/report-form. You may also choose to report this to a faculty/staff member; they are required to communicate this to the University’s Office of Diversity and Inclusion. If you wish to maintain complete confidentiality, you may also contact the University Counseling Center (412-648-7930).

Gender Inclusive Language Statement (from Pitt GSWS)

Language is gender-inclusive and non-sexist when we use words that affirm and respect how people describe, express and experience their gender. Just as sexist language excludes women’s experiences, non-gender-inclusive language excludes the experiences of individuals whose identities may not fit the gender binary, and/or who may not identify with the sex they were assigned at birth. Identities including trans, intersex, and genderqueer reflect personal descriptions, expressions, and experiences. Gender-inclusive/non-sexist language acknowledges people of any gender (for example, first-year student versus freshman, chair versus chairman, humankind versus mankind, etc.). It also affirms non-binary gender identifications and recognizes the difference between biological sex and gender expression. Students may share their preferred pronouns and names, and these gender identities and gender expressions should be honored.

The pictures are taken from the related papers in each section.