
Welcome
New social technologies and widespread access to the internet have enabled new forms of content creation, connectivity and information sharing. Because this content is vast, largely unstructured and sparsely labeled, organizing and reconciling information from different sources and modalities under limited supervision is one of the open challenges in machine learning. This tutorial focuses on using multimodal representations for graph-regularized or semi-supervised learning, and uses as case studies two real-world, multi-domain datasets that require understanding fine-grained visual and linguistic semantics.
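As a rough illustration of graph-regularized semi-supervised learning, the sketch below adds two terms. The first is a supervised loss computed only on the few labeled items. The second is a neighbour penalty that pulls the embeddings of connected items together, so unlabeled items still shape training through the graph. This follows the general idea behind Neural Graph Machines and Neural Structured Learning. It is a minimal pure-Python sketch, not the tutorial's code or any library's API. The function names, the dictionary-based inputs, the squared-L2 neighbour distance and the default `alpha` are all assumptions made for illustration.

```python
import math


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(probs[label])


def graph_regularized_loss(probs, embeddings, labels, edges, alpha=0.1):
    """Supervised loss on labeled nodes plus a weighted neighbour penalty.

    probs:      dict node -> list of predicted class probabilities
    embeddings: dict node -> embedding vector (e.g. a fused image+text vector)
    labels:     dict node -> true class index, for the labeled subset only
    edges:      list of (i, j, weight) tuples linking related nodes
    alpha:      weight of the neighbour term (hypothetical default)
    """
    # Supervised term: only nodes that actually have a label contribute.
    supervised = sum(cross_entropy(probs[n], y) for n, y in labels.items())
    # Neighbour term: every edge contributes, labeled or not.
    neighbour = sum(w * squared_l2(embeddings[i], embeddings[j]) for i, j, w in edges)
    return supervised + alpha * neighbour


# Toy graph (hypothetical data): only "a" is labeled.
probs = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.2, 0.8]}
embeddings = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
labels = {"a": 0}
edges = [("a", "b", 1.0), ("b", "c", 0.5)]

print(graph_regularized_loss(probs, embeddings, labels, edges, alpha=0.1))
```

Setting `alpha=0` recovers plain supervised training on the labeled subset. Increasing it trades label fit for smoothness over the graph, which is what lets structure among unlabeled items act as extra supervision.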
Venue
The Online Multimodal Knowledge Discovery tutorial will be held virtually at ICDM 2020, the 20th IEEE International Conference on Data Mining, on November 18, 2020, from 14:30 to 16:30 CET.
Outline
| Section | Subsection | Duration (min) |
|---|---|---|
| Introduction | The landscape of online content | 10 |
| | A case for multimodal knowledge reconciliation | 5 |
| Natural Language Processing | From word embeddings to contextualized representations | 10 |
| | Fine-tuning pretrained models on downstream tasks | 5 |
| | The textual entailment problem | 5 |
| Structured Data | Semi-structured and tabular text | 5 |
| | Knowledge graphs | 5 |
| Neural Graph Learning | Leveraging structured signals with Neural Structured Learning | 10 |
| Break | - | 5 |
| Multimodal Learning | Learning joint representations for visual and language tasks | 20 |
| | Self-Supervised Multimodal Versatile Networks | 20 |
| | Multimodal representations for knowledge reconciliation | 10 |
| Final considerations | Closing notes | 5 |
| | Q&A | 5 |
| Total | - | 120 |
Slides
Reading list
Natural Language Processing
- Attention Is All You Need, Vaswani et al., 2017.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., 2018.
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, Lan et al., 2019.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al., 2019.
- Deep Learning for NLP with TensorFlow, Ilharco et al., 2019.
- High Performance Natural Language Processing, Ilharco et al., 2020.
Textual Entailment
- The Seventh PASCAL Recognizing Textual Entailment Challenge, Bentivogli et al., 2011.
- Did It Happen? The Pragmatic Complexity of Veridicality Assessment, de Marneffe et al., 2012.
- The Multi-Genre Natural Language Inference (MultiNLI) corpus, Williams et al., 2017.
- XNLI: Evaluating Cross-lingual Sentence Representations, Conneau et al., 2018.
Structured Data
- Industry-scale Knowledge Graphs, Noy et al., 2019.
- Understanding categorical semantic compatibility in KG, Muxagata et al., 2019.
Neural Graph Learning
- Neural Graph Machines: Learning Neural Networks Using Graphs, Bui et al., 2017.
- Neural Structured Learning: Training Neural Networks with Structured Signals, Heydon et al., 2020.
Multimodal Learning
- Multimodal Deep Learning, Ngiam et al., 2011.
- DeViSE: A Deep Visual-Semantic Embedding Model, Frome et al., 2013.
- Learning a Text-Video Embedding from Incomplete and Heterogeneous Data, Miech et al., 2018.
- VideoBERT: A Joint Model for Video and Language Representation Learning, Sun et al., 2019.
- HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips, Miech et al., 2019.
- Learning Video Representations using Contrastive Bidirectional Transformer, Sun et al., 2019.
- Use What You Have: Video Retrieval Using Representations From Collaborative Experts, Liu et al., 2019.
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers, Tan et al., 2019.
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations, Su et al., 2020.
- 12-in-1: Multi-Task Vision and Language Representation Learning, Lu et al., 2020.
- Speech2Action: Cross-modal Supervision for Action Recognition, Nagrani et al., 2020.
- Self-Supervised MultiModal Versatile Networks, Alayrac et al., 2020.
- Multi-modal Transformer for Video Retrieval, Gabeur et al., 2020.
Datasets
- PHEME dataset for Rumour Detection and Veracity Classification, Kochkina et al., 2018.
- MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims, Augenstein et al., 2019.
Tutors

Afsaneh Shirazi
Senior Staff Software Engineer,
Google Research