Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)

EMNLP 2014 / SIGMT / SIGLEX Workshop
25 Oct 2014, Doha, Qatar

*** Slides are now available for the papers below ***

*** Special theme: Compositional Distributional Semantics and Machine Translation ***

The Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8) seeks to bring together a large number of researchers working on diverse aspects of structure, semantics and representation in relation to statistical machine translation. Since its first edition in 2007, its program each year has comprised high-quality papers on current work spanning topics including: new grammatical models of translation; new learning methods for syntax- and semantics-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.

We invite two types of submissions this year:

Extended abstracts for poster or hands-on presentations on the special theme
Full papers spanning all areas of interest for SSST

Special Theme Extended Abstracts

This year, the special theme of semantics from the past three editions of SSST takes a new step: a "working workshop" that brings together researchers interested in compositional distributional semantics, distributed representations, and continuous vector space models in MT, with tutorials bridging the two fields in both directions, as well as discussions and hands-on work on relevant tasks with real data. Such models have proven beneficial for a number of NLP tasks, for example phrasal similarity, lexical entailment, modeling semantic deviance, detecting order restrictions in recursive structures, and improving NP bracketing in parsing. However, they have so far received comparatively little attention in MT.

Extended abstracts of at most two (2) pages should describe poster or hands-on presentations that will stimulate discussion on the special theme of compositional distributional semantics and machine translation, including position papers, recent work, pilot studies, and negative results. We encourage the presentation of relevant work that has been published or submitted elsewhere, as well as new work in progress.

Full Papers

The need for structural mappings between languages is widely recognized in the fields of statistical machine translation and spoken language translation, and there is now wide consensus that these mappings are appropriately represented using a family of formalisms that includes synchronous/transduction grammars and similar notational equivalents. To date, flat-structured models, such as the word-based IBM models of the early 1990s or the more recent phrase-based models, remain widely used. However, tree-structured mappings arguably offer much greater potential for learning valid generalizations about relationships between languages.

Within this area of research there is a rich diversity of approaches. There is active research ranging from formal properties of S/TGs to large-scale end-to-end systems. There are approaches that make heavy use of linguistic theory, and approaches that use little or none. There is theoretical work characterizing the expressiveness and complexity of particular formalisms, as well as empirical work assessing their modeling accuracy and descriptive adequacy across various language pairs. There is work being done to invent better translation models, and work to design better algorithms. Recent years have seen significant progress on all these fronts. In particular, systems based on these formalisms are now top contenders in MT evaluations.

At the same time, SMT has seen a movement toward semantics over the past few years, which has been reflected at recent SSST workshops, including the last three editions, which had semantics for SMT as a special theme. The issues of deep syntax and shallow semantics are closely linked, and SSST-8 continues to encourage submissions on semantics for MT in a number of directions, including semantic role labeling, sense disambiguation, and compositional distributional semantics for translation and evaluation.

We invite full papers on:

syntax-based / semantics-based / tree-structured SMT
machine learning techniques for inducing structured translation models
algorithms for training, decoding, and scoring with semantic representation structure
empirical studies on adequacy and efficiency of formalisms
creation and usefulness of syntactic/semantic resources for MT
formal properties of synchronous/transduction grammars
learning semantic information from monolingual, parallel or comparable corpora
unsupervised and semi-supervised word sense induction and disambiguation methods for MT
lexical substitution, word sense induction and disambiguation, semantic role labeling, textual entailment, paraphrase and other semantic tasks for MT
semantic features for MT models (word alignment, translation lexicons, language models, etc.)
evaluation of syntactic/semantic components within MT (task-based evaluation)
scalability of structured translation methods to small or large data sets
applications of S/TGs to related areas including:
- speech translation
- formal semantics and semantic parsing
- paraphrases and textual entailment
- information retrieval and extraction
syntactically- and semantically-motivated evaluation of MT
compositional distributional semantics in MT
distributed representations and continuous vector space models in MT

Program

Session 1: Morning Orals

09:00–09:10

Opening remarks
Dekai Wu, Marine Carpuat, Xavier Carreras, Eva Maria Vecchi

09:10–09:30

Vector Space Models for Phrase-based Machine Translation [slides]
Tamer Alkhouli, Andreas Guta, Hermann Ney
RWTH Aachen University

09:30–09:50

Bilingual Markov Reordering Labels for Hierarchical SMT [slides]
Gideon Maillette de Buy Wenniger and Khalil Sima'an
Institute for Logic, Language and Computation (ILLC), University of Amsterdam

09:50–10:10

Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars [slides]
Dekai Wu, Chi-kiu Lo, Meriem Beloucif, Markus Saers
Hong Kong University of Science and Technology (HKUST)

10:10–10:30

Rule-based Syntactic Preprocessing for Syntax-based Machine Translation [slides]
Yuto Hatakoshi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Nara Institute of Science and Technology

10:30–11:00

Coffee break

Invited Talk

11:00–12:00

Composed, Distributed Reflections on Semantics and Statistical Machine Translation [slides]
Timothy Baldwin

Session 2: Morning Spotlights

12:00–12:05

Applying HMEANT to English-Russian Translations
Alexander Chuchunkov, Alexander Tarelkin, Irina Galinskaya
Yandex LLC

12:05–12:10

Reducing the Impact of Data Sparsity in Statistical Machine Translation [slides]
Karan Singla¹, Kunal Sachdeva¹, Srinivas Bangalore², Dipti Misra Sharma¹, Diksha Yadav³
¹LTRC, IIIT-Hyderabad, ²AT&T Labs-Research, ³IIIT-Hyderabad

12:10–12:15

Expanding the Language model in a low-resource hybrid MT system [slides]
George Tambouratzis, Sokratis Sofianopoulos, Marina Vassiliou
ILSP/Athena R.C.

12:15–12:20

Syntax and Semantics in Quality Estimation of Machine Translation [slides]
Rasoul Kaljahi¹, Jennifer Foster¹, Johann Roturier²
¹Dublin City University, ²Symantec

12:20–12:25

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation [slides]
Jean Pouget-Abadie¹, Dzmitry Bahdanau², Bart van Merrienboer³, Kyunghyun Cho³, Yoshua Bengio³
¹Ecole Polytechnique, ²Jacobs University Bremen, ³University of Montreal

12:25–12:30

Ternary Segmentation for Improving Search in Top-down Induction of Segmental ITGs [slides]
Markus Saers and Dekai Wu
Hong Kong University of Science and Technology (HKUST)

12:30–14:00

Lunch break

Session 3: Afternoon Orals and Spotlights

14:00–14:20

A CYK+ Variant for SCFG Decoding Without a Dot Chart [slides]
Rico Sennrich
University of Edinburgh

14:20–14:40

On the Properties of Neural Machine Translation: Encoder–Decoder Approaches [slides]
Kyunghyun Cho¹, Bart van Merrienboer¹, Dzmitry Bahdanau², Yoshua Bengio¹
¹University of Montreal, ²Jacobs University Bremen

14:40–15:00

Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars [slides]
Karteek Addanki and Dekai Wu
HKUST

15:00–15:20

Transformation and Decomposition for Efficiently Implementing and Improving Dependency-to-String Model In Moses [slides]
Liangyou Li¹, Jun Xie², Andy Way³, Qun Liu⁴
¹CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University, ²ICT, CAS, ³CNGL, Dublin City University, ⁴Dublin City University

15:20–15:25

Word's Vector Representations meet Machine Translation [slides]
Eva Martinez Garcia¹, Jörg Tiedemann², Cristina España-Bonet¹, Lluís Màrquez³
¹TALP Research Center, ²Uppsala University, ³Qatar Computing Research Institute

15:25–15:30

Context Sense Clustering for Translation [slides]
João Casteleiro, Gabriel Lopes, Joaquim Silva
Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática

15:30–16:00

Coffee break

Session 4: Afternoon Spotlights

16:00–16:05

Evaluating Word Order Recursively over Permutation-Forests [slides]
Miloš Stanojević and Khalil Sima'an
Institute for Logic, Language and Computation (ILLC), University of Amsterdam

16:05–16:10

Preference Grammars and Soft Syntactic Constraints for GHKM Syntax-based Statistical Machine Translation [slides]
Matthias Huck, Hieu Hoang, Philipp Koehn
University of Edinburgh

16:10–16:15

How Synchronous are Adjuncts in Translation Data? [slides]
Sophie Arnoult and Khalil Sima'an
Institute for Logic, Language and Computation (ILLC), University of Amsterdam

Poster Session

16:15–17:30

Poster session of all workshop papers
All workshop presenters

Organizers

Dekai WU, Hong Kong University of Science and Technology (HKUST)
Marine CARPUAT, National Research Council (NRC) Canada
Xavier CARRERAS, Universitat Politècnica de Catalunya (UPC)
Eva Maria VECCHI, Cambridge University

Important Dates

Submission deadline for papers and extended abstracts: 1 Aug 2014
Notification to authors: 26 Aug 2014
Camera-ready deadline: 15 Sep 2014

Submission

Papers and extended abstracts will be accepted on or before 1 Aug 2014, in PDF or PostScript format, via the START system at https://www.softconf.com/emnlp2014/SSST-8/. Full papers should follow the EMNLP 2014 length and formatting requirements for long papers: nine (9) pages of content, plus any number of additional pages of references. The requirements and templates are available at http://emnlp2014.org/templates.html.

Contact

Please send inquiries to ssst@cs.ust.hk.

Last updated: 2014.10.25