
Multilingual corpora and multilingual corpus analysis.

This paper presents the metadata model of the EXMARaLDA system and its implementations. It first reviews existing metadata schemes for transcriptions of spoken language as well as written texts, highlighting their advantages and disadvantages. The paper will justify the decisions agai...


Bibliographic Details
Classification: Electronic Book
Main author: Velupillai, Viveka
Format: Electronic eBook
Language: English
Published: [Place of publication not identified] : John Benjamins, 2012.
Series: Hamburg studies on multilingualism ; v. 14
Subjects:
Online access: Full text
Table of Contents:
  • Multilingual Corpora and Multilingual Corpus Analysis
  • Editorial page
  • Title page
  • LCC page
  • Dedication page
  • Table of contents
  • Introduction
  • Section 1. Learner and attrition corpora
  • The LeaP corpus: A multilingual corpus of spoken learner German and learner English
  • 1. Introduction
  • 2. LeaP corpus: Primary data
  • 3. Corpus annotation
  • 4. Corpus data format
  • 5. Corpus search
  • 6. Exploring fluency in second language learner speech with the LeaP corpus
  • 7. Conclusion
  • References
  • Technological and methodological challenges in creating, annotating and sharing a learner corpus of
  • 1. Introduction
  • 2. The Hamburg Map Task Corpus
  • 3. Manual interpretative annotation
  • 4. Conclusion
  • References
  • Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in cont
  • 1. Introduction
  • 2. The Corpus of Reading Comprehension Exercises in German (CREG)
  • 3. Corpus collection and the WELCOME tool
  • 4. Inter-annotator agreement analysis for meaning assessment
  • 5. Meaning assessment results
  • 6. Avenues for future research
  • 7. Summary
  • Acknowledgments
  • References
  • The ALeSKo learner corpus: Design, annotation, quantitative analyses
  • 1. Introduction
  • 2. Design of the corpus
  • 3. Annotation layers
  • 4. Quantitative descriptive analyses
  • 5. Applications for the corpus
  • Acknowledgements
  • References
  • Corpora of spoken Spanish by simultaneous and successive German-Spanish bilingual and Spanish monoli
  • 1. Introduction
  • 2. Description of the corpora
  • 3. Further research
  • Acknowledgements
  • References
  • Monolingual and bilingual phonoprosodic corpora of child German and child Spanish
  • 1. Introduction
  • 2. The PAIDUS corpus
  • 3. The corpus PhonBLA
  • 4. Concluding remarks
  • References
  • Pragmatic corpus analysis, exemplified by Turkish-German bilingual and monolingual data
  • 1. An introductory note on methodology
  • 2. The data: Corpus and constellation
  • 3. Research questions and aspects of frequency
  • 4. Procedures of quantitative analysis
  • 5. Classification of search results
  • 6. Contextual interpretation of the items
  • 7. Extending the analysis: Interpretative procedures
  • 8. Consequences and further research
  • Abbreviations and conventions
  • References
  • Corpus of Polish spoken in Germany: Collecting and analysing written & spoken data for investigating
  • 1. Introduction
  • 2. Participants of the study
  • 3. Corpus design
  • 4. Data acquisition and storage
  • 5. Transcription
  • 6. Corpus publication and reuse
  • References
  • The HABLA-Corpus (German-French and German-Italian)
  • 1. Introduction
  • 2. Research on simultaneous bilingualism and the weaker (heritage) language
  • 3. Corpus design
  • 4. Transcription
  • 5. Availability
  • References
  • Appendix
  • Section 2. Language contact corpora
  • The Hamburg Corpus of Argentinean Spanish (HaCASpa)
  • 1. Introduction
  • 2. Argentinean Porteño Spanish as a contact variety: The role of multilingualism and Second Language
  • 3. Corpus design
  • 4. Main findings
  • 5. Remaining issues
  • References
  • Ad hoc contact phenomena or established features of a contact variety? Evidence from corpus analysis
  • 1. Introduction
  • 2. The language situation on the Faroe Islands: Sociopolitical and linguistic factors
  • 3. Written and spoken language in language contact
  • 4. Corpus-based analyses of contact-induced transfer
  • 5. The corpora
  • 6. Case study: The use of subjunctions in conditional clauses in Faroe-Danish
  • 7. Conclusion
  • References
  • Phonoprosodic corpus of spoken Catalan (PhonCAT)
  • 1. Introduction: PhonCAT
  • 2. Data collection
  • 3. Data segmentation and coding
  • 4. Collected data
  • 5. Data analysis
  • 6. Conclusions
  • References
  • Researching the intelligibility of a (German) dialect
  • 1. Why passive knowledge of a dialect?
  • 2. Focussing on language variation and its intelligibility in health care institutions
  • 3. The corpus design
  • 4. The annotation system
  • 5. Evaluating the (results of) the annotating system
  • References
  • Annotating ambiguity: Insights from a corpus-based study on syntactic change in Old Swedish
  • 1. Specific problems in historical corpora
  • 2. The HaCOSSA corpus
  • 3. Digital representation and linguistic annotation
  • 4. Syntactic ambiguity in Old Swedish
  • 5. Concluding remarks
  • References
  • Section 3. Interpreting corpora
  • Sharing community interpreting corpora: A pilot study
  • 1. Introduction
  • 2. Data for the pilot study
  • 3. Technical heterogeneity of the data
  • 4. Common platform for sharing the data: Integration of sound, text, and images
  • 5. Common approaches to annotating the data
  • 6. Conclusion and outlook
  • References
  • CoSi: A Corpus of Consecutive and Simultaneous Interpreting
  • 1. Introduction
  • 2. Corpus design
  • 3. Corpus creation and editing
  • 4. Corpus use
  • 5. Getting access to the corpus
  • References
  • The corpus "Interpreting in Hospitals": Possible applications for research and communication trainin
  • 1. The corpus "Interpreting in Hospitals": Design and background
  • 2. The corpus "Interpreting in Hospitals" as a source for research
  • 3. Using the corpus in communication trainings
  • 4. Conclusions
  • References
  • Section 4. Comparable and parallel corpora
  • The GeWiss corpus: Comparing spoken academic German, English and Polish
  • 1. Putting GeWiss into context: Motivation, aims and applications
  • 2. The design of the GeWiss corpus
  • 3. Data acquisition
  • 4. Metadata
  • 5. Transcription
  • 6. Annotation
  • 7. Perspectives
  • References
  • Corpora
  • Appendix 1
  • Appendix 2
  • Korpus C4: A distributed corpus of German varieties
  • 1. A German variety corpus
  • 2. Design of the Korpus C4
  • 3. Corpus format and metadata
  • 4. Access to the Korpus C4
  • 5. Conclusion
  • References
  • Treebanks in translation studies: The CroCo Dependency Treebank
  • 1. Introduction
  • 2. The CroCo Dependency Bank
  • 3. Treebanks in translation studies
  • 4. Conclusion and outlook
  • References
  • Section 5. Corpus tools
  • Multilingual phonological corpus analysis: The tools behind the PhonBank Project
  • 1. Introduction
  • 2. PhonBank
  • 3. Phon
  • 4. A practical illustration
  • 5. Outlook
  • References
  • Finding the balance between strict defaults and total openness: Collecting and managing metadata for
  • 1. What is metadata?
  • 2. Why metadata?
  • 3. Metadata standards
  • 4. ISLE Meta Data Initiative (IMDI)
  • 5. Institut für Deutsche Sprache (IDS)
  • 6. EXMARaLDA metadata
  • 7. Using EXMARaLDA metadata
  • 8. Possible enhancements to the toolset regarding metadata
  • 9. Outlook
  • References
  • General index
  • Corpora index
  • Language index