Exploring newspaper language : using the web to create and investigate a large corpus of modern Norwegian /

This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus. This is a large monitor corpus of contemporary Norwegian language, compiled through daily harvesting of web newspapers. The book gives an overv...

Bibliographic Details
Classification:	Electronic Book
Other Authors:	Andersen, Gisle
Format:	Electronic eBook
Language:	English
Published:	Amsterdam ; Philadelphia : John Benjamins Pub. Co., 2012.
Series:	Studies in corpus linguistics ; v. 49.
Subjects:	Norwegian language (Nynorsk) > Usage. Norwegian language (Nynorsk) > Syntax. Newspapers > Norway. Mass media > Norway. Information technology > Norway. Nynorsk (Langue) > Syntaxe. Médias > Norvège. Technologie de l'information > Norvège. FOREIGN LANGUAGE STUDY > Norwegian. FOREIGN LANGUAGE STUDY > Scandinavian Languages (Other) Information technology. Mass media. Newspapers. Norway.
Online Access:	Full text

MARC


LEADER	00000cam a2200000Mi 4500
001	EBSCO_ocn779828976
003	OCoLC
005	20231017213018.0
006	m o d
007	cr \|n\|---\|\|\|\|\|
008	120312s2012 ne ob 001 0 eng d
010			\|a 2011045662
040			\|a EBLCP \|b eng \|e pn \|c EBLCP \|d OCLCQ \|d N$T \|d OCLCQ \|d IDEBK \|d CDX \|d YDXCP \|d E7B \|d OCLCQ \|d OCLCA \|d OCLCQ \|d LOA \|d AGLDB \|d PIFAG \|d ZCU \|d OCLCQ \|d MERUC \|d OCLCQ \|d U3W \|d OCLCA \|d OCLCF \|d STF \|d WRM \|d VTS \|d ICG \|d INT \|d REC \|d VT2 \|d OCLCQ \|d WYU \|d TKN \|d DKC \|d OCLCQ \|d M8D \|d UKAHL \|d OCLCQ \|d AJS \|d OCLCO \|d OCLCQ \|d QGK
016	7		\|a 016070495 \|2 Uk
019			\|a 787847218 \|a 794545620 \|a 817078477 \|a 1055338892 \|a 1066600163 \|a 1081235046 \|a 1228548682 \|a 1259078595
020			\|a 9789027274991 \|q (electronic bk.)
020			\|a 9027274991 \|q (electronic bk.)
020			\|a 1280497661
020			\|a 9781280497667
020			\|a 9027203547
020			\|a 9789027203540
020			\|z 9789027203540 \|q (alk. paper)
020			\|a 9786613592897
020			\|a 6613592897
029	1		\|a AU@ \|b 000054186897
029	1		\|a DEBBG \|b BV043033325
029	1		\|a DEBBG \|b BV044163272
029	1		\|a DEBSZ \|b 421440236
029	1		\|a NZ1 \|b 14536388
035			\|a (OCoLC)779828976 \|z (OCoLC)787847218 \|z (OCoLC)794545620 \|z (OCoLC)817078477 \|z (OCoLC)1055338892 \|z (OCoLC)1066600163 \|z (OCoLC)1081235046 \|z (OCoLC)1228548682 \|z (OCoLC)1259078595
043			\|a e-no---
050		4	\|a PD2914 \|b .E97 2012
072		7	\|a FOR \|x 039000 \|2 bisacsh
072		7	\|a FOR \|x 022000 \|2 bisacsh
072		7	\|a CFX \|2 bicssc
072		7	\|a PD \|2 lcco
072		7	\|a PN \|2 lcco
082	0	4	\|a 439.8/20188 \|2 23
049			\|a UAMI
245	0	0	\|a Exploring newspaper language : \|b using the web to create and investigate a large corpus of modern Norwegian / \|c edited by Gisle Andersen.
260			\|a Amsterdam ; \|a Philadelphia : \|b John Benjamins Pub. Co., \|c 2012.
300			\|a 1 online resource (362 pages)
336			\|a text \|b txt \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
490	1		\|a Studies in corpus linguistics, \|x 1388-0373 ; \|v v. 49
505	0		\|a Exploring Newspaper Language; Editorial page; Title page; LCC data; Table of contents; Building a large corpus based on newspapers from the web; 1. Introduction; 2. An overview of the Norwegian Newspaper Corpus and its system architecture; 2.1 Text harvesting; 2.2 Boilerplate and duplicate removal; 2.3 Language classification; 2.4 Text annotation; 2.4.1 Annotation of source, date and author information; 2.4.2 Topic classification; 2.4.3 Part-of-speech tagging; 2.5 Search system and user interface; 2.5.1 Corpus WorkBench; 2.5.2 Corpuscle; 2.6 Extraction of new words.
505	8		\|a 2.7 Classification of new words; 2.7.1 Anglicism detection; 2.8 Frequency profiling and lexical database entry; 2.9 Identification of multiword expressions; 3. The content of the research contributions to this book; 4. Concluding remarks; References; Part II. Exploiting the web as a corpus -- Methods and tools; Corpuscle -- a new corpus management platform for annotated corpora; 1. Introduction; 2. Design principles; 3. Querying the corpus; 4. API and Web interface; 4.1 The API; 4.2 The Web interface; 5. Editing and manual annotation; 6. Evaluation and concluding remarks; References; OBT+stat.
505	8		\|a 1. Introduction; 2. Background; 2.1 The history of the Oslo-Bergen Tagger; 2.2 State of the art for Norwegian POS taggers; 3. The architecture of the Oslo-Bergen Constraint Grammar Tagger; 4. Methodology of improvements to the Oslo-Bergen Tagger; 5. Dealing with left-over ambiguities in the Oslo-Bergen Tagger; 5.1 Morphological ambiguities; 5.2 Lemma ambiguities; 6. Statistical disambiguation; 7. Modelling challenges and engineering concerns; 8. Evaluation of the statistical module; 8.1 How to evaluate; 8.2 Evaluation results; 9. Conclusion; References.
505	8		\|a Exploring corpora through syntactic annotation; 1. Introduction; 2. Treebanking; 3. INESS -- the Norwegian treebanking infrastructure; 4. Searching for complex syntactic constructions in a treebank; 4.1 Passive constructions; 4.2 Relative clauses; 5. Conclusion; References; Collocations and statistical analysis of n-grams; 1. Introduction; 2. Background; 2.1 Multiword Expressions (MWEs); 2.2 Collocations; 3. Methodology; 3.1 Data and n-gram extraction; 3.2 Post-processing of n-gram lists; 3.3 Contingency tables; 3.3.1 Bigram Contingency Tables; 3.3.2 Trigram Contingency Tables.
505	8		\|a 3.4 Bigram Association Measures; 3.5 Trigram Association Measures; 4. Results; 4.1 Bigrams; 4.2 Trigrams; 5. Conclusion and Future Work; References; Automatic topic classification of a large newspaper corpus; 1. Introduction; 2. Background and related work; 2.1 The rule-based approach; 2.2 The pattern-matching approach; 2.3 Promising results; 3. Material; 3.1 Manual annotation; 3.2 Feature extraction; 3.3 Cleaning the text; 3.4 The gold standard; 4. Overview of our final approach; 5. Our approach in detail; 5.1 Hypothesis; 5.2 Defining categories; 5.3 Tools; 5.4 Programming and experimenting.
500			\|a 6. Data and experimental evaluation.
520			\|a This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus. This is a large monitor corpus of contemporary Norwegian language, compiled through daily harvesting of web newspapers. The book gives an overview of the corpus and its system architecture, and presents tools used for tasks such as text harvesting, annotation, topic classification and extraction and frequency profiling of new words and phrases. Among the innovative technologies is Corpuscle, a corpus query engine and management system whic.
588	0		\|a Print version record.
504			\|a Includes bibliographical references and indexes.
546			\|a English.
590			\|a eBooks on EBSCOhost \|b EBSCO eBook Subscription Academic Collection - Worldwide
650		0	\|a Norwegian language (Nynorsk) \|x Usage.
650		0	\|a Norwegian language (Nynorsk) \|x Syntax.
650		0	\|a Newspapers \|z Norway.
650		0	\|a Mass media \|z Norway.
650		0	\|a Information technology \|z Norway.
650		6	\|a Nynorsk (Langue) \|x Syntaxe.
650		6	\|a Médias \|z Norvège.
650		6	\|a Technologie de l'information \|z Norvège.
650		7	\|a FOREIGN LANGUAGE STUDY \|x Norwegian. \|2 bisacsh
650		7	\|a FOREIGN LANGUAGE STUDY \|x Scandinavian Languages (Other) \|2 bisacsh
650		7	\|a Information technology. \|2 fast \|0 (OCoLC)fst00973089
650		7	\|a Mass media. \|2 fast \|0 (OCoLC)fst01011219
650		7	\|a Newspapers. \|2 fast \|0 (OCoLC)fst01037111
651		7	\|a Norway. \|2 fast \|0 (OCoLC)fst01204556
700	1		\|a Andersen, Gisle.
776	0	8	\|i Print version: \|a Andersen, Gisle. \|t Exploring Newspaper Language : Using the web to create and investigate a large corpus of modern Norwegian. \|d Amsterdam/Philadelphia : John Benjamins Publishing Company, ©2012 \|z 9789027203540
830		0	\|a Studies in corpus linguistics ; \|v v. 49. \|x 1388-0373
856	4	0	\|u https://ebsco.uam.elogim.com/login.aspx?direct=true&scope=site&db=nlebk&AN=439344 \|z Full text
938			\|a Askews and Holts Library Services \|b ASKH \|n AH28556052
938			\|a Coutts Information Services \|b COUT \|n 22291010
938			\|a EBL - Ebook Library \|b EBLB \|n EBL869351
938			\|a ebrary \|b EBRY \|n ebr10540413
938			\|a EBSCOhost \|b EBSC \|n 439344
938			\|a ProQuest MyiLibrary Digital eBook Collection \|b IDEB \|n 359289
938			\|a YBP Library Services \|b YANK \|n 7249140
994			\|a 92 \|b IZTAP
