Genres on the Web: Computational Models and Empirical Studies

The description, automatic identification and further processing of web genres is a novel field of research in computational linguistics, NLP and related areas such as text-technology, digital humanities and web mining. One of the driving forces behind this research is the idea of genre-enabled sear...

Bibliographic Details
Main Authors Mehler, Alexander; Sharoff, Serge; Santini, Marina
Format eBook Book
Language English
Published Dordrecht Springer Nature 2010
Springer Science + Business Media
Springer
Springer Netherlands
Edition 1st ed.
Series Text, Speech and Language Technology
ISBN 9048191785
9789048191789
9789048191772
9048191777
ISSN 1386-291X
DOI 10.1007/978-90-481-9178-9

Abstract The description, automatic identification and further processing of web genres is a novel field of research in computational linguistics, NLP and related areas such as text-technology, digital humanities and web mining. One of the driving forces behind this research is the idea of genre-enabled search engines, which let users additionally specify the web genres that the retrieved documents should comply with (e.g., personal homepage, weblog, scientific article). This book offers a thorough foundation for this emerging field of research on web genres and document types in web-based social networking. It provides theoretical foundations of web genres and presents corpus-linguistic approaches to their analysis and computational models for their classification. This includes research in the areas of web genre identification, web genre modelling and related fields such as: genres and registers in web-based communication; social software-based document networks; web genre ontologies and classification schemes; text-technological models of web genres; web content, structure and usage mining; web genre classification; and web as corpus. The book addresses researchers who want to become acquainted with theoretical developments, computational models and their empirical evaluation in this field of research. It also addresses researchers who are interested in standards for the creation of corpora of web documents. Thus, the book concerns readers from many disciplines such as corpus linguistics, computational linguistics, text-technology and computer science.
AbstractList The volume "Genres on the Web" has been designed for a wide audience, from the expert to the novice. It is a required book for scholars, researchers and students who want to become acquainted with the latest theoretical, empirical and computational advances in the expanding field of web genre research. The study of web genre is a novel, interdisciplinary area of research that ranges from corpus linguistics, computational linguistics, NLP and text-technology to web mining, webometrics, social network analysis and information studies. This book gives readers a thorough grounding in the latest research on web genres and emerging document types. The book covers a wide range of web-genre focused subjects, such as: the identification of the sources of web genres; automatic web genre identification; the presentation of structure-oriented models; and empirical case studies. One of the driving forces behind genre research is the idea of a genre-sensitive information system, which incorporates genre cues to complement current keyword-based search and retrieval applications.
This book presents the latest research on web genres and emerging document types. It covers a wide range of web-genre focused subjects, such as: the identification of the sources of web genres, automatic web genre identification and structure-oriented models.
Author Santini, Marina
Sharoff, Serge
Mehler, Alexander
Author_xml – sequence: 1
  fullname: Mehler, Alexander
– sequence: 2
  fullname: Sharoff, Serge
– sequence: 3
  fullname: Santini, Marina
BackLink https://cir.nii.ac.jp/crid/1130000795103008384$$DView record in CiNii
ContentType eBook
Book
Copyright Springer Science+Business Media B.V. 2011
Copyright_xml – notice: Springer Science+Business Media B.V. 2011
DBID I4C
08O
RYH
DEWEY 005
DOI 10.1007/978-90-481-9178-9
DatabaseName Casalini Torrossa eBooks Institutional Catalogue
ciando eBooks
CiNii Complete
DatabaseTitleList


DeliveryMethod fulltext_linktorsrc
Discipline Languages & Literatures
Computer Science
EISBN 9048191785
9789048191789
Edition 1st ed.
1
Editor Santini, Marina
Sharoff, Serge
Mehler, Alexander
Editor_xml – sequence: 1
  fullname: Mehler, Alexander
– sequence: 2
  fullname: Santini, Marina
– sequence: 3
  fullname: Sharoff, Serge
ExternalDocumentID 9789048191789
192746
EBC645584
BB04035733
ciando203874
5447097
GroupedDBID -T.
.~Z
089
0DA
0DD
0E8
20A
38.
A4J
AABBV
AAFYB
AAINA
AAMFE
ABMNI
ACBPT
ACDPG
AECAB
AECMQ
AEGQK
AEKFX
AETDV
AEZAY
AFNRJ
ALMA_UNASSIGNED_HOLDINGS
AZZ
BBABE
C9S
C9V
CZZ
E6I
I4C
IEZ
MYL
NUC
SAS
SBO
TPJZQ
UZ6
Z83
Z84
Z88
08O
T.
Z
AAJYQ
AATVQ
ABBUY
ABCYT
ACDTA
ACDUY
AEHEY
AEJLV
AHNNE
ATJMZ
RYH
Z81
ID FETCH-LOGICAL-a37538-bc57f5e2e8de0f92e7fd83b87a7c157b0fd4acdb9429aea1837257faa4eeb4dd3
ISBN 9048191785
9789048191789
9789048191772
9048191777
ISSN 1386-291X
IngestDate Fri Nov 08 00:09:58 EST 2024
Tue Jul 29 20:04:26 EDT 2025
Sat May 31 00:05:30 EDT 2025
Thu Jun 26 23:56:12 EDT 2025
Mon Aug 22 12:36:06 EDT 2022
Tue Nov 14 23:02:12 EST 2023
IsPeerReviewed false
IsScholarly false
Keywords Computer Internet
Intranet
LCCN 2010933721
LCCallNum_Ident Q
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-a37538-bc57f5e2e8de0f92e7fd83b87a7c157b0fd4acdb9429aea1837257faa4eeb4dd3
Notes Includes bibliographical references and index
OCLC 687848719
PQID EBC645584
PageCount 363
ParticipantIDs askewsholts_vlebooks_9789048191789
springer_books_10_1007_978_90_481_9178_9
proquest_ebookcentral_EBC645584
nii_cinii_1130000795103008384
ciando_primary_ciando203874
casalini_monographs_5447097
PublicationCentury 2000
PublicationDate 2010
2011
c2010
2014-07-30
PublicationDateYYYYMMDD 2010-01-01
2011-01-01
2014-07-30
PublicationDate_xml – year: 2010
  text: 2010
PublicationDecade 2010
PublicationPlace Dordrecht
PublicationPlace_xml – name: Netherlands
– name: Dordrecht
PublicationSeriesTitle Text, Speech and Language Technology
PublicationSeriesTitleAlternate Text,Speech,Language Tech.
PublicationYear 2010
2011
2014
Publisher Springer Nature
Springer Science + Business Media
Springer
Springer Netherlands
Publisher_xml – name: Springer Nature
– name: Springer Science + Business Media
– name: Springer
– name: Springer Netherlands
SSID ssj0000449530
Score 2.0946996
Snippet The description, automatic identification and further processing of web genres is a novel field of research in computational linguistics, NLP and related areas...
This book presents the latest research on web genres and emerging document types. It covers a wide range of web-genre focused subjects, such as: the...
The volume "Genres on the Web" has been designed for a wide audience, from the expert to the novice. It is a required book for scholars, researchers and...
SourceID askewsholts
springer
proquest
nii
ciando
casalini
SourceType Aggregation Database
Publisher
SubjectTerms Classification
Computational Linguistics
Computer programming, programs, data
Computer Science
Corpora (Linguistics)
Natural Language Processing (NLP)
Nonbook materials
World Wide Web
Subtitle Computational Models and Empirical Studies
TableOfContents 4.1.1 What Is the Purpose of a Genre Taxonomy? -- 4.2 Why Is It Hard to Develop a Web Genre Taxonomy? -- 4.2.1 Difficulties in Defining Genres -- 4.2.2 Difficulties in Developing the Scope and Expressiveness of the Taxonomy -- 4.3 A Use-Centered Development of a Taxonomy of Web Genres -- 4.3.1 Research Design: Naturalistic Field Study -- 4.3.2 Research Informants -- 4.3.3 Data Elicitation -- 4.3.4 Data Analysis -- 4.4 Results -- 4.5 Discussion -- 4.6 Conclusions -- References -- Part III Automatic Web Genre Identification -- 5 Cross-Testing a Genre Classification Model for the Web -- Marina Santini -- 5.1 Introduction -- 5.2 Approximating Genre Population on the Web -- 5.2.1 Noise -- 5.2.2 Description of the Corpora Used for Cross-Testing -- 5.3 The Web as Communication -- 5.3.1 Genre Palette -- 5.3.2 Linguistically- and Functionally-Motivated Features -- 5.4 The Genre Model -- 5.4.1 Methodology -- 5.4.2 Flow and Hypotheses -- 5.5 Results -- 5.5.1 Cross-Testing Performance on Single Labels: BBC and 7-Webgenre Collections -- 5.5.2 Performances of Other Single-Label Models on the 7-Webgenre Collection -- 5.5.3 Cross-Testing Performance on Single Labels: Mapped Web Genres -- 5.5.4 Cross-Testing Performance on Single Labels: HCG and MCG in Isolation -- 5.5.5 The SPIRIT Sample: An Attempt to Assess Multilabelling -- 5.6 Discussion -- 5.7 Conclusion and Future Work -- References -- 6 Formulating Representative Features with Respect to Genre Classification -- Yunhyong Kim and Seamus Ross -- 6.1 Introduction -- 6.2 Defining Genre Classification -- 6.2.1 Document Representation in Conventional Text Classification -- 6.2.2 Harmonic Descriptor Representation (HDR) of Documents -- 6.2.3 Defining Genre -- 6.3 Classifiers -- 6.4 Dataset -- 6.5 Features -- 6.6 Results -- 6.6.1 Overall Accuracy -- 6.6.2 Precision and Recall
Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova -- 14.1 Introduction -- 14.2 Corpus Compilation and Analysis -- 14.3 Factor Analysis -- 14.3.1 Method -- 14.3.2 Results -- 14.3.3 Interpretation of Factors -- 14.4 Text Type Analysis -- 14.4.1 Method -- 14.4.2 Results -- 14.4.3 Interpretation of Clusters -- 14.5 Summary of Findings -- References -- 15 Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article -- Ian Bruce -- 15.1 Introduction -- 15.1.1 The Systemic Functional Approach to Genre -- 15.1.2 The English for Specific Purposes Approach to Genre -- 15.1.3 Problems with these Existing Approaches to Genre -- 15.1.4 A Solution: Social Genre and Cognitive Genre -- 15.1.5 A Web Genre: The Participatory News Article -- 15.2 Methodology -- 15.3 Results -- 15.3.1 The News Article -- 15.3.2 Reader Comments -- 15.4 Discussion -- 15.5 Conclusion -- References -- Part VI Prospect -- 16 Any Land in Sight? -- Marina Santini, Serge Sharoff, and Alexander Mehler -- 16.1 Web Genre Benchmarks -- 16.1.1 Genre Labels -- 16.1.2 Annotation -- 16.1.3 Representativeness -- 16.2 Work Plan -- 16.2.1 Benefits -- Index
10.4 Features for Classification -- 10.4.1 Features Derived from Structure -- 10.4.2 Features Derived from Content -- 10.5 Classification of Web Sites -- 10.5.1 Classification by Structure -- 10.5.2 Classification by Content -- 10.5.3 Classification by Structure and Content -- 10.6 Conclusion -- References -- 11 Mining Graph Patterns in Web-Based Systems: A Conceptual View -- Matthias Dehmer and Frank Emmert-Streib -- 11.1 Introduction -- 11.2 Mathematical Preliminaries -- 11.3 Structural Graph Measures -- 11.4 Graph Similarity Measures for Web Mining -- 11.4.1 Classical Similarity and Distance Measures for Graphs -- 11.4.2 Graph Similarity Measures Based on Trees -- 11.4.3 Structural Similarity of Generalized Trees -- 11.5 Applications -- 11.6 Conclusion -- References -- 12 Genre Connectivity and Genre Drift in a Web of Genres -- Lennart Björneborn -- 12.1 Introduction -- 12.2 Methodology -- 12.2.1 Source Pages and Target Pages -- 12.2.2 Genre Categorization -- 12.3 Results and Discussion -- 12.3.1 Source Genres, Target Genres and Genre Pairs -- 12.3.2 Web of Genres -- 12.3.3 "Hook" Genres and "Lug" Genres -- 12.3.4 Genre Drift, Topic Drift and Small-World Implications -- 12.4 Conclusion -- References -- Part V Case Studies of Web Genres -- 13 Genre Emergence in Amateur Flash -- John C. Paolillo, Jonathan Warren, and Breanne Kunz -- 13.1 Genres, Multimedia and the Web -- 13.2 Flash and Newgrounds in Amateur Multimedia -- 13.3 Method -- 13.3.1 Sampling -- 13.3.2 Identifying Potential Emergent Genres -- 13.3.3 Cultural References and Message Content -- 13.4 Results -- 13.4.1 Network Analysis -- 13.4.2 Genre Features -- 13.4.3 Cultural References -- 13.4.4 Genre, Emergence and Social Network -- 13.5 Discussion and Conclusions -- References -- 14 Variation Among Blogs: A Multi-Dimensional Analysis
6.7 Conclusions -- References -- 7 In the Garden and in the Jungle -- Serge Sharoff -- 7.1 Introduction -- 7.2 Text Typology for the Web -- 7.3 An Experiment in Automatic Classification of the Web -- 7.4 Analysis of Results -- 7.4.1 Qualitative Assessment of Texts in Each Category -- 7.4.2 Assessing the Composition of ukWac -- 7.5 Conclusions and Future Research -- References -- 8 Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues -- Benno Stein, Sven Meyer zu Eissen, and Nedim Lipka -- 8.1 Introduction -- 8.1.1 Contributions -- 8.2 Use Cases: Genre Analysis in the Retrieval Practice -- 8.2.1 Genre-Enabled Web Search -- 8.2.2 Information Extraction Based on Genre Information -- 8.2.3 Organizing Collections in Both Topic and Genre Dimensions -- 8.2.4 Empower Web Page Abstraction with Genre Information -- 8.3 Construction of Genre Retrieval Models -- 8.3.1 Problems of Genre Retrieval Models and Lessons Learned -- 8.3.2 New Elements for Genre Retrieval Models -- 8.4 Evaluation -- 8.4.1 Improving Generalization Capability -- 8.4.2 Measuring Generalization Capability -- 8.4.3 Experiments -- 8.5 Implementing Genre-Enabled Web Search -- 8.6 Conclusion -- References -- 9 Marrying Relevance and Genre Rankings: An Exploratory Study -- Pavel Braslavski -- 9.1 Introduction -- 9.2 Related Work -- 9.2.1 Genre Classification -- 9.2.2 Readability Scores -- 9.2.3 Genres in Relevance Ranking -- 9.3 Data -- 9.3.1 Functional Styles Sample -- 9.3.2 ROMIP Collection -- 9.4 Formality Score -- 9.5 Results -- 9.5.1 Genre-Related Rankings -- 9.5.2 Merged Rankings -- 9.6 Conclusion -- References -- Part IV Structure-Oriented Models of Web Genres -- 10 Classification of Web Sites at Super-Genre Level -- Christoph Lindemann and Lars Littig -- 10.1 Introduction -- 10.2 Related Work -- 10.3 Dataset
Intro -- Foreword -- Personal Note -- Contents -- Contributors -- Part I Introduction -- 1 Riding the Rough Waves of Genre on the Web -- Marina Santini, Alexander Mehler, and Serge Sharoff -- 1.1 Why Is Genre Important? -- 1.1.1 Zooming In: Information on the Web -- 1.2 Trying to Grasp the Ungraspable? -- 1.2.1 In Quest of a Definition of Web Genre for Empirical Studies and Computational Applications -- 1.3 Empirical and Computational Approaches to Genre: Open Issues -- 1.3.1 Web Documents -- 1.3.2 Corpora, Genres and the Web -- 1.3.3 Empirical and Computational Models of Web Genres -- 1.4 Conclusions -- 1.5 Outline of the Volume -- References -- Part II Identifying the Sources of Web Genres -- 2 Conventions and Mutual Expectations -- Jussi Karlgren -- 2.1 Genres Are Not Rule-Bound -- 2.2 So, Let's Ask the Readers -- 2.3 An Editorial, Third Party, View of Genres on the Web -- 2.4 Data Source: Observation of User Actions -- 2.5 Conclusions -- References -- 3 Identification of Web Genres by User Warrant -- Mark A. Rosso and Stephanie W. Haas -- 3.1 Introduction -- 3.2 Criteria for the Identification of Web Genre -- 3.3 Operationalizing Traditional Genre Theory for the World Wide Web -- 3.3.1 A Genre's User Group -- 3.3.2 Genre: Function, Form and Substance -- 3.3.3 Genres on the Web: Further Implications for Research -- 3.4 Developing a Web Genre Palette -- 3.4.1 Collecting Genre Terminology in the Users' Own Words -- 3.4.2 Users Choose the Best of the Collected Genre Terminology -- 3.4.3 User Validation of the Genre Palette -- 3.4.4 A Fourth Study: Determining the Genres' Usefulness for Web Search -- 3.5 Conclusion -- References -- 4 Problems in the Use-Centered Development of a Taxonomy of Web Genres -- Kevin Crowston, Barbara Kwasnik, and Joseph Rubleske -- 4.1 Introduction
Title Genres on the Web
URI http://digital.casalini.it/9789048191789
http://ebooks.ciando.com/book/index.cfm/bok_id/203874
https://cir.nii.ac.jp/crid/1130000795103008384
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=645584
http://link.springer.com/10.1007/978-90-481-9178-9
https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9789048191789&uid=none
Volume 42
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV1Lb9QwELbocqEn3pRSsBBCSCgomzhrmwMSWy2qqi0XSunNchxHXQHZiiwc-PXM-JVNQUJwiZJJlLH8JfZ4PPMNIc9MoW0tpwZ-JIELFM4yUTKZFWLGmZFMto678-T97OgjOz6vzgc2P5ddsqlfmZ9_zCv5H1RBBrhiluw_IJteCgI4B3zhCAjD8Yrxmy4DUz8Ko6P_5Sdbu5W9cRUaonfP1bjxDMz26-XKU4H047DBExuLH6dEl-RyudAwRDvKxg-YoZnkGmtLrEKeTyi-Hf0GLvxs228Q_Yaj9aRE9hhYwPlqOr-NrkNAhcwzeBIAxvNhKkkBfvM5DA4lci3ukB3OxYRcf7s4Xp4l91fOMLA1h3Vy0sk9H9LQhrgJHXiARzp3ya7uP8M8AHPEpkejQvcac0nBkkB3ULMGY6FbrUYLhyt73c6EOL1JJphWcotcs91tcn8Z3MQ9fU6Xidm6v0PeeGjpuqMALQVo6Ws6ApZ6YClopwlYGoC9S87eLU4Pj7JQ4yLTJcfJpjYVbytbWNHYvJWF5W0jylpwzc204nXeNkybppZgOGirYQTmMMq2WjNra9Y05T0y6dadfUCoKUBQwYRVmZZZzGguZ1YUegpy0-Rsjzzd6jP144vbj-_VVqcLuUf2Y1cq-F08b3qvKsZ4Ljnedb2rLj0ZivKXBUZBgIID6HIQ4XGKG6WAHZrwJVr6Au4_iWAopzsEIKvF_HDGqgqfeBExUr5xkVYbGqlkrqCZCtup5MO_KNsnN4bP_hGZbL59twdgQG7qx-Fj_AX2-GOP
linkProvider Library Specific Holdings
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Genres+on+the+Web+%3A+computational+models+and+empirical+studies&rft.au=Mehler%2C+Alexander&rft.au=Sharoff%2C+Serge&rft.au=Santini%2C+Marina&rft.date=2010-01-01&rft.pub=Springer&rft.isbn=9789048191772&rft_id=info:doi/10.1007%2F978-90-481-9178-9&rft.externalDocID=BB04035733
thumbnail_m http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97890481%2F9789048191789.jpg
thumbnail_s http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-90-481-9178-9