Heaps’ Law and Heaps functions in tagged texts: evidences of their linguistic relevance

Chacoma, A.; Zanette, Damian Horacio

cnea.tipodocumento	SCIENTIFIC ARTICLE
dc.contributor.author	Chacoma, A.
dc.contributor.author	Zanette, Damian Horacio
dc.date.accessioned	2025-03-19T15:34:02Z
dc.date.available	2025-03-19T15:34:02Z
dc.date.issued	2020
dc.description.abstract	We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or ‘tags,’ namely, nouns, verbs and others), and analyse the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps’ Law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specifically, we find that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags add systematically distinct contributions to this tendency, with verbs and others being respectively more and less retarded than the mean trend, and nouns following instead the overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps’ Law, a feature that is still in need of extensive assessment.
dc.description.institutionalaffiliation	Affiliation: Zanette, Damian Horacio. Comisión Nacional de Energía Atómica. Instituto Balseiro; Universidad Nacional de Cuyo, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
dc.description.institutionalaffiliationexternal	Affiliation: Chacoma, A. Instituto de Física Enrique Gaviola. Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Nacional de Córdoba, Argentina
dc.format.extent	1-15 p.
dc.format.extent	application/pdf
dc.identifier.doi	https://doi.org/10.1098/rsos.200008
dc.identifier.uri	https://nuclea.cnea.gob.ar/handle/20.500.12553/6181
dc.language.ISO639-3	eng
dc.publisher	Royal Society
dc.relation.ispartof	Royal Society Open Science; Vol. 7, No. 3 (2020), pp. 1-15
dc.rights.accesslevel	info:eu-repo/semantics/openAccess
dc.rights.license	Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject.keyword	REGULARIDADES DEL LENGUAJE
dc.subject.keyword	LEY DE HEAPS
dc.subject.keyword	TEXTOS ETIQUETADOS
dc.subject.keyword	CLASES GRAMATICALES
dc.subject.keyword	ANOMALIAS ESTADISTICAS
dc.subject.keyword	LANGUAGE REGULARITIES
dc.subject.keyword	HEAPS’ LAW
dc.subject.keyword	TAGGED TEXTS
dc.subject.keyword	GRAMMATICAL CLASSES
dc.subject.keyword	STATISTICAL ANOMALIES
dc.title	Heaps’ Law and Heaps functions in tagged texts: evidences of their linguistic relevance
dc.type	info:eu-repo/semantics/article
dc.type	info:ar-repo/semantics/artículo
dc.type.version	info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: CNEA_FIE_ART_006.pdf
Size: 1.3 MB
Format: Adobe Portable Document Format

Collections

ARTICLES