A Study of the Complexity of Sentences Constituting the Texts of Legal Acts of the Authorities of the Russian Federation

D.A. Savel'ev

doi:10.17323/2072-8166.2020.1.50.74

Keywords: lawmaking, legal information, lawmaking procedure, corpus linguistics, proofreading, lexical variability, open data, computational linguistics, text mining

Abstract

To ensure proper law enforcement, official publication of regulatory acts alone is not enough. What matters is the clarity of legal texts and their accessibility for understanding. The linguistic and legal quality of a text are interconnected: a text that is well written from a linguistic point of view contributes to a clearer formulation of the ideas embodied in a legal or judicial act. Linguistic assistance after a draft act has been written is insufficient; recommendations for clear writing must be taken into account at the stage of drafting a legal act. This paper presents the methodology and results of a study of Russian legislative texts carried out in order to improve law enforcement and mobilization, to reduce the time spent on the perception of legal norms, and to improve the quality of legal acts. A corpus of texts from 199 thousand legal acts was used. Its texts were segmented into 5.5 million sentences. Using artificial intelligence technologies, the sentences were given morphosyntactic markup identifying parts of speech and their properties. On this basis, metrics of the lexical and syntactic complexity of each sentence were calculated: length, lexical diversity, lengths of syntactic dependencies (Dependency Length), word length in syllables, etc. Metrics were selected that quantify the complexity of sentences in a legal text, which differs from a literary text. A technique is proposed for automatically finding, without manual labor, the sentences that are the most difficult to read. On the basis of this work, a corpus of poorly readable sentences from legal acts was created and published in the public domain. It consists of a wider selection (sentences that are too long) and a narrower one (sentences that are worse than the majority on three metrics at the same time).
This corpus is analyzed statistically to identify the authorities that write in a more difficult style and the subject areas of documents that contain more sentences written in a complex way. It is shown that the number of long sentences in legislation has increased significantly (fivefold) compared with the first years of modern Russian statehood. Half of the sentences in acts of the Constitutional Court of the Russian Federation consist of more than 40 tokens. Using the NPMI method, the most frequently occurring phrases and the phrases that characterize the subject of a text were selected from the corpus. The published corpus may become a subject for more detailed work on improving legal drafting technique and the content of legal and judicial acts.
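The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The choice of the three metrics (sentence length in tokens, mean dependency length, mean word length in syllables), the percentile cutoff `q`, and all function names are assumptions made for illustration; the abstract does not state them. The sketch assumes each sentence has already been parsed into word forms plus a 1-based head index per token (0 for the root), and it counts Russian syllables as vowel letters. The `npmi` function uses the standard normalized pointwise mutual information definition, taking probabilities estimated elsewhere.

```python
import math

RU_VOWELS = set("аеёиоуыэюя")

def syllables(word):
    """Count syllables in a Russian word as the number of vowel letters."""
    return sum(ch in RU_VOWELS for ch in word.lower())

def sentence_metrics(tokens, heads):
    """Compute three per-sentence metrics (the choice of metrics is an assumption).

    tokens: non-empty list of word forms.
    heads:  1-based index of each token's syntactic head, 0 for the root.
    """
    # Linear distance between every non-root token and its head.
    dists = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return {
        "length": len(tokens),
        "mean_dep_len": sum(dists) / len(dists) if dists else 0.0,
        "mean_syllables": sum(map(syllables, tokens)) / len(tokens),
    }

def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]

def flag_hard(metrics, q=90):
    """Indices of sentences above the q-th percentile on all three metrics.

    The cutoff q is a hypothetical parameter, not a value from the paper.
    """
    keys = ("length", "mean_dep_len", "mean_syllables")
    cut = {k: percentile([m[k] for m in metrics], q) for k in keys}
    return [i for i, m in enumerate(metrics) if all(m[k] > cut[k] for k in keys)]

def npmi(p_xy, p_x, p_y):
    """Normalized PMI of a word pair; requires 0 < p_xy < 1.

    Ranges from -1 (never co-occur) through 0 (independent) to 1 (always together).
    """
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
```

Requiring a sentence to exceed the cutoff on every metric at once mirrors the "narrower" selection described in the abstract, while a cutoff on `length` alone would correspond to the "wider" selection of overly long sentences.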
For citation: Saveliev D.V. (2020) A Study in Complexity of Sentences Constituting Russian Federation Legal Acts. Pravo. Zhurnal Vysshey shkoly ekonomiki, no 1, pp. 50–74 (in Russian) DOI: 10.17323/2072-8166.2020.1.50.74