On Creating and Using Text of the Russian Federation Corpus of Legal Acts as an Open Dataset
Abstract
Computer-aided methods of text analysis now under development can be useful for legal research and legal practice. An obvious prerequisite for such analysis is an open, structured corpus of texts. The article presents such a corpus: RusLawOD, a machine-readable dataset of the texts of federal and regional legal acts. It is publicly available on GitHub. The dataset is built from open sources of legal acts, primarily the Official Internet Portal of Legal Information (pravo.gov.ru), and integrates the open data on officially published legal acts with the Zakonodatelstvo Rossii legal information system. The main legal research question in developing this resource was how to publish the texts of legal acts and their metadata. A common nationwide standard for describing legal acts in machine-readable form is needed so that data can be exchanged between different information systems. This requires uniform names for the attributes that identify a document and describe its internal structure. The article proposes solutions that can serve as a basis for such a standard. Besides describing the data, it gives examples of how the data can help solve legal research problems, such as classifying legal acts and determining the frequency of collocations of particular terms. By analyzing the metadata of documents published on the official site, the author reconstructed the classifier of subject headings actually in use and counted how often each heading is used.
The author compares the existing classification of legal acts with the results of applying computational linguistics methods to determine the most frequent subjects in legislation, and concludes that modern methods of computer-based text analysis make it possible to obtain valuable and well-grounded results.
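The two examples named in the abstract, counting how often each subject heading is used and measuring the frequency of term collocations, can be illustrated with a minimal sketch. It is not the author's implementation: the record layout (the `themes` and `text` fields), the sample records, and the function names are hypothetical and do not reflect the actual RusLawOD schema. Collocations are approximated here as adjacent word pairs; for inflected Russian text a lemmatization step would normally come first.

```python
from collections import Counter
import re

# Hypothetical sample records; "themes" and "text" are illustrative field
# names, not the actual attributes of the RusLawOD dataset.
docs = [
    {"themes": ["Taxes", "Budget"], "text": "federal law on the federal budget"},
    {"themes": ["Taxes"], "text": "the federal budget and the tax code"},
]


def theme_usage(records):
    """Count in how many documents each subject heading appears."""
    counts = Counter()
    for rec in records:
        # set() ensures a heading repeated within one document counts once.
        counts.update(set(rec.get("themes", [])))
    return counts


def bigram_counts(records):
    """Count adjacent word pairs as a simple stand-in for collocations."""
    counts = Counter()
    for rec in records:
        tokens = re.findall(r"\w+", rec.get("text", "").lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


themes = theme_usage(docs)
bigrams = bigram_counts(docs)
print(themes.most_common())
print(bigrams.most_common(3))
```

Running the same two functions over real metadata would reconstruct the list of headings actually in use as a by-product, since every heading that occurs at least once becomes a key of the counter.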
Copyright (c) 2018 Law. Journal of the Higher School of Economics

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.












