On the Creation and Prospects of Using a Corpus of Texts of Russian Legal Acts as an Open Dataset

D.A. Savel'ev

doi:10.17323/2072-8166.2018.1.26.44

Abstract

Methods of computer-aided text analysis that are currently being developed can be useful for research in legal science and in legal practice. An obvious prerequisite for such analysis is the availability of an open, structured corpus of texts. The article presents such a corpus of texts of federal and regional legal acts in machine-readable form, as the RusLawOD dataset, which is publicly available on GitHub. The dataset is built from open sources of legal acts, primarily the Official Internet Portal of Legal Information (pravo.gov.ru), by integrating open data on officially published legal acts with the Zakonodatelstvo Rossii legal information system. The main legal research question in developing this resource was how to publish the texts of legal acts together with metadata about them. To allow data exchange between different information systems, a common nationwide standard for describing legal acts in machine-readable form is needed. This requires uniform names for the attributes that identify a document, as well as a uniform description of its internal structure. The article proposes solutions that can serve as a basis for such a standard. In addition to describing the data, the article gives examples of how they can help solve legal research problems, such as classifying legal acts and determining the frequency of collocations of particular terms. By analyzing the metadata of documents published on the official portal, the classifier of themes actually in use was reconstructed, and the frequency with which each theme is used was counted. The author compares the existing classification of legal acts with the results of applying computational linguistics methods to identify the subjects most frequently addressed in legislation, and concludes that modern methods of computer-aided text analysis make it possible to obtain valuable and proven results.
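The two analyses named above, counting how often each theme occurs in document metadata and counting the collocations of a given term, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the `LegalAct` record and its field names (`doc_type`, `doc_number`, `doc_date`, `issued_by`, `themes`, `text`) are hypothetical stand-ins for the uniform attribute set the article calls for, not the actual RusLawOD schema, and collocations are approximated by raw counts of adjacent word pairs (bigrams), without lemmatization or statistical association measures.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LegalAct:
    # Hypothetical attribute names, for illustration only; not the RusLawOD schema.
    doc_type: str          # kind of act, e.g. "federal law" or "decree"
    doc_number: str        # registration number of the act
    doc_date: str          # signing date as an ISO 8601 string
    issued_by: str         # issuing body
    themes: list[str] = field(default_factory=list)  # classifier themes from metadata
    text: str = ""         # full text of the act


def count_themes(acts):
    """Count how many acts are tagged with each theme."""
    counter = Counter()
    for act in acts:
        # set() so that a theme listed twice for one act is counted once
        counter.update(set(act.themes))
    return counter


def tokenize(text):
    # Lowercased word tokens; \w in a str pattern also matches Cyrillic letters.
    return re.findall(r"\w+", text.lower())


def bigram_frequencies(acts):
    """Count adjacent word pairs across all act texts (sentence breaks are ignored)."""
    counter = Counter()
    for act in acts:
        tokens = tokenize(act.text)
        counter.update(zip(tokens, tokens[1:]))
    return counter


def collocates(term, acts):
    """Count the words that occur immediately before or after `term`."""
    term = term.lower()
    counter = Counter()
    for (w1, w2), n in bigram_frequencies(acts).items():
        if w1 == term:
            counter[w2] += n
        if w2 == term:
            counter[w1] += n
    return counter
```

In use, one would load `LegalAct` objects from an export of the dataset and call `count_themes(acts).most_common(10)` for the most frequent themes, or `collocates("кодекс", acts).most_common(10)` for the words most often adjacent to «кодекс». For Russian legal texts, lemmatizing tokens first, so that inflected forms of one word are merged, would probably matter more than any other choice made in this sketch.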

On Creating and Using Text of the Russian Federation Corpus of Legal Acts as an Open Dataset

Abstract