Data lineage construction method for complex data audit requirements

Pan Xiaohua¹
Jin Yong²
Gao Yanghua²
Zhu Xinzhou¹
Shen Shijing³
1. School of Software Technology, Zhejiang University, Hangzhou 310058, China
2. Information Center, China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310007, China
3. Innovation Centre for Information Science, Binjiang Institute of Zhejiang University, Hangzhou 310053, China

Abstract

For complex data audit requirements, existing methods rely on querying and analyzing the information of each statement executed in the database, which makes data auditing inefficient. Some methods use data lineage tools for fast lookup, but they require intrusive access to the system to obtain source code, which can easily lead to data leakage or malicious tampering. To address these issues, this paper proposes a data lineage construction method for complex data audit requirements that integrates key techniques such as log preprocessing, data relationship analysis, and data alignment. By analyzing the system's runtime log information, the method constructs the data lineage graph non-invasively, and it forms the basis of a data audit tool for inbound and outbound tobacco logistics. The test dataset consists of 155,728 transaction logs corresponding to 13,796 batches of goods in tobacco logistics, and the comparative experiments cover three aspects: completeness, construction cost, and data audit efficiency. The experimental results show that the proposed method completes the query task within 10 s and uses 1.23 MB of memory per hundred items, which is markedly less than existing methods. Compared with existing methods, the proposed method constructs a complete and accurate data lineage at data-level granularity, and using this lineage greatly improves the efficiency of data auditing in cigarette logistics.
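As a rough illustration of the log-based, non-invasive idea described in the abstract, the following minimal Python sketch parses transaction logs into source-to-target records, builds a lineage graph from them, and traces an item back to its origins. The log line format, the item identifiers, and the function names `preprocess`, `build_lineage`, and `trace_upstream` are all assumptions made for this example; the abstract does not specify the paper's actual log format or algorithms, and the data alignment step is not modeled here.

```python
from collections import defaultdict, deque

# Hypothetical, simplified log lines: "<timestamp> <operation> <source> -> <target>".
# The real log format used in the paper is not given in the abstract.
SAMPLE_LOGS = [
    "2023-01-01T08:00:00 INBOUND batch:B001 -> stock:S100",
    "2023-01-01T09:30:00 MOVE stock:S100 -> stock:S200",
    "2023-01-01T10:15:00 OUTBOUND stock:S200 -> order:O900",
    "malformed line that preprocessing should drop",
]


def preprocess(lines):
    """Parse raw log lines into (operation, source, target) records, skipping malformed ones."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) == 5 and parts[3] == "->":
            _, operation, source, _, target = parts
            records.append((operation, source, target))
    return records


def build_lineage(records):
    """Build a reverse adjacency map: target item -> list of (source item, operation)."""
    upstream = defaultdict(list)
    for operation, source, target in records:
        upstream[target].append((source, operation))
    return upstream


def trace_upstream(upstream, item):
    """Breadth-first walk from an item back to all of its ancestors."""
    seen, order, queue = {item}, [], deque([item])
    while queue:
        current = queue.popleft()
        for source, _ in upstream.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


lineage = build_lineage(preprocess(SAMPLE_LOGS))
print(trace_upstream(lineage, "order:O900"))
# prints ['stock:S200', 'stock:S100', 'batch:B001']
```

In this sketch an audit query becomes a graph traversal over records derived from logs alone, without reading application source code or re-querying individual database statements.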

Foundation Support

Science and Technology Program of Zhejiang Province (2023C01213)
"Pioneer" and "Leading Goose" R&D Program

Publication Information

DOI: 10.19734/j.issn.1001-3695.2023.05.0214
Published in: Application Research of Computers (printed article), Vol. 41, No. 1, 2024
Section: Algorithm Research & Explore
Pages: 76-82
Serial Number: 1001-3695(2024)01-012-0076-07

Publication History

[2023-10-07] Accepted Paper
[2024-01-05] Printed Article

Cite This Article

潘晓华, 金泳, 高扬华, 等. 面向复杂数据审计需求的数据血缘构建方法 [J]. 计算机应用研究, 2024, 41(1): 76-82. (Pan Xiaohua, Jin Yong, Gao Yanghua, et al. Data lineage construction method for complex data audit requirements [J]. Application Research of Computers, 2024, 41(1): 76-82.)

About the Journal

  • Application Research of Computers (monthly journal)
  • Journal ID: ISSN 1001-3695; CN 51-1196/TP

Application Research of Computers, founded in 1984, is an academic journal of computing technology sponsored by the Sichuan Institute of Computer Sciences under the Science and Technology Department of Sichuan Province.

Focusing on urgently needed cutting-edge technology in the discipline, Application Research of Computers promptly reflects the mainstream technologies, active topics, and latest trends in computer application research in China and abroad. The journal publishes high-level academic papers, the latest scientific research results, and major application results. Its columns cover new theories in computer science, basic computer theory, algorithm theory, algorithm design and analysis, blockchain technology, system software and software engineering, pattern recognition and artificial intelligence, architecture, advanced computing, parallel processing, database technology, computer networks and communication technology, information security, and computer graphics and image processing, together with their latest applications.

Application Research of Computers has a large high-level readership and author base. Its readers are mainly senior and mid-level researchers and engineers in computer science, as well as teachers and students of computer science and related majors at colleges and universities. Over the years, the journal's total citation frequency and Web download rate have ranked among the highest of comparable academic journals in the discipline, and its papers are well regarded by readers for their novelty, academic rigor, foresight, guidance, and practicality.


Indexing & Evaluation

  • The Second National Periodical Award 100 Key Journals
  • Double Effect Journal of China Journal Formation
  • Core Journal of China (Peking University, 2023 Edition)
  • Core Journal for Science
  • Chinese Science Citation Database (CSCD) Source Journals
  • RCCSE Chinese Core Academic Journals
  • Journal of China Computer Federation
  • 2020-2022 The World Journal Clout Index (WJCI) Report of Scientific and Technological Periodicals
  • Full-text Source Journal of China Science and Technology Periodicals Database
  • Source Journal of China Academic Journals Comprehensive Evaluation Database
  • Source Journals of China Academic Journals (CD-ROM Version), China Journal Network
  • 2017-2019 China Outstanding Academic Journals with International Influence (Natural Science and Engineering Technology)
  • Source Journal of Top Academic Papers (F5000) Program of China's Excellent Science and Technology Journals
  • Source Journal of China Engineering Technology Electronic Information Network and Electronic Technology Literature Database
  • Source Journal of British Science Digest (INSPEC)
  • Japan Science and Technology Agency (JST) Source Journal
  • Russian Journal of Abstracts (AJ, VINITI) Source Journals
  • Full-text Journal of EBSCO, USA
  • Cambridge Scientific Abstracts (Natural Sciences) (CSA(NS)) core journals
  • Poland Copernicus Index (IC)
  • Ulrichsweb (USA)