XML Data Integrity Based on Concatenated Hash Function
IJCSIS Publication Vol.1 No.1 May 2009 Special Issue
I. INTRODUCTION
General applications of data integrity exist in many domains, including e-government, e-commerce, e-financial services, e-business, e-banking, e-learning, e-healthcare, mobile communications, heterogeneous networks, digital factories, multi-agent systems, and grid computing [1-14]. For example, Wu and Chen described the need for data integrity when official documents are transmitted between government agencies for e-government in Taiwan [1, 2]. O’Neill pointed out the importance of data integrity through an assessment of a bank’s web service. IBM gives the following example of data integrity. Assume the data is a funds transfer and a hacker alters a random piece of the data that happens to be the account number. When the bank decrypts the data, the account number is not a valid account; therefore, the tampering is detected and the transaction is not completed. However, assume instead that the hacker alters the amount of money, changing it from 1000 units to 9000 units. In this case, the transaction would be completed with the incorrect amount. Research into data integrity would therefore be of great benefit.
There are two approaches to ensuring integrity for XML data. The first adds extra elements to the XML data to record integrity information [16, 17]. Because it uses no cryptography, this method is easily attacked. The second approach is based on a cryptographic mechanism and adopts a hash function to ensure integrity [18, 19, 20, 21, 22]. This cryptography-based approach offers a higher security level than the first, but integrity models of this kind for XML data have two major shortcomings. Firstly, existing integrity models generate a digest value for XML data content without considering XML data features.
For non-XML data formats, a user can directly generate a digest value of the data content to ensure integrity, but protecting content integrity alone is not enough for XML data. For example, signed XML data can be copied into another document while its signature remains valid, and an attacker can exploit this to forge a document that carries a valid signature. Therefore, besides content integrity, XML data integrity should also consider element location information and element context meaning in a fine-grained security setting. In this paper, fine-grained security means that users encrypt or sign portions of XML data rather than the whole document. The location information of an XML element refers to the position of this element in the XML data. An element's full meaning depends on its position in the XML data, and the element loses its original meaning if that position is changed. Thus, XML data integrity should also protect the location information of an XML element. Another factor that affects the meaning of XML elements is the context relationship. For example, the element will no longer have its original meaning without its context relationship in the XML data. This paper defines this as context referential integrity: an XML element has its full meaning only in relation to other elements in the same XML data, yet existing integrity models for XML data provide no mechanism to protect this meaning. Secondly, most of these models are based on the Merkle hash tree [20, 21]. When generating a digest value, the Merkle hash tree adds virtual nodes; these virtual nodes increase the number of hash operations, which lowers the efficiency of digest-value generation. Motivated by these problems, this paper presents XML data integrity requirements that take XML data features into account.
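To make the relocation problem concrete, the following Python sketch (standard library only) compares a content-only digest with one that also hashes the element's absolute path. The helper names `absolute_path`, `content_digest` and `located_digest`, the path format, the sample documents, and the use of `ET.tostring` in place of proper XML canonicalization are illustrative assumptions, not the paper's method.

```python
import hashlib
import xml.etree.ElementTree as ET


def absolute_path(root: ET.Element, target: ET.Element) -> str:
    """Return an illustrative absolute path such as '/transfer/amount'.

    ElementTree keeps no parent pointers, so the path is found by a
    depth-first search from the root. Sibling positions are ignored here.
    """
    def walk(node: ET.Element, prefix: str):
        path = f"{prefix}/{node.tag}"
        if node is target:
            return path
        for child in node:
            found = walk(child, path)
            if found is not None:
                return found
        return None

    result = walk(root, "")
    if result is None:
        raise ValueError("target is not a descendant of root")
    return result


def content_digest(elem: ET.Element) -> str:
    # Content-only digest: hashes the serialized element and nothing else.
    # ET.tostring is not canonical XML (C14N); it stands in for it here.
    return hashlib.sha256(ET.tostring(elem)).hexdigest()


def located_digest(root: ET.Element, elem: ET.Element) -> str:
    # Binds the content to its location by hashing the path with the content.
    path = absolute_path(root, elem).encode("utf-8")
    return hashlib.sha256(path + b"\x00" + ET.tostring(elem)).hexdigest()


# A signed <amount> element and a copy of it pasted into another document.
transfer = ET.fromstring(
    "<transfer><account>12345</account><amount>1000</amount></transfer>")
invoice = ET.fromstring("<invoice><amount>1000</amount></invoice>")
original = transfer.find("amount")
copied = invoice.find("amount")

# The content-only digest cannot tell the two locations apart...
print(content_digest(original) == content_digest(copied))  # True
# ...while the path-bound digest exposes the relocation.
print(located_digest(transfer, original) == located_digest(invoice, copied))  # False
```

The copied element serializes to the same bytes in both documents, so only the digest that includes the location detects the move.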
Based on the XML data integrity requirements presented, this paper proposes an integrity model for XML data and improves the efficiency of digest-value generation for XML data.
This paper presents an XML data integrity model named CSR. The model consists of three parts, and CSR is an acronym for these parts: ‘C’ for content integrity, ‘S’ for structure integrity, and ‘R’ for context referential integrity. The three parts are combined with a concatenated hash function. Content integrity ensures the integrity of XML data content by using a concatenated hash function. Structure integrity protects the location information of an element in XML data by hashing its absolute path string from the root node. Finally, context referential integrity protects the integrity of context-related elements. This paper also describes how the model is combined with the XML specification and integrates the proposed integrity model into XML Signature. The evaluation shows that the proposed integrity model generates digest values more efficiently than the Merkle hash tree-based integrity model for XML data.
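This section does not define the concatenated hash function or how the three CSR parts are combined, so the Python sketch below is only one plausible reading, not the authors' construction. It assumes (1) a content hash computed as a single hash over an element's own data concatenated with its children's digests, (2) a structure hash over an XPath-like absolute path string, (3) a context hash over a caller-supplied list of related element paths, and (4) a final digest over the concatenation of the three. The names `sha`, `index_paths`, `content_hash` and `csr_digest`, the path syntax, and the sample order document are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def index_paths(root: ET.Element) -> dict:
    """Map illustrative absolute paths like '/order[1]/item[1]' to elements."""
    paths = {}

    def walk(node: ET.Element, path: str) -> None:
        paths[path] = node
        counts = {}
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return paths


def content_hash(elem: ET.Element) -> str:
    # C (assumed form): one hash over the element's tag, attributes, text and
    # the digests of its children, concatenated in document order. Each element
    # is hashed exactly once; no virtual nodes are added. Tail text is ignored.
    parts = [elem.tag]
    parts += [f"@{k}={v}" for k, v in sorted(elem.attrib.items())]
    parts.append("#" + (elem.text or ""))
    parts += ["/" + content_hash(child) for child in elem]
    return sha("\x00".join(parts))


def csr_digest(root: ET.Element, path: str, related_paths=()) -> str:
    paths = index_paths(root)
    c = content_hash(paths[path])                    # content integrity
    s = sha(path)                                    # structure (location) integrity
    r = sha("\x00".join(p + ":" + content_hash(paths[p])
                        for p in related_paths))     # context referential integrity
    return sha(c + s + r)  # c, s, r are fixed-length hex strings


root = ET.fromstring(
    "<order><item><price>1000</price><currency>GBP</currency></item></order>")
target = "/order[1]/item[1]/price[1]"
related = ["/order[1]/item[1]/currency[1]"]

price_before = content_hash(root.find("item/price"))
digest_before = csr_digest(root, target, related)

root.find("item/currency").text = "JPY"  # tamper only with the related element

price_after = content_hash(root.find("item/price"))
digest_after = csr_digest(root, target, related)

print(price_before == price_after)    # True: a content-only check misses it
print(digest_before == digest_after)  # False: the R component exposes it
```

Because each component has a fixed-length hex form, concatenating them is unambiguous. A real deployment would also canonicalize the XML before hashing, as XML Signature does.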
Baolong Liu, School of Computing & Engineering, University of Huddersfield, Huddersfield, UK
Joan Lu, School of Computing & Engineering, University of Huddersfield, Huddersfield, UK
Jim Yip, School of Computing & Engineering, University of Huddersfield, Huddersfield, UK