The main result in this thesis is the definition of a difference operator for XML documents and the implementation of a corresponding algorithm. The difference operator is similar to the diff Unix command, but it applies to structured documents rather than to plain files. The intended usage for the operator is to compare XML documents or webpages to identify updates and changes.
It is suggested to model a simplified version of XML as terms. A partial order relation is defined for such terms and a difference operator is formally defined with basis in the partial order. The operator must satisfy four properties, one of which is a minimality cirtierion. With these properties the result of the operator is a unique term. A difference algorithm is presented and is proved to be correct with regards to the definition of the operator. The proofs are by structural induction on the terms. The algorithm is implemented in Haskell as a function which ranges over a type representing the simplified version of XML.
The operator is initially defined for a simplified version of XML to discover the principal properties for such an operator. The results obtained for the simple model can be generalised to a model for proper XML. The thesis does not contain a formal definition for such an extended model but a Haskell function which ranges over a type representing XML without simplifications is implemented. Some examples of the application of the latter function to XHTML documents is presented.