How to diff XML files or assemblies thereof, such as OpenXML or ODF files

Office Open XML and Open Document Format for Office Applications (ODF) are both zipped, XML-based file formats whose contents can be extracted into directories.

Diffing XML files is tricky because common diff tools can be thrown off by differing line endings (LF vs. CR-LF), a missing final line ending, and differences in whitespace and indentation.
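
As a small illustration of the line-ending problem, the sketch below writes the same XML twice, once with LF and once with CR-LF endings, and compares the two. The file names are made up for the demonstration, and `--strip-trailing-cr` is a GNU diff option, so this assumes GNU diffutils is installed.

```shell
# Work in a scratch directory (file names are illustrative only).
tmp=$(mktemp -d)

# Same XML content, once with LF and once with CR-LF line endings.
printf '<a>\n  <b/>\n</a>\n'       > "$tmp/lf.xml"
printf '<a>\r\n  <b/>\r\n</a>\r\n' > "$tmp/crlf.xml"

# A plain diff treats every line as changed (non-zero exit status).
if diff "$tmp/lf.xml" "$tmp/crlf.xml" >/dev/null; then
    plain=same
else
    plain=different
fi

# GNU diff can be told to ignore the trailing CR on each line.
if diff --strip-trailing-cr "$tmp/lf.xml" "$tmp/crlf.xml" >/dev/null; then
    stripped=same
else
    stripped=different
fi

echo "plain diff: $plain; with --strip-trailing-cr: $stripped"
```

With GNU diffutils this should print `plain diff: different; with --strip-trailing-cr: same`. An option like this only hides the line-ending noise; differences in indentation still show up, which is why normalizing the files before comparing them tends to be more reliable.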

We use the following recipe to compare two directories that contain XML files:

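# Extract both archives into their own directories.
unzip A.xlsx -d A
unzip B.xlsx -d B
# Convert CR-LF line endings to LF.
find A/ -type f | xargs fromdos
find B/ -type f | xargs fromdos
# Re-indent every file in place.
find A/ -type f | xargs xmlindent -w
find B/ -type f | xargs xmlindent -w
# Delete the leftover backup files (names containing '~').
find A/ | grep '~' | xargs rm
find B/ | grep '~' | xargs rm
# Compare the two directory trees.
kdiff3 A/ B/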
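
One caveat: plain `xargs` splits its input on whitespace, so a path containing a space would be broken into pieces, and `grep '~'` matches a tilde anywhere in the path, not only at the end. The sketch below shows a more defensive way to hand file names around and to remove the backup copies. It assumes the backups end in `~` (as the recipe implies) and that `find` supports `-print0` and `-delete`, which GNU and BSD find do. The scratch tree and its file names are invented for the demonstration, and `wc -l` merely stands in for `fromdos` or `xmlindent -w`.

```shell
# Scratch tree standing in for an extracted archive (illustrative names).
tree=$(mktemp -d)
mkdir -p "$tree/xl/work sheets"
printf '<a/>\n' > "$tree/xl/work sheets/sheet 1.xml"
printf '<a/>\n' > "$tree/xl/work sheets/sheet 1.xml~"   # backup copy
printf '<a/>\n' > "$tree/xl/my~notes.xml"               # tilde mid-name, not a backup

# Null-delimited hand-off keeps names with spaces intact; wc -l is a
# placeholder for the real per-file tool.
find "$tree" -type f -print0 | xargs -0 wc -l >/dev/null

# Delete only regular files whose name ends in '~'.
find "$tree" -type f -name '*~' -delete

# List what is left.
remaining=$(find "$tree" -type f | sort)
echo "$remaining"
```

After this runs, only the backup copy is gone; the file with a space in its name and the file with a tilde in the middle of its name are both still present.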