File-format analysis tools

Format analysis applications identify and validate file formats. They verify the format that a file's extension announces and check how closely the file matches the corresponding format structure, which is especially useful at intake.

Droid

Website: http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/

License: BSD License.

Developer: The National Archives (UK), Digital Preservation department.

Operating system: Cross-platform.

Main features:

  • Desktop application written in Java (requires version 1.7 or higher).
  • It can be used from a graphical user interface or through the command line.
  • Can accurately identify more than 1400 specific file formats even when the file extension is wrong or missing. To do this, it uses internal signatures (magic numbers: characteristic byte sequences inside the file) and, when they exist, external ones (the file extension). Format identification is also linked to PRONOM, an online database that contains technical information about file formats, software and technical environments.
  • Extracts other useful information from the files, such as size, date of last modification and location. However, it does not validate formats or documentary metadata.
  • Automatically generates checksums for each file and directory using the SHA-1, SHA-256 (SHA-2) or MD5 algorithms.
  • Exports result reports in CSV, XML or PDF format.
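
The signature-first identification and the checksum generation described above can be illustrated with a minimal Python sketch. It is not DROID's implementation and it does not consult PRONOM: the small `SIGNATURES` and `EXTENSIONS` tables and the helper names `identify` and `checksum` are assumptions made only for this example.

```python
import hashlib
from pathlib import Path

# Hypothetical, hand-picked signature table; real tools rely on a registry such as PRONOM.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF",
    b"MM\x00*": "TIFF",
    b"PK\x03\x04": "ZIP",
}

# Hypothetical extension table, used only when no internal signature matches.
EXTENSIONS = {".pdf": "PDF", ".png": "PNG", ".tif": "TIFF", ".tiff": "TIFF", ".zip": "ZIP"}


def identify(path):
    """Return (format, method), preferring the internal signature over the extension."""
    path = Path(path)
    with path.open("rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name, "signature"
    name = EXTENSIONS.get(path.suffix.lower())
    if name is not None:
        return name, "extension"
    return "unknown", "none"


def checksum(path, algorithm="sha256"):
    """Hash a file in chunks; algorithm may be 'md5', 'sha1' or 'sha256'."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this sketch, a PDF that was saved as `report.txt` is still reported as `("PDF", "signature")`, because the leading bytes take precedence over the misleading extension.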

 

Jhove2

Website: https://github.com/opf-labs/jhove2

License: BSD License

Developer: California Digital Library, Portico, and Stanford University. It was funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP).

Operating system: Cross-platform.

Main features:

  • Written in Java 6 (requires Java SE Runtime Environment 6).
  • Characterizes digital objects: it identifies the format, validates it against the technical rules of the relevant standard, extracts technical metadata, and assesses whether the object can be accepted into the institution's repository under its established policies.
  • Able to characterize complex and hierarchical digital objects such as directories, files in zip format, bit streams nested within other files, etc.
  • Includes validation modules for formats such as the ICC color profile, SGML, Shapefile, TIFF (including TIFF/EP, TIFF-FX, TIFF/IT, Exif, GeoTIFF, DNG and RFC 1314), plain text encoded in UTF-8, WAVE (including Broadcast Wave), XML, ZIP, GZIP, ARC and WARC.
  • Jhove 1 supports JPEG 2000 and PDF (including the PDF/X and PDF/A versions); Jhove2 does not yet implement them.
  • It has no graphical user interface, so using it requires some familiarity with the terminal.
  • Allows you to export result reports in text and XML formats.
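
The idea of characterizing hierarchical objects (directories, ZIP files, and files nested inside them) can be sketched as a recursive walk. This is an illustration only, not Jhove2's code: the function names `characterize` and `characterize_bytes` are hypothetical, only ZIP containers are descended into, and each file is read fully into memory for simplicity.

```python
import io
import zipfile
from pathlib import Path


def characterize_bytes(name, data, depth=0):
    """Yield (depth, name, size) for a byte stream and anything nested inside it."""
    yield depth, name, len(data)
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue  # directory entries carry no content of their own
                yield from characterize_bytes(info.filename, archive.read(info), depth + 1)


def characterize(path, depth=0):
    """Walk a directory or a single file, descending into ZIP containers."""
    path = Path(path)
    if path.is_dir():
        yield depth, path.name, None  # size is left undefined for directories
        for child in sorted(path.iterdir()):
            yield from characterize(child, depth + 1)
    else:
        yield from characterize_bytes(path.name, path.read_bytes(), depth)
```

Each yielded tuple records how deep the item sits in the hierarchy, so a report can show, for example, a text file inside a ZIP that is itself inside another ZIP.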

 

Jpylyzer

Website: http://jpylyzer.openpreservation.org

License: GPL v3.

Developer: Johan van der Knijff / SCAPE Project.

Operating system: Cross-platform.

Main features:

  • Validator and metadata extractor for JP2, a file format defined by the JPEG 2000 standard.
  • Checks whether a JP2 file conforms to the format specification and reports the technical characteristics of the image.
  • Because it is designed specifically for JP2, it avoids some errors made by more generic tools, whose JP2 validation is more limited.
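
To give a sense of what format-specific validation involves, the sketch below walks the top-level boxes of a JP2 file and applies a few structural checks. It is not Jpylyzer's implementation and covers only a tiny fraction of what a real validator verifies; the function names `list_top_level_boxes` and `basic_jp2_checks` are hypothetical. It relies on the JP2 box layout, in which each box starts with a 4-byte big-endian length and a 4-byte type.

```python
import struct

# The 12-byte JP2 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def list_top_level_boxes(data):
    """Return [(box_type, offset, length)] for the top-level boxes of a JP2 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if length == 1:  # an extended 8-byte length follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif length == 0:  # the box runs to the end of the file
            length = len(data) - offset
        if length < header or offset + length > len(data):
            raise ValueError(f"box at offset {offset} has an invalid length")
        boxes.append((box_type.decode("latin-1"), offset, length))
        offset += length
    return boxes


def basic_jp2_checks(data):
    """Run a few structural checks; passing them does not prove the file is valid."""
    problems = []
    if not data.startswith(JP2_SIGNATURE):
        problems.append("missing JP2 signature box")
        return problems
    try:
        types = [box_type for box_type, _, _ in list_top_level_boxes(data)]
    except ValueError as error:
        problems.append(str(error))
        return problems
    if len(types) < 2 or types[1] != "ftyp":
        problems.append("File Type box does not follow the signature box")
    for required in ("jp2h", "jp2c"):  # JP2 Header box and codestream box
        if required not in types:
            problems.append(f"missing required box: {required}")
    return problems
```

An empty list from `basic_jp2_checks` only means the outer box structure looks plausible; a dedicated validator goes much further, inspecting the contents of each box and the codestream itself.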

 

DPF Manager

Website: http://dpfmanager.org

License: MPL v2+

Developer: DPF Manager is promoted by Easy Innova, the Digital Humanities Lab of the University of Basel and the Agents Research Lab of the University of Girona. The initiative was supported by PREFORMA, a pre-commercial procurement (PCP) project.

Operating system: Cross-platform.

Main features:

  • DPF Manager was created in parallel with Tiff Library 4J, an open source TIFF library for Java.
  • Validator and identifier dedicated to files in TIFF format.
  • Also available as an online version: http://dpfmanager.org/application.html
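
As a rough illustration of the first step in identifying a TIFF file, the sketch below parses the 8-byte TIFF header: a byte-order mark (`II` or `MM`), the number 42, and the offset of the first image file directory (IFD). It is not DPF Manager's code; the function name `read_tiff_header` is hypothetical, and BigTIFF (which uses 43 instead of 42) is deliberately left out.

```python
import struct


def read_tiff_header(data):
    """Return (byte_order, first_ifd_offset) or raise ValueError for a non-TIFF header."""
    if len(data) < 8:
        raise ValueError("a TIFF header needs at least 8 bytes")
    mark = data[:2]
    if mark == b"II":
        endian, byte_order = "<", "little-endian"
    elif mark == b"MM":
        endian, byte_order = ">", "big-endian"
    else:
        raise ValueError("unknown byte-order mark")
    # A 2-byte magic number and a 4-byte IFD offset, both in the declared byte order.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42")
    if ifd_offset < 8:
        raise ValueError("first IFD offset points inside the header")
    return byte_order, ifd_offset
```

A full validator would continue from the returned offset, reading the IFD entries and checking each tag against the TIFF specification.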