Fraunhofer IPK

Institute for Production Systems and Design Technology

Automation Technology - ST

Automated Virtual Reconstruction of Ripped Stasi Files

Fraunhofer IPK first began research and development on the digitization and reconstruction of damaged and destroyed documents in the mid-1990s. In April 2007, the Procurement Office of the Federal Ministry of the Interior (BMI) commissioned Fraunhofer IPK to carry out a pilot project: developing a method for the virtual reconstruction of the ripped files of the former German Democratic Republic’s State Security Service (Stasi). The pilot phase was set for four years, during which 400 of the more than 15,000 sacks of ripped documents were to be processed.

The first step was to lay the groundwork for gradually commissioning a virtual reconstruction system. The system comprises the following components: digitization hardware and a server system for the digitization workflow, a grid-based server system for the reconstruction workflow, the ePuzzler reconstruction module, and a software framework for context-sensitive control of the reconstruction modules.

The Overall Process

Developed by Fraunhofer IPK, the ePuzzler is reconstruction software that uses complex image processing and pattern recognition algorithms to automatically reassemble scanned scraps of paper into complete pages. It also offers tools for manually inspecting and correcting dubious or ambiguous ePuzzler results.

The ePuzzler consists of three key components: a feature extractor, a search area reducer, and a matcher. Because no two scraps of torn paper match exactly, it cannot be predicted how often, or in what sequence, the three modules will interact during a reconstruction run. This is why they are embedded in a complex software framework that runs them in a non-deterministic, adaptive workflow.

The method of virtual reconstruction resembles the way a person solves a jigsaw puzzle, comparing a number of features to decide whether two pieces fit together. In the same way, the ePuzzler first computes features of each paper scrap, such as shape, paper color, fonts, writing, and line ruling. These features are then used to reduce the number of possible combinations in the puzzle, which is especially important when dealing with large volumes of data: a smart reduction of the search area collects scraps with similar attributes into subgroups.

Actual reconstruction, or matching, takes place within these reduced subgroups. Matching compares the torn fragments for similarities along their edges. If two scraps of paper fit together, they are digitally “glued” and treated as a single larger fragment in the ongoing reconstruction process.
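The three stages described above (feature extraction, search area reduction, matching with digital "gluing") can be illustrated with a small toy sketch. This is not the ePuzzler's implementation, which relies on image processing and pattern recognition that this page does not detail. All names here (`Scrap`, `reduce_search_area`, `edges_fit`, `reconstruct`) are hypothetical, and the sketch assumes that each torn edge is given as a short tuple of integers and that two edges fit only when one is the exact mirror image of the other. It also uses a fixed loop, whereas the real modules run in an adaptive workflow.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Scrap:
    ids: frozenset       # ids of the original scraps contained in this fragment
    paper_color: str     # coarse feature (assumed to be already extracted)
    ruling: str          # coarse feature, e.g. "lined" or "blank"
    edges: tuple         # torn edges, each a tuple of ints (toy edge profile)


def features(scrap):
    """Coarse features used only to narrow down the search area."""
    return (scrap.paper_color, scrap.ruling)


def reduce_search_area(scraps):
    """Collect scraps with identical coarse features into subgroups."""
    groups = defaultdict(list)
    for scrap in scraps:
        groups[features(scrap)].append(scrap)
    return list(groups.values())


def edges_fit(edge_a, edge_b):
    """Toy criterion: edge_b, read backwards, is the negative of edge_a."""
    return len(edge_a) == len(edge_b) and all(
        x + y == 0 for x, y in zip(edge_a, reversed(edge_b))
    )


def find_match(a, b):
    """Return the first pair of fitting edges, or None if the scraps do not fit."""
    for edge_a in a.edges:
        for edge_b in b.edges:
            if edges_fit(edge_a, edge_b):
                return edge_a, edge_b
    return None


def glue(a, b, edge_a, edge_b):
    """Merge two scraps into one larger fragment; the joined edges disappear."""
    remaining = tuple(e for e in a.edges if e != edge_a) + tuple(
        e for e in b.edges if e != edge_b
    )
    return Scrap(a.ids | b.ids, a.paper_color, a.ruling, remaining)


def reconstruct(scraps):
    """Match only within each subgroup and repeat until nothing more fits."""
    result = []
    for group in reduce_search_area(scraps):
        merged = True
        while merged:
            merged = False
            for a, b in combinations(group, 2):
                match = find_match(a, b)
                if match:
                    group.remove(a)
                    group.remove(b)
                    group.append(glue(a, b, *match))
                    merged = True
                    break  # the group changed, so restart the pairing
        result.extend(group)
    return result


scraps = [
    Scrap(frozenset({1}), "white", "lined", ((1, -2, 3),)),
    Scrap(frozenset({2}), "white", "lined", ((-3, 2, -1), (4, 4))),
    Scrap(frozenset({3}), "white", "lined", ((-4, -4),)),
    Scrap(frozenset({4}), "yellow", "blank", ((5, 5),)),
]
for fragment in reconstruct(scraps):
    print(sorted(fragment.ids), fragment.paper_color)
```

In this toy data, the three white, lined scraps end up glued into one fragment, while the yellow scrap is never compared with them. This is the point of reducing the search area: comparing n scraps exhaustively takes n(n-1)/2 pairings, whereas grouping first limits the pairings to scraps within the same subgroup.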

Overall Process of the »Stasi Puzzle«
© Fraunhofer IPK

ePuzzler: Technology of the virtual reconstruction
© Fraunhofer IPK