Librarian uncovers historic files using ‘digital forensics’

Using modern technology and a background in computer science, data librarian Vincent Gray has been working for the past 10 years to extract historic files from more than 100 floppy discs and an outdated laptop. The digital forensics project has turned up detailed logs of 13th and 14th century agrarian practices in Winchester, England.

Krista Habermehl // Western News

Vincent Gray has more than 100 early-1980s floppy discs and a hefty, mustard-yellow, Back to the Future-looking laptop tucked away in his office.

There’s a wealth of information stored on the dated hardware – detailed logs of 13th and 14th century agrarian practices in Winchester, England, in fact – but the technology to read and make sense of it has gone by the wayside.

This is the conundrum Gray, a data librarian with Western Libraries, faced when History professor Eona Karakacili dropped the above-mentioned discs and laptop on his desk 10 years ago, asking if he could retrieve the information and help her make it publicly available.

Not one to shy away from a challenge, Gray put his background in computer science to the test to find a way not only to recover and read the data, but also to interpret it. Today, he is proud to report that hundreds upon hundreds of hours of work as a digital sleuth have paid off.

“I’ve cleaned the data to the extent that I can clean it – there are a couple of files I can’t make head nor tail of. None of the information there seems to make sense. Everything else has been cleaned up. I didn’t keep track of how many hours, but it’s 200, 300 or more. I feel good because if we can get this data out to where it can be re-used by people, it’s a whole aspect of history that’s important,” said Gray.

The process of extracting the historic data and turning it into something a broader research audience could use was anything but simple. To read the floppy discs, Gray salvaged an old computer, sourced obsolete software programs and wrote several BASIC programs of his own to open, sort and compare the multitude of files. A colleague at the time was able to provide a power supply for the laptop so Gray could recover those files as well.

“We had all of these files and what I had to do was look at them and try to figure out, first of all, what it was I should even be looking for. I was able to drag the files into WordPerfect – which I still use,” said Gray, smiling.

He was tickled to uncover a set of ‘codebooks’ that act as keys to the terminology and coding used in the actual data sets. Code numbers were used to identify the different manors in the area and the types of agricultural products recorded.

Once he understood the documentation, Gray painstakingly compared each and every file to pull out detailed information, including how much milk, butter and cheese each manor in the Winchester area produced, what taxes were paid to the Bishop, the crop rotation on each farm and even how many sheep were shorn between 1200 and 1350.

“What I really liked about this story is there’s this academic in the 1970s and 1980s who retrieves these files from a kind of ‘dead state’ and puts them in a tremendously cutting-edge technology – for the time – and then Vince has to undertake this secondary process of digital forensics to bring them back to life, to pull them out of history a second time,” said Robert Glushko, Associate Chief Librarian at Western Libraries.

Glushko said the university – and the library, in particular – is increasingly being asked to think about its research data management processes, something that will ultimately help avoid issues like the one Gray spent a decade tackling. “This (project) is what happens when we don’t have proper research data management and why the question is so complex. It’s the same expertise that’s been unwinding this problem for the past 10 years that we’re going to be relying on moving forward with research data management, as well.”

Once the data has been reviewed for potential inconsistencies or inaccuracies, it is hoped the information will be deposited and archived with the Interuniversity Consortium for Political and Social Research (ICPSR), where it will be available for use. The ICPSR is aware of the data and has asked to archive it once it is fully recovered.

“Certainly, the amount of work we’ve put in over the years to put this data into an accessible format is something I don’t want to lose. That would be tragic, particularly when I know people have been asking for access to them,” said Gray. “Realistically, it’s possible no one ever uses this information again. But, it’s now going to be available for them if they want to use it or they decide it’s worth comparing to other records they have.”