Saturday, 30 August 2014

Free the phantom images - 12 million historic images to be released

An American academic has uploaded 2.6 million public domain images to the photo-sharing service Flickr to allow users to take a "digital trip through time". Kalev Leetaru's astonishing collection of fully tagged images and drawings was extracted from books scanned by the Internet Archive as part of its digitisation process.

Leetaru ultimately aims to upload 12 million images and is urging others to join in the process and to include the accompanying text. He told the BBC: "Any library could repeat this process. It's actually my hope that libraries around the world run this same process on their digitised books to constantly expand this universe of images."

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of that process, the software recognised which parts of a page were pictures so that it could discard them. Mr Leetaru reversed the process: he wrote code that used this information to go back to the original scans, extract the regions the OCR program had ignored, and save each one as a separate file in the JPEG format.
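For readers curious about the mechanics, here is a minimal Python sketch of the "reverse the OCR" idea. It is not Mr Leetaru's actual code. It assumes a hypothetical, simplified OCR layout in which each block has a type label ("text" or "picture") and a bounding box, and it represents a page scan as a grid of grayscale pixel values. Real OCR engines use their own output formats, and a real pipeline would write each crop to a JPEG file with an imaging library rather than keep it in memory.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical simplified OCR layout block: a type label plus a bounding box
# given as (left, top, right, bottom) pixel coordinates, right/bottom exclusive.
@dataclass
class Block:
    kind: str  # assumed labels: "text" or "picture"
    bbox: Tuple[int, int, int, int]

Page = List[List[int]]  # a page scan as rows of grayscale pixel values

def crop(page: Page, bbox: Tuple[int, int, int, int]) -> Page:
    """Return the sub-image of `page` that lies inside `bbox`."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in page[top:bottom]]

def extract_picture_regions(page: Page, blocks: List[Block]) -> List[Page]:
    """Invert the OCR step: keep the regions it flagged as pictures
    (which it would otherwise discard) and ignore the text blocks."""
    return [crop(page, b.bbox) for b in blocks if b.kind == "picture"]

if __name__ == "__main__":
    # A tiny 4x6 "scan": 0 = blank paper, 9 = ink of an illustration.
    page = [
        [0, 0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0, 0],
        [0, 9, 9, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    blocks = [Block("text", (3, 0, 6, 4)), Block("picture", (1, 1, 3, 3))]
    for region in extract_picture_regions(page, blocks):
        print(region)  # [[9, 9], [9, 9]]
```

The point of the sketch is that no new image analysis is needed: the OCR program has already done the hard work of locating the pictures, so extracting them is simply a matter of cropping the boxes it set aside.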


The images are all tagged, meaning they can be easily searched. The software also copied the caption for each image, along with the text of the paragraphs immediately preceding and following it in the book. Mr Leetaru said:
"Type in the telephone, for example, and you can see that all the initial pictures are of businesspeople, and mostly men. Then you see it morph into more of a tool to connect families," adding: "You see another progression with the railroad, where in the first images it was all about innovation and progress that was going to change the world, then you see its evolution as it becomes part of everyday life."
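The tagging step can be pictured with another minimal Python sketch, again an illustration rather than the project's real code. It assumes a hypothetical list of page blocks in reading order, labelled "paragraph", "caption" or "picture", and that a caption directly follows its picture. For each picture it records the caption plus the nearest paragraph before and after, and a simple case-insensitive keyword search over those records stands in for searching the tags.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical reading-order block; kind is assumed to be one of
# "paragraph", "caption" or "picture".
@dataclass
class Block:
    kind: str
    text: str = ""

Record = Dict[str, Optional[str]]

def _nearest_paragraph(blocks: List[Block], start: int, step: int) -> Optional[str]:
    """Walk from `start` in direction `step` (-1 or +1) to the nearest paragraph."""
    i = start + step
    while 0 <= i < len(blocks):
        if blocks[i].kind == "paragraph":
            return blocks[i].text
        i += step
    return None

def picture_metadata(blocks: List[Block]) -> List[Record]:
    """For each picture, record its caption and the surrounding paragraphs."""
    records = []
    for i, block in enumerate(blocks):
        if block.kind != "picture":
            continue
        # Assumption: a caption, if present, directly follows its picture.
        has_caption = i + 1 < len(blocks) and blocks[i + 1].kind == "caption"
        records.append({
            "caption": blocks[i + 1].text if has_caption else None,
            "text_before": _nearest_paragraph(blocks, i, -1),
            "text_after": _nearest_paragraph(blocks, i, +1),
        })
    return records

def search(records: List[Record], term: str) -> List[Record]:
    """Return records whose caption or surrounding text contains `term`."""
    term = term.lower()
    return [r for r in records if any(v and term in v.lower() for v in r.values())]

if __name__ == "__main__":
    book = [
        Block("paragraph", "The telephone was first sold to businesses."),
        Block("picture"),
        Block("caption", "Fig. 1. A desk telephone."),
        Block("paragraph", "Later it reached the family home."),
    ]
    records = picture_metadata(book)
    print(records[0]["caption"])            # Fig. 1. A desk telephone.
    print(len(search(records, "telephone")))  # 1
```

Keeping the surrounding paragraphs, not just the caption, is what makes searches like "telephone" useful: an image is often described in the body text even when its caption is terse or missing.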

The images are at https://www.flickr.com/photos/internetarchivebookimages/ and there is more on the BBC website.
