Hacking the Internet Archive - URLs
To start out, I decided to poke around inside the HTML source code for a details page, in this case a simple image upload.
One thing to notice is that archive.org is now using jQuery, Bootstrap and other interesting frameworks. Fun stuff! We'll come back to that in a later post.
When the detail page is for a book (mediatype: texts), there are lots of interesting things to say. I'll devote an entire post or two to this topic.
An easy thing to do in most browsers is to search the page source for the item's identifier, preceded by a slash, to find interesting URLs related to the archive.org item. Here are some that I found so far:
The item home page is located at '/details', followed by the unique id for your archive.org upload item. By default, archive.org shows single images in the 'theater' area of the page if it can find one.
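If you'd rather build that URL in code, here's a minimal Python sketch. The function name and the `https://` base are my own assumptions for illustration, and the identifier is the `AliciasShoes` item used elsewhere in this post.

```python
from urllib.parse import quote

# Assumption: https works as well as the http:// form used in this post.
ARCHIVE_BASE = "https://archive.org"


def details_url(identifier: str) -> str:
    """Build the item home page URL: '/details' followed by the item id."""
    # Percent-encode the identifier so odd characters cannot break the path.
    return f"{ARCHIVE_BASE}/details/{quote(identifier, safe='')}"


print(details_url("AliciasShoes"))
```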
BTW, here is the full list of archive.org media types as shown in the media type pull-down on the archive.org advanced search page (/advancedsearch.php):
Notice that you can get a thumbnail, if it exists, using '/services/img' followed by the item's identifier. Here's an example: I simply asked Blogger to insert an image from a URL, then typed in 'http://archive.org/services/img/AliciasShoes' and presto!
I wonder what other services there are?
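The thumbnail trick is easy to wrap up in code. This is only a sketch: the helper names are made up for this example, and it assumes the thumbnail lives at '/services/img' followed by the item identifier.

```python
from html import escape
from urllib.parse import quote


def thumbnail_url(identifier: str, base: str = "https://archive.org") -> str:
    """Build the '/services/img' thumbnail URL for an item (assumed pattern)."""
    return f"{base}/services/img/{quote(identifier, safe='')}"


def thumbnail_img_tag(identifier: str, alt: str = "") -> str:
    """Return an <img> tag that points at the item's thumbnail."""
    src = escape(thumbnail_url(identifier), quote=True)
    return f'<img src="{src}" alt="{escape(alt, quote=True)}">'


print(thumbnail_img_tag("AliciasShoes", alt="Alicia's shoes"))
```

Nothing here checks that a thumbnail actually exists for the item, so the page using the tag should be ready for a missing image.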
You can access a specific file by using '/download'. This is useful for image or video 'src' attributes, 'json' data and more. Here's an example:
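In code, the pattern might look something like this rough Python sketch. It assumes the path is '/download', then the item identifier, then the file name. The file name `my photo.jpg` is purely hypothetical, not a real file in any item.

```python
from urllib.parse import quote


def download_url(identifier: str, filename: str,
                 base: str = "https://archive.org") -> str:
    """Build a direct link to one file in an item (assumed path layout)."""
    # quote() leaves '/' alone by default, so files in subfolders keep their path.
    return f"{base}/download/{quote(identifier, safe='')}/{quote(filename)}"


# Hypothetical file name, used only to show the encoding of the space.
print(download_url("AliciasShoes", "my photo.jpg"))
```

The result can be dropped straight into an image or video 'src' attribute.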
I have not made use of the '/compress' feature yet, but it looks handy. More to come on that one for sure.
I'm not sure what '/embed' does at this time. I think it is for embedding an iframe into another webpage, but it seems to be broken as of this writing. :(
That's the iframe above, that big white space. The main div inside the embed appears to be empty on this and other detail pages I have tried today.
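For reference, here is roughly how I would generate that iframe markup in Python. This is a sketch based on my guess that '/embed' followed by the identifier is meant to be used as an iframe source. The function name and the default size are arbitrary choices of mine.

```python
from html import escape
from urllib.parse import quote


def embed_iframe(identifier: str, width: int = 640, height: int = 480,
                 base: str = "https://archive.org") -> str:
    """Return iframe markup pointing at the item's '/embed' URL (assumed use)."""
    src = escape(f"{base}/embed/{quote(identifier, safe='')}", quote=True)
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'


print(embed_iframe("AliciasShoes"))
```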
OK, that's it for the first post. Just a simple introduction to some important features. In future posts, we'll look at some cool tools for working with the Internet Archive and ways to use them to build interesting web pages.
Labels: archive.org, hacking, internet archive, open media, open source
1 Comment:
I found this file with more information about archive.org mediatypes:
https://github.com/internetarchive/IAS3API/blob/master/metadata.md
mediatype
The primary type of media contained in the item. While an item can contain files of diverse mediatypes the value in this field defines the appearance and functionality of the item's detail page on Internet Archive. In particular, the mediatype of an item defines what sort of online viewer is available for the files contained in the item.
The mediatype metadata field recognizes a limited set of values:
audio: The majority of audio items should receive this mediatype value. Items for the Live Music Archive should instead use the etree value.
collection: Denotes the item as a collection to which other collections and items can belong.
data: This is the default value for mediatype. Items with a mediatype of data will be available in Internet Archive but you will not be able to browse to them. In addition there will be no online reader/player for the files.
etree: Items which contain files for the Live Music Archive should have a mediatype value of etree. The Live Music Archive has very specific upload requirements. Please consult the documentation for the Live Music Archive prior to creating items for it.
image: Items which predominantly consist of image files should receive a mediatype value of image. Currently these items will not be available for browsing or online viewing in Internet Archive but they will require no additional changes when this mediatype receives additional support in the Archive.
movies: All videos (television, features, shorts, etc.) should receive a mediatype value of movies. These items will be displayed with an online video player.
software: Items with a mediatype of software are accessible to browse via Internet Archive's software collection. There is no online viewer for software but all files are available for download.
web: The web mediatype value is reserved for items which contain web archive WARC files.
If the mediatype value you set is not in the list above it will be saved but ignored by the system. The item will be treated as though it has a mediatype value of data.
If a value is not specified for this field it will default to data.
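The fallback rules quoted above can be sketched in a few lines of Python. This is my own illustration, not code from the Internet Archive: the names are made up, the set holds only the values listed in this comment, and it assumes values are matched exactly (case-sensitive).

```python
from typing import Optional

# Only the values quoted in this comment. The post also mentions "texts",
# which is not in the quoted list; add it if the current docs include it.
RECOGNIZED_MEDIATYPES = frozenset(
    {"audio", "collection", "data", "etree", "image", "movies", "software", "web"}
)
DEFAULT_MEDIATYPE = "data"


def effective_mediatype(value: Optional[str]) -> str:
    """Return the mediatype the system would act on, per the quoted rules."""
    # A missing value defaults to "data"; an unrecognized value is saved
    # but treated as though it were "data".
    if value in RECOGNIZED_MEDIATYPES:
        return value
    return DEFAULT_MEDIATYPE


print(effective_mediatype("movies"))
print(effective_mediatype("podcast"))
print(effective_mediatype(None))
```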