Alexa Thumbnail Service

Amazon offers some pretty cool services: S3, EC2, Alexa Site Thumbnail, and others. A while back I wanted to use AST with Django, so I ended up writing the Python bindings to the REST API (they didn't previously exist). I even wrote up a quick tutorial.

Update: Amazon no longer maintains AST. I've decided to archive a few of the old sites, so I no longer need to take thumbnails. However, a few other thumbnail services seem to have cropped up, including SnapCasa and WebSnapr.

Last Xenful Comments

One of the biggest things I regret is not utilizing Xen more. I've finally been admitted to Amazon's EC2 Limited Beta, just two days before I leave, so there isn't enough time to actually do anything fun. However, I think Xen is an ideal infrastructure aid for SMEs in particular. The cost of technology continues to decrease, which means bigger servers cost less. This is great for the small/branch office. Let me explain.

One of the themes I noticed while studying for and taking the MCSE was that the solution to the majority of problems was to just buy more servers. Even for simple things like DHCP: buy another server. I've always operated on a limited budget, and anyway, I don't think money should be wasted on resources that aren't needed. With a VT-capable chipset, you aren't tied to any OS in particular.

My friend Ian and I were talking, and he illustrated a great use of Xen through his work. What he's ended up doing is installing the Small Business Edition of Server 2003 in a Xen node. The reasoning is that SBE is apparently extremely difficult to back up, mainly due to odd file-locking behavior. I've had similar thoughts, but mainly around taking advantage of Xen's migration feature. The idea of taking a small branch office and putting everything on a Xen server is quite appealing to me, especially considering a second server could be used to create a virtual hot spare (see the sketch below).
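
The hot-spare idea maps fairly directly onto Xen's live migration. Here is a rough sketch of what that looks like with the Xen 3.x tools – the host and domain names are hypothetical, and this is untested shorthand rather than a full config:

# /etc/xen/xend-config.sxp on the spare server: accept incoming migrations
(xend-relocation-server yes)
(xend-relocation-port 8002)

# On the primary server: push the running guest over to the spare, live
xm migrate --live branch-office-vm spare-server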

As you can see, I like Xen. I've found it relatively easy to install, and the fact that it is starting to come bundled with recent distributions is pretty darn sweet.

The Risk in Risk Mitigation

Back in the day, the barrier to entry for the Internet was quite high. The technology required a steep learning curve, and the equipment was extremely expensive and sometimes even hard to acquire. Fast forward to 2007 and things have certainly changed. If you know any tech people you can likely get free hosting for a small website, and even more demanding websites can be hosted for not much. The cost of dedicated servers has dropped even more. And the final kicker: web services. I've started to think of some web services not as services, but more like outsourced requirements.

One of the nice things about outsourcing requirements is that risk is mitigated. I'll use SmugMug as an example. In summary, they moved their storage to Amazon's S3 network, which is something I will be utilizing as well. Amazon's S3 (and their other web services) continue to drive down the barrier to entry – now you don't even need to purchase hugely expensive servers for the sole purpose of storage! If you don't need to purchase them, you also don't need to manage them. Risk mitigated.
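
To give a feel for how thin the storage layer becomes, here is a minimal sketch of pushing a file to S3 over the bare REST API, using the same HMAC-SHA1 signing pattern as the scripts later in this post. The bucket and key names are made up, and error handling is omitted:

#!/usr/bin/env python
# Minimal S3 REST PUT sketch (Python 2, classic AWS signature).
# The bucket/key names below are hypothetical.
import base64
import hmac
import httplib
import sha
import time

AWS_ACCESS_KEY_ID = 'your-access-key'
AWS_SECRET_ACCESS_KEY = 'your-super-secret-key'

def put_object(bucket, key, data):
    # RFC 1123 date, required by S3 for request signing
    date = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime())
    # Classic S3 string-to-sign: verb, MD5, content-type, date, resource
    string_to_sign = "PUT\n\n\n%s\n/%s/%s" % (date, bucket, key)
    signature = base64.encodestring(
        hmac.new(AWS_SECRET_ACCESS_KEY, string_to_sign, sha).digest()).strip()
    headers = {
        'Date': date,
        'Authorization': 'AWS %s:%s' % (AWS_ACCESS_KEY_ID, signature),
    }
    conn = httplib.HTTPConnection('s3.amazonaws.com')
    conn.request('PUT', '/%s/%s' % (bucket, key), data, headers)
    return conn.getresponse().status

print put_object('my-backup-bucket', 'hello.txt', 'hello, S3')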

However, continuing the allusion from The Other Blog's article on mashups, I see a slight problem with the outsourcing of requirements. The thought isn't particularly innovative: mitigating risk by outsourcing requirements creates a dependency on the third party. This very dependency adds risk for a multitude of reasons, and when your entire web application platform revolves around a third party, as is the case with mashups, you incur great risk.

But, as is evident by the fact that I've had stitches nine different times, I'm still going to do some cool mashups anyway, so stay tuned.

Python, AST and SOAP

For one of my projects I need to generate thumbnails for a page. And lots and lots and lots of them. Even though I can generate them via a Python script and a very light "gtk browser", I would prefer to mitigate the server load. To do this I've decided to tap into the Alexa Thumbnail Service. They allow two methods: REST and SOAP. After several hours of testing things out, I've decided to toss in the towel and settle on REST. If you can spot the error with my SOAP setup, I owe you a beer. I'm using the ZSI module for Python.

1. wsdl2py

I pull in the needed classes by using wsdl2py.

wsdl2py -b http://ast.amazonaws.com/doc/2006-05-15/AlexaSiteThumbnail.wsdl

2. Look at the code generated.

See AlexaSiteThumbnail_types.py and AlexaSiteThumbnail_client.py.

3. Write Python code to access AST over SOAP.


#!/usr/bin/env python
import sys
import datetime
import hmac
import sha
import base64
from AlexaSiteThumbnail_client import *

print 'Starting...'

AWS_ACCESS_KEY_ID = 'super-duper-access-key'
AWS_SECRET_ACCESS_KEY = 'super-secret-key'

print 'Generating signature...'

def generate_timestamp(dtime):
    # AWS expects an ISO 8601 timestamp, e.g. 2006-05-15T12:00:00Z
    return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")

def generate_signature(operation, timestamp, secret_access_key):
    # Sign the operation name + timestamp with HMAC-SHA1, base64-encoded
    my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
    my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
    return my_b64_hmac_digest

timestamp_datetime = datetime.datetime.utcnow()
# ZSI wants the timestamp as a time tuple; zero out the weekday field
timestamp_list = list(timestamp_datetime.timetuple())
timestamp_list[6] = 0
timestamp_tuple = tuple(timestamp_list)
timestamp_str = generate_timestamp(timestamp_datetime)

signature = generate_signature('Thumbnail', timestamp_str, AWS_SECRET_ACCESS_KEY)

print 'Initializing Locator...'

# The Locator and port classes come from the wsdl2py-generated client module
locator = AlexaSiteThumbnailLocator()
port = locator.getAlexaSiteThumbnailPort(tracefile=sys.stdout)

print 'Requesting thumbnails...'

request = ThumbnailRequestMsg()
request.Url = "alexa.com"
request.Signature = signature
request.Timestamp = timestamp_tuple
request.AWSAccessKeyId = AWS_ACCESS_KEY_ID
request.Request = [request.new_Request()]

resp = port.Thumbnail(request)

4. Run, and see error.


ZSI.EvaluateException: Got None for nillable(False), minOccurs(1) element
(http://ast.amazonaws.com/doc/2006-05-15/,Url),
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ZSI="http://www.zolera.com/schemas/ZSI/"
xmlns:ns1="http://ast.amazonaws.com/doc/2006-05-15/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

[Element trace: /SOAP-ENV:Body/ns1:ThumbnailRequest]

5. Conclusion

I'm not entirely certain what I'm doing wrong. I've also written another version, using NPBinding to connect to the WSDL file. It seems to work much better: it fully connects and I get a 200, but it doesn't return the thumbnail location in the response, and I get a:

TypeError: Response is "text/plain", not "text/xml"
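
For the curious, the NPBinding attempt looked roughly like the following. I'm reconstructing it from memory, so the keyword arguments and the idea of pointing it straight at the WSDL from step 1 should be read as assumptions, not a verified recipe (it reuses signature, timestamp_str and AWS_ACCESS_KEY_ID from the script above):

import sys
# NamedParamBinding is commonly imported as NPBinding
from ZSI.client import NamedParamBinding as NPBinding

# Reconstruction from memory: point the binding at the WSDL from step 1
kw = {'url': 'http://ast.amazonaws.com/doc/2006-05-15/AlexaSiteThumbnail.wsdl',
      'tracefile': sys.stdout}
b = NPBinding(**kw)
resp = b.Thumbnail(Url='alexa.com',
                   Signature=signature,
                   Timestamp=timestamp_str,
                   AWSAccessKeyId=AWS_ACCESS_KEY_ID)

If the binding really is posting to the WSDL document itself rather than the service endpoint, a text/plain response would make sense, but that's speculation on my part.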

So, while I have things working fine with REST, I would like to get the SOAP calls working. One beer reward.

AWS in Python (REST)

As some of you may know, I have some projects cooked up. I don't expect to make a million bucks (wish me luck!), but a few extra bills in the pocket wouldn't hurt. Plus, I'm seriously considering further education, which will set me back anywhere from a few to thirty grand. That said, one of my projects will rely heavily on Amazon Web Services. Amazon has, for quite some time now, opened up their information via REST and SOAP. I've been trying (virtually the entire day) to get SOAP to work, but keep getting snagged on a few issues. Stay tuned.
However, in my quest to read every RTFM, I stumbled upon a post regarding Python+REST to access Alexa Web Search. After staring at Python code all day, especially trying to grapple with why SOAP isn't working, updating the outdated REST code was a five-minute hack. So, if you are interested in using Alexa Web Search with Python via REST, look below:

websearch.py


#!/usr/bin/python

"""
Test script to run a WebSearch query on AWS via the REST interface. Written
originally by Walter Korman ([email protected]), based on the urlinfo.pl script
from AWIS-provided sample code, updated to the new API by Kelvin Nicholson
([email protected]). Assumes Python 2.4 or greater.
"""

import base64
import datetime
import hmac
import sha
import urllib
import urllib2

AWS_ACCESS_KEY_ID = 'your-access-key'
AWS_SECRET_ACCESS_KEY = 'your-super-secret-key'

def get_websearch(searchterm):
    def generate_timestamp(dtime):
        # AWS expects an ISO 8601 timestamp, e.g. 2007-01-01T12:00:00Z
        return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")

    def generate_signature(operation, timestamp, secret_access_key):
        # Sign the operation name + timestamp with HMAC-SHA1, base64-encoded
        my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
        my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
        return my_b64_hmac_digest

    timestamp = generate_timestamp(datetime.datetime.utcnow())
    signature = generate_signature('WebSearch', timestamp, AWS_SECRET_ACCESS_KEY)

    def generate_rest_url(access_key, query):
        """Returns the AWS REST URL to run a web search query on the
        specified query string."""
        params = urllib.urlencode(
            { 'AWSAccessKeyId': access_key,
              'Timestamp': timestamp,
              'Signature': signature,
              'Action': 'WebSearch',
              'ResponseGroup': 'Results',
              'Query': query, })
        return "http://websearch.amazonaws.com/?%s" % (params)

    url = generate_rest_url(AWS_ACCESS_KEY_ID, searchterm)
    print urllib2.urlopen(url).read()

You run it like this:

>>> from websearch import get_websearch
>>> get_websearch('python')

S3 Super Backups

My buddy Ian mentioned Amazon's S3 service and its potential for fun webapps. While utilizing it for webapps will have to wait a few months, I was able to use it as a cheap backup for my home server (pictures, documents, etc.) and for the server that houses my email and websites. The setup is pretty quick, and most of it is detailed here. The Ruby package is here. I'll toss in my recommendation to use the jets3t Cockpit application for viewing the buckets, especially considering the Firefox extension didn't work as advertised. My only two comments are these:

  1. Make sure SSL is working. The site mentioned above just has you hunt down some random bash file, which isn't even hosted anymore. On my Debian system I simply added this to my upload.sh:
export SSL_CERT_DIR=/etc/ssl/certs/
  2. The second suggestion is another example of the s3sync layout. Let's say you created the bucket "kelvinism" – the following would sync the folder /home/kelvin/test to a folder named test in the kelvinism bucket (see the full upload.sh sketch below). Sweet.
 s3sync.rb -r --ssl --delete /home/kelvin/test kelvinism:/test
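
Putting those two comments together, a minimal upload.sh might look like the following. The paths and bucket name just echo the examples above, and the cron entry afterwards is my own assumption rather than part of the documented setup:

#!/bin/bash
# Minimal upload.sh sketch: sync the test folder to the kelvinism bucket.
# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported elsewhere.
# Point the Ruby s3sync script at Debian's CA certificates so --ssl works
export SSL_CERT_DIR=/etc/ssl/certs/
# Mirror /home/kelvin/test into kelvinism:/test, deleting remote strays
s3sync.rb -r --ssl --delete /home/kelvin/test kelvinism:/test

Dropped into cron (e.g. 0 3 * * * /home/kelvin/upload.sh), this makes the nightly backup completely hands-off.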