Alexa Site Thumbnail with Python II

This is how I actually use Alexa Site Thumbnail, and since I’m in a sharing mood, I’ll extend the code your way. In short, this takes the url and checks STORELOC first; any urls not already in STORELOC are then retrieved and named via a slug. You need to pass two variables to either of these: blog_site.url and blog_site.slug – since I’m using Django, this is naturally how sites are returned after I filter a queryset. What I do is place the call to Alexa as high up the page as I can, and because I’ve threaded this, the page can continue to load without waiting for Alexa’s response. For instance, let’s say you have some model with cool sites, and you want to return the sites filtered by owner…

views.py

from getAST import create_thumbnail_list
blog_sites = CoolSiteListing.objects.filter(owner__username__iexact=user_name, is_active=True)
create_thumbnail_list(blog_sites).start()

Notice the .start() tacked onto create_thumbnail_list? create_thumbnail_list is actually a Thread subclass, and .start() is what kicks off the thread.

getAST.py

import base64
import datetime
import hmac
import sha
import sys
import re
import urllib
import xml.dom.minidom
import os
import threading

AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-super-secret-key'
STORELOC = "/path/to/store/thumbs/"

# This one is for an individual thumbnail...
class create_thumbnail(threading.Thread):
    # Override Thread's __init__ method to accept the parameters needed:
    def __init__(self, site_url, site_slug):
        self.site_url = site_url
        self.site_slug = site_slug
        threading.Thread.__init__(self)
        
    def run(self):
        # First check if the thumbnail exists already
        # site_slug is the name of thumbnail, for instance
        # I would generate the slug of my site as kelvinism_com,
        # and the entire image would be kelvinism_com.jpg 
        if not os.path.isfile(STORELOC+self.site_slug+".jpg"):
            def generate_timestamp(dtime):
                return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")
            def generate_signature(operation, timestamp, secret_access_key):
                my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
                my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
                return my_b64_hmac_digest
            timestamp_datetime = datetime.datetime.utcnow()
            timestamp_list = list(timestamp_datetime.timetuple())
            timestamp_list[6] = 0
            timestamp_tuple = tuple(timestamp_list)
            timestamp = generate_timestamp(timestamp_datetime)
            signature = generate_signature('Thumbnail', timestamp, AWS_SECRET_ACCESS_KEY)
            parameters = {
                'AWSAccessKeyId': AWS_ACCESS_KEY_ID,
                'Timestamp': timestamp,
                'Signature': signature,
                'Url': self.site_url,
                'Action': 'Thumbnail',
                }
            url = 'http://ast.amazonaws.com/?'
            result_xmlstr = urllib.urlopen(url, urllib.urlencode(parameters)).read()
            result_xml = xml.dom.minidom.parseString(result_xmlstr)
            image_urls = result_xml.childNodes[0].getElementsByTagName('aws:Thumbnail')[0].firstChild.data
            #image_name = re.sub("\.|\/", "_", result_xml.childNodes[0].getElementsByTagName('aws:RequestUrl')[0].firstChild.data) + ".jpg"
            image_name = self.site_slug + ".jpg"
            store_name = STORELOC + image_name
            urllib.urlretrieve(image_urls, store_name)
            return image_name
  
# And this one is for a list
class create_thumbnail_list(threading.Thread):
    # Override Thread's __init__ method to accept the parameters needed:
    def __init__(self, all_sites):
        self.all_sites = all_sites
        threading.Thread.__init__(self)

    def run(self):
        SITES = []
        # go through the sites and only request the ones that don't
        # exist yet
        for s in self.all_sites:
            if not os.path.isfile(STORELOC+s.slug+"SM.jpg"):
                SITES.append(s)
                       
        if SITES: 
            def generate_timestamp(dtime):
                return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")
            
            def generate_signature(operation, timestamp, secret_access_key):
                my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
                my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
                return my_b64_hmac_digest
            
            timestamp_datetime = datetime.datetime.utcnow()
            timestamp_list = list(timestamp_datetime.timetuple())
            timestamp_list[6] = 0
            timestamp_tuple = tuple(timestamp_list)
            timestamp = generate_timestamp(timestamp_datetime)
            
            signature = generate_signature('Thumbnail', timestamp, AWS_SECRET_ACCESS_KEY)
            
            image_loc = {}
            
            count = 1   
            for s in SITES:
                image_num = 'Thumbnail.%s.Url' % count
                image_loc[image_num] = s.url
                count += 1
                
            parameters = {
                'AWSAccessKeyId': AWS_ACCESS_KEY_ID,
                'Timestamp': timestamp,
                'Signature': signature,
                'Action': 'Thumbnail',
                'Thumbnail.Shared.Size': 'Small',
                }
                
            parameters.update(image_loc)
            
            ast_url = 'http://ast.amazonaws.com/?'
                
            result_xmlstr = urllib.urlopen(ast_url, urllib.urlencode(parameters)).read()
            result_xml = xml.dom.minidom.parseString(result_xmlstr)
    
            count = 0
            for s in SITES:
                image_urls = result_xml.childNodes[0].getElementsByTagName('aws:Thumbnail')[count].firstChild.data
                image_name = s.slug + "SM.jpg"
                store_name = STORELOC + image_name
                urllib.urlretrieve(image_urls, store_name)
                count += 1
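A side note: the `sha` module and `urllib.urlopen` are Python 2 relics. On Python 3 the same Thumbnail signature can be produced with `hashlib` and `hmac`; a sketch (the key is obviously a placeholder):

```python
import base64
import datetime
import hashlib
import hmac

def generate_timestamp(dtime):
    return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")

def generate_signature(operation, timestamp, secret_access_key):
    # HMAC-SHA1 over Action + Timestamp, base64-encoded,
    # same as the Python 2 code above.
    digest = hmac.new(secret_access_key.encode(),
                      (operation + timestamp).encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

timestamp = generate_timestamp(datetime.datetime.utcnow())
signature = generate_signature('Thumbnail', timestamp, 'your-super-secret-key')
```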

Solved: NO PUBKEY

I’ve received this error more than once, so I’m finally writing up my notes on how I solve it.

Error message:

W: GPG error: http://security.debian.org stable/updates Release: The following signatures couldn’t be verified because the public key is not available: NO_PUBKEY A70DAF536070D3A1

This really is just your standard don’t-have-the-gpg-keys error. So, get’em – take the last eight digits from the long NO_PUBKEY string that is displayed on your computer. If you are using Debian 4.0, the above key is likely correct; if you are using Ubuntu or another version of Debian, it will be wrong. (The last eight digits are used as an identifier at the keyservers). Then:

gpg --keyserver subkeys.pgp.net --recv-keys 6070D3A1
gpg --export 6070D3A1 | apt-key add -

Repeat if necessary. All done, just do an apt-get update and no more warning!
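Since the short ID is just the last eight hex digits of the long NO_PUBKEY string, the whole dance is easy to script. A sketch (the helper names are mine; the gpg/apt-key invocations are exactly the two commands above, left commented out here):

```python
import subprocess

def short_key_id(no_pubkey):
    # The keyservers identify keys by the last eight hex digits.
    return no_pubkey.strip()[-8:]

def fetch_and_add(no_pubkey, keyserver="subkeys.pgp.net"):
    key_id = short_key_id(no_pubkey)
    subprocess.run(["gpg", "--keyserver", keyserver,
                    "--recv-keys", key_id], check=True)
    exported = subprocess.run(["gpg", "--export", key_id],
                              stdout=subprocess.PIPE, check=True)
    subprocess.run(["apt-key", "add", "-"],
                   input=exported.stdout, check=True)

# On the affected box (as root):
# fetch_and_add("A70DAF536070D3A1")
```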

Postfix/Dovecot + MySQL

As you can see by another post, I decided to reinstall the server. This isn’t really a problem; I have pretty good backups. I’ve installed Apache and friends a bagillion times. However, Postfix (chroot) + Dovecot authenticating from MySQL doesn’t install quite so smoothly.
Just for my future reference, and maybe helpful for somebody, someday. Clearly not a tutorial. The postfix chroot = /var/spool/postfix

cannot connect to saslauthd server: No such file or directory

First, get the saslauthd files into the postfix chroot. Edit /etc/conf.d/saslauthd (or /etc/default/saslauthd), and add this:

SASLAUTHD_OPTS="-m /var/spool/postfix/var/run/saslauthd"

Second, add it to the init script.

stop() {
        ebegin "Stopping saslauthd"
        start-stop-daemon --stop --quiet \
                --pidfile /var/spool/postfix/var/run/saslauthd/saslauthd.pid
        eend $?
}

Third, maybe, change /etc/sasl2/smtpd.conf (or /etc/postfix/sasl/smtpd.conf) and add this:

saslauthd_path: /var/run/saslauthd/mux

Ok, that error should go away now.

Recipient address rejected: Domain not found;

(Host or domain name not found. Name service error for name=domain.com

These are actually the same type of error. Copy /etc/resolv.conf into the chroot.

fatal: unknown service: smtp/tcp

Copy /etc/services into the chroot.
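Both of those fixes boil down to “copy the file into the chroot, preserving its path.” A small sketch (the function name is mine; on the mail server itself you would run it as root against /var/spool/postfix):

```python
import os
import shutil

def copy_into_chroot(files, chroot="/var/spool/postfix"):
    # Mirror each host file into the jail at the same relative path.
    for src in files:
        dst = os.path.join(chroot, src.lstrip("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)

# As root on the mail server:
# copy_into_chroot(["/etc/resolv.conf", "/etc/services"])
```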
I searched Google for these answers, to a certain degree at least, but couldn’t really find much. Then I remembered “oh, this is a chroot, it needs things” – and fixed stuff. If you came here from Google, and these super quick notes were helpful, feel free to leave a comment, or contact me directly if you have any questions.

Generating a Self-Signed SSL Cert

I have the need to generate an SSL cert (Apache2) about once every 3 months. And since I’m cheap, I don’t ever actually buy one, I just self-sign it. And every time I forget the commands needed. So, here they are, for my reference only.
1) Generate Private Key

openssl genrsa -des3 -out server.key 1024

2) Generate a CSR

openssl req -new -key server.key -out server.csr

3) Remove passphrase

cp server.key server.key.org
openssl rsa -in server.key.org -out server.key

4) Generate Self-Signed Cert

openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
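Since I keep forgetting these, here is a sketch that just replays the commands above via subprocess (the function names are mine, and genrsa/req will still prompt interactively for the passphrase and the CSR fields):

```python
import subprocess

def cert_commands(key="server.key", csr="server.csr",
                  crt="server.crt", days=365):
    # The exact commands from the four steps above, as argv lists.
    return [
        ["openssl", "genrsa", "-des3", "-out", key, "1024"],
        ["openssl", "req", "-new", "-key", key, "-out", csr],
        ["cp", key, key + ".org"],
        ["openssl", "rsa", "-in", key + ".org", "-out", key],
        ["openssl", "x509", "-req", "-days", str(days),
         "-in", csr, "-signkey", key, "-out", crt],
    ]

def self_sign():
    for cmd in cert_commands():
        subprocess.run(cmd, check=True)
```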

PNG Transparency and IE

I’ve vowed not to use transparent PNGs until almost everybody has switched to IE7, where they are actually supported (they already work in every other major browser). I’ve done the hacks, and have had good results. I like using PNGs, I’ll admit it. Inkscape exports them directly, however one slight problem: transparency still exists. This isn’t really a problem since I’m not layering images, or is it?
My initial assumption was that IE would simply pull the white background and everything would be dandy. Well, we all know what they say about assumptions.

A few options exist:

  • Convert them to GIFs
  • Try some sneaky PNG IE hack
  • Do a rewrite and send all IE6 traffic to download Firefox. Err… Do a rewrite and send all IE6 traffic to download Firefox
  • Open each in GIMP and add a white background
  • Use ImageMagick and convert the background to white.

We have a winner! The problem is, for the life of me, I couldn’t figure out a simple convert command to do this. A quick bash script would suffice:

#!/bin/bash
CONVERT=/usr/bin/convert
for image in *.png; do
    $CONVERT -background white "$image" "$image"
    echo "Finished converting: $image"
done

**Note:** This is gonna convert (and overwrite) every PNG in the directory.

So, now the transparent PNGs have a “white” base layer, IE renders them fine, normal browsers render the images fine, and I’m allowed a cup of coffee. I hope this helps somebody, if so, leave a note!

Simple Chrooted SSH

You might be asking: why would you want to chroot ssh? Why use ssh anyways? Here are the quick answers:

  • FTP usually isn’t great. Unless sent over SSL, all information is sent cleartext.
  • SSH usually is much better. SSH sends all data over an encrypted channel – the main drawback is: you can often browse around the system, and if permissions aren’t set right, read things you shouldn’t be able to.
  • Chroot’d SSH rocks. The solution to both the above problems.

So, let me tell a quick story.
When I started uni in 2001 I was a nerd. Still a nerd, I guess. I was cramped in my apartment on campus with like 5 boxes, most of them old p100s running Linux or OpenBSD. Life was good.
I started a CS degree (shifted into Business with a focus on IT), and we were told to use the school’s main servers to compile our programs. The other interesting thing is that all user accounts were visible when logged in via ssh – but hey, that is just the nature of Linux. I knew this, but asked the head I.T. person “why don’t you jail the connections?” He responded quickly telling me to go away.
Well, shortly after I made the comment (although solutions existed at the time), pam-chroot was released. This is right about the time students figured out they could spam everybody in the school, some 25,000 emails, quickly and easily – ’cause all the accounts were displayed. Sweet – now we can chroot individual ssh connections.
This quick demo will be on Debian, we’ll create a pretend user named “karl.” (I’ll assume you’ve already added the user before beginning these steps). Also, the jails will be in /var/chroot/{username}

First: Install libpam-chroot and makejail

kelvin@server ~$ sudo apt-get install libpam-chroot makejail

Second: makejail config file

Put the following in /etc/makejail/create-user.py:

#Clean the jail

cleanJailFirst=1
preserve=["/html", "/home"]
chroot="/var/chroot/karl"
users=["root","karl"]
groups=["root","karl"]
packages=["coreutils"]

Edit: If you need to use SFTP also, try this config:

cleanJailFirst=1
preserve=["/html", "/home"]
chroot="/home/vhosts/karl"
forceCopy=["/usr/bin/scp", "/usr/lib/sftp-server", \
 "/usr/bin/find", "/dev/null", "/dev/zero"]
users=["root","karl"]
groups=["root","karl"]
packages=["coreutils"]

As you’ll see, there is a “preserve” directive. This is so that when you “clean” the jail (if you need to refresh the files, for instance), you won’t wipe out anything important. I created /html so that the user can upload their web docs to that directory.

Third: configure libpam_chroot

Add the following to /etc/pam.d/ssh:

# Set up chrootd ssh

session required pam_chroot.so

Fourth: allow the actual user to be chroot’d

Edit /etc/security/chroot.conf and add the following:

karl /var/chroot/karl

Fifth: create/chown the chroot’d dir

kelvin@server ~$ sudo mkdir -p /var/chroot/karl/home

kelvin@server ~$ sudo chown karl:karl /var/chroot/karl/home

Now you should be able to log in, via the new username karl.

Layer Images Using ImageMagick

For one of my webapp projects I’m needing to layer two images. This isn’t a problem on my laptop – I just fire up GIMP, do some copy ’n pasting, and I’m done. However, since everything needs to be automated (scripted), and on a server – well, you get the point.
The great ImageMagick toolkit comes to the rescue. This is highly documented elsewhere, so I’m going to be brief.

Take this:

And add it to this:

I first tried to use the following technique:

convert bg.jpg -gravity center world.png -composite test.png

This generated a pretty picture, what I wanted. What I didn’t want was the fact that the picture was freaking 1.5 megs large, not to mention the resources were a little high:

real    0m7.405s
user    0m7.064s
sys     0m0.112s

Next, I tried to just use composite.

composite -gravity center world.png bg.png output.png

Same results, although the resource usage was just a tad lower. So, what was I doing wrong? I explored a little and realized I was being a bit of a muppet: I was using a png background that was 1.2 megs large (long story). I further changed the compose type to “atop,” as that appeared to have the lowest resource usage. I modified things appropriately:

 composite -compose atop -gravity center world.png bg.jpg output.jpg

This also yielded an acceptable resource usage.
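For the curious, “atop” is one of the Porter-Duff compositing operators: source over destination, but clipped to the destination’s coverage. A toy per-pixel sketch of the math (ImageMagick does this in optimized C, of course; channels here are floats from 0.0 to 1.0):

```python
def atop(src, dst):
    # Porter-Duff "atop" for non-premultiplied (r, g, b, a) pixels:
    # blend src over dst, but keep dst's alpha.
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    blend = lambda s, d: s * sa + d * (1.0 - sa)
    return (blend(sr, dr), blend(sg, db and dg or dg), blend(sb, db), da)
```

Because the output keeps the background’s alpha, compositing atop an opaque JPEG can never produce transparent pixels, which suits this job nicely.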

The result:

Resize a VMWare Image of Windows XP

Over the years I have been shrinking the number of computers I own.  At one point my dorm was littered with old P100s, running whatever OS I felt like playing with at the time.  

VMWare comes to the rescue.  In this recent oops, an image I made (for Windows XP) was slightly too small.  I didn’t feel like reinstalling XP + cruft, so I just resized the image.  This is the process:

  1. Make a clone or just backup your VMWare image.
  2. Note: if your disk is a Dynamic Disk, you won’t be able to use GParted.  There is a chance you can use Disk Management inside Computer Management inside XP.
  3. Turn off the VMWare image.
  4. Grow the image:
 vmware-vdiskmanager -x sizeGB yourimagename.vmdk 
  5. Download the GParted LiveCD.
  6. Change the CD-ROM drive of your VMWare image to boot from the ISO you just downloaded.
  7. Boot the VMWare image.  Make sure to press ESC right when it starts.
  8. Follow the instructions for GParted.  I had to select the Xvesa option, then Done.  Choose your language, keyboard and resolution.
  9. GParted will come up.  First delete the partition (the empty one!), and make sure it says unallocated.  Then go up to Edit and hit Apply.  Then select the partition and say Resize.  Hit Apply again.
  10. Reboot the image.  Windows XP will come up and go through checking the disk.  It will reboot again, and you should then be able to log in.

Lighttpd As Apache Sidekick

So, you have a web server. So, you have PHP. So, you want to make it a little quicker? The following are a few ideas to let you do that. First, let me share my experiences.
I have always been wondering “what would a digg do to my site.” I mean, I don’t run a commenting system, so I’m referring to just some article. Because I prefer to manage my own server, I have decided to use a VPS (Virtual Private Server) from VPSLink. Before purchasing I searched around, read reviews, and finally tested it out. Liking what I tested, I stayed. However, since I just host a few ‘play’ sites (http/email/ftp), and a few sites for friends, I am not going to spend much money on a high-end plan. That leaves me with a little problem: how can I maximize what I’ve got?
I’ve tried quite a few things. I finally ended up using Apache to handle php and Lighttpd to serve all static stuff. So, how?

Staticzerize A Page

One of the first things you will need to do is pull down a static copy of your page.

 user@vps:~$ wget http://www.kelvinism.com/howtos/notes/quick-n-dirty-firewall.html 

That was easy enough. Next, let’s create a directory for static pages.

user@vps:~$ sudo mkdir /var/www/html/kelvinism/static
user@vps:~$ sudo mv quick-n-dirty-firewall.html /var/www/html/kelvinism/static/ 

Sweet. (This is assuming of course that the site’s DocumentRoot is /var/www/html/kelvinism). Next, Lighttpd.

Lighttpd Configuration

Install Lighttpd however you choose. There are a few key changes to make in the configuration.
First, change the directory for your base DocumentRoot. Next, change what ports the server will listen on.

server.document-root = "/var/www/html"
## bind to port (default: 80)
server.port = 81
## bind to localhost (default: all interfaces)
server.bind = "127.0.0.1"

Ok, Lighttpd is all done. Now just start her up, and move onto Apache.

user@vps:/etc/lighttpd$ sudo /etc/init.d/lighttpd start 

Master Configuration

Depending on your distro and what apache you installed, you might need to do this a little different. I will illustrate how to do it with the Apache package from the Debian repository. Let’s activate the mod_proxy module.

 user@vps:~$ sudo a2enmod
Password:
 Which module would you like to enable?
 Your choices are: actions alias asis auth_basic auth_digest authn_alias authn_anon authn_dbd authn_dbm authn_default authn_file authnz_ldap authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cern_meta cgi cgid charset_lite dav dav_fs dav_lock dbd deflate dir disk_cache dump_io env expires ext_filter file_cache filter headers ident imagemap include info ldap log_forensic mem_cache mime mime_magic negotiation php5 proxy proxy_ajp proxy_balancer proxy_connect proxy_ftp proxy_http rewrite setenvif speling ssl status suexec unique_id userdir usertrack version vhost_alias

 Module name? proxy_http

If you are not using a system with a2enmod, you can edit your configuration by hand. Just insert the following into your apache2.conf or httpd.conf files:

LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so 

The actual location of the extension (*.so) will vary depending on where you installed it. If you have tried this out and get forbidden errors, or it just simply isn’t working, the reason is likely that the proxy module isn’t configured right. You will likely get an error like:

 client denied by server configuration: proxy 

To solve this, you need to edit /etc/apache2/mods-enabled/proxy.conf or your httpd.conf file.

<IfModule mod_proxy.c>
    # turning ProxyRequests on and allowing proxying from all may allow
    # spammers to use your proxy to send email.
    ProxyRequests Off
    <Proxy *>
        AddDefaultCharset off
        Order deny,allow
        Deny from all
        Allow from .kelvinism.com
    </Proxy>
    # Enable/disable the handling of HTTP/1.1 "Via:" headers.
    # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
    # Set to one of: Off | On | Full | Block
    ProxyVia On
</IfModule>

Now, open up your httpd-vhosts.conf or httpd.conf or wherever your site configuration is stored, and add the following inside the virtualhost directive:

#DocumentRoot is just for reference, I assume you know how to setup virtualhosts.

DocumentRoot /var/www/html/kelvinism/
ProxyRequests Off
ProxyPreserveHost On
ProxyPass /howtos/notes/quick-n-dirty-firewall.html http://127.0.0.1:81/kelvinism/static/quick-n-dirty-firewall.html 
ProxyPass /images/ http://127.0.0.1:81/kelvinism/images/ 
ProxyPassReverse / http://127.0.0.1:81/kelvinism/

As an alternative, you could use a rewrite rule.

#DocumentRoot is just for reference, I assume you know how to setup virtualhosts.
DocumentRoot /var/www/html/kelvinism/
RewriteEngine On
RewriteRule ^/howtos/notes/quick-n-dirty-firewall\.html$ http://127.0.0.1:81/kelvinism/static/quick-n-dirty-firewall.html [P,L]
ProxyPass /images/ http://127.0.0.1:81/kelvinism/images/
ProxyPassReverse / http://127.0.0.1:81/kelvinism/
 

So what this does is pass the page http://www.kelvinism.com/howtos/notes/quick-n-dirty-firewall.html through mod_proxy to Lighttpd. So, test it out, and you are all done!
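What mod_proxy is doing here is a plain prefix-to-backend mapping. A toy sketch of the lookup (paths and ports copied from the config above; rules match in order and the first hit wins, so the most specific mappings go first):

```python
PROXY_PASS = [
    ("/howtos/notes/quick-n-dirty-firewall.html",
     "http://127.0.0.1:81/kelvinism/static/quick-n-dirty-firewall.html"),
    ("/images/", "http://127.0.0.1:81/kelvinism/images/"),
]

def backend_for(path):
    # First matching prefix wins, mirroring ProxyPass evaluation order.
    for prefix, target in PROXY_PASS:
        if path.startswith(prefix):
            return target + path[len(prefix):]
    return None  # no rule matched; Apache (and PHP) handles it locally
```

Anything that returns None stays with Apache, which is exactly the split we wanted: PHP on Apache, static files on Lighttpd.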

Make Dynamic Sites Static

Let’s say one page on your site is getting hit hard. And I mean, it was digg’d or something. If the page resides on some CMS or blog, then each request is processed by PHP and results in queries to your database, and, as they say, the crap is gonna hit the fan. Well, at least if you’re cheap like me, you’ll try to squeeze every penny out of what you’ve got.
That said, mod_rewrite comes to the rescue.
There are only a few modifications that you need to do. The first is to ensure that mod_rewrite is enabled. If you have apache installed on debian, this might do:

user@vps:~$ sudo a2enmod
Password:
Which module would you like to enable?
Your choices are: actions alias asis auth_basic auth_digest authn_alias authn_anon authn_dbd authn_dbm authn_default authn_file authnz_ldap authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cern_meta cgi cgid charset_lite dav dav_fs dav_lock dbd deflate dir disk_cache dump_io env expires ext_filter file_cache filter headers ident imagemap include info ldap log_forensic mem_cache mime mime_magic negotiation php5 proxy proxy_ajp proxy_balancer proxy_connect proxy_ftp proxy_http rewrite setenvif speling ssl status suexec unique_id userdir usertrack version vhost_alias
Module name? rewrite 

Otherwise, you’ll need to drop the following in your apache2.conf (or httpd.conf).

LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so

Next, grab the page that is getting hit hard from your site.

user@vps:~$ wget http://www.kelvinism.com/stuff/hit-hard.html

Next, let’s create a static directory and move that page into it.

user@vps:~$ sudo mkdir /var/www/html/kelvinism/static
user@vps:~$ sudo mv hit-hard.html /var/www/html/kelvinism/static/

Coolio. Now we’ll rewrite the normal URL (the one being hit hard) to the static URL.
If you have full access to the server, just mimic the following to a VirtualHost:

<VirtualHost *>
    DocumentRoot /var/www/html/kelvinism
    ServerName www.kelvinism.com
    ServerAlias kelvinism.com www.kelvinism.com
    <Directory "/var/www/html/kelvinism">
        Options Indexes -FollowSymLinks +SymLinksIfOwnerMatch
        allow from all
        AllowOverride None
        RewriteEngine On
        RewriteRule ^stuff/hit-hard\.html$ /static/hit-hard.html [L]
    </Directory>
</VirtualHost>

If you don’t have access to the server, you can just add the following to a .htaccess file:

RewriteEngine On
RewriteRule ^stuff/hit-hard\.html$ /static/hit-hard.html [L]

Sweet.