Migrating to GitHub Pages

I started this website 20 years ago as a means to learn Django, which made sense, as at the time I wished to do everything myself: website and hosting, running my own mail server, DNS server, and probably something else, too. Over time my desire to spend time on these activities dwindled, and I moved my content to Blogger.

Year by year I wrote fewer articles, and then it became years since I had written anything. Perhaps it was the UI for Blogger, or the niggling feeling that Google could kill it off at any time, or perhaps it was having a kid. Eventually the libraries behind this site were no longer maintained. This weekend I opted to:

  • Remove as many libraries as possible
  • Move from Blogger to GitHub Pages

Initially I gave Jekyll a go, as GitHub Pages natively supports it, but Jekyll seems to be on its way out as a static site generator (e.g. the paginate plugin hasn’t been updated in about 7 years, and you can’t natively use other plugins on GitHub). I then gave Hugo a go, and it worked incredibly well.

Here is an overview of what I had to do:

  • Read this page and follow almost all of the steps.
  • Created this script to download the full-size images from Blogger. Run it with python src/download_images.py after installing the dependency with pip install requests (a rough sketch of the idea appears after this list). I had about 250MB of images, so beware if you have a lot, as Pages has a storage limit of 1GB.
  • Created this script to replace any remaining links to Picasaweb or Blogger with just the image name located next to it. You might not need to run this.
  • Leveraged a lot of concepts from the PaperMod theme, but kept the theme as close to my original theme as possible.
  • If you want to use an apex (’naked’) domain, then you need to use your full GitHub Pages domain as the repository name, e.g. username.github.io (or see here for an example).
  • Added a custom domain; I found the documentation to be a bit out of date. From what I remember, just point the apex domain at GitHub’s IPs and use a CNAME for the www subdomain.
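The download script itself isn’t reproduced here, but roughly it does something like the sketch below; treat the directory names, the URL pattern, and the /s0/ size rewrite as placeholders rather than the exact code.

# Rough sketch only, not the actual download_images.py. Directory names,
# the URL regex, and the size-segment rewrite (/s0/ = original size) are assumptions.
import os
import re
import requests

HTML_DIR = "content"        # assumed location of exported posts
OUT_DIR = "static/images"   # assumed output directory

IMG_RE = re.compile(r'https?://[^"\s]*(?:blogspot|googleusercontent)\.com/[^"\s]+\.(?:jpg|jpeg|png)', re.I)

os.makedirs(OUT_DIR, exist_ok=True)
for root, _, files in os.walk(HTML_DIR):
    for name in files:
        if not name.endswith((".html", ".md")):
            continue
        with open(os.path.join(root, name), encoding="utf-8") as fh:
            text = fh.read()
        for url in set(IMG_RE.findall(text)):
            # Blogger encodes the image size as a path segment (e.g. /s320/); ask for the original
            full = re.sub(r"/s\d+(?:-h)?/", "/s0/", url)
            target = os.path.join(OUT_DIR, os.path.basename(full))
            if os.path.exists(target):
                continue
            resp = requests.get(full, timeout=30)
            if resp.ok:
                with open(target, "wb") as out:
                    out.write(resp.content)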

GitHub IPs

  • 185.199.108.153
  • 185.199.109.153
  • 185.199.110.153
  • 185.199.111.153

Updating DNS records

(Screenshots: apex domain A records, www subdomain CNAME, and the custom domain settings in GitHub Pages.)
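In zone-file terms, the records end up looking roughly like this (example.com, the TTLs, and username are placeholders):

; Apex (naked) domain: A records pointing at the GitHub Pages IPs
example.com.      3600  IN  A      185.199.108.153
example.com.      3600  IN  A      185.199.109.153
example.com.      3600  IN  A      185.199.110.153
example.com.      3600  IN  A      185.199.111.153

; www: CNAME to the GitHub Pages hostname
www.example.com.  3600  IN  CNAME  username.github.io.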

Geocoding Photos (Mac)

I’ve recently started using OS X (again), and am really enjoying it (again). One Windows-only tool that I found really useful is GeoSetter, which allows you to embed geo coordinates in photos. There don’t appear to be any free Mac tools that do this to my satisfaction, so the next best thing is to geotag from the command line, just as you would on Linux. Here’s how.

We’re going to use the command-line program ExifTool (by Phil Harvey) to extract coordinates from a GPX file and embed them in a directory of images.

First, install exiftool using Homebrew:

brew install exiftool

Copy the GPX files into your image directory and start the sync with the -geotag flag:

exiftool -geotag=gpslog2014-12-10_212401.gpx ./

You can also specify multiple GPX files (e.g. for a multi-day trip):

exiftool -geotag=gpslog2014-12-10_212401.gpx -geotag=gpslog2014-12-07_132315.gpx -geotag=gpslog2014-12-08_181318.gpx -geotag=gpslog2014-12-10_073811.gpx ./

Finally, you can include a time offset with the -geosync flag. For instance, I had an 11-hour (39,600-second) difference due to a timezone hiccup with my new camera, so we can correct for that:

exiftool -geotag=gpslog2014-12-10_212401.gpx -geotag=gpslog2014-12-07_132315.gpx -geotag=gpslog2014-12-08_181318.gpx -geotag=gpslog2014-12-10_073811.gpx -geosync=39600 ./

It will process the images, renaming each original with an “_original” suffix, and give you a report at the end:

1 directories scanned
193 image files updated
83 image files unchanged
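To spot-check the result, you can print the embedded coordinates back out of a file (the filename here is just an example):

exiftool -GPSLatitude -GPSLongitude -GPSDateTime IMG_0001.JPG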

If your camera is set to GMT, then put all the GPX files in the same directory as the photos to geocode, and do this:

TZ=GMT exiftool -geotag "*.gpx" *.jpg

For any additional manual geocoding I fall back on Picasa’s Places GeoTag to add the coordinates.

If you have Lightroom, try searching for a suitable ExifTool Lightroom plugin; there seem to be a few.

Snap-CI Deploy to OpenShift

There are some wonderful CI/CD tools out there right now, and some of them have very usable free tiers. A few good examples include Shippable, Wercker, CloudBees, and Snap-CI. There are others, of course, but these all allow at least one private project, which is enough to get started.

I have recently moved my projects to Snap, and my hack for the day needed to be deployed to OpenShift. Although Snap has built-in integrations for some providers, no such integration exists for OpenShift (yet!). However, it takes less than 10 minutes to configure a Deploy step for OpenShift, and here’s how.

Add SSH Keys
You will need to add your private SSH key (i.e. id_rsa) to Snap, and your public key (i.e. id_rsa.pub) to OpenShift.

You can create the keys on another machine with the ssh-keygen command and copy them into the corresponding places. In OpenShift, this is under Settings -> Add a new key. Once open, paste in the contents of your id_rsa.pub key.
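If you need a fresh key pair, one command does it (the comment at the end is just an example); the private half goes to Snap and the public half to OpenShift:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -C "snap-deploy"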

In Snap, edit your configuration, navigate to your Deploy step, and look for “Secure Files” and “Add new”.

Get the content of the id_rsa key you generated earlier and paste it into the content box, with “/var/go” as the file location.

Enable Git Push from Snap

If you’ve used ssh much, you are probably aware that you can specify an identity file with the “-i” flag. The git command has no such flag yet, but we can create a simple bash script that emulates this (script courtesy of Alvin Abad).

Add another New File in Snap and paste in the script below:

#!/bin/bash
 
# The MIT License (MIT)
# Copyright (c) 2013 Alvin Abad
 
if [ $# -eq 0 ]; then
    echo "Git wrapper script that can specify an ssh-key file
Usage:
    git.sh -i ssh-key-file git-command
    "
    exit 1
fi
 
# remove temporary file on exit
trap 'rm -f /tmp/.git_ssh.$$' 0
 
if [ "$1" = "-i" ]; then
    SSH_KEY=$2; shift; shift
    echo "ssh -i $SSH_KEY \$@" > /tmp/.git_ssh.$$
    chmod +x /tmp/.git_ssh.$$
    export GIT_SSH=/tmp/.git_ssh.$$
fi
 
# in case the git command is repeated
[ "$1" = "git" ] && shift
 
# Run the git command
git "$@"

Give this script the name “git.sh”, set the file permissions to “0755”, and update the file location to “/var/go”.

Profit
With all these parts configured correctly you can add this single line to your Deploy script:

/var/go/git.sh -i /var/go/id_rsa push ssh://[email protected]/~/git/example.git/

Re-run the build, check your logs, and it should deploy. Good luck!

Solved: slow build times from Dockerfiles with Python packages (pip)

I have recently had the opportunity to begin exploring Docker, the currently hip way to build application containers, and I generally like it. It feels a bit like using Xen back in 2005, when you still had to download it from cl.cam.ac.uk, but there is huge momentum right now. I like the idea of breaking down each component of your application into unique services and bundling them up - it seems clean. The next year is going to be very interesting for Docker, and I am especially looking forward to seeing how Google’s App Engine allows Docker usage, and what’s in store for the likes of Flynn, Deis, CoreOS, or Stackdock.

One element I had been frustrated with was the build time of the image hosting a Django application I’m working on. I kept hearing about crazy low rebuild times, but my container was taking ages to rebuild. I noticed that the build was cached up until I re-added my code, after which pip would reinstall all my packages.

It appeared as though anything after the ADD for my code was being rebuilt, and reading online seemed to confirm this. Most of the steps were very quick, e.g. “EXPOSE 80”, but then it hit “RUN pip install -r requirements.txt”.

There are various documented ways around this, from using two Dockerfiles to installing packaged libraries. However, I found it easier to just use multiple ADD statements, and the good Docker folks have added caching for them. The idea is to ADD your requirements first, then RUN pip, and then ADD your code. This means that code changes don’t invalidate the pip cache.

For instance, I had something (abbreviated snippet) like this:

# Set the base image to Ubuntu
FROM ubuntu:14.04

# Update the sources list
RUN apt-get update
RUN apt-get upgrade -y

# Install basic applications
RUN apt-get install -y build-essential

# Install Python and Basic Python Tools
RUN apt-get install -y python python-dev python-distribute python-pip postgresql-client

# Copy the application folder inside the container
ADD . /app

# Get pip to download and install requirements:
RUN pip install -r /app/requirements.txt

# Expose ports
EXPOSE 80 8000

# Set the default directory where CMD will execute
WORKDIR /app

VOLUME ["/app"]

CMD ["sh", "/app/run.sh"]

And it reran pip whenever the code changed. Just ADD the requirements file on its own and move the RUN pip line above the ADD for the code:

# Set the base image to Ubuntu
FROM ubuntu:14.04

# Update the sources list
RUN apt-get update
RUN apt-get upgrade -y

# Install basic applications
RUN apt-get install -y build-essential

# Install Python and Basic Python Tools
RUN apt-get install -y python python-dev python-distribute python-pip postgresql-client

ADD requirements.txt /app/requirements.txt

# Get pip to download and install requirements:
RUN pip install -r /app/requirements.txt

# Copy the application folder inside the container
ADD . /app

# Expose ports
EXPOSE 80 8000

# Set the default directory where CMD will execute
WORKDIR /app

VOLUME ["/app"]

CMD ["sh", "/app/run.sh"]

I feel a bit awkward for having missed something that must be so obvious, so hopefully this can help somebody in a similar situation.

TLS Module In SaltStack Not Available (Fixed)

I was trying to install HALite, the WebUI for SaltStack, using the provided instructions. However, I kept getting the following errors when trying to create the certificates using Salt:

'tls.create_ca_signed_cert' is not available.  
'tls.create_ca' is not available.

Basically, the ’tls’ module in Salt simply didn’t appear to work. The reason for this is detailed on intothesaltmind.org:

Note: Use of the tls module within Salt requires the pyopenssl python extension.

That makes sense. We can fix this with something like:

apt-get install libffi-dev  
pip install -U pyOpenSSL  
/etc/init.d/salt-minion restart

Or, better yet, with Salt alone:

salt '*' cmd.run 'apt-get install libffi-dev'  
salt '*' pip.install pyOpenSSL  
salt '*' cmd.run "service salt-minion restart"

The commands to create the PKI key should work now:

Created Private Key: "/etc/pki/salt/salt_ca_cert.key." Created CA "salt": "/etc/pki/salt/salt_ca_cert.crt."  
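For reference, the commands that produce output like that are along the lines of the following (as per the HALite instructions; treat the CA name and CN as placeholders):

salt-call tls.create_ca salt
salt-call tls.create_ca_signed_cert salt CN=salt.example.com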

Error opening /dev/sda: No medium found

I have had this issue before, solved it, and had it again.

Let’s say you plug a USB drive into a Linux machine and try to access it (mount it, partition it with fdisk/parted, or format it), and you get the error:

Error opening /dev/sda: No medium found  

Naturally the first thing you will do is ensure that it appeared when you plugged it in, so you run ‘dmesg’ and get:

sd 2:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)  

And it appears in /dev

Computer:~ $ ls /dev/sd*  
/dev/sda  
Computer:~ $  

Now what? Here’s what has bitten me twice: make sure the drive has enough power. Let’s say you plugged a 2.5" USB drive into a Raspberry Pi. The Pi probably doesn’t have enough current to power the drive, but it does have enough to make the drive recognisable. Or, if you are like me, the USB charger powering the drive is faulty, so even though it has power, it doesn’t have enough.

The next troubleshooting step should be obvious: give the drive enough power to completely spin up.

Finding The Same (Misspelled) Name Using Python/NLTK

I have been meaning to play around with the Natural Language Toolkit for quite some time, but I had been waiting for a time when I could experiment with it and actually create some value (as opposed to just play with it). A suitable use case appeared this week: matching strings. In particular, matching two different lists of many, many thousands of names.

To give you an example, let’s say you had two lists of names, but with some names spelled incorrectly in one list:

List 1:
Leonard Hofstadter
Sheldon Cooper
Penny
Howard Wolowitz
Raj Koothrappali
Leslie Winkle
Bernadette Rostenkowski
Amy Farrah Fowler
Stuart Bloom
Alex Jensen
Barry Kripke

List 2:
Leonard Hofstadter
Sheldon Coopers
Howie Wolowits
Rav Toothrapaly
Ami Sarah Fowler
Stu Broom
Alexander Jensen

This could easily occur if somebody was manually typing in the lists, dictating names over the phone, or spelling their name differently (e.g. Phil vs. Phillip) at different times.

If we wanted to match people on List 1 to List 2, how could we go about that? For a small list like this you can just look and see, but with many thousands of people, something more sophisticated would be useful. One tool could be NLTK’s edit_distance function. The following Python script displays how easy this is:

import nltk
 
list_1 = ['Leonard Hofstadter', 'Sheldon Cooper', 'Penny', 'Howard Wolowitz', 'Raj Koothrappali', 'Leslie Winkle', 'Bernadette Rostenkowski', 'Amy Farrah Fowler', 'Stuart Bloom', 'Alex Jensen', 'Barry Kripke']
 
list_2 = ['Leonard Hofstadter', 'Sheldon Coopers', 'Howie Wolowits', 'Rav Toothrapaly', 'Ami Sarah Fowler', 'Stu Broom', 'Alexander Jensen']
 
for person_1 in list_1:
    for person_2 in list_2:
        print nltk.metrics.edit_distance(person_1, person_2), person_1, person_2

And we get this output:

0 Leonard Hofstadter Leonard Hofstadter  
15 Leonard Hofstadter Sheldon Coopers  
14 Leonard Hofstadter Howie Wolowits  
15 Leonard Hofstadter Rav Toothrapaly  
14 Leonard Hofstadter Ami Sarah Fowler  
16 Leonard Hofstadter Stu Broom  
15 Leonard Hofstadter Alexander Jensen  
14 Sheldon Cooper Leonard Hofstadter  
1 Sheldon Cooper Sheldon Coopers  
13 Sheldon Cooper Howie Wolowits  
13 Sheldon Cooper Rav Toothrapaly  
12 Sheldon Cooper Ami Sarah Fowler  
11 Sheldon Cooper Stu Broom  
12 Sheldon Cooper Alexander Jensen  
16 Penny Leonard Hofstadter  
13 Penny Sheldon Coopers  
13 Penny Howie Wolowits  
14 Penny Rav Toothrapaly  
16 Penny Ami Sarah Fowler  
9 Penny Stu Broom  
13 Penny Alexander Jensen  
11 Howard Wolowitz Leonard Hofstadter  
13 Howard Wolowitz Sheldon Coopers  
4 Howard Wolowitz Howie Wolowits  
15 Howard Wolowitz Rav Toothrapaly  
13 Howard Wolowitz Ami Sarah Fowler  
13 Howard Wolowitz Stu Broom  
14 Howard Wolowitz Alexander Jensen  
16 Raj Koothrappali Leonard Hofstadter  
14 Raj Koothrappali Sheldon Coopers  
16 Raj Koothrappali Howie Wolowits  
4 Raj Koothrappali Rav Toothrapaly  
14 Raj Koothrappali Ami Sarah Fowler  
14 Raj Koothrappali Stu Broom  
16 Raj Koothrappali Alexander Jensen  
14 Leslie Winkle Leonard Hofstadter  
13 Leslie Winkle Sheldon Coopers  
11 Leslie Winkle Howie Wolowits  
14 Leslie Winkle Rav Toothrapaly  
14 Leslie Winkle Ami Sarah Fowler  
12 Leslie Winkle Stu Broom  
12 Leslie Winkle Alexander Jensen  
17 Bernadette Rostenkowski Leonard Hofstadter  
18 Bernadette Rostenkowski Sheldon Coopers  
18 Bernadette Rostenkowski Howie Wolowits  
19 Bernadette Rostenkowski Rav Toothrapaly  
20 Bernadette Rostenkowski Ami Sarah Fowler  
20 Bernadette Rostenkowski Stu Broom  
17 Bernadette Rostenkowski Alexander Jensen  
15 Amy Farrah Fowler Leonard Hofstadter  
14 Amy Farrah Fowler Sheldon Coopers  
15 Amy Farrah Fowler Howie Wolowits  
14 Amy Farrah Fowler Rav Toothrapaly  
3 Amy Farrah Fowler Ami Sarah Fowler  
14 Amy Farrah Fowler Stu Broom  
13 Amy Farrah Fowler Alexander Jensen  
15 Stuart Bloom Leonard Hofstadter  
12 Stuart Bloom Sheldon Coopers  
12 Stuart Bloom Howie Wolowits  
14 Stuart Bloom Rav Toothrapaly  
13 Stuart Bloom Ami Sarah Fowler  
4 Stuart Bloom Stu Broom  
14 Stuart Bloom Alexander Jensen  
15 Alex Jensen Leonard Hofstadter  
12 Alex Jensen Sheldon Coopers  
13 Alex Jensen Howie Wolowits  
15 Alex Jensen Rav Toothrapaly  
13 Alex Jensen Ami Sarah Fowler  
10 Alex Jensen Stu Broom  
5 Alex Jensen Alexander Jensen  
15 Barry Kripke Leonard Hofstadter  
13 Barry Kripke Sheldon Coopers  
13 Barry Kripke Howie Wolowits  
12 Barry Kripke Rav Toothrapaly  
13 Barry Kripke Ami Sarah Fowler  
10 Barry Kripke Stu Broom  
14 Barry Kripke Alexander Jensen  

As you can see, this displays the Levenshtein distance between the two sequences. Another option we have is to look at the ratio.

len1 = len(list_1)
len2 = len(list_2)
lensum = len1 + len2
for person_1 in list_1:
    for person_2 in list_2:
        levdist = nltk.metrics.edit_distance(person_1, person_2)
        nltkratio = (float(lensum) - float(levdist)) / float(lensum)
        if nltkratio > 0.70:
            print nltkratio, person_1, person_2

The end result can be seen below:

1.0 Leonard Hofstadter Leonard Hofstadter  
0.944444444444 Sheldon Cooper Sheldon Coopers  
0.777777777778 Howard Wolowitz Howie Wolowits  
0.777777777778 Raj Koothrappali Rav Toothrapaly  
0.833333333333 Amy Farrah Fowler Ami Sarah Fowler  
0.777777777778 Stuart Bloom Stu Broom  
0.722222222222 Alex Jensen Alexander Jensen

Mapping Mesh Blocks with TileMill

This quick tutorial will detail how to prepare the ABS Mesh Blocks for use with MapBox’s TileMill. Installing PostgreSQL, PostGIS, and TileMill is beyond the scope of this tutorial; there is plenty of documentation on how to do those tasks.

First, we create a database to import the shapefile and population data into:

Using ‘psql’ or ‘SQL Query’, create a new database:

CREATE DATABASE transport WITH TEMPLATE postgis20 OWNER postgres;
# Query returned successfully with no result in 5527 ms.

It is necessary to first import the Mesh Block spatial file using something like PostGIS Loader.
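If you prefer the command line over the loader GUI, shp2pgsql does the same job; the shapefile name and the SRID (4283, GDA94) below are assumptions based on the ABS release:

shp2pgsql -s 4283 -I MB_2011_NSW.shp mb_2011_nsw | psql -d transport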

We then create a table to import the Mesh Block population data:

CREATE TABLE tmp_x (id character varying(11), Dwellings numeric, Persons_Usually_Resident numeric);

And then load the data:

COPY tmp_x FROM '/home/kelvinn/censuscounts_mb_2011_aust_good.csv' DELIMITERS ',' CSV HEADER;

It is possible to load the imported data in QGIS and check it visually.

Now that we know the shapefile was imported correctly, we can merge the population data with the spatial data. The following queries are used to merge the datasets:

UPDATE mb_2011_nsw
SET    dwellings = tmp_x.dwellings FROM tmp_x
WHERE  mb_2011_nsw.mb_code11 = tmp_x.id;

UPDATE mb_2011_nsw
SET    pop = tmp_x.persons_usually_resident FROM tmp_x
WHERE  mb_2011_nsw.mb_code11 = tmp_x.id;

We can do a rough validation by using this query:

SELECT sum(pop) FROM mb_2011_nsw;

And we get 6,916,971, which is about right (the ABS puts the official 2011 NSW population at 7.21 million).

Finally, using TileMill, we can connect to the PostGIS database and apply some themes to the map.

host=127.0.0.1 user=MyUsername password=MyPassword dbname=transport
(SELECT * from mb_2011_nsw JOIN westmead_health on mb_2011_nsw.mb_code11 = westmead_health.label) as mb

After generating the MBTiles file I pushed it to my little $15/year VPS and used TileStache to serve the tiles and UTFGrids. The TileStache configuration I am using looks something like this:

{
  "cache": {
    "class": "TileStache.Goodies.Caches.LimitedDisk.Cache",
    "kwargs": {
        "path": "/tmp/limited-cache",
        "limit": 16777216
    }
  },
  "layers": 
  {
    "NSWUrbanDensity":
    {
        "provider": {
            "name": "mbtiles",
            "tileset": "/home/user/mbtiles/NSWUrbanDensity.mbtiles"
        }
    },
    "NSWPopDensity":
    {
        "provider": {
            "name": "mbtiles",
            "tileset": "/home/user/mbtiles/NSWPopDensity.mbtiles"
        }
    }
  }
}

Migrate Custom Blog to Blogger

For the last ten years I have run this website on various systems. First it was WordPress, then Mambo, then Joomla, and since early 2006 it has been running on custom code written using Django. I used this site as a learning tool for Django, re-wrote it after gaining more knowledge of Django, and then re-wrote it again when Google released App Engine. However, I recently realised that for the last few years I have spent more time writing little features than actually writing. There are entire trips I never wrote up because I was too busy writing code.

This week it all changed. I did the unthinkable. I moved this website to Blogger.

After evaluating some of Blogger’s features (custom domains, location storage, the ability to filter on labels, custom HTML/CSS, great integration with Picasa, and the mobile app), I realised I could virtually replace everything I had previously custom made.

This post gives a technical description of how to migrate a site running Django, but it readily applies to any blog running on custom code. I initially spent a fair bit of time trying to figure out how to convert my existing RSS feed into something Blogger could accept, but every solution required troubleshooting. I soon remembered why I love Django so much, and that it would be trivial to generate the correct XML for import.

  1. Create Blogger Template
    I wanted to keep my design, so I hacked it to support Blogger. Take one of the existing templates, edit the HTML, and adjust it for your design. If you’ve worked with templates before this shouldn’t be too difficult.
  2. Generate Sample XML
    The first step was to generate a sample XML file from Blogger to see what would be needed for import. Create a sample post with a unique name, a few labels, and a location. In Blogger, go to Settings->Other and click Export Blog. The first 90% of the file will be for your template and other settings, but eventually you will find a section with entry elements in it. Copy this sample element out - this will become your template.
  3. Format Template
    Using the sample section from the blog export, format it so the view you will create populates it correctly. A note of caution: the template needs the time in ISO 8601 format, you need the id element, and the location element needs coordinates if there is a name. It won’t import later if there is a name with no coordinates. My template looks like this:

feeds/rss.html

{% load blog_extras %}
{% for entry in entries %}
    <entry>
        <id>tag:blogger.com,1999:blog-1700991654357243752.post-{% generate_id %}</id>
        <published>{{ entry.publish_date|date:"Y-m-d" }}T10:30:00.000123</published>
        <updated>{{ entry.publish_date|date:"Y-m-d" }}T10:30:00.000123</updated>
        {% for tag in entry.tags %}
        <category scheme="http://www.blogger.com/atom/ns#" term="{{ tag }}"/>
        {% endfor %}

        <title type="text">{{ entry.title }}</title>
        <content type="html">{{ entry.content }}</content>

        <author>
            <name>Joe Bloggs</name>
            <uri>https://plus.google.com/12345689843655881853</uri>
            <email>[email protected]</email>
        </author>
    </entry>
{% endfor %}

This isn’t really RSS, so if you are pedantic you can name it something else. You will notice I loaded some template tags in there (“blog_extras”); these generate the random number needed for the id element. Here’s the template tag:

blog_extras.py

import random

from django import template

register = template.Library()

def generate_id():
    # Build a random numeric string for the Blogger post id
    id = ""
    for x in xrange(1, 7):
        id = id + str(int(random.uniform(400, 900)))
    id = id + "8"
    return {'obj': id}
register.inclusion_tag('blog/generate_id.html')(generate_id)

/blog/generate_id.html

{{ obj }}
  4. Create Code To Populate View

This section should be easy if you have written your blog in Django. Simply populate the template shown above as “rss.html”.

blog/views.py

def show_rss(self):
    q = Entry.all()
    q = q.filter("genre !=", "blog")
    entries = q.fetch(500)
    return render_to_response("feeds/rss.html", {
        'entries': entries,
        }, mimetype='text/plain')

I did a filter on the model to not include “blog” entries - these are my travel stories, and I exported them separately. Remember that this is all happening on App Engine, so you will need to adjust if using Django’s normal ORM.
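For what it’s worth, a rough equivalent using Django’s normal ORM would look something like this (model and field names assumed to match the App Engine version):

# Hypothetical ORM-based version of the view above; names are assumptions
from django.shortcuts import render_to_response
from blog.models import Entry

def show_rss(request):
    entries = Entry.objects.exclude(genre="blog")[:500]
    return render_to_response("feeds/rss.html", {
        'entries': entries,
        }, mimetype='text/plain')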

  5. Download Entries

Visit the URL you mapped to the “show_rss” function in urls.py; it should generate your list of entries. Copy and paste those entries into the exported XML from Blogger, where you took out the original entry element.

  6. Import Entries

Now go to Blogger and import your blog. With any luck you will have imported all your entries. You will probably need to do this a few times as you tweak the text. I had to remove some newlines from my original posts.

Optional Steps

  1. Create Redirect URLs
    Links in Blogger appear to only end in .html, which is a problem for links coming from Django. Luckily, Blogger includes the ability to add redirects: go to Settings -> Search Preferences and edit the redirects there. I generated a list of my old URLs and combined that with a list of the new URLs. Hint: you can use Yahoo Pipes to extract a list of URLs from an RSS feed. If you open any of the links in Excel and split on forward slashes, remember that it will cut off leading zeros; set that field to TEXT during import.

I decided not to create redirects for every entry, as I didn’t really have time, and it probably only matters if somebody links directly to that page. I opened Google Analytics, looked at the Search Engine Optimisation page, and sorted it by the most used inbound links. Once I got down to entries with only one inbound request per month, I stopped creating redirects.

  2. Host Stylesheets and Images Externally

Blogger won’t host files, so you need to work around this. All my images are generally from Picasa, except for a few very specific website-related ones. I moved those to Amazon’s S3 and updated the links. I did the same with my CSS. You could probably store them in Google Storage, too.
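If you go the S3 route, syncing a folder of static assets is a one-liner with the AWS CLI these days (the bucket name is a placeholder):

aws s3 sync ./static s3://my-blog-assets/static --acl public-read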

  3. Create Filters on Labels

If you had any previous groupings, you can still link to them using label searches (in my case I added the genre as a label). The syntax is “/search/label/labelname/”, as you can see in my howtos section link.

  4. Update Webmaster Tools

If your site is part of Google’s Webmaster Tools, you will want to log in and check that everything looks OK. You will also probably want to update your sitemap (send Google your atom.xml feed).

How to convert 131500 TDX to GTFS

TDX data has been available for a number of years on 131500.info, but many tools are GTFS-specific. I also find GTFS easier to work with.

Luckily, converting from TDX to GTFS is not overly difficult, and below are some instructions. This howto is a bit old, as I am only now copying it from my “Notes” folder to put online to help others.

Note: You can now directly download GTFS from the TransportInfo website: https://tdx.131500.com.au

  1. Sign up for an account with EC2 (AWS), unless you have 16GB of memory available on a machine.
  2. Upload TransXChange2GTFS to a place you can download from.
  3. Upload the latest TDX data dump from 131500.info to a place you can download from.
  4. Log in to AWS and start an EC2 instance. I picked a large instance and used 64-bit Ubuntu 10.04, us-east-1 ami-f8f40591.
  5. Download the Data and transxchange to /mnt
wget http://ec2-175-41-139-176.ap-southeast-1.compute.amazonaws.com/Data20110127.zip
wget http://cdn.kelvinism.com/transxchange2GoogleTransit.jar

EDIT 16-03-2025: I’ve since removed these files.

  6. Install Sun JRE.
apt-get install python-software-properties
add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update
apt-get install sun-java6-jre
  7. Check how much memory is available
root@domU-12-31-39-10-31-B1:/mnt# free -m
             total       used       free     shared    buffers     cached
Mem:          7680        626       7053          0         11        329
-/+ buffers/cache:        285       7394
Swap:            0          0          0
  8. Create a configuration file sydney.conf
url=http://131500.info
timezone=Australia/Sydney
default-route-type=2
output-directory=output
useagencyshortname=true
skipemptyservice=true
skiporhpanstops=true
  9. If you’re on the train like me, start screen, and start converting. The number you pick for “-Xmx” obviously needs to fit in the amount of free memory you have.
java -Xmx104000m -jar dist\\transxchange2GoogleTransit.jar Data20120524.zip -c sydney.conf