Free Splunk Hosting

I first used Splunk about 10 years ago after an old colleague installed it on a computer in the corner, and ever since then I have preached about it. If you have log data, of any kind, I’d recommend you give it a go.

The Splunk people have a few pretty good options for trying Splunk out, as you can use either Splunk Storm or Splunk Free. The first option is obviously hosted, and comes with a generous storage allowance, but it does not allow long-term storage of data. I send system log data to Splunk Storm.

However, what if you don’t have a lot of data, but you want to keep that data forever? After reading Ed Hunsinger’s Go Splunk Yourself entry about using it for Quantified Self data, I knew I had to do the same.

From personal experience, Splunk requires at least 1GB of memory to even start. You can probably get it to run on less, but I haven’t had much success. This leaves two options: look at Low End Box for a VPS with enough memory (as cheap as $5/month), or use OpenShift. Red Hat generously provides three free “gears” to host applications, each with 1GB of memory. I have sort of a love-hate relationship with OpenShift, maybe a bit like using OAuth. Red Hat calls OpenShift the “Open Hybrid Cloud Application Platform”, and I can attest that it really is this. They have provided a way to bundle an application stack and push it into production without needing to fuss about infrastructure, or even about provisioning and managing the application. It feels like what would happen if Google App Engine and Amazon’s EC2 had a child. Heroku or dotCloud might be its closest alternatives.

Anyways, this isn’t a review of OpenShift, although it would be a positive one, but rather a guide to using OpenShift to host Splunk. I first installed Splunk in a gear using Nginx as a proxy, and it worked. However, this felt overly complex, and after one of my colleagues started working on installing Splunk in a cartridge, I eventually agreed this would be the way to go. The result was a Splunk cartridge that can be installed inside any existing gear. Here are the instructions; you need an OpenShift account, obviously. The install should take less than ten clicks of your mouse, and one copy/paste.

From the cartridge’s GitHub README:

  1. Create an Application based on an existing web framework. If in doubt, just pick “Do-It-Yourself 0.1” or “Python 2.7”
  2. Click on “Continue to the application overview page.”
  3. On the Application page, click on “Or, see the entire list of cartridges you can add”.
  4. Under “Install your own cartridge” enter the following URL: https://raw.github.com/kelvinn/openshift-splunk-cartridge/master/metadata/manifest.yml
  5. Click Next and then Add Cartridge. Wait a few minutes for Splunk to download and install.
  6. Log on to Splunk at: https://your-app.rhcloud.com/ui

More details can be read on the cartridge’s GitHub page, and I would especially direct you to the limitations of this configuration. This will all stop working if Splunk makes the installer file unavailable, but I will deal with that when the time comes. Feel free to alert me if this happens.
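
If you want a quick way to check whether that installer file is still being served, a few lines of Python will do it. This is only a sketch: the URL below is a placeholder, not the exact download link pinned in the cartridge, so substitute whatever the cartridge’s install script actually fetches.

    import urllib.error
    import urllib.request

    # Placeholder URL: replace with the Splunk download link used by the cartridge.
    INSTALLER_URL = "https://download.splunk.com/path/to/splunk-installer.tgz"

    request = urllib.request.Request(INSTALLER_URL, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            print(f"Installer still available (HTTP {response.status})")
    except urllib.error.HTTPError as error:
        print(f"Installer looks to be gone (HTTP {error.code})")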

Sydney Commute Times Mapped Part 2

EDIT 12-03-2025: I accidentally broke the maps when deleting my AWS account, as the mbtiles were hosted there. Oops.

In Sydney Commute Times Mapped Part 1 I took a small step towards a bigger goal of mashing together public transport in Sydney and the Metropolitan Strategy for Sydney to 2031. The question I wanted to answer is this: how well aligned is Sydney’s public transport infrastructure with the Metropolitan Strategy’s vision of a “city of cities”?

I decided to find out.

Thanks to the release of GTFS data by 131500, it is possible to visualise how long it takes to commute via public transport to the nearest “centre”.

Cities and Corridors - Metropolitan Strategy for Sydney to 2031

The Australian Bureau of Statistics collects data based on “mesh blocks”: areas containing roughly 50 dwellings each. Last week I had some fun mapping the mesh blocks, as well as looking at Sydney’s urban densities. These mesh blocks are a good size to use when calculating commute times.

The simplified process I used was this, for the technically minded (a small code sketch follows the list):

  1. Calculate the centre of each mesh block
  2. Calculate the commute time via public transport from each block to every “centre” (using 131500’s GTFS and OpenTripPlanner’s Analyst tool)
  3. Import the times into a database and calculate the lowest commute time to each centre
  4. Visualise in TileMill
  5. Serve tiles with TileStache and visualise with Leaflet
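
To make steps 1 and 3 a little more concrete, here is a minimal Python sketch of how the centroid and “lowest commute time” calculations could look. The file names, property names and CSV columns are assumptions for illustration only, not the exact pipeline I used.

    import csv
    import json
    from collections import defaultdict

    from shapely.geometry import shape  # assumes Shapely is installed

    # Step 1: the centre (centroid) of each mesh block, read from a
    # hypothetical GeoJSON export of the ABS mesh block boundaries.
    with open("mesh_blocks.geojson") as f:
        features = json.load(f)["features"]

    centres = {}
    for feature in features:
        block_id = feature["properties"]["MB_CODE"]  # assumed property name
        centres[block_id] = shape(feature["geometry"]).centroid

    # Step 3: the lowest commute time per mesh block, from a hypothetical CSV
    # of OpenTripPlanner results with columns: block_id, centre_name, seconds.
    best_times = defaultdict(lambda: float("inf"))
    with open("commute_times.csv") as f:
        for row in csv.DictReader(f):
            block_id = row["block_id"]
            best_times[block_id] = min(best_times[block_id], float(row["seconds"]))

    # best_times now maps each mesh block to its quickest trip to any centre,
    # ready to be joined back onto the block geometries for styling in TileMill.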

The first map I created simply indicates how long it takes to reach the nearest centre. Accessibility appears to drop off rapidly on the fringes of Sydney. I was also surprised by what appears to be a belt of longer commute times stretching from Wetherill Park all the way to Marrickville. There also appears to be poorer accessibility in parts of Western Sydney. It is worth noting that I offer no guarantee of the integrity of the data in these maps, and I have seen a few spots where the commute times increase significantly between adjacent mesh blocks. This tells me the street data (from OpenStreetMap) might not be connected correctly.

My next map shows what areas are within 30 minutes.
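
Given the best_times mapping from the earlier sketch, the 30-minute view is just a filter on that result; again, only a sketch:

    # Blocks reachable within 30 minutes (1800 seconds) of their nearest centre,
    # using the best_times mapping built in the previous sketch.
    within_30_minutes = {
        block_id: seconds
        for block_id, seconds in best_times.items()
        if seconds <= 30 * 60
    }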

These maps were both created using open data and open source tools, which I find quite neat.

I have been interested in mapping traffic for a number of years, maybe ever since arriving in Sydney. It is sort of a hobby; I find making maps relaxing. My first little map was way back in 2008, when I visualised speed from a GPS unit. A little later I added some colour to the visualisations, and then used this as an excuse to create a little GUI for driving speed. My interest in visualising individual vehicles has decreased recently, as it has shifted to mapping wider systems. Have an idea you would like to see mapped? Leave a note in the comments.

Open Source Video Editing

In the next year I plan to make a little video, nothing fancy most likely, but something that will require an editor. However, I don’t own a Mac (which rules out Final Cut Pro + After Effects and iMovie, which Ian and I both have had too much fun with. Inside joke.) I’m also a die-hard Linux fan, trying to hold out on buying a Mac for as long as possible.

SF to the rescue. There are four editors listed, and in the next year I’ll try them all. Overall, they look quite promising.

Jahshaka – Beta. Good reviews from what I’ve seen.
Kdenlive – Alpha/Beta. Looks a lot less mature than Jahshaka, especially since I’m going to have to check it out via svn. But, the screenshots look quite impressive.
LiVES – Beta.