IoT Foray with Sonoff S20 / IFTTT / Lambda / CloudMQTT

I recently purchased an Echo from Amazon, and we were contemplating how else to better integrate it with our somewhat minimalistic home. I thought it would be interesting to get it to link to a WiFi-enabled power outlet, but unfortunately they are pretty expensive in Australia.

Then I stumbled across the Sonoff devices by Itead, and learned that they were somewhat hackable via a custom firmware. Coincidentally I received the two devices on the same day my daughter was off sick, so when she had her nap, I got hacking.

The first hurdle was discovering that the units I received did not have any headers. A little quick soldering later, and we had headers.

No headers mom :(

Now we have headers!

A note of warning: the $2 programmer I got from AliExpress offers both 3.3V and 5V, but the output is 5V. Feeding 5V into the Sonoff’s 3.3V ESP8266 could have damaged it, so I’m glad I measured it with my multimeter and used a random 3.3V breadboard supply instead.

In hindsight I wish I had just purchased the FTDI programmer from Itead. It looks pretty neat.

After following the rest of the Tasmota hardware instructions, and then the PlatformIO instructions, I was able to successfully flash both my units with the custom firmware.

I then created a Lambda function that sends a signal to CloudMQTT, and connected the two devices.
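For anyone curious what such a function might look like, here is a minimal sketch rather than my actual Lambda. It assumes Tasmota's default topic layout (cmnd/<topic>/POWER with a payload of ON, OFF or TOGGLE), the third-party paho-mqtt library bundled with the function, and hypothetical names for the event fields ("device", "state") and environment variables (MQTT_HOST, MQTT_PORT, MQTT_USER, MQTT_PASS).

```python
import os


def build_power_command(device_topic, state):
    """Return the (topic, payload) pair for a Tasmota power command.

    Assumes the firmware's default full topic, i.e. cmnd/<topic>/POWER.
    """
    state = state.upper()
    if state not in ("ON", "OFF", "TOGGLE"):
        raise ValueError("state must be ON, OFF or TOGGLE")
    return "cmnd/%s/POWER" % device_topic, state


def handler(event, context):
    """Hypothetical Lambda entry point; the event field names are assumptions."""
    # Imported here so the helper above works without paho-mqtt installed.
    import paho.mqtt.publish as publish

    topic, payload = build_power_command(event["device"], event["state"])
    publish.single(
        topic,
        payload,
        hostname=os.environ["MQTT_HOST"],   # assumed variable names
        port=int(os.environ["MQTT_PORT"]),
        auth={
            "username": os.environ["MQTT_USER"],
            "password": os.environ["MQTT_PASS"],
        },
    )
    return {"topic": topic, "payload": payload}
```

Calling build_power_command("sonoff", "on") gives the pair ("cmnd/sonoff/POWER", "ON"), which is what the broker forwards to the device.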

Voila!

Geocoding Photos (Mac)

I’ve recently started using OS X (again), and am really enjoying it (again). One Windows-only tool that I found really useful is Geosetter, which allows you to add geo coordinates into photos. There don’t appear to be any free geocoding tools on the Mac that work to my satisfaction, so the next best thing was to geocode like you would using Linux. Here’s how.

We’re going to use the command line program ExifTool (by Phil Harvey) to extract coordinates from a gpx file and embed them in a directory of images.

Firstly, install exiftool using brew. Here’s the command:

brew install exiftool

Copy the gpx files into your image directory and initiate the sync with the geotag flag:

exiftool -geotag=gpslog2014-12-10_212401.gpx ./

It is also possible to specify multiple gpx files (e.g. for a multi-day trip):

exiftool -geotag=gpslog2014-12-10_212401.gpx -geotag=gpslog2014-12-07_132315.gpx -geotag=gpslog2014-12-08_181318.gpx -geotag=gpslog2014-12-10_073811.gpx ./

And finally, you can include a time offset with the geosync flag. For instance, I had an 11-hour (39600 seconds) difference due to a timezone hiccup with my new camera, so we can get rid of that:

exiftool -geotag=gpslog2014-12-10_212401.gpx -geotag=gpslog2014-12-07_132315.gpx -geotag=gpslog2014-12-08_181318.gpx -geotag=gpslog2014-12-10_073811.gpx -geosync=39600 ./

It will process the images, keeping a copy of each original with “_original” appended to the file name, and give you a report at the end:

1 directories scanned
193 image files updated
83 image files unchanged
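If you have a lot of gpx files, typing out every -geotag flag gets tedious. The following is a small sketch, not part of ExifTool, that builds the same argument list as the commands above. The function names are my own, and it only uses the -geotag and -geosync flags already shown.

```python
def geosync_seconds(hours, minutes=0):
    """Convert a clock offset into the number of seconds passed to -geosync."""
    return hours * 3600 + minutes * 60


def build_geotag_command(gpx_files, target="./", sync_seconds=None):
    """Build the exiftool argument list for geotagging a directory of images."""
    args = ["exiftool"]
    # One -geotag flag per track log, exactly as in the manual commands.
    args += ["-geotag=%s" % gpx for gpx in gpx_files]
    if sync_seconds is not None:
        args.append("-geosync=%d" % sync_seconds)
    args.append(target)
    return args


command = build_geotag_command(
    ["gpslog2014-12-10_212401.gpx", "gpslog2014-12-07_132315.gpx"],
    sync_seconds=geosync_seconds(11),
)
print(" ".join(command))
```

The resulting list can be passed straight to subprocess.call(command) once you are happy with what it prints.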

If your camera is set to GMT, then put all the GPX files in the same directory as the photos to geocode, and do this:

TZ=GMT exiftool -geotag "*.gpx" *.jpg

For any additional manual geocoding I fall back on Picasa’s Places GeoTag to add the coordinates.

If you have Lightroom, then try doing a search for a suitable ExifTool Lightroom plugin, as there seem to be a few.

Snap-CI Deploy to OpenShift

There are some wonderful CI / CD tools out there right now, and some of them have very usable free tiers. A few good examples include Shippable, Wercker, CloudBees, and Snap-CI. There are others, of course, but these all allow at least one private project to get started.

I have recently moved my projects to Snap, and my hack for the day needed to be deployed to OpenShift. Although Snap has built-in integrations for some providers, no such integration exists for OpenShift (yet!). However, it takes less than 10 minutes to configure a Deploy step to OpenShift, and here’s how.

Add SSH Keys
You will need to add your private SSH key (i.e. id_rsa) to Snap, and your public key (i.e. id_rsa.pub) to OpenShift.

You can create the keys on another machine with the ssh-keygen command, and copy them into the corresponding places. In OpenShift, this is under Settings -> Add a new key. Once open, paste in the contents of your id_rsa.pub key.

In Snap, edit your configuration, navigate to your Deploy step, and look for “Secure Files” and “Add new”.

Get the content of the id_rsa key you generated earlier and paste it in the content box. It should look like this, with “/var/go” as the file location, except with a real key:

Enable Git Push from Snap

If you’ve used ssh much, you are probably aware that you can specify an identity file with the “-i” flag. The git command has no such flag, yet, but we can create a simple bash script that emulates this (script courtesy of Alvin Abad).

Add another New File in Snap and paste in the below script:

#!/bin/bash
 
# The MIT License (MIT)
# Copyright (c) 2013 Alvin Abad
 
if [ $# -eq 0 ]; then
    echo "Git wrapper script that can specify an ssh-key file
Usage:
    git.sh -i ssh-key-file git-command
    "
    exit 1
fi
 
# remove temporary file on exit
trap 'rm -f /tmp/.git_ssh.$$' 0
 
if [ "$1" = "-i" ]; then
    SSH_KEY=$2; shift; shift
    echo "ssh -i $SSH_KEY \$@" > /tmp/.git_ssh.$$
    chmod +x /tmp/.git_ssh.$$
    export GIT_SSH=/tmp/.git_ssh.$$
fi
 
# in case the git command is repeated
[ "$1" = "git" ] && shift
 
# Run the git command
git "$@"

Give this script the name “git.sh”, set the file permissions to “0755”, and update the file location to “/var/go”.
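As an aside, if your build step happens to be a Python script, a similar effect can be had without a wrapper. This is a sketch of a different technique from the script above: it assumes Git 2.3 or newer, which reads the ssh command from the GIT_SSH_COMMAND environment variable. The function names are hypothetical.

```python
import os
import shlex
import subprocess


def git_env_with_key(key_path, base_env=None):
    """Return a copy of the environment that makes git use a specific SSH key."""
    env = dict(os.environ if base_env is None else base_env)
    # IdentitiesOnly stops ssh from offering other keys it may find first.
    env["GIT_SSH_COMMAND"] = "ssh -i %s -o IdentitiesOnly=yes" % shlex.quote(key_path)
    return env


def git_push(remote_url, key_path):
    """Run `git push <remote_url>` with the given key; returns git's exit code."""
    return subprocess.call(["git", "push", remote_url], env=git_env_with_key(key_path))
```

The wrapper script remains the safer choice on build agents with an older git, since GIT_SSH works there too.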

Profit
With all these parts configured correctly you can add this single line to your Deploy script:

/var/go/git.sh -i /var/go/id_rsa push ssh://[email protected]/~/git/example.git/

Re-run the build, check your logs, and it should deploy. Good luck!

Solved: slow build times from Dockerfiles with Python packages (pip)

I have recently had the opportunity to begin exploring Docker, the currently hip way to build application containers, and I generally like it. It feels a bit like using Xen back in 2005, when you still had to download it from cl.cam.ac.uk, but there is huge momentum right now. I like the idea of breaking down each component of your application into unique services and bundling them up - it seems clean. The next year is going to be very interesting with Docker, as I am especially looking forward to seeing how Google’s App Engine allows Docker usage, or what’s in store for the likes of Flynn, Deis, CoreOS, or Stackdock.

One element I had been frustrated with is the build time of my image to host a Django application I’m working on. I kept hearing about these crazy low rebuild times, but my container was taking ages to rebuild. I noticed that it was cached up until I re-added my code, and then pip would reinstall all my packages.

It appeared as though anything after I used ADD for my code was being rebuilt, and reading online seemed to confirm this. Most of the items were very quick, e.g. “EXPOSE 80”, but then it hit “RUN pip install -r requirements.txt”.

There are various documented ways around this, from two Dockerfiles to just using packaged libraries. However, I found it easier to just use multiple ADD statements, and the good Docker folks have added caching for them. The idea is to ADD your requirements first, then RUN pip, and then ADD your code. This will mean that any code changes don’t invalidate the pip cache.
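To see why the ordering matters, here is a toy model of the layer cache. It is only a sketch of the idea, not how Docker is implemented: I am assuming each layer's cache key depends on its parent's key plus the instruction text, and that ADD also mixes in a hash of the files it copies. The content hashes are made-up placeholders.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction; each key chains on the previous one."""
    keys = []
    parent = ""
    for instruction in instructions:
        material = parent + "|" + instruction
        if instruction.startswith("ADD "):
            source = instruction.split()[1]
            # For ADD, the content of the copied files is part of the key.
            material += "|" + file_hashes[source]
        parent = hashlib.sha256(material.encode("utf-8")).hexdigest()
        keys.append(parent)
    return keys


def rebuilt(instructions, before, after):
    """List the instructions whose cache key changes between two builds."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [i for i, o, n in zip(instructions, old, new) if o != n]


# Only the application code changes between the two builds.
before = {".": "code-v1", "requirements.txt": "reqs-v1"}
after = {".": "code-v2", "requirements.txt": "reqs-v1"}

slow = ["FROM ubuntu:14.04",
        "ADD . /app",
        "RUN pip install -r /app/requirements.txt"]
fast = ["FROM ubuntu:14.04",
        "ADD requirements.txt /app/requirements.txt",
        "RUN pip install -r /app/requirements.txt",
        "ADD . /app"]

print(rebuilt(slow, before, after))  # the ADD and the pip layer
print(rebuilt(fast, before, after))  # only the final ADD
```

In the first ordering the pip layer sits downstream of the code, so it is rebuilt every time. In the second it only depends on requirements.txt.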

For instance, I had something (abbreviated snippet) like this:

# Set the base image to Ubuntu
FROM ubuntu:14.04

# Update the sources list
RUN apt-get update
RUN apt-get upgrade -y

# Install basic applications
RUN apt-get install -y build-essential

# Install Python and Basic Python Tools
RUN apt-get install -y python python-dev python-distribute python-pip postgresql-client

# Copy the application folder inside the container
ADD . /app

# Get pip to download and install requirements:
RUN pip install -r /app/requirements.txt

# Expose ports
EXPOSE 80 8000

# Set the default directory where CMD will execute
WORKDIR /app

VOLUME ["/app"]

CMD ["sh", "/app/run.sh"]

And it reran pip whenever the code changed. Just add the requirements file first and move the RUN pip line above the ADD for the code:

# Set the base image to Ubuntu
FROM ubuntu:14.04

# Update the sources list
RUN apt-get update
RUN apt-get upgrade -y

# Install basic applications
RUN apt-get install -y build-essential

# Install Python and Basic Python Tools
RUN apt-get install -y python python-dev python-distribute python-pip postgresql-client

ADD requirements.txt /app/requirements.txt

# Get pip to download and install requirements:
RUN pip install -r /app/requirements.txt

# Copy the application folder inside the container
ADD . /app

# Expose ports
EXPOSE 80 8000

# Set the default directory where CMD will execute
WORKDIR /app

VOLUME ["/app"]

CMD ["sh", "/app/run.sh"]

I feel a bit awkward for having missed something that must be so obvious, so hopefully this can help somebody in a similar situation.

TLS Module In SaltStack Not Available (Fixed)

I was trying to install HALite, the WebUI for SaltStack, using the provided instructions. However, I kept getting the following errors when trying to create the certificates using Salt:

'tls.create_ca_signed_cert' is not available.  
'tls.create_ca' is not available.

Basically, the 'tls' module in Salt simply didn’t appear to work. The reason for this is detailed on intothesaltmind.org:

Note: Use of the tls module within Salt requires the pyopenssl python extension.

That makes sense. We can fix this with something like:

apt-get install libffi-dev  
pip install -U pyOpenSSL  
/etc/init.d/salt-minion restart

Or, better yet, with Salt alone:

salt '*' cmd.run 'apt-get install -y libffi-dev'  
salt '*' pip.install pyOpenSSL  
salt '*' cmd.run "service salt-minion restart"

The commands to create the PKI key should work now:

Created Private Key: "/etc/pki/salt/salt_ca_cert.key." Created CA "salt": "/etc/pki/salt/salt_ca_cert.crt."  
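If the module still refuses to load, it is worth checking that the dependency is visible to Python at all. Here is a minimal sketch, assuming you run it with the same Python interpreter that Salt uses. The function names are my own; the only fact relied upon is that pyOpenSSL installs a package importable as "OpenSSL".

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def tls_dependency_status():
    """Report whether the pyOpenSSL package is importable."""
    # pyOpenSSL is installed under the import name "OpenSSL".
    return "ok" if has_module("OpenSSL") else "missing pyOpenSSL"


print(tls_dependency_status())
```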

Beers of Myanmar

While in Myanmar on a recent trip I did a brief taste comparison of the three main beers available in most supermarkets.

Andaman - Not to my taste, perhaps like XXXX, VB, Natural Light, or a light Steel Reserve.
Myanmar - Quite refreshing, a bit like similar beers in the region, e.g. Chang, Tiger, or Laos Beer.

ABC - An extra stout (and 8%!) in such a hot country? That’s a surprise.

Error opening /dev/sda: No medium found

I have had this issue before, solved it, and had it again.

Let’s say you plug a USB drive into a Linux machine, and try to access it (mount it, partition it with fdisk/parted, or format it), and you get the error:

Error opening /dev/sda: No medium found  

Naturally the first thing you will do is ensure that it appeared when you plugged it in, so you run ‘dmesg’ and get:

sd 2:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)  

And it appears in /dev:

Computer:~ $ ls /dev/sd*  
/dev/sda  
Computer:~ $  

Now what? Here’s what has bitten me twice: make sure the drive has enough power. Let’s say you plugged a 2.5" USB drive into a Raspberry Pi. The Pi probably can’t supply enough current to spin up the drive, but it can supply enough for the drive to be recognised. Or, if you are like me, the USB charger powering the drive is faulty, so even though the drive has power, it doesn’t have enough.

The next troubleshooting step should be obvious: give the drive enough power to completely spin up.
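If you want a script to tell this failure apart from others, "No medium found" has its own error number on Linux (ENOMEDIUM). The following is a small sketch under that assumption. The function name and the returned messages are my own.

```python
import errno
import os

# ENOMEDIUM is Linux-specific; 123 is its value there (assumed fallback).
ENOMEDIUM = getattr(errno, "ENOMEDIUM", 123)


def probe_device(path):
    """Try to open a device node read-only and describe what happened."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == ENOMEDIUM:
            return "no medium found: check the drive's power supply"
        if exc.errno == errno.ENOENT:
            return "no such device node"
        return "other error: %s" % os.strerror(exc.errno)
    os.close(fd)
    return "readable"
```

Running probe_device("/dev/sda") as root on the affected machine should hit the first branch while the drive is under-powered.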

Continuous Flow Through Worm Bin

Status: ✅

A few months ago we decided we wanted a worm bin, as we were eating a lot of vegetables, and tossing away bits that weren’t used. We were also buying soil for our plants, so it made sense to try to turn one into another.

One of our friends gave us some worms from her compost - no idea what kind - and I built an experimental CFT worm bin (sample plans). We harvested once at about two months, but I don’t think it was quite ready. We’ll keep experimenting.

Free Splunk Hosting

I first used Splunk about 10 years ago after an old colleague installed it on a computer in the corner, and ever since then I have preached about it. If you have log data, of any kind, I’d recommend you give it a go.

The Splunk people have a few pretty good options for trying Splunk out, as you can either use Splunk Storm or Splunk Free. The first option is obviously hosted, and has a generous storage option, but does not allow long-term storage of data. I send system log data to Splunk Storm.

However, what if you don’t have a lot of data, but you want to keep that data forever? After reading Ed Hunsinger’s Go Splunk Yourself entry about using it for Quantified Self data, I knew I had to do the same.

From personal experience, Splunk requires at least 1GB of memory to even start. You can probably get it to run on less, but I haven’t had much success. This leaves two options: look at Low End Box for a VPS with enough memory (as cheap as $5/month), or use OpenShift. Red Hat generously provides three “gears” to host applications, for free, and each with 1GB of memory. I have sort of a love-hate relationship with OpenShift, maybe a bit like using OAuth. Red Hat calls OpenShift the “Open Hybrid Cloud Application Platform”, and I can attest that it is really this. They have provided a method to bundle an application stack and push it into production without needing to fuss about infrastructure, or even provisioning and management of the application. It feels like what would happen if Google App Engine and Amazon’s EC2 had a child. Heroku or dotCloud might be its closest alternatives.

Anyways, this isn’t a review of OpenShift, although it would be a positive review, but instead a guide on how to use OpenShift to host Splunk. I first installed Splunk in a gear using Nginx as a proxy, and it worked. However, this felt overly complex, and after one of my colleagues started working on installing Splunk in a cartridge, I eventually agreed this would be the way to go. The result was a Splunk cartridge that can be installed inside any existing gear. Here are the instructions; you need an OpenShift account, obviously. The install should take less than ten clicks of your mouse, and one copy/paste.

From the cartridge’s GitHub README:

  1. Create an Application based on an existing web framework. If in doubt, just pick “Do-It-Yourself 0.1” or “Python 2.7”.
  2. Click on “Continue to the application overview page.”
  3. On the Application page, click on “Or, see the entire list of cartridges you can add”.
  4. Under “Install your own cartridge” enter the following URL: https://raw.github.com/kelvinn/openshift-splunk-cartridge/master/metadata/manifest.yml
  5. Click “Next” and then “Add Cartridge”. Wait a few minutes for Splunk to download and install.
  6. Log on to Splunk at: https://your-app.rhcloud.com/ui

More details can be read on the cartridge’s GitHub page, and I would especially direct you to the limitations of this configuration. This will all stop working if Splunk makes the installer file unavailable, but I will deal with that when the time comes. Feel free to alert me if this happens.

Finding The Same (Misspelled) Name Using Python/NLTK

I have been meaning to play around with the Natural Language Toolkit for quite some time, but I had been waiting for a time when I could experiment with it and actually create some value (as opposed to just playing with it). A suitable use case appeared this week: matching strings. In particular, matching two different lists of many, many thousands of names.

To give you an example, let’s say you had two lists of names, but with the name spelled incorrectly in one list:

List 1:
Leonard Hofstadter
Sheldon Cooper
Penny
Howard Wolowitz
Raj Koothrappali
Leslie Winkle
Bernadette Rostenkowski
Amy Farrah Fowler
Stuart Bloom
Alex Jensen
Barry Kripke

List 2:
Leonard Hofstadter
Sheldon Coopers
Howie Wolowits
Rav Toothrapaly
Ami Sarah Fowler
Stu Broom
Alexander Jensen

This could easily occur if somebody was manually typing in the lists, dictating names over the phone, or if people spelled their names differently (e.g. Phil vs. Phillip) at different times.

If we wanted to match people on List 1 to List 2, how could we go about that? For a small list like this you can just look and see, but with many thousands of people, something more sophisticated would be useful. One tool could be NLTK’s edit_distance function. The following Python script displays how easy this is:

import nltk
 
list_1 = ['Leonard Hofstadter', 'Sheldon Cooper', 'Penny', 'Howard Wolowitz', 'Raj Koothrappali', 'Leslie Winkle', 'Bernadette Rostenkowski', 'Amy Farrah Fowler', 'Stuart Bloom', 'Alex Jensen', 'Barry Kripke']
 
list_2 = ['Leonard Hofstadter', 'Sheldon Coopers', 'Howie Wolowits', 'Rav Toothrapaly', 'Ami Sarah Fowler', 'Stu Broom', 'Alexander Jensen']
 
for person_1 in list_1:
    for person_2 in list_2:
        print(nltk.metrics.edit_distance(person_1, person_2), person_1, person_2)

And we get this output:

0 Leonard Hofstadter Leonard Hofstadter  
15 Leonard Hofstadter Sheldon Coopers  
14 Leonard Hofstadter Howie Wolowits  
15 Leonard Hofstadter Rav Toothrapaly  
14 Leonard Hofstadter Ami Sarah Fowler  
16 Leonard Hofstadter Stu Broom  
15 Leonard Hofstadter Alexander Jensen  
14 Sheldon Cooper Leonard Hofstadter  
1 Sheldon Cooper Sheldon Coopers  
13 Sheldon Cooper Howie Wolowits  
13 Sheldon Cooper Rav Toothrapaly  
12 Sheldon Cooper Ami Sarah Fowler  
11 Sheldon Cooper Stu Broom  
12 Sheldon Cooper Alexander Jensen  
16 Penny Leonard Hofstadter  
13 Penny Sheldon Coopers  
13 Penny Howie Wolowits  
14 Penny Rav Toothrapaly  
16 Penny Ami Sarah Fowler  
9 Penny Stu Broom  
13 Penny Alexander Jensen  
11 Howard Wolowitz Leonard Hofstadter  
13 Howard Wolowitz Sheldon Coopers  
4 Howard Wolowitz Howie Wolowits  
15 Howard Wolowitz Rav Toothrapaly  
13 Howard Wolowitz Ami Sarah Fowler  
13 Howard Wolowitz Stu Broom  
14 Howard Wolowitz Alexander Jensen  
16 Raj Koothrappali Leonard Hofstadter  
14 Raj Koothrappali Sheldon Coopers  
16 Raj Koothrappali Howie Wolowits  
4 Raj Koothrappali Rav Toothrapaly  
14 Raj Koothrappali Ami Sarah Fowler  
14 Raj Koothrappali Stu Broom  
16 Raj Koothrappali Alexander Jensen  
14 Leslie Winkle Leonard Hofstadter  
13 Leslie Winkle Sheldon Coopers  
11 Leslie Winkle Howie Wolowits  
14 Leslie Winkle Rav Toothrapaly  
14 Leslie Winkle Ami Sarah Fowler  
12 Leslie Winkle Stu Broom  
12 Leslie Winkle Alexander Jensen  
17 Bernadette Rostenkowski Leonard Hofstadter  
18 Bernadette Rostenkowski Sheldon Coopers  
18 Bernadette Rostenkowski Howie Wolowits  
19 Bernadette Rostenkowski Rav Toothrapaly  
20 Bernadette Rostenkowski Ami Sarah Fowler  
20 Bernadette Rostenkowski Stu Broom  
17 Bernadette Rostenkowski Alexander Jensen  
15 Amy Farrah Fowler Leonard Hofstadter  
14 Amy Farrah Fowler Sheldon Coopers  
15 Amy Farrah Fowler Howie Wolowits  
14 Amy Farrah Fowler Rav Toothrapaly  
3 Amy Farrah Fowler Ami Sarah Fowler  
14 Amy Farrah Fowler Stu Broom  
13 Amy Farrah Fowler Alexander Jensen  
15 Stuart Bloom Leonard Hofstadter  
12 Stuart Bloom Sheldon Coopers  
12 Stuart Bloom Howie Wolowits  
14 Stuart Bloom Rav Toothrapaly  
13 Stuart Bloom Ami Sarah Fowler  
4 Stuart Bloom Stu Broom  
14 Stuart Bloom Alexander Jensen  
15 Alex Jensen Leonard Hofstadter  
12 Alex Jensen Sheldon Coopers  
13 Alex Jensen Howie Wolowits  
15 Alex Jensen Rav Toothrapaly  
13 Alex Jensen Ami Sarah Fowler  
10 Alex Jensen Stu Broom  
5 Alex Jensen Alexander Jensen  
15 Barry Kripke Leonard Hofstadter  
13 Barry Kripke Sheldon Coopers  
13 Barry Kripke Howie Wolowits  
12 Barry Kripke Rav Toothrapaly  
13 Barry Kripke Ami Sarah Fowler  
10 Barry Kripke Stu Broom  
14 Barry Kripke Alexander Jensen  

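As you can see, this displays the Levenshtein distance between each pair of names. Another option is to turn the distance into a ratio and only keep the pairs above a threshold.

# Note: lensum is the combined number of names in both lists (11 + 7 = 18),
# so every distance is scaled by the same constant.
len1 = len(list_1)
len2 = len(list_2)
lensum = len1 + len2
for person_1 in list_1:
    for person_2 in list_2:
        levdist = nltk.metrics.edit_distance(person_1, person_2)
        nltkratio = (float(lensum) - float(levdist)) / float(lensum)
        if nltkratio > 0.70:
            # rounded to 12 decimal places for readable output
            print(round(nltkratio, 12), person_1, person_2)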

The end result is below:

1.0 Leonard Hofstadter Leonard Hofstadter  
0.944444444444 Sheldon Cooper Sheldon Coopers  
0.777777777778 Howard Wolowitz Howie Wolowits  
0.777777777778 Raj Koothrappali Rav Toothrapaly  
0.833333333333 Amy Farrah Fowler Ami Sarah Fowler  
0.777777777778 Stuart Bloom Stu Broom  
0.722222222222 Alex Jensen Alexander Jensen
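One thing to be aware of: the ratio above divides by the combined number of names in the two lists (18), so it is really a fixed cut-off on the edit distance. A variation worth trying is to scale each distance by the lengths of the two names being compared. The sketch below does that with a plain-Python edit distance so it runs without NLTK. The function names and the best_match helper are my own, and the 0.70 threshold is simply carried over from above.

```python
def levenshtein(a, b):
    """Edit distance where insertions, deletions and substitutions all cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance by the combined length of the two strings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - levenshtein(a, b)) / float(total)


def best_match(name, candidates, threshold=0.70):
    """Return (candidate, score) for the closest name, or (None, score) if too far."""
    score, match = max((similarity(name, c), c) for c in candidates)
    return (match, score) if score >= threshold else (None, score)


list_2 = ['Leonard Hofstadter', 'Sheldon Coopers', 'Howie Wolowits',
          'Rav Toothrapaly', 'Ami Sarah Fowler', 'Stu Broom', 'Alexander Jensen']

for person in ['Stuart Bloom', 'Penny']:
    print(person, best_match(person, list_2))
```

Because each score is relative to the names involved, a short name like Penny is not matched to anything, while Stuart Bloom still pairs with Stu Broom.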