Ruby and Fortran libraries

I’ve been trying to learn Fortran, for reasons of future job security, masochism, and curiosity.

Since having a REPL for a language makes learning it so much easier, I wrote one in ruby, called frepl. It is pretty buggy, and needs serious refactoring, but mostly works. After having written it, I was informed that there are other such efforts, e.g. in Python, but after cursory examination, they seem a bit less REPL-y than frepl.
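I won’t reproduce frepl’s internals here, but the core trick a compile-and-run Fortran REPL can use is simple enough to sketch: keep everything entered so far, wrap it in a complete program unit, and hand the result to gfortran after each new line. A minimal sketch (hypothetical class and method names, not frepl’s actual code):

```ruby
# Sketch of a compile-and-run Fortran REPL session (illustrative only):
# every input line is stored, and the whole session is re-wrapped in a
# program unit that could be recompiled and re-run with gfortran.
class FortranSession
  DECL = /\A\s*(integer|real|character|logical|complex|type)\b/i

  def initialize
    @decls = []
    @stmts = []
  end

  # Crude heuristic: declaration-like lines go in the specification part,
  # everything else in the execution part.
  def add(line)
    (line =~ DECL ? @decls : @stmts) << line
  end

  def program_source
    ["program repl", "implicit none", *@decls, *@stmts, "end program repl"].join("\n")
  end

  # To actually run it, you'd write program_source to a file and shell out:
  #   system("gfortran -o repl_bin repl.f90") && system("./repl_bin")
end

s = FortranSession.new
s.add("integer :: i")
s.add("i = 41 + 1")
s.add("print *, i")
puts s.program_source
```

The expensive part, of course, is recompiling the whole session on every input, which is part of why a Fortran REPL feels different from one for an interpreted language.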

I also have been interested in calling Fortran from Ruby, with the ultimate goal of perhaps being able to benefit from Fortran’s superior array-handling abilities for doing machine learning work in ruby. Turns out this is somewhat possible with ruby-ffi. I say “somewhat” because it is not very ergonomic to call Fortran from Ruby, as far as I can tell, and of the solutions I’ve come up with, some seem unstable (as in, occasionally produce segfaults), especially when trying to interact with Fortran derived types.

The ruby-fortran FFI proof-of-concept is here, along with some benchmarks. As I suspected/hoped, Fortran is much faster than ruby at some array manipulation tasks, e.g. summing an array is ~10x faster, doing dot product is ~2x faster. I mean, when using Fortran without the FFI/Ruby overhead, I suspect the speed differences are even more pronounced–these numbers are specifically with regard to pure ruby vs. ruby-calling-fortran.

Fixing slow HTTP PUTs with Rails and Typhoeus

I’m working on a project that involves communication between two Rails apps, one as an API and the other as a backend for a web app. The web app communicates with the API via the excellent Typhoeus ruby gem, and everything was going swell until I got to implementing updates via HTTP PUTs for a couple of resources.

The problem was that they were way slower than any of the other requests, regardless of payload size–slower than GETs, DELETEs, even POSTs. I tried a PATCH and that was as fast as the rest.

Since at the time I was originally debugging this I had no Internet access, I began throwing logging into the internals of Typhoeus everywhere, and eventually narrowed it down to the Ethon gem, which wasn’t too much help since it is basically just an interface to the CURL library. But when I did a plain ol’ CURL PUT to the same endpoint, everything worked just fine.

I tried using tcpdump to diagnose the problem, suspecting that it lay in a difference in the way command-line CURL and Typhoeus were encoding the request, but my tcpdump-fu was weak: I figured the traffic I was interested in would not be going over any of the usual interfaces listed by `ifconfig -a`, but didn’t know which one it was, and it was around this time that I temporarily gave up, figuring some quick Googling would solve these problems when I got back online.

It turns out that this traffic goes over the “loopback” interface, which is in fact listed in the output of `ifconfig -a` as the first one on OSX: `lo0`, so I could have run `tcpdump -i lo0`. But now that I had the Internets back, I installed Wireshark, which for some reason I didn’t already have, and fired it up, setting a display filter for http traffic only.

After making a command line CURL PUT request, which was speedy, and a slow ruby Typhoeus request from the rails console, I had a look at the relevant captures in Wireshark and quickly noticed this difference:

Wireshark CURL request


Wireshark Typhoeus request

You’ll notice that the second, slow one is composed of two separate TCP frames.

Examining the headers further, I noticed this:

Expect: 100-continue header

Turns out that despite all the times I’ve looked at the list of HTTP status codes, I’ve always just skipped over the 1XXs. Wikipedia has a more detailed breakdown, but the gist is that clients may send an `Expect: 100-continue` header so that the server can check whether a request is acceptable based solely on the headers, which can be useful for large requests. It turns out that libcurl will send this header by default, though you can disable it with the `-H` command line flag (e.g. `-H "Expect:"`), which is apparently what my CURL commands were doing by specifying various content-type and authorization headers.
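To make the handshake concrete, here’s a toy client/server exchange using only Ruby’s standard library (an illustration of the mechanism, not of how libcurl or Typhoeus implement it):

```ruby
require "socket"

# Toy single-connection illustration of the "Expect: 100-continue" handshake.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

srv = Thread.new do
  conn = server.accept
  headers = []
  # Read the request line and headers, up to the blank line.
  while (line = conn.gets) && line != "\r\n"
    headers << line.strip
  end
  # The client asked permission to send the body; grant it with an
  # interim response before any body bytes have arrived.
  if headers.any? { |h| h.downcase == "expect: 100-continue" }
    conn.write "HTTP/1.1 100 Continue\r\n\r\n"
  end
  conn.read(4)  # now the 4-byte body arrives
  conn.write "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
  conn.close
end

client = TCPSocket.new("127.0.0.1", port)
client.write "PUT /thing HTTP/1.1\r\nHost: localhost\r\n" \
             "Expect: 100-continue\r\nContent-Length: 4\r\n\r\n"
interim = client.gets.strip  # the interim response
client.gets                  # blank line ending the interim response
client.write "data"          # only now is the body sent
final = client.gets.strip
puts interim
puts final
srv.join
client.close
```

The slowness I was seeing comes from the gap between sending the headers and sending the body: if the server never answers with a 100, the client sits there waiting.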

It also turns out that CURL will just send the second request anyway after one second if it doesn’t receive a 100 code, which explains why my API client was consistently slow but not broken.

Now, how to work around this. Some Googling revealed some people monkeypatching the Rails default Webrick server, which I didn’t want to do. I could also have hacked Typhoeus or Ethon, but that didn’t seem like the way to go either. The best way, it seemed, was to take advantage of this feature and have my application return a 100 code if the user was indeed authorized to make the request. (Alternatively, you can just specify an Expect header with an empty value, rather than “100-continue”.)

The sensible place to put this code, it seemed to me, was in Rack middleware. (It should be noted at this point that this does not seem to work with Webrick or Thin.) A barebones implementation could look like so:

class Preauthenticator
  def initialize app
    @app = app
  end

  def call env
    if env["HTTP_EXPECT"] =~ /100-continue/
      return [100, {}, [""]]
    end
    @app.call env
  end
end


(For an example from the Unicorn source code, see here.)
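You can sanity-check this behavior without booting a server by calling the middleware directly with fake Rack env hashes. A self-contained version (the downstream lambda app is made up for the example):

```ruby
# Barebones 100-continue middleware plus a stand-in downstream app.
class Preauthenticator
  def initialize app
    @app = app
  end

  def call env
    if env["HTTP_EXPECT"] =~ /100-continue/
      return [100, {}, [""]]
    end
    @app.call env
  end
end

app = Preauthenticator.new(->(env) { [200, {}, ["ok"]] })

expect_status, = app.call("HTTP_EXPECT" => "100-continue")
plain_status,  = app.call({})
puts expect_status  # intercepted before reaching the downstream app
puts plain_status   # passed through as normal
```

In a real app you’d do your authorization check before returning the 100, which is the whole point of taking advantage of this feature.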

The only tricky part here was figuring out how exactly to get Rails to use it. Middleware generally goes in your `config/application.rb` file (make sure to require your file with the middleware class in it at the top of that file), but including this like so: `config.middleware.use Preauthenticator` resulted in rack complaining about a “app error: deadlock; recursive locking (ThreadError)”. Running `rake middleware` will list all the middleware used by your Rails app; it seemed to me that my Preauthenticator should run before any of this, so I tried `config.middleware.insert_before Rack::Sendfile, Preauthenticator`, Rack::Sendfile being the first middleware listed, and that worked like a charm.

Now my HTTP PUTs were as speedy as the rest of my requests.

As of Dec 14, 2014, the middleware here will not work with Webrick 1.3.1, Thin 1.6.3, or Puma 2.10.0 ruby servers.


Typhoeus and CURL will make some PUTs and POSTs slow by sending an “Expect: 100-Continue” header. Add rack middleware that returns this header to make your PUTs and POSTs faster.

Rails 4 scopes with has_many :through relations

Given the following Rails 4 model:

class Producer < ActiveRecord::Base
  has_many :producer_types
  has_many :types, :through => :producer_types
end

There are a couple of ways you could create a scope on this model to retrieve only producers of a certain type. First, a hard-coded scope for a specific type name:


  scope :manufacturer, -> { where(:types => {:name => 'manufacturer'}).joins(:types) }

Second, to generate a dynamic scope for the given column name, you can take advantage of the following active record functionality:

  scope :with_type_name, lambda {|type_name|
    where(:types => {:name => type_name}).joins(:types)
  }
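For illustration only, the filtering these scopes express looks like this over plain in-memory data (ActiveRecord of course does it in SQL via the join; the sample records here are made up):

```ruby
# Plain-Ruby analogue of the with_type_name scope (illustrative only).
producers = [
  { name: "Acme",        types: ["manufacturer", "distributor"] },
  { name: "Widgets Inc", types: ["retailer"] },
]

with_type_name = ->(type_name) {
  producers.select { |p| p[:types].include?(type_name) }
}

puts with_type_name.call("manufacturer").map { |p| p[:name] }
```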

LAMP on Digital Ocean

I’ve been experimenting lately with using DigitalOcean both for remote dev environments (to facilitate development on a Chromebook) and also for staging/testing servers.

The DigitalOcean plugin for Vagrant, in concert with chef solo, makes spinning up and provisioning new instances a breeze, but this post is about going the slightly more manual route, and using one of the application bundles (currently in beta) for Ubuntu 12.04.

These steps are a combination of things I’ve gleaned from various other places and my experience. I make no claim for their soundness.

Create the droplet

Create a new droplet using the Ubuntu 12.04 x32 image (or x64; it shouldn’t matter for our purposes here), and select “LAMP on Ubuntu 12.04” from the Applications tab.

At this point, I’m going to assume that you’ll also add your SSH public keys, so you don’t have to log in with username and password credentials. (If you don’t have an SSH key, here are some instructions for generating one; you can just ignore the git-specific stuff.) An advantage of doing this is that your root password to the new droplet will not be sent over email (because there will be no root password).

When you’ve selected all the options, click “Create Droplet.”

Login and Setup

Droplet creation should take a minute at most, after which you’ll see a screen with various information about your newly created vm, including an IP address. If you specified that your SSH keys should be automatically added, you should be able to SSH in now.


First off, let’s change the default MySQL root password, as suggested by the login banner (note that the banner may continue to say that the password is still “password” even after you’ve changed it and logged in again):

mysqladmin -u root -p'password' password newpassword

Next, because we began with our SSH keys pre-installed, there’s no root password, so set one by typing `passwd` and following the prompts.

Next we’ll issue some commands with Ubuntu’s package manager, `apt-get`: first to update the list of available packages, and then to upgrade the installed ones:

apt-get update
apt-get upgrade

Next, install fail2ban, a service that scans logfiles and auto-bans IP addresses that show signs of malicious activity, a good line of defense against crackers:

apt-get install fail2ban

Next, I’ll install my text editor of choice, vim:

apt-get install vim

As well as unzip…

apt-get install unzip

… and ack

apt-get install ack-grep

Adding a user

Because it is, for a variety of reasons, generally not a good idea to do things as root, let’s add a new user: we’ll create a home folder for them with the right permissions, copy the contents of root’s authorized_keys file to the new user’s .ssh folder (so we can SSH in as that user), give them a password, and set their default shell to bash.

useradd luke
mkdir /home/luke
mkdir /home/luke/.ssh
chmod 700 /home/luke/.ssh

And add the contents of the root user’s authorized_keys file to that of the new user:

cat .ssh/authorized_keys > /home/luke/.ssh/authorized_keys
chmod 400 /home/luke/.ssh/authorized_keys
chown -R luke:luke /home/luke
passwd luke
chsh -s /bin/bash luke

Now we’ll give that user the ability to run commands as root via `sudo`. Type `visudo` then enter this line, say, below the similar one for root (it doesn’t matter where, actually):

luke ALL=(ALL) ALL

Hit Ctrl-X when done editing (the default editor is nano), and confirm saving the file.

Now we’ll disable remote root login. Edit /etc/ssh/sshd_config with vim, or however you prefer: change “PermitRootLogin yes” to “PermitRootLogin no”, and uncomment the line “#PasswordAuthentication yes” and change it to “no”, so that those lines read:

PermitRootLogin no
PasswordAuthentication no

This will mean you can only log in from machines that have your SSH private key. Following that, we need to restart ssh:

service ssh restart

Now we’ll install a firewall to control which ports we allow traffic into. We’ll allow SSH and SFTP (port 22), HTTP (80), and HTTPS (443).

apt-get install ufw
ufw allow 22
ufw allow 80
ufw allow 443
ufw enable

You may get a warning after the last command about this disrupting your SSH session, but you should be able to ignore it.

Other dev tools

git, rvm + ruby, tmux

apt-get install git
apt-get install tmux
\curl -L | bash -s stable --ruby


These are just some relevant notes and steps I’m including mostly for myself to remember for later:


In some cases you may need FTP and not just SFTP.

apt-get install vsftpd

Edit /etc/vsftpd.conf and change the following lines


Save the file, and `/etc/init.d/vsftpd restart` then `ufw allow 21`.

Serving a git repository

If you’re serving files directly from a git repository, make sure you aren’t serving .git.

Simulate typekit FOUT on OSX with IPFW

A downside of using Typekit’s async javascript snippet is FOUT (Flash of Unstyled Text).

In order to reproduce what this will look like in your app/website it can be helpful to artificially slow down your connection to Typekit. This can be accomplished via ipfw (no guarantee that these commands will work exactly as below for other unix variants).

First, get the IP address of the Typekit host by pinging it. Then:

sudo ipfw add pipe 1 ip from <typekit-ip> to any

sudo ipfw pipe 1 config bw 80kbit/s plr 0.05 delay 50ms

Play with the values: 80 to change the bandwidth, 0.05 to change the packet loss ratio (you can also just remove it), and 50 to change the latency.

When you’re done (note that this will flush all existing ipfw rules):

sudo ipfw flush

Graphing changes in file size across git commits

Both as an excuse to try to learn gnuplot and as a way to track the growth of compiled javascript and css assets files, I was looking for a way to grab the size of a given file across a series of git commits, and end up with output like this:

# size commit                                   date
439323 d4d09e047d50388180a1e317efc61af5d8961275 20130201
439323 fd30e151e35efba1bda65488e621c7338895542e 20130130
439241 6ce650d7e97add955b7cd07150732890c0edaf49 20130129
439241 3c1d2aec69f874926965843800163be71ec5f376 20130128

If the name of the file stays the same, it turns out this is pretty simple. The following git command will show the size of the file for the commit in question:

git ls-tree -r -l <COMMIT> <PATH>

So we can do something like

git ls-tree -r -l HEAD~$COUNTER compiledjs.min.js

in a bash script and increment $COUNTER as much as we want, grabbing the file size with some ugly use of tr and cut, e.g:

git ls-tree -r -l HEAD~39 compiledjs.min.js | tr -s ' ' | tr '\t' ' ' | cut -d ' ' -f 4
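Wrapped up, the same-name case can be sketched as a small shell function (a simplified sketch of the idea, to be run inside a repo):

```shell
# Print "<size> <sha> <date>" for a file at HEAD, HEAD~1, ... HEAD~n.
sizes_for() {
  n=$1; file=$2; i=0
  while [ "$i" -le "$n" ]; do
    ref="HEAD~$i"
    sha=$(git rev-parse --verify -q "$ref") || break  # ran out of history
    size=$(git ls-tree -r -l "$ref" "$file" | tr -s ' ' | tr '\t' ' ' | cut -d ' ' -f 4)
    [ -n "$size" ] && echo "$size $sha $(git show -s --format=%cd --date=format:%Y%m%d "$ref")"
    i=$((i + 1))
  done
}
# usage: sizes_for 39 compiledjs.min.js
```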

But if the name of the file changes across commits, as it will if you are tagging it with a date or SHA1 for cache-busting, this approach won’t work. The approach I came up with, which is hacky, involves creating and deleting temporary branches based on HEAD~1, HEAD~2, etc., and getting the requisite date, file size, and commit info by pattern-matching on the name of the file in question.

Shell script to accomplish this, along with some basic gnuplot commands to plot the output, here:

Closure Compiler externs for Underscore 1.4.4

Closure Compiler externs for underscore 1.4.4 are now available.

Underscore 1.4.4 adds two new functions, _.findWhere and _.partial.

Also pushed some fixes to 1.4.3 externs:

  • typo in `properties` param for _.where
  • a bunch of methods should be able to operate on Arguments objects as well as Arrays. Previously, only Arrays were noted as valid params by the externs.