Talk: “Why and How to Do a Software Startup”


Last night, I spoke at the Carolina Innovations Seminar. About a hundred people showed up to hear “Why and How to Do a Software Startup”.

Here’s the abstract:

As famous investor Marc Andreessen said, “Software is eating the world.” In other words, software is and will remain relevant, making software startups popular and attractive. For aspiring and current entrepreneurs, this talk will focus on the practical aspects of creating and running a software startup. Topics will include: managing a software project, hiring tech people, hosting and operations, security, intellectual property, which programming language to use, and social media, with a dual emphasis on what to do and why to do it that way.

The slides from my talk are now up on SlideShare. Enjoy!

New Website


We recently launched a new website. Our old website was, like, so last year. Now we have a fresh new look that is also friendlier to mobile device screen sizes. Our blog got a similar update, both freshening things up visually but also moving to

In particular, check out our new About page. We’ve revised that to be more in line with our new mission: to amplify human intelligence by creating great tools. We have a lot of stuff in the queue related to that mission, so stay tuned!

Analyzing Reddit Submission Times


I enjoy reading the Data is Beautiful subreddit, which often has interesting and useful visualizations. The official guide has some tips for making a great post, including this one:

The best time to post is generally between 12pm and 5pm EST (UTC–5). Other times also work well, but most of the successful posts on /r/dataisbeautiful were posted in that time range.

I thought I’d test that assertion. So I wrote a simple tool in Clojure and gnuplot that queries the Reddit API for a particular subreddit, groups recent submissions by the submission hour (UTC), and creates a chart displaying the percentage of high-scoring submissions per hour. Here’s the current chart for DataIsBeautiful:

[Chart: the percentage of /r/dataisbeautiful submissions whose score was above a cutoff value, by submission hour (UTC).]

As you can see, the guide's recommended window, highlighted in orange, was wrong! The best time to post, at least according to recent data, is actually between 8am and 12pm EDT (UTC–4), which is highlighted in green.

You can run the same analysis yourself on any subreddit by using the tool I wrote. Pull requests welcome!
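If you just want a feel for the grouping step, here is a minimal Python sketch. It is not the Clojure tool itself, and it skips the Reddit API call and the gnuplot chart entirely. The function name `high_score_share_by_hour`, the `min_score` cutoff, and the sample data are all made up for illustration; the only thing taken from the analysis above is the idea of bucketing submissions by UTC hour and computing the percentage that score above a cutoff.

```python
from collections import defaultdict
from datetime import datetime, timezone


def high_score_share_by_hour(submissions, min_score):
    """Return {utc_hour: percent of that hour's submissions scoring above min_score}.

    `submissions` is an iterable of (created_utc_epoch_seconds, score) pairs.
    Both the name and the input shape are assumptions for this sketch.
    """
    totals = defaultdict(int)  # submissions seen per UTC hour
    high = defaultdict(int)    # high-scoring submissions per UTC hour
    for created_utc, score in submissions:
        hour = datetime.fromtimestamp(created_utc, tz=timezone.utc).hour
        totals[hour] += 1
        if score > min_score:
            high[hour] += 1
    return {hour: 100.0 * high[hour] / totals[hour] for hour in sorted(totals)}


# Made-up sample data: two submissions at 12:xx UTC, two at 20:xx UTC.
sample = [
    (1410782400, 250),  # 2014-09-15 12:00 UTC, high score
    (1410784200, 3),    # 2014-09-15 12:30 UTC
    (1410811200, 12),   # 2014-09-15 20:00 UTC
    (1410813000, 8),    # 2014-09-15 20:30 UTC
]

shares = high_score_share_by_hour(sample, min_score=100)
```

With real data you would feed in the creation time and score of each recent submission, then plot the resulting percentages per hour.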

Update 2014-09-16 2:07 PM

The Data is Beautiful subreddit has updated its guide to reflect the new data. Cheers!

Dynamic Thresholding Tool


[Regretfully, the screenshots as well as the web application discussed here have been lost in a botched site migration. Apologies. -Editor]

We’ve been hard at work on a new tool, which we are proud to announce: the Altometrics Dynamic Thresholding Tool.

[Screenshot: Dynamic Thresholding]


Many data sets have periodic patterns. For example, in network management, the network usage (in bytes per second) varies greatly by the time of day. When, say, everybody is at work, the usage goes up. When everybody leaves, usage goes down.

Trying to create notification thresholds for this sort of data is frustrating. If you set a static threshold, it’s prone to trigger false alarms during the morning high-traffic time, because only a little bit more traffic than normal would exceed the threshold. Conversely, even a pretty serious anomaly might not trigger an alarm if it happens in the middle of the night, when there is typically little traffic.

Clearly, we need thresholds that vary by the time of day. But we shouldn’t burden the network managers with creating such complex thresholds. Instead, we can do that automatically.
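To make the idea concrete, here is one simple way to derive time-of-day thresholds from historical data, sketched in Python. This is an illustration under stated assumptions, not a description of how the Altometrics tool works: it assumes a threshold of "mean plus k standard deviations" computed separately for each hour of the day, and the names `hourly_thresholds`, `is_anomalous`, and `k`, along with the sample numbers, are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_thresholds(history, k=3.0):
    """Return {hour_of_day: mean + k * population stdev} from past samples.

    `history` is an iterable of (hour_of_day, value) pairs, for example
    bytes per second observed at that hour on previous days.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: mean(values) + k * pstdev(values)
            for hour, values in by_hour.items()}


def is_anomalous(hour, value, thresholds):
    """True if `value` exceeds the threshold learned for that hour."""
    return value > thresholds[hour]


# Made-up history: busy at 9am, quiet at 3am.
history = [
    (9, 900.0), (9, 1000.0), (9, 1100.0),
    (3, 40.0), (3, 50.0), (3, 60.0),
]
thresholds = hourly_thresholds(history, k=3.0)

# A burst of 300 at 3am is flagged, while 1150 at 9am is not.
night_alarm = is_anomalous(3, 300.0, thresholds)
morning_alarm = is_anomalous(9, 1150.0, thresholds)
```

A single static threshold high enough to tolerate the 9am traffic would miss the 3am burst, which is exactly the problem described above.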

