Random Annie

We started posting on Twitter, Facebook, LinkedIn, and Google+ this week, so there’s a little less grist for the mill, but here are a few good articles we’ve been thinking about:

Irresponsible Freedom

We’ve got social psychology, robotics and AI, personal uses of data algorithms, and a historical call for voluntary self-restraint:

Analyzing Reddit Submission Times

I enjoy reading the Data is Beautiful subreddit, which often has interesting and useful visualizations. The official guide has some tips for making a great post, including this one:

The best time to post is generally between 12pm and 5pm EST (UTC–5). Other times also work well, but most of the successful posts on /r/dataisbeautiful were posted in that time range.

I thought I’d test that assertion. So I wrote a simple tool in Clojure and gnuplot that queries the Reddit API for a particular subreddit, groups recent submissions by the submission hour (UTC), and creates a chart displaying the percentage of high-scoring submissions per hour. Here’s the current chart for DataIsBeautiful:

The percentage of /r/dataisbeautiful Reddit scores that were above some value, per hour.

As you can see, the guide, highlighted in orange, was wrong! The best time to post, at least according to recent data, is actually between 8am and 12pm EDT (UTC–4), which is highlighted in green.
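For readers who want to see the shape of the calculation, here is a minimal Python sketch of the grouping step. It is not the code of the tool itself, which is written in Clojure and gnuplot. It assumes each submission has already been fetched and reduced to a pair of creation time (epoch seconds, UTC) and score; the function name `high_score_share_by_hour`, the `min_score` cutoff, and the sample data are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def high_score_share_by_hour(submissions, min_score):
    """Return {hour_utc: percentage of submissions scoring above min_score}.

    `submissions` is an iterable of (created_utc_seconds, score) pairs.
    """
    totals = defaultdict(int)
    highs = defaultdict(int)
    for created_utc, score in submissions:
        # Bucket each submission by the UTC hour in which it was posted.
        hour = datetime.fromtimestamp(created_utc, tz=timezone.utc).hour
        totals[hour] += 1
        if score > min_score:
            highs[hour] += 1
    # Only hours that have at least one submission appear in the result.
    return {hour: 100.0 * highs[hour] / totals[hour] for hour in sorted(totals)}


# Made-up sample data: (creation time in epoch seconds, score).
sample = [
    (1410782400, 120),  # posted in the 12:00 UTC hour
    (1410784200, 3),    # posted in the 12:00 UTC hour
    (1410796800, 45),   # posted in the 16:00 UTC hour
]

shares = high_score_share_by_hour(sample, min_score=50)
```

With real data, the resulting per-hour percentages are what a chart like the one above would plot. Hours with only a handful of submissions produce noisy percentages, so a real analysis would want a reasonably large sample.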

You can run the same analysis yourself on any subreddit by using the tool I wrote. Pull requests welcome!

Update 2014-09-16 2:07 PM

The Data is Beautiful subreddit has updated its guide to reflect the new data. Cheers!

Big Easy

Among other things, we’ve been reading about neuroscience, society, financial market prediction, and statistical research results this week:

Dynamic Thresholding Tool

We’ve been hard at work on a new tool, which we are proud to announce: the Altometrics Dynamic Thresholding Tool.

Dynamic Thresholding Screenshot

Rationale

Many data sets have periodic patterns. For example, in network management, the network usage (in bytes per second) varies greatly by the time of day. When, say, everybody is at work, the usage goes up. When everybody leaves, usage goes down.

Trying to create notification thresholds for this sort of data is frustrating. If you set a static threshold, it’s prone to trigger false alarms during the morning high-traffic time, because only a little bit more traffic than normal would exceed the threshold. Conversely, even a pretty serious anomaly might not trigger an alarm if it happens in the middle of the night, when there is typically little traffic.

Clearly, we need thresholds that vary by the time of day. But we shouldn’t burden the network managers with creating such complex thresholds. Instead, we can do that automatically.
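To make the idea concrete, here is a minimal Python sketch of one simple way to derive thresholds that vary by the time of day. It is an illustration only and not necessarily the method the Altometrics tool uses: it assumes historical samples are available as pairs of hour of day and measured value, and it sets each hour's threshold to that hour's mean plus `k` standard deviations. The names `hourly_thresholds` and `is_anomalous`, the choice of `k`, and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_thresholds(samples, k=3.0):
    """Map each hour of day to mean + k * standard deviation of its history.

    `samples` is an iterable of (hour_of_day, value) pairs.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(values) + k * pstdev(values)
            for hour, values in by_hour.items()}


def is_anomalous(hour, value, thresholds):
    """Return True when a value exceeds the threshold for its hour."""
    return value > thresholds[hour]


# Made-up history: quiet traffic at 03:00, busy traffic at 09:00.
history = [
    (3, 10), (3, 12), (3, 14),
    (9, 900), (9, 1000), (9, 1100),
]
thresholds = hourly_thresholds(history)

night_spike = is_anomalous(3, 200, thresholds)    # far above the usual 03:00 level
busy_morning = is_anomalous(9, 1100, thresholds)  # within the usual 09:00 range
```

In this toy data, a static threshold set high enough to stay quiet at 09:00 would miss the 03:00 spike entirely, while the per-hour thresholds flag it and leave the ordinary morning peak alone.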


Look Around You

…there is data lurking around every corner. There are a few longer reads this week. This is what we have been reading and thinking about in the realm of technology, history, business, and data: