Articles

Getting to grips with git - Part 3: rebasing

This is the third post in my Getting to grips with git series. In previous articles, I’ve covered the basics and merging, branching & tags. This time, I’m going to introduce the concept of rebasing. I think it justifies its own article as it’s a potentially confusing subject.

Double Shot #479

Announcing my latest project: Rails Rescue Handbook. If you’re facing a Rails codebase that needs help, you may find this $9.95 ebook useful.

Rails Underground Conference

I'm attending Rails Underground!

At the end of last week, I booked my tickets for the Rails Underground conference in London which takes place towards the end of next month.

Fresh news for Rails developer

A week ago I discovered a useful site on Twitter.

http://www.railsfire.com/

Up to date, with fresh and interesting links. Add this to your feeds.

Taking Things Too Far: REST

I’m going to put up a few posts based on a talk I gave at RailsConf ‘09 in Vegas and RailsWayCon in Berlin. Sorry for the delay in updating but I wanted to deliver the talks before posting here.

Updates

Just a few quick updates about what’s going on around here.

Indexing ActiveRecord Objects in an Ordered Collection

Ever wanted to find the index of an ActiveRecord object in a collection? By collection, I’m referring to both associations and named scopes, the two types of lazy-loaded object collections available in ActiveRecord.

Surprisingly, Rails offers no easy way of doing this without loading the entire collection. This is fine for small collections, but as your data grows the operation becomes more expensive.
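One common way around loading everything is to note that a record's index in an ordered collection equals the number of records that sort before it. The sketch below illustrates that idea in plain Ruby with an in-memory array. `Post`, its fields, and `index_in_ordered` are hypothetical names chosen for illustration, not part of Rails, and this is not necessarily the approach the original article takes.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only `id` and
# `created_at` matter for this illustration.
Post = Struct.new(:id, :created_at)

# Index of `target` within `records` when ordered by (created_at, id).
# It equals the number of records that sort strictly before `target`.
# The id acts as a tie-breaker so that the ordering is total.
def index_in_ordered(records, target)
  records.count do |r|
    ([r.created_at, r.id] <=> [target.created_at, target.id]) < 0
  end
end

posts = [
  Post.new(1, 30),
  Post.new(2, 10),
  Post.new(3, 20),
  Post.new(4, 20)
]

target = posts[3] # id 4, created_at 20
position = index_in_ordered(posts, target)
puts position
```

Against a database, the same count can be expressed as a COUNT query with a condition on the sort column, so only a single number comes back instead of every row.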

Shadow of the Animal

Let’s recapitulate where we have been, for readers who are new to the blog and for the convenience of others. I started this a while ago as an experiment to give some insight into my way of reasoning. I chose the processing of large log files as the subject, which should help me analyze problems on production systems. I believe this might also help others, as log file analysis is a fairly common task.

Background Process (threading and forking) with spawn plugin in rails

Do you have a process that takes a long time before you can display the result to the user? First of all, make sure that your code is correct and that you are following Rails standards. If that’s all right, then you should run that process in the background. Doing a process in background means that you [...]

Interview: Author Gregory Brown

Our Book Promotion: “Ruby Best Practices” starts soon. Win one of four books to be given out for active participation. The coolest thing? Author Gregory Brown will be on site to answer questions! Click here for more details. Here, in this brief interview, Satish Talim of RubyLearning talks to Gregory Brown.


A Note on the Recent Outages | Rails Fire



Following three months of near 100% uptime, we’ve just been through three major outages in as many days. I wanted to take some time to detail the problems and what we intend to do to prevent similar downtime in the future.

Outage #1 (02/02/2010 9:55:09AM PST) was initiated by a load spike on one of our file servers (fs1a). When a file server stops responding to heartbeat, the slave server in the pair kills the master and takes over. In this case, the master was not killed quickly enough and the storage partitions did not migrate cleanly to the slave. Cleanup on the split-brain file server pair was delayed by some inefficient DRBD configuration that we’ve been meaning to update. Rolling out improvements to the DRBD configuration should prevent this type of problem in the future.

Outage #2 (02/03/2010 6:10:08PM PST) looked like a power outage at first, since so many machines were affected. The root cause was the deployment of a faulty DRBD configuration update that propagated to all machines (courtesy of Puppet). Pairs of machines began halting replication to prevent corruption from the invalid configuration file. Eventually the load balancer pair was affected and we could no longer even serve the Angry Unicorn page. The way that the servers went down, the number of servers that went down, and the length of time it takes to resync downed pairs resulted in a lengthy outage. There are several steps to preventing this kind of outage in the future. First and most obvious is to maintain tighter control over, and testing of, proposed system-wide configuration changes. We also plan to deploy (well-tested) changes to the DRBD configuration that will reduce cleanup times and automate the startup process for downed machines. These changes will result in shorter recovery times in the event of single failovers and wider machine-level restarts.

Outage #3 (02/04/2010 2:37:08AM PST) was caused by massive load spikes across all five file servers. To prevent extended downtime, we marked all file servers as offline (preventing them from going into failover) and looked for the cause of the load. After inspecting the HTTP logs, we identified a Yahoo! spider that was making thousands of requests but never waiting for responses. After banning the spider, the load returned to normal and we were able to bring the file servers back online. We are looking at our rate limiting strategy and will be making improvements over time to get the best performance for legitimate users and the best protection from anomalous behavior.
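To make the idea of rate limiting concrete, here is a minimal sketch of one widely used technique, a token bucket, in Ruby. It is only an illustration and not a description of the strategy actually used or planned here. The class name `TokenBucket`, its parameters, and the numbers in the usage lines are all assumptions. Time is passed in explicitly so the behavior is easy to follow.

```ruby
# Hypothetical token-bucket limiter (illustrative only). Each client gets
# a bucket: it may burst up to `capacity` requests, and tokens refill at
# `refill_per_sec` tokens per second.
class TokenBucket
  def initialize(capacity, refill_per_sec, now = 0.0)
    @capacity = capacity.to_f
    @refill_per_sec = refill_per_sec.to_f
    @tokens = @capacity
    @last = now
  end

  # Returns true if a request arriving at time `now` (in seconds) is
  # allowed, and consumes one token. Returns false when the bucket is empty.
  def allow?(now)
    elapsed = now - @last
    # Refill for the elapsed time, never exceeding the capacity.
    @tokens = [@capacity, @tokens + elapsed * @refill_per_sec].min
    @last = now
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end
end

bucket = TokenBucket.new(3, 1.0)
# Five requests at the same instant: only the first three fit the burst.
burst = 5.times.map { bucket.allow?(0.0) }
# Two seconds later, two tokens have been refilled.
later = bucket.allow?(2.0)
puts burst.inspect
puts later
```

A client that fires thousands of requests at once would exhaust its bucket almost immediately, while a client making occasional requests would never notice the limit.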

To carry out the improvements to various infrastructure elements, we will hold scheduled maintenance windows at 10PM PST over the next week. Most of these changes will not require any downtime, but some of them may result in temporary unavailability of file server partitions. As we perform the maintenance, we’ll keep you updated via the GitHub Twitter account, so make sure to check there for the latest maintenance news.

We sincerely apologize for the recent problems and are working very hard to address each flaw. Stability is one of our biggest goals this year, and I look forward to making your GitHub experience as flawless as possible.