Articles

Waiting for gg.google.com

When I was working on Google Maps API , I was getting error of “waiting for gg.google.com” on status bar.
Later on I found out that if script debugging is diabled, this problem can not raise.
So just disable script debugging from the firebug.

Double Shot #513

Spent too much time yesterday fighting with code. Hope to spend more time writing it instead today.

GPolygon in Gmaps

By using Gpolygon we can display a path on gmap. You need to just define order of locations(latitude,longitude) in this.
But you have to know How to generate Gmaps in ruby on rails
Create a polygone array #From Sri Lanka to Afghanistan

polyline=GPolyline.new([[7.880794,80.507813],
[22.792388,79.497070],
[29.874945, 69.345703],
[33.757228,65.522461]],
"#ff0000",1,5.0)

Add into your map

@map.overlay_init(polyline)

You will get following output.

Faking startup and shutdown with ActiveSupport::TestCase

Time-was we didn't just have setup and teardown in our Tests (which are run for every single test case)... we could also have a startup/shutdown pair that ran once per proper Test-case class (ie once at the beginning of all tests in a set).

This was a perfect place to put run-once-only things - especially useful if, say, you wanted to run ruby-prof over your tests.

Google Maps in Ruby on Rails

To generate google maps in ruby on rails is so easy and simple. I had recently generated a few maps in my projects by  YM4R/GM plugin.
There are few steps that you need to follow to generate simple gmap.
1. Install YM4R/GM plugin in your rails app.
ruby script/plugin install svn://rubyforge.org/var/svn/ym4r/Plugins/GM/trunk/ym4r_gm
2. Copy  the javascripts files from YM4R/GM plugin [...]

Patterns Are Not Scary: Method Missing Proxy

In which I show how to create a method missing proxy and provide some example uses in the wild.

Method missing proxy? Ooooh! Sounds scary, right? I got news for you Walter Cronkite, it’s not. Lets start with the definition of proxy, according to Wikipedia.

Definition

A proxy, in its most general form, is a class functioning as an interface to something else.

Continuous Integration Spring Cleaning

Continuous Integration is a fancy term for “run your project’s tests after someone pushes to the repository and notify interested parties if they fail.”

We’re currently in the process of revamping our test suite (which we’ll blog about in the future) and moving servers, so I thought it’d be a good time to re-evaluate our options.

Bowline: An MVC Framework for GUI Apps in Ruby

bowlineBowline is a new MVC GUI framework for developing cross-platform desktop applications using Ruby, HTML, CSS and Javascript. The author, Alex MacCaw, aims to make building a desktop app as simple as creating a Rails site.

JavaScript Performance Rocks UPDATE!

Just a quick note for those of you who have bought our JavaScript Performance Rocks! ebook & DOM Monster package:

Check your inboxes!

You should have received an email with a pre-final copy of the book. It's pretty much content-complete -- but still has a few editor's notes, and needs a couple minor updates (to cover Safari 4 final, for example), and I plan to add a few more infographics.

Basically, there's just a little bit of polishing left. And I didn't want to keep you waiting any longer!

The DB2 adapter now supports Django 1.1

I’m glad to announce that the API team has just released version 1.0.2 of the adapter for Django. And on my birthday to boot, what a nice present. This version extends its support to the recently released Django 1.1, as well as incorporating the feedback that was received earlier on.

Shadow of the Animal | Rails Fire

Shadow of the Animal

Let’s recapitulate where we have been for readers who are new to the blog and for the convenience of others. I started this a while ago as an experiment trying to give some insights into my way of reasoning. I choose processing of large log files as subject which should help me analyze problems on production systems. I do believe that this might also help others as log file analysis is a fairly common task. The key point here is to be able to efficiently sift through many large logfiles and give access to the relevant information (see also the list of requirements and the initial description of the scope).

I picked “Animal” as name of the logfile analyzer because that must sift through tons of log data in a similar way as the like named character of a TV show treated his instrument – ferociously.

In today’s article I will present some major design decisions. Some core components of the architecture have been mentioned in the previous article already so I won’t repeat them here.

Design Decisions

As I want to give insights into my way of reasoning about software and design in particular I will list some of the major design decisions along with my rationale.

No class for individual log entries

The reasoning has been presented in a previous article: I want to avoid the overhead of allocating short lived objects. You can play around with a toy Java project (Eclipse as well as ant) to see the effect I was talking about. I had up to 9% time difference between the straight version and the version which puts data in a special record class.

No meta data

I had pondered the option to use a two step approach with an initial analysis step which will generate meta data which then is used by another program to efficiently extract relevant information. The advantage clearly would have been that we can apply a lot of different filtering and selection criteria without having to go through the initial analysis phase. I have decided against for these reasons:

Easier propagation of relevant data
Others do not need the extraction program and the original log files to look at interactions,
Simpler software structure
Only one program is needed, no writing and reading of meta data needed,
More robust
A change in the location of the original logs does not affect analysis.

Efficient Processing

This is the part I personally find most interesting: how do we manage to stream process log data while still limiting memory usage? We cannot simply aggregate all interactions in a huge Hash and dump the interesting bits at the end because this will burn too much memory (which costs speed through paging) or even make the whole process fail completely. That would be especially bad since in a case of lacking memory we would loose all the work that has been done up to the point.

My idea for solving this involves two components:

  1. A hash based storage with LRU semantics,
  2. An interaction processor.

Basically the LRU storage is responsible for keeping only a limited number of interaction processors in memory while the interaction processor is responsible for dealing as efficiently as possible with a single interaction. This means, the interaction processor will have to efficiently decide whether an interaction is included in the output and write out log lines as soon as possible (!) to a file in a given location.

Output Storage

Every interaction will be stored in its own file. To avoid potential issues with file systems becoming inefficient with thousands of files we will use an output directory tree with these levels:

  1. date, formatted YYYY-MM-DD (example ‘2006-06-21’),
  2. hour and minute, two digit 24 hour clock and minute (example ‘19-11’).

Interaction file names will include the timestamp of the first record. The naming will be like this: “SS.SSS-” where “SS” are the seconds of the timestamp and “SSS” are milliseconds". A valid name might then look like this 2009-06-21/19-11/00.003-sdmyrsmmlbxodfd.

The advantage is that we will not face any filesystem issues because there will be only a few thousand entries per minute. Plus, because of the hierarchical naming of directories and files we can easily use textual sorting of names and get chronological output. If we discover that there are too many entries per minute we can still extend the schema pretty easily to include a directory level for seconds.

Structure

The structure of the application will look like this: there is an option parsing method which outputs options which contain a combination of all processing options. These options will be hand off to the Coordinator which is the main driver of the processing. The Coordinator also needs a Parser instance which is capable of parsing log lines and providing interaction_id as well as a time_stamp of the last parsed entry. The coordinator will create InteractionProcessors as needed and hold them in a structure keyed by interaction_id. Each InteractionProcessor will handle a single interaction only. It knows the coordinator through which it gains access to options and a filter evaluator which determines whether an interaction is included; this filter works on the InteractionProcessor.

Thought Process

Now you might wonder how I arrived at these decisions. Well, I wanted the resulting bit of software to get the job done simply and efficiently. Generating a whole lot of meta data which helps in finding interactions efficiently would have created additional complexity as I have pointed out. So it was logical to rather choose a grepish approach, i.e. read the input once and spit out everything that we are interested in.

Since volume of input data is large we cannot expect to hold all the input in memory at some point in time. So we would rather have to pick a streaming approach. This means, we can only hold as many entities in memory at a time. Luckily, log entries of a single interaction have high locality, i.e. whenever the first occurs follow up lines are not too far away and the interaction end is usually near as well.

Now, I considered two approaches for aging out interactions from the processor’s storage: timestamp based and access based. I picked access based because in reality there are some interactions which take a bit longer than the usual interaction which would make picking a high latency necessary. This in turn would keep short interactions longer in memory than necessary. Also, with this approach you cannot put a hard numeric limit on the number of interactions in memory which could lead to failures for some logs while other logs are processed ok. With a LRU based storage we can set a size limit and avoid this problem. Of course, long interactions which also have many log entries could still cause memory issues but since these are comparatively rare the likelyhood of issues is small.

I will now wander off hacking together a rough application which does not yet do much but should give you an idea of how all the pieces are supposed to work together.