Using decision trees to predict paying users in Ruby
I had some hiccups trying to implement this in a project, so this is a quick post on how you might actually use Ilya’s decisiontree gem in your app.
Sample use
Suppose you have an app with a bunch of users, and several payment options. You also offer a free service, which you hope will lead people to buy the paid service. You want to know what kind of people buy the paid service, so that you can target your app towards them.
Setup
To get started you’ll need Ilya’s decisiontree gem.
    sudo gem install decisiontree
You’ll also need to install the graphr Ruby classes. You can pick up the graphr stuff here.
If you’re on Leopard, you’ll also need to use an older version of Graphviz, as the newest version gives errors like:
    dyld: lazy symbol binding failed: Symbol not found: _pixman_image_create_bits
When I ran into this problem myself, I finally found this post on Graphviz with Leopard, which suggests that you download an earlier release here. This is what worked for me.
Those were the only technical problems that I had.
Finally, since you’ll probably be using this in a Rails app, let’s mock out the situation that you will probably have.
Here’s a quick script that pretends you have an ActiveRecord model named User, as most people do.
    class User
      attr_accessor :age
      attr_accessor :salary
      attr_accessor :paid

      def initialize(atts={})
        atts.each {|att, value| self.send("#{att}=", value)}
      end

      def salary_range
        return 'Less than 10k' if salary < 10000
        return '10k - 50k' if salary < 50000
        return '50k - 100k' if salary < 100000
        return '100k+' if salary
      end

      def age_range
        return 'less than 20' if age < 20
        return '20 - 30' if age < 30
        return '30 - 40' if age < 40
        return '40+' if age
      end

      def paid?
        paid ? 1 : 0
      end
    end

    USERS = [
      User.new(:age => 25, :salary => 40000, :paid => true),
      User.new(:age => 40, :salary => 10000, :paid => false),
      # more
    ]
Figuring out why users paid
Next, we can just copy Ilya’s script to train the tree and print a graph.
    require 'rubygems'
    require 'decisiontree'
    require 'users'

    def find_reasons
      attributes = ['Age range', 'Salary range']

      training = USERS.map do |user|
        [user.age_range, user.salary_range, user.paid?]
      end

      dec_tree = DecisionTree::ID3Tree.new(attributes, training, 1, :discrete)
      dec_tree.train
      dec_tree.graph("reasons_for_paying")
    end
The key thing to note is just that we first define the attributes that we think might play into the decision process, then we create an array of arrays that has the information. For each array, the last element should be the decision. So, if the decision I was analyzing were ‘did users eat cake’, 0 would mean that they didn’t and 1 would mean that they did.
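To make the shape of the training data concrete, here is a small sketch in plain Ruby that does not need the gem. The `Row` struct, the sample values, and the helper names (`training_rows`, `well_formed?`, `paid_rate_by`) are all made up for illustration. It builds the array of arrays, checks that each row has one value per attribute plus the decision, and tallies the share of paying users per bucket as a quick sanity check before training.

```ruby
# Hypothetical stand-in for the User model; values are already bucketed.
Row = Struct.new(:age_range, :salary_range, :paid)

# Invented sample data, for illustration only.
SAMPLE_USERS = [
  Row.new('20 - 30', '10k - 50k',  1),
  Row.new('20 - 30', '50k - 100k', 1),
  Row.new('40+',     '10k - 50k',  0),
  Row.new('40+',     '100k+',      0),
  Row.new('30 - 40', '10k - 50k',  1),
]

ATTRIBUTES = ['Age range', 'Salary range']

# One row per user: attribute values in the same order as ATTRIBUTES,
# followed by the decision as the last element.
def training_rows(users)
  users.map { |u| [u.age_range, u.salary_range, u.paid] }
end

# Every row should hold one value per attribute plus the decision.
def well_formed?(rows, attributes)
  rows.all? { |row| row.size == attributes.size + 1 }
end

# Share of paying users (decision == 1) for each value of one column.
def paid_rate_by(rows, column)
  rows.group_by { |row| row[column] }.map do |value, group|
    [value, group.count { |row| row.last == 1 }.to_f / group.size]
  end.to_h
end

rows = training_rows(SAMPLE_USERS)
puts well_formed?(rows, ATTRIBUTES)
paid_rate_by(rows, 0).each { |value, rate| puts "#{value}: #{rate}" }
```

If a bucket turns out to hold only one or two users, it is probably worth widening the ranges before reading much into the tree.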
So, we end up with a nice visual like this:

Caveats
There are obviously a few caveats here. One huge one is that you must make sure you have not omitted a key factor in the decision.
For example, suppose you are trying to figure out who will pay for cake, and your data has an attribute like ‘likes cake’, but you only include ‘salary’ and ‘age’ in your analysis. You might conclude that age is the most important factor in whether or not people will pay for cake, when in fact ‘likes cake’ may be the more important one.
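Here is a rough way to see that effect with numbers. ID3 generally picks the attribute with the highest information gain at each split, so the sketch below computes that gain by hand on a tiny invented data set. This is a hand-rolled illustration, not the gem's own code, and the rows, column layout, and helper names are assumptions.

```ruby
# Shannon entropy (in bits) of a list of labels.
def entropy(labels)
  total = labels.size.to_f
  labels.group_by { |label| label }.values.sum do |group|
    share = group.size / total
    -share * Math.log2(share)
  end
end

# Information gain from splitting rows on one column.
# The decision label is the last element of each row.
def information_gain(rows, column)
  remainder = rows.group_by { |row| row[column] }.values.sum do |group|
    (group.size.to_f / rows.size) * entropy(group.map(&:last))
  end
  entropy(rows.map(&:last)) - remainder
end

# Name of the column with the highest gain; columns maps name => index.
def best_attribute(rows, columns)
  columns.max_by { |_name, index| information_gain(rows, index) }.first
end

# Invented rows: [age range, salary range, likes cake, paid]
CAKE_ROWS = [
  ['under 30', 'low',  'yes', 1],
  ['under 30', 'high', 'yes', 1],
  ['under 30', 'high', 'yes', 1],
  ['under 30', 'low',  'no',  0],
  ['30+',      'low',  'no',  0],
  ['30+',      'high', 'no',  0],
  ['30+',      'low',  'yes', 1],
  ['30+',      'high', 'no',  0],
]

WITHOUT_LIKES = { 'Age range' => 0, 'Salary range' => 1 }
WITH_LIKES    = WITHOUT_LIKES.merge('Likes cake' => 2)

puts best_attribute(CAKE_ROWS, WITHOUT_LIKES)
puts best_attribute(CAKE_ROWS, WITH_LIKES)
```

In this made-up data, age looks like the best split only as long as ‘likes cake’ is left out. Once it is included, it separates payers from non-payers completely and age drops to second place.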
Furthermore, it’s important to realize that many of these values may be affected by whether or not users pay. If you are including ‘cakes ordered’ in your analysis of whether users sign up for a monthly cake subscription, you will want to be sure to only include cakes that were ordered before the user started the monthly subscription. Otherwise, your conclusions will likely be affected by the fact that paying users are more likely to be ordering cakes. That is, paying users are the ones most likely to be using your service, so concluding that usage leads to payment is probably too simple.
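As a minimal sketch of that filtering step, assuming you can get at order dates and subscription start dates (the `Order` and `Subscriber` structs and their fields here are hypothetical, not part of any real schema):

```ruby
require 'date'

# Hypothetical records; the field names are assumptions for illustration.
Order = Struct.new(:user_id, :ordered_on)
Subscriber = Struct.new(:id, :subscribed_on) # subscribed_on is nil if never subscribed

# Count only the cakes a user ordered before their subscription began.
# Users who never subscribed keep all of their orders.
def cakes_before_subscription(user, orders)
  orders.count do |order|
    order.user_id == user.id &&
      (user.subscribed_on.nil? || order.ordered_on < user.subscribed_on)
  end
end

# Invented sample data.
ALICE = Subscriber.new(1, Date.new(2009, 3, 1))
BOB   = Subscriber.new(2, nil)

ORDERS = [
  Order.new(1, Date.new(2009, 1, 15)),
  Order.new(1, Date.new(2009, 2, 20)),
  Order.new(1, Date.new(2009, 4, 2)), # after subscribing, so not counted
  Order.new(2, Date.new(2009, 4, 5)),
]

puts cakes_before_subscription(ALICE, ORDERS)
puts cakes_before_subscription(BOB, ORDERS)
```

The count that goes into the training row is then the pre-subscription one, so the attribute reflects behavior that came before the decision rather than after it.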
Let me know if people have any other interesting ways to use this.