Category Archives: Ruby

bob != the greatest

Fun bug. It’s not all that hard to see once you sort out what line it’s on, but when the sauce is much thicker… the initial Hash definition may not be the first place you check.

…and when I say “you”, I of course mean “me”. It may not be the first place me check.

>> a = Hash.new{ |h,k| h[k] = 1; bob = "lame" }
=> {}
>> a[0]
=> "lame"
>> a[0]
=> 1
>> a = Hash.new{ |h,k| bob = "the greatest"; h[k] = 1 }
=> {}
>> a[0]
=> 1
>> a[0]
=> 1
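
For anyone skimming the transcript: here is a minimal sketch of what is going on. The block given to Hash.new only runs on a miss, and whatever its last expression evaluates to is what that first lookup hands back. Later lookups find the stored value instead. The variable names below are mine, just for illustration.

```ruby
# The block runs only when the key is missing. Its LAST expression is
# what that first lookup returns, regardless of what was stored.
buggy = Hash.new { |h, k| h[k] = 1; bob = "lame" }
buggy_first  = buggy[0]   # block runs; the assignment to bob is the last expression
buggy_second = buggy[0]   # key exists now, so the stored value comes back

# Keeping the store as the last expression makes both lookups agree.
fixed = Hash.new { |h, k| bob = "the greatest"; h[k] = 1 }
fixed_first  = fixed[0]
fixed_second = fixed[0]
```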

Filed under Ruby

MySQL Triggers w/Rails

I recently incorporated db-triggers with a Rails app to maintain some counts that were otherwise fairly expensive to retrieve.  Rails wasn’t super-pumped about the idea (what with the “keep all the logic in the app” approach and all), but sometimes… you know… you know better than your framework.

Some things I was aiming for:

  1. Set them up with normal migrations.
  2. Test them with the normal test suite/normal fixtures.
  3. Make recovery/reset simple for when the table (inevitably) is somehow out of sync.

The “frequent counts” table

I’ll have multiple counts but not TOO many: enough that I don’t want a column per count, but not so many that I mind using “LIKE” to look up patterns. So my table has: id, code, current_count.

Code will be a unique key (important later) and be formatted like “style_ABC_size_456”.

So, when a new item is added it’ll be associated with a style and some sizes. Each combination will either need to be set up (with a current_count = 1) or an existing combo will be found and incremented (+= 1).

The FrequentCount class has the fairly straightforward finders that you’d expect + methods to reset each of the counts that it contains.  The reset methods follow the pattern “reset_frequent_count_COUNT_NAME” -> they clear the existing counts that they maintain before repopulating them.

I also threw in a reset_all method that looks for anything on the class following the “reset_frequent_count_COUNT_NAME” pattern and runs them.
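
A rough sketch of how a reset_all like that can work, assuming nothing beyond the naming pattern described above. FrequentCountSketch, its two reset methods, and the log are hypothetical stand-ins so this runs without Rails or a database. The real reset methods clear and repopulate rows.

```ruby
# Hypothetical stand-in for the FrequentCount model; a plain class so the
# sketch runs without Rails or a database.
class FrequentCountSketch
  RESET_PREFIX = "reset_frequent_count_"

  # Records which resets ran, in place of real delete/repopulate SQL.
  def self.log
    @log ||= []
  end

  def self.reset_frequent_count_style_and_size
    log << :style_and_size
  end

  def self.reset_frequent_count_style_only
    log << :style_only
  end

  # Finds every class method following the naming pattern and calls it.
  def self.reset_all
    methods.select { |m| m.to_s.start_with?(RESET_PREFIX) }.sort.each do |name|
      send(name)
    end
  end
end

FrequentCountSketch.reset_all
```

The sort is only there to make the run order predictable. Since each reset owns its own counts, the order should not matter.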

The trigger-SQL

The SQL for creating the triggers will be needed by the migration as well as the test suite.  In fact, the test suite will need to run them somewhat often due to the way the standard tests “prepare” the database.

I ended up throwing it in lib/trigger_sql.rb.  Methods there are named with the pattern “sql_for_TABLE_OPERATION_TRIGGER_NAME” ex: sql_for_items_insert_style_and_size

Many of the triggers could not rely on pre-existing rows, i.e. a new style/size combination needs an INSERT where an existing combo could be updated (+= 1).  To get around this, I relied on the unique key set up earlier on the “code” field of the frequent counts table.  That allowed me to lean on insert statements with “ON DUPLICATE KEY UPDATE” clauses.  Something like this…

create trigger items_insert_style_and_size after insert on items
for each row
begin
insert into frequent_counts(code, current_count)
values (concat('style_', new.style, '_size_', new.size), 1)
on duplicate key update current_count = current_count + 1;
end;

The Migration

I’ve already given away most of the fun stuff about the migration.  It just needs to run through the triggers that are being set up at this specific time, doing things like:

TriggerSql.connection.execute(TriggerSql.sql_for_items_insert_items_by_style_and_size)

and then make sure to populate it all (with that reset_all method) when we’re done.  Next time out I may want to call specific methods to reset just the ones I care about, but this first time I can just do the whole table.

Testing with Fixtures

Rails goes a little too far when running the default test tasks for us: it ends up nuking the triggers. Not to fear, though: it’s a quick hack in the Rakefile.

I’m going to spare you some details (drop me a line if you want them) but I basically overrode the db:test:prepare task to call a special version of the clone_structure task.  My version has a dependent task that does:

# find methods that follow our pattern of “methods providing trigger sql” and execute the contents of each
TriggerSql.methods.select { |m| m =~ /sql_for_.+/ }.each do |method_name|
  ActiveRecord::Base.connection.execute(TriggerSql.send(method_name))
end

As you see there, it’s leaning on that naming convention “sql_for_TABLE_OPERATION_TRIGGER_NAME” to find the sql to (re)apply.
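
To make the convention concrete, here is a small sketch of what a lib/trigger_sql.rb along these lines could look like. It is an approximation, not the actual file: the trigger_sql_methods helper and the FakeConnection class are hypothetical, with FakeConnection standing in for ActiveRecord::Base.connection so the sketch runs on its own.

```ruby
# Hypothetical sketch of lib/trigger_sql.rb: each sql_for_* method just
# returns the SQL for one trigger.
module TriggerSql
  def self.sql_for_items_insert_style_and_size
    <<-SQL
      create trigger items_insert_style_and_size after insert on items
      for each row
      begin
        insert into frequent_counts(code, current_count)
        values (concat('style_', new.style, '_size_', new.size), 1)
        on duplicate key update current_count = current_count + 1;
      end;
    SQL
  end

  # Names of every method that follows the sql_for_* convention.
  def self.trigger_sql_methods
    methods.select { |m| m.to_s =~ /\Asql_for_.+/ }
  end
end

# FakeConnection stands in for ActiveRecord::Base.connection here.
class FakeConnection
  attr_reader :executed

  def initialize
    @executed = []
  end

  def execute(sql)
    @executed << sql
  end
end

connection = FakeConnection.new
TriggerSql.trigger_sql_methods.each do |method_name|
  connection.execute(TriggerSql.send(method_name))
end
```
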
That’s it!

Migrations set the triggers up and share the code for doing so with the test suite, which can reapply them whenever it needs to.  The reset methods come in handy not only for the initial population (by the migration); we can also call them manually should we need them.

Filed under deployment, rails, Ruby, SQL, Test Driven Development (TDD)

Friend’s Price

A friend was asking the other day about getting a little site up and running for his small business.  Seems like the scope is going to be quite small (famous last words) and there are a few things I’ve been meaning to check out lately, so I think I’m going to take a stab at it.

Night one went something like this:

I got an invite (did they do invites?) to check out Heroku really early on.  At the time, I remember devoting a night to it, working with an online editor (or maybe I was ssh’d in with vi or something?) and basically just getting a little “Hello World” up and running.  Ever since bumping into Jim Fiorato’s app/case study the other day, I’ve been meaning to revisit it a bit.

It’s entirely likely that the friend’s site will actually not need anything but static HTML, but hey: I can always tune the caching and free hosting can’t be argued with, right?

So, I opened an account out at heroku.com and grabbed the heroku gem; installed git (officially taking me off the short list of people that still hadn’t given it a chance) and set to work on getting my app out there.

I hadn’t written anything, so I just wanted to throw the “welcome to Rails” app out there to make sure everything was set up correctly.  The first bump was some fun with SSH keys.  I’m actually still not sure exactly what the issue was, but some commands seemed to respect the path to the key that I had set up while others seemed to look in the default location (~/.ssh).  I am actually thinking now that I probably could have got around it with a little more effort put into the config, but I ended up just using the default key location. No probs after that.

At that point + some very simple/standard Rails app config, it was incredibly easy to push the app out there, set up a few tables (via migrations) and get to work.

A little effort into a pretty basic layout and we’re underway.  It’s nothing spectacular to check out at the moment (generic copy, placeholder colors/blocks and text…) but someday it’ll be a star.

Filed under rails, Recommended Sites, Ruby

Baseball Draft

Ah yes, it’s that wonderful time of year again: fantasy baseball time.  Anyone who plays fantasy sports can tell you that other than getting a check for winning your league (something I would know nothing about), the best part of the season is the draft.

My fantasy draft preparation always involves the creation of a spreadsheet.  This nugget of gold will start with ratings and stats from various sites and sources before getting my own personal notes, organization, and shuffling (aka “messing it up”).  And what kind of card-carrying software engineer would do such a compilation by hand…

My basic strategy for compiling this sucker this year:

  1. Scrape some stats and rankings from a few popular sites and dump them locally.
  2. Load the stats from local, merge em, and kick out a csv.
  3. Mess with stuff in the spreadsheet.
  4. Win millions.

Scraping

I’ve used hpricot in the past but wanted a bit of a refresher. Man, this thing makes the task nice and easy.  Most of the sites have a fairly sane markup scheme for the tables they store player data in, so it’s normally as simple as (this is Ruby btw):

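require 'open-uri'
require 'hpricot'

# url is the address of the page holding the player table
doc = Hpricot(open(url))
players = doc.search(".playerDataRow")
players.each do |player|
  meta = player.search("td .playerMeta")
  stats = player.search("td .playerStats")
  # ... do stuff with meta and stats ...
end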

Update: Here are a couple example files (they should be .rb files, but are docs to make WordPress happy)


Yahoo Parser

Something to invoke the parser

Local Storage

Last year, I wrote this as a single script (scrape -> csv). Things got hairy when I needed to tweak the merger and thus had to re-run the whole thing (annoying) or hack it to run only a portion… and the same portion on each site (annoying). So, this year I got a little smart and wanted to dump the scrape results locally.

I fully intended to evaluate a few options here but as it turns out, the first try was just dead simple and worked perfectly: yaml. My scrapers each end up with a couple Arrays: one for hitters and one for pitchers. Each entry is a Hash of player data. I considered skipping the Array here but didn’t want to have the logic for name collisions in the scraper.

How hard was it to write my +player_data+ Hash to yaml?

File.open(File.dirname(__FILE__) + "/yaml/" + filename, "w") { |file| YAML.dump(player_data, file) }

Merging the scrape sources

This is a separate script here now…

Resurrecting the player data proved as simple as storing it in the first place.

YAML.load_file(File.dirname(__FILE__) + "/yaml/" + filename)

First pass I just wanted to merge based on player names – ignoring the imperfections that surely come with that… I was pleasantly surprised at how well things actually came out. I wrote a little throwaway script (as if this whole thing isn’t throwaway) to tell me how much of a problem I actually have. Basically: how many players in the top 400 of any site don’t have a match from the other sites? The answer was basically: a lot of Latin guys and a few others.
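
The throwaway check was along these lines. This is a sketch from memory rather than the actual script: the unmatched_names helper, the "name" key, and the sample players are all made up for illustration.

```ruby
# Made-up sample data: each source is an Array of player Hashes in rank order.
yahoo = [{ "name" => "Sam Sample" }, { "name" => "Mike Smith" }]
espn  = [{ "name" => "Sam Sample" }, { "name" => "Michael Smith" }]

# Names in the top `limit` of `source` that no other source knows about.
def unmatched_names(source, others, limit = 400)
  known = others.flatten.map { |player| player["name"] }
  source.first(limit).map { |player| player["name"] } - known
end

missing_from_espn  = unmatched_names(yahoo, [espn])
missing_from_yahoo = unmatched_names(espn, [yahoo])
```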

So, one problem I had was character sets used in the Latin player names. This is an area where “you’re going to throw this away” came into play: I just grabbed the few codes that I was having trouble with and regex’d to replace them with their friendlier counterparts. This is now on my list of “things to figure out how to do right”.
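
Roughly what that hack looks like, as a sketch. The FRIENDLY table, the friendly_name helper, and the sample name are hypothetical. The real script only handled the few codes that actually showed up, and this is still the cheat, not the “right” way.

```ruby
# Hypothetical mapping: only the handful of characters that caused trouble.
FRIENDLY = {
  "\u00e1" => "a", # a with acute accent
  "\u00e9" => "e", # e with acute accent
  "\u00ed" => "i", # i with acute accent
  "\u00f3" => "o", # o with acute accent
  "\u00fa" => "u", # u with acute accent
  "\u00f1" => "n"  # n with tilde
}

# Swap each troublesome character for its plain counterpart.
def friendly_name(name)
  name.gsub(/[#{FRIENDLY.keys.join}]/) { |char| FRIENDLY[char] }
end

cleaned = friendly_name("Jos\u00e9 Pe\u00f1a")
```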

That out of the way, I re-ran and found that I really only had problems with about 20 guys. People with names like “Mike Smith”, that were “Michael Smith” in the other set. Here again: cheated. 20 guys? I can handle typing 20 names instead of figuring this one out, so I just modified the yaml manually here (this immediately breaks down if I have to re-run the scraper, but… I didn’t).

After determining which of the dumped data I wanted to keep (and which site could trump the others when they both had something like HRs), I created my csv with FasterCSV and I was rockin’.
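
The “which site trumps which” part boils down to something like the sketch below. It assumes each source has already been turned into a Hash keyed by player name. The stat names and numbers are made up, and the FasterCSV step is left out.

```ruby
# Made-up numbers; sources are listed from most to least trusted.
sources = [
  { "Mike Smith" => { "HR" => 30, "rank_yahoo" => 20 } },
  { "Mike Smith" => { "HR" => 28, "rank_espn" => 100 }, "Sam Sample" => { "HR" => 12 } }
]

# Merge per player; when two sources share a stat, the earlier source wins.
merged = Hash.new { |hash, name| hash[name] = {} }
sources.each do |players|
  players.each do |name, stats|
    # Values already in merged[name] override the newcomer's.
    merged[name] = stats.merge(merged[name])
  end
end
```

Stats that only one source has (like the per-site rankings here) simply accumulate on the player.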

In the spreadsheet

At this point, I was realizing that I should have created some new fields in that script… maybe look for things like big differences between sources (why does Y! think this guy is #20 and ESPN thinks he’s #100?) or even just: what’s the average of all the sites scraped. I was enamored with my spreadsheet though and haven’t been able to work with formulas in spreadsheets since my consulting-at-AmFam days, so I thought I’d give it a spin… and was quickly reminded of how easy they make it… to bash your head against the wall:

=IF(AND(P13="n/a";F13="");M13;IF(P13="n/a";(M13+F13)/2;IF(F13="";(M13+P13)/2;(M13+P13+F13)/3)))
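
For comparison, here is what that formula is doing (averaging whichever rankings are present) as it might have looked back in the merge script. The average_rank helper is hypothetical, and it assumes a missing ranking shows up as nil, an empty string, or "n/a".

```ruby
# Average whatever rankings are present, skipping blanks and "n/a".
def average_rank(*ranks)
  present = ranks.reject { |rank| rank.nil? || rank == "" || rank == "n/a" }
  return nil if present.empty?
  present.inject(0.0) { |sum, rank| sum + rank } / present.size
end

all_three = average_rank(20, 100, 60)
two_only  = average_rank(20, "n/a", 60)
one_only  = average_rank(20, "n/a", "")
```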

Filed under Ruby

Ruby String Concatenation

Ran into a bug tonight.  See it?

def display_value
  display_value = notes.blank? ? "untitled" : notes
  display_value << " (#{ shortcut })" unless shortcut.blank?
  display_value
end

Need a hint?  The symptom was the +notes+ field being unexpectedly changed.

+display_value+ is pointed at +notes+. “<<” then goes ahead and changes that value (which both attributes are pointed at), thus both are effectively changed.

So, while it’s nice to avoid the extra String creation with “<<” where possible – sometimes, you’ve just gotta go with the “+=” to clone, then modify.


a and b ending up the same:
>> a = "a"
=> "a"
>> b = a
=> "a"
>> b << "something"
=> "asomething"
>> a.object_id == b.object_id
=> true


and different:
>> a = "a"
=> "a"
>> b = a
=> "a"
>> b += "something"
=> "asomething"
>> a.object_id == b.object_id
=> false
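
One way to keep the “<<” and still leave +notes+ alone is to dup before modifying. This is a sketch only: the Bookmark class is a hypothetical stand-in for the real model, and plain nil/empty checks replace Rails’ blank? so it runs on its own.

```ruby
# Hypothetical stand-in for the model; blank? is Rails, so plain Ruby
# checks are used here instead.
class Bookmark
  attr_reader :notes, :shortcut

  def initialize(notes, shortcut)
    @notes = notes
    @shortcut = shortcut
  end

  def display_value
    base = (notes.nil? || notes.empty?) ? "untitled" : notes
    # dup hands back a separate String, so << below leaves notes alone.
    value = base.dup
    value << " (#{shortcut})" unless shortcut.nil? || shortcut.empty?
    value
  end
end

bookmark = Bookmark.new("groceries", "g")
shown = bookmark.display_value
```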

Filed under rails, Ruby

Rails Upgrade Bumps

After skimming the “What’s New” posts for several Rails versions without finding anything worth jumping at, it was time to upgrade.  While JSON enhancements (fixes) are the driving reason for the move, there certainly are a few “nice to haves” that I’ve been looking forward to checking out.

Partial Updates

Unfortunately, one of the things I’ve been looking forward to completely falls flat in my book… Partial Updates (to some extent: Dirty Objects).  

When I first read the paragraph-blurb about this addition I was pretty impressed and was looking forward to seeing how it was implemented.  I was much less impressed when I got my head under the hood.  Relying totally on use of ActiveRecord setter methods is a pretty big fail in my book.  Tracking down every place that a field is edited “in place” to specially flag it (xyz_will_change!) is completely unreasonable and maintaining that rule going forward is an annoyance that I just don’t need.
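
To show the shape of the problem without dragging Rails in, here is a toy version of setter-based change tracking. It is emphatically not ActiveRecord’s code: TrackedRecord and its methods are made up, with name_will_change! playing the role of the manual flag.

```ruby
# Toy model of setter-based change tracking; NOT ActiveRecord's code.
class TrackedRecord
  attr_reader :name, :changed

  def initialize(name)
    @name = name
    @changed = []
  end

  # Going through the setter is the only automatic way a change is noticed.
  def name=(value)
    name_will_change! unless value == @name
    @name = value
  end

  # Manual flag for edits that bypass the setter.
  def name_will_change!
    @changed << :name unless @changed.include?(:name)
  end
end

via_setter = TrackedRecord.new("bob".dup)
via_setter.name = "robert"     # noticed: goes through the setter

in_place = TrackedRecord.new("bob".dup)
in_place.name << "by"          # the String changes, the changed list does not

flagged = TrackedRecord.new("bob".dup)
flagged.name_will_change!      # the extra step every in-place edit needs
flagged.name << "by"
```

In the in_place case the value is different but nothing was recorded, so a save that only writes changed fields would skip it.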

I played with several ways of attempting to set flags when changes were “likely” (ex: when getters were used) or always flagging “likely to be edited in place” attributes (ex: serialized fields) but just was ending up with lots of additional complexity, reduced reliability, and basically voiding out any gained efficiency.

Having this enabled by default (apparently the plan) is just plain confusing:  Seems much more reasonable to enable this when you know it’s going to benefit you.  I also don’t like that it adds another question for a new developer to ask (or get tripped up on) when joining a project.  I really don’t want to have to read every line of the new guy’s code for the first three months to make sure that he’s sticking to the rules (and setting up tests to verify it explicitly).

XML handling

Null values are now flagged as such instead of just being blank, like they would look if they were… uh… blank.  This is a good thing – it also tripped me up for a bit.

Eager Loading

It seems eager loading has been changed a bit.  Statements that used to result in :include items being joined right into a single large query now (may) result in several smaller statements.  

Does it make queries more efficient?  Probably.

Does it increase db traffic?  Probably.

Is it a bad thing?  Probably Not (overall).

Is it a bad thing that stuff that worked before doesn’t?  Yeah, that’s annoying.

My specific problem is on an association defined with a :select => "distinct labels.*".  I played around a bit with potential changes down in the AR guts but in the end (unfortunately) I ended up basically tricking it into running it as a single statement.  It’s not at all difficult to make it happen but it’s also not at all straightforward – and I hate it when the framework makes me write a big comment.

For me, I added an :order to my find that would make it order by something in a different table, which scares the thing into running a single statement.  Ideally, I’d like to have something explicit (maybe on the association but more likely on the find itself) that would allow an optional param to say “run this as one statement”.  That would let someone reading the code clearly see what’s going on – instead of wondering (or more likely: not noticing) that I’m ordering by some pointless field.

Overall

Not nearly as bad as I had expected.  Felt good to get some deprecated stuff fixed and get some of my old TODOs out of there in the process.  Also good to have it (almost) over with and to, at least temporarily, have caught up with the Joneses.

Filed under rails, Ruby

Leading zeroes on Time.strftime

Not sure why this isn’t in the Ruby documentation (at least where I normally check) but the common annoyance of leading zeros on Time.strftime can be avoided simply by using “%l”.

It does come with another inconvenience though: you get a leading space instead of a leading zero…
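
A quick sketch of the options side by side. The date is arbitrary, and the “-” flag in the last line assumes Ruby 1.9 or later, where strftime accepts padding flags.

```ruby
morning = Time.local(2009, 3, 5, 9, 7)

zero_padded  = morning.strftime("%I:%M %p")        # hour padded with a zero
space_padded = morning.strftime("%l:%M %p")        # hour padded with a space
stripped     = morning.strftime("%l:%M %p").strip  # trim the space afterwards
unpadded     = morning.strftime("%-l:%M %p")       # "-" flag: no padding at all
```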

Filed under Ruby