On Mercurial (ie: git lite)

It seems that a rather popular theme in blog posts about distributed SCMs is that someone says they hate or love Git, where the hate is generally that it’s hard to learn, unintuitive, etc. Then, almost without exception, a Mercurial user jumps into the comments and says something like “I tried Git, but it was impossible to learn, so I’m using Mercurial and it’s easy-peasy”. That person is wrong.

Git is not hard to learn. At least, not any more difficult than Mercurial is. There, I said it. If you think that Git is like Linux - powerful but with a steep learning curve - while Mercurial is like the Mac - more constrained, but far easier to learn - you have either tried the systems a long time ago or have never really tried them and are just repeating the Merc FUD.

Don’t get me wrong, it certainly used to be this way. My point here is that if you take a fresh look at the two systems, the majority of beginner to intermediate tasks that you have to do with a DSCM are very similar in both, and once you are sufficiently familiar with one, it takes very little effort to use the other.

I state this because of the incredibly scientific research I conducted tonight, wherein I used hg. I have been a Git guy for several years now and had never previously touched Mercurial, so I dove right in a few hours ago and took some notes so I could share what is _not_ intuitive in hg, even to an advanced DSCM user, and to give it a fair shake. Here is what I have concluded:

  • Git and Mercurial have nearly the same learning curve
  • Some things are easier / more intuitive in Git, and some in Hg
  • Both systems have a similar number of overall common commands, of which 90% are identically named
  • You can pretty easily move from one to the other for basic tasks

Let me get into a bit of detail about what I found. As my first piece of evidence, I will look at the help menu. If you simply type ‘git’ or ‘hg’ on the command line, hg will give you the following 17 commands :

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 init       create a new repository in the given directory
 log        show revision history of entire repository or files
 merge      merge working directory with another revision
 parents    show the parents of the working dir or revision
 pull       pull changes from the specified source
 push       push changes to the specified destination
 remove     remove the specified files on the next commit
 serve      export the repository via HTTP
 status     show changed files in the working directory
 update     update working directory

and Git will give you the following 21 commands:

   add        Add file contents to the index
   bisect     Find the change that introduced a bug by binary search
   branch     List, create, or delete branches
   checkout   Checkout a branch or paths to the working tree
   clone      Clone a repository into a new directory
   commit     Record changes to the repository
   diff       Show changes between commits, commit and working tree
   fetch      Download objects and refs from another repository
   grep       Print lines matching a pattern
   init       Create an empty git repository
   log        Show commit logs
   merge      Join two or more development histories together
   mv         Move or rename a file, a directory, or a symlink
   pull       Fetch from and merge with another repository
   push       Update remote refs along with associated objects
   rebase     Forward-port local commits to the updated upstream head
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from index
   show       Show various types of objects
   status     Show the working tree status
   tag        Create, list, delete or verify a tag object

Take a good look at that, because there is not a lot of frickin’ difference. If you know one, you basically know the other. I can attest to that because I didn’t need to look up a lot to figure out how to use hg - not because it’s so super simple, but because it’s nearly identical (for the basic things).

One of the things I hear a lot is that Git has a billion esoteric commands that are cryptic, magical and impossible to remember. That… is true. However, it doesn’t matter. What matters are the porcelain commands that are meant to be used by the end user, and there are about 30 of them - the 21 above plus some special stuff like ‘stash’ and ‘submodule’. On the other hand, if you type ‘hg help’, you get a list of 41 commands.

Now, there are another 100 commands that git will respond to, but they are plumbing commands and are just there in case you want to build something novel - using them is the equivalent of opening up Mercurial and modifying the source. I happen to use a bunch of them to do some really weird stuff that is just not possible in Hg, but there is no reason users even need to know they’re there. They are not in the path (as of 1.6) - out of sight, out of mind. As far as a new user is concerned, Git is simpler in its command set than Hg.

Now, let’s look at a simple use of hg - creating a new hg repo and committing an import:

mkdir test1; cd test1
hg init
cp [files] .
hg add .
hg commit -m 'my message'
hg log

Now, let’s look at the same thing in Git:

s/hg/git/g

It’s exactly the same thing. clone, add, annotate*, commit, diff, init, log, merge*, pull*, push, rm, status - these are all basically identical in the two systems. (annotate is generally called ‘blame’ in git, but ‘git annotate’ will also work, and merge/pull work slightly differently but do largely the same type of thing) This is the core of both systems, these commands are what you spend nearly all of your time doing, and they are almost exactly the same.
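
Spelled out, the substitution gives something like the following. This is only a sketch under a few assumptions that are not part of my original run: it works in a scratch directory, a placeholder README.txt stands in for `cp [files] .`, and the commit identity is passed inline so it does not depend on any global config.

```shell
# Same sequence as the hg example, with s/hg/git/g applied.
# Assumption: a throwaway directory; README.txt stands in for "cp [files] ."
workdir=$(mktemp -d)
mkdir "$workdir/test1"
cd "$workdir/test1"
git init -q
echo 'hello' > README.txt
git add .
# -c sets a one-off identity so the commit works without a global config
git -c user.name='Example User' -c user.email='user@example.com' \
    commit -q -m 'my message'
git log --oneline
# Count the commits on the current branch (used only to check the result)
commit_count=$(git rev-list --count HEAD)
```

If it worked, `git log` shows a single commit with the subject ‘my message’, just as `hg log` does in the Mercurial version.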

Now for the fun part.

Things that are Confusing in Mercurial


I get to listen to Mercs take the high ground all the time about how git is hard to learn and the UI is confusing - now it’s my turn. Here are the things I had to go look up because I didn’t get it and even the ‘hg help’ wasn’t helping.

You have to set your username via ‘vim ~/.hgrc’

In Git, one of the first things you do is :

$ git config --global user.name 'Scott Chacon'
$ git config --global user.email 'schacon@gmail.com'

In Mercurial, far as I can figure, you gotta do that by hand. There is an ‘hg showconfig’, but no setter (again, far as I can tell). That means you have to look up a snippet of what the actual config format is and paste that into your ~/.hgrc file manually before your commits will stop complaining that no user is set. PITA.
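
For reference, the snippet you end up pasting looks roughly like this. It is a sketch that assumes the usual `[ui]` section and `username` key of the hgrc format; the name and address are the ones from the Git example above, and the scratch file is my stand-in so the example is safe to run.

```shell
# Assumption: written to a scratch file rather than the real config;
# point hgrc_file at "$HOME/.hgrc" to apply it for real.
hgrc_file="$(mktemp -d)/hgrc"
cat >> "$hgrc_file" <<'EOF'
[ui]
username = Scott Chacon <schacon@gmail.com>
EOF
# Show what was written
cat "$hgrc_file"
```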

There is no staging area

This is really just a Gitter wondering how Mercs do it, but the lack of a staging area is something I didn’t know I would miss. The lack of control over what versions of what files you’re committing seems like a huge, huge missing feature to me (again, only because I’m used to Git). People argue that it keeps it simpler, but you can get the equivalent functionality by adding a ‘-a’ to the ‘git commit’ command every time, which a lot of people do. In my initial foray, that was the only place where Git was actually more complicated than Hg - ignoring the staging functionality takes an extra ‘-a’.
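
To make the difference concrete, here is a small sketch of both styles in a scratch repo. The file names a.txt and b.txt and the commit identity are made up for the example.

```shell
# Assumption: a scratch repo with two tracked files, a.txt and b.txt.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
echo one > a.txt
echo one > b.txt
git add a.txt b.txt
git commit -q -m 'initial import'

# Change both files, but stage and commit only a.txt.
echo two >> a.txt
echo two >> b.txt
git add a.txt
git commit -q -m 'change a only'
staged_commit_files=$(git diff-tree --no-commit-id --name-only -r HEAD)

# Ignore the staging area: -a commits every modified tracked file.
git commit -q -a -m 'commit everything else'
all_commit_files=$(git diff-tree --no-commit-id --name-only -r HEAD)

echo "staged commit touched: $staged_commit_files"
echo "commit -a touched:     $all_commit_files"
```

The first commit contains only a.txt even though b.txt was also modified; the `-a` commit then picks up whatever tracked changes are left, which is the behaviour you get by default in Hg.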

How do I set up a remote repository?

OK, I have this Hg repository, and I want to create a remote one and push to it. I know I am being an idiot here, but I literally could not figure out how to do this short of doing an ‘hg clone ‘ and looking at the .hg/hgrc file to see what was added to allow ‘hg push’ to know where to go. I figured out that you could specify it on the command line, but the thought of typing a url every time I want to push made me throw up a little in my mouth. I could not find the equivalent of a ‘git remote’ where I could add and manipulate my remote repositories without editing the ‘.hg/hgrc’ file. I couldn’t find it in the hg book, either. Perhaps in the comments someone could enlighten me.

I set up a repo on BitBucket and the instructions on how to push into it were simply ‘clone this’, and then I assume you’re supposed to pull your files in and then push, but what if you already have a repo? This drove me nuts, and I still don’t know how to do it.

Then, for Act II of this little play, I wanted to know how to have another remote - say I want to be able to push my repo to my staging server for deployment and my central server for collaboration. Again, could not figure out how to add it - I ventured a guess and just copied the line in the config file and gave it a different name and that seems to have worked, but do you really have to edit the file to add a remote repository?
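
For the curious, the edited config ends up looking something like the sketch below. The `[paths]` section with a `default` entry is what a clone writes; the `staging` name and its URL are hypothetical placeholders, and the scratch file stands in for the repository’s real `.hg/hgrc`.

```shell
# Assumption: written to a scratch file; in a real repo this lives in .hg/hgrc.
repo_hgrc="$(mktemp -d)/hgrc"
cat >> "$repo_hgrc" <<'EOF'
[paths]
# "default" is what a bare "hg push" or "hg pull" uses
default = http://bitbucket.org/Scotty/objective-git/
# hypothetical second remote, used as "hg push staging"
staging = ssh://deploy@staging.example.com/objective-git
EOF
# Show what was written
cat "$repo_hgrc"
```

With that in place, `hg push` goes to the default path and `hg push staging` goes to the second one, but as far as I can tell you still get there by editing the file.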

It also appears that something that happens incredibly frequently in your typical day is much more complex in Hg than Git - pulling. In Mercurial, you have to do three commands each time you want to pull (and merge) changes from your remote repository:

hg pull
hg merge
hg commit -m 'Merged remote changes'

In Git, that is effectively done with ‘git pull’. Now, you can do it the long way with Git, too:

git fetch
git merge --no-commit
git commit -m 'Merged remote changes'

But WHY? (as an amusing side-note, there is a Merc plugin that adds an ‘hg fetch’ command that does what ‘git pull’ does, so in hg: fetch == pull + merge and in git: pull == fetch + merge…)
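
If you want to convince yourself that pull really is fetch plus merge, here is a sketch using throwaway repos. The directory names and the commit identity are invented for the example, and because the clones have no local commits the merge is a simple fast-forward, so no merge commit is needed.

```shell
# Assumption: everything lives in a scratch directory; "upstream" plays the remote.
scratch=$(mktemp -d)
cd "$scratch"
export GIT_AUTHOR_NAME='Example User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME='Example User' GIT_COMMITTER_EMAIL='user@example.com'

git init -q upstream
( cd upstream && echo one > file.txt && git add file.txt && git commit -q -m 'first' )
git clone -q upstream pull-clone
git clone -q upstream fetch-clone

# A new commit lands upstream after both clones were made.
( cd upstream && echo two >> file.txt && git commit -q -a -m 'second' )

# One step:
( cd pull-clone && git pull -q )

# The same thing in two steps: download, then merge the tracked upstream branch.
( cd fetch-clone && git fetch -q && git merge -q '@{upstream}' )

pull_head=$(git -C pull-clone rev-parse HEAD)
fetch_head=$(git -C fetch-clone rev-parse HEAD)
upstream_head=$(git -C upstream rev-parse HEAD)
echo "pull: $pull_head"
echo "fetch+merge: $fetch_head"
```

Both clones end up on the same commit as upstream.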

Branching… poor, poor branching…

I passed out for a quarter second when I read this in the Hg Book:

The easiest way to isolate a “big picture” branch in Mercurial is in a dedicated repository. … You can then clone a new shared myproject-1.0.1 repository as of that tag.

I was naive enough to think that branches living in entirely different directories was a thing of the past. How SVN of them. I cannot imagine living my life making local clones to effectively deal with long running branches. The book literally says:

“In most instances, isolating branches in repositories is the right approach.”

Um, no thank you.

It turns out that the more I get into branching stuff, the more I understand why they advocate that you clone to branch. Everything is on one track - you can’t commit something and then easily leave it there for work later and ignore it for the time being, which is what I use branches mainly for. It’s like Mercurial is a one-track mixer with some post-it notes to remember where you were and Git is a multi-track board that starts with one and then allows you to snap on new tracks at any time. Not sure if that metaphor worked, but the constraint of not having cheap, real local branches would drive me batty.

When I tried to have two topic branches going at the same time (say a master branch and an experiment branch), it was rather painful. It worked OK until I went back and forth and then when I tried to push it gave me a:

$ hg push
pushing to http://bitbucket.org/Scotty/objective-git/
searching for changes
abort: push creates new remote heads!
(did you forget to merge? use push -f to force)

No! I didn’t forget to merge, I want to have two branches! So, I forced it. Then, when I want to switch back to my other branch, it gives me this:

$ hg branch newbranch
abort: a branch of the same name already exists (use --force to override)

Yes, I know it does, I’m switching back to it, you bastard! So, you _can_ have several local branches being developed at the same time, but hg hates it and you cannot push one of them without pushing all of them. It looks like it stores them as sequential changesets but then stores the parents so you can technically recreate the history. However, it seems that you _cannot_ push your A branch without also pushing your B branch.

That. Is. Annoying.

Conclusion

Perhaps some of these things are simpler at first in Hg, but I don’t really think they are that much easier (if at all), and the amount of flexibility you lose is so immense that I can’t understand how anyone can think of Mercurial as anything other than ‘Git Lite’. Same great usability, much less functionality. And if your answer to that is ‘get X plugin’, then why do you think you’re winning the usability battle again?

That’s it - I’ll keep playing with Hg and sharing my thoughts (being as how they are sooo unbiased). In the meantime, I’ll leave you with some more metaphors:

* If DSCMs were bikes, Hg would be the Git bike with the training wheels soldered on.

* If DSCMs were TVs, Git and Hg would turn on to the same channel, but then Git would also have cable.

* If DSCMs were GPS units, both would have places of interest, but Git would also come with the street maps and be able to do driving directions.

* If DSCMs were shoes, you could play basketball in either, but Git would have the pump (for when you needed extra jumping and whatnot)

* If DSCMs were alarm clocks, they would both wake you up, but Git would also make you coffee.

(if you have others, please share - again, the theme is that they’re the same out of the box, but then the one is ultimately a lot more useful)

Why? Why why why why why?

Warning: the following post is a rant - I wrote it so that I could get this frustration off of my chest and out of my mind.

I live in California and I cannot for the bloody life of me understand why there are so many Prop 8 supporters. For those of you that don’t know, Proposition 8 is a ballot initiative on the California statewide ballot that will eliminate the rights of gay couples to marry.

How can this be taken as anything other than pure bigotry? It literally makes me sick to my stomach. I have not been so angry at the actions of other people in a long time. I can see a small, zealous minority agitating over this, but there are tons of normal people donating and picketing and arguing for the opportunity to strip rights from those who are not like them. Their self-righteousness, intolerance and in some cases, hatred, are so transparent that I have a hard time understanding how they aren’t completely mortified at their own blatant vindictiveness.

There is simply no good reason to oppose it. There is no way in which this affects their own lives in any tangible way. It is simply that they don’t want people that they don’t like in the abstract to be in any way accepted in society. They may as well be wearing t-shirts that say “Gay people are icky, Yes on 8!”

It’s that there is actually no argument - the entire ‘Yes’ campaign is “wink, nod - you know you think it’s gross too…”

The sad part is that I actually do understand these people - I know tons of people that will vote ‘Yes’ on this. It’s people who are so ensconced in their own little self-righteous, self-affirming communities that they can justify it internally as a referendum on gayness without feeling personally guilty. It’s naked xenophobia that is still so widely accepted that people don’t feel like they’re bad people like they would if they were equally blatantly racist. Every “argument” for ‘Yes’ works completely unmodified if you replace ‘gay’ or ‘same-sex’ with ‘mixed-marriage’.

There are no financial, legal or health implications for anyone who would vote ‘Yes’. They just don’t want gay people to be able to avoid the social awkwardness that comes with having to refer to your husband as a ‘life-partner’ instead. They don’t want their children to even be exposed to the notion that being gay might be OK - secure in the notion that THEIR children could never be gay. It is that simple. It is not even fundamentally religious - there are churches and synagogues that will marry gay couples, there are tolerant congregations all over the place.

I don’t blame the people personally, they’re by and large good and loving people, it’s the environment - the churches and right-wing agitators that think this crap will energize their base, and people trust them and are moved to action and animosity by them unfairly.

The thing that really bothers me is the support and role of the churches in this. The fact is that this entire thing is fanned and supported by churches across the country. I read that nearly half the funding for the ‘Yes’ campaign has come from the Mormon church alone, and that this Sunday thousands of pulpits will be used for political purposes to encourage people to vote the “right” way on this - the way Christ would want you to vote.

For some reason, I still naively believe that faith is supposed to be a source of strength - teaching love, understanding and tolerance, not to be a support center for bigotry and judgment. What makes it worse is that I probably know the sections and context of the Bible that most of them draw from as the source of their intolerance and self-righteousness much better than they do, but that wouldn’t slow them down one bit. No, they are justified in their judgment - God Himself supports them. I found myself thinking of this quote from Obama’s “Audacity of Hope”:

“We think of faith as a source of comfort and understanding but find our expressions of faith sowing division; we believe ourselves to be a tolerant people even as racial, religious and cultural tensions roil the landscape.”

Sure, in a decade this will all be for naught. The slow movement of tolerance will eventually force people’s views on gay marriage to go the route of mixed-marriage bans and segregation - but that we aren’t there yet is frustrating to me. That people naturally take such pleasure and zeal in excluding minorities and ostracizing those who are not like them, and furthermore that they are allowed to hide behind religion as a justification for their animosity, makes me truly, truly sad.

Please vote ‘No’ on 8.

A GitHubber Now

My big news of the day is that next week I officially start my new job at Logical Awesome, working on GitHub. This is a really exciting move for me. I’ve been interested in GitHub since long before it was released, nearly a year ago, when Chris first told me about it and I was first writing my Git/Ruby library.

Since then, I have loved it as a tool, evangelized it in my talks and whatnot, and done some part time consulting work for the site - I helped write Gist, Inline File Editing, and added some pure ruby goodness to the Grit library we’re using on the site. I have greatly enjoyed drinking, podcasting and working with Chris, PJ and Tom and I’m looking forward to doing that full time going forward.

Most importantly, I’m incredibly excited about working on Git stuff full time now. For the last few years, it has taken up more and more of my free time and now I get to do what I am truly passionate about for a living. Everyone should be so lucky. What’s more, all my talks and events I go to are now somewhat justified - I have to spread the Git love! The only downside is that where before I could plug GitHub as being truly awesome from a somewhat neutral viewpoint, now people will likely think I’m biased. :)

On a somewhat related note, we announced this at the Git Down event last night in San Francisco that GitHub put on, hosted by Serious Business, where we got to learn a little about git-sh, codeswarm and magit. Tom showed us some of the new stuff he’s working on for GitHub and I got to show off my GitHub iPhone app preview and talk a bit about ObjectiveGit, my Objective-C implementation of Git. A good time and a fair amount of beer was had by all.

Anyhow, it’s likely that future posts will be even more Git related, and I hope this means I can be even more effective in getting people to use and understand this tool that I enjoy so much.

Git Community Book

For the past several weeks, I’ve been working on a free, open source, online book on Git called the Git Community Book that I’m hoping to turn into a great one-stop resource for learning Git.

The idea is that we have a super solid Git resource linked right off the Git homepage where people can get most of the answers they need in a single, easy to browse place, from super-beginner to super-advanced. I’ve taken some content from the existing User Guide and tutorials that can be found scattered around online, re-written a lot of it and added a ton of my own content, screencasts and images. There is a PDF version of the book that is generated and linked automatically every time I do a build, so for those of you who like a paper or local copy, I’ve got your back. Probably 80% of the book is done now, and I’m looking for some other contributors and some feedback.

I have a bunch of ideas for git-scm.com and the Git Community Book - I’d like to do searchable documentation, a cookbook, a quick-start guide and a few more things - but first I want to get this online book at least initially complete. If you are interested in helping contribute content for a section or chapter, I would hugely appreciate it. Even if it’s just notes that you’ve tested, I would be happy to humanize it for you. Or, if you’ve written a blog post that I can re-use the contents of that cover one of the topics, that would also be great.

The topics I’m currently looking for are Advanced History Modification, Corruption Recovery, Branch Tracking, Subversion Integration, Git with Perl/Python/PHP, and Git with Editors (especially NetBeans/Eclipse). I can write them, but since I have limited personal experience with these topics, I’m not very confident that they would turn out particularly well.

Again, the book itself is open sourced and you can download the raw markdown and build scripts from its GitHub repo, and read the “how to contribute” guide on its wiki.

Write me if you can help contribute or proof-read the existing content at schacon at gmail.com.

Thanks, and I hope you like the book!

The Launch of git-scm.com

I love Git. However, a lot of people have the idea that Git is hard to learn, which I really disagree with. I have been working with Git for a few years now, but I understand it vastly better than I ever understood SVN or CVS, which I worked with for many, many years. Why? Because it’s cheap and easy to try things out, the model is ultimately very simple and understandable, and it’s really pretty hard to really screw things up - Git almost never removes information. So, I found it easier to play with features and find what is really helpful to me, rather than being scared of costing myself more time than it’s worth.

I assume that the main reason people think Git is difficult is because they’ve heard other people say Git is difficult and they didn’t have a good teacher or learning resource, so when they fall back to their instinct - what SVN would have done or something - nothing works as expected and they get confused. Then they auto-complete for ‘git-’ and get 150 commands. How are they supposed to know that only about 20 of those are really going to be useful to them most of the time?

So, I’m trying to build some resources that will help newcomers love Git from day one. I really want to focus on the usability of the main site and make it easy to find reference or tutorial documentation, and eventually I’d like to build a really nice online book that answers learners’ questions when they need to know them and guides them through the learning process as naturally and easily as possible. I honestly don’t think that Git needs to be made easier somehow, I think the learning process does. The current docs are wonderful for many of us that are more technical, but often it’s easier to learn with screencasts and diagrams.

However, first I wanted to fix git.or.cz. I have always pointed to it as the git homepage, and Petr is awesome - he’s always kept it up to date and is a core contributor himself. However, as a landing page for a project, it is very overwhelming. There are nearly 1200 words on that page - almost all of them at the same font size. It is very difficult to skim, and pretty difficult to figure out what Git really does in a second or two.

So, I’ve forked the source of that site (because awesome Petr made it open source) and created a new site, which is now being hosted at git-scm.com. I’ve broken the page up into 5 topic pages and drastically simplified everything I could. Hopefully it is easier to navigate and find what you’re looking for. The version number should be updated automatically, and I’ve set up a mirror of the Git source code at GitHub that I will eventually be doing some fun automated statistics with. The source code for the website is at GitHub; if you have an idea or contribution to make, feel free to fork the site and send me a patch.

Now that is done, I’ll work on the spiffy new documentation project, which will likely be another branch in that same repo. I’ll do another post when there is enough to share, at which point I would be happy to have all the contributor help I can get.

By the way, in case you’re wondering, the logo at the top is a Git. He’s a BLOB that is COMMITed to storing TREEs. Little Git humor, there…

Fuzed and EC2

First of all, this post is about 2 things that rock. One is Fuzed, the new Erlang web glue that runs Rails on the Erlang-based Yaws web server, put together by Tom Preston-Werner and Dave Fayram of Powerset. The other is Amazon EC2, which is a big, sneakily expensive nerd playland. I became interested in Fuzed at the talk I heard on it at RailsConf by Tom and Dave.

I downloaded the Fuzed software, compiled it and got it running on my local machine, which was a bit more complex than was necessary, since I needed 3 nodes (a faceplate, a master and a backend) to get it running. So, I thought I would see what it could really do in a real, multi-node environment. That’s where EC2 comes in.

I set up two (now public) EC2 AMIs, one 32-bit (ami-1da54174) and the other 64-bit (ami-64a6420d), since in EC2 you need a 64-bit version to run the bigger instances. Each has Erlang, Fuzed, Rails, an example Rails app and HAProxy installed on it. Then I started playing with different configurations.

For all of these examples, I’m using the Kronos rails app (a simple frontend to Chronic running on Rails v1.2) in development mode (just to make it slower) with no database. So, it is not a totally normal setup, but I wanted to see how to scale a slowish, difficult to cache app with Fuzed.

Control Group : our trusty mongrels

The first thing I did, in order to get a sort of benchmark of the Rails app itself was to run a normal setup on a small EC2 slice. I started up an m1.small slice (if you aren’t familiar with the various EC2 slice sizes, see here) and ran the app in 3 mongrels proxied behind nginx.

  • single server running a small mongrel cluster

This setup did pretty well under smallish loads - I was able to serve 9.0 requests per second with 5 simultaneous users, with 95% of the requests taking less than a second, which means that I could pretty readily serve 30k requests in an hour, or somewhere around 500k a day, as long as it’s steady traffic.

cost: $72/month
performance: max 15M page views/month, 500/min max

(obviously, these numbers are highly dependent on the app and spikes in traffic and such, but they should help us compare to the other setups)
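
The back-of-the-envelope conversion behind those numbers looks like this. It is a sketch using integer shell arithmetic, so the 9.0 req/sec baseline is written as 9; the totals are theoretical ceilings for perfectly steady traffic, and the figures I quote are rounded down from them.

```shell
# Convert a sustained request rate into per-minute, per-hour and per-day totals.
req_per_sec=9
per_min=$((req_per_sec * 60))
per_hour=$((req_per_sec * 3600))
per_day=$((req_per_sec * 86400))
echo "$per_min/min, $per_hour/hour, $per_day/day"
```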

Configuration 2 : 3 servers

So that gives us a baseline. Now, my first pass at a Fuzed setup was with 3 servers - one faceplate, one master node and one rails node with 3 handlers running in it - all ‘m1.small’ sized. This is sort of a weird setup, since we’re still only dealing with 3 rails handlers, but now have added the overhead of the Fuzed master node and frontend listener.

  • one faceplate
  • one master
  • one small rails node (3 handlers)

And it turns out that we get roughly the same overall performance here - about 10 requests/second sustained, though I mistakenly tested it with 10 simultaneous users instead, so the page loads were a bit slower on average, but still not bad.

cost: 3 small = $215 / month
performance: max 15M page views/month, 500/min max

(though I’m fairly certain I could have run the master, faceplate and rails node all on the same slice and gotten much the same performance, so it’s not really 3 times more expensive than the mongrels…)

Configuration 3 : 4 servers

So now I come to the fun part of Fuzed, the auto-configuration. I simply spun up a High-CPU Medium Instance ($0.20/hr, 5 CU) and started Fuzed on it with 15 Rails handlers (my formula was consistently 3 Rails handlers per EC2 Compute Unit), bringing my total number of Rails handlers to 18.
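
The handler math is nothing more than the rule of thumb applied to the instance size. A quick sketch, with variable names made up for the example:

```shell
# Rule of thumb from this post: 3 Rails handlers per EC2 Compute Unit.
handlers_per_cu=3
medium_cu=5          # High-CPU Medium, 5 CU
small_handlers=3     # handlers already running on the m1.small rails node
medium_handlers=$((handlers_per_cu * medium_cu))
total_handlers=$((small_handlers + medium_handlers))
echo "medium: $medium_handlers handlers, total: $total_handlers"
```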

Now I brought my concurrency up to 100 and got 56 req/sec from the stack, though the average response time was a bit higher, at about 1.5 - 2 seconds per request. That’s unacceptably high, so for the performance calculations, I’ll assume limits of closer to 40 req/sec.

  • one faceplate
  • one master
  • one small rails node (3 handlers)
  • one high-cpu medium rails node (15 handlers)

cost: $360/month + $20 bandwidth = $380/mo
performance: 85M page views/month, 2500/min max
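
The instance part of that cost can be reproduced with a little arithmetic. This sketch assumes a 720-hour month (24 * 30) and $0.10/hr for an m1.small, which I am inferring from the $72/month quoted for the single-server setup; prices are kept in cents so plain integer arithmetic works, and bandwidth is left out.

```shell
# Assumptions: 720-hour month; m1.small at 10 cents/hr (inferred from $72/month).
hours_per_month=$((24 * 30))
small_cents_per_hour=10
medium_cents_per_hour=20   # High-CPU Medium at $0.20/hr
small_count=3              # faceplate, master, small rails node
medium_count=1             # high-cpu medium rails node
hourly_cents=$((small_count * small_cents_per_hour + medium_count * medium_cents_per_hour))
total_cents=$((hourly_cents * hours_per_month))
monthly_dollars=$((total_cents / 100))
echo "instances: \$$monthly_dollars/month"
```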

Configuration 4 : 5 servers

Now I decided to bring in the big boys. Amazon has brand new High-CPU Extra Large Instances ($0.80/hr) with 7 GB of memory and 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), which means I should be able to run 60 Rails handlers on each one. So, I fired up one of these and attached it as a Rails node, then brought up two more small instances, one as a second faceplate and one as a frontend server running haproxy round-robining back to the two faceplates. This entire reconfiguration took me about 5 minutes to do, most of it spent waiting for the instances to come up.

  • one master
  • one haproxy
  • two faceplates
  • one small rails node (3 handlers)
  • one high-cpu medium rails node (15 handlers)
  • one high-cpu extra large rails node (60 handlers)

However, now my stack was able to fairly easily handle 140 req/sec with 90% of them coming in under a second. This means that I can take sustained loads of close to 500k hits per hour on an uncached Rails app in development mode, 20 minutes after I was running it on a single slice.

cost: $935/month + $85 bw = $1020/month
performance: max 300M page views/month, 8500/min max

Configuration 5 : 13 servers

Just to see how far I could take this before I wasn’t willing to pay the hourly rate for this test, I spun up another faceplate and 5 more high cpu extra large servers for a grand total of:

  • one small master
  • one small haproxy
  • three small faceplates
  • one small rails node (3 handlers)
  • one high-cpu medium rails node (15 handlers)
  • six high-cpu extra large rails nodes (6*60 = 360 handlers)

At this point, adding new nodes is pretty trivial - it took about 10 minutes to get here from the previous setup, most of the time spent reconfiguring haproxy for the third faceplate.

Now I’m hitting other breakdown points in my testing setup - I can pretty easily get about 475 req/second from the stack, but it appears that the bottleneck is now in the faceplate layer and possibly elsewhere, but 475 req/second is pretty dang fast, especially for a site with no caching and no static requests at all - all 475 of those invoke the full Rails stack in development mode, no less. Plus, it’s still really responsive to me even as I’m slamming it with ab in the background.

475 hits per second translates into roughly 1.5M hits per hour, 35M a day, etc. Obviously these estimations are getting pretty ridiculous - so many other things come up before you hit this point that the comparison is only interesting from a relative view to the other configurations, but I did have fun setting it up and beating the crap out of this with ab.

cost: $3816/month + $342 bw = $4158/month
performance: max 1000M page views/month, ~28,500/min max

Conclusion

Now that I’ve taken this to its illogical conclusion, it’s fun to look back at what we’ve done. The point of the post is not to say “Fuzed will solve all Rails scaling problems”, because a) I don’t know enough about running any actual huge scale websites to have my opinion be worth a lick and b) this is a stupid-simple Rails app - any complexity you add to your app will give you new and interesting problems at scale.

However, what is fun to note is how easy Fuzed made it for me to take a simple Rails app running in development mode and make it serve 5000 dynamic pages in 10 seconds flat with no failed requests, and that it took me more time to write this blog post than it did to get it there.

All of this was automated with a little tool I wrote called FuzEc2, pronounced un-pronou-ncab-le, which basically just uses EC2::Base and Net::SSH to automate all the Fuzed and HAProxy setup steps. The first time I ran through this stuff I did it all manually and it still only took a couple of hours; with FuzEc2, it took about 30 minutes. The AMIs are public, so if you want to try this for yourself, just download my FuzEc2 script, replace the example variables with your own AWS keys and such, and start spinning them up!

You can watch a video of me doing another, more limited run of this after the jump, which has an example of using the script.


RailsConf Git Talk

I just got back from RailsConf 2008, which was held in Portland again this year, and I have to say I had a really excellent time. I met tons of really cool people, had a hundred conversations about Git and Ruby and consumed a fair amount of Drop Top.

In addition to that, my talk went as well as I could have hoped. The place was packed, I talked on the second day after they had extended the rooms and the room was still almost totally full. The presentation went off without a hitch and dozens of people came up to me through the rest of the weekend to say they enjoyed it, so that made me feel really great.

One of the awesome guys that I met there, Daniel Wanja of OnRails.org, was kind enough to take some video of my talk, if you’re curious how I present or how many people there were:


RailsConf 2008 Git Talk by Scott Chacon (video from Daniel Wanja on Vimeo)


You can also download my slides here:





There were about 520 slides that I went through in 55 minutes (went a tad bit over the time limit) and some of them had moving pieces, meaning I probably clicked that clicker at least 600 times during that talk.

Update: I’ve recorded my basic talk over the presentation of the full slide deck and posted it as an episode at GitCasts, if you want to see the whole thing.

So, now I go back to work, but I enjoyed myself and met a bunch of great people and learned a lot about DataMapper, Erlang (Fuzed and Vertebra), and even a little Ruby… I also got to whine at Chad Fowler about the sessions not being videotaped, which he commiserated with me on. Lastly, I wanted to point out that for those of you who missed Nick Kallen’s talk on ActiveRecord at the very end of the conference, you missed out on a really well done live coding session - probably the most informative and well presented Ruby learning I’ve had in months.

GitCasts - Git Screencasts

Continuing on my git-ish roll, I’ve just launched a new site called GitCasts. I’ve noticed that a lot of people have been watching my other screencast, nearly 5,500 views so far - so I thought I would do something along the lines of the excellent RailsCasts website and do a bunch of short, topical screencasts on Git usage and internals.

I’ve put the site live with the first 4 screencasts:

Those of you who have my Peepcode Git Book may notice that these are the same screencasts that are distributed with the book. Next up will be “Browsing Git Objects”, “Branching and Merging”, “Rebasing” and “Distributed Workflow”. After that, I will continue to produce short screencasts, mainly from the list I’m keeping here, so if there is something you want to see, give me a shout.

Hope you find it helpful.

Peepcode Git Book

I’ve been wanting to write a book on Git for a while now - at least since the beginning of the year. I really wanted to take the time to write out a book that taught Git the way I wish I had been taught - describe the internals first, what Git is really doing, rather than comparing it to SVN or just showing random commands without context.

I laid it all out and started writing, but it’s really hard to justify the untold hours it takes to finish it if you’re not sure anyone will even read it. So, about a month and a half ago, after a few false starts, I asked Geoffrey Grosenbach if Peepcode would be interested in publishing a mini-book on Git - it seemed to be getting more popular (remember, this was before GitHub or Rails moving to Git) and he was enthusiastic. So, off I went, and just a few minutes ago the Peepcode Git PDF product page went live:




This is actually only half of the book that I laid out - about halfway in it became clear that this is way too much for one mini-book, so I will be coming out with a follow-up book on “Advanced Git” hopefully sometime in the next few months.

In addition to the PDF, I also produced 8 short (5-15 min) screencasts that are associated with several of the chapters - you can download them all when you buy the book and there are sidebars in the text that point you to which episode demonstrates the contents of that section. Also, I am working on an audio-book version in case you want to review the content on your commute - that should be available this weekend.

I originally intended this content to be free, but not having something on the line would never have gotten me to actually do it, nor would the final product have been nearly so polished. I’m really glad I went with Peepcode - I think the $9 is worth the quality that the Peepcode production added to it. I hope you agree. Let me know what you think!

Ruby Reporting with Munger

I started in on a story at work where I needed to add some reports on one of our internal applications. In the past, I have just run a query and then iterated over it in the view, creating a report manually, but it seemed that I was always doing the same sorts of operations and that there should be some tool that makes that all easier. So, I started searching for reporting tools in Ruby, googling and asking friends, and the only thing I found was Ruport.

It seemed pretty cool and I was really impressed with the demos and my first few passes, but after a half day of struggling, I just could not force out of it the reports I was looking for. It did grouping weirdly, I didn’t seem to have easy control over the rendering of the html, I couldn’t highlight cells, the pivoting functionality was destructive to other row data and you couldn’t change more than one column in a single pass.

My biggest problem was that the data manipulation and the report formatting were so tightly coupled. It was an interesting API, so I considered hacking on it, but the problem itself didn’t seem too complex and I wanted the architecture to have a cleaner separation of what I feel are the three major steps of reporting - data manipulation, report formatting and output rendering. So, I decided to start my own project and a day and a half later, I had Munger, the alternative Ruby reporting library.
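
To make that separation concrete, here is a tiny plain-Ruby sketch of the three steps kept apart. It is not Munger's code or API; the hashes, the `render_text` helper and the sample rows are all made up for illustration.

```ruby
# Step 1: data manipulation - plain rows in, plain rows out.
rows = [
  { advert: "spot 1", airtime: 15, clicks: 301 },
  { advert: "spot 2", airtime: 30, clicks: 172 }
]
data = rows.map { |r| r.merge(rate: r[:clicks] / r[:airtime]) }

# Step 2: report formatting - pick the columns, their titles and their order.
columns = { advert: "Spot", rate: "Rate" }
report  = {
  titles: columns.values,
  rows:   data.map { |r| columns.keys.map { |k| r[k].to_s } }
}

# Step 3: output rendering - turn the formatted report into text.
# A second renderer (say, HTML) could consume the same report untouched.
def render_text(report)
  ([report[:titles]] + report[:rows])
    .map { |cells| "|" + cells.join(" | ") + " |" }
    .join("\n")
end

puts render_text(report)
```

Because each step only hands a plain structure to the next, any one of them can be swapped out without touching the other two.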

The library is about two days old, but it’s pretty usable already. If you’d like to help out, fork and hack!

Simple Example

Here is a simple example.

result = AdAirings.find(:all)
report = Munger::Report.from_data(result).process
puts Munger::Render.to_text(report)

will result in :

|airtime | airdate    | clicks | advert |
------------------------------------------
|15      | 2008-01-01 | 301    | spot 1 |
|30      | 2008-01-02 | 199    | spot 1 |
|30      | 2008-01-03 | 234    | spot 1 |
|15      | 2008-01-04 | 342    | spot 1 |
|30      | 2008-01-01 | 172    | spot 2 |
|15      | 2008-01-02 | 217    | spot 2 |
|90      | 2008-01-03 | 1023   | spot 2 |
|30      | 2008-01-04 | 321    | spot 2 |
|60      | 2008-01-01 | 512    | spot 3 |
|30      | 2008-01-02 | 813    | spot 3 |
|15      | 2008-01-03 | 333    | spot 3 |

A Simple Pivot

result = AdAirings.find(:all)
data = Munger::Data.load_data(result)

new_columns = data.pivot('airdate', 'advert', 'clicks')

report = Munger::Report.from_data(data).columns([:advert] + new_columns.sort).process
puts Munger::Render.to_text(report)

becomes:

|advert | 2008-01-01 | 2008-01-02 | 2008-01-03 | 2008-01-04 |
--------------------------------------------------------------
|Spot 1 | 301        | 199        | 234        | 342        |
|Spot 2 | 172        | 217        | 1023       | 321        |
|Spot 3 | 512        | 813        | 333        |            |
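
If pivoting is new to you, this is roughly the transformation, written in plain Ruby over an array of hashes. It is only a sketch of the idea, not how Munger implements it: the `pivot` helper is hypothetical, and summing the values when an advert/date pair repeats is my assumption.

```ruby
rows = [
  { airdate: "2008-01-01", advert: "spot 1", clicks: 301 },
  { airdate: "2008-01-02", advert: "spot 1", clicks: 199 },
  { airdate: "2008-01-01", advert: "spot 2", clicks: 172 },
  { airdate: "2008-01-02", advert: "spot 2", clicks: 217 }
]

# One output row per distinct row_key value, one column per distinct
# column_key value, each cell holding the summed value_key for that pair.
def pivot(rows, column_key, row_key, value_key)
  rows.group_by { |r| r[row_key] }.map do |row_value, group|
    cells = Hash.new(0)
    group.each { |r| cells[r[column_key]] += r[value_key] }
    { row_key => row_value }.merge(cells)
  end
end

pivoted = pivot(rows, :airdate, :advert, :clicks)
pivoted.each { |row| p row }
```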

A More Complex Example

result = AdAirings.find(:all)

data = Munger::Data.load_data(result)
data.add_columns([:advert, :rate]) do |row|
  rate = (row.clicks / row.airtime)
  [row.advert.capitalize, rate]
end

report = Munger::Report.from_data(data)
report.sort('airtime').subgroup('airtime')
report.aggregate(Proc.new {|arr| arr.inject(0) {|total, i| i * i + (total - 30) }} => :airtime,
                        :sum => :rate)

puts Munger::Render.to_text(report)

gives us :

|Spot   | Rate | Air Date   | Airtime |
----------------------------------------
|Spot 2 | 14   | 2008-01-02 | 15      |
|Spot 1 | 20   | 2008-01-01 | 15      |
|Spot 3 | 22   | 2008-01-03 | 15      |
|Spot 1 | 22   | 2008-01-04 | 15      |
|       | 78   |            | 780     |
|Spot 2 | 5    | 2008-01-01 | 30      |
|Spot 1 | 6    | 2008-01-02 | 30      |
|Spot 1 | 7    | 2008-01-03 | 30      |
|Spot 2 | 10   | 2008-01-04 | 30      |
|Spot 3 | 27   | 2008-01-02 | 30      |
|       | 55   |            | 4350    |
|Spot 3 | 8    | 2008-01-01 | 60      |
|       | 8    |            | 3570    |
|Spot 2 | 11   | 2008-01-03 | 90      |
|       | 11   |            | 8070    |
|       | 152  |            | 16770   |
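
If the Airtime subtotals look odd, that is the custom Proc at work: each step adds i * i - 30 to the running total, so the result is the sum of squares minus 30 per row. Here is the same Proc run in plain Ruby outside Munger; the `groups` hash is just the airtime values from the table, arranged by hand.

```ruby
# The custom aggregate from the example above, unchanged.
custom = Proc.new { |arr| arr.inject(0) { |total, i| i * i + (total - 30) } }

# Airtime values per subgroup, copied from the report output.
groups = {
  15 => [15, 15, 15, 15],
  30 => [30, 30, 30, 30, 30],
  60 => [60],
  90 => [90]
}

groups.each do |airtime, values|
  puts "airtime #{airtime}: #{custom.call(values)}"
end

# The final row applies the same Proc across every airtime value.
puts "all rows: #{custom.call(groups.values.flatten)}"
```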

Wrapping Up

There is also cool cell and row styling, html rendering, etc, so check it out.