CarrierWave – limit file size (plus gif fix)

CarrierWave has an awesome abstraction API: it is simple, clear and extensible. But it has a critical vulnerability, especially when combined with image processing: when resizing an image, ImageMagick can consume an exponential amount of memory, so any upload can easily crash your process if it is not handled safely. It also doesn't handle .gif files well out of the box, because it turns the file into a collection of frames.

A friendly piece of advice beforehand: using http://filepicker.io/ may be a far better idea if you are hosting on Heroku; just make sure it fits your constraints before putting in the hard work.

Solution Spec

Hard-limit the file size of the request, so the process doesn't block for too long and doesn't blow up memory!

If you are behind a server such as Apache or Nginx, you can impose a limit on the request size, and you should!
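For example, on Nginx this is a single directive (the 5m value here is my own choice, matching the 5-megabyte limit the uploader enforces):

```nginx
# nginx.conf — requests with a body larger than 5 MB get a 413 response
http {
    client_max_body_size 5m;
}
```

Apache has an equivalent LimitRequestBody directive.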

If you are on Heroku, though, afaik there is no way to do that, at least not yet. So yes, this can be a major security hole for Rails apps on Heroku.
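One partial workaround on Heroku is to fail fast in a small Rack middleware that rejects requests whose declared body size is too big. This is only a sketch under my own naming, and it trusts the Content-Length header (a client can omit or lie about it), so it complements rather than replaces the uploader-side check:

```ruby
# Hypothetical Rack middleware: reject oversized requests before Rails sees them.
class MaxRequestSize
  LIMIT = 5 * 1024 * 1024 # 5 MB, same ceiling as the uploader's pre_limit

  def initialize(app, limit = LIMIT)
    @app = app
    @limit = limit
  end

  def call(env)
    if env['CONTENT_LENGTH'].to_i > @limit
      # 413 Request Entity Too Large, without ever reading the body
      [413, { 'Content-Type' => 'text/plain' }, ['Request body too large']]
    else
      @app.call(env)
    end
  end
end

# In config/application.rb you would insert it early in the stack, e.g.:
#   config.middleware.insert_before Rack::Runtime, MaxRequestSize
```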

Given a successful upload, pre-validate size.

The ‘official’ solution attempts to validate the size after the file has been processed. That doesn’t help: when processing a rather large image (a 6MB image consumed 2GB of memory in my case), your process will be killed, taking your website down for a while and letting your users down as well.

For gifs, take only the first frame (less memory consumption too)

When processing .gifs, it seems to build a vertical frameset with all the images in the sequence, so it looks like a movie roll, which is not what most people want. Let’s just extract the first frame.

Interestingly enough, I found that the processor is invoked for all frames in the .gif. (thanks debugger!)

Solution code

This code takes care of the specs above (except for the request size limit), and I think its great advantage is that it avoids opening a file as an image at all if it fails the size constraint, as well as being very efficient with gifs (only acting on the first frame).
It works on Heroku, with integration for S3, and should work on Amazon's cloud and other VPS hosts.

The shortcoming is exception handling, which is a bit messy and involves controller-side logic in a non-automated ActiveRecord fashion.

Controller

  def create
    begin
      @post = Post.new(params[:post])
    rescue Exception => e
      if e.message == 'too large'
        redirect_to news_path(err: 'file')
      else
        raise e
      end
    end
    # ...

Uploader

# encoding: utf-8


class NewsUploader < CarrierWave::Uploader::Base

  include CarrierWave::RMagick

  include Sprockets::Helpers::RailsHelper
  include Sprockets::Helpers::IsolatedHelper


  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  def pre_limit file
    #require 'debugger'; debugger
    if file && file.size > 5.megabytes
      raise Exception.new("too large")
    end
    true
  end

  def only_first_frame
    manipulate! do |img|

      if img.mime_type.match /gif/
        if img.scene == 0
          img = img.cur_image #Magick::ImageList.new( img.base_filename )[0]
        else
          img = nil # avoid concat all frames
        end
      end
      img
    end
  end

  version :large, if: :pre_limit do
    process :only_first_frame
    process :convert => 'jpg'
    process :resize_to_limit => [1280, 1024]
  end

  # Create different versions of your uploaded files:
  version :small, if: :pre_limit do
    process :only_first_frame
    process :convert => 'jpg'
    process :resize_to_limit => [360, 360]
  end


  # For images you might use something like this:
  def extension_white_list
    %w(jpg jpeg gif png)
  end

end

Ruby’s fear-cancer

This video: High Performance Ruby: Threading Versus Evented by Dr Nic Williams, Engine Yard

It meant so much to me!

For around 4 months I’ve been using Node.js, and before that I programmed Rails for 3 years.

As soon as I started on Node.js, I could feel that something was different. My little Computer Science bachelor’s conscience started telling me: now you are starting to do it right 🙂

But then, let’s go back: Ruby’s syntax is lovely, and the Rails API is sugar! But still, it’s neverland. Why?

It does not fit the real world! It is a thin layer of happiness, so delicate that we are afraid of touching it and breaking it. So what do we puny humans do? We delegate it to those genius heroes who may be able to take that code and squeeze out a couple more requests/sec!

But not even that is the cancer I mean.

The cancer here is represented by the clown hosting the video. “Oh, those scientific articles mess with our heads and stuff O.o Let me chew that evil reality for you and I will give you the RIGHT (sugar-coated) solution!” and other statements that are neither fun nor assertive. By the way, he may be referring to the C10K problem and similar articles.

The problem here is that he is a VP at Engine Yard and wants people hooked on his product. Engine Yard does not host Node.js.

Ruby has always had many language implementations to solve some important language problems, as well as servers. This is not the first time a good solution for Ruby has appeared (JRuby+Trinidad), but why would they share it before, if they could have people paying for so much RAM?

The reason I believe he felt compelled to share it now is that unoptimized Node.js code can outperform Rails by a LOT, in under 150MB of memory.

But my main problem is fear of complexity. The host talks about it all over the initial part of the video, keeping his audience under constant fear of its own ignorance. This is awful, and it is how I felt using Rails: such a large stack and such complex language design that even understanding the framework code itself was a hard task.

To keep people in blissful ignorance he just goes: “Oh, I promised evented? We are past that, right?! HAAAA” So just be happy and keep the status quo. You wouldn’t want to mess up your pretty little head with asynchronous code, would you?

I like Node.js. Besides performance, the language is simple: it is JavaScript, with open objects, closures and more. Of course, don’t assume well-written JS is easy to do; no great code in any modern language is easy to achieve.

If you’ve developed Rails, you juggle concepts like MRI, Thin, Mongrel, JRuby, 1.8.7, 1.9.2, Rubinius, Unicorn, and LOTS MORE, whilst in the Node.js world all of that stands for one thing: node.js. It is a unification point for language and server.

What about gems? npm, an easier system for distributing packages.

Wrapping it up: node.js is no silver bullet. It just made me realize the problems I had while programming Ruby, much as Ruby did for me with PHP. If nothing else, node.js helped improve Ruby’s community by adding options to the web development mainstream.

Would I work with Rails again? If an employer pointed it out as the only chosen solution, then yes, I’d probably try that stack; but I prefer confidence over fear.

When to Ruby on Rails, when to Node.js

(update) Take this post as a naive overview; it may not reflect reality accurately.

Hello!

I am trying to do a sort of indirect comparison between Rails and Node.js. The main reason it is indirect is that Rails is a framework, while Node.js is a runtime with custom libraries.

To put it in a single phrase: Rails is resourceful, and Node.js is light and fast.

Let’s elaborate some more.

Rails

It is the most complete open-source framework available (that I know of)! Big companies use it. It can do lots of things well, in an organized manner; meaning Rails is more than just MVC: it has a full stack of well-integrated features while remaining very modular. Some of the features included out of the box:

  • Database adapters for the majority of databases, with support for plugging in your own.
  • Database migrations, so multiple devs can sync and experiment with their DB.
  • Powerful engines for Views, Controllers and Models.
  • Support for code generators.
  • Structure for all sorts of tests; friendly to TDD.
  • Really awesome documentation.
  • The model layer has all kinds of hooks, validations and associations.
  • The controller layer can serve XML/JSON from the same action that serves HTML.
  • Gems that integrate, for instance, Memcached, MongoDB, auth solutions and lots more.
So Rails is war-proven, capable of integrating lots of features together without hassle. There is also a very cool post by Fabio Akita in the references about how it made it possible to develop systems on schedules that were previously impossible.

Node.js

Two things make this platform suitable for web:

First, its engine, V8, is very fast: by a very loose average, 8 times faster than Python (or even up to 200x at peak), and Python already outperforms Ruby (ref. at bottom).

Second, and this argument is separate from the one above: it is async-driven (built around the reactor pattern). Requests can be handled concurrently without any blocking I/O, so a single server can handle a lot. (update) And with the Cluster API in >0.6.0, it can scale to use all available CPU cores.

So it is a very new sort of backend platform, but huge players besides Joyent, who invented it, are adopting it, including LearnBoost and LinkedIn, which has an awesome article about using it. The platform, and its main web framework, Express, deserve a list of features (you can check more info in the references below).

  • Its web server is able to handle a HUGE number of connections out of the box
  • Various libraries can run in the browser, the same as on the server
  • Very friendly to WebSockets (real-time web apps)
  • Lots of libraries are being ported to it from other languages
  • Express, inspired by Ruby’s Sinatra, is very light on memory but also very powerful
Running a simple benchmark against a single server instance, I was able to get 587 req/s accessing MySQL without any external cache support. This number could scale further if I used Cluster to spawn at least one process per core.

Summarizing: when to use each?

Rails really shines when..

  • The database is complex in terms of associations.
  • The app structure is well defined.
  • Business rules are complex, and validation is needed.
  • When the number of requests isn’t a decisive factor.
  • Administrative interfaces.
  • Many developers in parallel keep the DB up-to-date with migrations
  • The database to be used is undefined, or may vary.
What about Node.js?
  • APIs
  • Real-time web/mobile apps.
  • Application that should scale to lots of concurrent requests.
  • Little memory footprint

That being said, there is no reason at all a web site or service can’t easily integrate both.

— I’d appreciate it if you could leave a comment, either to talk about your case or to add to this.

References


UPDATE: http://www.mikealrogers.com/posts/a-new-direction-for-web-applications-.html

http://guides.rubyonrails.org/

http://railscasts.com/

http://blog.heroku.com/archives/2011/6/22/the_new_heroku_2_node_js_new_http_routing_capabilities/

http://nodejs.org/

http://akitaonrails.com/2011/04/16/twitter-muda-de-ruby-para-java-ruby-e-3x-mais-lento-que-java

http://venturebeat.com/2011/08/16/linkedin-node/

http://blog.bossylobster.com/2011/08/lesson-v8-can-teach-python-and-other.html

https://github.com/LearnBoost

http://www.readwriteweb.com/hack/2011/01/how-3-companies-are-using-node.php

http://twitter.com/#!/FlockonUS/status/104655096956190720

https://github.com/LearnBoost/cluster

Importing data from MongoHQ, and sending to MongoLab (as Heroku plugin)

Up to the present date I have not found a complete guide about this, so I am assembling one here; please keep in mind that I am a MongoDB noob.

What I want to do:
– Get all the collections I have in MongoHQ and back them up locally
– Also, dump them into binary
– Take all the collections of documents locally and transmit them to MongoLab (say, because they give 200MB of free hosting *please see comments for more insight on this subject)

To do so, I first went to http://www.mongodb.org/display/DOCS/Import+Export+Tools, but it turns out to be easier with the db.copyDatabase command: http://www.mongodb.org/display/DOCS/Copy+Database+Commands

Step1, backup

In the folder associated with your Heroku app (where you commit and push) run:
heroku config

focus on the key:
MONGOHQ_URL => mongodb://heroku:some_pass@flame.mongohq.com:27021/app9000

You can have more info about MongoHQ here http://devcenter.heroku.com/articles/mongohq

So, after figuring out this string, you just open your local mongo console via: mongo

Now, say you plan to back up to some db name such as 'my_heroku_app_bckp'.
Run: db.copyDatabase( 'app9000', 'my_heroku_app_bckp', 'flame.mongohq.com:27021', 'heroku', 'some_pass' )

And you should get: { "ok" : 1 } ftw! If you get some #fail, refer to someone wiser than me, like www.StackOverflow.com :L

If all went right, you now have your database replicated on your PC and you can do whatever you want with it. For the extra paranoid, run: db.some_coll_you_know_has_stuff.count()

Step 2, dump

Lets dump all data locally, using:
mongodump -o mongo_bckp -d my_heroku_app_bckp

That should create a folder with the data inside, as .bson files.

Step 3, migrate

Let’s add MongoLab at Heroku; follow this guide: http://devcenter.heroku.com/articles/mongolab

All right? Again run: heroku config

Now focus on:
MONGOLAB_URI => mongodb://heroku_app9000:other_pass@dbh22.mongolab.com:27227/heroku_app9000

Let’s make a straight import to them, using:
mongorestore -h dbh22.mongolab.com:27227 -d heroku_app9000 -u heroku_app9000 -p "mongo_bckp/my_heroku_app_bckp"

Enter the huge password when prompted, and if everything ran smoothly you will see many imports and no #fail.

All that is left to do is switch the DB connection inside your app.
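As a sketch of that switch (the helper name is mine, and the fallback URI is just the shape shown above), you can parse the MONGOLAB_URI config var with Ruby's standard URI library instead of hard-coding credentials:

```ruby
require 'uri'

# Break a MongoDB connection string of the kind Heroku exposes
# into the pieces a driver configuration typically wants.
def mongo_settings(uri_string)
  uri = URI.parse(uri_string)
  {
    host:     uri.host,
    port:     uri.port,
    database: uri.path.sub('/', ''), # path comes back as "/dbname"
    username: uri.user,
    password: uri.password
  }
end

# Read the real value in production, with a made-up fallback for local play:
settings = mongo_settings(
  ENV['MONGOLAB_URI'] ||
  'mongodb://heroku_app9000:other_pass@dbh22.mongolab.com:27227/heroku_app9000'
)
```

The exact wiring from this hash into your driver depends on which client gem you use.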

Now, commit, cross your fingers and check your app. All good? 🙂

* If you get an empty DB, check the database name

End Notes

MongoLab and MongoHQ have different ways of counting and charging for data, which can make a huge difference. For instance, for the same 6,582 documents under 1 collection, I got 2.18MB in one, while in the other I got 6.91MB! (see comments)

Finally, by a very, very loose benchmark, their responses have pretty much the same speed, with MongoLab a tiny bit faster.

Very simple invisible JavaScript & Rails captcha

Hello!

Visual captchas are far from desirable on most public sites, but spam is even less desirable, mainly in contact forms. The solution I am implementing here is dead simple, but also weaker than reCAPTCHA.

snippet:

Put this in the application_controller.rb

  before_filter :form_spam_control

  private

  def form_spam_control
    if request.post? || request.put?
      unless params['agent_smith'] == 'Mr Anderson, welcome back'
        render :text => "Please make sure cookies and js are enabled"
        return false
      end
    end
  end

Put this in a JavaScript file that is executed on every public page, typically application.js (*requires jQuery to be loaded)

$(document).ready( function(){
  $('form').append( $('<input/>', {
    type: 'hidden',
    id: 'agent_smith',
    name: 'agent_smith',
    value: 'Mr Anderson, welcome back'
  }) )
})
//UPDATE! in order to support AJAX without extra params add:
$('body').ajaxSend(function(a, b, c){
  if( c.type == 'POST' || c.type == 'PUT' )
    c.data = c.data && c.data.length > 0 ? c.data + '&agent_smith=Mr+Anderson%2C+welcome+back' : 'agent_smith=Mr+Anderson%2C+welcome+back'
})

Discussion:
This is totally invisible and hassle-free for the user.
It relies on the principle that spam crawlers do not run JavaScript, which may not be true for all of them. It will also deny some crawlers that might be considered good, such as Mechanize.
The technique can easily be ported to other backend languages such as PHP, ASP, C# or Java, since it only requires a parameter filter on POSTs and PUTs.
If an attacker targets your website specifically, this will be easily broken.
If the user has JavaScript disabled, he can’t post, but this is a normal drawback of some captchas.
* the part of the error message mentioning ‘cookies’ is just a disguise =)

accepts_nested_attributes_for in 3.0.5: :reject_if still has gotchas

Hey

Nested forms are nasty, but many times, a necessary evil 🙂

Rails has tackled them since version 2.3.x, when the ActiveRecord method accepts_nested_attributes_for was added (main ref here). Because of its complexity, I believe it is still hard to master, and bugs may still happen; this was my lesson.

Ryan Bates (a Ruby hero) has made some screencasts on it: watch part 1, part 2. Soon after, he and some buddies made a gem for it, targeting adding and removing nested records via JavaScript without hassle. You can find it on GitHub, along with the necessary docs.

So, using the gem or not, there are still things to cover about this method’s params.

From what I could observe, :reject_if does not work for ‘rejecting’ an existing record, i.e. one that has an ID. But it is still called for those! And on this topic, what may be a bug: if the record gets rejected, it does not get updated (and does not go through its model’s validations either).

The solution I found was not to use :reject_if, and instead validate whatever I wanted on the nested model, in order to keep things DRY.

For this scenario, consider the following setup (rails v3.0.5)

class A < ActiveRecord::Base
  has_many :bs
  accepts_nested_attributes_for :bs,
                                :allow_destroy => true
                                # no use: :reject_if => :combo_elem_zero?

  # won't use this function at all
  def combo_elem_zero?( att )
    # puts(att.to_yaml)
    # att['_destroy'] = '1' # won't work here

    # only useful for new records
    if att['id'].blank? && att[:some_atrib].to_f < something_blah
      true
    else
      false
    end
  end
end

class B < ActiveRecord::Base
  belongs_to :a

  validate :destruction_update # works both for create/update

  def destruction_update
    if self.some_atrib.to_f < something_blah
      self.mark_for_destruction
    end
  end
end

If, on the other hand, you only need to check new nested records, :reject_if may do the job.

Leaving Google AppEngine for Rails and going Heroku

Hey all

Introduction (goal)

I’ve been planning my tag-based search engine for some time now. And while it is something relatively simple, query performance still has a great impact on the matter.

It is intended to be a project with roots in gamification, and I chose Ruby on Rails, not GAE’s Python or Java frameworks, because I do not want to be stuck on one hosting platform for any reason.

AppEngine, what’s great?

AppEngine has many cool things, including some that should make my life a lot simpler:

  • Generous elastic free quota for all the stuff I needed
  • BigTable, with stable performance (it is supposed to perform just as well with a thousand entries as with a million) [demo]
  • The DataStore has a List type, which is exactly what I need to build good queries
  • Should scale seamlessly, no pain
  • Has a great panel that allows easy data manipulation and great reports.
Besides, it is clear that Google is investing heavily in the product, which is a huge plus.

AppEngine, what is bad, for Rails?

After building my hello world on GAE, things looked awesome; a proof of concept right there! Rails 2.3.8 running smoothly.

But then it began: migrations, data backup, gems like OmniAuth, somewhat elaborate queries... where to look for all that? The Rubygems system is disabled, and no documentation is easily found, if any exists at all. So I would have to build conceptual proofs for all of those things?

As it happens, their landing page for Rails has been stale for years; a big turn-off! And it is still hard to find decent support for Rails (@woodie does help with docs, but can’t be accepted as the only source).

Recently they announced that they are going beta with the Go language, which sounds really great; still, in practice it means Ruby won’t be getting support (if ever) for a very long while.

So, Heroku and why

Being a Computer Scientist, I have been a big fan of Google’s capacity for execution and performance for years, but the downsides have been just too great to keep going. So let’s check out Heroku.

  • I had prior experience there
  • Heroku offers Rails out of the box, working with Omniauth.
  • They also have a free quota to try (1 dyno ~ 10-50req/sec)
  • 240MB from MongoLab
  • Nice features
  • Robust and elastically scalable (dynos charged per hour)

Final Considerations

The pro in this scenario is that they support Rails with all the gems I could need; no hassle there, things just work! And if I grow tired of them, I can just switch hosts 🙂

Why not a VPS? I’ve had a Linode for some time now, and they have a great management panel; but the concern about OS security and app scalability is considerable. Sure, it is cool to configure all my features from the ground up; but considering business, this kind of setup must be handled professionally.

Google AppEngine, or GAE, is a great platform for its supported languages. If I were to host there, I would definitely learn Go, for all its great intended features and the attention Google is giving it.