CarrierWave – limit file size (plus gif fix)

CarrierWave has an awesome abstraction API: it is simple, clear and extensible. But it has a critical weakness when combined with image processing. ImageMagick, for example, can consume memory far out of proportion to the file size when resizing an image, so any upload that is not processed safely can easily crash your process. It also does not handle .gif files well out of the box, because it treats the file as a collection of frames.

A friendly word of advice beforehand: using http://filepicker.io/ may be a far better idea if you are hosting on Heroku. Just make sure it fits your constraints before you put in the hard work.

Solution Spec

Hard-limit the size of the request, so the process doesn't block for too long and doesn't blow up memory!

If you are behind a server such as Apache or Nginx, you can impose a limit on the request size, and you should!

Unless you are on Heroku, where, as far as I know, there is no way to do that, at least not yet. So yes, this can be a major security hole for Rails apps on Heroku.

Given a successful upload, pre-validate the size.

The 'official' solution attempts to validate the size after the file has been processed. That doesn't help: when processing a rather large image (a 6 MB image consumed 2 GB of memory in my case), your process will be killed, taking your website down for some time and letting your users down as well.

For gifs, take only the first image (less memory consumption too)

When processing .gifs, it seems to build a vertical frameset with all the images in the sequence, so it looks like a film strip, which is not what most people want. Let's just extract the first frame.

Interestingly enough, I found that the processor is invoked for every frame in the .gif (thanks, debugger!).

Solution code

This code takes care of the specs mentioned above (except for the request size limit). I think its great advantage is that it avoids opening a file as an image if it fails the size constraint, and that it is very efficient with gifs (acting only on the first frame).
It works on Heroku, with S3 integration, and should work on Amazon's cloud and other VPS hosts.

The shortcoming is the exception handling, which is a bit messy: it involves controller-side logic instead of the usual automated ActiveRecord validation flow.

Controller

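  def create
    begin
      @post = Post.new(params[:post])
    rescue Exception => e
      # The uploader raises "too large" before any image processing starts.
      raise e unless e.message == 'too large'
      redirect_to news_path(err: 'file')
      return # stop here, otherwise the action carries on with @post unset
    end
    #...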

Uploader

# encoding: utf-8


class NewsUploader < CarrierWave::Uploader::Base

  include CarrierWave::RMagick

  include Sprockets::Helpers::RailsHelper
  include Sprockets::Helpers::IsolatedHelper


  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

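  # Used as the `if:` condition of every version below: it raises before any
  # image processing starts when the upload is bigger than 5 MB.
  def pre_limit(file)
    #require 'debugger'; debugger
    if file && file.size > 5.megabytes
      raise Exception.new("too large")
    end
    true
  end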

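  # Keeps only the first frame of a .gif; other formats pass through untouched.
  def only_first_frame
    manipulate! do |img|
      # The block is invoked once per frame, so img.scene tells the frames apart.
      if img.mime_type.match(/gif/)
        if img.scene == 0
          img = img.cur_image #Magick::ImageList.new( img.base_filename )[0]
        else
          img = nil # avoid concat all frames
        end
      end
      img
    end
  end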

  version :large, if: :pre_limit do
    process :only_first_frame
    process :convert => 'jpg'
    process :resize_to_limit => [1280, 1024]
  end

  # Create different versions of your uploaded files:
  version :small, if: :pre_limit do
    process :only_first_frame
    process :convert => 'jpg'
    process :resize_to_limit => [360, 360]
  end


  # For images you might use something like this:
  def extension_white_list
    %w(jpg jpeg gif png)
  end

end
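
One way to tidy up the shortcoming mentioned above (matching on the 'too large' message in the controller) could be a dedicated exception class. This is only a minimal sketch in plain Ruby: FileTooLargeError, MAX_UPLOAD_BYTES and check_size! are hypothetical names I made up for illustration, not part of CarrierWave, and the 5 MB limit simply mirrors pre_limit.

```ruby
require 'stringio'

# Hypothetical error class (not part of CarrierWave): carries the sizes so the
# controller can rescue by type instead of comparing message strings.
class FileTooLargeError < StandardError
  attr_reader :size, :limit

  def initialize(size, limit)
    @size = size
    @limit = limit
    super("file is #{size} bytes, limit is #{limit} bytes")
  end
end

MAX_UPLOAD_BYTES = 5 * 1024 * 1024 # assumed limit, same 5 MB as pre_limit

# Accepts anything that responds to #size (File, StringIO, an uploaded file).
# Returns true when the file is acceptable, raises otherwise.
def check_size!(file, limit = MAX_UPLOAD_BYTES)
  return true if file.nil?
  raise FileTooLargeError.new(file.size, limit) if file.size > limit
  true
end

# Usage with in-memory stand-ins for uploads.
small = StringIO.new('x' * 1024)
big   = StringIO.new('x' * (MAX_UPLOAD_BYTES + 1))

check_size!(small)
begin
  check_size!(big)
rescue FileTooLargeError => e
  puts "rejected: #{e.message}"
end
```

With something like this, the controller could rescue FileTooLargeError directly, and any unrelated exception would propagate without the manual re-raise.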

Ruby’s fear-cancer

This video: High Performance Ruby: Threading Versus Evented by Dr Nic Williams, Engine Yard

It meant so much to me!

For around 4 months I've been using Node.js, and before that I programmed Rails for 3 years.

As soon as I started on Node.js, I could feel that something was different. My little Computer Science bachelor conscience was starting to tell me: now you are starting to do it right 🙂

But then, let's step back: Ruby's syntax is lovely, and the Rails API is sugar! But still, it's Neverland. Why?

It does not fit the real world! It is a thin layer of happiness, so delicate that we are afraid of touching it and breaking it. So what do we puny humans do? We delegate it to those genius heroes who will be able to take that code and maybe squeeze out a few more requests/sec!

But not even that is the cancer I mean.

The cancer here is represented by the clown hosting the video: "Oh, those scientific articles mess with our heads and stuff O.o Let me chew up that evil reality and I will give you the RIGHT (sugar-coated) solution!", along with all his other statements that are neither fun nor assertive. By the way, he may be referring to the C10K problem and similar articles.

The problem here is that he is a VP at Engine Yard and wants people addicted to his junk. He does not host Node.js.

Ruby has always had many language implementations, as well as servers, to solve important language problems. This is not the first time a good solution for Ruby has been born (JRuby + Trinidad), but why would he have shared it before, if he could have people paying for so much RAM?

The reason I believe he felt compelled to share it now is that unoptimized Node.js code can outperform Rails by a LOT, in under 150 MB.

But my main problem is the fear of complexity. The host talks about it throughout the initial part of the video. He keeps his audience in constant fear of its own ignorance. This is awful, and that's how I felt using Rails: such a large stack and such a complex language design that merely understanding the framework code itself was a hard task.

To keep people in blissful ignorance he just goes: "Oh, I promised evented? We are past that, right?! HAAAA", so just be happy and keep the status quo. You wouldn't want to mess up your pretty little head with asynchronous code, would you?

I like Node.js. Besides the performance, the language is simple: it is JavaScript, with open objects, closures and more. Of course, don't assume that well-written JS is easy to produce. Great code is not easy to achieve in any modern language.

If you've developed with Rails, you have had to know about MRI, thin, mongrel, JRuby, 1.8.7, 1.9.2, Rubinius, unicorn and a LOT MORE, whilst in the Node.js world all of that comes down to one thing: Node.js. It is a unification point for language and server.

What about gems? There is npm, an easier system for distributing packages.

Wrapping up: Node.js is no silver bullet. It just made me realize the problems I had while programming Ruby, much as Ruby did for me with PHP. If nothing else, Node.js helped improve Ruby's community by adding options to the web development mainstream.

Would I work with Rails again? If an employer pointed to it as the only acceptable solution, then yes, and I'd probably try that stack, but I prefer confidence to fear.

When to Ruby on Rails, when to Node.js

(update) Take this post as a naive overview; it may not reflect reality accurately.

Hello!

I am trying to make a sort of indirect comparison between Rails and Node.js. The main reason it is indirect is that Rails is a framework, while Node.js is a runtime with its own libraries.

If I had to put it in a single phrase: Rails is resourceful, and Node.js is light and fast.

Let's elaborate some more...

Rails

It is the most complete open-source framework available (that I know of)! Big companies use it. It can do lots of things well, in an organized manner; that is, Rails is more than just MVC: it has a full stack of well-integrated features while still being very modular. Some of the features included out of the box:

  • Database adapters for the majority of databases, with support for plugging in your own.
  • Database migrations, so multiple developers can sync and experiment with their DB.
  • Powerful engines for Views, Controllers and Models.
  • Support for code generators.
  • Structure for all sorts of tests, and friendly to TDD.
  • Really awesome documentation.
  • Models have all kinds of hooks, validations and associations.
  • Controllers can handle XML/JSON in the same action that serves HTML.
  • Gems that integrate, for instance, Memcached, MongoDB, Auth and lots more.

So Rails is battle-tested, capable of integrating lots of features together without hassle. There is also a very cool post by Fabio Akita in the references about how it made it possible to develop systems in time frames that were impossible before.

Node.js

Two things make this platform suitable for the web:

Its engine, V8, is very fast! As a very loose average, it is 8 times faster than Python (or even up to 200 times at peak), and Python already outperforms Ruby (see the references at the bottom).

The second point, which is separate from the one above, is that it is async-driven (built around the reactor pattern). Requests can therefore be handled concurrently, without any blocking I/O, so a single server can handle a lot. (update) And with the Cluster API in versions >0.6.0, it can scale to use all of the available CPU cores.

So, it is a very new sort of backend platform, but huge players, besides Joyent, which sponsors its development, are adopting it, including LearnBoost and LinkedIn, which has an awesome article about using it. The platform and its main web framework, Express, deserve a list of features (you can find more info in the references below).

  • Its web server is able to handle a HUGE number of connections out of the box.
  • Various libraries can run in the browser, the same as on the server.
  • Very friendly to WebSockets (real-time web apps).
  • Lots of libraries are being ported to it from other languages.
  • Express, inspired by Ruby's Sinatra, is very light on memory but also very powerful.

Running a simple benchmark against a single server instance, I was able to get 587 req/s accessing MySQL without any external cache support. This number could scale if I used Cluster to spawn at least one process per processor.

Summarizing: when to use each?

Rails really shines when...

  • The database is complex in terms of associations.
  • The app structure is well defined.
  • Business rules are complex, and validation is needed.
  • The number of requests isn't a decisive factor.
  • Administrative interfaces are needed.
  • Many developers work in parallel and keep the DB up to date with migrations.
  • The database to be used is undefined, or may vary.

What about Node.js?

  • APIs
  • Real-time web/mobile apps.
  • Applications that should scale to lots of concurrent requests.
  • A small memory footprint.

That being said, there is no reason at all why a website or service can't easily integrate both.

I'd appreciate it if you could leave a comment, either to talk about your case or to add something.

References

 

UPDATE: http://www.mikealrogers.com/posts/a-new-direction-for-web-applications-.html

http://guides.rubyonrails.org/

http://railscasts.com/

http://blog.heroku.com/archives/2011/6/22/the_new_heroku_2_node_js_new_http_routing_capabilities/

http://nodejs.org/

http://akitaonrails.com/2011/04/16/twitter-muda-de-ruby-para-java-ruby-e-3x-mais-lento-que-java

http://venturebeat.com/2011/08/16/linkedin-node/

http://blog.bossylobster.com/2011/08/lesson-v8-can-teach-python-and-other.html

https://github.com/LearnBoost

http://www.readwriteweb.com/hack/2011/01/how-3-companies-are-using-node.php

http://twitter.com/#!/FlockonUS/status/104655096956190720

https://github.com/LearnBoost/cluster

Leaving Google AppEngine for Rails and going Heroku

Hey all

Introduction (goal)

I've been designing my tag-based search engine for some time now. And while it is something relatively simple, query performance still has a great impact on it.

It should be a project with roots in gamification, and I chose Ruby on Rails rather than GAE's Python or Java frameworks because I do not want to be stuck on one hosting platform for any reason whatsoever.

AppEngine, what’s great?

AppEngine has many cool things, including some that should make my life a lot simpler:

  • Generous, elastic free quota for everything I needed
  • BigTable, with stable performance (it is supposed to perform just as well with a thousand entries as with a million) [demo]
  • The DataStore has a List type, which is exactly what I need to write good queries
  • Should scale seamlessly, no pain
  • Has a great panel that allows easy data manipulation, and great reports.

Besides, it is clear that Google is investing heavily in that product, which is a huge plus.

AppEngine, what is bad, for Rails?

After building my hello world on GAE, things looked awesome: a proof of concept right there! Rails 2.3.8 running smoothly.

But then it began... migrations, data backup, gems like omniauth, somewhat elaborate queries... where to look for all that stuff? The RubyGems system is disabled, and no documentation is easy to find, if any is around at all. So I have to go and build proofs of concept for all those things?

As it happens, their landing page for Rails has been stale for years; a big turn-off! And it is still hard to find decent support for Rails (@woodie does help with the docs, but can't be accepted as the only source).

Recently they announced that they are going beta with the Go language, which sounds really great, but in practice it means Ruby won't be getting support for a very long while (if ever).

So, Heroku and why

As a computer scientist, I have been a big fan of Google's capacity for execution and performance for years, but the downsides have been just too great to keep going. So let's check out Heroku.

  • I had prior experience there
  • Heroku offers Rails out of the box, working with Omniauth.
  • They also have a free quota to try (1 dyno ~ 10-50 req/sec)
  • 240MB from MongoLab
  • Nice features
  • Robust and elastically scalable (dynos are charged per hour)

Final Considerations

The pro in this scenario is that they take Rails with all the gems I could need; no hassle there, things just work! And if I grow tired of them, I can just switch hosts 🙂

Why not a VPS? I've had a Linode for some time now, and they have a great management panel, but the concern about OS security and app scalability is considerable. Sure, it is cool to configure all my features from the ground up, but from a business standpoint, this kind of setup must be handled professionally.

Google AppEngine, or GAE, is a great platform for its supported languages. If I were to host there, I would definitely learn Go, for all its great intended features and the attention Google is giving it.

Migrating from SQLite to MySQL on Rails 3

Common sense among developers says: don't use SQLite in production, because its performance sucks.

Well, for a system with around 20 users that should be no problem at all in the beginning, right? Right, but also wrong!

I tested the system with the same settings, but for some reason, after a while, Rails' reads from the database became very inconsistent. While in the console I would find objects recently created by Rufus Scheduler, the web app would not see them until the server was restarted! Weird results, but a price to pay for not listening to common sense =)

The solution would be to migrate the schema and data to some easy-to-use MySQL. The schema, as is widely known, is extremely easy to migrate with the Rails framework (if used properly).

Searching for ways to migrate data between different databases can be a pain, but given a great Rails tool it is rather simple! Introducing the yaml_db gem, made by the Heroku people.

As the commands are as simple as :dump and :load, I don't see why I should go any further in the explanation.
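
For anyone curious about the idea behind a dump/load round trip, here is a minimal sketch in plain Ruby. It is not yaml_db's implementation: the two hashes are hypothetical stand-ins for the SQLite source and the MySQL target, and dump_tables and load_tables are names I made up for illustration. The point is only that the data travels through a database-neutral YAML document.

```ruby
require 'yaml'

# Serializes every table (name => array of row hashes) to one YAML document.
def dump_tables(database)
  YAML.dump(database)
end

# Reads the YAML document back and copies each table into the target.
def load_tables(target, yaml_text)
  YAML.load(yaml_text).each do |table, rows|
    target[table] = rows
  end
  target
end

# Hypothetical source data standing in for the SQLite database.
sqlite = {
  'users' => [{ 'id' => 1, 'name' => 'ana' }, { 'id' => 2, 'name' => 'bob' }],
  'posts' => [{ 'id' => 1, 'user_id' => 2, 'title' => 'hello' }]
}
mysql = {} # stand-in for the empty MySQL database

load_tables(mysql, dump_tables(sqlite))
puts "copied #{mysql['users'].length} users and #{mysql['posts'].length} posts"
```

Because the intermediate file knows nothing about either database, the same dump can be loaded into any adapter Rails supports, which is what makes this kind of migration painless.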

The whole process took less than 2 hours, including a small bug fix in some non-database-agnostic code =)

At the service of BananaBrains: FIRE, the social soccer RPG game.

Web development: why learn Rails?

[THIS IS A MASHUP POST]

That means I will focus on relaying good arguments from references I respect.

Arguments: Why do many startups use Rails?

But what about performance? Twitter switches from Ruby to Java. Ruby is 3x slower than Java.

Corroborating what really matters, the process: You Are Solving The Wrong Problem

I have been developing in Ruby, though not exclusively, for more than 2 years, and I still have a lot to learn. The console tools are extremely useful; I recommend making heavy use of them, whether Ruby is your first language or you are migrating from others such as Java or C++.

Follow cool people on Twitter to stay informed 🙂

Bulky data input through Google Docs (spreadsheet)

Need to read data from a Google Docs spreadsheet in Rails 3? https://github.com/gimite/google-spreadsheet-ruby makes it very easy!

Since the gem's documentation is very well done, I won't bother explaining it, but I will give a usage suggestion.

We had a big need, together with a client, to import a massive amount of data, so he suggested importing from a spreadsheet. The result was this template:

Since my goal here was quite specific, I will only give a general idea of the template:

  • The blue cells are used by the user to enter data
  • The gray cells are used exclusively by the system for feedback

The system is still very raw, but it could be a good idea to make a gem out of it. I will attach the Model and the Controller I used to do the 'bulky' data import.

Model GIST

Controller GIST

The idea in the code is that each row can be judged with 3 distinct results:

  • CADASTRADA (registered): considered already persisted and ignored in future runs
  • INVÁLIDA (invalid): because it is already registered, given a search condition
  • IGNORADA (ignored): in the case of an empty (invisible) row
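
The three outcomes above can be sketched in plain Ruby roughly as follows. This is not the code from the gists: the row layout [name, email, status], the use of the email as the search condition, and the judge_row helper are all assumptions made for illustration.

```ruby
# Status values written back to the system-only cells of the spreadsheet.
REGISTERED = 'CADASTRADA'
INVALID    = 'INVÁLIDA'
IGNORED    = 'IGNORADA'

# Judges one row. `existing_emails` is a stand-in for the database lookup.
def judge_row(row, existing_emails)
  name, email, status = row
  # Empty (invisible) rows are skipped without touching anything.
  return IGNORED if [name, email].all? { |cell| cell.to_s.strip.empty? }
  # Rows already marked by an earlier import are left alone.
  return REGISTERED if status == REGISTERED
  # The search condition finds a matching record: flag the row.
  return INVALID if existing_emails.include?(email)
  existing_emails << email # stand-in for saving the new record
  REGISTERED
end

rows = [
  ['Ana',       'ana@example.com', nil],
  ['',          '',                nil],
  ['Bob',       'bob@example.com', 'CADASTRADA'],
  ['Ana again', 'ana@example.com', nil]
]
existing = []
results = rows.map { |row| judge_row(row, existing) }
results.each_with_index { |status, i| puts "row #{i + 1}: #{status}" }
```

Writing each status back to the row is what lets a second run over the same spreadsheet skip everything that was already imported.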

What I find coolest about this system is its capacity for two-way interaction: the user supplies a massive amount of data and the system responds with possible problems.

Suggestion: returning registro.errors in the error column is quite interesting for cases where elaborate validation takes place and the user is capable of understanding a somewhat "confusing" message 🙂

Cheers!