Abstractions (Score:4, Informative)
I worked on a project around 2007 that used Ruby on Rails. That was my first experience with Ruby and my first experience with a real web product. I liked Ruby and Rails, but it was easy to get bitten by some of the abstractions.

I remember the site bogged down really badly whenever we searched for a record in a large database table. The problem was that the database was hidden behind ActiveRecord, so it was easy to forget we were using a database at all. Writing a for loop to search for a record that matched some criteria felt natural, because our interface was with objects, not the underlying tables. However, behind the scenes, each iteration was a separate query. The result was thousands and thousands of queries instead of a single query with a simple WHERE clause. We were essentially doing in Ruby what we could have done much more efficiently in SQL.

Once we realized the problem, we rewrote that kind of code to use more or less raw SQL. The result was much faster, but we lost the readability of the abstraction. Everyone on the team was new to Ruby and Rails (grad students who shuffled in and out each semester), so it's possible that we were just doing things completely wrong. Still, it feels like it shouldn't have been that easy to shoot ourselves in the foot.

Have things improved since then? How do you balance nice abstractions like ActiveRecord with performance? And how do you make it clear to novices what's going on internally, so they can avoid the mistakes we made?
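The failure mode described above — filtering in application code instead of in the database — can be made concrete with a toy sketch in plain Ruby (not real ActiveRecord; all names here are invented) that counts round trips:

```ruby
# Toy "remote table" that counts round trips, to make the hidden cost visible.
class ToyTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # One round trip per call: what a naive per-record lookup does.
  def find(id)
    @query_count += 1
    @rows.find { |r| r[:id] == id }
  end

  # One round trip total, with the predicate pushed to the "server"
  # (the moral equivalent of a single SELECT ... WHERE).
  def where(&predicate)
    @query_count += 1
    @rows.select(&predicate)
  end
end

rows = (1..1000).map { |i| { id: i, name: "user#{i}" } }

naive = ToyTable.new(rows)
(1..1000).each { |i| naive.find(i) }             # loop in application code
# naive.query_count is now 1000

pushed = ToyTable.new(rows)
matches = pushed.where { |r| r[:name].end_with?("0") }
# pushed.query_count is now 1
```

The counts are the whole point: the loop issues one query per record, while pushing the predicate down issues one query total — exactly the thousands-versus-one difference described in the submission.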
Re: (Score:1)
Aren't abstractions supposed to hide the need for specialized knowledge? One should only have to ask it to "save", and the details of how to talk to the database should then be hidden away from the abstraction user.
Re: (Score:2)
Aren't abstractions supposed to hide the need for specialized knowledge?
Not all abstractions are correct. You cannot abstract away the reality of remote resources. And your abstractions should be based on your domain model, not on treating remote resources as if they were a local array of things. I suggest you google "Fallacies of Distributed Computing".
One should only have to ask it to "save",
This level of abstraction is correct (but only if the abstraction has been developed properly).
and the details of how to talk to the database should then be hidden away from the abstraction user.
Define "talk". If the abstraction lets you iterate a set of remote resources as if they were a local array, without regard to bandwidth or latency, then it is hiding exactly the details that will bite you.
Re: (Score:1)
Do you have an example or scenario?
The internal guts of an access coordinator could "watch" the timing of requests to know to batch operations together, for example.
Also, one could tell the abstraction how important "recency" is, such as whether it's okay to delay actual writes or not. Perhaps a delay threshold in seconds can be given.
In other words, the abstraction can "ask" one to rank the relative importance of performance-related factors so as to select the best internal or vendor-specific implementation choices to fit the trade-off profile given to it.
Re: (Score:2)
Do you have an example or scenario?
The internal guts of an access coordinator could "watch" the timing of requests to know to batch operations together, for example.
Also, one could tell the abstraction how important "recency" is, such as whether it's okay to delay actual writes or not. Perhaps a delay threshold in seconds can be given.
In other words, the abstraction can "ask" one to rank the relative importance of performance-related factors so as to select the best internal or vendor-specific implementation choices to fit the trade-off profile given to it.
That may be difficult or impractical to fully implement, but not necessarily impossible in terms of abstracting away details.
Sometimes a DBA will ask me whether to optimize a database or table for quick writing or quick querying, or some in-between balance. I usually don't really have to know how the DBA does that; I only give my or my boss's preference.
Perhaps the existence of performance trade-offs cannot be hidden away, but the implementation of those choices largely can.
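The batching and delay-threshold ideas above can be sketched as a toy coordinator in Ruby (hypothetical names, not a real library) that buffers writes and flushes them as one batch once either a size limit or a staleness threshold in seconds is reached:

```ruby
# Sketch of an "access coordinator" that hides batching behind save().
# The clock is injectable so the behavior can be exercised without waiting.
class WriteCoordinator
  attr_reader :flushes

  def initialize(max_batch: 10, max_delay: 5.0, clock: -> { Time.now.to_f })
    @max_batch = max_batch   # flush when this many writes are buffered
    @max_delay = max_delay   # ...or when the oldest write is this stale (seconds)
    @clock = clock
    @buffer = []
    @oldest = nil
    @flushes = []
  end

  # The caller's whole interface is "save"; batching stays hidden.
  def save(record)
    @oldest ||= @clock.call
    @buffer << record
    flush if @buffer.size >= @max_batch || @clock.call - @oldest >= @max_delay
  end

  # One batched round trip instead of one per record.
  def flush
    return if @buffer.empty?
    @flushes << @buffer
    @buffer = []
    @oldest = nil
  end
end
```

A caller who only ever says "save" never sees how the flushing is done, yet still gets to state a trade-off preference via `max_batch` and `max_delay` — which is the "rank the performance factors" idea in miniature.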
You are providing counter-examples that are very constrained by nature. I'm referring to general cases of remote resources.
In particular we are referring to the case of executing a predicate on a sequence of remote resources (specifically a sequence of database rows matching some criteria.)
I didn't think I had to make it clearer, but I guess I have to. Very specifically, I was referring to general abstractions. Why? Because executing a predicate on a sequence of remote resources is, typically, very specific.
Re: (Score:1)
I'm still asking for a specific example/scenario/use-case to illustrate your point. You are talking in generalities. Take an example from a university grade tracking system or airline reservation system or something from everyday life to construct a specific scenario. Make a quick and dirty schema for your example, and walk through the scenario steps.
Re: (Score:2)
As I said, this was a long time ago, so maybe I'm not remembering the exact performance problem we hit. My larger point still stands: we were burned by an abstraction, possibly made worse by our own lack of expertise. In our defense, we weren't there to build a website. Our research was in a totally unrelated area, not even really within the realm of computer science, and none of us had expertise building websites that scaled. The website was just the thing we used to collect data and do experiments.
Re: (Score:2)
I think this is where optimization really kicks in.
AR is great for getting stuff out the door quickly, and for most products it's fine (after all, it's cheaper to add hardware than it is to add developers). However, there are some problems where "throw hardware at it" won't solve the fundamental issue. That's the point where you look at the profiler, realise AR is killing performance, and drop back to SQL.
Re: (Score:2)
I agree that the framework makes it much easier to deal with the database; as a result, one may never learn how to optimize it. If I remember correctly, they put in some ways to optimize the generated SQL (such as :include, :select, etc.) starting in either version 1.2 or a little later (can't remember exactly). What those do is improve the SQL so it makes 1 call instead of 100 calls for 100 records. However, it is extremely difficult to be as efficient as hand-written SQL, so you would have to decide
Fallacies of Distributed Computing (Score:1)
The problem was that the database was hidden behind ActiveRecord, so it was easy to forget we were using a database at all. Writing a for loop to search for a record that matched some criteria felt natural, because our interface was with objects, not the underlying tables.
Object-Relational Impedance Mismatch [wikipedia.org]
Law of Leaky Abstractions [joelonsoftware.com]
This problem is not unique to Ruby on Rails' usage of the ActiveRecord pattern. People blindly using Hibernate and other ORMs run into the same thing, and it is, in general, what happens when people fall for one or more of the fallacies of distributed computing ("latency is zero" and "bandwidth is infinite").
The ActiveRecord pattern provides a theoretically pleasing abstraction that is OK when accessing one row out of a relation. Trying to run a predicate over a whole relation through it, row by row, is exactly where it falls apart.
Re: (Score:2)
Eager loading might have helped.
Using something other than ruby on rails would have definitely helped.
Using the fad of the day will almost always bite you in the arse.
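The eager-loading point can be illustrated with another plain-Ruby toy (invented names, not the real Rails API): fetching each post's comments lazily issues one query per post, while an eager batch fetch issues a single query for all of them.

```ruby
# Toy data standing in for two database tables.
POSTS    = (1..100).map { |i| { id: i } }
COMMENTS = (1..100).map { |i| { post_id: i, body: "comment #{i}" } }

$queries = 0

# Lazy: one query per post -- the classic N+1 pattern.
def comments_for(post_id)
  $queries += 1
  COMMENTS.select { |c| c[:post_id] == post_id }
end

# Eager: one query fetches the comments for every post at once,
# roughly what eager loading does under the hood.
def comments_for_all(post_ids)
  $queries += 1
  COMMENTS.select { |c| post_ids.include?(c[:post_id]) }
          .group_by { |c| c[:post_id] }
end

POSTS.each { |p| comments_for(p[:id]) }
lazy_queries = $queries            # 100 queries for 100 posts

$queries = 0
grouped = comments_for_all(POSTS.map { |p| p[:id] })
eager_queries = $queries           # 1 query total
```

Same data retrieved either way; the only difference is whether the abstraction makes the round-trip count visible before you find it in the profiler.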