At my real job I'm working on an application whose architecture is client, server, relational database. Right now our client talks to our server by calling functions like "UpdateThing(int attr1, string attr2, int attr3...)". The function creates a message which goes to the server. The server translates the message into what you can think of as a stored proc, that has as its arguments nearly the same ones as the original function that was called way back at the client. You can think of these client-side UpdateThing functions as our current API.
We are now considering opening up the API to our customers so that they can write their own apps that talk to our server and update the data. Some of us think that we shouldn't expose our rather raw, prosaic, stored-proc-like API. Instead, some of us say, we should develop another layer, object oriented, with domain classes, and have our customers interact only with those domain classes.
Does this debate sound familiar? Yep, it's another manifestation of the old object/relational impedance mismatch.
That's something I first ran into in the early 90's, when I was the architect of an application that fully embraced the ORM philosophy. I was following the fashion, which was to solve most every problem using an Object Oriented approach. I don't remember now if I even knew the term "ORM" at the time, but nevertheless, what I created was an ORM layer. When we first got all the pieces of the application up and running and started exercising them in a realistic way, the performance of my ORM layer was...terrible. But, after I got done rigging it up with intelligent caching and adding a scheme whereby the client layer could give optimization hints to the ORM layer, the performance was.... still terrible. Complementing the app's terrible performance was the fact that it was cumbersome to develop and hard to debug. The only thing this app was good at was serving as a bullet point on my resume.
In the same way that the 70's were a bad decade for hair and pant cuffs, the early 90's were a bad time for programming, because it was when so many programmers were moving from C to C++, encountering OO for the first time. Just like you really want to keep those 70's snapshots buried deep in the drawer, you don't want to see the code we wrote in the early 90's. For many of us our first experiences with OO were like freshman year spring break in Cancun (anachronism, because I don't think Cancun existed in the early 90's...). I mean, there was excess. With a capital X for Crap. I remember one of my co-workers at the time who was also just moving from C to C++. He was way smarter than me, a Princeton engineering graduate, so his potential for deranged OO was much greater than mine. He created some helper classes for the very challenging computer science problem of....opening a folder and getting a list of filenames in it. The Object Oriented way. His solution was.... abstract. I know for sure he used the word "Factory" in his class hierarchy (oh, boy was there ever a hierarchy!) because that got you bonus points. When he was done, back on earth, back to lucidity and sobriety, he confessed that his classes were maybe just a tad too abstract in relation to the problem, given that you could use them not just to solve the "problem" of getting a list of filenames, but also for determining when to put your 22-pound turkey in the oven if you want to time the serving of Thanksgiving dinner with the Lions' annual football loss.
My most recent experience with an ORM layer was on a freelance project which used Subsonic. To add just one query against existing tables I had to add two new files to the project and change four others. Why? Why do women inflict on themselves the torture of wearing pantyhose and high heels? Why do software developers inflict on themselves the torture of an ORM layer? The answer: Lipstick. No, wait, wrong joke. (And who will get that joke a couple of years from now?). The answer: Fashion. And crowd behavior.
So, at work Monday I'm expecting we'll be discussing the ORM question, and so this weekend I've been doing research. Nah, that's not true. "Research" would imply that my mind is open. I'm not doing research; I'm stockpiling ammunition. Below are some snippets from web pages that I'll be using to support my point of view. If you are interested in this topic, I encourage you to follow the links to the original sources and read the entire articles, especially if you DISAGREE with them, so that you can follow how the author has developed his argument rather than base your reaction on my taken-out-of-context snippets.
From "The Vietnam of Computer Science"
by Ted Neward, June 2006
...So what were the principal failures in Vietnam? And, more importantly, what does all this have to do with O/R Mapping?
In the case of Vietnam, the United States political and military apparatus was faced with a deadly form of the Law of Diminishing Returns. In the case of automated Object/Relational Mapping, it's the same concern--that early successes yield a commitment to use O/R-M in places where success becomes more elusive, and over time, isn't a success at all due to the overhead of time and energy required to support it through all possible use-cases...
One of the key lessons of Vietnam was the danger of what's colloquially called "the Slippery Slope": that a given course of action might yield some early success, yet further investment into that action yields decreasingly commensurate results and increasingly dangerous obstacles whose only solution appears to be greater and greater commitment of resources and/or action...Others call this "the Last Mile Problem": that as one nears the end of a problem, it becomes increasingly difficult in cost terms (both monetary and abstract) to find a 100% complete solution. All are basically speaking of the same thing--the difficulty of finding an answer that allows our hero to "finish off" the problem in question, completely and satisfactorily...
The Partial-Object Problem and the Load-Time Paradox
It has long been known that network traversal, such as that done when making a traditional SQL request, takes a significant amount of time to process. (Rough benchmarks have placed this value at anywhere from three to five orders of magnitude, compared against a simple method call on either the Java or .NET platform; roughly analogous, if it takes you twenty minutes to drive to work in the morning, and we call that the time required to execute a local method call, four orders of magnitude to that is roughly the time it takes to travel to Pluto, or just shy of fourteen years, one way.) This cost is clearly non-trivial, so as a result, developers look for ways to minimize this cost by optimizing the number of round trips and data retrieved.
In SQL, this optimization is achieved by carefully structuring the SQL request, making sure to retrieve only the columns and/or tables desired, rather than entire tables or sets of tables. For example, when constructing a traditional drill-down user interface, the developer presents a summary display of all the records from which the user can select one, and once selected, the developer then displays the complete set of data for that particular record. Given that we wish to do a drill-down of the Persons relational type described earlier, for example, the two queries to do so would be, in order (assuming the first one is selected):
SELECT id, first_name, last_name FROM person;
SELECT * FROM person WHERE id = 1;
In particular, take notice that only the data desired at each stage of the process is retrieved--in the first query, the necessary summary information and identifier (for the subsequent query, in case first and last name wouldn't be sufficient to identify the person directly), and in the second, the remainder of the data to display...This notion of being able to return a part of a table (though still in relational form, which is important for reasons of closure, described above) is fundamental to the ability to optimize these queries this way--most queries will, in fact, only require a portion of the complete relation...
The problem here is that the data to be displayed in the first Display...() call is not the complete Person, but a subset of that data; here we face our first problem, in that an object-oriented system like C# or Java cannot return just "parts" of an object--an object is an object, and if the Person object consists of 12 fields, then all 12 fields will be present in every Person returned. This means that the system faces one of three uncomfortable choices: one, require that Person objects must be able to accommodate "nullable" fields, regardless of the domain restrictions against that; two, return the Person completely filled out with all the data comprising a Person object; or three, provide some kind of on-demand load that will obtain those fields if and when the developer accesses those fields, even indirectly, perhaps through a method call....
Unfortunately, fields within the object are only part of the problem--the other problem we face is that objects are frequently associated with other objects, in various cardinalities (one-to-one, one-to-many, many-to-one, many-to-many), and an O/R mapping has to make some up-front decisions about when to retrieve these associated objects, and despite the best efforts of the O/R-M's developers, there will always be common use-cases where the decision made will be exactly the wrong thing to do....
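Neward's "choice three" above (on-demand loading) is easy to sketch. Here is a minimal, hypothetical illustration of lazy field loading against SQLite; the table, column names, and LazyPerson class are invented for this example and are not any real ORM's API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, "
             "last_name TEXT, biography TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 'Lovelace', 'A very long text...')")

class LazyPerson:
    """Loads only id/first/last up front; fetches 'biography' on first access."""
    def __init__(self, conn, row_id):
        self._conn = conn
        self.id, self.first_name, self.last_name = conn.execute(
            "SELECT id, first_name, last_name FROM person WHERE id = ?",
            (row_id,)).fetchone()
        self._biography = None   # not loaded yet

    @property
    def biography(self):
        if self._biography is None:   # the hidden extra round trip happens here
            self._biography = self._conn.execute(
                "SELECT biography FROM person WHERE id = ?",
                (self.id,)).fetchone()[0]
        return self._biography

p = LazyPerson(conn, 1)
print(p.first_name)   # cheap: already loaded
print(p.biography)    # triggers a second query behind the developer's back
```

Note that the second query fires on an innocent-looking attribute access, which is exactly the "if and when the developer accesses those fields, even indirectly" trap Neward describes.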
From "Why OOP Fails Domain Modeling"
by "topmind", August 2008
The problem is that OOP has not figured out how to work smoothly with RDBMS; it is a fitful marriage as one might find in Hollywood couples. For one, both practically and philosophically, RDBMS do not focus on the object level, or even "entity" level. (Entities are a somewhat close match to objects or classes).
For example, a "parts" table may have 40 columns. However, for a summary of all parts worth in the warehouse grouped by vendor, only the Vendor ID and Price would be needed out of these 40 columns. (Things like vendor name may come from other tables.)
For network communication efficiency, the RDBMS would process and sum only using these two columns and send the result to the application to display on a screen or report. The other 38 columns are not in the picture. The total volume of data is only about 5% of the total available for the entity.
(And it's even less if we are merely summing by vendor. If the average vendor has about 7 parts, then there is only one summary record for every seven parts, and the total data sent over the network may be less than 1 percent than if all the part objects or records were sent to the application to do the summing.)
But in OOP you generally don't sub-divide objects. It's not called "sub-object oriented programming" after all. It's based around whole objects. However, if they are dealt with as a whole object, then you have to marshal around all 40 columns or attributes to process them even if you don't use the entire object at a time. It's like carrying the entire carton of milk to the TV room even though you only want a mug's worth.
One may have a method called "sum_by_vendor" instead, which talks to the RDBMS and delivers the sums. However the RDBMS is still doing most of the work because it would be inefficient to transfer the entire 40 columns to an object processor. The method is merely a function-like wrapper around a query, so this is not really OOP.
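The trade-off topmind is describing can be shown in a few lines. This is a sketch only; the parts table and its columns are invented, and SQLite stands in for the RDBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_id INTEGER, vendor_id INTEGER, price REAL)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [(1, 10, 5.0), (2, 10, 7.5), (3, 20, 2.0)])

# The RDBMS does the work: one small summary row per vendor comes back,
# two columns wide, not forty.
sums = dict(conn.execute(
    "SELECT vendor_id, SUM(price) FROM parts GROUP BY vendor_id"))
print(sums)  # {10: 12.5, 20: 2.0}

# The whole-object alternative: materialize every part row with every
# column, then sum in application memory.
all_parts = conn.execute("SELECT * FROM parts").fetchall()
in_memory_sum = {}
for part in all_parts:
    in_memory_sum[part[1]] = in_memory_sum.get(part[1], 0) + part[2]
```

Both paths produce the same totals, but the first ships one row per vendor over the wire while the second ships one row per part, with all columns attached.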
RDBMS are designed to answer "queries". The answers can take information from the total available data, and the subset used in the answer has no obligation to reflect the entire "object" or objects. It provides only the answer (data) requested of it. It's no coincidence that Oracle named their company after a professional question answerer from ancient Greece.
Further, "joins" can combine multiple entities and produce a result that is "blind" to the fact that the original info came from two or more entities (tables). Framing everything in terms of the unit "object" is foreign to it.
Now it is possible to translate RDBMS results into "objects" per se, but it tends to create unnatural artifacts. For example, if a given task only needs say 6 attributes out of an available 40 columns, it could be wasteful bandwidth-wise to fill all 40 from the database. So one trick is to leave the non-used attributes blank. However, we then have a "dirty" object. We risk accidentally using it for another purpose that needs one of the attributes we left blank.
Another problem is staleness. Over time the object may grow out of synch with the database as changes to the database are not passed on to the object copy in the application's memory. Other users may be modifying the data at the same time you are; which means you may be processing old data and not know it.
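The staleness problem is easy to reproduce. A minimal sketch, with an invented customer table and a plain dict standing in for the "object copy in the application's memory":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO customer VALUES (1, 100.0)")

# The application loads the row into an in-memory copy...
cust = dict(zip(("id", "balance"),
                conn.execute("SELECT id, balance FROM customer WHERE id = 1")
                    .fetchone()))

# ...meanwhile another user updates the same row...
conn.execute("UPDATE customer SET balance = 50.0 WHERE id = 1")

# ...and our copy is now silently stale.
fresh = conn.execute("SELECT balance FROM customer WHERE id = 1").fetchone()[0]
print(cust["balance"], fresh)  # 100.0 vs 50.0 -- we'd process old data and not know it
```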
In procedural programming used with RDBMS, usually one only uses the query results for the immediate task at hand and then discards it when done with that task. The RDBMS is assumed the official "keeper of the state", and not memory objects that are an attempt to model actual objects from the real world. RDBMS are well-tuned and suited for a get-only-what-you-need-for-this-task viewpoint. It is a kind of hit-and-run style of processing. It's the question-and-answer system at work. This directly conflicts with the OOP philosophy of having little machines hanging around that reflect real-world nouns (sometimes called "state machines").
There are tools, known as Object-Relational-Mappers (ORM) that attempt to automate around these issues so that objects appear to be whole and updated to the application. However, they are awkward and require a lot of skill to understand, tune performance for, get database synching right, and to troubleshoot. It thus results in a paradigm translation tax, and a big tax at that.
Very few who've worked with ORM's say they are wonderful things (except maybe those who've invested a career in them and are paid well). At best they view them as a necessary evil because they really want to stick to an object view of the world. Some see ORM's as a temporary hold-over until the day OODBMS replace RDBMS, which will probably happen after cancer is cured and there is world peace. ORM's are growing more complex than the application language itself in many cases. It's a lot of effort to hide from the question-and-answer oracle.
From "Inappropriate Abstractions, A Conversation with Anders Hejlsberg" [the architect of the C# language!]
by Bill Venners with Bruce Eckel, December 2003
Anders Hejlsberg: Let's say you fetch the Customer with custID 100. Internally in an object-oriented program, if you ask for that customer in a query, and then you ask for it again later in another query, what would you expect to get the second time?
Bill Venners: A Customer that's semantically equal to the one I got the first time.
Anders Hejlsberg: Would you expect to get the same object reference?
Bill Venners: I don't see why I would care, so long as the two were semantically equal.
Anders Hejlsberg: Really? Because it has a profound difference in how your program works. Do you think of the customer as an object, of which there's only one, or do you think of the objects you operate on as copies of the database? Most O/R mappings try to give the illusion that there is just that one Customer object with custID 100, and it literally is that customer. If you get the customer and set a field on it, then you have now changed that customer. That contrasts with: you have changed this copy of the customer, but not that copy. And if two people update the customer on two copies of the object, whoever updates first, or maybe last, wins.
Bruce Eckel: Really, if you're going to all this trouble it's nice for it to be transparent.
Anders Hejlsberg: It's funny. It reminds me of the discussion we had earlier about CORBA and attempting to provide the illusion that an application is not distributed. Well, this is the same. You may want to have the illusion that the data is not in a database. You can have that illusion, but it comes at a cost.
Bruce Eckel: With CORBA, they were trying to have the illusion that there is basically no network. With Jini, they said, "No, there is a network. We have to acknowledge it at this certain level, otherwise things get excessively complicated." The trick in design is where do you make that acknowledgement? Where do you say, "Here is this boundary that we always have to see." And I think those kinds of issues exist with an O/R mapping. The challenge is figuring out what's the right abstraction.
Eric Gunnerson: The big question is: Do you need the abstraction? In a lot of cases you don't. We have something similar in our current implementation of remoting in .NET that tries to be transparent. Most people say, "Yeah, I know I'm doing remoting. I know the object lives over there. Don't go to all this effort to try and make it look like it's local."
Bruce Eckel: Sometimes you discover that if you try and use an abstraction like local-remote transparency, suddenly the complexity around it gets huge. Whereas if you just say, "I'm going to make a call here. The network may fail, and I have to acknowledge that," then things get clearer. With an object-oriented database, it seems there is that kind of choice in there as well. I have to accept that maybe I have multiple representations of the same Customer object. Maybe I have to tell the object I'm done. Maybe there has to be a transaction.
Anders Hejlsberg: And that's actually better, because then the user thinks deeply about the things that might possibly happen. As a designer, you try to give users that capability as best you can.
Bruce Eckel: And you try to put the abstraction at the right level, so that the users are not going to so much trouble to try and make things work because of the wrong abstraction.
Eric Gunnerson: The trouble with the wrong abstraction is there's no way out of it. In practice, though, it's very hard for class designers to make reasonable guesses about even the scenarios in which their designs will be used, much less the relative frequency of each kind of use. You may think your users will want transparency, because it lets them do really cool things, so you implement transparency. But if it turns out 99% of your users never care, guess what? Those people pay the tax.
From "Why I do Not Use ORM"
by Ken Downs, June 2008
"Do you propose us to organize our applications in terms of tables and records instead of objects and classes?" Yes. Well, maybe not you, but that's how I do it. I do not expect to reach agreement on this point, but here at least is why I do it this way:
- My sphere of activity is business applications, things like accounting, ERP, medical management, job control, inventory, magazine distribution and so forth.
- I have been doing business application programming for 15 years, but every program I have ever written (with a single recent exception) has replaced an existing application.
- On every job I have been paid to migrate data, but the old program goes in the trash. Every program I have written will someday die, and every program written by every reader of this blog will someday die, but the data will be migrated again and again. (...and you may even be paid to re-deploy your own app on a new platform).
- The data is so much more important than the code that it only makes sense to me to cast requirements in terms of data.
- Once the data model is established, it is the job of the application and interface to give users convenient, accurate and safe access to their data.
- While none of this precludes ORM per se, the dictionary-based approach described above allows me to write both procedural and OOP code and stay focused on what the customer is paying for: convenient, accurate and safe access.
- The danger in casting needs in any other terms is that it places an architectural element above the highest customer need, which is suspect at best and just plain bad customer service at worst. We all love to write abstractions, but I much prefer the one that gets the job done correctly in the least time, rather than the one that, to me, appears to most in fashion.
In addition to the well thought out articles above, here are some discussion threads, with Reddit coming last-but-not-least:
"Why do we need entity objects?"
...I really need to see some honest, thoughtful debate on the merits of the currently accepted enterprise application design paradigm. I am not convinced that entity objects should exist.
By entity objects I mean the typical things we tend to build for our applications, like "Person", "Account", "Order", etc...I am not anti-OO. I write lots of classes for different purposes, just not entities...I have come to the battle-tested conclusion that entity objects are getting in our way, and our lives would be so much easier without them...If you let people get too far away from the physical data store with an abstraction, they will create havoc with an application that needs to scale...
"Why is ORM considered good...?"
...A good example would be the online store that I work for. It has a Brand object, and on the main page of the Web site, all of the brands that the store sells are listed on the left side. To display this menu of brands, all the site needs is the integer BrandId and the string BrandName. But the Brand object contains a whole boatload of other properties, most notably a Description property that can contain a substantially large amount of text about the Brand. No two ways about it, loading all of that extra information about the brand just to spit out its name in an unordered list is (1) measurably and significantly slow, usually because of the large text fields and (2) pretty inefficient when it comes to memory usage, building up large strings and not even looking at them before throwing them away....
"Ask Reddit: I know SQL, but not ORMs: Am I crazy to create a mid-sized web app without an ORM?"
...I used to program that way but I don't any more. I cannot for the life of me figure out why I bothered copying the data out of the database object (Recordset, DataReader, or DataTable) and into these "business objects"...
...As I grow older I'm really starting to believe that Domain Driven Design is overkill and ultimately counter-productive for most applications. More often than not it seems like an unnecessary layer between you and your data...
...As for "the language of the domain", usually the domain is "tabular data". If I were to use Domain Driven Design, one of two things would happen. 1) I would have to create a hundred different customer classes to represent the various ways I want to work with the data. 2) I would have to write significantly more code to pull out the information I actually care about...