Ask Database Guru Brian Aker

Ask Database Guru Brian Aker 232

Posted by Roblimo on Monday November 12, 2007 @12:00PM from the earning-a-living-with-open-source-software dept.

Brian Aker is Director of Architecture for MySQL AB. He has also worked on the code (and database) that runs Slashdot, and is well-known in both Apache and Perl circles. Outside of the arcane world of open source "back-end" programming, though, hardly anyone has heard of him. This is your chance to ask Brian (hopefully after looking at his blog and Wikipedia listing) about anything you like, from Perl to database architecture to open source philosophy to upcoming events in Seattle. We'll send Brian 10 of the highest-moderated questions approximately 24 hours after this post appears. His (verbatim) answers will appear late this week or early next week.

Ask Database Guru Brian Aker

This discussion has been archived. No new comments can be posted.

Search 232 Comments Log In/Create an Account

Comments Filter:

Re:Tabular vs hierarchal arrays (Score:3, Informative)

by DaleGlass ( 1068434 ) writes: on Monday November 12, 2007 @03:51PM (#21327079) Homepage

How do you do a SELECT DISTINCT name, or an OUTER JOIN in that model? What if you need to search by a non-key column?

Key/value systems have their place, but doing very normal RDBMS things in them is a pain.

Re:Object databases? (Score:2, Informative)

by daniel_newton ( 817437 ) writes: on Monday November 12, 2007 @04:42PM (#21327695)

db4o (http://en.wikipedia.org/wiki/Db4o [wikipedia.org]) is an open source object database. Apparently BMW, Boeing, Intel and others think it is "industrial grade".

It has a Java and a .NET version.

Re:Document databases? (Score:3, Informative)

by Pseudonym ( 62607 ) writes: on Monday November 12, 2007 @07:56PM (#21330109)

Well, one might argue also that hierarchical data is also not a good fit with the relational model.

True enough.

If we already have XML data type in many DBs, what prevents us having Document data type like what's in CouchDB (yes, it's proprietary, but so are many of the features in current RDBMS).

It's the way you interact with it that's critical here. Think about how you work with your favourite SQL DBMS and with Google, and you'll see that they're quite different at a very basic leve.

The fundamental result data type in a relational DBMS is the stream of tuples, and tuples contain real data. In other words, querying (i.e. finding what matches some criteria) is essentially the same as presentation (i.e. getting real data out).

The fundamental result data type in a document-based DBMS is the sequence of document numbers. Document numbers do not contain real data. You perform a query, and get a sequence of document numbers. Then you let the user refine the sequence. Maybe you present some metadata, or KWIC information. Maybe you sort on a field. Maybe you add more constraints. Eventually, you get to the point where you present real data.

Yes, you can do this in principle with relational databases, but it costs. It's not uncommon for a relational model for a document database to use 5-10x the disk space of a native document database engine.

Damien Katz has for example said that he might consider supporting SQL syntax, but it's too early to say.

SQL is a poor fit for textual data, too. Partly because support for textual querying operators (e.g. phrase or adjacency queries, query-based ranking) is poor, and partly because SQL has poor support for managing result sets on the server.

A few options: (Score:3, Informative)

by einhverfr ( 238914 ) writes: <chris...travers@@@gmail...com> on Monday November 12, 2007 @09:35PM (#21330897) Homepage Journal

1) TotalRekall (Python-based)
2) PgAccess (a TCL front-end and form builder for PostgreSQL)
3) Once:Radix (a web-based front-end builder for PostgreSQL).

OnceRadix is quite new and I think it is well thought out in a lot of ways.

Re:Tabular vs hierarchal arrays (Score:1, Informative)

by Anonymous Coward writes: on Tuesday November 13, 2007 @12:33AM (#21332447)

Why are so many databases still tabular today and not hierarchal?
Short answer: because people much smarter than the average coder have thought about data management and concluded that generic flexible structures are much better than rigid, application-specific or access-specific structures.
Longer answer:
Tabular structures are closer to the relations of the relational model, which store data as facts, and associate those facts via arbitrary boolean expressions. I.e., this row is related to that row when the following boolean function is TRUE. You can't get any more flexible than that. If you did, you would essentially be inventing the next data management model after relational.
Forty some odd years ago, database were hierarchic, because people would simply create structures that "matched" what they were doing in the code. You've got in Invoice, which contains, Line Items, which contain Inventory Items, blah blah.
Then people realized, sometimes you need ask questions like this: How much of each item did we sell? Who bought these items? Who didn't buy item X? And so on. Basically, if you wanted general queries, you had to "walk the hierarchy". Two problems here: 1) the data is not optimized for any query.. it's only optimized for a subset of possible queries and 2) "walking the hierarchy" means you have to write procedural, navigational code, which can't be optimized by the database, even if you wanted it to. In other words, you can't just ask the database for the answer, you have to "code it up". Then, the next app that needs that query has to do the work all over again (possibly making mistakes). Ever worked with a 10-year old app that has multiple apps, in multiple programming languages, hitting the same DB? Each making their own assumptions and querying the database in different ways? It's bad enough with SQL, it would be a nightmare with hierarchical databases.
Then, they came up with Network databases (analogous to today's "Object" databases), which used arbitrary links to connect data. More general than hierarchical, but still, what if you want to do many-to-many connections? What if you want to create associations that weren't implemented by the database designer? For example, what if object1 can be associated with all objects that have field "X" less than or equal to field "Y" of object1? Back to the procedural code.
Ted Codd, father of the relational model, took these concepts and GENERALIZED them. Arbitrary types (including complex compound types like objects), grouped together to create single facts, constrained with arbitrary boolean functions. Beautiful. And then he connected it to the "real world" using predicate logic.. each set of values represents a single boolean assertion, as complex as you can think of. Extra-super-beautiful. To top it off, there's a closed, simple algebra that can derive any answer from these simple structures.
SQL is about as far as we got in implementing those ideas. And, unfortunately, we seem to be heading back to the brain-damaged world of hierarchic databases. You can model any type of database with the relational model, by the way, because it is a completely general abstract model. Most people don't realize that you could store objects within relations for example.. for some reason people always think of "rows" as equivalent to "instances", which makes no sense at all.. what class is a row generated by a JOIN? What class is "SELECT name FROM customer"? What class is a UNION of different tables? "ORM" has really fucked up data management in the minds of most programmers.
I'm thinking of how Cache (or M / Mumps) is a database which basically lets you store associative arrays of data nested at any level, and access them in the normal programmatic way which you would access any other array.
MUMPS is an example of an old pre-relational design. Basically a backing store for an application, no data integrity, nothing more than a simple place to dump some strings. Most people who wo
Read the rest of this comment...

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Ask Database Guru Brian Aker 232

Ask Database Guru Brian Aker More Login

Ask Database Guru Brian Aker

Re:Tabular vs hierarchal arrays (Score:3, Informative)

Re:Object databases? (Score:2, Informative)

Re:Document databases? (Score:3, Informative)

A few options: (Score:3, Informative)

Re:Tabular vs hierarchal arrays (Score:1, Informative)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot