Slashdot Log In
Database Sensei Brian 'Krow' Aker Answers Your Questions
Posted by
Roblimo
on Wed Nov 28, 2007 01:11 PM
from the what-if-a-database-fails-in-the-forest-where-no-one-hears? dept.
from the what-if-a-database-fails-in-the-forest-where-no-one-hears? dept.
The last two weeks have been super-hectic in Brian Aker's life, but we finally have his responses to your questions ready to share with you...
1) Database Models (Score:5, Interesting)
by Tom (822)
Hi, Brian -
Are relational database the end-all, or do we have other promising database models to look out for? OO-databases apparently went the way of the Dodo, but what else is out there that you find interesting?
Brian:
The key designs right now that I am looking at are around what drives web applications and data analytics. The small scale OLTP problems have been solved. There are dozens of solutions out there today for basic OLTP. The databases that we need today are distributed, asynchronous, on-line, shared nothing, and modular. It should be possible to componentize the database so that all pieces can be replaced. This is one of the things that I believe Apache got right, and it is something that all database designers struggle with.
I do not believe that the OO databases died out completely. I see the concepts they were pushing in frameworks that are being built. The SQL extensions for OO are pretty lousy; they feel grafted on. Table inheritance looks great until you start to figure out all of the inherent gotchas in implementing it. Today if you look around in the open source world you will find examples where developers are wrapping OO layers on top of the relational model. One of these groups is going to get tired of the relational model and implement their own backend. Necessity will breed the next successful model.
2) Why MySQL? (Score:5, Interesting)
by eldavojohn (898314)
What is it about MySQL that caused you to create so much for it and to become a director for it? You're obviously very talented, why didn't you pick some other open source database (I don't mean to incite a religious war here)? Was it political? Was it because of friendships or really for technological reasons?
Brian:
Back in 1997 I ran into a problem with an application that I had written to use mSQL. At the time I was more accustomed to Informix, but the user I was writing the application for did not have a license to use it. Looking around the net I found MySQL and I ported the application to use MySQL in about thirty minutes. The only other databases that I considered were Addabas and Postgres. Postgres at the time had some row limits which made it a non-starter for my application. Adabas had nothing available in documentation. MySQL was simple to get up and running, and it had great on-line documentation. I could also follow the code which made me comfortable with deploying it.
So why come to work for MySQL? The week after 9/11 I went to a Database conference in Rhode Island. The conference was a bust, no one attended. I had known the founders Monty and David for several years but I really got to know them there. I had been working on extensions for MySQL at the time and it was fun to sit down and look at them with Monty. I bet on people, and David and Monty are wonderful people. Cycle forward a few years and David Axmark spent eight hours one night at a conference talking to me about joining. I also got to meet Marten Mickos and from him I got a good impression that the company was viable. I had been getting offers at the time and was looking to do something new. David convinced me that it would be fun to come work for MySQL, and he was right.
3) mySQL vs. PostgreSQL (Score:5, Funny)
by saterdaies (842986)
So, let me get this straight: you (mySQL) use a dolphin to fetch data while PostgreSQL uses an elephant to fetch data. Would that explain why PostgreSQL is better at fetching large datasets? Like, the elephant can haul more, but is slower while the dolphin is faster, but can't carry as well? Have you thought about using a non-animal to fetch your data? Maybe a racecar? Those are fast and could probably haul as much as an elephant. Plus, then I wouldn't need to have fish or peanuts in my server room.
Brian:
What is it with mammals? You would think that we would have branched out a little and found a real fish. Why not a raptosaurus? I try to avoid bikesheds, or any problem that deals with the weight of the average laden TCP packet being carried by either a Dolphin or an Elephant.
4) Why will MySQL not revisit some key design flaws? (Score:5, Interesting)
by MemRaven (39601)
There are several behaviors in MySQL that are quite key to the core of the system, and which are still not being revisited. For example:
* Lack of foreign key constraint even metadata maintenance in the core system schema
* Row-at-a-time constraint checking of all kinds
* Copy-on-schema-upgrades (meaning that you have to have at least twice the size of your largest schema element if you want to make a change, rather than making the physical change in-utero)
These affect performance of almost any large system. However, even with the new storage back-end, some of these fundamental "characteristics" aren't being revisited, in particular the universally derided failure to support foreign key constraint checking.
Why is MySQL so fundamentally incapable of revisiting decisions that have proven to be incorrect over time? I mean, Monty may have had reasons for making his database incapable of supporting key relational database constructs. But why are you incapable of accepting that for MySQL to have much wider scale adoption you have to remove the Monty Personality Constructs from the core limitations of the system?
For those of you who are interested, the key part here is that systems like InnoDB (which are able to process FK constraints internally) aren't able to do so efficiently, because the core locks them into row-at-a-time constraint processing, meaning that large inserts into fact tables with small dimension tables are painfully slow if you don't turn of FK constraint checking)
Brian:
You could add to this list, "Why is that you can insert a bad value into an enum and not have it always toss an error?" That is second on my list of the things which most annoy me about MySQL. MySQL was built for a number of tasks originally and when you understand what it was designed for most of these limitations make sense, though not all. Unireg, which is the predecessor to MySQL, was designed for Analytics.
MySQL revisits lots of decisions and nothing is set in stone. Monty is not set in stone. He changes his mind as well, and his opinions on how MySQL should behave and what it should support are different today. It takes time to develop code though, and users rely on previous behavior. MySQL is also not trying to copy what all of the other vendors have done either. We push to make ourselves more compatible with the SQL standard, but we have no intention of giving up on all of our own extensions. There is a team inside of MySQL today working on Foreign Keys, and that Foreign Key design will work in the manner that you describe. One of the strengths of MySQL is the ability to have multiple backends to the database. This makes many problems much more difficult to solve. Foreign Keys that are internal to an engine are pretty easy to do, but ones that can work across engines, some of which are not transactional, is much more difficult.
As far as "Copy-on-schema-upgrades" goes, that is a storage engine limitation. We started fixing some of this in 5.1, and we know we must solve this problem for all engines. The world is on-line and MySQL has to be as well. Today in large web application environments being on-line with MySQL means solving problems with dual master replication. Turn off the SQL slave on one master and point your clients away from it. Do your schema changes and then turn on replication. As soon as it catches up, turn off the other master and let the changes to the schema be applied to it via replication.
5) Client GUI Tools? (Score:5, Interesting)
by Blakey Rat (99501)
One of the things I've always felt is most lacking for Open Source databases is good client-side GUI tools to do ad-hoc queries, look at the database structure, assist in copying/importing/exporting data, etc. Microsoft has a pretty good tool in this area with either the SQL Enterprise Manager/Query Analyzer combo or SQL Server Management Studio.
MySQL on the other hand has a poorly-implemented, not-well-supported equivalent to Query Analyzer called MySQL Query Browser: http://www.mysql.com/products/tools/query-browser/ [mysql.com] It's slow, crashes often (especially when called upon to list large datasets), and has tons and tons of usability flaws (the most glaring being the impossibility to select/copy the dataset to any other applications, and the failure to support standard keyboard shortcuts like Control/Command-A to Select All.)
It seems to me that MySQL Query Browser is treated, at best, as a second-class citizen in the MySQL world. Is that the case? If so, is it simply due to a lack of qualified developers for it, or is it part of a larger strategy to keep more resources working on the back-end?
In my opinion, MySQL could benefit greatly from having some really great (or at least passable!) client-side tools.
Brian:
The open source world rarely wants to put a lot of effort into GUI tools. There are notable exceptions, like Firefox, but for the most part the feedback that is given is not enough. Watching those tools be developed was an eye-opener on the state of GUI development in the open source world. Most of the code I design is for backend servers and tools. I have frameworks for writing test cases and every time I do a push of some code a regression suite is run.
For GUI tools, though, in the open source world? Getting good test cases is hard, and reproducing them is time-intensive. Looking at the state of the world today, I would be hard pressed to tell someone to design a rich client. Javascript compatibility has gotten better and testing web APIs is very straightforward.
6) Slashdot's Future (Score:5, Interesting)
by eldavojohn (898314)
From your Wikipedia article:
"He then moved to work on Slashdot, where his initial task was to rewrite the database back-end to use Oracle. However, he extended the system to ensure it allowed multiple database back-ends, and became a published author along the way, writing Running Weblogs with Slash. From 2001 to 2007 he posted stories on Slashdot under the Author name of 'Krow'."
What do you think about the current state of Slashdot? Do you have any criticisms or praise for its current state compared to where it was when you worked on it? What do you see in the future of it, if anything at all?
Also, why do you select the nickname "Krow" for both Slashdot & your blog?
Brian:
I am glad that javascript became widespread so that it could be used for the site. Some of the coding for menus and display logic were just odd; Javascript should allow the current Slash coders to clean up the admin interface. Slash is in good shape. It has a number of concepts in it that have held up over time. Google's Map Reduce? Slashdot has runtask. I saw early on that we were going to have to solve problems like map/reduce, and the others agreed (Jamie took runtask further than what I did). Pudge worked with the folks who did Template::Tookit to get it to the point where we could use it. Some of the specialized hacks I put in place are still being used today. Slash has social networking features, Jabber support... it has a lot of good features.
Does Slash have issues? There are things it would have been nice to do. The database is not as abstracted away from the internal APIs as it should be. We made a couple of attempts to write a memcached sort of service, but none of them worked out right. Today some data is cached on startup and other bits are stored in the Apache children (the allocated objects are directly hooked into the Apache pools). I trade messages with Jamie from time to time about some of this and what I have suggested is that the entire guts should be written to make use of objects stored in memcached. Do away with the need for read replication slaves.
My nickname? Krow was the name I picked on an early time sharing system. For some reason lost to me I thought it was a funny anagram at the time. Why continue to use it? It stuck.
7) Replication (Score:5, Interesting)
by neoform (551705)
When can we get replication that isn't brutally difficult to setup and maintain?
Example, slaves that can be started blank and copy all necessary files without needing to have a copy of the databases already.
Brian:
mysqldump | mysql --host somehost
If you are only using Innodb you can use the single-transaction option and do it entirely online. Could this be simpler? Both MySQL and other vendors are writing tools to make the monitoring and deployment much easier.
8) Largest challenges (Score:5, Interesting)
by KevMar (471257)
What were your largest challenges or features that gave you the most grief when implementing them in MySQL?
Brian:
From a team standpoint I believe it would be stored procedures. MySQL internally does lock avoidance in order to not have deadlocks. Doing this with stored procedures was not very easy, and Per-Erik, the developer of stored procedures, spent months working on this by himself. A lot of credit should go to him for getting them to work. Since then Kostja Osipov, Dmitri Lenev, and others have spent a lot of time working out the kinks.
As far as my own projects go, evolving the storage engine code to allow it to be loadable was a long task, and one that took a number of people to accomplish. I began in 4.1 refactoring the storage engine API to allow engines to be loadable. In 5.0 I made a lot more progress and introduced the structure called the "handlerton" that is the singleton that is a storage engine. Making storage engines loadable was one of my reasons to join MySQL. I had made a previous engine that Slashdot used for a short period of time which was a hybrid between MyISAM and the Memory engine. I saw the potential from that and from working on Apache to make storage engine loading dynamic under MySQL. A number of individuals like Antony Curtis and Sergei Golubchik had the vision as well, and it was fun to work on it.
The work is not done yet really, and there is quite a bit more evolution that needs to happen in the code to support more loadable pieces.
9) Solid-State Storage (Score:5, Interesting)
by spoonboy42
It strikes me that a lot of the work that has gone into optimizing databases in the past (especially storage, index structure, and buffer management) works on the assumption that the database will be stored to a hard drive, which is way slower than main memory and has variable access times depending on the relative position of the disk head and the required data.
Nowadays there's a lot of large solid-state flash drives coming out, and as time goes on we can expect to see sizes increase and costs decrease. For small-to-medium size databases, it's now reasonable to keep the whole database on, say, one of the new 64GB memory drives [slashdot.org]. The use of drives which have a fixed, cheap cost for random access seems to invalidate and simplify a lot of the assumptions made about conventional back-end data storage, and opens up possibilities for new types of optimization (binary tree indexes instead of B+ trees, more use of indirection).
I'm wondering what you see as some of the biggest opportunities for memory-resident databases, and what work is going into MySQL to take advantage of the changes that are happening in storage.
Brian:
Databases are on the verge of evolving. When I am doing due diligence for VCs today on database technologies, in almost all cases the technology being presented is assuming lots of memory. I believe though that we will not see complete replacement of winchester disks any time soon. What we will see is solid-state disks making up another layer in caching schemes. You will probably have your main database still sitting on traditional spindle, but you will have a subset of that data located on solid-state storage. Even if the main database is just being kept around for backups, it will exist and it will be using spindle.
New media like this is all about caching. Winchester drives were brought in to speed up the data transfer for hot data. They acted as a caching layer between tape and main memory. Solid state drives will do the same for databases. Where I really want to see solid state disks is in my laptop :)
10) Japanese Animation (Score:5, Interesting)
by Volanin (935080)
Well, I will ask something not related to databases, or even computing at all. From the "About Me" section in your blog, I can see you named your lovely cats Nausicaä and Kiki. I will assume that you are indeed a fan of Japanese Animation, especially the works of Hayao Myazaki. So I ask:
- What's your favorite Myazaki movie to date?
- What's the reason for this choice?
- And finally: I also love Myazaki's movies, but lately I have been struggling to find anime that I really enjoy. They all seem repetitive and cliché. Would you dare recommend one? =)
Thank you a lot.
My best wishes for you and your cats!
Brian:
In graduate school I would spend Sundays working on the lab on campus, most of the time waiting on some program to finish. The building across the way housed the University of Iowa Anime fan club one Sunday a month. I would spend those Sundays going back and forth between the buildings. I was blown away by the environmental message in Nausicaa. For years Nausicaa was my favorite piece of anime, but then the Egyptian here in Seattle ran a Myazaki film festival and I got to see Porco Rossa in 35mm. I really enjoyed the film.
Despite loving Myazaki films I have seen very little anime in the last decade. Other then some stuff I picked up off the Cartoon Network back when I had cable, I have not really seen anything new. Rob Malda is the person to ask about Anime, he is a serious collector of it. I am more of an independent film watcher. Go watch "Ever Since the World Ended" or "Smala Sussie."
_______________________________________________________
Brian "Krow" Aker, brian at tangent.org
Seattle, Washington
http://krow.net/ http://tangent.org/ http://exploitseattle.com/
_______________________________________________________
You can't grep a dead tree.
Related Stories
[+]
Ask Database Guru Brian Aker 232 comments
Brian Aker is Director of Architecture for MySQL AB. He has also worked on the code (and database) that runs Slashdot, and is well-known in both Apache and Perl circles. Outside of the arcane world of open source "back-end" programming, though, hardly anyone has heard of him. This is your chance to ask Brian (hopefully after looking at his blog and Wikipedia listing) about anything you like, from Perl to database architecture to open source philosophy to upcoming events in Seattle. We'll send Brian 10 of the highest-moderated questions approximately 24 hours after this post appears. His (verbatim) answers will appear late this week or early next week.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading... please wait.
Answer to 4 (Score:4, Interesting)
Re:Answer to 4 (Score:5, Insightful)
Parent
Re: (Score:2)
What is the amount of effort required to enforce data type constraints (e.g. enum, and every other type for that matter)? It's not that they don't have time to do it, it's that they don't want to do it. The MySQL philosophy is that data integrity is the domain of all of the applications that access the database, and not the database itself. I, personally
Re: (Score:2)
Then, really, MySQL may not be the DBMS for you, since that's a fairly central feature and one which a lot of the rest of the feature set (both what it has and what it doesn't have) revolves around. Its not like their aren't open-source DBMS's that aren't constrained by this, and its not like you get beat up by a swarm of dolphins if you use, say, PostgreSQL.
Re:Answer to 4 (Score:5, Insightful)
I disagree. I thought the most important part of his answer was this:
It seems his answer is more accurately portrayed as: we did it the way we did for various reasons, but the design is still evolving. Give us a little time to catch up.
Parent
Re: (Score:3, Insightful)
Re: (Score:2)
My personal favorite MySQL flaw, if you try to store a 9 character string in an 8 character field while the server is in the default configuration, it doesn't throw an error. Obviously, it doesn't work either, but no errors or warnings.
Yes, basic error checking is an option, because the benchmarks look a little bit better every time you stop caring about data integrity.
MySQL: Because your data isn't that important, anyway.
Re: (Score:2)
I think the SQL standards require that if you try to store a 9-character value in a char(8) field then it should be silently truncated. If MySQL is doing what the standard requires I don't think you can fault them for it. It would be a useful vendor enhancement to add a 'sa
About Question 6 (Score:2)
Re: (Score:3, Informative)
The reason is: repeatability. It's not good, for example, to monkey with your live schema using a GUI. Of course, some tools allow you to preview or log all the commands that get sent, which is a reasonable compromise. You do various things on an offline copy of the database, collecting the commands into a file. Then you run the file against another offline copy of the database to make sure it works. Then you consider whet
Re: (Score:2)
Re: (Score:2)
However even in development, I like to know what the GUI is doing. This is one of my beefs with MS SQL Server; it does things using undocumented stored procedures, for which there is no documented non-GUI procedure. What this means is that there is no supported way to script certain schema changes.
Re: (Score:2)
At my company, we don't actually allow *any
Re: (Score:2)
I use the query browser all the time. It has quirks, no question. I've run into some weird ones (every once in a while, I somehow cause it to switch to right-to-left text mode, for example).
The piece of software I've used the most (besides the command line tool and the query browser) is called YourSQL (Mac, Java) and it is very nice.
Re: (Score:2)
Re: (Score:2)
The open source world rarely wants to put a lot of effort into GUI tools. There are notable exceptions, like Firefox, but for the most part the feedback that is given is not enough. Watching those tools be developed was an eye-opener on the state of GUI development in the open source world. Most of the code I design is for backend servers and tools. I have frameworks for writing test cases and every time I do a push of some code a re
Re: (Score:2)
Re: (Score:2)
That'll teach me to read too fast.
Re: (Score:2)
Re: (Score:2)
He clearly said MySQL 'Query Browser'
Re: (Score:2)
Honestly, I didn't have *many* problems with crashing, I think it happened two or three times before I gave up on the tool. My main problems where the huge, gaping usability flaws. Two examples:
It's impossible to select the results grid and paste it into an
Re: (Score:3, Interesting)
http://bugs.mysql.com/bug.php?id=29163 [mysql.com]
Looks like my memory was a little faulty; apparently Control-A on Windows works as expected, it's only on Macintosh where Command-A does not work. On the other hand, I also forgot about the hostility and condescension of the developer involved (telling me to check the bug reporting guidelines when I had, and was following them exactly.)
Oh well.
Way to be Politically Correct! (Score:5, Funny)
Call me a kingdomist all you want, but it's science. Mammalia rules and everything else is a vile corruption of God's original plan. Get that straight you pc sanitized jerk.
Re: (Score:2)
Don't you mean classist?
Re: (Score:2)
Hmmm.... Perhaps that is just confusingly stated. If I understand correctly, I believe you mean to say that the "Kingdom of Heaven" is Animalia. "Mammalia", on the other hand, is a distinction of Class.
Re: (Score:2)
#5 - GUI Tools (Score:5, Interesting)
I use it daily against MSSQL, DB2/400, MySQL and Postgres. Not sure if Blakey Rat has tried it before but I wanted to drop a suggestion.
Off topic: Can I send messages to Slashdot users?
Re: (Score:2)
Sad... (Score:2, Interesting)
It's truly disheartening to read this kind of nonsense from the people who drive open source database development:
Today if you look around in the open source world you will find examples where developers are wrapping OO layers on top of the relational model.
NO, WRONG. Developers are wrapping OO layers on the *SQL* model, which doesn't support a complex, non-scalar type system with polymorphism (otherwise known as an object-oriented type system).
One of these groups is going to get tired of the relat
Re: (Score:2)
Re: (Score:2)
Must show all work for credit.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I own 4 of CJ Date's books and they really changed my perspective on a lot of things. It took a long time though before I really understood the implications of what he is proposing. In fact, I still don't entirely understand, but I know a lot more about extracting long term value from data, and building systems that actually work.
Re: (Score:2, Interesting)
Speak for your database -- postgresql does.
Postgresql's "table inheritance" is a flawed concept and has nothing to do with the *type system*. Relations contain tuples, and tuples contain attributes, which are a name plus a VALUE. Those values are chosen from TYPES (sets of possible values). Those types are the TYPE SYSTEM.
Table inheritence doesn't even make sense. Tables are analogous to relations. All relations are the same type, the relation type (think "set" or "array" to make it easier). How can
Mod Parent Up! (Score:3, Insightful)
This is exactly what is wrong with the "relational vs. object" debate: people don't understand that a true relational database can contain objects. The difference is not with the *objects* but with the *relationships* between objects: an OODB works from the idea that sub-objects are "inside" of parent objects; a true RDBMS works from the idea that sub-objects are "related" to the parent objects (by a relationship that explicitly describes how they are related).
Re: (Score:2)
Er, no, its not. There are implementations of "Tutorial" and "Industrial" D's (which are implementations of the relational, as distinguished from SQL, model.) The problem is getting any substantial part of the database market to care that anyone has implemented it.
Joining (Score:2, Funny)
You would have thought that a "database sensei" would already know everything there is to know about joins.
Discussions between Masters (Score:2)
They may play thought games to try to figure out why X is better than Y, or to understand more fully the ramifications of why o
New mod type? (Score:2)
OOP versus Relational (Score:2)
OO and relational do tend to *not* get along well. They just have conflicting root philosophies. I don't think the mother-of-invention can solve it, not without a new philosophical underpinning coming about.
One solution
Re: (Score:2, Interesting)
I need to qualify that a bit before it triggers a paradigm fight. I've never seen a reference implimentation where the OO version was clearly better than a procedural/relational equivalent. Yes, the OO version may "run" and produce the correct output, but as far as software maintenence I've seen no net benefit over competitors.
The biggest problem is that to take advantage of inheriten
Oddly enough... (Score:2)
Good movies choice indeed (Score:2)
Seriously good flicks, I really recommend them as well. They really are movies you want to see.
Question 7: Replication (Score:2, Insightful)
>
> If you are only using Innodb you can use the single-transaction option and do it entirely online. Could this be simpler? Both MySQL and other vendors are writing tools to make the monitoring and deployment much easier.
WTF ? Granted, I haven't set up v5 replication yet, so maybe a miracle happened, but I doubt it.
Just doing this is just going to provide you with an inconsistent dataset. You need to (as documented) stick the entire database into read-only mode, m
Re: (Score:3, Informative)
Re: (Score:2)
And in English, the name of the anime in the answer is Porco Rosso, not Porco Rossa.
Re: (Score:2)
On the drama end, check Haibane Renmei. Probably the most touching story ever to hit the "moving picture" media.
Re: (Score:2)
Also might want to look into Armitage The Third: Poly-Matrix and Dual-Matrix. The box will look like it's 'Amitage III'. You can nab a dual-disc box with both of them for $36 off of Amazon.
Re: (Score:2)
It carried Trigun a few years ago.