It seems that your study attempted to simulate the growth of an internet startup firm on Windows or Linux. One thing I did not see in the study was a good description of assumptions you made. What assumptions were made in both the design of the requirements and the analysis of the data? What limitations can we place on the conclusions as a result of these assumptions?
This is a really important question. I think there are two sections of the study: the assessment methodology and then the experiment we undertook to illustrate how to apply that methodology. I'll answer the assumption question for both parts:
Methodology - For the methodology, we wanted to provide a tool that organizations could use and apply their own assumptions. Maintaining a system is all about context; some environments favor Linux, others Windows. The question is, how do you know what's likely to be the most reliable (which includes manageable, secure and supportable) solution for your environment? We proposed a methodology a recipe - that looks at a solution in its entirety instead of just individual components. Policies like configuration control vary from organization to organization and to get something that's truly meaningful in your environment, the methodology needs to be carried out in your context. Enterprise customers can and should do this when they are about to trust their critical business processes to a platform. That said, the basic assumptions of the methodology are that patches are applied at 1 month intervals and that business needs evolve over time. How those business needs evolve depends on the scenario you're looking at (in our experiment we looked at ecommerce for example). The methodology doesn't cover steady state reliability, meaning the uptime of a system that is completely static. While this is important, our conversations with CIOs, CTOs, CSOs and IT folks lead us to believe that this was a smaller contributor to pain in a dynamic environment. In an appliance for example, though, steady state reliability is king, and I think an important limitation of this methodology is that we don't capture that well, and I think it's amazingly difficult quality to measure in a time-lapse way.
The purpose of the experiment was to illustrate how to apply the methodology and to begin to get some insights into some of the key model differences between two platforms. For the experiment we picked the ecommerce scenario, for no other reason than there has been a clear shift in how ecommerce sites have serviced their customers in recent years moving from static sites to personalized content. Some specific assumptions were:
* The transition from a basic purchasing site to a personalized portal based on order/browsing history takes place over a one year period.
* The period we looked at was July 1st, 2004 to June 30th, 2005 (the most recent full year at the time of the study).
* A configuration control policy exists that mandates OS version but not much else meaning administrators had fairly free rein to meet business requirements.
* All patches marked as critical or important supplied by the vendor are applied.
* We assume the system to be functioning if the original ecommerce application is running and meets some basic acceptance tests (same for both platforms see Appendix 1 of the report) and the new installed components are also running.
* To add new capabilities, we use leading 3rd party components as opposed to building custom code in-house.
* The business migrates operating system versions at the end of the one year period to the latest versions of the platform.
* The administrators that participated in the experiment reflect the average Linux (specifically SuSE) and Windows administrators in skill, capability and knowledge. While this was strived for, it's important to recognize the small sample size in drawing any conclusions from the data.
As far as limitations, the experiment looks at one specific case with a total of six administrators. I'd love to have done it with a hundred admins on each side on a wide range of business requirement scenarios and my hope is that others will do that and publish their results. Our experiment, however, shows that for this particular, clearly documented scenario, experienced Linux Admins had conflicts between meeting business needs and a recommended best practice like not introducing out-of-distribution components. If one is aware of potential conflicts and challenges upfront, I think you can put controls in place to make reasonable tradeoffs. In the linux case, a precise and specific configuration control policy may have prohibited the problematic upgrade of one of the components that the 3rd party solutions required. This would have likely reduced the number of failures but would have put some hefty constraints on 3rd party solutions. To understand the implications for your environment you really need to run through the methodology with the assumptions and restrictions of your organization and I hope that this study either prompts or provokes people to do that.
2 - Meta-credibility?
Where I come from (non-management, grunt-level techie), appearing in any of these analysts' journals *costs* an author more credibility than it gains him or her. For example, if $RAG says that $CORP has the best customer support, I immediately assume that $CORP has such horrid customer support that they had to pay someone to make up some research that proves otherwise.
To be sarcastic, I'd ask "who the heck actually takes these studies seriously?", but obviously *somebody* does. Who are these people, and why do these people take these industry analyst firms/journals/reports seriously? Are they right or wrong to do so? This isn't an attack (or endorsement :) of your research -- I'm talking about the credibility gap in industry research, and my observation that it's an industry-wide problem.
The meta-credibility question is this: Given the amount of shoddy pay-for-play research out there, does being published in an analyst journal tend to cost (a researcher, his consulting company, his financial backers) more credibility than it can gains him/her/them? If not, why not -- and more importantly, if so, is there any way to reverse the trend?
This is a really interesting question because it cuts to the heart of what a real research study should provide to the reader. It should provide a baseline and I think research should always be questioned, scrutinized and debated because one can always find reasons for bias. Particularly, if a subject of the study (vendor for example) is behind its funding, whether directly (as in this study) or indirectly (meaning that they are big clients) I think it's critical that the study not provide just a baked cake for readers but the recipe as well. The recipe has to be inherently fair and simple, meaning that it has to map directly to a the quality or pain one is trying to measure without taking into account how the subjects try and provide that service or mitigate that pain. I think slanted opinion pieces, with no backup for those opinions, seriously hurts credibility, at least in my book. If you're presenting facts though and encouraging others to question them then I think that actually helps credibility, even if the search for those facts was paid for.
I agree though that one is tempted to dismiss research a priori though because of funding or some vendor tie. I think a good way to reverse the trend is to open the process up to public scrutiny; that's probably the main reason I came on Slashdot. To use this specific study as an example, some folks disagreed with several points in the experiment from counting patches, to reasons for upgrading key components, to the ecommerce scenario we used. For me, the study's key value is the methodology. Could different applications/scenarios have been chosen: absolutely!
The value I think that this study gives to the practitioner is arming them with a tool to help measure in their own environment. By applying the methodology, the results should take into account things like administrators skillsets, support policies, configuration control policies and the tradeoffs between customizability, maintainability, visibility, security and usability. It's only by looking at this stuff in context can one make a sound judgment; and a true research paper, especially one where funding is in question, needs to fully disclose the method and the funding source. In our case, the methodology has been vetted by industry analysts, IT organizations and several academics. That doesn't mean much, though, if you don't find the methodology meaningful for the questions you want answered. One reason I've come on Slashdot is to get the thoughts, opinions and assessments of the methodology itself from administrators in the trenches. I'm really pleased with the great questions and comments amidst the inevitable flames and I'm looking forward to this being posted so that others can weigh-in with their feedback and I can jump into the threads to get some discussion going.
If the research helps give real insight, and the methodology makes sense, I think there's real value no matter who paid the bill. At the end of the day, you need to decide whether or not you can extract any value from the information presented to you. In the case of this study, my hope is that it will leave you thinking hmmm.... maybe we should actually run through a process like this and check out how this works for ourselves. My more ambitious hope is that you'll implement it and tell me what challenges you faces on Windows, Linux, OSX, BSD, whatever platform you choose to compare. It may not even venture into the perennial Windows versus Linux battle; maybe you're a linux shop trying to decide between multiple distributions for example. Either way, if it's got people thinking about the topic and asking questions, well, that's all any researcher can really hope for.
3 - Weak setup
If I understand the study correctly, the windows side had to do nothing but set up a server to do a few different tasks over time and run windows update. The linux side had to have multiple incompatible versions of their database server running simultaneously on a single system and had to run unsupported versions of software to do it.
Why wasn't the windows side required to run multiple versions of IIS or SQL server simultaneously? In real life if you need to run multiple database versions you use virtualization or multiple systems, especially if one requires untested software. You don't run some hokie unstable branch on the same system as everything else. Why was a linux solution picked that required this level of work? My other related question is, did any of the unix administrators question why there were being asked to do such a thing? For example, did they come back and say they need a license for vmware? If they did not they do not seem like very competent administrators in my opinion.
The Windows Admins and Linux admins were given the exact same set of business requirements which doesn't necessarily translate into the same tasks as they went about fulfilling them. The 3rd party components installed were chosen solely based on their market leadership position and any upgrades of OS were unknown at the time of selection. That said, on the Windows side, it turned out that no upgrades of IIS were needed (except for patches) and SQL Server was upgraded to SP4 as part of patch application. On the Linux side, at a high-level there were two main classes of upgrades: MySQL and GLIBC and they were both prompted by the installed components. After the experiment, the administrators were asked on both sides if this kind of evolution of systems met with their real-world experience. They said yes, with the caveat of if they were asked to install a component that required an upgrade of GLIBC that they would likely upgrade the operating system as long as their configuration control policy allowed it.
You make a great point about installing components on some sort of staging system (which is almost always done) as opposed to live running systems. That still means that the problems that the administrators had equal real IT pain. If something weird had to be done to get the system running but it does run and it's then put into production it's like a fuse that gets set on a bomb. A careful configuration control policy would almost certainly help and thats why I think it's so important to conduct this kind of experiment in your own environment with your own policies.
As far as selection of the Linux administrators go, they all had at least 5 years of enterprise administration experience, and two years of experience on SuSE specifically. With three people there's certainly likely to be a lot of variability and to get some conclusive results, I'd love to get a huge group of administrators across the spectrum in terms of experience. I'd also love to do it across multiple scenarios, beyond the ecommerce study. For this experiment, basically the bottom line is that we Illustrate one clearly documented scenario with six highly qualified admins that we selected based on experience. We cant ensure equal competency levels, but there was nothing in our screening that would lead us to believe there were gaps in knowledge on either side. When it comes down to it though, the really meaningful results are the ones you get when you perform the evaluation in your environment. Hopefully this study provides a starting point for asking the right questions when you do that.
4- Who determined the metrics
Did Microsoft come to you with a specific set of metrics, or did you work with them to develop the metrics, or did you determine them completely on your own?
Kudos to you for braving the inevitable flames to answer people's questions here on Slashdot.
Great question! The metrics and the methodology were developed completely on our own and independent of Microsoft. They were created with the help and feedback of enterprise CIOs as well as industry analysts. I think that this relates to a couple of other questions on Slashdot with the gist of if Microsoft is funding the study aren't you incentivized for them to come out ahead. Besides the standard we would never do that and that would put our credibility at risk which is our primary commodity which are both very true, let me explain a little more about how our research engagements work.
Company X (in this case Microsoft) comes to us and says can you help us measure quality Y (in this case Reliability) to get some insight into how product Z stacks up. We say, sure, BUT we have complete creation and control of the methodology, it will be reviewed and vetted by the community (end users and independent analysts) and must strictly follow scientific principles. The response will either be: great, we want to know whats really going on or um, heres some things to focus on and I think you should set it up this way. In the first case we proceed, in the second case we inform that company that we don't do that kind of research. We are also not in the opinion business, so we present a methodology to follow and illustrate how that methodology is applied with the hope that people will take the methodology and apply it in their own environment.
All of our studies are written as if they will be released publicly BUT it is up to the sponsor if the study is publicly released. The vendor knows that they're taking a risk. They pay for the research either way but only have control over whether it is published, not over content. So if their intent is to use it as an outward facing piece, they may end up with something they don't like. Either way, I think it's of high value to them. If there are aspects of the results that favor the sponsor's product, in my experience, it goes to the marketing department and gets released publicly; if it favors the competitors product it goes off to the engineering folks as a tool to understand their product, their competitor's product, and the problem more clearly. Either way, we maintain complete editorial control over the study and there is no financial incentive for us if it becomes a public study or is used as an internal market analysis piece. The methodology has to be as objective as possible to be of any real value in either case.
5 - ATMs vs. Voting Machines
How is it that Diebold can make ATM machines that will account for every last penny in a banking system, but they can't make secure electronic voting machines?
Also, does the flame-resistant suit come with its own matching tinfoil hat? (don't answer that one)
This is a question that has passed through my mind more than once. The voting world is very interesting. I don't have experience with the inner workings of Diebolds ATM machines but I can say that the versions of their tabulation software that Ive seen have some major security challenges (see this Washington post documentary for some of the gory details). I'd say I'm concerned about the e-voting systems Ive seen but that would be a serious understatement.
I question whether the economic incentive is there for them to make their voting systems more secure. Take an ATM for example. Imagine the ATM has a flaw and if you do something to it, you can make it give you more money than is actually deducted from your account. Anything involving money gets audited and sometimes audited multiple times and chances are good that the bank is going to figure out that they're loosing money. On the flip side, if there was a flaw in the ATM in the banks favor, someone balancing their checkbook is going to notice a discrepancy. The point is that there's always traceability and there's always someone keeping score. If you think about voting tabulators though we've got this mysterious box that vote data gets fed into and then, in many states, only a fraction of these votes are audited. That means we don't really know what the bank balance is other than what the machine tells us it is. If the system is highly vulnerable and its vulnerability is known by the manufacturer *but* it's going to be expensive to fix it and shore up defenses, there seems to be no huge incentive to fix the problems. I think the only way to get some decent software that counts votes that people can have confidence in is to allow security experts to actually test the systems, highlight potential vulnerabilities, and put some proper checks and balances in place. That would give the general public some visibility into a critical infrastructure system that we usually aren't in the habit of questioning and will hold voting manufacturers directly accountable to voters.
As for the tin foil hat to go with the flame resistant suit; it hasn't been shipped to me yet - apparently the manufacturing company is still filling backorders from SCO :).
6 - Why are the requirements different?
Looking at your research report's appendices, it seems that the requirements for Windows Administrators were somewhat different than the Linux Administrators. For instance, you ask for 4-5 years sys admin experience minimum for Windows, whereas it's 3-4 years sys admin experience minimum for Linux.
Why wasn't it equal for both? And doesn't this sort of slight Windows favoring undermine your credibility?
Short answer: Typo. Long answer: We originally were looking for 4 years of general administration experience for both Linux and Windows which is what is reflected in the desired responses to the General Background questionnaire for Linux. We then raised it to 5 years for both Linux and Windows which is reflected in the General Background of the Windows questionnaire. The difference in the two was just a failure to update the response criteria on that shared section of one of the questionnaires. On page 5 though we've got the actual administrator experience laid out:
Each SuSE Linux administrator had at least 5 years experience administering Linux in an enterprise setting. We also required 2 years minimum experience administering SuSE Linux distributions and at least 1 year administering SuSE Linux Enterprise Server 8 and half a year administering SLES 9 (released in late 2004). Windows administrators all had at least 5 years experience administering Windows servers in an enterprise environment. These administrators also had at least 2 years experience administering Windows Server 2000 and at least 1 year administration experience with Windows Server 2003.
7 - Scalability of Results?
You tested six people on two different systems; how is that supposed to yield any substantial insight into the underlying OSes themselves?
[At best, your study seems to show that the GNU/Linux distribution you selected was not particularly good at this task. But why does that show that the ``monolithic" style of Windows is better per se than the ``modular" style of GNU/Linux distributions?]
First, let's look at what we did. We followed a methodology for evaluating reliability with three Windows admins and three Linux admins. This is small sample set and it looked at one scenario: ecommerce. Is this enough to make sweeping claims about the reliability of Linux/Windows? No way. I do however think the results raise some interesting questions about the modularity vs. integration tradeoffs that come with operating systems. I don't think that either the Windows or Linux models are better in a general sense but they *are* different; the question is which is likely to cause less pain and provide more value for your particular business need in your specific environment. Hopefully these are the questions that people will ask after reading this study, and with any luck it will prompt others to carry out their own analysis within their own IT environment, building on what we started here. I think the methodology in this paper has provided a good starting point to help people answer those questions in context.
8 - Convenience vs. security
Lately, I've felt that Microsoft is emphasizing greater trust in their control over your system as a means of increasing your security. This is suggested by the difficulty of obtaining individual or bulk security patches from their website as opposed to simply loading Internet Explorer and using their Windows Update service, the encouragement in Service Pack 2 of allowing Automatic Update to run in the background, and the introduction of Genuine Advantage requiring the user to authenticate his system before obtaining critical updates such as DirectX.
In addition, Digital Rights Management or other copy protection schemes are becoming increasingly demanding and insidious, whether by uniquely identifying and reporting on user activity, intentionally restricting functionality, and even introducing new security issues (the most recent flap involves copy protection software on Sony CDs that not only hides content from the user but permits viruses to take advantage of this feature.)
I would like to know how you feel about the shift of control over the personal computer from the person to the software manufacturers -- is it right, and do we gain more than we're losing in privacy and security?
This is an interesting problem because manufacturers have to deal with a wide range of users. If there was real visibility and education for users on the security implications of doing A, B or C then we'd be ok. It's scary though when that line gets crossed. Sony's DRM rootkit is a good example. But if you think about it, we are essentially passively accepting things like this all the time. Every time we install a new piece of software,especially something that reads untrusted data like a browser plugin,we tacitly accept that this software is likely to contain security flaws and can be an entryway into your system; NOW are you sure you want to install it? The visceral immediate reaction is no but then you balance tradeoffs of the features you get versus potential risks. Increasingly, were not even given that choice, and components that are intended to help us (or help the vendor) are installed with out our knowledge. This also brings up the question of visibility; how do we know what security state were really in with a system? Again, there are tradeoffs, some of this installed software may actually increase usability or maintainability but it's abstracting away what's happening on the metal. So far, it seems as though the market has tended towards the usability, maintainability, integration that favors bundling on both the Linux and Windows sides. It's kind of a disturbing trend though.
As another example, think about how much trustaverage programmers put into their compiler these days. Whenever I teach classes on computer security and then go off into x86 op codes or even assembly, it seems to be a totally foreign concept and skillset. We've created a culture of building applications rapidly in super high-level languages which does get the job done, but at the same time seems to have sacrificed knowledge of (or even the desire to know) what's happening on the metal. This places a heavy burden on platform developers, compiler writers and even IDE manufacturers because we are shifting the cloud of security responsibility over to them in big way. Under the right conditions it can be good because the average programmer knows little about security, but we need to make sure that the components we depend on and trust are written with security in mind, analyzed by folks that have a clue, and are tested and verified with security in mind. This means asking vendors the tough questions about their development processes and making sure they've got pretty good answers. Here's what I think is a good start. If that fails, theres always BSD. :).
9 - Apache versus IIS
Simple one: of course I accept that Windows and Linux are a priori equally vulnerable - C programmers make mistakes. The question is which model is most likely to deliver a fix fastest. Given that the one area where Linux is probably in the lead over Microsoft's software is in the realm of the webserver - why are my server logs filled with artifacts of hacked IIS boxes but apache seems to remain pretty safe?
You bring up a couple of interesting points. The first is patch delivery. It's true that on Linux if there's a high profile vulnerability you're likely to be able to find a patch out on the net from somebody in a few hours. Sometimes the fix is simple, a one-liner, and other times it may be more complex. Either way, there could be unintended side effects of the patch which is why there's usually a significant lag between these first responder patches and a blessed patch released from the distribution vendor. Most enterprises I know wait for the distribution patch as a matter of policy, and even then, they go through a fairly rigorous testing and compatibility verification process before the patch gets deployed widely. In the Windows world, one doesn't get the alpha or beta patches, just the blessed finished product. So the question is which solution is likely to provide a patch that fixes the problem and doesn't create any more problems the fastest. That's a tough one to answer. I think theres something to be learned by looking historically and that in general theres a big discrepancy between perception and reality. Here's a (pdf) link to a study we did earlier this year based on 2004 data that I think provides a good starting point for answering that question.
As far as why you've got so many attempts on your Windows/IIS box, I think there are two distinct issues: vulnerability and threat profile. In the past, I would argue that the path of least resistance was through Windows because desktop systems were often left unprotected by the home computer user. Bang-for-the-packet favored creating tools that exploited these problems and some of the attacks actually worked on poorly configured servers as well. Then there's the targeted vs. broad attacks. Theres no question that the high-profile worms and viruses in the last several years have favored Windows as a target. The issue gets even more complicated when you look at targeted attacks. These targeted attacks are much harder to measure, even anecdotally, because either an organization gets compromised and doesn't disclose it (unless they're compelled to by law) or the attack goes undetected because it doesn't leave any of the standard footprints, in which case no pain is felt immediately. That may help to explain it but the truth is that there's a lot of conflicting data out there. I remember reading this on Slashdot last year which claims Apache was more attacked than IIS but I've also read reports to the contrary. The reality is that any target of value is going to get attacked frequently. If there is an indiscriminant mass attack like a worm or virus, that's pretty bad and can be really painful. What's scarier though is the attack that just targets you.
10 - Do you agree with Windows Local Workflow
Microsoft and Linux distros have had a policy for some time of including more and more functionality in the base operating system, the latest example is the inclusion of "Local Workflow" in Windows Vista.
As a security expert do you think that bundling more and more increases or decreases the risks, and should both Windows and Linux distros be doing more to create reduced platforms that just act as good operating systems?
Three years ago I bought my mother a combination TV, VCR and DVD player. It was great; she didn't have to worry about cables or the notorious multi-remote control problem. She didn't even really need the VCR because she hardly ever watches Video tapes, but I thought, why not. It worked great for two years, mom watched her DVDs, and on a blue moon a video tape from a family vacation would find its way into the VCR. All was well at the Thompson household. This past year, tragedy struck. The VCR devoured a videotape, completely entangling it in the machine. This not only knocked out the VCR but the television too (it thought it was constantly at the end of a tape and needing to rewind it). So here's the issue: mom probably only needed a TV and a separate DVD player. I probably could have gotten better quality components individually too, and with some ebay-savvy shopping, the group may have been cheaper. For my mom though, the integration and ease of operation of the three were key assets. The flipside of that is that the whole is only as strong as the weakest of its constituent parts, and by the manufacturer throwing some questionable VCR components into the mix, it caused the whole thing to fail. The meta-question: did I make the right choice, going for the kitchen-sink approach versus individual components? I think for mom I made the right call. For me, my willingness to program a universal remote and my love of tweaking the system would have lead me down a different route.
In operating systems, it depends what you're looking for and what the risk vs. reward equation is for you, and I would argue that the answer varies from user to user. The ideal would be something that gave you integration, ease of use, visibility, manageability and the ability to truly customize and minimize functionality and maintenance requirements. No operating system I've ever seen strikes that balance optimally and for every user. As far as bundling functionality with the distribution, I think it's a question of market demand. There's no question though that from a simple mathematical perspective, the less code processing untrusted data the better. That means if I need a system to perform one specific function, and that function was constant over time, then from a security perspective I only want the stuff on that box that does what I need to serve that goal. For example, I don't ever want X Windows on my linux file server. I just want the minimal code base there because as long as the code itself is reliable, I'll only have to mess with the box to apply patches (and much fewer patches if I strip the system down). That's true of my home fileserver. If I have an army of systems to manage though, my decision is going to come down to which platform is reliable and extends me the most tools to manage it efficiently and effectively. That's a question that can only be answered in context. I can tell you what I run at home though. File server: Red Hat EL 4 (no X windows). Laptop: Windows XP SP2. Desktop: Windows Server 2003 with virtual machines of everything under the sun from Win 9x to SuSE, Red Hat and Debian.