Ask the Man Behind the NOAA's New Beowulf Cluster
Posted by
Roblimo
on Tue May 23, 2000 11:00 AM
from the don't-you-wish-you-had-his-job? dept.
Greg Lindahl sent in this story last September about a massive Alpha Linux cluster that's being built by HPTi for the NOAA's Forecast Systems Laboratories. What Greg forgot to mention when he submitted the original story is that he's the project's chief designer. What with all the Beowulf (and Alpha) interest around here, we figured he'd make a great interview guest, especially now that the project is well under way. Please post your questions below. Answers to 10 - 15 of the highest-moderated ones should appear within the next week.
A possible paradigm shift in Beowulf technology? (Score:3)
Will it rain? (Score:3)
This brings to mind a more fundamental and philosophical question - Does your computer (or any one that's possible to build) have enough horsepower to out-calculate that analog computer called reality that we all know and love so very much?
The Future of Scientific Programming? (Score:5)
You have worked in research with Legion and in industry at HPTi. Do you think there is hope for some radical new programming technology that makes clusters easier for scientists to use? If so, what do you think the cluster programming environment of tomorrow might look like?
Job management (Score:4)
Is there provision for shifting jobs onto different nodes if one of them dies during a run?
What NOAA applications are not ideal for clusters? (Score:3)
Most of the IS/IT trade publications and media do not fully comprehend the differences between massively multiprocessor systems with shared memory and clusters of systems and processors with their own local memory, or supercomputing clusters. This is quite evident in a recent article regarding the TPC-D performance of clustered Compaq Wintel/MSSQL systems versus a single, shared memory Sun/Oracle system, where the Compaq cluster outperformed the Sun solution in 2 of the 10 standard benchmarks. Basic laws of statistics negate those results because the designs of the two systems were not of the same class -- e.g., to be fair, Microsoft-Compaq should have compared performance to an equivalent cluster of lower-cost Sun systems (let alone a Lintel cluster!).
As you and I already know (and I hope everyone reading this now knows), there are several applications where lower-cost clusters cannot always do the job as efficiently as more costly shared memory systems (low-latency, real-time applications such as real-time simulations come to mind). That is why the Compaq Wintel cluster scored far below the shared memory Sun system in many of the other 8 benchmarks in the aforementioned study.
As such, I am interested in the considerations the NOAA has had to make in evaluating shared memory versus clustered systems. Specifically:
- What are some of the NOAA/NWS programs and software that will not be applicable for execution on this new cluster?
- What [estimated] percentage do these programs make up of the total applications the NOAA uses, both quantity and in time of execution?
- What [assuming] shared memory systems and solutions does the NOAA use for these applications?
Of course, the lower the numbers in the first two questions, the more advantageous a supercomputing cluster is to an organization. For example, in the aerospace industry, the quantity of cluster-efficient applications may be small, but the total execution time of a "run" of these select applications can greatly outweigh all others. Again, speaking from my aerospace background, applications such as Monte Carlo, CFD, and 6DOF (six degrees of freedom) runs and simulations are extremely time consuming. Monte Carlo is an ideal application for clustering since each "run" result is completely independent of the others (almost linear performance improvement when distributed in a cluster). CFD is very close to linear (~90% efficient) and 6DOF, I would guess, could be as high as 60 or 70%, if it is written to take advantage of distributed computing systems.
The main reason why these engineering applications are so efficient on clusters is the nature of how they use data. They need little to start crunching, and return little. But during the run, they create and use massive amounts of data, which is all "temporary." This is in stark contrast to databases (such as those targeted by the aforementioned TPC-D benchmarks), where data, not computational results, is the focus of the application. By using supercomputing clusters for computation-driven engineering apps, we can save both money on systems and the time of our engineers waiting on results.
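[Editor's illustration, not part of the original comment.] The point about Monte Carlo runs being independent, and about "efficiency" percentages, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function names (`run_chunk`, `estimate_pi`, `parallel_efficiency`) are hypothetical, the pi estimate stands in for a real engineering workload, and the chunks are executed one after another here, whereas on a cluster each one could be handed to a different node.

```python
import random


def run_chunk(seed, n_samples):
    """One independent Monte Carlo run: count random points that fall
    inside the unit quarter-circle. Input is tiny (a seed and a count),
    output is tiny (one integer), and no data is shared with other runs."""
    rng = random.Random(seed)  # private generator, so runs do not interact
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_chunks, samples_per_chunk):
    """Combine the per-chunk counts. Executed sequentially in this sketch;
    each run_chunk call could equally be dispatched to its own node."""
    hits = [run_chunk(seed, samples_per_chunk) for seed in range(n_chunks)]
    return 4.0 * sum(hits) / (n_chunks * samples_per_chunk)


def parallel_efficiency(t_serial, t_parallel, n_nodes):
    """Speedup divided by node count: 1.0 means perfectly linear scaling,
    0.9 corresponds to the '~90% efficient' figure quoted for CFD."""
    return t_serial / (t_parallel * n_nodes)
```

For example, a job that takes 100 hours on one node and 12.5 hours on 8 nodes has an efficiency of 1.0 by this definition. The only communication in the sketch is the final sum, which is why this kind of workload tolerates a slow interconnect.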
As such, I am interested in the overall increase in efficiency you are seeing after the introduction of supercomputing clusters. Specifically:
[ I now work in the semiconductor design industry, and we are looking at acquiring some Linux supercomputing clusters to speed up the runs of EDA (electronic design automation) tools like those for IC layout and the like. ]
I appreciate your time and wish your organization and yourself the best in our Linux and OSS endeavors.
-- Bryan "TheBS" Smith
first? (Score:4)
Beowulf in General (Score:4)
In the beginning... (Score:5)
Who Else? (Score:4)
Are other government agencies going to duplicate your work? Have they already? If so, for what purposes?
Hardware info? (Score:3)
The Future of the Control Software (Score:5)
Weather forecasting in general. (Score:5)
As I understood it, weather models are a fairly hard thing to parallelize (how the hell do you spell that?) because of the interdependence of pieces of the model. This would seem to me to make a Beowulf cluster a tough choice, as its inter-CPU bandwidth is pretty low, right? And that's why I thought most weather prediction places chose high-end supercomputers, because of their custom and expensive inter-CPU I/O?
Second part: Is weather prediction getting any better? Everything I've read about dynamic systems says that prediction past a certain level of detail or timeframe is impossible. Is that true?
Disclaimer: I might be dumb.
Hotnutz.com [hotnutz.com] - Funny
What about a dnet type client? (Score:5)
It seems to me that there are more than a few people who are willing to donate spare CPU cycles for various projects. At a minimum, you could concentrate on the client-side binaries and not worry as much about hardware issues.
Beowulf Alternatives? (Score:5)
Additionally, to what would you compare the system that you are planning to build, as far as computing power is concerned?
Thanks,
VVulfe
Biggest whack in the head? (Score:5)
What was the biggest 'WTF was I thinking' on this project? I'd imagine there was a fair amount of lateral space allowed to the designers, and freedom to design also means freedom to screw up.
Imagine ... (Score:4)
Seriously, what was the most challenging maintenance task you had to undertake? Do you anticipate a trade-off point where the number of machines makes maintenance impossible? Do you have any pearls of wisdom for those of us just involved in the initial design of such clusters, so that maintaining them in the future is less painful?
Applications of this cluster? (Score:3)
1) Having just graduated with a BS in Atmospheric Sciences, I have had a chance to take numerical weather prediction courses over the last five years. With this new influx of processing power, where do you see numerical models going in the future?
2) Somewhat related to 1), with mesoscale models becoming more popular (MM5 quickly springs to mind), where do you see the balance of processor time going with these models: toward getting a model out faster, or toward computing more variables to provide a more accurate forecast at the smaller scale?
3) Not knowing too much about the origins of these models, I was interested to find that a person could get the source to the MM5 and modify it as they see fit. Will models developed in the future follow this same trend? With powerful computers becoming affordable, it would not be that difficult for a university to build one and run a particular model for their area (I believe that Ohio State is doing it, again, with the MM5)?
Thanks!
Bryan R.
Question about maintenance. (Score:5)
Besides that, best of luck, and I can't wait to see the final product. ;^)
-legolas
i've looked at love from both sides now. from win and lose, and still somehow...
Why alpha? (Score:5)