News for nerds, stuff that matters

Ask the Man Behind the NOAA's New Beowulf Cluster

Posted by Roblimo on Tue May 23, 2000 11:00 AM
from the don't-you-wish-you-had-his-job? dept.
Greg Lindahl sent in this story last September about a massive Alpha Linux cluster that's being built by HPTi for the NOAA's Forecast Systems Laboratories. What Greg forgot to mention when he submitted the original story is that he's the project's chief designer. What with all the Beowulf (and Alpha) interest around here, we figured he'd make a great interview guest, especially now that the project is well under way. Please post your questions below. Answers to 10 - 15 of the highest-moderated ones should appear within the next week.
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Future Plans? by cyphergirl (Score:1) Tuesday May 23 2000, @06:16AM
  • AppleSeed Beowulf? by dmhirsch (Score:1) Tuesday May 23 2000, @06:58AM
  • Significant event by dragonfly_blue (Score:1) Tuesday May 23 2000, @06:17AM
  • Intra-CPU Bandwidth? by Anonymous Coward (Score:1) Tuesday May 23 2000, @07:01AM
  • Coding style differences for distributed computing by Hentai (Score:1) Tuesday May 23 2000, @06:17AM
  • Future of the Alpha by mcelrath (Score:2) Tuesday May 23 2000, @07:02AM
  • Should be: AppleSeed > Beowulf? by dmhirsch (Score:1) Tuesday May 23 2000, @07:05AM
  • How did you choose the CPU type? by Argyle (Score:1) Tuesday May 23 2000, @07:06AM
  • Pretty impressive and all..but... by SgtPepper (Score:1) Tuesday May 23 2000, @07:06AM
  • Re:Weather forecasting in general. by cartographer (Score:1) Tuesday May 23 2000, @09:06AM
  • various... by Anonymous Coward (Score:1) Tuesday May 23 2000, @09:40AM
  • Technical Challenges by YAAC (Score:1) Tuesday May 23 2000, @07:08AM
  • oversea's clusters... by ChiaBen (Score:1) Tuesday May 23 2000, @07:11AM
  • I have recently become gainfully employed in a capacity which will require me to administer a Beowulf cluster. My question, Mr. Lindahl, is how you feel about the various competing technologies for distribution of computation. In particular, do you feel there is much to be gained from the work of the MOSIX project at The Hebrew University of Jerusalem? Traditionally tasks for Beowulf style supercomputers have required specific programming in MPI or PVM calls. MOSIX endeavors to provide adaptive load-balancing with process migration. Essentially this allows the programmer to forgo the hassle of parallelizing his code. Rather, he can now simply fork() or create SMP threads and the OS will automatically handle distribution of those processes over the cluster. Do you feel that this is a worthwhile avenue to pursue for scientific computation or are there issues which make MPI or PVM still a substantially better choice? Thank you for your time.
    --
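As a rough illustration of the "simply fork()" style described in the comment above, here is a minimal Python sketch that splits a sum across forked children and collects the results over pipes. The names `partial_sum` and `fork_sum` are hypothetical, and the workload is a stand-in. On a stock kernel all of the children run on the local machine; the process migration that the comment attributes to MOSIX is not modelled here.

```python
import os


def partial_sum(lo, hi):
    """CPU-bound stand-in for one slice of a larger computation."""
    return sum(i * i for i in range(lo, hi))


def fork_sum(n, workers=4):
    """Split range(n) across `workers` forked children and add up their results."""
    step = (n + workers - 1) // workers
    children = []
    for w in range(workers):
        lo, hi = w * step, min((w + 1) * step, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute one slice, report it through the pipe, then exit
            # without running any of the parent's cleanup code.
            try:
                os.close(read_fd)
                os.write(write_fd, str(partial_sum(lo, hi)).encode())
            finally:
                os._exit(0)
        # Parent: keep only the read end so EOF arrives when the child exits.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            total += int(pipe.read())
        os.waitpid(pid, 0)
    return total


if __name__ == "__main__":
    print(fork_sum(1_000_000))
```

Even in this style the program still has to split the work and gather the partial results itself. What an MPI or PVM version would add is explicit control over which node holds which data.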
  • Are smaller clusters worth building? by Medievalist (Score:1) Tuesday May 23 2000, @07:12AM
  • Doing it all over again... by WindChild (Score:1) Tuesday May 23 2000, @10:18AM
  • how to start on building beowulf by nettarzan (Score:1) Tuesday May 23 2000, @10:36AM
  • Re:Why alpha - serious number crunching. by morzel (Score:2) Tuesday May 23 2000, @11:51AM
  • Re:Who Else? by Eugene (Score:1) Tuesday May 23 2000, @12:12PM
  • Storage for a beowulf by dbrower (Score:1) Tuesday May 23 2000, @12:14PM
  • Will it rain? (Score:3)

    by wass (72082) on Tuesday May 23 2000, @06:17AM (#1053744)
    Okay, here's my question. What will the weather be like on September 27, 2005, in Baltimore, MD?

    This brings to mind a more fundamental and philosophical question - Does your computer (or any one that's possible to build) have enough horsepower to out-calculate that analog computer called reality that we all know and love so very much?

  • Kernel 2.4 by superlame (Score:1) Tuesday May 23 2000, @06:18AM
  • by Matt Gleeson (85831) on Tuesday May 23 2000, @06:19AM (#1053746) Homepage
    The raw performance of the hardware being used for scientific and parallel programming has improved by leaps and bounds in the past 10-20 years. However, most folks still program these supercomputers much the same way they did in the 80's: Unix, Fortran, explicit message passing, etc.

    You have worked in research with Legion and in industry at HPTi. Do you think there is hope for some radical new programming technology that makes clusters easier for scientists to use? If so, what do you think the cluster programming environment of tomorrow might look like?
  • Job management (Score:4)

    by gcoates (31407) on Tuesday May 23 2000, @06:20AM (#1053747)
    One of the weaknesses of Beowulfs seems to me to be a lack of decent (job) management software. How do you split the cluster's resources? Do you run one large simulation on all the CPUs, or do you run 2 or 3 jobs on 1/2 or 1/3 of the available CPUs?

    Is there provision for shifting jobs onto different nodes if one of them dies during a run?
  • everyone thinks CPU I think network ! by johnjones (Score:1) Tuesday May 23 2000, @06:21AM
  • Re:Congratulations. But why??? by technos (Score:2) Tuesday May 23 2000, @06:21AM
  • My sister was bit by a moose once. by Hentai (Score:1) Tuesday May 23 2000, @06:22AM
  • Management and Monitoring by cblack (Score:1) Tuesday May 23 2000, @06:24AM
  • Re:Who are the programmers? by David Roundy (Score:1) Tuesday May 23 2000, @07:14AM
  • Most of the IS/IT trade publications and media do not fully comprehend the differences between massively multiprocessor systems with shared memory and clusters of systems and processors with their own local memory, or supercomputing clusters. This is quite evident in a recent article regarding TPC-D performance of clustered Compaq Wintel/MSSQL systems versus a single, shared-memory Sun/Oracle system, where the Compaq cluster outperformed the Sun solution in 2 of the 10 standard benchmarks. Basic laws of statistics negate those results because the designs of the two systems were not of the same class -- e.g., to be fair, Microsoft-Compaq should have compared performance to an equivalent cluster of lower-cost Sun systems (let alone a Lintel cluster!).

    As you and I already know (and I hope everyone reading this now knows), there are several applications where lower-cost clusters cannot do the job as efficiently as more costly shared-memory systems (low-latency, real-time applications such as real-time simulations come to mind). That is why the Compaq Wintel cluster scored far below the shared-memory Sun system in many of the other 8 benchmarks in the aforementioned study.

    As such, I am interested in the considerations the NOAA has had to make in evaluating shared memory versus clustered systems. Specifically:

    • What are some of the NOAA/NWS programs and software that will not be applicable for execution on this new cluster?
    • What [estimated] percentage do these programs make up of the total applications the NOAA uses, both quantity and in time of execution?
    • What [assuming] shared memory systems and solutions does the NOAA use for these applications?
    Of course, the lower the number in the first two questions, the more advantageous the existence of a supercomputing cluster is to an organization. For example, in the aerospace industry, the quantity of cluster-efficient applications may be small, but the total execution time of a "run" of these select applications can greatly outweigh all others. Again, speaking from my aerospace background, applications such as Monte Carlo, CFD, and 6DOF (six degrees of freedom) runs and simulations are extremely time consuming. Monte Carlo is an ideal application for clustering since each "run" result is completely independent of the others (almost linear performance improvement when distributed in a cluster). CFD is very close to linear (~90% efficient) and 6DOF, I would guess, could be as high as 60 or 70%, if it is written to take advantage of distributed computing systems.

    The main reason why these engineering applications are so efficient on clusters is the nature of how they use data. They need little to start crunching, and return little. But during the run, they create and use massive amounts of data, which is all "temporary." This is in stark contrast to databases (such as those targeted by the aforementioned TPC-D benchmarks), where data, not computational results, is the focus of the application. By using supercomputing clusters for computational-driven engineering apps, we can save both money on systems and the time of our engineers waiting on results.

    As such, I am interested in the overall increase in efficiency you are seeing after the introduction of supercomputing clusters. Specifically:

    • By executing appropriate applications on supercomputer clusters, what price/performance efficiency do you see over execution on equivalent shared memory systems? [e.g., for CFD, we found equivalently performing supercomputing Linux clusters cost 5-10% of the cost of shared memory systems from Sun and SGI.]
    • In addition to these computational-intensive applications, do you have any data-intensive applications that are more price/performance efficient (not necessarily faster overall) on clusters than on shared memory systems? [I personally have not been able to justify clusters for such uses, yet]

    [ I now work in the semiconductor design industry, and we are looking at acquiring some Linux supercomputing clusters to speed up the runs of EDA (electronic design automation) tools like those for IC layout and the like. ]

    I appreciate your time and wish you and your organization the best in our Linux and OSS endeavors.

    -- Bryan "TheBS" Smith
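To make the scaling argument in the comment above concrete, here is a small Python sketch. It shows why independent Monte Carlo runs can be handed to any number of nodes and simply added up afterwards, and what a parallel efficiency figure such as the quoted ~90% means for speedup. The pi estimate is a stand-in workload, and `run`, `estimate_pi`, `speedup` and the node counts are hypothetical names and values chosen for illustration.

```python
import random


def run(seed, samples=10_000):
    """One independent Monte Carlo run: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # private generator, so runs share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seeds, samples=10_000):
    """Combine any collection of runs; the order and grouping of runs do not matter."""
    hits = sum(run(seed, samples) for seed in seeds)
    return 4.0 * hits / (samples * len(seeds))


def speedup(nodes, efficiency):
    """Speedup over one node when `nodes` nodes run at the given parallel efficiency."""
    return nodes * efficiency


if __name__ == "__main__":
    seeds = list(range(8))
    # Pretend two nodes each took half of the runs; the combined count is unchanged.
    node_a = sum(run(s) for s in seeds[:4])
    node_b = sum(run(s) for s in seeds[4:])
    print(node_a + node_b == sum(run(s) for s in seeds))
    print(estimate_pi(seeds))
    print(speedup(16, 0.9), speedup(16, 0.65))
```

Because each run needs only a seed to start and returns only a count, almost nothing crosses the network. That matches the "need little to start crunching, and return little" description in the comment.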

  • Re:The Future of the Control Software by kbh3rd (Score:1) Tuesday May 23 2000, @06:24AM
  • Re:The Future of the Control Software by technos (Score:2) Tuesday May 23 2000, @07:22AM
  • performance benchmarks by fishbonez (Score:2) Tuesday May 23 2000, @07:26AM
  • Re:Why alpha? by b_pretender (Score:2) Tuesday May 23 2000, @07:35AM
  • Rolling Cluster by manplusdog (Score:1) Tuesday May 23 2000, @03:20PM
  • Ask The Man? by Ed Avis (Score:2) Tuesday May 23 2000, @07:37AM
  • Clustering vs. Distributed Computing by TheLocustNMI (Score:1) Tuesday May 23 2000, @07:38AM
  • What is your installation/administration software? by exa (Score:1) Tuesday May 23 2000, @04:04PM
  • What should I be reading? by DoktorMel (Score:1) Tuesday May 23 2000, @07:47AM
  • first? (Score:4)

    by Greg Lindahl (37568) on Tuesday May 23 2000, @06:02AM (#1053763) Homepage
    first post?
  • Screensaver? by chowpalace (Score:1) Tuesday May 23 2000, @06:02AM
  • Beowulf in General (Score:4)

    by BgJonson79 (129962) <(srsmith) (at) (alum.wpi.edu)> on Tuesday May 23 2000, @06:05AM (#1053765)
    How do you think the new wave of Beowulf clusters will affect all of supercomputing, not just forecasting?
  • The end for SC's in Forecasting? by nagora (Score:2) Tuesday May 23 2000, @06:06AM
  • by zpengo (99887) on Tuesday May 23 2000, @06:06AM (#1053767) Homepage
    How did you come to be the project's chief designer? I'm curious to know the background of anyone who gets to work on such an interesting project.
  • Who Else? (Score:4)

    by Alarmist (180744) on Tuesday May 23 2000, @06:04AM (#1053768) Homepage
    You've built a large cluster of machines on a relatively pea-sized budget.

    Are other government agencies going to duplicate your work? Have they already? If so, for what purposes?

  • Hardware info? (Score:3)

    by matticus (93537) on Tuesday May 23 2000, @06:04AM (#1053769) Homepage
    Can you give us some information about what exactly is in this cluster? What Alphas, etc.?
  • Oh the temptation... by drenehtsral (Score:2) Tuesday May 23 2000, @06:05AM
  • Um. err. by MVoelker (Score:1) Tuesday May 23 2000, @06:07AM
  • by PacketMaster (65250) on Tuesday May 23 2000, @06:08AM (#1053772) Homepage
    I built a Beowulf-style cluster this past semester in college for independent study. One of the biggest hurdles we had was picking out a message passing interface such as MPI or PVM. Configuring across multiple platforms was then even worse (we had a mixture of old Intels, Sun SPARCs and IBM RS/6000s). What do you see in the future for these interfaces in terms of setup and usage, and will cross-platform clusters become easier to install and configure in the future?

  • Cost and application by mobiux (Score:2) Tuesday May 23 2000, @06:25AM
  • Beowulf Design by pyronicide (Score:1) Tuesday May 23 2000, @06:08AM
  • Why Beowulf? by LaNMaN2000 (Score:1) Tuesday May 23 2000, @06:26AM
  • Modifications for Beowulf? by LaNMaN2000 (Score:1) Tuesday May 23 2000, @06:28AM
  • by Matt2000 (29624) on Tuesday May 23 2000, @06:29AM (#1053777) Homepage
    Ok, a two parter:

    As I understood it, weather models are a fairly hard thing to paralleliz (how the hell do you spell that?) because of the interdependence of pieces of the model. This would seem to me to make a Beowulf cluster a tough choice, as its inter-CPU bandwidth is pretty low, right? And that's why I thought most weather prediction places chose high-end supercomputers, because of their custom and expensive inter-CPU I/O?

    Second part: Is weather prediction getting any better? Everything I've read about dynamic systems says that prediction past a certain level of detail or timeframe is impossible. Is that true?

    Disclaimer: I might be dumb.

    Hotnutz.com [hotnutz.com] - Funny
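The "interdependence of pieces of the model" raised in the comment above can be illustrated with a toy nearest-neighbour stencil. This is a minimal Python sketch, not an actual weather model: `step` and `step_split` are hypothetical names, the grid is one-dimensional, and the "nodes" are just slices of a list. It shows that every time step each slice needs boundary ("halo") values owned by its neighbours, and those values are what would travel over the interconnect.

```python
def step(cells):
    """One smoothing step on a 1-D strip; the two end cells are held fixed."""
    if len(cells) < 3:
        return list(cells)
    interior = [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ]
    return [cells[0]] + interior + [cells[-1]]


def step_split(cells, parts=2):
    """The same step computed slice by slice, as if each slice lived on its own node.

    Returns the new cells plus the number of halo values that had to be
    fetched from neighbouring slices.
    """
    n = len(cells)
    bounds = [n * p // parts for p in range(parts + 1)]
    new_cells, exchanged = [], 0
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        # Halo cells: one value from each neighbouring slice, if there is one.
        left = cells[lo - 1:lo] if lo > 0 else []
        right = cells[hi:hi + 1] if hi < n else []
        exchanged += len(left) + len(right)
        updated = step(left + cells[lo:hi] + right)
        # Keep only the cells this slice owns; drop the borrowed halos.
        new_cells.extend(updated[len(left):len(left) + (hi - lo)])
    return new_cells, exchanged


if __name__ == "__main__":
    grid = [float(i % 5) for i in range(16)]
    result, exchanged = step_split(grid, parts=4)
    print(result == step(grid), exchanged)
```

In this 1-D toy each slice borrows at most two values per step. In a 2-D or 3-D grid the halo is a whole edge or face of the subdomain, and it has to be exchanged on every time step, which is why the bandwidth and latency between nodes matter for this kind of code.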
  • Re:Question about maintinance. by Anonymous Coward (Score:1) Tuesday May 23 2000, @06:30AM
  • by x0 (32926) on Tuesday May 23 2000, @06:30AM (#1053779) Homepage
    I am curious as to whether (no pun intended...:)) or not you have ever done any testing to see if a distributed.net type environment would be useful for your type of work?

    It seems to me that there are more than a few people who are willing to donate spare cpu cycles for various projects. At a minimum, you could concentrate on the client side binaries and not worry as much about hardware issues.

  • Finer grain parallel linux system and debugging by theHippo (Score:1) Tuesday May 23 2000, @07:50AM
  • Long term predictions by digitalhermit (Score:2) Tuesday May 23 2000, @06:31AM
  • When is a GNU/Linux cluster not a good choice? by dumpest (Score:1) Tuesday May 23 2000, @07:55AM
  • Re:When is a GNU/Linux cluster not a good choice? by BitMan (Score:1) Tuesday May 23 2000, @08:00AM
  • bandwidth by ArchieBunker (Score:1) Tuesday May 23 2000, @08:02AM
  • Re:Congratulations. But why??? by Atticka (Score:1) Tuesday May 23 2000, @08:19AM
  • Design your own BeoWulf Cluster at Home! by Domini (Score:1) Tuesday May 23 2000, @10:27PM
  • Beowulfs in Business by Gerakis (Score:2) Tuesday May 23 2000, @08:32AM
  • Can I have a job? by pacabell (Score:1) Wednesday May 24 2000, @06:20AM
  • Reliability - general & alpha vs intel by hjs (Score:1) Tuesday May 23 2000, @08:36AM
  • Semi-Related: Clusters vs. Supercomputers by kristau (Score:1) Wednesday May 24 2000, @08:05AM
  • Re:What about a dnet type client? by BRock97 (Score:1) Tuesday May 23 2000, @08:55AM
  • Hi Greg! by ayden (Score:1) Thursday May 25 2000, @04:56PM
  • details please by synaptic-impulse (Score:1) Tuesday May 30 2000, @02:16PM
  • by vvulfe (156725) on Tuesday May 23 2000, @06:09AM (#1053794)
    Before deciding on a Beowulf cluster, what different options did you explore (Cray? IBM?), and what motivated you to choose the Beowulf System?

    Additionally, to what would you compare the system that you are planning to build, as far as computing power is concerned?

    Thanks,
    VVulfe
  • MPI restrictive? by ThePurpleBuffalo (Score:1) Tuesday May 23 2000, @06:11AM
  • by technos (73414) on Tuesday May 23 2000, @06:12AM (#1053796) Homepage Journal
    Having built a few small ones, I got to know quite a bit about Linux clusters, and about programming for them. Therefore, this question has nothing to do with clusters.

    What was the biggest 'WTF was I thinking' on this project? I'd imagine there was a fair amount of lateral space allowed to the designers, and freedom to design also means freedom to screw up.
  • Imagine ... (Score:4)

    by (void*) (113680) on Tuesday May 23 2000, @06:13AM (#1053797)
    ... a beowulf of these babies - oh wait! :-)

    Seriously, what was the most challenging of the maintenance tasks you had to undertake? Do you anticipate a trade-off point where the number of machines makes maintenance impossible? Do you have any pearls of wisdom for those of us just involved in the initial design of such clusters, so that maintaining it in the future is less painful?

  • Why not rackmounted servers? by Tet (Score:2) Tuesday May 23 2000, @06:33AM
  • Essence of a Beowulf cluster by Paul Jakma (Score:1) Tuesday May 23 2000, @06:13AM
  • by BRock97 (17460) on Tuesday May 23 2000, @06:35AM (#1053800) Homepage
    First off, from what I have gathered, it was not clear if your background was weather or not, so I am hoping it is. Here are a couple of questions:

    1) Having just graduated with a BS in Atmospheric Sciences, I have had a chance to take numerical weather prediction courses over the last five years. With this new influx of processing power, where do you see numerical models going in the future?

    2) Somewhat related to 1), with mesoscale models becoming more popular (MM5 quickly springs to mind), where do you see the balance of processor time going in these models: the ability to get a model out faster, or to compute more variables to provide a more accurate forecast at the smaller scale?

    3) Not knowing too much about the origins of these models, I was interested to find that a person could get the source to the MM5 and modify it as they see fit. Will models developed in the future follow this same trend? With powerful computers becoming affordable, it would not be that difficult for a university to build one and run a particular model for their area (I believe that Ohio State is doing it, again, with the MM5)?

    Thanks!

    Bryan R.
  • Wait for it... by laborit (Score:1) Tuesday May 23 2000, @06:41AM
  • Why GNU/Linux ? by Oestergaard (Score:1) Tuesday May 23 2000, @06:44AM
  • Quake? by Dethboy (Score:1) Tuesday May 23 2000, @06:45AM
  • Will you help spread the word about Open Source? by agravaine (Score:2) Tuesday May 23 2000, @06:53AM
  • Re:Why not rackmounted servers? by Animats (Score:2) Tuesday May 23 2000, @06:56AM
  • Who are the programmers? by FascDot Killed My Pr (Score:2) Tuesday May 23 2000, @06:13AM
  • by Legolas-Greenleaf (181449) on Tuesday May 23 2000, @06:14AM (#1053807)
    A major problem with using a Beowulf cluster over a single supercomputer is that you now have to administer many computers instead of just one. Additionally, if something is failing/misbehaving/etc., you have to determine which part of the cluster is doing it. I'm interested in a] how much of a problem this is compared with a traditional single-machine supercomputer, b] why you chose the Beowulf over a single machine considering this factor, and c] how you'll keep this problem to a minimum.

    Besides that, best of luck, and I can't wait to see the final product. ;^)
    -legolas

    i've looked at love from both sides now. from win and lose, and still somehow...

  • Why alpha? (Score:5)

    by crow (16139) on Tuesday May 23 2000, @06:14AM (#1053808) Homepage Journal
    Why did you choose Alpha processors for the individual nodes? Why not something cheaper with more nodes, or something more expensive with fewer nodes? What other configurations did you consider, and why weren't they as good?
  • Future of SuperComputing by superlame (Score:1) Tuesday May 23 2000, @06:15AM
  • Which parallel programming toolkit and why? by Bishop (Score:1) Tuesday May 23 2000, @06:15AM