
Jonathan Zdziarski Answers

Posted by ScuttleMonkey
from the friends-of-anti-spam-slash-back dept.
On Wednesday we requested questions for Jonathan Zdziarski, an open source contributor and author of the recently reviewed book "Ending Spam." Jonathan seems to have taken great care in answering your questions, which you will find published below. We have also invited Jonathan to take part in the discussion if he has time, so if your question didn't make the cut, perhaps there is still hope.
Winkydink asks:
How do you pronounce your name?

Jonathan Responds:
Hi. Well first off, I'm sticking to the pronunciation 'Jarsky', although many of my relatives still pronounce it 'Zarsky' or "Za-Jarsky". As far as I can tell, my last name was originally 'Dziarstac' when the first generation of my family came over, which would have been pronounced with a 'J'. It's of Polish descent, but I'm afraid I'm not very in tune with my ancestors on this side of the family. The other side of my family is mostly Italian, and they drink a lot, organize crime, and generally have more fun - so they are much more interesting to hang out with. For the past 29 years of my life, giving my last name to anyone has included the obligatory explanation of its pronunciation, history, and snickering at puns they think I'm hearing for the first time (-1, Redundant), so don't feel too bad for asking.

As far as who I am and why you should care - I guess that depends on what kind of geek you are. I've never appeared in a Star Trek series or anything (I've been too busy coding and being a real geek), so I guess that eliminates me as a candidate for public worship in some circles. I guess if you're into coding, open source, hacking all kinds of Verizon gear, or eradicating spam, then some of my recent projects may be of interest. If you at least hate me by the end of the interview, I'll have accomplished something.

An Anonymous Coward asks:
What do you think about the proposed change to the GPL with the upcoming GPL 3? Is it a welcomed breath of fresh air to the Open Source Community, or will it just be a reiteration of the previous GPL? What are your thoughts and comments on the GPL 3?

Jonathan Responds:
Based on the scattered information I've read about some potentially targeted areas in GPLv3 and the religious fervor with which some of these discussions have been reported, all I can say is I hope common sense prevails. Actually there's much more I can, and will, say about the subject below, but I think it's probably a good idea to summarize in advance as you may not make it through the list of details in one sitting. So in summary of all my points to come: I hope common sense prevails.

One of the things I've heard, which doesn't make much sense to me, is the idea of changing the GPL to deal with 'use' rather than 'distribution', which would affect companies like Google and Amazon. The argument seems to be that some people feel building your infrastructure on open source should demand a company release all of their proprietary source code which links to or builds on existing GPL projects. They argue that the open source community hasn't benefited from companies like Google and Amazon. Well, from a source code perspective that might be somewhat true - but if you take into consideration the fact that we all have a good quality, freely accessible search engine, cheap books, and employment for many local developers (many of whom write open source applications), the benefits seem to balance out the deficiency. Does anybody remember what the world was like before Google? None of us do, primarily because we couldn't find it - we couldn't find much of anything we were looking for on the Internet as a matter of fact, including other people's open source projects. You might not be getting "free as in beer" or "free as in freedom", but you are getting "free as in searches" and "free as in heavily discounted but not quite free books" in exchange. That's a pretty good trade. It's certainly better than having to look at pages of advertising before completing your order, or subscribing to a Google search membership. On top of this, you probably wouldn't want to see half of the source code that's out there being integrated (internally) into these projects. While I haven't seen Google or Amazon's mods specifically, I do heavily suspect that, if they are like any other large corporate environment, there are many disgusting and miserable hacks that should under all circumstances remain hidden from sight forever - many of which are probably helping ensure job security for the developers that performed the ugly hacks in the first place. 
Just how useful would they be to your project anyway? Probably little. And if you really believe in free software ("free as in freedom"), then the idea that someone should be required to contribute back to your project in order to use it is contradictory to that belief - you might just as well be developing under an EULA instead of the GPL.

With that said, there's a difference between freedom and stealing. I've heard that GPLv3 will attempt to address the mixing of GPL and non-GPL software. I think this clarification might be a good thing. For one, because I've seen far too many pseudo-open source tin cans and CDs being resold commercially out there, distributing many different F/OSS tools with painfully obvious closed commercial code and finding easy loopholes around this part of the GPL; and secondly, because the current rule is based on implementation guidelines that really aren't any of the GPL's business. At the moment, mixing uses a very archaic guideline, which is, in its simplest terms, based on whether or not your code shares the same execution space as the GPL code. I think this needs to be reworked to give authors the flexibility to define "public" and "private" interfaces in a project manifest. We're already defining these anyway if we believe in secure coding practices. Closed source projects may then use whatever interfaces the author has declared public (such as command-line execution, protocols, etc.), but private interfaces are off limits. One particular area where this would come in handy is GPL kernel drivers, which need this ability to avoid tainted-kernel situations. If the author wants, they can declare dynamic linking to a library as a public interface and even make their code more widely useful without having to switch to the GPL's red-headed stepchild, the LGPL. It would also be nice to be able to restrict proprietary protocols (such as one between a client piece and a server piece, which may have originally been designed to function together) to only other GPL projects, which would essentially create GPL-bonded protocol interfaces. This won't restrict use in any way - only what closed-source projects are limited to interfacing with when redistributed.

I would also like to see the GPL's integration clause tightened down quite a bit. There are some companies out there abusing the GPL with "dual licensing". I've considered dual licensing myself in some commercial products, and I just don't believe it's being done in the right spirit much, if at all. Doing away with the possibility of integrating the GPL into a dual license could help strengthen the GPL.

Finally, I'd say mentioning a few words in the GPLv3 about submission practices to help stave off problems like this whole SCO and Linux® fiasco from ever happening again would be a good thing. People generally don't want to limit usage, but if you're going to submit code, there should be at least some submission guidelines. I suspect much of this can (and should) be done outside of the GPL, but at least covering the basics might be appropriate. It should be understood that if you're going to contribute code to a GPL project, it had better be unencumbered. It's definitely something every project should already be considering.

An Anonymous Coward asks:
Do you have any suggestions for the enthusiastic yet inexperienced? Perhaps a listing of projects in need of developers, with some indication of the level of experience suggested (as well as languages required).

Jonathan Responds:
The best projects I've seen were those started by someone with a passion for what they're coding. Open source development is the internship of the 21st century, and working on projects is tedious, frustrating, and likely to burn you out if you haven't developed perseverance. I usually suggest that people come up with ideas for some projects they feel passionate about and make those their first couple of goals. Even if it's completely useless to anyone else, you're still likely to benefit from it yourself. Just look at my Australian C programming macros. Who would have thought that people wouldn't want to use "int gday(int argc, char *argv[])" in their code. I'm sure I learned something from that project, though I still can't remember what.

Instead of spending idle time looking for other projects to jump on, I'd spend as much time as I could in man pages, books, and coding up my own little concoctions. Even if they're stupid ones, you're likely to learn something, or even better - come up with another neat little idea you can spin off of it. Necessity is the mother of invention, so I try and figure out what it is I need, and then do it myself. That usually works. If you still can't think of anything, see if you can catch a vision for something someone else needs. I wouldn't touch anything that you're not 100% bought into and excited about for your first projects.

RealisticCanadian asks:
I myself have had numerous interactions with less-than-technically-savvy management types. Any time I bring up solutions that are quite obviously a better technical and financial choice than software-giant-type solutions, the conversation seems to hit a brick wall. The ignorance of these people on such topics is astounding, and many approaches I have tried seem to yield no results in the short term. "Well, yes, your example proves that we would save $500,000 per year using that Open Source solution. But we've decided to go the Microsoft (or what-have-you) route." With your track record, I can only assume you have found some ways to overcome this closed-mindedness.

Jonathan Responds:
I'm not so sure that I have convinced anyone open source was better inasmuch as I've convinced people that other people's projects were better than what Microsoft had to offer, and that's not hard for anyone to accomplish. I can strongly justify some open source projects to people because they are already superior to their commercial counterparts, but there are also a lot of crummy projects out there that should be shot and put out of my misery. I'm not one to advocate a terribly written project, even if it is open source. The good projects can usually speak for themselves with only a little bit of yelling and biting from me. So if you want to become a respected open source advocate at your place of business, I'd say the first rule of thumb is not to try and advocate crap projects for the mere reason that they're open source. Advocating the good ones will help you build a reputation. It also helps if you read a lot of Dilbert so you'll understand the intellectual challenges you'll be facing.

Some other things that I've found can help include what managers love to call a "decision matrix" which is a spreadsheet designed to make difficult decisions for them. For your benefit, this should consist of a series of features and considerations that the competitor doesn't have, with a big stream of checkboxes down the row corresponding to your favorite open source project. Nobody's interested in knowing what the projects have in common anyway, so tell them (with visual cues) what features your open source solution has over the competitor. And if you really want to get your point across clearly to your manager, do the spreadsheet in OpenOffice so they'll have to download and install an open source project to read it.

Once you've done that, and if you're still employed by now, the next thing to put together is an ROI (return on investment) comparison, which addresses not only the costs of the different solutions, but also the costs to support both solutions in the long run, the cost of inaccuracy (if this is a spam solution, for example), and the cost of training, customizations, and resources to manage each product. This is a great opportunity to size machines and manpower and include that in a budget forecast. Many managers are sensitive to knowing just how much extra dough it's going to cost to implement the commercial solution. At the very least, you ought to be able to prove many commercial solutions don't actually make the company much money in the long run. If speaking of cash isn't enough to convince your manager, then a full analysis of low-level technical aspects will be necessary. This is simply a dreadful process, and where most open source attempts fail - because a lot of people are just too lazy to learn about the technical details of both projects and complete their due diligence. If you take the time, though, you're likely to either convince your boss or utterly confuse him - either one is very satisfying.
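At its core, the ROI comparison described above is a total-cost-of-ownership sum. The sketch below is a minimal illustration assuming a simple linear cost model; the class, every field name, and the flat per-false-positive cost are hypothetical placeholders you would replace with your own figures, not numbers from the interview.

```python
from dataclasses import dataclass


@dataclass
class SolutionCosts:
    """Hypothetical cost inputs for one candidate solution (dollars)."""
    name: str
    license_per_year: float
    support_per_year: float
    admin_hours_per_year: float
    false_positives_per_year: float  # legitimate mail wrongly filtered
    cost_per_false_positive: float   # assumed flat cost of one lost or delayed message
    training_one_time: float = 0.0
    customization_one_time: float = 0.0


def total_cost(s: SolutionCosts, years: int, hourly_rate: float) -> float:
    """Total cost of ownership over `years`: one-time costs plus recurring costs."""
    one_time = s.training_one_time + s.customization_one_time
    recurring = (
        s.license_per_year
        + s.support_per_year
        + s.admin_hours_per_year * hourly_rate  # manpower to manage the product
        + s.false_positives_per_year * s.cost_per_false_positive  # cost of inaccuracy
    )
    return one_time + recurring * years


def cheaper(a: SolutionCosts, b: SolutionCosts, years: int, hourly_rate: float) -> str:
    """Name of the solution with the lower total cost over the same period."""
    cost_a = total_cost(a, years, hourly_rate)
    cost_b = total_cost(b, years, hourly_rate)
    return a.name if cost_a <= cost_b else b.name
```

Running `total_cost` for each candidate over the same horizon (say, three years) gives the single side-by-side figure managers tend to ask for, while keeping the cost of inaccuracy visible as its own line item.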

The biggest challenge in justifying many open source projects I've run into is finding solid support channels that your boss can rely on if you get hit by a bus (or in his mind, fired). Support is, in many cases, a requirement but not all good open source projects see the benefit in offering support. A lot of companies are willing to pay just to have someone they can call when they have a problem. So if you can find a project that's got a pool of support you can draw out of, you can not only use that to justify the project to your manager, but kick a few bucks back into the open source community. I started offering support contracts for dspam primarily because people needed them in order to get the filter approved as a solution. I think I do a good job supporting my clients that do need help, but at least half of them just pay for a contract and never use it. I certainly don't have a problem with that, and it supports the project as well as the people investing time in it.

Goo.cc asks a two parter:
1. In your new book, you basically state that Bogofilter is not a bayesian filter, which was news to some of the Bogofilter people I have spoken to. Can you explain why you feel that Bogofilter is not a bayesian filter?

Jonathan Responds:
Bogofilter uses an alternative algorithm known as Fisher-Robinson's Chi-Square. Gary Robinson (Transpose) basically built upon Fisher's Inverse Chi-Square algorithm for spam filtering, which provided some competition for the previously widely accepted Bayesian approach. Therefore, Bogofilter is not technically a Bayesian filter. The term "Bayesian", however, has become a common buzzword for statistical content filtering in general (even when it isn't Bayesian), and so Bogofilter often gets thrown into the same bucket. CRM114 is another good example of this - many people throw it in the same bucket as a Bayesian filter, but it is configured (by default, at least) to be a Markovian-based filter which is "almost entirely nothing like Bayesian filtering". Technically, CRM114 isn't a filter at all, but a filtering-language JIT compiler (it can be any filter). I cover all of these mathematical approaches in Ending Spam, so grab a copy if you're interested in learning about their specific differences.
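To make the distinction concrete: a Graham-style Bayesian combiner multiplies token probabilities directly, Πp / (Πp + Π(1-p)), whereas Robinson's method feeds those products into an inverse chi-square test. The sketch below follows the published description of Robinson's technique (in the SpamBayes-style formulation); it is not Bogofilter's actual code. The `s` and `x` defaults are commonly cited starting values and should be treated as assumptions, and tokenization and the selection of "interesting" tokens are left out.

```python
import math
from typing import Sequence


def chi2q(x2: float, dof: int) -> float:
    """P(chi-square with `dof` degrees of freedom >= x2); closed form for even dof."""
    assert dof % 2 == 0, "this closed form only holds for even degrees of freedom"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_f(spam_count: int, ham_count: int, n_spam: int, n_ham: int,
               s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's smoothed per-token spam probability f(w).

    spam_count/ham_count: training messages of each class containing the token.
    n_spam/n_ham: total training messages of each class.
    """
    spam_ratio = spam_count / max(n_spam, 1)
    ham_ratio = ham_count / max(n_ham, 1)
    denom = spam_ratio + ham_ratio
    p = spam_ratio / denom if denom else x
    n = spam_count + ham_count
    # Shrink rarely seen tokens toward the neutral value x.
    return (s * x + n * p) / (s + n)


def fisher_robinson_score(probs: Sequence[float]) -> float:
    """Combine per-token probabilities into one score in [0, 1] (1 = spam)."""
    n = len(probs)
    if n == 0:
        return 0.5
    eps = 1e-6  # keep the logarithms finite
    ps = [min(max(p, eps), 1.0 - eps) for p in probs]
    # Evidence that the tokens are collectively spammy / hammy.
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0
```

A score near 0.5 means the evidence is balanced or absent, which is why chi-square filters can report an "unsure" band instead of forcing every message into spam or ham.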

2. Bayesian filters have been around for some time now, but there still seem to be no standardized testing methods for determining how well filters work in comparison to one another. Do you think that comparative testing would be useful and, if so, how should it be performed?

Jonathan Responds:
Part of the reason there's no standardized testing methodology is because there's no standardized filter interface. A few individuals have attempted to build spam "jigs" for testing filters, but the bigger problem is really lack of an interface. About a year ago, the ASRG was reportedly working on developing such a standard - but as things usually turn out, it's an extremely long and painful process to get anything done when you've got a committee building it (take the mule, for instance, which was a horse built by a committee). This is probably why filter authors have also been hesitant to try and accommodate their filters to a particular testing jig. Incidentally, this is how I surmise that SPF could not have possibly made it through the ASRG - the fact that it made it out at all suggests that it never went in.

I think it's of some interest to compare the different filters out there, but it's also somewhat of a pointless process too. Since these systems learn, and learn based on the environment around them, only a simulation and not a test, will really identify the true accuracy of these filters - and even if you can build a rock solid simulation, it will only tell you how well each filter compared for the test subject's email. If we are to have a bake-off of sorts, it definitely ought to include ten or more different corpora from different individuals, from different walks of life. Even the best test out there can't predict how a filter might react to your specific mail, and for all we know the test subjects may have been secretly into ASCII donkey porn (which will, in fact, complicate your filtering).
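Because there is no standard filter interface, any harness has to invent one. Purely as an illustration, the sketch below assumes a hypothetical two-method interface (`classify` and `train`; neither name comes from the ASRG or any existing standard) and replays each user's corpus in delivery order, scoring every message before the filter learns from it. That is the "simulation rather than a test" distinction: each corpus gets a fresh filter, and results are kept per corpus rather than pooled. The `KeywordMemory` class is a toy stand-in, not a real filter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol, Set, Tuple


class SpamFilter(Protocol):
    """Hypothetical minimal interface: classify first, then learn the true label."""

    def classify(self, message: str) -> bool: ...
    def train(self, message: str, is_spam: bool) -> None: ...


@dataclass
class Result:
    total: int = 0
    false_positives: int = 0  # ham marked as spam
    false_negatives: int = 0  # spam let through

    @property
    def accuracy(self) -> float:
        if self.total == 0:
            return 0.0
        return 1.0 - (self.false_positives + self.false_negatives) / self.total


def simulate(flt: SpamFilter, corpus: Iterable[Tuple[str, bool]],
             train_on_error_only: bool = True) -> Result:
    """Replay one user's mail in order; each message is scored before it is learned."""
    result = Result()
    for message, is_spam in corpus:
        verdict = flt.classify(message)
        result.total += 1
        if verdict and not is_spam:
            result.false_positives += 1
        elif is_spam and not verdict:
            result.false_negatives += 1
        # Mimic a user correcting mistakes (or, optionally, training on everything).
        if verdict != is_spam or not train_on_error_only:
            flt.train(message, is_spam)
    return result


def bake_off(make_filter: Callable[[], SpamFilter],
             corpora: Dict[str, List[Tuple[str, bool]]]) -> Dict[str, Result]:
    """Run a fresh filter over each person's corpus and keep the results separate."""
    return {name: simulate(make_filter(), corpus) for name, corpus in corpora.items()}


class KeywordMemory:
    """Toy filter for demonstration: flags messages containing words seen in spam."""

    def __init__(self) -> None:
        self.spam_words: Set[str] = set()

    def classify(self, message: str) -> bool:
        return any(word in self.spam_words for word in message.split())

    def train(self, message: str, is_spam: bool) -> None:
        if is_spam:
            self.spam_words.update(message.split())
```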

This is why some people misunderstand my explanations of dspam's accuracy. All I've said in the past is "this is the accuracy I get", and "this is the accuracy this dude got". That is the equivalent of "our lab mice ate this and grew breasts". There's no guarantee anybody else is going to get those results, though I'm sure many would try (with the mice, that is). In general, though, I try to publish what I think are good "average" levels for users on my own system, and they are usually around 99.5% - 99.8%. In other words: your mileage may vary. So try it anyway. Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and run a successful test with an overall accuracy of 99.23% (not bad for a simulation).

What would be far more interesting to me would be a well-put-together bakeoff between commercial solutions and open source solutions. The open source community around spam filtering really has got the upper hand in this area of technology, and I'm quite confident F/OSS solutions can blow away most commercial solutions in terms of accuracy (and efficacy).

Mxmasster asks:
Most antispam software seems to be fairly reactionary - whether it is based on keyword patterns, URLs, sender, IP, or the checksum of the message, a certain amount of spam has to first be sent and identified before additional messages will be tagged and blocked. SPF, DomainKeys, etc. require a certain percentage of the Internet to adopt them before they will be truly effective. What do you see on the horizon as the next big technique to battle spam? How will this affect legitimate users on the Internet?

Jonathan Responds:
That's the problem with most spam solutions, and why I wrote Ending Spam. Bayesian content filtering, commonly thrown into this mix, has the unique ability to grow out of your typical reactive state and become a proactive tool in fighting spam. I get about one spam per month now at the most, and DSPAM is learning many new variants of spam as it catches them; I'd call that pretty proactive. Spam, phishing, viruses, and even intrusion detection are all areas that can benefit greatly from this approach to machine learning. They will likely never become perfect, but these filters have the ability not only to adapt to new kinds of spam, but also to learn them proactively before they make it into your inbox. Some of this is done through what is called "unsupervised learning" rather than traditional training, while other tools, such as message inoculation and honey pots, can help automate the sharing of new spam and virus strains before anyone has to worry about seeing them. We haven't explored statistical analysis thoroughly enough yet for there to be a "next big technique" beyond this. The next big techniques seem to be trying to change email permanently, and I don't quite feel excited about that. Statistical tools are where I think the technology is at, and they need to become commonplace and easier to set up and run.

The problem seems to be the myth that statistical filtering is ineffective or incomplete. Many commercial solutions pass themselves off as statistical(ish), and they seem to be contributing to this myth by failing to match the levels of accuracy many of the true (and open source) statistical filters are reaching. Any commercial solution that claims to be an adaptive, content-based solution (like Bayesian filters are) really ought to deliver better than 95% or 99% accuracy. Part of the problem is just bad marketing - most of these tools are not true "Bayesian" devices; they just threw a Bayesian filter in there somewhere so they could use the buzzword. Another problem is design philosophy and the idea that you need an arsenal of other, less accurate tests to be bolted in front of the statistical piece. If you're going to train a Bayesian filter with something other than a human being, whatever it is that's training it ought to be at least as smart as a human being. Blacklist-trained Bayesian filters are being fed with about 60% accurate data (whereas a human is about 99.8% accurate). So it's no surprise to me that blacklist-trained filters are severely crippled - what a dumb combination. If you really want to combine a bunch of tools for identifying spam, build a reputation system instead. Reputation systems do a very good job of cutting spam off at the border, are generally more scalable than content-based filtering, and most large networks can justify their accuracy by their precision.

Not all commercial content-based filters are junk. Death2Spam is one exception to this, and delivers around 99.9% accuracy, which is in the right neighborhood for a statistical filter. Not all reputation systems are junk either. CipherTrust's TrustedSource is one example of what I call a well-thought out system. If you must have a commercial solution, either of these I suspect will make you quite happy. As for (most of) the rest, quit screwing around and build something original that actually works.

Jnaujok asks:
The SMTP standard that we use for mail transfer was developed in the late 70's - early 80's and has, for the most part, never been updated. In that time period, the idea of hordes of spam flowing through the net wasn't even considered. It has always been the most obvious solution to me that what we really need is SMTP 2.0. Isn't it about time we updated the SMTP standard?

Jonathan Responds:
You're talking about an authenticated approach to email, and there have been many different standards proposed to do this. First let me say that, even though SMTP was drafted a few decades ago, it's still successful in performing its function, which is a public message delivery system - key word being public. There exist many private message delivery systems already, which you could opt to use, including bonded sender and even rolling your own using PGP signatures and mailbox rules. I have reservations about forcing such a solution on everybody and breaking down anonymity for the sake of preventing junk mail. Until you can sell a company like Microsoft on absolute anonymity in bonded sender and sell ISPs on putting up initial bonds for their customers (so that a ten-year-old grade-school student can still use email), I see a very large threat (especially by the government) in globalizing this as a replacement for the 'public' system. With services like Gmail, where you can store an entire life's worth of email, everything you've ever said could be traced back to you and used against you; given that, I would rather deal with the spam. Why? Let me pull out my tinfoil hat...

It's been advertised plenty of times on Slashdot that Google stores everything about all of its queries. It wouldn't surprise me if they already have government contracts in place to perform data mining on specific individuals. How would you like, in the future, all of your email to be mined and correlated with other personal data to determine whether or not you should be allowed to fly? Buy a firearm? Rent a car? We're not very far off from that, and even less so once this correlation is made possible.

So abstract some level of anonymity at the ISP-level you say? That's just not going to happen. For one, that makes it just as simple for a spammer to abuse somebody's network and then we've gone and redesigned SMTP for no good reason. Remember, business has to be able to set up shop online fairly easily and spammers are a type of shop. So we are always going to balance between free enterprise and letting spammers roam on the network. Should we employ a CA, how much would it cost to run your own email server? More importantly - does this perhaps open the door for per-email taxes? I'd much rather just deal with spam the way we are now. For another thing, abstracted identity architectures would only give you a level of anonymity parallel to the level of anonymity you have when you purchase a firearm (where the forms are stored by your dealer, rather than filed to a central government agency). See how long it takes for the feds to trace your handgun back to you if you leave it at the scene of a crime.

You can't leave it in the ISP's control anyway. The sad truth is that most ISPs still don't care about managing outgoing spam on their network, so new spammers are being given a nurturing environment to break into this new and exciting business. I had a recent bout with XO Communications about one such new spammer who had been running a full-blown business on their network since 1997 and recently decided he'd like to start spamming under the "CAN-SPAM" act (which he was convinced defended his right to spam). He included his phone number, address, and web address in the spam - I called him up and verified he was who he said he was (the owner of this business, and spamming). I provided all of this information (over a phone call) to the XO abuse rep (let's call him "Ted"), even filed a police report, and XO still to this day has done nothing. His site is even still there, selling the same crap he spams for. This happens every day at ISPs out there.

The consequences outweigh the benefits. The people who drafted the SMTP protocol probably thought of most of these issues too. A public system can't exist without the freedom to remain anonymous, ambiguous, and the right to change your virtual identity whenever the heck you like.

Sheetrock asks a two parter:
1. In the past, I've heard it suggested that anti-spam techniques often go too far, culling good e-mail with the bad and perhaps even curtailing 1st Amendment rights. Clearly this depends on what end of the spectrum you're on, but recent developments have given me pause for thought on the matter. For example, certain spam blacklists would censor more than was strictly necessary (a subjective opinion, I realize) to block a spammer -- sometimes blocking a whole Class C to get one individual. This would cause other innocent users in that net space to have their e-mail to hosts using the blacklists silently dropped without any option of fixing the problem besides switching ISPs.

Jonathan Responds:
A lot of blacklists have started taking on a vigilante agenda, or at the very least rather questionable ethical practices. Spamhaus' recent blacklisting of all Yahoo! Store URLs (and Paul Graham's website) is a prime example of this. As long as you're subscribed to human-operated blacklists, you're going to suffer from someone's politics. That's one of the reasons I coded up the RABL, which is a machine-automated blacklist. There is also another called the WPBL (weighted private block list). As the politics of the organizations running human-maintained lists get worse, I think more of these automated lists will start to pop up. Machine-automated blacklists don't have an agenda - they have a sensitivity threshold. It's much easier to find the right list with the right threshold than it is to find the right politics (and then keep tabs on them to make sure they don't change). The RABL, for example, measures network spread rather than number of complaints. If a spammer has affected more than X networks, they are automatically added to the system, and removed after being clear for six hours (no messy cleanup). Another nice thing about machine-automated blacklists is that they are really real-time blacklists, and capable of catching zombies and other such evils with great precision.
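The listing rule described above is simple enough to sketch. The following is a minimal in-memory illustration, not the RABL's implementation: the class name, the report-ingestion method, and the choice of a /24 as the unit of "network" are all assumptions. It lists a source once reports arrive from more than `threshold` distinct networks and drops it after six hours with no new reports, with no manual cleanup.

```python
import ipaddress
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class _Episode:
    networks: Set[str] = field(default_factory=set)  # distinct reporting networks
    last_report: float = 0.0


class NetworkSpreadBlacklist:
    """Hypothetical machine-maintained blacklist driven by network spread."""

    def __init__(self, threshold: int, clear_seconds: float = 6 * 3600) -> None:
        self.threshold = threshold          # list once MORE than this many networks report
        self.clear_seconds = clear_seconds  # delist after this long with no reports
        self.episodes: Dict[str, _Episode] = {}

    @staticmethod
    def network_of(reporter_ip: str) -> str:
        # Assumption: each /24 counts as one reporting network.
        return str(ipaddress.ip_network(f"{reporter_ip}/24", strict=False))

    def report(self, source_ip: str, reporter_ip: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        ep = self.episodes.get(source_ip)
        if ep is None or now - ep.last_report >= self.clear_seconds:
            # The source was clear for the whole window: start over with no history.
            ep = _Episode()
            self.episodes[source_ip] = ep
        ep.networks.add(self.network_of(reporter_ip))
        ep.last_report = now

    def is_listed(self, source_ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        ep = self.episodes.get(source_ip)
        if ep is None or now - ep.last_report >= self.clear_seconds:
            return False  # never reported, or clear for the full window
        return len(ep.networks) > self.threshold
```

The only tunable policy here is the sensitivity threshold, which is the point made above: choosing a list becomes a matter of choosing a threshold rather than vetting someone's politics.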

NOTE: I haven't had time yet to bring the RABL into full production, but am interested in finding more participants to bring us out of testing.

2. This is an extreme example, but most anti-spam approaches have the following characteristics: They are implemented on a mail server without fully informing the users of the ramifications (or really informing them at all). They block messages without notification to the sender, causing things to be silently dropped. Even if the recipient becomes aware of the problem, few or no options are given for the recipient to alter this "service".

Jonathan Responds:
I've run into issues like this with my ISP (Alltel), and I agree with a lot of what you're saying. In the case of Alltel, not only are they filtering inbound messages using blacklisting techniques and other approaches they don't care to tell me about, but they are filtering outbound messages as well. I had to eventually give up using their mail server because I could not adequately train my own spam filter (Alltel would block messages I forwarded to it). To make matters worse, there is no way to opt out of this type of filtering on their network, even though I offered to give them the IP address of my remote mail server. This clearly does affect their customers, and I feel there are censorship, violation of privacy and denial of service issues all going on here. (Somebody please sue them by the way).

Fortunately, I don't think this issue is as widespread as you might think. Many of the ISPs and colleges I've worked with are, unlike Alltel, very dedicated to ensuring that their tools only provide a way for their users to censor themselves. I think this ought to be a requirement for any publicly used system. Specifically...

1. The user must be able to opt in or out of all aspects of filtering
2. All filtering components and their general function must be fully disclosed
3. The user must be able to review and recover messages the system filtered

Opting out of RBLs is as easy as having two separate mail servers and letting users home their mailboxes on whichever box they want. I would strongly advise making sure your solution is capable of receiving instruction from a user to improve its results, although that is still very difficult to scale to millions of users. At the very least, filtering should be fully disclosed, recoverable, and removable.
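One way to picture those three requirements in a system's design is a per-user policy object. The sketch below is hypothetical and mirrors no real product's API; it assumes each filtering component can be modeled as a named yes/no check, makes every component opt-in by default, publishes a description of each one, and quarantines rather than silently drops anything it filters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A check returns True if it considers the message spam.
Check = Callable[[str], bool]


@dataclass
class FilterComponent:
    name: str
    description: str  # disclosed to users (requirement 2)
    check: Check


@dataclass
class UserFilterPolicy:
    """Hypothetical per-user policy honoring the three requirements above."""
    components: List[FilterComponent]
    enabled: Dict[str, bool] = field(default_factory=dict)  # requirement 1: opt in/out
    quarantine: List[Tuple[str, str]] = field(default_factory=list)  # requirement 3

    def disclose(self) -> List[str]:
        """Every component and its general function, in plain language."""
        return [f"{c.name}: {c.description}" for c in self.components]

    def set_enabled(self, name: str, on: bool) -> None:
        self.enabled[name] = on

    def deliver(self, message: str) -> bool:
        """True means deliver to the inbox; filtered mail is kept, never dropped."""
        for c in self.components:
            # Components are off unless the user explicitly opted in.
            if self.enabled.get(c.name, False) and c.check(message):
                self.quarantine.append((c.name, message))
                return False
        return True

    def recover(self, index: int) -> str:
        """Let the user pull a filtered message back out of quarantine."""
        _, message = self.quarantine.pop(index)
        return message
```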

An Anonymous Coward asks:
Without going into the truths of the beliefs in question, which I'm sure will be debated enough in the Slashdot thread anyway (and I hope you'll join in), what do you think the reason is that so many scientists, nerds and people otherwise rather similar to you think your beliefs are obviously incorrect? Do you think they are all deluded? Do you agree that there might be a possibility that your beliefs are not rational?

Jonathan Responds:
The beliefs I hold as a Christian aren't always the popular ones, but they're certainly valid arguments for anyone who cares to ask about them (not that that has happened). When you read about someone's beliefs, you have the option to engage in discussion, or to filter his or her beliefs through your own belief system. The former option involves cognitive thought, however the latter is how most people today respond to anything that even smells religious. And I say this coming from the position of someone who hasn't tried to shove my beliefs down anyone's throat - I merely documented them on my personal website. That tells me that some people don't believe I have the right to my own beliefs - how asinine is that?

But to address the question, my beliefs aren't based on some religious intellectual suicide. In fact, the Bible teaches that you should know what you believe and why, and that you should even be prepared to give a defense for your faith, so the Bible encourages sound thinking rather than the pontificated ideal structure many quickly dismiss it as. I didn't dumb down when I became a Christian; if anything, I began to think more clearly. I was raised in the same public school system as everyone else and didn't even know who Jesus Christ was until around my junior or senior year of high school. Since my early days in kindergarten I've read how "billions of years ago, when dinosaurs roamed the earth", and I've been taught the theory of evolution like everyone else. The problem, though, is that no matter how credible a particular area of science is or isn't, much of what is out there is taught based on authority. I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process. It's recently become a "religious act" to question science in any capacity, but isn't questioning science the only way we can tell the good science from the bad science? And there is a lot of great science out there - even in public schools. But students no longer have a way to evaluate the credibility of what they're being taught, and that seems to be degrading the quality of the subject. Science should be a quest for the truth, with no presuppositions and a proper understanding of the differences between hypotheses, theories, and laws. When a theory is presented in the classroom as law and isn't held accountable to method, it has degenerated into mere conditioning.

I've spent a considerable amount of time studying topics such as the age of the earth and the theory of evolution, and I could probably argue them quite well if I were inclined to engage in a discussion. That's important if you're going to believe anything, really - including whatever the mainstream secular agenda happens to be.

Just as an example, I've recently looked into Carbon-14 dating and found that in cross-referencing it to Egyptian history (which dates back as far as 3500 B.C. and is held in very high regard by archaeologists and scientists alike), there is evidence that Carbon dating may be inaccurate beyond around 1800 B.C. For someone not to consider that would be ignoring science. My point here is that my beliefs aren't merely unfounded, eccentric ideas. Just because microevolution is feasible doesn't mean I'm going to sweep macroevolution under the rug and not test it - the two are actually worlds apart, just cleverly bundled. The Bible has given me a perspective that seems to offer a reasonable and sensible way to put the different pieces of good science together. No matter what you believe, I strongly feel that you should have some factual foundation to support whatever it is, and if you don't, then be man enough to admit you only have a theory put together.

No matter what side of the camp you are on, your beliefs require a certain amount of faith, as neither side is at present proven scientifically. I don't have all the answers, but I don't think science in its present state does either. At the end of the day, you can't prove the existence of God factually, and so whatever you believe is still based on faith. But at least the Christians can admit that - I just wish the evolutionists would too.
  • by webby123 (911327) on Tuesday August 30, 2005 @03:20PM (#13438610)
    First Question: How do you pronounce your name?
    • proving a theory? (Score:3, Insightful)

      by measlymonkey (750045)
      I always find it laughable when 'intelligent' people counter the 'theory of evolution' by rolling out a statement like:

      "I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process."

      since 'the theory of evolution' falls under the Scientific definition of theory...

      a plausible or scientifically acceptable general principle or body of principles offered to explain phenomena

      here is a good one Jonathan, explain
      • Re:proving a theory? (Score:4, Informative)

        by Valiss (463641) on Tuesday August 30, 2005 @05:39PM (#13439608) Homepage
        "I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process."


        I hate crap like that. Scientific American had a great article a while back that explains this as well as I ever could. Here, I found a copy of the article (Scientific American wants you to register to read the original on their site):

        "1. Evolution is only a theory. It is not a fact or a scientific law.

        Many people learned in elementary school that a theory falls in the middle of a hierarchy of certainty--above a mere hypothesis but below a law. Scientists do not use the terms that way, however. According to the National Academy of Sciences (NAS), a scientific theory is "a well-substantiated explanation of some aspect of the natural world that can incorporate facts, laws, inferences, and tested hypotheses." No amount of validation changes a theory into a law, which is a descriptive generalization about nature. So when scientists talk about the theory of evolution--or the atomic theory or the theory of relativity, for that matter--they are not expressing reservations about its truth."

        PDF version: http://www.swarthmore.edu/NatSci/cpurrin1/textbookdisclaimers/wackononsense.pdf [swarthmore.edu]

        Original: http://www.sciam.com/article.cfm?articleID=000D4FEC-7D5B-1D07-8E49809EC588EEDF [sciam.com]

        Enjoy! (flame on)
    • The Z, d, z, and i are all silent. Oh yeah, and there's an invisible "J" in the name. Duh.

      Hey, at least he doesn't have a silent "3" in his name.

  • Very interesting (Score:3, Insightful)

    by Sinryc (834433) on Tuesday August 30, 2005 @03:26PM (#13438655)
    Very, VERY interesting. I have to thank him for making a good statement about faith. A very brave and very good man.
  • n00B! (Score:3, Insightful)

    by Anonymous Coward on Tuesday August 30, 2005 @03:27PM (#13438661)
    Does anybody remember what the world was like before Google? None of us do, primarily because we couldn't find it.

    YES, I do remember, you noob.

    Google is nothing new; before them there were a few engines that did the job fine. There was even a web-based FTP search engine. Where is that, Google? Where is that?

    • Re:n00B! (Score:3, Informative)

      by kashani (2011)
      I believe you're thinking about that golden period in '96 when any ol' search engine could index the net, Yahoo actually rated sites, etc. Yeah, you could search then and it wasn't bad.

      Fast forward a year or two and we see Internet content outstripping Moore's law, among other things. You might have been able to find something if you read 5-10 pages of search results... maybe. Google's success was that it appeared about the time other search engines were failing miserably. Yeah they all had the same results, bu
    • Re:n00B! (Score:2, Interesting)

      by jonadab (583620)
      > > Does anybody remember what the world was like before Google?
      > YES I do remember you noob.

      Yeah, me too.

      > Google is nothing new, before them there were a few engines that did
      > the job fine.

      In 1995 they did, but by 1998 or thereabouts, things had degenerated rather badly. It got to the point eventually where you could go through three or four *pages* of results looking for the thing you wanted. I had conditioned myself to go straight for the advanced boolean search and construct complex cr
  • by 00_NOP (559413) on Tuesday August 30, 2005 @03:31PM (#13438676) Homepage
    If I write some code and license it under the GPL and something else, what is the problem?
    You can take the GPL code and do what you like with it under the GPL, but if I choose to license what I have written under BSD (say) as well, then what is the problem? It is going way OTT to take that away from me if I am gifting my work back to the community with the GPL. This is why I always stipulate that my code is licensed under GPL v2 and not any subsequent version - no self-appointed guardian has the right to take away my freedom to dual-license code.
  • At least his first name isn't Jathan. He'd be stuck explaining his name for another five minutes to people as well.
    Me: My name is Jathan.
    Response: Woah. Were your parents stoned when they named you?
    Me: haha, yeah that's funny. It's kind of like saying Jason with a lisp.
    Response: Thats great caus I half a lithsp
    Me: Oh, sorry, it's like Nathan with a J then.

    I feel your pain, Mr. Zdziarski....
  • by pclminion (145572) on Tuesday August 30, 2005 @03:36PM (#13438709)
    Although there don't seem to be any standards in the Open Source community for this, there are definitely standards in the academic community. Spam filters are a subset of machine learners, and there are very specific and well accepted ways of comparing machine learners.

    Typically what is done is to select a range of filters/learners that you want to evaluate. A test dataset is also selected (in this case, it would be an archive of spam and nonspam messages, correctly classified). An M-way N-fold cross validation is performed. What this means is that the data set is split into N parts, and N runs are conducted for each classifier, each time training on N-1 of the parts and testing the learner on the remaining part. Each run holds out a different part of the data set.

    This ENTIRE procedure is repeated M times (with a fresh random split each time), giving M*N results per learner. For each pair of learners, a t-test is applied to their two columns of results; because both learners are run on the same splits, this is normally a paired t-test. It tells you whether the variation in performance is statistically significant. Usually, a 5% or 1% significance threshold is used.

    Once that is completed, something called a WLT table is computed. Each time a learner defeats another learner on a given test, its W ("Win") counter is incremented. Likewise, when a learner loses, the L ("Loss") counter is incremented. When two learners tie (i.e., when the variation is not statistically significant), the T ("Tie") counter is incremented.

    The overall "winner" of the comparison is the learner with the maximum value of W-L.

    This sounds complicated and bloated, but it is, in fact, how machine learners are tested in academia. The cross validation method, along with checks for statistical significance, is critical to achieving a valid comparison. Simply running the tests once and saying "This filter got 98% correct, and this other filter got 95% correct -- therefore the first filter is better" is NOT sufficient.
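The procedure above can be sketched in Python. This is a minimal sketch, not a reference implementation: the function names, the toy one-feature "corpus" and the two toy learners are hypothetical, W/L/T is counted once per pair of learners on a single data set, and the significance test uses a normal approximation to the t distribution (reasonable only when M*N is fairly large; `scipy.stats.ttest_rel` gives exact paired-test p-values).

```python
import math
import random
import statistics
from itertools import combinations
from statistics import NormalDist


def cross_validate(learners, data, m, n, seed=0):
    """M repetitions of N-fold cross-validation.

    learners maps a name to a factory: factory(train) -> predict(x).
    Returns {name: [accuracy, ...]} with M*N scores per learner; score i of
    every learner comes from the same train/test split, so they can be paired.
    """
    rng = random.Random(seed)
    scores = {name: [] for name in learners}
    for _ in range(m):
        shuffled = list(data)
        rng.shuffle(shuffled)
        folds = [shuffled[i::n] for i in range(n)]
        for k in range(n):
            test = folds[k]
            train = [ex for i, fold in enumerate(folds) if i != k for ex in fold]
            for name, factory in learners.items():
                predict = factory(train)
                correct = sum(predict(x) == y for x, y in test)
                scores[name].append(correct / len(test))
    return scores


def wlt_table(scores, alpha=0.05):
    """Paired test for every pair of learners -> Win/Loss/Tie counters."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    table = {name: {"W": 0, "L": 0, "T": 0} for name in scores}
    for a, b in combinations(scores, 2):
        diffs = [x - y for x, y in zip(scores[a], scores[b])]
        mean = statistics.mean(diffs)
        sd = statistics.stdev(diffs)
        if sd == 0:
            significant = mean != 0
        else:
            t = mean / (sd / math.sqrt(len(diffs)))
            significant = abs(t) > crit
        if not significant:
            table[a]["T"] += 1
            table[b]["T"] += 1
        elif mean > 0:
            table[a]["W"] += 1
            table[b]["L"] += 1
        else:
            table[b]["W"] += 1
            table[a]["L"] += 1
    return table


def winner(table):
    """The learner with the largest W - L."""
    return max(table, key=lambda name: table[name]["W"] - table[name]["L"])


# Toy learners on a hypothetical one-feature corpus (x = some score, y = is_spam).
def majority(train):
    label = 2 * sum(y for _, y in train) >= len(train)
    return lambda x: label


def threshold(train):
    spam = [x for x, y in train if y]
    ham = [x for x, y in train if not y]
    cut = (statistics.mean(spam) + statistics.mean(ham)) / 2
    return lambda x: x > cut


rng = random.Random(1)
data = [(rng.gauss(2.0 if y else 0.0, 1.0), y) for y in [True] * 100 + [False] * 100]
scores = cross_validate({"majority": majority, "threshold": threshold}, data, m=5, n=10)
table = wlt_table(scores)
print(table, winner(table))
```

With these toy learners the threshold classifier should record one win and the majority-vote baseline one loss. Plugging in real filters only requires wrapping each one as a `factory(train) -> predict` function.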

  • by scovetta (632629) on Tuesday August 30, 2005 @03:38PM (#13438725) Homepage
    I use SpamAssassin (server) and SpamPal (client). They're both quite accurate and I'm very happy with them.

    However, I've had unacceptably high false-positive rates. Saying that you only get one spam a day is fine--I can deal with that. Are you sure that no legitimate e-mail is being tagged though? I have the subject lines prefixed with [SPAM] and so I just go through and look for anything that looks like it might not be spam. This process takes about 10 minutes a day, which is 10 minutes more than I would care to spend.

    I give the anti-spam developers credit for their hard work, but I believe that the best solution would not be filter-based, for the simple fact that if 1 spam gets through a day and the volume of spam increases 100x in the next 2 years, then you're back up to ~100 spams a day. It's a temporary solution to a permanent problem.

    Just my $0.02.
    • However, I've had unacceptably high false-positive rates. Saying that you only get one spam a day is fine--I can deal with that. Are you sure that no legitimate e-mail is being tagged though?

      After a few months of learning, DSPAM has gotten pretty good about not giving me very many false positives. I'd say my FP rate is about the same as my FN rate, perhaps one per month. DSPAM has some integrated false-positive protection code called "statistical sedation" which cuts off after it learns enough mes
    • > However, I've had unacceptably high false-positive rates.

      I've been using Death2Spam for about 4-5 months now and get almost no FPs.

      As far as scanning [SPAM] for "just in case" FPs, I have my client route those messages to a SPAM folder that I look at every couple days. All I do is glance at the subject line first and hit DELETE if it's an obvious spam. If not, I look at the sender and hit DELETE if I don't recognize the sender name. I think I average about one second per message...at ~30 messages p

    • Maybe you should tweak your spam settings. If you're getting more than one spam or so a month, then you're getting too many. I've used Thunderbird's spam filters, Gmail, and SpamAssassin indirectly through Evolution. Gmail by far has the highest accuracy: I've had my account for about 1 year and 3 months, I have a little over 3,000 emails (not including spam), and the accuracy is 100%. I kid you not, every single email has been correctly identified, which is one of the main perks I think most people use it
    • ``However, I've had unacceptably high false-positive rates.''

      Maybe you should consider using Bayesian filters instead. IIRC, SpamAssassin can do Bayesian filtering, but SpamPal (from a quick glance at the site) seems unable to do so.

      Empirical evidence shows that Bayesian spam filters achieve higher recall (the fraction of spam messages that were classified as spam) and much higher precision (the fraction of messages classified as spam that actually were spam) than other filters.
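Both metrics can be computed directly from a labelled test run. A minimal sketch, treating spam as the positive class; the function name and the message counts in the example are invented for illustration:

```python
def precision_recall(labels, predictions):
    """labels/predictions are parallel sequences of booleans, True meaning spam.

    recall    = spam caught / all spam
    precision = spam caught / everything flagged as spam
    Returns 0.0 for a metric whose denominator is zero (a convention, not a standard).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)      # spam flagged as spam
    fp = sum(1 for y, p in pairs if not y and p)  # good mail flagged as spam
    fn = sum(1 for y, p in pairs if y and not p)  # spam that got through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented example: 8 spam and 2 good messages; the filter catches 6 of the
# spam and wrongly flags 1 good message.
labels = [True] * 8 + [False] * 2
predictions = [True] * 6 + [False] * 2 + [True, False]
precision, recall = precision_recall(labels, predictions)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here recall is 6/8 and precision is 6/7. A false positive lowers precision, which is why precision is the number to watch if losing good mail is what worries you.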

      I've written a very simple Bayesian filter my
    • However, I've had unacceptably high false-positive rates.

      I too had that problem with SpamAssassin version 2.6x, but with 3.0 and later it's been much better. Granted, I use almost every test under the sun, and I have my threshold set to 10.0 for "spam-nasty" and 13.0 for "spam-nasty-to-the-point-of-no-return". Yes, those are the mailbox names :)
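Routing on two score thresholds like this is simple to express. A minimal sketch, assuming SpamAssassin 3.x has already added an `X-Spam-Status` header of the form `Yes, score=14.2 required=5.0 tests=...`; the thresholds and folder names are the commenter's, while the function name and the `INBOX` default are hypothetical. In practice this kind of routing is usually done in the delivery agent (procmail, Sieve, and the like); the Python only shows the logic.

```python
import re
from email import message_from_string

# Thresholds and folder names come from the comment; "INBOX" is hypothetical.
FOLDERS = [(13.0, "spam-nasty-to-the-point-of-no-return"), (10.0, "spam-nasty")]
# Assumes the SpamAssassin 3.x header style, e.g.
# "X-Spam-Status: Yes, score=14.2 required=5.0 tests=..."
SCORE_RE = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)")


def route(raw_message, default="INBOX"):
    """Pick a folder from the SpamAssassin score recorded in the message headers."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    match = SCORE_RE.search(status)
    if match is None:
        return default  # not scanned, or an unexpected header format
    score = float(match.group(1))
    for threshold, folder in FOLDERS:  # highest threshold first
        if score >= threshold:
            return folder
    return default
```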

      I also have a custom plugin for SpamAssassin that does non-linear postprocess scoring of SPAM. What it does is gives 3 points for every SPAM subject rule hit, one
  • And it was +5 Interesting. Anyone want to take a crack at it?

    http://slashdot.org/comments.pl?sid=160001&cid=13392902 [slashdot.org]

    • What advice do you have as a developer of this program to: * Help my users send legitimate messages (either by education (specifically) or by programming techniques) * Help Spam Filtering Software check the messages my program sends out for possible abuse * Be a part of the solution to sending legitimate messages to many people, rather than perhaps be part of the problem.

      I had written up an answer to this one, but it turned out not to appear in the interview questions, so it got bitcanned. I believe t
  • by PhYrE2k2 (806396) on Tuesday August 30, 2005 @03:54PM (#13438813)
    ...The ignorance of these people on such topics is astounding, and I find many approaches I have tried seem to yield no results...


    Bingo. One of my managers at a former employer said it very well: nobody ever got fired for choosing IBM, and he's 100% right. I know a few companies spending millions to have services offered by Dell, IBM, Microsoft, etc. who could get their services for thousands from clone computer makers and Linux, but who would?

    Choose IBM and lose a few million, and you 'missed the market'. Choose open source and lose a few million, and 'your solution wasn't up to par'. Choose open source and succeed, and you make millions...

    Is it worth the risk of the second situation? Most managers who want to leave with a hefty bonus and a good referral would say no.

    PS: Agree 100% with almost everything he said. Smart man.

    -M
  • My spam problem... (Score:3, Interesting)

    by radish (98371) on Tuesday August 30, 2005 @04:07PM (#13438896) Homepage
    My spam problem is the reverse of most people's. Using grey listing I get basically no actual spam. It's wonderful - works very well. But that's not my problem. My problem is that some a$$hole spammer has decided to start using my domain as a from address in his spams. So I'm currently getting deluged by bounce messages for mails I never sent. I've published SPF records and that's helped a bit, but not a lot.

    Anyone got any good suggestions?
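Greylisting, which the comment credits for stopping almost all spam, works by temporarily refusing mail from any (client IP, envelope sender, recipient) combination it has not seen before. A legitimate mail server queues the message and retries later, while much spamware never retries. A minimal sketch of that decision; the class, the 300-second delay and the injectable clock are assumptions, and a real implementation would also expire entries, whitelist known senders and persist its state:

```python
import time


class Greylist:
    """Minimal greylisting sketch keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay_seconds=300.0, clock=time.monotonic):
        self.delay = delay_seconds  # hypothetical default; tune to taste
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return 451 (temporary failure, retry later) or 250 (accept)."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            return 451  # a well-behaved MTA queues the message and retries
        return 250
```

This does nothing for the bounce problem described here: backscatter bounces come from real mail servers, which retry and so pass a greylist.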
    • some a$$hole spammer has decided to start using my domain as a from address in his spams

      This is known as "joe jobbing." It's happened to me a couple of times.

      Anyone got any good suggestions?

      The bounce messages are typically coming from mailservers operated by brain-dead admins. These servers accept the email from trojaned machines, and then bounce it to you when they try to deliver it to the recipient.

      As these servers are obviously misconfigured, block them so they have to deal with the bounce messages (i
    • You say your domain, not your email address, so I assume you have the same problem I had: my mail server had a catch-all system, so all the Firstname_23@mydomain.nl-type addresses the spammer invented came into my inbox. I had this catch-all system because I got the domain from another company but still received the odd mail addressed to them.

      In the end the only thing that worked was disabling the catch-all, which is a simple setting on most servers. From then on, mail to non-existent email addresses bounced back.

      PS All invent
  • Carbon-14 (Score:2, Insightful)

    by Anonymous Coward
    It sure looks like he didn't take the time to research carbon-14, or he would have found out that we don't use carbon-14 much to date dinosaur bones.

    http://science.howstuffworks.com/carbon-142.htm [howstuffworks.com]
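The point can be checked with the standard exponential decay law, N/N0 = (1/2)^(t/T), where T is the half-life. A minimal sketch, assuming the commonly cited C-14 half-life of about 5,730 years; the function names are illustrative, and the sketch ignores the calibration corrections real labs apply:

```python
import math

HALF_LIFE_C14 = 5730.0  # years; the commonly cited modern value


def fraction_remaining(age_years, half_life=HALF_LIFE_C14):
    """Fraction of the original C-14 left after age_years: (1/2) ** (t / T)."""
    return 0.5 ** (age_years / half_life)


def age_from_fraction(fraction, half_life=HALF_LIFE_C14):
    """Invert the decay law: t = T * log2(1 / fraction)."""
    return half_life * math.log(1.0 / fraction) / math.log(2.0)


for age in (5_000, 50_000, 65_000_000):
    print(f"{age:>11,} years: {fraction_remaining(age):.3g} of the original C-14 remains")
```

After 50,000 years, roughly the commonly quoted practical limit of the method, only about a quarter of one percent of the original C-14 is left; at dinosaur ages the computed fraction underflows to zero in double precision, which is why other isotopes with much longer half-lives are used there.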

    Hard-core Christians complain that we aren't researching their opinions, but I see way too much of the same the other way around. If you believe in Carbon-14, then you have to agree that the other science behind the chemistry also works. And in that case that argument for the age of dinosaurs so fall apart for tho
    • by the_raptor (652941) on Tuesday August 30, 2005 @04:43PM (#13439121)
      I am truly sick of people who call themselves Christians but are really practising some wacky superstitious religion that has no place for critical thought.

      To quote Mr Zdziarski's homepage:
      "to teach and to defend what I have come to find is a scientifically beautiful piece of logical harmony - the Bible"

      Ah, so science is a book that is thousands of years old, most of which is not corroborated in secondary sources? A book that is known to have been selectively edited throughout its history for political reasons? So Jesus violating the laws of physics in his numerous miracles is science?

      It certainly has great bits of logic and moral teaching in it (Do unto others, as you would have them do unto you), but it is not science. For someone to call it science shows that they have no understanding of science at all, and it is no surprise that he thinks creationism and evolution should be taught in science class. I was taught creationism at school, but in theology class.

      I spent my entire education in Christian schools. I have spent the last three weeks going to church to reconnect with God. Science does not preclude God. Just because God didn't have to make Adam from mud, after he made the world in six days, doesn't mean there is no God. No matter how much scientific knowledge we get, there will always be room for God (What came before the Big Bang? And how did matter get the properties it has?).

      For me God is the ultimate programmer. No sense doing all the work by hand when you can write some perl scripts to do it for you.

      Science tells us what we can do and how. Religion tells us if we should.
      • These are serious questions. Is "God" the god of the universe or just earth? If "God" is the god of the universe, and "God" made man in his own image, then wouldn't life on all planets on the universe also be made in "Gods" image? Is the bible discussing the birth of the earth or the universe? If the earth, then is there another super bible explaining the universe and its creation?
    • Re:Carbon-14 (Score:2, Interesting)

      by LexNaturalis (895838)
      That assumes that you DO believe that the chemistry behind Carbon-14 dating works. Actually, it assumes that you believe that radioactive decay is constant and that you can accurately determine rate of decay based on an unchanging half-life. I used to believe such things, until I learned an interesting tidbit.

      Russian scientists recently (Well, more like 6-7 years ago) discovered that radioactive decay is not constant. They were opening up some of their nuclear weapons and expected to find N amount of rad
  • I welcome our new Flying Spaghetti Monster [wikipedia.org] overlord.

    Seriously, you can believe whatever you want. It's when you start dissing evolution that we've got a problem: now the burden of proof is yours.

    And you're going to have to do a hell of a lot better than challenging the accuracy of carbon dating. Ideally, you'd have an alternative explanation that wasn't half-baked.
    • It would've been more on topic if you were a follower of SPAM [wikipedia.org]. (Hopefully that link survives.)
    • by slavemowgli (585321) on Tuesday August 30, 2005 @04:43PM (#13439123) Homepage
      I second that. What you believe is one thing, but if you abandon scepticism and value religious doctrine higher than tested scientific theory, then you've got a problem.

      Incidentally, I'm sad to see that Zdziarski tries to pull the same old stunt again that most supporters of creationism try to pull - namely, deliberately misunderstanding the meaning of "theory" in the context of science and equating it with an unproven hypothesis. Everyone who knows a bit about science (which no doubt includes Zdziarski) will know that that's not true, of course, but the general public often doesn't, which is why this kind of tactic is so despicable.

      I'd really like to see a supporter of creationism who says "I don't believe in evolution, but I still acknowledge that it explains the observed facts and has made falsifiable predictions that were, in turn, shown to be correct". But I guess that's something you just won't hear from someone who puts his personal faith above the scientific method, as far as the search for scientific truth is concerned.
      • deliberately misunderstanding the meaning of "theory" in the context of science and equating it with an unproven hypothesis.

        Oh, I don't misunderstand the difference. However, most public schools do, and they teach a theory as if it were a law (such as the laws of thermodynamics or the laws of gravity). I think a lot of people misinterpret trying to bring the theory of evolution down to a "theory" as trying to convert it into a hypothesis. This just isn't the desired intent.

        don't believe in evolution,
        • I believe there is more than enough information to suggest that we move on and find some other scientific explanations.
          And at some point, evolution could be as quaint a theory as phlogiston. Show us the other way. Behe and Dembski have plainly failed. So long as you have no alternate theory that explains evidence not covered by evolution, your objections will sound idiotic.

          Falsible? I get 90 hits on Google.
          • Perhaps he meant "falsifiable." One of the marks of a good theory is if it is falsifiable, meaning, can you design an experiment that could prove the theory false. Falsifiable does not mean the theory is false. For more on that check out the wikipedia article [wikipedia.org]. Intelligent Design is easily falsifiable, despite the claims of some evolutionists to the contrary. All you have to do is show once through an experiment that you can produce something that is supposedly of "irreducible complexity" from simpler p
            • by khasim (1285)

              Intelligent Design is easily falsifiable, despite the claims of some evolutionists to the contrary.

              Nope.

              All you have to do is show once through an experiment that you can produce something that is supposedly of "irreducible complexity" from simpler parts.

              Nope. If the statement is that an Intelligence is required to create the complexities of life ... then having an Intelligence create the complexities of life demonstrates nothing.

              The only way to show that ID is false is to show that such complexities ar

        • Re:I'm a pastafarian (Score:3, Informative)

          by Fëanáro (130986)
          Oh, I don't misunderstand the difference. However, most public schools do, and they teach a theory as if it were a law (such as the laws of thermodynamics or the laws of gravity). I think a lot of people misinterpret trying to bring the theory of evolution down to a "theory" as trying to convert it into a hypothesis. This just isn't the desired intent.


          Which laws of gravity are you talking about? The Newtonian laws were always just a theory too, which has since been disproven and replaced by Einstein's theorie
    • "It's when you start dissing evolution that we've got a problem: now the burden of proof is yours."

      Description: The burden of proof is always on the person making the assertion or proposition. Shifting the burden of proof, a special case of "argumentum ad ignorantiam," is the fallacy of putting the burden of proof on the person who denies or questions the assertion being made. The source of the fallacy is the assumption that something is true unless proven otherwise.

      Evolution is the assertion (theory) b
      • WOW, I really screwed up that href...sorry.
        Was supposed to be:
        http://people.uncw.edu/kozloffm/EDN566logicalfallacies.html [uncw.edu]
        There is lots of verifiable evidence to support the theory of evolution. It's not proven, but there is evidence that supports it. If you want to say it's wrong, and claim that there's lots of evidence proving it's wrong, then yes, you really do have to step up and show this evidence. People aren't telling you that you have to accept or believe evolution, just that if you are going to claim it's wrong, you put your money where your mouth is and show us your proof.

        People who aren't blindly following an irration
      • And your argument is what, an ad hominem? Let me partake also.

        Your God isn't provable or disprovable, and the old dino bones were just put here to test our faith.

        Riiiight.... sorry, but the burden of proof still rests on those making the most outlandish assertion. Between macro-evolution and an omnipotent deity, it's your half-baked theory that wins, noodly appendages down.
  • I do like his section on his beliefs; he raises some very good points. I used to be cynical and thought all religious people were idiots, but lately I've come to realize that this was just my own inflated ego fucking with me. Some of the smartest people have been religious believers. There have even been really smart fanatics: Mohamed Atta [wikipedia.org] had a master's degree, and pretty much all the attackers in the sarin gas attack on Tokyo [wikipedia.org] had degrees in engineering, many of them at least master's degrees.
    One minor g
  • PEOPLE!! STOP. Stop using "theory" in regards to science as a theory in the common definition. A scientific theory is very well backed up by facts.

    "In layman's terms, if something is said to be "just a theory," it usually means that it is a mere guess, or is unproved. It might even lack credibility. But in scientific terms, a theory implies that something has been proven and is generally accepted as being true."

    I will personally strangle the next person who does this. Pisses me off.
    Scientific Theory = Scient
  • Religion (Score:2, Insightful)

    by Anonymous Coward

    The beliefs I hold as a Christian aren't always the popular ones, but they're certainly valid arguments for anyone who cares to ask about them (not that that has happened).

    Yes, but as anybody with a clue would point out, a perfectly valid argument can still be completely wrong. The problem scientific-types have with Christianity isn't that it's not a valid argument - it's that the axioms are wrong.

    Point out a scientist who claims that, assuming the Bible is the incorruptible word of God, Christianit

  • by sfjoe (470510) on Tuesday August 30, 2005 @05:02PM (#13439292)
    I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process.

    In actuality, I believe you're an idiot because you don't understand that the theory of evolution does not attempt to be a proof. That's just more propaganda from the radical right.
    Science attempts to explain why things are and, by extrapolation and interpolation, why they might change. Evolution does this very well. It does not require you to "believe". It simply states what is. You are still free to believe in God in any form you care to, just don't expect to be able to predict what will come next with any accuracy.
    • But it is belief (Score:3, Insightful)

      by Skeezix (14602)
      A theory can be correct or incorrect. It can be revised, or shot down completely, or confirmed beyond reasonable doubt as further evidence and experiments come to light. You can remain agnostic about the issue and say, "well here is the evidence; it is what it is." But in practice, human beings do not really remain in that state. They form beliefs and world views. And the vast majority of evolutionary biologists start with the atheistic world view and see everything through those tinted glasses. I'm n
      • Evolution seems unfalsifiable, on the other hand.

        Exactly. Which is what makes it the prevailing scientific theory.

        Scientific theories can be used to make predictions about the natural world. Creationists like to believe that just because nobody has ever found the fossil remains of a monkey that walks upright and carries a pocket watch, evolution must be "unproven." The truth is that the vast preponderance of observations of living species support the theory of evolution.

        Might we someday find an organ

  • by 3am (314579)
    I had written a pretty wordy response (about halfway through a point-by-point rebuttal) to the article you wrote and linked to in your site's bio regarding your stance towards evolution, but I decided to delete it all and make it more succinct.

    Your beliefs on evolution are stupid.

    You seem smart, so I hope you see the light at some point. I have no illusions that I will be able to convince you otherwise, as I'm sure you have heard this many times before.
  • by gvc (167165) on Tuesday August 30, 2005 @05:40PM (#13439609)
    Zdziarski says,
    Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and ran a successful test with an overall accuracy of 99.23% (not bad for a simulation).
    First, I would like to thank Jonathan for his recent helpful correspondence in configuring DSPAM for the TREC Spam Filter Evaluation Tool Kit [uwaterloo.ca]. When finalized, this configuration will replace the one currently available (along with seven others [uwaterloo.ca]). However, I take exception to the statement above, implying that there is something wrong with the tests that Lynam and I previously published. I stand by those results. [uwaterloo.ca] Since that report was made public, I have become aware of two others that achieve much the same results: Holden's Spam Filtering II [holden.id.au] and Sergeant's CRM114 and DSPAM (Virus Bulletin, no longer freely available).

    Lynam and I said that DSPAM 2.8.3, in its default configuration, achieved 98.15% accuracy on the same corpus to which Zdziarski refers above. The report also argued that accuracy was a very poor measure of filter performance and that a false positive rate such as the 1.28% demonstrated by DSPAM would likely be unacceptable to an email user.
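    To make the argument about accuracy concrete, here is a minimal sketch using the standard definitions (false positive rate = misfiled ham / all ham, false negative rate = missed spam / all spam). The counts are hypothetical, invented for illustration, and are not taken from the TREC or DSPAM runs discussed above. Both filters score the same 99.0% accuracy on a spam-heavy corpus, yet one of them loses nine times as much legitimate mail:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical counts from one filter run; 'positive' means 'classified as spam'."""
    tp: int  # spam correctly caught
    fn: int  # spam missed (delivered to the inbox)
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly delivered

    def fpr(self) -> float:
        # Fraction of legitimate mail (ham) misclassified as spam.
        return self.fp / (self.fp + self.tn)

    def fnr(self) -> float:
        # Fraction of spam that slipped through.
        return self.fn / (self.fn + self.tp)

    def accuracy(self) -> float:
        # Fraction of all messages classified correctly, regardless of class.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


# Hypothetical corpus: 9,000 spam and 1,000 ham messages.
a = Confusion(tp=8910, fn=90, fp=10, tn=990)  # misses more spam, rarely loses ham
b = Confusion(tp=8990, fn=10, fp=90, tn=910)  # catches more spam, loses far more ham

for name, c in (("A", a), ("B", b)):
    print(f"{name}: accuracy {c.accuracy():.2%}  fpr {c.fpr():.2%}  fnr {c.fnr():.2%}")
```

    Because spam dominates the mix, accuracy is driven almost entirely by how the filter treats spam, so a filter can look excellent on that single number while misfiling a fraction of ham that most users would not tolerate.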

    In recent correspondence, Zdziarski suggested three configurations of DSPAM (available here on the web [uwaterloo.ca]) that achieved the following results:

    configuration   fpr     fnr     accuracy
    dspam(tum)      1.81%   0.80%   99.20%
    dspam(toe)      1.94%   0.59%   99.16%
    dspam(teft)     1.85%   0.53%   99.32%

    More detailed results and comparisons will be made available when our current study is complete. Don't take my word (or Jonathan's) for anything; run this filter and others on your own email. But please take great care in constructing your gold standard [www.ceas.cc].

    Gordon Cormack

  • by computersareevil (244846) on Tuesday August 30, 2005 @05:41PM (#13439622)
    Seriously. Science only deals with theories and facts. Any scientist worth his salt would object strongly if you say science proves anything.
    I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process.
    Spoken like a "True Believer". It seems to me that many (most?) True Believers can't understand that science isn't about believing in anything, and that science never, ever claims to prove anything. Shame, too, on the non-Believers who say that it does.
    It's recently become a "religious act" to question science in any capacity, but isn't questioning science the only way we can tell the good science from the bad science?
    Again, a clear misunderstanding of science. Science is all about questioning everything. I.e. why are we here? How did we get here? Where are we going? What is thunder? It's only a "religious act" when the questioner is an avowed (or covert) Believer who is offering zero actual evidence supporting any alternate theory.

    Religion generally, and Christianity specifically, appears to be all about answers. I.e., to answer the questions above: because God made us. God put us here. To heaven if you're good, hell if you're not. God is angry. See the difference?

    Does this mean I think less of his thoughts on email? No, I don't dismiss someone because of their beliefs (think Stallman ;). But I will read their works with a heightened sensitivity toward bias in the direction of whatever they believe. I think people like Mr. Zdziarski who openly declare their beliefs are less likely to let them taint their works because they know it would reflect negatively on their credibility.
  • Every now and then I will get an email from a friend, CC'ed to everyone else in his/her address book. Inevitably, one of the other recipients has a hacked PC, and soon after that my daily spam level increases.

    People need to learn to use BCC instead. Web clients like Gmail should make this the default for emails with more than, say, four recipients.
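    A mail client could apply that rule automatically. Below is a minimal sketch using Python's standard email package; the function name build_message and the threshold of four (taken from the suggestion above) are illustrative assumptions, not how Gmail or any real client behaves. Above the threshold, the visible To: header carries only the sender's own address and the real recipients go in Bcc:

```python
from email.message import EmailMessage


def build_message(sender: str, recipients: list[str], subject: str, body: str,
                  bcc_threshold: int = 4) -> EmailMessage:
    """Hypothetical helper: hide the recipient list once it exceeds bcc_threshold."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    if len(recipients) > bcc_threshold:
        # Large mailing: show only the sender in To: and move everyone to Bcc,
        # so no recipient (or a compromised recipient's PC) sees the full list.
        msg["To"] = sender
        msg["Bcc"] = ", ".join(recipients)
    else:
        # Small mailing: keep the normal, visible To: header.
        msg["To"] = ", ".join(recipients)
    msg.set_content(body)
    return msg


friends = [f"friend{i}@example.com" for i in range(1, 7)]
msg = build_message("me@example.com", friends, "Lunch?", "Anyone free Friday?")
print(msg["To"])   # only the sender's address is visible
print(msg["Bcc"])  # the real recipient list
```

    Note that the Bcc header is still present in the message object; it is smtplib's SMTP.send_message() that uses it to build the envelope recipient list and then leaves it out of the transmitted message, so the addresses never reach the other recipients' mailboxes.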

  • GPL 3.0 should address the now-common practice of offering services built on GPL'd software by "wrapping" it in new code, which connects to the GPL'd code via a public API in the GPL'd code. That kind of interface is more like linking to a GPL'd library than like revising the actual code of the GPL'd source. It's not even as tight an integration. So requiring that new code that's "API linked" to GPL'd code also be governed by GPL terms (e.g. publishing all the new code's source, and virally transmitting the GPL wi
