Government Politics Your Rights Online

Ask Carl Malamud About Shedding Light On Government Data

If you've ever tried to look up public records online, you may have run into byzantine sign-up procedures, proprietary formats, charges just to view what are ostensibly public documents, and generally the sense that you're in a snooty library with closed stacks. Carl Malamud of Public.Resource.Org has for years been forging a path through the grey goo of U.S. government data, helping to publicize the need for accessible digital archives — not just awkward, fee-per-page access. (Mother Jones calls him a "badass.") Malamud has (with help) been making it easier to get to the huge swathes of data in government sources like PACER, EDGAR, and the U.S. Patent Office. He's got a new initiative now to establish a "Federal Scanning Commission," the task of which would be to assess the scope and outcomes of a large-scale effort to actually digitize and make available online as much as practical of the vast holdings of the U.S. government. ("If we were able to put a man on the moon, why can't we launch the Library of Congress into cyberspace?") Ask Malamud below questions about his plans and challenges in disseminating public information. (But please, post unrelated questions separately, lest ye be modded down.)
This discussion has been archived. No new comments can be posted.

  • Re:Be careful ... (Score:2, Informative)

    by Anonymous Coward on Wednesday January 04, 2012 @03:43PM (#38588068)

    Right. Because power corrupts, and yet we keep putting people into power and expecting them to not get corrupted. Nothing will change until we open source it. [wikipedia.org]

  • Re:Why (Score:3, Informative)

    by Anonymous Coward on Wednesday January 04, 2012 @06:39PM (#38590020)

    Having worked for the government in the recent past, I can offer a few insights...

    1 - A lot of government agencies, on receiving a request for information, will kick it over to the IT department, on the grounds that "they keep the data". Unfortunately, because of the way things are structured, while the people in IT may run the disks and servers, they don't actually deal with the data... which means they either have to fight an internal battle with the people who actually manage the data, or take the path of least resistance and offer to provide the data as some sort of raw data dump.

    2 - A few agencies regularly get requests for data, and have people whose job it is to work with the public in getting them the data they request. Most, however, don't. This means the task falls to someone for whom it isn't part of their normal job. Since they don't deal with translating data formats, exporting large chunks of data, etc. on a regular basis, they have to go find out how to do these things... and while they're doing that, they can't do their normal work.

    3 - The data the agency has may be mixed in with other data, which might be confidential. To give a real example, I was working for the Department of Environmental Protection in my state, and we were sent a request to give out a list of all our employees' names and email addresses. You'd think that'd be simple, right? However, our employees include a law enforcement division, many of whom are exempt from having their personal information disclosed (because they're currently or previously involved in undercover investigations, have held positions as prison guards, etc.). Further, regular employees can be exempt under certain circumstances (e.g., they have a restraining order against a stalker, ex-spouse, or whatever). Now, since no one had ever previously asked us for this info, naturally no one had bothered to make a list of everyone who was exempt... which meant that we had to start creating such a list immediately, and couldn't release the information until it was completed. For extra fun, we also had people in our mail system who weren't employees -- volunteers with the state parks, for example, could get an email address from us. So we had to contact Parks and find out whether all the non-employees with email addresses were correctly marked as such in the system. What in theory should have been a simple "run a script, get a list, email it" operation that could be done in ten minutes took weeks and a lot of man-hours.

    4 - Just as for everybody else, records retention is a problem for the government. Storing old data costs money. Keeping the formats that data is in current costs more money. A lot of our programs did "front-end" processing for Federal EPA programs, collecting data in our state, then sending it to the EPA. Our state DEP received funding from the EPA to do this for them. We weren't, though, being paid to keep old records for them... so the data would be kept for however long the EPA required us to keep it, then deleted after that period -- generally six months or a year. Thus, if someone requested data from two years ago, we would have to either tell them, "we don't have that -- go ask the Feds" or hope we could find old backups with that data, and that those were still readable. And, of course, since people hate to be the bearer of bad news, and the IT department, which would get handed the request, didn't manage the data, the result would be that it would literally take a week or more for us in IT to find someone who would admit that the data wasn't there.

    5 - And to tell the truth, sometimes we just don't want to. That guy back in #3 who asked for all our email addresses? Well, the only reasons we in IT could see for someone asking for that were either (1) so they could use the list to spam our employees with something, or (2) so they could sell the list to someone who would then spam our employees. Understandably, we weren't highly motivated to get that list back to them quickly, or to keep the
