Ask Carl Malamud About Shedding Light On Government Data

Posted by timothy on Wednesday January 04, 2012 @03:30PM from the righteous-fight dept.

If you've ever tried to look up public records online, you may have run into byzantine sign-up procedures, proprietary formats, charges just to view what are ostensibly public documents, and generally the sense that you're in a snooty library with closed stacks. Carl Malamud of Public.Resource.Org has for years been forging a path through the grey goo of U.S. government data, helping to publicize the need for accessible digital archives — not just awkward, fee-per-page access. (Mother Jones calls him a "badass.") Malamud has (with help) been making it easier to get to the huge swathes of data in government sources like PACER, EDGAR, and the U.S. Patent Office. He's got a new initiative now to establish a "Federal Scanning Commission," the task of which would be to assess the scope and outcomes of a large-scale effort to actually digitize and make available online as much as practical of the vast holdings of the U.S. government. ("If we were able to put a man on the moon, why can't we launch the Library of Congress into cyberspace?") Ask Malamud below questions about his plans and challenges in disseminating public information. (But please, post unrelated questions separately, lest ye be modded down.)

This discussion has been archived. No new comments can be posted.

LOC (Score:3, Interesting)

by Anonymous Coward writes: on Wednesday January 04, 2012 @03:41PM (#38588048)

So how many GB/TB is a library of congress? :)
Or more seriously how big are you estimating? Are you using raw scans or some sort of compression (jpeg, png, ...etc)? What resolution are you using? Do you vary the resolution depending on the document?
What sort of metadata are you putting in?

Happened Top Down Already (Score:5, Interesting)

by jimmerz28 ( 1928616 ) writes: on Wednesday January 04, 2012 @03:44PM (#38588078)

Didn't Obama already mandate that all government agencies must digitize their records and develop plans within 4 months? http://www.simplysecurity.com/2011/12/28/obama-administration-pushes-for-digital-records-management-overhaul/ [simplysecurity.com]

regulations.gov is a good model to follow (Score:5, Interesting)

by hyeprofile ( 1851598 ) writes: on Wednesday January 04, 2012 @03:44PM (#38588082)

The US actually does a good job with sharing data on regulations and rulemaking on regulations.gov. You can pretty much search any of the regulatory dockets from most departments, and even access public comments and supporting material. You can even take advantage of regulatory policy updates and eRulemaking Program activities on your Twitter stream. Wouldn't this be a good model to follow to systematically publish everything online? I'm thinking publishing everything online on a government website would make for a great summer job for students, and help boost the economy and employment stats, no?

Why (Score:4, Interesting)

by CanHasDIY ( 1672858 ) writes: on Wednesday January 04, 2012 @03:46PM (#38588100) Homepage Journal

Can you provide any explanation as to why it is so difficult and cost-prohibitive to obtain records from the government, especially considering the abundance of laws requiring government compliance with requests for information (AKA "Sunshine Laws")?
Is it simply a matter of government employee ineptitude, or have you found evidence of a more nefarious rationale?

Ancestry.com (Score:3, Interesting)

by Anonymous Coward writes: on Wednesday January 04, 2012 @03:51PM (#38588158)

What is your opinion about websites like Ancestry.com which make use of public records and charge a subscription fee for access? What is the incentive for the government to migrate old documents into digital form when services like these exist? Do you think Ancestry.com should be a 501(c)(3)?

Who is the worst? (Score:5, Interesting)

by TheBrez ( 1748 ) writes: <brez@brezworks.com> on Wednesday January 04, 2012 @03:51PM (#38588160) Homepage

Which government agency is the worst to get information from?

Scanning ? (Score:3, Interesting)

by SoothingMist ( 1517119 ) writes: on Wednesday January 04, 2012 @03:53PM (#38588184)

By "scanning", what do you mean? Are we talking about searchable records or just a bunch of images? If searchable, what quality control is going to be provided? As someone who has re-published books that are out of copyright, I know it takes a lot of quality control to ensure a usable product. Unless high-quality searchable records in a solid database are the end result, the project is not worth funding, in my personal opinion.

How to get more attention to (Score:4, Interesting)

by oneiros27 ( 46144 ) writes: on Wednesday January 04, 2012 @03:56PM (#38588208) Homepage

Recently in the federal register, there were two calls for comments about access to data and research from federally funded research:
http://federalregister.gov/a/2011-28623 [federalregister.gov]
http://federalregister.gov/a/2011-28621 [federalregister.gov]
I didn't hear about these until ~4 weeks after the original announcement, and with the holidays, it was too late to try to get the societies I'm involved with to prepare and vote on official statements. Are there any places where people can get/post notices of these sorts of things so that we can stay informed and try to help influence policies?
(note -- the second one on data access doesn't close 'til Jan 12th; NSF also has a similar RFC that closes Jan 18th [nsf.gov])

Idea (Score:4, Interesting)

by hardwarejunkie9 ( 878942 ) writes: on Wednesday January 04, 2012 @04:06PM (#38588298)

Something has been rattling around my head in recent days on this topic and now I think it's a proper time to let it out.
The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.
After you free all of this information and make it available to the public (as it should be), then what? What do you expect the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.
Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.

Re:Happened Top Down Already (Score:4, Interesting)

by garcia ( 6573 ) writes: on Wednesday January 04, 2012 @04:12PM (#38588350)

I scour publicly available records for fun stuff all the time. I not only find it online but I also request it from government agencies (not Federal usually but local/county/etc).
In Minnesota, data must be "easily accessible for convenient use." [mn.gov] While that has specific wording related to historical records, it basically means that recent data must be in some sort of electronic format or otherwise easily found and presented, free of charge as long as you do it in person, to anyone who asks, even anonymously. Now. This is great in theory. Unfortunately, just because it's easy for the agency to use doesn't mean it's easy for you to use or interpret.
Let's take for instance bus ridership data [lazylightning.org]. It's not well organized for outsiders to read, and due to collection methodologies (not explained to the general person who had to pay $50 to get the data in the first place) it is basically useless.
They have the data and after months of fighting with them for how much they claimed it cost (they wanted to charge me more than $300 IIRC) I got it down to $50 and got what you see above even though they already pulled it (and summarized it) for the mass media but wouldn't release it in a raw format.
So. It's in a format which isn't standard. Its methodology is questionable and it's expensive. So no matter the mandates, the promises, etc., the data is not terribly useful across agencies or to the public without some intermediate steps, which cost the taxpayers more than doing it right the first time around.

Encouraging Governments? (Score:4, Interesting)

by theNAM666 ( 179776 ) writes: on Wednesday January 04, 2012 @04:58PM (#38588790)

In a city such as Nashville, things as basic as business ownership and property records are not available online. In states such as New Jersey, public records such as basic corporate filings (officers, operating address/address for service of process) are accessible only for a fee.
What concrete actions can citizens confronting such situations take to encourage accessibility and accountability?

Can the rare books collections be digitized? (Score:5, Interesting)

by autophile ( 640621 ) writes: on Wednesday January 04, 2012 @05:02PM (#38588842)

Three closely related questions about the rare books collections at the Library of Congress:
1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).
2. It gets really annoying to have to receive paper copies of books when copies are requested. Why not DVDs of high-quality images?
3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.
