Ask Carl Malamud About Shedding Light On Government Data
If you've ever tried to look up public records online, you may have run into byzantine sign-up procedures, proprietary formats, charges just to view what are ostensibly public documents, and generally the sense that you're in a snooty library with closed stacks. Carl Malamud of Public.Resource.Org has for years been forging a path through the grey goo of U.S. government data, helping to publicize the need for accessible digital archives — not just awkward, fee-per-page access. (Mother Jones calls him a "badass.") Malamud has (with help) been making it easier to get to the huge swathes of data in government sources like PACER, EDGAR, and the U.S. Patent Office. He's got a new initiative now to establish a "Federal Scanning Commission," the task of which would be to assess the scope and outcomes of a large-scale effort to actually digitize and make available online as much as practical of the vast holdings of the U.S. government. ("If we were able to put a man on the moon, why can't we launch the Library of Congress into cyberspace?") Ask Malamud below questions about his plans and challenges in disseminating public information. (But please, post unrelated questions separately, lest ye be modded down.)
LOC (Score:3, Interesting)
So how many GB/TB is a library of congress? :)
Or, more seriously, how big are you estimating? Are you using raw scans or some sort of compression (JPEG, PNG, etc.)? What resolution are you using? Do you vary the resolution depending on the document?
What sort of metadata are you putting in?
Happened Top Down Already (Score:5, Interesting)
regulations.gov is a good model to follow (Score:5, Interesting)
Why (Score:4, Interesting)
Can you provide any explanation as to why it is so difficult and cost-prohibitive to obtain records from the government, especially considering the abundance of laws requiring government compliance with requests for information (AKA "Sunshine Laws")?
Is it simply a matter of government employee ineptitude, or have you found evidence of a more nefarious rationale?
Ancestry.com (Score:3, Interesting)
What is your opinion about websites like Ancestry.com which make use of public records and charge a subscription fee for access? What is the incentive for the government to migrate old documents into digital form when services like these exist? Do you think Ancestry.com should be a 501(c)(3)?
Who is the worst? (Score:5, Interesting)
Scanning ? (Score:3, Interesting)
How to get more attention to (Score:4, Interesting)
Recently in the Federal Register, there were two calls for comments about access to data and research from federally funded research:
http://federalregister.gov/a/2011-28623 [federalregister.gov]
http://federalregister.gov/a/2011-28621 [federalregister.gov]
I didn't hear about these until ~4 weeks after the original announcement, and with the holidays, it was too late to try to get the societies I'm involved with to prepare and vote on official statements. Are there any places where people can get/post notices of these sorts of things so that we can stay informed and try to help influence policies?
(note -- the second one on data access doesn't close 'til Jan 12th; NSF also has a similar RFC that closes Jan 18th [nsf.gov])
Idea (Score:4, Interesting)
The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.
After you free all of this information and make it available to the public (as it should be), then what? What do you expect the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.
Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.
Re:Happened Top Down Already (Score:4, Interesting)
I scour publicly available records for fun stuff all the time. I not only find it online but I also request it from government agencies (not Federal usually but local/county/etc).
In Minnesota, data must be "easily accessible for convenient use." [mn.gov] While that has specific wording related to historical records, it basically means that recent data must be in some sort of electronic format, or otherwise easily found and presented, free of charge as long as you do it in person, to anyone who asks, even anonymously. Now, this is great in theory. Unfortunately, just because it's easy for the agency to use doesn't mean it's easy for you to use or interpret.
Take, for instance, this bus ridership data [lazylightning.org]. It's not well organized for outsiders to read, and because of the collection methodologies (which were not explained to the general person who had to pay $50 to get the data in the first place) it is basically useless.
They have the data. After months of fighting with them over how much they claimed it cost (they wanted to charge me more than $300, IIRC), I got it down to $50 and got what you see above. They had already pulled it (and summarized it) for the mass media, but wouldn't release it in a raw format.
So: it's in a format which isn't standard, its methodology is questionable, and it's expensive. No matter the mandates, the promises, etc., the data is not terribly useful across agencies or to the public without some intermediate steps, which cost the taxpayers more than doing it right the first time around.
Encouraging Governments? (Score:4, Interesting)
In a city such as Nashville, things as basic as business ownership and property records are not available online. In states such as New Jersey, public records such as basic corporate filings (officers, operating address/address for service of process) are accessible only for a fee.
What concrete actions can citizens confronting such situations take to encourage accessibility and accountability?
Can the rare books collections be digitized? (Score:5, Interesting)
Three closely related questions about the rare books collections at the Library of Congress:
1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).
2. It gets really annoying to have to receive paper copies of books when copies are requested. Why not DVDs of high-quality images?
3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.