California Civic Data for IRS data #5

Open
hampelm opened this issue Oct 16, 2016 · 2 comments

hampelm commented Oct 16, 2016

At http://www.californiacivicdata.org/, they produce:

The complete package of 80 database tables released by California's Secretary of State, which we have cleaned up and converted into flat files with 40.7 million rows of comma-separated values*.

Along with related documentation.

With the new IRS XML files at https://aws.amazon.com/public-data-sets/irs-990/ we could do similar work:

  • Document differences between years and forms
  • Produce standardized scrapers
  • Build a web interface that makes the index CSVs searchable, so collections of files can be pulled by organization, year, state, etc.
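
A minimal sketch of the kind of lookup such an interface would run against an index CSV. The column names (EIN, TAXPAYER_NAME, TAX_PERIOD, OBJECT_ID), the YYYYMM period format, and the sample rows are all assumptions for illustration and should be checked against the real index files:

```python
import csv
import io

# Made-up sample standing in for one index CSV; the column names are assumptions.
SAMPLE_INDEX = """EIN,TAXPAYER_NAME,TAX_PERIOD,OBJECT_ID
000000001,EXAMPLE FOOD BANK,201412,obj-0001
000000002,SAMPLE ARTS COUNCIL,201412,obj-0002
000000001,EXAMPLE FOOD BANK,201512,obj-0003
"""


def search_index(csv_text, name_contains=None, year=None):
    """Return index rows matching an organization-name substring and/or a tax year."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if name_contains and name_contains.lower() not in row["TAXPAYER_NAME"].lower():
            continue
        # TAX_PERIOD is assumed to be YYYYMM, so the first four characters are the year.
        if year and row["TAX_PERIOD"][:4] != str(year):
            continue
        matches.append(row)
    return matches


if __name__ == "__main__":
    for row in search_index(SAMPLE_INDEX, name_contains="food bank", year=2015):
        print(row["OBJECT_ID"], row["TAXPAYER_NAME"], row["TAX_PERIOD"])
```

The sample has no state column, so filtering by state would need that field to be present in the index or pulled from the filings themselves.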

hampelm commented Oct 16, 2016

https://www.reddit.com/r/aws/comments/4p772f/how_the_heck_do_i_view_the_990_documents_on/

People wanted:

  • Searchable index
  • Flat files / easier way to pull the data ("I am working on a project right now with some people that iterates through the index, downloads the xml files and appends to a large flat file. Let me know if you are interested in the data once we are done. We've sampled around 75k docs and the beginning results look interesting and def useful.")
  • EBS volume of all the files
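
The flat-file approach described in that quote could look roughly like the sketch below. It is not the Reddit poster's code; the element names (EIN, Name, TotalRevenue) and the sample documents are hypothetical, and real filings are likely nested and namespaced, so the lookups would need adjusting:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up documents standing in for downloaded filings; element names are assumptions.
SAMPLE_DOCS = [
    "<Return><EIN>000000001</EIN><Name>Example Food Bank</Name>"
    "<TotalRevenue>1200</TotalRevenue></Return>",
    "<Return><EIN>000000002</EIN><Name>Sample Arts Council</Name></Return>",
]

# Hypothetical list of fields to pull out of each filing.
FIELDS = ["EIN", "Name", "TotalRevenue"]


def flatten(xml_text):
    """Turn one XML document into a flat dict; missing fields become empty strings."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(field, default="") for field in FIELDS}


def append_to_flat_file(docs, out):
    """Write a header plus one CSV row per document to the open file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(flatten(doc))


if __name__ == "__main__":
    buffer = io.StringIO()
    append_to_flat_file(SAMPLE_DOCS, buffer)
    print(buffer.getvalue())
```

Differences between years and forms would show up here as fields that are missing or renamed, which is why documenting them comes first.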


hampelm commented Oct 18, 2016
