Importing bulk IRS data #138
To import:
A. Data downloads are at https://www.irs.gov/uac/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data
B. Get desired file:
C. Extract file
D. Generate schema for file:
This may give a segmentation fault for large files. It's probably running out of memory; try it on your personal computer if this happens.
E. Create a table using that statement:
F. Import the data into postgres:
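For step D, here is a minimal sketch of schema generation. It assumes the extract is a space-delimited text file with a header row, which is an assumption and not something this thread confirms. The helper names `infer_type` and `create_table_sql` and the sample values are made up for illustration, and this is not the tool used above. It reads only the first `sample_rows` rows, which keeps memory use bounded on large files but can mis-type a column whose later rows look different.

```python
import csv
import io
import itertools
import re

INT_RE = re.compile(r"-?\d+")
NUM_RE = re.compile(r"-?\d+\.\d+|-?\d+")


def infer_type(values):
    """Return a PostgreSQL type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Identifiers such as EINs can start with 0; storing them as numbers
    # would drop the leading zero, so keep them as text.
    if any(len(v) > 1 and v.startswith("0") and v.isdigit() for v in non_empty):
        return "text"
    if all(INT_RE.fullmatch(v) for v in non_empty):
        return "bigint"
    if all(NUM_RE.fullmatch(v) for v in non_empty):
        return "numeric"
    return "text"


def create_table_sql(table, lines, delimiter=" ", sample_rows=1000):
    """Build a CREATE TABLE statement from a header row and a sample of rows."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    data = list(itertools.islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        col_type = infer_type([row[i] for row in data if i < len(row)])
        # Quoting preserves the header's case, e.g. "EIN".
        columns.append(f'  "{name}" {col_type}')
    return f'CREATE TABLE "{table}" (\n' + ",\n".join(columns) + "\n);"


# Made-up sample data standing in for an extract file.
sample = io.StringIO(
    "EIN tax_prd subseccd totrevenue\n"
    "010203040 201212 3 1500\n"
    "987654321 201206 4 -20\n"
)
print(create_table_sql("py12_990", sample))
```

With a real file, `lines` would be an open file handle instead of the `io.StringIO` sample.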
|
Note that 2015 extracts are linked directly as .dat (not zipped). Adjust import by skipping step C. Revise step D:
Once imported, rename 2015 990 tables for consistency:
|
Here are the statements I've come up with to standardize the 2012 data. I think I was overzealous in the schema and we can remove a lot of fields, but I'll think about that later.
To test these, I created a test table (
You'll notice that these seem pretty redundant. Why select from a select statement? That's a quirk of postgres/SQL: if instead we did the much cleaner
the insert wouldn't save
Here goes. You can use these as templates for doing the other tables:
2012 990s
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2012 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py12_990) as data
from py12_990 ) as stuff
limit 10
PY12 990 EZ
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'PY12 990 EZ' as source,
subseccd,
ein,
'990ez' as filing_type,
tax_prd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json(py12_990ez) as data
from py12_990ez ) as stuff
limit 10
2012 PF
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2012 990PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py12_990pf) as data
from py12_990pf) as stuff
limit 10
|
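The pattern in the statements above is to rename a handful of source columns to standard names while keeping the whole original row as JSON in `data`. Here is a small sketch of that pattern, using Python and an in-memory SQLite database as a stand-in for Postgres (an assumption made so the example runs on its own). The `standardize` helper, the reduced column set, and the row values are hypothetical. The column names in the mapping come from the 2012 990 statement.

```python
import json
import sqlite3

# Standardized column -> source column, copied from the 2012 990 statement.
# Only a subset of the columns is shown to keep the sketch short.
MAPPING_2012_990 = {
    "ein": "EIN",
    "subseccd": "subseccd",
    "tax_period": "tax_prd",
    "total_revenue": "totrevenue",
    "total_expenses": "totfuncexpns",
}


def standardize(raw_row, mapping, source, filing_type):
    """Rename mapped columns and keep the full raw row as JSON in 'data'."""
    row = {"source": source, "filing_type": filing_type}
    for target, origin in mapping.items():
        row[target] = raw_row[origin]
    # Plays the role of row_to_json(...) in the Postgres statements.
    row["data"] = json.dumps(raw_row)
    return row


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table matt (source text, ein text, subseccd text, "
    "filing_type text, tax_period text, total_revenue integer, "
    "total_expenses integer, data text)"
)

# Made-up raw row; extra_field stands for any column not in the mapping.
raw = {"EIN": "010203040", "subseccd": "3", "tax_prd": "201212",
       "totrevenue": 1500, "totfuncexpns": 1200, "extra_field": 7}
row = standardize(raw, MAPPING_2012_990, "2012 990", "990")

columns = ", ".join(row)
placeholders = ", ".join(":" + name for name in row)
conn.execute(f"insert into matt ({columns}) values ({placeholders})", row)

stored = conn.execute("select ein, total_revenue, data from matt").fetchone()
print(stored[0], stored[1], json.loads(stored[2])["extra_field"])
# prints: 010203040 1500 7
```

Columns that are not part of the standard set, such as `extra_field` here, stay recoverable from `data`.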
Schemas for 2013, 2014, and 2015 990: https://gist.github.com/bnchdrff/14413f0bf8253729bd646373d91519a0
|
2013 990
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2013 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_pd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py13_990) as data
from py13_990 ) as stuff
limit 10
|
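One thing to check with the `compensation` sum: in SQL, adding NULL to a number gives NULL, so a row with any one of the summed columns empty would get a NULL `compensation`. Whether the extracts actually contain empty values in these columns is not stated in this thread. The sketch below uses an in-memory SQLite database as a stand-in for Postgres (NULL arithmetic behaves the same way in both) with made-up values, and shows `coalesce` as one possible guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table py13_990 "
    "(othrsalwages integer, compnsatnandothr integer, compnsatncurrofcr integer)"
)
# Second row has a NULL in one of the summed columns.
conn.executemany(
    "insert into py13_990 values (?, ?, ?)",
    [(100, 20, 50), (100, None, 50)],
)

plain = [r[0] for r in conn.execute(
    "select othrsalwages + compnsatnandothr + compnsatncurrofcr "
    "from py13_990 order by rowid"
)]
guarded = [r[0] for r in conn.execute(
    "select coalesce(othrsalwages, 0) + coalesce(compnsatnandothr, 0) "
    "+ coalesce(compnsatncurrofcr, 0) from py13_990 order by rowid"
)]

print(plain)    # [170, None]
print(guarded)  # [170, 150]
```

Treating a missing value as 0 is itself a choice; leaving the sum NULL may be preferable if "unknown" should stay distinguishable from "zero".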
2013 EZ
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2013 EZ' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
tax_pd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py13_EZ") as data
from "py13_EZ" ) as stuff
limit 10
|
2013 PF
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2013 PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py13_990pf) as data
from py13_990pf) as stuff
limit 10
|
2014 990
insert into matt
(
source,
ein,
subseccd,
filing_type,
tax_period,
contributions_and_grants,
program_service_revenue,
investment_income,
total_revenue,
compensation,
fundraising_fees,
total_expenses,
total_assets,
total_liabilities,
net_assets,
data
)
select
source,
ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2014 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py14_990) as data
from py14_990 ) as stuff
limit 10
|
2014 990ez
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2014 990ez' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
taxprd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py14_EZ") as data
from "py14_EZ" ) as stuff
limit 10
|
2014 990pf
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2014 990PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py14_990pf) as data
from py14_990pf) as stuff
limit 10
|
2015 990
insert into matt
(
source,
ein,
subseccd,
filing_type,
tax_period,
contributions_and_grants,
program_service_revenue,
investment_income,
total_revenue,
compensation,
fundraising_fees,
total_expenses,
total_assets,
total_liabilities,
net_assets,
data
)
select
source,
ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2015 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py15_990) as data
from py15_990 ) as stuff
limit 10
|
2015 990ez
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2015 990 EZ' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
a_tax_prd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py15_990ez") as data
from "py15_990ez" ) as stuff
limit 10
|
2015 990pf
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2015 990 pf' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py15_990pf) as data
from py15_990pf) as stuff
limit 10
|
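The per-year statements differ only in a few places: the table name, the `source` label, how `ein` is written, and the name of the tax period column (`tax_prd`, `tax_pd`, `taxprd`, `a_tax_prd`). As a sketch of treating them as templates, the Python below regenerates the 990-EZ statements from a small lookup table. The values in `EZ_TABLES` and `SHARED` are copied from the statements above. The function name `build_insert` and the overall approach are illustrative assumptions, not how the statements in this thread were produced.

```python
# Per-table differences, copied from the 990-EZ statements in this thread.
EZ_TABLES = {
    "py12_990ez": {"source": "PY12 990 EZ", "ein": "ein", "tax_period": "tax_prd"},
    '"py13_EZ"': {"source": "2013 EZ", "ein": '"EIN"', "tax_period": "tax_pd"},
    '"py14_EZ"': {"source": "2014 990ez", "ein": '"EIN"', "tax_period": "taxprd"},
    '"py15_990ez"': {"source": "2015 990 EZ", "ein": '"EIN"', "tax_period": "a_tax_prd"},
}

# Standardized column -> source column, identical in all four EZ statements.
SHARED = {
    "subseccd": "subseccd",
    "contributions_and_grants": "totcntrbs",
    "program_service_revenue": "prgmservrev",
    "investment_income": "othrinvstinc",
    "total_revenue": "totrevnue",
    "total_expenses": "totexpns",
    "total_assets": "totassetsend",
    "total_liabilities": "totliabend",
    "net_assets": "totnetassetsend",
}

TARGET_COLUMNS = [
    "source", "ein", "subseccd", "filing_type", "tax_period",
    "contributions_and_grants", "program_service_revenue",
    "investment_income", "total_revenue", "total_expenses",
    "total_assets", "total_liabilities", "net_assets", "data",
]


def build_insert(table, spec, target="matt", limit=10):
    """Return an insert statement in the same nested-select shape as above."""
    exprs = dict(SHARED)
    exprs["source"] = "'" + spec["source"] + "'"
    exprs["filing_type"] = "'990ez'"
    exprs["ein"] = spec["ein"]
    exprs["tax_period"] = spec["tax_period"]
    exprs["data"] = f"row_to_json({table})"
    column_list = ", ".join(TARGET_COLUMNS)
    inner = ",\n".join(f"  {exprs[c]} as {c}" for c in TARGET_COLUMNS)
    return (
        f"insert into {target}\n  ({column_list})\n"
        f"select\n  {column_list}\n"
        f"from (select\n{inner}\nfrom {table}) as stuff\n"
        f"limit {limit}"
    )


for table, spec in EZ_TABLES.items():
    print(build_insert(table, spec))
    print()
```

Keeping the differences in one table makes slips such as a wrong `filing_type` or tax period column easier to spot than in twelve hand-edited copies.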
Note that the real inserts should go into the
|
Hi Hampelm. I know nothing about how this works. I read about it in Friedman's book Thank You for Being Late. I know there is something here, but I don't know how to mine it. I am an Enrolled Agent for the IRS and I help people resolve tax issues. I am looking for a database that tells me who the IRS has sent tax audits, tax notices, tax liens, etc. The faster I know this, the faster I can help people avoid overspending on high-priced professionals. It's digitized somewhere; I need an expert to dig it out on a program like GitHub. Any comments? Thank you, Jerry
|
This statement creates the unified 990s table: