---
title: Flood Insurance Impact on Post-Flood Home Sales
author: "Connor P. Jackson[^disclaim]"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  bookdown::pdf_document2:
    fig_caption: yes
    toc: no
    number_sections: yes
    keep_tex: yes
    df_print: !expr pander::pander
  bookdown::html_notebook2:
    fig_caption: yes
    number_sections: yes
    df_print: kable
bibliography: [library.bib, pkg-refs.bib]
header-includes: \usepackage{setspace}
link-citations: true
linestretch: 2
reference-section-title: References
linkcolor: blue
urlcolor: blue
abstract: "Welfare-improving buyouts of high flood risk homes need to be carefully designed to overcome political opposition. I assemble a panel of real estate and flood insurance policy data to estimate whether holding an insurance policy affects homeowners' likelihood of selling their home following a flood. OLS results suggest no difference between policyholders and non-holders, but IV estimates are unreliable because the instrument provides insufficient variation to separately identify two endogenous effects at once."
---

[^disclaim]: I thank Jim Sallee for valuable advising and guidance, and Michael Anderson, Max Auffhammer, Sara Johns, Abdou Cisse, Katherine Wagner, Arthur Wardle, and the entire 2019 ARE cohort for helpful feedback. Real estate data provided by Zillow through the Zillow Transaction and Assessment Dataset (ZTRAX). More information on accessing the data can be found at <http://www.zillow.com/ztrax>. The results and opinions are those of the author and do not reflect the position of Zillow Group. NFIP Policies and Claims data provided by FEMA through the OpenFEMA API. FEMA and the Federal Government cannot vouch for the data or analyses derived from these data after the data have been retrieved from the Agency's website(s) and/or Data.gov. This product uses the Federal Emergency Management Agency’s API, but is not endorsed by FEMA.

```{r setup, include=FALSE}
library(bookdown)
library(knitr)
library(kableExtra)
library(pander)
library(rmarkdown)
library(stargazer)
library(lfe)
library(ggplot2)
library(grateful)
options(knitr.kable.NA = '')
panderOptions("digits", 4)
panderOptions("table.split.cells", 50)
panderOptions("table.split.table", 120)
panderOptions("big.mark", ",")
# panderOptions("table.caption.prefix", paste("Table:", knitr::opts_current$get('label')))
```

```{r citations, echo = FALSE}
citerefs <- cite_packages(generate.document = FALSE, all.pkgs = FALSE)
nocite_references(citerefs, citation_processor = 'pandoc')
```

```{r initialize-data, include=FALSE}
source("merge_and_analysis.R")
```

# Introduction

Flood damage prevention and recovery have been a major expenditure for every level of government in the United States for nearly a century, and climate change is increasing the frequency and intensity of floods. Decades of primarily local and state policy have encouraged development in flood plains and coastal zones, creating substantial new private wealth financed by risk exposure for the federal government [@Gaul2019]. The US continues to spend vast amounts on subsidies for landowners in flood-prone areas, from federally underwritten flood insurance priced below actuarially fair rates to major capital investments in dredging, levee construction and maintenance, and coastal erosion prevention. Taken together, these subsidies and pro-development policies have left a large number of properties in flood-prone areas that are worth less than the ongoing stream of subsidies required to maintain them.

The most severe examples of this problem are so-called severe repetitive loss (SRL) properties [@NationalFloodInsuranceProgram2020]. These structures, usually older homes that have not been elevated to withstand regular flood damage, are insured by the National Flood Insurance Program (NFIP) and have received repeated claim payouts that exceed the value of the home, a situation that is only possible because the NFIP sets premiums below actuarially fair levels. A private insurer in this situation would either refuse to cover the property or, upon the filing of a claim, declare the property a total loss, much as car insurers "total" a car after a particularly damaging crash. However, the NFIP is required by statute to provide affordable flood insurance to all Americans, and is thus prevented from declining coverage on a money-losing property. The NFIP was originally established by Congress to shift some of the cost and risk of development in flood-prone areas from the government onto property owners. However, these pricing practices have left the NFIP with a substantial debt load and a deficit that currently exceeds $1 billion per year [@CBO2017]. The ongoing fiscal problems of the NFIP, along with its conflicting mandates of moving risk from the government to homeowners while maintaining affordability, have been a topic of research and discussion for more than a decade [@Miller2019; @NationalAcademiesofSciencesandMedicine2016]. 

Congress has made several attempts to reduce the NFIP's ongoing deficit, and any such plan will need to address SRL properties. Because these properties cost more to insure in expectation than they are worth, it would be welfare-improving to simply buy and demolish them. Doing so would avoid substantial insurance losses and be a net gain to society, even before accounting for additional avoided costs for infrastructure, US Army Corps of Engineers management projects, and the like. The Federal Emergency Management Agency (FEMA) has pursued buyouts of SRL properties in recent years, but these proceedings are slow, costly, and politically unpopular [@Weber2019; @Salvesen2018]. Homeowners are, understandably, attached to their homes and communities, and being asked (let alone required) to move is, at a minimum, quite disruptive. This policy direction stands in stark contrast to decades of pro-development policies along the Atlantic and Gulf Coasts, including the investment of billions of federal and state dollars, so turning the tide of public opinion will be no easy task [@Gaul2019]. To address this problem at scale, policymakers will need to think carefully about how homeowners choose whether to rebuild or relocate in the wake of a disaster.

This paper aims to support that policymaking process by exploring how holding flood insurance affects a homeowner's decision to sell their home in the wake of a flood. By identifying the property owners who are most likely to sell following a flood, policymakers can design a program in which the government, rather than a new private buyer, purchases those homes and then permanently removes them from the market. Because FEMA already has a relationship with flood insurance policyholders, this group of flood-exposed residents could be a natural first group to approach for buyouts, particularly after a flood. However, if holding flood insurance makes homeowners _less_ likely to sell following a flood, then policyholders are a less promising pool when FEMA searches for potential buyout participants.

To answer this question, I combine several detailed micro datasets: real estate assessment and transaction records from county public records, collected in Zillow's ZTRAX database [@ZillowGroup2020], and NFIP data on flood insurance policies written and claims filed. Using these data, I construct a panel of home transactions, flood events, and flood insurance takeup for 2009–2016 to estimate both the main effect of floods on the probability of a home sale and how that effect differs for flood insurance policyholders. An observation in the constructed panel is a home in one year. The process that matches insurance policies to homes is imperfect, so the panel does not uniquely identify the insurance policy (if any) held by each home. Instead, policies are spread over the set of homes in the same census tract and flood zone that were built in the same year.

Using a linear probability model estimated with OLS, I find that homeowners are less likely to sell following a flood, in agreement with existing literature [@Zivin2020]. Flood insurance policyholders are more likely to sell their homes in general, regardless of flood occurrence, but this effect may be due to imprecise recording of the relative timing of flooding, sales, and policy effective dates. I find no significant effect of insurance on sales in the wake of flooding in the OLS specifications. 

Because the choice of insurance takeup is endogenous, I instrument for insurance using exogenous price variation caused by the Biggert-Waters Flood Insurance Reform Act of 2012. This instrument does provide some exogenous variation in takeup for a single endogenous regressor. In a specification with both a main effect of insurance and its interaction with flooding, however, the instrument (alone and interacted with flooding) cannot provide sufficient variation to identify both endogenous regressors, and it performs even worse when lagged terms are introduced. As a result, I am unable to use these estimates to make any credible causal claims, though they do suggest useful refinements and directions for future research. 

This paper builds on a growing literature on consumer responses to natural disasters, particularly floods, and on the welfare and policy implications of existing flood insurance rate structures. @Gallagher2014 shows that insurance takeup is not perfectly rational and in fact depends on the salience of flood risk. @Wagner2019 tests for selection in the flood insurance market and describes how risk misperception and other frictions in insurance takeup distort demand, so that reforms have different welfare effects than a standard expected utility framework would suggest. @Peralta2019 explore the location sorting effects of subsidized flood insurance. This paper most closely builds on the work of @Zivin2020, who test directly for the impact of hurricanes on housing markets. The authors estimate the effects of hurricane-force winds on housing prices and transaction probability, finding an increase in prices and a small decrease in sale probability for up to three years following a hurricane; they also find that homebuyers arriving after a hurricane have higher incomes than the residents they replace. This research follows the methodology of @Zivin2020, but uses flood events rather than wind exposure and is not limited to high-damage events like hurricanes. 

# Data

This research builds on two primary data sources: flood insurance policies, claims, and flood zones from the National Flood Insurance Program; and real estate assessment and transaction records from the Zillow ZTRAX dataset [@ZillowGroup2020]. I focus on North Carolina, one of the primary states for NFIP policies and claims. North Carolina was selected for its high frequency of tropical storms and flooding incidents, near-complete coverage of digital flood maps, and variation in community characteristics within special flood hazard areas (SFHA). Much of North Carolina's severe weather comes in the form of heavy precipitation and storm surge rather than damaging winds, which are not covered by NFIP policies. In addition, nearly all of North Carolina's counties (including all of its coastal counties) have digitized flood insurance rate maps (FIRMs), which allows me to identify the flood zone of nearly every property in the state. North Carolina has a wide variety of flood-prone areas, coastal and inland, covering high- and low-income communities and high- and low-density regions. Some coastal areas are well adapted to flooding, while other, older communities are much more vulnerable, to the degree that managed retreat has begun to be discussed at the state and local level. While a nationwide analysis would be valuable in the long term, North Carolina provides a reasonable case study of the types of disaster damage most relevant to the National Flood Insurance Program. 

## Flood Insurance Policies and Claims

FEMA publishes deidentified microdata on all [NFIP policies](https://www.fema.gov/openfema-data-page/openfema-dataset-fima-nfip-redacted-policies--v1) underwritten by the federal government, which cover nearly all flood insurance policies written in the United States. The data describe the amount of coverage, premiums and fees, deductibles, and attributes of the covered property. Properties are identified only down to the census tract, but the records also report flood zone and year of construction, which I use to match policies to the real estate data. Flood insurance policy data are available for the years `r policies[, min(year(policyEffectiveDate))]`–`r policies[, max(year(policyEffectiveDate))]`. I discard policies that cover only home contents and not the structure itself. Figure \@ref(fig:first-stage-graph) shows insurance takeup over time, with the Biggert-Waters reform indicated. Takeup is shown separately for adapted and non-adapted homes, as they are subject to different premium rate schedules and different price shocks. Adapted homes are those constructed (or substantially renovated) after the publication of their area's first flood insurance rate map (FIRM). New or renovated homes in special flood hazard areas with an existing, published FIRM are required to be adapted by elevating the lowest occupied level of the home above a pre-defined "base flood elevation" to reduce the risk of flood damage.
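As a minimal sketch of the adaptation rule, the function below classifies a home from its construction year and the year of its area's first FIRM. The column names `year_built` and `firm_year` are hypothetical stand-ins, not the actual ZTRAX or NFIP field names, and homes built in the FIRM year or lacking a recorded construction year are left unclassified (`NA`) because their requirement is ambiguous.

```r
# Hypothetical sketch: classify adaptation status from construction year
# relative to the first FIRM year (column names are assumptions)
classify_adapted <- function(year_built, firm_year) {
  # Adapted if built after the area's first FIRM was published
  adapted <- year_built > firm_year
  # Same-year and unrecorded construction years are ambiguous: leave as NA
  adapted[is.na(year_built) | is.na(firm_year) | year_built == firm_year] <- NA
  adapted
}

homes <- data.frame(
  year_built = c(1965, 1985, 1978, NA, 2001),
  firm_year  = c(1978, 1978, 1978, 1980, 1984)
)
homes$adapted <- classify_adapted(homes$year_built, homes$firm_year)
homes[!is.na(homes$adapted), ]  # the 1978 and unrecorded homes are discarded
```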

```{r first-stage-graph, echo=FALSE, fig.cap="Flood insurance adoption, driven by Biggert-Waters, separately for adapted and non-adapted homes. The key source of variation in my first stage.", fig.dim=c(7,4)}
# Mean matched policy probability by year and adaptation status
firststage <- nctrans_panel[full_sample == TRUE, .(polfrac = sum(policy_prob) / .N), keyby = .(year, adapted_text)]
ggplot(firststage, aes(x = year, y = polfrac)) + geom_point(aes(color = adapted_text, shape = adapted_text)) +
  geom_vline(xintercept = 2012.5) + ggtitle("Flood Insurance Adoption Over Time") + ylab("Fraction of Homes with Flood Insurance") +
  # annotate() draws the label once; geom_text() would redraw it for every row of firststage
  annotate("text", x = 2012.35, y = 0.03, label = "Biggert-Waters Act", angle = 90) +
  labs(shape = "Adaptation Status", color = "Adaptation Status")
```

The [NFIP claims data](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v1) tabulate every claim filed with the NFIP from `r claims[, min(year(dateOfLoss))]` through `r claims[, max(year(dateOfLoss))]`. The data include details about the insured property, the insurance coverage, the loss event, and the subsequent claim and payout. I discard claims that cover only loss of contents (either because the policyholder held no building coverage or because no claim was submitted for building damage), as well as claims with listed payouts that exceed the maximum coverage of $250,000. The claims data are used to identify which properties were exposed to flooding in a given year. Since a given property's vulnerability to flood damage is endogenous (through defensive investments, home elevation, and so on), I define exposure to flooding at the area level: every home in an intersection of census tract and flood zone is considered exposed in a given year if at least one flood claim was filed there. One potential issue with using insurance claims to define flood exposure is that it requires at least some homes in the exposed area to hold flood insurance. This may result in bias if areas with fewer insurance policies are less likely to register a flood event in my data, even when the true incidence of flooding is as high as or higher than in areas with more policies. A more comprehensive spatial flooding dataset, such as NASA's [MODIS Global Flood Mapping](https://floodmap.modaps.eosdis.nasa.gov/), could improve on this methodology in future research.
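The area-level exposure definition amounts to collapsing qualifying claims to unique tract, flood zone, and year combinations and merging the result onto the panel. The sketch below assumes a claims table with hypothetical columns (`tract`, `zone`, `loss_year`, `building_claim`, `building_paid`); it illustrates the rule under those assumptions and is not the code in `merge_and_analysis.R`.

```r
max_building_coverage <- 250000  # payout cap used to screen implausible claims

# Hypothetical sketch: flag tract x zone x year cells with a qualifying claim
flag_flood_events <- function(claims, panel) {
  # Keep claims that include building damage with a plausible payout
  keep <- with(claims, building_claim & !is.na(building_paid) &
                 building_paid <= max_building_coverage)
  # One row per tract x zone x year with at least one qualifying claim
  events <- unique(claims[keep, c("tract", "zone", "loss_year")])
  names(events)[names(events) == "loss_year"] <- "year"
  events$flood_event <- 1L
  # Location-years without a qualifying claim are coded as not flooded
  out <- merge(panel, events, by = c("tract", "zone", "year"), all.x = TRUE)
  out$flood_event[is.na(out$flood_event)] <- 0L
  out
}

claims <- data.frame(
  tract = c("A", "A", "A", "B"), zone = c("AE", "AE", "AE", "VE"),
  loss_year = c(2011L, 2011L, 2013L, 2011L),
  building_claim = c(TRUE, TRUE, FALSE, TRUE),  # third claim is contents-only
  building_paid = c(12000, 30000, 0, 400000)    # fourth exceeds the cap
)
panel <- data.frame(tract = rep(c("A", "B"), each = 3), zone = rep(c("AE", "VE"), each = 3),
                    year = rep(2011:2013, 2))
out <- flag_flood_events(claims, panel)
out
```

Only tract A, zone AE in 2011 is flagged here: the contents-only claim and the over-cap payout are both screened out before collapsing.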

Finally, I use the [National Flood Hazard Layer](https://www.fema.gov/flood-maps/national-flood-hazard-layer) geospatial database to assign individual properties to flood zones, allowing them to be matched with the policies and claims data. Flood zones are defined on the region's flood insurance rate maps following a topographic survey, and are categorized based on their probability of flooding, likely flood depth, and susceptibility to other flooding effects like coastal storm surge. The flood zones used in this analysis, collectively defined as Special Flood Hazard Areas, have at least a 1% chance of flooding in a given year. Homes within a given flood zone generally share a similar base flood elevation, the elevation that floodwater is expected to reach in a flood with a 1% annual chance, above which new homes are required to be elevated to reduce the risk of property damage.

## Housing Assessment and Transactions

The [ZTRAX data](https://www.zillow.com/research/ztrax/) are a real estate database comprising both assessor records and real estate transaction records, compiled into a nationwide database by Zillow [@ZillowGroup2020]. The assessor data include details about the parcel and primary structure, location (street address, census tract, latitude and longitude), and value (assessed and market values). I use this table to identify the sample of North Carolina homes. I limit the sample to single family homes, excluding rural residences (homes on productive agricultural land) as well as condominiums and similar structures. Assessment data are available from `r sfha_homes[, min(year(record_date))]` through `r sfha_homes[, max(year(record_date))]`.

The transaction data contain information about every real estate transaction recorded for the parcels in the assessor data. The records include the date and type of transaction; information about the buyer, seller, and lender, if applicable; the sale price and any taxes; and mortgage information. These data require extensive filtering to identify the records that correspond to "arm's length" transactions of true home sales, rather than refinancing, transfers to family members, or liens. Data are available for transactions from `r nctrans_panel[, min(year, na.rm = TRUE)]` into `r nctrans_panel[, max(year, na.rm = TRUE)]`. I discard homes built in the same year as their area's first flood insurance rate map (FIRM), which determines adaptation requirements, because the requirement for homes built that year is ambiguous. I also discard homes with no recorded year of construction.

## Constructed Panel

My analysis sample is built from arm's length transactions of single family homes in North Carolina special flood hazard areas, using data from 2009–2016. Because the flood insurance data are not available at the individual home level, I must imperfectly match flood insurance policies to homes by census tract, flood zone, and year of home construction. From these data, I construct a yearly panel of homes, defining the probability that each home holds a policy as the number of matched policies divided by the number of matched homes in its cell. I also construct indicators for whether each home was sold in that year and whether it was exposed to a flood event. I restrict the panel to homes built before 2009, so no homes enter the dataset in the middle of the analysis period. Table \@ref(tab:panel-summary-stats) provides summary statistics for the panel used in this analysis.
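The cell-level policy probability can be illustrated with a short base R sketch. It assumes hypothetical home and policy tables that share the match keys `tract`, `zone`, `year_built`, and `year`; each home receives the ratio of matched policies to matched homes in its cell, and cells with homes but no policies get a probability of zero.

```r
# Hypothetical match keys: census tract, flood zone, construction year, observation year
match_keys <- c("tract", "zone", "year_built", "year")

assign_policy_prob <- function(homes, policies) {
  # Count homes and policies in each match cell
  home_counts <- aggregate(data.frame(n_homes = rep(1L, nrow(homes))),
                           by = homes[match_keys], FUN = sum)
  pol_counts <- aggregate(data.frame(n_policies = rep(1L, nrow(policies))),
                          by = policies[match_keys], FUN = sum)
  # Cells with homes but no matched policies get zero policies
  cells <- merge(home_counts, pol_counts, by = match_keys, all.x = TRUE)
  cells$n_policies[is.na(cells$n_policies)] <- 0L
  # Every home in a cell gets the same probability of holding a policy
  cells$policy_prob <- cells$n_policies / cells$n_homes
  merge(homes, cells, by = match_keys)
}

homes <- data.frame(home_id = 1:5, tract = "A", zone = "AE",
                    year_built = c(1970, 1970, 1970, 1970, 1990), year = 2012)
policies <- data.frame(tract = c("A", "A"), zone = "AE", year_built = 1970, year = 2012)
out <- assign_policy_prob(homes, policies)
out
```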

```{r panel-summary-stats, echo=FALSE, warning=FALSE}
# panel descriptive statistics table. n, T, N, average homes per cell, average policies per cell
# Average transaction probability in the sample: 

summary_cols <- c("TotalMarketValue", "transaction_obs", "flood_event", "policy_prob", "adapted")
adapted_sum <- melt(nctrans_panel[full_sample == TRUE, lapply(.SD, mean, na.rm = TRUE), 
                                  by = .(adapted_text), .SDcols=summary_cols], id.vars = "adapted_text")
adapted_sum[variable == "TotalMarketValue", value := round(value, 0)]

flooded_areas <- unique(tzy_panel[in_sample == TRUE, .(panel_id, year, flood_event)])[, .(flooded=sum(flood_event) > 0), by = .(panel_id)][flooded == TRUE, panel_id]
flooded_sum <- melt(nctrans_panel[full_sample == TRUE, lapply(.SD, mean, na.rm = TRUE), 
                                  by = .(flooded=ifelse(panel_id %in% flooded_areas, "flooded during panel", "not flooded during panel")), 
                                  .SDcols=summary_cols], id.vars = "flooded")
flooded_sum[variable == "TotalMarketValue", value := round(value, 0)]

cellsizes_ad <- melt(tzy_panel[in_sample == TRUE, .(`homes per policy-match cell` = mean(properties_count), 
                                                 `standard deviation of homes per match cell` = sd(properties_count)), 
                            by=adapted_text], id.vars = "adapted_text")
cellsizes_fl <- melt(tzy_panel[in_sample == TRUE, .(`homes per policy-match cell` = mean(properties_count), 
                                                 `standard deviation of homes per match cell` = sd(properties_count)), 
                            by=.(flooded=ifelse(panel_id %in% flooded_areas, "flooded during panel", "not flooded during panel"))], id.vars = "flooded")

adapted_sum <- dcast(rbind(adapted_sum, cellsizes_ad), variable ~ adapted_text)
flooded_sum <- dcast(rbind(flooded_sum, cellsizes_fl), variable ~ flooded)
sumstats <- merge(adapted_sum, flooded_sum, by = "variable")
sumstats$variable <- c("Market Value", "Sale During Panel", "Flood Probability During Panel", 
                       "Probability of Holding Insurance During Panel", "Proportion Adapted (new homes)",
                       "Mean number of homes per policy-match cell",
                       "Standard deviation of homes per match cell")
# colnames(sumstats) <- c("", colnames(sumstats)[2:5])
sumstats_rn <- sumstats[, variable]
sumstats[, variable := NULL]
setDF(sumstats, rownames=sumstats_rn)
kable(sumstats, digits = 4, caption = "Summary statistics of the analysis panel, 2009--2016",
      format.args=list(big.mark = ",", scientific=FALSE, drop0trailing=TRUE), format="latex",
      booktabs = TRUE, linesep = "\\addlinespace", align = "lcccc") %>%
  kable_styling(latex_options = "scale_down") %>%
  column_spec(column=1, width="1.7in")
# pander(sumstats, caption = "(\\#tab:panel-summary-stats) Summary statistics of the analysis panel, 2009--2016",
#        justify="lcccc", split.cells=c(24, 11, 11, 13, 13))
```

One potential issue in my data is that year of construction is not always properly recorded in county records or by the NFIP, which adds error to the matching process. This issue is particularly pernicious because a sufficiently comprehensive renovation can result in a much later "effective" year of construction, subjecting a previously old house to new adaptation requirements, and county assessor records may or may not reflect this new construction year. The problem occasionally manifests as a match cell with more insurance policies than homes. In these instances, I drop from the analysis all homes and matched insurance policies in every intersection of census tract and flood zone that has this problem, for all construction years. Future research could resolve this mismatch or handle it more carefully. 
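The exclusion rule for over-matched cells can be sketched as follows, assuming a hypothetical cell-level table with counts `n_homes` and `n_policies` per tract, flood zone, construction year, and observation year. As written, the sketch drops an affected tract-by-zone location in every year; whether the original restriction also applies across all observation years is an assumption here.

```r
# Hypothetical sketch: drop every cell in a tract x zone location if any of its
# match cells has more policies than homes
drop_overmatched <- function(cells) {
  loc <- paste(cells$tract, cells$zone, sep = "|")
  # Locations where any match cell is over-matched
  bad_locs <- unique(loc[cells$n_policies > cells$n_homes])
  cells[!(loc %in% bad_locs), ]
}

cells <- data.frame(
  tract = c("A", "A", "B", "B"), zone = "AE",
  year_built = c(1960, 1990, 1960, 1990), year = 2012,
  n_homes = c(10, 3, 8, 5), n_policies = c(4, 5, 2, 1)
)
kept <- drop_overmatched(cells)
kept  # both tract A cells are dropped, including the 1960 cohort
```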

Another caveat is that I do not require sales within a year to occur after floods or after insurance policy effective dates. This potential reversal of causality could yield misleading results if, for example, homeowners tend to hold flood insurance immediately after purchasing a home in a flood zone (per federal mortgage requirements) but let their policy lapse in later years. Future research would benefit from a more careful comparison of flooding dates, policy start dates, and transaction dates, potentially combined with the more precise flood data mentioned above.

To inspect the variation in home sales relative to flood events, I run reduced form versions of my two primary specifications, regressing transaction probability (either in the observed year or over the subsequent two years) on flood indicators by years since a flood event, interacted with adapted status. The results of these reduced form regressions, shown in Table \@ref(tab:reduced-form), indicate at least some responsiveness to flooding. The response does not appear to be materially different for adapted and non-adapted homes. 

```{r transprob-byflood, echo=FALSE, warning=FALSE, results='asis'}
transprob_flood_es <- felm(transaction_obs ~ adapted*flood_event + adapted*flood_L1 + adapted*flood_L2
                        + adapted*flood_L3 + adapted*flood_F1 | panel_id + year + YearBuilt | 0 | censusTract,
                        data = nctrans_panel[es_sample_3L == TRUE], exactDOF = TRUE)
# summary(transprob_flood_es)
transprob_flood_dd <- felm(sale_2years ~ adapted*flood_event | panel_id + year + YearBuilt | 0 | censusTract,
                        data = nctrans_panel[dd_sample == TRUE], exactDOF = TRUE)
# summary(transprob_flood_dd)

stargazer(transprob_flood_dd, transprob_flood_es, type=ifelse(knitr::is_latex_output(), "latex", ifelse(knitr::is_html_output(),"html" , "text")), 
          covariate.labels = c("adapted", "flood (concurrent)", "flood (1 year ago)", "flood (2 years ago)",
                               "flood (3 years ago)", "flood (next year)", "flood x adapted (concurrent)",
                    "flood x adapted (1 year ago)", "flood x adapted (2 years ago)",
                    "flood x adapted (3 years ago)", "flood x adapted (next year)"),
          title="Reduced Form: Home Sale Probability Against Flood Events and Adaptation", 
          dep.var.caption = "Probability of home sale…", dep.var.labels.include = FALSE, 
          column.labels = c("Within the next two years", "This year"),
          df=FALSE, digits = 4, header = FALSE, keep.stat = c("n", "rsq"), label="tab:reduced-form",
          notes = "\\parbox[t]{0.5\\textwidth}{Fixed effects: census tract x flood zone, observation year, home construction year. Clustering SEs by census tract.}",
          no.space = TRUE, single.row = TRUE, font.size = "small")
```

# Model

My research question has two parts: first, does experiencing a flood lead homeowners to sell their properties, and second, does holding flood insurance modify this effect? To explore these questions, I apply a difference-in-differences approach to a linear probability model, regressing an indicator for whether the home is sold within two years of the observation year on whether the home is exposed to flooding, whether the home holds flood insurance (imperfectly matched), and the interaction of the two in a given observation year:
$$Transaction_{lict} = \alpha Flood_{lt} + \beta Insurance_{lct} + \gamma (Flood_{lt} \times Insurance_{lct}) + \varepsilon_{lict}$$
The regression also includes fixed effects for location (the intersection of a census tract and a flood zone), year, and year of home construction, as well as an indicator for whether the home is subject to an adaptation requirement (which is determined by location and year of construction). The unit of analysis is an individual home in a given year. In the estimating equation, $l$ indexes locations, $i$ indexes individual homes, $c$ indexes year of construction, and $t$ indexes the year of observation, so the subscript $lict$ indexes a single observation. I do not include home attributes or other typical hedonic regressors; to the extent that such attributes are shared by homes in the same location and construction cohort, they would be absorbed by the fixed effects. 
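To show what the interaction coefficient $\gamma$ captures, the sketch below simulates a panel with made-up parameters (true $\gamma = 0.15$) and estimates the linear probability model by OLS, entering location, year, and construction-year fixed effects as factor dummies in base R's `lm()`. It is an illustration only: the paper's estimates use `lfe::felm` with standard errors clustered by census tract, and the simulation omits the adaptation indicator and any error clustering.

```r
set.seed(42)
n_loc <- 80; n_cohort <- 10; homes_per_cell <- 6; years <- 2009:2014

# Homes nested in locations (tract x flood zone) and construction cohorts
homes <- expand.grid(home = seq_len(homes_per_cell), cohort = seq_len(n_cohort),
                     loc = seq_len(n_loc))
panel <- merge(homes, data.frame(year = years))  # cross join: one row per home-year

# Flood exposure varies by location-year; the insurance share by location-cohort-year
loc_year <- expand.grid(loc = seq_len(n_loc), year = years)
loc_year$flood <- rbinom(nrow(loc_year), 1, 0.3)
cell_year <- expand.grid(loc = seq_len(n_loc), cohort = seq_len(n_cohort), year = years)
cell_year$ins <- runif(nrow(cell_year))
panel <- merge(merge(panel, loc_year), cell_year)

# Sale probability is linear in flood, insurance, and their interaction (made-up values)
loc_fe  <- runif(n_loc, -0.03, 0.03)
year_fe <- runif(length(years), -0.01, 0.01)
alpha <- -0.04; beta <- 0.02; gamma <- 0.15
p <- 0.10 + loc_fe[panel$loc] + year_fe[panel$year - min(years) + 1] +
  alpha * panel$flood + beta * panel$ins + gamma * panel$flood * panel$ins
panel$sold <- rbinom(nrow(panel), 1, p)

# Linear probability model with location, year, and construction-cohort fixed effects
fit <- lm(sold ~ flood * ins + factor(loc) + factor(year) + factor(cohort), data = panel)
round(coef(fit)[c("flood", "ins", "flood:ins")], 3)
```

In this setup $\gamma$ is identified from whether the slope of sale probability in the insurance share differs between flooded and non-flooded location-years, net of the fixed effects.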

Since selling a home is often a time-consuming process, I look across multiple years to observe a home sale. Because the transaction data end in 2016, I restrict the sample in this regression to 2009–2014, so that two additional years of potential transactions are observed after the final observation year. This is my preferred specification for detecting an effect on home transactions. 
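The two-year sale outcome and the 2009–2014 restriction can be sketched with a forward-looking indicator built from annual sale flags. This sketch assumes the window covers the observation year and the following two years (the hypothetical `horizon` argument changes this), and it sets the indicator to `NA` when the window extends past the last year of transaction data, which is what ends the usable sample in 2014 when the data end in 2016. The exact window behind `sale_2years` in the analysis code is not shown here.

```r
# Hypothetical sketch: 1 if the home sells in years t..t+horizon, NA if that
# window is not fully observed
sale_within <- function(id, year, sold, horizon = 2, last_year = max(year)) {
  key <- paste(id, year)
  out <- numeric(length(id))
  for (k in 0:horizon) {
    # Position of the same home's record k years ahead (NA if absent)
    ahead <- match(paste(id, year + k), key)
    out <- pmax(out, sold[ahead], na.rm = TRUE)
  }
  # The full window is not observed near the end of the transaction data
  out[year + horizon > last_year] <- NA
  out
}

id   <- c(1, 1, 1, 1, 2, 2, 2, 2)
year <- rep(2013:2016, 2)
sold <- c(0, 0, 1, 0, 0, 0, 0, 0)  # home 1 sells in 2015
sale_within(id, year, sold)
```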

Because the effect may accumulate over time, however, I also estimate the effect separately in each year relative to a flood, using an event study form of this regression. This specification regresses an indicator for whether the home was sold _in the observation year_ on the same flood and insurance variables as the earlier specification, but with several years of lag and lead variables:
$$P(transaction_{lict}) = \sum_{\tau=-3}^{1} \left[\alpha_{\tau} Flood_{l,t+\tau} + \beta_{\tau} Insurance_{lc,t+\tau} + \gamma_{\tau} (Flood_{l,t+\tau} \times Insurance_{lc,t+\tau}) \right] + \varepsilon_{lict}$$
As before, the regression also includes fixed effects for location (the intersection of a census tract and a flood zone), year, year of home construction, and whether the home is subject to an adaptation requirement. The unit of analysis is again a single home in a single year, and the subscripts have the same meaning as before. The index $\tau$ takes on values from -3 (a three-year lag) to +1 (a one-year lead), allowing me to study how the effect on home sales changes over time. To have a full set of lags, I restrict the sample years based on the number of lags included: with two lags, I can consider sales from 2011–2016, while three lags restrict the panel to 2012–2016. Because the NFIP data extend beyond the housing data, I can create one-year lead variables of flood events and insurance policies as a check for pretrends without reducing the sample size of the panel. 
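Lags and leads of location-year variables can be attached by looking up the same location's value $k$ years earlier or later. The base R sketch below uses a hypothetical `shift_by_year()` helper; a missing shifted year yields `NA`, which is why the event study sample must be trimmed to years with a full set of lags.

```r
# Hypothetical helper: value of x for the same id in year t - k
# (k > 0 gives a lag, k < 0 gives a lead; NA if that year is not observed)
shift_by_year <- function(id, year, x, k) {
  x[match(paste(id, year - k), paste(id, year))]
}

loc   <- rep(c("A", "B"), each = 4)
year  <- rep(2010:2013, 2)
flood <- c(0, 1, 0, 0, 1, 0, 0, 1)
flood_L1 <- shift_by_year(loc, year, flood, 1)   # flood in t - 1
flood_F1 <- shift_by_year(loc, year, flood, -1)  # flood in t + 1
# Keep only location-years with the full set of shifted values
full_window <- !is.na(flood_L1) & !is.na(flood_F1)
data.frame(loc, year, flood, flood_L1, flood_F1, full_window)
```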

Because the choice to purchase flood insurance is endogenous, I instrument for insurance takeup using an exogenous policy shock that changed flood insurance rate schedules. The Biggert-Waters Flood Insurance Reform Act of 2012 made several changes to NFIP rate schedules starting in 2013. In particular, it mandated annual premium increases that were applied differently to adapted and non-adapted homes. Homes built or substantially renovated after the adoption of the first FIRM panel covering the property are required to be adapted in their design to reduce the risk of flood damage, primarily through elevation of the lowest occupied level of the home. In my data, adapted status is defined by the year of home construction relative to the year of FIRM adoption. Biggert-Waters mandated premium increases for older, non-adapted homes, but did not include any mandates or policy changes that might affect home sales other than through flood insurance. This policy change should therefore provide exogenous variation to identify the effect of holding flood insurance. Specifically, I instrument using the interaction of an indicator for the years after Biggert-Waters was passed and the home's adaptation status ($\mathbb{I}(t > 2012) \times Adapted_{lc}$), along with the interaction of this instrument with the flood indicator ($\mathbb{I}(t > 2012) \times Adapted_{lc} \times Flood_{lt}$) and the relevant lag and lead versions. This instrument structure leverages the exogenous variation from the _differential_ price shocks between adapted and non-adapted homes while still allowing year fixed effects, which would not be possible if instrumenting only with the year indicator $\mathbb{I}(t > 2012)$, since that indicator is collinear with the year fixed effects.
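The instrument construction and the resulting just-identified IV estimator can be illustrated on simulated cross-sectional data. In this sketch, which is not the paper's estimation code, `post` stands in for $\mathbb{I}(t > 2012)$, the instruments are `post * adapted` and `post * adapted * flood`, an unobserved confounder `u` drives both insurance and sales, and the estimator is $\hat{b} = (Z'X)^{-1}Z'y$. All parameter values (true $\beta = 0.2$, $\gamma = -0.3$) are made up, and fixed effects and clustering are omitted.

```r
set.seed(7)
n <- 200000

# Exogenous variables (made-up probabilities)
adapted <- rbinom(n, 1, 0.5)
post    <- rbinom(n, 1, 0.5)  # stands in for I(t > 2012)
flood   <- rbinom(n, 1, 0.3)
u       <- rnorm(n)           # unobserved confounder

# Instruments as described in the text
z1 <- post * adapted
z2 <- post * adapted * flood

# Endogenous insurance measure: shifted by the reform and by the confounder
ins <- 0.3 + 0.25 * z1 + 0.1 * adapted + 0.15 * u + rnorm(n, sd = 0.1)

# Outcome with true beta = 0.2 (insurance) and gamma = -0.3 (flood x insurance)
y <- 0.1 - 0.05 * flood + 0.2 * ins - 0.3 * flood * ins +
  0.05 * post + 0.02 * adapted + 0.3 * u + rnorm(n, sd = 0.2)

# Regressors and instruments share the exogenous controls
X <- cbind(const = 1, post, adapted, flood, ins, flood_ins = flood * ins)
Z <- cbind(const = 1, post, adapted, flood, z1, z2)

# Just-identified IV, b = (Z'X)^{-1} Z'y, with OLS for comparison
b_iv  <- solve(crossprod(Z, X), crossprod(Z, y))
b_ols <- solve(crossprod(X), crossprod(X, y))
round(cbind(iv = b_iv[, 1], ols = b_ols[, 1]), 3)
```

With the confounder present, the OLS coefficient on `ins` lands well above its true value, while the IV estimates recover $\beta$ and $\gamma$. The flood-interacted instrument is what separates $\gamma$ from $\beta$, which is the margin on which the text reports insufficient variation in the real data.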

Table \@ref(tab:first-stage-output) shows example first stage regressions for both the difference-in-differences and event study specifications. Because insurance takeup appears both as a main effect and in an interaction (and, in the event study, each again with multiple lag terms), the Biggert-Waters-induced price variation, lagged and interacted with flood experience, must provide enough independent variation to instrument for every appearance of the endogenous variable in the specification. This requirement is a particular concern for the event study specification with several lags, and additional sources of exogenous variation would help identify the multiple endogenous regressors.
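The identification requirement for the difference-in-differences specification can be stated compactly in the notation of the estimating equation, writing $Post_t \equiv \mathbb{I}(t > 2012)$ as a shorthand introduced here. The endogenous regressors and the excluded instruments are

$$
\underbrace{\left(Insurance_{lct},\ Flood_{lt} \times Insurance_{lct}\right)}_{\text{endogenous}}
\qquad
\underbrace{\left(Post_t \times Adapted_{lc},\ Post_t \times Adapted_{lc} \times Flood_{lt}\right)}_{\text{excluded instruments}}
$$

and the first stage can be written as

$$
\begin{pmatrix} Insurance_{lct} \\ Flood_{lt} \times Insurance_{lct} \end{pmatrix}
= \Pi \begin{pmatrix} Post_t \times Adapted_{lc} \\ Post_t \times Adapted_{lc} \times Flood_{lt} \end{pmatrix} + \text{controls} + \text{error},
\qquad \operatorname{rank}(\Pi) = 2 .
$$

With two excluded instruments for two endogenous regressors, the model is exactly identified: the order condition holds with no margin, so identification rests entirely on the rank condition that the $2 \times 2$ matrix $\Pi$ of first-stage coefficients on the excluded instruments has full rank. In the event study, each period $\tau$ adds one endogenous pair and one instrument pair, so the system stays exactly identified, but the number of first-stage coefficients that must be jointly well estimated grows with the number of lags and leads.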

```{r first-stage-policy, include=FALSE}
# First stage, diff-in-diff: insurance takeup
FS_policy_prob <- felm(policy_prob ~ flood_event*(reg_reform:adapted) + adapted | panel_id + year + YearBuilt
                             | 0 | censusTract, data = nctrans_panel[dd_sample == TRUE], exactDOF = TRUE)
# summary(FS_policy_prob)
```

```{r first-stage-flpl, include=FALSE}
# First stage, diff-in-diff: flood x insurance takeup
FS_policy_fld_prob <- felm(flooded_insured ~ flood_event*(reg_reform:adapted) + adapted | panel_id + year + YearBuilt
                             | 0 | censusTract, data = nctrans_panel[dd_sample == TRUE], exactDOF = TRUE)
# summary(FS_policy_fld_prob)
```

```{r first-stage-policy-lags, include=FALSE}
# First stage, event study (two lags, one lead): insurance takeup
FS_policy_prob_lags <- felm(policy_prob ~ flood_event*(reg_reform:adapted) + flood_L1*(reg_reform_L1:adapted) 
                                + flood_L2*(reg_reform_L2:adapted) + flood_F1*(reg_reform_F1:adapted)
                                + adapted | panel_id + year + YearBuilt
                             | 0 | censusTract, data = nctrans_panel[es_sample_2L == TRUE])
# summary(FS_policy_prob_lags)
```

```{r first-stage-flpl-lags, include=FALSE}
# First stage, event study (two lags, one lead): flood x insurance takeup
FS_policy_fld_prob_lags <- felm(flooded_insured ~ flood_event*(reg_reform:adapted) + flood_L1*(reg_reform_L1:adapted) 
                                + flood_L2*(reg_reform_L2:adapted) 
                                + flood_F1*(reg_reform_F1:adapted) + adapted | panel_id + year + YearBuilt
                             | 0 | censusTract, data = nctrans_panel[es_sample_2L == TRUE])
# summary(FS_policy_fld_prob_lags)
```

```{r first-stage-output, echo=FALSE, warning=FALSE, results='asis'}
first_stage_covars <- c("flood (concurrent)", "flood (1 year ago)", "flood (2 years ago)", 
                    "flood (next year)", "adapted", "adapted x year>2012",
                    "adapted x year>2012 (1 year ago)", "adapted x year>2012 (2 years ago)",
                    "adapted x year>2012 (next year)", "flood x adapted x year>2012",
                    "flood x adapted x year>2012 (1 year ago)", 
                    "flood x adapted x year>2012 (2 years ago)",
                    "flood x adapted x year>2012 (next year)")
first_stage_deps <- c("concurrent", "1 year ago", "2 years ago", "next year")
stargazer(FS_policy_prob, FS_policy_fld_prob, FS_policy_prob_lags, FS_policy_fld_prob_lags,
          type=ifelse(knitr::is_latex_output(), "latex", ifelse(knitr::is_html_output(),"html" , "text")), 
          covariate.labels = first_stage_covars, no.space = TRUE, column.labels = c("Difference-in-Difference", "Event Study"),
          title="First Stage: Instrumenting for Insurance Takeup", font.size = "small", column.separate = c(2,2),
          dep.var.caption = "Probability of Insurance Takeup", dep.var.labels = rep(c("Insured", "Insured x Flooded"), 2),
          df=FALSE, digits = 4, header = FALSE, keep.stat = c("n", "rsq"), label="tab:first-stage-output", column.sep.width = "1pt",
          notes = "\\parbox[t]{0.6\\textwidth}{Fixed effects: census tract x flood zone, observation year, and home construction year. Clustering SEs by census tract.}")
```

This econometric approach requires two identification assumptions. First, the probability that a home experiences flooding must be exogenous, conditional on census tract, flood zone, and adaptation status. Floods are highly spatially and temporally correlated, and homeowners can undertake substantial defensive investments (primarily through elevation) to reduce their risk. But by defining a flood event at the census tract x flood zone level, I abstract from the peculiarities of any given house and consider only the generalized flood risk in the local area. Once these broader spatial effects are controlled for, exogeneity of flood probability is much more credible.
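
The cell-level flood definition can be sketched as follows. The rule used here, flagging a census tract x flood zone cell as flooded in a year if any home in it files an NFIP claim, is a hypothetical stand-in for the paper's actual definition, and the column names are illustrative. The point is that every home in a flooded cell receives the same indicator regardless of its own claim history.

```r
# Hypothetical home-level records for one year
homes <- data.frame(
  tract = c("A", "A", "A", "B", "B"),
  zone  = c("AE", "AE", "X", "AE", "AE"),
  year  = 2015,
  claim = c(1, 0, 0, 0, 0)  # 1 if the home filed an NFIP claim that year (hypothetical rule)
)

# Flood_lt: 1 for every home in a tract x zone x year cell with at least one claim
homes$flood_event <- ave(homes$claim, homes$tract, homes$zone, homes$year, FUN = max)
print(homes)
```

The second home in tract A never filed a claim but is still coded as flooded, while the zone X home in the same tract is not.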

The second identification assumption is that any time trends in transaction probability are common to adapted and non-adapted homes. Because the instrument for insurance leverages _differential_ price changes for adapted and non-adapted homes, common time trends do not affect the coefficient estimates, but differential time trends would violate the exclusion restriction and bias the estimates. Older and newer homes may well have had diverging sale trends during this period. However, the threshold year separating adapted from non-adapted homes varies across areas, depending on when the first FIRM was published for the region: it ranges primarily from the mid-1970s to the mid-1990s, with some initial FIRMs established even later. This diversity of adaptation years should dilute any differing trends driven by home construction year. Figure \@ref(fig:trans-prob) shows the fraction of homes sold in each sample year, separately for adapted and non-adapted homes. While the probability levels differ, the trends in the two series appear similar, lending credence to this identification assumption.
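
An informal numerical complement to the figure is to check whether the gap in annual sale probability between adapted and non-adapted homes trends over the pre-reform years. The sketch below is an illustrative diagnostic on a synthetic panel, not an analysis reported in this paper; the function name `pretrend_gap_slope` and the columns `sold`, `adapted`, and `year` are hypothetical. A slope near zero is consistent with common trends, though it cannot rule out divergence after 2012.

```r
# Slope of the (adapted - non-adapted) gap in annual sale probability, pre-reform years only
pretrend_gap_slope <- function(df, last_pre_year = 2012) {
  pre   <- df[df$year <= last_pre_year, ]
  rates <- tapply(pre$sold, list(pre$year, pre$adapted), mean)  # rows: years, cols: 0/1
  gap   <- rates[, "1"] - rates[, "0"]
  yrs   <- as.numeric(rownames(rates))
  unname(coef(lm(gap ~ yrs))["yrs"])
}

# Synthetic panel: 100 homes per group-year, sale counts rising in parallel over time
toy <- do.call(rbind, lapply(2008:2015, function(y) {
  k0 <- 5 + (y - 2008)  # non-adapted homes sold
  k1 <- 3 + (y - 2008)  # adapted homes sold: lower level, same trend
  data.frame(year = y, adapted = rep(c(0, 1), each = 100),
             sold = c(rep(1, k0), rep(0, 100 - k0), rep(1, k1), rep(0, 100 - k1)))
}))

pretrend_gap_slope(toy)  # approximately 0: parallel pre-trends by construction
```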

```{r trans-prob, echo=FALSE, fig.cap="Home sale probability over time, separately for adapted and non-adapted homes.", fig.dim=c(7,4)}
transtime <- nctrans_panel[full_sample == TRUE, .(transfrac = mean(transaction_obs)), keyby = .(year, adapted_text)]
# annotate() draws the label once (geom_text() with aes() redraws it for every row of transtime)
ggplot(transtime, aes(x = year, y = transfrac)) + geom_point(aes(color = adapted_text, shape = adapted_text)) +
  geom_vline(xintercept = 2012.5) + annotate("text", x = 2012.35, y = .055, label = "Biggert-Waters Act", angle = 90) +
  ggtitle("Home Sale Probability Over Time") + ylab("Probability of Home Sale") +
  labs(shape = "Adaptation Status", color = "Adaptation Status")
```

# Results

## OLS

```{r OLS-2-noF, include=FALSE}
# Two Lags, no Leads
OLS_transprob_reg_2 <- felm(transaction_obs ~ flood_event*policy_prob + flood_L1*policy_prob_L1 
                        + flood_L2*policy_prob_L2 | panel_id + year + YearBuilt + adapted
                        | 0 | censusTract, data = nctrans_panel[es_sample_2L == TRUE])
# summary(OLS_transprob_reg_2)
```

```{r OLS-3-noF, include=FALSE}
# Three Lags, No Leads
OLS_transprob_reg_3 <- felm(transaction_obs ~ flood_event*policy_prob + flood_L1*policy_prob_L1 
                        + flood_L2*policy_prob_L2 + flood_L3*policy_prob_L3 | panel_id + year + YearBuilt + adapted
                        | 0 | censusTract, data = nctrans_panel[es_sample_3L == TRUE])
# summary(OLS_transprob_reg_3)
```

```{r OLS-2-F, include=FALSE}
# Two Lags, One Lead
OLS_transprob_reg_2_lead <- felm(transaction_obs ~ flood_event*policy_prob + flood_L1*policy_prob_L1 
                        + flood_L2*policy_prob_L2 + flood_F1*policy_prob_F1 | panel_id + year + YearBuilt + adapted
                             | 0 | censusTract, data = nctrans_panel[es_sample_2L == TRUE])
# summary(OLS_transprob_reg_2_lead)
```

```{r OLS-DD, include=FALSE}
# diff-in-diff
OLS_transprob_dd <- felm(sale_2years ~ flood_event*policy_prob | panel_id + year + YearBuilt + adapted
                             | 0 | censusTract, data = nctrans_panel[dd_sample == TRUE])
# summary(OLS_transprob_dd)
```

I first consider the OLS results of these models without instrumenting for insurance takeup. I run four regressions: my preferred difference-in-difference specification (column 1) and three event study specifications (columns 2-4). To explore time delays in the effects of flood events and insurance, the event studies vary the number of lag and lead years, which changes the available sample size. Table \@ref(tab:OLS-output) shows the regression outputs for these uninstrumented models. The event study lag terms are indexed from the perspective of the observation year of the potential sale. For example, the _flood (2 years ago)_ variable captures whether homes are more likely to be sold if they flooded two years prior or, more naturally, whether homes are more likely to be sold two years after a flood.
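
The lag and lead indexing can be made concrete with a small helper that shifts a variable within each home's history. This is a sketch assuming a panel sorted by home and year with no missing years; the helper `shift_within` and the toy columns are hypothetical rather than the code used to build `flood_L1` or `flood_F1` in `nctrans_panel`. It also shows why adding lags and leads shrinks the estimation sample: the first years of each home have no lag, and the last year has no lead.

```r
# Shift x by k years within each home: k > 0 gives a lag, k < 0 a lead.
# Assumes rows are sorted by id and year with no missing years.
shift_within <- function(x, id, k) {
  ave(x, id, FUN = function(v) {
    n <- length(v)
    m <- min(abs(k), n)
    if (k >= 0) c(rep(NA, m), head(v, n - m)) else c(tail(v, n - m), rep(NA, m))
  })
}

toy <- data.frame(id = rep(1:2, each = 3), year = rep(2011:2013, times = 2),
                  flood = c(1, 0, 0,  0, 0, 1))
toy$flood_L1 <- shift_within(toy$flood, toy$id, 1)   # "flood (1 year ago)"
toy$flood_F1 <- shift_within(toy$flood, toy$id, -1)  # "flood (next year)"
print(toy)

# Rows usable in a specification with one lag and one lead
usable <- complete.cases(toy[, c("flood_L1", "flood_F1")])
```

Only the middle year of each home survives, so each added lag or lead removes one more year per home from the estimation sample.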

Without instrumenting for insurance takeup, the main effects of flooding and of holding insurance drive the explained variation across all models. Homes that have flooded are less likely to be sold in the subsequent one to two years, consistent with @Zivin2020, who find a reduction in transaction probability in the years following a hurricane. Homes holding flood insurance are overall more likely to sell than homes without. In the event study results, insurance held in the observation year increases sale probability, while homes that held flood insurance in one or two prior years were less likely to be sold. These effects are robust across specifications, but are very likely due to the endogeneity of insurance takeup. Homes in SFHAs with federally backed mortgages are required to hold flood insurance, so recently purchased homes are likely to be insured. However, enforcement is weak, so takeup falls in later years of ownership. Because the data are annual, this combination suggests that these effects are at least partially driven by sales occurring before flood events within the same year, and by new homeowners buying flood insurance upon purchase, per mortgage requirements. Further analysis of the relative timing of these events within a year may be able to disentangle these effects. I do not observe any significant differential effect of flooding for policyholders.
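
The timing argument can be illustrated with a small simulation in which insurance has no causal effect on sales, but new buyers tend to take up insurance in the year of purchase (a stand-in for the mortgage requirement). The takeup rates and sale probability below are hypothetical. Because sale and insurance are measured in the same year, OLS of a sale indicator on concurrent insurance picks up a positive coefficient purely from this reverse timing.

```r
set.seed(42)
n <- 10000

# Sales occur independently of insurance (true causal effect of insurance is zero)
sold <- rbinom(n, 1, 0.05)

# Hypothetical takeup: buyers insure in the purchase year 80% of the time, others 30%
insured <- ifelse(sold == 1, rbinom(n, 1, 0.8), rbinom(n, 1, 0.3))

# OLS of sale on concurrent insurance recovers a spurious positive association
ols_coef <- coef(lm(sold ~ insured))[["insured"]]
ols_coef  # roughly 0.1 in expectation, despite no causal effect
```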

```{r OLS-output, echo=FALSE, warning=FALSE, results='asis'}
covariate_labs <- c("flood (concurrent)", "flood (1 year ago)", "flood (2 years ago)", "flood (3 years ago)", 
                    "flood (next year)", "insurance (concurrent)", "insurance (1 year ago)", "insurance (2 years ago)",
                    "insurance (3 years ago)","insurance (next year)", "flood x insurance (concurrent)",
                    "flood x insurance (1 year ago)", "flood x insurance (2 years ago)",
                    "flood x insurance (3 years ago)", "flood x insurance (next year)")
stargazer(OLS_transprob_dd, OLS_transprob_reg_2, OLS_transprob_reg_3, OLS_transprob_reg_2_lead, 
          type=ifelse(knitr::is_latex_output(), "latex", ifelse(knitr::is_html_output(),"html" , "text")),
          covariate.labels = covariate_labs, order = c(1,3,5,7,9,2,4,6,8,10,11:15), 
          title="Home Sale Probability Against Flood Events and Insurance, OLS", 
          dep.var.caption = "Probability of Home Sale…", column.labels = c("Within the next two years", "This year"),
          dep.var.labels.include = FALSE, label="tab:OLS-output", column.separate = c(1,3),
          column.sep.width = "2pt", df=FALSE, digits = 4, header = FALSE, keep.stat = c("n", "rsq"),
          notes = "\\parbox[t]{0.55\\textwidth}{Fixed effects: census tract x flood zone, observation year, home construction year, adaptation status. Clustering SEs by census tract.}", font.size = "small")
```

## Instrumenting for Insurance Takeup

```{r IV-2-noF, include=FALSE}
# Two Lags, no Leads
transprob_reg_2 <- felm(transaction_obs ~ flood_event + flood_L1 + flood_L2 | panel_id + year + YearBuilt + adapted
                        | (policy_prob | policy_prob_L1 | policy_prob_L2 | flooded_insured 
                           | flooded_insured_L1 | flooded_insured_L2 
                           ~ flood_event:reg_reform:adapted + reg_reform:adapted + flood_L1:reg_reform_L1:adapted
                           + reg_reform_L1:adapted + flood_L2:reg_reform_L2:adapted + reg_reform_L2:adapted) 
                        | censusTract, data = nctrans_panel[es_sample_2L == TRUE], exactDOF = TRUE)
# summary(transprob_reg_2)
```

```{r IV-3-noF, include=FALSE}
# Three Lags, No Leads
transprob_reg_3 <- felm(transaction_obs ~ flood_event + flood_L1 + flood_L2 + flood_L3 | panel_id + year + YearBuilt + adapted
                        | (policy_prob | policy_prob_L1 | policy_prob_L2 | policy_prob_L3 
                           | flooded_insured | flooded_insured_L1 | flooded_insured_L2 | flooded_insured_L3 
                           ~ flood_event:reg_reform:adapted + reg_reform:adapted + flood_L1:reg_reform_L1:adapted
                           + reg_reform_L1:adapted + flood_L2:reg_reform_L2:adapted + reg_reform_L2:adapted 
                           + flood_L3:reg_reform_L3:adapted + reg_reform_L3:adapted) 
                        | censusTract, data = nctrans_panel[es_sample_3L == TRUE], exactDOF = TRUE)
# summary(transprob_reg_3)
```

```{r IV-2-F, include=FALSE}
# Two Lags, One Lead
transprob_reg_2_lead <- felm(transaction_obs ~ flood_event + flood_L1 + flood_L2 + flood_F1 | panel_id + year + YearBuilt + adapted
                             | (policy_prob | policy_prob_L1 | policy_prob_L2 | policy_prob_F1 
                                | flooded_insured | flooded_insured_L1 | flooded_insured_L2 | flooded_insured_F1
                                ~ flood_event:reg_reform:adapted + reg_reform:adapted + flood_L1:reg_reform_L1:adapted
                                + reg_reform_L1:adapted + flood_L2:reg_reform_L2:adapted + reg_reform_L2:adapted
                                + flood_F1:reg_reform_F1:adapted + reg_reform_F1:adapted) 
                             | censusTract, data = nctrans_panel[es_sample_2L == TRUE], exactDOF = TRUE)
# summary(transprob_reg_2_lead)
```

```{r IV-DD, include=FALSE}
# Diff in Diff
transprob_reg_dd <- felm(sale_2years ~ flood_event | panel_id + year + YearBuilt + adapted
                             | (policy_prob | flooded_insured ~ flood_event:reg_reform:adapted + reg_reform:adapted) 
                             | censusTract, data = nctrans_panel[dd_sample == TRUE], exactDOF = TRUE)
# summary(transprob_reg_dd)
```

To separate true effects from the endogeneity of insurance takeup, I instrument for takeup with the Biggert-Waters premium reforms. I again run four regressions: my preferred difference-in-difference specification (column 1) and three event study specifications (columns 2-4), which vary the number of lag and lead years and therefore the available sample. Note that, given my sample, I cannot include both a three-year lag and a one-year lead in the same regression: because of the position of the Biggert-Waters Act within the sample period, including three lags restricts the sample such that there is no variation in the instrument for the one-year lead variable.
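
The lost variation can be illustrated with a toy example. Suppose, purely for illustration, that after requiring three lags every remaining observation year is 2012 or later (these years are hypothetical, not the paper's actual sample), and assume the lead indicator `reg_reform_F1` equals $\mathbb{I}(t + 1 > 2012)$. The lead instrument then collapses to the adaptation indicator itself, which the adapted fixed effect already absorbs.

```r
# Hypothetical three-lag estimation sample: all observation years are 2012 or later
toy <- expand.grid(year = 2012:2015, adapted = c(0, 1))

# Lead of the reform indicator, assumed to be I(t + 1 > 2012)
toy$reg_reform_F1 <- as.integer(toy$year + 1 > 2012)

# Lead instrument: I(t + 1 > 2012) x Adapted
z_lead <- toy$reg_reform_F1 * toy$adapted

# The instrument duplicates the adapted indicator, which the adapted fixed effect absorbs
X <- cbind(adapted = toy$adapted, z_lead = z_lead)
qr(X)$rank  # 1 rather than 2: no independent variation left for the lead instrument
```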

Table \@ref(tab:IV-output) shows the regression results when I instrument for insurance takeup. In all specifications, I am unable to find any significant effects. In addition, the coefficient estimates are implausibly large, suggesting that the instrument does not provide enough exogenous variation for the multiple endogenous variables in each specification. I have both $\mathbb{I}(t > 2012) \times Adapted_{lc}$ and $\mathbb{I}(t > 2012) \times Adapted_{lc} \times Flood_{lt}$ to instrument for $Insurance_{lct}$ and $Flood_{lt} \times Insurance_{lct}$, and the instruments provide a nontrivial amount of predictive power for each endogenous variable individually (per Table \@ref(tab:first-stage-output)). Even so, there is not enough independent variation to identify both endogenous variables simultaneously, likely because flooding and holding insurance are both relatively uncommon in my data. The problem is worse in the event study specifications, which have six or eight endogenous regressors to instrument. Without sufficient exogenous variation, my IV estimates cannot be credibly interpreted, not even their signs.
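
The failure described above is a joint rank problem rather than a problem of any single weak first stage. A small, entirely synthetic simulation (the variables `z1`, `z2`, `x1`, and `x2` are hypothetical and unrelated to the data) shows how two instruments can each strongly predict each endogenous regressor while the pair still fails to separate them.

```r
set.seed(7)
n  <- 5000
z1 <- rnorm(n)  # excluded instrument 1
z2 <- rnorm(n)  # excluded instrument 2

# Two endogenous regressors driven by nearly the same combination of instruments
x1 <- z1 + z2 + rnorm(n)
x2 <- z1 + 1.05 * z2 + rnorm(n)

fs1 <- lm(x1 ~ z1 + z2)
fs2 <- lm(x2 ~ z1 + z2)

# Individually, both first stages are very strong
first_stage_F <- c(x1 = summary(fs1)$fstatistic[["value"]],
                   x2 = summary(fs2)$fstatistic[["value"]])

# Jointly, the 2 x 2 matrix of instrument coefficients is close to singular
Pi_hat <- rbind(coef(fs1)[c("z1", "z2")], coef(fs2)[c("z1", "z2")])
min_sv <- min(svd(Pi_hat)$d)

# The fitted endogenous regressors are almost collinear, so the second stage cannot separate them
fitted_cor <- cor(fitted(fs1), fitted(fs2))
```

Diagnostics designed for multiple endogenous regressors, such as the Cragg-Donald statistic or the Sanderson-Windmeijer conditional first-stage F statistic, target this joint condition directly; the lfe package used above provides `condfstat()` for the latter.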

```{r IV-output, echo=FALSE, warning=FALSE, results='asis'}
stargazer(transprob_reg_dd, transprob_reg_2, transprob_reg_3, transprob_reg_2_lead, 
          type=ifelse(knitr::is_latex_output(), "latex", ifelse(knitr::is_html_output(),"html" , "text")), 
          covariate.labels = covariate_labs, column.separate = c(1,3), dep.var.labels.include = FALSE,
          title="Home Sale Probability Against Flood Events and Insurance, Instrumenting for Insurance", 
          dep.var.caption = "Probability of Home Sale…", column.labels = c("Within the next two years", "This year"), 
          column.sep.width = "2pt", df=FALSE, digits = 4, header = FALSE, keep.stat = "n", label="tab:IV-output",
          notes = "\\parbox[t]{0.55\\textwidth}{Fixed effects: census tract x flood zone, observation year, home construction year, adaptation status. Clustering SEs by census tract.}", font.size = "small")
```

# Conclusions

Understanding the behavior of homeowners in flood zones is of increasing importance as climate change exacerbates flood risk and the corresponding government outlays. Buying out owners of severe repetitive loss properties is likely to be an important step toward achieving long-term financial stability for the National Flood Insurance Program and reducing the risk to life and property in flood hazard areas. Designing such a policy to be feasible and politically palatable will be a substantial challenge, and an understanding of homeowner behavior will provide important insight into its design. This research attempts to quantify whether experiencing a flood event makes homeowners more or less likely to sell their home in subsequent years, and whether holding flood insurance moderates that effect. Using data on flood insurance policies and claims, together with real estate assessment and transaction data, I estimate the probability of home sale using concurrent and lagged indicators of flood events and insurance, instrumenting for insurance takeup with exogenous policy variation in flood insurance premiums. The OLS results are consistent with the findings of @Zivin2020, but also suggest bias arising from ambiguity in the relative timing of floods, insurance policies, and home sales. The IV results are not usefully interpretable, as my instrument, based on exogenous variation in insurance premiums, does not provide enough variation to simultaneously identify the various forms of the endogenous regressor (interacted with flooding and lagged).

Additional work is ongoing to improve and build upon this research. Further refinement of the analysis panel may improve the validity of the parameter estimates by correcting data inconsistencies in the year of home construction and in the relative timing of flooding, home sales, and insurance policies. Improved geocoding may also increase match rates between homes and insurance policies. I believe Biggert-Waters should provide sufficient price variation to affect takeup, but it will likely need to be supplemented with additional exogenous variation when multiple endogenous regressors appear in the model. A cleaner dataset could also allow me to explore the potential mechanisms underlying any effects on transaction probability, such as differing rates of foreclosure. Finally, more precise data from the NFIP could allow direct matching of homes with flood policies and claims, removing a potentially significant source of error introduced by the imprecise matching method currently used to construct the panel.

Of course, this research design is not sufficient to provide unambiguous policy guidance on severe repetitive loss properties. While all SFHA properties face substantially elevated flood risk (at least a 1% chance of flooding in any given year), SRL properties combine high flood risk with low home value. Heterogeneity in home value is likely an important driver of transaction probability, and it is not well captured by the data in their current form. If FEMA were to design a more robust buyout program for SRL homes, it would need to think carefully about the distributional effects and moral hazard implications of any policy, both of which are beyond the scope of this paper.

\singlespacing

\clearpage