Streamlining the New York City Environmental Quality Review (CEQR) Application with Geospatial Tools

Open-source content and tools at the core of automating a complex process

City Environmental Quality Review, or CEQR, is the process by which New York City agencies determine what effect, if any, a discretionary action they approve may have upon the environment. CEQR is a disclosure process and not an approval process in and of itself. Completion of an environmental review supports other decisions made by agencies such as approvals of rezoning or variance applications, funding, or issuance of discretionary permits. Ministerial actions, such as the issuance of a building permit, are not subject to environmental review.

Historically, CEQR, along with other government environmental review programs such as the New York State Environmental Quality Review Act (SEQRA) and the National Environmental Policy Act (NEPA), has been the subject of much debate, right or wrong, for being overwhelming, complicated, and costly to the individuals and/or organizations involved in projects or “actions” that trigger the application process.

CEQR is a precursor to ULURP (Uniform Land Use Review Procedure), which, in part, is the approval process that decides the fate of the action. ULURP cannot start until the environmental review process is complete.

Introducing AutoCEQR

In the New York CEQR space, leave it to a couple of seasoned GIS folks to step in and combine their professional experience with geospatial tools and programming skills to offer a cost-effective, streamlined way to work through the CEQR application.

AutoCEQR cofounder Matt Sloane has worked in the planning field since 2007, working extensively with SEQRA and CEQR. Over that time, Matt developed specialties in both GIS and data science. As Matt learned to program the tools that power ESRI's ArcGIS Desktop software, he realized that many of the processes required by CEQR, which are explicitly prescribed by the CEQR Technical Manual, could be automated using existing data (e.g., MapPLUTO) and several project-specific inputs. He approached Danny Sheehan, a close friend and former classmate from SUNY Geneseo's planning and geography courses, about the project. Both agreed it would be a great opportunity to put their combined skills to work and build a platform to augment the CEQR application process. Danny brought the geospatial development expertise and software production knowledge he gained at UBS, Carto, and Columbia University to start the project and evolve it into a production application.

AutoCEQR leverages a mixture of City, State, and Federal data resources, though it relies primarily on NYC Open Data. Other data sources include:

This 400-foot radius buffer around a subject property requiring CEQR review shows the land use classifications of adjacent parcels, as recorded in the regularly updated NYC MapPLUTO file.
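The 400-foot buffer selection can be sketched without a GIS package. This is not AutoCEQR's code: it assumes parcel geometries are already in a projected coordinate system measured in feet (for NYC, for example EPSG:2263), and the parcel ids and land-use codes are hypothetical. A parcel intersects a 400-foot buffer exactly when its minimum distance to the subject lot is at most 400 feet, so the sketch computes polygon-to-polygon distance directly instead of building the buffer. In GeoPandas the same selection is typically a one-liner along the lines of `parcels[parcels.distance(subject) <= 400]`.

```python
from collections import Counter
from math import hypot

Point = tuple[float, float]
Polygon = list[Point]  # exterior ring; first vertex is not repeated at the end


def _point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment, clamping to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _orient(a: Point, b: Point, c: Point) -> float:
    """Sign tells whether c lies left of, right of, or on the line a-b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segment_dist(p1: Point, p2: Point, q1: Point, q2: Point) -> float:
    """Distance between segments p1-p2 and q1-q2 (0 if they properly cross)."""
    if (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0):
        return 0.0
    return min(_point_segment_dist(p1, q1, q2), _point_segment_dist(p2, q1, q2),
               _point_segment_dist(q1, p1, p2), _point_segment_dist(q2, p1, p2))


def _contains(poly: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > p[1]) != (y2 > p[1]):
            if p[0] < x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_distance(a: Polygon, b: Polygon) -> float:
    """Minimum distance between two simple polygons (0 if they touch or overlap)."""
    if _contains(a, b[0]) or _contains(b, a[0]):
        return 0.0
    edges_a = list(zip(a, a[1:] + a[:1]))
    edges_b = list(zip(b, b[1:] + b[:1]))
    return min(_segment_dist(p1, p2, q1, q2) for p1, p2 in edges_a for q1, q2 in edges_b)


def parcels_within(subject: Polygon, parcels: dict, radius_ft: float = 400.0) -> list:
    """Ids of parcels (id -> (land_use, polygon)) that intersect the buffer."""
    return [pid for pid, (_, geom) in parcels.items()
            if polygon_distance(subject, geom) <= radius_ft]


def land_use_within(subject: Polygon, parcels: dict, radius_ft: float = 400.0) -> Counter:
    """Tally the land-use codes of parcels inside the buffer."""
    return Counter(parcels[pid][0] for pid in parcels_within(subject, parcels, radius_ft))
```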

A. Coding and Software Environments

Python is at the core of the AutoCEQR technology. For working with data, the AutoCEQR team uses pandas, GeoPandas, Shapely, and Fiona, along with ArcPy for generating Map Document files (.mxd's), and writes custom Python classes for its workloads. Sheehan notes, “With GeoPandas and Shapely it’s phenomenal how close to parity they now are for matching ArcPy functionality.” In the development environment, PyCharm Community Edition and GitHub are used for code development and versioning.

AutoCEQR prototyping started with ArcPy for all tasks, but the team decided to abstract the high-level functions so the geoprocessing engine could be switched to GeoPandas, their geoprocessing library of choice. For interacting and communicating with Amazon Web Services (AWS), the current AutoCEQR cloud computing platform, developers use Boto3 (the AWS SDK for Python). EC2 and S3 are used in the AWS environment for computing, data storage, and distribution, which has kept the application's monthly computing bill fairly low. In the future, the team anticipates modifying the architecture to use more serverless technology and a more scalable design for additional compute cost savings. AWS generously provided AutoCEQR with free computing credits for one year through AWS Activate, which came to the founders' attention through their involvement with the Columbia Startup Lab (CSL). QGIS is also used for verifying results and for quick GIS work.
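One way to structure that abstraction is to have the high-level CEQR logic depend only on a small interface, with one class per geoprocessing backend. The class and function names below are hypothetical, not AutoCEQR's. `ShapelyEngine` relies on the `distance` method that Shapely geometries (and therefore GeoPandas geometries) provide; an ArcPy-backed class could wrap the `distanceTo` method of `arcpy` geometry objects the same way, but it is omitted here because it needs an ArcGIS license.

```python
from __future__ import annotations

from math import hypot
from typing import Any, Protocol


class GeoEngine(Protocol):
    """The only geoprocessing surface the high-level CEQR rules depend on."""

    def distance(self, a: Any, b: Any) -> float: ...


class ShapelyEngine:
    """Backend for Shapely (and GeoPandas) geometries; shapely must be installed."""

    def distance(self, a: Any, b: Any) -> float:
        return float(a.distance(b))  # Shapely geometries expose .distance(other)


class PointEngine:
    """Dependency-free backend for (x, y) tuples, handy for fast unit tests."""

    def distance(self, a: tuple[float, float], b: tuple[float, float]) -> float:
        return hypot(a[0] - b[0], a[1] - b[1])


def features_within(engine: GeoEngine, subject: Any, features: dict[str, Any],
                    radius: float) -> list[str]:
    """A high-level rule written once; the engine decides how distance is computed."""
    return sorted(fid for fid, geom in features.items()
                  if engine.distance(subject, geom) <= radius)
```

Swapping backends then means passing `ShapelyEngine()` instead of `PointEngine()`; `features_within` itself does not change.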

Interacting with Census data and a whole host of services is made possible by the many great open-source libraries available on PyPI and GitHub. The storefront is built on Squarespace, which is used to process and deliver orders.

AutoCEQR still uses ArcPy mapping for generating maps, .mxd's, and map packages, but given the high cost of licensing and the technical slowdown it adds to both the production application and ongoing development, it is unclear whether .mxd's will exist in future iterations. (Both Sheehan and Sloane would like more feedback from users on whether the .mxd deliverable is necessary, or whether the application should instead generate static maps with Matplotlib and GeoPandas, or interactive web maps.)

The data engineering (ETL) process mostly consists of pulling down data with the requests library, unzipping files, applying some transformations and reprojecting data, and using various API libraries and a scheduler. The team downloads the latest data every night, whether or not the source is updated daily. Redesigning the ETL would be a major focus for improving the platform and saving on cloud storage and computing costs.
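A nightly pull-and-unzip step might look like the following sketch. It uses the standard library's `urllib` rather than `requests` to stay dependency-free, and it adds one assumption not described above: a SHA-256 fingerprint of each download is stored so that an unchanged source skips the downstream transformation and storage work. The file names are hypothetical.

```python
import hashlib
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download a file into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def extract_if_changed(payload: bytes, dest: Path, state_file: Path) -> bool:
    """Unzip payload into dest unless it is identical to the previous run's download.

    Returns True if new data was extracted, False if the source was unchanged.
    """
    digest = hashlib.sha256(payload).hexdigest()
    if state_file.exists() and state_file.read_text().strip() == digest:
        return False  # same bytes as last night: skip reprocessing
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest)
    state_file.write_text(digest)
    return True


def nightly_job(url: str, workdir: Path) -> bool:
    """One scheduled run: fetch, then extract only if something changed."""
    workdir.mkdir(parents=True, exist_ok=True)
    return extract_if_changed(fetch_bytes(url), workdir / "extracted", workdir / "last.sha256")
```

Reprojection and other transformations would run only when `nightly_job` returns `True`.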

In addition to being checked for consistency with existing zoning classifications, projects are also reviewed for their proximity to a myriad of special districts and overlay zones.

B.  Application Process

Users input relevant project-specific information (e.g., dwelling units, building height, square footage) via the AutoCEQR website. From there, the application ingests the data and checks it against public data sources, usually with some intermediate geoprocessing steps, and then references the analysis thresholds stated in the Environmental Assessment Statement (EAS) form to determine which analyses the proposed project is required to undertake as part of the CEQR environmental review. For certain quantitative calculations, AutoCEQR has translated that logic into functions or classes in the codebase. Users also receive the data and maps for either a CEQR Pre-Screen or a select set of CEQR Full Analysis items. This Vimeo video provides an introduction to accessing the application and illustrates the products generated.
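The threshold check can be expressed as a table of screening rules evaluated against the user's inputs. The numbers below are placeholders chosen for illustration only; they are not the CEQR Technical Manual's thresholds, which vary by analysis area and sometimes by borough. The field and rule names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProjectInputs:
    """A few of the project-specific values a user might enter (hypothetical fields)."""
    dwelling_units: int
    building_height_ft: float
    commercial_sq_ft: float


# PLACEHOLDER thresholds for illustration only. They are NOT the CEQR Technical
# Manual's values; real screens depend on the analysis area and often the borough.
SCREENS: dict[str, Callable[[ProjectInputs], bool]] = {
    "shadows": lambda p: p.building_height_ft >= 50,
    "community_facilities": lambda p: p.dwelling_units >= 200,
    "transportation": lambda p: p.dwelling_units >= 100 or p.commercial_sq_ft >= 25_000,
}


def required_analyses(p: ProjectInputs) -> list[str]:
    """Names of the analyses whose screening threshold the project meets or exceeds."""
    return sorted(name for name, meets in SCREENS.items() if meets(p))
```

Keeping the thresholds in one table, separate from the code that evaluates them, makes it straightforward to update them when the manual changes.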

C.  Usage

To date, several dozen environmental professionals, targeted from a few key firms, have evaluated the application and gone on to use AutoCEQR in production. Currently, Sheehan and Sloane are allowing users to use AutoCEQR for free in order to get helpful product feedback and gain traction. Their outreach for feedback on refinement, feature expansion, and product evolution has drawn a warm reception from Ms. Olga Abinader, former director of the NYCDCP Environmental Assessment Review Division. She comments:

“AutoCEQR is an excellent application – as its title indicates, it automates tedious, time-consuming CEQR documentation that has historically taken consultants dozens of person-hours to complete.  As a longtime NYC environmental review expert and former public service leader, I appreciate that it gathers data points from the City’s publicly available databases and agency websites (MapPLUTO, NYC Parks, NYC LPC, GIS sources), and combines this information with user inputs (i.e., analysis framework details) to generate useful EAS Maps, visuals, and content/data for the EAS Forms in a short turnaround. Given the time savings it offers, I am very enthusiastic about AutoCEQR as a tool and recommend it highly to consultants, public service professionals, the general public, decision-makers and others interested in preparing or reviewing CEQR materials.” 


All AutoCEQR maps are included in the project delivery file as both ArcGIS Map Document files (.mxd) and Map Package files (.mpk).

D.  Affordable Housing Development Services Discount

Those working on the development of Affordable Housing or Inclusionary Housing are encouraged to contact the AutoCEQR team. Their aim is to provide the AutoCEQR platform and reporting at a deep discount to individuals or companies involved in these types of housing projects. If the entire development provides 100% Affordable units, the AutoCEQR team intends to provide free reporting and analysis.*

Because the product currently operates under a freemium model, users do not need to apply the discount at this time. However, it is important to AutoCEQR to continue this offering to support affordable housing in NYC if AutoCEQR ever moves to a fee-based model.

* Free reporting, with only minimal overhead for costs associated with report processing.

Summary 

Development and marketing efforts on the AutoCEQR project have slowed since Sheehan and Sloane both started new full-time positions. Nonetheless, both continue to explore interesting options for its future development and continued success. Individuals and companies interested in the application, or in communicating with Sheehan and Sloane, are encouraged to reach out via the contact information below.

Contact:

Daniel M. Sheehan
danny@autoceqr.com

Matt Sloane
matt@autoceqr.com

Developing and Applying the Geographic Aggregation Tool (GAT) at NYS Department of Health

R-based software application used by public health researchers and data analysts

While much of the Empire State's GIS software discussion revolves around the ESRI platform, there continues to be steady use and discussion of alternative geospatial client and web-based products. For example, staff at the New York State Department of Health (NYSDOH) use ArcGIS, but also other GIS programs including MapInfo, MapMarker, SaTScan, and QGIS, as well as statistical and visualization programs like SAS, SPSS, and Tableau, for research and surveillance projects.

Supported by funding from the Centers for Disease Control and Prevention (CDC), staff from the NYSDOH Environmental Public Health Tracking (EPHT) Program, also known as “Tracking,” have used the free and open-source software program R to develop and maintain the Geographic Aggregation Tool (GAT). R grew primarily out of the statistics and data analysis space, and it is very popular and used extensively in public health research.

GAT is currently maintained by Abigail Stamm at NYSDOH, who also maintains a GAT page on GitHub containing an extensive collection of documentation, developer tools, metadata, tutorials, and more. According to Stamm, EPHT staff like using R for several reasons:

  • R is free, easily accessible to the public, and enables staff to share the GAT package with agencies that do not have an ArcGIS license or the training and resources to use other geographic software. (Legacy versions of GAT, including a SAS version, are archived on GitHub.)
  • The GAT package automates the entire process. This is especially valuable when users choose the scripting options, which bypass the graphical user interface (a series of pop-up windows), reduce the likelihood of mistakes, and make recording, reproducing, and updating workflows much easier.

GAT documentation and content on GitHub.

At its core, GAT aggregates, or dissolves, geographic areas (most commonly census tracts) based on numeric values for each area, such as case or population counts, as well as other demographic values such as median income. Health researchers often want data at a finer granularity than the county level, which can mask variation, especially in counties with a mix of urban and rural populations. At the same time, showing data at the town level won't work because many rural towns have very small populations. Areas with small populations are likely to have few cases, resulting in unstable rates and putting the confidentiality of cases at risk.
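A short calculation shows why small areas produce unstable rates. Under the common Poisson assumption, the relative standard error (RSE) of a rate is roughly 1/√(number of cases), regardless of population size, so two areas with the same rate can differ greatly in reliability. The 30% RSE cutoff below is a widely used rule of thumb, assumed here rather than taken from GAT.

```python
from math import sqrt


def rate_per_10k(cases: int, population: int) -> float:
    """Crude rate per 10,000 residents."""
    return 10_000 * cases / population


def relative_standard_error(cases: int) -> float:
    """Poisson approximation: the RSE of a rate is about 1 / sqrt(number of cases)."""
    return float("inf") if cases == 0 else 1 / sqrt(cases)


def is_unstable(cases: int, max_rse: float = 0.30) -> bool:
    """Flag rates whose RSE meets or exceeds the cutoff (30% assumed here)."""
    return relative_standard_error(cases) >= max_rse
```

For example, a town of 800 residents with 2 cases and a county of 80,000 with 200 cases both have a rate of 25 per 10,000, but the town's RSE is about 71% versus about 7% for the county.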

To overcome these limitations, NYSDOH developed GAT to join neighboring geographic areas until a user-defined population and/or number of cases is reached, supporting the desired statistical analysis. This allows local health departments and others to use rates to identify hot spots for targeted interventions. GAT can also be used to produce maps at whatever geographic resolution the user requires.

How GAT Works

GAT requests user inputs through a series of dialogs, including menus, checkboxes, and text boxes, so no programming knowledge is necessary.  GAT reads in a polygon shapefile which must contain, at minimum, a character variable that uniquely identifies areas and a numeric variable to sum for aggregation.  A series of dialog boxes allows the user to select:

  1. A variable to uniquely identify areas
  2. One or two aggregation variables
  3. Optionally, a variable of areas within which merging will be preferred (e.g., county)
  4. The value (sum) to which the selected aggregation variable(s) should be aggregated
  5. The preferred aggregation method: closest geographic or population-weighted centroid, least value, or ratio of two values

Depending on the specifics of the data and the type of analysis of interest to the user, GAT offers four types of aggregation methods:

  1. Closest geographic centroid
  2. Closest population-weighted centroid
  3. Neighbor with the lowest count
  4. Most similar neighbor
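The four methods differ only in how the next neighbor is chosen. A minimal sketch of each rule follows; the `Area` fields are hypothetical, and "most similar neighbor" is interpreted here as the neighbor whose ratio of two values (for example, cases to population) is closest, following the ratio option listed among the inputs. GAT's own scoring may differ in detail.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Area:
    geo_centroid: tuple[float, float]  # geographic (area) centroid
    pop_centroid: tuple[float, float]  # population-weighted centroid
    count: float                       # aggregation variable, e.g. cases
    numerator: float                   # e.g. cases, for the ratio rule
    denominator: float                 # e.g. population


def _dist(p: tuple[float, float], q: tuple[float, float]) -> float:
    return hypot(p[0] - q[0], p[1] - q[1])


def choose_neighbor(area: Area, candidates: dict[str, Area], method: str) -> str:
    """Pick the neighbor to merge with under one of four illustrative rules."""
    ratio = area.numerator / area.denominator
    rules = {
        "geographic": lambda c: _dist(area.geo_centroid, c.geo_centroid),
        "population_weighted": lambda c: _dist(area.pop_centroid, c.pop_centroid),
        "lowest_count": lambda c: c.count,
        "most_similar": lambda c: abs(c.numerator / c.denominator - ratio),
    }
    if method not in rules:
        raise ValueError(f"unknown method: {method!r}")
    score = rules[method]
    # Sorting the ids first makes ties resolve deterministically.
    return min(sorted(candidates), key=lambda k: score(candidates[k]))
```

With the same two candidates, different rules can pick different neighbors, which is why the resulting maps diverge.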

Applying different rules or criteria in GAT produces contrasting aggregation results. These are samples of GAT aggregation applied to total population numbers in towns in Hamilton and Fulton Counties (NY). Sample code for producing these maps can be accessed here.

GAT produces two shapefiles, an aggregated file and a crosswalk. GAT also produces a PDF of maps and a log of the entire process, including user settings, any warnings, and a brief data dictionary. (The PDF and log provide much more information than is shown in this article.) These files are designed to help users evaluate and report aggregation results and to standardize the user process. NYSDOH staff developed GAT to standardize and automate the aggregation of New York's 4,900 census tracts.
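A crosswalk is a lookup from each original area to the aggregated area that absorbed it, which makes it easy to roll up totals or join new tract-level data later. The sketch below writes one as CSV text and rolls values up through it. GAT itself writes shapefiles; the column names here are hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregated_totals(crosswalk: dict[str, str], values: dict[str, float]) -> dict[str, float]:
    """Roll original-area values up to aggregated areas through the crosswalk."""
    totals: defaultdict[str, float] = defaultdict(float)
    for area, group in crosswalk.items():
        totals[group] += values[area]
    return dict(totals)


def crosswalk_csv(crosswalk: dict[str, str]) -> str:
    """Serialize the crosswalk as CSV text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["original_id", "aggregated_id"])
    for area in sorted(crosswalk):
        writer.writerow([area, crosswalk[area]])
    return buf.getvalue()
```

Because every original area appears exactly once, the aggregated totals always sum to the original total, a quick sanity check on any aggregation run.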

Different aggregation methods may affect the values of the resulting aggregated areas in different ways. For example, testing showed that when a small corner of a census tract contained most of its population, aggregating by geographic versus population-weighted centroid could produce very different results. Also, because the Tracking portal would display disease rates, NYSDOH staff felt the most suitable aggregation method for these population-based measures was the closest population-weighted centroid. To check for areas with smaller populations and unusually large numbers of cases, users have the option to aggregate by case count instead of, or in addition to, population.
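The corner-of-a-tract effect is easy to reproduce numerically. The sketch computes a tract's geographic (area) centroid with the shoelace formula, and a population-weighted centroid as the population-weighted mean of smaller units' centroids, such as census blocks. That block-based approach is an assumption here; GAT's exact procedure may differ.

```python
def polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return (cx / (3 * area2), cy / (3 * area2))


def population_weighted_centroid(
        units: list[tuple[tuple[float, float], float]]) -> tuple[float, float]:
    """units: list of ((x, y) centroid of a small unit, its population)."""
    total = sum(pop for _, pop in units)
    x = sum(pt[0] * pop for pt, pop in units) / total
    y = sum(pt[1] * pop for pt, pop in units) / total
    return (x, y)
```

For a 1,000-by-1,000 square tract with 950 of its 1,000 residents in a block centered near the northeast corner at (900, 900), the geographic centroid is (500, 500) but the population-weighted centroid is (860, 860), so the two methods can favor different neighbors.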

While access to health data and its resolution vary across NYSDOH (point/address, census tract, ZIP code, or municipal level), staff in the Tracking program receive hospitalization and emergency room visit data at the point level (resident address) from the Statewide Planning and Research Cooperative System (SPARCS). Staff geocode these datasets using multiple programs, including MapMarker, SAM in ArcGIS, and NYCGBAT, and assign the encompassing census tract to each point. Staff have also developed methods to assign tracts based on ZIP code or town, and to impute tracts for records that cannot be geocoded because of incomplete addresses.
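Assigning the encompassing tract to a geocoded point is a point-in-polygon test. The sketch below is a generic illustration, not the Tracking program's method: it uses the ray-casting rule with a bounding-box prefilter, the tract ids and coordinates are made up, and it ignores the holes and multipart polygons that real tract boundaries can have. Points that fall in no tract return `None`, which is where an imputation step would take over.

```python
from typing import Optional

Ring = list[tuple[float, float]]


def point_in_ring(x: float, y: float, ring: Ring) -> bool:
    """Ray casting: toggle on each polygon edge a rightward ray from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def assign_tract(point: tuple[float, float], tracts: dict[str, Ring]) -> Optional[str]:
    """Return the id of the tract containing point, or None if no tract does."""
    x, y = point
    for tract_id, ring in tracts.items():
        xs = [px for px, _ in ring]
        ys = [py for _, py in ring]
        # Cheap bounding-box rejection before the exact test.
        if min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys) and point_in_ring(x, y, ring):
            return tract_id
    return None
```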

Other Applications of GAT

Public Use in New York State

Working toward NYSDOH's goal of making sub-county data more accessible, NYS EPHT is developing a platform that will display environmental health outcomes and exposures in an interactive mapping application. The current EPHT data portal displays county-level indicators, but EPHT staff are redesigning it to include sub-county data using the aggregated areas created by GAT. This will help local agencies target interventions while maintaining privacy and confidentiality.

Tracking is also working with Health Data NY (HDNY) to display the sub-county data. The HDNY platform will serve as a data repository displaying data for environmental health outcomes including asthma, chronic obstructive pulmonary disease (COPD), and myocardial infarction (MI; heart attacks), since these are the health outcomes local agencies are most interested in when planning extreme weather-related mitigation and resource allocation. The data will be available for public download along with the shapefile and a brief description of how the shapefile was developed.

The CDC is also using GAT to develop sub-county areas for various health outcome indicators, having piloted it with data from other EPHT grantee states for health outcomes including asthma, MI, and COPD.

Sampling Design

GAT is also being used to develop and select sampling areas for Biomonitoring NY, a statewide biomonitoring project. In the first year of sampling, this effort focused on households on Long Island. To this end, the group aggregated 2010 census block groups within census tracts so that each aggregated area had at least 440 households. Next, remaining tracts that were too small to meet the household minimum were aggregated to neighboring tracts. After completing the final aggregation, study staff randomly selected 25 aggregated areas and mailed postcards and invitation packets soliciting participants from households in the block groups within the selected aggregated areas. Read more about the project.
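The final selection step can be sketched with a seeded random sample so the draw is reproducible. The 440-household minimum and the 25 areas come from the description above; the area ids, the seed, and the eligibility check are assumptions for illustration, not the study's actual procedure.

```python
import random
from typing import Optional


def select_sampling_areas(households: dict[str, int], minimum: int = 440, k: int = 25,
                          seed: Optional[int] = None) -> list[str]:
    """Randomly pick k aggregated areas, each with at least `minimum` households."""
    eligible = sorted(a for a, n in households.items() if n >= minimum)
    if len(eligible) < k:
        raise ValueError(f"only {len(eligible)} areas meet the {minimum}-household minimum")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    return sorted(rng.sample(eligible, k))
```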

Biomonitoring is a way of measuring the amount of environmental chemicals found in the human body. It is an important part of New York statewide epidemiological research that seeks to determine levels of chemical exposure in the human body and help better understand whether chemical exposures are associated with health effects in humans.

Informing Policy

In 2019, NYS passed the nation-leading Climate Leadership and Community Protection Act (CLCPA; aka “Climate Act”) to empower New York residents to fight climate change at home, at work, and in their communities. NYSDOH, along with several other state agencies, is supporting implementation of the Climate Act. One effort involves identifying disadvantaged communities (DACs) that could benefit from mitigation efforts and allocation of investments. GAT was used to create sub-county areas for several climate-impacted health indicators that will be analyzed alongside multiple other variables to define DACs.

Learn More About GAT

A variety of documentation is available on the use and development of GAT. Slides (PDF) from a 2020 National Association of Health Data Organizations (NAHDO) presentation by NYSDOH staff are available here, along with a YouTube video. GAT slides and a video from the 2021 useR Conference are also available. Most recently, staff presented at the November 2021 Place and Health Conference, whose attendees included epidemiologists, health geographers, social and behavioral scientists, statisticians, data scientists, and public health professionals from all levels of government.

GAT poster that was presented at the 2021 Place and Health Conference (download here). A lightning talk is also available.

Contact:

Abigail Stamm, Research Scientist
Bureau of Environmental and Occupational Epidemiology
New York State Department of Health
abigail.stamm@health.ny.gov

GAT was written in R-2.9.2 under Windows XP and was revised and converted to a package in R-3.4.3 under Windows 10 using RStudio-1.4.1103 and devtools-2.3.2. The latest version of GAT was compiled in R-3.6.1 and runs in R-3.5.3 through R-4.1.1.

https://github.com/ajstamm/gatpkg