Data Sources and Inputs

Last Updated: 2025-10-08

Obtaining relevant and trusted data is a common barrier to conducting impact assessments. Koi consolidates and derives lifecycle and market data from authoritative sources, removing a substantial share of the manual work that has slowed the field. We've modeled thousands of proposed climate solutions and provide both the off-the-shelf models and their underlying data in the application itself.

Data sourcing: This page contains a representative overview of the data sources used in Koi models, organized by reference category. We prioritize open-access, trusted, structured, and recent datasets, such as those provided by the IEA and government databases. These are often supplemented with bottom-up lifecycle GHG data for solution modeling, value chain segmentation, and boundary definition. Because sources vary by technology, industry, and region, the complete model-specific references are documented on each model’s Datasheet page (see screenshot below).

Koi's Data Lake

Koi maintains a comprehensive data lake containing baseline GHG intensity, solution GHG intensity, market size, and market diffusion data that can be accessed through a variety of interfaces. These data may be primary or derived but are always transparently referenced, ensuring full traceability and auditability. Every model draws on at least one of each component type, and all references can be viewed on the Datasheet tab within each model.
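
As an illustration of the "at least one of each component type, always referenced" rule above, here is a minimal sketch of how such a check could be expressed. The names `ComponentType`, `DataPoint`, and `missing_components` are hypothetical and are not part of any Koi interface; the values and source labels are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class ComponentType(Enum):
    # The four component types named in the text above.
    BASELINE_GHG_INTENSITY = "baseline_ghg_intensity"
    SOLUTION_GHG_INTENSITY = "solution_ghg_intensity"
    MARKET_SIZE = "market_size"
    MARKET_DIFFUSION = "market_diffusion"


@dataclass(frozen=True)
class DataPoint:
    # Hypothetical record shape; field names are assumptions.
    component: ComponentType
    value: float
    unit: str
    reference: str          # citation that would appear on the Datasheet tab
    derived: bool = False   # False = primary data, True = derived data


def missing_components(points: Iterable[DataPoint]) -> List[ComponentType]:
    """Return the component types that have no referenced data point."""
    present = {p.component for p in points if p.reference.strip()}
    return [c for c in ComponentType if c not in present]


# Placeholder inputs: two of the four component types are covered.
points = [
    DataPoint(ComponentType.BASELINE_GHG_INTENSITY, 0.4, "kg CO2e/kWh", "hypothetical source A"),
    DataPoint(ComponentType.MARKET_SIZE, 1000.0, "kWh/yr", "hypothetical source B", derived=True),
]
print([c.value for c in missing_components(points)])
# → ['solution_ghg_intensity', 'market_diffusion']
```

A data point with an empty reference is treated as absent, which mirrors the traceability requirement: unreferenced data does not count toward a complete model.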

This data lake powers automated modeling capabilities and supports a database of more than 9,000 models built on trusted underlying input data. The centralized, curated nature of this data repository ensures consistency across models while providing the flexibility to accommodate diverse climate solutions and assessment needs. Standardization of input data and outputs, together with quality assurance and quality control, is central to our data pipeline and model generation processes.

Screenshot: Koi Data Lake and Datasheet Interface

Major Data Providers

These are government, commercial, and non-profit data sources from which we primarily ingest baseline data. Baseline data characterizes the emissions of the reference (counterfactual) scenario and consists of a baseline GHG intensity and a baseline market size.

  • International Energy Agency (IEA) - Global authority on energy statistics, forecasts, and technology roadmaps; used for modeling energy transitions and emissions pathways.
  • U.S. Energy Information Administration (US EIA) - Authoritative source for U.S. and international energy production, consumption, and emissions data; often used for baselines and trends.
  • UN FAO (FAOSTAT) - Global agricultural and food systems data; supports modeling land use, livestock, and crop-related emissions and yields.
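  • U.S. Environmental Protection Agency (US EPA) - U.S. environmental regulatory body; provides tools and models such as eGRID and the Waste Reduction Model (WARM) for emissions factors and lifecycle assessments. Also used in this category is GREET, which is maintained by Argonne National Laboratory for the U.S. Department of Energy rather than by the EPA.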
  • U.S. Federal Highway Administration - Transportation sector data, especially for infrastructure use, vehicle miles traveled, and fuel economy baselines.
  • U.S. Department of Agriculture (USDA) - Data on U.S. agricultural production, land use, conservation practices, and rural energy systems.
  • National Renewable Energy Laboratory (NREL) - Premier U.S. lab for renewable energy data and models, including photovoltaic LCA, wind, and biomass systems.
  • National Energy Technology Laboratory (NETL) - U.S. lab providing detailed fossil fuel lifecycle inventories and emissions data for power generation technologies.
  • European Commission - Through the JRC and EEA, provides EU-wide data on environmental performance, technology assessments, and emissions inventories.
  • EU Taxonomy - Regulatory framework offering historical benchmarks and forward-looking thresholds for sustainable investments.
  • IPCC (Intergovernmental Panel on Climate Change) - Global authority on climate science; source of standardized emissions factors, global warming potentials, and scenario modeling.
  • World Resources Institute (WRI) - Through the Greenhouse Gas Protocol, provides standards and guidance for corporate and product-level emissions accounting.
  • Ember - Independent think tank offering real-time and historical global electricity grid emissions factors and energy transition tracking.
  • EDGAR - U.S. Securities and Exchange Commission (SEC) Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system; primarily used for market uptake data.
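
The baseline definition at the top of this section pairs a GHG intensity with a market size. One common way to combine the two is to multiply them, which gives total baseline emissions. The sketch below assumes that relationship; the function name, units, and numbers are hypothetical and do not describe how the Koi Engine computes baselines.

```python
def baseline_emissions_tonnes(ghg_intensity_kg_per_unit: float,
                              market_size_units: float) -> float:
    """Baseline emissions in tonnes CO2e.

    Assumed relationship: intensity (kg CO2e per unit) * market size (units),
    converted from kilograms to tonnes.
    """
    if market_size_units < 0:
        raise ValueError("market size must be non-negative")
    return ghg_intensity_kg_per_unit * market_size_units / 1000.0


# Hypothetical inputs: 0.5 kg CO2e per kWh across 2,000,000 kWh per year.
print(baseline_emissions_tonnes(0.5, 2_000_000))  # → 1000.0
```

The intensity and the market size must describe the same functional unit (here, kWh); mixing units between the two sources is an easy way to get a baseline that is wrong by orders of magnitude.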

Academic and Industry Journals

These sources are often used to develop value chain GHG intensities and inform performance for emerging technologies or solutions. Academic and industry journals supply the depth needed for sensitivity analyses across plausible performance ranges and provide otherwise hard-to-obtain evidence on new technologies.

  • Journal of Cleaner Production – Interdisciplinary journal focused on sustainable production systems, circular economy, and cleaner technologies.
  • Environmental Science & Technology – Leading journal for environmental chemistry and engineering, with robust data on emissions, pollutants, and mitigation technologies.
  • Renewable and Sustainable Energy Reviews – Comprehensive reviews and meta-analyses of renewable energy technologies, lifecycle impacts, and sustainability assessments.
  • Nature Sustainability – High-impact research on the intersection of environmental, social, and economic dimensions of sustainability, including climate innovation pathways.
  • Energy Policy – In-depth policy analyses and modeling related to global and national energy systems, often covering decarbonization strategies and their implications.
  • Resources, Conservation & Recycling – Focused on resource efficiency, materials flow, waste management, and circular economy interventions with quantified impacts.
  • Agricultural Systems – Systems-level analyses of agricultural practices and innovations, including their environmental and climate impacts.
  • Science of the Total Environment – Broad environmental science journal covering multidisciplinary studies on pollutants, ecosystems, and human-environment interactions.
  • Environmental Research Letters – Open-access journal emphasizing high-quality, timely research on global environmental change and sustainability solutions.
  • Applied Energy – Engineering and modeling-focused journal on energy systems, technologies, and applications with detailed emissions and efficiency data.

LCA Datasets and Providers

Life Cycle Assessment (LCA) datasets are often used for modeling baseline and/or solution GHG intensities by lifecycle phase. In the Koi Engine, LCA data are frequently used as templates for constructing solution GHG intensities and comparing solutions against established value chains.

  • US Federal LCA Commons
    • Labs and data providers (listed above)
    • USEEIO - U.S.-focused environmentally extended input-output model
    • TRACI - US EPA Tool for the Reduction and Assessment of Chemical and Other Environmental Impacts
    • ReCiPe 2016 - Life cycle impact assessment (LCIA) and other methods
    • USLCI - U.S. Life Cycle Inventory Database
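
To illustrate the idea of using LCA data as a template by lifecycle phase, the following sketch starts from a per-phase baseline intensity and overrides only the phases that a solution changes. The phase names, values, and helper functions are hypothetical; this is not the Koi Engine's implementation.

```python
from typing import Dict


def apply_template(template: Dict[str, float],
                   overrides: Dict[str, float]) -> Dict[str, float]:
    """Copy a per-phase intensity template, replacing only the overridden phases."""
    unknown = set(overrides) - set(template)
    if unknown:
        raise KeyError(f"phases not in template: {sorted(unknown)}")
    return {**template, **overrides}


def total_intensity(phases: Dict[str, float]) -> float:
    """Sum per-phase intensities into one value-chain intensity."""
    return sum(phases.values())


# Hypothetical baseline intensities in kg CO2e per functional unit.
baseline = {"raw_materials": 2.0, "manufacturing": 1.5, "use": 4.0, "end_of_life": 0.5}

# Hypothetical solution that only changes the use phase.
solution = apply_template(baseline, {"use": 1.0})

print(total_intensity(baseline), total_intensity(solution))  # → 8.0 5.0
```

Rejecting phases that are absent from the template keeps the baseline and the solution on the same system boundary, so the two totals remain comparable.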

Compatible LCA Datasets that Require Commercial Authorization

  • ecoinvent – Comprehensive, high-quality life cycle inventory database widely used across industries; known for transparency and global coverage across many sectors.
  • Agri-Footprint – Specialized in agricultural and food-related life cycle data, with deep coverage of crop, livestock, and food processing systems.
  • ELCD – The European Reference Life Cycle Database, focused on harmonized and high-quality LCI data for key European industrial processes.
  • Exiobase – Environmentally extended multiregional input-output (EE-MRIO) database, ideal for macroeconomic and global supply chain footprint analysis.
  • EVAH – Focused on animal health and livestock systems, offering LCA data tailored to veterinary interventions and their environmental impacts.

Industry Reports and White Papers

  • American Chemistry Council - Publishes detailed lifecycle assessments and sustainability reports focused on plastics, petrochemicals, and chemical manufacturing.
  • Aluminum Association - Provides industry-specific data and LCA studies on aluminum production, recycling, and embodied emissions across supply chains.
  • Concrete Sustainability Hub (MIT CSHub) - Research center focused on the environmental and economic impact of concrete and cement, offering cutting-edge LCA and durability models.
  • U.S. Department of Energy (DOE) - Authoritative analyses and modeling tools for energy technologies, efficiency measures, and decarbonization pathways.
  • Vinyl Institute - Industry group publishing LCAs and technical assessments on vinyl products, especially for construction and manufacturing applications.

Additional Sources

Additional sources include EPDs, proprietary data, and LLM estimates.

  • Environmental Product Declarations (EPDs) – asphalt, aluminum, concrete, steel, etc.
  • Company-generated LCA reports – e.g., from Tesla, Ball Corporation, Veolia
  • Consultant-generated LCAs – Sometimes unnamed but linked to specific product studies
  • Direct interaction with operators
  • Algorithmic/LLM estimates – where data gaps are filled using modeled or GPT-derived values
  • Conference proceedings and working papers – e.g., from SETAC or LCA Food conferences
  • Science Based Targets initiative (SBTi) - Normative industry targets
  • One Earth Climate Model (OECM) - Normative industry pathways