
Company Z

May 30, 2025

SQ 2025 Client Project

Introduction

Note: To comply with confidentiality agreements, we cannot disclose the name of the company directly.

Introduction of Client

During the Spring 2025 Cohort, CodeLab had the incredible opportunity to work on a project with a Fortune 15 U.S. energy corporation operating at the forefront of the oil and gas industry. With a transformative vision for the future, the company is making major strides in the low-carbon market and is guided by a mission to drive clean energy innovation. In this article, we are excited to showcase CodeLab’s contributions to the company’s mission and the tool we developed to support their operations.

On another note, this project was one of two collaborations that CodeLab had with the company — be sure to check out the other one here!

Meet the Team!

Summary of Project

Context

The stakeholders we are partnering with are specifically from the Renewable Energy team. This team is composed of analysts, researchers, and marketers. One of their priorities is to identify and acquire customers who are interested in low-carbon fuel adoption, specifically Renewable Natural Gas (RNG). Traditionally, this process involved manually scouring the web, digging for key data that would indicate the likelihood of a company’s adoption of RNG solutions.

To streamline and scale this effort, our team was tasked with creating a tool that could automate their research process. Our solution leverages web scraping to gather the necessary information and displays it to the user through an interactive dashboard. The tool can sift through a broader range of content in less time, providing valuable insights into a company’s sustainability commitments and supporting informed, data-driven decisions about which partnerships to pursue. A standout feature is a score that estimates a company’s likelihood of adopting RNG solutions, helping the team prioritize high-potential partnerships.

Time Frame

April 2025 — June 2025 | 7 weeks

Tools

  • Design: Figma
  • Development: React, TailwindCSS, Next.js, Chart.js, FastAPI, Google Custom Search API, Playwright, OpenAI API, PostgreSQL, Azure Cloud
  • Maintenance: Jira, Notion, Slack, Zoom

Design

Research

Competitive Analysis

To inform our design approach, we began our research with a competitive analysis of existing dashboards and sustainability management tools. Our hope was to gain an understanding of features, design patterns, and user experience strategies that would be beneficial to our tool. We analyzed three products: GoodData, IBM Envizi, and Microsoft Sustainability Manager. We compared their layouts, visual designs, UX designs, and key features.

GoodData and IBM Envizi Competitive Analysis

Microsoft Competitive Analysis

Conducting the competitive analysis provided us with comprehensive insight into essential features that our tool should include. It also helped us pinpoint the limitations of existing products, allowing us to identify opportunities for improvement and differentiation. We drew inspiration from features such as navigation bars, filtering capabilities, and the versatility of data presentation. However, we aimed to avoid an overwhelming experience that would be convoluted for the user to follow. Rather, we wanted to create a simple, intuitive experience that would allow the users to accomplish their goal efficiently.

User Interview

As a part of our research, we conducted interviews with three contacts from the Renewable Energy Team, whose positions included Business Development Analysts and Renewable Strategy Analysts. These interviews provided us with insight into their daily workflows, frustrations the team faced, and features they’d like to see incorporated into a more efficient solution. We also used these sessions to clarify our own questions and ensure our tool aligned with their expectations.

Based on these interviews, two major pain points or frustrations stood out:

  1. Manual research was time-consuming — reading and searching through websites was a very tedious process that could take up to 3–4 hours per company!
  2. Data validation was difficult — data had to be aggregated from multiple sources and cross-checked for validity and recency.

In terms of features and functionalities that the end-users would like to see, our biggest takeaways from the interviews were:

  1. Comparison tools to evaluate the results and data across multiple companies.
  2. A summary of each company’s sustainability efforts as well as information on their respective Compressed Natural Gas (CNG) fleet data, emission goals, alternative fuels, partnerships, and market type.
  3. The ability to save company reports for easy reference in the future.

Other features mentioned and suggested during user interviews included pie charts, the ability to export reports as Excel sheets, notifications and updates on query status, the ability to share comments, a timeline of a company’s emission goals, company headquarters, and leadership contacts.

User Flow

Following the User Interviews, each designer independently created a user flow based on their interpretations of the insights gathered. We analyzed the similarities and differences between all three flows to determine the final flow of the tool.

Prototyping

Low-Fidelity

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

In early design stages, we experimented with different ways to structure and present the reported sustainability data through low-fidelity mockups. Each designer independently created their own iterations of the tool’s pages, tinkering with various layouts. Our goal was to jot down a wide range of ways we could display key information, ensuring an efficient and intuitive experience for users.

These visuals were instrumental in shaping our thinking around the tool’s flow and turning our ideas into more solidified designs. Using these low-fi designs, we aligned on core components of the interface: a sidebar for navigation, a company summary page, a saved queries page, and a company comparison page. The designers also proposed a tab-like feature to initiate and organize multiple ongoing queries.

Mid-Fidelity

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

As we transitioned into mid-fidelity stages, our team was equipped with a much more cohesive and unified vision of the tool and its functionalities. With this foundation, each designer was able to focus on one page of the tool in greater detail, allowing us to bring key features and interactions to life.

During this stage, we also reconnected with our client contacts to conduct usability testing. We gained valuable insight into their opinions on our display of data and identified opportunities to increase efficiency when viewing company information. Through feedback from our contacts, we also introduced new features such as pie charts, bar graphs for emission reduction and strategies, and various pop-ups and filtering systems.

Before advancing to our high-fidelity designs, our developers also collaborated closely with the designers to ensure they had a clear understanding of the proposed designs and user flows, and to confirm technical feasibility.

High Fidelity

Building on our mid-fidelity prototypes, our high-fidelity designs brought the interface to life, capturing the visual and interactive aspects of the final product. The completed tool consisted of four core pages: the search page, the individual company summary page, the company comparison page, and the saved reports page.

1. The Search Page

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

The Search Page is the starting point for users to initiate queries. Key features include:

  • Search Bar: Users input their query here to begin the data retrieval process.
  • Loading Page: Displays while the system gathers and processes information.
  • New Tabs: Users can open multiple tabs from the top navigation bar to run and manage new queries simultaneously.

2. The Individual Company Summary Page

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

Once a query has loaded, users are taken to the Individual Company Summary Page, which displays detailed insights about the queried company. Key features include:

  • Pop-ups and Tooltips: Open or hover to view additional or more in-depth information.
  • Save Query Button: Allows users to save the company’s report for future reference. Saved reports can be accessed via the “Saved” tab.
  • Export Button: Enables users to export the company’s full report as an Excel spreadsheet and a PDF.

Sample PDF (Disclaimer: All data shown is placeholder content for design and demonstration purposes only.)

3. Company Comparison Page

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

Users can visit this page to view higher-level statistics involving multiple companies. Key features include:

  • Pie Charts: Visualize the percentage of saved companies that do or don’t satisfy a specific metric.
  • Pop-ups and Tooltips: Open or hover to view additional or more in-depth information. This includes the specific metric being analyzed, its weight in calculating the RNG adoption score, and a breakdown of companies that contribute to each portion of the chart.

4. Saved Reports Page

Disclaimer: All data shown is placeholder content for design and demonstration purposes only.

The Saved Reports Page allows users to easily access and manage previously saved company reports. Key features include:

  • Filtering and Sorting: Quickly narrow down results by applying filters or sorting based on specific criteria.
  • Search Bar: Directly search for a specific company.
  • Company Tiles: Provide a quick overview of each company’s report. Clicking on a tile directs users to the full report page.

Development — Backend

Routing

The backend of this project was built on FastAPI and Next.js. We chose FastAPI because it is a simple but scalable framework that lets us serve frontend and database calls quickly and consistently. Since our backend is written in Python and our frontend in TypeScript, we implemented Next.js routing to fetch metrics and data from our database and to direct requests to the appropriate endpoints.

There are two options in the web scraper routing:

  1. If a company card does not already exist for a queried company, Next.js makes a fetch request to the Python backend. This triggers a GET request handled by FastAPI, which initializes the web-scraping logic. After the data has been collected and summarized, the resulting metrics are returned to the frontend, which displays all the information. If the user decides to save the query, the frontend sends a POST request that stores the report in the database (see the sketch after this list).
  2. For all other requests, the client sends a request handled by Next.js, which routes them to the appropriate endpoint and retrieves all necessary data from the Azure Cloud Database to populate the frontend.
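For illustration only, below is a minimal FastAPI sketch of these two paths. The endpoint paths and the scrape_company / save_report helpers are hypothetical placeholders, not our actual routes.

# Hypothetical sketch of the two routing paths; names are illustrative only.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

def scrape_company(name: str) -> dict:
    # Placeholder for the web-scraping pipeline described in the next sections
    return {}

def save_report(name: str, report: dict) -> None:
    # Placeholder for the database insert
    ...

class SaveRequest(BaseModel):
    company_name: str
    report: dict

@app.get("/companies/{company_name}")
def get_company(company_name: str):
    # Path 1: no company card exists yet, so run the scraper and return the data
    return scrape_company(company_name)

@app.post("/companies/save")
def save_company(req: SaveRequest):
    # Path 2: triggered only when the user chooses to save the query
    save_report(req.company_name, req.report)
    return {"status": "saved"}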

Query Input

When a user inputs a company name, it is sent as a payload to the web scraper, which initiates the backend scraping logic. The company name is then used as input in our custom queries to retrieve the specific sustainability and emissions data that our users are seeking. If we are unable to find all the evidence required to fulfill a metric, we use an iterative approach, gradually relaxing query constraints to allow for broader searching. This allows all required scoreboard metrics to be populated with data that is as accurate as possible.
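A rough sketch of this relaxation loop is shown below; the helper names and query templates are illustrative, not our production queries.

# Hypothetical sketch: queries go from strict to broad until evidence is found.
def build_queries(company: str, keywords: list[str]) -> list[str]:
    all_kw = " ".join(keywords)
    any_kw = " OR ".join(keywords)
    return [
        f'"{company}" {all_kw} sustainability report',  # strict: every keyword
        f'"{company}" ({any_kw})',                      # moderate: any keyword
        f'{company} sustainability',                    # relaxed: broad fallback
    ]

def find_metric_evidence(company, keywords, search, extract):
    for query in build_queries(company, keywords):
        links = search(query)       # e.g. the Custom Search call described below
        evidence = extract(links)   # scraping + pattern matching
        if evidence:
            return evidence         # stop as soon as the metric is filled
    return None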

Adoption Score Calculation

The main data deliverables represented on the front end are:

  1. Metrics requested by the client to assess the likelihood of customer acquisition.
  2. Custom summaries generated by the metric sources.
  3. The exact sources from which the data is pulled.
  4. An overall adoption score, calculated from the compiled metrics and weighted according to client priorities (a simplified sketch of this weighting follows the list).
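The exact metrics and weights are set by the client and are confidential, but the idea behind the score is a weighted sum. A toy sketch with invented weights:

# Illustrative only: the metric names and weights below are invented, not the
# client's actual priorities.
WEIGHTS = {
    "cng_fleet": 0.35,
    "emission_goals": 0.25,
    "alt_fuels": 0.25,
    "partnerships": 0.15,
}

def adoption_score(metrics: dict[str, float]) -> float:
    """Each metric is normalized to 0-1; the score is scaled to 0-100."""
    total = sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)
    return round(100 * total, 1)

# adoption_score({"cng_fleet": 0.8, "emission_goals": 0.5,
#                 "alt_fuels": 1.0, "partnerships": 0.0})  ->  65.5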

Web Scraper Logic

Link Collection

Once the payload has been received, the constructed queries are sent to the Google Custom Search API. We chose this technology because it is fast (under three seconds per request) and highly customizable, allowing us to tailor queries and specify the number of links to return. The queries are fed into the API iteratively, and the resulting links are added to a set of total links.
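For reference, a basic call to the Custom Search JSON API looks roughly like the snippet below; our actual wrapper and parameters may differ, and API_KEY and CX_ID are placeholders.

# Sketch of a single Custom Search request using the requests library.
import requests

API_KEY = "..."  # Google API key (placeholder)
CX_ID = "..."    # Programmable Search Engine ID (placeholder)

def search_links(query: str, num_links: int = 10) -> set[str]:
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX_ID, "q": query, "num": num_links},
        timeout=10,
    )
    resp.raise_for_status()
    return {item["link"] for item in resp.json().get("items", [])}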

Priority Queue Scoring

Our Web Scraper uses a custom scoring system to organize our links in a priority queue data structure. The score is calculated based on how well the URLs fit the criteria keywords for the metrics we are trying to fill out.

Iterative Scoring Process:

  • Normalize and parse the URL into a query section and a path section using the urllib.parse module, and replace special non-alphanumeric characters with dashes.
from urllib.parse import urlparse

parsed = urlparse(url)
path = parsed.path.lower()
query = parsed.query.lower()
# Replace URL separators with dashes so keywords can be matched uniformly
path = path.replace('/', '-').replace('_', '-')
query = query.replace('&', '-').replace('=', '-')
  • Initialize the score of every URL at 0.
  • For every primary/critical keyword that is in either the path or the query of the URL, we add 2 points.
  • If the URL links to a report or is a PDF file, we prioritize it, and add extra points — 2 points for PDFs, 1 point for reports.

A URL is also given additional priority in the points schema if it comes from a whitelisted website. The whitelist contains domains that our client indicated as trusted, highly verified sources. We also discard all links from blacklisted websites (proven scams or highly inaccurate sources).
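A condensed sketch of these scoring rules, using Python's heapq as the priority queue; the keyword lists, point values, and whitelist/blacklist contents are simplified placeholders.

import heapq
from urllib.parse import urlparse

PRIMARY_KEYWORDS = ["sustainability", "rng", "cng", "emissions"]  # placeholder
WHITELIST = {"epa.gov", "sec.gov"}             # placeholder trusted domains
BLACKLIST = {"unreliable-example.com"}         # placeholder discarded domains

def score_url(url: str):
    parsed = urlparse(url)
    if parsed.netloc in BLACKLIST:
        return None                             # discard untrusted sources
    haystack = (parsed.path + "-" + parsed.query).lower()
    haystack = haystack.replace("/", "-").replace("_", "-")
    score = sum(2 for kw in PRIMARY_KEYWORDS if kw in haystack)
    if parsed.path.lower().endswith(".pdf"):
        score += 2                              # PDFs get the largest boost
    elif "report" in haystack:
        score += 1
    if parsed.netloc in WHITELIST:
        score += 2                              # bonus for client-trusted domains
    return score

candidate_links: set = set()                    # populated by the link-collection step
queue = []
for url in candidate_links:
    s = score_url(url)
    if s is not None:
        heapq.heappush(queue, (-s, url))        # negate: heapq is a min-heap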

PDF Extraction

As soon as the unique links are aggregated and returned to the main module of the web scraper, the links are scanned for PDFs. If any exist, the PDF URLs are sent to a separate module that downloads them, extracts the text, and standardizes the content by normalizing whitespace. The extracted text is returned as a string.
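The sketch below uses pypdf purely as an illustrative stand-in for this download, extract, and normalize flow; it is not our exact extraction module.

# Hypothetical sketch of PDF text extraction and whitespace normalization.
import io
import re
import requests
from pypdf import PdfReader

def extract_pdf_text(pdf_url: str) -> str:
    resp = requests.get(pdf_url, timeout=30)
    resp.raise_for_status()
    reader = PdfReader(io.BytesIO(resp.content))
    pages = [page.extract_text() or "" for page in reader.pages]
    # Collapse runs of whitespace so downstream keyword matching is consistent
    return re.sub(r"\s+", " ", " ".join(pages)).strip()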

Web Page Crawling

When a non-PDF link is encountered, the program scans the HTML tags and script data and grabs sentences that contain keywords. Each sentence is then processed through a function that checks the relevancy of its surrounding context. The crawling, scrolling, and extraction process is powered by Playwright, along with the Beautiful Soup and Trafilatura libraries.

The data is cross-referenced using a multi-tiered analysis approach that uses regex and sentence semantics patterns:

  • Tier 1 (Strict Patterns): Uses precise regex patterns to find definitive evidence
  • Tier 2 (Moderate Patterns): Applies more flexible matching for harder-to-find data
  • Tier 3 (Relaxed Patterns): Uses broad pattern matching as a last resort

This tiered approach ensures high-confidence results while still capturing evidence that might be phrased differently to satisfy all the metrics we are attempting to fulfill. Once accurate evidence for a metric has been found, the quote is appended to the final string output, and the source link is stored for future usage.
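A simplified illustration of the tiered matching for a single metric; the patterns below are invented examples for CNG fleet size, and the production patterns are far more extensive.

import re

TIERS = [
    # Tier 1: strict - an explicit CNG vehicle count
    re.compile(r"\b(\d[\d,]*)\s+CNG\s+(trucks|vehicles)\b", re.IGNORECASE),
    # Tier 2: moderate - a count near the phrase "natural gas"
    re.compile(r"\b(\d[\d,]*)\s+(?:\w+\s+){0,3}natural\s+gas\b", re.IGNORECASE),
    # Tier 3: relaxed - any sentence that mentions CNG at all
    re.compile(r"\bCNG\b", re.IGNORECASE),
]

def find_evidence(sentences: list[str]):
    for tier, pattern in enumerate(TIERS, start=1):
        for sentence in sentences:
            if pattern.search(sentence):
                return tier, sentence   # stop at the highest-confidence hit
    return None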

Scrolling and Sub-links

To deal with the risk of receiving incomplete information from landing pages, we implemented Playwright as our main automation tool, since it can handle JavaScript-heavy websites. Playwright also handles interactive components and shadow DOM content, and ensures every page is fully rendered and scrolled through for complete access to the data.

Sub-links are pulled from <a href> tags, along with any dynamic links rendered inside shadow DOM elements. After discovering these sub-links, logic determines whether they are worth considering: the sub-links are kept based on whether they are PDFs (always searched), whether they contain enough criteria keywords in the path, and, most importantly, whether their depth is less than 2 (we do not add grandchildren sub-links to the queue). If these conditions are met, the sub-links are scored and pushed to the priority queue.
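Below is a hedged sketch of this crawl step using Playwright's sync API: render the page, scroll so lazy-loaded content appears, then collect hrefs. Selectors, wait times, and the depth rule are simplified placeholders.

from urllib.parse import urljoin
from playwright.sync_api import sync_playwright

def crawl(url: str, depth: int):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        # Scroll repeatedly so JavaScript-heavy pages finish lazy-loading
        for _ in range(5):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(500)
        html = page.content()
        hrefs = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
        browser.close()
    # Keep only children (depth + 1 < 2): grandchildren are never queued
    sublinks = [urljoin(url, h) for h in hrefs] if depth + 1 < 2 else []
    return html, sublinks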

LLM Summary Generation

Quotes

To ensure the accuracy of quotes passed to the OpenAI API, we wanted to gather context for each quote that we choose. This helps us detect AI hallucinations, or incorrect inferences being made based on the quotes.

Using regular expressions, we normalize the input and split it into sentences, then create a list of all sentences that contain evidence of the data we want to showcase on the front end.

Similar to the web-scraper logic, the context extractor attempts to identify criteria keywords based on the metrics we are collecting, using semantic patterns and regular expressions to identify quoted sentences.

We have also implemented a multi-tiered keyword approach for identifying accurate sentences to use for the logic. We have similar steps here to ensure the sources pulled and the data being aggregated are correctly linked to the appropriate source.

"alt_fuels": {
"primary": ["biogas", "biodiesel", "rng", "alternate", "alternative"],
"secondary": [
"renewable diesel", "biofuel", "renewable fuel", "sustainable fuel",
"low-carbon fuel", "clean fuel", "green fuel", "hydrogen fuel",
"electric vehicle", "ev charging", "battery electric"

(Note: Above is an example of the difference in complexity and rigor between the primary and secondary tiered keywords. All data shown is placeholder content for demonstration purposes only, and does not reflect actual use cases in the final product.)

Context

After we have identified the relevant quote, we extract it and a window of context around the quote. Our logic also has error checking to avoid selecting duplicate contexts and going out of bounds of the string input.
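A minimal sketch of this windowing idea; the window size and helper names are illustrative.

def extract_context(sentences: list[str], hit_index: int, window: int = 2,
                    seen: set | None = None):
    """Return the matched sentence plus a few sentences on either side."""
    seen = seen if seen is not None else set()
    start = max(0, hit_index - window)                  # stay in bounds on the left
    end = min(len(sentences), hit_index + window + 1)   # and on the right
    if (start, end) in seen:
        return None                                     # avoid duplicate contexts
    seen.add((start, end))
    return " ".join(sentences[start:end])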

Justification

This project is powered by the OpenAI API, which provides AI-driven inferences based on the data received. The justification is used to determine the overall adoption score and how likely certain metrics (e.g., alternative fuels, CNG fleet truck presence) are to exist for the specified company. These metric scores internally fill out the quantitative metrics represented on the frontend. We then cross-reference the quotes with the justification (using an approximately 85% similarity threshold) to ensure that no AI hallucinations are occurring. If hallucinations are found, the justification generation re-prompts the API until a new, accurate justification is produced.
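As a rough illustration only, here is one way such a cross-check could look; difflib is used here as a stand-in for the actual similarity measure.

# Hypothetical grounding check: accept the justification only if some source
# quote is similar enough to it.
from difflib import SequenceMatcher

def is_grounded(justification: str, quotes: list[str], threshold: float = 0.85) -> bool:
    best = max(
        (SequenceMatcher(None, justification.lower(), q.lower()).ratio() for q in quotes),
        default=0.0,
    )
    return best >= threshold

# If is_grounded(...) returns False, the OpenAI call is re-prompted until a
# justification that passes the check is produced.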

Summary Creation

The next step of the logic is to generate a summary. This is done by using the justification to create a summary based on the quantitative metrics found in the AI inferences. The summary is then expanded with additional context that is prompt-engineered into the API call. The prompt sets a maximum number of tokens so that the summaries fit into the UI, and a low temperature of 0.2 is used when calling OpenAI to reduce inconsistency in the output.

Below is an example of our API request:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def _call_openai_for_section_summary_generation(prompt_text, section_title):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an expert sustainability analyst..."},
            {"role": "user", "content": prompt_text}
        ],
        temperature=0.2,  # Low temperature for consistency
        max_tokens=250    # Limit to 2-3 sentences
    )
    return response.choices[0].message.content

After all context selection, justification, summary generation, and score metric inferences have been made, the data is returned to the final module of the pipeline as a JSON file output.

JSON Output Parsing

The last step of the pipeline is to prep and parse the JSON file, ensuring that we are only sending data that is accurate and necessary to our clients. We discard the context, the direct quotes, and most of the justification. We ensure that we only keep the data needed to fill out our database tables.

Some of the justification data that we keep includes:

  • Numerical data for our numerical scoring metrics
  • Real values for the CNG truck and total truck fleet sizes
  • LinkedIn profiles of the company’s Chief Sustainability Officers
  • Target years for projected emissions goals to be met

With the important quantitative data, along with the summaries remaining post-parsing, we use the defined FastAPI routes to POST the data to the frontend for rendering.
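Conceptually, the pruning step reduces to a whitelist of fields. The field names below are illustrative, not our actual schema.

# Toy sketch: keep only dashboard/database fields, drop context and raw quotes.
KEEP_FIELDS = {
    "adoption_score", "summary", "source_url",
    "cng_truck_count", "total_truck_count",
    "cso_linkedin", "emission_target_year",
}

def prune_report(raw_metric: dict) -> dict:
    return {k: v for k, v in raw_metric.items() if k in KEEP_FIELDS}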

Database and Schema Design

The Database holds all saved company report data. Initially, information collected through web scraping is only sent to the frontend for visualization. If the user decides that they want to save the company query, only then is it posted to the database.

Each company has a table that hosts the collected metrics, their sources, the specialized summaries that pertain to each metric specifically, and the general company summaries.

We used Azure Cloud with PostgreSQL to better integrate ourselves into the client’s technology stack. This ensures that integration will be smoother post-hand-off.
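For illustration, a saved-report table could be created along the lines of the sketch below; the actual column names, types, and table layout may differ, and psycopg2 is used here only as an example driver.

# Hypothetical sketch, not the production schema.
import psycopg2

CREATE_COMPANY_REPORTS = """
CREATE TABLE IF NOT EXISTS company_reports (
    id              SERIAL PRIMARY KEY,
    company_name    TEXT NOT NULL,
    adoption_score  NUMERIC,
    general_summary TEXT,
    metric_name     TEXT,
    metric_value    TEXT,
    metric_summary  TEXT,
    metric_source   TEXT,
    saved_at        TIMESTAMPTZ DEFAULT now()
);
"""

def init_schema(dsn: str) -> None:
    # Creates the table if it does not already exist
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(CREATE_COMPANY_REPORTS)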

Development — Frontend

The frontend is built using Next.js, TailwindCSS, and Chart.js in TypeScript. We chose these technologies to ensure a dynamic, responsive display and to make it easy to build components based on our high-fidelity designs.

Key Development Details

  • Scalable Design — using TailwindCSS allowed our developers to quickly format and implement our designs without slowing down to create custom CSS classes.
  • Dynamic Graphs — our clients wanted graphical representations across multiple metrics in the final product. To achieve this, we used Chart.js to create interactive charts, pulling from quantitative data fetched from the backend.
  • API Implementation — Our main web-scraper logic is formatted as a custom API using FastAPI, allowing us to efficiently establish CRUD operations and RESTful endpoints. In addition, Next.js handles both routing and frontend logic, enabling seamless interaction with the Azure environment and the custom FastAPI endpoints.
  • Custom Components — The layout and designs for our modals, buttons, and interactive elements are all unique to this project. Our developers accessed Figma’s development mode to incorporate the HTML/CSS specs directly to save time in implementation.

Next Steps

While our team was already able to accomplish a lot in just these past seven weeks, a few nice-to-have features didn’t make it into the final product. If we had more time, here’s what we would tackle next:

  • Commenting Feature: Allows users to leave comments on company reports and view comments left by other team members.
  • Enhanced Export Options: Stylized PDFs with the ability to export additional pages and sections.
  • Emissions Timeline: A graphical timeline that showcases a company’s historical and forecasted carbon emissions.

Challenges and Takeaways

Setbacks

Delayed Onboarding

Our project roadmap was 7 weeks total, and we had onboarding of new team members up until week 3. Our team had a total of 12 members, so scheduling conflicts were unavoidable, especially with the addition of new members. We addressed this by prioritizing weekly team meetings at times when the majority of team members had overlapping availability and shared individual updates with any members who were unable to join.

Delayed User Interviews/Testing

Due to scheduling conflicts with our client, we had to delay initial user testing to weeks 3–4 of the project, and half of our users could not be scheduled for A/B testing within the project timeline. As a result, our designers were only able to receive input from our primary client contact. To mitigate potential issues, they maintained communication with end users via email throughout the process. This allowed them to iterate quickly on low-fidelity prototypes and deliver finalized design prototypes by week 5, well within the project timeline.

Consistent Communication

A key takeaway from this project was the importance of maintaining proper communication. At the start of the project, the Project Managers did not set clear enough expectations for communication, GitHub push consistency, and meeting attendance. This led to some misunderstandings about the impact of missing work meetings and created early roadblocks to more extensive work.

In Week 3, the Project Managers introduced and discussed expectations for response times, course of action for missing client meetings, and logistics for the rest of the project. With clear expectations set in place, workflows were significantly faster, reducing ambiguity and stress for everyone. Afterwards, we also implemented structured frameworks on how to utilize Jira, Notion, and Slack for delegation purposes. Going forward, we learned how to set up these expectations and frameworks from the beginning.

Task Delegation

At the beginning of the project, we set up 1:1s with the Project Managers and all team members to ascertain strengths, work styles, and personal goals for the project. This allowed us to work together and create a comprehensive understanding of each team member’s capabilities, preferences, and personal objectives for the project. By conducting these individual meetings early in the project lifecycle, the project managers were able to map out optimal task assignments that aligned with both the project requirements and individual team member strengths. For example, we paired team members with strong backend skills with those wanting to gain more experience, fostering personal growth while accomplishing project deliverables.

Closing Remarks

We are honored to have worked with an industry leader and are incredibly grateful to the Renewable Energy team at our client company for their unwavering support throughout our development journey. This collaboration introduced us to the complexities of renewable natural gas markets and low-carbon energy solutions, an industry that we really had no prior knowledge of.

Throughout this intensive process, our team was able to learn and develop in response to evolving requirements and technical pivots, all while cultivating invaluable skills in cross-functional collaboration and stakeholder management.

We want to say thank you once again to our primary points of contact, whose patience and guidance were instrumental in shaping both our deliverables and our professional development. Their confidence in our abilities and willingness to invest in our success made this endeavor not just a project, but a meaningful partnership.

To anyone reading, we appreciate your support and hope our article provided valuable insights into our product and progress throughout this timeline! ❤️

Written by CodeLab UC Davis

CodeLab is a student-run software development and UX design agency at UC Davis.