
Solidigm AI

May 30, 2025

SQ 2025 Client Project

Introduction

During the Spring 2025 cohort, our team had the pleasure of working with Solidigm on building a data extraction pipeline as a core component of their AI debugging tool. With guidance from lead engineer Ali Hashim, and invaluable support from Jasmin Vora and Charles Anyimi, we developed an automated system designed to streamline data querying and improve the efficiency of storing service ticket metadata. Our tool allows for a scalable, consistent, and accurate way to prepare Solidigm’s technical documents and service tickets for downstream AI models.

Team

Client

Solidigm is a U.S.-based global leader in NAND flash memory solutions. Formed in December 2021 following SK hynix’s acquisition of Intel’s NAND and SSD business, Solidigm is headquartered in Rancho Cordova, California, and operates as an independent subsidiary of SK hynix. Combining Intel’s innovation legacy with SK hynix’s scale, Solidigm provides a full suite of SSD products optimized for data centers, enterprises, and client systems. The company is heavily invested in advancing AI and data-centric technologies, developing solutions to meet modern computing demands.

Task

Solidigm aimed to develop an AI-powered tool to provide quick diagnostics and automated suggestions to their clients when encountering product service issues. Traditionally, identifying the root cause of such issues could consume hours of engineering time.

The new tool uses a Retrieval-Augmented Generation (RAG) architecture grounded in past service tickets, solutions, and product specifications. Our task was to build the foundation of this system: a pipeline that accurately extracts, cleans, classifies, and vectorizes raw, unstructured engineering data to feed into the RAG model, fully automatically.
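As a rough illustration of how the extracted documents feed the RAG layer, here is a minimal sketch using LlamaIndex's high-level API; the directory path and the query string are placeholders, not Solidigm's actual data or configuration.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load cleaned service-ticket documents produced by the extraction pipeline.
# "processed_tickets/" is an illustrative path, not the real output location.
documents = SimpleDirectoryReader("processed_tickets/").load_data()

# Build a vector index so past tickets can be retrieved by semantic similarity.
index = VectorStoreIndex.from_documents(documents)

# At query time, the engine retrieves the most relevant tickets and passes them
# to the LLM as grounding context, rather than relying on training data alone.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("Drive reports a SMART error after firmware update")
print(response)
```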

Timeframe

April — June 2025 | 8 weeks

Tools

Design — Figma

Development — Dataiku, LlamaIndex, PDFPlumber, Pandas, JavaScript, HTML, CSS

Maintenance — Azure, GitHub, Notion, Jira

Why This Stack?

Dataiku was the existing SaaS platform used by Solidigm, and it provided a robust GUI for designing workflows, in-house AI inference capabilities, plugin customization support, and scalable data storage — making it an ideal environment to build within.

LlamaIndex and PDFPlumber were used to process and scrape metadata from unstructured PDF files, while Pandas helped efficiently clean, merge, and transform data extracted from CSVs such as JIRA tickets.

JavaScript, HTML, and CSS were chosen to build customized, user-friendly plugin interfaces within Dataiku, enabling Solidigm engineers to personalize and interact with the data extraction process visually.

Azure was used to provision and host a virtual machine configured with a VPN, providing secure SSH access to Solidigm’s internal servers.

Design

Ideation

At the outset of the project, the Solidigm team was uncertain about what specific user-facing features would best support long-term maintainability and usability of the tool. This ambiguity created an opportunity for the design team to take initiative in shaping the product’s direction. Through multiple brainstorming sessions and technical scoping exercises, several concepts were proposed to enhance flexibility, control, and transparency in the extraction process.

After exploring different workflows, the team ultimately structured the plugin around two core views. The first was a settings/dashboard interface that would allow engineers to control what metadata categories are extracted and define custom keywords to improve the system’s accuracy. The second was an extraction review interface where users could browse past extractions, compare raw and processed data side-by-side, and manually correct any discrepancies. This dual-view structure balanced configurability with traceability, providing both control and oversight for long-term scalability.

User Research

Research was conducted through direct interviews and collaborative sessions with Ali Hashim, the lead engineer and primary user of the tool. The goal was to identify workflows and pain points in the current service ticket management process. Key insights included the importance of customizable extraction criteria, visibility into extraction accuracy, and the need for a streamlined review process.

Lo-fis

Initial wireframes mapped out basic interactions, such as document uploads, metadata field selection, and review screens. These low-fidelity prototypes served as a foundation for discussion and ensured early alignment with technical constraints and user needs.

Sketch for Proposed Dashboard View
Sketch for Proposed Extraction View

Mid-fis

After several rounds of iteration and feedback from the client, particularly through deeper discussions around real-world use cases, the information architecture and overall user flow of the extraction review interface were significantly refined. One of the key updates involved rethinking the landing screen of the interface; it was transformed into a searchable navigation view, aligning with the user’s primary goal of locating specific tickets quickly and efficiently. Navigation elements and button placements were adjusted to improve usability and reduce friction during repeated interactions. Additionally, this phase marked the beginning of a consistent design system, ensuring visual and functional coherence across the plugin as development progressed.

Information Architecture

Hi-fis

Final prototypes were styled to match Solidigm’s visual identity and polished for clarity and usability. The two main views — Settings/Dashboard and Extraction View — were refined for intuitive navigation, and interaction patterns were simplified to minimize friction for technical users.

Final Dashboard View
Final Extraction View

Development

CSV extraction

The CSV extraction pipeline was developed to handle structured service ticket data exported from Jira. Although the files followed a standard schema, the phrasing and structure of the content within them varied across records. The system was designed to identify key metadata, such as product names and issue categories, through pattern recognition, keyword matching, and rule-based logic. Pandas was used for data cleaning and preprocessing tasks, including column normalization, date formatting, and merging datasets from multiple sources. Conditional parsing rules were implemented to handle inconsistencies in terminology and formatting, ensuring that the extracted data remained accurate and aligned with predefined metadata categories. This enabled consistent, large-scale processing of CSV files with reliable results.
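The sketch below shows the flavor of this rule-based approach with Pandas; the column names, keyword lists, and category labels are hypothetical stand-ins for the real schema.

```python
import pandas as pd

# Hypothetical keyword rules mapping phrases to issue categories; the real
# pipeline maintained lists like these and exposed them through the plugin.
ISSUE_KEYWORDS = {
    "firmware": ["fw", "firmware", "flash update"],
    "thermal": ["overheat", "thermal", "throttle"],
}

def classify_issue(summary: str) -> str:
    """Assign an issue category by keyword matching, defaulting to 'other'."""
    text = summary.lower()
    for category, keywords in ISSUE_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return "other"

# Normalize column names, parse dates, and derive the issue category.
df = pd.read_csv("jira_export.csv")  # illustrative filename
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df["created"] = pd.to_datetime(df["created"], errors="coerce")
df["issue_category"] = df["summary"].fillna("").apply(classify_issue)
```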

PDF extraction

The PDF extraction workflow was built to process unstructured engineering documents that contained a mix of tables, technical descriptions, and varied formatting. PDFPlumber was used primarily to extract the free-form body text, preserving positional structure across pages. LlamaIndex was used to extract and organize data from tables and to clean the surrounding elements, removing headers, footers, and other noise that could interfere with accurate parsing. Given the complexity and inconsistency of the documents, custom heuristics were developed to identify and isolate relevant sections, ensuring accurate extraction of metadata.
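The sketch below illustrates the PDFPlumber side of this split, assuming fixed header and footer bands; in the actual workflow, table handling and cleanup went through LlamaIndex, so pdfplumber's built-in table extraction here is only a simplified stand-in.

```python
import pdfplumber

# Illustrative crop bounds, in points, for stripping header/footer bands.
HEADER_HEIGHT, FOOTER_HEIGHT = 50, 50

with pdfplumber.open("product_spec.pdf") as pdf:  # illustrative filename
    body_text, tables = [], []
    for page in pdf.pages:
        # Crop away the header and footer bands before extracting body text.
        cropped = page.crop(
            (0, HEADER_HEIGHT, page.width, page.height - FOOTER_HEIGHT)
        )
        body_text.append(cropped.extract_text() or "")
        # Collect any tables on the page for separate, structured handling.
        tables.extend(page.extract_tables())
```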

Dataiku Plugin

The Dataiku plugin was designed to serve as the control interface for configuring the extraction pipeline. The first core view, the Settings/Dashboard, allows users to define which metadata categories should be extracted from incoming documents, such as product models and error codes. It also provides inputs for adding or editing keyword lists, enabling engineers to guide the extraction process for improved accuracy. This view ensures flexibility and customization, making the system adaptable to evolving requirements without requiring backend modifications.
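A hypothetical sketch of the settings object behind this view follows; the category and keyword names are illustrative, and the real plugin persisted its configuration through Dataiku rather than a local JSON file.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical shape of the settings the Dashboard view manages: which metadata
# categories to extract, plus user-defined keywords that guide matching.
@dataclass
class ExtractionSettings:
    enabled_categories: list[str] = field(
        default_factory=lambda: ["product_model", "error_code"]
    )
    custom_keywords: dict[str, list[str]] = field(default_factory=dict)

def save_settings(settings: ExtractionSettings, path: str = "settings.json") -> None:
    """Persist the UI's choices so the pipeline picks them up on its next run."""
    with open(path, "w") as f:
        json.dump(asdict(settings), f, indent=2)

settings = ExtractionSettings()
settings.custom_keywords["error_code"] = ["SMART", "0x", "assert"]
save_settings(settings)
```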

The second core view, the Extraction View, functions as an auditing and quality control interface. It displays a searchable history of past extractions, with a side-by-side layout showing the original raw document content alongside the extracted metadata. This enables users to validate the output and manually correct any misclassifications. Together, these two views give users full transparency into how the system operates and allow for direct oversight of extraction accuracy, ensuring that the pipeline remains both trustworthy and maintainable over time.
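As a rough illustration of the kind of check that supports this review step, the hypothetical helper below flags extracted values that never appear in the raw source text; the field names and record shape are assumptions, not the plugin's actual data model.

```python
# Flag records whose extracted value is absent from the raw text, so the
# Extraction View can surface them for manual side-by-side review.
def flag_for_review(records: list[dict]) -> list[dict]:
    flagged = []
    for record in records:
        raw = record["raw_text"].lower()
        for field_name, value in record["extracted"].items():
            if value and value.lower() not in raw:
                flagged.append(
                    {"id": record["id"], "field": field_name, "value": value}
                )
    return flagged
```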

Challenges

UI/UX features

One of the primary challenges during the project was defining how a user-facing component could meaningfully support the extraction process, especially given the initially backend-focused nature of the tool. Since the core functionality was aimed at automating metadata extraction, it wasn’t immediately clear how user interactions would be integrated. This required extensive ideation to identify valuable touchpoints for human input — such as selecting extraction categories, inputting keywords, and reviewing outputs.

Dataiku onboarding

Working within Solidigm’s existing infrastructure meant committing to Dataiku, a third-party platform that offers powerful tools but comes with a steep learning curve. While its drag-and-drop workflow system and plugin support were aligned with the project’s goals, its unconventional UI model and limited design flexibility posed constraints. Adapting to the platform required significant time for technical onboarding, as well as careful attention to how the user experience could be optimized within a strict structure.

Unstructured data processing

Processing the large volume of unstructured engineering documents presented both technical and semantic challenges. Documents varied widely in structure, terminology, and formatting — ranging from structured tables to free-form descriptions and embedded schematics. To achieve high accuracy in metadata extraction, the pipeline had to incorporate a combination of rule-based logic, semantic parsing, and model-driven classification. The system was ultimately tuned to achieve over 99% accuracy, but this required rigorous experimentation and a highly modular pipeline to handle the diversity of inputs.
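For context, here is a minimal sketch of the kind of per-field accuracy check used while tuning a pipeline like this; the record shape and field names are illustrative, not the real evaluation harness.

```python
# Compare extracted metadata against a hand-labeled sample, field by field.
def field_accuracy(predicted: list[dict], labeled: list[dict], field: str) -> float:
    """Fraction of records whose extracted value matches the hand label."""
    matches = sum(p.get(field) == gold.get(field) for p, gold in zip(predicted, labeled))
    return matches / len(labeled)

# Toy example: one of the two 'issue_category' predictions matches its label.
pred = [{"issue_category": "thermal"}, {"issue_category": "firmware"}]
gold = [{"issue_category": "thermal"}, {"issue_category": "power"}]
print(field_accuracy(pred, gold, "issue_category"))  # 0.5
```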

A Closing Word

We’d like to take a moment to once again express our heartfelt gratitude to Solidigm and their amazing team! This project has been such a rewarding experience, giving us the chance to explore the field of automation and data extraction, something that was new to almost all of us. It pushed us to grow both as developers and designers, presenting us with unique challenges to overcome and conquer. We’ve learned so much along the way, and we’re incredibly thankful for the support and trust that made this journey not only meaningful but truly enjoyable.


Written by CodeLab UC Davis

CodeLab is a student-run software development and UX design agency at UC Davis.
