BlockScience Labs - cadCAD Data Models
Our client for this project was BlockScience Labs (BSCI Labs), a software development company that provides tools to create and manage data science workflows. Their platform seeks to empower clients to make data-driven decisions: users can build and experiment with data models of complex systems and draw on a growing library of existing models. They strive to make the data science pipeline stronger, easier, and faster.
For this project, we created data models using cadCAD, a Python package created by BlockScience to simulate complex systems. The project was driven by the interests of our developers, who built models to simulate predator-prey interaction and restaurant revenue.
Jeffrey Zang (Project Manager)
Ehong Kuo (Developer)
Sean Hingco (Developer)
Jan–May 2021 | 15 weeks
The first iteration of the predator-prey model simulated the interaction between a single predator species and a single prey species. We accomplished this by implementing the Lotka-Volterra equations in cadCAD and simulating them over a variable number of timesteps.
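A minimal sketch of this first iteration, written in plain Python rather than against cadCAD so that it runs on its own. The function names, the parameter values (alpha, beta, delta, gamma, dt), and the forward Euler discretization are illustrative assumptions, not the project's actual code or settings.

```python
def lotka_volterra_step(prey, predators, alpha, beta, delta, gamma, dt):
    """Advance both populations by one forward Euler step.

    d(prey)/dt      = alpha * prey - beta * prey * predators
    d(predators)/dt = delta * prey * predators - gamma * predators
    """
    d_prey = (alpha * prey - beta * prey * predators) * dt
    d_pred = (delta * prey * predators - gamma * predators) * dt
    # Populations cannot go negative, so clamp at zero.
    return max(prey + d_prey, 0.0), max(predators + d_pred, 0.0)


def simulate(prey0, predators0, timesteps, **params):
    """Return a list of (prey, predators) tuples, one per timestep."""
    history = [(prey0, predators0)]
    for _ in range(timesteps):
        history.append(lotka_volterra_step(*history[-1], **params))
    return history


# Hypothetical parameter values chosen only for illustration.
ILLUSTRATIVE_PARAMS = dict(alpha=1.1, beta=0.4, delta=0.1, gamma=0.4, dt=0.01)
history = simulate(10.0, 5.0, 1000, **ILLUSTRATIVE_PARAMS)
```

Changing the `timesteps` argument corresponds to the variable number of timesteps described above.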
In the second stage of development, we introduced randomness into the model so that it reflects real-world populations more accurately. We also added a second predator species as a first test of how well the model scales.
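One way this stage could look is sketched below: two predator species share one prey species, and every rate is perturbed by a small random factor at each step. How the project actually injected randomness is not stated, so the `noisy` helper, the `spread` value, and all rates here are assumptions made for illustration.

```python
import random


def noisy(rate, rng, spread=0.1):
    """Scale a rate by a random factor in [1 - spread, 1 + spread]."""
    return rate * rng.uniform(1.0 - spread, 1.0 + spread)


def stochastic_step(state, params, rng, dt=0.01):
    """One Euler step for a single prey species and two predator species."""
    prey = state["prey"]
    new_state = {}
    prey_change = noisy(params["prey_growth"], rng) * prey
    for name in ("predator_a", "predator_b"):
        rates = params[name]
        population = state[name]
        encounters = prey * population
        # Each predator removes prey and converts some of it into growth.
        prey_change -= noisy(rates["attack"], rng) * encounters
        growth = (noisy(rates["conversion"], rng) * encounters
                  - noisy(rates["death"], rng) * population)
        new_state[name] = max(population + growth * dt, 0.0)
    new_state["prey"] = max(prey + prey_change * dt, 0.0)
    return new_state


# Hypothetical rates; a seeded generator keeps the run reproducible.
PARAMS = {
    "prey_growth": 1.0,
    "predator_a": {"attack": 0.02, "conversion": 0.01, "death": 0.4},
    "predator_b": {"attack": 0.01, "conversion": 0.005, "death": 0.3},
}
rng = random.Random(42)
state = {"prey": 100.0, "predator_a": 10.0, "predator_b": 5.0}
for _ in range(500):
    state = stochastic_step(state, PARAMS, rng)
```

Because the loop over predator names is hard-coded to two species, this layout does not scale well to larger ecosystems.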
The third stage of development was a complete refactoring of the architecture. The two-species Lotka-Volterra equations were replaced with the generalized Lotka-Volterra equations. We also introduced a single species class to represent every species, so an individual species can now be both a predator and a prey species at the same time. With these adjustments, full ecosystems can be modeled to check for viability and fluctuations in population. The image below displays the change in population for a four-species food chain (grasshopper, frog, python, hawk).
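The generalized form can be sketched as follows, assuming each species i changes by x_i * (r_i + sum_j a_ij * x_j) per unit time. The `Species` fields, the `generalized_step` helper, and every coefficient below are hypothetical stand-ins for illustration, not the project's actual class or data.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    name: str
    population: float
    growth_rate: float  # r_i: intrinsic growth (+) or death (-) rate
    # a_ij coefficients keyed by the other species' name:
    # positive when that species is food, negative when it is a predator.
    interactions: dict = field(default_factory=dict)


def generalized_step(species_list, dt=0.01):
    """Update every species in place using one forward Euler step."""
    # Snapshot populations first so all species see the same previous state.
    populations = {s.name: s.population for s in species_list}
    for s in species_list:
        interaction = sum(coef * populations[other]
                          for other, coef in s.interactions.items())
        change = s.population * (s.growth_rate + interaction) * dt
        s.population = max(s.population + change, 0.0)


# Four-species food chain; the frog and python are both predator and prey.
food_chain = [
    Species("grasshopper", 100.0, 1.0, {"grasshopper": -0.005, "frog": -0.02}),
    Species("frog", 20.0, -0.3, {"grasshopper": 0.01, "python": -0.05}),
    Species("python", 5.0, -0.2, {"frog": 0.02, "hawk": -0.05}),
    Species("hawk", 2.0, -0.1, {"python": 0.03}),
]
for _ in range(1000):
    generalized_step(food_chain)
```

Adding a species now only requires appending another `Species` instance with its interaction coefficients.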
Restaurant Revenue Model
Once we had settled on the idea, the first stage of development for this model was to get a working version with hard-coded values. Because the model was original, there were no prior equations or data to base it on, unlike the predator-prey model. As a result, much of the model’s early logic relied on randomness. Here is what the current model looks like with user cuisine preferences hard-coded and static.
As we can see, restaurants that are highly preferred continue to get more business over time. Our next step with this model will be to adjust preferences to reflect the reality that, oftentimes, customers crave different cuisines after having a certain type (e.g. after having burgers, craving something other than burgers the next day).
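One possible way to express that next step is sketched below: after a customer eats a cuisine, its preference weight is reduced and the weights are renormalized before the next choice. This is a hypothetical approach, not something the project has implemented; the `fatigue` factor, the function names, and the preference values are assumptions.

```python
import random


def shift_preferences(preferences, last_cuisine, fatigue=0.5):
    """Return new weights with the most recent cuisine made less attractive."""
    adjusted = dict(preferences)
    if last_cuisine in adjusted:
        adjusted[last_cuisine] *= fatigue
    total = sum(adjusted.values())
    # Renormalize so the weights still sum to 1.
    return {cuisine: weight / total for cuisine, weight in adjusted.items()}


def choose_cuisine(preferences, rng):
    """Pick one cuisine at random, weighted by preference."""
    cuisines = list(preferences)
    weights = [preferences[c] for c in cuisines]
    return rng.choices(cuisines, weights=weights, k=1)[0]


# Hypothetical static preferences, as in the hard-coded version of the model.
base = {"burgers": 0.5, "sushi": 0.3, "tacos": 0.2}
rng = random.Random(0)
today = choose_cuisine(base, rng)
tomorrow_preferences = shift_preferences(base, today)
```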
Both models can be divided into three sections: model, configuration, and executor.
The model includes the class definitions for the objects that will be manipulated and tracked in the simulation. For the ecosystems model, a species class is defined, while in the restaurant-revenue model, a restaurant class and a customer class are defined.
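For the restaurant-revenue side, such class definitions might look like the sketch below. The field names (`cuisine`, `revenue`, `budget`, `preferences`) and the example values are assumptions for illustration; the project's actual attributes are not listed in this write-up.

```python
from dataclasses import dataclass, field


@dataclass
class Restaurant:
    name: str
    cuisine: str
    revenue: float = 0.0  # running total tracked across timesteps


@dataclass
class Customer:
    budget: float  # amount spent per visit
    # Preference weight per cuisine, e.g. {"burgers": 0.5, "sushi": 0.5}
    preferences: dict = field(default_factory=dict)


# Hypothetical instances in their starting state.
diner = Restaurant(name="Corner Diner", cuisine="burgers")
regular = Customer(budget=15.0, preferences={"burgers": 0.7, "sushi": 0.3})
```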
Additionally, the model contains two sets of functions. The first set, called policy functions, takes the previous state of the objects being tracked and defines some behavior to change that state, usually in the form of a numerical increase or decrease. The second set, called variable functions (state update functions in cadCAD's documentation), simply updates the state of each object instance based on the values calculated by the policy functions. In the case of the predator-prey model, policy functions change the state of a species’ population by modeling the reproduction and elimination of each species. For the restaurant-revenue model, policy functions allot some currency to each restaurant based on predetermined customer preferences.
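A small sketch of the two kinds of functions for the predator-prey case, using the argument order cadCAD passes to policy and state update functions. The function names, state keys, and rates are hypothetical, and the manual call at the end only imitates what cadCAD would do, assuming its default behavior of summing policy outputs that share a key.

```python
def p_reproduction(params, substep, state_history, previous_state):
    """Policy: prey born this timestep."""
    births = params["birth_rate"] * previous_state["prey"]
    return {"prey_delta": births}


def p_predation(params, substep, state_history, previous_state):
    """Policy: prey eaten this timestep."""
    eaten = (params["predation_rate"]
             * previous_state["prey"] * previous_state["predators"])
    return {"prey_delta": -eaten}


def s_prey(params, substep, state_history, previous_state, policy_input):
    """State update: apply the combined policy signal to the prey population."""
    new_value = previous_state["prey"] + policy_input["prey_delta"]
    return "prey", max(new_value, 0.0)


# Layout cadCAD expects: policies and variables grouped into update blocks.
partial_state_update_blocks = [
    {
        "policies": {"reproduction": p_reproduction, "predation": p_predation},
        "variables": {"prey": s_prey},
    },
]

# Manual walk-through of one timestep, outside cadCAD, with made-up values.
params = {"birth_rate": 0.1, "predation_rate": 0.005}
state = {"prey": 100.0, "predators": 10.0}
signals = [p(params, 1, [], state) for p in (p_reproduction, p_predation)]
policy_input = {"prey_delta": sum(s["prey_delta"] for s in signals)}
name, value = s_prey(params, 1, [], state, policy_input)
```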
The configuration portion of the architecture sets the parameters under which the simulation will be run. Class instances are declared in their genesis state, which cadCAD uses to begin the simulation process. The number of timesteps and the number of Monte Carlo runs are also set in the configuration. This portion is meant to be quick for the user to edit in order to run different simulations.
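A configuration along these lines might look like the sketch below. The state keys and parameter values are hypothetical; the `N`, `T`, and `M` keys follow the dictionary layout that cadCAD's `config_sim` helper accepts, but the snippet deliberately uses plain dictionaries so that it runs without cadCAD installed.

```python
# Genesis state: the values every simulation run starts from.
genesis_states = {
    "prey": 100.0,
    "predators": 10.0,
}

# Simulation settings in the layout cadCAD's config_sim expects.
simulation_settings = {
    "N": 3,           # number of Monte Carlo runs
    "T": range(200),  # timesteps per run
    "M": {            # model parameters; each list holds the values to sweep
        "birth_rate": [0.1],
        "predation_rate": [0.005],
    },
}
```

Editing only these two dictionaries is enough to change the starting populations, the run length, or the number of runs.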
The executor of the architecture lives inside a Jupyter notebook, which takes the model and configuration and performs the actual cadCAD simulation. pandas or matplotlib can then be used to visually display the changes in object state over the course of the entire simulation.
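To show roughly what the executor step does with a model and a configuration, here is a simplified stand-in written in plain Python. It is not cadCAD's executor or its API; `run_simulation`, the example policy and update functions, and all values are hypothetical, and the loop only imitates the general flow of running policies, applying state updates, and recording each timestep for every run.

```python
def run_simulation(genesis_states, blocks, params, timesteps, runs):
    """Return one record per run and timestep, starting from the genesis state."""
    records = []
    for run in range(1, runs + 1):
        state = dict(genesis_states)
        records.append({"run": run, "timestep": 0, **state})
        for t in range(1, timesteps + 1):
            for substep, block in enumerate(blocks, start=1):
                # Collect policy signals, summing values that share a key.
                policy_input = {}
                for policy in block["policies"].values():
                    signal = policy(params, substep, records, state)
                    for key, value in signal.items():
                        policy_input[key] = policy_input.get(key, 0) + value
                # Each state update function returns a (name, new_value) pair.
                updates = dict(
                    update(params, substep, records, state, policy_input)
                    for update in block["variables"].values()
                )
                state = {**state, **updates}
            records.append({"run": run, "timestep": t, **state})
    return records


def p_growth(params, substep, state_history, previous_state):
    return {"prey_delta": params["birth_rate"] * previous_state["prey"]}


def s_prey(params, substep, state_history, previous_state, policy_input):
    return "prey", previous_state["prey"] + policy_input["prey_delta"]


blocks = [{"policies": {"growth": p_growth}, "variables": {"prey": s_prey}}]
records = run_simulation({"prey": 100.0}, blocks, {"birth_rate": 0.1},
                         timesteps=5, runs=2)
```

A list of records like this can be passed to `pandas.DataFrame` and plotted per run to see how each tracked value changes over time.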
Takeaways & Challenges
It was important to know which tasks were assigned each week, especially because this project was open-ended and creative rather than bound to a set schedule. Knowing our immediate and long-term milestones allowed us to keep a productive pace throughout the term. Weekly meetings with Chris Catoya and Chris Frazier (our mentors at BSCI Labs) not only kept us on track but also gave us fresh inspiration and validation of our new ideas. Finally, access to Slack channels was very helpful toward the end of the term, as we developed our models and ran into more technical issues that we needed to discuss with other developers at BSCI Labs.
At the beginning of the term, our team was fixed on getting models up and running, but we kept running into basic issues because we lacked a complete understanding of cadCAD. By revisiting cadCAD fundamentals and investing time in more research, we gained a deeper understanding of the tool, and development sped up. The up-front investment was well worth a couple of weeks without development.
When brainstorming new model ideas, we weighed many factors, such as ease of development, amount of prior modeling, and potential for additional parameters. Looking at the larger picture, however, we realized that the most important consideration, especially in a creative project like ours, was how useful our models would be. If someone could use a model to learn about a system and produce useful simulations and data, we saw it as a good model to create. With our latest model, the restaurant revenue model, our vision was to simulate the revenue restaurants make in any area depending on cuisine types, hours of operation, price point, and other factors, which makes the model useful to many different types of people.
Thank you, BlockScience Labs, for giving us the opportunity to work with you!