Lambda Labs: Got Merge Conflicts?

Jeffrey Asuncion
7 min read · Mar 4, 2021

An exciting learning experience in Lambda School Labs!

After a tough unit of learning computer science at Lambda School, it was time to enter the Labs program. Labs is a workplace simulation in which students are thrown into a real-world product development experience. In this program, we got the chance to apply our newly learned skills to benefit a real-world stakeholder.

CitySpire is a city metrics app intended to be a one-stop resource where users can get the most accurate city information. The app analyzes city data such as population, cost of living, rental rates, crime rates, park (walk score), and many other social and economic factors that matter when deciding where to live. It presents this data in an intuitive, easy-to-understand interface.

After meeting with the CitySpire stakeholder, Ike, we had direction on which features and DS API endpoints to prioritize for releases 1 and 2 of the product. Once we reached a shared understanding, the project tasks became clearer and less convoluted.

Description of Product Releases 1 and 2

My role as Lead Data Scientist on the project was to work on the Data Science API with a cross-functional team of Data Scientists, Front-End Engineers, and iOS Engineers. My goal was to make sure that the Handshake between Data Science and the Front-End / iOS Engineers was clearly defined, and that the Data Science Team met the release requirements. The Handshake is an agreement between the cross-functional teams that ensures the DS API will integrate with the Front-End web app and the iOS app. Our main goal as the Data Science team was to create endpoints that return future predictions of location features and a recommendation list of suitable locations based on user preferences.

Features I Implemented

Data Acquisition and Preparation

Before we could predict future population, crime rate, rental rates, and walk score, we needed data to base the predictions on. I was tasked with the population feature. The main source for the population data was the census.gov API, and part of the fun of working with any API is reading the documentation, blog posts, and tutorials to make the best use of it. After finding the API, I applied for a developer access key, wrote an ETL script to pull the API data into a data frame, and cleaned the data frame as necessary. The cleaned data was persisted in a CSV file and an SQLite3 database.
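The ETL step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: it uses only the Python standard library instead of a data frame, it skips the API call entirely (which needs a developer key), and the sample rows, column names, and function names are all hypothetical.

```python
import csv
import sqlite3

# Hypothetical sample standing in for an already-parsed API response.
# The first row is assumed to be the header; names and values are made up.
SAMPLE_ROWS = [
    ["city", "state", "population"],
    ["City A", "X", "1000"],
    ["City B", "X", None],        # missing value, dropped during cleaning
    ["City C", "Y", " 2500 "],    # stray whitespace, stripped during cleaning
]

COLUMNS = ["city", "state", "population"]


def transform(rows):
    """Turn header + data rows into clean dicts with integer populations."""
    header, *data = rows
    records = []
    for row in data:
        record = dict(zip(header, row))
        raw = record.get("population")
        if raw is None or not str(raw).strip().isdigit():
            continue  # skip rows without a usable population value
        record["population"] = int(str(raw).strip())
        records.append(record)
    return records


def load_csv(records, path):
    """Persist the cleaned records to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(records)


def load_sqlite(records, conn):
    """Persist the cleaned records to a SQLite table named 'population'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS population "
        "(city TEXT, state TEXT, population INTEGER)"
    )
    conn.executemany(
        "INSERT INTO population VALUES (:city, :state, :population)", records
    )
    conn.commit()
```

A typical run would call `transform(SAMPLE_ROWS)`, then pass the result to `load_csv(records, "population.csv")` and `load_sqlite(records, sqlite3.connect("population.db"))`; both file names are placeholders.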

Machine Learning Model

We used a Nearest Neighbors model to create a list of recommended cities and states based on user preferences for population size, crime rate, rental rate, and walk score. The dataset for the NearestNeighbors model is a join of the population, crime rate, rental rate, and walk score data. Missing values are filled with the state average; if no state average is available, the average for the whole dataset is used instead. Given the user's preferred values for these four features, the model returns a recommendation list of the closest-matching locations.
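To make the fallback rule and the recommendation step concrete, here is a small sketch in plain Python. It is not the project's code: the project used a NearestNeighbors model, while this stand-in ranks rows by brute-force Euclidean distance. The field names, the min-max scaling, and the function names are assumptions made for illustration, and the sketch assumes every feature has at least one known value.

```python
import math
from statistics import mean

# Hypothetical column names for the joined dataset.
FEATURES = ("population", "crime_rate", "rental_rate", "walk_score")


def fill_missing(rows):
    """Fill None values with the state average, else the dataset average."""
    filled = [dict(r) for r in rows]
    for feat in FEATURES:
        known = [r[feat] for r in rows if r[feat] is not None]
        overall = mean(known)
        by_state = {}
        for r in rows:
            if r[feat] is not None:
                by_state.setdefault(r["state"], []).append(r[feat])
        for r in filled:
            if r[feat] is None:
                state_vals = by_state.get(r["state"])
                r[feat] = mean(state_vals) if state_vals else overall
    return filled


def recommend(rows, preferences, k=3):
    """Return the k rows closest to the user's preferred feature values."""
    # Assumption: scale each feature to 0..1 over the dataset's range so
    # that large-valued features such as population do not dominate.
    lo = {f: min(r[f] for r in rows) for f in FEATURES}
    hi = {f: max(r[f] for r in rows) for f in FEATURES}

    def scale(feat, value):
        span = hi[feat] - lo[feat]
        return (value - lo[feat]) / span if span else 0.0

    def distance(row):
        return math.sqrt(
            sum((scale(f, row[f]) - scale(f, preferences[f])) ** 2
                for f in FEATURES)
        )

    return sorted(rows, key=distance)[:k]
```

Calling `recommend(fill_missing(rows), preferences, k=5)` would return the five rows whose scaled features sit closest to the user's preferences.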

Preview of the DS API endpoints

Feature engineering

The Livability Score is one of the features that needed to be engineered. It is based on the importance of crime rate, rental rate, and walk score for a specific location. The route below calculates the Livability Score by calling the crime rate, rental rate, and walk score for a particular location and combining them with user-defined weights. The weight for the walk score is calculated as 100% less the sum of the crime rate percentage and the rental rate percentage.
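The weighting rule can be written out as a short function. This is a sketch under stated assumptions rather than the actual route: it assumes each of the three inputs has already been converted to a 0 to 100 score where higher is better, a conversion the post does not describe, and the function and parameter names are hypothetical.

```python
def walk_weight(crime_pct, rental_pct):
    """Walk score weight: 100% less the crime and rental percentages."""
    if crime_pct < 0 or rental_pct < 0 or crime_pct + rental_pct > 100:
        raise ValueError("weights must be non-negative and sum to at most 100")
    return 100 - crime_pct - rental_pct


def livability_score(crime_score, rental_score, walk_score,
                     crime_pct, rental_pct):
    """Weighted average of three 0-100 component scores (an assumed scale)."""
    walk_pct = walk_weight(crime_pct, rental_pct)
    weighted = (crime_score * crime_pct
                + rental_score * rental_pct
                + walk_score * walk_pct)
    return weighted / 100


# Crime weighted 30%, rent 50%, so walk score gets the remaining 20%.
print(livability_score(80, 60, 40, crime_pct=30, rental_pct=50))  # 62.0
```

Because the walk score weight is derived from the other two, the user only has to supply two percentages and the three weights always sum to 100%.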

Data Engineering

I built the DS API with the endpoints defined by the Handshake between the Data Science and Engineering teams. The predictions and models from the Data Science team were wrapped into the FastAPI framework. The endpoints do the following:

  • predict future population, crime rate, rental rate, walk score
  • calculate a livability score for a specific city, state
  • recommend a list of city, states based on user parameters for population, crime rate, rental rate, and walk score

Technical Problems We Were Faced With

The Data Science Team worked together to predict future values for the location features: population, crime rate, rental rates, and walk score. Each of us also created the endpoint for our own location feature in the FastAPI framework. Because we worked independently, we ran into our first issue: merge conflicts. One of the main reasons for the multitude of conflicts was that Pull Requests from 3 different Data Scientists each consisted of 100 lines of code spread over a couple of different files. We resolved the conflicts by making smaller, more specific changes per Pull Request, and then the app worked locally.

When we jumped into the code, a lot of issues came up while trying to deploy the API to AWS Elastic Beanstalk. For our team, this was our first exposure to AWS Elastic Beanstalk. With all three data scientists building on the FastAPI framework at once, we soon understood why we were running into problems with deployment as well as merge conflicts.

Building the API itself was not the technical challenge for our team of three Data Scientists. The challenge came when all three of us were making Pull Requests to build each route at the same time. It became a case of too many cooks in the kitchen.

The code worked locally, but when we tried to merge we hit merge conflicts: not just one line, but sometimes many lines of code across several different files. This was not due to errors in the code but to too many changes per Pull Request.

We were able to run the API locally, but we had a number of issues deploying our FastAPI app to AWS Elastic Beanstalk. We tried deploying a fully functional DS API with ML and DB endpoints, but we ran into a number of errors from the load balancer as well as HTTP 400 errors.

How We Faced our Challenge

This is how we overcame it. It comes down to one word: delegate. We agreed as a group that one person would build the FastAPI framework while the rest of the team supported the Data Science predictions and models. The data engineer would be in charge of building the API while the Data Science Team prepared the models and predictions for the endpoints.

We then decided to start from a basic web app with dummy endpoints and deploy that first, with the Data Engineer working on FastAPI. Meanwhile, the Data Science team continued predicting future values for their respective features: population, rental rates, crime rate, and walk score.

With a single Data Engineer working on the API endpoints, the cause of most merge conflicts was minimized, and the work of each member of the Data Science Team could be wrapped into the FastAPI framework.

We also found that the Pipfile.lock file was causing a lot of problems when deploying the app. By deleting Pipfile.lock before ‘git push’ we were able to deploy the app.

DS API Demo

Please check out our DS API walkthrough:

https://www.youtube.com/embed/fyE_Oo3EW6U

Reflecting

This was a great real-world experience working with cross-functional teams. We had the opportunity to meet with Front-End and iOS Engineers and see how we could assist them with our Data Science API. We in turn had to communicate with our Engineers to ensure that we were on the same page when it came to the DS / Front-End Handshake.

As a Data Science Team, we ran into problems with deployment and merge conflicts. But with some backtracking, planning, and understanding, I was able to help lead the team to the right workflow, using teamwork and communication to get the job done. By delegating and assigning roles, we were able to finish the work after being “stuck” for a week, unable to deploy.

Thank you to Lambda School for the opportunity to learn by doing.
