I’m somewhat new to Python and wrote this piece of code to do a string comparison of accounts requested for import into our database against accounts that are already present. The issue is that our DB currently holds over 65K accounts and I’m comparing over 5K accounts for import, which makes this code take over 5 hours to run. I suspect the loop I’m using is to blame, but I’m not certain how to improve it.
TL;DR: I need help optimizing this code so it has a shorter run time.
    import pandas as pd
    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process

    accounts_DB = pd.read_csv("file.csv")              # 65,000 rows and 15 columns
    accounts_SF = pd.read_csv("Requested Import.csv")  # 5,000 rows and 30 columns

    def NameComparison(DB_account, choices):
        """Function uses fuzzywuzzy module to perform Levenshtein distance string comparison"""
        return process.extractBests(DB_account, choices, score_cutoff=95)

    options = accounts_SF["Account Name"]
    a_list = []
    for i in range(len(accounts_DB)):
        a_list.append(NameComparison(accounts_DB.at[i, "Company Name"], options))

    b_list = pd.DataFrame(a_list)
    b_list.to_csv("Matched Accounts.csv")
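In case it helps, here is a small self-contained version of the pattern I’m using, with toy lists standing in for the two CSVs and stdlib `difflib.SequenceMatcher` standing in for fuzzywuzzy’s scorer (so it runs without fuzzywuzzy installed; the names and cutoff are made up for illustration). It shows why the real run is slow: every DB name is scored against every import name, i.e. 65,000 × 5,000 ≈ 325 million scorer calls.

```python
from difflib import SequenceMatcher

# Toy stand-ins for the two CSV columns (hypothetical names).
db_names = ["Acme Corp", "Globex Corporation", "Initech LLC"]
sf_names = ["Acme Corp.", "Globex Corp", "Umbrella Inc"]

def best_matches(name, choices, cutoff=0.8):
    """Return (choice, score) pairs at or above cutoff, similar in shape to
    process.extractBests, but using difflib's ratio (0..1) as the scorer."""
    scored = [(c, SequenceMatcher(None, name, c).ratio()) for c in choices]
    return [(c, s) for c, s in scored if s >= cutoff]

# The loop in my code does exactly this all-pairs scan:
# len(db_names) * len(sf_names) scorer calls in total.
results = [best_matches(n, sf_names) for n in db_names]
print(results)
```

With the real data this inner scan runs 65,000 times over 5,000 candidates each, which is where the hours go.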